In my previous post, I described one of the many ways to set up your own Spark cluster (in AWS) and submit Spark jobs to that cluster from an edge node (also in AWS). …


Data Observability — a must have metric today

Today, organisations around the world use multiple tools and frameworks to trace data as it flows through the pipelines in their data landscape. A variety of tools and frameworks exist to track and report pipeline performance, alert on imminent SLA breaches, and show the lineage of a data product…


In my journey as a data engineer, I came across Spark when the big data hype was at fever pitch (it remains high today, although some of the myths have been replaced with reality). While I have developed production-grade Spark applications, it came to my notice that there…

Suman Kumar Gangopadhyay

On a mission to reduce waste in the supply chain using AI/ML. Visit Noodle.ai for more details.
