Suman Kumar Gangopadhyay. Using XCOMs in Airflow — Scenario based examples with code (Oct 4, 2022). There are many tutorials that already describe how to use XCOMs in Airflow; however, I found that most of these are too simplistic and do…
Airflow, Spark & S3, stitching it all together (Apr 18, 2021). In my previous post, I described one of the many ways to set up your own Spark cluster (in AWS) and submit Spark jobs to that cluster…
Nested Vectorisation and its speed boost (Aug 19). We are all aware of vectorisation and its benefits in terms of boosting the performance of various data engineering/machine learning pipelines…
How to start using Polars & DuckDB together for data analysis (Mar 18). Python's Pandas has been around for a very long time and will continue to be for the foreseeable future; however, that shouldn’t stop…
How to Spin up AWS EKS and deploy your applications in it (May 15, 2023). This blog lays out the steps needed to spin up an AWS EKS cluster with EC2 nodes. It is aimed primarily at beginners and also for…
A tool/framework to detect the extent of changes in data entities between time periods (published in Analytics Vidhya, Oct 2, 2021). Today, organisations around the world leverage multiple tools/frameworks to enable traceability of data flowing through various data…
Commissioning EMR Spark cluster in AWS and accessing it via an Edge Node (Mar 27, 2021). In my journey as a data engineer, I came across Spark when the big data hype was at fever pitch (it remains high today; however, some…