Using XCOMs in Airflow — Scenario based examples with code
There are many tutorials which already describe how to use XCOMs in Airflow; however, I found that most of these are too simplistic and do… (Oct 4, 2022)
Airflow, Spark & S3, stitching it all together
In my previous post, I described one of the many ways to set up your own Spark cluster (in AWS) and submit Spark jobs to that cluster… (Apr 18, 2021)
Nested Vectorisation and its speed boost
We are all aware of vectorisation and its benefits in terms of boosting performance of various data engineering/machine learning pipelines… (Aug 19, 2024)
How to start using Polars & DuckDB together for data analysis
Pandas has been around for a very long time and will continue to be for the foreseeable future; however, that shouldn’t stop… (Mar 18, 2024)
How to Spin up AWS EKS and deploy your applications in it
This blog will lay out the steps needed to spin up an AWS EKS cluster with EC2 nodes. This is aimed primarily at beginners and also for… (May 15, 2023)
A tool/framework to detect the extent of changes in data entities between time periods
Published in Analytics Vidhya
Today, organisations in the world leverage multiple tools/frameworks to enable traceability of data running throughout various data… (Oct 2, 2021)
Commissioning EMR Spark cluster in AWS and accessing it via an Edge Node
In my journey as a data engineer, I came across Spark when the big data hype was at its fever pitch (it remains high today, however, some… (Mar 27, 2021)