Suman Kumar Gangopadhyay in Analytics Vidhya · Airflow, Spark & S3, stitching it all together · In my previous post, I described one of the many ways to set up your own Spark cluster (in AWS) and submit Spark jobs to that cluster… · 8 min read · Apr 18, 2021
Suman Kumar Gangopadhyay · How to start using Polars & DuckDB together for data analysis · Python's Pandas has been around for a very long time and will remain so for the foreseeable future; however, that shouldn't stop… · 7 min read · Mar 18, 2024
Suman Kumar Gangopadhyay in Analytics Vidhya · How to Spin up AWS EKS and deploy your applications in it · This blog will lay out the steps needed to spin up an AWS EKS cluster with EC2 nodes. This is aimed primarily at beginners and also for… · 9 min read · May 15, 2023
Suman Kumar Gangopadhyay · Using XCOMs in Airflow — Scenario based examples with code · There are many tutorials that already describe how to use XCOMs in Airflow; however, I found that most of these are too simplistic and do… · 5 min read · Oct 4, 2022
Suman Kumar Gangopadhyay in Analytics Vidhya · Exploratory Data Analysis using Spark · Introduction · 10 min read · Oct 31, 2021
Suman Kumar Gangopadhyay in Analytics Vidhya · A tool/framework to detect the extent of changes in data entities between time periods · Today, organisations around the world leverage multiple tools/frameworks to enable traceability of data flowing through various data… · 8 min read · Oct 2, 2021
Suman Kumar Gangopadhyay · Commissioning EMR Spark cluster in AWS and accessing it via an Edge Node · In my journey as a data engineer, I came across Spark when the big data hype was at fever pitch (it remains high today; however, some… · 10 min read · Mar 27, 2021