Data pipelines are basically the plumbing of modern analytics. But how do you get data from “raw” to “curated” without everything breaking at 2am? In this episode, MLG breaks down:
He also covers the reality that some pipelines are effectively ELTEL (hello, Power BI/Tableau), when classic ETL still makes sense, and how cheaper storage and columnar formats like Parquet changed the game. MLG wraps up the episode with practical best practices for making pipelines boring (in the best way): retries, alerting, logging, version control, small testable steps, parameterisation, data quality checks, and timeouts.
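As a taste of the "boring pipelines" idea, here is a minimal Python sketch of one of those practices: running a pipeline step with retries, exponential backoff, and a soft per-attempt timeout. The function and parameter names (`run_with_retries`, `max_attempts`, `base_delay`) are illustrative, not from the episode.

```python
import random
import time

def run_with_retries(step, *, max_attempts=3, base_delay=1.0, timeout=10.0):
    """Run one pipeline step with retries and backoff.

    `step` is any zero-argument callable. The timeout here is "soft":
    we check elapsed time after the attempt rather than interrupting it.
    All names are illustrative placeholders.
    """
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            result = step()
            if time.monotonic() - start > timeout:
                raise TimeoutError(f"step exceeded {timeout}s")
            return result
        except Exception as exc:
            if attempt == max_attempts:
                # In a real pipeline, an alerting hook would fire here.
                raise
            # Exponential backoff with a little jitter between attempts.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

A transient failure (say, a flaky API call) then succeeds on a later attempt instead of taking the whole pipeline down, and a persistent failure surfaces loudly after `max_attempts` rather than hanging forever.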
If you’re building anything from dashboards to analytics platforms, this is the mental model you want.