
Episode 7: Data pipelines: From raw to curated data

Data pipelines are basically the plumbing of modern analytics. But how do you get data from “raw” to “curated” without everything breaking at 2am? In this episode, MLG breaks down:

  1. What a data pipeline actually is
  2. Why most teams should default to ELT (not ETL)
  3. How DAGs and orchestrators (like Airflow, Dagster, and Prefect) keep your workflows running reliably at scale (see the sketch after this list)
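
For a concrete picture of item 3, here's a minimal sketch of an ELT-style workflow expressed as a DAG, using Airflow 2.x's Python API. The task names, schedule, and settings are illustrative placeholders chosen for this example, not anything prescribed in the episode.

```python
# Minimal Airflow 2.x DAG sketch: three tasks in ELT order (extract -> load -> transform).
# Task names and the @daily schedule are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_raw():
    # Pull data from a source system; in practice this would land raw files in object storage.
    print("extracting raw data")


def load_to_warehouse():
    # Load the raw data into the warehouse, untransformed (the "EL" in ELT).
    print("loading raw data into the warehouse")


def transform_to_curated():
    # Transform inside the warehouse to produce curated tables (the "T" in ELT).
    print("building curated tables")


with DAG(
    dag_id="raw_to_curated",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={
        "retries": 2,                              # retry transient failures
        "retry_delay": timedelta(minutes=5),
        "execution_timeout": timedelta(hours=1),   # don't let a task hang forever
    },
) as dag:
    extract = PythonOperator(task_id="extract_raw", python_callable=extract_raw)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    transform = PythonOperator(task_id="transform_to_curated", python_callable=transform_to_curated)

    # The DAG edges: extract must finish before load, which must finish before transform.
    extract >> load >> transform
```

The orchestrator's job is exactly this: run the tasks in dependency order, retry the flaky ones, and surface failures instead of leaving you to discover them at 2am.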

He also covers the reality that some pipelines are effectively ELTEL (hello, Power BI/Tableau), when ETL still makes sense, and how cheaper storage and columnar formats like Parquet changed the game. MLG wraps up the episode with practical best practices to make pipelines boring (in the best way): retries, alerting, logging, version control, small testable steps, parameterisation, data quality checks, and timeouts.
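To ground the Parquet and data-quality points, here's a tiny illustrative sketch, assuming pandas with pyarrow installed; the columns, checks, and file name are made up for the example.

```python
# Sketch: run basic data-quality checks, then publish a curated table as Parquet.
import pandas as pd

orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "amount": [19.99, 5.00, 42.50],
        "country": ["NZ", "AU", "NZ"],
    }
)

# Data-quality checks: fail fast instead of silently publishing bad data downstream.
assert orders["order_id"].is_unique, "duplicate order_id values"
assert orders["amount"].ge(0).all(), "negative amounts found"

# Parquet is columnar and compressed, so queries that touch only a few columns stay cheap,
# which is part of why cheap storage plus columnar formats made ELT the sensible default.
orders.to_parquet("orders.parquet", index=False)
```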

If you’re building anything from dashboards to analytics platforms, this is the mental model you want.