Data Engineering Roadmap
The document outlines a 10-week training program focused on data engineering, covering topics such as setting up Postgres and Airflow, data ingestion, analytics engineering, batch and streaming processing, data quality, and orchestration. Each week has specific objectives, including hands-on labs and a capstone project to apply learned skills. The program emphasizes the use of tools like Docker, dbt, Spark, and Great Expectations for data management and validation.
Subjects and objectives

Week 1: Introduction and Prerequisites
- Running Postgres locally with Docker
- Setting up Airflow locally
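As a quick check for the Week 1 setup, here is a minimal sketch that connects to the Dockerized Postgres from Python with psycopg2. The host, port, and credentials below are illustrative assumptions (they should match whatever was passed to docker run), not values fixed by the roadmap.

    # Verify the local Dockerized Postgres is reachable (assumed defaults).
    import psycopg2

    conn = psycopg2.connect(
        host="localhost",     # container port published to the host
        port=5432,
        dbname="postgres",
        user="postgres",
        password="postgres",  # matches POSTGRES_PASSWORD given to the container
    )
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        print(cur.fetchone()[0])
    conn.close()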
Week 2: Data Ingestion (example sketch below)
- Ingesting data to AWS with Airflow
- Ingesting data to local Postgres with Airflow

Week 3: Data Warehouse
- Setting up Snowflake Cloud Data Warehouse
- Partitioning and Clustering
- Best practices

Week 4: Analytics Engineering
- Postgres and dbt
- dbt models
- Testing and documenting

Week 5: Batch Processing (example sketch below)
- What is Spark
- Spark DataFrames
- Spark SQL

Week 6: Streaming Processing (example sketch below)
- Schemas (Avro)
- Kafka Streams

Week 7: Data Quality (example sketch below)
- Data validation with Great Expectations and Deequ
- Anomaly detection and incremental validation with Deequ

Week 8: Orchestration and Automation
- Pipeline orchestration and its benefits
- Event-based vs time-based; business-driven vs data-driven
- Creating data lineage

Week 9: Capstone Project
- Week 9: working on your project
- Week 10 (extra): reviewing your peers

Labs
- Python function and DDL for third-normal-form (3NF) tables
- Forward and backward data formats
- Sample end-to-end data pipeline
- Set up Docker
- Set up MinIO for the data lake
- Collect data from API and database
- Build a pipeline to load data from the data lake to the data warehouse
- Schedule a dbt pipeline with an idempotent pattern with Airflow (Astronomer)
- Process large data with Spark
- Connect a BI tool (Google Data Studio / Metabase) to the data
- Trigger and schedule Spark jobs
- Set up a schema registry and validation pipeline
- Apply Spark jobs and ML to process data
- Analyze real-time data
- Implement DataOps with dbt and scheduling with Airflow
- Data quality with Great Expectations
- Research data lineage
- Design a data model for logging and lineage
- To be defined
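For Week 2's ingestion objectives, a hypothetical minimal Airflow DAG (Airflow 2.x style) that downloads a CSV and loads it into the local Postgres. The source URL, table name, and connection string are placeholder assumptions.

    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from sqlalchemy import create_engine

    CSV_URL = "https://example.com/trips.csv"  # placeholder source
    PG_URI = "postgresql://postgres:postgres@localhost:5432/postgres"

    def ingest_csv_to_postgres():
        # Download the file and load it into Postgres in one pass.
        df = pd.read_csv(CSV_URL)
        engine = create_engine(PG_URI)
        # "replace" keeps reruns idempotent, echoing the labs' idempotent pattern.
        df.to_sql("trips", engine, if_exists="replace", index=False)

    with DAG(
        dag_id="ingest_csv_to_postgres",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="ingest", python_callable=ingest_csv_to_postgres)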
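For Week 5, a small PySpark sketch showing the two batch-processing topics side by side: the DataFrame API and Spark SQL over the same data. The file path and column names are made up for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("week5-batch").getOrCreate()

    # DataFrame API: read, filter, aggregate.
    df = spark.read.csv("data/trips.csv", header=True, inferSchema=True)
    daily = (
        df.where(F.col("amount") > 0)
          .groupBy("pickup_date")
          .agg(F.count("*").alias("trips"), F.sum("amount").alias("revenue"))
    )
    daily.show(5)

    # Spark SQL: the same aggregation over a temporary view.
    df.createOrReplaceTempView("trips")
    spark.sql("""
        SELECT pickup_date, COUNT(*) AS trips, SUM(amount) AS revenue
        FROM trips
        WHERE amount > 0
        GROUP BY pickup_date
    """).show(5)

    spark.stop()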
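For Week 6, a bare-bones producer/consumer pair using the kafka-python client. It serializes JSON for brevity; the roadmap's Avro schemas would instead go through a schema registry. The broker address and topic name are assumptions.

    import json

    from kafka import KafkaConsumer, KafkaProducer

    # Produce one event (JSON here; Avro plus a schema registry in the real labs).
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("rides", {"trip_id": 1, "amount": 12.5})
    producer.flush()

    # Consume it back.
    consumer = KafkaConsumer(
        "rides",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        consumer_timeout_ms=5000,  # stop iterating after 5s of silence
    )
    for message in consumer:
        print(message.value)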
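For Week 7, a tiny validation sketch using Great Expectations' older pandas-dataset API (recent releases moved to a context-based API, so treat this as a sketch of the idea rather than the current entry point). The sample data and thresholds are invented.

    import great_expectations as ge
    import pandas as pd

    raw = pd.DataFrame({
        "trip_id": [1, 2, 3],
        "amount": [12.5, 7.0, -1.0],  # the negative fare should fail validation
    })

    df = ge.from_pandas(raw)
    df.expect_column_values_to_not_be_null("trip_id")
    df.expect_column_values_to_be_between("amount", min_value=0, max_value=500)

    results = df.validate()
    print("success:", results.success)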