Data Engineering Roadmap

The document outlines a 10-week training program focused on data engineering, covering topics such as setting up Postgres and Airflow, data ingestion, analytics engineering, batch and streaming processing, data quality, and orchestration. Each week has specific objectives, including hands-on labs and a capstone project to apply learned skills. The program emphasizes the use of tools like Docker, dbt, Spark, and Great Expectations for data management and validation.

Week 1: Introduction and Prerequisites
- Running Postgres locally with Docker
- Setting up Airflow locally
- Setting up Snowflake Cloud Data Warehouse
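
As a first hands-on step for Week 1, the sketch below checks that the Dockerized Postgres is reachable from Python. The container options, password, and port are assumptions, not part of the roadmap.

```python
# Quick connectivity check against the Dockerized Postgres.
# Assumes the container was started roughly like:
#   docker run -d --name pg -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres:16
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user="postgres",
    password="secret",
    dbname="postgres",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```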

Week 2: Data Ingestion
- Ingesting data to AWS with Airflow
- Ingesting data to local Postgres with Airflow
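
A minimal sketch of the Week 2 ingestion pattern: an Airflow DAG (Airflow 2.4+ syntax assumed) that loads a CSV into the local Postgres. The file path, connection string, and table name are placeholders.

```python
# Sketch of a daily Airflow DAG that loads a local CSV into Postgres.
# File path, connection string, and table name are placeholders.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from sqlalchemy import create_engine


def load_csv_to_postgres():
    df = pd.read_csv("/data/trips.csv")  # hypothetical input file
    engine = create_engine("postgresql://postgres:secret@localhost:5432/postgres")
    df.to_sql("trips", engine, if_exists="append", index=False)


with DAG(
    dag_id="ingest_trips",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="load_csv", python_callable=load_csv_to_postgres)
```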

Week 3: Data Warehouse
- Partitioning and Clustering
- Best practices
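
Week 3's clustering topic can be tried on the Snowflake warehouse set up in Week 1. The sketch below assumes the snowflake-connector-python package; the account, credentials, table, and clustering keys are all placeholders.

```python
# Sketch: create a clustered table in Snowflake from Python.
# Account, credentials, and the table definition are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="***",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS trips (
        trip_id     NUMBER,
        pickup_date DATE,
        pickup_zone VARCHAR
    )
    CLUSTER BY (pickup_date, pickup_zone)
""")
cur.close()
conn.close()
```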

Week 4: Analytics Engineering
- Postgres and dbt
- dbt models
- Testing and documenting
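
Week 4's dbt work is normally driven from the command line; the sketch below simply shells out to the dbt CLI from Python to run and test one model. The project directory and model name (stg_trips) are placeholders.

```python
# Sketch: run and test one dbt model by shelling out to the dbt CLI.
# The project directory and model name are placeholders.
import subprocess

PROJECT_DIR = "/path/to/dbt_project"

for command in (["run", "--select", "stg_trips"], ["test", "--select", "stg_trips"]):
    result = subprocess.run(
        ["dbt", *command, "--project-dir", PROJECT_DIR],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        raise RuntimeError(f"dbt {command[0]} failed:\n{result.stderr}")
```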

Week 5: Batch Processing
- What is Spark
- Spark DataFrames
- Spark SQL
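
For Week 5, a short PySpark sketch showing the same aggregation expressed once with the DataFrame API and once with Spark SQL. The input path and column names are placeholders.

```python
# Sketch: Spark DataFrame API vs Spark SQL for a simple aggregation.
# Input path and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("week5_batch").getOrCreate()

df = spark.read.option("header", True).csv("/data/trips.csv")  # hypothetical file
daily = df.groupBy("pickup_date").agg(F.count("*").alias("trips"))

# The same aggregation expressed with Spark SQL
df.createOrReplaceTempView("trips")
daily_sql = spark.sql(
    "SELECT pickup_date, COUNT(*) AS trips FROM trips GROUP BY pickup_date"
)

daily.show(5)
daily_sql.show(5)
spark.stop()
```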

Week 6: Streaming Processing
- Schemas (Avro)
- Kafka Streams
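
For Week 6, a sketch that reads a Kafka topic with Spark Structured Streaming. It assumes a local broker, a hypothetical "trips" topic, JSON rather than Avro messages to keep it short, and that the spark-sql-kafka package is available to Spark.

```python
# Sketch: consume a Kafka topic with Spark Structured Streaming.
# Broker address, topic name, and message schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.appName("week6_streaming").getOrCreate()

schema = StructType([
    StructField("trip_id", IntegerType()),
    StructField("pickup_zone", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "trips")
    .load()
)

# Kafka values arrive as bytes; decode and parse the JSON payload.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Print parsed events to the console until interrupted.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```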

Week 7: Data Quality
- Data validation with Great Expectations and Deequ
- Anomaly detection and incremental validation with Deequ
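
A sketch of the Week 7 validation idea using the legacy great_expectations Pandas API (newer GX releases use a different, context-based API); the columns and bounds are illustrative only.

```python
# Sketch: quick data validation with the legacy Great Expectations Pandas API.
# Column names and value bounds are placeholders.
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({"trip_id": [1, 2, 3], "fare": [7.5, 12.0, None]})
dataset = ge.from_pandas(df)

dataset.expect_column_values_to_not_be_null("trip_id")
dataset.expect_column_values_to_be_between("fare", min_value=0, max_value=500)

# Validate all recorded expectations and print the result summary.
results = dataset.validate()
print(results)
```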

Week 8: Orchestration and Automation
- Pipeline orchestration benefits
- Creating data lineage
- Event-based vs time-based; business-driven vs data-driven
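
Week 8's event-based vs time-based contrast can be seen in Airflow's data-aware scheduling (Datasets, Airflow 2.4+): one DAG runs on a clock and declares that it updates a dataset, the other runs only when that dataset changes. The dataset URI and task bodies below are placeholders.

```python
# Sketch: time-based vs data-driven scheduling in Airflow (requires 2.4+).
# The Dataset URI and task logic are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.python import PythonOperator

trips_table = Dataset("postgres://warehouse/public/trips")  # hypothetical dataset URI

# Time-based: runs every night and declares that it updates the dataset.
with DAG(
    dag_id="load_trips",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as producer:
    PythonOperator(
        task_id="load",
        python_callable=lambda: print("load trips"),
        outlets=[trips_table],
    )

# Data-driven: runs whenever the dataset above is updated, not on a clock.
with DAG(
    dag_id="refresh_marts",
    start_date=datetime(2024, 1, 1),
    schedule=[trips_table],
    catchup=False,
) as consumer:
    PythonOperator(task_id="rebuild", python_callable=lambda: print("rebuild marts"))
```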

Week 9: Capstone Project
- Week 9: working on your project
- Week 10 (extra): reviewing your peers

Labs
- Python function and 3 DDL for 3 normal form tables
- Forward and backward data format
- Sample end-to-end data pipeline
- Setup Docker
- Setup MinIO for datalake (see the sketch after this list)
- Collect data from API and database
- Build pipeline to load data from datalake to data warehouse with idempotent pattern
- Schedule dbt pipeline with Airflow (Astronomer)
- Processing large data with Spark
- Connect BI tool (Google Data Studio / Metabase) with data
- Trigger and schedule Spark job
- Setup schema registry and ML validation
- Apply Spark job to process pipeline
- Analyze real-time data
- Implement DataOps with dbt and scheduling with Airflow
- Data quality with Great Expectations
- Research data lineage
- Design data model for logging and lineage
- To be defined
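
For the "Setup MinIO for datalake" lab, a sketch of landing a collected file in a MinIO bucket through its S3-compatible API; the endpoint, credentials, bucket, and object key are assumptions.

```python
# Sketch: upload a raw file to a MinIO bucket via its S3-compatible API.
# Endpoint, credentials, bucket, and object key are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # MinIO endpoint
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

bucket = "datalake"
existing = [b["Name"] for b in s3.list_buckets().get("Buckets", [])]
if bucket not in existing:
    s3.create_bucket(Bucket=bucket)

# Land the raw file under a date-partitioned prefix in the datalake.
s3.upload_file("/data/trips.csv", bucket, "raw/trips/2024-01-01/trips.csv")
```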
