Lab_01 - Data Engineering Practice
Resources:
• SQLite Quickstart
Practice Steps:
Install PostgreSQL or SQLite.
Use Pandas to read the dataset.
Write a Python script to insert data into the database.
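The three steps above can be sketched in Python with the standard-library `sqlite3` module and Pandas. This is a minimal sketch, not the lab's reference solution. The sample rows, the `orders` table name and the helpers `load_csv` and `insert_into_db` are hypothetical, and an in-memory database stands in for a file. For PostgreSQL, `DataFrame.to_sql` expects a SQLAlchemy engine (for example one created with `sqlalchemy.create_engine`) instead of a `sqlite3` connection.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample rows standing in for the lab dataset (e.g. a CSV from Kaggle).
SAMPLE_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,5.50
3,alice,42.00
"""


def load_csv(source) -> pd.DataFrame:
    """Read the dataset with Pandas; `source` can be a file path or a file-like object."""
    return pd.read_csv(source)


def insert_into_db(df: pd.DataFrame, conn: sqlite3.Connection, table: str = "orders") -> int:
    """Write `df` into `table`, replacing any earlier contents, and return the stored row count."""
    df.to_sql(table, conn, if_exists="replace", index=False)
    conn.commit()
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return count


if __name__ == "__main__":
    frame = load_csv(io.StringIO(SAMPLE_CSV))  # replace with a real path, e.g. "data.csv"
    conn = sqlite3.connect(":memory:")  # use a file such as "lab01.db" to keep the data
    print(insert_into_db(frame, conn), "rows inserted")
    conn.close()
```

`if_exists="replace"` drops and recreates the table on every run, so repeated practice runs stay idempotent; use `"append"` if new batches should accumulate instead.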
2: Data Processing & Transformation
Task 3: Transform Data Using Pandas & SQL
Resources:
Practice Steps:
Write SQL queries to clean the data.
Perform aggregations using Pandas.
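One way to divide the work is to let SQL fix row-level problems (stray whitespace, inconsistent casing, missing values, duplicates) and let Pandas compute the aggregates. The sketch below assumes a hypothetical `raw_orders` table filled with made-up rows; the cleaning rules and the helpers `build_raw_table`, `clean_with_sql` and `aggregate` are illustrative choices, not requirements of the lab.

```python
import sqlite3

import pandas as pd

# Hypothetical raw rows with typical problems: whitespace, mixed case,
# a duplicated order and a missing amount.
RAW_ROWS = [
    (1, " Alice ", 19.99),
    (2, "bob", 5.50),
    (2, "bob", 5.50),
    (3, "ALICE", 42.00),
    (4, "carol", None),
]

CLEANING_SQL = """
-- Normalise customer names.
UPDATE raw_orders SET customer = LOWER(TRIM(customer));
-- Drop rows without an amount.
DELETE FROM raw_orders WHERE amount IS NULL;
-- Keep a single copy of each duplicated row.
CREATE TABLE clean_orders AS
    SELECT DISTINCT order_id, customer, amount FROM raw_orders;
"""


def build_raw_table(conn: sqlite3.Connection) -> None:
    """Create and fill the hypothetical raw_orders table."""
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)


def clean_with_sql(conn: sqlite3.Connection) -> pd.DataFrame:
    """Run the cleaning statements, then load the cleaned table into Pandas."""
    conn.executescript(CLEANING_SQL)
    return pd.read_sql_query("SELECT * FROM clean_orders ORDER BY order_id", conn)


def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Count orders and total the amount per customer (groupby sorts by customer)."""
    return df.groupby("customer", as_index=False).agg(
        orders=("order_id", "count"),
        total_amount=("amount", "sum"),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_raw_table(conn)
    print(aggregate(clean_with_sql(conn)))
    conn.close()
```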
3: Data Orchestration with Apache Airflow
Task 4: Automate Data Processing with Airflow
Resources:
Practice Steps:
Install Airflow and configure it.
Write a DAG to automate data ingestion & transformation.
Schedule the DAG to run at a fixed interval, e.g. every 5 minutes or every hour.
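A DAG for these steps might look like the sketch below, assuming Airflow 2.4 or later (where the `schedule` argument replaced `schedule_interval`) and a SQLite database like the one from the ingestion step. The file paths, DAG id, table names and the `ingest`/`transform` callables are hypothetical. The `try`/`except ImportError` guard exists only so the two callables can be imported and tested on a machine without Airflow; in a file placed in your DAGs folder you would normally import Airflow at the top.

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical locations; point these at your own dataset and database.
CSV_PATH = "/tmp/lab01/orders.csv"
DB_PATH = "/tmp/lab01/lab01.db"


def ingest(csv_path: str, db_path: str) -> int:
    """Load the CSV into raw_orders (replacing it) and return the number of rows read."""
    df = pd.read_csv(csv_path)
    with closing(sqlite3.connect(db_path)) as conn:
        df.to_sql("raw_orders", conn, if_exists="replace", index=False)
        conn.commit()
    return len(df)


def transform(db_path: str) -> int:
    """Rebuild customer_totals from raw_orders and return the number of customers."""
    with closing(sqlite3.connect(db_path)) as conn:
        df = pd.read_sql_query(
            "SELECT customer, amount FROM raw_orders WHERE amount IS NOT NULL", conn
        )
        df["customer"] = df["customer"].str.strip().str.lower()
        totals = df.groupby("customer", as_index=False)["amount"].sum()
        totals.to_sql("customer_totals", conn, if_exists="replace", index=False)
        conn.commit()
    return len(totals)


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:  # lets ingest/transform be tested without Airflow installed
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="lab01_ingest_transform",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=timedelta(minutes=5),  # or "@hourly" to run every hour
        catchup=False,  # don't backfill every interval since start_date
    ) as dag:
        ingest_task = PythonOperator(
            task_id="ingest",
            python_callable=ingest,
            op_kwargs={"csv_path": CSV_PATH, "db_path": DB_PATH},
        )
        transform_task = PythonOperator(
            task_id="transform",
            python_callable=transform,
            op_kwargs={"db_path": DB_PATH},
        )
        ingest_task >> transform_task  # transform runs only after ingest succeeds
```

To run hourly instead, pass `schedule="@hourly"`; a cron string such as `"*/5 * * * *"` also runs every five minutes, aligned to the clock.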
Additional Resources for Downloading Notebooks & Datasets
Open Datasets
1. Kaggle – https://www.kaggle.com/datasets
4. Apache Airflow Examples – https://github.com/apache/airflow/tree/main/airflow/example_dags
📌 What's Next?
If you have more time, try these:
Deploy your pipeline on the cloud (AWS/GCP/Azure).
Use Kafka for real-time data ingestion.
Implement a Feature Store with Feast.