Data Engineering

This document provides a comprehensive guide for learning computer science fundamentals and data engineering skills. It includes 12 sections covering topics like Python, SQL, Linux, big data, data warehousing, batch processing, streaming, data orchestration, cloud computing, and the modern data stack. Each section lists relevant courses, tutorials, books, and hands-on projects for skills development. Overall, the guide outlines a complete curriculum and resources for becoming a well-rounded data engineer.

Uploaded by Mohsin Khan
Copyright: © All Rights Reserved

1. Computer Science Fundamentals (if you don’t have a CS background)

Start here if you don’t have a computer science background. As a data engineer, you need a good
grasp of CS fundamentals to understand large systems and how they work.

These videos will give you a basic understanding of CS fundamentals. You can watch the first
7 lectures from the playlist.

1. CS50 2022
2. Book - Grokking Algorithms: An illustrated guide

2. Programming Language

Take any of these courses. Your main goal here is to learn how to write basic Python
code and how to work with different datasets!

1. Darshil - Python for Data Engineering (Recommended)
2. DataCamp - Data Engineering With Python
3. Coursera - Python for Everybody Specialization (do this if you don’t know anything about Python)
4. Udemy - Python Bootcamps: Learn Python Programming and Code Training
5. freeCodeCamp - Learn Python - Full Course for Beginners

Practice Projects:

● Scrape data using the BeautifulSoup library, e.g. from Amazon, Covid, Wikipedia, or any
website you like
● Build a calculator using Python
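
As a starting point for the calculator project, here is a minimal sketch using only the standard library. The function names (`calculate`, `evaluate`) and the space-separated input format are assumptions made for illustration, not part of any course listed above.

```python
import operator

# Map each supported symbol to the function that implements it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(left: float, symbol: str, right: float) -> float:
    """Apply one arithmetic operation, e.g. calculate(6, "/", 3)."""
    if symbol not in OPERATIONS:
        raise ValueError(f"Unsupported operator: {symbol!r}")
    if symbol == "/" and right == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return OPERATIONS[symbol](left, right)


def evaluate(expression: str) -> float:
    """Evaluate a space-separated expression such as '6 / 3'."""
    left, symbol, right = expression.split()
    return calculate(float(left), symbol, float(right))


if __name__ == "__main__":
    print(evaluate("6 / 3"))
    print(evaluate("2.5 * 4"))
```

From here you could extend it with a loop that reads expressions from `input()`, or add more operators to the `OPERATIONS` table.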

3. SQL (Structured Query Language)

Learn the basics of SQL and how to write queries. Once you complete a course, make sure
you do hands-on practice on HackerRank or any website you like!

1. Udemy - The Complete SQL Bootcamp for the Manipulation and Analysis of Data (Recommended)
2. Coursera - SQL for Data Science
3. DataCamp - Intro to SQL

Practice SQL here

● Hackerrank SQL
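
If you want to practice queries locally before moving to a practice site, one option is Python's built-in `sqlite3` module. The sketch below is only an illustration: the `orders` table and its rows are made up, and the query shows a typical `GROUP BY` / `HAVING` pattern.

```python
import sqlite3

# In-memory database; the table and rows are invented for practice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# Total spend per customer, keeping only customers above a threshold.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Try changing the threshold in `HAVING` or adding a `WHERE` clause to see how filtering before and after aggregation differs.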

4. Basics of Linux

Why Linux? You will be working with many remote machines, connecting to them over SSH and
running operations on them, so it’s important to learn the basic commands.

You don’t have to memorize every command; just understand what they do and how to
write them.

1. Udemy - Linux for Beginners: Linux Basics
2. Coursera - Linux Fundamentals
3. freeCodeCamp - Top 50 Most Popular Linux Commands (Recommended)

Do Hands-On Project

● Beginner Data Engineering Portfolio Project (Recommended)
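
Since you already know some Python by this point, it can help to see what a few common commands do in terms you know. The sketch below pairs each command (in the comment) with a rough standard-library equivalent. The file names and log contents are made up, and the equivalents are approximate rather than exact.

```python
import shutil
import tempfile
from pathlib import Path

# Scratch directory so the demo does not touch real files.
workdir = Path(tempfile.mkdtemp())

# mkdir -p logs/2024
(workdir / "logs" / "2024").mkdir(parents=True)

# echo "..." > logs/2024/app.log
log_file = workdir / "logs" / "2024" / "app.log"
log_file.write_text("INFO start\nERROR disk full\nINFO done\n")

# ls logs/2024
listing = sorted(p.name for p in (workdir / "logs" / "2024").iterdir())

# grep ERROR logs/2024/app.log
error_lines = [
    line for line in log_file.read_text().splitlines() if "ERROR" in line
]

# wc -l logs/2024/app.log
line_count = len(log_file.read_text().splitlines())

# rm -r <scratch directory>
shutil.rmtree(workdir)

print(listing, error_lines, line_count)
```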


5. Big Data Fundamentals

This section is theoretical: you need to understand how big data systems work and
their history.

1. Coursera - Big Data Specialization (Recommended)
2. Udemy - Learn Big Data: The Hadoop Ecosystem Masterclass (do this if you want to learn about legacy systems)
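
To make the theory a little more concrete, here is a toy, single-machine sketch of the map, shuffle, and reduce phases that Hadoop's MapReduce model is built around. It is only an illustration of the idea: a real cluster spreads these phases across many machines, and the function names here are assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data systems scale out"])
print(counts)
```

The point of splitting the work this way is that each map call and each reduce call is independent, so they can run in parallel on different machines.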

6. Data Warehouse Fundamentals + Tool

Learn the fundamentals, then learn one tool such as Snowflake, BigQuery, or Redshift. Just
learn one and you are good!

1. Fundamentals
   1. Coursera - Data Warehousing for Business Intelligence Specialization (recommended for deep dive)
   2. Udemy - Data Warehouse Fundamentals for Beginners (recommended for quick learning)
2. Tools
   1. Snowflake - Snowflake – The Complete Masterclass
   2. Snowflake Doc - https://www.snowflake.com/certifications/
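
One fundamental you will meet early is the star schema: a fact table of measurements joined to dimension tables that describe them. The sketch below uses SQLite only as a stand-in so it runs anywhere; it is not Snowflake, BigQuery, or Redshift syntax, and the table names and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive attributes of each product.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category TEXT
    );
    -- Fact table: one row per sale, pointing at the dimension.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'laptop', 'electronics'),
        (2, 'phone', 'electronics'),
        (3, 'desk', 'furniture');
    INSERT INTO fact_sales (product_id, quantity, revenue) VALUES
        (1, 1, 900.0), (2, 2, 1200.0), (3, 1, 150.0), (1, 1, 950.0);
    """
)

# Typical warehouse query: aggregate facts, sliced by a dimension attribute.
revenue_by_category = conn.execute(
    """
    SELECT d.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
print(revenue_by_category)
conn.close()
```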

7. Learn Batch Processing + Tool

1. Spark Fundamentals
   1. DataCamp - Big Data Fundamentals with PySpark (recommended)
   2. Udemy - Spark and Python for Big Data with PySpark
2. Databricks
   1. Udemy - Azure Databricks & Spark Core
   2. Udemy - Databricks Certified Data Engineer Associate
   3. Coursera - Databricks for Data Engineering

8. Learn Real-Time Streaming

1. Real-Time Streaming (Kafka)
   1. Udemy - Apache Kafka Course for Beginners: Learn Kafka Online (check this)
   2. edX - Building ETL and Data Pipelines with Bash, Airflow, and Kafka

Do Hands-On Project - Stock Market Real-Time Streaming Pipeline
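
Before wiring up Kafka for the project, it may help to see the core idea of stream processing in plain Python: handle each event as it arrives and keep only a small amount of state. The sketch below is not Kafka code. The tick format `(symbol, price)`, the symbol `ACME`, and the window size are assumptions for illustration.

```python
from collections import deque


def moving_average(ticks, window=3):
    """Yield (symbol, average of the last `window` prices) for each tick."""
    recent = {}  # symbol -> bounded queue of its most recent prices
    for symbol, price in ticks:
        prices = recent.setdefault(symbol, deque(maxlen=window))
        prices.append(price)
        yield symbol, sum(prices) / len(prices)


# A finite list stands in for an endless stream of market ticks.
ticks = [("ACME", 10.0), ("ACME", 12.0), ("ACME", 14.0), ("ACME", 16.0)]
averages = list(moving_average(iter(ticks)))
for symbol, average in averages:
    print(symbol, average)
```

Because `moving_average` is a generator, it never needs the whole stream in memory, which is the property a real streaming consumer relies on.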

9. Data Orchestration (Airflow)


1. Udemy - The Complete Hands-On Introduction to Apache Airflow
2. DataCamp - Airflow

Do Hands-On Project - Twitter Data Pipeline using Airflow
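
Orchestration comes down to running tasks in an order that respects their dependencies. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). It is not the Airflow API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

log = []  # records the order in which tasks actually ran


def make_task(name):
    """Build a placeholder task that only records that it ran."""
    def task():
        log.append(name)
    return task


# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}
tasks = {name: make_task(name) for name in dependencies}

# Resolve a valid execution order, then run the tasks one by one.
run_order = list(TopologicalSorter(dependencies).static_order())
for name in run_order:
    tasks[name]()

print(run_order)
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this dependency-ordering idea.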

10. Cloud Computing

This is an advanced section. Take the courses, then get certified to add value to your
resume. If you are new, start with AWS, but if you already know another cloud,
you can go with that one too!

1. AWS (Amazon Web Services)
   1. Udemy - Ultimate AWS Certified Cloud Practitioner
   2. Udemy - Ultimate AWS Certified Solutions Architect Associate (SAA)
   3. Coursera - AWS Solution Architect Associate
2. GCP (Google Cloud Platform)
   1. Coursera - Cloud Data Engineer Professional Certificate
3. Microsoft Azure
   1. Coursera - Microsoft Azure Data Engineering Associate
   2. Udemy - AZ-900: Microsoft Azure Fundamentals
   3. Udemy - Azure Data Engineer Certified: 8 COURSE BUNDLE

Do Hands-On Project

1. Build ETL Pipeline Using AWS Cloud
2. Covid Data Analysis Project
3. YouTube Data Analysis (End-To-End Data Engineering Project)

11. Learn Modern Data Stack

1. Learn Basics - https://analyticsindiamag.com/modern-data-stack-and-what-we-know-about-it/
2. dbt - https://www.getdbt.com/dbt-learn/
3. Airbyte - https://airbyte.com/
4. Fivetran - https://www.fivetran.com/

12. DataOps

1. Docker Guide - https://www.coursera.org/projects/docker-for-absolute-beginners
2. Udemy - Docker & Kubernetes: The Practical Guide

Recommended Books

1. Designing Data-Intensive Applications
2. Fundamentals of Data Engineering
3. The Data Warehouse Toolkit

Read Real-World Case Studies

1. Netflix - https://netflixtechblog.medium.com/
2. AWS - https://aws.amazon.com/solutions/case-studies/
3. GCP - https://cloud.google.com/customers
4. Azure - https://azure.microsoft.com/en-us/resources/customer-stories/

All the best <3
