Data Engineering
Watch this if you don’t have a computer science background. As a Data Engineer, a good
knowledge of CS fundamentals is important for understanding big systems and how they
work.
1. CS50 2022
2. Book - Grokking Algorithms: An illustrated guide
2. Programming Language
Take any course; your main goal here is to learn how to write basic Python.
Practice Projects:
● Scrape Data Using BeautifulSoup Library eg. Amazon, Covid, Wikipedia, or any
website you like
● Build A Calculator Using Python
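As a starting point for the calculator project, here is a minimal sketch using only the standard library. The names `OPERATIONS`, `calculate`, and `evaluate` are illustrative choices, and the "number operator number" input format is an assumption made to keep the example short; a fuller project might add a loop for user input or support longer expressions.

```python
import operator

# Map each supported symbol to the function that implements it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(left: float, symbol: str, right: float) -> float:
    """Apply one arithmetic operation to two numbers."""
    if symbol not in OPERATIONS:
        raise ValueError(f"Unsupported operator: {symbol!r}")
    if symbol == "/" and right == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return OPERATIONS[symbol](left, right)


def evaluate(expression: str) -> float:
    """Evaluate a string such as '6 * 7' (number, operator, number)."""
    left, symbol, right = expression.split()
    return calculate(float(left), symbol, float(right))


if __name__ == "__main__":
    print(evaluate("6 * 7"))  # 42.0
```

Keeping the operators in a dictionary means a new operation can be added with one extra entry instead of another if/elif branch.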
3. SQL (Structured Query Language)
Learn the basics of SQL and how to write queries. Once you complete a course,
make sure you do hands-on practice on HackerRank or any website you like!
1. Udemy - The Complete SQL Bootcamp for the Manipulation and Analysis of
Data (Recommended)
2. Coursera - SQL for Data Science
3. DataCamp - Intro to SQL
● Hackerrank SQL
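If you want to practice queries locally before moving to an online judge, one option is Python's built-in `sqlite3` module, which needs no server. The sketch below is only an illustration: the `orders` table, its columns, the sample rows, and the helper name `customer_totals` are all made up for this example.

```python
import sqlite3


def customer_totals(rows, minimum=20):
    """Load (customer, amount) rows into an in-memory table and
    return customers whose total spend exceeds `minimum`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)", rows
    )
    # GROUP BY aggregates per customer; HAVING filters the aggregated rows.
    result = conn.execute(
        "SELECT customer, SUM(amount) AS total "
        "FROM orders "
        "GROUP BY customer "
        "HAVING SUM(amount) > ? "
        "ORDER BY total DESC",
        (minimum,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("alice", 30.0),
        ("bob", 12.5),
        ("alice", 20.0),
        ("carol", 25.0),
    ]
    print(customer_totals(sample))  # [('alice', 50.0), ('carol', 25.0)]
```

The same SELECT, GROUP BY, HAVING, and ORDER BY clauses appear in most beginner SQL exercises, so this setup transfers directly to practice sites.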
4. Basics Of Linux
Why Linux? Because you will be working with many remote machines, connecting to them over
SSH and performing operations on them, so it’s important to learn the basic commands.
You don’t have to remember all the commands; just understand what they do and how to
write them.
This section is theoretical: you need to understand how big data systems work and their
history.
Learn the fundamentals and then learn one tool: Snowflake, BigQuery, Redshift, etc. Just
learn one and you are good!
1. Fundamentals
1. Coursera - Data Warehousing for Business Intelligence Specialization
(recommended for deep dive)
2. Udemy - Data Warehouse Fundamentals for Beginners
(recommended for quick learning)
2. Tools
1. Snowflake - Snowflake – The Complete Masterclass
2. Snowflake Doc - https://www.snowflake.com/certifications/
7. Learn Batch Processing + Tool
1. Spark Fundamentals
1. DataCamp - Big Data Fundamentals with PySpark (recommended)
2. Udemy - Spark and Python for Big Data with PySpark
2. Databricks
1. Udemy - Azure Databricks & Spark Core
2. Udemy - Databricks Certified Data Engineer Associate
3. Coursera - Databricks for Data Engineering
8. Learn RealTime Streaming
1. Realtime Streaming (Kafka)
1. Udemy - Apache Kafka Course for Beginners: Learn Kafka Online
(check this)
2. edX - Building ETL and Data Pipelines with Bash, Airflow, and Kafka
Advanced section: take courses, and then do the certification to add value to your
resume. If you are new, then start with AWS, but if you know about
Do Hands-On Project
Recommended Books
1. Netflix - https://netflixtechblog.medium.com/
2. AWS - https://aws.amazon.com/solutions/case-studies/
3. GCP - https://cloud.google.com/customers
4. Azure - https://azure.microsoft.com/en-us/resources/customer-stories/