
INDIVIDUAL LEARNERS

SCHOOL OF DATA SCIENCE

Data Engineering
with AWS
Nanodegree Program Syllabus
Overview
Learn to design data models, build data warehouses and data lakes, automate data pipelines, and manage massive datasets.

Learning Objectives
Students will learn to:

• Create user-friendly relational and NoSQL data models.

• Create scalable and efficient data warehouses.

• Work efficiently with massive datasets.

• Build and interact with a cloud-based data lake.

• Automate and monitor data pipelines.

• Develop proficiency in Spark, Airflow, and AWS tools.

Program information

Estimated Time: 4 months*

Skill Level: Intermediate

Prerequisites

It is recommended that learners have intermediate Python, intermediate SQL, and command line skills.

Required Hardware/Software

There are no software or version requirements to complete this Nanodegree program. All coursework and projects can be
completed via Student Workspaces in the Udacity online classroom. Udacity’s basic tech requirements can be found at
https://www.udacity.com/tech/requirements.

*The length of this program is an estimation of total hours the average student may take to complete all required
coursework, including lecture and project time. If you spend about 5-10 hours per week working through the program, you
should finish within the time provided. Actual hours may vary.

Course 1

Data Modeling
Learners will create relational and NoSQL data models to fit the diverse needs of data consumers. They’ll also use ETL to build
databases in Apache Cassandra.

Course Project

Data Modeling with Apache Cassandra


Model event data to create a non-relational database and ETL pipeline for a music streaming app. Learners
will define queries and tables for a database built using Apache Cassandra.
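For illustration only, here is a minimal sketch of the query-first modeling style this project practices, using the Python cassandra-driver. It assumes a local Cassandra node, and the keyspace, table, column names, and sample row are invented placeholders rather than the project's actual specification.

from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])  # assumes a Cassandra node running locally
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

# The table is designed around one query ("songs heard during a given session"),
# so session_id is the partition key and item_in_session the clustering column.
session.execute("""
    CREATE TABLE IF NOT EXISTS song_plays_by_session (
        session_id int,
        item_in_session int,
        artist text,
        song_title text,
        PRIMARY KEY (session_id, item_in_session)
    )
""")
session.execute(
    "INSERT INTO song_plays_by_session (session_id, item_in_session, artist, song_title) "
    "VALUES (%s, %s, %s, %s)",
    (338, 4, "Faithless", "Music Matters"),
)
rows = session.execute(
    "SELECT artist, song_title FROM song_plays_by_session WHERE session_id = 338"
)
for row in rows:
    print(row.artist, row.song_title)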

Lesson 1: Introduction to Data Modeling

• Understand the purpose of data modeling.

• Identify the strengths and weaknesses of different types of databases and data storage techniques.

• Create a table in Apache Cassandra.

Lesson 2: Relational Data Models

• Understand when to use a relational database.

• Understand the difference between OLAP and OLTP databases.

• Create normalized data tables.

• Implement denormalized schemas (e.g., star and snowflake schemas); a minimal sketch follows this list.
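As a rough illustration of a denormalized star design, here is a minimal, self-contained sketch using Python's built-in sqlite3 module. The table and column names are invented for the example and are not the project's schema.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one fact table surrounded by denormalized dimension tables.
cur.executescript("""
    CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT, level TEXT);
    CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY, title TEXT, artist TEXT);
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
    CREATE TABLE fact_songplay (
        songplay_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES dim_user(user_id),
        song_id INTEGER REFERENCES dim_song(song_id),
        time_id INTEGER REFERENCES dim_time(time_id),
        duration REAL
    );
""")

# Analytic queries join the fact table to whichever dimensions they need.
cur.execute("""
    SELECT u.level, COUNT(*) AS plays
    FROM fact_songplay f
    JOIN dim_user u ON u.user_id = f.user_id
    GROUP BY u.level
""")
print(cur.fetchall())
conn.close()

In a snowflake schema, the dimensions themselves would be further normalized (for example, artist split out of dim_song into its own table).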

Lesson 3: NoSQL Data Models

• Understand when to use NoSQL databases and how they differ from relational databases.

• Select the appropriate primary key and clustering columns for a given use case.

• Create a NoSQL database in Apache Cassandra.

Course 2

Cloud Data Warehouses


In this course, learners will create cloud-based data warehouses. They will sharpen their data warehousing skills, deepen their
understanding of data infrastructure, and be introduced to data engineering on the cloud using Amazon Web Services (AWS).

Course Project

Data Warehouse
In this project, learners will act as a data engineer for a streaming music service. They are tasked with
building an ELT pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of
dimensional tables for an analytics team to find insights into what songs their users are listening to.
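A minimal sketch of the load-then-transform pattern described above, assuming a reachable Redshift cluster and existing staging and dimensional tables. The host, bucket, IAM role, and table and column names are placeholders, not the project's actual configuration.

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(
    host="example-cluster.abc123.us-west-2.redshift.amazonaws.com",  # placeholder
    dbname="dev", user="awsuser", password="...", port=5439,
)
cur = conn.cursor()

# Extract + Load: COPY raw JSON events from S3 into a staging table.
cur.execute("""
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
    FORMAT AS JSON 'auto' REGION 'us-west-2';
""")

# Transform: populate a dimensional table from the staged data.
cur.execute("""
    INSERT INTO dim_users (user_id, first_name, last_name, level)
    SELECT DISTINCT userId, firstName, lastName, level
    FROM staging_events
    WHERE userId IS NOT NULL;
""")

conn.commit()
cur.close()
conn.close()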

Lesson 1: Introduction to Data Warehouses

• Explain how OLAP may support certain business users better than OLTP.

• Implement ETL for OLAP transformations with SQL.

• Describe data warehouse architecture.

• Describe OLAP cubes, from facts and dimensions to slice, dice, roll-up, and drill-down operations.

• Implement OLAP cubes from facts and dimensions to slice, dice, roll-up, and drill down (a minimal sketch follows this list).

• Compare columnar vs. row-oriented approaches.

• Implement columnar vs. row-oriented approaches.
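To make the cube operations concrete, here is a small self-contained sketch with invented data, using standard SQL through Python's sqlite3: slicing fixes one dimension value, roll-up aggregates to a coarser grain, and drill-down returns to a finer grain.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (branch TEXT, month TEXT, day INTEGER, revenue REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("north", "Jan", 1, 100.0), ("north", "Jan", 2, 150.0),
    ("south", "Jan", 1, 80.0),  ("south", "Feb", 1, 120.0),
])

# Slice: fix the branch dimension to a single value.
cur.execute("SELECT month, SUM(revenue) FROM sales WHERE branch = 'north' GROUP BY month")
print(cur.fetchall())

# Roll-up: aggregate days up to months (coarser grain).
cur.execute("SELECT branch, month, SUM(revenue) FROM sales GROUP BY branch, month")
print(cur.fetchall())

# Drill-down: return to the finer day-level grain.
cur.execute("SELECT branch, month, day, SUM(revenue) FROM sales GROUP BY branch, month, day")
print(cur.fetchall())
conn.close()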

Lesson 2: ELT and Data Warehouse Technology in the Cloud

• Explain the differences between ETL and ELT.

• Differentiate scenarios where ELT is preferred over ETL.

• Implement ETL for OLAP transformations with SQL.

• Select appropriate cloud data storage solutions.

• Select appropriate cloud pipeline solutions.

• Select appropriate cloud data warehouse solutions.

Lesson 3: AWS Data Technologies

• Describe AWS data warehouse services and technologies.

• Create and configure AWS storage resources.

• Create and configure Amazon Redshift resources.

• Implement infrastructure as code for Redshift on AWS.

Lesson 4: Implementing Data Warehouses on AWS

• Describe Redshift data warehouse architecture.

• Run an ETL process to extract data from AWS S3 into Redshift.

• Design optimized tables by selecting appropriate distribution styles and sort keys (a minimal sketch follows this list).
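As an illustration of these table-design choices, here is a sketch of Redshift-style DDL issued from Python. The connection details, table names, and the particular choice of keys are assumptions made for the example, not a recommended design for the project.

import psycopg2  # pip install psycopg2-binary

# Placeholder connection details for an existing Redshift cluster.
conn = psycopg2.connect(host="example-cluster.abc123.us-west-2.redshift.amazonaws.com",
                        dbname="dev", user="awsuser", password="...", port=5439)
cur = conn.cursor()

# Fact table: distribute on the join key and sort on the common filter column.
cur.execute("""
    CREATE TABLE IF NOT EXISTS fact_songplay (
        songplay_id BIGINT IDENTITY(0,1),
        start_time  TIMESTAMP NOT NULL SORTKEY,
        user_id     INTEGER   NOT NULL,
        song_id     VARCHAR   NOT NULL DISTKEY,
        level       VARCHAR
    ) DISTSTYLE KEY;
""")

# Small dimension table: replicate to every node to avoid shuffling on joins.
cur.execute("""
    CREATE TABLE IF NOT EXISTS dim_song (
        song_id VARCHAR PRIMARY KEY,
        title   VARCHAR,
        artist  VARCHAR
    ) DISTSTYLE ALL;
""")
conn.commit()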

Course 3

Spark & Data Lakes


Learners will build a data lake on AWS and a data catalog following the principles of data lakehouse architecture. They will
learn about the big data ecosystem and the power of Apache Spark for data wrangling and transformation. They’ll work with
AWS data tools and services to extract, load, process, query, and transform semi-structured data in data lakes.

Course Project

STEDI Human Balance Analytics


In this project, learners will act as a data engineer for the STEDI team to build a data lakehouse solution for
sensor data that trains a machine learning model. They will build an ELT (extract, load, transform) pipeline
for the lakehouse architecture: load data from an AWS S3 data lake, process it into analytics tables using
Spark and AWS Glue, and write the curated results back into the lakehouse.
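A minimal sketch of one step of such a pipeline using PySpark, which could be run as an AWS Glue job or on a Spark cluster with S3 access configured. The bucket, zone prefixes, and column name are placeholders invented for the example.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stedi_trusted_zone").getOrCreate()

# Load semi-structured JSON from the (placeholder) landing zone of the S3 data lake.
customers = spark.read.json("s3://example-stedi-lake/customer/landing/")

# Transform: keep only customers who consented to share data for research.
trusted = customers.filter(F.col("shareWithResearchAsOfDate").isNotNull())

# Load the curated result back into the lakehouse as a trusted-zone table.
trusted.write.mode("overwrite").parquet("s3://example-stedi-lake/customer/trusted/")

spark.stop()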

Lesson 1: Big Data Ecosystem, Data Lakes, & Spark

• Identify what constitutes the big data ecosystem for data engineering.

• Explain the purpose and evolution of data lakes in the big data ecosystem.

• Compare the Spark framework with the Hadoop framework.

• Identify when to use Spark and when not to use it.

• Describe the features of lakehouse architecture.

Lesson 2: Spark Essentials

• Wrangle data with Spark and functional programming to scale across distributed systems.

• Process data with Spark DataFrames and Spark SQL.

• Process data in common formats such as CSV and JSON.

• Use the Spark RDD API to wrangle data.

• Transform and filter data with Spark.

Lesson 3: Using Spark & Data Lakes in the AWS Cloud

• Use distributed data storage with Amazon S3.

• Identify properties of AWS S3 data lakes.

• Identify service options for using Spark in AWS.

• Configure AWS Glue.

• Create and run Spark jobs with AWS Glue.

• Use Spark with AWS Glue to run ELT processes on data of diverse sources, structures, and vintages in lakehouse architecture.

Lesson 4: Ingesting & Organizing Data in Lakehouse Architecture on AWS

• Create a Glue Data Catalog and Glue tables.

• Use AWS Athena for ad hoc queries in a lakehouse.

• Leverage Glue for SQL queries over AWS S3 data and for ELT.

• Ingest data into lakehouse zones.

• Transform and filter data into curated lakehouse zones with Spark and AWS Glue.

• Join and process data into lakehouse zones with Spark and AWS Glue.

Course 4

Automate Data Pipelines


In this course, learners will dive into the concept of data pipelines and how to use them to accelerate their careers as
data engineers. The course focuses on applying data pipeline concepts with Apache Airflow, an open-source tool that
originated at Airbnb. It starts by covering concepts including data validation, DAGs, and Airflow itself. Learners then
move into AWS-specific topics such as copying S3 data, connections and hooks, and Redshift Serverless. Next, they'll
explore data quality through data lineage, data pipeline schedules, and data partitioning. Finally, they'll put data
pipelines into production by extending Airflow with plugins, implementing task boundaries, and refactoring DAGs.

Course Project

Data Pipelines with Airflow


In this project, learners will act as a data engineer for a music streaming company, Sparkify, and build high-grade
data pipelines assembled from reusable tasks that can be monitored and allow easy backfills. They will move JSON logs
of user activity and JSON metadata from S3 and process them in Sparkify’s data warehouse on Amazon
Redshift. To complete the project, learners will need to create their own custom operators to perform tasks
such as staging the data, filling the data warehouse, and running checks on the data as the final step.
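To give a flavor of what such a pipeline looks like, here is a minimal Airflow 2 sketch with one hypothetical custom operator. The DAG id, connection id, table, bucket, and IAM role are placeholders, and the operator shown is an illustrative sketch rather than the project's required implementation.

from datetime import datetime

from airflow import DAG
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class StageToRedshiftOperator(BaseOperator):
    """Hypothetical operator that COPYs JSON files from S3 into a Redshift staging table."""

    def __init__(self, *, table, s3_path, redshift_conn_id="redshift", **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.s3_path = s3_path
        self.redshift_conn_id = redshift_conn_id

    def execute(self, context):
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        redshift.run(
            f"COPY {self.table} FROM '{self.s3_path}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/example-role' "  # placeholder role
            "FORMAT AS JSON 'auto';"
        )


with DAG(
    dag_id="sparkify_pipeline_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    stage_events = StageToRedshiftOperator(
        task_id="stage_events",
        table="staging_events",
        s3_path="s3://example-bucket/log_data",
    )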

Lesson 1: Data Pipelines

• Define and describe a data pipeline and its usage.

• Explain the relationship between DAGs, S3, and Redshift within a given example.

• Employ tasks as instantiated operators.

• Organize task dependencies based on logic flow.

• Apply templating in a codebase with the kwargs parameter to set runtime variables.

Lesson 2: Airflow & AWS

• Create an Airflow connection to AWS using AWS credentials.

• Create Postgres/Redshift Airflow connections.

• Leverage hooks to use connections in DAGs.

• Connect S3 to a Redshift DAG programmatically.

Lesson 3: Data Quality

• Utilize the logic flow of task dependencies to investigate potential errors within data lineage.

• Leverage Airflow catchup to backfill data.

• Extract data from a specific time range by employing the kwargs parameters.

• Create a task to ensure data quality within select tables (a minimal sketch follows this list).
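A rough sketch of what a data quality task might look like, again as a hypothetical custom operator with a placeholder connection id; the check itself (rejecting empty tables) is just one possible rule.

from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class DataQualityOperator(BaseOperator):
    """Hypothetical operator that fails the task if any listed table is empty."""

    def __init__(self, *, tables, redshift_conn_id="redshift", **kwargs):
        super().__init__(**kwargs)
        self.tables = tables
        self.redshift_conn_id = redshift_conn_id

    def execute(self, context):
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for table in self.tables:
            records = redshift.get_records(f"SELECT COUNT(*) FROM {table}")
            if not records or records[0][0] == 0:
                raise ValueError(f"Data quality check failed: {table} returned no rows")
            self.log.info("Data quality check passed for %s", table)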

Lesson 4: Production Data Pipelines

• Consolidate repeated code into operator plugins.

• Refactor a complex task into multiple tasks with separate SQL statements.

• Convert an Airflow 1 DAG into an Airflow 2 DAG.

• Construct a DAG and custom operator end-to-end.

Meet your instructors.

Amanda Moran
Developer Advocate at DataStax

Amanda is a developer advocate at DataStax, having spent the previous six years as a software engineer
working on four different distributed databases. Her passion is bridging the gap between customers and
engineering. She has degrees from the University of Washington and Santa Clara University.

Ben Goldberg
Staff Engineer at SpotHero

In his career as an engineer, Ben Goldberg has worked in fields ranging from computer vision
to natural language processing. At SpotHero, he founded and built out their data engineering
team, using Airflow as one of the key technologies.

Valerie Scarlata
Curriculum Manager at Udacity

Valerie is a curriculum manager at Udacity who has developed and taught a broad range of
computing curriculum for several colleges and universities. She was a professor and software
engineer for over 10 years specializing in web, mobile, voice assistant, and social full-stack
application development.

Matt Swaffer
Solutions Architect

Matt is a software and solutions architect focusing on data science and analytics for managed
business solutions. In addition, Matt is an adjunct lecturer, teaching courses in the computer
information systems department at the University of Northern Colorado where he received his
PhD in educational psychology.

Sean Murdock

Professor at Brigham Young University Idaho

Sean currently teaches cybersecurity and DevOps courses at Brigham Young University Idaho. He
has been a software engineer for over 16 years. Some of the most exciting projects he has worked
on involved data pipelines for DNA processing and vehicle telematics.

Udacity’s learning
experience

Hands-on Projects

Open-ended, experiential projects are designed to reflect actual workplace challenges. They aren’t just multiple-choice questions or step-by-step guides, but instead require critical thinking.

Quizzes

Auto-graded quizzes strengthen comprehension. Learners can return to lessons at any time during the course to refresh concepts.

Knowledge

Find answers to your questions with Knowledge, our proprietary wiki. Search questions asked by other students, connect with technical mentors, and discover how to solve the challenges that you encounter.

Custom Study Plans

Create a personalized study plan that fits your individual needs. Utilize this plan to keep track of movement toward your overall goal.

Workspaces

See your code in action. Check the output and quality of your code by running it on interactive workspaces that are integrated into the platform.

Progress Tracker

Take advantage of milestone reminders to stay on schedule and complete your program.

Our proven approach for building
job-ready digital skills.
Experienced Project Reviewers

Verify skills mastery.


• Personalized project feedback and critique includes line-by-line code review from
skilled practitioners with an average turnaround time of 1.1 hours.

• Project review cycle creates a feedback loop with multiple opportunities for
improvement—until the concept is mastered.

• Project reviewers leverage industry best practices and provide pro tips.

Technical Mentor Support

24/7 support unblocks learning.


• Learning accelerates as skilled mentors identify areas of achievement and potential
for growth.

• Unlimited access to mentors means help arrives when it’s needed most.

• 2 hr or less average question response time assures that skills development stays on track.

Personal Career Services

Empower job-readiness.
• Access to a GitHub portfolio review that can give you an edge by highlighting your
strengths and demonstrating your value to employers.*

• Get help optimizing your LinkedIn and establishing your personal brand so your profile
ranks higher in searches by recruiters and hiring managers.

Mentor Network

Highly vetted for effectiveness.


• Mentors must complete a 5-step hiring process to join Udacity’s selective network.

• After passing an objective and situational assessment, mentors must demonstrate
communication and behavioral fit for a mentorship role.

• Mentors work across more than 30 different industries and often complete a Nanodegree
program themselves.

*Applies to select Nanodegree programs only.

Learn more at
www.udacity.com/online-learning-for-individuals →

11.28.22 | V1.0
