School of Data Science

Data Engineering with AWS
Nanodegree Program Syllabus
Overview
Learn to design data models, build data warehouses and data lakes, automate data pipelines, and manage massive datasets.
Learning Objectives
Students will learn to design data models, build data warehouses and data lakes, automate data pipelines, and manage massive datasets.
Estimated duration: 4 months* • Skill level: Intermediate
Prerequisites
It is recommended that learners have intermediate Python, intermediate SQL, and command line skills.
Required Hardware/Software
There are no specific software or version requirements to complete this Nanodegree program. All coursework and projects can be completed via Student Workspaces in the Udacity online classroom. Udacity's basic tech requirements can be found at https://www.udacity.com/tech/requirements.
*The length of this program is an estimate of the total hours the average student may take to complete all required coursework, including lecture and project time. If you spend about 5-10 hours per week working through the program, you should finish within the time provided. Actual hours may vary.
Data Modeling
Learners will create relational and NoSQL data models to fit the diverse needs of data consumers. They’ll also use ETL to build
databases in Apache Cassandra.
NoSQL Data Models
• Select the appropriate primary key and clustering columns for a given use case.
• Create a NoSQL database in Apache Cassandra.
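To make these objectives concrete, here is a minimal sketch of a Cassandra table design for a music-history use case. The table, keyspace, and column names are illustrative only, not part of the course materials; the key idea is that the partition key is chosen from the query ("what did this user play?") and the clustering column orders rows within each partition.

```python
# Hypothetical CQL table for per-user listening history.
# user_id is the partition key (one partition per user); played_at is a
# clustering column, so rows within a partition are stored newest-first
# and time-range queries per user are efficient.
CREATE_SONG_PLAYS = """
CREATE TABLE IF NOT EXISTS song_plays (
    user_id    int,
    played_at  timestamp,
    song_title text,
    artist     text,
    PRIMARY KEY ((user_id), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
"""

# Executing the statement requires a running cluster and the
# cassandra-driver package, e.g.:
# from cassandra.cluster import Cluster
# session = Cluster(["127.0.0.1"]).connect("music_keyspace")
# session.execute(CREATE_SONG_PLAYS)
```

Note the design trade-off: unlike a relational schema, the table is shaped around one query pattern, and a different query (say, by artist) would get its own denormalized table.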
Data Warehouse

Course Project: In this project, learners will act as data engineers for a streaming music service. They are tasked with building an ELT pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for an analytics team to find insights into what songs their users are listening to.
• Explain how OLAP may support certain business users better than OLTP.
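As a sketch of the stage-then-transform pattern this kind of ELT pipeline uses: one statement bulk-loads raw S3 data into a staging table, and a second reshapes it into a dimensional fact table. All table names, the bucket path, and the IAM role ARN below are illustrative placeholders, not the project's actual schema.

```python
# Hypothetical Redshift COPY: bulk-load raw JSON event logs from S3
# into a staging table (placeholder bucket and IAM role).
STAGE_EVENTS = """
COPY staging_events
FROM 's3://example-bucket/log_data'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
FORMAT AS JSON 'auto'
REGION 'us-west-2';
"""

# Transform step: join staged events to staged song metadata and load
# a dimensional fact table the analytics team can query.
LOAD_SONGPLAYS = """
INSERT INTO songplays (start_time, user_id, song_id, artist_id)
SELECT e.ts, e.user_id, s.song_id, s.artist_id
FROM staging_events e
JOIN staging_songs s
  ON e.song_title = s.title AND e.artist_name = s.artist_name;
"""

# Running these requires a live Redshift cluster, e.g. with psycopg2:
# import psycopg2
# with psycopg2.connect(host=..., dbname=..., user=..., password=...) as conn:
#     with conn.cursor() as cur:
#         cur.execute(STAGE_EVENTS)
#         cur.execute(LOAD_SONGPLAYS)
```

Staging first (rather than transforming in flight) is what makes this ELT rather than ETL: the heavy transformation runs inside the warehouse, where it can use Redshift's parallelism.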
Spark and Data Lakes
• Identify what constitutes the big data ecosystem for data engineering.
• Explain the purpose and evolution of data lakes in the big data ecosystem.
• Use Spark with AWS Glue to run ELT processes on data of diverse sources, structures, and vintages in a lakehouse architecture.
• Transform and filter data into curated lakehouse zones with Spark and AWS Glue.
• Join and process data into lakehouse zones with Spark and AWS Glue.
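The transform-and-filter objectives above might look like the following minimal job sketch. The S3 paths and column names are hypothetical, and the code assumes a Spark runtime such as an AWS Glue job; pyspark is imported inside the function so the sketch can be read without Spark installed.

```python
def run_curated_zone_job(raw_path: str, curated_path: str) -> None:
    """Promote raw landing-zone records into a curated lakehouse zone.

    Hypothetical sketch: reads raw JSON events, keeps only consented and
    de-duplicated records, stamps an ingest date, and writes Parquet to
    the curated zone. Requires a Spark runtime (e.g. an AWS Glue job).
    """
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("curated_zone").getOrCreate()

    raw = spark.read.json(raw_path)
    curated = (
        raw
        .filter(F.col("user_consent") == True)   # curated zone: trusted rows only
        .dropDuplicates(["event_id"])            # remove replayed events
        .withColumn("ingest_date", F.current_date())
    )
    # Parquet (columnar) is the usual choice for curated lakehouse zones.
    curated.write.mode("overwrite").parquet(curated_path)

# Inside a Glue job you would call, for example:
# run_curated_zone_job("s3://example-lake/landing/events/",
#                      "s3://example-lake/curated/events/")
```

A subsequent job would join curated tables (for instance events with user dimensions) into an analytics-ready zone, following the same read-transform-write shape.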
Automate Data Pipelines
• Explain the relationship between DAGs, S3, and Redshift within a given example.
• Utilize the logic flow of task dependencies to investigate potential errors within data lineage.
• Leverage Airflow catchup to backfill data.
• Extract data from a specific time range by employing the kwargs parameters.
• Refactor a complex task into multiple tasks with separate SQL statements.
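As an illustration of the catchup and kwargs objectives: the extraction callable below is plain Python, while the Airflow wiring (which needs apache-airflow installed) is shown as comments. All names are hypothetical. When `catchup=True`, the scheduler creates one run per missed interval since `start_date`, and Airflow passes each run's interval bounds into the callable's keyword arguments, so every backfill run processes exactly one historical window.

```python
from datetime import datetime

def events_in_range(events, data_interval_start, data_interval_end, **kwargs):
    """Keep only events whose timestamp falls inside one data interval.

    When invoked by Airflow's PythonOperator, data_interval_start and
    data_interval_end arrive via the task context kwargs, so the same
    function serves both scheduled runs and catchup backfills.
    """
    return [e for e in events
            if data_interval_start <= e["ts"] < data_interval_end]

# Hypothetical DAG wiring (comments only; requires apache-airflow):
# from airflow import DAG
# from airflow.operators.python import PythonOperator
# with DAG("backfill_events", start_date=datetime(2023, 1, 1),
#          schedule="@daily", catchup=True) as dag:
#     extract = PythonOperator(task_id="extract",
#                              python_callable=events_in_range,
#                              op_kwargs={"events": []})
```

The half-open bound (`<` on the end of the interval) matters: it keeps adjacent daily windows from double-counting events that land exactly on the boundary.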
Amanda Moran
Developer Advocate at DataStax
Amanda is a developer advocate for DataStax after spending the last 6 years as a software engineer
on 4 different distributed databases. Her passion is bridging the gap between customers and
engineering. She has degrees from the University of Washington and Santa Clara University.
Ben Goldberg
Staff Engineer at SpotHero
In his career as an engineer, Ben Goldberg has worked in fields ranging from computer vision
to natural language processing. At SpotHero, he founded and built out their data engineering
team, using Airflow as one of the key technologies.
Valerie Scarlata
Curriculum Manager at Udacity
Valerie is a curriculum manager at Udacity who has developed and taught a broad range of computing curricula for several colleges and universities. She was a professor and software engineer for over 10 years, specializing in web, mobile, voice assistant, and social full-stack application development.
Matt Swaffer
Solutions Architect
Matt is a software and solutions architect focusing on data science and analytics for managed
business solutions. In addition, Matt is an adjunct lecturer, teaching courses in the computer
information systems department at the University of Northern Colorado where he received his
PhD in educational psychology.
Sean currently teaches cybersecurity and DevOps courses at Brigham Young University Idaho. He
has been a software engineer for over 16 years. Some of the most exciting projects he has worked
on involved data pipelines for DNA processing and vehicle telematics.
• A project review cycle creates a feedback loop with multiple opportunities for improvement, until the concept is mastered.
• Project reviewers leverage industry best practices and provide pro tips.
• Unlimited access to mentors means help arrives when it’s needed most.
• 2 hr or less average question response time assures that skills development stays on track.
Empower job-readiness.
• Access to a Github portfolio review that can give you an edge by highlighting your
strengths, and demonstrating your value to employers.*
• Get help optimizing your LinkedIn and establishing your personal brand so your profile
ranks higher in searches by recruiters and hiring managers.
Mentor Network
• Mentors work across more than 30 different industries and have often completed a Nanodegree program themselves.
11.28.22 | V1.0