
20CS11Q3

This document outlines a course on modern data engineering. The course is divided into five units covering data lake architectures, data engineering tools on Microsoft Azure, data pipelines, the bronze and silver layers for data collection and curation, Delta Lake tables, and the gold layer for data aggregation. By the end of the course, students will be able to understand data lakes, explain data engineering pipelines and services, create Delta Lake tables, develop data curation and aggregation pipelines, and verify aggregated data.

Uploaded by

sushmitha2684
Copyright
© All Rights Reserved

MODERN DATA ENGINEERING

(Job Oriented Elective)

Course Code: 20CS11Q3
L T P C
3 0 0 3
Course Outcomes: At the end of the course, students will be able to:
CO1: Understand data lake architectures and data engineering tools and services. (L2)
CO2: Explain architectures and pipelines to create data lakes. (L2)
CO3: Experiment with Delta Lake tables. (L3)
CO4: Build the data pipeline for the data curation stage. (L2)
CO5: Develop the gold layer for data aggregation to meet customer expectations. (L3)

UNIT-I: (10 Lectures)


Discovering Storage and Compute Data Lakes: Introducing data lakes, Discovering data lake
architectures, Data warehouses vs. data lakes.
Data Engineering on Microsoft Azure: Introducing data engineering in Azure, Performing data
engineering in Microsoft Azure: Self-managed data engineering services (IaaS), Azure-managed
data engineering services (PaaS), Data processing services in Microsoft Azure, Data engineering
as a service (SaaS), Data cataloging and sharing services in Microsoft Azure; Opening a free
account with Microsoft Azure. (Chapters 2, 3)

Learning Outcomes: At the end of the module, students will be able to:

1. Explain the fundamentals of data lakes. (L2)
2. Describe data engineering in Azure. (L2)
3. Summarize data engineering services. (L2)

UNIT-II: (10 Lectures)


Understanding Data Pipelines: Exploring data pipelines, Process of creating a data pipeline,
Running a data pipeline, Sample lakehouse project.
Data Collection Stage – The Bronze Layer: Architecting the Electroniz data lake,
Understanding the bronze layer, Configuring data sources, Configuring data destinations,
Building the ingestion pipelines.

Learning Outcomes: At the end of the module, students will be able to:

1. Understand the bronze layer. (L2)
2. Describe the process of configuring data sources. (L2)
3. Explain how to build ingestion pipelines. (L2)

UNIT-III: (10 Lectures)


Understanding Delta Lake: Understanding how Delta Lake enables the lakehouse,
Understanding Delta Lake, Creating a Delta Lake table, Changing data in an existing Delta Lake
table, Performing time travel, Performing upserts of data, Understanding isolation levels,
Understanding concurrency control, Cleaning up Azure resources.

Learning Outcomes: At the end of the module, students will be able to:

1. Summarize the process of cleaning the raw data. (L2)
2. Create a Delta Lake table. (L3)
3. Illustrate isolation levels and concurrency control. (L2)
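
Illustrative sketch (not part of the official syllabus): the upsert and time-travel topics of this unit can be pictured with a small in-memory model. The class `VersionedTable` and its methods are hypothetical names invented for illustration; this is plain Python, not the Delta Lake API, and it assumes each upsert commits one new full snapshot of the table.

```python
from copy import deepcopy


class VersionedTable:
    """Toy in-memory table that keeps every committed snapshot.

    Hypothetical helper for illustration only; not the Delta Lake API.
    """

    def __init__(self, key):
        self.key = key
        self.versions = [{}]  # version 0 is the empty table

    def upsert(self, rows):
        """Update rows whose key exists, insert the rest, commit a new version."""
        snapshot = deepcopy(self.versions[-1])
        for row in rows:
            snapshot[row[self.key]] = dict(row)
        self.versions.append(snapshot)
        return len(self.versions) - 1  # number of the version just committed

    def read(self, version=None):
        """Return rows as of a given version (time travel); latest if omitted."""
        idx = len(self.versions) - 1 if version is None else version
        return sorted(self.versions[idx].values(), key=lambda r: r[self.key])


table = VersionedTable(key="id")
v1 = table.upsert([{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}])
v2 = table.upsert([{"id": 2, "city": "Mumbai"}, {"id": 3, "city": "Chennai"}])

latest = table.read()             # id 2 updated, id 3 inserted
as_of_v1 = table.read(version=v1)  # earlier state is still readable
```

Reading with `version=v1` returns the table as it stood before the second upsert, which is the idea behind time travel: older versions remain queryable after the data changes.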

UNIT-IV: (10 Lectures)


Data Curation Stage – The Silver Layer: The need for curating raw data, The process of
curating raw data, Developing a data curation pipeline, Running the pipeline for the silver layer,
Verifying curated data in the silver layer, Cleaning up Azure resources. (Chapter 7)

Learning Outcomes: At the end of the module, students will be able to:

1. Explain the need for curating the data. (L2)
2. Outline the process of curating the data. (L2)
3. Develop the data curation pipeline. (L2)

UNIT-V: (10 Lectures)


Data Aggregation Stage – The Gold Layer: The need to aggregate data, The process of
aggregating data, Developing a data aggregation pipeline, Running the aggregation pipeline,
Understanding data consumption, Verifying aggregated data in the gold layer, Meeting customer
expectations. (Chapter 8)

Learning Outcomes: At the end of the module, students will be able to:

1. Explain the need to aggregate data. (L2)
2. Build a data aggregation pipeline. (L3)
3. Interpret the verification of aggregated data in the gold layer. (L2)
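
Illustrative sketch (not part of the official syllabus): the flow from the bronze layer through curation (silver) to aggregation (gold) can be outlined in plain Python. The sample records, field names, and the functions `curate` and `aggregate` are assumptions made up for this example; a real pipeline would run on the Azure and Spark tooling covered in the units above.

```python
from collections import defaultdict

# Hypothetical raw sales events as they might land in the bronze layer.
bronze = [
    {"order_id": "1", "store": " Hyd ", "amount": "100.50"},
    {"order_id": "2", "store": "HYD", "amount": "49.50"},
    {"order_id": "2", "store": "HYD", "amount": "49.50"},  # duplicate record
    {"order_id": "3", "store": "vizag", "amount": "n/a"},  # invalid amount
    {"order_id": "4", "store": "Vizag", "amount": "20"},
]


def curate(rows):
    """Silver layer: drop duplicates and invalid rows, standardise types and casing."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue  # skip repeated order ids
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "store": row["store"].strip().upper(),
            "amount": amount,
        })
    return silver


def aggregate(rows):
    """Gold layer: total sales per store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


silver = curate(bronze)
gold = aggregate(silver)

# Verification step: gold totals should account for every curated amount.
verified = sum(gold.values()) == sum(r["amount"] for r in silver)
```

The final comparison mirrors the "verifying aggregated data" topic: the gold layer is checked against the silver layer it was built from.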

Text Books:
1. Manoj Kukreja, Data Engineering with Apache Spark, Delta Lake, and Lakehouse, Packt
Publishing, 2021.

Reference Books:
1. Scott Haines, Modern Data Engineering with Apache Spark: A Hands-On Guide for
Building Mission-Critical Streaming Applications, Apress, 2022.

Web References:
1. https://www.coursera.org/learn/introduction-to-data-engineering
2. https://www.coursera.org/professional-certificates/microsoft-azure-dp-203-data-engineering
3. https://aws.amazon.com/compare/the-difference-between-a-data-warehouse-data-lake-and-data-mart/
