
Azure Data Engineering Course Content Day Wise.

Uploaded by sumankoppula1997
Copyright © All Rights Reserved

AZURE DATA ENGINEERING – ADF & ADB

Day-1
*****

Components of data engineering, or the data engineering process…


What is on-premises?
What is cloud computing?
Different types of cloud computing.
Types of services in cloud computing.

Azure portal walkthrough.


Microsoft Entra ID (formerly Azure Active Directory).
Subscription (free trial…)
Resource group
Resources.

Day-2
*****

Create a storage account.


Deep understanding of storage accounts: Blob Storage & ADLS Gen2.
Connect via Storage Explorer.

Day-3
*****

Authentication methods.
Account key.
SAS (Shared Access Signature).
Service principal.
Managed identity.
Understanding of ACL (Access Control List).
Built-in roles, custom roles.

Day-4 & Day-5 & Day-6


************
Prerequisites for data engineering: SQL and Python.
Creating tables, SELECT, CASE, GROUP BY, joins, window functions, PIVOT, CUBE, ROLLUP, and other built-in functions.
UPDATE, DELETE, and INSERT operations.
Indexes: clustered index, non-clustered index, columnstore index.
Primary key, foreign key.
Stored procedures.
Connect via Azure Data Studio.

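You can practise the SQL topics above without an Azure database. The sketch below uses Python's built-in `sqlite3` module as a local stand-in for Azure SQL Database. The `sales` table and its columns are hypothetical. SQLite does not support T-SQL features such as PIVOT and ROLLUP, so the example covers only CASE, ranking, and windowed aggregates.

```python
import sqlite3


def ranked_sales(rows):
    """Rank reps within each region and add a per-region total using window functions.

    The 'sales' table is hypothetical; SQLite stands in for SQL Server here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT region,
               rep,
               amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn,
               SUM(amount) OVER (PARTITION BY region) AS region_total,
               CASE WHEN amount >= 100 THEN 'high' ELSE 'low' END AS band
        FROM sales
        ORDER BY region, rn
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Unlike `GROUP BY`, a window function keeps every input row and attaches the aggregate alongside it.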
Day-7
*****

ADF (Azure Data Factory) - walkthrough of all the options in Data Factory.
What is an integration runtime? Its different types and uses.
What are linked services, and how to create them using various methods.
What is a dataset, and how to create it in different ways.
Different types of activities available in ADF.
Practical exercises on simple copy activities between databases, databases to Azure Data Lake
Storage (ADLS), and ADLS to ADLS.

Day-8
*****

Deep dive into the copy activity: understanding all available tabs and options, and pipeline optimization.

Day-9
*****

Understand the importance of parameterizing pipelines.


Parameterize the pipeline, dataset, and linked services.
Redo the copy activity with parameters and demonstrate the power of parameterization.
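Here is a plain-Python analogy, not ADF code, for why parameterization matters. One path-building function plays the role of a parameterized dataset, and a loop reuses it for several tables instead of defining one dataset per table. The function names, storage account, and container names are hypothetical. In ADF itself, the values would come from dynamic content such as `@pipeline().parameters.tableName`, where `tableName` is also a hypothetical parameter name.

```python
def adls_path(account: str, container: str, folder: str, file_name: str) -> str:
    """Build an ADLS Gen2 (abfss) URI from parameters, like a parameterized dataset."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder}/{file_name}"


def plan_copies(account: str, container: str, tables: list) -> list:
    """Reuse one parameterized 'copy' definition for every table (hypothetical helper)."""
    return [
        {"source_table": t, "sink_path": adls_path(account, container, "raw", f"{t}.parquet")}
        for t in tables
    ]
```

Adding a new table becomes a one-item change to the `tables` list rather than a new dataset and pipeline.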

Day-10
*****
Explore the ForEach, If Condition, Switch, Until, Execute Pipeline, Validation, Filter, Set Variable, Append Variable, and Delete activities.

Day-11
*****
Work with activities such as the Stored Procedure, Lookup, and Get Metadata activities.

Day-12
******

Web activity and Webhook activity.


Logic Apps

Day-13
*******
Explore triggers in ADF, including schedule triggers, tumbling window triggers, and event-based triggers.

Day-14
******

Understand full and incremental data loading, and the various methods to implement them.

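A common way to implement incremental loading is the high-watermark pattern. You remember the largest modification timestamp already loaded, and on the next run you pick up only rows newer than it. The sketch below shows that logic in plain Python and assumes a hypothetical `modified_at` column. In ADF, the same idea is typically built from Lookup and Copy activities plus a watermark table.

```python
from datetime import datetime


def full_load(source_rows):
    """Full load: copy everything, every run."""
    return list(source_rows)


def incremental_load(source_rows, last_watermark: datetime):
    """Incremental load: return rows modified after the watermark, plus the new watermark.

    Each row is a dict with a 'modified_at' datetime (hypothetical column name).
    """
    new_rows = [r for r in source_rows if r["modified_at"] > last_watermark]
    # If nothing changed, keep the old watermark instead of failing on an empty max().
    new_watermark = max((r["modified_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark
```

The strict `>` comparison avoids reloading the boundary row. It assumes source timestamps only move forward.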
Day-15
******

Metadata-driven pipelines


Azure Key Vault
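A metadata-driven pipeline keeps the list of objects to load in a control table and loops over it, rather than hard-coding one pipeline per table. The sketch below imitates the usual Lookup + ForEach pattern in plain Python. The control-table layout, table names, and functions are hypothetical, and the SQL strings are built with f-strings only for illustration. Secrets such as connection strings would normally live in Azure Key Vault, not in this metadata.

```python
# Hypothetical control table: one entry per source object to load.
CONTROL_TABLE = [
    {"source": "dbo.customers", "sink": "raw/customers", "load_type": "full", "watermark_column": None},
    {"source": "dbo.orders", "sink": "raw/orders", "load_type": "incremental", "watermark_column": "modified_at"},
]


def build_query(entry, last_watermark=None):
    """Generate the source query for one control-table entry (illustrative string building)."""
    if entry["load_type"] == "full":
        return f"SELECT * FROM {entry['source']}"
    return f"SELECT * FROM {entry['source']} WHERE {entry['watermark_column']} > '{last_watermark}'"


def run_pipeline(control_table, watermarks):
    """'Lookup' the metadata, then 'ForEach' entry produce a (query, sink) copy step."""
    return [(build_query(e, watermarks.get(e["source"])), e["sink"]) for e in control_table]
```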

Day-16
******

Explore the Notebook activity and how to call Databricks notebooks from ADF.


Walkthrough of Databricks tools and all available options.
What is a workspace?
What is a metastore?
What is a catalog?
What are tables, views, and volumes?
What is Unity Catalog?
What is a cluster, how to create one, and the different options in cluster configuration.

Day-17
******

Understand DBFS (Databricks File System) and mounting, including the different mounting methods (using an account key, SAS, or a service principal).
Understand dbutils (Databricks Utilities).

Day-18
******

Introduction to Python.
Understand Python data types theoretically: string, int, list, tuple, set, dictionary.
Control flow: if conditions, while loops, for loops.
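This small example ties together the data types and control-flow constructs listed above. It is an illustrative sketch, and the function names are hypothetical.

```python
def summarize(values):
    """Count values by type name using a for loop and if/elif conditions."""
    counts = {}
    for v in values:
        # bool is a subclass of int, so it must be checked first.
        if isinstance(v, bool):
            key = "bool"
        elif isinstance(v, int):
            key = "int"
        else:
            key = type(v).__name__  # e.g. 'str', 'list', 'tuple', 'set', 'dict'
        counts[key] = counts.get(key, 0) + 1
    return counts


def halve_until_small(n, limit=10):
    """Repeatedly halve n with a while loop until it drops below limit."""
    steps = 0
    while n >= limit:
        n //= 2
        steps += 1
    return n, steps
```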

Day-19
******
Lists, list methods, list comprehensions.
Functions: how to create functions and parameterize them.
Lambda functions.
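Lists, functions, and lambdas fit together in a few lines. `normalize_names` is a hypothetical example function with an optional parameter.

```python
def normalize_names(names, upper=False):
    """Strip whitespace and drop blank names; 'upper' is an optional keyword parameter."""
    cleaned = [n.strip() for n in names if n.strip()]  # list comprehension with a filter
    return [n.upper() for n in cleaned] if upper else cleaned


names = ["  alice", "bob ", "   ", "carol"]
cleaned = normalize_names(names)
cleaned.append("dave")               # list method: add an item at the end
cleaned.sort(key=lambda n: len(n))   # lambda as a sort key; the sort is stable
```

Because Python's sort is stable, names of equal length keep their original relative order.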

Day-20
*******
How to pass a function as a parameter to another function.
Python functions such as map and filter (built-ins), reduce (from functools), and so on.
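`map`, `filter`, and `reduce` all work by taking a function as an argument. In Python 3, `map` and `filter` are built-ins, but `reduce` lives in the `functools` module. `apply_twice` is a hypothetical helper that shows the same idea in a user-defined function.

```python
from functools import reduce


def apply_twice(func, value):
    """Call func on value twice; func is passed in as a parameter."""
    return func(func(value))


amounts = [120, 45, 300, 80]
doubled = list(map(lambda x: x * 2, amounts))       # transform every element
large = list(filter(lambda x: x > 100, amounts))    # keep elements matching a condition
total = reduce(lambda acc, x: acc + x, amounts, 0)  # fold the list into one value
```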

Day-21
******
Tuple, set, and dictionary methods.

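Here is a quick tour of common tuple, set, and dictionary operations. The sample values are illustrative only.

```python
# Tuples: immutable sequences
point = (3, 4)
x, y = point                        # tuple unpacking
dup_count = (1, 2, 2, 3).count(2)   # tuple method: count occurrences

# Sets: unordered collections of unique items
a = {"orders", "customers", "products"}
b = {"orders", "payments"}
common = a & b                      # intersection
only_a = a.difference(b)            # items in a but not in b

# Dictionaries: key/value mappings
config = {"format": "parquet"}
config.update({"mode": "overwrite"})
fmt = config.get("format")
missing = config.get("compression", "snappy")  # default returned when the key is absent
items = sorted(config.items())
```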
Day-21
******
Explore serialization and deserialization.
In-depth understanding of different big data file formats: Parquet, Avro, ORC, CSV, JSON, Delta.
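Serialization turns in-memory objects into text or bytes that can be stored or sent, and deserialization reverses the process. The sketch uses only Python's standard `json` and `csv` modules, since Parquet, Avro, ORC, and Delta need extra libraries or Spark. It also shows one reason schema-carrying formats are preferred: CSV stores no type information, so numbers come back as strings. Parquet and Avro, by contrast, store a schema with the data.

```python
import csv
import io
import json

records = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# JSON: serialize to text, then deserialize back; ints stay ints.
json_text = json.dumps(records)
restored = json.loads(json_text)

# CSV: write to an in-memory buffer, then read it back.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

rows_back = list(csv.DictReader(io.StringIO(csv_text)))  # every value is read back as a string
```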

Day-22
******
Learn how to read different file formats using PySpark.
Write data to different file formats.
Explore options for each file format.

Day-23 & Day-24 & Day-25


******
Deep dive into PySpark functions.
Databricks widgets.

Day-26
******

Understand RDD (Resilient Distributed Dataset) and a few important RDD functions.

Day-27 & Day-28


**************

Explore the lakehouse architecture.


Practical understanding of Parquet, and why the Delta format is chosen in Databricks.
What is Delta Lake? (theory)
Coding Delta Lake in SQL and PySpark.

Day-29
******
What is MapReduce?
A brief overview of HDFS architecture.
Why Hive came into the picture.
Unity Catalog.
What are a metastore and a catalog?
Managed tables vs. external tables.

Day-30 & Day-31 & Day-32


***********************

In-depth understanding of Spark architecture, covering lazy evaluation, fault tolerance, the DAG (Directed Acyclic Graph), lineage, and checkpointing.
Wide and narrow transformations.
Cluster types and cluster modes in Databricks.
What autoscaling and jobs are.
Catalyst optimizer.

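With lazy evaluation, Spark transformations only build a plan, and nothing executes until an action is called. Python generators behave similarly, so the sketch below uses them as an analogy only. Spark also builds a DAG, optimizes it with the Catalyst optimizer, and distributes the work, and none of that happens here. `read_rows` is a hypothetical helper.

```python
log = []


def read_rows(n):
    """Generator that records each row at the moment it is actually produced."""
    for i in range(n):
        log.append(f"read {i}")
        yield i


# 'Transformations': define a filter and a map; no rows are read yet.
plan = (x * 10 for x in read_rows(5) if x % 2 == 0)
before_action = list(log)  # snapshot of the log before any evaluation

# 'Action': consuming the generator forces the whole chain to run.
result = list(plan)
```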
Day-33
*******
Practical understanding of concepts like cache, persist, broadcast, accumulators, and df.explain().

Day-34
******
Spark job debugging.
Medallion architecture.
Workflows.

Day-35
******
Delta Live Tables & Unity Catalog

Day-36
******
Data modelling at a high level: conceptual, logical, and physical data models; facts and dimensions; star and snowflake schemas; normalization and denormalization.
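A star schema puts numeric measures in a fact table and descriptive attributes in dimension tables, joined by keys. The sketch below builds a tiny, hypothetical `fact_sales` / `dim_product` pair in an in-memory SQLite database and runs a typical fact-to-dimension aggregation.

```python
import sqlite3


def sales_by_category():
    """Join a hypothetical fact table to its product dimension and aggregate by category."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
        INSERT INTO fact_sales VALUES (1, 1, 900.0), (2, 2, 1000.0), (3, 1, 250.0), (1, 1, 950.0);
    """)
    rows = conn.execute("""
        SELECT d.category, SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON f.product_key = d.product_key
        GROUP BY d.category
        ORDER BY d.category
    """).fetchall()
    conn.close()
    return rows
```

In a snowflake schema, `category` would itself be normalized into a separate dimension table, adding one more join.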

Day-37
******
Data flows in ADF

Day-38
******

SCD Type 2 implementation in ADB (Azure Databricks) & ADF.
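SCD Type 2 keeps history. When a record changes, it closes the current version (setting an end date and marking it not current) and inserts a new current version. In Databricks this is commonly done with a Delta Lake `MERGE`, and in ADF with data flows. The plain-Python sketch below shows only the logic, with hypothetical column names (`id`, `city`, `start_date`, `end_date`, `is_current`).

```python
from datetime import date


def scd2_merge(dimension, incoming, load_date: date):
    """Apply SCD Type 2: expire changed current rows and append new current versions.

    dimension: list of dicts with keys id, city, start_date, end_date, is_current.
    incoming: dict mapping id -> city from the latest source extract.
    """
    result = []
    current = {}
    for row in dimension:
        row = dict(row)  # copy so the caller's data is not mutated
        if row["is_current"]:
            current[row["id"]] = row
        result.append(row)

    for key, city in incoming.items():
        existing = current.get(key)
        if existing is not None and existing["city"] == city:
            continue  # unchanged: keep the current version as-is
        if existing is not None:
            existing["end_date"] = load_date  # close the old version
            existing["is_current"] = False
        result.append({"id": key, "city": city, "start_date": load_date,
                       "end_date": None, "is_current": True})
    return result
```

This sketch does not handle records deleted at the source. That case needs a separate rule, such as expiring any current row whose key is missing from `incoming`.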


Recap of all the performance improvement techniques in Spark, Delta, and Databricks discussed in previous classes.
Topics include cache, persist, partitioning, bucketing, and optimization.

Day-39
******
Git configuration in Databricks and ADF.
Creating branches; understanding the main branch, feature/common branches, and developer branches.

Day-40
********
Resume building and discussion of important questions.

Day-41 to Day-45 (one weekend, Saturday & Sunday, 4-5 hours)


*************
Interview questions preparation in SQL, Python, PySpark.
Agile process
CI/CD pipelines using Azure DevOps or GitHub Actions.
Real-time project flow.
