Azure Data Engineer

The document outlines a comprehensive training program for Azure Data Engineers, covering key concepts such as Azure services, Azure Data Factory, Azure Databricks, PySpark, and SQL Server. It includes practical scenarios, project examples, and a structured schedule for learning over 45 days, with sessions focused on various Azure technologies. Additionally, it offers post-course support including real-time projects and interview preparation.

Uploaded by

KONDURU KRISHNA


AZURE DATA ENGINEER

Basics:
▪ On-prem vs cloud
▪ Azure cloud, region, zone
▪ Portal creation – credit card or student account
▪ Hierarchy
▪ Tenant ID, subscription IDs, RG, VNet, virtual machines, SA
▪ IaaS, PaaS, SaaS
▪ Subscriptions
▪ Resource groups
▪ Virtual network
▪ Storage accounts (LRS, GRS, ZRS, GZRS)
▪ Key vaults

1. Azure Data Factory (Extract, Transform, Load)

▪ Azure Data Factory service creation
▪ Integration runtimes – gateway from source to destination
▪ Azure IR or AutoResolve IR
▪ Self-hosted IR
▪ SSIS IR
▪ Linked services – connection strings
▪ Datasets – represent data
▪ Pipelines
▪ Activities
▪ Lookup activity
▪ Get Metadata activity
▪ Filter activity
▪ If Condition activity
▪ ForEach activity
▪ Copy activity
▪ Stored Procedure activity
▪ Web activity
▪ Switch activity
▪ Nested ForEach activity

Scenario1: Filter activity in ADF dynamic copy
Scenario2: Get file names from a folder dynamically
Scenario3: Copy activity behavior
Scenario4: Validate copied data between source and sink in ADF
Scenario5: Load data from on-premises SQL Server to Azure SQL DB in Azure Data Factory
Scenario6: Full load and incremental load
Scenario8: Copy data from on-premises file system to ADLS Gen2 (install self-hosted IR)
Scenario9: Copy data from Snowflake to ADLS Gen2 using ForEach and Copy in ADF V2

▪ Azure SQL
▪ Introduction
▪ Creating first SQL database deployment
▪ Copying data from Azure SQL to Azure Blob
▪ Copy multiple tables in bulk with Lookup and ForEach activities in ADF
▪ Use ForEach loop activity to copy multiple tables
▪ Incremental load and delta load from SQL to Blob storage
▪ Triggers
▪ Event-based trigger
▪ Scheduled trigger
▪ Tumbling window trigger

Free Playlist:
▪ Azure data flows basics, pipeline flow
▪ Azure data flows: joins, advanced joins, conditional split, remove duplicates
▪ Rank, dense_rank, row_number, pivot, unpivot, surrogate key, hash values, crc32, etc.
▪ Slowly changing dimensions type 1 and type 2
▪ Azure DevOps introduction
• Set up Azure Repos in ADF
• Branching strategies
• Important Git commands
• Create build pipeline in Azure DevOps
• Create release pipeline in Azure DevOps
• Enable CI and CD

2. Azure Databricks

▪ Introduction
▪ Databricks creation
▪ Workspace, data management, computation management
▪ Types of clusters
▪ Notebooks
▪ Widgets
▪ DB utilities (dbutils)
▪ DBFS
▪ Reading and writing files between storage accounts (Blob, ADLS Gen2) and Azure Databricks
▪ Excel, Parquet, XML, JSON, Delta
▪ Reading and writing data between Azure SQL Database and Azure Databricks
▪ Unity Catalog

3. PySpark

▪ What is Spark and PySpark
▪ Spark vs MapReduce
▪ Real-time data processing vs batch processing
▪ Spark architecture
▪ Spark components
▪ In-memory computation and lazy evaluation, partitions, fault tolerance
▪ Spark transformations
▪ Narrow and wide transformations
▪ RDD – no use
▪ Data frames
• Spark session vs Spark context
▪ Different ways to create data frames
• Read data frame
• Select columns from data frame
• Remove duplicates from data frame
• Filter records from data frame
• AND, OR, IN operators on data frame
• Sort and order data in data frame
• Data frame actions
• Check data frame schema
▪ Spark SQL basics

• Filter nulls, not nulls
• Group by
• Collect_list
• Collect_set
• Approx_count_distinct
• Max, min, sum
• Sum distinct
• Avg, count action, like, not like, between
▪ Joins and types
• Joins overview
• Inner
• Left
• Right
• Full
• Left semi
• Left anti
• Cross
• Advanced joins
▪ Transformations
• Derive column
• Update column
• Drop column
▪ Functions
• Concat
• Concat_ws
• When
• Row_number
• Rank function
• Dense_rank
• Crc32
• Md5
• Sha1, sha2
• Lpad, round, ceil, floor, partition by, lag, lead, first, split, array, explode
▪ Auto Loader
▪ Delta tables
▪ Delta Live Tables
▪ SCD type 1 and type 2 with joins
▪ Apache Airflow vs Workflow
▪ Azure DevOps introduction
• Set up Azure Repos in Databricks notebooks
• Branching strategies
• Important Git commands
• Create build pipeline in Azure DevOps
• Create release pipeline in Azure DevOps
• Enable CI and CD
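
The three ranking functions listed under Functions in this section (Row_number, Rank, Dense_rank) differ only in how they treat ties. The sketch below shows that difference in plain Python rather than PySpark, so it runs without a cluster. The function name rank_rows and the sample scores are made up for illustration and are not part of the course material.

```python
from typing import List, Tuple


def rank_rows(scores: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (score, row_number, rank, dense_rank) with scores ordered descending."""
    ordered = sorted(scores, reverse=True)
    result = []
    rank = 0
    dense_rank = 0
    previous = None
    for row_number, score in enumerate(ordered, start=1):
        if score != previous:
            rank = row_number   # rank jumps to the row position, leaving gaps after ties
            dense_rank += 1     # dense_rank increases by one, never leaving gaps
            previous = score
        result.append((score, row_number, rank, dense_rank))
    return result


if __name__ == "__main__":
    for row in rank_rows([90, 80, 90, 70]):
        print(row)
```

With the tied scores above, row_number stays unique for every row, rank leaves a gap after the tie, and dense_rank does not. In Spark the same ordering would typically be expressed with a window specification instead of a manual loop.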

4. Python

▪ Introduction
▪ Syntax
▪ Comments and variables, data types
▪ Operators, lists, tuples, sets, dictionaries
▪ If else, while, for loop
▪ Functions
▪ Lambda
▪ Arrays
▪ Classes/objects, inheritance, polymorphism
▪ Modules
▪ Exception handling

5. SQL Server

▪ Introduction and syntax
▪ Create database, drop database
▪ Create table, drop, alter
▪ Constraints, not null, unique, primary key, foreign key
▪ Select, select distinct, where, AND, OR, NOT, order by, insert into, null values
▪ Update, delete, select top, min and max, count, avg, sum
▪ Like, in
▪ Between, alias, joins, having, stored procedures

Project1: On-prem (SQL Server) -> ADF (Data Flows) -> Azure SQL

Project2: On-prem (file system) -> ADF (ADB) -> Azure Synapse
Project3: Fabric (Data Factory, Spark, data modeling (star schema), Power BI)

Free Playlist: Azure Synapse, Snowflake

Azure Data Engineer

Fast track: 45 days

Mon, Tue, Thu, Fri – Azure Databricks, PySpark, SQL, Python, Synapse, Microsoft Fabric
Wed, Sat – Azure Data Factory, Microsoft Fabric

Timings: 6:00 PM to 7:30 PM IST
6:30 AM to 7:45 AM IST
8:30 PM to 9:30 PM IST
Monday to Saturday – regular sessions
After the course: 3 real-time projects, mock interviews, resume prep, LinkedIn and Naukri setup

