Azure Data Engineer

The document outlines a comprehensive training program for Azure Data Engineers, covering key concepts such as Azure services, Azure Data Factory, Azure Databricks, PySpark, and SQL Server. It includes practical scenarios, project examples, and a structured schedule for learning over 45 days, with sessions focused on various Azure technologies. Additionally, it offers post-course support including real-time projects and interview preparation.

Uploaded by

KONDURU KRISHNA

AZURE DATA ENGINEER

Basics:
• On-premises vs. cloud
• Azure cloud, regions, zones
• Portal account creation (credit card or student account)
• Hierarchy
    • Tenant ID, subscription IDs, resource groups (RG), VNet, virtual machines, storage accounts (SA)
• IaaS, PaaS, SaaS
• Subscriptions
• Resource groups
• Virtual network
• Storage accounts (LRS, GRS, ZRS, GZRS)
• Key vaults

1. Azure Data Factory (Extract, Transform, Load)

• Azure Data Factory service creation
• Integration runtimes (IR): the gateway between source and destination
    • Azure IR (AutoResolve IR)
    • Self-hosted IR
    • SSIS IR
• Linked services: hold the connection string for a data store
• Datasets: represent the data
• Pipelines
• Activities
    • Lookup activity
    • Get Metadata activity
    • Filter activity
    • If Condition activity
    • ForEach activity
    • Copy activity
    • Stored procedure activity
    • Web activity
    • Switch activity
    • Nested ForEach activity

Scenario1: Filter activity in ADF dynamic copy
Scenario2: Get file names from a folder dynamically
Scenario3: Copy activity behavior
Scenario4: Validate copied data between source and sink in ADF
Scenario5: Load data from on-premises SQL Server to Azure SQL DB in Azure Data Factory
Scenario6: Full load and incremental load
Scenario8: Copy data from an on-premises file system to ADLS Gen2 (install self-hosted IR)
Scenario9: Copy data from Snowflake to ADLS Gen2 using ForEach and Copy in ADF V2
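
The difference between a full load and an incremental load (Scenario6) can be sketched in plain Python. This is only an illustration of the high-watermark idea, not how ADF implements it: the sample rows, the `modified_at` column, and the `full_load` and `incremental_load` helpers are all hypothetical names chosen for the example.

```python
from datetime import datetime

# Hypothetical source rows; `modified_at` plays the role of the watermark column.
SOURCE_ROWS = [
    {"id": 1, "name": "a", "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "b", "modified_at": datetime(2024, 1, 5)},
    {"id": 3, "name": "c", "modified_at": datetime(2024, 1, 9)},
]


def full_load(source):
    """Copy every row, regardless of when it last changed."""
    return list(source)


def incremental_load(source, last_watermark):
    """Copy only rows changed after the previous run's watermark.

    Returns the copied rows and the new watermark to store for the next run.
    """
    changed = [r for r in source if r["modified_at"] > last_watermark]
    new_watermark = max(
        (r["modified_at"] for r in changed), default=last_watermark
    )
    return changed, new_watermark


if __name__ == "__main__":
    print(len(full_load(SOURCE_ROWS)))
    rows, watermark = incremental_load(SOURCE_ROWS, datetime(2024, 1, 3))
    print([r["id"] for r in rows], watermark.date())
```

In a pipeline, the stored watermark would typically be read before the copy and updated after it succeeds, so that a failed run does not skip rows.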

• Azure SQL
    • Introduction
    • Creating the first SQL database deployment
    • Copying data from Azure SQL to Azure Blob Storage
    • Copy multiple tables in bulk with Lookup and ForEach activities in ADF
    • Use the ForEach activity to copy multiple tables
    • Incremental load and delta load from SQL to Blob Storage
• Triggers
    • Event-based trigger
    • Schedule trigger
    • Tumbling window trigger

Free Playlist:
• Azure data flows: basics, pipeline flow
• Azure data flows: joins, advanced joins, conditional split, remove duplicates, rank, dense_rank, row_number, pivot, unpivot, surrogate key, hash values, crc32, etc.
• Slowly changing dimensions (SCD) type 1 and type 2
• Azure DevOps introduction
    • Set up Azure Repos in ADF
    • Branching strategies
    • Important Git commands
    • Create a build pipeline in Azure DevOps
    • Create a release pipeline in Azure DevOps
    • Enable CI and CD

2. Azure Databricks

• Introduction
• Databricks creation
• Workspace, data management, computation management
• Types of clusters
• Notebooks
• Widgets
• Databricks utilities (dbutils)
• DBFS
• Reading and writing files between a storage account (Blob, ADLS Gen2) and Azure Databricks
    • Excel, Parquet, XML, JSON, Delta
• Reading and writing data between Azure SQL Database and Azure Databricks
• Unity Catalog

3. PySpark

• What are Spark and PySpark
• Spark vs. MapReduce
• Real-time data processing vs. batch processing
• Spark architecture
• Spark components
• In-memory computation and lazy evaluation, partitions, fault tolerance
• Spark transformations
• Narrow and wide transformations
• RDD: not used
• DataFrames
    • Spark session vs. Spark context
    • Different ways to create DataFrames
    • Read a DataFrame
    • Select columns from a DataFrame
    • Remove duplicates from a DataFrame
    • Filter records from a DataFrame
    • AND, OR, and IN operators on a DataFrame
    • Sort and order data in a DataFrame
    • DataFrame actions
    • Check the DataFrame schema
• Spark SQL basics
    • Filter nulls and not nulls
    • Group by
    • collect_list
    • collect_set
    • approx_count_distinct
    • max, min, sum
    • sum distinct
    • avg, count action, like, not like, between
• Joins and types
    • Joins overview
    • Inner
    • Left
    • Right
    • Full
    • Left semi
    • Left anti
    • Cross
    • Advanced joins
• Transformations
    • Derive column
    • Update column
    • Drop column
• Functions
    • concat
    • concat_ws
    • when
    • row_number
    • rank
    • dense_rank
    • crc32
    • md5
    • sha1, sha2
    • lpad, round, ceil, floor, partition by, lag, lead, first, split, array, explode
• Auto Loader
• Delta tables
• Delta Live Tables
• SCD type 1 and type 2 with joins
• Apache Airflow vs. Workflows
• Azure DevOps introduction
    • Set up Azure Repos in Databricks notebooks
    • Branching strategies
    • Important Git commands
    • Create a build pipeline in Azure DevOps
    • Create a release pipeline in Azure DevOps
    • Enable CI and CD
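
The row_number, rank, and dense_rank functions listed in this section differ only in how they treat ties. The sketch below reproduces that behavior in plain Python so it runs without a Spark cluster; it is not PySpark code, and `rank_rows` and the sample scores are hypothetical names and data used only for illustration.

```python
def rank_rows(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered descending."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank jumps to the current position after ties
            dense += 1         # dense_rank advances by exactly one
            previous = value
        result.append((value, row_number, rank, dense))
    return result


if __name__ == "__main__":
    for row in rank_rows([90, 80, 90, 70]):
        print(row)
```

With two tied top scores, row_number keeps counting (1, 2, 3, 4), rank leaves a gap after the tie (1, 1, 3, 4), and dense_rank does not (1, 1, 2, 3).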

4. Python

• Introduction
• Syntax
• Comments, variables, and data types
• Operators, lists, tuples, sets, dictionaries
• if/else, while, and for loops
• Functions
• Lambda
• Arrays
• Classes/objects, inheritance, polymorphism
• Modules
• Exception handling
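
Several of the Python topics above (classes, inheritance, polymorphism, lambda, and exception handling) fit into one short sketch. The class and function names here (`Shape`, `Rectangle`, `Square`, `safe_divide`) are hypothetical and chosen only for illustration.

```python
class Shape:
    """Base class; subclasses override area()."""

    def area(self):
        raise NotImplementedError("subclasses must implement area()")


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Inherits area() from Rectangle; only the constructor changes."""

    def __init__(self, side):
        super().__init__(side, side)


def safe_divide(a, b):
    """Return a / b, or None when b is zero (exception handling)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


shapes = [Rectangle(2, 3), Square(4)]
# Polymorphism: the same area() call works on every Shape subclass.
# The lambda supplies the sort key inline.
largest_first = sorted(shapes, key=lambda s: s.area(), reverse=True)

if __name__ == "__main__":
    print([type(s).__name__ for s in largest_first])
    print(safe_divide(10, 4), safe_divide(1, 0))
```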

5. SQL Server

• Introduction and syntax
• Create database, drop database
• Create table, drop table, alter table
• Constraints: not null, unique, primary key, foreign key
• Select, select distinct, where, and/or/not, order by, insert into, null values
• Update, delete, select top, min and max, count, avg, sum
• Like, in, between, alias, joins, having, stored procedures
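
A few of the SQL topics above (constraints, joins, aliases, aggregates, having, order by) can be tried without a SQL Server instance. The sketch below uses Python's built-in `sqlite3` module as a stand-in, which is an assumption of this example and not part of the course: SQLite's dialect differs from T-SQL, so `SELECT TOP 1` becomes `LIMIT 1`, and stored procedures are not available. The `departments` and `employees` tables and their rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL,
    dept_id INTEGER REFERENCES departments(dept_id)
);
INSERT INTO departments VALUES (1, 'Data'), (2, 'Ops');
INSERT INTO employees VALUES
    (1, 'Asha', 90.0, 1),
    (2, 'Ravi', 70.0, 1),
    (3, 'Meena', 60.0, 2);
""")

# Join, alias, aggregate, HAVING, and ORDER BY in one query.
summary = conn.execute("""
    SELECT d.name AS dept, COUNT(*) AS headcount, AVG(e.salary) AS avg_salary
    FROM employees AS e
    JOIN departments AS d ON d.dept_id = e.dept_id
    GROUP BY d.name
    HAVING COUNT(*) >= 1
    ORDER BY avg_salary DESC
""").fetchall()

# T-SQL would write SELECT TOP 1 ...; SQLite uses LIMIT instead.
top_earner = conn.execute(
    "SELECT name FROM employees ORDER BY salary DESC LIMIT 1"
).fetchone()[0]

if __name__ == "__main__":
    print(summary)
    print(top_earner)
```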

Project1: On-prem (SQL Server) -> ADF (Data Flows) -> Azure SQL
Project2: On-prem (file system) -> ADF (ADB) -> Azure Synapse
Project3: Fabric (Data Factory, Spark, data modeling (star schema), Power BI)

Free Playlist: Azure Synapse, Snowflake


Azure Data Engineer

Fast track: 45 days

Mon, Tue, Thu, Fri: Azure Databricks, PySpark, SQL, Python, Synapse, Microsoft Fabric
Wed, Sat: Azure Data Factory, Microsoft Fabric

Timings: 6 PM to 7.30 PM IST
6.30 AM to 7.45 AM IST
8.30 PM to 9.30 PM IST
Monday to Saturday: regular sessions
After the course: 3 real-time projects, mock interviews, resume preparation, LinkedIn and Naukri setups
