Azure Data Engineer
Basics:
On Prem vs cloud
Azure cloud, region, zone
Portal account creation: credit card or student account
Hierarchy
Tenant ID, subscription IDs, resource groups (RG), VNet, virtual machines, storage accounts (SA)
IaaS, PaaS, SaaS
Subscriptions
Resource groups
Virtual network
Storage accounts (LRS, GRS, ZRS, GZRS)
Key vaults
Self-hosted IR
SSIS IR
Activities
Lookup activity
Filter activity
If Condition activity
ForEach activity
Copy activity
Web activity
Switch activity
ADF V2
Azure SQL
Introduction
Copy multiple tables in bulk with lookup and foreach activity in ADF.
Use foreach loop activity to copy multiple tables.
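The lookup-then-foreach pattern above can be sketched outside ADF. The following is only a plain-Python analogy using the standard `sqlite3` module, not ADF pipeline code: a catalog query plays the role of the Lookup activity, and a loop plays the role of the ForEach activity wrapping a Copy activity. The function name `copy_all_tables` and the sample tables are hypothetical.

```python
import sqlite3


def copy_all_tables(source, target):
    """Copy every user table from source to target, one table per iteration."""
    # "Lookup" step: ask the catalog which tables exist.
    tables = source.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()

    copied = []
    # "ForEach" step: iterate over the lookup output and copy each table.
    for name, ddl in tables:
        target.execute(ddl)  # recreate the table definition in the target
        rows = source.execute(f'SELECT * FROM "{name}"').fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            target.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})', rows
            )
        copied.append(name)
    target.commit()
    return copied


# Hypothetical sample data for illustration only.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (10, 1);
    """
)
copied_tables = copy_all_tables(source, target)
```

In ADF itself the same idea is expressed as pipeline activities: the list of tables comes from the Lookup activity's output, and the ForEach activity passes each item to a parameterized Copy activity.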
Triggers
Scheduled trigger
Free Playlist:
Azure data flows: basics, pipeline flow
Azure data flows: joins, advanced joins, conditional split, remove duplicates
• Branching strategies
• Enable CI and CD
Introduction
Databricks creation
Notebooks
Widgets
Databricks utilities (dbutils)
DBFS
Reading and writing files between a storage account (Blob, ADLS Gen2) and
Azure Databricks
Reading and writing data between Azure SQL Database and Azure Databricks
Unity Catalog
3. PySpark
Spark vs MapReduce
Spark transformations
RDD (rarely needed in practice)
DataFrames
• Group by
• Collect_list
• Collect_set
• Approx_count_distinct
• Sum_distinct
• Avg, count (action), like, not like, between
Joins and types
• Joins overview.
• Inner
• Left
• Right
• Left anti
• Cross
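The join types listed above differ only in which unmatched rows they keep. The following is a minimal plain-Python sketch of those semantics on lists of dicts, not PySpark code; the helper `join_rows` and the sample rows are hypothetical and exist only for illustration.

```python
from itertools import product


def join_rows(left, right, key, how="inner"):
    """Join two lists of dicts on `key`: inner, left, right, or left_anti."""
    if how not in {"inner", "left", "right", "left_anti"}:
        raise ValueError(f"unsupported join type: {how}")
    if how == "right":
        # A right join is a left join with the inputs swapped.
        return join_rows(right, left, key, how="left")

    right_cols = {col for row in right for col in row if col != key}
    result = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if how == "left_anti":
            if not matches:
                result.append(dict(l_row))  # keep only unmatched left rows
        elif matches:
            result.extend({**l_row, **m} for m in matches)
        elif how == "left":
            # No match: keep the left row and fill right columns with None.
            result.append({**l_row, **{col: None for col in right_cols}})
    return result


# Hypothetical sample data.
employees = [
    {"dept_id": 1, "name": "Ada"},
    {"dept_id": 2, "name": "Lin"},
    {"dept_id": 9, "name": "Sam"},
]
departments = [
    {"dept_id": 1, "dept": "Data"},
    {"dept_id": 2, "dept": "Cloud"},
    {"dept_id": 3, "dept": "Ops"},
]

inner = join_rows(employees, departments, "dept_id", "inner")
left = join_rows(employees, departments, "dept_id", "left")
right = join_rows(employees, departments, "dept_id", "right")
left_anti = join_rows(employees, departments, "dept_id", "left_anti")
# A cross join pairs every left row with every right row; no key is needed.
cross = list(product(employees, departments))
```

In PySpark the same choice is typically made through the `how` argument of `DataFrame.join`, with `DataFrame.crossJoin` for the cross case.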
Functions
• Concat
• Concat_ws
• When
• Row_number
• Rank function
• Dense_rank
• Crc32
• Md5
• Sha1, sha2
• Lpad, round, ceil, floor, partition by, lag, lead, first, split, array,
explode
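Row_number, rank, and dense_rank from the list above are easy to confuse because they differ only in how ties are numbered. The following plain-Python sketch models that difference for a single partition ordered by score descending; it is not PySpark code, and the helper `rank_rows` is hypothetical.

```python
def rank_rows(scores):
    """Return (score, row_number, rank, dense_rank) tuples, highest score first."""
    ordered = sorted(scores, reverse=True)
    out = []
    rank = 0
    dense = 0
    previous = None
    for position, score in enumerate(ordered, start=1):
        if score != previous:
            rank = position  # rank skips ahead to the current position after ties
            dense += 1       # dense_rank always increases by exactly one
            previous = score
        # row_number is simply the position, regardless of ties
        out.append((score, position, rank, dense))
    return out


ranked = rank_rows([90, 85, 90, 70])
for row in ranked:
    print(row)
```

With two rows tied on 90, both get rank 1 and dense_rank 1; the next score gets rank 3 but dense_rank 2, while row_number keeps counting 1, 2, 3, 4.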
Auto loader
Delta tables
• Branching strategies
• Enable CI and CD
4. Python
Introduction
Syntax
Functions
Lambda
Arrays
Modules
Exception handling
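Several of the Python topics above (functions, lambda, modules, exception handling) can be seen together in one short sketch. The function `parse_amounts` and the sample values are hypothetical, chosen only to show the pieces working together.

```python
import math  # a standard-library module


def parse_amounts(raw_values):
    """Convert strings to floats, collecting the values that fail to parse."""
    parsed, rejected = [], []
    for value in raw_values:
        try:
            parsed.append(float(value))
        except (TypeError, ValueError):
            # float(None) raises TypeError; float("abc") raises ValueError.
            rejected.append(value)
    return parsed, rejected


amounts, bad = parse_amounts(["10.5", "abc", None, "3"])

# A lambda is a small anonymous function, used here as a sort key.
largest_first = sorted(amounts, key=lambda x: -x)

# math.fsum comes from the imported module and sums floats accurately.
total = math.fsum(amounts)
```

Catching only the specific exception types keeps unexpected errors visible instead of silently swallowing them.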
5. SQL Server
SELECT, SELECT DISTINCT, WHERE, AND/OR/NOT, ORDER BY, INSERT INTO, NULL
values.
UPDATE, DELETE, SELECT TOP, MIN and MAX, COUNT, AVG, SUM,