Azure Data Factory tutorial
Transform Data
With the newly added Data Flow feature, Data Factory is now a complete cloud-based ETL tool.
Definition:
Azure Data Factory (ADF) is a hybrid data integration service
Migration?
01 Data Factory is not primarily a migration tool; it excels at periodic data loads and transformation instead.
Streaming?
02 ADF can orchestrate, but there are other dedicated services for streaming
Transformations?
03 Data flows for simple ones, but you can use Databricks or HDInsight for more complex transforms
SSIS vs Data Factory
Cluster Types
[Diagram: delivery man, shop, and house analogy for Integration Runtime, Copy Activity, and Blob Storage (Order Table to Order.csv)]
Control activities
03 Control pipeline flow
e.g. ForEach, Web
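As a rough illustration of a control activity, the sketch below builds a ForEach definition as a Python dict in the shape of ADF's pipeline JSON. The activity names `CopyEachFile` and `CopyOneFile` and the `fileNames` pipeline parameter are hypothetical, and the inner Copy activity is trimmed to a stub.

```python
import json


def build_foreach_activity(inner_activities, items_expression):
    """Return a ForEach control activity definition as a plain dict."""
    return {
        "name": "CopyEachFile",  # hypothetical activity name
        "type": "ForEach",
        "typeProperties": {
            # The collection to loop over, given as an ADF expression.
            "items": {"value": items_expression, "type": "Expression"},
            # False lets iterations run in parallel.
            "isSequential": False,
            # Activities executed once per item.
            "activities": inner_activities,
        },
    }


# Stub only: a real Copy activity also needs inputs, outputs, source, and sink.
inner = [{"name": "CopyOneFile", "type": "Copy"}]
foreach_activity = build_foreach_activity(inner, "@pipeline().parameters.fileNames")
print(json.dumps(foreach_activity, indent=2))
```

The control activity itself moves no data; it only decides how often and in what order the inner activities run.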
Data Flows
• Data Flow is a new feature of Azure Data Factory (ADF) that allows you to develop graphical data transformation logic that can be executed as activities within ADF pipelines.
• Two types:
• Mapping
• Wrangling
Dataset
➢ Simply point or reference the data
➢ Files
➢ Folders
➢ Documents
➢ Tables
Linked service
➢ Similar to connection string
external resources
➢ Datastores like Azure SQL Server
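To make the relationship concrete, here is a minimal sketch of a linked service and a dataset that points at it, written as Python dicts in the shape of ADF's JSON definitions. The names `OrdersBlobStorage` and `OrderCsv`, the `orders` container, and the connection string placeholder are assumptions for illustration only.

```python
import json

# Linked service: holds the connection information, much like a connection string.
linked_service = {
    "name": "OrdersBlobStorage",  # hypothetical name
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # Placeholder; in practice keep secrets in Azure Key Vault.
            "connectionString": "<storage-connection-string>"
        },
    },
}

# Dataset: only references the data (here one CSV file) through the linked service.
dataset = {
    "name": "OrderCsv",  # hypothetical name
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "orders",  # assumed container name
                "fileName": "Order.csv",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(dataset, indent=2))
```

The dataset carries no credentials of its own, so several datasets can share one linked service.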
infrastructure
➢ Data Flow
➢ Data Movement
deserialization etc.
performance manner.
➢ Create Azure IR
management purpose
➢ Execute pipeline
➢ one-to-one relationship
➢ Properties – triggerBody().folderPath/fileName
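The trigger properties above can be sketched as a storage event trigger that passes the folder path and file name into a pipeline. This is an illustrative Python dict in the shape of ADF's trigger JSON; the trigger name, the pipeline name `CopyOrders`, the parameter names `sourceFolder` and `sourceFile`, the blob path, and the scope placeholder are all assumptions.

```python
import json

event_trigger = {
    "name": "OnOrderFileCreated",  # hypothetical trigger name
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "/orders/blobs/",  # assumed container
            "blobPathEndsWith": ".csv",
            "ignoreEmptyBlobs": True,
            "scope": "<storage-account-resource-id>",  # placeholder
            "events": ["Microsoft.Storage.BlobCreated"],
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyOrders",  # hypothetical pipeline
                    "type": "PipelineReference",
                },
                # Map properties of the triggering blob to pipeline parameters.
                "parameters": {
                    "sourceFolder": "@triggerBody().folderPath",
                    "sourceFile": "@triggerBody().fileName",
                },
            }
        ],
    },
}

print(json.dumps(event_trigger, indent=2))
```

Inside the pipeline, the values would then be read through its own parameters, for example `@pipeline().parameters.sourceFile`.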
Demo: Copy Activity
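For readers following along without the live demo, the sketch below shows roughly what a Copy activity definition looks like, assuming the source is an Azure SQL table and the sink is a CSV file in Blob Storage. The dataset names `OrderTable` and `OrderCsv` and the activity name are hypothetical, and the demo itself may have been configured differently.

```python
import json


def dataset_ref(name):
    """Build a reference to a dataset that is defined elsewhere in the factory."""
    return {"referenceName": name, "type": "DatasetReference"}


copy_activity = {
    "name": "CopyOrderTableToCsv",  # hypothetical activity name
    "type": "Copy",
    "inputs": [dataset_ref("OrderTable")],  # assumed Azure SQL table dataset
    "outputs": [dataset_ref("OrderCsv")],   # assumed delimited-text dataset
    "typeProperties": {
        "source": {"type": "AzureSqlSource"},
        "sink": {"type": "DelimitedTextSink"},
    },
}

# A pipeline is a named group of activities.
pipeline = {
    "name": "CopyOrders",  # hypothetical pipeline name
    "properties": {"activities": [copy_activity]},
}

print(json.dumps(pipeline, indent=2))
```

The activity names only datasets; where the data lives and how to connect to it come from the datasets and their linked services.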
Summary
Data Flows
Behind the scenes, Data Flows execute on Azure Databricks using Spark.
ADF internally handles all code translation, Spark optimization, and execution of the transformations.