
Azure Data Factory

1. Key Concepts of Azure Data Factory


a. Data Pipelines
A data pipeline in ADF is a logical grouping of activities that together perform a
task. The pipeline allows you to manage the orchestration and execution of
workflows like ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform).
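For a concrete picture, here is a minimal sketch of a pipeline definition in JSON (all names are illustrative); it contains a single Wait activity that pauses for 60 seconds:

{
  "name": "ExamplePipeline",
  "properties": {
    "activities": [
      {
        "name": "WaitOneMinute",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 60 }
      }
    ]
  }
}

Real pipelines place one or more of the activity types described next inside the activities array.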
b. Activity
An activity represents a single unit of work in ADF. There are multiple types of
activities:
• Data movement: Copies data from a source to a sink (the Copy activity).
• Data transformation: Transforms data (e.g., using Azure Databricks, Data Flow, or HDInsight activities).
• Control flow: Directs the flow of execution (e.g., If Condition, ForEach, Wait).
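Whatever its type, every activity shares the same JSON anatomy: a name, a type, and type-specific properties. As a sketch, here is a Wait activity that runs only after a hypothetical upstream activity named CopyRawData succeeds:

{
  "name": "PauseBeforeLoad",
  "type": "Wait",
  "dependsOn": [
    {
      "activity": "CopyRawData",
      "dependencyConditions": [ "Succeeded" ]
    }
  ],
  "typeProperties": { "waitTimeInSeconds": 30 }
}

The dependsOn block is what chains individual activities into an ordered workflow.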
c. Dataset
A dataset represents the structure of the data that an activity consumes or produces, whether that data lives in tables, files, or other formats.
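For example, here is a sketch of a dataset describing a CSV file in Blob Storage (all names are illustrative; the linked service it references is the subject of the next subsection):

{
  "name": "BlobInputDataset",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLS",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": "sales.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}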
d. Linked Service
A linked service defines the connection information for a data store or a compute resource on which pipeline activities run. It serves as the configuration that connects ADF to data sources such as Azure SQL Database, Blob Storage, or on-premises databases.
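A sketch of a linked service for Azure Blob Storage, with angle-bracket placeholders standing in for real account values:

{
  "name": "AzureBlobStorageLS",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>"
    }
  }
}

In practice, secrets such as account keys are normally kept in Azure Key Vault rather than stored inline.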
e. Triggers
A trigger determines when a pipeline runs, whether on demand or at scheduled intervals (e.g., daily, weekly). Triggers can be:
• Schedule-based: Runs pipelines at specific times or intervals.
• Event-based: Runs pipelines based on events (e.g., file arrival in storage).
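As an illustration, a sketch of a schedule-based trigger that runs the hypothetical ExamplePipeline from above once a day:

{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2025-01-01T02:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "ExamplePipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}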
f. Integration Runtime (IR)
Integration Runtime (IR) is the compute infrastructure used by Azure Data
Factory to perform data movement and transformation activities. There are
three types of IR:
• Azure IR: For cloud-based data movement and transformation.
• Self-hosted IR: For on-premises data sources.
• Azure-SSIS IR: To run SQL Server Integration Services (SSIS) packages.
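A linked service selects its runtime through a connectVia reference. A sketch for an on-premises SQL Server reached through a self-hosted IR registered under the hypothetical name OnPremIR (the connection string is illustrative):

{
  "name": "OnPremSqlServerLS",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Data Source=<server>;Initial Catalog=<database>;Integrated Security=True"
    },
    "connectVia": {
      "referenceName": "OnPremIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}

Without a connectVia reference, activities default to the Azure IR.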

2. Azure Data Factory Architecture


Azure Data Factory provides a hybrid data integration service, allowing you to
move data between cloud-based and on-premises environments. Here’s how
its architecture breaks down:
a. Authoring
You can author pipelines using the following methods:
• ADF Studio: A graphical user interface (GUI) in the Azure portal.
• JSON-based definition: ADF resources (pipelines, datasets, etc.) are defined using JSON.
• Azure SDK or APIs: You can use Azure SDKs, PowerShell, or REST APIs to create and manage ADF resources programmatically.
b. Pipeline Orchestration
Azure Data Factory orchestrates workflows in a serverless manner, meaning
you don't have to manage the infrastructure. You create and schedule data-
driven workflows (pipelines), and Azure automatically scales resources as
needed.
c. Data Integration
Azure Data Factory can integrate with many data sources, including:
• Azure services (e.g., Azure Blob Storage, Azure Data Lake, Azure SQL Database).
• On-premises databases (SQL Server, Oracle, etc.).
• Other cloud services (Amazon S3, Google BigQuery, etc.).
• File-based data (CSV, JSON, Parquet, etc.).

3. Core Components of Azure Data Factory


a. Pipelines
A collection of activities grouped logically that perform a specific task (e.g.,
moving or transforming data).
b. Activities
Azure Data Factory supports the following types of activities:
• Copy Activity: Moves data from a source to a destination.
• Data Flow Activity: Executes transformations using the Mapping Data Flow service.
• Transformation Activities: These include calling external services like:
o Azure Databricks
o HDInsight
o Azure Functions
o Stored Procedures, and more.
• Control Activities (see the sketch after this list): These include:
o If Condition for conditional logic.
o Wait for pausing pipelines.
o ForEach for iterating through collections.
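To make the control activities concrete, here is a sketch of a ForEach activity that loops over a hypothetical pipeline parameter named fileNames and runs a short Wait for each item in parallel:

{
  "name": "ProcessEachFile",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@pipeline().parameters.fileNames",
      "type": "Expression"
    },
    "isSequential": false,
    "activities": [
      {
        "name": "WaitPerItem",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 5 }
      }
    ]
  }
}

In a real pipeline, the inner Wait would typically be replaced by a Copy or transformation activity parameterized with @item().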
c. Data Flows
Data flows in Azure Data Factory allow for data transformations without writing
code. They are visually designed, providing a low-code/no-code way to perform
complex data operations (joins, aggregations, filtering, etc.).
d. Linked Services
Linked services store connection information for external data stores and compute resources. For example:
• Azure Blob Storage
• Azure SQL Database
• Amazon S3
• On-premises databases via the self-hosted IR.
e. Integration Runtime
The compute infrastructure for Azure Data Factory that performs data
movement and transformation. You can choose from:
• Azure IR (default for cloud-based activities).
• Self-hosted IR (for on-premises or hybrid scenarios).
• Azure-SSIS IR (for running SSIS packages in ADF).

4. Azure Data Factory Use Cases


a. ETL/ELT Data Pipelines
ADF can automate ETL (Extract, Transform, Load) or ELT (Extract, Load,
Transform) processes. It can pull data from multiple sources, transform it using
Azure Data Flow or Databricks, and load it into a target database or data lake.
b. Hybrid Data Integration
ADF supports both on-premises and cloud-based data sources. For on-premises data, it uses the self-hosted IR to connect securely.
c. Data Transformation
ADF allows you to create transformations through Mapping Data Flows or by
calling Azure Databricks or HDInsight for more complex transformations.
d. Big Data Integration
You can integrate big data services like Azure Data Lake and Azure Synapse
Analytics with ADF to handle large datasets and perform advanced analytics.
e. Orchestration of Data Workflows
ADF orchestrates the entire data pipeline, including scheduling, retry logic,
branching, parallelism, and event-based triggers.
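Retry logic, for example, is configured per activity through its policy block. A sketch with illustrative values, placed alongside an execution activity's name and type:

"policy": {
  "timeout": "0.01:00:00",
  "retry": 2,
  "retryIntervalInSeconds": 120
}

Here the activity times out after one hour and is retried up to twice, waiting two minutes between attempts.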

5. Azure Data Factory Monitoring


You can monitor your pipelines and activities using:
• ADF Studio: Provides a graphical overview of pipeline runs, activity runs, and trigger executions.
• Azure Monitor: Allows logging, alerting, and real-time monitoring of data factory operations.
• Alerts: You can set up alerts for failed pipelines and errors using Azure Monitor.

6. Pricing Model
Azure Data Factory pricing is based on:
• Data movement: the volume of data moved and the compute used to move it.
• Data transformation: the execution of data transformation activities (e.g., Data Flow runs).
• Pipeline orchestration: the number of pipeline runs and activity runs.

7. Example: Creating a Simple Pipeline


Here’s a simple walkthrough of creating a pipeline in Azure Data Factory that
copies data from an Azure Blob Storage container to an Azure SQL Database.
1. Create a Linked Service for Azure Blob Storage and Azure SQL Database.
2. Create Datasets for the source (blob storage) and sink (SQL Database).
3. Create a Copy Activity in the pipeline to copy data from the source
dataset to the sink dataset.
4. Trigger the pipeline to execute immediately or schedule it.
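Putting the pieces together, here is a sketch of the pipeline JSON for step 3, assuming the source and sink datasets were created as BlobInputDataset and SqlOutputDataset (following the dataset and linked service sketches in section 1):

{
  "name": "CopyBlobToSqlPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [
          { "referenceName": "BlobInputDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "SqlOutputDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}

For step 4, the pipeline can be run on demand from ADF Studio or attached to a trigger such as the schedule trigger sketched in section 1e.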

8. Hands-On Labs and Tutorials


• Microsoft Azure Data Factory Documentation
• Azure Data Factory - Hands-On Labs
Together, the sections above and these resources provide a comprehensive overview of Azure Data Factory and a starting point for hands-on practice.
