M01 - Fundamentals

Module 1:

Data Factory Fundamentals


What is it and why use it?
Resource Components
Common Activities
Execution Dependencies
Azure Data Factory –
What is it?
Why use it?
A Quick History Lesson

Timeline: v1, v2, Data Flows (Alpha)
Dates shown: July 2016, Sept 2017, March 2018, Sept 2020
What is Azure Data Factory (ADF)?

https://azure.microsoft.com/en-gb/services/data-factory
Copy Transform
Resource Components
Data Factory Components

1 Linked Services – What to interact with and how?

SQLDBLinkedService

ConnectionString: Server=MyServer;Database=myDataBase
UserName: “MrPaulAndrew”
Password: ***************
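
As a rough illustration, a Linked Service like the one above could be written down as a JSON-style definition. This is a minimal sketch, assuming the target is an Azure SQL Database (the `AzureSqlDatabase` type is an assumption, the slide only shows the name and connection details). The password is a placeholder.

```python
import json

# Sketch of a linked service definition as a Python dict.
# Assumption: the target is Azure SQL Database, hence "AzureSqlDatabase".
linked_service = {
    "name": "SQLDBLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            # Placeholder credentials only; never hard-code real secrets.
            "connectionString": (
                "Server=MyServer;Database=myDataBase;"
                "User ID=MrPaulAndrew;Password=<placeholder>"
            )
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

In practice the secret would normally come from a secure store rather than sit in the definition.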
Data Factory Components

1 Linked Services

2 Datasets – Where is my data? What format? What file path/table do I need?

[dbo].[SalesOrders]

/RAW/Orders/2018/01/01/SalesOrders.csv
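
The two examples above (a table and a file path) can be sketched as dataset definitions. This is a hedged sketch: the helper names, the dataset name `SalesOrdersTable` and the `AzureSqlTable` type are assumptions for illustration, not taken from the slide.

```python
def table_dataset(name, linked_service, qualified_table):
    """Build a table dataset dict from a name like '[dbo].[SalesOrders]'."""
    schema, table = (part.strip("[]") for part in qualified_table.split("."))
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlTable",  # assumption: an Azure SQL Database table
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"schema": schema, "table": table},
        },
    }


def split_file_path(path):
    """Split a file path into (folder path, file name)."""
    folder, _, file_name = path.strip("/").rpartition("/")
    return folder, file_name


sales_table = table_dataset(
    "SalesOrdersTable", "SQLDBLinkedService", "[dbo].[SalesOrders]"
)
folder, file_name = split_file_path("/RAW/Orders/2018/01/01/SalesOrders.csv")
print(sales_table["properties"]["typeProperties"], folder, file_name)
```

A Dataset always points at a Linked Service; it only adds the "where" and "what format" on top of the connection.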
Data Factory Components

1 Linked Services

2 Datasets
3 Activities – What do we want to happen when we invoke a Linked Service? With what conditions?

Databricks Notebook Activity

notebookPath: /Playground/Playing
baseParameters: Testing
libraries[jar]: dbfs:/lib1.jar
linkedServiceName: BricksOfData01
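
The Databricks Notebook Activity settings above might look like this as a definition. A minimal sketch: the activity name and the `mode` parameter key are hypothetical (the slide shows only the value "Testing").

```python
import json

# Sketch of a Databricks Notebook activity definition.
notebook_activity = {
    "name": "RunPlaygroundNotebook",  # hypothetical activity name
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "BricksOfData01",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/Playground/Playing",
        # The slide gives only the value "Testing"; the key is hypothetical.
        "baseParameters": {"mode": "Testing"},
        "libraries": [{"jar": "dbfs:/lib1.jar"}],
    },
}

print(json.dumps(notebook_activity, indent=2))
```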
Data Factory Components

Extract Transform
1 Linked Services

2 Datasets

3 Activities

4 Pipelines – Logical groups of work that can be executed.

Execute Pipeline Activity
Data Factory Components

Extract
1 Linked Services

2 Datasets

3 Activities

4 Pipelines – Logical groups of work that can be executed.

Transform & Load
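
A Pipeline as a logical group of Activities can be sketched as below. This is an illustrative sketch only: the pipeline name and activity types are assumptions, `typeProperties` are omitted for brevity, and the ordering helper uses Python's standard `graphlib` (Python 3.9+) rather than anything from Data Factory.

```python
from graphlib import TopologicalSorter

# Sketch of a pipeline with two activities; "Transform and Load" waits on "Extract".
pipeline = {
    "name": "LoadSalesOrders",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {"name": "Extract", "type": "Copy", "dependsOn": []},
            {
                "name": "Transform and Load",
                "type": "DatabricksNotebook",
                "dependsOn": [
                    {"activity": "Extract", "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def execution_order(pipeline):
    """Return activity names in an order that respects dependsOn."""
    graph = {
        activity["name"]: {dep["activity"] for dep in activity["dependsOn"]}
        for activity in pipeline["properties"]["activities"]
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(pipeline))
```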
Data Factory Components

Extract Transform
1 Linked Services

2 Datasets

3 Activities

4 Pipelines

5 Triggers – Telling our pipelines when to run.

Manually
Programmatically
Schedule
Tumbling Windows
File Event
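
Of the trigger options above, Tumbling Windows is the least self-explanatory: time is cut into fixed-size, contiguous, non-overlapping intervals and the pipeline runs once per interval. The sketch below only illustrates that idea in plain Python; the function name is hypothetical and this is not how Data Factory implements it.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


windows = list(
    tumbling_windows(datetime(2018, 1, 1), datetime(2018, 1, 4), timedelta(days=1))
)
for window_start, window_end in windows:
    print(window_start.date(), "to", window_end.date())
```

Each window end is the next window start, so no period is processed twice or missed.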
Data Factory Control Flow Components

1 Linked Services

2 Datasets

3 Activities

4 Pipelines

5 Triggers
Common Activities

Paul’s Favourites
Data Factory Common Activities

1 Linked Services

2 Datasets

3 Activities

4 Pipelines

5 Triggers
Copy

Dataset (Source)
Copy Data
Dataset (Sink)

Auto Scaling
Transactional Restarts
Handle Zip Compression
Attribute Mapping and Schema Drift
Handle Failed Rows
Add Custom Attributes
Parse Excel & JSON Files
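
A Copy activity ties a source Dataset to a sink Dataset. The definition below is a minimal sketch, assuming a delimited text file as the source and an Azure SQL table as the sink; the activity and dataset names are hypothetical.

```python
import json

# Sketch of a Copy activity: one input dataset (source), one output dataset (sink).
copy_activity = {
    "name": "CopySalesOrders",  # hypothetical activity name
    "type": "Copy",
    "inputs": [{"referenceName": "SalesOrdersCsv", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SalesOrdersTable", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # assumption: CSV source
        "sink": {"type": "AzureSqlSink"},  # assumption: Azure SQL sink
    },
}

print(json.dumps(copy_activity, indent=2))
```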
Lookup
Get value to support other control flow activities
Single Value or Many Values

Dataset
Lookup
[array]

https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
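
The single value versus many values choice can be mimicked in plain Python. This sketch assumes the output shapes described in the linked documentation (a `firstRow` object for a single row, a `count` plus a `value` array for many rows); the function and the sample rows are hypothetical.

```python
def lookup(rows, first_row_only=True):
    """Mimic the shape of a Lookup result for one row or many rows."""
    if first_row_only:
        # Single value: only the first row is returned.
        return {"firstRow": rows[0] if rows else {}}
    # Many values: an array that later activities can iterate over.
    return {"count": len(rows), "value": rows}


rows = [{"TableName": "SalesOrders"}, {"TableName": "Customers"}]
single = lookup(rows)
many = lookup(rows, first_row_only=False)
print(single, many)
```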
ForEach

Scaling Out Control Flow Activities

IsSequential: true

Many Values [array]
[0] [1] [2] [3] [i]

ForEach

Lookup [array]
[0] [1] [2] [3] [4] [5] [6] [i]

Copy Data
Do Stuff

@item().
Batch Count Default: 20
Batch Count Max: 50

https://docs.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity
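
The sequential versus batched behaviour above can be approximated in plain Python. This is only an analogy using a thread pool, with the default of 20 and maximum of 50 taken from the slide; `for_each` and its parameters are hypothetical names, not a Data Factory API.

```python
from concurrent.futures import ThreadPoolExecutor


def for_each(items, activity, is_sequential=False, batch_count=20):
    """Run `activity` once per item, one at a time or in parallel batches."""
    if not 1 <= batch_count <= 50:
        raise ValueError("batch_count must be between 1 and 50")
    if is_sequential:
        # IsSequential: true, so items are handled strictly in order.
        return [activity(item) for item in items]
    # Otherwise up to batch_count items are in flight at once.
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        return list(pool.map(activity, items))


tables = ["SalesOrders", "Customers", "Products"]
results = for_each(tables, lambda item: f"Copy Data: {item}")
print(results)
```

Inside the loop body, the current element plays the role of `@item()`.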
Switch

@case Switch

Default: Raise Error
Dev: Run Notebook on Small Cluster
Test: Run Notebook on Medium Cluster
Prod: Run Notebook on Big Cluster

https://mrpaulandrew.com/2020/01/22/using-the-azure-data-factory-switch-activity/
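
The Switch cases above behave like a lookup table with a default branch. A minimal Python sketch of that logic, with hypothetical function names:

```python
def run_notebook(cluster_size):
    """Stand-in for running a notebook on a cluster of the given size."""
    return f"Run Notebook on {cluster_size} Cluster"


# One branch per case, mirroring the slide.
CASES = {
    "Dev": lambda: run_notebook("Small"),
    "Test": lambda: run_notebook("Medium"),
    "Prod": lambda: run_notebook("Big"),
}


def switch(environment):
    """Run the matching case, or raise an error for the default branch."""
    case = CASES.get(environment)
    if case is None:
        # Default: Raise Error
        raise ValueError(f"Unknown environment: {environment}")
    return case()


print(switch("Prod"))
```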
Execute Pipeline

Call Child Pipeline
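
Calling a child pipeline could be described with a definition like the one below. A hedged sketch: the activity and pipeline names are hypothetical, and the empty `parameters` object is just a placeholder.

```python
import json

# Sketch of an Execute Pipeline activity that calls a child pipeline.
execute_pipeline_activity = {
    "name": "CallChildPipeline",  # hypothetical activity name
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {
            "referenceName": "ChildPipeline",  # hypothetical child pipeline
            "type": "PipelineReference",
        },
        # Wait for the child to finish before the parent continues.
        "waitOnCompletion": True,
        "parameters": {},
    },
}

print(json.dumps(execute_pipeline_activity, indent=2))
```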
Custom
Extend Data Factory with Custom Code
Reference Objects
Datasets: []
Linked Services: []

Custom

Linked Services
Azure Batch ???
Azure Blob Storage

https://mrpaulandrew.com/2018/11/12/creating-an-azure-data-factory-v2-custom-activity/
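
Putting the pieces above together, a Custom activity runs on Azure Batch, reads its code from Azure Blob Storage, and can be handed extra Datasets and Linked Services. This is a minimal sketch; every name, the command and the folder path are hypothetical.

```python
import json

# Sketch of a Custom activity definition.
custom_activity = {
    "name": "RunCustomCode",  # hypothetical activity name
    "type": "Custom",
    # Compute: the Azure Batch linked service (hypothetical name).
    "linkedServiceName": {
        "referenceName": "AzureBatchLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "command": "python main.py",  # hypothetical command
        # Storage holding the code: Azure Blob Storage (hypothetical name).
        "resourceLinkedService": {
            "referenceName": "BlobStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        "folderPath": "customactivity/app",  # hypothetical path
        # Extra objects passed through to the custom code.
        "referenceObjects": {"linkedServices": [], "datasets": []},
    },
}

print(json.dumps(custom_activity, indent=2))
```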
Azure Function
Web
Web Hook

Extend Data Factory with REST Calls

Methods: GET, POST, PUT, etc...
Headers
Body

Do Stuff ???
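
A Web activity boils down to a URL, a method, headers and an optional body. The helper below is a hedged sketch of such a definition; the helper name, the activity name and the `example.com` URL are hypothetical.

```python
import json


def web_activity(name, url, method, headers=None, body=None):
    """Build a Web activity dict; the body is only included when given."""
    type_properties = {"url": url, "method": method, "headers": headers or {}}
    if body is not None:
        type_properties["body"] = body
    return {"name": name, "type": "WebActivity", "typeProperties": type_properties}


do_stuff = web_activity(
    "DoStuff",
    "https://example.com/api/do-stuff",  # hypothetical endpoint
    "POST",
    headers={"Content-Type": "application/json"},
    body={"message": "Do Stuff"},
)

print(json.dumps(do_stuff, indent=2))
```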
Execution Dependencies
Execution Dependency Options

Get Values

Success
Fail
Complete
Skip
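
In an activity definition, these four options are expressed as dependency conditions on the upstream activity. The mapping below is a sketch, assuming the condition names `Succeeded`, `Failed`, `Completed` and `Skipped`; the helper function is hypothetical.

```python
# Slide label -> dependency condition name (assumed spelling).
CONDITIONS = {
    "Success": "Succeeded",
    "Fail": "Failed",
    "Complete": "Completed",
    "Skip": "Skipped",
}


def depends_on(activity, option):
    """Build one dependsOn entry for the given upstream activity and option."""
    return {"activity": activity, "dependencyConditions": [CONDITIONS[option]]}


# An error handler that runs only when "Get Values" fails.
error_handler = {
    "name": "Error Handler",
    "dependsOn": [depends_on("Get Values", "Fail")],
}

print(error_handler)
```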
Execution On Failure

Get Values

Error Handler
Execution On Failure or On Success

Do Stuff

Get Values

Error Handler
Execution On ???

Get Values
Do Stuff
Run Stored Procedure

AND AND
Error Handler
Execution On Failure or On Success

Get Values
Do Stuff
Run Stored Procedure

OR OR OR
Error Handler Error Handler Error Handler
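
The difference between the AND and OR layouts can be checked with a small simulation. This is a simplified model, assuming one condition per dependency and that every dependency on an activity must be met before it runs; the function names and the sample statuses are hypothetical.

```python
def condition_met(condition, status):
    """Check one dependency condition against an upstream activity status."""
    if condition == "Completed":
        # Completed covers both outcomes of an activity that actually ran.
        return status in ("Succeeded", "Failed")
    return condition == status


def should_run(dependencies, statuses):
    """An activity runs only if ALL of its dependencies are met (AND)."""
    return all(
        condition_met(condition, statuses[activity])
        for activity, condition in dependencies
    )


# One shared Error Handler wired to all three failure outputs: AND.
shared_handler = [
    ("Get Values", "Failed"),
    ("Do Stuff", "Failed"),
    ("Run Stored Procedure", "Failed"),
]
# One Error Handler per activity: effectively OR.
separate_handlers = [[dependency] for dependency in shared_handler]

statuses = {
    "Get Values": "Succeeded",
    "Do Stuff": "Failed",
    "Run Stored Procedure": "Skipped",
}

shared_runs = should_run(shared_handler, statuses)
any_separate_runs = any(should_run(h, statuses) for h in separate_handlers)
print(shared_runs, any_separate_runs)
```

With a single failure in the chain, the shared handler never fires, while one of the separate handlers does.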
Module 1:
Data Factory Fundamentals
What is it and why use it?
Resource Components
Common Activities
Execution Dependencies
