Start to Finish with Azure Data Factory

Andy Roberts
Data Architect
Session Objectives and Takeaways
Session Objectives
Understand where ADF fits in Cortana Analytics
Understand how ADF works, and its components
Be able to deploy and manage a simple ADF implementation

Key Takeaway:
ADF can be used in real-world data pipeline scenarios, quickly and easily
Cortana Analytics
A suite of products that allows you to predict outcomes, prescribe actions, and automate decisions
Cortana Analytics Stack
Cortana
Power BI
Azure Stream Analytics
Azure HDInsight
Azure Machine Learning
Azure SQL DB, Data Warehouse, DocumentDB
Azure Data Lake
Azure Event Hubs
Azure Data Catalog
Azure Data Factory
Microsoft Azure
Operationalized Analytic Solutions
[Diagram: DATA -> INTELLIGENCE -> ACTION. Data comes from business apps, custom apps, and sensors and devices. Information Management: Azure Data Factory, Azure Data Catalog, Azure Event Hub. Big Data Stores: Azure Storage, Azure Data Lake, Azure SQL Data Warehouse. Machine Learning and Analytics: Azure Machine Learning, Azure HDInsight (Hadoop), Azure Stream Analytics. Dashboards and Visualizations: Power BI. Personal Digital Assistant: Cortana. Perceptual Intelligence: face, vision, speech, text. Business Scenarios: recommendations, customer churn, forecasting, etc. Action is taken by people and automated systems.]


Advanced Analytics Life Cycle
[Diagram stages: Ingest, Store, Catalog, Discover, Transform, Analyze, Act, Orchestrate]
Cortana Analytics Process: https://tinyurl.com/caproc
Azure Data Factory
Create, orchestrate, and manage data movement and enrichment through the cloud
ADF Components
ADF Logical Flow
ADF Process
1. Define Architecture: Set up objectives and flow
2. Create the Data Factory: Portal, PowerShell, VS
3. Create Linked Services: Connections to Data and Services
4. Create Datasets: Input and Output
5. Create Pipeline: Define Activities
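
Steps 3 to 5 build on one another: datasets refer to linked services, and pipeline activities refer to datasets. The sketch below is one way to check those references before deploying. It is not part of ADF itself; the function name and every object name are hypothetical, and the `linkedServiceName`, `inputs`, and `outputs` keys are assumed from the ADF JSON format.

```python
def missing_references(linked_services, datasets, pipeline):
    """Return (kind, name) pairs that are referenced but never defined."""
    known_services = {ls["name"] for ls in linked_services}
    known_datasets = {ds["name"] for ds in datasets}
    missing = []

    # Step 4 depends on step 3: each dataset names a linked service.
    for ds in datasets:
        service = ds["properties"]["linkedServiceName"]
        if service not in known_services:
            missing.append(("linked service", service))

    # Step 5 depends on step 4: each activity names its input and output datasets.
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            if ref["name"] not in known_datasets:
                missing.append(("dataset", ref["name"]))
    return missing


# Hypothetical definitions, trimmed to the fields the check reads.
linked_services = [{"name": "StorageLinkedService"}]
datasets = [
    {"name": "RawWebLogs", "properties": {"linkedServiceName": "StorageLinkedService"}},
]
pipeline = {
    "name": "WebLogPipeline",
    "properties": {
        "activities": [
            {
                "name": "TransformLogs",
                "inputs": [{"name": "RawWebLogs"}],
                "outputs": [{"name": "ProcessedWebLogs"}],  # not defined above
            }
        ]
    },
}

problems = missing_references(linked_services, datasets, pipeline)
print(problems)
```

Here the check reports the output dataset, because the pipeline names it before step 4 has created it.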
1. Design Process
Define data sources, processing requirements, and outputs, as well as management and monitoring
Example - Churn
Azure Data Factory concepts:
Data Set: a collection of files, DB table, etc.
Activity: a processing step (Hadoop job, custom code, ML model, etc.)
Pipeline: a logical group of activities

[Diagram: Data Sources -> Ingest -> Transform & Analyze -> Publish. On-premises Call Log Files and Customer Table are ingested into Azure Blob Storage. Transform, Combine, etc. produces Customer Call Details. Analyze produces Customers Likely to Churn. Move loads the Customer Churn Table into an Azure DB data mart. Act (Visualize).]
Our ADF:
Business Goal: Transform and analyze web logs each month
Design Process: Transform raw web logs stored in a temporary location, using a Hive query, and store the results in Blob Storage
[Diagram: Web logs in HDFS file store -> Files ready for analysis and use in AzureML]
2. Create the Data Factory
Portal, PowerShell, and Visual Studio
Using the Portal
Use in Non-MS Clients
Use for Exploration
Use when teaching or in
Using PowerShell
Use in MS Clients
Use for Automation
Use for quick set up and tear down
PowerShell ADF Example
1. Run Add-AzureAccount and enter the user name and password.
2. Run Get-AzureSubscription to view all the subscriptions for this account.
3. Run Select-AzureSubscription to select the subscription that you want to work with.
4. Run Switch-AzureMode AzureResourceManager
5. Run New-AzureResourceGroup -Name ADFTutorialResourceGroup -Location "West US"
6. Run New-AzureDataFactory -ResourceGroupName ADFTutorialResourceGroup -Name DataFactory(your
Using Visual Studio
Use in mature dev environments
Use when integrated into
3. Create Linked Services
Connection to Data or Connection to Compute Resource
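
A linked service can be sketched as a small JSON document, one for a data connection and one for a compute connection. The Python below builds both as dictionaries and prints the JSON. The type names `AzureStorage` and `HDInsightOnDemand`, the property names, and all object names are assumptions based on the ADF JSON format rather than anything shown on these slides, so check them against the current schema before use.

```python
import json


def storage_linked_service(name, account_name, account_key):
    """Connection to data: a storage account, identified by a connection string."""
    connection_string = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};AccountKey={account_key}"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureStorage",  # assumed type name
            "typeProperties": {"connectionString": connection_string},
        },
    }


def on_demand_hdinsight_linked_service(name, storage_service_name, nodes=1):
    """Connection to compute: a Hadoop cluster created on demand for each run."""
    return {
        "name": name,
        "properties": {
            "type": "HDInsightOnDemand",  # assumed type name
            "typeProperties": {
                "clusterSize": nodes,
                "timeToLive": "00:30:00",  # assumed idle time before the cluster is removed
                "linkedServiceName": storage_service_name,
            },
        },
    }


# Placeholder credentials; real values belong in a secure store, not in source.
storage = storage_linked_service("StorageLinkedService", "<accountname>", "<accountkey>")
compute = on_demand_hdinsight_linked_service("HDInsightOnDemandLinkedService", storage["name"])

print(json.dumps(storage, indent=2))
print(json.dumps(compute, indent=2))
```

The compute definition refers to the storage definition by name, which is why the storage linked service is created first.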
Data Options
Source | Sink
Blob | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, OnPrem File System, Data Lake Store
Table | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, Data Lake Store
SQL Database | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, Data Lake Store
SQL Data Warehouse | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, Data Lake Store
DocumentDB | Blob, Table, SQL Database, SQL Data Warehouse, Data Lake Store
Data Lake Store | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, OnPrem File System, Data Lake Store
SQL Server on IaaS | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem File System | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, OnPrem File System, Data Lake Store
OnPrem SQL Server | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem Oracle Database | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem MySQL Database | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem DB2 Database | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
... | Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, ...
Activity Options
Transformation activity | Compute environment
Hive | HDInsight [Hadoop]
Pig | HDInsight [Hadoop]
MapReduce | HDInsight [Hadoop]
Hadoop Streaming | HDInsight [Hadoop]
Machine Learning activities: Batch Execution and Update Resource | Azure VM
Stored Procedure | Azure SQL
Data Lake Analytics U-SQL | Azure Data Lake Analytics
DotNet | HDInsight [Hadoop] or Azure Batch
4. Create Datasets
Named reference or pointer to
Dataset Concepts
{
    "name": "<name of dataset>",
    "properties":
    {
        "structure": [ ],
        "type": "<type of dataset>",
        "external": <boolean flag to indicate external data>,
        "typeProperties":
        {
        },
        "availability":
        {
        },
        "policy":
        {
        }
    }
}
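
To make the skeleton concrete, the following sketch fills it in for the monthly web log scenario. It is an illustration under stated assumptions: the `AzureBlob` type, the `linkedServiceName` and `folderPath` properties, and the `frequency`/`interval` keys are taken from the ADF JSON format as commonly documented, not from these slides, and the dataset names and folder paths are hypothetical.

```python
import json


def blob_dataset(name, linked_service, folder_path,
                 frequency="Month", interval=1, external=False):
    """Fill in the dataset skeleton for files in blob storage."""
    return {
        "name": name,
        "properties": {
            "structure": [],                      # optional column list, left empty here
            "type": "AzureBlob",                  # assumed type name
            "linkedServiceName": linked_service,  # assumed key; ties the dataset to a connection
            "external": external,                 # True when no pipeline in this factory produces the data
            "typeProperties": {"folderPath": folder_path},
            "availability": {"frequency": frequency, "interval": interval},
            "policy": {},
        },
    }


# Hypothetical input and output datasets for the web log example.
raw_logs = blob_dataset("RawWebLogs", "StorageLinkedService", "weblogs/raw", external=True)
processed = blob_dataset("ProcessedWebLogs", "StorageLinkedService", "weblogs/processed")

print(json.dumps(raw_logs, indent=2))
print(json.dumps(processed, indent=2))
```

The input is marked external because it arrives from outside the factory, while the output is produced by a pipeline activity.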
5. Create Pipelines
Logical Grouping of Activities
Pipeline Concepts
{
    "name": "PipelineName",
    "properties":
    {
        "description" : "pipeline description",
        "activities":
        [
        ],
        "start": "<start date-time>",
        "end": "<end date-time>"
    }
}
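
As a sketch of how the empty `activities` array might be filled, the code below builds a pipeline with a single Hive activity for the web log example. The `HDInsightHive` type and the `scriptPath`, `scriptLinkedService`, and `scheduler` properties are assumed from the ADF JSON format. Every name, path, and date is hypothetical.

```python
import json


def hive_activity(name, script_path, inputs, outputs, compute_service, script_service):
    """One processing step: run a Hive script on the linked compute resource."""
    return {
        "name": name,
        "type": "HDInsightHive",               # assumed activity type name
        "linkedServiceName": compute_service,  # where the script runs
        "inputs": [{"name": n} for n in inputs],
        "outputs": [{"name": n} for n in outputs],
        "typeProperties": {
            "scriptPath": script_path,             # location of the .hql file in storage
            "scriptLinkedService": script_service,
        },
        "scheduler": {"frequency": "Month", "interval": 1},
    }


def pipeline(name, description, activities, start, end):
    """Wrap activities in the pipeline skeleton with its active period."""
    return {
        "name": name,
        "properties": {
            "description": description,
            "activities": activities,
            "start": start,
            "end": end,
        },
    }


transform = hive_activity(
    "TransformLogs",
    "scripts/transformlogs.hql",
    inputs=["RawWebLogs"],
    outputs=["ProcessedWebLogs"],
    compute_service="HDInsightOnDemandLinkedService",
    script_service="StorageLinkedService",
)
web_log_pipeline = pipeline(
    "WebLogPipeline",
    "Transform raw web logs with Hive each month",
    [transform],
    start="2016-01-01T00:00:00Z",
    end="2016-04-01T00:00:00Z",
)

print(json.dumps(web_log_pipeline, indent=2))
```

The scheduler here matches the monthly availability of the output dataset, so each run produces one month of results.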
6. Manage and Monitor
Scheduling, Monitoring, Disposition
Locating Failures within a Pipeline
2015 Microsoft Corporation. All rights reserved.