
Azure Data Factory Overview with Coding Real-Time Example

Azure Data Factory (ADF) is a cloud-based data integration service from Microsoft Azure that lets you create, schedule, and orchestrate ETL workflows. It supports seamless data flow between various data sources and destinations, and it offers both code-based and no-code options for building ETL pipelines.

Key Components of Azure Data Factory:

1. Pipelines

2. Activities

3. Datasets

4. Linked Services

5. Triggers

6. Integration Runtime (IR)
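
Of these, the pipeline example below exercises the first four components directly. As a brief, hedged illustration of the Triggers component, a schedule trigger could be attached to a published pipeline with the same Python SDK. This is only a sketch: it reuses the `adf_client`, `resource_group_name`, and `data_factory_name` defined in the example that follows, 'HourlyCopyTrigger' is a hypothetical name, 'CopyPipeline' refers to the pipeline created later, and exact model signatures can vary between azure-mgmt-datafactory versions.

```python
from datetime import datetime, timezone

from azure.mgmt.datafactory.models import (
    PipelineReference,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    TriggerResource,
)

# Recurrence: run once per hour, starting at the given UTC time
recurrence = ScheduleTriggerRecurrence(
    frequency="Hour",
    interval=1,
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    time_zone="UTC")

# Attach the trigger to the pipeline created in the example below
# ('HourlyCopyTrigger' is a hypothetical trigger name)
trigger = TriggerResource(properties=ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(
            type="PipelineReference", reference_name="CopyPipeline"))]))

adf_client.triggers.create_or_update(
    resource_group_name, data_factory_name, 'HourlyCopyTrigger', trigger)
```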

Coding Real-Time Example: Loading Data from Azure Blob to Azure SQL Database

In this example, we'll move data from Azure Blob Storage to an Azure SQL Database using Azure Data Factory. We will also demonstrate the code that automates this pipeline creation using the Azure Python SDK.

### Steps:

1. **Create Linked Services**: Define connections for the Azure Blob Storage (source) and Azure SQL Database (destination).

2. **Create Datasets**: Define the source dataset (Blob storage) and the target dataset (SQL table).

3. **Define Activities**: Create a Copy activity to transfer data from Blob to SQL.

4. **Pipeline Creation**: Use Python SDK to orchestrate the ETL pipeline.

### Python Code Example for Pipeline Creation

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import *

# Set up credentials
credential = DefaultAzureCredential()
subscription_id = 'your-subscription-id'
resource_group_name = 'your-resource-group'
data_factory_name = 'your-data-factory-name'

# Initialize client
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Create Linked Service for Blob Storage (source connection)
linked_service_blob = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(connection_string="Blob-Connection-String"))
adf_client.linked_services.create_or_update(
    resource_group_name, data_factory_name, 'AzureBlobStorage', linked_service_blob)

# Create Linked Service for Azure SQL Database (destination connection)
linked_service_sql = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string="SQL-Connection-String"))
adf_client.linked_services.create_or_update(
    resource_group_name, data_factory_name, 'AzureSqlDatabase', linked_service_sql)

# Create Dataset for the source CSV file in Blob Storage
dataset_blob = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureBlobStorage"),
        folder_path="input-folder",
        file_name="data.csv"))
adf_client.datasets.create_or_update(
    resource_group_name, data_factory_name, 'BlobInputDataset', dataset_blob)

# Create Dataset for the destination Azure SQL table
dataset_sql = DatasetResource(
    properties=AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureSqlDatabase"),
        table_name="dbo.SalesData"))
adf_client.datasets.create_or_update(
    resource_group_name, data_factory_name, 'SqlOutputDataset', dataset_sql)

# Define Copy Activity that moves data from the Blob dataset to the SQL dataset
copy_activity = CopyActivity(
    name="CopyFromBlobToSQL",
    inputs=[DatasetReference(type="DatasetReference", reference_name="BlobInputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SqlOutputDataset")],
    source=BlobSource(),
    sink=SqlSink())

# Create Pipeline with the Copy Activity
pipeline_resource = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    resource_group_name, data_factory_name, 'CopyPipeline', pipeline_resource)

# Trigger Pipeline Run
run_response = adf_client.pipelines.create_run(
    resource_group_name, data_factory_name, 'CopyPipeline')
```

This Python code uses Azure's Data Factory SDK to create a pipeline that reads data from an Azure Blob Storage container and writes it to an Azure SQL Database. It defines linked services for the storage account and the database, creates datasets for the input and output, and finally orchestrates the data copy using a `CopyActivity`. You can monitor the pipeline using Azure's web interface or programmatically by querying the status of the pipeline run.
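
As a minimal sketch of that programmatic monitoring, the run status can be polled with the same client using the `run_response` returned above (the 30-second sleep is just an arbitrary pause to give the run time to start):

```python
import time

# Give the run a moment to start, then look it up by its run ID
time.sleep(30)
pipeline_run = adf_client.pipeline_runs.get(
    resource_group_name, data_factory_name, run_response.run_id)
print(f"Pipeline run status: {pipeline_run.status}")  # e.g. Queued, InProgress, Succeeded, Failed
```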

### Benefits of Coding in Azure Data Factory

- **Automation**: Using SDKs, you can automate pipeline creation and updates, making it scalable for large deployments.

- **Flexibility**: Allows fine-tuned control over the pipeline configuration.

- **Monitoring**: Easily integrate monitoring and alerting systems with code for real-time failure detection.

Azure Data Factory supports various SDKs (Python, .NET, etc.) and can be used in combination with other Azure services like Azure Functions, Logic Apps, and Event Grid for more advanced scenarios.
