Azure Data Factory Overview With Real-Time Example
Azure Data Factory (ADF) is a cloud-based data integration service from Microsoft Azure that lets you create, schedule, and orchestrate ETL workflows. It supports seamless data movement between a wide range of data stores, and it offers both code-based and no-code options for building ETL pipelines. Its key components are:
1. **Pipelines**: logical groupings of activities that together perform a unit of work.
2. **Activities**: the individual processing steps within a pipeline, such as a Copy activity.
3. **Datasets**: named references to the data structures you read from or write to.
4. **Linked Services**: connection definitions to external data stores and compute services.
5. **Triggers**: scheduling units that determine when a pipeline run is kicked off.
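All of these objects live inside a Data Factory resource, which itself can be created from code. Below is a minimal, hedged sketch (assuming the `azure-identity` and `azure-mgmt-datafactory` packages, plus placeholder subscription, resource group, factory name, and region) of provisioning the factory with the Python SDK:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

# Authenticate and create the management client
credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, 'your-subscription-id')

# Create (or update) the Data Factory instance that will host
# pipelines, activities, datasets, linked services, and triggers
factory = adf_client.factories.create_or_update(
    'your-resource-group', 'your-data-factory-name',
    Factory(location='eastus'))
print(factory.provisioning_state)
```

Once the factory exists, the walkthrough below creates its components one by one.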
Coding Real-Time Example: Loading Data from Azure Blob to Azure SQL Database
In this example, we'll move data from an Azure Blob Storage container to an Azure SQL Database using an ADF pipeline. We will also demonstrate the code that automates this pipeline creation using the Azure Python SDK.
### Steps:
1. **Create Linked Services**: Define connections for the Azure Blob Storage (source) and Azure
SQL Database (destination).
2. **Create Datasets**: Define the source dataset (Blob storage) and the target dataset (SQL table).
3. **Define Activities**: Create a Copy activity to transfer data from Blob to SQL.
```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService, AzureSqlDatabaseLinkedService,
    DatasetResource, AzureBlobDataset, SqlServerTableDataset, LinkedServiceReference,
    DatasetReference, CopyActivity, BlobSource, SqlSink, PipelineResource,
)

# Set up credentials
credential = DefaultAzureCredential()
subscription_id = 'your-subscription-id'
resource_group_name = 'your-resource-group'
data_factory_name = 'your-data-factory-name'

# Initialize client
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Linked service for the source Blob Storage account
linked_service_blob = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(connection_string="Blob-Connection-String"))
adf_client.linked_services.create_or_update(
    resource_group_name, data_factory_name, 'AzureBlobStorage', linked_service_blob)

# Linked service for the destination Azure SQL Database
linked_service_sql = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string="SQL-Connection-String"))
adf_client.linked_services.create_or_update(
    resource_group_name, data_factory_name, 'AzureSqlDatabase', linked_service_sql)

# Source dataset pointing at the Blob container
dataset_blob = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureBlobStorage")))
adf_client.datasets.create_or_update(
    resource_group_name, data_factory_name, 'BlobInputDataset', dataset_blob)

# Sink dataset pointing at the target SQL table
dataset_sql = DatasetResource(
    properties=SqlServerTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureSqlDatabase"),
        table_name="dbo.SalesData"))
adf_client.datasets.create_or_update(
    resource_group_name, data_factory_name, 'SqlOutputDataset', dataset_sql)

# Copy activity that moves data from Blob to SQL
copy_activity = CopyActivity(
    name="CopyFromBlobToSQL",
    inputs=[DatasetReference(type="DatasetReference", reference_name="BlobInputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SqlOutputDataset")],
    source=BlobSource(),
    sink=SqlSink())

# Pipeline containing the copy activity
pipeline_resource = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    resource_group_name, data_factory_name, 'CopyPipeline', pipeline_resource)
```
This Python code uses Azure's Data Factory SDK to create a pipeline that reads data from an Azure Blob Storage container and writes it to an Azure SQL Database. It defines linked services for the storage account and the database, creates datasets for the input and output, and finally orchestrates the data copy using a `CopyActivity`. You can monitor the pipeline through the Azure portal or programmatically with the same SDK.
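As a minimal sketch of the programmatic route (reusing `adf_client`, the resource group, and the factory name from the example above, and assuming the run is started on demand), you can kick off a run and poll its status:

```python
import time

# Start an on-demand run of the pipeline created above
run_response = adf_client.pipelines.create_run(
    resource_group_name, data_factory_name, 'CopyPipeline', parameters={})

# Poll the run until it reaches a terminal state
while True:
    run = adf_client.pipeline_runs.get(
        resource_group_name, data_factory_name, run_response.run_id)
    if run.status not in ('Queued', 'InProgress'):
        break
    time.sleep(15)

print(f"Pipeline run finished with status: {run.status}")
```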
Key advantages of driving ADF from code:
- **Automation**: Using SDKs, you can automate pipeline creation and updates, making deployments repeatable and scalable across environments.
- **Monitoring**: Easily integrate monitoring and alerting systems with code for real-time failure
detection.
Azure Data Factory supports various SDKs (Python, .NET, etc.) and can be used in combination
with other Azure services like Azure Functions, Logic Apps, and Event Grid for more advanced
scenarios.
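For example, the triggers listed among the key components can also be managed from code. The snippet below is a hedged sketch (reusing `adf_client` and the names from above, and assuming the ScheduleTrigger models in `azure-mgmt-datafactory`; exact model arguments and the `begin_start`/`start` method name vary slightly between SDK versions) that schedules the pipeline to run every hour:

```python
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import (
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference,
)

# Recurrence: run once an hour, starting a few minutes from now
recurrence = ScheduleTriggerRecurrence(
    frequency='Hour', interval=1,
    start_time=datetime.utcnow() + timedelta(minutes=5), time_zone='UTC')

# Attach the trigger to the CopyPipeline created earlier
trigger = TriggerResource(properties=ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(
            type="PipelineReference", reference_name='CopyPipeline'),
        parameters={})]))

adf_client.triggers.create_or_update(
    resource_group_name, data_factory_name, 'HourlyCopyTrigger', trigger)

# Triggers are created in a stopped state and must be started explicitly
# (older SDK versions expose this as .start instead of .begin_start)
adf_client.triggers.begin_start(
    resource_group_name, data_factory_name, 'HourlyCopyTrigger').result()
```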