ADF - Data Transformation Activities
Activities
- Activities that you can use to transform and process your raw data into predictions and insights at scale.
Stored Procedure Activity
• Allows you to execute a stored procedure in a SQL database as part of a data pipeline.
• Supported Databases: SQL Server, Azure SQL Database, Azure Synapse Analytics, and other databases exposed through linked services.
• Key Features
• Parameter Passing: Pass parameters to the stored procedure from the ADF pipeline.
• Output Handling: The activity does not capture result sets returned by the procedure; use a Lookup activity if you need a procedure's output downstream.
• Integration: Works seamlessly with other ADF activities, such as Copy and Data Flow activities.
• Use Cases
• Data Transformation: Execute complex transformations that are easier to manage in SQL.
• Data Cleaning: Run procedures to clean or aggregate data before further processing.
• Custom Logic: Implement and manage custom business logic directly in the database.
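For illustration, a minimal T-SQL sketch of the kind of procedure such an activity might invoke; the procedure, table, and column names (dbo.usp_aggregate_daily_sales, staging.sales, dbo.daily_sales) are assumptions, not objects from the original material.

-- Hypothetical transformation procedure a Stored Procedure activity could call.
-- @RunDate would be supplied as an activity parameter from the ADF pipeline.
CREATE PROCEDURE dbo.usp_aggregate_daily_sales
    @RunDate DATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove any earlier load for the same day so the procedure can be rerun safely.
    DELETE FROM dbo.daily_sales WHERE sales_date = @RunDate;

    -- Aggregate the raw rows into the reporting table.
    INSERT INTO dbo.daily_sales (sales_date, product_id, total_amount)
    SELECT @RunDate, product_id, SUM(amount)
    FROM staging.sales
    WHERE sales_date = @RunDate
    GROUP BY product_id;
END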
Configure a Stored Procedure activity
7. Create and configure a stored procedure to update the customer table
a. Create the table and schema in the SQL database
b. Create a stored procedure to insert values (a hedged T-SQL sketch follows)
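The original slide showed these objects as a screenshot; as a rough reconstruction, one way the table and insert procedure could look. The dbo.customer columns and the procedure name dbo.usp_insert_customer are assumptions.

-- Assumed customer table; the actual column list was not shown in the slide.
CREATE TABLE dbo.customer
(
    customer_id   INT IDENTITY(1,1) PRIMARY KEY,
    customer_name VARCHAR(100),
    email         VARCHAR(100),
    modified_date DATETIME DEFAULT GETDATE()
);
GO

-- Assumed procedure that inserts a value; the Stored Procedure activity would
-- call it and pass @CustomerName and @Email as activity parameters.
CREATE PROCEDURE dbo.usp_insert_customer
    @CustomerName VARCHAR(100),
    @Email        VARCHAR(100)
AS
BEGIN
    INSERT INTO dbo.customer (customer_name, email)
    VALUES (@CustomerName, @Email);
END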
Databricks Notebook Activity
• Key Features
• Seamless Execution: Run Databricks notebooks directly from ADF pipelines.
• Parameterization: Pass parameters from ADF to Databricks notebooks.
• Integration: Utilize notebook results in downstream ADF activities.
• Use Cases
• Data Transformation: Execute complex transformations that are better suited for Databricks notebooks.
• Machine Learning: Run machine learning models and training scripts written in notebooks.
• Data Integration: Integrate Databricks processing with other ADF activities like data copy and data flow.
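As a rough illustration, a Databricks SQL notebook cell of the kind the Notebook activity might execute; the widget and table names (load_date, bronze.orders, silver.orders_clean) are assumptions. A base parameter passed from the ADF activity surfaces in the notebook as a widget value.

-- Sketch of a Databricks SQL notebook cell. The load_date widget picks up the
-- base parameter passed from the ADF Notebook activity; the default is used
-- when the notebook is run interactively.
CREATE WIDGET TEXT load_date DEFAULT "2024-01-01";

-- Write a cleaned version of the raw orders for the requested load date.
CREATE OR REPLACE TABLE silver.orders_clean AS
SELECT
    order_id,
    customer_id,
    CAST(amount AS DECIMAL(10, 2)) AS amount
FROM bronze.orders
WHERE order_date = getArgument("load_date")
  AND amount IS NOT NULL;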
8. Configure Databricks Activity
1. Search for Notebook in the pipeline Activities pane, and drag a Notebook activity to the
pipeline canvas.
2. Select the new Notebook activity on the canvas if it is not already selected.
3. Select the Azure Databricks tab to select or create a new Azure Databricks linked service that
will execute the Notebook activity.
4. Select the Settings tab and specify the notebook path to be executed on Azure Databricks, optional base parameters to be passed to the notebook, and any additional libraries to be installed on the cluster to execute the job.
Azure Function Activity
• Integrate and execute Azure Functions from within an Azure Data Factory (ADF) pipeline.
• Functionality: Allows for serverless compute tasks, enabling custom processing logic as part of your data
workflows.
• Key Features
• Serverless Execution: Leverage Azure Functions to run code without managing infrastructure.
• Custom Logic: Execute custom logic or APIs directly from your ADF pipeline.
• Parameter Passing: Send parameters to Azure Functions and retrieve results
• Use Cases
• Custom Processing: Implement custom data transformations or processing logic.
• API Calls: Trigger APIs or microservices that are encapsulated in Azure Functions.
• Data Enrichment: Enhance or transform data by executing complex logic in serverless functions.
9. Configure Azure Function Activity
1. Expand the Azure Function section of the pipeline Activities pane, and drag an Azure Function activity to the pipeline canvas.
2. Select the new Azure Function activity on the canvas if it is not already selected, and its Settings tab, to edit its details.
3. If you do not already have an Azure Function linked service defined, select New to create a new one. In the new Azure Function linked service pane, choose your existing Azure Function App URL and provide a Function Key.
4. After selecting the Azure Function linked service, provide the function name and other details to complete the configuration.
10. Copy data from a SQL database to Blob storage and log the pipeline status in a log table
a. Create a copy activity to copy data from SQL to CSV as shown before.
b. Create an audit table in the SQL database to log the execution details:

create table audit.copylog
(
    loadid INT IDENTITY(1,1) PRIMARY KEY,
    tablename varchar(50),
    loadstatus varchar(50),
    dataread varchar(50),
    errorid varchar(50),
    errormessage varchar(50)
)

c. Create a stored procedure for logging the execution details:
CREATE PROCEDURE audit.loadstatus
    @TableName varchar(50),
    @loadstatus varchar(50),
    @dataread varchar(50),
    @errorid varchar(50),
    @errormessage varchar(50)
AS
BEGIN
    INSERT INTO audit.copylog (tablename, loadstatus, dataread, errorid, errormessage)
    VALUES (@TableName, @loadstatus, @dataread, @errorid, @errormessage)
END
d. Attach a Stored Procedure activity to the success output of the copy activity to log the success details.
e. Configure the activity to use the audit.loadstatus stored procedure created above.
f. Pass values for the parameters (the two error expressions apply when a similar logging activity is attached to the failure output):
   Dataread = @activity('copy_data_sql_csv').output.dataRead
   Loadstatus = @activity('copy_data_sql_csv').output.executionDetails[0].status
   Tablename = @concat(pipeline().parameters.schemaname, '_', pipeline().parameters.tablename)
   Errorid = @activity('copy_data_sql_csv').output.errors[0].code
   Errormessage = @activity('copy_data_sql_csv').output.errors[0].message
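To sanity-check the audit objects outside the pipeline, a quick manual test; the sample values below are made up for illustration.

-- Manually exercise the logging procedure with sample values.
EXEC audit.loadstatus
    @TableName    = 'dbo_customer',
    @loadstatus   = 'Succeeded',
    @dataread     = '1048576',
    @errorid      = NULL,
    @errormessage = NULL;

-- Inspect the rows written to the audit table.
SELECT loadid, tablename, loadstatus, dataread, errorid, errormessage
FROM audit.copylog
ORDER BY loadid DESC;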
Summary of Data Transformation Activities
• Mapping Data Flows:
• Visual design of data transformations without coding.
• Executes as activities within pipelines on scalable Spark clusters.
• Integrates with scheduling, control, and monitoring features.
• HDInsight Activities:
• Hive: Execute Hive queries on HDInsight clusters.
• Pig: Execute Pig queries on HDInsight clusters.
• MapReduce: Run MapReduce programs on HDInsight clusters.
• Streaming: Execute Hadoop Streaming programs on HDInsight clusters.
• Spark: Run Spark programs on HDInsight clusters.
• ML Studio (Classic):
• Support ends on August 31, 2024.
• Use Batch Execution to run predictions; update models with the Update Resource activity
• Data Wrangling:
• Code-free data preparation using Power Query.
• Supports iterative, cloud-scale data wrangling via Spark execution.
• Note: Power Query is supported only in Azure Data Factory.
• Stored Procedure Activity:
• Invoke stored procedures in various SQL-based data stores.
• Data Lake Analytics U-SQL Activity:
• Run U-SQL scripts on Data Lake Analytics clusters.
• Azure Synapse & Databricks Activities:
• Synapse Notebook: Run Synapse notebooks.
• Databricks Notebook, Jar, Python: Run notebooks, Jars, and Python scripts on Databricks
clusters.
• Custom Activity:
• Create custom transformations with your own logic using Azure Batch or HDInsight.