Azure Data Factory Interview Questions & Answers
3. What are the different types of Integration Runtimes (IR) in ADF?
💡 Answer:
Azure Data Factory supports three types of Integration Runtimes:
🔹 Azure IR – Used for cloud-based data movement & transformation
🔹 Self-Hosted IR – Used for on-prem or private network data access
🔹 Azure-SSIS IR – Used to run SSIS packages in the cloud
5. How do you move data from On-Prem to Azure using ADF?
💡 Answer:
To move data from an on-premises database to Azure, follow these steps:
✅ Install Self-Hosted Integration Runtime (SHIR) on an on-prem server
✅ Create a Linked Service in ADF to connect to the on-prem database
✅ Use Copy Data Activity to move data to Azure (ADLS, Blob, SQL, Synapse)
✅ Schedule pipeline execution using triggers
📌 Example Use Case: Cleansing & transforming raw data before loading it into Azure
Synapse Analytics.
🔹 ADF is a modern, scalable, cloud-native alternative to SSIS for hybrid data movement &
orchestration.
1. ETL Pipeline: Moving Sales Data from On-Prem SQL Server to Azure Synapse Analytics
🔹 Solution Approach:
✅ Extract: Copy sales data from SQL Server to Azure Data Lake (ADLS Gen2).
✅ Transform: Use Mapping Data Flow or Databricks for cleansing.
✅ Load: Store the transformed data into Azure Synapse for Power BI reporting.
🔹 Step-by-Step Implementation:
1️⃣ Create a Linked Service to connect to SQL Server (on-prem).
2️⃣ Use Self-Hosted IR to securely move data from on-prem to cloud.
3️⃣ Copy Data Activity → Move data to Azure Data Lake Storage (ADLS Gen2).
4️⃣ Mapping Data Flow → Clean missing values, format dates, and filter records.
5️⃣ Load Data into Azure Synapse Analytics for BI reporting.
6️⃣ Schedule Pipeline Execution using Schedule Trigger (runs daily at midnight).
📌 Example Query for Incremental Load:
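A minimal sketch of a watermark-based source query for the Copy Data Activity; the table and column names (dbo.Sales, LastModifiedDate, dbo.WatermarkTable) are illustrative placeholders:
-- Pull only the rows changed since the last recorded watermark
SELECT *
FROM dbo.Sales
WHERE LastModifiedDate > (
    SELECT WatermarkValue
    FROM dbo.WatermarkTable
    WHERE TableName = 'Sales'
);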
✔ Outcome: Automated ETL pipeline keeps data updated in Azure Synapse for Power BI
reports.
2. Data Migration: Moving Data from On-Prem SQL Server to Azure
🔹 Scenario:
A financial company wants to migrate historical data from on-prem SQL Server to Azure
SQL Database.
🔹 Solution Approach:
✅ Extract: Read data from on-prem SQL Server.
✅ Transfer: Use Self-Hosted IR to securely move data to Azure.
✅ Load: Store data in Azure SQL Database with incremental updates.
🔹 Step-by-Step Implementation:
1️⃣ Install Self-Hosted Integration Runtime (SHIR) on an on-prem machine.
2️⃣ Create a Linked Service to connect SQL Server and Azure SQL Database.
3️⃣ Use Copy Data Activity to transfer data.
4️⃣ Enable Incremental Load using Watermark Columns (see the example below).
5️⃣ Monitor & Log Pipeline Runs using Azure Monitor.
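📌 Example: Watermark Table and Post-Load Update
A minimal sketch of the watermark pattern from step 4; table and column names (dbo.WatermarkTable, dbo.Transactions, LastModifiedDate) are illustrative placeholders:
-- Track the last successfully loaded timestamp per source table
CREATE TABLE dbo.WatermarkTable (
    TableName      VARCHAR(100) PRIMARY KEY,
    WatermarkValue DATETIME2
);

-- After each successful copy, advance the watermark to the newest loaded value
UPDATE dbo.WatermarkTable
SET WatermarkValue = (SELECT MAX(LastModifiedDate) FROM dbo.Transactions)
WHERE TableName = 'Transactions';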
3. Real-Time Analytics: Processing IoT Data with ADF, Databricks, and Delta Lake
🔹 Step-by-Step Implementation:
1️⃣ Use Event-Based Trigger to detect new IoT data arrival in ADLS.
2️⃣ Copy Raw IoT Data to Azure Databricks for processing.
3️⃣ Use Databricks Notebooks to filter anomalies, aggregate sensor readings.
4️⃣ Store Data in Delta Lake (Optimized for analytics).
5️⃣ Use Power BI for Real-Time Dashboards.
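📌 Example Databricks (PySpark) Transformation: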
from pyspark.sql.functions import avg, col

# Read raw IoT JSON, drop rows with missing temperatures, and average per device
df = spark.read.json("dbfs:/mnt/iot/raw/")
df_cleaned = (df.filter(col("temperature").isNotNull())
                .groupBy("device_id")
                .agg(avg("temperature").alias("avg_temperature")))
# Write the aggregated result in Delta format for downstream analytics
df_cleaned.write.format("delta").save("dbfs:/mnt/iot/processed/")
4. API Ingestion: Loading REST API Data into Azure SQL Database
🔹 Solution Approach:
✅ Extract Data from API using Web Activity in ADF.
✅ Transform Data in Mapping Data Flow (clean, remove duplicates).
✅ Load Data into Azure SQL Database for reporting.
🔹 Step-by-Step Implementation:
1️⃣ Create a Web Activity in ADF to call REST API (GET request).
2️⃣ Store JSON response in ADLS for staging.
3️⃣ Use Mapping Data Flow to parse and clean API data.
4️⃣ Use Copy Data Activity to store data in Azure SQL.
5️⃣ Schedule Pipeline Execution using Triggers.
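📌 Example Web Activity Settings (the endpoint comes from the scenario; the header value is an illustrative placeholder):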
"url": "https://api.example.com/patients",
"method": "GET",
"headers": {
✔ Outcome: API data is automatically fetched and stored in Azure SQL for further analysis.
5. File Processing: Converting CSV Files to Parquet for Analytics
🔹 Solution Approach:
✅ Ingest CSV Files using Event-Based Triggers in ADF.
✅ Convert CSV to Parquet Format for better performance.
✅ Store in Azure Data Lake & Query with Synapse.
🔹 Step-by-Step Implementation:
1️⃣ Use Event-Based Trigger to detect new CSV files in ADLS.
2️⃣ Copy Data Activity to move raw CSV files to staging folder.
3️⃣ Use Mapping Data Flow to convert CSV to Parquet format.
4️⃣ Store Processed Data in Azure Data Lake (ADLS Gen2).
5️⃣ Query Data Using Azure Synapse Serverless SQL.
📌 Example Query to Read Parquet Data in Synapse:
SELECT *
FROM OPENROWSET(
    BULK 'https://datalake.blob.core.windows.net/processed/shipments.parquet',
    FORMAT = 'PARQUET'
) AS Shipments;
✔ Outcome: Optimized Parquet files allow faster queries and reduced storage costs.
Gopi Rayavarapu