Answer For Assignment 4
Note:
This assignment needs to be done by using the Azure Cloud Platform. In this assignment,
you will be working with Azure Data Factory, Azure SQL DB, Blob storage account and
ADLS Gen2.
Submit a compressed archive (zip, tar, etc.) of your code, along with screenshots
(input commands and their output). Please include a PDF document with answers to the
questions below.
For Part A: Please submit all screenshots showing deployed resources in your Azure portal,
Azure Blob Storage, Azure Data Factory, ADLS Gen 2 and Azure SQL DB including your
account information at the top right corner of the webpage. Include screenshots of the
successful pipeline runs with their triggers.
For Part B: Please submit all screenshots showing deployed resources in your Azure portal,
Azure SQL DB and Query Editor screenshots where you run your code with output.
Contact your TA for any questions related to this assignment or post clarification questions
to the Piazza platform.
Part A:
1. [Marks: 5] Create a resource group in your Azure portal and deploy three resources:
Azure Data Factory, Azure SQL DB, and a Blob storage account.
2. [Marks: 15] Now create a pipeline in Azure Data Factory and copy
gender_jobs_data.csv file from the Blob storage account to Azure SQL DB. (First copy
this file from your local machine to Blob Storage). See this
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-copy-data-portal for
reference.
The code is in the SQL code file: Part_Q2.sql
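The copy step above can also be expressed as an ADF pipeline JSON definition. The sketch below is a minimal example; the pipeline and dataset names (CopyBlobToSqlPipeline, BlobGenderJobsCsv, SqlGenderJobsTable) are hypothetical placeholders, and it assumes a delimited-text source dataset and an Azure SQL sink dataset have already been created with matching linked services:

```json
{
  "name": "CopyBlobToSqlPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyGenderJobsCsv",
        "type": "Copy",
        "inputs":  [ { "referenceName": "BlobGenderJobsCsv",  "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SqlGenderJobsTable", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink":   { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}
```

In the portal, the Copy Data tool generates an equivalent definition, visible under the pipeline's "Code" view.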
3. [Marks: 10] Explain the different types of triggers available in ADF. Now create a
schedule trigger and run your pipeline every 3 minutes. Show 5 successful runs.
Ans:
ADF offers three main trigger types: schedule triggers, tumbling window triggers, and event-based
triggers. Schedule triggers run pipelines at specified times and intervals, making them ideal for
regular, recurring tasks. Tumbling window triggers execute pipelines over contiguous, non-overlapping
time windows, which is useful when data must be processed in fixed-size chunks with no overlap
between runs. Event-based triggers start pipelines in response to events, such as the creation or
deletion of blobs in Azure Blob Storage (storage event triggers) or custom events published through
Azure Event Grid (custom event triggers). This type is particularly useful for reactive data
processing, enabling immediate action when data changes. Together, these trigger types automate and
streamline data workflows, covering both scheduled and event-driven needs.
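As a concrete illustration of the schedule trigger required in this question, an ADF trigger definition that runs a pipeline every 3 minutes might look like the sketch below (the trigger name, pipeline name, and start time are placeholders):

```json
{
  "name": "Every3MinTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Minute",
        "interval": 3,
        "startTime": "2024-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyBlobToSqlPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

After publishing, the trigger must be started; the five successful runs then appear under Monitor → Trigger runs.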
4. [Marks: 20] A client needs to replicate objects from ADLS Gen 2 in Canada Central
to ADLS Gen 2 in West Europe. Let’s say they want to do this in a bi-directional
way. How can you set this up? [Hint: This probably can be done using Azure Data
Factory and event triggers. For example, every time there is a new blob on one side, it
needs to be replicated on the other.]
Ans:
To set up bi-directional replication of objects between ADLS Gen2 accounts in Canada Central and
West Europe using ADF and Event Triggers, begin by creating ADLS Gen2 accounts and
containers in both regions. Next, establish an Azure Data Factory instance and create linked
services to connect to these ADLS Gen2 accounts. In ADF, set up datasets for the source and
destination containers in each region. Create two pipelines: one to copy data from Canada Central
to West Europe and another to copy data from West Europe to Canada Central, each configured
with appropriate source and sink datasets. For automation, create event triggers in ADF. One event
trigger should monitor the ADLS Gen2 container in Canada Central and link to the pipeline that
copies data to West Europe. Similarly, another event trigger should monitor the container in West
Europe and link to the pipeline that copies data to Canada Central. Optionally, implement
additional logic for handling deletions if necessary. Finally, monitor and test the setup by
uploading files to both ADLS Gen2 containers to ensure successful replication across regions. This
configuration ensures any new blobs created in one region are automatically replicated to the other,
maintaining bi-directional synchronization.
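One of the two event triggers described above could be defined roughly as follows (a sketch; the subscription ID, resource group, storage account, container path, and pipeline name are placeholders). A mirror-image trigger scoped to the West Europe account would reference the opposite pipeline:

```json
{
  "name": "OnNewBlobCanadaCentral",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/source-container/blobs/",
      "ignoreEmptyBlobs": true,
      "events": [ "Microsoft.Storage.BlobCreated" ],
      "scope": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<adls-canada-central>"
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyCanadaToEuropePipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

To avoid an infinite replication loop, each pipeline should write only to the other region's container (or the triggers should filter on distinct paths), so replicated blobs do not re-trigger the reverse pipeline.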
PART B:
In this part, you will use Query Editor in Azure SQL DB and use the gender_jobs_data.csv table
to perform the below queries.
Data input
For part B implementation, use the same table that is provided to you.
• gender_jobs_data.csv
Implementation
You need to use Azure SQL Database for this part.
SELECT occupation
FROM gender_jobs
WHERE major_category = 'Computer, Engineering, and Science' AND year = 2013;
Ans: 28
3. [Marks:5] In the gender_jobs_data table - Get all relevant information for bus drivers
across all years
SELECT *
FROM gender_jobs
WHERE occupation = 'Bus drivers';
5. [Marks:5] In the gender_jobs_data table - What were the total earnings of male
(TOTAL_EARNINGS_MALE) employees in the Service MAJOR_CATEGORY for the
year 2015?
Ans: 2502426
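The query for this answer is not shown; a sketch along these lines would produce it, assuming the column names total_earnings_male and major_category (consistent with the columns used elsewhere in this assignment) and that the category is stored as the literal 'Service':

```sql
SELECT SUM(CAST(total_earnings_male AS int)) AS total_earnings_male
FROM gender_jobs
WHERE major_category = 'Service'
  AND year = 2015
  AND total_earnings_male IS NOT NULL;
```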
6. [Marks:5] In the gender_jobs_data table - How many female workers were in management
roles in the year 2015?
Ans: 5166720
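A query along these lines would produce this answer (a sketch; it assumes a workers_female column as used in question 9, and that management roles are identified by the major_category value 'Management, Business, and Financial' — the exact literal should be verified against the loaded data):

```sql
SELECT SUM(CAST(workers_female AS int)) AS female_management_workers
FROM gender_jobs
WHERE major_category = 'Management, Business, and Financial'
  AND year = 2015
  AND workers_female IS NOT NULL;
```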
SELECT
SUM(CAST(total_earnings_female AS int)) AS total_earnings_female
FROM gender_jobs
WHERE occupation LIKE '%engineer%'
AND year = 2016
AND total_earnings_female IS NOT NULL;
Ans: 1844254
9. [Marks:10] What is the total number of full-time and part-time female workers versus male
workers year over year?
SELECT year,
       SUM(ROUND(CAST(workers_female AS int) * CAST(full_time_female AS float) / 100, 0)) AS total_full_time_female,
       SUM(ROUND(CAST(workers_male AS int) * CAST(full_time_male AS float) / 100, 0)) AS total_full_time_male,
       SUM(ROUND(CAST(workers_female AS int) * CAST(part_time_female AS float) / 100, 0)) AS total_part_time_female,
       SUM(ROUND(CAST(workers_male AS int) * CAST(part_time_male AS float) / 100, 0)) AS total_part_time_male
FROM gender_jobs
WHERE workers_female IS NOT NULL
AND full_time_female IS NOT NULL
AND workers_male IS NOT NULL
AND full_time_male IS NOT NULL
AND part_time_female IS NOT NULL
AND part_time_male IS NOT NULL
GROUP BY year
ORDER BY year;