Answer For Assignment 4

Uploaded by

Jiawei Huang
Copyright © All Rights Reserved

Assignment 4

Assignment on Azure Cloud Platform

Due by July 21, 2024

Name: Jiawei HUANG    Student ID: 1010629072

1. Note:
This assignment needs to be done by using the Azure Cloud Platform. In this assignment,
you will be working with Azure Data Factory, Azure SQL DB, Blob storage account and
ADLS Gen2.

Submit a compressed archive (zip, tar, etc.) of your code, along with screenshots
(output/input commands with results). Please include a pdf document with answers to the
questions below.

For Part A: Please submit all screenshots showing deployed resources in your Azure portal,
Azure Blob Storage, Azure Data Factory, ADLS Gen 2 and Azure SQL DB including your
account information at the top right corner of the webpage. Include the successful pipeline
runs screenshots with triggers.

For Part B: Please submit all screenshots showing deployed resources in your Azure portal,
Azure SQL DB and Query Editor screenshots where you run your code with output.

Contact your TA for any questions related to this assignment or post clarification questions
to the Piazza platform.

Part A:

1. [Marks: 5] Create a resource group in your Azure portal and deploy three resources:
Azure Data Factory, Azure SQL DB and a Blob storage account.
2. [Marks: 15] Now create a pipeline in Azure Data Factory and copy the
gender_jobs_data.csv file from the Blob storage account to Azure SQL DB. (First copy
this file from your local machine to Blob Storage.) See
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-copy-data-portal for
reference.
Ans: The code is in the SQL code file Part_Q2.sql.
3. [Marks: 10] Explain the different types of triggers available in ADF. Now create a
schedule trigger and run your pipeline every 3 minutes. Show 5 successful runs.
Ans:

ADF offers three main trigger types: schedule triggers, tumbling window triggers, and event-based
triggers. A schedule trigger runs pipelines on a wall-clock schedule, either at specified times or at
a fixed recurrence, which suits regular, recurring tasks. A tumbling window trigger fires over a
series of fixed-size, contiguous, non-overlapping time windows, so each run processes exactly one
slice of data and no slice is processed twice. An event-based trigger starts a pipeline in response
to an event: a storage event trigger reacts to a blob being created or deleted in Azure Blob
Storage, and a custom event trigger reacts to events published through Azure Event Grid. Event-based
triggers suit reactive processing, where a pipeline should run as soon as the data changes. Together
these trigger types cover both time-driven and event-driven automation of data workflows.
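As a rough illustration of the schedule trigger asked for in this question, the sketch below builds a trigger definition in the shape that ADF displays in its JSON view and lists the first five fire times it implies. The trigger name, pipeline name and start time are hypothetical placeholders, and the fire times assume the first run happens at the start time.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical names and start time; replace with the values from your own factory.
TRIGGER_NAME = "Every3MinutesTrigger"
PIPELINE_NAME = "CopyBlobToSqlPipeline"
START_TIME = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)


def schedule_trigger(name, pipeline_name, start, every_minutes):
    """Build a schedule trigger definition that recurs every `every_minutes` minutes."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Minute",
                    "interval": every_minutes,
                    "startTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


def expected_runs(start, every_minutes, count):
    """List the first `count` fire times, assuming the first run is at `start`."""
    return [start + timedelta(minutes=every_minutes * i) for i in range(count)]


trigger = schedule_trigger(TRIGGER_NAME, PIPELINE_NAME, START_TIME, 3)
runs = expected_runs(START_TIME, 3, 5)

print(json.dumps(trigger, indent=2))
print([run.strftime("%H:%M") for run in runs])
```

Under these assumptions, five runs at a 3-minute interval span 12 minutes, so the trigger can be stopped shortly after the fifth run appears in the monitoring view.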
4. [Marks: 20] A client needs to replicate objects from ADLS Gen 2 in Canada Central
to ADLS Gen 2 in West Europe. Let’s say they want to do this in a bi-directional
way. How can you set this up? [Hint: This can probably be done using Azure Data
Factory and Event Triggers. For example, every time there is a new Blob on one side, it
needs to be replicated on the other one.]

Ans:

To set up bi-directional replication between ADLS Gen2 accounts in Canada Central and West Europe
with ADF and event triggers, first create an ADLS Gen2 account and container in each region. Next,
create an Azure Data Factory instance with one linked service per account, and define datasets for
the source and destination containers in each region. Create two pipelines with a Copy activity
each: one copies from Canada Central to West Europe, the other from West Europe to Canada Central.
Then create two storage event triggers on the "blob created" event. The first monitors the
container in Canada Central and starts the pipeline that copies to West Europe; the second monitors
the container in West Europe and starts the pipeline that copies to Canada Central. Each trigger can
pass the name and folder of the new blob to its pipeline so that only that file is copied. Because a
replicated copy also raises a "blob created" event on the receiving side, the two pipelines would
otherwise keep triggering each other; each pipeline should therefore skip the copy when the
destination already holds the same file, for example by checking its metadata first. Deletions are
not covered by this setup and need additional logic if they must be mirrored. Finally, test the setup
by uploading files to both containers and monitoring the pipeline runs to confirm that each new blob
appears in the other region.
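The setup described above can be sketched in two parts. The first function builds a storage event trigger definition in the shape ADF displays in its JSON view; the second part simulates, with plain dictionaries standing in for the two containers, why each pipeline needs a guard so that a replicated blob does not bounce back and forth. All resource IDs, container names, trigger names and pipeline names are hypothetical placeholders, and the content-hash comparison is only one possible guard, not something prescribed by the assignment.

```python
import hashlib


def blob_event_trigger(name, storage_account_id, container, pipeline_name):
    """Build a storage event trigger that fires when a blob is created in `container`."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "blobPathBeginsWith": f"/{container}/blobs/",
                "ignoreEmptyBlobs": True,
                "scope": storage_account_id,  # ARM resource ID of the storage account
                "events": ["Microsoft.Storage.BlobCreated"],
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    },
                    # Pass the new blob's location to the pipeline parameters.
                    "parameters": {
                        "fileName": "@triggerBody().fileName",
                        "folderPath": "@triggerBody().folderPath",
                    },
                }
            ],
        },
    }


def digest(data):
    """Content fingerprint used to decide whether two blobs are identical."""
    return hashlib.md5(data).hexdigest()


def replicate(name, source, sink):
    """Copy one blob unless the sink already holds identical content.

    Returns True when a copy happened, which would raise a new event on the sink.
    """
    data = source[name]
    if name in sink and digest(sink[name]) == digest(data):
        return False
    sink[name] = data
    return True


def on_blob_created(name, source, sink, hops=0):
    """Follow the chain of events that one upload sets off; return the copy count."""
    if replicate(name, source, sink):
        return on_blob_created(name, sink, source, hops + 1)
    return hops


# Hypothetical triggers, one per direction.
to_europe = blob_event_trigger(
    "CanadaToEuropeTrigger", "<canada-storage-account-resource-id>",
    "data", "CopyCanadaToEurope")
to_canada = blob_event_trigger(
    "EuropeToCanadaTrigger", "<europe-storage-account-resource-id>",
    "data", "CopyEuropeToCanada")

# Dictionaries stand in for the two containers (blob name -> content).
canada = {"report.csv": b"a,b\n1,2\n"}
europe = {}
hops = on_blob_created("report.csv", canada, europe)
print(hops, europe == canada)
```

In this simulation the upload is copied once to the other side, the return event finds identical content and stops, and both containers end up with the same blob.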
Part B:

In this part, you will use Query Editor in Azure SQL DB and run the queries below against the
table loaded from gender_jobs_data.csv.

Data input

For part B implementation, use the same table that is provided to you.
• gender_jobs_data.csv

Implementation
You need to use Azure SQL Database for this part.

1. [Marks:5] In the gender_jobs_data table - Filter all the OCCUPATIONS in
MAJOR_CATEGORY of Computer, Engineering, and Science for the YEAR 2013

SELECT occupation
FROM gender_jobs
WHERE major_category = 'Computer, Engineering, and Science' AND year = 2013;

Ans: The result is in the answer CSV file PartB_Q1_result.csv.

2. [Marks:5] In the gender_jobs_data table - How many OCCUPATIONS exist in the
MINOR_CATEGORY of Business and Financial Operations overall?

SELECT COUNT(DISTINCT occupation)
FROM gender_jobs
WHERE minor_category = 'Business and Financial Operations';

Ans: 28

3. [Marks:5] In the gender_jobs_data table - Get all relevant information for bus drivers
across all years

SELECT *
FROM gender_jobs
WHERE occupation = 'Bus drivers';

Ans: The result is in the answer CSV file PartB_Q3_result.csv.

4. [Marks:5] In the gender_jobs_data table - Summarize the total number of
WORKERS_FEMALE in the MAJOR_CATEGORY of Management, Business, and
Financial by each year.

SELECT year, SUM(workers_female) AS total_female_workers
FROM gender_jobs
WHERE major_category = 'Management, Business, and Financial'
GROUP BY year
ORDER BY year;

5. [Marks:5] In the gender_jobs_data table - What were the total earnings of male
(TOTAL_EARNINGS_MALE) employees in the Service MAJOR_CATEGORY for the
year 2015?

SELECT SUM(total_earnings_male) AS total_earnings_male_2015
FROM gender_jobs
WHERE major_category = 'Service' AND year = 2015;

Ans: 2502426

6. [Marks:5] In the gender_jobs_data table - How many female workers were in management
roles in the year 2015?

SELECT SUM(CAST(workers_female AS int))
FROM gender_jobs
WHERE minor_category = 'Management' AND year = 2015;

Ans: 5166720

7. [Marks:5] In the gender_jobs_data table - Compare the TOTAL_EARNINGS_MALE and
TOTAL_EARNINGS_FEMALE earnings irrespective of occupation by each year

SELECT
year,
SUM(total_earnings_male) AS total_earnings_male,
SUM(total_earnings_female) AS total_earnings_female
FROM gender_jobs
WHERE total_earnings_male IS NOT NULL
AND total_earnings_female IS NOT NULL
GROUP BY year
ORDER BY year;

8. [Marks:5] In the gender_jobs_data table - How much money
(TOTAL_EARNINGS_FEMALE) did female workers make as engineers in 2016?

SELECT
SUM(CAST(total_earnings_female AS int)) AS total_earnings_female
FROM gender_jobs
WHERE occupation LIKE '%engineer%'
AND year = 2016
AND total_earnings_female IS NOT NULL;

Ans: 1844254

9. [Marks:10] What is the total number of full-time and part-time female workers versus male
workers year over year?

SELECT year,
SUM(ROUND(CAST(workers_female AS int) * CAST(full_time_female AS float) / 100, 0)) AS total_full_time_female,
SUM(ROUND(CAST(workers_male AS int) * CAST(full_time_male AS float) / 100, 0)) AS total_full_time_male,
SUM(ROUND(CAST(workers_female AS int) * CAST(part_time_female AS float) / 100, 0)) AS total_part_time_female,
SUM(ROUND(CAST(workers_male AS int) * CAST(part_time_male AS float) / 100, 0)) AS total_part_time_male
FROM gender_jobs
WHERE workers_female IS NOT NULL
AND full_time_female IS NOT NULL
AND workers_male IS NOT NULL
AND full_time_male IS NOT NULL
AND part_time_female IS NOT NULL
AND part_time_male IS NOT NULL
GROUP BY year
ORDER BY year;
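The percentage-to-headcount arithmetic in the query above can be checked on a few rows. The sketch below is only an illustration: it uses Python's standard-library sqlite3 module rather than Azure SQL, the three rows are made-up values that are not taken from gender_jobs_data.csv, and it assumes, as the division by 100 in the query implies, that the full_time_* and part_time_* columns hold percentages of the corresponding worker counts.

```python
import sqlite3

# Made-up rows for illustration only (not from gender_jobs_data.csv).
# Columns: year, workers_female, workers_male, full_time_female, full_time_male,
#          part_time_female, part_time_male (last four assumed to be percentages).
ROWS = [
    (2013, 1000, 2000, 80.0, 90.0, 20.0, 10.0),
    (2013, 500, 500, 50.0, 60.0, 50.0, 40.0),
    (2014, 1200, 1800, 75.0, 85.0, 25.0, 15.0),
]

# Same shape as the query above: headcount = workers * percentage / 100,
# rounded per occupation and then summed per year.
QUERY = """
SELECT year,
       SUM(ROUND(workers_female * full_time_female / 100, 0)) AS total_full_time_female,
       SUM(ROUND(workers_male * full_time_male / 100, 0)) AS total_full_time_male,
       SUM(ROUND(workers_female * part_time_female / 100, 0)) AS total_part_time_female,
       SUM(ROUND(workers_male * part_time_male / 100, 0)) AS total_part_time_male
FROM gender_jobs
GROUP BY year
ORDER BY year;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gender_jobs ("
    "year INTEGER, workers_female INTEGER, workers_male INTEGER, "
    "full_time_female REAL, full_time_male REAL, "
    "part_time_female REAL, part_time_male REAL)"
)
conn.executemany("INSERT INTO gender_jobs VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
totals = conn.execute(QUERY).fetchall()
conn.close()

for row in totals:
    print(row)
```

For the first made-up row, 1000 female workers at 80 percent full time contribute 800 to the 2013 full-time total, which is the per-row product the query sums.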
