Project Overview

Uploaded by

Aya Laadaili

Instructions
Now that you have the knowledge and skills to extract, transform, and load data, you will use them to
perform ETL, create a pipeline, and upload the data into a database. In the hands-on lab, you will build the
pipeline with Airflow's BashOperator.

Scenario
You are a data engineer at a data analytics consulting company. You have been assigned a project to
decongest the national highways by analyzing the road traffic data from different toll plazas. Each highway is
operated by a different toll operator with a different IT setup that uses different file formats. Your job is to
collect data available in different formats and consolidate it into a single file.

In this assignment, you will develop an Apache Airflow DAG that will:

Extract data from a CSV file
Extract data from a TSV file
Extract data from a fixed-width file
Transform the data
Load the transformed data into the staging area
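
To make the five steps above concrete, here is a minimal sketch in plain Python using only the standard library. It is an illustration of the extract, consolidate, transform, and load flow, not the lab's solution (the lab itself uses BashOperator). The sample rows, the column positions, the fixed-width spans, and the upper-casing transform are all assumptions made up for this example; the real files and fields are defined in the lab.

```python
import csv
import io

# Hypothetical sample inputs; the real files and columns come from the lab.
CSV_TEXT = "1,2021-08-01,car\n2,2021-08-02,truck\n"
TSV_TEXT = "1\t2\tplaza-a\n2\t4\tplaza-b\n"
FIXED_TEXT = "pte vc965\nptp vcc43\n"


def extract_delimited(text, keep, delimiter=","):
    """Keep the selected column indexes from comma- or tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[row[i] for i in keep] for row in reader]


def extract_fixed_width(text, spans):
    """Slice each line at the given (start, end) character positions."""
    return [[line[a:b].strip() for a, b in spans] for line in text.splitlines()]


def consolidate(*tables):
    """Join the extracted tables side by side, row by row."""
    return [sum(rows, []) for rows in zip(*tables)]


def transform(rows):
    """Assumed transform for illustration: upper-case the last field."""
    return [row[:-1] + [row[-1].upper()] for row in rows]


def load(rows, stream):
    """Write the rows as CSV to any writable text stream (the staging area)."""
    csv.writer(stream, lineterminator="\n").writerows(rows)


csv_rows = extract_delimited(CSV_TEXT, keep=[0, 2])
tsv_rows = extract_delimited(TSV_TEXT, keep=[1, 2], delimiter="\t")
fixed_rows = extract_fixed_width(FIXED_TEXT, spans=[(0, 3), (4, 9)])

staged = io.StringIO()  # stands in for a file in the staging area
load(transform(consolidate(csv_rows, tsv_rows, fixed_rows)), staged)
```

The three extract steps differ only in how a field is located: by comma, by tab, or by character position. Once the fields are plain lists, consolidation is a row-wise join.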

Grading Criteria
This final project is worth a total of 25 points across 13 tasks, all within a single hands-on lab.

Your final assignment will be graded by your peers who are also completing this assignment within the same
session. Your grade will be based on the following tasks:

Exercise 1: Create imports, DAG arguments, and definition
Task 1.1: Define DAG arguments (2 pts)
Task 1.2: Define the DAG (2 pts)
Exercise 2: Create the tasks using BashOperator
Task 2.1: Create a task to unzip data (2 pts)
Task 2.2: Create a task to extract data from the CSV file (2 pts)
Task 2.3: Create a task to extract data from the TSV file (2 pts)
Task 2.4: Create a task to extract data from the fixed-width file (2 pts)
Task 2.5: Create a task to consolidate data extracted from previous tasks (2 pts)
Task 2.6: Transform the data (2 pts)
Task 2.7: Define the task pipeline (1 pt)
Exercise 3: Getting the DAG operational
Task 3.1: Submit the DAG (1 pt)
Task 3.2: Unpause and trigger the DAG (3 pts)
Task 3.3: List the DAG tasks (2 pts)
Task 3.4: Monitor the DAG (2 pts)
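
As a rough orientation for Exercises 1 and 2, the sketch below shows one way the DAG arguments, the DAG definition, the BashOperator tasks, and the pipeline might fit together. Everything specific in it is an assumption for illustration: the owner and email values, the DAG id `ETL_toll_data`, the staging path, the file names, the archive format, the `cut` column ranges, and the upper-casing transform are hypothetical, and the lab instructions define the real ones. It also assumes Airflow 2.4 or later, where `DAG` accepts `schedule`. Airflow is imported inside `build_dag` so that the plain definitions can be inspected without an Airflow installation.

```python
from datetime import datetime, timedelta

# Task 1.1: DAG arguments (all values are placeholders).
default_args = {
    "owner": "your_name",
    "start_date": datetime(2024, 1, 1),
    "email": ["you@example.com"],
    "email_on_failure": True,
    "email_on_retry": True,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

STAGING = "/tmp/staging"  # hypothetical staging directory

# Tasks 2.1 to 2.6 as (task_id, bash_command) pairs.
# File names and column ranges are hypothetical.
TASKS = [
    ("unzip_data", f"tar -xzf {STAGING}/toll_data.tgz -C {STAGING}"),
    ("extract_data_from_csv",
     f"cut -d',' -f1-4 {STAGING}/data.csv > {STAGING}/csv_data.csv"),
    ("extract_data_from_tsv",
     f"cut -f1-3 {STAGING}/data.tsv | tr '\\t' ',' > {STAGING}/tsv_data.csv"),
    ("extract_data_from_fixed_width",
     f"cut -c1-10 {STAGING}/data.txt > {STAGING}/fixed_width_data.csv"),
    ("consolidate_data",
     f"paste -d',' {STAGING}/csv_data.csv {STAGING}/tsv_data.csv "
     f"{STAGING}/fixed_width_data.csv > {STAGING}/extracted_data.csv"),
    ("transform_data",
     f"tr '[:lower:]' '[:upper:]' < {STAGING}/extracted_data.csv "
     f"> {STAGING}/transformed_data.csv"),
]


def build_dag():
    """Tasks 1.2 and 2.7: define the DAG and chain the tasks in order."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(
        dag_id="ETL_toll_data",  # hypothetical DAG id
        default_args=default_args,
        description="Toll data ETL sketch",
        schedule=timedelta(days=1),  # assumes Airflow 2.4+
    )
    previous = None
    for task_id, command in TASKS:
        task = BashOperator(task_id=task_id, bash_command=command, dag=dag)
        if previous is not None:
            previous >> task  # run each task after the one before it
        previous = task
    return dag
```

In an actual DAG file, the scheduler only discovers DAG objects that exist at module level, so the file would end with `dag = build_dag()`. For Exercise 3, the Airflow 2 command line offers `airflow dags unpause <dag_id>`, `airflow dags trigger <dag_id>`, and `airflow tasks list <dag_id>`; submitting the DAG typically means copying the file into the DAGs folder of your Airflow installation.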

How to submit
You will need to submit a screenshot in JPEG or PNG format for all tasks. You will be prompted to save
screenshots throughout the lab, and you will upload these files in the submission step of the final project,
during the Project Submission and Peer Review section of this course.
