Assignment 53
Note:
Part B of this assignment can be done individually or in groups of two students. In a
group, both students need to submit the assignment for both parts and provide both
names, emails and student IDs at the top of the assignment.
Submit a compressed archive (zip, tar, etc.) of your code, along with the input and output
files and screenshots (output/input commands with results). Please include your Azure
Machine Learning Notebook with markdown.
Also, include a PDF document with answers to the questions below. Please submit
screenshots showing all deployed resources in your Azure portal, provide an explanation
for each step, and include screenshots of the final output.
Contact your TA for any questions related to this assignment or post clarification questions
to the Piazza platform.
Part A:
1. [Marks: 5] Explain below the 5 components shown in orange boxes. Explain which
Azure components you will use in this big data architecture and why.
[Figure: big data architecture diagram. Box labels: Raw Data, Unstructured Data,
Structured Data, Ingest Data, Data Store, Prepare and transform data, Model and serve data.]
Part B:
Data Input: Claim a dataset from Piazza - link. If the dataset is too large, you may use a
subset of the data. No two groups can have the same dataset.
1. [Marks: 10] Explain what problem you are going to solve using this dataset. Provide a
brief overview of your problem statement. [Discuss your problem statement with your
TAs; once they approve, you can proceed with the next steps.]
2. [Marks: 15] Explain your dataset. Explore your dataset and provide at least 5 meaningful
charts/graphs with an explanation.
3. [Marks: 15] Do data cleaning/pre-processing as required and explain what you have done
for your dataset and why.
4. [Marks: 20] Implement 2 machine learning models and explain which algorithms you
selected and why. Compare them and report success metrics (Accuracy/RMSE/Confusion
Matrix) appropriate to your problem. Explain the results.
5. [Marks: 20] Deploy a run-time pipeline for your dataset using the designer in Azure Machine Learning studio.
Or
Perform hyperparameter tuning for your algorithms. Explain your results.
Or
Use Automated ML for your dataset. Explain the best model results.
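
As a reference for item 4, the success metrics named there (accuracy, RMSE, confusion matrix) can be computed as in the minimal sketch below. It uses plain Python only. The labels, the predictions, and the names "model A" and "model B" are hypothetical illustration data, not results from any dataset; in practice a library such as scikit-learn provides equivalent functions.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (classification)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values (regression)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical held-out labels and predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a_pred = [1, 0, 1, 0, 0, 0, 1, 1]
model_b_pred = [1, 0, 1, 1, 0, 1, 1, 0]

for name, pred in [("model A", model_a_pred), ("model B", model_b_pred)]:
    print(name, "accuracy:", accuracy(y_true, pred))
    print(name, "confusion matrix:", confusion_matrix(y_true, pred, labels=[0, 1]))
```

Accuracy and the confusion matrix apply to classification problems, while RMSE applies to regression problems, so report whichever matches your problem statement and compute it for both models on the same held-out data.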