Assignment 5-2
1. Note:
Submit a compressed archive (zip, tar, etc.) of your code together with the input and output
files and either screenshots (input/output commands with their results) or your Azure Machine
Learning notebook with markdown explanations.
Also include a PDF document with answers to the questions below. Include screenshots of all
deployed resources in your Azure portal, explain each step, and include screenshots of the
final output.
For questions about this assignment, contact your TA or post a clarification question on
Piazza.
Part A:
1. [Marks: 5] Explain the 5 components shown in the orange boxes below. For each one, state
which Azure components you would use at that point in this big data architecture and why.
[Figure: big data architecture. Box labels: Raw Data (Unstructured Data, Structured Data),
Ingest Data, Data Store, Prepare and transform data, Model and serve data.]
Part B:
Data Input: Claim a dataset from Piazza - link. If the dataset is too large, you may use a
subset of the data. No two groups may use the same dataset.
1. [Marks: 10] Explain what problem you are going to solve using this dataset. Provide a
brief overview of your problem statement. [Discuss your problem statement with your
TAs. Proceed to the next steps only after they approve it.]
2. [Marks: 15] Explain your dataset. Explore your dataset and provide at least 5 meaningful
charts/graphs with an explanation.
3. [Marks: 15] Clean and pre-process the data as required. Explain what you did to your
dataset and why.
4. [Marks: 20] Implement 2 machine learning models and explain which algorithms you
selected and why. Compare them using success metrics appropriate to your problem
(e.g. Accuracy, RMSE, Precision, Recall). Explain the results.
5. [Marks: 20] Deploy a run-time pipeline for your dataset using the designer in Azure
Machine Learning studio.
Or
Do hyperparameter tuning for your algorithms. Explain your results.
Or
Use Automated ML for your dataset. Explain the best model results.
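
As a reference for the success metrics named in item 4 of Part B, the following is a minimal
sketch of how Accuracy, Precision, Recall, and RMSE are defined. It is plain Python with no
Azure dependency. The function names and the sample predictions for the two models are
hypothetical and only illustrate the comparison; in your own work you would normally use the
metric functions of your chosen ML library on your own data.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Convention assumed here: report 0.0 when the denominator is zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def rmse(y_true, y_pred):
    """Root mean squared error for regression targets (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels and predictions from two models on the same test set.
y_true = [1, 0, 1, 1, 0, 1]
model_a_pred = [1, 0, 0, 1, 0, 1]
model_b_pred = [1, 1, 1, 1, 0, 0]

metrics_a = classification_metrics(y_true, model_a_pred)
metrics_b = classification_metrics(y_true, model_b_pred)

for name, metrics in (("Model A", metrics_a), ("Model B", metrics_b)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Which metric matters most depends on your problem statement: RMSE applies to regression,
while Accuracy, Precision, and Recall apply to classification, and Precision and Recall are
usually more informative than Accuracy when the classes are imbalanced.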