CIS2205-24-25-Assignment 2
Tutor Referral: No
Resources available: Please note that you can access free Office 365 software, and you have 100 GB of free storage space on Microsoft’s OneDrive – see Guidance on downloading Office 365.
Assignment 2: Data-driven Artificial Intelligence
1. Assignment Aim
• To develop proficiency in implementing machine learning workflows in Python.
2. Learning Outcomes:
• Demonstrate a solid understanding of key stages in a machine-learning workflow.
• Implement and evaluate machine learning models using appropriate performance metrics.
• Effectively use Python tools for data collection, preprocessing, modeling, prediction,
evaluation, and visualization.
3. Assessment Brief:
This assignment consists of four sections, each designed to guide you through an essential stage
of a machine learning project.
Section 1 – Data Collection and Preprocessing [20%]
• Select a dataset relevant to a predictive modeling task.
• Provide a brief description of the dataset, including feature descriptions and target
variable.
• Perform the following preprocessing steps:
1. Handle any missing values and outliers.
2. Encode categorical variables as needed.
3. Scale features if necessary.
4. Split the data into training and testing sets (e.g., 70% train, 30% test).
Provide and explain Python code snippets for each preprocessing step. Discuss the data
transformations applied and their relevance.
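For illustration, the four preprocessing steps above might be sketched as follows. The dataset and column names here are invented placeholders; in your submission you would apply the same pattern to your chosen dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: column names and values are placeholders for the example.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 38, 29, 55, 47, 33, 200],  # NaN and an outlier (200)
    "city":   ["York", "Leeds", "York", "Hull", "Leeds",
               "York", "Hull", "Leeds", "York", "Hull"],
    "target": [0, 1, 0, 1, 1, 0, 1, 1, 0, 1],
})

# 1. Handle missing values (median imputation) and outliers (IQR clipping).
df["age"] = df["age"].fillna(df["age"].median())
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df["age"] = df["age"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 2. Encode the categorical variable with one-hot encoding.
df = pd.get_dummies(df, columns=["city"], drop_first=True)

# 3/4. Separate features and target, split 70/30, then scale the features.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Note that the scaler is fitted on the training split only and then reused on the test split; fitting it on the full dataset would leak test-set statistics into training.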
Section 2 – Model Selection and Training [25%]
• Select two machine learning algorithms suitable for the dataset and prediction task (e.g.,
classification or regression).
• For each model, implement the training process in Python, using the training dataset from
Section 1.
Required Steps:
1. Define each model and provide a brief explanation of why it is suitable for the task.
2. Train both models and save the trained models for evaluation.
Present code snippets and a brief justification for each model choice. Include any
hyperparameter tuning or optimizations performed.
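A minimal sketch of the steps above, assuming a classification task: the two model choices (logistic regression and a random forest) are illustrative, not prescriptive, and the synthetic data stands in for the split produced in Section 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data; in the assignment this would be the split from Section 1.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 1. Define each model: logistic regression as an interpretable linear
#    baseline, and a random forest to capture non-linear feature interactions.
log_reg = LogisticRegression(max_iter=1000)
forest = GridSearchCV(                      # simple hyperparameter tuning
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)

# 2. Train both models and keep them for evaluation in Section 3.
log_reg.fit(X_train, y_train)
forest.fit(X_train, y_train)
trained_models = {"logistic_regression": log_reg, "random_forest": forest}
# To persist a trained model to disk: joblib.dump(log_reg, "log_reg.joblib")
```

In a notebook the fitted objects can simply be kept in memory between sections; `joblib.dump`/`joblib.load` is one common way to save and reload them from a script.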
Section 3 – Prediction and Evaluation [30%]
• Use the trained models from Section 2 to generate predictions on the test dataset.
• Evaluate each model’s performance using appropriate metrics:
o For classification tasks, report metrics such as accuracy, precision, recall, and F1-score.
o For regression tasks, report metrics such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), or R-squared (R²).
• Compare the models’ performances and discuss which model performs better and why.
Include Python code for generating predictions and calculating performance metrics. Interpret
and compare the results for each model.
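Continuing the illustrative classification setup, the evaluation step might look like this with `sklearn.metrics`; one model is shown, and the same calls would be repeated for the second model so the metrics can be compared side by side.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)
from sklearn.model_selection import train_test_split

# Stand-in data and model; in the assignment, reuse Section 2's trained models.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Generate predictions on the held-out test set and compute the metrics.
y_pred = model.predict(X_test)
metrics = {
    "accuracy":  accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall":    recall_score(y_test, y_pred),
    "f1":        f1_score(y_test, y_pred),
}
# For a regression task, the analogous calls are mean_absolute_error,
# mean_squared_error (take its square root for RMSE), and r2_score.
```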
Section 4 – Visualization and Insights [25%]
• Visualize key aspects of your machine learning project to help understand model
performance and data distribution.
Required Visualizations:
1. For classification tasks, provide a confusion matrix or ROC curve.
2. For regression tasks, plot predicted values against actual values.
3. Visualize feature importance (if applicable) to understand which features
contribute most to predictions.
Include code for each visualization and describe the insights gained from these visualizations.
Summarize your findings and observations based on the entire workflow.
Deliverables
1. A Python notebook (or script) containing well-commented code for each section.
2. A brief report summarizing your approach, findings, and key insights across the
assignment.
3. A video presentation demonstrating your project, with a maximum duration of 3 minutes.
**************************** Optional tasks ******************************************
This section is optional and provides an opportunity for additional marks. Students seeking
additional marks should complete one or more of the listed tasks, and include any code,
visualizations, and discussions in their submission.
******************************************************************************************
4. Marking Scheme (Assignment 2)
Marking Criteria – Data Collection and Preprocessing (5 + 5 + 5 + 5 = 20%)
• Dataset Description (5%): Up to 5 points for a complete and clear description of the
dataset, including features and target variable.
• Handling Missing Values and Outliers (5%): Up to 5 points for correctly identifying and
addressing missing values and outliers with clear explanations.
• Feature Scaling/Encoding (5%): Up to 5 points for appropriate feature scaling and
encoding based on the dataset’s requirements.
• Data Splitting (5%): Up to 5 points for correctly splitting the data into training and testing
sets and providing a justification for the chosen split ratio.
Marking Criteria – Model Selection and Training (10 + 5 + 5 + 5 = 25%)
• Model Choice Justification (10%): Up to 10 points for choosing two suitable models
with thorough explanations of why each model fits the prediction task.
• Model Training (5%): Up to 5 points for successful training of both models, including
any tuning/optimization steps.
• Code Quality (5%): Up to 5 points for well-structured and well-documented code.
• Justification of Hyperparameter Choices (5%): Up to 5 points for clear justifications
of hyperparameters used, showing understanding of model settings.
Marking Criteria – Visualization and Insights
• Confusion Matrix or ROC Curve (2.5%): Up to 2.5 points for a clear and accurate
visualization of classification model performance (or regression plot, if applicable).
• Feature Importance (2.5%): Up to 2.5 points for correctly visualizing feature
importance (if applicable), with explanations on the feature impacts.
• Additional Visualization (2.5%): Up to 2.5 points for any additional visualization, such
as error analysis or scatter plots of predictions vs. actuals.
• Insightful Analysis (7.5%): Up to 7.5 points for providing meaningful insights based on
the visualizations and summarizing findings effectively.
• Video Demonstration (10%): Up to 10 points for a clear video presentation
demonstrating your project, within the 3-minute maximum duration.
5. Grading Rubric
These criteria are intended to help you understand how your work will be assessed. They describe
different levels of performance for each criterion.
Criteria are not weighted equally, and the marking process involves academic judgement and
interpretation within the marking criteria.
The grades between Pass and Very Good should be considered as different levels of performance
within the normal bounds of the module. The Exceptional and Outstanding categories allow for
students who, in addition to fulfilling the Excellent requirements, perform at a superior level beyond
the normal boundaries of the module and demonstrate intellectual creativity, originality and
innovation.
INTERMEDIATE (FHEQ LEVEL 5)
90+ Outstanding demonstration of scholarly application and critical understanding of subject
area knowledge
• well-structured assessment that addresses the learning outcomes and specific criteria for
the module
• critical understanding/application is evident through systematic, relevant and
comprehensive coverage of content
• clearly communicated in a style appropriate to the assessment brief
• very limited areas for improvement
60+ Very good demonstration of scholarly application and critical understanding of subject
area knowledge
• well-structured assessment that addresses the learning outcomes and specific criteria for
the module
• critical understanding/application is generally evident in the coverage of content
• clearly communicated in a style appropriate to the assessment brief
10+ Poorly structured assessment that does not address the module learning outcomes and
specific criteria
• coverage of the content is inadequate or incomplete
• poor communication that does not use a style appropriate to the assessment brief
0+ Poorly structured assessment that does not address the learning outcomes and specific
criteria for the module at all