COM7039M MachineLearning Assignment Brief-Level 7-1

The COM7039M Machine Learning assignment requires students to design and develop a machine learning prediction model, demonstrating their understanding of the subject through two tasks: a theoretical exercise on K-means clustering and a programming exercise using a hate speech dataset from Twitter. Students must explore, preprocess, and analyze the dataset, select appropriate algorithms, and evaluate model performance using various metrics. The assignment emphasizes the importance of academic integrity, proper referencing, and adherence to submission guidelines.

Uploaded by

Alen Joy kj

School of London

COM7039M Machine Learning Assignment Brief

Contents
Module Details
Assignment Description
Learning Outcomes
Advice and Guidance
How is this assessment marked?
Marking Criteria

Module Details

Module code: COM7039M
Level of Study: 7
Module Leader(s): Dr. Rebecca Jeyavadhanam. B
Credits: 15
Assessment format: Creative Artefact - A practical project submission: to design and develop an ML prediction model with supporting documentation
Method of submission: Turnitin within Moodle
Deadline or Assessment Period: 28th Jan 2025, 12 Noon
Feedback date and place: 19th Feb 2025, written feedback within Turnitin/Moodle
Assessment limits (length, load, word count, etc.): N/A
Component number: 1 of 1
Is this exempt from anonymous marking under the policy? No
Component weighting: 100%

Assignment Description

This coursework aims to demonstrate students' comprehensive understanding and
knowledge of the Machine Learning module by evaluating their analytical
abilities and strengths. It comprises two tasks designed to assess and challenge
their analytical skills. These tasks have been carefully crafted to test students'
proficiency in applying the concepts learned throughout the module and to
showcase their ability to tackle real-world scenarios and problems. Successful
completion of these tasks will reflect students' mastery of the subject matter and
their capacity for critical and creative thinking.

The Assessment Consists of TWO Tasks:


In Task 1, you will be presented with a set of questions that require critical
evaluation of your subject knowledge and understanding of the concepts
related to the Machine Learning (ML) process and techniques.

In Task 2, you will be provided with a programming exercise and a dataset to
analyse using the Machine Learning approach. Your objective is to use
suitable Machine Learning algorithms to predict and evaluate a model's
accuracy. The dataset will contain various features and a target variable that
you need to predict. You are expected to develop a model, followed by training,
testing, and evaluating its performance. Your goal is to identify and highlight
the best predictive model for classification. This includes comparing different
models and selecting the one that offers the highest accuracy and best
performance metrics for the classification task.

The content of this assignment must be supported by the inclusion of pertinent
academic theories, concepts, models, and contemporary industrial insights.
Provide a detailed and relevant description of your code. Ensure your work is
accurately cited and referenced using the York St John Harvard Referencing
Style.

Task 01: Theory Exercise – (20 Marks)

a. Discuss the K-means clustering algorithm in detail, including its working
principles, advantages, disadvantages, and real-world applications. (10
Marks)

b. How would you evaluate the performance of the classification algorithm with
appropriate metrics? (10 Marks)
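
As a point of reference for part (a), the working principle of K-means (alternating an assignment step and a centroid update step until the assignments stop changing) can be sketched in a few lines. This is a minimal illustration, not a model answer: the sample points, the seed, and the helper names `squared_distance` and `k_means` are assumptions made for this example, and a library implementation would normally be used in practice.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise the centroids by sampling k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical 2-D points forming two loose groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = k_means(points, k=2)
```

The result depends on the random initialisation, which is one of the disadvantages usually discussed for this algorithm; running it with several seeds and keeping the best result is a common mitigation.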

Task 02: Programming Exercise – (80 Marks)

a. Develop a classification model with Machine Learning techniques to detect
hate speech from the Twitter dataset.

Dataset Description:

The "Hate Speech and Offensive Language" dataset is collected from Twitter.
It is primarily designed to support research and development in detecting and
analyzing hate speech and offensive language on social media, distinguishing
them from ordinary slang and neutral content.

Dataset Link:

https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset

Alternative Source:

The datasets are available to download from the Machine Learning (ML) module
on the Moodle platform.

Guidelines to Prepare Your Assignment:

1. Data Exploration and Pre-processing: (10 Marks)
✓ Load and explore the dataset to gain insights into the data's characteristics.
✓ Handle missing values, if any, and perform data cleansing as required.
✓ Perform data visualization to understand the distribution of the variables and
the relationships between them.
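
For illustration only, a cleaning pass over tweet text might look like the sketch below. The sample rows, the `tweet` and `label` field names, the label values, and the `clean_tweet` helper are all hypothetical and should be adapted to the columns actually present in the provided dataset.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for the real dataset; the texts and
# label names are made up for this example.
rows = [
    {"tweet": "RT @user1: Check this out!! https://t.co/abc #fun", "label": "neither"},
    {"tweet": "@user2 you are SO annoying &amp; rude", "label": "offensive"},
    {"tweet": None, "label": "neither"},  # a missing value to handle
]


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, mentions, the RT marker and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"&\w+;", " ", text)         # remove HTML entities such as &amp;
    text = re.sub(r"\brt\b", " ", text)        # remove the retweet marker
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return " ".join(text.split())              # collapse repeated whitespace


# Drop rows whose text is missing, then clean what remains.
complete = [r for r in rows if r["tweet"]]
cleaned = [clean_tweet(r["tweet"]) for r in complete]

# A quick look at the class balance, which matters when choosing metrics later.
class_counts = Counter(r["label"] for r in complete)
```

Each cleaning rule is a design choice that should be justified in the report; for example, removing hashtags symbols but keeping the hashtag word preserves some signal that deleting the whole hashtag would lose.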
2. Feature Engineering Process: (15 Marks)
✓ Extract meaningful information from the dataset using a Feature Extraction
Technique.
✓ Reduce the number of features using PCA as the Dimensionality Reduction
Technique.
✓ Select the most prominent features to capture the complex relationships.
✓ Handle the outliers to prevent them from significantly affecting the model.
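
One common feature extraction technique for text is TF-IDF weighting. The sketch below shows the idea with a smoothed inverse document frequency; it is an assumption that TF-IDF is the technique chosen, the `tf_idf` helper and the example sentences are hypothetical, and it covers only the extraction step, not PCA, feature selection, or outlier handling.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenised = [doc.split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenised for tok in doc})
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(tok for doc in tokenised for tok in set(doc))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        # Term frequency is the share of the document taken up by the term.
        vectors.append(
            [counts[t] / len(doc) * idf[t] if doc else 0.0 for t in vocabulary]
        )
    return vocabulary, vectors


# Hypothetical, already-cleaned example texts.
docs = ["you are great", "you are awful", "great great day"]
vocabulary, vectors = tf_idf(docs)
```

Terms that appear in few documents receive a larger weight than terms that appear in many, which is why "awful" outweighs "you" in the second example text.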
3. Model Selection and Training: (10 Marks)
✓ Split the dataset into training and testing sets.
✓ Select appropriate machine learning algorithms (e.g., logistic regression,
decision trees, random forests, support vector machines, etc.) for the
predictive modeling task.
✓ Train the selected models on the training data and evaluate their
performance on the testing data.
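
When classes are imbalanced, a stratified split keeps the class proportions similar in the training and testing sets. The following is a minimal sketch under assumed label names and an assumed 80/20 split; `stratified_split` is a hypothetical helper, not a prescribed method.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) keeping class proportions similar in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take at least one test example from every class.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels with an uneven class distribution.
labels = ["offensive"] * 10 + ["neither"] * 5 + ["hate"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

Fixing the seed makes the split reproducible, so every candidate model is trained and tested on exactly the same rows.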
4. Hyperparameter Tuning: (10 Marks)
✓ Fine-tune the hyperparameters of the chosen algorithm to optimize the
model's performance and avoid model overfitting.
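
One way to tune hyperparameters while guarding against overfitting is an exhaustive grid search scored by k-fold cross-validation. The sketch below is illustrative only: `k_fold_indices`, `grid_search`, the `evaluate` callback, and the toy threshold problem are assumptions made for this example rather than a required approach.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def grid_search(param_grid, evaluate, n_samples, k=5):
    """Return (best_params, best_score) by mean validation score over k folds.

    `evaluate(params, train_idx, val_idx)` is a caller-supplied function that
    trains on train_idx and returns a score on val_idx (higher is better).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy problem: classify a number as positive when it exceeds a threshold.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]


def evaluate(params, train_idx, val_idx):
    # Nothing is fitted for this toy rule, so train_idx goes unused; a real
    # model would be trained on those rows before scoring the validation fold.
    hits = sum((xs[i] > params["threshold"]) == bool(ys[i]) for i in val_idx)
    return hits / len(val_idx)


best_params, best_score = grid_search(
    {"threshold": [0.05, 0.5, 0.95]}, evaluate, len(xs), k=5
)
```

Because every configuration is scored on data it was not trained on, the selected configuration is less likely to be one that merely memorised the training rows.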
5. Model Evaluation: (20 Marks)
✓ Compare the performance of different models using appropriate evaluation
metrics such as accuracy, precision, recall, F1-score, area under the
receiver operating characteristic curve (AUC-ROC), Confusion matrix, and
Logarithmic loss (Log Loss).
✓ Comparative analysis to identify the best-performing model for classification.
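
Several of the metrics named above follow directly from the confusion matrix counts. The sketch below works through accuracy, precision, recall, F1-score and log loss for a binary case; the labels and predicted probabilities are made up, the helper names are hypothetical, and a multi-class problem would need these computed per class and then averaged.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def binary_report(y_true, y_pred):
    """Accuracy, precision, recall and F1-score from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted P(y=1)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Made-up ground truth and predicted probabilities for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
report = binary_report(y_true, y_pred)
loss = log_loss(y_true, y_prob)
```

Here the counts are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so accuracy, precision, recall and F1-score all come out at 0.75. Log loss differs from the other metrics in that it uses the predicted probabilities rather than the thresholded labels, so it also penalises confident mistakes.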


6. Model Deployment: (10 Marks)
✓ Deploy the model to generate predictions for new and previously unseen
data.
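
A simple form of deployment is to save the trained model's parameters, reload them elsewhere, and score text the model has not seen. The sketch below uses a hypothetical bag-of-words linear model stored as JSON; the weights, label names and the `predict` helper are invented for illustration, and models built with a machine learning library are usually saved with that library's recommended persistence mechanism instead.

```python
import json


def predict(model, text):
    """Score a text with a bag-of-words linear model and return a label."""
    score = model["bias"] + sum(
        model["weights"].get(tok, 0.0) for tok in text.lower().split()
    )
    return model["positive_label"] if score > 0 else model["negative_label"]


# Hypothetical parameters standing in for a trained model.
trained = {
    "bias": -0.5,
    "weights": {"awful": 1.2, "stupid": 1.0, "great": -0.8},
    "positive_label": "offensive",
    "negative_label": "neither",
}

# Serialise the model so it can be shipped separately from the training code.
serialised = json.dumps(trained)

# Later, possibly in another process: restore it and score unseen text.
restored = json.loads(serialised)
predictions = [predict(restored, t) for t in ["What a great day", "You are awful"]]
```

The same pre-processing applied during training must also be applied to new text at prediction time, otherwise the deployed model sees inputs unlike those it learned from.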
7. Conclusion and Recommendations: (5 Marks)
✓ Summarize the key findings of your analysis, highlighting the model's
performance and any insights gained.
✓ Propose potential improvements or additional steps that could be taken to
enhance the system.

Learning Outcomes

PLOs 7.1-7.7

7.1 Evaluate computer science concepts and principles and their application to the
effective design, implementation, and usability of computer-based systems.

7.2 Apply the findings of advanced scholarship and/or contemporary research and
practice to the solution to computer science problems.

7.3 Critically evaluate computer science problems, including those at the forefront of
the field.

7.4 Demonstrate operation within applicable professional, legal, social, and ethical
frameworks.

7.5 Demonstrate originality and creativity in the solution of computer science problems.

7.6 Recommend, with detailed justification, the appropriate computer science
principles and practices to apply to significant domain-specific activity.

7.7 Apply standards, quality processes, and engineering principles to the solution of
computer science problems.

Advice and Guidance

Submission Guidelines:

➢ Prepare a comprehensive report documenting your approach,
methodologies, results, and insights gained from the project.

➢ Include code snippets, visualizations, and explanations to support your
findings. Your report should be clear, concise, and well-organized.

➢ Submit the complete report, including both Task 1 and Task 2, as a
single PDF file. The code must be submitted as a supporting document
in the .ipynb file format.

Additional Guidelines for Students:


➢ Students must submit their own work and acknowledge the sources used in
this assignment. Failure to acknowledge sources is plagiarism, which is an
academic offence for which a penalty can be imposed. Students should write in
their own words after reading other papers, cite them, and list the references
at the end of the assignment.

➢ Students' work will be submitted to the national plagiarism detection
facility, which identifies sources from the internet and other extensive
databases. Once work has been submitted to the detection service, it is stored
electronically in a database and compared with work from other sources.
Students should keep a backup of their work. Submitted material is stored in
the database electronically for an indefinite period.

➢ It is essential that you acknowledge the source of any research, information,
ideas, opinions, theories, or other material which is not your own. Effective
referencing, quoting, paraphrasing, and summarising show evidence of the
reading you have done and ensure that you avoid accusations of plagiarism.

➢ The University's fundamental stance on the use of Turnitin is geared toward
supporting students' academic development. You can use this link to check
your work for areas where you might be at risk of plagiarising.

➢ Please submit your assignment on time. All assignments may be electronically
submitted using Turnitin (via Moodle) by midnight on the due date. Please do
not submit your assignment at the last minute, and allow time for any
problems or issues with systems.

➢ The work you present should be your own work, and not just copied from
others. You can quote from others, but you must say who the author is and
use quotation marks or paraphrase. If you do not do so, we will investigate
your work for academic misconduct. This is particularly likely if your Turnitin
similarity score is above 25% and/or individual matches are above 6%.

➢ If you require support with your study skills, please visit
https://www.yorksj.ac.uk/students/study-skills/

➢ Please refer to the York St John University Code of Practice for Assessment
and Academic Related Matters 2024-25.

➢ We ask that you pay particular attention to the academic misconduct policy.
Penalties will be applied where a student is found guilty of academic and/or
ethical misconduct, including termination of programme (Policy Link).

➢ You are required to keep to the word limit set for an assessment and to note
that you may be subject to penalty if you exceed that limit. You are required to
provide an accurate word count on the cover sheet for each piece of work you
submit (Policy Link).

➢ For late or non-submission of work by the published deadline or an approved
extended deadline, a mark of 0NS will be recorded. Where a re-assessment
opportunity exists, a student will normally be permitted only one attempt to be
re-assessed for a capped mark (Policy Link).

➢ An extension to the published deadline may be granted to an individual student
if they meet the eligibility criteria of the (Policy Link).

How is this assessment marked?

Your work will be marked according to the assessment instructions provided within this
document and the selected Learning Outcomes (LOs) (see above).

Furthermore, this assessment is marked using the assessment marking criteria or a similar
rubric that aligns with the University’s Generic Assessment Descriptors (see below). This
is to ensure all assessment decisions are comparable regardless of the discipline or mode
of assessment.

Please note that you must meet the required baseline standards (50 – 59%) which will
include the LOs and minimum expectations of the assessment. Further still, you must
ensure you meet the requirements of each grade boundary to progress to the next, i.e., you
should demonstrate your learning through the standards of the Pass, Merit and Distinction
to reach a Distinction (70 – 84%). These standards are designed to scaffold and build your
learning to achieve your fullest potential in each criterion being assessed.
Deliverables for Task 1 and Task 2

Task 01 (20 Marks)

a. (10 Marks) An extraordinary conceptual understanding of the K-means
clustering algorithm, its advantages and disadvantages, with real-world
applications, and examples if any are provided.

b. (10 Marks) An appropriate description of all the metrics, such as accuracy,
precision, recall, F1-score, area under the receiver operating characteristic
curve (AUC-ROC), confusion matrix, and logarithmic loss (Log Loss).

Task 02 (80 Marks)

Data Exploration and Pre-processing (10 Marks): Correct handling of missing
values, outliers, and data normalization. Effective exploratory data analysis
(EDA) to gain insights into the dataset. Provide a well-documented Jupyter
Notebook or script containing the code for data preprocessing steps. Ensure that
each step is properly commented on to explain its purpose and functionality.

Feature Engineering (15 Marks): Present code segments that generate new
features based on domain knowledge or creative insights. Discuss the encoding
methods chosen and their suitability for the problem. Apply mathematical
transformations to numerical features. Scale numerical features to ensure they
are on similar scales. Discuss the feature selection and its importance.

Model Selection and Training (10 Marks): Correct implementation of selected
machine learning algorithms should be presented. Splitting the dataset into
training and testing (validation) sets using techniques like the train-test
split or k-fold cross-validation. Adequate use of libraries and tools to
streamline the implementation process.

Hyperparameter Tuning (10 Marks): Explanation of the method you used for
hyperparameter tuning, the reasons for selecting this method, and how it suits
your specific problem. Data Splitting Strategy: How you divided your data into
training, validation, and test sets. Model Training and Evaluation Protocol:
Description of how you trained and evaluated models for different
hyperparameter configurations. Explanation of the performance metric(s) you
used to assess model performance.

Model Evaluation (20 Marks): Accurate evaluation of model performance using
relevant metrics (e.g., accuracy, precision, recall, F1-score). Comprehensive
comparison of multiple models and their strengths/limitations. Insightful
interpretation of results and trends observed.

Model Deployment (10 Marks): Testing the developed model using real-world or
unseen data to confirm that it performs as expected, provides accurate
predictions, and validates the model performance.

Conclusion and Recommendations (5 Marks): The clear and organized structure of
the report with proper sections (Introduction, Methodology, Results,
Discussion, Conclusion). Coherent explanations of the implemented algorithms
and techniques. Effective visualization of results through graphs, charts, and
tables. Cohesive and well-written analysis of findings and conclusions.

Total Marks: 100 Marks

Marking Criteria
Pass Grade Bands (100 – 50) (Learning Outcomes must be met)
Fail Grade Bands (49 – 0) (Learning Outcomes are not met)

Grade bands: Distinction (85 – 100); Distinction (70 – 84); Merit (60 – 69);
Pass (50 – 59); Borderline Fail (45 – 49) (Credits may be compensated);
Fail (30 – 44) (Credits may not be compensated); Fail (0 – 29) (Credits may not
be compensated).

Research Skills: An extraordinary conceptual understanding of the K-means
clustering algorithm, including its advantages, disadvantages, and real-world
applications. 10%
Distinction (85 – 100): Demonstrates a deep and insightful understanding of the K-means clustering algorithm with detailed examples and critical analysis.
Distinction (70 – 84): Shows a strong understanding with relevant examples and solid analysis.
Merit (60 – 69): Provides a good understanding with some examples and analysis.
Pass (50 – 59): Demonstrates adequate understanding with basic examples and some discussion.
Borderline Fail (45 – 49): Shows limited understanding with insufficient examples or analysis.
Fail (30 – 44): Demonstrates a poor understanding with little to no relevant examples.
Fail (0 – 29): Fails to demonstrate understanding of the K-means clustering algorithm.

Thinking Skills and Creativity: An innovative approach to problem-solving with
creative insights and solutions. 10%
Distinction (85 – 100): Exhibits exceptional creativity and originality in problem-solving.
Distinction (70 – 84): Shows strong creativity with effective problem-solving approaches.
Merit (60 – 69): Demonstrates good creativity with some innovative solutions.
Pass (50 – 59): Provides adequate creativity with basic problem-solving approaches.
Borderline Fail (45 – 49): Limited creativity with few innovative solutions.
Fail (30 – 44): Shows minimal creativity with ineffective problem-solving.
Fail (0 – 29): Fails to demonstrate creativity or effective problem-solving.

Practical Skills and Professional Learning Skills: Data Exploration and
Pre-processing - 10%
Distinction (85 – 100): Excellent handling of missing values, outliers, and data normalization. Effective and insightful EDA with a well-documented Jupyter Notebook or script.
Distinction (70 – 84): Strong handling of data issues and effective EDA. Well-documented with minor gaps in explanation.
Merit (60 – 69): Good handling of data issues and EDA. Adequate documentation and explanation.
Pass (50 – 59): Adequate handling of data issues with basic EDA. Documentation and explanations are present but may lack depth.
Borderline Fail (45 – 49): Limited handling of data issues or EDA. Incomplete documentation or explanation.
Fail (30 – 44): Poor handling of data issues with insufficient EDA. Inadequate documentation and explanations.
Fail (0 – 29): Fails to handle data issues effectively. Lacks proper EDA and documentation.

Thinking Skills & Practical Skills and Professional Learning Skills: Feature
Engineering Process - 15%
Distinction (85 – 100): Innovative and effective feature engineering with detailed explanation of encoding methods, mathematical transformations, scaling, and feature selection.
Distinction (70 – 84): Strong feature engineering with good explanation of methods and transformations.
Merit (60 – 69): Good feature engineering with some explanation of methods and transformations.
Pass (50 – 59): Adequate feature engineering with basic explanation of methods and transformations.
Borderline Fail (45 – 49): Limited feature engineering with minimal explanation.
Fail (30 – 44): Poor feature engineering with insufficient explanation.
Fail (0 – 29): Fails to demonstrate effective feature engineering.

Practical Skills and Professional Learning Skills: Model Selection and
Training - 10%
Distinction (85 – 100): Excellent implementation of algorithms with thoughtful data splitting and optimal use of tools and libraries.
Distinction (70 – 84): Strong implementation with appropriate data splitting and effective use of tools.
Merit (60 – 69): Good implementation with correct data splitting and adequate tool usage.
Pass (50 – 59): Adequate implementation with basic data splitting and tool usage.
Borderline Fail (45 – 49): Limited implementation with inappropriate data splitting or tool usage.
Fail (30 – 44): Poor implementation with ineffective data splitting or minimal tool usage.
Fail (0 – 29): Fails to implement models correctly or use tools effectively.

Practical Skills and Professional Learning Skills: Hyperparameter Tuning - 10%
Distinction (85 – 100): Comprehensive evaluation using relevant metrics with insightful comparison and interpretation of results.
Distinction (70 – 84): Accurate evaluation with good comparison and interpretation of multiple models.
Merit (60 – 69): Good evaluation with appropriate metrics and some comparison of models.
Pass (50 – 59): Adequate evaluation with basic metrics and limited comparison of models.
Borderline Fail (45 – 49): Limited evaluation with minimal use of metrics and comparison.
Fail (30 – 44): Poor evaluation with inadequate metrics and no comparison of models.
Fail (0 – 29): Fails to evaluate models effectively or provide meaningful insights.

Practical Skills and Professional Learning Skills: Model Evaluation - 20%
Distinction (85 – 100): Comprehensive evaluation using relevant metrics with insightful comparison and interpretation of results.
Distinction (70 – 84): Accurate evaluation with good comparison and interpretation of multiple models.
Merit (60 – 69): Good evaluation with appropriate metrics and some comparison of models.
Pass (50 – 59): Adequate evaluation with basic metrics and limited comparison of models.
Borderline Fail (45 – 49): Limited evaluation with minimal use of metrics and comparison.
Fail (30 – 44): Poor evaluation with inadequate metrics and no comparison of models.
Fail (0 – 29): Fails to evaluate models effectively or provide meaningful insights.

Practical Skills and Professional Learning Skills: Model Deployment - 10%
Distinction (85 – 100): Thorough testing of the model with real-world or unseen data, demonstrating accurate predictions and validation of performance.
Distinction (70 – 84): Effective testing with real-world or unseen data and validation of performance.
Merit (60 – 69): Good testing with some validation of performance using unseen data.
Pass (50 – 59): Adequate testing with basic validation of performance.
Borderline Fail (45 – 49): Limited testing with insufficient validation of performance.
Fail (30 – 44): Poor testing with minimal or ineffective validation of performance.
Fail (0 – 29): Fails to test or validate model performance effectively.

Communication: Report Structure and Clarity - 5%
Distinction (85 – 100): Clear, organized report with detailed sections, insightful explanations, and effective visualizations. Cohesive analysis and conclusions.
Distinction (70 – 84): Well-structured report with good explanations and visualizations.
Merit (60 – 69): Clear report with adequate structure, explanations, and some visualizations.
Pass (50 – 59): Adequate report with basic structure and analysis, though visualizations may be lacking.
Borderline Fail (45 – 49): Limited report with unclear structure and insufficient analysis or visualizations.
Fail (30 – 44): Poor report with minimal organization, analysis, and visualizations.
Fail (0 – 29): Fails to provide a coherent report or meaningful analysis and visualizations.
