Coursework Specification

Read this coursework specification carefully. It tells you how you are going to be assessed,
how to submit your coursework on time, and how (and when) you’ll receive your marks and
feedback.

Module Code CSI_7_DMA

Module Title Data Mining and Analysis
Lecturer Daqing Chen, Kamal Thapa
% of Module Mark 60%
Distributed 26/01/23
Submission Method Online submission via the Module’s Moodle site on the VLE
Submission Deadline 17:00, Friday 28/04/23
Release of Feedback & Marks Feedback and provisional marks will be available in the Gradebook on the VLE from 15/05/23

Coursework Aim:
This team-based assignment involves analysing a real-world dataset and creating
meaningful insights to address certain business concerns and problems identified.
You’ll be working in pairs for this assignment.

Coursework Details:

Type: Report
Overview: The objective of this assignment is to evaluate
your understanding of the basic theory, concepts, and
various methods and algorithms in data mining, and to assess
your skills in applying appropriate Python packages, such
as NumPy, Pandas, Matplotlib, and Scikit-learn, to
carry out a data mining project.

The dataset for this coursework is related to London Fire
Brigade (LFB) incidents reported from 2019 to 2022. Your
role in this project is two-fold: acting as a business client and
as a data analyst. As a business client, you are expected to
raise meaningful business concerns/problems in relation to
the data given. And as a data analyst, you are required to
follow a proper data mining methodology and apply various
techniques covered in lectures to analyse your data to
address the business concerns and problems that have been
identified.

Tasks: You are required to undertake the following tasks:

1. Problem Identification
1.1. Read the data description file (metadata) to learn the
basic characteristics of the dataset, including the
business context associated with the data,
the total number of attributes (dimensions,
variables), the data type of each attribute, the value
range, mode, skewness, and kurtosis of each
attribute, and the total number of instances. Carry out
simple data exploration with essential plotting.
1.2. Identify a set of meaningful business problems of
interest with regard to the data for analysis.
1.3. Identify what data mining tasks need to be performed
in order to address the business problems raised.
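
The characteristics listed in 1.1 can be gathered with a few Pandas calls. The following is only a minimal sketch on synthetic data: the column names (`attendance_seconds`, `pumps_attending`, `incident_group`) are hypothetical placeholders and are not taken from the LFB metadata.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data. The column names are hypothetical, not the real LFB schema.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "attendance_seconds": rng.gamma(shape=2.0, scale=150.0, size=500),
    "pumps_attending": rng.integers(1, 5, size=500),
    "incident_group": rng.choice(["Fire", "False Alarm", "Special Service"], size=500),
})

# Total number of instances (rows) and attributes (columns), plus data types.
n_instances, n_attributes = df.shape
print(f"{n_instances} instances, {n_attributes} attributes")
print(df.dtypes)

# Value range, skewness, and kurtosis of each numeric attribute.
numeric = df.select_dtypes(include="number")
summary = pd.DataFrame({
    "min": numeric.min(),
    "max": numeric.max(),
    "skewness": numeric.skew(),
    "kurtosis": numeric.kurt(),  # excess kurtosis: a normal distribution scores 0
})
print(summary)

# Most frequent value (mode) of each attribute; first mode if there are ties.
modes = df.mode().iloc[0]
print(modes)
```

Plots such as `df["attendance_seconds"].hist()` (which requires Matplotlib) would complement these figures for the "essential plotting" part of the task.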

2. Data Preparation
2.1. Determine which variables are to be used in which
analysis. Also refer to 1.2 and 1.3 in Task 1.
2.2. Get your data for analysis. Choose appropriate
methods for data pre-processing, including detecting
and dealing with incorrect data types, irrelevant
variables, missing values, outliers, imbalanced
classes, and duplicates, changing data type, and
conducting proper dimensionality reduction, feature
extraction, data transformation, data partition, and
normalisation, etc., where appropriate. Also refer to
1.1 in Task 1.
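
Several of the pre-processing steps named in 2.2 can be chained as shown below. This is a hedged sketch on synthetic data, not a prescribed pipeline: the column names are hypothetical, and median imputation, the 1.5 × IQR outlier rule, and a 75/25 stratified split are illustrative choices that you would need to justify for your own analysis.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with deliberately injected quality issues.
# Column names are hypothetical, not the real LFB schema.
rng = np.random.default_rng(1)
raw = pd.DataFrame({
    "attendance_seconds": rng.normal(300, 60, 200).round(1).astype(str),  # numbers stored as text
    "pumps_attending": rng.integers(1, 5, 200).astype(float),
    "is_fire": rng.integers(0, 2, 200),
})
raw.loc[::25, "pumps_attending"] = np.nan                # missing values
raw = pd.concat([raw, raw.iloc[:5]], ignore_index=True)  # duplicated rows

# Remove duplicates, fix the incorrect data type, and impute missing values.
clean = raw.drop_duplicates().copy()
clean["attendance_seconds"] = pd.to_numeric(clean["attendance_seconds"], errors="coerce")
clean["pumps_attending"] = clean["pumps_attending"].fillna(clean["pumps_attending"].median())

# Drop outliers that fall outside 1.5 * IQR of the quartiles.
q1, q3 = clean["attendance_seconds"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["attendance_seconds"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range]

# Partition the data, then fit the scaler on the training part only
# so that no information from the test set leaks into training.
X = clean[["attendance_seconds", "pumps_attending"]]
y = clean["is_fire"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(clean.shape, X_train_scaled.shape, X_test_scaled.shape)
```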

3. Model Construction
3.1. With the pre-processed dataset, undertake the data
mining tasks you have identified in 1.3. You are
required to apply two different algorithms for both
predictive and descriptive modelling. For
descriptive modelling, you may choose to use
k-means clustering and various EDA (Exploratory Data
Analysis) methods, e.g., histograms, bar charts, and
Pearson’s correlation coefficient. For predictive
modelling, you may use, for example, decision trees
and artificial neural networks, or decision trees and
k-nearest-neighbour.
3.2. In order to build the most appropriate and accurate
models and identify meaningful hidden patterns,
different settings for the relevant model parameters
should be considered for each of the selected
algorithms and methods.
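
One way to consider different parameter settings, as 3.2 asks, is to loop over candidate values for k-means and to run a cross-validated grid search for the predictive models. The sketch below uses synthetic data from `make_classification`. The parameter grids and the use of the silhouette score to compare values of k are assumptions made for illustration, not requirements of this specification.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a pre-processed dataset.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Descriptive modelling: try several values of k and compare silhouette scores.
X_scaled = StandardScaler().fit_transform(X)
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    silhouette_by_k[k] = silhouette_score(X_scaled, labels)
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print("silhouette by k:", silhouette_by_k, "best k:", best_k)

# Predictive modelling: two algorithms, each with a small illustrative grid.
searches = {
    "decision_tree": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 20]},
        cv=5,
    ),
    "knn": GridSearchCV(
        # k-NN is distance based, so scaling is part of the pipeline.
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [3, 7, 15]},
        cv=5,
    ),
}
for name, search in searches.items():
    search.fit(X_train, y_train)
    print(name, search.best_params_, round(search.best_score_, 3))
```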

4. Model Interpretation and Evaluation

4.1. Interpret the descriptive models created, such as
clusters created using k-means algorithms,
correlation among variables, and various relevant
plots created.
4.2. Compare the performances of different predictive
models in terms of accuracy, error rate,
generalisation capability (over-fitting), simplicity and
cost, etc., where appropriate.
4.3. Discuss the meaningfulness and usefulness of the
models built and the patterns revealed, and how the
models and the patterns can be used to address
the original business concerns. This includes both
descriptive and predictive models.
5. A summary of the main findings of the project and
suggestions to LFB based on your analysis.
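
The comparison described in 4.2 can be organised as a small table of metrics per model. The following sketch, again on synthetic data, treats the gap between training and test accuracy as a rough indicator of over-fitting and the node count of a tree as a rough indicator of simplicity. Both proxies and the three model settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a pre-processed dataset.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "deep_tree": DecisionTreeClassifier(max_depth=None, random_state=0),
    "knn_7": KNeighborsClassifier(n_neighbors=7),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    results[name] = {
        "train_accuracy": train_acc,
        "test_accuracy": test_acc,
        "test_error_rate": 1.0 - test_acc,
        # A large gap suggests the model memorised the training data.
        "overfit_gap": train_acc - test_acc,
    }

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})

# Confusion matrix for one model (rows: true class, columns: predicted class).
print(confusion_matrix(y_test, models["shallow_tree"].predict(X_test)))

# Simplicity proxy: number of nodes in each fitted tree.
print("shallow tree nodes:", models["shallow_tree"].tree_.node_count)
print("deep tree nodes:", models["deep_tree"].tree_.node_count)
```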
Word Count: As a guide, aim for 3000 words, excluding Title page, Table
of Contents, tables and graphs, footnotes, bibliography,
and scripts. The maximum word limit is 3200 words.

Your mark may be reduced if you do not meet the word
count limit.

Presentation: • Work must be referenced, and a bibliography provided.

• Work must be submitted as a Word document (.doc/docx)
or a PDF.
• Coursework must be submitted using Arial font size 11 (or
larger if you need to), with a minimum of 1.5 line spacing.
• Your student number must appear at the front of the
coursework. Your name must not be on your coursework.

Referencing: Harvard Referencing should be used, see your Library
Subject Guide for guides and tips on referencing.

Regulations: Make sure you understand the University Regulations on
expected academic practice and academic misconduct.
Note in particular:

• Your work must be your own. Markers will be attentive
to both the plausibility of the sources provided and
the consistency of the writing style and approach in the work.
Simply, if you do the research and reading, and then
write it up on your own, giving references to your sources,
you will approach the work in the appropriate way and
will not give markers reason to question the
authenticity of the work.
• All quotations must be credited and properly referenced.
Paraphrasing is still regarded as plagiarism if you fail to
acknowledge the source for the ideas being expressed.

TURNITIN: When you upload your work to the Moodle site
it will be checked by anti-plagiarism software.

Learning Outcomes
This coursework will partially assess the following learning outcomes for this module as
indicated by *.

Knowledge and Understanding

On successful completion of this module, you will be able to
• Describe and explain the concepts of data mining and business analytics.
• Critically review and appreciate the role of data mining in business analytics. *
• Critically explain how and why data mining and business analytics can be used to
create competitive advantage for businesses and enterprises. *
• Critically analyse when, why, and how data mining should be considered a possible
problem-solving strategy from a business perspective. *
• Gain sufficient working knowledge of using Python packages and libraries, such as
NumPy, Pandas, Matplotlib, and Scikit-learn, for performing data exploration,
detecting data quality issues, modelling, model interpretation and comparison, and
reporting with real-world case studies. *

Intellectual Skills
On successful completion of this module, you will be able to
• Identify different types of data mining tasks in relation to various business concerns,
including classification, prediction, cluster analysis and segmentation, and association
analysis and market basket analysis. *
• Critically review and appreciate the strengths and weaknesses of different data mining
techniques, models, and tools. *

Practical Skills
On successful completion of this module, you will be able to

• Select and apply appropriate data mining techniques for a given real-world problem. *
• Evaluate various models built from a data mining process. *
• Undertake a data mining project with clear business focus, in particular, in relation to
CRM analysis, RFM modelling, and credit risk scoring. *

Transferable Skills
On successful completion of this module, you will be able to
• Demonstrate analytical skills. *
• Demonstrate project management skills. *
• Demonstrate teamwork skills. *

Assessment Criteria and Weighting

LSBU marking criteria have been developed to help tutors give you clear and helpful
feedback on your work. They will be applied to your work to help you understand
what you have accomplished, how any mark given was arrived at, and how you can
improve your work in future.

Feedforward comments

1. Business Understanding and Data Understanding
100 - 80%: Exceptionally clear and concise analysis of business concerns and relevant data mining tasks. Excellent and creative initial data exploration with effective means.
79 - 70%: Thorough and clear analysis of business concerns and relevant data mining tasks. Excellent initial data exploration with effective means.
69 - 60%: Clear analysis of business concerns and relevant data mining tasks to a certain depth. Sensible initial data exploration performed with appropriate means.
59 - 50%: Clear analysis of business concerns and relevant data mining tasks. Probably lack some in-depth view. Essential initial data exploration performed.
49 - 40%: Adequate analysis of the key business concerns and data mining tasks. Limited initial data exploration. Probably lack some relevance. Inappropriate means.
39 - 30%: Inadequate analysis of business concerns and data mining tasks. Only simple initial data exploration performed. Lack clarity and relevance.
29 - 0%: Little or no analysis of business concerns and data mining tasks. Little or no initial data exploration performed. Little or no relevancy.

2. Data Pre-processing
100 - 80%: Thorough and extensive consideration of data quality issues. Appropriate approaches adopted with exceptionally clear understanding. Excellent use of the relevant Python packages.
79 - 70%: Thorough consideration of data quality issues. Appropriate approaches adopted with outstanding understanding. Excellent use of the relevant Python packages.
69 - 60%: Good consideration of data quality issues. Appropriate approaches adopted with clear understanding and every aspect covered. Good and flexible use of the relevant Python packages.
59 - 50%: Reasonable consideration of data quality issues. Appropriate approaches adopted with reasonable understanding and most of the main issues covered. Good use of the relevant Python packages.
49 - 40%: Limited consideration of data quality issues. Some appropriate approaches adopted with limited understanding and limited coverage. Limited use of the relevant Python packages.
39 - 30%: Inadequate view of data quality issues. Inappropriate approaches adopted. Poor use of the relevant Python packages.
29 - 0%: Little or no data quality issues considered. Inappropriate approaches adopted. No or inappropriate use of the relevant Python packages.

3. Model Construction
100 - 80%: Appropriate algorithms employed with exceptionally clear, outstanding understanding. Modelling with excellent working knowledge of the relevant Python packages.
79 - 70%: Appropriate algorithms employed with outstanding understanding. Modelling with excellent working knowledge of the relevant Python packages.
69 - 60%: Appropriate algorithms employed with clear understanding. Good and flexible use of the relevant Python packages.
59 - 50%: Appropriate algorithms employed with reasonable understanding. Good use of the relevant Python packages.
49 - 40%: Some appropriate algorithms employed with limited understanding. Limited use of the relevant Python packages.
39 - 30%: Inappropriate algorithms employed. Poor use of the relevant Python packages.
29 - 0%: Little or no algorithms employed. Little or no use of the relevant Python packages.

4. Model Evaluation
100 - 80%: Exceptionally thorough and clear model interpretation and comparison with regards to business concerns. Excellent and meaningful models/patterns created.
79 - 70%: Thorough and clear model interpretation and comparison with regards to business concerns. Excellent meaningful models/patterns created.
69 - 60%: Clear model interpretation and comparison with regards to business concerns. Significantly meaningful models/patterns created.
59 - 50%: Basic model interpretation and comparison with regards to business concerns. Reasonable models/patterns created.
49 - 40%: Weak model interpretation and comparison with regards to business concerns. Very limited meaningfulness. Probably lack some clarity.
39 - 30%: Poor model interpretation and comparison with regards to business concerns. No or little meaningful models/patterns created.
29 - 0%: Little or no model interpretation and comparison with regards to business concerns.

5. Report
100 - 80%: Exceptionally clear and concise summary of project findings. May raise questions for future research. Exceptional outstanding presentation. Clear structure and layout.
79 - 70%: Very clear and concise summary of project findings. May raise questions for future research. Outstanding presentation. Clear structure and layout.
69 - 60%: Clear and concise summary of project findings. Excellent presentation. Clear structure and layout.
59 - 50%: Clear review and summary of project findings. Good presentation with proper structure and layout.
49 - 40%: Adequate review of project findings. Probably lack of some clarity. Acceptable presentation.
39 - 30%: Inadequate review of project findings. Lack of clarity and accuracy. Poor presentation.
29 - 0%: Little or no review of project findings. Significantly lack of clarity and accuracy. Very poor presentation.

How to get help

If you have any related questions, please contact Daqing Chen (email:
[email protected]) as soon as possible.

You can also refer to the module’s lectures, tutorial handouts, and the references
recommended in the module guide.

Quality assurance of coursework specifications

Coursework specifications within the CSI division go through internal moderation
(and, for new modules with 100% coursework, external moderation as well). This is to
ensure high quality, consistency and appropriateness of the coursework, as
well as to share best practice within the CSI division.

