CA One 2024

Assessment Brief

Module Title: Applied Statistics & Machine Learning

Module Code: B9BA102

Assessment Title: Continuous Assessment One

Assessment Number: 1

Assessment Type: Practical Assignment – Supervised Machine Learning

Individual/Group: Individual

Assessment Weighting: 30%

Issue Date: 25/10/2024 (Week 6)

Due Date/Time: Deadline – 11.55 pm on Sunday 16/11/2024 (Week 9)

Mode of Submission: MOODLE

Learning Outcomes to be assessed

1. Analyze a dataset from a problem domain in depth, and select appropriate statistical models,
tools, and techniques to derive insights regarding the dataset and domain.

2. Effectively extract, transform, interrogate, and analyze large datasets.

3. Construct, refine, interpret, and critically evaluate predictive analytical and machine learning
models.

4. Critically evaluate and utilize hyperparameter search strategies for optimizing machine
learning models.

Supervised Machine Learning – Classification (100 Marks)

Dataset

Each row in Bank.csv corresponds to a bank’s credit card customer.

Relevant information about this dataset is given below:

Number of Instances: 4522

Number of Attributes: 16 independent variables + 1 target variable

Independent Variables:
Input variables (bank client data):
1 - Age (numeric)
2 - job : type of job (categorical:
"admin.","unknown","unemployed","management","housemaid","entrepreneur","student",
"blue-collar","self-employed","retired","technician","services")
3 - Marital : marital status (categorical: "married","divorced","single"; note: "divorced"
means divorced or widowed)
4 - Education (categorical: "unknown","secondary","primary","tertiary")
5 - default: has credit in default? (binary: "yes","no")
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: "yes","no")
8 - Loan: has personal loan? (binary: "yes","no")
# related with the last contact of the current campaign:
9 - Contact: contact communication type (categorical: "unknown","telephone","cellular")
10 - Day: last contact day of the month (numeric)
11 - Month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - Duration: last contact duration, in seconds (numeric)
# other attributes:
13 - Campaign: number of contacts performed during this campaign and for this client
(numeric, includes last contact)
14 - Pdays: number of days that passed by after the client was last contacted from a previous
campaign (numeric, -1 means client was not previously contacted)
15 - Previous: number of contacts performed before this campaign and for this client
(numeric)
16 - Poutcome: outcome of the previous marketing campaign (categorical:
"unknown","other","failure","success")

Output variable (desired target):
17 - Y - has the client subscribed to a term deposit? (binary: "yes","no").
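
The attribute list above can be turned into a quick sanity check on the file before any modelling. The following is a minimal sketch using only the standard library. It assumes lower-case column headers and a comma delimiter, neither of which is stated in this brief, and the helper names check_row and check_file are hypothetical. The month attribute is left out because its value list is abbreviated above.

```python
import csv

# Allowed values are copied from the attribute list in this brief.
# Lower-case header names are an ASSUMPTION about Bank.csv.
ALLOWED = {
    "job": {"admin.", "unknown", "unemployed", "management", "housemaid",
            "entrepreneur", "student", "blue-collar", "self-employed",
            "retired", "technician", "services"},
    "marital": {"married", "divorced", "single"},
    "education": {"unknown", "secondary", "primary", "tertiary"},
    "default": {"yes", "no"},
    "housing": {"yes", "no"},
    "loan": {"yes", "no"},
    "contact": {"unknown", "telephone", "cellular"},
    "poutcome": {"unknown", "other", "failure", "success"},
    "y": {"yes", "no"},
}
NUMERIC = ["age", "balance", "day", "duration", "campaign", "pdays", "previous"]


def check_row(row):
    """Return a list of problems found in one row (empty list means clean)."""
    problems = []
    for name, allowed in ALLOWED.items():
        if row.get(name) not in allowed:
            problems.append(f"{name}: unexpected value {row.get(name)!r}")
    for name in NUMERIC:
        try:
            float(row[name])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{name}: not a number")
    return problems


def check_file(lines, delimiter=","):
    """Yield (line_number, problems) for every data row that fails a check."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    for number, row in enumerate(reader, start=2):  # line 1 is the header
        problems = check_row(row)
        if problems:
            yield number, problems
```

A call such as `list(check_file(open("Bank.csv", newline="")))` would list the rows that do not match the description. If the file uses a different separator, pass it through the delimiter argument.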

Task

The bank wants to use a classification model that can predict whether a client has subscribed
to a term deposit. Construct a suitable classification model for the bank by implementing both
random forest and support vector classification algorithms in Python.

In addition to providing the Python code file, you are required to provide a critical analysis of
your approach and results in a PDF report.

Your code and analysis should cover the following points:

1. Data Preparation (What steps would you take to prepare your data? Discuss your approach)
[20]
2. Model Hyperparameter Tuning (Which hyperparameters would you tune and why?
How would you tune them?) [20]

3. Choice of Evaluation Metric (Which metric would be suitable for model evaluation and
why?) [20]

4. Overfitting avoidance mechanism (Which mechanism (feature selection/regularization)
would you use and why?) [20]

5. Results analysis
a). Which of the two models (random forest or support vector classifier) would you
recommend for deployment in the real world?
b). Is any model underfitting? If yes, what could be the possible reasons?
[20]
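
As a starting point for the code file, the points above can share one scaffold. This is a rough sketch using scikit-learn, not a model answer. The column names assume lower-case headers in Bank.csv. The helper make_search, the "f1" scoring default, and every grid value are placeholders chosen for illustration. Which preprocessing steps, hyperparameters, and metric to use is exactly what the report should justify.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Column names follow the attribute list in this brief; the exact header
# spelling and case in Bank.csv is an ASSUMPTION and should be checked.
NUMERIC = ["age", "balance", "day", "duration", "campaign", "pdays", "previous"]
CATEGORICAL = ["job", "marital", "education", "default", "housing",
               "loan", "contact", "month", "poutcome"]


def make_search(estimator, param_grid, scoring="f1", folds=5):
    """Wrap preprocessing and an estimator in one cross-validated grid search."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    # Keeping preprocessing inside the pipeline means it is re-fitted on
    # each training fold, so validation folds do not leak into scaling.
    pipeline = Pipeline([("prep", preprocess), ("clf", estimator)])
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    return GridSearchCV(pipeline, param_grid, scoring=scoring, cv=cv)


# Illustrative grids only; the values are placeholders, not recommendations.
SEARCHES = {
    "random_forest": make_search(
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [100, 300], "clf__max_depth": [None, 10]},
    ),
    "svc": make_search(
        SVC(),
        {"clf__C": [0.1, 1, 10], "clf__kernel": ["rbf", "linear"]},
    ),
}
```

With the default "f1" scorer the target needs to be encoded as 1 for "yes" and 0 for "no" before calling fit, since that scorer treats label 1 as the positive class. Comparing training and validation scores from each fitted search is one way to approach the underfitting and overfitting questions.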

You must submit the following in a zipped folder:

1. Critical Analysis Report (.pdf)
2. Python Code (.py)

Naming convention:
Report should be named as –
Report_Firstname_Surname.pdf

Code should be named as –
Code_Firstname_Surname.py

Zipped folder should be named as –
Firstname_Surname.zip
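
The naming convention is easy to get wrong by hand, so a small script can apply it. The sketch below uses only the standard library. The function names submission_names and build_submission are hypothetical, and placing both files at the top level of the archive is an assumption, since this brief does not say how the zipped folder should be laid out inside.

```python
import zipfile
from pathlib import Path


def submission_names(first_name, surname):
    """Return the report, code, and archive names required by the brief."""
    stem = f"{first_name}_{surname}"
    return f"Report_{stem}.pdf", f"Code_{stem}.py", f"{stem}.zip"


def build_submission(first_name, surname, report_path, code_path, out_dir="."):
    """Write both files into a correctly named zip and return its path."""
    report_name, code_name, zip_name = submission_names(first_name, surname)
    zip_path = Path(out_dir) / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname renames the files inside the archive, so the working
        # copies on disk can keep whatever names they already have.
        archive.write(report_path, arcname=report_name)
        archive.write(code_path, arcname=code_name)
    return zip_path
```

For example, `build_submission("Jane", "Doe", "draft.pdf", "analysis.py")` would write Jane_Doe.zip to the current directory; the name and file paths here are made up. Opening the archive once before uploading confirms that both files are present and readable.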

There is no prescribed word-count for the report. It will be assessed on quality, and not
quantity of content.

Assessment Criteria
Each part will be graded according to the following criteria:
1. Quality of code (correctness and completeness) [Weightage – 40%]

2. Quality of analysis in report (critical analysis of approach, presentation and interpretation
of results, conclusion) [Weightage – 60%]

General Requirements for Students

PLEASE READ CAREFULLY


1. It is your responsibility to ensure your file is uploaded correctly.
2. Students are required to retain a copy of each assignment.
3. When an assignment is submitted, it is the student’s responsibility to ensure that the file is
in the correct format and opens correctly.
4. Students should refer to the assessment regulations in their Course Guide.
5. DBS penalizes students who engage in academic impropriety (i.e. plagiarism, collusion,
and/or copying). Please refer to the referencing guidelines on Moodle for information on
correct referencing.
6. All relevant provisions of the Assessment Regulations must be complied with.
7. Penalties for late submission of assignments are as follows:
a. 25% penalty for assignments submitted within 5 working days of the deadline.
b. No marks for assignments submitted more than 5 working days after the deadline.

Extensions to assignment submission deadlines will be granted in exceptional circumstances
only. The appropriate “Application for Extension” form must be used and supporting
documentation (e.g. medical certificate) must be attached. Applications for extensions
should be made directly to the Programme Coordinator in advance of the deadline date.
