Assignment 1 DA - E Oct 2023 V1-1

Uploaded by

Anisha Gheever

Data8001

Lecturer: Aengus Daly

Assignment

Due - Monday 20th Nov 2023, 11.59 pm

Please submit your work via Canvas: (1) your code script with comments, (2) your report in Word,
PDF or similar format, (3) a 4-minute mp4 recording. Name your final files starting as follows:
FirstName_Surname_. Make sure your code works using only the initial dataset. Do not zip your
files.

Standard MTU penalties apply for work submitted after the due date.

Your submission should be your own work; plagiarism will be dealt with in accordance with MTU
regulations.

Note this assignment is worth 40% of this module. Reference your work appropriately.

Annotate your code with comments, especially code that is complicated; marks will be given for
comments that display understanding of all the code you use, including code given in labs and
class.

Marks are awarded for code that is succinct and neat and for labelling variables in a meaningful
and clear manner. Marks are also awarded for answers that show a level of individual thought and
expression, so add these where possible in your comments.

Question 1

Siobhán, a manager at a financial institution, has contacted you. She is asking for your assistance in assessing
the creditworthiness of potential future customers. She has a dataset of 904 past loan customer cases, with
14 attributes for each case, including financial standing, reason for the loan, employment,
demographic information, foreign national status, years of residence in the district and the outcome/label variable
Credit Standing, which classifies each case as either a good loan or a bad loan.

The dataset is on Canvas in the assignment folder and is called Credit_Risk_32_final.csv.

Data Details

Most of the attributes are self-explanatory. The names of some attributes are somewhat cumbersome,
but this is what you have been given. Further details of some of them follow:

Checking Acct – The level of regular checking account the customer has: No acct, 0balance, low
(balance), high (balance)

Credit History – One of the following:
    All paid – No credit taken or all credit paid back duly
    Bank Paid – All credit at this bank paid back
    Current – Existing loan/credit paid back duly till now
    Critical – Risky account or other credits at other banks
    Delay – Delay in paying back credit/loan in the past

Months Acct – The number of months the customer has held an account with the bank.

Credibility score – A score given to applicants to reflect how credible it is that they will repay the loan, using a formula
created by a data analyst who had access to all historical data.

Check – The data analyst created this field as a check on Credit Standing and had access to all historical data.

Using R or Python, help Siobhán answer the following questions. Make sure you explain your code, especially
the more complicated sections. If you are unable to complete some of the coding parts, explain in words
(with pseudocode if appropriate) what you would like to do.

a) Exploratory Data Analysis (EDA): Carry out EDA on the data set. Do you notice anything unusual
(missing data, outliers, duplicates etc.) or any patterns in the data set? Detail these and outline
any actions you propose to take before you start model building in part b). Max word count 500.
10 marks
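
As a rough illustration of the kind of checks part a) asks about, the sketch below counts missing values and exact duplicate rows using only the Python standard library. The column names and the four sample rows are invented for this example and are not taken from Credit_Risk_32_final.csv; with the real file one would read it from disk instead, and a data-frame library would likely be more convenient.

```python
import csv
import io
from collections import Counter

# Invented sample rows (hypothetical column names), standing in for the real CSV file.
sample_csv = """ID,Checking.Acct,Age,Credit.Standing
1,Low,34,Good
2,,41,Bad
3,High,151,Good
3,High,151,Good
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Count empty cells per column (a simple proxy for missing data).
missing = Counter(col for row in rows for col, val in row.items() if val == "")

# Rows whose full set of values appears more than once are exact duplicates.
row_counts = Counter(tuple(row.values()) for row in rows)
duplicates = [values for values, n in row_counts.items() if n > 1]

# A crude range check flags implausible numeric values (the bound of 120 is an assumption).
suspect_ages = [row["ID"] for row in rows if int(row["Age"]) > 120]

print("missing per column:", dict(missing))
print("duplicated rows:", duplicates)
print("IDs with suspect ages:", suspect_ages)
```

Whether a flagged value is an error or a genuine extreme still needs judgement; the code only points to where to look.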

b) Split the dataset into a 75% training set and a 25% test set using set.seed(abc), where abc are the last 3 digits
of your student number. (Use this seed for all other functions with an element of randomness in this
work.)
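
A seeded 75/25 split can be sketched in Python as follows. This is only an illustration: the function name `train_test_split_rows`, the seed 123 and the placeholder rows are assumptions, and Python's random generator does not reproduce the numbers R's set.seed would give for the same seed.

```python
import random

def train_test_split_rows(rows, train_fraction=0.75, seed=123):
    """Shuffle row positions reproducibly and split the rows into train and test lists."""
    rng = random.Random(seed)            # local generator, so the seed is explicit
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Placeholder rows standing in for the 904 loan cases.
rows = list(range(904))
train, test = train_test_split_rows(rows, seed=123)   # 123 is a hypothetical seed
print(len(train), len(test))
```

Reusing the same seed gives the same split on every run, which is what makes later results comparable.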

c) Using the code given in the labs or otherwise, use base R (or the Python equivalent) to write code that uses
the entropy formula to split on only the categorical predictor variables. Show which predictor
variable should be used for the root node split. Use only the training set from b) to do this; you
are not constrained to binary splits.
10 marks
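
One common way to choose a root node with the entropy formula is to compute the information gain of each categorical predictor and pick the largest. The sketch below shows that idea on toy data; the predictor names and values are invented for illustration, and this is not necessarily the approach used in the labs.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction from a multiway split on every distinct value of a predictor."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy data, invented for illustration only (not taken from the assignment dataset).
labels = ["Good", "Good", "Bad", "Bad", "Good", "Bad"]
predictors = {
    "Employment": ["Short", "Long", "Short", "Short", "Long", "Long"],
    "Checking.Acct": ["High", "High", "Low", "Low", "High", "Low"],
}

gains = {name: information_gain(vals, labels) for name, vals in predictors.items()}
root = max(gains, key=gains.get)   # predictor with the largest information gain
print(gains, root)
```

Note that a multiway split tends to favour predictors with many levels, which is one reason the binary-split results in part d) can differ.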

d) Now redo part c), but constrained to binary splits only, i.e. splits with only 2 possible
outcomes. Show how this affects your results and give reasons why this is the case.
10 marks
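
For a categorical predictor with k levels, a binary split means grouping the levels into two non-empty sets, which gives 2^(k-1) - 1 candidate splits. The following sketch enumerates them and keeps the one with the highest information gain; the function name `best_binary_split` and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_binary_split(values, labels):
    """Try every two-way grouping of the levels; return (gain, left_levels)."""
    levels = sorted(set(values))
    base = entropy(labels)
    best = (0.0, None)
    # Keeping the first level on the left side avoids counting mirror-image splits twice.
    first, rest = levels[0], levels[1:]
    for size in range(len(rest)):              # size < len(rest) keeps the right side non-empty
        for extra in combinations(rest, size):
            left_levels = {first, *extra}
            left = [l for v, l in zip(values, labels) if v in left_levels]
            right = [l for v, l in zip(values, labels) if v not in left_levels]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if base - weighted > best[0]:
                best = (base - weighted, left_levels)
    return best

# Toy data, invented for illustration only.
history = ["All paid", "Current", "Critical", "Delay", "Current", "Critical"]
labels = ["Good", "Good", "Bad", "Bad", "Good", "Bad"]
gain, left_levels = best_binary_split(history, labels)
print(gain, left_levels)
```

The number of candidate groupings grows exponentially with the number of levels, so exhaustive search like this is only practical for predictors with few levels.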

e) Now include the continuous numeric predictor variables, again use only a binary split. Which is now
the root node split? Analyse your results and comment.
10 marks
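
For a continuous predictor, a binary split is usually written as value <= threshold, and a standard choice of candidate thresholds is the midpoints between neighbouring distinct values. The sketch below applies that idea to invented data; `best_threshold` and the sample values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (gain, threshold) for the best split of the form value <= threshold."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (0.0, None)
    # Candidate thresholds are midpoints between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - weighted > best[0]:
            best = (base - weighted, threshold)
    return best

# Toy data, invented for illustration only.
months = [6, 12, 18, 24, 36, 48]
labels = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]
gain, threshold = best_threshold(months, labels)
print(gain, threshold)
```

The best gain found here is directly comparable with the gains of binary categorical splits, so both kinds of predictor can compete for the root node.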

f) Now investigate the second split, i.e. determine which predictor variable(s) should be used to
split at the next level of the decision tree. Only binary splits are allowed again here. Detail in words,
diagrams and code the approach you are going to use.
10 marks

g) Use the tree function from the package tree, or equivalent, to build a decision tree. Compare the
results to those in f) and comment. If you use pruning here you should explain all the methodology
you use.
10 marks

h) Now see if you can improve your results by using a random forest model. Give your results (5 marks)
and explain and comment (5 marks).
10 marks

i) Due to GDPR you are no longer allowed to use the following variables to build your model: Age,
Personal.Status and Foreign.National. Now redo your working for your best model. Give your results
and comment.
10 marks

j) Siobhán’s company uses a process that is a mixture of a grading system and human input to grade
each past loan as good or bad. Siobhán suspects that this process performed poorly during a
particular period. The ID numbers can be taken as time stamp values. Develop a strategy to find a
series of consecutive ID numbers where these gradings show a higher than normal pattern of
suspiciously incorrect or correct gradings. Detail how you go about your investigation.
10 marks
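
One possible strategy, offered only as a sketch, is to flag each case where the recorded grade disagrees with some reference (for example a model prediction), order the flags by ID, and slide a fixed-length window along them to find the stretch with the highest mismatch rate. The flags, the window length of 5 and the function name `worst_window` below are all assumptions for illustration.

```python
def worst_window(mismatch_flags, window=5):
    """Return (start_index, rate) of the window with the highest mismatch rate."""
    if window > len(mismatch_flags):
        raise ValueError("window is longer than the series")
    current = sum(mismatch_flags[:window])
    best_start, best_count = 0, current
    for start in range(1, len(mismatch_flags) - window + 1):
        # Slide one case forward: add the newest flag, drop the oldest.
        current += mismatch_flags[start + window - 1] - mismatch_flags[start - 1]
        if current > best_count:
            best_start, best_count = start, current
    return best_start, best_count / window

# Hypothetical flags ordered by ID: 1 = recorded grade disagrees with the reference.
flags = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
start, rate = worst_window(flags, window=5)
overall = sum(flags) / len(flags)
print(start, rate, overall)
```

Comparing the window rate with the overall rate gives a first indication of whether a run is unusual; a formal test and a sensitivity check on the window length would still be needed.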

k) Select 2 parts of your answer above, e.g. (i) and (j), and record a 4 min video to demonstrate your
learning/understanding, ideally of the difficult parts of these questions. Only the first 4 mins of the
recording will be viewed.
10 marks

[Total 100 marks]
