Customer Analytics Homework 1
By Group 11: Boying Li, Mingyue Dai
and Yuantong Zhou
Estimation Preliminaries
Converting categorical variables to factors ensures that models treat them correctly.
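A minimal sketch of this step, assuming categorical columns named sex, education, marriage, and pay_1 (only pay_1 appears elsewhere in this write-up; the other column names are assumptions):
# Treat categorical columns as discrete levels rather than numbers.
# Column names other than pay_1 are assumptions about this dataset.
cat_vars <- c("sex", "education", "marriage", "pay_1")
df[cat_vars] <- lapply(df[cat_vars], factor)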
Descriptive Statistics
barplot(table(df$y))
hist(df$age)
barplot(table(df$pay_1))
barplot(df$bill_amt1)
Question 1: Generate a random training/validation index that
implements a 70/30 split. Use a random seed of your choice.
First, we set a random seed (set.seed(365)) to ensure reproducibility. Then, we used the sample()
function to randomly assign each observation to either the training set or the validation set, with a
probability of 70% for training and 30% for validation (prob = c(0.7, 0.3)). This splits the data into the
two groups randomly but reproducibly. After running the code, table(idx) confirmed the distribution of
samples, where idx == 1 marks the training set and idx == 2 marks the validation set.
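A minimal sketch of the split described above (the exact sample() call is an assumption, since the original code is not shown):
set.seed(365)                                    # reproducibility
idx <- sample(1:2, size = nrow(df), replace = TRUE,
              prob = c(0.7, 0.3))                # 1 = training, 2 = validation
table(idx)                                       # check the 70/30 split
train <- df[idx == 1, ]
valid <- df[idx == 2, ]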
Question 2:
Estimate two logistic specifications that allow you to generate out-of-sample predictions of y. Take the following points into account:
• You choose the variables X that enter each model specification. These variables X can be continuous or categorical. Make sure continuous and categorical variables are entered appropriately into the models.
• Specify model 1 as the simplest of the two. This model must include at least 5 explanatory variables.
• Specify model 2 as the richer/more flexible of the two. Control flexibility through the set of X variables used. Include at least one variable interaction. [An interaction of two variables, x1 and x2, would be x3 = x1*x2.]
Model 1
• Model 1 is a simple logistic regression model built to predict whether a customer will default. It
excludes non-predictive variables such as id, along with features such as bill_amt5, bill_amt6, pay_amt5,
and pay_amt6, to streamline the model (see the sketch after these bullets).
• The ROC curve has an AUC of 0.770, indicating reasonably good discrimination between defaulters
and non-defaulters.
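A minimal sketch of Model 1, assuming the exclusions listed above and a glm() fit on the training set from Question 1 (the exact formula is an assumption, since the original code is not shown):
m1 <- glm(y ~ . - id - bill_amt5 - bill_amt6 - pay_amt5 - pay_amt6,
          data = train, family = binomial)   # logistic regression on the training set
summary(m1)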
Model 2
• Model 2 is a richer logistic regression model that includes all features except id, plus an interaction
between bill_amt1 and pay_amt1 (see the sketch after these bullets).
• The ROC curve has an AUC of 0.776, indicating slightly better discrimination between defaulters
and non-defaulters.
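A minimal sketch of Model 2 under the same assumptions, adding the bill_amt1 * pay_amt1 interaction described above:
m2 <- glm(y ~ . - id + bill_amt1:pay_amt1,
          data = train, family = binomial)   # main effects plus the interaction term
summary(m2)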
Model 1 and Model 2 Comparison
• Compared to Model 1, Model 2 performs slightly better overall, with a higher AUC of 0.776 compared to 0.770,
indicating improved ability to distinguish between defaulters and non-defaulters. Both models achieve
similar accuracy, around 82%, and maintain high specificity (Model 2: 95.5%, Model 1: 95.6%), making them
highly reliable at identifying non-defaulters. The inclusion of the interaction term in Model 2 adds flexibility,
capturing relationships that Model 1 may miss. Model 2 provides a slight edge in overall classification
performance and adaptability.
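A sketch of how the out-of-sample comparison above can be computed, using the objects from the sketches earlier; the pROC package and the 0.5 cutoff are assumptions:
library(pROC)

# Out-of-sample predicted default probabilities
p1 <- predict(m1, newdata = valid, type = "response")
p2 <- predict(m2, newdata = valid, type = "response")

# AUC for each model (reported above as ~0.770 and ~0.776)
auc(roc(valid$y, p1))
auc(roc(valid$y, p2))

# Confusion table for Model 2 at a 0.5 cutoff; accuracy and specificity follow from it
tab2 <- table(actual = valid$y, predicted = as.numeric(p2 > 0.5))
sum(diag(tab2)) / sum(tab2)          # accuracy
tab2["0", "0"] / sum(tab2["0", ])    # specificity: non-defaulters correctly identified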
Question 3: Do any of your models exhibit signs of
overfitting? Explain.
Accuracy        Model 1    Model 2
In-Sample       0.8219     0.8204
Out-of-Sample   0.8202     0.8172
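A sketch of how the in-sample and out-of-sample accuracies in the table can be obtained, again assuming the objects from the earlier sketches and a 0.5 cutoff:
acc <- function(model, data) {
  p <- predict(model, newdata = data, type = "response")
  mean(as.numeric(p > 0.5) == data$y)    # share of correct predictions
}
acc(m1, train); acc(m1, valid)   # Model 1: in-sample vs out-of-sample
acc(m2, train); acc(m2, valid)   # Model 2: in-sample vs out-of-sample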
Neither Model 1 nor Model 2 shows clear signs of overfitting, as their performance on the training
and validation sets is very similar. Both models maintain consistent results across the two datasets,
including high specificity and a solid AUC, suggesting they generalize well. While Model 2 is slightly
more complex due to the interaction term, this does not lead to any noticeable overfitting. Overall,
both models perform reliably without overfitting.
Question 4: Provide a discussion of which of the two models you would
prefer for the purpose of identifying consumers who will default in the
future. If needed, make assumptions.
Between the two models, I would prefer Model 2 for identifying consumers who will default. Both models
achieve similar accuracy (~82%) and high specificity, but Model 2 has a slight edge with a higher AUC
(0.776 vs. 0.770), meaning it is better at distinguishing defaulters from non-defaulters overall. The
inclusion of the interaction term (bill_amt1 * pay_amt1) makes Model 2 more flexible, allowing it to
capture relationships that Model 1 might miss. While neither model is perfect at identifying all defaulters,
Model 2's improved classification ability and adaptability make it the better choice for future predictions.
Additionally, in this setting false negatives, i.e., predicting non-default for customers who will actually
default, are especially costly to the company, so Model 2's slightly better discrimination should also translate
into better recall of defaulters.