HW3

This document outlines the requirements for Homework 3 in the Business Data Mining course. It includes 4 problems analyzing datasets using data mining techniques like decision trees and ROC curves. Students must submit a paper of no more than 6 pages by September 23 addressing the questions, which involve tasks like determining misclassification rates, plotting charts, and interpreting evaluation metrics for predictive models.

Uploaded by

Arpit Gulati

BUSINESS DATA MINING (IDS 572)

HOMEWORK 3
DUE DATE: WEDNESDAY, SEPTEMBER 23 AT 6:00 PM

Please provide succinct answers to the questions below.

Your entire write-up must be at most six pages long, including any figures and/or SPSS printouts.
Submit an electronic PDF or Word file on Blackboard.
Please include the names of all team members in your write-up and in the file name.

Problem 1. Consider the 100 data points in the file hw3.xls. Each data point is either POSITIVE or
NEGATIVE. Based on data not shown, two models have been trained to predict the values of these
data points. For each of the two models, the table gives the probability that each data point is
POSITIVE; the last column shows the actual value.
(a) Determine the proportion of records p that are POSITIVE.
(b) Assume the models classify a record as POSITIVE if the probability of POSITIVE is larger
than 0.5 and otherwise they classify it as NEGATIVE. Determine the misclassification rate for
each of the two models.
(c) Again, assume the models classify a record as POSITIVE if the probability of POSITIVE is
larger than 0.5. Give the coincidence matrices for the two models.
(d) Plot the cumulative response charts for the two models (you can plot the results for the two
models in the same chart). Assume a hit is a POSITIVE record.
(e) Plot the gain charts for the two models (you can plot the results for the two models in the same
chart). Again, assume a hit is a POSITIVE record.
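The threshold rule in parts (b) and (c) can be sketched in Python as below. This is a minimal illustration, not part of the assignment: the six records and both probability columns are hypothetical toy values (hw3.xls is not reproduced here), and the function names are illustrative. Because "larger than 0.5" is a strict inequality, a probability of exactly 0.5 is classified NEGATIVE.

```python
# Sketch for Problem 1(a)-(c). The data below is HYPOTHETICAL, not hw3.xls;
# replace it with the two probability columns and the actual-value column.

LABELS = ("POSITIVE", "NEGATIVE")


def proportion_positive(actual):
    """Fraction of records whose actual value is POSITIVE."""
    return sum(1 for a in actual if a == "POSITIVE") / len(actual)


def classify(probs, threshold=0.5):
    """POSITIVE only if P(POSITIVE) is strictly larger than the threshold."""
    return ["POSITIVE" if p > threshold else "NEGATIVE" for p in probs]


def misclassification_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual one."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def coincidence_matrix(actual, predicted):
    """Counts keyed by (actual, predicted)."""
    matrix = {(a, p): 0 for a in LABELS for p in LABELS}
    for a, p in zip(actual, predicted):
        matrix[(a, p)] += 1
    return matrix


# Hypothetical example: 6 records scored by two models.
actual = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
model_1 = [0.91, 0.40, 0.62, 0.55, 0.30, 0.10]
model_2 = [0.70, 0.65, 0.80, 0.20, 0.51, 0.45]

print("p =", proportion_positive(actual))
for name, probs in (("Model 1", model_1), ("Model 2", model_2)):
    predicted = classify(probs)
    print(name, "misclassification rate:", round(misclassification_rate(actual, predicted), 3))
    print(name, "coincidence matrix:", coincidence_matrix(actual, predicted))
```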
Problem 2. Download the file bank-data.csv. The key field in this data set is PEP (Personal Equity
Plan, a savings product our bank offers). Our goal is to predict whether or not a customer will purchase
a PEP. We have data on the purchasing patterns of 600 customers. The fields are:
id: a unique identification number
age: age of customer in years
sex: MALE / FEMALE
region: inner city / rural / suburban / town
income: income of customer
married: is the customer married (YES/NO)
children: number of children
car: does the customer own a car (YES/NO)
save acct: does the customer have a savings account (YES/NO)
current acct: does the customer have a current account (YES/NO)
mortgage: does the customer have a mortgage (YES/NO)
pep: did the customer buy a PEP after the last mailing (YES/NO)
Use SPSS Modeler to answer the following questions.
(a) Use 67% of the data set for training and 33% for testing. Create the default C&RT and C5.0
decision tree models. Use the Analysis node to determine the misclassification rates and
coincidence matrices for each of the two models on the testing data.
(b) Use the Evaluation node (and relevant charts) to answer the following questions for the C&RT
model:

i. What fraction of those who would buy the PEP product do we reach if we mail to only
half of our customer base?
ii. If we mail to half of our customer base, what fraction of those mailed would we expect to purchase
PEP (assuming, of course, that we are mailing to a previously unmailed-to group)?
iii. What lift would we get if we mailed to only the most likely 10% of the population?
(c) Repeat part (b) for the C5.0 model.
To learn how to draw the different graphs using the Evaluation node in SPSS Modeler, follow
the instructions in the document Evaluation Node on Blackboard. This document can
be found under the SPSS Modeler Documents area.
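The Evaluation node charts in part (b) are built from a ranking of records by model score. The sketch below shows the arithmetic under that assumption: gain at 50% depth corresponds to question i, response at 50% depth to question ii, and lift at 10% depth to question iii; the same ranking also yields the cumulative response and gain charts of Problem 1(d) and (e). The ten scores and hits are hypothetical, the function names are illustrative, and SPSS Modeler may bin records or break ties differently, so treat this as a way to sanity-check the charts rather than a replica of the node.

```python
# Sketch of the quantities behind cumulative response, gain and lift charts.
# The demo data at the bottom is HYPOTHETICAL, not bank-data.csv.


def chart_point(scores, hits, depth):
    """Gain, response and lift for the top `depth` fraction of records.

    scores: model score per record (higher = more likely to be a hit)
    hits:   1 if the record is a hit (e.g. bought a PEP), else 0
    depth:  fraction of the ranked list that is mailed to, e.g. 0.5
    """
    # Rank from most to least likely. A sort with reverse=True is still
    # stable, so tied scores keep their input order (one possible tie rule).
    ranked = sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    k = max(1, round(depth * n))          # whole number of records mailed to
    top_hits = sum(hit for _, hit in ranked[:k])
    total_hits = sum(hits)
    gain = top_hits / total_hits          # share of all hits reached
    response = top_hits / k               # hit rate among those mailed
    lift = response / (total_hits / n)    # response relative to the base rate
    return gain, response, lift


def curve(scores, hits, steps=10):
    """Points (depth, gain, response, lift) at depths 1/steps, ..., 1."""
    return [(i / steps, *chart_point(scores, hits, i / steps))
            for i in range(1, steps + 1)]


# Hypothetical scored test set: 10 records, 4 hits.
demo_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
demo_hits = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for depth in (0.5, 0.1):
    gain, response, lift = chart_point(demo_scores, demo_hits, depth)
    print(f"top {depth:.0%}: gain={gain:.2f} response={response:.2f} lift={lift:.2f}")
```

At full depth the gain is 1 and the lift is 1 by construction, which is a quick check that a chart has been read correctly.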
Problem 3. A data mining routine has been applied to a transaction dataset and has classified 88
records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).
(a) Construct the confusion matrix and calculate the error rate, accuracy rate, recall, precision,
specificity, and false alarm rate. Please include the formulas.
(b) Consider the decile lift chart below (a decile lift chart is a lift chart portrayed as a
decile chart) for the transaction data model applied to new data.

Interpret the meaning of the first and second bars from the left.
(c) Another analyst comments that you could improve the accuracy of the model by classifying
everything as nonfraudulent. If you do that, what is the error rate?
(d) Comment on the usefulness, in this situation, of these two metrics of model performance
(error rate and lift).
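Part (a) is plain arithmetic once the stated counts are placed in a confusion matrix with fraudulent as the positive class: TP = 30, FP = 88 - 30 = 58, TN = 920, FN = 952 - 920 = 32. A short Python sketch follows. It assumes false alarm rate means FP / (FP + TN), i.e. 1 - specificity; some texts define it differently, so check the course's definition. The last lines cover part (c): if every record is labelled nonfraudulent, the only errors are the actual frauds.

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; 'fraudulent' is the positive class."""
    total = tp + fp + tn + fn
    return {
        "error rate": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),          # also called sensitivity
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        # Assumed definition: false alarm rate = FP / (FP + TN) = 1 - specificity.
        "false alarm rate": fp / (fp + tn),
    }


# Counts from the problem statement.
tp, fp = 30, 88 - 30     # 88 classified fraudulent, 30 of them correctly
tn, fn = 920, 952 - 920  # 952 classified nonfraudulent, 920 of them correctly

print("                 predicted fraud  predicted nonfraud")
print(f"actual fraud     {tp:>15}  {fn:>18}")
print(f"actual nonfraud  {fp:>15}  {tn:>18}")
for name, value in metrics(tp, fp, tn, fn).items():
    print(f"{name}: {value:.4f}")

# Part (c): classify everything as nonfraudulent, so every actual fraud
# (tp + fn of them) becomes an error and nothing else does.
total = tp + fp + tn + fn
print(f"error rate if all nonfraudulent: {(tp + fn) / total:.4f}")
```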

Problem 4. Suppose we have developed a classifier that will be used in an alarm system. Usually we
are especially interested in the portion of alarms caused by positive events (which should really fire an
alarm) and the portion of alarms caused by negative events. The ratio between positive and negative
events can vary over time, so we want to measure the quality of our alarm system independently of this
ratio. The table below shows the results of a probabilistic classifier on a given test set. Draw the ROC
curve for this classifier (you can use Excel to draw this chart).
Inst#  Class  Score
1      p      0.9
2      p      0.8
3      n      0.7
4      p      0.6
5      p      0.55
6      p      0.54
7      n      0.53
8      n      0.52
9      p      0.51
10     n      0.505
11     p      0.4
12     n      0.39
13     p      0.38
14     p      0.37
15     n      0.36
16     n      0.35
17     p      0.34
18     n      0.33
19     p      0.30
20     n      0.1
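The ROC curve for Problem 4 can be traced by lowering the threshold past one score at a time: each instance admitted as positive moves the curve up by 1/P if it is p, or right by 1/N if it is n. The Python sketch below applies this to the table above, which as given contains 11 p and 9 n instances; the printed (false positive rate, true positive rate) pairs can be pasted into an Excel scatter chart. All scores in the table are distinct, so the sketch does not handle ties; tied scores would have to be processed as one group, producing a diagonal segment.

```python
# ROC points for the 20-instance test set in Problem 4 (data from the table).
classes = "p p n p p p n n p n p n p p n n p n p n".split()
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505,
          0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1]


def roc_points(classes, scores):
    """(FPR, TPR) after each instance, highest score first; assumes distinct scores."""
    P = classes.count("p")
    N = classes.count("n")
    ranked = sorted(zip(scores, classes), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    for _, c in ranked:
        if c == "p":
            tp += 1        # a true positive moves the curve up
        else:
            fp += 1        # a false positive moves the curve right
        points.append((fp / N, tp / P))
    return points


print("FPR\tTPR")
for fpr, tpr in roc_points(classes, scores):
    print(f"{fpr:.3f}\t{tpr:.3f}")
```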
