
6720 Labs Chapter 7

Universal Bank wants to increase personal loan customers using data on 5,000 previous customers. The document asks to: 1) Perform a k-NN analysis with k=1 on training data to classify a new customer, predicting they would accept a loan. 2) Determine the best k value to balance overfitting and information by testing k from 1 to 10. 3) Report the number of correct and incorrect classifications on the validation set using the best k. 4) Classify the new customer using the best k. 5) Repartition data into training, validation, test sets and compare confusion matrices, noting differences indicate overfitting on training data.

Uploaded by

sweetie05

Data Mining Review Questions / XLMiner Labs

Chapter 7 k-Nearest Neighbors (k-NN)

1. Personal Loan Acceptance. Universal Bank is a relatively young bank growing
rapidly in terms of overall customer acquisition. Universal Bank wants to convert
its liability customers (depositors) into personal loan customers (while retaining
them as depositors). A campaign that the bank ran last year for liability
customers showed a healthy conversion rate of over 9%. This has
encouraged the retail marketing department to devise smarter campaigns with
better target marketing. The goal of our analysis is to model the previous
campaign's customer behavior to analyze what combination of factors makes a
customer more likely to take out a personal loan.
The file UniversalBank.xls contains data on 5,000 customers. The data include
demographic information (age, income, etc.), the customer's relationship with
the bank (mortgage, securities account, etc.), and the customer's response to
the last personal loan campaign (variable = Personal Loan). Among the 5,000
customers, only 480 (9.6%) accepted the personal loan offer in the last
campaign (textbook reference - 7.1).
Partition the data into training (60%) and validation (40%) sets.
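The partitioning step can be sketched in Python as a seeded shuffle followed by a split. This is only an illustration of the idea, not XLMiner's own procedure: the function name `partition`, the seed value, and the use of row indices in place of the actual customer records are all assumptions.

```python
import random

def partition(rows, fractions=(0.6, 0.4), seed=1):
    """Shuffle rows and split them into consecutive parts sized by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last part takes whatever remains, so no row is lost to rounding.
        is_last = i == len(fractions) - 1
        end = len(shuffled) if is_last else start + round(frac * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts

# Row indices stand in for the 5,000 customer records (hypothetical placeholder).
train, valid = partition(range(5000))
print(len(train), len(valid))
```

Passing `fractions=(0.5, 0.3, 0.2)` to the same function would give a three-way training, validation, and test split.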
a. Perform a k-NN classification with all input variables except ID and ZIP
CODE using k = 1. (Remember to transform categorical variables with more
than two categories into dummy variables.) Specify the success class as
1 (loan accepted), and use the default cutoff value of 0.5. How would
the following new customer be classified using your model: Age=40,
Experience=10, Income=84, Family=2, CCAvg=2, Education_1=0,
Education_2=1, Education_3=0, Mortgage=0, Securities Account=0, CD
Account=0, Online=1, and Credit Card=1?
b. What is the choice of k that balances between overfitting and ignoring the
predictor information? (Hint: Run k-NN for k values 1 to 10).
c. Using the Confusion Matrix for the validation data in Part b, how many
customers were classified correctly? How many customers were classified
incorrectly?
d. Classify the new customer using the best k.

e. Repartition the data; this time into training, validation, and test sets
(50% : 30% : 20%). Apply the k-NN method with the k chosen above.
Compare the Confusion Matrix of the test set with that of the training and
validation sets. Comment on the differences and their reason. What is
your assessment of the performance of this model?
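The workflow in parts a through e (classify with a given k, pick k on the validation set, read off the confusion matrix) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not XLMiner's implementation: the helper names are hypothetical, the predictors are z-scored with training-set statistics, a record is classified as 1 when the share of 1s among its neighbors is at least the cutoff, and the few rows below are made-up toy values, not records from UniversalBank.xls.

```python
import math

def standardize(train_X):
    """Return a function that z-scores a row using training means and std devs."""
    cols = list(zip(*train_X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, stds)]

def knn_predict(train_X, train_y, x, k, cutoff=0.5):
    """Classify x as 1 if the share of 1s among its k nearest neighbors >= cutoff."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    share = sum(train_y[i] for i in order[:k]) / k
    return 1 if share >= cutoff else 0

def confusion(train_X, train_y, X, y, k):
    """Count (actual, predicted) pairs for the rows in X."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for row, actual in zip(X, y):
        counts[(actual, knn_predict(train_X, train_y, row, k))] += 1
    return counts

def best_k(train_X, train_y, valid_X, valid_y, ks=range(1, 11)):
    """Return the k with the fewest validation errors (ties go to the smaller k)."""
    def errors(k):
        c = confusion(train_X, train_y, valid_X, valid_y, k)
        return c[(0, 1)] + c[(1, 0)]
    return min(ks, key=errors)

# Made-up toy rows: [Income, CCAvg]. NOT taken from UniversalBank.xls.
train_X = [[30, 0.5], [45, 1.0], [60, 1.5], [150, 5.0], [170, 6.0], [190, 7.5]]
train_y = [0, 0, 0, 1, 1, 1]
valid_X = [[40, 0.8], [160, 5.5]]
valid_y = [0, 1]

scale = standardize(train_X)
train_Z = [scale(r) for r in train_X]
valid_Z = [scale(r) for r in valid_X]

# Only k = 1..5 is tried here because the toy training set has six rows.
k = best_k(train_Z, train_y, valid_Z, valid_y, ks=range(1, 6))
matrix = confusion(train_Z, train_y, valid_Z, valid_y, k)
new_customer = knn_predict(train_Z, train_y, scale([84, 2.0]), k)
print(k, matrix, new_customer)
```

In the confusion dictionary, the keys `(0, 0)` and `(1, 1)` hold the correctly classified counts asked for in part c, and `(0, 1)` and `(1, 0)` hold the misclassified ones. Running `confusion` on the training, validation, and test partitions with the same k gives the three matrices that part e asks to compare.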
