Module 7 Homework Prompt - JMP
IMPORTANT: As you complete these problems, save your completed JMP files for each Problem so
that you may submit them with your Solution Word document.
Problem 1 – SalmonsStores
Salmons Stores operates a national chain of women’s apparel stores. Five thousand copies of an
expensive four-color sales catalog have been printed, and each catalog includes a coupon that provides a
$50 discount on purchases of $200 or more. Salmons would like to send the catalogs only to customers
who have the highest probability of using the coupon. The file “Module 7 SalmonsStores Data.xlsx” contains
data from an earlier promotional campaign. For each of 1,000 Salmons customers, three variables are
tracked: last year’s total spending at Salmons, whether they have a Salmons store credit card, and
whether they used the promotional coupon they were sent. Use the data in “Module 7 SalmonsStores
Data.xlsx” to complete the steps below.
Follow the instructions below to partition the data into Training, Validation, and Test Sets, and perform
a Logistic Regression on the data.
Step 2: Now, we need to partition the data into Training, Validation, and Test sets. Under Analyze >
Predictive Modeling, select . In this dialog box, you don’t need to select
anything; just click OK. In the next window, change the Training Set value to 0.40, the Validation Set
value to 0.40, and the Test Set value to 0.20. In the Options, change New Column Name to “SetName”
without the quotes, and change the Random Seed to 1. Make sure that your dialog box matches the
screenshot to the right before you continue. Once you are satisfied, click Go.
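For intuition about what this step does, here is a minimal Python sketch of a random 40/40/20 partition with a fixed seed. It is only an illustration under my own assumptions: the function name `assign_sets` is hypothetical, this is not JMP's algorithm, and a seed of 1 in Python will not reproduce the assignment JMP gives you.

```python
import random

def assign_sets(n_rows, fractions=(0.40, 0.40, 0.20), seed=1):
    """Randomly label each of n_rows rows as Training, Validation, or Test."""
    names = ("Training", "Validation", "Test")
    n_train = round(n_rows * fractions[0])
    n_valid = round(n_rows * fractions[1])
    n_test = n_rows - n_train - n_valid  # whatever remains goes to Test
    labels = [names[0]] * n_train + [names[1]] * n_valid + [names[2]] * n_test
    rng = random.Random(seed)  # a fixed seed makes the shuffle repeatable
    rng.shuffle(labels)
    return labels

# One label per customer row, analogous to the SetName column.
set_name = assign_sets(1000)
counts = {name: set_name.count(name) for name in ("Training", "Validation", "Test")}
print(counts)  # {'Training': 400, 'Validation': 400, 'Test': 200}
```

The point of the fixed seed is the same as in JMP: everyone who uses the same seed gets the same split, so results can be compared.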
Step 3: Now we can build the Logistic Regression model. Under the Analyze tab on the menu bar, select
Fit Model. Select the Coupon column as the Y variable either by clicking and dragging or by
highlighting Coupon and then clicking the Y button. In the upper-right of this dialog box, the Personality
should automatically switch to Nominal Logistic; if it does not, go back to Step 1 and re-check your data
types. Next, add SetName to the Validation box and add Spending and Card to the Construct Model
Effects. Again, make sure that your dialog box matches the screenshot below before you continue.
Once you are satisfied, click Run.
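If it helps to see what a fitted logistic regression does with its parameter estimates, here is a minimal Python sketch. The coefficients and the spending value are invented for illustration and are NOT the values you should get from JMP; the function name `coupon_probability` is also hypothetical. Check your own report to see which level of Coupon the estimates refer to, since that affects the signs.

```python
import math

def coupon_probability(spending, card, b0, b_spending, b_card):
    """Logistic model: probability = 1 / (1 + exp(-(linear score)))."""
    score = b0 + b_spending * spending + b_card * card
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients for illustration only, NOT the homework answer.
B0, B_SPENDING, B_CARD = -2.0, 0.0003, 1.0

# Same (made-up) spending value, without and with a store credit card.
p_no_card = coupon_probability(2000, 0, B0, B_SPENDING, B_CARD)
p_card = coupon_probability(2000, 1, B0, B_SPENDING, B_CARD)
print(round(p_no_card, 4), round(p_card, 4))
```

With a positive Card coefficient, as assumed here, the predicted probability is higher for the cardholder than for the otherwise identical non-cardholder.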
Step 4: First, minimize all the report subsections except for Parameter Estimates. Then, click the red
triangle next to Nominal Logistic Fit for Coupon, and choose Lift Curve, then open the red triangle
again and choose Confusion Matrix (both are near the middle). Take screenshots of the Parameter
Estimates table, the three Lift Curves, and the Confusion Matrix, and paste them into the document
under the appropriate headers below.
Parameter Estimates
Step 5: Now we can begin working to understand the report. Interpret the output by completing the
sentence, “The smallest classification error on the validation set results from the model…” in the space
below, rounding parameter values to four decimal places:
Step 6: Recall that a lift value of 1 indicates that, at that decile, the model is no better at identifying
observations (customers in this case) than choosing randomly, while a lift value of 1.35 indicates that
the model is 35% more likely to identify customers correctly. Now, with this in mind, consider the Lift
Curves we added in Step 4; at what decile should we expect our model to be around twice as good as
random selection at predicting which customers use a Coupon? Enter your answer in the space below.
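As a reminder of where lift values come from, here is a minimal Python sketch that computes lift for a made-up set of ten customers. The scores, outcomes, and the function name `lift_at` are all invented for illustration; they are not taken from the Salmons data or from JMP's internals.

```python
def lift_at(scores, actual, portion):
    """Lift = response rate among the top `portion` of rows ranked by score,
    divided by the response rate across all rows."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * portion))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate

# Made-up predicted probabilities and outcomes (1 = used the coupon).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(lift_at(scores, actual, 0.2))  # top 20% of customers by score
print(lift_at(scores, actual, 0.5))  # top 50%
print(lift_at(scores, actual, 1.0))  # everyone, so lift is 1
```

In this toy example, both customers in the top 20% used the coupon, against 40% of customers overall, so the lift there is 2.5. Once everyone is included, the lift falls to 1.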
Step 7: Again consider the Lift Curves, and compare the Lift Curve on Training Data to the other two
Lift Curves. Does this suggest that the Regression Equation you defined in Step 5 has good predictive
power, or is there evidence of model overfitting? Justify your answer in the space below.
Problem 2 – BlueOrRed
Suppose that campaign organizers for both the Republican and Democratic parties are interested in
identifying individual undecided voters who would consider voting for their party in an upcoming
election. The file “Module 7 BlueOrRed Data.xlsx” contains data on a sample of voters with tracked
variables, including whether or not they are undecided regarding their candidate preference, age,
whether they own a home, gender, marital status, household size, income, years of education, and
whether they attend church.
Follow the instructions below to partition the data into Training, Validation, and Test Sets, and perform
K Nearest Neighbors on the data.
Step 4: Take screenshots of the Model Selection Chart, Training Table, Validation Table, and Test Table
and include them under the appropriate headings below.
Training Table
Validation Table
Test Table
Step 5: Consider the four screenshots you just took; these report the misclassification percentage for
each value of k, meaning for each number of neighbors used in the k-Nearest Neighbors Classification
procedure. Based on these screenshots, which value for k has the smallest misclassification rate, and
consequently, what is the optimal number of neighbors to use in our analysis? Justify your response in
the space below.
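For a sense of what the tables are reporting, here is a minimal Python sketch of k-Nearest Neighbors that computes a misclassification rate for several values of k. The data points, the two unnamed features, and the function names are all invented for illustration; this is not the BlueOrRed data and not JMP's implementation.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2,
    )
    votes = Counter(label for _, _, label in nearest[:k])
    return votes.most_common(1)[0][0]

def misclassification_rate(train, rows, k):
    """Share of `rows` whose predicted label differs from the true label."""
    wrong = sum(knn_predict(train, (x, y), k) != label for x, y, label in rows)
    return wrong / len(rows)

# Made-up rows: (feature 1, feature 2, label).
train = [
    (1.0, 1.0, "Decided"), (1.5, 2.0, "Decided"), (2.0, 1.0, "Decided"),
    (2.5, 3.0, "Decided"), (3.0, 3.5, "Undecided"), (5.0, 5.0, "Undecided"),
    (5.5, 4.5, "Undecided"), (6.0, 5.5, "Undecided"),
]
validation = [
    (1.2, 1.5, "Decided"), (5.2, 5.1, "Undecided"),
    (2.8, 3.2, "Undecided"), (2.0, 2.0, "Decided"),
]

candidates = (1, 3, 5)  # odd values avoid tied votes with two classes
rates = {
    k: (misclassification_rate(train, train, k),
        misclassification_rate(train, validation, k))
    for k in candidates
}
for k, (train_rate, valid_rate) in rates.items():
    print(k, train_rate, valid_rate)

best_k = min(candidates, key=lambda k: rates[k][1])
print("k with the smallest validation rate:", best_k)
```

Notice that the sketch scores each k on rows the model was not built from; at k = 1 the training rate is always 0 here, because each training row is its own nearest neighbor.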
Step 6: Consider the Misclassification Rates reported in the Training, Validation, and Test tables; does
the error rate reported in the Training table seem to be optimistic (better than the true performance of
the model), conservative (worse than the true performance of the model), or somewhere in the middle?
Justify your response in the space below.
Problem 3 – CreditScore
Step 5: Now, click the red triangle menu and choose Display Options > Show Tree. Don’t be alarmed
if the tree doesn’t fit on your screen very well. Take a screenshot of the top node labeled “All Rows” and
paste it into the space below (again, just the one box that says All Rows at the top). IMPORTANT: If you
click Split, Prune, or Go, you’ll have to Redo the Analysis.
Step 6: There’s a lot to dive into here, but we’ll keep it simple and accept the Decision Tree that JMP
has given us. This might look very intimidating, but this Decision Tree is just like the one we went over
in the Classification Trees lecture video – it just looks a little different! Let me provide a quick
recap/description of how the tree is formatted in JMP Pro:
Each node gives us the criteria for entry into the node (the bold label text), along with some
key metrics: the number of people in our Training Sample who meet the criteria to sort into this
node (which means they meet all the criteria from the nodes higher in the Tree as well), the
Mean credit score of those qualifying individuals (which doubles as the predicted value at that
node), and the Standard Deviation of credit scores among those qualifying individuals.
So, with that in mind, use the Decision Tree to predict the credit score of an individual who has had 5
credit bureau inquiries, has used 10% of her available credit, has $14,500 of total available credit, has no
collection reports or missed payments, is a homeowner, has an average credit age of 6.5 years (i.e.
CreditAge=6.5), and has worked continuously for the past 5 years (i.e. TimeOnJob=5). Enter your
estimate for the credit score, i.e. the Mean of the final node reached by the individual described above,
into the space below, rounding your answer to two decimal places.
Hint/Reminder: the process for this was described in the Classification Trees video provided on D2L; if
you’re stuck, review that video, and if you’re still stuck, send me (or the Tutoring Office) a quick email
so we can find a time to meet. This is a lot easier to explain “live”.
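If you prefer to see the process written out, here is a minimal Python sketch of following a tree from the top node down to a final node. The tree below is hypothetical: the split variables, cutoffs, and means are invented (including the column name `Inquiries`) and do NOT match the tree JMP builds for this problem, so use your own tree to get your answer.

```python
def predict(node, person):
    """Follow splits from the top node until reaching a node with no split."""
    while "split" in node:
        variable, cutoff = node["split"]
        if person[variable] < cutoff:
            node = node["below"]
        else:
            node = node["at_or_above"]
    return node["mean"]  # the Mean of the final node is the prediction

# Hypothetical tree: every variable, cutoff, and mean here is made up.
tree = {
    "split": ("Inquiries", 3),
    "below": {"mean": 720.0},
    "at_or_above": {
        "split": ("CreditAge", 5.0),
        "below": {"mean": 610.0},
        "at_or_above": {"mean": 665.0},
    },
}

person = {"Inquiries": 5, "CreditAge": 6.5, "TimeOnJob": 5}
print(predict(tree, person))  # 665.0 for this made-up tree
```

At each node you only check the one criterion for that split, pick the matching branch, and repeat until there are no more splits. Variables that never appear on your path (TimeOnJob in this sketch) simply don't affect the prediction.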
Once you have completed all three problems above, submit your completed version of this file to the
Assignment on D2L along with your completed JMP files. As always, let me know if you have any
questions!