
Prof. Raghunath Singh Rao


Marketing Research & Methods
SESSION 6: UNDERSTANDING CONSUMER CHOICE
FEB 12, 2024

Raghunath Rao Marketing Research & Methods @ BITSoM 1


Outcomes can be qualitative
• In many situations, the response variable is qualitative (categorical)
• Eye color: blue, brown, green
• Final grades in the course Marketing Analytics: A, A-, B+, ...
• Churn/ Stay Loyal
• Credit default /Do not default
• Buy/Not buy
• Support/Do not support
• Subscribe/Do not subscribe
• Click/Do not click
• Positive/Negative
• Spam/Ham



Linear Regression Approach to Qualitative Outcomes
• Suppose we are trying to understand why people default on
their auto loans
• We have data on 10,000 auto loans from a bank
– We know who defaulted (1) and who did not (0)
– We have the debtor’s information: income, credit score, age, marital
status, state of residence
– We have loan specific information: interest rate, amount of loan,
details of vehicle financed
• How do we build a model of loan default using this data?



Running a linear regression with outcome data

• Homoscedasticity violated
• Negative predictions
• Predictions >1
• Bad fit

We need a new tool!
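The out-of-range predictions are easy to reproduce. The sketch below fits an ordinary least-squares line to a 0/1 outcome and prints the smallest and largest fitted values. The `risk_score` predictor and the outcomes are invented toy data, not the bank's loan file, and `fit_ols` is a helper written only for this illustration.

```python
# Toy illustration: OLS on a 0/1 outcome (a "linear probability model").
# The data are invented for illustration; they are not the bank's loan data.

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical predictor (a risk score centered at 0) and default flag:
# in this toy sample every borrower with a positive score defaults.
risk_score = list(range(-10, 11))
defaulted = [1 if s > 0 else 0 for s in risk_score]

intercept, slope = fit_ols(risk_score, defaulted)
predictions = [intercept + slope * s for s in risk_score]

# A fitted "probability" should lie in [0, 1]; the straight line does not respect that.
print(f"lowest prediction:  {min(predictions):.3f}")
print(f"highest prediction: {max(predictions):.3f}")
```

The fitted line dips below 0 at one end and exceeds 1 at the other, which is the motivation for a model whose output is bounded.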



Logistic Regression
• Binary (0/1) dependent variable y_i
– Typically a response to marketing actions (or the lack of them): acquisition of a new customer, losing a current customer (churn), a click on a banner ad, ordering an Uber ride, etc.

• Link the 0/1 dependent variable to the independent variables through the “probability of observing y=1”:

$$P(y_i = 1) = \frac{\exp(a + b_1 X_1 + b_2 X_2 + \cdots + b_K X_K)}{1 + \exp(a + b_1 X_1 + b_2 X_2 + \cdots + b_K X_K)}$$

• This function is commonly called the “logit function” (strictly, it is the logistic function, the inverse of the logit) and comes from utility theory
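As a minimal sketch of the formula, the function below evaluates P(y = 1) for a given intercept, coefficient list, and input list. The coefficient values are hypothetical and chosen only to show that the output always stays between 0 and 1.

```python
import math

def logit_probability(a, b, x):
    """P(y = 1) for intercept a, coefficient list b and input list x."""
    z = a + sum(bk * xk for bk, xk in zip(b, x))
    # Algebraically equal to exp(z) / (1 + exp(z)), arranged to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients, used only to show the shape of the function.
a, b = -1.0, [0.5, -0.25]
for x in ([0.0, 0.0], [2.0, 0.0], [2.0, 4.0], [10.0, 0.0]):
    print(x, round(logit_probability(a, b, x), 3))
```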

Logit function

$$p(\text{response}) = \frac{\exp(a + b \cdot \text{Income})}{1 + \exp(a + b \cdot \text{Income})}$$

[Figure: p(response), on a vertical axis from 0 to 1, plotted against X (income)]

Interpretation of logit coefficients
• b<0 : negative effect on probability of outcome of interest
• b=0 : no effect on probability
• b>0 : positive effect

• Or… we look at exp(b)

• exp(b) is the multiplicative change in the “odds” $P(y_i = 1)/P(y_i = 0)$ if X increases by one unit
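The odds interpretation can be checked numerically. In this sketch the intercept and coefficient are hypothetical; for any starting value of X, raising X by one unit multiplies the odds by the same factor exp(b), even though the change in probability differs from point to point.

```python
import math

def probability(a, b, x):
    """P(y = 1) from a one-variable logit model."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def odds(p):
    """Odds P(y = 1) / P(y = 0) for a probability p."""
    return p / (1.0 - p)

a, b = -2.0, 0.7  # hypothetical estimates, for illustration only

ratios = []
for x in (0.0, 1.0, 5.0):
    p_before = probability(a, b, x)
    p_after = probability(a, b, x + 1)
    ratio = odds(p_after) / odds(p_before)
    ratios.append(ratio)
    print(f"x={x}: p goes {p_before:.3f} -> {p_after:.3f}, odds ratio = {ratio:.4f}")

print(f"exp(b) = {math.exp(b):.4f}")
```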

Running a logit model
• Let’s start with the simplest possible model, with a binary output
and one input
• Data from 1,022 field goal attempts in the NFL
• Two pieces of information
– Did the goal happen (“Good”)?
– Goal attempt distance (“Length”)

See file “S6 FG Data.xls”
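For readers without the spreadsheet, here is a minimal sketch of how such a one-input logit model can be estimated by maximum likelihood using Newton-Raphson steps. The attempts below are made-up numbers, not the contents of the course file, and `fit_logit` is a helper written for this illustration rather than the tool used in class.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(a + b*x) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to a and b.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of X'WX (the negative Hessian), with weights p(1-p).
        w = [pi * (1.0 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system by hand.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Made-up attempts: distance in yards and whether the kick was good (1) or not (0).
length = [18, 22, 25, 28, 30, 33, 35, 38, 40, 42, 45, 47, 50, 52, 55, 58]
good = [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

# Distances are rescaled to tens of yards to keep the Newton steps well behaved.
x = [d / 10.0 for d in length]
a, b = fit_logit(x, good)

print(f"intercept a = {a:.3f}, coefficient b (per 10 yards) = {b:.3f}")
for yards in (20, 40, 55):
    print(yards, "yards ->", round(sigmoid(a + b * yards / 10.0), 3))
```

With data like these the coefficient on distance comes out negative: longer attempts are less likely to be good.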



Step 1: Running a logistic regression



Step 3: Key things to look for



Step 3: Visualizing the Effect
[Figure: Good / Standardized coefficients (95% conf. interval). Vertical axis: standardized coefficients, from 0 to -0.7; horizontal axis: variable (length).]


Step 3: Predictions



Confusion (Classification) Matrix

Classification table for the training sample (Variable Good):

from \ to    0      1       Total    % correct
0            8      169     177      4.52%
1            5      840     845      99.41%
Total        13     1009    1022     82.97%

Not a very good classifier!
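A short check of the table's percentages helps explain the verdict. The sketch reads rows as actual outcomes and columns as predicted ones (an assumption about the "from \ to" layout) and compares the overall hit rate with a naive rule that predicts "good" for every attempt.

```python
# Counts from the classification table: table[actual][predicted].
table = {0: {0: 8, 1: 169}, 1: {0: 5, 1: 840}}

total = sum(sum(row.values()) for row in table.values())
correct = table[0][0] + table[1][1]

pct_correct_0 = table[0][0] / sum(table[0].values())  # misses identified as misses
pct_correct_1 = table[1][1] / sum(table[1].values())  # makes identified as makes
overall = correct / total

# Naive benchmark: predict 1 ("good") for every attempt.
always_good = sum(table[1].values()) / total

print(f"correct among actual 0s: {pct_correct_0:.2%}")
print(f"correct among actual 1s: {pct_correct_1:.2%}")
print(f"overall hit rate:        {overall:.2%}")
print(f"always predict 'good':   {always_good:.2%}")
```

The overall hit rate is barely above the naive benchmark, and almost none of the missed kicks are identified, which is one way to read "not a very good classifier".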



In class exercise #1
• Please look at the Excel file “S6 Healthcare Data.xls”
• It contains the results of a survey on whether a person supports
or does not support the ACA (“Obamacare”)
• Run a logistic regression with “For?” as the DV and the other variables
as IVs
• What is the probability that a 57-year-old Democrat who makes 90K
supports the ACA?



Logit Models
• Very powerful tools

• Simple to use yet have enormous predictive power

• Often work as well as sophisticated machine learning tools
– Especially with large datasets

• Workhorse of the marketing analytics toolkit



Choice Models: Two Applications

1. Understanding and classifying outcomes (Predictive Modeling)
– Is someone pregnant in your home?

2. Optimizing outcomes (Prescriptive Modeling)
– How can I win an auction and profit from it?



Application 1



Data Analytics to Predict Pregnancy

• A mid-sized retailer, “Lone Star Mart”, in Texas

• Millions of customer purchases tracked via their loyalty cards
– What they purchased, when they purchased
– Did someone become pregnant in the household?

• Can certain types of purchases provide a signal for the upcoming event?

File “S6 Retail Data.xls”



Standardized Effects

[Figure: PREGNANT / Standardized coefficients (95% conf. interval). Vertical axis: standardized coefficients, from -0.8 to 1.2. Variables shown: Male, Female, Home, Apt, Pregnancy Test, Birth Control, Feminine Hygiene, Folic Acid, Prenatal Vitamins, Prenatal Yoga, Body Pillow, Ginger Ale, Sea Bands, Stopped buying ciggies, Cigarettes, Smoking Cessation, Stopped buying wine, Wine, Maternity Clothes]

Classification Matrix
                        True response
                        No                     Yes
Predicted    No         True negative (TN)     False negative (FN)
response     Yes        False positive (FP)    True positive (TP)

Hit Rate = (TN+TP)/(TN+FN+FP+TP)



Two Important Classification Metrics

• True Positive Rate (TPR): (True Positives)/(Total actual positives in the data): TP/(TP+FN)
– Also called “sensitivity”

• False Positive Rate (FPR): (False Positives)/(Total actual negatives in the data): FP/(FP+TN)
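These definitions translate directly into a few lines of code. The sketch below computes the hit rate, TPR, and FPR from the four cell counts; as an example it reuses the counts from the field goal classification table (TN = 8, FP = 169, FN = 5, TP = 840), assuming rows there are actual outcomes and columns are predictions.

```python
def classification_metrics(tn, fn, fp, tp):
    """Hit rate, true positive rate and false positive rate from cell counts."""
    return {
        "hit_rate": (tn + tp) / (tn + fn + fp + tp),
        "tpr": tp / (tp + fn),   # sensitivity
        "fpr": fp / (fp + tn),
    }

metrics = classification_metrics(tn=8, fn=5, fp=169, tp=840)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A high TPR paired with an almost equally high FPR shows a model that says "yes" to nearly everything, which the hit rate alone would hide.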



ROC curve measuring prediction performance
                        True response
                        No      Yes
Predicted    No         TN      FN
response     Yes        FP      TP

Rule: “Target top x%”. What is a good x?

If we increase x% (lower the threshold), the numbers of both false and true positives go up!
A good prediction would show a quick increase in true positives and
a slow increase in false positives
The ROC curve visualizes this

Receiver Operating Characteristic (ROC)

[Figure: ROC curve (AUC = 0.8). Vertical axis: sensitivity (true positive rate = benefit), the rate at which we correctly predict an actual conversion (y=1). Horizontal axis: 1 – specificity (false positive rate = cost), the rate at which we incorrectly predict a conversion when in fact the customer was not converted (y=0). The diagonal corresponds to a completely random guess; curves above it are better, curves below it are worse.]

Model Performance: ROC and AUC

• An ROC (Receiver Operating Characteristic) curve is the most commonly used way to visualise the performance of a binary classifier

• AUC (Area Under the Curve) is the best way to summarize its performance in a single number.





ROC

• ROC plots the True Positive Rate on the y-axis versus the False Positive Rate on the x-axis.
– It calculates the TPR and the FPR for each classification “threshold”.
– The point above is critical. A standard misclassification matrix shows you errors in classification only for a single threshold.
– A low threshold means you will increase your TPR, but also increase your FPR. A high threshold means you reduce your TPR, but also reduce your FPR.
– Setting the threshold is a managerial decision, based on expected costs.
• Example 1: You are deciding whether to give a loan to an applicant or not. It is important that you not give it to someone who is likely to default – you want to minimise the FPR. A high threshold will do that.
• Example 2: You have an automatic trigger to flag credit card fraud. Here, it is vital that you maximise the TPR. A low threshold will do that.
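The threshold sweep described in these bullets can be sketched in a few lines. The scores and labels are invented toy values; `roc_points` is a helper written for this illustration that classifies a case as positive when its score is at or above the threshold.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, from the highest threshold down to the lowest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Toy predicted probabilities and true outcomes (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Reading the output from top to bottom corresponds to lowering the threshold: both rates can only rise, ending at (1, 1) when every case is flagged.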

Area Under the Curve (AUC)

• AUC measures the percentage of the box that is under the curve.
– The higher the AUC, the better the model.
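– A model that does no better than random guessing has an AUC of 0.5; in practice this is the worst case, because a model scoring below 0.5 could be improved by reversing its predictions.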
– The higher the curve to the top left, the better the model.
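AUC also has a useful reading: it equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes AUC that way on invented toy scores; it is an illustration of the idea, not the routine used by any particular software package.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities and true outcomes (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print("toy model AUC:", auc(scores, labels))
print("uninformative scores AUC:", auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
```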



Lone Star Mart Approach
• Oversampling
– Randomly selected 500 past non-pregnant households from loyalty
members
– Randomly selected 500 past pregnant households from loyalty members
– Why would you do that?
– This is the training data

• Randomly selected another 1,000 households from loyalty members
– This is the test data
File “S6 Retail Data New.xls”
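The sampling scheme can be sketched as follows. The household records and the 1-in-25 pregnancy flag are hypothetical stand-ins for the loyalty database, and drawing the test sample only from households not used in training is an assumption of this sketch.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical loyalty-member records: (household_id, pregnant flag).
# Flagging 1 household in 25 is purely illustrative.
households = [(i, 1 if i % 25 == 0 else 0) for i in range(20000)]

pregnant = [h for h in households if h[1] == 1]
not_pregnant = [h for h in households if h[1] == 0]

# Balanced training sample: 500 households from each group.
training = rng.sample(pregnant, 500) + rng.sample(not_pregnant, 500)
training_ids = {h[0] for h in training}

# Test sample: 1,000 households drawn at random from those not used in training
# (an assumption of this sketch), so it keeps the natural mix of the two groups.
remaining = [h for h in households if h[0] not in training_ids]
test = rng.sample(remaining, 1000)

print("training size:", len(training), "pregnant share:", sum(h[1] for h in training) / len(training))
print("test size:", len(test), "pregnant share:", sum(h[1] for h in test) / len(test))
```

The training sample is half pregnant by construction, while the test sample keeps roughly the rate found in the underlying population.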



What are the costs of misclassification?
• False Negative: If I am not able to identify a pregnant
household
– I am losing thousands of dollars in potential lifetime revenue
• False Positive: If I incorrectly identify a non-pregnant
household as pregnant
– I am spending promotion dollars on the wrong target
– I might turn off my customers and lose them forever

See “S6 Pregnancy ROI.xls”
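Once each error type has a price, the threshold can be chosen to minimise total cost. In the sketch below the two cost figures, the scores, and the labels are all hypothetical; the course spreadsheet may set up the calculation differently.

```python
def misclassification_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Total cost when households scoring at or above the threshold are targeted."""
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return fn * cost_fn + fp * cost_fp

# Toy predicted probabilities and true outcomes (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

COST_FN = 1000  # hypothetical lifetime revenue lost per missed pregnant household
COST_FP = 50    # hypothetical wasted promotion cost per wrongly targeted household

costs = {
    t: misclassification_cost(scores, labels, t, COST_FN, COST_FP)
    for t in sorted(set(scores))
}
best_threshold = min(costs, key=costs.get)

for t, c in costs.items():
    print(f"threshold {t:.2f}: cost {c}")
print("lowest-cost threshold:", best_threshold)
```

Because a missed household is assumed to cost far more than a wasted promotion, the cheapest threshold is a low one that catches every positive case in this toy sample.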



Application 2

Designing optimal bidding strategy for Fjord Motors

S6 auto auction data.xls



Data
• History of 4,000 bids for fleet sales
– Coronet Elizabeth
• MSRP: $25,000 Cost: $15,000
– Bids are between cost and MSRP
• You have been hired as a consultant to study their bidding
strategy and come up with a better plan
S6 auto auction data.xls
© Columbia Case Works



Data: Information on 4,000 bids

Bid no    Units    Unit Price    Win(?)
1         12       16551         1
2         24       16272         0
3         16       21266         1
4         21       18805         0
5         27       15884         0
6         13       22168         0
7         15       15226         1
8         27       18850         0
9         29       18755         0
10        20       22003         0
11        11       22064         0



Analysis Strategy
• What does the company want to know?
– How to bid?
– A prescription

• What does the analyst need to know?
– How does the bid affect the outcome?
– A prediction



Step 1: Model Building

• Identify DV

• Identify IVs



Step 2: Model estimation

• Run the model
– How?

• Why logistic regression?



Model Estimates

Intercept (a)    Prc/MSRP (b)
7.76             -9.16
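Plugging these two estimates into the logit formula gives a predicted win probability for any bid. The sketch below does this for a few bid prices between cost and MSRP, using the MSRP of $25,000 given on the Data slide; the function name and the chosen prices are illustrative.

```python
import math

A = 7.76    # intercept from the slide
B = -9.16   # coefficient on price / MSRP from the slide
MSRP = 25000

def win_probability(price_ratio):
    """Predicted probability of winning a bid at the given price / MSRP ratio."""
    z = A + B * price_ratio
    return 1.0 / (1.0 + math.exp(-z))

for price in (15000, 17500, 20000, 22500, 25000):
    ratio = price / MSRP
    print(f"bid ${price:,} (ratio {ratio:.2f}): P(win) = {win_probability(ratio):.3f}")
```

The probability of winning falls steadily as the bid approaches MSRP, which is what the negative coefficient implies.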



Step 3: Checking Parameter Signs
• Do the signs look right?

• The Price/MSRP estimate is negative and significant.
– Does that make sense?

• This is your baseline predictive model



Step 4: Creating a Prescriptive Model

• What should the firm do?

• Your prescription requires you to find out the firm’s objective

• They want to maximize wins?
– Does not require any analytics

• Maximize profits?

Expected profits

Expected profits = (Profit if win) * (Probability of win)
                 = [(Pr/MSRP)*MSRP - COST] * (Probability of win)

What is the probability of winning at a given price?
Our predictive model tells us!
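Combining the expected-profit formula with the baseline estimates (a = 7.76, b = -9.16) turns the prediction into a prescription. The sketch below searches a grid of bid prices between cost and MSRP for the one with the highest expected profit per unit. The $10 grid step is an arbitrary choice, and order size is ignored, so treat this as an outline of the approach rather than the case solution.

```python
import math

A, B = 7.76, -9.16          # baseline logit estimates from the Model Estimates slide
MSRP, COST = 25000, 15000   # from the Data slide

def win_probability(price):
    """Predicted probability of winning at a given unit price."""
    z = A + B * (price / MSRP)
    return 1.0 / (1.0 + math.exp(-z))

def expected_profit(price):
    """Expected profit per unit: margin if we win times probability of winning."""
    return (price - COST) * win_probability(price)

# Grid search over candidate bids between cost and MSRP, in $10 steps.
candidate_prices = range(COST, MSRP + 1, 10)
best_price = max(candidate_prices, key=expected_profit)

print(f"best bid on the grid: ${best_price:,}")
print(f"P(win) at that bid:   {win_probability(best_price):.3f}")
print(f"expected profit/unit: ${expected_profit(best_price):,.0f}")
```

Bidding at cost wins often but earns nothing, and bidding at MSRP earns a large margin but rarely wins; the best bid lies between the two.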



In Class Exercise #2
• Please look at the Excel file “S6 Auto Auction Data.xls”
• Create a new dummy for Police Department orders
– The first 2,000 rows in the dataset are for Police Department orders, while
the rest are for rental car companies
• Include “Pr/MSRP” and “Police Dummy” in the logit regression
as explanatory variables.
• Explain in words what the estimate for “Police” in the logit
output implies.
• What is the optimal price for police orders? For rental car orders?
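As a starting point for the exercise, the dummy can be built from the row position alone. This sketch assumes the 4,000 bids are in their original order and that "rows" means data rows, not counting any header; the variable names are illustrative.

```python
N_BIDS = 4000     # bids in the dataset
N_POLICE = 2000   # the first 2,000 data rows are Police Department orders

# 1 for Police Department orders, 0 for rental car companies.
police_dummy = [1 if row < N_POLICE else 0 for row in range(N_BIDS)]

print("police orders:", sum(police_dummy))
print("rental orders:", N_BIDS - sum(police_dummy))
```

With the dummy in the model, the index inside the logit becomes a + b*(Pr/MSRP) + c*Police, so c shifts the win probability for police orders at any given price.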
