BOP Assignment
Problem Statement:
Blackberry is a major retailer, running multiple stores. They have a huge loyalty program with
approximately 250,000 members.
They have recently launched a new range of private label consumer durables and are looking to run a
marketing campaign covering all the customers who are part of the loyalty program. A mini campaign
was run on a random sample of approximately 22,000 customers who had signed up to the loyalty
program, and the purchase outcome (or otherwise) was recorded. The following information is available
in the sample dataset:
ID - Customer identification
Prosp - Prosperity grade on a scale of 1 to 30
Age - Age, in years
Code - House location code
Gender - Male/Female/Unknown
Region - Geographic region
TV - Television region
Class - Loyalty status
AmtSpent - Amount spent by customer
Time - Length of relationship
Buy - Purchase outcome (Yes/No)
Further it is known that the lifetime value of a new customer is $15,000 while the cost per capita of the
campaign is $4,420.
As it does not make sense to target the whole population of 220,000, we have to determine the best
subset to target based on the model built on the sample (top 10%/20%/30%, and so on).
To do so, the focus is to decile the sample by probability of purchase and use the performance in each
decile (% purchasers, % non-purchasers) to determine what percentage of the overall population should
be targeted to maximize profits, by applying the formula below:
Profit = (15,000 × # of customers in the top n deciles × cumulative % purch) - (4,420 × # of customers
in the top n deciles)
With the objective of maximizing profits, we are required to come up with an analytics-based targeting
plan, given the campaign economics stated above.
Approach:
6. Logistic Regression with the new Cut-off value
7. Hosmer and Lemeshow Test to determine the ideal Sample Set
Data Cleanup
If we look at the dataset, it covers the mini campaign run by Blackberry on a random sample of 22223
customers who signed up for the loyalty program, with the purchase outcome recorded for each. It holds
data about their customers and about some prospects who did not turn out to be buyers at the end of the
marketing campaign.
The information recorded for each customer covers prosperity grade, age, house location code, gender,
geographic region, TV region, loyalty status, amount spent, length of relationship, and the buying
decision, which tells us whether or not the customer purchased.
Code, Gender, Region, and TV Region are categorical variables.
As evident in the dataset, a number of data points in each category were found to be missing. Referring
to the Data Preparation instruction in the case, we replaced the missing values of numeric variables with
the mean calculated from the remaining data, and those of categorical variables with the mode. This
exercise was carried out with the help of pivot tables; an equivalent sketch in code is shown below.
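A minimal sketch of the same cleanup in Python with pandas (the file name is hypothetical and the column names follow the data dictionary above):

```python
# Mean/mode imputation, mirroring the pivot-table approach used in the report.
import pandas as pd

df = pd.read_csv("blackberry_sample.csv")  # hypothetical file name

numeric_cols = ["Prosp", "Age", "AmtSpent", "Time"]
categorical_cols = ["Code", "Gender", "Region", "TV", "Class"]

# Numeric variables: fill missing values with the mean of the remaining data.
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].mean())

# Categorical variables: fill missing values with the mode (most frequent level).
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])
```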
Logistic Regression
With the modified data, we started to actually build the model and ran the logistic regression.
The buying decision of the customer (Buy) is our dependent variable, and Prosp, Code, Gender, Region,
TV, Class, AmtSpent, Time, and Age are the covariates.
Code, Gender, Region, and TV are identified as categorical variables based on the responses in the data;
since they are alphanumeric, SPSS automatically codes them as categorical.
Next, we save the predicted probabilities to see how the model is panning out.
The default cut-off in SPSS is always 0.5: if a particular row has a predicted probability greater than or
equal to 0.5, it is classified as a positive outcome, in this case somebody who is potentially going to buy.
If the probability is less than 0.5, it is classified as a zero, i.e. somebody who is predicted not to buy. A
code sketch of this run is shown below.
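A minimal sketch of the same run in Python with scikit-learn, assuming the cleaned df from the imputation step; one-hot coding stands in for SPSS's automatic categorical coding, and Buy is assumed to hold "Yes"/"No":

```python
# Fit the logistic regression (Buy as dependent variable) and classify with the
# default 0.5 cut-off, mirroring the SPSS run described above.
import pandas as pd
from sklearn.linear_model import LogisticRegression

covariates = ["Prosp", "Age", "AmtSpent", "Time",
              "Code", "Gender", "Region", "TV", "Class"]
X = pd.get_dummies(df[covariates])    # dummy-code the categorical covariates
y = (df["Buy"] == "Yes").astype(int)  # 1 = buyer, 0 = non-buyer

model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the predicted probabilities, then apply the default 0.5 cut-off.
df["pred_prob"] = model.predict_proba(X)[:, 1]
df["pred_buy"] = (df["pred_prob"] >= 0.5).astype(int)
```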
Block 0
In the output, the first block essentially tells us how many cases are there for the analysis and whether
there are any missing cases.
The next block tells us how the dependent variable has been coded internally.
The dependent variable, in this case, is 1 or 0, and internally it has also been coded as 1 or 0.
Next comes the categorical variable coding. Code, Region, TV, and Gender are categorical variables,
and we have kept the reference category last. Hence the last level of each variable is coded as 0 and the
other levels are coded with respect to it. For example, in Gender, "Unknown" is coded as 0 and Male
and Female are coded with respect to it; a small illustration follows below. The beginning block is
essential and describes the null model, that is, the model with no predictors. The accuracy we got for it
is 75.2 percent with the default cut-off value of 0.5.
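A sketch of the "reference category last" coding using Gender (assuming the df from the earlier steps), where "Unknown" is the last level:

```python
# Dummy-code Gender with the last level ("Unknown") as the reference: rows with
# Gender == "Unknown" become the all-zeros baseline, as in the SPSS output.
import pandas as pd

gender = pd.Categorical(df["Gender"], categories=["Female", "Male", "Unknown"])
dummies = pd.get_dummies(gender).drop(columns=["Unknown"])
# A row with Gender == "Unknown" is now coded (Female=0, Male=0).
```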
Block 1
The model summary in this block gives us the equivalents of the R square, which explain the proportion
of variation. The Cox and Snell R square (analogous to R square) comes out as 0.201 and the Nagelkerke
R square (analogous to adjusted R square) as 0.298.
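Both values follow from the standard Cox and Snell and Nagelkerke definitions, which need only the log-likelihoods of the null model (LL0) and the fitted model (LL1) plus the sample size n; a minimal sketch:

```python
# Standard pseudo R-square definitions, computed from model log-likelihoods.
import math

def pseudo_r2(ll0, ll1, n):
    # Cox & Snell: 1 - (L0 / L1)^(2/n), written with log-likelihoods.
    cox_snell = 1 - math.exp(2 * (ll0 - ll1) / n)
    # Nagelkerke rescales Cox & Snell so its maximum attainable value is 1.
    nagelkerke = cox_snell / (1 - math.exp(2 * ll0 / n))
    return cox_snell, nagelkerke
```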
The classification table tells us the actual number of buyers against the predicted number of buyers, and
the actual number of non-buyers against the predicted number of non-buyers. We also get the sensitivity
and specificity from this table. Right now it is telling us that our model has an accuracy of 80.3 percent
(with the default cut-off of 0.5).
ROC Curve:
We use the ROC curve to determine an optimal cut-off and refine our results. The saved probabilities
are the input for building the ROC curve; the Data View of SPSS can be checked to confirm that the
probabilities have been written back. The predicted probability reflects whether the new range of private
label consumer durables recently launched by Blackberry will be bought or not.
The ROC curve helps us understand how to balance the costs of false positives and false negatives.
Three parameters matter in understanding accuracy.
Overall accuracy is always the percentage of 1s plus the percentage of 0s correctly predicted: the number
of 1s correctly predicted plus the number of 0s correctly predicted, divided by the total number of
responses. But there are two other metrics that are of concern.
One is sensitivity and the other is specificity. The reason is that the overall accuracy is not always what
matters most. There are times when the percentage of 1s correctly predicted is much more important
than the overall accuracy, and similarly times when the percentage of 0s correctly predicted is more
important than the overall accuracy; the sketch below shows how all three are computed.
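A sketch of how all three metrics fall out of the classification table, assuming the y and df["pred_buy"] from the earlier steps:

```python
# Derive accuracy, sensitivity, and specificity from the confusion matrix.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y, df["pred_buy"]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 1s and 0s correctly predicted, overall
sensitivity = tp / (tp + fn)                # share of actual buyers caught
specificity = tn / (tn + fp)                # share of actual non-buyers caught
print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```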
Output
In the output, the first block tells us the number of positives and the number of negatives, along with the
ROC curve itself.
The next block tells us the area under the curve, which shows how good the model is.
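The curve and the area under it can be reproduced from the saved probabilities; a sketch assuming y and df["pred_prob"] from the earlier steps:

```python
# Build the ROC curve from the saved probabilities and compute the AUC.
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y, df["pred_prob"])
auc = roc_auc_score(y, df["pred_prob"])
print(f"AUC = {auc:.3f}")  # closer to 1.0 means a better-discriminating model
```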
New Cut-off Value based on ROC Curve:
To refine our results, we took a new cut-off value from the ROC coordinates table, at the point where
the sensitivity and specificity curves intersect. At this intersection both are approximately 0.714, which
corresponds to a cut-off probability of p = 0.246. We then rebuilt the logistic regression and, instead of
0.5, kept 0.246 as the cut-off, and ran the model again.
We see that the overall accuracy dropped to 71.6%. However, since we want the number of 1s correctly
predicted to be as high as possible, and that count increased to 3,917, the new cut-off helps us target
more people who are likely to buy than before. This is how we balance using the ROC curve: we get a
better trade-off and capture more of the possible 1s. Anybody whose predicted probability is greater
than this number is considered a possible buyer and will be targeted by the marketing campaign. This is
how we balance the costs of false positives and false negatives.
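One way to locate such a cut-off programmatically is to scan the ROC coordinates for the point where sensitivity and specificity are closest to equal; this is a sketch of the intersection reading (assuming the fpr, tpr, and thresholds arrays from the ROC step), not the exact SPSS procedure:

```python
# Pick the threshold where sensitivity and specificity intersect.
import numpy as np

sensitivity = tpr          # sensitivity at each candidate threshold
specificity = 1 - fpr      # specificity at each candidate threshold
best = np.argmin(np.abs(sensitivity - specificity))
cutoff = thresholds[best]  # the report finds p = 0.246 for this data

# Reclassify with the new cut-off instead of the default 0.5.
df["pred_buy"] = (df["pred_prob"] >= cutoff).astype(int)
```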
Decile Calculation:
In the rebuilt model for decile calculation, we use the Hosmer and Lemeshow Test.
It works in such a way that, after the algorithm scores the data and creates the predicted probabilities,
the data is sorted in descending order of predicted probability and cut into 10 deciles (10%, 20%, 30%,
..., 100%). We then have, for each decile, an observed value and an expected value for both the zeros
and the ones.
We take the deciled data to Excel, where we find the buyer percentage and the segment size percentage,
and finally calculate the profitability; a code sketch of the deciling is shown below.
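A sketch of the same deciling in pandas (assuming the df["pred_prob"] saved from the rebuilt model):

```python
# Sort by predicted probability (descending), cut into 10 equal deciles, and
# compute buyer counts and the cumulative buyer percentage per decile.
import numpy as np

ranked = df.sort_values("pred_prob", ascending=False).reset_index(drop=True)
ranked["decile"] = np.arange(len(ranked)) * 10 // len(ranked) + 1  # 1 = top 10%

summary = ranked.groupby("decile").agg(
    customers=("Buy", "size"),
    buyers=("Buy", lambda s: (s == "Yes").sum()),
)
summary["cum_buyer_pct"] = summary["buyers"].cumsum() / summary["customers"].cumsum()
print(summary)
```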
From our observations, we see that the 30% segment size had close to 51% buyers in the sample. This is
also explained by the logic of maxima: as the segment grows, there is a point of maximum profit, after
which the profitability starts to fall.
Profit Calculation:
It is known that the lifetime value of a new customer is $15,000 while the cost per capita of the campaign
is $4,420.
Buyer % in the most profitable segment (top 30%) = 50.94%
So the total profit will be:
Profit = (15,000 × # of customers in top 3 deciles × cumulative % purch) - (4,420 × # of customers in
top 3 deciles)
= 15,000 × (22223 × 30.01%) × 50.94% - 4,420 × (22223 × 30.01%)
= 15,000 × 6,669 × 50.94% - 4,420 × 6,669
≈ $21,476,640
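The arithmetic can be checked in a few lines; the inputs are the figures quoted above (small differences from $21,476,640 come from rounding the buyer percentage):

```python
# Profit for targeting the top 3 deciles, using the report's figures.
n_sample = 22223
segment = round(n_sample * 0.3001)  # customers in the top 3 deciles, ~6,669
buyer_pct = 0.5094                  # cumulative buyer % in that segment

profit = 15_000 * segment * buyer_pct - 4_420 * segment
print(f"profit ~ ${profit:,.0f}")   # ~ $21.5 million
```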
Conclusion:
In order to maximize profits, the best subset to target is the top 30% of the population, ranked by the
purchase probabilities from the model built on the sample.