Col Solare Case Study 2

A logistic regression model was created to predict the probability of customers buying wine. Customers were assigned to deciles based on their predicted probabilities, with decile 1 having the highest predicted probabilities. Lift analysis showed that targeting customers in the top deciles resulted in much higher response rates compared to random selection, demonstrating the model's ability to effectively target high-potential customers.

Uploaded by

perestotnik

1. A logistic regression model was created in R after first transforming the “customer type” variable
into a 0/1 dummy variable called restaurant. The following command was used:
> ColSolare$restaurant <- ifelse(ColSolare$customer_type == "restaurant",
1, 0)

The following command was used to create the logit model:


> ColSolarelogit <- glm(buyer ~ last_purch + dollars + restaurant +
customer_sqft + cab_franc + cab_sauvignon + malbec + merlot + red_blend +
syrah, family = binomial(link='logit'), data = ColSolare)
> summary (ColSolarelogit)
> ColSolare$purch_prob <- predict.glm(ColSolarelogit, ColSolare, type =
"response")

The following output was obtained in R:


Deviance Residuals:
Min 1Q Median 3Q Max
-2.2897 -0.4260 -0.2897 -0.1862 3.3694

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.778e+00 1.066e-01 -16.679 < 2e-16 ***
last_purch -9.215e-02 3.112e-03 -29.613 < 2e-16 ***
dollars 3.401e-04 3.942e-04 0.863 0.38828
restaurant -8.181e-01 4.027e-02 -20.317 < 2e-16 ***
customer_sqft 2.787e-05 3.110e-05 0.896 0.37021
cab_franc -3.521e-01 2.856e-01 -1.233 0.21767
cab_sauvignon 3.335e-01 2.846e-01 1.172 0.24134
malbec -7.616e-01 2.860e-01 -2.663 0.00775 **
merlot -4.148e-01 2.845e-01 -1.458 0.14479
red_blend 8.094e-01 2.849e-01 2.840 0.00451 **
syrah 2.253e-03 2.853e-01 0.008 0.99370
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 23817 on 39999 degrees of freedom


Residual deviance: 19373 on 39989 degrees of freedom
AIC: 19395

Number of Fisher Scoring iterations: 6

A new variable, purch_prob, was created using the command:


> ColSolare$purch_prob <- predict.glm(ColSolarelogit, ColSolare, type =
"response")
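
As a rough illustration of what type = "response" returns, the sketch below rebuilds one predicted probability by hand from the coefficients reported above. The customer profile is hypothetical (a restaurant whose last purchase was 6 months ago, with every other predictor set to 0), and the variable names are illustrative only.

```r
# Coefficients copied from the summary() output above (log-odds scale).
intercept    <- -1.778
b_last_purch <- -9.215e-02
b_restaurant <- -8.181e-01

# Hypothetical customer: a restaurant, last purchase 6 months ago,
# all other predictors set to 0 for simplicity (an assumption).
log_odds <- intercept + b_last_purch * 6 + b_restaurant * 1

# plogis() is the inverse logit: 1 / (1 + exp(-log_odds)).
purch_prob_example <- plogis(log_odds)
print(purch_prob_example)
```

predict.glm with type = "response" applies this same inverse-logit transformation to every row's linear predictor.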

2. The odds ratios were calculated in R using the commands:


> oddsratio_ColSolarelogit <- exp(ColSolarelogit$coef)
> oddsratio_ColSolarelogit

The following output was obtained:


  (Intercept)    last_purch       dollars    restaurant customer_sqft     cab_franc
    0.1689488     0.9119639     1.0003402     0.4412598     1.0000279     0.7032023
cab_sauvignon        malbec        merlot     red_blend         syrah
    1.3958055     0.4669234     0.6604796     2.2464588     1.0022553

The odds ratios can be interpreted for marketing managers as follows:

The odds of buying the 2019 Red Blend after receiving the free sample are multiplied by 0.91
(a decrease of 8.8%) for each additional month elapsed since the customer’s most recent
purchase from Col Solare, if all else remains constant.

The odds of buying the 2019 Red Blend after receiving the free sample are multiplied by 0.44
(a decrease of about 56%) if the customer’s establishment is a restaurant, if all else remains
constant. Equivalently, the odds of buying for a bar are about 2.27 times (1/0.44) those for a restaurant.

The odds of buying the 2019 Red Blend after receiving the free sample are essentially unaffected by the
total dollars that the customer has spent with Col Solare (odds ratio of 1.0003, not statistically
significant, p = 0.388), if all else remains constant.

The odds of buying the 2019 Red Blend after receiving the free sample are essentially unaffected by the
size of the customer’s establishment (odds ratio of 1.00003, not statistically significant, p = 0.370),
if all else remains constant.
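
The conversion from a logit coefficient to an odds ratio and a percent change in odds can be sketched as follows. The coefficients are copied from the summary output above; the vector names are illustrative.

```r
# Coefficients copied from the summary() output above (log-odds scale).
coefs <- c(last_purch = -9.215e-02,
           restaurant = -8.181e-01,
           malbec     = -7.616e-01,
           red_blend  =  8.094e-01)

# Odds ratio = exp(coefficient).
odds_ratio <- exp(coefs)

# Percent change in odds for a one-unit increase = (odds ratio - 1) * 100.
pct_change <- (odds_ratio - 1) * 100

print(round(cbind(odds_ratio, pct_change), 3))
```

An odds ratio below 1 gives a negative percent change, which is why a factor of 0.91 corresponds to a decrease of about 8.8%.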

3. Customers were each assigned to a decile based on their purchase probabilities stored in the
variable purch_prob, with decile 1 holding the highest probabilities. The following command
(ntile is from the dplyr package) was used:
> ColSolare$predict <- 11 - ntile(ColSolare$purch_prob, 10)

The decile cutoffs of purch_prob were:

          0%          10%          20%          30%          40%          50%
0.0005558238 0.0119596568 0.0200447655 0.0283770059 0.0380511811 0.0511387163
         60%          70%          80%          90%         100%
0.0659792076 0.0905440840 0.1256687440 0.1978702502 0.9793837331
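
To see why the bucket number is subtracted from 11, here is a minimal base-R sketch on toy data. The probabilities are made up, and the rank-based bucketing is a stand-in for ntile under the assumption that the number of rows divides evenly by 10.

```r
# Toy probabilities (hypothetical, not the ColSolare data): 0.01 to 0.40 in scrambled order.
p <- ((1:40 * 7) %% 41) / 100

# Rank-based buckets: 1 = lowest probabilities, 10 = highest
# (a base-R stand-in for ntile(p, 10) when length(p) is a multiple of 10).
bucket <- ceiling(rank(p, ties.method = "first") * 10 / length(p))

# Reverse the numbering so that decile 1 holds the highest probabilities.
decile <- 11 - bucket

# Mean probability per decile should fall from decile 1 to decile 10.
print(tapply(p, decile, mean))
```

Reversing the numbering is only a labeling convention; it puts the best prospects in decile 1, which is the ordering the later lift and gains tables rely on.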

4. The bar chart of the response rate in each decile was created using the command:

> ggplot(ColSolare) + geom_bar(aes(x = predict, y = buyer), stat =
"summary", fun = "mean")

5. The following report was generated:
> ColSolare %>% group_by(ColSolare$predict) %>% summarize(count =
length(red_blend), buyers = sum(buyer), responserate =
sum(buyer)/sum(count))

# A tibble: 10 x 4
`ColSolare$predict` count buyers responserate
<dbl> <int> <dbl> <dbl>
1 1 4000 1458 0.364
2 2 4000 598 0.150
3 3 4000 401 0.100
4 4 4000 331 0.0828
5 5 4000 226 0.0565
6 6 4000 184 0.046
7 7 4000 118 0.0295
8 8 4000 89 0.0222
9 9 4000 80 0.02
10 10 4000 32 0.008

6. The logistic regression was performed using only the merlot variable using the command
> ColSolaremerlot <- glm(buyer ~ merlot, family = binomial(link='logit'),
data = ColSolare)
> summary (ColSolaremerlot)

The following output was obtained in R:


Call:
glm(formula = buyer ~ merlot, family = binomial(link = "logit"),
data = ColSolare)

Deviance Residuals:
Min 1Q Median 3Q Max
-0.5447 -0.4306 -0.4162 -0.4162 2.2314

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.40289 0.02251 -106.766 < 2e-16 ***
merlot 0.07122 0.01496 4.761 1.93e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 23817 on 39999 degrees of freedom


Residual deviance: 23795 on 39998 degrees of freedom
AIC: 23799

Number of Fisher Scoring iterations: 5

> exp(ColSolaremerlot$coef)

(Intercept) merlot
0.09045611 1.07381299

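Intuitively, the odds ratio for merlot (1.07) differs from the one in the earlier logistic regression
(0.66) because the second model omits the other predictors (recency, customer type, and purchases
of the other wines). If merlot purchases are correlated with those omitted variables, the merlot
coefficient absorbs part of their effect, so it no longer measures the effect of merlot with all
else held constant.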
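
The effect of leaving out correlated predictors can be illustrated with a small simulation. Everything below is synthetic: the variable names merely echo the case, and the effect sizes are assumptions chosen to make the pattern visible.

```r
set.seed(42)
n <- 20000

# Synthetic, standardized predictors (assumption): merlot is built to be
# positively correlated with red_blend.
red_blend <- rnorm(n)
merlot    <- red_blend + rnorm(n, sd = 0.5)

# Assumed true model: red_blend raises the log-odds, merlot lowers them.
buyer <- rbinom(n, 1, plogis(-2 + 1.0 * red_blend - 0.4 * merlot))

full   <- glm(buyer ~ merlot + red_blend, family = binomial)
single <- glm(buyer ~ merlot, family = binomial)

# Odds ratio for merlot with and without the correlated predictor.
print(exp(coef(full)["merlot"]))
print(exp(coef(single)["merlot"]))
```

In this setup the single-predictor model attributes part of the red_blend effect to merlot, so its merlot odds ratio lands on the opposite side of 1 from the full model's.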

7. The following table shows the lift and cumulative lift for each decile:

Decile  No of  Cumulative  Cumulative  No of   Cumulative  Response  Lift  Cumulative  Cumulative  No
        Cust   No of Cust  %           Buyers  Buyers      rate            resp rate   Lift        Model
1       4000    4000       10%         1458    1458        36.45%    4.15  36.45%      4.15        1
2       4000    8000       20%          598    2056        14.95%    1.70  25.70%      2.92        1
3       4000   12000       30%          401    2457        10.03%    1.14  20.48%      2.33        1
4       4000   16000       40%          331    2788         8.28%    0.94  17.43%      1.98        1
5       4000   20000       50%          226    3014         5.65%    0.64  15.07%      1.71        1
6       4000   24000       60%          184    3198         4.60%    0.52  13.33%      1.52        1
7       4000   28000       70%          118    3316         2.95%    0.34  11.84%      1.35        1
8       4000   32000       80%           89    3405         2.23%    0.25  10.64%      1.21        1
9       4000   36000       90%           80    3485         2.00%    0.23   9.68%      1.10        1
10      4000   40000       100%          32    3517         0.80%    0.09   8.79%      1.00        1
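
The lift columns can be reproduced from the buyer counts per decile. A minimal sketch in R, using the counts from the report in question 5 (the variable names are illustrative):

```r
# Buyers per decile, copied from the decile report; 4000 customers per decile.
buyers    <- c(1458, 598, 401, 331, 226, 184, 118, 89, 80, 32)
customers <- rep(4000, 10)

# Overall response rate is the "no model" baseline.
overall_rate <- sum(buyers) / sum(customers)

# Lift = decile response rate / overall response rate.
response_rate <- buyers / customers
lift          <- response_rate / overall_rate

# Cumulative lift = cumulative response rate / overall response rate.
cum_rate <- cumsum(buyers) / cumsum(customers)
cum_lift <- cum_rate / overall_rate

print(data.frame(decile = 1:10,
                 lift = round(lift, 2),
                 cum_lift = round(cum_lift, 2)))
```

A lift above 1 means the decile responds better than a randomly selected group of the same size; cumulative lift necessarily ends at 1 once all customers are included.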

8. The chart was created in Excel and is shown below.


9. The gains table is shown below.

Decile  No of  Cumulative  Cumulative  No of   Cumulative  Gains    Cumulative  No
        Cust   No of Cust  %           Buyers  Buyers               Gains       Model
0           0       0      0%              0       0        0.00%     0.00%       0%
1        4000    4000      10%          1458    1458       41.46%    41.46%      10%
2        4000    8000      20%           598    2056       17.00%    58.46%      20%
3        4000   12000      30%           401    2457       11.40%    69.86%      30%
4        4000   16000      40%           331    2788        9.41%    79.27%      40%
5        4000   20000      50%           226    3014        6.43%    85.70%      50%
6        4000   24000      60%           184    3198        5.23%    90.93%      60%
7        4000   28000      70%           118    3316        3.36%    94.28%      70%
8        4000   32000      80%            89    3405        2.53%    96.82%      80%
9        4000   36000      90%            80    3485        2.27%    99.09%      90%
10       4000   40000      100%           32    3517        0.91%   100.00%     100%
Total   40000                           3517
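
The gains columns follow directly from the buyer counts: each decile's gain is its share of all buyers. A short sketch in R, with illustrative variable names:

```r
# Buyers per decile, copied from the decile report.
buyers <- c(1458, 598, 401, 331, 226, 184, 118, 89, 80, 32)

# Gains = share of all buyers captured by each decile.
gains <- buyers / sum(buyers)

# Cumulative gains = share of all buyers captured by targeting deciles 1..k.
cum_gains <- cumsum(gains)

# "No model" baseline: targeting k deciles at random captures k * 10% of buyers.
no_model <- (1:10) / 10

print(data.frame(decile = 1:10,
                 gains_pct = round(100 * gains, 2),
                 cum_gains_pct = round(100 * cum_gains, 2),
                 no_model_pct = 100 * no_model))
```

Comparing cum_gains with no_model shows, for example, that the top three deciles capture roughly 70% of buyers while containing only 30% of customers.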

10. The chart is shown below.


11. The following R command was used to calculate the predicted probability of response that a
customer needs to have in order to be a profitable target (the cost of targeting one customer
divided by the margin earned from one buyer):
> breakeven_rate <- ((20+10) / (720 - 40 - 120))*100
> breakeven_rate
[1] 5.357143

Therefore, the predicted probability of response should be at least 5.36%.
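
The breakeven logic can be sketched as below. The names cost_per_target and margin_per_buyer are illustrative labels for the two bracketed terms in the command above, not terms taken from the case; the response rates come from the decile report.

```r
# Illustrative names (assumptions) for the two terms in the breakeven formula.
cost_per_target  <- 20 + 10          # spent on every targeted customer
margin_per_buyer <- 720 - 40 - 120   # earned only when the customer buys

# A customer is worth targeting when p * margin_per_buyer > cost_per_target.
breakeven <- cost_per_target / margin_per_buyer

# Observed response rate per decile, from the decile report.
response_rate <- c(1458, 598, 401, 331, 226, 184, 118, 89, 80, 32) / 4000

# Deciles whose observed response rate clears the breakeven rate.
profitable_deciles <- which(response_rate > breakeven)
print(breakeven)
print(profitable_deciles)
```

At the decile level this is only an approximation; the per-customer rule compares each individual purch_prob against the breakeven rate.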

12. Number of buyers was calculated using the following R commands:


> ColSolare$targetcust <- ifelse(ColSolare$purch_prob > 0.0536,1,0)
> mean(ColSolare$targetcust)
[1] 0.4843
> .4843 * 120000
[1] 58116
> mean(subset(ColSolare, targetcust==1)$purch_prob)
[1] 0.1542016
> 0.1542016 * 58116
[1] 8961.58
> (720-40-120) * 8961.58 - 30 * 58116
[1] 3275005

a. I would expect about 8,962 buyers among the 58,116 targeted customers.
b. The expected response rate would be 15.42%.
c. Expected profit would be $3,275,005.
d. Expected return on marketing expenses would be 3275005/(30*58116) = 1.87843, or 188%.
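
The four answers can be reproduced with one small helper. This is a sketch: the function name and argument names are hypothetical, while the inputs are the figures from the commands above.

```r
# Hypothetical helper: expected buyers, profit, and return on marketing expenses.
expected_campaign <- function(n_targeted, response_rate,
                              margin_per_buyer, cost_per_target) {
  buyers <- n_targeted * response_rate          # expected number of buyers
  cost   <- n_targeted * cost_per_target        # total marketing expense
  profit <- buyers * margin_per_buyer - cost    # margin earned minus expense
  list(buyers = buyers, profit = profit, rome = profit / cost)
}

# Inputs taken from the R session above.
res <- expected_campaign(n_targeted = 58116,
                         response_rate = 0.1542016,
                         margin_per_buyer = 720 - 40 - 120,
                         cost_per_target = 20 + 10)
print(res)
```

Keeping the calculation in one function makes it easy to rerun with a different probability cutoff or cost assumption.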

13. Yes, Col Solare needs to run a new campaign next year.

The reason is that customer tastes or preferences may shift by next year, and customers who
responded to the red blend may not respond the same way to merlot. Therefore, the marketing
department may not be able to use the response to the red blend campaign to estimate how many
customers will buy the company’s merlot.

14. A linear regression model would be appropriate in this case because dollars is a continuous
outcome variable. The model was created as follows:
> CustSpend <- lm(dollars ~ first_purch + malbec + merlot + syrah, data =
ColSolare)
> summary(CustSpend)

Call:
lm(formula = dollars ~ first_purch + malbec + merlot + syrah,
data = ColSolare)

Residuals:
Min 1Q Median 3Q Max
-3381.2 -527.9 -35.7 373.3 4695.0

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.2988 6.8775 13.57 <2e-16 ***
first_purch 38.7869 0.3032 127.91 <2e-16 ***
malbec 673.4377 5.8776 114.58 <2e-16 ***
merlot 685.6872 4.3502 157.62 <2e-16 ***
syrah 658.7488 7.1909 91.61 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 801.8 on 39995 degrees of freedom


Multiple R-squared: 0.829, Adjusted R-squared: 0.8289
F-statistic: 4.846e+04 on 4 and 39995 DF, p-value: < 2.2e-16

Each of the predictor variables is statistically significant, and the model explains 82.9% of the
variation in spending (R-squared = 0.829). The coefficients can be interpreted as follows:

Spending increases by $38.79 for each additional month since first purchase if all else remains
constant. Spending increases by $673.44 for each additional case of malbec purchased if all else
remains constant. Spending increases by $685.69 for each additional case of merlot purchased if all
else remains constant. Spending increases by $658.75 for each additional case of syrah purchased if
all else remains constant.
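
To show how the coefficients combine, the sketch below computes predicted spending for one made-up customer. The coefficients are copied from the output above; the customer profile (12 months since first purchase, 1 case of malbec, 2 cases of merlot, no syrah) is purely hypothetical.

```r
# Coefficients copied from the summary(CustSpend) output above.
coefs <- c(intercept   = 93.2988,
           first_purch = 38.7869,
           malbec      = 673.4377,
           merlot      = 685.6872,
           syrah       = 658.7488)

# Hypothetical customer (assumption); the intercept term always gets a 1.
customer <- c(intercept = 1, first_purch = 12, malbec = 1, merlot = 2, syrah = 0)

# Predicted spending = intercept + sum of coefficient * value.
predicted_dollars <- sum(coefs * customer[names(coefs)])
print(predicted_dollars)
```

With the fitted model object available, predict(CustSpend, newdata = ...) would give the same kind of number without copying coefficients by hand.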
