Col Solare Case Study 2
A logistic regression model was created in R after first transforming the “customer type” variable
into a 0/1 dummy variable called restaurant. The following command was used:
> ColSolare$restaurant <- ifelse(ColSolare$customer_type == "restaurant", 1, 0)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.778e+00 1.066e-01 -16.679 < 2e-16 ***
last_purch -9.215e-02 3.112e-03 -29.613 < 2e-16 ***
dollars 3.401e-04 3.942e-04 0.863 0.38828
restaurant -8.181e-01 4.027e-02 -20.317 < 2e-16 ***
customer_sqft 2.787e-05 3.110e-05 0.896 0.37021
cab_franc -3.521e-01 2.856e-01 -1.233 0.21767
cab_sauvignon 3.335e-01 2.846e-01 1.172 0.24134
malbec -7.616e-01 2.860e-01 -2.663 0.00775 **
merlot -4.148e-01 2.845e-01 -1.458 0.14479
red_blend 8.094e-01 2.849e-01 2.840 0.00451 **
syrah 2.253e-03 2.853e-01 0.008 0.99370
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The odds of buying the 2019 Red Blend after receiving the free sample are multiplied by a factor of
0.91 (a decrease of 8.8%) for each additional month elapsed since the customer’s most recent
purchase from Col Solare if all else remains constant.
The odds of buying the 2019 Red Blend after receiving the free sample are multiplied by a factor of
0.44 (a decrease of about 56%) if the customer’s establishment is a restaurant and all else remains
constant. In other words, the odds of buying are about 2.27 times as high (1/0.44) if it is a bar.
The total dollars that the customer has spent with Col Solare has no statistically significant effect
on the odds of buying the 2019 Red Blend after receiving the free sample (p = 0.388, odds ratio of
approximately 1) if all else remains constant.
The size of the customer’s establishment has no statistically significant effect on the odds of
buying the 2019 Red Blend after receiving the free sample (p = 0.370, odds ratio of approximately 1)
if all else remains constant.
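The odds ratios quoted above come from exponentiating the logistic regression coefficients. The following is a minimal sketch of that arithmetic, assuming the coefficients are typed in by hand from the output above; the names coefs, odds_ratio and pct_change are illustrative and are not part of the original analysis.

```r
# Coefficients copied by hand from the logistic regression output above
coefs <- c(last_purch    = -9.215e-02,
           dollars       =  3.401e-04,
           restaurant    = -8.181e-01,
           customer_sqft =  2.787e-05)

# Exponentiating a log-odds coefficient gives the odds ratio
odds_ratio <- exp(coefs)

# Percent change in the odds for a one-unit increase in each predictor
pct_change <- (odds_ratio - 1) * 100

print(round(cbind(odds_ratio, pct_change), 4))
```

With these inputs, last_purch gives an odds ratio of about 0.912 and restaurant gives about 0.441, while dollars and customer_sqft are both very close to 1.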
3. Customers were each assigned to a decile based on their purchase probabilities stored in the
variable purch_prob. The following command was used:
> ColSolare$predict <- 11 - ntile(ColSolare$purch_prob, 10)
# A tibble: 10 x 4
`ColSolare$predict` count buyers responserate
<dbl> <int> <dbl> <dbl>
1 1 4000 1458 0.364
2 2 4000 598 0.150
3 3 4000 401 0.100
4 4 4000 331 0.0828
5 5 4000 226 0.0565
6 6 4000 184 0.046
7 7 4000 118 0.0295
8 8 4000 89 0.0222
9 9 4000 80 0.02
10 10 4000 32 0.008
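The expression 11 - ntile(...) reverses the bucket numbers so that decile 1 holds the customers with the highest purchase probabilities. The following sketch illustrates the idea in base R on a hypothetical vector of 20 probabilities. It uses a rank-based stand-in for ntile, which is an assumption made here so the example runs without extra packages; it is not claimed to match ntile in every case (for example, when the group sizes are uneven).

```r
# Hypothetical purchase probabilities for 20 customers (illustration only)
purch_prob <- (1:20) / 100
n_groups <- 10

# Rank-based bucket: 1 = lowest probabilities, 10 = highest
bucket <- ceiling(rank(purch_prob, ties.method = "first") * n_groups /
                  length(purch_prob))

# Reverse the numbering so decile 1 holds the highest probabilities
decile <- (n_groups + 1) - bucket

print(table(decile))
```

Each decile ends up with the same number of customers, which is consistent with the equal counts of 4000 per decile in the table above.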
6. A logistic regression was then fitted using only the merlot variable, with the following commands:
> ColSolaremerlot <- glm(buyer ~ merlot, family = binomial(link='logit'), data = ColSolare)
> summary(ColSolaremerlot)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.5447 -0.4306 -0.4162 -0.4162 2.2314
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.40289 0.02251 -106.766 < 2e-16 ***
merlot 0.07122 0.01496 4.761 1.93e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
> exp(ColSolaremerlot$coef)
(Intercept) merlot
0.09045611 1.07381299
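The odds ratio for merlot differs from the one in the earlier logistic regression because the second
model omits the other predictor variables (including the other wines). Without them, the merlot
coefficient also absorbs the effect of any omitted variables that are correlated with merlot
purchases, so it does not isolate the effect of merlot alone. In the full model the merlot
coefficient is negative and not significant (odds ratio of about 0.66, p = 0.145), whereas in the
merlot-only model it is positive and significant (odds ratio of about 1.07).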
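To make the comparison concrete, the two merlot coefficients reported in the outputs above can be exponentiated side by side. This is only a sketch with the coefficients typed in by hand; the variable names are illustrative.

```r
# Merlot coefficients copied by hand from the two model outputs above
merlot_full <- -4.148e-01   # full model (all predictors)
merlot_only <-  0.07122     # model with merlot as the only predictor

# Convert both log-odds coefficients to odds ratios
odds_ratios <- exp(c(full_model = merlot_full, merlot_only = merlot_only))

print(round(odds_ratios, 3))
```

The full-model odds ratio is below 1 and the merlot-only odds ratio is above 1, so the two models disagree even on the direction of the effect.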
7. The following table shows the lift and cumulative lift for each decile:
Decile  No of Cust  Cum Cust  Cum %  No of Buyers  Cum Buyers  Response rate  Lift  Cum resp rate  Cum Lift  No Model
     1        4000      4000    10%          1458        1458         36.45%  4.15         36.45%      4.15         1
     2        4000      8000    20%           598        2056         14.95%  1.70         25.70%      2.92         1
     3        4000     12000    30%           401        2457         10.03%  1.14         20.48%      2.33         1
     4        4000     16000    40%           331        2788          8.28%  0.94         17.43%      1.98         1
     5        4000     20000    50%           226        3014          5.65%  0.64         15.07%      1.71         1
     6        4000     24000    60%           184        3198          4.60%  0.52         13.33%      1.52         1
     7        4000     28000    70%           118        3316          2.95%  0.34         11.84%      1.35         1
     8        4000     32000    80%            89        3405          2.23%  0.25         10.64%      1.21         1
     9        4000     36000    90%            80        3485          2.00%  0.23          9.68%      1.10         1
    10        4000     40000   100%            32        3517          0.80%  0.09          8.79%      1.00         1
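The lift figures can be reproduced from the buyer counts per decile: lift is the decile response rate divided by the overall response rate, and cumulative lift uses the cumulative response rate instead. A minimal sketch, assuming the counts are typed in by hand from the decile table; the variable names are illustrative and not taken from the original analysis.

```r
# Buyers and customers per decile, copied from the decile table
buyers    <- c(1458, 598, 401, 331, 226, 184, 118, 89, 80, 32)
customers <- rep(4000, 10)

# Overall response rate across all customers (the no-model baseline)
overall_rate <- sum(buyers) / sum(customers)

# Per-decile response rate and lift
response_rate <- buyers / customers
lift <- response_rate / overall_rate

# Cumulative response rate and cumulative lift
cum_response_rate <- cumsum(buyers) / cumsum(customers)
cum_lift <- cum_response_rate / overall_rate

lift_table <- data.frame(decile = 1:10,
                         response_rate = round(response_rate, 4),
                         lift = round(lift, 2),
                         cum_response_rate = round(cum_response_rate, 4),
                         cum_lift = round(cum_lift, 2))
print(lift_table)
```

The cumulative lift in the last decile is 1 by construction, because the cumulative response rate over all customers equals the overall response rate.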
14. A linear regression model would be appropriate in this case because dollars is a continuous
response variable. The model was created as follows:
> CustSpend <- lm(dollars ~ first_purch + malbec + merlot + syrah, data = ColSolare)
> summary(CustSpend)
Call:
lm(formula = dollars ~ first_purch + malbec + merlot + syrah,
data = ColSolare)
Residuals:
Min 1Q Median 3Q Max
-3381.2 -527.9 -35.7 373.3 4695.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.2988 6.8775 13.57 <2e-16 ***
first_purch 38.7869 0.3032 127.91 <2e-16 ***
malbec 673.4377 5.8776 114.58 <2e-16 ***
merlot 685.6872 4.3502 157.62 <2e-16 ***
syrah 658.7488 7.1909 91.61 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Each of the predictor variables is significant, and the model explains 82.9% of the variation in
spending. The coefficients can be interpreted as follows:
Spending increases by $38.79 for each additional month since first purchase if all else remains
constant.
Spending increases by $673.44 for each additional case of malbec purchased if all else remains
constant.
Spending increases by $685.69 for each additional case of merlot purchased if all else remains
constant.
Spending increases by $658.75 for each additional case of syrah purchased if all else remains
constant.
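As an illustration of how the fitted equation is used, predicted spending is the intercept plus each coefficient multiplied by the corresponding predictor value. The sketch below applies this to a hypothetical customer (12 months since first purchase, 1 case of malbec, 2 cases of merlot, no syrah); these input values are assumptions chosen for the example and do not come from the data.

```r
# Coefficients copied by hand from the linear regression output above
coefs <- c(intercept   = 93.2988,
           first_purch = 38.7869,
           malbec      = 673.4377,
           merlot      = 685.6872,
           syrah       = 658.7488)

# Hypothetical customer (illustration only); the leading 1 multiplies the intercept
customer <- c(intercept = 1, first_purch = 12, malbec = 1, merlot = 2, syrah = 0)

# Predicted spending: intercept + sum of coefficient * predictor value
predicted_dollars <- sum(coefs * customer)

print(round(predicted_dollars, 2))
```

For this hypothetical customer the predicted spending is about $2,603.55. With the fitted model object available, predict(CustSpend, newdata = ...) would give the same kind of result without copying coefficients by hand.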