BT4211
Data-Driven Marketing
Customer: Purchase Choice, Quantity, Duration
March 7, 2018
Purchase Decisions & Models
Purchase choice
– Whether the customer will buy/churn?
– What brand/product/service will the customer buy?
• Binary logit (logistic regression) model, multinomial logit model
Purchase quantity
– How much or how many units will the customer buy?
• Count data model (Poisson, negative binomial)
Duration: inter-purchase, customer lifetime
– How soon will the customer make another purchase?
– How long will the customer stay on with the firm?
• Linear regression model, hazard model
Binary Response Models
Linear probability model
– Link function: identity, $P(y_i = 1 \mid x_i) = x_i'\beta$
Binary Response Models
Linear probability model
– Problems for binary responses
• Error term violates homoscedasticity assumption of
classical linear regression model
– Heteroscedasticity, if not corrected for, invalidates the usual
standard errors and can increase prediction error
• Predicted probabilities are not constrained to lie between 0 and 1
– Predictions below 0 or above 1 cannot be interpreted as probabilities
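Both problems can be seen in a few lines. The sketch below is illustrative only: the intercept 0.2 and slope 0.1 are assumed values, not estimates from any course data.

```python
# Linear probability model with hypothetical coefficients (assumed values).
INTERCEPT = 0.2
SLOPE = 0.1

def lpm_probability(x):
    """Fitted 'probability' from the linear probability model."""
    return INTERCEPT + SLOPE * x

def lpm_error_variance(x):
    """Var(error | x) = p(1 - p) for a binary outcome, so it changes with x.
    The value is only meaningful when the fitted p lies in [0, 1]."""
    p = lpm_probability(x)
    return p * (1.0 - p)

for x in (-5, 1, 3, 10):
    print(x, round(lpm_probability(x), 2), round(lpm_error_variance(x), 2))
```

At x = -5 and x = 10 the fitted values fall outside [0, 1], and at x = 1 and x = 3 the error variances differ, which is the heteroscedasticity noted above.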
Binary Response Models
Binary logit (logistic regression) model
– Link function: $P(y_i = 1 \mid x_i) = \dfrac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$
– Estimation method
• Maximum likelihood estimation
– Interpretation
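• Odds ratio: $\exp(\beta_k)$, the multiplicative change in the odds
$p/(1-p)$ for a one-unit increase in $x_k$
• Odds ratio per standard deviation change in x: $\exp(\beta_k s_k)$,
where $s_k$ is the standard deviation of $x_k$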
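The odds-ratio interpretation can be checked numerically. This is a minimal sketch with one regressor; the intercept, slope, and standard deviation are hypothetical values chosen only for illustration.

```python
import math

def logit_probability(x, intercept, slope):
    """P(y = 1 | x) under a binary logit with one regressor."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds p / (1 - p) implied by a probability p."""
    return p / (1.0 - p)

# Hypothetical values, not estimates from real data.
intercept, slope, sd_x = -1.0, 0.4, 2.5

p0 = logit_probability(1.0, intercept, slope)
p1 = logit_probability(2.0, intercept, slope)            # x raised by one unit
p_sd = logit_probability(1.0 + sd_x, intercept, slope)   # x raised by one SD

odds_ratio = odds(p1) / odds(p0)        # equals exp(slope)
odds_ratio_sd = odds(p_sd) / odds(p0)   # equals exp(slope * sd_x)
print(odds_ratio, odds_ratio_sd)
```

The ratio does not depend on the starting value of x, which is why the odds ratio is a convenient summary of a logit coefficient.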
Binary Response Models
Binary probit model
– Link function: $P(y_i = 1 \mid x_i) = \Phi(x_i'\beta)$, where $\Phi$ is
the standard normal CDF
– Estimation method
• Maximum likelihood estimation
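For comparison with the logit, a short sketch of the probit response probability, computed from the standard normal CDF via the error function. The index values are arbitrary illustrations.

```python
import math

def probit_probability(z):
    """Standard normal CDF evaluated at the index z = x'beta."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_probability(z):
    """Logistic CDF evaluated at the same index, for comparison."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 1.0, 2.0):
    print(z, round(probit_probability(z), 3), round(logit_probability(z), 3))
```

Both functions map any index into (0, 1) and agree at z = 0; the same index value gives different probabilities elsewhere, so logit and probit coefficients are not directly comparable in magnitude.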
Binary Response Models
Logistic regression with rare events data
– Problems
• Response rates below 1% are not unusual for rare events
• Binary logit and probit models can under-estimate the
response probability in such cases, so predicted response
probabilities fall below the actual likelihood of response
– Solutions
• Adjustments with choice-based sampling
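The slide does not say which adjustment is meant. One common option for a logit fitted on a choice-based (response-oversampled) sample is the prior correction of the intercept described by King and Zeng; the sketch below assumes that approach, and all numbers are hypothetical.

```python
import math

def prior_corrected_intercept(intercept_hat, population_rate, sample_rate):
    """Shift the logit intercept estimated on a sample in which responders
    were oversampled (sample_rate) back to the population rate."""
    offset = math.log(((1.0 - population_rate) / population_rate)
                      * (sample_rate / (1.0 - sample_rate)))
    return intercept_hat - offset

def logit_probability(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed setting: 1% responders in the population, 50% in the sample.
intercept_hat = 0.0
corrected = prior_corrected_intercept(intercept_hat, 0.01, 0.5)

print(logit_probability(intercept_hat))  # probability implied by the balanced sample
print(logit_probability(corrected))      # probability on the population scale
```

Only the intercept changes under this correction; the slope coefficients estimated on the choice-based sample are kept as they are.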
Multinomial Response Models
Multinomial response model
– Specification
• Number of choice (or response) alternatives is J
• Probability of a consumer i choosing alternative j:
$p_{ij} = P(y_i = j) = F_j(x_i, \beta)$, $j = 1, \dots, J$
– Applications
• Brand or product choices
• Customer segment predictions
Multinomial Response Models
Choice of the functional form for the choice probabilities
results in different multinomial model types
• Examples: logit, probit, nested logit, ordered logit, etc.
Alternative-varying regressors
• Regressors xi take different values for different alternatives
• Examples: costs of transport modes, prices of brands
Alternative-invariant regressors
• Regressors xi take same values across alternatives
• Examples: socioeconomic characteristics such as income and gender
Multinomial Response Models
Model evaluation and selection methods
– Range of in-sample fitted probabilities for each
alternative
• The wider the range, the more discriminating the model
– Akaike and/or Bayesian Information Criterion
– Pseudo $R^2$
Multinomial Response Models
Conditional logit model (CL)
• For alternative-varying regressors
Multinomial logit model (MNL)
• For alternative-invariant regressors
Mixed logit model (ML)
• For both alternative-varying and -invariant regressors
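The difference between the conditional and multinomial logit can be sketched as follows. In the conditional logit one coefficient multiplies a regressor that varies across alternatives (here price); in the multinomial logit each alternative has its own coefficient on a regressor that is the same across alternatives (here income), with the first alternative as the base. All prices, incomes, and coefficients are hypothetical.

```python
import math

def softmax(utilities):
    """Turn a list of utilities into choice probabilities that sum to 1."""
    m = max(utilities)                       # subtract max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def conditional_logit_probs(prices, beta_price):
    """CL: one coefficient, regressor differs by alternative."""
    return softmax([beta_price * p for p in prices])

def multinomial_logit_probs(income, betas):
    """MNL: one regressor value, coefficient differs by alternative."""
    return softmax([b * income for b in betas])

cl = conditional_logit_probs(prices=[2.0, 2.5, 3.0], beta_price=-1.2)
mnl = multinomial_logit_probs(income=4.0, betas=[0.0, 0.3, -0.2])
print(cl)
print(mnl)
```

A model with both kinds of regressors simply adds the two utility components before applying the same softmax step.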
Multinomial Response Models
Example:
Multinomial Response Models
Example: marginal effects
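– Conditional logit model (CL):
$\partial p_{ij} / \partial x_{ik} = p_{ij}(\delta_{jk} - p_{ik})\beta$,
where $\delta_{jk} = 1$ if $j = k$ and 0 otherwise
– Multinomial logit model (MNL):
$\partial p_{ij} / \partial x_i = p_{ij}(\beta_j - \bar{\beta}_i)$,
where $\bar{\beta}_i = \sum_l p_{il}\beta_l$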
Marginal Effects of Regressors
Marginal effects of regressors:
– Change in conditional mean of y when regressors
x change by one unit
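– Linear regression: $E(y \mid x) = x'\beta$, so
$\partial E(y \mid x) / \partial x_j = \beta_j$
– Non-linear regression: for example $E(y \mid x) = \exp(x'\beta)$, so
$\partial E(y \mid x) / \partial x_j = \beta_j \exp(x'\beta)$, which depends on x
– General regression function: $E(y \mid x) = m(x)$, so the marginal
effect is $\partial m(x) / \partial x_j$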
Marginal Effects of Regressors
Marginal effects of regressors
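– Calculus method: $\partial E(y \mid x) / \partial x_j$, evaluated in one of three ways
• Average marginal effect (AME): average of the effect over all observations
• Marginal effect at the mean (MEM): effect evaluated at $x = \bar{x}$
• Marginal effect at a representative value (MER): effect evaluated at a chosen $x = x^*$
– All 3 measures are the same for linear models
– All 3 measures generally differ for non-linear models
• Care must be taken in interpreting estimated coefficients
• R, Stata commands: margins, run after model estimation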
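To see why the summary measures differ in a non-linear model, the sketch below computes an average marginal effect, a marginal effect at the mean, and a marginal effect at a representative value for a one-regressor logit. The data points and the slope are hypothetical, and the representative value 0 is an arbitrary choice.

```python
import math

def logit_marginal_effect(x, slope):
    """d P(y = 1 | x) / dx = slope * p * (1 - p) for a one-regressor logit
    with zero intercept."""
    p = 1.0 / (1.0 + math.exp(-slope * x))
    return slope * p * (1.0 - p)

xs = [-2.0, 0.0, 1.0, 4.0]   # hypothetical regressor values
slope = 1.0                  # hypothetical coefficient

ame = sum(logit_marginal_effect(x, slope) for x in xs) / len(xs)
mem = logit_marginal_effect(sum(xs) / len(xs), slope)
mer = logit_marginal_effect(0.0, slope)
print(round(ame, 4), round(mem, 4), round(mer, 4))
```

In a linear model the marginal effect is the coefficient itself at every x, so all three numbers would coincide.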
Count Data Models
Overview of count data
– Discrete data with ordered metric (0,1,2,3,…)
• Examples
– Number of beers a consumer drinks in a week
– Number of mail orders a customer makes in a year
– Number of complaints a customer makes in a month
– Alternative modeling methods
• Multinomial logit model
– Inappropriate since dependent variable is ordered
• Linear regression model
– Inappropriate assumptions of normally distributed error terms
and continuous nature of dependent variable
Count Data Models
Poisson regression model
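– Specification:
$P(Y_i = y_i \mid x_i) = \dfrac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}$, $y_i = 0, 1, 2, \dots$,
with $\mu_i = \exp(x_i'\beta)$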
– Estimation method
• Maximum likelihood estimation
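A minimal sketch of the Poisson regression log-likelihood that maximum likelihood estimation would maximize, assuming the usual exponential conditional mean with one regressor. The tiny data set and the coefficient values are hypothetical.

```python
import math

def poisson_log_pmf(y, mu):
    """log P(Y = y) for a Poisson distribution with mean mu."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

def poisson_log_likelihood(ys, xs, intercept, slope):
    """Sum of log-probabilities with conditional mean exp(intercept + slope * x)."""
    total = 0.0
    for y, x in zip(ys, xs):
        mu = math.exp(intercept + slope * x)
        total += poisson_log_pmf(y, mu)
    return total

xs = [0.0, 1.0, 2.0]   # hypothetical regressor values
ys = [1, 2, 4]         # hypothetical counts

print(poisson_log_likelihood(ys, xs, 0.0, 0.7))    # fits the rising counts well
print(poisson_log_likelihood(ys, xs, 0.0, -0.7))   # wrong sign, much lower value
```

An optimizer would search over the intercept and slope for the pair with the highest log-likelihood; the two evaluations above only show that the function ranks candidate coefficients sensibly.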
Count Data Models
Poisson regression model
– Limitations
• Distribution is parameterized in terms of a single scalar
parameter
• Excess zeros problem
– More zeros in data than Poisson model predicts
• Over-dispersion in data
– Variance exceeds mean but Poisson model implies equality of
variance and mean
– Poisson MLE is still consistent if the conditional mean is correctly
specified
– Ignoring over-dispersion leads to deflated standard errors and
inflated t statistics
– Test statistics are available for over-dispersion and under-dispersion
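A quick descriptive check for both limitations is to compare the sample variance with the sample mean and the observed share of zeros with the share a Poisson distribution would predict. This is an informal diagnostic, not one of the formal test statistics; the counts are hypothetical.

```python
import math
import statistics

counts = [0, 0, 0, 1, 0, 2, 0, 7, 0, 12]   # hypothetical purchase counts

mean_count = statistics.mean(counts)
var_count = statistics.variance(counts)      # sample variance
dispersion_ratio = var_count / mean_count    # about 1 under a Poisson model

observed_zero_share = counts.count(0) / len(counts)
poisson_zero_share = math.exp(-mean_count)   # P(Y = 0) for Poisson(mean_count)

print(dispersion_ratio, observed_zero_share, poisson_zero_share)
```

A ratio well above 1 points to over-dispersion, and an observed zero share well above the Poisson value points to excess zeros.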
Count Data Models
Negative binomial regression model
– Specification
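• Conditional distribution of Yi given ui: Poisson with mean $\mu_i u_i$,
where $u_i > 0$ is unobserved heterogeneity
• Unconditional distribution of Yi: obtained by integrating out $u_i$,
$P(y_i \mid x_i) = \int P(y_i \mid x_i, u_i)\, g(u_i)\, du_i$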
Count Data Models
Negative binomial regression model
– Specification
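• Unconditional distribution of Yi with ui assumed to be
from a Gamma distribution with mean 1 and variance $\alpha$
• Mean: $E(Y_i \mid x_i) = \mu_i$
• Variance: $Var(Y_i \mid x_i) = \mu_i(1 + \alpha\mu_i)$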
– Estimation method
• Maximum likelihood estimation
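The mean and variance of the negative binomial can be verified numerically. The sketch assumes the common NB2 parameterization, in which the Gamma heterogeneity has mean 1 and variance alpha; the values mu = 2 and alpha = 0.5 are illustrative.

```python
import math

def nb2_log_pmf(y, mu, alpha):
    """log P(Y = y) for the NB2 model with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

mu, alpha = 2.0, 0.5   # assumed values for illustration

# Sum far enough into the tail that the omitted probability is negligible.
probs = [math.exp(nb2_log_pmf(y, mu, alpha)) for y in range(2000)]
nb_mean = sum(y * p for y, p in enumerate(probs))
nb_var = sum((y - nb_mean) ** 2 * p for y, p in enumerate(probs))

print(nb_mean, nb_var)   # compare with mu and mu * (1 + alpha * mu)
```

As alpha approaches 0 the variance approaches the mean, so the Poisson model is the limiting case.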
Duration Models
Overview of duration data
– Continuous or discrete time duration variable
• Example questions addressed by duration models
– What is the probability that a customer in a telecommunication
company will remain as a customer after a year?
– What is the attrition probability of each customer in a month?
– Are attrition probabilities different depending on the customer’s
demographic characteristics?
– What is the expected duration of a customer’s relationship with
the firm?
Duration Models
Overview of duration data
– Censoring
• Buyer 1: complete information
• Buyer 2: left-censored
• Buyer 3: right-censored
• Buyer 4: left-and-right censored
• Buyer 5: interval-censored
[Figure: timelines of Buyers 1 to 5 relative to the observation window from t0 to tN]
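The first four cases can be told apart from where a spell starts and ends relative to the observation window. The sketch below is a hypothetical illustration: the function name, the window (0 to 10), and the spell dates are all assumed, and it uses the true start and end dates, which an analyst would not observe for the censored sides. Interval censoring, where the event is only known to fall between two observation times, is not covered.

```python
def classify_spell(start, end, window_start, window_end):
    """Label a spell by which of its ends fall outside the observation window."""
    left = start < window_start    # spell began before observation started
    right = end > window_end       # spell still running when observation stopped
    if left and right:
        return "left-and-right censored"
    if left:
        return "left-censored"
    if right:
        return "right-censored"
    return "complete"

# Hypothetical spells as (start, end), with the window running from 0 to 10.
spells = {"Buyer 1": (2, 7), "Buyer 2": (-3, 5), "Buyer 3": (4, 14), "Buyer 4": (-2, 13)}
for buyer, (start, end) in spells.items():
    print(buyer, classify_spell(start, end, 0, 10))
```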
Duration Models
Linear regression model
– Method
• Simplest model to explain the relation between customer duration and
other explanatory variables
• Focus only on sample of prior customers (with full lifespan observed)
and omit right-censored observations, i.e., current customers
– Limitations
• Potential censoring bias, since data sample does not include all
customers, but only those prior ones with full lifespan observed
– Problematic especially when number of complete observations is small
relative to number of incomplete observations (i.e., current customers)
• Limited in helping to manage customer relationships
– Does not address probability of attrition during specified time periods
Duration Models
Hazard model
– Objective
• Models length of time spent in a given state before
transition to another state
– Duration from being an active customer to a churned one
– Duration between two consecutive purchases
– Basic concepts
• Cumulative distribution function: $F(t) = P(T \le t)$
• Survivor function: $S(t) = 1 - F(t)$
– Probability that the length of duration is at least t
Duration Models
Hazard model
– Basic concepts
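• Hazard rate function: $\lambda(t) = f(t) / S(t)$, where $f(t)$ is the
density of the duration and $S(t)$ the survivor function
– Instantaneous rate of leaving a state conditional on
survival to time t
• Cumulative hazard rate function:
$\Lambda(t) = \int_0^t \lambda(s)\, ds = -\ln S(t)$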
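The relations between density, survivor function, hazard, and cumulative hazard can be checked on a concrete distribution. The sketch assumes a Weibull duration with survivor function exp(-scale * t^shape); the parameter names and values are illustrative.

```python
import math

def weibull_survivor(t, scale, shape):
    """S(t) = exp(-scale * t**shape)."""
    return math.exp(-scale * t ** shape)

def weibull_density(t, scale, shape):
    """f(t) = -dS(t)/dt."""
    return scale * shape * t ** (shape - 1) * math.exp(-scale * t ** shape)

def weibull_hazard(t, scale, shape):
    """Hazard rate scale * shape * t**(shape - 1)."""
    return scale * shape * t ** (shape - 1)

def weibull_cumulative_hazard(t, scale, shape):
    """Integral of the hazard from 0 to t."""
    return scale * t ** shape

t, scale, shape = 3.0, 0.1, 1.5   # assumed values
print(weibull_density(t, scale, shape) / weibull_survivor(t, scale, shape),
      weibull_hazard(t, scale, shape))
print(-math.log(weibull_survivor(t, scale, shape)),
      weibull_cumulative_hazard(t, scale, shape))
```

Each printed pair agrees: the hazard is the density divided by the survivor function, and the cumulative hazard is minus the log of the survivor function.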
Duration Models
Hazard model
– Basic concepts
• Hazard rate function: examples
Duration Models
Hazard model
– Basic concepts
• Hazard rate function plots
Duration Models
Hazard model
– Exponential distribution
• Constant hazard rate that does not vary with time
• Memory-less property
– Weibull distribution
• Hazard is monotonically increasing if the shape parameter is greater than 1
• Hazard is monotonically decreasing if the shape parameter is less than 1
• The scale parameter can be a function of covariates X
– Generalized Weibull distribution
• An additional shape parameter gives more flexibility
• Hazard is monotonically decreasing if
• Hazard is unimodal or U-shaped if
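The exponential and Weibull hazard shapes can be illustrated with a short sketch. It assumes the Weibull hazard scale * shape * t^(shape - 1), of which the exponential is the special case shape = 1; all parameter values are illustrative. The generalized Weibull is not covered here.

```python
import math

def weibull_hazard(t, scale, shape):
    """Weibull hazard; shape = 1 gives the constant exponential hazard."""
    return scale * shape * t ** (shape - 1)

def exponential_survivor(t, rate):
    return math.exp(-rate * t)

rate = 0.2   # assumed exponential rate

# Shape above 1: rising hazard. Shape below 1: falling hazard. Shape 1: flat.
print(weibull_hazard(1.0, rate, 1.5), weibull_hazard(2.0, rate, 1.5))
print(weibull_hazard(1.0, rate, 0.5), weibull_hazard(2.0, rate, 0.5))
print(weibull_hazard(1.0, rate, 1.0), weibull_hazard(2.0, rate, 1.0))

# Memory-less property: P(T > s + t | T > s) equals P(T > t).
s, t = 3.0, 4.0
conditional = exponential_survivor(s + t, rate) / exponential_survivor(s, rate)
print(conditional, exponential_survivor(t, rate))
```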
Duration Models
Hazard model
– Gompertz distribution
• Hazard is monotonically increasing if the shape parameter is positive
• Hazard is monotonically decreasing if the shape parameter is negative
– Log-normal distribution
• Hazard is inverted U-shaped
– Log-logistic distribution
• Hazard is inverted U-shaped if the shape parameter is greater than 1
– Main issues in modeling
• Dependence on correct model specification
• Proportional Hazard (PH) model
• Accelerated Failure Time (AFT) model
Duration Models
Maximum likelihood estimation
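• Uncensored observations contribute the density $f(t_i \mid x_i, \theta)$
• (Right-)censored observations contribute the survivor function
$S(t_i \mid x_i, \theta)$
Likelihood function for ith observation:
$L_i = f(t_i \mid x_i, \theta)^{\delta_i}\, S(t_i \mid x_i, \theta)^{1 - \delta_i}$,
where $\delta_i = 1$ if the spell is uncensored and 0 otherwise
Log-likelihood function for entire sample:
$\ln L(\theta) = \sum_i \left[\delta_i \ln f(t_i \mid x_i, \theta) + (1 - \delta_i) \ln S(t_i \mid x_i, \theta)\right]$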
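A minimal sketch of a censored-data log-likelihood, assuming right censoring, an exponential duration with no covariates, and an indicator equal to 1 for completed spells. Under these assumptions the maximum likelihood estimate of the rate is the number of completed spells divided by total observed time. The durations are hypothetical.

```python
import math

def exponential_log_likelihood(rate, durations, events):
    """Completed spells (event = 1) contribute the log density,
    right-censored spells (event = 0) the log survivor function."""
    total = 0.0
    for t, d in zip(durations, events):
        log_density = math.log(rate) - rate * t
        log_survivor = -rate * t
        total += d * log_density + (1 - d) * log_survivor
    return total

durations = [2.0, 3.0, 5.0, 4.0]   # hypothetical observed times
events = [1, 1, 0, 1]              # third customer is still active (censored)

rate_hat = sum(events) / sum(durations)   # closed-form MLE for this model
naive_rate = 3 / (2.0 + 3.0 + 4.0)        # drops the censored spell entirely

print(rate_hat, naive_rate)
print(exponential_log_likelihood(rate_hat, durations, events))
```

Dropping the censored customer overstates the churn rate, which is the censoring bias that arises when only completed lifespans are analyzed.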
Proportional Hazard Model
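Conditional hazard rate: $\lambda(t \mid x) = \lambda_0(t, \alpha)\,\phi(x, \beta)$
Baseline hazard: $\lambda_0(t, \alpha)$
• If $\lambda_0(t, \alpha)$ is assumed to be non/semi-parametric =>
Cox Proportional Hazard model
Scale factor: $\phi(x, \beta)$, commonly $\exp(x'\beta)$
Distributional examples
• Exponential, Weibull, Gompertz distributions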
Proportional Hazard Model
Interpretation
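– Hazard ratio: $\exp(\beta_k)$ for a one-unit increase in $x_k$, assuming
the scale factor $\exp(x'\beta)$
– Relative hazard rate: $100\,[\exp(\beta_k) - 1]$ percent
• Percentage change in the hazard rate for a one-unit
change in the independent variable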
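The interpretation can be illustrated numerically. The sketch assumes a proportional hazard model with scale factor exp(beta * x) and a Weibull baseline; the coefficient 0.25 and the baseline parameters are hypothetical.

```python
import math

def ph_hazard(t, x, beta, scale=0.1, shape=1.5):
    """Baseline Weibull hazard multiplied by the scale factor exp(beta * x)."""
    baseline = scale * shape * t ** (shape - 1)
    return baseline * math.exp(beta * x)

beta = 0.25   # hypothetical coefficient

hazard_ratio = math.exp(beta)
percent_change = 100.0 * (hazard_ratio - 1.0)

# The ratio of hazards for x = 1 versus x = 0 is the same at every t.
ratio_early = ph_hazard(2.0, 1.0, beta) / ph_hazard(2.0, 0.0, beta)
ratio_late = ph_hazard(5.0, 1.0, beta) / ph_hazard(5.0, 0.0, beta)
print(hazard_ratio, percent_change, ratio_early, ratio_late)
```

That the ratio does not change with t is the proportionality assumption that gives the model its name.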
Accelerated Failure Time Model
Models ln(t) rather than t
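Conditional hazard rate:
$\lambda(t \mid x) = \lambda_0(t\,e^{-x'\beta})\,e^{-x'\beta}$, from $\ln t = x'\beta + u$
• Acceleration of baseline hazard if $e^{-x'\beta} > 1$
• Deceleration of baseline hazard if $e^{-x'\beta} < 1$
Distributional examples
• Exponential, Weibull, Log-normal, Log-logistic distributions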
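In an accelerated failure time model the covariates rescale time itself. The sketch assumes the sign convention ln(t) = x'beta + u, so the survivor function is the baseline survivor function evaluated at t * exp(-x'beta), and it uses an exponential baseline with a hypothetical rate of 0.2.

```python
import math

BASELINE_RATE = 0.2   # assumed rate of the exponential baseline

def baseline_survivor(t):
    return math.exp(-BASELINE_RATE * t)

def aft_survivor(t, xb):
    """Survivor function when the index x'beta equals xb: time runs
    slower (xb > 0) or faster (xb < 0) than in the baseline."""
    return baseline_survivor(t * math.exp(-xb))

median_baseline = math.log(2.0) / BASELINE_RATE
xb = math.log(2.0)    # index value that doubles durations

print(aft_survivor(median_baseline, 0.0))        # 0.5 at the baseline median
print(aft_survivor(2.0 * median_baseline, xb))   # 0.5 at twice that duration
```

Under this convention exp(x'beta) is a time ratio: a positive index stretches durations and lowers the hazard, and a negative index does the opposite.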
Stata Commands for Models
Linear regression model
• reg; xtreg
Binary logit, probit model
• logit; probit; xtlogit; xtprobit
Conditional logit model
• clogit
Multinomial logit model
• mlogit
Poisson regression model
• poisson; xtpoisson
Stata Commands for Models
Negative binomial regression model
• nbreg; xtnbreg
Tobit (Type I) model
• tobit; xttobit
Tobit (Type II) model
• heckman
Proportional hazard model
• stcox x1 x2 …; streg x1 x2 …; xtstreg x1 x2 …
Accelerated failure time model
• streg x1 x2 …, time; xtstreg x1 x2 …, time