Model Specification
$\hat{Y}_i = b_0 + b_1 X_i$
Consider a data generating process:
[Figure: a data generating process that is non-linear in parameters; Y (vertical axis, 0 to 0.7) plotted against X (horizontal axis, 0 to 9).]
You are a marketing manager for OmniFoods, with oversight for nutrition bars and similar snack items.
You seek to revive the sales of OmniPower, the company’s primary product in this category. Originally
marketed as a high-energy bar to runners, mountain climbers, and other athletes, OmniPower reached its
greatest sales in an earlier time, when high-energy bars were most popular with consumers. Now you
seek to remarket the product as a nutrition bar to benefit from the booming market for such bars.
Because the market already contains several successful nutrition bars, you need to develop an effective
marketing strategy. In particular, you need to determine the effect that price and advertising will have on
sales, using a sample of stores selling OmniPower. Given the data, how can you use linear regression methods to
incorporate the effects of price and promotion on sales?
Store   OmniPower Sales (Y)   Price ($) (X1)   Advertising ($100s) (X2)
1       350                   5.50             3.3
2       460                   7.50             3.3
3       350                   8.00             3.0
4       430                   8.00             4.5
5       350                   6.80             3.0
6       380                   7.50             4.0
7       430                   4.50             3.0
8       470                   6.40             3.7
9       450                   7.00             3.5
10      490                   5.00             4.0
11      340                   7.20             3.5
12      300                   7.90             3.2
13      440                   5.90             4.0
14      450                   5.00             3.5
15      300                   7.00             2.7

Data generating process for the entire population:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Using the sample data, we run OLS regression to estimate the values of β0, β1, β2.

The estimated model is:
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Excel Multiple Regression Output
Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

ANOVA
             df   SS          MS          F         Significance F
Regression   2    29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

$\hat{Y} = 306.526 - 24.975 X_1 + 74.131 X_2$
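The coefficients and R Square in the Excel output can be reproduced from the 15 store observations. The following is a minimal sketch in standard-library Python that solves the OLS normal equations; the helper names `solve`, `ols` and `r_squared` are illustrative and not part of the original material.

```python
# Sample from the OmniPower table: sales (Y), price in $ (X1), advertising in $100s (X2).
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, *regressors):
    """Return [b0, b1, ...] from the normal equations (X'X) b = X'y."""
    rows = [[1.0, *xs] for xs in zip(*regressors)]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(y, coefs, *regressors):
    """Return 1 - SS_residual / SS_total for the fitted coefficients."""
    fitted = [coefs[0] + sum(c * x for c, x in zip(coefs[1:], xs)) for xs in zip(*regressors)]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


b0, b1, b2 = ols(sales, price, adv)
r2 = r_squared(sales, [b0, b1, b2], price, adv)
print(f"Y-hat = {b0:.3f} + ({b1:.3f}) X1 + ({b2:.3f}) X2, R^2 = {r2:.5f}")
```

Solving the normal equations directly is adequate for a small, well-conditioned sample like this one; a statistics package would normally use a more numerically stable decomposition.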
Regression Statistics
Multiple R           0.735854229
R Square             0.541481447
Adjusted R Square    0.465061688
Standard Error       0.120201618
Observations         15

ANOVA
             df   SS            MS         F          Significance F
Regression   2    0.204752184   0.102376   7.085621   0.009292691
Residual     12   0.173381149   0.014448
Total        14   0.378133333
A one percent increase in price leads to a 0.38 percent fall in average sales, holding advertising constant.
A one percent increase in advertising expenses leads to a 0.69 percent increase in average sales, holding price constant.
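The percentage interpretation above is the usual reading of coefficients as elasticities. The sketch below assumes the underlying specification is ln(Sales) = b0 + b1 ln(Price) + b2 ln(Advertising), which the source does not state explicitly, and takes the rounded elasticities -0.38 and 0.69 from the text. It compares the approximate percent change with the exact change implied by that assumed form; the function name is illustrative.

```python
# Elasticities quoted in the interpretation above (rounded values from the text).
PRICE_ELASTICITY = -0.38
ADV_ELASTICITY = 0.69


def sales_change_pct(price_pct, adv_pct):
    """Return (approximate, exact) percent change in predicted sales.

    Assumes ln(Sales) = b0 + b1*ln(Price) + b2*ln(Advertising); this functional
    form is an assumption, not something stated in the regression output.
    """
    approx = PRICE_ELASTICITY * price_pct + ADV_ELASTICITY * adv_pct
    ratio = (1 + price_pct / 100) ** PRICE_ELASTICITY * (1 + adv_pct / 100) ** ADV_ELASTICITY
    return approx, (ratio - 1) * 100


for price_pct, adv_pct in [(1, 0), (0, 1), (10, 0)]:
    approx, exact = sales_change_pct(price_pct, adv_pct)
    print(f"price {price_pct:+}%, advertising {adv_pct:+}%: approx {approx:.3f}%, exact {exact:.3f}%")
```

For a one percent change the two figures nearly coincide; for larger changes, such as ten percent, the elasticity reading is only an approximation.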
Log-Lin Model
A topic of interest to economists, governments and businesses is the
rate of growth of key economic variables, such as GDP, money
supply, employment, interest rates, etc.
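$\ln Y = b_0 + b_1 X$

$\frac{1}{Y}\frac{dY}{dX} = b_1$

$\frac{dY / Y}{dX} = b_1$

$\frac{dY}{Y} = b_1 \, dX$

For $dX = 1$: $\frac{dY}{Y} = b_1$

$\%\Delta Y = b_1 \times 100$

If t increases by 1 year, RGDP increases by 0.0315 × 100 = 3.15%.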
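The growth-rate reading of b1 can be illustrated with a small sketch. The series below is hypothetical, generated to grow at exactly the 0.0315 log rate quoted above; it is not the RGDP data behind the slide. The code fits ln Y on t and compares the approximate growth rate b1 × 100 with the exact compound rate (e^b1 − 1) × 100.

```python
import math


def slope_intercept(x, y):
    """Simple OLS of y on x; returns (b1, b0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return b1, my - b1 * mx


# Hypothetical series: starts at 1000 and grows at a constant log rate of 0.0315 per year.
years = list(range(20))
rgdp = [1000.0 * math.exp(0.0315 * t) for t in years]

# Log-lin model: regress ln(Y) on t.
b1, b0 = slope_intercept(years, [math.log(v) for v in rgdp])

approx_growth = b1 * 100                 # the slide's approximation
exact_growth = (math.exp(b1) - 1) * 100  # exact year-over-year percent change

print(f"b1 = {b1:.4f}, approximate growth = {approx_growth:.2f}%, exact growth = {exact_growth:.2f}%")
```

The approximation is close when b1 is small and drifts away from the exact rate as b1 grows.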
Log-Log and Log-Lin Combined Model
Consider a regression model that investigates the relationship
between the annual demand for cocoa and price, PCI and time.
Where:
Demand for cocoa is in millions of pounds.
The time period chosen is 1960 to 2008.
Price of cocoa is in dollars per pound.
PCI is in dollars.
(SUMMARY OUTPUT)
Observations 49

ANOVA
             df   SS      MS      F         Significance F
Regression   3    7.36    2.454   336.513   0.001
Residual     45   0.329   0.008
Total        48   7.688

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    -58.866        12.767           -4.611   0.001     -84.579     -33.153
ln(Price)    -0.223         0.034            -6.584   0.001     -0.29       -0.155
ln(Pci)      -0.572         0.354            -1.619   0.113     -1.284      0.14
Year         0.037          0.009            4.4      0.001     0.02        0.053

For every 1% increase in PCI, the demand goes down by 0.57%, all other variables remaining at the same level.
For each additional year, demand increases by 0.036*100 = 3.6%, all other variables remaining the same.
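The coefficient table can be read programmatically as a quick check on the interpretations. This sketch assumes the dependent variable is ln(Demand), which the percentage interpretations imply but the output does not label, and uses the values printed in the table (including the Year coefficient 0.037). A coefficient is treated as significant at the 5% level when its 95% interval excludes zero.

```python
import math

# (coefficient, lower 95%, upper 95%) copied from the summary output above.
estimates = {
    "ln(Price)": (-0.223, -0.29, -0.155),
    "ln(Pci)": (-0.572, -1.284, 0.14),
    "Year": (0.037, 0.02, 0.053),
}


def significant_at_5pct(lower, upper):
    """True when the 95% confidence interval does not contain zero."""
    return lower > 0 or upper < 0


significance = {name: significant_at_5pct(lo, hi) for name, (_, lo, hi) in estimates.items()}

# Log-log terms: the coefficient is an elasticity (percent change per 1% change).
# Log-lin term (Year): percent change per additional year.
year_coef = estimates["Year"][0]
year_effect_approx = year_coef * 100
year_effect_exact = (math.exp(year_coef) - 1) * 100

for name, (coef, lo, hi) in estimates.items():
    print(f"{name}: coef = {coef}, 95% CI = [{lo}, {hi}], significant = {significance[name]}")
print(f"Year effect: approx {year_effect_approx:.1f}% per year, exact {year_effect_exact:.2f}% per year")
```

In this output the interval for ln(Pci) spans zero (P-value 0.113), so the estimated 0.57% effect of PCI is not statistically distinguishable from zero at the 5% level, whereas the price and year effects are.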
Lin-Log Model
$Y_i = b_0 + b_1 \ln X_i$

$\frac{dY}{dX} = b_1 \frac{1}{X}$

$\frac{dY}{dX} X = b_1$

$b_1 = \frac{dY}{dX / X}$

$dY = b_1 \left( \frac{dX}{X} \right)$
SUMMARY OUTPUT

ANOVA
             df    SS     MS     F        Significance F
Regression   1     2.22   2.22   468.65   0.01
Residual     867   4.1    0.01
Total        868   6.32

$Y_i = b_0 + b_1 \ln X_i$

$\hat{Y}_i = 0.94 + 0.08 \ln X_i$
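In a lin-log model, dY = b1 (dX / X), so a 1% increase in X changes Y by roughly b1 / 100 units regardless of the level of X. The sketch below checks this for the fitted equation above, taking the coefficients 0.94 and 0.08 as given; the function names are illustrative.

```python
import math

B0, B1 = 0.94, 0.08  # fitted lin-log coefficients from the equation above


def predict(x):
    """Fitted value from Y = b0 + b1 * ln(X)."""
    return B0 + B1 * math.log(x)


def change_for_pct_increase(x, pct):
    """Exact change in fitted Y when X rises by pct percent from level x."""
    return predict(x * (1 + pct / 100)) - predict(x)


approx_change = B1 / 100  # rule of thumb: a 1% rise in X moves Y by b1/100 units
for level in (10, 500):
    exact = change_for_pct_increase(level, 1)
    print(f"X = {level}: exact change = {exact:.6f}, approximation = {approx_change:.6f}")
```

The exact change is the same at both levels of X because only the proportional change in X matters in this functional form.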
$\frac{dWage}{dExp} = 380.87$

$\frac{dWage}{dExp} = 955.42 - 25.58 \, Exp$
Choice of Functional Form: The Box-Cox test
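• So far we have considered models written in the linear form $Y = \beta_0 + \beta_1 X + \varepsilon$,
which implies a straight-line relationship between Y and X.
• The Box-Cox transformation generalizes this form:
$\frac{Y^{\lambda_1} - 1}{\lambda_1} = \beta_0 + \beta_1 \frac{X^{\lambda_2} - 1}{\lambda_2} + \varepsilon$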
• The maximum likelihood estimation (MLE) procedure is used to obtain the values
of λ1 and λ2. The optimal values of λ1 and λ2 are the ones for which the
probability of obtaining the sample we are using is highest.
• The MLE procedure returns a log-likelihood score for each pair of values of λ1
and λ2. The chosen values of λ1 and λ2 are the ones for which the
log-likelihood score is at its maximum.
Box-Cox test
• The Box-Cox method begins by computing the MLE score
when λ1 = λ2 = 1. Note that when λ1 = λ2 = 1, the equation
reduces to the linear model $Y = \beta_0 + \beta_1 X + \varepsilon$ (the constants are absorbed into the intercept).
• The method then tries other values for λ1 and λ2. Each time, it
runs a regression using $(Y^{\lambda_1} - 1)/\lambda_1$ and $(X^{\lambda_2} - 1)/\lambda_2$ as the LHS
and RHS variables and then computes the MLE score.
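The search described above can be sketched as a grid search over (λ1, λ2). This is a minimal illustration under stated assumptions, not the procedure of any particular package: the data are hypothetical (generated from a log-log relationship with a small deterministic disturbance), the grid is arbitrary, and the score is the standard concentrated log-likelihood up to a constant, −(n/2) ln(SSR/n) + (λ1 − 1) Σ ln Y, where the second term is the Jacobian of the transformation of Y. The limit of (v^λ − 1)/λ as λ → 0 is ln v, which the code uses for λ = 0.

```python
import math


def box_cox(v, lam):
    """Box-Cox transform: (v**lam - 1)/lam, with the limiting case ln(v) at lam = 0."""
    return math.log(v) if lam == 0 else (v ** lam - 1) / lam


def log_likelihood(y, x, lam1, lam2):
    """Concentrated log-likelihood (up to a constant) of the transformed regression."""
    ty = [box_cox(v, lam1) for v in y]
    tx = [box_cox(v, lam2) for v in x]
    n = len(y)
    mx, my = sum(tx) / n, sum(ty) / n
    sxx = sum((a - mx) ** 2 for a in tx)
    b1 = sum((a - mx) * (b - my) for a, b in zip(tx, ty)) / sxx
    b0 = my - b1 * mx
    ssr = sum((b - b0 - b1 * a) ** 2 for a, b in zip(tx, ty))
    # The Jacobian term (lam1 - 1) * sum(ln y) makes scores comparable across lam1.
    return -0.5 * n * math.log(ssr / n) + (lam1 - 1) * sum(math.log(v) for v in y)


# Hypothetical data: ln(Y) = 1 + 0.5 ln(X) plus a small deterministic disturbance.
x = [float(i) for i in range(1, 31)]
y = [math.exp(1.0 + 0.5 * math.log(xi) + 0.02 * math.sin(i)) for i, xi in enumerate(x)]

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
scores = {(l1, l2): log_likelihood(y, x, l1, l2) for l1 in grid for l2 in grid}
best = max(scores, key=scores.get)

print("linear form (1, 1):", round(scores[(1.0, 1.0)], 2))
print("log-log form (0, 0):", round(scores[(0.0, 0.0)], 2))
print("best (lambda1, lambda2) on the grid:", best)
```

On data of this kind the log-log form (λ1 = λ2 = 0) scores well above the linear form (λ1 = λ2 = 1). A finer grid or a numerical optimizer would locate the maximizing pair more precisely.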