Model Specification

Recommended Text: Gujarati, Econometrics by Example
Functional Forms of Linear Regression Model

• The Data Generating Process (DGP) is stated as:

  Yi = β0 + β1 Xi + εi     (linear in both parameters and variables)

• Assuming a linear DGP, we run an OLS regression to estimate β0 and β1 as:

  Ŷi = b0 + b1 Xi
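As an aside, the estimation step can be illustrated in a few lines of Python. The sketch below (not part of the slides) simulates a linear DGP with made-up parameter values (20 and 3) and recovers them by OLS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the DGP: Yi = beta0 + beta1*Xi + eps_i (parameter values are made up)
beta0, beta1 = 20.0, 3.0
x = rng.uniform(0, 10, size=200)
y = beta0 + beta1 * x + rng.normal(0, 2.0, size=200)

# OLS: regress y on [1, x] to obtain (b0, b1)
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"b0 = {b[0]:.3f}, b1 = {b[1]:.3f}")  # should land close to 20 and 3
```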
Consider a data generating process that is non-linear in the parameters:

  Yi = 1 / (β0 + β1 Xi)

[Figure: Sales of OmniFoods (Y, 0 to 0.7) plotted against Price of OmniFoods (X, 0 to 9).]
You are a marketing manager for OmniFoods, with oversight for nutrition bars and similar snack items.
You seek to revive the sales of OmniPower, the company's primary product in this category. Originally
marketed as a high-energy bar to runners, mountain climbers, and other athletes, OmniPower reached its
greatest sales in an earlier time, when high-energy bars were most popular with consumers. Now you
seek to remarket the product as a nutrition bar to benefit from the booming market for such bars.
Because the market already contains several successful nutrition bars, you need to develop an effective
marketing strategy. In particular, you need to determine the effect that price and advertising will have on
sales, using a sample of OmniPower store sales. Given the data, how can you use linear regression methods
to incorporate the effects of price and promotion on sales?
OmniPower sample data (15 stores):

Store   Sales (Y)   Price ($) (X1)   Advertising ($100s) (X2)
  1       350           5.50              3.3
  2       460           7.50              3.3
  3       350           8.00              3.0
  4       430           8.00              4.5
  5       350           6.80              3.0
  6       380           7.50              4.0
  7       430           4.50              3.0
  8       470           6.40              3.7
  9       450           7.00              3.5
 10       490           5.00              4.0
 11       340           7.20              3.5
 12       300           7.90              3.2
 13       440           5.90              4.0
 14       450           5.00              3.5
 15       300           7.00              2.7

Data generating process for the entire population:

  Yi = β0 + β1 X1i + β2 X2i + εi

Using the sample data, we run an OLS regression to estimate the values of β0, β1, β2. The estimated model is:

  Ŷi = b0 + b1 X1i + b2 X2i
Excel Multiple Regression Output

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

  Ŷ = 306.526 - 24.975(X1) + 74.131(X2)

ANOVA         df    SS          MS          F         Significance F
Regression     2    29460.027   14730.013   6.53861   0.01201
Residual      12    27033.306   2252.776
Total         14    56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
The Multiple Regression Equation

  Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where
  Sales is in number of bars sold
  Price is in $
  Advertising is in $100s.

b1 = -24.975: sales will decrease, on average, by 24.975 bars for each $1 increase in selling price.
b2 = 74.131: sales will increase, on average, by 74.131 bars for each $100 increase in advertising.
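The same estimates can be reproduced outside Excel. Below is a minimal Python/NumPy sketch (not part of the slides) that runs OLS on the 15-store sample above; the coefficients should come out close to the Excel values of 306.526, -24.975 and 74.131.

```python
import numpy as np

# OmniPower sample: sales (bars), price ($), advertising ($100s)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
                  7.20, 7.90, 5.90, 5.00, 7.00])
adv   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
                  3.5, 3.2, 4.0, 3.5, 2.7])

# OLS: Sales = b0 + b1*Price + b2*Advertising
X = np.column_stack([np.ones_like(price), price, adv])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(f"b0 = {b[0]:.3f}, b1 = {b[1]:.3f}, b2 = {b[2]:.3f}")
# Expected (from the Excel output): roughly 306.526, -24.975, 74.131
```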
Using the Equation to Make Predictions

Predict sales when the selling price is $5.50 and advertising is $350:

  Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
        = 306.526 - 24.975(5.50) + 74.131(3.5)
        = 428.62

Note that Advertising is in $100s, so $350 means that X2 = 3.5. Predicted sales is 428.62 bars.
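As a quick check (not from the slides), the same prediction in Python, using the coefficients reported above; note that advertising enters in units of $100.

```python
# Coefficients from the fitted OmniPower equation (Excel output above)
b0, b1, b2 = 306.526, -24.975, 74.131

price = 5.50  # selling price in dollars
adv = 3.5     # $350 of advertising, expressed in hundreds of dollars

sales_hat = b0 + b1 * price + b2 * adv
print(round(sales_hat, 2))  # about 428.62 bars
```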
Log-Log Model

OmniPower sample data (natural logs):

Store   LnSales (Y)   LnPrice (X1)   LnAd (X2)
  1       5.86           1.70           1.19
  2       6.13           2.01           1.19
  3       5.86           2.08           1.10
  4       6.06           2.08           1.50
  5       5.86           1.92           1.10
  6       5.94           2.01           1.39
  7       6.06           1.50           1.10
  8       6.15           1.86           1.31
  9       6.11           1.95           1.25
 10       6.19           1.61           1.39
 11       5.83           1.97           1.25
 12       5.70           2.07           1.16
 13       6.09           1.77           1.39
 14       6.11           1.61           1.25
 15       5.70           1.95           0.99

Data generating process for the entire population:

  ln Y = β0 + β1 ln X1 + β2 ln X2 + ε

Using the sample data, we run an OLS regression to estimate the values of β0, β1, β2. The estimated model is:

  ln Y = b0 + b1 ln X1 + b2 ln X2
Log-Log Model:  ln Y = b0 + b1 ln X1 + b2 ln X2

Regression Statistics
Multiple R           0.735854229
R Square             0.541481447
Adjusted R Square    0.465061688
Standard Error       0.120201618
Observations         15

ANOVA         df    SS            MS         F          Significance F
Regression     2    0.204752184   0.102376   7.085621   0.009292691
Residual      12    0.173381149   0.014448
Total         14    0.378133333

            Coefficients    Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept   5.835251861     0.426707752      13.67505   1.11E-08   4.905535537    6.764968185
LnPrice     -0.383040019    0.169367138      -2.2616    0.043087   -0.752059312   -0.014020726
LnAd        0.694009494     0.230099559      3.016127   0.01074    0.192665622    1.195353366

  ln Y = 5.835 - 0.383 ln X1 + 0.694 ln X2
Log-Log Model: Interpretation of the coefficients

  ln Y = b0 + b1 ln X1 + b2 ln X2

b1: a one percent change in X1 leads to a b1 percent change in Y.
b2: a one percent change in X2 leads to a b2 percent change in Y.

  ln Y = 5.835 - 0.383 ln X1 + 0.694 ln X2

A one percent increase in price leads to a 0.38 percent fall in average sales.
A one percent increase in advertising expenses leads to a 0.69 percent increase in average sales.
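A short numerical check of this elasticity interpretation (a sketch, not from the slides; the starting price and advertising values are made up): in the fitted log-log model, a 1% price increase changes predicted sales by roughly b1 percent.

```python
import numpy as np

# Fitted log-log model: ln(Sales) = 5.835 - 0.383*ln(Price) + 0.694*ln(Adv)
b0, b1, b2 = 5.835, -0.383, 0.694

price, adv = 6.00, 3.5  # illustrative starting values (made up)

ln_sales_0 = b0 + b1 * np.log(price) + b2 * np.log(adv)
ln_sales_1 = b0 + b1 * np.log(price * 1.01) + b2 * np.log(adv)  # price up 1%

pct_change = (np.exp(ln_sales_1 - ln_sales_0) - 1) * 100
print(f"{pct_change:.2f}%")  # about -0.38%, i.e. roughly b1 percent
```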
Log-Lin Model

A topic of interest to economists, governments, and businesses is the rate of growth of key economic variables, such as GDP, money supply, employment, and interest rates.

  ln RGDPt = β1 + β2 t + εt

DATA

  ln RGDPt = 7.875 + 0.0315 t


Log-Lin Model

  ln RGDPt = 7.875 + 0.0315 t

β2 is interpreted as the relative change in RGDP due to a one-year increase in t. Multiplied by 100, this becomes a percentage change.

  ln Y = b0 + b1 X
  (1/Y)(dY/dX) = b1
  (dY/Y)/dX = b1
  dY/Y = b1 dX
  If dX = 1, then dY/Y = b1
  %ΔY = b1 × 100

If t increases by 1 year, RGDP increases by 0.0315 × 100 = 3.15%.
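A small sketch (not from the slides) comparing the slide's approximation, b1 × 100, with the exact growth rate implied by a log-lin model, 100 × (e^b1 − 1):

```python
import math

b1 = 0.0315  # slope from ln(RGDPt) = 7.875 + 0.0315 t

approx_growth = b1 * 100                 # the slide's approximation
exact_growth = (math.exp(b1) - 1) * 100  # exact annual growth implied by the model

print(f"approx: {approx_growth:.2f}%  exact: {exact_growth:.2f}%")
# approx: 3.15%  exact: 3.20%
```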
Log-Log and Log-Lin Combined Model

Consider the regression model, where we are trying to investigate the relationship between the annual demand for cocoa and price, PCI, and time:

  ln(Demand) = b0 + b1 ln(Price) + b2 ln(PCI) + b3 Year

where
  Demand for cocoa is in millions of pounds.
  The time period chosen is 1960 to 2008 (DATA).
  Price of cocoa is in dollars per pound.
  PCI is in dollars.
SUMMARY OUTPUT

Dependent Variable: Log(Demand)

Regression Statistics
Multiple R           0.979
R Square             0.958
Adjusted R Square    0.955
Standard Error       0.086
Observations         49

ANOVA         df    SS      MS      F         Significance F
Regression     3    7.36    2.454   336.513   0.001
Residual      45    0.329   0.008
Total         48    7.688

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   -58.866        12.767           -4.611   0.001     -84.579     -33.153
ln(Price)   -0.223         0.034            -6.584   0.001     -0.29       -0.155
ln(Pci)     -0.572         0.354            -1.619   0.113     -1.284      0.14
Year        0.037          0.009            4.4      0.001     0.02        0.053

For every 1% increase in price, the demand goes down by 0.22%, all other variables remaining at the same level.
For every 1% increase in PCI, the demand goes down by 0.57%, all other variables remaining at the same level.
For each additional year, demand increases by 0.036 × 100 = 3.6%, all other variables remaining the same.
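The percentage interpretations can be checked directly from the reported coefficients. The sketch below (not from the slides) uses the rounded coefficients from the table; note that with the exact formula the Year effect comes out near 3.8% rather than the slide's approximate 3.6%.

```python
import math

# Rounded coefficients from the summary output above
b_price, b_pci, b_year = -0.223, -0.572, 0.037

# Logged regressors: a 1% change scales demand by 1.01**b
print(f"+1% price: {(1.01 ** b_price - 1) * 100:+.2f}% demand")   # about -0.22%
print(f"+1% PCI:   {(1.01 ** b_pci - 1) * 100:+.2f}% demand")     # about -0.57%

# Year enters in levels (the log-lin part): one more year scales demand by exp(b_year)
print(f"+1 year:   {(math.exp(b_year) - 1) * 100:+.2f}% demand")  # about +3.8%
```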
Lin-Log Model

  Yi = b0 + b1 ln Xi
  dY/dX = b1 (1/X)
  (dY/dX) X = b1
  b1 = dY / (dX/X)
  dY = b1 (dX/X)

Therefore, for a 1% change in X (dX/X = .01), Y would change by (.01)(b1) = b1/100 units.

Engel's expenditure function states that total expenditure on food tends to increase in arithmetic progression as total expenditure increases in geometric progression.

The share of expenditure on food decreases as total expenditure increases.

DATA
Lin-Log Model
SUMMARY OUTPUT

  Yi = b0 + b1 ln Xi
  Y: share of food expenditure out of total expenditure
  X: total expenditure

Regression Statistics
Multiple R           0.593
R Square             0.351
Adjusted R Square    0.351
Standard Error       0.069
Observations         869

ANOVA          df    SS     MS     F        Significance F
Regression      1    2.22   2.22   468.65   0.01
Residual      867    4.1    0.01
Total         868    6.32

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     0.94           0.04             25.59    0.01      0.86        1.01
Log(Expend)   -0.08          0.01             -21.65   0.01      -0.09       -0.08
Lin-Log Model

  Yi = b0 + b1 ln Xi
  Yi = 0.94 - 0.08 ln Xi

Therefore, for a 1% increase in X (dX/X = .01), Y changes by (.01)(-0.08) = -0.0008 units; that is, the share of food expenditure falls by 0.0008.
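For small changes, the slide's approximation b1·(dX/X) applies; for larger changes the exact shift in the food share is b1·ln(X′/X). A short sketch (not from the slides):

```python
import math

b1 = -0.08  # coefficient on ln(total expenditure)

# Approximate change in the food share for a 1% rise in total expenditure
print(round(b1 * 0.01, 4))         # -0.0008, as on the slide

# Exact change in the food share if total expenditure doubles: b1 * ln(2)
print(round(b1 * math.log(2), 4))  # about -0.0555
```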


MODEL COEFFICIENT INTERPRETATION

Log-Log Model:  b = (relative change in Y) / (relative change in X)

Log-Lin Model:  b = (relative change in Y) / (absolute change in X)
                Multiply b by 100 to get the percentage change in Y.

Lin-Log Model:  b = (absolute change in Y) / (relative change in X)
                Divide b by 100 to get the absolute change in Y.

Polynomial Regression Model

  (Wage)i = β0 + β1 Genderi + β2 Racei + β3 Educationi + β4 Agei + εi

  (Wage)i = β0 + β1 Genderi + β2 Racei + β3 Educationi + β4 Expi + εi

  (Wage)i = β0 + β1 Genderi + β2 Racei + β3 Educationi + β4 Expi + β5 Expi² + εi

Polynomial Regression Model



(Wage)i  b 0  b1Genderi  b 2 Racei  b3 Educationi  b4 Exp i
(Wage)i  10440.61 - 11691.52Genderi - 8029.58Racei  3127.25Educi
 380.87Expi

(Wage)i  b 0  b1Genderi  b 2 Racei  b3 Educationi  b4 Exp i


 b5 Expi 2

(Wage)i  10084.64 - 12339.44.52Genderi - 8186.35Racei  2804.33Educi


 955.42Expi  12.79Expi 2

dWage
 380.87
dExp
dWage
 955.42  25.58Exp
dExp
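A short sketch (not from the slides) of the marginal effect of experience implied by the quadratic specification above, using the experience coefficients 955.42 and -12.79 as reconstructed on this slide, including the experience level at which the effect changes sign.

```python
# Quadratic wage-experience terms from the fitted model (as reconstructed above)
b_exp, b_exp2 = 955.42, -12.79

def marginal_effect(exp_years: float) -> float:
    """dWage/dExp = b_exp + 2*b_exp2*Exp for the quadratic specification."""
    return b_exp + 2 * b_exp2 * exp_years

for e in (0, 10, 20, 30, 40):
    print(f"Exp = {e:2d} years -> dWage/dExp = {marginal_effect(e):8.2f}")

# Experience level at which the marginal effect of experience reaches zero
print(f"turning point: {-b_exp / (2 * b_exp2):.1f} years")
```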
Choice of Functional Form: The Box-Cox Test

• So far we have considered models written in linear form, Y = β0 + β1 X + ε, which implies a straight-line relationship between Y and X.

• Sometimes economic theory and/or inspection of the data will suggest that the relationship between the variables is not linear.

• To detect the type of non-linearity, we use the Box-Cox test.
Box-Cox Test

• The algorithm takes the dependent variable Y and the independent variable(s) X and transforms them as follows:

  Y → (Y^λ1 − 1) / λ1
  X → (X^λ2 − 1) / λ2

• The regression equation becomes:

  (Y^λ1 − 1) / λ1 = β0 + β1 (X^λ2 − 1) / λ2 + ε

• Maximum Likelihood Estimation (MLE) is used to obtain the values of λ1 and λ2. The optimal values of λ1 and λ2 are the ones for which the probability of obtaining the sample we are using is maximum.

• The MLE procedure returns a log-likelihood score for each pair of values of λ1 and λ2. The chosen values of λ1 and λ2 are the ones for which the log-likelihood score is maximum.
Box-Cox Test

• The Box-Cox method begins by computing the MLE score when λ1 = λ2 = 1. Note that when λ1 = λ2 = 1, the equation becomes:

  Y = β0 + β1 X + ε

• The method then tries other values of λ1 and λ2. Each time, it runs a regression using (Y^λ1 − 1)/λ1 and (X^λ2 − 1)/λ2 as the LHS and RHS variables and then computes the MLE score.

• The method reports the values of λ1 and λ2 that maximize the MLE score.
Box-Cox Test

λ1 and λ2 can take any value. For instance, below are some values of λ1 and λ2 and the corresponding functional forms:

Values of λ1 and λ2    Functional form
 2                     Quadratic Y and X
 1                     Linear Y and X
 0                     Log-log
-1                     Reciprocal (1/Y and 1/X)
-2                     1/Y² and 1/X²
Box-Cox Test: Example
Refer to the Excel file: BoxCox_Analyses_Final.xlsx
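The workbook itself is not reproduced here. As a rough illustration of the search the slides describe, the NumPy sketch below grid-searches λ1 and λ2 on made-up data, scoring each pair with the concentrated log-likelihood (the RSS term plus the Jacobian term (λ1 − 1)·Σ ln Y). It is a sketch under those assumptions, not the procedure implemented in the Excel file.

```python
import numpy as np

def bc(z, lam):
    """Box-Cox transform: (z**lam - 1)/lam, or ln(z) when lam == 0."""
    return np.log(z) if lam == 0 else (z ** lam - 1) / lam

def loglik(y, x, lam1, lam2):
    """Concentrated log-likelihood of regressing bc(y, lam1) on bc(x, lam2)."""
    yt, xt = bc(y, lam1), bc(x, lam2)
    X = np.column_stack([np.ones_like(xt), xt])
    b, *_ = np.linalg.lstsq(X, yt, rcond=None)
    rss = np.sum((yt - X @ b) ** 2)
    n = len(y)
    # -n/2 * ln(RSS/n) plus the Jacobian of the Y transformation
    return -n / 2 * np.log(rss / n) + (lam1 - 1) * np.sum(np.log(y))

# Made-up positive data for illustration: Y roughly reciprocal in X
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=100)
y = 1 / (0.5 + 0.8 * x) * np.exp(rng.normal(0, 0.05, size=100))

grid = [-2, -1, -0.5, 0, 0.5, 1, 2]
best = max(((l1, l2) for l1 in grid for l2 in grid),
           key=lambda p: loglik(y, x, p[0], p[1]))
print("best (lambda1, lambda2):", best)
# For this made-up DGP, values near lambda1 = -1, lambda2 = 1 should score well.
```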
