Chapter 3 Econometrics

Multiple linear regression analysis allows researchers to predict the value of a dependent variable based on the values of two or more independent variables. It extends simple linear regression to incorporate multiple explanatory variables. The key aspects covered include defining multiple regression, distinguishing it from simple regression, describing the multiple regression model and assumptions required for its use, and interpreting the outputs of a multiple regression analysis such as regression coefficients, R-squared, and significance tests of variables.

Uploaded by

Ashenafi Zeleke

Chapter Three

Multiple Linear Regression Analysis

Multiple Regression
 A statistical model that uses two or more explanatory variables (X1, ..., Xp), which may be quantitative or qualitative, to predict a quantitative dependent variable Y.
Caution: the model must include at least two explanatory variables.
 Multiple regression simultaneously considers the influence of multiple explanatory variables on a response variable Y.
Simple vs. Multiple
Simple regression:
• β represents the unit change in Y per unit change in X.
• Does not take into account any other variable besides the single independent variable.
• R²: proportion of variation in Y predictable from X.

Multiple regression:
• βi represents the unit change in Y per unit change in Xi.
• Takes into account the effect of other independent variables.
• R²: proportion of variation in Y predictable from the set of X's.
Multiple Regression Models
• Linear models: linear, dummy variable, interaction
• Non-linear models: polynomial, square root, log, reciprocal, exponential
Building the Multiple Linear Regression Model

Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

Multiple regression model with k independent variables:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + Ui

where β0 is the Y-intercept, β1, ..., βk are the population slopes, and Ui is the random error.
• The coefficients of the multiple regression model are estimated using sample data with k independent variables:

Ŷi = b0 + b1X1i + b2X2i + ... + bkXki

where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, ..., bk are the estimated slope coefficients.

• Interpretation of the slopes:
– bi = the change in the mean of Y per unit change in Xi, taking into account the effect of the rest of the Xj's (i.e., the effect of Xi net of the other Xj's).
– b0 = the Y-intercept, interpreted as in simple regression.
ASSUMPTIONS
• Linear regression model: The regression model is linear in the
parameters, though it may or may not be linear in variables.
• The X variables are independent of the error term. This means that we require zero covariance between ui and each X variable:
cov(ui, X1i) = cov(ui, X2i) = ... = cov(ui, Xki) = 0
• Zero mean value of the disturbance term ui. Given the value
of Xi, the mean, or the expected value of the random disturbance
term ui is zero.
E(ui)= 0 for each i
• Homoscedasticity or constant variance of ui . This implies that
the variance of the error term is the same, regardless of the value
of X.
var(ui) = σ²
• No autocorrelation between the disturbance terms:
cov(ui, uj) = 0 for i ≠ j
 This implies that the observations are sampled independently.
• The number of observations n must be greater than the number of parameters to be estimated.
• There must be variation in the values of the X variables. Technically, var(X) must be a positive number.
• No strong/perfect multicollinearity: no exact linear relationship exists between any of the explanatory variables.
Estimation of parameters and standard errors

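The estimation formulas on these slides appear as images. As an illustration (not the slides' own derivation), the OLS coefficients and their standard errors can be computed from the matrix formulas b = (X′X)⁻¹X′y and Var(b) = s²(X′X)⁻¹; a minimal Python sketch:

```python
# Sketch of OLS estimation in matrix form: b = (X'X)^(-1) X'y,
# Var(b) = s^2 (X'X)^(-1), with s^2 the unbiased error variance.
import numpy as np

def ols(X, y):
    """Return coefficient estimates and their standard errors.
    X: n x k matrix of regressors; an intercept column is prepended here."""
    n = len(y)
    X = np.column_stack([np.ones(n), X])   # add intercept column
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                  # normal equations
    resid = y - X @ b
    k = X.shape[1]
    s2 = resid @ resid / (n - k)           # unbiased error variance
    se = np.sqrt(np.diag(s2 * XtX_inv))    # standard errors of b
    return b, se

# Tiny example: y is generated exactly as 2 + 3*x1 - 1*x2 (no noise),
# so OLS should recover the coefficients.
x1 = np.array([1.0, 2, 3, 4, 5, 6])
x2 = np.array([2.0, 1, 4, 3, 6, 5])
y = 2 + 3 * x1 - 1 * x2
b, se = ols(np.column_stack([x1, x2]), y)
print(b)   # approximately 2, 3, -1
```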
The coefficient of determination and test of model adequacy

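The slides for this section are images; as a sketch, R², adjusted R², and the F statistic for overall model adequacy can be computed from the sums of squares. The figures below are taken from the STATA output later in the chapter:

```python
# Sketch: R-squared, adjusted R-squared and the overall F statistic
# from sums of squares (figures from the STATA demand example).
n, k = 16, 4                      # observations, slope parameters
ss_model = 16478.6652
ss_resid = 990.272334
ss_total = ss_model + ss_resid    # total sum of squares

r2 = 1 - ss_resid / ss_total
adj_r2 = 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))
f_stat = (ss_model / k) / (ss_resid / (n - k - 1))

print(round(r2, 4), round(adj_r2, 4), round(f_stat, 2))
# 0.9433 0.9227 45.76
```

These reproduce the R-squared, Adj R-squared, and F(4, 11) values reported in the output.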
Test of the Significance of Individual Variables
• Use t-tests of the individual variable slopes to test whether there is a linear relationship between the variable Xi and Y.
Hypotheses:
• H0: βi = 0 (no linear relationship)
• H1: βi ≠ 0 (a linear relationship does exist between Xi and Y)
• Test statistic:
t* = (bi − 0) / S(bi)
• Confidence interval for the population slope βi:
bi ± tc · S(bi)
• Then, as before: if |t*| > tc, reject the null hypothesis; if |t*| ≤ tc, do not reject it.
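The test above can be sketched in a few lines of Python. The critical value t_c is assumed to come from a t table (df = n − k − 1), and the example coefficient and standard error are illustrative values close to the price coefficient in the demand example later in the chapter:

```python
# Sketch of the slope t-test and confidence interval.
def slope_test(b_i, se_i, t_c):
    """Return the t statistic, the confidence interval, and the decision."""
    t_star = (b_i - 0) / se_i                    # t* = (b_i - 0) / S(b_i)
    ci = (b_i - t_c * se_i, b_i + t_c * se_i)    # b_i +/- t_c * S(b_i)
    reject_h0 = abs(t_star) > t_c                # reject H0: beta_i = 0?
    return t_star, ci, reject_h0

# Illustrative values (b = -0.162, se = 0.0258, t_c = 2.201 for 11 df):
t_star, ci, reject = slope_test(-0.162, 0.0258, 2.201)
print(round(t_star, 2), reject)   # -6.28 True
```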
Assumptions and Procedures to Conduct Multiple
Linear Regression
 When you choose to analyse your data using multiple regression, first make sure that your data can actually be analysed with this method.
 It is only appropriate to use multiple regression if your data "passes" eight assumptions that are required for multiple regression to give you a valid result.
 Let's take a look at these eight assumptions:
Assumption #1:
 Your dependent variable should be measured on a continuous scale.
Assumption #2:
 You should have two or more independent variables, which can be either continuous or categorical (dummy).
Assumption #3:
 You should have independence of residuals, which you can easily check using the Durbin-Watson statistic.
Assumption #4:
 There needs to be a linear relationship between the dependent variable and each of your independent variables.
Assumption #5:
 Your data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line.
Assumption #6:
 Your data must not show multicollinearity, which occurs when you have two or more independent variables that are highly correlated with each other.
Assumption #7:
 There should be no significant outliers. Outliers can distort the estimated coefficients and reduce the predictive accuracy of your results as well as their statistical significance.
Assumption #8:
 Finally, you need to check that the residuals (errors) are normally distributed.
You can check assumptions #3, #4, #5, #6, #7 and #8 using
STATA/SPSS.
Assumptions #1 and #2 should be checked first, before
moving onto assumptions #3, #4, #5, #6, #7 and #8.
 Just remember that if you do not run the statistical tests on
these assumptions correctly, the results you get when
running multiple regression might not be valid.

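As an illustration of assumption #3, the Durbin-Watson statistic that STATA/SPSS report can also be computed directly from the residuals. A minimal sketch (values near 2 indicate no first-order autocorrelation; values near 0 or 4 indicate positive or negative autocorrelation):

```python
# Sketch: the Durbin-Watson statistic computed by hand.
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Alternating residuals are negatively autocorrelated, pushing DW toward 4.
dw = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
print(dw)   # about 3.33
```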
Given the assumptions and data on Y and a set of IVs (X1, ..., Xk), the following are suggested steps to conduct multiple linear regression:
1. Select variables that you believe are linearly related to the dependent variable.
2. Use software to generate the coefficients and the statistics used to assess the model.
3. Diagnose violations of the required conditions/assumptions.
 If there are problems, attempt to remedy them.
4. Assess the model's fit.
5. Test and interpret the coefficients.
6. Use the model to predict a value of the DV.
Regression Output Interpretation
Example
 In a study of consumer demand (Qd), multiple regression
analysis is done to examine the relationship between quantity
demanded and four potential predictors.
The four independent variables are: price, income, tax, and price of related goods.
The output for this example is interpreted as follows:
The multiple correlation coefficient is 0.971.
 R is the correlation between the observed values of Y and the
values of Y predicted by the model.

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  4,    11) =   45.76
       Model |  16478.6652     4  4119.66629           Prob > F      =  0.0000
    Residual |  990.272334    11  90.0247576           R-squared     =  0.9433
-------------+------------------------------           Adj R-squared =  0.9227
       Total |  17468.9375    15  1164.59583           Root MSE      =  9.4881
------------------------------------------------------------------------------
          Qd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.1619978   .0258238    -6.27   0.000    -.2188356   -.1051599
           I |    .000022   .0008965     0.02   0.981    -.0019511     .001995
          pr |  -.4774324   .1851878    -2.58   0.026    -.8850281   -.0698367
         tax |   6.270663   2.953327     2.12   0.057     -.229565    12.77089
       _cons |   100.8259   14.82823     6.80   0.000     68.18922    133.4627
------------------------------------------------------------------------------
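The table's own figures can be used to verify the arithmetic: each t statistic is the coefficient divided by its standard error, and each 95% confidence interval is the coefficient ± t_c times the standard error, where t_c ≈ 2.201 is the critical t value for 11 residual degrees of freedom. A quick check in Python using the price row:

```python
# Sketch: checking the arithmetic of the STATA table for the price row.
coef_p, se_p = -0.1619978, 0.0258238   # price coefficient and std. err.
t_c = 2.201                            # critical t value, 11 df, 95%

t_stat = coef_p / se_p                 # t = coef / se
ci_low = coef_p - t_c * se_p           # lower 95% bound
ci_high = coef_p + t_c * se_p          # upper 95% bound

print(round(t_stat, 2))                        # -6.27
print(round(ci_low, 4), round(ci_high, 4))     # -0.2188 -0.1052
```

These match the t, and the [95% Conf. Interval] bounds, reported for p in the table.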
 Therefore, large values of R represent a large correlation between the predicted and observed values of the outcome.
 An R of 1 implies the model perfectly predicts the observed values.
 0.971 implies a nearly perfect prediction of the actual Y.
 The R² is 0.943.
 This means that the IVs explain 94.3% of the variation in the DV.
The adjusted R-square, a measure of explanatory power, is 0.922.
 This statistic is not generally interpreted on its own because it is neither a percentage (like the R²) nor a test of significance (like the F-statistic).
The p value for the F statistic is < 0.01.
 This means that at least one of the IVs is a significant predictor of the dependent variable (quantity demanded).
 This indicates rejection of the null hypothesis that all slope coefficients are jointly zero.

Interpreting Parameter Values (Model Coefficients)

The results of the estimated regression line include the estimated coefficients, the standard error of the coefficients, the calculated t-statistic, the corresponding p-value, and the bounds of the 95% confidence intervals (STATA Version 12 output).
Finally, the above table helps us determine whether quantity demanded and the explanatory variables are significantly related, and the direction and strength of their relationship.
 The prediction equation is written as:

Qd = 100.82 − 0.162p + 0.22×10^-4 y − 0.477pr + 6.27tax

Results of the multiple linear regression model show that, out of the 4 explanatory variables entered into the model, 2 of them, namely the price of the product and the price of related commodities, were found to be statistically significant at 5%, while tax is significant only at 10%.
 Results for the statistically significant variables are discussed as follows:
 The constant is the predicted value of quantity demanded when all of the independent variables have a value of zero.
 The b coefficient associated with price (−0.162) is negative,
indicating an inverse relationship in which higher price of the
product is associated with lower quantity demanded.
 For the independent variable price, the probability of the t
statistic (0.000) for the b coefficient is less than the level of
significance of 0.05.
 We reject the null hypothesis that the slope associated with
price is equal to zero and conclude that there is a statistically
significant relationship between price and quantity demanded.
 A unit increase/decrease in the price of the product leads to a
0.162 decrease/increase in quantity demanded, ceteris paribus.

 The income variable is found to be positively but insignificantly related to quantity demanded (even at the 10% level of significance), so we cannot conclude that income is related to the quantity demanded of this good.
 The tax coefficient is statistically significant (at the 10% probability level) and carries a positive sign.
 The slope of tax is 6.27. This means that for every one-unit increase/decrease in the tax on a commodity, quantity demanded will increase/decrease by 6.27 units, ceteris paribus. Of course, a positive effect of tax on quantity demanded is not an economically valid conclusion.
Dummy Independent Variables
Describing Qualitative Information
• In regression analysis the dependent variable can be
influenced by variables that are essentially qualitative in
nature,
 such as sex, race, color, religion, nationality, geographical
region, political upheavals, and party affiliation.
• One way we could “quantify” such attributes is by
constructing artificial variables that take on values of 1 or 0,
 1 indicating the presence (or possession) of that attribute and 0
indicating the absence of that attribute.
• Variables that assume such 0 and 1 values are called dummy/
indicator/ binary/ categorical/ dichotomous variables.
Example 1:
Yi = α1 + α2Di + ui
where Yi = annual salary of a college professor
Di = 1 if male college professor
   = 0 otherwise (i.e., female professor)
 The model may enable us to find out whether sex makes any difference in a college professor's salary, assuming, of course, that all other variables such as age, degree attained, and years of experience are held constant.
 Mean salary of female college professor: E(Yi | Di = 0) = α1
 Mean salary of male college professor: E(Yi | Di = 1) = α1 + α2
 α2 tells by how much the mean salary of a male college professor differs from the mean salary of his female counterpart.
 A test of the null hypothesis that there is no sex discrimination (H0: α2 = 0) can easily be made by finding out whether the estimated α2 is statistically significant on the basis of the t test.
Example 2:
Yi = α1 + α2Di + βXi + ui
where Xi = years of teaching experience
Mean salary of female college professor: E(Yi | Xi, Di = 0) = α1 + βXi
Mean salary of male college professor: E(Yi | Xi, Di = 1) = (α1 + α2) + βXi
 The male and female college professors' salary functions in relation to years of teaching experience have the same slope (β) but different intercepts:
 Male intercept = α1 + α2
 Female intercept = α1
 Difference = α2
Note: If a qualitative variable has 'm' categories, introduce only 'm − 1' dummy variables.
 The group, category, or classification that is assigned the value of 0 is often referred to as the base, benchmark, control, comparison, reference, or omitted category.
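The m − 1 rule can be sketched in code; make_dummies below is a hypothetical helper written for illustration, not part of any statistics package:

```python
# Sketch of the m-1 rule: a qualitative variable with m categories enters
# the model as m-1 dummies, with the omitted category as the base.
def make_dummies(values, base):
    """Return {category: 0/1 list} for every category except the base."""
    cats = sorted(set(values))
    assert base in cats, "base must be one of the observed categories"
    return {c: [1 if v == c else 0 for v in values]
            for c in cats if c != base}

# Three education categories -> only two dummies are created.
educ = ["college", "high school", "less than high school", "college"]
dummies = make_dummies(educ, base="less than high school")
print(sorted(dummies))        # ['college', 'high school']
print(dummies["college"])     # [1, 0, 0, 1]
```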
Example 3: qualitative variable with more than two classes
 Regress the annual expenditure on health care by an individual on the income and education of the individual:
Yi = α1 + α2D2i + α3D3i + βXi + ui
where Yi = annual expenditure on health care
Xi = annual income
D2 = 1 if high school education
   = 0 otherwise
D3 = 1 if college education
   = 0 otherwise
 We treat the "less than high school education" category as the base category.
 Therefore, the intercept α1 will reflect the intercept for this category.
• The mean health care expenditure functions for the three levels of education, namely less than high school, high school, and college, are:
E(Yi | D2 = 0, D3 = 0, Xi) = α1 + βXi
E(Yi | D2 = 1, D3 = 0, Xi) = (α1 + α2) + βXi
E(Yi | D2 = 0, D3 = 1, Xi) = (α1 + α3) + βXi
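A small sketch of these three mean functions, using hypothetical coefficient values chosen purely for illustration (α1 = 100, α2 = 30, α3 = 80, β = 0.05):

```python
# Sketch: predicted mean health-care expenditure for the three education
# groups. The coefficient values below are hypothetical, for illustration.
a1, a2, a3, b = 100.0, 30.0, 80.0, 0.05

def expected_y(d2, d3, x):
    """E(Y | D2=d2, D3=d3, X=x) = a1 + a2*D2 + a3*D3 + b*X."""
    return a1 + a2 * d2 + a3 * d3 + b * x

x = 2000.0  # annual income
print(expected_y(0, 0, x))   # base: less than high school -> 200.0
print(expected_y(1, 0, x))   # high school -> 230.0 (a2 shifts intercept)
print(expected_y(0, 1, x))   # college -> 280.0 (a3 shifts intercept)
```

The three groups share the slope on income and differ only in intercept, exactly as in the equations above.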
Log-Level:
log(Y) = β0 + β1X + u
 Here 100·β1 is (approximately) the percentage change in Y for a one-unit increase in X.
 With a coefficient of 0.083 on education, wage increases by 8.3 percent for every additional year of education.
Log-Log:
log(Y) = β0 + β1 log(X) + u
 The coefficient of log(sales) is the estimated elasticity of salary with respect to sales.
• It implies that a 1 percent increase in firm sales increases salary by about 0.257 percent (the usual interpretation of an elasticity).

Level-Log:
 This form arises less often in practice.
Y = β0 + β1 log(x) + u
 Here β1/100 is (approximately) the change in Y for a 1 percent increase in x.
 Example: Ŷ = 110 + 12 log(x); for a 1 percent increase in x, the change in Ŷ ≈ 12/100 = 0.12 units.
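These interpretations can be checked numerically; the coefficients used below (0.083, 0.257, and 12) are the ones quoted in the text:

```python
# Sketch: numerical check of the three functional-form interpretations.
import math

# Log-level: coefficient 0.083 on education -> ~8.3% higher wage per year.
approx_pct = 100 * 0.083                  # approximation: 100 * beta1
exact_pct = 100 * (math.exp(0.083) - 1)   # exact percentage change (~8.65)

# Log-log: elasticity 0.257 -> a 1% rise in sales raises salary ~0.257%.
salary_pct_change = 0.257 * 1.0

# Level-log: Y-hat = 110 + 12*log(x); a 1% rise in x changes Y-hat by 12/100.
delta_y = 12 / 100                        # 0.12 units

print(approx_pct, round(exact_pct, 2), salary_pct_change, delta_y)
```

Note that the 100·β1 rule is an approximation; the exact percentage change, 100·(e^0.083 − 1) ≈ 8.65%, drifts away from it as the coefficient grows.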
