
Econometrics for Business I

BSE3703
Topic 3
Multivariate Regression Model
Learning Outcomes

At the end of the lesson, students must be able to:


1. understand the specifications of multivariate linear regression.
2. interpret the regression output table of multivariate regression model.
3. explain the normality assumption of the regression model.
4. understand the sampling distribution of OLS estimators in the regression model.
Multivariate Regression Model
We hope for a linear relationship between the variables.

Population Regression Model (population size N; i = 1, 2, …, N; j = 1, 2, …, k):

    yi = β0 + β1 x1i + β2 x2i + ⋯ + βk xki + ui = β0 + Σⱼ₌₁ᵏ βj xji + ui

The model should include all possible determinants of y.

Sample Regression Function (sample size n; i = 1, 2, …, n; j = 1, 2, …, k):

    yi = β̂0 + β̂1 x1i + β̂2 x2i + ⋯ + β̂k xki + ûi = β̂0 + Σⱼ₌₁ᵏ β̂j xji + ûi

Example data (k = 4 explanatory variables):

    i      y     x1  x2  x3    x4
    1      6756  29  17  2.83  21
    2      7500  38  23  3.75  22
    3      7440  32  19  3.15  14
    4      7740  31  23  4.12  19
    5      7836  38  21  3.57  16
    …      7416  28  17  2.83  16
    …      7596  37  20  3.37  21
    …      7860  42  23  4.67  17
    …      7716  30  22  3.68  19
    …      7476  34  20  3.32  23
    n−1    7536  35  21  3.42  18
    n      7356  30  19  3.13  22
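The sample regression function above can be sketched numerically. The following is a minimal illustration (simulated data with assumed coefficients, not the notes' dataset) that computes the OLS estimates β̂ = (X′X)⁻¹X′y directly:

```python
import numpy as np

# Illustrative sketch: OLS for y_i = b0 + b1*x1i + b2*x2i + u_i.
# All numbers below are assumptions for the simulation, not from the notes.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(35, 4, n)          # an age-like regressor
x2 = rng.normal(20, 2, n)          # an education-like regressor
u = rng.normal(0, 50, n)           # disturbance term
y = 4000 + 40 * x1 + 30 * x2 + u   # population model with known betas

X = np.column_stack([np.ones(n), x1, x2])   # add intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
resid = y - X @ beta_hat                      # u-hat, the residuals

print(beta_hat)  # close to (4000, 40, 30) for this sample size
```

With the intercept included, the residuals average exactly zero by construction, and the estimates land near the true coefficients because the sample is reasonably large.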
Sample Size for Regression Model

▪ Sample size for multivariate regression model should be ‘sufficiently large’ in order
to build an adequate regression model for inference and forecasting.
➢ There is no 'hard and fast rule' to derive the appropriate sample size.
➢ In the real world, the best practice is to configure your sample size according to your context.

▪ A ‘sufficiently large’ sample size will help to


1. increase the accuracy of the OLS estimators.
2. ensure the validity of hypothesis tests on the OLS estimators.
3. satisfy the normality assumption in the regression model.
4. mitigate the risks of ‘common problems’ often associated with the regression model.
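Point 1 above can be seen in a short simulation (illustrative only; the model, distributions, and sample sizes are assumptions, not from the notes): the spread of the OLS slope estimator shrinks as n grows.

```python
import numpy as np

# Sketch: the OLS slope estimator becomes more accurate (smaller spread)
# as the sample size grows. Simulated model: y = 2 + 3x + u.
rng = np.random.default_rng(1)

def slope_sd(n, reps=500):
    """Standard deviation of the fitted slope over `reps` samples of size n."""
    est = []
    for _ in range(reps):
        x = rng.normal(0, 1, n)
        y = 2 + 3 * x + rng.normal(0, 1, n)
        est.append(np.polyfit(x, y, 1)[0])  # np.polyfit returns [slope, intercept]
    return np.std(est)

sd_small, sd_large = slope_sd(30), slope_sd(300)
print(sd_small, sd_large)  # the larger sample gives a much tighter estimator
```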
Outliers
▪ Outliers will distort the computation of OLS estimators in the regression model.
▪ Therefore, the best practice is to clean up (or remove) outliers in the sample data
before moving forward to build the regression model.
▪ Outliers can be identified (and subsequently omitted) using a histogram or a box plot.
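One common way to operationalise the box-plot approach is the IQR rule: flag observations below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. A small sketch (the data below are made up, with one planted outlier):

```python
import numpy as np

# Sketch of the box-plot (IQR) rule for flagging outliers.
data = np.array([52, 48, 50, 47, 53, 49, 51, 46, 54, 200])  # 200 is a planted outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # box-plot "whisker" fences
outliers = data[(data < lo) | (data > hi)]        # candidates for removal
clean = data[(data >= lo) & (data <= hi)]         # sample with outliers omitted

print(outliers)  # -> [200]
```

In practice, flagged points should be inspected before deletion: an extreme value may be a data-entry error, or a genuine observation.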
Types of Survey Error
▪ Surveys are often used to collect samples.

▪ But surveys are subjected to potential errors.


▪ There are four types of survey errors.
1. Coverage error occurs if certain groups of items are excluded so that they have no chance of being
selected in the sample.
2. Nonresponse error arises from the failure to collect data on all items in the sample (even one missing answer on a survey form counts) and results in a nonresponse bias.
3. Sampling error reflects the “chance differences” from sample to sample, based on the probability of
particular individuals or items being selected from sample to sample.
4. Measurement error occurs because of a weakness in question wording, attributed to the fact that
the process of measurement is often governed by what is convenient, not what is needed.
Multivariate Regression Model : Example
Variable of Interest: Salary of Singapore Degree Holder (SALARY)

Determinants (Explanatory Variables):

    #  Determinant                    Representation  Data Type
    1  Age (in number of years)       AGE             Numerical
    2  Years of Education             EDU             Numerical
    3  Gender                         GEN             Categorical
    4  GPA Score                      GPA             Numerical
    5  Professional Certification(s)  PRO             Categorical
    6  Number of Siblings             SIB             Numerical

Note: in Topic 3, only numerical data are analysed.

Sample data (n = 237):

    i      SALARY  AGE  EDU  GPA   SIB
    1      6756    29   17   2.83  0
    2      7500    38   23   3.75  1
    3      7440    32   19   3.15  2
    4      7740    31   23   4.12  0
    5      7836    38   21   3.57  1
    …      7416    28   17   2.83  1
    …      7596    37   20   3.37  0
    …      7860    42   23   4.67  3
    …      7716    30   22   3.68  0
    …      7476    34   20   3.32  2
    236    7536    35   21   3.42  0
    237    7356    30   19   3.13  2

Population Regression Model:

    yi = β0 + β1 AGEi + β2 EDUi + β3 GPAi + β4 SIBi + ui

Sample Regression Model:

    yi = β̂0 + β̂1 AGEi + β̂2 EDUi + β̂3 GPAi + β̂4 SIBi + ûi
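The sample regression model can be estimated in one line of NumPy. Since the notes' actual dataset of 237 observations is not reproduced here, the sketch below simulates SALARY-like data with assumed coefficients and recovers them by OLS:

```python
import numpy as np

# Hedged sketch: simulated stand-in for the notes' n = 237 sample.
# The coefficients below are assumptions chosen near the notes' later output.
rng = np.random.default_rng(42)
n = 237
AGE = rng.integers(25, 45, n).astype(float)
EDU = rng.integers(15, 25, n).astype(float)
GPA = rng.uniform(2.5, 5.0, n)
SIB = rng.integers(0, 4, n).astype(float)
u = rng.normal(0, 100, n)
SALARY = 4350 + 37 * AGE + 36 * EDU + 378 * GPA - 5 * SIB + u

X = np.column_stack([np.ones(n), AGE, EDU, GPA, SIB])
beta_hat, *_ = np.linalg.lstsq(X, SALARY, rcond=None)  # OLS estimates
print(np.round(beta_hat, 2))  # intercept, then AGE, EDU, GPA, SIB slopes
```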
Regression Output Table : Interpretation

    yi = β̂0 + β̂1 AGEi + β̂2 EDUi + β̂3 GPAi + β̂4 SIBi + ûi

Coefficient of Determination:

    R² = 1 − RSS/TSS = 1 − Σ ûi² / Σ (yi − ȳ)²

where RSS (Residual Sum of Squares, Σ ûi²) has n − 1 − k degrees of freedom (n observations, minus k slope coefficients, minus 1 for β̂0), the explained part has k degrees of freedom, and TSS (Total Sum of Squares, Σ (yi − ȳ)²) has n − 1 degrees of freedom. In general, Mean Sum of Squares = Sum of Squares / Degrees of Freedom, and TSS = ESS + RSS.

▪ A problem with the R² statistic is that adding explanatory variables to the model will always increase it, even if those variables have no explanatory power: as k increases, Σ ûi² falls, so R² rises.

Adjusted R²:

    R̄² = 1 − [RSS / (n − 1 − k)] / [TSS / (n − 1)] = 1 − [Σ ûi² / (n − 1 − k)] / [Σ (yi − ȳ)² / (n − 1)]

▪ R̄² increases only if the added variable is relevant, i.e. the fall in the numerator outweighs the loss of a degree of freedom; R̄² is always no larger than R².
▪ R̄² ≤ 1, but it can be negative. As more explanatory variables are added to the model, R̄² only increases if the extra variables contribute significantly to the model's explanatory power.
▪ Since (n − 1) / (n − 1 − k) ≥ 1, it follows that R̄² ≤ R².

Standard Error of the Regression:

    SER = √[RSS / (n − 1 − k)] = √[Σ (yi − ŷi)² / (n − 1 − k)] = √[Σ ûi² / (n − 1 − k)]
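The goodness-of-fit formulas above are straightforward to compute from the residuals. A minimal sketch on simulated data (the model and sample size are assumptions for illustration):

```python
import numpy as np

# Sketch of R^2 = 1 - RSS/TSS, adjusted R^2, and SER = sqrt(RSS/(n-1-k)).
rng = np.random.default_rng(7)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
RSS = np.sum(resid**2)                 # residual sum of squares
TSS = np.sum((y - y.mean())**2)        # total sum of squares

R2 = 1 - RSS / TSS
R2_adj = 1 - (RSS / (n - 1 - k)) / (TSS / (n - 1))
SER = np.sqrt(RSS / (n - 1 - k))
print(R2, R2_adj, SER)  # R2_adj is slightly below R2, as the notes state
```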
Regression Output Table : Interpretation

    yi = 4348.772 + 37.3697 AGEi + 35.70034 EDUi + 377.8493 GPAi − 5.317322 SIBi + ûi

▪ The OLS estimators for a multivariate regression model can be easily computed using statistical software (including Stata and MS Excel).

▪ Apart from the OLS estimates (β̂0, β̂1, β̂2, β̂3, β̂4, one per explanatory variable plus the intercept), the regression output table contains useful information for checking whether the constructed regression model is adequate, including the statistics needed for hypothesis testing.
Residuals : Normality Assumption

    yi = 4348.772 + 37.3697 AGEi + 35.70034 EDUi + 377.8493 GPAi − 5.317322 SIBi + ûi

Normality Assumption (of residuals)
▪ The residuals ûi of the estimated regression equation should be distributed symmetrically around zero; that is, the residual is a random and independent variable that follows an approximately normal distribution with zero mean.
▪ The normality assumption of the residuals implies that
1. the estimated regression equation captures the main patterns and sources of variation between the response variable and all explanatory variables.
2. the OLS estimators are approximately normally distributed (which can be proven mathematically). This ensures that hypothesis tests on the OLS estimators can be performed with accuracy; such tests are valid only if the β̂'s are (approximately) normally distributed.

▪ Causes of a skewed distribution of residuals include outliers and a sample that is not 'sufficiently large'.
Residuals : Tests for Normality

    yi = 4348.772 + 37.3697 AGEi + 35.70034 EDUi + 377.8493 GPAi − 5.317322 SIBi + ûi

[Figure: histogram of residuals overlaid with normal and kernel density plots; normal quantile plot with 45-degree reference line]

1. The Normal Density Plot depicts the probability density function of the data. Unlike the histogram, the curve represents the proportion of the data in each range, rather than the frequency.
2. The Kernel Density Plot applies a kernel smoothing effect to the probability density estimate of the data.
3. Normal Quantile Plot: if the residuals fall along the 45-degree reference line, then the residuals are approximately normally distributed.

* Other statistical tests for normality include the Kolmogorov-Smirnov test, the Shapiro-Wilk test, the Jarque-Bera test, and the Anderson-Darling test.
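Several of the formal tests listed above are available in SciPy. The sketch below applies them to two simulated samples (stand-ins, since the notes' residuals are not reproduced here): one normal, one heavily skewed; a small p-value rejects normality.

```python
import numpy as np
from scipy import stats

# Sketch: formal normality tests on simulated "residuals".
rng = np.random.default_rng(3)
normal_resid = rng.normal(0, 1, 500)    # stand-in for well-behaved residuals
skewed_resid = rng.exponential(1, 500)  # stand-in for skewed residuals

sw_stat, sw_p = stats.shapiro(normal_resid)         # Shapiro-Wilk
jb_stat, jb_p = stats.jarque_bera(skewed_resid)     # Jarque-Bera
ks_stat, ks_p = stats.kstest(normal_resid, 'norm')  # Kolmogorov-Smirnov vs N(0,1)

# The skewed sample should be decisively rejected by the Jarque-Bera test.
print(sw_p, jb_p, ks_p)
```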
Sampling Distribution of OLS Estimators

    β̂ ≈ N( E[β̂], Var(β̂) )

    E[β̂] = β
    Var(β̂) = σ² (X′X)⁻¹,   where σ² = Var(u)

In practice σ² is unknown, so the variance is estimated by

    V̂ar(β̂) = σ̂² (X′X)⁻¹,   where σ̂² = (1/n) Σᵢ₌₁ⁿ ûi²

• β̂ is an unbiased estimator of β.
• Standard error of β̂: se(β̂) = √V̂ar(β̂)
• Standardising β̂ by its standard error gives a t-distribution with n − 1 − k degrees of freedom (df):

    ( β̂ − E[β̂] ) / se(β̂) ≈ t(n − 1 − k)
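The variance formulas above translate directly into code. A minimal sketch on simulated data (model and sizes are assumptions), using the notes' estimator σ̂² = (1/n) Σ ûi² — note that many texts divide by n − 1 − k instead:

```python
import numpy as np

# Sketch: Var-hat(beta-hat) = sigma-hat^2 (X'X)^{-1}, standard errors, t-ratios.
rng = np.random.default_rng(5)
n, k = 150, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = (resid @ resid) / n                # (1/n) * sum of squared residuals
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated covariance matrix
se = np.sqrt(np.diag(var_beta))                 # standard error of each beta-hat
t_stats = beta_hat / se                         # t-ratios against H0: beta_j = 0
print(se, t_stats)
```

Each t-ratio is then compared against the t(n − 1 − k) distribution to test whether the corresponding coefficient differs from zero.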
For a normally distributed variable X, standardising gives the Z (standard normal) distribution; for the OLS estimator β̂, standardising by the estimated standard error gives a t-distribution:

    X ~ N( E[X], Var(X) )        →   Z = (X − E[X]) / sd(X) ~ N(0, 1)
    β̂ ≈ N( E[β̂], Var(β̂) )      →   (β̂ − E[β̂]) / se(β̂) ≈ t(n − 1 − k),  with df = n − 1 − k
t Distribution

[Figure: t-distribution densities for df = 10, df = 60, and df ≥ 120 plotted against the Standard Normal (Z) Distribution = N(0, 1); as df grows, the t-distribution approaches the Z distribution]

▪ When n (the sample size) increases:
→ the degrees of freedom (df) of the t-distribution increase;
→ the t-distribution approaches the Z distribution.

▪ As n increases, se(β̂) decreases, so accuracy increases.

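The convergence shown in the figure can be checked numerically: t critical values (here at the two-sided 5% level, an illustrative choice) shrink toward the standard normal critical value as the degrees of freedom grow.

```python
from scipy import stats

# Sketch: t critical values approach the Z critical value as df increases.
z_crit = stats.norm.ppf(0.975)                              # ~1.96
t_crits = {df: stats.t.ppf(0.975, df) for df in (10, 60, 120)}
print(round(z_crit, 3), {df: round(c, 3) for df, c in t_crits.items()})
```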

Prepared by

Daniel SOH
