Multivariate Regression Model - Lecture Notes
BSE3703
Topic 3
Multivariate Regression Model
Learning Outcomes
The multivariate regression model for a sample of n observations and k explanatory variables:

$$ y_i = \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j x_{j,i} + \hat{u}_i, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, k $$

Sample data matrix (columns: $y_i$, $x_{1,i}$, $x_{2,i}$, $x_{3,i}$, $x_{4,i}$):

    7860  42  23  4.67  17
    7716  30  22  3.68  19
    7476  34  20  3.32  23
    7536  35  21  3.42  18
    7356  30  19  3.13  22
Sample Size for Regression Model
▪ Sample size for multivariate regression model should be ‘sufficiently large’ in order
to build an adequate regression model for inference and forecasting.
➢ There is no hard-and-fast rule for deriving the appropriate sample size.
➢ In the real world, best practice is to choose your sample size according to your context.
Outliers
Goodness of fit builds on the sums of squares $RSS = \sum \hat{u}_i^2$ (residual), $TSS = \sum (y_i - \bar{y})^2$ (total), and $ESS$ (explained), with $TSS = ESS + RSS$.

$$ \text{Adjusted } R^2 = \bar{R}^2 = 1 - \frac{RSS/(n-1-k)}{TSS/(n-1)} $$

Note: adjusted $R^2$ increases only if the added variable is relevant, i.e. the fall in $RSS$ (numerator) outweighs the loss of a degree of freedom (denominator). Adjusted $R^2$ is always smaller than $R^2$.

▪ $\bar{R}^2 \le 1$ but can be negative. As more explanatory variables are added to the model, $\bar{R}^2$ only increases if the extra variables contribute significantly to the model's explanatory power.
▪ $(n-1)/(n-1-k) \ge 1$ implies $\bar{R}^2 \le R^2$.

$$ SER = \sqrt{\frac{RSS}{n-1-k}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-1-k}} = \sqrt{\frac{\sum \hat{u}_i^2}{n-1-k}} $$
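The quantities above can be computed directly. The sketch below uses a simulated toy dataset (variable values and the seed are illustrative, not the lecture data) and checks that $TSS = ESS + RSS$ and $\bar{R}^2 \le R^2$:

```python
import numpy as np

# Hypothetical toy data: n = 8 observations, k = 2 explanatory variables.
np.random.seed(0)
n, k = 8, 2
X = np.column_stack([np.ones(n), np.random.rand(n, k)])  # intercept + k regressors
y = X @ np.array([2.0, 1.5, -0.5]) + 0.1 * np.random.randn(n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
y_hat = X @ beta_hat
residuals = y - y_hat

TSS = np.sum((y - y.mean()) ** 2)   # total sum of squares
RSS = np.sum(residuals ** 2)        # residual sum of squares
ESS = TSS - RSS                     # explained sum of squares (TSS = ESS + RSS)

R2 = 1 - RSS / TSS
adj_R2 = 1 - (RSS / (n - 1 - k)) / (TSS / (n - 1))
SER = np.sqrt(RSS / (n - 1 - k))    # standard error of the regression

print(R2, adj_R2, SER)
```

Because the penalty factor $(n-1)/(n-1-k) \ge 1$, the printed `adj_R2` is always below `R2`, as the bullet points above state.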
Regression Output Table : Interpretation
$$ y_i = 4348.772 + 37.3697\,AGE_i + 35.70034\,EDU_i + 377.8493\,GPA_i - 5.317322\,SIB_i + \hat{u}_i $$
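As a sketch of how the fitted equation is used for prediction, the snippet below plugs hypothetical inputs (the input values are illustrative, not from the lecture dataset) into the estimated coefficients:

```python
# Coefficients from the estimated regression equation above.
b0, b_age, b_edu, b_gpa, b_sib = 4348.772, 37.3697, 35.70034, 377.8493, -5.317322

def predict(age, edu, gpa, sib):
    """Point prediction y-hat from the fitted multivariate regression."""
    return b0 + b_age * age + b_edu * edu + b_gpa * gpa + b_sib * sib

# Hypothetical person: age 30, 16 years of education, GPA 3.5, 2 siblings.
print(round(predict(30, 16, 3.5, 2), 2))  # → 7352.91
```

Each coefficient gives the predicted change in $y$ for a one-unit change in that regressor, holding the other regressors fixed.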
Hypothesis Testing
Residuals : Normality Assumption
Normality Assumption (of residuals)
▪ The residuals of the estimated regression model (equation)
should be distributed symmetrically around zero, which means
that the residual is a random and independent variable which
follows an approximately normal distribution with zero mean.
▪ The normality assumption of residuals will imply
1. the estimated regression equation captures the main patterns
and sources of variation between the response variable and
all explanatory variables.
(Note: hypothesis testing on the estimated βs should be performed only if they are normally distributed.)
2. the OLS estimators are approximately normally distributed
(which can be mathematically proven). This will ensure that
hypothesis tests on the OLS estimators in the estimated
regression model can be performed with accuracy.
1. The Normal Density Plot depicts the probability density function of the data. Unlike the histogram, the curve represents the proportion of the data in each range, rather than the frequency.
2. The Kernel Density Plot adds a kernel smoothing effect to the probability density estimation of the data.

Normal probability (Q-Q) plot: if the residuals fall along the 45-degree reference line, then the residuals are approximately normally distributed.
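The graphical checks above have numerical counterparts. A minimal sketch, assuming SciPy is available and using simulated residuals (the seed and sample size are illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical residuals from a fitted model (simulated here for illustration).
rng = np.random.default_rng(42)
residuals = rng.normal(loc=0.0, scale=1.0, size=200)

# Shapiro-Wilk test: H0 is that the residuals are normally distributed.
stat, p_value = stats.shapiro(residuals)
print(f"W = {stat:.3f}, p = {p_value:.3f}")  # large p => fail to reject normality

# Numerical version of the Q-Q plot: probplot compares ordered residuals
# against normal quantiles; r near 1 means points hug the 45-degree line.
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(f"Q-Q correlation r = {r:.3f}")
```

In practice these tests complement, rather than replace, a visual inspection of the density and Q-Q plots.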
$$ \widehat{Var}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}, \quad \text{where } \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2 $$

• $\hat{\beta}$ is an unbiased estimator of $\beta$.
• Standard error of $\hat{\beta}$: $se(\hat{\beta}) = \sqrt{\widehat{Var}(\hat{\beta})}$

$$ \frac{\hat{\beta} - E(\hat{\beta})}{se(\hat{\beta})} \approx t_{n-1-k}, \quad \text{with degrees of freedom } df = n - 1 - k $$

[Figure: density of the t distribution, centered at 0.]
Sampling Distribution of OLS Estimators

For the sample mean, $\bar{X} \sim N\big(E(\bar{X}),\, Var(\bar{X})\big)$, so

$$ Z = \frac{\bar{X} - E(\bar{X})}{sd(\bar{X})} \sim N(0,1) $$

Analogously, the OLS estimator satisfies $\hat{\beta} \approx N\big(E(\hat{\beta}),\, Var(\hat{\beta})\big)$, so

$$ \frac{\hat{\beta} - E(\hat{\beta})}{se(\hat{\beta})} \approx t_{n-1-k}, \quad \text{with degrees of freedom } df = n - 1 - k $$

[Figures: densities of Z (standard normal) and t, both centered at 0.]
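The sampling-distribution quantities above can be sketched numerically. The example below simulates data (names, seed, and sample size are illustrative) and computes $\hat{\beta}$, $\widehat{Var}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$, standard errors, and t-statistics:

```python
import numpy as np

# Simulated data for illustration: n = 100 observations, k = 2 regressors.
rng = np.random.default_rng(1)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y            # OLS: (X'X)^{-1} X'y
residuals = y - X @ beta_hat

sigma2_hat = np.sum(residuals**2) / n   # slide's estimator; many texts divide by n-1-k
var_beta = sigma2_hat * XtX_inv         # Var(beta_hat) = sigma^2 (X'X)^{-1}
se_beta = np.sqrt(np.diag(var_beta))    # standard errors of each coefficient

# t-statistics for H0: beta_j = 0, compared against t with n-1-k df.
t_stats = beta_hat / se_beta
print(beta_hat, se_beta, t_stats)
```

Each `t_stats[j]` is the ratio tested against the $t_{n-1-k}$ distribution in the hypothesis tests that follow.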
t Distribution
Daniel SOH