Chapter 3: The Two-Variable Regression Model
Ordinary Least Squares (OLS) is a statistical method used for estimating the parameters of a
linear regression model. It seeks to find the line (or hyperplane in multiple dimensions) that best
fits a set of observed data points by minimizing the sum of the squared differences between the
observed and predicted values. This approach is commonly used in regression analysis to find
the relationship between independent variables (predictors) and a dependent variable (response).
The Basics:
Linear Regression Model: The general form of a linear regression model with one
predictor variable is:
Yi = β1 + β2Xi + εi
where Yi is the dependent variable, Xi is the predictor, β1 is the intercept, β2 is the
slope coefficient, and εi is the error (disturbance) term.
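As an illustration, the OLS estimates for the two-variable model can be computed directly from the closed-form formulas. A minimal sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates for Yi = b1 + b2*Xi + ei:
#   b2_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
#   b1_hat = y_bar - b2_hat * x_bar
x_bar, y_bar = x.mean(), y.mean()
b2_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b1_hat = y_bar - b2_hat * x_bar

residuals = y - (b1_hat + b2_hat * x)
print(b1_hat, b2_hat)          # intercept and slope estimates
print(np.sum(residuals ** 2))  # the minimized sum of squared residuals
```

No other line through these points gives a smaller residual sum of squares, which is exactly the OLS criterion described above.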
6. Assumptions of OLS
1. Linearity: The relationship between the independent and dependent variables is linear.
2. Independence: The residuals (errors) are independent.
3. Homoscedasticity: The residuals have constant variance at every level of the
independent variable.
4. Normality of Errors: The residuals are normally distributed.
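A rough way to probe some of these assumptions is to inspect the residuals after fitting. The sketch below, using simulated data, verifies that residuals sum to zero (an algebraic property of OLS with an intercept) and compares residual variance across the two halves of the sample as a crude, Goldfeld-Quandt-style homoscedasticity check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 100)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, size=x.size)  # homoscedastic errors

# Fit the two-variable model by OLS
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)

# With an intercept, OLS residuals always sum to (numerically) zero
print(np.sum(e))

# Crude homoscedasticity check: compare residual variance in the
# lower and upper halves of the x range
var_low, var_high = e[:50].var(ddof=1), e[50:].var(ddof=1)
print(max(var_low, var_high) / min(var_low, var_high))  # near 1 if variance is constant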
7. Limitations of OLS
It can be sensitive to outliers, which can significantly affect the slope and intercept.
OLS assumes a linear relationship; if the relationship is nonlinear, the model may not be
appropriate.
OLS can be extended to multiple linear regression where there are multiple predictor variables.
The goal remains the same: minimizing the sum of squared residuals, but the calculation of the
parameters becomes more complex.
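A minimal sketch of the multiple-regression case, using simulated data with known coefficients. OLS solves the normal equations (X'X)β = X'y; a least-squares solver is used here rather than an explicit matrix inverse, which is numerically preferable:

```python
import numpy as np

# Simulated data with two predictors and known coefficients (hypothetical)
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2])

# Minimize the sum of squared residuals ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to [1.0, 2.0, -0.5]
```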
The classical linear regression model: the assumptions underlying the method of least
squares
The Classical Linear Regression Model (CLRM) is based on several key assumptions that
underlie the validity of the Ordinary Least Squares (OLS) estimation method. These assumptions
ensure that OLS provides the "Best Linear Unbiased Estimators" (BLUE) of the parameters in a
linear regression model, meaning the estimators are unbiased and have minimum variance
among all linear unbiased estimators. (Normality of the errors is a separate assumption,
needed mainly for exact hypothesis testing.) Let's discuss these
assumptions in detail:
2. Independence of the Error Terms
Assumption: The error terms (ϵi) are independently distributed across observations.
Importance: If the errors are not independent (e.g., autocorrelation exists), it means that the
outcome of one observation may be influencing another. When this assumption is violated, the
OLS coefficient estimates remain unbiased (given the other assumptions) but are no longer
efficient, and the usual standard errors are biased. This is a particular concern in time series
data, where observations can be correlated over time.
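The Durbin-Watson statistic is a standard diagnostic for first-order autocorrelation in residuals: values near 2 suggest no autocorrelation, values near 0 positive autocorrelation, and values near 4 negative autocorrelation. A minimal sketch:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Perfectly "smooth" residuals (extreme positive autocorrelation) -> 0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))

# Alternating residuals (strong negative autocorrelation) -> toward 4
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 here; approaches 4 for long series
```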
4. No Perfect Multicollinearity
Assumption: There is no exact linear relationship between the independent variables. In other
words, the independent variables are not perfectly collinear.
Importance: Under perfect multicollinearity the matrix X'X is singular, so the OLS coefficients
cannot be uniquely estimated; the effect of one variable cannot be separated from that of
another.
5. Normality of the Error Terms
Assumption: The error terms are normally distributed.
Importance: The normality assumption is particularly important for hypothesis testing. If the
error terms are not normally distributed, the test statistics (such as t-tests and F-tests) used to
evaluate the significance of the estimated coefficients may not follow their assumed
distributions, leading to invalid conclusions. This assumption is especially important for small
sample sizes.
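Perfect multicollinearity (assumption 4 above) can be seen numerically: when one regressor is an exact linear function of another, X'X loses full rank and cannot be inverted. A small sketch with a deliberately collinear design:

```python
import numpy as np

# Deliberately collinear design: x2 is exactly 2 * x1 (hypothetical)
n = 50
x1 = np.linspace(0.0, 1.0, n)
x2 = 2.0 * x1
X = np.column_stack([np.ones(n), x1, x2])

# X'X has rank 2 instead of 3, so it is singular and the three
# coefficients are not separately identified
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2
```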
The precision or standard errors of the Ordinary Least Squares (OLS) estimates are critical in
understanding the reliability and accuracy of the estimated coefficients in a regression model.
They measure the dispersion or variability of the estimated regression coefficients from the true
population parameters. Let's discuss the standard errors of OLS estimates in detail:
In the context of OLS regression, the standard error of an estimated coefficient represents the
estimated standard deviation of the sampling distribution of that coefficient. It indicates how
much the estimated value of a coefficient would vary if the regression were repeated multiple
times with different samples drawn from the same population.
The smaller the standard error, the more precise the estimate. In contrast, a larger standard error
implies greater variability in the estimate, making the coefficient less reliable.
Several factors determine the size of the standard errors:
Sample Size (n): Larger sample sizes generally lead to smaller standard errors, as the estimates
are based on more information.
Variance of the Error Term (σ²): If the variance of the residuals is large, the standard errors will
also be large, indicating less precision in the estimates.
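For the two-variable model, these ingredients combine into the textbook formulas se(b2) = sqrt(σ̂² / Σ(xi − x̄)²) and se(b1) = sqrt(σ̂² (1/n + x̄²/Σ(xi − x̄)²)), where σ̂² = Σei²/(n − 2). A sketch with hypothetical data:

```python
import numpy as np

# Hypothetical sample (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 4.1, 5.8, 8.3, 9.9, 12.2])

n = x.size
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)
b2 = np.sum((x - x_bar) * (y - y.mean())) / Sxx
b1 = y.mean() - b2 * x_bar
e = y - (b1 + b2 * x)

# Unbiased estimate of the error variance; n - 2 because two
# parameters (intercept and slope) were estimated
sigma2_hat = np.sum(e ** 2) / (n - 2)

# A large sigma2_hat or a small spread in x (Sxx) inflates both errors
se_b2 = np.sqrt(sigma2_hat / Sxx)
se_b1 = np.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / Sxx))
print(se_b1, se_b2)
```

Note how both factors from the text appear: σ̂² in the numerator (error variance) and quantities that grow with n in the denominator (sample size and spread of x).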