Module 4 Activity DB
MIM-633-41
1. Explain the concept of Ordinary Least Squares (OLS) regression. What is the goal of OLS,
and how does it achieve this goal?
Ordinary Least Squares (OLS) regression is a statistical method used to model the linear
relationship between a dependent variable and one or more independent variables. The primary
goal of OLS is to find the line (or hyperplane in multiple regression) that best fits the data by
minimizing the sum of the squared differences between the observed values of the dependent
variable and the values predicted by the model. It achieves this by solving for the coefficient
estimates (the intercept and slopes) that minimize this sum of squared residuals; for OLS, this
minimization has a closed-form solution.
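For illustration, here is a minimal Python sketch (using numpy on synthetic data, so the values
and variable names are assumptions for the example, not part of any particular analysis) of
fitting a line by minimizing the sum of squared residuals:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=50)   # true intercept 2.0, true slope 1.5

X = np.column_stack([np.ones_like(x), x])        # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares estimates [intercept, slope]
residuals = y - X @ beta
print(beta, (residuals ** 2).sum())              # fitted coefficients and the minimized SSR

Because the noise is small relative to the signal, the fitted intercept and slope should land
close to the true values of 2.0 and 1.5 used to generate the data.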
2. What are the key assumptions of OLS regression? Why are these assumptions important,
and what are the potential consequences of violating them?
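The key assumptions of OLS regression are linearity (the relationship between the predictors and
the dependent variable is linear in the parameters), independence of the errors (no
autocorrelation), homoscedasticity (constant error variance), normality of the errors (important
for valid inference in small samples), and the absence of perfect multicollinearity among the
independent variables; the errors are also assumed to have a mean of zero and to be uncorrelated
with the predictors. These assumptions matter because, when they hold, OLS produces the best
linear unbiased estimates (the Gauss-Markov theorem), and the reported standard errors,
confidence intervals, and hypothesis tests are trustworthy. Violating them has concrete
consequences: nonlinearity or errors correlated with the predictors bias the coefficient
estimates; heteroscedasticity and autocorrelation leave the estimates unbiased but make the
standard errors, and therefore the tests, misleading; non-normal errors undermine inference in
small samples; and severe multicollinearity inflates the variance of the estimates, making them
unstable.

For illustration, here is a minimal Python sketch of common diagnostic checks using statsmodels
(the synthetic data and variable names are assumptions for the example):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson, jarque_bera

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 0.8 * x + rng.normal(0, 1, size=100)

X = sm.add_constant(x)            # design matrix with an intercept
model = sm.OLS(y, X).fit()
resid = model.resid

bp_stat, bp_pvalue, _, _ = het_breuschpagan(resid, X)   # null: homoscedastic errors
dw = durbin_watson(resid)                               # near 2 suggests no autocorrelation
jb_stat, jb_pvalue, _, _ = jarque_bera(resid)           # null: normally distributed errors
print(bp_pvalue, dw, jb_pvalue)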
3. Describe the difference between simple OLS regression and multiple OLS regression. When
would you use each type of model?
Simple OLS regression involves one dependent variable and one independent variable, modeling a
linear relationship between them; it is used when you want to understand the effect of a single
predictor on an outcome. Multiple OLS regression involves one dependent variable and two or more
independent variables, allowing you to model the combined effect of multiple predictors; it is
used when you want to understand the relationship between a dependent variable and multiple
factors, or to control for confounding variables. Simple regression is appropriate for basic
two-variable relationships, while multiple regression is necessary for more complex scenarios
with multiple influencing factors, as the sketch below illustrates.
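As a sketch of the contrast, assuming synthetic data in which a second predictor x2 is correlated
with x1 (a stand-in for a confounder), the two models can be fit side by side in Python with
statsmodels:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

simple = sm.OLS(y, sm.add_constant(x1)).fit()                            # one predictor
multiple = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()   # two predictors
print(simple.params)     # the x1 slope is inflated: it absorbs part of x2's effect
print(multiple.params)   # controlling for x2 recovers coefficients near 2.0 and 1.5

The simple model's slope on x1 is biased upward (toward roughly 2.9 here) because x2 is omitted;
the multiple model separates the two effects.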
4. What are some of the limitations of OLS regression? How can these limitations be
addressed?
OLS regression has several limitations. It assumes linearity, which may not hold in real-world
data. It is sensitive to outliers, which can disproportionately pull the regression line toward
them. It can struggle with multicollinearity, where independent variables are highly correlated,
leading to unstable coefficient estimates. Additionally, it assumes that the errors are normally
distributed and homoscedastic, which is not always true. To address these limitations, you can
apply data transformations (such as logs) to linearize relationships, employ robust regression
techniques to reduce the influence of outliers, compute variance inflation factors (VIF) to
detect multicollinearity and then drop or combine the offending predictors, and run diagnostic
tests to check for assumption violations and apply appropriate corrections, as in the sketch
below.
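As one concrete example, here is a minimal sketch of a VIF check using statsmodels; the data is
synthetic and deliberately collinear, so the names and threshold are illustrative assumptions:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # independent predictor

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):            # skip the intercept column
    print(f"x{i}: VIF = {variance_inflation_factor(X, i):.1f}")   # VIF > 10 is a common red flag

Here x1 and x2 show very large VIFs while x3 stays near 1, flagging the collinear pair for
remediation (for example, dropping or combining one of the two, or switching to ridge
regression).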