
Linear regression is a statistical technique used to determine the relationship between a target variable and one or more predictor variables. Simple linear regression examines the relationship between one target and one predictor variable, while multiple linear regression examines the relationship between a target variable and two or more predictor variables. Key aspects of linear regression include coefficient estimates, p-values, residual standard error, R-squared, and adjusted R-squared values. Performing linear regression involves understanding the problem, preparing the data, selecting variables, running the analysis, and interpreting the results.

Uploaded by meajagun
Copyright © All Rights Reserved

Linear Regression – Important Notes

Regression is a statistical technique used to determine the strength of the
relationship between one target/dependent variable (denoted by Y) and one or more
predictor/independent variables (denoted by X).
Linear Regression is a statistical method used to determine the strength of a
straight-line relationship between a target or dependent variable Y and one or
more predictor or independent variables X.
Simple Linear Regression – This is when the linear (straight-line) relationship being
studied is between one target variable Y and one predictor variable X.
Multiple Linear Regression – This is when the linear (straight-line) relationship being
studied is between one target variable Y and two or more predictor variables X.
Target/dependent variable – This is the variable we are trying to understand and
predict through the regression analysis.
Predictor/independent variable – These are the variables used to predict the
target/dependent variable.

Linear Regression Equations


Simple Linear Regression Equation: Y = mX + b
Where:
Y = Target Variable
X = Predictor Variable
m = Slope of the line
b = Y-intercept
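The notes do not say how m and b are obtained. The usual choice is ordinary least squares (OLS), which picks the line that minimizes the sum of squared vertical distances between the data points and the line. A minimal pure-Python sketch, assuming the data arrive as two equal-length lists of numbers (the function name `fit_simple` is illustrative, not from any library):

```python
def fit_simple(x, y):
    """Least-squares estimates of the slope m and intercept b in Y = mX + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # m = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx
    # the least-squares line always passes through the point (x_bar, y_bar)
    b = y_bar - m * x_bar
    return m, b
```

For example, `fit_simple([1, 2, 3, 4], [3, 5, 7, 9])` returns `(2.0, 1.0)`, i.e. Y = 2X + 1. In the notation of the multiple regression equation, m plays the role of β1 and b the role of β0.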

Multiple Linear Regression Equation: Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε

Where:
Y = Target Variable
X₁ … Xₙ = Predictor Variables
β₀ = Y-intercept
β₁ … βₙ = Regression coefficients of the first to the last predictor variable
ε = Model error (the part of Y that the predictor variables do not explain, i.e. the
difference between the actual value of Y and the value predicted by the model)
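The coefficients β0 … βn are also usually estimated by least squares. One textbook route is to solve the normal equations (XᵀX)β = Xᵀy, where X is the table of predictor values with a leading column of 1s for the intercept. The sketch below does this in pure Python with Gaussian elimination; `fit_multiple` is an illustrative name, and the approach assumes XᵀX is invertible (no predictor is an exact linear combination of the others). Library routines such as R's `lm()` or NumPy's `numpy.linalg.lstsq` use more numerically stable decompositions and are preferable on real data.

```python
def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bn] for Y = b0 + b1*X1 + ... + bn*Xn."""
    # design matrix: a leading 1 in every row for the intercept b0
    X = [[1.0, *map(float, r)] for r in rows]
    n_obs, k = len(X), len(X[0])

    # normal equations A @ beta = c, with A = X^T X and c = X^T y
    A = [[sum(X[i][a] * X[i][b] for i in range(n_obs)) for b in range(k)] for a in range(k)]
    c = [sum(X[i][a] * y[i] for i in range(n_obs)) for a in range(k)]

    # forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]

    # back substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (c[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta
```

With rows `[(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]` and targets `[1, 3, 4, 6, 8]` (generated from Y = 1 + 2X1 + 3X2 with no error), the function recovers coefficients very close to `[1.0, 2.0, 3.0]`.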

Regression Notes | pg. 1


Interpreting Regression Results
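Coefficient Estimates – represent the relationship between each predictor
variable and the target variable. Each estimate is the expected change in the target
variable for a one-unit increase in that predictor (holding any other predictors
constant); its sign gives the direction of the relationship.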
P Value (Pr(>|t|)) – this is the probability of obtaining a coefficient estimate at
least as far from zero as the one observed if, in reality, there were no relationship
between the predictor variable and the target variable (i.e. the true coefficient were
zero). If the p-value is less than 0.05, the relationship between the predictor
variable and the target variable is considered statistically significant at the 5%
level: a coefficient estimate this large would rarely occur by chance alone if no
relationship existed.
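To make the p-value concrete for simple linear regression: the slope estimate is divided by its standard error to give a t statistic, and the p-value is the two-sided tail probability of Student's t distribution with n − 2 degrees of freedom. This relies on the standard assumptions that the errors are independent and normally distributed with constant variance. Statistical software does this for you (the `Pr(>|t|)` label above matches the column name in R's `summary()` output for `lm` models); the pure-Python sketch below integrates the t density numerically with Simpson's rule only to stay self-contained. The function names are illustrative.

```python
import math


def t_two_sided_p(t, df, steps=10_000):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    # log of the normalizing constant of the t density
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule (steps must be even) for the area between 0 and |t|
    h = abs(t) / steps
    area = pdf(0.0) + pdf(abs(t))
    area += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area *= h / 3
    # the density is symmetric, so each tail beyond |t| has area 0.5 - area
    return max(0.0, 1.0 - 2.0 * area)


def slope_t_test(x, y):
    """Slope, t statistic and p-value for H0: true slope = 0 (needs n >= 3)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b = y_bar - m * x_bar
    # residual sum of squares of the fitted line
    rss = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    # standard error of the slope: sqrt(RSS / (n - 2) / Sxx)
    se_m = math.sqrt(rss / (n - 2) / sxx)
    t = m / se_m  # assumes the points do not lie exactly on a line (se_m > 0)
    return m, t, t_two_sided_p(t, n - 2)
```

For x = [1, 2, 3, 4, 5, 6] and y = [2, 4, 5, 4, 5, 7], the slope is about 0.771 with t ≈ 3.67 on 4 degrees of freedom, giving a p-value of roughly 0.02, which is below the 0.05 threshold.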
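Residual Standard Error – is a measure of the quality of the linear regression fit. It
estimates the standard deviation of the model error ε, i.e. roughly how far the
observed values of the target variable typically fall from the regression line,
expressed in the units of Y; smaller values indicate a closer fit. In theory, every
linear regression model is assumed to contain some error, which makes it impossible
to achieve perfect predictions of the target variable.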
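In formula form, RSE = √(RSS / (n − p − 1)), where RSS = Σ(y − ŷ)² is the residual sum of squares, n the number of observations and p the number of predictor variables; the extra 1 accounts for the intercept. A minimal Python sketch, assuming the observed values and the model's predictions are already available as lists (the function name is illustrative):

```python
import math


def residual_standard_error(y, y_hat, num_predictors):
    """RSE = sqrt(RSS / (n - p - 1)) for n observations and p predictors."""
    n = len(y)
    # residual sum of squares: squared gaps between observed and predicted values
    rss = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    # divide by the residual degrees of freedom (n minus p slopes minus 1 intercept)
    return math.sqrt(rss / (n - num_predictors - 1))
```

For x = [1, 2, 3, 4] and y = [1, 3, 2, 4], the least-squares line is ŷ = 0.8x + 0.5, with predictions [1.3, 2.1, 2.9, 3.7]. Then `residual_standard_error(y, [1.3, 2.1, 2.9, 3.7], 1)` is √(1.8 / 2) ≈ 0.949, so a typical prediction misses by about 0.95 units of Y.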
R-Squared – represents the proportion of the variance in the target variable that is
explained by the predictor variable(s). The R-squared value ranges from 0 to 1, where
0 means the model explains none of the variation in Y and 1 means it explains all of
it. R-squared is a measure of goodness of fit and evaluates the scatter of the data
points around the regression line. In the case of multiple regression, the R-squared
is called ‘multiple R-squared’. R-squared is also known as the coefficient of
determination.
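R-squared is computed as 1 − RSS/TSS, where RSS = Σ(y − ŷ)² is the variation the model leaves unexplained and TSS = Σ(y − ȳ)² is the total variation of the target variable around its mean. A short Python sketch (the function name is illustrative):

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_bar = sum(y) / len(y)
    # variation left unexplained by the model
    rss = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    # total variation of the target around its mean
    tss = sum((obs - y_bar) ** 2 for obs in y)
    return 1 - rss / tss
```

For y = [1, 3, 2, 4] and the least-squares predictions [1.3, 2.1, 2.9, 3.7] from the line ŷ = 0.8x + 0.5 fitted to x = [1, 2, 3, 4], RSS = 1.8 and TSS = 5, so R-squared = 0.64: the model explains 64% of the variance in Y. For simple linear regression this equals the square of the correlation between X and Y (here 0.8² = 0.64).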
Adjusted R-Squared – is a modified version of the R-squared that accounts for the
number of predictor variables in the model, which matters in multiple linear
regression. This is because the R-squared never decreases when more predictor
variables are added, even if those variables do not improve the model fit. The
adjusted R-squared penalizes each additional predictor: it increases only when a new
predictor improves the model fit more than would be expected by chance, and
decreases otherwise.
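A common formula is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p predictor variables. The sketch below applies it; the numbers in the usage note are hypothetical, chosen only to show the penalty at work.

```python
def adjusted_r_squared(r2, n, num_predictors):
    """Adjusted R-squared for n observations and num_predictors predictors."""
    # (n - 1) / (n - p - 1) grows with every extra predictor, so a new variable
    # must raise R-squared enough to offset this penalty
    return 1 - (1 - r2) * (n - 1) / (n - num_predictors - 1)
```

With n = 4 observations, one predictor and R² = 0.64, the adjusted value is 1 − 0.36 × 3/2 = 0.46. If a second predictor only lifts R² to 0.65, the adjusted value falls to 1 − 0.35 × 3/1 = −0.05, signalling that the extra variable does not earn its place (and showing that adjusted R-squared can even be negative).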

Regression Analysis Checklist


Step 1 – Understanding the Problem (Business Issue Understanding):
Understand the business issue/problem to help decide what decisions need to be
made based on the questions that need answering.
Step 2 – Data Understanding:
Check if the data available is suitable and sufficient to perform the predictive
analysis to provide the desired results of the business issue/problem.
Step 3 - Data Preparation:
Check the data for anomalies, outliers, null and missing values. Format and clean
the data before analysis.
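A minimal sketch of this step in Python, assuming each row is a tuple whose last element is the target value and missing entries are stored as None or NaN. Flagging outliers with the 1.5 × IQR rule is one common convention, not the only one, and whether flagged rows are removed, corrected, or kept is a judgment call that depends on the business issue. `clean_rows` is a hypothetical helper; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def clean_rows(rows):
    """Drop incomplete rows, then split the rest into (kept, outliers)."""
    # a value is missing if it is None or NaN (NaN is the only value != itself)
    complete = [r for r in rows if all(v is not None and v == v for v in r)]

    # quartiles of the target column (last element of each row)
    q1, _, q3 = statistics.quantiles([r[-1] for r in complete], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    kept = [r for r in complete if low <= r[-1] <= high]
    outliers = [r for r in complete if not low <= r[-1] <= high]
    return kept, outliers
```

For rows `[(1, 10), (2, 11), (3, None), (4, 12), (5, 13), (6, 11), (7, 100)]`, the row with the missing target is dropped and `(7, 100)` is flagged as an outlier.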



Step 4 – Variable Selection:
This involves observing, identifying, and choosing the most suitable target and
predictor variables based on the business issue understanding.
The target variable is the variable that needs to be predicted, while the predictor
variable(s) are the variables best suited to predicting the target variable.
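One simple, data-driven aid for this step is to rank candidate predictors by the strength of their correlation with the target variable. The sketch below computes Pearson correlations in plain Python; the function names and the column names in the example are hypothetical. Correlation with the target is only a starting point: it ignores predictors that are strongly correlated with each other and should be weighed against the business issue understanding.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (spread_a * spread_b)


def rank_predictors(candidates, target):
    """Return (name, r) pairs sorted by |r|, strongest relationship first."""
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

For example, with target `[1, 2, 3, 4, 5]`, a hypothetical `ad_spend` column `[2, 4, 6, 8, 10]` (r = 1), `price` column `[5, 4, 3, 2, 1]` (r = −1) and `noise` column `[3, 1, 4, 1, 5]` (r ≈ 0.35), `noise` is ranked last. `price` ranks as high as `ad_spend` because a strong negative relationship is just as useful for prediction as a strong positive one.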
Step 5 – Model & Analysis:
Select the best regression model to use based on the available data and variable
types (simple linear or multiple linear regression) and run the regression analysis.
Step 6 – Results Interpretation & Visualization:
Interpret the regression results. Use correlation results, coefficient estimates,
p-values, and R-squared or adjusted R-squared values to determine the significance
and strength of the regression relationship.
Choose the reporting format (e.g. tables or charts) that best communicates the
regression results to the final consumers of the analysis.
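As one plain-text reporting option, the hypothetical helper below turns coefficient estimates, p-values and (adjusted) R-squared into short sentences for a non-technical audience. It assumes these values have already been produced by the regression software, and uses the 0.05 significance threshold from the notes as its default.

```python
def summarize(coefficients, r2, adj_r2, alpha=0.05):
    """coefficients maps each predictor name to (estimate, p_value)."""
    lines = [f"R-squared: {r2:.3f}, adjusted R-squared: {adj_r2:.3f}"]
    for name, (estimate, p_value) in coefficients.items():
        # sign of the estimate gives the direction of the relationship
        direction = "increase" if estimate >= 0 else "decrease"
        verdict = "significant" if p_value < alpha else "not significant"
        lines.append(
            f"{name}: a one-unit increase is associated with a {abs(estimate):.3f} "
            f"{direction} in the target, other predictors held constant "
            f"(p = {p_value:.3f}, {verdict} at the {alpha} level)"
        )
    return "\n".join(lines)
```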
