QT - Unit 2 - Part B - Regression
REGRESSION
Regression Analysis
Establishing Correlation is a prerequisite for Linear Regression. We
can't use Linear Regression unless there is a Linear Correlation.
Correlation analysis describes the present or past situation. It
uses Sample data to infer a property of the source Population or
Process. There is no looking into the future. Linear Regression is
used to predict results.
Correlation analysis studies whether the variables under study are related and, if so,
to what degree. Correlation analysis does not attempt to identify a
cause-effect relationship; regression does.
In Correlation, we ask to what degree the plotted data forms a
shape that seems to follow an imaginary line that would go
through it. But we don't try to specify that line. In Linear
Regression, that line is the whole point. We calculate a best-fit
line through the data: y = a + bx.
Regression analysis establishes the “nature of relationship”
between the variables. It studies the functional relationship and
provides a mechanism for prediction or forecasting.
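As a minimal sketch of how the best-fit line y = a + bx can be used for forecasting, the Python below fits a line to a small made-up dataset and predicts y for a new value of x. The data (hours, marks) and the function name fit_line are illustrative assumptions, not taken from these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: hours studied (x) and marks obtained (y).
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]

a, b = fit_line(hours, marks)
forecast = a + b * 6  # predicted marks for 6 hours of study
print(f"y = {a:.2f} + {b:.2f}x, forecast at x = 6: {forecast:.2f}")
```

Correlation would only report how closely these points follow a line; the regression step is what produces the line itself and the forecast.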
Regression analysis is a statistical method to model the relationship
between a dependent (target or outcome) variable and one or more
independent (predictor) variables.
Regression analysis helps us to understand how the value of the dependent
variable is changing corresponding to an independent variable when other
independent variables are held fixed. It predicts continuous/real values
such as temperature, age, salary, price, etc.
The green points shown in the graph are actual data points.
Least Squares Method
Line of Best Fit/Regression Line
The least-squares regression method is a technique commonly used in
Regression Analysis. It is a mathematical method used to find the best fit
line that represents the relationship between an independent and
dependent variable in such a way that the error is minimized.
The Line of best fit is drawn across a scatter plot of data points in order to
represent a relationship between those data points.
The least squares method is one of the most effective ways to draw
the line of best fit. It is based on the idea that the sum of the squares of
the errors (residuals) must be made as small as possible, hence the name
least squares method.
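The "least squares" idea can be checked numerically. The sketch below, on made-up data, fits the line and then compares its sum of squared errors with that of a few slightly shifted lines. The helper names (fit_line, sum_squared_errors) and the particular shifts are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared residuals of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_errors(a, b, xs, ys)

# Nudge the intercept and/or slope away from the fitted values.
shifts = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2), (0.3, -0.1)]
nudged = [sum_squared_errors(a + da, b + db, xs, ys) for da, db in shifts]

print("SSE of fitted line:", round(best, 4))
print("SSE of nudged lines:", [round(v, 4) for v in nudged])
```

Every nudged line gives a larger sum of squared errors than the fitted one, which is what "best fit" means here.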
Regression Line
A regression line is a straight line that reflects the best-fit relationship
between the independent and dependent variables in a dataset. It is used to
describe that relationship and to predict the value of one variable from the
other.
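The lines of regression of Y on X and of X on Y are lines of best fit in the
least-squares sense. The line of Y on X minimizes the sum of the squares of the
vertical distances from the observed points to the line, while the line of X on Y
minimizes the sum of the squares of the horizontal distances.
When X is known and Y is to be predicted, the line of Y on X is used.
When Y is known and X is to be predicted, the line of X on Y is used.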
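The two regression lines can be sketched side by side. The code below, on made-up data, computes the line of Y on X (y = a_yx + b_yx * x) and the line of X on Y (x = a_xy + b_xy * y), then uses each for the prediction it is meant for. The function and variable names are illustrative assumptions.

```python
def regression_lines(xs, ys):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for the two regression lines."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = s_xy / s_xx  # slope of Y on X (vertical errors minimized)
    b_xy = s_xy / s_yy  # slope of X on Y (horizontal errors minimized)
    a_yx = mean_y - b_yx * mean_x
    a_xy = mean_x - b_xy * mean_y
    return (a_yx, b_yx), (a_xy, b_xy)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

(a_yx, b_yx), (a_xy, b_xy) = regression_lines(xs, ys)

y_when_x_is_4 = a_yx + b_yx * 4      # X known, Y predicted: use Y on X
x_when_y_is_4_5 = a_xy + b_xy * 4.5  # Y known, X predicted: use X on Y

print("Y on X:", a_yx, b_yx, "-> y at x = 4:", y_when_x_is_4)
print("X on Y:", a_xy, b_xy, "-> x at y = 4.5:", x_when_y_is_4_5)
```

The two lines generally differ; they coincide only when the points lie exactly on a straight line. Both pass through the point of means (mean of X, mean of Y).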
Assumptions of (Simple Linear) Regression
We make a few assumptions when we use linear regression to model the
relationship between independent and dependent variables. These
assumptions are essentially conditions that should be met before we draw
inferences regarding the model estimates or before we use a model to make a
prediction.
Regression fails to deliver good results with data sets that do not fulfil its
assumptions. Therefore, for a successful regression analysis, it is essential to
validate these assumptions.
1. Linear relationship - There should be a linear relationship between the dependent
(response) variable and the independent (predictor) variable(s).
2. Normality of errors - The errors or residuals must be normally distributed.
3. Homoscedasticity (or equal variance around the line) - The error terms must
have constant variance.
4. No multicollinearity - The independent variables should not be highly correlated
with one another. This applies when there is more than one independent variable.
5. No autocorrelation - The residuals (errors) should not be correlated with one another.
The presence of correlation in the error terms drastically reduces the model's accuracy.
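Several of these assumptions are checked by looking at the residuals of the fitted line. The sketch below, on made-up data, computes the residuals and the Durbin-Watson statistic, one common check for autocorrelation that these notes do not name and that is used here as an assumption. Values near 2 suggest little first-order autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation. With only five points the result is purely illustrative.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def durbin_watson(res):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    den = sum(e * e for e in res)
    return num / den


# Hypothetical data, listed in the order the observations were taken.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
res = [y - (a + b * x) for x, y in zip(xs, ys)]  # observed minus fitted
dw = durbin_watson(res)

print("residuals:", [round(e, 3) for e in res])
print("sum of residuals:", round(sum(res), 6))
print("Durbin-Watson statistic:", round(dw, 3))
```

Plotting the same residuals against x is the usual informal check for linearity and constant variance: the points should scatter evenly around zero with no visible pattern.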
Coefficient of Regression
Regression Equations/Lines
Regression Coefficient – Some Formulas
1. From Original Data
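The formulas for this heading did not survive in these notes. As a sketch, assuming the standard textbook expressions computed directly from the original (raw) values,
b_yx = (n Σxy - Σx Σy) / (n Σx² - (Σx)²) and b_xy = (n Σxy - Σx Σy) / (n Σy² - (Σy)²),
the Python below evaluates both regression coefficients on made-up data. It also uses the standard relation that the correlation coefficient r is the square root of b_yx * b_xy, carrying the common sign of the two coefficients.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from raw sums of the original data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)  # coefficient of Y on X
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)  # coefficient of X on Y
    return b_yx, b_xy


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(xs, ys)
# r is the geometric mean of the two coefficients, with their shared sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print("b_yx =", b_yx, " b_xy =", b_xy, " r =", round(r, 4))
```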
7. Correlation: Correlation analysis is confined only to the study of linear
   relationship between the variables and therefore has limited applications.
   Regression: Regression analysis has much wider applications as it studies
   linear as well as non-linear relationship between the variables.
8. Correlation: Correlation coefficient is independent of change of origin
   and scale.
   Regression: Regression coefficients are independent of only change of
   origin but not of scale.
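The point about change of origin and scale can be verified numerically. In the sketch below, the made-up data are transformed as u = (x - A) / h and v = (y - B) / k, with A, B, h and k chosen arbitrarily for illustration (h and k positive). The correlation coefficient stays the same, while the regression coefficient of Y on X is unchanged by the shift alone but becomes b_yx * h / k once the scales change.

```python
import math


def slope_and_r(xs, ys):
    """Return (b_yx, r) for the given paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data and arbitrary transformation constants.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
A, B = 10, 100  # change of origin
h, k = 2, 5     # change of scale

b, r = slope_and_r(xs, ys)

# Change of origin only: u = x - A, v = y - B.
b_shift, r_shift = slope_and_r([x - A for x in xs], [y - B for y in ys])

# Change of origin and scale: u = (x - A) / h, v = (y - B) / k.
b_scaled, r_scaled = slope_and_r(
    [(x - A) / h for x in xs], [(y - B) / k for y in ys]
)

print("original:        b_yx =", round(b, 4), " r =", round(r, 4))
print("origin shifted:  b_yx =", round(b_shift, 4), " r =", round(r_shift, 4))
print("origin + scale:  b_yx =", round(b_scaled, 4), " r =", round(r_scaled, 4))
```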
• It provides a functional relationship between two or more related variables with the
help of which we can easily estimate or predict the unknown values of one variable
from the known values of another variable.
• It provides a valuable tool for measuring and estimating the cause and effect
relationship among the economic variables that constitute the essence of economic
theory and economic life. It is widely used in the estimation of Demand curves, Supply
curves, Production functions, Cost functions, Consumption functions, etc.
• This technique is also widely used in day-to-day life and in sociological studies to
estimate various factors such as birth rate, death rate, tax rate, yield rate, etc.
• Last but not least, the regression analysis technique gives us an idea about the
relative variation of a series.
Pitfalls of Correlation & Regression Analysis