(Unit-04) Part-01 - ML Algo
(CSE3007)
Unit – 04 (Part-I)
The Main Idea of Least Squares and Linear Regression

So far the distance between the data points and the line is:
Finally…

To make the cost positive and more mathematically meaningful, each difference term is squared and the squares are added together to measure the fit:

Sum of squared differences = Σ (observed − predicted)² = 24.62
We want to minimize the sum of the squared distances between the observed values and the line y = mx + c, where m is the slope and c is the y-intercept. The best-fit line passes through the centroid (x̄, ȳ) = (3, 3.6).

Slope: m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 4/10 = 0.4

y-intercept: c = ȳ − m·x̄ = 3.6 − 0.4 × 3 = 2.4
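The closed-form slope and intercept above can be sketched in Python. The five-point dataset below is hypothetical (the slide's original data table did not survive extraction); it was chosen so the sums match the slide's numbers: Σ(xᵢ − x̄)(yᵢ − ȳ) = 4, Σ(xᵢ − x̄)² = 10, centroid (3, 3.6).

```python
def least_squares_fit(xs, ys):
    """Fit y = m*x + c by ordinary least squares using the closed-form
    formulas  m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    and       c = y_bar - m * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    m = num / den          # slope
    c = y_bar - m * x_bar  # y-intercept (line passes through the centroid)
    return m, c

# Hypothetical data matching the slide's sums (4/10 = 0.4, centroid (3, 3.6)).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 4, 4, 4]

m, c = least_squares_fit(xs, ys)
print(m, c)  # slope ≈ 0.4, intercept ≈ 2.4
```

Note that the fitted line always passes through (x̄, ȳ), which is why c can be recovered from the centroid once m is known.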
The Predicted Line…

ŷ = 0.4x + 2.4
Goodness of Fit – R²

What is R-squared?
R-squared is a statistical measure of how close the data are to the fitted
regression line.
It is also known as the coefficient of determination, or the coefficient of
multiple determination for multiple regression.
The definition of R-squared is fairly straightforward: it is the percentage of
the response-variable variation that is explained by a linear model.
R-squared = Explained variation / Total variation
Calculation of R²

R² ≈ 0.3
Interpretation of values of R²

R² = 1: the regression line is a perfect fit; predictions coincide with the actual values.
R² = 0: the model explains none of the variation; there is a large distance between the actual and predicted values.
Advantages and Disadvantages

Advantages:
• Linear regression performs exceptionally well for linearly separable data
• Easy to implement and interpret, and efficient to train
• It handles overfitting pretty well using dimensionality-reduction techniques, regularization, and cross-validation
• It allows extrapolation beyond a specific data set

Disadvantages:
• It assumes linearity between the dependent and independent variables
• It is often quite prone to noise and overfitting
• Linear regression is quite sensitive to outliers
• It is prone to multicollinearity
Solve it
• Use least-squares regression to fit a straight line to