ML-Unit I - Linear Regression
● Problem statement:
  ○ For every review (given in terms of word count in the dataset) that will be posted on the e-commerce website, predict how many votes it will receive.
● Here, we have both input and output given in the dataset, so it is a supervised problem.
● Second, looking at the vote column (the output), it contains continuous values, not categorical values.
  ○ Hence, it is a regression problem.

Dataset:
Word Count   Vote
27           52
2            6
100          42
20           ??
One-dimensional data

Dataset:
Sl. No.   Vote
1         5
2         17
3         11
4         8
5         14
6         5

● Unfortunately, while storing the data we collected only the vote and not the review word count.
● So, this is the best data we have now, and we have to find what the vote for the next review will be.
● How will you predict the vote count for a future review from this data alone?
Data Visualization
(Scatter plot of the Sl. No. vs. Vote dataset.)
Best line for the given data
(Plot of the dataset with the horizontal line Ŷ = 10, the mean of the votes.)
“Mean”: Best line for the given data (Ŷ = 10)
● With only one variable and no other information, the best prediction for the next measurement is the mean itself.
● The variability in the vote can only be explained by the vote itself.
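A minimal NumPy sketch of this idea, using the vote values from the dataset above (the variable names are my own):

```python
import numpy as np

# Votes for the six reviews in the one-dimensional dataset.
votes = np.array([5, 17, 11, 8, 14, 5])

# With no other variable to condition on, the best constant
# prediction for the next review is the mean of the votes.
y_hat = votes.mean()
print(y_hat)  # 10.0
```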
“Goodness of fit” for the Vote
● Residuals are also known as errors.
● Residuals always add up to zero.
  ○ In this case, the residuals above the line (Ŷ = 10) sum to +12 and those below sum to −12.
Squaring the residuals
(Residuals R are measured from the line Ŷ = 10.)

Sl. No.   R    R²
1         −5   25
2         +7   49
3         +1   1
4         −2   4
5         +4   16
6         −5   25

Sum of squared errors (SSE) = 120
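The residual table can be reproduced in a few lines; a short sketch following the same steps as the slide:

```python
import numpy as np

votes = np.array([5, 17, 11, 8, 14, 5])
y_hat = votes.mean()            # the fitted line: Ŷ = 10

residuals = votes - y_hat       # [-5, 7, 1, -2, 4, -5]
print(residuals.sum())          # 0.0 — residuals around the mean always cancel
print((residuals ** 2).sum())   # 120.0 — the SSE from the table
```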
Important points
● The goal of simple linear regression is to create a linear model that minimizes the sum of squares of the residuals/errors (SSE).
● When conducting simple linear regression with two variables, we determine how well a line “fits” the data by comparing it to this type of line, where we pretend the second variable does not exist.
● If a two-variable regression model looks like this example, the other variable does nothing to explain the dependent variable.
Important points
● Simple linear regression is really a comparison of two models.
  ○ One is where the independent variable does not even exist.
  ○ The other uses the best-fit regression line.
● If there is only one variable in the dataset, the best prediction is given by the mean of the dependent variable.
● The differences between the best-fit line and the observed values are called the residuals (or errors).
● The residuals are squared and summed together to give the sum of squared residuals/errors (SSE).
● Simple linear regression is designed to find the best-fitting line through the data that minimizes the SSE.
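To make the “comparison of two models” concrete, here is a sketch on the word-count/vote dataset used throughout these slides. np.polyfit stands in for the best-fit line; the printed values are computed here, not taken from the slides:

```python
import numpy as np

word_count = np.array([27, 2, 100, 40, 14])
votes = np.array([52, 6, 42, 38, 30])

# Model 1: pretend the independent variable does not exist — predict the mean.
sse_mean = ((votes - votes.mean()) ** 2).sum()

# Model 2: the best-fit regression line (np.polyfit does a least-squares fit).
m, c = np.polyfit(word_count, votes, deg=1)
sse_line = ((votes - (m * word_count + c)) ** 2).sum()

print(sse_mean, sse_line)  # the line's SSE is never larger than the mean's
```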
Linear Regression with independent variable
● Linear regression is a statistical method of finding the relationship between independent and dependent variables.
● Why do we call them independent and dependent variables?
  ○ Our independent variable is independent because we cannot mathematically determine the years of experience.
  ○ But we can determine/predict the salary column values (the dependent variable) based on years of experience.
Linear Regression with independent variable
● If you look at the data, the dependent column values (Salary in 1000$) increase/decrease based on years of experience.
● Total Sum of Squares (SST):
  ○ The SST is the sum of all squared differences between the individual values in a sample and the mean of that sample. It is represented mathematically as:
    SST = Σ (yᵢ − ȳ)²
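A small helper makes the definition concrete. Note that for the one-variable vote data the SST equals the SSE of the mean-only model:

```python
import numpy as np

def sst(y):
    """Total sum of squares: squared deviations from the sample mean."""
    y = np.asarray(y, dtype=float)
    return ((y - y.mean()) ** 2).sum()

print(sst([5, 17, 11, 8, 14, 5]))  # 120.0 — same as the mean model's SSE
```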
Linear Regression with independent variable
● Least-squares slope and intercept for the experience/salary data:
  m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 1037.8 / 216.19 = 4.80
  b = ȳ − m·x̄ = 45.44 − 4.80 × 7.56 = 9.15
Hence,
  y = mx + b → y = 4.80x + 9.15
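The slide does not list the raw experience/salary values, so here is a generic least-squares helper using the same formulas; data whose sums are 1037.8 and 216.19 would reproduce m ≈ 4.80 and b ≈ 9.15:

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form simple linear regression: returns slope m and intercept b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    m = (dx * dy).sum() / (dx ** 2).sum()   # Σ(x-x̄)(y-ȳ) / Σ(x-x̄)²
    b = y.mean() - m * x.mean()             # b = ȳ - m·x̄
    return m, b
```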
Ordinary Least Square (OLS) Linear Regression
● Let’s compare our OLS result with MS Excel.
● Yes, we can test our linear regression best-fit line in Microsoft Excel: it gives y = 4.79x + 9.18, which matches our hand calculation up to rounding.
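np.polyfit can serve as a similar independent cross-check of the hand calculation. Shown here on the word-count/vote data from these slides, since the salary values are not listed; ols_fit is the helper defined above:

```python
import numpy as np

x = np.array([27, 2, 100, 40, 14])
y = np.array([52, 6, 42, 38, 30])

# Both should agree up to floating-point error.
print(ols_fit(x, y))            # hand-rolled closed form (defined above)
print(np.polyfit(x, y, deg=1))  # NumPy's least-squares fit
```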
Ordinary Least Square (OLS) Linear Regression
● Let us calculate the SSE again using our fitted equation.
Linear Regression
● Now, let's come back to our main dataset and find the relationship between Word Count (X) and Vote (Y).

Dataset:
Word Count   Vote
27           52
2            6
100          42
40           38
14           30
20           ??
OLS Linear Regression
● We know that a linear relationship can be obtained by drawing a straight line through Word Count (X) and Vote (Y), which is given as:
  Y = mx + c
  where
    m = slope
    c = intercept
● The first step is to compute the averages of X and Y:
  Avg(x) = (27 + 2 + 100 + 40 + 14) / 5 = 36.6
  Avg(y) = (52 + 6 + 42 + 38 + 30) / 5 = 33.6
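Continuing with the least-squares formulas from the salary example (these values are computed here, not taken from the slides):

m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 1407.2 / 5831.2 ≈ 0.24
c = ȳ − m·x̄ = 33.6 − 0.24 × 36.6 ≈ 24.8

So the fitted line is approximately Y = 0.24x + 24.8, which predicts about 0.24 × 20 + 24.8 ≈ 29.6 votes for the review with word count 20.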
Linear Regression: choosing the best line
● But is it the best line?
  ○ We can get multiple lines if we change the values of m and c in the equation Y = mx + c.
● To get the best line we will use the gradient descent algorithm.
● Idea: choose m and c such that f(x) is close to y for our training examples (x, y).
● Therefore, we need to minimize the difference between f(x) and y, as the sketch below illustrates.
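A sketch of the “multiple lines” idea: try a few (m, c) pairs on the training data and see which gives the smallest total squared difference. The candidate values here are arbitrary; only the middle pair is near the least-squares solution:

```python
import numpy as np

x = np.array([27, 2, 100, 40, 14])
y = np.array([52, 6, 42, 38, 30])

# A few arbitrary candidate lines Y = m*x + c.
for m, c in [(0.1, 30.0), (0.24, 24.8), (0.5, 10.0)]:
    sse = ((y - (m * x + c)) ** 2).sum()
    print(f"m={m}, c={c}: SSE={sse:.1f}")
# Gradient descent automates this search over m and c.
```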
Re-writing the hypothesis for finding the best line
● The best regression line is the one for which we get the least error.
● Objective: of all possible lines, find the one that minimizes the distance between the predicted y values (on the line) and the true y values.
  Hypothesis function: h(x) = w0 + w1x
Cost function: Mean Squared Error
● Objective: of all possible lines, find the one that minimizes the distance between the predicted y values (on the line) and the true y values.
  Hypothesis function: h(x) = w0 + w1x
  Cost function (MSE): J(w0, w1) = (1/n) Σᵢ (h(xᵢ) − yᵢ)²
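In code, the cost for a given (w0, w1) is just the averaged squared error of the hypothesis; a minimal sketch, evaluated on the word-count/vote data:

```python
import numpy as np

def cost(w0, w1, x, y):
    """Mean squared error of the hypothesis h(x) = w0 + w1*x."""
    return ((w0 + w1 * x - y) ** 2).mean()

x = np.array([27, 2, 100, 40, 14], dtype=float)
y = np.array([52, 6, 42, 38, 30], dtype=float)
print(cost(24.8, 0.24, x, y))  # cost near the least-squares line
```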
(Plots of the hypothesis h(x) = w0 + w1x and its cost for W1 = 0.5, W1 = 0, and W1 = 2.)
Cost Function
Cost function (MSE) as a function of two parameters (W0, W1)
(Plots of the cost J(W0, W1) over the two parameters.)
Gradient Descent
● We want to find the line that best fits the
data, i.e., we want to find w0 and w1 that
minimize the cost, J(w0,w1).
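A minimal gradient-descent sketch for this cost, assuming the standard update rule w := w − α·∂J/∂w; the learning rate and iteration count are illustrative choices, not values from the slides:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.0003, iters=100_000):
    """Minimize the MSE cost J(w0, w1) for h(x) = w0 + w1*x."""
    w0, w1 = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        error = (w0 + w1 * x) - y                    # h(x_i) - y_i for every example
        w0 -= alpha * (2.0 / n) * error.sum()        # partial derivative dJ/dw0
        w1 -= alpha * (2.0 / n) * (error * x).sum()  # partial derivative dJ/dw1
    return w0, w1

x = np.array([27, 2, 100, 40, 14], dtype=float)
y = np.array([52, 6, 42, 38, 30], dtype=float)
print(gradient_descent(x, y))  # should approach the closed-form least-squares line
```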