Regression PPT
Linear Regression
• Got a bunch of points $(x_i, y_i)$.
• Want to fit a line $y = ax + b$ that describes the trend.
• We define a cost function that computes the total squared error of our
predictions w.r.t. the observed values $y_i$, $J(a, b) = \sum_i (a x_i + b - y_i)^2$, that we want to
minimize.
• See it as a function of $a$ and $b$: compute both partial derivatives, set them equal
to zero, and solve for $a$ and $b$.
• The coefficients you get give you the minimum squared error.
• Can do this for specific points, or in general to derive the closed-form formulas (a sketch follows below).
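A minimal sketch of that closed-form recipe in Python (the data points are made up for illustration; setting dJ/da = 0 and dJ/db = 0 yields the expressions used below):

```python
import numpy as np

# Made-up example points (x_i, y_i)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Solving dJ/db = 0 gives b = y_mean - a * x_mean; substituting it into
# dJ/da = 0 gives the slope as a ratio of centered sums.
x_mean, y_mean = x.mean(), y.mean()
a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - a * x_mean

print(f"fitted line: y = {a:.3f}x + {b:.3f}")  # minimizes J(a, b)
```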
Sum of Squared Error
• In order to fit the best line through the points in the scatter plot, we use a
metric called the "Sum of Squared Errors" (SSE) and
• compare candidate lines to find the best fit by reducing the error. The error is
the sum of the squared differences between the actual values and the predicted values.
• To find the error for each dependent value, we use the formula
below: $SSE = \sum_i (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the value predicted by the line.
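In code, this metric is a couple of lines (the function name and arguments are illustrative):

```python
import numpy as np

def sse(y_actual, y_predicted):
    """Sum of the squared differences between observed and predicted values."""
    residuals = np.asarray(y_actual) - np.asarray(y_predicted)
    return float(np.sum(residuals ** 2))
```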
Sum of Squared Error
For the example data, the sum of squared errors (SSE) comes out to 5226.19. To find the best-fit line,
we apply a linear regression model that makes the SSE as small as possible.
Sum of Squared Error
We will use the Ordinary Least Squares (OLS) method to find the best-fit line's
intercept (b) and slope (m).
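For a single input variable, the OLS estimates have a well-known closed form, with $\bar{x}$ and $\bar{y}$ denoting the sample means:

$m = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\bar{x}$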
• The main reason gradient descent is used for linear regression is
computational complexity: in some cases it is computationally cheaper (faster) to
find the solution using gradient descent.
• The formula above looks very simple, even computationally,
because it only works for the univariate case, i.e. when you have only one
variable. In the multivariate case, when you have many variables, the
formula is slightly more complicated on paper and requires much more
calculation when you implement it in software, as shown below:
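Presumably the multivariate formula in question is the normal equation, where X is the n × k matrix of inputs and y the vector of observed values:

$\hat{\beta} = (X^T X)^{-1} X^T y$

Forming and inverting $X^T X$ costs roughly $O(nk^2 + k^3)$, which is the computation gradient descent avoids.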
• Picture a person walking down into a valley: he goes down the slope and takes large steps when the slope is steep
and small steps when the slope is less steep.
• He decides his next position based on his current position, and stops
when he gets to the bottom of the valley, which was his goal.
1. Initially, let m = 0 and c = 0. Let L be our learning rate. This controls how
much the value of m changes with each step. L could be a small value
like 0.0001 for good accuracy.
2. Calculate the partial derivatives of the loss function $E = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, where $\hat{y}_i = m x_i + c$ is the predicted value.
$D_m = \frac{-2}{n}\sum_i x_i (y_i - \hat{y}_i)$ is the value of the partial derivative with respect to m. Similarly, the
partial derivative with respect to c is $D_c = \frac{-2}{n}\sum_i (y_i - \hat{y}_i)$.
3. Now we update the current values of m and c using the following equations: $m \leftarrow m - L \cdot D_m$ and $c \leftarrow c - L \cdot D_c$.
Gradient Descent Algorithm
4. We repeat this process until our loss function is a very small value or
ideally 0 (which means 0 error, or 100% accuracy). The values of m and c
that we are left with will be the optimum values.
• Now, going back to our analogy, m can be considered the current position of
the person. D is equivalent to the steepness of the slope, and L can be the
speed with which he moves.
• The new value of m that we calculate using the above equation will be
his next position, and L×D will be the size of the step he takes.
• When the slope is steeper (D is larger) he takes longer steps, and when it
is less steep (D is smaller) he takes smaller steps. Finally he arrives at the
bottom of the valley, which corresponds to our loss = 0.
Now, with the optimum values of m and c, our model is ready to make
predictions! The sketch below puts the four steps together.
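A minimal, self-contained sketch of the loop (the data and hyperparameters are illustrative, not taken from the slides):

```python
import numpy as np

# Illustrative data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

m, c = 0.0, 0.0   # step 1: start with m = 0 and c = 0
L = 0.0001        # learning rate
n = float(len(X))

for _ in range(100_000):                         # step 4: repeat until converged
    Y_pred = m * X + c                           # current predictions
    D_m = (-2.0 / n) * np.sum(X * (Y - Y_pred))  # step 2: dE/dm
    D_c = (-2.0 / n) * np.sum(Y - Y_pred)        # step 2: dE/dc
    m -= L * D_m                                 # step 3: update m
    c -= L * D_c                                 # step 3: update c

print(f"m = {m:.4f}, c = {c:.4f}")  # approximately optimum values
```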
Logistic Regression
• Logistic regression is a classification algorithm used to assign observations to a
discrete set of classes.
Linear regression could help us predict a student's test score on a scale of 0-100;
its predictions are continuous (numbers in a range).
Logistic regression could help us predict whether the student passed or failed; its predictions are discrete (specific classes only).
Types of Logistic Regression
• Binary (Pass/Fail)
• Multinomial (more than two unordered classes)
• Ordinal (more than two ordered classes)
$S(z) = \dfrac{1}{1 + e^{-z}}$
Decision boundary
Making predictions
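A minimal sketch tying the sigmoid, the decision boundary, and prediction together (the 0.5 threshold is the conventional default, and the weights are made up for illustration, not fitted values):

```python
import numpy as np

def sigmoid(z):
    """S(z) = 1 / (1 + e^(-z)): squashes any score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Classify as 1 (e.g. Pass) when the probability crosses the decision boundary."""
    probability = sigmoid(np.dot(features, weights) + bias)
    return probability >= threshold

# Illustrative usage: one feature (hours studied), made-up weight and bias.
print(predict(np.array([4.0]), weights=np.array([1.2]), bias=-3.0))  # True -> Pass
```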