Regression PPT

Linear regression is a machine learning technique used to model the relationship between independent variables (x) and a dependent variable (y). It finds the best fit linear equation to describe the relationship by minimizing the sum of the squared errors between observed and predicted values of y. Gradient descent is an iterative algorithm that can be used to minimize this error function and find the optimal coefficients for the linear regression model when there are multiple independent variables. Logistic regression is similar but uses a sigmoid function to transform its output into a probability value that can predict discrete class labels rather than continuous target variables.

Uploaded by Rakesh bhukya
© All Rights Reserved

Regression

Linear Regression
• Given a set of points (xᵢ, yᵢ).
• We want to fit a line y = ax + b that describes the trend.
• We define a cost function that computes the total squared error of our
predictions w.r.t. the observed values yᵢ, J(a, b) = Σᵢ (axᵢ + b − yᵢ)², which
we want to minimize.
• Viewing J as a function of a and b: compute both partial derivatives, set
them equal to zero, and solve for a and b.
• The coefficients you obtain give the minimum squared error.
• This can be done for a specific set of points, or in general to derive the
formulas.
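The derivative approach above can be sketched in a few lines: setting ∂J/∂a = 0 and ∂J/∂b = 0 yields a 2×2 linear system (the normal equations), which we solve directly. The data points here are illustrative, not taken from the slides:

```python
def fit_line(points):
    """Solve the normal equations obtained by setting dJ/da = 0 and dJ/db = 0."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # dJ/da = 0  ->  a*sxx + b*sx = sxy
    # dJ/db = 0  ->  a*sx  + b*n  = sy
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Illustrative points roughly along a rising trend
print(fit_line([(1, 2), (2, 3), (3, 5)]))
```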
Sum of Squared Error

Table 1: Sample Salary Data    Figure 1: Scatter Plot Diagram

• We cannot mathematically determine the years of experience.
• But we can determine / predict the salary column values (the dependent
variable) based on years of experience.
• If you look at the data, the dependent column values (Salary in 1000$)
increase or decrease with years of experience.
Sum of Squared Error

• In order to fit the best intercept line between the points in the above
scatter plot, we use a metric called the "Sum of Squared Errors" (SSE) and
compare candidate lines to find the best fit by reducing the error.
• The error for each point is the difference between the actual value and the
predicted value; SSE is the sum of the squares of these errors.
• To find the error for each dependent value, we use the formula below.
Sum of Squared Error

Table 3 SSE Calculation

The sum of squared errors (SSE) output is 5226.19. To get the best-fit intercept line, we need to apply a linear
regression model that reduces the SSE value to the minimum possible.
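As a sketch, the SSE comparison above can be computed directly; the (years, salary) pairs below are illustrative, not the values from Table 3:

```python
def sse(points, m, b):
    """Sum of squared errors of the candidate line y = m*x + b."""
    return sum((m * x + b - y) ** 2 for x, y in points)

# Illustrative (years of experience, salary in 1000$) pairs
data = [(1, 39), (3, 46), (5, 56), (7, 64)]

# The line with the smaller SSE is the better fit
print(sse(data, 4.0, 35.0))   # a close fit
print(sse(data, 2.0, 40.0))   # a much worse fit
```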
Sum of Squared Error

• To identify the slope and intercept, we use the equation y = mx + b
  – 'm' is the slope
  – 'x' → independent variable
  – 'b' is the intercept
• We will use the Ordinary Least Squares method to find the best-fit
intercept (b) and slope (m).

Ordinary Least Squares (OLS) Method

To use the OLS method, we apply the formula below to find the equation.
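The slide's formula appears as an image in the original; the standard OLS estimates it refers to are m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄, which can be sketched as:

```python
def ols(xs, ys):
    """Standard OLS slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    m = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    b = y_mean - m * x_mean
    return m, b

# Illustrative data, not the values from Table 4; generated from y = 2x + 1
print(ols([1, 2, 3, 4], [3, 5, 7, 9]))  # exact fit: m = 2, b = 1
```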
Sum of Squared Error

Table 4: OLS method calculations


Sum of Squared Error
• We can also find the equation in MS-Excel.
• First, select the data, then go to the Insert tab and, from the Charts
group, select Scatter plot.
• After getting the scatter plot, go to Add Chart Element (in the Chart
Layout group), select Trendline, go to More Trendline Options…, and tick
Display Equation on chart.
Gradient Descent Algorithm

• The Ordinary Least Squares method looks simple and its computation is easy.
But this OLS closed form is convenient mainly for a univariate dataset, i.e.
a single independent variable and a single dependent variable.
• A multivariate dataset contains multiple independent variables for a
dependent variable; for such datasets we often use an optimization
algorithm called "Gradient Descent".
• The main reason gradient descent is used for linear regression is
computational complexity: in some cases it is computationally cheaper
(faster) to find the solution using gradient descent.
• The univariate formula is simple, even computationally, because it involves
only one variable. In the multivariate case, with many variables, the
formula is more complicated on paper and requires much more computation
when implemented in software.
• Gradient descent is an iterative optimization algorithm for finding the
minimum of a function. Here, that function is our loss function.
Gradient Descent Algorithm

• The values of m and c are updated at each iteration to get the optimal
solution.
Gradient Descent Algorithm

• Imagine a valley and a person with no sense of direction who wants to get
to the bottom of the valley.
• He goes down the slope, taking large steps when the slope is steep and
small steps when the slope is less steep.
• He decides his next position based on his current position, and stops when
he reaches the bottom of the valley, which was his goal.
• Let's try applying gradient descent to m and c and approach it step by
step:
Gradient Descent Algorithm

1. Initially, let m = 0 and c = 0. Let L be our learning rate, which controls
how much the value of m changes with each step. L could be a small value
like 0.0001 for good accuracy.

2. Calculate the partial derivative of the loss function with respect to m,
and plug the current values of x, y, m and c into it to obtain the
derivative value Dm.
Gradient Descent Algorithm

Dm is the value of the partial derivative with respect to m. For the mean
squared error loss E = (1/n) Σᵢ (yᵢ − ŷᵢ)², with predictions ŷᵢ = mxᵢ + c:

Dm = (−2/n) Σᵢ xᵢ (yᵢ − ŷᵢ)

Similarly, the partial derivative with respect to c, Dc:

Dc = (−2/n) Σᵢ (yᵢ − ŷᵢ)

3. Now we update the current values of m and c using the following equations:

m = m − L × Dm
c = c − L × Dc
Gradient Descent Algorithm

4. We repeat this process until our loss function is a very small value or
ideally 0 (which means 0 error, or 100% accuracy). The values of m and c
that we are left with are the optimum values.
• Going back to our analogy, m can be considered the current position of the
person. D is equivalent to the steepness of the slope, and L is the speed
with which he moves.
• The new value of m that we calculate using the above equation is his next
position, and L×D is the size of the step he takes.
• When the slope is steeper (D is larger) he takes longer steps, and when it
is less steep (D is smaller) he takes smaller steps. Finally, he arrives at
the bottom of the valley, which corresponds to our loss = 0.
Now, with the optimum values of m and c, our model is ready to make
predictions!
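The four steps above can be sketched as follows, assuming the mean squared error loss E = (1/n) Σ (yᵢ − (m·xᵢ + c))²; the data and hyperparameter values are illustrative:

```python
def gradient_descent(xs, ys, L=0.0001, epochs=1000):
    m, c = 0.0, 0.0                      # step 1: start from m = 0, c = 0
    n = len(xs)
    for _ in range(epochs):
        preds = [m * x + c for x in xs]
        # step 2: partial derivatives of the MSE loss w.r.t. m and c
        d_m = (-2 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
        d_c = (-2 / n) * sum(y - p for y, p in zip(ys, preds))
        m -= L * d_m                     # step 3: m = m - L * Dm
        c -= L * d_c                     #         c = c - L * Dc
    return m, c                          # step 4: after enough iterations

# Illustrative data generated from y = 2x + 1
m, c = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9], L=0.01, epochs=20000)
print(m, c)  # approaches m = 2, c = 1
```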
Logistic Regression
• Logistic regression is a classification algorithm used to assign
observations to a discrete set of classes.
• Unlike linear regression, which outputs continuous numeric values, logistic
regression transforms its output using the logistic sigmoid function to
return a probability value which can then be mapped to two or more
discrete classes.

Comparison to linear regression

• Given data on time spent studying and exam scores, linear regression and
logistic regression can predict different things:
• Linear regression could help us predict a student's test score on a scale
of 0-100. Linear regression predictions are continuous (numbers in a range).
• Logistic regression could help us predict whether the student passed or
failed.
Types of Logistic Regression
• Binary (Pass/Fail)
• Multinomial (Cats, Dogs, Sheep)
• Ordinal (Low, Medium, High)

Binary Logistic Regression

• Given data on student exam results, our goal is to predict whether a
student will pass or fail based on the number of hours slept and hours
spent studying.
• We have two features (hours slept, hours studied) and two classes:
passed (1) and failed (0).
Binary Logistic Regression

Studied  Slept  Passed
4.85     9.63   1
8.62     3.23   0
5.43     8.23   1
9.21     6.34   0

Graphically, we could represent our data with a scatter plot.
Sigmoid Activation

• In order to map predicted values to probabilities, we use the sigmoid
function.
• The function maps any real value to a value between 0 and 1.
• In machine learning, we use the sigmoid to map predictions to probabilities.

S(z) = 1 / (1 + e^(−z))
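A minimal sketch of the sigmoid function S(z) = 1 / (1 + e^(−z)):

```python
import math

def sigmoid(z):
    """Map any real value z to a probability-like value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5: maximum uncertainty
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
```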
Decision boundary
Making predictions
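As a sketch of the two headings above: a prediction passes a weighted sum of the features through the sigmoid, then maps the probability to a class via a 0.5 decision boundary. The weights w1, w2 and bias below are hypothetical placeholders for illustration, not values fitted to the table:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(studied, slept, w1, w2, bias):
    """Return (probability of passing, predicted class at the 0.5 boundary)."""
    probability = sigmoid(w1 * studied + w2 * slept + bias)
    return probability, 1 if probability >= 0.5 else 0

# Hypothetical weights for illustration only; in practice they are learned
prob, label = predict(4.85, 9.63, w1=1.0, w2=1.0, bias=-12.0)
print(prob, label)
```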
