
Linear Models for

Regression and Classification


Dr. Saptarshi Ghosh
Department of Computer Science and Engineering
IIT Kharagpur
http://cse.iitkgp.ac.in/~saptarshi/

Hands-on approach to AI for real-world applications


Regression

Often we want to predict a particular property/value of an entity, given some other
properties/values.
Examples:
- Marks in Physics of a student in the Class 12 finals, given his/her marks in
  practice tests
- The price of houses in a city, given their area, number of rooms, …

Such problems, where we try to predict a continuous value, are called Regression problems.



Dataset of house Area vs Price in a city

For simplicity, let us assume the price depends only on one factor - the area of the house.

How can we learn to predict the prices of houses of other sizes in the city,
as a function of their area?
Dataset of house Area vs Price in a city

Example of a
supervised learning problem!

This is a training set.

When the target variable we are trying to predict is continuous: regression problem



Dataset of house Area vs Price in a city

m = number of training examples
x's = input variables / features
y's = output variables / "target" variables
(x(i), y(i)) = a single training example, where i is an index into the training set



How to train a model?

Training Set → Learning Algorithm → h

Learn a function h(x), so that h(x) is a good predictor for the corresponding value of y.

h: hypothesis function

x (house area) → h → y’ (predicted price)

Acknowledgement: Some of the images are taken from course materials of Professor Andrew Ng
Linear Regression



Hypothesis for linear regression

Hypothesis: hw(x) = w0 + w1·x

● wi are the parameters:
  - w0 is the zero condition (bias)
  - w1 is the gradient (slope)
  w = [w0 w1] is the vector of all parameters
  (For simplicity, we assume only one input variable x for now)

● We assume y is a linear function of x

● How to learn the values of the parameters?
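As an illustrative sketch (the function and variable names here are our own, not from the slides), the univariate hypothesis can be coded directly:

```python
def h(w0, w1, x):
    """Univariate linear hypothesis: predicted y for input x."""
    return w0 + w1 * x

# e.g., with w0 = 50 and w1 = 0.1, a 1000 sq.ft. house is predicted at about 150
```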
Intuition for the hypothesis function

Which is the best straight line to fit the data?

Choose the parameters such that the prediction is closest to
the actual y-value for all the training examples.



Try to minimize the prediction error

(x, y): a training example
(x, hw(x)): prediction of the model
( hw(x) – y ): prediction error for this particular training example

➢ Minimize the prediction error across all training examples



Cost function

➢ Measure of how close the predictions are to the actual y-values
➢ Average over all the m training instances
➢ Mean Squared Error (MSE) cost function:
   E(w) = (1/2m) · Σi=1..m ( hw(x(i)) − y(i) )²
➢ Choose parameters w so that E(w) is minimized
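A minimal sketch of the MSE cost (names are illustrative; the 1/2 factor follows the common convention that simplifies the gradient):

```python
def mse_cost(w0, w1, xs, ys):
    """Mean squared error E(w) over m training examples,
    with the conventional 1/2 factor."""
    m = len(xs)
    return sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
```

A perfect fit gives zero cost; any prediction error makes the cost strictly positive.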



Recap

Hypothesis: hw(x) = w0 + w1·x

Parameters: w = [ w0, w1 ]

MSE Cost function: E(w) = (1/2m) · Σi=1..m ( hw(x(i)) − y(i) )²

We want to find those values of w for which the cost function is minimized.
For simplicity, let us for now assume w0 = 0

Hypothesis: hw(x) = w1·x

Parameter: w1 only

MSE Cost function: E(w1) = (1/2m) · Σi=1..m ( w1·x(i) − y(i) )²

We want to find that value of w1 for which the cost function is minimized.
Now let us again let both w0 and w1 vary

Hypothesis: hw(x) = w0 + w1·x

Parameters: w = [ w0, w1 ]

MSE Cost function: E(w) = (1/2m) · Σi=1..m ( hw(x(i)) − y(i) )²

We want to find those values of w for which the cost function is minimized.



Contour Plot of Cost Function

(Contour plot of E(w) as a function of the parameters w0 and w1)
Gradient Descent for minimizing a cost function

● Efficient and scalable to thousands of parameters


● Used in many applications of minimizing functions (applicable to any
function in general, not only cost functions)

Basic algorithm (iterative method):


● Start with some w
● Keep changing w to reduce E(w) until we (hopefully) end up at a
minimum



Issues with Gradient Descent
If a function has multiple local minima, the starting point can determine which
minimum is reached.

Gradient Descent can get stuck at a local minimum.

The MSE cost function in linear regression is always a convex function:
● it always has a single minimum
● gradient descent always converges
Gradient descent algorithm

Repeat until convergence {
    wj := wj − α · ∂E(w)/∂wj    (for j = 0 and j = 1)
    (Simultaneously update w0 and w1)
}

α is the learning rate (will be discussed soon)



Gradient descent for single variable (for simplicity)

➢ If the derivative is positive, reduce the value of w1
➢ If the derivative is negative, increase the value of w1
The learning rate α
● Do we need to change the learning rate over time?
  ○ No. Gradient descent can converge to a local minimum even with the
    learning rate α fixed
  ○ The step size adjusts automatically (the gradient shrinks near a minimum)

● But the value needs to be chosen judiciously
  ○ If α is too small, gradient descent can be slow to converge
  ○ If α is too large, gradient descent can overshoot the minimum.
    It may fail to converge, or even diverge!

● Gradient descent can get stuck in a local minimum for a non-convex cost function!
Gradient for MSE cost function

∂E(w)/∂w0 = (1/m) · Σi=1..m ( hw(x(i)) − y(i) )
∂E(w)/∂w1 = (1/m) · Σi=1..m ( hw(x(i)) − y(i) ) · x(i)


Gradient descent for univariate linear regression

Repeat until convergence {
    w0 := w0 − α · (1/m) · Σi=1..m ( hw(x(i)) − y(i) )
    w1 := w1 − α · (1/m) · Σi=1..m ( hw(x(i)) − y(i) ) · x(i)
    (Simultaneously update w0 and w1)
}


Multivariate Linear Regression (multiple variables)

Size (feet²)   #Bedrooms   #Floors   Age of home (years)   Price ($1000)
2104           5           1         45                    460
1416           3           2         40                    232
1534           3           2         30                    315
852            2           1         36                    178
…              …           …         …                     …
Multivariate Linear Regression
Features x1, x2, x3, x4 → target y

Hypothesis: hw(x) = w0 + w1·x1 + w2·x2 + … + wn·xn

Take x0 = 1 (for easier notation), so that hw(x) = wT x
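A minimal sketch of the multivariate hypothesis with the x0 = 1 convention (function name is illustrative):

```python
def predict(w, x):
    """Multivariate linear hypothesis: h_w(x) = w . [1, x1, ..., xn].
    Prepending x0 = 1 folds the bias w0 into the dot product."""
    x = [1.0] + list(x)
    return sum(wj * xj for wj, xj in zip(w, x))
```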



Gradient descent for multivariate linear regression
Repeat {
    wj := wj − α · (1/m) · Σi=1..m ( hw(x(i)) − y(i) ) · xj(i)
    (simultaneously update wj for j = 0, 1, …, n)
}
Practical aspects of applying gradient descent

Feature Scaling: Make sure features are on a similar scale. If features span very
different numeric ranges, gradient descent updates will be dominated by the numerically
larger features.
E.g. x1 = area, between 300 - 5000 sq.ft.
     x2 = #bedrooms, between 1 - 5

Normalization strategies:
- Divide by the maximum value of the feature
- Min-Max Scaling
- Mean normalization: replace xj with (xj − μj) to make features have approximately zero mean
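Two of the strategies above can be sketched as follows (per-feature helpers; names are illustrative):

```python
def min_max_scale(values):
    """Min-max scaling: map each feature value into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mean_normalize(values):
    """Subtract the mean so the feature has approximately zero mean,
    then divide by the range to bring it onto a similar scale."""
    mu = sum(values) / len(values)
    lo, hi = min(values), max(values)
    return [(v - mu) / (hi - lo) for v in values]
```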
Practical aspects of applying gradient descent
Is gradient descent working properly?
- Plot how E(w) changes with every iteration of gradient descent
- For sufficiently small learning rate, E(w) should decrease with every iteration
- If not (e.g., fluctuating), learning rate needs to be reduced
- However, too small learning rate means slow convergence!

When to end gradient descent?


- Run a fixed number of iterations
- Declare convergence if E(w) decreases by less than ε in an iteration
(assuming E(w) is decreasing in every iteration)
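The ε-based stopping rule above can be sketched as a generic helper (an illustrative sketch; `step` and `cost` are assumed callables, not from the slides):

```python
def run_until_converged(step, w, cost, eps=1e-6, max_iters=10000):
    """Run gradient-descent steps until E(w) decreases by less than eps
    in an iteration (assuming the cost decreases every iteration),
    or until max_iters is reached."""
    prev = cost(w)
    for _ in range(max_iters):
        w = step(w)          # one gradient-descent update
        cur = cost(w)
        if prev - cur < eps: # declare convergence
            break
        prev = cur
    return w
```

For example, minimizing E(w) = w² with the step w ← w − 0.1·(2w) = 0.8·w drives w toward 0.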



Polynomial Regression for multiple variables
A good way of fitting a non-linear curve using the linear regression mechanism.

y’ = w0 + w1·x      →      y’ = w0 + w1·x + w2·t,  where t = x²

Can also be multivariate:
y’ = w0 + (w11·x1 + w12·x1² + w13·x1³ + …) + (w21·x2 + w22·x2² + w23·x2³ + …) + …
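The feature-expansion trick can be sketched as follows (illustrative helper, not from the slides): expand each input into its powers, then fit an ordinary linear model over the expanded features.

```python
def poly_features(x, degree):
    """Expand a single input x into [x, x^2, ..., x^degree], so that a
    linear model over these features fits a polynomial curve."""
    return [x ** d for d in range(1, degree + 1)]
```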



Logistic Regression
(for classification)



Examples of classification

● Loan Application: Approve / Deny
● Email: Spam / Not Spam
● Face Recognition: Valid user / Unknown user
● Tumour: Malignant / Benign

Regression: predict a continuous value
Classification: predict a discrete label

Let's consider a simple loan application problem:
x = credit score ϵ ℤ+
y ϵ {0, 1}    0: “Negative class” (deny)    1: “Positive class” (approve)
How to approach classification?

Let's consider a simple loan application problem:


x = credit score ϵ ℤ+
y ϵ {0, 1}    0: “Negative class” (deny)    1: “Positive class” (approve)

Given an input x, try to predict / estimate the probability that y = 1 for this x

hw(x) = estimated probability that y=1 for input x, parameterized by w

What will the function hw(x) look like?



The sigmoid / logistic function

Want: 0 ≤ hw(x) ≤ 1, and hw(x) differentiable at all points

hw(x) = 𝝈 (wT x)

where 𝝈 (z) = 1 / ( 1 + e-z )

Pass the linear regression output through a “Sigmoid / Logistic function”.
Smooth function: the derivative can be computed at any point!
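A minimal sketch of the sigmoid hypothesis (names are illustrative; x0 = 1 is prepended as in the multivariate convention):

```python
import math

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def h(w, x):
    """Logistic-regression hypothesis: estimated P(y = 1 | x), with x0 = 1."""
    z = sum(wj * xj for wj, xj in zip(w, [1.0] + list(x)))
    return sigmoid(z)
```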
From probabilities to classification

hw(x) = 𝝈 (wT x)
where 𝝈 (z) = 1 / (1 + e-z)

Suppose:
Predict y = 1 when hw(x) ≥ 0.5

➢ wT x ≥ 0

Predict y = 0 when hw(x) < 0.5

➢ wT x < 0
Separating two classes of points

● We are attempting to separate two given sets / classes of points


● Separate two regions of the feature space
● Concept of Decision Boundary - a boundary that separates two
classes of points / regions on the feature space
● Finding a good decision boundary
➢ learn appropriate values for the parameters w



Decision Boundary

Predict y = 1 if  −3 + x1 + x2 ≥ 0

How to get the parameter values - to be discussed soon
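The slide's linear decision boundary translates directly into a decision rule (a sketch for this specific boundary; the function name is ours):

```python
def predict_class(x1, x2):
    """Decision rule for the boundary -3 + x1 + x2 = 0:
    predict y = 1 when -3 + x1 + x2 >= 0, else y = 0."""
    return 1 if -3 + x1 + x2 >= 0 else 0
```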
Non-linear Decision Boundary

Predict y = 1 if  −1 + x1² + x2² ≥ 0

How to get the parameter values - to be discussed soon
Cost function for Logistic
Regression
How to get the parameter values?



We have a training dataset for classification

m = number of training examples
x's = input variables / features
y's = "target" class, assumed binary (0 or 1)
(x(i), y(i)) = a single training example, where i is an index into the training set

Hypothesis: sigmoid function hw(x)



Recap: Linear Regression Cost function

E(w) = (1/2m) · Σi=1..m ( hw(x(i)) − y(i) )²

However, this cost function is non-convex for the hypothesis (sigmoid) of
logistic regression.



Logistic Regression Cost function
cost( hw(x), y ) = −log( hw(x) )       if y = 1
                 = −log( 1 − hw(x) )   if y = 0

(Plot: CE cost vs hw(x) over [0.0, 1.0], one curve for y = 1 and one for y = 0)
Logistic Regression Cost function
cost( hw(x), y ) = −log( hw(x) )       if y = 1
                 = −log( 1 − hw(x) )   if y = 0

(Plot: CE cost vs hw(x) over [0.0, 1.0], one curve for y = 1 and one for y = 0)
Now use the fact that y is always either 0 or 1, to write this as a closed-form expression
Logistic Regression Cost function

cost( hw(x), y ) = −y·log( hw(x) ) − (1 − y)·log( 1 − hw(x) )

(Remember: y is always either 0 or 1)

E(w) = −(1/m) · Σi=1..m [ y(i)·log( hw(x(i)) ) + (1 − y(i))·log( 1 − hw(x(i)) ) ]

Also known as the Cross Entropy loss function. This cost function is convex.

Objective: Find w that minimizes E(w)
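A minimal sketch of the cross-entropy cost, given the predicted probabilities hw(x(i)) for each example (names are illustrative):

```python
import math

def ce_cost(probs, ys):
    """Cross-entropy cost: the average of
    -y*log(h) - (1 - y)*log(1 - h) over the m training examples."""
    m = len(ys)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, ys)) / m
```

Note that a confident correct prediction (p near the true label) incurs a near-zero cost, while a confident wrong one is penalized heavily.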




Gradient Descent step

Repeat until convergence {
    wj := wj − α · (1/m) · Σi=1..m ( hw(x(i)) − y(i) ) · xj(i)    (simultaneously for all j)
}

The algorithm looks identical to linear regression, but the hypothesis function is
different for logistic regression.
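One update step can be sketched as follows (illustrative code, not the slides' own; each row of `X` is assumed to already include x0 = 1):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_gd_step(w, X, ys, alpha):
    """One batch gradient-descent step for logistic regression.
    Same update form as linear regression, but h is the sigmoid of w.x."""
    m = len(ys)
    grads = [0.0] * len(w)
    for x, y in zip(X, ys):
        hx = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j, xj in enumerate(x):
            grads[j] += (hx - y) * xj / m
    # simultaneous update of all parameters
    return [wj - alpha * g for wj, g in zip(w, grads)]
```

On a toy separable dataset (y = 0 at x1 = −5, y = 1 at x1 = +5), repeated steps push the weight on x1 positive, as expected.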
Making a prediction for a new input

So we can use Gradient Descent to learn optimal parameter values w

Given a new input x, to make a prediction, output the estimated probability that
y=1 for input x:

hw(x) = 𝝈 (wT x)
where 𝝈 (z) = 1 / ( 1 + e-z )

Predict class based on the estimated probability and decision boundary



How to use the estimated probability?

● Refraining from classifying unless confident


● Multi-class classification
○ E.g. News article tagging: Politics, Sports, Movies, Religion
○ One-vs-all / one-vs-rest classifier: separate classifier for each class vs others
○ One-vs-one classifier: separate classifier for each pair of classes
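The one-vs-rest scheme can be sketched as follows (illustrative: `classifiers` is assumed to map each class label to a callable returning its estimated P(y = 1 | x)):

```python
def one_vs_rest_predict(classifiers, x):
    """One-vs-rest: score x with every per-class classifier and
    pick the class whose classifier reports the highest probability."""
    return max(classifiers, key=lambda label: classifiers[label](x))
```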



Binary vs multi-class classification



Thank You!!

