HAAI Linear Models Slides
Often we want to predict a particular property/value for an entity, given some other
properties/values
Examples:
- Predict a student's marks in the Class 12 Physics finals, given their marks in practice tests
- Predict the price of houses in a city, given their area, number of rooms, …
Such problems, where we try to predict a continuous value, are called regression problems
How can we learn to predict the prices of houses of other sizes in the city,
as a function of their area?
Dataset of house Area vs Price in a city
An example of a supervised learning problem!
When the target variable we are trying to predict is continuous: regression problem
Training Set → Learning Algorithm → h (the hypothesis function)
Learn a function h(x) so that h(x) is a good predictor for the corresponding value of y.
x (house area) → h → y' (predicted price)
Acknowledgement: Some of the images are taken from course materials of Professor Andrew Ng
Linear Regression
Hypothesis: hw(x) = w0 + w1·x
Parameters: w = [w0, w1]
Cost function (mean squared error): E(w) = (1/2m) Σ_i ( hw(x^(i)) − y^(i) )²
We want to find the values of w that minimize the cost function E(w).
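As an illustration, here is a minimal NumPy sketch of this cost (the function name mse_cost and the toy data are illustrative, not from the slides):

import numpy as np

def mse_cost(w, X, y):
    """Mean squared error cost E(w) = (1/2m) * sum of squared prediction errors.
    X carries a leading column of ones so that w[0] plays the role of w0."""
    errors = X @ w - y              # hw(x^(i)) - y^(i) for every example i
    return (errors ** 2).sum() / (2 * len(y))

# Toy data: a leading column of ones, then the house area (in some scaled unit)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(mse_cost(np.array([1.0, 1.0]), X, y))   # 0.0: w = [1, 1] fits this data exactly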
For simplicity, let us for now assume w0 = 0
Hypothesis: hw(x) = w1·x
Parameter: w1
We want to find the value of w1 that minimizes the cost function E(w1).
Now let us again let both w0 and w1 vary
Hypothesis: hw(x) = w0 + w1·x
Parameters: w = [w0, w1]
We want to find the values of w that minimize the cost function E(w).
[Figure: surface plot of the cost E(w) over the parameters w0 and w1]
Gradient Descent for minimizing a cost function
[Figure: a cost surface E(w) over w0 and w1 with multiple valleys]
In general, gradient descent can get stuck at a local minimum.
The MSE cost function in linear regression, however, is always a convex function:
● it always has a single minimum
● gradient descent always converges (for a suitable learning rate)
Gradient descent algorithm
Repeat until convergence {
    wj := wj − α · ∂E(w)/∂wj    (simultaneously update w0 and w1; α is the learning rate)
}
Take x0 = 1 (for easier notation), so hw(x) = w0·x0 + w1·x1. The updates then become:
    wj := wj − (α/m) · Σ_i ( hw(x^(i)) − y^(i) ) · xj^(i)
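A minimal sketch of this algorithm in NumPy, under the same conventions (x0 = 1 as a leading column of ones; the names gradient_descent, alpha, n_iters are illustrative):

import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for the MSE cost.
    X includes a leading column of ones (the x0 = 1 convention above)."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        errors = X @ w - y              # hw(x^(i)) - y^(i)
        gradient = X.T @ errors / m     # dE/dwj for every j at once
        w = w - alpha * gradient        # simultaneous update of all wj
    return w

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(gradient_descent(X, y))           # converges towards [1.0, 1.0]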
Practical aspects of applying gradient descent
Feature Scaling: make sure features are on a similar scale. If features lie in numerically very different ranges, gradient descent updates will be dominated by the numerically larger features.
E.g. x1 = area between 300 - 5000 sq.ft.
x2 = #bedrooms between 1 - 5
Normalization strategies (a sketch follows this list):
- Divide by the maximum value of the feature
- Min-max scaling: replace xi with (xi − min) / (max − min)
- Mean normalization: replace xi with (xi − μi) / si to make features have approximately zero mean (μi: mean of the feature; si: its standard deviation or range)
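A small sketch of two of these strategies, assuming NumPy (the function names are illustrative):

import numpy as np

areas = np.array([300.0, 1200.0, 5000.0])   # x1: area, 300 - 5000 sq.ft.
bedrooms = np.array([1.0, 3.0, 5.0])        # x2: #bedrooms, 1 - 5

def min_max_scale(x):
    """Map feature values into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def mean_normalize(x):
    """Make the feature approximately zero mean: (x - mean) / std."""
    return (x - x.mean()) / x.std()

print(min_max_scale(areas))      # [0.0, ~0.19, 1.0]
print(mean_normalize(bedrooms))  # roughly [-1.22, 0.0, 1.22]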
Practical aspects of applying gradient descent
Is gradient descent working properly?
- Plot how E(w) changes with every iteration of gradient descent (see the sketch after this list)
- For a sufficiently small learning rate, E(w) should decrease with every iteration
- If not (e.g., if E(w) fluctuates or increases), the learning rate needs to be reduced
- However, a learning rate that is too small means slow convergence!
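One way to implement this check, as a sketch (the history list stands in for the plot; names are illustrative):

import numpy as np

def gradient_descent_with_history(X, y, alpha=0.1, n_iters=100):
    """Batch gradient descent that also records E(w) at every iteration."""
    m, n = X.shape
    w = np.zeros(n)
    history = []
    for _ in range(n_iters):
        errors = X @ w - y
        history.append((errors ** 2).sum() / (2 * m))   # E(w) this iteration
        w = w - alpha * (X.T @ errors) / m
    return w, history

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
w, history = gradient_descent_with_history(X, y)
# history should decrease at every step; fluctuations suggest reducing alpha
assert all(a >= b for a, b in zip(history, history[1:]))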
Linear hypothesis: y' = w0 + w1·x
Polynomial hypothesis: y' = w0 + w1·x + w2·t, where t = x²
Can also be multivariate:
y' = w0 + (w11·x1 + w12·x1² + w13·x1³ + …) + (w21·x2 + w22·x2² + w23·x2³ + …) + …
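A sketch of how polynomial terms reduce to ordinary features, assuming NumPy (add_poly_features is an illustrative name):

import numpy as np

def add_poly_features(x, degree):
    """Build columns [1, x, x^2, ..., x^degree]; t = x^2 becomes just another feature."""
    return np.column_stack([x ** d for d in range(degree + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])
X_poly = add_poly_features(x, degree=2)   # columns: 1, x, x^2
print(X_poly.shape)                       # (4, 3)
# X_poly can be fed to the same gradient descent as before (after feature scaling!)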
Logistic Regression
Given an input x, try to predict / estimate the probability that y = 1 for this x
Want: 0 ≤ hw(x) ≤ 1, and hw(x) should be differentiable at all points
Choose: hw(x) = 𝝈(wᵀx), where 𝝈(z) = 1 / (1 + e^(−z)) is the sigmoid function
Suppose we predict y = 1 when hw(x) ≥ 0.5, and y = 0 otherwise.
Since 𝝈(z) ≥ 0.5 exactly when z ≥ 0:
➢ predict y = 1 when wᵀx ≥ 0
➢ predict y = 0 when wᵀx < 0
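A minimal sketch of this hypothesis and decision rule, assuming NumPy (sigmoid and predict are illustrative names):

import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, X):
    """Predict y = 1 when hw(x) = sigmoid(w.x) >= 0.5, i.e. when w.x >= 0."""
    return (X @ w >= 0).astype(int)

w = np.array([-3.0, 1.0, 1.0])                     # boundary: -3 + x1 + x2 = 0
X = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 2.0]])   # rows: [x0 = 1, x1, x2]
print(sigmoid(X @ w))   # estimated probabilities that y = 1
print(predict(w, X))    # [0, 1]: -3 + 1 + 1 < 0, while -3 + 2 + 2 >= 0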
Separating two classes of points
Linear decision boundary: predict y = 1 if −3 + x1 + x2 ≥ 0
Non-linear (circular) decision boundary: predict y = 1 if −1 + x1² + x2² ≥ 0
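The circular boundary can be checked with the same linear machinery over squared features; a quick sketch (the feature layout [1, x1², x2²] is an assumption for illustration):

import numpy as np

# The circular boundary -1 + x1^2 + x2^2 >= 0 is linear in the features [1, x1^2, x2^2]
w_circle = np.array([-1.0, 1.0, 1.0])
points = np.array([[0.5, 0.5], [1.0, 1.0]])        # (x1, x2) pairs
X_circle = np.column_stack([np.ones(len(points)),
                            points[:, 0] ** 2,
                            points[:, 1] ** 2])
print((X_circle @ w_circle >= 0).astype(int))      # [0, 1]: inside vs. outside the unit circle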
Logistic Regression Cost function
If y is 0:  CE cost(hw(x), y) = −log( 1 − hw(x) )
If y is 1:  CE cost(hw(x), y) = −log( hw(x) )
Now use the fact that y is always either 0 or 1 to write this as a single closed-form expression.
Logistic Regression Cost function
Remember: y is always either 0 or 1, so the two cases combine into one expression:
CE cost(hw(x), y) = −y · log( hw(x) ) − (1 − y) · log( 1 − hw(x) )
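A sketch of this closed-form cost in NumPy, averaged over the m training examples (the name ce_cost and the averaging are conventional, not from the slides):

import numpy as np

def ce_cost(w, X, y):
    """Average cross-entropy cost; per example, one of the two terms vanishes
    because y is always either 0 or 1."""
    h = 1.0 / (1.0 + np.exp(-(X @ w)))    # hw(x) = sigmoid(w.x)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

X = np.array([[1.0, 2.0], [1.0, -2.0]])   # rows: [x0 = 1, one feature]
y = np.array([1.0, 0.0])
print(ce_cost(np.array([0.0, 1.0]), X, y))   # ~0.127: both points classified well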
Given a new input x, to make a prediction, output the estimated probability that y = 1 for input x:
hw(x) = 𝝈(wᵀx), where 𝝈(z) = 1 / (1 + e^(−z))