
CMPE 442 Introduction to Machine Learning
• Polynomial Regression
• Locally Weighted Linear Regression
• Logistic Regression/ Softmax Regression
Polynomial Regression

 What if your data is actually more complex than a simple straight line?
 We can use a linear model to fit nonlinear data.
 A model is said to be linear if it is linear in its parameters:

y = β_0 + β_1 x_1 + β_2 x_2 + ε

y = β_0 + β_1 x + β_2 x² + ε

y = β_0 + β_1 x_1 + β_2 x_2 + β_11 x_1² + β_22 x_2² + β_12 x_1 x_2 + ε

 Add powers of each feature as new features, then train a linear model on
this extended set of features
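A minimal sketch of this feature-expansion step with scikit-learn follows; the slides' own code is not shown here, so the data X, y and the degree are placeholders, not the lecture's exact example.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 2)            # placeholder data with two features x1, x2
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]**2  # placeholder nonlinear target

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)        # adds x1^2, x2^2 and x1*x2 as new columns

lin_reg = LinearRegression().fit(X_poly, y)  # ordinary linear model on the extended features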
Polynomial Regression

 A kth-order polynomial model in one variable is given by

y = β_0 + β_1 x + β_2 x² + ... + β_k x^k + ε

 A second-order polynomial model in two variables is given by

y = β_0 + β_1 x_1 + β_2 x_2 + β_11 x_1² + β_22 x_2² + β_12 x_1 x_2 + ε

 An array containing n features can be transformed into an array containing
(n + d)! / (d! n!) features, where d is the polynomial order.
Polynomial Regression: Example

 Let’s generate some nonlinear data based on a simple quadratic equation
 x’s are assigned randomly

y = 2 + x + 0.5x² + ε
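A small NumPy sketch of how such data might be generated; the range of x and the noise scale are assumptions, since the slide does not state them.

import numpy as np

m = 100
X = 6 * np.random.rand(m, 1) - 3                # x's drawn uniformly at random in [-3, 3] (assumed range)
y = 2 + X + 0.5 * X**2 + np.random.randn(m, 1)  # quadratic signal plus Gaussian noise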
Polynomial Regression: Example

A straight line fits nonlinear data poorly.

A case of underfitting.

ŷ = 3.52 + 0.98x
Polynomial Regression: Example

Add powers of each feature as new features:

x^(i) → [x^(i), (x^(i))²]

y = 2 + x + 0.5x² + ε

h(x) = 2.04 + 0.96x + 0.51x²
Polynomial Regression: Example
Polynomial Regression: Example
Evaluating the Performance of the Model

 RMSE is the square root of the mean of the squared residuals:

RMSE(X, h) = √( (1/m) ∑_{i=1}^{m} (h(x^(i)) − y^(i))² )

 The R²-score, or coefficient of determination, explains how much of the total variance of
the dependent variable can be reduced by the least-squares regression:

R² = 1 − SS_res / SS_tot,   where SS_tot = ∑_{i=1}^{m} (y^(i) − ȳ)²  and  SS_res = ∑_{i=1}^{m} (h(x^(i)) − y^(i))²
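A short sketch of both metrics written directly from the formulas above; y_true and y_pred are placeholder names for the actual and predicted targets. scikit-learn also provides equivalents in sklearn.metrics.

import numpy as np

def rmse(y_true, y_pred):
    # square root of the mean of the squared residuals
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_pred - y_true) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot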


Locally Weighted Linear Regression

 In the original linear regression algorithm, to make a prediction at a query point x, we would:

1. Fit θ to minimize ∑_i (y^(i) − θᵀx^(i))²
2. Output θᵀx

 In the locally weighted linear regression algorithm, we:

1. Fit θ to minimize ∑_i w^(i) (y^(i) − θᵀx^(i))²
2. Output θᵀx
Linear Regression

y = 2 + x + 0.5x² + ε
Linear Regression

y = 2 + x + 0.5x² + ε    h(x) = 2.04 + 0.96x + 0.51x²


Linear Regression

y = 2 + x + 0.5x² + ε    h(x) = 2.04 + 0.96x + 0.51x²

x_new = 2.5    h(2.5) = 2.04 + 0.96·2.5 + 0.51·2.5² = 7.63


Locally Weighted Linear Regression

y = 2 + x + 0.5x² + ε
Locally Weighted Linear Regression

y = 2 + x + 0.5x² + ε

x_new = 2.5
Locally Weighted Linear Regression

y = 2 + x + 0.5x² + ε

x_new = 2.5
Locally Weighted Linear Regression

y = 2 + x + 0.5x² + ε    This line is learnt specifically for x_new.

x_new = 2.5
Locally Weighted Linear Regression

y = 2 + x + 0.5x² + ε

x_new = 2.5
Gradient Descent

 Update Rule:
 θ_j := θ_j − η ∑_i w^(i) (h_θ(x^(i)) − y^(i)) x_j^(i)

 The magnitude of the update is proportional to both the error and the weight of a sample.
Locally Weighted Linear Regression

 The w^(i)’s are non-negative weights.

 The weights depend on the particular query point x at which we are trying to make a prediction:

w^(i) = exp( −(x^(i) − x)² / (2τ²) ),    τ is the bandwidth parameter

 If |x^(i) − x| is small, then w^(i) is close to 1.

 If |x^(i) − x| is large, then w^(i) is small.

 θ is chosen giving a much higher weight to the (errors on) training examples
close to the query point x.
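A minimal sketch of a prediction at a single query point. It solves the weighted least-squares problem in closed form (weighted normal equations) rather than with the gradient-descent update shown above; the bandwidth default tau=0.5 and the function name are illustrative.

import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    # X: (m, n) design matrix whose first column is all ones; y: (m,) targets
    # x_query: (n,) query point, also with a leading 1
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))  # w^(i) for every example
    W = np.diag(w)                                    # m-by-m diagonal weight matrix
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y # weighted normal equations
    return x_query @ theta                            # theta^T x at the query point

For the example above one would call lwr_predict(X_b, y, np.array([1.0, 2.5])), where X_b is the training data with a bias column, to get the locally fitted prediction at x_new = 2.5.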
Locally Weighted Linear Regression

 The linear regression algorithm is a parametric learning algorithm:

 It has a fixed, finite number of parameters which are fit to the data
 Once we fit the θ’s and store them away, we don’t need to keep the training data for future predictions
 Locally weighted linear regression is a non-parametric algorithm:
 To make predictions we need to keep the entire training set around
 The amount of data we need to keep in order to represent the hypothesis h grows linearly with the size of the training set
Question

 Is the gradient descent algorithm affected by the scales of features?


Question

 Is the gradient descent algorithm affected by the scales of features?


Feature Normalization

 We can speed up gradient descent by normalizing the features (having them in approximately equal ranges):
 θ will descend quickly on small ranges

1) Feature scaling:  x_j := x_j / (max(x_j) − min(x_j))

2) Mean normalization:  x_j := (x_j − μ_j) / (max(x_j) − min(x_j))   or   x_j := (x_j − μ_j) / s_j
   (μ_j is the mean of feature j and s_j its standard deviation)

 For polynomial regression feature normalization is even more important.


 Not needed for Normal Equation.
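A small sketch of the two normalization options above applied column-wise to a NumPy feature matrix; the example data and scales are placeholders.

import numpy as np

X = np.random.rand(100, 3) * np.array([1.0, 100.0, 1000.0])  # placeholder features on very different scales

# 1) Feature scaling: divide each feature by its range
X_scaled = X / (X.max(axis=0) - X.min(axis=0))

# 2) Mean normalization: subtract the mean, then divide by the range (or the standard deviation)
X_mean_norm = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)  # the standardization variant (what sklearn's StandardScaler does)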
Summary on Linear Regression

 We considered the case when the hypothesis function is linear.

 We can make the hypothesis nonlinear by combining existing features to obtain new features, making the function quadratic, cubic, etc.
 Closed-form solution: the Normal Equations
 Scaling the data is important for the gradient descent algorithm.
Logistic Regression

 Classification problem
 Similar to regression problem
 Assume a binary classifier: y can take on only two values 0 and 1.
 Ex.: spam email (1, positive class), ham email (0, negative class)
Linear Regression

h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + ⋯ + θ_n x_n
Classification Problem

Threshold the classifier output at 0.5:

If h_θ(x) ≥ 0.5, predict y = 1
If h_θ(x) < 0.5, predict y = 0

h_θ(x) = −0.34 + 1.58x
Classification Problem

 y ∈ {0, 1}
 So, we want 0 ≤ h_θ(x) ≤ 1
Logistic Regression

 Let’s change the form of the hypothesis h_θ(x):

h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))

g(z) = 1 / (1 + e^(−z))    Logistic function or sigmoid function
Logistic Regression

g(z) = 1 / (1 + e^(−z))

g(z) tends towards 1 as z → +∞

g(z) tends towards 0 as z → −∞

g(z) is always bounded between 0 and 1

h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))

θᵀx = θ_0 + ∑_{j=1}^{n} θ_j x_j,    with x_0 = 1
Logistic Regression
Predicts something that is True or False, instead of something continuous like size.
Logistic Regression
Logistic Regression
Logistic Regression
Logistic Regression
Logistic Regression
Interpretation of Logistic Function

 h_θ(x) can be interpreted as the estimated probability that y = 1 on input x:

 h_θ(x) = P(y = 1 | x; θ)
 P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
Decision Boundary

 Assume:  y = 1 if h_θ(x) ≥ 0.5
          y = 0 if h_θ(x) < 0.5

h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2)

θ = [−3, 1, 1]ᵀ

Predict y = 1 if −3 + x_1 + x_2 ≥ 0

Decision boundary: x_1 + x_2 = 3
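A small sketch of this decision rule for θ = (−3, 1, 1); the helper name is illustrative.

import numpy as np

theta = np.array([-3.0, 1.0, 1.0])

def predict(x1, x2):
    z = theta @ np.array([1.0, x1, x2])  # theta^T x with x_0 = 1
    return int(z >= 0)                   # z >= 0 is equivalent to h_theta(x) >= 0.5

print(predict(2, 2))  # 1: x1 + x2 = 4 lies on the y = 1 side of x1 + x2 = 3
print(predict(1, 1))  # 0: x1 + x2 = 2 lies below the boundary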
Logistic Regression

 Given the logistic regression model, how do we fit θ for it?


Logistic Regression

 Training set: {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}
 m examples
 x = [x_0, x_1, ..., x_n]ᵀ,  with x_0 = 1
 y ∈ {0, 1}
 h_θ(x) = 1 / (1 + e^(−θᵀx))
Cost Function

 Linear Regression:  J(θ) = (1/2m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

 Cost(h_θ(x^(i)), y^(i)) = ½ (h_θ(x^(i)) − y^(i))²
 Using this cost function for logistic regression gives a non-convex function
 Not guaranteed to converge to the global minimum
Cost Function

 Cost(h_θ(x), y) =  −log(h_θ(x))       if y = 1
                     −log(1 − h_θ(x))   if y = 0

 Cost = 0 if y = 1 and h_θ(x) = 1
 But as h_θ(x) → 0, Cost → ∞
 If h_θ(x) = 0, we predict that p(y = 1 | x; θ) = 0; if actually y = 1, the learning algorithm is penalized by a very large cost
Cost Function

 Cost(h_θ(x), y) =  −log(h_θ(x))       if y = 1
                     −log(1 − h_θ(x))   if y = 0

J(θ) = (1/m) ∑_{i=1}^{m} Cost(h_θ(x^(i)), y^(i))

(Plots of Cost versus h_θ(x) for the y = 1 and y = 0 cases.)
Cost Function

 Cost(h_θ(x), y) =  −log(h_θ(x))       if y = 1
                     −log(1 − h_θ(x))   if y = 0

 Only two values are possible, y = 1 or y = 0, so the two cases can be combined:
 Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))

 J(θ) = (1/m) ∑_{i=1}^{m} Cost(h_θ(x^(i)), y^(i)) = −(1/m) ∑_{i=1}^{m} [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
 This is a convex function
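A minimal sketch of this cost written with NumPy; X, y and the small epsilon are illustrative choices rather than the slides' code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # X: (m, n) design matrix with a leading column of ones; y: (m,) vector of 0/1 labels
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12   # keeps log() finite if h is exactly 0 or 1
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))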
Logistic Regression

 To fit the parameters θ:
 min_θ J(θ) to obtain θ

 To make a prediction on a given new x:

 Output h_θ(x) = 1 / (1 + e^(−θᵀx)) = p(y = 1 | x; θ)
Gradient Descent

 Want min_θ J(θ):

Repeat {

    θ_j := θ_j − η (∂/∂θ_j) J(θ)

} (Simultaneously update all θ_j)


Gradient Descent

J(θ) = (1/m) ∑_{i=1}^{m} Cost(h_θ(x^(i)), y^(i))
     = −(1/m) ∑_{i=1}^{m} [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

(∂/∂θ_j) J(θ) = (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
Gradient Descent

 Want min_θ J(θ):    h_θ(x) = 1 / (1 + e^(−θᵀx))

Repeat {

    θ_j := θ_j − η (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)

} (Simultaneously update all θ_j)
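A short batch gradient-descent sketch that follows this update rule; the learning rate eta, iteration count and function name are illustrative choices, not the slides' code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.1, n_iters=1000):
    # X: (m, n) design matrix that already contains the x_0 = 1 column; y: 0/1 labels
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = (1.0 / m) * (X.T @ (sigmoid(X @ theta) - y))
        theta -= eta * grad   # simultaneous update of all theta_j
    return theta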


Python Implementation

Gives probabilities.
To get the classes just use log_reg.predict()
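The slide's code is not reproduced here, but a typical scikit-learn call consistent with the note above might look like this; the data is a placeholder, log_reg matches the name on the slide, and predict_proba() is what returns the class probabilities.

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 2)                 # placeholder features
y = (X[:, 0] + X[:, 1] > 1).astype(int)    # placeholder binary labels

log_reg = LogisticRegression()
log_reg.fit(X, y)

print(log_reg.predict_proba(X[:3]))   # estimated probabilities for each class
print(log_reg.predict(X[:3]))         # hard class labels (0 or 1)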
Example
Let’s apply Logistic Regression to the Iris dataset.
Iris is a famous dataset; it contains the sepal and petal length and width of 150 iris flowers.
There are three different species: Iris-Setosa, Iris-Versicolor and Iris-Virginica.
Iris Dataset
Example
Example
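One way the Iris example might be set up is sketched below, using the common petal-width/Virginica formulation; the exact feature choice used on the slides is not shown, so treat this as an assumption.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 3:]                 # petal width (cm) only
y = (iris.target == 2).astype(int)   # 1 if Iris-Virginica, else 0

log_reg = LogisticRegression()
log_reg.fit(X, y)
print(log_reg.predict_proba([[1.7]]))  # [P(not Virginica), P(Virginica)] at petal width 1.7 cm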
Nonlinear Decision Boundary
Nonlinear Decision Boundary

Nonlinear Decision Boundary

h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2)

h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_2²)

θ = [−1, 0, 0, 1, 1]ᵀ

Predict y = 1 if −1 + x_1² + x_2² ≥ 0

x_1² + x_2² ≥ 1    Decision boundary: x_1² + x_2² = 1
Multiclass Classification

 What if y can take one of k classes, y ∈ {1, 2, ..., k}?

 Example: weather prediction {sunny, cloudy, snowy, ...}
One-vs-all Classifier (one-vs-rest)

Class 1, Class 2, Class 3:

(Figure: three binary classifiers h_θ^(1)(x), h_θ^(2)(x), h_θ^(3)(x), each separating one class from the rest.)
One-vs-all Classifier (one-vs-rest)

 h_θ^(i)(x) = p(y = i | x; θ)  for i = 1, 2, 3
 One-vs-all:
 Train a logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i
 On a new input x, to make a prediction, pick the class i that maximizes h_θ^(i)(x)
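A sketch of one-vs-all built by hand from binary logistic regressions; scikit-learn's OneVsRestClassifier does the same job, and the helper names here are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

def one_vs_all_fit(X, y, classes):
    # one binary classifier per class: relabel y as 1 for that class, 0 for the rest
    return {c: LogisticRegression().fit(X, (y == c).astype(int)) for c in classes}

def one_vs_all_predict(models, X):
    # pick, for each row of X, the class whose classifier reports the highest P(y = 1)
    classes = list(models)
    probs = np.column_stack([models[c].predict_proba(X)[:, 1] for c in classes])
    return np.array(classes)[np.argmax(probs, axis=1)]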
Softmax Regression

 Multi-class classification
 Multinomial Logistic Regression
 Generalization of logistic regression
 𝑦 ∈ {1,2, … , 𝐾}
Softmax Regression

 Given a test input x, estimate the probability P(y = k | x) for each value of k = 1, ..., K
 Estimate the probability of the class label taking on each of the 𝐾 different
possible values.
 Our hypothesis will output a 𝐾-dimensional vector (whose elements sum to
1) giving 𝐾 estimated probabilities.
Softmax Regression: Hypothesis

 The hypothesis h_θ(x) takes the form:

h_θ(x) = [ P(y = 1 | x; θ), P(y = 2 | x; θ), ..., P(y = K | x; θ) ]ᵀ
       = (1 / ∑_{j=1}^{K} exp(θ^(j)ᵀ x)) · [ exp(θ^(1)ᵀ x), exp(θ^(2)ᵀ x), ..., exp(θ^(K)ᵀ x) ]ᵀ

 θ^(1), θ^(2), ..., θ^(K) ∈ ℝⁿ are the parameters of the model
 The factor 1 / ∑_{j=1}^{K} exp(θ^(j)ᵀ x) normalizes the distribution so that it sums to one
 θ = [ θ^(1)  θ^(2)  ...  θ^(K) ] is an n-by-K matrix whose columns are the θ^(k)
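A minimal sketch of this hypothesis as a function; Theta stores the θ^(k) as columns, as above, and the max-subtraction is a standard numerical-stability trick, not part of the formula.

import numpy as np

def softmax_hypothesis(Theta, x):
    # Theta: (n, K) parameter matrix whose columns are theta^(k); x: (n,) input
    scores = Theta.T @ x              # K scores theta^(k)^T x
    scores -= scores.max()            # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()   # K probabilities that sum to 1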
Softmax Regression: Cost Function

 1{·} is the indicator function:
 1{True statement} = 1 and 1{False statement} = 0
 Minimize the cost function:

J(θ) = − ∑_{i=1}^{m} ∑_{k=1}^{K} 1{y^(i) = k} log( exp(θ^(k)ᵀ x^(i)) / ∑_{j=1}^{K} exp(θ^(j)ᵀ x^(i)) )

P(y^(i) = k | x^(i); θ) = exp(θ^(k)ᵀ x^(i)) / ∑_{j=1}^{K} exp(θ^(j)ᵀ x^(i))
Softmax Regression

 We cannot solve for the minimum of J(θ) analytically

 Use an iterative optimization algorithm
 Taking derivatives, one can show that the gradient is:

∇_{θ^(k)} J(θ) = − ∑_{i=1}^{m} x^(i) ( 1{y^(i) = k} − P(y^(i) = k | x^(i); θ) )

P(y^(i) = k | x^(i); θ) = exp(θ^(k)ᵀ x^(i)) / ∑_{j=1}^{K} exp(θ^(j)ᵀ x^(i))
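A compact sketch of the cost and gradient above in matrix form; it assumes integer labels 0..K-1 (the formulas use 1..K) and a design matrix X with a bias column.

import numpy as np

def softmax_probs(X, Theta):
    scores = X @ Theta                                   # (m, K) matrix of theta^(k)^T x^(i)
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cost_and_grad(X, y, Theta):
    # X: (m, n) with a bias column; y: integer labels 0..K-1; Theta: (n, K)
    K = Theta.shape[1]
    P = softmax_probs(X, Theta)
    Y = np.eye(K)[y]                  # one-hot rows encode the indicator 1{y(i) = k}
    cost = -np.sum(Y * np.log(P + 1e-12))
    grad = -X.T @ (Y - P)             # column k is the gradient with respect to theta^(k)
    return cost, grad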
Example: Iris dataset
Python Implementation

 Scikit-Learn’s LogisticRegression uses one-versus-all by default

 Switch to Softmax Regression by setting multi_class="multinomial"
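A sketch of that switch on the Iris data; the feature choice, C=10 and the sample point are illustrative, and very recent scikit-learn releases may warn about the multi_class parameter since multinomial has become the default behaviour there.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data[:, 2:], iris.target   # petal length and width; an assumed feature choice

softmax_reg = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=10)
softmax_reg.fit(X, y)
print(softmax_reg.predict_proba([[5, 2]]))  # one probability per species, summing to 1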
Example: MNIST dataset

 70,000 images of digits handwritten by high school students and employees of the US Census Bureau
 Each image is 28x28 pixels (hence has 784 features)
 Each feature simply represents one pixel’s intensity, from 0 (white) to 255 (black)
 The dataset is already split into a training set (the first 60,000 images) and a test set (the last 10,000 images)
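One way this setup might be reproduced is sketched below; fetch_openml downloads the data (internet access required), the solver settings are assumptions, and the resulting accuracy need not match the numbers reported further down exactly.

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

mnist = fetch_openml("mnist_784", version=1, as_frame=False)  # ~70,000 images, 784 features each
X = mnist.data / 255.0                   # scale pixel intensities from [0, 255] to [0, 1]
y = mnist.target.astype(int)

X_train, X_test = X[:60000], X[60000:]   # the standard 60,000 / 10,000 split
y_train, y_test = y[:60000], y[60000:]

softmax_reg = LogisticRegression(multi_class="multinomial", max_iter=100)
softmax_reg.fit(X_train, y_train)
print("Test accuracy:", softmax_reg.score(X_test, y_test))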
Example: MNIST dataset
Example: MNIST dataset
Learned Weights
Example: MNIST dataset

 Illustration of the decrease of the cost function over time

 Training Accuracy: 0.9209666666666667


 Test Accuracy: 0.9206
Question

 Suppose you train a logistic regression classifier and your hypothesis function H is

 Which of the following figures represents the decision boundary given by the above classifier?
Question

 Imagine you are given the graph below of logistic regression, which shows the relationship
between the cost function and the number of iterations for 3 different learning rate values
(the different colors show the curves for the different learning rates).
Suppose you saved the graph for future reference but forgot to save the values of the
different learning rates for this graph. Now you want to find out the relation between
the learning rate values of these curves. Which of the following is the true relation?
Note:
1. The learning rate for blue is l1
2. The learning rate for red is l2
3. The learning rate for green is l3

A) l1>l2>l3
B) l1 = l2 = l3
C) l1 < l2 < l3
D) None of these
Question

 Suppose you are dealing with a 4-class classification problem and you want to train a logistic
regression model on the data; for that you are using the One-vs-all method.
 How many logistic regression models do we need to fit for this 4-class classification problem?
