
03-Logistic Regression



Machine Learning (CE 40717)


Fall 2024

Ali Sharifi-Zarchi

CE Department
Sharif University of Technology

October 5, 2024


1 Introduction

2 Logistic Regression
   Fundamentals
   Decision surface
   ML estimation
   Cost function
   Gradient descent
   Multi-class logistic regression

3 Summary

4 Extra reading
   Probabilistic view in classification
   Probabilistic classifiers

5 References


Classification problem

• Classification (binary)
• Email: Spam / Not Spam?
• Online Transactions: Fraudulent / Genuine?
• Tumor: Malignant / Benign?

y ∈ {0, 1}, where
  0: “Negative Class” (e.g., benign tumor)
  1: “Positive Class” (e.g., malignant tumor)


Classification problem (cont.)

• Can we solve the problem using linear regression?

• We could fit a straight line and define a threshold at 0.5:

If hθ(x) ≥ 0.5, predict y = 1
If hθ(x) < 0.5, predict y = 0


Classification problem (cont.)

• What happens when we add a new data point? The fitted line changes, so the fixed 0.5 threshold may no longer separate the classes well.

• Classification: y = 0 or y = 1
• hθ(x) can be > 1 or < 0, which is another drawback of using linear regression for this problem.
• What we need:

Logistic regression: 0 ≤ hθ(x) ≤ 1

• We also write this function in other notation: f(x; w) = σ(wᵀx)




Introduction

• Suppose we have a binary classification task (so K = 2).


• By observing age, gender, height, weight, and BMI, we try to determine whether a person is overweight.
  Age   Gender   Height (cm)   Weight (kg)   BMI    Overweight
  25    Male     175           80            25.3   0
  30    Female   160           60            22.5   0
  ...
  35    Male     180           90            27.3   1
• We denote the features of a sample with vector x and the label with y.
• In logistic regression we try to find a function σ(wᵀx) that predicts the posterior probability P(y = 1|x).


Introduction (cont.)

• σ(wᵀx): the probability that y = 1 given x (parameterized by w)

P(y = 1|x, w) = σ(wᵀx)
P(y = 0|x, w) = 1 − σ(wᵀx)

• We need a function whose output lies in the range [0, 1], like a probability.
• We denote this function by σ(·) and call it the activation function.


Introduction (cont.)

• Sigmoid (logistic) function:

σ(z) = 1 / (1 + e^{−z})

• A good candidate for the activation function.
• It smoothly maps any real number to a value between 0 and 1.
• It is also differentiable.


Sigmoid function & its derivative
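The original slide here shows a plot of σ(z) and its derivative; since σ′(z) = σ(z)(1 − σ(z)), both are easy to evaluate. Below is a minimal NumPy sketch (the function names are ours, not from the slides):

import numpy as np

def sigmoid(z):
    # numerically stable logistic function
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])                 # for z < 0, avoid overflow in exp(-z)
    out[~pos] = ez / (1.0 + ez)
    return out

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)                 # sigma'(z) = sigma(z) * (1 - sigma(z))

z = np.linspace(-6.0, 6.0, 5)
print(sigmoid(z))                        # values strictly between 0 and 1
print(sigmoid_derivative(z))             # peaks at 0.25 when z = 0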


Introduction (cont.)

• The sigmoid function takes a number as input but we have:

x = [x0 = 1, x1, . . . , xd]
w = [w0, w1, . . . , wd]

• So we can use the dot product of x and w.


• We have 0 ≤ σ(wᵀx) ≤ 1, which is the estimated probability of y = 1 on input x.
• An example: a basketball game (Win, Lose)
• σ(wᵀx) = 0.7 means a 70 percent chance of winning the game.



Decision surface

• The decision surface (or decision boundary) is the region of the problem space where the output label of a classifier is ambiguous; it can be linear or non-linear.
• In binary classification it is where the probabilities of a sample belonging to y = 0 and to y = 1 are equal.
• The decision boundary hyperplane always has one dimension less than the feature space.

Decision surface (cont.)

• An example of linear decision boundaries:

Figure adapted from Eric Xing, Machine Learning, CMU



Decision surface (cont.)

• Back to our logistic regression problem.


• Decision surface: σ(wᵀx) = constant.

σ(wᵀx) = 1 / (1 + e^{−wᵀx}) = 0.5

• Decision surfaces are linear functions of x.
• If σ(wᵀx) ≥ 0.5 then ŷ = 1, else ŷ = 0.
• Equivalently (since σ(z) ≥ 0.5 exactly when z ≥ 0), if wᵀx ≥ 0 then decide ŷ = 1, else ŷ = 0.

ŷ is the predicted label


Decision boundary example

σ(wᵀx) = σ(w0 + w1x1 + w2x2)

Predict y = 1 if −3 + x1 + x2 ≥ 0  (i.e., w0 = −3, w1 = w2 = 1)
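As a small illustration (using w0 = −3, w1 = w2 = 1 from the example above), the rule can be evaluated with a dot product; the sample points below are made up for the demo:

import numpy as np

w = np.array([-3.0, 1.0, 1.0])              # [w0, w1, w2] from the example
X = np.array([[1.0, 1.0, 1.0],              # each row: [x0 = 1, x1, x2]
              [1.0, 1.5, 1.0],
              [1.0, 2.0, 2.0]])
scores = X @ w                              # w^T x for every sample
print((scores >= 0).astype(int))            # [0 0 1]: predict 1 iff -3 + x1 + x2 >= 0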


Non-linear decision boundary example

σ(wᵀx) = σ(w0 + w1x1 + w2x2 + w3x1² + w4x2²)

We can learn more complex decision boundaries by adding higher-order terms.

Predict y = 1 if −1 + x1² + x2² ≥ 0  (i.e., w0 = −1, w3 = w4 = 1, w1 = w2 = 0)
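A hedged sketch of the same idea: appending quadratic features keeps the model linear in w while producing a circular boundary (weights chosen to match the −1 + x1² + x2² example; the test points are ours):

import numpy as np

def quad_features(x1, x2):
    # map (x1, x2) -> [1, x1, x2, x1^2, x2^2]
    return np.array([1.0, x1, x2, x1**2, x2**2])

w = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])     # [w0, ..., w4] from the example
for x1, x2 in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
    score = w @ quad_features(x1, x2)
    print((x1, x2), int(score >= 0))         # 1 outside the unit circle, 0 inside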



ML estimation

• We had the posterior of a sample as:

P(y^(i) | x^(i), w)

• Logistic regression should maximize the product of all these sample posteriors.
• Maximum (conditional) log-likelihood:

ŵ = arg max_w  log ∏_{i=1}^{n} P(y^(i) | x^(i), w)

• Note that in binary classification y is either 1 or 0, so the posterior term can be simplified as follows:

P(y^(i) | x^(i), w) = σ(wᵀx^(i))^{y^(i)} · (1 − σ(wᵀx^(i)))^{1 − y^(i)}


ML estimation (cont.)

• Logarithm of the posterior probability:

log P(y^(i) | x^(i), w) = y^(i) log(σ(wᵀx^(i))) + (1 − y^(i)) log(1 − σ(wᵀx^(i)))

• Hence the log-likelihood is as follows:

log ∏_{i=1}^{n} P(y^(i) | x^(i), w) = ∑_{i=1}^{n} log P(y^(i) | x^(i), w)
                                    = ∑_{i=1}^{n} [ y^(i) log(σ(wᵀx^(i))) + (1 − y^(i)) log(1 − σ(wᵀx^(i))) ]
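For concreteness, a short NumPy sketch of this log-likelihood (the helper name is ours; eps guards against log(0)):

import numpy as np

def log_likelihood(w, X, y, eps=1e-12):
    # X: (n, d) with a leading column of ones, y: (n,) labels in {0, 1}
    p = 1.0 / (1.0 + np.exp(-(X @ w)))            # sigma(w^T x^(i)) for each sample
    p = np.clip(p, eps, 1.0 - eps)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
print(log_likelihood(np.zeros(2), X, y))          # 3 * log(0.5) ~= -2.079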



Cost function

• We should find

ŵ = arg min_w J(w)

• MLE finds the parameters that best describe the classification problem, so the cost function is the negative of the log-likelihood:

J(w) = − ∑_{i=1}^{n} log P(y^(i) | x^(i), w)
     = ∑_{i=1}^{n} [ −y^(i) log(σ(wᵀx^(i))) − (1 − y^(i)) log(1 − σ(wᵀx^(i))) ]

• There is no closed-form solution for ∇w J(w) = 0.
• However, J(w) is convex.
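A brief sketch (the helper names and the random data are our own) that defines J(w) as this negative log-likelihood and spot-checks convexity at the midpoint of two random weight vectors:

import numpy as np

def cost_J(w, X, y, eps=1e-12):
    # negative log-likelihood (binary cross-entropy), summed over samples
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ w))), eps, 1.0 - eps)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)
w1, w2 = rng.normal(size=3), rng.normal(size=3)
mid = cost_J(0.5 * (w1 + w2), X, y)
avg = 0.5 * (cost_J(w1, X, y) + cost_J(w2, X, y))
print(mid <= avg)                                  # True: consistent with J being convex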

Cost function (cont.)

• Convexity of J(w) can be shown as follows:
• We use the lemma that a sum of convex functions is convex (you can prove it on your own).
• Each term in the summation is twice differentiable.
• Taking the second derivative of

−y^(i) log(σ(wᵀx^(i))) − (1 − y^(i)) log(1 − σ(wᵀx^(i)))

with respect to σ, you get

y/σ² + (1 − y)/(1 − σ)²

which is positive for both y = 0 and y = 1.
• Hence each term −log P(y^(i) | x^(i), w) is convex, and the summation is convex as well.


Cost function (cont.)

• Visualization of each binary cross entropy loss term:

• For example, if the model's predicted probability is σ(wᵀx) = 0.16, the loss is high when the true label is y = 1 but low when the true label is y = 0.

Figure adapted from https://towardsdatascience.com/logistic-regression-from-scratch-69db4f587e17




Gradient descent

• Remember from previous slides:

J(w) = ∑_{i=1}^{n} [ −y^(i) log(σ(wᵀx^(i))) − (1 − y^(i)) log(1 − σ(wᵀx^(i))) ]

• Update rule for gradient descent:

w^{t+1} = w^t − η ∇w J(w^t)

• With this definition of J(w) for logistic regression we get:

∇w J(w) = ∑_{i=1}^{n} (σ(wᵀx^(i)) − y^(i)) x^(i)
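A compact batch gradient descent sketch for this rule (the learning rate, iteration count, and synthetic data are arbitrary choices of ours):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.01, n_iters=2000):
    # X: (n, d) with a leading column of ones; y: (n,) labels in {0, 1}
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ w) - y)      # sum_i (sigma(w^T x^(i)) - y^(i)) x^(i)
        w -= lr * grad                         # w^{t+1} = w^t - eta * grad
    return w

rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y = (X[:, 1] - X[:, 2] > 0).astype(float)
w = fit_logistic(X, y)
print(w)                                       # roughly (small, positive, negative)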


Gradient descent (cont.)

• Compare the gradient of logistic regression with the gradient of SSE in linear regression:

Logistic regression:      ∇w J(w) = ∑_{i=1}^{n} (σ(wᵀx^(i)) − y^(i)) x^(i)

Linear regression (SSE):  ∇w J(w) = ∑_{i=1}^{n} (wᵀx^(i) − y^(i)) x^(i)


Loss function

• The loss function is a single overall measure of the loss incurred by our decisions (over the entire dataset).
• We have:

Loss(y, σ(wᵀx)) = −y log(σ(wᵀx)) − (1 − y) log(1 − σ(wᵀx))

• Since in binary classification either y = 1 or y = 0, we have:

Loss(y, σ(wᵀx)) = −log(σ(wᵀx))       if y = 1
Loss(y, σ(wᵀx)) = −log(1 − σ(wᵀx))   if y = 0

• How is it related to the zero-one loss? (ŷ is the predicted label and y is the true label)

Loss(y, ŷ) = 1   if y ≠ ŷ
Loss(y, ŷ) = 0   if y = ŷ



Multi-class logistic regression

• Now consider a problem where we have K classes and every sample only belongs
to one class (for simplicity).


Multi-class logistic regression (cont.)

• For each class k, σk(x; W) predicts the probability of y = k, i.e., P(y = k|x, W).
• For each data point x0, ∑_{k=1}^{K} P(y = k|x0, W) must be 1.
• W denotes a matrix of wi's in which each wi is a weight vector dedicated to class label i.
• On a new input x, to make a prediction, we pick the class that maximizes σk(x; W):

α(x) = arg max_{k=1,...,K} σk(x; W)

if σk(x; W) > σj(x; W) ∀j ≠ k then decide Ck


Multi-class logistic regression (cont.)

• K > 2 and y ∈ {1, 2, . . . , K }

σk(x; W) = P(y = k|x) = exp(wkᵀx) / ∑_{j=1}^{K} exp(wjᵀx)

• Normalized exponential (a.k.a. softmax)
• If wkᵀx ≫ wjᵀx for all j ≠ k, then P(Ck|x) ≈ 1 and P(Cj|x) ≈ 0.
• Note: remember from Bayes' theorem:

P(Ck|x) = P(x|Ck) P(Ck) / ∑_{j=1}^{K} P(x|Cj) P(Cj)


Multi-class logistic regression (cont.)

• The softmax function smoothly highlights the maximum probability and is differentiable.
• Compare it with the max(·) function, which is strict and non-differentiable.
• Softmax can also handle negative values because of the exponential function.
• It gives us a probability for each class since:

∑_{k=1}^{K} exp(wkᵀx) / ∑_{j=1}^{K} exp(wjᵀx) = 1
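A numerically stable softmax sketch (subtracting the maximum logit before exponentiating leaves the result unchanged but avoids overflow); the example logits are arbitrary:

import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())           # shift by max(z): same output, no overflow
    return e / e.sum()

p = softmax([2.0, 1.0, 0.1])
print(p)                              # approx [0.659, 0.242, 0.099]
print(p.sum())                        # 1.0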


Multi-class logistic regression (cont.)

• An example of applying softmax (note that zi = wiᵀx):


Multi-class logistic regression (cont.)

• Again we set J(W) as the negative of the log-likelihood.
• We need Ŵ = arg min_W J(W)

J(W) = − log ∏_{i=1}^{n} P(y^(i) | x^(i), W)
     = − log ∏_{i=1}^{n} ∏_{k=1}^{K} σk(x^(i); W)^{y_k^(i)}
     = − ∑_{i=1}^{n} ∑_{k=1}^{K} y_k^(i) log(σk(x^(i); W))

• If the i-th sample belongs to class k then y_k^(i) is 1, else 0.
• Again, there is no closed-form solution for Ŵ.
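A sketch of this cost for the multi-class case, with W stored as a (d, K) matrix and Y as one-hot rows (the names and shapes are our own convention):

import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)        # row-wise stable softmax
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cost_multiclass(W, X, Y, eps=1e-12):
    # X: (n, d), W: (d, K), Y: (n, K) one-hot labels
    P = softmax_rows(X @ W)                     # P[i, k] = sigma_k(x^(i); W)
    return -np.sum(Y * np.log(P + eps))         # - sum_i sum_k y_k^(i) log sigma_k(x^(i); W)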


Multi-class logistic regression (cont.)

• From previous slides we have:

J(W) = − ∑_{i=1}^{n} ∑_{k=1}^{K} y_k^(i) log(σk(x^(i); W))

• In which:

W = [w1, w2, . . . , wK],    Y = [y^(1); y^(2); . . . ; y^(n)]  (an n × K matrix whose i-th row is y^(i) = [y_1^(i), . . . , y_K^(i)])

• Each y is a vector of length K (1-of-K encoding).
• For example, y = [0, 0, 1, 0]ᵀ when the target class is C3.
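A one-liner for building the 1-of-K matrix Y from integer class indices (here we index classes from 0, while the slide numbers them from 1):

import numpy as np

labels = np.array([2, 0, 1, 2])        # class index of each sample
K = 4
Y = np.eye(K)[labels]                  # (n, K) matrix of one-hot rows
print(Y[0])                            # [0. 0. 1. 0.] -- the slide's C3 in 1-based numbering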


Multi-class logistic regression (cont.)

• Update rule for gradient descent:

w_j^{t+1} = w_j^t − η ∇_{wj} J(W^t)

∇_{wj} J(W) = ∑_{i=1}^{n} (σj(x^(i); W) − y_j^(i)) x^(i)

• w_j^t denotes the weight vector for class j at iteration t (in multi-class LR, each class has its own weight vector).
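Putting the pieces together, a hedged softmax-regression training sketch for this update rule (the blob data, learning rate, and iteration count are our own choices):

import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, Y, lr=0.005, n_iters=2000):
    # X: (n, d) with a bias column of ones; Y: (n, K) one-hot labels
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iters):
        P = softmax_rows(X @ W)                  # predicted class probabilities
        W -= lr * (X.T @ (P - Y))                # column j: sum_i (sigma_j(x^(i); W) - y_j^(i)) x^(i)
    return W

# three well-separated Gaussian blobs, one per class
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.5]])
labels = np.repeat([0, 1, 2], 50)
X = np.hstack([np.ones((150, 1)), centers[labels] + rng.normal(scale=0.7, size=(150, 2))])
Y = np.eye(3)[labels]
W = fit_softmax_regression(X, Y)
pred = softmax_rows(X @ W).argmax(axis=1)
print((pred == labels).mean())                   # training accuracy, close to 1.0 here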



Logistic regression (LR) summary

• LR is a linear classifier
• The LR optimization problem is derived from maximum likelihood estimation
• There is no closed-form solution for its optimization problem
• But the cost function is convex, so the global optimum can be found by gradient descent (equivalently, gradient ascent on the log-likelihood)



Probabilistic view in classification problem

• In a classification problem:
• Each feature is a random variable (e.g. a person’s height)
• The class label is also considered a random variable (e.g. a person could be
overweight or not)
• We observe the feature values for a random sample and intend to find its class label
• Evidence: Feature vector x
• Objective: Class label


Definitions

• Posterior probability: the probability of a class label Ck given a sample x

P(Ck|x)

• Likelihood (class-conditional probability): the PDF of the feature vector x for samples of class Ck

P(x|Ck)

• Prior probability: the probability of the label being Ck

P(Ck)

• P(x): the PDF of the feature vector x
• From the total probability theorem:

P(x) = ∑_{k=1}^{K} P(x|Ck) P(Ck)



Probabilistic classifiers

• Probabilistic approaches can be divided into two main categories:
  • Generative
    • Estimate the PDF P(x, Ck) for each class Ck and then use it to find P(Ck|x); alternatively, estimate both P(x|Ck) and P(Ck) to find P(Ck|x).
  • Discriminative
    • Directly estimate P(Ck|x) for each class Ck.


Probabilistic classifiers (cont.)

• Let’s assume we have input data x and want to classify the data into labels y.
• A generative model learns the joint probability distribution P(x, y).
• A discriminative model learns the conditional probability distribution P(y|x)


Discriminative vs. Generative : example

• Suppose we have the following dataset in the form of (x, y) pairs:

(1, 0), (1, 0), (2, 0), (2, 1)

• P(x, y) is:

          y = 0    y = 1
  x = 1    1/2      0
  x = 2    1/4      1/4

• P(y|x) is:

          y = 0    y = 1
  x = 1     1        0
  x = 2    1/2      1/2
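Both tables can be reproduced from the four pairs by counting and normalizing; a small sketch (the array layout is our choice):

import numpy as np

data = [(1, 0), (1, 0), (2, 0), (2, 1)]            # the (x, y) pairs above
joint = np.zeros((2, 2))                           # rows: x in {1, 2}; columns: y in {0, 1}
for x, y in data:
    joint[x - 1, y] += 1
joint /= joint.sum()                               # P(x, y)
cond = joint / joint.sum(axis=1, keepdims=True)    # P(y | x) = P(x, y) / P(x)
print(joint)   # [[0.5  0.  ]  [0.25 0.25]]
print(cond)    # [[1.   0.  ]  [0.5  0.5 ]]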


Discriminative vs. Generative : example (cont.)

• The distribution P(y|x) is the natural distribution for classifying a given sample x
into class y.
• This is why algorithms that model this distribution directly are called discriminative algorithms.
• Generative algorithms model P(x, y), which can be transformed into P(y|x) by
Bayes rule and then used for classification.
• However, the distribution P(x, y) can also be used for other purposes.
• For example we can use P(x, y) to generate likely (x, y) pairs


Generative approach

1 Inference
• Determine class conditional densities P(x|Ck ) and priors P(Ck )
• Use Bayes theorem to find P(Ck |x)
2 Decision
• Make optimal assignment for new input (after learning the model in the inference
stage)
• If P(Ci|x) > P(Cj|x) ∀j ≠ i, then decide Ci.


Generative approach (cont.)

• Generative approach for a binary classification problem:

Figures adapted from Pattern Recognition and Machine Learning, Bishop



Discriminative approach

1 Inference
• Determine the posterior class probabilities P(Ck |x) directly.
2 Decision
• Make optimal assignment for new input (after learning the model in the inference
stage)
• If P(Ci|x) > P(Cj|x) ∀j ≠ i, then decide Ci.


Discriminative approach (cont.)

• Discriminative approach for a binary classification problem:

Figures adapted from Pattern Recognition and Machine Learning, Bishop



Discriminative approach (cont.)

• Logistic regression is a discriminative approach.
• We directly model the posterior probability P(y = 1|x) with σ(wᵀx).



Contributions

• These slides are authored by:


• Danial Gharib


[1] M. Soleymani Baghshah, "Machine Learning." Lecture slides.
[2] A. Ng, "ML-005, Lecture 6." Lecture slides.
[3] C. M. Bishop, Pattern Recognition and Machine Learning. Information Science and Statistics. New York, NY: Springer, 1st ed., Aug. 2006.
[4] S. Fidler, "CSC411." Lecture slides.
[5] A. Ng and T. Ma, CS229 Lecture Notes. Updated June 11, 2023.


Any Questions?
