
CPE/EE 695: Applied Machine Learning

Lecture 3-1: Logistic Regression

Dr. Shucheng Yu, Associate Professor


Department of Electrical and Computer Engineering
Stevens Institute of Technology
Logistic Regression
Usually used for binary classification:

Pr(Y|X), where Y is a binary variable. (why probability?)

2
Logistic Regression
Usually used for binary classification:

Pr(Y|X), where Y is a binary variable. (why probability?)

E.g., Pr(Tomorrow snow | today windy)

3
Logistic Regression
Usually used for binary classification:

Pr(Y|X), where Y is a binary variable. (why probability?)

E.g., Pr(Tomorrow snow | today windy)

Let Pr(Y = 1 | X = x) = p(x; θ)

4
Logistic Regression
Usually used for binary classification:

Pr(Y|X), where Y is a binary variable. (why probability?)

E.g., Pr(Tomorrow snow | today windy)

Let Pr(Y = 1 | X = x) = p(x; θ)

Maximize likelihood:
$\prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}$

5
Logistic Regression
Usually used for binary classification:

Pr(Y|X), where Y is a binary variable. (why probability?)

E.g., Pr(Tomorrow snow | today windy)

Let Pr(Y = 1 | X = x) = p(x; θ)


Assumption: p is modeled with parameter θ; otherwise, the optimization problem doesn't work.

Maximize likelihood:
$\prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}$

6
Logistic Regression
Let Pr(Y = 1 | X = x) = p(x; θ)

Maximize likelihood:
$\prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}$

7
Logistic Regression
Let Pr(Y = 1 | X = x) = p(x; θ)

Maximize likelihood:
$\prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}$

Task: to estimate θ by maximizing the likelihood.
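As an aside (not from the slides), here is a minimal NumPy sketch of evaluating this likelihood in log form for a few candidate values of p, treating p as a single constant rather than p(x; θ) purely for illustration; the data array y is an assumed example.

```python
import numpy as np

def bernoulli_log_likelihood(p, y):
    # log of  prod_i  p^{y_i} * (1 - p)^{1 - y_i}
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1, 0])             # observed binary outcomes
for p in (0.3, 0.6, 0.9):
    print(p, bernoulli_log_likelihood(p, y))
# the log-likelihood is largest near p = mean(y) = 0.6
```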

8
Logistic Regression
Let Pr(Y = 1 | X = x) = p(x; θ)

Maximize likelihood:
$\prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}$

Task: to estimate θ by maximizing the likelihood.

How can we use linear regression to solve this?

9
Logistic Regression
Let Pr(Y = 1 | X = x) = p(x; θ)
Task: to estimate θ by maximizing the likelihood.

How can we use linear regression to solve this?

Attempt 1: assume p(x; θ) to be a linear function of x

Attempt 2: assume log p(x; θ) to be a linear function of x

Attempt 3: assume log(p / (1 − p)) to be a linear function of x (good)

Remember: 0 <= p <= 1. A linear function of x is unbounded, but the log-odds log(p / (1 − p)) ranges over all real values, so it can be matched to a linear function without violating this constraint.

10
Logistic Regression
Logistic regression model:

$\log \frac{p}{1 - p} = \theta_0 + x^\top \theta$

which gives

$p(x) = \frac{1}{1 + e^{-(\theta_0 + x^\top \theta)}}$
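As an illustration (not part of the slides), a minimal NumPy sketch of this model; the feature vector x and the parameter values theta0 and theta below are assumed examples.

```python
import numpy as np

def sigmoid(z):
    # logistic (sigmoid) function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, theta0, theta):
    # p(x) = 1 / (1 + exp(-(theta0 + x . theta)))
    return sigmoid(theta0 + x @ theta)

# illustrative values: 2 features
x = np.array([1.5, -0.3])
theta0 = 0.2
theta = np.array([0.8, -1.1])
print(predict_proba(x, theta0, theta))  # a probability in (0, 1)
```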

11
Logistic Regression

The logistic regression model's estimated probability:

$\hat{p} = h_\theta(x) = \sigma(\theta^\top x)$

where σ(·) is the logistic function (or sigmoid function).

Prediction:

$\hat{y} = \begin{cases} 0, & \text{if } \hat{p} < 0.5 \\ 1, & \text{if } \hat{p} \ge 0.5 \end{cases}$

Training the logistic regression model $\hat{p} = \sigma(\theta^\top x)$ is to learn the best value of the
parameter θ that makes the model fit the training data.
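A hedged sketch of the 0.5-threshold decision rule; the parameter vector and feature vector are assumptions for illustration (the first feature is a constant 1 so the first parameter acts as the bias θ₀).

```python
import numpy as np

def predict_proba(x, theta):
    # estimated probability p_hat = sigma(theta . x)
    return 1.0 / (1.0 + np.exp(-(x @ theta)))

def predict(x, theta, threshold=0.5):
    # decision rule: y_hat = 1 if p_hat >= 0.5, else 0
    return int(predict_proba(x, theta) >= threshold)

theta = np.array([0.2, 0.8, -1.1])   # illustrative parameters (first entry is the bias)
x = np.array([1.0, 1.5, -0.3])       # constant-1 feature plus two inputs
print(predict_proba(x, theta), predict(x, theta))
```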

12
Logistic Regression
To train a logistic regression model, we first need to define a performance
measure. A commonly used measure is the so-called log loss function:

$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{p}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{p}^{(i)}) \right]$

It is easier to explain the log loss function with the single-training-example case, in
which we want to maximize the posterior probability

$P(y|x) = \hat{p}^{\,y} (1 - \hat{p})^{1-y} = \begin{cases} \hat{p}, & \text{when } y = 1 \\ 1 - \hat{p}, & \text{when } y = 0 \end{cases}$

Taking the log of both sides, we have $\log P(y|x) = y \log \hat{p} + (1 - y) \log(1 - \hat{p})$.

Averaging this quantity over the m training examples, and negating it so that maximizing the likelihood becomes minimizing a loss, we obtain $J(\theta)$.
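A minimal NumPy sketch of the log loss; the label array y and predicted probabilities p_hat below are assumed example values, and the eps clipping is a common numerical safeguard, not part of the slides.

```python
import numpy as np

def log_loss(y, p_hat, eps=1e-12):
    # J(theta) = -(1/m) * sum[ y*log(p_hat) + (1-y)*log(1-p_hat) ]
    p_hat = np.clip(p_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

y = np.array([1, 0, 1, 1])
p_hat = np.array([0.9, 0.2, 0.7, 0.6])
print(log_loss(y, p_hat))
```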

13
Logistic Regression
Learning the logistic regression model is to find:

$\hat{\theta} = \arg\min_{\theta} J(\theta)$

where $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{p}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{p}^{(i)}) \right]$,

$\hat{p} = \sigma(\theta^\top x)$

There is no Normal Equation (i.e., closed-form solution) for θ.

But the cost function J(θ) is convex and differentiable, so Gradient
Descent is guaranteed to find the global minimum.

14
Training Logistic Regression Model
The gradient of the log loss function J(θ) is:

$\nabla_{\theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \sigma(\theta^\top x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

At each round of GD, θ is updated as follows (similar to linear regression;
different values of m for different modes of GD):

$\theta = \theta - \eta \nabla_{\theta} J(\theta)$
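A hedged batch-gradient-descent training loop in NumPy following the update rule above; the toy data, learning rate eta, and iteration count are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, eta=0.1, n_iters=1000):
    # X: (m, d) feature matrix, y: (m,) array of 0/1 labels
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        p_hat = sigmoid(X @ theta)        # predicted probabilities
        grad = (X.T @ (p_hat - y)) / m    # (1/m) * sum (p_hat - y) * x
        theta -= eta * grad               # theta = theta - eta * grad(J)
    return theta

# toy example (bias handled by a constant-1 feature column)
X = np.array([[1, 0.5], [1, 1.5], [1, -1.0], [1, 2.0]])
y = np.array([0, 1, 0, 1])
print(train_logreg(X, y))
```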

15
Training Logistic Regression Model
Overfitting may also happen in logistic regression.

Similarly, to combat overfitting we can introduce a regularization term into
the cost function J(θ):

$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{p}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{p}^{(i)}) \right] + \alpha R(\theta)$

where α is a hyperparameter and R(θ) can be the ℓ2-norm of θ, i.e.,

$R(\theta) = \lVert \theta \rVert_2 = \left( \sum_i \theta_i^2 \right)^{1/2}$

Note: R(θ) is the ℓ2-norm in Ridge regression, and the ℓ1-norm in Lasso regression.
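A hedged sketch of adding an ℓ2 penalty to the log loss and its gradient; it uses the squared ℓ2-norm (a common implementation variant that keeps the gradient simple) rather than the plain norm above, and alpha is an assumed hyperparameter value.

```python
import numpy as np

def regularized_loss_and_grad(theta, X, y, alpha=0.1, eps=1e-12):
    # log loss plus an alpha * ||theta||_2^2 penalty
    m = X.shape[0]
    p_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))
    p_hat = np.clip(p_hat, eps, 1 - eps)              # avoid log(0)
    loss = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat)) \
           + alpha * np.sum(theta ** 2)
    grad = X.T @ (p_hat - y) / m + 2 * alpha * theta  # gradient of both terms
    return loss, grad
```

In scikit-learn, the same idea is controlled through LogisticRegression's penalty and C parameters, where C is the inverse of the regularization strength.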

16
Multi-Class Classification
One-Vs-Rest Method:

We can use a binary classifier for multi-class classification with the so-called
One-Vs-Rest (OvR) method. Specifically, it uses multiple rounds of
binary classification for multi-class classification.
For example, to determine if an object X is a dog, cat or fish, we call a
binary classifier f() as follows:

if f(X) outputs dog
    return dog;
else if f(X) outputs cat
    return cat;
else return fish
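As an illustration (not part of the slides), scikit-learn provides an OvR wrapper around any binary classifier; the iris dataset used here is just an assumed example with three classes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes
# one binary logistic regression classifier is trained per class (class vs. rest)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)
print(ovr.predict(X[:5]))
```

OneVsRestClassifier fits one copy of the base estimator per class and, at prediction time, selects the class whose classifier gives the highest score.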

17
Multi-Class Classification
Multinomial Logistic Regression:

Another approach for multi-class classification is to use
multinomial logistic regression. For each class 1 ≤ k ≤ K:

1) first compute $s_k(x) = \theta_k^\top x$
2) then compute the Softmax function:

$\hat{p}_k = \sigma(s(x))_k = \frac{\exp(s_k(x))}{\sum_{j=1}^{K} \exp(s_j(x))}$

where $\theta_k$ is the vector of parameters of the input features for $s_k$.
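A minimal NumPy sketch of these two steps; Theta is assumed to be a (K, d) matrix whose k-th row is θ_k, and the numerical values are illustrative.

```python
import numpy as np

def softmax_scores(Theta, x):
    # step 1: s_k(x) = theta_k . x for every class k
    s = Theta @ x
    # step 2: softmax, shifted by max(s) for numerical stability
    exp_s = np.exp(s - np.max(s))
    return exp_s / np.sum(exp_s)

Theta = np.array([[0.5, -0.2], [0.1, 0.9], [-0.4, 0.3]])  # K=3 classes, d=2 features
x = np.array([1.0, 2.0])
p_hat = softmax_scores(Theta, x)
print(p_hat, p_hat.sum())  # class probabilities summing to 1
```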

18
Multi-Class Classification
Training Multinomial Logistic Regression Model:

The performance measure is the cross-entropy cost function:

$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log(\hat{p}_k^{(i)})$

where $y_k^{(i)} = \begin{cases} 1, & \text{the } i\text{-th example is of class } k \\ 0, & \text{the } i\text{-th example is not of class } k \end{cases}$

GD can be used to train the multinomial logistic regression
model. The gradient is:

$\nabla_{\theta_k} J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{p}_k^{(i)} - y_k^{(i)} \right) x^{(i)}$
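A hedged NumPy sketch of the cross-entropy cost and its gradient, assuming X has shape (m, d), Y is one-hot encoded with shape (m, K), and Theta has shape (d, K) with one column per θ_k; these shape conventions are assumptions for illustration.

```python
import numpy as np

def softmax(S):
    exp_S = np.exp(S - S.max(axis=1, keepdims=True))
    return exp_S / exp_S.sum(axis=1, keepdims=True)

def cross_entropy_and_grad(Theta, X, Y, eps=1e-12):
    # X: (m, d) features, Y: (m, K) one-hot labels, Theta: (d, K) parameters
    m = X.shape[0]
    P_hat = softmax(X @ Theta)                    # (m, K) class probabilities
    cost = -np.sum(Y * np.log(P_hat + eps)) / m   # J(Theta)
    grad = X.T @ (P_hat - Y) / m                  # one gradient column per theta_k
    return cost, grad
```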

19
Multi-Class Classification
Other Approaches: One-Vs-One Method:

The One-Vs-One (OvO) method constructs a binary classifier for each
pair of classes. Therefore, with K classes, we need to construct K(K-1)/2
binary classifiers.

The decision at prediction time can be made by counting the votes from the
individual binary classifiers. In case of a tie, the aggregated classification
confidence (i.e., the output probability) of the individual binary classifiers
of each class is compared, and the class with the higher confidence is selected.

The OvO method is slower than OvR. But for some algorithms (e.g., kernel
algorithms) that cannot scale to many training examples, this approach
can be helpful.
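As an illustrative sketch (not part of the slides), scikit-learn also offers an OvO wrapper; the kernel SVM base estimator and the iris dataset are assumed choices, echoing the remark about kernel algorithms above.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes -> 3*(3-1)/2 = 3 pairwise classifiers
ovo = OneVsOneClassifier(SVC(kernel='rbf'))
ovo.fit(X, y)
print(ovo.predict(X[:5]))
```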

20
Multi-Class Classification
Other Approaches: Error-Correcting Output Codes
The Error-Correcting Output Codes (ECOC) method encodes K classes into N bit
vectors. Each class is represented as a bit in each bit vector. ECOC trains N
binary classifiers, each splitting one group of classes from another (using the
column bit vectors below). At prediction time, the N binary classifiers are called,
and their outputs yield an N-bit vector. The class whose codeword has the closest
Euclidean distance to this N-bit vector is selected. To reduce the classification
error, error-correcting codes are used when generating the "code book".

A code book with K=9, N=15
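As an illustrative sketch (not part of the slides), scikit-learn implements a related idea in OutputCodeClassifier; note that it generates a random code book rather than a designed error-correcting one, and the base estimator, code_size value, and iris dataset here are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)   # K = 3 classes
# code_size controls how many binary classifiers are trained relative to K
# (code_size=5 with K=3 gives a 15-bit codeword per class)
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000), code_size=5, random_state=0)
ecoc.fit(X, y)
print(ecoc.predict(X[:5]))
```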

21
