DDA3020 Lecture 06 Logistic Regression
Jicong Fan
School of Data Science, CUHK-SZ
3 Logistic regression
Two solutions:
Closed-form solution: $w^* = (X^\top X)^{-1} X^\top y$
Gradient descent: $w \leftarrow w - \alpha X^\top (Xw - y)$, repeated for multiple iterations until convergence
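As a quick illustration of these two solutions, here is a minimal NumPy sketch on synthetic data (the problem sizes, noise level, step size, and iteration count are illustrative assumptions, not part of the lecture):

```python
import numpy as np

# Synthetic regression data; sizes and noise level are arbitrary choices.
rng = np.random.default_rng(0)
m, d = 100, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, d))])  # first column absorbs the bias
w_true = rng.normal(size=d + 1)
y = X @ w_true + 0.01 * rng.normal(size=m)

# Closed-form solution: w* = (X^T X)^{-1} X^T y, computed via a linear solve.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: w <- w - alpha * X^T (X w - y), repeated until convergence.
alpha = 1.0 / np.linalg.norm(X.T @ X, 2)   # step size below 2 / lambda_max guarantees convergence
w_gd = np.zeros(d + 1)
for _ in range(5000):
    w_gd -= alpha * (X.T @ (X @ w_gd - y))

print(np.allclose(w_closed, w_gd, atol=1e-4))  # the two solutions should agree
```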
$p(y \mid x, w) = \mathcal{N}(\phi(x)^\top w, \sigma^2)$,   (8)
where
$\phi(x) = [1, x_1, \ldots, x_d, \ldots, x_i x_j, \ldots, x_i x_j x_k, \ldots]^\top$,
$w = [b, w_1, \ldots, w_d, \ldots, w_{ij}, \ldots, w_{ijk}, \ldots]^\top$.
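A minimal sketch of one way to construct such a polynomial feature map $\phi(x)$ in code (the helper name poly_features and the degree cutoff are illustrative assumptions):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(x, degree=3):
    """Map x = [x_1, ..., x_d] to [1, x_i, x_i*x_j, x_i*x_j*x_k, ...] up to `degree`."""
    feats = [1.0]  # constant feature, paired with the bias b in w
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), k):
            feats.append(np.prod([x[i] for i in idx]))  # monomial x_{i1} * ... * x_{ik}
    return np.array(feats)

print(poly_features(np.array([2.0, 3.0]), degree=2))  # [1. 2. 3. 4. 6. 9.]
```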
Variants of linear regression
[Figure: contours in the parameter space $(u_1, u_2)$, marking the ML estimate, the MAP estimate, and the prior mean.]
3 Logistic regression
It seems that the simple threshold classifier built on linear regression works well on this classification task.
However, if there is a positive sample with a very large tumor size (see the plot above), what will happen?
The hypothesis function will change significantly, causing some positive samples to be misclassified as negative (not malignant). How can we handle this? By adjusting the threshold value, or by adopting robust linear regression.
A desired hypothesis function for this task should satisfy $f_{w,b}(x) \in [0, 1]$.
To this end, we introduce the following function (the sigmoid, or logistic, function):
$f_{w,b}(x) = g(w^\top x) \in [0, 1], \qquad g(z) = \dfrac{1}{1 + \exp(-z)}$
[Figure: plots of the sigmoid function $g(z)$ for $z \in [-10, 10]$, with values ranging from 0 to 1.]
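A minimal sketch reproducing such a curve (NumPy and Matplotlib assumed; the plotting range matches the figure above):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)); outputs always lie in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 200)
plt.plot(z, sigmoid(z))
plt.xlabel("z")
plt.ylabel("g(z)")
plt.show()
```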
3 Logistic regression
Cross-entropy loss:
$\mathrm{cost}(y(x), f_{w,b}(x)) = \begin{cases} -\log(f_{w,b}(x)), & \text{if } y(x) = 1 \\ -\log(1 - f_{w,b}(x)), & \text{if } y(x) = 0 \end{cases}$
$w \leftarrow w - \alpha \nabla_w J(w)$, where
$\nabla_w J(w) = \dfrac{1}{m} \sum_{i=1}^{m} \big[ f_{w,b}(x_i) - y_i \big]\, x_i$
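A minimal vectorized sketch of this update (binary labels in {0, 1}; the function name, default step size, and iteration count are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    """Gradient descent on the cross-entropy loss.

    X: (m, d+1) design matrix whose first column is all ones (it absorbs b).
    y: (m,) labels in {0, 1}.
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        f = sigmoid(X @ w)          # f_{w,b}(x_i) for all i at once
        grad = X.T @ (f - y) / m    # (1/m) * sum_i [f_{w,b}(x_i) - y_i] x_i
        w -= alpha * grad           # w <- w - alpha * grad_w J(w)
    return w
```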
Exercise: Suppose you are running a logistic regression model and want to monitor the learning procedure in order to find a suitable learning rate α. Which of the following is a reasonable way to make sure α is set properly and that gradient descent is running correctly?
Plot $J(w) = -\frac{1}{m}\sum_{i}^{m} (y_i - f_{w,b}(x_i))^2$ as a function of the number of iterations (i.e., the horizontal axis is the iteration number) and make sure $J(w)$ is decreasing on every iteration.
Plot $J(w) = -\frac{1}{m}\sum_{i}^{m} \big[ y_i \log(f_{w,b}(x_i)) + (1 - y_i) \log(1 - f_{w,b}(x_i)) \big]$ as a function of the number of iterations (i.e., the horizontal axis is the iteration number) and make sure $J(w)$ is decreasing on every iteration.
Plot J(w) as a function of w and make sure it is decreasing on every
iteration.
Plot J(w) as a function of w and make sure it is convex.
Softmax function:
$f^{(j)}_{W,b}(x) = \dfrac{\exp(w_j^\top x + b_j)}{\sum_{c=1}^{C} \exp(w_c^\top x + b_c)} = P(y = j \mid x; W, b)$,   (12)
$w_j \leftarrow w_j - \alpha \dfrac{\partial J(W)}{\partial w_j}$,
where
$\dfrac{\partial J(W)}{\partial w_j} = -\dfrac{1}{m} \sum_{i}^{m} \left[ \dfrac{I(y_i = j)}{f_{w_j,b_j}(x_i)} \cdot \dfrac{\partial f_{w_j,b_j}(x_i)}{\partial w_j} + \sum_{c=1,\, c \neq j}^{C} \dfrac{I(y_i = c)}{f_{w_c,b_c}(x_i)} \cdot \dfrac{\partial f_{w_c,b_c}(x_i)}{\partial w_j} \right]$
with
$\dfrac{\partial f_{w_j,b_j}(x_i)}{\partial w_j} = f_{w_j,b_j}(x_i) \cdot (1 - f_{w_j,b_j}(x_i)) \cdot x_i$,
$\dfrac{\partial f_{w_c,b_c}(x_i)}{\partial w_j} = -f_{w_j,b_j}(x_i) \cdot f_{w_c,b_c}(x_i) \cdot x_i \quad (c \neq j)$,
$\Longrightarrow \dfrac{\partial J(W)}{\partial w_j} = \dfrac{1}{m} \sum_{i}^{m} \big[ f_{w_j,b_j}(x_i) - I(y_i = j) \big]\, x_i$.   (14)
Note: $\{w_c\}_{c=1}^{C}$ should be updated in parallel, rather than sequentially.
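A minimal sketch of this softmax-regression update (the layout of W, the one-hot labels Y, and the function names are assumptions; the bias is absorbed into x via a leading column of ones):

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, with max-subtraction for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def softmax_regression_step(W, X, Y, alpha):
    """One gradient step; all columns w_j are updated in parallel.

    X: (m, d+1) with a leading column of ones (absorbs the biases b_j).
    W: (d+1, C), column j stacks [b_j; w_j].
    Y: (m, C) one-hot labels, Y[i, j] = I(y_i = j).
    Column j of the gradient is (1/m) * sum_i [f_{w_j,b_j}(x_i) - I(y_i = j)] x_i, as in Eq. (14).
    """
    m = X.shape[0]
    F = softmax(X @ W)           # F[i, j] = f^{(j)}_{W,b}(x_i)
    grad = X.T @ (F - Y) / m     # (d+1, C): column j is dJ/dw_j
    return W - alpha * grad
```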
3 Logistic regression
Overfitting: If we have too many features, the learned hypothesis may fit the training data very well (low bias) but fail to generalize to new examples.
Generally, there are two approaches to addressing the overfitting problem:
Reducing the number of features:
Feature selection
Dimensionality reduction (introduced in later lectures)
Regularization:
Keep all features, but reduce the magnitude of each parameter, so that each feature contributes only a small amount to predicting y
In the following, we will focus on the regularization-based approach.
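As a sketch of how regularization plugs into the logistic-regression gradient (the L2 penalty form, its strength lam, and the choice to leave the bias unpenalized are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_grad(w, X, y, lam):
    """Gradient of the cross-entropy loss plus an L2 penalty (lam / 2m) * ||w||^2.

    The first entry of w is the bias (first column of X is all ones) and is not penalized.
    """
    m = X.shape[0]
    grad = X.T @ (sigmoid(X @ w) - y) / m    # cross-entropy part, as before
    penalty = (lam / m) * w                  # shrinks every weight toward zero
    penalty[0] = 0.0                         # leave the bias term unregularized
    return grad + penalty
```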
3 Logistic regression
Then, we have
$P(y \mid x; w) = \begin{cases} \mu & \text{if } y = 1, \\ 1 - \mu & \text{if } y = 0. \end{cases}$
Thus, we obtain the compact form $P(y \mid x; w) = \mu^{y} (1 - \mu)^{1-y}$.
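Taking the negative log-likelihood of this Bernoulli model over the training set recovers the cross-entropy loss used earlier; a standard derivation, writing $\mu_i = f_{w,b}(x_i)$:

```latex
-\log \prod_{i=1}^{m} P(y_i \mid x_i; w)
  = -\sum_{i=1}^{m} \log \left( \mu_i^{\,y_i} (1-\mu_i)^{1-y_i} \right)
  = -\sum_{i=1}^{m} \Big[ y_i \log \mu_i + (1-y_i) \log (1-\mu_i) \Big].
```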
3 Logistic regression
Note: Each variant of linear/logistic regression can be derived from both the deterministic and the probabilistic perspectives.
Own reading: Both linear regression and logistic regression are special cases of generalized linear models. If interested, you can find more details in Section 4 of the book "Pattern Recognition and Machine Learning", Bishop, 2006.