CH3 Logistic Regression 2020
Logistic regression
Classification problem
Hypothesis representation
Decision boundary
Cost function for logistic regression
Gradient descent for logistic regression
Multiclass classification problems
Binary classification
Classification problems
o Email: spam / not spam?
o Online transactions: fraudulent?
o Tumor: malignant / benign?
The variable to predict in these problems is y
o y is either 0 or 1
0 = negative class (absence of something)
1 = positive class (presence of something)
We start with binary classification problems
Later we look at multiclass classification, although this is just an extension of binary classification
Hypothesis representation
The hypothesis:

$$H_w(x) = g(z) = \frac{1}{1 + e^{-w^T x}} \quad \text{where } z = w^T x$$

Derivative of the sigmoid:

$$g'(z) = \frac{d}{dz}\,\frac{1}{1 + e^{-z}} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}}\left(1 - \frac{1}{1 + e^{-z}}\right) = g(z)\,(1 - g(z))$$
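As a quick sanity check, here is a minimal Python sketch of the sigmoid and of the identity g'(z) = g(z)(1 - g(z)); the names sigmoid and sigmoid_derivative are our own, not part of the course:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """g'(z) = g(z) * (1 - g(z)), as derived above."""
    gz = sigmoid(z)
    return gz * (1.0 - gz)

# Compare the closed form against a central finite difference at z = 0.5
z = 0.5
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6
print(sigmoid_derivative(z), numeric)  # both ~0.23500
```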
Decision boundary
So what we've shown is that the hypothesis predicts y = 1 when w^T x >= 0
o The corollary is that when w^T x < 0, the hypothesis predicts y = 0
o Let's use this to better understand how the hypothesis makes its predictions
Decision boundary
Example: w0 = -3, w1 = 1 and w2 = 1
Our parameter vector is a column vector with the above values, so w^T is the row vector [-3, 1, 1]
The z here becomes w^T x
o We predict "y = 1" if
-3*x0 + 1*x1 + 1*x2 >= 0, i.e. -3 + x1 + x2 >= 0 (since x0 = 1)
We can also re-write this as
o If (x1 + x2 >= 3) then we predict y = 1
o If we plot x1 + x2 = 3 we graphically obtain our decision boundary
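A minimal Python sketch of this example; the helper name predict and the test points are ours:

```python
import numpy as np

# Parameters from the example: w0 = -3, w1 = 1, w2 = 1
w = np.array([-3.0, 1.0, 1.0])

def predict(x1, x2):
    """Return 1 when w^T x >= 0, i.e. when x1 + x2 >= 3."""
    x = np.array([1.0, x1, x2])   # x0 = 1 is the bias feature
    return int(w @ x >= 0)

print(predict(2, 2))  # 1: the point (2, 2) satisfies 2 + 2 >= 3
print(predict(1, 1))  # 0: 1 + 1 < 3, so we predict y = 0
```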
Cost function for logistic regression
• We want a classifier that produces very high Hw(x) when y = 1, and conversely very low Hw(x) when y = 0. We hope that Hw(x) is very close to y for each sample
• In other words,
• If y = 1 we want to maximize Hw(x)   (1)
• If y = 0 we want to maximize 1 - Hw(x)   (2)
• If we combine (1) and (2), we want to maximize:
$$H_w(x)^{y}\,\left(1 - H_w(x)\right)^{1-y}$$

Taking the log (which preserves the maximizer):

$$\log\left[H_w(x)^{y}\,\left(1 - H_w(x)\right)^{1-y}\right] = y \log H_w(x) + (1 - y)\log\left(1 - H_w(x)\right)$$

$$-\left[\,y \log H_w(x) + (1 - y)\log\left(1 - H_w(x)\right)\right]$$
• Generally, in Machine Learning we like to minimize the loss; that is why we changed the sign. This is by convention.
• The above formula defines a cost function (or loss) for only one sample. We also need a loss function for multiple samples, which we will call C(w).
$$C(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)} \log H_w(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - H_w(x^{(i)})\right)\right]$$
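A possible vectorized implementation of C(w) in Python, assuming the first column of X is the constant feature x0 = 1:

```python
import numpy as np

def cost(w, X, y):
    """Cross-entropy cost C(w) = -(1/m) * sum[y log H + (1-y) log(1-H)].

    X is an (m, n+1) matrix whose first column is all ones (x0 = 1),
    y is a length-m vector of 0/1 labels.
    """
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ w))   # H_w(x) for every sample at once
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```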
Gradient descent for logistic regression
Gradient of C(w) 1/3

$$C(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y \log H_w(x) + (1 - y)\log\left(1 - H_w(x)\right)\right]$$

$$\frac{\partial C(w)}{\partial w_j} = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y\,\frac{\partial \log H_w(x)}{\partial w_j} + (1 - y)\,\frac{\partial \log\left(1 - H_w(x)\right)}{\partial w_j}\right]$$

$$= -\frac{1}{m}\sum_{i=1}^{m}\left[\,y\,\frac{\partial H_w(x)/\partial w_j}{H_w(x)} + (1 - y)\,\frac{\partial \left(1 - H_w(x)\right)/\partial w_j}{1 - H_w(x)}\right]$$
Gradient of C(w) 2/3

$$H_w(x) = (f \circ g)(w) = \frac{1}{1 + e^{-\sum_{k=0}^{n} w_k x_k}}$$

with

$$f(z) = \frac{1}{1 + e^{-z}}, \qquad g(w) = w_0 x_0 + \dots + w_j x_j + \dots + w_n x_n$$

so that

$$\frac{\partial g}{\partial w_j} = \frac{\partial}{\partial w_j}\left(w_0 x_0 + \dots + w_j x_j + \dots + w_n x_n\right) = x_j$$
Gradient of C(w) 3/3

Substituting the two previous results and simplifying with g'(z) = g(z)(1 - g(z)) gives the gradient descent update:

repeat {

$$w_j := w_j + \frac{\alpha}{m}\sum_{i=1}^{m}\left(y^{(i)} - H_w(x^{(i)})\right) x_j^{(i)}$$

} (simultaneously for every j)
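One way to code this update in Python; the simultaneous update over all w_j is expressed as a single matrix-vector product (the name gradient_step is our own):

```python
import numpy as np

def gradient_step(w, X, y, alpha):
    """One pass of w_j := w_j + (alpha/m) * sum_i (y_i - H_w(x_i)) * x_ij,
    applied simultaneously to all weights."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ w))   # H_w(x) for all m samples
    return w + (alpha / m) * (X.T @ (y - h))
```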
1st iteration:

Training set:

x0      x1      x2      y
1      -0.1     1.4     0
1      -0.5    -0.1     0
1       1.3     0.9     1
1      -0.6     0.4     1

Initial values of the w_i: w0 = 0.1, w1 = 0.2, w2 = 0.3
Learning rate: alpha = 0.01
Sigmoid: g(z) = 1 / (1 + exp(-z))

• w0 = 0.1 + 0.01/4 * ((0 - g(w0*x0(1) + w1*x1(1) + w2*x2(1))) * x0(1) +
       (0 - g(w0*x0(2) + w1*x1(2) + w2*x2(2))) * x0(2) +
       (1 - g(w0*x0(3) + w1*x1(3) + w2*x2(3))) * x0(3) +
       (1 - g(w0*x0(4) + w1*x1(4) + w2*x2(4))) * x0(4))
  = 0.1 + 0.01/4 * ((0 - g(0.1 - 0.2*0.1 + 0.3*1.4)) +
       (0 - g(0.1 - 0.2*0.5 - 0.3*0.1)) +
       (1 - g(0.1 + 0.2*1.3 + 0.3*0.9)) +
       (1 - g(0.1 - 0.2*0.6 + 0.3*0.4)))
  = 0.1 + 0.01/4 * ((0 - g(0.5)) + (0 - g(-0.03)) + (1 - g(0.63)) + (1 - g(0.1)))
  = 0.1 + 0.01/4 * (-0.62246 - 0.49250 + 0.34751 + 0.47502)
  = 0.1 + 0.0025 * (-0.29243)
  = 0.09927
• w1 = 0.2 + 0.01/4 * ((0 - g(0.1 - 0.2*0.1 + 0.3*1.4)) * (-0.1) +
       (0 - g(0.1 - 0.2*0.5 - 0.3*0.1)) * (-0.5) +
       (1 - g(0.1 + 0.2*1.3 + 0.3*0.9)) * (1.3) +
       (1 - g(0.1 - 0.2*0.6 + 0.3*0.4)) * (-0.6))
  = 0.2 + 0.01/4 * ((0 - g(0.5)) * (-0.1) + (0 - g(-0.03)) * (-0.5) + (1 - g(0.63)) * (1.3) + (1 - g(0.1)) * (-0.6))
  = 0.2 + 0.01/4 * (0.06225 + 0.24625 + 0.45176 - 0.28501)
  = 0.2 + 0.0025 * (0.47525)
  = 0.20119
• w2 = 0.3 + 0.01/4 * ((0 - g(0.1 - 0.2*0.1 + 0.3*1.4)) * (1.4) +
       (0 - g(0.1 - 0.2*0.5 - 0.3*0.1)) * (-0.1) +
       (1 - g(0.1 + 0.2*1.3 + 0.3*0.9)) * (0.9) +
       (1 - g(0.1 - 0.2*0.6 + 0.3*0.4)) * (0.4))
  = 0.3 + 0.01/4 * ((0 - g(0.5)) * (1.4) + (0 - g(-0.03)) * (-0.1) + (1 - g(0.63)) * (0.9) + (1 - g(0.1)) * (0.4))
  = 0.3 + 0.01/4 * (-0.87144 + 0.04925 + 0.31276 + 0.19001)
  = 0.3 + 0.0025 * (-0.31942)
  = 0.29920
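The three hand computations above can be checked with a short Python script that performs the same simultaneous update on the training set:

```python
import numpy as np

# Training set: first column is x0 = 1
X = np.array([[1, -0.1,  1.4],
              [1, -0.5, -0.1],
              [1,  1.3,  0.9],
              [1, -0.6,  0.4]])
y = np.array([0, 0, 1, 1])
w = np.array([0.1, 0.2, 0.3])   # initial w0, w1, w2
alpha = 0.01

h = 1.0 / (1.0 + np.exp(-X @ w))        # g(z) for the four samples
w = w + (alpha / 4) * (X.T @ (y - h))   # simultaneous update of w0, w1, w2
print(w)  # ~[0.09927, 0.20119, 0.29920]
```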
2nd iteration:
• w0 = 0.09927 + 0.01/4 * ((0 - g(w0*x0(1) + w1*x1(1) + w2*x2(1))) * x0(1) +
       (0 - g(w0*x0(2) + w1*x1(2) + w2*x2(2))) * x0(2) +
       (1 - g(w0*x0(3) + w1*x1(3) + w2*x2(3))) * x0(3) +
       (1 - g(w0*x0(4) + w1*x1(4) + w2*x2(4))) * x0(4))
  where the g terms are now evaluated with the updated values w0 = 0.09927, w1 = 0.20119, w2 = 0.29920
Multiclass classification problems

o Use one-vs-all classification to make binary classification work for multiclass classification
o Split the training set into separate binary classification problems, one per class (e.g. three for three classes)
i.e. create a new "fake" binary training set for each class
o For each class i, train a classifier Hw^(i)(x) that estimates P(y = i | x; w); to classify a new x, pick the class i whose classifier outputs the highest Hw^(i)(x)
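A sketch of one-vs-all training in Python, reusing the gradient-descent update from earlier; the function names and the iteration count are our own choices:

```python
import numpy as np

def train_one_vs_all(X, y, num_classes, alpha=0.01, iters=1000):
    """Fit one logistic regression classifier per class (one-vs-all).

    For class i, the labels are recoded to 1 where y == i and 0 elsewhere,
    turning the multiclass problem into num_classes binary problems.
    """
    W = np.zeros((num_classes, X.shape[1]))
    for i in range(num_classes):
        y_bin = (y == i).astype(float)        # the "fake" binary training set
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            h = 1.0 / (1.0 + np.exp(-X @ w))
            w = w + (alpha / len(y)) * (X.T @ (y_bin - h))
        W[i] = w
    return W

def predict_multiclass(W, x):
    """Pick the class whose classifier outputs the highest Hw(x)."""
    return int(np.argmax(1.0 / (1.0 + np.exp(-W @ x))))
```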