4. Logistic Regression

Logistic regression is a supervised classification algorithm that predicts discrete outcomes based on input features, with types including binary, multi-class, and ordinal logistic regression. It uses the sigmoid function to ensure predicted probabilities fall within the range of 0 to 1, and employs a cost function known as Log-Loss to minimize prediction errors during training. The training process involves gradient descent to optimize coefficients, allowing for accurate classification of new inputs.


Logistic Regression

Logistic Regression
• A supervised classification algorithm.
• Predicted output is discrete – only specific values or classes
are allowed, e.g., Pass/Fail, Spam/Not Spam, Malignant or
Benign tumour, etc.
• That means, Output y (y = f (x)), which is predicted from x
(inputs or features), takes only discrete values/classes.
• Types of logistic regression:
(a) Binary logistic regression: y takes two discrete values (for
two classes).

1
Logistic Regression (contd.)

(b) Multi-class logistic regression: y takes more than two
discrete values (for multiple classes). For example, in digit
classification, there are 10 classes (0 – 9).
(c) Ordinal logistic regression: Multiple classes in some order
(Eg: Low, Medium, High).
• Predicts class based on probability.

2
Binary Logistic Regression

• Two-class classification: y = 0 or 1.
• y can be predicted from a single variable or feature (x):
y = f (x) = β0 + β1 x
• Or from multiple features (x1 , x2 , · · · , xk ):
y = f (x1 , x2 , · · · , xk ) = β0 + β1 x1 + β2 x2 + · · · + βk xk
• An example on student dataset:
Hours studied Hours slept Result (Pass (1)/Fail (0))
4.85 9.63 1
8.62 3.23 0
5.43 8.23 1
9.21 6.34 0

3
Binary Logistic Regression (contd.)

• f should generate probabilities (in the range 0 to 1), to
predict classes.
• Choose a threshold P (say, 0.5) such that, if probability is less
than threshold P, predict y as Class 0. Otherwise, predict y
as Class 1.
• The above function, f , is the one used for linear regression.
• As shown in the figure below, it is not suitable for logistic
regression, as the predicted output should lie in the range [0, 1].

4
Binary Logistic Regression (contd.)

• A function that has the range, [0, 1], is required.


• Sigmoid function maps any real value to the range, [0, 1].
σ(z) = 1 / (1 + e^(−z))

5
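The Sigmoid above can be sketched in plain Python (the function name `sigmoid` is mine, not from the slides):

```python
import math

def sigmoid(z):
    """Map any real value z to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# σ is 0.5 at z = 0 and saturates towards 0 and 1 for large |z|.
print(sigmoid(0))    # 0.5
print(sigmoid(5))    # close to 1
print(sigmoid(-5))   # close to 0
```

Note that σ(z) never actually reaches 0 or 1; it only approaches them as z → ∓∞.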
Binary Logistic Regression (contd.)

• If z = β0 + β1 x1 + β2 x2 + · · · + βk xk , σ(z) gives values in the
range [0, 1].

6
Binary Logistic Regression (contd.)

• Logistic regression is based on the Sigmoid function, which is
also called the Logistic function, given by:
y = f (x1 , x2 , · · · , xk ) = 1 / (1 + e^(−(β0 + β1 x1 + β2 x2 + · · · + βk xk )))
• Similar to linear regression, coefficients (β0 , β1 , etc.) are
learnt from training data, based on gradient descent on the
error or cost function – Training process.

7
Cost function
• Consider n training points or samples, (Xi , Yi ), where each Xi
is a k-dimensional feature vector (k features) and Yi = 0 or 1.
• For logistic regression, since the Sigmoid function (with its
exponential term in the denominator) is used, MSE is no longer
a convex cost; the following cost function, called Log-Loss or
Cross-Entropy, is used instead of the MSE used for linear
regression:
J(θ) = J(β0 , β1 , · · · , βk ) = −(1/n) Σ_{i=1}^{n} [ Yi log(f (Xi)) + (1 − Yi) log(1 − f (Xi)) ]
• Similar to MSE, the above cost function which quantifies the
difference between actual (Yi ) and predicted (f (Xi )) outputs,
has to be minimized.

8
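The Log-Loss above can be computed with a short sketch in plain Python (the name `log_loss` and the `eps` clipping guard are mine; clipping avoids log(0) for probabilities of exactly 0 or 1):

```python
import math

def log_loss(y_true, y_pred, eps=1e-12):
    """Cross-Entropy / Log-Loss averaged over n samples.
    y_true: actual labels Yi (0 or 1); y_pred: predicted probabilities f(Xi)."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n

print(log_loss([1, 0], [0.9, 0.1]))   # low loss: predictions match the labels
print(log_loss([1, 0], [0.1, 0.9]))   # high loss: predictions are wrong
```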
Cost function (contd.)

• If the actual output, Yi = 1, J(θ) = −log(f (Xi)). If f (Xi) is
low, the error, J(θ), is high. If f (Xi) approaches 1, J(θ)
moves towards 0.
• If the actual output, Yi = 0, J(θ) = −log (1 − f (Xi )). If f (Xi )
is high, error J(θ) is high. If f (Xi ) approaches 0, J(θ) moves
towards 0.
• That means, if both actual and predicted outputs, Yi and
f (Xi ), match, then the error is 0.

9
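The two per-sample loss terms above can be checked numerically; a quick sketch (helper names `loss_y1` / `loss_y0` are mine):

```python
import math

def loss_y1(f):
    """Per-sample Log-Loss when the actual output is Yi = 1."""
    return -math.log(f)

def loss_y0(f):
    """Per-sample Log-Loss when the actual output is Yi = 0."""
    return -math.log(1 - f)

# Yi = 1: low f(Xi) gives a high loss; f(Xi) near 1 gives a loss near 0.
print(loss_y1(0.01), loss_y1(0.99))
# Yi = 0: high f(Xi) gives a high loss; f(Xi) near 0 gives a loss near 0.
print(loss_y0(0.99), loss_y0(0.01))
```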
Cost function (contd.)

• Illustrated in the plots below, for some Y and f (X ):
J(θ) vs f (X ), for Y = 1 and Y = 0.

10
Gradient descent for training

• Initialize each βj , j = 0, 1, · · · , k, to some random values.
• For one or more epochs, or until some minimum error
threshold (say, ϵ < 0.001) is reached, do the following:
For each j = 0, 1, · · · , k,
(i) δβj = −η ∂J(θ)/∂βj
(ii) βj = βj + δβj
• Training process can be monitored by plotting cost, J(θ) vs
number of training epochs.

11
Gradient descent for training (contd.)

12
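The training loop above can be sketched in plain Python. This is a minimal batch-gradient-descent sketch, not a production implementation: the function name `train`, the toy 1-feature dataset, and the zero initialization (instead of random values) are my own choices; it uses the standard Log-Loss gradient, ∂J/∂βj = (1/n) Σ (f(Xi) − Yi) xij, with xi0 = 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, Y, lr=0.1, epochs=5000):
    """Gradient descent for binary logistic regression.
    X: list of k-feature vectors, Y: labels (0/1).
    Returns coefficients [b0, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)                      # b0 plus one weight per feature
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, Y):
            f = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = f - yi                        # f(Xi) - Yi
            grad[0] += err                      # xi0 = 1 for the intercept
            for j in range(k):
                grad[j + 1] += err * xi[j]
        for j in range(k + 1):
            beta[j] -= lr * grad[j] / n         # step against the gradient
    return beta

# Toy 1-feature dataset: class 1 roughly when x > 2.5.
X, Y = [[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1]
beta = train(X, Y)
probs = [sigmoid(beta[0] + beta[1] * x[0]) for x in X]
print([round(p, 3) for p in probs])
```

Plotting the returned cost per epoch (as the slide suggests) would only require accumulating J(θ) inside the epoch loop.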
Testing

• After the weights or coefficients (β0 , β1 , · · · , βk ) are learnt
from training, the function
y = f (x1 , x2 , · · · , xk ) = 1 / (1 + e^(−(β0 + β1 x1 + · · · + βk xk )))
can be used to predict the output, y , for any new input,
(x1 , x2 , · · · , xk ).
• Choose a threshold, P, say 0.5.
• If y ≥ P, output Class 1.
• If y < P, output Class 0.

13
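The thresholding rule above is a one-liner; a small sketch (the name `predict_class` is mine):

```python
def predict_class(prob, threshold=0.5):
    """Map a predicted probability y to Class 1 if y >= P, else Class 0."""
    return 1 if prob >= threshold else 0

print(predict_class(0.73))  # 1
print(predict_class(0.12))  # 0
print(predict_class(0.5))   # 1 (ties go to Class 1, since the rule is y >= P)
```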
Multi-class logistic regression

• Suppose there are c classes.


• Divide the problem into c binary classification problems, one
per class – generate c binary classifiers (one-vs-rest).
• To train binary classifier i, training samples from class i have
actual output, y = 1. For the remaining samples, y = 0.
• After training c binary classifiers, each classifier i can be used
to determine if a sample belongs to class i or not (binary).
• For testing, compute probabilities from the c binary classifiers
corresponding to the c classes, and then output the class
which has the maximum probability.

14
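The one-vs-rest relabelling and the maximum-probability rule above can be sketched in plain Python (the names `one_vs_rest_labels` and `ovr_predict` are mine, not from the slides):

```python
def one_vs_rest_labels(Y, cls):
    """Relabel a multi-class target to train binary classifier `cls`:
    samples of class `cls` -> 1, all remaining samples -> 0."""
    return [1 if y == cls else 0 for y in Y]

def ovr_predict(probs_per_class):
    """Given the c probabilities from the c binary classifiers,
    output the class with the maximum probability."""
    return max(range(len(probs_per_class)), key=lambda c: probs_per_class[c])

Y = [0, 2, 1, 2, 0]
print(one_vs_rest_labels(Y, 2))        # [0, 1, 0, 1, 0]
print(ovr_predict([0.2, 0.1, 0.7]))    # 2
```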
Thank You

15
