
Logistic Regression

Introduction
Comparison to linear regression
Types of logistic regression
Binary logistic regression
Sigmoid activation
Decision boundary
Making predictions
Cost function
Gradient descent
Mapping probabilities to classes
Training
Model evaluation
Multiclass logistic regression
Procedure
Softmax activation
Scikit-Learn example

Introduction
Logistic regression is a classification algorithm used to assign observations to a discrete set of classes.
Unlike linear regression which outputs continuous number values, logistic regression transforms its output
using the logistic sigmoid function to return a probability value which can then be mapped to two or more
discrete classes.

Comparison to linear regression


Given data on time spent studying and exam scores, :doc:`linear_regression` and logistic regression can
predict different things:

• Linear Regression could help us predict the student's test score on a scale of 0 - 100. Linear
regression predictions are continuous (numbers in a range).
• Logistic Regression could help us predict whether the student passed or failed. Logistic
regression predictions are discrete (only specific values or categories are allowed). We can also
view probability scores underlying the model's classifications.

Types of logistic regression

• Binary (Pass/Fail)
• Multi (Cats, Dogs, Sheep)
• Ordinal (Low, Medium, High)

Binary logistic regression
Say we're given data on student exam results and our goal is to predict whether a student will pass or fail
based on number of hours slept and hours spent studying. We have two features (hours slept, hours
studied) and two classes: passed (1) and failed (0).

Studied  Slept  Passed
4.85     9.63   1
8.62     3.23   0
5.43     8.23   1
9.21     6.34   0

Graphically we could represent our data with a scatter plot.

Sigmoid activation
In order to map predicted values to probabilities, we use the :ref:`sigmoid <activation_sigmoid>` function.
The function maps any real value into another value between 0 and 1. In machine learning, we use
sigmoid to map predictions to probabilities.
Math
S(z) = \frac{1}{1 + e^{-z}}

Note

• S(z) = output between 0 and 1 (probability estimate)
• z = input to the function (your algorithm's prediction, e.g. mx + b)
• e = base of natural log
Graph

Code
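The original code block is not preserved in this copy; a minimal NumPy sketch of the sigmoid function could look like this (the name `sigmoid` is illustrative):

    import numpy as np

    def sigmoid(z):
        # Squash any real-valued input into the interval (0, 1)
        return 1.0 / (1.0 + np.exp(-z))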

Decision boundary
Our current prediction function returns a probability score between 0 and 1. In order to map this to a
discrete class (true/false, cat/dog), we select a threshold value or tipping point above which we will classify
values into class 1 and below which we classify values into class 0.
p \geq 0.5, \text{class}=1
p < 0.5, \text{class}=0
For example, if our threshold was .5 and our prediction function returned .7, we would classify this
observation as positive. If our prediction was .2 we would classify the observation as negative. For logistic
regression with multiple classes we could select the class with the highest predicted probability.

Making predictions
Using our knowledge of sigmoid functions and decision boundaries, we can now write a prediction
function. A prediction function in logistic regression returns the probability of our observation being
positive, True, or "Yes". We call this class 1 and its notation is P(class=1). As the probability gets closer
to 1, our model is more confident that the observation is in class 1.
Math
Let's use the same :ref:`multiple linear regression <multiple_linear_regression_predict>` equation from our
linear regression tutorial.
z = W_0 + W_1 \cdot Studied + W_2 \cdot Slept
This time however we will transform the output using the sigmoid function to return a probability value
between 0 and 1.
P(class=1) = \frac{1}{1 + e^{-z}}
If the model returns .4 it believes there is only a 40% chance of passing. If our decision boundary was .5,
we would categorize this observation as "Fail."
Code
We wrap the sigmoid function over the same prediction function we used in :ref:`multiple linear regression
<multiple_linear_regression_predict>`.
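The code for this step is also missing from this copy; a sketch of that wrapper, reusing the `sigmoid` sketch above (the `predict` signature and shapes are assumptions, not the original):

    def predict(features, weights):
        # features: (m, n) matrix of observations, weights: (n,) vector.
        # Returns P(class=1) for each observation.
        z = np.dot(features, weights)   # same linear combination as in linear regression
        return sigmoid(z)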

Cost function
Unfortunately we can't (or at least shouldn't) use the same cost function :ref:`mse` as we did for linear
regression. Why? There is a great math explanation in chapter 3 of Michael Nielsen's deep learning book 5,
but for now I'll simply say it's because our prediction function is non-linear (due to the sigmoid transform).
Squaring this prediction as we do in MSE results in a non-convex function with many local minima. If
our cost function has many local minima, gradient descent may not find the optimal global minimum.
Math
Instead of Mean Squared Error, we use a cost function called :ref:`loss_cross_entropy`, also known as
Log Loss. Cross-entropy loss can be divided into two separate cost functions: one for y = 1 and one for y = 0:

Cost(s(z), y) = -\log(s(z)) \quad \text{if } y = 1
Cost(s(z), y) = -\log(1 - s(z)) \quad \text{if } y = 0

The benefits of taking the logarithm reveal themselves when you look at the cost function graphs for y=1
and y=0. These smooth monotonic functions 7 (always increasing or always decreasing) make it easy to
calculate the gradient and minimize cost. Image from Andrew Ng's slides on logistic regression 1.

The key thing to note is the cost function penalizes confident and wrong predictions more than it rewards
confident and right predictions! The corollary is increasing prediction accuracy (closer to 0 or 1) has
diminishing returns on reducing cost due to the logistic nature of our cost function.
Above functions compressed into one

J(W) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(s(z^{(i)})) + (1 - y^{(i)}) \log(1 - s(z^{(i)})) \right]

Multiplying by y and (1 - y) in the above equation is a sneaky trick that lets us use the same equation to
solve for both the y=1 and y=0 cases. If y=0, the first term cancels out. If y=1, the second term cancels out. In
both cases we only perform the operation we need to perform.
Vectorized cost function

h = s(XW)
J(W) = \frac{1}{m} \left( -y^{T} \log(h) - (1 - y)^{T} \log(1 - h) \right)
Code
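A sketch of the cross-entropy cost described above, building on the `predict` sketch from the previous section (names and array shapes are assumptions):

    def cost_function(features, labels, weights):
        # Average cross-entropy (log loss) over all m observations.
        m = len(labels)
        predictions = predict(features, weights)
        class1_cost = -labels * np.log(predictions)             # cost when y = 1
        class0_cost = -(1 - labels) * np.log(1 - predictions)   # cost when y = 0
        return (class1_cost + class0_cost).sum() / m
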
Gradient descent
To minimize our cost, we use :doc:`gradient_descent` just like before in :doc:`linear_regression`. There
are other more sophisticated optimization algorithms out there such as conjugate gradient like
:ref:`optimizers_lbfgs`, but you don't have to worry about these. Machine learning libraries like Scikit-learn
hide their implementations so you can focus on more interesting things!
Math
One of the neat properties of the sigmoid function is that its derivative is easy to calculate. If you're curious,
there is a good walk-through derivation on Math Stack Exchange 6. Michael Nielsen also covers the topic in
chapter 3 of his book.
s'(z) = s(z)(1 - s(z))
Which leads to an equally beautiful and convenient cost function derivative:
C' = x(s(z) - y)

Note

• C' is the derivative of cost with respect to weights
• y is the actual class label (0 or 1)
• s(z) is your model's prediction
• x is your feature or feature vector.

Notice how this gradient is the same as the :ref:`mse` gradient; the only difference is the hypothesis
function.
Pseudocode

Repeat {
  1. Calculate gradient average
  2. Multiply by learning rate
  3. Subtract from weights
}

Code
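A sketch of one batch gradient descent step using the derivative above, again building on the `predict` sketch (the learning-rate argument `lr` is an assumed name):

    def update_weights(features, labels, weights, lr):
        # Gradient of the cross-entropy cost: X^T (s(z) - y) / m
        m = len(labels)
        predictions = predict(features, weights)
        gradient = np.dot(features.T, predictions - labels) / m
        return weights - lr * gradient   # step downhill, scaled by the learning rate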

Mapping probabilities to classes


The final step is to assign class labels (0 or 1) to our predicted probabilities.
Decision boundary
Convert probabilities to classes
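The original snippets for these two steps are not preserved here; a sketch of the thresholding described in the Decision boundary section (function names are illustrative):

    def decision_boundary(prob):
        # Map a single probability to a class label using the 0.5 threshold
        return 1 if prob >= 0.5 else 0

    def classify(probabilities):
        # Apply the threshold element-wise to an array of probabilities
        return np.vectorize(decision_boundary)(probabilities).flatten()
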
Example output

Probabilities = [ 0.967, 0.448, 0.015, 0.780, 0.978, 0.004]


Classifications = [1, 0, 0, 1, 1, 0]

Training
Our training code is the same as we used for :ref:`linear regression <simple_linear_regression_training>`.
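
A sketch of what that shared training loop could look like, assuming the `update_weights` and `cost_function` sketches above (the iteration count and logging interval are placeholders):

    def train(features, labels, weights, lr, iters):
        cost_history = []
        for i in range(iters):
            weights = update_weights(features, labels, weights, lr)
            cost = cost_function(features, labels, weights)
            cost_history.append(cost)
            if i % 1000 == 0:
                print("iter: " + str(i) + " cost: " + str(round(cost, 3)))
        return weights, cost_history
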
Model evaluation
If our model is working, we should see our cost decrease after every iteration.

iter: 0 cost: 0.635


iter: 1000 cost: 0.302
iter: 2000 cost: 0.264

Final cost: 0.2487. Final weights: [-8.197, .921, .738]


Cost history

Accuracy
:ref:`Accuracy <glossary_accuracy>` measures how correct our predictions were. In this case we simply
compare predicted labels to true labels and divide by the total.
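
A minimal sketch of that accuracy calculation, assuming predicted and actual labels are NumPy arrays of 0s and 1s:

    def accuracy(predicted_labels, actual_labels):
        # Fraction of predictions that exactly match the true labels
        diff = predicted_labels - actual_labels
        return 1.0 - float(np.count_nonzero(diff)) / len(diff)
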
Decision boundary
Another helpful technique is to plot the decision boundary on top of our predictions to see how our labels
compare to the actual labels. This involves plotting our predicted probabilities and coloring them with their
true labels.
Code to plot the decision boundary
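The plotting code is not preserved in this copy; a matplotlib sketch that plots each predicted probability, colored by its true label, with a horizontal line at the 0.5 threshold (styling choices are assumptions):

    import matplotlib.pyplot as plt

    def plot_decision_boundary(probabilities, true_labels):
        colors = ['red' if label == 0 else 'blue' for label in true_labels]
        fig, ax = plt.subplots()
        ax.scatter(range(len(probabilities)), probabilities, c=colors)
        ax.axhline(y=0.5, color='black', linestyle='--', label='decision boundary (0.5)')
        ax.set_xlabel('observation')
        ax.set_ylabel('predicted probability')
        ax.legend()
        plt.show()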

Multiclass logistic regression


Instead of y = 0, 1 we will expand our definition so that y = 0, 1, ..., n. Basically we re-run binary
classification multiple times, once for each class.

Procedure

1. Divide the problem into n+1 binary classification problems (+1 because the index starts at 0).
2. For each class...
3. Predict the probability the observations are in that single class.
4. prediction = max(probability of the classes)
For each sub-problem, we select one class (YES) and lump all the others into a second class (NO). Then
we take the class with the highest predicted value.
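
A sketch of that one-vs-rest prediction step, assuming one weight vector has already been trained per class with the binary procedure above (`class_weights` and the reuse of `predict` are illustrative):

    def one_vs_rest_predict(features, class_weights):
        # One column of probabilities per class, one row per observation
        probs = np.column_stack([predict(features, w) for w in class_weights])
        # Pick the class with the highest predicted probability
        return np.argmax(probs, axis=1)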

Softmax activation
The softmax function (also called softargmax or the normalized exponential function) takes as input a
vector of K real numbers and normalizes it into a probability distribution consisting of K probabilities
proportional to the exponentials of the input numbers. That is, prior to applying softmax, some vector
components could be negative or greater than one, and might not sum to 1; but after applying softmax,
each component will be in the interval [0, 1] and the components will add up to 1, so that they can be
interpreted as probabilities. The standard (unit) softmax function is defined by the formula
\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \ldots, K
In words: we apply the standard exponential function to each element of the input vector and
normalize these values by dividing by the sum of all these exponentials; this normalization ensures that the
sum of the components of the output vector is 1. 9
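
A minimal NumPy sketch of the standard softmax for a single input vector (subtracting the maximum is a common numerical-stability trick, not part of the definition above):

    def softmax(z):
        # Exponentiate each element, then normalize so the outputs sum to 1
        exps = np.exp(z - np.max(z))   # shift by max(z) to avoid overflow
        return exps / np.sum(exps)
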
Scikit-Learn example
Let's compare our performance to the LogisticRegression model provided by scikit-learn 8.
Scikit score: 0.88. Our score: 0.89
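
The original comparison code is not included in this copy; a minimal sketch of how scikit-learn's LogisticRegression can be fit and scored, using synthetic placeholder data instead of the tutorial's dataset:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Placeholder data standing in for (hours studied, hours slept) and pass/fail labels
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] > 9).astype(int)

    clf = LogisticRegression()
    clf.fit(X, y)
    print("Scikit score:", clf.score(X, y))   # mean accuracy on the given data
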
References

1 http://www.holehouse.org/mlclass/06_Logistic_Regression.html
2 http://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning
3 https://scilab.io/machine-learning-logistic-regression-tutorial/
4 https://github.com/perborgen/LogisticRegression/blob/master/logistic.py
5 http://neuralnetworksanddeeplearning.com/chap3.html
6 http://math.stackexchange.com/questions/78575/derivative-of-sigmoid-function-sigma-x-frac11e-x
7 https://en.wikipedia.org/wiki/Monotonic_function
8 http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
9 https://en.wikipedia.org/wiki/Softmax_function
