Logistic Regression

Logistic regression is a classification algorithm that uses the logistic function to predict a discrete outcome, such as male or female. It determines the weights of a linear classifier by maximizing the log-likelihood function. The algorithm is demonstrated with Python's scikit-learn library on a small dataset with labels 0 and 1; the fitted model reaches 90% accuracy on the data it was trained on, evaluated with a confusion matrix and a classification report.

Uploaded by

Bharath

Logistic Regression

Dr. Rajavel Ramadoss


Associate Professor
SSN College of Engineering
Presentation Outline
❑ Introduction to logistic regression

❑ Problem formulation

❑ Logistic regression for binary and multiclass classification

❑ Demo - Python implementation

❑ Summary
Logistic Regression
▪ Logistic regression is a simple classification algorithm for
learning to predict a discrete variable such as predicting whether
a person is

male or female (binary classification)

male, female or transgender (multiclass classification)

▪ Logistic regression is a linear classifier, so we’ll use a linear
function 𝑓(𝐱) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ, also called the logit.

▪ The variables 𝑏₀, 𝑏₁, …, 𝑏ᵣ are the estimators of the regression
coefficients, which are also called the predicted weights or
just coefficients.
Logistic Regression
▪ The logistic regression function 𝑝(𝐱) is the sigmoid function of
𝑓(𝐱): 𝑝(𝐱) = 1 / (1 + exp(−𝑓(𝐱))).

▪ The function 𝑝(𝐱) is often interpreted as the predicted probability
that the output for a given 𝐱 is equal to 1.

▪ Therefore, 1 − 𝑝(𝐱) is the probability that the output is 0.
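
As a minimal sketch of the logit and sigmoid formulas for a single feature, the two functions can be written with NumPy. The names `logit` and `sigmoid` are illustrative, and the weights are simply the values printed by the scikit-learn demo later in this deck, used here as sample numbers.

```python
import numpy as np

def logit(x, b0, b1):
    """Linear function f(x) = b0 + b1 * x for a single feature."""
    return b0 + b1 * x

def sigmoid(f):
    """Logistic function p = 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + np.exp(-f))

# Sample weights (assumption: copied from the demo output later in the deck).
b0, b1 = -1.04608067, 0.51491375

x = np.arange(10)
p = sigmoid(logit(x, b0, b1))   # predicted probability that the output is 1
q = 1.0 - p                     # predicted probability that the output is 0
```

With these weights, `p` crosses 0.5 between 𝑥 = 2 and 𝑥 = 3, which is where the logit changes sign.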


Logistic Regression
▪ Logistic regression determines the best predicted weights 𝑏₀, 𝑏₁,
…, 𝑏ᵣ such that the function 𝑝(𝐱) is as close as possible to all
actual responses 𝑦ᵢ, 𝑖 = 1, …, 𝑛, where 𝑛 is the number of
observations.

▪ The process of calculating the best weights using available
observations is called model training or fitting.

▪ To get the best weights, we usually maximize the log-likelihood
function (LLF) for all observations 𝑖 = 1, …, 𝑛.

▪ This method is called maximum likelihood estimation
and is represented by the equation

LLF = Σᵢ(𝑦ᵢ log(𝑝(𝐱ᵢ)) + (1 − 𝑦ᵢ) log(1 − 𝑝(𝐱ᵢ))).
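
The LLF can be evaluated directly from labels and predicted probabilities. The sketch below is illustrative only: the function name `log_likelihood` and the sample probabilities are assumptions, and the probabilities must lie strictly between 0 and 1 so that the logarithms are finite.

```python
import numpy as np

def log_likelihood(y, p):
    """LLF = sum_i( y_i * log(p_i) + (1 - y_i) * log(1 - p_i) )."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

y = [0, 0, 1, 1]
good = log_likelihood(y, [0.1, 0.2, 0.8, 0.9])   # probabilities close to the labels
poor = log_likelihood(y, [0.5, 0.5, 0.5, 0.5])   # uninformative probabilities
```

The LLF is never positive, and it moves toward 0 as the predicted probabilities approach the actual labels, so `good` is larger than `poor`.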


Logistic Regression - Notations
Logistic Regression-Objective Function
▪ In logistic regression we can’t use the same objective function
(MSE) as in linear regression.

▪ MSE penalizes each error by its square. A small error adds a
small penalty: if y − ŷ = 0.1, the penalty is (0.1)^2 = 0.01.
A large error adds a large penalty: if y − ŷ = 10, the penalty
is (10)^2 = 100.

▪ This does not carry over to logistic regression: the model output
and the label both lie between 0 and 1, so the error never exceeds
1 in magnitude and the squared penalty never exceeds 1, even for
a confidently wrong prediction.
Logistic Regression-Objective Function
▪ In logistic regression, we use cross entropy as the objective
function
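
To make the contrast with MSE concrete, here is a small sketch comparing the two penalties for a single observation whose true label is 1. It assumes the usual binary cross-entropy definition, the negative of the per-observation log-likelihood term; the function names are illustrative.

```python
import numpy as np

def squared_error(y, p):
    """Squared-error penalty (y - p)^2 for one observation."""
    return (y - p) ** 2

def cross_entropy(y, p):
    """Binary cross entropy for one observation: -(y*log(p) + (1-y)*log(1-p))."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = 1.0                               # true label
for p in (0.9, 0.5, 0.1, 0.01):       # predicted probability of class 1
    print(f"p={p:<5} squared error={squared_error(y, p):.4f} "
          f"cross entropy={cross_entropy(y, p):.4f}")
```

The squared error stays below 1 however wrong the prediction is, while the cross entropy grows without bound as the predicted probability of the true class approaches 0.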
Intuition behind the Objective Function
Multi-Class Classification
Multi-Class Logistic Regression
Logistic Regression in Python With scikit-learn
▪ General steps to prepare classification models:
1. Import packages, functions, and classes
2. Get data to work with and, if appropriate, transform it
3. Create a classification model and train (or fit) it with
your existing data
4. Evaluate your model to see if its performance is
satisfactory
Once the model performs well enough, it can be used to
make predictions on new, unseen data.
Logistic Regression in Python With scikit-learn
Step 1: Import Packages, Functions, and Classes
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
Step 2: Get Data
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
Note: .reshape(-1, 1) gives as many rows as needed (the -1)
and exactly one column (the 1)
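
A quick way to see the effect of the reshape is to compare array shapes before and after. This is a small illustrative check; the variable name `flat` is not part of the original demo. scikit-learn estimators expect the inputs as a two-dimensional array with one row per observation and one column per feature.

```python
import numpy as np

flat = np.arange(10)          # shape (10,): a one-dimensional array
x = flat.reshape(-1, 1)       # shape (10, 1): ten rows, one column

print(flat.shape, x.shape)
```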
Logistic Regression in Python With scikit-learn
Step 3: Create a Model and Train It
model = LogisticRegression(solver='liblinear', random_state=0)

▪ Once the model is created, we need to fit (or train) it.


▪ Model fitting is the process of determining the
coefficients 𝑏₀, 𝑏₁, …, 𝑏ᵣ that correspond to the best
value of the cost function.
▪ We can fit the model with .fit()
model.fit(x, y)
Or, equivalently, we can use
model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)
Logistic Regression in Python With scikit-learn
▪ We can get the attributes of the model as follows:

model.classes_

array([0, 1])

model.intercept_

array([-1.04608067])

model.coef_

array([[0.51491375]])
Logistic Regression in Python With scikit-learn
Step 4: Evaluate the Model

▪ Once a model is defined, we can check its
performance with .predict_proba(), which returns the
matrix of probabilities that the predicted output is
equal to zero or one

model.predict_proba(x)

array([[0.74002157, 0.25997843],
       [0.62975524, 0.37024476],
       [0.5040632, 0.4959368],
       [0.37785549, 0.62214451],
       [0.26628093, 0.73371907],
       [0.17821501, 0.82178499],
       [0.11472079, 0.88527921],
       [0.07186982, 0.92813018],
       [0.04422513, 0.95577487],
       [0.02690569, 0.97309431]])
Logistic Regression in Python With scikit-learn
▪ The first column is the probability of the predicted
output being zero, that is 1 - 𝑝(𝑥).
▪ The second column is the probability that the output is
one, or 𝑝(𝑥).
▪ We can get the actual predictions, based on the
probability matrix and the values of 𝑝(𝑥), with
.predict():
model.predict(x)
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
▪ This function returns the predicted output values as a
one-dimensional array.
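
The link between the probability matrix and the predictions can be checked by thresholding the second column at 0.5. The sketch below refits the same model so that it runs on its own; the variable names `proba` and `manual` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)

proba = model.predict_proba(x)     # columns follow the order of model.classes_

# Pick class 1 wherever p(x) > 0.5, otherwise class 0.
manual = model.classes_[(proba[:, 1] > 0.5).astype(int)]

print(np.array_equal(manual, model.predict(x)))
```

Each row of `proba` sums to 1, since the two columns are 1 − 𝑝(𝑥) and 𝑝(𝑥).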
Logistic Regression in Python With scikit-learn
▪ The green circles represent the actual responses as
well as the correct predictions.

▪ The red × shows the incorrect prediction.

▪ The solid black line is the estimated logistic regression
curve 𝑝(𝑥).

▪ The grey squares are the points on this curve that
correspond to 𝑥 and the values in the second column
of the probability matrix.

▪ The black dashed line is the logit 𝑓(𝑥).


Logistic Regression in Python With scikit-learn
▪ The value of 𝑥 slightly above 2 corresponds to the
threshold 𝑝(𝑥)=0.5, which is 𝑓(𝑥)=0.
▪ For example, the first point has input 𝑥=0, actual
output 𝑦=0, probability 𝑝=0.26, and a predicted
value of 0.
▪ The second point has 𝑥=1, 𝑦=0, 𝑝=0.37, and a
prediction of 0.
▪ Only the fourth point has the actual output 𝑦=0 and
the probability higher than 0.5 (at 𝑝=0.62), so it’s
wrongly classified as 1.
▪ All other values are predicted correctly.
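
The threshold location follows from the fitted weights: 𝑝(𝑥) = 0.5 exactly when 𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥 = 0. The sketch below uses the intercept and coefficient printed earlier in the demo; the names `x_star` and `predictions` are illustrative.

```python
# Weights as printed by model.intercept_ and model.coef_ in this demo.
b0, b1 = -1.04608067, 0.51491375

# p(x) = 0.5 exactly when f(x) = b0 + b1 * x = 0, so x* = -b0 / b1.
x_star = -b0 / b1

# Inputs above x* are classified as 1, inputs below it as 0.
predictions = [int(b0 + b1 * x > 0) for x in range(10)]

print(round(x_star, 2), predictions)
```

Because the threshold lies between 𝑥 = 2 and 𝑥 = 3, the point 𝑥 = 3 is assigned to class 1 although its actual output is 0.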
Logistic Regression in Python With scikit-learn
▪ The accuracy of the model is 9/10=0.9, which we
can obtain with .score():
model.score(x, y) => 0.9
▪ We can get more information on the accuracy of the
model with a confusion matrix.
confusion_matrix(y, model.predict(x))
array([[3, 1],
[0, 6]])
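
To show how the entries of the confusion matrix arise, the counts can be rebuilt by hand from the actual and predicted labels. This is a sketch with NumPy only; the predictions are copied from the model.predict(x) output shown earlier.

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])   # labels from Step 2
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])   # output of model.predict(x)

# Rows are actual classes, columns are predicted classes.
cm = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_true, y_pred):
    cm[actual, predicted] += 1

tn, fp, fn, tp = cm.ravel()          # true/false negatives and positives
accuracy = (tn + tp) / cm.sum()      # share of correct predictions
```

The single off-diagonal count is the false positive at 𝑥 = 3, and the diagonal sum divided by the total gives the accuracy reported by .score().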
Logistic Regression in Python With scikit-learn
▪ We can get a more comprehensive report on the
classification with classification_report():
print(classification_report(y, model.predict(x)))
              precision    recall  f1-score   support

           0       1.00      0.75      0.86         4
           1       0.86      1.00      0.92         6

    accuracy                           0.90        10
   macro avg       0.93      0.88      0.89        10
weighted avg       0.91      0.90      0.90        10
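
The per-class figures in the report can be derived from the confusion matrix. The following sketch assumes the standard definitions of precision, recall, and F1; the helper name `report` is illustrative.

```python
# Confusion matrix from the demo: rows are actual classes,
# columns are predicted classes.
cm = [[3, 1], [0, 6]]

def report(cm, k):
    """Precision, recall, and F1 for class k of a 2x2 confusion matrix."""
    tp = cm[k][k]
    predicted_k = cm[0][k] + cm[1][k]     # column sum: everything predicted as k
    actual_k = cm[k][0] + cm[k][1]        # row sum: the support of class k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for k in (0, 1):
    p, r, f = report(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```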
Session Summary
In this session we covered:

❑ Introduction to logistic regression

❑ Problem formulation

❑ Logistic regression for binary and multiclass classification

❑ Demo - Python implementation


References

1. https://realpython.com/logistic-regression-python/
Thanks
