Lecture 13

Logistic regression is a supervised learning algorithm used for predicting categorical dependent variables, providing probabilistic outputs between 0 and 1. Unlike linear regression, which can yield values outside this range, logistic regression uses the sigmoid function to ensure outputs remain valid probabilities. The document also discusses the application of logistic regression in multi-class classification and outlines the Stochastic Gradient Descent algorithm for parameter estimation.

Logistic Regression

Mr. Tapan Kumar Dey


Manipal University Jaipur

February 11, 2025

(MUJ) Logistic Regression February 11, 2025 1 / 14


Table of Contents

1 Logistic Regression

2 Why not use Linear regression for classification

3 Logistic Regression for Multi class Classification



Introduction

Logistic regression is one of the most popular Machine Learning algorithms and comes under the Supervised Learning technique. It is used for predicting a categorical dependent variable using a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable; therefore the outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, True or False, etc.
Instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
Examples:
Is an email spam or not?
Will a customer buy insurance?
Given the image of a tumor, predict whether it is cancerous or non-cancerous.



Case Study
Suppose you are working as a data scientist in a life insurance company. You want to predict how likely a potential customer is to buy your insurance product. Based on the customer's age, you want to predict whether he or she will buy an insurance product or not.



Solution
Suppose we model the problem using linear regression.

Imagine we draw the best-fit line as shown below. This could be a much better fit compared to the previous one.



Why not use Linear regression for classification

Linear regression models the relationship as:

y = β0 + β1 x1 + β2 x2 + · · · + βp xp

However, in classification tasks the target variable y is categorical (e.g., 0 or 1).
Linear regression can produce values outside the range [0, 1], making it unsuitable for probability estimation.
To address this, we use logistic regression, which maps predictions to the probability range [0, 1]. To ensure that the output remains between 0 and 1, we apply the sigmoid function:

σ(z) = 1 / (1 + e^(−z))

where z is a linear combination of the input features.
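As a quick illustration (not part of the original slides), the squashing behaviour of the sigmoid can be sketched in a few lines of Python:

```python
import math

def sigmoid(z):
    """Map any real-valued z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative z gives values near 0, large positive z values near 1,
# and z = 0 gives exactly 0.5.
print(sigmoid(-10))  # close to 0
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
```

Note the symmetry σ(−z) = 1 − σ(z), which is why the same function can score both classes.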



Logistic Regression equation

Z = β0 + β1 x1 + β2 x2 + · · · + βp xp    (1)

The logistic regression model equation is:

p(y = 1 | X) = 1 / (1 + e^(−(β0 + β1 x1 + β2 x2 + · · · + βp xp)))    (2)



Logistic Regression Equation



Example

A company wants to predict whether a customer will buy a product (Yes = 1, No = 0) based on their annual income (x1) and age (x2). Suppose the fitted model has coefficients β0 = −4, β1 = 0.02, β2 = 0.3, and the customer has x1 = 50 and x2 = 30.

1 Compute the log-odds (z) for this customer.

2 Calculate the probability that the customer will buy the product.



Solution

Compute z:

z = −4 + (0.02 × 50) + (0.3 × 30)
z = −4 + 1 + 9 = 6

Calculate P(y = 1 | x):

P(y = 1 | x) = 1 / (1 + e^(−6)) = 1 / (1 + 0.00248) = 0.9975
The probability that the customer will buy the product is approximately
99.75%.
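The arithmetic above can be checked with a short Python sketch, using the example's coefficients β0 = −4, β1 = 0.02, β2 = 0.3:

```python
import math

beta0, beta1, beta2 = -4.0, 0.02, 0.3  # coefficients from the example
x1, x2 = 50, 30                        # annual income and age

z = beta0 + beta1 * x1 + beta2 * x2    # log-odds
p = 1.0 / (1.0 + math.exp(-z))         # sigmoid of the log-odds

print(z)            # 6.0
print(round(p, 4))  # 0.9975
```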



Logistic Regression for Multi class Classification



Example
This dataset has two input variables (X1 and X2) and one output variable (Y). The input variables are real-valued random numbers drawn from a Gaussian distribution. The output variable has two values, making the problem a binary classification problem.
X1 X2 Y
2.7810836 2.550537003 0
1.465489372 2.362125076 0
3.396561688 4.400293529 0
1.38807019 1.850220317 0
3.06407232 3.005305973 0
7.627531214 2.759262235 1
5.332441248 2.088626775 1
6.922596716 1.77106367 1
8.675418651 -0.2420686549 1
7.673756466 3.508563011 1



Solution
For the given dataset, the logistic regression model has three coefficients, and the linear combination of the inputs is

Z = β0 + β1 x1 + β2 x2

The job of the learning algorithm is to discover the best values for the coefficients (β0, β1 and β2) based on the training data.

p(y = 1 | X) = σ(Xβ) = 1 / (1 + e^(−Xβ))

To estimate the parameters β, we maximize the log-likelihood function:

ℓ(β) = Σ_{i=1}^{n} [yi log σ(Xi β) + (1 − yi) log(1 − σ(Xi β))]

Since solving this equation analytically is complex, we use the Stochastic Gradient Descent algorithm.
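To make the log-likelihood concrete, here is a small sketch that evaluates ℓ(β) on two rows of the dataset above (the coefficient values passed in are arbitrary illustrations, not fitted values):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y):
    """Sum of yi*log(sigma) + (1-yi)*log(1-sigma) over all samples."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = beta[0] + beta[1] * xi[0] + beta[2] * xi[1]
        p = sigmoid(z)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# First row of each class from the slide's dataset
X = [(2.7810836, 2.550537003), (7.627531214, 2.759262235)]
y = [0, 1]

# With all-zero coefficients every prediction is 0.5,
# so the log-likelihood is 2 * log(0.5) ≈ -1.386.
print(log_likelihood([0.0, 0.0, 0.0], X, y))
```

A better coefficient vector gives a higher (less negative) log-likelihood, which is exactly what the learning algorithm searches for.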
Stochastic Gradient Descent algorithm
Algorithm:
1 Initialize the parameters β randomly or with small values.
2 For each training sample (Xi , yi ):
   1 Compute the prediction using the sigmoid function:

      ŷi = σ(Xi β) = 1 / (1 + e^(−Xi β))

   2 Compute the gradient of the loss function (log-likelihood):

      ∇ℓ(β) = Xi^T (yi − ŷi )

   3 Update the parameter β using the SGD update rule:

      β = β + α Xi^T (yi − ŷi )

      where α is the learning rate.
3 Repeat for multiple epochs until convergence.
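The steps above can be turned into a short, self-contained sketch, trained on the ten-row dataset from the earlier slide. The learning rate and epoch count here are illustrative choices, not values given in the lecture:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Dataset from the example slide: rows of (x1, x2, y)
data = [
    (2.7810836, 2.550537003, 0),
    (1.465489372, 2.362125076, 0),
    (3.396561688, 4.400293529, 0),
    (1.38807019, 1.850220317, 0),
    (3.06407232, 3.005305973, 0),
    (7.627531214, 2.759262235, 1),
    (5.332441248, 2.088626775, 1),
    (6.922596716, 1.77106367, 1),
    (8.675418651, -0.2420686549, 1),
    (7.673756466, 3.508563011, 1),
]

def train_sgd(data, alpha=0.3, epochs=100):
    """Stochastic gradient ascent on the log-likelihood."""
    b0 = b1 = b2 = 0.0                             # step 1: initialize
    for _ in range(epochs):                        # step 3: repeat over epochs
        for x1, x2, y in data:                     # step 2: one sample at a time
            yhat = sigmoid(b0 + b1 * x1 + b2 * x2) # 2.1: prediction
            err = y - yhat                         # 2.2: gradient term (yi - yhat_i)
            b0 += alpha * err                      # 2.3: update rule
            b1 += alpha * err * x1                 #      beta = beta + alpha*Xi^T*(yi - yhat_i)
            b2 += alpha * err * x2
    return b0, b1, b2

b0, b1, b2 = train_sgd(data)
preds = [round(sigmoid(b0 + b1 * x1 + b2 * x2)) for x1, x2, _ in data]
print(preds)  # predicted classes; should match the Y column above
```

Because the two classes are linearly separable, this sketch classifies all ten training rows correctly after a modest number of epochs.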
