What Is Logistic Regression
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary. Like all regression analyses, it is a predictive analysis: it is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.
I found this definition on Google, and now we'll try to understand it.
Here "binary" means a variable that has only two possible outputs, for example, a person will survive this accident or not, a student will pass this exam or not. The outcome can either be yes or no (1 or 0). Binary logistic regression can, for instance, predict whether a customer will churn or not, or whether a patient has a disease or not. Multinomial logistic regression is used when the target variable has three or more possible outcomes, such as the type of product a customer will buy, the rating a customer will give a product, or the political party a person will vote for. Ordinal logistic regression is used when these categories have a natural order, such as the level of customer satisfaction, the severity of a disease, or the stage of cancer.
So why do we use logistic regression rather than linear regression? If you have this doubt, then you're in the right place, my friend. After reading the definition of logistic regression, we now know that it is only used when our dependent variable is binary, whereas in linear regression the dependent variable is continuous.
One problem is that if we add an outlier to our dataset, the best fit line in linear regression shifts to accommodate that point. Suppose we have data of tumor size versus malignancy (0 or 1). If we fit a straight line that minimizes the distance between the predicted values and the actual values, the line will look like this:
Image Source: towardsdatascience.com
Here the threshold value is 0.5, which means that if the value of h(x) is greater than 0.5 we predict a malignant tumor (1), and if it is less than 0.5 we predict a benign tumor (0). Everything seems okay so far, but now let's change it a bit: if we add some outliers to our dataset, the best fit line shifts toward those points.
Image Source: towardsdatascience.com
Do you see the problem here? The blue line represents the old threshold and the yellow line represents the new threshold, which is maybe 0.2 here. To keep our predictions right, we had to lower our threshold value. Hence we can say that linear regression is prone to outliers: now only if h(x) is greater than 0.2 do we predict a malignant tumor.
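Here is a minimal sketch of that shift, using made-up toy numbers for tumor size and a single extreme point as the outlier; the "decision point" is just where the fitted straight line crosses a fixed 0.5 threshold.

```python
# Toy illustration: a single outlier moves the point where a straight
# regression line crosses the 0.5 threshold. All numbers are made up.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # tumor sizes
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                   # 0 = benign, 1 = malignant

def threshold_crossing(X, y):
    """Fit a straight line and return the tumor size where it predicts 0.5."""
    model = LinearRegression().fit(X, y)
    # Solve w * x + b = 0.5 for x.
    return (0.5 - model.intercept_) / model.coef_[0]

print("Decision point without outlier:", threshold_crossing(X, y))

# Add one extreme malignant tumor far to the right (the outlier).
X_out = np.vstack([X, [[30]]])
y_out = np.append(y, 1)
print("Decision point with outlier:   ", threshold_crossing(X_out, y_out))
```

With the outlier included, the line flattens and the 0.5 crossing moves, so points that used to be classified correctly at the old cutoff no longer are.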
Another problem with linear regression is that the predicted values may be out of range. We know that a probability must lie between 0 and 1, but a straight line can output values far below 0 or above 1. Logistic regression handles both issues by bending the straight best fit line of linear regression into an S-curve using the sigmoid function, which always gives values between 0 and 1. How does this work, and what is the math behind it? We'll get there shortly.
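To see the squashing in action, here is a tiny numpy sketch of the sigmoid (the formula itself is derived a little later in this article):

```python
# The sigmoid maps any real number into the open interval (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-100.0, -2.0, 0.0, 2.0, 100.0])))
# Values stay strictly between 0 and 1, with exactly 0.5 at z = 0.
```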
If you want to know the difference between logistic regression and linear regression in more detail, that comparison deserves its own article. For now, here is how logistic regression works, step by step:
1. Prepare the data: The data should be in a format where each row represents a
single observation and each column represents a different variable. The target
variable (the variable you want to predict) should be binary (yes/no, true/false,
0/1).
2. Train the model: We teach the model by showing it the training data. This
involves finding the values of the model parameters that minimize the error in the
training data.
3. Evaluate the model: The model is evaluated on the held-out test data to assess how well it generalizes to observations it has never seen.
4. Use the model to make predictions: After the model has been trained and evaluated, it can be used to predict the outcome for new observations, as in the short sketch after this list.
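A minimal end-to-end sketch of these four steps with scikit-learn; the data here is synthetic and the feature values are assumptions made purely for illustration.

```python
# Steps 1-4 with scikit-learn on a small synthetic dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Prepare the data: rows are observations, columns are features, target is 0/1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Train the model: fit() finds coefficients that minimize the (regularized) log loss.
model = LogisticRegression()
model.fit(X_train, y_train)

# 3. Evaluate the model on the held-out test data.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 4. Use the model to make predictions on a new observation.
new_observation = np.array([[0.2, -1.0, 0.5]])
print("Predicted class:", model.predict(new_observation)[0])
print("Predicted probability of class 1:", model.predict_proba(new_observation)[0, 1])
```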
You must be wondering how logistic regression squeezes the output of linear regression into the range between 0 and 1. Well, there's a little bit of math behind this, and it is pretty interesting, trust me.
How similar is it to linear regression? If you haven't read my article on linear regression, I'd suggest going through it first.
We all know the equation of the best fit line in linear regression is y = b0 + b1x: an intercept plus a slope times the input. Let's say that instead of y we are modelling probabilities (P). But there is an issue here: the right-hand side can exceed 1 or go below 0, while we know that the range of a probability is only (0, 1). So instead of P itself we model the odds of P, that is, P/(1 - P).
Do you think we are done here? No, we are not. We know that odds are always positive, which means the range will be (0, +∞). Odds are nothing but the ratio of the probability of success to the probability of failure. Now the question arises: out of so many other ways to transform P, why did we take the odds? Because odds are probably the easiest way to do this, that's it.
The problem here is that the range is still restricted, and we don't want a restricted range: it is difficult to model a variable that has a restricted range, because squeezing everything into a bounded interval compresses the variation we are trying to capture. To get around this we take the log of the odds, which has a range from (-∞, +∞).
If you understood what I did here, then you have done 80% of the math. Now we just want a function that gives us P directly, because we want to predict a probability, right? Not the log of odds. To do so we exponentiate both sides and then solve for P, as written out below.
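Written out, the chain of steps above looks like this, using b0 and b1 for the intercept and slope from the linear equation:

```latex
\begin{aligned}
y &= b_0 + b_1 x && \text{best fit line from linear regression}\\
\frac{P}{1-P} &= b_0 + b_1 x && \text{replace } y \text{ with the odds, range } (0,+\infty)\\
\log\!\left(\frac{P}{1-P}\right) &= b_0 + b_1 x && \text{take the log of the odds, range } (-\infty,+\infty)\\
\frac{P}{1-P} &= e^{\,b_0 + b_1 x} && \text{exponentiate both sides}\\
P &= \frac{e^{\,b_0 + b_1 x}}{1 + e^{\,b_0 + b_1 x}} = \frac{1}{1 + e^{-(b_0 + b_1 x)}} && \text{solve for } P
\end{aligned}
```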
Now we have our logistic function, also called a sigmoid function. Its graph is the special "S"-shaped curve we use to predict probabilities, and it ensures that the predicted values always stay between 0 and 1. Although the math may seem complex, the relationship between our inputs (like age, height, etc.) and the outcome (like yes/no) is pretty simple to understand. It's like drawing a line through the data, just bent into an S so it can represent probabilities.
Coefficients: These are just numbers that tell us how much each input affects the outcome in the logistic regression model. For example, if age is a predictor, its coefficient tells us how much the log-odds of the outcome change for every one-year increase in age.
Best Guess: We figure out the best coefficients for the logistic regression model by looking at the data we have and tweaking them until our predictions match the observed outcomes as closely as possible.
Assumptions: We assume that the observations are independent, meaning one doesn't affect the other. We also assume that there's not too much overlap (correlation) between our predictors (like age and height), and that the relationship between our predictors and the log-odds of the outcome is roughly linear.
Probabilities: Instead of a hard yes/no answer, logistic regression gives us probabilities, like saying there's a 70% chance it's a "yes". We can then decide on a cutoff point to make our final decision.
Checking the Model: There are several metrics that help us make sure our predictions are good, like accuracy, precision, recall, and a curve called the ROC curve. These help us see how well our logistic regression model is performing; the sketch right after this list shows how to compute them.
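A sketch of the evaluation side with scikit-learn: predicted probabilities, a decision cutoff, and the metrics mentioned above. The dataset here is synthetic and exists only for illustration.

```python
# Probabilities, a 0.5 cutoff, coefficients, and common evaluation metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]   # predicted P(y = 1) for each test row
preds = (probs >= 0.5).astype(int)          # 0.5 cutoff; this is the tunable decision point

print("Coefficients:", model.coef_[0], "Intercept:", model.intercept_[0])
print("Accuracy: ", accuracy_score(y_test, preds))
print("Precision:", precision_score(y_test, preds))
print("Recall:   ", recall_score(y_test, preds))
print("ROC AUC:  ", roc_auc_score(y_test, probs))
```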
In linear regression, we use the mean squared error (MSE), which measures the squared difference between y_predicted and y_actual, and this cost can itself be derived from the maximum likelihood estimator. The graph of this cost function in linear regression is a nice convex bowl. In logistic regression, however, the prediction is a non-linear (sigmoid) function of the parameters, so if we plug it into the same MSE equation we get a non-convex graph with many local minima, as shown.
Image Source: towardsdatascience.com
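For reference, this is the usual MSE cost with the sigmoid prediction substituted in for the logistic case (written out here as a sketch, with θ the parameters and x_i the i-th input):

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big(y_i - \hat{y}_i\big)^2,
\qquad
\hat{y}_i = \frac{1}{1 + e^{-\theta^{\top} x_i}}
```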
The problem here is that this cost function can trap gradient descent in a local minimum, which is a big problem because then we'll miss our global minimum and our error will stay needlessly large.
In order to solve this problem, we derive a different cost function for logistic
regression called log loss which is also derived from the maximum likelihood
estimation method.
In the next section, we’ll talk a little bit about the maximum likelihood estimator
and what it is used for. We’ll also try to see the math behind this log loss
function.
What is the use of Maximum Likelihood Estimator?
The main aim of maximum likelihood estimation is to find the parameter values that maximize the likelihood function. This function represents the joint probability of the observed data, viewed as a function of the model parameters. In machine learning terms, the process aims to discover parameter values such that, when plugged into the model for P(x), they produce a value close to one for individuals with a malignant tumor and close to zero for those with a benign tumor.
Let's start by defining our likelihood function. We now know that the labels are binary, which means they can be yes/no, pass/fail, and so on. We can also say we have two outcomes, success and failure. This means we can interpret each label as the outcome of a single Bernoulli trial, in the sense of the following random experiment.
Random Experiment
A random experiment whose outcomes are of two types, success S and failure F, occurring with probabilities p and 1 - p respectively, is called a Bernoulli trial. For this experiment, a random variable X is defined such that it takes the value 1 when S occurs and 0 when F occurs, which gives the likelihood written out below.
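In symbols (a standard writing-out of what was just described, where p_i is the model's predicted probability for observation i and θ stands for the parameters):

```latex
P(X = x) = p^{x}(1 - p)^{1 - x}, \qquad x \in \{0, 1\}

L(\theta) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}
\qquad\Longrightarrow\qquad
\log L(\theta) = \sum_{i=1}^{n} \Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\Big]
```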
We need a value of theta that maximizes this likelihood function. To make our calculations easier we take the log of both sides; the function we get is also called the log-likelihood function, or the sum of the log conditional probabilities. In machine learning it is conventional to minimize a loss function via gradient descent rather than maximize an objective function via gradient ascent, so instead of maximizing the log-likelihood we take its negative and minimize that. We'll talk more about gradient descent in a later section, and then you'll have more clarity.
Also, remember: maximizing log L(θ) is the same as minimizing -log L(θ), that is, argmax log L(θ) = argmin [-log L(θ)].
The negative of this log-likelihood is our cost function, and what do we want from a cost function? We want it to have a minimum we can reach. It is conventional to minimize a cost function in optimization problems, so flipping the sign turns our maximization into a minimization, with one term in the cost for each class.
For the 1 class (y = 1), the right term of the cost vanishes: if the predicted probability is close to 1 then the loss is small, and as the probability approaches 0 the loss shoots up towards infinity. For the 0 class (y = 0), the left term vanishes: if the predicted probability is close to 0 then the loss is small, but as the probability approaches 1 the loss again goes to infinity. If we combine the two graphs, we get a convex curve with only one minimum.
This cost function is also called log loss. It ensures that as the probability of the correct answer is maximized, the probability of the incorrect answer is minimized, and the lower the value of this cost function, the higher the accuracy; the full expression is written out below.
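Putting the two cases together for m training examples, with h_θ(x_i) the predicted probability for example i, the log loss is:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y_i \log\big(h_\theta(x_i)\big) + (1 - y_i)\log\big(1 - h_\theta(x_i)\big) \Big]
```

When y_i = 1 the right term drops out, and when y_i = 0 the left term drops out, exactly as described above.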
In this section, we will try to understand how we can use gradient descent to find the minimum of this cost function. Gradient descent changes the value of our weights in such a way that the cost converges to its minimum point; in other words, it aims to find the optimal weights that minimize the loss function of our model. It is an iterative method that finds the minimum of a function by computing the slope at a starting point (chosen at random) and repeatedly stepping downhill.
The intuition is that if you are hiking in a canyon and trying to descend most quickly down to the river at the bottom, you might look around yourself 360 degrees, find the direction where the ground slopes most steeply downward, walk a step in that direction, and repeat.
At first, gradient descent picks a random value for our parameters. Then we need a rule that tells us whether, at the next iteration, we should move left or right to get closer to the minimum point. The gradient descent algorithm computes the slope of the loss function at the current point and, in the next iteration, moves in the opposite direction of that slope to approach the minimum. Since our cost function is now convex, we don't need to worry about local minima. A parameter called the learning rate (alpha) controls how big a step we take at each iteration while moving towards the minimum point. Usually a smaller value of alpha is preferred, because if the learning rate is too large we may overshoot the minimum and keep oscillating across the convex curve.
Image Source: https://stackoverflow.com/
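Each iteration then updates every parameter against the slope, scaled by the learning rate α (written here in the generic form):

```latex
\theta_j := \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}
```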
Now the question is: what is this derivative of the cost function, and how do we compute it? Don't worry, in the next section we'll see how to derive it.
Before we derive the gradient of our cost function, we'll first find the derivative of the sigmoid function, since it will be needed along the way.
Step-1: Use the chain rule to break the partial derivative of the log-likelihood into pieces.
Step-2: Find the derivative of the log-likelihood with respect to p.
Now that we have the derivative of the cost function, we can write out our gradient descent update rule for logistic regression, sketched below.
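Carrying those steps through (a sketch of the standard result), the sigmoid derivative and the resulting gradient of the log loss are:

```latex
\frac{d}{dz}\,\sigma(z) = \sigma(z)\big(1 - \sigma(z)\big),
\qquad
\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x_i) - y_i\big)\, x_{ij}
```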
If the slope is negative (a downward slope), then gradient descent adds some value to the parameter, pushing it towards the minimum point of the convex curve, whereas if the slope is positive (an upward slope), gradient descent subtracts some value to direct it towards the minimum point.
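To tie everything together, here is a compact numpy sketch of the whole loop described above: the sigmoid, the gradient of the log loss, and the gradient descent update. The data and variable names are synthetic assumptions, purely for illustration.

```python
# Logistic regression trained by plain gradient descent on log loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, alpha=0.1, n_iters=1000):
    """Fit weights (intercept included) by gradient descent on the log loss."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # column of 1s for the intercept
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        p = sigmoid(X @ theta)            # predicted probabilities h_theta(x)
        gradient = X.T @ (p - y) / m      # (1/m) * sum (h - y) * x, as derived above
        theta -= alpha * gradient         # step against the slope
    return theta

# Tiny synthetic example: the class is 1 whenever the single feature is positive.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = (X[:, 0] > 0).astype(float)
theta = train_logistic_regression(X, y)
print("Learned intercept and weight:", theta)
print("P(y=1 | x=2):", sigmoid(np.array([1.0, 2.0]) @ theta))
```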