Logistic Regression

Logistic regression is a machine learning classification algorithm that predicts categorical dependent variables by estimating probabilities using a logistic function. It is used to predict binary outcomes like pass/fail, yes/no, etc. based on one or more independent variables that are numerical, categorical, or both. Logistic regression fits an S-shaped logistic function to the data and outputs probabilities between 0 and 1, which can then be classified using a threshold.

Uploaded by

Yogesh Rai

Logistic Regression

What is Logistic Regression?

• Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised learning family.
• It predicts a categorical dependent variable from a given set of independent variables, so the outcome must be a categorical or discrete value.
• The outcome can be Yes or No, 0 or 1, True or False, etc., but instead of returning exactly 0 or 1, the model returns a probability that lies between 0 and 1.
• Logistic regression is very similar to linear regression; the difference lies in how the two are used.
• Linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function whose output is bounded by two limiting values, 0 and 1.
• The curve of the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
• Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete inputs.
• It can classify observations using different types of data, and its coefficients help identify which variables are most effective for the classification.
• [Figure: the logistic function]

Note: Logistic regression uses the same predictive-modeling concept as regression, which is why it is called logistic regression. Because it is used to classify samples, however, it is counted among the classification algorithms.
Comparison to Linear Regression

• Linear regression could help us predict a student’s test score on a scale of 0 - 100. Linear regression predictions are continuous (numbers in a range).
• Logistic regression could help us predict whether the student passed or failed. Logistic regression predictions are discrete (only specific values or categories are allowed). We can also view the probability scores underlying the model’s classifications.
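
As a small illustration of how discrete classes sit on top of probability scores, the sketch below turns a probability into a pass/fail label. The function name `classify`, the default threshold of 0.5, and the example scores are assumptions chosen for illustration; the slides do not specify them.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (passed) if the probability reaches the threshold, else 0 (failed)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return 1 if probability >= threshold else 0


# Hypothetical probability scores for three students (made-up values).
scores = [0.91, 0.50, 0.12]
labels = [classify(p) for p in scores]
print(labels)  # [1, 1, 0]
```

Raising the threshold (for example `classify(0.6, threshold=0.7)`) makes the model more conservative about predicting the positive class.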
Types of Logistic Regression

Binomial
• In this kind of classification, the dependent variable has only two possible types, either 1 or 0.
• For example, these values may represent success or failure, yes or no, win or loss, etc.

Multinomial
• In this kind of classification, the dependent variable can have 3 or more possible unordered types, i.e. types having no quantitative significance.
• For example, these values may represent “Type A”, “Type B” or “Type C”.

Ordinal
• In this kind of classification, the dependent variable can have 3 or more possible ordered types, i.e. types having a quantitative significance.
• For example, these values may represent “poor”, “good”, “very good” and “excellent”, and each category can be given a score such as 0, 1, 2, 3.
• Binomial: In binomial logistic regression, the dependent variable can take only two possible values, such as 0 or 1, Pass or Fail, etc.
• Say we’re given data on student exam results and our goal is to predict whether a student will pass or fail based on the number of hours slept and hours spent studying.
• We have two features (hours slept, hours studied) and two classes: passed (1) and failed (0).

Studied   Slept   Passed
4.85      9.63    1
8.62      3.23    0
5.43      8.23    1
9.21      6.34    0
Sigmoid activation
In order to map predicted values to probabilities, we use the sigmoid function.
The function maps any real value into another value between 0 and 1.
In machine learning, we use sigmoid to map predictions to probabilities.
Math
$$S(z) = \frac{1}{1 + e^{-z}}$$
• S(z) = output between 0 and 1 (probability estimate)
• z = input to the function (your algorithm’s prediction, e.g. mx + b)
• e = base of the natural logarithm
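
The sigmoid can be written in a few lines of Python. This is a minimal sketch using only the standard library; the slope `m` and intercept `b` below are hypothetical values picked just to produce some inputs z = mx + b, and the two-branch form is one common way to avoid overflow for large negative z.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function S(z) = 1 / (1 + e^(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z).
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Hypothetical slope and intercept, used only to produce an input z = m*x + b.
m, b = 1.5, -3.0
for x in (0.0, 2.0, 4.0):
    z = m * x + b
    print(f"x={x:.1f}  z={z:+.1f}  S(z)={sigmoid(z):.3f}")
```

Whatever real value z takes, the result stays between 0 and 1, and z = 0 maps to exactly 0.5.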
To implement logistic regression in Python, we follow the same steps as in the previous regression topics. Below are the steps:

Steps in Logistic Regression
• Data pre-processing step
• Fitting logistic regression to the training set
• Predicting the test result
• Testing the accuracy of the result (creation of a confusion matrix)
• Visualizing the test set result
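
The first four steps above can be sketched in plain Python on the four-row exam table. This is an illustration under stated assumptions, not the implementation the slides have in mind: the fitting step uses simple batch gradient descent, the learning rate and epoch count are arbitrary, all function names are hypothetical, and because the table has only four rows the same rows are used for fitting and for evaluation (a real workflow would hold out a separate test set). The visualization step is omitted.

```python
import math

# Rows from the exam table: (hours studied, hours slept, passed).
DATA = [
    (4.85, 9.63, 1),
    (8.62, 3.23, 0),
    (5.43, 8.23, 1),
    (9.21, 6.34, 0),
]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def standardize(rows):
    """Step 1 (pre-processing): rescale each column to zero mean, unit variance."""
    scaled = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled.append([(v - mean) / std for v in col])
    return [list(row) for row in zip(*scaled)]


def predict_proba(x, weights, bias):
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def fit(X, y, learning_rate=0.1, epochs=500):
    """Step 2 (fitting): batch gradient descent on the average log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        errors = [predict_proba(x, weights, bias) - t for x, t in zip(X, y)]
        for j in range(len(weights)):
            weights[j] -= learning_rate * sum(e * x[j] for e, x in zip(errors, X)) / n
        bias -= learning_rate * sum(errors) / n
    return weights, bias


def predict(x, weights, bias, threshold=0.5):
    """Step 3 (predicting): turn the probability into a 0/1 label."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0


def confusion_matrix(actual, predicted):
    """Step 4 (evaluation): counts laid out as [[TN, FP], [FN, TP]]."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


X = standardize([(studied, slept) for studied, slept, _ in DATA])
y = [passed for _, _, passed in DATA]
weights, bias = fit(X, y)
predictions = [predict(x, weights, bias) for x in X]
matrix = confusion_matrix(y, predictions)
accuracy = (matrix[0][0] + matrix[1][1]) / len(y)
print("confusion matrix:", matrix)
print("accuracy:", accuracy)
```

With so few rows the accuracy on the fitting data says little about how the model would generalize; the point of the sketch is only the order of the steps.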


Prepare Data for Logistic Regression
• Binary Output Variable:
  • This might be obvious as we have already mentioned it, but logistic regression is intended for binary (two-class) classification problems.
  • It predicts the probability of an instance belonging to the default class, which can be snapped into a 0 or 1 classification.
• Remove Noise:
  • Logistic regression assumes no error in the output variable (y), so consider removing outliers and possibly misclassified instances from your training data.
• Gaussian Distribution:
  • Logistic regression is a linear algorithm (with a non-linear transform on the output). It assumes a linear relationship between the input variables and the output (more precisely, the log-odds of the output).
  • Data transforms of your input variables that better expose this linear relationship can result in a more accurate model. For example, you can use log, root, Box-Cox and other univariate transforms to better expose this relationship.
• Remove Correlated Inputs:
  • Like linear regression, the model can overfit if you have multiple highly correlated inputs.
  • Consider calculating the pairwise correlations between all inputs and removing highly correlated inputs.
• Fail to Converge:
  • It is possible for the maximum likelihood estimation process that learns the coefficients to fail to converge.
  • This can happen if there are many highly correlated inputs in your data or the data is very sparse (e.g. lots of zeros in your input data).
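
The "Remove Correlated Inputs" advice above can be sketched as a pairwise Pearson correlation check. The helper names, the 0.9 cutoff, and the feature values are all assumptions made for illustration; the sketch also assumes no column is constant (a constant column would cause a division by zero).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| at or above the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Made-up inputs: "minutes_studied" is just "hours_studied" times 60.
features = {
    "hours_studied": [1.0, 2.0, 3.0, 4.0],
    "minutes_studied": [60.0, 120.0, 180.0, 240.0],
    "hours_slept": [8.0, 5.0, 7.0, 6.0],
}
for a, b, r in highly_correlated_pairs(features):
    print(f"{a} and {b}: r = {r:.2f}")
```

For each reported pair, one of the two inputs is a candidate for removal before fitting.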
Logistic Regression Assumptions

• Before diving into the implementation of logistic regression, we must be aware of the following assumptions:
• In binary logistic regression, the target variable must always be binary, and the desired outcome is represented by factor level 1.
• There should be no multicollinearity in the model, which means the independent variables must be independent of each other.
• We must include meaningful variables in our model.
• We should choose a large sample size for logistic regression.
Linear Regression | Logistic Regression
Linear regression is a supervised regression model. | Logistic regression is a supervised classification model.
In linear regression, we predict a continuous numeric value. | In logistic regression, we predict a value of 1 or 0.
Here no activation function is used. | Here an activation function is used to convert the linear regression equation into the logistic regression equation.
Here no threshold value is needed. | Here a threshold value is added.
Here Root Mean Square Error (RMSE) is calculated to measure the prediction error. | Here classification metrics such as precision are used to measure prediction quality.
Here the dependent (response) variable should be numeric and continuous. | Here the dependent variable consists of only two categories. Logistic regression estimates the odds of the outcome of the dependent variable given a set of quantitative or categorical independent variables.
It is based on least squares estimation. | It is based on maximum likelihood estimation.
Here, when we plot the training data, a straight line can be drawn that passes close to most of the plotted points. | Any change in the coefficient leads to a change in both the direction and the steepness of the logistic function. Positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.
Linear regression is used to estimate the dependent variable in case of a change in the independent variables. For example, predict the price of houses. | Logistic regression is used to calculate the probability of an event. For example, classify whether tissue is benign or malignant.
Linear regression assumes a normal (Gaussian) distribution of the dependent variable. | Logistic regression assumes a binomial distribution of the dependent variable.
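
The remark in the comparison that logistic regression estimates the odds of the outcome can be made concrete with a short derivation. The symbol p is introduced here only for illustration and stands for the sigmoid output; z = mx + b is the linear input used in the sigmoid section.

```latex
\begin{align*}
p &= S(z) = \frac{1}{1 + e^{-z}}, \qquad z = mx + b \\
1 - p &= \frac{e^{-z}}{1 + e^{-z}} \\
\text{odds} &= \frac{p}{1 - p} = e^{z} \\
\ln\!\left(\frac{p}{1 - p}\right) &= z = mx + b
\end{align*}
```

So the linear part of the model is the logarithm of the odds. A positive m makes the odds grow with x, which gives the S-shaped curve, while a negative m makes them shrink, which gives the Z-shaped curve.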
