Logistic Regression

sfdg

Uploaded by

Gaurav tailor

Logistic Regression in Machine Learning

Logistic regression is a supervised machine learning algorithm used mainly for classification tasks, where
the goal is to predict the probability that an instance belongs to a given class. It is a statistical
method that analyzes the relationship between a set of independent variables and a binary dependent
variable. It is a powerful tool for decision-making, for example deciding whether an email is spam or not.

Logistic Regression
Although the algorithm is used for classification, it is called logistic regression because it takes the output of a linear regression function as input and applies a sigmoid function to
estimate the probability of the given class.

Logistic Regression:

Logistic regression predicts a categorical dependent variable from a given set of
independent variables.
The outcome must therefore be a categorical or discrete value, such as Yes or No,
0 or 1, or True or False. Instead of returning exactly 0 or 1, however, the model
returns a probability that lies between 0 and 1.
Logistic regression is similar to linear regression except in how it is used:
linear regression solves regression problems, whereas logistic regression solves
classification problems.
In logistic regression, instead of fitting a straight regression line, we fit an
S-shaped logistic function whose output is bounded by the two limiting values 0 and 1.
The curve of the logistic function indicates the likelihood of an outcome, such
as whether cells are cancerous or not, or whether a mouse is obese or not based
on its weight.
Logistic regression is an important machine learning algorithm because it can
provide probabilities and classify new data using both continuous and discrete
features.
It can classify observations using different types of data, and its coefficients
help identify which variables are most effective for the classification.

Logistic Function (Sigmoid Function):

The sigmoid function is a mathematical function used to map predicted values to
probabilities.
It maps any real value to a value between 0 and 1.
The output of logistic regression must lie between 0 and 1 and cannot go beyond
these limits, so the function forms an S-shaped curve.
This S-shaped curve is called the sigmoid function or the logistic function.
In logistic regression, we use a threshold value to turn the probability into a
class: a probability above the threshold is assigned to class 1, and a
probability below the threshold is assigned to class 0.
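
The mapping and the threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the function names `sigmoid` and `classify` and the default threshold of 0.5 are assumptions made for the example, not part of the original text.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def classify(z: float, threshold: float = 0.5) -> int:
    """Return class 1 when the probability reaches the threshold, else class 0."""
    return 1 if sigmoid(z) >= threshold else 0

# The printed probabilities trace out the S-shaped curve.
for z in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"z = {z:+.1f}  sigmoid = {sigmoid(z):.4f}  class = {classify(z)}")
```

Treating a probability exactly equal to the threshold as class 1 is a convention chosen here; the opposite convention would work equally well.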

Types of Logistic Regression:

On the basis of the categories of the dependent variable, logistic regression can
be classified into three types:

1. Binomial: In binomial logistic regression, the dependent variable can take only two possible values,
such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, the dependent variable can take 3 or more possible
unordered values, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal logistic regression, the dependent variable can take 3 or more possible ordered
values, such as “Low”, “Medium”, or “High”.
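
For the multinomial case, the document later notes that softmax functions are used when there are more than two categories. A minimal sketch of that idea follows; the class scores are hypothetical numbers chosen for illustration, and the helper name `softmax` is an assumption.

```python
import math

def softmax(scores):
    """Turn a list of real-valued class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps math.exp from overflowing; it does not change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog", "sheep"]
scores = [2.0, 1.0, 0.1]  # hypothetical linear scores, one per class
probs = softmax(scores)
predicted = labels[probs.index(max(probs))]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
print("predicted class:", predicted)
```

With two classes and the second score fixed at 0, `softmax([z, 0.0])[0]` equals the sigmoid of `z`, which is why the binomial model can be seen as a special case.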

Terminologies involved in Logistic Regression:

Here are some common terms involved in logistic regression:

Independent variables: The input characteristics or predictor factors used to predict the dependent
variable.
Dependent variable: The target variable in a logistic regression model, which we are trying to predict.
Logistic function: The formula used to represent how the independent and dependent variables relate
to one another. The logistic function transforms the input variables into a probability value between 0
and 1, which represents the likelihood of the dependent variable being 1.
Odds: The ratio of something occurring to something not occurring. It differs from probability,
which is the ratio of something occurring to everything that could possibly occur.
Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the odds. In
logistic regression, the log odds of the dependent variable are modeled as a linear combination of the
independent variables and the intercept.
Coefficient: The estimated parameters of the logistic regression model, which show how the independent
and dependent variables relate to one another.
Intercept: A constant term in the logistic regression model, which represents the log odds when all
independent variables are equal to zero.
Maximum likelihood estimation: The method used to estimate the coefficients of the logistic
regression model, which maximizes the likelihood of observing the data given the model.
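
The relationship between probability, odds, and log-odds defined above can be checked numerically. The sketch below is only an illustration; the helper names are assumptions, and the probability 0.75 is an arbitrary example value.

```python
import math

def odds(p: float) -> float:
    """Odds: probability of occurring divided by probability of not occurring."""
    return p / (1.0 - p)

def log_odds(p: float) -> float:
    """Logit: natural logarithm of the odds."""
    return math.log(odds(p))

def probability_from_log_odds(z: float) -> float:
    """Inverse of the logit, i.e. the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.75
print("odds:", odds(p))                      # 3 to 1 in favor
print("log-odds:", round(log_odds(p), 4))
print("back to probability:", round(probability_from_log_odds(log_odds(p)), 4))
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, which is the point where the S-shaped curve crosses its midpoint.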

How does Logistic Regression work?


The logistic regression model converts the continuous output of a linear
regression function into a probability using the sigmoid function, which maps
any real-valued input to a value between 0 and 1. This function is also known as
the logistic function. A categorical prediction is then obtained by comparing
the probability with a threshold.

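Let the independent input features be

$$X = (x_1, x_2, \dots, x_m)$$

and let the dependent variable Y take only binary values, i.e. 0 or 1.

Then apply the multi-linear function to the input variables X:

$$z = \left(\sum_{i=1}^{m} w_i x_i\right) + b = w \cdot X + b$$

Here $w = (w_1, w_2, \dots, w_m)$ are the weights (coefficients) and $b$ is the bias term (intercept).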

Sigmoid Function

Now we apply the sigmoid function to the input z and obtain a probability
between 0 and 1, i.e. the predicted y:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The sigmoid function converts a continuous value into a probability, i.e. a
value between 0 and 1.

$\sigma(z)$ tends towards 1 as $z \to \infty$
$\sigma(z)$ tends towards 0 as $z \to -\infty$
$\sigma(z)$ is always bounded between 0 and 1

where the probability of belonging to each class can be measured as:

$$P(y = 1) = \sigma(z), \qquad P(y = 0) = 1 - \sigma(z)$$
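
As a rough sketch of the two steps in this section, the code below computes the linear function z = w · x + b and passes it through the sigmoid 1 / (1 + e^(-z)) to obtain the probability of each class. The weights, bias, and input values are hypothetical numbers chosen so that z comes out to exactly 1; the function names are assumptions.

```python
import math

def linear_score(x, w, b):
    """Multi-linear function z = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_proba(x, w, b):
    """Return (P(y = 0), P(y = 1)) for a single observation x."""
    z = linear_score(x, w, b)
    p1 = 1.0 / (1.0 + math.exp(-z))
    return 1.0 - p1, p1

w = [0.5, -0.25]   # hypothetical weights
b = 0.25           # hypothetical bias
x = [2.0, 1.0]     # one observation with two features

z = linear_score(x, w, b)
p0, p1 = predict_proba(x, w, b)
print(f"z = {z}, P(y=0) = {p0:.4f}, P(y=1) = {p1:.4f}")
```

The two probabilities always sum to 1, so only P(y = 1) needs to be stored in practice.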


Logistic Regression Equation
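The odds are the ratio of something occurring to something not occurring. This
differs from probability, which is the ratio of something occurring to
everything that could possibly occur. Writing $p(x) = \sigma(z)$ with $z = w \cdot X + b$, the odds are

$$\frac{p(x)}{1 - p(x)} = e^{z}$$

Applying the natural log to the odds gives the log odds:

$$\log\left[\frac{p(x)}{1 - p(x)}\right] = z = w \cdot X + b$$

Solving for the probability gives the final logistic regression equation:

$$p(X; b, w) = \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}} = \frac{1}{1 + e^{-(w \cdot X + b)}}$$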
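
One consequence of modeling the log odds as a linear function is that raising a single feature by one unit multiplies the odds by e raised to that feature's weight. The sketch below checks this numerically; the weights, bias, and feature values are hypothetical.

```python
import math

def probability(x, w, b):
    """Logistic regression probability p = 1 / (1 + e^-(w . x + b))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Ratio of the event occurring to the event not occurring."""
    return p / (1.0 - p)

w = [0.7, -1.2]        # hypothetical weights
b = 0.1                # hypothetical bias
x = [1.0, 2.0]
x_plus = [2.0, 2.0]    # same observation with the first feature raised by one unit

ratio = odds(probability(x_plus, w, b)) / odds(probability(x, w, b))
print("odds ratio:", round(ratio, 6))
print("exp(w[0]): ", round(math.exp(w[0]), 6))
```

The two printed values agree up to floating-point rounding, which is why the exponentiated coefficient is often read as an odds ratio.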

Likelihood function for Logistic Regression


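The predicted probability is $p(X; b, w) = p(x)$ for $y = 1$ and
$1 - p(X; b, w) = 1 - p(x)$ for $y = 0$. For n independent observations $(x_i, y_i)$, the likelihood is

$$L(b, w) = \prod_{i=1}^{n} p(x_i)^{y_i} \, \left(1 - p(x_i)\right)^{1 - y_i}$$

Taking natural logs on both sides:

$$\ell(b, w) = \log L(b, w) = \sum_{i=1}^{n} \left[ y_i \log p(x_i) + (1 - y_i) \log\left(1 - p(x_i)\right) \right] = \sum_{i=1}^{n} \left[ y_i (w \cdot x_i + b) - \log\left(1 + e^{w \cdot x_i + b}\right) \right]$$

Gradient of the log-likelihood function
To find the maximum likelihood estimates, we differentiate with respect to each weight $w_j$:

$$\frac{\partial \ell(b, w)}{\partial w_j} = \sum_{i=1}^{n} \left( y_i - p(x_i; b, w) \right) x_{ij}$$

where $x_{ij}$ is the j-th feature of the i-th observation.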
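
Maximum likelihood estimation for this model has no closed-form solution, so the weights are found iteratively. The sketch below uses plain gradient ascent, assuming the standard gradient in which each weight moves in proportion to the sum of (y_i - p_i) times the feature value. The data set (hours studied versus pass or fail), the learning rate, and the number of epochs are all hypothetical choices for illustration; production libraries typically use more sophisticated optimizers.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(x, w, b):
    """Linear function z = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def log_likelihood(X, y, w, b):
    """Sum over observations of y*z - log(1 + e^z)."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = score(xi, w, b)
        # log(1 + e^z) computed stably as max(z, 0) + log1p(e^-|z|)
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def fit(X, y, learning_rate=0.1, epochs=2000):
    """Gradient ascent on the log-likelihood, starting from all-zero parameters."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            error = yi - sigmoid(score(xi, w, b))   # y_i - p(x_i)
            for j in range(m):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step uphill; the gradient is averaged over the n observations.
        w = [wj + learning_rate * gj / n for wj, gj in zip(w, grad_w)]
        b += learning_rate * grad_b / n
    return w, b

# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

ll_start = log_likelihood(X, y, [0.0], 0.0)
w, b = fit(X, y)
ll_end = log_likelihood(X, y, w, b)
print(f"log-likelihood: {ll_start:.3f} -> {ll_end:.3f}")
print(f"w = {w[0]:.3f}, b = {b:.3f}")
```

The classes in this toy data overlap on purpose: with perfectly separable data the likelihood keeps increasing as the weights grow, and the estimates never settle.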

Assumptions for Logistic Regression


The assumptions for Logistic regression are as follows:

Independent observations: Each observation is independent of the others,
meaning the outcome of one observation carries no information about another.
In addition, the input variables should not be highly correlated with one
another.
Binary dependent variables: The dependent variable is assumed to be binary or
dichotomous, meaning it can take only two values.
For more than two categories, softmax functions are used.
Linear relationship between independent variables and log odds: The
relationship between the independent variables and the log odds of the
dependent variable should be linear.
No outliers: There should be no extreme outliers in the dataset.
Large sample size: The sample size should be sufficiently large.

The difference between linear regression and logistic regression is that linear regression outputs a continuous value that can be
anything, while logistic regression predicts the probability that an instance belongs to a given class.

| Sr. No. | Linear Regression | Logistic Regression |
|---|---|---|
| 1 | Linear regression is used to predict a continuous dependent variable using a given set of independent variables. | Logistic regression is used to predict a categorical dependent variable using a given set of independent variables. |
| 2 | It is used for solving regression problems. | It is used for solving classification problems. |
| 3 | We predict the values of continuous variables. | We predict the values of categorical variables. |
| 4 | We find the best-fit line. | We find the S-curve. |
| 5 | The least squares method is used to estimate the parameters. | Maximum likelihood estimation is used to estimate the parameters. |
| 6 | The output must be a continuous value, such as price or age. | The output must be a categorical value, such as 0 or 1, or Yes or No. |
| 7 | A linear relationship between the dependent and independent variables is required. | A linear relationship between the dependent and independent variables is not required (linearity is assumed only between the independent variables and the log odds). |
| 8 | There may be collinearity between the independent variables. | There should not be collinearity between the independent variables. |

Steps in logistic regression modeling:

The following are the steps involved in logistic regression modeling:

Define the problem: Identify the dependent variable and independent
variables, and determine whether the problem is a binary classification problem.
Data preparation: Clean and preprocess the data, and make sure the data is
suitable for logistic regression modeling.
Exploratory Data Analysis (EDA): Visualize the relationships between the
dependent and independent variables, and identify any outliers or anomalies in
the data.
Feature Selection: Choose the independent variables that have a significant
relationship with the dependent variable, and remove any redundant or
irrelevant features.
Model Building: Train the logistic regression model on the selected
independent variables and estimate the coefficients of the model.
Model Evaluation: Evaluate the performance of the logistic regression model
using appropriate metrics such as accuracy, precision, recall, F1-score, or AUC-
ROC.
Model improvement: Based on the results of the evaluation, fine-tune the
model by adjusting the independent variables, adding new features, or using
regularization techniques to reduce overfitting.
Model Deployment: Deploy the logistic regression model in a real-world
scenario and make predictions on new data.
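
The model evaluation step can be illustrated with a small sketch that computes accuracy, precision, recall, and F1-score from true and predicted labels. The label lists are hypothetical, the function names are assumptions, and AUC-ROC is left out because it needs predicted probabilities rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall, and F1-score as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Returning 0.0 when a denominator is zero is a convention chosen for this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

results = evaluate(y_true, y_pred)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

In this example there is one false positive and one false negative, so all four metrics come out to 0.75.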

Logistic Regression Model Thresholding


Logistic regression becomes a classification technique only when a decision
threshold is brought into the picture. The setting of the threshold value is a very
important aspect of Logistic regression and is dependent on the classification
problem itself.

The choice of threshold is driven mainly by the values of precision and recall.
Ideally, we want both precision and recall to be 1, but this is seldom the case.

When precision and recall must be traded off, we use the following arguments to decide
upon the threshold:

1. Low Precision/High Recall: In applications where we want to reduce the
number of false negatives without necessarily reducing the number of false
positives, we choose a threshold that gives high recall, even at the cost of
low precision. For example, in a cancer diagnosis application, we do not want
any affected patient to be classified as not affected, and we give less weight
to whether a patient is wrongly diagnosed with cancer. This is because the
absence of cancer can be confirmed by further medical tests, but the
presence of the disease cannot be detected in a candidate who has already been rejected.
2. High Precision/Low Recall: In applications where we want to reduce the
number of false positives without necessarily reducing the number of false
negatives, we choose a threshold that gives high precision, even at the cost of
low recall. For example, if we are classifying whether customers will
react positively or negatively to a personalized advertisement, we want to be
very sure that a customer will react positively to the advertisement,
because a negative reaction can cause a loss of potential sales from
that customer.
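
The precision-recall tradeoff can be seen by sweeping the threshold over a fixed set of predicted probabilities. The probabilities and labels below are hypothetical, and the helper name `precision_recall` is an assumption; the point is only to show the direction in which the two metrics move.

```python
def precision_recall(y_true, probs, threshold):
    """Precision and recall when probabilities at or above the threshold are labeled 1."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Returning 0.0 when a denominator is zero is a convention chosen for this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted probabilities and true labels for eight observations.
probs = [0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.8):
    precision, recall = precision_recall(y_true, probs, threshold)
    print(f"threshold = {threshold:.1f}  precision = {precision:.2f}  recall = {recall:.2f}")
```

On this data the low threshold catches every positive at the cost of false alarms (the cancer-diagnosis setting), while the high threshold produces no false positives but misses half of the positives (the advertisement setting).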
