
Breaking down Logistic Regression to its basics

#1. The Path from Data to Decision


In the vast expanse of ML algorithms, Logistic Regression stands as a go-to model for binary
classification problems.
It is the trusted path we take when the outcome is categorical and the destination is decision-making.
Logistic Regression is not merely a statistical tool but a storytelling device that translates numerical
tales into binary outcomes.

#2. Introduction to Logistic Regression


Imagine you are at a crossroads where each path leads to a distinct outcome, and your choice is
binary: yes or no, true or false, A or B.
Logistic regression is the queen in this field of dichotomies.
At its core, Logistic Regression is about probabilities. It measures the likelihood of an event
occurring.
Its main goal? 🎯
Logistic regression aims to find the probability that a given input belongs to a certain class.
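To make that goal concrete, here is a minimal sketch using scikit-learn (a library choice made for this illustration, not part of the original walkthrough) on a tiny made-up dataset of study hours versus pass/fail:

```python
# A minimal sketch: fit a logistic regression and read off class probabilities.
# The data and feature ("hours studied") are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # hours studied
y = np.array([0, 0, 0, 1, 1, 1])                          # 0 = fail, 1 = pass

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] for each input.
print(model.predict_proba([[3.5]]))   # near the boundary, roughly [[0.5, 0.5]]
print(model.predict([[5.5]]))         # hard class decision, e.g. [1]
```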

#3. The Sigmoid Function


Logistic regression is based on the sigmoid function, a mathematical curve that maps any real-
valued input into a value between 0 and 1, suitable for probability interpretation.
This is the probability space where Logistic Regression composes its symphony.

The elegance of this function lies in its simplicity: it takes the linear equation, akin to a straight
road, and bends it into an S-shaped path that gracefully transitions from one state to another.
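A minimal NumPy sketch of that squashing behaviour (the sample inputs are arbitrary):

```python
# The sigmoid maps any real number into the open interval (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print(sigmoid(z))  # roughly [0.0025, 0.119, 0.5, 0.881, 0.9975]
```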
#3.1 From Linear To Logistic Regression
Step 1. Linear Regression Foundation: We begin with the familiar linear regression equation:
Y = Ax + B,
where:
• Y is the dependent variable (the outcome we're trying to predict).
• x is the independent variable (the predictor).
• A and B are the coefficients that represent the slope and the y-intercept of the regression
line, respectively.
The main problem?
Linear regression fits a straight line through the data points, and the values along that line are
unbounded, which does not suit a yes/no outcome.
Step 2. Probability Adjustment: Since linear regression outputs can extend beyond the range of
[0,1], which is not suitable for probability, the equation is adjusted to P = Ax + B to reflect that
P (probability) is being modeled instead of a direct measurement Y.
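A tiny numerical illustration of that range problem, assuming some made-up coefficients A and B:

```python
# With hypothetical coefficients, the raw linear output is not a valid probability.
import numpy as np

A, B = 0.8, -1.5                  # made-up slope and intercept
x = np.array([-2.0, 0.0, 2.0, 5.0])
P_linear = A * x + B
print(P_linear)                   # [-3.1, -1.5, 0.1, 2.5]: values below 0 and above 1
```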

Step 3. Odds Transformation: However, P = Ax + B can still take on values less than 0 or greater
than 1, which is not valid for a probability. To address this, the model is rewritten in terms of the
odds, P / (1 - P), which map probabilities in [0, 1] onto the range [0, ∞).

Step 4. Log Transformation: A log transformation is applied, leading to the equation


log(P / (1 - P)) = Ax + B. (LOGIT function)

This is a pivotal step in moving from linear to logistic regression. This transformation allows us to
model P as a linear combination of x but in the log-odds space, not the probability space.

Step 5. Sigmoid Function Derivation: By rearranging the log-odds equation, we obtain


P = e^(Ax + B) / (1 + e^(Ax + B)).

This equation represents the sigmoid function, which bounds P between 0 and 1. It translates the
linear combination of x into a probability.
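A short sketch tying Steps 3 to 5 together, reusing the same hypothetical coefficients as above: the sigmoid turns the unbounded linear part into a probability, and applying the logit to that probability recovers Ax + B.

```python
# Sigmoid bounds the output in (0, 1); the logit is its inverse.
import numpy as np

A, B = 0.8, -1.5                               # hypothetical coefficients
x = np.array([-2.0, 0.0, 2.0, 5.0])

linear = A * x + B                              # log-odds space: unbounded
P = np.exp(linear) / (1.0 + np.exp(linear))     # Step 5: every value lies in (0, 1)
print(P)

# Sanity check (Step 4): the logit of P recovers A*x + B, up to float rounding.
print(np.log(P / (1.0 - P)))
```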

#4. How to obtain the cost function mathematically?


For a binary classification problem, the model output corresponds to the probability of the
prediction y being:
• y = 1 when the output is one class, let's say A.
• y = 0 when the output is the other class, B.
Of course, this coding could just as well be reversed.
If we define this as our hypothesis, we can mathematically derive the cost function to minimize,
known as binary cross-entropy or log loss.

For a single example with true label y and predicted probability P, this loss is

Loss = -[ y log(P) + (1 - y) log(1 - P) ].

By looking at this loss function we see:

• The loss approaches infinity if we predict incorrectly with high confidence.
• The loss approaches 0 when we predict correctly, and is thus at a minimum.
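A small Python sketch of this behaviour, plugging a few arbitrary probabilities into the loss above for a single example whose true label is y = 1:

```python
# Binary cross-entropy for one example: near-zero when confident and correct,
# exploding when confident and wrong.
import numpy as np

def log_loss_single(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:4.2f} -> loss = {log_loss_single(1, p):.3f}")
# p = 0.99 gives a loss of about 0.010; p = 0.01 gives about 4.605,
# and the loss grows without bound as p approaches 0.
```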
And the next step is…

#5. Find the optimal solution with Gradient Descent


Gradient descent is a pivotal optimization algorithm used to minimize the cost function, aiding us in
our aim to find the most accurate weight values for our predictive model.
Envision standing atop a hill, your objective is the valley below — this represents our cost
function's minimum point.
To reach it, we begin with initial guesses for our weights, A and B, and iteratively refine these
guesses.
The process is akin to descending a hill: with each step, we assess our surroundings and adjust our
trajectory to ensure each subsequent step brings us closer to the valley floor.
These steps are guided by the learning rate — a vital hyperparameter symbolized as lr in the
equations. This learning rate controls the size of our steps or adjustments to the parameters A and B,
ensuring that we do not overshoot the minimum.
As we take each step, we calculate the partial derivatives of the cost function with respect to A and
B, denoted as dA and dB respectively. These derivatives point us in the direction where the cost
function decreases the fastest, akin to finding the steepest descent on our metaphorical hill.
The updated equations for A and B in each iteration, factoring in the learning rate, are:

A = A - lr · dA
B = B - lr · dB

This process is repeated until we reach a point where the cost function's decrease is negligible,
suggesting we've arrived at or near the global minimum — our destination where the predictive
error is minimized, and our model's accuracy is maximized.
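Putting the whole loop together, here is a minimal, hypothetical sketch of gradient descent for a one-feature logistic regression, following the notation above (A, B, lr, dA, dB); the data is made up purely for illustration:

```python
# Gradient descent on the log loss for a single-feature logistic regression.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical predictor
y = np.array([0, 0, 0, 1, 1, 1])               # binary outcome

A, B = 0.0, 0.0          # initial guesses for the weights
lr = 0.1                 # learning rate: size of each step downhill

for _ in range(5000):
    p = sigmoid(A * x + B)          # current predicted probabilities
    dA = np.mean((p - y) * x)       # partial derivative of the log loss w.r.t. A
    dB = np.mean(p - y)             # partial derivative of the log loss w.r.t. B
    A -= lr * dA                    # update, scaled by the learning rate
    B -= lr * dB

print(A, B)                    # learned slope and intercept
print(sigmoid(A * 3.5 + B))    # should sit near 0.5 for this made-up boundary
```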

#6. Model Evaluation


To evaluate the performance of our model, there are different approaches:
1. Confusion Matrix: A table used to describe the performance of a classification model. It
categorizes predictions into true positives, true negatives, false positives, and false
negatives. With these counts, we have a clear picture of the model's predictive accuracy and the
nature of its errors.
2. ROC Curve: A graph that illustrates the model's ability to correctly predict the positive class
at various threshold levels, providing insight into the balance between sensitivity and
specificity.
3. AUC: Standing for "Area Under the ROC Curve," this metric quantifies the overall ability
of the model to distinguish between classes, with higher values indicating better performance.
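As a hedged sketch of how these three tools are typically computed in practice, here is one common approach using scikit-learn on hypothetical labels and predicted probabilities:

```python
# Confusion matrix, ROC curve points and AUC for made-up predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.6, 0.3])  # model probabilities
y_pred = (y_prob >= 0.5).astype(int)                           # default 0.5 threshold

print(confusion_matrix(y_true, y_pred))      # rows: actual class, columns: predicted class
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(list(zip(fpr, tpr)))                   # points along the ROC curve
print(roc_auc_score(y_true, y_prob))         # area under that curve
```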

#7. The Assumptions Behind the Algorithm


Every model rests on assumptions, and Logistic Regression is no different. It presumes:
• A binary outcome.
• A linear relationship between the log odds and the independent variables.
• Little or no multicollinearity among the independent variables (a quick check is sketched below).
• A sample size large enough to give reliable insights into the patterns of the data.
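One simple, informal way to eyeball the multicollinearity assumption is to inspect pairwise correlations between the independent variables; the sketch below uses made-up data in which one predictor is nearly a copy of another:

```python
# Pairwise correlations as a quick multicollinearity check (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)

X = np.column_stack([x1, x2, x3])
print(np.corrcoef(X, rowvar=False))   # the strong x1-x2 correlation is a red flag
```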
