Logistic Regression in Machine Learning
In our previous discussion, we explored the fundamentals of machine learning and walked
through a hands-on implementation of Linear Regression. Now, let’s take a step forward and
dive into one of the first and most widely used classification algorithms: Logistic Regression.
What is Logistic Regression?
Logistic regression is a supervised machine learning algorithm used for classification
tasks, where the goal is to predict the probability that an instance belongs to a given class.
It is a statistical method that analyzes the relationship between one or more independent
variables and a categorical outcome. This article explores the fundamentals of logistic
regression, its types, and its implementation.
Logistic regression is used for binary classification, where we use the sigmoid function, which
takes the independent variables as input and produces a probability value between 0 and 1.
For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic
function for an input is greater than 0.5 (the threshold value), the instance belongs to Class 1;
otherwise it belongs to Class 0. It is referred to as regression because it is an extension of
linear regression, but it is mainly used for classification problems.
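As a concrete illustration, here is a minimal sketch (assuming NumPy; the input scores are made up for the example) of how a sigmoid output is mapped to a class label with a 0.5 threshold:

import numpy as np

def sigmoid(z):
    # Map any real-valued score into the (0, 1) range
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])       # example raw scores (z = w.x + b)
probs = sigmoid(z)                   # approx. [0.12, 0.50, 0.95]
labels = (probs > 0.5).astype(int)   # threshold at 0.5 -> [0, 0, 1]
print(probs, labels)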
Key Points:
• Logistic regression predicts the output of a categorical dependent variable. Therefore,
the outcome must be a categorical or discrete value.
• It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values
0 and 1, it gives probabilistic values which lie between 0 and 1.
• In logistic regression, instead of fitting a straight regression line, we fit an “S”-shaped
logistic function, whose output is bounded by the two extreme values 0 and 1.
Types of Logistic Regression
Based on the number and ordering of the target categories, logistic regression can be classified into three types:
1. Binomial: In binomial logistic regression, there can be only two possible types of the
dependent variable, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, there can be 3 or more possible
unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types
of the dependent variable, such as “low”, “medium”, or “high”.
Assumptions of Logistic Regression
We will explore the assumptions of logistic regression, as understanding them is important to
ensure that the model is applied appropriately. The assumptions include:
1. Independent observations: Each observation is independent of the others, meaning
there is no correlation between observations.
2. Binary dependent variable: Logistic regression assumes that the dependent variable is
binary or dichotomous, meaning it can take only two values. For more than two
categories, the softmax function is used instead (see the multinomial section below).
3. Linear relationship between independent variables and log odds: The relationship
between the independent variables and the log odds of the dependent variable should
be linear.
4. No extreme outliers: There should be no extreme outliers in the dataset.
5. Large sample size: The sample size should be sufficiently large to produce reliable
estimates.
Understanding Sigmoid Function
So far, we’ve covered the basics of logistic regression, but now let’s focus on the most
important function at its core.
• The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
• It maps any real value into a value within the range 0 to 1. Since the output of logistic
regression must stay between 0 and 1 and cannot go beyond this limit, the function
forms an “S”-shaped curve.
• This S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which defines the
decision boundary between the two classes: values above the threshold tend towards
class 1, and values below the threshold tend towards class 0.
Sigmoid Function
Now we use the sigmoid function, where the input is z, and we obtain a probability between
0 and 1, i.e. the predicted y:
σ(z) = 1 / (1 + e^(−z))
[Figure: the S-shaped curve of the sigmoid function]
As the figure above shows, the sigmoid function converts continuous input values into
probabilities, i.e. values between 0 and 1.
• σ(z) tends towards 1 as z → ∞
• σ(z) tends towards 0 as z → −∞
• σ(z) is always bounded between 0 and 1
where the probability of belonging to each class can be measured as:
P(y = 1) = σ(z)
P(y = 0) = 1 − σ(z)
Equation of Logistic Regression:
The odds are the ratio of the probability of something occurring to the probability of it not
occurring. Odds differ from probability, which is the ratio of something occurring to
everything that could possibly occur. So the odds will be:
odds = p / (1 − p)
For example, a probability of p = 0.8 gives odds of 0.8 / 0.2 = 4. Taking the natural log of the
odds gives the log odds (logit), which logistic regression models as a linear function of the
input features (with weights w and bias b):
log(p / (1 − p)) = z = w · x + b
Solving for p recovers the sigmoid from above: p = σ(z).
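Binomial Logistic Regression:
Here the target variable has only two possible classes, for example malignant vs. benign tumors. Below is a minimal implementation sketch using scikit-learn; the test_size, random_state, and max_iter values are illustrative choices rather than prescribed settings.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset (binary target: malignant vs. benign)
X, y = load_breast_cancer(return_X_y=True)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=23)

# Train a logistic regression model (max_iter raised so the solver converges)
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predict labels for the test set and compute accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Logistic Regression model accuracy: {accuracy * 100:.2f}%")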
This code loads the breast cancer dataset from scikit-learn, splits it into training and testing
sets, and then trains a Logistic Regression model on the training data. The model is used to
predict the labels for the test data, and the accuracy of these predictions is calculated by
comparing the predicted values with the actual labels from the test set. Finally, the accuracy is
printed as a percentage.
Multinomial Logistic Regression:
The target variable can have 3 or more possible types which are not ordered (i.e. the types
have no quantitative significance), such as “disease A” vs. “disease B” vs. “disease C”.
In this case, the softmax function is used in place of the sigmoid function. The softmax
function for K classes is:
softmax(z_i) = e^(z_i) / Σ_{j=1..K} e^(z_j)
Here, K represents the number of elements in the vector z, and i, j iterate over all the elements
in the vector.
Then the probability for class c will be:
P(y = c | x) = e^(z_c) / Σ_{j=1..K} e^(z_j)
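As a quick numeric sketch (assuming NumPy; the scores are made up for the example), softmax turns a vector of raw class scores into probabilities that sum to 1:

import numpy as np

def softmax(z):
    # Subtract the max score for numerical stability (the result is unchanged)
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # raw scores for 3 classes
print(softmax(z))               # approx. [0.66, 0.24, 0.10]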
In multinomial logistic regression, the output variable can have more than two possible
discrete outputs. Consider scikit-learn's Digits dataset, where each sample is an image of a
handwritten digit belonging to one of ten classes (0-9).
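A minimal sketch of this multiclass case, again using scikit-learn (the split sizes and max_iter are illustrative assumptions; with a multiclass target, LogisticRegression fits a multinomial softmax model under its default lbfgs solver):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the digits dataset (10 classes: handwritten digits 0-9)
X, y = load_digits(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=23)

# A multiclass target makes scikit-learn fit a multinomial (softmax) model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Multinomial Logistic Regression accuracy: "
      f"{accuracy_score(y_test, y_pred) * 100:.2f}%")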
Model Interpretation
• Accuracy: Measures overall correctness of predictions
• Confusion Matrix: Shows True Positives (TP), False Positives (FP), True Negatives
(TN), and False Negatives (FN)
• Precision & Recall: Important when dealing with imbalanced classes
• ROC Curve & AUC Score: Measures model's ability to distinguish between classes
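The sketch below reuses y_test and y_pred from the binomial example above to compute a confusion matrix and a per-class precision/recall report:

from sklearn.metrics import confusion_matrix, classification_report

# Confusion matrix: rows are actual classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))

# Precision, recall, and F1-score for each class
print(classification_report(y_test, y_pred))

For the ROC curve and AUC in the binary case, probability scores and matplotlib are needed: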
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

y_prob = model.predict_proba(X_test)[:, 1]  # Probability scores for the positive class
fpr, tpr, _ = roc_curve(y_test, y_prob)     # False/true positive rates at each threshold
roc_auc = auc(fpr, tpr)                     # Area under the ROC curve

plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()