Mla 4
Date :
Practical Details: Practical No. 3
Student Details:
Roll Number 56
Name Sanket Jambhulkar
Semester 5th
Section B
Branch CSE
Subject MLA
Aim: Write a python program to classify the given dataset using Logistic Regression and
evaluate the model.
Theory: Introduction:
Logistic Regression is a fundamental algorithm for binary classification tasks. Despite its
name, it is a classification algorithm rather than a regression one. This section explains the
concept of Logistic Regression, its underlying principles, and how it is used for
classification. It also describes how the performance of a trained Logistic Regression model
is evaluated.
1) Logistic Regression:
Logistic Regression is a statistical method used for predicting the probability of a binary
outcome.
It models the probability that a given input belongs to a particular class.
The logistic function (sigmoid function) maps a linear combination of the input features, z,
to a value strictly between 0 and 1, which is interpreted as a probability.
Mathematically, the logistic function is expressed as:
σ(z) = 1 / (1 + e^(-z)), where z = w^T * x + b, w is the weight vector, x is the feature
vector, and b is the bias term.
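The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names (sigmoid, predict_proba) and the sample values of w, x, and b are assumptions chosen for the example, not values from the practical's dataset.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x, b):
    """Probability of the positive class: sigmoid(w^T x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Illustrative values (assumed for this sketch)
w = [0.8, -0.4]   # weight vector
x = [2.0, 1.0]    # feature vector
b = -0.2          # bias term

p = predict_proba(w, x, b)   # z = 1.0, so p is about 0.731
print(round(p, 3))
```

Note that sigmoid(0) is exactly 0.5, which is why a score of z = 0 corresponds to the usual decision boundary.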
2) Training Process:
In the training process, the Logistic Regression model learns the optimal weights and
bias that minimize a predefined loss function, typically the logistic loss or cross-entropy
loss.
This is done with an iterative optimization algorithm such as gradient descent, which
repeatedly updates the weights and bias in the direction that reduces the loss.
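The training loop described above might look like the following sketch, which uses plain batch gradient descent on the cross-entropy loss. The learning rate, number of epochs, function names, and the tiny one-feature dataset are all assumptions made for illustration; a library implementation would typically use a more sophisticated solver.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(X, y, w, b):
    """Mean cross-entropy loss over the dataset."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1 - eps)
        total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total / len(X)

def fit(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the loss with respect to z
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy data (assumed): small feature values are class 0, large ones class 1
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit(X, y)
print(log_loss(X, y, [0.0], 0.0), "->", log_loss(X, y, w, b))
```

The printed loss should drop from its starting value of ln 2 (about 0.693, the loss when every prediction is 0.5) as the weights and bias are learned.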
3) Classification:
After training, the Logistic Regression model uses the learned parameters to predict the
probability that a given input belongs to the positive class (class 1).
If the predicted probability is greater than a predefined threshold (usually 0.5), the input
is classified as belonging to the positive class; otherwise, it is classified as belonging to
the negative class (class 0).
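The thresholding rule can be sketched as follows. The function name classify and the sample probabilities are hypothetical; the comparison is strict ("greater than"), so a probability of exactly 0.5 is assigned to class 0, as in the description above.

```python
def classify(probabilities, threshold=0.5):
    """Return 1 where the probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]

# Illustrative predicted probabilities (assumed)
probs = [0.10, 0.50, 0.51, 0.93]

labels = classify(probs)        # [0, 0, 1, 1]
strict = classify(probs, 0.9)   # [0, 0, 0, 1]
print(labels, strict)
```

Raising the threshold makes the model more conservative about predicting the positive class, which generally trades recall for precision.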
4) Model Evaluation:
Several metrics are commonly used to evaluate the performance of a Logistic Regression
model, including accuracy, precision, recall, F1-score, and area under the ROC curve
(AUC-ROC).
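In the formulas below, TP, TN, FP, and FN denote the counts of true positives, true
negatives, false positives, and false negatives.
Accuracy measures the proportion of correctly classified instances out of the total
instances: (TP + TN) / (TP + TN + FP + FN).
Precision measures the proportion of true positive predictions among all positive
predictions: TP / (TP + FP).
Recall measures the proportion of true positive predictions among all actual positive
instances: TP / (TP + FN).
F1-score is the harmonic mean of precision and recall, 2 * Precision * Recall /
(Precision + Recall), and provides a balanced measure of a model's performance.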
AUC-ROC measures the area under the Receiver Operating Characteristic curve and
provides a comprehensive evaluation of the model's ability to discriminate between
positive and negative instances across different threshold values.
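The metrics above can be computed directly from their definitions, as in the sketch below. The function names and the small set of labels and scores are assumptions for illustration. AUC-ROC is computed here through its pairwise interpretation (the fraction of positive/negative pairs in which the positive instance receives the higher score, with ties counted as one half), which is equivalent to the area under the ROC curve but is only practical for small datasets.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels and predicted probabilities (assumed)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in scores]

results = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(results)   # accuracy, precision, recall, and F1 are all 0.75 here
print(auc)       # 0.9375
```

In this example the threshold-based metrics all equal 0.75 while the AUC is 0.9375, which illustrates that AUC-ROC summarizes ranking quality across all thresholds rather than performance at a single one.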