Experiment No 3
Course Instructor -
DR. TAHSEEN A. MULLA
ASSOCIATE PROFESSOR
QUALITY LABORATORY MANUAL
Prepared by – Dr. Tahseen A. Mulla
Machine Learning Laboratory [0ICPC455]
Final Year – AY 2024-25 [Odd Semester]
System Requirements – Win 8 and above OS, 4GB RAM, 2.33 GHz Processor
Experiment Outcomes –
1. Understand the principles and fundamentals of logistic regression as a binary classifier
2. Gain insights into how logistic regression fits into the broader landscape of Machine Learning
models
3. Evaluate the performance of a logistic regression model using appropriate metrics
4. Extend logistic regression to handle multi-class classification problems
Theory –
Logistic Regression is a type of regression analysis used for predicting the outcome of a
categorical dependent variable based on one or more predictor variables (independent variables).
Unlike linear regression, which predicts continuous outcomes, logistic regression predicts a
probability that the dependent variable belongs to a particular category.
Logistic regression uses a logistic function, called the sigmoid function, to map predictions
to probabilities. The sigmoid function is an S-shaped curve that maps any real
value into the range between 0 and 1.
If the output of the sigmoid function (the estimated probability) is greater than a predefined
threshold (commonly 0.5), the model predicts that the instance belongs to the class. If the
estimated probability is less than the threshold, the model predicts that the instance does not
belong to the class.
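The thresholding rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original experiment: the raw scores and the 0.5 threshold are assumed values chosen only to show the behaviour.

```python
import numpy as np

def sigmoid(z):
    """Map any real value (or array of values) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw model scores for four instances (illustrative values only)
scores = np.array([-2.0, -0.5, 0.0, 3.0])
probabilities = sigmoid(scores)

# Assumed cut-off; the appropriate value depends on the problem
threshold = 0.5

# An instance is assigned to the class only if its probability exceeds the threshold.
# A probability exactly equal to the threshold is treated here as "not in the class".
predictions = (probabilities > threshold).astype(int)

print("Probabilities:", probabilities)
print("Predictions:  ", predictions)
```

Lowering the threshold makes the model predict the positive class more readily, which usually raises recall at the cost of precision.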
For binary classification, the outcome is often coded as 0 or 1, where 1 typically represents
the presence of an event (e.g., success, yes) and 0 represents its absence (e.g., failure, no).
The sigmoid function is referred to as an activation function for logistic regression and is defined
as:
$$f(x) = \frac{1}{1 + e^{-x}}$$
Where x is any real-valued input and e is the base of the natural logarithm.
The logistic regression model uses the following logistic function (also known as the
sigmoid function) to map predicted values to probabilities:
$$P(y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}}$$
where β0 is the intercept, β1, ..., βn are the coefficients, and x1, ..., xn are the predictor variables.
The model aims to find the best-fit coefficients that maximize the likelihood of observing the
given data.
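As a rough illustration of what "maximizing the likelihood" means, the sketch below fits the coefficients of a one-predictor model by plain gradient ascent on the log-likelihood. The toy data, the learning rate, and the iteration count are all assumptions made for this example; library implementations such as scikit-learn use more sophisticated solvers and add regularization by default.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, beta):
    """Log-likelihood of labels y under coefficients beta (beta[0] is the intercept)."""
    p = sigmoid(X @ beta)
    eps = 1e-12  # guards against log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Hypothetical toy data: one predictor, with overlapping classes so the optimum is finite
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept beta_0

beta = np.zeros(2)
initial_ll = log_likelihood(X, y, beta)

learning_rate = 0.01  # assumed step size
for _ in range(5000):
    # Gradient of the log-likelihood with respect to beta
    gradient = X.T @ (y - sigmoid(X @ beta))
    beta += learning_rate * gradient

final_ll = log_likelihood(X, y, beta)
print("Coefficients (beta_0, beta_1):", beta)
print("Log-likelihood:", initial_ll, "->", final_ll)
```

The log-likelihood rises from its starting value as the coefficients improve, and the positive slope coefficient reflects that larger values of the predictor make class 1 more likely in this toy data.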
3. Ordinal: In ordinal logistic regression, the dependent variable has three or more possible
ordered categories, such as "Low", "Medium", or "High".
2. Data Preprocessing:
a. Load the dataset into a Pandas DataFrame.
b. Handle missing values, outliers, and encode categorical variables if necessary.
c. Normalize or standardize the data if needed.
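The preprocessing steps above might look as follows in practice. This is a sketch under stated assumptions: the DataFrame is hypothetical, the "Gender" column is invented purely to demonstrate encoding, and median imputation is only one of several reasonable ways to handle missing values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# a. Hypothetical raw data with one missing value and one categorical column
raw = pd.DataFrame({
    "Study_Hours": [10.0, 8.0, np.nan, 4.0, 2.0],
    "Attendance": [90.0, 80.0, 70.0, 60.0, 50.0],
    "Gender": ["F", "M", "F", "M", "F"],  # illustrative column, not in the experiment's dataset
    "Pass": [1, 1, 1, 0, 0],
})

# b. Fill the missing numeric value with the column median (an assumed strategy)
raw["Study_Hours"] = raw["Study_Hours"].fillna(raw["Study_Hours"].median())

# b. One-hot encode the categorical column; drop_first avoids a redundant dummy column
encoded = pd.get_dummies(raw, columns=["Gender"], drop_first=True)

# c. Standardize the numeric predictors to zero mean and unit variance
numeric_cols = ["Study_Hours", "Attendance"]
scaler = StandardScaler()
encoded[numeric_cols] = scaler.fit_transform(encoded[numeric_cols])

print(encoded)
```

In a full workflow the scaler is normally fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.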
3. Exploratory Data Analysis (EDA): Visualize relationships between the dependent and
independent variables using scatter plots and correlation matrices.
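A correlation matrix for this step can be computed directly with pandas. The sketch below reuses the small study-hours dataset from the Sample Code section; printing the matrix instead of plotting it is a simplification made here.

```python
import pandas as pd

# Dataset from the Sample Code section of this experiment
df = pd.DataFrame({
    "Study_Hours": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "Attendance": [90, 85, 80, 75, 70, 65, 60, 55, 50, 45],
    "Pass": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

# Pearson correlation between every pair of columns
corr = df.corr()
print(corr.round(3))
```

In this toy dataset Study_Hours and Attendance are perfectly correlated (Attendance = 40 + 5 × Study_Hours), so the two predictors carry the same information. On real data, such strong collinearity makes individual coefficients hard to interpret.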
4. Splitting the Data: Split the dataset into training and test sets to evaluate the model's
performance.
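A possible way to perform the split with scikit-learn is sketched below. The 60/40 ratio, the use of stratification, and the random seed are assumptions for this example; with only ten rows, stratifying keeps both classes present in the test set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Dataset from the Sample Code section of this experiment
df = pd.DataFrame({
    "Study_Hours": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "Attendance": [90, 85, 80, 75, 70, 65, 60, 55, 50, 45],
    "Pass": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

X = df[["Study_Hours", "Attendance"]]  # independent variables
y = df["Pass"]                         # dependent variable

# test_size, stratify and random_state are assumed values, not prescribed by the manual
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)

print("Training rows:", len(X_train), "Test rows:", len(X_test))
print("Passes in test set:", int(y_test.sum()))
```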
6. Model Prediction:
a. Use the trained model to make predictions on the test data.
b. Generate a confusion matrix and classification report to assess accuracy.
7. Model Evaluation: Evaluate the model using metrics such as accuracy, precision, recall,
F1-score, and the ROC curve with its AUC.
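The metrics named above can be checked on a small hand-made example. The labels and probabilities below are hypothetical values, chosen so that each metric is easy to verify by hand from the confusion matrix.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical true labels, predicted labels and predicted probabilities of class 1
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)          # uses probabilities, not hard labels

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("ROC-AUC:", auc)
```

Note that ROC-AUC is computed from the predicted probabilities, since it measures how well the model ranks positive instances above negative ones across all thresholds.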
Sample Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, classification_report,
                             roc_auc_score, roc_curve)
data = {
'Study_Hours': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
'Attendance': [90, 85, 80, 75, 70, 65, 60, 55, 50, 45],
'Pass': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] # 1: Pass, 0: Fail
}
df = pd.DataFrame(data)
plt.subplot(1, 2, 1)
plt.scatter(df['Study_Hours'], df['Pass'], color='blue')
plt.xlabel('Study Hours')
plt.ylabel('Pass (1) / Fail (0)')
plt.title('Study Hours vs Pass/Fail')
plt.subplot(1, 2, 2)
plt.scatter(df['Attendance'], df['Pass'], color='green')
plt.xlabel('Attendance (%)')
plt.ylabel('Pass (1) / Fail (0)')
plt.title('Attendance vs Pass/Fail')
plt.tight_layout()  # keep the two subplots from overlapping
plt.show()
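# Features and target
X = df[['Study_Hours', 'Attendance']]
y = df['Pass']
# Train/test split (test_size, stratify and random_state are assumed values)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
# Train the model and predict on the test data (default hyperparameters assumed)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Confusion Matrix:\n{conf_matrix}")
# Classification Report
class_report = classification_report(y_test, y_pred)
print(f"Classification Report:\n{class_report}")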
# ROC-AUC Score (computed from predicted probabilities rather than hard labels)
y_prob = logreg.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_prob)
print(f"ROC-AUC Score: {roc_auc}")
# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, color='blue')
plt.plot([0, 1], [0, 1], color='red', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
Observations -
Record the predicted outcomes (pass/fail) for the test data
Observe the performance of the model using the confusion matrix and classification report
Conclusion –
Summarize the findings from the experiment, such as the relationship
between study hours, attendance, and the likelihood of passing.
References –
a. Textbook –
i. Machine Learning with Python – An approach to Applied ML – Abhishek
Vijayvargiya, BPB Publications, 1st Edition 2018
ii. Machine Learning, Tom Mitchell, McGraw Hill Education, 1st Edition 1997
b. Online references –
i. https://www.analyticsvidhya.com/blog/2021/08/conceptual-understanding-of-logistic-regression-for-data-science-beginners/
ii. https://www.simplilearn.com/tutorials/machine-learning-tutorial/logistic-regression-in-python
iii. https://www.kaggle.com/code/nargisbegum82/logistic-regression-in-machine-learning
1. What is Logistic Regression and how does it differ from Linear Regression?
2. When would you use Logistic Regression instead of Linear Regression?
3. Explain the logistic function and how it is used in Logistic Regression.
4. What is the range of the output of a Logistic Regression model and what does it represent?
5. How do you handle the categorical predictors in Logistic Regression?
6. What are some methods to assess the performance of a Logistic Regression model?
7. What is the purpose of using a confusion matrix in the context of Logistic Regression?
8. How can you handle imbalanced datasets in Logistic Regression?
9. What is the difference between binary logistic regression and multinomial logistic
regression?
10. State a basic example of Logistic Regression.
FAQs in Interviews –
1. What is Logistic Regression and how does it differ from Linear Regression?
2. Explain the concept of the logit function in Logistic Regression.
3. How is Logistic Regression used for classification tasks?
4. What is the sigmoid function and why is it important in Logistic Regression?
5. How do you interpret the coefficients of a Logistic Regression model?