0ICPC455

AY 2024-25 Machine Learning Lab


Quality Laboratory Manual

Experiment No. 3

Study and Implementation of Logistic Regression

Course Instructor -
DR. TAHSEEN A. MULLA
ASSOCIATE PROFESSOR
QUALITY LABORATORY MANUAL
Prepared by – Dr. Tahseen A. Mulla
Machine Learning Laboratory [0ICPC455]
Final Year – AY 2024-25 [Odd Semester]

Experiment No. 3

Title of Experiment: To study and implement Logistic Regression

Aim of Experiment: To implement and understand the working of Logistic Regression, a
statistical method used for binary classification problems in Machine Learning

System Requirements – Windows 8 or later, 4 GB RAM, 2.33 GHz processor

Software/s Needed for Experiment – Jupyter Notebook/ Anaconda Navigator/ Google
Colaboratory/ Spyder, Python 3.x [with libraries such as NumPy, Pandas, Matplotlib and
Scikit-Learn]

Experiment Outcomes –
1. Understand the principles and fundamentals of logistic regression as a binary classifier
2. Gain insight into how logistic regression fits into the broader landscape of Machine Learning
models
3. Evaluate the performance of a logistic regression model using appropriate metrics
4. Extend logistic regression to handle multi-class classification problems

Theory –
Logistic Regression is a type of regression analysis used to predict the outcome of a
categorical dependent variable from one or more predictor (independent) variables.
Unlike linear regression, which predicts continuous outcomes, logistic regression predicts the
probability that the dependent variable belongs to a particular category.
Logistic regression uses the logistic function, also called the sigmoid function, to map a
linear combination of the predictors to a probability. The sigmoid is an S-shaped curve that
maps any real value into the open interval (0, 1).
If the output of the sigmoid function (the estimated probability) is greater than a predefined
threshold, commonly 0.5, the model predicts that the instance belongs to the class. If the
estimated probability is less than the threshold, the model predicts that the instance does not
belong to the class.
For binary classification, the outcome is often coded as 0 or 1, where 1 typically represents
the presence of an event (e.g., success, yes) and 0 represents its absence (e.g., failure, no).
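
The thresholding rule described above can be illustrated with a short sketch. The probability values and the helper name `classify` are hypothetical and chosen only for illustration; the rule itself (predict 1 when the estimated probability exceeds the threshold) follows the text.

```python
import numpy as np

# Hypothetical estimated probabilities for five instances (illustrative values only)
probabilities = np.array([0.05, 0.40, 0.60, 0.75, 0.99])

def classify(probs, threshold=0.5):
    """Return 1 where the probability is greater than the threshold, else 0."""
    return (probs > threshold).astype(int)

labels_default = classify(probabilities)       # threshold of 0.5
labels_strict = classify(probabilities, 0.8)   # a stricter threshold

print(labels_default)  # [0 0 1 1 1]
print(labels_strict)   # [0 0 0 0 1]
```

Raising the threshold makes the model more conservative about predicting class 1, which typically trades recall for precision.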

The sigmoid function is referred to as an activation function for logistic regression and is defined
as:

$$f(x) = \frac{1}{1 + e^{-x}}$$

where e is the base of natural logarithms.
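
As a quick numerical check of this definition, the sketch below evaluates the sigmoid at a few points using NumPy. The function name `sigmoid` and the input values are illustrative assumptions, not part of the manual's sample code.

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([-5.0, 0.0, 5.0])
outputs = sigmoid(inputs)

# Large negative inputs approach 0, zero maps to 0.5, large positive inputs approach 1
print(outputs)
```

The outputs are symmetric about 0.5, that is, f(-x) = 1 - f(x), which is why 0.5 is a natural default threshold.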

The logistic regression model uses the following logistic function (also known as the
sigmoid function) to map predicted values to probabilities:

$$P(y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}}$$

where:

• P(y=1) is the probability that the dependent variable y equals 1
• β0 is the intercept
• β1, β2, …, βn are the coefficients for the independent variables x1, x2, …, xn

The model aims to find the best-fit coefficients that maximize the likelihood of observing the
given data.
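
To make the idea of maximizing the likelihood concrete, here is a minimal sketch that fits β0 and β1 by plain gradient ascent on the log-likelihood. This is only an illustration under stated assumptions: the toy data, the function names `fit_logistic` and `log_likelihood`, the learning rate and the iteration count are all hypothetical, and library implementations such as Scikit-learn use more sophisticated solvers and add regularization by default.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X_aug, y):
    """Sum of y*log(p) + (1-y)*log(1-p) over all samples."""
    p = sigmoid(X_aug @ beta)
    eps = 1e-12  # guards against log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, learning_rate=0.1, n_iter=2000):
    """Estimate coefficients by gradient ascent on the log-likelihood."""
    X_aug = np.column_stack([np.ones(len(X)), X])  # leading column of ones for the intercept
    beta = np.zeros(X_aug.shape[1])
    history = []
    for _ in range(n_iter):
        p = sigmoid(X_aug @ beta)
        gradient = X_aug.T @ (y - p)               # gradient of the log-likelihood
        beta += learning_rate * gradient / len(y)
        history.append(log_likelihood(beta, X_aug, y))
    return beta, history

# Hypothetical, non-separable toy data: one predictor, binary outcome
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 1, 0, 1, 1, 1, 1])

beta, history = fit_logistic(X, y)
print(f"Intercept: {beta[0]:.3f}, slope: {beta[1]:.3f}")
print(f"Log-likelihood: first iteration {history[0]:.3f}, last iteration {history[-1]:.3f}")
```

The log-likelihood rises as the iterations proceed, and the positive slope reflects that larger predictor values are associated with class 1 in this toy data.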

Types of Logistic Regression:

On the basis of the number and ordering of the categories, Logistic Regression can be classified
into three types:
1. Binomial: In binomial Logistic Regression, the dependent variable has only two possible
values, such as 0 or 1, Pass or Fail, etc.

2. Multinomial: In multinomial Logistic Regression, the dependent variable has 3 or more
possible unordered values, such as "cat", "dog", or "sheep"

3. Ordinal: In ordinal Logistic Regression, the dependent variable has 3 or more possible
ordered values, such as "low", "medium", or "high"
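
The multinomial case can be sketched with Scikit-learn's bundled Iris dataset, which has three unordered classes. The dataset choice, split size and `max_iter` value are assumptions made for illustration. With a recent Scikit-learn version and the default solver, `LogisticRegression` should fit a multinomial model when it sees more than two classes. Ordinal logistic regression is not covered by this class and is not shown here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris: 150 samples, 4 features, 3 unordered classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000)  # higher max_iter to help the solver converge
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)        # one probability column per class
accuracy = clf.score(X_test, y_test)

print(proba.shape)
print(f"Test accuracy: {accuracy:.3f}")
```

Each row of `proba` sums to 1, and the predicted class is the one with the highest probability.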

Procedure to implement Logistic Regression:

1. Data Collection: Obtain a dataset suitable for binary classification, for example one that
records whether a student passes or fails along with study hours and attendance.

2. Data Preprocessing:
a. Load the dataset into a Pandas DataFrame.
b. Handle missing values and outliers, and encode categorical variables if necessary.
c. Normalize or standardize the data if needed.

3. Exploratory Data Analysis (EDA): Visualize relationships between the dependent and
independent variables using scatter plots and correlation matrices.

4. Splitting the Data: Split the dataset into training and test sets to evaluate the model's
performance.

5. Implementing Logistic Regression:
a. Use the Scikit-learn library to create a logistic regression model.
b. Train (fit) the model using the training data.

6. Model Prediction:
a. Use the trained model to make predictions on the test data.
b. Generate a confusion matrix and classification report to assess accuracy.

7. Model Evaluation: Evaluate the model using metrics such as accuracy, precision, recall,
F1-score, and the ROC curve with its AUC score.
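
The preprocessing, splitting and training steps of the procedure above can also be chained in a single Scikit-learn pipeline. The following is a minimal sketch under stated assumptions: the data values (including the two missing entries) are hypothetical, and median imputation plus standardization are just one reasonable choice for step 2.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data with two missing values (np.nan) to exercise step 2b
df = pd.DataFrame({
    'Study_Hours': [10, 9, np.nan, 7, 6, 5, 4, 3, 2, 1, 8, 2],
    'Attendance': [90, 85, 80, 75, np.nan, 65, 60, 55, 50, 45, 88, 52],
    'Pass': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
})
X = df[['Study_Hours', 'Attendance']]
y = df['Pass']

# Step 4: stratify so both classes appear in the small test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Steps 2 and 5: impute missing values, standardize, then fit logistic regression
model = make_pipeline(
    SimpleImputer(strategy='median'),
    StandardScaler(),
    LogisticRegression(),
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Putting the imputer and scaler inside the pipeline means they are fitted on the training data only, so no information from the test set leaks into preprocessing.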

Sample Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, classification_report,
                             roc_auc_score, roc_curve)

# Step 1: Data Collection (Example dataset)
# Dataset: Predicting whether a student passes or fails based on study hours and attendance

data = {
    'Study_Hours': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Attendance': [90, 85, 80, 75, 70, 65, 60, 55, 50, 45],
    'Pass': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1: Pass, 0: Fail
}

df = pd.DataFrame(data)

# Step 2: Data Preprocessing


# No missing values or categorical variables in this simple dataset.

# Step 3: Exploratory Data Analysis (EDA)


# Plotting scatter plots for independent variables against the dependent variable (Pass/Fail)
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.scatter(df['Study_Hours'], df['Pass'], color='blue')
plt.xlabel('Study Hours')
plt.ylabel('Pass (1) / Fail (0)')
plt.title('Study Hours vs Pass/Fail')

plt.subplot(1, 2, 2)
plt.scatter(df['Attendance'], df['Pass'], color='green')
plt.xlabel('Attendance (%)')
plt.ylabel('Pass (1) / Fail (0)')
plt.title('Attendance vs Pass/Fail')

plt.show()

# Step 4: Splitting the Data


X = df[['Study_Hours', 'Attendance']] # Independent variables
y = df['Pass'] # Dependent variable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 5: Implementing Logistic Regression


logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Step 6: Model Prediction


y_pred = logreg.predict(X_test)

# Comparing Actual vs Predicted


comparison_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(comparison_df)

# Step 7: Model Evaluation


# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Confusion Matrix:\n{conf_matrix}")

# Classification Report
class_report = classification_report(y_test, y_pred)
print(f"Classification Report:\n{class_report}")

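# ROC-AUC Score (computed from the predicted probability of class 1, not from hard 0/1 labels)
y_prob = logreg.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_prob)
print(f"ROC-AUC Score: {roc_auc}")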

# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, logreg.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, color='blue')
plt.plot([0, 1], [0, 1], color='red', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()

# Coefficients of the model


print(f"Intercept (β0): {logreg.intercept_[0]}")
print(f"Coefficients (β1, β2): {logreg.coef_[0]}")
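
One common way to read the fitted coefficients is to exponentiate them into odds ratios: exp(βj) is the multiplicative change in the odds of passing for a one-unit increase in feature j, with the other features held fixed. The sketch below refits a model on the same ten-row example dataset so that it is self-contained. The variable names `model` and `odds_ratios` and the `max_iter` value are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same example dataset as in the sample code
df = pd.DataFrame({
    'Study_Hours': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Attendance': [90, 85, 80, 75, 70, 65, 60, 55, 50, 45],
    'Pass': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})
X = df[['Study_Hours', 'Attendance']]
y = df['Pass']

model = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) = odds ratio for a one-unit increase in that feature
odds_ratios = np.exp(model.coef_[0])
for name, coef, ratio in zip(X.columns, model.coef_[0], odds_ratios):
    print(f"{name}: coefficient = {coef:.4f}, odds ratio = {ratio:.4f}")
```

An odds ratio above 1 means the feature increases the odds of passing. In this toy dataset Attendance is an exact linear function of Study_Hours (Attendance = 40 + 5 × Study_Hours), so the individual coefficients should be read with caution. The model cannot separate the two effects, and the split between them is determined by Scikit-learn's default regularization.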

Observations –
• Record the predicted outcomes (pass/fail) for the test data
• Observe the performance of the model using the confusion matrix and classification report
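
When reading the confusion matrix, it can help to derive the headline metrics from its four cells by hand. The sketch below uses hypothetical true and predicted labels (not results from this experiment) and the standard definitions of accuracy, precision, recall and F1-score.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for eight test instances (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels 0/1, ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted passes, how many were correct
recall = tp / (tp + fn)      # of the actual passes, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Accuracy={accuracy:.2f}, Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")
```

Here there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics come out to 0.75.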

Conclusion –
Hence, summarize the findings from the experiment, such as the relationship
between study hours, attendance and the likelihood of passing

References –
a. Textbooks –
i. Machine Learning with Python – An Approach to Applied ML, Abhishek
Vijayvargiya, BPB Publications, 1st Edition, 2018
ii. Machine Learning, Tom Mitchell, McGraw Hill Education, 1st Edition, 1997
b. Online references –
i. https://www.analyticsvidhya.com/blog/2021/08/conceptual-understanding-of-logistic-regression-for-data-science-beginners/
ii. https://www.simplilearn.com/tutorials/machine-learning-tutorial/logistic-regression-in-python
iii. https://www.kaggle.com/code/nargisbegum82/logistic-regression-in-machine-learning

Expected Oral Questions –

1. What is Logistic Regression and how does it differ from Linear Regression?
2. When would you use Logistic Regression instead of Linear Regression?
3. Explain the logistic function and how it is used in Logistic Regression.
4. What is the range of the output of a Logistic Regression model and what does it represent?
5. How do you handle categorical predictors in Logistic Regression?
6. What are some methods to assess the performance of a Logistic Regression model?
7. What is the purpose of using a confusion matrix in the context of Logistic Regression?
8. How can you handle imbalanced datasets in Logistic Regression?
9. What is the difference between binary logistic regression and multinomial logistic
regression?
10. State a basic example of Logistic Regression.

FAQs in Interviews –
1. What is Logistic Regression and how does it differ from Linear Regression?
2. Explain the concept of the logit function in Logistic Regression.
3. How is Logistic Regression used for classification tasks?
4. What is the sigmoid function and why is it important in Logistic Regression?
5. How do you interpret the coefficients of a Logistic Regression model?