Annotated Follow-Along Guide - Construct A Logistic Regression Model With Python

This document is a guide for constructing a binomial logistic regression model using Python, specifically utilizing libraries such as pandas, seaborn, and scikit-learn. It includes steps for importing data, performing exploratory data analysis, building the model, and visualizing results, along with instructions for evaluating the model using confusion matrices. The guide emphasizes the importance of reproducibility in the modeling process by setting a random state during data splitting.

Uploaded by

sulaiman.ext

Annotated follow-along guide: Construct a logistic regression model with Python

July 17, 2025

1 Binomial logistic regression (Part 1)

Throughout the following exercises, you will learn to use Python to build and evaluate a binomial
logistic regression model. Before starting on this programming exercise, we strongly recommend
watching the video lecture and completing the IVQ for the associated topics.
All the information you need for solving this assignment is in this notebook, and all the code you
will be implementing will take place within this notebook.
As we move forward, you can find instructions on how to install required libraries as they arise in
this notebook. Before we begin with the exercises and analyzing the data, we need to import all
libraries and extensions required for this programming exercise. Throughout the course, we will be
using pandas and scikit-learn for operations, and seaborn for plotting.

1.1 Relevant imports

Begin by importing the relevant packages and data.

[1]: # Import pandas and seaborn packages
import pandas as pd
import seaborn as sns

1.2 Exploratory data analysis

Note: The following code cell is shown in the video, but it will only work if the .csv file is in the
same folder as the notebook. Otherwise, please follow the data loading process outlined above.
[2]: # Load in if csv file is in the same folder as notebook
activity = pd.read_csv("activity.csv")

[3]: # Get summary statistics about the dataset

activity.describe()

[3]:        Acc (vertical)   LyingDown
     count      494.000000  494.000000
     mean        45.512363    0.516194
     std         44.799360    0.500244
     min        -48.459000    0.000000
     25%          0.918650    0.000000
     50%         41.109500    1.000000
     75%         89.339000    1.000000
     max        112.310000    1.000000

[4]: # Examine the dataset

activity.head()

[4]: Acc (vertical) LyingDown

0 96.229 0
1 84.746 0
2 82.449 0
3 106.560 0
4 80.152 0
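The summary statistics above can be read a little further: because LyingDown is coded as 0 or 1, its mean (about 0.516) is the share of rows labeled 1. The sketch below illustrates that reading and a quick missing-value check. The `sample` DataFrame is a hypothetical stand-in with made-up values, not the contents of activity.csv.

```python
import pandas as pd

# Hypothetical stand-in for the activity data; values are illustrative only.
sample = pd.DataFrame({
    "Acc (vertical)": [96.2, 84.7, 82.4, -3.1, 5.6, 12.9, 101.3, 0.4],
    "LyingDown": [0, 0, 0, 1, 1, 1, 0, 1],
})

# For a 0/1 column, the mean equals the proportion of rows labeled 1.
share_lying_down = sample["LyingDown"].mean()

# The same information, broken out per class.
class_shares = sample["LyingDown"].value_counts(normalize=True)

# Count missing values in each column before modeling.
missing_per_column = sample.isna().sum()

print(share_lying_down)
print(class_shares)
print(missing_per_column)
```

A share close to 0.5, as in the real dataset, suggests the two classes are roughly balanced.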

1.3 Construct binomial logistic regression model

For binomial logistic regression, we’ll be using the scikit-learn package, which is frequently used
for machine learning and more advanced data science topics. For the purposes of this exercise, we’ll
only load in the functions we need: train_test_split() and LogisticRegression().

[5]: # Load in scikit-learn functions for constructing logistic regression

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

Then, we’ll save the data into variables called X and y so we can use the train_test_split()
function more easily. Remember that you can subset specific columns of a DataFrame object by
using double square brackets: [[]] and listing the columns in between, separated by commas.
[6]: # Save X and y data into variables
X = activity[["Acc (vertical)"]]
y = activity[["LyingDown"]]
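As a side note on the bracket syntax, double square brackets return a DataFrame (two-dimensional), while single square brackets return a Series (one-dimensional). The sketch below shows the difference on a small hypothetical DataFrame named `sample`. scikit-learn estimators expect a two-dimensional X; for y, a one-dimensional Series is the conventional shape, and a single-column DataFrame generally still works but may produce a conversion warning when fitting.

```python
import pandas as pd

# Hypothetical stand-in for the activity data; values are illustrative only.
sample = pd.DataFrame({
    "Acc (vertical)": [96.2, 84.7, -3.1, 5.6],
    "LyingDown": [0, 0, 1, 1],
})

X = sample[["Acc (vertical)"]]   # double brackets -> DataFrame, shape (4, 1)
y_frame = sample[["LyingDown"]]  # double brackets -> DataFrame, shape (4, 1)
y_series = sample["LyingDown"]   # single brackets -> Series, shape (4,)

print(type(X).__name__, X.shape)
print(type(y_frame).__name__, y_frame.shape)
print(type(y_series).__name__, y_series.shape)
```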

Then we’ll split the data into training and holdout datasets. We set the test_size to 0.3 so that
the holdout dataset is only 30% of the total data we have. We’ll set the random_state equal to 42.
If you change this variable, then your results will be different from ours. Setting the random_state
is mainly for reproducibility purposes.
[7]: # Split dataset into training and holdout datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
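To make the role of random_state concrete, here is a minimal sketch that splits a synthetic dataset twice with the same seed and checks that both splits select the same rows. The data (`X_demo`, `y_demo`) is made up. Only the row count of 494 is taken from the summary statistics in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

n_rows = 494  # row count reported by activity.describe()

# Synthetic placeholder data with the same number of rows (values are made up).
X_demo = pd.DataFrame({"Acc (vertical)": np.arange(n_rows, dtype=float)})
y_demo = pd.Series(np.arange(n_rows) % 2, name="LyingDown")

# Two splits with the same random_state.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# True when both splits put exactly the same rows in the holdout set.
same_rows = X_test_a.index.equals(X_test_b.index)

print(len(X_train_a), len(X_test_a), same_rows)
```

With 494 rows and test_size=0.3, the holdout set has 149 rows and the training set has 345.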

Then we’ll build our classifier and fit the model to the training data by using the .fit() method.
We’ll save the fitted model as a variable called clf.

[8]: clf = LogisticRegression().fit(X_train,y_train)

1.4 Get coefficients and visualize model

We can use the coef_ and intercept_ attributes of the clf object to get the coefficient and
intercept of our model.
[9]: # Print the coefficient
clf.coef_

[9]: array([[-0.1177466]])

[10]: # Print the intercept

clf.intercept_

[10]: array([6.10177895])

So, based on what we’ve found, our model has an intercept or β0 of 6.10 and a β1 of -0.12. Now
we can plot our model and data with a 95% confidence band using the regplot() function from
the seaborn package. Remember to set the argument logistic=True so that the function knows
we are plotting a logistic regression model, not a linear regression model.
[11]: # Plot the logistic regression and its confidence band
sns.regplot(x="Acc (vertical)", y="LyingDown", data=activity, logistic=True)

[11]: <matplotlib.axes._subplots.AxesSubplot at 0x7ff7757e3850>
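The intercept and coefficient reported above define the fitted curve through the logistic function, p = 1 / (1 + exp(-(β0 + β1·x))). As a rough sketch, the snippet below plugs in the printed values of clf.intercept_ and clf.coef_ to compute the predicted probability of lying down for a given vertical acceleration. The helper name `prob_lying_down` is our own and is not part of scikit-learn.

```python
import math

beta_0 = 6.10177895   # intercept, copied from the clf.intercept_ output
beta_1 = -0.1177466   # coefficient, copied from the clf.coef_ output


def prob_lying_down(acc_vertical):
    """Predicted probability of lying down for one acceleration value."""
    log_odds = beta_0 + beta_1 * acc_vertical
    return 1.0 / (1.0 + math.exp(-log_odds))


# The probability is exactly 0.5 where the log-odds are zero.
boundary = -beta_0 / beta_1

for x in (0.0, boundary, 96.229):
    print(f"x = {x:8.3f}  p = {prob_lying_down(x):.4f}")
```

Because β1 is negative, larger vertical acceleration values lower the predicted probability of lying down. Values above the boundary (a little under 52) are classified as not lying down, which agrees with the label 0 on the first rows shown by activity.head().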

2 Confusion matrix (Part 2)

This part of the notebook contains all of the code that will be presented in the second part of this
section in the course. The focus is on confusion matrices, which are used to evaluate classification
models, such as a binomial logistic regression model.
Note: We are assuming that the earlier parts of this notebook have been run, and that the existing
variables and imported packages have been saved.

2.1 Construct logistic regression model

Once again, we split our data, which is currently saved as variables X and y, into training and
holdout datasets using the train_test_split() function. The function has already been imported
from the scikit-learn package. Then, we build the model by calling LogisticRegression() and
fitting it to the training data with the .fit() method.
Next, we can save our model’s predictions by passing the holdout sample, X_test, to the model’s
.predict() method.
[12]: # Split data into training and holdout samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Build regression model

clf = LogisticRegression().fit(X_train,y_train)

# Save predictions
y_pred = clf.predict(X_test)

We can print out the predicted labels by just calling on clf.predict(X_test). Recall that 0
means not lying down, and 1 means lying down.
[13]: # Print out the predicted labels
clf.predict(X_test)

[13]: array([0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1,
1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0,
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1,
1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0,
0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

However, the model actually calculates the probability that, given a particular value of X, the
person is lying down. We can print out the predicted probabilities with the following line of code.
You can read more about the LogisticRegression() class, its attributes, and related methods on the
scikit-learn website.
[14]: # Print out the predicted probabilities (last column = probability of class 1)
clf.predict_proba(X_test)[:, -1]

[14]: array([2.31693308e-02, 9.97511568e-01, 1.04282071e-02, 1.05672351e-03,
9.99781257e-01, 1.04282071e-02, 4.66017106e-03, 9.95733864e-01,
9.72343955e-01, 2.31693308e-02, 9.97825589e-01, 1.36212174e-02,
3.43616982e-02, 9.95733864e-01, 9.98892860e-01, 5.75929752e-02,
4.77298577e-01, 6.09795092e-03, 9.98732777e-01, 9.92695617e-01,
3.91403776e-02, 9.31712676e-01, 9.64063419e-01, 1.55638073e-02,
9.83708329e-01, 1.19195424e-02, 7.55107906e-01, 8.74057100e-01,
3.43616982e-02, 9.91646273e-01, 1.77763901e-02, 1.19195424e-02,
9.97152314e-01, 2.64340977e-02, 9.98892860e-01, 1.19195424e-02,
9.31712676e-01, 9.95119247e-01, 9.00935568e-01, 9.22594161e-01,
8.39846087e-02, 9.99624389e-01, 1.19195424e-02, 8.01629006e-01,
5.33106480e-03, 1.36212174e-02, 3.01447030e-02, 3.55976011e-03,
4.07288328e-03, 1.19195424e-02, 9.98892860e-01, 6.53836704e-02,
4.07288328e-03, 9.98892860e-01, 9.12067619e-03, 9.53422359e-01,
1.20976784e-03, 9.89081439e-01, 6.53836704e-02, 9.99950563e-01,
3.55976011e-03, 4.66017106e-03, 9.49855175e-02, 9.99713340e-01,
6.09795092e-03, 2.71879989e-03, 9.98732777e-01, 9.92695617e-01,
2.02970330e-02, 4.66017106e-03, 9.92695617e-01, 4.66017106e-03,
3.55976011e-03, 8.06224240e-04, 1.77763901e-02, 5.75929752e-02,
9.83708329e-01, 9.98732777e-01, 5.75929752e-02, 2.12858682e-01,
1.55638073e-02, 9.78756075e-01, 9.98339943e-01, 1.07258677e-01,
9.53422359e-01, 6.97519741e-03, 9.97511568e-01, 9.99261761e-01,
9.99671864e-01, 1.04282071e-02, 9.96271374e-01, 1.55638073e-02,
9.98339943e-01, 9.98892860e-01, 9.87520863e-01, 9.31712676e-01,
9.47037987e-01, 2.64340977e-02, 7.55107906e-01, 9.99569998e-01,
4.66017106e-03, 3.91403776e-02, 8.22251990e-01, 9.99154973e-01,
9.39833736e-01, 9.39833736e-01, 9.98549580e-01, 9.85738795e-01,
9.93613325e-01, 9.12067619e-03, 6.97519741e-03, 1.19195424e-02,
9.99713340e-01, 6.97519741e-03, 9.99671864e-01, 9.95733864e-01,
4.45529655e-02, 9.59075003e-01, 9.99935211e-01, 9.99569998e-01,
8.01629006e-01, 9.81394227e-01, 5.44778650e-01, 1.36212174e-02,
9.97825589e-01, 9.93613325e-01, 1.36212174e-02, 9.99781257e-01,
9.99154973e-01, 3.43616982e-02, 9.68463782e-01, 6.97519741e-03,
1.55638073e-02, 9.75755881e-01, 2.02970330e-02, 5.75929752e-02,
9.92695617e-01, 9.47037987e-01, 1.04282071e-02, 9.99355101e-01,
1.36212174e-02, 9.87520863e-01, 2.02970330e-02, 7.41453115e-02,
9.85738795e-01, 2.37530617e-03, 9.78756075e-01, 9.98732777e-01,
9.97511568e-01])
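The predicted labels and the predicted probabilities are two views of the same output: for a binary model, .predict() returns 1 when the probability of class 1 is above 0.5. The sketch below checks this on synthetic data. The arrays `X_demo` and `y_demo` and the model `demo_clf` are made up for illustration and are not the activity dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, overlapping classes (illustrative values only).
rng = np.random.default_rng(0)
x_lying = rng.normal(loc=30.0, scale=15.0, size=100)      # class 1
x_not_lying = rng.normal(loc=60.0, scale=15.0, size=100)  # class 0
X_demo = np.concatenate([x_lying, x_not_lying]).reshape(-1, 1)
y_demo = np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)])

demo_clf = LogisticRegression().fit(X_demo, y_demo)

# One column per class, in the order given by demo_clf.classes_.
proba = demo_clf.predict_proba(X_demo)
p_class_1 = proba[:, 1]

# Applying the 0.5 threshold by hand reproduces the predicted labels.
labels_from_threshold = (p_class_1 > 0.5).astype(int)
matches = np.array_equal(labels_from_threshold, demo_clf.predict(X_demo))

print(demo_clf.classes_, matches)
```

Each row of predict_proba() sums to 1, so the first column is simply one minus the second.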

2.2 Create confusion matrix

To finish this part of the course, we’ll create a confusion matrix. Recall the following definition:
• Confusion matrix: A graphical representation of how accurate a classifier is at predicting
the labels for a categorical variable.
To create a confusion matrix, we’ll use the confusion_matrix() function from the metrics module
of scikit-learn. To use the function, we’ll need to input the following:
• Actual labels of the holdout sample, stored as y_test
• Predicted labels of the holdout sample, stored as y_pred
• The names of the labels, which you can access using clf.classes_
Note: If there were more classes, we would have more numbers or labels in clf.classes_. Since
this is a binomial logistic regression, there are only two labels, 0 and 1.
[15]: # Import the metrics module from scikit-learn
import sklearn.metrics as metrics

[16]: # Calculate the values for each quadrant in the confusion matrix
cm = metrics.confusion_matrix(y_test, y_pred, labels = clf.classes_)

[17]: # Create the confusion matrix as a visualization

disp = metrics.ConfusionMatrixDisplay(
    confusion_matrix=cm, display_labels=clf.classes_
)

In order to understand and interpret the numbers in the below confusion matrix, it is important
to keep the following in mind:
• The upper-left quadrant displays the number of true negatives.
• The bottom-left quadrant displays the number of false negatives.
• The upper-right quadrant displays the number of false positives.
• The bottom-right quadrant displays the number of true positives.
We can define these terms as follows in our given context:
• True negatives: The number of people that were not lying down that the model accurately
predicted were not lying down.
• False negatives: The number of people that were lying down that the model inaccurately
predicted were not lying down.
• False positives: The number of people that were not lying down that the model inaccurately
predicted were lying down.
• True positives: The number of people that were lying down that the model accurately
predicted were lying down.
A perfect model would yield all true negatives and true positives, and no false negatives or false
positives.
[18]: # Display the confusion matrix
disp.plot()

[18]: <sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at
0x7ff771be1b10>
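The quadrant layout described above can also be read directly from the array returned by confusion_matrix(). As a small sketch, the example below uses hand-written labels (`y_true_demo` and `y_pred_demo`, both hypothetical) so that each count can be verified by eye, then unpacks the four quadrants and computes the share of correct predictions.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (0 = not lying down, 1 = lying down).
y_true_demo = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are actual labels and columns are predicted labels, in the order [0, 1].
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])

# Flattening row by row gives: upper-left, upper-right, bottom-left, bottom-right.
tn, fp, fn, tp = cm_demo.ravel()

# Share of predictions that were correct.
accuracy = (tn + tp) / cm_demo.sum()

print(cm_demo)
print(tn, fp, fn, tp, accuracy)
```

Here six of the eight predictions are correct (three true negatives and three true positives), with one false positive and one false negative.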

Congratulations! You’ve completed this lab. However, you may not notice a green check mark
next to this item on Coursera’s platform. Please continue your progress regardless of the check
mark. Just click on the “save” icon at the top of this notebook to ensure your work has been logged.
You now understand how to build and evaluate a binomial logistic regression model with Python.
Going forward, you can start using binomial logistic regression models with your own datasets.
