Logistic Regression

This document describes using logistic regression for classification. It loads banking data, splits it into training and test sets, builds a logistic regression model, and evaluates the model's performance using various metrics like accuracy, confusion matrix, ROC curve, etc. Key steps include preprocessing data, training and testing the model, interpreting coefficients, predicting probabilities and classes, and generating classification reports.

Uploaded by

Nipuni


logistic-regression

March 24, 2024

[2]: import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression #This is for logistic regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report, roc_curve, roc_auc_score #Metrics for classification
import seaborn as sns
import matplotlib.pyplot as plt
from pandas_profiling import ProfileReport #pandas profiling for report generation (newer releases are distributed as ydata-profiling)

[31]: #load dataset

data=pd.read_csv("C:/Users/ramaleer/Desktop/Practical 2/Datasets/Bank.CSV")
data.head()

[31]:    age  duration  emp_var_rate  cons_price_idx  cons_conf_idx  euribor3m  nr_employed  y
      0   44       210           1.4          93.444          -36.1      4.963       5228.1  0
      1   53       138          -0.1          93.200          -42.0      4.021       5195.8  0
      2   28       339          -1.7          94.055          -39.8      0.729       4991.6  1
      3   39       185          -1.8          93.075          -47.1      1.405       5099.1  0
      4   55       137          -2.9          92.201          -31.4      0.869       5076.2  1

0.0.1 Data Description

• 1 - Age (numeric)
• 2 - Duration: last contact duration, in seconds (numeric)
• 3 - Emp.var.rate: employment variation rate - quarterly indicator (numeric)
• 4 - Cons.price.idx: consumer price index - monthly indicator (numeric)

• 5 - Cons.conf.idx: consumer confidence index - monthly indicator (numeric)
• 6 - Euribor3m: euribor 3 month rate - daily indicator (numeric)
• 7 - Nr.employed: number of employees - quarterly indicator (numeric)

The classification goal is to predict whether the client will subscribe to a term deposit (variable y), where y is encoded as yes = 1 and no = 0.

[4]: #check whether the data set is balanced
data.y.value_counts()

#the counts show that it is imbalanced

[4]: 0 36548
1 4640
Name: y, dtype: int64
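
The counts above can be turned into class proportions to quantify the imbalance. This is a minimal sketch that uses only the two counts printed by value_counts(); the variable names are illustrative and do not appear in the notebook.

```python
# Class counts copied from the value_counts() output above
counts = {0: 36548, 1: 4640}

total = sum(counts.values())
# Share of each class in the full data set
proportions = {label: n / total for label, n in counts.items()}
# Number of "no" clients for every "yes" client
imbalance_ratio = counts[0] / counts[1]

print(total)
print({label: round(p, 4) for label, p in proportions.items()})
print(round(imbalance_ratio, 2))
```

Only about 11% of clients subscribed, so accuracy on its own can look high even for a model that rarely predicts class 1.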

[5]: #get information about data

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 41188 non-null int64
1 duration 41188 non-null int64
2 emp_var_rate 41188 non-null float64
3 cons_price_idx 41188 non-null float64
4 cons_conf_idx 41188 non-null float64
5 euribor3m 41188 non-null float64
6 nr_employed 41188 non-null float64
7 y 41188 non-null int64
dtypes: float64(5), int64(3)
memory usage: 2.5 MB

[6]: #get the column and row count

print("Columns:",data.shape[1])
print("Rows:",data.shape[0])

Columns: 8
Rows: 41188

[7]: #define x, y
#Separating independent data matrix & response vector

x = data.drop(columns=['y']) #independent variables selected

y = data.y #target variable is y

[8]: # Splitting data into training & testing sets (Validation set approach)

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=0) # 20% of the data is held out as the test (validation) set
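
Because the classes are imbalanced, a plain random split can leave the training and test sets with slightly different class ratios. The sketch below shows the `stratify` argument of train_test_split on synthetic stand-in data (an assumption; the notebook itself does not stratify), which keeps the class ratio the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data: 1000 rows, 11% positives (assumption)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 110 + [0] * 890)

# stratify=y_demo preserves the class ratio in the train and test parts
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())
```

With the notebook's variables the equivalent call would pass `stratify=y`; whether that changes the reported metrics noticeably has not been checked here.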

1 Creating the logistic regression model

[9]: model=LogisticRegression()

2 Training the model with training data

[10]: model.fit(x_train,y_train)

[10]: LogisticRegression()
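
The features in this data set sit on very different scales (nr_employed is in the thousands, emp_var_rate is near zero). The notebook fits the model on the raw values. Standardising the features first often helps the default solver converge and makes coefficient magnitudes comparable, so one possible variation is sketched below on synthetic data. The data, the names `x_demo`, `y_demo` and `scaled_model` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales, loosely mimicking
# duration (hundreds) and nr_employed (thousands); not the real data
rng = np.random.default_rng(0)
n = 500
duration = rng.uniform(0, 1000, size=n)
nr_employed = rng.uniform(4900, 5300, size=n)
x_demo = np.column_stack([duration, nr_employed])
# The label depends on duration only, plus noise
y_demo = (duration + rng.normal(0, 150, size=n) > 600).astype(int)

# The scaler is fitted during fit() and the same transformation
# is reused automatically at prediction time
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(x_demo, y_demo)

print(scaled_model.score(x_demo, y_demo))
print(scaled_model.named_steps["logisticregression"].coef_)
```

Coefficients from a scaled model refer to a change of one standard deviation in each feature, so they are not directly comparable with the unscaled coefficients printed in the next section.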

3 Estimated coefficients for parameters and intercept of the model

[11]: model.coef_

[11]: array([[ 0.00102983,  0.00453544, -0.21667   ,  0.4244128 ,  0.05623385,
              -0.27695426, -0.0078621 ]])

[12]: model.intercept_

[12]: array([0.00389662])
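
A logistic regression coefficient is a change in log-odds, so exponentiating it gives the multiplicative change in the odds of y = 1 for a one-unit increase in that feature, with the other features held fixed. The sketch below applies this to the coefficients printed above. The feature order is assumed to follow the column order of x, and the variable names are illustrative.

```python
import math

# Feature order follows the columns of x; values copied from model.coef_ above
feature_names = ["age", "duration", "emp_var_rate", "cons_price_idx",
                 "cons_conf_idx", "euribor3m", "nr_employed"]
coefs = [0.00102983, 0.00453544, -0.21667, 0.4244128, 0.05623385,
         -0.27695426, -0.0078621]

# exp(coefficient) = odds ratio for a one-unit increase in the feature
odds_ratios = {name: math.exp(c) for name, c in zip(feature_names, coefs)}

for name, ratio in odds_ratios.items():
    print(f"{name:15s} {ratio:.4f}")
```

A ratio above 1 (for example duration) means the odds of subscribing rise with the feature, and a ratio below 1 (for example emp_var_rate) means they fall. The features are unscaled, so the sizes of the ratios are not comparable across features.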

4 Predict the class of the unseen data

[13]: #here we use our test set for validation
y_pred=model.predict(x_test)
y_pred

[13]: array([0, 0, 0, …, 0, 0, 0], dtype=int64)

5 Predicted probabilities for each observation

[14]: #get the probabilities
y_pred_probs=model.predict_proba(x_test)
y_pred_probs #Left side column for 0 and right side column for 1

[14]: array([[0.93744048, 0.06255952],
             [0.67198681, 0.32801319],
             [0.9914955 , 0.0085045 ],
             …,
             [0.9921623 , 0.0078377 ],
             [0.94365365, 0.05634635],
             [0.99442537, 0.00557463]])
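
Each probability comes from passing the linear predictor (intercept plus the weighted sum of the features) through the logistic function. As a rough worked example, the sketch below applies the printed, rounded coefficients to the first row shown by data.head(). That row is taken from the full data set and is not necessarily part of the test set, so the result is an illustration and not one of the values in the array above.

```python
import math

intercept = 0.00389662
coefs = [0.00102983, 0.00453544, -0.21667, 0.4244128, 0.05623385,
         -0.27695426, -0.0078621]
# First row printed by data.head(): age, duration, emp_var_rate,
# cons_price_idx, cons_conf_idx, euribor3m, nr_employed
row = [44, 210, 1.4, 93.444, -36.1, 4.963, 5228.1]

# Linear predictor (log-odds), then the logistic (sigmoid) function
z = intercept + sum(c * v for c, v in zip(coefs, row))
p_yes = 1.0 / (1.0 + math.exp(-z))
p_no = 1.0 - p_yes

print(round(z, 3), round(p_yes, 4), round(p_no, 4))
```

The predicted probability of subscribing is small for this row, which is consistent with its recorded label y = 0.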

[15]: #the two class probabilities of an observation sum to 1

#Example:
0.93744108 + 0.06255892

#probability of outcome = 0 (no) and probability of outcome = 1 (yes)

[15]: 1.0

6 Confusion matrix
[16]: #obtain the confusion matrix
confusion_matrix(y_test,y_pred)

[16]: array([[7157,  168],
             [ 606,  307]], dtype=int64)

[17]: sns.heatmap(confusion_matrix(y_test,y_pred),annot=True,fmt="g")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

7 Values inside the Confusion Matrix
[18]: tn,fp,fn,tp=confusion_matrix(y_test,y_pred).ravel()

[19]: tn #True Negatives : negatives predicted correctly

[19]: 7157

[20]: fn #False Negatives : predicted as No (0) but actually Yes (y = 1)

[20]: 606

[21]: tp #True Positives : positives predicted correctly

[21]: 307

[22]: fp #False Positives : predicted as positive (1) but actually negative (y = 0)

[22]: 168
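
The four cells are enough to derive the usual per-class metrics by hand. The sketch below uses the values printed above and treats class 1 (subscribed) as the positive class; the variable names other than tn, fp, fn and tp are illustrative.

```python
# Cell values copied from the confusion matrix above
tn, fp, fn, tp = 7157, 168, 606, 307

precision = tp / (tp + fp)      # of predicted "yes", how many were right
recall = tp / (tp + fn)         # of actual "yes", how many were found
specificity = tn / (tn + fp)    # of actual "no", how many were found
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(specificity, 2), round(f1, 2))
```

The low recall shows that the model misses about two thirds of the clients who actually subscribed, even though most of its "no" predictions are correct.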

8 Accuracy & Misclassification Error
[23]: #accuracy values from manual calculation
accuracy=(np.diag(confusion_matrix(y_test,y_pred)).sum())/len(y_test)
accuracy

[23]: 0.9060451565914057

[24]: #accuracy values: from function

accuracy_score(y_test,y_pred)

[24]: 0.9060451565914057

[25]: 1-accuracy_score(y_test,y_pred)

[25]: 0.09395484340859428

[26]: # MCE = misclassification error : share of incorrect classifications (FP and FN)

MCE=1-accuracy
MCE

[26]: 0.09395484340859428
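
With imbalanced classes it helps to compare the accuracy against a trivial baseline. The sketch below uses the confusion matrix cells from above to compute the accuracy of always predicting class 0, along with the balanced accuracy (the mean of the per-class recalls). Both are additions for context and are not computed in the notebook.

```python
# Cell values copied from the confusion matrix above
tn, fp, fn, tp = 7157, 168, 606, 307
n_test = tn + fp + fn + tp

accuracy = (tn + tp) / n_test
# Accuracy of a trivial rule that always predicts the majority class 0
baseline_accuracy = (tn + fp) / n_test
# Mean of the recall on each class; not inflated by the majority class
balanced_accuracy = (tn / (tn + fp) + tp / (tp + fn)) / 2

print(round(accuracy, 4), round(baseline_accuracy, 4), round(balanced_accuracy, 4))
```

The model's accuracy is only about two percentage points above the majority-class baseline, which is why the per-class metrics in the next section are more informative here.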

9 Classification report with more metrics

[27]: print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.92      0.98      0.95      7325
           1       0.65      0.34      0.44       913

    accuracy                           0.91      8238
   macro avg       0.78      0.66      0.70      8238
weighted avg       0.89      0.91      0.89      8238
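
The two average rows differ only in how the classes are weighted. As a sketch, the precision averages in the report can be reproduced from the confusion matrix cells: the macro average treats both classes equally, while the weighted average uses each class's support. Variable names are illustrative.

```python
# Cell values copied from the confusion matrix above
tn, fp, fn, tp = 7157, 168, 606, 307

# Per-class precision and support (number of true instances of each class)
precision_0 = tn / (tn + fn)
precision_1 = tp / (tp + fp)
support_0, support_1 = tn + fp, fn + tp

# Macro average: every class counts equally
macro_precision = (precision_0 + precision_1) / 2
# Weighted average: each class counts in proportion to its support
weighted_precision = (
    precision_0 * support_0 + precision_1 * support_1
) / (support_0 + support_1)

print(round(macro_precision, 2), round(weighted_precision, 2))
```

Because class 0 makes up most of the test set, the weighted average stays close to the class 0 value and hides the weaker performance on class 1.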

10 Receiver operating characteristic Curve (ROC Curve)

[28]: fpr, tpr, _ = roc_curve(y_test, y_pred_probs[:,1])
plt.plot(fpr,tpr)
plt.title("ROC Curve")
plt.show()

[29]: #Area under Curve
auc = roc_auc_score(y_test, y_pred_probs[:,1])
auc

[29]: 0.9121246014152794
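
One way to read the AUC is as the probability that a randomly chosen positive observation receives a higher score than a randomly chosen negative one. The sketch below checks this pairwise definition on a small made-up example (the labels and scores are not from the bank data).

```python
from itertools import product

# Toy labels and scores, made up for illustration
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

pos = [s for s, y in zip(scores, y_true) if y == 1]
neg = [s for s, y in zip(scores, y_true) if y == 0]

# Count positive/negative pairs where the positive is ranked higher;
# ties count as half a win
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
auc_pairwise = wins / (len(pos) * len(neg))

print(auc_pairwise)
```

Read this way, the notebook's AUC of about 0.91 says the model ranks subscribers above non-subscribers in roughly 91% of such pairs, even though the default 0.5 threshold turns few of them into class 1 predictions.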

[30]: # Homework : Check whether there is a way to calculate the best threshold
# Homework : Experiment with other available classifier models on this data to classify the subscription of term deposit.
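
For the threshold question, one common option (not the only one, and not necessarily the one the exercise intends) is Youden's J statistic, which picks the ROC point where tpr - fpr is largest. The sketch below applies it to toy labels and scores; in the notebook the inputs would be y_test and y_pred_probs[:,1].

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and predicted probabilities of class 1 (illustrative only)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.15, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Youden's J statistic: height of the ROC curve above the diagonal
j = tpr - fpr
best = int(np.argmax(j))
best_threshold = thresholds[best]

# Classify with the chosen cut-off instead of the default 0.5
y_pred_custom = (y_score >= best_threshold).astype(int)
print(best_threshold, tpr[best], fpr[best])
```

Youden's J weights false positives and false negatives equally. If missing a likely subscriber costs more than an unnecessary call, a cost-based or precision-recall-based choice may suit the problem better.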

10.1 Generate Report using Pandas Profiling

[32]: profile = ProfileReport(data)
profile



