
Logistic Binary Classification

The document discusses using logistic regression on an insurance dataset to predict whether individuals will buy insurance based on their age. It shows splitting the data into training and test sets, fitting a logistic regression model to the training data, and using the model to make predictions on the test set, achieving an accuracy score of 83.3%.

Uploaded by jaymehta1444


3/12/24, 11:56 AM MlYtLec8.ipynb - Colaboratory

Question in the video

import pandas as pd

df = pd.read_csv("insurance_data.csv")
df.head()

   age  bought_insurance
0   22                 0
1   25                 0
2   47                 1
3   52                 0
4   46                 1


import matplotlib.pyplot as plt

%matplotlib inline

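As the scatter plot below shows, the target takes only the values 0 and 1, so every point lies on either y=0 or y=1 (a binary outcome).

This makes it a binary classification problem, which is what logistic regression is designed for.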

plt.scatter(df.age, df.bought_insurance, marker='+', c='red')

[Figure: scatter plot of age (x-axis) against bought_insurance (y-axis)]
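Logistic regression models the probability of the positive class by passing a linear function of the input through the logistic (sigmoid) function. With age as the only feature, and writing $\beta_0$ for the intercept and $\beta_1$ for the coefficient (our notation; scikit-learn stores them as `intercept_` and `coef_` after fitting), the model is roughly:

```latex
P(\text{bought\_insurance} = 1 \mid \text{age})
  = \sigma(\beta_0 + \beta_1 \cdot \text{age})
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \text{age})}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } \beta_0 + \beta_1 \cdot \text{age} > 0 \\
0 & \text{otherwise}
\end{cases}
```

Because $\sigma$ always returns a value between 0 and 1, it suits a target that is either 0 or 1. Predicting label 1 is the same as the predicted probability exceeding 0.5, which (when $\beta_1 > 0$) means an age above $-\beta_0/\beta_1$.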

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df[['age']], df.bought_insurance, test_size=0.2)
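train_test_split shuffles the rows and holds out a fraction of them for evaluation. The sketch below mimics that in plain Python under simplifying assumptions: the helper name simple_train_test_split is hypothetical, and it only handles a float test_size without stratification. For a float test_size, scikit-learn sizes the test set as ceil(test_size * n_samples), and the sketch copies that rule. The data is the five rows printed by df.head().

```python
import math
import random

def simple_train_test_split(rows, labels, test_size=0.2, seed=None):
    """Shuffle row positions, then hold out ceil(test_size * n) of them as the test set."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    n = len(rows)
    n_test = math.ceil(test_size * n)  # same rounding scikit-learn applies to a float test_size
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split reproducible
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Same return order as train_test_split: X_train, X_test, y_train, y_test.
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

ages = [22, 25, 47, 52, 46]
bought = [0, 0, 1, 0, 1]
X_tr, X_te, y_tr, y_te = simple_train_test_split(ages, bought, test_size=0.2, seed=42)
print(len(X_tr), len(X_te), X_te, y_te)
```

Because the notebook passes no random_state, every run draws a different split, so the reported score can change between runs; passing random_state with a fixed integer to train_test_split makes it repeatable.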

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

lr.fit(X_train, y_train)

LogisticRegression()
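lr.fit chooses the intercept and coefficient that make the observed labels most likely, i.e. it minimises the log loss (plus, by default, an L2 penalty). The sketch below shows the idea with plain batch gradient descent on the five rows printed by df.head(). This is a swapped-in technique for illustration only: scikit-learn's LogisticRegression uses the lbfgs solver with an L2 penalty (C=1.0) by default, so its fitted values will differ. The function names are hypothetical, and age is standardized first so that a fixed learning rate converges.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(w, b, xs, ys):
    """Average binary cross-entropy of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def fit_logistic_gd(xs, ys, lr=0.1, epochs=2000):
    """Fit p = sigmoid(w * x + b) by batch gradient descent, without regularization."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss: average of (p - y) * x for w, and of (p - y) for b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# The five rows printed by df.head() in the insurance example.
ages = [22, 25, 47, 52, 46]
bought = [0, 0, 1, 0, 1]

# Standardize age so that a fixed learning rate converges.
mean_age = sum(ages) / len(ages)
std_age = math.sqrt(sum((a - mean_age) ** 2 for a in ages) / len(ages))
xs = [(a - mean_age) / std_age for a in ages]

w, b = fit_logistic_gd(xs, bought)
print(f"w = {w:.3f}, b = {b:.3f}, mean log loss = {mean_log_loss(w, b, xs, bought):.3f}")
print("P(bought) at age 50:", round(sigmoid(w * (50 - mean_age) / std_age + b), 3))
```

Starting from w = b = 0 the loss is ln 2 (every probability is 0.5); each step moves the parameters downhill, and a positive w means the predicted probability of buying rises with age.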

lr.predict(X_test)

array([1, 0, 1, 0, 0, 0])

lr.score(X_test, y_test)

0.8333333333333334
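For classifiers, lr.score returns the mean accuracy: the fraction of test rows whose predicted label matches the true label. With the six predictions shown above, 0.8333 means 5 correct out of 6. The helper below is a hypothetical plain-Python equivalent; the true labels in the example are invented, because the notebook does not print y_test.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_pred = [1, 0, 1, 0, 0, 0]               # the predictions printed by lr.predict(X_test)
y_true_hypothetical = [1, 0, 1, 0, 0, 1]  # invented labels, for illustration only
print(accuracy(y_true_hypothetical, y_pred))
```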

Exercise
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df1 = pd.read_csv("HR_comma_sep.csv") # target variable is 'left'

df1.head()

   satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years  Department  salary
0                0.38             0.53               2                   157                   3              0     1                      0       sales     low
1                0.80             0.86               5                   262                   6              0     1                      0       sales  medium
2                0.11             0.88               7                   272                   4              0     1                      0       sales  medium
3                0.72             0.87               5                   223                   5              0     1                      0       sales     low
4                0.37             0.52               2                   159                   3              0     1                      0       sales     low


df1.describe()  # summary statistics (count, mean, std, min, quartiles, max) for every numeric column

       satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident          left  promotion_last_5years
count        14999.000000     14999.000000    14999.000000          14999.000000        14999.000000   14999.000000  14999.000000           14999.000000
mean             0.612834         0.716102        3.803054            201.050337            3.498233       0.144610      0.238083               0.021268
std              0.248631         0.171169        1.232592             49.943099            1.460136       0.351719      0.425924               0.144281
min              0.090000         0.360000        2.000000             96.000000            2.000000       0.000000      0.000000               0.000000
25%              0.440000         0.560000        3.000000            156.000000            3.000000       0.000000      0.000000               0.000000
50%              0.640000         0.720000        4.000000            200.000000            3.000000       0.000000      0.000000               0.000000
75%              0.820000         0.870000        5.000000            245.000000            4.000000       0.000000      0.000000               0.000000
max              1.000000         1.000000        7.000000            310.000000           10.000000       1.000000      1.000000               1.000000

df1['left'].value_counts()

0 11428
1 3571
Name: left, dtype: int64
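About three quarters of employees stayed, so the classes are imbalanced. A useful reference point is the accuracy of a trivial model that always predicts the majority class (0, stayed). The arithmetic below uses only the counts printed above; the share of leavers it computes matches the mean of 'left' in df1.describe(). For comparison, the logistic regression at the end of this notebook scores 0.768, only slightly above this baseline (the exact baseline on a particular test split varies with the random split).

```python
# Class counts printed by df1['left'].value_counts().
counts = {0: 11428, 1: 3571}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)
# Accuracy of a trivial model that predicts the majority class for every row.
baseline_accuracy = counts[majority_class] / total
# Share of employees who left; equals the mean of 'left' in df1.describe().
left_rate = counts[1] / total

print(total, majority_class, round(baseline_accuracy, 4), round(left_rate, 6))
```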

df_no_strings = df1.drop(['Department', 'salary'], axis=1)  # new DataFrame without the non-numeric (string) columns
df_no_strings.head()

   satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years
0                0.38             0.53               2                   157                   3              0     1                      0
1                0.80             0.86               5                   262                   6              0     1                      0
2                0.11             0.88               7                   272                   4              0     1                      0
3                0.72             0.87               5                   223                   5              0     1                      0
4                0.37             0.52               2                   159                   3              0     1                      0


df_no_strings.corr().style.background_gradient(cmap='coolwarm', axis=None)  # pairwise Pearson correlation between all numeric columns, shaded as a heatmap

                       satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident       left  promotion_last_5years
satisfaction_level               1.000000         0.105021       -0.142970             -0.020048           -0.100866       0.058697  -0.388375               0.025605
last_evaluation                  0.105021         1.000000        0.349333              0.339742            0.131591      -0.007104   0.006567              -0.008684
number_project                  -0.142970         0.349333        1.000000              0.417211            0.196786      -0.004741   0.023787              -0.006064
average_montly_hours            -0.020048         0.339742        0.417211              1.000000            0.127755      -0.010143   0.071287              -0.003544
time_spend_company              -0.100866         0.131591        0.196786              0.127755            1.000000       0.002120   0.144822               0.067433
Work_accident                    0.058697        -0.007104       -0.004741             -0.010143            0.002120       1.000000  -0.154622               0.039245
left                            -0.388375         0.006567        0.023787              0.071287            0.144822      -0.154622   1.000000              -0.061788
promotion_last_5years            0.025605        -0.008684       -0.006064             -0.003544            0.067433       0.039245  -0.061788               1.000000

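From the table above, the correlations of 'last_evaluation' (0.006567), 'number_project' (0.023787), 'average_montly_hours' (0.071287) and 'promotion_last_5years' (-0.061788) with 'left' are all close to zero.

This suggests they have little linear relationship with whether an employee leaves, so we leave them out of the model. (Pearson correlation only measures linear association, so this is a quick filter rather than proof that these columns are irrelevant.)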
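DataFrame.corr() uses the Pearson correlation coefficient by default: the covariance of two columns divided by the product of their standard deviations, which always lies between -1 and 1. A minimal plain-Python version follows (pearson_r is a hypothetical helper name), applied to the first five rows of satisfaction_level and time_spend_company. Five rows are far too few to be representative, so the result will not match the full-data value of -0.100866 in the table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    # The 1/n factors in covariance and variance cancel, so raw sums can be used.
    return cov / math.sqrt(var_x * var_y)

# First five rows of satisfaction_level and time_spend_company from df_no_strings.head().
satisfaction = [0.38, 0.80, 0.11, 0.72, 0.37]
years = [3, 6, 4, 5, 3]
print(round(pearson_r(satisfaction, years), 3))
```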

We check the relationship of 'salary' and 'Department' with 'left' separately, using bar charts, because these non-numeric columns were excluded from the correlation table.

pd.crosstab(df1.salary, df1.left).plot(kind='bar')

https://colab.research.google.com/drive/1naxOKbCFR9MWgqvfW8Iq8iyQ2Em8ZIAL#scrollTo=iBVyRIRbVhSX&printMode=true
[Figure: bar chart of pd.crosstab(df1.salary, df1.left), showing counts of left = 0 and left = 1 for each salary level]

The chart shows that employees with a high salary tend not to leave the company.

pd.crosstab(df1.Department, df1.left).plot(kind='bar')

[Figure: bar chart of pd.crosstab(df1.Department, df1.left), showing counts of left = 0 and left = 1 for each department]

The chart shows that many employees in the sales department left, but many others stayed. Raw counts also depend on how large each department is, so on this evidence we conclude that there is no direct relationship between 'Department' and 'left'.
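A bar chart of raw counts mixes two things: how many people a department (or salary band) has, and how many of them left. Dividing by the group size gives a leave rate per group, which is easier to compare; in pandas, pd.crosstab(df1.Department, df1.left, normalize='index') reports these row proportions. The plain-Python sketch below uses a hypothetical helper and invented toy rows, not the HR data.

```python
from collections import Counter

def leave_rate_by_group(groups, left):
    """Fraction of rows with left == 1 within each group."""
    totals = Counter(groups)
    leavers = Counter(g for g, flag in zip(groups, left) if flag == 1)
    # Counter returns 0 for groups with no leavers, so every group gets a rate.
    return {g: leavers[g] / totals[g] for g in totals}

# Invented toy rows for illustration; these are NOT values from HR_comma_sep.csv.
departments = ["sales", "sales", "sales", "sales", "hr", "hr", "it", "it"]
left = [1, 0, 0, 0, 1, 0, 0, 0]
print(leave_rate_by_group(departments, left))
```

In this toy data, sales and hr each lose one person, yet the rate in hr (0.5) is twice that in sales (0.25), which a raw-count bar chart would not make obvious.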

From the data analysis, we can conclude that:

We will use the following variables as independent variables in our model:
1) satisfaction_level
2) time_spend_company
3) Work_accident
4) salary
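The choice of the three numeric features can be made explicit with a cut-off on the absolute correlation with 'left'. The values below are copied from the correlation table; the 0.1 threshold is an assumption chosen for illustration, since the notebook selects features by inspection. salary does not appear in the table because it is non-numeric, so it is added on the strength of the bar chart instead.

```python
# Correlation of each numeric column with 'left', copied from the corr() table above.
corr_with_left = {
    "satisfaction_level": -0.388375,
    "last_evaluation": 0.006567,
    "number_project": 0.023787,
    "average_montly_hours": 0.071287,
    "time_spend_company": 0.144822,
    "Work_accident": -0.154622,
    "promotion_last_5years": -0.061788,
}

THRESHOLD = 0.1  # hypothetical cut-off; the notebook picks the features by eye

# Keep a feature if its correlation is far enough from zero in either direction.
selected = [name for name, r in corr_with_left.items() if abs(r) >= THRESHOLD]
print(selected)
```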

newdf1 = df1[['satisfaction_level', 'time_spend_company', 'Work_accident', 'salary', 'left']]

newdf1

       satisfaction_level  time_spend_company  Work_accident  salary  left
0                    0.38                   3              0     low     1
1                    0.80                   6              0  medium     1
2                    0.11                   4              0  medium     1
3                    0.72                   5              0     low     1
4                    0.37                   3              0     low     1
...                   ...                 ...            ...     ...   ...
14994                0.40                   3              0     low     1
14995                0.37                   3              0     low     1
14996                0.37                   3              0     low     1
14997                0.11                   4              0     low     1
14998                0.37                   3              0     low     1

14999 rows × 5 columns


from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

newdf1['salary'] = le.fit_transform(newdf1['salary'])
newdf1

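<ipython-input-127-3d81752a9699>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  newdf1['salary'] = le.fit_transform(newdf1['salary'])

This warning appears because newdf1 was created by selecting columns from df1, so pandas cannot tell whether the assignment was also meant to modify df1. The salary column of newdf1 is still encoded correctly, as the output below shows; appending .copy() to the column selection that builds newdf1 avoids the warning.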
       satisfaction_level  time_spend_company  Work_accident  salary  left
0                    0.38                   3              0       1     1
1                    0.80                   6              0       2     1
2                    0.11                   4              0       2     1
3                    0.72                   5              0       1     1
4                    0.37                   3              0       1     1
...                   ...                 ...            ...     ...   ...
14994                0.40                   3              0       1     1
14995                0.37                   3              0       1     1
14996                0.37                   3              0       1     1
14997                0.11                   4              0       1     1
14998                0.37                   3              0       1     1

14999 rows × 5 columns
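LabelEncoder numbers the distinct values in sorted (alphabetical) order, which is why 'low' became 1 and 'medium' became 2 above: 'high' sorts first and gets 0. The codes therefore do not follow the natural order low < medium < high. The sketch below reproduces that behaviour in plain Python (label_encode is a hypothetical helper; the sample is the first five salaries above plus one 'high' value added for illustration) and contrasts it with an explicit ordinal mapping, which in pandas could be written as newdf1['salary'].map({'low': 0, 'medium': 1, 'high': 2}). scikit-learn's documentation also describes LabelEncoder as intended for target labels; OrdinalEncoder and OneHotEncoder are the encoders meant for input features.

```python
def label_encode(values):
    """Map each distinct value to its position in sorted order, as LabelEncoder does."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes

# First five salaries from newdf1, plus one "high" added for illustration.
salaries = ["low", "medium", "medium", "low", "low", "high"]
codes, classes = label_encode(salaries)
print(classes, codes)

# An explicit mapping keeps the natural order low < medium < high.
SALARY_ORDER = {"low": 0, "medium": 1, "high": 2}
ordinal_codes = [SALARY_ORDER[s] for s in salaries]
print(ordinal_codes)
```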


X = newdf1.drop(['left'], axis='columns')
y = newdf1.left

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
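Only about 24% of the rows have left = 1, so a purely random split can give the training and test sets slightly different class ratios. train_test_split accepts stratify=y (for example train_test_split(X, y, test_size=0.2, stratify=y)) to keep the ratio the same in both parts. The plain-Python sketch below shows the idea with a hypothetical helper, simplified per-class rounding (scikit-learn allocates rows somewhat differently), and toy labels with a similar imbalance.

```python
import random
from collections import defaultdict

def stratified_split_indices(labels, test_size=0.2, seed=None):
    """Split row indices so each class contributes about test_size of its rows to the test set."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = round(test_size * len(idx))  # simplified per-class rounding
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

# Toy labels: 80 zeros and 20 ones, an imbalance similar to 'left'.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split_indices(labels, test_size=0.2, seed=0)
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_share)
```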

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

lr.fit(X_train, y_train)

LogisticRegression()

lr.predict(X_test)

array([0, 1, 0, ..., 0, 0, 0])

lr.score(X_test, y_test)

0.768
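Because the classes are imbalanced, accuracy alone can hide a model that rarely predicts the minority class. Counting true/false positives and negatives shows how many leavers are actually caught (recall) and how many predicted leavers really left (precision). The sketch below is a hypothetical plain-Python helper applied to invented labels; with the fitted model, sklearn.metrics.confusion_matrix(y_test, lr.predict(X_test)) and sklearn.metrics.classification_report give the same kind of breakdown.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision and recall for one positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    # Report 0.0 instead of dividing by zero when there are no predicted or actual positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": (tp + tn) / total, "precision": precision, "recall": recall}

# Invented labels: a "model" that always predicts 0 (stayed).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0] * 10
print(binary_metrics(y_true, y_pred))
```

Here accuracy is 0.7 even though not a single leaver is detected (recall 0.0).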
