Logistic Binary Classification
import pandas as pd
df = pd.read_csv("insurance_data.csv")
df.head()
   age  bought_insurance
0   22                 0
1   25                 0
2   47                 1
3   52                 0
4   46                 1
As the scatter plot below shows, every point lies on one of two lines, y=0 or y=1, because the target is binary.
<matplotlib.collections.PathCollection at 0x7f0b78a97f40>
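The cell that produced this plot is not included in the export. A minimal sketch of how such a scatter plot could be drawn is given here, using only the five rows shown by df.head() above. The names `sample`, `fig`, `ax` and `points`, the marker style, and the headless backend are assumptions for illustration, not the original cell.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Only the five rows visible in df.head(); the full CSV contains more rows.
sample = pd.DataFrame({
    "age": [22, 25, 47, 52, 46],
    "bought_insurance": [0, 0, 1, 0, 1],
})

fig, ax = plt.subplots()
# Every y value is either 0 or 1, so the points fall on two horizontal lines.
points = ax.scatter(sample["age"], sample["bought_insurance"], marker="+", color="red")
ax.set_xlabel("age")
ax.set_ylabel("bought_insurance")
```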
lr.fit(X_train, y_train)
LogisticRegression()
lr.predict(X_test)
array([1, 0, 1, 0, 0, 0])
lr.score(X_test, y_test)
0.8333333333333334
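The cells that create `X_train`, `X_test`, `y_train`, `y_test` and `lr` are missing from the export. The sketch below shows one plausible way to set them up with scikit-learn. The data is synthetic (a hypothetical stand-in for insurance_data.csv), and `test_size` and `random_state` are assumed values, so the resulting score will not match the 0.83 shown above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for insurance_data.csv: older people buy more often.
rng = np.random.default_rng(0)
age = rng.integers(18, 66, size=60)
bought = (age + rng.normal(0, 8, size=60) > 40).astype(int)
demo = pd.DataFrame({"age": age, "bought_insurance": bought})

# Hold out 10% of the rows (6 of 60) for testing; X must be 2-D, hence [["age"]].
X_train, X_test, y_train, y_test = train_test_split(
    demo[["age"]], demo["bought_insurance"], test_size=0.1, random_state=0
)

lr = LogisticRegression()
lr.fit(X_train, y_train)

predictions = lr.predict(X_test)     # array of 0/1 class labels
accuracy = lr.score(X_test, y_test)  # fraction of test rows predicted correctly
```

With only a handful of test rows, each wrong prediction moves the accuracy by a large step, so a single score like this is a rough indication at best.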
Exercise
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
satisfaction_level last_evaluation number_project average_montly_hours time_spend_company Work_accident left promotion_last_5years Department salary
df1['left'].value_counts()
0 11428
1 3571
Name: left, dtype: int64
# Create a new DataFrame without the columns that hold non-numeric values
df_no_strings = df1.drop(['Department', 'salary'], axis=1)
df_no_strings.head()
# Use the numeric-only DataFrame to compute the correlation between every pair of attributes
df_no_strings.corr().style.background_gradient(cmap='coolwarm', axis=None)
From the table above, we can see that the correlation between 'left' and each of 'last_evaluation', 'number_project', 'average_montly_hours' and 'promotion_last_5years' is very weak (0.0...).
This means they have little linear relationship with the result, so we can ignore these attributes.
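Rather than reading the full correlation table by eye, the column for the target can be pulled out and sorted by absolute value. The following is a minimal sketch on synthetic data. The frame `demo`, its `unrelated` column, and the way `left` is generated are assumptions made up for illustration, not values from the HR dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
satisfaction = rng.uniform(0, 1, n)
unrelated = rng.uniform(0, 1, n)  # hypothetical feature with no link to the target
# Hypothetical rule: less satisfied employees are more likely to leave.
left = (satisfaction + rng.normal(0, 0.2, n) < 0.4).astype(int)

demo = pd.DataFrame({
    "satisfaction_level": satisfaction,
    "unrelated": unrelated,
    "left": left,
})

# Correlation of every feature with the target, strongest (by magnitude) first.
corr_with_left = demo.corr()["left"].drop("left").sort_values(key=abs, ascending=False)
print(corr_with_left)
```

Pearson correlation only measures linear association, so a feature with a near-zero value here could still carry a non-linear signal.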
We check the relationship of 'salary' and 'Department' with 'left' separately using bar charts, since we excluded them from the correlation table.
pd.crosstab(df1.salary, df1.left).plot(kind='bar')
https://colab.research.google.com/drive/1naxOKbCFR9MWgqvfW8Iq8iyQ2Em8ZIAL#scrollTo=iBVyRIRbVhSX&printMode=true 1/3
3/12/24, 11:56 AM MlYtLec8.ipynb - Colaboratory
<Axes: xlabel='salary'>
The chart above shows that employees with a high salary tend not to leave the company.
pd.crosstab(df1.Department, df1.left).plot(kind='bar')
<Axes: xlabel='Department'>
The chart above shows that many employees left the Sales department, but many also stayed. So we can conclude that there is no direct relationship between 'Department' and 'left'.
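Raw counts are hard to compare when departments differ in size. One way to check the conclusion is to normalize each row of the crosstab so that it shows the share of employees who left. The sketch below uses a tiny hypothetical frame (`demo`, with made-up rows) and the `normalize="index"` option of `pd.crosstab`.

```python
import pandas as pd

# Hypothetical data: a large department and a small one.
demo = pd.DataFrame({
    "Department": ["sales"] * 6 + ["hr"] * 2,
    "left": [1, 1, 0, 0, 0, 0, 1, 0],
})

counts = pd.crosstab(demo.Department, demo.left)                    # raw head counts
rates = pd.crosstab(demo.Department, demo.left, normalize="index")  # each row sums to 1

print(counts)
print(rates)
```

In this made-up example more people left sales in absolute terms, yet the proportion who left is higher in hr. Calling `.plot(kind='bar')` on `rates` would chart the proportions instead of the counts.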
Based on the analysis above, we keep the following attributes as features:
1) satisfaction_level
2) time_spend_company
3) Work_accident
4) salary
   satisfaction_level  time_spend_company  Work_accident  salary  left
0                0.38                   3              0     low     1
1                0.80                   6              0  medium     1
2                0.11                   4              0  medium     1
3                0.72                   5              0     low     1
4                0.37                   3              0     low     1
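from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
newdf1['salary'] = le.fit_transform(newdf1['salary'])  # codes follow alphabetical order: high=0, low=1, medium=2
newdf1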
<ipython-input-127-3d81752a9699>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
       satisfaction_level  time_spend_company  Work_accident  salary  left
0                    0.38                   3              0       1     1
1                    0.80                   6              0       2     1
2                    0.11                   4              0       2     1
3                    0.72                   5              0       1     1
4                    0.37                   3              0       1     1
...                   ...                 ...            ...     ...   ...
14994                0.40                   3              0       1     1
14995                0.37                   3              0       1     1
14996                0.37                   3              0       1     1
14997                0.11                   4              0       1     1
14998                0.37                   3              0       1     1
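The SettingWithCopyWarning above usually appears when a DataFrame was created by selecting columns from another one and is then modified. A common way to avoid it is to take an explicit `.copy()` when building the subset. The sketch below assumes that is how `newdf1` was created (the original cell is not in the export) and uses a small hypothetical frame. The first two rows mirror the table above and the third, with a 'high' salary, is made up to show all three codes.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for df1.
full = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11],
    "salary": ["low", "medium", "high"],
    "left": [1, 1, 1],
})

# .copy() makes subset an independent DataFrame, so assigning to it is unambiguous.
subset = full[["satisfaction_level", "salary", "left"]].copy()

le = LabelEncoder()
subset["salary"] = le.fit_transform(subset["salary"])

print(list(le.classes_))  # classes are sorted alphabetically
print(subset)
```

Because `LabelEncoder` sorts the classes alphabetically, the codes (high=0, low=1, medium=2) do not follow the natural low < medium < high order. If that order matters, an explicit mapping such as `subset["salary"].map({"low": 0, "medium": 1, "high": 2})`, applied to the string column instead of the encoder, is one alternative.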
X = newdf1.drop(['left'], axis='columns')
y = newdf1.left
lr.fit(X_train, y_train)
LogisticRegression()
lr.predict(X_test)
lr.score(X_test, y_test)
0.768
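To put this score in context, it can be compared with the accuracy of always predicting the majority class. The short calculation below uses the value_counts() figures shown earlier. Note that those counts cover the whole dataset while the score was measured on a test split of unknown size, so this is only a rough comparison under that assumption.

```python
# Class counts from df1['left'].value_counts() above.
stayed, left = 11428, 3571

# Accuracy of a trivial model that always predicts "stayed" (left = 0).
baseline = stayed / (stayed + left)

model_accuracy = 0.768  # lr.score(X_test, y_test) reported above

print(f"majority-class baseline: {baseline:.4f}")
print(f"improvement over baseline: {model_accuracy - baseline:.4f}")
```

The model is less than one percentage point above the baseline, which suggests that accuracy alone says little about how well the employees who leave are identified.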