ANLY 502 Final Report
Employee Attrition
Team Analyzers:
Priya Komineedi
Rama Chaitanya Samanchi
Richa Girdhar
Subash Putti
SECTION-1: Introduction
SECTION-4: Methodology
# In[1]:
import csv
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
# In[2]:
df = pd.read_csv(r'C:\Users\u6073468\Downloads\ibm-hr-analytics-employee-attrition-performance\WA_Fn-UseC_-HR-Employee-Attrition.csv')
df.head()
# In[3]:
# In[4]:
# In[5]:
# In[7]:
# Remove columns that hold a single value for every data point (and the
# EmployeeNumber identifier), since they do not add to the model
df = df.drop(['EmployeeCount', 'Over18', 'StandardHours', 'EmployeeNumber'], axis=1)
df.head()
# In[8]:
# Label encoding is used for object columns with two or fewer unique values
from sklearn.preprocessing import LabelEncoder

lab_enc = LabelEncoder()
lab_enc_count = 0
for col in df.columns[1:]:
    if df[col].dtype == 'object':
        if len(df[col].unique()) <= 2:
            lab_enc.fit(df[col])
            df[col] = lab_enc.transform(df[col])
            lab_enc_count += 1
print('{} columns were label encoded.'.format(lab_enc_count))
# In[12]:
# In[15]:
# Separate the target from the features before dividing the data into training and testing datasets
y = df['Attrition_cd']
X = df.drop(['Attrition_cd'],axis=1)
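The split itself is not shown in the cells above. A minimal sketch of how it could look with train_test_split follows. The toy DataFrame, the 30% test share, the stratification, and the random_state are all assumptions made for illustration, not settings taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the HR data: two hypothetical features and an imbalanced target.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Age": rng.integers(18, 60, size=100),
    "MonthlyIncome": rng.integers(1000, 20000, size=100),
    "Attrition_cd": [1] * 16 + [0] * 84,
})

y_toy = toy["Attrition_cd"]
X_toy = toy.drop(["Attrition_cd"], axis=1)

# test_size=0.3 and stratify are assumptions; stratify keeps the share of
# attrition cases similar in the training and testing datasets.
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.3, stratify=y_toy, random_state=42
)
print(X_train.shape, X_test.shape)
```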
We chose a Decision Tree classification model for this business case because the target
variable, attrition, is categorical; a linear regression model is not suitable for
predicting a categorical variable. The algorithm uses the training and testing datasets
to predict the values of the target variable, as explained in the previous section. We
applied feature scaling with a Standard Scaler, which transforms each feature so that its
distribution has a mean of 0 and a standard deviation of 1. Before training the model, we
also applied the Synthetic Minority Oversampling Technique (SMOTE), because class
imbalance can strongly affect tree-based and ensemble classifiers.
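To make the two preprocessing steps concrete, here is a minimal pure-Python sketch of what they do. It is not the scikit-learn or imbalanced-learn implementation: the helper names (standardize, smote_like), the sample values, and the neighbour count k are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


# Hypothetical monthly incomes, used only to illustrate the scaling step.
incomes = [2000.0, 3000.0, 4000.0, 5000.0, 6000.0]
scaled = standardize(incomes)
print([round(v, 3) for v in scaled])

# Hypothetical minority-class (attrition) samples with two scaled features each.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, oversampling of this kind is normally applied to the training data only, so that the test set still contains real observations.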
The Decision Tree classification model was built with cross_val_score, as it is the
simplest way to run a cross-validation. We also tried cross_val_predict, but found that
it returns a prediction for each observation, whereas cross_val_score returns the score
for each fold directly. The metrics used to evaluate the model (accuracy score,
classification report, confusion matrix, and AUROC score) were also imported, and were
printed for the training and test datasets prepared earlier (Fig-1).
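The difference between the two helpers can be sketched as follows. The data is synthetic (make_classification), and the fold count, class weights, and random_state are assumptions; none of them is taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the HR data (roughly 16% positives).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.84, 0.16], random_state=0
)

tree = DecisionTreeClassifier(random_state=0)

# cross_val_score returns one score per fold.
scores = cross_val_score(tree, X_demo, y_demo, cv=5, scoring="accuracy")
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())

# cross_val_predict returns one out-of-fold prediction per observation.
oof_pred = cross_val_predict(tree, X_demo, y_demo, cv=5)
print("out-of-fold predictions:", oof_pred.shape)
```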
Fig-1:
The first metric, the accuracy score, is 0.7302 for the test set, which means that 73.02%
of the test cases are classified correctly. The second metric, the classification report,
gives four values per class: precision, the share of predicted cases that are true cases;
recall, the ability to identify the true cases; f1-score, the harmonic mean of precision
and recall; and support, the total number of data points in each category. For
attrition (1), the precision is 0.24, the recall is 0.35, and the f1-score is 0.29. The
model was tested on 441 data points, of which 69 resulted in attrition. All the weighted
averages are greater than 0.5, which we take as support for the model.
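The harmonic-mean relationship can be checked with a short calculation. This sketch uses only the two rounded values reported above for attrition (1); the function name is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the attrition class.
precision_attrition = 0.24
recall_attrition = 0.35

f1_attrition = f1_from(precision_attrition, recall_attrition)
print(round(f1_attrition, 4))  # 0.2847
```

The small gap between this value and the reported 0.29 is consistent with the precision and recall themselves having been rounded to two decimals before being combined here.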
The third metric, the confusion matrix, displays the true positive (TP), true negative
(TN), false positive (FP), and false negative (FN) counts; it shows 322 correctly
classified cases (TP+TN) out of 441. The AUROC score is 0.575, which is greater than 0.5,
the score of a random classifier, so we consider the model reliable for the decision.
Considering all these metrics, we conclude that the model we have developed is good
enough to base our HR decisions on.
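As a rough illustration of what these two numbers measure, the sketch below recomputes the count of 322 from the reported accuracy and computes an AUROC by hand. The labels and scores in the AUROC example are hypothetical and are not the model's outputs.

```python
def auroc(labels, scores):
    """Probability that a randomly chosen positive case is scored higher
    than a randomly chosen negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Correctly classified cases implied by the reported accuracy and test size.
correct = round(0.7302 * 441)
print(correct)  # 322

# Hypothetical labels and predicted scores, for illustration only.
demo_labels = [0, 0, 1, 1]
demo_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(demo_labels, demo_scores))  # 0.75
```

A classifier that scores cases at random has an expected AUROC of 0.5, which is why 0.5 is the usual reference point.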
SECTION-6: Results
Most of the variables affect attrition at least to some extent, but we want to discuss
the top four. The bar graph (Fig-2) shows that the independent variables which affect
attrition most are Stock Option Level, Job Satisfaction, Job Involvement, and Relationship
Satisfaction. The result is largely self-explanatory: providing stock options to an
employee will affect their decision to leave or stay. Similarly, an employee’s job
satisfaction and job involvement will significantly influence their stay in the company.
Fig-2:
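The report does not show how the ranking behind Fig-2 was computed. One common way to rank variables with a decision tree is its feature_importances_ attribute, sketched below under that assumption. The data is synthetic and the column names are placeholders attached to random features, so the printed ranking says nothing about the real HR data.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Placeholder names attached to synthetic features, for illustration only.
feature_names = [
    "StockOptionLevel",
    "JobSatisfaction",
    "JobInvolvement",
    "RelationshipSatisfaction",
    "Age",
]
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Impurity-based importances sum to 1; sort them from most to least important.
ranked = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

A ranking like this can then be passed to a bar plot to produce a figure in the style of Fig-2.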
SECTION-7: Recommendations
Recommendation 2: Introduce a reward system and a star-of-the-month award
Friction in the workplace is a common factor that damages relationships, and a conflict
with a manager could lead to attrition. To tackle this problem, training programs for
managers and for employees such as the Laboratory Technicians can build lasting
relationships and provide a positive push.
SECTION-8: Conclusion
The aim of this model was to study the HR data to understand the important factors that
affect attrition levels and to come up with recommendations. To that end, we built a
decision tree classification model that is reliable enough, by training and testing it on
the data, using different metrics to confirm its reliability, and identifying the most
important factors influencing attrition.
Bibliography
References
https://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/lguo/decisionTree.html
https://smallbusiness.chron.com/lower-attrition-50169.html