
Final Project Report:

Employee Attrition

Team Analyzers:
- Priya Komineedi
- Rama Chaitanya Samanchi
- Richa Girdhar
- Subash Putti
SECTION-1: Introduction

SECTION-2: Business case

SECTION-3: Description of the data set

SECTION-4: Methodology

# In[1]:

import csv
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

# In[2]:

df = pd.read_csv(r'C:\Users\u6073468\Downloads\ibm-hr-analytics-employee-attrition-performance\WA_Fn-UseC_-HR-Employee-Attrition.csv')
df.head()

# In[3]:

# Data cleaning


df.isnull().any()  # check for NaN values in every column

# In[4]:

# Encode the Attrition target as 0/1


map_dict = {'Yes': 1, 'No': 0}
# Numerically encode the Attrition target variable
df['Attrition_cd'] = df['Attrition'].apply(lambda x: map_dict[x])
# Drop the original Attrition column; Attrition_cd will be used for modeling
df = df.drop('Attrition', axis=1)

# In[5]:

#reviewing the data using plots


df.hist(figsize=(15,15))
plt.show()

# In[7]:

# Remove columns that carry no predictive signal: EmployeeCount, Over18, and
# StandardHours are constant, and EmployeeNumber is just a row identifier
df = df.drop(['EmployeeCount', 'Over18', 'StandardHours', 'EmployeeNumber'], axis=1)
df.head()

# In[8]:

# Check variables for correlation by creating a correlation-matrix heatmap


correlation_matrix = df.corr()
mask = np.zeros_like(correlation_matrix)
mask[np.tril_indices_from(mask)] = True  # hide the redundant lower triangle
plt.figure(figsize=(15, 10))
sns.heatmap(correlation_matrix, vmax=.5, mask=mask, linewidths=.2, cmap="YlGnBu")
plt.show()

# Prepping the data


# Feature encoding: change categorical variables into numerical values
# using Label Encoding and One-Hot Encoding
from sklearn.preprocessing import LabelEncoder
lab_enc = LabelEncoder()

# Label Encoding is used for columns with 2 or fewer unique values
lab_enc_count = 0
for col in df.columns[1:]:
    if df[col].dtype == 'object':
        if len(df[col].unique()) <= 2:
            lab_enc.fit(df[col])
            df[col] = lab_enc.transform(df[col])
            lab_enc_count += 1
print('{} columns were label encoded.'.format(lab_enc_count))

# In[12]:

# Convert the rest of the categorical variables into dummy variables;
# drop_first=True drops one level per variable to avoid dependency among the columns
df = pd.get_dummies(df, drop_first=True)

print('Size of Full Encoded Dataset: {}'.format(df.shape))


df.head()
# In[14]:

# Store the numerically encoded target in its own series


attrition_num = df['Attrition_cd'].copy()

# In[15]:

# We can now divide the data into training and testing datasets.
# The target column is separated from the features first, so that
# Attrition_cd does not leak into the feature matrix.
y = attrition_num
X = df.drop(['Attrition_cd'], axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    train_size=0.75,
                                                    random_state=0,
                                                    stratify=y)
print("Shape of X_train dataset: ", X_train.shape)
print("Shape of y_train dataset: ", y_train.shape)
print("Shape of X_test dataset: ", X_test.shape)
print("Shape of y_test dataset: ", y_test.shape)

# End of the data prepping section

SECTION-5: The Best Model

We chose a decision tree classification model for this business case. As explained in the
previous section, the algorithm uses the training and testing datasets to predict the
values of the target variable. A linear regression model was not an option because we are
predicting a categorical variable, attrition. We then applied feature scaling with a
StandardScaler, which transforms each feature so that its distribution has a mean of 0 and
a standard deviation of 1. Before training the model, we also employed the Synthetic
Minority Oversampling Technique (SMOTE), because class imbalance strongly degrades
classifier performance.
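A minimal sketch of this modeling step is shown below, assuming the imbalanced-learn package for SMOTE; the variable names (scaler, smote, tree) are illustrative rather than the exact ones from our notebook:

from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is installed

# Scale features to mean 0 and standard deviation 1.
# Fit on the training data only so no test information leaks in.
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)

# Oversample the minority class (attrition = 1) in the training set only.
smote = SMOTE(random_state=0)
X_train_res, y_train_res = smote.fit_resample(X_train_sc, y_train)

# Fit the decision tree on the balanced, scaled training data.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train_res, y_train_res)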

The decision tree classification model was built with cross_val_score, the simplest way to
run cross-validation. We also examined cross_val_predict, which returns a prediction for
each observation, but cross_val_score summarizes the scores directly, so we used it
instead. The metrics used to evaluate the model, the accuracy score, classification
report, confusion matrix, and ROC AUC score, were also imported and printed for the
training and test datasets prepared earlier (Fig-1).
Fig-1: accuracy score, classification report, confusion matrix, and ROC AUC score for the training and test sets.
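The evaluation cell itself is not reproduced above, so the sketch below is a hedged reconstruction using the scikit-learn metrics named in the text (note that the library's function for the AUROC score is roc_auc_score); it continues from the variables in the previous sketch:

from sklearn.model_selection import cross_val_score
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)

# Cross-validated accuracy on the resampled training data.
cv_scores = cross_val_score(tree, X_train_res, y_train_res, cv=5)
print("Cross-validated accuracy: {:.4f}".format(cv_scores.mean()))

# Evaluate the fitted model on the held-out test set.
y_pred = tree.predict(X_test_sc)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_pred))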

Four metrics describe the classification on our test dataset.

The first metric, the accuracy score, is 0.7302 for the test set, which means the
classification is 73.02% accurate. The second metric, the classification report, gives
the precision (the accuracy of the positive predictions), the recall (the ability to
identify the true cases), the f1-score (the harmonic mean of precision and recall,
f1 = 2 x precision x recall / (precision + recall)), and the support (the total number of
data points in each class). For the attrition class (1), the precision is 0.24, the
recall is 0.35, and the f1-score is 0.29. The test set contains 441 data points, of which
69 are attrition cases. All the weighted averages are greater than 0.5, which supports
the model.

The third metric, the confusion matrix, displays the counts of true positives, true
negatives, false positives, and false negatives; it shows 322 correct predictions
(TP + TN) out of 441 cases, consistent with the 73% accuracy. Finally, the ROC AUC score
is 0.575, which is greater than the 0.5 of a random classifier, making the model usable
for this decision. Considering all these metrics together, the model we developed is good
enough to base our HR decisions on.
SECTION-6: Results

Most of the variables affect attrition to at least some extent, but we want to discuss
the top four. The bar graph (Fig-2) shows that the independent variables that most affect
attrition are Stock Option Level, Job Satisfaction, Job Involvement, and Relationship
Satisfaction. The result is largely self-explanatory: providing stock options to an
employee affects their decision to leave or stay. Similarly, an employee's job
satisfaction and involvement significantly influence whether they stay with the company.

Fig-2: bar graph of the independent variables most strongly affecting attrition.
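A bar graph like Fig-2 could be produced from the fitted tree roughly as follows; this is a sketch that assumes the tree and X variables from the earlier sketches, and the styling of the actual figure may differ:

import pandas as pd
import matplotlib.pyplot as plt

# Rank the tree's feature importances and plot the strongest contributors.
importances = pd.Series(tree.feature_importances_, index=X.columns)
importances.nlargest(10).sort_values().plot(kind='barh', figsize=(8, 6))
plt.xlabel('Feature importance')
plt.title('Top features influencing attrition')
plt.tight_layout()
plt.show()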

SECTION-7: Recommendations

Recommendation 1: Providing employees shares in the company's stock


It is well known that stock shares are a strong incentive against attrition, as they
create a sense of ownership in employees. Granting stock is therefore a good way to keep
an employee from leaving the company. An employee who performs well can be given an
appropriate share grant as a bonus, which can greatly increase morale. However, this may
be less effective during a bear market.

Recommendation 2: Having a reward system and a star-of-the-month award

Recognition is a great tool for keeping an employee's morale high. Positive systems, such
as thank-you points from higher management when an employee exceeds expectations or
performs well during a crisis, act as a morale booster. Acknowledging an employee who
innovates, brings a positive change to a process, or completes a lean six sigma project
or other improvement activity also encourages employees to take ownership of their jobs.
This further improves job involvement, which in turn reduces negative attrition.

Recommendation 3: Conducting training sessions to help build relationships

Friction in the workplace is a common factor that strains relationships, and if the
conflict is with a manager it can lead to attrition. To tackle this problem, training
programs for managers and for employees such as the Laboratory Technicians can create
lasting relationships and a sense of positive momentum.

SECTION-8: Conclusion

This project's aim was to study the HR data, understand the important factors that affect
attrition levels, and come up with recommendations. To that end, we built a decision tree
classification model, trained and tested it on the data, used different metrics to
confirm its reliability, and identified the most important factors influencing attrition.

