ANLY 502 Final Report

Uploaded by

chaitanya.samanchi

Final Project Report:

Employee Attrition

Team Analyzers:
• Priya Komineedi
• Rama Chaitanya Samanchi
• Richa Girdhar
• Subash Putti
SECTION-1: Introduction

SECTION-2: Business case

SECTION-3: Description of the data set

SECTION-4: Methodology

# In[1]:

import csv
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

# In[2]:

# Load the IBM HR attrition dataset (local path on the author's machine)
df = pd.read_csv(r'C:\Users\u6073468\Downloads\ibm-hr-analytics-employee-attrition-performance\WA_Fn-UseC_-HR-Employee-Attrition.csv')
df.head()

# In[3]:

# Data cleaning

df.isnull().any() #check for nan values

# In[4]:

# Encoding the attrition target numerically: Yes -> 1, No -> 0

map_dict = {'Yes': 1, 'No': 0}
# Use the pandas map method to numerically encode our attrition target variable
df['Attrition_cd'] = df['Attrition'].map(map_dict)
# Delete the original Attrition column, since we'll use Attrition_cd for modeling
df = df.drop('Attrition', axis=1)

# In[5]:

#reviewing the data using plots

df.hist(figsize=(15,15))
plt.show()

# In[7]:

# Removing columns that do not add to the model: EmployeeCount, Over18 and
# StandardHours have one value for all data points, and EmployeeNumber is only an identifier
df = df.drop(['EmployeeCount', 'Over18', 'StandardHours', 'EmployeeNumber'], axis=1)
df.head()

# In[8]:

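# Checking variables for correlation by creating a correlation matrix
# (numeric columns only; the categorical columns are encoded later)

correlation_matrix = df.select_dtypes(include='number').corr()
# Mask the lower triangle and the diagonal so each pair is shown only once
mask = np.zeros(correlation_matrix.shape, dtype=bool)
mask[np.tril_indices_from(mask)] = True
plt.figure(figsize=(15, 10))
sns.heatmap(correlation_matrix,
            vmax=.5,
            mask=mask,
            linewidths=.2, cmap="YlGnBu")
plt.show()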

# Data preparation

# Feature encoding - changing categorical variables into numerical values
# We will use label encoding and one-hot encoding
from sklearn.preprocessing import LabelEncoder
lab_enc = LabelEncoder()

# Label encoding will be used for columns with two or fewer unique values
lab_enc_count = 0
for col in df.columns[1:]:
    if df[col].dtype == 'object':
        if len(df[col].unique()) <= 2:
            df[col] = lab_enc.fit_transform(df[col])
            lab_enc_count += 1
print('{} columns were label encoded.'.format(lab_enc_count))

# In[12]:

# Convert the rest of the categorical variables into dummy variables.
# drop_first=True drops one level per variable to avoid dependency among the dummy columns
df = pd.get_dummies(df, drop_first=True)

print('Size of Full Encoded Dataset: {}'.format(df.shape))

df.head()
# In[14]:

# Store the encoded target variable in attrition_num

attrition_num = df['Attrition_cd'].copy()

# In[15]:

# Separate the features (X) from the target (y) before splitting, so that the
# target column Attrition_cd is not included among the model inputs
y = df['Attrition_cd']
X = df.drop(['Attrition_cd'], axis=1)

# We can now proceed to divide the data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    train_size=0.75,
                                                    random_state=0,
                                                    stratify=y)
print("Shape of X_train dataset: ", X_train.shape)
print("Shape of y_train dataset: ", y_train.shape)
print("Shape of X_test dataset: ", X_test.shape)
print("Shape of y_test dataset: ", y_test.shape)

# End of the data preparation section

SECTION-5: The Best Model

We chose a Decision Tree Classification Model for this business case because the target
variable, attrition, is categorical; a linear regression model is unsuitable because it
predicts a continuous value. The model uses the training and testing datasets described
in the previous section to predict the values of the target variable. Before training, we
applied feature scaling with a Standard Scaler, which transforms each feature so that it
has a mean of 0 and a standard deviation of 1. We also applied the Synthetic Minority
Oversampling Technique (SMOTE), because class imbalance can bias a classifier toward
the majority class.
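
The two preprocessing steps named above can be illustrated with a minimal sketch. It uses only the Python standard library and made-up numbers, not the HR dataset; the helper names `standardize` and `smote_like_sample` are hypothetical. The oversampling function is a simplified stand-in for SMOTE: it interpolates toward the single nearest minority neighbour, whereas the full technique picks among several nearest neighbours.

```python
import random
import statistics


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def smote_like_sample(minority, rng):
    """Create one synthetic point on the segment between a randomly chosen
    minority sample and its nearest minority neighbour (simplified SMOTE)."""
    base = rng.choice(minority)
    neighbour = min(
        (p for p in minority if p is not base),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
    )
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))


# Hypothetical feature values, for illustration only
incomes = [2000.0, 3000.0, 5000.0, 10000.0]
scaled = standardize(incomes)

# Hypothetical minority-class points in two dimensions
rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
synthetic = [smote_like_sample(minority, rng) for _ in range(3)]

print(scaled)
print(synthetic)
```

Because each synthetic point lies between two existing minority points, oversampling adds plausible new minority cases instead of repeating the same rows.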

The decision tree classification model was evaluated with cross_val_score, as it is the
simplest way to run a cross-validation. We also tried cross_val_predict, but it returns
a prediction for each observation, whereas cross_val_score returns the score for each
fold directly. We also imported the metrics used to evaluate the model: accuracy score,
classification report, confusion matrix, and AUROC score. These were printed for the
training and test datasets prepared earlier (Fig-1).
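
The difference between the two cross-validation helpers can be seen in a short sketch. It assumes scikit-learn is installed and uses synthetic stand-in data from `make_classification`, not the HR dataset; the class weights, the five folds, and the names `X_demo`, `y_demo`, `fold_scores`, and `oof_predictions` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (NOT the HR dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.84, 0.16], random_state=0
)

tree = DecisionTreeClassifier(random_state=0)

# cross_val_score returns one score per fold
fold_scores = cross_val_score(tree, X_demo, y_demo, cv=5, scoring="accuracy")

# cross_val_predict returns one out-of-fold prediction per observation
oof_predictions = cross_val_predict(tree, X_demo, y_demo, cv=5)

print(fold_scores.shape)      # (5,)
print(oof_predictions.shape)  # (300,)
```

The per-fold scores summarize performance directly, while the per-observation predictions are useful when a confusion matrix or classification report over the whole dataset is wanted.
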
Fig-1:

We use four metrics to evaluate the classification on our test dataset:

The first metric, the accuracy score for the test set, is 0.7302, which means 73.02% of
the test cases are classified correctly. The second metric, the classification report,
gives four values for each class: precision, the share of predicted cases that are
correct; recall, the share of true cases that the model identifies; the f1-score, the
harmonic mean of precision and recall; and support, the total number of data points in
each class. For attrition (1), the precision is 0.24, the recall is 0.35, and the
f1-score is 0.29. The test set contains 441 data points, of which 69 are attrition
cases. All the weighted averages are greater than 0.5, which we take as support for
the model.

The third metric, the confusion matrix, displays the true positives (TP), true negatives
(TN), false positives (FP), and false negatives (FN). It shows 322 correct predictions
(TP + TN) out of 441 cases, which matches the accuracy of 0.7302. The fourth metric, the
AUROC score, is 0.575, which is greater than 0.5, the score expected from random guessing.
Thus, considering all these metrics, we conclude that the model we have developed is good
enough to base our HR decisions on.
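
The reported figures can be cross-checked with a few lines of arithmetic. This sketch uses only numbers quoted in the text and treats 322 as the count of correctly classified cases, an assumption that is consistent with the reported accuracy of 0.7302.

```python
# Figures quoted in the text above
n_test = 441        # size of the test set
n_correct = 322     # correctly classified cases (assumed to be TP + TN)
precision = 0.24    # attrition class, as reported (rounded)
recall = 0.35       # attrition class, as reported (rounded)

# Accuracy is the share of correct predictions
accuracy = n_correct / n_test

# The f1-score is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4))  # 0.7302
print(round(f1, 3))        # 0.285
```

The f1 value computed from the rounded precision and recall is close to the reported 0.29; the small gap presumably comes from rounding of the inputs.
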
SECTION-6: Results

Most of the variables affect attrition at least to some extent, but we want to discuss the
top four. The bar graph (Fig-2) shows that the independent variables that most affect
attrition are Stock Option Level, Job Satisfaction, Job Involvement, and Relationship
Satisfaction. The result is intuitive: providing stock options to an employee will affect
their decision to leave or stay. Similarly, an employee’s job satisfaction and involvement
will significantly influence how long they stay with the company.
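
The report does not state how the ranking in Fig-2 was computed. One common approach, assumed here purely for illustration, is to read the impurity-based `feature_importances_` of a fitted decision tree. The sketch below uses synthetic data and hypothetical feature names (`feat_a` to `feat_e`), not the HR variables.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names and synthetic data (NOT the HR dataset)
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"]
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_demo, y_demo)

# Pair each feature with its importance and sort from most to least important
ranked = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Show the top four, mirroring the four variables discussed in the text
for name, score in ranked[:4]:
    print(f"{name}: {score:.3f}")
```

The importances sum to 1, so each value can be read as the share of the tree's total impurity reduction attributed to that feature.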

Fig-2:

SECTION-7: Recommendations

Recommendation 1: Providing employees shares in their company’s stock

Stock shares are widely regarded as a strong incentive for reducing attrition because they
create a sense of ownership in the minds of employees. Providing shares in the company’s
stock is therefore a good way to prevent an employee from leaving the company. An employee
who performs well can be given shares as a bonus, which can greatly increase their morale.
However, this might not work in a bear market.

Recommendation 2: Having a reward system and a star-of-the-month award

Recognition is a great tool to keep an employee’s morale high. Implementing positive
systems, such as thank-you points from higher management when an employee exceeds
expectations or performs well during a crisis, will act as a positive booster. Also,
acknowledging an employee who innovates, brings a positive change to a process, or
completes a Lean Six Sigma project or an improvement activity can be a great reward that
encourages them to take ownership of their job. This will further improve job involvement,
which in turn reduces negative attrition.

Recommendation 3: Conducting training sessions to help in building relationships

Friction in the workplace is a common factor that damages relationships, and if the
conflict is with a manager, it could lead to attrition. To tackle this problem, training
programs for managers and for employees such as Laboratory Technicians can build lasting
relationships and give employees a positive push.

SECTION-8: Conclusion

The aim of this project was to study the HR data to understand the important factors that
affect attrition levels and to come up with recommendations. To that end, we built a
decision tree classification model that is reliable enough for this purpose by training
and testing it on the data, using different metrics to confirm its reliability, and
identifying the most important factors influencing attrition.

