
Data Mining

This document is the final report from Team 8 for their business data mining course. It details an employee retention program for an IT company called ABC to help them determine which employees should receive retention bonuses to reduce turnover. The team analyzed a dataset of 4,653 employees to build predictive models to identify those most likely to leave. Logistic regression was chosen as the best model. When applied to a holdout test set, it identified 22 employees very likely to leave, 234 very likely to stay, and 209 unsure. The company should incentivize the unsure employees to maximize retention.

Uploaded by

Samruddha Shedge


2218 – INSY - 5339 – 002 - Principles of Business Data Mining

Professor: Jayarajan Samuel

Final Report

Employee Retention Program

Team 8
Team Members

Mantena Surendra Varma 1001956606

Atluri Sri Harsha 1001986168
Mallineni Sirisha 1001965878
Kiran Oruganti Venkatesh 1001957965

The University of Texas at Arlington

Employee Retention Program - Recruit and Retain Talent Team
Executive Summary
Employee retention refers to an organization's efforts to keep its employees and prevent them from
leaving. Attracting and retaining the best employees is a challenge for any company. An employee
may resign or try to leave for a variety of reasons, including:
1. Many job prospects outside the company
2. Dissatisfaction with low payment tiers or monthly salaries
3. Difficulty adapting to the workplace
4. Poor work-life balance
5. Lack of employment perks, such as leave policies
Employees are the most vital component of any business. They are an essential element of the
business; without them, the company cannot grow and cannot achieve any of the organization's
goals.
No organization can afford to lose its employees without a strategy or plan in place to retain
them. Different companies use different methods to retain employees, but what matters most is
the company's retention plan.
The Recruit and Retain team in ABC IT company's HR department works to ensure that employees
do not quit their jobs. The team wants to compensate employees with retention bonuses and
provide employment perks to achieve maximum retention.

Problem Statement
The ABC IT company's HR department, which recruits and retains employees, has limited funds; it
cannot compensate every employee and must decide whom to incentivize to increase retention
rates. It requires the services of a data scientist to gain insight into the employees who are
leaving the company and to determine who should receive a retention bonus.

Problem Solution
Using a data set of 4653 observations containing employee details and each employee's decision
to quit or stay with the organization, we, the "Team 8" data scientists, will generate predictions
on the data set and provide data insights and recommendations on whom the company should reward.
Data Description:
The data set has 4653 observations with 9 variables, described below:
Independent Variables:
▪ Education: Education level - Bachelors, Masters, or PhD.
▪ Joining Year: Year the employee joined the company.
▪ City: City office where the employee is posted - Bangalore, Pune, or New Delhi.
▪ Payment Tier: Payment Tier 1 - Highest, Payment Tier 2 - Medium, Payment Tier 3 - Lowest.
▪ Age: Current age.
▪ Gender: Gender of the employee.
▪ Ever Benched: Whether the employee has ever been kept out of projects for one month or more.
▪ Experience: Experience in the current field.
Dependent (Target) Variable:
▪ Leave (1) or Not (0)
The dataset contains 3 binary, 4 categorical and 2 continuous variables.
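The variable breakdown above can be captured as a small schema and checked mechanically, including the missing-value check described next. This is a minimal Python sketch; the column names (for example `ExperienceInCurrentDomain` and `LeaveOrNot`) are assumptions, since the report describes the variables but does not show the raw file's headers.

```python
from collections import Counter

# Hypothetical column names; the report lists the variables but not the
# exact headers used in the source file.
VARIABLE_TYPES = {
    "Education": "categorical",                 # Bachelors, Masters, PHD
    "JoiningYear": "categorical",               # year of joining the company
    "City": "categorical",                      # Bangalore, Pune, New Delhi
    "PaymentTier": "categorical",               # 1 = highest, 3 = lowest
    "Age": "continuous",
    "ExperienceInCurrentDomain": "continuous",
    "Gender": "binary",
    "EverBenched": "binary",
    "LeaveOrNot": "binary",                     # target: 1 = leave, 0 = stay
}


def type_counts(schema):
    """Count how many variables fall into each measurement type."""
    return Counter(schema.values())


def missing_fields(row, schema=VARIABLE_TYPES):
    """Return the schema variables that are absent or empty in one record."""
    return [name for name in schema if row.get(name) in (None, "")]


print(dict(type_counts(VARIABLE_TYPES)))
# {'categorical': 4, 'continuous': 2, 'binary': 3}
```

Running `missing_fields` over every record and finding no entries is one way to reproduce the "no missing values" result reported for the class, interval, and target variables.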
More about Data:
The following images explain the variable summary and show whether there are any missing values
in the class, interval, and target variables.

Variable Summary:
The images below show that the class, interval, and target variables have no missing values.
Data Visualization:

Initial Results Observed: We observed some outliers while exploring the data. There is no linear
correlation between age and experience. In the first line graph (Bachelors), at age 37 the average
experience of employees who never left the company is lower than the average experience of
employees who left. The same behavior (outliers) is observed for employees with master's and PhD
degrees. Refer to the graph below.

Trend Lines:
The trend line shows that the number of employees leaving the company is increasing and is
projected to keep increasing in the coming years. Refer to the graph below.
The trend line also shows that the number of employees staying with the company is projected to
be very low in the coming years. Refer to the graph below.

Additional Graphs:
The graph shows that nearly half of the employees holding a master's degree are likely to leave the
company.
The graph shows that employees in Payment Tier 2 are more likely to leave the company.

The graph shows that a large number of female employees are likely to leave the company.
The graph shows the number of employees who have ever been benched, split by Leave or Not.

The graph below shows which employees are more likely to leave or stay with the company based on
their work experience.
Visualization Findings:
Female employees and employees with a master's degree are more likely to leave the
organization, as seen in the visualization graphs above. Payment tier is one of the most common
reasons for employees to leave the organization: employees in Payment Tier 2 are more likely to
leave. By contrast, Ever Benched and job experience have little impact on whether employees leave.
We think employees are leaving because of poor pay scales. We also hypothesize that female
employees are leaving because they are not receiving employee benefits.
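Findings such as "nearly half of master's degree holders are likely to leave" compare the share of leavers within each group, which accounts for groups of different sizes. The minimal Python sketch below computes that within-group leave rate; the column names are assumptions and the toy rows are invented for illustration, not the report's data.

```python
from collections import defaultdict


def leave_rate_by(rows, column, target="LeaveOrNot"):
    """Share of employees with target == 1 within each level of `column`."""
    totals = defaultdict(int)   # employees per group
    leavers = defaultdict(int)  # employees per group who left
    for row in rows:
        level = row[column]
        totals[level] += 1
        leavers[level] += int(row[target] == 1)
    return {level: leavers[level] / totals[level] for level in totals}


# Toy rows for illustration only (not the report's data).
toy = [
    {"Education": "Masters", "LeaveOrNot": 1},
    {"Education": "Masters", "LeaveOrNot": 0},
    {"Education": "Bachelors", "LeaveOrNot": 0},
    {"Education": "Bachelors", "LeaveOrNot": 0},
    {"Education": "Bachelors", "LeaveOrNot": 1},
    {"Education": "PHD", "LeaveOrNot": 0},
]
print(leave_rate_by(toy, "Education"))
```

The same function applies unchanged to `Gender`, `PaymentTier`, or `EverBenched` by passing a different column name.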
Using modeling techniques, we will now predict the employees who need to be compensated so
that they do not leave the organization. We'd also like to provide additional information to
management based on our findings.

Prediction Models and Findings:

We used linear regression, logistic regression, and decision tree models, and trained, validated,
and tested each of them on the data.
1. We divided the data into three partitions: 70%, 20%, and 10%.
2. We set aside 10% of the data. At the end, this is used as new data for making
predictions.
3. We used 70% of the data to train the models and 20% to validate them.
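The three-way split described above can be sketched as a simple random partition. This is a minimal Python illustration under assumed settings (an unstratified shuffle with a fixed seed); the SAS Data Partition node may allocate rows differently, for example by stratifying on the target, so exact partition sizes can differ from the report's.

```python
import random


def partition(rows, train_frac=0.7, valid_frac=0.2, seed=42):
    """Randomly split rows into training, validation and test lists.

    The test set receives whatever remains after the training and
    validation fractions (here about 10%), so every row lands in exactly
    one partition.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for a reproducible split
    n_train = int(len(rows) * train_frac)
    n_valid = int(len(rows) * valid_frac)
    train = [rows[i] for i in indices[:n_train]]
    valid = [rows[i] for i in indices[n_train:n_train + n_valid]]
    test = [rows[i] for i in indices[n_train + n_valid:]]
    return train, valid, test


employees = list(range(4653))  # stand-in for the 4653 employee records
train, valid, test = partition(employees)
print(len(train), len(valid), len(test))  # 3257 930 466
```

With simple truncation the test partition here receives 466 rows rather than the 465 the report scored, which shows that exact counts depend on each tool's rounding and sampling rules.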

Data Partition:

As shown below, we allocated 70% of the data to training, 20% to validation, and 10% to testing.
Diagram Flow in SAS Tool:

The results of each model are as follows:

Linear Regression results:
Logistic Regression results:

Decision Trees: We built a decision tree using the highly significant variables.

About Results (β values):
The model outputs shown above include the coefficients (β values). Payment Tier, Gender, City,
and Education all have Pr > ChiSq below 0.001, indicating that they are highly significant
variables. Depending on the sign of its coefficient, each of these variables can increase or
decrease the likelihood of an employee leaving the company.
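For reference, the standard form of the logistic regression model behind these β values is shown below, where $p$ is the probability that an employee leaves and $x_1, \dots, x_k$ are the (dummy-coded) predictors. A positive $\beta_j$ raises the log-odds of leaving, and $e^{\beta_j}$ is the odds ratio for a one-unit change in $x_j$ with the other predictors held fixed. The Pr > ChiSq values in SAS output are typically p-values of Wald chi-square tests of $H_0\colon \beta_j = 0$; for a multi-level categorical variable such as City, the overall effect test combines its dummy coefficients and has more than one degree of freedom.

```latex
\begin{aligned}
\log\frac{p}{1-p} &= \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}} \\
\text{odds ratio for } x_j &= e^{\beta_j},
\qquad
W_j = \left(\frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}\right)^2 \sim \chi^2_1
\ \text{ under } H_0\colon \beta_j = 0
\end{aligned}
```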

Reasons to Choose Logistic regression as our model:

Logistic regression models the log-odds of the target as a linear function of the predictors, so its
decision boundary is a single hyperplane that divides the predictor space into two regions.
Decision trees are non-linear classifiers; they do not require the classes to be linearly separable.
Instead, they repeatedly bisect the sample space into smaller and smaller regions.
Because our target divides the data into exactly two classes and the regression model performed
best on our data, we chose to go with a regression model.
We chose logistic regression over linear regression for the following reasons:
1. The dependent (target) variable in our dataset is binary, and logistic regression is the
standard model for a binary target.
2. The misclassification rate and average squared error for logistic regression are slightly
lower than for linear regression. Please see the image on the left side of this page.
3. The event classification tables for both models are given on the right side of the page;
logistic regression has fewer false negatives.
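The comparison metrics above can be made concrete with a short sketch that computes the confusion counts, the misclassification rate, and the average squared error (ASE) from observed outcomes and predicted probabilities. The 0.5 classification cutoff and the toy numbers are assumptions for illustration. In this setting a false negative is an employee who actually left but was predicted to stay, which is the costly mistake for a retention program.

```python
def classification_summary(y_true, p_hat, cutoff=0.5):
    """Confusion counts, misclassification rate and average squared error.

    y_true: observed outcomes (1 = left the company, 0 = stayed)
    p_hat:  predicted probabilities of leaving
    cutoff: probability at or above which we predict "leave" (assumed 0.5)
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        predicted_leave = p >= cutoff
        if predicted_leave and y == 1:
            tp += 1          # correctly flagged leaver
        elif predicted_leave:
            fp += 1          # flagged, but actually stayed
        elif y == 0:
            tn += 1          # correctly predicted to stay
        else:
            fn += 1          # missed leaver: predicted to stay, but left
    n = len(y_true)
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "misclassification": (fp + fn) / n,
        "ASE": sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / n,
    }


print(classification_summary([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

Running the function on each model's validation predictions puts both models on the same footing for the misclassification, ASE, and false-negative comparison.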
Prediction (ŷ):
We used RStudio to run the logistic regression model on the 10% of the data (treated as new data)
that was set aside at the beginning of the data partition. This holdout set contains 465
observations.
The RStudio code is shown in the figure below, with each line explained in a comment. The code
that predicts the probability for each observation and exports the resulting data to an Excel file
is highlighted in green.
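The scoring step was done in R and appears only as a figure. As a hedged illustration of the same idea in Python, the sketch below applies the logistic formula to each holdout row and writes the rows, with a new probability column, as CSV (a format Excel can open). The predictor names (`PaymentTier2`, `Female`, `Masters`, `EmployeeID`), the intercept, and the coefficients are made-up placeholders, not the team's fitted values.

```python
import csv
import io
import math

# Hypothetical, already dummy-coded predictors with made-up coefficients;
# the report's fitted beta values appear only in its figures.
INTERCEPT = -0.5
COEFFICIENTS = {"PaymentTier2": 0.9, "Female": 0.7, "Masters": 0.6}


def leave_probability(row):
    """Logistic score p = 1 / (1 + exp(-(b0 + sum of b_j * x_j)))."""
    z = INTERCEPT + sum(beta * row[name] for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def export_scores(rows, out):
    """Write each holdout row plus its predicted probability as CSV to `out`."""
    fieldnames = list(rows[0]) + ["LeaveProbability"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "LeaveProbability": round(leave_probability(row), 4)})


holdout = [
    {"EmployeeID": 1, "PaymentTier2": 1, "Female": 1, "Masters": 0},
    {"EmployeeID": 2, "PaymentTier2": 0, "Female": 0, "Masters": 0},
]
buffer = io.StringIO()
export_scores(holdout, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open("scores.csv", "w", newline="")` as `out`.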

Why Probabilities?
Before looking at the prediction outcomes, we need to discuss how these outputs can help
improve retention rates.
The predicted probability that an employee quits the organization ranges from 0 to 1. A value
close to 0 indicates that the employee has little interest in leaving the company, while a value
close to 1 indicates that the employee has a strong interest in leaving the organization. Based on
this, employees with probability values from 0.2 to 0.7 are most likely unsure about whether to
leave the organization.
Overall, the probability of an employee leaving the organization is divided into three categories:
▪ 0% to 20% (0 to 0.2): the probability that these employees leave the company is very low.
There is no need to incentivize them because they will not leave the company anyway.
▪ 70% to 100% (0.7 to 1): the probability that these employees leave the organization is very
high. There is no need to incentivize them because they are expected to leave the company
even if they are incentivized.
▪ 20% to 70% (0.2 to 0.7): these employees are uncertain (undecided) whether to leave or stay
with the company. They should be incentivized to obtain maximum retention.
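The three categories above amount to a simple rule on the predicted probability. The Python sketch below encodes it; how the exact boundary values 0.2 and 0.7 are assigned is an assumption, because the ranges as written share their endpoints.

```python
from collections import Counter


def retention_category(p, low=0.2, high=0.7):
    """Map a predicted probability of leaving to one of three groups.

    Boundary handling is an assumption: a probability of exactly 0.2 is
    treated as "unsure" and exactly 0.7 as "leave".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p < low:
        return "stay"    # very unlikely to leave: no incentive needed
    if p < high:
        return "unsure"  # undecided: candidate for a retention bonus
    return "leave"       # very likely to leave even with an incentive


probabilities = [0.05, 0.35, 0.69, 0.70, 0.92]  # illustrative values only
print(Counter(retention_category(p) for p in probabilities))
```

Keeping `low` and `high` as parameters makes it easy to widen or narrow the incentive band if the available budget changes.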

Results:
The following pictures show the predicted probability for each observation in an Excel sheet, and a
bar chart of the number of employees who are leaving, staying, and uncertain.

22 employees are predicted to leave, 234 to stay, and 209 are unsure. Some employees in the
unsure group may still leave the company. To increase retention, the issues affecting these
employees must be addressed.
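As a quick arithmetic check, the three group counts sum to the 465 holdout observations, and the unsure group, which the report recommends incentivizing, makes up about 45% of the holdout set.

```python
# Counts reported for the 10% holdout set.
counts = {"leave": 22, "stay": 234, "unsure": 209}
total = sum(counts.values())  # 465 observations

for group, n in counts.items():
    print(f"{group:>6}: {n:>3} employees ({n / total:.1%})")
```

The shares come out to roughly 4.7% leaving, 50.3% staying, and 44.9% unsure.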
Managerial or Policy Implications / Recommendations:
We used visualization to gain further insights from the prediction results; the following insights
were surprising.
The pie chart shows that female employees are more likely to leave the company than male
employees. It is striking that almost 90% of female employees fall into the uncertain or leave
categories, and only about 10% are likely to stay with the company.
The bar graph shows that employees with a master's degree are more likely to leave the company
than others.
Conclusion:
We can conclude from the data visualization graphs and β values that payment tier is one of the
most prevalent reasons why employees leave the company. Offering an employee retention
bonus is the ideal solution, but because of the company's limited budget, it is not a good idea
to distribute the budget equally to every employee. Employees in the 0% to 20% (0 to 0.2) category
do not need to be incentivized because they will not leave the company anyway, whereas
employees in the 70% to 100% (0.7 to 1) category do not need to be incentivized because they
are expected to leave the company regardless of whether they are incentivized. Compensating
employees in the 20% to 70% (0.2 to 0.7) category should yield the highest retention rate.
Also, because about 90% of female employees are unsure or likely to leave the company,
management should develop a Female Employee Benefits policy to prevent female employees
from quitting the organization.
Using modeling and visualization tools as data scientists, we were able to explain the common
factors behind employees leaving the company, advise on how best to allocate the limited budget,
identify which categories of employees the employer should reward, show which types of
employees are more likely to leave, and suggest policies. We are confident that these results will
help keep the organization's retention rate as high as possible.

