Data Mining
Final Report
Team 8
Team Members
Problem Statement
The ABC IT company's HR department, which recruits and retains employees, has limited funds.
It cannot offer a retention bonus to every employee and needs to decide whom to incentivize
to increase retention rates. The department requires the services of a data scientist to gain
insight into which employees are leaving the company and to determine who should receive a
retention bonus.
Problem Solution
Using a data set of 4653 observations with employee details and each employee's decision to
quit or stay with the organization, we, the Team 8 data scientists, will generate predictions
on the data set and provide insights and recommendations on whom the company should reward.
Data Description:
The data set has 4653 observations with 9 variables, as explained below.
Independent Variables:
▪ Education: Education Level - Bachelors, Masters and PHD.
▪ Joining Year: Year of Joining Company.
▪ City: City office where Posted: Bangalore, Pune, and New Delhi.
▪ Payment Tier: Payment Tier 1- Highest, Payment Tier 2 – Medium, Payment Tier 3 –
Lowest.
▪ Age: Current Age
▪ Gender: Gender of Employee
▪ Ever Benched: Whether the employee was ever kept out of projects for one month or more.
▪ Experience: Experience in Current Field.
Dependent (Target) Variable:
▪ Leave (1) or Not (0)
The dataset contains 3 binary, 4 categorical and 2 continuous variables.
More about Data:
The following images show the variable summary and the missing-value information for the
Class, Interval and Target Variables in detail.
Variable Summary:
The images below show that the Class, Interval and Target Variables do not have any missing values.
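A missing-value check like the one summarized above can be sketched in a few lines of Python. This is only an illustration, not the SAS workflow the team used: the column names are hypothetical spellings of the variables listed earlier, and the sample rows are made up, with two blanks added deliberately so the check has something to report.

```python
import csv
import io

# Made-up rows for illustration only; NOT taken from the report's data set.
# Column names are hypothetical spellings of the variables described above.
SAMPLE = """Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,Experience,LeaveOrNot
Bachelors,2017,Bangalore,3,34,Male,No,0,0
Masters,2015,Pune,2,,Female,No,3,1
PHD,2013,New Delhi,1,38,Female,Yes,,0
"""

def missing_counts(csv_text):
    """Return {column: number of empty cells} for a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            value = row.get(name)
            # Treat absent or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts

counts = missing_counts(SAMPLE)
# Show only the columns that have at least one missing value.
print({name: n for name, n in counts.items() if n > 0})
```

On the real data set, a result with every count equal to zero would correspond to the "no missing values" finding reported here.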
Data Visualization:
Initial Results Observed: Some outliers were observed while exploring the data. There is no
linear correlation between age and experience. In the first line graph (Bachelors), at age 37
the average experience of employees who never left the company is lower than the average
experience of employees who left. The same behavior (outliers) is observed for master's and
PHD employees as well. Refer to the graph below.
Trend Lines:
The trend line shows that the number of employees leaving the company is increasing in the
coming years. Refer to the graph below.
The trend line shows that the number of employees staying with the company becomes very low
in the coming years. Refer to the graph below.
Additional Graphs:
The graph shows that nearly half of the employees holding a master's degree are likely to
leave the company.
The graph shows that employees in Payment Tier 2 are likely to leave the company.
The graph shows that a high number of female employees are likely to leave the company.
The graph shows the number of employees who were ever benched, split by Leave or Not.
The graph below shows which employees are more likely to leave or stay with the company based
on their work experience.
Visualization Findings:
Female employees and employees with a master's degree are more likely to leave the
organization, as seen in the visualization graphs above. Payment tier is one of the most
common reasons for employees to leave the organization: employees in Payment Tier 2 are more
likely to leave. We also notice that Ever Benched and job experience do not have much of an
impact on employees leaving.
We think the employees are leaving because of the poor pay scales. We also assume that the
female employees are leaving because they are not receiving any employee benefits.
Using modeling techniques, we will now predict the employees who need to be compensated so
that they do not leave the organization. We'd also like to provide additional information to
management based on our findings.
Data Partition:
As shown below, we assigned 70% of the data to Train, 20% to Validate and 10% to Test.
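For readers without access to the SAS tool, a 70/20/10 partition can be approximated with a simple random shuffle. The sketch below is an assumption-laden illustration, not the tool's own partitioning method: the function name `partition` and the seed are hypothetical, and the split is purely random rather than stratified by the target variable.

```python
import random

def partition(rows, train=0.7, validate=0.2, seed=42):
    """Shuffle rows and split them into train/validate/test lists.

    The test set takes whatever remains after train and validate.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * validate)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )

# Row indices stand in for the 4653 observations of the data set.
train_set, valid_set, test_set = partition(range(4653))
print(len(train_set), len(valid_set), len(test_set))
```

Because the fractions are truncated to whole rows, the three set sizes always add up to the full data set, but a tool that rounds or stratifies differently may produce slightly different sizes.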
Diagram Flow in SAS Tool:
Why Probabilities?
Before we look at the prediction outcomes, we need to discuss how these outputs can help
enhance retention rates.
The predicted probability of an employee quitting the organization ranges from 0 to 1. A
value close to 0 indicates that the employee is unlikely to leave the company, while a value
close to 1 indicates that the employee is very likely to leave. Employees with probability
values between 0.2 and 0.7 are therefore most likely unsure about whether to leave the
organization.
Overall, the probability of an employee leaving the organization is divided into three categories:
▪ 0% to 20% (0 to 0.2) - the probability of employees leaving the company is very low.
There is no need to incentivize them because they will not leave the company anyway.
▪ 70% to 100% (0.7 to 1) - the probability of employees leaving the organization is very
high. There is no need to incentivize them because they will leave the company anyway,
even if they are incentivized.
▪ 20% to 70% (0.2 to 0.7) - employees who are uncertain (undecided) whether to leave or
stay with the company. These employees need to be incentivized to obtain maximum
retention.
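The three categories above amount to a simple thresholding rule on the predicted probability. A minimal sketch follows; the function name `retention_bucket` is hypothetical, and because the stated ranges overlap at exactly 0.2 and 0.7, the sketch assumes those boundary values belong to the uncertain group.

```python
def retention_bucket(p_leave, low=0.2, high=0.7):
    """Map a predicted probability of leaving to one of three groups.

    Assumption: values exactly equal to `low` or `high` count as "uncertain".
    """
    if not 0.0 <= p_leave <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_leave < low:
        return "staying"    # very unlikely to leave; no incentive needed
    if p_leave > high:
        return "leaving"    # very likely to leave regardless of incentive
    return "uncertain"      # undecided; the group to incentivize

for p in (0.05, 0.2, 0.45, 0.7, 0.93):
    print(p, retention_bucket(p))
```

Keeping the thresholds as parameters makes it easy to see how the size of the incentivized group changes if management prefers a narrower or wider band.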
Results:
The following picture shows the probability for each observation in an Excel sheet, and a bar
chart that shows the count of employees who are leaving, staying and uncertain.
22 employees are leaving, 234 are staying, and 209 are unsure. More employees in the unsure
group may go on to leave the company. To increase retention, the issues affecting these
employees must be addressed.
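To put these counts in proportion, the short calculation below converts them into shares of the scored employees. The counts come from the results above; the variable names are illustrative only.

```python
# Counts reported in the results above.
counts = {"leaving": 22, "staying": 234, "uncertain": 209}

total = sum(counts.values())
# Share of each group among all scored employees, in percent.
shares = {group: 100 * n / total for group, n in counts.items()}

print("total scored:", total)
for group, pct in shares.items():
    print(f"{group}: {pct:.1f}%")
```

The total of 465 scored employees is roughly 10% of the 4653 observations, and the uncertain group makes up a little under half of them, which is why it is the natural target for the limited incentive budget.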
Managerial or Policy Implications/ Recommendations:
We used visualization to gain further insights from the prediction results, and the following
are the surprising insights.
The pie chart shows that female employees are more likely to leave the company than male
employees. It is striking that almost 90% of female employees fall in the Uncertain and Leave
categories and only 10% are likely to stay with the company.
The bar graph shows that employees with a master's degree are more likely to leave the
company than others.
Conclusion:
We can conclude from the data visualization graphs and β values that payment tier is one of
the most prevalent reasons why employees leave the company. Offering an employee retention
bonus is the ideal solution, but because of the company's limited budget it is not a good idea
to distribute the budget equally among all employees. Employees in the 0% to 20% (0 to 0.2)
category do not need to be incentivized because they will not leave the company anyway, and
employees in the 70% to 100% (0.7 to 1) category do not need to be incentivized because they
will leave the company regardless of whether they are incentivized. The highest retention rate
will be achieved by compensating employees in the 20% to 70% (0.2 to 0.7) category.
Also, because 90% of female employees are unsure or leaving the company, management should
develop a Female Employee Benefits policy to prevent female employees from quitting the
organization.
By utilizing modeling and visualization tools as data scientists, we were able to explain the
common factors behind employees leaving the company and to provide advice on how best to
allocate the limited budget, which categories of employees the employer should reward,
additional information on which types of employees are more likely to leave, and policy
suggestions. We are confident that these results will help the organization raise its
retention rate as far as its budget allows.