Employee Attrition Analysis
Problem Statement
The primary problem addressed in this report is predicting employee attrition within an organization.
Employee attrition, or turnover, refers to employees leaving a company for any reason, and
high attrition can harm productivity, morale, and financial stability.
Understanding and predicting which employees are at risk of leaving helps organizations implement
proactive retention strategies and reduce overall turnover.
Dataset Description
The dataset used for this analysis contains records of employees' demographic, job-related, and
satisfaction attributes. It consists of 59,598 rows and 24 columns, of which 11 were selected for this
analysis (10 predictor features plus the target variable). The selected columns are:
Age: Employee age (integer).
Job Satisfaction: Level of job satisfaction (categorical, encoded).
Monthly Income: Employee's monthly income (integer).
Number of Dependents: Number of dependents reported by the employee (integer).
Company Tenure: Number of years the employee has been at the company (integer).
Job Level: Job level within the company (categorical, encoded).
Education Level: Level of education achieved by the employee (categorical, encoded).
Work-Life Balance: Self-reported work-life balance (categorical, encoded).
Years at Company: Total number of years spent at the company (integer).
Gender: Gender of the employee (categorical, encoded).
Attrition: Target variable, indicating whether the employee has left or stayed at the company
(categorical, 'Stayed' or 'Left').
Categorical features were encoded with Label Encoder, which replaces each category with an integer
code, so that the prediction model could use them.
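A minimal sketch of this encoding step, assuming scikit-learn's LabelEncoder is the "Label Encoder" referred to above. The category strings are made-up toy values, not taken from the dataset; one encoder is fitted per column.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up values for illustration only; not taken from the dataset.
work_life_balance = ["Good", "Poor", "Excellent", "Fair", "Good"]
attrition = ["Stayed", "Left", "Stayed"]

# One encoder per column: fit_transform learns the column's categories
# (stored in sorted order in classes_) and replaces each value with its index.
wlb_encoder = LabelEncoder()
wlb_codes = wlb_encoder.fit_transform(work_life_balance)

target_encoder = LabelEncoder()
target_codes = target_encoder.fit_transform(attrition)

print(wlb_encoder.classes_.tolist(), wlb_codes.tolist())
# ['Excellent', 'Fair', 'Good', 'Poor'] [2, 3, 0, 1, 2]
print(target_encoder.classes_.tolist(), target_codes.tolist())
# ['Left', 'Stayed'] [1, 0, 1]
```

Because LabelEncoder orders categories alphabetically, the codes for an ordinal feature such as Work-Life Balance do not follow its natural order (Poor < Fair < Good < Excellent). An explicit mapping, or scikit-learn's OrdinalEncoder with its `categories` parameter set, would preserve that order. scikit-learn's documentation also describes LabelEncoder as intended for target labels rather than input features.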
Solution Approach
This analysis treats attrition prediction as a binary classification problem. The dataset was first
split into a training set (60% of the rows) and a testing set (40%). The ten selected features served
as predictors, and Attrition was the target variable.
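The 60% / 40% split can be sketched as follows, assuming scikit-learn's `train_test_split`. Row indices stand in for the real feature matrix and target. The `random_state` value is an assumption, because the report does not state a seed.

```python
from sklearn.model_selection import train_test_split

# One index per row of the 59,598-row dataset; in practice X (the predictor
# columns) and y (Attrition) would be passed instead of these indices.
row_indices = list(range(59_598))

# test_size=0.4 gives the 60% / 40% train/test split used in the report.
# random_state=42 is an assumption; the report does not state a seed.
train_idx, test_idx = train_test_split(row_indices, test_size=0.4, random_state=42)

print(len(train_idx), len(test_idx))  # 35758 23840
```

scikit-learn rounds the test size up, so 40% of 59,598 rows becomes 23,840 test rows and 35,758 training rows. Passing `stratify=y` would keep the Left/Stayed proportions the same in both sets.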
A Decision Tree Classifier was chosen because it is easy to interpret and works with both numeric and
encoded categorical features without requiring feature scaling. The classifier was trained on the
training set, and the max_depth hyperparameter was set to limit the tree's complexity and reduce
overfitting. The final model had a maximum depth of 3.
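A minimal sketch of training a depth-limited tree, assuming scikit-learn's `DecisionTreeClassifier` (the `max_depth` name suggests it, but the report does not say so). The training rows and the three feature columns are made-up toy values, and `random_state` is an added assumption.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy, made-up rows: [Age, Monthly Income, Job Satisfaction (encoded)].
X_train = [
    [25, 3000, 0], [45, 9000, 2], [31, 4000, 1], [52, 12000, 2],
    [23, 2500, 0], [38, 7000, 1], [29, 3500, 0], [60, 11000, 2],
]
y_train = ["Left", "Stayed", "Left", "Stayed", "Left", "Stayed", "Left", "Stayed"]

# max_depth=3 caps the tree at three levels of splits, as in the report.
# random_state=42 is an assumption added for reproducibility.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

print("depth:", model.get_depth())  # never more than 3
# export_text prints the learned if/else rules, which is what makes a shallow tree interpretable.
print(export_text(model, feature_names=["Age", "Monthly Income", "Job Satisfaction"]))
print(model.predict([[24, 2800, 0], [50, 10000, 2]]))
```

On this toy data a single split already separates the classes, so the fitted tree stops before reaching the depth limit. On the real data a depth of 3 allows at most eight leaves.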
The model was evaluated on the test set using accuracy, error rate, and a confusion matrix:
The accuracy of the model was 0.64 (64%), meaning that it correctly classified 64% of the
employees in the test set.
The error rate, computed as 1 - accuracy, was 0.36 (36%), the proportion of test predictions
that were incorrect.
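The sketch below shows how these metrics relate, assuming scikit-learn's `accuracy_score` and `confusion_matrix`. The label lists are made-up toy values and are not the report's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration; these are not the report's predictions.
y_true = ["Left", "Left", "Stayed", "Stayed", "Left",
          "Stayed", "Stayed", "Left", "Stayed", "Stayed"]
y_pred = ["Left", "Stayed", "Stayed", "Left", "Left",
          "Stayed", "Stayed", "Stayed", "Stayed", "Left"]

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions
error_rate = 1 - accuracy                  # fraction of incorrect predictions

# Rows are true classes and columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=["Left", "Stayed"])

print(f"accuracy={accuracy:.2f} error_rate={error_rate:.2f}")  # accuracy=0.60 error_rate=0.40
print(cm)
```

Accuracy equals the sum of the confusion matrix's diagonal divided by its total. The report's figures follow the same relation: 1 - 0.64 = 0.36. Accuracy is most informative when compared with the accuracy of always predicting the more common class.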
Results
The Decision Tree Classifier was moderately effective at predicting employee attrition, but its 64%
accuracy leaves clear room for improvement.
The confusion matrix shows a substantial number of both false positives and false negatives,
suggesting that the model struggled to distinguish employees who would leave from those who
would stay.
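To read false positives and false negatives off a confusion matrix, a positive class has to be chosen. The sketch below assumes "Left" is the positive class and uses scikit-learn's `confusion_matrix`. The label lists are made-up toy values, not the report's results.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration; not the report's results.
y_true = ["Left", "Left", "Left", "Stayed", "Stayed", "Stayed", "Stayed", "Left"]
y_pred = ["Left", "Stayed", "Stayed", "Stayed", "Left", "Stayed", "Stayed", "Left"]

# Listing "Stayed" first makes "Left" the positive class, so ravel() returns
# the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=["Stayed", "Left"]).ravel()

# fp: predicted to leave but actually stayed (retention effort spent unnecessarily).
# fn: predicted to stay but actually left (at-risk employees the model missed).
recall_left = tp / (tp + fn)     # share of actual leavers the model caught
precision_left = tp / (tp + fp)  # share of "Left" predictions that were correct

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=2 TP=2
print(f"recall(Left)={recall_left:.2f} precision(Left)={precision_left:.2f}")
```

For retention planning, false negatives are usually the costlier error, because they are employees who leave without being flagged. Recall on the "Left" class measures how many such employees the model catches.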
Conclusion
In conclusion, this analysis used a Decision Tree Classifier to predict employee attrition. With the
selected features, the model reached moderate accuracy (64%), which leaves room for further model
optimization.
The company could also use the model's predictions to identify employees at high risk of attrition
and develop targeted retention strategies, such as personalized development opportunities or
work-life balance initiatives.
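One way to turn the model into a risk list is to rank current employees by their predicted probability of leaving. The sketch below assumes scikit-learn's `predict_proba`. The training rows, the employee IDs `E1` to `E3`, and their feature values are all hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training rows: [Age, Job Satisfaction (encoded), Work-Life Balance (encoded)].
X_train = [[24, 0, 0], [48, 2, 2], [30, 0, 1], [55, 2, 2], [27, 1, 0], [41, 2, 1]]
y_train = ["Left", "Stayed", "Left", "Stayed", "Left", "Stayed"]
model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# Hypothetical current employees to score (IDs and values are illustrative).
current = {"E1": [26, 0, 0], "E2": [50, 2, 2], "E3": [35, 1, 1]}

# predict_proba columns follow model.classes_, so look up the "Left" column explicitly.
left_col = model.classes_.tolist().index("Left")
risk = {emp: model.predict_proba([row])[0][left_col] for emp, row in current.items()}

# Rank employees from highest to lowest estimated probability of leaving.
for emp, p in sorted(risk.items(), key=lambda kv: kv[1], reverse=True):
    print(emp, round(float(p), 2))
```

A depth-3 tree has at most eight leaves, so it can produce at most eight distinct probability values and many employees will tie. The cut-off for "high risk" is a business decision that the model itself does not provide.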