Business Analytics Using R: Final Group Presentation

This document summarizes a group project presentation on using business analytics and R to predict employee turnover. The group collected data on 15,000 employees, including satisfaction levels, evaluations, projects, hours worked, tenure, accidents, promotions, department, salary, and whether they left the company. They cleaned the data, split it into test and train sets, and used linear regression, decision trees, support vector machines, and random forests. Random forests achieved the highest accuracy of 99.27% at predicting employee turnover. Satisfaction level, evaluations, and number of projects were found to be the most important predictors of an employee leaving.

Uploaded by

Aninda Dutta

Business Analytics using R

Final Group Presentation

Group 6
Problem statement

• To determine the probability of an employee leaving a firm on the basis of past available data
Data collection

• Data collected from ‘datahack.analyticsvidya.com’ on 15000 employees

• Data includes:
• Employee number
• Satisfaction level
• Last evaluation
• Total number of projects
• Average monthly hours
• Time spent with the company
• Work accident
• Promotion in last 5 years (if any)
• Department
• Salary
• Whether the employee has left or not
Methodology
• Data cleaning: Cleaned the data and found no missing values
• Dividing into Test & Train data: Divided the data in a 30:70 ratio, 30% Test data and 70% Train data
• Linear Regression: Linear Regression on all the columns except Employee ID, since it has no correlation with our predicted column
• Decision Trees: To rank the variables in order of their significance
• Support Vector Machine: Radial SVM to determine the accuracy of our model
• Random Forests: To determine the accuracy of our model, because the accuracy of our model with SVM was very low
• Codes used: embedded Microsoft Word document
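The 30:70 split described in the Methodology can be sketched as follows. This is a minimal illustration in Python, not the group's R code, which is not reproduced in these slides. The function name `train_test_split_indices`, the fixed seed, and the use of exactly 15000 rows are assumptions made for the example.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.30, seed=42):
    """Shuffle row indices and split them into (train, test) lists.

    Hypothetical helper: the name, default fraction and seed are
    illustrative assumptions, not taken from the presentation.
    """
    indices = list(range(n_rows))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)  # 30% of the rows go to Test
    test = indices[:n_test]
    train = indices[n_test:]
    return train, test


# Assumed row count of 15000, as stated on the Data collection slide.
train_idx, test_idx = train_test_split_indices(15000)
print(len(train_idx), len(test_idx))
```

Shuffling before slicing matters here: if the rows are ordered (for example, by whether the employee left), taking the first 30% unshuffled would give a Test set that does not resemble the Train set.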
Findings
No missing values
[Figure: Decision Tree, prp plot]
[Figure: Decision Tree, rpart plot]
Random Forest

             Predicted 0   Predicted 1   Total
Actual 0            3442             7    3449
Actual 1              26          1024    1050
Total               3468          1031    4499
Precision         99.25%        99.32%
Recall            99.80%        97.52%
Accuracy: 99.27%
Support Vector Machine

             Predicted 0   Predicted 1   Total
Actual 0            3352            97    3449
Actual 1              90           960    1050
Total               3442          1057    4499
Precision         97.39%        90.82%
Recall            97.19%        91.43%
Accuracy: 95.84%
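The precision, recall and accuracy figures for Random Forest and Support Vector Machine follow directly from the four counts in each confusion matrix. The sketch below recomputes them in Python as a check on the arithmetic. It is not the group's R code, and the function name `classification_metrics` is a hypothetical helper introduced only for this illustration.

```python
def classification_metrics(matrix):
    """Compute accuracy plus per-class precision and recall.

    matrix[i][j] is the number of rows whose actual class is i and
    whose predicted class is j (rows = Actual, columns = Predicted).
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n_classes))
    precision, recall = [], []
    for k in range(n_classes):
        predicted_k = sum(matrix[i][k] for i in range(n_classes))  # column total
        actual_k = sum(matrix[k])                                  # row total
        precision.append(matrix[k][k] / predicted_k)
        recall.append(matrix[k][k] / actual_k)
    return {"accuracy": correct / total, "precision": precision, "recall": recall}


# Counts copied from the two confusion matrices in the slides.
random_forest = [[3442, 7], [26, 1024]]
svm = [[3352, 97], [90, 960]]

rf_metrics = classification_metrics(random_forest)
svm_metrics = classification_metrics(svm)

for name, m in (("Random Forest", rf_metrics), ("SVM", svm_metrics)):
    print(name, f"accuracy={m['accuracy']:.2%}")
    for k in (0, 1):
        print(f"  class {k}: precision={m['precision'][k]:.2%}, recall={m['recall'][k]:.2%}")
```

Precision divides the correct predictions for a class by the column total (everything predicted as that class), while recall divides by the row total (everything that actually belongs to that class). For class 1 (employees who left), the recall gap between the two models, 97.52% against 91.43%, is larger than the gap in overall accuracy.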
Inferences
• Linear Regression:
R-squared = 0.2082 with all the variables included, which is very low. Even so, linear regression shows that satisfaction level is the most important factor, along with last evaluation and number of projects.
• Decision tree:
Whether satisfaction level is above or below 0.47 is the most important criterion. After that, whether number of projects is >= 3 and whether time spent in the company is >= 5 are the next deciding criteria. Satisfaction level > 0.47 covers 72% of the data, and satisfaction level > 0.47 with time spent in the company < 5 covers 59% of the data.
• SVM:
SVM gave an accuracy of 95.84%, a little more than Decision Trees.
• Random Forests:
Random Forests gave an accuracy of 99.27%, the highest among all the models we used, so we are going with Random Forests to predict whether an employee will leave the organization or not.
Thank you
