Business Analytics Using R: Final Group Presentation
Group 6
Problem statement
• Data includes:
• Employee number
• Satisfaction level
• Last evaluation
• Total number of projects
• Average monthly hours
• Time spent with the company
• Work accident
• Promotion in the last 5 years (if any)
• Department
• Salary
• Whether the employee has left or not
Methodology
Data cleaning: Checked and cleaned the data; no missing values were found
Dividing into Test & Train data: Split the data in a 30:70 ratio, 30% Test data and 70% Train data
Linear Regression: Fitted on all the columns except Employee ID, since it has no correlation with our predicted column
Decision Trees: To rank the variables in order of their significance
Support Vector Machine: Radial SVM to determine the accuracy of our model
Random Forests: To improve on the accuracy we obtained with the SVM
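The 30:70 split described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `hr`, its columns, the seed, and the use of `sample()` are hypothetical stand-ins, since the slides do not show the actual data frame or the split function that was used.

```r
# Hypothetical stand-in for the employee data; the real data frame and its
# column names are not shown in the slides.
set.seed(42)  # assumed seed, only so the sketch is reproducible
hr <- data.frame(id = 1:100, left = rep(c(0, 1), times = c(76, 24)))

# Draw 30% of the row indices at random for the test set.
n_test   <- floor(0.30 * nrow(hr))
test_idx <- sample(seq_len(nrow(hr)), size = n_test)

test  <- hr[test_idx, ]   # 30% Test data
train <- hr[-test_idx, ]  # remaining 70% Train data

nrow(train)  # 70
nrow(test)   # 30
```

Fixing the seed before sampling keeps the same rows in Test and Train on every run, which makes accuracy figures from different models comparable.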
Codes used
[Embedded Microsoft Word document]
Findings
No missing values
Decision Tree: prp plot [figure]
Decision Tree: rpart plot [figure]
Random Forest
            Predicted 0   Predicted 1   Total
Actual 0           3442             7    3449
Actual 1             26          1024    1050
Total              3468          1031    4499

Precision (class 0, class 1): 99.25%, 99.32%
Recall (class 0, class 1): 99.80%, 97.52%
Accuracy: 99.27%
Support Vector Machine
            Predicted 0   Predicted 1   Total
Actual 0           3352            97    3449
Actual 1             90           960    1050
Total              3442          1057    4499

Precision (class 0, class 1): 97.39%, 90.82%
Recall (class 0, class 1): 97.19%, 91.43%
Accuracy: 95.84%
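The precision, recall, and accuracy figures reported for Random Forest and Support Vector Machine can be re-derived from the confusion-matrix counts. The sketch below uses base R only; the helper name `class_metrics` is hypothetical and is not taken from the group's code. Precision divides each diagonal count by its column (predicted) total, recall divides it by its row (actual) total, and accuracy divides the sum of the diagonal by the grand total.

```r
# Confusion-matrix counts copied from the slides
# (rows = actual class, columns = predicted class).
rf  <- matrix(c(3442, 7, 26, 1024), nrow = 2, byrow = TRUE,
              dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
svm <- matrix(c(3352, 97, 90, 960), nrow = 2, byrow = TRUE,
              dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Hypothetical helper: per-class precision and recall plus overall
# accuracy, all in percent and rounded to two decimals.
class_metrics <- function(cm) {
  list(
    precision = round(100 * diag(cm) / colSums(cm), 2),
    recall    = round(100 * diag(cm) / rowSums(cm), 2),
    accuracy  = round(100 * sum(diag(cm)) / sum(cm), 2)
  )
}

rf_metrics  <- class_metrics(rf)
svm_metrics <- class_metrics(svm)

rf_metrics$accuracy   # 99.27
svm_metrics$accuracy  # 95.84
```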
Inferences
• Linear Regression Values:
R-squared = 0.2082, which is very low given that all the variables were considered. Even so, the linear regression
shows that satisfaction level is the most important factor. Last evaluation, number of projects
• Decision tree:
Whether satisfaction level is above or below 0.47 is the most important criterion. After that, number of projects
(>=3 or fewer) and time spent in the company (>=5 or less) are the next deciding criteria. Satisfaction level >0.47
covers 72% of the data, and satisfaction level >0.47 together with time spent in the company <5 covers 59% of the data.
• SVM:
With SVM we got an accuracy of 95.84%, slightly higher than with Decision Trees.
• Random Forests:
Random Forests gave an accuracy of 99.27%, the highest among all the models that we used.
Since Random Forests has the highest accuracy, we are going with Random Forests
to predict whether an employee will leave the organization or not.
Thank you