
INDIRA GANDHI DELHI TECHNICAL UNIVERSITY FOR WOMEN

Department of Information Technology

Master of Computer Applications (MCA)

Internship Project Report On

Employee Attrition Prediction


Conducted by the Centre of Excellence-AI, IGDTUW, Delhi
(5th June – 16th July, 2022)

Submitted to:                              Submitted by:

Ms. Charu Gupta                            Chandra Kiran Verma (01304092021)
IT Dept, IGDTUW                            Tisha Chopra (06604092021)
                                           MCA 2nd Year, IGDTUW
ACKNOWLEDGEMENT
We would like to express our gratitude to the AI-ML club of Indira Gandhi Delhi Technical
University for Women for giving us the wonderful opportunity to work on this project,
Employee Attrition Prediction, with proper guidance and workshops. We express our deepest
thanks to Dr. Ritu Jangra for giving us the necessary advice and guidance. Through sessions
conducted by industry professionals from various sectors, including agriculture, healthcare,
data science and more, we were also exposed to an industrial perspective on applications of
AI and ML and how AI is revolutionising business processes. We will strive to use the skills
and knowledge we gained in the best possible way, and will continue to work on their
improvement in order to attain our desired career objectives. Overall, there was a lot of
learning in each session, for which we are really thankful to the AI-ML club.

Chandra Kiran Verma (01304092021)

Tisha Chopra (06604092021)


CERTIFICATE
ABSTRACT
Employee attrition refers to reduction in the number of employees or staff members in an
organisation. It occurs when an employee leaves and isn’t replaced at all or for a significant
amount of time, resulting in a reduction of the workforce.

This project aims to predict the rate of employee attrition using Logistic Regression, Random
Forest classifier and Decision Tree. We use work-life balance, employee performance, standard
working hours and the number of years spent in the company, among other attributes, as our
features. The results of this research showed the superiority of Logistic Regression in terms of
accuracy and predictive effectiveness, as measured by the ROC curve.
Considering all the factors, we achieved a maximum accuracy of 88.66% using Logistic
Regression. The results of this research demonstrate that the Logistic Regression classifier is a
superior algorithm in terms of significantly higher accuracy, relatively low runtime and a good
F1-score. For these reasons it is recommended to use logistic regression for accurately
predicting employee turnover, giving organisations the insight needed to take necessary action.
INDEX

SNO. CONTENTS
I. Introduction
II. Literature review
III. Research Methodology
III A. Decision Tree
III B. Logistic Regression
III C. Random Forest
IV. Data Collection
V. Data Analysis
V A. Data pre-processing
V B. Feature selection
V C. Model Validation
VI. Results
VII. Discussion and Future Work
VIII. Conclusion
IX. References
I. INTRODUCTION

Employee attrition is a major problem faced by all organizations. Whenever a well-trained
and well-adapted employee leaves the organization, it creates a vacuum. Retaining employees
is a critical and ongoing effort. If the situation is not handled properly, it can lead to a decrease
in productivity, and the efficiency of work is hampered to a large extent. The higher the
attrition rate, the more costs the organization incurs to recruit, induct, place and train new
employees. The organization may have to employ new people and train them on the tools
being used, which is time consuming.

Many factors, such as organization size, location, policies and procedures, also have an impact
on employee attrition. This project involves a comparative study of various algorithms using
model evaluation metrics such as accuracy and F-measure. The data is imported from Kaggle
in the form of a CSV ("Comma-Separated Values") file, a format for saving tabular data, such
as spreadsheets, that is convenient for large datasets and easy to load into programs.

This project gives a brief overview of the employee turnover problem, the importance of
solving it, and the work done by others using machine learning algorithms to solve it. It
explores three different machine learning algorithms, namely decision tree, logistic regression
and random forest classifier, and outlines the experimental method employed in terms of the
features used, the pre-processing, and the metrics used to compare the algorithms. It presents
the results of the comparison and a discussion of them, along with possible future work, and
concludes by recommending Logistic Regression as an approach to the employee attrition
prediction problem.
II. LITERATURE REVIEW

Attrition is the gradual reduction in the number of employees through retirement, resignation
or demise. It is also referred to as employee turnover or employee defection. Most literature
on employee attrition categorizes it as either voluntary or involuntary. Involuntary attrition is
attributed to shortcomings on the employee's part and refers to the organization dismissing the
employee for various reasons. Voluntary attrition is when a worker leaves an organization of
their own free will, for example by leaving a current job for a new job elsewhere or by retiring.
It was found that the strongest predictors of voluntary turnover were age, pay, tenure, overall
job satisfaction, and the employee's perception of fairness. Other similar research findings
suggested that personal or demographic variables, specifically age, gender, ethnicity, education,
and marital status, were important factors in the prediction of voluntary employee turnover.
Other studies showed that several further features, such as working conditions, job satisfaction,
and growth potential, also contributed to voluntary attrition. Employees are a crucial resource
for any organization, and hence the withdrawal of productive employees can affect an
organization in various respects.

Some of the consequences of employee attrition are the cost of staffing and training new
employees, an increased burden on existing employees, and a decline in the performance of
the organization. We therefore tackle this problem by applying machine learning techniques
to predict turnover, giving organizations the insight needed to take necessary action.
III. Research Methodology
Supervised learning is a subcategory of machine learning and artificial intelligence. It is
defined by its use of labelled datasets to train algorithms to classify data or predict outcomes
accurately. This section outlines the theory behind each machine learning algorithm used. The
dataset contains information about employees such as monthly income, total working years,
gender, nature of work, position, education and salary. Overall, there are 35 features recorded
in the dataset. However, not all features were essential or useful for our analysis. For example,
standard weekly hours were not suitable because all records had the same value, so such
features were discarded from the analysis. Once the features to keep have been selected, the
data selection and cleansing step is complete.
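As an illustration, constant-valued columns such as standard weekly hours can be detected and dropped with a short pandas check. This is only a sketch; the file name is an assumption and is not taken from the report's own code.

```python
import pandas as pd

df = pd.read_csv("employee_attrition.csv")  # hypothetical local copy of the Kaggle CSV

# Columns with a single unique value (e.g. StandardHours) carry no
# information for the model and can be dropped.
constant_cols = [col for col in df.columns if df[col].nunique() == 1]
df = df.drop(columns=constant_cols)
print("Dropped constant columns:", constant_cols)
```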

A. Decision Tree
Decision tree classifiers are regarded as among the most well-known methods of data
classification. A decision tree is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and each leaf
node (terminal node) holds a class label.

Conventionally, a decision tree is used for making Boolean decisions, in which the splitting
power of an attribute is computed as its information gain, which in turn is computed as its
entropy reduction. Decision tree classifiers are valued for the interpretable view they give of
the decision process. Because of their strong precision, optimised splitting criteria and tree
pruning techniques, algorithms such as ID3, C4.5, CART, CHAID and QUEST are widely
used decision tree classifiers.

Fig 1

In the case of pursuing Boolean decisions, the entropy of a data set is computed as:

H(Set) = −P1 × log2(P1) − P2 × log2(P2)

where P1 is the proportion of the first decision and P2 is the proportion of the second decision.

The information gain of an attribute A is computed as:

Gain(A) = H(Set) − (w1 × H(a1) + w2 × H(a2) + ... + wm × H(am))

where a1, a2, ..., am are the different values of attribute A, and w1, w2, ..., wm are the weights
of the subsets split by using the values of attribute A.
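To make these formulas concrete, a minimal Python sketch is shown below. The column names ("OverTime", "Attrition") and file name are assumptions used only for illustration, not taken from the report's own code.

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """H(Set) = -sum over classes of P_i * log2(P_i)."""
    proportions = labels.value_counts(normalize=True)
    return float(-(proportions * np.log2(proportions)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    """Gain(A) = H(Set) - sum_j w_j * H(subset where A == a_j),
    for a categorical attribute A."""
    weighted_entropy = 0.0
    for _, subset in df.groupby(attribute):
        weight = len(subset) / len(df)              # w_j
        weighted_entropy += weight * entropy(subset[target])
    return entropy(df[target]) - weighted_entropy

# Hypothetical usage on the attrition dataset:
# df = pd.read_csv("employee_attrition.csv")
# print(information_gain(df, "OverTime", "Attrition"))
```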

B. Logistic Regression
Logistic regression is a supervised classification algorithm. In a classification problem, the
target variable (or output), y, can take only discrete values for a given set of features (or
inputs), X. It is often used with regularisation in the form of penalties based on the L1-norm
or L2-norm to avoid over-fitting. An L2-regularised logistic regression is used in this paper.
This technique obtains the posterior probabilities by assuming a model for them and estimating
the parameters involved in the assumed model.

Fig 2
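A minimal sketch of an L2-regularised logistic regression with scikit-learn is given below. The file name, the 70/30 split and the hyperparameter values are assumptions for illustration, not the exact settings used in the report.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("employee_attrition.csv")  # hypothetical local copy of the Kaggle CSV
X = pd.get_dummies(df.drop(columns=["Attrition"]), drop_first=True)
y = (df["Attrition"] == "Yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# penalty="l2" applies the L2 regularisation; C is the inverse regularisation strength.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```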

C. Random Forest
This algorithm is a popular tree-based ensemble learning technique. In the ensemble-learning
approach, a single prediction model is created by combining more than one classifier. The
type of ensembling used here is bagging. In bagging, successive trees do not depend on earlier
trees: each is independently constructed using a different bootstrap sample of the data set.
In the end, a simple majority vote is taken for the prediction. Random forests differ from
standard trees in that, for the latter, each node is split using the best split among all variables,
whereas in a random forest each node is split using the best among a subset of predictors
randomly chosen at that node. This additional layer of randomness makes the method robust
against overfitting. A random forest is an ensemble of decision trees and is therefore expected
to perform better and give higher accuracy.

Fig 3
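A sketch of such a bagged forest with per-node random feature subsets, using scikit-learn and reusing the X_train/X_test split from the previous sketch (the hyperparameter values are illustrative assumptions):

```python
from sklearn.ensemble import RandomForestClassifier

# bootstrap=True: each tree is grown on a different bootstrap sample (bagging).
# max_features="sqrt": each split considers a random subset of the predictors.
forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",
    bootstrap=True,
    random_state=42,
)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)  # majority vote across the trees
```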
IV. Data Collection
The data was pulled from a sample dataset on Kaggle in the form of a CSV file. There were
around 1470 observations, each observation corresponding to an employee, with 35 different
attributes: Business Travel, Daily Rate, Department, Distance From Home, Marital Status,
Monthly Income, Number of Companies Worked, Over18, Over Time, Percent Salary Hike,
Performance Rating, Relationship Satisfaction, Standard Hours, Stock Option Level, Education
Field, Environment Satisfaction, Gender, Hourly Rate, Job Involvement, Job Level, Job Role,
Job Satisfaction, Total Working Years, Training Times Last Year, Work-Life Balance, Years
At Company, Years In Current Role, Years Since Last Promotion and Years With Current
Manager. The dataset included various important features such as overtime, standard working
hours, total working years, years spent in the company and years since the last promotion.
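A sketch of loading and inspecting the data with pandas (the file name is an assumption; the CSV would first be downloaded from Kaggle):

```python
import pandas as pd

df = pd.read_csv("employee_attrition.csv")  # hypothetical local file name
print(df.shape)                  # roughly (1470, 35) for this dataset
print(df.columns.tolist())       # the 35 attributes listed below
print(df["Attrition"].value_counts())
```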

The factors that are included in this dataset are the following:

1. Age

2. Attrition: Yes/No

3. BusinessTravel: Non-Travel / Travel-Rarely / Travel-Frequently

4. DailyRate: The daily rate paid to the employee

5. Department: There are 3 departments, namely Human Resources, Research & Development and Sales

6. DistanceFromHome

7. Education: The level of education, coded as 1 ‘Below College’, 2 ‘College’, 3 ‘Bachelor’, 4 ‘Master’, 5 ‘Doctor’

8. EducationField: The education field of the employee, among Human Resources, Life Sciences, Marketing, Medical, Other and Technical Degree

9. EmployeeCount: The number of employees in the record

10. EmployeeNumber: A unique code for each employee

11. EnvironmentSatisfaction: The environment satisfaction rating given by the employee, as 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’

12. Gender: Male/Female

13. HourlyRate: The hourly rate paid to the employee

14. JobInvolvement: The degree to which the employee is involved in the job, given as 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’

15. JobLevel: The job hierarchy level of the employee, rated as 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’

16. JobRole: The job role of the employee, among Healthcare Representative, Human Resources, Laboratory Technician, Manager, Manufacturing Director, Research Director, Research Scientist, Sales Executive and Sales Representative

17. JobSatisfaction: The job satisfaction rating of the employee, given as 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’

18. MaritalStatus: Whether the employee is Married, Divorced or Single

19. MonthlyIncome: The monthly income of the employee in Rs

20. MonthlyRate: The monthly rate of the employee paid by the company in Rs

21. NumCompaniesWorked: The total number of companies the employee has worked for

22. Over18: Whether the employee is above 18 years of age

23. OverTime: Whether the employee does overtime

24. PercentSalaryHike: The percentage increase in salary given to the employee

25. PerformanceRating: The performance rating given to the employee, as 1 ‘Low’, 2 ‘Good’, 3 ‘Excellent’, 4 ‘Outstanding’

26. RelationshipSatisfaction: The relationship satisfaction rating given by the employee, as 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’

27. StandardHours: The standard hours for an employee per week

28. StockOptionLevel: The stock option level of the employee

29. TotalWorkingYears: The total working years of the employee

30. TrainingTimesLastYear: The number of times the employee was given training last year

31. WorkLifeBalance: The work-life balance rating given by the employee, as 1 ‘Bad’, 2 ‘Good’, 3 ‘Better’, 4 ‘Best’

32. YearsAtCompany: The total number of years the employee has been at the company

33. YearsInCurrentRole: The total number of years the employee has been in the current role

34. YearsSinceLastPromotion: The total number of years since the last promotion

35. YearsWithCurrManager: The total number of years the employee has been with the current manager


V. Data Analysis

A. Data pre-processing

The process of transforming raw data into a format suitable for modelling is called data
pre-processing. Real-world data is often incomplete and inconsistent.

It involves the following steps: getting the dataset, importing libraries, importing the dataset,
handling missing data, encoding categorical data, splitting the dataset into training and test
sets, and feature scaling.
Heat map representing the employee attrition data:

A heat map is a two-dimensional representation of data in which values are represented by
colours. A simple heat map provides an immediate visual summary of information.
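A correlation heat map of the numeric columns can be drawn with seaborn, for example. This is a sketch assuming df holds the loaded dataset as in the earlier snippets.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Correlation heat map of the numeric attributes.
numeric_corr = df.select_dtypes(include="number").corr()
plt.figure(figsize=(12, 10))
sns.heatmap(numeric_corr, cmap="coolwarm")
plt.title("Correlation heat map of the employee attrition data")
plt.tight_layout()
plt.show()
```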

For categorical variables, missing values were imputed using the mode of that field. Binary
categorical values, which were either ‘true’ or ‘false’, were converted to 0 and 1 respectively.
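A sketch of these pre-processing steps (mode imputation for categorical fields, 0/1 encoding of binary fields, a train/test split and feature scaling); the file name and parameter choices are assumptions, not the report's exact code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("employee_attrition.csv")  # hypothetical local file name

# Impute missing categorical values with the mode of each column.
for col in df.select_dtypes(include="object"):
    df[col] = df[col].fillna(df[col].mode()[0])

# Encode the binary target as 0/1 and one-hot encode the remaining categoricals.
df["Attrition"] = (df["Attrition"] == "Yes").astype(int)
df = pd.get_dummies(df, drop_first=True)

X, y = df.drop(columns=["Attrition"]), df["Attrition"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Feature scaling, fitted on the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```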

B. Feature selection

The input variables that we give to our machine learning models are called features; each
column in our dataset constitutes a feature. To train an optimal model, we need to make sure
that we use only the essential features. If we have too many features, the model can capture
unimportant patterns and learn from noise. The method of choosing the important parameters
of our data is called feature selection. Popular methods for feature selection are correlation
analysis, chi-square analysis, exploratory bivariate analysis and information value analysis.
Correlation analysis is used for the numeric variables and chi-square analysis for the
categorical variables. A high correlation or chi-square value indicates that a feature is
significant.
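A minimal sketch of correlation analysis for the numeric columns and chi-square tests for the categorical ones, using pandas and scipy; the file and column names are assumptions.

```python
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("employee_attrition.csv")  # hypothetical local file name
target = (df["Attrition"] == "Yes").astype(int)

# Correlation of each numeric feature with the attrition label.
numeric_corr = df.select_dtypes(include="number").corrwith(target).abs()
print(numeric_corr.sort_values(ascending=False).head(10))

# Chi-square test of each categorical feature against attrition.
for col in df.select_dtypes(include="object").columns.drop("Attrition"):
    table = pd.crosstab(df[col], df["Attrition"])
    chi2, p_value, _, _ = chi2_contingency(table)
    print(f"{col}: chi2 = {chi2:.1f}, p = {p_value:.4f}")
```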

Visualising numerical columns


C. Model Validation

The dataset was split 70-30 into training and testing sets. The trained model was then used to
predict on the 30% test set. The chosen model validation technique is the area under the
receiver operating characteristic curve (ROC-AUC). The ROC curve is a visualization used to
evaluate the performance of different machine learning models: it plots the true positive rate
against the false positive rate at different classification thresholds. Additionally, the accuracy
and F1 scores of the classifiers are used to compare the models. These metrics are important
because they clearly show how suitable a model is for use in an application.
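A sketch of this evaluation with scikit-learn metrics, reusing the fitted model and the 70/30 split from the earlier snippets (variable names are assumptions):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, roc_auc_score)

y_pred = model.predict(X_test)                # hard class predictions
y_prob = model.predict_proba(X_test)[:, 1]    # predicted probability of attrition

print("Accuracy :", accuracy_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```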

1: RANDOM FOREST CLASSIFIER

Random forest is an ensemble of decision trees and is expected to perform better and hence
give higher accuracy. The model evaluation metrics for this particular model are shown below:

MODEL EVALUATION:
CONFUSION MATRIX :

ROC CURVE:

2: LOGISTIC REGRESSION

This technique obtains the posterior probabilities by assuming a model for them and
estimating the parameters involved in the assumed model. The model evaluation metrics for
this particular model are shown below:
MODEL EVALUATION:

Confusion matrix:

ROC Curve of the model:

3: DECISION TREE
Decision tree classifiers are regarded as among the most well-known methods of data
classification. Recall that the entropy of a data set is computed as:

H(Set) = −P1 × log2(P1) − P2 × log2(P2)

where P1 is the proportion of the first decision and P2 is the proportion of the second decision,
and the information gain of an attribute A is computed as:

Gain(A) = H(Set) − (w1 × H(a1) + w2 × H(a2) + ... + wm × H(am))

where a1, a2, ..., am are the different values of attribute A, and w1, w2, ..., wm are the weights
of the subsets split by using the values of attribute A.

The model evaluation metrics for this particular model are shown below:

MODEL EVALUATION :

Confusion matrix:
ROC Curve of the model:
VI. RESULTS

Comparing the models on the basis of accuracy and F1 Score:

Algorithm                 Training Accuracy    Testing Accuracy    F1 Score

1. Decision Tree                1.00                 0.79            0.35

2. Logistic Regression          0.87                 0.88            0.51

3. Random Forest                0.98                 0.85            0.21


VII. Discussion and Future Work

Using the different models, we were able to justify that the chosen features are causes that
contribute to voluntary attrition. Intuitively, we need to estimate the probability of event
success and event failure (attrition). Logistic regression is designed for exactly this setting,
where the dependent variable is binary (0/1, True/False, Yes/No) in nature, and this is the basis
for choosing logistic regression as the best approach in this paper. The logistic regression
algorithm achieves good ROC-AUC and accuracy values. Future work might include modifying
the algorithm or using other algorithms such as an XGBoost classifier or a Support Vector
Machine and checking their accuracy.
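If the comparison were extended in that direction, a support vector machine could be dropped into the same pipeline as the other classifiers; a sketch with scikit-learn's SVC, reusing the earlier split (the kernel and hyperparameters are illustrative assumptions):

```python
from sklearn.svm import SVC

# probability=True enables predict_proba, needed for the ROC-AUC comparison.
svm_model = SVC(kernel="rbf", C=1.0, probability=True, random_state=42)
svm_model.fit(X_train, y_train)
print("SVM test accuracy:", svm_model.score(X_test, y_test))
```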
VIII. CONCLUSION

The importance of predicting employee attrition using machine learning algorithms such as
Logistic Regression, Random Forest Classifier and Decision Tree was presented in this project.
The results of this research showed the superiority of Logistic Regression in terms of accuracy
and predictive effectiveness, as measured by the ROC curve. Data from the dataset was used to
compare Logistic Regression against two other supervised classifiers that have historically been
used to build turnover models. Considering all the factors, we achieved a maximum accuracy of
89.54% with Logistic Regression. The results of this research demonstrate that the Logistic
Regression classifier is a superior algorithm in terms of significantly higher accuracy, relatively
low runtime and a good F1-score. For these reasons it is recommended to use logistic regression
for accurately predicting employee turnover, giving organizations the insight needed to take
necessary action.

In this project, the importance of employee attrition in organizations and the proper use of
machine learning algorithms, through model evaluation, was presented. The focus was on using
different algorithms and studying their performance. Random Forest Classifier, Decision Tree
and Logistic Regression were applied to the employee data to predict employee turnover. The
use of regularization helps logistic regression stand out compared to the other models and
classifiers. The accuracy of the Logistic Regression model was analysed to be 88.66%. Hence,
comparing the predictive machine learning algorithms on the same dataset reveals that Logistic
Regression outperforms the others when accuracy is the preferred metric.
IX. REFERENCES
[1] A Review on Employee’s Voluntary Turnover: Psychological Perspective, 2020.

[2] Supervised Learning - A Systematic Literature Review, 2021.

[3] Classification Based on Decision Tree Algorithm for Machine Learning, 2021.

[4] An Introduction to Logistic Regression Analysis and Reporting. The Journal of Educational Research, Volume 96, Issue 1, 2002.

[5] A Meta-Analysis of Research in Random Forests for Classification, 2016.

[6] Pre-Processing: A Data Preparation Step, 2018.

[7] A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning, 2014.

[8] Machine Learning Algorithm Validation, 2020.

[9] Introduction to ROC Analysis, 2006.
