From Big Data To Deep Data To Support People Analytics For Employee Attrition Prediction
ABSTRACT In the era of data science and big data analytics, people analytics help organizations and their
human resources (HR) managers to reduce attrition by changing the way of attracting and retaining talent.
In this context, employee attrition presents a critical problem and a major risk for organizations, as it affects
not only their productivity but also their planning continuity. The salient contributions of
this research are as follows. Firstly, we propose a people analytics approach to predict employee attrition
that shifts from a big data to a deep data context by focusing on data quality instead of its quantity. In fact,
this deep data-driven approach is based on a mixed method to construct a relevant employee attrition model
in order to identify key employee features influencing his/her attrition. In this method, we started thinking
‘big’ by collecting most of the common features from the literature (an exploratory research) then we tried
thinking ‘deep’ by filtering and selecting the most important features using survey and feature selection
algorithms (a quantitative method). Secondly, this attrition prediction approach is based on machine, deep
and ensemble learning models and is evaluated on a medium-sized and a large-sized simulated human
resources dataset and then on a real small-sized dataset of 450 responses. Our approach achieves
higher accuracy (0.96, 0.98 and 0.99 respectively) on the three datasets than previous solutions.
Finally, while rewards and payments are generally considered as the most important keys to retention, our
findings indicate that ‘business travel’, which is less common in the literature, is the leading motivator for
employees and must be considered within HR retention policies.
INDEX TERMS Deep people analytics, employee attrition, retention, prediction, interpretation, policies
recommendation.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 60447
N. B. Yahia et al.: From Big Data to Deep Data to Support People Analytics for Employee Attrition Prediction
whether employees will leave or not can help the organization improve HR management and save on its
cost [4]. Therefore, it is crucial for HR managers to have a better idea of what kind of employees
tend to leave and what kind of features influence them to leave [6]. Most commonly, organizations
want to make sure that the right employees are in the right place at the right time, and to identify
employees’ intention to leave by means of analytics [7]. Descriptive analytics are used to summarize
data or turn it into relevant information in order to investigate what has occurred. In other words,
descriptive analytics have a meaningful impact by explaining what has already happened; however,
they are not very helpful in predicting what will or may happen in the future. In contrast,
predictive analytics have been proposed and used to forecast what will happen in the future. In the
field of HR, predictive analytics lead to organizational benefits and support better decision-making
in the organization without bias, especially given the current momentum of big data and of data
science based on machine and deep learning techniques [8]. In fact, data is considered one of the
mandatory ingredients that a people analytics team requires to be effective [9]. However, HR is set
to fail in handling Big Data challenges, since Big Data focuses on capturing every piece of available
information and collecting all data, suitable or not. In the HR analytics context, the issue must
move from the size of the data to its smartness and to making better use of data to create and
capture value, which is a necessary prerequisite to the more advanced forms of big data analysis [4].
Additionally, [10] highlighted the limits of the application of Big Data within a contextual HR case
study, whilst also noting the need to shift the focus from a quantitative to a qualitative analysis
of HR data. In this context, the concept of deep data was born to deal with collecting only relevant
and specific information and excluding information that might be unusable or otherwise redundant [11].

Thus, in this paper, we mainly focus on two dimensions: a functional dimension and a data dimension.
From the functional dimension, we aim to test, compare and select the most accurate predictive model
that can detect employee attrition early. We also aim to interpret positive attrition to find the
reasons behind it and so to support HR managers in building a retention plan. From the data
dimension, which is the key property of the proposed approach, we aim to shift from big data to deep
data to address the data issues that organizations may face when implementing HR analytics.

Big data is a label commonly used to identify large volumes of (structured or unstructured) data
that can generally be defined with the help of the 3Vs: volume, velocity, and variety. Volume refers
to the quantity of data produced by various sources such as sensors, social media, business
transactions, etc. Velocity represents the speed at which data are produced, and variety refers to
the different formats of data. Over the last decade, the exploitation of big data has become very
popular among organizations, which tend to adopt new data-driven strategic decision-making models,
and especially big data analytics, across different HR functions [12]. One of the main challenges of
using analytics in HR is the deficiency of empirical data. In fact, the lack of empirical data can
concern both the number of candidates or samples and the number of features, and a model trained on
such a small dataset cannot be reliable. Hence, organizations that plan to use HR analytics first
have to face the data availability challenge, and they must be able to produce very large volumes of
data [13]. Consequently, organizations need large-scale storage solutions that tend to be cloud-based
and costly. Moreover, small organizations may not have high-quality HR data and may lack the
analytical capabilities to adapt techniques designed for big data to areas where the volume of data
is quite small (small data). In this context, the main challenge is the quality of data:
organizations must know exactly which data they need to support their HR analytics functions, as HR
managers may not need all the data they collect. From this point of view, the volume of data is not
very important; what matters is the value of data. Importantly, the identification of deep data,
that is, high-quality data that focuses on specific predictive trends, is a major barrier to the use
of HR analytics for some organizations. So, the main objective of our approach is to shift from a big
data to a deep data perspective and to pare down the massive amount of data by excluding useless or
duplicate information.

Thereby, in this paper, we aim to propose a deep data-driven predictive approach that can detect and
predict employee intention to leave early. Compared with related works, this approach focuses on
small, information-rich HR data within big data. In fact, recent related works such as [14]–[25] and
[26] commonly focus on finding the best-performing predictive models for employee attrition,
generally using benchmark and simulated open data such as the HR IBM¹ and HR Kaggle² datasets. In
this paper, however, we argue that, apart from model performance, the HR data must be well
constructed and filtered to give relevant and rapid predictions without biases.

This deep data-driven approach is based on small data that provides the greatest business value, at
a lower cost than vast volumes of big data, with regard to the factors that really affect employee
attrition. Thus, the main goals of this research are to: 1) create an effective employee attrition
model that contains the necessary and sufficient factors for early detection of attrition intent by
deploying a mixed method based on exploratory as well as quantitative analyses, 2) build decision
models to predict attrition using Machine, Ensemble and Deep Learning techniques (ML, EL and DL),
3) make interpretations to explain and identify the exact reasons behind employee attrition, and
4) make recommendations to fight this possible attrition and to adopt the necessary HR management
policies.

¹ https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
² https://github.com/ryankarlos/Human-Resource-Analytics-Kaggle-Dataset

TABLE 1. Recent related works.

The outline of this paper is as follows: In the second
section, we will present an overview of related works. The
research methodology conducted in this research to collect
data for our study and to design our final employee attri-
tion model will be presented in the third section. In the
fourth section, we will present our approach and the various
intelligent and predictive models proposed in order to predict
employee attrition as soon as possible. The fifth section will
show the experimental results as well as the findings of this
research i.e. interpretation of the results to understand what
makes an employee quit. Finally, we conclude and present an
outlook on future works.
3) ENSEMBLE LEARNING BASED PREDICTIVE MODELS
The main goal of ensemble learning (EL) is to combine several models in order to find a better
solution that gives better results [34]. EL is used here to combine the classifiers and their
predictions in order to improve robustness over a single classifier. In this study, we test the
following ensemble learning models:
1. Random Forest is a popular tree-based ensemble learning technique and a bagging algorithm in
which each tree is built on a different bootstrap sample of the dataset. In the end, a simple
majority vote is taken for prediction. Random forests differ from standard trees in that each node
is split using the best among a subset of predictors randomly chosen at that node, which makes them
robust against over-fitting [35].
2. XGBoost is a gradient boosted tree algorithm that fits a set of weak learners and produces its
final prediction by combining the predictions of all of them through a weighted majority vote (or
sum). This boosting algorithm relies on a regularized model formalization to control over-fitting,
which makes it highly robust and gives it better performance [36].
3. Voting Classifier is an ensemble learning model that trains an ensemble of classifiers and then
predicts the output class based on a vote, according to one of two strategies. The first is Hard
Voting, where the predicted output class is the class predicted by the majority of the classifiers.
The second is Soft Voting, where the output class is the one with the highest average predicted
probability across the classifiers. In our case, we use a Voting Classifier that combines our chosen
ML models and relies on the majority vote strategy (hard voting) to predict the output class. Such a
classifier can be useful for a set of equally well-performing models, in order to balance out their
individual weaknesses.
4. Stacked ANN-based model, where the outputs of the three chosen deep learners (DNN, LSTM and CNN)
are collected to create a new dataset that also includes, for each row, the real expected value;
this dataset is used to train a new DNN learning model, called the meta-learner.
It is helpful to recall here that we used GridSearch, with 10% of the dataset as a validation set,
to identify the best hyperparameters for each model (such as the decision criterion and max-depth
for DT, and the number of hidden layers and the number of units or neurons in each layer for DNN,
LSTM and CNN).

C. INTERPRETATION OF THE EMPLOYEE ATTRITION PHENOMENON
Employee retention refers to organizations’ practices and policies that are used to prevent valuable
and skilled employees from leaving their jobs [37].
Thus, retention is the opposite of attrition: it means the ability of organizations to keep their
employees, in particular productive ones, and stop them from going to work somewhere else. In fact,
an organization’s retention policies and all other internal governance policies play a significant
role in improving workplace productivity, engaging employees emotionally and, hence, controlling
attrition. How to retain productive employees and their valued skills is one of the biggest problems
that plague organizations, so we aim in this study not only to help HR managers detect employee
intention to leave early, but also to make them aware of the facts leading to employees’ attrition,
so that they can take measures and adopt effective management strategies to retain their employees.
Indeed, it is equally important for HR managers to have a predictive model that is not only accurate
but also interpretable and explanatory, one that indicates which features trigger employee attrition
and what makes an employee quit.
Thus, in this step of our approach, we show how our proposed models can be used for attrition
interpretation as well as attrition prediction, using feature importance. It is a statistical method
that allows us to evaluate and quantify the contribution of each feature to the prediction in the
classification task. We use it here to identify the features that drive attrition and to understand
their influence on employee attrition. Generally, feature importance provides a score for each
attribute that indicates either how much the attribute contributes to improving performance, or how
much the model depends on it in the prediction.
Our aim here is to search for the real reasons behind the phenomenon of attrition, so the
interpretation has to focus only on employees who left or who intend to leave, i.e. only on samples
where Attrition = 1 (samples where Attrition = 0 are ignored). We then consider the “Job
satisfaction” feature as our new target, because employee job satisfaction is a key ingredient of
employee retention. In fact, evidence suggests that employee attrition is triggered by job
dissatisfaction, and many researchers have shown that employee satisfaction with the job is
significantly correlated with the intention to leave [38]. We then proceed with the following steps:
1. Remove the rows representing employees who did not leave their jobs or do not intend to leave
(Attrition = 0).
2. Delete the “Attrition” column and consider “Job satisfaction” as the new target.
3. Convert the values of the job satisfaction column, 1, 2 and 3, 4, into 0 and 1 respectively, as
satisfied and not satisfied.
4. Apply feature importance using the Random Forest (RF) classifier to identify the most impactful
features on employee job satisfaction (we choose RF because it is the best-performing predictor,
whereas the ensemble method cannot be used here as its inputs are classifiers and not data).
Results of applying feature importance to our RF classifier are depicted in Fig. 4.
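The four interpretation steps above can be sketched in a few lines of Python with scikit-learn [27]. This is only a minimal illustration under stated assumptions: the data are synthetic, the feature names and the helper functions `make_toy_survey` and `satisfaction_importances` are hypothetical, and the forest settings are arbitrary rather than the ones tuned in this study. The importances shown are the impurity-based scores that scikit-learn exposes as `feature_importances_`; the ranking obtained on toy data says nothing about real employees.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; the real questionnaire columns may be named differently.
FEATURES = ["business_travel", "rewards", "job_involvement", "trainings"]


def make_toy_survey(n_rows=400, seed=0):
    """Build a synthetic stand-in for the survey data (not the authors' dataset)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(1, 5, size=(n_rows, len(FEATURES)))  # ordinal answers in 1..4
    attrition = rng.integers(0, 2, size=n_rows)           # 1 = left or intends to leave
    # Job satisfaction on the 1..4 scale, loosely tied to the first feature.
    job_satisfaction = np.clip(X[:, 0] + rng.integers(-1, 2, size=n_rows), 1, 4)
    return X, attrition, job_satisfaction


def satisfaction_importances(X, attrition, job_satisfaction, seed=0):
    # Step 1: keep only the rows with Attrition = 1.
    leavers = attrition == 1
    X_leavers = X[leavers]
    # Step 2: Attrition is never passed to the model; job satisfaction is the target.
    # Step 3: map the levels {1, 2} to 0 and {3, 4} to 1.
    y = (job_satisfaction[leavers] >= 3).astype(int)
    # Step 4: fit a Random Forest and read its impurity-based importances.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_leavers, y)
    return dict(zip(FEATURES, forest.feature_importances_)), y


X, attrition, job_satisfaction = make_toy_survey()
importances, y = satisfaction_importances(X, attrition, job_satisfaction)
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because impurity-based importances are normalized to sum to 1, they can be read as relative shares of the forest's splitting power, which is what makes a bar plot of them easy to interpret.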
V. EXPERIMENTATION RESULTS
TABLE 5. Performance evaluation of models using our real dataset.

After conducting an exploratory and deep data analysis and then identifying all model settings
(parameters and hyper-parameters), we are now ready to build our models and assess their
performance. In this section, we present the experimental results of the machine, ensemble and deep
learning predictive models. To best assess the performance of these prediction models in a variety
of scenarios, we use the large-sized Kaggle HR simulated dataset (15000 samples), the medium-sized
IBM HR simulated dataset (1470 samples) and our small-sized real HR dataset (450 samples). Finally,
the salient contribution of these models is presented towards the end of this experimentation, to
enable the HR manager not only to predict attrition but also to understand its causes and so to
identify keys to retention. The evaluation criteria for these models and the comparison of their
results are explained in the following sections.
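As a rough illustration of how a hard-voting ensemble can be fitted and then scored with accuracy and F1-score, the following Python sketch uses scikit-learn [27]. Everything specific in it is an assumption made for the example: the data come from `make_classification` (450 rows and 11 features only echo the sizes quoted in the text), the three base learners and their settings are illustrative, and the 80/20 split is arbitrary. It is not the experimental pipeline of this study and will not reproduce its figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: random values, unrelated to the authors' survey.
X, y = make_classification(n_samples=450, n_features=11, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed base learners; the exact models and hyperparameters are illustrative only.
base_learners = [
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
]

# voting="hard": each base learner casts one vote and the majority class wins.
ensemble = VotingClassifier(estimators=base_learners, voting="hard")
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.2f}  f1={f1:.2f}")
```

Switching to `voting="soft"` would average predicted class probabilities instead of counting votes, which in scikit-learn requires every base learner to provide `predict_proba` (for `SVC`, this means constructing it with `probability=True`).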
real data. In fact, VC outperforms all the other classifiers in terms of accuracy, especially on our
real dataset compared with the simulated ones. With regard to the different machine learning
classifiers used, ensemble learning VC gives better accuracy for both simulated and real datasets,
regardless of whether feature selection is applied. In particular, for our final dataset, ensemble
learning VC gives the best results, with an accuracy of 0.99. This can be explained by the fact that
ensemble learning aims to combine (weak) learners in one method, taking advantage of their
complementarity to output more accurate results. In addition, our ensemble learning VC also
outperforms the deep learning predictors on both simulated and real data. This result may be
explained by the quantity of the provided data. In fact, deep learning algorithms require
“relatively” large datasets to work well and give better results, and they also need the
infrastructure to be trained in reasonable time. Also, deep learning algorithms require many more
experiments, and they are more beneficial when we deal with complex problems and real big data with
a greater number of features.

Moreover, in order to compare the accuracy of our proposed models with recent works that reused the
simulated HR datasets, we show the different results in Table 6. We note that, for the IBM HR
simulated dataset, our ensemble learning VC gives the best results, with an accuracy of 0.93. For
the Kaggle HR simulated dataset, ensemble learning VC likewise gives the best results, with an
accuracy of 0.98.

TABLE 6. Comparison of accuracy models for IBM/Kaggle HR datasets with existing works.

Apart from the proposed predictive models and their combination to get more accurate employee
attrition predictions, the salient contributions of this paper basically deal with two points. The
first one concerns the proposal of a deep data-driven predictive approach. In fact, our approach
focuses on the use of relevant data and the selection of impactful features instead of using all the
collected data. It is worth mentioning here that feature selection offers an effective way to reduce
the complexity of classification problems by removing irrelevant and redundant data, which can
reduce computation time, improve learning accuracy, and facilitate a better understanding of the
learning model. These claims were confirmed experimentally here, as shown in Tables 4 and 5. In
fact, accuracy improves for most of the classifiers when feature selection is used. We also note an
improvement of the F1-score after feature selection. This confirms the effectiveness of the employee
attrition model chosen in this study, and the good results from multiple classifiers after feature
selection indicate that the selected features do contribute to voluntary attrition. Even for the IBM
human resources simulated dataset, predictors’ performance improved when the number of existing
features was reduced to our 11 selected features; in particular, the accuracy of the ensemble method
VC increased slightly, from 0.93 before feature selection to 0.96 after feature selection. Moreover,
ensemble learning VC applied to our final dataset after feature selection gives the best results,
with an accuracy of 0.99. This also confirms that SelectKBest and RFE are a good choice of feature
selection algorithms to improve and validate our employee attrition model. So, this deep study also
complements previous findings reported in the literature regarding the impactful features on
employee attrition and confirms that only the 11 selected features are needed.

The second salient contribution of this paper concerns the interpretation and explanation of the
attrition phenomenon, and hence the recommendations for effective retention. According to [37],
retention policies fall into three levels of HR management: high, medium and low. Each level
considers a different perspective and requires different strategies to combat the problem of
attrition arising at that level. At the lower managerial level, understanding and money are keys to
retention, whereas at the medium managerial level, managers’ appreciation, training and business
travel programs act as major keys. Finally, for high-level management, retention policies include
freedom of decision making and the creation of a trustworthy environment. Thus, generally,
organizations should create an environment that fosters work appreciation and a friendly,
collaborative atmosphere that makes an employee feel involved and connected to the organization. For
our real case study in particular, the results of feature importance applied to our RF classifier,
plotted in Figure 5, show that, for the 450 respondents, the highest importance is assigned first to
the “Business Travel” feature and second to “Rewards”. This means that business travel is the most
motivational attribute and the key to employee retention with regard to the studied dataset. Thus,
HR managers should adopt a retention strategy at the medium managerial level and try to organize
some business travel for the employees. While rewards, pay and effort–reward imbalance are generally
considered as the most impactful variables
on employee attrition as in [2] and [16], findings here indicate, however, that one of the leading
features identified is less common in the literature: business travel. In fact, as reported in the
literature (e.g., [39]), business travel, whether domestic or international, undoubtedly brings
benefits for employees and is shown to have a significant effect above and beyond technology
transfer, through innovation and inspiration from other environments. Indeed, it has been suggested
that the experience of visiting clients, other companies, cities and countries broadens employees’
understanding of different cultures and makes them more open-minded.

At this stage, we assume that there might be some threats to the validity of our research findings,
and we have self-assessed them here in order to indicate the trustworthiness of our results, i.e. to
what extent they are true and not biased by our subjective point of view. These potential threats
are addressed according to the classification proposed in [40]. Regarding construct validity, we
assume that the provided measures could be biased towards the researchers’ expected results.
However, to validate and evaluate the performance of the adopted classifiers, we used accuracy,
which is considered a standard metric often used for measuring performance while reducing biases. It
is also robust, particularly for balanced data, which is almost our case here: in our real dataset,
47.3% of respondents want to leave their jobs and 52.7% do not intend to quit. Regarding external
validity, there might be some issues regarding the generalization of our predictive approach, as the
data collected through the employee survey were small (450 samples), which might indicate a low
relevance of the obtained results. To overcome this issue, the approach and its learnt models are
also assessed on the large-sized Kaggle HR simulated dataset (15000 samples) and the medium-sized
IBM HR simulated dataset (1470 samples), which provide more consistent feedback about the relevance
of our results. Finally, regarding reliability, there might be a potential threat concerning the
dependency of the data and analysis on the specific researchers. However, we have made an effort to
minimize this threat by collecting data from different countries with different cultures.

VII. CONCLUSION AND FUTURE WORKS
The main goal of this research is to help HR managers detect an employee’s intention to leave as
soon as possible using predictive analytics methods, and so to fight this attrition. The
contributions can be summarized in three points: i) The proposal of a new employee attrition model
that contains only the 11 features necessary and sufficient to detect intention to leave and to
predict positive attrition, using a mixed research methodology. ii) The proposal of machine, deep
and ensemble learning predictive models and their experimentation in a variety of settings
(large-sized simulated dataset, medium-sized simulated dataset and small-sized real dataset) to best
assess their performance. iii) The interpretation and explanation that enable HR managers to
understand what makes an employee want to leave and help them adopt key retention policies.

In terms of study limitations, considering dynamic features that deal with employees’ behaviour and
their emotional states would be a promising way to study their impact on employee attrition. In this
case, the training of the predictive models must be on-line, as data will be dynamic and new data
can be added whenever required. We also acknowledge that our questionnaire respondents suggested
other features that can cause voluntary turnover and so can be integrated into our future study. In
fact, they proposed considering health issues, job security and the use of new technologies in the
company. Finally, in future research, handling unbalanced data is a real challenge, especially for
organizations and companies with a high turnover rate, because the adopted predictive models proved
experimentally unsuitable for unbalanced data.

APPENDIX
QUANTITATIVE QUESTIONNAIRE
1. Country:
2. Gender: Female/Male
3. Grade:
4. Age:
5. Education: 1: 'Below College', 2: 'College', 3: 'Bachelor', 4: 'Master', 5: 'Doctor', 6: Other
6. Specialty (Computer Science, Electronics, Mechanics, Business, Medicine, Education, etc.):
7. Marital status: 1: Single, 2: Married, 3: Divorced
8. Organization tenure (number of years at your organization):
9. Years since last promotion in the organization:
10. Rate the degree of your job satisfaction (motivational work, spirit of challenge, contentment
with career progress, personal development): 1: Low, 2: Medium, 3: High, 4: Very high
11. Rate the degree of job performance (productivity, skills adequacy): 1: Low, 2: Medium, 3: High,
4: Very high
12. Rate the degree of environment satisfaction (simple tasks, clear roles, no stressors): 1: Low,
2: Medium, 3: High, 4: Very high
13. Do you feel you are well rewarded for your dedication and commitment towards the work (rewards,
pay)? Yes/No
14. How easy was it for you to get involved in your job (participation in decision making,
opinions): 1: Slightly easy, 2: Moderately easy, 3: Very easy, 4: Extremely easy
15. Are you satisfied with your relationships at work (relationship with colleagues and manager)?
1: Slightly satisfied, 2: Moderately satisfied, 3: Very satisfied, 4: Extremely satisfied
16. Reward/Salary:
17. Number of trainings offered by the organization:
18. How easy was it to balance your work life and personal life while working? 1: Low, 2: Medium,
3: Easy, 4: Very easy
19. How often did you travel for business at that organization? 1: Non-travel, 2: Travel rarely,
3: Travel frequently
20. Intention to quit the organization: Yes/No
21. Any other factors which you feel are responsible for Employee Attrition?

REFERENCES
[1] R. Punnoose and P. Ajit, “Prediction of employee turnover in organizations using machine
learning algorithms,” Int. J. Adv. Res. Artif. Intell., vol. 5, no. 9, p. 5, 2016,
doi: 10.14569/IJARAI.2016.050904.
[2] R. Colomo-Palacios, C. Casado-Lumbreras, S. Misra, and P. Soto-Acosta, “Career abandonment
intentions among software workers,” Hum. Factors Ergonom. Manuf. Service Industries, vol. 24, no. 6,
pp. 641–655, Nov. 2014, doi: 10.1002/hfm.20509.
[3] Amazon.fr—People Analytics in the era of big Data: Changing the way you Attract, Acquire,
Develop, and Retain Talent—Jean Paul Isson—Livres. Accessed: Dec. 15, 2019. [Online]. Available:
https://www.amazon.fr/People-Analytics-Era-Big-Data/dp/1119050782
[4] D. Angrave, A. Charlwood, I. Kirkpatrick, M. Lawrence, and M. Stuart, “HR and analytics: Why HR
is set to fail the big data challenge,” Hum. Resource Manage. J., vol. 26, no. 1, pp. 1–11,
Jan. 2016, doi: 10.1111/1748-8583.12090.
[5] A. Tursunbayeva, S. D. Lauro, and C. Pagliari, “People analytics—A scoping review of conceptual
boundaries and value propositions,” Int. J. Inf. Manage., vol. 43, pp. 224–247, Dec. 2018.
[6] T. Pape, “Prioritising data items for business analytics: Framework and application to human
resources,” Eur. J. Oper. Res., vol. 252, no. 2, pp. 687–698, Jul. 2016.
[7] S. N. Mishra, D. R. Lama, and Y. Pal, “Human resource predictive analytics (HRPA) for HR
management in organizations,” Int. J. Sci. Technol. Res., vol. 5, no. 5, pp. 33–35, 2016.
[8] P. Likhitkar and P. Verma, “HR value proposition using predictive analytics: An overview,” in
New Paradigm in Decision Science and Management. Singapore: Springer, 2020, pp. 165–171,
doi: 10.1007/978-981-13-9330-3_15.
[9] T. Peeters, J. Paauwe, and K. Van De Voorde, “People analytics effectiveness: Developing a
framework,” J. Organizational Effectiveness, People Perform., vol. 7, no. 2, pp. 203–219, Jul. 2020,
doi: 10.1108/JOEPP-04-2020-0071.
[10] N. Shah, Z. Irani, and A. M. Sharif, “Big data in an HR context: Exploring organizational
change readiness, employee attitudes and behaviors,” J. Bus. Res., vol. 70, pp. 366–378, Jan. 2017,
doi: 10.1016/j.jbusres.2016.08.010.
[11] S. V. Kalinin, B. G. Sumpter, and R. K. Archibald, “Big-deep-smart data in imaging for guiding
materials design,” Nature Mater., vol. 14, no. 10, pp. 973–980, Oct. 2015, doi: 10.1038/nmat4395.
[12] M. Nocker and V. Sena, “Big data and human resources management: The rise of talent analytics,”
Social Sci., vol. 8, no. 10, p. 273, Sep. 2019, doi: 10.3390/socsci8100273.
[13] D. Pessach, G. Singer, D. Avrahami, H. C. Ben-Gal, E. Shmueli, and
[19] S. Karande and L. Shyamala, “Prediction of employee turnover using ensemble learning,” in
Ambient Communications and Computer Systems, vol. 904, Y.-C. Hu, S. Tiwari, K. K. Mishra, and
M. C. Trivedi, Eds. Singapore: Springer, 2019, pp. 319–327.
[20] D. S. Sisodia, S. Vishwakarma, and A. Pujahari, “Evaluation of machine learning models for
employee churn prediction,” in Proc. Int. Conf. Inventive Comput. Informat. (ICICI), Coimbatore,
India, Nov. 2017, pp. 1016–1020, doi: 10.1109/ICICI.2017.8365293.
[21] M. M. Alam, K. Mohiuddin, K. M. Islam, M. Hassan, A.-U. M. Hoque, and S. M. Allayear, “A machine
learning approach to analyze and reduce features to a significant number for employee’s turn over
prediction model,” in Intelligent Computing, vol. 857, K. Arai, S. Kapoor, and R. Bhatia, Eds.
Cham, Switzerland: Springer, 2019, pp. 142–159.
[22] S. Shah, S. Alatekar, Y. Bhangare, B. Kasar, and R. Patil, “Analysis of employee attrition and
implementing a decision support system providing personalized feedback and observations,” J. Crit.
Rev., vol. 7, no. 19, pp. 2372–2380, 2020.
[23] F. Fallucchi, M. Coladangelo, R. Giuliano, and E. W. De Luca, “Predicting employee attrition
using machine learning techniques,” Computers, vol. 9, no. 4, p. 86, Nov. 2020,
doi: 10.3390/computers9040086.
[24] S. R. Ponnuru, “Employee attrition prediction using logistic regression,” Int. J. Res. Appl.
Sci. Eng. Technol., vol. 8, no. 5, pp. 2871–2875, May 2020, doi: 10.22214/ijraset.2020.5481.
[25] S. Kakad, R. Kadam, P. Deshpande, S. Karde, and R. Lalwani, “Employee attrition prediction
system,” Int. J. Innov. Sci., Eng. Technol., vol. 7, no. 9, p. 7, 2020.
[26] N. Jain, A. Tomar, and P. K. Jana, “A novel scheme for employee churn problem using
multi-attribute decision making approach and machine learning,” J. Intell. Inf. Syst., vol. 56,
no. 2, pp. 279–302, Apr. 2021, doi: 10.1007/s10844-020-00614-9.
[27] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
P. Prettenhofer, R. Weiss, V. Dubourg, and J. Van der Plas, “Scikit-learn: Machine learning in
Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[28] S. R. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE
Trans. Syst., Man, Cybern., vol. 21, no. 3, pp. 660–674, May 1991.
[29] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statist. Comput.,
vol. 14, no. 3, pp. 199–222, Aug. 2004, doi: 10.1023/B:STCO.0000035301.49549.88.
[30] G. King and L. Zeng, “Logistic regression in rare events data,” Political Anal., vol. 9, no. 2,
pp. 137–163, 2001.
[31] G. E. Hinton, “Reducing the dimensionality of data with neural networks,” Science, vol. 313,
no. 5786, pp. 504–507, Jul. 2006, doi: 10.1126/science.1127647.
[32] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural
networks,” 2013, arXiv:1312.6026. [Online]. Available: https://arxiv.org/abs/1312.6026
[33] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and
T. Chen, “Recent advances in convolutional neural networks,” Pattern Recognit., vol. 77,
pp. 354–377, May 2018, doi: 10.1016/j.patcog.2017.10.013.
[34] G. Brown, “Ensemble learning,” in Encyclopedia of Machine Learning,
I. Ben-Gal, ‘‘Employees recruitment: A prescriptive analytics approach via vol. 312. 2010, pp. 15–19.
machine learning and mathematical programming,’’ Decis. Support Syst., [35] A. Liaw and M. Wiener, ‘‘Classification and regression by randomForest,’’
vol. 134, Jul. 2020, Art. no. 113290, doi: 10.1016/j.dss.2020.113290. R News, vol. 2, pp. 18–22, Dec. 2002.
[14] S. S. Alduayj and K. Rajpoot, ‘‘Predicting employee attrition using [36] J. H. Friedman, ‘‘Greedy function approximation: A gradient
machine learning,’’ in Proc. Int. Conf. Innov. Inf. Technol. (IIT), Al Ain, boosting machine.,’’ Ann. Statist., vol. 29, no. 5, pp. 1189–1232,
UAE, Nov. 2018, pp. 93–98, doi: 10.1109/INNOVATIONS.2018.8605976. Oct. 2001.
[15] M. Ganesh V. Aishwaryalakshmi, S. Aksshaya, and K. Abinaya, ‘‘Predict- [37] B. K. Goswami and S. Jha, ‘‘Attrition issues and retention chal-
ing employee attrition using machine learning,’’ Int. J. Sci. Res. Comput. lenges of employees,’’ Int. J. Sci. Eng. Res., vol. 3, no. 4, pp. 1–6,
Sci., Eng. Inf. Technol., vol. 3, no. 3, pp. 145–149, 2018. Apr. 2012.
[16] Y. Zhao, M. K. Hryniewicki, F. Cheng, B. Fu, and X. Zhu, ‘‘Employee [38] A. H. Khan and M. Aleem, ‘‘Impact of job satisfaction on employee
turnover prediction with machine learning: A reliable approach,’’ in Intelli- turnover: An empirical study of autonomous medical institutions of
gent Systems and Applications, vol. 869, K. Arai, S. Kapoor, and R. Bhatia, Pakistan,’’ J. Int. Stud., vol. 7, no. 1, pp. 122–132, May 2014, doi:
Eds. Cham, Switzerland: Springer, 2019, pp. 737–758. 10.14254/2071-8330.2014/7-1/11.
[17] X. Gao, J. Wen, and C. Zhang, ‘‘An improved random forest algorithm for [39] J. V. Beaverstock, B. Derudder, J. R. Faulconbridge, and F. Witlox, ‘‘Inter-
predicting employee turnover,’’ Math. Problems Eng., vol. 2019, pp. 1–12, national business travel: Some explorations,’’ Geografiska Annaler, B,
Apr. 2019, doi: 10.1155/2019/4140707. Hum. Geogr., vol. 91, no. 3, pp. 193–202, Sep. 2009, doi: 10.1111/j.1468-
[18] S. N. Khera and Divya, ‘‘Predictive modelling of employee turnover 0467.2009.00314.x.
in Indian IT industry using machine learning techniques,’’ Vis., [40] P. Runeson and M. Höst, ‘‘Guidelines for conducting and reporting case
J. Bus. Perspective, vol. 23, no. 1, pp. 12–21, Mar. 2019, doi: study research in software engineering,’’ Empirical Softw. Eng., vol. 14,
10.1177/0972262918821221. no. 2, pp. 131–164, Apr. 2009, doi: 10.1007/s10664-008-9102-8.