
Article in International Journal of Computer Science and Information Security, January 2019



International Journal of Computer Science and Information Security (IJCSIS),
Vol. 17, No. 1, January 2019

A Proposed Model for Predicting Employees’ Performance Using Data Mining Techniques: Egyptian Case Study

Mona Nasr, Faculty of Computers & Inf., Helwan University, Egypt
Essam Shaaban, Faculty of Computers & Inf., Beni-Suef University, Egypt
Ahmed Samir, Faculty of Computers & Inf., Helwan University, Egypt

Abstract—Human Resources Management (HRM) has become one of the essential interests of managers and decision makers in almost all types of businesses, who adopt plans for correctly discovering highly qualified employees. Accordingly, management has become interested in the performance of these employees, especially in ensuring that the appropriate person is allocated to the suitable job at the right time. From here, interest has been growing in the role of data mining (DM), whose objective is the discovery of knowledge from huge amounts of data. In this paper, DM techniques were utilized to build a classification model for predicting employees’ performance using a real dataset collected from the Ministry of Egyptian Civil Aviation (MOCA) through a questionnaire prepared and distributed to 145 employees. Three main DM techniques were used for building the classification model and identifying the most effective factors that positively affect the performance. The techniques are the Decision Tree (DT), Naïve Bayes, and Support Vector Machine (SVM). To get a highly accurate model, several experiments were executed based on these techniques as implemented in the WEKA tool, enabling decision makers and human resources professionals to predict and enhance the performance of their employees.

Index Terms—Classification, C4.5 (J48), Data Mining, Employees’ Performance, HRM, MOCA, Naïve Bayes, SVM

I. INTRODUCTION

HRM has a leading role in deciding the competitiveness and effectiveness of an organization for better continuation. Organizations consider HRM as “people practices”. Therefore, it becomes the responsibility of HRM to allocate the best employees to the appropriate job at the right time, train and qualify them, build evaluation systems to monitor their performance, and attempt to preserve the potential talents of employees [1].

With the advancement and growth of technologies in business organizations, HR employees no longer need to handle the massive amount of data manually. These data are very important for the decision makers, but there is a challenge in mining them to get the best and most useful information [1]. From here, the role of DM comes. DM is a step in Knowledge Discovery in Databases (KDD) and is currently acquiring a great deal of attention and utilization. It is considered a recently emerging analysis and predictive tool [2], because of the existence and multiplicity of massive amounts of data containing huge hidden unknown knowledge.

Knowledge can be extracted through various methods, and one of them is using DM techniques. DM techniques provide an approach to utilize different DM tasks, such as classification, association, and clustering, to extract hidden knowledge from huge amounts of data.

Classification is a predictive DM technique that makes predictions about values of data using known results found from various data. Classification is a supervised learning technique in DM and machine learning, in which the class label (the target class) is already known. Building classification models from an input dataset is one of the most useful tasks in DM. Classification techniques commonly build models, which are in turn used to predict future data trends [3]. With classification, predictive models have the specific target of enabling us to predict the unknown values of variables of interest from the known values of other variables [4].

In this connection, the main objectives of the present study were set to support decision makers in different locations in discovering potential talents of employees, as follows:

o Gathering a dataset of predictive variables.
o Identifying the different factors that affect the employees’ behavior and performance.
o Using the proposed DM classification techniques for constructing a predictive model and identifying relationships between the most important factors affecting the overall efficiency of the model.

There are various data classification techniques such as DT, SVM, the Naïve Bayes classifier, and others. In this paper, the classification process is executed using these three main classification techniques. Other techniques can also be used for classification, such as Neural Networks (NN), K-Nearest Neighbors (KNN), etc.
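To make the idea of supervised classification concrete, the following minimal Python sketch trains a categorical Naïve Bayes classifier (one of the three techniques named above) on labeled records and then predicts the label of an unseen record. It is an illustration only, not the WEKA implementation used in the study: the toy records, the attribute choice, the function names, and the use of Laplace smoothing are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, label) -> value counts
    domains = defaultdict(set)            # attribute index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Return the label with the highest Laplace-smoothed posterior score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(row):
            # Add-one smoothing so an unseen value does not zero out the score
            score *= (value_counts[(i, label)][value] + 1) / (n + len(domains[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records (not the MOCA data): (received training?, rank)
train_rows = [("Yes", "First"), ("Yes", "Second"), ("No", "Third"),
              ("No", "Second"), ("Yes", "First"), ("No", "Third")]
train_labels = ["Excellent", "Excellent", "Good", "Good", "Excellent", "Good"]

model = train_naive_bayes(train_rows, train_labels)
print(predict(model, ("Yes", "Third")))  # label predicted for an unseen record
```

The training call corresponds to the learning step on records whose class is known, and `predict` corresponds to applying the learned model to a record whose class is unknown.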

https://sites.google.com/site/ijcsis/
ISSN 1947-5500

The C4.5 (J48) technique is one of the DT family. It can generate both a decision tree and its rule sets. In addition, it builds the tree for enhancing the prediction accuracy. Besides that, the models generated by C4.5 (J48) are easily understandable because the extracted rules have a very explicit, uncomplicated interpretation, and the technique has the advantage that it does not need any domain knowledge or parameter setting. The researcher can therefore easily detect the variables that most affect the predicted target. J48 is WEKA’s own implementation of C4.5 revision 8, and it is the version that will be used in this study [5].

The Naïve Bayes classifier, based on Bayes’ theorem, is another classification technique that is utilized for predicting a target class. It depends on probabilities in its calculations; in addition, it provides a useful perspective for understanding various learning algorithms that do not explicitly use probabilities [6]. The results of this classifier are described as accurate, effective, and sensitive to recent data inserted into the dataset [7].

SVM is considered one of the most effective supervised machine learning techniques; it has a straightforward structure and a high ability for classification. Moreover, SVM is recognized as an appropriate technique in machine learning and DM for classification with both linear and non-linear decision margins, where a highly accurate model can be produced [8]. SVM has many advantages: it has no ceiling on the number of attributes, and it depends on the kernel trick, which allows expert knowledge on the problem to be built into the model via kernel adjustment [9]. Sequential Minimal Optimization (SMO) is an SVM training algorithm. It is recognized as an efficient technique for solving the optimization problem. SMO can be considered the state-of-the-art approach for non-linear SVMs [10]. In this study, the SVM is trained on the dataset using the SMO algorithm to build the prediction model.

This paper is organized in six sections. The first section is the introduction, followed by the second section, which describes some related work on HRM, DM in HR, and the classification techniques used for classification and prediction. The third section discusses the methodology adopted for constructing the proposed model, while Section 4 presents the experiments that were executed for generating the model. Section 5 shows some results and discussion. Finally, Section 6 ends with the concluding remarks and future research directions.

II. LITERATURE REVIEW

Many researchers have used DM classification techniques for generating rules and predicting certain attitudes in various fields of science [5]. Therefore, evaluating and predicting the efficiency of employees’ performance is considered a critical issue, which requires detecting the full set of variables and criteria related to the efficiency of the predictive models of employees’ performance that have been reviewed. In this section, a comprehensive study is presented on employees’ performance prediction models and the criteria that these models measure, based on the following literature:

Kirimi JM, Motur CA (2016) concentrated on collecting employees’ data of a public management development institute in Kenya using a user interface, generating a decision tree based on the historical data of employees, and identifying the relationship between the DT accuracy and employees’ attributes. Moreover, they considered the possibility of constructing two or more prediction techniques for predicting the employees’ performance and choosing the most suitable one for this organization [10].

Desouki M. S., Al-Daher J (2015) presented a study applying DM techniques such as DT, K-Nearest Neighbors (KNN), and SVM to the HRM field by analyzing the Performance Appraisal (PA) results, which were supported by a multi-discipline academic research organization, in order to enhance the appraisal method and assess the compatibility of the practical implementation with the objectives of the PA process. To achieve that, various DM tasks were utilized, such as clustering, classification, and prediction. The study concluded that DM tasks can be promising and important in dealing with human resource activities such as enhancing the methods of performance evaluation [11].

V. Kalaivani, M. Elamparithi (2014) applied DT techniques in order to predict employees’ performance, which was the objective of their research. DT is one of the most popular classification techniques; it creates both a tree and a rule set, building the model based on a given dataset. There are various DT algorithms, such as ID3, C4.5, CART, Bagging, Random Forest, Rotation Forest, and CHAID. In their study, the C4.5, Bagging, and Rotation Forest algorithms, as implemented in the WEKA toolkit, were utilized. Experiments were performed on data collected from an institution [12].

H. Jantan, Norazmah Mat Y. and Mohamad Rozuan N. (2014) applied the SVM technique in the classification process of employee achievement. Their study aimed to investigate the effectiveness of the SVM technique in detecting the required data pattern for classifying employee achievement. The accuracy of the SVM model was considered satisfactory but needed some enhancement to become higher [13].

Lipsa Sadath (2013) discussed the possibility of making decisions in an automated and intelligent manner using DM techniques and depending on a rich employee database. It was concluded that the C4.5 technique had the higher accuracy. The objective of the study was predicting the employees’ performance and applying the finest Knowledge Management (KM) strategies, thus implementing a stable HR system and a powerful business [14].

Qasem et al. (2012) used DM techniques for building a classification model in order to predict the performance


of new employees. Different DT techniques were used for building the model, such as the ID3 and C4.5 (J48) algorithms, where several classification rule sets were produced. The Naïve Bayes classification technique was also used as another classifier. Three experiments were conducted based on real data collected from several organizations to detect the factors that most affect employees’ performance. The results of the experiments showed that the job title was the most effective factor on the employees’ performance [15].

Hamidah Ja., Abdul Razak Ha., and Zulaiha (2010a) presented an important study about the problems that talent management may face and that can be solved using various DM techniques. In this study, they attempted to address one of the talent management tasks, identifying potential talents by predicting their performance based on knowledge from previous experience, and to introduce the suitable DM technique for this issue [16].

Hamidah Ja., Abdul Razak Ha., and Zulaiha (2010b) used DT techniques to study how potential talent can be predicted. In this study, the C4.5 (J48) classification algorithm was the main technique used to produce the classification rule set for human talent performance records. Finally, the generated rules were evaluated using new unseen data to assess the accuracy of the prediction results [5].

Hamidah Ja., Abdul Razak Ha., and Zulaiha (2009) also discussed the potential classification techniques for talent forecasting. In this study, they used various classification techniques such as DT, NN, and KNN. They focused on the techniques’ accuracy to detect the most suitable one for HR data. The results showed that the DT technique was the most promising one for talent forecasting in HRM, as it had the highest accuracy. The dataset was collected from a higher education institution for academic staff [17].

In general, this paper is an initial attempt to investigate DM tasks, especially the classification task, for supporting decision makers and HR professionals by identifying and studying the main factors of their employees that may positively affect their performance. The paper applies some of the classification techniques to build a proposed model for supporting the prediction of the employees’ performance. In the next sections, a comprehensive description of the study is presented, specifying the methodology, the experiments and results, and a discussion of the results, and finally conclusions and recommendations for future work.

III. CONSTRUCTING THE CLASSIFICATION MODEL

The proposed methodology was adopted for the objective of building the classification model and studying certain factors that may affect and predict the employees’ performance. For achieving this objective, it is necessary to have a generic guide for developing a DM project lifecycle containing certain steps: Problem Definition and Objective Structuring, Data Collection and Understanding, Data Preparation and Preprocessing, Modeling and Experiments, and Testing and Evaluation.

In general, classification consists of several steps. The first step is the learning step, in which the model is built by analyzing the variables of a training dataset whose classes are predefined. Each variable is assumed to have a relation to a predefined class. The second step estimates the accuracy of the model or classifier (validating the model) by testing it on a different dataset. If the classifier’s accuracy is considered acceptable, the model or classifier can be applied to new unseen data to predict the unknown class label; this is the third step, as shown in Figure 1. Therefore, the model acts as a classifier in the decision-making process. Various classification techniques have been used in the prediction process, such as DT, Naïve Bayes, SVM, etc.

Figure 1. The Classification Process in DM

A. Problem Definition and Objective Structuring

The first step in data mining is to understand and define the right problem and specify the objectives. Meanwhile, data miners should also equip themselves with domain knowledge to understand the nature of the problem, which will greatly improve DM effectiveness and efficiency. Indeed, human resource management activities are very complicated, and thus few quantitative approaches have been employed in practice [2]. HRM at MOCA and most other public-sector organizations use traditional assessment techniques that do not enable them to assess employees’ performance accurately, and therefore they cannot predict the performance and discover the talents.

This research concentrates on how to present a proposed model supporting HRM and decision makers


to predict the performance of MOCA employees and to identify the employee factors that affect and are associated with bad/good performance. Moreover, it aims to detect the most suitable DM technique, the one with the highest accuracy among the various classification techniques that will be used.

B. Data Collection and Understanding Process

The idea of this study is to build a classification model for predicting employees’ performance based on a real dataset, to get real and significant results for supporting HR executives and decision makers. Collecting the required data needed a practical method. Therefore, a questionnaire was prepared and manually distributed to the employees of MOCA, containing the attributes that may affect and predict the performance class (the target class). The attributes requested for the training dataset were selected based on the factors related to employee performance, which fall into educational, personal, and professional factors (such as job title, age, rank, qualifications, grade, etc.), as illustrated in Table 1. These attributes are used to predict the employee performance (the target class) as Excellent, Very Good, or Good. The questionnaire was filled in by 145 employees from all the different sectors of MOCA, with various job titles, ages, and ranks, to get a complete sample.

C. Data Preparation and Pre-processing

After the questionnaires were collected, the data were prepared. The raw data contained instances that were not applicable because of errors and anomalies; these had to be discarded. The data were transferred to Excel sheets to review and modify the types of the collected data. Some attribute types needed to be changed from numeric to categorical, i.e., values expressed as ranges, for example the attributes for number of experience years and service period (X3, X4) in Table 1. Other attributes needed to be generalized into fewer discrete values than they originally had; for example, the attribute of faculty specialization (X15) in Table 1 contained values like IT, CS, and MIS, which were all treated as a single value, IS, and so on. Data generalization is thus also considered one of the data reduction techniques. After preparing the Excel sheet and doing the needed processing, the file was converted into the ARFF format, which is compatible with the WEKA DM toolkit that was used in building the model.

The WEKA (Waikato Environment for Knowledge Analysis) toolkit is a machine learning platform developed by researchers at the University of Waikato. It is implemented in Java. It provides a unified package in a single application, which gives users access to up-to-date technologies in DM and machine learning. It supports several tasks such as pre-processing, classification, clustering, association, and visualization. The WEKA tool has the important advantages that it is available for free and has a simple GUI, so it can be used smoothly. The tools supported by the WEKA workbench are based on statistical evaluations of the models (algorithms). Consequently, the WEKA user can easily compare the results and accuracies of the applied machine learning and DM algorithms for a given dataset, in flexible procedures, in order to detect the most suitable algorithm for that dataset [18].

Feature Selection

Feature selection is one of the main concepts of DM and machine learning. It is the process of selecting the necessary, useful variables in a dataset to improve the results of machine learning and make them more accurate. Using too many variables in a dataset reduces predictive performance: the dataset may contain many features, some of which do not improve the prediction accuracy and thus make the predictive model excessively complicated. Therefore, unnecessary variables must be removed so that the model works efficiently. Deciding which unnecessary variables to remove can be done manually using domain knowledge, or it can be done automatically [19].

This paper targets finding the most important variables that may positively affect the accuracy of the employees’ performance prediction model, using the feature selection algorithms supported in WEKA such as CorrelationAttributeEval, GainRatioAttributeEval, ReliefFAttributeEval, and so on.

IV. MODELING AND EXPERIMENTS

The classification stage comes after the data has been prepared and preprocessed. Three classification techniques were used: SVM, DT, and the Naïve Bayes classifier. These techniques were applied to the dataset to build the employees’ performance prediction model, to find the most appropriate DM technique, and to find the most effective variables that may affect and predict the employees’ performance, as listed in Table 1.

These variables consist of (A) professional information such as job title, rank, number of experience years, number of service years at MOCA, number of companies previously worked for, salary, whether the employee works in comfortable conditions, whether the employee is satisfied with the salary and with the job, and whether the employee receives training; (B) personal information such as age, gender, and marital status; and (C) educational information such as grade, degree, general specialization, and university type. All these variables are used to predict the target class (the performance of MOCA’s employees) as Excellent, Very Good, or Good.
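The gain-ratio criterion behind GainRatioAttributeEval, one of the feature selection algorithms named in the Feature Selection subsection, can be sketched in a few lines of Python. The sketch uses the standard definition (information gain of an attribute divided by the attribute's split information) on categorical data. It does not reproduce WEKA's handling of missing or numeric values, and the toy columns and function names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())

def gain_ratio(attribute, labels):
    """Information gain of the attribute divided by its split information."""
    total = len(labels)
    conditional = 0.0
    for value in set(attribute):
        subset = [lab for a, lab in zip(attribute, labels) if a == value]
        conditional += (len(subset) / total) * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data, not the MOCA dataset
performance = ["Excellent", "Excellent", "Good", "Good"]
columns = {
    "ProfTrain": ["Yes", "Yes", "No", "No"],  # separates the two classes perfectly
    "Gender": ["M", "F", "M", "F"],           # carries no information about the class
}

# Rank attributes from most to least informative
ranking = sorted(columns, key=lambda name: gain_ratio(columns[name], performance),
                 reverse=True)
print(ranking)
```

Keeping only the top-ranked attributes of such a list is the general idea behind selecting a reduced feature subset before training a classifier.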


TABLE 1. THE USED ATTRIBUTES FOR PREDICTING THE TARGET According to table 2, the results of the E1 indicated
CLASS (PERFORMANCE)
that the accuracy of the SVM technique is the highest
Variable
through using the whole variables of the dataset with
Variable Description accuracy percentage 81.38%. In addition, the E1 results
Symbol
X1 JobTitle Employee’s Job Title indicated that all these variables have some sort of
effectiveness on the employees’ performance. The X9
X2 Rank Employee’s Rank or Level variable is the most effective one on the performance.
X3 #ExpYears No. of Working Experience Years Other variables that participated in the decision tree
Service Period at MOCA generated from the C4.5 (J48) were X3, X2, X10, X14,
X4 ServicePeriod
(in Years)
No. of Previous Companies the and others had positively affected the performance.
X5 #PrevCo.
employee worked for
The profTrain. (X9) variable was the most effective
X6 SalRange Range of Employee’s Salary factor on the employees’ performance. The results
Working in Comfortable conditions showed that the variable had positively affected the
X7 ComfWorkCond. (in employee’s perspective). performance of employees that took training and joined
Answer with (Yes - No)
Existing Satisfaction for Salary (in
to courses related to their jobs better than ones who did
X8 SatSalary employee’s perspective). not.
Answer with (Yes - No)
Existing trainings for the job
In the next experiment (E2), the feature selection
X9 ProfTrain. (in employee’s perspective). algorithms were used to get the best feature subset for
Answer with (Yes - No) each algorithm from the whole dataset. These algorithms
Existing Satisfaction for the job were CorrelationAttributeEval, GainRatioAttributeEval,
X10 SatJob (in employee’s Perspective).
Answer with (Yes - No)
and ReliefFAttributeEval algorithm. All of them are
supported by WEKA tool.
X11 Age Employee’s Age
B. Second Experiment (E2): Using the important
X12 Gender Employee’s Gender
variables resulting from the use of Feature
X13 MarStatus Employee’s Marital Status selection algorithms (10 variables)

X14 EduDegree Employee’s Education Degree By using the previously mentioned feature selection
GenSpecial. General Specialization
algorithms, Table 3 shows the important feature subset
X15
containing the most 10 important variables that
X16 UniType Type of the University positively affect the employees’ performance. In
addition, the prediction accuracy for each classification
X17 Grade Employee’s Graduation Grade
technique applied to this dataset.
Employee’s Performance either as
Performance informed or predicted. This is the According to table 3, the results of the E2 indicated
target class that the highest accuracy of the three Classification
techniques through using the three different feature
selection algorithms is the SVM Technique with
A. First Experiment (E1): Using the whole variables accuracy percentages 84.14%, 82.76%, and 82.07% in
of the dataset that may affect the performance (17 descending order. The results of E2 also proved that
variables) using less No. of variables as predictors for the target
class as in E2 produced a higher accuracy than using all
In the first Experiment (E1), the whole variables of ones of the dataset as in E1, where the accuracy
the dataset were considered and tested to measure the percentages of the three classification techniques in E2
prediction accuracy of the three applied classification through using different feature selection algorithms were
techniques. Table 2 shows the accuracy percentages of better than the opposite ones in E1.
predicting the performance for each of these techniques.
TABLE 3. ACCURACY PERCENTAGES FOR PREDICTION ALGORITHMS
TABLE 2. ACCURACY PERCENTAGES FOR PREDICTION ALGORITHMS IN E2 BASED ON USING FEATURE SELECTION ALGORITHMS
IN E1
Prediction Accuracy
No. Technique Prediction Accuracy Produced
Feature Selection Technique
Feature
77.93 % Algorithm.
1 C4.5 (J48) Subset
C4.5 Naïve
SVM
(J48) Bayes
2 Naïve Bayes 71.03 %
CORRELATION- [X2,X6, 79.31 73.10 84.14
3 SVM 81.38 % ATTRIBUTEEVAL X9,X11, % % %
X10,X3,

35 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 17, No. 1, January 2019

X4,X14, who had less number of experience years. The table 5.


X12,X7] below illustrates this finding.
[X9,X2,
X6,X3,
C. Third Experiment (E3): Using the most effective
GAINRATIO- 79.31 72.41 82.07 variables resulting from the tree generated using
X10,X11,
ATTRIBUTEEVAL % % % Decision Tree technique (5 variables)
X4,X1,
X14,X7]
In this experiment (E3), DT technique was used as a
classification technique using its algorithm C4.5 to get
[X3,X2, the generated tree that illustrate the most effective factors
RELIEFF-
X6,X11,
79.31 73.79 82.76
on the employees’ performance and rank them with its
X4,X9, effectiveness. The generated tree showed the five
ATTRIBUTEEVAL % % %
X14,X1, variables that had greatly affected the performance were
X10,X15] X9, X3, X2, X14, and X10 as illustrated in figure 2. That
experiment can be applied to determine whether the
variables reduction would affect the accuracy of the
classifier or not.
The C4.5 (J48) had the same accuracy percentage
when the three feature selection algorithms were used TABLE 4. ACCURACY PERCENTAGES FOR PREDICTION ALGORITHMS
IN E3 BASED ON THE FIVE EFFECTIVE VARIABLES
even with the different produced feature subsets.
Nevertheless, its accuracy in E2 is better than the
No. Technique Prediction Accuracy
opposite one in E1. The naïve bays classifier had the best
prediction accuracy with percentage 73.79% when the
1 C4.5 (J48) 79.31 %
ReliefFAttributeEval feature selection algorithm was
used. In addition, its accuracy percentages through using
the three algorithms of the feature selection in E2 was 2 Naïve Bayes 82.07 %
better than the opposite one in E1. The SVM algorithm
had the best prediction accuracy with percentage 84.14% 3 SVM 86.90 %
when the CorrelationAttributeEval feature selection
algorithm was used. In addition, its accuracy percentages
through using the three algorithms of the feature According to table 4, the results of the E3 indicated
selection in E2 was better than the opposite one in E1. that the SVM technique had the highest prediction
accuracy through using the most five effective factors
The 10 variables of the produced feature subsets
with accuracy percentage 86.90 %. If the three
had a weight from 0 to 1 and were sorted in descending order. All of them greatly affected the employees’ performance, but the most effective factor differed from one feature selection algorithm to another, based on the weight each algorithm assigned to it.

When the CorrelationAttributeEval algorithm was used, the most effective variable was the rank (X2), which had the greatest weight. Employees with a higher rank performed better than those with a lower rank, but in some cases better performance did not require a higher rank, as shown in Table 5.

The profTrain. (X9) variable had the most positive effect on the employees’ performance because it had the maximum gain ratio when the GainRatioAttributeEval algorithm was used. Employees who underwent professional training and joined courses related to their jobs performed better than those who did not. This factor was common to E1 and E2.

As shown in Table 3, the #ExpYears (X3) variable had the greatest effect on the employees’ performance when the ReliefFAttributeEval feature selection algorithm was used. Employees with more years of experience related to their jobs had a positive effect on the performance compared to those

experiments’ results of E1, E2, and E3 were reviewed, the SVM technique would have the highest prediction accuracy in all experiments. Moreover, the prediction accuracy of the SVM technique increased as the number of variables used decreased in each experiment.

The results of E3 answered the question of whether reducing the variables would affect the accuracy of the classifier. The results showed that the fewer the variables used, the higher the classifier accuracy. Therefore, it is very important to determine the variables that have the greatest effect on the performance in order to get the highest prediction accuracy.

The generated tree indicated that all five variables had some effect on employees’ performance, but the profTrain. (X9) variable had the greatest positive effect and was the starting node of the tree, as shown in Figure 2. Employees who underwent professional training performed better than those who did not, as previously illustrated. If the results of the three experiments E1, E2, and E3 were reviewed, they would show that the X9 variable was common to all experiments. The other variables that participated in the generated tree were Rank, #ExpYears, EduDegree, and SatJob.
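
C4.5 relies on gain ratio when it picks the attribute for a node, and the same measure is what GainRatioAttributeEval uses to rank attributes, which is consistent with the professional training variable appearing as the starting node. As a minimal sketch of that calculation, assuming nominal attributes only (the toy records, the column names `prof_train`, `rank`, and `performance`, and the helper functions are hypothetical and are not taken from the MOCA dataset or from WEKA):

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of `attribute` with respect to `target`.

    gain ratio = (H(target) - H(target | attribute)) / H(attribute)
    """
    labels = [row[target] for row in rows]
    attr_values = [row[attribute] for row in rows]

    # Weighted entropy of the target inside each attribute value.
    conditional = 0.0
    for value in set(attr_values):
        subset = [row[target] for row in rows if row[attribute] == value]
        conditional += len(subset) / len(rows) * entropy(subset)

    split_info = entropy(attr_values)
    if split_info == 0:  # attribute takes a single value: no useful split
        return 0.0
    return (entropy(labels) - conditional) / split_info


# Hypothetical toy records for illustration only; not the MOCA data.
sample = [
    {"prof_train": "yes", "rank": "first", "performance": "excellent"},
    {"prof_train": "yes", "rank": "third", "performance": "excellent"},
    {"prof_train": "no", "rank": "first", "performance": "good"},
    {"prof_train": "no", "rank": "third", "performance": "good"},
]

# Rank the attributes by gain ratio, highest first.
ranking = sorted(
    ["prof_train", "rank"],
    key=lambda name: gain_ratio(sample, name, "performance"),
    reverse=True,
)
```

In this toy sample the training attribute separates the two performance classes completely while the rank attribute does not separate them at all, so the training attribute comes first in `ranking`. Real data would give intermediate values, and numeric attributes would need to be discretized first.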

The Rank and #ExpYears (X2, X3) variables had a great effect on the employees’ performance; the experiments’ results showed that an employee’s seniority plays an important role in the performance of MOCA’s employees. Employees with a higher rank, such as First rank, and more years of experience performed better than the newest ones, who had lower ranks, such as Third rank, and fewer years of experience. Nevertheless, there was an exception to this rule. Employees who had many years of experience related to the job but were newly hired to MOCA, and were therefore placed in a lower financial rank such as Third rank according to the civil service law, performed better than those who had a higher rank because of a long service period but had few years of experience.

The EduDegree (X14) variable positively affected the employees’ performance: employees with higher academic degrees performed better than those with lower academic qualifications. The figures and the rules generated from the decision tree indicated that most of MOCA’s employees with higher qualifications, such as a PhD or Master’s degree, had excellent performance regardless of their financial rank.

The SatJob (X10) variable also positively affected the employees’ performance. The experiments showed that employees who were generally satisfied with their job performed better than those who were not. Even if an employee had some years of experience related to his job, if he was not satisfied, his performance would not be Excellent despite that promising experience.

Figure 2. The decision tree generated from using C4.5 algorithm for E3 to predict employees’ performance

TABLE 5. CLASSIFICATION RULES GENERATED BY C4.5 ALGORITHM IN E3 FOR PREDICTING EMPLOYEES’ PERFORMANCE
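
The accuracy values compared across E1, E2, and E3 are averages over 10-fold cross-validation as run by the WEKA toolkit. The sketch below illustrates only the averaging idea in plain Python and is not WEKA's implementation: the folds are simple shuffled splits, the majority-class "classifier" is a stand-in, and the data and all function names are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Average accuracy over k folds; each fold serves as the test set once."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Stand-in classifier (an assumption for illustration): it always predicts
# the most frequent training label. Any fit/predict pair could replace it.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature_row):
    return model


# Hypothetical data: 20 records, 15 labelled "good" and 5 "excellent".
X = [[i] for i in range(20)]
y = ["good"] * 15 + ["excellent"] * 5
accuracy = cross_val_accuracy(X, y, fit_majority, predict_majority, k=10)
```

Because every record is held out exactly once, the averaged score reflects performance on data the model did not see during training, which is why it is a fairer basis for comparing classifiers than accuracy on the training set.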

V. RESULTS AND DISCUSSION

In this research, the accuracy of the DM classification techniques was measured as the average accuracy over 10-fold cross-validation of the dataset, as supported by the WEKA toolkit. The results of the three experiments above, given in Tables II, III, and IV, showed that all three techniques had similar, moderate accuracy, greater than 70%. Moderate accuracy can be considered acceptable in many cases. In all three experiments, the dataset produced satisfactory models for each of the three selected classification techniques.

The goal of this research was to identify the most suitable classification technique for the dataset used. Accordingly, the accuracy of the model was used to determine the most appropriate classification technique for the dataset. The model was created after the classification process was evaluated using the 10-fold cross-validation technique. As shown in Tables II, III, and IV for the three experiments above, the SVM technique had the highest accuracy among the selected techniques in all the experiments. As a result, the SVM technique was the most suitable classifier for the dataset.

The research found that several variables greatly affected the performance of MOCA’s employees. One of the variables with the highest effect is ProfTrain. (X9). Evidence of the importance of professional training and its effect on employees’ performance is the state’s recent trend toward employee rehabilitation and human resource development, enrolling employees in professional training to increase their performance.

Other professional variables such as #ExpYears (X3) positively affected the employees’ performance, as shown in the results of the three experiments E1, E2, and E3. The experience factor had an important role in the performance of MOCA’s employees, provided that employees were consistently rehabilitated and supported with professional training and courses to enhance their performance.

The professional variable ServicePeriod (X4) positively affected the performance. This slight impact was shown in E2, while in the other experiments, E1 and E3, it was not significant. The performance of seniors
was high compared to juniors. Nevertheless, this was not a permanent rule, as shown in the next paragraph. Finally, the correlation between this variable and the rank (X2) variable in E2 is natural because an employee’s financial rank is based on his service period: moving from a lower rank to a higher one requires a specific number of years of service. This is an HR law.

On the other hand, the results showed that MOCA had hired new employees who held master’s and PhD degrees, in addition to hiring top graduates of the faculties. Those employees had advanced studies and capabilities and did not need a high rank or many years of experience to perform their tasks. Therefore, the EduDegree (X14) variable had a great effect on the performance, as shown in Table 5.

Some personal variables, such as Age (X11), slightly affected the employees’ performance, but without an obvious impact. Sometimes performance increases with age, which brings experience, but at other times it decreases because of lower motivation and ambition compared to younger employees.

One of the professional variables that had a positive effect on the performance was SalRange (X6). Its impact was shown in E2, where employees with high salaries performed better than those who received low salaries. Sometimes the money factor plays an important role in an employee’s performance. On the other hand, it did not have an effective role in the performance in E1 and E3. This may seem surprising, but it is natural because this research concerned employees in the public sector, where salaries are largely fixed for each financial rank. The salaries are based on the employees’ seniority, and the employees know that well.

Last but not least, the results of the experiments showed that the professional variables, followed by the educational variables, had a much greater impact on the performance of MOCA’s employees than the personal ones.

As a final analysis of the accuracy of the classification models built through the three experiments, it was noticed that the prediction accuracy was much higher in E3 than in E2 and E1 for all the techniques used except the C4.5 (J48) technique, which had the same accuracy in E3 and E2, much higher than in E1. This might indicate that the fewer the variables used in the classification process, the higher the classifier accuracy. Therefore, it is very important to determine the variables that greatly affect the performance in order to get the highest prediction accuracy.

VI. CONCLUSION AND FUTURE WORK

Applying DM techniques to the different problem domains in the HRM field is considered an important and urgent issue, especially in the public sector in Egypt, as is broadening the horizons of academic and practical research on DM in HR in order to reach a high-performing government sector.

This paper has concentrated on the capability of building a predictive model for the performance of MOCA’s employees using classification techniques, by studying and testing the factors that might positively affect their performance. Some of them greatly affected the performance prediction. Proftrain. (X9) was found to be the most effective factor on the performance, followed by #ExpYears (X3). The SVM technique was found to be the most suitable classifier for building the predictive model, as it had the greatest prediction accuracy in all three experiments, with a highest accuracy of 86.90%. The WEKA toolkit was used to execute the experiments.

For decision makers and the HRM department, this model, or an enhanced one, can be used to predict the performance of potential talents who will be promoted and the performance of recent applicants, so that various actions can be taken to avoid any risk related to hiring employees with low performance, and so on.

As future work, it is recommended to extend the dataset with a greater number of employees to get higher accuracy for the predictive model. The accuracy of other classification techniques, such as Neural Networks (NN), fuzzy logic, and many others, should also be tested to validate these findings and help select a more robust model.

Finally, when a suitable predictive model is generated, an application could be developed for decision makers and HR officials, based on the generated rules, to predict the performance of employees.

The manuscript has not been published elsewhere.

REFERENCES

[1] L. Sadath (2013), “Data Mining: A Tool for Knowledge Management in Human Resource,” International Journal of Innovative Technology and Exploring Engineering, Vol. 2, Issue 6, April 2013.
[2] G. K. Gupta (2006), “Introduction to Data Mining with Case Studies,” ISBN 81-203-3053-6.
[3] Al-Radaideh, Q. A., Al-Shawakfa, E. M., and Al-Najjar, M. I. (2006), “Mining Student Data using Decision Trees,” International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan, 2006.
[4] Surjeet K. Y., Brijesh B., Saurabh P. (2011), “Data Mining Applications: A comparative Study for Predicting Student's performance,” International Journal of Innovative Technology and Creative Engineering, Vol. 1, No. 12 (2011), 13-19.
[5] Jantan, H., Hamdan, A. R., and Othman, Z. A. (2010b), “Human talent prediction in HRM using c4.5 classification algorithm,” International Journal on Computer Science and Engineering, 2 (08-2010), pp. 2526–2534 [D].
[6] Islam, M. J., Wu, Q. M. J., Ahmadi, M., and Sid-Ahmed, M. A. (2010), “Investigating the Performance of Naive-Bayes Classifiers and K-Nearest Neighbor Classifiers,” Journal of Convergence Information Technology, Volume 5, Number 2, April 2010.
[7] Al-Radaideh, Q. A., Al-Nagi, E. (2012), “Using Data Mining Techniques to Build a Classification Model for Predicting Employees Performance,” International Journal of Advanced Computer Science and Applications, 3(2), pp. 144–151.
[8] S. Yasodha and P. S. Prakash (2012), “Data Mining Classification Technique for Talent Management using SVM,” the International Conference on Computing, Electronics and Electrical Technologies, 2012.
[9] Hua Hu, Jing Ye, and Chunlai Chai (2009), “A Talent Classification Method Based on SVM,” in International Symposium on Intelligent Ubiquitous Computing and Education, Chengdu, China, 2009, pp. 160-163.
[10] Kirimi JM, Motur CA (2016), “Application of Data Mining Classification in Employee Performance Prediction,” International Journal of Computer Applications, Volume 146, No. 7, July 2016.
[11] Desouki M. S., Al-Daher J (2015), “Using Data Mining Tools to Improve the Performance Appraisal Procedure, HIAST Case,” International Journal of Advanced Information in Arts, Science & Management, Vol. 2, No. 1, February 2015.
[12] V. Kalaivani, M. Elamparithi (2014), “An Efficient Classification Algorithms for Employee Performance Prediction,” International Journal of Research in Advent Technology, Vol. 2, No. 9, September 2014, E-ISSN: 2321-9637.
[13] Hamidah Jantan, Norazmah Mat Yusoff, and Mohamad Rozuan Noh (2014), “Towards Applying Support Vector Machine Algorithm in Employee Achievement Classification,” Proceedings of the International Conference on Data Mining, Internet Computing, and Big Data, Kuala Lumpur, Malaysia, 2014, ISBN: 978-1-941968-02-4, ©2014 SDIWC.
[14] Lipsa Sadath (2013), “Data Mining: A Tool for Knowledge Management in Human Resource,” International Journal of Innovative Technology and Exploring Engineering (IJITEE), Vol-2, April 2013.
[15] Qasem et al. (2012), “Using Data Mining Techniques to Build a Classification Model for Predicting Employees Performance,” in (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012.
[16] Jantan, H., Hamdan, A. R., and Othman, Z. A. (2010a), “Knowledge Discovery Techniques for Talent Forecasting in Human Resource Application,” International Journal of Humanities and Social Science, 5(11), pp. 694-702.
[17] Jantan, H., Hamdan, A. R., and Othman, Z. A. (2009), “Classification Techniques for Talent Forecasting in Human Resource Management,” in 5th International Conference on Advanced Data Mining and Application (ADMA), Beijing, China, 2009, pp. 496-503.
[18] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten (2009), “The WEKA data mining software: an update,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[19] Pedro Domingos (2012), “A few useful things to know about machine learning,” Communications of the ACM, 55(10), 78-87.
