Human Talent Prediction in HRM Using C4.5

This document discusses using a decision tree classifier to predict human talent. Specifically, it uses the C4.5 classification algorithm on a dataset of employee performance records from a Malaysian university to generate rules for predicting talent performance. The generated rules are then evaluated on unseen data to estimate the accuracy of prediction results. The goal is to help address one of the challenges in talent management, which is identifying existing talent in an organization.

Hamidah Jantan et al. / (IJCSE) International Journal on Computer Science and Engineering
Vol. 02, No. 08, 2010, 2526-2534
ISSN: 0975-3397

Human Talent Prediction in HRM using C4.5 Classification Algorithm

Hamidah Jantan
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA (UiTM) Terengganu,
23000 Dungun, Terengganu, Malaysia
Email: [email protected]

Abdul Razak Hamdan and Zulaiha Ali Othman
Faculty of Information Science and Technology
Universiti Kebangsaan Malaysia (UKM)
43600 Bangi, Selangor, Malaysia
Email: {arh,zao}@ftsm.ukm.my

Abstract: In HRM, one of the challenges for HR professionals is to manage an organization's talents, especially to ensure the right person for the right job at the right time. Human talent prediction is an alternative way to handle this issue. For that reason, classification and prediction in data mining, which are commonly used in many areas, can also be applied to human talent. There are many classification techniques in data mining, such as Decision Tree, Neural Network, Rough Set Theory, Bayesian theory and Fuzzy logic. Decision tree is among the popular classification techniques and can produce interpretable rules or logic statements. The rules generated by the selected technique can be used for future prediction. In this article, we present a study on how potential human talent can be predicted using a decision tree classifier. With this technique, the pattern of talent performance can be identified through the classification process, and the hidden and valuable knowledge discovered in the related databases is summarized in the decision tree structure. In this study, we use the C4.5 decision tree classification algorithm to generate classification rules for human talent performance records. Finally, the generated rules are evaluated using unseen data in order to estimate the accuracy of the prediction result.

Keywords: Human talent; Classification; Prediction; Decision Tree; C4.5 classifier; Classification Algorithm

I. INTRODUCTION¹

Nowadays, many areas, such as finance, medicine, marketing, the stock market, telecommunication, manufacturing, health care and customer relationship, have adopted data mining techniques [1-4] [5-10] [11-13] [14-16]. However, the use of classification techniques in data mining has not attracted much attention among people in the Human Resource (HR) field [17, 18]. HR data provides a rich resource for knowledge discovery and for decision support system development. In addition, today's organizations have to compete effectively in terms of cost, quality, service or innovation. The success of these tasks depends on having enough of the right people with the right skills, deployed in the appropriate locations at the appropriate point of time, which is known as talent management. Managing an organization's talent has become one of the challenges for HR professionals. This task involves many managerial decisions in order to choose the right person for the right job at the right time. Sometimes these decisions are very uncertain and difficult, and they depend on various factors such as human experience, knowledge, preference and judgment. The process of identifying existing talent in an organization is among the top talent management challenges and has become a never-ending issue [19].

In data mining, classification and prediction are among the popular tasks for knowledge discovery and future planning. The classification process is known as supervised learning, where the class label or classification target is already known. There are many techniques used for classification in data mining, such as Decision Tree, Bayesian, Fuzzy Logic, Support Vector Machine (SVM), Artificial Immune System (AIS), Neural Network, Rough Set Theory, Genetic Algorithm and Nearest Neighbor. Decision tree is among the powerful classification algorithms, as stated in some studies [20-23]. The decision tree technique has several advantages: it can produce a model that represents interpretable rules or logic statements; it is more suitable for analyzing categorical outcomes; it is non-parametric, which suits it to capturing a functional form relating independent and dependent variables; it is easy to interpret, computationally inexpensive and capable of dealing with noisy data; its prediction model is explainable to the user; it has automatic interaction detection to find significant high-order interactions quickly; and it can produce more informative outputs [18, 20]. There are many techniques in the decision tree family, such as C4.5, NBTree, SimpleCart, REPTree, BFTree and others. The C4.5 classification algorithm is easy to understand because the derived rules have a very straightforward interpretation. For these reasons, this study aims to use this classification algorithm to handle the issue of human talent prediction.

In this study, recommendation for promotion (yes/no) is considered as the target class in the classification process. For the human talent dataset, we used employees' data from one of the Malaysian higher learning institutions as the training dataset. In the first phase of the mining process, the training dataset is prepared using data mining preprocessing tasks. In the second phase, the C4.5 classifier is used to generate talent performance knowledge from the yearly performance evaluation database. The evaluation of other classifiers, such as neural network and nearest neighbor, using the same training set has been discussed in our previous work [24, 25]. In that study, the proposed classifier for the training dataset is C4.5, which is from the decision tree family. Finally, the classification rules generated by the C4.5 classifier for talent performance are analyzed to determine the interesting attributes and the correctly classified data using unseen data.

This paper is organized in five sections. The first section is the introduction. The second section describes related work on managing human talent, data mining in HR, classification for prediction, decision tree techniques and the C4.5 classifier for classification and prediction. The third section discusses the research method, while Section 4 provides some results and discussion. Finally, Section 5 ends with the concluding remarks and future research directions.

¹ This research was conducted as a part of the science fund project funded by MOSTI (Ministry of Science, Technology and Innovation), Malaysia (01-01-01-SF0236).

II. RELATED WORKS

A. Managing Human Talent

In any organization, talent management has become an increasingly crucial approach in HR functions. Talent is considered as the capability of any individual to make a significant difference to the current and future performance of the organization [26]. Managing an organization's talents involves human resource planning that emphasizes processes for managing people in the organization. Besides that, talent management can be defined as a process that ensures leadership continuity in key positions and encourages individual advancement, and as the decisions taken to manage the supply, demand and flow of talent through the human capital engine [27].

Talent management is very crucial and needs attention from HR professionals. A TP Track Research Report has found that the top current and future talent management challenges include developing existing talent; forecasting talent needs; attracting and retaining the right leadership talent; engaging talent; identifying existing talent; attracting and retaining the right leadership and key contributors; deploying existing talent; lack of leadership capability at senior levels; and ensuring a diverse talent pool [19]. In this study, we focus on one of these talent management challenges, i.e. identifying existing talent, in particular the key talent in an organization, by predicting their performance. For that reason, we use past data from the employee database to implement the classification and prediction process. The talent management process consists of recognizing the key talent areas in the organization, identifying the people in the organization who constitute its key talent, and conducting development activities for the talent pool to retain and engage them and to have them ready to move into more significant roles [27], as illustrated in Fig. 1. These processes involve HR activities that need to be integrated into an effective system such as a decision support system [28].

Figure 1. Human Talent Prediction using Data Mining Technique

B. Data Mining for Talent Management

Data mining is a step in the Knowledge Discovery in Database (KDD) process; it currently receives great attention and is recognized as a newly emerging analysis tool [20]. Recently, data mining has received a great deal of concern and attention in the information industry and in society as a whole. This is due to the wide accessibility of enormous amounts of data and the important need to turn such data into useful information and knowledge [29]. Data mining has several tasks, such as classification and prediction; concept description; association; cluster analysis; outlier analysis; trend and evaluation analysis; statistical analysis and others. Computer applications that interface with data mining tools can help executives make more informed and objective decisions. Besides that, they can help managers retrieve, summarize and analyze related data to make wiser and more informed decisions. There are very few studies related to prediction applications in HR using this approach. However, this approach is quite popular in HR personnel selection problems. From the literature, prediction applications in HRM are infrequent; some examples are predicting the length of service, sales premiums and persistence indices of insurance agents, and analyzing mis-operation behaviors of operators [18]. Over the years, data mining has involved various techniques, including statistics, SVM, AIS, neural network, decision tree, genetic algorithm and visualization techniques. Data mining has been applied in many fields such as finance, marketing, manufacturing, health care and customer relationship. However, its application in HRM is rare [18].

Recently, with new demands and increased visibility, HRM seeks a more strategic role by turning to data mining methods [17]. This can be done by identifying patterns generated from the existing data in HR databases as useful knowledge. Thus, this study concentrates on identifying the patterns that relate to talent. The patterns can be generated using some of the major data mining techniques. The clustering technique can be used to list employees with similar characteristics, to group performances and so on. With the association technique, the discovered patterns can be used to associate an employee's profile with the most appropriate program or job, to associate an employee's attitude with performance and so on. In classification and prediction, the patterns can be used to predict the percentage accuracy in an employee's performance, behavior and attitudes, to predict the performance progress throughout the performance period, and to identify the best profile for different employees [30].

Matching data mining problems to talent management needs is very crucial. Therefore, it is very important to determine the suitable data mining techniques. In HRM, there is some interest in solving HRM problems using a data mining approach [17, 31]. There are very few discussions on the use of data mining for employee performance prediction, project assignment, employee recruitment and many others. For these reasons, this study attempts to use the data mining approach for employee performance prediction as one of the methods to predict human talent in an organization. The purpose of this study is to determine employees' performance by predicting it from past experience knowledge, using previous performance evaluation data. In this study, the classification technique is used for human talent prediction.

C. Classification for Prediction

Classification and prediction are among the methods that can produce intelligent decisions. Currently, many classification and prediction methods have been proposed by researchers in machine learning, pattern recognition and statistics. In this study, we focus on classification methods in data mining as part of the machine learning process. Classification and prediction in data mining are two forms of data analysis that can be used to extract models that describe important data classes or predict future data trends [32], as shown in Fig. 2.

Figure 2. Classification Process in Data Mining

The classification process has two phases. The first phase is the learning process, where the training data are analyzed by the classification algorithm. The learned model or classifier is represented in the form of classification rules. The second phase is the classification process, where the test data are used to estimate the accuracy of the classification model or classifier. If the accuracy is considered acceptable, the model can be applied to new data to obtain the prediction result. There are many techniques that can be used for classification, such as decision tree, Bayesian methods, Bayesian network, rule-based algorithms, neural network, support vector machine, association rule mining, k-nearest-neighbor, case-based reasoning, genetic algorithms, rough sets and fuzzy logic. In this study, our discussion focuses on three main classification techniques, i.e. decision tree, neural network and k-nearest-neighbor. Decision tree and neural network are found useful in developing predictive models in many fields [20].

D. Decision Tree Techniques

A decision tree can produce a model with rules that are human-readable and interpretable. The classification task using the decision tree technique can be performed without complicated computations, and the technique can be used for both continuous and categorical variables. This technique is suitable for predicting categorical outcomes and less appropriate for applications with time series data [20]. Decision tree classifiers are quite popular because the construction of the tree does not require any domain expert knowledge or parameter setting, and it is appropriate for exploratory knowledge discovery. Currently, many studies employ decision tree techniques, such as in electricity energy consumption [20], prediction of breast cancer [23], accident frequency [33], personnel selection [18], job attitudes [34] and others. It is stated that the decision tree is among the powerful classification algorithms [20-23]. Some decision tree classifiers are C4.5/C5.0/J4.8, NBTree, SimpleCart, REPTree, BFTree and others.

E. C4.5 Decision Tree Classifier

The C4.5 technique is a member of the decision tree family that can produce both decision trees and rule-sets, and it constructs a tree for the purpose of improving prediction accuracy [21, 23]. Besides that, C4.5 models are easy to understand because the rules derived from the technique have a very straightforward interpretation. The C4.5 / C5.0 / J48 classifier is among the most popular and powerful decision tree classifiers [20-23]. C5.0 and J48 are improved versions of the C4.5 algorithm. The WEKA toolkit package has its own version known as J48, which is an optimized implementation of C4.5 rev. 8. C4.5 creates an initial tree using the divide-and-conquer algorithm as described below [35]:

• If all the cases in S belong to the same class or S is small, the tree is a leaf labeled with the most frequent class in S.
• Otherwise, choose a test based on a single attribute with two or more outcomes. Make this test the root of the tree with one branch for each outcome of the test, partition S into corresponding subsets S1, S2, … according to the outcome for each case, and apply the same procedure recursively to each subset.

Usually there are many tests that can be selected in the last step. C4.5 uses two heuristic criteria to rank possible tests: information gain, an attribute selection measure that minimizes the total entropy of the subsets {Si}, and the default gain ratio, which divides information gain by the information provided by the test outcomes. The information gain algorithm is described as the function Gain(A), which is shown below:

• Select the attribute with the highest information gain.
• S contains s_i tuples of class C_i for i = {1, …, m}.
• Information measure or expected information required to classify any arbitrary tuple:

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s} \qquad (1)$$

• Entropy of attribute A with values {a_1, a_2, …, a_v}:

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj}) \qquad (2)$$

• Information gain means how much can be gained by branching on attribute A:

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A) \qquad (3)$$

III. METHODOLOGY

In this study, we attempt to discover employees' performance patterns from the existing employees' performance data using decision tree classification techniques. The techniques are chosen based on the common techniques for classification and prediction in data mining. The decision tree is a 'divide-and-conquer' approach applied to a set of independent instances. In this experiment, we used the C4.5 classifier, which comes from the decision tree family and has been evaluated in our previous work [25, 36]. The input variables for the process are the performance factors for the selected attributes, and the outcome is the employee's performance, represented by the status of recommendation for promotion, whether "yes" or "no", as shown in Table I.

The attributes for the training dataset are selected based on the related factors for employee performance, as illustrated in Fig. 3. These attributes are extracted from the individual factors component, i.e. work outcome; knowledge and skill; individual quality; and activities and contribution. In this study, the performance factors are based on the performance appraisal standard used by Malaysian public sector organizations. Besides the performance factors, some background information is also considered as part of the attributes for the dataset.

TABLE I. ATTRIBUTE DESCRIPTION

Attribute/Factor | Description
Category | P - Professional, S - Support Staff
Gender | Male and Female
Qualification | Doctorate, Master, Bachelor, Diploma and Certificate
PK1…6 | Work Outcome (50%)
PM1…6 | Knowledge and Skill (25%)
KP1…6 | Individual Quality (20%)
KS1…6 | Activities and Contribution (5%)
YEAR1…6 | Evaluation mark (100%)
Target/Class | Recommendation for promotion (Yes or No)

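Equations (1) to (3) of Section II.E, together with the gain ratio described there, can be made concrete with a small Python sketch. It is illustrative only: the function names and the four toy records are hypothetical and not taken from the study's dataset, all attributes are treated as categorical, and `s` in Eq. (1) is taken to be the total number of tuples in S.

```python
from collections import Counter
from math import log2


def info(labels):
    """Expected information I(s1, ..., sm) of a list of class labels, Eq. (1)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels by the value each row takes on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def attribute_entropy(rows, labels, attr):
    """Weighted entropy E(A) after branching on attribute `attr`, Eq. (2)."""
    total = len(labels)
    return sum(len(g) / total * info(g) for g in partition(rows, labels, attr).values())


def gain(rows, labels, attr):
    """Information gain Gain(A) = I(s1, ..., sm) - E(A), Eq. (3)."""
    return info(labels) - attribute_entropy(rows, labels, attr)


def gain_ratio(rows, labels, attr):
    """Gain divided by the information provided by the test outcomes."""
    split_info = info([row[attr] for row in rows])
    return gain(rows, labels, attr) / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumption: not the study's data).
rows = [
    {"Gender": "Male", "Category": "P"},
    {"Gender": "Male", "Category": "S"},
    {"Gender": "Female", "Category": "P"},
    {"Gender": "Female", "Category": "S"},
]
labels = ["yes", "no", "yes", "no"]

# Select the attribute that ranks highest under the gain ratio criterion.
best = max(rows[0], key=lambda a: gain_ratio(rows, labels, a))
print(best, gain(rows, labels, "Category"), gain(rows, labels, "Gender"))
```

In this toy case `Category` separates the two classes perfectly, so it is the attribute selected, while `Gender` carries no information about the class.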


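The divide-and-conquer procedure quoted in Section II.E can be sketched as a short recursive function. This is a simplified illustration under stated assumptions, not the C4.5 or WEKA J48 implementation: it handles categorical attributes only, ranks tests by plain information gain rather than gain ratio, does no pruning, and uses a hypothetical `min_cases` threshold to stand in for "S is small". The toy cases are invented for the example.

```python
from collections import Counter
from math import log2


def info(labels):
    """Expected information (entropy) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split(cases, attr):
    """Partition the cases S into subsets S1, S2, ... by their value of `attr`."""
    subsets = {}
    for case in cases:
        subsets.setdefault(case[0][attr], []).append(case)
    return subsets


def gain(cases, attr):
    """Information gain obtained by branching on `attr`."""
    labels = [label for _, label in cases]
    remainder = sum(
        len(sub) / len(cases) * info([label for _, label in sub])
        for sub in split(cases, attr).values()
    )
    return info(labels) - remainder


def build_tree(cases, attrs, min_cases=2):
    """Return a class label (leaf) or an (attribute, {outcome: subtree}) node."""
    labels = [label for _, label in cases]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: all cases share one class, S is small, or no test is left.
    if len(set(labels)) == 1 or len(cases) < min_cases or not attrs:
        return majority
    # Otherwise choose a single-attribute test and recurse on each subset.
    best = max(attrs, key=lambda a: gain(cases, a))
    rest = [a for a in attrs if a != best]
    return best, {
        outcome: build_tree(sub, rest, min_cases)
        for outcome, sub in split(cases, best).items()
    }


def classify(tree, record):
    """Follow the branches matching `record` until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[record[attr]]
    return tree


# Hypothetical toy cases: (attribute values, recommendation for promotion).
cases = [
    ({"Qualification": "Master", "Category": "P"}, "yes"),
    ({"Qualification": "Master", "Category": "S"}, "yes"),
    ({"Qualification": "Diploma", "Category": "P"}, "no"),
    ({"Qualification": "Diploma", "Category": "S"}, "no"),
]
tree = build_tree(cases, ["Qualification", "Category"])
print(tree)
print(classify(tree, {"Qualification": "Master", "Category": "S"}))
```

Each path from the root to a leaf of such a tree reads as one IF ... AND ... rule, which is the form in which the classification rules of this study are reported.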

Figure 3. Human Talent Prediction using Data Mining Techniques

The aim of this study is to generate a forecasting model that contains classification rules for human talent prediction. The classification rules also show the interesting or important attributes of the dataset. The forecasting model will be used to determine whether an employee is recommended for promotion or not, based on his/her performance. In this experiment, the training dataset contains 33 related attributes from the background information and performance factors described in Table I. The initial dataset contains 655 records of performance evaluation marks from six years (2003-2008). Each record holds the evaluation marks for the selected factors and the total mark for each of the years. The data mining tools used are the WEKA and ROSETTA toolkits. The flow of the research process is shown in Fig. 4.

This study has three phases. The first phase is the data collection process, which involves data cleaning and data preprocessing. In the preprocessing phase of the experiment, the initial dataset is divided into two sets of data. The first set is the training dataset, which contains 590 records, and the second is the testing dataset or unseen data, which comprises the 10% of records extracted from the initial data. The second phase is to generate the classification rules using the C4.5 classifier on the training dataset. In this case, we use all the selected attributes defined in Table I. The C4.5 classifier produces the analysis of the training dataset and the classification rules. The third phase of the experiment is the evaluation and interpretation of the classification rules using the unseen data. The purpose of the evaluation process is to determine the accuracy of the classification rules for prediction and to identify the important or interesting attributes and rules. The hidden knowledge from the performance evaluation is embedded into the decision support system for employees' performance prediction. Thus, the intelligent decision support system can be used to predict whether an employee is recommended for promotion or not.

Figure 4. Research Phases
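The division of the 655 initial records into 590 training records and 10% unseen records can be illustrated with a simple hold-out split. This is a sketch under assumptions: the text does not state how the 10% was drawn, so the seeded random shuffle, the function name `holdout_split` and the integer stand-ins for records are all hypothetical.

```python
import random


def holdout_split(records, test_fraction=0.10, seed=0):
    """Shuffle the records and hold out a fraction of them as unseen test data."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)  # 10% of 655, rounded down
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in record identifiers; the study's 655 records are not reproduced here.
records = list(range(655))
train, test = holdout_split(records)
print(len(train), len(test))  # 590 65
```

Rounding 10% of 655 down gives 65 unseen records and leaves 590 for training, which matches the sizes reported in this section.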

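The third phase, applying the generated rules to unseen records and counting how many are classified correctly, can be sketched as follows. The conditions are the three sample rules reported in Table III; everything else is an assumption made for illustration: the function names, the record layout as a dictionary, and the four unseen records, which are invented and not part of the study's testing data.

```python
def sample_rule(record):
    """Apply sample rules 1-3 of Table III; return None if none of them fires."""
    shared = (
        record["Knowledge5"] <= 22.26
        and record["Outcome4"] <= 45
        and record["Contribution2"] <= 3.75
        and record["Contribution1"] <= 3
        and record["Knowledge6"] <= 22.05
        and record["Contribution3"] <= 3.25
    )
    if not shared:
        return None  # would be handled by one of the rules not listed in Table III
    if record["Gender"] == "Female":
        return "NO"  # rule 3
    if record["Gender"] == "Male":
        return "YES" if record["Outcome2"] > 43.5 else "NO"  # rules 2 and 1
    return None


def accuracy(predicted, actual):
    """Percentage of records whose predicted class equals the actual class."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical unseen records: (attribute values, actual recommendation).
base = {"Knowledge5": 20.0, "Outcome4": 40.0, "Contribution2": 3.0,
        "Contribution1": 2.5, "Knowledge6": 21.0, "Contribution3": 3.0}
unseen = [
    ({**base, "Gender": "Male", "Outcome2": 42.0}, "NO"),
    ({**base, "Gender": "Male", "Outcome2": 44.0}, "YES"),
    ({**base, "Gender": "Female", "Outcome2": 44.0}, "NO"),
    ({**base, "Gender": "Female", "Outcome2": 46.0}, "YES"),
]
predicted = [sample_rule(rec) for rec, _ in unseen]
actual = [label for _, label in unseen]
print(predicted, accuracy(predicted, actual))
```

With these invented records, three of the four predictions match the actual class, so the sketch reports 75% accuracy; the figures for the real testing data are those given in the results section.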

IV. RESULTS AND DISCUSSIONS information provided by the test outcomes. With these
The classification rules generated from C4.5 classifier is criteria, C4.5 classifier can also be used to determine the
human readable and easy to understand which do not require important or interesting attributes from the dataset. In this
any domain expert knowledge or parameter setting. This study, the important attributes are identified through the
technique can produce both the decision tree and rule-sets number of hits for each of the attributes from generated
and can construct a tree for the purpose of improving the classification rules. The number of hits for all selected
prediction accuracy [21, 23]. On top of that, C4.5 model attributes is shown in Table II.
produces easy to understand rules with a very TABLE II. INTERESTING ATTRIBUTES
straightforward interpretation. As stated in [1-4], the
C4.5/C5.0/J4.8 classifier is among the popular and powerful Attributes Hits Attributes Hits
decision tree classifiers. The extracted summary of the
analysis for C4.5 classifier on the performance evaluation Category 1 Contribution3 2
training dataset from WEKA toolkit is shown below: Gender 2 Outcome4 2
Correctly Classified Instances 561 95.0847 % Qualification 1 Knowledge4 2
Incorrectly Classified Instances 29 4.9153 %
Kappa statistic 0.8489 Outcome1 2 Personal4 1
Mean absolute error 0.0861 Knowledge1 2 Contribution4 1
Root mean squared error 0.2075
Personal1 2 Knowledge5 1
Relative absolute error 25.3139 %
Root relative squared error 50.3528 % Contribution1 2 Contribution5 1
Total Number of Instances 590
YEAR1 2 Outcome6 2
Outcome2 2 Knowledge6 2
In addition, Fig. 5 shows part of the decision tree
classification model using C4.5 classifier. In this case, the Knowledge2 4 Personal6 1
leaves in the tree represent the classification result or the Contribution2 1 Contribution6 2
target class. As mentioned earlier, the classification’s target
Knowledge3 1
is the recommendation for promotion. The C4.5 classifier
uses two heuristic criterions to rank possible tests:
information gained by using attribute selection measure and In this experiment, from 33 selected attributes, only 23
the default gained ratio that divides information gain by the attributes are hit in the classification rules. As shown in

Figure 5. Sample of Rules using C4.5 Classifier

ISSN : 0975-3397 2531


Hamidah Jantan et al. / (IJCSE) International Journal on Computer Science and Engineering
Vol. 02, No. 08, 2010, 2526-2534

Table II, the result reveals that the Knowledge2 attribute has techniques such as NBTree, SimpleCart, REPTree, BFTree
the highest hits as compared to other attributes. In this case, and many others should be tested to support the technique.
this result indicates the importance of the attribute for the
training dataset. The knowledge2 attribute represents the TABLE IV. RULE ANALYSIS
factor of knowledge and skill for year 2 performance
evaluation marks. This experiment result also indicates the Rule Hits Rule Hits Rule Hits Rule Hits
important of knowledge and skill factor for the employees’
performance, where the knowledge and skill attributes are 1 12 23 34
all listed in the importance of attributes in Table II. Another 2 1 13 24 35 3
important analysis from the experiment is the analysis of the
generated rules. The C4.5 classifier generated 43 3 1 14 4 25 2 36 1
classification rules from 590 training datasets. The sample
decision tree is illustrated in Figure 3 and the sample rules 4 4 15 1 26 2 37 1
that are extracted from the tree are shown in Table III.
5 1 16 27 38 3
TABLE III. SAMPLE OF CLASSIFICATION RULES
6 17 28 1 39 1
Rule No Classification Rules Result
7 18 29 40 1

IF (Knowledge5<=22.26) AND IF 8 22 19 1 30 1 41 4
(Outcome4<=45) AND IF
(Contribution2<=3.75) AND IF 9 20 1 31 42 1
1 NO
(Contribution1<=3) AND IF
(Knowledge6<=22.05) AND IF 10 21 32 43 5
(Contribution3<=3.25) AND IF
(Gender=Male) AND IF (Outcome2<=43.5) 11 3 22 33

IF (Knowledge5<=22.26) AND IF Table V shows the rules evaluation where 52 records


(Outcome4<=45) AND IF
(Contribution2<=3.75) AND IF
matched the rules which is 77% from the total number of
2 (Contribution1<=3) AND IF
YES records in the new data set. This result indicates the
(Knowledge6<=22.05) AND IF accuracy of the classification for the new dataset.
(Contribution3<=3.25) AND IF
(Gender=Male) AND IF (Outcome2>43.5) TABLE V. RULE EVALUATION

Number of
Status Accuracy(%)
IF (Knowledge5<=22.26) AND IF Data
(Outcome4<=45) AND IF Correctly
(Contribution2<=3.75) AND IF 52 77
3 NO Classified
(Contribution1<=3) AND IF
Incorrectly
(Knowledge6<=22.05) AND IF 13 23
Classified
(Contribution3<=3.25) AND IF
(Gender=Female)

V. CONCLUSION
Table IV shows the analysis of rules by looking at the This article has described the significance of the study on
number of hits for each rule applied on the unseen or testing the use of data mining classification techniques for
data (65 data). In this case, rule no 8 has the highest number employees’ performance prediction. However, there should
of hits as compared to the other rules. From that, we can be more data mining classification techniques applied to the
conclude that the rule is important or interesting. However, different problem domains in HR field of research to
further experiments are needed to verify this finding.
broaden the horizon of academic and practice work on data
Although the finding seems very interesting, further studies
mining in HR. Besides that, other data mining techniques
are needed to validate and explore more on the result. The
experiment related to the relevancy analysis should be such as SVM, Fuzzy logic and AIS should also be
conducted to determine the effects on the accuracy of the considered for future work on classification techniques using
classifier. Besides that, the experiment for attribute the same dataset.
reduction should also be performed as an approach to reduce In this study, as we can see from result analysis, C4.5
the processing time and space while improving the accuracy classifier has a great potential for performance prediction.
of the classification model. The generated classification rules can be used to predict the
In this study, we observed the great potential of C4.5 as a performance of an employee whether he/she has potential to
classification technique for employee’s performance be promoted or not, based on his/her performance..
prediction. In the next stage of data mining classification Currently, this research is at the stage of system
technique analysis, C4.5 classifier should be further development where the classification rules need to be
analyzed with other datasets to justify the suitability of the embedded into the decision support system which is known
C4.5 classifier for the performance evaluation dataset. as Intelligent Decision Support System (IDSS). The IDSS is
Furthermore, the other decision tree classification algorithms a substantial of intelligent system which can help the
should also be tested in order to validate these findings. For decision makers to determine their potential employee for
that reason, in the next experiment, other decision tree promotion or may be for other tasks. This intelligent system

ISSN : 0975-3397 2532


Hamidah Jantan et al. / (IJCSE) International Journal on Computer Science and Engineering
Vol. 02, No. 08, 2010, 2526-2534

will be discussed in our next study. In conclusion, the ability to continuously change and obtain new understanding of classification and prediction in HR research has become the major contribution to data mining in HRM.

ACKNOWLEDGMENT

The authors are indebted to Madam Hanita Yusuf, Assistant Registrar at Universiti Teknologi MARA Terengganu, for her permission to use the existing employees’ evaluation marks as the training data set in this study.

REFERENCES

[1] Y. Feng, et al., "Knowledge discovery in traditional Chinese medicine: State of the art and perspectives," Artificial Intelligence in Medicine, vol. 38, pp. 219-236, 2006.
[2] E. Frank, M. Hall, et al., "Data mining in bioinformatics using Weka," Bioinformatics Application Note, vol. 20, pp. 2479-2481, 2004.
[3] C. Combes, N. Meskens, C. Rivat, and J. P. Vandamme, "Using a KDD process to forecast the duration of surgery," International Journal of Production Economics, vol. 112, pp. 279-293, 2008.
[4] C. L. Chang and C. H. Chen, "Applying decision tree and neural network to increase quality of dermatologic diagnosis," Expert Systems with Applications, vol. 36, pp. 4035-4041, 2009.
[5] A. S. Chang and S. S. Leu, "Data mining model for identifying project profitability variables," International Journal of Project Management, vol. 24, pp. 199-206, 2006.
[6] B. Ekasingh, K. Ngamsomsuke, et al., "A data mining approach to simulating farmers' crop choices for integrated water resources management," Journal of Environmental Management, vol. 77, pp. 315-325, 2005.
[7] R. Arbel and L. Rokach, "Classifier evaluation under limited resources," Pattern Recognition Letters, vol. 27, pp. 1619-1631, 2006.
[8] F. E. Ciarapica and G. Giacchetta, "Classification and prediction of occupational injury risk using soft computing techniques: An Italian study," Safety Science, vol. 47, pp. 36-49, 2009.
[9] V. Cho and E. W. T. Ngai, "Data mining for selection of insurance sales agents," vol. 20, pp. 123-132, 2003.
[10] S. H. Liao, Y. N. Chen, and Y. Y. Tseng, "Mining demand chain knowledge of life insurance market for new product development," Expert Systems with Applications, vol. 36, pp. 9422-9437, 2009.
[11] S. T. Li and L. Y. Shue, "Data mining to aid policy making in air pollution management," Expert Systems with Applications, vol. 27, pp. 331-340, 2004.
[12] W. S. D. Chen, Y.K., "Using neural networks and data mining techniques for the financial distress prediction model," Expert Systems with Applications, vol. 36, pp. 4075-4086, 2009.
[13] I. Bose and R. K. Mahapatra, "Business data mining - a machine learning perspective," Information & Management, vol. 39, pp. 211-225, 2001.
[14] C. Rygielski, J. C. Wang, and D. C. Yeh, "Data mining techniques for customer relationship management," Technology in Society, vol. 24, pp. 483-502, 2002.
[15] C. Romero and S. Ventura, "Educational data mining: A survey from 1995 to 2005," Expert Systems with Applications, vol. 33, pp. 135-146, 2007.
[16] J. Ranjan and K. Malik, "Effective educational process: A data mining approach," VINE: The Journal of Information and Knowledge Management Systems, vol. 37, pp. 502-515, 2007.
[17] J. Ranjan, "Data Mining Techniques for better decisions in Human Resource Management Systems," International Journal of Business Information Systems, vol. 3, pp. 464-481, 2008.
[18] C. F. Chien and L. F. Chen, "Data mining to improve personnel selection and enhance human capital: A case study in high-technology industry," Expert Systems with Applications, vol. 34, pp. 280-290, 2008.
[19] A TP Track Research Report, "Talent Management: A State of the Art," Tower Perrin HR Services, 2005.
[20] G. K. F. Tso and K. K. W. Yau, "Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks," Energy, vol. 32, pp. 1761-1768, 2007.
[21] I. Becerra-Fernandez, S. H. Zanakis, and S. Walczak, "Knowledge discovery techniques for predicting country investment risk," Computers & Industrial Engineering, vol. 43, pp. 787-800, 2002.
[22] P. R. Kumar and V. Ravi, "Bankruptcy prediction in banks and firms via statistical and intelligent techniques: A review," European Journal of Operational Research, vol. 180, pp. 1-28, 2007.
[23] D. Delen, G. Walker, and A. Kadam, "Predicting breast cancer survivability: A comparison of three data mining methods," Artificial Intelligence in Medicine, vol. 34, pp. 113-127, 2005.
[24] J. Hamidah, H. Abdul Razak, and O. Zulaiha Ali, "Classification Techniques for Talent Forecasting in Human Resource Management," in Advanced Data Mining and Application, R. H. Q. Yang, J. P. J. Gama, and X. Meng, Eds. Beijing, China: Springer-Verlag Berlin Heidelberg, 2009, pp. 496-503.
[25] J. Hamidah, H. Abdul Razak, and A. O. Zulaiha, "Classification for Talent Management using Decision Tree Induction Techniques," in 2nd Data Mining and Optimization Seminar (DMO’09), Bangi, Selangor, 2009, pp. 15-20.
[26] M. Lynne, "Talent Management Value Imperatives: Strategies for Execution," The Conference Board, 2005.
[27] I. Cunningham, "Talent Management: Making it real," Development and Learning in Organizations, vol. 21, pp. 4-6, 2007.
[28] CHINA UPDATE, "HR News for Your Organization: The Tower Perrin Asia Talent Management Study," 2007.
[29] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publisher, 2006.
[30] H. Jantan, A. R. Hamdan, and Z. A. Othman, "Data Mining Techniques for Performance Prediction in Human Resource Application," in 1st Seminar on Data Mining and Optimization, Selangor, 2008, pp. 41-49.
[31] H. Jantan, A. R. Hamdan, and Z. A. Othman, "Classification Techniques for Talent Forecasting in Human Resource Management," in 5th International Conference on Advanced Data Mining and Application (ADMA), Beijing, China, 2009, pp. 496-503.
[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publisher, 2006.
[33] L. Y. Chang and W. C. Chen, "Data mining of tree-based models to analyze freeway accident frequency," Journal of Safety Research, vol. 36, pp. 365-375, 2005.
[34] K. Y. Tung, I. C. Huang, S. L. Chen, and C. T. Shih, "Mining the Generation Xer's job attitudes by artificial neural network and decision tree - empirical evidence in Taiwan," Expert Systems with Applications, vol. 29, pp. 783-794, 2005.
[35] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, "Top 10 algorithms in data mining," Knowledge Information System, 2008.
[36] H. Jantan, A. R. Hamdan, Z. A. Othman, and M. Puteh, "Applying Data Mining Classification Techniques for Employee's Performance Prediction," in Knowledge Management 5th International Conference (KMICe2010), Kuala Terengganu, Terengganu Malaysia, 2010, pp. 645-652.

AUTHORS PROFILE

Hamidah Jantan received her first degree in Computer Science in 1989 from Universiti Teknologi Malaysia (UTM), Kuala Lumpur, Malaysia. She obtained her master degree in Information Technology (Science and System Management) from Universiti Kebangsaan Malaysia (UKM) in 2002. She is currently working towards her Ph.D. degree in the Faculty of Information Science and Technology (FTSM), Universiti Kebangsaan Malaysia (UKM), Malaysia. In addition, she is a senior lecturer at Universiti Teknologi MARA (UiTM) Terengganu. Her research interests include Intelligent System, Decision Support System and Data Mining technology.

Abdul Razak Hamdan is a Professor at Faculty of Information Science and Technology (FTSM), Universiti Kebangsaan Malaysia (UKM). He received his first degree in Science from Universiti Kebangsaan Malaysia (UKM) in 1975, his master degree in Computing Science from University of Newcastle Upon Tyne, England in 1977, and PhD degree in Artificial Intelligence from Loughborough University of Technology, England in 1987. His research interests include Artificial Intelligence, Decision Support System, Strategic Planning and Data mining.

Zulaiha Ali Othman is an associate Professor at Faculty of Information Science and Technology (FTSM), Universiti Kebangsaan Malaysia (UKM). She received her first degree in computer science from Universiti Kebangsaan Malaysia (UKM) in 1990, her master degree in Software Technology from University of Sheffield, UK in 1997, and PhD degree in Computing (Agent Oriented Methodology) from Sheffield Hallam University, UK, in 2003. Her research interests include Artificial Intelligence, Agent Technology and Data mining.
