Predicting Stock Prices Using Data Mining Techniques
All content following this page was uploaded by Qasem A. Al-Radaideh on 18 September 2015.
QASEM A. AL-RADAIDEH (1), ADEL ABU ASSAF (2), EMAN ALNAGI (3)
(1) Department of Computer Information Systems, Faculty of Information Technology and Computer Science,
Yarmouk University, Irbid, Jordan. {[email protected]}
(2) ICT Department, Amman Stock Exchange, Amman, Jordan. {[email protected]}
(3) Department of Computer Science, Faculty of Information Technology,
Philadelphia University, Jordan. {[email protected]}
ABSTRACT
Forecasting stock return is an important financial subject that has attracted researchers’ attention for many years.
It involves an assumption that fundamental information publicly available in the past has some predictive
relationships to the future stock returns. This study aims to help investors in the stock market decide the
best timing for buying or selling stocks based on the knowledge extracted from the historical prices of such
stocks. The decision is based on a decision tree classifier, which is one of the data mining techniques. To
build the proposed model, the CRISP-DM methodology is applied to real historical data of three major companies
listed on the Amman Stock Exchange (ASE).
Keywords: Data Mining, Data Classification, Decision Tree, Future stock return, data mining techniques,
decision tree classifiers, CRISP-DM methodology, Amman Stock Exchange.
The authors examined the effectiveness of the neural network models used for level estimation and classification. The results showed that the trading strategies guided by the neural network classification models generate higher profits under the same risk exposure than those suggested by other strategies.

The research in [13] was essentially a comparison between Fama and French’s model [14-15] and artificial neural networks (ANNs) in predicting stock prices in the Chinese market. The purpose of that study was to demonstrate the accuracy of ANNs in predicting stock price movement for firms traded on the Shanghai Stock Exchange. To do so, the authors made a comparative analysis between Fama and French’s model and the predictive power of univariate and multivariate neural network models. The results indicated that artificial neural networks offer an opportunity for investors to improve their predictive power in selecting stocks and, more importantly, that a simple univariate model appears to be more successful at predicting returns than a multivariate model.

Al-Haddad et al. [16] presented a study that aimed to provide evidence of whether or not the corporate governance and performance indicators of the Jordanian industrial companies listed on the Amman Stock Exchange (ASE) are affected by the variables that were proposed, and to provide the important indicators of the relationship between corporate governance and firms’ performance that can be used by Jordanian industrial firms to solve the agency problem. The study’s random sample consisted of 44 Jordanian industrial firms. The study found a positive direct relationship between corporate governance and corporate performance.

Hajizadeh et al. [17] provided an overview of the application of data mining techniques such as decision trees, neural networks, association rules, and factor analysis in stock markets.

Predicting stock prices or financial markets has been one of the biggest challenges to the AI community. Various technical, fundamental, and statistical indicators have been proposed and used with varying results. Soni [18] surveyed some recent literature in the domain of machine learning techniques and artificial intelligence used to predict stock market movements. ANNs were identified as the dominant machine learning technique in the stock market prediction area.

El-Bakry and Awad [19] proposed a new approach for fast forecasting of stock market prices. The proposed approach uses new high speed time delay neural networks (HSTDNNs). The authors used the MATLAB tool to simulate results to confirm the theoretical computations of the approach.

3. THE METHODOLOGY OF THE STUDY

A data mining methodology is designed to ensure that the data mining effort leads to a stable model that successfully addresses the problem it is designed to solve. Various data mining methodologies have been proposed to serve as blueprints for how to organize the process of gathering data, analyzing data, disseminating results, implementing results, and monitoring improvements [9]. To build the model that analyses the stock trends using the decision tree technique, the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology [20] is used. This methodology was proposed in the mid-1990s by a European consortium of companies to serve as a non-proprietary standard process model for data mining. The model consists of the following six steps:

1. Understanding the reason and objective of mining the stock prices.
2. Understanding the collected data and how it is structured.
3. Preparing the data that is used in the classification model.
4. Selecting the technique to build the model.
5. Evaluating the model by using one of the well-known evaluation methods.
6. Deploying the model in the stock market to predict the best action to be taken, either selling or buying the stocks.

Understanding the reason and objective of building the model

The main reason and objective of building the model is to help investors in the stock market decide the best timing for buying or selling stocks based on the knowledge extracted from the historical prices of such stocks. The decision is based on one of the data mining techniques: the decision tree classifier.

Understanding the collected data

The Oracle database of the Amman Stock Exchange (ASE) contains the historical prices of the 230 companies listed on the exchange since the year 2000. As the amount of such data is very large and complicated, the decision was taken to choose three companies listed on the exchange. The selection of these companies was based on the following five criteria, which represent the companies’ size and liquidity: market capitalization, days traded, turnover ratio, value traded, and the number of shares traded. Sector representation was also considered during the selection. The selected companies are “Arab Bank”, whose code in the stock market is “ARBK” and which belongs to the banking sector; “United Arab Investors Company”, whose code is “UAIC” and which belongs to the services sector; and “Middle East Complex for Engineering, Electronics and Heavy Industries”, whose code is “MECE” and which belongs to the industrial sector. The period selected is from April 2005 to May 2007, which represented the current and actual status of the market at that time.

Initially, the collected data contained 9 attributes; this number was reduced manually to 6 attributes, as the other attributes were found to be unimportant and to have no direct effect on the study. Table 1 shows the 6 selected attributes with their descriptions and possible values. The class attribute, named “Action”, is the investor action: whether to buy or sell the stock. The data for this attribute was also taken from the ASE database; it is the daily net position of one of the biggest brokers dealing with the above-mentioned stocks. The net position could be either buying or selling that stock for that day.
Table 1: Attribute Description
Preparing the data

When the data was first collected, all values of the selected attributes were continuous numeric values. Data transformation was applied by generalizing the data to a higher-level concept so that all values became discrete. The criterion used to transform the numeric values of each attribute into discrete values depended on the previous day’s closing price of the stock. If the value of an attribute (open, min, max, or last) was greater than the value of the attribute previous for the same trading day, the numeric value was replaced by the value Positive. If it was less than the value of the attribute previous, it was replaced by Negative. If it was equal to the value of the attribute previous, it was replaced by the value Equal. Table 2 shows a sample of the continuous numeric values of the data before selecting the 6 attributes manually and before generalizing them to discrete values, while Table 3 shows the same sample after selecting the 6 attributes and after transforming them to discrete values.

Building the model

After the data was prepared and transformed, the next step was to build the classification model using the decision tree technique. The decision tree technique was selected because, according to [9], the construction of decision tree classifiers does not require any domain knowledge, which makes it appropriate for exploratory knowledge discovery. It can also handle high-dimensional data. Another benefit is that the steps of decision tree induction are simple and fast. Generally, decision tree accuracy is considered good. The decision tree method uses the information gain metric to determine the most useful attribute. The information gain depends on the entropy measure.
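For reference, the entropy and information gain measures mentioned above can be written out as follows. These are the standard textbook definitions (as found in sources such as [9]), not formulas quoted from this study; the symbols S (the set of training records), A (a candidate attribute), S_v (the records where A takes the value v), and p_c (the fraction of records in class c) are notation introduced here. The worked line uses a hypothetical set of 10 trading days with 6 Buy and 4 Sell labels.

```latex
% Entropy of a set S whose class attribute "Action" is Buy or Sell
H(S) = -\sum_{c \in \{\mathrm{Buy},\, \mathrm{Sell}\}} p_c \log_2 p_c

% Information gain obtained by splitting S on attribute A
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

% Hypothetical example: 6 Buy and 4 Sell records
H(S) = -\left(0.6 \log_2 0.6 + 0.4 \log_2 0.4\right) \approx 0.971
```

An attribute that separates the Buy days from the Sell days perfectly would drive every H(S_v) to zero, so its gain would equal H(S); an attribute unrelated to the class would leave the gain near zero.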
Table 2: Sample of historical data before selecting relevant attributes and before generalization
26.3 26.3 26.7 26 26.02 Buy
26.02 26.09 26.25 25.55 25.63 Sell
Table 3: Sample of historical data after selecting attributes and after generalization.
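The generalization rule described under "Preparing the data" can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the column order assumed for the two sample rows shown above (previous, open, max, min, last, Action) is an inference from the values, since the table headers did not survive. Whether the previous attribute itself is kept after generalization is not stated, so the sketch leaves it out.

```python
def generalize(value, previous):
    """Map a numeric price to Positive / Negative / Equal relative to the previous close."""
    if value > previous:
        return "Positive"
    if value < previous:
        return "Negative"
    return "Equal"


def generalize_row(row):
    """Discretize open, max, min and last; copy the Action class label unchanged."""
    out = {name: generalize(row[name], row["previous"])
           for name in ("open", "max", "min", "last")}
    out["action"] = row["action"]
    return out


# The two sample rows shown above; the column names and order are an assumption.
rows = [
    {"previous": 26.3, "open": 26.3, "max": 26.7, "min": 26.0, "last": 26.02, "action": "Buy"},
    {"previous": 26.02, "open": 26.09, "max": 26.25, "min": 25.55, "last": 25.63, "action": "Sell"},
]
generalized = [generalize_row(r) for r in rows]
```

Under these assumptions the first row becomes open = Equal, max = Positive, min = Negative, last = Negative with Action = Buy. The comparison is exact, so prices stored with rounding noise would need to be normalized before the Equal case can be detected reliably.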
The gain ratio is used to rank the attributes and to build the decision tree, where each attribute is placed according to its gain ratio. When the decision tree model was applied to the data of the three companies using the WEKA software version 3.5 [21], the root attribute for both ARBK and UAIC was Open, while the attribute Last was the root of the decision tree for MECE. As the tree was built, all the remaining attributes were used to continue the process. After the complete decision tree was built, the set of classification rules was generated by following all the paths of the tree. The largest number of attributes used in a single classification rule was 4, while some classification rules used only 1 attribute. Both the ID3 and C4.5 algorithms were used to build the decision trees, and the pruning technique was used with the C4.5 algorithm to reduce the size of the produced decision trees. Table 4 gives a summary of the number of classification rules that resulted from building the decision trees for each company using the C4.5 algorithm.

The graphs of the resulting decision trees using the C4.5 algorithm with pruning are presented in Figure 1, Figure 2, and Figure 3 for the three companies under study.

Table 4: Summary of the number of the classification rules

Company   Number of classification rules without pruning   Number of classification rules with pruning
ARBK      21                                                11
UAIC      31                                                5
MECE      21                                                9
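The ranking of attributes by gain ratio can be illustrated with a short Python sketch. It uses the standard C4.5-style definition (information gain divided by the split information of the attribute) and is not the WEKA implementation used in the study; the function names and the four toy records are hypothetical and are not data from the ASE.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target="Action"):
    """Information gain of `attribute` divided by its split information."""
    split_info = entropy([r[attribute] for r in rows])
    if split_info == 0:  # the attribute has a single value and cannot split the data
        return 0.0
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    return gain / split_info


def rank_attributes(rows, attributes, target="Action"):
    """Return the attributes sorted from highest to lowest gain ratio."""
    return sorted(attributes, key=lambda a: gain_ratio(rows, a, target), reverse=True)


# Hypothetical toy records for illustration only.
toy = [
    {"Open": "Positive", "Last": "Positive", "Action": "Buy"},
    {"Open": "Positive", "Last": "Negative", "Action": "Buy"},
    {"Open": "Negative", "Last": "Negative", "Action": "Sell"},
    {"Open": "Equal", "Last": "Positive", "Action": "Sell"},
]
ranking = rank_attributes(toy, ["Open", "Last"])
```

In this toy set, Open separates the two classes completely while Last does not, so Open is ranked first and would be chosen as the root. The same comparison, applied recursively to the records under each branch, yields the rest of the tree.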
Figure 2: The Decision Tree for the ARBK
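Classification rules are obtained from a tree such as this one by following every path from the root to a leaf, as described in the text. The sketch below shows that traversal in Python on a small hypothetical tree; the nested-dictionary representation, the function names, and the tree itself are assumptions made for illustration and do not reproduce any of the trees from the study.

```python
def extract_rules(tree, conditions=()):
    """Return one (conditions, action) pair for every root-to-leaf path."""
    if isinstance(tree, str):  # a leaf holds the predicted action
        return [(list(conditions), tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += extract_rules(subtree, conditions + ((tree["attribute"], value),))
    return rules


def format_rule(conditions, action):
    """Render a rule as text; a bare leaf has no conditions."""
    if not conditions:
        return f"Action = {action}"
    tests = " AND ".join(f"{attr} = {val}" for attr, val in conditions)
    return f"IF {tests} THEN Action = {action}"


# Hypothetical tree for illustration only; it is not one of the trees from the study.
toy_tree = {
    "attribute": "Open",
    "branches": {
        "Positive": "Buy",
        "Equal": "Sell",
        "Negative": {
            "attribute": "Last",
            "branches": {"Positive": "Buy", "Negative": "Sell", "Equal": "Sell"},
        },
    },
}
rules = extract_rules(toy_tree)
rule_texts = [format_rule(c, a) for c, a in rules]
```

Each leaf produces exactly one rule, so the number of rules equals the number of leaves; this is why pruning, which removes subtrees, also reduces the rule count. The number of conditions in a rule equals the depth of its leaf.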
Table 5: Classification accuracy using the ID3 and C4.5 classification methods with the 10-fold cross-validation (10-CV) and holdout evaluation methods
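The two evaluation methods named in the caption differ only in how the records are divided into training and test sets. The following Python sketch shows one simple way to produce both kinds of split and to score predictions. It is an illustration under stated assumptions: the function names, the one-third holdout fraction, the fixed random seed, and the plain shuffling are all choices made here, not details reported by the study, and WEKA's own procedure may differ (for example by stratifying the folds).

```python
import random


def holdout_split(n, test_fraction=1 / 3, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n, k=10, seed=0):
    """Yield k (train, test) index pairs; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual class labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


splits = list(k_fold_splits(25, k=10))
train_idx, test_idx = holdout_split(30)
```

With 10-fold cross-validation the classifier is trained ten times and the ten test accuracies are averaged, whereas the holdout method trains once and reports a single accuracy on the held-out records.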
5. CONCLUSIONS AND FUTURE WORK

This study presents a proposal to use the decision tree classifier on the historical prices of stocks to create decision rules that give buy or sell recommendations in the stock market. The proposed model can be a helpful tool for investors to take the right decision regarding their stocks, based on an analysis of the historical prices of stocks that extracts any predictive information from that historical data. The results of the proposed model were not perfect because many factors, including but not limited to political events, general economic conditions, and investors’ expectations, influence the stock market.

As for future work, there is still considerable room for testing and improving the proposed model by evaluating it over all the companies listed in the stock market. The evaluation of a larger collection of learning techniques, such as neural networks, genetic algorithms, and association rules, is also a rich area for future investigation. Finally, reconsidering the factors affecting the behavior of the stock markets, such as trading volume, news, and financial reports that might impact the stock price, is another rich field for future study.

REFERENCES

[1] Wang, Y.F. (2003) “Mining stock price using fuzzy rough set system”, Expert Systems with Applications, 24, pp. 13-23.
[2] Wu, M.C., Lin, S.Y., and Lin, C.H. (2006) “An effective application of decision tree to stock trading”, Expert Systems with Applications, 31, pp. 270-274.
[3] Al-Debie, M., Walker, M. (1999) “Fundamental information analysis: An extension and UK evidence”, Journal of Accounting Research, 31(3), pp. 261-280.
[4] Lev, B., Thiagarajan, R. (1993) “Fundamental information analysis”, Journal of Accounting Research, 31(2), pp. 190-215.
[5] Tsang, P.M., Kwok, P., Choy, S.O., Kwan, R., Ng, S.C., Mak, J., Tsang, J., Koong, K., and Wong, T.L. (2007) “Design and implementation of NN5 for Hong Kong stock price forecasting”, Engineering Applications of Artificial Intelligence, 20, pp. 453-461.
[6] Ritchie, J.C. (1996) Fundamental Analysis: a Back-to-the-Basics Investment Guide to Selecting Quality Stocks. Irwin Professional Publishing.
[7] Murphy, J.J. (1999) Technical Analysis of the Financial Markets: a Comprehensive Guide to Trading Methods and Applications. New York Institute of Finance.
[8] Wang, Y.F. (2002) “Predicting stock price using fuzzy grey prediction system”, Expert Systems with Applications, 22, pp. 33-39.
[9] Han, J., Kamber, M., Pei, J. (2011) “Data Mining: Concepts and Techniques”. San Francisco, CA: Morgan Kaufmann Publishers.
[10] Enke, D., Thawornwong, S. (2005) “The use of data mining and neural networks for forecasting stock market returns”, Expert Systems with Applications, 29, pp. 927-940.
[11] Wang, J.L., Chan, S.H. (2006) “Stock market trading rule discovery using two-layer bias decision tree”, Expert Systems with Applications, 30(4), pp. 605-611.
[12] Lin, C.H. (2004) Profitability of a filter trading rule on the Taiwan stock exchange market. Master thesis, Department of Industrial Engineering and Management, National Chiao Tung University.
[13] Cao, Q., Leggio, K.B., and Schniederjans, M.J. (2005) “A comparison between Fama and French’s model and artificial neural networks in predicting the Chinese stock market”, Computers & Operations Research, 32, pp. 2499-2512.
[14] Fama, E.F., French, K.R. (1993) “Common risk factors in the returns on stocks and bonds”, Journal of Financial Economics, 33, pp. 3-56.
[15] Fama, E.F., French, K.R. (1992) “The cross-section of expected stock returns”, The Journal of Finance, 47, pp. 427-465.
[16] Al-Haddad, W., Alzurqan, S., and Al_Sufy, S. (2011) “The Effect of Corporate Governance on the Performance of Jordanian Industrial Companies: An empirical study on Amman Stock Exchange”, International Journal of Humanities and Social Science, Vol. 1, No. 4, April 2011.
[17] Hajizadeh, E., Ardakani, H., and Shahrabi, J. (2010) “Application of data mining techniques in stock markets: A survey”, Journal of Economics and International Finance, Vol. 2(7), pp. 109-118, July 2010.
[18] Soni, S. (2011) “Applications of ANNs in Stock Market Prediction: A Survey”, International Journal of Computer Science & Engineering Technology (IJCSET), Vol. 2, No. 3, pp. 71-83.
[19] El-Bakry, H.M., and Awad, W.A. (2010) “Fast Forecasting of Stock Market Prices by using New High Speed Time Delay Neural Networks”, International Journal of Computer and Information Engineering, 4:2, pp. 138-144.
[20] Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., and Wirth, R. (2000) “CRISP-DM 1.0: Step-by-step data mining guide”.
[21] Witten, I., Frank, E., and Hall, M. (2011) “Data Mining: Practical Machine Learning Tools and Techniques”, 3rd Edition, Morgan Kaufmann Publishers.