Predicting SET50 Stock Prices Using CARIMA
Abstract - Investing in stocks is one of the most popular approaches to investing money. This paper aims to predict short-term stock prices in the SET50 index of the Stock Exchange of Thailand (SET). The proposed method is called CARIMA (Cross Correlation Autoregressive Integrated Moving Average). The basic idea of CARIMA is to find the most highly correlated stock and use it to predict the target one in addition to the ARIMA predicted price. The results show that the CARIMA model yields better price trends (measured by the 10-day correlation coefficient), while its %MAEs (Mean Absolute Errors) are quite similar to those of ARIMA.

Keywords - Stock, Prediction, ARIMA, Correlation, Time Series

I. INTRODUCTION

Today, stock trading is a very popular method of short-term investment. It is known that the time series of stock prices fluctuate greatly but are not totally random. Therefore, the investigation of factors that influence financial data has long attracted researchers from various areas, such as mathematicians, economists and, more recently, computer scientists. With advances in information technology, a huge amount of stock trading data can be collected easily. The current focus is to design a method of extracting useful information from the collected data. Therefore, data mining has drawn considerable attention from the younger generation of investors as a way to predict the changes or discover the patterns of stock prices.

There are two analytical methods for stock trading: 1. Fundamental analysis [3]; 2. Technical analysis [4]. Fundamental analysis focuses on the central factors of a company, such as financial statements, Earnings per Share (EPS), Return on Assets (ROA), Price-Earnings Ratio (P/E Ratio), Return on Equity (ROE), and Price to Book Value (P/BV). On the other hand, technical analysis uses the recent historical data of an individual stock, including opening, highest, lowest and closing prices and the trading volume, to predict future stock price movements. Many indicators can be derived from these data, such as Moving Average Convergence Divergence (MACD), Average Directional Index (ADX), Exponential Moving Average (EMA), Double Exponential Moving Average (DEMA), and Relative Strength Index (RSI).

This paper proposes a simple yet effective method for predicting short-term movements of a target stock based on the ARIMA model together with another stock that has the highest lead/lag correlation with the target one. The correlation coefficient can be effectively incorporated into the original ARIMA, hence the name Cross Correlation ARIMA (CARIMA). The performance of CARIMA is evaluated on the SET (Stock Exchange of Thailand) 50 dataset in terms of trend similarity and % Mean Absolute Error (%MAE).

The rest of this paper is organized as follows. Section 2 presents the existing related work. Section 3 explains the theory of ARIMA and the proposed CARIMA method. Section 4 describes the dataset and results. Finally, Section 5 provides the conclusion, including future work.

II. RELATED WORK

There is much work based on time series techniques for analyzing and predicting trends of stock markets around the world. All of these studies use various factors that have an impact on the stock market as inputs to predict the trends.

C. Fonseka and L. Liyanage [5] developed an algorithm for predicting an individual stock from the Australian Stock Exchange (ASX) with correlation in 2008. Meanwhile, S. Chaigusin et al. [6] used a feedforward backpropagation neural network to predict the movement of the SET index. The inputs of the models consist of seven nodes including the Dow Jones index, Nikkei index, Hang Seng index, gold prices, Minimum Loan Rate (MLR), and the exchange rates of the Thai Baht and the US dollar. In 2012, A. Srisawat [7] applied an association rule mining technique for discovering relationships between individual stocks from SET. More recently, in 2013, W. Weiqing and Y. Lav [8] used the ARIMA model to study the volatility characteristics of the US Dollar index itself. They used a statistical measurement model to fit the ARIMA model and predict the US Dollar index movement for one month.

As mentioned above, all of them used various data mining techniques to predict a stock or an index. Very few of them have used relationships between stocks to help in the stock price prediction process. We therefore take advantage of this gap to propose our method.
Fig. 1. Unit Root Test of ADVANC

B. ARIMA model

The autoregressive integrated moving average (ARIMA) model is a generalisation of the autoregressive moving average (ARMA) model. ARIMA adds a differencing (integrated, I) step to ARMA: AR (autoregressive) + I (integrated) + MA (moving average). The equation of the ARIMA model is

\[ X_t = m + \sum_{i=1}^{p} \Theta_i \, \Delta^d X_{t-i} - \sum_{i=1}^{q} \theta_i \, \varepsilon_{t-i} \tag{1} \]

where m is a constant, Θ and θ are the parameters of the autoregressive and moving average parts, ε is the error (white noise), i is the lead/lag time variable (days), Δ is the difference operator, X is the closing price of an individual stock, and t is the day variable. p, d and q are the orders of the autoregressive, difference and moving average parts, respectively.

The ARIMA predicted stock price X_t is parameterized as ARIMA(p, d, q), where AR: p is the order of the autoregressive part, I: d is the degree of differencing involved, and MA: q is the order of the moving average part.

To select the best ARIMA(p, d, q) model, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can be evaluated to estimate model quality, as listed in Table I. In this case, the ARIMA(0,1,0) model is selected to predict ADVANC because it has the smallest values of AIC (4582.58) and BIC (4587.26). Fig. 2 shows that there is no correlation between the predicted ADVANC price and the actual price movement of ADVANC.

TABLE I
ARIMA MODEL SELECTION OF ADVANC BASED ON AKAIKE INFORMATION CRITERION (AIC) AND BAYESIAN INFORMATION CRITERION (BIC)

ARIMA          AIC       BIC (SC)
p=1,d=0,q=0    4594.27   4608.31
p=0,d=1,q=0    4582.58   4587.26
p=0,d=0,q=1    6846.23   6860.27
p=1,d=1,q=0    4584.58   4593.94
p=1,d=0,q=1    4596.27   4614.99
p=0,d=1,q=1    4584.58   4593.94
p=1,d=1,q=1    4586.58   4600.62

Fig. 2. ADVANC: ARIMA Predicted Price, Xt (Top) vs Actual Price, Xt (Bottom)

C. Cross Correlation with Lead/Lag i days

We assume that the closing price reflects price movement behaviour more faithfully than the high, low, or opening price. Cross correlation analysis of closing prices measures the interrelationship between the prices of two stocks as a function of the lead/lag time of one company relative to another. The correlation coefficient can be calculated with respect to a lead/lag time of i days as defined in Eq. (3).

\[ \rho_{X_t Y_t}(i) = \sum_{t=1}^{796} \frac{(X_t - u_X)(Y_{t-i} - u_Y)}{\delta_X \, \delta_Y} \tag{3} \]
Fig. 3. Cross Correlation Coefficient ρXtYt(i) with Lead/Lag i between BAY and BH

where t is the day variable, running from January 1, 2012 to March 31, 2015, i is the lead/lag time variable in days, δX is the standard deviation of stock X, δY is the standard deviation of stock Y, uX is the mean of stock X and uY is the mean of stock Y.

We choose the highly correlated coefficient with |ρXtYt(i)| ≥ 0.8 at time lag t − i. This means that stock Yt leads stock Xt by i days. As a result, stock Yt−i can help the ARIMA model predict the price movement of stock Xt. An example of cross correlation between ADVANC and INTUCH is shown in Fig. 3.

D. CARIMA: Cross Correlation ARIMA

First, the ARIMA prices Xt are predicted for up to 10 days, covering April 1-18, 2015. Next, the highly correlated stock Y with |ρXtYt(i)| ≥ 0.8 that leads stock X by i days is picked. By shifting stock Y back to day t − i, the Rate of Change (ROC) from Yt−i is applied with respect to Xt. Finally, the CARIMA predicted price X̂t is the product of ρXtYt(i), Xt, and one plus the ROC of Yt−i, as shown in Eq. (4).

\[ \hat{X}_t = \rho_{X_t Y_t}(i) \times X_t \times \left( 1 + \frac{Y_{t-i+1} - Y_{t-i}}{Y_{t-i}} \right) \tag{4} \]

where ρXtYt(i) is the correlation coefficient between stocks X and Y, X is the ARIMA predicted price, Y is the most correlated stock, t is the day variable and i is the lead time variable in days.

Fig. 4. ADVANC: CARIMA Predicted Price X̂t with Lead, i = 10 from INTUCH (Top) vs Actual Price, Xt (Bottom)

IV. EXPERIMENT AND RESULTS

This section has two parts. First, we explain the details of the SET50 dataset. Second, we present and discuss the results.

A. SET50 Dataset

The SET50 dataset is collected from the Stock Exchange of Thailand (SET). It consists of the 50 most valuable companies (stocks) in Thailand. We use their closing prices to find the correlation coefficient of each individual stock with the others at each lead and lag time. The dataset runs from January 1, 2012 (t = 1) to March 31, 2015 (t = 796).

B. Results and Discussion

The stock names, 10-day correlation coefficients, and %MAEs between CARIMA X̂t and actual prices Xt vs. ARIMA Xt and actual prices Xt are listed in Table II. NA in the ρXtXt column means Not Available, because the ARIMA result is a straight line, which leads to a division by zero. The price trends of CARIMA are closer to the actual prices than those of ARIMA alone (measured by 10-day correlation coefficients), although the %MAEs (Mean Absolute Errors) are quite similar. The %MAEs of CARIMA are not better than those of ARIMA.
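The steps described so far (lead/lag correlation, the CARIMA update of Eq. (4), and the two evaluation measures) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the lagged correlation is computed as an ordinary Pearson coefficient over the overlapping samples, %MAE is assumed to be the mean of |predicted − actual| / actual expressed in percent (the paper does not spell out its formula), and all numbers are made-up toy data rather than SET50 prices.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series.

    Returns None when either series is constant: its standard deviation
    is zero, so the coefficient is undefined (the "NA" case for a flat
    ARIMA forecast).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, lead):
    """Correlation between X_t and Y_{t-lead} for lead >= 0 (Y leads X)."""
    return pearson(x[lead:], y[:len(y) - lead])


def carima_forecast(rho, arima_pred, y_shifted):
    """Apply Eq. (4) to each forecast day.

    arima_pred[k] is the ARIMA price X_t for forecast day k.
    y_shifted[k] is Y_{t-i} for that day, so y_shifted[k + 1] is Y_{t-i+1};
    it therefore needs one more element than arima_pred.
    """
    out = []
    for k, x in enumerate(arima_pred):
        roc = (y_shifted[k + 1] - y_shifted[k]) / y_shifted[k]
        out.append(rho * x * (1.0 + roc))
    return out


def pct_mae(pred, actual):
    """Assumed %MAE: mean absolute error relative to the actual price, in %."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)


# Toy data (made up): a flat ARIMA(0,1,0)-style forecast and a leading stock.
arima_pred = [100.0] * 5
y_shifted = [50.0, 51.0, 50.5, 52.0, 51.5, 53.0]
actual = [101.0, 100.0, 103.0, 99.0, 102.0]

carima_pred = carima_forecast(0.9, arima_pred, y_shifted)
print("ARIMA  corr:", pearson(arima_pred, actual))   # None: flat forecast
print("CARIMA corr:", pearson(carima_pred, actual))
print("ARIMA  %MAE:", pct_mae(arima_pred, actual))
print("CARIMA %MAE:", pct_mae(carima_pred, actual))
```

Because the flat forecast has zero variance, `pearson` returns `None` for it, which corresponds to the NA entries in the ARIMA correlation column. The CARIMA forecast inherits day-to-day variation from the leading stock, so its correlation with the actual prices is defined.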
TABLE II
10-DAY CORRELATION COEFFICIENTS AND %MAE OF CARIMA, X̂t AND ACTUAL STOCK PRICES, Xt VS. ARIMA, Xt AND ACTUAL STOCK PRICES, Xt (NA: NOT AVAILABLE)

Stock     ρ(X̂t,Xt)   ρ(Xt,Xt)   %MAE(X̂t,Xt)   %MAE(Xt,Xt)
          CARIMA     ARIMA      CARIMA        ARIMA
JAS        0.58       NA        2.29          2.13
ADVANC     0.40       NA        2.85          2.51
RATCH      0.35       NA        0.84          0.51
GLOBAL     0.33      -0.39      1.53          1.32
IVL        0.31       NA        2.07          2.01
THCOM      0.30       NA        3.96          2.23
BANPU      0.27       NA        3.65          3.20
TCAP       0.15       NA        1.86          1.46
BCP        0.14       NA        4.61          4.78
EGCO       0.10      -0.87      1.53          1.51
KKP        0.08      -0.15      0.89          0.74
PTTEP      0.07      -0.56      6.82          6.36
CENTEL     0.04       NA        3.94          4.10
TTW        0.03      -0.88      1.38          1.41
BAY       -0.09       0.32      4.62          5.17
BH        -0.11      -0.75      3.67          1.74
GLOW      -0.17      -0.74      2.64          2.46
TRUE      -0.20       0.80      3.84          4.07
BDMS      -0.22      -0.39      2.62          2.42
CPN       -0.33       NA        3.20          3.61
SCB       -0.38      -0.51      1.77          1.33
PS        -0.48      -0.71      2.62          2.31

The main purpose of ARIMA is to predict prices as close to the actual prices as possible, whereas CARIMA incorporates the cross correlation coefficient with ARIMA in order to improve the correlation of the price movements. For example, the CARIMA predicted prices of ADVANC, predicted using INTUCH, are plotted in Fig. 4 and can be compared with those of ARIMA in Fig. 2. We can see that those of CARIMA are more correlated, while the predicted price from ARIMA is a straight line and thus not correlated with the actual price. Although its %MAEs are slightly higher than those of ARIMA, CARIMA can yield better price trends, as ρ(X̂t,Xt) is better than ρ(Xt,Xt).

Since the dataset is not big enough, with only 50 individual stocks, it is hard to find leading/lagging stocks with correlation coefficients high enough to make a better prediction. Moreover, most of the stocks have their highest/lowest correlation coefficients at lag i = 0, which suggests that those stocks move up and down together on the same day. It is not reasonable to use such a zero-lag relationship to help ARIMA predict individual stock price movement.

V. CONCLUSION

This paper aims to incorporate the cross correlation coefficient with ARIMA to predict short-term (daily) SET50 stock price movement. The proposed method, called CARIMA, predicts stock A using the most highly correlated stock B within a 10-day lead. The empirical results compare the CARIMA predicted prices X̂t and the ARIMA predicted prices Xt against the actual prices Xt. In terms of performance evaluation, CARIMA prices are more correlated with the actual prices and their %MAEs are similar to those of ARIMA. For future work, we intend to apply two or more stocks as predictors in CARIMA for each individual SET50 stock. It is expected that they can improve the prediction performance even more.

REFERENCES

[1] P. S. P. Cowpertwait and A. W. Metcalfe, Introductory Time Series with R, Springer, 2009, pp. 137-155
[2] C. M. Conover and D. R. Peterson, The Lead-Lag Relationship between the Option and Stock Markets prior to Substantial Earnings Surprises and the Effect of Securities Regulation, Journal of Financial and Strategic Decisions, Spring 1999, Vol. 12 No. 1
[3] B. Graham, J. Zweig and W. Premchaiswadi, The Intelligent Investor: The Definitive Book on Value Investing. A Book of Practical Counsel, Collins Business Essentials, 2006
[4] S. Nison, Beyond Candlesticks: New Japanese Charting Techniques Revealed, Wiley Finance, 2009
[5] C. Fonseka and L. Liyanage, A Data mining algorithm to analyse stock market data using lagged correlation, 4th International Conference on Information and Automation for Sustainability, 2008, pp. 163-166
[6] S. Chaigusin, C. Chirathamjaree, and J. Clayden, A Data mining algorithm to analyse stock market data using lagged correlation, Proceedings of the International Conference on Computational Intelligence for Modelling Control and Automation, 2008, pp. 670-673
[7] A. Srisawat, Discovery Stock Trading Patterns: A Case Study of Thai Stock Market, International Journal of Intelligent Information Processing, 2012, Vol. 3 Issue 1, pp. 1-9
[8] W. Weiqing and Y. Lav, A Study of the USDX Based on ARIMA Model A Correlation analysis between the USDX and the Shanghai index, Consumer Electronics, Communications and Networks (CECNet), 3rd, 2013
[9] J. A. Ryan, J. M. Ulrich, and W. Thielen, quantmod: Quantitative Financial Modelling Framework, Version: 0.4-5, https://cran.r-project.org/web/packages/quantmod/index.html, 2015
[10] R. J. Hyndman, forecast: Forecasting Functions for Time Series and Linear Models, Version: 6.1, https://cran.r-project.org/web/packages/forecast/index.html, 2015