
IEEE- International Conference On Advances In Engineering, Science And Management (ICAESM -2012) March 30, 31, 2012 259

Predictive Data Mining on Average Global Temperature Using Variants of ARIMA Models

Narendra Babu C.¹, Member, IEEE, and B. Eswara Reddy²

¹ Research Scholar, Dept. of Computer Science
² Associate Professor and Head, Dept. of Computer Science
JNTU Anantapur, Anantapur, A.P., India
[email protected]

Abstract—This paper analyzes and predicts the average global temperature time series. Three variants of ARIMA models are used to predict the average global temperature: basic ARIMA, trend-based ARIMA and wavelet-based ARIMA. Among these three linear models, trend-based ARIMA is observed to outperform basic ARIMA, and wavelet-based ARIMA to outperform trend-based ARIMA. MAPE (Mean Absolute Percentage Error), MaxAPE (Maximum Absolute Percentage Error) and MAE (Mean Absolute Error) are used as performance measures to compare the models.

Index Terms—Time series forecasting, Trend-based ARIMA, Wavelet-based ARIMA, Predictive data mining, Average global temperature.

I. INTRODUCTION

PREDICTIVE and descriptive data mining are two important branches of data mining. Descriptive data mining is used to extract knowledge from the available data. Predictive data mining uses the available data to forecast future values. Time series data are an important class of data: the change of an attribute value as a function of time can be regarded as a time series. Such data arise in many applications, such as atmospheric changes, production of commodities, geographical data, data extracted from sensors, stock market data and inventory control.

Descriptive techniques such as classification and clustering can be used to pre-process the raw time series, after which predictive techniques are applied to the pre-processed data to produce forecasts. The accuracy of such a method is sometimes better than that of predicting directly from the raw data.

In the literature, ARIMA models (Box-Jenkins models) have been widely used for various applications. Annual production of pigeon pea in India has been predicted using ARIMA in [2]. ARIMA is applied to mobile user data in [3]. Employment forecasting has been done using ARIMA in [4]. In financial applications such as the stock market, ARIMA has been applied in [5] for the case of the Indonesian stock exchange.

Recently, wavelet-based ARIMA has been used for prediction of traffic flow data in [1]. Wavelets have also been used for data mining on algae concentrations in [6]. Many non-linear techniques such as ANN (Artificial Neural Networks) have also been used in various applications. ANN has been compared with ARIMA in [7] and found to perform better than ARIMA. However, non-linear techniques involve greater complexity. Fuzzy techniques have also been applied, as in [8]. Rough set theory has found application in data mining in [9]. HMM (Hidden Markov Model) and spectral analysis have been applied to sensory time series data in [10].

In this paper, data are analyzed and predicted using three variants of ARIMA models, all of which are linear in nature. The three models are basic ARIMA, trend-based ARIMA and wavelet-based ARIMA, each of which is discussed in detail in the following sections. Average global temperature data from 1880 to 2010 were obtained from [12]. This time series is predicted using the three techniques and their performance is compared.

The rest of the paper is organized as follows: Section II describes the basic ARIMA model. Section III discusses the trend-based ARIMA model. The wavelet-based ARIMA model is dealt with in Section IV. Section V shows the implementation and the results obtained, along with a performance comparison. Section VI concludes the paper.

II. BASIC ARIMA

A stationary time series can be modeled as an ARMA (Auto Regressive Moving Average) model of order (p, q), where p is the order of the AR process and q is the order of the MA process. A time series modeled by ARMA can be represented as in (1).

ISBN: 978-81-909042-2-3 ©2012 IEEE



$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \dots + a_p y_{t-p} + b_1 e_{t-1} + \dots + b_q e_{t-q} + e_t \qquad (1)$$

In (1), $y_t$ represents the data at time $t$; $y_{t-1}, y_{t-2}, \dots$ are the data at past time instants; $e_{t-1}, e_{t-2}, \dots$ are the errors at past time instants; and $e_t$ stands for the present error. ARMA assumes that the error is Gaussian distributed. The coefficients $a_1, a_2, \dots, a_p$ are the AR coefficients and $b_1, b_2, \dots, b_q$ are the MA coefficients. The model can be validated using AIC (Akaike Information Criterion).

ARIMA stands for Auto Regressive Integrated Moving Average. It is an ARMA model fitted to a differenced time series, where differencing of order d is performed on the original series to make it stationary. The overall ARIMA(p, d, q) modeling involves the following steps, which are discussed here in brief:

A. Making the data stationary
A time series is not always stationary. In that case an ARMA model cannot be fitted directly, so the data are first made stationary by differencing. The number of times differencing is performed is denoted by d.

B. Identifying suitable values for model order
To find the values of p and q, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) are computed on the differenced data. If the ACF shows a sinusoidal decay and the PACF becomes zero after lag p, the series is a pure AR process of order p. If the ACF becomes zero after lag q and the PACF shows a sinusoidal decay, the series is an MA process of order q. If both the ACF and the PACF show a sinusoidal decay and become zero after lags q and p respectively, the series is an ARMA process of order (p, q) and, correspondingly, an ARIMA process of order (p, d, q).

C. Prediction or forecasting the time series data
After the model order (p, d, q) is obtained, the parameters $a_1, a_2, \dots, a_p$ and $b_1, b_2, \dots, b_q$ are estimated using techniques such as least squares or maximum likelihood. The variance of $e_t$ can also be estimated. Using these parameter estimates and the error variance, the time series can be forecast.

This modeling method was devised by Box and Jenkins [11] and is therefore widely known as the Box-Jenkins method. It can be used to predict a large class of time series. If the series has a large variance after it is made stationary, a logarithmic transformation can be applied to the raw data to decrease the error variance.

The differenced data must be integrated to recover predictions of the raw data. Hence the model is termed the Auto Regressive "Integrated" Moving Average (ARIMA) model.

III. TREND-BASED ARIMA

This method is a composite technique in which smoothing is used as a pre-processing step before predictive data mining is performed on the given time series. First, a moving-average linear filter is applied to the raw data to smooth it. This smoothing filter is given in (2). The smoothed data extracted in this way is called the trend of the data.

$$s_t = \frac{1}{2p+1} \sum_{i=-p}^{p} y_{t-i} \qquad (2)$$

In (2), $s_t$ represents the trend component, $y_t$ represents the raw data, and $(2p + 1)$ is the length of the filter. The trend is subtracted from the raw data to extract the residual data. The raw data is thus split into two parts: a trend component and a residual component. This relation is given in (3).

$$r_t = y_t - s_t \qquad (3)$$

In (3), $r_t$ represents the residual data. The basic ARIMA model described in the previous section is applied to each of the two components, trend and residual, and each component is predicted. The raw data predictions are then obtained by summing the predicted trend and the predicted residual, as given in (4):

$$y_{t,\mathrm{pre}} = s_{t,\mathrm{pre}} + r_{t,\mathrm{pre}} \qquad (4)$$

In (4), $y_{t,\mathrm{pre}}$ represents the predicted raw data, $s_{t,\mathrm{pre}}$ the predicted trend component, and $r_{t,\mathrm{pre}}$ the predicted residual component. The predictions obtained in this way have greater accuracy than those from basic ARIMA, as confirmed by the results in Section V.

IV. WAVELET-BASED ARIMA

This is also a composite technique, in which the raw data is first pre-processed using wavelet decomposition and predictive data mining techniques are then applied. The raw time series is decomposed into several component time series using the wavelet transform. These components are called approximation and detail components. The wavelet transform can be applied at multiple levels.

For example, consider a time series $y_t$. When db5 (Daubechies) wavelet filtering is applied to $y_t$, two filtered (decomposed) series are obtained: a low-pass version, called the approximation component $ya_{t1}$, and a high-pass version, called the detail of level 1, $yd_{t1}$. The $ya_{t1}$ component can be filtered again with the wavelet filter to obtain the approximation component of level 2, $ya_{t2}$, and the detail component of level 2, $yd_{t2}$. This decomposition can be continued to any required number of levels. The decomposition strategy is shown in (5).

(5)

Thus, the raw data is decomposed into three time series: $ya_{t2}$ (approximation), $yd_{t2}$ (detail 2) and $yd_{t1}$ (detail 1). The approximation component resembles a smoothed version of the raw data (similar to the trend of Section III). Detail 2 and detail 1 are the high-frequency components. If the wavelet filter is applied again to $ya_{t2}$, then $y_t$ is decomposed into one approximation series and three detail series. A detailed outline of this method is given in [1].

Each decomposed series is predicted using the basic ARIMA model described in Section II. The raw data predictions are then obtained by adding the predicted approximation series and all the predicted detail series, as given in (6):

$$y_{t,\mathrm{pre}} = ya_{t2,\mathrm{pre}} + yd_{t2,\mathrm{pre}} + yd_{t1,\mathrm{pre}} \qquad (6)$$

This method is an extension of the trend-based ARIMA described in Section III. Trend and residual are like a first-level decomposition of the raw data: the trend, being a smoothed version, is similar to the low-frequency component of the raw data, and the residual comprises the high-frequency component. The trend can be decomposed further and the method extended.

The difference between this method and trend-based ARIMA is that the filter used here is not a simple moving-average filter but a db5 filter. This method therefore predicts more accurately than trend-based ARIMA, which in turn is more accurate than basic ARIMA. Both observations are confirmed by the results shown in Section V.

V. RESULTS AND DISCUSSION

The average global temperature time series from 1880 to 2010 was taken from [12], and prediction was performed using all three methods explained above. The prediction results of the methods are then compared. The raw data are plotted in Figure 1.

Figure 1. Original Time Series data (Average Global Temperature)

Basic ARIMA is applied to the raw data, with the estimation period taken from 1880 to 2000. From the estimation-period data, the ARIMA model order and the model coefficients (parameter estimates) are obtained. The estimated model is ARIMA(5, 1, 8). The average global temperatures from 2001 to 2010 are then predicted. Model identification and estimation were carried out with the freely available R software; SPSS or SAS could also be used. The results of basic ARIMA are shown in Figure 2.

Figure 2. Raw data and basic ARIMA predicted data

Trend-based ARIMA was applied to the raw data. The model identified for the trend is ARIMA(11, 0, 11) and the model obtained for the residual is ARIMA(5, 0, 5). The data predicted by this method are compared with the original data in Figure 3. The smoothing filter can be implemented using MATLAB or Scilab.

Wavelet-based ARIMA was applied to the raw data, which was decomposed into one approximation and three detail time series components. The wavelet filter used is db5. Each decomposed series is estimated using basic ARIMA and predicted. The predicted components are then added to obtain the predicted raw data, as described in Section IV. The results of this method are compared with the original data in Figure 4.
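Returning to the differencing step of Section II, the following is a minimal plain-Python sketch of differencing of order d and of the integration that maps forecasts of the differenced series back to the original scale. The function names `difference` and `integrate` and the toy values are illustrative assumptions; the paper itself used R for modeling and does not give code.

```python
def difference(series, d=1):
    """Apply first-order differencing d times.

    Also returns the last value seen at each level, which is needed
    later to undo the differencing.
    """
    tails = []
    current = list(series)
    for _ in range(d):
        tails.append(current[-1])
        current = [b - a for a, b in zip(current, current[1:])]
    return current, tails


def integrate(diff_forecasts, tails):
    """Map forecasts of the d-times differenced series back to the
    original scale by cumulative summation, innermost level first."""
    current = list(diff_forecasts)
    for last in reversed(tails):
        restored = []
        running = last
        for step in current:
            running += step
            restored.append(running)
        current = restored
    return current


# Hypothetical toy series (not the temperature data used in the paper).
y = [1.0, 3.0, 6.0, 10.0, 15.0, 21.0]
history, future = y[:4], y[4:]

# Difference the full series twice, then pretend a model forecast the
# last two differenced values perfectly, to check that integration
# inverts the differencing.
full_diff, _ = difference(y, d=2)
_, tails = difference(history, d=2)
recovered = integrate(full_diff[-2:], tails)
print(recovered)
```

In a real workflow the values passed to `integrate` would come from a fitted ARMA model on the differenced data rather than from the held-out series.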

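The trend and residual split of Section III, equations (2) to (4), can be sketched in the same way. This is an illustration under stated assumptions, not the authors' implementation: the names `moving_average_trend`, `decompose` and `recombine` are hypothetical, and the shrinking window at the series edges is an assumption, since the paper does not say how the ends were handled.

```python
def moving_average_trend(y, p):
    """Centered moving average of length 2p + 1, as in (2).

    Assumption: near the ends, where the full window is unavailable,
    the average is taken over the points that exist.
    """
    n = len(y)
    trend = []
    for t in range(n):
        window = y[max(0, t - p):min(n, t + p + 1)]
        trend.append(sum(window) / len(window))
    return trend


def decompose(y, p):
    """Split y into trend s_t and residual r_t = y_t - s_t, as in (3)."""
    trend = moving_average_trend(y, p)
    residual = [yt - st for yt, st in zip(y, trend)]
    return trend, residual


def recombine(trend_pred, residual_pred):
    """Sum component forecasts to get raw-data forecasts, as in (4)."""
    return [s + r for s, r in zip(trend_pred, residual_pred)]


# Hypothetical toy values, not the temperature series from the paper.
y = [2.0, 4.0, 9.0, 4.0, 2.0, 6.0, 1.0]
trend, residual = decompose(y, p=1)
rebuilt = recombine(trend, residual)
print(trend)
print(residual)
```

In the method described in the paper, `trend` and `residual` would each be forecast with a separate ARIMA model before `recombine` is called; here the components themselves are recombined only to show that the split is additive.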


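The additive structure behind (6) in Section IV can be illustrated with a much simpler wavelet than the one used in the paper. The sketch below swaps in the Haar wavelet for db5, because Haar components can be computed with plain block averages and no wavelet library; the numbers it produces are therefore not comparable to the paper's. The names `block_means` and `haar_components`, and the requirement that the series length be a multiple of 2 to the power of the number of levels, are assumptions made for this example.

```python
def block_means(y, block):
    """Replace every run of `block` consecutive samples by its mean."""
    out = []
    for start in range(0, len(y), block):
        chunk = y[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def haar_components(y, levels):
    """Additive Haar multiresolution (a stand-in for db5).

    Returns (approximation, [detail_1, ..., detail_levels]); every
    series has the same length as y, and y equals the approximation
    plus the sum of all details.
    """
    if len(y) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2 ** levels")
    details = []
    previous = list(y)
    for level in range(1, levels + 1):
        # The Haar approximation at this level is the mean over
        # dyadic blocks of length 2 ** level.
        approx = block_means(y, 2 ** level)
        details.append([a - b for a, b in zip(previous, approx)])
        previous = approx
    return previous, details


# Hypothetical toy values, not the temperature series from the paper.
y = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_components(y, levels=2)
rebuilt = [a + d1 + d2 for a, d1, d2 in zip(approx, details[0], details[1])]
print(approx)
print(details)
```

As with the trend-based variant, each returned component would be forecast separately and the forecasts summed; the reconstruction here only checks that the components add back to the original series.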

The performance of all three methods is compared using the MAPE (Mean Absolute Percentage Error) performance measure, given in (7). MaxAPE and MAE are given in (8) and (9).

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\mathrm{original}_i - \mathrm{forecast}_i}{\mathrm{forecast}_i} \right| \qquad (7)$$

$$\mathrm{MaxAPE} = \max_i \left| \frac{\mathrm{original}_i - \mathrm{forecast}_i}{\mathrm{forecast}_i} \right| \qquad (8)$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{original}_i - \mathrm{forecast}_i \right| \qquad (9)$$

In (7) and (9), N represents the number of predicted values. The MAPE, MaxAPE and MAE values obtained for the three methods are tabulated in Table I. From Table I, it can be observed that trend-based ARIMA outperforms basic ARIMA and wavelet-based ARIMA outperforms trend-based ARIMA. The raw data and the predicted data from each of the three methods are shown together in Figure 5.

Figure 5. Raw data and predicted data from all three models

TABLE I
PERFORMANCE MEASURES COMPARISON

S.No  Method Used           MAPE    MaxAPE  MAE
1     Basic ARIMA           0.0113  0.0176  0.1645
2     Trend-Based ARIMA     0.0077  0.0155  0.1119
3     Wavelet-Based ARIMA   0.0039  0.0081  0.0567
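The three performance measures in (7) to (9) are straightforward to compute. The sketch below follows the paper's definitions, including the division by the forecast value in (7) and (8); a more common convention divides by the original value instead. The function names and the toy numbers are illustrative assumptions and are unrelated to the values in Table I.

```python
def mape(original, forecast):
    """Mean absolute percentage error as defined in (7)."""
    errors = [abs((o - f) / f) for o, f in zip(original, forecast)]
    return sum(errors) / len(errors)


def max_ape(original, forecast):
    """Maximum absolute percentage error as defined in (8)."""
    return max(abs((o - f) / f) for o, f in zip(original, forecast))


def mae(original, forecast):
    """Mean absolute error as defined in (9)."""
    errors = [abs(o - f) for o, f in zip(original, forecast)]
    return sum(errors) / len(errors)


# Hypothetical values chosen so the results are easy to check by hand.
original = [10.0, 20.0, 30.0]
forecast = [8.0, 25.0, 30.0]

print(mape(original, forecast))
print(max_ape(original, forecast))
print(mae(original, forecast))
```

For these toy inputs the absolute percentage errors are 0.25, 0.2 and 0, so MAPE is 0.15 and MaxAPE is 0.25, both expressed as fractions rather than percentages, which matches the scale of the values reported in Table I.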

Figure 3. Raw data and Trend-based ARIMA predicted data

Figure 4. Raw data and Wavelet-based ARIMA predicted data

VI. CONCLUSION
This paper predicts the average global temperature for the forecasting period 2001 to 2010. The prediction is accomplished using three variants of ARIMA models: basic ARIMA, trend-based ARIMA and wavelet-based ARIMA. The performance of each of these methods is compared using MAPE. It is concluded that wavelet-based ARIMA performs the best of the three. Trend-based ARIMA is better than basic ARIMA but inferior to wavelet-based ARIMA. All the methods used here are linear in nature. Using ANN, fuzzy, HMM, rough set and hybrid models, even better prediction methods may be devised.

REFERENCES
[1] Ni Lihua, "ARIMA traffic flow using wavelets", 2nd International Conference ICISE, 2010.
[2] Dr. PDKV, "Use of the ARIMA Model for Forecasting Pigeon Pea Production in India", International Review of Business and Finance, ISSN 0976-5891, Volume 2, Number 1, pp. 97–102, 2010.
[3] Xu Ye, "The Application of ARIMA Model in Chinese Mobile User Prediction", 2010 IEEE International Conference on Granular Computing, 2010.
[4] Xiaoguo Wang, "ARIMA Time Series Application to Employment Forecasting", Proceedings of 4th International Conference on Computer Science & Education, 2009.
[5] Yohanes Budiman Wijaya, "Stock price prediction comparison of Arima and artificial neural network Methods", 2010 Second International Conference on Advances in Computing, Control, and Telecommunication Technologies, 2010.


[6] Jinsuo Lu, "Data Mining on Algae Concentrations - Chlorophyll Time Series in Source Water Based on Wavelet", Fifth International Conference on Fuzzy Systems and Knowledge Discovery, 2008.
[7] Mehdi Khashei, "An artificial neural network (p, d, q) model for time series forecasting", Expert Systems with Applications, pp. 479–489, Elsevier, 2010.
[8] Chun-Hao Chen, "Mining fuzzy frequent trends from time series", Expert Systems with Applications, pp. 4147–4153, Elsevier, 2009.
[9] JingTao Yao, "Financial time-series analysis with rough sets", Applied Soft Computing, pp. 1000–1007, Elsevier, 2009.
[10] Jie Yin, "Integrating Hidden Markov Models and Spectral Analysis for Sensory Time Series Clustering", Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM'05), 2005.
[11] Brockwell, "Time Series Theory and Methods", 1987.
[12] http://www.earth-policy.org/data_center/C26

Narendra Babu C (M'10) became a Member (M) of IEEE in 2010. He was born on 14 July 1978. He is presently working as Senior Assistant Professor at the REVA Institute of Technology, Bangalore, and is also pursuing his Ph.D. at JNT University Anantapur, A.P., under the guidance of Dr. B. Eswara Reddy. He completed his M.Tech degree at MSRIT, Bangalore, in 2004 and his BE at Adichunchanagiri Institute of Technology, Chikmagalur, Karnataka, in 2000. He has eleven years of teaching experience in various engineering colleges. His research interests include data mining, prediction on time series data, neural networks and optimization techniques.

Dr. B. Eswara Reddy graduated with a B.Tech. (CSE) from Sri Krishna Devaraya University in 1995. He received his M.Tech. (Software Engineering) from JNT University, Hyderabad, in 1999 and his Ph.D. in Computer Science & Engineering from JNT University, Hyderabad, in 2008. He served as Assistant Professor from 1996 to 2006. He has been working as Associate Professor in the CSE Dept. since 2006 and is currently acting as Head of the CSE Dept. at JNTUACE, Anantapur. He has more than 10 publications in various international journals and more than 15 publications in various national and international conferences. He is one of the authors of the textbook Programming with Java, published by Pearson/Sanguine Publishers. His research interests include Pattern Recognition & Image Analysis, Data Warehousing & Mining and Software Engineering. He is a life member of ISTE, IE, ISCA and CSI.
