Kalman Filter Based Time Series Prediction of Cake Factory Daily Sale

5 authors, including Jionglong Su and Fei Ma (Xi'an Jiaotong-Liverpool University)
Abstract—Accurate prediction of future daily sales is a crucial step towards optimal management of the daily production of a cake factory. In this study, an interacting multiple model integrated Kalman filter was used to predict the future daily sales of cake products. Two years of daily sale history of 108 cake products were used to train and test the proposed method. Our experiments show that 1) running interacting multiple models of different orders in parallel is more effective than a single classical interacting multiple model; 2) when only daily sale data was used, the proposed method predicted 33.54% of sales within ±10% of true sales; 3) when more variables, including festival and weekend, were combined into the prediction, 34.38% of predicted sales were within ±10% of true sales.

Keywords—Kalman filter, interacting multiple models, time series, daily sale prediction, cake factory

I. INTRODUCTION

A time series is a series of data indexed in time order. Time series forecasting focuses on the influence of the rules of a given phenomenon to predict future trends based on historic data [1]. Traditional Box-Jenkins regression models such as the Autoregressive (AR) and Autoregressive Integrated Moving Average (ARIMA) models have been widely used in the past for stationary data analysis [2]. In recent years, the modeling and prediction of non-stationary time series has raised considerable interest in the research community [3].

In the last few decades, artificial neural networks (ANNs) have been widely used to predict nonlinear processes due to their map-approximating and self-learning abilities [4]. For example, Karunanithi et al. [5] and Hu et al. [6] used both classical and recurrent multilayer perceptron (MLP) neural networks for forecasting software reliability. Inspired by both the multilayer perceptron (MLP) and wavelet decomposition, Wang et al. ([7], [8], [9]) proposed the wavelet packet MLP (WP-MLP) and demonstrated that it can be a promising method for time series prediction. Xu et al. [10] and Wang and Gupta [11] applied feed-forward MLPs to conduct prediction. The method based on a BiLinear Recurrent Neural Network proposed by Ghazali et al. [12] has also been proven to have robust abilities in modeling and predicting time series. However, these methods have some flaws: although an ANN does not build on a specific mathematical model, it requires a large number of training samples, and noisy training samples can lead to erroneous learning results such as slow convergence and suboptimal parameters, which finally result in invalid prediction [13]. In the meantime, its time-consuming learning process and poor generalization ability restrict its application in the real world [14].

In recent years, the state space model which embeds Box-Jenkins models into the Kalman filter has been suggested ([15], [16]). These Kalman filtering algorithms can implement the optimal estimation of the system states. The filter constructs optimal state estimates by methods such as maximum likelihood estimation (MLE) or least squares and is self-adaptive in nature [17]. These improve the accuracy and efficiency of time series prediction.

The Interacting Multiple Model (IMM) is a method which allows an optimal solution to be obtained among several possible models [18]. Each model can effectively describe changing parameters, and each solution to the problem is a combination of several models.

In this paper we combine the Kalman filter with the Interacting Multiple Model for online prediction of the daily sales of a cake shop. Sales prediction is essential for the operation of the food industry due to the short shelf life of raw materials. Accurate sales forecasting can result in reasonable resource allocation and optimization in advance; as a result, this can effectively reduce wastage or bottlenecks. Moreover, in order to improve the prediction accuracy and efficiency, several improvements are applied in our model:
• Parallel IMMs with different orders are used in this new model, so that the lag in the AR model can be adapted in the process of optimal selection.
• A new method to achieve a self-updated observation noise variance Rt has been applied in this model; moreover, we set several constant choices for the value of the system noise variance Qt to simplify operations and avoid backtest overfitting.

II. METHODOLOGY

A. Autoregressive Model

The autoregressive model, which describes a time-varying process, specifies that the estimated variable depends only linearly on its own previous data and a stochastic term [19].
The notation AR(p) is defined as:

yt = α1 yt−1 + α2 yt−2 + · · · + αp yt−p + εt   (1)

where yt is the time series under investigation, αi denotes the auto-regression coefficient, p represents the order of the model, which is generally very much smaller than the length of the series, and εt is always assumed to be Gaussian white noise.

The autoregressive model assumes that the series yt is stationary, which means the joint probability distribution of the stochastic process does not change when shifted in time. However, if we regard the series as a whole, a single order may not be the most appropriate for every time step, because the series may be nonstationary. Hence, we consider that running models of different orders in parallel and selecting the output based on estimated accuracy may be more efficient than classical means. In Section III, our model, which combines the Kalman filter with the interacting multiple model, will be tested in detail.

B. Kalman Filter (KF)

In classical filtering and prediction of a dynamic system by means of the state transition, one is subject to many limitations which may adversely impact practical applications [20]. To sidestep the difficulties which curtail real-world usefulness, the Kalman filter provides an efficient recursive means to estimate the state space and is very powerful for solving linear filtering problems [21].

The Kalman filter, based on a linear dynamical system discretized in the time domain, estimates the current state from the estimated state of the previous time step and the current measurement. In other words, this model assumes that the state of the system at time t is grounded on the prior state at time t − 1, and obtains an observation yt of the true state xt at time t according to the state space model:

xt = At xt−1 + wt   (system equation)   (2)
yt = Ht xt + vt   (observation equation)   (3)

where xt denotes the state vector at time step t, At represents the state transition matrix based on the system state parameters at time step t − 1, wt is the process noise vector at time step t, yt denotes the vector of measurements, Ht represents the transformation matrix that maps the state vector parameters into the measurement domain, and vt is the measurement noise term at time step t.

The Kalman filter consists of two stages: prediction and measurement update. The equations related to the prediction stage are:

x̂t|t−1 = At x̂t−1|t−1   (4)
Pt|t−1 = At Pt−1|t−1 AtT + Qt   (5)

where x̂t|t−1 denotes the prior state estimate and Pt|t−1 represents the error covariance matrix. The prior estimate x̂t|t−1 and Pt|t−1 are made for the measurement update. Once yt has been obtained at time t, this model can update the conditional mean and covariance matrix of the estimates according to the three equations:

Kt = Pt|t−1 HtT [Ht Pt|t−1 HtT + Rt]−1   (6)
x̂t = x̂t|t−1 + Kt [yt − Ht x̂t|t−1]   (7)
Pt|t = Pt|t−1 − Kt Ht Pt|t−1   (8)

where Kt is the Kalman gain matrix that minimizes the a posteriori error covariance, and Pt|t denotes the updated error covariance matrix. Once this model obtains the outcome of the next measurement, these estimates are updated based on a weighted average. The algorithm is recursive and does not require any additional past information, taking only the present input measurements and the previously calculated state and its uncertainty matrix into consideration.

1) Embedding the Autoregressive Model into the KF: To solve the problem that the classical AR(p) model can only be fitted using stationary data, the AR(p) model is embedded into the Kalman filter, since the KF can accept non-stationary data. Therefore, for the system equation of the state space model, we set At to be the identity matrix so that xt follows a random walk. For the observation equation, we rewrite the AR model as

yt = Ht xt + vt   (9)

where

Ht = [ yt−1  yt−2  · · ·  yt−p ]   (10)

and

xt = [ α1  α2  · · ·  αp ]T   (11)

The order of the classical AR model is determined after obtaining all the data. In the KF, however, we do not test the data first; we carry out the prediction directly. Hence we can run the model in real time.

2) Estimation of Rt: The process noise covariance Qt and the measurement noise term Rt are often assumed to be unknown constants, which can simplify the KF. However, the values of Qt and Rt should be time-variant since we are estimating a non-stationary time series. In this paper, we let Rt be self-adaptive and Qt be an unknown discrete random variable. We give several possible values of Qt, which will be explained in detail in the next section.

One method to estimate the measurement noise term Rt is to define the innovation

Mt = yt − Ht x̂t|t−1   (12)

Since

yt = Ht xt + vt   (13)
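The AR-embedded Kalman filter of Section II-B can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the state xt holds the AR(p) coefficients with At = I (random walk), Ht is the row vector of the last p observations as in (10), and the constant process noise q, fixed measurement noise r, order p, and the synthetic AR(2) series are all assumptions made for demonstration.

```python
import numpy as np

def ar_kalman_predict(series, p=3, q=1e-4, r=1.0):
    """One-step-ahead prediction of `series` with an AR(p) model embedded
    in a Kalman filter: the state is the coefficient vector (Eq. 11),
    A_t = I so the coefficients follow a random walk, and H_t is the row
    of lagged observations [y_{t-1} ... y_{t-p}] (Eq. 10)."""
    n = len(series)
    x = np.zeros(p)              # state estimate: AR coefficients
    P = np.eye(p)                # error covariance
    Q = q * np.eye(p)            # fixed process noise (an assumed constant choice)
    preds = np.full(n, np.nan)   # preds[t] is the forecast of series[t]
    for t in range(p, n):
        H = series[t - p:t][::-1].reshape(1, p)   # H_t, Eq. (10)
        # Prediction stage, Eqs. (4)-(5); A_t = I leaves x unchanged
        P = P + Q
        y_pred = (H @ x).item()                   # forecast H_t x̂_{t|t-1}
        preds[t] = y_pred
        # Measurement update, Eqs. (6)-(8)
        S = (H @ P @ H.T).item() + r              # innovation variance
        K = (P @ H.T) / S                         # Kalman gain, shape (p, 1)
        innovation = series[t] - y_pred           # M_t, Eq. (12)
        x = x + (K * innovation).ravel()
        P = P - K @ H @ P
    return preds

# Usage on a synthetic AR(2) series (stand-in for the daily sale data)
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal(scale=0.5)
pred = ar_kalman_predict(y, p=2)
```

Because At is the identity and the coefficients are the state, each update is effectively a recursive least-squares fit of the AR coefficients, which is what lets the model track a non-stationary series online without fitting on the full history first.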