ARTICLE INFO

Keywords:
Melt index prediction
Empirical mode decomposition
Chaos theory
Relevance vector machine
Multi-scale prediction model

ABSTRACT

Melt index (MI) is one of the most important variables determining product quality in the industrial propylene polymerization process. In this paper, a multi-scale prediction model is proposed for MI prediction by combining empirical mode decomposition (EMD), chaos theory and an optimized relevance vector machine (RVM) model. First, the EMD method is used to decompose the MI time series into intrinsic mode functions (IMFs) and the residual. Then the chaotic characteristics of each component are identified with chaos theory. For the components with chaotic characteristics, a relevance vector machine (RVM) chaotic prediction model is developed as the predictive model. For the components without chaotic characteristics, a least squares support vector machine (LSSVM) is used as the predictive model. At the same time, an improved ant colony optimization (IACO) algorithm is used to optimize the parameters of the RVM and LSSVM. In the end, the final MI prediction results are obtained by summing the predicted results of all components. Research on the proposed multi-scale model is carried out on a real propylene polymerization plant, and the results are compared among the RVM-chaos, IACO-RVM-chaos and multi-scale models. The research results show that the developed model achieves good performance in the industrial MI prediction process.
*,** Corresponding authors.
E-mail addresses: [email protected] (M. Zhang), [email protected] (L. Zhou).
https://doi.org/10.1016/j.chemolab.2019.01.008
Received 23 August 2018; Received in revised form 31 January 2019; Accepted 31 January 2019; Available online 2 February 2019.
0169-7439/© 2019 Published by Elsevier B.V.
These works give good predictions, but greater performance and better universality of the prediction model remain the first-line goals in the academic and industrial communities. However, up to now, the above methods have predicted MI using indirectly measurable variables that influence MI; none predicts MI directly from the dynamics of the MI series itself. Understanding the chaotic and multi-scale characteristics makes the properties of the PP process clearer: the process can be analyzed on different scales, and the nature of each scale can be exploited to make better predictions. Huang et al. [17] proposed a decomposition technique to disclose the hidden nature of a time series, named empirical mode decomposition (EMD). It is commonly used for processing non-stationary and nonlinear signals [18–20] and does not require any predefined basis functions, unlike Fourier or wavelet analysis. The EMD method decomposes the time series into a sum of intrinsic mode functions (IMFs) and a residual sequence. This data-adaptive decomposition method is a powerful tool for multi-scale analysis, and various EMD-based hybrid forecast models have been presented for different applications [21–24]. Chaos is a bounded unstable dynamic behavior that exhibits sensitive dependence on initial conditions and includes infinitely many unstable periodic motions in nonlinear systems. Although it appears to be stochastic, it occurs in deterministic nonlinear systems under deterministic conditions. Prediction of chaotic time series is a useful method to evaluate the characteristics of dynamical systems and forecast the trend of complex systems [25]. With the development of chaos theory, chaotic time series prediction has been widely applied in the fields of financial time series prediction, electricity price prediction, power load prediction, traffic flow prediction, and so on [26–28].

In this paper, a multi-scale prediction model based on EMD and chaos theory is proposed for industrial MI prediction. First, the EMD method is used to decompose the MI time series into IMFs and the residual. Then the chaotic characteristics of each component are identified with chaos theory. For the components with chaotic characteristics, a relevance vector machine (RVM) chaotic prediction model is developed as the predictive model. For the components without chaotic characteristics, a least squares support vector machine (LSSVM) is used as the predictive model. At the same time, an improved ant colony optimization (IACO) algorithm is used to optimize the parameters of the RVM and LSSVM. In the end, the final MI prediction results are obtained by summing the predicted results of all components. Detailed comparisons among the RVM-chaos, IACO-RVM-chaos and multi-scale models are then carried out.

The rest of the paper is organized as follows: Section 2 presents a brief methodology of EMD. Section 3 introduces the chaotic characteristics analysis. Then, Section 4 presents the establishment of the multi-scale prediction model. At last, the results of the experiments are presented and discussed in Section 5, and the conclusion is given in Section 6.
2. Empirical mode decomposition

Empirical mode decomposition (EMD) is a nonlinear and adaptive signal decomposition method [17]. EMD decomposes the signal into a sum of intrinsic mode functions (IMFs) and a trend component. All the IMFs must satisfy two conditions: (1) the difference between the total number of extrema and the total number of zero-crossings is zero or one; (2) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero. EMD makes no assumption of linearity or stationarity, in contrast to the Fourier transform, and unlike the wavelet transform it does not rely on basis functions and fixed frequency scales connected to those functions [29].

The detailed procedure of EMD for a given time series X(t) is as follows:

(1) Connect all the local maxima using a cubic spline to form an upper envelope X_{max}(t). Repeat the procedure for the local minima to form a lower envelope X_{min}(t).
(2) Calculate the mean value m_1(t) of the upper and lower envelopes, that is

m_1(t) = \frac{X_{max}(t) + X_{min}(t)}{2}    (1)

(3) Extract the first IMF from the original signal. First calculate h_1(t) = X(t) - m_1(t). If h_1(t) satisfies the IMF conditions, it becomes the first IMF. If not, replace X(t) with h_1(t) and calculate h_{11}(t) = h_1(t) - m_{11}(t), where m_{11}(t) is the mean of the upper and lower envelopes of h_1(t). If the IMF conditions are still not met, repeat the above steps k times until h_{1k}(t) satisfies them, where h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t). The first IMF component extracted from the data is then designated as c_1(t) = h_{1k}(t).
(4) Once c_1(t) is obtained, compute the residue r_1(t):

r_1(t) = X(t) - c_1(t)    (2)

(5) Repeat steps (1)–(4) using r_1(t) as X(t) to obtain c_2(t), and continue until r_n(t) is a monotone function or a direct-current component. That is,

r_i(t) = r_{i-1}(t) - c_i(t), \quad i = 1, 2, \cdots, n    (3)

(6) Finally, the original time series X(t) can be expressed as the sum of the IMFs and the residual:

X(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)    (4)

In practice, Huang et al. [30] proposed to use the standard deviation (SD) between two consecutive sifting results as the criterion for terminating the screening (sifting) process. It is described as follows:

SD = \sum_{t=0}^{T} \frac{\left[ h_{1(k-1)}(t) - h_{1k}(t) \right]^2}{h_{1(k-1)}^2(t)} < \varepsilon    (5)

where T is the length of the original signal and ε is between 0.2 and 0.3.
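The sifting procedure and the stopping rule of Eq. (5) map directly onto a short numerical routine. The sketch below is a minimal, illustrative EMD implementation, assuming a 1-D NumPy series and using SciPy cubic splines for the envelopes; the helper names (`mean_envelope`, `sift_imf`, `emd_decompose`) and the simple boundary handling are ours, not the authors' code.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def mean_envelope(x):
    """Mean of the cubic-spline upper and lower envelopes (steps (1)-(2))."""
    t = np.arange(len(x))
    imax = argrelextrema(x, np.greater)[0]
    imin = argrelextrema(x, np.less)[0]
    if len(imax) < 2 or len(imin) < 2:
        return None  # too few extrema: x is (close to) monotone
    upper = CubicSpline(imax, x[imax])(t)
    lower = CubicSpline(imin, x[imin])(t)
    return 0.5 * (upper + lower)

def sift_imf(x, eps=0.2, max_iter=50):
    """Extract one IMF by repeated sifting, stopped by the SD criterion of Eq. (5)."""
    h_prev = x.copy()
    for _ in range(max_iter):
        m = mean_envelope(h_prev)
        if m is None:
            break
        h = h_prev - m
        sd = np.sum((h_prev - h) ** 2 / (h_prev ** 2 + 1e-12))
        if sd < eps:
            return h
        h_prev = h
    return h_prev

def emd_decompose(x, max_imfs=10):
    """Decompose x into IMFs plus a residual, so that x = sum(imfs) + res (Eq. (4))."""
    imfs, res = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        if mean_envelope(res) is None:   # residual is monotone / DC: stop (step (5))
            break
        c = sift_imf(res)
        imfs.append(c)
        res = res - c                    # Eq. (3)
    return imfs, res
```

A dedicated package such as PyEMD could replace this sketch in practice; the point here is only to make the sifting loop and the SD stopping criterion concrete.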
3. Methodology for chaotic characteristics analysis

3.1. Hurst index analysis

The Hurst index, proposed by the famous British hydrologist H. E. Hurst [31], is an indicator for studying the memory of a time series. Rescaled range (R/S) analysis is used to establish the Hurst index (denoted by H) as an indicator for judging whether the time series follows a random walk or a biased random walk process [32]. In detail, H = 0.5 means the time series follows a random walk; 0 ≤ H < 0.5 indicates that the time series has anti-persistence; 0.5 < H ≤ 1 means that the time series has long-term memory characteristics. The R/S method is a non-parametric analysis and does not require any distributional assumptions for the time series.

For the time series {x(t)}, t = 1, 2, ⋯, n, the R/S analysis can be expressed as

R(n)/S(n) = (c \cdot n)^{H}    (6)

where H is the Hurst index, n is the sampling number, R(n) denotes the range, S(n) denotes the standard deviation, and c is a constant.

The Hurst index H is estimated by

H = \frac{\ln \left( R(n)/S(n) \right)}{\ln (c \cdot n)}    (7)
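As a concrete illustration of Eqs. (6)–(7), H can be estimated by computing R(n)/S(n) over several window sizes and regressing log(R/S) on log(n); the sketch below follows that common reading, with an arbitrary window grid chosen purely for illustration.

```python
import numpy as np

def hurst_rs(x, window_sizes=(8, 16, 32, 64)):
    """Estimate the Hurst index H by rescaled-range (R/S) analysis."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - n + 1, n):
            w = x[start:start + n]
            z = np.cumsum(w - w.mean())      # cumulative deviation from the window mean
            r = z.max() - z.min()            # range R(n)
            s = w.std(ddof=0)                # standard deviation S(n)
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_vals)))
    # R/S ~ (c*n)^H, so H is the slope of log(R/S) against log(n)
    return np.polyfit(log_n, log_rs, 1)[0]
```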
3.2. Phase space reconstruction
As the characterization of a time series and the establishment of a prediction model must be carried out in phase space, phase space reconstruction theory is the basis of chaotic time series analysis [33]. The purpose of phase space reconstruction is to recover the chaotic attractors that reflect the regularity of the chaotic system in a high-dimensional phase space, so as to obtain more hidden information. In 1981, the Dutch mathematician Takens proposed the famous Takens theorem [34]. According to this theorem, if the embedding dimension m ≥ 2d + 1 (where d is the correlation dimension), then the chaotic attractor can be recovered in the embedded space.

If the dynamics of the time series {x(t); t = 1, 2, ⋯, n} are embedded in the m-dimensional phase space, then the phase space can be defined by

X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} = \begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+(m-1)\tau) \\ x(2) & x(2+\tau) & \cdots & x(2+(m-1)\tau) \\ \cdots & \cdots & \cdots & \cdots \\ x(n-(m-1)\tau) & x(n-(m-2)\tau) & \cdots & x(n) \end{bmatrix}    (8)

where N = n - (m-1)τ, τ is the time delay, and m is the embedding dimension.

By selecting an appropriate delay time and embedding dimension, it is possible to restore the dynamics of the original system in the sense of topological equivalence. Therefore, the key to phase space reconstruction is to determine the delay time and the embedding dimension. In this paper, the time delay τ and the embedding dimension m are calculated by the mutual information method [35] and Cao's method [36], respectively.

For a set of observable series {x_i}, i = 1, 2, ⋯, N, the mutual information between x_t and x_{t+τ} is defined as

I(\tau) = \sum_{x_t, x_{t+\tau}} P(x_t, x_{t+\tau}) \ln \frac{P(x_t, x_{t+\tau})}{P(x_t) P(x_{t+\tau})}    (9)

where P(x_t) and P(x_{t+τ}) represent the probability densities of x_t and x_{t+τ}, respectively, and P(x_t, x_{t+τ}) is their joint probability density function. The first minimum of I(τ) is chosen as the proper delay time.

In Cao's method, the delay time τ should be determined in advance. Y_i^{NN}(m) is the nearest neighbor of Y_i(m) in the m-dimensional reconstructed phase space, and Y_i(m+1) and Y_i^{NN}(m+1) are the expansions of Y_i(m) and Y_i^{NN}(m) in the reconstructed phase space with m+1 dimensions.
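To make Eq. (8) and the delay selection of Eq. (9) concrete, the sketch below builds the delay-embedded matrix and picks τ as the first local minimum of a histogram-based mutual information estimate; the binning and the helper names are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Phase space reconstruction, Eq. (8): rows are [x(i), x(i+tau), ..., x(i+(m-1)tau)]."""
    x = np.asarray(x, dtype=float)
    N = len(x) - (m - 1) * tau
    return np.array([x[i:i + (m - 1) * tau + 1:tau] for i in range(N)])

def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(tau) between x(t) and x(t+tau), Eq. (9)."""
    a, b = x[:-tau], x[tau:]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))

def first_minimum_delay(x, max_tau=20):
    """Choose the delay as the first local minimum of I(tau)."""
    mi = [mutual_information(x, tau) for tau in range(1, max_tau + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] < mi[k + 1]:
            return k + 1          # tau values start at 1
    return int(np.argmin(mi)) + 1
```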
3.3. The largest Lyapunov exponent

The largest Lyapunov exponent quantifies the average rate at which nearby trajectories diverge, and a positive value is a commonly used criterion for judging the existence of chaos [37].

The small data amount method proposed by Rosenstein et al. [38] has high computational accuracy and small computational complexity, and it is also reliable for small data sets. Therefore, the small data amount method is adopted in this paper to calculate the largest Lyapunov exponent λ1. The detailed procedure for calculating λ1 with the small data sets method is as follows (a code sketch is given after this list):

(1) For a time series {x_i}, i = 1, 2, ⋯, N, calculate its delay time τ and embedding dimension m;
(2) Perform the phase space reconstruction {Y_j; j = 1, 2, ⋯, M}, where M = N - (m-1)τ;
(3) Find the nearest point Y_{j1} of each Y_j subject to a brief temporal separation, d_j(0) = \min_{j1} \| Y_j - Y_{j1} \|, |j - j1| > p, where p is the average period;
(4) Calculate the distance between these nearest points after a discrete time i as d_j(i) = \| Y_{j+i} - Y_{j1+i} \|, i = 1, 2, ⋯, min(M - j, M - j1);
(5) For every i, calculate y(i) = \frac{1}{q \Delta t} \sum_{j=1}^{q} \ln d_j(i), where q is the number of nonzero d_j(i);
(6) Obtain the regression line L of y using the least squares algorithm; λ1 is the slope of line L.
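The steps above translate into a short routine: reconstruct the phase space, track how nearest-neighbour distances grow with the discrete time i, and read λ1 off the slope of the average log-divergence curve. The sketch below follows that recipe under simplifying assumptions (Euclidean distances, a fixed fit range, a sampling interval Δt supplied by the caller) and reuses the `delay_embed` helper from the previous sketch; it is illustrative rather than the authors' code.

```python
import numpy as np

def largest_lyapunov(x, m, tau, mean_period, dt=1.0, fit_len=15):
    """Small-data (Rosenstein-style) estimate of the largest Lyapunov exponent."""
    Y = delay_embed(x, m, tau)                 # step (2): M x m trajectory matrix
    M = len(Y)
    dists = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    # step (3): nearest neighbour with temporal separation larger than the mean period
    for j in range(M):
        lo, hi = max(0, j - mean_period), min(M, j + mean_period + 1)
        dists[j, lo:hi] = np.inf
    nn = np.argmin(dists, axis=1)
    # steps (4)-(5): average log-divergence y(i) over all valid pairs (assumes fit_len << M)
    i_axis = np.arange(1, fit_len + 1)
    y = []
    for i in i_axis:
        logs = []
        for j in range(M):
            if j + i < M and nn[j] + i < M:
                d = np.linalg.norm(Y[j + i] - Y[nn[j] + i])
                if d > 0:
                    logs.append(np.log(d))
        y.append(np.mean(logs) / dt)
    # step (6): lambda_1 is the slope of the least-squares line of y(i) against i
    return np.polyfit(i_axis, y, 1)[0]
```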
4. Formulations for the multi-scale prediction model

4.1. LSSVM prediction model

The LSSVM is an extension of the SVM [39] which applies a linear least squares criterion to the loss function instead of inequality constraints. It is based on margin maximization, performs structural risk minimization, and has excellent generalization power. As the LSSVM can approximate nonlinear systems with high precision, it has become a powerful tool for modeling and forecasting nonlinear systems [40–42].

Suppose that there is a set of data {x_k, y_k}, k = 1, ⋯, N, with input data x_k and corresponding target y_k. In the modeling of the LSSVM [43], the following optimization problem is considered:

\min_{w, b, e} J(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^2, \quad \text{subject to } y_k = w^T \varphi(x_k) + b + e_k, \; k = 1, \cdots, N

where φ(·) is the nonlinear feature map, b is the bias term, e_k are the error variables, and γ is the regularization (punishment) factor.
There are several kernel functions that are commonly utilized in LSSVM models, e.g., linear, polynomial, RBF, etc. The RBF kernel is employed in this study, which is described as follows:

K(x, x_k) = \exp \left( - \| x - x_k \|^2 / (2 \sigma^2) \right)    (18)

where σ is the bandwidth of the RBF kernel. The bandwidth σ and the punishment factor γ are the two parameters that need to be chosen in the LSSVM model.
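Although the intermediate LSSVM derivation is omitted here, its standard end result is a linear system in a bias term b and dual coefficients α. The sketch below shows one common formulation of LSSVM regression with the RBF kernel of Eq. (18): build the kernel matrix, solve the KKT system, and predict with y(x) = Σ_k α_k K(x, x_k) + b. The parameter names `gamma` and `sigma` follow the text; the rest is an illustrative implementation, not the authors' code.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), Eq. (18)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM KKT system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    N = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                     # bias b, dual coefficients alpha

def lssvm_predict(Xtrain, b, alpha, sigma, Xnew):
    """y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(Xnew, Xtrain, sigma) @ alpha + b
```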
4.2. IACO optimized RVM chaotic prediction model

The RVM, proposed by Michael E. Tipping, is a sparse learning method based on a probabilistic Bayesian framework [44]. The kernel function of the RVM does not need to satisfy Mercer's condition; thus, the assumption on the kernel function is relaxed, and the RVM avoids free parameters such as the penalty factor required by the SVM. In addition, the relevance vectors of the RVM are sparser than the support vectors of the SVM due to the prior distribution placed on the relevance vectors. Owing to these advantages, the RVM can effectively solve high-dimensional, nonlinear classification and regression problems [45–47].

Given a database preprocessed by the phase space reconstruction method, {x_n, t_n}, n = 1, ⋯, N, the signal can be expressed as follows:

t_n = y(x_n; w) + \varepsilon_n    (19)

where ε_n is the random noise, which is assumed to be independent zero-mean Gaussian with variance σ². The function y(x) is defined as follows:

y(x; w) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0 = \sum_{i=1}^{N} w_i \phi_i(x)    (20)

where w = (w_0, w_1, ⋯, w_N)^T is the weight vector, K(x, x_i) is the kernel function, and Φ = [φ(x_1), φ(x_2), ⋯, φ(x_N)]^T is the design matrix. The RVM adopts a Bayesian perspective and constrains w and σ² by defining a prior probability distribution over the weights:

p(w \mid \alpha) = \prod_{i=0}^{N} N\left( w_i \mid 0, \alpha_i^{-1} \right)    (22)

p(\alpha) = \prod_{i=0}^{N} \text{Gamma}(\alpha_i \mid a, b)    (23)

p(\sigma^{-2}) = \text{Gamma}(\sigma^{-2} \mid c, d)    (24)

where α is an (N+1)-dimensional hyper-parameter vector, and the parameters a, b, c and d can be fixed at small values, such as a = b = c = d = 10^{-5}.

Having defined the prior probability of the parameter set, the posterior over the weights can be calculated through Bayes' rule:

p(w \mid t, \alpha, \sigma^2) = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)} = (2\pi)^{-(N+1)/2} |\Sigma|^{-1/2} \exp \left( -\frac{1}{2} (w - \mu)^T \Sigma^{-1} (w - \mu) \right)    (25)

where the posterior covariance and mean are, respectively:

\Sigma = \left( \sigma^{-2} \Phi^T \Phi + A \right)^{-1}    (26)

\mu = \sigma^{-2} \Sigma \Phi^T t    (27)

where A = diag(α_0, α_1, ⋯, α_N).

The likelihood distribution over the training targets can be calculated by maximizing the marginal likelihood p(t | α, σ²) given by Tipping:

p(t \mid \alpha, \sigma^2) = \int p(t \mid w, \sigma^2)\, p(w \mid \alpha)\, dw = (2\pi)^{-N/2} |C|^{-1/2} \exp \left( -\frac{1}{2} t^T C^{-1} t \right)    (28)

where the covariance is given by C = σ² I + Φ A^{-1} Φ^T.

The values of α and σ² are calculated by an iterative method:

\alpha_i^{new} = \frac{1 - \alpha_i \Sigma_{ii}}{\mu_i^2}    (29)

(\sigma^2)^{new} = \frac{\| t - \Phi \mu \|^2}{N - \sum_{i=1}^{N} (1 - \alpha_i \Sigma_{ii})}    (30)
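Equations (26)–(30) define a fixed-point iteration: given the current hyper-parameters α and noise variance σ², compute the posterior covariance Σ and mean μ, then re-estimate α and σ² until convergence. A minimal sketch of that loop is given below, assuming the design matrix Φ already includes the bias column; the pruning threshold used to mark relevance vectors is our own illustrative choice.

```python
import numpy as np

def rvm_train(Phi, t, n_iter=200, alpha_cap=1e9):
    """Iterate Eqs. (26)-(30): posterior (Sigma, mu), then hyper-parameter updates."""
    N, M = Phi.shape
    alpha = np.ones(M)                  # one hyper-parameter per weight (Eq. (22))
    sigma2 = 0.1 * np.var(t)
    for _ in range(n_iter):
        A = np.diag(alpha)
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)          # Eq. (26)
        mu = Sigma @ Phi.T @ t / sigma2                          # Eq. (27)
        gamma_i = 1.0 - alpha * np.diag(Sigma)                   # "well-determinedness" factors
        alpha = np.minimum(gamma_i / (mu ** 2 + 1e-12), alpha_cap)            # Eq. (29)
        sigma2 = np.sum((t - Phi @ mu) ** 2) / max(N - gamma_i.sum(), 1e-12)  # Eq. (30)
    relevant = alpha < alpha_cap        # basis functions kept as "relevance vectors"
    return mu, Sigma, sigma2, relevant
```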
Since the effectiveness of the RVM model depends largely on the kernel parameter, optimizing the kernel parameter is indispensable for improving the prediction accuracy of the model. In our previous work [15,48], the ACO algorithm showed good performance in the parameter optimization of MI prediction models. Thus, an improved ant colony optimization (IACO) algorithm is applied to optimize the parameters of the RVM. The detailed procedure of the IACO algorithm is described as follows (a code sketch follows the listing):

Local search
1. Calculate the probability of choosing region s_i (i = 1, 2, ⋯, n) by P_i(k) = F_i / \sum_{i=1}^{n} F_i.
2. Choose a region s_chosen by the roulette rule for ant j (j = 1, 2, ⋯, m).
3. Create a random step with direction del = (d_1, d_2, ⋯, d_D) that ant j will walk.
4. Obtain a new region s_new by s_new = s_chosen + del.
5. Compare the fitness of s_new and s_chosen: if F_new > F_chosen, replace s_chosen with s_new in the set S and update F_chosen in the set F_i (i = 1, 2, ⋯, n); otherwise, do nothing.

Global search
1. Choose the regions to be replaced by the roulette rule. The total number of regions to be chosen is R1 + R2.
2. Calculate the probability of region s_i (i = 1, 2, ⋯, n) being chosen by Q_i(k) = (1/F_i) / \sum_{i=1}^{n} (1/F_i).
3. Replace the first R1 regions with the mutation operation:
   3.1 Each initial region s_old shifts a random distance in a random direction del = (d_1, d_2, ⋯, d_D).
   3.2 Obtain a new region s_new according to s_new = s_old + del.
   3.3 Replace the chosen R1 regions with the new ones.
4. Replace the remaining R2 regions with the crossover operation:
   4.1 Each initial region s_old crosses with a random region s_random in S.
   4.2 Obtain a new one, s_new, according to s_new = p·s_old + (1 - p)·s_random, where p is an adjustable probability parameter.
   4.3 Replace the chosen R2 regions with the new ones.

Return results
The local search and global search are repeated until an end criterion is met. The region s_best with the best fitness F_best is taken as the final solution of the optimization problem.
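Read as a generic optimizer, the listing above maintains a set of candidate parameter regions with fitness values, refines them locally by small random moves accepted only when the fitness improves, and periodically refreshes poorly performing regions by mutation and crossover. The sketch below is one possible rendering of that loop for tuning kernel parameters; the fitness function, bounds and step sizes are placeholders introduced for illustration and are not taken from the paper.

```python
import numpy as np

def iaco_optimize(fitness, bounds, n_regions=20, n_ants=10, r1=3, r2=3,
                  n_iter=50, step=0.1, p_cross=0.5, rng=np.random.default_rng(0)):
    """Improved ant colony optimization: local search plus mutation/crossover global search."""
    lo, hi = np.array(bounds)[:, 0], np.array(bounds)[:, 1]
    S = rng.uniform(lo, hi, size=(n_regions, len(lo)))     # candidate regions
    F = np.array([fitness(s) for s in S])                  # assumes positive fitness values
    for _ in range(n_iter):
        # -------- local search --------
        P = F / F.sum()                                    # selection probability P_i
        for _ in range(n_ants):
            i = rng.choice(n_regions, p=P)                 # roulette selection
            new = np.clip(S[i] + rng.uniform(-step, step, len(lo)) * (hi - lo), lo, hi)
            f_new = fitness(new)
            if f_new > F[i]:                               # accept only improvements
                S[i], F[i] = new, f_new
        # -------- global search --------
        Q = (1.0 / F) / (1.0 / F).sum()                    # worse regions chosen more often
        idx = rng.choice(n_regions, size=r1 + r2, replace=False, p=Q)
        for i in idx[:r1]:                                 # mutation
            S[i] = np.clip(S[i] + rng.uniform(-1, 1, len(lo)) * (hi - lo), lo, hi)
            F[i] = fitness(S[i])
        for i in idx[r1:]:                                 # crossover with a random region
            j = rng.integers(n_regions)
            S[i] = p_cross * S[i] + (1 - p_cross) * S[j]
            F[i] = fitness(S[i])
    best = int(np.argmax(F))
    return S[best], F[best]
```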
A schema of the IACO optimized RVM chaotic prediction model is shown in Fig. 1. The input signals are transformed from the time series into multi-dimensional datasets through phase space reconstruction and are divided into two groups, of which one is the training dataset and the other is the validation dataset. Then an RVM model is trained with the training data, optimized by the IACO algorithm, and evaluated with the validation data.

4.3. Multi-scale prediction model

Because the MI time series is highly nonlinear and non-stationary, it is very difficult to predict accurately. In order to improve the prediction accuracy, the EMD method is used to decompose the MI time series into different scales. The scales consist of multiple IMFs and the residual sequence. By identifying the chaotic characteristics of each component, it can be determined that some IMFs are chaotic time series while the others are not. Different prediction models are then used according to the different characteristics of each component. On the one hand, in the process of predicting a chaotic time series, the input signals are transformed from the time series into multi-dimensional datasets through phase space reconstruction. As the RVM can effectively solve high-dimensional and nonlinear regression problems, it is used to predict the chaotic IMFs. Thus, for the components with chaotic characteristics, the IACO optimized RVM (IACO-RVM) chaotic prediction model is established to predict them in the reconstructed phase space. On the other hand, as the LSSVM can approximate nonlinear systems with high precision and has excellent generalization power, it is used to predict the components that do not have chaotic characteristics. The IACO algorithm is also used to optimize the parameters of the LSSVM. Thus, for the components without chaotic characteristics, the IACO-LSSVM model is established to predict them. Finally, the prediction results of the MI are obtained by integrating the prediction results of the IMFs and the residual sequence. The schematic diagram of the multi-scale prediction model is shown in Fig. 2.
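Putting the pieces together, the multi-scale soft sensor is a decompose–classify–predict–recombine pipeline. The sketch below wires up the earlier helpers (`emd_decompose`, `largest_lyapunov`) to show the data flow only; the per-component model builders are passed in as callables, and the embedding parameters and chaos test are illustrative stand-ins for the IACO-RVM and IACO-LSSVM models described above.

```python
import numpy as np

def is_chaotic(component, m=3, tau=2, mean_period=10):
    """A component is treated as chaotic if its largest Lyapunov exponent is positive."""
    return largest_lyapunov(component, m=m, tau=tau, mean_period=mean_period) > 0

def multiscale_predict(mi_series, horizon, fit_chaotic, fit_nonchaotic):
    """Decompose, model each component separately, then sum the component forecasts (Fig. 2)."""
    imfs, residual = emd_decompose(np.asarray(mi_series, dtype=float))
    components = imfs + [residual]
    total_forecast = np.zeros(horizon)
    for comp in components:
        if is_chaotic(comp):
            model = fit_chaotic(comp)      # e.g. an IACO-tuned RVM in the reconstructed phase space
        else:
            model = fit_nonchaotic(comp)   # e.g. an IACO-tuned LSSVM
        total_forecast += model(horizon)   # each fitted model returns its own h-step forecast
    return total_forecast
```

Here `fit_chaotic` and `fit_nonchaotic` are user-supplied training routines that return a forecasting callable; this keeps the pipeline structure visible without committing to any particular tuning code.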
In this paper, the following measures are employed for model evaluation: the mean absolute error (MAE), the mean relative error (MRE), the root mean square error (RMSE), Theil's inequality coefficient (TIC) and the standard deviation of the absolute error (STD). The error indicators are defined as follows:

MAE = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y}_i |    (31)

MRE = \frac{1}{n} \sum_{i=1}^{n} \frac{| y_i - \hat{y}_i |}{y_i} \times 100\%    (32)

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} ( y_i - \hat{y}_i )^2 }    (33)

TIC = \frac{ \sqrt{ \sum_{i=1}^{n} ( y_i - \hat{y}_i )^2 } }{ \sqrt{ \sum_{i=1}^{n} y_i^2 } + \sqrt{ \sum_{i=1}^{n} \hat{y}_i^2 } }    (34)

STD = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} ( e_i - \bar{e} )^2 }    (35)

where \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, e_i = y_i - \hat{y}_i, \bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i, and y_i and \hat{y}_i denote the measured value and the predicted result, respectively. The MAE, MRE and RMSE reflect the prediction accuracy of the models, the STD indicates the predictive stability, and the TIC indicates the level of agreement between the proposed model and the studied process.
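The five indicators are straightforward to compute from the measured and predicted vectors; a compact helper is sketched below, using the sample standard deviation (n − 1 in the denominator) for STD as in Eq. (35).

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, MRE, RMSE, TIC and STD of the prediction error, Eqs. (31)-(35)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    e = y_true - y_pred
    mae = np.mean(np.abs(e))
    mre = np.mean(np.abs(e) / y_true) * 100.0
    rmse = np.sqrt(np.mean(e ** 2))
    tic = np.sqrt(np.sum(e ** 2)) / (np.sqrt(np.sum(y_true ** 2)) + np.sqrt(np.sum(y_pred ** 2)))
    std = np.std(e, ddof=1)
    return {"MAE": mae, "MRE(%)": mre, "RMSE": rmse, "TIC": tic, "STD": std}
```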
5. Results and discussion

The multi-scale prediction model based on EMD, chaos theory and IACO-RVM addressed in the preceding sections is applied to the prediction of the MI in a real industrial propylene polymerization process currently operated for commercial purposes in China. The Hypol technology, one of the most widespread commercial methods for producing polypropylene, is used in the process. Fig. 3 shows the schematic diagram of the industrial process. The process consists of a chain of reactors in series: two continuous stirred tank reactors and two fluidized-bed reactors. The feed to the reactors is comprised of propylene, hydrogen and catalyst. These liquids and gases are the reactants for the growing polymer particles and also provide the heat transfer media. In the first two reactors, the polymerization reaction takes place in the liquid phase; in the third and fourth reactors, the reaction is completed in the vapor phase to produce the powdered polymer product. The MI, which depends on the catalyst properties, reactant composition, reactor temperature and so on, determines the different brands of products and the different grades of product quality. In this application, the MI time series, collected from a real propylene polymerization plant, is selected as the sample set. A commercial brand named F401 with MI values between 1.4 and 3.2 is considered. The sampling time of the MI is 2 h and the size of the MI time series is 170.

Application of the EMD method to the MI time series produces five IMFs and the residual sequence. The decomposition results are illustrated in Fig. 4, where the x axis is the sample number and the y axis is the melt index value (in g/10 min). In the figure, IMF1–IMF5 are the five IMFs and RES is the residual sequence. Generally, the number of IMFs depends on the stopping criterion (Eq. (5)) and the experimental conditions. In the studied case, the experimental conditions are fixed, so the stopping criterion, i.e., the threshold ε, affects the number of IMFs. Huang et al. [30] suggest that ε between 0.2 and 0.3 is appropriate. Here, the routine value ε = 0.2 is selected as the stopping criterion to perform EMD on the MI time series. It is obvious that the signal on the different scales demonstrates quite different behavior and that the frequency contained in each component gradually decreases from IMF1 to IMF5.

Fig. 4. The empirical mode decomposition results of the melt index time series.
Fig. 5. The reconstruction error of the EMD of the melt index time series.

On the other hand, the reconstruction error of the EMD of the MI time series is shown in Fig. 5. As can be seen from the figure, the magnitude of the reconstruction error is on the order of 10^{-15}, and even close to 10^{-16}.

Subsequently, the chaotic characteristics of IMF1–IMF5 are identified. First, the Hurst indexes H of IMF1–IMF5 are estimated with the R/S analysis. The phase spaces of IMF1–IMF5 are reconstructed, and the time delay τ and the embedding dimension m of IMF1–IMF5 are calculated by the mutual information and Cao's methods, respectively. Then, the largest Lyapunov exponents λ1 of IMF1–IMF5 are calculated by the small data amount method. The chaotic characteristic analysis results of IMF1–IMF5 are shown in Table 1. According to Table 1, it can be concluded that IMF2, IMF3 and IMF5 are chaotic sequences, as they have positive finite largest Lyapunov exponents, while IMF1 and IMF4 do not have chaotic characteristics. The characteristic analysis results of the original MI time series also show that it is a chaotic time series.

A multi-scale prediction model is established after the analysis of the chaotic characteristics of the IMFs. For IMF2, IMF3 and IMF5, which have chaotic characteristics, phase space reconstruction is first performed according to the delay time and embedding dimension, and then the IACO-RVM chaotic prediction model is adopted. For IMF1 and IMF4, which do not have chaotic characteristics, and for the residual sequence RES, the IACO-LSSVM model is used as the prediction model. The prediction results of the IMFs and the residual sequence are integrated to obtain the final prediction results of the MI time series, and the prediction performance of the multi-scale model is evaluated. All data are separated into training, testing and generalization sets: the first 120 samples are used as the training set, 30 are used as the testing set, and the remaining 20 are used as the generalization set. It is noted that the training and testing sets come from the same batch, while the generalization dataset is from another batch. Therefore, the model's prediction accuracy is reflected by its performance on the testing dataset, and its performance on the generalization dataset reflects the model's universality.

The prediction results of IMF1–IMF5 and RES on the testing dataset are shown in Fig. 6. The curve marked with crosses is the actual value, while the curve marked with circles is the predicted value. It can be seen that the predicted value of each component is in good accordance with the actual value. The final prediction results of the MI time series are obtained, as shown in Fig. 7. At the same time, the single RVM and IACO-RVM chaotic prediction models have also been developed for comparison with the proposed model. These two models directly predict the original MI time series. The illustration in Fig. 7 shows how much better the Multi-scale model works than the other models on the testing dataset. The curve marked with crosses is the real MI analytic value, while the curves marked with triangles, circles and squares are the prediction values of the RVM-chaos model, the IACO-RVM-chaos model and the Multi-scale model, respectively. As can be seen from the figure, the Multi-scale model yields consistently good predictions and performs much better than the other models.

Fig. 7. Performance of the proposed models on the testing dataset.

The prediction results of the different models on the testing dataset are listed in Table 2.
Table 2
Performance of the prediction models on the testing dataset.

Model               MAE       MRE(%)   RMSE      TIC       STD
RBF-chaos [49]      –         2.87     0.0188    0.0104    –
RVM-chaos           0.01436   0.549    0.01750   0.00334   0.01771
IACO-RVM-chaos      0.00559   0.214    0.00721   0.00138   0.00725
Multi-scale model   0.00329   0.125    0.00435   0.00083   0.00432

The RBF-chaos model [49] reported in the open literature has also been considered for comparison with the proposed Multi-scale model. It is obvious that the Multi-scale model has the best performance overall. In detail, the RBF-chaos model [49] gives an MRE of 2.87%, an RMSE of 0.0188 and a TIC of 0.0104. The RVM-chaos model obtains an MAE of 0.01436, an MRE of 0.549%, an RMSE of 0.01750, a TIC of 0.00334 and an STD of 0.01771. The IACO-RVM-chaos model gives an MAE of 0.00559, an MRE of 0.214%, an RMSE of 0.00721, a TIC of 0.00138 and an STD of 0.00725. As can be seen from these data, the error indicators of the IACO-RVM-chaos model show percentage decreases of 61.1%, 61.0%, 58.8%, 58.7% and 59.1% compared with the RVM-chaos model. Furthermore, the Multi-scale model proposed in this paper shows even better results. The MAE, MRE, RMSE, TIC and STD of the Multi-scale model are 0.00329, 0.125%, 0.00435, 0.00083 and 0.00432, respectively, with percentage decreases of 41.1%, 41.6%, 39.7%, 39.9% and 40.4% compared to those of the IACO-RVM-chaos model. These data show that the proposed Multi-scale model provides excellent MI prediction accuracy for the propylene polymerization process.
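For clarity, the percentage decreases quoted above are taken relative to the model being compared against; for example, for the MAE of the IACO-RVM-chaos model versus the RVM-chaos model, and of the Multi-scale model versus the IACO-RVM-chaos model:

\frac{0.01436 - 0.00559}{0.01436} \times 100\% \approx 61.1\%, \qquad \frac{0.00559 - 0.00329}{0.00559} \times 100\% \approx 41.1\%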
To illustrate the universality of the proposed model, a visual comparison of the prediction performance on the generalization dataset is shown in Fig. 8. The curve marked with crosses is the real MI value obtained from laboratory analysis, while the results predicted by the RVM-chaos model, the IACO-RVM-chaos model and the Multi-scale model are depicted by curves marked with triangles, circles and squares, respectively. Apparently, the prediction result of the Multi-scale model is the best and is nearly equal to the real MI value at most data points. The results strongly support the conclusion that the Multi-scale model performs well not only on the testing dataset but also on the generalization dataset.

Moreover, the prediction results of the different models on the generalization dataset are listed in Table 3. It clearly shows that the MAE, MRE, RMSE, TIC and STD of the Multi-scale model are better than those of the RBF-chaos, RVM-chaos and IACO-RVM-chaos models. Compared with the RVM-chaos model, the error indicators of the Multi-scale model show percentage decreases of 60.7%, 60.5%, 48.5%, 48.2% and 48.6%.

Table 3
Performance of the prediction models on the generalization dataset.

Model               MAE       MRE(%)   RMSE      TIC       STD
RBF-chaos [49]      –         2.49     0.0112    0.0082    0.0082
RVM-chaos           0.00486   0.185    0.00588   0.00112   0.00514
IACO-RVM-chaos      0.00243   0.093    0.00348   0.00066   0.00310
Multi-scale model   0.00191   0.073    0.00303   0.00058   0.00264

The above experiments were performed on a personal computer with an Intel(R) Core(TM) i5-4210M CPU at 2.60 GHz. The computation times of the RVM-chaos model, the IACO-RVM-chaos model and the Multi-scale model are 1.16 s, 2.37 s and 5.94 s, respectively, increasing with the complexity of the models. These times cover the entire running process, including data processing, the training phase of the models and the MI calculation using the trained model. It can be seen that, due to the adoption of the multi-scale method, the computation time of the proposed model is somewhat longer but still acceptable. Since the sampling time of the industrial MI measurement is about 2 h, the proposed method qualifies as an online soft sensor for MI prediction.

Table 4 shows the comparison between the current work and the published literature [12,15,16,50]. The methods of the published literature are used to handle the data set considered in the present paper, and the error indicators in Table 4 are all computed on the same testing dataset. With the same research data, all error indicators of the proposed model are the smallest and its prediction accuracy is the best. This also illustrates the effectiveness of EMD and the multi-scale modeling method: the multi-scale model has better prediction performance than the single prediction models. The proposed soft sensor is therefore expected to have promising potential for practical use.

Table 4
The comparison between the current work and the published literature.

Model                MAE       MRE(%)   RMSE      TIC       STD
BNPSO-FNN [12]       0.00402   0.156    0.00600   0.00122   0.00552
HACDE-LSSVM [15]     0.00515   0.196    0.00820   0.00157   0.00737
PFO-RVM [16]         0.00773   0.295    0.00973   0.00186   0.00982
PSO-FF-WLSSVM [50]   0.00676   0.258    0.00860   0.00164   0.00867
Multi-scale model    0.00329   0.125    0.00435   0.00083   0.00432

To clarify the generality of the proposed Multi-scale model, another dataset, of brand F400 from a real propylene polymerization plant, is selected to train and test the proposed model; it is divided into 60 pairs for training and 25 pairs for testing. The results are shown in Fig. 9 and in Table 5.
A clear contrast between the different models is demonstrated in Fig. 9. As can be seen, the proposed Multi-scale model exhibits the best performance compared with the other models. Table 5 shows the prediction results of the different models on the second dataset. In detail, the Multi-scale model outperforms the other models again, with a decrease of 35.3% in MRE, from 2.75% to 1.78%, compared to the RVM-chaos model. The error reduction also occurs in terms of MAE, RMSE, TIC and STD. The experimental results show that the proposed Multi-scale model performs well both in the efficiency of predicting the MI and in generalization ability.

Table 5
Performance of the prediction models on another testing dataset of brand F400.

Model               MAE      MRE(%)   RMSE     TIC      STD
RVM-chaos           0.0712   2.75     0.0878   0.0176   0.0583
IACO-RVM-chaos      0.0527   2.04     0.0640   0.0128   0.0416
Multi-scale model   0.0461   1.78     0.0554   0.0111   0.0376

6. Conclusion

In order to achieve better performance for industrial MI prediction, this paper presents a multi-scale prediction model for MI prediction based on EMD, chaos theory and the IACO-RVM chaotic prediction model. For comparison, the RVM-chaos, IACO-RVM-chaos and multi-scale models are developed and evaluated. The soft sensor models are applied to MI prediction in a real industrial PP plant. The application of the proposed model to the testing and generalization data shows its good performance and excellent generalization ability. The multi-scale model predicts the MI with an MRE of 0.125% on the testing dataset, which is much more accurate than the IACO-RVM-chaos model with an MRE of 0.214% and the RVM-chaos model with an MRE of 0.549%. These models are also compared with other methods reported in the open literature. The research results reveal the prediction accuracy and validity of the proposed approach and indicate that the multi-scale modeling method can be a promising and efficient methodology for industrial MI prediction.

Acknowledgments

This work is supported in part by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Information (No. U1609214), the China Postdoctoral Science Foundation (2018M630674) and the Zhejiang Provincial Natural Science Foundation of China (No. LQ16F030002, No. LY19F030003), and their support is thereby acknowledged.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.chemolab.2019.01.008.

References

[1] F. Ahmed, L.H. Kim, Y.K. Yeo, Statistical data modeling based on partial least squares: application to melt index predictions in high density polyethylene processes to achieve energy-saving operation, Kor. J. Chem. Eng. 30 (2013) 11–19.
[2] Y. Liu, Y. Liang, Z. Gao, Industrial polyethylene melt index prediction using ensemble manifold learning-based local model, J. Appl. Polym. Sci. 134 (2017) 45094.
[3] L.L.T. Chan, J. Chen, Melt index prediction with a mixture of Gaussian process regression with embedded clustering and variable selections, J. Appl. Polym. Sci. 134 (2017) 45237.
[4] W. Huang, Z. Jing, Multi-focus image fusion using pulse coupled neural network, Pattern Recogn. Lett. 28 (2018) 1123–1132.
[5] L. Zhou, J. Zheng, Z. Ge, Z. Song, S. Shan, Multimode process monitoring based on switching autoregressive dynamic latent variable model, IEEE Trans. Ind. Electron. 65 (2018) 8184–8194.
[6] L. Zhou, G. Li, Z. Song, S.J. Qin, Autoregressive dynamic latent variable models for process monitoring, IEEE Trans. Contr. Syst. Technol. 25 (2016) 366–373.
[7] X. Yuan, B. Huang, Y. Wang, C. Yang, W. Gui, Deep learning based feature representation and its application for soft sensor modeling with variable-wise weighted SAE, IEEE Trans. Ind. Inform. 14 (2018) 3235–3243.
[8] X. Yuan, Y. Wang, C. Yang, Z. Ge, Z. Song, W. Gui, Weighted linear dynamic system for feature representation and soft sensor application in nonlinear dynamic industrial processes, IEEE Trans. Ind. Electron. 65 (2018) 1508–1517.
[9] J. Zhang, Q. Jin, Y. Xu, Inferential estimation of polymer melt index using sequentially trained bootstrap aggregated neural networks, Chem. Eng. Technol. 29 (2006) 442–448.
[10] J. Gonzaga, L. Meleiro, C. Kiang, R. Maciel Filho, ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process, Comput. Chem. Eng. 33 (2009) 43–49.
[11] H. Lou, H. Su, L. Xie, Y. Gu, G. Rong, Inferential model for industrial polypropylene melt index prediction with embedded priori knowledge and delay estimation, Ind. Eng. Chem. Res. 51 (2012) 8510–8525.
[12] X. Liu, C. Zhao, Melt index prediction based on fuzzy neural networks and PSO algorithm with online correction strategy, AIChE J. 58 (2012) 1194–1202.
[13] Y. Li, M. Xu, Y. Wei, W. Huang, A new rolling bearing fault diagnosis method based on multiscale permutation entropy and improved support vector machine based binary tree, Measurement 77 (2016) 80–94.
[14] C. Jian, J. Gao, Y. Ao, A new sampling method for classifying imbalanced data based on support vector machine ensemble, Neurocomputing 193 (2016) 115–122.
[15] M. Zhang, X. Liu, A real-time model based on optimized least squares support vector machine for industrial polypropylene melt index prediction, J. Chemometr. 30 (2016) 324–331.
[16] Y. Sun, Y. Wang, X. Liu, C. Yang, Z. Zhang, W. Gui, X. Chen, B. Zhu, A novel Bayesian inference soft sensor for real-time statistic learning modeling for industrial polypropylene melt index prediction, J. Appl. Polym. Sci. 134 (2017) 45384.
[17] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.T. Chi, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. Math. Phys. Eng. Sci. 454 (1998) 903–995.
[18] Y. Lei, J. Lin, Z. He, M.J. Zuo, A review on empirical mode decomposition in fault diagnosis of rotating machinery, Mech. Syst. Signal Process. 35 (2013) 108–126.
[19] C. Wang, H. Zhang, W. Fan, P. Ma, A new chaotic time series hybrid prediction method of wind power based on EEMD-SE and full-parameters continued fraction, Energy 138 (2017) 977–990.
[20] X. Qiu, Y. Ren, P.N. Suganthan, G.A.J. Amaratunga, Empirical mode decomposition based ensemble deep learning for load demand time series forecasting, Appl. Soft Comput. 54 (2017) 246–255.
[21] W.Y. Duan, L.M. Huang, Y. Han, Y.H. Zhang, S. Huang, Short-term forecast of non-stationary and nonlinear ship motion using an AR-EMD-SVR model, J. Zhejiang Univ. - Sci. A 7 (2015) 562–576.
[22] W.Y. Duan, L.M. Huang, Y. Han, D.T. Huang, A hybrid EMD-AR model for nonlinear and non-stationary significant wave height forecast, J. Zhejiang Univ. - Sci. A 2 (2016) 115–129.
[23] W.Y. Duan, Y. Han, L.M. Huang, B.B. Zhao, M.H. Wang, A hybrid EMD-SVR model for the short-term prediction of significant wave height, Ocean. Eng. 124 (2016) 54–73.
[24] L.M. Huang, W.Y. Duan, Y. Han, Y.S. Chen, A review of short-term prediction techniques for ship motions in seaway, J. Ship Mech. 18 (2014) 1534–1542.
[25] W. Guo, T. Xu, Z. Lu, An integrated chaotic time series prediction model based on efficient extreme learning machine and differential evolution, Neural Comput. Appl. 27 (2016) 883–898.
[26] J. Zhang, J. Zhang, X. Wu, Y. Zhang, J. Li, A chaotic time series prediction model for speech signal encoding based on genetic programming, Appl. Soft Comput. 38 (2016) 754–761.
[27] Y. Li, X. Jiang, H. Zhu, X. He, S. Peeta, T. Zheng, Y. Li, Multiple measures-based chaotic time series for traffic flow prediction based on Bayesian theory, Nonlinear Dynam. 85 (2016) 179–194.
[28] S.H. Chai, J.S. Lim, Forecasting business cycle with chaotic time series based on neural network with weighted fuzzy membership functions, Chaos, Solit. Fractals 90 (2016) 118–126.
[29] X. An, J. Yang, Denoising of hydropower unit vibration signal based on variational mode decomposition and approximate entropy, T. I. Meas. Contr. 38 (2016).
[30] H. Huang, J. Pan, Speech pitch determination based on Hilbert-Huang transform, Signal Process. 86 (2006) 792–803.
[31] H.E. Hurst, Long-term storage capacity of reservoirs, Trans. Am. Soc. Civ. Eng. 116 (1951) 770–808.
[32] M.A.S. Granero, J.E.T. Segovia, J.G. Perez, Some comments on Hurst exponent and the long memory processes on capital markets, Phys. A Stat. Mech. Appl. 387 (2008) 5543–5551.
[33] X. Han, E. Fridman, S.K. Spurgeon, Sampled-data sliding mode observer for robust fault reconstruction: a time-delay approach, J. Franklin Inst. 351 (2014) 2125–2142.
[34] F. Takens, Dynamical systems and turbulence, Lect. Notes Math. 898 (1981) 366–381.
[35] W. Wang, L. Ying, J. Zhang, On the relation between identifiability, differential privacy, and mutual-information privacy, IEEE Trans. Inf. Theor. 62 (2016) 5018–5029.
[36] L. Cao, Practical method for determining the minimum embedding dimension of a scalar time series, Physica D 110 (1997) 43–50.
[37] L.G. de la Fraga, E. Tlelo-Cuautle, Optimizing the maximum Lyapunov exponent and phase space portraits in multi-scroll chaotic oscillators, Nonlinear Dynam. 76 (2014) 1503–1515.
[38] M.T. Rosenstein, J.J. Collins, C.J. De Luca, A practical method for calculating largest Lyapunov exponents from small data sets, Physica D 65 (1993) 117–134.
[39] V.N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[40] M.M. Adankon, M. Cheriet, A. Biem, Semisupervised least squares support vector machine, IEEE Trans. Neural Network. 20 (2009) 1858.
[41] X.Y. Wang, H.Y. Yang, Z.K. Fu, A new wavelet-based image denoising using undecimated discrete wavelet transform and least squares support vector machine, Expert Syst. Appl. 37 (2010) 7040–7049.
[42] B. Fan, X. Lu, H.X. Li, Probabilistic inference-based least squares support vector machine for modeling under noisy environment, IEEE Trans. Syst. Man, Cybern. Syst. (2016) 1–8.
[43] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process. Lett. 9 (1999) 293–300.
[44] M.E. Tipping, Sparse Bayesian learning and the relevance vector machine, JMLR.org, 2001.
[45] J. Yan, Y. Liu, S. Han, M. Qiu, Wind power grouping forecasts and its uncertainty analysis using optimized relevance vector machine, Renew. Sustain. Energy Rev. 27 (2013) 613–621.
[46] S. Kaltwang, S. Todorovic, M. Pantic, Doubly sparse relevance vector machine for continuous facial behavior estimation, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2016) 1748–1761.
[47] H. Mehrotra, R. Singh, M. Vatsa, B. Majhi, Incremental granular relevance vector machine: a case study in multimodal biometrics, Pattern Recogn. 56 (2016) 63–76.
[48] J.B. Li, X.G. Liu, Melt index prediction by RBF neural network optimized with an adaptive new ant colony optimization algorithm, J. Appl. Polym. Sci. 119 (2011) 3093–3100.
[49] Z. Zhang, T. Wang, X. Liu, Melt index prediction by aggregated RBF neural networks trained with chaotic theory, Neurocomputing 131 (2014) 368–376.
[50] M.M. Zhang, X.G. Liu, Melt index prediction by fuzzy functions and weighted least squares support vector machines, Chem. Eng. Technol. 36 (2013) 1577–1584.