SPECIAL SECTION ON BIG DATA TECHNOLOGY AND APPLICATIONS IN INTELLIGENT TRANSPORTATION

Received December 31, 2019, accepted January 9, 2020, date of publication January 14, 2020, date of current version January 24, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2966553

Time Series Data Mining: A Case Study With Big Data Analytics Approach
FANG WANG 1,3 , MENGGANG LI 2,3,4 , (Member, IEEE), YIDUO MEI 5, AND WENRUI LI 1,3
1 School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
2 National Academy of Economic Security, Beijing Jiaotong University, Beijing 100044, China
3 Beijing Laboratory of National Economic Security Early-Warning Engineering, Beijing Jiaotong University, Beijing 100044, China
4 Beijing Center for Industrial Security and Development Research, Beijing Jiaotong University, Beijing 100044, China
5 Postdoctoral Programme of China Centre for Industrial Security Research, Beijing Jiaotong University, Beijing 100044, China

Corresponding authors: Menggang Li ([email protected]) and Wenrui Li ([email protected])


This work was supported by the Program of the Co-Construction with the Beijing Municipal Commission of Education of China under
Grant B18H100040.

ABSTRACT Time series data is common in data sets and has become one of the focuses of current research. Time series prediction can be realized by mining time series data: the development process and regularities of the social and economic phenomena reflected by a time series are obtained and then extrapolated to predict its development trend. Time series prediction has received more and more attention in the era of big data, and accurately predicting the trend is its basic application. In this paper, we introduce several time series models: the autoregressive (AR) model, the moving average (MA) model, and the ARIMA model, which combines AR and MA. As an instance of time series prediction in a specific scenario, ARIMA is applied to the risk prediction of the National SME Stock Trading (New Third Board). The case study shows that the results of our analysis are basically consistent with the actual situation, which greatly helps the prediction of financial risks.

INDEX TERMS Data mining, time series, financial forecast, AR, MA, ARIMA, financial risk.

The associate editor coordinating the review of this manuscript and approving it for publication was Sabah Mohammed.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
14322 VOLUME 8, 2020

I. INTRODUCTION
Time series data mining comes from people's need to visualize data models according to their abilities. People rely on complex methods to perform these tasks. In fact, we can ignore small fluctuations to obtain a conceptual model and distinguish different time models based on the similarity between models. The main time series tasks include content-based querying, anomaly detection, pattern recognition, prediction, clustering, classification and segmentation. In many research fields of the natural and social sciences, a large number of decision-making problems cannot be separated from prediction: forecasting is the basis of decision-making [1]. Therefore, we mainly explore time series data analysis and prediction.

Time series prediction methods are divided into traditional time series prediction methods and machine learning methods. A traditional time series forecasting method predicts the future trend of a time series based only on its historical trend. It fits the historical trend curve by establishing an appropriate mathematical model and predicts the future trend from the fitted curve. Common models include ARMA [2], VAR [3], TAR [4], ARCH [5], etc. The traditional method can be applied to a variety of scenarios because it relies on relatively simple data and needs only the historical trend curve to build a model. However, it often suffers from lag: the predicted value trails the true value by several time units. To improve prediction accuracy, machine learning algorithms have been introduced into time series prediction. Machine learning methods select features that may affect the predicted value according to the specific application scenario, introduce these features into the model, and finally apply machine learning classification models for prediction. They need to extract more features from the data in multiple dimensions. The more complex the model, the more accurate the prediction. However, such models are often not universal, and features need to be re-extracted for each application scenario to build a model. In practical prediction, machine learning methods are
often combined with traditional time series prediction methods. We mainly explore the AR and MA prediction models and then their combination, ARIMA, which handles non-stationary time series well [6].

At present, there are many methods for analyzing and predicting factors related to the relationship between supply and demand in the financial market, but their effect is not obvious. We conducted a time series analysis in the financial and economic field and used the ARIMA model to predict the risks of the National SME Stock Trading (New Third Board). The final results are basically consistent with the actual results, and good prediction results have been obtained.

II. RESEARCH BACKGROUND
Time series data is encountered in every aspect of the scientific field [7]. A time series is a series of observations taken in chronological order. For example, the closing price of a stock A on each trading day from June 1, 2015 to June 1, 2016 constitutes a time series, as do the daily maximum temperature in a certain place and the environmental monitoring records of a station. With the rapid development of big data, more and more time series data are stored in computers, so a huge amount of time series data is now available. Faced with these data, people want to reveal the information they contain through effective methods or techniques. Today, the study of time series data has developed rapidly and has become an important research direction in data mining. Through the study of time series data, we can discover the inherent rules by which things change and provide a reference for the people concerned.

The basis of time series analysis is the belief that prices follow trends and that past price information is useful for predicting future prices. Analysis of the dynamic changes in stock prices is one of the most difficult challenges for human intelligence. There are many methods for predicting stock price changes, such as the Box-Jenkins method [8], the Black-Scholes model [9], and the binomial model [10]. The Box-Jenkins method is a five-step process of identifying, selecting, and evaluating conditional mean models. The ARIMA method was popularized by Box and Jenkins, and the ARIMA model is often called the Box-Jenkins model. Box and Tiao discuss the general transfer function model used by the ARIMA program. The Black-Scholes model and the binomial model are both option pricing models. The Autoregressive Moving Average (ARMA) method is one of the most popular linear models in time series prediction because of its good statistical characteristics and great flexibility. Current stock forecasting is based on market demand, and its effect is not very satisfactory owing to the lack of time series analysis. This paper processes time series data to obtain future stock conditions and produces good results.

III. TIME SERIES DATA ANALYSIS AND FORECAST
In this paper, a three-step analysis method for time series data is proposed (see Figure 1). First, the data is pre-processed, which includes making non-stationary time series stationary. Second, the pre-processed data is tested for stationarity. Finally, the prediction model is used to predict the probability distribution over the same time period in the future.

FIGURE 1. Three-step analysis method.

A. STATIONARITY DETERMINATION AND PROCESSING
A time series can be considered stationary when it has no systematic change in the mean (no trend), no systematic change in the variance, and strictly periodic changes have been eliminated. Stationarity can be further subdivided into strict stationarity and weak stationarity. If, for every time shift $t$, every positive integer $k$ and any $k$ positive integers $(t_1, t_2, \cdots, t_k)$, the joint distribution of $(r_{t_1}, r_{t_2}, \cdots, r_{t_k})$ is the same as the joint distribution of $(r_{t_1+t}, r_{t_2+t}, \cdots, r_{t_k+t})$, we call the time series $\{r_t\}$ strictly stationary: the joint distribution of $(r_{t_1}, r_{t_2}, \cdots, r_{t_k})$ remains unchanged under translation in time. Such series are strongly stationary, but the time series we use are generally weakly stationary.

A weakly stationary sequence $\{r_t\}$ must satisfy the following two conditions: the mean $E(r_t) = \mu$ is constant, and the covariance $\mathrm{Cov}(r_t, r_{t-l}) = \gamma_l$ depends only on $l$ ($l$ is any integer). For a weakly stationary time series, the mean and the covariance of $r_t$ and $r_{t-l}$ do not change with time. In financial data, a sequence called stationary usually means weakly stationary.

Differencing is usually used to achieve stationarity when the time series is not stationary. The (backward) difference is the difference between the value $r_t$ of the time series $\{r_t\}$ at time $t$ and the value $r_{t-1}$ at time $t-1$. Denoting it $d_t = r_t - r_{t-1}$ gives the first-order difference. If the same operation is performed on the new sequence $\{d_t\}$, the result is the second-order difference. Generally, a non-stationary time series can be made stationary, or approximately stationary, by differencing it $d$ times.

B. TIME SERIES PREDICTION MODEL
1) CORRELATION COEFFICIENT AND AUTOCORRELATION FUNCTION
The correlation coefficient is the cosine of the angle between the two vectors in the vector space, and the covariance is the expected value (or mean) of the product of their deviations from their individual expected values. The correlation coefficient is equal to 1 or −1 when the two vectors are parallel (1 means the same direction, −1 the opposite direction). If the two vectors are perpendicular, so that the cosine of the included angle is equal to 0, the two
vectors are uncorrelated. The smaller the angle between the two vectors, the closer the absolute value of the correlation coefficient is to 1, and the higher the correlation between the two vectors.

The linear correlation between two vectors is measured by the correlation coefficient. In a stationary time series $\{r_t\}$, the linear correlation between $r_t$ and its past value $r_{t-l}$ is measured by the autocorrelation coefficient. The correlation coefficient between $r_t$ and $r_{t-l}$ is called the lag-$l$ autocorrelation coefficient of $r_t$ and is usually denoted $\rho_l$. Specifically:

$$\rho_l = \frac{\mathrm{Cov}(r_t, r_{t-l})}{\sqrt{\mathrm{Var}(r_t)\,\mathrm{Var}(r_{t-l})}} = \frac{\mathrm{Cov}(r_t, r_{t-l})}{\mathrm{Var}(r_t)}$$

The above formula uses the property of weak stationarity that $\mathrm{Var}(r_t) = \mathrm{Var}(r_{t-l})$. For a sample $\{r_t\}_{t=1}^{T}$ of a stationary time series with sample mean $\bar{r}$, the lag-$l$ sample autocorrelation coefficient is estimated as:

$$\hat{\rho}_l = \frac{\sum_{t=l+1}^{T} (r_t - \bar{r})(r_{t-l} - \bar{r})}{\sum_{t=1}^{T} (r_t - \bar{r})^2}$$

The sequence $\hat{\rho}_1, \hat{\rho}_2, \hat{\rho}_3, \cdots$ is called the sample autocorrelation function of $r_t$. We consider the time series completely uncorrelated when all the values of the autocorrelation function are 0. Therefore, we often need to check whether multiple autocorrelation coefficients are 0.

2) AUTOREGRESSIVE (AR) MODEL
The data $r_{t-1}$ at time $t-1$ may be useful in predicting $r_t$ at time $t$ when the lag-1 autocorrelation coefficient is significant. Following this principle, we can build the model

$$r_t = \phi_0 + \phi_1 r_{t-1} + a_t,$$

where $\{a_t\}$ is a white noise sequence. This model is called a first-order autoregressive (AR) model. It generalizes to the AR($p$) model:

$$r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + \cdots + \phi_p r_{t-p} + a_t$$

We generally use the partial autocorrelation function and an information criterion to determine the order; the information criterion is usually the AIC. Stationarity of an AR($p$) model is examined as follows. We first assume that the sequence is weakly stationary, so that $E(r_t) = \mu$, $\mathrm{Var}(r_t) = \gamma_0$ and $\mathrm{Cov}(r_t, r_{t-j}) = \gamma_j$, where $\mu$ and $\gamma_0$ are constants. Because $\{a_t\}$ is a white noise sequence, $E(a_t) = 0$ and $\mathrm{Var}(a_t) = \sigma_a^2$, so

$$E(r_t) = \phi_0 + \phi_1 E(r_{t-1}) + \phi_2 E(r_{t-2}) + \cdots + \phi_p E(r_{t-p})$$

By stationarity, $E(r_t) = E(r_{t-1}) = E(r_{t-2}) = \cdots = \mu$, which gives $\mu = \phi_0 + \phi_1 \mu + \phi_2 \mu + \cdots + \phi_p \mu$ and therefore

$$E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$$

When the denominator is not 0, the equation $1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0$ is the characteristic equation of the model. The inverses of the solutions of this equation are the characteristic roots of the model. The AR($p$) sequence is stationary when the moduli of all characteristic roots are less than 1.

3) MOVING AVERAGE (MA) PREDICTION MODEL
We directly give the form of the MA($q$) model [11]:

$$r_t = c_0 + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

where $c_0$ is a constant term and $a_t$ is the perturbation, or new information, of the AR model at time $t$. The model thus expresses the current value as a linear combination of the random disturbances, or prediction errors, of the past $q$ periods. The autocorrelation function of a $q$-order MA model always cuts off after lag $q$. Therefore, an MA($q$) sequence is only linearly related to its first $q$ lagged values, so it is a "limited memory" model. This feature can be used to determine the order of the model. MA models are always weakly stationary because they are a finite linear combination of white noise sequences. Therefore, the MA model has the properties of weak stationarity: stationarity, finality and reversibility.

C. ARIMA PREDICTION MODEL
So far, we have focused on stationary sequences. If the sequence is non-stationary, we can consider the ARIMA model. ARIMA can be used in statistics and artificial intelligence [12]. ARIMA has one more letter, "I" (integrated), than ARMA, which means that it adds a differencing step to ARMA. A non-stationary sequence can be transformed into a stationary time series after $d$ differences. To find the value of $d$, we first perform a stationarity test on the sequence after the first difference. If it is still non-stationary, we continue differencing until the test indicates stationarity; the number of differences applied is the value of $d$.

1) UNIT ROOT TEST
ADF is a common unit root test method [13]. Its null hypothesis is that the sequence has a unit root, that is, the sequence is non-stationary. For a stationary time series, the test must be significant at the given confidence level so that the null hypothesis is rejected.

TABLE 1. Unit root inspection table.

According to Table 1 and Figure 2, we start from the null hypothesis that the sequence has a unit root. The null hypothesis cannot be rejected because the p-value is 0.1704489, which is much larger than the significance level. Therefore, the daily series of the Shanghai Stock Index is non-stationary. We difference the sequence, as shown in Figure 3.

Figure 3 shows that the differenced sequence is approximately stationary. The ADF test on it gives a p-value of 2.31245750144e-30, so the sequence can be considered stationary.
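As a rough illustration of the differencing procedure and unit root check described in this section, the following Python sketch differences a series until a simplified Dickey-Fuller statistic rejects a unit root. It is not the ADF implementation used in the paper: the function names (`difference`, `df_statistic`, `choose_d`), the regression with a constant and no lagged difference terms, and the asymptotic 5% critical value of −2.86 are assumptions made for this example, which uses only the Python standard library.

```python
import math
import random
from itertools import accumulate

# Asymptotic 5% critical value of the Dickey-Fuller t-statistic for a
# regression with a constant and no trend (assumed here for illustration).
DF_CRITICAL_5PCT = -2.86


def difference(series, d=1):
    """Apply the backward difference r_t - r_{t-1} to the series d times."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def df_statistic(series):
    """t-statistic of gamma in: delta_y_t = alpha + gamma * y_{t-1} + e_t."""
    series = list(series)
    x = series[:-1]            # lagged levels y_{t-1}
    z = difference(series)     # first differences delta_y_t
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    gamma = sxz / sxx
    alpha = z_bar - gamma * x_bar
    rss = sum((zi - alpha - gamma * xi) ** 2 for xi, zi in zip(x, z))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


def choose_d(series, max_d=3, critical=DF_CRITICAL_5PCT):
    """Smallest d whose d-times differenced series rejects a unit root."""
    for d in range(max_d + 1):
        if df_statistic(difference(series, d)) < critical:
            return d
    return None  # still looks non-stationary after max_d differences


# Synthetic data for illustration only (not the Shanghai index series).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(300)]
walk = list(accumulate(noise))   # random walk: has a unit root by construction

print("d chosen for white noise:", choose_d(noise))
print("d chosen for random walk:", choose_d(walk))
```

Unlike the text, the sketch also tests the undifferenced series (d = 0) before differencing. A library ADF routine, which adds lagged difference terms and reports a p-value, would normally be preferred; the sketch only shows the loop over d.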


FIGURE 2. Unit root test chart.

FIGURE 3. Sub-difference.

The p-value is very close to 0, so the null hypothesis is rejected. The value of $d$ for the original sequence can be taken as 1 because the sequence is stationary after one difference. Once $d$ is determined, an ARMA model can be built on the differenced sequence. At present, ARIMA is widely used in various fields [14]. Next, we use the ARIMA model to analyze an example in the financial field.

IV. NEW THIRD BOARD RISK FORECAST
Unit root analysis of the stock price rise within a certain period of time can determine the stationarity of the rise series. When the sequence is stationary, the probability distribution of different rises and falls over the same period in the future can be inferred from the historical distribution of the rise, so that interested parties can prepare plans for extreme situations that seriously affect the net worth of funds. In recent years, the OTC New Third Board has developed rapidly in China. Careful observation shows that the New Third Board market has two characteristics. First, the overall market price volatility is significantly higher than that of the Shanghai and Shenzhen markets. Second, the volatility distribution is severely right-skewed. Because there is no daily price limit system, the fluctuation risk of individual stocks is often released quickly and violently. In the following, we focus on practical problems in the Chinese NEEQ stock market. Time series analysis is used to estimate the probability distribution of future rises and falls based on the differenced time series of daily rises and falls of stock prices.

A. ANALYTICAL METHOD
We used the three-step analysis method proposed above. In the first stage, a non-stationary sequence is transformed into a stationary time series by differencing. In the second stage, we use the ADF unit root test to check whether the time series is stationary. In the third stage, the probability distribution of different rises and falls within the same time period in the future is inferred from the historical distribution of rises and falls, in order to prepare for extreme situations that may seriously affect the level of NAV.

STEP 1: Data preparation (data preprocessing). A time series is defined as a series of quantitative observations at consecutive times. In the analysis of financial time series, the price time series itself is generally non-stationary, not completely randomly distributed, and clearly autocorrelated. At the same time, the law of price distribution may change abruptly due to a variety of factors, so a law established in the past may no longer hold in the future. Therefore, analyzing the price time series directly in an attempt to find a law or regression formula is generally invalid. If the sequence is non-stationary, we pre-process it before applying the ARMA model. Generally, unstable time series are handled by taking the first-order difference of the time series [15]. Two methods can be used.

The first is to take the difference between adjacent values to build a first-order difference sequence $y_t$:

$$y_t = x_t - x_{t-1}$$

The second is to take the ratio of adjacent values to build a new sequence $y_t$:

$$y_t = \frac{x_t}{x_{t-1}}$$

A non-stationary sequence can be transformed into a stationary sequence after $d$ differences. The specific value of $d$ depends on the result of the stationarity test after differencing: if the series is still non-stationary, we continue differencing until the test indicates stationarity after $d$ differences.

The relative change of the stock price (the relative increase) is of more concern than the absolute value of stock price changes, so the ratio method is generally used in the analysis of financial product price time series [16]. At the same time, after the stock price continues to rise or fall, the stock price difference will continue to increase or decrease accordingly. Therefore, it is proposed to use the natural logarithm of the ratio of adjacent values in the stock price time series for first-order difference processing, which constructs a new series $y_t$:

$$y_t = \ln\left(\frac{x_t}{x_{t-1}}\right)$$

A prominent advantage of this method is that the first-order difference sequence $y_t$ obtained from this method is
approximately equal to the stock price increase, which can be used directly for the probability prediction of the future stock price distribution. In this paper, we use the ratio method to deal with time series.

STEP 2: Stationarity check. We apply the unit root test to the logarithmic rise series. Our goal is to investigate the stationarity of the residuals to determine whether the ARMA model is a good model for them. The null hypothesis of the unit root test is that the sequence has a unit root, that is, it is non-stationary. Rejecting the null hypothesis therefore means that the series (or the differenced sequence in this example) is stationary. Specifically, we use the ADF (Augmented Dickey-Fuller) test to check whether the time series is stationary.

STEP 3: Prediction of the probability distribution of risk. With the probability distribution of the next day's price, we can prepare for extreme situations that severely affect asset benefit levels. The basic idea of the ARMA model is to combine the AR and MA models so that the number of parameters used is kept small. The form of the model is:

$$r_t = \phi_0 + \sum_{i=1}^{p} \phi_i r_{t-i} + a_t - \sum_{i=1}^{q} \theta_i a_{t-i}$$

Here $\{a_t\}$ is a white noise sequence, and $p$ and $q$ are both non-negative integers. Using the backward shift operator $B$, which moves a sequence back to the previous moment, the above model can be written as:

$$(1 - \phi_1 B - \cdots - \phi_p B^p)\, r_t = \phi_0 + (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t$$

The expectation of $r_t$ is then:

$$E(r_t) = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p}$$

The inverses of the solutions of the equation $1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0$ are called the characteristic roots of the model. The ARMA model is stationary if the moduli of all characteristic roots are less than 1. To control the amount of calculation, we limit the maximum order of AR to less than 6 and the maximum order of MA to at most 4. We then established the ARMA model with order (3,3), as selected by the AIC criterion.

We take Jindalai of the New Third Board as an example and predict the probability distribution of future fluctuations through the analysis of the closing price increase. Since Jindalai changed to market-making transfer on November 25, 2014, the historical data selected are the daily closing prices for 421 transfer days from January 5, 2015 to September 24, 2016. A logarithmic gain sequence is obtained after differencing the closing price sequence. According to our analysis framework (see Figure 1), we next perform a unit root test on the above two time series using the ADF (Augmented Dickey-Fuller) test in EViews 8; the results are as follows:

TABLE 2. Unit root test results.

Table 2 shows that the existence of a unit root in the closing price time series cannot be rejected even at the 10% significance level, so the closing price time series is basically non-stationary. For the logarithmic rise and fall time series, the unit root hypothesis can be rejected at the 1% significance level, so the time series of logarithmic increases and decreases is basically stationary. Therefore, we can use time series analysis to predict the future probability distribution based on the stock's past gain information. The following is a forecast of the distribution of the gain on the next transfer day based on historical gain information:

TABLE 3. Daily increase probability distribution table.

As can be seen from Table 3, the daily probability distribution of closing prices is characterized by a sharp peak and long tails. When applying it to intraday T+0 trading, we should pay attention to setting reasonable stop-loss prices to avoid large losses caused by small-probability events, as shown in Figure 4.

FIGURE 4. Daily increase probability distribution and cumulative probability distribution.

To obtain the probability distribution of the increase over a different time period, we cannot directly use the cumulative distribution of a shorter period. For example, a weekly or monthly increase distribution cannot be obtained by superimposing daily increase distributions. The correct approach is to take the logarithmic first-order difference of the weekly closing price sequence directly, obtain the time series of the logarithmic rise of the weekly closing price, and process that sequence. For example, statistics on the weekly logarithmic rise time series of Jindalai are shown in Figure 5.

Other probability levels can be obtained by interpolating in the distribution probability table or by taking intersection points on the probability distribution curve.
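The log-ratio differencing of STEP 1 and the reading of probability levels from the empirical distribution can be sketched in Python as below. This is only an illustration under stated assumptions: the closing prices are made-up numbers, not the Jindalai data, and the helper names (`log_returns`, `empirical_quantile`, `weekly_closes`), the linear interpolation rule and the five-day trading week are choices made for this example rather than details given in the paper.

```python
import math


def log_returns(prices):
    """y_t = ln(x_t / x_{t-1}) for consecutive closing prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def empirical_quantile(values, p):
    """Value below which a fraction p of the sample lies (linear interpolation)."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def weekly_closes(daily_closes, days_per_week=5):
    """Keep the last close of each full block of days_per_week trading days."""
    return daily_closes[days_per_week - 1::days_per_week]


# Hypothetical closing prices, for illustration only.
closes = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5, 10.9, 10.8, 11.0, 11.2]

daily = log_returns(closes)
low = empirical_quantile(daily, 0.05)    # threshold for an "excessive decline"
high = empirical_quantile(daily, 0.95)   # threshold for an "excessive increase"
print(f"daily 5%-95% band: {low:.4f} to {high:.4f}")

# Weekly distribution: difference the weekly closes directly, as the text advises.
weekly = log_returns(weekly_closes(closes))
print(f"weekly log returns: {weekly}")
```

With a real price history, the two quantiles play the role of the intersection points of the p = 5% and p = 95% lines with the cumulative probability curve.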


F. Wang et al.: Time Series Data Mining: Case Study With Big Data Analytics Approach

the historical rise and fall probability distribution curve. The


probability of falling distribution makes a good prediction for
extreme situations that may seriously affect the level of net
assets in real life so that we can infer what will happen in the
future through time series data sequences.
As the research continues, more technologies will be con-
sidered to expand the scope of prediction and improve the
accuracy [17]–[19]. We expect the time-series predictions
will have more applications in financial [20], [21]. By accu-
rate predictions, we can improve response measures to dis-
FIGURE 5. Weekly probability distribution curve and weekly cumulative cover possible emergencies; We can improve the prediction
probability curve.
by adding the spatial dimension under the combination of
time and space. For example, the better resource utilization
Assuming that the probability of occurrence is less than 5% as for users and taxis can be generated by predicting the number
the trading condition, by making two horizontal lines p = 5% of taxi rides in a certain area of Didi taxi [22]–[24]; This also
and p = 95% on the ordinate to intersect the distribution curve can accurate financial predictions, such as, it can help man-
at two points(Representing the probability distribution points agers Reasonably specify strategy by predicting the amount
of excessive decline and excessive increase, respectively), of money bought and sold, etc. The data mining of time series
taking two points on the abscissa can get an excessive drop data whose guidance and help [25] to actual production and
The corresponding abscissa is −4.5%, and the corresponding life will become more and more important.
abscissa is 4. 35%. In other words, there is a 84% probability
that the stock’s individual stocks will rise or fall between REFERENCES
−4.5% and 4.35%. Therefore, It can be bought and then sold [1] J. F. Li and Q. Zong, ‘‘The forecasting of the elevator traffic flow time
intraday T+0 trading when the stock falls by more than 4.5% series based on ARIMA and GP,’’ Adv. Mater. Res., vols. 588–589,
on the same day; It can be sold and then bought intraday pp. 1466–1471, Nov. 2012.
[2] M. Rout, B. Majhi, R. Majhi, and G. Panda, ‘‘Forecasting of currency
T+0 trading when a stock rises more than 4.35% on the same
exchange rates using an adaptive ARMA model with differential evolution
day. based training,’’ J. King Saud Univ.-Comput. Inf. Sci., vol. 26, no. 1,
pp. 7–18, Jan. 2014.
[3] N. Groenewold, L. Guoping, and C. Anping, ‘‘Regional output spillovers
B. CONCLUSION AND SUGGESTION
in China: Estimates from a VAR model,’’ Papers Regional Sci., vol. 86,
The time series analysis proposes a method of estimating no. 1, pp. 101–122, Mar. 2007.
the distribution probability of future stock price fluctuations [4] C. Altavilla and P. De Grauwe, ‘‘Non-linearities in the relation between
the exchange rate and its fundamentals,’’ Int. J. Fin. Econ., vol. 15, no. 1,
based on historical price information, thereby we can avoid p. n/a, 2008.
the risk of huge losses caused by intraday trading. The method [5] B. Baldauf and G. J. Santoni, ‘‘Stock price volatility: Some evidence
of directly judging the future price trend based on the histori- from an ARCH model,’’ J. Futures Markets, vol. 11, no. 2, pp. 191–200,
Apr. 1991.
cal price often lacks sufficient reliability because price series
[6] A. A. Ariyo, A. O. Adewumi, and C. K. Ayo, ‘‘Stock price prediction
are often unstable. We first perform a first-order difference using the ARIMA model,’’ in Proc. UKSim-AMSS 16th Int. Conf. Comput.
processing on the stock price series, which is equivalent to Modelling Simulation, Mar. 2014.
changing the research object from the stock price itself to the [7] C. C. Aggarwal, Mining Time Series Data. Springer, 2015.
[8] Y. Lu and S. AbouRizk, ‘‘Automated boxjenkins forecasting modelling,’’
change value of the stock price. Then check the stationarity Autom. Construction, vol. 18, no. 5, pp. 547–558.
and determine the stationarity of the increase series by unit [9] J.-P. Bouchaud and D. Sornette, ‘‘The black-scholes option pricing prob-
root analysis of the logarithmic increase of the stock price lem in mathematical finance: Generalization and extensions for a large
class of stochastic processes,’’ Sci. Finance Work. Paper Arch., vol. 4, no. 6,
within a certain period of time. Finally, we apply ARMA pp. 863–881, Jun. 1994.
(3,3) to the stable differential sequence to obtain the historical [10] S. Muzzioli and C. Torricelli, ‘‘A multiperiod binomial model for pric-
distribution rule. We use this rule to obtain the probability ing options in a vague world,’’ J. Econ. Dyn. Control, vol. 28, no. 5,
pp. 861–887, Feb. 2004.
distribution of different rises and falls in the same period [11] J. Che and J. Wang, ‘‘Short-term electricity prices forecasting based on
of time in the future, so that we can prepare for extreme support vector regression and Auto-regressive integrated moving average
situations that can seriously affect the level of net asset value. modeling,’’ Energy Convers. Manage., vol. 51, no. 10, pp. 1911–1917,
Oct. 2010.
V. SUMMARY AND OUTLOOK
With the continuous progress of time series data mining technology, its application has been extended to financial analysis, where it can predict future risks in the financial field well. We analyzed time series data and its various prediction models, then applied the ARIMA prediction model to financial analysis. The paper proposed a time series analysis method to predict the future rise through

[12] J.-J. Wang, J.-Z. Wang, Z.-G. Zhang, and S.-P. Guo, “Stock index forecasting based on a hybrid model,” Omega, vol. 40, no. 6, pp. 758–766, Dec. 2012.
[13] Q. A. Nie and X. Zhang, “The analysis of united test of statistics in ADF unit root test,” Stat. Res., 2007.
[14] J. Contreras, R. Espinola, F. Nogales, and A. Conejo, “ARIMA models to predict next-day electricity prices,” IEEE Trans. Power Syst., vol. 18, no. 3, pp. 1014–1020, Aug. 2003.
[15] F. Islam, M. Shahbaz, A. U. Ahmed, and M. M. Alam, “Financial development and energy consumption nexus in Malaysia: A multivariate time series analysis,” Econ. Model., vol. 30, pp. 435–441, Jan. 2013.


[16] F. E. Tay and L. Cao, “Modified support vector machines in financial time series forecasting,” Neurocomputing, vol. 48, nos. 1–4, pp. 847–861, Oct. 2002.
[17] D. Zhang, “High-speed train control system big data analysis based on fuzzy RDF model and uncertain reasoning,” Int. J. Comput., Commun. Control, vol. 12, no. 4, p. 577, Jun. 2017.
[18] W. Xu, L. Liu, Q. Zhang, and P. Liu, “Location decision-making of equipment manufacturing enterprise under dual-channel purchase and sale mode,” Complexity, vol. 2018, Dec. 2018, Art. no. 3797131.
[19] D. Zhang, J. Sui, and Y. Gong, “Large scale software test data generation based on collective constraint and weighted combination method,” Tehnicki Vjesnik-Tech. Gazette, vol. 24, no. 4, pp. 1041–1049, Jul. 2017.
[20] W. Xu and Y. Yin, “Functional objectives decision-making of discrete manufacturing system based on integrated ant colony optimization and particle swarm optimization approach,” Adv. Prod. Eng. Manage., vol. 13, no. 4, pp. 389–404, Dec. 2018.
[21] W. Bao, J. Yue, and Y. Rao, “A deep learning framework for financial time series using stacked autoencoders and long-short term memory,” PLoS ONE, vol. 12, no. 7, Jul. 2017, Art. no. e0180944.
[22] J. Y. Chen, “Thrown under the bus and outrunning it! The logic of DiDi and taxi drivers’ labour and activism in the on-demand economy,” New Media Soc., vol. 20, no. 8, pp. 2691–2711, Aug. 2018.
[23] L. Zhang, J. Lu, J. Zhou, J. Zhu, Y. Li, and Q. Wan, “Complexities’ day-to-day dynamic evolution analysis and prediction for a Didi taxi trip network based on complex network theory,” Mod. Phys. Lett. B, vol. 32, no. 9, Mar. 2018, Art. no. 1850062.
[24] Y. Lu and X. Xiong, “Topic analysis of microblog about ‘didi taxi’ based on K-means algorithm,” Amer. J. Inf. Sci. Technol., vol. 3, no. 3, p. 72, 2019.
[25] J.-C. Kim and K. Chung, “Mining based time-series sleeping pattern analysis for life big-data,” Wireless Pers. Commun., vol. 105, no. 2, pp. 475–489, Mar. 2019.

FANG WANG received the Ph.D. degree in statistics from Beijing Jiaotong University, Beijing, China. She is currently a Postdoctoral Researcher with Beijing Jiaotong University. Her main research involves game theory, machine learning, industrial economics, and industrial security.

MENGGANG LI (Member, IEEE) received the Ph.D. degree in applied economics from Beijing Jiaotong University, Beijing, China. He is currently the Dean of the National Academy of Economic Security, Beijing Jiaotong University, the Director of the Beijing Laboratory of National Economic Security Early-Warning Engineering and the Beijing Philosophy and Social Science Beijing Industrial Security and Development Research Base, and the Chairman of the IEEE Professional Committee in Logistics, Informatics, and Industrial Security System. His current research concerns national economic security, industrial economics, and industrial security.

YIDUO MEI received the B.E. and Ph.D. degrees in computer science and technology from Xi’an Jiaotong University, in 2004 and 2011, respectively. He is currently a Postdoctoral Researcher with the National Academy of Economic Security, Beijing Jiaotong University. His main research interests include blockchain, AI, cloud computing, grid computing, trust management, and big data.

WENRUI LI received the Ph.D. degree in applied economics from Beijing Jiaotong University, Beijing, China. He is currently a Lecturer of industrial economics with Beijing Jiaotong University. His current research involves industrial economics and industrial security.
