SPECIAL SECTION ON BIG DATA TECHNOLOGY AND APPLICATIONS IN INTELLIGENT TRANSPORTATION

Received December 31, 2019, accepted January 9, 2020, date of publication January 14, 2020, date of current version January 24, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2966553

Time Series Data Mining: A Case Study With Big Data Analytics Approach
FANG WANG 1,3, MENGGANG LI 2,3,4 (Member, IEEE), YIDUO MEI 5, AND WENRUI LI 1,3
1 School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
2 National Academy of Economic Security, Beijing Jiaotong University, Beijing 100044, China
3 Beijing Laboratory of National Economic Security Early-Warning Engineering, Beijing Jiaotong University, Beijing 100044, China
4 Beijing Center for Industrial Security and Development Research, Beijing Jiaotong University, Beijing 100044, China
5 Postdoctoral Programme of China Centre for Industrial Security Research, Beijing Jiaotong University, Beijing 100044, China

Corresponding authors: Menggang Li ([email protected]) and Wenrui Li ([email protected])

This work was supported by the Program of the Co-Construction with the Beijing Municipal Commission of Education of China under
Grant B18H100040.

ABSTRACT Time series data is common in data sets and has become one of the focuses of current
research. Mining time series data makes prediction possible: it reveals the development process and
regularity of the social and economic phenomena that a time series reflects, and it allows the
development trend to be extrapolated. Time series prediction has received more and more attention
in the era of big data, and accurately predicting the trend is its basic application. In this paper,
we introduce the time series autoregressive (AR) model, the moving average (MA) model, and the ARIMA
model, which combines AR and MA. As an instance of time series prediction in general scenarios,
ARIMA is applied to the risk prediction of the National SME Stock Trading (New Third Board) in
combination with specific scenarios. The case studies show that the results of our analysis are
basically consistent with the actual situation, which has greatly helped the prediction of
financial risks.

INDEX TERMS Data mining, time series, financial forecast, AR, MA, ARIMA, financial risk.

The associate editor coordinating the review of this manuscript and approving it for publication
was Sabah Mohammed.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
http://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020

I. INTRODUCTION
Time series data mining comes from people's need to visualize data models according to their
abilities. People rely on complex methods to perform these tasks. In fact, we can ignore small
fluctuations to obtain a conceptual model and distinguish different time models based on the
similarity between models. The main time series tasks include content-based querying, anomaly
checking, pattern recognition, prediction, clustering, classification and segmentation.
A large number of decision-making problems in the natural sciences and social sciences cannot be
separated from prediction; forecasting is the basis of decision-making [1]. Therefore, we mainly
explore time series data analysis and prediction.
Time series prediction methods are divided into traditional time series prediction methods and
machine learning methods. The traditional time series forecasting method predicts the future trend
of a time series based only on its historical trend. This method fits the historical trend curve by
establishing an appropriate mathematical model and predicts the future trend of the time series
according to the fitted model curve. Common models include ARMA [2], VAR [3], TAR [4], ARCH [5],
etc. The traditional time series method can be applied to a variety of scenarios because it relies
on relatively simple data and needs only the historical trend curve to build a model. However, it
often suffers from lag: the predicted value trails the true value by several time units. To improve
prediction accuracy, machine learning algorithms have been introduced into time series prediction.
Machine learning methods select features that may affect the predicted value according to the
specific application scenario, introduce these features into the model, and finally apply machine
learning classification models for prediction. Machine learning methods need to extract more
features from data in multiple dimensions. The more complex the model, the more accurate the
prediction. However, models are often not universal, and features need to be re-extracted for each
application scenario to build models. In practical prediction, machine learning methods are
often combined with traditional time series prediction methods. We mainly explore the AR and MA
prediction models, and then their combination, ARIMA, which handles non-stationary time series
well [6].
At present, there are many methods for analyzing and predicting factors related to the relationship
between supply and demand in the financial market, but their effect is not obvious. We conducted a
time series analysis of the financial and economic fields and used the ARIMA model to predict the
risks of the National SME Stock Trading (New Third Board). The final results are basically
consistent with the actual results, and good prediction results have been obtained.

II. RESEARCH BACKGROUND
Time series data is encountered in every aspect of the scientific field [7]. A time series is a
series of observations taken in chronological order. For example, the closing price of a stock A on
each trading day from June 1, 2015 to June 1, 2016 constitutes a time series, as do the daily
maximum temperature in a certain place and a station's environmental detection records. With the
rapid development of big data, more and more time series data are stored in computers, so a huge
amount of time series data is now available. Faced with these data, people want to reveal the
information in them through effective methods or techniques. Today, the study of time series data
has developed rapidly and has become an important research direction in data mining. Through the
study of time series data, we can discover the inherent rules by which things change and provide a
reference for relevant people.
The basis of time series analysis is the belief that prices follow trends and that past price
information is useful for predicting future prices. Analysis of the dynamic changes in stock prices
is one of the most difficult challenges for human intelligence. There are many methods for
predicting stock price changes, such as the Box-Jenkins method [8], the Black-Scholes model [9],
and the binomial model [10]. The Box-Jenkins method is a five-step process of identifying,
selecting, and evaluating conditional mean models. The ARIMA method was popularized by Box and
Jenkins, and the ARIMA model is often called the Box-Jenkins model. Box and Tiao discuss the
general transfer function model used by the ARIMA program. The Black-Scholes model is XXX. The
binomial model is YYY. The Autoregressive Moving Average (ARMA) method is one of the most popular
linear models in time series prediction because of its good statistical characteristics and great
flexibility. Current stock forecasting is based on market demand, and the effect of this method is
not very satisfactory due to the lack of time series. This paper uses the processing of time series
data to obtain future stock conditions and produces good results.

III. TIME SERIES DATA ANALYSIS AND FORECAST
In this paper, a three-step analysis method for time series data analysis is proposed (see
Figure 1). First, the data is pre-processed, which includes making non-stationary time series
stationary. Second, the pre-processed data is tested for stationarity. Finally, the prediction
model is used to predict the probability distribution over the same time period in the future.

FIGURE 1. Three-step analysis method.

A. STATIONARITY DETERMINATION AND PROCESSING
A time series can be considered stationary when it has no systematic change in the mean (no trend),
no systematic change in the variance, and its periodic changes have been strictly eliminated.
Stationarity can be further subdivided into strict stationarity and weak stationarity. If, for all
times $t$, any positive integer $k$ and any $k$ positive integers $(t_1, t_2, \cdots, t_k)$, the
joint distribution of $(r_{t_1}, r_{t_2}, \cdots, r_{t_k})$ is the same as the joint distribution of
$(r_{t_1+t}, r_{t_2+t}, \cdots, r_{t_k+t})$, we call the time series $\{r_t\}$ strictly stationary;
that is, the joint distribution of $(r_{t_1}, r_{t_2}, \cdots, r_{t_k})$ remains unchanged under
translation in time. Such series are strongly stationary time series, but the time series we use
are generally weakly stationary.
A weakly stationary sequence $\{r_t\}$ must satisfy the following two conditions: the mean
$E(r_t) = \mu$ ($\mu$ is constant), and the covariance $\mathrm{Cov}(r_t, r_{t-l}) = \gamma_l$,
where $\gamma_l$ depends only on $l$ ($l$ is any integer). For a weakly stationary time series, the
mean and the covariance of $r_t$ and $r_{t-l}$ do not change with time. In financial data, a
sequence called stationary is usually weakly stationary.
Differencing is usually used to achieve stationarity when the time series is not stationary. The
difference is taken between the value $r_t$ of the time series $\{r_t\}$ at time $t$ and the value
$r_{t-1}$ at time $t-1$. Denoting it $d_t = r_t - r_{t-1}$, this is a first-order difference. If
the same operation is performed on the new sequence $\{d_t\}$, the result is a second-order
difference. Generally, a non-stationary time series can be made stationary, or approximately
stationary, by differencing $d$ times.

B. TIME SERIES PREDICTION MODEL
1) CORRELATION COEFFICIENT AND AUTOCORRELATION FUNCTION
The correlation coefficient is in effect the cosine of the angle between two vectors in the vector
space, and the covariance is the expected value (or mean) of the product of their deviations from
their individual expected values. The correlation coefficient is equal to 1 or −1 when the two
vectors are parallel (in particular, 1 means the same direction and −1 means the reverse). If the
two vectors are perpendicular and the cosine of the included angle is equal to 0, it means that the two
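As a rough illustration of the three-step method of Section III, the following Python sketch differences a toy series, applies a crude stationarity check, and makes a one-step forecast. It is a minimal sketch under stated assumptions, not the procedure used in this paper: the helper names (`difference`, `looks_stationary`, `fit_ar1`, `forecast_next`), the half-split mean comparison with its `tol` threshold, the AR(1) model, and the toy data are all hypothetical, and the sketch returns a point forecast rather than a probability distribution.

```python
import statistics


def difference(series):
    """First-order difference: d_t = r_t - r_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def looks_stationary(series, tol=0.5):
    """Crude check (an assumption, not a formal test): the means of the two
    halves differ by less than `tol` standard deviations of the series."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    spread = statistics.pstdev(series) or 1.0
    gap = abs(statistics.fmean(first) - statistics.fmean(second))
    return gap / spread < tol


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast_next(series):
    """Difference, check stationarity, then predict one step ahead."""
    diffs = difference(series)                # step 1: pre-processing
    if not looks_stationary(diffs):           # step 2: stationarity check
        raise ValueError("differenced series still looks non-stationary")
    c, phi = fit_ar1(diffs)                   # step 3: prediction model
    return series[-1] + c + phi * diffs[-1]   # undo the differencing


# Toy data (hypothetical): a linear trend plus an alternating term.
prices = [100 + 0.5 * t + (-1) ** t for t in range(20)]
prediction = forecast_next(prices)
```

On this toy series the raw values fail the crude check because of the trend, while the differenced values pass it, so the forecast is built on the differences and then added back to the last observed value.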
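The two weak-stationarity conditions (constant mean, and a covariance that depends only on the lag $l$) can be probed on sample data by comparing summary statistics across time windows. The Python sketch below is a minimal illustration, assuming a simple sample estimator of $\gamma_l$; the function names `autocovariance` and `window_summary` and the toy series are hypothetical, and agreement between two finite windows is only suggestive, not a formal stationarity test.

```python
import statistics


def autocovariance(series, lag):
    """Sample estimate of gamma_l = Cov(r_t, r_{t-l}) for lag l >= 0."""
    n = len(series)
    mean = statistics.fmean(series)
    products = (
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return sum(products) / n


def window_summary(series, start, length, lag=1):
    """Mean and lag-l autocovariance of one window of the series."""
    window = series[start:start + length]
    return statistics.fmean(window), autocovariance(window, lag)


# Toy data (hypothetical): an alternating series with and without a trend.
flat = [float((-1) ** t) for t in range(40)]
trended = [0.5 * t + x for t, x in enumerate(flat)]

flat_early = window_summary(flat, 0, 20)
flat_late = window_summary(flat, 20, 20)
trend_early = window_summary(trended, 0, 20)
trend_late = window_summary(trended, 20, 20)
```

For the flat series the two windows give the same mean and the same lag-1 autocovariance, as the conditions require. For the trended series the window mean shifts with time, so the first condition already fails.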
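The effect of repeated differencing can be seen on a small example. The following Python sketch applies the first-order difference $d_t = r_t - r_{t-1}$ a chosen number of times; the function name `difference`, its `d` parameter, and the quadratic toy series are assumptions made for illustration only.

```python
def difference(series, d=1):
    """Apply the first-order difference d_t = r_t - r_{t-1} d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Toy data (hypothetical): a quadratic trend, r_t = t^2.
quadratic = [t * t for t in range(6)]    # [0, 1, 4, 9, 16, 25]
first = difference(quadratic, d=1)       # [1, 3, 5, 7, 9]
second = difference(quadratic, d=2)      # [2, 2, 2, 2]
```

The first-order difference still carries a linear trend, while the second-order difference is constant, which is why a higher order of differencing may be needed before a series looks stationary. Each pass shortens the series by one observation.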