Survey of Time Series Data Processing in Industrial Internet
Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS)
Abstract—This paper focuses on the processing requirements of time series data in the industrial and IoT fields. Time series data processing has been studied in industry for a long time, and a mature solution based on the real-time/historian database has been formed. However, with the new demands of the Industrial Internet, old architectures are unable to fully support the requirements (i.e., large data volumes and real-time analysis of industrial data). Meanwhile, a new architecture for processing real-time data in the mobile Internet has started to mature, forming a solution called the time series database, which provides many new advantages. When we try to replace an old technology with a new one, many aspects should be considered. This paper focuses on these challenges, starting with the demands of industry, and analyzes how to solve traditional problems with new technologies. This paper also analyzes the development trend of time series data processing and puts forward some general requirements for the application of new technology in the field of the Industrial Internet, which lays a theoretical foundation for the application and development of the basic technologies of the Industrial Internet.

Keywords—Time Series Data; Time Series Database; Real-time/historian Database; Industrial Internet

I. INTRODUCTION

The rapid development of the Internet has led to a trend of technological innovation. Many new technologies have been applied in industry to solve practical needs. In industrial fields, concepts like "Industrial Internet" and "Intelligent Manufacturing" have begun to emerge, and people are actively exploring the application of new technologies. But are all new technologies suitable for wide adoption? A technology that works well in one industry may not be suitable for others, especially in the industrial field, which has a more complex environment. Industrial software systems have extremely high requirements, especially for real-time performance, stability and security. Industrial systems were relatively closed for many years, so they have become mature and independent[1-3].

In industrial fields, more than 80 percent of the monitoring data are real-time data, and all of them are time series data with timestamps[4]. These data from sensors or monitoring systems are collected in real time and used for quick feedback on system status. In traditional industry, real-time/historian databases are often used as the core solution to collect, store, query and analyze these data.

However, outside the field of industry, with the emergence of new concepts such as the mobile Internet, Internet of things, Internet of vehicles, smart grid and so on, more requirements for real-time data processing are being put forward. Another solution with a new architecture has gradually formed. It is called the time series database, and it is designed to meet the need of monitoring and analyzing massive real-time data from the Internet. This new database is somewhat similar to the real-time/historian database. Compared with the traditional industrial solution, the Internet solution has better scalability. In addition, its natural integration with the big data ecosystem will undoubtedly challenge the original technical framework[5-10].

This paper focuses on what new challenges will occur when processing real-time data in industrial systems in the age of the Industrial Internet, and on the differences between the mature technologies and the emerging technologies. It also examines how to follow the technology trend to meet the requirements of the Industrial Internet.

II. REAL-TIME DATA PROCESSING IN TRADITIONAL INDUSTRY

In the traditional industrial control field, there are many real-time data processing requirements, especially in the process industry. The monitoring requirements in production are stringent. Real-time monitoring data reflect the status of the system, so the processing of real-time data is very important. After a long period of accumulation, a unique and mature solution has been formed. The real-time/historian database is an important part of it and has been used for many years. In the field of industrial control, the real-time/historian database is mainly used to collect, store, query and analyze industrial process data, and to realize real-time monitoring of process status[11].

Data in industry have these characteristics: 1) most industrial data have timestamps and are generated in sequence; 2) most industrial data are structured data; 3) the frequency of data collection is high, and the amount is large; 4) the characteristics of a time period are more important than those of a single time point.

The requirements for software in industry are always very rigorous, so the real-time/historian database has been refined to be practical, precise, stable, closed, and high-performance. Take a medium-sized industrial enterprise as an example. Its process monitoring may involve 50,000 to 100,000 measuring points, and the amount of data produced per day can reach hundreds of GB. Data in industrial enterprises are required to be stored for a long time, so that historical trends can be queried at any time. These simple requirements demonstrate some of the capabilities that traditional real-time/historian databases need to have, such as:
Project supported by National Key R&D Program of
China (2016YFB1000601)
737
Authorized licensed use limited to: Shenzhen Institute of Advanced Technology CAS. Downloaded on October 22,2023 at 12:55:16 UTC from IEEE Xplore. Restrictions apply.
has been formed. This database solution and the traditional real-time/historian database are like twins born in different times.

After entering the era of the Internet, with the innovation of communications technology and the decline of communication costs, another trend, the Internet of everything, has started. Not only do computer systems need to collect data; the mobile phones, smart devices, shared bikes and cars that people use every day are constantly sending real-time data to the cloud. These data are analyzed with big data technologies to monitor and forecast the business, help enterprises reduce costs, and serve the public[15-16].

These data share some of the same characteristics as most of the real-time data in industrial fields:
1) Each individual record is small, but the total amount of data is very large;
2) They are all time-stamped, and they are generated in sequence;
3) Most of the data are structured and are used to describe the characteristics of a parameter at a certain time point;
4) The writing frequency is much higher than the query frequency;
5) There are very few requirements to update data;
6) Users are more interested in the characteristics of data over a time period than at a single time point;
7) Most of the queries are based on a certain time period or a certain numerical range;
8) The data need calculation and visualization.

Data from smart meters, environmental monitoring equipment, and industrial production lines also have these characteristics.

However, due to the difference in application scenarios, industrial solutions may differ from Internet solutions to a certain extent, as can be seen in Table II.

TABLE II. DIFFERENCES BETWEEN TWO DATABASE SOLUTIONS
Items | Real-time/historian database | Time series database
Growth environment | Industrial enterprises, mostly enterprise or group-level applications | Internet enterprises, mainly based on cloud platforms
Deployment | Active-Standby mode | Distributed
Functional requirements | Read and write data, aggregate query, data compression | Read and write data, aggregate query, data compression
Performance requirements | Extreme processing speed with single machine | High throughput and performance scaling with cluster
Charge mode | One-time License fee, high unit price | Pay as much as use
Software ecology | Integrated toolkit | Combine with other independent services
Main advantages | Single machine performance and compatibility with industrial systems | Cloud platform and architecture advantages
Development tendency | Distributed and cloud platform | Gradually infiltrate into industrial field

The time series database with the new architecture is, to a certain extent, the same as the traditional real-time/historian database when processing these Internet data. They realize the same functional requirements in different areas. When the new Internet technology permeates into industry, it will show certain strengths and weaknesses. The development of Industrial Internet technology requires the mutual penetration and integration of technologies from both sides, so that they can absorb each other's advantages and compensate for each other's disadvantages.

Technology Trends

With the development of the Industrial Internet, demands are becoming more and more clear. When these two database technologies meet, we can observe some trends of technology development. We summarize them in the following 6 points:

1) Transition to distributed architecture: Traditional real-time databases mostly use Active-Standby architectures, usually requiring expensive machines with higher hardware configuration to achieve extreme performance on a single machine. At the same time, they require extreme stability of the running software, whose quality must ensure error-free running for many years. They also require an ultra-high data compression ratio because of limited storage. But with the development of distributed technology, the system can be easily expanded, so that the database is no longer dependent on expensive hardware and storage devices. It can achieve high availability with the natural advantages of clusters and avoid single points of failure. It can be run on a normal x86 server or even on a virtual machine. Distributed architecture will greatly reduce the cost of use[17].

2) Diversified data structure: In industry, the traditional real-time/historian database often uses a single-value model. A parameter under monitoring is called a measuring point, and a model is built for each measuring point when writing data to the database. For example, an index like the temperature of a wind turbine counts as one measuring point, so ten indexes on each of ten wind turbines make 100 measuring points. Each measuring point has some information (such as name, precision, data type, switching/analog value and so on). The writing efficiency of the single-value model is very high. Fig. 1 shows the structure of a single-value model.

Fig. 1. Structure of a single-value model

Time series databases in the Internet field usually use a multi-value model, which is similar to the object-oriented model. For example, we create a model called wind turbine. Its parameters include temperature and pressure, as well as latitude and longitude, ID and other tag information. This makes it more appropriate for analysis when providing services. Technically, the single-value model and the multi-value model can be converted to each other. Many databases provide services with a multi-value model, but the underlying storage is still a single-value model. Fig. 2 shows the structure of a multi-value model.

Fig. 2. Structure of a multi-value model
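The conversion between the single-value and multi-value models described above can be sketched in Python. This is only an illustration under stated assumptions, not the storage layout of any particular database: the class names `SinglePoint` and `MultiRecord`, the `"<device_id>.<field>"` naming scheme for measuring points, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SinglePoint:
    """Single-value model: one record per measuring point per timestamp."""
    point: str       # assumed naming scheme: "<device_id>.<field>"
    timestamp: int   # seconds since epoch
    value: float


@dataclass
class MultiRecord:
    """Multi-value model: one record per device per timestamp."""
    device_id: str
    timestamp: int
    tags: dict = field(default_factory=dict)    # e.g. latitude, longitude
    fields: dict = field(default_factory=dict)  # e.g. temperature, pressure


def to_single(record):
    """Flatten one multi-value record into one point per field."""
    return [SinglePoint(f"{record.device_id}.{name}", record.timestamp, value)
            for name, value in sorted(record.fields.items())]


def to_multi(points, tags_by_device=None):
    """Regroup single-value points that share a device and timestamp."""
    tags_by_device = tags_by_device or {}
    grouped = defaultdict(dict)
    for p in points:
        device_id, name = p.point.rsplit(".", 1)
        grouped[(device_id, p.timestamp)][name] = p.value
    return [MultiRecord(device_id, ts,
                        dict(tags_by_device.get(device_id, {})), fields)
            for (device_id, ts), fields in sorted(grouped.items())]


# Hypothetical wind turbine sample; the numbers are made up for illustration.
turbine = MultiRecord("turbine-01", 1700000000,
                      tags={"latitude": 39.9, "longitude": 116.4},
                      fields={"temperature": 61.5, "pressure": 2.1})
points = to_single(turbine)
restored = to_multi(points, {"turbine-01": turbine.tags})
```

In this sketch the tag information is kept separately from the single-value points, since tags describe the device rather than an individual measurement; a real system may store this metadata differently.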
3) SQL support: Most time series databases choose NoSQL storage, which has better scalability[18]. Compared with the relational database, the data model of NoSQL is more flexible and easier to extend, which makes it very suitable for the multi-value model. It is easy to scale out the cluster when resources are limited or when performance needs to be improved. The query efficiency is high, and the cost of open source software is quite low. Most time series databases use various types of NoSQL models, and Table III shows some examples:

for large-scale analysis are transmitted to centralized storage. This hierarchical processing can effectively enhance the value of time-sensitive data, while reducing the burden on storage systems. Therefore, many time series databases have edge computing versions, combined with stream computing capabilities to make the functionality more diverse[21-23].

Fig. 3 shows the Industrial Internet real-time processing solution with edge computing.
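The hierarchical processing with edge computing mentioned in this part can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: the function name `edge_process`, the threshold alarm as the time-sensitive task, and the per-window summary as the data forwarded to centralized storage are all hypothetical choices.

```python
from statistics import mean


def edge_process(samples, window_seconds, alarm_threshold):
    """Split work between an edge node and centralized storage.

    samples: iterable of (timestamp_seconds, value) pairs.
    Returns (alarms, summaries):
      alarms    - raw samples above the threshold, handled at the edge
      summaries - one (window_start, min, max, mean, count) tuple per
                  time window, the only data forwarded to central storage
    """
    alarms = []
    windows = {}
    for ts, value in samples:
        # Time-sensitive check done locally, without a round trip.
        if value > alarm_threshold:
            alarms.append((ts, value))
        # Bucket the sample into its fixed-size time window.
        window_start = ts - ts % window_seconds
        windows.setdefault(window_start, []).append(value)
    summaries = [(start, min(vals), max(vals), mean(vals), len(vals))
                 for start, vals in sorted(windows.items())]
    return alarms, summaries


# Hypothetical temperature readings as (second, value) pairs.
readings = [(0, 60.0), (1, 61.0), (2, 95.0), (60, 62.0), (61, 64.0)]
alarms, summaries = edge_process(readings, window_seconds=60,
                                 alarm_threshold=90.0)
```

Here five raw samples are reduced to two summary rows for central storage, while the single out-of-range sample is still flagged immediately at the edge. Whether an actual deployment forwards summaries or raw data depends on the analysis it needs.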
features and most mature functions. 95% of its operations can be done through a visual interface, whereas other products need IDEs to set up the operations. Company E is also the only one that has a label management function, while the others require checking the data structure to get the metadata.

Overall, the 7 companies each have their own emphasis in the design of product architecture and functions. Although their functions differ in maturity and ease of use, all products cover most of the technical requirements.

TABLE IV. FUNCTIONAL SATISFACTION OF DIFFERENT DATABASES
Requirements | A | B | C | D | E | F | G
Data type | ● | ● | ● | ● | ● | ● | ●
Data accuracy | ● | ● | ● | ● | ● | ● | ●
Write time-series data | ● | ● | ● | ● | ● | ● | ●
Dynamically add time series | ● | ● | ● | ● | ● | ● | ●
Query time series | ● | ● | ● | ● | ● | ● | ●
Query tags | ● | ● | ● | ● | ● | ● | ●
Interpolation queries | ● | ● | ● | ● | ● | ● | ●
Query value* | ● | ● | ● | ● | ● | ● | ○
Query latest data | ● | ● | ● | ● | ● | ● | ●
Aggregation queries | ● | ● | ● | ● | ● | ● | ●
User-defined functions* | ● | ○ | ○ | ● | ● | ○ | ○
Geographic position queries* | ● | ○ | ● | ● | ○ | ○ | ○
Data lifetime management | ● | ● | ● | ● | ● | ● | ●
Compatibility with mainstream hardware and operating systems | ● | ● | ● | ● | ● | ● | ●
Deploy with containers* | ○ | ○ | ● | ● | ○ | ○ | ○
Connection with big data ecosystem* | ● | ○ | ● | ● | ● | ○ | ●
Easy deployment | ● | ● | ● | ● | ● | ● | ●
Configuration management | ● | ● | ● | ● | ● | ● | ●
Real-time monitoring | ● | ● | ● | ● | ● | ● | ●
User management | ● | ● | ● | ● | ● | ● | ●
Online update* | ● | ○ | ● | ○ | ● | ○ | ●
Meta data management | ● | ● | ● | ● | ● | ● | ●
Import and export data | ● | ● | ● | ● | ● | ● | ●
Hardware fault tolerance | ● | ● | ● | ● | ● | ● | ●
Operating system fault tolerance | ● | ● | ● | ● | ● | ● | ●
Database service fault tolerance | ● | ● | ● | ● | ● | ● | ●
Overload protection | ● | ● | ● | ● | ● | ● | ●
Multiple replicas | ● | ● | ● | ● | ● | ● | ●
Online scale up or scale out | ● | ● | ● | ● | ● | ● | ●
Online scale down* | ● | ○ | ● | ○ | ● | ○ | ●
Authentication | ● | ● | ● | ● | ● | ● | ●
Operation audit | ● | ● | ● | ● | ● | ● | ●
Encryption communication | ● | ● | ● | ● | ● | ○ | ●

VI. CONCLUSION

In this paper, we analyze the challenges of processing time series data in industry from several aspects, and describe the traditional processing methods in the industrial field and the new processing architecture that followed the development of the Internet. In order to solve the problems emerging in industry, the old method needs to take advantage of the new one, and the two need to complement each other, which is also the essence of the Industrial Internet. The following part of the paper describes the technology trend in industry. Through research on the demand and supply sides, we list some technical requirements for processing industrial real-time data in the new era. This work is of great significance for the further development of time series databases and real-time processing in industry. Our team will continue to study the performance requirements of time series databases in order to evaluate products that are suitable for industrial scenarios.

ACKNOWLEDGEMENT

This paper's relevant project is supported by National Key R&D Program of China (2016YFB1000601). In addition, special thanks to Jeff TAO, Jie-ying HU, Miao HUANG, Yu ZHONG, Le-qiang AI.

REFERENCES
[1] Industrial internet platform white paper. Alliance of Industrial Internet, 2019.
[2] Technical architecture of industrial big data white paper. Alliance of Industrial Internet, 2018.
[3] Industrial big data technology and application white paper. Alliance of Industrial Internet, 2017.
[4] Survey of industrial enterprise data asset management status. Alliance of Industrial Internet, 2018.
[5] S. K. Jensen, T. B. Pedersen and C. Thomsen, "Time Series Management Systems: A Survey," in IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2581-2600, 1 Nov. 2017.
[6] Garima and S. Rani, "Review on time series databases and recent research trends in Time Series Mining," 2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence), Noida, 2014, pp. 109-115.
[7] L. Xu, et al. "Telecom big data based user offloading self-optimisation in heterogeneous relay cellular systems," International Journal of Distributed Systems and Technologies, 8(2), pp. 27-46, April 2017.
[8] L. Xu, et al. "Data mining and evaluation for Base Station deployment," in Proc. ICSINC, 13-15 Sept 2017, Chongqing China, pp. 356-364.
[9] D. Han and E. Stroulia, "A three-dimensional data model in HBase for large time-series dataset analysis," 2012 IEEE 6th International Workshop on the Maintenance and Evolution of Service-Oriented and Cloud-Based Systems (MESOCA), Trento, 2012, pp. 47-56.
[10] D. Ramesh, A. Sinha and S. Singh, "Data modelling for discrete time series data using Cassandra and MongoDB," 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), Dhanbad, 2016, pp. 598-601.
[11] Mi Yeong Hwang, Cheng Hao Jin, Yang Koo Lee, Kwang Deuk Kim, Jung Hoon Shin and Keun Ho Ryu, "Prediction of wind power generation and power ramp rate with time series analysis," 2011 3rd International Conference on Awareness Science and Technology (iCAST), Dalian, 2011, pp. 512-515.
[12] D. Ramesh and A. Kumar, "Query Driven implementation of Twitter base using Cassandra," 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), Coimbatore, 2018, pp. 1-4.
[13] H. Talei, M. Essaaidi and D. Benhaddou, "An End to End Real Time Architecture for Analyzing and Clustering Time Series Data: Case of an Energy Management System," 2018 6th International Renewable and Sustainable Energy Conference (IRSEC), Rabat, Morocco, 2018, pp. 1-7.
[14] M. Tahmassebpour, "A New Method for Time-Series Big Data Effective Storage," in IEEE Access, vol. 5, pp. 10694-10699, 2017.
[15] Lei Yan, Xin Ai, Yao Wang and Huiying Zhang, "Impacts of electric vehicles on power grid considering time series of TOU," 2014 IEEE Conference and Expo Transportation Electrification Asia-Pacific (ITEC Asia-Pacific), Beijing, 2014, pp. 1-5.
[16] S. Aljawarneh, V. Radhakrishna, P. V. Kumar and V. Janaki, "A similarity measure for temporal pattern discovery in time series data
generated by IoT," 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, 2016, pp. 1-4.
[17] T. Pelkonen, S. Franklin, J. Teller, et al., "Gorilla: A Fast, Scalable, In-Memory Time Series Database," Proceedings of the VLDB Endowment, 2015, 8(12):1816-1827.
[18] A. K. Kalakanti, V. Sudhakaran, V. Raveendran and N. Menon, "A comprehensive evaluation of NoSQL datastores in the context of historians and sensor data analysis," 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, 2015, pp. 1797-1806.
[19] H. S. Raju and S. Shenoy, "Real-time remote monitoring and operation of industrial devices using IoT and cloud," 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), Noida, 2016, pp. 324-329.
[20] E. Elazab, T. Awad, H. Elgamal and B. Elsouhily, "A cloud based condition monitoring system for industrial machinery with application to power plants," 2017 Nineteenth International Middle East Power Systems Conference (MEPCON), Cairo, 2017, pp. 1400-1405.
[21] E. Oyekanlu and K. Scoles, "Real-Time Distributed Computing at Network Edges for Large Scale Industrial IoT Networks," 2018 IEEE World Congress on Services (SERVICES), San Francisco, CA, 2018, pp. 63-64.
[22] P. Porambage, J. Okwuibe, M. Liyanage, M. Ylianttila and T. Taleb, "Survey on Multi-Access Edge Computing for Internet of Things Realization," in IEEE Communications Surveys & Tutorials, vol. 20, no. 4, pp. 2961-2991, Fourthquarter 2018.
[23] E. Oyekanlu, "Predictive edge computing for time series of industrial IoT and large scale critical infrastructure based on open-source software analytic of big data," 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, 2017, pp. 1663-1669.