Network Traffic Prediction Based On LSTM and Transfer Learning
ABSTRACT The increasing amount of traffic in recent years has led to increasingly complex network
problems. To improve overall network performance and increase network utilization, it is valuable to take
measures that capture future trends in network traffic. In traditional machine learning, to guarantee
the accuracy and high reliability of the models obtained through training, there are two basic assumptions:
(1) the training samples used for learning and the new test samples are independent and identically
distributed; and (2) there must be enough training samples to learn a good model. However, time-series
data are not easily accessible in real life, and even after putting in a lot of time and effort to collect
them, the data may be unavailable due to confidentiality. In this paper, a neural network model based on long
short-term memory (LSTM) and transfer learning is proposed to address the problem of small sample
size in network traffic prediction. Knowledge in the source domain is transferred to the target domain using
transfer learning, and a prediction model with good performance is constructed with a small amount of target
domain data. The results show that the performance of the transfer learning model improves by more than
40% over the directly trained model when using the same samples for predicting 10,000 rows of data, resulting
in better performance on the network traffic prediction task.
segmented autoregressive moving average (ARMA) model [4], the autoregressive integrated moving average (ARIMA) model [5], the fractional autoregressive integrated moving average (FARIMA) model [6], etc. In addition, some scholars apply nonlinear theory to network traffic prediction and propose prediction models based on support vector machines (SVM) [7], gray models (GM) [8], Gaussian processes (GP) [9], and neural networks (NN) [10]. For example, the gray model is based on support vector machine compensation, the Gaussian process hybrid prediction model is based on the Gaussian distribution, and the traffic prediction model is based on a long short-term memory (LSTM) neural network.

Although the prediction results of the above models are satisfactory, there are still shortcomings. With the increase in network complexity, the distribution characteristics of network traffic have gone beyond the traditional Poisson or Markov distributions, so it is difficult to ensure the accuracy of linear model predictions. Increasingly mature machine learning-based traffic prediction methods have therefore received great attention, and many traffic prediction models based on vector machines and artificial neural networks have emerged that greatly improve the prediction of complex traffic. For traditional machine learning models based on vector machines and artificial neural networks, to guarantee the accuracy and high reliability of the models obtained by training, there are two basic assumptions: (1) the training samples used for learning and the new test samples are independent and identically distributed; (2) there must be enough training samples to learn a good model. However, in practical applications, these two conditions are often not satisfied. Many fields eager to use machine learning do not have enough data to train a model. In this context, transfer learning was born. Transfer learning is a term used in machine learning to refer to the effect of one type of learning on another type of learning, or the effect of an acquired experience on the completion of other activities. It can transfer existing knowledge to solve the problem of having only a small amount of labeled data in the target domain [11].

In this paper, we propose a network traffic prediction method based on an LSTM neural network and transfer learning. The method uses the idea of transfer learning to save the knowledge acquired during the execution of the source task in the source domain. When the knowledge in the target domain is insufficient to complete the target task, the saved knowledge is applied to complete it. The specific implementation is to transfer the parameters of a network traffic prediction model trained with sufficient source domain data to a network traffic prediction model that lacks sufficient target domain training data, train it further with the smaller amount of target domain data, and finally obtain a network traffic prediction model with more accurate predictions.

The network prediction task involved in this paper is a single-indicator time series prediction task, i.e., given the historical changes of a certain indicator, predict its changes in the future period. The acquisition of traffic sequence data requires a lot of time and effort, and even then, the acquired traffic sequence data may not be usable because it contains private information. In traditional network traffic prediction methods, neural network models can use network layers to extract features from sufficient data and show good performance in prediction tasks. However, when there is insufficient data, a neural network model is unable to create attributes that are not present in the data. If a neural network model obtains training data that is not representative, it models the unique attributes of these training data as general attributes, which is often referred to as the overfitting problem. Overfitting results in a neural network model that predicts the training data accurately but has a higher error rate on other data and poor generalization performance.

The method proposed in this paper uses the idea of transfer learning to transfer the parameters of a network traffic prediction model, which has been trained in other domains, to the original LSTM model. The constructed LSTM model is then trained using the pre-processed target domain data. The method proposed in this paper results in more accurate network traffic prediction and better generalization of the network model when using the same amount of data.

Considering the previous studies, the key contributions of this work can be summarized as follows.
1. Building a network traffic prediction architecture based on LSTM and transfer learning.
2. By adding transfer learning, a neural network model can be trained using a small amount of data. Our method is able to produce more accurate predictions than the method without transfer learning.
3. Transfer learning has been applied to solve classification problems, usually in combination with CNN neural networks. The combination of transfer learning and LSTM proposed in this paper extends the application area of transfer learning and, at the same time, provides a new method to solve the prediction problem.

The paper is organized as follows. Section II briefly summarizes LSTM and transfer learning and why transfer learning should be used in network traffic prediction. Section III describes the network traffic prediction architecture based on LSTM and transfer learning used in this paper. Section IV presents performance results from specific test scenarios, and conclusions are presented in Section V.

II. RELATED WORK
As mentioned in the previous section, the use of linear and nonlinear models for network traffic prediction has been extensively studied in the literature, mainly by constructing fine-grained neural network models and then training them using sufficient amounts of data. In contrast to the above literature, we will use a small amount of data to construct well-performing network traffic prediction models, addressing the problem of data not being easily available in the network domain. In this section, we present the background of LSTM and transfer learning.
FIGURE 3. Network traffic prediction architecture based on LSTM and transfer learning.
values to replace the outliers, obtain a stable data set, and better construct the data model.

The logic of the percentile algorithm is to sort the factor values in ascending or descending order and to process the factor values whose ranking percentile is higher or lower than a set percentage, similar to the practice of ‘‘removing the highest scores and the lowest scores’’ in some competitions. The set percentages need to be analyzed on a case-by-case basis. Because of this uncertainty in the percentages, this paper uses the median absolute deviation algorithm for the outliers instead.

The median absolute deviation (MAD) algorithm determines whether each element is an outlier by checking whether its deviation from the median is within a reasonable range (a short code sketch is given below).
1. Calculate the median of all elements: $X_{median}$.
2. Calculate the absolute deviation of each element from the median; a single element is denoted as $X_i$: $bias_i = |X_i - X_{median}|$.
3. Obtain the median of the absolute deviations: $MAD = bias_{median}$.
4. Determine the parameter $n$; then all the data can be adjusted as (1).

$$X_i' = \begin{cases} X_{median} + n\,MAD, & X_i > X_{median} + n\,MAD \\ X_{median} - n\,MAD, & X_i < X_{median} - n\,MAD \\ X_i, & X_{median} - n\,MAD < X_i < X_{median} + n\,MAD \end{cases} \tag{1}$$

2) COMPLEMENTARY MISSING VALUES
There are many reasons for missing values. Broadly speaking: information is temporarily unavailable; data are not recorded, omitted, or lost due to human factors, which is the main reason for missing data; data are lost due to the failure of data collection equipment, storage media, or transmission media; the cost of acquiring the information is too high; or the real-time requirements of the system are strict, that is, judgments or decisions must be made quickly before such information can be obtained. The presence of missing values causes the system to lose a large amount of useful information, making the certainty exhibited by the system weaker and the uncertainty component more prominent. Data containing null values also throw the data analysis process into disorder and lead to unreliable outputs.

To avoid the problems caused by missing values, the missing values are often removed to obtain a complete data set. Alternatively, other approaches are used for completion, such as the Mean/Mode Completer and k-means clustering. The Mean/Mode Completer method divides the attributes in the initial dataset into numerical and non-numerical attributes, which are processed separately. If the null value is numeric, the missing attribute is filled with the average of that attribute's values over all other objects; if the null value is non-numeric, the missing attribute is filled with the value that occurs most frequently among all other objects, based on the statistical principle of plurality.

A similar method is the Conditional Mean Completer. In this method, the value used for averaging is not taken from all the objects in the data set, but only from those that have the same decision attribute value as the object in question. The basic starting point of these two averaging methods is the same, namely to fill the missing attribute values with their most probable values; they differ only slightly in the specific procedure. Compared with the other methods, this uses most of the information in the existing data to infer the missing values. The dataset used in this paper is a network traffic dataset, so we use a more
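To make the MAD adjustment in (1) concrete, here is a minimal Python sketch of our own (the helper name mad_clip and the default n = 3 are illustrative assumptions, not values given in the paper):

import numpy as np

def mad_clip(x, n=3):
    x = np.asarray(x, dtype=float)
    x_median = np.median(x)          # step 1: median of all elements
    bias = np.abs(x - x_median)      # step 2: absolute deviations from the median
    mad = np.median(bias)            # step 3: median absolute deviation
    # step 4: clip every element into [median - n*MAD, median + n*MAD], as in (1)
    return np.clip(x, x_median - n * mad, x_median + n * mad)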
3) DATA SCALING
Data scaling, in statistics, means that the original data are transformed by a certain mathematical transformation to put the data into a small specific interval, such as 0 to 1 or −1 to 1. The purpose is to eliminate the differences in characteristics, order of magnitude, and other characteristic attributes between different samples and to transform them into dimensionless relative values, so that the resulting values of each characteristic quantity are of the same order of magnitude. There are many methods of data scaling, such as Min-Max normalization. Min-Max normalization, also known as the extreme difference method, is the simplest way to deal with the magnitude problem: it scales the values of a column in the data set to between 0 and 1. It is calculated as (2). A single element is denoted as $X$, the minimum value in the dataset is denoted as $X_{min}$, and the maximum value in the dataset is denoted as $X_{max}$.

$$X' = \frac{X - X_{min}}{X_{max} - X_{min}}. \tag{2}$$

This is a linear transformation of the original data. The Min-Max normalization method preserves the interrelationships in the original data, but if new input data exceed the range of the original data after normalization, i.e., they are not in the original interval $[X_{min}, X_{max}]$, an out-of-bounds error will be generated. Therefore, this method is suitable for cases where the range of values of the original data has been determined.

Mean normalization is similar to Min-Max normalization, with the difference that the minimum value in the numerator is replaced by the mean value $u$. It can be calculated using (3).

$$X' = \frac{X - u}{X_{max} - X_{min}}. \tag{3}$$

This method scales the data to the interval [−1, 1] with an average value of 0. In this paper, the data are scaled to between [0, 1] using the extreme difference method.

4) RAW TIME SERIES TO CONSTRUCT SUPERVISED DATA
Supervised learning is a problem with an input variable (X) and an output variable (Y), where an algorithm is used to learn the mapping function y = f(x) from x to y. The goal of the algorithm is to approximate the true mapping relationship well enough that, when new input data (X) are available, the output variable (Y) for those data can be predicted. A supervised learning problem is obtained by shifting the time series forward by one time step (a short code sketch is given after the dataset description below).

5) DATASET
There are two datasets used for the experiments in this paper: the ‘‘int’’ traffic dataset and the ‘‘isp’’ traffic dataset. The ‘‘int’’ traffic dataset was collected from 09:30 on November 19, 2004, to 11:11 on January 27, 2005. As shown in Fig. 4, data were collected every five minutes. The ‘‘isp’’ traffic dataset was from a private ISP with centers in 11 European cities. These data correspond to a transatlantic line and were collected from 06:57 on June 7, 2005, to 11:17 on July 31, 2005. Data were collected every five minutes, as shown in Fig. 5.

FIGURE 5. Dataset of core network traffic in a European city.
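To make subsections 3) and 4) concrete, the following minimal Python sketch (our own illustration, not code from the paper; the synthetic stand-in series is an assumption, while the window length of 10 matches the experimental setup) applies extreme difference (Min-Max) scaling as in (2) and then builds supervised (X, Y) pairs by shifting the series forward one time step:

import numpy as np

def min_max_scale(x):
    # Equation (2): scale every value into [0, 1].
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def to_supervised(series, step=10):
    # Each sample uses `step` consecutive values as the input X and the
    # value one time step later as the target Y.
    X, Y = [], []
    for i in range(len(series) - step):
        X.append(series[i:i + step])
        Y.append(series[i + step])
    return np.array(X), np.array(Y)

# Stand-in for a 5-minute traffic series such as ''int'' or ''isp''.
raw = np.sin(np.linspace(0.0, 60.0, 500)) + np.linspace(1.0, 2.0, 500)
X, Y = to_supervised(min_max_scale(raw), step=10)
print(X.shape, Y.shape)  # (490, 10) (490,)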
FIGURE 6. Network traffic prediction model based on LSTM and transfer learning.
B. MODEL BUILDING
In this paper, we use the LSTM network to construct a network traffic prediction model. The input of the neural network based on transfer learning is the network traffic of the backbone network at the previous time, and the output result is the network traffic at the latter time. After the corresponding training is completed, the network traffic of the core network at the previous time is used as the input, and the output result is the traffic of the core network at the later time; the LSTM network traffic prediction model based on transfer learning is obtained after this training is completed. The neural network model based on transfer learning is shown in Fig. 6. The input of the directly trained neural network is the network traffic of the core network at the previous time, and the output result is the network traffic at the later time; the directly trained LSTM network traffic prediction model is obtained after the training is completed.

The forward propagation algorithm of LSTM is shown in Fig. 6. Update the output of the forgetting gate: the forgetting gate controls whether to forget the hidden cell state of the previous layer. Its inputs are the hidden state $h_{t-1}$ of the previous moment and the input data $x_t$ of the current moment; defining the weight $W_f$, the bias $b_f$, and the weight $U_f$, and applying a selected activation function, generally sigmoid, the output $f_t$ of the forgetting gate can be obtained. Since the sigmoid function has an output between [0, 1], the output $f_t$ here represents the probability of forgetting the state of the hidden cell in the previous layer.

Update the two outputs of the input gate: the input gate consists of two parts. The first part defines the weight $W_i$, the bias $b_i$, and the weight $U_i$, and then uses the sigmoid activation function; its output is $i_t$. The second part defines the weight $W_C$, the bias $b_C$, and the weight $U_C$, and uses the tanh activation function; its output is $\tilde{C}_t$. The two outputs are multiplied together to update the cell state.

Update the cell state: the cell state $C_t$ consists of two parts. The first part is the product of $C_{t-1}$ and the output $f_t$ of the forgetting gate, and the second part is the product of $i_t$ and $\tilde{C}_t$ from the input gate.

Update the output gate output: the update of the hidden state $h_t$ consists of two parts. The first part is $o_t$, which is obtained from the hidden state $h_{t-1}$ of the previous moment and the input data $x_t$ of the current moment, defining the weight $W_o$, the bias $b_o$, the weight $U_o$, and the sigmoid activation function; the second part consists of the cell state $C_t$ and the tanh activation function.

The last step is to update the predicted output of the current moment: define the weights $V$ and bias $c$, and then apply an activation function, generally the sigmoid function, to get the predicted output of the current moment. (These gate updates are collected into a compact set of equations below.)

The backpropagation algorithm of LSTM is shown in Fig. 6. It defines $L$ as the loss function and updates the parameters via the chain rule of differentiation until the stopping condition is satisfied. Although the structure of LSTM is quite complex, we can use it effectively with some API support.

The network traffic prediction model based on LSTM and transfer learning constructed in this paper uses the mean squared error (MSE) as the loss function. In mathematical statistics, the mean squared error refers to the expected value of the square of the difference between the estimated value of a parameter and its true value. MSE can evaluate the degree of variation in the data: the smaller the value of MSE, the better the accuracy of the prediction model in describing the experimental data. Moreover, as the error decreases, the gradient also decreases, which is beneficial to convergence; even with a fixed learning rate, the model can converge to the minimum faster. It can be calculated by (4). The actual value is represented by $y_i$ in the equation, the predicted value is represented by $\hat{y}_i$, and the amount of data in the data set is defined using $m$. The model uses Adam as the optimizer and sets the learning rate to 0.02. The main advantage of Adam is that, after bias correction, the learning rate of each iteration stays within a certain range, which keeps the parameter updates relatively stable.

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2. \tag{4}$$

TABLE 1. Prediction accuracy of transfer learning model and direct training model (1000 rows).
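For reference, the gate updates described above can be collected into the standard LSTM forward equations, written here in the text's notation (this compact summary is ours, not an equation block from the paper; $\sigma$ denotes the sigmoid function and $\odot$ element-wise multiplication):

$$\begin{aligned}
f_t &= \sigma(W_f h_{t-1} + U_f x_t + b_f) \\
i_t &= \sigma(W_i h_{t-1} + U_i x_t + b_i), \qquad \tilde{C}_t = \tanh(W_C h_{t-1} + U_C x_t + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o h_{t-1} + U_o x_t + b_o), \qquad h_t = o_t \odot \tanh(C_t) \\
\hat{y}_t &= \sigma(V h_t + c)
\end{aligned}$$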
C. PARAMETER TRANSFER
1) MODEL SAVING
In the use of transfer learning, the data in the source domain is used to train the model to obtain a good model; but in actual applications, it is not feasible to train it first and then use it, as this would increase the time consumption. Therefore, the previously trained model can be saved and then loaded when it is needed. One way is to save the whole model and then load it directly, but this consumes more memory; the other way is to save only the parameters of the model. In that case, all we have to do is save the parameter dictionary, create a new model with the same structure when we need it, and import the saved parameters into the new model.

2) MODEL LOADING
A neural network is an operational model that consists of a large number of nodes and their mutual connections. Each node represents a specific output function, called the activation function. Each connection between two nodes carries a weighted value for the signal passing through the connection, called the weight, which is equivalent to the memory of an artificial neural network. Depending on the model saving method, there are some differences in the model loading method. Saving the complete model means that both the model structure and the model parameters are saved; when loading the model, you can choose to load all of it or to transfer only the model parameters to a model with the same structure as the original. When only the model parameters are saved, you can only reconstruct a model with the same structure as the original and then transfer the parameters; you cannot load the model directly. The method proposed in this paper saves the complete model and loads all of it when the model is loaded.

A. EXPERIMENTAL SETUP
We performed ablation experiments to ensure that the parameters used were suitable before building the transfer learning model and the directly trained model.

1) TRANSFER LEARNING MODEL
1. An LSTM neural network model is constructed using PyTorch, with the step size set to 10, i.e., the first 10 rows of the dataset are used for prediction; the batch size set to 10, i.e., the number of samples processed in each batch is 10; and the input size set to 1. The model optimizer is Adam, and the learning rate is set to 0.02.
2. Take 10,000 rows of samples from the ‘‘int’’ dataset in the academic backbone network domain and train on them. Set the number of epochs to 50.
3. Transfer the model to the core network domain and train it again using 1000 rows of samples from the ‘‘isp’’ dataset. Set the number of epochs to 50.
4. Test the prediction accuracy of the model with 10,000 samples from the ‘‘isp’’ dataset.

2) DIRECT TRAINING MODEL
1. An LSTM neural network model is constructed using PyTorch, with the step size set to 10, i.e., the first 10 rows of the dataset are used for prediction; the batch size set to 10, i.e., the number of samples processed in each batch is 10; and the input size set to 1. The model optimizer is Adam, and the learning rate is set to 0.02.
2. Train with 1000 rows of samples from the ‘‘isp’’ dataset. Set the number of epochs to 50.
3. Test the prediction accuracy of the model with 10,000 rows of samples from the ‘‘isp’’ dataset.
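The following PyTorch sketch outlines this save, load, and fine-tune procedure. It is a minimal illustration of our own, not the authors' released code: the hidden size of 32, the file name, and the synthetic stand-in tensors are assumptions, while the step size of 10, batch size of 10, input size of 1, Adam with learning rate 0.02, and 50 epochs follow the setup above.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TrafficLSTM(nn.Module):
    # input_size=1: one traffic value per time step; the last hidden
    # state is mapped to a single predicted value.
    def __init__(self, input_size=1, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, step=10, 1)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])     # predict the next value

def fit(model, loader, epochs=50, lr=0.02):
    # MSE loss (4) and the Adam optimizer, as described in Section III.
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

def make_loader(n_rows):
    # Stand-in windows; in practice these come from the pre-processed
    # ''int'' (source) and ''isp'' (target) datasets.
    X, Y = torch.rand(n_rows, 10, 1), torch.rand(n_rows, 1)
    return DataLoader(TensorDataset(X, Y), batch_size=10)

# Steps 1-2: train on 10,000 rows of source-domain data, then save the
# complete model (structure and parameters together).
model = TrafficLSTM()
fit(model, make_loader(10000))
torch.save(model, 'lstm_int.pt')

# Step 3: reload the complete model and fine-tune on 1000 rows of
# target-domain data. (Recent PyTorch versions may require
# torch.load('lstm_int.pt', weights_only=False) for full-model files.)
model = torch.load('lstm_int.pt')
fit(model, make_loader(1000))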
FIGURE 7. Loss functions of the transfer learning model and the direct training model for different data volumes.
TABLE 2. Prediction accuracy of reverse transfer (1000 rows).

TABLE 3. Prediction accuracy of transfer learning model and direct training model (100 rows).

TABLE 4. Prediction accuracy of transfer learning model and direct training model (10 rows).

the amount of data in the data set is defined using $m$.

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}. \tag{5}$$
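A direct Python rendering of (5) for evaluating prediction accuracy (our own small helper; y_true and y_pred stand for $y_i$ and $\hat{y}_i$):

import numpy as np

def rmse(y_true, y_pred):
    # Equation (5): root of the mean squared prediction error over m samples.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))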
prediction model with good performance in a network traffic prediction scenario. From the forward transfer experiment and the reverse transfer experiment, we can see that knowledge acquired from the source domain can be applied in the target domain, while knowledge acquired from the target domain can also be applied in the source domain, so the source and target domains are similar. The results of the comparison experiments show that the transfer learning model has better starting and ending points than the direct training model in the training process with the same amount of data. Compared with the direct training model without transfer learning, the performance of the transfer learning model improves by more than 40% on the target task after training with the source domain data, which leads to the performance improvement of the network traffic prediction task.

REFERENCES
[1] Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2017–2022, Cisco, San Jose, CA, USA, 2019.
[2] X. Zhang, Y. Wang, M. Yang, and G. Geng, ‘‘Toward concurrent video multicast orchestration for caching-assisted mobile networks,’’ IEEE Trans. Veh. Technol., vol. 70, no. 12, pp. 13205–13220, Dec. 2021, doi: 10.1109/TVT.2021.3119429.
[3] X. Zhang, Y. Wang, G. Geng, and J. Yu, ‘‘Delay-optimized multicast tree packing in software-defined networks,’’ IEEE Trans. Services Comput., early access, Aug. 20, 2021, doi: 10.1109/TSC.2021.3106264.
[4] N. Sadek and A. Khotanzad, ‘‘Multi-scale high-speed network traffic prediction using k-factor Gegenbauer ARMA model,’’ in Proc. IEEE Int. Conf. Commun., Jun. 2004, pp. 2148–2152.
[5] H. Z. Moayedi and M. A. Masnadi-Shirazi, ‘‘ARIMA model for network traffic prediction and anomaly detection,’’ in Proc. Int. Symp. Inf. Technol., vol. 4, Aug. 2008, pp. 1–6.
[6] C. G. Dethe and D. G. Wakde, ‘‘On the prediction of packet process in network traffic using FARIMA time-series model,’’ J. Indian Inst. Sci., vol. 84, nos. 1–2, p. 31, 2013.
[7] W. Chen, Z. Shang, and Y. Chen, ‘‘A novel hybrid network traffic prediction approach based on support vector machines,’’ J. Comput. Netw. Commun., vol. 2019, pp. 1–10, Feb. 2019.
[8] X. Xiao, H. Duan, and J. Wen, ‘‘A novel car-following inertia gray model and its application in forecasting short-term traffic flow,’’ Appl. Math. Model., vol. 87, pp. 546–570, Nov. 2020.
[9] Y. Xu, F. Yin, W. Xu, J. Lin, and S. Cui, ‘‘Wireless traffic prediction with scalable Gaussian process: Framework, algorithms, and verification,’’ IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1291–1306, Jun. 2019.
[10] B. G. Çetiner, M. Sari, and O. Borat, ‘‘A neural network based traffic-flow prediction model,’’ Math. Comput. Appl., vol. 15, no. 2, pp. 269–278, 2010.
[11] A. Sherstinsky, ‘‘Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,’’ Phys. D, Nonlinear Phenomena, vol. 404, Mar. 2020, Art. no. 132306.
[12] R. W. Liu, M. Liang, J. Nie, W. Y. B. Lim, Y. Zhang, and M. Guizani, ‘‘Deep learning-powered vessel trajectory prediction for improving smart traffic services in maritime Internet of Things,’’ IEEE Trans. Netw. Sci. Eng., early access, Jan. 7, 2022, doi: 10.1109/TNSE.2022.3140529.
[13] R. W. Liu, M. Liang, J. Nie, Y. Yuan, Z. Xiong, H. Yu, and N. Guizani, ‘‘STMGCN: Mobile edge computing-empowered vessel trajectory prediction using spatio-temporal multi-graph convolutional network,’’ IEEE Trans. Ind. Informat., early access, Apr. 8, 2022, doi: 10.1109/TII.2022.3165886.
[14] H. D. Trinh, L. Giupponi, and P. Dini, ‘‘Mobile traffic prediction from raw data using LSTM networks,’’ in Proc. IEEE 29th Annu. Int. Symp. Pers., Indoor Mobile Radio Commun. (PIMRC), Sep. 2018, pp. 1827–1832.
[15] N. Ramakrishnan and T. Soni, ‘‘Network traffic prediction using recurrent neural networks,’’ in Proc. 17th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2018, pp. 187–193.
[16] H. Lu and F. Yang, ‘‘Research on network traffic prediction based on long short-term memory neural network,’’ in Proc. IEEE 4th Int. Conf. Comput. Commun. (ICCC), Dec. 2018, pp. 1109–1113.
[17] K. Weiss, T. M. Khoshgoftaar, and D. Wang, ‘‘A survey of transfer learning,’’ J. Big Data, vol. 3, no. 1, pp. 1–40, Dec. 2016.
[18] S. Pan and Q. Yang, ‘‘A survey on transfer learning,’’ IEEE Trans. Knowl. Data Eng., vol. 22, pp. 1345–1359, Nov. 2010.
[19] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Müller, ‘‘Transfer learning for time series classification,’’ in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2018, pp. 1367–1376.
[20] K. Kashiparekh, J. Narwariya, P. Malhotra, L. Vig, and G. Shroff, ‘‘ConvTimeNet: A pre-trained deep convolutional neural network for time series classification,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 607–612.
[21] D. Kearney, S. McLoone, and T. E. Ward, ‘‘Investigating the application of transfer learning to neural time series classification,’’ in Proc. 30th Irish Signals Syst. Conf. (ISSC), Jun. 2019, pp. 1–5.
[22] Y. Yu, X. Si, C. Hu, and J. Zhang, ‘‘A review of recurrent neural networks: LSTM cells and network architectures,’’ Neural Comput., vol. 31, no. 7, pp. 1235–1270, Jul. 2019.

XIANBIN WAN was born in Heze, Shandong, China, in 1998. He received the B.S. degree from the Qilu University of Technology, where he is currently pursuing the M.S. degree. His main research interests include machine learning, cloud computing, and network resource management.

HUI LIU was born in Linyi, Shandong, China, in 1995. He received the B.S. degree from Weifang University. He is currently pursuing the M.S. degree with the Qilu University of Technology. His main research interests include computer networks, networked control systems, and computer network reliability.

HAO XU was born in Weifang, Shandong, China, in 1997. He received the B.S. degree from the Qilu University of Technology, where he is currently pursuing the M.S. degree. His main research interests include computer networks, computer network reliability, and machine learning.

XINCHANG ZHANG (Senior Member, IEEE) received the M.S. degree from the Shandong University of Science and Technology, China, in 2005, and the Ph.D. degree from the Computer Network Information Center, Chinese Academy of Sciences, China, in 2010. He is currently a Professor at the Qilu University of Technology (Shandong Academy of Sciences). He has over 40 papers in research journals, such as IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (JSAC), IEEE TRANSACTIONS ON SERVICES COMPUTING (TSC), IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY (TVT), and IEEE Communications Magazine, and in international conference proceedings. His research interests include network protocols and architectures, and cloud computing. He won the Shandong (China) Science and Technology Progress Award in 2013, 2018, and 2019.