
TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 20/21 pp1219−1231
DOI: 10.26599/TST.2023.9010086
Volume 29, Number 4, August 2024

LSTM Network-Based Adaptation Approach for Dynamic Integration in Intelligent End-Edge-Cloud Systems

Xuan Yang and James A. Esquivel*

Abstract: Edge computing, which migrates compute-intensive tasks to run on the storage resources of edge devices, efficiently reduces data transmission loss and protects data privacy. However, due to limited computing resources and storage capacity, edge devices fail to support real-time streaming data query and processing. To address this challenge, first, we propose a Long Short-Term Memory (LSTM) network-based adaptive approach in the intelligent end-edge-cloud system. Specifically, we maximize the Quality of Experience (QoE) of users by automatically adapting their resource requirements to the storage capacity of edge devices through an event mechanism. Second, to reduce the uncertainty and incomplete adaptation of the edge device to the user's requirements, we use the LSTM network to analyze the storage capacity of the edge device in real time. Finally, the storage features of the edge devices are aggregated to the cloud to re-evaluate the comprehensive capability of the edge devices and ensure the fast response of the user devices during the dynamic adaptation matching process. A series of experimental results show that the proposed approach has superior performance compared with traditional centralized and matrix decomposition based approaches.

Key words: data query; Long Short-Term Memory (LSTM) networks; end-edge-cloud; quality of experience

Xuan Yang is with Graduate School, Angeles University Foundation, Angeles City 2009, Philippines, and also with Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang 262700, China. E-mail: [email protected].
James A. Esquivel is with Graduate School, Angeles University Foundation, Angeles City 2009, Philippines. E-mail: [email protected].
* To whom correspondence should be addressed.
Manuscript received: 2023-06-22; revised: 2023-07-26; accepted: 2023-08-10
© The author(s) 2024. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

With the rapid development of the Internet of Things (IoT), large-scale sensing devices are deployed in a wide range of applications, generating large-scale real-time streaming data non-stop. Traditional cloud-based stream data processing architectures typically use centralized data storage and centralized processing. However, the explosive growth of raw sensing data streams, low value density, long end-of-network transmission times, and high storage costs pose significant challenges to the centralized data processing model of cloud infrastructure. In recent years, many end devices, such as smart gateways and wireless base stations, have evolved rapidly and started to have some computing power[1]. A new computing paradigm, i.e., edge computing, has started to emerge and received widespread attention, whose main idea is to schedule some computing tasks from the cloud to be executed at the end (edge devices) to reduce the burden of the cloud[2, 3]. The concept of edge computing aims to distribute the processing of streaming sensor data to edge devices, thereby reducing the computational load on cloud resources generated by large-scale data streams. One crucial issue in real-time processing of such extensive sensor data is how to partition the processing tasks effectively, enabling seamless integration between cloud and edge devices while making the most of their respective computational capabilities.

Additionally, driven by recent advancements in technologies, such as big data, Deep Learning (DL) has emerged as one of the most remarkable fields in Artificial Intelligence (AI)[4–6]. It has achieved substantial breakthroughs in various domains, including computer vision, speech recognition, and natural language processing[7, 8]. Currently, most AI computing tasks heavily rely on cloud platforms or other large-scale and compute-intensive resources[9, 10]. However, the physical distance between these platforms and intelligent end devices, coupled with the immense amount of data at the network edge, severely limits the convenience and benefits provided by AI. As a result, the integration of edge computing and AI has been proposed, leading to the concept of Edge Intelligence (EI)[3, 11–13]. This integration aims to overcome the limitations of traditional approaches by bringing AI capabilities closer to the edge, where data are generated and immediate decision-making is crucial.

As illustrated in Fig. 1, users are distributed across different communities, and they possess numerous edge devices. The real-time streaming data generated by users are transmitted to the base stations through cell links. The base stations cooperate with each other in transferring the real-time data uploaded by users through cooperative links. Real-time data transmission occurs between the base stations and cloud platforms, such as YouTube and NETFLIX, via transmission links. These three types of links ensure the real-time responsiveness of the edge devices. However, user requests are dynamically changing, raising the need to optimize the transmission between edge devices and base stations to swiftly respond to users' real-time queries. In this scenario, edge computing plays a vital role by enabling the processing and analysis of real-time streaming data at the edge devices. By performing data queries and processing locally, the system can respond rapidly to changing conditions, improving overall operational efficiency and enabling timely actions to be taken based on the analyzed data.

Fig. 1 Motivating example in end-edge-cloud systems.

With its powerful learning capability, the embedding technique in artificial intelligence is very suitable for the interpretation and fusion of various data types, such as text, image, and sound[14]. Therefore, in this paper, we use the embedding technique to alleviate the data sparsity issue that exists in user stream data-driven edge devices. Moreover, as a popular time modeling module, the Long Short-Term Memory (LSTM) network owns the ability to capture the dynamic features of historical data over multiple time periods[15]. Thus, LSTM is consistent with our goal to explore the dynamic interest of users in stream data. Therefore, LSTM is employed in this paper to capture the long short-term preferences of users over different types of stream data, and then obtain the dynamic preference features of users. Afterwards, the derived user preference features based on LSTM networks are used to aid the subsequent user stream data evaluation and optimize edge network connectivity and communications management.

The main contributions of this article are as follows.
(1) We propose to utilize the embedding technique to solve the sparsity problem of stream data by considering the stream data that users have taken part in. Then for each user, his or her initial embeddings are obtained, which are the basis of subsequent network optimization.
(2) For the initial embedding of users at each time slot, the embedding representation is sent to an LSTM network, which can effectively extract the preference representations of users during multiple time slots, and further grasp the dynamic preferences of users in a comprehensive manner.
(3) A series of experiments on a realistic dataset show that our proposed approach has superior performance compared to other competitive approaches in aiding accurate and optimized edge network connectivity and communication management.

The rest of the paper is organized as follows: Section 2 summarizes the related work about stream data of users. Section 3 formulates the focused problem in this paper. Section 4 introduces the details of our proposed user network optimization evaluation model. In Section 5, experimental evaluation is conducted. Finally, we summarize the paper and point out the future work in Section 6.

2 Related Work

Existing work on user stream data and intelligent end-edge-cloud systems based on historical stream records of users can be divided into the following two categories.

(1) One type of these methods is based on the historical stream data records of users. Currently, there is a significant body of research exploring real-time processing methods for collecting, processing, and aggregating large volumes of real-time streaming data using distributed stream processing systems. Lua et al.[16] proposed an accurate and scalable Internet subspace geometry, embedding nodes into a geometric plane by measuring latency between nodes. This approach enables efficient network-aware overlay networks and scalable adjustment of multicast services under dynamic network conditions, without extensive network measurements. Other methods, such as those discussed in Refs. [17, 18], optimize computation layouts based on load balancing in the underlying infrastructure. Mao et al.[19] considered extending distributed resources based on query bandwidth usage for dynamic service layouts. These approaches rely on prior knowledge of the infrastructure and static data pipelines to achieve efficient load distribution across geographically distributed resources, complementing the work presented in this paper.

Zhang et al.[20] introduced an Edge as a Service (EaaS) platform, aiming to achieve a distributed cloud architecture and integrate the network edge into the computing ecosystem. Xu et al.[21, 22] proposed Edge Analytics as a Service (EAaaS), a scalable analytic service for real-time analytics in IoT scenarios. It simplifies the programming of analytic logic for users under specified rules, addressing the lack of flexible and unified methods for defining domain-specific analytic logic, while maintaining efficiency in data processing. However, it relies on a rule-based approach for analytic services, which is not suitable for data-driven and event-driven IoT stream applications. Assunção et al.[23] investigated the latest techniques in stream processing engines and the mechanisms leveraging the elastic characteristics of cloud computing resources for data stream processing in architectures that utilize edge computing. However, these methods do not consider the dynamic nature of streaming data, which can result in continuously changing requirements for cloud service scheduling[24, 25]. This represents one of the main challenges addressed in this paper.

(2) Another type of methods is to predict the stream data between users who are similar by using the historical records of social friends. Davy et al.[26] proposed an EaaS platform that aims to integrate the network edge into the computing ecosystem. They employed Raspberry Pi devices as edge nodes, enabling access within the platform through a lightweight discovery protocol. The platform also includes a scalable resource provisioning mechanism that allows workload offloading from the cloud to the edge, catering to multiple user requests. Hussein et al.[27] emphasized the significant role of task scheduling in the integration of IoT and edge computing architectures. They introduced Cooperative as a Service (CoaaS), a lightweight container-based service for task selection and scheduling. They utilized cooperative game theory to address task selection and scheduling challenges, and designed a multi-objective function considering various constraints, such as memory, CPU, and user budget, to reduce energy consumption.

Renart et al.[28] proposed a location-aware mechanism in their edge-based programming framework. This framework captures information related to client and data source locations, and efficiently allocates stream computations at the edge of the infrastructure. Wang et al.[29] presented a novel cloud computing architecture comprising both cloud and edge components. The cloud part focuses on processing large-scale and long-term global data to obtain decision-making information, such as features or rule sets. The edge part handles small-scale and short-term local data to display real-time information. Moreover, they provided high-quality personalized services based on acquired data features, rule sets, and local high-quality data at the edge. However, they did not consider the dynamic handling of event streams dynamically generated by flow data in IoT scenarios. Wang et al.[30] proposed a new cloud computing framework based on a tensor service model to deliver high-quality proactive and personalized services to humans. However, they did not consider the handling of a large number of dynamic sporadic event processing tasks.
 
While these works attempt to integrate cloud infrastructure with edge devices for data processing, the collaboration between the edge and cloud in these approaches is based on static methods predefined in advance[31]. They do not address the dynamic adaptation timing between cloud and edge services, which is a key aspect in the research scenario presented in this paper.

3 Problem Statement

Based on the dynamic record of stream data left by the user in the edge device, a description of the stream data that the user has participated in is presented as follows.
● U = {u_1, u_2, ..., u_m} indicates the set of users located within the coverage area of a base station.
● I = {i_1, i_2, ..., i_n} indicates the set of stream data left on the base station.
The stream data that users have participated in are dynamically changing, and the stream data at time slot t can be represented as b(t) = {I_t^1, R_t^1, M_t^1}, where the three specific elements can be expressed as follows:
● I_t^1 = {i_{1,t}, i_{2,t}, ..., i_{n,t}} indicates the set of stream data that includes the behavioral records of u_1 at time slot t.
● R_t^1 = {r_{1,t}, r_{2,t}, ..., r_{n,t}} indicates the set of ratings of the stream data that include the behavioral records of u_1 at time slot t.
● M_t^1 = {m_{1,t}, m_{2,t}, ..., m_{n,t}} indicates the set of categories corresponding to the stream data that u_1 has participated in at time slot t.
● UR indicates the rating matrix between the users and the participating stream data at time slot t. Each row in the matrix represents the ratings of user u_m to stream data i_{n,t} at time slot t. This is a sparse matrix: a value in [1, 2, 3, 4, 5] represents the rating, and 0 represents that the user behavior at time t is empty.

UR = ⎡ 1  0  ⋯  5 ⎤
     ⎢ 2  4  ⋯  2 ⎥
     ⎢ ⋮  ⋮  ⋱  ⋮ ⎥
     ⎣ 4  3  ⋯  1 ⎦    (1)

● UM indicates the interaction matrix between the user and the category of stream data at time slot t. Each row in the matrix represents the categories of stream data m_{n,t} from user u_m at time slot t. This is also a sparse matrix, where 1 represents true and 0 represents false.

UM = ⎡ 1  1  ⋯  0 ⎤
     ⎢ 1  0  ⋯  0 ⎥
     ⎢ ⋮  ⋮  ⋱  ⋮ ⎥
     ⎣ 1  0  ⋯  1 ⎦    (2)

The historical records of multiple time slots help capture the full-scale status of each user, and thus match the stream data features between two users. We use the following two techniques to extract effective information from multiple time slots:
● Embedding. We use the embedding method to transform the historical records of multiple time slots into the same mathematical space, while maintaining the original data information.
● LSTM. The LSTM network is able to capture the serial information of multiple time slots side by side and the intrinsic correlation between them[18]. Based on such properties, we are capable of mining the characteristics of the user's stream data over multiple time slots.
In general, the original records of multiple sequential stream data of each user are transformed by embedding, and then fed into the LSTM network to obtain a comprehensive representation of the user's preference. Finally, the similarity of stream data between two users is further evaluated.
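To make the notation above concrete, the following minimal sketch (not the authors' code; the sizes and values are invented for illustration) builds one user's time-slot record b(t) = {I_t, R_t, M_t} and the corresponding rows of the sparse matrices UR and UM from Eqs. (1) and (2).

```python
import numpy as np

# Toy record of one user at time slot t, b(t) = {I_t, R_t, M_t}: the stream data
# items with behavioral records, their ratings (1-5, 0 = no behavior), and the
# category of each item. All sizes and values below are made up.
num_items, num_categories = 6, 4
item_ids   = np.array([0, 2, 5])      # I_t: items the user interacted with
ratings    = np.array([5, 3, 1])      # R_t: ratings given to those items
categories = np.array([1, 0, 3])      # M_t: category index of each item

# One row of the sparse rating matrix UR (Eq. (1)) and of the interaction
# matrix UM (Eq. (2)) for this user.
UR = np.zeros((1, num_items), dtype=int)
UR[0, item_ids] = ratings             # 0 entries mean "no behavior at time t"

UM = np.zeros((1, num_categories), dtype=int)
UM[0, categories] = 1                 # 1 = true (category touched), 0 = false

print(UR)                             # [[5 0 3 0 0 1]]
print(UM)                             # [[1 1 0 1]]
```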

4 Methodology

For the historical records left by users on the Macro Base Station (MBS), we propose a stream data assessment method based on artificial intelligence technology, named PhyHA. The proposed framework is shown in Fig. 2. Specifically, firstly, we use an embedding approach to keep the basic historical records of the user on the base station, which allows us to obtain a representation of the user's stream data features. Secondly, the embedding representations of the user's stream data features over multiple time slots are fed into the LSTM network to extract the user's comprehensive characteristics. Finally, the similarity of stream data among users is evaluated by using cosine similarity.

Fig. 2 Overall framework of PhyHA.

4.1 Stream data feature representation

Although users leave multiple time periods of historical records on the MBS base station, the records of stream data that each user has engaged in at time slot t are sparse. Coincidentally, embedding possesses the capability of transforming sparse raw data into dense embedding vectors[32]. By leveraging this advantage, embedding transformation techniques are exploited to obtain an effective representation of the user's stream data features. In order to consider the combined stream data preferences of users, we take into account the user's ratings and categories of stream data. After embedding encoding, we can convert the data matrix I_t^1, the rating matrix R_t^1, and the category matrix M_t^1 into embedding matrices E_{i,t}^{u_1}, E_{r,t}^{u_1}, and E_{m,t}^{u_1}, respectively.

For the representation of each user's historical stream record, we stitch the three embedding matrices to form a composite representation of each user's initial data features. The specific process is shown below:

B(t) = E_{i,t}^{u_1} ⊕ E_{r,t}^{u_1} ⊕ E_{m,t}^{u_1}    (3)

where ⊕ denotes the stitch (concatenation) operation between matrices. b(t) and B(t) denote the stream data representation of u_1 before and after embedding transformation over multiple time slots t in time period T, respectively, with t ∈ [1, T]. Here, we utilize embedding transformation to obtain a representation of the user's initial features, which has the following three advantages: (1) Combining multiple types of data information generated by users, the embedding method can transform various types of information into the same mathematical space. (2) Due to the sparsity of the user's stream data, only the data that the user has participated in need to be considered, without considering the stream data that he or she has not participated in, which effectively solves the problem of historical record sparsity of the user's participation in each time slot. (3) The combination of multiple pieces of information represents the user's historical behavior record, which can provide a strong basis for subsequent preference extraction.
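A hedged sketch of Eq. (3) in PyTorch, the framework used in the experiments: the item, rating, and category records of one time slot are embedded and stitched (concatenated) into B(t). The vocabulary sizes, the choice of a shared embedding dimension, and the mean-pooling over items within a slot are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

# Assumed vocabulary sizes; the embedding dimension of 64 follows Section 5.1.
num_items, num_ratings, num_categories, dim = 1000, 6, 20, 64

embed_item   = nn.Embedding(num_items, dim)       # E_i
embed_rating = nn.Embedding(num_ratings, dim)     # E_r, ratings 0-5 (0 = empty)
embed_cat    = nn.Embedding(num_categories, dim)  # E_m

# Toy records of u1 at one time slot t: three items with ratings and categories.
items   = torch.tensor([3, 17, 42])
ratings = torch.tensor([5, 3, 1])
cats    = torch.tensor([2, 2, 7])

# Eq. (3): the sparse records become dense embeddings, which are stitched
# (concatenated) into B(t); the items of the slot are then mean-pooled into a
# single slot vector (the pooling choice is an assumption).
B_t = torch.cat([embed_item(items),
                 embed_rating(ratings),
                 embed_cat(cats)], dim=-1)        # shape (3, 3 * dim)
B_t = B_t.mean(dim=0)                             # shape (3 * dim,) = (192,)
print(B_t.shape)                                  # torch.Size([192])
```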

4.2 Data preference representation over multiple time slots

The LSTM is a kind of temporal recurrent neural network that is specifically designed to solve the long-term dependency problem of general Recurrent Neural Networks (RNNs), where all RNNs have a chained form of repeating neural network modules[33]. Without loss of generality, the LSTM still retains the underlying functionality of processing historical records over multiple time periods, and can extract both the effective information and the correlation between records. By taking advantage of this property, the data preferences of each user can be extracted through the LSTM network. The data flow in each LSTM unit is shown in the following:

f_t = σ(W_f · [h_{t−1}, B(t)] + b_f)    (4)
i_t = σ(W_i · [h_{t−1}, B(t)] + b_i)    (5)
C̃_t = tanh(W_c · [h_{t−1}, B(t)] + b_c)    (6)
C_t = f_t ∘ C_{t−1} + i_t ∘ C̃_t    (7)
o_t = σ(W_o · [h_{t−1}, B(t)] + b_o)    (8)
h_t = o_t ∘ tanh(C_t)    (9)

where · denotes the multiplication between elements, ∘ denotes the Hadamard product between matrices, tanh denotes the hyperbolic tangent function, and σ denotes the sigmoid function. Both h_{t−1} and B(t) denote the input of the hidden layer, and h_t denotes the output of the hidden layer in the LSTM network of u_1 at time slot t. W_i, W_c, W_o and b_i, b_c, b_o represent the weight matrices and bias matrices of the corresponding network components, respectively. C_{t−1} and C_t represent the cell state at time slots t−1 and t, respectively, where C_t and o_t together determine the output of the LSTM network. C̃_t converts the input to a numerical value between −1 and 1. After multiple time periods of data processing, the LSTM can extract the user's historical records across time slots.

There are three sigmoid gate structures in each LSTM unit: the forget gate f_t, the input gate i_t, and the output gate o_t. Because the sigmoid function maps the input data to the value interval (0, 1), these three gates determine the degree of retention of the long-term memory stream, the degree of embedding of the input unit, and the degree of presentation of the short-term memory stream, respectively. This principle works mainly because of the application of the sigmoid function: a sigmoid output of 1 means that the information is completely retained, an output of 0 means that the information is completely discarded, and a value between 0 and 1 means that the information is partially retained and partially discarded.

With the scalability of the LSTM network structure itself, multilayer LSTM networks can often obtain more levels of key information. Here, we utilize a multilayer LSTM network to extract the initial data preferences of users at multiple time slots. Based on the processing of each user's historical records in the LSTM network, the initial data preferences of each user are extracted. Similarly, we can get the representation of the data preferences of u_2. The multilayer LSTM network formulation is represented as follows:

L_1 = (h_{a,1}^1 + h_{a,2}^1 + ··· + h_{a,T}^1)/T,
L_2 = (h_{a,1}^2 + h_{a,2}^2 + ··· + h_{a,T}^2)/T    (10)

where a denotes the number of layers of the LSTM, and h_{a,T}^1 and h_{a,T}^2 denote the outputs of the hidden layer of u_1 and u_2 during the time period T, respectively. L_1 and L_2 denote the data preferences of u_1 and u_2 during the time period T, respectively; L_1 and L_2 are also the average outputs of the multiple hidden layers in the LSTM networks. In the LSTM unit, C_t plays a key role in data processing by extracting the key information of the current time slot and transferring it to the hidden state at the next moment. This way of dealing with long-term time records is very effective. Overall, the LSTM network facilitates the acquisition of comprehensive stream data preferences of the users involved in the historical records. Experiments have shown that multilayer LSTM networks can effectively extract such features over long-term periods to represent a comprehensive preference representation of users.

By observing the patterns and trends in storage usage over time, the LSTM network can capture the dynamics of storage availability and predict the future capacity of the edge devices. The information extracted from the LSTM network regarding storage capacity is then utilized for adaptive resource allocation. This means that based on the predicted storage capacity of each edge device, the resource requirements of users are dynamically adjusted to match the available resources. This adaptive resource allocation ensures efficient utilization of the limited storage capacity of edge devices, optimizing the allocation of computational resources and enhancing the overall system performance.

4.3 Forward Neural Network (FNN)

The combined preference representations of users are extracted from the LSTM network and then sent into the forward neural network (FNN)[34]. Here, the integrated preference representations of pairs of users are further processed to obtain the latent feature representations of the users.

For the M-layer FNN network, the input matrix is x and the output matrix is y. W is the weight matrix of the M-layer FNN network, and b is the corresponding bias matrix. x_{m−1} (m = 1, 2, ..., M) is both the output of the (m−1)-th layer and the input of the m-th layer. After the data flow through the last M-th layer, the output y of the entire FNN is obtained. Equation (11) shows the calculation process of the M-layer FNN's output y:

y_1 = W_1 · x + b_1,
y_m = W_m · x_{m−1} + b_{m−1}  (m = 2, 3, ..., M),
y = W_M · x_{M−1} + b_{M−1}    (11)

Based on the processing of the matrix by the FNN model, the user's preference representations are fed into Eq. (12) for processing,

F_1 = W_1^M · (··· (W_1^2 · (W_1^1 · L_1 + b_1^1) + b_1^2) ···) + b_1^M,
F_2 = W_2^M · (··· (W_2^2 · (W_2^1 · L_2 + b_2^1) + b_2^2) ···) + b_2^M    (12)

where W_1^1, W_1^2, ..., W_1^M and W_2^1, W_2^2, ..., W_2^M represent the layer weight matrices, while b_1^1, b_1^2, ..., b_1^M and b_2^1, b_2^2, ..., b_2^M represent the corresponding bias matrices. F_1 and F_2 represent the stream data features of u_1 and u_2. Finally, feature representations for pairs of users are extracted.
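The following PyTorch sketch shows one way Eqs. (4)−(12) could be wired together, under assumed dimensions: the per-slot vectors B(1), ..., B(T) pass through a multilayer LSTM, the hidden outputs are averaged over the T slots as in Eq. (10), and a small FNN maps the result to the latent features F as in Eqs. (11) and (12). It is a sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Assumed sizes: T = 11 time slots (one per year, as in the Epinions setup),
# 192-dimensional slot vectors B(t), a 2-layer LSTM (Section 5.1; Section 5.5
# reports 4 layers as the best setting), and a 2-layer FNN.
T, input_dim, hidden_dim, lstm_layers = 11, 192, 128, 2

lstm = nn.LSTM(input_dim, hidden_dim, num_layers=lstm_layers, batch_first=True)
fnn = nn.Sequential(
    nn.Linear(hidden_dim, 64), nn.ReLU(),
    nn.Linear(64, 32),
)

B_seq = torch.randn(1, T, input_dim)   # stand-in for the embedded slots of u1
h_all, _ = lstm(B_seq)                 # hidden outputs h_{a,1}, ..., h_{a,T}
L1 = h_all.mean(dim=1)                 # Eq. (10): average over the T slots
F1 = fnn(L1)                           # Eqs. (11) and (12): latent features of u1
print(F1.shape)                        # torch.Size([1, 32])
```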

4.4 Stream data assessment

To measure the match of the stream data, the similarity principle is used to map the stream data between two users, i.e., the closer the two elements are, the larger their similarity value is, while the more distant the two elements are, the smaller their similarity value is[35]. Based on the feature representations F_1 and F_2 obtained from the FNN network, we use the cosine similarity to compare the features of two users,

F_{1,2} = cos θ = (F_1 · F_2)/(∥F_1∥ ∥F_2∥)    (13)

where F_{1,2} indicates the similarity value between users u_1 and u_2, ∥·∥ indicates the modulo (norm) operation on a vector, and θ indicates the angle between the feature vectors F_1 and F_2. The range of the similarity value is between −1 and 1. The user has engaged in more similar stream data if the value of F_{1,2} tends toward 1, which indicates that u_1 and u_2 have similar historical records. If the value of F_{1,2} tends toward −1, the two users are in different physical conditions, which indicates that there is a greater difference between the historical records that the users are engaged in[36, 37].

For neural network models, the loss function helps to optimize the whole network model[25, 38]. The L1 loss function, i.e., the absolute value loss function, is often used in existing regression problems and also helps to optimize the whole network[39]. The L1 loss function optimization formula is as follows:

L(F_{1,2}, F̂_{1,2}) = W(θ)(F̂_{1,2} − F_{1,2})    (14)

where F̂_{1,2} denotes the label value of the real physical status of u_1 and u_2. Two users with similar historical records are labeled as 1, and those who are not similar are labeled as −1. W(θ) is a weight matrix which is designated as a semi-positive definite matrix. It can control the loss value of the whole network model, and ensures that the loss value falls within the range [−1, 1]. To better optimize the parameters in the weight matrix, a Stochastic Gradient Descent (SGD) based model[40, 41] is exploited to optimize the overall model. Moreover, the specific pseudo-code process is shown in Algorithm 1.

Algorithm 1 PhyHA model
Input: U, I_t^1, R_t^1, M_t^1, and iteration times 100
Output: F_{1,2}
while iteration times ⩽ 100
    for a pair of users (u_1, u_2) in U do
        B_1(t), B_2(t) ← Input (I_t^1, R_t^1, M_t^1) and (I_t^2, R_t^2, M_t^2) by using Eqs. (1)−(3);
        L_1, L_2 ← Input B_1(t) and B_2(t) by using Eqs. (4)−(10);
        F_1, F_2 ← Input L_1 and L_2 by using Eqs. (11) and (12);
        F_{1,2} ← Input F_1 and F_2 by using Eq. (13);
        Optimize the whole network by using SGD in Eq. (14);
    end for
end while

Finally, the evaluation of stream data features between any two users is computed. To be more discriminative, we rank the similarity values in ascending order. A similarity value greater than 0.4 is considered to indicate that two users are similar. In this paper, we exploit the historical records of user participation in stream data that contain multiple types of values. Based on this, we utilize embedding transformations to map multiple data types to the same mathematical space. Furthermore, considering that users' interest in engaging with stream data varies over time, we extract the dynamic data preferences of users by utilizing the LSTM network, which obtains the user's preference features. Then, these features are fed into the FNN network. Finally, the similarity value of stream data features between two users is calculated by making use of cosine similarity, which can greatly facilitate the subsequent recommendation based on users' historical records.

The proposed approach exhibits scalability in several aspects. Firstly, the use of embedding-based techniques allows for efficient representation of user data, enabling the model to handle larger-scale datasets without significant computational overhead. This is because embeddings capture the underlying semantic relationships among data points, enabling effective generalization across a wide range of instances. Secondly, the architecture of the model, such as the LSTM network, is designed to handle sequential data, and can naturally scale to longer sequences without substantial performance degradation. The model can process streaming data in real time, making it suitable for applications that involve continuous data streams. Furthermore, the utilization of distributed computing frameworks can facilitate parallel processing and enable the model to leverage the computing power of multiple devices or nodes, thereby improving scalability and reducing the computational burden.
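Below is a hedged sketch, not the authors' implementation, of the assessment and optimization step: cosine similarity as in Eq. (13), an L1-style gap to the ±1 label as a simplified stand-in for Eq. (14) (the weighting W(θ) is collapsed to 1), and SGD updates in the spirit of Algorithm 1. In the full model the LSTM and FNN parameters would be optimized rather than the stand-in feature tensors used here.

```python
import torch
import torch.nn.functional as F

def similarity(f1, f2):
    # Eq. (13): cosine similarity of two users' feature vectors, in [-1, 1]
    return F.cosine_similarity(f1, f2, dim=-1)

# Stand-in feature tensors; in the full model these are produced by the FNN.
F1 = torch.randn(1, 32, requires_grad=True)
F2 = torch.randn(1, 32, requires_grad=True)
label = torch.tensor([1.0])            # +1 = similar users, -1 = dissimilar

optimizer = torch.optim.SGD([F1, F2], lr=4e-5)   # 0.000 04, the rate Section 5.6 finds best
for _ in range(100):                   # iteration budget used in Algorithm 1
    optimizer.zero_grad()
    # Simplified Eq. (14): absolute (L1-style) gap between label and similarity.
    loss = (label - similarity(F1, F2)).abs().mean()
    loss.backward()
    optimizer.step()
print(float(similarity(F1, F2)))       # similarity after the updates
```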

5 Experiment

5.1 Experimental setup

(1) Dataset
We use a real dataset, Epinions[42], to verify the effectiveness of our proposed model. The Epinions dataset contains 7400 users and includes information about users' ratings of stream data, categories, and other attribute information. The values of ratings are from 1 to 5. The Epinions dataset comprises records of users' stream data for multiple time slots. One year is treated as one time slot, and there are 11 time slots in total.
(2) Parameter setting
Through multi-level validation, we set the dimensionality of embedding as 64, and the number of layers of the LSTM network is 2. The number of FNN layers is set to 2. The learning rate of the existing network model is 1 × 10^−3. The PyTorch framework is utilized to facilitate the implementation of the entire model. For network optimization, the Adam optimizer is employed to update the parameters and weights of the network. For each experiment, we run the model for 100 iterations. For the experimental environment, the software environment is PyCharm Community Edition, and the hardware environment is Windows 10 with 512 GB of storage.
(3) Comparison methods
• Random. We randomly select users' records of stream data and further process them by using the FNN network to calculate the similarity values for users by the cosine similarity measurement.
• MF[43]. The records of the stream data in which the users participate are decomposed and further optimized. The final similarity value between two users is calculated with the cosine similarity.
• Embedding[44]. It applies all records of stream data in which the user has participated, without considering the time continuity. The embedded representations of the two users are fed into the cosine similarity to calculate the similarity of matching degree.
(4) Metrics
In the disciplines of information retrieval and statistics, metrics such as accuracy, precision, and recall are frequently used to evaluate the quality of findings[45, 46]. Thus, we use the above three metrics to measure the effectiveness of our model's performance.
• Accuracy. It is the proportion of positive and negative cases that are correctly classified to all samples involved in the classification,

Accuracy = (TP + TN)/(TP + TN + FP + FN)    (15)

where TP indicates true positive, which means that positive cases are judged as the positive class. FP indicates false positive, which implies that negative cases are judged as positive. FN means false negative, which means that positive cases are judged as the negative class. TN indicates true negative, which means that negative cases are judged as the negative class.
• Precision. It measures the correct proportion of samples identified by the model[47]. The expression is as follows:

Precision = TP/(TP + FP)    (16)

• Recall. It measures the performance of the model by calculating the proportion of correctly identified samples that can be recommended to users. The higher the value, the better. The expression is as follows:

Recall = TP/(TP + FN)    (17)
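As a small worked example of Eqs. (15)−(17), with counts that are invented for illustration and are not results from the paper:

```python
# Invented counts over 100 user pairs: 40 similar pairs found (TP), 10 dissimilar
# pairs wrongly flagged (FP), 5 similar pairs missed (FN), 45 correctly rejected (TN).
TP, FP, FN, TN = 40, 10, 5, 45

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # Eq. (15) -> 0.85
precision = TP / (TP + FP)                    # Eq. (16) -> 0.80
recall    = TP / (TP + FN)                    # Eq. (17) -> ~0.889

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")
```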

5.2 Effectiveness of the model

A series of experiments are implemented to validate the performance of the PhyHA model. The Epinions dataset is divided into different proportions to validate the performance of the PhyHA model under different data densities. In particular, the ratio of the training set is assigned as {20%, 40%, 60%, 80%, 90%}, and the remaining ratio is the test set. We utilize three metrics to discriminate the model performance at multiple levels.

As shown in Fig. 3, as the training set ratio ranges from 20% to 90%, the PhyHA model has the best performance compared to the other three competing methods. The accuracy values of the PhyHA model are on average 22.45%, 8.90%, and 41.17% higher than those of the Embedding, MF, and Random methods. The Random method performs worse because the random selection of stream data that users have participated in does not correspond to the feature extraction, which makes the accuracy lower. The Embedding method performs better than the Random method because Embedding still maintains the original feature of converting sparse vectors into dense vector representations. The MF method uses matrix decomposition to obtain the user representation and data preference representation, but the process of matrix decomposition involves manually setting the dimensionality of the features, which can easily cause overfitting problems and lead to low accuracy of the MF method.

Fig. 3 Accuracy comparison.
Fig. 4 Precision comparison.
Fig. 5 Recall comparison.

In terms of precision values (as shown in Fig. 4), the performance of the PhyHA model has been steadily increasing with the increase of data density. The precision values of the PhyHA model are 27.90%, 6.20%, and 37.91% higher than those of Embedding, MF, and Random on average. The recall performance of the PhyHA model is best at 90% data density (see Fig. 5), which indicates that more available feature information makes it easier to obtain effective features at a larger proportion of the training set.

Compared to traditional centralized methods, matrix decomposition based approaches require complex algorithms and coordination mechanisms to effectively distribute computation among edge devices. This complexity adds implementation overhead and may reduce accuracy. In contrast, our proposed method, based on embedding, efficiently preserves users' existing streaming data records, thereby reducing computational complexity and achieving highly accurate user representations.

5.3 Effect of similarity values

The similarity value of user representation indicates how closely the users match each other's stream data. The higher the similarity value, the better the match between two users' stream data, and vice versa. As shown in Table 1, the similarity values of stream data between users are set as {0.4, 0.5, 0.6, 0.7, 0.8} to verify the effect on the precision and recall of the models. First, from the overall trend, as the similarity value changes from 0.4 to 0.8, the four methods perform increasingly well in terms of precision and recall.

Table 1 Effect of varying similarity values of user representation on models.
Similarity value |            Precision             |              Recall
                 | Embedding   MF     Random  PhyHA | Embedding   MF     Random  PhyHA
       0.4       |  0.5450   0.6702  0.4665  0.6525 |  0.6309   0.6108  0.4562  0.7918
       0.5       |  0.5465   0.7033  0.4868  0.7213 |  0.6335   0.6214  0.4662  0.8024
       0.6       |  0.5535   0.7272  0.4712  0.8002 |  0.6485   0.6302  0.4662  0.8223
       0.7       |  0.5638   0.7324  0.4829  0.8223 |  0.6425   0.6358  0.4862  0.8722
       0.8       |  0.5616   0.7302  0.4868  0.8251 |  0.6454   0.6402  0.4562  0.8744

Secondly, compared to the other three baseline methods, the PhyHA model achieves the best results when the similarity value is set up to 0.8. This is because a larger similarity value indicates more interactions between users, and the PhyHA model can identify the friends whose stream data features are most similar to those of the target user.

5.4 Effect of different user data

In our proposed PhyHA model, we use three portions of user data, including item, rating, and category, to model the user's initial data features. We set different ratios of the training set to verify the effect of different user data on the model, as shown in Figs. 6 and 7. PhyHA-item, PhyHA-rating, and PhyHA-category denote the PhyHA model using only item, rating, and category data, respectively.

Compared with the full PhyHA model, the average precision values of the model decrease by 9.52%, 3.21%, and 2.14% over the changing training set proportions for the PhyHA-item, PhyHA-rating, and PhyHA-category methods, respectively. On the one hand, the experimental results indicate that item, rating, and category data are important for measuring users' representation. On the other hand, the rating and category data have a greater impact on the measurement of stream data features among users. These results are presented in Figs. 6 and 7, which show that the PhyHA model has the best performance, and demonstrate that initial data, rating, and category data should be taken into account in the modeling of the user's stream data features.

Fig. 6 Precision of different user data on PhyHA model.
Fig. 7 Recall of different user data on PhyHA model.

5.5 Effect of LSTM layers

To test the effect of data preference representation on the overall network model, we cross-validate the effect of different numbers of LSTM layers on the precision of the model, and the number of LSTM layers a is set to a = {2, 3, 4, 5, 6} (as presented in Figs. 8 and 9). When the number of LSTM layers is set to 1, the model has the lowest precision. This is because a small number of LSTM layers does not capture the effective features of the time slot. With the LSTM layers set to 2, the model has a relatively lower recall value and does not capture the true preferences of the effective users over multiple time slots. Collectively, the model achieves the best results when the number of LSTM layers is 4.

Fig. 8 Precision of different LSTM layers and different learning rate on PhyHA model.
Fig. 9 Recall of different LSTM layers and different learning rate on PhyHA model.

5.6 Effect of learning rate

For model training of neural networks, parameters in the network also play a crucial role, such as the learning rate. The learning rate is a tuning parameter in the optimization algorithm, which determines the step size in each iteration so that the loss function converges to the minimum value. If the learning rate is too large, the parameters to be optimized fluctuate around the minimum value; if the learning rate is too small, the parameters to be optimized converge slowly. As shown in Fig. 9, we set different learning rates to observe the impact on the recall of the model from the model optimization perspective. The learning rates are set to {0.000 02, 0.000 03, 0.000 04, 0.000 05, 0.000 06}. The outcomes of the experiment demonstrate that a higher learning rate does not result in better results. When the learning rate is set from 0.000 02 to 0.000 04, the performance of the model gradually increases. In the interval from 0.000 04 to 0.000 06, the performance of the model progressively declines. When the learning rate is set to 0.000 04, the model has the best performance. In conclusion, an appropriate learning rate parameter setting contributes to the overall operation of the model.

6 Conclusion

Due to limited computing resources and storage capacity, edge devices fail to support real-time streaming data query and processing in the edge computing domain. To cope with these problems, we propose an adaptive approach based on LSTM networks in the intelligent end-edge-cloud system. In particular, we maximize the user QoE by automatically adapting to the resource demand of users and the storage capacity of edge devices. To reduce the uncertainty and incomplete adaptation of edge devices to user demands, we use LSTM networks to analyze the storage capacity of edge devices in real time. The storage features of edge devices are aggregated to the cloud to re-evaluate the combined capacity of the edge devices and to ensure fast response of user devices. A series of experimental results on real datasets demonstrate the superior performance of the proposed approach compared to the traditional centralized and matrix decomposition based approaches.

The existing models primarily focus on improving the real-time data flow from edge devices to enhance the user experience. However, it is crucial to address the limited data storage capacity at edge base stations. To overcome this challenge, timely aggregation of real-time data from distributed base stations becomes necessary. Additionally, considering the privacy concerns associated with user data, the utilization of federated learning can offer a promising solution by safeguarding the privacy of data within the base stations and facilitating the learning of data training patterns. In our future work, we aim to integrate federated learning with LSTM neural networks to develop a comprehensive model that optimizes network transmission and data distribution. This fusion of techniques will enable us to effectively train the entire model and improve the overall performance of the system. By leveraging the strengths of federated learning and LSTM networks, we anticipate achieving enhanced network efficiency and data privacy preservation in edge computing environments.

References

[1] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, Edge intelligence: Paving the last mile of artificial intelligence with edge computing, Proc. IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.
[2] J. Chen and X. Ran, Deep learning with edge computing: A review, Proc. IEEE, vol. 107, no. 8, pp. 1655–1674, 2019.
[3] S. Deng, H. Zhao, W. Fang, J. Yin, S. Dustdar, and A. Y. Zomaya, Edge intelligence: The confluence of edge computing and artificial intelligence, IEEE Internet Things J., vol. 7, no. 8, pp. 7457–7469, 2020.
[4] L. Kong, G. Li, W. Rafique, S. Shen, Q. He, M. R. Khosravi, R. Wang, and L. Qi, Time-aware missing healthcare data prediction based on ARIMA model, IEEE/ACM Trans. Comput. Biol. Bioinf., doi: 10.1109/TCBB.2022.3205064.
[5] P. K. Ghosh, A. Chakraborty, M. Hasan, K. Rashid, and A. H. Siddique, Blockchain application in healthcare systems: A review, Systems, vol. 11, no. 1, p. 38, 2023.
[6] J. K. Rajah, W. Chernicoff, C. J. Hutchison, P. Gonçalves, and B. Kopainsky, Enabling mobility: A simulation model of the health care system for major lower-limb amputees to assess the impact of digital prosthetics services, Systems, vol. 11, no. 1, p. 22, 2023.
[7] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[8] X. Zhou, Y. Li, and W. Liang, CNN-RNN based intelligent recommendation for online medical pre-diagnosis support, IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 18, no. 3, pp. 912–921, 2021.
[9] T. Li, X. Wang, Y. Yu, G. Yu, and X. Tong, Exploring the dynamic characteristics of public risk perception and emotional expression during the COVID-19 pandemic on Sina Weibo, Systems, vol. 11, no. 1, p. 45, 2023.
[10] F. Wang, G. Li, Y. Wang, W. Rafique, M. R. Khosravi, G. Liu, Y. Liu, and L. Qi, Privacy-aware traffic flow prediction based on multi-party sensor data with zero trust in smart city, ACM Trans. Internet Technol., vol. 23, no. 3, p. 44, 2023.
[11] D. Xu, T. Li, Y. Li, X. Su, S. Tarkoma, T. Jiang, J. Crowcroft, and P. Hui, Edge intelligence: Architectures, challenges, and applications, arXiv preprint arXiv: 2003.12172, 2020.

[12] L. Kong, L. Wang, W. Gong, C. Yan, Y. Duan, and L. Qi, LSH-aware multitype health data prediction with privacy preservation in edge environment, World Wide Web, vol. 25, no. 5, pp. 1793–1808, 2022.
[13] J. P. Vaara, T. Vasankari, H. J. Koski, and H. Kyröläinen, Awareness and knowledge of physical activity recommendations in young adult men, Front. Public Health, vol. 7, p. 310, 2019.
[14] C. Janiesch, P. Zschech, and K. Heinrich, Machine learning and deep learning, Electronic Markets, vol. 31, no. 3, pp. 685–695, 2021.
[15] Y. Yu, X. Si, C. Hu, and J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Comput., vol. 31, no. 7, pp. 1235–1270, 2019.
[16] E. K. Lua, X. Zhou, J. Crowcroft, and P. Van Mieghem, Scalable multicasting with network-aware geometric overlay, Comput. Commun., vol. 31, no. 3, pp. 464–488, 2008.
[17] Y. Han, C. Liu, S. Su, M. Zhu, Z. Zhang, and S. Zhang, A proactive service model facilitating stream data fusion and correlation, Int. J. Web Ser. Res., vol. 14, no. 3, pp. 1–16, 2017.
[18] Y. Xu, Z. Feng, X. Zhou, M. Xing, H. Wu, X. Xue, S. Chen, C. Wang, and L. Qi, Attention-based neural networks for trust evaluation in online social networks, Inf. Sci., vol. 630, pp. 507–522, 2023.
[19] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, A survey on mobile edge computing: The communication perspective, IEEE Commun. Surv. Tutorials, vol. 19, no. 4, pp. 2322–2358, 2017.
[20] M. Zhang, J. Cao, Y. Sahni, Q. Chen, S. Jiang, and T. Wu, EaaS: A service-oriented edge computing framework towards distributed intelligence, in Proc. 2022 IEEE Int. Conf. Service-Oriented System Engineering, Newark, CA, USA, 2022, pp. 165–175.
[21] X. Xu, S. Huang, L. Feagan, Y. Chen, Y. Qiu, and Y. Wang, EAaaS: Edge analytics as a service, in Proc. 2017 IEEE Int. Conf. Web Services, Honolulu, HI, USA, 2017, pp. 349–356.
[22] Y. Xu, Z. Feng, X. Xue, S. Chen, H. Wu, X. Zhou, M. Xing, and H. Chen, MemTrust: Find deep trust in your mind, in Proc. 2021 IEEE Int. Conf. Web Services, Chicago, IL, USA, 2021, pp. 598–607.
[23] M. D. de Assunção, A. da Silva Veith, and R. Buyya, Distributed data stream processing and edge computing: A survey on resource elasticity and future directions, J. Netw. Comput. Appl., vol. 103, pp. 1–17, 2018.
[24] Q. Zhang, X. Zhang, H. Hu, C. Li, Y. Lin, and R. Ma, Sports match prediction model for training and exercise using attention-based LSTM network, Digital Commun. Netw., vol. 8, no. 4, pp. 508–515, 2022.
[25] D. Deng, X. Li, V. Menon, J. Piran, H. Chen, and M. A. Jan, Learning-based joint UAV trajectory and power allocation optimization for secure IoT networks, Digital Commun. Netw., vol. 8, no. 4, pp. 415–421, 2022.
[26] S. Davy, J. Famaey, J. Serrat, J. L. Gorricho, A. Miron, M. Dramitinos, P. M. Neves, S. Latre, and E. Goshen, Challenges to support edge-as-a-service, IEEE Commun. Mag., vol. 52, no. 1, pp. 132–139, 2014.
[27] M. K. Hussein, M. H. Mousa, and M. A. Alqarni, A placement architecture for a container as a service (CaaS) in a cloud environment, J. Cloud Comput., vol. 8, p. 7, 2019.
[28] E. G. Renart, J. Diaz-Montes, and M. Parashar, Data-driven stream processing at the edge, in Proc. IEEE 1st Int. Conf. Fog and Edge Computing, Madrid, Spain, 2017, pp. 31–40.
[29] X. Wang, L. T. Yang, X. Xie, J. Jin, and M. J. Deen, A cloud-edge computing framework for cyber-physical-social services, IEEE Commun. Mag., vol. 55, no. 11, pp. 80–85, 2017.
[30] X. Wang, L. T. Yang, J. Feng, X. Chen, and M. J. Deen, A tensor-based big service framework for enhanced living environments, IEEE Cloud Comput., vol. 3, no. 6, pp. 36–43, 2016.
[31] Y. Xu, L. Qi, W. Dou, and J. Yu, Privacy-preserving and scalable service recommendation based on SimHash in a distributed cloud environment, Complexity, vol. 2017, p. 3437854, 2017.
[32] H. Cai, V. W. Zheng, and K. C. C. Chang, A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Trans. Knowl. Data Eng., vol. 30, no. 9, pp. 1616–1637, 2018.
[33] A. Sherstinsky, Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network, Phys. D: Nonlinear Phenom., vol. 404, p. 132306, 2020.
[34] F. Tacchino, P. Barkoutsos, C. Macchiavello, I. Tavernelli, D. Gerace, and D. Bajoni, Quantum implementation of an artificial feed-forward neural network, Quantum Sci. Technol., vol. 5, no. 4, p. 044010, 2020.
[35] C. Luo, J. Zhan, X. Xue, L. Wang, R. Ren, and Q. Yang, Cosine normalization: Using cosine similarity instead of dot product in neural networks, in Proc. 27th Int. Conf. Artificial Neural Networks, Rhodes, Greece, 2018, pp. 382–391.
[36] A. O. Almagrabi and A. Bashir, A classification-based privacy-preserving decision-making for secure data sharing in internet of things assisted applications, Digital Commun. Netw., vol. 8, no. 4, pp. 436–445, 2022.
[37] C. Wang, X. Wu, G. Liu, T. Deng, K. Peng, and S. Wan, Safeguarding cross-silo federated learning with local differential privacy, Digital Commun. Netw., vol. 8, no. 4, pp. 446–454, 2022.
[38] Y. Zheng, Z. Li, X. Xu, and Q. Zhao, Dynamic defenses in cyber security: Techniques, methods and challenges, Digital Commun. Netw., vol. 8, no. 4, pp. 422–435, 2022.
[39] S. Pesme and N. Flammarion, Online robust regression via SGD on the ℓ1 loss, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, p. 214.
[40] X. Luo, W. Qin, A. Dong, K. Sedraoui, and M. Zhou, Efficient and high-quality recommendations via momentum-incorporated parallel stochastic gradient descent-based learning, IEEE/CAA J. Autom. Sin., vol. 8, no. 2, pp. 402–411, 2021.
[41] L. Qi, W. Lin, X. Zhang, W. Dou, X. Xu, and J. Chen, A correlation graph based approach for personalized and compatible web APIs recommendation in mobile APP development, IEEE Trans. Knowl. Data Eng., vol. 35, no. 6, pp. 5444–5457, 2023.

[42] J. Tang, H. Gao, H. Liu, and A. Das Sarma, eTrust: Understanding trust evolution in an online world, in Proc. 18th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Beijing, China, 2012, pp. 253–261.
[43] S. Lipošek, J. Planinšec, B. Leskošek, and A. Pajtler, Physical activity of university students and its relation to physical fitness and academic success, Ann. Kinesiologiae, vol. 9, no. 2, pp. 89–104, 2019.
[44] S. Lai, K. Liu, S. He, and J. Zhao, How to generate a good word embedding, IEEE Intell. Syst., vol. 31, no. 6, pp. 5–14, 2016.
[45] F. Wang, H. Zhu, G. Srivastava, S. Li, M. R. Khosravi, and L. Qi, Robust collaborative filtering recommendation with user-item-trust records, IEEE Trans. Comput. Soc. Syst., vol. 9, no. 4, pp. 986–996, 2022.
[46] Y. Yang, X. Yang, M. Heidari, M. A. Khan, G. Srivastava, M. R. Khosravi, and L. Qi, ASTREAM: Data-stream-driven scalable anomaly detection with accuracy guarantee in IIoT environment, IEEE Trans. Netw. Sci. Eng., vol. 10, no. 5, pp. 3007–3016, 2023.
[47] L. Qi, Y. Yang, X. Zhou, W. Rafique, and J. Ma, Fast anomaly identification based on multiaspect data streams for intelligent intrusion detection toward secure industry 4.0, IEEE Trans. Ind. Inf., vol. 18, no. 9, pp. 6503–6511, 2022.

Xuan Yang received the MS degree in chemical engineering from Qilu University of Technology, China in 2014. He is currently a PhD candidate in information technology at Angeles University Foundation, Philippines. His main research interests include business intelligence and recommendation system.

James A. Esquivel received the BEng degree in mathematics and computer science from University of Santo Tomas, Philippines in 1989, the MEng degree in computer science from De La Salle University, Philippines in 1998, and the PhD degree in information technology from Angeles University Foundation, Philippines in 2014. He is currently a professor of computer science at Angeles University Foundation, Philippines. His main research interests include machine learning and predictive analytics.
