Enabling Efficient Blockage-Aware Handover in RIS-Assisted Mmwave Cellular Networks
Authorized licensed use limited to: University of Glasgow. Downloaded on September 06,2022 at 20:02:23 UTC from IEEE Xplore. Restrictions apply.
2244 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 21, NO. 4, APRIL 2022
elements in the reflecting arrays, RIS can create an extra strong reflective path [18]–[21]. The process of RIS-assisted handover consists of the following stages: a) once a link blockage occurs, the handover agent analyzes, in the angle domain, the channel blockage status of the affected channel clusters of the reflective channels from the RIS to the UE and of the direct channels from the BS to the UE; b) it reconfigures the wireless scattering environment, i.e., the weights of the phase shift elements in the RIS, to either recover the currently blocked links or rapidly facilitate UE handover. Therefore, RIS in mmWave cellular networks has great potential to decrease redundant handovers.
Unfortunately, it is non-trivial to develop a handover scheme that fully leverages the benefits of RIS to reduce the number of handovers. In particular, we face the following challenges. a) To restore blocked links, merely configuring the phase shifting array of the RIS is not sufficient. Due to the high directivity of the mmWave signal, the received signal beams reflected from the RIS are highly dominated by the beamformers of the BSs. If the signals are not on the broadside of the BS beamformers, they will be suppressed or even nulled by the sidelobes of the beams. b) Prior to performing RIS-assisted link restoration or UE handover, a trade-off is needed under the various channel blockage effects. For instance, recovering the communication links under channel blockage may directly reduce the handover consumption. However, for mmWave channels under severe blockage, that might result in a low data rate. To this end, an optimal handover policy balancing RIS-assisted channel restoration and UE handover is needed. c) To combat fast channel blockage, time-critical UE handover is needed, and the handover decision should be made based on a small number of channel observations filtered by the current beamformers and RIS phase shifts [11], [22]. For instance, in [22], limited channel measurements (only 4 time slots) are collected at the UE beam alignment stage in response to the fast fading of the mmWave channel. Such limited channel observations leave the blockage status of the remaining beamformers/RIS phase shifts unknown. As a consequence, the performance of real-time UE handover may be negatively affected.
To address the aforementioned challenges in RIS-assisted handover schemes, we propose the following measures. First, to avoid being suppressed by the beamformer at the BS, a joint design of the RIS phase shift elements and the BS beamformer is developed to exploit the enriched scattering environment. This joint design is based on deep reinforcement learning (DRL), where a central controller optimizes the handover process by using a Markov decision process (MDP) according to the observations from the dynamic environment. With the joint BF-RIS design, the broadside of the beampatterns can be aligned with the RIS reflective channels to mitigate channel blockage. Second, to choose carefully between link restoration and UE handover with the objective of reducing the cumulative handover overhead, we propose a handover scheme based on DRL. Prioritized experience replay (PER) is incorporated in the training process of our DRL agent model, which accelerates the convergence of training and improves the performance of the DRL model. Finally, to address the limited observations of the blockage status of beamformers and RIS, we develop a lightweight algorithm based on the derived closed-form results to estimate the link blockage coefficients, so that the channel blockage status can be estimated from a limited number of measurements. The estimated channel blockage information serves as the input for the learning model. Such a measure gives the PER-DRL agent a clear picture of the blockage status on each cluster and thus can further improve the performance.
In this work, with the assistance of the RIS, we provide an efficient handover scheme in response to frequent mmWave channel blockages. We conduct extensive numerical simulations to validate the analysis and incorporate our design into three schemes: RIS-assisted UE handover, BF-assisted UE handover and SNR threshold-based UE handover. The results show that: a) the RIS-assisted UE handover scheme is able to combat mmWave channel blockage and can significantly reduce the handover overhead; b) compared with the schemes without considering the RIS, the joint design for RIS phase shifts and BS beamformers outperforms the schemes relying merely on BF design or SNR threshold-based UE handover; c) the blockage status information of all channel clusters is estimated by the proposed algorithm. With a better overview of blockage over various beamformers, a desirable handover performance is achieved even with a limited number of channel observations.
Our main contributions in this work are as follows:
• For the first time, we propose a RIS-assisted handover scheme for UEs in mmWave cellular networks by jointly designing the RIS phase shifts and the beamformers at BSs, which is able to leverage the rich scattering environment enabled by the RIS to either recover the channel blockage or facilitate UE handover. It is proved that RIS can serve as an alternative approach to combat the blockage effect and largely reduce the number of handovers triggered by frequent channel blockages.
• To efficiently perform RIS-assisted link restoration or UE handover, PER is incorporated in our handover scheme. We conduct extensive numerical simulations and the results show that the PER-based training strategy can outperform the double deep Q-Network (DDQN)-based scheme.
• To alleviate the impact of limited environment observations on the training process, we propose a lightweight algorithm to sense the link blockage status for clusters within RIS reflective channels and direct channels. Along with other information from the environment, the sensing results can serve as the input for the learning model. In this way, the DRL agent can obtain an in-depth estimate of the blockage status over a wide variety of beampatterns. Numerical results show a significant advantage of the proposed algorithm over training strategies with limited observations.
The rest of this paper is organized as follows. In Section II, we discuss the related works. In Section III, we introduce the system model of RIS-assisted mmWave cellular networks. Then the reward prediction under link blockage is presented in Section IV. Section V proposes the PER-DRL-based handover
JIAO et al.: ENABLING EFFICIENT BLOCKAGE-AWARE HANDOVER 2245
2) Channel Coefficient of Reflective Channels: Considering the RIS channels, for given RIS weights $\Phi$, the channel coefficient of the BS-RIS-UE channel can be given as:
$$h^{ru}_{g k_b} = \sum_{n_u=1}^{P_{ru}} \sum_{n_b=1}^{P_{rb}} A_{ru}(\theta_{n_u}, \psi_{n_b}, \Phi_g)\, A_{rb}(\theta_{n_b}, w_{k_b}) \times a_{n_b} a_{n_u}, \quad (3)$$
where $A_{ru}(\theta_{n_u}, \psi_{n_b}, \Phi_g) = a^H(\theta_{n_u}) \circ [\Phi_1, \ldots, \Phi_M]_g\, a(\psi_{n_b})$ and $A_{rb}(\theta_{n_b}, w_{k_b}) = a^H(\theta_{n_b}) w_{k_b}$ are the equivalent beampatterns for the reflective channels RIS-UE and RIS-BS. Here $\circ$ is the Hadamard product. $n_u$ and $n_b$ denote the cluster indices of the channels $h_{r,bu}$ and $G_b$, respectively. $a(\theta_{n_u})$ is the antenna steering vector for the channel $h_{r,bu}$. $a^H(\theta_{n_u})$ and $a(\psi_{n_b})$ are the antenna steering vectors for the reflective channel $G_b$. The detailed definitions can be found in [37]. Note that the RIS matrix $\Phi_g$ has been incorporated in the equivalent beampattern $A_{ru}(\theta_{n_u}, \psi_{n_b}, \Phi_g)$. We assume the blockage coefficient of channel $h_{r,bu}$ for the cluster $n_u$ is $c_{n_u}$. $a_{n_b}$ and $a_{n_u}$ are the complex channel gains. There are various channel estimation schemes for the RIS-assisted communication
where $h^{ru}_{g k_b}$ denotes the equivalent channel coefficient for the BS-RIS-UE channel. For the reflective channels, the RIS weights $\Phi$ are contained in $h^{ru}_{g k_b}$, which is defined by the antenna patterns $s$ and $k_b$ at the RIS and BS $b$, respectively. $h_{ub}(k_b)$ is the channel coefficient of the BS-UE channel from BS $b$ to UE $u$. $P_t$ denotes the transmission power. $N_0$ is the noise.
To track several BSs simultaneously, each UE broadcasts uplink sounding reference signals in dedicated slots, steering through $K$ beampatterns, one at a time, to cover the whole angular space. Each BS $b$ has $K$ SINR measurements $y_{ub} = [\mathrm{SINR}^{1}_{ub}, \ldots, \mathrm{SINR}^{K}_{ub}]$ at UE $u$. For each UE $u$, each mmWave gNB collects sounding reference signals (SRS) and the central coordinator fills a report table (RT) for UE $u$ based on the SRSs, as in Table I. Each entry represents the collected SRS and power in dB, transmitted through various beampatterns. $s \in \{1, \ldots, S\}$ denotes the index of the SINR samples that the mmWave gNB collected.
Based on the observations of directional communication at the mmWave band [11], the mmWave links show a strong inter-beam correlation (coefficient > 0.5) in the channel coefficient. This indicates that we can still observe the spatial correlation on SINR across different beampatterns even when the
blockage seems to land on a single beam [11]. Under frequent blockage effects, the function above needs to take the blockage into account. If we obtain partial information related to link blockage, the handover decision process under such a scenario can be further optimized.
2) Coordinator for UE Handover: For the RT filled for each UE, each mmWave cell sends this information, through the X2 link, to the coordinator [23]. The coordinator builds a complete report table (CRT). After assessing the CRT, the handover decision for the optimal mmWave gNB with the optimal beampatterns is evaluated for each UE, considering side information like SINRs and power. The criterion for efficient UE handover is described in Section V.
IV. REWARD PREDICTION UNDER LINK BLOCKAGE
To achieve a good trade-off between link restoration and UE handover, the blockage status over various channel clusters is needed. In this section, we introduce our blockage status prediction algorithm, which predicts the link blockage status and the reward for beam tracking.
A. Blockage Status Prediction Based on Limited Channel Observations
According to the observation in [11], given a 60 GHz phased array with multiple beam directions, blockage of one beam affects the performance of other beams. When obstacles block a certain angular cluster, all beams can be affected in a correlated way. The authors in [11] find that such spatial correlation among beampatterns can be assessed by a modeling framework.
Suppose the measured channel coefficients filtered by beampattern $(g, k_b)$ can be represented as $[h^{ub}_{g k_b}]_{ms}$. Based on the prediction model in [11], the extent of blockage effect on
By incorporating the blockage coefficients for each cluster, we can reformulate the measured channel coefficients from beampattern $(s, k_b)$ as:
$$h^{rep}_{(g k_b)}\big((u, b); (b, c); p\big) = \sum_{i=1}^{P_{ub}} A_{k_b}(\theta_i, w_{k_b}) \times b_i \times a_i + \sum_{n_u=1}^{P_{ru}} \sum_{n_b=1}^{P_{rb}} A_{ru}(\theta_{n_u}, \psi_{n_b}, \Phi_g)\, A_{rb}(\theta_{n_b}, w_{k_b}) \times c_{n_u} \times a_{n_b} a_{n_u}. \quad (5)$$
Based on the parameter $p = \{\theta_i, a_i, \psi_i, \ldots\}$, the vectors $b$ and $c$ are the unknown blockage coefficients which need to be approximated. For ease of notation, we use $a_{nb} a_{nr}$ to indicate the product of path gains for clusters $nr$ and $nb$, which includes the phase information. With observed channel coefficients from beampattern $(g, k_b)$, if we put $\{b_i, c_{n_u}\}$, $i \in \{1, \ldots, P_{ub}\}$, $n_u \in \{1, \ldots, P_{ru}\}$ into a single vector $\bar{b}$, the approximation of the blockage coefficient vector $\tilde{b}_{s,k_b}$ can be represented as below:
$$\bar{b}_{ub} = \min_{\bar{b}} \left| [h^{ub}_{s k_b}]_{ms} - h^{rep}_{(g k_b)}\big((u, b); \bar{b}; p\big) \right|^2, \quad (6)$$
where we append the two vectors into $\bar{b} = [b^T, c^T]^T$. In [11], the authors develop a table searching scheme to approach the actual $b$. The authors quantize every single blockage coefficient $b_i$, and the quantized $b_i$ is supposed to approach the actual $b_i$. The performance of table searching is greatly affected by the quantization level of $b_i$, and the global minimum cannot be captured by a low-resolution table. Besides, a high quantization level requires a large memory.
B. Channel Blockage Prediction Algorithm
In this work, we aim to develop a lightweight algorithm to approximate the blockage coefficient $b$. In the first step, we can reformulate the objective function in (6) as below:
$$\min_{\bar{b}} \ \big([h^{ub}_{g k_b}]_{ms} - (h^{opt}_{(g,k_b)})^T \bar{b}\big)^{*} \big([h^{ub}_{g k_b}]_{ms} - (h^{opt}_{(g,k_b)})^T \bar{b}\big), \quad 0 \le \bar{b}_i \le 1,\ i \in \{1, \ldots, P\}, \quad (7)$$
where $[h^{ub}_{g k_b}]_{ms}$ is the channel coefficient observed by beampattern $k_b$, $h^{opt}_{(g,k_b)}$ is the result for the channel coefficient parameters and $\bar{b} = [b^T, c^T]^T$ indicates the blockage coefficients. $h^{opt}_{(g,k_b)}$ can be expressed as
$$h^{opt}_{(g,k_b)} = \big[A_{k_b}(\theta_1, w_{k_b}) \times a_1, \ldots, A_{ru}(\theta_{P_{ru}}, \psi_{P_{rb}}, \Phi_g)\, A_{rb}(\theta_{P_{ru}}, w_{k_b}) \times a_{P_{ru}} a_{P_{rb}}\big]. \quad (8)$$
The objective in (7) can be further simplified to
$$\big([h^{ub}_{g k_b}]_{ms} - (h^{opt}_{(g,k_b)})^T \bar{b}\big)^{*} \big([h^{ub}_{g k_b}]_{ms} - (h^{opt}_{(g,k_b)})^T \bar{b}\big) = \big|[h^{ub}_{g k_b}]_{ms}\big|^2 - 2\,\mathrm{Re}\big\{[h^{ub}_{g k_b}]^{*}_{ms} (h^{opt}_{(g,k_b)})^{H} \bar{b}\big\} + \bar{b}^{T} A \bar{b}, \quad (9)$$
Proposition 1: Matrix $A$ is positive definite.
Proof: For any vector $x \neq 0$, $x^T A x$ can be expressed as:
$$x^T A x = \sum_{i} x_i^2 \cdot \big|h^{opt}_{(g,k_b)}(i)\big|^2 + \sum_{i \neq j} 2 x_i x_j \cdot \mathrm{Re}\big\{h^{opt}_{(g,k_b)}(i)\, h^{opt\,*}_{(g,k_b)}(j)\big\}. \quad (10)$$
We can observe that in (10), the vector $x$ can be replaced by the link coefficient vector $b$, which only contains positive numbers. The $x_i x_j$ in (10) is positive and the matrix product satisfies $x^T A x > 0$. Therefore, matrix $A$ is positive definite.
Based on the above observation, the objective in (9) can be further simplified to:
$$\min_{\bar{b}} \ \frac{1}{2} \bar{b}^{T} A \bar{b} - \mathrm{Re}\big\{[h^{ub}_{g k_b}]^{*}_{ms} (h^{opt}_{(g,k_b)})^{H}\big\} \bar{b}, \quad 0 \le \bar{b}_i \le 1,\ i \in \{1, \ldots, P\}. \quad (11)$$
We can usually solve this problem using quadratic optimization techniques. However, such approaches need to iteratively
reduce the sum of the squares of the errors through a sequence of updates, which incurs a large training overhead for our scheme.
To further reduce the computational complexity, the inequalities serving as constraints in (11) can be incorporated into the objective function with a Lagrangian multiplier, so that we can obtain a closed-form result for (11) as follows:
Proposition 2: A closed-form solution for problem (11) can be given as:
$$\bar{b}^{*} = A^{-1}\Big(-\mathrm{Re}\big\{([h^{ub}_{g k_b}]_{ms})^{*} (h^{opt}_{(g,k_b)})^{H}\big\} - \lambda^{*} I\Big)^{+}_{P}. \quad (12)$$
Proof: The parameter $\lambda^{*}$ is the Lagrangian multiplier in the dual problem. We prove in the Appendix that the solution in (12) meets the KKT conditions and thus achieves dual optimality. In order to estimate $\lambda^{*}$, we developed a lightweight algorithm based on the ellipsoid method, which has a high convergence rate. According to the link blockage coefficient $\bar{b}^{*}$, our scheme can be aware of link blockage and choose the handover policy based on the predicted performance of the beam tracking.
V. DRL-BASED HANDOVER SCHEME
In this section, we present the DRL-based handover scheme for the mmWave cellular network [40], [41], which is able to tackle systems with a large state-action space.
contained in the state space $s_t$. In practice, we apply data processing techniques like normalization to the input data [40].
3) Agent: In this work, the RIS-assisted communication system acts as the environment. A central controller is responsible for handovers and is regarded as the learning agent. The agent needs to be trained offline according to a design objective. The agent tries to reduce the number of handovers while maintaining a high data rate under frequent link blockages, which requires maximizing the cumulative reward after a series of handover actions. For every handover action, if we denote its reward as $r_t$, the cumulative reward function $C_u$, during a time period, is defined as
$$C_u = \sum_{t=t_1}^{t_2} r_t^{u}, \quad (15)$$
where $t_1$ and $t_2$ are two epochs. $r_t^{u}$ denotes the instantaneous reward for a time slot with duration $\Delta T$. Its definition usually needs to consider factors like SINR, beampattern, time slot length and so on.
4) Reward: For every time epoch, our objective is to minimize the total handover consumption while maintaining high channel quality to provide a sufficient data rate. The reward $r(s_t, a_t)$ under state $s_t$ and handover action $a_t$ can be represented as below:
$$r(s_t, a_t) = \begin{cases} (\Delta T - \tau_b)\, \dfrac{W}{U_i} R_{ui}, & \text{if } i_{HO} = i_{t-1}, \\ (\Delta T - \tau_a - \tau_H)\, \dfrac{W}{U_i} R_{ui}, & \text{if } i_{HO} \neq i_{t-1}, \end{cases} \quad (16)$$
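As a rough illustration of (15) and (16), the following sketch computes the per-slot reward and accumulates it over a window of epochs. This excerpt does not define $\tau_b$, $\tau_a$, $\tau_H$, $W$, $U_i$ or $R_{ui}$, so the code treats them as opaque numeric inputs; the class, function and field names are hypothetical and chosen only for readability.

```python
from dataclasses import dataclass


@dataclass
class SlotParams:
    """Inputs of (16). Field names are illustrative assumptions."""
    delta_t: float    # slot duration, Delta T
    tau_b: float      # overhead term tau_b (case: serving BS is kept)
    tau_a: float      # overhead term tau_a (case: handover occurs)
    tau_h: float      # overhead term tau_H (case: handover occurs)
    bandwidth: float  # W
    load: float       # U_i
    rate: float       # R_ui


def slot_reward(p: SlotParams, target_bs: int, previous_bs: int) -> float:
    """Instantaneous reward r(s_t, a_t), following the two cases of (16)."""
    if target_bs == previous_bs:
        # i_HO = i_{t-1}: no handover, only tau_b is subtracted.
        useful_time = p.delta_t - p.tau_b
    else:
        # i_HO != i_{t-1}: handover, tau_a and tau_H are subtracted.
        useful_time = p.delta_t - p.tau_a - p.tau_h
    return useful_time * (p.bandwidth / p.load) * p.rate


def cumulative_reward(rewards, t1, t2):
    """C_u in (15): sum of r_t^u for t = t1, ..., t2 (both inclusive)."""
    return sum(rewards[t1:t2 + 1])
```

With these definitions, a handover pays off only when the rate gained at the target BS outweighs the extra time lost to $\tau_a + \tau_H$ relative to $\tau_b$, which is the trade-off the agent is trained to resolve.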
An action $a_t$ is the output of the UE handover policy at time epoch $t$, and state $s_t$ is the observation of the environment under the previous action. Under the policy $\pi$, the expectation of the discounted reward conditioned on the state $s_t$ and action $a_t$ is called the state-action value function, which is usually given as:
$$Q(s_t, a_t) = \mathbb{E}\left[\sum_{\tau=0}^{\infty} \gamma^{\tau} r_{t+\tau+1} \,\Big|\, s_t, a_t\right]. \quad (18)$$
To obtain a better policy, in Q-learning we usually consider the greedy policy strategy, and the updated policy can be represented as
$$\pi'(s_t) = \arg\max_{a} Q^{\pi}(s_t, a). \quad (19)$$
In this case, according to the policy improvement theorem, the time difference (TD) is given as
$$\mathrm{TD}(s_{t+1}, a_{t+1}) = r(s_{t+1}, a_{t+1}) + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t). \quad (20)$$
Note that in the process of reinforcement learning, the state-action function/state function is updated iteratively. To capture the random changes imposed by the environment, the state-action function is updated as follows:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\, \mathrm{TD}(s_{t+1}, a_{t+1}), \quad (21)$$
where $\alpha$ is the learning rate, which determines the pace at which the agent adapts to random changes in the environment.
In the RL algorithm, we aim to find an optimal policy that achieves the largest cumulative reward. According to the policy update in (19), we can get a new policy $\pi'(s_t)$ so that the condition $V^{\pi}(s) \le Q^{\pi}(s, \pi'(s))$ is satisfied, where $V^{\pi}(s)$ is the value function. According to this observation, the following condition can be given:
$$Q^{\pi}(s, \pi'(s)) = \mathbb{E}_{\pi}\big[r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s\big] \le \mathbb{E}_{\pi}\big[r_{t+1} + \gamma Q^{\pi}(s_{t+1}, \pi'(s_{t+1})) \mid s_t = s\big]. \quad (22)$$
By following this observation, we can continue to expand this inequality for $Q^{\pi}(s_{t+1}, \pi'(s_{t+1}))$. In the end, we can derive that $V^{\pi}(s) \le V^{\pi'}(s)$. Then the policy $\pi'$ must obtain a greater or equal expected return from all states $s_t \in S$. If the RL agent repeats this iterative process, we can continuously observe the performance improvement. In this work, the instantaneous reward consists of the cost due to the handover consumption and the spectrum efficiency.
The traditional RL scheme, i.e., Q-learning, has limited performance in a dynamic environment. For example, in this work, the state space is continuous and has vast combinations of elements. With such a state space, the Q-learning agent takes a long time to explore the environment and obtain its knowledge, which causes a slow convergence rate of training. In this paper, we propose to utilize DDQN, which is based on deep learning techniques; this concept was developed by Google [43].
C. Double Deep Q-Network (DDQN)
DDQN is proposed to improve the training efficiency of the DNN Q-value estimator, so that the agent can be applied to solve the optimization problem under a high-dimensional input state space. The state-action value function can be estimated by a non-linear estimator or by neural networks. Parameterized by the network coefficient $\theta$, to approximate a given state-action value function $Q(s_t, a_t)$, the loss function between $Q(s_t, a_t; \theta)$ and $Q(s_t, a_t)$ has to be minimized, and it is expressed as
$$L(\theta) = \frac{1}{N} \sum_{t \le N} \big(Q(s_t, a_t) - Q(s_t, a_t; \theta)\big)^2. \quad (23)$$
For ease of presentation, we take the mean squared error as the loss function $L(\theta)$, which requires an appropriate gradient descent algorithm to minimize the loss and adjust the parameter $\theta$. According to the universal approximation (UA) theorem, for any Lebesgue-integrable function $f$, there exists a fully connected Rectified Linear Unit (ReLU) network $\theta$ for which the integral of the loss function is smaller than any positive $\epsilon$. Two issues need to be addressed: non-i.i.d. input data and non-stationary targets. To address these issues, experience replay and target Q-Networks have been proposed for the DDQN to obtain knowledge from the environment more efficiently [43].
1) Experience Replay: DRL utilizes the experience replay unit to help the DNN memorize the sequential knowledge from the environment. At each time instant $t$, the agent stores its interaction experience tuple $e_t = \{s_t, a_t, r(s_t, a_t), s_{t+1}\}$ into the experience replay memory $D_t = \{e_1, \ldots, e_t\}$. During training, the stored samples are randomly extracted from the replay memory based on the experience replay mechanism. The motivation for experience replay is to break the correlation among the sequential data by randomly sampling data in $D_t$. In this way, the mechanism removes the correlation between the samples, thereby accelerating convergence and avoiding significant divergence.
2) Q Target Network: To serve as a non-linear state-action value function approximator, the DNN needs to learn a mapping for constantly changing input and output. In DDQN, two DNNs, $\theta^{-}$ and $\theta$, have been proposed to achieve fast convergence. DNN $\theta^{-}$ is selected to retrieve the state-action value function, and its output serves as the fixed label for a time period. The second network $\theta$ is updated in the training and learning stage. In this process, the target network is supposed to provide the target state-action value $y_j$ for the entry $j$:
$$y_j = \begin{cases} r_j, & \text{if training ends}, \\ r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^{-}), & \text{otherwise}. \end{cases} \quad (24)$$
For episode $i$, the DDQN agent tries to solve the following problem:
$$\min L(\theta_i) = \min \mathbb{E}_{e_t \sim D}\Big[\big(y_j - Q(s_j, a_j; \theta)\big)^2\Big]. \quad (25)$$
The error between the target value and the estimation is the time difference (TD) value. The gradient descent is performed with respect to the network parameters $\theta$, where the network
parameter is updated according to the partial differentiation of the function (25). After every $M$ updates, the synchronization between network $\theta^{-}$ and $\theta$ is performed.
D. Prioritized Experience Replay
PER is one of the most important improvements for RL algorithms. It is built on top of experience replay buffers. It replays more frequently the transitions with high expected learning progress, where the importance is measured by the magnitude of their TD error. This prioritization can lead to a loss of diversity, which is alleviated with stochastic prioritization, and can introduce bias, which can be corrected with importance sampling. The stochastic sampling method interpolates between pure greedy prioritization and uniform random sampling. The probability of being sampled is ensured to be monotonic in a transition's priority, while guaranteeing a non-zero probability even for the lowest-priority transition. Concretely, we define the probability of sampling transition $i$ as
$$P(i) = \frac{p_i^{\alpha}}{\sum_{k} p_k^{\alpha}}, \quad (26)$$
where $p_i \ge 0$ is the priority of transition $i$. The exponent $\alpha$ determines how much prioritization is used, with $\alpha = 0$ corresponding to the uniform case. The magnitude of the TD error (squared) is what the algorithm aims to minimize in the Bellman equation. Hence, the algorithm picks the samples with the largest error more frequently so that the DNN can minimize it. Prioritized replay introduces bias because it changes the sampling distribution in an uncontrolled fashion, and therefore changes the solution that the estimates will converge to. We can correct this bias by using importance-sampling (IS) weights:
$$w_i = \left(\frac{1}{N} \times \frac{1}{P(i)}\right)^{\beta}. \quad (27)$$
The IS in PER corrects the over-sampling with respect to the uniform distribution. These weights can be folded into the Q-learning update by using $w_i \delta_i$ instead of $\delta_i$. The accumulated gradient, denoted by $\Delta$, is given as
$$\Delta \leftarrow \Delta + w_i \delta_i \cdot \partial_{\theta} Q(s_{j-1}, a_{j-1}; \theta). \quad (28)$$
In (28), the weight $w_i$ is coupled with the TD term $\delta_i$ during training, as $w_i \delta_i$, because $\delta_i$ is multiplied with the gradient $\partial_{\theta} Q(s_{j-1}, a_{j-1}; \theta)$ following the chain rule. For stability reasons, we always normalize the weights by $1/(\max_i w_i)$ so that they only scale the update downwards. Here the $\beta$ term in the exponent controls the effect of prioritization applied on each term. Because training is highly unstable at the beginning, $\beta$ starts from small values of 0.4 to 0.6 and anneals towards one. In this process, IS corrections matter more near the end of training. Compared with the standard DDQN architecture, PER-DDQN aims to change the sampling distribution by using a criterion to define the priority of each tuple of experience in the experience replay. For instance, an experience tuple with a big difference between the prediction and the TD target indicates that the agent has a lot to learn, and thus has a high priority.
E. RIS-Assisted Handover via PER-DRL
To enable the blockage-aware handover, an agent is trained at the central controller. It is responsible for predicting the channel blockage and adjusting the beamformers and RIS phase shifts. At the beginning of training, global network information like power allocation, CSI of all users, and SINRs after beamsweeping with different beampatterns is collected. Some of the information described in Section V-A serves as the input while training the PER-DRL agent. Parameters like the discount factor $\gamma$, the exploration-exploitation factor $\epsilon$, and the exponents for prioritized replay $\alpha$ and $\beta$ are initialized at the beginning. To accelerate the learning stage, the exploration-exploitation balance has to be addressed carefully. In our scheme, the value of $\epsilon$ is updated in a monotonically decreasing manner, so that at the early stage the agent explores the environment with a high probability of taking a random action, i.e., choosing beampattern indexes for the BSs and the RIS and performing frequent handovers. As the number of training episodes increases, the output of the non-linear estimator of the state-action value function becomes more reliable and gradually converges. The instantaneous reward is obtained by the agent for each action. At the same time, the agent also gets the latest environment observations for the state. For example, the transition $(s_{t-1}, a_{t-1}, r_t, \gamma_t, s_t)$ has to be stored in a buffer. The experience in the replay buffer is selected by the PER scheme to generate mini-batches, which are used to train the DQN.
After meeting the convergence threshold of PER-DDQN, the model is saved. Due to the PER strategy, the channel variation at the mmWave frequency band can be tolerated in the trained model. The trained model is applied at the central controller to perform the handover for RIS-assisted mmWave cellular networks. In general, the model training is performed offline while the agent implementation stage is performed in real time. For a different network implementation, the PER-DDQN agent needs to be trained again to incorporate the dynamics of the environment.
F. Implementation of PER-DRL Agent
We provide the details regarding the training of the PER-DDQN agent and the metrics used to test the performance of the proposed schemes.
1) Setups for RIS-Assisted mmWave Cellular Networks: 3 BSs with $N$ antenna elements are located in a 400 m × 400 m rectangular plane, each with a radius of 50 m. The RIS consists of $M = 16$ phase shifting elements in most cases. The overlap between adjacent BSs is covered by the RIS. The coordinates of the BSs are pre-defined, i.e., BS 1, 2, and 3 are located at (0, 0), (100, 0) and (50, −80) in meters (m). The RIS is located at (50, −25) in meters (m). The same beamforming codebook [19] is considered in this work, which includes a set of specified beamformers and weights for the RIS phase shifting matrix. The resolved channel information indicated by the tuple is stored with the previous samples, which serves as the input for the DNN.
2) Training of the PER-DRL Agent: The large variation related to severe mmWave channel fading and different UE locations can be seen in the magnitude of the samples,
which usually varies from 10−9 to 10−14 . In this sense, Algorithm 1 PER-DRL-Based Handover Scheme
we normalized the samples in the transition tuples to realize 1: Initializaiton :
the dataset scaling. For the DDQN model, the upper limit Channel representation p, SINRs {SINRb }b∈B ;
for the episode number is 10,000 and the batchsize is set to The handover agent initializes the step-size η;
be 64. For the DQN training model, the experience replay size Minibatch k, replay period K;
is set to be 8,000 and the buffer size for PER-DDQN is set to Parameters γ, , exponents α, β;
be 100,000, where the sampling bias and importance weight are defined as in (27). To adapt to the more complex datasets caused by channel blockage in the mmWave frequency band, and to generalize better to previously unseen signal blockage scenarios, we use a multi-layer neural network for the PER-DDQN model. In this work, after balancing training efficiency against handover performance, we adopt 6 feedforward fully connected layers for the PER-DRL model: the input layer, four hidden layers, and the output layer. In this way, the DNN can describe the mapping between state transitions and actions, where the actions represent the BS beamformers and the RIS reflecting beamforming matrix. The sizes of the 4 hidden layers are 500, 250, 200, and 200, respectively. Each neuron in a hidden layer is connected to all the outputs of the previous layer. Each hidden layer uses the ReLU activation function, and the output layer is a linear perceptron. To achieve an efficient interaction between the agent and the environment, the training data covering all of the random dynamics is stored.

Central controller initializes handover;
Beamformers and RIS;
Initialize replay memory $H = \emptyset$, $\Delta = 0$, $p_1 = 1$;
The weight vector $\theta$ of the Q-network;
The weight vector $\theta^-$ of the target Q-network.
for $t = 1$ to $T$ do:
    Obtain $s_t$, $r_t$;
    Store transition $(s_{t-1}, a_{t-1}, r_t, \gamma_t, s_t)$ in $H$;
    With maximal priority $p_t = \max_{i<t} p_i$;
    if $t \equiv 0 \bmod K$ then:
        for $j = 1$ to $k$ do:
            Sample transition $j \sim P(j) = p_j^{\alpha} / \sum_i p_i^{\alpha}$;
            IS weight $w_j = (N \cdot P(j))^{-\beta} / \max_i w_i$;
            $\delta_j = r_j + \gamma \max_a Q(s_t, a) - Q(s_{j-1}, a_{j-1})$;
            Update transition priority $p_j \leftarrow |\delta_j|$;
            $\Delta \leftarrow \Delta + w_j \delta_j \cdot \partial_\theta Q(s_{j-1}, a_{j-1}; \theta)$;
        end for
        Update weights $\theta \leftarrow \theta + \eta \cdot \Delta$, reset $\Delta = 0$;
        Update weights of target network $\theta^- \leftarrow \theta$;
    end if
    Evaluate the SINRs for UEs;
    Evaluate consumption on the handover;
    Configure RIS weight and handover $a_t \sim \pi_\theta(s_t)$;
end for

3) Computational Complexity Analysis: From Algorithm 1, collecting transitions and executing back-propagation to train the parameters account for the computational complexity. We consider neural networks with weights and activation functions. If we use a sequence $(w_1, \ldots, w_L)$ to denote the number of nodes in each layer, we obtain the following result.
Theorem 1: After considering the PER mechanism,
the computational complexity of DDQN can be written as:
$$I_1 = O\left( \left( T - \frac{T - T \bmod K}{K} \right) \Big( \sum_i w_i w_{i+1} \Big) + \frac{T - T \bmod K}{K}\, 3k \sum_i w_i w_{i+1} + \log_2 |D| \right).$$
Proof: For each time step, the computational complexity of the DNN is $O(\sum_i w_i w_{i+1})$. Considering the mini-batch size as $k$, it requires $T$ episodes. After every $K$ transitions, the mini-batch operation has computational complexity $O\left( \frac{T - T \bmod K}{K}\, 3k \sum_i w_i w_{i+1} \right)$, which includes the back-propagation and feedforward passes. According to [19], the computational complexity of PER is $O(\log_2 |D|)$. There are $T - \frac{T - T \bmod K}{K}$ iterations needed to generate transitions, with computational complexity $O\left( \left( T - \frac{T - T \bmod K}{K} \right) \Big( \sum_i w_i w_{i+1} \Big) \right)$.
In our proposed scheme, PER is used to improve the learning efficiency and speed up convergence, which requires extra computation. However, with the PER unit, the proposed scheme can achieve better performance than the classical DDQN scheme.

VI. NUMERICAL RESULTS
We consider a mmWave Massive MIMO system adopting analog phase shifters with a single radio-frequency (RF) chain. The BSs adopt a uniform linear array (ULA) with 64 antennas. The UE has a single antenna. The spatial-matched-filter beam pattern is applied in this work. The system operates at 28 GHz with a bandwidth of 100 MHz. Channel reciprocity holds within the coherence time. The channel parameters are set as follows. Within the channel coherence time, we assume the number of clusters is distributed as specified in [11]. The noise power $N_0$ is expressed as $N_0 = 10^{N_0(\mathrm{dB})/10}$, where $N_0(\mathrm{dB}) = -174 + 10\log_{10}(BW) + F_{\mathrm{dB}}$ and $F_{\mathrm{dB}} = 10$ dB is the noise figure. To better predict the required SINR over a period, the locations of the BSs are modeled as a Poisson Point Process (PPP) with density $\lambda_m$. To emulate user mobility, we adopt the Random Way Point (RWP) model with a moving speed of 2 m/s. The prioritization is proportional, with $p_i = |\delta_i| + \epsilon$, where $|\delta_i|$ is the magnitude of the TD error of transition $i$. The hyperparameters were initialized as $\alpha = 0.7$, $\beta = 0.5$. To evaluate the performance of the proposed scheme, we compare our scheme in RIS-assisted mmWave cellular networks with the following schemes:
• Considering PER in the training strategy, the DDQN based handover scheme is able to combat frequent channel blockage; it is denoted as RIS-SENS-DDQN. Here SENS stands for the blockage coefficients resolved by the proposed algorithm. To determine an appropriate handover policy, the value of the Q function is estimated by the DQN algorithm.
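The proportional prioritization and importance-sampling (IS) weights used in the setup above can be illustrated with a minimal sketch. It assumes the standard PER forms P(i) = p_i^α / Σ p_k^α and w_i = (N · P(i))^(−β) normalized by the largest weight, with α = 0.7 and β = 0.5 as stated. The function names, the value of the small constant `eps`, the TD errors, and the choice to normalize over all stored transitions are illustrative assumptions, not taken from the paper.

```python
import random

def per_probabilities(td_errors, alpha=0.7, eps=1e-6):
    """Proportional prioritization: p_i = |delta_i| + eps, P(i) = p_i^alpha / sum_k p_k^alpha.

    eps is a hypothetical small constant that keeps zero-error transitions sampleable.
    """
    priorities = [abs(d) + eps for d in td_errors]
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def importance_weights(probs, beta=0.5):
    """IS weights w_i = (N * P(i))^(-beta), divided by the largest weight."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    w_max = max(raw)
    return [w / w_max for w in raw]

def sample_minibatch(probs, k, rng=random):
    """Draw k transition indices (with replacement) according to P(i)."""
    return rng.choices(range(len(probs)), weights=probs, k=k)

# Made-up TD errors for four stored transitions (illustrative only).
td_errors = [0.1, -2.0, 0.5, 0.0]
probs = per_probabilities(td_errors)
weights = importance_weights(probs)
batch = sample_minibatch(probs, k=2, rng=random.Random(0))
```

In this sketch the transition with the largest TD error is sampled most often but receives the smallest IS weight, which is how PER corrects the bias introduced by non-uniform sampling.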
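As a quick check of the noise model in the simulation setup, the following sketch evaluates N0(dB) = −174 + 10 log10(BW) + F_dB for the stated 100 MHz bandwidth and 10 dB noise figure. The function names are illustrative. The paper does not state units; if −174 is read as the usual thermal noise density in dBm/Hz (an assumption here), the result is in dBm and the linear value is in mW.

```python
import math

def noise_power_db(bandwidth_hz, noise_figure_db=10.0):
    """N0(dB) = -174 + 10*log10(BW) + F_dB, following the expression in the text."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz) + noise_figure_db

def db_to_linear(value_db):
    """Convert a dB value to linear scale: 10^(value_db / 10)."""
    return 10.0 ** (value_db / 10.0)

n0_db = noise_power_db(100e6)      # 100 MHz bandwidth, 10 dB noise figure
n0_linear = db_to_linear(n0_db)    # linear-scale noise power N0
```

For these parameters the sum is −174 + 80 + 10 = −84 dB, so the linear noise power is about 4 × 10^−9 in the corresponding linear unit.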
A. Prediction Accuracy of Proposed Lightweight Algorithm
In the prediction process, the UE movements follow a random way point (RWP) [44] mobility pattern with a speed of 2 m/s. We record the locations of the UE and obtain the distances from the UE to the B BSs. By default, the UE adopts a quasi-omni-directional beam pattern while the BS uses a directional beam pattern. The RIS uses weights defined similarly to the schemes in [19], [20]. An obstacle (e.g., a human body, vehicle, or building) randomly blocks clusters within the mmWave channel. Based on parameters such as the AoA, phase shift, and amplitude attenuation from the scanned CIR trace, estimated before blockage, the CIR traces from the current beam pattern are recorded. Then, the receiver employs the proposed lightweight algorithm to predict the SINRs and the corresponding reward for maintaining this link.
The proposed lightweight algorithm can efficiently estimate the link blockage status for RIS-assisted mmWave cellular networks. For instance, Fig. 2(a) shows the accuracy of the proposed algorithm when predicting the best beam pattern index for phased arrays containing 64 antenna elements. The number of available beam pattern sectors varies from 4 to 16. For a 4-beam antenna array with SNR = 12 dB, in the presence of the RIS, our lightweight prediction algorithm achieves a mean prediction accuracy of over 89%. As the number of beam patterns increases, the prediction accuracy drops to 77% for the case with 16 beam patterns. In Fig. 2, the accuracy under various levels of transmit power is shown. We observe that as the transmit power rises, the prediction accuracy gradually increases, and our algorithm outperforms the table searching algorithms [11] with quantization levels 128 and 256. In Fig. 2, we also compare the prediction accuracy of the proposed schemes under various SNRs. In the high SNR regime, the impact of noise on CSI estimation is limited, so we can obtain a precise reconstruction of the channel coefficient model and blockage status, which directly improves the prediction accuracy.
The empirical cumulative distribution function (CDF) of three schemes is shown in Fig. 3(a). The accuracy of our proposed algorithm is represented by the black curve. With a prediction accuracy as high as 90%, our algorithm has the lowest probability (10%) of failing the target. The low prediction accuracy of existing works is due to two reasons: (i) The table search is based on the assumption that the quantized link blockage coefficients can approach the actual coefficients. The quantized coefficient b does not have a sufficient updating rate to approach the global minimum; (ii) The table search based algorithm developed in [11] needs to enlarge the search space as the number of parameters increases and thus requires a large memory size, which may cause redundant computation overhead.

gering schemes [28]. We discuss the number of handovers in the following three cases. In the first case, we had 3 BSs and 6 users.
Fig. 3 shows the average handover number of 6 schemes. For a single process, we consider 500 channel blockages over various SNRs. The number of elements in the phase shifting array of the RIS is set to M = 16, and N = 64 antenna elements are considered at each BS. As expected, for a single process, the average handover number of all 6 approaches decreases monotonically with increasing SNR. After improving the SNR/transmission power, the data rate at the UEs can be maintained for instants with slight channel blockage. In addition, from Fig. 3, we observe that our proposed scheme incorporating the RIS has a lower handover number than other schemes that merely consider beam-tracking (denoted by BF-DDQN) or proactive device handover (denoted by threshold). With the cooperation of the RIS, the UE can jointly adjust the beamformer w and the RIS beampattern Φ towards channel clusters that are not blocked. Besides, with the inference of the resolved blockage status information, the schemes with blockage information, i.e., those denoted by RIS-SENS-PER or RIS-SENS-DDQN, can outperform the learning strategy that only considers state information such as SINR and channel information. In fact, our approach provides a detailed environment observation of the channel blockage status, which gives more efficient feedback on the blockage effect, while the remaining approaches cannot obtain such information in the training process. Moreover, from Fig. 3, we can observe that as the SNR increases, the handover agent gradually reduces the handover consumption. This is because high SNRs enable more accurate observations of the environment states. Having less noisy environment observations can also improve the handover performance.
Fig. 5 shows how the handover number varies under different user numbers during random blockages. In Fig. 5, the CDF of the number of handovers is presented. For each process, we conducted 500 steps. From this figure we observe that, with the inferred blockage status information, the RIS-assisted handover scheme RIS-SENS-PER obtains the smallest number of handovers, outperforming both the schemes without blockage status information and the beamformer-assisted handover scheme. Due to its ability to recover links under various blockages, our scheme can restore links with good quality and, at the same time, the overall handover procedure is further optimized by the DRL based handover scheme. From the figure, for the two learning strategies DDQN and PER-DDQN, we find that the PER based scheme has the lowest CDF. The existing schemes that rely on partial information about SINR (rate-based) or average SINRs (channel-based) do not have the ability to sense the blockage status and thus cannot optimize the overall handover process.
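The CDF comparisons of handover counts discussed above rest on an empirical CDF over repeated runs. The sketch below shows one way such a curve could be computed; the function name and the handover counts are made up for illustration and are not results from the paper.

```python
def empirical_cdf(samples):
    """Return sorted (value, F(value)) pairs, where F(value) is the fraction of samples <= value."""
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        if points and points[-1][0] == x:
            # Repeated value: keep a single point with the largest cumulative fraction.
            points[-1] = (x, i / n)
        else:
            points.append((x, i / n))
    return points

# Hypothetical handover counts from five simulated processes (illustrative only).
handover_counts = [3, 1, 2, 2, 5]
cdf_points = empirical_cdf(handover_counts)
```

Plotting the resulting pairs as a step function for each scheme gives curves of the kind compared in the text.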
VII. CONCLUSION
In this work, we developed an RIS-assisted handover scheme
based on DRL to mitigate the large handover consumption caused
by frequent link blockages in the mmWave frequency band. The
intelligent reflecting surface is utilized to enrich the scattering
and reflecting environment for signal transmission. To better
mitigate link blockage and refine the reflective channels,
the passive phase shifting array in the RIS and the beamformers
at the BSs are jointly designed. At the same time, to improve
the efficiency of model training, a DRL agent based on
PER-DDQN is developed. Numerical results show that UE
handover assisted by the RIS can operate in various blockage
scenarios and achieve a lower handover number and higher
throughput than existing schemes.
Fig. 7. The histogram of spectrum efficiency and handover number under different dataset.

APPENDIX
Proof of Proposition 2: The objective function in problem (11) involving the Lagrangian multiplier $\lambda \in \mathbb{R}^{P \times 1}$ can be represented as
$$L(b, \lambda) = \frac{1}{2} b^T A b - \mathrm{Re}\left\{ \hat{h}^{*}_{k;p} (h_k^{\mathrm{opt}})^H \right\} b + \lambda^T b. \quad (29)$$
We can get the derivative of the above function as
$$\nabla L(b, \lambda) = A b - \mathrm{Re}\left\{ \hat{h}^{*}_{k;p} (h_k^{\mathrm{opt}})^H \right\} + \lambda I_P. \quad (30)$$
[9] M. Alrabeiah and A. Alkhateeb, “Deep learning for mmWave beam and blockage prediction using sub-6 GHz channels,” IEEE Trans. Commun., vol. 68, no. 9, pp. 5504–5518, Sep. 2020.
[10] S. Collonge, G. Zaharia, and G. El Zein, “Influence of the human activity on wide-band characteristics of the 60 GHz indoor radio channel,” IEEE Trans. Wireless Commun., vol. 3, no. 6, pp. 2396–2406, Nov. 2004.
[11] S. Sur, X. Zhang, P. Ramanathan, and R. Chandra, “BeamSpy: Enabling robust 60 GHz links under blockage,” in Proc. 13th USENIX Symp. Netw. Syst. Design Implement. (NSDI), 2016, pp. 193–206.
[12] L. Yan et al., “Machine learning-based handovers for sub-6 GHz and mmWave integrated vehicular networks,” IEEE Trans. Wireless Commun., vol. 18, no. 10, pp. 4873–4885, Oct. 2019.
[13] M. Polese, M. Giordani, M. Mezzavilla, S. Rangan, and M. Zorzi, “Improved handover through dual connectivity in 5G mmWave mobile networks,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 2069–2084, Sep. 2017.
[14] Y. Xiu, Y. Zhao, Y. Liu, J. Zhao, O. Yagan, and N. Wei, “IRS-assisted millimeter wave communications: Joint power allocation and beamforming design,” 2020, arXiv:2001.07467. [Online]. Available: http://arxiv.org/abs/2001.07467
[15] D.-W. Yue, H. H. Nguyen, and Y. Sun, “MmWave doubly-massive-MIMO communications enhanced with an intelligent reflecting surface,” 2020, arXiv:2003.00282. [Online]. Available: http://arxiv.org/abs/2003.00282
[16] S. Huang, Y. Ye, M. Xiao, H. V. Poor, and M. Skoglund, “Decentralized beamforming design for intelligent reflecting surface-enhanced cell-free networks,” IEEE Wireless Commun. Lett., vol. 10, no. 3, pp. 673–677, Mar. 2021.
[17] A. Taha, M. Alrabeiah, and A. Alkhateeb, “Deep learning for large intelligent surfaces in millimeter wave and massive MIMO systems,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2019, pp. 1–6.
[18] S. Gong et al., “Towards smart wireless communications via intelligent reflecting surfaces: A contemporary survey,” 2019, arXiv:1912.07794. [Online]. Available: http://arxiv.org/abs/1912.07794
[19] H. Yang, Z. Xiong, J. Zhao, D. Niyato, L. Xiao, and Q. Wu, “Deep reinforcement learning-based intelligent reflecting surface for secure wireless communications,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 375–388, Jan. 2021.
[20] H. Yang et al., “Intelligent reflecting surface assisted anti-jamming communications based on reinforcement learning,” 2020, arXiv:2012.12761. [Online]. Available: http://arxiv.org/abs/2012.12761
[21] D. Zhao, H. Lu, Y. Wang, and H. Sun, “Joint passive beamforming and user association optimization for IRS-assisted mmWave systems,” 2020, arXiv:2007.01069. [Online]. Available: http://arxiv.org/abs/2007.01069
[22] K. Gao et al., “Beampattern-based tracking for millimeter wave communication systems,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2016, pp. 1–6.
[23] S. Kang, S. Choi, G. Lee, and S. Bahk, “A dual-connection based handover scheme for ultra-dense millimeter-wave cellular networks,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2019, pp. 1–6.
[24] A. Alkhateeb, I. Beltagy, and S. Alex, “Machine learning for reliable mmWave systems: Blockage prediction and proactive handoff,” in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Nov. 2018, pp. 1055–1059.
[25] K. Qi, T. Liu, and C. Yang, “Federated learning based proactive handover in millimeter-wave vehicular networks,” in Proc. 15th IEEE Int. Conf. Signal Process. (ICSP), Dec. 2020, pp. 401–406.
[26] Y. Koda, K. Nakashima, K. Yamamoto, T. Nishio, and M. Morikura, “Handover management for mmWave networks with proactive performance prediction using camera images and deep reinforcement learning,” 2019, arXiv:1904.04585. [Online]. Available: http://arxiv.org/abs/1904.04585
[27] S. Zang et al., “Mobility handover optimization in millimeter wave heterogeneous networks,” in Proc. 17th Int. Symp. Commun. Inf. Technol. (ISCIT), Sep. 2017, pp. 1–6.
[28] M. Mezzavilla, S. Goyal, S. Panwar, S. Rangan, and M. Zorzi, “An MDP model for optimal handover decisions in mmWave cellular networks,” in Proc. Eur. Conf. Netw. Commun. (EuCNC), Jun. 2016, pp. 100–105.
[29] L. Sun, J. Hou, and T. Shu, “Spatial and temporal contextual multi-armed bandit handovers in ultra-dense mmWave cellular networks,” IEEE Trans. Mobile Comput., early access, Jun. 5, 2020, doi: 10.1109/TMC.2020.3000189.
[30] X. Liu et al., “Learning to predict the mobility of users in mobile mmWave networks,” IEEE Wireless Commun., vol. 27, no. 1, pp. 124–131, Feb. 2020.
[31] L. Sun, J. Hou, and T. Shu, “Optimal handover policy for mmWave cellular networks: A multi-armed bandit approach,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2019, pp. 1–6.
[32] B. Sheen, J. Yang, X. Feng, and M. M. U. Chowdhury, “A deep learning based modeling of reconfigurable intelligent surface assisted wireless communications for phase shift configuration,” IEEE Open J. Commun. Soc., vol. 2, pp. 262–272, 2021.
[33] N. K. Kundu and M. R. McKay, “Channel estimation for reconfigurable intelligent surface aided MISO communications: From LMMSE to deep learning solutions,” IEEE Open J. Commun. Soc., vol. 2, pp. 471–487, 2021.
[34] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and C. Yuen, “Reconfigurable intelligent surfaces for energy efficiency in wireless communication,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4157–4170, Aug. 2019.
[35] C. Huang et al., “Holographic MIMO surfaces for 6G wireless networks: Opportunities, challenges, and trends,” IEEE Wireless Commun., vol. 27, no. 5, pp. 118–125, Oct. 2020.
[36] S. Hu, F. Rusek, and O. Edfors, “Beyond massive MIMO: The potential of data transmission with large intelligent surfaces,” IEEE Trans. Signal Process., vol. 66, no. 10, pp. 2746–2758, May 2018.
[37] J. He, H. Wymeersch, and M. Juntti, “Channel estimation for RIS-aided mmWave MIMO systems via atomic norm minimization,” 2020, arXiv:2007.08158. [Online]. Available: http://arxiv.org/abs/2007.08158
[38] L. Wei, C. Huang, G. C. Alexandropoulos, C. Yuen, Z. Zhang, and M. Debbah, “Channel estimation for RIS-empowered multi-user MISO wireless communications,” IEEE Trans. Commun., vol. 69, no. 6, pp. 4144–4157, Jun. 2021.
[39] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, Jr., “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, Oct. 2014.
[40] L. Xiao et al., “Reinforcement learning-based downlink interference control for ultra-dense small cells,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 423–434, Jan. 2020.
[41] C. Huang, R. Mo, and Y. Yuen, “Reconfigurable intelligent surface assisted multiuser MISO systems exploiting deep reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1839–1850, Jun. 2020.
[42] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[43] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. 13th AAAI Conf. Artif. Intell., 2016, pp. 2094–2100.
[44] C. Bettstetter, H. Hartenstein, and X. Pérez-Costa, “Stochastic properties of the random waypoint mobility model,” Wireless Netw., vol. 10, no. 5, pp. 555–567, 2004.

Long Jiao (Student Member, IEEE) received the B.Sc. degree in information security from Xidian University (XDU), Xi’an, China, in 2016. He is currently pursuing the Ph.D. degree with George Mason University, Fairfax, VA, USA. He has been with George Mason University since 2016. His current interests include 5G communication systems, 5G physical layer security, operational security of spectrum sharing systems, and RIS/IRS-assisted wireless communications.
Pu Wang (Student Member, IEEE) received the B.S. degree in telecommunications engineering from Xidian University in 2014, where he is currently pursuing the Ph.D. degree with the School of Cyber Engineering. His research interests include backscatter communication, wireless information and power transfer, physical layer security, and information security in the Internet of Things.

Huacheng Zeng (Senior Member, IEEE) received the Ph.D. degree in computer engineering from Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA. He is currently an Assistant Professor with the Department of Computer Science and Engineering, Michigan State University (MSU), East Lansing, MI, USA. Prior to joining MSU, he was an Assistant Professor in electrical and computer engineering with the University of Louisville, Louisville, KY, USA, and a Senior System Engineer at Marvell Semiconductor, Santa Clara, CA, USA. His research interests include wireless networking and sensing systems. He was a recipient of the NSF CAREER Award. He received the Best Paper Award from IEEE SECON 2021.