IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 21, NO. 4, APRIL 2022 2243

Enabling Efficient Blockage-Aware Handover in RIS-Assisted mmWave Cellular Networks

Long Jiao, Student Member, IEEE, Pu Wang, Student Member, IEEE, Amir Alipour-Fanid, Member, IEEE, Huacheng Zeng, Senior Member, IEEE, and Kai Zeng, Member, IEEE

Abstract: Recently, networks operating at frequencies over 28 GHz (mmWave) have emerged as a viable solution for 5G mobile networks to provide Gbps data rates. Due to the high directivity and attenuation of mmWave signals, mmWave communication links are highly vulnerable to frequent mmWave channel blockages, which can trigger excessive handovers. Thanks to its ability to enrich the scattering environment and create reflective signal multipaths, a Reconfigurable Intelligent Surface (RIS) has great potential to counter the blockage effect and thus greatly reduce the number of unnecessary handovers. However, this potential has not been well explored. In this paper, we propose a RIS-assisted handover scheme by leveraging deep reinforcement learning (DRL). Under various channel blockage conditions, the DRL agent manages to reduce the cumulative handover overhead by jointly adjusting beamformers and RIS phase shifts. Compared with existing schemes that do not consider RIS, the RIS-assisted handover scheme significantly reduces the number of handovers and achieves higher spectrum efficiency. Besides, to alleviate the impact of the limited observations of the fast fading channels, we propose a lightweight algorithm to sense the blockage status, and the sensing results can be utilized to improve the performance of model training. Numerical results show that the DRL agent is able to further improve the performance when integrated with the blockage status sensing algorithm.

Index Terms: 5G networks, millimeter wave, handover, beam tracking, reconfigurable intelligent surfaces (RIS), intelligent reflecting surface (IRS).

Manuscript received March 2, 2021; revised July 23, 2021; accepted August 11, 2021. Date of publication September 15, 2021; date of current version April 11, 2022. This work was supported in part by the Microsoft Research Award, U.S. Army Research Office (ARO) under Grant W911NF-21-1-0187, and in part by the Commonwealth Cyber Initiative (CCI) and its Northern Virginia (NOVA) Node, an investment in the advancement of cyber R&D, innovation and workforce development. The associate editor coordinating the review of this article and approving it for publication was S. Dey. (Corresponding author: Kai Zeng.)
Long Jiao and Kai Zeng are with the Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030 USA (e-mail: [email protected]; [email protected]).
Pu Wang is with the School of Cyber Engineering, Xidian University, Xi’an 710071, China (e-mail: [email protected]).
Amir Alipour-Fanid is with the Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506 USA (e-mail: [email protected]).
Huacheng Zeng is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2021.3110522.
Digital Object Identifier 10.1109/TWC.2021.3110522
1536-1276 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

THE 5G networks can meet the increasing demand by providing high data rate, ultra-low latency, and massive machine-type communications [1], [2]. In 5G networks, small cells or femtocells can operate at mmWave frequencies [3]. In this way, traditional mobile networks (e.g., 3G, LTE, and LTE-A) operating at sub-6 GHz frequencies complementarily provide wide-range coverage and control, while small cells (femtocells) operating in the millimeter-wave (mmWave) band ranging from 30 GHz to 300 GHz offer very high throughput [4]–[7].

Along with its benefits (e.g., offering 10x bandwidth), the mmWave frequency band suffers from much higher attenuation and larger pathloss than lower frequencies [8], [9]. To mitigate the high signal attenuation and compensate for the large propagation pathloss, the mmWave link is established by forming highly directional beams enabled by antenna arrays. The high directivity of signals in the mmWave frequency band, however, makes mmWave links highly vulnerable to blockage [10], [11]. For instance, even the human body can cause up to 35 dB of signal attenuation. The rapid fluctuations of channel quality due to the blockage effect can trigger excessive user equipment (UE) handovers, which switch a UE with a low signal power or data rate to a nearby base station (BS) with a high-quality link [12]. Frequent UE handovers induce non-negligible delay; e.g., the one-way delay of the X2 link between the Mobility Management Entity (MME) and the original mmWave gNB can be 1 ms, and the delay at the MME side could be as high as 10 ms [13].

Recently, a new paradigm, named reconfigurable intelligent surfaces (RIS) [14]–[17], has been proposed to enrich the scattering environment so that strong reflective signal paths can be established between the BS and UEs. Thanks to this ability, RIS has great potential to reduce the excessive handovers caused by the frequent mmWave channel blockage effect. For instance, RIS can largely alleviate the blockage effect by creating extra BS-RIS-UE signal paths and thus greatly reduce blockage-triggered handovers. However, the benefit of RIS in reducing the handover overhead in mmWave cellular networks has not been well explored in the existing literature.

In this paper, to minimize the handover overhead in mmWave cellular networks, we propose an efficient handover scheme leveraging RIS. Owing to the adjustable phase-shift elements in the reflecting arrays, RIS is able to create an extra strong reflective path [18]–[21]. The process of RIS-assisted handover consists of the following stages: a) once a link blockage occurs, the handover agent analyzes the channel blockage status of the affected channel clusters of the reflective channels from RIS to UE and the direct channels from BS to UE, in the angle domain; b) it reconfigures the wireless scattering environment, i.e., the weights of the phase shift elements in the RIS, to either recover the currently blocked links or facilitate a rapid UE handover. Therefore, RIS in mmWave cellular networks has great potential to decrease redundant handovers.

Unfortunately, it is non-trivial to develop a handover scheme that fully leverages the benefits of RIS to reduce the number of handovers. In particular, we face the following challenges. a) To restore blocked links, merely configuring the phase shifting array of the RIS is not sufficient. Due to the high directivity of the mmWave signal, the received signal beams reflected from the RIS are highly dominated by the beamformer of the BSs. If the signals are not on the broadside of the BS beamformers, they will be suppressed or even nulled by the sidelobes of the beams. b) Prior to performing RIS-assisted link restoration or UE handover, a trade-off is needed under the various channel blockage effects. For instance, recovering the communication links under channel blockage may reduce the handover consumption directly. However, for mmWave channels under severe blockage, that might result in a low data rate. To this end, an optimal handover policy balancing RIS-assisted channel restoration and UE handover is needed. c) To combat fast channel blockage, time-critical UE handover is needed, and the handover decision should be made based on a small number of channel observations filtered by the current beamformers and RIS phase shifts [11], [22]. For instance, in [22], limited channel measurements (only 4 time slots) are collected at the UE beam alignment stage in response to the fast fading of the mmWave channel. Such limited channel observations leave the blockage status of the remaining beamformers/RIS phase shifts unknown. As a consequence, the performance of real-time UE handover may be negatively affected.

To address the aforementioned challenges in RIS-assisted handover schemes, we propose the following measures. First, to avoid being suppressed by the beamformer at the BS, a joint design for the RIS phase shift elements and the BS beamformer is developed to exploit the enriched scattering environment. Such a joint design is based on deep reinforcement learning (DRL), where a central controller optimizes the handover process by using a Markov decision process (MDP) according to the observations from the dynamic environment. With the joint BF-RIS design, the broadside of the beampatterns can be aligned with the RIS reflective channels to mitigate the channel blockage. Secondly, to carefully choose between link restoration and UE handover with the objective of reducing the cumulative handover overhead, we propose a handover scheme based on DRL. Prioritized experience replay (PER) is incorporated in the training process of our DRL agent model, which is able to accelerate the convergence of training and improve the performance of the DRL model. Finally, to address the limited observations of the blockage status of beamformers and RIS, we develop a lightweight algorithm based on the derived closed-form results to estimate the link blockage coefficients, which capture the channel blockage status based on a limited number of measurements. The estimated channel blockage information serves as the input for the learning model. Such a measure gives the PER-DRL agent a clear picture of the blockage status on each cluster and thus can further improve the performance.

In this work, with the assistance of the RIS, we provide an efficient handover scheme in response to the frequent mmWave channel blockages. We conduct extensive numerical simulations to validate the analysis and incorporate our design into three schemes: RIS-assisted UE handover, BF-assisted UE handover and SNR threshold-based UE handover. The results show that: a) the RIS-assisted UE handover scheme is able to combat mmWave channel blockage and can significantly reduce the handover overhead; b) compared with the schemes that do not consider the RIS, the joint design of RIS phase shifts and BS beamformers outperforms the schemes relying merely on BF design or SNR threshold-based UE handover; c) the blockage status information of all channel clusters is estimated by the proposed algorithm. With a better overview of blockage over various beamformers, a desirable handover performance is achieved even with a limited number of channel observations.

Our main contributions of this work are as follows:
• For the first time, we propose a RIS-assisted handover scheme for UEs in mmWave cellular networks by jointly designing the RIS phase shifts and beamformers at BSs, which is able to leverage the rich scattering environment enabled by the RIS to either recover the channel blockage or facilitate UE handover. We show that RIS can serve as an alternative approach to combat the blockage effect and largely reduce the number of handovers triggered by frequent channel blockages.
• To efficiently perform RIS-assisted link restorations or UE handover, PER is incorporated in our handover scheme. We conduct extensive numerical simulations and the results show that the PER-based training strategy can outperform the double deep Q-Network (DDQN)-based scheme.
• To alleviate the impact of limited environment observations on the training process, we propose a lightweight algorithm to sense the link blockage status for clusters within RIS reflective channels and direct channels. Along with other information from the environment, the sensing results can serve as the input for the learning model. To this end, the DRL agent can have an in-depth estimation of the blockage status over a wide variety of beampatterns. Numerical results show a significant advantage of the proposed algorithm over training strategies with limited observations.

The rest of this paper is organized as follows. In Section II, we discuss the related works. In Section III, we introduce the system model of RIS-assisted mmWave cellular networks. Then the reward prediction under link blockage is presented in Section IV. Section V proposes the PER-DRL-based handover scheme. In Section VI, we show the numerical results of our handover scheme under various link blockage scenarios. The conclusions are drawn in Section VII.

II. RELATED WORKS

Signals in the mmWave band are highly susceptible to blockage, which leads to intermittent mmWave links. Handover schemes have been proposed to counter the frequent channel blockage. They can be broadly classified into: dual connectivity based mmWave UE handover, ML-based UE proactive handover and side information-powered handover policy design.

Dual connectivity protocols [13], [23] enable mobile mmWave UEs to maintain physical layer connections under mmWave channel blockage. In this case, the mmWave band is utilized to provide high data throughput, while the sub-6 GHz band is required for handover control. This type of scheme aims to enable rapid path switching in the event of failures on any one link. However, the time overhead of handover caused by communication blockage cannot be minimized. At the same time, the influence of channel dynamics on the handover strategy/policy is not considered in the protocol design.

To be proactive to link blockage with communication outage prediction, machine learning based blockage prediction has been proposed in [24], [25]. In [24], the proposed scheme relies on a primitive hand-off assumption that UEs need to perform hand-off as soon as a link blockage occurs. The authors did not treat the overall process as an optimized decision process, so the scheme cannot reduce the impact of handovers on the user's average data rate utility. In [26], the authors leverage time-consecutive camera images in the handover decision process. While making handover decisions, the long-term future performance is predicted.

To achieve an optimal handover policy, side information like the temporal and spatial correlation of the environment state is considered. For example, in existing works [27], [28], the impact of the distance between the macro cell and mmWave small cells on the handover performance was modeled as an MDP problem. However, the authors made simplifying assumptions, considering only a 1-D case for user mobility. Moreover, the developed MDP model relies on quantized channel conditions and cannot respond to channel conditions precisely. In [29], to avoid a large number of unnecessary handovers and the energy consumption overhead due to frequent short-term LOS blockage, the authors propose handover mechanisms that use side information like spatial and temporal contextual information to carefully decide the next base station. In [30], [31], ML techniques are applied to learn the mobility of the mobile mmWave UEs and predict the moving directions. Based on such mobility information, the proposed schemes aim to achieve beam tracking and handover more efficiently.

It should be noted that none of the aforementioned works considers RIS. To the best of our knowledge, there is no existing work exploiting the enriched scattering and reflecting clusters provided by RIS to reduce unnecessary handovers under link blockages. In particular, the trade-off between RIS-enabled channel blockage restoration and mmWave UE handover is unexplored in existing works.

III. SYSTEM MODEL

We consider a mmWave cellular network with a set of $B$ BSs $\mathcal{B} = \{1, \ldots, B\}$ and a set of $U$ UEs $\mathcal{U} = \{1, \ldots, U\}$. The system operates in discrete time slots [32], [33]. For every epoch $t$ ($t = 1, 2, 3, \ldots$), at the beginning of each time slot, the handover decision algorithm determines the target BS for the UE, and the UE then connects to this BS. The duration of each time slot is denoted as $\Delta T$.

A simple scenario is depicted in Fig. 1, which consists of two BSs, one UE and one RIS. As shown in Fig. 1, due to the channel blockage, UE 1 can either switch to BS 2 (the beams denoted in green) by performing device handover, or recover the communication link from BS 1 by focusing on the reflective channel (the beams denoted in red). For the channel, we mainly consider the direct links between the BS and UE and the reflective channels (cascade channels) through the RIS. Furthermore, we assume that intra-cell UEs adopt orthogonal resource blocks to avoid intra-cell interference.

Fig. 1. System model.

To compensate for the large propagation loss, we assume every BS utilizes a phased array that can adaptively tune the weights on every omni-directional antenna element in the array to create directional beams, whereas each UE is equipped with a single omni-directional antenna. We assume the RIS array contains $M$ phase shifters, and $N$ antenna elements are packaged in the antenna array at each BS. $\mathbf{G}_b \in \mathbb{C}^{M \times N}$, $\mathbf{h}_{r,bu} \in \mathbb{C}^{M \times 1}$ and $\mathbf{h}_{d,bu} \in \mathbb{C}^{N \times 1}$ ($u \in \mathcal{U}$, $b \in \mathcal{B}$) are the equivalent channels from BS $b$ to the RIS $r$, from the RIS $r$ to UE $u$, and from BS $b$ to UE $u$, respectively. The phase shift matrix of the RIS is denoted as $\boldsymbol{\Phi} = \mathrm{diag}\{\Phi_1, \Phi_2, \ldots, \Phi_M\}$, where $\Phi_m$ denotes the $m$th element of the RIS reflection coefficient matrix. For the $m$th element in the RIS array, $\Phi_m = \beta_m e^{j\phi_m}$ is represented by the phase shift coefficient $\phi_m \in [0, 2\pi]$ and the reflection coefficient $\beta_m \in [0, 1]$. In practice, each element of the RIS is usually designed to maximize the signal reflection amplitude (i.e., $\beta_m = 1$) [19], [21]. The received signal at UE $u$ usually consists of the signal coming from its associated BS and the reflected signal from the RIS [34], [35]. It is given as:

$$y_u = \left(\mathbf{h}_{d,bu}^{H} + \mathbf{h}_{r,bu}^{H}\boldsymbol{\Phi}\mathbf{G}_b\right)\sqrt{P_b}\,\mathbf{w}_b s_b + \sum_{i \in \mathcal{B},\, i \neq b}\left(\mathbf{h}_{d,iu}^{H} + \mathbf{h}_{r,iu}^{H}\boldsymbol{\Phi}\mathbf{G}_i\right)\sqrt{P_i}\,\mathbf{w}_i s_i + n_u, \tag{1}$$
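To make the structure of (1) concrete, the following minimal sketch evaluates the noise-free serving-BS term $(\mathbf{h}_{d,bu}^{H} + \mathbf{h}_{r,bu}^{H}\boldsymbol{\Phi}\mathbf{G}_b)\sqrt{P_b}\,\mathbf{w}_b s_b$ for unit-amplitude RIS elements ($\beta_m = 1$). The function names, the toy dimensions and the numeric values are assumptions made for illustration only; they are not taken from the paper.

```python
import cmath
import math


def ris_diagonal(phases):
    """Diagonal of Phi with Phi_m = exp(j * phi_m), i.e. beta_m = 1 (assumed)."""
    return [cmath.exp(1j * phi) for phi in phases]


def effective_channel(h_d, h_r, phi_diag, G):
    """Row vector h_d^H + h_r^H Phi G (length N).

    h_d: BS-UE channel, length N.  h_r: RIS-UE channel, length M.
    phi_diag: diagonal of Phi, length M.  G: BS-RIS channel, M rows of N entries.
    """
    eff = []
    for n in range(len(h_d)):
        reflected = sum(
            h_r[m].conjugate() * phi_diag[m] * G[m][n] for m in range(len(h_r))
        )
        eff.append(h_d[n].conjugate() + reflected)
    return eff


def desired_signal(eff, w, power, symbol):
    """Noise-free serving-BS term of (1): (effective channel . w) * sqrt(P) * s."""
    return sum(e * wn for e, wn in zip(eff, w)) * math.sqrt(power) * symbol


# Toy case (hypothetical numbers): N = 2 antennas, M = 1 RIS element.
# The direct link is fully blocked, so only the BS-RIS-UE path carries signal.
h_d = [0j, 0j]
h_r = [1 + 0j]
G = [[1 + 0j, 0j]]
w = [1 + 0j, 0j]
y = desired_signal(effective_channel(h_d, h_r, ris_diagonal([0.0]), G), w, 4.0, 1 + 0j)
```

In this toy case the direct channel contributes nothing, yet the reflected path still delivers the symbol, which is the effect the RIS-assisted scheme relies on.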


In (1), $s_b$ is the transmitted symbol of BS $b$, $\mathbf{w}_b \in \mathbb{C}^{N \times 1}$ is the beamforming vector of BS $b$, and $n_u$ is additive white Gaussian noise (AWGN) with noise power $N_0$. Note that the received signal in (1) also includes the interference signal from the adjacent BSs.

In this work, different from the relay system or the active large intelligent surfaces (LIS) [36], we consider the RIS system operating under the passive mode. In this case, the RIS does not incur self-interference or noise amplification. As specified in [21], we assume that the RIS is coated on the buildings around the UEs. The definitions of the angle of arrival and the channel gain of the geometric channel model can be found in [21]. To assess the theoretical performance gain brought by RIS, we assume that the channel state information of all channels involved is perfectly known at the BS.

A. Channel

For ease of exposition, at the BS side we focus on a 1-D uniformly spaced antenna array, which has $N$ antenna elements and can generate $K$ beam directions/patterns in total. The channel coefficient shaped by beampattern $A_{k_b}(\theta_i, \mathbf{w}_{k_b})$ can be represented as:

$$h^{ub}_{k_b} = \sum_{i=1}^{P_{ub}} A_{k_b}(\theta_i, \mathbf{w}_{k_b}) \times a_i, \tag{2}$$

where $k_b$ is the index of the beampattern of BS $b$. There are $P_{ub}$ clusters in the UE-BS channel, where each cluster $i$ is represented by a tuple $[\theta_i, a_i]$, denoting its angle-of-arrival and the combined amplitude and phase. For ease of presentation, we assume the amplitude and phase terms associated with each cluster are combined in a single term $a_i$. We use a vector $\mathbf{p}$ to denote these parameters. Intuitively, the channel coefficient $h^{ub}_{k_b}$ captures the aggregated effect of all paths as shaped by the beampattern.

2) Channel Coefficient of Reflective Channels: Considering the RIS channels, for given RIS weights $\boldsymbol{\Phi}$, the channel coefficient of the BS-RIS-UE channel can be given as:

$$h^{ru}_{gk_b} = \sum_{nu=1}^{P_{ru}} \sum_{nb=1}^{P_{rb}} A_{ru}(\theta_{nu}, \psi_{nb}, \boldsymbol{\Phi}_g)\, A_{rb}(\theta_{nb}, \mathbf{w}_{k_b}) \times a_{nb}\, a_{nu}, \tag{3}$$

where $A_{ru}(\theta_{nu}, \psi_{nb}, \boldsymbol{\Phi}_g) = \mathbf{a}^{H}(\theta_{nu}) \circ [\Phi_1, \ldots, \Phi_M]_g\, \mathbf{a}(\psi_{nb})$ and $A_{rb}(\theta_{nb}, \mathbf{w}_{k_b}) = \mathbf{a}^{H}(\theta_{nb})\mathbf{w}_{k_b}$ are the equivalent beampatterns for the reflective channels RIS-UE and RIS-BS. Here $\circ$ is the Hadamard product. $nu$ and $nb$ denote the cluster indices of the channels $\mathbf{h}_{r,bu}$ and $\mathbf{G}_b$, respectively. $\mathbf{a}(\theta_{nu})$ is the antenna steering vector for the channel $\mathbf{h}_{r,bu}$. $\mathbf{a}^{H}(\theta_{nb})$ and $\mathbf{a}(\psi_{nb})$ are the antenna steering vectors for the reflective channel $\mathbf{G}_b$. The detailed definitions can be found in [37]. Note that the RIS matrix $\boldsymbol{\Phi}_g$ has been incorporated in the equivalent beampattern $A_{ru}(\theta_{nu}, \psi_{nb}, \boldsymbol{\Phi}_g)$. We assume the blockage coefficient of channel $\mathbf{h}_{r,bu}$ for the cluster $nu$ is $c_{nu}$. $a_{nb}$ and $a_{nu}$ are the complex channel gains. There are various channel estimation schemes for RIS-assisted communication systems [38]. For RIS-assisted mmWave communication systems, the channel coefficient of the reflective channels can be derived based on the estimation of AoAs/AoDs and channel gains. For instance, in [37], a channel estimation scheme for RIS-assisted mmWave systems has been proposed based on atomic norm minimization to sequentially estimate the channel parameters, i.e., angular parameters, angle differences, and the products of propagation path gains.

B. Measurement Collection for UE Handover

In this work, we assume that UEs select one of the beampatterns for measuring the signal quality. Supposing there is a central coordinator for $B$ BSs, the handover procedure works as follows.

1) Downlink Measurements: Due to the sparsity of the mmWave channel [39], the spatial channel response is dominated by a few paths from a few angular directions. This requires BSs and UEs to conduct a beam sweep to determine the beampattern that offers the highest SINR, and the corresponding SINR can be described as:

$$\mathrm{SINR}^{gk_b}_{ub} = \frac{P_t\, |h^{ub}_{k_b} + h^{ru}_{gk_b}|^2}{\sum_{i \neq b} P_t\, |h^{ui}_{k_i} + h^{ru}_{gk_i}|^2 + N_0}, \tag{4}$$

where $h^{ru}_{gk_b}$ denotes the equivalent channel coefficient for the BS-RIS-UE channel. For the reflective channels, the RIS weights $\boldsymbol{\Phi}$ are contained in $h^{ru}_{gk_b}$, which is defined by the antenna patterns $g$ and $k_b$ at the RIS and BS $b$, respectively. $h^{ub}_{k_b}$ is the channel coefficient of the BS-UE channel from BS $b$ to UE $u$. $P_t$ denotes the transmission power and $N_0$ is the noise power.

To track several BSs simultaneously, each UE broadcasts uplink sounding reference signals in dedicated slots, steering through $K$ beampatterns, one at a time, to cover the whole angular space. Each BS $b$ obtains $K$ SINR measurements $\mathbf{y}_{ub} = [\mathrm{SINR}^{1}_{ub}, \ldots, \mathrm{SINR}^{K}_{ub}]$ for UE $u$. For each UE $u$, each mmWave gNB collects sounding reference signals (SRS) and the central coordinator fills a report table (RT) for UE $u$ based on the SRSs, as in Table I. Each entry represents the collected SRS and power in dB, transmitted through various beampatterns. $s \in \{1, \ldots, S\}$ denotes the index of the SINR samples that the mmWave gNB collected.

TABLE I
THE RT OF SRS AT MMWAVE GNB

Based on the observations of directional communication at the mmWave band [11], mmWave links show a strong inter-beam correlation (coefficient > 0.5) in the channel coefficient. This indicates that we can still observe spatial correlation in SINR across different beampatterns.
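The beam sweep described around (4) can be sketched as follows: given the direct and reflective channel coefficients for every candidate beampattern, compute the SINR of (4) and keep the best pattern. This is only an illustrative sketch; the function names and the toy coefficients are hypothetical, and the coefficients are assumed to be already available from channel estimation.

```python
def sinr(h_direct, h_reflect, interferers, p_t, n0):
    """SINR of (4): serving-link power over interference plus noise.

    interferers: list of (h_direct_i, h_reflect_i) pairs for the BSs i != b.
    """
    signal = p_t * abs(h_direct + h_reflect) ** 2
    interference = sum(p_t * abs(hd + hr) ** 2 for hd, hr in interferers)
    return signal / (interference + n0)


def best_beampattern(candidates, interferers, p_t, n0):
    """Return (index, SINR) of the candidate beampattern with the highest SINR.

    candidates: list of (h_direct, h_reflect) pairs, one per beampattern.
    """
    scores = [sinr(hd, hr, interferers, p_t, n0) for hd, hr in candidates]
    k = max(range(len(scores)), key=scores.__getitem__)
    return k, scores[k]


# Hypothetical sweep over three beampatterns with one interfering BS.
candidates = [(1 + 0j, 0j), (0j, 2 + 0j), (1 + 0j, -1 + 0j)]
interferers = [(1j, 0j)]
best_k, best_sinr = best_beampattern(candidates, interferers, p_t=1.0, n0=1.0)
```

In the toy sweep the second pattern wins through its reflective path alone, while in the third pattern the direct and reflected components cancel, which shows why the two coefficients are summed before taking the magnitude.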


This spatial correlation persists even when the blockage seems to land on a single beam [11]. Under frequent blockage effects, the function above needs to take the blockage into account. If we obtain partial information related to the link blockage, the handover decision process under such a scenario can be further optimized.

2) Coordinator for UE Handover: Each mmWave cell sends the RT filled for each UE, through the X2 link, to the coordinator [23]. The coordinator builds a complete report table (CRT). After assessing the CRT, the handover decision for the optimal mmWave gNB with the optimal beampatterns is evaluated for each UE, considering side information like SINRs and power. The criterion for efficient UE handover is described in Section V.

IV. REWARD PREDICTION UNDER LINK BLOCKAGE

To achieve a good trade-off between link restoration and UE handover, the blockage status over various channel clusters is needed. In this section, we introduce our blockage status prediction algorithm, which predicts the link blockage status and the reward for beam tracking.

A. Blockage Status Prediction Based on Limited Channel Observations

According to the observation in [11], given a 60 GHz phased array with multiple beam directions, blockage of one beam affects the performance of other beams. When obstacles block a certain angular cluster, all beams can be affected in a correlated way. The authors in [11] find that such spatial correlation among beampatterns can be assessed by a modeling framework.

Suppose the measured channel coefficients filtered by beampattern $(g, k_b)$ are represented as $[h^{ub}_{gk_b}]_{ms}$. Based on the prediction model in [11], the extent of the blockage effect on cluster $i$ of the BS-UE channel or the BS-RIS-UE channel is denoted by the scalar $b_i \in [0, 1]$ or $c_i \in [0, 1]$, respectively. For instance, $b_i = 1$ indicates that cluster $i$ of the BS-UE channel is unblocked and $b_i = 0$ denotes that cluster $i$ is fully blocked. By incorporating the blockage coefficients for each cluster, we can reformulate the measured channel coefficients from beampattern $(g, k_b)$ as:

$$h^{rep}_{(gk_b)}\big((u, b); (\mathbf{b}, \mathbf{c}); \mathbf{p}\big) = \sum_{i=1}^{P_{ub}} A_{k_b}(\theta_i, \mathbf{w}_{k_b}) \times b_i \times a_i + \sum_{nu=1}^{P_{ru}} \sum_{nb=1}^{P_{rb}} A_{ru}(\theta_{nu}, \psi_{nb}, \boldsymbol{\Phi}_g)\, A_{rb}(\theta_{nb}, \mathbf{w}_{k_b}) \times c_{nu} \times a_{nb}\, a_{nu}. \tag{5}$$

Based on the parameters $\mathbf{p} = \{\theta_i, a_i, \psi_i, \ldots\}$, the vectors $\mathbf{b}$ and $\mathbf{c}$ are the unknown blockage coefficients which need to be approximated. For ease of notation, we use $a_{nb} a_{nu}$ to indicate the product of the path gains for clusters $nu$ and $nb$, which includes the phase information. With the observed channel coefficients from beampattern $(g, k_b)$, if we put $\{b_i, c_{nu}\}$, $i \in \{1, \ldots, P_{ub}\}$, $nu \in \{1, \ldots, P_{ru}\}$ into a single vector $\bar{\mathbf{b}}$, the approximation of the blockage coefficient vector can be represented as below:

$$\bar{\mathbf{b}}^{ub} = \arg\min_{\bar{\mathbf{b}}} \left| [h^{ub}_{gk_b}]_{ms} - h^{rep}_{(gk_b)}\big((u, b); \bar{\mathbf{b}}; \mathbf{p}\big) \right|^2, \tag{6}$$

where we append the two vectors into $\bar{\mathbf{b}} = [\mathbf{b}^T, \mathbf{c}^T]^T$. In [11], the authors develop a table searching scheme to approach the actual $\mathbf{b}$. The authors quantize every single blockage coefficient $b_i$, and the quantized $b_i$ is supposed to approach the actual $b_i$. The performance of table searching is therefore greatly affected by the quantization level of $b_i$, and the global minimum cannot be captured by a low-resolution table. Besides, a high quantization level requires a large memory.

B. Channel Blockage Prediction Algorithm

In this work, we aim to develop a lightweight algorithm to approximate the blockage coefficient $\mathbf{b}$. In the first step, we can reformulate the objective function in (6) as below:

$$\min_{\bar{\mathbf{b}}} \; \left([h^{ub}_{gk_b}]_{ms} - (\mathbf{h}^{opt}_{(g,k_b)})^{T} \bar{\mathbf{b}}\right)^{*} \left([h^{ub}_{gk_b}]_{ms} - (\mathbf{h}^{opt}_{(g,k_b)})^{T} \bar{\mathbf{b}}\right) \quad \text{s.t.} \quad 0 \le \bar{b}_i \le 1, \; i \in \{1, \ldots, P\}, \tag{7}$$

where $[h^{ub}_{gk_b}]_{ms}$ is the channel coefficient observed by beampattern $k_b$, $\mathbf{h}^{opt}_{(g,k_b)}$ is the vector of per-cluster channel coefficient terms computed from the channel parameters, and $\bar{\mathbf{b}} = [\mathbf{b}^T, \mathbf{c}^T]^T$ indicates the blockage coefficients. $\mathbf{h}^{opt}_{(g,k_b)}$ can be expressed as

$$\mathbf{h}^{opt}_{(g,k_b)} = \left[A_{k_b}(\theta_1, \mathbf{w}_{k_b}) \times a_1, \; \ldots, \; A_{ru}(\theta_{P_{ru}}, \psi_{P_{rb}}, \boldsymbol{\Phi}_g)\, A_{rb}(\theta_{P_{rb}}, \mathbf{w}_{k_b}) \times a_{P_{ru}} a_{P_{rb}}\right]. \tag{8}$$

The objective in (7) can be further simplified to

$$\left([h^{ub}_{gk_b}]_{ms} - (\mathbf{h}^{opt}_{(g,k_b)})^{T} \bar{\mathbf{b}}\right)^{*} \left([h^{ub}_{gk_b}]_{ms} - (\mathbf{h}^{opt}_{(g,k_b)})^{T} \bar{\mathbf{b}}\right) = \left|[h^{ub}_{gk_b}]_{ms}\right|^2 - 2\,\mathrm{Re}\left\{[h^{ub}_{gk_b}]^{*}_{ms}\, (\mathbf{h}^{opt}_{(g,k_b)})^{H} \bar{\mathbf{b}}\right\} + \bar{\mathbf{b}}^{T} \mathbf{A} \bar{\mathbf{b}}, \tag{9}$$

where the matrix $\mathbf{A}$ can be represented as $\mathbf{A} = \mathbf{h}^{opt}_{(g,k_b)} (\mathbf{h}^{opt}_{(g,k_b)})^{H}$.

Proposition 1: Matrix $\mathbf{A}$ is positive definite.

Proof: For any vector $\mathbf{x} \neq \mathbf{0}$, $\mathbf{x}^T \mathbf{A} \mathbf{x}$ can be expressed as:

$$\mathbf{x}^T \mathbf{A} \mathbf{x} = \sum_{i} x_i^2 \left|h^{opt}_{(g,k_b)}(i)\right|^2 + \sum_{i \neq j} 2 x_i x_j\, \mathrm{Re}\left\{h^{opt}_{(g,k_b)}(i)\, h^{opt}_{(g,k_b)}(j)\right\}. \tag{10}$$

Observe that in (10), the vector $\mathbf{x}$ can be replaced by the link blockage coefficient vector $\mathbf{b}$, which contains only positive numbers. The products $x_i x_j$ in (10) are then positive and the quadratic form satisfies $\mathbf{x}^T \mathbf{A} \mathbf{x} > 0$. Therefore, matrix $\mathbf{A}$ is positive definite. ∎

Based on the above observation, the objective in (9) can be further simplified to:

$$\min_{\bar{\mathbf{b}}} \; \frac{1}{2} \bar{\mathbf{b}}^{T} \mathbf{A} \bar{\mathbf{b}} - \mathrm{Re}\left\{[h^{ub}_{gk_b}]^{*}_{ms}\, (\mathbf{h}^{opt}_{(g,k_b)})^{H}\right\} \bar{\mathbf{b}} \quad \text{s.t.} \quad 0 \le \bar{b}_i \le 1, \; i \in \{1, \ldots, P\}. \tag{11}$$

We can usually solve this problem using iterative quadratic optimization techniques.
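A box-constrained least-squares fit such as (7) can be solved iteratively. As a point of reference, the sketch below uses plain projected gradient descent; this is a generic textbook method and not the lightweight closed-form algorithm developed in this paper. The function names, the initial guess of 0.5, the iteration count and the toy cluster terms are all assumptions for illustration.

```python
def predicted_coefficient(h_opt, b_bar):
    """Vector form of the model in (5): per-cluster terms scaled by blockage."""
    return sum(h * b for h, b in zip(h_opt, b_bar))


def estimate_blockage(measured, h_opt, iters=200):
    """Projected gradient descent for min |measured - h_opt^T b|^2, 0 <= b_i <= 1."""
    b_bar = [0.5] * len(h_opt)
    # Upper bound on the gradient's Lipschitz constant, used as 1 / step size.
    lipschitz = 2.0 * sum(abs(h) ** 2 for h in h_opt)
    if lipschitz == 0.0:
        return b_bar  # no informative cluster terms; keep the initial guess
    step = 1.0 / lipschitz
    for _ in range(iters):
        residual = predicted_coefficient(h_opt, b_bar) - measured
        # d/db_i |residual|^2 = 2 * Re(conj(residual) * h_i)
        grad = [2.0 * (residual.conjugate() * h).real for h in h_opt]
        # Gradient step followed by projection onto the box [0, 1].
        b_bar = [min(1.0, max(0.0, b - step * g)) for b, g in zip(b_bar, grad)]
    return b_bar


# Hypothetical two-cluster example: one purely real and one purely imaginary
# term, so the two blockage coefficients can be read off independently.
h_opt = [1 + 0j, 1j]
b_inside = estimate_blockage(0.5 + 0.25j, h_opt)
b_clipped = estimate_blockage(2 - 1j, h_opt)
```

The second call shows the effect of the constraint: the unconstrained fit would need coefficients of 2 and -1, and the projection clips them to the feasible values 1 and 0. Every iteration touches all cluster terms, which is the per-update cost that motivates looking for a closed form.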


reduce the sum of the squares of the errors through a sequence of updates, which incurs a large training overhead for our scheme.

To further reduce the computational complexity, the inequalities serving as constraints in (11) can be incorporated into the objective function with a Lagrangian multiplier, so that a closed-form result for (11) can be obtained as follows:

Proposition 2: A closed-form solution for problem (11) can be given as:

$$\bar{b}^{*} = \left[ A^{-1} \left( -\mathrm{Re}\left( [h_{ub} \ast g_{kb}]_{ms} \right) \left( h^{\mathrm{opt}}_{(g,kb)} \right)^{H} - \lambda^{*} I \right) P \right]^{+}. \tag{12}$$

Proof: The parameter $\lambda^{*}$ is the Lagrangian multiplier in the dual problem. We prove in the Appendix that the solution in (12) meets the KKT conditions and thus achieves dual optimality. In order to estimate $\lambda^{*}$, we developed a lightweight algorithm based on the ellipsoid method, which has a high convergence rate. According to the link blockage coefficient $\bar{b}^{*}$, our scheme can be aware of link blockage and choose a handover policy based on the predicted performance of the beam tracking. ∎

V. DRL-BASED HANDOVER SCHEME

In this section, we present the DRL-based handover scheme for the mmWave cellular network [40], [41], which is able to tackle systems with a large state-action space.

A. Detailed Description of RL

In this work, the environment consists of RIS-assisted mmWave cellular networks and mobile UEs. There is a central controller in charge of UE handovers and it is represented by a learning agent. To continually interact with the dynamic environment and adjust the beamformers and RIS properly, an RL algorithm usually involves a group of environment states $s_t$, a collection of possible actions (the action space $A$), and the instantaneous reward. The key elements of RL are given below:

1) Action Space: At every time slot $t$, in response to the environment observation, the agent takes an action $a_t \in A$ from the action space $A$ in the process of learning. To be reactive to the channel blockage at the mmWave frequency band, the agent needs to determine the beamformer $\{w_i\}_{i \in B}$ and the reflecting beamforming coefficient (phase shift) $\{\Phi_g\}_{g \in G}$. $i_{HO}$ is the indicator for UE handover. The action is given by

$$a_t = \left\{ i_{HO}, \{w_i\}_{i \in B}, \{\Phi_g\}_{g \in G} \right\}. \tag{13}$$

2) Environment States With Blockage Prediction: The observation of the environment at the current time slot $t$ is referred to as the environment state $s_t \in S$. The environment state $s_t$ includes the actions of the previous time epoch $a_{t-1}$ as well as the predicted SINR for BSs, i.e., $\{\mathrm{SINR}_{ub}\}_{b \in B}$ after the channel recovery stage. It is defined as

$$s_t = \left\{ \{\mathrm{SINR}_{ub}\}_{b \in B}, a_{t-1} \right\}. \tag{14}$$

To enable a better overview of the channel blockage status on each channel cluster, the predicted SINRs over every possible beampattern pair serve as the environment observation, and all possible combinations of channel conditions are contained in the state space $s_t$. In practice, we apply data processing techniques like normalization to the input data [40].

3) Agent: In this work, the RIS-assisted communication system acts as the environment. There is a central controller responsible for handovers and it is regarded as the learning agent. The agent needs to be trained off-line according to a design objective. The agent tries to reduce the number of handovers while maintaining a high data rate under frequent link blockages, which requires maximizing the cumulative reward after a series of handover actions. For every handover action, if we denote its reward as $r_t$, the cumulative reward function $C_u$ during a time period is defined as

$$C_u = \sum_{t=t_1}^{t_2} r_t^{u}, \tag{15}$$

where $t_1$ and $t_2$ are two epochs. $r_t^{u}$ denotes the instantaneous reward for a time slot with duration $\Delta T$. Its definition usually needs to consider factors like SINR, beampattern, time slot length, and so on.

4) Reward: For every time epoch, our objective is to minimize the total handover consumption while maintaining high channel quality to provide a sufficient data rate. The reward $r(s_t, a_t)$ under state $s_t$ and handover action $a_t$ can be represented as below:

$$r(s_t, a_t) = \begin{cases} (\Delta T - \tau_b)\, \dfrac{W}{U_i} R_{ui}, & \text{if } i_{HO} = i_{t-1}, \\ (\Delta T - \tau_a - \tau_H)\, \dfrac{W}{U_i} R_{ui}, & \text{if } i_{HO} \neq i_{t-1}, \end{cases} \tag{16}$$

where $\tau_H = \alpha_c \Delta T$ denotes the handover cost coefficient [28]. The spectrum efficiency involving the beamformer and the phase shifting weight at the RIS is given as $R_{ui} = \log_2\left(1 + \mathrm{SINR}_{ub}(\{w_i\}_{i \in B}, \{\alpha_m\}_{m \in M})\right)$. Our scheme adopts beam tracking, and the beam alignment time overhead can be expressed as:

$$\tau_a = \left\lceil \frac{\varphi}{\psi_t} \right\rceil T_p, \tag{17}$$

where $T_p$ denotes the time overhead for beamforming pilot transmission. $\varphi$ and $\psi_t$ are the sector beamwidth and transmission beamwidth for beamforming, respectively. We assume that UEs utilize omni-directional antennas. $\tau_b$ is the time overhead for beam tracking. According to the algorithm developed in this work, suitable beampatterns are predicted based on the measurements filtered by a single beampattern. The time overhead for beam tracking can thereby be represented as $\tau_b = \tau_a / K$. Here $K$ is the number of beampatterns adopted in our system. By incorporating the handover consumption in the design of the reward, our scheme can satisfy the data rate demand of the UE while minimizing the number of handovers.

B. The Primitives of Q-Learning

RL [42] consists of two entities, agents and the environment. The agent explores the environment under the current state $s_t$ and reacts with an action $a_t$. After an action, the agent gets the reward $r(s_t, a_t)$ and jumps to the next state $s_{t+1}$. In traditional RL, there is a well-known algorithm called Q-learning, which utilizes the Q-value $Q(s, a)$ to measure how well the agent can do when taking action $a$ at state $s$.
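The reward in (16) and the overheads in (17) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it reads $W$ as the bandwidth and $U_i$ as the number of UEs sharing BS $i$, takes the second case of (16) to apply when the serving BS changes, and assumes a ceiling in (17). All function and parameter names are hypothetical.

```python
import math


def beam_alignment_overhead(t_pilot, sector_beamwidth, tx_beamwidth):
    """tau_a as in (17): pilots needed to sweep the sector (ceiling assumed)."""
    return math.ceil(sector_beamwidth / tx_beamwidth) * t_pilot


def handover_reward(handover, slot_duration, bandwidth, num_users, sinr,
                    tau_a, num_beampatterns, alpha_c):
    """Per-slot reward following the two cases of (16).

    Assumptions (not stated in this section of the paper):
    bandwidth is W, num_users is U_i, and sinr is a linear-scale SINR.
    """
    spectral_eff = math.log2(1.0 + sinr)            # R_ui
    rate = bandwidth / num_users * spectral_eff     # (W / U_i) * R_ui
    if handover:
        tau_h = alpha_c * slot_duration             # handover cost tau_H
        useful_time = slot_duration - tau_a - tau_h
    else:
        tau_b = tau_a / num_beampatterns            # beam tracking, tau_a / K
        useful_time = slot_duration - tau_b
    return useful_time * rate


# Example with made-up numbers: staying on the serving BS only pays tau_b.
tau_a = beam_alignment_overhead(t_pilot=1e-4, sector_beamwidth=120, tx_beamwidth=15)
stay = handover_reward(False, 0.01, 100e6, 2, 3.0, tau_a, 8, 0.1)
switch = handover_reward(True, 0.01, 100e6, 2, 3.0, tau_a, 8, 0.1)
```

With the same SINR, the handover branch always yields the smaller reward, which is what discourages the agent from switching BSs unless the new link offers a clearly better rate.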

JIAO et al.: ENABLING EFFICIENT BLOCKAGE-AWARE HANDOVER 2249

An action $a_t$ is the output of the UE handover policy at time epoch $t$, and the state $s_t$ is the observation of the environment under the previous action. Under the policy $\pi$, the expectation of the discounted reward conditioned on the state $s_t$ and action $a_t$ is called the state-action value function, which is usually given as:

$$Q(s_t, a_t) = E\left[ \sum_{\tau=0}^{\infty} \gamma^{\tau} r_{t+\tau+1} \,\middle|\, s_t, a_t \right]. \tag{18}$$

To obtain a better policy, in Q-learning we usually consider the greedy policy strategy, and the updated policy can be represented as

$$\pi'(s_t) = \arg\max_{a} Q^{\pi}(s_t, a). \tag{19}$$

In this case, according to the policy improvement theorem, the temporal difference (TD) is given as

$$\mathrm{TD}(s_{t+1}, a_{t+1}) = r(s_{t+1}, a_{t+1}) + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t). \tag{20}$$

Please note that in the process of reinforcement learning, the state-action function/state function is updated iteratively. To capture the random changes imposed by the environment, the state-action function is updated as follows:

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha\, \mathrm{TD}(s_{t+1}, a_{t+1}), \tag{21}$$

where $\alpha$ is the learning rate, which determines the pace at which the agent adapts to random changes in the environment.

In the RL algorithm, we aim to find an optimal policy that achieves the largest cumulative reward. According to the policy update in (19), we can get a new policy $\pi'(s_t)$ so that the condition $V^{\pi}(s) \le Q^{\pi}(s, \pi'(s))$ is satisfied, where $V^{\pi}(s)$ is the value function. According to this observation, the following condition can be given:

$$Q^{\pi}(s, \pi'(s)) = E_{\pi}\left[ r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s \right] \le E_{\pi}\left[ r_{t+1} + \gamma Q^{\pi}(s_{t+1}, \pi'(s_{t+1})) \mid s_t = s \right]. \tag{22}$$

By following this observation, we can continue to expand this inequality for $Q^{\pi}(s_{t+1}, \pi'(s_{t+1}))$. In the end, we can derive that $V^{\pi}(s) \le V^{\pi'}(s)$. Then the policy $\pi'$ must obtain a greater or equal expected return from all states $s_t \in S$. If the RL agent repeats this iterative process, we can continuously observe the performance improvement. In this work, the instantaneous reward consists of the cost due to the handover consumption and the spectrum efficiency.

The traditional RL scheme, i.e., Q-learning, has limited performance in a dynamic environment. For example, in this work, the state space is continuous and contains a vast number of element combinations. With such a state space, the Q-learning agent takes a long time to explore the environment and obtain its knowledge, which causes a meager convergence rate of training. In this paper, we propose to utilize DDQN, which is based on deep learning techniques; this concept was developed by Google [43].

C. Double Deep Q-Network (DDQN)

DDQN is proposed to improve the training efficiency of the DNN Q-value estimator, so that the agent can be applied to solve the optimization problem under a high-dimensional input state space. The state-action value function can be estimated by a non-linear estimator or by neural networks.

Parameterized by the network coefficient $\theta$, to approximate a given state-action value function $Q(s_t, a_t)$, the loss function between $Q(s_t, a_t; \theta)$ and $Q(s_t, a_t)$ has to be minimized, and it is expressed as

$$L(\theta) = \frac{1}{N} \sum_{t \le N} \left( Q(s_t, a_t) - Q(s_t, a_t; \theta) \right)^{2}. \tag{23}$$

For ease of presentation, we take the mean squared error as the loss function $L(\theta)$, which requires an appropriate gradient descent algorithm to minimize the loss and adjust the parameter $\theta$. According to the universal approximation (UA) theorem, for any Lebesgue-integrable function $f$, there exists a fully connected Rectified Linear Unit (ReLU) network $\theta$ that makes the integral of the loss function smaller than any positive $\epsilon$. Two issues need to be addressed: non-i.i.d. input data and non-stationary targets. To address these issues, experience replay and target Q-networks have been proposed for the DDQN to obtain knowledge from the environment more efficiently [43].

1) Experience Replay: DRL utilizes the experience replay unit to help the DNN memorize the sequential knowledge from the environment. At each time instant $t$, the agent stores its interaction experience tuple $e_t = \{s_t, a_t, r(s_t, a_t), s_{t+1}\}$ into the experience replay memory $D_t = \{e_1, \ldots, e_t\}$. During training, the stored samples are randomly extracted from the replay memory based on the experience replay mechanism. The motivation for experience replay is to break the correlation among the sequential data by randomly sampling data in $D_t$. In this way, the mechanism can remove the correlation between the samples, thereby accelerating convergence and avoiding significant divergence.

2) Q Target Network: To serve as a non-linear state-action value function approximator, the DNN needs to learn a mapping for constantly changing input and output. In DDQN, two DNNs, $\theta^{-}$ and $\theta$, have been proposed to achieve fast convergence. The DNN $\theta^{-}$ is selected to retrieve the state-action value function, and its output serves as the fixed label over a time period. The second network $\theta$ receives the updates in the training and learning stage. In this process, the target network is supposed to provide the target state-action value $y_j$ for the entry $j$:

$$y_j = \begin{cases} r_j, & \text{if training ends}, \\ r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^{-}), & \text{otherwise}. \end{cases} \tag{24}$$

For episode $i$, the DDQN agent tries to solve the following problem:

$$\min L(\theta_i) = \min E_{e_t \sim D} \left( y_j - Q(s_j, a_j; \theta) \right)^{2}. \tag{25}$$

The error between the target value and the estimation is the temporal difference (TD) value. The gradient descent is performed with respect to the network parameters $\theta$, where the network
parameter is updated according to the partial derivative of the function (25). After every $M$ updates, the synchronization between the networks $\theta^{-}$ and $\theta$ is performed.

D. Prioritized Experience Replay

PER is one of the most important improvements for RL algorithms. It is built on top of experience replay buffers. To replay transitions with high expected learning progress more frequently, their importance is measured by the magnitude of their TD error. This prioritization can lead to a loss of diversity, which is alleviated with stochastic prioritization, and can introduce bias, which can be corrected with importance sampling. The stochastic sampling method interpolates between pure greedy prioritization and uniform random sampling. The probability of being sampled is ensured to be monotonic in a transition's priority, while guaranteeing a non-zero probability even for the lowest-priority transition. Concretely, we define the probability of sampling transition $i$ as

$$P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \tag{26}$$

where $p_i \ge 0$ is the priority of transition $i$. The exponent $\alpha$ determines how much prioritization is used, with $\alpha = 0$ corresponding to the uniform case. The magnitude of the TD error (squared) is what the algorithm aims to minimize in the Bellman equation. Hence, the algorithm picks the samples with the largest error more frequently so that the DNN can minimize it. Prioritized replay introduces bias because it changes this distribution in an uncontrolled fashion, and therefore changes the solution that the estimates will converge to. We can correct this bias by using importance-sampling (IS) weights:

$$w_i = \left( \frac{1}{N} \times \frac{1}{P(i)} \right)^{\beta}. \tag{27}$$

The IS in PER corrects the over-sampling with respect to the uniform distribution. These weights can be folded into the Q-learning update by using $w_i \delta_i$ instead of $\delta_i$. The accumulated gradient, indicated by $\Delta$, is given as

$$\Delta \leftarrow \Delta + w_i \delta_i \cdot \nabla_{\theta} Q(s_{j-1}, a_{j-1}; \theta). \tag{28}$$

In (28), the weight $w_i$ is coupled with the TD term $\delta_i$ during training as $w_i \delta_i$, because $\delta_i$ is multiplied with the gradient $\nabla_{\theta} Q(s_{j-1}, a_{j-1}; \theta)$ following the chain rule. For stability reasons, we always normalize the weights by $1/(\max_i w_i)$ so that they only scale the update downwards. Here the $\beta$ term in the exponent controls the effect of prioritization applied to each term. Because training is highly unstable at the beginning, $\beta$ starts from small values of 0.4 to 0.6 and anneals towards one. In this process, IS corrections matter more near the end of training. Compared with the standard DDQN architecture, PER-DDQN aims to change the sampling distribution by using a criterion to define the priority of each tuple of experience in the experience replay. For instance, a tuple with a big difference between the prediction and the TD target indicates that the agent has a lot to learn from it, and thus it has a high priority.

E. RIS-Assisted Handover via PER-DRL

To enable the blockage-aware handover, an agent is trained at the central controller. It is responsible for predicting the channel blockage and adjusting the beamformers and RIS phase shifts. At the beginning of training, global network information like power allocation, CSI of all users, and SINRs after beamsweeping with different beampatterns is collected. Some of the information described in Section V-A serves as the input while training the PER-DRL agent. Parameters like the discount factor $\gamma$, the exploration-exploitation factor $\epsilon$, and the exponents for prioritized replay $\alpha$ and $\beta$ are initialized at the beginning. To accelerate the learning stage, the exploration-exploitation balance has to be addressed carefully. In our scheme, the value of $\epsilon$ is updated in a monotonically decreasing manner so that, at the early stage, the agent explores the environment with a high probability of taking a random action, i.e., choosing beampattern indexes for BSs and RIS and performing frequent handovers. As the number of training episodes increases, the output of the non-linear estimator of the state-action value function becomes more reliable and converges gradually. The instantaneous reward is obtained by the agent for each action. At the same time, the agent also gets the latest environment observations for the state. For example, the transition $(s_{t-1}, a_{t-1}, r_t, \gamma_t, s_t)$ has to be stored in a buffer. The experience in the replay buffer is selected by the PER scheme to generate mini-batches, which are used to train the DQN.

After the convergence threshold of PER-DDQN is met, the model is saved. Due to the PER strategy, the channel variation at the mmWave frequency band can be tolerated in the trained model. The trained model is applied at the central controller to perform the handover for RIS-assisted mmWave cellular networks. In general, the model training is performed offline while the agent implementation stage is performed in real time. For a different network implementation, the PER-DDQN agent needs to be trained again to incorporate the dynamics of the environment.

F. Implementation of PER-DRL Agent

We provide the details regarding the training of the PER-DDQN agent and the metrics to test the performance of the proposed schemes.

1) Setups for RIS-Assisted mmWave Cellular Networks: 3 BSs with $N$ antenna elements are located in a 400 m × 400 m rectangular plane, each with a radius of 50 m. The RIS consists of $M = 16$ phase shifting elements in most cases. The overlap between adjacent BSs is covered by the RIS. The coordinates of the BSs are pre-defined, i.e., BSs 1, 2, and 3 are located at (0, 0), (100, 0), and (50, −80) in meters (m). The RIS is located at (50, −25) in meters (m). The same beamforming codebook [19] is considered in this work, which includes a set of specified beamformers and weights for the RIS phase shifting matrix. The resolved channel information indicated by the tuple is stored with the previous samples, which serves as the input for the DNN.

2) Training of the PER-DRL Agent: The large variation related to severe mmWave channel fading and different UE locations can be seen in the magnitude of the samples,
which usually varies from $10^{-9}$ to $10^{-14}$. For this reason, we normalized the samples in the transition tuples to realize the dataset scaling. For the DDQN model, the upper limit for the episode number is 10,000 and the batch size is set to 64. For the DQN training model, the experience replay size is set to 8,000, and the buffer size for PER-DDQN is set to 100,000, where the sampling bias and importance weight are defined as in (27). To adapt to more complex datasets due to the channel blockage at the mmWave frequency band and to better generalize to previously unseen signal blockage scenarios, we consider multi-layer neural networks for the PER-DDQN model. In this work, after balancing the training efficiency and handover performance, we adopt 6 feedforward fully connected network layers for the PER-DRL model, including the input layer, the hidden layers, and the output layer. In this way, the DNN can describe the mapping between state transitions and actions, which represent the BS beamformers and the RIS reflecting beamforming matrix. The sizes of the 4 hidden layers are 500, 250, 200, and 200, respectively. Each of the neurons in a hidden layer connects to all the outputs of the previous layer. Each layer utilizes the ReLU activation function. The output layer is a linear perceptron. To achieve an efficient interaction between the agent and the environment, the training data involving all of the random dynamics is stored.

Algorithm 1 PER-DRL-Based Handover Scheme
Initialization:
    Channel representation $p$, SINRs $\{\mathrm{SINR}_b\}_{b \in B}$;
    The handover agent initializes the step-size $\eta$;
    Minibatch $k$, replay period $K$;
    Parameters $\gamma$, $\epsilon$, exponents $\alpha$, $\beta$;
    Central controller initializes handover;
    Beamformers and RIS;
    Initialize replay memory $H = \emptyset$, $\Delta = 0$, $p_1 = 1$;
    The weight vector $\theta$ of the Q-network;
    The weight vector $\theta^{-}$ of the target Q-network.
for $t = 1$ to $T$ do
    Obtain $s_t$, $r_t$;
    Store transition $(s_{t-1}, a_{t-1}, r_t, \gamma_t, s_t)$ in $H$ with maximal priority $p_t = \max_{i<t} p_i$;
    if $t \equiv 0 \bmod K$ then
        for $j = 1$ to $k$ do
            Sample transition $j \sim P(j) = p_j^{\alpha} / \sum_i p_i^{\alpha}$;
            IS weight $w_j = (N \cdot P(j))^{-\beta} / \max_i w_i$;
            $\delta_j = r_j + \gamma \max_{a'} Q(s_j, a') - Q(s_{j-1}, a_{j-1})$;
            Update transition priority $p_j \leftarrow |\delta_j|$;
            $\Delta \leftarrow \Delta + w_j \delta_j \cdot \nabla_{\theta} Q(s_{j-1}, a_{j-1}; \theta)$;
        end for
        Update weights $\theta \leftarrow \theta + \eta \cdot \Delta$, reset $\Delta = 0$;
        Update weights of target network $\theta^{-} \leftarrow \theta$;
    end if
    Evaluate the SINRs for UEs;
    Evaluate consumption on the handover;
    Configure RIS weight and handover $a_t \sim \pi_{\theta}(s_t)$;
end for

3) Computational Complexity Analysis: From Algorithm 1, collecting transitions and executing back-propagation to train the parameters account for the computational complexity. We considered neural networks with weights and activation functions. If we use a sequence $(w_1, \ldots, w_L)$ to denote the number of nodes in each layer, we can get the following result.

Theorem 1: After considering the PER mechanism, the computational complexity of DDQN can be written as:

$$I_1 = O\left( \left( T - \frac{T - T \bmod K}{K} \right) \left( \sum w_i w_{i+1} \right) + \frac{T - T \bmod K}{K} \left( 3k \sum w_i w_{i+1} + \log_2 |D| \right) \right).$$

Proof: For each time step, the computational complexity of the DNN is $O\left(\sum w_i w_{i+1}\right)$. Considering the mini-batch size as $k$, it requires $T$ episodes. After every $K$ transitions, the mini-batch operation has the computational complexity $O\left( \frac{T - T \bmod K}{K}\, 3k \sum w_i w_{i+1} \right)$, which includes the back-propagation and feedforward pass process. According to [19], the computational complexity of PER is $O(\log_2 |D|)$. There are $T - \frac{T - T \bmod K}{K}$ iterations needed to generate transitions, with the computational complexity $O\left( \left( T - \frac{T - T \bmod K}{K} \right) \left( \sum w_i w_{i+1} \right) \right)$. In our proposed scheme, PER is utilized to improve the learning efficiency and enhance the convergence speed, which requires extra computational complexity. However, after considering the PER unit, the proposed scheme can achieve better performance than the classical DDQN scheme. ∎

VI. NUMERICAL RESULTS

We consider a mmWave massive MIMO system adopting analog phase shifters with a single radio-frequency (RF) chain. The BSs adopt a uniform linear array (ULA) with 64 antennas. The UE has a single antenna. The spatial-matched-filter beam pattern is applied in this work. The system operates at 28 GHz and the bandwidth is 100 MHz. Channel reciprocity holds within the coherence time. The channel parameters are set as follows. Within the channel coherence time, we assume the number of clusters is distributed as specified in [11]. The noise power $N_0$ is expressed in the form $N_0 = 10^{N_0(\mathrm{dB})/10}$, where $N_0(\mathrm{dB}) = -174 + 10 \log_{10}(BW) + F_{\mathrm{dB}}$ and $F_{\mathrm{dB}} = 10$ dB is the noise figure. To have a better prediction of the required SINR in a period, the locations of BSs are modeled as a Poisson Point Process (PPP) with density $\lambda_m$. In order to emulate the user mobility, we adopt the Random Way Point (RWP) model with a moving speed of 2 m/s. The type of prioritization is proportional, with $p_i = |\delta_i| + \epsilon$, where $|\delta_i|$ is the TD error of transition $i$. The hyperparameters were initialized as $\alpha = 0.7$, $\beta = 0.5$. To evaluate the performance of the proposed scheme, we compare our scheme in RIS-assisted mmWave cellular networks with the following schemes:
• Considering PER in the training strategy, the DDQN-based handover scheme is able to combat the frequent channel blockage; it is denoted as RIS-SENS-DDQN. Here SENS stands for the blockage coefficients resolved by the proposed algorithm. To determine an appropriate handover policy, the value of the Q function is estimated by the DQN algorithm.


Fig. 2. Prediction accuracy impacted by beampattern numbers and SNR.


Fig. 3. The performance of prediction accuracy and handover number.

• A handover scheme considering the BS beamformer design is developed. Instead of considering handover schemes in cooperation with the RIS, this scheme relies only on the BS beamformer to counter the channel blockage. In this scheme, a DDQN agent is trained to balance the trade-off between beam-tracking and UE handover.
• A threshold-based handover strategy is considered. With the ability to sense channel blockages, this scheme is able to perform handover and avoid data rate degradation. Please note that, for the threshold-based handover strategy, the assistance from BS beamformers or RIS is not involved.
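As a small worked example for the simulation setup, the noise power expression $N_0(\mathrm{dB}) = -174 + 10 \log_{10}(BW) + F_{\mathrm{dB}}$ can be evaluated for the stated 100 MHz bandwidth and 10 dB noise figure, which gives −84. The sketch below is illustrative only; the function name and defaults are hypothetical, and reading −174 as the usual thermal noise density in dBm/Hz (so that the linear value is in mW) is an assumption, since the text writes the quantity in dB.

```python
import math


def noise_power(bandwidth_hz, noise_figure_db=10.0, density_db_per_hz=-174.0):
    """Return (N0 in dB, N0 in linear scale) for the expression in Section VI.

    Assumption: -174 is the thermal noise density usually quoted in dBm/Hz,
    in which case the linear value is in milliwatts.
    """
    n0_db = density_db_per_hz + 10.0 * math.log10(bandwidth_hz) + noise_figure_db
    n0_linear = 10.0 ** (n0_db / 10.0)
    return n0_db, n0_linear


# 100 MHz bandwidth and 10 dB noise figure, as stated in the setup.
n0_db, n0_linear = noise_power(100e6)
```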
Fig. 4. Average reward performance versus episodes under different learning schemes.

We compare the learning convergence speed and efficiency under three learning schemes, i.e., RIS-PER, RIS-DDQN, and BF-DDQN, as an example. Fig. 4 shows the average system reward versus training episodes. We have several observations. First, it can be observed that for the RIS-assisted UE handover in the mmWave cellular networks, the models tend to converge after 3,000 episodes with different convergence speeds. Besides, we can observe great variation in the average reward at different episodes. This is due to the perturbation caused by frequent channel blockages. Moreover, the PER-based scheme is able to prioritize the transitions with large TD errors and thus has a faster convergence speed. If we compare the RIS-DDQN scheme and the BF-DDQN scheme, the latter can converge fast due to its relatively small state and action space. Please note that the convergence speed can be affected by factors like the learning rate and the level of transmission power. We observe that for the same learning algorithm, after reducing the learning rate from 0.7 to 0.65, the convergence speed and learning performance can be directly improved. Increasing the power level can expand the dynamic range of the episode reward and improve the average reward. This effect is in line with the observations in [7].
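The prioritization of large-TD-error transitions mentioned above follows the sampling probability in (26) and the IS weights in (27). A minimal pure-Python sketch is given below, using the proportional priority $p_i = |\delta_i| + \epsilon$ and the stated values $\alpha = 0.7$, $\beta = 0.5$. It is not the authors' implementation: the function names are hypothetical, the value of $\epsilon$ is an assumed placeholder, and a production buffer would typically use a sum-tree rather than recomputing all probabilities.

```python
import random


def per_probabilities(td_errors, alpha=0.7, eps=1e-6):
    """Sampling probabilities P(i) from (26) with p_i = |delta_i| + eps."""
    scaled = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.5):
    """IS weights from (27), normalized by the maximum weight."""
    n = len(probs)
    weights = [(1.0 / (n * p)) ** beta for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]


def sample_minibatch(td_errors, batch_size, rng, alpha=0.7, beta=0.5):
    """Draw transition indices according to P(i) and return their IS weights."""
    probs = per_probabilities(td_errors, alpha)
    weights = importance_weights(probs, beta)
    idx = rng.choices(range(len(probs)), weights=probs, k=batch_size)
    return idx, [weights[i] for i in idx]


# Example with made-up TD errors: the third transition is sampled most often,
# while the zero-error transition keeps a small non-zero probability.
td = [0.1, 1.0, 2.0, 0.0]
indices, is_weights = sample_minibatch(td, batch_size=64, rng=random.Random(0))
```

Setting `alpha=0.0` recovers uniform sampling, and the normalization by the maximum weight means the IS weights can only scale an update downwards.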


A. Prediction Accuracy of Proposed Lightweight Algorithm

In the process of prediction, the UE movements follow a random way point (RWP) [44] mobility pattern with a speed of 2 m/s. We recorded the locations of the UE and obtained the distances from the UE to the B BSs. By default, the UE adopts the quasi-omni-directional beam pattern while the BS utilizes a directional beam pattern. The RIS utilizes weights defined similarly to the scheme in [19], [20]. An obstacle (e.g., human body, vehicle, building) randomly blocks the clusters within the mmWave channel. Based on parameters like AoA, phase shift, and amplitude attenuation from the scanned CIR trace, estimated before blockage, the CIR traces from the current beam pattern are recorded. Then, the receiver employs the proposed lightweight algorithm to predict the SINRs and the corresponding reward for maintaining this link.

The proposed lightweight algorithm can efficiently estimate the link blockage status for RIS-assisted mmWave cellular networks. For instance, Fig. 2(a) shows the accuracy of the proposed algorithm when predicting the best beam pattern index for phased arrays containing 64 antenna elements. The number of available beam pattern sectors varies from 4 to 16. For a 4-beam antenna array with SNR = 12 dB, in the presence of the RIS, our lightweight prediction algorithm achieved a mean prediction accuracy over 89%. As the number of beam patterns increases, the prediction accuracy drops to 77% for the case with 16 beam patterns. In Fig. 2, the accuracy under various levels of transmitting power is listed. We can observe that as the transmitting power rises, the prediction accuracy gradually increases, and our algorithm outperforms the table searching algorithms [11] with quantization levels 128 and 256, respectively. In Fig. 2, we compared the prediction accuracy of the proposed schemes under various SNRs. We find that in the high SNR regime, the impact of noise on the estimation of CSI is limited, and we thus can obtain precise channel coefficient model reconstruction and blockage status, which directly improves the prediction accuracy.

The empirical cumulative distribution function (CDF) of three schemes can be viewed in Fig. 3(a). The accuracy of our proposed algorithm is represented by the black curve. With a prediction accuracy as high as 90%, our algorithm has the lowest probability (10%) of failing the target. The low prediction accuracy of existing works is due to two reasons: (i) The table search is based on the assumption that the quantized link blockage coefficients can approach the actual coefficients. The quantized coefficient $b$ does not have a sufficient updating rate to approach the global minimum; (ii) The table search based algorithm developed in [11] needs to enlarge the search space as the number of parameters increases and thus requires a large memory size, which may cause redundant computation overhead.

B. The Number of Handovers

In this section, we validate the number of handovers achieved by the proposed scheme. Most MDP-based schemes cannot handle the huge state space. As a benchmark comparison, we compare our scheme with the following triggering schemes [28]. We discuss the number of handovers in the following three cases. In the first case, we had 3 BSs and 6 users.

Fig. 3 shows the average handover number of 6 schemes. For a single process, we consider 500 channel blockages over various SNRs. The number of elements in the phase shifting array in the RIS is set to $M = 16$, and $N = 64$ antenna elements at each BS are considered. As expected, for a single process, the average handover number of all 6 approaches decreases monotonically with increasing SNR. After improving the SNR/transmission power, for instants with slight channel blockage, the data rate at the UEs can be maintained. In addition, from Fig. 3, we observe that our proposed scheme incorporating the RIS needs fewer handovers compared with other schemes merely considering beam-tracking (denoted by BF-DDQN) or proactive device handover (denoted by threshold). In fact, with the cooperation of the RIS, the UE can jointly adjust the beamformer $w$ and the RIS beampattern $\Phi$ towards channel clusters that are not blocked. Besides, with the inference of resolved blockage status information, the schemes with blockage information, i.e., those denoted by RIS-SENS-PER or RIS-SENS-DDQN, can outperform the learning strategy that only considers state information like SINR and channel information. In fact, our approach provides detailed environment observation of the channel blockage status, which gives more efficient feedback on the blockage effect, while the remaining approaches cannot obtain such information in the training process. Moreover, from Fig. 3, we can observe that as the SNR increases, the handover agent can gradually reduce the handover consumption. This is due to the fact that high SNRs enable more accurate observations of the environment states. Having less noisy environment observations can also improve the handover performance.

Fig. 5 shows how the handover number varies under different user numbers during random blockages. In Fig. 5, the CDF of the number of handovers is presented. For each process, we conducted 500 steps. From this figure we observe that, with the inference information of blockage status, the RIS-assisted handover scheme RIS-SENS-PER obtains the smallest number of handovers, which outperforms schemes without blockage status information or the beamformer-assisted handover scheme. Due to the ability to recover links under various blockages, our scheme can restore links with good quality and, at the same time, the overall handover procedure is further optimized by the DRL-based handover scheme. From the figure, for the two different learning strategies DDQN and PER-DDQN, we find that the PER-based scheme has the lowest CDF. The existing schemes that rely on partial information about SINR (rate-based) or average SINRs (channel-based) do not have the ability to sense the blockage status and thus cannot optimize the overall handover process.

C. Handover Throughput

The throughput for UE $u$ under various blockage conditions is evaluated in the following scenarios. We considered 8 beam patterns and 16 passive phase shifting elements in the RIS. In the first scenario, 20% of the clusters in the channel are
randomly blocked with the blockage coefficient $b_i \in [0, 1]$. In the second scenario, for each slot, 50% of the clusters are blocked. The corresponding results are presented in Fig. 5(a). From Fig. 5(a), we can observe that schemes assisted by the RIS and considering blockage status information have the highest average spectrum efficiency compared with schemes without blockage status information or schemes relying on beam-tracking. In Fig. 5(a), when $K = 8$, $M = 16$, $N = 64$, we can observe that the scheme denoted by RIS-SENS-PER has an average spectrum efficiency larger than 2.5 bps/Hz, while the RIS scheme without inference information has an average spectrum efficiency near 2.3 bps/Hz. With the inference of blockage status, the design of beamformers and RIS weights is more accurate and appropriate for the fast blockage status over beamformers and RIS weights. The scheme that only considers beam-tracking is indeed able to restore the link; however, the average spectrum efficiency of the restored link is limited. In this case, the potential of reflective channels is not fully exploited.

Fig. 5. The CDF of spectrum efficiency and handover number.

D. The Number of RIS Elements

In Fig. 6(a), for all of the six schemes, the performance of average spectrum efficiency (bps/Hz) is evaluated by changing the number of RIS elements, i.e., from $M = 8$ to 48. The SNR is set to 12 dB and 8 beampatterns are utilized at the BSs. For UE handover schemes assisted by the RIS, the average spectrum efficiency significantly increases with the number of RIS elements. By increasing the number of RIS elements, a high signal gain can be obtained and the interference is better suppressed to improve the received SINR at the UEs. In addition, the performance of the threshold-based handover scheme improves very slowly, which indicates that the RIS can enrich the scattering environment and thus combat the channel blockage. It is worthwhile pointing out that the joint design of the BS beamformer and RIS weight can better leverage the benefits of larger RIS array sizes. With more reflecting elements at the RIS, the proposed PER-DDQN learning based handover scheme becomes more flexible for passive phase shifting and beamforming design, thereby yielding higher gains.

Fig. 6. The histogram of spectrum efficiency and handover number.

From Fig. 6(b), it is found that, for RIS-assisted handover schemes, fewer handovers are needed in the presence of fast channel blockage. We note that the handover scheme relying merely on beam-tracking (denoted as BF-DDQN) has less performance improvement compared with schemes with joint designs, RIS-DDQN or RIS-SENS-DDQN. In particular, the performance gap clearly increases with the number of RIS elements $M$. More passive phase shifting elements in the


RIS mean that richer scattering and reflecting clusters within the channels are possible. At the same time, more passive phase shifting elements enable the RIS to refine the reflected signal more precisely and thus increase the SINR. In addition, Fig. 6(b) shows that, compared with the RIS-assisted handover schemes without the inference information of blockage status at the training stage, such as RIS-DDQN or RIS-PER, the two schemes RIS-SENS-DDQN and RIS-SENS-PER incur less handover consumption. For example, beyond M = 16, the handover numbers of these two schemes drop very quickly.

The performance of handover agents, such as the average spectrum efficiency and the total number of handovers, can be affected by the environment dynamics in the test data. For instance, datasets may include various environment dynamics with different blockage statuses, UE locations, and so on. To test the performance of the trained model, we generated several datasets with different blockage statuses. In blockage dataset 1, the percentage of severe channel blockage, i.e., a signal strength drop of 20 dB, is 30%. In dataset 2 and dataset 3, we consider more frequent channel blockages: the percentages of severe channel blockage are 40% and 50%, respectively. We trained the model using the first dataset and conducted the test over the other datasets. From Fig. 7(a) and 7(b), we find that the average spectrum efficiency and the total number of handovers vary slightly over the different datasets. For instance, compared with the performance over the original dataset, we observe a decrease in average spectrum efficiency and an increase in the number of handovers under severe blockage scenarios.

Fig. 7. The histogram of spectrum efficiency and handover number under different datasets.

VII. CONCLUSION
In this work, we developed an RIS-assisted handover scheme based on DRL to mitigate the large handover consumption caused by frequent link blockages in the mmWave frequency band. The intelligent reflecting surface is utilized to enrich the scattering and reflecting environment for signal transmission. To better mitigate link blockage and refine the reflective channels, the passive phase shifting array in the RIS and the beamformers at the BSs are jointly designed. At the same time, to improve the efficiency of model training, a DRL agent based on PER-DDQN is developed. Numerical results show that UE handover assisted by the RIS can operate in various blockage scenarios and achieve a lower handover number and higher throughput than existing schemes.

APPENDIX
Proof of Proposition 2: The objective function in problem (11), together with the Lagrangian multiplier $\lambda \in \mathbb{R}^{P \times 1}$, can be represented as

$$L(b, \lambda) = \frac{1}{2} b^{T} A b - \mathrm{Re}\left\{ \hat{h}^{*}_{k;p} \left(h^{\mathrm{opt}}_{k}\right)^{H} \right\} b + \lambda^{T} b. \tag{29}$$

The derivative of the above function is

$$\nabla L(b, \lambda) = A b - \mathrm{Re}\left\{ \hat{h}^{*}_{k;p} \left(h^{\mathrm{opt}}_{k}\right)^{H} \right\} + \lambda I_{P}. \tag{30}$$

We can observe that if we can find $\lambda^{*}$ that meets the complementary slackness condition $\lambda^{*} I_{P} = 0$, then $L(b, \lambda)$ meets the stationarity property. Therefore, $b^{*}$ is derived according to the closed-form result in (2).
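As a small numerical illustration of the stationarity argument in the proof of Proposition 2, the following Python sketch solves Ab = c for a quadratic Lagrangian of the same form as (29) and checks that the gradient vanishes when the multiplier is zero. The matrix A, the vector c (a stand-in for the Re{·} term), and the helper names solve_linear and lagrangian_gradient are hypothetical and chosen only for illustration. They are not taken from problem (11) or from the authors' implementation, and A is assumed to be symmetric and positive definite.

```python
# Illustrative sketch only: A and c are made-up values, not taken from the paper.


def solve_linear(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    # Work on an augmented copy [A | c] so the inputs stay untouched.
    M = [list(map(float, A[i])) + [float(c[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lagrangian_gradient(A, c, b, lam):
    """Gradient of 0.5*b^T A b - c^T b + lam^T b with respect to b (A symmetric)."""
    n = len(b)
    return [sum(A[i][j] * b[j] for j in range(n)) - c[i] + lam[i] for i in range(n)]


A = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical symmetric positive-definite matrix
c = [1.0, 0.6]                # hypothetical stand-in for the Re{.} term in (29)
lam_star = [0.0, 0.0]         # multiplier assumed to vanish (complementary slackness)

b_star = solve_linear(A, c)                          # stationary point of the sketch
grad = lagrangian_gradient(A, c, b_star, lam_star)   # should be numerically zero
print(b_star, grad)
```

With a zero multiplier, the stationarity condition reduces to the linear system Ab = c, which is why a plain linear solve suffices in this sketch. A non-zero multiplier would add the λ term to the right-hand side.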


Long Jiao (Student Member, IEEE) received the B.Sc. degree in information security from Xidian University (XDU), Xi’an, China, in 2016. He is currently pursuing the Ph.D. degree with George Mason University, Fairfax, VA, USA. He has been with George Mason University, since 2016. His current interests include 5G communication systems, 5G physical layer security, operational security of spectrum sharing systems, and RIS/IRS-assisted wireless communications.


Pu Wang (Student Member, IEEE) received the B.S. degree in telecommunications engineering from Xidian University in 2014, where he is currently pursuing the Ph.D. degree with the School of Cyber Engineering. His research interests include backscatter communication, wireless information and power transfer, physical layer security, and information security in the Internet of Things.

Amir Alipour-Fanid (Member, IEEE) received the Ph.D. degree in electrical and computer engineering from George Mason University, Fairfax, VA, USA, in 2021. He is currently an Assistant Professor with the Lane Department of Computer Science and Electrical Engineering, West Virginia University (WVU), Morgantown, WV, USA. His research interests include wireless cyber-physical systems security, the Internet of Things Security, connected and autonomous vehicles security, 5G wireless communication, theoretical multi-armed bandits learning, and applied machine learning.

Huacheng Zeng (Senior Member, IEEE) received the Ph.D. degree in computer engineering from Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA. He is currently an Assistant Professor with the Department of Computer Science and Engineering, Michigan State University (MSU), East Lansing, MI, USA. Prior to joining MSU, he was an Assistant Professor in electrical and computer engineering with the University of Louisville, Louisville, KY, USA, and a Senior System Engineer at Marvell Semiconductor, Santa Clara, CA, USA. His research interests include wireless networking and sensing systems. He was a recipient of the NSF CAREER Award. He received the Best Paper Award from IEEE SECON 2021.

Kai Zeng (Member, IEEE) received the Ph.D. degree in electrical and computer engineering from Worcester Polytechnic Institute (WPI) in 2008. He was a Post-Doctoral Scholar with the Department of Computer Science, University of California at Davis (UCD), from 2008 to 2011. He was with the Department of Computer and Information Science, University of Michigan-Dearborn, as an Assistant Professor, from 2011 to 2014. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, Cyber Security Engineering, and the Department of Computer Science, George Mason University. His current research interests include cyber-physical systems/the IoT security and privacy, 5G and beyond wireless network security, network forensics, machine learning, and spectrum sharing. He was a recipient of U.S. National Science Foundation Faculty Early Career Development (CAREER) Award in 2012, the Excellence in Postdoctoral Research Award from UCD in 2011, and the Sigma Xi Outstanding Ph.D. Dissertation Award from WPI in 2008. He is an Editor of the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY and IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING.
