Deep Reinforcement Learning For Intelligent Reflecting Surface-Aided D2D Communications
Abstract—In this paper, we propose a deep reinforcement learning (DRL) approach for solving the optimisation problem of the network's sum-rate in device-to-device (D2D) communications ...

Some research works have investigated the efficiency of the IRS in assisting the D2D communications [7], [8]. In [7] ...
... in Fig. 1. Each pair of D2D users comprises a single-antenna D2D transmitter (D2D-Tx) and a single-antenna D2D receiver (D2D-Rx). An IRS panel with K reflective elements is deployed to enhance the signal from the D2D-Tx to the associated D2D-Rx and to mitigate the interference from the other D2D-Txs. The reflective elements of the IRS map the incident signal to the receiver according to the phase shift matrix, which is controlled by an intelligent unit. The received signal at the D2D-Rx is composed of a direct signal and a reflected one. We denote the position of the nth D2D-Tx at time step t ...

... channel at time step t described by

H^t_{nm} = \sqrt{β_1/(1 + β_1)} h̃^{LoS}_{nm} + \sqrt{1/(β_1 + 1)} h̃^{NLoS}_{nm},   (3)

where β_1 is the Rician factor, and h̃^{LoS}_{nm}, h̃^{NLoS}_{nm} are the line-of-sight (LoS) and the non-line-of-sight (NLoS) components of the reflected channel, respectively. Specifically, the LoS component is defined as [7] ...
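To make the channel model concrete, the short Python/NumPy sketch below draws one realisation of the reflected channel in (3). It is only an illustrative sketch: the LoS term is truncated in the recovered text, so it is passed in as an argument, and the unit-variance Gaussian model of the NLoS term is an assumption.

import numpy as np

def reflected_channel(h_los, beta_1=4.0):
    # One realisation of the Rician reflected channel of Eq. (3):
    #   H = sqrt(beta_1 / (1 + beta_1)) * h_LoS + sqrt(1 / (beta_1 + 1)) * h_NLoS
    # h_los : complex array holding the deterministic LoS component (e.g. length K).
    # beta_1: Rician factor (Table I uses beta_1 = 4).
    # NLoS part: i.i.d. circularly-symmetric complex Gaussian with unit variance (assumption).
    h_nlos = (np.random.randn(*h_los.shape) + 1j * np.random.randn(*h_los.shape)) / np.sqrt(2.0)
    return np.sqrt(beta_1 / (1.0 + beta_1)) * h_los + np.sqrt(1.0 / (beta_1 + 1.0)) * h_nlos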
III. JOINT OPTIMISATION OF POWER ALLOCATION AND PHASE SHIFT MATRIX

Given the optimisation problem (9), we formulate the MDP with the agent, the state space S, the action space A, the transition probability P, the reward function R and the discount factor ζ. Let us denote P_{ss′}(a) as the probability that the agent takes action a_t ∈ A at the state s = s_t ∈ S and transfers to the next state s′ = s_{t+1} ∈ S. In particular, we formulate the MDP game as follows:

• State space: The channel gains of the D2D users form the state space as

S = { h_{11} + \sum_{k=1}^{K} H_{11}Φ, . . . , h_{1N} + \sum_{k=1}^{K} H_{1N}Φ, . . . , h_{nm} + \sum_{k=1}^{K} H_{nm}Φ, . . . , h_{nN} + \sum_{k=1}^{K} H_{nN}Φ, . . . , h_{N1} + \sum_{k=1}^{K} H_{N1}Φ, . . . , h_{NN} + \sum_{k=1}^{K} H_{NN}Φ }.   (10)
• Action space: The D2D-Txs adjust the transmit power and the IRS changes the phase shift for maximising the expected reward. Thus, the action space for the D2D users and the IRS is considered as follows:

A = { p_1, p_2, . . . , p_N, θ_1, θ_2, . . . , θ_K }.   (11)
• Reward function: The agent needs to find an optimal policy for maximising the reward. In our problem, our objective is to maximise the network sum-rate; thus, the reward function is defined as

R = B \sum_{n=1}^{N} \log_2 ( 1 + |h_{nn} + \sum_{k=1}^{K} H_{nn}Φ|^2 p_n / ( \sum_{m≠n}^{N} |h_{mn} + \sum_{k=1}^{K} H_{mn}Φ|^2 p_m + α^2 ) ).   (12)

A numerical sketch of this reward computation is given after this list.
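The following NumPy sketch evaluates the sum-rate reward of (12) for given channels, IRS phases and transmit powers. It is illustrative only: the indexing convention (h[m, n] being the channel from D2D-Tx m to D2D-Rx n) is an assumption, while the 1 MHz bandwidth and the −80 dBm (10^-11 W) noise power are taken from Table I.

import numpy as np

def sum_rate_reward(h_direct, H_refl, phi, p, bandwidth=1e6, noise_power=1e-11):
    # Sum-rate reward of Eq. (12).
    # h_direct: (N, N) direct channels, h_direct[m, n] = D2D-Tx m -> D2D-Rx n.
    # H_refl:   (N, N, K) reflected channels through the K IRS elements.
    # phi:      (K,) IRS reflection coefficients, e.g. exp(1j * theta_k).
    # p:        (N,) transmit powers.
    N = p.shape[0]
    eff_gain = np.abs(h_direct + H_refl @ phi) ** 2      # |h_mn + sum_k H_mn * phi_k|^2
    rate = 0.0
    for n in range(N):
        signal = eff_gain[n, n] * p[n]
        interference = sum(eff_gain[m, n] * p[m] for m in range(N) if m != n)
        rate += bandwidth * np.log2(1.0 + signal / (interference + noise_power))
    return rate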
By following the MDP, the agent interacts with the environment and receives the response to achieve the best expected reward. Particularly, the state of the agent at time step t is s_t. The agent chooses and executes the action a_t under the policy π. The environment responds with the reward r_t. After taking the action a_t, the agent moves to the new state s_{t+1} with probability P_{ss′}(a). The interactions are iteratively executed and the policy is updated for the optimal reward.
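As an illustration of this interaction loop, a minimal environment skeleton is sketched below. It is not the authors' implementation: the class and method names are hypothetical, the channel draws are placeholder Rayleigh samples (the reflected part would follow the Rician model of (3)), and the state is simply the flattened effective channel gains of (10); it reuses the sum_rate_reward sketch given earlier.

import numpy as np

class IRSD2DEnv:
    # Minimal MDP skeleton: state (10), action (11), reward (12). Names are illustrative.

    def __init__(self, n_pairs, n_elements, p_max):
        self.N, self.K, self.p_max = n_pairs, n_elements, p_max

    def reset(self):
        self.h, self.H = self._draw_channels()
        return self._state()

    def step(self, action):
        # action = (p_1, ..., p_N, theta_1, ..., theta_K), as in Eq. (11)
        p = np.clip(action[:self.N], 0.0, self.p_max)
        phi = np.exp(1j * action[self.N:])
        reward = sum_rate_reward(self.h, self.H, phi, p)   # Eq. (12)
        self.h, self.H = self._draw_channels()             # channels at time step t+1
        return self._state(), reward

    def _state(self):
        # effective channel gains of Eq. (10), flattened into a real vector (assumption)
        eff = self.h + self.H @ np.ones(self.K, dtype=complex)
        return np.concatenate([eff.real.ravel(), eff.imag.ravel()])

    def _draw_channels(self):
        # placeholder i.i.d. complex Gaussian draws
        h = (np.random.randn(self.N, self.N) + 1j * np.random.randn(self.N, self.N)) / np.sqrt(2)
        H = (np.random.randn(self.N, self.N, self.K)
             + 1j * np.random.randn(self.N, self.N, self.K)) / np.sqrt(2)
        return h, H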
In this paper, we propose a DRL approach to search for an optimal policy for maximising the reward value in (12). The optimal policy can be obtained by modifying the estimation of the value function or directly by the objective. We use an on-policy algorithm for our work, namely proximal policy optimisation (PPO) with the clipping surrogate technique [16]. Considering the probability ratio of the current policy and the obtained policy, p^t_θ = π(s, a; θ)/π(s, a; θ_old), we need to find the optimal policy to maximise the total expected reward as follows:

L(s, a; θ) = E[ π(s, a; θ)/π(s, a; θ_old) · A^π(s, a) ] = E[ p^t_θ A^π(s, a) ],   (13)

where E[·] is the expectation operation and A^π(s, a) = Q^π(s, a) − V^π(s) denotes the advantage function [17]; V^π(s) denotes the state-value function while Q^π(s, a) is the action-value function.
In the PPO method, we limit the current policy such that it does not go far from the obtained policy by using different techniques, e.g., the clipping technique and the Kullback-Leibler divergence [17]. In this work, we use the clipping surrogate method to prevent excessive modification of the objective value, as follows:

L^{clip}(s, a; θ) = E[ min( p^t_θ A^π(s, a), clip(p^t_θ, 1 − ε, 1 + ε) A^π(s, a) ) ],   (14)

where ε is a hyperparameter.
When the advantage A^π(s, a) is positive, the term (1 + ε) clips the probability ratio so that the objective cannot grow beyond (1 + ε)A^π(s, a). Meanwhile, for the negative case of the advantage A^π(s, a), the term (1 − ε) sets a ceiling to limit the objective value. Moreover, for the advantage function A^π(s, a), we use [18]:

A^π(s, a) = r_t + ζ V^π(s_{t+1}) − V^π(s_t),   (15)

where the state-value function V^π(s) is obtained at the state s under the policy π as follows:

V^π(s) = E[ R | s, π ].   (16)

To train the policy network, we store the transitions into a mini-batch memory D and then use the stochastic gradient descent (SGD) method to maximise the objective. By denoting the policy parameter by θ, it is updated as

θ_{d+1} = arg max E[ L(s, a; θ_d) ].   (17)
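To make (13)–(15) concrete, the short sketch below computes the probability ratio, the one-step advantage of (15) and the clipped surrogate of (14) for a mini-batch of transitions. It is a minimal NumPy illustration with hypothetical names, not the paper's TensorFlow implementation; in practice, θ is then updated by ascending the gradient of this objective, as in (17).

import numpy as np

def one_step_advantage(r, v, v_next, zeta=0.9):
    # A^pi(s_t, a_t) = r_t + zeta * V^pi(s_{t+1}) - V^pi(s_t), Eq. (15)
    return r + zeta * v_next - v

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    # Clipped PPO objective of Eq. (14), averaged over the mini-batch.
    ratio = np.exp(logp_new - logp_old)                  # p_theta^t of Eq. (13)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))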
The PPO algorithm for the joint optimisation of the transmit power and the phase shift matrix in the IRS-aided D2D communications is presented in Algorithm 1, where M denotes the maximum number of episodes and T is the number of iterations during a period of time.

Algorithm 1 (fragment):
4: ...
5: for iteration = 1, . . . , T do
6:     Obtain the action a_t at state s_t by following the current policy
7:     Execute the action a_t
8:     Receive the reward r_t according to (12)
9:     Observe the new state s_{t+1}
10:    Update the state s_t = s_{t+1}
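A schematic Python rendering of this loop is given below; env and agent are hypothetical objects (e.g. the environment skeleton sketched earlier and a policy/value network exposing act and update), and the update step is assumed to ascend the clipped objective of (14) over the stored mini-batch, as in (17).

def train(env, agent, M=1000, T=128):
    # Schematic of Algorithm 1; all names are illustrative.
    for episode in range(M):
        state = env.reset()
        batch = []                                   # mini-batch memory D
        for iteration in range(T):
            action, logp = agent.act(state)          # step 6: follow the current policy
            next_state, reward = env.step(action)    # steps 7-8: execute a_t, receive r_t via (12)
            batch.append((state, action, logp, reward, next_state))
            state = next_state                       # steps 9-10: observe and update the state
        agent.update(batch)                          # SGD on the clipped surrogate, Eqs. (14), (17)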
IV. SIMULATION RESULTS

For numerical results, we use Tensorflow 1.13.1 [19]. The IRS is deployed at (0, 0, 0), while the D2D devices are randomly distributed within a circle of 100 m from the center. The maximum distance between the D2D-Tx and the associated D2D-Rx is set to 10 m. We assume d/λ = 1/2, and set the learning rate for the PPO algorithm to 0.0001. For the neural networks, we initialise two hidden layers with 128 and 64 units, respectively. All other parameters are provided in Table I. We consider the following algorithms in the numerical results.

• The proposed algorithm: We use the PPO algorithm with the clipping surrogate technique to solve the joint optimisation of the power allocation and the phase shift matrix of the IRS.
• Maximum power transmission (MPT): The D2D-Tx transmits information with maximum power, P_max. We use the PPO algorithm to optimise the phase shift matrix of the IRS panel.

TABLE I
SIMULATION PARAMETERS

Parameter                   Value
Bandwidth (W)               1 MHz
Path-loss parameters        κ_0 = 2.5, κ_1 = 3.6
Channel power gain          −30 dB
Rician factor               β_1 = 4
Noise power                 α^2 = −80 dBm
Clipping parameter          ε = 0.2
Discount factor             ζ = 0.9
Max number of D2D pairs     10
Initial batch size          K = 128

[Figure: Sum-rate (bits/s/Hz) versus the number of D2D pairs N (N = 2, . . . , 10), comparing the proposed algorithm, MPT, RPS, and Without IRS.]
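For convenience, the Table I values and the hyperparameters quoted in the text can be collected into a single configuration, as in the illustrative snippet below; the dictionary keys are hypothetical and the dB quantities are converted to linear scale here.

# Simulation parameters of Table I and the text (illustrative names).
SIM_PARAMS = {
    "bandwidth_hz": 1e6,                 # W = 1 MHz
    "path_loss_exponents": (2.5, 3.6),   # kappa_0, kappa_1
    "channel_power_gain": 1e-3,          # -30 dB
    "rician_factor": 4.0,                # beta_1
    "noise_power_w": 1e-11,              # alpha^2 = -80 dBm
    "clip_epsilon": 0.2,                 # PPO clipping parameter
    "discount_factor": 0.9,              # zeta
    "max_d2d_pairs": 10,
    "batch_size": 128,                   # initial batch size K
    "learning_rate": 1e-4,               # from the text
    "hidden_units": (128, 64),           # two hidden layers, from the text
}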