DQN 1

The document proposes a deep Q-learning network based resource allocation scheme for massive MIMO-NOMA systems. It develops a reinforcement learning framework to build an iterative optimization structure for user clustering, power allocation, and beamforming. Specifically, a DQN is used to group users based on a reward item calculated after power allocation and beamforming, with the goal of maximizing system throughput.

1544 IEEE COMMUNICATIONS LETTERS, VOL. 25, NO. 5, MAY 2021

A Deep Q-Network Based-Resource Allocation Scheme for Massive MIMO-NOMA

Yanmei Cao, Guomei Zhang, Member, IEEE, Guobing Li, Member, IEEE, and Jia Zhang

Manuscript received December 26, 2020; accepted January 18, 2021. Date of publication January 28, 2021; date of current version May 6, 2021. The research work reported in this letter is supported by the National Key R&D Program of China under Grant No. 2020YFB1807702 and the National Natural Science Foundation of China under Grant No. 61941119. The associate editor coordinating the review of this letter and approving it for publication was J. Choi. (Corresponding author: Guomei Zhang.)
The authors are with the School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/LCOMM.2021.3055348
1558-2558 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: London School of Economics & Political Science. Downloaded on May 16,2021 at 23:25:39 UTC from IEEE Xplore. Restrictions apply.

Abstract: In this letter, a deep Q-learning network (DQN) based resource allocation (RA) scheme is proposed for massive multiple-input multiple-output (MIMO) non-orthogonal multiple access (NOMA) systems. A reinforcement learning (RL) framework is developed to build an iterative optimization structure for user clustering, power allocation and beamforming. Specifically, a DQN is designed to group the users based on the reward item calculated after power allocation and beamforming. The objective is to maximize the reward item, i.e., the system throughput. Then, a back propagation neural network (BPNN) is used to realize the power allocation. During the training of the BPNN, the exhaustive search results in the quantized power set are taken as the output labels. Simulation experiments show that the proposed scheme can achieve a high system spectrum efficiency that approximates the one obtained by exhaustive search based user clustering and power allocation.

Index Terms: Non-orthogonal multiple access, massive multiple-input multiple-output, resource allocation, deep Q-learning network, back propagation neural network.

I. INTRODUCTION

Due to the extreme shortage of wireless spectrum resources, how to further improve spectrum efficiency and system capacity is still a key problem for the next generation of mobile communications [1]. Non-orthogonality and large dimensions are considered to be effective for tackling this problem [2]. NOMA can support the needs of large-scale access by superposing signals from multiple users in the same orthogonal resource block. In addition, massive MIMO can realize a high capacity by equipping the base station with a large-scale antenna array [3]. The combination of NOMA and massive MIMO technology is able to exploit the degrees of freedom in both the power domain and the space domain, thereby further improving the system spectrum efficiency. However, with the great increase in the number and density of users in massive MIMO-NOMA systems, various contradictions become significant, including the conflict between the transmission efficiency and the cumulative error propagation of the successive interference cancellation (SIC) decoder, and the trade-off between the intra-cluster coverage enhancement and the inter-cluster interference suppression. Specifically, in order to obtain the advantages of power domain multiplexing, a large difference of channel gain among the superimposed users is required and different levels of power are allocated to the users [4]. Then, the SIC decoder detects the signal according to the power order of the users. Thus, error accumulation is an inherent problem of the SIC decoder. The more users are superimposed, the more serious the error propagation becomes. This greatly limits the transmission efficiency. In addition, each beam has to cover all the users in one cluster in a MIMO-NOMA system, rather than one user as in a MIMO-OMA system. Then, the trade-off between enhancing the intra-cluster coverage and eliminating the inter-cluster interference becomes more difficult for beamforming. Based on the above analysis, joint optimization of user clustering, power allocation (PA) and beamforming becomes more urgent for massive MIMO-NOMA.

Unfortunately, such a joint RA problem with multiple variables has been proven to be an NP-hard problem [5]. Alternating optimization over the three parts is usually used. For the user clustering sub-problem, the optimal solution is obtained by searching all clustering combinations exhaustively. For the scenario with a large number of users, some heuristic clustering algorithms, such as random pairing and the next-largest-difference-based user pairing algorithm (NLUPA) [6], have been proposed to reduce the system complexity. As for PA, some simple methods such as the fixed power allocation (FPA) and the fractional transmit power allocation (FTPA) have been utilized [7]. In addition, considering the tight coupling between user clustering and power allocation, some joint optimization schemes have also been studied. In [8], to maximize the energy efficiency, a joint user pairing and dynamic power allocation scheme was proposed. In order to improve the service quality for the cell-edge users, the dynamic user allocation and power optimization problems were studied by considering the user fairness [9]. The above studies show that jointly optimized transmission can work more efficiently. However, with the increase in the number of antennas, how to improve the system capacity while avoiding excessive computation complexity and energy consumption is the primary difficulty for the joint optimization. With the development of machine learning (ML) in wireless communications, ML solutions for RA and wireless transmission continue to emerge [10], such as the channel allocation based on RL in [11], the power allocation and subchannel assignment schemes based on the deep recurrent neural network (RNN) architecture for a NOMA-based heterogeneous Internet of Things system in [12] and the long short-term memory network based power allocation for NOMA systems in [13]. These works show the potential advantage of applying ML algorithms to traditional communication systems.

Motivated by the previous research, this letter develops an RL method to solve the complex joint RA problem for massive MIMO-NOMA systems. However, traditional RL schemes, such as Q-learning, are not suitable when the state space is too large, and the learning efficiency would be
extremely low. Therefore, a deep Q-learning network (DQN) is adopted to realize the joint RA scheme in scenarios with a large number of users. Then, a deeply coupled iterative structure involving three functional modules, namely user clustering, power allocation and beamforming, is established based on the deep reinforcement learning (DRL) framework. In the user clustering stage, the DQN is used to gradually adjust the clustering results to maximize the system throughput, which is calculated by the environment evaluator based on the previous clustering result, power factors and beamforming vectors. In the power assignment stage, a BPNN is designed to learn the relationship between the power allocation factors and the users' channel state information (CSI) for each user cluster. As for the beamforming, some traditional methods, such as the zero forcing (ZF) algorithm and other optimized beamforming schemes, can be directly used among different user clusters. Simulation experiments show that the iterative process can converge after dozens of iterations and that the system performance can approximate that of the exhaustive search scheme.

II. SYSTEM MODEL AND PROBLEM FORMULATION

Consider a single-cell multi-user downlink system. The base station (BS) is equipped with $N_t$ antennas and serves $L$ single-antenna users. All users in the cell are divided into $N$ clusters, each of which includes $K$ users. The users' data in one cluster are transmitted in the form of a power-domain NOMA signal structure and preprocessed by the same beamforming vector. Assume that the BS deploys antennas on the Y-Z plane as a uniform planar array (UPA). Then, the channel vector from the BS to user $m$ can be modeled similarly as in [14], which is given by

$$h_m = \left(\frac{d_m}{d_0}\right)^{\mu} \sqrt{\frac{1}{L_u}} \sum_{l=1}^{L_u} g_{m,l}\, b(v_{m,l}) \otimes a(u_{m,l}) \tag{1}$$

Here $d_0$ is the radius of the cell and $d_m$ is the distance between the user and the BS. $\mu$ is the large-scale fading factor. $b(v_{m,l})$ and $a(u_{m,l})$ are the vertical and horizontal array response vectors of the UPA antenna, respectively. The symbol "$\otimes$" in (1) represents the Kronecker product of two matrices. $L_u$ is the number of scattering paths and $g_{m,l}$ denotes the small-scale fading coefficient.

Assume that the data sent by the BS is $X = [x_1 \cdots x_N]^T \in \mathbb{C}^{N \times 1}$, where $x_n = \sum_{k=1}^{K} \sqrt{\alpha_{n,k} P_n}\, s_{n,k}$ is the superposed NOMA signal of the $K$ users in the $n$-th cluster. Here, $P_n$ is the total power of the $n$-th cluster. $\alpha_{n,k}$ and $s_{n,k}$ are the power allocation factor and the transmitted symbol of the $k$-th user in the $n$-th cluster, denoted by $U_{n,k}$, respectively. It is assumed that $E[|s_{n,k}|^2] = 1$. The received signal of user $U_{n,k}$ is

$$\begin{aligned} y_{n,k} &= h_{n,k} W X + z_{n,k} \\ &= h_{n,k} w_n \sqrt{\alpha_{n,k} P_n}\, s_{n,k} + h_{n,k} w_n \sum_{j=1, j \neq k}^{K} \sqrt{\alpha_{n,j} P_n}\, s_{n,j} + \sum_{i=1, i \neq n}^{N} h_{n,k} w_i x_i + z_{n,k} \end{aligned} \tag{2}$$

where $W = [w_1 \cdots w_N] \in \mathbb{C}^{N_t \times N}$ is the beamforming matrix, and $h_{n,k} \in \mathbb{C}^{1 \times N_t}$ is the channel vector for user $U_{n,k}$. $z_{n,k}$ is complex white Gaussian noise with zero mean and variance $\sigma^2$. The second and third terms after the second equal sign in (2) represent the intra-cluster and inter-cluster interference, respectively. The beamforming vector is designed to eliminate the inter-cluster interference, which requires $h_n w_i = 0,\ i \neq n$. However, it is difficult to achieve such an ideal result in practice, so the inter-cluster interference cannot be ignored. Moreover, suppose that the SIC detector in the receiver can ideally cancel the interference from the previous users, and that the user channel quality in the $n$-th user cluster is ranked as $\|h_{n,1}\|^2 \leq \|h_{n,2}\|^2 \leq \cdots \leq \|h_{n,K}\|^2$. Thus, the signal to interference plus noise ratio of user $U_{n,k}$ is

$$\Phi_{n,k} = \frac{|h_{n,k} w_n|^2 \alpha_{n,k} P_n}{\sum_{i=1, i \neq n}^{N} |h_{n,k} w_i|^2 P_i + |h_{n,k} w_n|^2 \sum_{j=k+1}^{K} \alpha_{n,j} P_n + \sigma^2} \tag{3}$$

Furthermore, with the objective of maximizing the sum rate, the joint optimization problem can be written as follows:

$$\begin{aligned} \max_{\{\alpha_{n,k}\},\{w_n\},\{U_{n,k}\}} \quad & R_{sum} = \sum_{n=1}^{N} \sum_{k=1}^{K} B \log_2 (1 + \Phi_{n,k}) \\ \text{s.t.} \quad & C1: \sum_{k=1}^{K} \alpha_{n,k} \leq 1,\ \alpha_{n,k} \in [0, 1] \\ & C2: \sum_{n=1}^{N} P_n \leq P \\ & C3: R_{n,k} \geq R_{min} \\ & C4: \|W\|^2 = 1 \end{aligned} \tag{4}$$

where $B$ is the bandwidth of one user channel. C1 is the power allocation factor constraint for each user cluster and C2 denotes the total power constraint of the BS over one user channel. C3 ensures the minimum data rate for each user and C4 is the norm constraint of the beamforming matrix.

Because the joint optimization problem (4) is non-convex [5], the traditional solutions are usually heuristic or alternating iterative methods. These methods suffer from high complexity or limited performance. In contrast, DRL can not only fully explore the hidden information of big data to improve its own learning performance, but also realize dynamic real-time interaction. This method has a strong generalization ability, which is an advantage in wireless RA. Therefore, this letter proposes a DRL-based joint optimization method to solve problem (4) in Section III.

III. DQN BASED-RESOURCE ALLOCATION SCHEME

The proposed RA network based on DQN is shown in Fig. 1 and it includes three parts: user clustering, power allocation and beamforming. This section mainly concerns the first two parts.

A. User Clustering Based on DQN

The user clustering problem is modeled as an RL task, which consists of the agent and the environment. Specifically, the user clustering module is taken as the agent and the performance of the massive MIMO-NOMA system is the environment. The actions $\{a_t\}$ taken by the agent are based on the expected rewards from the environment. According to the considered system, each part of the RL framework is described as follows:


Fig. 1. Joint optimization network based on DQN.

State space $S$: The CSI of all users in each cluster forms the current state of the $t$-th iteration, which is given by $s_t = \{[h_{1,1}, \cdots, h_{1,K}], \cdots, [h_{N,1}, \cdots, h_{N,K}]\}$.

Action space $A$: It should contain actions that can cover all possible clustering results. When there are $L$ users, the number of possible actions reaches $C_L^K C_{L-K}^K \cdots C_{L-(N-1)K}^K / N!$. The size of the action space increases dramatically with the number of users. The purpose of taking an action is to select a suitable group for each user. Taking the action $a_t = \{[U_{1,1}^t, \cdots, U_{1,K}^t], \cdots, [U_{N,1}^t, \cdots, U_{N,K}^t]\}$ in the state $s_t$ results in the new state $s_{t+1}$. This impact is defined as $s_t \xrightarrow{a_t} s_{t+1}$. Here $s_{t+1}$ is $\{[h_{1,1}^t, \cdots, h_{1,K}^t], \cdots, [h_{N,1}^t, \cdots, h_{N,K}^t]\}$, which fully corresponds to the new user grouping result $a_t$, and $h_{n,k}^t$ is the CSI of the user $U_{n,k}^t$, i.e., the $k$-th user in the $n$-th group for the current action $a_t$.

Reward function: Here, the system sum rate $r_t = R_{sum}$ is taken as the reward function, which also depends on $\{\alpha_{n,k}\}$ and $\{w_n\}$. The ideal goal of RL is to maximize the cumulative discounted reward $R_t = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$, where the discount factor is $\gamma \in [0, 1]$. Obviously, with such a target the action of each iteration can only be finalized after all iterations have been completed. Therefore, Q-learning uses the state-action value (Q-value) function defined in (5) [15], which can determine the current action based only on the next iteration's value function. In order to find the optimal $a_t$ that maximizes the Q-value for the given state $s_t$, (5) has to be calculated for all actions. If $L$ is very large, the complexity will be extremely high and the algorithm converges slowly.

$$Q(s_t, a_t) = E\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \,\middle|\, s_t, a_t\right] \tag{5}$$

Deep Q-Network: To speed up the convergence of Q-learning, a deep Q-network is adopted to estimate the Q-values. A fully connected neural network with three hidden layers is used. Its input and output are the current state and the estimated Q-values corresponding to each action, respectively. The function of the Q-network is represented as $Q(s_t, a_t, \omega)$, where $\omega$ is the weight set to be trained. In order to ensure the convergence of the Q-network's parameter updates, a target Q-network is included to provide relatively stable labels for Q-network training. It has the same structure and initial weights as the Q-network, but keeps the old weights $\omega_{C-}$ from $C$ iterations ago. Hence, the loss function used in training is given by

$$L(\omega) = E\left[\left(r + \gamma \max_{a'} Q(s', a', \omega_{C-}) - Q(s, a, \omega)\right)^2\right] \tag{6}$$

where the label is $y = r + \gamma \max_{a'} Q(s', a', \omega_{C-})$, calculated based on the old weights $\omega_{C-}$ and the new state $s'$ after taking action $a$ from state $s$. In addition, the experience replay strategy used in [15] is adopted in training. First, the observation samples $(s_t, a_t, r_t, s_{t+1})$ from previous iterations are stored. Then a mini-batch of samples is randomly selected from the replay memory and fed into the target Q-network to get the training labels. The objective of this process is to break the correlations among data samples and make the training converge. Following the Q-network, the ε-greedy strategy is utilized to choose the current action. That is, an action is selected randomly from the whole action space with the exploration probability $\varepsilon$, and the action with the maximum Q-value according to the Q-network output is selected with the probability $(1 - \varepsilon)$.

B. Power Allocation Based on BPNN

In order to ensure the effectiveness of the SIC receiver in power-domain NOMA, the power of users in the same cluster needs to be assigned appropriately. Power allocation is key to the compromise between the system sum rate and the users' fairness. Different from the traditional optimization algorithms, a BPNN based power allocator is designed to reduce the computation complexity while still obtaining a good performance.

The task of power allocation is to calculate the users' power allocation factors $\alpha_n = [\alpha_{n,1}, \ldots, \alpha_{n,K}]$, $\alpha_{n,k} \in [0, 1]$ for each group under a given user clustering result. In order to exploit the nonlinear mapping ability of the BPNN to extract the relationship between the power allocation and the users' CSI within a cluster, the BPNN needs to be trained on a large amount of labeled data. Here, the result $\hat{\alpha}_n$ of the exhaustive search based power allocation (ESPA) algorithm executed over a finite power allocation factor set is used as the training label for the BPNN. The finite set for ESPA is obtained by discretizing the continuous factor range with a small step size. In order to ensure the fairness of users, the constraint $R_{n,k} \geq R_{min}$ is enforced by the ESPA. Because the beamforming matrix is unknown when generating the training data for the BPNN, the calculation of $R_{n,k}$ does not involve $W$ and its expression is $R_{n,k} = B \log_2\left(1 + \|h_{n,k}\|^2 \alpha_{n,k} P_n / \left(\|h_{n,k}\|^2 \sum_{j=k+1}^{K} \alpha_{n,j} P_n + \sigma^2\right)\right)$.

The BPNN consists of an input layer, an output layer and hidden layers. The hidden layers of the BPNN adopt the rectified linear unit (ReLU) activation function [16] and the other layers use the linear activation function. The input of the BPNN is the channel information $\{h_{n,1}, \cdots, h_{n,K}\}$ of one cluster's users, and the output is the corresponding power allocation factors $\alpha_n$. The loss function is defined as $Loss = \min_{\omega_B, b_B} \|\alpha_n - \hat{\alpha}_n\|^2$ and the stochastic gradient descent (SGD) method is used to update the network parameters $\{\omega_B, b_B\}$ in training. The trained network can directly calculate the power allocation result from the channel information, so the online calculation complexity can be greatly reduced.

Obviously, different BPNNs need to be trained for different configurations of $K$. In our work, the number of users $K$ in every cluster is assumed to be identical. Then, the same pre-trained BPNN can be used by all the clusters. In a practical system, the parameter $K$ might be different for different user clusters according to the users' access requirements and their CSI. In this case,
multiple BPNNs need to be pre-trained for all the cases of $K$. Fortunately, the possible values of $K$ are limited to a very small set. From the various references surveyed, it can be seen that the number of superimposed users in a NOMA signal structure is usually selected as 2 or 3 [4], [12], [17], [18]. Actually, with $K$ larger than 3, the error propagation in the SIC decoder becomes more serious and the total system performance decreases instead. This result can be seen by comparing the simulation curves in Fig. 4 and Fig. 2. Since suitable values of $K$ are 2 or 3, we only need to train two BPNNs in advance. Then, the corresponding network structures and parameters can be saved with an acceptable memory requirement. During operation, the BS can call the needed BPNN based on the actual number of users in each cluster. Through such a process, the proposed BPNN PA scheme remains feasible even if the number of users per cluster is not fixed.

TABLE I. SIMULATION PARAMETERS

Fig. 2. The system total spectrum efficiency ($L = 8$, $K = 2$).
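As a rough illustration of how the ESPA training labels described in this subsection could be generated, the following Python sketch searches a quantized power-factor grid for one cluster, using the beamforming-free rate expression $R_{n,k}$ of Section III-B. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `user_rates` and `espa`, the input `gains` (standing for the values $\|h_{n,k}\|^2$ sorted in ascending order), the example numbers, and the choice to spend the full cluster power (factors summing to one) are all assumptions made here for illustration.

```python
import itertools

import numpy as np


def user_rates(gains, alphas, p_cluster, noise_var, bandwidth=1.0):
    """Rates R_{n,k} of one cluster, computed without beamforming.

    gains[k] stands for ||h_{n,k}||^2, sorted in ascending order, so that
    user k only sees interference from the stronger users j > k (ideal SIC).
    """
    gains = np.asarray(gains, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    rates = np.empty(len(gains))
    for k in range(len(gains)):
        interference = gains[k] * alphas[k + 1:].sum() * p_cluster
        sinr = gains[k] * alphas[k] * p_cluster / (interference + noise_var)
        rates[k] = bandwidth * np.log2(1.0 + sinr)
    return rates


def espa(gains, p_cluster, noise_var, r_min, step=0.01):
    """Exhaustive search over quantized power factors for one cluster.

    Assumption (not stated in the letter): the factors use the whole
    cluster power, i.e. they sum to one. Returns (alphas, sum_rate), or
    (None, -inf) when no grid point satisfies R_{n,k} >= r_min.
    """
    n_levels = int(round(1.0 / step))
    k_users = len(gains)
    best_alphas, best_rate = None, -np.inf
    # Enumerate integer levels for the first K-1 users; the last user
    # receives whatever is left, which keeps the grid arithmetic exact.
    for head in itertools.product(range(1, n_levels), repeat=k_users - 1):
        last = n_levels - sum(head)
        if last < 1:
            continue
        alphas = np.array(head + (last,), dtype=float) / n_levels
        rates = user_rates(gains, alphas, p_cluster, noise_var)
        if rates.min() < r_min:
            continue  # violates the minimum-rate (fairness) constraint
        if rates.sum() > best_rate:
            best_alphas, best_rate = alphas, float(rates.sum())
    return best_alphas, best_rate


# Hypothetical two-user cluster: one weak and one strong user.
label, rate = espa(gains=[0.1, 1.0], p_cluster=1.0, noise_var=0.01, r_min=0.9)
```

For $K = 2$ and a step of 0.01 this evaluates 99 candidate splits per cluster. The number of candidates grows exponentially with $K$, which is the online cost that the pre-trained BPNN is meant to avoid.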
Fig. 3. The system total spectrum efficiency ($L = 12$, $K = 2$).
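Returning to the clustering agent of Section III-A, the ε-greedy action choice, the replay memory and the training label $y = r + \gamma \max_{a'} Q(s', a', \omega_{C-})$ used in the loss (6) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the helper names (`epsilon_greedy`, `ReplayMemory`, `td_targets`, `dqn_loss`) are hypothetical, and the Q-network and target Q-network themselves are assumed to exist elsewhere and to return one Q-value per clustering action.

```python
from collections import deque

import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the arg-max Q-value."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


class ReplayMemory:
    """Fixed-size store of (s_t, a_t, r_t, s_{t+1}) observation samples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement breaks sample correlations.
        idx = rng.choice(len(self.buffer), size=batch_size, replace=False)
        return [self.buffer[i] for i in idx]


def td_targets(rewards, next_q_target, gamma):
    """Labels y = r + gamma * max_a' Q(s', a'; old weights), one per sample.

    next_q_target has shape (batch, n_actions) and is assumed to be the
    output of the target Q-network evaluated at the next states.
    """
    rewards = np.asarray(rewards, dtype=float)
    return rewards + gamma * np.max(next_q_target, axis=1)


def dqn_loss(q_selected, targets):
    """Mean squared error between the labels and Q(s, a; current weights)."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_selected, dtype=float)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
# With epsilon = 0 the choice is purely greedy.
action = epsilon_greedy(np.array([0.1, 0.9, 0.3]), epsilon=0.0, rng=rng)
```

In a full training loop the target network's weights would be refreshed from the Q-network every $C$ iterations; that bookkeeping is omitted here.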
IV. SIMULATION RESULTS

In the simulation, one BS is located in the center of the cell and users are randomly and uniformly distributed in the cell. The specific simulation parameters are shown in Table I. For the power allocation network, 100,000 groups of data are used to train the BPNN. The step size of the power factor's discretization is 0.01. In order to guarantee the generalization performance of the BPNN, an additional 20,000 groups of data are used to test the network. In all schemes, the ZF beamforming algorithm is selected, and the beamforming vector is calculated based on the CSI with the best quality in each cluster. Five benchmark schemes correspond to the different combinations of user clustering methods, namely DQN, NLUPA and exhaustive search (ES), and power allocation schemes, namely FTPA, BPNN and ESPA.

Fig. 4. The system total spectrum efficiency ($L = 8$, $K = 4$).
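To give a feel for the exhaustive search (ES) clustering benchmark and for the action-space size $C_L^K C_{L-K}^K \cdots C_{L-(N-1)K}^K / N!$ from Section III-A, the sketch below counts and enumerates all ways of splitting the users into equal-size clusters. The function names are hypothetical and the code is an illustration rather than the simulation code behind the figures. Scoring each clustering by the system sum rate is left to a caller-supplied evaluator.

```python
from itertools import combinations
from math import comb, factorial


def action_space_size(num_users, cluster_size):
    """Number of ways to split num_users into unordered clusters of equal size."""
    n_clusters, remainder = divmod(num_users, cluster_size)
    if remainder:
        raise ValueError("num_users must be a multiple of cluster_size")
    ordered = 1
    for n in range(n_clusters):
        # Choose the members of cluster n from the users not yet assigned.
        ordered *= comb(num_users - n * cluster_size, cluster_size)
    # Clusters are interchangeable, so divide by the N! cluster orderings.
    return ordered // factorial(n_clusters)


def enumerate_clusterings(users, cluster_size):
    """Yield every clustering exactly once as a list of tuples of users."""
    users = list(users)
    if not users:
        yield []
        return
    # Pinning the first remaining user to the next cluster avoids
    # generating the same clustering under a different cluster order.
    first, rest = users[0], users[1:]
    for mates in combinations(rest, cluster_size - 1):
        remaining = [u for u in rest if u not in mates]
        for tail in enumerate_clusterings(remaining, cluster_size):
            yield [(first, *mates)] + tail


n_candidates = action_space_size(8, 2)
first_clustering = next(enumerate_clusterings(range(4), 2))
```

For the simulated settings this gives 105 candidate clusterings for $L = 8$, $K = 2$, 35 for $L = 8$, $K = 4$, and 10395 for $L = 12$, $K = 2$, which illustrates how quickly exhaustive clustering becomes impractical as the number of users grows.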
In Fig. 2, the system spectrum efficiencies of different schemes are compared when $L = 8$ and $K = 2$. The total transmission power range is 0.08-4 W. The BPNN has 5 layers, and the numbers of nodes in its three hidden layers are set to 32, 64 and 32 after repeated experiments. It can be observed that the scheme DQN-FTPA outperforms the scheme NLUPA-FTPA due to the DQN based user clustering, which takes into account the real-time interaction with the current environment. The result of DQN almost reaches that of the ES clustering, which can be taken as the upper bound of the user clustering performance. Besides, the system spectrum efficiency can be improved by about 3 bit/s/Hz further when the power is above 1 W, by adopting the BPNN power allocation on top of the DQN user clustering. Furthermore, the simulation curves corresponding to the ESPA, the BPNN based PA and the FTPA with the ES user clustering are also given in Fig. 2. From these curves, we can find that the proposed BPNN PA outperforms the FTPA by about 3 bit/s/Hz at a power of 4 W. Its performance is almost the same as that of the ESPA, but the complexity can be reduced considerably by using the pre-trained BPNN.

In addition, Fig. 3 shows the simulation results when the number of users increases to 12. Here, the number of users per cluster is still 2, so the same pre-trained BPNN as used in the simulation of Fig. 2 is adopted. From Fig. 3, it can be found that the proposed DQN-BPNN RA scheme still maintains a performance comparable to the ES-ESPA case and has a significant gain over the NLUPA-FTPA method for the scenarios with more users. Furthermore, we also find that the total system spectrum efficiency increases by about 7 bit/s/Hz at a power of 4 W when the number of users increases from 8 to 12.

Moreover, we change the number of intra-cluster users and observe its effect on the system performance, as shown in Fig. 4. Here, $K$ increases from 2 to 4 while $L$ remains 8. In this case, the BPNN needs to be retrained. The network is adjusted to have 5 hidden layers, where the numbers of nodes are 32, 64, 128, 64 and 32, respectively. It can be observed that our scheme has a larger performance gain over the NLUPA-FTPA method in this case. However, we find that the system spectrum efficiency of all the schemes under $K = 4$ is significantly lower than in Fig. 2 ($K = 2$). This is caused by the serious intra-cluster error propagation in the SIC decoder. The simulation in Fig. 4 shows that the given BPNN based PA is applicable to other values of $K$, and that $K$ is suitably set lower than 4 due to the limitation of the intra-cluster error propagation.

Fig. 5 shows the CDF curves of the user's spectrum efficiency obtained by the proposed scheme and the scheme ES-FTPA under a total transmission power of 4 W, $L = 8$ and $K = 2$. The dotted line is the performance when the inter-cluster interference is ignored. It shows that the scheme DQN-BPNN obtains an obvious gain in system spectrum efficiency over the traditional method. However, the performance of the edge users improves only slightly and has a big gap from the ideal situation. Although the minimum rate constraint for users has been considered in the BPNN, the inter-cluster interference, which cannot be suppressed by ZF beamforming, still worsens the final performance. Moreover, even in the ideal situation without inter-cluster interference, the lowest user's spectrum efficiency cannot be kept at $R_{min}$. This is because FTPA rather than ESPA is used to generate the training labels in some extreme cases. In such cases, $R_{min}$ cannot be achieved for a few users no matter how the users' power within a cluster is adjusted, due to their very poor channel conditions. But nearly 90% of users still achieve a better performance than $R_{min}$ with the proposed scheme.

Fig. 5. CDF curves of the user's spectrum efficiency.

V. CONCLUSION

This work mainly studies the downlink RA problem in the massive MIMO-NOMA system. In order to maximize the system spectrum efficiency under the premise of ensuring the worst user performance constraint, a deep Q-learning network and a BP neural network are designed to realize the joint user clustering and the intra-cluster power allocation, respectively. The simulation results demonstrate the advantage of our scheme in improving system spectrum efficiency.

REFERENCES

[1] F. Tang, Y. Kawamoto, N. Kato, and J. Liu, “Future intelligent and secure vehicular network toward 6G: Machine-learning approaches,” Proc. IEEE, vol. 108, no. 2, pp. 292–307, Feb. 2020.
[2] Z. Shi, W. Gao, S. Zhang, J. Liu, and N. Kato, “AI-enhanced cooperative spectrum sensing for non-orthogonal multiple access,” IEEE Wireless Commun., vol. 27, no. 2, pp. 173–179, Apr. 2020.
[3] G. Zhang et al., “Interference management by vertical beam control combined with coordinated pilot assignment and power allocation in 3D massive MIMO systems,” KSII Trans. Internet Inf. Syst., vol. 9, no. 8, pp. 2797–2820, Aug. 2015.
[4] Z. Ding, F. Adachi, and H. V. Poor, “The application of MIMO to non-orthogonal multiple access,” IEEE Trans. Wireless Commun., vol. 15, no. 1, pp. 537–552, Jan. 2016.
[5] Y. Sun, D. W. K. Ng, Z. Ding, and R. Schober, “Optimal joint power and subcarrier allocation for full-duplex multicarrier non-orthogonal multiple access systems,” IEEE Trans. Commun., vol. 65, no. 3, pp. 1077–1091, Mar. 2017.
[6] S. M. R. Islam, M. Zeng, O. A. Dobre, and K.-S. Kwak, “Resource allocation for downlink NOMA systems: Key techniques and open issues,” IEEE Wireless Commun., vol. 25, no. 2, pp. 40–47, Apr. 2018.
[7] A. Benjebbour, A. Li, Y. Saito, Y. Kishiyama, A. Harada, and T. Nakamura, “System-level performance of downlink NOMA for future LTE enhancements,” in Proc. IEEE Globecom Workshops (GC Wkshps), Atlanta, GA, USA, Dec. 2013, pp. 66–70.
[8] S. Chinnadurai, P. Selvaprabhu, and M. H. Lee, “A novel joint user pairing and dynamic power allocation scheme in MIMO-NOMA system,” in Proc. Int. Conf. Inf. Commun. Technol. Converg. (ICTC), Jeju, South Korea, Oct. 2017, pp. 951–953.
[9] Y. Liu, M. Elkashlan, Z. Ding, and G. K. Karagiannidis, “Fairness of user clustering in MIMO non-orthogonal multiple access systems,” IEEE Commun. Lett., vol. 20, no. 7, pp. 1465–1468, Jul. 2016.
[10] R. Zhu and G. Zhang, “A segment-average based channel estimation scheme for one-bit massive MIMO systems with deep neural network,” in Proc. IEEE 19th Int. Conf. Commun. Technol. (ICCT), Xi’an, China, Oct. 2019, pp. 81–86.
[11] C. He, Y. Hu, Y. Chen, and B. Zeng, “Joint power allocation and channel assignment for NOMA with deep reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2200–2210, Oct. 2019.
[12] M. Liu, T. Song, and G. Gui, “Deep cognitive perspective: Resource allocation for NOMA-based heterogeneous IoT with imperfect SIC,” IEEE Internet Things J., vol. 6, no. 2, pp. 2885–2894, Apr. 2019.
[13] G. Gui, H. Huang, Y. Song, and H. Sari, “Deep learning for an effective nonorthogonal multiple access scheme,” IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 8440–8450, Sep. 2018.
[14] D. Ying, F. W. Vook, T. A. Thomas, D. J. Love, and A. Ghosh, “Kronecker product correlation model and limited feedback codebook design in a 3D channel model,” in Proc. IEEE Int. Conf. Commun. (ICC), Sydney, NSW, Australia, Jun. 2014, pp. 5865–5870.
[15] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforcement learning for dynamic multichannel access in wireless networks,” IEEE Trans. Cogn. Commun. Netw., vol. 4, no. 2, pp. 257–265, Jun. 2018.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., Stateline, NV, USA, Dec. 2012, pp. 1097–1105.
[17] TP for Classification of MUST Schemes, document R1-154999, 3GPP, 2015.
[18] J.-M. Kang, I.-M. Kim, and C.-J. Chun, “Deep learning-based MIMO-NOMA with imperfect SIC decoding,” IEEE Syst. J., vol. 14, no. 3, pp. 3414–3417, Sep. 2020.

Authorized licensed use limited to: London School of Economics & Political Science. Downloaded on May 16,2021 at 23:25:39 UTC from IEEE Xplore. Restrictions apply.

