
IEEE COMMUNICATIONS LETTERS, VOL. 25, NO. 5, MAY 2021

A Deep Q-Network Based-Resource Allocation Scheme for Massive MIMO-NOMA

Yanmei Cao, Guomei Zhang, Member, IEEE, Guobing Li, Member, IEEE, and Jia Zhang

Abstract—In this letter, a deep Q-learning network (DQN) based resource allocation (RA) scheme is proposed for massive multiple-input multiple-output (MIMO) non-orthogonal multiple access (NOMA) systems. A reinforcement learning (RL) framework is developed to build an iterative optimization structure for user clustering, power allocation and beamforming. Specifically, a DQN is designed to group the users based on the reward calculated after power allocation and beamforming, and the objective is to maximize this reward, i.e., the system throughput. Then, a back propagation neural network (BPNN) is used to realize the power allocation. During the training of the BPNN, the exhaustive search results over a quantized power set are taken as the output labels. Simulation experiments show that the proposed scheme achieves a system spectrum efficiency approximating that of the exhaustive search over user clustering and power allocation.

Index Terms—Non-orthogonal multiple access, massive multiple-input multiple-output, resource allocation, deep Q-learning network, back propagation neural network.

Manuscript received December 26, 2020; accepted January 18, 2021. Date of publication January 28, 2021; date of current version May 6, 2021. The research work reported in this letter is supported by the National Key R&D Program of China under Grant No. 2020YFB1807702 and the National Natural Science Foundation of China under Grant No. 61941119. The associate editor coordinating the review of this letter and approving it for publication was J. Choi. (Corresponding author: Guomei Zhang.) The authors are with the School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: [email protected]). Digital Object Identifier 10.1109/LCOMM.2021.3055348

I. INTRODUCTION

Due to the extreme shortage of wireless spectrum resources, how to further improve spectrum efficiency and system capacity remains a key problem for the next generation of mobile communications [1]. Non-orthogonality and large dimensions are considered effective means of tackling this problem [2]. NOMA can support large-scale access by superposing signals from multiple users on the same orthogonal resource block. In addition, massive MIMO can realize high capacity by equipping the base station with a large-scale antenna array [3]. The combination of NOMA and massive MIMO exploits the degrees of freedom in both the power domain and the space domain, thereby further improving the system spectrum efficiency. However, with the rapid growth of user number and density in massive MIMO-NOMA systems, several conflicts become significant, including the conflict between transmission efficiency and the cumulative error propagation of the successive interference cancellation (SIC) decoder, and the trade-off between intra-cluster coverage enhancement and inter-cluster interference suppression. Specifically, in order to obtain the advantages of power-domain multiplexing, a large difference in channel gain among the superimposed users is required and different levels of power are allocated to the users [4]. The SIC decoder then detects the signals according to the power order of the users, so error accumulation is an inherent problem of the SIC decoder: the more users are superimposed, the more serious the error propagation becomes, which greatly limits the transmission efficiency. In addition, each beam has to cover all the users in one cluster in a MIMO-NOMA system, rather than a single user as in a MIMO-OMA system, so the trade-off between enhancing intra-cluster coverage and eliminating inter-cluster interference becomes more difficult for beamforming. Based on the above analysis, joint optimization of user clustering, power allocation (PA) and beamforming becomes more urgent for massive MIMO-NOMA.

Unfortunately, such a joint RA problem with multiple variables has been proven to be NP-hard [5]. Alternating optimization over the three parts is usually adopted. For the user clustering sub-problem, the optimal solution is obtained by searching all clustering combinations exhaustively. For scenarios with a large number of users, some heuristic clustering algorithms, such as random pairing and the next-largest-difference-based user pairing algorithm (NLUPA) [6], have been proposed to reduce the system complexity. As for PA, simple methods such as fixed power allocation (FPA) and fractional transmit power allocation (FTPA) have been utilized [7]. In addition, considering the tight coupling between user clustering and power allocation, some joint optimization schemes have also been studied. In [8], a joint user pairing and dynamic power allocation scheme was proposed to maximize the energy efficiency. To improve the service quality of cell-edge users, dynamic user allocation and power optimization were studied under user fairness constraints [9]. These studies show that jointly optimized transmission works more efficiently. However, as the number of antennas increases, improving the system capacity while avoiding excessive computational complexity and energy consumption is the primary difficulty for the joint optimization. With the development of machine learning (ML) in wireless communications, ML solutions for RA and wireless transmission continue to emerge [10], such as the RL-based channel allocation in [11], the power allocation and subchannel assignment schemes based on a deep recurrent neural network (RNN) architecture for a NOMA-based heterogeneous Internet of Things system in [12], and the long short-term memory network based power allocation for NOMA systems in [13]. These works show the potential advantage of applying ML algorithms to traditional communication systems.
Motivated by this previous research, this letter develops an RL method to solve the complex joint RA problem for massive MIMO-NOMA systems. However, traditional RL schemes such as Q-learning are not suitable when the state space is very large, since the learning efficiency becomes extremely low. Therefore, a deep Q-learning network (DQN) is adopted to realize the joint RA scheme in scenarios with a large number of users. A deeply coupled iterative structure involving three functional modules, namely user clustering, power allocation and beamforming, is then established based on the deep reinforcement learning (DRL) framework. In the user clustering stage, the DQN is used to gradually adjust the clustering results to maximize the system throughput, which is calculated by the environment evaluator based on the previous clustering result, power factors and beamforming vectors. In the power assignment, a BPNN is designed to learn the relationship between the power allocation factors and the users' channel state information (CSI) for each user cluster. As for the beamforming, traditional methods such as the zero-forcing (ZF) algorithm or other optimized beamforming schemes can be used directly across the user clusters. Simulation experiments show that the iterative process converges after dozens of iterations and that the system performance approximates that of the exhaustive search scheme.
II. SYSTEM MODEL AND PROBLEM FORMULATION

Consider a single-cell multi-user downlink system. The base station (BS) is equipped with Nt antennas and serves L single-antenna users. All users in the cell are divided into N clusters, each of which includes K users. The users' data in one cluster are transmitted in the form of a power-domain NOMA signal and preprocessed by the same beamforming vector. Assume that the BS deploys its antennas on the Y-Z plane as a uniform planar array (UPA). Then, the channel vector from the BS to user m can be modeled similarly to [14] as

$$\mathbf{h}_m=\Big(\frac{d_m}{d_0}\Big)^{\mu}\frac{1}{\sqrt{L_u}}\sum_{l=1}^{L_u} g_{m,l}\,\mathbf{b}(v_{m,l})\otimes\mathbf{a}(u_{m,l}) \tag{1}$$

Here d0 is the cell radius and dm is the distance between the user and the BS. μ is the large-scale fading factor. b(vm,l) and a(um,l) are the vertical and horizontal array response vectors of the UPA, respectively. The symbol "⊗" in (1) denotes the Kronecker product of two matrices. Lu is the number of scattering paths and gm,l denotes the small-scale fading coefficient.
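To make the channel model concrete, the following is a minimal sketch of how a channel vector of the form (1) could be generated. The array size, half-wavelength element spacing, uniformly drawn path angles, Rayleigh small-scale fading and the value of the large-scale exponent are illustrative assumptions made here, not parameters specified in the letter.

```python
import numpy as np

def upa_channel(Ny=8, Nz=8, Lu=10, d_ratio=0.5, mu=-2.0, rng=np.random.default_rng(0)):
    """Sketch of (1): h_m = (d_m/d_0)^mu * 1/sqrt(Lu) * sum_l g_{m,l} b(v_{m,l}) kron a(u_{m,l})."""
    # horizontal / vertical steering vectors, assuming half-wavelength element spacing
    a = lambda u: np.exp(1j * np.pi * np.arange(Ny) * np.sin(u))   # horizontal response a(u)
    b = lambda v: np.exp(1j * np.pi * np.arange(Nz) * np.sin(v))   # vertical response b(v)
    h = np.zeros(Ny * Nz, dtype=complex)
    for _ in range(Lu):
        g = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # small-scale fading g_{m,l}
        u, v = rng.uniform(-np.pi / 2, np.pi / 2, size=2)                       # path angles (assumed uniform)
        h += g * np.kron(b(v), a(u))
    return (d_ratio ** mu) / np.sqrt(Lu) * h   # large-scale factor (d_m/d_0)^mu

h_m = upa_channel()      # one user's Ny*Nz-dimensional channel vector
print(h_m.shape)         # (64,)
```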
Assume that the data sent by the BS is X = [x1, ..., xN]^T ∈ C^{N×1}, where $x_n=\sum_{k=1}^{K}\sqrt{\alpha_{n,k}P_n}\,s_{n,k}$ is the superposed NOMA signal of the K users in the n-th cluster. Here, Pn is the total power of the n-th cluster, and αn,k and sn,k are the power allocation factor and the transmitted symbol of the k-th user in the n-th cluster, denoted by Un,k, respectively. It is assumed that E[|sn,k|²] = 1. The received signal of user Un,k is

$$y_{n,k}=\mathbf{h}_{n,k}\mathbf{W}\mathbf{X}+z_{n,k}=\mathbf{h}_{n,k}\mathbf{w}_n\sqrt{\alpha_{n,k}P_n}\,s_{n,k}+\mathbf{h}_{n,k}\mathbf{w}_n\sum_{j=1,j\neq k}^{K}\sqrt{\alpha_{n,j}P_n}\,s_{n,j}+\sum_{i=1,i\neq n}^{N}\mathbf{h}_{n,k}\mathbf{w}_i x_i+z_{n,k} \tag{2}$$

where W = [w1, ..., wN] ∈ C^{Nt×N} is the beamforming matrix, hn,k ∈ C^{1×Nt} is the channel vector of user Un,k, and zn,k is complex white Gaussian noise with zero mean and variance σ². The second and third terms after the second equality in (2) represent the intra-cluster and inter-cluster interference, respectively. The beamforming vectors are designed to eliminate the inter-cluster interference, i.e., they should satisfy hn wi = 0 for i ≠ n. However, such an ideal result is difficult to achieve in practice and the inter-cluster interference cannot be ignored. Moreover, suppose that the SIC detector at the receiver can ideally cancel the interference from the previously decoded users, and that the user channel qualities in the n-th cluster are ordered as ||hn,1||² ≤ ||hn,2||² ≤ ··· ≤ ||hn,K||². Then the signal to interference plus noise ratio of user Un,k is

$$\Phi_{n,k}=\frac{|\mathbf{h}_{n,k}\mathbf{w}_n|^2\,\alpha_{n,k}P_n}{\sum_{i=1,i\neq n}^{N}|\mathbf{h}_{n,k}\mathbf{w}_i|^2 P_i+|\mathbf{h}_{n,k}\mathbf{w}_n|^2\sum_{j=k+1}^{K}\alpha_{n,j}P_n+\sigma^2} \tag{3}$$

Furthermore, with the objective of maximizing the sum rate, the joint optimization problem can be written as

$$\begin{aligned}
\max_{\{\alpha_{n,k}\},\{\mathbf{w}_n\},\{U_{n,k}\}}\; & R_{\mathrm{sum}}=\sum_{n=1}^{N}\sum_{k=1}^{K}B\log_2(1+\Phi_{n,k})\\
\text{s.t.}\quad & C1:\ \sum_{k=1}^{K}\alpha_{n,k}\le 1,\ \alpha_{n,k}\in[0,1]\\
& C2:\ \sum_{n=1}^{N}P_n\le P\\
& C3:\ R_{n,k}\ge R_{\min}\\
& C4:\ \|\mathbf{W}\|^2=1
\end{aligned} \tag{4}$$

where B is the bandwidth of one user channel. C1 is the power allocation factor constraint for each user cluster, C2 denotes the total power constraint of the BS over one user channel, C3 ensures the minimum data rate of each user, and C4 is the norm constraint on the beamforming matrix.
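The following is a small sketch of how the SINR in (3) and the objective of (4) could be evaluated for a given clustering, power split and beamforming matrix. The random channels, the random orthonormal beamforming matrix and the 0.8/0.2 power split are placeholder assumptions used only to make the example self-contained.

```python
import numpy as np

def sum_rate(H, W, alpha, P_cluster, sigma2=1.0, B=1.0):
    """Evaluate (3) for every user and the objective of (4).
    H:         (N, K, Nt) channel vectors h_{n,k}, users sorted so ||h_{n,1}|| <= ... <= ||h_{n,K}||
    W:         (Nt, N)    beamforming matrix [w_1 ... w_N]
    alpha:     (N, K)     power allocation factors
    P_cluster: (N,)       per-cluster powers P_n"""
    N, K, _ = H.shape
    rate = 0.0
    for n in range(N):
        for k in range(K):
            g = np.abs(H[n, k] @ W) ** 2                                   # |h_{n,k} w_i|^2 for all i
            signal = g[n] * alpha[n, k] * P_cluster[n]
            inter = np.sum(np.delete(g, n) * np.delete(P_cluster, n))      # inter-cluster interference
            intra = g[n] * np.sum(alpha[n, k + 1:]) * P_cluster[n]         # residual intra-cluster term
            rate += B * np.log2(1.0 + signal / (inter + intra + sigma2))
    return rate

# toy usage with placeholder values (not taken from the letter)
rng = np.random.default_rng(1)
N, K, Nt = 4, 2, 8
H = (rng.standard_normal((N, K, Nt)) + 1j * rng.standard_normal((N, K, Nt))) / np.sqrt(2)
W = np.linalg.qr(rng.standard_normal((Nt, N)) + 1j * rng.standard_normal((Nt, N)))[0]
alpha = np.tile([0.8, 0.2], (N, 1))   # weaker user (index 0) gets more power, as in power-domain NOMA
print(sum_rate(H, W, alpha, P_cluster=np.full(N, 1.0)))
```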

Because the joint optimization problem (4) is non-convex [5], the traditional solutions are usually heuristic or alternating iterative methods, which suffer from either high complexity or limited performance. DRL, in contrast, can fully explore the hidden information in large amounts of data to improve its own learning performance and can also realize dynamic real-time interaction. It has strong generalization ability, which highlights its advantages for wireless RA. Therefore, this letter proposes a joint optimization method for problem (4) based on DRL in Section III.

III. DQN BASED-RESOURCE ALLOCATION SCHEME

The proposed RA network based on the DQN is shown in Fig. 1. It includes three parts: user clustering, power allocation and beamforming; the first two parts are the main concern of this section.

Fig. 1. Joint optimization network based on DQN.

A. User Clustering Based on DQN

The user clustering problem is modeled as an RL task, which consists of an agent and an environment. Specifically, the user clustering module is taken as the agent and the performance of the massive MIMO-NOMA system is the environment. The actions {at} taken by the agent are based on the expected rewards from the environment. For the considered system, each part of the RL framework is described as follows.

State space S: The CSI of all users in each cluster forms the current state of the t-th iteration, which is given by st = {[h1,1, ..., h1,K], ..., [hN,1, ..., hN,K]}.

Action space A: It should contain actions that cover all possible clustering results. When there are L users, the number of possible actions reaches $C_L^K C_{L-K}^K\cdots C_{L-(N-1)K}^K/N!$, so the size of the action space increases dramatically with the number of users (a quick calculation of this count is sketched after this paragraph). The purpose of taking an action is to select a suitable group for each user. Taking the action at = {[U^t_{1,1}, ..., U^t_{1,K}], ..., [U^t_{N,1}, ..., U^t_{N,K}]} in state st results in the new state st+1; this impact is denoted st →(at) st+1. Here st+1 is {[h^t_{1,1}, ..., h^t_{1,K}], ..., [h^t_{N,1}, ..., h^t_{N,K}]}, which fully corresponds to the new user grouping result at, and h^t_{n,k} is the CSI of user U^t_{n,k}, i.e., the k-th user in the n-th group under the current action at.
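To give a feel for how quickly this action space grows, the short calculation below evaluates $C_L^K C_{L-K}^K\cdots C_{L-(N-1)K}^K/N!$ for two user counts; the specific (L, K) pairs are chosen here purely for illustration.

```python
from math import comb, factorial

def num_clusterings(L, K):
    """Number of ways to partition L users into N = L // K unordered clusters of size K."""
    N = L // K
    count, remaining = 1, L
    for _ in range(N):
        count *= comb(remaining, K)   # C_{remaining}^{K}
        remaining -= K
    return count // factorial(N)      # clusters are unordered, so divide by N!

print(num_clusterings(8, 2))    # 105
print(num_clusterings(12, 2))   # 10395
```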
Reward function: Here, the system sum rate rt = Rsum is taken as the reward, which also depends on {αn,k} and {wn}. The ideal goal of RL is to maximize the cumulative discounted reward $R_t=\sum_{i=0}^{\infty}\gamma^i r_{t+i}$, where the discount factor is γ ∈ [0, 1]. Obviously, with such a target the action of each iteration can only be finalized after all iterations have been completed. Therefore, the state-action value (Q-value) function defined in (5) [15], which can determine the current action based only on the next iteration's value function, is used in Q-learning. To find, for a given state st, the optimal at that maximizes the Q-value, (5) has to be evaluated for all actions; if L is very large, the complexity is extremely high and the algorithm converges slowly.

$$Q(s_t,a_t)=\mathbb{E}\Big[r_t+\gamma\max_{a_{t+1}}Q(s_{t+1},a_{t+1})\,\Big|\,s_t,a_t\Big] \tag{5}$$
Deep Q-Network: To speed up the convergence of Q-learning, a deep Q-network is adopted to estimate the Q-values. A fully connected neural network with three hidden layers is used; its input is the current state and its output is the estimated Q-value of each action. The function of the Q-network is represented as Q(st, at, ω), where ω is the weight set to be trained. In order to ensure the convergence of the Q-network's parameter updates, a target Q-network, which has the same structure and initial weights as the Q-network but keeps the old weights ωC− from C iterations earlier, is included to provide relatively stable labels for Q-network training. Hence, the loss function used in training is

$$L(\omega)=\mathbb{E}\Big[\big(r+\gamma\max_{a'}Q(s',a',\omega_{C^-})-Q(s,a,\omega)\big)^2\Big] \tag{6}$$

where the label $y=r+\gamma\max_{a'}Q(s',a',\omega_{C^-})$ is calculated from the old weights ωC− and the new state s' reached after taking action a in state s. In addition, the experience replay strategy used in [15] is adopted in training. First, the observation samples (st, at, rt, st+1) of previous iterations are stored. Then a mini-batch of samples is randomly selected from the replay memory and fed into the target Q-network to obtain the training labels; this breaks the correlations among data samples and makes the training convergent. At the output of the Q-network, an ε-greedy strategy is used to choose the current action: a random action is selected from the whole action space with exploration probability ε, and the action with the maximum Q-value according to the Q-network output is chosen with probability (1 − ε).
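As an illustration of the training loop described above (target network, replay memory, ε-greedy action selection and the loss in (6)), the following PyTorch-style sketch shows one possible implementation. The layer widths, learning rate, buffer size and the state/action dimensions are illustrative assumptions, not the exact design of the letter.

```python
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Fully connected Q-network with three hidden layers, as described in the text."""
    def __init__(self, state_dim, n_actions, hidden=(128, 128, 128)):
        super().__init__()
        dims, layers = (state_dim, *hidden), []
        for d_in, d_out in zip(dims, dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        layers.append(nn.Linear(dims[-1], n_actions))   # one Q-value per clustering action
        self.net = nn.Sequential(*layers)

    def forward(self, s):
        return self.net(s)

def train_step(q_net, target_net, replay, optimizer, batch_size=32, gamma=0.9):
    """One update of (6): label y = r + gamma * max_a' Q(s', a', w_C-)."""
    if len(replay) < batch_size:
        return
    s, a, r, s_next = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():
        y = r + gamma * target_net(s_next).max(dim=1).values   # labels from the target network
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)           # Q(s, a, w)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def epsilon_greedy(q_net, state, n_actions, eps=0.1):
    """Random action with probability eps, otherwise the max-Q action."""
    if random.random() < eps:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

state_dim, n_actions = 64, 105                       # placeholder sizes (L = 8, K = 2 gives 105 actions)
q_net, target_net = QNet(state_dim, n_actions), QNet(state_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())       # target network starts from the same weights
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10000)                          # stores (s, a, r, s') tensors, a as 0-dim int64
```

In practice the target network's weights would be overwritten with the Q-network's weights every C training steps, as stated above.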


B. Power Allocation Based on BPNN

In order to ensure the effectiveness of the SIC receiver in power-domain NOMA, the power of the users in the same cluster needs to be assigned appropriately. Power allocation is the key to trading off the system sum rate against the users' fairness. Different from traditional optimization algorithms, a BPNN based power allocator is designed to reduce the computational complexity while retaining good performance.

The task of power allocation is to calculate the users' power allocation factors αn = [αn,1, ..., αn,K], αn,k ∈ [0, 1], for each group under a given user clustering result. To exploit the nonlinear mapping ability of the BPNN in extracting the relationship between the power allocation and the users' CSI within a cluster, the BPNN has to be trained on a large amount of labeled data. Here, the result of the exhaustive search based power allocation (ESPA) algorithm, executed over a finite power allocation factor set, is used as the training label $\tilde{\alpha}_n$ for the BPNN. The finite set for ESPA is obtained by discretizing the continuous factor range with a small step size. To ensure user fairness, the constraint Rn,k ≥ Rmin is enforced by the ESPA. Because the beamforming matrix is unknown when the training data for the BPNN are generated, the calculation of Rn,k does not involve W, and its expression is

$$R_{n,k}=B\log_2\!\left(1+\frac{\|\mathbf{h}_{n,k}\|^2\alpha_{n,k}P_n}{\|\mathbf{h}_{n,k}\|^2\sum_{j=k+1}^{K}\alpha_{n,j}P_n+\sigma^2}\right)$$
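A minimal sketch of how such ESPA training labels could be generated is given below: it enumerates a discretized grid of power allocation factors for a K = 2 cluster, discards splits that violate Rn,k ≥ Rmin or the per-cluster factor constraint, and keeps the split with the largest intra-cluster sum rate. The grid step, noise power and rate threshold are placeholder values, not the letter's settings.

```python
import numpy as np
from itertools import product

def espa_label(h_norm2, P_n, R_min, sigma2=1.0, B=1.0, step=0.01):
    """Exhaustive search over quantized power factors for one cluster.
    h_norm2: (K,) squared channel norms, sorted ascending (SIC decoding order)."""
    K = len(h_norm2)
    grid = np.arange(step, 1.0 + 1e-9, step)
    best_rate, best_alpha = -np.inf, None
    for alpha in product(grid, repeat=K):
        if sum(alpha) > 1.0:                          # per-cluster factor constraint
            continue
        rates = [B * np.log2(1 + h_norm2[k] * alpha[k] * P_n /
                             (h_norm2[k] * sum(alpha[k + 1:]) * P_n + sigma2))
                 for k in range(K)]
        if min(rates) < R_min:                        # minimum-rate constraint C3
            continue
        if sum(rates) > best_rate:
            best_rate, best_alpha = sum(rates), np.array(alpha)
    return best_alpha                                  # training label for the BPNN

# toy example with placeholder values
print(espa_label(h_norm2=np.array([0.4, 2.5]), P_n=2.0, R_min=0.5))
```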
The BPNN consists of an input layer, an output layer and hidden layers. The hidden layers adopt the rectified linear unit (ReLU) activation function [16] and the other layers use a linear activation function. The input of the BPNN is the channel information {||hn,1||, ..., ||hn,K||} of one cluster's users, and the output is the corresponding power allocation factors αn. The loss function is defined as $\mathrm{Loss}=\min_{\omega_B,b_B}\|\alpha_n-\tilde{\alpha}_n\|^2$ and the stochastic gradient descent (SGD) method is used to update the network parameters {ωB, bB} during training. The trained network can directly compute the power allocation result from the channel information, so the online computational complexity is greatly reduced.
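The following sketch shows one way such a BPNN could be set up and trained on (channel-norm, ESPA-label) pairs, assuming K = 2 and the three hidden layers with 32, 64 and 32 nodes mentioned later in the simulation section; the learning rate and batch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

K = 2
bpnn = nn.Sequential(                 # ReLU hidden layers, linear output layer
    nn.Linear(K, 32), nn.ReLU(),
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, K),                 # outputs alpha_{n,1}, ..., alpha_{n,K}
)
optimizer = torch.optim.SGD(bpnn.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                # ||alpha_n - label||^2

def train_epoch(h_norms, alpha_labels, batch=256):
    """h_norms: (M, K) channel norms per cluster; alpha_labels: (M, K) ESPA labels."""
    perm = torch.randperm(len(h_norms))
    for i in range(0, len(h_norms), batch):
        idx = perm[i:i + batch]
        loss = loss_fn(bpnn(h_norms[idx]), alpha_labels[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```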
Obviously, different BPNNs need to be trained for different configurations of K. In our work, the number of users K in every cluster is assumed identical, so the same pre-trained BPNN can be used by all clusters. In a practical system, K might differ among user clusters according to the users' access requirements and their CSI; in this case, multiple BPNNs need to be pre-trained for all possible values of K. Fortunately, the possible values of K lie in a very small set. From the various references surveyed, the number of superimposed users in a NOMA signal structure is usually chosen as 2 or 3 [4], [12], [17], [18]. In fact, when K is larger than 3, the error propagation in the SIC decoder becomes more serious and the total system performance decreases instead; this can be seen by comparing the simulation curves in Fig. 4 and Fig. 2. Since K is suitably 2 or 3, only two BPNNs need to be trained in advance, and the corresponding network structures and parameters can be saved with an acceptable memory requirement. During operation, the BS calls the required BPNN according to the actual number of users in each cluster. In this way, the proposed BPNN PA scheme remains feasible even if the number of users per cluster is not fixed.
IV. SIMULATION RESULTS

TABLE I
SIMULATION PARAMETERS

In the simulation, one BS is located at the center of the cell and the users are randomly and uniformly distributed in the cell. The specific simulation parameters are listed in Table I. For the power allocation network, 100,000 groups of data are used to train the BPNN, and the step size of the power factor discretization is 0.01. To verify the generalization performance of the BPNN, an additional 20,000 groups of data are used to test the network. In all schemes, the ZF beamforming algorithm is used, and the beamforming vector is calculated based on the best-quality CSI in each cluster. Five benchmark schemes correspond to different combinations of the user clustering methods, namely DQN, NLUPA and exhaustive search (ES), and the power allocation schemes, namely FTPA, BPNN and ESPA.
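As a reference for the beamforming used across all simulated schemes, the following is a small sketch of zero-forcing beamforming computed from one representative (best-quality) channel per cluster. The per-column normalization is an assumption made here to impose a unit-norm beamforming vector and may differ from the exact normalization used in the letter.

```python
import numpy as np

def zf_beamforming(H_rep):
    """H_rep: (N, Nt) matrix whose n-th row is the best-quality channel of cluster n.
    Returns W of shape (Nt, N) with h_n w_i ≈ 0 for i != n (zero forcing)."""
    W = np.linalg.pinv(H_rep)                        # right pseudo-inverse: H_rep @ W ≈ I
    W /= np.linalg.norm(W, axis=0, keepdims=True)    # normalize each beamforming vector (assumed)
    return W

rng = np.random.default_rng(2)
N, Nt = 4, 8
H_rep = (rng.standard_normal((N, Nt)) + 1j * rng.standard_normal((N, Nt))) / np.sqrt(2)
W = zf_beamforming(H_rep)
print(np.round(np.abs(H_rep @ W), 3))                # ≈ diagonal: inter-cluster terms near zero
```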
Fig. 2. The system total spectrum efficiency (L = 8, K = 2).

In Fig. 2, the system spectrum efficiency of the different schemes is compared for L = 8 and K = 2. The total transmission power ranges from 0.08 W to 4 W. The BPNN was adjusted to 5 layers, with the numbers of nodes in the three hidden layers set to 32, 64 and 32 after extensive experiments. It can be observed that the DQN-FTPA scheme outperforms the NLUPA-FTPA scheme thanks to the DQN based user clustering, which takes the real-time interaction with the current environment into account. The result of the DQN almost reaches that of ES clustering, which can be taken as the upper bound of the user clustering performance. Besides, the system spectrum efficiency is improved by a further 3 bit/s/Hz or so for powers above 1 W when the BPNN power allocation is adopted on top of the DQN user clustering. Furthermore, the simulation curves for ESPA, BPNN based PA and FTPA combined with ES user clustering are also given in Fig. 2. From these curves, the proposed BPNN PA outperforms FTPA by about 3 bit/s/Hz at a power of 4 W. Its performance is almost the same as that of ESPA, while the complexity is reduced considerably by using the pre-trained BPNN.

Fig. 3. The system total spectrum efficiency (L = 12, K = 2).

In addition, Fig. 3 shows the simulation results when the number of users increases to 12. Here, the number of users per cluster is still 2, so the same pre-trained BPNN as in the simulation of Fig. 2 is adopted. From Fig. 3, it can be found that the proposed DQN-BPNN RA scheme still maintains a performance comparable to the ES-ESPA case and has a significant gain over the NLUPA-FTPA method in scenarios with more users. Furthermore, the total system spectrum efficiency increases by about 7 bit/s/Hz at a power of 4 W when the number of users grows from 8 to 12.


Fig. 4. The system total spectrum efficiency (L = 8, K = 4).

What is more, we change the number of intra-cluster users and observe its effect on the system performance, as shown in Fig. 4. Here, K increases from 2 to 4 while L is kept at 8. In this case, the BPNN needs to be retrained; the network is adjusted to 5 hidden layers, with 32, 64, 128, 64 and 32 nodes, respectively. It can be observed that our scheme has an even larger performance gain over the NLUPA-FTPA method in this case. However, the system spectrum efficiency of all schemes under K = 4 drops significantly compared to Fig. 2 (K = 2). This is caused by the serious intra-cluster error propagation in the SIC decoder. The simulation in Fig. 4 shows that the given BPNN based PA is applicable to other values of K, and that K is suitably set below 4 due to the limitation of intra-cluster error propagation.

Fig. 5. CDF curves of the user's spectrum efficiency.

Fig. 5 shows the CDF curves of the user's spectrum efficiency obtained by the proposed scheme and the ES-FTPA scheme under a total transmission power of 4 W, L = 8 and K = 2. The dotted line is the performance of the case in which the inter-cluster interference is ignored. The DQN-BPNN scheme obtains an obvious gain in system spectrum efficiency over the traditional method. However, the performance of the edge users improves only slightly and still has a big gap to the ideal situation. Although the minimum rate constraint of the users has been considered in the BPNN, the inter-cluster interference, which cannot be fully suppressed by ZF beamforming, still worsens the final performance. Moreover, even in the ideal situation without inter-cluster interference, the lowest user's spectrum efficiency cannot be kept at Rmin. This is because FTPA rather than ESPA is used to generate the training labels in some extreme cases; in such cases, Rmin cannot be achieved for a few users no matter how the power is adjusted within the cluster, due to their very poor channel conditions. Still, nearly 90% of the users achieve a performance better than Rmin with the proposed scheme.
V. CONCLUSION

This work studies the downlink RA problem in massive MIMO-NOMA systems. In order to maximize the system spectrum efficiency under a worst-user performance constraint, a deep Q-learning network and a BP neural network are designed to realize the joint user clustering and the intra-cluster power allocation, respectively. The simulation results demonstrate the advantage of our scheme in improving the system spectrum efficiency.

REFERENCES

[1] F. Tang, Y. Kawamoto, N. Kato, and J. Liu, "Future intelligent and secure vehicular network toward 6G: Machine-learning approaches," Proc. IEEE, vol. 108, no. 2, pp. 292–307, Feb. 2020.
[2] Z. Shi, W. Gao, S. Zhang, J. Liu, and N. Kato, "AI-enhanced cooperative spectrum sensing for non-orthogonal multiple access," IEEE Wireless Commun., vol. 27, no. 2, pp. 173–179, Apr. 2020.
[3] G. Zhang et al., "Interference management by vertical beam control combined with coordinated pilot assignment and power allocation in 3D massive MIMO systems," KSII Trans. Internet Inf. Syst., vol. 9, no. 8, pp. 2797–2820, Aug. 2015.
[4] Z. Ding, F. Adachi, and H. V. Poor, "The application of MIMO to non-orthogonal multiple access," IEEE Trans. Wireless Commun., vol. 15, no. 1, pp. 537–552, Jan. 2016.
[5] Y. Sun, D. W. K. Ng, Z. Ding, and R. Schober, "Optimal joint power and subcarrier allocation for full-duplex multicarrier non-orthogonal multiple access systems," IEEE Trans. Commun., vol. 65, no. 3, pp. 1077–1091, Mar. 2017.
[6] S. M. R. Islam, M. Zeng, O. A. Dobre, and K.-S. Kwak, "Resource allocation for downlink NOMA systems: Key techniques and open issues," IEEE Wireless Commun., vol. 25, no. 2, pp. 40–47, Apr. 2018.
[7] A. Benjebbour, A. Li, Y. Saito, Y. Kishiyama, A. Harada, and T. Nakamura, "System-level performance of downlink NOMA for future LTE enhancements," in Proc. IEEE Globecom Workshops (GC Wkshps), Atlanta, GA, USA, Dec. 2013, pp. 66–70.
[8] S. Chinnadurai, P. Selvaprabhu, and M. H. Lee, "A novel joint user pairing and dynamic power allocation scheme in MIMO-NOMA system," in Proc. Int. Conf. Inf. Commun. Technol. Converg. (ICTC), Jeju, South Korea, Oct. 2017, pp. 951–953.
[9] Y. Liu, M. Elkashlan, Z. Ding, and G. K. Karagiannidis, "Fairness of user clustering in MIMO non-orthogonal multiple access systems," IEEE Commun. Lett., vol. 20, no. 7, pp. 1465–1468, Jul. 2016.
[10] R. Zhu and G. Zhang, "A segment-average based channel estimation scheme for one-bit massive MIMO systems with deep neural network," in Proc. IEEE 19th Int. Conf. Commun. Technol. (ICCT), Xi'an, China, Oct. 2019, pp. 81–86.
[11] C. He, Y. Hu, Y. Chen, and B. Zeng, "Joint power allocation and channel assignment for NOMA with deep reinforcement learning," IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2200–2210, Oct. 2019.
[12] M. Liu, T. Song, and G. Gui, "Deep cognitive perspective: Resource allocation for NOMA-based heterogeneous IoT with imperfect SIC," IEEE Internet Things J., vol. 6, no. 2, pp. 2885–2894, Apr. 2019.
[13] G. Gui, H. Huang, Y. Song, and H. Sari, "Deep learning for an effective nonorthogonal multiple access scheme," IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 8440–8450, Sep. 2018.
[14] D. Ying, F. W. Vook, T. A. Thomas, D. J. Love, and A. Ghosh, "Kronecker product correlation model and limited feedback codebook design in a 3D channel model," in Proc. IEEE Int. Conf. Commun. (ICC), Sydney, NSW, Australia, Jun. 2014, pp. 5865–5870.
[15] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, "Deep reinforcement learning for dynamic multichannel access in wireless networks," IEEE Trans. Cogn. Commun. Netw., vol. 4, no. 2, pp. 257–265, Jun. 2018.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., Stateline, NV, USA, Dec. 2012, pp. 1097–1105.
[17] TP for Classification of MUST Schemes, document R1-154999, 3GPP, 2015.
[18] J.-M. Kang, I.-M. Kim, and C.-J. Chun, "Deep learning-based MIMO-NOMA with imperfect SIC decoding," IEEE Syst. J., vol. 14, no. 3, pp. 3414–3417, Sep. 2020.
