Distributed Channel Allocation For Mobile 6G Subnetworks Via Multi-Agent Deep Q-Learning
AI for Communication Research Group, Department of Electronic Systems, Aalborg University, Denmark
E-mail:{ra, gb}@es.aau.dk
Abstract—Sixth generation (6G) in-X subnetworks have recently been proposed as short-range low-power radio cells for supporting localized extreme wireless connectivity inside entities such as industrial robots, vehicles, and the human body. The deployment of in-X subnetworks in these entities may lead to fast changes in the interference level and hence, varying risks of communication failure. In this paper, we investigate fully distributed resource allocation for interference mitigation in dense deployments of 6G in-X subnetworks. Resource allocation is cast as a multi-agent reinforcement learning problem, and agents are trained in a simulated environment to perform channel selection with the goal of maximizing the per-subnetwork rate subject to a target rate constraint for each device. To overcome the slow convergence and performance degradation issues associated with fully distributed learning, we adopt a centralized training procedure involving local training of a deep Q-network (DQN) at a central location with measurements obtained at all subnetworks. The policy is implemented using a Double Deep Q-Network (DDQN) due to its ability to enhance training stability and convergence. Performance evaluation results in an in-factory environment indicate that the proposed method can achieve up to a 19% rate increase relative to random allocation and is only marginally worse than complex centralized benchmarks.

Index Terms—Machine learning, reinforcement learning, interference management, beyond 5G networks, resource allocation

I. INTRODUCTION

The proliferation of more demanding applications clearly indicates that wireless networks beyond 5G must be designed to cope with more stringent performance requirements in denser environments than current systems. Recent publications on sixth generation (6G) networks [1]–[4] have identified short-range wireless communication for replacing wired connectivity in applications such as industrial control at the sensor-actuator level, augmented or virtual reality, and intra-vehicle control. Replacing wired connectivity with wireless offers the inherent benefits of higher scalability, lower equipment weight, enhanced flexibility, and lower maintenance cost, among others. Clearly, some of these examples are life-critical use cases requiring performance guarantees at all times. Such use cases can also lead to dense scenarios (e.g., in-body subnetworks in a crowded environment) with a potentially high and dynamic interference footprint. In order to achieve the above requirements, mechanisms for mitigating the adverse effects of interference are important.

Radio resource allocation has been an important component of wireless research for several years as a key framework for interference mitigation. The goal of resource allocation is to optimize specified performance metric(s) (subject to practical constraints on resource availability) by adjusting the utilization of the limited radio resources such as transmit power, frequency channel, and time. Resource allocation typically involves non-convex objective functions and is known to be NP-hard with no universal optimal solution [5]. To overcome this limitation, algorithms for resource allocation have traditionally been based on hard-coded heuristics [6] or on optimization techniques such as game theory [7], genetic algorithms [8], and geometric programming [9]. Over the last few years, the focus appears to have shifted towards machine learning-based algorithms [5], resulting in a large number of published works applying supervised [10], unsupervised [11], and reinforcement learning techniques [12] for resource allocation in different types of wireless systems.

While several solutions have been proposed for resource allocation in different wireless systems over the years, works targeting the peculiar nature of short-range low-power 6G in-X subnetworks are still rather limited. In our previous works, we have proposed distributed rule-based heuristics [6], [13] and a supervised learning method [14] in which a deep neural network (DNN) is trained with data generated using centralized graph coloring for channel allocation in scenarios with dense deployment of 6G in-X subnetworks. In a recent work [15], a Q-learning method for joint power and channel allocation using quantized state information was proposed. While the results in that paper highlight the potential of Q-learning for resource allocation, the method suffers from poor scalability to large problem dimensions as well as from the effect of state quantization on the performance of Q-learning algorithms. The authors of [16] presented a complex architecture referred to as GA-Net, which combines graph attention networks (GAT), graph neural networks (GNN), and multi-agent reinforcement learning (MARL) for channel allocation in 6G subnetworks. The introduction of multi-head attention for feature extraction allows only centralized training, which requires the transmission of sensing measurements from all subnetworks to a central location, translating to high communication overhead and potential security threats. The lack of support for distributed training limits the usability of GA-Net in practical applications where connection to a central network may be impossible. Moreover, relying solely on centralized training is not feasible for in-X subnetwork applications (such as in-vehicle or in-body) where privacy constraints may hinder the transmission of raw sensing data to a central server for training. In such cases, methods that are amenable to both centralized and distributed training are needed.
In this paper, we propose a multi-agent double deep Q-network (MADDQN) method for distributed channel allocation with or without the exchange of measurements between subnetworks, which is amenable to centralized, distributed, or federated training. We perform extensive simulations to evaluate the performance of the proposed method using parameters defined for the in-factory environment. The performance and complexity analysis results show that the MADDQN method can achieve significant performance improvement relative to random allocation and has low computation complexity. The proposed method is also scalable and generalizes well to scenarios with parameters different from those used for training.

The remaining part of this paper is organized as follows. The system model, the distributed channel allocation problem, and a short overview of DQN are presented in Section II. In Section III, we present the proposed method. Performance evaluation and complexity analysis results are presented in Section IV. Finally, we draw conclusions in Section V.

II. PROBLEM FORMULATION

System Model: We consider a network with N mobile subnetworks, each serving M devices. Each subnetwork has a single access point (AP) that coordinates transmission for its associated devices. We index the subnetworks (and hence, APs) with n ∈ N = {1, 2, · · · , N} and the devices in each subnetwork with m ∈ M = {1, 2, · · · , M}. We assume that a total bandwidth, B, which is partitioned into K equal-sized channels, is available in the system and that each subnetwork operates on a single channel at each time slot. We index the channels with k ∈ {1, 2, · · · , K}. Denoting the transmit power as p_tx, the power received on the link between the nth AP and the mth device in the zth subnetwork is defined as

g^k_{n,z,m}[t] = p_tx |h^k_{n,z,m}[t]|^2 Γ^k_{n,z,m} ψ_{n,z,m},   (1)

where h^k_{n,z,m}[t], Γ^k_{n,z,m}, and ψ_{n,z,m} are the Rayleigh distributed complex small-scale gain, the path-loss, and the log-normal shadowing, respectively. By considering the Jakes model, the small-scale gain, h^k_{n,z,m}[t], is defined as

h^k_{n,z,m}[t] = ρ h^k_{n,z,m}[t − 1] + √(1 − ρ^2) ε^k_{n,z,m},   (2)

where ε^k_{n,z,m} is an i.i.d. complex Gaussian variable and ρ is the lag-1 temporal autocorrelation coefficient. The temporal autocorrelation coefficient is modeled as ρ = J_0(2π f_d T_s), where J_0(·), f_d, and T_s are the zeroth-order Bessel function of the first kind, the maximum Doppler frequency, and the slot duration, respectively.
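As a worked example of the channel model above, the following minimal NumPy sketch performs one update of the autoregressive fading process in (2), with ρ computed from the Jakes autocorrelation ρ = J_0(2π f_d T_s). The function name, tensor layout, and the 1 ms slot duration are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np
from scipy.special import j0  # zeroth-order Bessel function of the first kind

def update_small_scale_gain(h_prev, fd, Ts, rng):
    """One step of the temporally correlated Rayleigh fading in (2).

    h_prev : complex ndarray of gains at slot t-1 (any shape, e.g. N x N x M x K)
    fd     : maximum Doppler frequency [Hz]
    Ts     : slot duration [s] (assumed value for illustration)
    """
    rho = j0(2.0 * np.pi * fd * Ts)  # lag-1 autocorrelation, rho = J0(2*pi*fd*Ts)
    # i.i.d. unit-variance complex Gaussian innovation, CN(0, 1)
    eps = (rng.standard_normal(h_prev.shape)
           + 1j * rng.standard_normal(h_prev.shape)) / np.sqrt(2.0)
    return rho * h_prev + np.sqrt(1.0 - rho ** 2) * eps

# Example: 25 subnetworks, 1 device each, 4 channels, v = 2 m/s at 6 GHz
rng = np.random.default_rng(0)
fd = 2.0 * 6e9 / 3e8  # Doppler frequency v * fc / c, about 40 Hz
h = (rng.standard_normal((25, 25, 1, 4))
     + 1j * rng.standard_normal((25, 25, 1, 4))) / np.sqrt(2.0)
h = update_small_scale_gain(h, fd=fd, Ts=1e-3, rng=rng)
```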
Denoting the corresponding distance as d_{n,z,m}, the path-loss component Γ^k_{n,z,m} is expressed as Γ^k_{n,z,m} = c^2 d^{−β}_{n,z,m} / (16π^2 f_k^2), where c ≈ 3 × 10^8 m/s is the speed of light, and f_k and β are the center frequency of channel k and the path-loss exponent, respectively.

[...] where S_x is the value of a two-dimensional Gaussian random process with exponential covariance at the location of the device or AP, and d_c denotes the de-correlation distance.

At slot t, the signal-to-interference-plus-noise ratio (SINR) on the link between the AP in subnetwork n and its mth device can be expressed as

γ^k_{nm}[t] = g^k_{n,n,m}[t] / ( Σ_{n′ ∈ I_{nn′}} g^k_{n,n′,m′}[t] + σ^2 ),   (4)

where I_{nn′} denotes the set of all other subnetworks that are operating on the same channel as the nth subnetwork, and σ^2 = 10^{(−174 + nf + 10 log_10(BW))/10} is the noise power, with nf and BW denoting the noise figure and channel bandwidth, respectively. Assuming a single antenna at both the APs and the devices and considering the Shannon approximation, the achieved rate at slot t can then be written as

ζ_{nm}[t] ≈ log_2(1 + γ_{nm}[t]).   (5)
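A minimal sketch of the SINR and rate computation in (4)-(5) is given below, assuming a gain tensor g[n, z, m] that holds the power received at device m of subnetwork n from the AP of subnetwork z on the channel in use; the exact indexing convention of the paper (with the interferer's device index m′) is abstracted away, so this is an illustration rather than the authors' implementation.

```python
import numpy as np

def sinr_and_rate(g, channels, noise_power):
    """Per-device SINR (4) and Shannon-approximation rate (5) for a channel allocation.

    g           : ndarray [N, N, M], g[n, z, m] = power received at device m of
                  subnetwork n from the AP of subnetwork z (assumed layout)
    channels    : ndarray [N] of channel indices selected by each subnetwork
    noise_power : sigma^2 in linear scale
    """
    N, _, M = g.shape
    sinr = np.empty((N, M))
    for n in range(N):
        co_channel = (channels == channels[n])  # subnetworks sharing channel c_n
        co_channel[n] = False                   # exclude the serving AP itself
        desired = g[n, n, :]                    # g^k_{n,n,m}
        interference = g[n, co_channel, :].sum(axis=0)
        sinr[n] = desired / (interference + noise_power)
    rate = np.log2(1.0 + sinr)                  # zeta_{nm}[t] in bps/Hz
    return sinr, rate

# Noise power for nf = 10 dB and BW = 10 MHz (result in mW, since -174 dBm/Hz)
sigma2 = 10 ** ((-174 + 10 + 10 * np.log10(10e6)) / 10)
```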
Fig. 1. Illustration of the MADDQN-based channel allocation.
Distributed Resource Allocation Problem: We consider a resource allocation problem involving fully distributed selection of frequency channels. We consider in-X subnetworks supporting applications that require high data rates with or without minimum rate constraints. The resource optimization problem can then be defined as a constrained multi-objective task involving the maximization of N objective functions, one for each subnetwork. To support the requirement, we take the objective function as the per-subnetwork sum rate subject to a minimum rate per device constraint. The problem can be formally expressed as:

P : max_{c^t} { Σ_{m=1}^{M} ζ_{nm}(c^t) }_{n=1}^{N}   s.t. ζ_{nm} ≥ ζ_target ∀ n, m,   (6)

where c^t = [c^t_1, · · · , c^t_N] with c^t_n ∈ {1, 2, · · · , K} ∀ n denotes the vector of indices of the channels selected by all subnetworks at time t, and ζ_target is the target minimum rate, which is assumed equal for all subnetworks. The problem in (6) involves the joint optimization of N conflicting non-convex objective functions and is known to be difficult to solve. A multi-agent reinforcement learning method for solving the problem is proposed in this paper.

Deep Q-Learning Fundamentals: In deep Q-learning, a deep neural network, often called a Deep Q-Network (DQN), is used to approximate the Q-function. The DQN circumvents the limitations associated with its table-based counterpart and has been shown to provide better performance. The DQN can be expressed as

Q̂(s, a) = f(s, a, θ),   (7)

where f is a function determined by the DQN architecture and θ is a vector of the DQN parameters. The Q-value estimation is thereby reduced to the optimization of θ. This optimization is typically performed using standard gradient descent algorithms with the Huber loss defined as [18]

L(θ) = { (Γ(θ))^2,              if |Γ(θ)| ≤ δ
       { δ|Γ(θ)| − (1/2) δ^2,   otherwise,   (8)

where Γ(θ) = r(s_t, a) + γ max_{a′} Q′(s_{t+1}, a′; θ) − Q(s_t, a; θ) is the difference between the expected and predicted Q-values, and δ is the discriminating parameter of the loss function.
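The piecewise loss in (8) translates directly into code. The sketch below follows (8) as printed (quadratic branch without the usual 1/2 factor) and is meant only to illustrate its shape, with delta as the discriminating parameter; in practice a framework routine such as PyTorch's Huber/smooth-L1 loss plays the same role, up to the scaling of the quadratic branch.

```python
import numpy as np

def huber_loss(td_error, delta=1.0):
    """Huber-style loss of (8) applied to the TD error Gamma(theta).

    Quadratic for |Gamma| <= delta, linear beyond, which limits the impact
    of large TD errors on the gradient.
    """
    abs_err = np.abs(td_error)
    quadratic = td_error ** 2
    linear = delta * abs_err - 0.5 * delta ** 2
    return np.where(abs_err <= delta, quadratic, linear)
```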
III. MULTI-AGENT DDQN FOR CHANNEL ALLOCATION

We cast the resource selection described above in a MARL framework in which each subnetwork has an agent at the AP whose goal is to learn a policy for selecting a frequency channel such that its communication requirements are met, via interaction with the wireless environment as shown in Fig. 1a. As with other RL techniques, MARL requires the definition of the environment, state (or feature) space, action space, and reward signal, as well as an appropriate model for the policy. As described in Section II, a wireless environment with N mobile subnetworks, each serving M devices, is considered. The other components are described below.

State space: We consider two cases, viz. fully independent resource selection and resource selection with limited cooperation. In the former, no communication is possible among subnetworks. Each subnetwork, therefore, makes resource selection decisions based solely on its local sensing information. The latter allows communication of only sensing measurements between a subnetwork and the others in its neighbour set, denoted as D_n for the nth subnetwork. The feature set of subnetwork n is represented as

S_n = {I_{z,1}, I_{z,2}, · · · , I_{z,K}}   ∀ z ∈ {n, D_n},   (9)

where I_{z,k} is the measured aggregate interference power on channel k at the zth subnetwork. Note that the dimension of the neighbour set, |D_n|, can be varied between 0 and N − 1 to control the number of neighbours from which each subnetwork receives state information. If |D_n| = 0, we have the fully independent learning case. With |D_n| < N − 1, the strongest interfering subnetworks are included in D_n.
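A possible way to assemble the feature vector in (9) is sketched below; the flattening order, any normalisation, and the selection of the |D_n| strongest interferers are not specified in the paper and are assumptions here.

```python
import numpy as np

def build_state(interference, n, neighbours):
    """Feature vector S_n of (9): aggregate interference power per channel,
    measured locally and (optionally) reported by the |D_n| neighbours.

    interference : ndarray [N, K] with interference[z, k] = I_{z,k}
    n            : index of the subnetwork building its state
    neighbours   : list of neighbour indices D_n (empty for fully independent learning)
    """
    rows = [n] + list(neighbours)
    # A normalisation step (e.g., conversion to dB) could be applied here; not specified.
    return interference[rows, :].reshape(-1)  # length K * (1 + |D_n|)
```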
Action space: The action space is the set of all possible actions that the agent can choose from at each time. While the method presented here can be applied to the selection of any wireless resource, we consider the allocation of frequency channels. The action space for each subnetwork is therefore the set of all available frequency channels, defined as

A = {c_1, c_2, · · · , c_K},   (10)

where c_k denotes the kth channel. At each time, the nth subnetwork's action is denoted a^t_n, with a^t_n ∈ A.

Reward signal: As stated in Section II, the goal of each agent is to maximize the achieved rate while also ensuring that the target rate, ζ_target, is achieved. To guide the agent towards achieving this goal, we define the reward function considering the optimization problem defined in (6). The reward for the nth subnetwork at time t is defined as

r_n = { ζ_n,            if ζ_{nm} ≥ ζ_target ∀ m
      { ζ_n − λ Δζ_n,   otherwise,   (11)

where ζ_n = Σ_{m=1}^{M} ζ_{nm} is the sum rate achieved by all devices in subnetwork n, Δζ_n = Σ_{m=1}^{M} (ζ_target − ζ_{nm}), and λ is a control parameter which is set to ensure a balance between maximizing the achieved rate and guaranteeing that the minimum rate is at least equal to ζ_target.
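The reward in (11) maps directly to a few lines of code; the sketch below follows the definition of Δζ_n as a sum over all M devices, as written above, with λ left as a tunable argument.

```python
import numpy as np

def reward(rates_n, zeta_target, lam):
    """Reward r_n of (11) for one subnetwork.

    rates_n     : ndarray [M] of per-device rates zeta_{nm} [bps/Hz]
    zeta_target : target minimum rate per device [bps/Hz]
    lam         : penalty weight lambda
    """
    zeta_n = rates_n.sum()                   # sum rate of subnetwork n
    if np.all(rates_n >= zeta_target):
        return zeta_n
    deficit = np.sum(zeta_target - rates_n)  # Delta zeta_n, summed over all devices
    return zeta_n - lam * deficit
```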
Policy Representation: Motivated by the work in [19], where it was shown that a DQN variant referred to as Double DQN (DDQN) offered up to a 2-fold performance improvement and better training stability than the classic DQN, we adapt the DDQN with experience replay [20] in a multi-agent version for channel selection. The considered DDQN architecture is shown in Fig. 1. The DDQN comprises two networks, viz:
• Main Network: The main network acts as the action-value function approximator which maps the features to actions. This mapping for the nth subnetwork is denoted as Q(s_t, a_k; θ_t) : s_t → {q(a|s_t, θ_t) | a ∈ A}, where q(a|s_t, θ_t) denotes the expected cumulative reward for taking action a at state s_t.
• Target Network: [...]
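A minimal sketch of how the two networks cooperate in the double Q-learning update of [19] is given below: the main network selects the greedy next action and the target network evaluates it, which reduces the over-estimation bias of plain DQN. Batching, terminal-state masking, and the periodic copy of the main parameters into the target network (every T_up steps in Algorithm 1) are omitted; q_main and q_target are assumed to be callables returning a vector of Q-values.

```python
import numpy as np

def ddqn_target(reward, next_state, q_main, q_target, gamma):
    """Double-DQN bootstrap target: action selection by the main network,
    action evaluation by the target network."""
    a_star = int(np.argmax(q_main(next_state)))            # greedy action under the main network
    return reward + gamma * q_target(next_state)[a_star]   # value taken from the target network
```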
Algorithm 1 Training of MADDQN-based channel allocation
1: Input: learning rate α, discount factor γ, number of episodes T, number of episode steps N_e, batch size N_b, target network update interval T_up, switching delay τ_delay
2: Compute initial states, {s^1_n}_{n=1}^{N}
3: Initialize replay memory, {D_n}_{n=1}^{N}, main network parameters [...]

TABLE I
DEFAULT SIMULATION PARAMETERS

Parameter                            | Value   | Parameter                               | Value
Deployment area [m^2]                | 40 × 40 | Subnetwork radius [m]                   | 3.0
Number of subnetworks, N             | 25      | Velocity, v [m/s]                       | 2.0
Number of frequency channels, |A|    | 4       | Shadowing standard deviation, σs [dB]   | 5
Path-loss exponent                   | 2.7     | Carrier frequency [GHz]                 | 6
Transmit power [dBm]                 | 0       | Noise figure [dB]                       | 10
Channel bandwidth [MHz]              | 10      | Network structure                       | |S| − 24 − 24 − |A|
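To make the "Network structure" row of Table I concrete, a PyTorch sketch of an |S|-24-24-|A| multilayer perceptron is given below. The ReLU activations, the initialisation, and the example dimensions are assumptions, since the paper only specifies the layer widths.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-network matching the |S|-24-24-|A| structure in Table I."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 24),
            nn.ReLU(),
            nn.Linear(24, 24),
            nn.ReLU(),
            nn.Linear(24, num_actions),  # one Q-value per frequency channel
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)

# Example: K = 4 channels and |D_n| = 3 neighbours -> 16 input features, 4 actions
q_main = QNetwork(state_dim=16, num_actions=4)
q_target = QNetwork(state_dim=16, num_actions=4)
q_target.load_state_dict(q_main.state_dict())  # target network starts as a copy of the main network
```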
[Fig. 2: (a) Averaged reward [bps/Hz] versus training episode with N = 25 and |D| = 0, 3, 7; (b) CDF of the per-device rate [bps/Hz] with N = 25; (c) CDF of the average rate [bps/Hz]; (d) sensitivity to the number of subnetworks; (e) sensitivity to the shadowing standard deviation; (f) running time estimates. Curves compare the Surrogate Optimizer, Centralized Coloring, MADDQN with |D| = 0, 3, 7, and Random allocation.]
Fig. 2. Plots of the learning curves (a), performance (b-c), sensitivity evaluation (d-e) results, and running time estimates (f).
[...] same channel. The delay is generated for all subnetworks at the beginning of each snapshot as a random integer factor of the transmission interval, with a maximum value of 10. A subnetwork is then allowed to perform channel switching at time instants determined by its assigned delay value.

Simulation Results: Fig. 2a shows the averaged reward over successive episodes with no target rate constraint, i.e., ζ_target = 0 bps/Hz, and with the size of the neighbour set for each subnetwork |D| ∈ {0, 3, 7}. The averaging is performed over all steps within each episode and over all subnetworks. The figure shows that convergence is achieved at approximately 1000 episodes with fully independent learning, i.e., |D| = 0, and at about 1600 episodes with |D| = 3 and |D| = 7. This indicates that an agent requires longer training to learn the feature-to-action mapping function using sensing measurements from multiple subnetworks than using only local measurements. At convergence, an averaged reward of about 4.60 bps/Hz, 4.75 bps/Hz, and 4.70 bps/Hz is achieved with |D| = 0, |D| = 3, and |D| = 7, respectively, indicating a marginal improvement of 3.3% with |D| = 3 and 2.2% with |D| = 7 compared to the fully independent case, i.e., |D| = 0.

The trained DDQN agents are deployed for distributed channel allocation and their performance is compared with three benchmark algorithms, viz:
1) Random: assign frequency channels randomly to all subnetworks at the start of a snapshot.
2) Mixed Integer Surrogate Optimizer: the surrogate optimization method [21] is applied in a centralized manner to the mixed-integer problem involving maximization of the network sum rate. This method is implemented using the surrogateopt function in MATLAB with default parameters, except for the number of iterations, which is set to 400.
3) Centralized coloring: greedy graph coloring is applied to the interference graph G, created from the matrix of mutual interference power between subnetworks with a K − 1 strongest-interfering-neighbours edge constraint. To guarantee the colorability of G, successive graph sparsification, involving removal of the weakest edges until no more than K colors are required [13], is used in the simulations (a simplified sketch of this benchmark is given after this list).
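For reference, a simplified Python stand-in for benchmark 3) is sketched below using NetworkX greedy colouring. It mirrors the described steps (edges to the K − 1 strongest interferers of each node, then sparsification of the weakest edges until at most K colours suffice), but it is an assumption-laden illustration, not the authors' implementation from [13].

```python
import numpy as np
import networkx as nx

def centralized_coloring(interference_matrix, K):
    """Greedy-coloring channel assignment on a sparsified interference graph.

    interference_matrix : ndarray [N, N] of mutual interference powers
    K                   : number of available channels (colours)
    Returns a dict mapping subnetwork index -> channel index.
    """
    N = interference_matrix.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(N))
    for n in range(N):
        strongest = np.argsort(interference_matrix[n])[::-1][:K - 1]  # K-1 strongest interferers
        for z in strongest:
            if int(z) != n:
                w = max(interference_matrix[n, z], interference_matrix[z, n])
                G.add_edge(n, int(z), weight=float(w))

    while True:
        colors = nx.greedy_color(G, strategy="largest_first")
        if max(colors.values()) < K or G.number_of_edges() == 0:
            return colors
        # More than K colours needed: drop the weakest remaining edge and retry
        weakest = min(G.edges(data="weight"), key=lambda e: e[2])
        G.remove_edge(weakest[0], weakest[1])
```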
Fig. 2b shows the empirical cumulative distribution function (CDF) of the per-device rate for the different methods. The proposed MADDQN scheme performs better than random channel allocation, similarly to centralized coloring, and only marginally worse than the iterative surrogate optimization technique. The averaged rate (or, equivalently, sum rate) performance of the different channel allocation methods is shown in Fig. 2c, where we plot the CDF of the rate averaged over all subnetworks. Compared to random allocation, the proposed MADDQN method offers between ∼15% (with |D| = 0) and ∼19% (with |D| = 3) improvement at the median of the average rate distribution and is only about ∼6% below the median average rate achieved by the centralized benchmark schemes, i.e., centralized coloring and the surrogate optimizer. We remark here that the proposed method offers the advantage of much lower signaling overhead, since only a very limited exchange of information is required.
Sensitivity evaluation: We study the robustness of the proposed method to changes in the wireless environment relative to the conditions used during training. Due to its high computation complexity, the iterative surrogate optimizer is not included in the sensitivity evaluation. The MADDQN model trained with N = 25 subnetworks and a shadowing standard deviation of σs = 5 dB is evaluated with values of N between 5 and 45 in the same 40 m × 40 m area and with σs between 1 dB and 9 dB. We plot the mean and standard deviation of the average rate as a function of the number of subnetworks in Fig. 2d and of the shadowing standard deviation in Fig. 2e. In both cases, the MADDQN method shows a similar trend as well as similar relative performance to the centralized coloring and random allocation benchmarks, indicating that all schemes are equally affected by the changes in the number of subnetworks and the shadowing standard deviation. It is therefore reasonable to conclude that the proposed scheme is robust to changes in the considered wireless parameters.

Complexity Analysis: We compare the computational complexity of the proposed MADDQN method with the benchmark algorithms by estimating the total time required to perform channel allocation for all subnetworks at each transmission instant. In Fig. 2f, we plot the average total running time per step as a function of the number of subnetworks. The figure shows that the proposed MADDQN and our implementation of greedy coloring can provide up to a factor of 2000 reduction in time complexity relative to the iterative surrogate optimizer. While the running time for centralized coloring is marginally lower than that of MADDQN for values of N between 5 and 35, the linear growth achieved by the latter makes it more attractive for deployments with a higher number of subnetworks, i.e., N ≥ 40. Note that the distributed MADDQN method has minimal signaling overhead compared to the centralized benchmarks. Assuming a constant time cost for exchanging a sensing measurement between any pair of subnetworks or from a subnetwork to the central resource manager, the signaling complexity for MADDQN and for the centralized benchmarks (i.e., centralized coloring and the surrogate optimizer) is upper bounded by O(N|Dn|) and O(N^2), respectively. As observed from the training curves in Fig. 2a and the mean rate performance in Fig. 2c, no performance improvement is achieved with values of |Dn| > K − 1. In practical interference-limited scenarios, the number of available channels, K, is much smaller than the number of subnetworks, i.e., K << N, and hence the signaling complexity for MADDQN reduces to O(N).

V. CONCLUSION

A simple multi-agent DDQN (MADDQN) approach is proposed for fully distributed dynamic channel allocation in dense deployments of 6G in-X subnetworks. The access point in each subnetwork acts as the DDQN agent, which dynamically makes channel selection decisions based on aggregate interference power per channel measurements obtained via sensing. The presented performance results indicated that DDQN agents for channel allocation can be trained with reasonably fast convergence. The MADDQN approach yields a median average rate that is up to 19% higher than baseline random allocation and only about 6% lower than the computationally intensive surrogate optimizer as well as the centralized graph coloring with high signaling overhead. Our results further indicated that the proposed method is robust to changes in the deployment density as well as the propagation parameters.

REFERENCES

[1] V. Ziegler, H. Viswanathan, H. Flinck, M. Hoffmann, V. Räisänen, and K. Hätönen, "6G architecture to connect the worlds," IEEE Access, vol. 8, pp. 173508–173520, 2020.
[2] H. Viswanathan and P. E. Mogensen, "Communications in the 6G era," IEEE Access, vol. 8, pp. 57063–57074, 2020.
[3] G. Berardinelli, P. Baracca, R. Adeogun, S. Khosravirad, F. Schaich, K. Upadhya, D. Li, T. B. Tao, H. Viswanathan, and P. E. Mogensen, "Extreme communication in 6G: Vision and challenges for 'in-X' subnetworks," IEEE Open Journal of the Communications Society, 2021.
[4] G. Berardinelli, P. Mogensen, and R. O. Adeogun, "6G subnetworks for life-critical communication," in 2nd 6G Wireless Summit, 2020.
[5] F. Hussain, S. A. Hassan, R. Hussain, and E. Hossain, "Machine learning for resource management in cellular and IoT networks: Potentials, current solutions, and open challenges," IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 1251–1275, 2020.
[6] R. Adeogun, G. Berardinelli, I. Rodriguez, and P. E. Mogensen, "Distributed dynamic channel allocation in 6G in-X subnetworks for industrial automation," in IEEE Globecom Workshops, 2020.
[7] R. O. Adeogun, "A novel game theoretic method for efficient downlink resource allocation in dual band 5G heterogeneous network," Wireless Personal Communications, vol. 101, no. 1, pp. 119–141, Jul. 2018.
[8] U. Mehboob, J. Qadir, S. Ali, and A. Vasilakos, "Genetic algorithms in wireless networking: Techniques, applications, and issues," Soft Computing, vol. 20, no. 6, pp. 2467–2501, 2016.
[9] K. T. Phan, T. Le-Ngoc, S. A. Vorobyov, and C. Tellambura, "Power allocation in wireless relay networks: A geometric programming-based approach," in IEEE GLOBECOM, 2008, pp. 1–5.
[10] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, "Learning to optimize: Training deep neural networks for interference management," IEEE Transactions on Signal Processing, vol. 66, no. 20, pp. 5438–5453, 2018.
[11] C. Sun and C. Yang, "Learning to optimize with unsupervised learning: Training deep neural networks for URLLC," in IEEE PIMRC, 2019, pp. 1–7.
[12] J. Burgueno, R. Adeogun, R. L. Bruun, C. S. M. García, I. de-la Bandera, and R. Barco, "Distributed deep reinforcement learning resource allocation scheme for Industry 4.0 device-to-device scenarios," in IEEE VTC-Fall, 2021, pp. 1–7.
[13] R. Adeogun, G. Berardinelli, and P. E. Mogensen, "Enhanced interference management for 6G in-X subnetworks," IEEE Access, vol. 10, pp. 45784–45798, 2022.
[14] R. O. Adeogun, G. Berardinelli, and P. E. Mogensen, "Learning to dynamically allocate radio resources in mobile 6G in-X subnetworks," in IEEE PIMRC, 2021.
[15] R. Adeogun and G. Berardinelli, "Multi-agent dynamic resource allocation in 6G in-X subnetworks with limited sensing information," Sensors, vol. 22, no. 13, p. 5062, 2022.
[16] X. Du, T. Wang, Q. Feng, C. Ye, T. Tao, L. Wang, Y. Shi, and M. Chen, "Multi-agent reinforcement learning for dynamic resource management in 6G in-X subnetworks," IEEE Transactions on Wireless Communications, 2022.
[17] S. Lu, J. May, and R. J. Haines, "Effects of correlated shadowing modeling on performance evaluation of wireless sensor networks," in IEEE Vehicular Technology Conference, 2015, pp. 1–5.
[18] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2018.
[19] H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," CoRR, vol. abs/1509.06461, 2015. [Online]. Available: http://arxiv.org/abs/1509.06461
[20] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning," 2013.
[21] H.-M. Gutmann, "A radial basis function method for global optimization," Journal of Global Optimization, vol. 19, no. 3, pp. 201–227, 2001.