
Applied Soft Computing Journal 159 (2024) 111672


End-to-end learning of adaptive coded modulation schemes for resilient wireless communications

Christopher P. Davey a,∗, Ismail Shakeel a,b, Ravinesh C. Deo a,∗, Ekta Sharma a, Sancho Salcedo-Sanz c, Jeffrey Soar d
a School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, 4300, Queensland, Australia
b Spectrum Warfare Branch, Information Sciences Division, Defence Science and Technology Group (DSTG), Edinburgh, 5111, SA, Australia
c Department of Signal Processing and Communications, Universidad de Alcalá, Alcalá de Henares, 28805, Spain
d School of Business, University of Southern Queensland, Springfield, 4300, Queensland, Australia


ARTICLE INFO

Keywords:
Wireless communications
Adaptation
Coding design
Deep learning
Multi-task learning

ABSTRACT

Adaptive modulation and coding schemes play a crucial role in ensuring robust data transfer in wireless communications, especially when faced with changes or interference in the transmission channel. These schemes involve the use of variable coding rates, which can normally be achieved through code puncturing or shortening, and have been adopted in 4G and 5G communication standards. In recent works, auto-encoders for wireless communications have demonstrated the ability to learn short code representations that achieve gains over conventional codes. Such a methodology is attractive as it can learn optimal representations under a variety of channel conditions. However, due to its structure, the auto-encoder does not currently support multiple code rates with a single model. This article draws upon the discipline of multi-task learning as it applies to deep learning, and devises a branching architecture for the auto-encoder together with a custom training algorithm for training transmitter and receiver for adaptive modulation and coding. In this article we aim to demonstrate improvements in Block Error Rate over conventional methods in the Additive White Gaussian Noise channel, and to analyse the performance of the model under Rayleigh fading channels without retraining the auto-encoder on the new channel. This article demonstrates a novel approach towards training auto-encoder models to jointly learn adaptive modulation and coding schemes framed as a multi-task learning problem. The research outcomes extend end-to-end learning approaches to the design of adaptive wireless communications systems.

∗ Corresponding authors.
E-mail addresses: [email protected] (C.P. Davey), [email protected] (R.C. Deo).

https://doi.org/10.1016/j.asoc.2024.111672
Received 4 April 2023; Received in revised form 14 November 2023; Accepted 15 April 2024; Available online 26 April 2024
1568-4946/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Adaptive modulation and coding (AMC) methods can enable wireless communication systems to optimise the transmission of data over a channel with varying operating conditions, message sizes, and data transfer rates. This involves adjusting the modulation scheme and error correction coding rate in real-time based on the channel conditions. AMC algorithms are designed to monitor the channel conditions in real-time and dynamically select a suitable modulation and coding scheme that provides the best trade-off between data rate and error protection for the given channel conditions. The selection of the best modulation and coding scheme is normally made by a look-up table or an algorithm that maps the channel conditions to a particular modulation and coding scheme with respect to transmit power, expected channel use and error rate [1]. The modulator/demodulator and encoder/decoder algorithms for each modulation and code are normally designed and implemented separately on the communication platform [2]. This paper focuses on using machine learning techniques to generate multiple coded modulation schemes from a unified model architecture for channel adaptation.

In communications systems, the primary goal is to transmit a message through a communication channel to a receiver and then reconstruct the original message without error at the receiver. Distortions that are introduced by the channel are the primary obstacle to an error-free recovery of the transmitted message.

Fig. 1 illustrates a simple wireless communications system, which is the focus of our article. In such a system, a message M, defined in bits, is formatted and communicated by a transmitter over the air (the channel) to be recovered and interpreted at the receiver. In order to be transmitted, messages may be coded for error correction and modulated for transmission over the channel. The modulation process converts a bit sequence into a waveform where each discrete point (symbol) in the waveform is represented as a series of complex symbols x(t) ∈ ℂ, having both in-phase and quadrature (IQ) components, where t indicates the discrete time step for the symbol. We use the term "symbol" after modulation because a single symbol can refer to more than 1 bit of the original message. The number of bits mapped to a symbol is referred to as the order of the modulation. However, modulation alone is not necessarily sufficient to allow the receiver to recover the message without error.

The subject of this research is coding, which is an essential component of any communication system that enables error detection and correction at the receiver end. A variety of techniques are available to code a particular message, and these include linear block codes and convolutional codes. The resulting code words are longer than the original message block, and the ratio between the original message of length K and the resulting N-bit code word is known as the code rate K/N. Smaller code rates require more symbols to be transmitted across the channel. The ability to perform error correction using a coded message can achieve significant improvements over uncoded messages (a coding gain), but this comes with a trade-off in terms of the number of channel uses required to transmit the message, or in other words, the amount of power required to send the message. However, the ability to receive a message without error reduces the need for re-transmissions.

As illustrated in Fig. 1, the coding and modulation are an integral part of the system. Therefore in this article we focus on the learning of the coding and modulation process, and assume perfect synchronisation of the transmitted signal with the receiver. The coding and modulation stages in the wireless communications system are typically designed in isolation of each other, and often they do not account for the distortions introduced by different types of channels. This method of system design is referred to as the block design, where components are individually optimised and do not consider interactions between components or the channel distortion [3].

The ability to automatically learn each of the stages within a wireless communications system presents an advantage over the block design approach, since each of the stages can be optimised jointly with respect to a given channel condition and hardware imperfection. As such, learning how to transmit and receive coded information over the wireless channel has recently attracted significant attention in the field of wireless communications. Deep learning (DL) methods, and in particular the AE architecture, have been demonstrated to jointly optimise both transmitter and receiver with respect to an assumed channel model [2]. Such a joint approach learns coding and modulation in an end-to-end manner by gradual optimisation of the model parameters to minimise the error produced in symbol-wise classification of individual messages [2].

In end-to-end learning, a transmitter acts as an encoder network to encode a message, while the receiver acts as the decoder network to retrieve the original message [2]. The channel is represented either as an instantaneous function which adds perturbations to the output of the transmitter [2], may be learnt through adversarial techniques [4], or reinforcement learning (RL) is applied without assuming a channel [5]. The design of the AE for wireless communications in [2] does not support learning more than one code rate; in that approach, support for multiple code rates requires separate networks. Therefore this research article investigates how to parameterise and alter the structure of a single AE for wireless communications to enable support for multiple code rates.

Multi-task learning (MTL) in DL research is concerned with the challenge of training a single network architecture to concurrently perform different but related tasks, which may also have a dependency relation between them [6]. Approaches to MTL consider the architecture design, model regularisation and training methods [6]. There is a relationship between negative transfer in MTL and catastrophic forgetting, which occurs in sequential learning [7,8]. Negative transfer occurs when certain tasks negatively impact the ability to learn other tasks when learnt concurrently [6]. Hence regularisation techniques acting to minimise negative transfer during training are a key concern of MTL. This is realised through the design of the network architecture (hard sharing) as well as through weight regularisation and loss functions (soft sharing) [6]. In this article we approach AMC from the perspective of MTL and apply both hard sharing in the AE network design and soft sharing in the approach to regularisation during the training procedure. We demonstrate that a multi-branching variant of the AE is better suited to learning AMC than a single-path AE network. The model is trained by iterating between end-to-end and receiver-only training, and we apply a weight averaging regularisation technique [9] to improve the error rates for each of the resulting code rates.

The main contributions of this paper are:

• To change the structure of the AE for wireless communications to enable learning multiple code rates with a single neural network architecture. A shared path with branching output heads is activated based on a selected code rate parameter to support end-to-end learning for AMC.
• To frame the end-to-end learning of AMC for wireless communications as a multi-task learning problem.
• To propose a training procedure which iterates between end-to-end training and receiver training, producing lower error rates than single-step training.


Fig. 1. A simplified schematic of a typical wireless communication system. In the transmitter, a binary message of 𝐾 information bits is coded as 𝑁 bits for error correction
during the Encode stage. The Modulation stage converts the bits into discrete complex symbols using amplitude, phase or frequency to differentiate bits. On the receiver side, the
demodulator converts the received modulation symbols to the original code word of length 𝑁 and the decoding block converts from the code back to the original 𝐾 information
bits of the message.

• To show that the proposed method achieves results closely matching and improving upon the performance of maximum likelihood decoding (MLD) with conventional codes over several code rates and channels.

The remainder of this article is structured in the following manner: Section 2 provides an overview of the research into end-to-end learning for wireless communications, as well as describing multi-task learning and adaptive modulation and coding. Section 3 describes the multi-rate AE, the method of regularisation and its custom training procedure. Section 4 reports on the BER and BLER of the model under several channel environments. Section 5 describes the limitations of the model and discusses the generalisation capability. The article concludes in Section 6, where a summary of findings is given and further directions for research are proposed.

2. Background and related work

The use of AE neural networks for learning an end-to-end wireless communications system was first proposed in [2]. An AE was demonstrated to learn optimal short codes for the AWGN channel. The AE model architecture as described by the article is illustrated in Fig. 2. The model consists of symbol-wise encoding for 2^K possible messages, a transmitter containing multiple dense blocks (layers) and an energy normalisation constraint, transmitter output IQ symbols for a code size N, an assumed channel function, and a receiver (also containing multiple dense blocks) whose task is to predict the received message index [2]. In a typical AE the middle layers are applied to find a compressed set of features for the input, whereas in the wireless communications design the middle layers typically represent the output of the transmitter, which are influenced by the distortion provided by the channel function. By applying DL to the design of wireless communications systems the article introduced the potential use of generalised hardware platforms such as graphical processing units (GPUs), and enabled the opportunity for optimisation against complex channels without requiring an analytic mathematical model for the channel [2]. The article demonstrated that an AE with a code rate of K/N = 4/7 could achieve equal performance to MLD decoding for the Hamming(7,4) code, and that an AE trained on an interference channel could achieve a lower error rate than a quadrature amplitude modulation (QAM) time-sharing modulation scheme for equivalent code sizes [2]. However, the design of the classifier architecture is limited to a fixed number of message bits K and a fixed code size N, and is constrained to the domain of short-length burst transmissions. Such codes are beneficial for use in energy-constrained communications such as the internet of things (IoT). The AE model as presented in [2] serves as the canonical model for DL end-to-end communications systems under simplified constraints. Related work stemming from this initial research extends the AE architecture and examines applications to end-to-end learning, over-the-air learning, and the use of custom training algorithms.

A key assumption of the AE model as described in [2] is that a differentiable channel function for a given channel is predefined to permit back-propagation between the transmitter and receiver. An alternate approach is to train a generative adversarial network (GAN) using observed perturbations from the true channel, thereby removing the assumption of a predetermined channel. The approach described in [4] applies this technique to approximate an unknown channel function and provides a fully differentiable channel model for training of the AE. The architecture is trained and evaluated on several channels including the AWGN, Rayleigh fading and frequency-selective multipath channels. Under the AWGN and Rayleigh fading channels, the AE-GAN model is compared with the end-to-end AE from [2], as well as conventional coding methods, demonstrating the effectiveness of the GAN in approximating the channel during training [4]. Each component (transmitter, channel GAN and receiver) is trained in succession using an iterative algorithm, where components not participating in each training cycle had their weights frozen [4]. The architecture assumed a constant size of K message bits for the AE input and output, and while it was able to approximate bit-wise output leveraging Convolutional Neural Network (CNN) layers to support differing message lengths, it did not address the effect of altering the code size N and was not designed to produce multiple code rates without retraining.


Fig. 2. The neural network architecture of the end-to-end AE for wireless communications described in [2]. The transmitter and receiver each consist of multiple dense layers or blocks, and the transmitter includes an energy normalisation layer. The model is defined for a predefined number of message bits K, a single code size N, and an assumed channel function h(z). The transmitter learns to encode a message M, one of 2^K possible messages, into IQ symbols which are sent over the assumed channel function, and the receiver learns to estimate the probability of the message M given the received output of the channel r.
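To make the canonical architecture concrete, the sketch below reconstructs such a single-rate AE in PyTorch. It is an illustration under our own assumptions — the layer sizes, the Eb/N0-to-noise-variance conversion and the training call are not taken from [2] — rather than the original implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleRateAE(nn.Module):
    """Sketch of a canonical end-to-end AE for K message bits and one code size N."""

    def __init__(self, k: int = 4, n: int = 7):
        super().__init__()
        self.k, self.n = k, n
        m = 2 ** k  # number of distinct messages
        # Transmitter: dense blocks mapping a one-hot message to 2N real IQ values.
        self.transmitter = nn.Sequential(nn.Linear(m, m), nn.ReLU(), nn.Linear(m, 2 * n))
        # Receiver: dense blocks predicting the message index from the channel output.
        self.receiver = nn.Sequential(nn.Linear(2 * n, m), nn.ReLU(), nn.Linear(m, m))

    def forward(self, msg_index: torch.Tensor, ebn0_db: float = 6.0) -> torch.Tensor:
        one_hot = F.one_hot(msg_index, num_classes=2 ** self.k).float()
        x = self.transmitter(one_hot)
        # Energy normalisation: unit average energy per complex symbol.
        x = x * torch.rsqrt((x ** 2).sum(dim=1, keepdim=True) / self.n)
        # Assumed AWGN channel layer; sigma^2 follows from Eb/N0 and the rate K/N.
        ebn0 = 10.0 ** (ebn0_db / 10.0)
        noise = torch.randn_like(x) * (1.0 / (2.0 * (self.k / self.n) * ebn0)) ** 0.5
        return self.receiver(x + noise)  # logits over the 2^K candidate messages

model = SingleRateAE()
msgs = torch.randint(0, 2 ** 4, (32,))        # a mini-batch of message indices
loss = F.cross_entropy(model(msgs), msgs)     # symbol-wise classification loss
loss.backward()                               # end-to-end back-propagation
```

Note that the whole chain, including the channel layer, is differentiable, which is exactly the assumption discussed next.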

Another method for addressing the channel assumption was proposed by leveraging an iterative training algorithm based on RL in [5,10]. The training algorithm first trains the receiver by back-propagation, and in the second step applies the receiver loss to an approximation of the gradient to perform back-propagation at the transmitter [5]. The advantage of this method is that it is agnostic to the true channel, although it does require reliable feedback of losses from the receiver, and it applied an equalisation method from [2] in order to train against the Rayleigh Block Fading (RBF) channel [5]. This latter point indicates the dependency of DL methods on the perturbations that are applied to their training data; in wireless communications systems, these perturbations result from the channel environment. In [10] the RBF channel is evaluated with and without equalisation, indicating a slight difference in performance between the approaches. Changes made to the channel, outside of that applied during training, will have a negative impact on the performance of the model, hence further investigation of approaches to regularisation for end-to-end training is required for adaptability to changing channel conditions. While the research focus for AE in wireless communications has addressed the assumed channel function, we propose that adaptability can be achieved by also considering an AMC scheme.

One approach to address adaptation, post training, is to deploy and retrain the receiver independently of the transmitter on the true channel, also referred to as tuning or transfer learning. This technique is described in [3], where an iterative approach is applied to train the AE end-to-end and secondly to fine-tune the receiver. The method is demonstrated on a software defined radio (SDR) implementation during an over the air (OTA) training phase [3]. OTA training allows a more realistic channel environment as opposed to end-to-end training, and requires additional stages to support synchronisation such as filtering, timing, phase, and carrier frequency offset corrections [3]. Two separate sub-networks were incorporated prior to the receiver/decoder model to correct for timing and phase offsets as well as to learn new features to assist in decoding [3]. While the trained system did not perform as well as the comparative communications system, the work demonstrates that receiver tuning OTA has the potential to improve the adaptation of the overall system post end-to-end training, and emphasises the mismatch between the analytic and actual channel models. This is further evidence of the sensitivity of DL models to perturbations during training. While we do not investigate an OTA implementation, we do investigate the performance of an end-to-end AE on several channels without retraining or tuning, and propose that an architecture capable of generating AMC schemes is advantageous in environments for which it was not initially trained.

Extensions of the AE architecture have been made to incorporate concatenated coding techniques for reliable communications in [11], where the AE learns the inner code and the outer code is implemented with a low density parity check (LDPC) code. Such an outer code is capable of variable code rates, independent of the AE model. In this method, the AE performs bit-wise encoding and decoding as opposed to the classification of a symbol-wise output in [2]. The role of the receiver is to estimate the log-likelihood ratio for each bit, and it is trained in a manner equivalent to maximising the achievable rate of the communications system [11]. Both the encoder and decoder are parameterised by the signal to noise ratio (SNR), and it is shown that learnt constellations are correlated with the given SNR [11]. The association of constellation and SNR enables a form of adaptive coding which does not vary the code length, but may rearrange the constellation points instead. In our approach we instead modify the AE model architecture to allow parameterisation for a code index, which permits the model to learn a mapping between the code index parameter and a variable code length, thereby achieving AMC by varying code rates.

Given an assumed channel, and a measure of the communication error rate, it is possible to iteratively search for an optimal code rate. A technique for this type of search is presented in [12]. The main contribution of the article is first to address the issue of overfitting in the end-to-end AE and propose an additional regularisation term that maximises the mutual information between the transmitter symbols and the output of an assumed channel function [12]. This regularisation term is applied to the loss term of the AE and is approximated by training a separate neural network. The search algorithm, described as a capacity-driven AE, iterates over multiple SNR and trains AEs at incremental code rates K/N until the improvement in the mutual information over the previous AE falls below a given threshold [12]. However, long training durations, and the limitations around sampling for large message bit lengths, are a disadvantage for the AE. An exhaustive search is feasible for short messages, but less feasible as K increases. The ability to design a single network architecture that can support multiple code


rates could reduce the overall duration of such a search algorithm. To make these changes to the AE architecture, we propose framing the task of training multiple code rates as a MTL problem.

MTL seeks to regularise a network to perform several related but distinct tasks through the network architecture (hard sharing) or through regularisation methods constraining weights in matching layers (soft sharing) [6]. The simplest hard sharing approach uses a common single path with multiple outputs, demonstrating that the relatedness between tasks benefits network regularisation through the transfer of inductive bias between those tasks [13]. This type of architecture also has the advantage of limiting the number of parameters required by multiple networks, since a common path is shared between separate branches rather than requiring an individual network for each task [13]. Training of such architectures is performed while learning multiple tasks simultaneously, whereas in our approach we train successively for single tasks (code rates) using a common architecture. However, successive training on different tasks is well known to suffer from catastrophic forgetting [7], also termed negative transfer in the MTL literature [6]. The challenge of MTL for sequential learning is approached in [14], which proposes a dynamic architecture comprising shared and task-specific paths trained in a sequential manner. This approach is demonstrated to reduce the negative transfer between tasks on the shared path [14]. During training and inference the structure of the network is altered dynamically, enabling the execution of one task at a time [14]; in this manner the parameterisation of each task is implicit to the current organisation of the network. In our approach we define tasks as code rates and dynamically reconfigure the network structure during training and inference while supplying a code index parameter to indicate which code rate the transmitter should output. To regularise the shared path between tasks we make use of a simple weight averaging regularisation [9].

Adaptation for both modulation and coding has been demonstrated to achieve more reliable communications under varying levels of interference when compared to adaptation for modulation only [15]. AMC-enabled systems have also been shown to produce higher data transfer rates over various communication environments [16,17]. AMC is implemented by selecting a combination of modulation scheme and error-correcting code to achieve a target BER under a given SNR partition [16,18]. Different code sizes may be constructed from the same family of codes so that the minimum distance of the code remains constant over varying SNR and channel fading conditions [19]. Such codes can be formed by shortening, that is, reducing both information and code word bits, or by puncturing, removing some of the parity check bits from each code word [20]. Cyclic codes are well suited to shortening and puncturing since the original decoding procedure can be applied to the resulting code [20]. This category of codes includes the Hamming code [21], the Bose–Chaudhuri–Hocquenghem (BCH) code [21,22] and the quadratic residue code (QRC) [21]. Rather than shortening a family of codes, we augment the end-to-end AE for wireless communications to jointly learn multiple code rates. The advantage of jointly learning modulation and coding is that AMC schemes could be tuned specifically for target channel conditions.

3. Methodology

In our work we consider the AE architecture in [2] as the canonical DL architecture for jointly learning modulation and coding in a wireless communications system. However, the structure of the canonical AE is limited to a single message bit size K and a single code size N. In this article we make several alterations to the original AE architecture to support end-to-end learning for AMC with multiple code sizes N. We modify the network architecture so that it is able to learn several predefined code sizes by adding branching outputs at the transmitter and receiver. We also add a parameter to select the code size index during training and inference in the transmitter. In the main path of the network we include skip connections [23] between blocks of dense units to enable a slightly larger network to aid in learning multiple code sizes.

This section includes the details of the changes to the AE architecture (Section 3.1), the approach to training the transmitter and receiver models (Section 3.2), and the selected channel functions that are applied during training and evaluation (Section 3.3).

3.1. Model architecture

The AE described in [2] consists of a single path through the network and a channel function implemented as an AWGN layer (Fig. 2). Estimation is performed as a classification for the corresponding one-hot encoded input message M. One-hot encoding represents each binary message M by defining an input vector of the same length as the number of unique binary messages (2^K messages). Each unique message is represented as a 1 at its corresponding index in the input vector (a value from 0 to 2^K − 1), with all other positions of the vector set to 0. Symbol-wise classification is performed at the receiver by learning to estimate the probability of a given message at the corresponding vector index, p(M|r(t)), given a set of channel symbols (in-phase and quadrature values) r(t). The index with the highest probability is mapped to the estimated binary message M̂ = arg max p(M|r(t)).

The branching structure of our proposed model is inspired by hard sharing in MTL. In the simplest form of hard sharing, a common network path is followed by a set of branches that correspond to each distinct task [6]. The common path learns shared features for all tasks. In our architecture a task, corresponding to a network branch, represents a different code rate K/N due to the change in the code size N. The common path contains two dense blocks composed of a feed-forward layer, batch normalisation [24] and either rectified linear unit (ReLU) [25] or Swish [26] activation functions. The two dense blocks feature residual connections to form skip blocks, which assist in preventing extremes in the gradient during back-propagation of deeper networks [23]. Although these networks are relatively shallow, we have found that the skip connections do improve the performance of the network.

In the transmitter, the common path accepts a concatenation between the one-hot encoded message and an embedding representing the code index. The code size index is provided to a discrete gate that determines which branch of the network architecture should receive features from the common path during the forward pass. Each branch contains a feed-forward layer followed by a linear activation. The output of the transmitter consists of a feed-forward layer followed by a tanh activation and an energy normalisation layer that is applied prior to the channel function. An overview of the transmitter model is shown in Fig. 3. The architecture is parameterised by a set of code rates i ∈ {i_1, …, i_n} that are used by the branch node to select which of the output branches are active during training and inference. The branch node is indicated by the Discrete Gate in the transmitter (Fig. 3) and the receiver (Fig. 4).

The architecture of the receiver is illustrated in Fig. 4 and follows a similar pattern to that of the transmitter. Instead of receiving an additional code size parameter during inference, the length of the received channel symbols is stored at the input to the network. Zero padding is applied to the received symbols up to the maximum allowed code size. The padding layer is followed by a series of skip blocks having the same structure as those in the transmitter, prior to the discrete gate. The Discrete Gate receives the stored length parameter and uses this to determine which output branch (Dense Layer i) to activate on the forward pass. Each output branch consists of a feed-forward layer and a soft-max activation layer.

The number of units in the input layers of the transmitter relates to the 2^K possible messages, while for the receiver the input units depend on the code size that is selected in the transmitter. To determine the number of units for the shared path of each network, a stepwise approach was applied.
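The branching transmitter described above can be sketched as follows. This is a simplified PyTorch illustration under stated assumptions: the two skip blocks are collapsed into a single dense block, and the layer widths are placeholders for the values reported in Table 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchingTransmitter(nn.Module):
    """Shared path + discrete gate: one code-size branch is active per forward pass."""

    def __init__(self, k: int, code_sizes: list, hidden: int = 32):
        super().__init__()
        self.k, self.code_sizes = k, code_sizes
        self.embed = nn.Embedding(len(code_sizes), 2 * k)   # code index embedding (2K units)
        self.shared = nn.Sequential(                        # stand-in for the skip blocks
            nn.Linear(2 ** k + 2 * k, hidden), nn.BatchNorm1d(hidden), nn.ReLU())
        # One branch per code size N, each emitting 2N real IQ values.
        self.branches = nn.ModuleList([nn.Linear(hidden, 2 * n) for n in code_sizes])

    def forward(self, msg_index: torch.Tensor, code_index: int) -> torch.Tensor:
        one_hot = F.one_hot(msg_index, num_classes=2 ** self.k).float()
        emb = self.embed(torch.full_like(msg_index, code_index))
        h = self.shared(torch.cat([one_hot, emb], dim=1))
        # Discrete gate: only the selected branch runs, so only it receives gradients.
        x = torch.tanh(self.branches[code_index](h))
        n = self.code_sizes[code_index]
        return x * torch.rsqrt((x ** 2).sum(dim=1, keepdim=True) / n)  # energy normalisation

tx = BranchingTransmitter(k=4, code_sizes=[4, 8, 16, 20])
symbols = tx(torch.randint(0, 16, (32,)), code_index=1)   # selects the N = 8 branch
```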


Fig. 3. The transmitter contains a common path followed by a discrete gate which switches between the set of selected code sizes prior to merging and energy normalisation. The transmitter sizes for N illustrate how each branch represents a different code rate, shown here for the 4 bit model configuration.

Fig. 4. The receiver architecture follows a similar approach to the transmitter where the shared path is followed by a discrete gate and a separate classification layer corresponding
to each code size.

Starting from the value of K, the number of units was gradually increased until an acceptable performance was achieved during training. The list of layers and the number of units in each layer is shown in Table 1 for the transmitter network and in Table 2 for the receiver network.

Each branch in the transmitter and receiver represents a code of size N. The code size parameter was supplied to the network as a choice from a set of code sizes i ∈ {i_1, …, i_n}. Several variants of the network architecture were trained to evaluate the effect of additional code sizes (or an increased number of tasks) on the overall performance of the model. The branching transmitter and receiver networks were trained for three message sizes: K = 4 bits, K = 7 bits and K = 8 bits. For each case, the network was trained to map K information bits to multiple code sizes of N channel symbols for transmission. For the K = 4 bit message size, the code sizes included N = 4, 8, 16 and 20; similarly, for the K = 7 bit message size, code sizes N = 11, 15 and 34 were used; and finally, for the K = 8 bit message configuration, code sizes N = 6, 8, 17, 32 and 40 were selected. The configurations for each model, message and code size are listed in Table 3. The table also lists the total number of trainable parameters in each neural network.
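Under the same assumptions as the transmitter sketch above, the receiver-side counterpart differs in two details: zero padding up to the maximum code size, and branch selection inferred from the received length rather than an explicit index.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchingReceiver(nn.Module):
    """Shared path + discrete gate driven by the received symbol length."""

    def __init__(self, k: int, code_sizes: list, hidden: int = 32):
        super().__init__()
        self.code_sizes = code_sizes
        self.max_len = 2 * max(code_sizes)                  # maximum allowed code size
        self.shared = nn.Sequential(                        # stand-in for the skip blocks
            nn.Linear(self.max_len, hidden), nn.BatchNorm1d(hidden), nn.ReLU())
        # One classification head (2^K logits) per supported code size.
        self.heads = nn.ModuleList([nn.Linear(hidden, 2 ** k) for _ in code_sizes])

    def forward(self, r: torch.Tensor) -> torch.Tensor:
        # The stored length 2N identifies the code size, and hence the branch.
        code_index = self.code_sizes.index(r.shape[1] // 2)
        r = F.pad(r, (0, self.max_len - r.shape[1]))        # zero padding to max size
        return self.heads[code_index](self.shared(r))       # soft-max applied in the loss

rx = BranchingReceiver(k=4, code_sizes=[4, 8, 16, 20])
logits = rx(torch.randn(32, 16))                            # 2N = 16, i.e. the N = 8 branch
```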


Table 1
The number of units in each layer of the transmitter model. The list of choices for code size indices i is provided as a parameter to the architecture. During training and inference the one-hot encoded message and the selected code size index i are provided as input to the network.

| Layer | Units K = 4 | Units K = 7 | Units K = 8 | Group |
| --- | --- | --- | --- | --- |
| Input 1: M = 2^K units | 16 | 128 | 256 | Input layers |
| Input 2: code size index i | 1 | 1 | 1 | Input layers |
| Code index embedding: 2K units | 8 | 14 | 16 | Input layers |
| Concatenate: 2^K + 2K units | 24 | 142 | 272 | Input layers |
| Dense layer | 32 | 512 | 512 | Skip block 1 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 1 |
| Dense layer | 64 | 64 | 64 | Skip block 1 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 1 |
| Dense layer | 32 | 512 | 512 | Skip block 1 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 1 |
| Dense layer | 32 | 512 | 512 | Skip block 2 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 2 |
| Dense layer | 64 | 64 | 64 | Skip block 2 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 2 |
| Dense layer | 32 | 512 | 512 | Skip block 2 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 2 |
| Gate layer | – | – | – | – |
| Dense layer | i = 4, units = 8 | i = 11, units = 22 | i = 6, units = 12 | Code size i ∈ {i_1, …, i_n} branches |
| Dense layer | i = 8, units = 16 | i = 15, units = 30 | i = 8, units = 16 | Code size branches |
| Dense layer | i = 16, units = 32 | i = 34, units = 68 | i = 17, units = 34 | Code size branches |
| Dense layer | i = 20, units = 40 | – | i = 32, units = 64 | Code size branches |
| Dense layer | – | – | i = 40, units = 80 | Code size branches |
| Linear activation | – | – | – | Code size branches |
| Reshape layer | [2 × i] | [2 × i] | [2 × i] | Transmitter output for code size i |
| Dense layer | [2 × i] | [2 × i] | [2 × i] | Transmitter output for code size i |
| Tanh activation | – | – | – | Transmitter output for code size i |
| Energy normalisation | – | – | – | Transmitter output for code size i |

We compare the different model configurations against several conventional codes. The K = 4 bit model is compared with the MLD performance of a system that uses uncoded binary phase shift keying (BPSK) modulation with the extended Hamming(8,4) code. The 7 bit model is compared with three BCH coded systems, two of which use shortened codes s-BCH(11,7) and s-BCH(34,7) derived from mother codes BCH(15,11) and BCH(63,36) respectively, with the additional code being BCH(15,7). The K = 8 bit model is compared with uncoded BPSK and a QRC(17,8) code.

In both architectures, the gate function at layer l, f_g^(l), is parameterised by the code rate index i and the input to the current layer, h^(l) = f^(l−1)(h^(l−1)), and selects from a set of branches comprising the next layer, f_i^(l+1) ∈ {f_0^(l+1), f_1^(l+1), …, f_n^(l+1)} (Eq. (1)). In the transmitter, the code rate index is supplied as an explicit parameter to the network, whereas in the receiver the code rate index is determined based on the number of symbols received from the channel. During the forward pass only one path through the branch is active (at the branch layer, the active branch will be f_i^(l+1)). During back-propagation, no gradients exist for the inactive branches, hence only the active branch receives the gradient update. Each of the respective shared paths in the transmitter and receiver participates in back-propagation.

f_g^(l)(i, h^(l)) = f_i^(l+1)(h^(l)), where f_i^(l+1) ∈ {f_0^(l+1), f_1^(l+1), …, f_n^(l+1)}    (1)

3.2. Training algorithm

It is important to consider a suitable regularisation approach to prevent negative impact on overall network performance between tasks. This is achieved with two approaches: randomised sampling for code size during training, and weight regularisation. Training consists of mini-batches (32 messages per batch), and code sizes are selected from a random uniform distribution each mini-batch. The update of each model is performed with back-propagation each mini-batch, and the gradient is calculated for the selected code size. Over the course of learning, the weights for each layer in the network are stored and are averaged across mini-batches every ten iterations. This latter approach to regularisation is based on the stochastic weight averaging (SWA) performed in [9], and in the results we have observed that training using SWA produces better performance than networks trained without SWA. SWA combined with a cyclical learning rate schedule [27] is demonstrated in [9] to improve the generalisation of the network. During training, back-propagation is performed with the Adam optimiser [28] combined with a cyclical learning rate between 0.0001 and 0.001.

In addition to the sampling scheme and SWA regularisation, an alternating training algorithm is applied in four steps described in Algorithm 1 and Fig. 5. These steps consist of: (1) train the end-to-end network, (2) generate mini-batches using the transmitter, (3) train the receiver against a simulated channel and record the loss, and (4) update the end-to-end network. In Step 1, back-propagation is run on the end-to-end model, which contains both the receiver and transmitter models. This updates the weights in both the receiver and transmitter models, and the AWGN channel is simulated directly as part of the end-to-end model architecture. During Step 2, the transmitter is used to generate the transmitter symbols and the channel is simulated independently of both transmitter and receiver models. In Step 3, the receiver is then trained using the channel response as the receiver input, and during back-propagation the receiver loss is calculated against the true messages. This allows the transmitter and receiver to be evaluated independently, and the resulting receiver loss is used to coordinate intermittent checkpointing of both models. Finally, Step 4 updates the end-to-end network after weight averaging before the next training iteration.


Algorithm 1 During training, the end-to-end model is trained iteratively with the receiver model.
Input: epochs ⊳ The number of training iterations.
Input: batchSize ⊳ The size of each training or validation batch.
Input: transmitModel ⊳ The transmitter model.
Input: receiveModel ⊳ The receiver model.
Input: endToEndModel ⊳ The end-to-end model containing transmitter, channel and receiver models.
Input: channel ⊳ The channel simulation function.
Input: snr ⊳ An initial SNR dB for perturbation of training data.
Input: codeSizeList ⊳ The set of allowed code lengths.
Output: endToEndModel ⊳ The end-to-end model updated after training.

𝑙𝑜𝑠𝑠 ← ∞
𝑤𝑒𝑖𝑔ℎ𝑡𝑠𝐿𝑖𝑠𝑡 ← []

for 𝑖 ← 1, 𝑒𝑝𝑜𝑐ℎ𝑠 do
if Train with random SNR then
𝑠𝑛𝑟 ← Random-Uniform(0,9) ⊳ Use randomised SNR to perturb data for training.
end if
codeSize ← Random-Uniform(codeSizeList) ⊳ Randomly select the code size for the training batch.
Train-EndToEnd(𝑏𝑎𝑡𝑐ℎ𝑆𝑖𝑧𝑒, 𝑐𝑜𝑑𝑒𝑆𝑖𝑧𝑒, 𝑠𝑛𝑟, 𝑒𝑛𝑑𝑇 𝑜𝐸𝑛𝑑𝑀𝑜𝑑𝑒𝑙) ⊳ Fig. 5 (Step 1)
𝑚𝑒𝑠𝑠𝑎𝑔𝑒𝑠 ← Transmit-Samples(𝑏𝑎𝑡𝑐ℎ𝑆𝑖𝑧𝑒, 𝑐𝑜𝑑𝑒𝑆𝑖𝑧𝑒, 𝑠𝑛𝑟, 𝑡𝑟𝑎𝑛𝑠𝑚𝑖𝑡𝑀𝑜𝑑𝑒𝑙, 𝑐ℎ𝑎𝑛𝑛𝑒𝑙) ⊳ Fig. 5 (Step 2)
𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑟𝐿𝑜𝑠𝑠 ← Train-Receiver(𝑚𝑒𝑠𝑠𝑎𝑔𝑒𝑠, 𝑐𝑜𝑑𝑒𝑆𝑖𝑧𝑒, 𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑀𝑜𝑑𝑒𝑙) ⊳ Fig. 5 (Step 3)
if 𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑟𝐿𝑜𝑠𝑠 < 𝑙𝑜𝑠𝑠 then
𝑙𝑜𝑠𝑠 ← 𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑟𝐿𝑜𝑠𝑠
Save(𝑒𝑛𝑑𝑇 𝑜𝐸𝑛𝑑𝑀𝑜𝑑𝑒𝑙, 𝑡𝑟𝑎𝑛𝑠𝑚𝑖𝑡𝑀𝑜𝑑𝑒𝑙, 𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑀𝑜𝑑𝑒𝑙) ⊳ Save models each time learning improves at the receiver.
end if
if 𝑖 mod 10 equals 0 then
𝑤 ← Average-Weights(𝑤𝑒𝑖𝑔ℎ𝑡𝑠𝐿𝑖𝑠𝑡)
Set-Weights(𝑒𝑛𝑑𝑇 𝑜𝐸𝑛𝑑𝑀𝑜𝑑𝑒𝑙, 𝑤) ⊳ Apply weight averaging every 10 iterations.
else
𝑤 ← Get-Weights(𝑒𝑛𝑑𝑇 𝑜𝐸𝑛𝑑𝑀𝑜𝑑𝑒𝑙)
Append(𝑤𝑒𝑖𝑔ℎ𝑡𝑠𝐿𝑖𝑠𝑡, 𝑤)
end if
Train-EndToEnd(𝑏𝑎𝑡𝑐ℎ𝑆𝑖𝑧𝑒, 𝑐𝑜𝑑𝑒𝑆𝑖𝑧𝑒, 𝑠𝑛𝑟, 𝑒𝑛𝑑𝑇 𝑜𝐸𝑛𝑑𝑀𝑜𝑑𝑒𝑙) ⊳ Fig. 5 (Step 4)
end for

return 𝑒𝑛𝑑𝑇 𝑜𝐸𝑛𝑑𝑀𝑜𝑑𝑒𝑙
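The following Python sketch condenses the control flow of Algorithm 1. The subroutines Train-EndToEnd, Transmit-Samples, Train-Receiver and Save are represented by hypothetical stand-in functions (train_end_to_end, transmit_samples, train_receiver, save_models) and are not defined here; only the task sampling, checkpointing and every-ten-iterations weight averaging are spelt out.

```python
import copy
import random
import torch

# train_end_to_end, transmit_samples, train_receiver and save_models are
# hypothetical stand-ins for the subroutines named in Algorithm 1 / Fig. 5.

def train(end_to_end, transmitter, receiver, code_size_list,
          epochs: int, batch_size: int = 32, random_snr: bool = True):
    best_loss, weights_list = float("inf"), []
    for i in range(1, epochs + 1):
        snr = random.uniform(0.0, 9.0) if random_snr else 6.0
        code_size = random.choice(code_size_list)             # one task per mini-batch
        train_end_to_end(end_to_end, batch_size, code_size, snr)              # Step 1
        messages = transmit_samples(transmitter, batch_size, code_size, snr)  # Step 2
        receiver_loss = train_receiver(receiver, messages, code_size)         # Step 3
        if receiver_loss < best_loss:                         # checkpoint on improvement
            best_loss = receiver_loss
            save_models(end_to_end, transmitter, receiver)
        if i % 10 == 0:                                       # SWA-style weight averaging
            end_to_end.load_state_dict(average_weights(weights_list))
        else:
            weights_list.append(copy.deepcopy(end_to_end.state_dict()))
        train_end_to_end(end_to_end, batch_size, code_size, snr)              # Step 4
    return end_to_end

def average_weights(state_dicts):
    """Element-wise mean over a list of model state dicts (as in Algorithm 1,
    the list accumulates across iterations rather than being cleared)."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}
```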

Fig. 5. Custom training algorithm consisting of several stages interleaving training of receiver and end-to-end model.


Table 2
The number of units in each layer of the receiver model. The list of choices for code size indices i is provided as a parameter to the architecture, and the branch is selected at runtime based on the length of the received channel symbols.

| Layer | Units K = 4 | Units K = 7 | Units K = 8 | Group |
| --- | --- | --- | --- | --- |
| Input 1: [2 × i] units | [2 × i] | [2 × i] | [2 × i] | Input from channel for code size i |
| Flatten layer | 2i | 2i | 2i | Input from channel for code size i |
| Dense layer | 32 | 512 | 512 | Skip block 1 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 1 |
| Dense layer | 64 | 64 | 64 | Skip block 1 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 1 |
| Dense layer | 32 | 512 | 512 | Skip block 1 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 1 |
| Dense layer | 32 | 512 | 512 | Skip block 2 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 2 |
| Dense layer | 64 | 64 | 64 | Skip block 2 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 2 |
| Dense layer | 32 | 512 | 512 | Skip block 2 |
| Batch normalisation + activation (ReLU or Swish) | – | – | – | Skip block 2 |
| Gate layer | – | – | – | – |
| Dense layer: 2^K units | 16 | 128 | 256 | Output i ∈ {i_1, …, i_n} branches |
| Softmax activation | – | – | – | Output branches |

Table 3
Multiple variations of the model are trained with separate configurations for message bits K and code size N. The total trainable parameters for each neural network count all weights, biases, and batch normalisation parameters. The final column lists the conventional codes included in comparisons of BER and BLER under selected channel conditions.

| Configuration | Model variant | K | N | Transmitter parameters | Receiver parameters | Comparison codes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 4 bit single model | 4 | i ∈ {4} | 3831 | 4880 | Uncoded BPSK |
| 2 | 4 bit multi-rate model | 4 | i ∈ {4, 8, 16, 20} | 14 471 | 13 888 | Uncoded BPSK, extended Hamming(8,4) |
| 3 | 7 bit multi-rate model | 7 | i ∈ {11, 15, 34} | 671 151 | 767 616 | s-BCH(11,7), BCH(15,7), s-BCH(34,7) |
| 4 | 8 bit multi-rate model | 8 | i ∈ {6, 8, 17, 32, 40} | 649 125 | 1 101 696 | Uncoded BPSK, QRC(17,8) |

3.3. Channel functions

The assumed channel function that is applied during training of the proposed model is the AWGN channel. Evaluation of the BER and BLER is made on three channels: the AWGN channel and two variants of Rayleigh fading differing in duration, where the first applies fading to the entire block (Block Fading) and the second varies symbol to symbol (Bit Fading). When evaluation is carried out, the proposed models are not retrained or tuned for the two additional fading channels.

In AWGN (Eq. (2)), additive Gaussian noise n(t) is added to the output of the transmitter z(t), where t is the discrete time step of the transmitter output.

r(t) = z(t) + n(t)    (2)

Eq. (3) shows the Rayleigh fading coefficient a(t), at each discrete time t, applied to the transmitted signal z(t) prior to the addition of noise n(t). The fading coefficient a(t) = (1/√2)|a|e^(jψ) is drawn from a complex standard normal distribution a ∼ N(0, 1), and its magnitude is multiplied with the complex exponential with phase parameter ψ; we assume a constant phase ψ = 0. The duration of the coefficient varies under block or bit fading. In addition, we assume no channel estimation to reverse the effect of fading at the receiver.

r(t) = a(t)z(t) + n(t)    (3)

The additive Gaussian noise n(t) is drawn from the complex normal distribution n(t) ∼ N(0, σ²). The variance σ² is derived from the desired SNR and the final output of the transmitter layer z(t), having t = [1 ⋯ T] discrete time steps. A desired level of noise is first supplied to the channel simulation as the ratio of energy per bit to noise, E_b/N_0 dB. To account for the selected code rate k/n, the E_b/N_0 dB is converted to the ratio of energy per symbol to noise, E_s/N_0 dB = E_b/N_0 dB + 10 log10(k/n). The components E_s and N_0 are then estimated from the transmitter symbols z(t), where E_s = (∑_{t=1}^{T} z(t)²)/T and N_0 = E_s/(E_s/N_0). The parameter E_s/N_0 is also commonly referred to as SNR. The variance is then estimated as σ² = N_0/2 and used to sample from the complex normal distribution.
time 𝑡, applied to the transmitted signal 𝑧(𝑡), prior to addition of ‖𝑥‖22 ≤ 1 implemented in Eq. (4) where 𝑥(𝑡) ∈ C is the sequence of
additive noise 𝑛(𝑡). The fading coefficient 𝑎(𝑡) = √1 |𝑎|𝑒𝑗𝜓 , is drawn from complex symbols output by the tanh activation layer and 𝑇 the number
2
a complex standard normal distribution 𝑎 ∼  (0, 1), and it’s argument of time steps in the sequence. During training it is possible to vary the
multiplied with the exponential waveform with phase parameter 𝜓, we SNR dB randomly or to train at a constant SNR dB. A fixed SNR of 6 dB
assume a constant phase 𝜓 = 0. The duration of the coefficient varies performed best for the 4 bit message, however 7 bit and 8 bit messages

Fig. 6. BER and BLER in AWGN of several training methods, training with a standard back-propagation algorithm and SWA weight averaging (MultiTxRx 1-Step SWA), multi-step
training without SWA (MultiTxRx NoSWA) and multi-step training with SWA and fixed SNR (MultiTxRx SWA Fixed SNR). Improvement in performance is indicated with the
addition of SWA as well as when training with the multi-step training procedure as opposed to the standard back-propagation.

4. Results

In this section we report the empirical evaluation of the proposed model at different code rates, listed in Table 3, for the AWGN and the two fading channels. In each case we refer to the proposed model as the MultiTxRx model to indicate a multi-branching transmitter and receiver. The first evaluation investigates the performance of the proposed training algorithm. The second set of evaluations reports on the performance of different variations of the architecture. These two sets of evaluations are used to examine the design choices for the model structure and training approach. After this, we compare the set of code size configurations from Table 3 with the conventional codes in the AWGN channel, and evaluate performance without retraining or tuning in the fading channels.

The performance of several training algorithms is evaluated in Fig. 6, with message size K = 4 bits and code size N = 4 (from configuration 1, Table 3). Uncoded BPSK performance is included for reference. The MultiTxRx 1-Step SWA model was trained using standard back-propagation and included weight averaging. MultiTxRx NoSWA (no weight averaging) and MultiTxRx SWA were trained with the multi-step algorithm described in Fig. 5. The multi-step algorithm with weight averaging (MultiTxRx SWA) produces lower BER and lower BLER in comparison to standard back-propagation (MultiTxRx 1-Step SWA). Training without weight averaging produces higher BER and BLER.
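For clarity on how the two metrics differ under symbol-wise classification, the hypothetical helper below computes both: a block error is any misclassified message index, while bit errors are counted after unpacking the true and predicted indices into their K bits.

```python
import torch

def ber_bler(pred_logits: torch.Tensor, msg_index: torch.Tensor, k: int):
    """Return (BER, BLER) for symbol-wise message classification."""
    pred = pred_logits.argmax(dim=1)
    bler = (pred != msg_index).float().mean().item()   # any block mismatch
    # Unpack message indices into K bits and count bit disagreements.
    shifts = torch.arange(k - 1, -1, -1)
    bits_true = (msg_index.unsqueeze(1) >> shifts) & 1
    bits_pred = (pred.unsqueeze(1) >> shifts) & 1
    ber = (bits_true != bits_pred).float().mean().item()
    return ber, bler
```

Because a single wrong classification can flip several bits at once, BER and BLER can diverge noticeably for this receiver, a point that recurs in the fading-channel results below.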
Next, we compare the changes made to the AE architecture in Fig. 7 by training on multiple code sizes from configuration 2 in Table 3. The different types of architecture shown in Fig. 7 include a Single Path AE, Single Tx MultiRx, MultiTx SingleRx, MultiTxRx and MultiTxRx Residual. The Single Path model consists of a single common path in the network. The Single Tx MultiRx model applies a single path with a pooling layer to realise multiple codes in the transmitter, and classifies multiple codes using branching in the receiver. The structure is reversed in the MultiTx SingleRx. When trained with multiple code sizes, MultiTxRx is similar to the proposed architecture but does not feature skip connections, and the MultiTxRx Residual is the proposed architecture, including skip connections. The MultiTxRx Residual model performs better than the other models and is close to the extended Hamming(8,4) BER. Both versions of the MultiTxRx model exhibit similar BLER.

All code rates from configuration 2 in Table 3 are compared in the AWGN channel in Figs. 8 and 9, and in the Block Fading and Bit Fading channels in Figs. 10 and 11 respectively. In the AWGN channel, it is difficult to see the difference in performance between code rates in relation to E_b/N_0 (except for K = 4, N = 4). In contrast, Fig. 9 displays the BER and BLER related to the energy per symbol (E_s/N_0) SNR dB. This example demonstrates that the smaller code rates can achieve lower BER and BLER as the channel noise increases, at the cost of increased channel usage due to the increase in code size N. The aim of an AMC scheme is to maintain performance by trading off channel use in varying SNR.

In the Block Fading channel (Fig. 10) we observe that the BER is much higher than the uncoded BPSK while the BLER is much lower. The BER is higher because symbol-wise classification does not perform error correction of individual bits: an erroneous code word may contain several incorrect bits in a single forward pass estimation. However, this approach achieves better BLER, as it can accurately classify, or map, the entire code word for a corresponding message. In the Block Fading channel, the entire code word is impacted by the channel fading. Bit-interleaving techniques can be applied in this circumstance, which can produce an effect that is similar to a Bit Fading channel prior to decoding. The difference between the code rates is most noticeable in the Bit Fading channel (Fig. 11), where performance improves as the code rate decreases (at the expense of channel use). In this channel, the K = 4, N = 8 code is slightly better than the baseline extended Hamming(8,4) code, as opposed to the AWGN channel. In the AWGN channel, code rate K = 4, N = 8 is close to the baseline extended Hamming(8,4) code, but differs slightly at higher SNR. The model is not retrained on either of the fading channels and is able to perform close to or better than the baseline.

Fig. 12 displays the performance of the K = 7 bit message and code rates from configuration 3 of Table 3 in the AWGN channel. The figure shows slight gains for BLER over the shortened s-BCH(11,7) and BCH(15,7) codes, and similar performance to the shortened s-BCH(34,7) code. There is less difference in BER performance for these codes. Under the Block Fading channel in Fig. 13, the BER is again higher, but the BLER is lower in comparison to the reference codes. In the Bit Fading channel, shown in Fig. 14, incremental gains are achieved on all code rates in comparison to the BCH and shortened codes.

Configuration 4 from Table 3, for the K = 8 bit message and selected code sizes, is compared with the uncoded BPSK and the QRC code in the AWGN (Fig. 15), Block Fading (Fig. 16) and Bit Fading (Fig. 17) channels.


Fig. 7. The multi-branching Tx Rx architecture is compared with four variants of the architecture, a non-branching single path architecture (Single Path), a single branch transmitter
with multi branch receiver (SingleTx MultiRx), the multi-branch transmitter and single receiver (MultiTx SingleRx) and multiple branching transmitter receiver with and without
residual connections (MultiTxRx Residual and MultiTxRx). The choice of network architecture influences performance for the multi-task estimation of multiple code rates.

Fig. 8. BER and BLER in AWGN for MultiTxRx model with K = 4 bits and N = [4, 8, 16, 20] compared with BPSK uncoded and extended Hamming(8,4) maximum likelihood
decoding (MLD).

Fig. 9. The coding gain for each respective code rate is visible when plotting the BER and BLER in AWGN over the energy per symbol (𝐸𝑠 ∕𝑁0 ) SNR dB. The advantage of learning
multiple codes enables operation under increased noise in the channel.


Fig. 10. BER and BLER in the Block Fading channel for MultiTxRx model with K = 4 bits and N = [4, 8, 16, 20] compared with BPSK uncoded and extended Hamming(8,4)
maximum likelihood decoding (MLD). The proposed model was originally trained on the AWGN channel and is not trained for the Block Fading channel.

Fig. 11. BER and BLER in the bit fading channel for MultiTxRx model with K = 4 bits and N = [4, 8, 16, 20] compared with BPSK uncoded and extended Hamming(8,4) maximum
likelihood decoding (MLD). The proposed model was originally trained on the AWGN channel and is not trained for the Bit Fading channel.

Fig. 12. BER and BLER in the AWGN channel for MultiTxRx model with K = 7 bits and N = [11, 15, 34] compared with shortened BCH codes s-BCH(11,7), s-BCH(34,7) and
BCH code (15,7) maximum likelihood decoding (MLD), and trained with random SNR.


Fig. 13. BER and BLER in the Block Fading channel for MultiTxRx model with K = 7 bits and N = [11, 15, 34] compared with shortened BCH codes s-BCH(11,7), s-BCH(34,7)
and BCH code (15,7) maximum likelihood decoding (MLD). The proposed model was originally trained on the AWGN channel and is not trained for the Block Fading channel.

Fig. 14. BER and BLER in the bit fading channel for MultiTxRx model with K = 7 bits and N = [11, 15, 34] compared with shortened BCH codes s-BCH(11,7), s-BCH(34,7) and
BCH code (15,7) maximum likelihood decoding (MLD). The proposed model was originally trained on the AWGN channel and is not trained for the Bit Fading channel.

Fig. 15. BER and BLER in the AWGN channel for MultiTxRx model with K = 8 bits and N = [6, 8, 17, 32, 40] compared with BPSK uncoded and Quadratic Residue Code (QRC)
K = 8, N = 17 maximum likelihood decoding (MLD), and trained with random SNR. The K = 8, N = 6 code provides a higher rate than uncoded BPSK, at 1.33 bits per channel use.

In AWGN, the BER produced by the model at the lower code rates is similar to the baseline code QRC(17,8), and the BLER is slightly lower. The BER in the Block Fading channel, shown in Fig. 16, is worse than the target baseline QRC code; however, as we have seen in the other configurations, the BLER for the same code size and lower code rates is slightly better than the reference code. In the Bit Fading channel, both BER and BLER achieve equal or better performance than the reference QRC(17,8) code. However, the BER for the higher code rates K = 8, N = 8 and K = 8, N = 6 does not perform as well as the uncoded BPSK at lower SNR, but gains are still achieved for the BLER. This is also apparent in the AWGN channel.


Fig. 16. BER and BLER in the Block Fading channel for MultiTxRx model with K = 8 bits and N = [6, 8, 17, 32, 40] compared with BPSK uncoded and Quadratic Residue Code
(QRC) K = 8, N = 17 maximum likelihood decoding (MLD). The proposed model was originally trained on the AWGN channel and is not trained for the Block Fading channel.

Fig. 17. BER and BLER in the bit fading channel for MultiTxRx model with K = 8 bits and N = [6, 8, 17, 32, 40] compared with BPSK uncoded and Quadratic Residue Code
(QRC) K = 8, N = 17 maximum likelihood decoding (MLD). The proposed model was originally trained on the AWGN channel and is not trained for the Bit Fading channel.

5. Discussion

Comparison of the proposed model with the selected codes s-BCH(11,7), BCH(15,7), s-BCH(34,7) and QRC(17,8) demonstrated lower BLER in each of the channels, notably under the Bit Fading channel without retraining, while the BLER was close to the extended Hamming(8,4) code in each of the channels. In both the AWGN and Block Fading channels the BER was often poorer than the comparison code. As noted, this is due to the classification of an entire code word rather than at the bit level, and for the Block Fading channel this effect of fading can be mitigated through the use of bit interleaving. However, the performance of a code is also dependent on the smallest minimum distance between all code words. Since the transmitter learns continuous codes, instead of binary codes, the minimum Euclidean distance is a more appropriate measure of distance for those codes. Fig. 18 shows the Euclidean distances between each of the learnt code words in the K = 4, N = 8 code. Ideally the transmitter should learn a constellation related to the distance between messages. In some cases, there is a larger distance between message code words with a message Hamming distance of 1 than between those message code words with a larger message Hamming distance. For example, the Euclidean distance between code words for messages 0000 and 0001, a Hamming distance of 1, is larger than the Euclidean distance between code words for messages 0000 and 0111 with a Hamming distance of 3. The confusion matrix for the classifier is shown in Fig. 19: for messages 0000 and 0111 the percentage of incorrect classifications is approximately 3%, slightly higher than the incorrect classification between 0000 and 0001. The minimum Euclidean distance of the code does appear to be related to the performance of the learnt code. For those codes which have a lower BLER than the comparative code, the minimum and mean Euclidean distances are close to or exceed those of the corresponding code. Table 4 lists the minimum, mean and variance of the Euclidean distance d_E calculated for the constellations of the learnt and comparison codes.
larger message Hamming distance. For example, the Euclidean distance the conventional codes under each of the channels. However it is not
between code words for messages 0000 and 0001, a Hamming distance clear whether to attribute this gain to the learnt code or the inference
of 1, is larger than the Euclidean distance between code words for supported by the AE. To investigate this, we developed a table based
messages 0000 and 0111 with a Hamming distance of 3. The confusion transmitter and MLD receiver for the code rate 𝐾 = 7, 𝑁 = 15. Symbols


Table 4
Computed minimum, mean and variance of Euclidean distances for learnt and BPSK-modulated reference codes.

Code rate      | 𝑑𝐸 min | E[𝑑𝐸] | Var[𝑑𝐸] | Reference code   | Code 𝑑𝐸 min | Code E[𝑑𝐸] | Code Var[𝑑𝐸]
K = 4, N = 8   | 3.83   | 4.12  | 0.08    | Ext Hamming(8,4) | 4.00        | 4.11       | 0.17
K = 7, N = 11  | 3.67   | 4.69  | 0.20    | s-BCH(11,7)      | 3.46        | 4.66       | 0.47
K = 7, N = 15  | 4.38   | 4.38  | 0.19    | BCH(15,7)        | 4.47        | 5.46       | 0.44
K = 7, N = 34  | 6.91   | 8.30  | 0.29    | s-BCH(34,7)      | 6.63        | 8.25       | 0.53
K = 8, N = 17  | 4.34   | 5.82  | 0.23    | QRC(17,8)        | 4.90        | 5.80       | 0.47
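The distance statistics in Table 4 are straightforward to reproduce once the learnt codewords are exported from the transmitter. The sketch below is a minimal illustration, assuming a placeholder codewords array (random unit-energy vectors rather than a trained constellation): it computes the pairwise Euclidean distances with their minimum, mean and variance, and groups the same pairs by message Hamming distance as in the Fig. 18 analysis.

```python
import numpy as np

# Stand-in for the learnt constellation: 2^K codewords of length N. In the
# paper these are the MultiTxRx transmitter outputs; random equal-energy
# vectors are used here purely so the snippet runs on its own.
K, N = 4, 8
rng = np.random.default_rng(0)
codewords = rng.normal(size=(2**K, N))
codewords *= np.sqrt(N) / np.linalg.norm(codewords, axis=1, keepdims=True)

# Pairwise Euclidean distances over all unique codeword pairs.
diff = codewords[:, None, :] - codewords[None, :, :]
d_e = np.linalg.norm(diff, axis=-1)
iu = np.triu_indices(2**K, k=1)          # unique pairs only
pair_d = d_e[iu]
print(f"d_E min = {pair_d.min():.2f}, E[d_E] = {pair_d.mean():.2f}, "
      f"Var[d_E] = {pair_d.var():.2f}")

# Message Hamming distances for the same pairs, for comparison with Fig. 18.
msgs = (np.arange(2**K)[:, None] >> np.arange(K)[::-1]) & 1
d_h = (msgs[:, None, :] != msgs[None, :, :]).sum(-1)[iu]
for h in range(1, K + 1):
    print(f"Hamming distance {h}: mean d_E = {pair_d[d_h == h].mean():.2f}")
```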

Table 5
The number of parameters for combined separate code-rate models versus the multi-task shared-path model. The shared-path architecture requires fewer total parameters than separate models for each code rate.

Model variant            | K | N                  | Parameters
K = 4, N = 4 bit model   | 4 | 𝑖 ∈ {4}            | 8,711
K = 4, N = 8 bit model   | 4 | 𝑖 ∈ {8}            | 9,951
K = 4, N = 16 bit model  | 4 | 𝑖 ∈ {16}           | 12,431
K = 4, N = 20 bit model  | 4 | 𝑖 ∈ {20}           | 13,671
Total                    |   |                    | 44,764
4 bit multi-rate model   | 4 | 𝑖 ∈ {4, 8, 16, 20} | 28,359
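The saving reported in Table 5 follows from the branching layout itself: one shared trunk is reused by every rate-specific head instead of being duplicated per rate. The PyTorch sketch below illustrates only that layout; the layer widths and the one-hot message input are assumptions for the example, not the exact architecture of the proposed model.

```python
import torch.nn as nn

K, RATES = 4, [4, 8, 16, 20]   # message bits and candidate code lengths

def trunk():
    # Common shared path, reused by every code rate.
    return nn.Sequential(nn.Linear(2**K, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU())

def head(n):
    # Rate-specific branch producing an n-symbol codeword.
    return nn.Sequential(nn.Linear(64, 2 * n), nn.ReLU(), nn.Linear(2 * n, n))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Four independent AEs: each one repeats the trunk.
separate = sum(n_params(nn.Sequential(trunk(), head(n))) for n in RATES)

# Branching model: one shared trunk plus four heads.
branching = n_params(trunk()) + sum(n_params(head(n)) for n in RATES)

print(f"separate models: {separate} parameters")
print(f"branching model: {branching} parameters")
```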

The proposed models produced gains in BLER in comparison to the conventional codes under each of the channels. However, it is not clear whether to attribute this gain to the learnt code or to the inference supported by the AE. To investigate this, we developed a table-based transmitter and MLD receiver for the code rate K = 7, N = 15. Symbols for corresponding 7-bit messages output by the MultiTxRx K = 7, N = 15 model were stored in a lookup table and transmitted over an AWGN channel. If the gain were solely due to the learnt receiver, we would expect the MLD receiver to exhibit higher BLER. The MLD receiver performed nearest-neighbour decoding of received channel values against the table of modulated symbols. The performance of the MLD receiver matched the performance of the proposed branching AE model in the corresponding channel (Fig. 20). This indicates that the observed gains are due to the learnt constellations resulting from training. This approach demonstrates the potential of DL for wireless communications as a method of code design which may be applied independently of the trained model.
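A simplified version of this check is sketched below. A random equal-energy codebook stands in for the symbol table exported from the trained (15,7) transmitter, and the receiver applies the same nearest-neighbour (minimum Euclidean distance) rule over a simulated AWGN channel; with a real exported table, any gain can then only be attributed to the learnt constellation.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 7, 15
# Placeholder codebook: in the experiment these rows were the modulated
# symbols exported from the trained MultiTxRx (15,7) transmitter.
codebook = rng.normal(size=(2**K, N))
codebook *= np.sqrt(N) / np.linalg.norm(codebook, axis=1, keepdims=True)

def estimate_bler(ebn0_db, n_blocks=5000):
    rate = K / N
    # Noise variance per real dimension for the requested Eb/N0.
    sigma2 = 1.0 / (2 * rate * 10 ** (ebn0_db / 10))
    sent = rng.integers(0, 2**K, size=n_blocks)
    rx = codebook[sent] + rng.normal(scale=np.sqrt(sigma2), size=(n_blocks, N))
    # All rows have equal energy, so minimum-Euclidean-distance (MLD)
    # decoding against the table reduces to maximum correlation.
    decoded = (rx @ codebook.T).argmax(axis=1)
    return np.mean(decoded != sent)

for snr_db in (0, 2, 4, 6):
    print(f"Eb/N0 = {snr_db} dB: BLER ~ {estimate_bler(snr_db):.3f}")
```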
Fig. 18. Euclidean distances between pairwise codewords for each input sequence, learnt by the K = 4, N = 8 MultiTxRx auto-encoder.

6. Conclusions and future work

In this article we have presented a branching AE architecture capable of automatically learning multiple code rates for AMC schemes. The proposed branching architecture extends applications of the AE beyond the learning of a single code rate to the learning of multiple code rates. The choice of assumed channel during training is highly influential on the resulting performance of the AE in other channels. As a result, the ability to train a receiver separately on a real channel provides the means to further optimise system performance after deployment. The proposed branching AE for multiple code rates is demonstrated to perform well under a variety of changing channel conditions, achieving gains in BLER compared to the selected conventional codes. By leveraging an AMC scheme, the approach offers the potential to mitigate the requirement for receiver tuning in AEs for wireless communications. However, there remain a number of limitations to the practical application of the DL approach requiring further investigation.
First, in this article we have assumed perfect synchronisation at the receiver. While it is possible to apply conventional methods for synchronisation with learnt modulation and coding schemes, it is desirable that synchronisation be addressed as part of the end-to-end AE architecture.
Second, classification-based architectures not only do not scale to higher message lengths, but also cannot provide error-correction functionality. Hence work on bit-wise decoding for longer message lengths, either as part of a concatenated code or as a standalone network, will be a significant part of the practical application of such models; some of this work has already been described in the related work section of this paper.

Fig. 19. The confusion matrix for the K = 4, N = 8 code rate under the block fading channel. Figures are relative to the predicted labels. While the classifier achieves a high level of accuracy on the BLER, there is sufficient difference between messages to cause high BER.

Third, there has been work investigating the sensitivity of such architectures to their training conditions and whether they are brittle in terms of adversarial attacks. While we do not directly explore this concern, there is a connection between network regularisation and training methods required to mitigate adversarial attacks.


Fig. 20. BER and BLER in the AWGN channel of the learnt constellation for the (15,7) code produced by a table-based transmitter and MLD receiver compared with the end-to-end model and BCH(15,7) code.

In [29], conventional Hamming codes are shown to be more robust under adversarial and jamming attacks than AEs. It is suggested in [30] that adversarial examples are transferable across different models, thereby enabling black-box attacks. This raises the importance of future investigation into regularisation methods for end-to-end learning in wireless communications and their evaluation under adversarial interference.

Fourth, as we have discussed, the Euclidean distances between messages for neighbouring codes may be larger than those for messages several bits apart. This negatively impacts the BER performance, as misclassification results in a higher number of incorrect bits. Future work should investigate the ability of the transmitter to learn distance-based relationships between source messages. In addition, while we have assumed no channel information at the receiver, it may be possible to incorporate or learn such information to enhance receiver performance in the end-to-end learning scenario.
The tuning of the receiver over varying channel conditions would be time consuming in a deployed system and may lead to poor performance on the original channel for which it was first trained. Whether it is practical to tune a receiver over the air, and how much training is required, is a matter for consideration. A practical solution may be to use a branching AE with multiple code rates under changing conditions. This would permit operation whilst a separate model is adapted in the background. The question of how to update such a model while mitigating catastrophic forgetting in changing channel conditions deserves further investigation.
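One possible shape for such a deployment is sketched below, purely as an assumed pattern rather than a tested procedure: a frozen receiver keeps serving while a copy is fine-tuned on recorded pilot blocks, and the copy replaces it only if a held-out validation check, which can include data from the original channel, does not regress.

```python
import copy
import torch

def adapt_in_background(serving_rx, recorded_batches, validate, lr=1e-4):
    """Fine-tune a copy of the deployed receiver on recorded channel output
    while the frozen original keeps serving; swap only if validation does
    not regress. All names here are illustrative placeholders."""
    candidate = copy.deepcopy(serving_rx)
    optimiser = torch.optim.Adam(candidate.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for rx_symbols, message_labels in recorded_batches:  # known pilot blocks
        optimiser.zero_grad()
        loss = loss_fn(candidate(rx_symbols), message_labels)
        loss.backward()
        optimiser.step()
    # Guard against catastrophic forgetting: keep the old receiver unless
    # the adapted copy is at least as good on held-out (original) data.
    return candidate if validate(candidate) >= validate(serving_rx) else serving_rx
```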
Finally, the mapping between channel environment and choice of code rate relies on measurements such as expected BER and BLER over the associated SNR. It is feasible to imagine the joint learning of AMC and channel performance mapping, extending the work described in [31,32]. More recent research in the industrial internet of things (IIoT) considers wireless communications as part of a joint optimisation objective, seeking to reduce energy consumption over the collective sensor network [33,34]. A potential application would be to learn energy-efficient communication schemes for the IIoT setting which are adaptive to operational constraints in addition to channel conditions, in an end-to-end manner.
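A conventional baseline for this mapping is a threshold lookup, as sketched below; the Eb/N0 thresholds are invented for the example and would in practice be derived from the measured BLER-versus-SNR curves of each branch of the multi-rate model.

```python
# Illustrative AMC lookup for the K = 8 multi-rate model: the SNR
# thresholds below are invented placeholders, standing in for measured
# BLER-versus-SNR curves for each branch of the network.
RATE_TABLE = [           # (minimum Eb/N0 in dB, (K, N)), highest rate first
    (9.0, (8, 6)),
    (7.0, (8, 8)),
    (4.0, (8, 17)),
    (1.0, (8, 32)),
    (float("-inf"), (8, 40)),
]

def select_code(est_ebn0_db):
    """Pick the highest-rate branch whose SNR threshold is satisfied."""
    for threshold, code in RATE_TABLE:
        if est_ebn0_db >= threshold:
            return code

print(select_code(5.2))  # -> (8, 17)
```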
The flexibility of the AE architecture provides competitive performance not only in learning a single code rate but also, as we have shown, in learning AMC schemes with varying error-rate performance and spectral efficiencies. By framing the learning problem as MTL, the proposed architecture enables the deployment of a single model instead of requiring multiple separate models for each code rate.

CRediT authorship contribution statement

Christopher P. Davey: Conceptualisation, Methodology, Implementation. Ismail Shakeel: Conceptualisation, Coordination, Editorial. Ravinesh C. Deo: Coordination, Editorial. Ekta Sharma: Editorial. Sancho Salcedo-Sanz: Coordination, Editorial. Jeffrey Soar: Coordination, Editorial.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data used in this article is generated through simulation. The simulation process is described in the methodology section of the article, alongside the energy constraints and channel distortions.

Acknowledgements

This research is supported by the UniSQ-DSTG Postgraduate Research Scholarship 2021–2024 on the 'Design of Efficient Artificial Intelligence Algorithms for Future Communication Systems'. It is funded by the Department of Defence, Commonwealth of Australia under a DSP Scholarship (Project-Based) Agreement 10254.

References

[1] G. Caire, K.R. Kumar, Information theoretic foundations of adaptive coded modulation, Proc. IEEE 95 (12) (2007) 2274–2298.
[2] Timothy O'Shea, Jakob Hoydis, An introduction to deep learning for the physical layer, IEEE Trans. Cogn. Commun. Netw. 3 (4) (2017) 563–575.
[3] S. Dörner, S. Cammerer, J. Hoydis, S. ten Brink, Deep learning based communication over the air, IEEE J. Sel. Top. Sign. Proces. 12 (1) (2018) 132–143.
[4] Hao Ye, Le Liang, Geoffrey Ye Li, Biing-Hwang Juang, Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels, IEEE Trans. Wirel. Commun. 19 (5) (2020) 3133–3143.
[5] F.A. Aoudia, J. Hoydis, End-to-end learning of communications systems without a channel model, in: 2018 52nd Asilomar Conference on Signals, Systems, and Computers, 2018, pp. 298–303.
[6] Michael Crawshaw, Multi-task learning with deep neural networks: A survey, 2020, arXiv preprint arXiv:2009.09796.
[7] Michael McCloskey, Neal J. Cohen, Catastrophic interference in connectionist networks: The sequential learning problem, in: Psychology of Learning and Motivation, Vol. 24, Elsevier, 1989, pp. 109–165.
the proposed architecture enables the deployment of a single model, networks: The sequential learning problem, in: Psychology of Learning and
instead of requiring multiple separate models for each code rate. Motivation, Vol. 24, Elsevier, 1989, pp. 109–165.


[8] Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio, An empirical investigation of catastrophic forgetting in gradient-based neural networks, 2013, arXiv preprint arXiv:1312.6211.
[9] Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson, Averaging weights leads to wider optima and better generalization, 2018, arXiv preprint arXiv:1803.05407.
[10] Fayçal Ait Aoudia, Jakob Hoydis, Model-free training of end-to-end communication systems, IEEE J. Sel. Areas Commun. 37 (11) (2019) 2503–2516.
[11] S. Cammerer, F.A. Aoudia, S. Dörner, M. Stark, J. Hoydis, S. ten Brink, Trainable communication systems: Concepts and prototype, IEEE Trans. Commun. 68 (9) (2020) 5489–5503.
[12] N.A. Letizia, A.M. Tonello, Capacity-driven autoencoders for communications, IEEE Open J. Commun. Soc. 2 (2021) 1366–1378.
[13] Rich Caruana, Multitask Learning (Ph.D. thesis), Carnegie Mellon University, Pittsburgh, PA, 1998.
[14] Kevis-Kokitsi Maninis, Ilija Radosavovic, Iasonas Kokkinos, Attentive single-tasking of multiple tasks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1851–1860.
[15] E. Armanious, D.D. Falconer, H. Yanikomeroglu, Adaptive modulation, adaptive coding, and power control for fixed cellular broadband wireless systems: some new insights, in: 2003 IEEE Wireless Communications and Networking, WCNC 2003, Vol. 1, 2003, pp. 238–242.
[16] Joseph Downey, Dale Mortensen, Michael Evans, Janette Briones, Nicholas Tollis, Adaptive coding and modulation experiment using NASA's space communication and navigation testbed, in: 2016 Communications Satellite Systems Conference, ICSSC, 2016.
[17] Intae Hwang, Taewon Jang, Mingoo Kang, Sangmin No, Jungyoung Son, Daesik Hong, Changeon Kang, Performance analysis of adaptive modulation and coding combined with transmit diversity in next generation mobile communication systems, Future Gener. Comput. Syst. 20 (2) (2004) 189–196.
[18] D. Wu, S. Ci, Cross-layer design for combining adaptive modulation and coding with hybrid ARQ to enhance spectral efficiency, in: 2006 3rd International Conference on Broadband Communications, Networks and Systems, 2006, pp. 1–6.
[19] A.J. Goldsmith, S.G. Chua, Adaptive coded modulation for fading channels, IEEE Trans. Commun. 46 (5) (1998) 595–602.
[20] Shu Lin, Juane Li, Fundamentals of Classical and Modern Error-Correcting Codes, Cambridge University Press, Cambridge, 2021, URL https://www.cambridge.org/core/books/fundamentals-of-classical-and-modern-errorcorrecting-codes/19A81ED5D7E9C6A1EBB9657683B6E39C.
[21] Florence Jessie MacWilliams, Neil James Alexander Sloane, The Theory of Error-Correcting Codes, Vol. 16, Elsevier, 1977.
[22] Raj Chandra Bose, Dwijendra K. Ray-Chaudhuri, On a class of error correcting binary group codes, Inf. Control 3 (1) (1960) 68–79.
[23] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 770–778.
[24] Sergey Ioffe, Christian Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Francis Bach, David Blei (Eds.), Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 37, PMLR, 2015, pp. 448–456, URL https://proceedings.mlr.press/v37/ioffe15.html.
[25] Xavier Glorot, Antoine Bordes, Yoshua Bengio, Deep sparse rectifier neural networks, in: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2011, pp. 315–323.
[26] Prajit Ramachandran, Barret Zoph, Quoc V. Le, Searching for activation functions, 2017, arXiv preprint arXiv:1710.05941.
[27] L.N. Smith, Cyclical learning rates for training neural networks, in: 2017 IEEE Winter Conference on Applications of Computer Vision, WACV, 2017, pp. 464–472.
[28] Diederik P. Kingma, Jimmy Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980.
[29] M. Sadeghi, E.G. Larsson, Physical adversarial attacks against end-to-end autoencoder communication systems, IEEE Commun. Lett. 23 (5) (2019) 847–850.
[30] Shan Ai, Arthur Sandor Voundi Koe, Teng Huang, Adversarial perturbation in remote sensing image recognition, Appl. Soft Comput. 105 (2021) 107252, URL https://www.sciencedirect.com/science/article/pii/S1568494621001757.
[31] S. Kojima, K. Maruta, C.J. Ahn, Adaptive modulation and coding using neural network based SNR estimation, IEEE Access 7 (2019) 183545–183553.
[32] P.V.R. Ferreira, R. Paffenroth, A.M. Wyglinski, T.M. Hackett, S.G. Bilen, R.C. Reinhart, D.J. Mortensen, Reinforcement learning for satellite communications: From LEO to deep space operations, IEEE Commun. Mag. 57 (5) (2019) 70–75.
[33] Sarogini Grace Pease, Russell Trueman, Callum Davies, Jude Grosberg, Kai Hin Yau, Navjot Kaur, Paul Conway, Andrew West, An intelligent real-time cyber-physical toolset for energy and process prediction and optimisation in the future industrial internet of things, Future Gener. Comput. Syst. 79 (2018) 815–829, URL https://www.sciencedirect.com/science/article/pii/S0167739X1630382X.
[34] Jiwei Huang, Han Gao, Shaohua Wan, Ying Chen, AoI-aware energy control and computation offloading for industrial IoT, Future Gener. Comput. Syst. 139 (2023) 29–37, URL https://www.sciencedirect.com/science/article/pii/S0167739X22002916.

Christopher P. Davey has a Master of Information Technology degree from the Queensland University of Technology (QUT, Australia) in 2007 and completed a Master of Science majoring in mathematics and statistics from the University of Southern Queensland (UniSQ, Australia) in 2020. Chris has over a decade of professional experience in software development and systems integration. He is currently progressing work on a PhD program at UniSQ with the focus of his research being on "Deep Learning for Wireless Communications". He has worked on the "Artificial Intelligence for Decision-Making (AI4DM)" and "AI-enabled communicating systems" research projects funded by the Australian Government's Department of Defence.

Ismail Shakeel received a Ph.D. degree in Telecommunications in 2007 and a BEng (Hons) degree in Electronic Engineering in 1997 from the University of South Australia. He has completed two master's degrees at the University of Canterbury (NZ) and Monash University in 2001 and 2002 respectively. Ismail joined Defence Science and Technology Group (DSTG) in 2011 and is currently with the Information Sciences Division at DSTG. Ismail is also an Adjunct Professor at the University of Southern Queensland. Before joining DSTG, he has worked in both academia and industry, holds a patent, and has generated more than 40 technical reports and publications in the field of telecommunications. Ismail's current research interests include signal detection and classification techniques, artificial intelligence-enabled wireless communication, interference-resistant signalling, chaotic communication, and cooperative wireless communication.

Ravinesh C. Deo (a Highly Cited Author, 2021 Clarivate) leads UniSQ's Advanced Data Analytics Lab as Professor at the University of Southern Queensland (UniSQ), Australia. He is a Clarivate Highly Cited Researcher with publications ranking in the top 1% by citations for field and publication year in the Web of Science citation index, and is among scientists and social scientists who have demonstrated significant broad influence, reflected in the publication of multiple papers frequently cited by their peers. He leads cross-disciplinary research in deep learning and artificial intelligence, supervising 20+ Ph.D./M.Sc. degrees. He has received Employee Excellence Awards, Elsevier Highly Cited Paper Awards, and Publication Excellence and Teaching Commendations. He has published more than 270 articles, 150 journal papers, and seven books, with cumulative citations exceeding 11,600.

Ekta Sharma holds a Ph.D. degree in Artificial Intelligence from the University of Southern Queensland, Australia. She completed her M.Phil., M.Sc. (Operations Research), and B.Sc. (Mathematical Sciences) from the University of Delhi, India. She has over a decade of experience in both academia and industry across varied roles, from Area Manager to Learning Advisor, Lecturer, and Researcher at universities in Europe, India, and Australia. Her research work has received funding from the Australian Government, the Australian Defence Science and Technology Group, the Australian Mathematical Sciences Institute, and the Australian Tropical Agriculture Institute. She is working on varied cross-disciplinary research projects in artificial intelligence and wireless communications.

Sancho Salcedo-Sanz was born in Madrid, Spain, in 1974. He received the B.S. degree in Physics from Universidad Complutense de Madrid, Spain, in 1998, the Ph.D. degree in Telecommunications Engineering from the Universidad Carlos III de Madrid, Spain, in 2002, and the Ph.D. degree in Physics from Universidad Complutense de Madrid in 2019. He spent one year in the School of Computer Science, The University of Birmingham, U.K., as a postdoctoral Research Fellow. Currently, he is a Full Professor at the Department of Signal Processing and Communications, Universidad de Alcalá, Spain. He has co-authored more than 240 international journal papers in the field of Machine Learning and Soft-Computing and its applications. His current interests deal with Soft-Computing techniques, hybrid algorithms and neural networks in different problems of Science and Technology.

Jeffrey Soar is Personal Chair in Human-Centered Technology at the School of Business, University of Southern Queensland. His research is in AI, e-business, e-health, technology and development, and social and organisational change. He came to academic research from a long and distinguished career in industry, including as chief information officer in government agencies in Australia and New Zealand.
