Low-Complexity Neural Networks For Baseband Signal Processing
Guillaume Larue∗†, Mona Dhiflaoui∗, Louis-Adrien Dufrene∗, Quentin Lampin∗
Paul Chollet†, Hadi Ghauch† and Ghaya Rekaya†
∗ CITYTeam, Orange, Grenoble, France - E-mail: [email protected]
† COMELEC dept., LTCI, Institut Polytechnique de Paris, Palaiseau, France - E-mail: [email protected]
2020 IEEE Globecom Workshops (GC Wkshps) | 978-1-7281-7307-8/20/$31.00 ©2020 IEEE | DOI: 10.1109/GCWkshps50303.2020.9367521
Abstract—This study investigates the use of neural networks for the physical layer in the context of the Internet of Things. In such systems, devices face challenging energy, computational and cost constraints that advocate for low-complexity baseband signal processing. In this work, low-complexity neural networks are proposed as promising candidates. They adapt to operating conditions, present a high performance-to-complexity ratio and also offer good explainability, crucial in most communication systems that cannot rely on "black-box" solutions. Moreover, recent advances in dedicated hardware for neural network processing bring new perspectives in terms of efficiency and flexibility that motivate their use at the physical layer. To illustrate how classical baseband signal processing algorithms can be translated to minimal neural networks, two models are proposed in this paper to realize single-path equalization and demodulation of M-QAM signals. These models are assessed using both simulation and experimentation and achieve near-optimal performance.

I. INTRODUCTION

In the context of 5G and beyond-5G communication systems, the use of Neural Networks (NN) at the physical (PHY) layer is of growing interest. Envisioned use cases are as varied as signal and modulation classification [1], coding and decoding [2], interference cancellation [3], joint channel estimation and symbol detection [4], and joint modulation and coding [5]. Two approaches are usually proposed in the literature. The first one, named the functional approach, aims to replace existing processing blocks of a standard PHY layer by NN based algorithms [1], [6]. The second one, named the End-to-End approach, addresses the question in a more ground-breaking way: the whole signal processing chain of both transmitter and receiver is redefined as one or more NN that learn a new communication scheme, optimized with regard to a given channel and communication scenario [5]. This paper focuses on the functional approach, which is compatible with an implementation in current standards.

In both approaches, the design of NN architectures often follows empirical methods inherited from domains such as Computer Vision or Natural Language Processing, where NN showed impressive performance compared to handcrafted algorithms. Notable models, so-called Deep Neural Networks (DNN), leverage several hidden layers to express sophisticated non-linear functions, solving complex problems which do not have any known analytical solution. A potential drawback of models with a large number of parameters lies in their complex and time-consuming inference and learning processes. These limitations are usually incompatible with the energy, computational power and cost constraints of Internet of Things (IoT) and Green networks. Moreover, due to their inherent "black-box" design, DNN are often not explainable, i.e. they do not offer the analytical guarantees and operational reliability usually required in communication systems.

Therefore, this paper discusses the design of a NN based PHY layer under realistic implementation constraints, and advocates for the use of low-complexity NN, designed to have a minimal architecture with regard to the problem to solve. Using such algorithms, particularly when supported by NN dedicated hardware, allows leveraging advantages such as higher efficiency, lower energy consumption or better explainability. The aim of this study is to illustrate this approach with concrete examples.

This paper is structured as follows. Section II describes the system model used throughout the study. Section III exposes the main contributions of the paper and illustrates how the problems of single-path equalization and M-QAM demodulation can be expressed in terms of low-complexity and explainable NN architectures. Section IV briefly addresses the question of the learning processes that may be involved in the configuration of such models. Section V presents an experimental evaluation of the aforementioned system using Software Defined Radio (SDR) and a channel emulator; the system achieves near-optimal Bit-Error Rate (BER) performance. Section VI summarizes the results of the study and explores perspectives on using efficient NN at the PHY layer, before concluding the paper.

II. SYSTEM MODEL

The communication system described in Figure 1 is considered. Random binary sequences are modulated using a Gray-mapped M-QAM modulator. A single-path channel model is adopted and defined as its baseband equivalent. It is composed of the following combination of propagation effects and hardware impairments:
Authorized licensed use limited to: UNIVERSITE DE LILLE. Downloaded on March 26,2024 at 10:18:11 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Baseband description of the considered QAM communication chain.
• Additive White Gaussian Noise (AWGN): real and imaginary parts of the noise are sampled from a zero-mean Gaussian distribution¹:

    n_k = (n_ik  n_qk)^T  with  n_ik, n_qk ~ i.i.d. N(0, N_0/2)    (1)

• Phase shift: symbols are rotated by an angle ϕ with the following rotation matrix:

    R = [ cos(ϕ)   sin(ϕ) ]
        [ −sin(ϕ)  cos(ϕ) ]    (2)

• IQ amplitude imbalance: a different scaling factor is applied to the I and Q channels, as described by the following diagonal stretching matrix:

    A = diag(α_i, α_q)    (3)

• IQ offset: a different offset is applied to the I and Q channels:

    o = (ω_i  ω_q)^T    (4)

Phase shift, IQ amplitude imbalance and IQ offset are considered constant under the block-fading hypothesis². The overall channel effect on a given sample is:

    y_k^T = x_k^T RA + (o + n_k)^T    (5)

    (y_ik  y_qk) = (x_ik  x_qk) [ α_i cos(ϕ)   α_q sin(ϕ) ]  +  (ω_i + n_ik   ω_q + n_qk)
                                [ −α_i sin(ϕ)  α_q cos(ϕ) ]

where x_k is the k-th input symbol and y_k the corresponding sample, affected by the channel.

On the receiver side, the effect of the channel must be mitigated by a single-path equalizer. If perfect equalization is considered, one can obtain the equalized sample:

    u_k^T = x_k^T + n_k^T (RA)^{−1}    (6)

III. LOW-COMPLEXITY NN MODELS FOR PHY LAYER

This section proposes two new low-complexity NN for baseband processing. They are described purely as deterministic mathematical models with an analytical derivation of their parameters. The question of Machine Learning (ML) and parameter optimization will be briefly addressed in Section IV.

A. Designing a NN based single-path equalization system

This section illustrates how a simple NN can be used to perform single-path equalization. The objective of the equalization block is to retrieve the original samples, before the channel impairments. The optimal solution given the channel impairments described in Equation (5) is:

    u_k^T = (y_k − o)^T (RA)^{−1}
          = y_k^T (RA)^{−1} − o^T (RA)^{−1}    (7)
          = x_k^T + n_k^T (RA)^{−1}

where

    (RA)^{−1} = (1 / (α_i α_q)) [ α_q cos(ϕ)  −α_q sin(ϕ) ]
                                [ α_i sin(ϕ)   α_i cos(ϕ) ]    (8)

From the optimal equalization scheme presented previously, one can deduce a straightforward NN implementation. The single-path equalization operation can be applied independently to all samples of a given frame affected by the same channel under the block-fading hypothesis. A Convolutional Neural Network (CNN) is therefore particularly suited in this case and allows for efficient parallel processing of sample blocks. The CNN shift-invariant architecture, based on shared weights, drastically reduces the number of trainable parameters, independently of the number of samples to process [8].
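As a concrete sanity check of equations (5)-(8), the forward channel and its optimal inverse can be sketched in a few lines of numpy. The impairment values below are purely illustrative (they are not taken from the paper), and the noise term of equation (1) is omitted so that the round trip is exact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative impairment values (assumptions, not from the paper)
phi, a_i, a_q = 0.3, 1.1, 0.9            # phase shift and IQ amplitude imbalance
o = np.array([0.05, -0.02])              # IQ offset (omega_i, omega_q)

R = np.array([[np.cos(phi),  np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])      # rotation matrix, eq. (2)
A = np.diag([a_i, a_q])                          # stretching matrix, eq. (3)

# K = 1000 random 16-QAM symbols as rows x_k^T, unit average power
delta = 1 / np.sqrt(10)
x = rng.choice(delta * np.array([-3, -1, 1, 3]), size=(1000, 2))

# Forward channel, eq. (5); AWGN of eq. (1) omitted for a noiseless round trip
y = x @ (R @ A) + o

# Optimal equalization, eqs. (7)-(8): u_k^T = y_k^T (RA)^-1 - o^T (RA)^-1
RA_inv = np.linalg.inv(R @ A)
u = y @ RA_inv - o @ RA_inv

print(np.max(np.abs(u - x)))             # ~0 (float precision): perfectly equalized
```

The closed-form inverse of equation (8) can be checked against `np.linalg.inv`; both reduce to dividing each column of R^T by the matching gain.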
The proposed minimal CNN to perform single-path equalization consists of one linear layer (i.e. without a non-linear activation function) with only two kernels of size (1, 2), as described in Figure 2. An input matrix of shape (K, 2) is considered, where K corresponds to the number of samples and 2 to the real (I channel) and imaginary (Q channel) parts of these samples. The operation performed by such a CNN on a given sample is expressed as:

    u_k^T = y_k^T W + c^T    (9)

where y_k is the k-th input sample (the real and imaginary parts of the sample form a vector of two elements) and u_k the k-th equalized sample. W and c are the (2 × 2) weights matrix (each column representing one of the two kernels) and the bias vector of size two, respectively.

Given the optimal solution described in equation (7), the optimal NN parameters are:

    W_optimal = (RA)^{−1}
    c_optimal^T = −o^T (RA)^{−1}    (10)

Figure 3 shows the result of the operation applied by this minimal NN to a 16-QAM constellation altered by a channel with arbitrarily defined impairments. One can see that the samples are perfectly equalized.

Fig. 3. Left figure: received samples with impairments - Right figure: samples equalized by the NN.

B. Designing a NN based M-QAM demodulation system

The elementary operation performed by a single neuron is recalled in equation (11), with the example of the ReLU activation. This non-linearity can then be used to define a decision boundary.

    s = e^T w + c   and   ReLU(s) = { s  if s > 0
                                    { 0  otherwise    (11)

where e is the input vector, w the weights vector and c the bias of the neuron. s is the output of the neuron before the activation function, and ReLU(s) is the output of the neuron after applying the ReLU activation.

The NN should associate to each sample the corresponding binary code-word. Therefore, the NN needs to solve both the decision and demapping problems. The proposed model in the case of a 16-QAM, as described in Figure 5, is based on the following observations:

• The demodulation operations can be applied independently to each sample. For reasons similar to those described in Section III-A, a CNN with kernels of the size of one sample is particularly well-suited.

• The NN carries out both decision and demapping tasks. Following an approach similar to the one proposed in [9], a bit-level decision process is considered instead of a symbol-level decision process followed by a symbol-to-bit demapping. The objective of the NN is to approximate the functions denoted b0, b1, b2 and b3 in Figure 4. Considering the usual monotonic activation functions offered by the majority of NN frameworks, the NN needs at least two layers to compute the desired functions, as b1 and b3 are not monotonic:

  – Regarding the input layer, the minimum number of kernels corresponds to the number of decision boundaries associated to the Least Significant Bit (LSB) on each channel (namely b1 and b3). Consequently, four non-linear kernels are needed (2^(log2(M)/2) in the general case of a M-QAM). Considering computational power constraints, the ReLU activation is chosen. All the decision boundaries can be expressed as a combination of the four kernels by the output layer.
  – The output of the NN needs to represent the values of the four bits associated to each symbol. As a result, the output layer needs four kernels (log2(M) for a M-QAM). The outputs representing binary values, a sigmoid activation function is particularly well-suited.

• Under the assumption that the samples are perfectly equalized, the I and Q channels can be processed separately with the same operations, therefore dividing the number and size of the aforementioned kernels by two.

Fig. 5. The proposed model for 16-QAM demodulation leverages properties of CNN to perform the same processing on both I and Q channels and on all samples. It uses only 10 shared parameters.

The architecture of this model is scalable to any QAM order by following the number of parameters described in Table I.

TABLE I
MINIMAL NN MODEL ARCHITECTURE PROPOSED FOR M-QAM DEMODULATION

CONVOLUTIONAL LAYER 1
Input matrix dimensions: (K, 2)
Kernel number and properties: n1 = 2^(log2(M)/2 − 1), size (1, 1), stride (1, 1)
Activation: ReLU
Output tensor dimensions: (K, 2, n1)

CONVOLUTIONAL LAYER 2
Kernel number and properties: n2 = log2(M)/2, size (1, 1), stride (1, 1)
Activation: Hard sigmoid
Output tensor dimensions: (K, 2, n2)

FLATTEN LAYER
Output vector dimension: K log2(M)

The choice of a hard sigmoid instead of a sigmoid activation on the output layer of the NN is proposed to lower the computational complexity. As shown in equation (12), the hard sigmoid requires at most two comparisons, one addition and one multiplication.

    sigmoid_hard(x) = { 0          if x < −2.5
                      { 1          if x > 2.5
                      { 0.2x + 0.5 otherwise    (12)

In the case of 16-QAM, one possible configuration of the weights and biases that is optimal with regard to the theoretical decision boundaries is proposed³:

    Layer 1: w1 = (−1),    c1 = 0 ;  w2 = (1),      c2 = 0
    Layer 2: w3 = (−1  1), c3 = 0 ;  w4 = (−1  −1), c4 = 2δ    (13)

where wi and ci are respectively the weights vector and bias of the i-th kernel (following the numbering of Figure 5). 2δ corresponds to the inter-symbol distance of the considered constellation; δ = 1/√10 in the case of a 16-QAM with power normalized to one.

As shown in Figure 6, the proposed model reaches the theoretical optimal BER performance⁴ over an AWGN channel for different QAM orders with appropriate configurations of the NN.

Fig. 6. Comparison of the BER performance of a regular, minimal Euclidean distance based, and the proposed CNN based demodulators over a simulated AWGN channel for different QAM orders.

As a reference, a demodulator based on minimal Euclidean distance needs 3M additions, 2M multiplications and M − 1 comparisons, with M the order of the QAM modulation [9]. Table II presents a comparison in terms of computational complexity between the regular demodulator and the proposed NN model. One can see that the NN outperforms, in terms of complexity, the proposed reference demodulator.

TABLE II
COMPLEXITY COMPARISON FOR DIFFERENT DEMODULATION SCHEMES

QAM Order | Real Multiplications          | Real Additions                | Real Comparisons
(M)       | Regular (2M) / NN / Improv. % | Regular (3M) / NN / Improv. % | Regular (M−1) / NN / Improv. %
4         | 8 / 6 / 25                    | 12 / 6 / 50                   | 3 / 6 / −50
16        | 32 / 16 / 50                  | 48 / 12 / 75                  | 15 / 12 / 20
64        | 128 / 38 / 70                 | 192 / 20 / 90                 | 63 / 20 / 68
256       | 512 / 88 / 83                 | 768 / 32 / 96                 | 255 / 32 / 87
1024      | 2048 / 202 / 90               | 3072 / 52 / 98                | 1023 / 52 / 95

³ The proposed configuration is not unique and assumes a hard demodulation process with a rounding of the output values to either zero or one. A higher dynamic of the output layer parameters might be used to approximate the step function with the hard sigmoid activation and avoid rounding the outputs.

⁴ The optimal BER over an AWGN channel is computed considering the nearest-neighbor approximation and Gray coding.

IV. LEARNING PROCESSES

Previous sections propose a mathematical definition of the NN architectures and an analytical derivation of their parameters.
This section describes the learning processes that can be applied to configure the proposed models. Such a process applied to the configuration of the model proposed in Section III-B is not of particular interest, as one can view the demodulation as a deterministic problem (i.e. one that is not supposed to change over time). On the contrary, it might be interesting to leverage the learning capabilities of NN to continuously adapt the equalization model proposed in Section III-A to the channel evolution.

To test this approach, the channel impairments described in Table III are considered. The learning algorithm aims to find the weight matrix W and bias vector c that minimize the L2 loss function between the original samples x[k] and the equalized samples u[k], as described in Equation (14).

Fig. 7. Description of the processing chain - The equalization model is trained using the received ZC sequence and the theoretical one, known by the receiver. Using the trained equalization model, the payload is equalized and then demodulated using a predefined demodulation model. These operations are repeated for each block.
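A minimal numpy sketch of this fitting step is given below. It replaces the Adam optimizer by plain gradient descent on the L2 loss, and uses illustrative impairment values and random pilot symbols in place of the ZC sequence, so it only hints at the block-wise procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Channel used to distort the pilots (illustrative values, not from the paper)
phi, a_i, a_q = 0.3, 1.1, 0.9
o = np.array([0.05, -0.02])
RA = np.array([[np.cos(phi),  np.sin(phi)],
               [-np.sin(phi), np.cos(phi)]]) @ np.diag([a_i, a_q])

# 256 known pilot symbols (stand-in for the ZC sequence) and their received version
delta = 1 / np.sqrt(10)
x = rng.choice(delta * np.array([-3, -1, 1, 3]), size=(256, 2))
y = x @ RA + o                           # noiseless here, for clarity

# Fit W and c of u = yW + c by gradient descent on L = mean ||u - x||^2
W = np.eye(2)
c = np.zeros(2)
lr = 0.1
for _ in range(2000):
    u = y @ W + c
    g = 2.0 * (u - x) / len(y)           # dL/du
    W -= lr * (y.T @ g)                  # dL/dW = y^T dL/du
    c -= lr * g.sum(axis=0)              # dL/dc = column sums of dL/du

print(np.max(np.abs(y @ W + c - x)))     # close to 0: pilots are equalized
```

On noiseless pilots the unique minimizer is the analytical solution of equation (10), so the learned W converges to (RA)^{−1}; with noise, the fit becomes a least-squares estimate instead.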
As described in Figure 7, the proposed system works as follows:

• For each block, the equalization model is trained by comparing the received and theoretical ZC sequences. The 256 samples of the ZC sequences are divided in two parts: two thirds for training and one third for validation. The L2 loss and the Adam optimizer with a learning rate of 0.1 are used.

• After learning, the 4096 IQ samples of the payload are corrected using the newly trained equalization model before being fed to the demodulation model.

As described in Figure 8, this simple system reaches near-optimal BER. As a comparison, the regular receiver estimates the channel parameters by computing the correlation between the received ZC sequence and the theoretical one to perform single-path equalization. A minimal Euclidean distance algorithm is then used for demodulation. One can see a slight degradation of the BER performance of the NN model compared to the regular receiver. This can be explained by the learning bias mentioned in Section IV.

Fig. 8. Comparison of the BER performance of a regular receiver and the proposed CNN based receiver over an emulated AWGN channel.

VI. RESULTS, CONCLUSIONS AND PERSPECTIVES

This paper has described how the problems of single-path equalization and M-QAM demodulation can be expressed in terms of low-complexity NN architectures, relevant with regard to the constraints of IoT systems. The proposed models have been experimentally assessed using an SDR based communication chain and a channel emulator, achieving near-optimal performance. Questions related to the training of these minimal models, and notably the question of the learning bias described in Section IV, have not been addressed in depth in this paper and are left for further publications.

The study of NN complexity is of great importance in the context of 5G and beyond-5G cellular networks for IoT, where it is not always possible to use deep NN. Using low-complexity NN offers the advantages of higher efficiency and reduced learning and inference time. This paper demonstrates that they also mitigate the "black-box" issue that is often criticized in DNN and allow for better explainability. Moreover, even if low-complexity NN might not be as expressive as their deep counterparts, they still profit from the interesting properties offered by NN dedicated hardware such as Tensor Processing Units (TPU). Indeed, NN are universal approximators [14] but rely on simple individual mathematical operations such as multiplications, additions and basic non-linear activation functions: the expressiveness of NN models lies in their layered architectures and not in complex individual mathematical operations. These properties of NN allow very efficient and generic hardware to be developed [15]. This hardware generality is interesting in terms of scalability of the PHY layer, but also in terms of possible hardware mutualization in a distributed learning and inference scheme or in a Mobile Edge Computing context. Moreover, recent versions of the aforementioned hardware, such as the Google Edge TPU, have been developed to address embedded and low-power applications. Hence, they are particularly appealing for IoT devices. As more and more companies such as Intel, Nvidia, Apple or Huawei develop NN dedicated hardware, the Total Cost of Ownership (TCO) of NN based communication systems is decreasing. Thus, the more elements of the PHY layer (and beyond) are replaced by NN algorithms, the more one can expect to benefit from the properties described above.

REFERENCES

[1] N. E. West and T. O'Shea, "Deep architectures for modulation recognition", IEEE DySPAN, pp. 1-6, 2017.
[2] H. Kim, Y. Jiang, R. Rana, S. Kannan, S. Oh, and P. Viswanath, "Communication Algorithms via Deep Learning", arXiv:1805.09317, 2018.
[3] M. Kim, N.-I. Kim, W. Lee, and D.-H. Cho, "Deep Learning-Aided SCMA", IEEE Communications Letters, vol. 22, no. 4, pp. 720-723, 2018.
[4] H. Ye, G. Y. Li, and B. Juang, "Power of Deep Learning for Channel Estimation and Signal Detection in OFDM Systems", IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114-117, 2018.
[5] T. O'Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical Layer", IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563-575, 2017.
[6] K. Ohnishi and K. Nakayama, "A neural demodulator for quadrature amplitude modulation signals", IEEE ICNN, vol. 4, pp. 1933-1938, 1996.
[7] E. Gonzales Ceballos, "A novel adaptative multilevel-Quadrature amplitude modulation (M-QAM) receiver using machine learning to mitigate multipath fading channel effects", ProQuest, 2018.
[8] Y. LeCun, P. Haffner, L. Bottou and Y. Bengio, "Object Recognition with Gradient-Based Learning", Springer LNCS, vol. 1681, pp. 319-345, 1999.
[9] H.G. Yeh and H.S. Seo, "Low Complexity Demodulator for M-ary QAM", IEEE WTS, pp. 1-6, 2007.
[10] J.R. Barry, S.K. Wilson, S.G. Wilson and E. Biglieri, "MMSE Linear Equalization", Elsevier, Transmission Techniques for Digital Communications, pp. 308-311, 2016.
[11] "Product page - USRP B210 Software Defined Radio", https://www.ettus.com/all-products/ub210-kit/ [Accessed: April 2020].
[12] "Product page - Spirent Vertex Channel Emulator", https://www.spirent.com/products/radio-frequency-and-wi-fi-channel-emulation-vertex [Accessed: April 2020].
[13] D. Chu, "Polyphase codes with good periodic correlation properties", IEEE Transactions on Information Theory, vol. 18, pp. 531-532, 1972.
[14] M. Leshno, V. Ya. Lin, A. Pinkus and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function", Neural Networks, vol. 6, pp. 861-867, 1993.
[15] A. Reuther, P. Michaleas, M. Jones, V. Gadepally, S. Samsi, and J. Kepner, "Survey and Benchmarking of Machine Learning Accelerators", arXiv:1908.11348, 2019.