Electronics 11 02100
Article
Signals Recognition by CNN Based on Attention Mechanism
Feng Tian, Li Wang * and Meng Xia
School of Communication and Information Engineering, Xi’an University of Science and Technology,
Xi’an 710054, China; [email protected] (F.T.); [email protected] (M.X.)
* Correspondence: [email protected]; Tel.: +86-029-1959-156-5212
Citation: Tian, F.; Wang, L.; Xia, M. Signals Recognition by CNN Based on Attention Mechanism. Electronics 2022, 11, 2100. https://doi.org/10.3390/electronics11132100
Received: 23 May 2022; Accepted: 2 July 2022; Published: 5 July 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Automatic modulation recognition (AMR) is a pivotal technique in non-collaborative communication: the receiver automatically recognizes the modulation type of a signal with limited or no prior information, providing the basis for subsequent signal extraction and processing [1,2]. With the development of software-defined radio technology and the increasing complexity of the electromagnetic environment, the noise and interference affecting wireless channels are steadily growing. Traditional modulation recognition techniques can no longer recognize communication signals effectively, which poses a severe test for modulation recognition technology. How to recognize the modulation of communication signals efficiently and accurately has therefore attracted increasing attention. Typical modulation recognition techniques can be broadly divided into three categories: likelihood-based (LB) methods grounded in decision theory, feature-based (FB) pattern recognition methods, and deep learning methods [3,4].
LB methods [5–9] make decisions by calculating the likelihood function of the received signal and comparing it to a threshold. Although LB methods can minimize the error rate, their high computational complexity makes them unsuitable for applications with unknown channels or with clock offsets caused by inaccurate internal clock sources at the transmitter and receiver [2]. FB methods [10–13] require manually computed features of the received signal, such as the mean, standard deviation, and kurtosis of the normalized-centered instantaneous amplitude, the normalized absolute instantaneous frequency, higher-order moments, higher-order cumulants, and cyclic moments. Although the computational complexity of these features is relatively low, their selection depends heavily on manual analysis, and it is difficult for them to characterize multiple modulation types in complex electromagnetic environments. AMR is therefore a very challenging task, especially when there is no prior information about the received signal in non-collaborative communication [1].
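The FB approach described above can be illustrated with a minimal sketch; the feature set below (amplitude kurtosis and the fourth-order cumulant C40) is illustrative rather than the paper's exact selection, and all parameter values are assumptions.

```python
import numpy as np

def fb_features(x):
    # Sketch of two hand-crafted FB features of a complex baseband signal:
    # kurtosis of the normalized-centered instantaneous amplitude, and the
    # fourth-order cumulant C40 = M40 - 3*M20^2.
    a = np.abs(x)
    a_cn = a / a.mean() - 1.0                      # normalized-centered amplitude
    kurtosis = np.mean(a_cn ** 4) / np.mean(a_cn ** 2) ** 2
    m20 = np.mean(x ** 2)                          # second-order moment M20
    m40 = np.mean(x ** 4)                          # fourth-order moment M40
    c40 = m40 - 3 * m20 ** 2                       # cumulant C40 (-2 for ideal BPSK)
    return kurtosis, c40

rng = np.random.default_rng(0)
symbols = 2 * rng.integers(0, 2, 4096) - 1         # BPSK symbols in {+1, -1}
noise = (rng.standard_normal(4096) + 1j * rng.standard_normal(4096)) / np.sqrt(2)
x = symbols + 0.1 * noise                          # noisy BPSK, roughly 20 dB SNR
k, c40 = fb_features(x)
```

Cumulants such as C40 are popular FB features precisely because they take distinct theoretical values for different constellations, yet (as the text notes) they degrade quickly in complex channel conditions.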
In recent years, due to the development of neural network units, such as hidden
layers and non-linear activation, deep neural networks have been particularly prominent
in image classification, machine translation, and natural language processing [14–20], as
deep learning models can extract deeper information hidden in the data. At present, deep
learning is also progressively being applied to wireless communication and radio signal
processing. In modulation recognition, deep learning methods obtain better performance
than FB methods. For instance, ref. [21] used a convolutional neural network (CNN) for
modulation recognition, which was shown experimentally to be close to the FB method
and had greater flexibility in detecting various modulation types. To further improve
performance, ref. [22] introduced a densely connected network (DenseNet) to deepen the
feature propagation in deep neural networks by creating shortcut paths between different
layers of the network. A convolutional long short-term deep neural network (CLDNN) was
introduced in [23], which exploits the complementary nature of CNN and LSTM to combine
the architectures of CNN and long short-term memory (LSTM) into deep neural networks.
The main difference between deep learning-based and traditional modulation recognition systems is that feature extraction is learned automatically by the neural network, which avoids the manual feature design process and is better suited to non-collaborative communication scenarios.
Existing methods often model complex electromagnetic environments and signal-to-noise ratios (SNRs) inaccurately; realistic channel SNRs may be unstable or change rapidly under certain circumstances. Although training on simulated and synthetic datasets is generally not favored in deep learning, radio communication is a special case: as the real, complex electromagnetic environment is quantified as closely as possible and simulation methods are refined, the gap between synthetic and real datasets will narrow, facilitating modulation recognition in complex electromagnetic environments. This paper simulates a complex electromagnetic environment by constructing channels containing additive white Gaussian noise (AWGN), Rician multipath fading, and clock offset. Modulated signals of 10 types with transmission impairments under different SNRs are synthesized into a dataset. A network model based on ResNext is then established: the residual blocks of the traditional ResNet are replaced with residual blocks of the same topology stacked in parallel, and a CBAM attention layer is introduced after each convolution block of ResNext. The impaired I/Q signals are used directly as input; by increasing the 'cardinality' dimension, the network extracts richer signal features and improves the accuracy of modulation recognition. Simulation results show that the established network outperforms other neural networks.
The remainder of this paper is organized as follows. Section 2 describes the construc-
tion of the signal recognition model. Section 3 discusses the experimental results. Section 4
provides the conclusions.
(Figure: signal recognition model — the received signal is preprocessed and features are extracted, via feature selection and SNR estimation in traditional AMR methods or via the ResNeXt+CBAM method established in this paper, followed by classification and demodulation.)
In an AWGN channel, the received signal r(t) can be expressed in Equation (1) as follows:
r(t) = s(t) + g(t)  (1)
where g(t) is the additive white Gaussian noise (AWGN), s(t) is the transmitted signal of a given modulation type, and the SNR is defined as Ps/Pn (Ps is the signal power and Pn is the noise power). The commonly used modulation methods are as follows:
When the transmit signal is a PSK or FSK signal, s(t) can be expressed in Equation (2)
as follows:
s(t) = [Am ∑n an n(t − nTs)] cos(2π(fc + fm)t + Ψ0 + Ψm)  (2)
where Am and an are the modulation amplitude and symbol sequence, respectively, n(t) is the signal pulse, and Ts denotes the symbol period. fc and fm denote the carrier frequency and modulation frequency, and Ψ0 and Ψm denote the initial phase and modulation phase, respectively.
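Equation (2) can be sketched numerically. The snippet below synthesizes a BPSK waveform as a special case (fm = Ψm = 0, rectangular pulse n(t)); the values of fc, Ts, and the sample rate are illustrative assumptions, not the paper's settings.

```python
import numpy as np

fs = 8000.0                     # sample rate (Hz), an assumed value
fc = 1000.0                     # carrier frequency fc (Hz), assumed
Ts = 0.004                      # symbol period Ts -> 32 samples per symbol
Am, psi0 = 1.0, 0.0             # modulation amplitude Am and initial phase Ψ0

rng = np.random.default_rng(1)
a = 2 * rng.integers(0, 2, 16) - 1        # symbol sequence a_n in {+1, -1}
sps = int(fs * Ts)                        # samples per symbol
t = np.arange(len(a) * sps) / fs
pulse_train = np.repeat(a, sps)           # Am * sum_n a_n n(t - n*Ts), rectangular n(t)
s = Am * pulse_train * np.cos(2 * np.pi * fc * t + psi0)   # Equation (2), fm = Ψm = 0
```

For FSK, the same template would vary fm per symbol instead of the amplitude; for PSK, Ψm carries the symbol information.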
When the transmitted signals are M-QAM signals, which differ slightly from PSK and FSK signals in that there are two quadrature carriers modulated by an and bn, respectively, s(t) can be expressed in Equation (3):
s(t) = [∑n an n(t − nTs)] cos(2πfc t + Ψ0) − [∑n bn n(t − nTs)] sin(2πfc t + Ψ0)  (3)
After determining the transmitted signal s(t), consider the actual radio wave propagation channel: the electromagnetic waves reach the receiver along different paths through reflections from multiple objects, creating a multipath effect. Since the transmission paths have different time delays and each propagation path changes with time, the interrelationship between the component fields involved in the interference also changes with time, causing random variations in the composite wave-field and hence fading of the total received field. In a multipath propagation scenario with one dominant path, the received signal follows the statistical model of a multipath channel whose impulse-response amplitude α(t) obeys a Rician distribution (Rician fading).
A clock offset is caused by inaccurate internal time sources at the transmitter and receiver. It causes the center frequency (used to down-convert the signal to baseband) and the digital-to-analog converter (DAC) sample rate to deviate from their ideal values. It is therefore necessary to apply a frequency offset f0 and a phase offset θ0 to the signal, derived from the clock offset factor and the center frequency.
To simulate the complex electromagnetic environment, Rician multipath fading, frequency offset, and phase offset are added to the channel. The received sampled signal r(t) can then be re-expressed in Equation (4):
r(t) = α(t) s(t) e^(j(2π f0 t + θ0)) + g(t)  (4)
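A simplified sketch of these impairments follows; a constant gain stands in for the time-varying Rician amplitude α(t), and the values of f0, θ0, and the SNR are illustrative assumptions.

```python
import numpy as np

def impair(s, fs, f0=50.0, theta0=0.3, snr_db=0.0, seed=0):
    """Apply Equation (4)-style impairments to a complex baseband signal:
    a fading gain, a frequency offset f0, a phase offset theta0, and AWGN
    at the requested SNR. Simplification: the paper uses time-varying Rician
    fading alpha(t); a fixed gain stands in for it here."""
    rng = np.random.default_rng(seed)
    n = np.arange(len(s))
    alpha = 0.9                                    # stand-in for alpha(t)
    x = alpha * s * np.exp(1j * (2 * np.pi * f0 * n / fs + theta0))
    p_sig = np.mean(np.abs(x) ** 2)
    p_noise = p_sig / 10 ** (snr_db / 10)          # SNR = Ps / Pn
    g = np.sqrt(p_noise / 2) * (rng.standard_normal(len(s))
                                + 1j * rng.standard_normal(len(s)))
    return x + g

s = np.exp(1j * 2 * np.pi * 0.05 * np.arange(1024))    # unit-power test tone
r = impair(s, fs=8000.0, snr_db=20.0)
```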
The purpose of modulation recognition is to determine P(s(t) ∈ N(i) | r(t)) after receiving the signal r(t), where N(i) denotes the i-th modulation; the goal is to recognize the modulation type i from the received signal r(t). For simplicity, the received signal is usually represented by its in-phase (I) and quadrature (Q) components, which correspond to the real and imaginary parts of r(t), respectively. For this purpose, the ResNext network based on the attention mechanism is used to learn the recognition task: the dataset is first processed to set the network parameters, and the recognition accuracy is then calculated on the test dataset.
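The I/Q representation described above can be sketched as a simple conversion from the complex received signal to a 2 × L real-valued array, the form in which the network consumes it:

```python
import numpy as np

def to_iq(r):
    # Stack the in-phase (real) and quadrature (imaginary) components
    # of a complex signal into a (2, L) array for the network input.
    return np.stack([r.real, r.imag])

r = np.exp(1j * 2 * np.pi * 0.1 * np.arange(128))   # unit-magnitude test signal
iq = to_iq(r)                                       # shape (2, 128)
```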
Figure 2. ResNet residual block (left) vs. ResNext residual block (right).
Figure 3. CBAM structure diagram: the input feature X passes through the channel attention module and then the spatial attention module to produce the refined feature.
The channel attention module, as shown in Figure 4a, compresses the feature map
in the spatial dimension and then operates after obtaining a one-dimensional vector. The
input feature maps are compressed by the MaxPooling layer and the MeanPooling layer,
and then sent to the shared fully connected layer. Then the results of the shared fully
connected layer are summed and activated by the activation function sigmoid to obtain the
final channel attention weights MC ( F ), which can be expressed in Equation (5).
MC(F) = σ(W1(W0(F^c_avg)) + W1(W0(F^c_max)))  (5)
where F is the input feature mapping, and W0 and W1 represent the weight matrices of the
hidden layer and the fully connected layer, respectively.
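Equation (5) can be sketched with plain NumPy. The ReLU between W0 and W1 follows the standard CBAM design (Equation (5) leaves it implicit), and the tensor shapes and reduction ratio below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """Equation (5) sketch: average- and max-pool F over the spatial axes,
    pass both vectors through the shared MLP (W0 then W1, with a ReLU in
    between as in standard CBAM), sum, and apply the sigmoid.
    F: (C, H, W); W0: (C//r, C); W1: (C, C//r) with reduction ratio r."""
    f_avg = F.mean(axis=(1, 2))                    # F^c_avg, shape (C,)
    f_max = F.max(axis=(1, 2))                     # F^c_max, shape (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)   # shared two-layer MLP
    return sigmoid(mlp(f_avg) + mlp(f_max))        # channel weights M_C(F)

rng = np.random.default_rng(2)
C, r = 8, 2                                        # assumed channel count / ratio
F = rng.standard_normal((C, 4, 4))
W0 = rng.standard_normal((C // r, C))
W1 = rng.standard_normal((C, C // r))
mc = channel_attention(F, W0, W1)
```

The resulting weights in (0, 1) are broadcast over each channel of F to re-weight it.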
The spatial attention module in Figure 4b can be regarded as channel compression: MeanPooling and MaxPooling are performed on the feature maps along the channel dimension. The two resulting single-channel feature maps are concatenated into a two-channel feature map, from which the spatial attention weight MS(F) is obtained; it can be expressed in Equation (6):
MS(F) = σ(f^(7×7)([F^s_avg; F^s_max]))  (6)
where σ is the sigmoid activation and f^(7×7) denotes a convolution with a 7×7 kernel.
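Equation (6) can be sketched similarly; the naive direct convolution and the random kernel below are for self-containment only and are not the trained parameters.

```python
import numpy as np

def spatial_attention(F, kernel):
    """Equation (6) sketch: pool F over the channel axis, stack the two maps
    into a 2-channel input, convolve with a k x k kernel (7 x 7 in the paper)
    using 'same' zero padding, and apply the sigmoid."""
    f_avg = F.mean(axis=0)                         # F^s_avg, shape (H, W)
    f_max = F.max(axis=0)                          # F^s_max, shape (H, W)
    stacked = np.stack([f_avg, f_max])             # two-channel feature map
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = f_avg.shape
    out = np.zeros((H, W))
    for i in range(H):                             # direct 2-D correlation
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return 1.0 / (1.0 + np.exp(-out))              # M_S(F), shape (H, W)

rng = np.random.default_rng(3)
F = rng.standard_normal((8, 6, 6))
Ms = spatial_attention(F, rng.standard_normal((2, 7, 7)))
```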
Figure 4. (a) Channel attention module: the input feature F is pooled by MaxPool and AvgPool, passed through the shared MLP, and summed to give MC(F).
Figure 5. Residual block structure: weight layers with ReLU activations and a shortcut connection computing F(x) + x.
Figure 6 shows the network structure designed for automatic feature extraction of
10 types of modulated signals. Based on ResNext, the CBAM module is introduced and
connected after each convolutional block of ResNext. It consists of four convolutional
modules and two fully connected layers; each convolutional module consists of a down-sampling layer, two ResNext residual blocks, and a MaxPooling layer. In each ResNext residual block, the activation values are batch-normalized at the end to avoid the vanishing gradients and slow network convergence caused by internal covariate shift during training. The original radio signal first enters a convolution module through the input layer; the convolution module passes the extracted features to the attention layer, which weights them, and the weighted features are passed to the next layer after cascade processing of the nodes. The final features are fed to fully connected
layers for subsequent classification. The first fully connected layer uses the SELU (scaled exponential linear unit) activation function, and the second uses the Softmax activation function with an output size of 10.
During the network training process, the cross-entropy loss function in Equation (7) is
chosen to evaluate the network.
Loss = −(1/N) ∑_{i=1}^{N} log(o_{M(i)})  (7)
where N represents the number of training samples and o_{M(i)} represents the predicted probability that the i-th sample belongs to its class M(i). The training process uses the Adam optimizer with back-propagation to update all network parameters (including the convolutional, attention, and fully connected layers).
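Equation (7) can be checked with a small numeric example; the softmax outputs and labels below are made up for illustration.

```python
import numpy as np

def cross_entropy(probs, labels):
    # Equation (7): average negative log of the probability the network
    # assigns to each sample's true class M(i).
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Softmax outputs for 3 samples over 4 classes, with their true labels.
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.10, 0.80, 0.05, 0.05],
                  [0.25, 0.25, 0.25, 0.25]])
labels = np.array([0, 1, 3])
loss = cross_entropy(probs, labels)
```

A confident correct prediction (0.8) contributes little to the loss, while the uniform third row contributes −log(0.25) ≈ 1.39, which is what drives the gradient updates.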
(Architecture sketch: input stem, down-sampling layer, ResNext residual blocks with cardinality = 16, a CBAM attention-weighting layer after each convolution block, fully connected layers with SELU, and the output layer.)
Figure 6. ResNext network model based on attention mechanism.
(Figure: dataset generation — random bits are modulated, the signal passes through a channel applying AWGN, Rician fading, and clock offset, and the impaired signal is stored as 2 × 128 I/Q samples.)
The dataset parameters are also set as close as possible to the realistically complex
electromagnetic environment. The sampling frequency f s affects the classification perfor-
mance only in the fading channel, and the Rician channel is modeled as a flat channel when
f s = 200 kHz. In the flat channel, the multipath structure of the channel allows the spectral
characteristics of the transmitted signal to be preserved on the receiver side. Taking into
account that the length of the signal sequence may have an impact on the results, the model
shown in Figure 6 is used to train on datasets with sequence lengths of 32, 128, and 256,
respectively. The over-sampling rate is 4. Other signal parameters are set as follows: the
SNR range is [−6, 4] dB, and the step size is 2 dB. The center frequencies of digital and analog
modulation types are 902 MHz and 100 MHz, respectively. A total of 10,000 data points are
generated for each modulation type at each SNR, of which 80% are used for training, 10%
for validation, and 10% for testing. Figure 8 shows the effect of different sequence lengths
on the recognition results, so the dataset with sequence length L = 128 was chosen as the
input to the modulation recognition model.
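The 80/10/10 split described above can be sketched as follows; the shuffling strategy is an assumption, since the paper does not specify it.

```python
import numpy as np

def split_indices(n, seed=0):
    # Shuffle n example indices and partition them 80% / 10% / 10%
    # into train, validation, and test sets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_tr, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]

# 10,000 data points per modulation type at each SNR, as in the text.
train, val, test = split_indices(10_000)
```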
(Figure 8: recognition accuracy for different sequence lengths.)
From the comparative performance plots of the various neural networks in Figure 9, it is clear that the recognition accuracy of all five networks increased with SNR, since there is less noise at high SNR. At high SNR, ResNext + CBAM and ResNext achieved more than 90% recognition accuracy, the best result among all models, owing to the residual connection method and the topology of ResNext's group convolution. The ResNext model with the CBAM attention layer was about 3% more accurate than the plain ResNext model, reflecting the benefit of the CBAM attention mechanism, which helps extract effective features from noise-contaminated data.
(Figure 9: recognition accuracy of the compared networks versus SNR.)