
Deep Neural Network Architectures for Modulation

Classification
Xiaoyu Liu, Diyu Yang, and Aly El Gamal
School of Electrical and Computer Engineering
Purdue University
Email: {liu1962, yang1467, elgamala}@purdue.edu

Abstract—In this work, we investigate the value of employing deep learning for the task of wireless signal modulation recognition. Recently in [1], a framework was introduced that generates a dataset using GNU Radio, mimicking the imperfections of a real wireless channel and covering 10 different modulation types. Further, a convolutional neural network (CNN) architecture was developed and shown to deliver performance that exceeds that of expert-based approaches. Here, we follow the framework of [1] and find deep neural network architectures that deliver higher accuracy than the state of the art. We tested the architecture of [1] and found it to achieve an accuracy of approximately 75% in correctly recognizing the modulation type. We first tune the CNN architecture of [1] and find a design with four convolutional layers and two dense layers that gives an accuracy of approximately 83.8% at high SNR. We then develop architectures based on the recently introduced ideas of Residual Networks (ResNet [2]) and Densely Connected Networks (DenseNet [3]) to achieve high SNR accuracies of approximately 83.5% and 86.6%, respectively. Finally, we introduce a Convolutional Long Short-term Deep Neural Network (CLDNN [4]) to achieve an accuracy of approximately 88.5% at high SNR.

I. INTRODUCTION

Signal modulation is an essential process in wireless communication systems. Modulation recognition tasks are generally used for both signal detection and demodulation. Signal transmission can proceed smoothly only when the receiver demodulates the signal correctly. However, with the fast development of wireless communication techniques and ever more demanding requirements, the number of modulation methods and parameters used in wireless communication systems is increasing rapidly. The problem of how to recognize modulation methods accurately is hence becoming more challenging.

Traditional modulation recognition methods usually require prior knowledge of signal and channel parameters, which can be inaccurate under mild circumstances and need to be delivered through a separate control channel. Hence, the need for autonomous modulation recognition arises in wireless systems, where modulation schemes are expected to change frequently as the environment changes. This leads to considering new modulation recognition methods based on deep neural networks.

Deep Neural Networks (DNN) have played a significant role in the research domains of video, speech, and image processing in the past few years. Recently, the idea of deep learning has been introduced to the area of communications by applying convolutional neural networks (CNN) to the task of radio modulation recognition [1].

The Convolutional Neural Network (CNN) has recently been identified as a powerful tool in image classification and voice signal processing, and there have also been successful attempts to apply it in other areas such as natural language processing and video detection. Based on its strong feature extraction performance, a simple CNN architecture was introduced in [1] for distinguishing between 10 different modulations. Simulation results show that the CNN not only delivers better accuracy, but also provides more flexibility than current-day expert-based approaches [1]. However, CNNs have been challenged by problems such as vanishing or exploding gradients and accuracy degradation after reaching a certain network depth. Attempts have been made to address these issues. Most notably, Residual Networks (ResNet) [2] and Densely Connected Networks (DenseNet) [3] were recently introduced to strengthen feature propagation in the neural network by creating shortcut paths between different layers of the network. A building block of a residual learning network can be expressed using the equation in Figure 1, where x and H(x) are the input and output of the block, respectively, and F is the residual mapping function to be trained. Since it may be hard to learn the identity mapping H(x) = x directly, the block instead learns the residual mapping F(x) = H(x) - x, which can be easier to learn [2]. By adding the bypass connection, an identity mapping is created, allowing the deep network to learn simple functions that would otherwise have required a shallower network.

Fig. 1: A building block of ResNet.

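As an illustration, the residual computation H(x) = F(x) + x of Figure 1 can be written in a few lines of Keras, the framework used for the experiments in Section II. This is only a minimal sketch of the building block: the filter counts, kernel sizes, and the 2x128 channels-last input shape are illustrative assumptions, not the exact configuration of the ResNet evaluated later.

from keras.layers import Input, Conv2D, Activation, add
from keras.models import Model

# One 2x128 I/Q example with a trailing channel axis (channels-last assumed).
x = Input(shape=(2, 128, 1))
# Residual mapping F(x): two small convolutions (filter counts are assumptions).
f = Conv2D(64, (1, 3), padding='same', activation='relu')(x)
f = Conv2D(1, (1, 3), padding='same')(f)     # project back to the input's channel count
# H(x) = F(x) + x through the identity shortcut, followed by a non-linearity.
h = Activation('relu')(add([x, f]))
residual_block = Model(inputs=x, outputs=h)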


Recently, a Convolutional Long Short-term Deep Neural Network (CLDNN) has been introduced in [4], which combines the architectures of CNNs and Long Short-Term Memory (LSTM) into one deep neural network by taking advantage of the complementarity of CNNs, LSTMs, and DNNs [4]. The LSTM unit is a memory unit of a Recurrent Neural Network (RNN). RNNs are neural networks with memory that are suitable for learning sequence tasks such as speech recognition and handwriting recognition. LSTM mitigates the vanishing gradient problem of RNNs by using a forget gate in its memory cell, which enables the learning of long-term dependencies.

Because traditional models of the wireless channel may not be accurate, in our experiments we use the RadioML2016.10b dataset generated in [1] as the input dataset. The data is generated with GNU Radio in a way that captures various channel imperfections that are present in a real system. In this paper, we develop architectures of ResNet, DenseNet, and CLDNN for the modulation recognition task. Using the same dataset generated in [1], we achieve a roughly 13.5% accuracy improvement at high SNR over the state-of-the-art architecture presented in [1]. The improvements in accuracy are believed to come from better spatial and temporal feature extraction.

II. SIMULATION SETUP

We use the RadioML2016.10b dataset generated in [1] as the input data of our research. Details about the generation of this dataset can be found in [5]. The dataset contains 10 types of modulations: eight digital and two analog. These consist of BPSK, QPSK, 8PSK, QAM16, QAM64, BFSK, CPFSK, and PAM4 for the digital modulations, and WBFM and AM-DSB for the analog modulations. For the digital modulations, the entire Gutenberg works of Shakespeare in ASCII are used as the source data, with whitening randomizers applied to ensure equiprobable symbols and bits. For the analog modulations, a continuous voice signal is used as input data, consisting primarily of acoustic voice speech with some interludes and off times. Each example in the dataset is a 128-sample complex time-domain vector generated in GNU Radio. 160,000 examples are segmented into training and testing datasets through a 128-sample rectangular windowing process, similar to the windowed continuous acoustic voice signal in voice recognition tasks. The training examples, each consisting of 128 samples, are fed into the neural network as 2x128 vectors, with the real and imaginary parts of the complex time samples separated. The labels of the input data include the SNR ground truth and the modulation type. The SNR of the samples is uniformly distributed from -20 dB to +18 dB. All training and testing are done in Keras on an NVIDIA M60 GPU. We use the Adam optimizer [6] in Keras, with Theano as the back end.
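To make the input format concrete, the sketch below shows one way the 2x128 examples and their (modulation, SNR) labels can be assembled into training and testing sets. It assumes the publicly distributed pickle format of RadioML2016.10b, a dictionary keyed by (modulation, SNR) pairs; the file name and the 60/40 split (matching the 96,000/64,000 example counts quoted in Section II-B) are otherwise illustrative.

import pickle
import numpy as np

# Assumed public pickle format: {(modulation, snr): array of shape (N, 2, 128)}.
with open('RML2016.10b.dat', 'rb') as f:     # hypothetical local path to the dataset
    data = pickle.load(f, encoding='latin1')

mods = sorted({mod for (mod, snr) in data.keys()})
X, y_mod, y_snr = [], [], []
for (mod, snr), vectors in data.items():     # rows of each vector: real and imaginary parts
    X.append(vectors)
    y_mod += [mods.index(mod)] * len(vectors)
    y_snr += [snr] * len(vectors)
X = np.vstack(X)[..., np.newaxis]            # shape (total, 2, 128, 1), channels-last
y_mod, y_snr = np.array(y_mod), np.array(y_snr)

# 60/40 split into training and testing/validation indices.
idx = np.random.permutation(len(X))
n_train = int(0.6 * len(X))
train_idx, test_idx = idx[:n_train], idx[n_train:]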
A. Evaluation Network

We start with a convolutional neural network architecture similar to the CNN2 network from [1], which performs blind temporal learning using a two-convolutional-layer deep neural network and achieves an accuracy of 75% at high SNR, a better performance than current-day expert approaches [1]. Our training is based on several neural network architectures: the convolutional neural network (CNN), the densely connected convolutional network (DenseNet) [3], the residual network (ResNet) [2], and the convolutional long short-term deep neural network (CLDNN) [4]. For the CNN, we optimized the following hyper-parameters: learning rate, dropout rate, filter size, number of filters, and network depth. We tried different combinations of convolutional layer sequences and filter numbers in each layer to get the best accuracy. We also develop deeper networks by adding more convolutional layers to the CNN2 model, and obtain the best accuracy from the architecture shown in Figure 2, where four convolutional layers are followed by two dense layers. The first parameter below each convolutional layer represents the number of filters in that layer, while the second and third numbers show the size of each filter. The two dense layers have 128 and 11 neurons, in order of their depth in the network.

Fig. 2: Architecture of the seven-layer CNN.
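A sketch of this four-convolutional-layer, two-dense-layer design in Keras follows. Since Figure 2 is not reproduced here, the per-layer filter counts and kernel sizes are assumptions chosen to follow the pattern of larger filters near the input; the 128-unit dense layer and the 0.6 dropout rate (Section II-B) come from the text, and the softmax output has one neuron per modulation class.

from keras.models import Sequential
from keras.layers import Conv2D, Dropout, Flatten, Dense

# Four convolutional layers followed by two dense layers (cf. Fig. 2).
# Filter counts and kernel sizes are assumptions; mods comes from the loading sketch above.
cnn = Sequential([
    Conv2D(256, (1, 3), padding='same', activation='relu', input_shape=(2, 128, 1)),
    Dropout(0.6),
    Conv2D(128, (2, 3), padding='same', activation='relu'),
    Dropout(0.6),
    Conv2D(80, (1, 3), padding='same', activation='relu'),
    Dropout(0.6),
    Conv2D(80, (1, 3), padding='same', activation='relu'),
    Dropout(0.6),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(len(mods), activation='softmax'),   # one output per modulation class
])
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])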
Inspired by the winning architecture of ImageNet 2015 [2], we apply the ResNet approach and test architectures with an increasing number of convolutional layers, up to 8. We obtain the best classification accuracy from the four-convolutional-layer ResNet architecture shown in Figure 3, in which the output of the first layer is forwarded to the layer two levels deeper. This structure alleviates the vanishing gradient problem by explicitly letting each few stacked layers fit a residual mapping [2]. The hyper-parameters are chosen according to the basic observation, made from simple CNN architectures, that having larger filters close to the input layer followed by smaller filters close to the output layer leads to a significant accuracy improvement.

Fig. 3: Architecture of the seven-layer ResNet.

DenseNet improves the information flow between layers further than ResNet does, as each layer obtains additional inputs from all preceding layers and passes its own feature maps on to all subsequent layers [3]. Our DenseNet architecture is illustrated in Figure 4, with four convolutional layers densely connected with each other and the output fed into two dense layers. We set the parameters of the convolutional layers to achieve the best accuracy.

Fig. 4: Architecture of the seven-layer DenseNet.
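The dense connectivity pattern can be sketched with Keras's functional API as follows: every convolutional layer receives the concatenation of the feature maps of all preceding layers. The filter counts are again assumptions, since the exact values from Figure 4 are not reproduced here.

from keras.layers import Input, Conv2D, concatenate, Flatten, Dense
from keras.models import Model

inputs = Input(shape=(2, 128, 1))
features = [inputs]
# Four densely connected convolutional layers: each sees all earlier feature maps.
for n_filters in (256, 128, 80, 80):        # assumed filter counts
    x = features[0] if len(features) == 1 else concatenate(features, axis=-1)
    features.append(Conv2D(n_filters, (1, 3), padding='same', activation='relu')(x))
x = Flatten()(concatenate(features, axis=-1))
x = Dense(128, activation='relu')(x)
outputs = Dense(len(mods), activation='softmax')(x)
densenet = Model(inputs=inputs, outputs=outputs)
densenet.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])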
We finally propose a CLDNN architecture that includes long short-term memory units. CLDNNs are mainly used in voice processing tasks that involve raw time-domain waveforms [4].
It is a combination of CNNs, long short-term memory (LSTM), and deep neural networks (DNN). In our setting, we choose four convolutional layers in the CNN part, followed by one LSTM layer with 50 computing units and a two-layer DNN (see Figure 5). We tested CLDNN architectures with different numbers of memory cells in the LSTM layer. Our experiments show that an LSTM layer with 50 cells gives the best accuracy compared to other layer settings.

Fig. 5: Architecture of the eight-layer CLDNN.
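In Keras, the CLDNN just described can be sketched as follows: four convolutional layers, a reshape of the feature maps into a 128-step sequence, one LSTM layer with 50 memory cells, and a final stage taken here as two dense layers. The filter counts and the exact composition of the final stage are assumptions, since Figure 5 is not reproduced here.

from keras.models import Sequential
from keras.layers import Conv2D, Dropout, Permute, Reshape, LSTM, Dense

cldnn = Sequential([
    Conv2D(256, (1, 3), padding='same', activation='relu', input_shape=(2, 128, 1)),
    Conv2D(256, (2, 3), padding='same', activation='relu'),
    Conv2D(80, (1, 3), padding='same', activation='relu'),
    Conv2D(80, (1, 3), padding='same', activation='relu'),
    Dropout(0.6),
    Permute((2, 1, 3)),           # put the 128 time steps on the sequence axis
    Reshape((128, 2 * 80)),       # (time steps, features) expected by the LSTM
    LSTM(50),                     # 50 memory cells, the best setting in our experiments
    Dense(128, activation='relu'),
    Dense(len(mods), activation='softmax'),
])
cldnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])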
B. Training Complexity

The computation time using one NVIDIA M60 GPU for 96,000 training examples and 64,000 validation and testing examples varies significantly between models. The simplest model, with only two convolutional layers in the CNN, takes approximately 15 seconds per epoch, while the CNN with four convolutional layers takes approximately 400 seconds per epoch. We note that a high dropout rate may slow down training but reduces overfitting. In our setting, we set the dropout rate to 0.6, which is higher than the rate used in [1], and the activation function in each hidden layer is a Rectified Linear Unit (ReLU). We set the patience, the number of epochs during which a non-converging validation loss is tolerated, to 10 when there are three or four convolutional layers and get a total training time of around half an hour. When the network becomes deeper, it starts to take more than 10 training epochs for the validation loss to decrease, so setting the patience to 20 produces a smaller validation loss, which means higher accuracy. To get better results, we set the patience to 20 in the remaining models. Training then takes approximately 1000 seconds per epoch in all three models, and the total training time is approximately 70 hours for the DenseNet model, 20 hours for the ResNet model, and 50 hours for the CLDNN model.
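In Keras terms, the patience setting described above corresponds to an EarlyStopping callback that monitors the validation loss. A sketch of the training call is given below, reusing the arrays from the loading sketch in Section II and the cnn model from Section II-A; the batch size and the epoch budget are assumptions not stated in the text, and the same call applies to the other models.

from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.utils import to_categorical

callbacks = [
    EarlyStopping(monitor='val_loss', patience=20),   # patience 10 for the shallower CNNs
    ModelCheckpoint('best_weights.h5', monitor='val_loss', save_best_only=True),
]
history = cnn.fit(
    X[train_idx], to_categorical(y_mod[train_idx], len(mods)),
    validation_split=0.25,            # hold out part of the training data for validation
    batch_size=1024, epochs=100,      # assumed budget; early stopping usually ends sooner
    callbacks=callbacks,
)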
III. RESULTS

A. Convolutional Neural Network

We start with a basic two-convolutional-layer neural network, in which two convolutional layers with 256 1x3 filters and 80 2x3 filters, respectively, are followed by two dense layers. We then explore the effect of different filter settings by exchanging the filter settings between the two convolutional layers. The performance of networks with different filter settings demonstrates that architectures with larger filters in earlier convolutional layers and smaller filters in deeper convolutional layers optimize the accuracy at high SNR.

Fig. 6: Varying hyper-parameters in CNN. Accuracies at lower SNR are similar; the four-convolutional-layer architecture delivers an accuracy of 83.8% at high SNR.
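Accuracy-versus-SNR curves such as those behind Figure 6 (and Figure 8 later) can be obtained by grouping the test examples by their SNR label. A brief sketch, reusing the arrays from Section II and the cnn model above, is given below.

import numpy as np

# Classification accuracy as a function of SNR on the held-out test indices.
accuracy_per_snr = {}
for snr in sorted(set(y_snr[test_idx])):
    subset = test_idx[y_snr[test_idx] == snr]
    predictions = cnn.predict(X[subset]).argmax(axis=1)
    accuracy_per_snr[snr] = float(np.mean(predictions == y_mod[subset]))

for snr, acc in sorted(accuracy_per_snr.items()):
    print('SNR %+d dB: accuracy %.3f' % (snr, acc))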

Next, we explore the optimal depth of the CNN by increasing the number of convolutional layers from 2 to 5. We find that the best accuracy at high SNR is approximately 83.8%, obtained with the four-convolutional-layer architecture shown in Figure 2. This is a significant improvement of 8.8% over the two-convolutional-layer model. Because lower loss corresponds to higher accuracy, a smoothly decreasing loss indicates that the network is learning well, as it does for the four-convolutional-layer model.
When the neural network gets deeper, it becomes less likely for the validation loss to converge. For the five- and six-convolutional-layer models, large loss oscillations appear early during training, which means that the minimum losses achieved by these networks are larger than that of the four-convolutional-layer model, leading to poorer classification performance.

B. Residual Network

We find that combining a residual network with the original CNN architecture demonstrates performance similar to that of the pure CNN architecture. As with the CNN, the best performance of 83.5% is achieved when we combine ResNet with a four-convolutional-layer neural network, as shown in Figure 3. Recognition accuracy also starts to decrease when we combine ResNet with a network architecture that has more than four convolutional layers.

C. Densely Connected Network

Because more densely connected blocks require a deeper neural network, which in our experiments did result in accuracy degradation, we implement DenseNet on CNN architectures with only one densely connected block. We start with a three-convolutional-layer DenseNet and keep adding convolutional layers to the network until the accuracy starts to descend. We achieve a best accuracy of 86.6% (see Figure 7) at high SNR using the four-convolutional-layer architecture shown in Figure 4.

Fig. 7: Best performance at high SNR is achieved with a four-convolutional-layer DenseNet.

D. CLDNN

CLDNN has been widely used in recognition tasks that involve time-domain signals such as videos, speech, and images, as its inherent memory property leads to recognizing temporal correlations in the input signal. Recent work has also suggested the use of CLDNN for modulation recognition tasks [7]. However, neither the network architecture nor the obtained accuracy results were clearly specified in [7], and hence it was not feasible to reproduce those results and compare ours with them. We applied the CLDNN architecture and compared its performance with the results demonstrated by ResNet and DenseNet. We added an LSTM unit into the network after the convolutional part. We believe that the recurrent connections extract more relevant temporal features in the signal. The results of CLDNN, shown in Figure 8, do outperform the other models. The accuracy at high SNR reaches 88.5%, the highest among all tested neural network architectures.

Fig. 8: Classification performance comparison between candidate architectures. CLDNN and DenseNet outperform other models with best accuracies of 88.5% and 86.6%, respectively.

Fig. 9: The confusion matrix of CLDNN at SNR = 18 dB.

TABLE I: Significant modulation type misclassification at high SNR for the proposed CLDNN architecture

Misclassification    Percentage (%)
8PSK/QPSK            5.5
QAM16/QAM64          58.48
QAM64/QAM16          20.14
WBFM/AM-DSB          59.6
WBFM/GFSK            3.3
In Figure 9, we show the classification results of the highest SNR case in a confusion matrix. There are two main discrepancies besides the clean diagonal in the matrix: WBFM being misclassified as AM-DSB, and QAM16 being misclassified as QAM64. Details of the misclassification effects on accuracy are listed in Table I, where the number in the percentage column represents the percentage of the left-hand modulation type that is misclassified as the modulation type on the right-hand side.
A small portion of 8PSK samples are misclassified as QPSK and a small portion of WBFM samples are misclassified as GFSK; we expect that further optimizing the neural network architecture, and possibly increasing its depth, would lead to capturing these subtle feature differences. We further notice that QAM16 and QAM64 are likely to be misclassified as each other, since their similar constellation diagrams make the differentiation vulnerable to small noise in the signal. We expect that appropriate pre-processing of the input signal can help alleviate these large misclassification percentages. A large discrepancy also exists in WBFM classification, which is likely to be recognized as AM-DSB. We believe that this discrepancy is probably due to the silence periods where only the carrier tone exists in the analog voice signal.
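The confusion matrix of Figure 9 and the percentages of Table I can be reproduced from a trained model roughly as follows. This sketch reuses the arrays and the cldnn model from the earlier snippets, uses scikit-learn only for convenience, and assumes the modulation-name strings match the dataset keys.

import numpy as np
from sklearn.metrics import confusion_matrix

# Test examples at the highest SNR (18 dB).
high_snr = test_idx[y_snr[test_idx] == 18]
y_true = y_mod[high_snr]
y_pred = cldnn.predict(X[high_snr]).argmax(axis=1)

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(mods))))
rates = cm / cm.sum(axis=1, keepdims=True)         # row-normalized misclassification rates

def misclassification(true_mod, predicted_mod):
    """Percentage of true_mod examples labelled as predicted_mod."""
    return 100.0 * rates[mods.index(true_mod), mods.index(predicted_mod)]

print('WBFM -> AM-DSB: %.1f%%' % misclassification('WBFM', 'AM-DSB'))
print('QAM16 -> QAM64: %.1f%%' % misclassification('QAM16', 'QAM64'))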
IV. DISCUSSION

By creating shortcuts between different layers, the ResNet and DenseNet architectures alleviate the vanishing gradient problem and promote feature reuse. Comparing the performances of ResNet and DenseNet in Figure 8, we notice that DenseNet demonstrates significantly better performance than ResNet by including more shortcut connections in the network, which further strengthens feature propagation throughout the network.

Fig. 10: Validation loss descends quickly in all three models, but the losses of DenseNet and ResNet reach a plateau earlier than that of CNN.

Although the ResNet and DenseNet architectures also suffer from accuracy degradation when the network grows deeper than the optimal depth, our experiments show that, at the same network depth, DenseNet and ResNet converge faster than plain CNN architectures. Figure 10 shows the validation errors of ResNet, DenseNet, and CNN of the same network depth with respect to the number of training epochs. We can see that ResNet and DenseNet start at significantly lower validation errors and maintain a lower validation error throughout the whole training process, meaning that incorporating ResNet and DenseNet connections into a plain CNN architecture does make neural networks more efficient to train for the considered modulation classification task.

We finally applied the CLDNN architecture and obtained through it the best performance among all tested network architectures. We believe that the good performance of CLDNN is due to its long-term memory ability, which is suitable for the causal structure of time-domain radio signals.

V. CONCLUSION

Multiple state-of-the-art deep neural networks were applied to the radio modulation recognition task. We explored signal feature extraction by adding convolutional layers, various kinds of residual layers, and recurrent layers to a deep neural network architecture. A Convolutional Long Short-term Deep Neural Network (CLDNN) was found to deliver the best classification performance, improving the accuracy by approximately 13.5% over the original CNN model introduced in [1]. We believe that the causality of radio time-domain signals leads to this improvement, since recurrent networks are known to perform well for continuous acoustic signal processing tasks. The residual and densely connected networks (ResNet and DenseNet) also perform well, although their best accuracy is limited by the network depth; they suggest that changing the connections between layers, and especially creating shortcuts between non-consecutive layers, may produce better classification accuracy.

REFERENCES

[1] T. J. O'Shea, L. Pemula, D. Batra, and T. C. Clancy, "Radio transformer networks: Attention models for learning to synchronize in wireless systems," in 2016 50th Asilomar Conference on Signals, Systems and Computers, pp. 662-666, IEEE, 2016.
[2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, abs/1512.03385, 2015.
[3] G. Huang, Z. Liu, K. Q. Weinberger, et al., "Densely connected convolutional networks," arXiv preprint arXiv:1608.06993, 2016.
[4] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4580-4584, April 2015.
[5] T. J. O'Shea and N. West, "Radio machine learning dataset generation with GNU Radio," Proceedings of the GNU Radio Conference, vol. 1, no. 1, Sep. 2016.
[6] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/1412.6980, 2014. [Online]. Available: http://arxiv.org/abs/1412.6980
[7] N. E. West and T. O'Shea, "Deep architectures for modulation recognition," in 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Piscataway, NJ, 2017, pp. 1-6.
