Deep Neural Network Architectures For Modulation Classification
Xiaoyu Liu, Diyu Yang, and Aly El Gamal
School of Electrical and Computer Engineering
Purdue University
Email: {liu1962, yang1467, elgamala}@purdue.edu
Abstract—In this work, we investigate the value of employing deep learning for the task of wireless signal modulation recognition. Recently in [1], a framework has been introduced by generating a dataset using GNU radio that mimics the imperfections in a real wireless channel, and uses 10 different modulation types. Further, a convolutional neural network (CNN) architecture was developed and shown to deliver performance that exceeds that of expert-based approaches. Here, we follow the framework of [1] and find deep neural network architectures that deliver higher accuracy than the state of the art. We tested the architecture of [1] and found it to achieve an accuracy of approximately 75% in correctly recognizing the modulation type. We first tune the CNN architecture of [1] and find a design with four convolutional layers and two dense layers that gives an accuracy of approximately 83.8% at high SNR. We then develop architectures based on the recently introduced ideas of Residual Networks (ResNet [2]) and Densely Connected Networks (DenseNet [3]) to achieve high SNR accuracies of approximately 83.5% and 86.6%, respectively. Finally, we introduce a Convolutional Long Short-term Deep Neural Network (CLDNN [4]) to achieve an accuracy of approximately 88.5% at high SNR.

I. INTRODUCTION

... convolutional neural networks (CNN) to the task of radio modulation recognition [1].

The Convolutional Neural Network (CNN) has recently been identified as a powerful tool in image classification and voice signal processing. There have also been successful attempts to apply this method in other areas such as natural language processing and video detection. Based on its strong performance in feature extraction, a simple CNN architecture was introduced in [1] for distinguishing between 10 different modulations. Simulation results show that the CNN not only demonstrates better accuracy, but also provides more flexibility compared to current expert-based approaches [1]. However, CNN has been challenged with problems like ...
... forms [4]. It is a combination of CNNs, long short-term memory (LSTM), and deep neural networks (DNN). In our setting, we choose four convolutional layers in the CNN, followed by one LSTM layer with 50 computing units and a two-convolutional-layer DNN (see Figure 5). We tested CLDNN architectures with different numbers of memory cells in the LSTM layer. Our experiments show that an LSTM layer with 50 cells gives the best accuracy compared to other layer settings.

Fig. 3: Architecture of seven-layer ResNet.

Fig. 4: Architecture of seven-layer DenseNet.

Fig. 5: Architecture of eight-layer CLDNN.
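As a concrete illustration of the CLDNN described above, the following tf.keras sketch stacks four convolutional layers, an LSTM layer with 50 cells, and a small dense classifier. Only the layer counts and the LSTM width come from the text; the filter counts, kernel sizes, dropout placement, and the 2x128x1 I/Q input shape (the GNU Radio dataset format of [1], [5]) are illustrative assumptions, not the authors' exact configuration.

from tensorflow.keras import layers, Model

def build_cldnn(num_classes=10):
    inputs = layers.Input(shape=(2, 128, 1))
    x = inputs
    # Four convolutional layers extract local features from the I/Q samples
    # (filter counts and kernel sizes below are assumptions).
    for filters, kernel in [(256, (2, 3)), (256, (1, 3)), (80, (1, 3)), (80, (1, 3))]:
        x = layers.Conv2D(filters, kernel, padding='same', activation='relu')(x)
        x = layers.Dropout(0.6)(x)          # dropout rate taken from the text
    # Flatten the 2D feature maps into a sequence so the LSTM sees time steps.
    x = layers.Reshape((-1, 80))(x)
    x = layers.LSTM(50)(x)                  # 50 memory cells, as in the text
    x = layers.Dense(128, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return Model(inputs, outputs)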
B. Training Complexity

The computation time using one NVIDIA M60 GPU for 96,000 training examples and 64,000 validation and testing examples varies significantly across models. The simplest model, with only two convolutional layers in the CNN, takes approximately 15 seconds per epoch, while the CNN with four convolutional layers takes approximately 400 seconds per epoch. We note that a high dropout rate may slow down training but reduces overfitting. In our setting, we set the dropout rate to 0.6, which is higher than the rate used in [1], and the activation function in each hidden layer is a Rectified Linear Unit (ReLU). We set patience, the period during which a non-converging validation loss is tolerated, to 10 when there are three and four convolutional layers, and obtain a total training time of around half an hour. When the network becomes deeper, it starts to take more than 10 training epochs for the validation loss to decrease, so setting patience to 20 produces a smaller validation loss, which means higher accuracy. To get better results, we set patience to 20 in the remaining models. Training takes approximately 1000 seconds per epoch in all three models. The total training time is approximately 70 hours for the DenseNet model, 20 hours for the ResNet model, and 50 hours for the CLDNN model.
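The training setup described above can be sketched as follows. The dropout rate of 0.6 (applied inside the model builders) and the early-stopping patience of 10 or 20 epochs on the validation loss come from the text; the Adam optimizer (the method of reference [6]), the batch size, and the epoch cap are assumptions made for illustration.

import tensorflow as tf

def train(model, x_train, y_train, x_val, y_val, patience=20):
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Stop when the validation loss has not improved for `patience` epochs
    # and roll back to the best weights observed so far.
    early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  patience=patience,
                                                  restore_best_weights=True)
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=200, batch_size=1024,   # assumed values
                     callbacks=[early_stop])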
III. RESULTS

A. Convolutional Neural Network

We start with a basic two-convolutional-layer neural network, in which two convolutional layers with 256 1x3 filters and 80 2x3 filters, respectively, are followed by two dense layers. We then explore the effect of different filter settings by exchanging the filter settings between the two convolutional layers. The performance of networks with different filter settings demonstrates that architectures with larger filters in earlier convolutional layers and smaller filters in deeper convolutional layers optimize the accuracy at high SNR.
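A minimal sketch of this baseline is given below. The 256 1x3 and 80 2x3 filter settings and the two dense layers come from the text; the 2x128x1 input shape, padding, and the width of the first dense layer are assumptions. Reversing filter_specs reproduces the exchanged-filter comparison.

from tensorflow.keras import layers, Model

def build_cnn2(filter_specs=((256, (1, 3)), (80, (2, 3))), num_classes=10):
    inputs = layers.Input(shape=(2, 128, 1))
    x = inputs
    for filters, kernel in filter_specs:
        x = layers.Conv2D(filters, kernel, padding='same', activation='relu')(x)
        x = layers.Dropout(0.6)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation='relu')(x)              # first dense layer (width assumed)
    outputs = layers.Dense(num_classes, activation='softmax')(x)  # second dense layer
    return Model(inputs, outputs)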
Next, we explore the optimal depth of the CNN by increasing the number of convolutional layers from 2 to 5. We find that the best accuracy at high SNR is approximately 83.8%, obtained with the four-convolutional-layer architecture shown in Figure 2. This is a significant improvement of 8.8% over the two-convolutional-layer model. Since lower loss corresponds to higher accuracy, a smoothly decreasing loss indicates that the network is learning well, as it does for the four-convolutional-layer model. When the neural network gets deeper, it becomes less likely for the validation loss to converge. For the five- and six-convolutional-layer models, large loss oscillations appear early during training, which means that the minimum losses achieved by these networks are larger than that of the four-convolutional-layer model, leading to poor classification performance.
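The depth sweep can be sketched with a builder parameterized by the number of convolutional layers; the sweep range of 2 to 5 layers is from the text, while the filter schedule beyond the first two layers is an assumption.

from tensorflow.keras import layers, Model

def build_cnn(depth, num_classes=10):
    filters = [256, 256, 80, 80, 80][:depth]   # assumed filter schedule
    inputs = layers.Input(shape=(2, 128, 1))
    x = inputs
    for i, f in enumerate(filters):
        kernel = (2, 3) if i == depth - 1 else (1, 3)
        x = layers.Conv2D(f, kernel, padding='same', activation='relu')(x)
        x = layers.Dropout(0.6)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return Model(inputs, outputs)

# One candidate model per depth, to be trained and compared on validation accuracy.
candidates = {depth: build_cnn(depth) for depth in range(2, 6)}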
B. Residual Network
We find that combining a residual network with the original CNN architecture yields performance similar to that of the pure CNN architecture. As with the CNN, the best performance of 83.5% is achieved when we combine ResNet with a four-convolutional-layer network, as shown in Figure 3. Recognition accuracy also starts to decrease when we combine ResNet with a network architecture that has more than four convolutional layers.
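One way such a combination can look is sketched below, in the spirit of ResNet [2]: an identity shortcut is added around two of the four convolutional layers. The exact placement of the shortcut, the filter sizes, and the classifier head of the paper's seven-layer ResNet (Figure 3) are not specified in the text, so they are assumptions here.

from tensorflow.keras import layers, Model

def build_resnet(num_classes=10):
    inputs = layers.Input(shape=(2, 128, 1))
    x = layers.Conv2D(256, (1, 3), padding='same', activation='relu')(inputs)
    shortcut = x
    # Two conv layers whose output is added back to the shortcut path,
    # letting gradients bypass them during backpropagation.
    y = layers.Conv2D(256, (1, 3), padding='same', activation='relu')(x)
    y = layers.Conv2D(256, (1, 3), padding='same')(y)
    x = layers.Activation('relu')(layers.Add()([shortcut, y]))
    x = layers.Conv2D(80, (2, 3), padding='same', activation='relu')(x)
    x = layers.Dropout(0.6)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return Model(inputs, outputs)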
C. Densely Connected Network
Because more densely connected blocks require a deeper neural network, which in our experiments did result in accuracy degradation, we implement DenseNet on CNN architectures with only one densely connected block. We start with a three-convolutional-layer DenseNet and keep adding convolutional layers to the network until the accuracy starts to descend. We achieve a best accuracy of 86.6% (see Figure 7) at high SNR using the four-convolutional-layer architecture shown in Figure 4.
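A sketch of a single densely connected block in the spirit of DenseNet [3] is shown below: each convolutional layer receives the concatenation of all preceding feature maps, which is what creates the additional shortcut connections discussed later. The four-convolutional-layer depth follows the text; the kernel sizes, growth rate, and classifier head are assumptions.

from tensorflow.keras import layers, Model

def build_densenet(num_classes=10, num_conv_layers=4, growth=80):
    inputs = layers.Input(shape=(2, 128, 1))
    features = [inputs]
    for _ in range(num_conv_layers):
        # Dense connectivity: concatenate every earlier output along channels.
        x = layers.Concatenate(axis=-1)(features) if len(features) > 1 else features[0]
        x = layers.Conv2D(growth, (1, 3), padding='same', activation='relu')(x)
        features.append(x)
    x = layers.Concatenate(axis=-1)(features)
    x = layers.Dropout(0.6)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return Model(inputs, outputs)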
Fig. 7: Best performance at high SNR is achieved with a four-convolutional-layer DenseNet.

D. CLDNN

CLDNN has been widely used in recognition tasks that involve time domain signals such as video, speech, and images, as its inherent memory property leads to recognizing temporal correlations in the input signal. Recent work has also suggested the use of CLDNN for modulation recognition tasks [7]. However, neither the network architecture nor the obtained accuracy results were clearly specified in [7], and hence it was not feasible to reproduce those results and compare ours with them. We applied the CLDNN architecture and compared its performance with the results demonstrated by ResNet and DenseNet. We added an LSTM unit into the network after the convolutional part. We believe that the cyclic connections extract more relevant temporal features in the signal. The results of CLDNN, shown in Figure 8, do outperform the other models. The accuracy at high SNR reaches 88.5%, the highest among all tested neural network architectures.

Fig. 8: Classification performance comparison between candidate architectures. CLDNN and DenseNet outperform other models with best accuracies of 88.5% and 86.6%, respectively.
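Comparisons such as Figure 8 report accuracy as a function of SNR. A minimal sketch of that evaluation is given below, assuming the test set carries a per-example SNR label; the array names (x_test, y_test one-hot, snr_test in dB) are assumptions about how the dataset of [1], [5] is organized.

import numpy as np

def accuracy_vs_snr(model, x_test, y_test, snr_test):
    accs = {}
    preds = np.argmax(model.predict(x_test), axis=1)
    truth = np.argmax(y_test, axis=1)
    for snr in np.unique(snr_test):
        mask = (snr_test == snr)
        # Classification accuracy over the subset of test samples at this SNR.
        accs[float(snr)] = float(np.mean(preds[mask] == truth[mask]))
    return accs  # e.g. the value at the highest SNR is the "high SNR" accuracy quoted above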
In Figure 9, we show the classification results for the highest SNR case in a confusion matrix. There are two main discrepancies besides the clean diagonal in the matrix: WBFM being misclassified as AM-DSB, and QAM16 being misclassified as QAM64. Details of the misclassification effects on accuracy are listed in Table I, where the number in the percentage column represents the percentage of the left-hand modulation type that is misclassified as the right-hand modulation type. A small portion of 8PSK samples are misclassified as QPSK and a small portion of WBFM samples are misclassified as GFSK; we expect that further optimizing the neural network architecture and possibly increasing the depth would lead to capturing these subtle feature differences. We further notice that QAM16 and QAM64 are likely to be misclassified as each other, since their similarity in the constellation diagram makes the differentiation vulnerable to small noise in the signal. We expect that appropriate pre-processing of the input signal can help alleviate these large misclassification percentages. A large discrepancy also exists for WBFM, which is likely to be recognized as AM-DSB. We believe that this discrepancy is probably due to the silence periods where only the carrier tone exists in the analog voice signal.

TABLE I: Significant modulation type misclassification at high SNR for the proposed CLDNN architecture

Misclassification    Percentage (%)
8PSK/QPSK            5.5
QAM16/QAM64          58.48
QAM64/QAM16          20.14
WBFM/AM-DSB          59.6
WBFM/GFSK            3.3
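The confusion matrix of Figure 9 and the pairwise percentages in Table I can be computed from the highest-SNR test samples as sketched below; the modulation label order and the array names are assumptions about the dataset layout.

import numpy as np

MODS = ['8PSK', 'AM-DSB', 'BPSK', 'CPFSK', 'GFSK', 'PAM4',
        'QAM16', 'QAM64', 'QPSK', 'WBFM']   # assumed label order

def misclassification_table(model, x_high_snr, y_high_snr):
    preds = np.argmax(model.predict(x_high_snr), axis=1)
    truth = np.argmax(y_high_snr, axis=1)
    n = len(MODS)
    conf = np.zeros((n, n))
    for t, p in zip(truth, preds):
        conf[t, p] += 1
    # Row-normalize so entry (i, j) is the percentage of class i labelled as j.
    conf = 100.0 * conf / np.maximum(conf.sum(axis=1, keepdims=True), 1)
    return {(MODS[i], MODS[j]): conf[i, j]
            for i in range(n) for j in range(n) if i != j and conf[i, j] > 1.0}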
IV. DISCUSSION

By creating shortcuts between different layers, the ResNet and DenseNet architectures alleviate the vanishing gradient problem and promote feature reuse. Comparing the performances of ResNet and DenseNet in Figure 8, we notice that DenseNet demonstrates significantly better performance than ResNet by including more shortcut connections in the network, which further strengthens feature propagation throughout the network. As shown in Figure 10, the validation loss descends quickly in all three models, but the losses of DenseNet and ResNet reach a plateau earlier than that of the CNN, which makes these networks more efficient to train for the considered modulation classification task.

Fig. 10: Validation loss descends quickly in all three models, but the losses of DenseNet and ResNet reach a plateau earlier than that of the CNN.

We finally applied the CLDNN architecture and obtained through it the best performance among all tested network architectures. We believe that the good performance of CLDNN is due to its long-term memory ability, which is suitable for the causality characteristic of time domain radio signals.

V. CONCLUSION

Multiple state-of-the-art deep neural networks were applied to the radio modulation recognition task. We explored signal feature extraction by adding convolutional layers, various kinds of residual layers, and recurrent layers to a deep neural network architecture. A Convolutional Long Short-term Deep Neural Network (CLDNN) was found to deliver the best classification performance, improving the accuracy by approximately 13.5% over the original CNN model introduced in [1]. We believe that the causality of radio time domain signals leads to this improvement, since a recurrent network is known to perform well for continuous acoustic signal processing tasks. The residual and densely connected networks (ResNet and DenseNet) also perform well, although their best accuracy is limited by the depth of the network; they suggest that changing connections between layers, and especially creating shortcuts between non-consecutive layers, may produce better classification accuracy.

REFERENCES
[1] T. J. O'Shea, L. Pemula, D. Batra, and T. C. Clancy, "Radio transformer networks: Attention models for learning to synchronize in wireless systems," in Proc. 50th Asilomar Conference on Signals, Systems and Computers, pp. 662-666, IEEE, 2016.
[2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015.
[3] G. Huang, Z. Liu, K. Q. Weinberger, et al., "Densely connected convolutional networks," arXiv preprint arXiv:1608.06993, 2016.
[4] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4580-4584, April 2015.
[5] T. J. O'Shea and N. West, "Radio machine learning dataset generation with GNU Radio," in Proceedings of the GNU Radio Conference, vol. 1, no. 1, Sep. 2016.
[6] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/1412.6980, 2014. [Online]. Available: http://arxiv.org/abs/1412.6980.
[7] N. E. West and T. O'Shea, "Deep architectures for modulation recognition," in 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Piscataway, NJ, 2017, pp. 1-6.