Peng 2017
Abstract—Deep learning (DL) is a powerful classification technique that has achieved great success in many application domains. However, its usage in communication systems has not been well explored. In this paper, we address the issue of using DL in communication systems, especially for modulation classification. A convolutional neural network (CNN) is utilized to complete the classification task. We convert the raw modulated signals into images that have a grid-like topology and feed them to the CNN for network training. Two existing approaches, including cumulant and support vector machine (SVM) based classification algorithms, are involved for performance comparison. Simulation results indicate that the proposed CNN based modulation classification approach achieves comparable classification accuracy without the necessity of manual feature selection.

I. INTRODUCTION

Deep learning (DL) is a branch of machine learning (ML) that has state-of-the-art capability for classification [1]. Recently, it has attracted great attention and been applied in various fields. During the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), many research teams submitted various DL algorithms for object detection and image classification, and the latest top-5 accuracy of identifying objects from 1000 categories reaches 95% [2]. In [3], the authors show that a DL network can be trained to implement accurate, inexpensive, and scalable economic surveys with satellite imagery in developing countries. Moreover, bioinformatics can also benefit from DL: splice junctions can be discovered from DNA sequences, finger joints can be recognized from X-ray images, lapses can be detected from electroencephalography signals, and so on [4].

Although DL is flourishing everywhere, the communications field seems to be an exception. It is noticed that some traditional ML algorithms, e.g., Support Vector Machine (SVM) and K-Nearest Neighbor (KNN), have been utilized for media access control (MAC) protocol identification [5] and modulation classification [6]. In this paper, we focus on the issue of using DL in communications systems, especially for modulation classification.

The use of DL in communications systems has multiple advantages. Firstly, because of the huge number of communications devices and the high communications data rate, the massive data required by DL are available in communications systems. Secondly, DL is able to extract features autonomously and avoids the challenging task of manual feature selection. Thirdly, since DL is evolving rapidly, there will be considerable potential for other communications applications besides modulation classification.

Modulation classification is a major communications problem with both civilian and military applications, such as spectrum management, signal identification, electronic warfare, threat analysis, etc. Previously, it was handled either by traditional signal processing [7] or by ML approaches [6]. In this article, we aim to solve it by use of DL. One of the most prevalent DL architectures, the convolutional neural network (CNN) [8], is considered for modulation classification. Modulated signals are converted into grid-like topological data, e.g., images of constellation diagrams, to feed the CNN. AlexNet [9], a famous CNN model, is adopted and modified for network training and accuracy testing. The whole modulation classification work is implemented based on the Caffe framework [10].

The rest of this article is organized as follows. Section II describes the signal model and traditional algorithms for modulation classification. Section III gives an overview of DL, CNN, and AlexNet. Section IV shows how to use DL for modulation classification. Simulation results are provided in Section V. Finally, Section VI concludes this paper.

II. PROBLEM FORMULATION

A. Signal Model

Assume that we are operating in a coherent, synchronous environment with single-tone signaling and that carrier, timing, and waveform recovery have been accomplished. We then obtain a baseband sequence composed of samples of the complex envelope [7],

    y(n) = A e^{j(2π f_0 nT + θ_n)} Σ_{l=−∞}^{∞} x(l) h(nT − lT + ε_T T) + g(n),    (1)

where x(l) represents the symbol sequence, A is an unknown amplitude factor, f_0 denotes the carrier frequency offset, θ_n is the phase jitter, T represents the symbol spacing, h(·) denotes the residual channel effect, ε_T is the timing error, and g(n) represents the additive Gaussian noise sequence.

Our modulation classification task is to decide which modulation scheme has been utilized with the knowledge of the N-sample received vector y = [y(1), y(2), ..., y(N)]^T.

This paper considers four different modulation types: quadrature phase-shift keying (QPSK), 8 phase-shift keying (8PSK), 16 quadrature amplitude modulation (16QAM), and 64 quadrature amplitude modulation (64QAM).
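To make the signal model concrete, the following sketch generates an n-sample baseband sequence according to Eq. (1) for any of the four schemes. It is a minimal illustration, not the paper's simulation code: it assumes one sample per symbol with an ideal pulse (so h(·) and the timing error ε_T drop out), unit amplitude A = 1, and standard unit-power alphabets; all function and parameter names are our own.

```python
import numpy as np

def alphabet(scheme):
    """Unit-average-power symbol alphabets (illustrative mappings)."""
    if scheme == "QPSK":
        return np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
    if scheme == "8PSK":
        return np.exp(1j * 2 * np.pi * np.arange(8) / 8)
    if scheme in ("16QAM", "64QAM"):
        m = 4 if scheme == "16QAM" else 8
        re, im = np.meshgrid(np.arange(-m + 1, m, 2), np.arange(-m + 1, m, 2))
        pts = (re + 1j * im).ravel()
        return pts / np.sqrt((np.abs(pts) ** 2).mean())  # normalize power
    raise ValueError(scheme)

def modulated_samples(scheme, n=1000, snr_db=8.0, f0=0.0, jitter=0.0, seed=0):
    """n baseband samples following Eq. (1), one sample per symbol (T = 1)."""
    rng = np.random.default_rng(seed)
    x = rng.choice(alphabet(scheme), size=n)            # symbol sequence x(l)
    theta = jitter * rng.standard_normal(n)             # phase jitter theta_n
    rotation = np.exp(1j * (2 * np.pi * f0 * np.arange(n) + theta))
    noise_var = 10.0 ** (-snr_db / 10.0)                # unit signal power
    g = np.sqrt(noise_var / 2) * (rng.standard_normal(n)
                                  + 1j * rng.standard_normal(n))
    return rotation * x + g                             # y(n) of Eq. (1), A = 1

y = modulated_samples("16QAM", n=1000, snr_db=8.0)      # one image's worth
```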
the b_n indicates the bias value that corresponds to each feature map. The sparse connectivity of the convolutional layer only allows neurons to connect with a local region of the input volume, which significantly reduces the number of parameters in the model. This connectivity pattern enables CNNs to accept inputs of larger dimensionality, which are computationally infeasible for ordinary neural networks.

The pooling layer is typically inserted after a convolutional layer to reduce the dimensionality of the feature maps and hence the number of parameters as well. It is a non-linear down-sampling operation that aggregates a small patch of units within a feature map, commonly taking the maximum value of a 2x2 region with a shifting stride of 2, which makes the model invariant to small translations of the inputs [15]. Given input data with grid-like topologies (i.e., images), a stack of convolution and pooling layers is able to abstract fine-grained (e.g., points, edges) and coarse-grained representations (e.g., shapes) from the data, which is also the reason that CNNs achieve great success in image-based applications such as face recognition, object detection/localization, video analysis, etc.
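The 2x2, stride-2 max pooling described above is simple enough to write out directly; the following NumPy sketch (ours, for illustration only) shows the operation on a small feature map:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over an (H, W) feature map (H, W even)."""
    h, w = feature_map.shape
    patches = feature_map.reshape(h // 2, 2, w // 2, 2)
    return patches.max(axis=(1, 3))   # aggregate each 2x2 region to its max

fm = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(fm))  # [[ 5.  7.] [13. 15.]] -- 4x4 reduced to 2x2
```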
Besides, there are typically multiple fully-connected (FC) layers (also known as dense layers) at the end of CNN based models. These layers are the same as layers in ordinary neural networks, where all neurons are fully connected to every activation of the previous layer and implement a matrix multiplication with them.

B. AlexNet Model

AlexNet is a large CNN based DL model that consists of 650 thousand neurons and 60 million parameters. It is designed to classify 1.2 million images into 1000 categories, as required by the ILSVRC-2012 contest [9], and hence needs a large learning capacity. The AlexNet model mainly has 8 layers: 5 convolutional layers and 3 fully-connected layers. Some of the convolutional layers are followed by normalization and max-pooling layers, and the last FC layer is connected to a 1000-way softmax that corresponds to the number of classification categories.

In addition to assembling multiple layers together, AlexNet also employs several novel features to improve both classification accuracy and training efficiency. A non-saturating activation function, the Rectified Linear Unit (ReLU) [16], is used in place of traditional saturating functions such as f(x) = tanh(x) or f(x) = (1 + e^{−x})^{−1}. This results in a much faster training procedure than before. Another highlight feature, "dropout" [17], is introduced as a regularization method to prevent overfitting by reducing the co-adaptation of neurons. Technically, it sets the output of each hidden neuron to zero with a probability of 0.5, which forces the network to learn more robust features than usual.
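Both features can be stated in a few lines. The sketch below (our illustration) implements ReLU and the "inverted" dropout variant that rescales surviving activations at training time; [17] instead trains without rescaling and halves the weights at test time, which is equivalent in expectation.

```python
import numpy as np

def relu(x):
    """Non-saturating activation f(x) = max(0, x) [16]."""
    return np.maximum(0.0, x)

def dropout(x, p=0.5, rng=None):
    """Zero each activation with probability p at training time [17].

    Inverted formulation: survivors are scaled by 1 / (1 - p), so at
    test time dropout is simply switched off.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(x.shape) >= p        # keep each unit with prob 1 - p
    return x * mask / (1.0 - p)

h = relu(np.array([-1.5, 0.3, 2.0]))       # -> [0.0, 0.3, 2.0]
print(dropout(h, p=0.5))
```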
IV. MODULATION CLASSIFICATION USING CNN

A. Data Conversion

As mentioned above, AlexNet is designed for image classification tasks. Considering the modulation classification problem in communication systems, where only complex sample points of modulated signals are available, a data conversion is necessary to bridge the gap between the two types of data. Therefore, we propose a method that converts complex sample points into a constellation diagram for the utilization of AlexNet as well as other CNN based DL models.

The constellation diagram has been widely used as a two-dimensional representation of a modulated signal by mapping signal samples into scatter points on the complex plane. Note that the complex plane is infinite while the scope of an image is limited. Thus, it is necessary to select a centric region of the complex plane to generate a constellation diagram. If the selected region is too small, some sample points severely polluted by noise will be out of range and hence discarded. On the contrary, a large selected region results in high-resolution images, which leads to rapid growth of computational complexity when training the deep network. To achieve a trade-off between classification performance and computing costs, this paper selects the part of the complex plane whose real and imaginary axes both range from -3.5 to 3.5. This kind of constellation diagram is then output as an image in JPEG format. Examples of constellation diagrams for the four modulation categories with different signal-to-noise ratios (SNRs) are shown in Fig. 1.

Fig. 1. Constellation diagrams for four modulation categories (columns: QPSK, 8PSK, 16QAM, 64QAM) with different SNRs (rows: 14, 6, and -2 dB).
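A minimal version of this conversion can be written with NumPy and Matplotlib. The axis range of [-3.5, 3.5] and the JPEG output follow the text; the figure size, marker style, and file names are our assumptions (JPEG writing additionally requires Pillow):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                 # render off-screen, no display needed
import matplotlib.pyplot as plt

def samples_to_constellation(y, path="constellation.jpg", limit=3.5):
    """Convert complex samples y into a constellation-diagram image.

    Only the centric region [-limit, limit]^2 of the complex plane is
    kept, as in Sec. IV-A; points outside it fall off the canvas and
    are discarded.
    """
    fig = plt.figure(figsize=(2.27, 2.27), dpi=100)  # ~227x227 px (assumed)
    ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])          # fill canvas, no margins
    ax.scatter(y.real, y.imag, s=1, c="black")
    ax.set_xlim(-limit, limit)
    ax.set_ylim(-limit, limit)
    ax.set_axis_off()                                # image only, no axes
    fig.savefig(path)                                # .jpg suffix -> JPEG
    plt.close(fig)

rng = np.random.default_rng(0)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=1000)))
noise = 0.1 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
samples_to_constellation(qpsk + noise)
```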
B. Network Configuration

In order to facilitate CNN based modulation classification, we adopt the BVLC reference CaffeNet model (a minor variation of the AlexNet model within the Caffe toolkit) and slightly modify it for better performance and faster learning speed. The number of outputs in layer #8 is changed to 4, as only four modulation types are investigated in our case, and the size of layer #7 is shrunk to 256, because the default size of 4096 always leads to difficulties in convergence during training. In addition, several parameters of the solver configuration, such as the learning rate (0.01→0.0005), gamma (0.1→0.5), and stepsize (100,000→10,000), are also adjusted for better classification performance and training efficiency. We also fix the image resolution to 227x227, the default receptive field of AlexNet, and hence avoid random cropping operations. Finally, considering the available computing resources on the graphics processing unit (GPU), we select mini-batch gradient descent with a batch size of 64 in the first convolutional layer.
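For concreteness, the adjusted solver settings can be expressed through Caffe's Python protobuf bindings, assuming they are on the Python path (equivalently, one edits solver.prototxt by hand); the network file name below is a placeholder, and the batch size itself is set in the network definition, which is not shown:

```python
from caffe.proto import caffe_pb2   # protobuf bindings shipped with Caffe [10]

s = caffe_pb2.SolverParameter()
s.net = "caffenet_modclass.prototxt"   # placeholder name for the modified net
s.base_lr = 0.0005                     # learning rate, lowered from 0.01
s.lr_policy = "step"                   # lr = base_lr * gamma^floor(iter/stepsize)
s.gamma = 0.5                          # decay factor, raised from 0.1
s.stepsize = 10000                     # decay interval, shortened from 100,000
s.max_iter = 100000                    # maximum training iterations (Sec. IV-C)
s.solver_mode = caffe_pb2.SolverParameter.GPU

# Serialize to the text format that Caffe's tools read as solver.prototxt.
with open("solver.prototxt", "w") as f:
    f.write(str(s))
```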
C. Implementation

We generate 10,000 and 1,000 constellation diagram images per modulation type for training and testing, respectively. Each image is generated from 1000 samples of a modulated signal. The maximum number of training iterations is set to 100,000. All models are trained on a single NVIDIA K40 GPU card with the support of GPU-accelerated libraries.
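Putting the sketches above together, generating this dataset amounts to a short loop. It reuses modulated_samples and samples_to_constellation from the earlier sketches; the directory layout and the single fixed SNR are our assumptions (in practice, images would be generated across the SNR range under study):

```python
import os

SCHEMES = ("QPSK", "8PSK", "16QAM", "64QAM")

for scheme in SCHEMES:
    for split, count in (("train", 10000), ("test", 1000)):
        out_dir = os.path.join("data", split, scheme)
        os.makedirs(out_dir, exist_ok=True)
        for i in range(count):
            # 1000 samples per image, as in Sec. IV-C; SNR fixed for brevity.
            y = modulated_samples(scheme, n=1000, snr_db=8.0, seed=i)
            samples_to_constellation(y, os.path.join(out_dir, f"{i:05d}.jpg"))
```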
V. SIMULATION RESULTS

In order to illustrate the performance of CNN based modulation classification, Fig. 2 presents the classification accuracy of each modulation type, as well as the average accuracy, at SNRs ranging from -2 dB to 14 dB. For each modulation type, 1000 tests are implemented for performance evaluation. The average accuracy is obtained by averaging the classification accuracies of the four modulation types. As shown in Fig. 2, in the low SNR region, the tasks of identifying QPSK and 8PSK are relatively easy (accuracy > 90%), while those of identifying 16QAM and 64QAM are rather difficult (accuracy is about 60%). The classification accuracy greatly increases along with the growth of the SNR, especially for 16QAM and 64QAM. In the high SNR region (e.g., SNR > 10 dB), all modulation types approach 100% accuracy, which means the proposed approach is able to achieve modulation classification without any errors.

Fig. 2. Classification accuracy of four modulation types and average accuracy versus SNR.

Furthermore, TABLE II gives two confusion matrices, with SNR at 4 dB and 8 dB, to detail the classification performance for each modulation type. As shown in this table, more classification errors occur when the modulation order is higher. Note that it is hard to identify 16QAM and 64QAM, and these two types are usually confused with each other. The reason is that they have similar square constellation patterns, as shown in Fig. 1.

TABLE II. Confusion matrices of four modulation types with SNRs at 4 and 8 dB.

SNR    Type     QPSK   8PSK   16QAM   64QAM   Accuracy
4 dB   QPSK     1000      0       0       0     100%
       8PSK        0    990       6       4     99.0%
       16QAM       0      8     633     359     63.3%
       64QAM       0      2     362     636     63.6%
8 dB   QPSK     1000      0       0       0     100%
       8PSK        0   1000       0       0     100%
       16QAM       0      0     807     193     80.7%
       64QAM       0      0     204     796     79.6%
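As a sanity check, the per-class and average accuracies reported in TABLE II follow directly from the confusion counts; for the 4 dB matrix:

```python
import numpy as np

# Rows: true type (QPSK, 8PSK, 16QAM, 64QAM); columns: predicted type.
confusion_4db = np.array([
    [1000,   0,   0,   0],
    [   0, 990,   6,   4],
    [   0,   8, 633, 359],
    [   0,   2, 362, 636],
])
per_class = confusion_4db.diagonal() / confusion_4db.sum(axis=1)
print(per_class)         # [1.    0.99  0.633 0.636]
print(per_class.mean())  # average accuracy at 4 dB, about 0.815
```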
Finally, we compare the proposed CNN (AlexNet) based modulation classification approach with the traditional cumulant and SVM based algorithms discussed in Sec. II. Fig. 3 presents the average classification accuracy of these three algorithms versus SNR. It can be seen from this figure that, between the cumulant and SVM based algorithms, the former is better when the SNR is high (SNR > 5 dB), while the latter is superior when the SNR is low. The proposed CNN based approach achieves accuracy comparable to the cumulant based algorithm in the high SNR region and is as good as the SVM based algorithm in the low SNR region. Note that it obtains this classification performance without manual feature selection.

Fig. 3. Modulation classification accuracy comparison.

VI. CONCLUSION AND DISCUSSION

This paper presents the idea of using a CNN to classify modulation types in communication systems. In our methodology, constellation diagrams are exploited to represent modulated signals for the CNN. The AlexNet model is adopted for training and testing. Compared with traditional cumulant and SVM based algorithms, the proposed CNN based approach not only avoids the challenging task of manual feature selection but also achieves comparable performance on modulation classification across different SNR regions.
Although the current CNN based approach may not always
outperform existing works, there is still plenty of room for
improvement. For example, the data conversion procedure from
complex samples to images indeed incurs information loss
due to the limited resolution of images. Any enhanced data
conversion method that preserves more original information
is expected to be helpful. In addition, since the architecture
of neural networks has a great impact on classification performance, it is worthwhile to investigate more advanced DL models
for modulation classification in the future. Finally, a larger
amount of data for training is also beneficial for performance
improvement.
ACKNOWLEDGMENT
The authors would like to thank the NVIDIA academic partnership program for its generous donation and support.
REFERENCES
[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. Cambridge,
MA, USA: MIT Press, 2016.
[2] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, et al., “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision, vol. 115, pp. 211-252, 2015.
[3] N. Jean, M. Burke, M. Xie, W. M. Davis, D. B. Lobell, and S. Ermon, “Combining satellite imagery and machine learning to predict poverty,” Science, vol. 353, pp. 790-794, 2016.
[4] S. Min, B. Lee, and S. Yoon, “Deep learning in Bioinformatics,” arXiv:
1603.06430, 2016.
[5] S. Hu, Y. D. Yao, and Z. Yang, “MAC protocol identification using support
vector machines for cognitive radio networks,” IEEE Wireless Commun.,
vol. 21, pp. 52-60, 2014.
[6] M. W. Aslam, Z. Zhu, and A. K. Nandi, “Automatic Modulation Clas-
sification Using Combination of Genetic Programming and KNN,” IEEE
Trans. Wireless Commun., vol. 11, pp. 2742-2750, 2012.
[7] A. Swami and B. Sadler, “Hierarchical digital modulation classification
using cumulants,” IEEE Trans. Commun., vol. 48, no. 3, pp. 416-429, Mar.
2000.
[8] Y. LeCun, “Generalization and network design strategies,” University of
Toronto Technical Report CRG-TR-89-4, 1989.
[9] A. Krizhevsky, I. Sutskever and G. Hinton, “ImageNet Classification with
Deep Convolutional Neural Networks”, in Advances in Neural Information
Processing Systems 25, pp. 1097-1105, 2012.
[10] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, et al., “Caffe: Convolutional Architecture for Fast Feature Embedding,” arXiv:1408.5093, 2014.
[11] C. Burges, “A tutorial on support vector machines for pattern recogni-
tion,” Knowledge Discovery and Data Mining, vol. 2, pp. 1-43, 1998.
[12] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An introduction to kernel-based learning algorithms,” IEEE Trans. Neural Networks, vol. 12, pp. 181-201, Mar. 2001.
[13] M. Mirarab and M. Sobhani, “Robust modulation classification for
PSK/QAM/ASK using higher order cumulants,” in Proc. 2007 Interna-
tional Conference on Information, Communications and Signal Process-
ing, pp. 1-4.
[14] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector
machines,” ACM Transactions on Intelligent Systems and Technology, vol.
2, issue 3, pp. 27:1–27:27, 2011.
[15] Y.-L. Boureau, J. Ponce, and Y. LeCun, “A Theoretical Analysis of Feature Pooling in Visual Recognition,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111-118, 2010.
[16] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807-814, 2010.
[17] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv:1207.0580, 2012.