
Radio Identification

This document presents a method for identifying specific radios using software-defined radios (SDRs) and machine learning techniques. The method collects raw I/Q samples from transmissions between SDRs and uses a deep convolutional neural network (CNN) to learn features that uniquely identify each radio. The CNN is trained on over 20 million samples and achieves 90-99% accuracy in identifying radios located between 2-50 feet apart over a wireless channel. This approach performs device fingerprinting without requiring decoding, feature engineering, or protocol knowledge, making it robust against spoofing and able to handle multiple coexisting protocols.


Deep Learning Convolutional Neural Networks for Radio Identification
Shamnaz Riyaz, Kunal Sankhe, Stratis Ioannidis, and Kaushik Chowdhury
Electrical and Computer Engineering Department, Northeastern University, Boston, MA, USA
Email: [email protected], [email protected], [email protected], [email protected]

Abstract—Advances in software defined radio (SDR) technology allow unprecedented control on the entire processing chain, allowing modification of each functional block as well as sampling the changes in the input waveform. This paper describes a method for uniquely identifying a specific radio among nominally similar devices using a combination of SDR sensing capability and machine learning (ML) techniques. The key benefit of this approach is that ML operates on raw I/Q samples and distinguishes devices using only the transmitter hardware-induced signal modifications that serve as a unique signature for a particular device. No higher level decoding, feature engineering, or protocol knowledge is needed, further mitigating challenges of ID spoofing and coexistence of multiple protocols in a shared spectrum. The contributions of the paper are as follows: (i) The operational blocks in a typical wireless communications processing chain are modified in a simulation study to demonstrate RF impairments, which we exploit. (ii) Using an over-the-air dataset compiled from an experimental testbed of SDRs, an optimized deep convolutional neural network (CNN) architecture is proposed, and results are quantitatively compared with alternate techniques such as support vector machines and logistic regression. (iii) Research challenges for increasing the robustness of the approach, as well as the parallel processing needs for efficient training, are described. Our work demonstrates up to 90-99% experimental accuracy at transmitter-receiver distances varying between 2-50 feet over a noisy, multi-path wireless channel.

I. INTRODUCTION
Emerging applications in the context of smart cities, autonomous vehicles, Internet of Things, and complex military missions, among others, require reconfigurability at both the systems and the protocol level within their communications architecture. These advances rely on a critical enabling component, namely, software defined radio (SDR): this allows cross-layer programmability of the transceiver hardware using high level directives. The promise of intelligent or so called cognitive radios builds on the SDR concept, where the radio is capable of gathering contextual information and adapting its own operation by changing the settings on the SDR based on what it perceives in its surroundings.

In many mission critical scenarios, problems in authenticating devices, ID spoofing and unauthorized transmissions are major concerns. Moreover, high bandwidth applications are causing a spectrum crunch, leading network providers to explore innovative spectrum sharing regimes in the TV whitespace and the sub-6GHz bands. In all of the above, identifying (i) the type of the protocol in use, and (ii) the specific radio transmitter (among many other nominally similar radios) becomes important. Our work on SDR-enabled device fingerprinting tackles these two scenarios by learning characteristic features of the transmitters in a pre-deployment training phase, which is then exploited during actual network operation. We recognize that SDRs come in diverse form factors with varying on-board computational resources. Thus, for general purpose use, any device fingerprinting approach must be computationally simple once deployed in the field. For this reason, we propose machine learning (ML) techniques, specifically, Deep Convolutional Neural Networks (CNNs), and experimentally demonstrate near-perfect radio identification performance in many practical scenarios.

• Overview of our approach: ML techniques have been remarkably successful in image and speech recognition; however, their utility for device level fingerprinting by feature learning has yet to be conclusively demonstrated. True autonomous behavior of SDRs, not only in terms of detecting spectrum usage, but also in terms of self-tuning a multitude of parameters and reacting to environmental stimulus, is now a distinct possibility. We collect over 20 × 10^6 RF I/Q samples over multiple transmission rounds for each transmitter-receiver pair composed of off-the-shelf USRP SDRs. The SDRs transmit standards compliant IEEE 802.11ac physical layer waveforms, to create a database of received signals. These I/Q samples carry embedded signatures characteristic of different active transmitter hardware, but are also subject to alterations introduced by the wireless channel. The approach of providing the raw time series radio signal to the CNN, treating each complex sample as two real-valued I/Q inputs, is motivated by modulation classification [1]. It has been found to be a promising technique for feature learning on large time series data. We develop a CNN architecture composed of multiple convolutional and max-pooling layers optimized for the task of radio fingerprinting. We partition the collected samples into separate instances and perform offline training on a computational cloud cluster, assigning weights to the inter-neuron connections. A holdout data set composed of totally unseen samples is used for estimation of detection accuracy.

• Contributions and paper structure: Our work makes the following key contributions. We survey and classify existing approaches in Sec. II. We design a simulation model of a typical wireless communications processing chain in MATLAB, and then modify the ideal operational blocks to demonstrate the RF impairments that we wish to learn in Sec. III. We describe the data gathering process for training the classifier in Sec. IV. We architect and experimentally validate an optimized deep convolutional neural network (CNN) for
radio fingerprinting in Sec. V, and quantitatively compare this approach with support vector machines and logistic regression in Sec. VI. Finally, research challenges for increasing the robustness of our approach are listed in Sec. VII and the conclusions are drawn in Sec. VIII. In summary, our CNN design demonstrates up to 90-99% experimental accuracy at transmitter-receiver distances varying between 2-50 feet over a noisy, multi-path wireless channel.

II. RELATED WORK
The key idea behind radio fingerprinting is to extract unique patterns (or features) and use them as signatures to identify devices. A variety of features at the physical (PHY) layer, medium access control (MAC) layer, and upper layers have been utilized for radio fingerprinting [2]. Simple unique identifiers such as IP addresses, MAC addresses, and international mobile station equipment identity (IMEI) numbers can easily be spoofed. Location-based features such as radio signal strength (RSS) and channel state information (CSI) are susceptible to mobility and environmental changes. We are interested in studying those features that are inherent to a device’s hardware, which are also unchanging and not easily replicated by malicious agents. We classify existing approaches in Fig. 1.

Figure 1: RF Fingerprinting classification
RF Fingerprinting
  Supervised (a priori labeling of samples)
    Similarity-based (matching with database entries)
      [3] 802.11 wireless driver FP
      [4] Passive wireless AP FP
    Classification (unique class identification)
      Conventional (hand-crafted feature extractors)
        [5] Frequency domain approach
        [6] PARADIS
        [7] GTID
      Deep Learning (multi-layer neural network)
        [1] Modulation Recognition - CNN
        [8] Deep learning - physical layer
  Unsupervised (real-time grouping of samples)
    [9] iHMRF
    [10] Nonparametric Bayesian

A. Supervised learning
This type of learning requires a large collection of labeled samples prior to network deployment for training the ML algorithm.

1) Similarity-based: Similarity measurements involve comparing the observed signature of the given device with the references present in a master database. In [3], a passive fingerprinting technique is proposed that identifies the wireless device driver running on an IEEE 802.11 compliant node by collecting traces of probe request frames from the devices. A supervised Bayesian approach is used to analyze the collected traces and generate the device driver fingerprint. [4] describes a passive blackbox-based technique that uses TCP or UDP packet inter-arrival time to determine the type of access points using wavelet analysis. However, these techniques rely on prior knowledge of vendor specific features.

2) Classification-based: There are several studies on supervised learning that exploit RF features such as I/Q imbalance, phase imbalance, frequency error, and received signal strength, to name a few.

• Conventional: This form of classification examines a match with pre-selected features using domain knowledge of the system, i.e., the dominant feature(s) must be known a priori. [5] proposes classification by extracting the known preamble within a packet and computing spectral components. A set of log-spectral-energy features is given as input to the k-nearest neighbors (k-NN) discriminatory classifier. PARADIS [6] fingerprints 802.11 devices based on modulation-specific errors in the frame using SVM and k-NN algorithms with an accuracy of 99%. In [7], a technique for physical device and device-type classification called GTID using artificial neural networks is proposed. This method exploits variations in clock skews as well as hardware compositions of the devices. In general, as multiple different features are used, selecting the right set of features is a major challenge. This also causes scalability problems when a large number of devices are present, leading to increased computational complexity in training.

• Deep Learning: Deep learning offers a powerful framework for supervised learning. It can learn functions of increasing complexity, leverages large datasets, and greatly increases the number of layers, in addition to neurons within a layer. [1] and [8] apply deep learning at the physical layer, specifically focusing on modulation recognition using convolutional neural networks. They classify 11 different modulation schemes. However, this approach does not identify a device, as we do here, but only the modulation type used by the transmitter.

B. Unsupervised learning
Unsupervised learning is effective when there is no prior label information about devices. In [9], an infinite Hidden Markov Random Field (iHMRF)-based online classification algorithm is proposed for wireless fingerprinting using unsupervised clustering techniques and batch updates. Transmitter characteristics are used in [10], where a non-parametric Bayesian approach (namely, an infinite Gaussian Mixture Model) classifies multiple devices in an unsupervised, passive manner.

Transmitter identification using deep learning architectures is still in a nascent stage. Our work focuses on generation and processing of a large number of RF I/Q samples to train the classifiers and eventually identify the devices uniquely.

III. CAUSES OF HARDWARE IMPAIRMENTS
Using the MATLAB Communications System Toolbox, we simulate a typical wireless communications processing chain (see Fig. 2, with the shifts in the received complex valued I/Q samples), and then modify the ideal operational blocks to introduce RF impairments, typically seen in actual hardware
implementations. This allows us to individually study the I/Q imbalance, phase noise, carrier frequency and phase offset, harmonic distortion, and power amplifier nonlinearity.

Figure 2: Typical transceiver chain with various sources of RF impairments. [Block diagram: digital baseband (DSP), DACs, anti-aliasing filter, LO with π/2 phase shift, and PA, annotated with I/Q imbalance, phase noise, harmonics distortion, and nonlinear distortion. Inset constellation plots show input symbols against reference points, quadrature amplitude versus in-phase amplitude.]

• I/Q imbalance: Quadrature mixers that convert baseband to RF and vice versa are often impaired by gain and phase mismatches between the parallel sections of the RF chain dealing with the in-phase (I) and quadrature (Q) signal paths. The analog gain is never the same for each signal path, and the difference between their amplitudes causes amplitude imbalance. In addition, the phase delay between the two paths is never exactly 90°, which causes phase imbalance.

• Phase noise: The up-conversion of a baseband signal to a carrier frequency $f_c$ is performed at the transmitter by mixing the baseband signal with the carrier signal. Instead of generating a pure tone at frequency $f_c$, i.e., $e^{j 2\pi f_c t}$, the generated tone is actually $e^{j(2\pi f_c t + \phi(t))}$, where $\phi(t)$ is a random phase noise. The phase noise introduces a rotational jitter. Phase noise is expressed in units of dBc/Hz, which represents the noise power relative to the carrier contained in a 1 Hz bandwidth centered at a certain offset from the carrier. The typical phase noise level is in the range [−100, −48] dBc/Hz, with frequency offset in the range [20, 200] Hz.

• Carrier frequency and phase offset: The performance of crystal oscillators used for generating the carrier frequency is specified with an accuracy in parts per million (ppm). The difference in transmitter and receiver carrier frequencies is referred to as carrier frequency offset.

• Harmonic distortions: The harmonics in a transmitted signal are caused by nonlinearities in the transmitter-side digital-to-analog converters. Harmonic distortion is measured in terms of total harmonic distortion, which is the ratio of the sum of the powers of all harmonic components to the power of the fundamental frequency of the signal. This distortion is usually expressed either in percent or in dB relative to the fundamental component of the signal.

• Power amplifier distortions: Power amplifier (PA) nonlinearities mainly appear when the amplifier is operated in its non-linear region, i.e., close to its maximum output power, where significant compression of the output signal occurs. The distortions of the PA are generally modeled using AM/AM (amplitude to amplitude) and AM/PM (amplitude to phase) curves. The AM/AM curve describes amplitude distortion, whereas the AM/PM curve describes the phase shift. The nonlinearity of the amplifier is modeled using the Cubic Polynomial and Hyperbolic Tangent methods, parameterized by the third-order input intercept point (IIP3). IIP3, expressed in dBm, is a scalar specifying the third-order intercept.

IV. DATA COLLECTION FOR DEEP LEARNING

A. Experimental setup for Trace Data collection
We study the performance of different learning algorithms, including linear support vector machine (SVM), logistic regression, and CNNs, using I/Q samples collected from an experimental setup of USRP SDRs, shown in Fig. 3. For the purpose of data collection at the receiver end, we use a fixed USRP B210. For the transmitter we use 5 different devices of the same family, i.e., USRP B210.

Figure 3: Data collection using SDR

B. Protocols of Operation
We transmit different physical layer frames defined by IEEE 802.11ac on each transmitter SDR. These frames are generated using the MATLAB WLAN Systems toolbox, and are standards compliant. The generated data frames carry random payloads, since we intend to support arbitrary data streams. These protocol frames are then streamed to the selected SDR for over-the-air wireless transmission. The receiving SDR samples the incoming signals at a 1.92 MS/s sampling rate at a center frequency of 2.45 GHz for WiFi. The collected complex I/Q samples are partitioned into subsequences. For our experimental study, we set a fixed subsequence length of 128, additional details of which are described in Sec. V-A2. Overall, we collect approximately 20 million samples for each of five SDRs.
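
As an illustration of the impairments in Sec. III and the subsequence partitioning in Sec. IV-B, the following NumPy sketch applies a simplified impairment chain to random QPSK symbols and cuts the result into 128-sample windows. It is not the MATLAB simulation or the IEEE 802.11ac waveform used in this work. The function names apply_impairments and make_windows, the QPSK source, the random-walk phase noise model, and every numeric default (gain and phase mismatch, frequency offset, cubic coefficient) are assumptions chosen only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def apply_impairments(x, fs, gain_db=0.5, phase_deg=2.0, cfo_hz=500.0,
                      pn_std=1e-3, cubic=0.05, seed=0):
    """Apply a simplified transmitter impairment chain to complex samples x.

    Every default value is an illustrative assumption, not a measurement.
    """
    rng = np.random.default_rng(seed)
    n = np.arange(x.size)

    # I/Q imbalance: gain mismatch on the I path, phase error on the Q path.
    g = 10.0 ** (gain_db / 20.0)
    phi = np.deg2rad(phase_deg)
    y = g * x.real + 1j * (np.sin(phi) * x.real + np.cos(phi) * x.imag)

    # Carrier frequency offset plus random-walk phase noise:
    # multiply by exp(j(2*pi*cfo*n/fs + phase_noise[n])).
    phase_noise = np.cumsum(rng.normal(0.0, pn_std, x.size))
    y = y * np.exp(1j * (2.0 * np.pi * cfo_hz * n / fs + phase_noise))

    # Power amplifier: memoryless cubic polynomial (AM/AM compression only).
    return y - cubic * y * np.abs(y) ** 2


def make_windows(x, length=128, step=None):
    """Return real-valued windows of shape (num_windows, 2, length).

    step=None gives non-overlapping partitions; step=1 gives sliding windows.
    """
    step = length if step is None else step
    w = sliding_window_view(x, length)[::step]
    return np.stack([w.real, w.imag], axis=1)


fs = 1.92e6  # receiver sampling rate quoted in Sec. IV-B
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(1000, 2))
symbols = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)  # QPSK

rx = apply_impairments(symbols, fs)
partitioned = make_windows(rx)       # 1000 // 128 = 7 windows
sliding = make_windows(rx, step=1)   # 1000 - 128 + 1 = 873 windows
print(partitioned.shape, sliding.shape)
```

Each window stacks the real and imaginary parts as two rows, matching the 2 × 128 input format described in this paper. With a one-sample step, a stream of L samples yields L − 128 + 1 overlapping windows instead of the L/128 obtained by partitioning.
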

C. Storage and Processing
The samples are further analyzed offline over (i) workstations with typical configurations of Core-i7 processor, 8 GB RAM, and flash-based 512 GB storage, as well as (ii) Northeastern’s Discovery cluster, which has 16 compute nodes with an NVIDIA Tesla K40m GPU each. These nodes have 48 logical cores each, and on each node the GPU has 2880 CUDA computing cores. Each node has 128 GB of RAM and dual Intel E5 2650 CPUs at 2.00 GHz. These GPU servers are on a 10 Gb/s TCP/IP backplane.

V. CNN BASED RADIO FINGERPRINTING
The success of CNNs in vision and speech domains motivates our investigation in using CNNs for radio fingerprinting. The proposed method consists of two stages, i.e., a training stage and an identification stage. In the former, the CNN is trained using raw I/Q samples collected from each SDR transmitter to solve a multi-class classification problem. In the identification stage, raw I/Q samples of the unknown transmitter are fed to the trained neural network and the transmitter is identified based on the observed value at the output layer. In this section, we first describe the CNN architecture and then present the preprocessing of input data necessary to improve the performance.

A. CNN Architecture
Our CNN architecture is inspired in part by AlexNet [11], which shows remarkable performance in image recognition. As shown in Fig. 4a, our network has four layers, consisting of two convolutional layers and two fully connected or dense layers. The input to the CNN is a windowed sequence of raw I/Q samples with length 128. Each complex value is represented as two real values, which results in the dimension of our input data growing to 2 × 128. This is then fed to the first convolution layer.

Figure 4: CNN architecture. [(a) Network: input I/Q samples (2 × 128), convolution layer 1 (Conv + ReLU, 50 filters of size 1 × 3) with max pooling, convolution layer 2 (Conv + ReLU, 50 filters of size 2 × 3) with max pooling, fully connected layers (256 neurons with ReLU, then 4 outputs with softmax). (b) Convolution of 4 × 4 input data with a 2 × 2 kernel/filter, producing a 3 × 3 feature map. (c) ReLU, y = max(0, x). (d) Max pooling with a 2 × 2 filter and stride 2, e.g., max(1, 2, 1, 3) = 3.]

The convolution layer is the core building block of the CNN, whose primary purpose is to extract features from the input data. It consists of a set of spatial filters (also called kernels, or simply filters) that perform a convolution operation over input data. The operation of the convolution filter is shown with an example in Fig. 4b for intuitive understanding. A filter of size 2 × 2 is convolved with input data of size 4 × 4 by sliding across its dimensions to produce a two-dimensional feature map. A stride is the sliding interval of the filter and determines the dimension of the feature map. Our example shows stride 1 to produce a feature map of dimension 3 × 3. Each convolution layer consists of a set of such filters, each of which operates independently to produce a set of two-dimensional feature maps.

In our CNN architecture, each convolution layer is followed by an activation step that performs a pre-determined non-linear transformation on each element of the feature map. There are many possible activation functions, such as sigmoid and tanh; we use the Rectified Linear Unit (ReLU), as CNNs with ReLU train faster compared to alternatives. As shown in Fig. 4c, ReLU outputs max(x, 0) for an input x, replacing all negative values in the feature map by zero.

The convolution layer is generally followed by a pooling layer. Its functionality is to (a) introduce shift invariance (see also Sec. V-A3), as well as (b) reduce the dimensionality of the rectified feature maps of the preceding convolution layer, while retaining the most important information. We choose a pooling layer with filters of size 2 × 2 and stride 2, which downsamples the feature maps by 2 along both dimensions. Among different filter operations (such as average, sum), max pooling gives better performance. As shown in Fig. 4d, max pooling of size 2 × 2 with stride 2 selects the maximum element in the non-overlapping regions (shown with different colors). Thus, it reduces the dimensionality of the feature map, which in turn reduces the number of parameters and computations in the network.

The output of the second pooling layer is provided as input to the fully connected layer. A fully connected or dense layer is a traditional Multi Layer Perceptron (MLP), where the neurons have full connections to all activation steps in the previous layer, similar to regular neural networks. Its primary purpose is to perform the classification task on high-level features extracted from the preceding convolution layers. At the output layer, a softmax activation function is used. The classifier with softmax activation function gives probabilities (e.g., [0.9, 0.09, 0.01] for three class labels).

Next, we discuss the selection of CNN hyperparameters to optimize the performance, followed by the preprocessing of input data necessary for proper operation of the CNN, and finally the shift-invariance property of our classifier.

Figure 5: Sliding procedure. [Diagram: a sliding operation applied to I/Q samples 1 ... N; I/Q samples after sliding, indexed 1 ... M and grouped in blocks of 128.]

[Figure 6, panels (a) and (b): plots omitted in this text version. Panel (a) axes: accuracy (%) versus number of transmitters (2 to 5), for SVM, logistic regression, and CNN. Panel (b) axes: accuracy (%) and SNR (dB), observed and analytical, versus distance (ft).]

1) Model Selection: We start with a baseline architecture consisting of two convolution layers and two dense layers, then progressively vary the hyperparameters to analyze their effect on the performance. The first parameter is the number of filters in the convolution layers. We observed that filter counts in the range 30 to 256 provide reasonably similar performance. However, since the number of computations increases with the number of filters, we set 50 filters in both convolution layers to balance performance and computational cost. Similarly, we set 1 × 3 and 2 × 3 as the filter sizes in the first and second convolution layers, respectively, since a larger filter size does not offer significant performance improvement. Furthermore, increasing the number of convolution layers from 2 to 4 shows no improvement in the performance, which justifies continuing with two convolution layers. We then analyze the effect of the number of neurons in the first dense layer by varying it from 64 to 1024. Interestingly, we find that increasing the number of neurons beyond 256 does not improve the performance. Therefore, we set 256 neurons in the first dense layer. After finalizing the architecture and parameters of the CNN, we carefully select the regularization parameters as follows: We use a dropout rate of 25% after the first and second convolution layers and a dropout rate of 50% at the first dense layer. In addition, we use an ℓ2 regularization parameter λ = 0.0001 to avoid over-fitting.
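
To make the layer operations of Sec. V-A concrete, the sketch below implements a single-filter convolution with stride 1, ReLU, 2 × 2 max pooling with stride 2, and softmax in plain NumPy. The helper names and the numeric inputs are illustrative assumptions, and this is not the Keras implementation used for the experiments. As in most CNN libraries, the "convolution" is computed here as a sliding dot product without flipping the kernel.

```python
import numpy as np


def conv2d_valid(x, k):
    """Slide kernel k over x with stride 1 and no padding (cross-correlation)."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out


def relu(x):
    """Replace negative values by zero, y = max(0, x)."""
    return np.maximum(x, 0)


def max_pool_2x2(x):
    """Max over non-overlapping 2 x 2 regions (stride 2)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


def softmax(z):
    """Map raw output-layer scores to class probabilities."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


# Illustrative 4 x 4 input and 2 x 2 kernel: stride 1 gives a 3 x 3 feature map.
x = np.array([[1, 1, 0, 1],
              [0, 3, 2, 3],
              [1, 1, 1, 2],
              [3, 2, 1, 2]])
k = np.array([[1, 0],
              [1, -1]])
feature_map = conv2d_valid(x, k)
rectified = relu(feature_map)

# 2 x 2 max pooling with stride 2 halves each dimension of a 4 x 4 map.
p = np.array([[1, 2, 5, 6],
              [1, 3, 4, 3],
              [2, 4, 1, 0],
              [3, 3, 2, 2]])
pooled = max_pool_2x2(p)

# Hypothetical output-layer scores for three classes.
probs = softmax(np.array([2.0, 1.0, 0.1]))
predicted_class = int(np.argmax(probs))
print(feature_map, rectified, pooled, probs, predicted_class, sep="\n")
```

In the identification stage, the transmitter would be taken as the class with the largest softmax probability, which is what predicted_class computes for the hypothetical scores above.
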
2) Preprocessing Data: Our experimental studies conducted on different representative classes of ML algorithms demonstrate significant performance improvement by choosing a deep CNN. However, to ensure scalable performance over a large number of devices, our CNN architecture needs to be modified. In addition, our input I/Q sequences, which represent a time-trace of collected samples, need to be suitably partitioned and augmented beyond a stream of raw I/Q samples.

Our classifiers operate on sequences of I/Q samples of a fixed length. In general, given sequences of length L, we can create N = L/ℓ subsequences of length ℓ by partitioning the input stream. We instead create about L − ℓ subsequences (exactly L − ℓ + 1 for a one-sample step) by sliding a window of length ℓ over the larger sequence (or stream) of I/Q samples. Training classifiers over small subsequences leads to more training data points, which in turn yields a low variance but potentially high bias in the classification result. Conversely, large sequences may lead to high variance and low bias. We set 128 as the sequence length. From a wireless communications viewpoint, the channel remains invariant over short durations of time. Hence, the ability to operate on smaller subsequences carved out of in-order received samples allows us to estimate the complex coefficients representing the wireless channel. Thus we train our classifiers over the input I/Q sequences by treating the real and imaginary parts of each sample as two inputs, leading to a training vector of 2 × ℓ samples for a sequence of length ℓ.

3) Shift Invariance: Another prominent characteristic of our CNN classifier, both with respect to our final goal of identifying the transmitting device and in terms of feature extraction, is shift invariance. In short, all events described in Section III can occur at an arbitrary position in a given I/Q sequence. A classifier should be able to detect a device-specific impairment irrespective of whether it occurs at, e.g., the 1-st or 15-th position of an I/Q sequence. Convolved weights in each layer detect signals in arbitrary positions in the sequence, and a max-pool layer passes the presence of a signal to a higher layer irrespective of where it occurs. To enhance the shift-invariance property of our classifier during training, we train it over sliding windows of length ℓ as shown in Fig. 5, rather than partitioned windows: this further biases the trained classifiers to shift-invariant configurations.

Figure 6: (a) Accuracy comparison of SVM, logistic regression, and CNN for 2-5 devices using 5-fold cross-validation. (b) Accuracy obtained using the CNN for 4 devices over different distances between transmitter and receiver. (c) ROC curves (true positive rate versus false positive rate) for 4 devices under CNN classification, with areas under the curve of 0.96402 (SDR #1), 0.93601 (SDR #2), 1.00000 (SDR #3), and 0.99461 (SDR #4).

VI. RESULTS AND PERFORMANCE EVALUATION
We implement our CNN training and classifier in Keras running on top of TensorFlow on an NVIDIA CUDA-enabled
Tesla K40m GPU. We evaluate the performance of our CNN classifier using 5-fold cross-validation. We use the StratifiedKFold class from the scikit-learn Python machine learning library to split the training dataset into 5 folds. Our training set consists of ≈ 720K training examples and ≈ 80K examples for validation. We use another 200K examples for testing the performance of our trained model. Thus, we are able to obtain a less biased estimate of the performance of our model. It took ≈ 43 min to train our model. Performance evaluation on the holdout dataset of 200K examples took only ≈ 3 min. The classifier output performance is measured using metrics such as accuracy and Area Under the Curve (AUC), the latter evaluated on the Receiver Operating Characteristic (ROC) curve comprising true positive rate on the Y-axis and false positive rate on the X-axis.

1) CNN vs. conventional algorithms: We first measure the performance on our dataset using SVM and logistic regression for the classification of nominally similar devices. We extract several features such as amplitude, phase and FFT values from the raw I/Q samples and build a rich set of features to train the classifiers. We obtain the classification accuracy for identification among 2, 3, 4 and 5 devices. As seen in Fig. 6a, the accuracy with the SVM and logistic regression algorithms for 2 devices is ≈ 55%, and it decreases further as the number of devices increases. We then train our CNN classifier using raw data to classify the same set of devices. With our deep CNN, we are able to achieve an accuracy of 98% for five devices, as opposed to less than 33% for the shallow learning SVM and logistic regression algorithms.

2) Impact of distance on radio fingerprinting: We run experiments to collect data over distances ranging between 2 and 50 ft in steps of 4 ft, to evaluate the impact of distance (and possible multipath effects owing to reflections) on classification accuracy. Fig. 6b shows the accuracy for the classification of 4 devices using the CNN. It achieves classification accuracy greater than 95% up to a distance of 34 ft. In addition, the observed SNR and analytical SNR (calculated using the free-space path model) are shown in the same plot to elucidate the effect of received SNR on the classification accuracy. It is evident that the classification is robust against the fluctuations in SNR caused by path loss and multipath fading up to a distance of 34 ft.

3) Receiver Operating Characteristics for radio fingerprinting: We obtain the false positive rate and true positive rate to measure AUC. Fig. 6c shows the ROC curve for four similar WiFi devices. We can see that the CNN model works extremely well, as AUC ranges between 0.93 and 1. The AUC attained for each device is 0.964, 0.936, 1, and 0.994, respectively. This demonstrates that the CNN is an effective model for radio fingerprinting. Additionally, training our CNN over a large dataset with Keras takes significantly less time than the other aforementioned algorithms.

windowing process. However, identifying the optimal length is a critical research objective and should be dependent on the channel coherence time. Varied CNN architectures may lead to significantly different results. Finding an optimal architecture which enhances device classification is an open research issue. A related challenge is obtaining the right balance between training time and the classification accuracy. Increasing the depth of the CNN beyond a point may not help the classification; in fact there are risks of overfitting the training set, as we found in some of our early experiments. Our work focuses on training the model with actual experimental data, while a large body of earlier works attempt to solve a similar problem using synthetic data. There exists no standard dataset to benchmark the performance of our classifier, and releasing all datasets in widely accepted formats is essential for correct replication of experiments. Finally, as a future objective, our goal is to validate the performance of our classifier in identifying a large number of devices at distances of 100-200 ft. This may also require us to effect major changes in the architecture and find new optimum parameters.

VIII. CONCLUSION
We propose a radio fingerprinting approach based on a deep learning CNN architecture trained on I/Q sequence examples. Our design enables learning features embedded in the signal transformations of wireless transmitters, and identifies specific devices. Furthermore, we have shown that our approach of device identification with a CNN outperforms alternate ML techniques such as SVM and logistic regression for the identification of five nominally similar devices. Finally, we experimentally validate the performance of our design on a dataset collected over a range of distances, 2 ft to 50 ft. We observe that detection accuracy decreases as the distance between transmitter and receiver increases, and that computational resources such as GPU support for Keras speed up training. Our future work involves increasing the robustness of the CNN architecture to allow scaling up to correct identification of 1000s of similar radios.

ACKNOWLEDGMENT
This work is supported by DARPA under the Young Faculty Award grant N66001-17-1-4042. We are grateful to Dr. Tom Rondeau, program manager at DARPA, for his insightful comments and suggestions that significantly improved the quality of the work.

REFERENCES
[1] T. O’Shea, J. Corgan, and T. Charles Clancy, “Convolutional radio modulation recognition networks,” Feb. 2016.
[2] Q. Xu, R. Zheng, W. Saad, and Z. Han, “Device fingerprinting in wireless networks: Challenges and opportunities,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 94–104, First Quarter 2016.
[3] J. Franklin, D. McCoy, P. Tabriz, V. Neagoe, J. Van Randwyk,
VII. R ESEARCH C HALLENGES and D. Sicker, “Passive data link layer 802.11 wireless device
We now discuss the challenges associated with the imple- driver fingerprinting,” in Proceedings of the 15th Conference on
USENIX Security Symposium - Volume 15, ser. USENIX-SS’06.
mentation of CNNs for radio fingerprinting. In our experi- Berkeley, CA, USA: USENIX Association, 2006. [Online]. Available:
ments, we set the partition length as 128 through a rectangular http://dl.acm.org/citation.cfm?id=1267336.1267348
[4] K. Gao, C. Corbett, and R. Beyah, “A passive approach to wireless
device fingerprinting,” in 2010 IEEE/IFIP International Conference on
Dependable Systems & Networks (DSN), June 2010, pp. 383–392.
[5] I. O. Kennedy, P. Scanlon, F. J. Mullany, M. M. Buddhikot, K. E. Nolan,
and T. W. Rondeau, “Radio transmitter fingerprinting: A steady state
frequency domain approach,” in 2008 IEEE 68th Vehicular Technology
Conference, Sept 2008, pp. 1–5.
[6] V. Brik, S. Banerjee, M. Gruteser, and S. Oh, “Wireless device
identification with radiometric signatures,” in Proceedings of the 14th
ACM International Conference on Mobile Computing and Networking,
ser. MobiCom ’08. New York, NY, USA: ACM, 2008, pp. 116–127.
[Online]. Available: http://doi.acm.org/10.1145/1409944.1409959
[7] S. V. Radhakrishnan, A. S. Uluagac, and R. Beyah, “Gtid: A technique
for physical device and device type fingerprinting,” IEEE Transactions
on Dependable and Secure Computing, vol. 12, no. 5, pp. 519–532, Sept
2015.
[8] T. J. O’Shea and J. Hoydis, “An introduction to machine learning
communications systems,” CoRR, vol. abs/1702.00832, 2017. [Online].
Available: http://arxiv.org/abs/1702.00832
[9] F. Chen, Q. Yan, C. Shahriar, C. Lu, W. Lou, and T. C. Clancy,
“On passive wireless device fingerprinting using infinite hidden markov
random field,” submitted for publication.
[10] N. T. Nguyen, G. Zheng, Z. Han, and R. Zheng, “Device fingerprinting
to enhance wireless security using nonparametric bayesian method,” in
2011 Proceedings IEEE INFOCOM, April 2011, pp. 1404–1412.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet
classification with deep convolutional neural networks,” in
Proceedings of the 25th International Conference on Neural
Information Processing Systems - Volume 1, ser. NIPS’12. USA:
Curran Associates Inc., 2012, pp. 1097–1105. [Online]. Available:
http://dl.acm.org/citation.cfm?id=2999134.2999257
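
As a supplementary illustration of the analytical SNR curve described in the distance experiment (item 2 of the results), the following minimal sketch evaluates a free-space model on the 2 ft to 50 ft grid with 4 ft steps. The carrier frequency, transmit power, and noise floor are assumptions chosen only for illustration and are not values reported in this work; the function and variable names are likewise hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
FEET_TO_METERS = 0.3048


def free_space_path_loss_db(distance_m, freq_hz):
    """Friis free-space path loss in dB, assuming isotropic antennas."""
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S
    )


def analytical_snr_db(distance_ft, tx_power_dbm, noise_floor_dbm, freq_hz):
    """SNR in dB predicted by the free-space model at a distance in feet."""
    distance_m = distance_ft * FEET_TO_METERS
    rx_power_dbm = tx_power_dbm - free_space_path_loss_db(distance_m, freq_hz)
    return rx_power_dbm - noise_floor_dbm


# Hypothetical link parameters (assumptions, not taken from the paper).
FREQ_HZ = 2.45e9         # assumed carrier frequency
TX_POWER_DBM = 0.0       # assumed transmit power
NOISE_FLOOR_DBM = -90.0  # assumed receiver noise floor

# Measurement grid from the text: 2 ft to 50 ft in steps of 4 ft.
distances_ft = list(range(2, 51, 4))
snr_curve_db = [
    analytical_snr_db(d, TX_POWER_DBM, NOISE_FLOOR_DBM, FREQ_HZ)
    for d in distances_ft
]

if __name__ == "__main__":
    for d, snr in zip(distances_ft, snr_curve_db):
        print(f"{d:2d} ft -> {snr:5.1f} dB")
```

Under this model the predicted SNR falls by about 6 dB each time the distance doubles, regardless of the assumed link parameters. Departures of the observed SNR from such a curve would reflect multipath and other effects that the free-space model ignores.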

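
As a supplementary illustration of the ROC analysis (item 3 of the results), the following minimal sketch computes a per-device AUC from false positive and true positive rates by treating each device as the positive class against all others. It is a generic one-vs-rest construction and is not claimed to be the exact procedure used in this work; the function names and the small probability table in the usage note are hypothetical.

```python
def roc_curve_points(scores, labels):
    """Sweep a threshold over the scores and return (fpr, tpr) lists.

    scores: classifier confidence that a sample came from the target device.
    labels: 1 if the sample truly came from the target device, else 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area


def one_vs_rest_auc(prob_rows, true_devices, device):
    """AUC for one device, with all other devices as the negative class."""
    scores = [row[device] for row in prob_rows]
    labels = [1 if t == device else 0 for t in true_devices]
    return auc_trapezoid(*roc_curve_points(scores, labels))
```

For example, with hypothetical softmax outputs `[[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6]]` and true devices `[0, 0, 1, 1]`, calling `one_vs_rest_auc(prob_rows, true_devices, 0)` scores every device-0 sample above every device-1 sample, which corresponds to a perfectly separated ROC curve.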