
IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 71, NO. 1, JANUARY 2023

Rigorous Analysis of Data Orthogonalization for Self-Organizing Maps in Machine Learning Cyber Intrusion Detection for LoRa Sensors

Manish Nair, Member, IEEE, Tommaso A. Cappello, Member, IEEE, Shuping Dang, Member, IEEE, and Mark A. Beach, Senior Member, IEEE

Abstract— In this article, a novel unsupervised machine learning (ML) algorithm is presented for the expeditious radio frequency (RF) fingerprinting of long range (LoRa)-modulated chirps. Identification based on the received signal strength indicator (RSSI) alone is unlikely to yield a robust means of authentication for critical infrastructure deployments. This is especially true for LoRa, a low-power, long-range wireless Internet-of-Things (IoT) air-interface technology, where the modulated chirps have constant envelope power and correlated in-phase/quadrature (I/Q) samples when the chirps are directly extracted. This makes traditional cyber intrusion detection techniques via a convolutional neural network (CNN) impractical. Moreover, we also prove that such correlation leads to an orthogonally inseparable dataset, due to which classification becomes intractable. Therefore, we propose an efficient way to produce self-organizing maps (SOMs) of LoRa transmitters (TXs) and a potential rogue node prior to CNN classification. This approach offers SOM orthogonalization, thus minimizing the mean square error (MSE) within the CNN using our specially constituted SOM engine for precisely profiling each LoRa TX. This method demonstrates 100% success in recognizing each LoRa TX as either a legitimate device or a rogue.

Index Terms— Convolutional neural network (CNN), cyber intrusion detection, long range (LoRa), radio frequency (RF) fingerprinting, self-organizing map (SOM).

Manuscript received 11 July 2022; revised 13 September 2022 and 19 October 2022; accepted 21 October 2022. Date of publication 29 November 2022; date of current version 13 January 2023. This work was supported by the UKRI/EPSRC Prosperity Partnership in Secure Wireless Agile Networks (SWAN) under Grant EP/T005572/1. (Corresponding author: Tommaso A. Cappello.)
The authors are with the Communication Systems and Networks (CSN) Research Group, University of Bristol, BS8 1UB Bristol, U.K. (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TMTT.2022.3223122.
Digital Object Identifier 10.1109/TMTT.2022.3223122

I. INTRODUCTION

WITH the increasing deployment of Internet-of-Things (IoT) networks as smart cities evolve, especially in critical infrastructure, their susceptibility to radio frequency (RF) cyberattacks from actors making use of software-defined radios (SDRs) and customizing widely available RF sensors has grown substantially [2], [3]. Due to the mass-market ubiquity of such devices, they can be made to imitate each other. Therefore, an indispensable requirement arises to segregate legitimate devices from rogue ones. In this context, there exist significant interests in physical-layer cybersecurity, also referred to as RF cyber–physical security [4], [5]. This need is addressed by the SWAN Prosperity Partnership [6] and broadly encompasses the following.
1) Establish methodologies to understand and synthesize attacks on communication systems vectored through the RF transceiver interface [7], [8].
2) Develop RF transceivers for effective and efficient RF threat detection, analysis, and mitigation [9], [10], [11].
3) Apply machine learning (ML) in the design of passive RF structures [12], [13], [14], [15], [16], RF power amplifiers (PAs) [17], [18], and RF PA linearization [19].
4) Test methodologies and resources for radio networks to evaluate threats and mitigation [20].
5) Apply ML-based techniques for low-cost wireless IoT modems, e.g., long range (LoRa) [21].

In such a scenario, ensuring the durable security of wireless IoT modems deployed in emergent smart city applications and critical infrastructure is of paramount importance [22]. These IoT modems rely on RF interfaces that are vulnerable to over-the-air cyberattacks via the RF open attack surface. For example, LoRa modulation uses chirps with a long symbol duration [23], which increases the vulnerability of being spoofed by rogue nodes [24]. Recently, ML-enabled RF fingerprinting has been shown to be capable of detecting rogue nodes. For example, in [25], raw in-phase/quadrature (I/Q) samples are applied to a convolutional neural network (CNN) for classifying ten different single-carrier modulation schemes, including wideband frequency modulation (WBFM) and double-sideband amplitude modulation (DSB AM), through which competitive accuracies are achieved. In addition, I/Q data samples of wireless local area network (WLAN) and long-term evolution (LTE) signals generated from SDRs are preprocessed, and radio identification is attained by means of an optimized CNN in [9], whereas, in [26], a comparable classifier is shown to be invariant to the MAC-ID spoofing of automatic dependent surveillance-broadcast (ADS-B) and wireless-fidelity (Wi-Fi) signals, as only their modem-specific RF impairments are learned. A similar approach is adopted
0018-9480 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.


in [27] for commercial off-the-shelf (COTS) Wi-Fi devices, albeit with the distinction of examining the composite RF transceiver-channel characteristics.

Fig. 1. Conceptual illustration of our approach: legitimate LoRa nodes ("real LoRa TXs") under attack from spoofers ("rogue LoRa TXs"). For example, a rogue LoRa TX displaces real LoRa TX 4. However, LoRa TX 4 is accurately identified as a cyber intruder. This is accomplished by RF fingerprinting the TXs using a VSA and employing ML with SOMs.

However, the RF fingerprinting of LoRa devices has received only scant attention. In [28], a CNN is shown to achieve the highest classification accuracy with the lowest training time on LoRa spectrograms. Differential constellation trace figure (DCTF) images of LoRa chirps are utilized to enhance the classification capability of a CNN in [29]. On the other hand, LoRa chirps present specific RF fingerprinting challenges given that the chirps have constant envelope power and, therefore, the I/Qs of different transmitters (TXs) are highly correlated. As a consequence, a dataset of LoRa I/Qs collected from nominally identical LoRa TXs is orthogonally inseparable. With orthogonally inseparable I/Qs, the nonconvex training problem of a general CNN classifier becomes NP-hard [30]. To the best of our knowledge, an NP-hard training problem results in the inability of the gradient descent algorithms employed in CNNs to iteratively derive the optimal link weights that minimize a chosen loss function [31]. This leads to, at worst, a stalling of the back-propagation mechanism or, at best, a misestimation of labels in the dataset. Apropos of detecting cyber intrusion through the RF fingerprinting of LoRa chirps, such outcomes are highly undesirable. Nevertheless, not only are such critical insights completely missing in the contemporary literature [9], [25], [26], [27], [28], [29], but nearly all of the RF fingerprinting research in LoRa directly applies a CNN to highly correlated I/Qs with minimal or no preprocessing [28], [29]. Moreover, these CNN-only approaches are risky and lead to erroneous TX profiling because issues such as carrier frequency offset (CFO) drift in LoRa chirps can influence the CNN response [28].

These challenges motivated us to devise a novel RF fingerprinting strategy hinged on the extraction of stable and classifiable features from the severely correlated LoRa I/Qs. We propose the use of intermediary self-organizing maps (SOMs) that are expeditiously created to accurately fingerprint each of the highly correlated LoRa TXs. The SOMs are of low rank and generated directly from the high-dimensional LoRa I/Qs. This approach is validated with five nominally identical LoRa TXs and a rogue TX. Fig. 1 illustrates the concept. Specifically, the contributions of this article, extending those given in [1], are as follows.
1) Using the LoRa chirps generated from five nominally identical TXs and a rogue TX emulator, we prove that the LoRa I/Qs are severely correlated; furthermore, we reveal that correlated I/Q samples result in a dataset that is orthogonally inseparable.
2) We demonstrate that orthogonal inseparability constrains a nonconvex training problem, which minimizes the chosen mean square error (MSE) loss function and is NP-hard.
3) Through experimental results, we also obtain the following observations: the feature clusters in the SOMs are interpreted as being related to modem-specific RF impairments resembling I/Q imbalance; furthermore, CNNs are invariant to shifts in the feature clusters within a dataset of SOM images. Such shift-invariance enables the constitution of an "SOM engine" for training a CNN.
4) We propose and utilize an analytical approach to establish that the deep embedding of the SOM images extracted by the convolutional layers is orthogonalized in a CNN, thus minimizing the MSE.

To the best of our knowledge, this is, hitherto, the first and only attempt to fully detail RF fingerprinting for detecting cyber intrusion using ML with SOMs. The performance of CNN architectures with common activation functions, batch normalization (BN), and a dropout layer is evaluated. Finally, their efficacy with regard to detecting cyber intrusion is examined with the help of their RF fingerprint structures.

The remainder of this article is organized as follows. Section II briefly describes LoRa modulation, the setup for LoRa fingerprinting, and the emulation of cyber intrusion. Section III scrutinizes the problem of RF fingerprinting LoRa chirps. Section IV introduces SOMs and investigates their resolution using a CNN. Finally, this article is concluded in Section V.

II. DESCRIPTION OF LoRa MODULATION, SYSTEM SETUP, AND INTRUSION EMULATION

LoRa exploits the chirp spread spectrum (CSS), where symbols are encoded onto linear frequency-modulated chirps. A chirp is characterized by its modulation bandwidth (BW) and spreading factor (SF). Therefore, given y ∈ Y = {1, 2, ..., Y} = {0, ..., 2^SF − 1}, a number of symbols are included in the transmission codebook, and the symbol time is given by Ts = 2^SF/BW [32]. The chirp is generated by introducing an offset f_os,y = (BW/2^SF)·y that uniquely identifies the symbol. The resulting yth LoRa-modulated symbol in the transmission codebook can be expressed in the time domain as

  x_{y∈Y}(t) = exp{ j2π [ f_c·t + ( f_os,y − Λ·BW )·t + ( BW/(2·Ts) )·t² ] }    (1)

where Λ is (1/2) for f_os,y > (BW/2) or (3/2) for f_os,y < (BW/2), and f_c is the RF carrier frequency. The significance of Λ can be explained as follows. Since the transmission BW is defined between −(BW/2) and (BW/2), choosing an SF that


Fig. 2. Block diagram of the system setup: five real LoRa TXs and an
ARB with MATLAB R2020a emulating a rogue LoRa TX (cyber intruder)
are consecutively connected to the system setup. The RF signal is captured
at the VSA and also coupled to the oscilloscope. I/Qs are extracted from
the VSA and fingerprinted on a controlling PC also with MATLAB R2020a.
It should be noted that the roles of real and rogue TXs can be interchanged.
Fig. 3. Photograph of the test bench [1]: the five LoRa TXs are placed inside a shielded enclosure in order to prevent RF leakage since f_c = 868.1 MHz is part of the license-exempt ISM band.
generates f_os,y > 0 results in a chirp that exceeds (BW/2) before the end of the symbol period Ts. To prevent this, the sampling frequency is set to exactly half of the Nyquist–Shannon sampling rate, i.e., BW (instead of 2·BW), because of which the part of the chirp that goes higher than (BW/2) folds back to −(BW/2). Hence, an LoRa-modulated symbol can be expressed in two parts: one prior to the occurrence of folding (i.e., Λ = (1/2) for f_os,y > (BW/2)) and the other after the occurrence of folding (i.e., Λ = (3/2) for f_os,y < (BW/2)). From (1), we can summarize the following key features that are relevant to our work.
1) There occurs continuous and linear chirping of frequency around f_c, which can be visualized either in the time–angle domain (see Fig. 5) or the time–frequency domain spectrogram (see Fig. 6).
2) As the frequency chirps, a continuous change/rotation of phase occurs (see Figs. 4–6).
3) The chirps have constant envelope power (see Fig. 4).

Fig. 2 depicts the block diagram of the system setup used to capture the LoRa I/Q sequences generated by a real LoRa TX (SODAQ) or by a rogue node (ARB) emulating a cyber intruder. The "real" TXs are five nominally identical SODAQ Explorer boards that are consecutively connected to the test coupler input. Each SODAQ board contains a low-cost Microchip RN2483 LoRa modem configured with f_c = 868.1 MHz (dedicated Industrial, Scientific, and Medical (ISM) band), SF = 7, BW = 250 kHz, coding rate (CR) = 4/5 (i.e., parity-check forward-error-correcting (FEC) coding only), and a transmit power of −3 dBm. Moreover, all the SODAQ boards are bit-similar, i.e., they have identical physical addresses and MAC IDs. An instrumentation-grade arbitrary waveform generator (ARB Rohde & Schwarz SMATE200A [33]) is then used to emulate a rogue LoRa intruder, as depicted in Fig. 2, and configured with the same parameters as the SODAQ boards for achieving the maximum similarity.

The output of each SODAQ board or the ARB is captured by a vector signal analyzer (VSA Rohde & Schwarz FSQ26). An oscilloscope (Keysight MSOS804A) is also used to monitor the TX signal in the time domain. The VSA and the oscilloscope share the same 10-MHz reference lock. Fig. 3 shows a photograph of the test bench, where each LoRa board is contained in a shielded enclosure to avoid RF leakage in the ISM band. At the enclosure's interface, SMA/3.5 mm components and cables are used to implement the block diagram shown in Fig. 2.

An example LoRa-modulated chirp in the time–angle domain (captured from the output of a SODAQ board by the VSA Rohde & Schwarz FSQ26 in our test setup) is shown in Fig. 5. It provides a conceptual illustration of the offset f_os,y that generates an LoRa chirp, expressed as 2π·f_os,y = 2π·(BW/2^SF) rad. Fig. 6 depicts the time–frequency spectrogram of an LoRa frame (also from our test setup), where the power of the chirps is given in the dB scale. The initial eight chirps comprise the preamble. The subsequent two chirps and 2.25 downchirps time-synchronize and frequency-synchronize the LoRa frame at the receiver. It should be noted that these initial 8 + 2 + 2.25 = 12.25 chirps are unmodulated. The remaining chirps, modulated by symbols in the transmission codebook Y, comprise the payload. From this example, the offset f_os,y and the folding of chirps are clearly observed.

III. PROBLEM FORMULATION

After capturing the I/Qs at the output of the five LoRa boards (SODAQ) and the cyber intruder (ARB), the I/Q constellations are plotted in Fig. 4. Such constellations reveal constant envelope power and (continuous) phase rotation.¹ Because of this, the I/Qs are highly correlated; the CNN

¹ Equation (1) describes constant envelope power and continuous phase rotation. The extent of the "spiral-outs" in Fig. 4 is determined by the pulse rise and fall times, a function of: 1) the Microchip RN2483 LoRa modem generating the modulated time-domain chirp waveform at the baseband; 2) the startup time of the crystal oscillator [the main timing reference for the SODAQ module driving the phase-locked loop (PLL)]; and 3) the turn-on/-off time of the RF PA.


Fig. 6. Time–frequency spectrogram of an LoRa frame (also obtained in MATLAB using an I/Q sequence captured from the VSA): it comprises a preamble (initial eight unmodulated chirps), frame sync. (two unmodulated chirps), frequency sync. (2.25 unmodulated downchirps), and payload (remaining modulated chirps).

Fig. 4. Constellations of the six I/Qs captured from the VSA for a single LoRa frame only, within the order of the LoRa frame time, i.e., O{32·Ts} (see Figs. 5 and 6), indicate a constant power envelope, because of which the LoRa chirps of all the TXs are highly correlated (they are nearly indistinguishable from one another). The "spiral-out" in each constellation marks the startup of chirps at the beginning of the LoRa frame. In [1], constellations of the six I/Qs were captured from the VSA for several LoRa frames. Therefore, several such spiral-outs were observed in each constellation, where each spiral-out marked the startup of chirps at the beginning of each new LoRa frame. Such occurrence of several spiral-outs (not shown here) indicated constellation rotation in addition to phase rotation (caused by linear chirping) because of the CFO between the TX/RX local oscillators.

Fig. 5. Time-domain visualization of an LoRa chirp obtained in MATLAB using one of the six I/Q sequences captured from the VSA: a symbol is encoded by the angle (shown in degrees) of a chirp with constant envelope power, where the symbol time Ts is in the millisecond range. In our work, for SF = 7 and BW = 250 kHz, Ts = 2^SF/BW = 0.512 ms.

classifiers developed in [9], [25], [26], [27], [28], and [29] cannot distinguish between them. In this section, we formalize this problem.

A. Neural Computing Using CNNs

General-purpose CNN classifiers adopted in [9], [25], [26], [27], [28], and [29] for traditional RF fingerprinting consist of convolutional layers and fully connected (FC) layers. Fig. 7 depicts their general structure. The convolutional layers distill shift-invariant features in the input I/Q samples (collected from different modems), which form the neural computing parameters for the succeeding FC layers. There are S different labels in the classification dataset of high-dimensional² LoRa I/Qs denoted as {X^T, l^T}, as shown in Table I, where X = [x_1^T, x_2^T, ..., x_S^T] ∈ C^(M×S) are vectors of complex I/Qs with x_s ∈ C^(1×M), ∀s ∈ S = {1, 2, ..., S}; ∀m ∈ M = {1, 2, ..., M}, given M = 2 × 10^6, are the I/Q samples in each sth dataset; and l ∈ R^(1×S), given S = 6, are the known labels. A supervised gradient descent-based algorithm is generally adopted to yield the estimated labels.

In order to fit the labels l ∈ R^(1×S) from the classification dataset X ∈ C^(M×S) utilizing a CNN with F(·) = ReLU(·) as the activation function³ and pooling, as depicted in Fig. 7, the entire training procedure can be classified into the forward-propagation phase and the back-propagation phase. In the forward-propagation phase, the dataset X is further sliced according to a preset batch size as {X_k}_{k=1}^{K} ∈ C^((M/K)×(S/K)).⁴ The batch matrices define the batches upon which the individual convolutions operate. Then, per-batch training is carried out to yield the output of the CNN. The outputs of the neurons located in the Dth (output) layer are the final estimated outputs (labels) of the CNN and are compared to the 1 × S vector of desired (known) labels l to determine the MSE for the set of neurons in the output layer.

According to the back-propagation mechanism, the MSEs are further utilized to recursively compute the set of gradient

² A classification dataset is high dimensional when: 1) the sample size is very large and 2) each sample is potentially extricable (by the convolutional layers) as a classifiable feature. Thus, there can potentially exist as many features as the sample size in the dataset.
³ ReLU(·) = max(0, ·) is the rectified linear unit of the enclosed argument. It is a nonlinear transformation belonging to the class of constrained linear functions.
⁴ K is usually set as the greatest common divisor (GCD) of M and S. In fact, (M/K) × (S/K) is the size of the 2-D convolution operation.


Fig. 7. Structure of a general-purpose CNN, consisting of convolutional layers followed by FC layers.

TABLE I
CLASSIFICATION DATASET {X^T, l^T} OF LORA I/QS

descents for the neurons in the output layer and the hidden layers. With the set of gradient descents, the correction terms for the set of link weights and the set of biases are computed. This enables the back-propagation mechanism to iteratively update the set of link weights and biases until certain termination conditions are satisfied. Enabled by such an iterative updating procedure, the CNN is trained to produce outputs (estimated labels) close to the desired outputs (real labels l). Otherwise, this procedure is continued on an iterative basis. Once the iterative procedure is stopped, the resultant CNN with the updated link weights and biases is said to be trained.

B. Orthogonal Inseparability of LoRa I/Qs

For ease of analysis, we now consider a simplified two-layer CNN in Fig. 7, i.e., D = 2 (one hidden layer and one output layer) with S neurons in the FC hidden layer and the output layer. Here, the number of neurons in the FC hidden and output layers is equal to the number of labels S. Then, upon denoting W(e) = [w_1^T(e), w_2^T(e), ..., w_S^T(e)] ∈ R^(K×S) and v(e) = [v_1(e), v_2(e), ..., v_S(e)] ∈ R^(1×S) as the link weight matrix and vector connecting the kth batch matrix X_k to the S hidden neurons (in the hidden layer) and, in turn, to the output layer, respectively, at the eth epoch, the MSE can be written as

  MSE = (β/(2·Φ)) Σ_{e=1}^{Φ} E{ ‖ Σ_{s=1}^{S} ReLU( α[ Σ_{k=1}^{K} X_k·w_s(e) + θ_1^T(e) ] )·v^T(e) + θ_2^T(e) − l^T ‖_2^2 }.    (2)

Here, the expression enclosed within ReLU(·) is regarded as the output of a two-layer neural network at the eth epoch with the ReLU(·) activation function, 2-D convolution, and pooling, similar in structure to the CNN in Fig. 7; E{·} denotes the expectation of the enclosed variable; ‖·‖_2 denotes the 2-norm of the enclosed argument; e ≤ Φ is the actual number of running epochs (Φ being their maximum number); and θ_1^T(e) and θ_2^T(e) are the sets of biases at the hidden and output layers, respectively. In (2), α is the learning rate hyperparameter, and β is the range expansion coefficient hyperparameter, which is preset for randomly initializing the set of link weights w_s(e) ∈ W(e) and v(e).⁵ Over the course of e ≤ Φ epochs, the MSE in (2) is minimized to yield the optimal link weight matrix W(Φ) and vector v(Φ).

⁵ In particular, the random initialization process is carried out by assigning different uniformly distributed real numbers within the range (−β, β) to max{W(e ≤ Φ)}.


TABLE II
ILLUSTRATION OF THE ORTHOGONAL INSEPARABILITY OF THE CLASSIFICATION DATASET OF THE I/QS OF LORA CHIRPS AS DEFINED IN (5)

The optimization problem can be formulated as follows:

  MSE = arg min_{w_s(Φ)∈W(Φ), v(Φ)} (β/(2·Φ)) Σ_{e=1}^{Φ} E{ ‖ Σ_{s=1}^{S} ReLU( α[ Σ_{k=1}^{K} X_k·w_s(e) + θ_1^T(e) ] )·v^T(e) + θ_2^T(e) − l^T ‖_2^2 }.    (3)

It has been proven that the minimization in (3) is a nonconvex max-margin optimization problem [34]. Gradient descent-based solvers, such as adaptive stochastic gradient descent (ASGD), are employed in a CNN, which derives the optimal link weights W(e ≤ Φ) and v(e ≤ Φ) in (3) over the course of e ≤ Φ epochs via the back-propagation mechanism. Meanwhile, W(e ≤ Φ) and v(e ≤ Φ) are further processed by the softmax(·) layer of the CNN (see Fig. 7) to yield the normalized class of S probabilities given by

  p_s(e) = exp{ ReLU( [ Σ_{k=1}^{K} α( X_k·w_s(e) + θ_1^T(e) ) ]·v^T(e) + θ_2^T(e) ) } / Σ_{s=1}^{S} exp{ ReLU( [ Σ_{k=1}^{K} α( X_k·w_s(e) + θ_1^T(e) ) ]·v^T(e) + θ_2^T(e) ) }.    (4)

In (4), 0 ≤ p_s(e) ≤ 1 and Σ_{s=1}^{S} p_s(e) = 1.

However, all of the contemporary literature on RF fingerprinting [9], [25], [26], [27], [28], [29] implicitly assumes that a classification dataset is orthogonally separable. We will now show that this assumption is not necessarily valid, i.e., a dataset can be orthogonally inseparable if its samples are highly correlated, as with our dataset of LoRa I/Qs. A classification dataset {X^T, l^T} is deemed orthogonally separable if [35]

  x_s^H·x_t > 0, if l_s = l_t
  x_s^H·x_t ≤ 0, if l_s ≠ l_t
  s.t. x_{s∈S}, x_{t∈S} ∈ X ∈ C^(M×S)
  s.t. l_{s∈S}, l_{t∈S} ∈ l^T ∈ R^(S×1)
  ∀s ≠ t.    (5)

In other words, a dataset is orthogonally separable if and only if it is linearly separable, and a training problem similar to the nonconvex problem in (3) can serve as a linear separator. Table II illustrates the orthogonal inseparability of the LoRa I/Q dataset. Although the first condition in (5) (i.e., x_s^T·x_t > 0 if l_s = l_t) is satisfied (the diagonal elements in Table II are positive), the second condition (i.e., x_s^T·x_t ≤ 0 if l_s ≠ l_t) is not fulfilled (not all the off-diagonal elements in Table II are less than or equal to zero). This is because the LoRa I/Qs from the five real LoRa TXs (SODAQ) and the rogue LoRa TX (ARB) are correlated due to their constant envelope power and continuous phase rotation (see Fig. 4). Fig. 8 shows the correlation matrix of the high-dimensional 2 × 10^6 I/Q samples from the five real LoRa TXs (SODAQ) and the rogue LoRa TX (ARB). Histograms of each of the I/Q sequences [x_1^T, ..., x_6^T] appear along the diagonal, while scatter plots of the variable pairs of I/Qs {x_{s∈S}, x_{t∈S}}, ∀s ≠ t, occur in the off-diagonal. The slopes of the least-squares regression fits in the scatter plots are equal to the displayed correlation coefficients, which are computed for every variable pair of I/Qs as [36]

  ρ_{x_s,x_t} = ( E{x_s·x_t} − E{x_s}·E{x_t} ) / ( √(E{x_s²} − E{x_s}²) · √(E{x_t²} − E{x_t}²) )
  s.t. x_{s∈S}, x_{t∈S} ∈ X ∈ C^(M×S)
  ∀s ≠ t.    (6)

The term in the numerator of (6) is the covariance of the respective variable pair of I/Qs, and the twin terms in the denominator are their respective standard deviations. The displayed correlation coefficients in the correlation matrix of Fig. 8 clearly demonstrate the persistent correlation between all the possible variable pairs of I/Q sequences collected from the five LoRa TXs (SODAQ boards) and the
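The two sign conditions of (5) can be checked directly on toy data. The helper below and its example vectors are hypothetical (the paper's own check is the 6 × 6 inner-product structure of Table II); they are shown only to make the conditions concrete.

```python
import numpy as np

def orthogonally_separable(X, labels):
    """Check (5): columns of X are sample vectors; one label per column."""
    S = X.shape[1]
    for s in range(S):
        for t in range(S):
            if s == t:
                continue
            ip = np.vdot(X[:, s], X[:, t]).real   # Hermitian inner product x_s^H x_t
            if labels[s] == labels[t] and ip <= 0:
                return False                      # same label must correlate positively
            if labels[s] != labels[t] and ip > 0:
                return False                      # different labels must not
    return True

# Two classes in opposite orthants: separable.
X_ok = np.array([[1.0, 2.0, -1.0],
                 [0.5, 1.0, -2.0]])
print(orthogonally_separable(X_ok, [0, 0, 1]))    # True

# Highly correlated columns with different labels (the LoRa situation): not separable.
X_corr = np.array([[1.0, 1.1],
                   [1.0, 0.9]])
print(orthogonally_separable(X_corr, [0, 1]))     # False
```

The second example mirrors the LoRa dataset: nearly identical constant-envelope samples with different labels yield positive off-diagonal inner products, violating the second condition of (5).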


into an NP-hard problem.7 Under such circumstances, the iter-


ative updates to the link weights are inaccurate, due to which
the termination conditions cannot be satisfied. This results in
either: 1) the misestimation of the labels (i.e., the estimated
labels are not close to the original labels l) in turn leading
to misclassification or 2) the failure of the back-propagation
mechanism itself. Therefore, with a highly correlated dataset
whose labeled samples are not orthogonally separable, the
requirement can arise for an intermediate method to sustain
the back-propagation mechanism, which trains a CNN through
gradient descent to estimate outputs (labels) that are close to
Fig. 8. Correlation matrix of the LoRa I/Qs from five real LoRa TXs: the known outputs.
(SODAQ) and one rogue LoRa TX (ARB) cyber intruder. The histograms
appearing along the diagonal represent the distribution of each of the I/Q
sequences in X = [x1T , x2T , . . . , x6T ], while each off-diagonal contains the IV. R ESOLVING SOM S W ITH CNN
respective scatterplot associated with any variable pair of I/Q sequences To mitigate the aforementioned challenge, a two-step
xs , xt , s.t.xs∈S , xt∈S ∈ X ∈ C M×S and ls∈S , lt∈S ∈ lT ∈ R S×1 .
process is proposed here. First, the algorithm proposed in [37]
is used for the fast (≤ 32 · Ts [23]) generation of the SOMs,
ARB (emulating the rogue LoRa TX cyber intruder) according and next, a general-purpose classifier is applied. This results
−1 ≤ ρ_{x_s,x_t} ≤ 1
s.t. ρ_{x_s,x_t} = 0, if x_s, x_t are uncorrelated
s.t. x_{s∈S}, x_{t∈S} ∈ X ∈ C^{M×S}
∀s ≠ t.   (7)

In particular, the I/Qs belonging to the rogue LoRa TX cyber intruder exhibit a high correlation with the I/Q sequences of all the other five LoRa TXs. This makes cyber intrusion through the spoofing of LoRa chirps quite effective.

With the dataset {X^T, l^T} of LoRa I/Q sequences being orthogonally inseparable, the minimization of the MSE, as expressed in (3), in order to derive the optimized link weights w_{s∈S}(e ≤ ℰ), v(e ≤ ℰ) of the hidden layer and the output layer (for a successful classification) becomes constrained as

MSE = arg min_{w_s(ℰ)∈W(ℰ), v(ℰ)} (β/(2·ℰ)) Σ_{e=1}^{ℰ} E{‖Σ_{s=1}^{S} ReLU([Σ_{k=1}^{K} α{X_k w_s(e) + θ_1^T(e)}]) v^T(e) + θ_2^T(e) − l^T‖_2^2}
s.t. C: x_{s∈S}^T x_{t∈S} > 0
s.t. X_{k∈K}, x_{s∈S}, x_{t∈S} ∈ X
s.t. l_{s∈S}, l_{t∈S} ∈ l^T
∀s ≠ t   (8)

due to the additional penalty of orthogonal inseparability indicated by constraint C.⁶ Since the training problem of (8) is constrained by orthogonal inseparability, solvers, such as ASGD, cannot be employed to derive the optimized link weights. In other words, a dataset that is not orthogonally separable due to correlation transforms the classification problem in CNNs into one that is polynomial-time hard.⁷

… to [36] in the process illustrated in Fig. 9 and enumerated in detail as follows.

1) Here, every LoRa I/Q sample within the classification dataset X = [x_1^T, x_2^T, ..., x_6^T] ∈ C^{M×S} is first indexed as

x_{i=1}, x_{i=2}, ..., x_{i=M} (forming x_1), x_{i=M+1}, x_{i=M+2}, ..., x_{i=2M} (forming x_2), ..., x_{i=M(S−1)+1}, x_{i=M(S−1)+2}, ..., x_{i=MS} (forming x_6)

where i ∈ {1, 2, ..., M, M+1, M+2, ..., 2M, ..., M(S−1)+1, M(S−1)+2, ..., MS}, M = 2 × 10^6 is the number of I/Q samples per LoRa TX (see Table I), and S = 6 is the number of LoRa TXs.

2) After this, an ANN matrix W_{j=1} ∈ R^{U×V} is created such that U, V ≪ M (thus ensuring that the resulting SOMs are of low rank⁸) on the j = 1st epoch, j ∈ {1, ..., J}, as also shown in Fig. 9. In our work, we initialize W_{j=1} (colloquially termed a feature map) as an R^{U×V} (real) matrix of weights, each arbitrarily assigned an identical value = 160 (see Fig. 10).

3) The set of MS Euclidean norms d_{i,j=1} = ‖x_{i,j=1} − W_{j=1}‖ is then computed on the j = 1st epoch and grouped as

D_{1,j=1} = {d_{i=1,j=1}, d_{i=2,j=1}, ..., d_{i=M,j=1}},
D_{2,j=1} = {d_{i=M+1,j=1}, d_{i=M+2,j=1}, ..., d_{i=2M,j=1}},
..., D_{6,j=1} = {d_{i=M(S−1)+1,j=1}, d_{i=M(S−1)+2,j=1}, ..., d_{i=MS,j=1}}.

⁶In constraint C, the absence of the second condition (i.e., x_s^T x_t ≤ 0, if l_s ≠ l_t) from (5) is to be noted. Hence, constraint C indicates orthogonal inseparability.
⁷This is polynomial-time hard, i.e., no algorithms exist in the problem dimensions R^{2d=K×S} that can derive the optimized link weights w_s(e) ∈ W(e) and vector v(e) over e ≤ ℰ epochs, thus constructing a polynomial that produces a response close to the label vector l, i.e., Σ_{s=1}^{S} ReLU([α{X_k w_s(e ≤ ℰ) + θ_1^T(e ≤ ℰ)}]) v^T(e ≤ ℰ) + θ_2^T(e ≤ ℰ) ≈ l^T.
⁸We arbitrarily choose U × V = 10 × 10 = 100. As a result, compared to the high-dimensional I/Qs (M = 2 × 10^6), the SOMs are low-dimensional. In this way, SOMs achieve a dimensionality reduction in the order of 10^4, i.e., O{10^4}, considerably simplifying neural computation.

Authorized licensed use limited to: b-on: Instituto Politecnico de Viana do Castelo. Downloaded on May 16,2023 at 08:29:09 UTC from IEEE Xplore. Restrictions apply.
396 IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 71, NO. 1, JANUARY 2023

Fig. 9. Generation of SOMs from the complex I/Q vector acting on an ANN matrix created for every ith input and jth epoch by unsupervised ML.

Fig. 10. Extent of update (or cluster) of the U × V = 100 weights in the ANN at every jth epoch. The "winning" weight at any jth epoch I_{x_s∈S,j} has the greatest extent of the cluster (marked with the darker color map), and the geometrically farthest weight from the winning weight I_{x_s∈S,j} at the corresponding jth epoch has the least extent of the cluster (marked with the lighter color map). The extent of the cluster of the remaining U × V − 2 = 98 weights occurs in between. The batch update generating offspring ANNs at every jth epoch occurs within a CPU clock period F_clk.

From this grouping on the j = 1st epoch, the batch of S = 6 "winning" weights [I_{x_1,j=1}, I_{x_2,j=1}, ..., I_{x_6,j=1}] is selected, which minimizes the Euclidean norms: [I_{x_1,j=1} = min{D_{1,j=1}}, I_{x_2,j=1} = min{D_{2,j=1}}, ..., I_{x_6,j=1} = min{D_{6,j=1}}].

4) Utilizing this batch of S = 6 winning weights, six batch update matrices [δW_{x_1,j=1} = μ_{j=1}(I_{x_1,j=1}, d_{x_1,j=1}), δW_{x_2,j=1} = μ_{j=1}(I_{x_2,j=1}, d_{x_2,j=1}), ..., δW_{x_6,j=1} = μ_{j=1}(I_{x_6,j=1}, d_{x_6,j=1})] are computed on the j = 1st epoch. Here, the hyperparameter μ_{j=1}(·, ·) represents the learning rate on the j = 1st epoch, and any sth batch update matrix δW_{x_s∈S,j=1} has dimensionality R^{U×V}.

5) At the end of the j = 1st epoch, each of the six batch update matrices is added to the original ANN W_{j=1} to produce a batch of six offspring ANNs W_{j=1} = [W_{x_1,j=1} = δW_{x_1,j=1} + W_{j=1}, W_{x_2,j=1} = δW_{x_2,j=1} + W_{j=1}, ..., W_{x_6,j=1} = δW_{x_6,j=1} + W_{j=1}].

6) The above-described procedure is repeated from the j ≥ 2nd epoch onward, with the exception that the six batch update matrices [δW_{x_1,j≥2} = μ_{j≥2}(I_{x_1,j≥2}, d_{x_1,j≥2}), δW_{x_2,j≥2} = μ_{j≥2}(I_{x_2,j≥2}, d_{x_2,j≥2}), ..., δW_{x_6,j≥2} = μ_{j≥2}(I_{x_6,j≥2}, d_{x_6,j≥2})] obtained at the end of any j ≥ 2 epoch are added to the batch of six offspring ANNs derived at the previous (j − 1)th epoch instead of the original ANN W_{j=1}, expressed herein as W_{j≥2} = [W_{x_1,j≥2} = δW_{x_1,j≥2} + W_{j−1}, W_{x_2,j≥2} = δW_{x_2,j≥2} + W_{j−1}, ..., W_{x_6,j≥2} = δW_{x_6,j≥2} + W_{j−1}].

In this way, the batch of six offspring ANN matrices generated by the j ≤ Jth epoch is denoted as W_J = [W_{x_1,J} = δW_{x_1,J} + W_J, W_{x_2,J} = δW_{x_2,J} + W_J, ..., W_{x_6,J} = δW_{x_6,J} + W_J].
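Steps 1)–6) follow the spirit of a Kohonen SOM [36]. The toy sketch below is not the authors' batch-update rule μ_j(·, ·); it is a simplified, classic winner-takes-most update (scalar samples, Gaussian neighborhood) that reproduces the qualitative behavior of Fig. 10, where the winning weight receives the greatest extent of update and geometrically distant weights the least:

```python
import numpy as np

rng = np.random.default_rng(1)
M, S, U, V = 200, 6, 10, 10           # samples per TX (toy), TXs, SOM grid
X = rng.normal(0.0, 50.0, (S, M))     # stand-in for the per-TX I/Q sample streams
grid = np.stack(np.meshgrid(np.arange(U), np.arange(V), indexing="ij"), axis=-1)

def train_som(samples, epochs=20, mu=0.5, sigma=2.0):
    W = np.full((U, V), 160.0)        # identical initial weights, as in step 2)
    for _ in range(epochs):
        for x in samples:
            d = np.abs(x - W)         # distances d_{i,j} to every weight
            win = np.unravel_index(np.argmin(d), d.shape)   # winning weight I
            # Gaussian neighborhood: greatest update at the winner,
            # least at the geometrically farthest weight (cf. Fig. 10)
            g = np.exp(-((grid - np.array(win)) ** 2).sum(axis=-1) / (2 * sigma**2))
            W = W + mu * g * (x - W)
    return W

offspring = [train_som(X[s]) for s in range(S)]   # one offspring map per LoRa TX
print(all(not np.allclose(Wm, 160.0) for Wm in offspring))   # maps have specialized
```

Each per-TX stream drags the common initialization to a different map, which is the fingerprint-like behavior exploited by the classifier.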


This batch characterizes the SOMs of the dataset X = [x_1^T, x_2^T, ..., x_6^T] of LoRa I/Qs; each offspring matrix is associated with a specific LoRa TX.

Fig. 9 (top right) shows the SOM obtained after J = 200 epochs. We attain a computation time of about 30 s for completing 200 epochs (on a personal computer with an Intel Core i7 CPU clocked at F_clk = 3.6 GHz). We note that 30 s/200 epochs = 150 ms, which is in the order of the LoRa frame time (i.e., O{32·T_s}), thus proving the feasibility of the approach. Dedicated hardware, such as application-specific integrated circuits (ASICs) or graphics processing units (GPUs), can achieve higher execution speeds.

It should be noted that an epoch constitutes that specific set of iterations (colloquially quantified by their associated time interval in terms of the number of CPU/ASIC/GPU clock cycles at frequency F_clk in the modern/applied ML literature) that is induced by virtue of each of the interpolated hyperparameters. Here, the only hyperparameter of interest is the learning rate μ_j(·, ·), and the iterations induced by its tuning are the set of learning rate hyperparameters μ_{j=1}(I_{x_1,j=1}, d_{x_1,j=1}), μ_{j=1}(I_{x_2,j=1}, d_{x_2,j=1}), ..., μ_{j=1}(I_{x_6,j=1}, d_{x_6,j=1}), altering from the j = 1st epoch to the set μ_{j≥2}(I_{x_1,j≥2}, d_{x_1,j≥2}), μ_{j≥2}(I_{x_2,j≥2}, d_{x_2,j≥2}), ..., μ_{j≥2}(I_{x_6,j≥2}, d_{x_6,j≥2}) over j ≥ 2 epochs, which are given as follows.
1) The generation of the six batch update matrices [δW_{x_1,j=1} = μ_{j=1}(I_{x_1,j=1}, d_{x_1,j=1}), δW_{x_2,j=1} = μ_{j=1}(I_{x_2,j=1}, d_{x_2,j=1}), ..., δW_{x_6,j=1} = μ_{j=1}(I_{x_6,j=1}, d_{x_6,j=1})] at the j = 1st epoch; the generation of the six batch update matrices [δW_{x_1,j≥2} = μ_{j≥2}(I_{x_1,j≥2}, d_{x_1,j≥2}), δW_{x_2,j≥2} = μ_{j≥2}(I_{x_2,j≥2}, d_{x_2,j≥2}), ..., δW_{x_6,j≥2} = μ_{j≥2}(I_{x_6,j≥2}, d_{x_6,j≥2})] from the 2 ≤ j ≤ Jth epoch onward.
2) The addition of the six batch update matrices to the original ANN W_{j=1} to produce the first batch of six offspring ANNs, expressed as W_{j=1} = [W_{x_1,j=1} = δW_{x_1,j=1} + W_{j=1}, W_{x_2,j=1} = δW_{x_2,j=1} + W_{j=1}, ..., W_{x_6,j=1} = δW_{x_6,j=1} + W_{j=1}], at the j = 1st epoch; the addition of the six batch update matrices to the batch of six offspring ANNs obtained at the previous (j − 1)th epoch to produce the batch of six offspring ANNs at every j ≥ 2 epoch, expressed as W_{j≥2} = [W_{x_1,j≥2} = δW_{x_1,j≥2} + W_{j−1}, W_{x_2,j≥2} = δW_{x_2,j≥2} + W_{j−1}, ..., W_{x_6,j≥2} = δW_{x_6,j≥2} + W_{j−1}], from the 2 ≤ j ≤ Jth epoch onward.

Furthermore, the color map of the SOMs in Fig. 9 represents the extent of update (or the scale of update; colloquially termed extent of cluster or just cluster) of each of the U × V = 100 weights in any offspring ANN matrix δW_{s,j≤J=200} + W_j by the j ≤ J = 200th epoch, referenced to the original ANN W_j. Fig. 10 shows the extent of the cluster of the weights in an offspring ANN δW_{s,j≤J=100} + W_j over j ≤ J = 100 epochs. In Fig. 10, it is observed that the extent of cluster of these weights varies from their initialized fixed value = 160 up to a range = 35:155 by the j = 100th epoch.

Finally, Fig. 9 also depicts several labeled SOM image datasets constituting an SOM engine. Six SOM datasets at J = {200, 400, 600, 800, 1000, 1200} epochs are collected from each of the five real and nominally identical LoRa TXs and from the rogue LoRa TX emulating the cyber intruder, within the order of the LoRa frame time. From each dataset of the six SOM images attributed to a particular LoRa TX, five are randomly chosen and labeled⁹ as the training set (see Fig. 9 within green boxes). The remaining SOMs are the test images for that particular LoRa TX (see Fig. 9 within blue boxes). In this way, a dataset consisting of 36 SOM images is generated, 30 of which are labeled (attributed to a particular LoRa TX class) for training, whereas the remaining six are used for testing purposes.

A. Orthogonalization of Feature Clusters

It is clear that the SOMs shown in Fig. 9 display prominent clusters of features, marked with a different color, unique for each SODAQ board (conceptually similar to a fingerprint) and also for the rogue LoRa TX (ARB). For example, the SOM of the rogue LoRa TX (representing the cyber intruder) displays, broadly, two sets of clusters. On the other hand, the SOMs from the real LoRa TXs display several sets of clusters.

It is likely that these feature clusters are correlated with the envelope variations at the output of the respective LoRa boards (see Fig. 2), resembling¹⁰ I/Q imbalance. I/Q imbalance refers to a modem-specific RF impairment in the quadrature mixer that upconverts an I/Q waveform to an RF signal. It is often caused by gain and phase mismatches between the parallel sections of the RF chain when processing the I (in-phase) and Q (quadrature) signal paths. If the analog gain is not equalized for each signal path, it causes amplitude imbalance. A mismatch in the delay causes phase imbalance. Conversely, the instrumentation-grade ARB¹¹ does not show such fluctuations because of the high-quality and expensive RF front-end components that implement gain and phase equalization minimizing I/Q imbalance.

Training the CNN with the SOM engine instead of the original dataset {X^T, l^T} of LoRa I/Qs, the expression of the MSE for minimization becomes as given in (9), shown at the bottom of the next page. Here, c ∈ C = {1, 2, ..., C}, |C| = 6, indexes the set of classes,¹² with every SOM image (in Fig. 9) attributed to a particular class of LoRa TX, either real or rogue. Then, for a generated training minibatch of K samples, the deep embedding of the SOM images is constructed as follows.

⁹This is attributed to a particular class, i.e., as belonging to a specific LoRa TX.
¹⁰The Microchip RN2483 LoRa modem does not implement a dedicated I/Q architecture. Instead, a modulated chirp waveform is generated and upconverted to an RF signal by a PLL and mixer. Then, we extract the I/Qs from the LoRa RF signal, which is collected from the R&S VSA FSQ26 (using a MATLAB script in our system setup).
¹¹On the other hand, the R&S ARB SMATE200A does implement a dedicated I/Q architecture. In the emulation of the rogue LoRa TX, separate I and Q waveforms are generated by MATLAB utilizing (1), which are then upconverted to RF signals by the ARB.
¹²The set of classes is analogous to a set of labels. As such, C = S = 6, but, in this section, we adhere to the terminology "class" instead of "label" as we are referring to SOM images instead of the original dataset of LoRa I/Q vectors X.
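The SOM-engine bookkeeping described above (six SOM images per TX, five labeled for training, the remainder held out) can be sketched as follows; the tuple encoding of an SOM image is purely illustrative:

```python
import random

random.seed(2)
epoch_counts = (200, 400, 600, 800, 1000, 1200)   # J values of the six SOM datasets
tx_classes = ["TX1", "TX2", "TX3", "TX4", "TX5", "rogue"]

train, test = [], []
for tx in tx_classes:
    images = [(tx, J) for J in epoch_counts]      # six SOM images per LoRa TX
    random.shuffle(images)
    train += images[:5]                           # five labeled training images per TX
    test += images[5:]                            # one held-out test image per TX

print(len(train), len(test))   # 30 6
```

The split confirms the counts in the text: 36 SOM images in total, 30 labeled for training, and 6 (one per TX class) for testing.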


The deep embedding {A(e)}_{k=1}^{K} ∈ R^{K·K×K·C} at the eth training epoch is denoted as

A(e) = [ A_{1,1}(e)  A_{1,2}(e)  ···  A_{1,c}(e)  ···  A_{1,C}(e)
         A_{2,1}(e)  A_{2,2}(e)  ···  A_{2,c}(e)  ···  A_{2,C}(e)
            ⋮           ⋮        ⋱       ⋮        ⋱       ⋮
         A_{k,1}(e)  A_{k,2}(e)  ···  A_{k,c}(e)  ···  A_{k,C}(e)
            ⋮           ⋮        ⋱       ⋮        ⋱       ⋮
         A_{K,1}(e)  A_{K,2}(e)  ···  A_{K,c}(e)  ···  A_{K,C}(e) ]   (10)

with the block columns underbraced as the column subspaces A_1(e), A_2(e), ..., A_c(e), ..., A_C(e), where A_{k,c}(e) = [a_{1,k,c}^T(e), a_{2,k,c}^T(e), ..., a_{k,k,c}^T(e), ..., a_{K,k,c}^T(e)] is the kth feature cluster, also referred to as the deep feature submatrix or subspace, attributed to the cth class in the batch A(e); A_c(e) ∈ R^{K·K×K} are the C column subspaces, each attributed to a cth class. Similarly, a_{k,k,c}^T(e) ∈ R^{K×1} are the K column spaces of the subspace A_{k,c}(e) ∈ R^{K×K}. These subspaces define the batches, upon which the individual convolution operates.

TABLE III
STRUCTURE OF CNN
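The block layout of the deep embedding in (10) can be mirrored directly with `numpy.block`; the dimensions below are toy values, not those of the actual SOM images:

```python
import numpy as np

K, C = 4, 6                      # minibatch size and number of classes (toy values)
rng = np.random.default_rng(3)

# one K x K deep-feature submatrix A_{k,c}(e) per (sample k, class c) pair
blocks = [[rng.normal(size=(K, K)) for _ in range(C)] for _ in range(K)]
A = np.block(blocks)             # the K*K x K*C deep embedding of (10)

# the C column subspaces A_c(e), one per class
A_c = [A[:, c * K:(c + 1) * K] for c in range(C)]

print(A.shape, A_c[0].shape)     # (16, 24) (16, 4)
```

Each A_c then carries the per-class features upon which the convolutional batches operate.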
Equation (9) is an unconstrained¹³ minimization, for which an ASGD solver can be employed to easily derive the link weights. This is because the gradient descents vanish when the MSE attains the global minimum, which is achieved when all of the C subspaces A_c(e ≤ ℰ) attributed to any cth class are orthogonalized by the e ≤ ℰth epoch [38], [39]. Orthogonalization of subspaces that minimizes the MSE is inherent to a successful classification in CNNs. This is proven in the Appendix. In this way, at the end of a successful training phase (i.e., by the e ≤ ℰth epoch) of a CNN, all of the classifier subspaces A_c(ℰ) are orthogonalized to attain the maximum separation among all the C classes. Such a geometry of learned features is not naturally imposed by a standard CNN in the presence of highly correlated LoRa I/Qs that are orthogonally inseparable, thus making the classification problem intractable. Hence, we resort to the intermediate method of applying SOMs.

B. Experimental Results

First, a CNN with four convolutional layers, one FC layer,¹⁴ a softmax(·) layer, and an output classification layer, as illustrated in Fig. 9, is proposed. Its architecture is inspired by AlexNet [40], which shows excellent performance for image classification, while our adoption of the MSE loss function that is minimized via the back-propagation mechanism when implementing ASGD is inspired by the Adam optimizer [41]. Its structure is enumerated in Table III. Thereafter, the performance, including convergence and accuracy, of the following four types of CNN architectures is investigated:
1) without BN;
2) without BN but with a dropout layer instead;
3) BN combined with dropout;
4) BN combined with dropout and along with a mixture of activation functions.
Finally, the efficacy of the above proposed CNN architectures for cyber intrusion detection through an analysis of their associated RF fingerprint structures is also examined. The functional characteristics of our proposed CNN are introduced as follows.

The input layer accepts the randomly selected training set from the SOM engine for training every kth training minibatch, each attributed to the cth class, as well as the randomly selected test set for classification. It is ensured that all the SOMs are of identical pixel resolution.

The 2-D convolutional layers extract feature clusters from the SOMs. They consist of a set of spatial filters performing the 2-D convolution, where a "stride" (as given in Table III) is the sliding interval of each spatial filter for the 2-D convolution operation. It determines the dimensions of the extracted feature clusters (see Fig. 9). Each of these filters operates independently to extract a corresponding number of feature clusters.

¹³It is limited only by the hyperparameters α and β.
¹⁴We find that only one FC (i.e., the Dth output) layer is sufficient for achieving classification. In addition, the mathematical framework developed for the analysis of a CNN with D = 2 (i.e., hidden, output) layers also applies in all respects to a D = 1 (i.e., single FC layer only) CNN, whose MSE minimization is given by arg min_{w_c(ℰ)∈W(ℰ)} (β/(2·ℰ)) Σ_{e=1}^{ℰ} E{‖Σ_{c=1}^{C} ReLU([Σ_{k=1}^{K} α{A_{k,c}(e)w_c(e) + θ_1^T}])‖_2^2} − (β/(2·ℰ)) Σ_{e=1}^{ℰ} E{‖Σ_{c=1}^{C} A_c(e)‖_2^2}. Note that the absence of the link vector v(e) or its associated bias θ_2^T in this expression for the MSE, relative to (9), does not alter in any way either the nature or any of the implications of the framework.

MSE = arg min_{w_c(ℰ)∈W(ℰ), v(ℰ)} (β/(2·ℰ)) Σ_{e=1}^{ℰ} E{‖Σ_{c=1}^{C} ReLU([Σ_{k=1}^{K} α{A_{k,c}(e)w_c(e) + θ_1^T}]) v^T(e) + θ_2^T‖_2^2} − (β/(2·ℰ)) Σ_{e=1}^{ℰ} E{‖Σ_{c=1}^{C} A_c(e)‖_2^2}   (9)
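The claim that (9) becomes easily solvable once the class subspaces A_c(e) are orthogonalized can be illustrated with a toy check (our own, not the Appendix proof): mutually orthogonal column subspaces have vanishing cross products, which is the maximum-separation geometry referred to above:

```python
import numpy as np

rng = np.random.default_rng(4)
# three mutually orthogonal 4-column subspaces built from a QR factorization
Q, _ = np.linalg.qr(rng.normal(size=(16, 12)))
A_c = [Q[:, 4 * c:4 * (c + 1)] for c in range(3)]

print(np.allclose(A_c[0].T @ A_c[1], 0.0))         # cross products vanish
print(np.allclose(A_c[2].T @ A_c[2], np.eye(4)))   # orthonormal within a class
```

Correlated LoRa I/Qs violate exactly this geometry, which is why the SOM intermediate step is introduced.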

BN normalizes the subspaces (which define the batches, upon which the individual convolution operates) via minibatch statistics. This is accomplished by introducing to each subspace a pair of hyperparameters γ_{k,c}(e), ω_{k,c}(e) at every eth epoch, which scale and shift the normalized value denoted by B_{k,c}(e) as

B_{k,c}(e) = γ_{k,c}(e) Â_{k,c}(e) + ω_{k,c}(e)
⇒ [b_{1,k,c}^T(e), b_{2,k,c}^T(e), ..., b_{k,k,c}^T(e), ..., b_{K,k,c}^T(e)]
= γ_{k,c}(e) · [â_{1,k,c}^T(e), â_{2,k,c}^T(e), ..., â_{k,k,c}^T(e), ..., â_{K,k,c}^T(e)] + ω_{k,c}(e)   (11)

where

Â_{k,c}(e) = (A_{k,c}(e) − E{A_{k,c}(e)}) / √(Var{A_{k,c}(e)})   (12)

with the expectation and variance being computed for every kth training minibatch attributed to the cth class. These hyperparameters restore the representation power of the CNN by reducing the internal covariate shift, which is defined as the change in the distribution of the neural computing parameters (subspaces in our case). By setting γ_{k,c}(e) = (Var{A_{k,c}(e)})^{1/2} and ω_{k,c}(e) = E{A_{k,c}(e)}, the original distribution (if optimal) is recovered. Reducing the internal covariate shift in this way achieves the whitening of each of the CNN layers, albeit at a low computational cost. Furthermore, BN regularizes a CNN to attain an optimal bias–variance tradeoff by preventing overfitting. This is crucial in accurate identification/cyber intrusion detection and also removes the requirement of a dropout layer.

The activation function is a nonlinear transformation for performing neural computing upon the extracted and normalized feature clusters. It can be chosen to be a logistic/sigmoid, sign/asymptotic, step, or constrained linear function. We choose the ReLU(·) activation function, which belongs to the class of constrained linear functions. The activation layer is an important part of a CNN. Irrespective of the number of layers in a CNN, if there is no activation layer, the final output given by ReLU([Σ_{k=1}^{K} α{A_{k,c}(e)w_c(e) + θ_1^T}]) remains the same at the eth epoch. It is because of the activation layer that a CNN possesses the ability of hierarchical nonlinear mapping [42], [43]. Finally, the choice of the activation function also impacts the performance of the gradient descent. For example, the ReLU(·) activation function is generally stable for ASGD because the set of computed gradient descents over the e ≤ ℰ epochs converges to the global optimum provided that the subspaces are orthogonalized (see the Appendix for the proof of subspace orthogonalization).

In our proposed CNN, the convolutional layers are followed by a max(·)-pooling layer. It functions to introduce shift-invariance and reduces the dimensionality of the rectified feature clusters of the preceding layers while retaining the most important information. We implement max(·)-pooling layers with filters of size 2 × 2 and stride 2 × 2. This downsamples the feature clusters by 2 along both dimensions, selecting the maximum element in the nonoverlapping feature clusters (see Fig. 9) in the SOM images. Shift-invariance, the exclusive property of CNN classifiers, implies that the training process is invariant to shifts in the training samples [44]. CNNs were assumed to be shift-invariant by virtue of the convolutional layers, nonlinear transformation (activation), and pooling layers, progressively building stability to shifts in the feature clusters.¹⁵ However, recent works have shown that CNNs are not, as a matter of fact, shift-invariant [45], and various measures are taken to counter this problem [45], [46]. In the context of RF fingerprinting, shift invariance enables CNNs to learn the feature clusters occurring due to modem-specific RF impairments (resembling I/Q imbalance) in the LoRa TXs, irrespective of the arbitrary instants of time at which they occur. A CNN's ability to attribute class predictions on the SOM test set (i.e., as belonging to a specific LoRa TX) is also on account of its invariance to distinct shifts in the feature clusters, each attributed to a particular SOM class.

Finally, the softmax(·) layer generates class probabilities for the SOM test set, enabling the final classification layer to attribute an SOM image in the test set to a particular LoRa TX, covering both real and rogue classes.

Our approach with SOMs is much faster: within e ≤ ℰ = 20 epochs,¹⁶ and by 0.6 s, convergence to an acceptable solution with MSE ≤ 2.5 is attained. This is clearly observed in Fig. 11(a) when using both the ReLU(·) and tanh(·) activation functions with BN. Again, fast convergence is crucial to the expeditious RF fingerprinting of LoRa modems.

For comprehensiveness, we define another performance measure called the network accuracy, given by

ACC = I{(β/(2·ℰ)) Σ_{e=1}^{ℰ} E{‖Σ_{c=1}^{C} ReLU([Σ_{k=1}^{K} α{A_{k,c}(e)w_c(e) + θ_1^T}])‖_2^2} = E{‖Σ_{c=1}^{C} A_c(e)‖_2^2}}   (13)

where I{·} is the Iverson bracket. The ACC is also plotted in Fig. 11(b), showing that we also attain a cent-percent ACC within e ≤ ℰ = 20 epochs. Fig. 13(a) [BN + tanh(·)/BN + ReLU(·)] highlights the corresponding RF fingerprint structure (for the predicted versus the true class attributions for the SOM test set). Here, the elements along the diagonal are the true class attributions. Hence, the predictions on the SOM test set are an exact match. The color maps indicate the degree of correlation between the true and the predicted classes, and the numbers are the assigned probabilities that an SOM image within the test set can be attributed to a particular class. As observed in Fig. 13(a), unity probabilities of 1 are obtained along the diagonal elements.

¹⁵In Fig. 9, such shifts (in the feature clusters) can be visualized across the SOM training and test sets accumulated for each of the LoRa TXs.
¹⁶Multiple bursts of the LoRa RF signal are collected from the R&S VSA FSQ26, and the I/Qs are extracted and then processed offline. Hence, e ≤ ℰ = 20 epochs are within the order of the LoRa frame time, i.e., less than or equal to O{32·T_s} [23].

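The Iverson bracket in (13) simply maps the energy-matching predicate to {0, 1}; a toy illustration with arbitrary stand-in energies:

```python
import numpy as np

def iverson(predicate):
    # I{...}: 1 if the predicate holds, 0 otherwise
    return 1 if predicate else 0

response_energy = 4.2000001   # stand-in for the trained-response term in (13)
subspace_energy = 4.2         # stand-in for the subspace-energy term in (13)

print(iverson(np.isclose(response_energy, subspace_energy, rtol=1e-5)))  # 1
print(iverson(response_energy == subspace_energy))                       # 0
```

In practice, a tolerance (as with `np.isclose`) rather than exact equality would decide the energy match.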

Fig. 11. (a) Training loss (MSE) and (b) training accuracy (ACC) of the CNN with BN with ReLU(·) and tanh(·) activation functions; (c) MSE and
(d) ACC in a CNN without BN but with a dropout layer instead.

Fig. 12. (a) MSE and (b) ACC of the CNN with BN combining dropout with ReLU(·) and tanh(·) activation functions; (c) MSE and (d) ACC of CNN
with BN and dropout combined but with the heterogeneity of activation functions.

These diagonal entries validate the predictions on the test set. In this way, cyber intrusion by the rogue LoRa TX can be accurately detected.

Next, Fig. 11(c) and (d) plots the MSE and ACC, respectively, for employing the ReLU(·) and tanh(·) activation functions without BN and a dropout layer, and without BN but with


Fig. 13. RF fingerprint structures for the CNN architectures proposed in Figs. 11 and 12. (a) BN + tanh/BN + ReLU/No BN, tanh only/BN + ReLU-1 +
Dropout + ReLU-2 + ReLU-3 + ReLU-4/Dropout + BN + tanh only/BN + tanh-1 + Dropout + ReLU-2 + ReLU-3 + ReLU-4/Dropout + BN + ReLU
only/BN + tanh-1 + Dropout + tanh-2 + tanh-3 + tanh-4. (b) No BN, ReLU only/ReLU only + Dropout. (c) NO BN, ReLU only. (d) tanh only + Dropout.

a dropout layer instead.¹⁷ The objective associated with these simulations is to evaluate the impact of BN, dropout, and activation functions separately on CNN performance (i.e., MSE and ACC), as well as their resultant efficacies in detecting cyber intrusion. The observations from Fig. 11(c) and (d) are enumerated infra:
1) without BN and a dropout layer:
   a) MSE = 0 and ACC = 100% are attained within e ≤ ℰ = 10 epochs with the tanh(·) activation function;
   b) an acceptably low MSE = 2 but poor ACC ≈ 20% is observed with the ReLU(·) activation function.
2) without BN but with a dropout layer instead:
   a) it now takes e ≤ ℰ = 50 epochs to attain MSE = 0 and ACC = 100% when using the tanh(·) activation function;
   b) the CNN performance with the ReLU(·) activation function is poor, resulting in MSE = 2 and ACC ≈ 20%.
The implications of the above-enumerated observations, ascertained in their associated RF fingerprint structures, are shown in Fig. 13(a) [No BN, tanh(·) only], Fig. 13(b) [No BN, ReLU(·) only/ReLU(·) only + dropout], Fig. 13(c) [No BN, ReLU(·) only], and Fig. 13(d) [tanh(·) only + dropout], respectively.

The predictions on the SOM test set without BN, tanh(·) only [see Fig. 13(a)] are an exact match. MSE = 0 [see Fig. 11(a)] and ACC = 100% [see Fig. 11(b)] within e ≤ ℰ = 20 epochs confirm this.

The predictions without BN, ReLU(·) only, or ReLU(·) only + dropout [see Fig. 13(b)] yield all false-positive reports among the five nominally identical real LoRa TXs. The only accurate prediction here is the true-positive report of the rogue LoRa TX. This is in agreement with the acceptable MSE = 2 [see Fig. 11(c)] but low ACC = 20% [see Fig. 11(d)]. In other words, an acceptably low MSE indicates an accurate prediction of the true-positive cases in the test set (i.e., the rogue LoRa TX), while a paltry ACC is the indication of misclassification among the true-negative cases (i.e., the real LoRa TXs). Here, the CNN, acquiring a high bias, is skewed toward identifying the rogue LoRa TX (i.e., detecting cyber intrusion), but its variance is low,¹⁸ resulting in misclassification.

It is observed that the simulation employing ReLU(·) without BN can also yield false-negative reports [see Fig. 13(c)], i.e., the rogue LoRa TX misclassified as a real LoRa TX. At the same time, four out of the five real LoRa TXs are identified as real LoRa TX 2 by mistake, i.e., there occur severe errors in classifying the true-negative cases. Again, this agrees with the attained MSE = 2 [see Fig. 11(c)] and ACC = 20% [see Fig. 11(d)]. A false-negative outcome is unacceptable in the context of cyber intrusion detection, and therefore, the ReLU(·) activation function without BN should be avoided.

¹⁷Here, layer 3 in Table III is a dropout layer instead of BN.
¹⁸Of the polynomial in accurately estimating the labels of the true-negatives by the e ≤ ℰth epoch; this is colloquially termed the bias–variance tradeoff.
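The false-positive/false-negative bookkeeping above can be made concrete. The predicted labels below mimic the No-BN, ReLU(·)-only failure mode described in the text [Fig. 13(c)] and are illustrative only:

```python
ground_truth = ["TX1", "TX2", "TX3", "TX4", "TX5", "rogue"]
predicted = ["TX2"] * 6   # everything attributed to real TX 2, as in Fig. 13(c)

# false negative: the rogue TX slipping through as a "real" device
false_negatives = sum(t == "rogue" and p != "rogue"
                      for t, p in zip(ground_truth, predicted))
accuracy = sum(t == p for t, p in zip(ground_truth, predicted)) / len(ground_truth)

print(false_negatives)        # 1: the cyber intruder is missed, which is unacceptable
print(round(accuracy * 100))  # 17: consistent with the paltry ~20% ACC reported
```

Even with the rogue missed, the raw accuracy is nonzero, which is why the false-negative count matters more than ACC for intrusion detection.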


Fig. 14. RF fingerprint structures for the CNN architectures proposed in Fig. 12 only. (a) BN + tanh-1 + Dropout + tanh-2 + tanh-3 + tanh-4.
(b) BN + tanh-1 + Dropout + ReLU-2 + ReLU-3 + ReLU-4.

TABLE IV
R ELATIVE P ERFORMANCE S UMMARIES OF MSE & ACC AND T HEIR T RAINING T IMES IN THE P ROPOSED F OUR T YPES OF CNN A RCHITECTURES

The scenario of using tanh(·) only + dropout results in the possibility of a misprediction among the true-negative cases, i.e., one of the real LoRa TXs is attributed to the wrong class [real LoRa TX 2 as real LoRa TX 4; see Fig. 13(d)]. However, the remaining true-negative and true-positive cases (i.e., the real LoRa TXs and the rogue LoRa TX, respectively) are identified correctly. This is also the indication of the MSE [see Fig. 11(c)] and ACC [see Fig. 11(d)] not entirely reaching 0 and 100%, respectively, even at the e ≤ ℰ = 50th epoch. This is also the indication of suboptimal link weights, which, when processed by the softmax(·) layer, cause erroneous class attribution probabilities resulting in misclassification.

Finally, Fig. 12(a) and (b), and (c) and (d) plot the MSEs and ACCs, respectively, for the ReLU(·) and tanh(·) activation functions when BN is combined with dropout, and when BN is combined with dropout considering a heterogeneity of activation functions. The goal is to evaluate the combined effect of BN with dropout on such CNNs' MSE and ACC. Their associated RF fingerprint structures are observed in both Figs. 13(a) and 14(a) and (b). The conclusion is given as follows.
1) The performance of a CNN with BN and dropout combined can depend upon the relative positions of the BN layer and the dropout layer, and the order of the activation function mixture.
For example, Fig. 12(c) and (d) shows superior MSE and ACC being attained, respectively, either with dropout before BN followed by four tanh(·) activation functions in each of the four convolutional layers, or BN then tanh(·) combined with dropout in the first convolutional layer, followed by three ReLU(·) activation functions in the succeeding layers. We obtain either Fig. 13(a) or 14(b) as their associated RF fingerprint structures, indicating the possibility of a misprediction in the true-negative cases [real LoRa TX 1 identified as real LoRa TX 3; see Fig. 14(b)].

A similar MSE and ACC performance [see Fig. 12(a) and (b)] is demonstrated by positioning dropout prior to BN followed by four ReLU(·) activation functions, or BN and ReLU(·) combined followed by dropout in the first convolutional layer, succeeded by ReLU(·)-only activation functions in the remaining layers. Again, the corresponding structure of the RF fingerprint in Fig. 13(a) illustrates a cent-percent prediction accuracy.

However, the simulation involving BN then tanh(·) implementing dropout in the first convolutional layer, followed by three tanh(·) activation functions, shows a degraded MSE and ACC [see Fig. 12(a) and (b)] with slower convergence. Two related RF fingerprint structures are observed in Figs. 13(a) and 14(a). While Fig. 13(a) does indicate accurate prediction, the occurrence of Fig. 14(a) also indicates a possibility of misprediction among the true-negative cases (real LoRa TXs), albeit with the true-positive case (rogue LoRa TX) being identified accurately. This is again indicative of suboptimal link weights.


TABLE V
C OMPARISON OF O UR P ROPOSED A PPROACH OF T RAINING F OUR T YPES OF CNN A RCHITECTURES ON AN SOM E NGINE
A GAINST THE C URRENT S TATE - OF - THE -A RT ML E NABLED RF F INGERPRINTING M ETHODS

Finally, as indicated in Fig. 13(a), the CNN architectures (BN + tanh/BN + ReLU/No BN, tanh only/BN + ReLU-1 + Dropout + ReLU-2 + ReLU-3 + ReLU-4/Dropout + BN + tanh only/BN + tanh-1 + Dropout + ReLU-2 + ReLU-3 + ReLU-4/Dropout + BN + ReLU only/BN + tanh-1 + Dropout + tanh-2 + tanh-3 + tanh-4) are all trained to identify any of the LoRa TX 1, 2, 3, 4, or 5 from the other [apart from detecting cyber intrusion from the rogue LoRa TX (ARB)]. Albeit, in a realistic deployment, the identity of a rogue LoRa TX is unlikely to be known. However, since any of the CNN architectures indicated in Fig. 13(a) exactly identifies any of the LoRa TXs (1, 2, 3, 4, 5, or ARB) from the other, it is irrelevant whether the rogue node is TX 3 or even TX 5 (instead of the ARB), and they are all nevertheless precisely separated. In this way, the objective of RF fingerprinting is accomplished.

From these investigations of CNN architectures and their RF fingerprint structures, the following remarks can be summarized.
1) Both ReLU(·) and tanh(·) activation functions perform similarly with BN. The RF fingerprints are cent-percent accurate.
2) Without BN, the ReLU(·) activation function performs poorly, with the CNN biased toward accurately detecting


the true-positive case (rogue LoRa TX) only. However, the tanh(·) activation function shows excellent performance without BN.
3) With a dropout layer replacing BN, the CNN's performance with either ReLU(·) or tanh(·) is degraded. The resultant RF fingerprint structures with false-negative reports are undesirable.
4) BN combined with a dropout layer and a mixture of activation functions tends to perform acceptably, with the possibility of misattribution in the true-negative cases. However, no false-negative cases occur.

These results can be ascribed to the computation of gradient descents by the back-propagation mechanism. With tanh(·), the difference in gradients from one epoch to the next is greater than with ReLU(·). On the one hand, this can result in missing the global minimum during the minimization of the MSE; on the other hand, it can lead to faster convergence, provided that the random initializations of the link weights W(e) and the learning-rate hyperparameter α are favorable. The dropout layer prevents overfitting in the CNN by setting some elements (connections) to 0 and scaling the remaining elements according to a predefined dropout mask probability, thus performing a function similar to BN (i.e., altering the subspace distributions). However, a loss of information is possible here because elements are set to 0, and thus, special care should be exercised in the choice of the activation function. BN, besides scaling and shifting the subspace distributions, also whitens and regularizes a CNN. This directly accelerates the computation of gradient descents, resulting in faster convergence. Hence, combining BN with dropout is redundant when BN by itself suffices for achieving high classification accuracy.

C. Performance Summary and Comparison

Relative performance summaries of the MSE and ACC of our proposed four types of CNN architectures (namely: 1) without BN; 2) without BN but with a dropout layer instead; 3) BN combined with dropout; and 4) BN combined with dropout along with a mixture of activation functions), together with their corresponding training times, are provided in Table IV. Table V further compares our approach of training these four CNN architectures on the SOM engine, which orthogonalizes the original dataset {X_T, l_T} of highly correlated LoRa I/Q samples, against the current state-of-the-art ML-enabled RF fingerprinting methods in the literature. It should be noted that, unlike most other works employing custom GPUs to train CNNs, we utilize only a standard personal computer with an Intel Core i7 CPU. Yet, as aforementioned, rapid generation of the SOM image dataset within the order of the LoRa frame time O{32 · T_s}, along with 100% classification accuracy and cyber intrusion detection, is accomplished, with the training time (for both MSE minimization and ACC maximization) being 1.2 s/epoch (see Table IV).

V. CONCLUSION

In this article, we proposed a novel ML-based detection method using SOMs with an optimized CNN and demonstrated the performance priority offered by our proposed method. This method demonstrated fast execution times, with RF fingerprints extracted within the order of the LoRa symbol duration. To the best of our knowledge, this is the first attempt at RF fingerprinting LoRa TXs for detecting cyber intrusion using SOMs. Although an ARB was used to emulate a rogue LoRa TX, it should be emphasized that any one of the real LoRa TXs could also emulate the rogue instead, for example, real LoRa TX 2 spoofing real LoRa TX 3. However, our strategy of using SOMs is competently capable of adapting to such a scenario because we achieve a cent-percent accuracy in distinguishing the real LoRa TXs from the rogue. In future investigations, the LoRa I/Q dataset will be expanded to include the full range of transmit powers of −3 to 14 dBm, SFs of 7–14, and CRs of (4/5) (i.e., parity-check FEC coding only) to (7/8) (i.e., extended-Hamming FEC coding). Furthermore, the system setup will be functionally upgraded to accomplish over-the-air transmission of LoRa frames under channel fading, both indoor and outdoor. Here, the SOMs of the received I/Q samples, thus generated, would be of the composite transceiver–channel environment. LoRa modules from different manufacturers will also be RF fingerprinted. Finally, cyber intrusion detection employing SOMs of the composite transceiver channels with 4G-LTE and 5G new radio (NR) modules in both nonstandalone (NSA) and standalone (SA) configurations will be examined.

APPENDIX

The expression for the MSE in (9) can be expanded by using the definition of the subspace A_c(e) in (10) and applying the property of expectations over 2-norms, as given by

$$ \mathrm{MSE} \le \frac{\beta}{2\ell}\, \mathbb{E}\Big\{ \|A_1(\ell)\|_2^2 + \|A_2(\ell)\|_2^2 + \cdots + \|A_c(\ell)\|_2^2 + \cdots + \|A_C(\ell)\|_2^2 \Big\} - \frac{\beta}{2\ell}\, \mathbb{E}\Big\{ \big\| \big[ A_1(\ell) \,\big|\, A_2(\ell) \,\big|\, \cdots \,\big|\, A_c(\ell) \,\big|\, \cdots \,\big|\, A_C(\ell) \big] \big\| \Big\} \tag{14} $$

which enables expressing the sets of gradient descents in the Dth and (D − 1)th FC layers at the e = ℓth epoch as

$$ g_D\{A(\ell)\} = \Big\{ \big[ U_{A_1(\ell)_1} V_{A_1(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_2(\ell)_1} V_{A_2(\ell)_1}^T \big] + \cdots + \big[ U_{A_{c-1}(\ell)_1} V_{A_{c-1}(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_c(\ell)_1} V_{A_c(\ell)_1}^T \big] + \cdots + \big[ U_{A_{C-1}(\ell)_1} V_{A_{C-1}(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_C(\ell)_1} V_{A_C(\ell)_1}^T \big] - U_1^T(\ell) V_1(\ell) \Big\} \cdot \mathrm{MSE} \tag{15} $$

and

$$ g_{D-1}\{A(\ell)\} = \Big\{ \big[ U_{A_1(\ell)_1} V_{A_1(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_2(\ell)_1} V_{A_2(\ell)_1}^T \big] + \cdots + \big[ U_{A_{c-1}(\ell)_1} V_{A_{c-1}(\ell)_1}^T \,\big|\, 0 \big] $$

$$ \qquad + \big[ 0 \,\big|\, U_{A_c(\ell)_1} V_{A_c(\ell)_1}^T \big] + \cdots + \big[ U_{A_{C-1}(\ell)_1} V_{A_{C-1}(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_C(\ell)_1} V_{A_C(\ell)_1}^T \big] - U_1^T(\ell) V_1(\ell) \Big\} \cdot g_D(\ell)\, w_c(\ell). \tag{16} $$

In (15) and (16), we have the following remarks.
1) 0 is a zero-padding matrix.
2) U_{A_1(ℓ)_1} ∈ U_{A_1(ℓ)}, U_{A_2(ℓ)_1} ∈ U_{A_2(ℓ)}, …, U_{A_{c−1}(ℓ)_1} ∈ U_{A_{c−1}(ℓ)}, U_{A_c(ℓ)_1} ∈ U_{A_c(ℓ)}, …, U_{A_{C−1}(ℓ)_1} ∈ U_{A_{C−1}(ℓ)}, and U_{A_C(ℓ)_1} ∈ U_{A_C(ℓ)} are those initial columns in the left singular matrices^19 corresponding to those singular values in Σ_1(ℓ), Σ_2(ℓ), …, Σ_{c−1}(ℓ), Σ_c(ℓ), …, Σ_{C−1}(ℓ), Σ_C(ℓ) that are higher than the threshold δ.
3) Similarly, V_{A_1(ℓ)_1}, V_{A_2(ℓ)_1}, …, V_{A_{c−1}(ℓ)_1}, V_{A_c(ℓ)_1}, …, V_{A_{C−1}(ℓ)_1}, V_{A_C(ℓ)_1} are those initial columns in the right singular matrices corresponding to those singular values in Σ_1(ℓ), Σ_2(ℓ), …, Σ_{c−1}(ℓ), Σ_c(ℓ), …, Σ_{C−1}(ℓ), Σ_C(ℓ) that are higher than the threshold δ.
4) U_1(ℓ) and V_1(ℓ) are the principal^20 left and right singular matrices of A(ℓ), respectively, at the e = ℓth epoch.

Then, g_D{A(ℓ)} = 0 and g_{D−1}{A(ℓ)} = 0 establish equality between the first and second terms enclosed within the expression {·} in (15) and (16). This can be explained in the following way. The sets of gradient descents g_D{A(ℓ)} and g_{D−1}{A(ℓ)} reduce to zero by the e ≤ ℓth epoch because of a phenomenon in CNNs colloquially known as the vanishing of gradients, which can be proved as follows. The minimization of the MSE [expressed in (14)] through the back-propagation mechanism necessitates the computation of the subgradient of ‖A_{k,c}(e)‖_2 at every eth epoch, whose subdifferential is given by [47]

$$ \partial \big\| A_{k,c}(e) \big\|_2 = U_{1,k,c}(e) V_{1,k,c}(e)^T + U_{2,k,c}(e)\, W\, V_{2,k,c}(e)^T. \tag{17} $$

In (17), A_{k,c}(e) = U_{k,c}(e) Σ_{k,c}(e) V_{k,c}^T(e) is the SVD of A_{k,c}(e); U_{1,k,c}(e) ∈ U_{k,c}(e) and V_{1,k,c}(e) ∈ V_{k,c}(e) are those initial columns in U_{k,c}(e) and V_{k,c}(e), respectively, corresponding to those singular values (squares of the eigenvalues) in Σ_{k,c}(e) that are larger than the specified threshold δ (known as the principal components of the kth subspace A_{k,c}(e) attributed to the cth class at the eth epoch); U_{2,k,c}(e) ∈ U_{k,c}(e) and V_{2,k,c}(e) ∈ V_{k,c}(e) are those latter columns in U_{k,c}(e) and V_{k,c}(e), respectively, corresponding to those singular values in Σ_{k,c}(e) that are smaller than δ; and ‖W‖ ≤ 1. Applying W = 0, the projected subgradient for minimizing ‖A_{k,c}‖_2 is given by [38]

$$ g\big\{ A_{k,c}(e) \big\} = \sum_{N_{D-1}=1}^{N_{D-1}} U_{1,k,c}(e) V_{1,k,c}(e)^T \tag{18} $$

and the sets of gradient descents in the output layer and the hidden layer for minimizing the MSE given by (14) are given by [38]

$$ g_D\{A(e)\} \triangleq \Bigg\{ \sum_{c=1}^{C} \sum_{k=1}^{K} \Big[ Z_{k,c}^{(l)} \,\Big|\, U_{1,k,c}(e) V_{1,k,c}^T(e) \,\Big|\, Z_{k,c}^{(r)} \Big] - U_1(e) V_1^T(e) \Bigg\} \cdot \mathrm{MSE} \tag{19} $$

and

$$ g_{D-1}\{A(e)\} \triangleq \Bigg\{ \sum_{c=1}^{C} \sum_{k=1}^{K} \Big[ Z_{k,c}^{(l)} \,\Big|\, U_{1,k,c}(e) V_{1,k,c}^T(e) \,\Big|\, Z_{k,c}^{(r)} \Big] - \sum_{s=1}^{S} U_1(e) V_1^T(e) \Bigg\} \cdot g_D(e)\, w_c(e) \tag{20} $$

respectively, where g_D(e) = [g_D^1(e), g_D^2(e), …, g_D^c(e), …, g_D^C(e)] and g_{D−1}(e) = [g_{D−1}^1(e), g_{D−1}^2(e), …, g_{D−1}^c(e), …, g_{D−1}^C(e)]. In (19) and (20), Z_{k,c}^{(l)} and Z_{k,c}^{(r)} are the zero-padding matrices that complete the dimensionality of A(e) ∈ R^{K·K×C·K} at any eth epoch. Moreover, in (19) and (20), the following holds.
1) U_1(e) and V_1(e) are the principal left and right singular matrices of the deep embedding of the SOM images A(e), as defined in (10). This means that U_1(e) ∈ U(e) and V_1(e) ∈ V(e) are those initial columns in U(e) and V(e), respectively [in the SVD of A(e) = U(e)Σ(e)V^T(e)], corresponding to those singular values in Σ(e) > δ.
2) The first term inside the enclosed argument {·}, at every eth epoch, reduces the variance of the principal components of each of the C column subspaces A_c(e) ∈ A(e), given by [38], [39]

$$ \sigma_{U_{1,c}}(e) = \sum_{k=1}^{K} \mathbb{E}\Big\{ \big[ U_{1,k,c}(e) - \mathbb{E}\{ U_{1,k,c}(e) \} \big] \cdot \big[ U_{1,k,c}(e) - \mathbb{E}\{ U_{1,k,c}(e) \} \big]^T \Big\} \tag{21} $$

and

$$ \sigma_{V_{1,c}}(e) = \sum_{k=1}^{K} \mathbb{E}\Big\{ \big[ V_{1,k,c}(e) - \mathbb{E}\{ V_{1,k,c}(e) \} \big] \cdot \big[ V_{1,k,c}(e) - \mathbb{E}\{ V_{1,k,c}(e) \} \big]^T \Big\} \tag{22} $$

respectively, i.e., the intraclass variance within any cth subspace. Here, A_c(e) = U_c(e) Σ_c(e) V_c^T(e) is the SVD of A_c(e). U_{1,c}(e) ∈ U_c(e) and V_{1,c}(e) ∈ V_c(e) are those initial columns in U_c(e) and V_c(e), respectively, corresponding to those singular values (squares of the eigenvalues) in Σ_c(e) that are larger than a specified threshold δ.

^19 In the singular value decompositions (SVDs) A_1(ℓ) = U_1(ℓ)Σ_1(ℓ)V_1^T(ℓ), A_2(ℓ) = U_2(ℓ)Σ_2(ℓ)V_2^T(ℓ), …, A_{c−1}(ℓ) = U_{c−1}(ℓ)Σ_{c−1}(ℓ)V_{c−1}^T(ℓ), A_c(ℓ) = U_c(ℓ)Σ_c(ℓ)V_c^T(ℓ), …, A_{C−1}(ℓ) = U_{C−1}(ℓ)Σ_{C−1}(ℓ)V_{C−1}^T(ℓ), and A_C(ℓ) = U_C(ℓ)Σ_C(ℓ)V_C^T(ℓ).
^20 That is, for U_1(ℓ) ∈ U(ℓ) and V_1(ℓ) ∈ V(ℓ), their initial columns in U(ℓ) and V(ℓ), respectively [in the SVD of A(ℓ) = U(ℓ)Σ(ℓ)V^T(ℓ)], correspond to those singular values in Σ(ℓ) > δ.
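As a numerical illustration of the projected subgradient in (17) and (18), the following sketch (illustrative NumPy code; the matrix size, the threshold δ, and all variable names are assumptions, not taken from the article) splits the singular triplets of a matrix at δ and forms U₁V₁ᵀ with W = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))   # stand-in for a subspace A_{k,c}(e)
delta = 1.0                       # illustrative threshold on singular values

# SVD and split at delta: columns with sigma > delta are the "principal" ones
U, s, Vt = np.linalg.svd(A, full_matrices=False)
keep = s > delta
U1, V1t = U[:, keep], Vt[keep, :]

# Projected subgradient of ||A||_2 with W = 0, as in (17): G = U1 V1^T
G = U1 @ V1t

# G is a partial isometry: G G^T G = G, and its nonzero singular values are 1
assert np.allclose(G @ G.T @ G, G, atol=1e-8)
gs = np.linalg.svd(G, compute_uv=False)
assert np.allclose(gs[: keep.sum()], 1.0)
print("kept", int(keep.sum()), "principal components")
```

The partial-isometry check mirrors why this term cannot grow the gradient: its spectral norm is exactly 1 regardless of the magnitudes of the retained singular values.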


$$ U_1(\ell) V_1^T(\ell) = \big[ U_{A_1(\ell)_1} V_{A_1(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_2(\ell)_1} V_{A_2(\ell)_1}^T \big] + \cdots + \big[ U_{A_{c-1}(\ell)_1} V_{A_{c-1}(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_c(\ell)_1} V_{A_c(\ell)_1}^T \big] + \cdots + \big[ U_{A_{C-1}(\ell)_1} V_{A_{C-1}(\ell)_1}^T \,\big|\, 0 \big] + \big[ 0 \,\big|\, U_{A_C(\ell)_1} V_{A_C(\ell)_1}^T \big] $$
$$ = \big[ U_{A_1(\ell)_1} \,\big|\, U_{A_2(\ell)_1} \,\big|\, \cdots \,\big|\, U_{A_{c-1}(\ell)_1} \,\big|\, U_{A_c(\ell)_1} \,\big|\, \cdots \,\big|\, U_{A_{C-1}(\ell)_1} \,\big|\, U_{A_C(\ell)_1} \big] \cdot \begin{bmatrix} V_{A_1(\ell)_1} & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 \\ 0 & V_{A_2(\ell)_1} & \cdots & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & V_{A_{c-1}(\ell)_1} & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & V_{A_c(\ell)_1} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & V_{A_{C-1}(\ell)_1} & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & V_{A_C(\ell)_1} \end{bmatrix} \tag{24} $$

(orthogonalization/orthogonal separation of the principal components of the subspaces attributed to all the C classes in the deep embedding of the SOM images)
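The orthogonality claim behind the block-diagonal factorization in (24) can be checked numerically. The following sketch (illustrative NumPy code; the class count, block sizes, and variable names are assumptions, not from the article) builds orthonormal per-class blocks, assembles the block-diagonal right factor, and verifies that the class subspaces of the product are mutually orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)
C, m, r = 3, 12, 2  # classes, ambient dimension, principal rank per class

# Draw all per-class U blocks from one orthonormal basis, so that the
# concatenation [U_{A_1} | ... | U_{A_C}] is itself orthonormal, as (24) asserts.
Q = np.linalg.qr(rng.standard_normal((m, C * r)))[0]
U_blocks = [Q[:, c * r:(c + 1) * r] for c in range(C)]
# Orthogonal per-class V blocks on the diagonal of the right factor
V_blocks = [np.linalg.qr(rng.standard_normal((r, r)))[0] for _ in range(C)]

# Block-diagonal right factor, as in the bottom matrix of (24)
V_bd = np.zeros((C * r, C * r))
for c, Vc in enumerate(V_blocks):
    V_bd[c * r:(c + 1) * r, c * r:(c + 1) * r] = Vc

U_cat = np.hstack(U_blocks)
P = U_cat @ V_bd  # analogue of U_1(l) V_1^T(l)

# The class subspaces spanned by the column blocks of P are mutually orthogonal
for i in range(C):
    for j in range(i + 1, C):
        cross = P[:, i * r:(i + 1) * r].T @ P[:, j * r:(j + 1) * r]
        assert np.allclose(cross, 0.0, atol=1e-10)
print("class subspaces orthogonal")
```

Since both factors have orthonormal columns, P itself satisfies PᵀP = I, which is the sense in which each class subspace is "orthogonalized" by the ℓth epoch.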

3) The second term, at every eth epoch, jointly maximizes the separation between the adjacent subspaces in each of the C classes, i.e., the interclass separation. This is the separation between any adjacent column subspaces A_i(e), A_j(e) ∈ A(e); ∀i, j ∈ C; i ≠ j, and is quantified as the maximization of the principal angle θ_{i,j} defined as [48], [49]

$$ \theta_{i,j} = \arg\max_{u(e) \in A_i(e),\, v(e) \in A_j(e)} \cos^{-1}\!\left( \frac{u^T(e) \cdot v(e)}{\|u(e)\|_2 \cdot \|v(e)\|_2} \right), \quad \forall i, j \in C;\ i \ne j \tag{23} $$

where θ_{i,j} ∈ [0, (π/2)], and u(e) ∈ A_i(e) and v(e) ∈ A_j(e) correspond to those column spaces in A_i(e) and A_j(e), respectively, which maximize θ_{i,j}. These correspond to the first column of left singular vectors u(e) ∈ U_i(e) and the first column of right singular vectors v(e) ∈ V_j(e) associated with the largest singular values in Σ_i(e) and Σ_j(e), respectively, in the SVDs A_i(e) = U_i(e)Σ_i(e)V_i^T(e) and A_j(e) = U_j(e)Σ_j(e)V_j^T(e). The joint maximization of the separation between the adjacent column subspaces A_i(e), A_j(e) ∈ A(e); ∀i, j ∈ C; i ≠ j, also results in the projection of any cth subspace A_c(e) ∈ A(e) onto its closest orthogonal form.

Such a simultaneous reduction in the intraclass variance of all the C subspaces and joint maximization of the separation between all the C subspaces result in the gradient descents diminishing the MSE [see (9) and (14)] at every eth epoch, which, in turn, proportionally diminishes the sets of gradient descents at the corresponding epoch [see (14) and (15), or (19) and (20), illustrating the proportionality in the relationship], vanishing by the e ≤ ℓth epoch. This phenomenon is colloquially termed the vanishing of gradients and allows us to set g_D{A(ℓ)} = 0 and g_{D−1}{A(ℓ)} = 0 at the e ≤ ℓth epoch.

Therefore, equating these two terms results in the block-diagonalizing relationship derived in (24) above. Here, U_1(ℓ) and V_1(ℓ) are the principal left and right singular matrices of A(ℓ), respectively. Therefore, they are each orthogonal, causing the bottommost matrix [in the block diagonalization in (24)] to be orthogonal as well. It then follows that [U_{A_1(ℓ)_1} | U_{A_2(ℓ)_1} | ⋯ | U_{A_{c−1}(ℓ)_1} | U_{A_c(ℓ)_1} | ⋯ | U_{A_{C−1}(ℓ)_1} | U_{A_C(ℓ)_1}] must also be orthogonal. This means that each of the C subspaces A_c(ℓ) ∈ A(ℓ), c ∈ C, is also orthogonalized by the e = ℓth epoch. Substituting (24) into the right-hand side (RHS) of the expression for the MSE in (9) reduces it to 0.

REFERENCES

[1] M. Nair, T. Cappello, S. Dang, V. Kalokidou, and M. A. Beach, "RF fingerprinting of LoRa transmitters using machine learning with self-organizing maps for cyber intrusion detection," in IEEE MTT-S Int. Microw. Symp. Dig., Jun. 2022, pp. 491–494.
[2] R. O. Andrade, S. G. Yoo, L. Tello-Oquendo, and I. Ortiz-Garces, "A comprehensive study of the IoT cybersecurity in smart cities," IEEE Access, vol. 8, pp. 228922–228941, 2020.
[3] D. Reising, J. Cancelleri, T. D. Loveless, F. Kandah, and A. Skjellum, "Radio identity verification-based IoT security using RF-DNA fingerprints and SVM," IEEE Internet Things J., vol. 8, no. 10, pp. 8356–8371, May 2021.
[4] S. E. Lyshevski, A. Aved, and P. Morrone, "Information-centric cyberattack analysis and spatiotemporal networks applied to cyber-physical systems," in Proc. IEEE Microw. Theory Techn. Wireless Commun. (MTTW), Oct. 2020, pp. 172–177.
[5] M. Sati, T. A. Abulifa, and S. O. Sati, "Wireless link reliability in cyber physical system with Internet of Things," in Proc. IEEE Microw. Theory Techn. Wireless Commun. (MTTW), Oct. 2020, pp. 139–144.
[6] UKRI/EPSRC Prosperity Partnership in Secure Wireless Networks (SWAN). Accessed: Feb. 2020. [Online]. Available: https://www.swan-partnership.ac.uk/
[7] A. R. D. Rizo, J. Leonhard, H. Aboushady, and H.-G. Stratigopoulos, "RF transceiver security against piracy attacks," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 7, pp. 3169–3173, Jul. 2022.
[8] B. Zhu, J. Tang, W. Zhang, S. Pan, and J. Yao, "Broadband instantaneous multi-frequency measurement based on a Fourier domain mode-locked laser," IEEE Trans. Microw. Theory Techn., vol. 69, no. 10, pp. 4576–4583, Oct. 2021.
[9] S. Riyaz, K. Sankhe, S. Ioannidis, and K. Chowdhury, "Deep learning convolutional neural networks for radio identification," IEEE Commun. Mag., vol. 56, no. 9, pp. 146–152, Sep. 2018.


[10] D. Lee and K. Kwon, "CMOS channel-selection LNA with a feedforward N-path filter and calibrated blocker cancellation path for FEM-less cellular transceivers," IEEE Trans. Microw. Theory Techn., vol. 70, no. 3, pp. 1810–1820, Mar. 2022.
[11] J. Woo, K. Jung, and S. Mukhopadhyay, "Efficient on-chip acceleration of machine learning models for detection of RF signal modulation," in IEEE MTT-S Int. Microw. Symp. Dig., Jun. 2021, pp. 74–77.
[12] R. Hongyo, Y. Egashira, T. M. Hone, and K. Yamaguchi, "Deep neural network-based digital predistorter for Doherty power amplifiers," IEEE Microw. Wireless Compon. Lett., vol. 29, no. 2, pp. 146–148, Feb. 2019.
[13] C. Zhang, J. Jin, W. Na, Q. J. Zhang, and M. Yu, "Multivalued neural network inverse modeling and applications to microwave filters," IEEE Trans. Microw. Theory Techn., vol. 66, no. 8, pp. 3781–3797, Aug. 2018.
[14] J. Jin, C. Zhang, F. Feng, W. Na, J. Ma, and Q. Zhang, "Deep neural network technique for high-dimensional microwave modeling and applications to parameter extraction of microwave filters," IEEE Trans. Microw. Theory Techn., vol. 67, no. 10, pp. 4140–4155, Oct. 2019.
[15] F. Feng, C. Zhang, J. Ma, and Q.-J. Zhang, "Parametric modeling of EM behavior of microwave components using combined neural networks and pole-residue-based transfer functions," IEEE Trans. Microw. Theory Techn., vol. 64, no. 1, pp. 60–77, Jan. 2016.
[16] F. Feng, W. Na, J. Jin, W. Zhang, and Q.-J. Zhang, "ANNs for fast parameterized EM modeling: The state of the art in machine learning for design automation of passive microwave structures," IEEE Microw. Mag., vol. 22, no. 10, pp. 37–50, Oct. 2021.
[17] B. Liu, N. Deferm, D. Zhao, P. Reynaert, and G. Gielen, "An efficient high-frequency linear RF amplifier synthesis method based on evolutionary computation and machine learning techniques," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 7, pp. 981–993, Jul. 2012.
[18] S. Zhang et al., "Deep neural network behavioral modeling based on transfer learning for broadband wireless power amplifier," IEEE Microw. Wireless Compon. Lett., vol. 31, no. 7, pp. 917–920, Jul. 2021.
[19] X. Hu et al., "Convolutional neural network for behavioral modeling and predistortion of wideband power amplifiers," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 8, pp. 3923–3937, Aug. 2022.
[20] A. E. Spezio, "Electronic warfare systems," IEEE Trans. Microw. Theory Techn., vol. 50, no. 3, pp. 633–644, Mar. 2002.
[21] Y. Yazid, I. Ez-Zazi, M. Arioua, and A. El Oualkadi, "A deep reinforcement learning approach for LoRa WAN energy optimization," in Proc. IEEE Microw. Theory Techn. Wireless Commun., Oct. 2021, pp. 199–204.
[22] N. Neshenko, E. Bou-Harb, J. Crichigno, G. Kaddoum, and N. Ghani, "Demystifying IoT security: An exhaustive survey on IoT vulnerabilities and a first empirical look on internet-scale IoT exploitations," IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2702–2733, 3rd Quart., 2019.
[23] J. Tapparel, O. Afisiadis, P. Mayoraz, A. Balatsoukas-Stimming, and A. Burg, "An open-source LoRa physical layer prototype on GNU Radio," in Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Atlanta, GA, USA, May 2020, pp. 1–5.
[24] E. Aras, G. S. Ramachandran, P. Lawrence, and D. Hughes, "Exploring the security vulnerabilities of LoRa," in Proc. 3rd IEEE Int. Conf. Cybern. (CYBCONF), Exeter, U.K., Jun. 2017, pp. 1–6.
[25] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[26] T. Jian, B. C. Rendon, A. Gritsenko, J. Dy, K. Chowdhury, and S. Ioannidis, "MAC ID spoofing-resistant radio fingerprinting," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Ottawa, ON, Canada, Nov. 2019, pp. 1–5.
[27] K. Sankhe, M. Belgiovine, F. Zhou, S. Riyaz, S. Ioannidis, and K. Chowdhury, "ORACLE: Optimized radio classification through convolutional neural networks," in Proc. IEEE Conf. Comput. Commun. (IEEE INFOCOM), Paris, France, Apr. 2019, pp. 370–378.
[28] G. Shen, J. Zhang, A. Marshall, L. Peng, and X. Wang, "Radio frequency fingerprint identification for LoRa using deep learning," IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2604–2616, Aug. 2021.
[29] L. Peng, J. Zhang, M. Liu, and A. Hu, "Deep learning based RF fingerprint identification using differential constellation trace figure," IEEE Trans. Veh. Technol., vol. 69, no. 1, pp. 1091–1095, Jan. 2020.
[30] G. Wang, G. B. Giannakis, and J. Chen, "Learning ReLU networks on linearly separable data: Algorithm, optimality, and generalization," IEEE Trans. Signal Process., vol. 67, no. 9, pp. 2357–2370, May 2019.
[31] A. Sahiner, T. Ergen, J. Pauly, and M. Pilanci, "Vector-output ReLU neural network problems are copositive programs: Convex analysis of two layer networks and polynomial-time algorithms," 2020, arXiv:2012.13329.
[32] J. Tapparel, "Complete reverse engineering of LoRa PHY," École Polytechn. Fédérale de Lausanne, Lausanne, Switzerland, Tech. Rep., 2019.
[33] Rohde & Schwarz Test and Measurement Solutions. Accessed: 2021. [Online]. Available: https://www.rohde-schwarz.com/us/products/test-and-measurement/vector-signal-generators/rs-smate200a-vector-signal-generator_63493-7556.html
[34] Y. Wang and M. Pilanci, "The convex geometry of backpropagation: Neural network gradient flows converge to extreme points of the dual convex program," 2021, arXiv:2110.06488.
[35] M. Phuong and C. H. Lampert, "The inductive bias of ReLU networks on orthogonally separable data," in Proc. Int. Conf. Learn. Represent., 2021, pp. 1–19.
[36] P. J. Schreier, "A unifying discussion of correlation analysis for complex random vectors," IEEE Trans. Signal Process., vol. 56, no. 4, pp. 1327–1336, Apr. 2008.
[37] F. Aiolli, G. D. S. Martino, M. Hagenbuchner, and A. Sperduti, "Learning nonsparse kernels by self-organizing maps for structured data," IEEE Trans. Neural Netw., vol. 20, no. 12, pp. 1938–1949, Dec. 2009.
[38] J. Lezama, Q. Qiu, P. Musé, and G. Sapiro, "OLE: Orthogonal low-rank embedding—A plug and play geometric loss for deep learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, Jun. 2018, pp. 8109–8118.
[39] Q. Qiu and G. Sapiro, "Learning transformations for clustering and classification," Int. J. Mach. Learn., vol. 16, no. 1, pp. 187–225, Feb. 2015.
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1–9.
[41] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[42] S.-W. Wu, J. Yang, and G.-M. Cao, "Prediction of the Charpy V-notch impact energy of low carbon steel using a shallow neural network and deep learning," Int. J. Minerals, Metall. Mater., vol. 28, no. 8, pp. 1309–1320, Aug. 2021.
[43] D. Xiao and L. Wan, "Remote sensing inversion of saline and alkaline land based on an improved seagull optimization algorithm and the two-hidden-layer extreme learning machine," Natural Resour. Res., vol. 30, no. 5, pp. 3795–3818, Oct. 2021.
[44] A. Chaman and I. Dokmanic, "Truly shift-invariant convolutional neural networks," CoRR, vol. abs/2011.14214, pp. 1–16, Nov. 2020.
[45] A. Azulay and Y. Weiss, "Why do deep convolutional networks generalize so poorly to small image transformations?" CoRR, vol. abs/1805.12177, pp. 1–25, May 2018.
[46] R. Zhang, "Making convolutional networks shift-invariant again," in Proc. Int. Conf. Mach. Learn., vol. 97, Jun. 2019, pp. 7324–7334.
[47] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010.
[48] J. Miao and A. Ben-Israel, "On principal angles between subspaces in R^n," Linear Algebra Appl., vol. 171, pp. 81–98, Jul. 1992.
[49] E. Elhamifar and R. Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765–2781, Nov. 2013.

Manish Nair (Member, IEEE) received the B.E. degree in electronics and communications engineering from Visvesvaraya Technological University, Belagavi, India, in 2007, the M.S. degree in electrical engineering from The University of Texas at Dallas, Richardson, TX, USA, in 2010, and the Ph.D. degree in electronic engineering from the University of Kent, Canterbury, U.K., in 2019.
From 2009 to 2014, he was associated with Nokia Siemens Networks, Espoo, Finland; Skyworks Solutions Inc., Irvine, CA, USA; Samsung Telecommunications America, Richardson; and Qualcomm, San Diego, CA, USA, in radio frequency (RF) PA design, RF standards, RF systems, and RF applications' engineering capacities. He is currently a Senior Research Associate with the University of Bristol, Bristol, U.K., and an Honorary Research Associate with the University of Kent.
Dr. Nair is a Technical Program Committee (TPC) Member of the IEEE Wireless Communication and Networking Conference (WCNC) and the IEEE International Communications Conference (ICC).


Tommaso A. Cappello (Member, IEEE) received the Laurea degree (cum laude) in electrical engineering and the Ph.D. degree from the University of Bologna, Bologna, Italy, in 2013 and 2017, respectively.
From 2017 to 2019, he was a Post-Doctoral Research Associate with the Microwave and RF Research Group, University of Colorado Boulder, Boulder, CO, USA. Since 2020, he has been a Lecturer in electrical and communication engineering with the University of Bristol, Bristol, U.K. His current research interests include the design and characterization of radio frequency (RF) and power electronic circuits.

Shuping Dang (Member, IEEE) received the B.Eng. degree (Hons.) in electrical and electronic engineering from The University of Manchester, Manchester, U.K., and the B.Eng. degree in electrical engineering and automation from Beijing Jiaotong University, Beijing, China, in 2014, via a joint "2 + 2" dual-degree program, and the D.Phil. degree in engineering science from the University of Oxford, Oxford, U.K., in 2018.
He joined the R&D Center, Huanan Communication Company Ltd., Huanan, China, after graduating from the University of Oxford. He was a Post-Doctoral Fellow with the Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia. He is currently a Lecturer with the Department of Electrical and Electronic Engineering, University of Bristol, Bristol, U.K. His research interests include 6G communications, wireless communications, wireless security, and machine learning for communications.

Mark A. Beach (Senior Member, IEEE) has over 35 years of experience in physical layer wireless research, including spread spectrum; adaptive and smart antennas for capacity and range extension in wireless networks; MIMO-aided connectivity for throughput and spectrum efficiency enhancement; millimeter-wave technology; and secure, robust, and frequency-agile radio frequency technologies. He leads the delivery of the UKRI/EPSRC SWAN Prosperity Partnership in Secure Wireless Agile Networks. He is an expert panel member of DCMS on 6G, a co-director of the Center for Doctoral Training (CDT) in communications at the University of Bristol, and the Research Impact Director. He is also a Chartered Engineer (CEng). He is a Co-Founder of the Cambridge-based company ForeFront RF, Cambridge, U.K., creating frequency-agile technology to replace fixed-frequency Surface Acoustic Wave (SAW) and Bulk Acoustic Resonator (BAR) components commonplace in cellular phone technology.
Dr. Beach is a member of the Institution of Engineering and Technology (IET).