CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Area throughput refers to the amount of data successfully sent or received per unit time over a given cellular coverage area. To serve a larger number of connected devices and the corresponding traffic, the area throughput must be maximized. The area throughput of any cellular network can be represented by equation (1.1):

Area throughput (bit/s/km$^2$) = Bandwidth (Hz) $\times$ Cell density (cells/km$^2$) $\times$ Spectral efficiency (bit/s/Hz/cell)   (1.1)
As shown in Figure 1.2, the techniques for increasing the network's area throughput [2] can be classified into three groups, each targeting one factor of equation (1.1). The first group improves the bandwidth, for example mmWave and cognitive radio. The second increases the cell density, for example ultra-dense and D2D networks. The third increases the spectral efficiency, for example massive MIMO and NOMA. Most ongoing developments rely on increasing spectral usage and densifying the cells. The scarcity of spectral resources will soon drive that direction to a dead end: spectral resources are minimal, and licensing agencies and governments struggle to allocate spectrum to the various services without overlap and interference. Interference with the signals of some pivotal technologies poses a severe risk to human life; the news reports about air-traffic disruption following the recent deployment of 5G bands are one such example. Moreover, densifying the existing cells creates unnecessary cost and infrastructure-management requirements.
Figure 1.2 Techniques to Improve Area Throughput
Massive MIMO is a multi-user MIMO technology that can provide wireless terminals with reliable and consistently robust service, even in high-mobility scenarios. The essential concept is to equip the Base Station (BS) with a large antenna array that serves several terminals simultaneously on the same time-frequency resource, as illustrated in Figure 1.3. "Massive" refers to the number of antennas rather than their size. Theoretically, a massive MIMO base station can have an unlimited number of antennas, but the count is practically limited to around 200. The typical assumption in massive MIMO networks is that the number of transmit antennas is much larger than the number of single-antenna users. The directivity of the beams increases with the number of BS antennas, and these highly directive beams reduce interference leakage.
Let us consider a simple uplink scenario over an i.i.d. Rayleigh fading channel, with 128 BS antennas and two users, as represented in Figure 1.4. Let the signal transmitted by user $k$ be $x_k$, where $k = 1, 2$. With $N_T$ transmit antennas at the BS, the channel vector $h_k$ for the $k$th user is given in equation (1.2) and follows a complex normal distribution with zero mean. The noise $n$ is also complex normal with zero mean.
The received signal at the BS is

$y = h_1 x_1 + h_2 x_2 + n$   (1.3)

Applying a linear detector $l_1$ for user 1, the transmitted signal can be detected as

$\hat{x}_1 = l_1^{H} y$   (1.4)

For simplicity, let the linear operator $l_1$ be the maximal ratio filter, $l_1 = \frac{1}{N_T} h_1$. Applying equation (1.4) produces the signal component $l_1^{H} h_1 = \frac{1}{N_T}\lVert h_1 \rVert^2$. When the number of transmit antennas grows without bound, this term converges to $E(|h_{11}|^2) = 1$ by the law of large numbers, which keeps the signal power at its maximum value. Now consider the interference due to user 2, $l_1^{H} h_2 = \frac{1}{N_T} h_1^{H} h_2$. When the number of transmit antennas grows large, this cross term averages to zero because $h_1$ and $h_2$ are independent, a property known as favorable propagation.
Among the resulting benefits of massive MIMO listed in the literature are lower latency and robustness of the network. A quick numerical check of the favorable propagation property described above is sketched below.
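A minimal NumPy sketch of this behaviour, assuming i.i.d. CN(0,1) channel entries and illustrative variable names (this is a numerical check, not code from this work):

import numpy as np

rng = np.random.default_rng(0)
N_T = 128  # number of BS antennas

# i.i.d. Rayleigh fading: each channel entry is CN(0, 1)
h1 = (rng.standard_normal(N_T) + 1j * rng.standard_normal(N_T)) / np.sqrt(2)
h2 = (rng.standard_normal(N_T) + 1j * rng.standard_normal(N_T)) / np.sqrt(2)

signal_gain = np.linalg.norm(h1) ** 2 / N_T    # tends to E(|h_11|^2) = 1
interference = np.abs(np.vdot(h1, h2)) / N_T   # (1/N_T)|h1^H h2|, tends to 0

print(f"signal component      : {signal_gain:.3f}")
print(f"interference component: {interference:.3f}")

Increasing N_T makes the interference term shrink further, which is the numerical face of the channel orthogonality argument above.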
Massive MIMO in practice also yields ground-breaking results, apart from the theoretical principles. In 2015, Lund University and Bristol University achieved high spectral efficiency by employing 128 transmit antennas serving 22 users on 20 MHz of radio spectrum in the 3.51 GHz band. They also demonstrated that a low-complexity massive MIMO system with baseband circuitry can operate stably in a variety of indoor and outdoor scenarios [4]. Massive MIMO hardware implementations have been successfully tested, demonstrating that these systems can be built with dedicated and comparatively cheap hardware for the baseband and RF chains. Recent massive MIMO experiments prove its ability to increase spectral efficiency severalfold compared with the existing 4G architecture. The benefits of massive MIMO have also been reported in practical implementations from China and Japan.
Apart from higher spectral efficiency, massive MIMO has many other benefits. It can improve energy efficiency by concentrating beams on specific users, thereby reducing power wastage. Multipath fading can be mitigated by leveraging the diversity afforded by the increased number of antennas. Spatial multiplexing and array gain provide low latency and high data rates, while narrow beams and orthogonal channels provide better security.
techniques to ensure that communication is uninterrupted. In particular, angle estimation techniques must be used to ensure reliable communication.
Furthermore, the broadcast nature of the massive MIMO network poses serious security threats. Traditional cryptographic algorithms protect the network by challenging the intruder with computationally intricate problems that are easy to solve only with the proper keys. A massive MIMO network can instead use the inherent properties of the wireless channel for security. This security paradigm is known as Physical Layer Security (PLS). PLS for massive MIMO uses the randomness of wireless channel estimates to degrade reception at the eavesdropper.
Figure 1.5 AoA for massive MIMO with ULA Base Station Antennas
The roots of early channel angle estimation can be traced back to Conventional Beamforming (CBF) [8], a technique that directly translates the time-domain Fourier spectrum estimation approach to signal processing in the spatial domain. The angular-domain resolution of an antenna array is limited by the Rayleigh resolution limit. One proposed solution is the Capon technique [9], which minimizes the radiated power in interfering directions without obstructing radiation toward the desired users. This technique is robust and does not require the number of sources in advance, but its resolution is inadequate. The Rayleigh limit can be overcome using eigen-subspace methods.
Pisarenko analysis [10] is a subset of harmonic analysis. The signal and noise subspaces are obtained by decomposing the array covariance matrix through eigenvalue or singular value decomposition and then exploiting their mutual orthogonality. The eigenvectors associated with the smallest eigenvalues span the noise subspace, yielding a high-precision angle estimate of the target at a low computational cost. This technique, however, is limited in that it requires the number of array elements to be considerably larger than the number of signal sources. Angle estimation is therefore performed using such high-resolution approaches; one of them, the MUSIC algorithm, is sketched below.
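To illustrate how the noise subspace yields high-resolution angle estimates, the following minimal sketch computes the classic MUSIC pseudo-spectrum for a half-wavelength ULA; the array size, source angles, noise level, and NumPy usage are illustrative assumptions, not parameters from this work:

import numpy as np

def music_spectrum(X, n_sources, grid_deg):
    """MUSIC pseudo-spectrum for a half-wavelength-spaced ULA.
    X: (n_antennas, n_snapshots) matrix of received snapshots."""
    n = X.shape[0]
    R = X @ X.conj().T / X.shape[1]        # sample covariance matrix
    _, eigvecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    En = eigvecs[:, : n - n_sources]       # noise subspace (smallest eigenvalues)
    p = np.empty(len(grid_deg))
    for i, theta in enumerate(np.deg2rad(grid_deg)):
        a = np.exp(1j * np.pi * np.arange(n) * np.sin(theta))  # steering vector
        proj = En.conj().T @ a             # projection onto the noise subspace
        p[i] = 1.0 / np.real(np.vdot(proj, proj))  # peaks where a is orthogonal to it
    return p

# Two sources at -20 and 35 degrees, 64-element ULA, 200 snapshots
rng = np.random.default_rng(1)
n, t = 64, 200
doas = np.deg2rad([-20.0, 35.0])
A = np.exp(1j * np.pi * np.outer(np.arange(n), np.sin(doas)))
S = (rng.standard_normal((2, t)) + 1j * rng.standard_normal((2, t))) / np.sqrt(2)
X = A @ S + 0.1 * (rng.standard_normal((n, t)) + 1j * rng.standard_normal((n, t)))

grid = np.arange(-90.0, 90.0, 0.5)
p = music_spectrum(X, 2, grid)
peaks = [i for i in range(1, len(p) - 1) if p[i] > p[i - 1] and p[i] > p[i + 1]]
top = sorted(sorted(peaks, key=lambda i: p[i])[-2:])
print("estimated AoAs (deg):", grid[top])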
Figure 1.6 Cryptographic Security for Wireless Communication Systems
The human brain begins learning in childhood through the initial input from its surroundings. For instance, parents may point to a cat and tell the child that the object is a cat. When the child hears the word "cat" and sees its visual features, the brain begins associating the characteristics of the species; the next time the child sees another animal with the same characteristics, the brain recognizes and remembers it as a cat. An artificial neural network operates in a similar manner. An ANN learns the relationship between multiple inputs and outputs from the available training data and derives a rule for responding to unknown inputs and unknown surroundings. Due to the hierarchical layered design of the neural network, information can be passed to neurons that are not direct neighbors.
Artificial neurons receive information, process it, and transmit it to other neurons. Typically, the signal at a connection is a number, and the neuron output is computed as the weighted sum of its inputs plus a bias value, passed through an activation function. The strength of the connection between neurons is referred to as its "weight." As learning progresses, the neural network modifies these weights, as depicted in Figure 1.8; a weight strengthens or weakens the connection. Each neuron has a threshold above which the aggregated signal is delivered, or, in simpler terms, the value at which the neuron activates or fires. The physiology of neurons gives rise to the robust artificial neural network. When programmable computers were invented, it seemed unthinkable that a manufactured machine could function in such an intelligent manner. Because ANNs link individual units together, they are thought to be more efficient than traditional computers [27]. An ANN comprises a network of interconnected units or nodes known as "artificial neurons," which are analogous to the neurons in the human brain. Like synapses in the brain, the connections carry messages from neuron to neuron. A single neuron's computation is sketched below.
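As a concrete illustration, a single artificial neuron can be sketched in a few lines of Python; the sigmoid activation and all numerical values here are illustrative assumptions:

import numpy as np

def artificial_neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals plus a bias,
    passed through a sigmoid activation function."""
    z = np.dot(weights, inputs) + bias   # aggregated signal
    return 1.0 / (1.0 + np.exp(-z))      # the neuron "fires" strongly as z grows

x = np.array([0.5, -1.2, 3.0])   # signals from upstream neurons
w = np.array([0.8, 0.1, -0.4])   # connection strengths (weights)
print(artificial_neuron(x, w, bias=0.2))   # an output in (0, 1)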
computing in real time is another advantage. Finally, fault tolerance is the capacity to keep the network functioning after a partial breakdown.
An ANN is composed of an input layer, one or more hidden layers, and an output layer, each containing multiple neurons. Each layer is responsible for a distinct set of feature extraction and processing tasks. The original goal of the ANN was to create a neural network that operated exactly like the human brain; however, as the emphasis switched to specific tasks, departures from the biological model arose. ANNs are widely utilized in computer vision, speech recognition, speech synthesis, social network content filtering, board and video gameplay, medical diagnosis, and machine translation. They are a critical machine learning tool, as they excel at detecting patterns and analyzing complicated relationships, and they can learn features and make judgments more effectively than a human programmer could specify. Although neural networks, beginning with the perceptron, have existed since the 1940s, only in the last decade have they become a critical component of artificial intelligence. This progress is largely due to the backpropagation algorithm, which enables networks to adjust their hidden layers of neurons when the output does not match the target, as sketched below. An artificial intelligence system may comprise many components, including rule-based systems, classical machine learning, and representation learning; each of these methods differs in how it inspects, processes, and extracts the input's attributes and maps them to the output [28].
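A minimal sketch of backpropagation in action: a tiny two-layer network learns the XOR mapping by propagating the output error back to its hidden layer. The architecture, learning rate, and iteration count are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.standard_normal((2, 4)), np.zeros(4)    # hidden layer (4 neurons)
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)    # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: push the output error into the hidden layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates of weights and biases
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]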
The fundamental premise that adding more processing units can enable intelligent interactions in real-time scenarios has led to novel models with multiple units. Among these is the Convolutional Neural Network (CNN), often regarded as an efficient architecture for complicated learning tasks. Subsequently, a plethora of deep learning architectures emerged [29]. Another advance in this discipline has been the use of backpropagation to train deep neural networks with internal representations [18]. Long Short-Term Memory networks (LSTM) were developed from this line of work, significantly improving the performance of Natural Language Processing (NLP) [30]. In addition, kernel machines and graphical approaches were found to execute a variety of critical tasks [20]. Geoffrey Hinton introduced another type of neural network, the deep belief network, optimized using greedy layer-wise pretraining; research communities subsequently discovered that this procedure could also be used to train various other types of deep neural networks. Deep learning builds representations that are expressed in terms of simpler representations, helping computers synthesize complicated notions from simpler ones. A feed-forward deep network, also known as a multilayer perceptron, is one such deep learning model: a mathematical representation of how a set of input values is mapped to its associated output values.
1.6.3 Activation Functions
The Sigmoid function, also known as the logistic function, has a curve that resembles the English letter 'S'. Its output changes steeply for inputs roughly between -2 and 2 [33]; consequently, small changes in X in this range significantly affect the value of Y. It is often used in the output layer of a binary classifier, where the desired outcome is either 0 or 1: because the sigmoid output lies between 0 and 1, the outcome can be taken as one when it exceeds 0.5 and zero otherwise. The major disadvantage of this function is that its output barely changes for very high or very low inputs, a problem known as the vanishing gradient problem.
The Tanh function, or hyperbolic tangent function, is more effective than the sigmoid [33]. It is simply a shifted and scaled form of the sigmoid; the two are related and may be derived from one another. Because its values range from -1 to 1, it is best used in the hidden layers of a neural network: the mean activation of a hidden layer is then zero or very near zero, which aids data centring and simplifies learning. However, the vanishing gradient problem also exists for the tanh function.
The ReLU function is the most frequently used activation function. Its output equals the input when the input is positive, ranging from 0 to infinity, and is zero otherwise. It is typically used in the hidden layers of a neural network [34], and errors can be backpropagated through many layers of ReLU-activated neurons. Because it requires only straightforward mathematical operations, it is less computationally costly than tanh and sigmoid. Only a few neurons are activated at any given time, resulting in a sparse, efficient network that is easy to compute. Its failure mode under backpropagation is known as the dying ReLU problem, discussed next.
Both the Leaky and Randomized ReLU functions are monotonic by nature, as are their derivatives [36]. In ReLU, the gradient is zero along the horizontal segment (for negative X), so the weights of activations in that region are not modified during gradient descent; once the gradient becomes zero, those neurons cease to respond to errors. This is the "dying ReLU problem." The remedy is simply to replace the horizontal segment with one of non-zero slope, making the gradient non-zero. Leaky ReLU thus overcomes the dying ReLU problem while also providing compressed negative values. The common activation functions are sketched below.
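The four activation functions discussed above can be summarized in a minimal NumPy sketch (the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):                    # S-curve, output in (0, 1); saturates at the
    return 1.0 / (1.0 + np.exp(-x))   # extremes, causing vanishing gradients

def tanh(x):                       # shifted, scaled sigmoid, output in (-1, 1);
    return np.tanh(x)              # zero-centred, but it also saturates

def relu(x):                       # zero for negative inputs, identity otherwise;
    return np.maximum(0.0, x)      # cheap and sparse, but neurons can "die"

def leaky_relu(x, alpha=0.01):     # a small negative slope keeps the gradient
    return np.where(x > 0, x, alpha * x)   # non-zero for negative inputs

x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f"{f.__name__:>10}: {f(x).round(3)}")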
1.6.5 Cost Function
The cost function is used for problem solving and training. It measures the disparity between the actual and the predicted output [36] and indicates how to reduce the error by making small adjustments to the weights and biases. A common choice is the mean squared error, sketched below. The model computes the error whenever a random training sample is fed in; the weights and biases are then updated to minimize this error. This procedure is repeated until the error converges to a minimum value or to zero, at which point the network training is finished.
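A minimal sketch of the mean squared error (the values are illustrative):

import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared gap between the
    actual and the predicted outputs."""
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([1.0, 0.0, 1.0]), np.array([0.9, 0.2, 0.7])))  # ~0.047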
1.6.6 Optimizers
The preceding section discussed the cost function and the updating of weights. Specific algorithms are necessary to perform these updates; this is where optimizers step in to connect the cost function and the model parameters. Optimizers shape and mold the model by adjusting weights in response to the training feedback, indicating the direction in which the weights should be modified. Weights are first set to random values; then, as the algorithm proceeds, they are adjusted to make the cost function as low as possible, based on the gradient.
1.6.6.1 Gradient Descent

The Gradient Descent optimizer is the most common. It is a fast, robust, and flexible optimization approach that is frequently utilized. It begins by determining the gradient of the cost function [38]; the gradient is simply the partial derivative of the loss function with respect to the weights, and it dictates the direction of descent over the valley. The objective is to reach the bottom of the valley, the global minimum, while avoiding getting stuck in local minima; this can be achieved by adjusting the learning rate appropriately. A minimal sketch follows.
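A minimal sketch of gradient descent on a toy one-dimensional cost; the cost function, starting point, and learning rate are illustrative assumptions:

# Minimize a toy cost J(w) = (w - 3)^2 by gradient descent.
grad = lambda w: 2.0 * (w - 3.0)   # dJ/dw, the local slope of the valley

w, lr = 10.0, 0.1                  # arbitrary start and learning rate
for _ in range(100):
    w -= lr * grad(w)              # step against the gradient
print(round(w, 4))                 # approaches the minimum at w = 3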
1.6.6.2 Stochastic Gradient Descent (SGD)
Rather than computing the gradient across all training instances, SGD computes it on randomly selected training samples or small batches of training data [39].
1.6.6.3 Adagrad
Adagrad customizes the learning rate for each feature [40], which means that different weights in the model effectively have different learning rates. It is designed for sparse datasets with many missing input samples. The adaptive learning rate in Adagrad tends to decrease significantly over time, as the sketch below illustrates.
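A minimal sketch of the Adagrad update on the same toy cost as the gradient descent sketch above (the hyperparameters are illustrative):

import numpy as np

grad = lambda w: 2.0 * (w - 3.0)         # dJ/dw for J(w) = (w - 3)^2

w, lr, eps, G = 10.0, 1.0, 1e-8, 0.0
for _ in range(500):
    g = grad(w)
    G += g ** 2                          # accumulated squared gradient
    w -= lr * g / (np.sqrt(G) + eps)     # per-parameter adaptive step
print(round(w, 4))                       # approaches w = 3, ever more slowly

Because G only grows, the effective step size keeps shrinking, which is the decay behaviour noted above.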
section. Due to the dynamic nature of the channel and the number of channels that must be estimated, the estimation process in massive MIMO is complex, and numerous techniques have been proposed in the literature. PLS represents a paradigm shift in how security is added to massive MIMO, and various approaches have been proposed to achieve secrecy using physical layer signal processing.
MMSE, LMMSE, CS, MUSIC, and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT). For massive MIMO 3D channels, many techniques have been proposed in the literature. In [50], two MUSIC stages are applied for AoA estimation, one for the azimuth and one for the elevation angle. The primary disadvantage of that idea is its increased system complexity and processing time.
3. The proposed DL techniques rely on the time domain features of the signal
Numerous techniques used to fight intruder attacks fall under the PLS category, and many solutions have been offered [56] - [72]. The primary categories are multi-antenna approaches, coding techniques, key generation techniques, and the Artificial Noise (AN) scheme. The AN scheme sends well-structured artificial noise in every direction except that of the Legitimate User (LU); the AN is confined to the null space of the LU's channel. Beamforming-based PLS employs two approaches: in the first, the gain toward the LU is increased without considering the eavesdropper's channel; in the second, the LU gain is increased while taking the Adversarial User (AU) channel into account, for instance through Zero-Forcing (ZF) methods that transmit the signal in a direction orthogonal to the eavesdropper's channel. Various approaches can achieve secrecy in PLS for massive MIMO: artificial noise injection to weaken the signal at the eavesdropper, signal constellation modification [73], secure beamforming to block an eavesdropper's reception, the CS approach recently employed with PLS [74], and physical layer encryption [75], one of the most often used PLS approaches. A minimal sketch of AN null-space injection follows.
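A minimal sketch of the AN null-space idea: random noise is projected orthogonal to the LU's channel, so it cancels at the LU but persists in other directions. The channel model, dimensions, and values are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
N_T = 64   # BS antennas

def cn(size):   # CN(0, 1) samples
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

h_lu = cn(N_T)    # legitimate user's channel, known at the BS
h_eve = cn(N_T)   # eavesdropper's channel, unknown to the BS

# Project random noise onto the orthogonal complement of the LU channel.
z = cn(N_T)
an = z - h_lu * (np.vdot(h_lu, z) / np.vdot(h_lu, h_lu))

print("AN power at LU :", np.abs(np.vdot(h_lu, an)) ** 2)    # ~ 0
print("AN power at Eve:", np.abs(np.vdot(h_eve, an)) ** 2)   # non-negligible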
Another approach to PLS is to generate physical layer keys: the legitimate users agree on a randomly shared channel feature from which an encryption key is created, as sketched below. Numerous efforts have been made in the literature to establish key generation methods for PLS. Liangjun Hu et al. [88] proposed a PLS scheme based on frequency hopping that performs better in low-unpredictability channels. Ozan Alp Topal et al. [89] performed a theoretical analysis and practical verification of the error rates of physical layer key generation. Miro Bottarelli et al. [90] developed a method for minimizing the number of numerical computations required to produce physical layer keys, utilizing Principal Component Analysis (PCA) for dimensionality reduction. Zijie Ji et al. [91] examined the effect of the ERASE environment-reconstruction attack on the creation of physical layer secret keys. Ahmed Badawy et al. [92] discuss the secret key creation capacity of PLS. Ankit Soni et al. [93] propose a key generation technique that applies a moving-window average to the RSSI of pilot signals for IoT.
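A minimal sketch of the first step of such key generation, median-based 1-bit quantization of a reciprocal channel feature; the feature model and noise level are illustrative assumptions, and the residual bit mismatch is what information reconciliation must remove:

import numpy as np

rng = np.random.default_rng(3)

# A reciprocal channel feature (e.g., per-subcarrier gain) observed at
# both ends with independent estimation noise.
feature = rng.standard_normal(128)
alice = feature + 0.05 * rng.standard_normal(128)   # Alice's estimate
bob = feature + 0.05 * rng.standard_normal(128)     # Bob's estimate

def quantize(obs):
    """1-bit quantization against the median: 1 above, 0 at or below."""
    return (obs > np.median(obs)).astype(int)

key_a, key_b = quantize(alice), quantize(bob)
print("first key bits   :", key_a[:16])
print("bit mismatch rate:", np.mean(key_a != key_b))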
Reem Melki et al. [94] analyzed features of PLS key generation for OFDM networks. The upper bound of the physical layer key rate on an AWGN multipath-fading asymmetric channel is derived by Abdul Sahib et al. [95]. Combining GenSM with massive MIMO is a novel approach [96], [97]; GenSM is a relatively new technology that provides dramatic gains in spectral utilization, and the PLS design of GenSM-assisted massive MIMO is a relatively new field of research. We offer a PLS scheme with key generation for a massive MIMO system with spatial modulation based on the CATMCPR [98], in which a small number of mappings, named Antenna Combinations and Mapping Modes (ACMM), are saved in memory as dictionary entries; one of the ACMM entries is chosen using the generated key. Analysis of existing key generation schemes exposed a major disadvantage of the works in the literature: the generated physical layer keys remain the same for a considerably long time due to the static nature of the channel in specific scenarios. This shortcoming may open a security hole that helps Eve break the key quickly. This work presents a DL-based key generation technique for GenSM-assisted massive MIMO. The privacy amplification and information reconciliation phases of chaotic key generation can be reinforced further using DL techniques. The central area of study in this chapter thus revolves around PLS based on DL algorithms.
The literature also extensively investigates PLS based on Discriminatory Channel Estimation (DCE); the reported methodologies fall into two types. The first is DCE using Feed-Back and Retraining (FBR), while the second is DCE utilizing Two-Way Training (TWT). In the FBR technique, the BS employs an initial low-power pilot sequence for the first estimate at the LU, and AN-injected pilot sequences are then broadcast depending on the CSI fed back from the original estimate. In the TWT-based DCE, the LU first transmits the pilot sequences, which immediately excite the BS channel taps; the BS then injects the AN into the reverse-path pilot sequence by leveraging the channel's reciprocity. If the channel is not reciprocal, only a few round-trip training rounds are required to achieve an acceptable degree of discrimination.
domain NOMA, which is based on deep learning. [48] proposes DL-based super-
resolution AoA and channel estimation for massive MIMO networks.
Due to the limited resolution and angular coupling of the derived estimate, its accuracy is reduced. With their intrinsic adaptability, data-driven strategies can increase channel estimation accuracy by reducing angular coupling. The DL technique outperforms the conventional approach to estimation by addressing optimization via data-driven signal processing. PLS, NOMA, synchronization, spatial modulation, channel estimation, signal modulation detection, and beamforming are only a few of the topics addressed with DL. [114] approaches the beamforming problem via a DL algorithm, an adaptive system, and reports considerable gain enhancement. [115] proposes a novel method for tracking and estimating vehicular millimeter-wave (mm-Wave) channels, offering a cost-effective blind channel estimation method based on DL for Uncoded Space-Time Labelling Diversity (USTLD) systems. [116] provides an AE-algorithm-based solution for IEEE 802.11p Vehicle-to-Everything (V2X) networks by enhancing the Data Pilot Assisted (DPA) estimation technique. [117] proposes a method for estimating the AoA of a Uniform Circular Array (UCA) BS-based hybrid massive MIMO model using a compact DL algorithm: candidate angles are created by a feed-forward DNN and then given to a selection network that chooses the right angles. A pilot-aided channel estimator for OFDM based on the LSTM DL algorithm is proposed in [118]; the authors quantify the proposal's performance against MMSE and Least Squares (LS) estimators. [119] proposes a DL-based high-resolution channel estimator for a 2D massive MIMO channel using a learned DNN model, but does not explore the combined estimation of 3D angles using the DNN.
DL approaches are also widely used in the PLS literature. R Liao [120] proposed a DL-enabled physical layer authentication technique for the Industrial Internet of Things (IIoT), in which authentication is performed using three models: DNN, CNN, and Convolutional Pre-processing Neural Network (CPNN). An encryption technique based on DL-derived CSI feature vectors is proposed by A Y Abyanh et al. [121]; in the encryption process, the CNN architecture extracts feature sets and works as a classifier. G Baldini et al. [122] proposed a hybrid architecture that combines the benefits of Recurrence Plots (RP) with CNNs. Ferdowsi et al. [123] introduced an LSTM deep learning model for PLS to defend against man-in-the-middle and data injection attacks, and described an RNN-based PLS that exploits the channel's CSI to counter spoofing attacks. Hao Gu et al. [124] introduced a complex-valued CNN for lawful drone identification using RF-fingerprinting-based PLS, obtaining a classification accuracy of 99.5 percent on a specific dataset. D Deng et al. [125] established a CNN-based PLS for MIMO heterogeneous networks with imperfect CSI.
Massive MIMO channel conditions vary rapidly. As the number of users increases, channel estimation requires high resolution and adaptability to changing conditions. The broad spectral width of high-rate links and the large number of antennas considerably increase the size of the channel estimate, posing severe challenges to conventional massive MIMO channel estimation techniques. Conventional channel estimation also suffers from high signal acquisition costs and severe training overhead due to the many antennas in massive MIMO systems, so conventional signal processing techniques are less suitable for this scenario. DL techniques are a strong candidate owing to their highly adaptive nature through active real-time learning: unlike traditional algorithms, where explicit steps must be defined before deployment, AI techniques learn the steps by analyzing the data. This peculiarity makes them apt candidates for massive MIMO channel estimation.
The estimated channel CSI can also be used for security purposes. PLS is a new paradigm for wireless network security based on the physical layer characteristics of the channel. With the development of quantum computing technologies, standard security methods in wireless networks have been dramatically weakened, meaning wireless communication is not entirely secure. PLS takes a novel approach by exploiting the randomness of the wireless channel. The primary goals of PLS are to identify alternative methods that supplement existing cryptographic security measures in wireless networks, and to leverage the network's physical layer to improve security rather than only enhancing methods implemented at the upper layers of the protocol stack. The desired properties of such a PLS scheme are:
1. It should be sensitive to even minor changes in the channel, so that small variations in a comparatively static channel can still produce temporally decorrelated PLS keys.
2. It should be adaptive to any scattering environment.
3. It should not depend too heavily on Gaussian symmetry in the fading distribution of the channel.
This work primarily considers the reliability and physical layer security improvement of massive MIMO networks using DL architectures. Two-Dimensional (2D) and Three-Dimensional (3D) channel estimation is carried out using DL algorithms, and the proposed techniques are compared with conventional channel estimation techniques. The estimated data is used for beamforming, and the bit error rate of the received signal is chosen for performance analysis. The PLS of the massive MIMO network is also implemented using DL algorithms: the Auto-Encoder (AE) model, the Chaotic Deep Neural Network (CDNN), and the Generative Adversarial Network (GAN) are chosen. Two metrics quantify the performance: the network's secrecy rate and the probability of secrecy outage. The specific objectives of the research include:
• To compare the performance of the Time Domain (TD), and Frequency Domain
(FD) datasets for DL enabled channel estimation.
• To compare the BER of the DL enabled massive MIMO communication with
the conventional techniques.
• To evaluate the secrecy metrics of the data-driven PLS techniques with the
conventional PLS techniques.
The thesis is arranged so that each chapter discusses the major concepts and contributions made throughout the research. Chapter 1 provides a complete introduction to massive MIMO, its channel and AoA estimation, PLS, and Deep Learning. This chapter also provides an intensive literature survey on the research topics, along with the challenges and the gap in the knowledge map; from these challenges, the chapter derives the motivation and objectives of the research, and the research aims and the arrangement of the work are described. Chapter 2 provides a strategy for 2D channel estimation based on DNN models and a PLS scheme based on auto-encoder models, with an in-depth study of the performance of both estimation and PLS. Chapter 3 details the design and implementation of a 3D massive MIMO channel estimation algorithm based on KNN and DNN; the performance analysis includes data preprocessing and feature engineering. Chapter 4 introduces a novel framework for PLS based on CDNN for massive MIMO using Generalized Spatial Modulation (GenSM) and details the mechanism for generating keys from channel estimations.
with the comparison of performance is also included in this chapter. Chapter 6 summarizes the investigation's key findings; a synopsis of the scientific contributions and future directions is also provided in this final chapter.