CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Area throughput refers to the amount of data successfully sent or received per unit time over a given cellular coverage area. To serve a larger number of connected devices and the corresponding traffic, the area throughput must be maximized. The area throughput of any cellular network can be represented by equation (1.1):

Area throughput (bit/s/km$^2$) = Bandwidth (Hz) $\times$ Cell density (cells/km$^2$) $\times$ Spectral efficiency (bit/s/Hz/cell)   (1.1)
As shown in Figure 1.2, the techniques for increasing the network's area throughput [2] can be classified into three groups, each targeting one factor of equation (1.1). The first group improves the bandwidth, for example mmWave and cognitive radio. The second increases the cell density, for example ultra-dense and D2D networks. The third increases the spectral efficiency, for example massive MIMO and NOMA. Most ongoing developments rely on increasing spectral usage and densifying the cells. The scarcity of spectral resources will soon drive that direction to a dead end: spectral resources are minimal, and licensing agencies and governments struggle to allocate spectrum to the various services without overlap and interference. Interference with the signals of some pivotal technologies poses a severe risk to human life; the news reports about air-traffic disruption following the recent deployment of 5G bands are one such example. Moreover, densifying the existing cells creates unnecessary cost and infrastructure-management requirements.
Figure 1.2 Techniques to Improve Area Throughput
Massive MIMO is a multi-user MIMO technology that can provide wireless terminals with reliable and consistently robust service, even in high-mobility scenarios. The essential concept is to equip the Base Station (BS) with a large antenna array that serves several terminals simultaneously on the same time-frequency resource, as illustrated in Figure 1.3. "Massive" refers to the number of antennas rather than their size. Theoretically, a massive MIMO base station can have an unlimited number of antennas, but the count is practically limited to around 200. The typical assumption in massive MIMO networks is that the number of transmit antennas is much larger than the number of single-antenna users. The directivity of the beams increases with the number of BS antennas, and these highly directive beams reduce interference leakage.
Let us consider a simple uplink scenario over an i.i.d. Rayleigh fading channel, with 128 BS antennas and two users, as represented in Figure 1.4. Let the signal transmitted by user $k$ be $x_k$, where $k = 1, 2$. With $N_T$ transmit antennas at the BS, the channel vector $h_k$ for the $k$th user is given in equation (1.2) and follows a complex normal distribution with zero mean. The noise $n$ is also complex normal with zero mean.
The received signal at the BS is

$y = h_1 x_1 + h_2 x_2 + n$   (1.3)

Applying a linear detector $l_1$ for user 1, the transmitted signal can be detected as

$\hat{x}_1 = l_1^{H} y$   (1.4)

For simplicity, let the linear operator $l_1$ be the maximal ratio filter, $l_1 = \frac{1}{N_T} h_1$. Applying equation (1.4) produces the signal component $l_1^{H} h_1 = \frac{1}{N_T}\lVert h_1 \rVert^2$. When the number of transmit antennas grows without bound, this term converges to $E(|h_{11}|^2) = 1$ by the law of large numbers, which keeps the signal power at its maximum value. Now consider the interference due to user 2, $l_1^{H} h_2 = \frac{1}{N_T} h_1^{H} h_2$. When the number of transmit antennas grows large, this cross term averages to zero because $h_1$ and $h_2$ are independent, a property known as favorable propagation.
Among the resulting benefits of massive MIMO listed in the literature are lower latency and robustness of the network. A quick numerical check of the favorable propagation property described above is sketched below.
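A minimal NumPy sketch of this behaviour, assuming i.i.d. CN(0,1) channel entries and illustrative variable names (this is a numerical check, not code from this work):

import numpy as np

rng = np.random.default_rng(0)
N_T = 128  # number of BS antennas

# i.i.d. Rayleigh fading: each channel entry is CN(0, 1)
h1 = (rng.standard_normal(N_T) + 1j * rng.standard_normal(N_T)) / np.sqrt(2)
h2 = (rng.standard_normal(N_T) + 1j * rng.standard_normal(N_T)) / np.sqrt(2)

signal_gain = np.linalg.norm(h1) ** 2 / N_T    # tends to E(|h_11|^2) = 1
interference = np.abs(np.vdot(h1, h2)) / N_T   # (1/N_T)|h1^H h2|, tends to 0

print(f"signal component      : {signal_gain:.3f}")
print(f"interference component: {interference:.3f}")

Increasing N_T makes the interference term shrink further, which is the numerical face of the channel orthogonality argument above.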
Massive MIMO in practice also yields ground-breaking results, apart from the theoretical principles. In 2015, Lund University and Bristol University achieved high spectral efficiency by employing 128 transmit antennas serving 22 users on 20 MHz of radio spectrum in the 3.51 GHz band. They also demonstrated that a low-complexity massive MIMO system with baseband circuitry can operate stably in a variety of indoor and outdoor scenarios [4]. Massive MIMO hardware implementations have been successfully tested, demonstrating that these systems can be built with dedicated and comparatively cheap hardware for the baseband and RF chains. Recent massive MIMO experiments prove its ability to increase spectral efficiency severalfold compared with the existing 4G architecture. The benefits of massive MIMO have also been reported in practical implementations from China and Japan.
Apart from higher spectral efficiency, massive MIMO has many other benefits. It can improve energy efficiency by concentrating beams on specific users, thereby reducing power wastage. Multipath fading can be mitigated by leveraging the diversity afforded by the increased number of antennas. Spatial multiplexing and array gain provide low latency and high data rates, while narrow beams and orthogonal channels provide better security.
techniques to ensure that communication is uninterrupted. In particular, angle estimation techniques must be used to ensure reliable communication.
Furthermore, the broadcast nature of the massive MIMO network poses serious security threats. Traditional cryptographic algorithms protect the network by challenging the intruder with computationally intricate problems that are easy to solve only with the proper keys. A massive MIMO network can instead use the inherent properties of the wireless channel for security. This security paradigm is known as Physical Layer Security (PLS). PLS for massive MIMO uses the randomness of wireless channel estimates to degrade reception at the eavesdropper.
Figure 1.5 AoA for massive MIMO with ULA Base Station Antennas
The roots of early channel angle estimation can be traced back to Conventional Beamforming (CBF) [8], a technique that directly translates the time-domain Fourier spectrum estimation approach to signal processing in the spatial domain. The angular-domain resolution of an antenna array is limited by the Rayleigh resolution limit. One proposed solution is the Capon technique [9], which minimizes the radiated power in interfering directions without obstructing radiation toward the desired users. This technique is robust and does not require the number of sources in advance, but its resolution is inadequate. The Rayleigh limit can be overcome using eigen-subspace methods.
Pisarenko analysis [10] is a subset of harmonic analysis. The signal and noise subspaces are obtained by decomposing the array covariance matrix through eigenvalue or singular value decomposition and then exploiting their mutual orthogonality. The eigenvectors associated with the smallest eigenvalues span the noise subspace, yielding a high-precision angle estimate of the target at a low computational cost. This technique, however, is limited in that it requires the number of array elements to be considerably larger than the number of signal sources. Angle estimation is therefore performed using such high-resolution approaches; one of them, the MUSIC algorithm, is sketched below.
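To illustrate how the noise subspace yields high-resolution angle estimates, the following minimal sketch computes the classic MUSIC pseudo-spectrum for a half-wavelength ULA; the array size, source angles, noise level, and NumPy usage are illustrative assumptions, not parameters from this work:

import numpy as np

def music_spectrum(X, n_sources, grid_deg):
    """MUSIC pseudo-spectrum for a half-wavelength-spaced ULA.
    X: (n_antennas, n_snapshots) matrix of received snapshots."""
    n = X.shape[0]
    R = X @ X.conj().T / X.shape[1]        # sample covariance matrix
    _, eigvecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    En = eigvecs[:, : n - n_sources]       # noise subspace (smallest eigenvalues)
    p = np.empty(len(grid_deg))
    for i, theta in enumerate(np.deg2rad(grid_deg)):
        a = np.exp(1j * np.pi * np.arange(n) * np.sin(theta))  # steering vector
        proj = En.conj().T @ a             # projection onto the noise subspace
        p[i] = 1.0 / np.real(np.vdot(proj, proj))  # peaks where a is orthogonal to it
    return p

# Two sources at -20 and 35 degrees, 64-element ULA, 200 snapshots
rng = np.random.default_rng(1)
n, t = 64, 200
doas = np.deg2rad([-20.0, 35.0])
A = np.exp(1j * np.pi * np.outer(np.arange(n), np.sin(doas)))
S = (rng.standard_normal((2, t)) + 1j * rng.standard_normal((2, t))) / np.sqrt(2)
X = A @ S + 0.1 * (rng.standard_normal((n, t)) + 1j * rng.standard_normal((n, t)))

grid = np.arange(-90.0, 90.0, 0.5)
p = music_spectrum(X, 2, grid)
peaks = [i for i in range(1, len(p) - 1) if p[i] > p[i - 1] and p[i] > p[i + 1]]
top = sorted(sorted(peaks, key=lambda i: p[i])[-2:])
print("estimated AoAs (deg):", grid[top])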
Figure 1.6 Cryptographic Security for Wireless Communication Systems
The human brain begins learning in childhood through the initial input from its surroundings. For instance, parents may point to a cat and tell the child that the object is a cat. When the child hears the word "cat" and sees its visual features, the brain begins associating the characteristics of the species; the next time the child sees another animal with the same characteristics, the brain recognizes and remembers it as a cat. An artificial neural network operates in a similar manner. An ANN learns the relationship between multiple inputs and outputs from the available training data and derives a rule for responding to unknown inputs and unknown surroundings. Due to the hierarchical layered design of the neural network, information can be passed to neurons that are not direct neighbors.
Artificial neurons receive information, process it, and transmit it to other neurons. Typically, the signal at a connection is a number, and the neuron output is computed as the weighted sum of its inputs plus a bias value, passed through an activation function. The strength of the connection between neurons is referred to as its "weight." As learning progresses, the neural network modifies these weights, as depicted in Figure 1.8; a weight strengthens or weakens the connection. Each neuron has a threshold above which the aggregated signal is delivered, or, in simpler terms, the value at which the neuron activates or fires. The physiology of neurons gives rise to the robust artificial neural network. When programmable computers were invented, it seemed unthinkable that a manufactured machine could function in such an intelligent manner. Because ANNs link individual units together, they are thought to be more efficient than traditional computers [27]. An ANN comprises a network of interconnected units or nodes known as "artificial neurons," which are analogous to the neurons in the human brain. Like synapses in the brain, the connections carry messages from neuron to neuron. A single neuron's computation is sketched below.
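As a concrete illustration, a single artificial neuron can be sketched in a few lines of Python; the sigmoid activation and all numerical values here are illustrative assumptions:

import numpy as np

def artificial_neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals plus a bias,
    passed through a sigmoid activation function."""
    z = np.dot(weights, inputs) + bias   # aggregated signal
    return 1.0 / (1.0 + np.exp(-z))      # the neuron "fires" strongly as z grows

x = np.array([0.5, -1.2, 3.0])   # signals from upstream neurons
w = np.array([0.8, 0.1, -0.4])   # connection strengths (weights)
print(artificial_neuron(x, w, bias=0.2))   # an output in (0, 1)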
computing in real time is another advantage. Finally, fault tolerance is the capacity to keep the network functioning after a partial breakdown.
An ANN is composed of an input layer, one or more hidden layers, and an output layer, each containing multiple neurons. Each layer is responsible for a distinct set of feature extraction and processing tasks. The original goal of the ANN was to create a neural network that operated exactly like the human brain; however, as the emphasis switched to specific tasks, departures from the biological model arose. ANNs are widely utilized in computer vision, speech recognition, speech synthesis, social network content filtering, board and video gameplay, medical diagnosis, and machine translation. They are a critical machine learning tool, as they excel at detecting patterns and analyzing complicated relationships, and they can learn features and make judgments more effectively than a human programmer could specify. Although neural networks, beginning with the perceptron, have existed since the 1940s, only in the last decade have they become a critical component of artificial intelligence. This progress is largely due to the backpropagation algorithm, which enables networks to adjust their hidden layers of neurons when the output does not match the target, as sketched below. An artificial intelligence system may comprise many components, including rule-based systems, classical machine learning, and representation learning; each of these methods differs in how it inspects, processes, and extracts the input's attributes and maps them to the output [28].
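A minimal sketch of backpropagation in action: a tiny two-layer network learns the XOR mapping by propagating the output error back to its hidden layer. The architecture, learning rate, and iteration count are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.standard_normal((2, 4)), np.zeros(4)    # hidden layer (4 neurons)
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)    # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: push the output error into the hidden layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates of weights and biases
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]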
The fundamental premise that adding more processing units can enable intelligent interactions in real-time scenarios has led to novel models with multiple units. Among these is the Convolutional Neural Network (CNN), often regarded as an efficient architecture for complicated learning tasks. Subsequently, a plethora of deep learning architectures emerged [29]. Another advance in this discipline has been the use of backpropagation to train deep neural networks with internal representations [18]. Long Short-Term Memory networks (LSTM) were developed from this line of work, significantly improving the performance of Natural Language Processing (NLP) [30]. In addition, kernel machines and graphical approaches were found to execute a variety of critical tasks [20]. Geoffrey Hinton introduced another type of neural network, the deep belief network, optimized using greedy layer-wise pretraining; research communities subsequently discovered that this procedure could also be used to train various other types of deep neural networks. Deep learning builds representations that are expressed in terms of simpler representations, helping computers synthesize complicated notions from simpler ones. A feed-forward deep network, also known as a multilayer perceptron, is one such deep learning model: a mathematical representation of how a set of input values is mapped to its associated output values.
1.6.3 Activation Functions
The Sigmoid function, also known as the logistic function, has a curve that resembles the English letter 'S'. Its output changes steeply for inputs roughly between -2 and 2 [33]; consequently, small changes in X in this range significantly affect the value of Y. It is often used in the output layer of a binary classifier, where the desired outcome is either 0 or 1: because the sigmoid output lies between 0 and 1, the outcome can be taken as one when it exceeds 0.5 and zero otherwise. The major disadvantage of this function is that its output barely changes for very high or very low inputs, a problem known as the vanishing gradient problem.
The Tanh function, or hyperbolic tangent function, is more effective than the sigmoid [33]. It is simply a shifted and scaled form of the sigmoid; the two are related and may be derived from one another. Because its values range from -1 to 1, it is best used in the hidden layers of a neural network: the mean activation of a hidden layer is then zero or very near zero, which aids data centring and simplifies learning. However, the vanishing gradient problem also exists for the tanh function.
The ReLU function is the most frequently used activation function. Its output equals the input when the input is positive, ranging from 0 to infinity, and is zero otherwise. It is typically used in the hidden layers of a neural network [34], and errors can be backpropagated through many layers of ReLU-activated neurons. Because it requires only straightforward mathematical operations, it is less computationally costly than tanh and sigmoid. Only a few neurons are activated at any given time, resulting in a sparse, efficient network that is easy to compute. Its failure mode under backpropagation is known as the dying ReLU problem, discussed next.
Both the Leaky and Randomized ReLU functions are monotonic by nature, as are their derivatives [36]. In ReLU, the gradient is zero along the horizontal segment (for negative X), so the weights of activations in that region are not modified during gradient descent; once the gradient becomes zero, those neurons cease to respond to errors. This is the "dying ReLU problem." The remedy is simply to replace the horizontal segment with one of non-zero slope, making the gradient non-zero. Leaky ReLU thus overcomes the dying ReLU problem while also providing compressed negative values. The common activation functions are sketched below.
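The four activation functions discussed above can be summarized in a minimal NumPy sketch (the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):                    # S-curve, output in (0, 1); saturates at the
    return 1.0 / (1.0 + np.exp(-x))   # extremes, causing vanishing gradients

def tanh(x):                       # shifted, scaled sigmoid, output in (-1, 1);
    return np.tanh(x)              # zero-centred, but it also saturates

def relu(x):                       # zero for negative inputs, identity otherwise;
    return np.maximum(0.0, x)      # cheap and sparse, but neurons can "die"

def leaky_relu(x, alpha=0.01):     # a small negative slope keeps the gradient
    return np.where(x > 0, x, alpha * x)   # non-zero for negative inputs

x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f"{f.__name__:>10}: {f(x).round(3)}")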
1.6.5 Cost Function
The cost function is used for problem solving and training. It measures the disparity between the actual and the predicted output [36] and indicates how to reduce the error by making small adjustments to the weights and biases. A common choice is the mean squared error, sketched below. The model computes the error whenever a random training sample is fed in; the weights and biases are then updated to minimize this error. This procedure is repeated until the error converges to a minimum value or to zero, at which point the network training is finished.
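A minimal sketch of the mean squared error (the values are illustrative):

import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared gap between the
    actual and the predicted outputs."""
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([1.0, 0.0, 1.0]), np.array([0.9, 0.2, 0.7])))  # ~0.047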
1.6.6 Optimizers
The preceding section discussed the cost function and the updating of weights. Specific algorithms are necessary to perform these updates; this is where optimizers step in to connect the cost function and the model parameters. Optimizers shape and mold the model by adjusting weights in response to the training feedback, indicating the direction in which the weights should be modified. Weights are first set to random values; then, as the algorithm proceeds, they are adjusted to make the cost function as low as possible, based on the gradient.
1.6.6.1 Gradient Descent

The Gradient Descent optimizer is the most common. It is a fast, robust, and flexible optimization approach that is frequently utilized. It begins by determining the gradient of the cost function [38]; the gradient is simply the partial derivative of the loss function with respect to the weights, and it dictates the direction of descent over the valley. The objective is to reach the bottom of the valley, the global minimum, while avoiding getting stuck in local minima; this can be achieved by adjusting the learning rate appropriately. A minimal sketch follows.
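A minimal sketch of gradient descent on a toy one-dimensional cost; the cost function, starting point, and learning rate are illustrative assumptions:

# Minimize a toy cost J(w) = (w - 3)^2 by gradient descent.
grad = lambda w: 2.0 * (w - 3.0)   # dJ/dw, the local slope of the valley

w, lr = 10.0, 0.1                  # arbitrary start and learning rate
for _ in range(100):
    w -= lr * grad(w)              # step against the gradient
print(round(w, 4))                 # approaches the minimum at w = 3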
1.6.6.2 Stochastic Gradient Descent (SGD)
Rather than computing the gradient across all training instances, SGD computes it on randomly selected training samples or small batches of training data [39].
1.6.6.3 Adagrad
Adagrad customizes the learning rate for each feature [40], which means that different weights in the model effectively have different learning rates. It is designed for sparse datasets with many missing input samples. The adaptive learning rate in Adagrad tends to decrease significantly over time, as the sketch below illustrates.
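A minimal sketch of the Adagrad update on the same toy cost as the gradient descent sketch above (the hyperparameters are illustrative):

import numpy as np

grad = lambda w: 2.0 * (w - 3.0)         # dJ/dw for J(w) = (w - 3)^2

w, lr, eps, G = 10.0, 1.0, 1e-8, 0.0
for _ in range(500):
    g = grad(w)
    G += g ** 2                          # accumulated squared gradient
    w -= lr * g / (np.sqrt(G) + eps)     # per-parameter adaptive step
print(round(w, 4))                       # approaches w = 3, ever more slowly

Because G only grows, the effective step size keeps shrinking, which is the decay behaviour noted above.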
section. Due to the dynamic nature of the channel and the number of channels that must be estimated, the estimation process in massive MIMO is complex, and numerous techniques have been proposed in the literature. PLS represents a paradigm shift in how security is added to massive MIMO, and various approaches have been proposed to achieve secrecy using physical layer signal processing.
MMSE, LMMSE, CS, MUSIC, and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT). For massive MIMO 3D channels, many techniques have been proposed in the literature. In [50], two MUSIC stages are applied for AoA estimation, one for the azimuth and one for the elevation angle. The primary disadvantage of that idea is its increased system complexity and processing time.
3. The proposed DL techniques rely on the time domain features of the signal
Numerous techniques used to fight intruder attacks fall under the PLS category, and many solutions have been offered [56] - [72]. The primary categories are multi-antenna approaches, coding techniques, key generation techniques, and the Artificial Noise (AN) scheme. The AN scheme sends well-structured artificial noise in every direction except that of the Legitimate User (LU); the AN is confined to the null space of the LU's channel. Beamforming-based PLS employs two approaches: in the first, the gain toward the LU is increased without considering the eavesdropper's channel; in the second, the LU gain is increased while taking the Adversarial User (AU) channel into account, for instance through Zero-Forcing (ZF) methods that transmit the signal in a direction orthogonal to the eavesdropper's channel. Various approaches can achieve secrecy in PLS for massive MIMO: artificial noise injection to weaken the signal at the eavesdropper, signal constellation modification [73], secure beamforming to block an eavesdropper's reception, the CS approach recently employed with PLS [74], and physical layer encryption [75], one of the most often used PLS approaches. A minimal sketch of AN null-space injection follows.
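A minimal sketch of the AN null-space idea: random noise is projected orthogonal to the LU's channel, so it cancels at the LU but persists in other directions. The channel model, dimensions, and values are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
N_T = 64   # BS antennas

def cn(size):   # CN(0, 1) samples
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

h_lu = cn(N_T)    # legitimate user's channel, known at the BS
h_eve = cn(N_T)   # eavesdropper's channel, unknown to the BS

# Project random noise onto the orthogonal complement of the LU channel.
z = cn(N_T)
an = z - h_lu * (np.vdot(h_lu, z) / np.vdot(h_lu, h_lu))

print("AN power at LU :", np.abs(np.vdot(h_lu, an)) ** 2)    # ~ 0
print("AN power at Eve:", np.abs(np.vdot(h_eve, an)) ** 2)   # non-negligible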
Another approach to PLS is to generate physical layer keys: the legitimate users agree on a randomly shared channel feature from which an encryption key is created, as sketched below. Numerous efforts have been made in the literature to establish key generation methods for PLS. Liangjun Hu et al. [88] proposed a PLS scheme based on frequency hopping that performs better in low-unpredictability channels. Ozan Alp Topal et al. [89] performed a theoretical analysis and practical verification of the error rates of physical layer key generation. Miro Bottarelli et al. [90] developed a method for minimizing the number of numerical computations required to produce physical layer keys, utilizing Principal Component Analysis (PCA) for dimensionality reduction. Zijie Ji et al. [91] examined the effect of the ERASE environment-reconstruction attack on the creation of physical layer secret keys. Ahmed Badawy et al. [92] discuss the secret key creation capacity of PLS. Ankit Soni et al. [93] propose a key generation technique that applies a moving-window average to the RSSI of pilot signals for IoT.
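A minimal sketch of the first step of such key generation, median-based 1-bit quantization of a reciprocal channel feature; the feature model and noise level are illustrative assumptions, and the residual bit mismatch is what information reconciliation must remove:

import numpy as np

rng = np.random.default_rng(3)

# A reciprocal channel feature (e.g., per-subcarrier gain) observed at
# both ends with independent estimation noise.
feature = rng.standard_normal(128)
alice = feature + 0.05 * rng.standard_normal(128)   # Alice's estimate
bob = feature + 0.05 * rng.standard_normal(128)     # Bob's estimate

def quantize(obs):
    """1-bit quantization against the median: 1 above, 0 at or below."""
    return (obs > np.median(obs)).astype(int)

key_a, key_b = quantize(alice), quantize(bob)
print("first key bits   :", key_a[:16])
print("bit mismatch rate:", np.mean(key_a != key_b))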
Reem Melki et al. [94] analyzed features of PLS key generation for OFDM networks. The upper bound of the physical layer key rate on an AWGN multipath-fading asymmetric channel is derived by Abdul Sahib et al. [95]. Combining GenSM with massive MIMO is a novel approach [96], [97]; GenSM is a relatively new technology that provides dramatic gains in spectral utilization, and the PLS design of GenSM-assisted massive MIMO is a relatively new field of research. We offer a PLS scheme with key generation for a massive MIMO system with spatial modulation based on the CATMCPR [98], in which a small number of mappings, named Antenna Combinations and Mapping Modes (ACMM), are saved in memory as dictionary entries; one of the ACMM entries is chosen using the generated key. Analysis of existing key generation schemes exposed a major disadvantage of the works in the literature: the generated physical layer keys remain the same for a considerably long time due to the static nature of the channel in specific scenarios. This shortcoming may open a security hole that helps Eve break the key quickly. This work presents a DL-based key generation technique for GenSM-assisted massive MIMO. The privacy amplification and information reconciliation phases of chaotic key generation can be reinforced further using DL techniques. The central area of study in this chapter thus revolves around PLS based on DL algorithms.
The literature also extensively investigates PLS based on Discriminatory Channel Estimation (DCE); the reported methodologies fall into two types. The first is DCE using Feed-Back and Retraining (FBR), while the second is DCE utilizing Two-Way Training (TWT). In the FBR technique, the BS employs an initial low-power pilot sequence for the first estimate at the LU, and AN-injected pilot sequences are then broadcast depending on the CSI fed back from the original estimate. In the TWT-based DCE, the LU first transmits the pilot sequences, which immediately excite the BS channel taps; the BS then injects the AN into the reverse-path pilot sequence by leveraging the channel's reciprocity. If the channel is not reciprocal, only a few round-trip training rounds are required to achieve an acceptable degree of discrimination.
domain NOMA, which is based on deep learning. [48] proposes DL-based super-
resolution AoA and channel estimation for massive MIMO networks.
Due to the limited resolution and angular coupling of the derived estimate, its accuracy is reduced. With their intrinsic adaptability, data-driven strategies can increase channel estimation accuracy by reducing angular coupling. The DL technique outperforms the conventional approach to estimation by addressing optimization via data-driven signal processing. PLS, NOMA, synchronization, spatial modulation, channel estimation, signal modulation detection, and beamforming are only a few of the topics addressed with DL. [114] approaches the beamforming problem via a DL algorithm, an adaptive system, and reports considerable gain enhancement. [115] proposes a novel method for tracking and estimating vehicular millimeter-wave (mm-Wave) channels, offering a cost-effective blind channel estimation method based on DL for Uncoded Space-Time Labelling Diversity (USTLD) systems. [116] provides an AE-algorithm-based solution for IEEE 802.11p Vehicle-to-Everything (V2X) networks by enhancing the Data Pilot Assisted (DPA) estimation technique. [117] proposes a method for estimating the AoA of a Uniform Circular Array (UCA) BS-based hybrid massive MIMO model using a compact DL algorithm: candidate angles are created by a feed-forward DNN and then given to a selection network that chooses the right angles. A pilot-aided channel estimator for OFDM based on the LSTM DL algorithm is proposed in [118]; the authors quantify the proposal's performance against MMSE and Least Squares (LS) estimators. [119] proposes a DL-based high-resolution channel estimator for a 2D massive MIMO channel using a learned DNN model, but does not explore the combined estimation of 3D angles using the DNN.
DL approaches are also widely used in the PLS literature. R Liao [120] proposed a DL-enabled physical layer authentication technique for the Industrial Internet of Things (IIoT), in which authentication is performed using three models: DNN, CNN, and Convolutional Pre-processing Neural Network (CPNN). An encryption technique based on DL-derived CSI feature vectors is proposed by A Y Abyanh et al. [121]; in the encryption process, the CNN architecture extracts feature sets and works as a classifier. G Baldini et al. [122] proposed a hybrid architecture that combines the benefits of Recurrence Plots (RP) with CNNs. Ferdowsi et al. [123] introduced an LSTM deep learning model for PLS to defend against man-in-the-middle and data injection attacks, and described an RNN-based PLS that exploits the channel's CSI to counter spoofing attacks. Hao Gu et al. [124] introduced a complex-valued CNN for lawful drone identification using RF-fingerprinting-based PLS, obtaining a classification accuracy of 99.5 percent on a specific dataset. D Deng et al. [125] established a CNN-based PLS for MIMO heterogeneous networks with imperfect CSI.
Massive MIMO channel conditions vary rapidly. As the number of users increases, channel estimation requires high resolution and adaptability to changing conditions. The broad spectral width of high-rate links and the large number of antennas considerably increase the size of the channel estimate, posing severe challenges to conventional massive MIMO channel estimation techniques. Conventional channel estimation also suffers from high signal acquisition costs and severe training overhead due to the many antennas in massive MIMO systems, so conventional signal processing techniques are less suitable for this scenario. DL techniques are a strong candidate owing to their highly adaptive nature through active real-time learning: unlike traditional algorithms, where explicit steps must be defined before deployment, AI techniques learn the steps by analyzing the data. This peculiarity makes them apt candidates for massive MIMO channel estimation.
The estimated channel CSI can also be used for security purposes. PLS is a new paradigm for wireless network security based on the physical layer characteristics of the channel. With the development of quantum computing technologies, standard security methods in wireless networks have been dramatically weakened, meaning wireless communication is not entirely secure. PLS takes a novel approach by exploiting the randomness of the wireless channel. The primary goals of PLS are to identify alternative methods that supplement existing cryptographic security measures in wireless networks, and to leverage the network's physical layer to improve security rather than only enhancing methods implemented at the upper layers of the protocol stack. The desired properties of such a PLS scheme are:
1. It should be sensitive to even minor changes in the channel, so that small variations in a comparatively static channel can still produce temporally decorrelated PLS keys.
2. It should be adaptive to any scattering environment.
3. It should not depend too heavily on Gaussian symmetry in the fading distribution of the channel.
This work primarily considers the reliability and physical layer security improvement of massive MIMO networks using DL architectures. Two-Dimensional (2D) and Three-Dimensional (3D) channel estimation is carried out using DL algorithms, and the proposed techniques are compared with conventional channel estimation techniques. The estimated data is used for beamforming, and the bit error rate of the received signal is chosen for performance analysis. The PLS of the massive MIMO network is also implemented using DL algorithms: the Auto-Encoder (AE) model, the Chaotic Deep Neural Network (CDNN), and the Generative Adversarial Network (GAN) are chosen. Two metrics quantify the performance: the network's secrecy rate and the probability of secrecy outage. The specific objectives of the research include:
• To compare the performance of the Time Domain (TD), and Frequency Domain
(FD) datasets for DL enabled channel estimation.
• To compare the BER of the DL enabled massive MIMO communication with
the conventional techniques.
• To evaluate the secrecy metrics of the data-driven PLS techniques with the
conventional PLS techniques.
The thesis is arranged so that each chapter discusses the major concepts and contributions made throughout the research. Chapter 1 provides a complete introduction to massive MIMO, its channel and AoA estimation, PLS, and Deep Learning. This chapter also provides an intensive literature survey on the research topics, along with the challenges and the gap in the knowledge map; from these challenges, the chapter derives the motivation and objectives of the research, and the research aims and the arrangement of the work are described. Chapter 2 provides a strategy for 2D channel estimation based on DNN models and a PLS scheme based on auto-encoder models, with an in-depth study of the performance of both estimation and PLS. Chapter 3 details the design and implementation of a 3D massive MIMO channel estimation algorithm based on KNN and DNN; the performance analysis includes data preprocessing and feature engineering. Chapter 4 introduces a novel framework for PLS based on CDNN for massive MIMO using Generalized Spatial Modulation (GenSM) and details the mechanism for generating keys from channel estimations.
with the comparison of performance is also included in this chapter. Chapter 6 summarizes the investigation's key findings; a synopsis of the scientific contributions and future directions is also provided in this final chapter.