Machine Learning With Computer Networks: Techniques, Datasets, and Models
ABSTRACT Machine learning has found many applications in network contexts, including solving optimisation problems and managing network operations. Conversely, networks are essential for facilitating machine learning training and inference, whether performed centrally or in a distributed fashion. To conduct rigorous research in this area, researchers must have a comprehensive understanding of fundamental techniques and specific frameworks, and they need access to relevant datasets, which can serve as benchmarks or as springboards for further investigation. This article summarizes these techniques, frameworks, and datasets, serving as a primer and hopefully providing an efficient start for anybody doing research on machine learning for networks or on using networks for machine learning.
I. INTRODUCTION
Artificial Intelligence (AI) refers to machines or systems that can perform tasks typically requiring human intelligence, such as learning, reasoning, problem-solving, perception, language understanding, and decision-making. ML is a subfield of AI that concentrates on developing algorithms and statistical models. These models enable computers to perform tasks without explicit programming. In other words, it involves using statistical techniques to enable machines to learn from data and improve their performance over time. ML models repeatedly show their potential for delivering high-quality output (e.g. classifications/decisions, regression values, and generated artifacts) in highly complex environments with non-trivial decision boundaries. Generally, for that sort of environment, the proposed ML model greatly reduces the compute resources needed to generate an adequate response and/or generates outputs that are much "better" than what existing models could deliver. That being said, for more complex problem domains, most ML approaches require substantial amounts of compute resources and training data. Since most ML models aim at generalizing from specific records of data, the quality of these data samples is essential to the overall model performance. This often means that large amounts of data records are required to depict a sufficiently representative portion of the problem's data domain. Also, more sophisticated models can quickly explode in terms of parameter/compute operation count and thus often require specialized training hardware (i.e. memory and compute). Nevertheless, the continuous improvement of hardware as well as the increased attention towards training data acquisition, preparation, and generation has paved the way for ML to enter more and more application domains.

Computer networking is a highly complex problem domain with a plethora of tasks and problems that, to this day, are solved predominantly through hand-crafted, algorithmic, or heuristic methods. These methods have to respect a wide range of topologies, network types and scopes, configurations, hardware and protocol stacks, traffic patterns, and other sources of variation. Furthermore, there are many different ways to assess network performance, and in many cases, minimum performance guarantees and security policies add special constraints to the optimization problem. Additionally, contemporary networks use specialized hardware to deliver optimized performance, e.g. for forwarding packets at line speed. Oftentimes, this hardware does not easily allow ML models to replace existing functionality, e.g. because certain types of computations are not supported or because the storage is not available for more complex ML models. Finally, while network administrators and networking researchers do monitor their networks in action, the amount of useful ML training data in networking – data that is not noisy or incomplete, is publicly available, and is diverse enough to cover large parts of the problem's underlying data domain – is only a fraction of what other problem domains have at their disposal. As a consequence, optimizing network performance has so far been largely beyond the reach of ML research. However, given the increased visibility of ML, researchers are beginning to take on the aforementioned challenges of the networking community with ML, and combining ML and networking in research seems more attractive than ever. Furthermore, computer network infrastructures have recently been used to improve the performance of existing ML approaches, e.g. by distributing the training process or the data collection to improve resource utilization or training speed.

ML is a very active and rapidly expanding research field that includes an abundance of learning techniques, model types, tools and frameworks, practices, and application possibilities. Although we focus here on ML models, some applications require considering the whole running system, i.e., the AI system, to properly evaluate and understand the output, instead of focusing solely on the ML models [4]. This paper is intended as a primer/practical guide for researchers who are keen on quickly applying ML to problems in computer networking and/or leveraging networking techniques to improve the performance of their ML systems but feel overwhelmed by the possibilities the intersection of ML and computer networking provides. The key points of the paper are the following:
• It first introduces the most relevant concepts and model architectures of ML and then puts them into the context of the different networking problem domains and the latest advancements therein,
• It exposes the currently open problems within computer networking and introduces a selection of different tools, data sets, and approaches that have been popular among the research community and might serve as a starting point for future work,
• It covers several techniques for utilizing networks to improve ML efficiency, such as reducing resource requirements via Split Learning (SL) and distributed training via Federated Learning (FL) or incorporating the right inductive biases into ML models to improve their ability to generalize from limited data,
• It discusses challenges related to networks for ML, such as resource constraints, security concerns, and the lack of understanding of how ML models make decisions (and how techniques such as Explainable Artificial Intelligence (XAI) may help in gaining understanding),
• It comprehensively provides pointers for further study on related surveys and research.

The organization of the paper is visualized in Figure 1, and the remainder is organized as follows: Section II explains the basic concepts and categories of ML and relates common networking problems to them. Section III introduces the ML subfield of deep learning, which has been responsible for most of the recent ML breakthroughs, elaborating on the most common model architectures and how and why they are suited to specific tasks within computer networking. Thereafter, Section IV sheds light on the variety of accessible data sets, tools, and frameworks that ease the development and training of ML-powered networking systems. Section V discusses explainability in Artificial Intelligence (XAI), which is rightfully gaining traction because many recently tapped application domains (including computer networks) come with amounts of complexity and risk that disqualify fully black-box ML models for widespread adoption. Section VI broadens the scope presented up until now and introduces ML techniques and paradigms such as distributed and parallel learning. These techniques leverage existing networking concepts and technology and seem useful, if not mandatory, for many problems in the networking domain. Section VII and Section VIII give an overview of related survey papers and open challenges in the concerned areas, and finally, Section IX concludes this paper by summarizing the presented content and providing perspectives on the open challenges and questions of ML in networking and vice versa.

II. MACHINE LEARNING
ML is a subfield of AI [5]. ML models are statistically and computationally derived from evidence in the form of historical data or experience instead of explicitly programming a machine for a task. The three traditional ML paradigms are supervised, unsupervised, and Reinforcement Learning (RL). Methods can be categorized into these paradigms by the type of feedback the learning system receives. In supervised learning, exact feedback is available in the form of data labels. In unsupervised learning, on the other hand, data is only partially labeled or completely unlabeled. Finally, in RL, implicit feedback is available for observed data in terms of a so-called reward function that labels data by a numerical value. We will now discuss the three main ML paradigms with a focus on the most popular ones. We then briefly touch on some additional branches of ML that are relevant to computer networking.
A. SUPERVISED LEARNING
The starting point of every supervised learning problem is a data set that consists of input-output data points D = (x1, y1), (x2, y2), . . . , (xN, yN). The goal is to learn a function h mapping from the input domain to the target domain such that ŷi = h(xi) for all data points. Both input and output domains can take various shapes, such as boolean or scalar values, Euclidean vectors, or more complex representations such as graphs. Depending on the type of output domain, supervised learning is generally divided into classification and regression problems. Examples of popular network applications that use supervised learning are traffic prediction [6] and classifying security attacks [7].

1) CLASSIFICATION
In classification problems, the output domain is finite, e.g. true/false, sunny/cloudy/rainy, or the set of digits 0-9. Examples from the networking domain include anomaly detection [8] ("Given the current network monitoring data, is the network showing abnormal behavior?") and failure prediction [9] ("Is this network node going to fail?"). The most fundamental models for classification are explained in the following paragraphs.
• Support Vector Machines (SVMs) [10] aim at constructing a so-called maximum margin separator - a decision boundary that divides samples of two different classes with the maximum possible distance to the boundary. This situation is depicted in Figure 2a. The solid black line represents the maximum margin separator and the two dashed lines visualize the margins to both classes. The data points nearest to the separator are called the support vectors (red circles), as they support the position of the decision boundary. Generally, the larger the margin, the better the generalization of the model, as it reduces the risk of misclassifying new, unseen data. Since the decision boundary is a separating hyperplane, the classification task fails for data that is not linearly separable. However, SVMs can also be used for non-linearly separable data by applying the kernel trick. It transforms the data into a higher-dimensional space where it becomes linearly separable and a separating hyperplane is calculated. When that linear hyperplane is transformed back into the original space, it becomes a non-linear or even incoherent hypersurface.
• Decision Trees [11] are structured like an inverted tree, with a root node at the top, branching out into internal nodes, and ending in leaf nodes at the bottom. The data is split at the root node and the internal nodes based on a threshold value for a feature. The splitting process continues until a stopping criterion is met, such as reaching a maximum depth or a minimum number of samples in a leaf node. Leaf nodes represent the final predictions of the decision tree; the majority class in each leaf node is used as the prediction. Figure 2b visualizes a simple example decision tree (right side) with a root node and two leaf nodes for the same data set as in the SVM example (left side). Data points where Feature2 ≤ −0.103 are assigned Class 1, all others are assigned Class 2. This decision boundary is visualized as the color step from green to blue in the left plot. Decision Trees are a simple yet powerful tool to reach conclusions from input data with a high degree of human explainability (see Section V).
• Random Forests [12] create a collection of decision trees, each trained on a different subset of the training data. To achieve as little inter-tree correlation as possible, a random subset of features is considered for each split at each node when constructing the decision trees. The results of all individual decision trees are aggregated to a final prediction: the class with the majority vote among the trees is chosen. Figure 2c visualizes the decision boundary of a random forest with three individual trees for the same data set used for the SVM and decision tree examples. Compared to single decision trees, random forests are known to improve the prediction accuracy as well as to reduce overfitting [12].
• The k-Nearest Neighbors (KNN) [13] algorithm is a simple technique to assign class labels to new data points by examining the class labels of their k nearest neighbors with known labels. Given the features of the input data point, these k nearest neighbors are determined by calculating a distance metric in the input space, e.g., the Euclidean distance, Manhattan distance, or Minkowski distance. The class label of the majority of these neighbors is then inherited for the new data point. Figure 2d visualizes the decision boundary for the known data set using the Minkowski distance and k = 5.

FIGURE 2. Visualization of different classification methods based on a data set containing 30 samples and two features: 15 samples of each of two classes that are perfectly linearly separable, with added Gaussian noise (mean 0, standard deviation 0.8) that makes classification errors likely.
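To make these methods concrete, the following minimal sketch trains all four classifiers on a synthetic two-feature data set loosely resembling the one in Figure 2. It assumes the scikit-learn library is installed; the data set and all parameter choices are illustrative, not prescriptive:

```python
# Sketch: the four classifiers above on a synthetic two-class data set
# with two features (illustrative stand-in for the data in Figure 2).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=30, n_features=2, n_informative=2,
                           n_redundant=0, class_sep=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVM (RBF kernel)": SVC(kernel="rbf"),  # kernel trick for non-linear data
    "Decision Tree": DecisionTreeClassifier(max_depth=3),
    "Random Forest": RandomForestClassifier(n_estimators=3),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5, metric="minkowski"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```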
2) REGRESSION
In regression problems, the output domain is continuous, e.g., Rn (n ≥ 1). Examples from the networking domain include network performance prediction [14] ("How will the network perform in the future, given certain network conditions and traffic?") and traffic prediction [15] ("How much / which type of traffic will be generated in the near future?"). In principle, any function fw with learnable parameters w can serve as a regression model. However, the structure of fw and the optimization procedure used to update the learnable parameters are crucial to finding good function parameters efficiently. The most fundamental regression methods will be explained in the following paragraphs. All of the aforementioned classification methods can also be used for regression with slight modifications.
• Support Vector Regression (SVR) [16] is an extension of SVMs for regression tasks. It aims to find a function f that approximates the relationship between input features and continuous target values with a certain degree of error tolerance. The error tolerance (ϵ) defines an ϵ-tube around f. Inside this tube, errors from the regression model are not penalized. The algorithm maximizes the number of training data points inside this tube. ϵ is a parameter defined by the user. Similar to SVMs, the kernel trick can be applied to create non-linear SVRs.
• Decision Trees [11] can be used for regression tasks by using the average value of the samples in each leaf node as the prediction value.
• Random Forests [12] for regression use the average of the individual trees' predictions as the final prediction value.
• The KNN [13] method for regression calculates the label for the new data point by calculating the average target value of its k nearest neighbors.
• The most popular regression method is least-squares fitting, in which the model is updated to minimize the squared L2 norms of the difference between the predicted values and their associated labels. This is known as the Mean Squared Error (MSE). In linear regression, the fitted function is a linear function, while in logarithmic regression, it is a logarithmic function. In other words, the least-squares method fits a line to the data points in a way that minimizes the sum of the squared vertical distances between the line and the points.
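The least-squares idea can be written down in a few lines. The following sketch fits a line in closed form via the normal equations; the noisy "load vs. throughput" relationship is a purely illustrative assumption, not taken from any of the cited datasets:

```python
# Sketch: least-squares linear regression, minimizing the MSE described
# above, using numpy's built-in least-squares solver.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)            # e.g., offered load (illustrative)
y = 2.5 * x + 1.0 + rng.normal(0, 1, 50)   # noisy linear relationship

X = np.column_stack([np.ones_like(x), x])  # design matrix with bias column
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||Xw - y||^2
mse = np.mean((X @ w - y) ** 2)
print(f"intercept={w[0]:.2f}, slope={w[1]:.2f}, MSE={mse:.3f}")
```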
B. UNSUPERVISED LEARNING
As opposed to supervised learning, in unsupervised learning, the data comes without output/target values. Consequently, ML models are tasked with finding the underlying regularities in the data domain by inferring them from the given training data. The two main types of unsupervised learning, namely clustering and dimensionality reduction, differ in their use case. Unsupervised learning has been used for tasks such as anomaly detection, intrusion detection [17] and data traffic analyses [18].

1) CLUSTERING
Clustering approaches use the data points' feature values to find regularities in the data domain and thus divide them into multiple semantically meaningful categories. Clustering approaches such as k-means or Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [19] differ in the way cluster affiliation is calculated, for example, through data density or neighbor connectivity via a measurable distance between the data points. Within the networking domain, data grouping can serve as a useful starting point for further analysis and action in a variety of problem settings, such as anomaly detection and resolution [20], task classification for scheduling [21], or traffic characterization for traffic engineering [22].

In general, there are different metrics to evaluate the performance of ML algorithms. Table 3 shows the most common metrics appearing in the literature for supervised learning (with an emphasis on classification metrics that are typically used for evaluating traffic prediction) and unsupervised learning (with an emphasis on clustering metrics as seen in intrusion detection as well as node(s) selection for data collection).
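The following sketch groups hypothetical flow records by feature similarity with the two clustering methods just mentioned. The feature choice and all parameters are illustrative assumptions, e.g. for traffic characterization as a first analysis step:

```python
# Sketch: k-means and DBSCAN (scikit-learn) on synthetic flow features.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical flow features: [mean packet size, flow duration]
flows = np.vstack([rng.normal([200, 0.5], [20, 0.1], (50, 2)),
                   rng.normal([1400, 30.0], [50, 5.0], (50, 2))])
X = StandardScaler().fit_transform(flows)   # normalize feature scales

kmeans_labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)  # -1 = noise
print(np.bincount(kmeans_labels), set(dbscan_labels))
```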
2) DIMENSIONALITY REDUCTION
This type of learning analyzes the statistical properties of the data in order to reduce the number of dimensions that sufficiently describe the data. This is particularly useful when dealing with more complex learning problems, as theoretical results show that the amount of data points needed to learn an accurate model scales exponentially with the dimensionality of the input data domain [23] (this phenomenon has been coined the "curse of dimensionality"). While approaches like Decision Trees or Random Forests can reduce the dimensionality of the relevant portions of the data by considering only the most meaningful features, approaches like Principal Component Analysis (PCA) [24] find a reduced-cardinality combination of new features. Like clustering, this type of unsupervised learning is beneficial as a preparative step before further analysis or model training, especially since, in many real-world scenarios, it has been observed that the given data lies on manifolds of much lower dimensionality than the actual input space (the presumed general rule for this is called the manifold hypothesis [25]).

Table 1 summarizes supervised methods, while Table 2 summarizes unsupervised methods. Further details can be found in [26] and [27]. Regardless of which method is used, it is important to watch out for over- and underfitting. Overfitting is a condition where a statistical model begins to describe the random error in the data rather than the relationships between variables. This problem occurs when the model is too complex. Underfitting, on the other hand, is the inverse of overfitting. It means that the statistical or ML model is too simplistic to accurately capture the patterns in the data. A sign of underfitting is high bias and low variance in the current model.

C. FURTHER ML METHODS
There are various other branches of ML that are of use in computer networks, see [28] and [29]. Here, we discuss two additional ML frameworks that are presumably relevant in the networking domain.

1) PROBABILISTIC ML
Oftentimes, neither all relevant information is known or attainable prior to making a decision, nor is the environment that reacts to the taken decision purely deterministic [5]. Uncertainty may exist in the input data, in the decision model parameters and output values, and even in the architecture of the decision model itself [30]. In all of these cases, probability theory provides a unified framework to cope by using probability distributions to model uncertain quantities. This framework is, in principle, applicable to all ML learning paradigms, model architectures, and problem domains that come with some notion of uncertainty. Since its comprehensive introduction would exceed this paper's scope, we point the interested reader to [30] for a high-level overview, and [31] for an extended overview of the core concepts of probabilistic ML.

2) HYBRID LEARNING APPROACHES
Many ML contributions do not fully fall into one of the aforementioned learning paradigms but rather combine their ideas and create new sources for learning signals. Some of these "hybrid" learning approaches are popular enough to earn their own description. In semi-supervised learning, typically, only parts of the training data are labeled [27]. To train a model in a supervised or unsupervised manner, auxiliary information is extracted by respectively using the other learning type. Self-supervised learning, on the other hand, tackles shortcomings of supervised learning approaches (i.e., the need for large amounts of data and vulnerability to adversarial inputs) by using parts or representations of the input data as labels [32]. For example, in [33], a model is trained to predict future video frames by only feeding it the first few frames of a video and using the remaining frames as "comparison" labels.

D. REINFORCEMENT LEARNING
In the spectrum of traditional learning paradigms for intelligent agents, Reinforcement Learning (RL) is located between the two extreme domains of fully supervised and unsupervised learning. RL is particularly suitable for decision, control, and optimization problems where data and observations are received sequentially [34]. As such, RL can be applied to various challenging problems in network science [35], [36], [37]. Especially Deep RL (DRL) methods, to be discussed in Section III-C, have seen tremendous success in solving resource allocation problems in computer networking [38].

The implementation of RL is based on an RL agent that receives performance feedback called rewards as the agent interacts with an environment over time [39]. The algorithm designer typically crafts the reward as a function of the agent's sequential observations. The rewards, however, do not provide exact instructive feedback on how to change the agent's behavior, hence RL's place in the spectrum of learning paradigms. In this section, we will describe the basics of RL and the most fundamental algorithms. Throughout, we will directly refer to applications in computer networking for almost all mentioned algorithms.

The interaction of an RL agent with its environment is described by a Markov Decision Process (MDP) as illustrated in Figure 3. Whenever one seeks to solve a problem using RL, the first step (arguably the most important) is to define the problem as an MDP. Based on this MDP, one then chooses or designs a suitable RL algorithm to find a solution to the MDP. In general, an MDP can be considered as a system that can assume states s from a state space S. The MDP transitions to a new state s′ according to a controlled transition probability p(s′ | s, a) that depends on the action a chosen by the agent.
In discounted MDPs, more weight is given to rewards in the near future, and the weights of future rewards decay geometrically. For background on average and total cost MDPs, see [40, Chapter 4 & 5].

Given a policy π, the associated action-value function (also called Q-function) is defined as

Q^π(s, a) := E_π[R | s1 = s, a1 = a].    (1)

The Q-function is a fundamental object in RL and describes what accumulated reward R one can expect if we are in state s, take action a, and follow the policy π for all future states. Furthermore, for finite action spaces, the Q-function can directly be used to implement a policy by setting π(s) = argmax_{a∈A} Q(s, a). This makes RL algorithms that seek to find or approximate the optimal Q-function³ attractive since they immediately lead to simple, implementable policies.

³ The optimal Q-function is given by the solution to Bellman's equation [41, Section 5.6].

1) BASIC RL ALGORITHMS
RL algorithms can be roughly divided into three groups: value-based methods, policy-based methods, and actor-critic methods. Value-based methods seek to find or approximate value functions like the Q-function. Policy-based methods instead seek to optimize a policy π directly. Value-based methods, therefore, yield an implicit policy, whereas policy-based methods yield an explicit policy. Actor-critic methods combine learning in value- and policy-space and use a learned value function to "guide" the training of an explicit policy. See [41, p. 36] for an illustration of the actor-critic feedback loop.

Before stating some examples of popular RL algorithms, we have to distinguish some typical MDP and RL settings:
1) Continuous state (e.g. S = R^n) vs. finite state MDPs (e.g. S = {1, . . . , d}).
2) Continuous action vs. finite action MDPs.
3) Model-based vs. model-free RL problems.
Model-based RL usually focuses on offline planning of value functions and policies, where either the transition function p is given or where p will be approximated [42]. Model-free RL methods instead seek to determine what action to take in a given state without knowledge of p, e.g., solely by observing MDP transitions (s, a, r, s′).

The traditional algorithms for model-based RL in finite state and action MDPs are value and policy iteration [40, Chapter 2]. Some recent applications of modern value iteration algorithms in the context of networking are age-of-information minimization in wireless broadcast networks [43] and multi-agent routing [44]. The most well-known model-free RL algorithm is tabular (simulation-based) Q-Learning, which seeks to find the optimal Q-function. Under simple conditions, Q-Learning is guaranteed to converge to the optimal Q-function if all states and actions are explored infinitely often [40, Section 6.6.1]. Q-Learning has been successfully applied to various problems in computer networks, e.g., network self-organization [45], network slicing [46] or virtual network embedding [47].
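A minimal sketch of tabular Q-Learning may make this concrete. The environment object env (assumed here to expose a Gymnasium-style reset()/step() interface with integer-indexed states) and all hyperparameters are assumptions of the sketch:

```python
# Sketch: tabular Q-Learning for a finite MDP, e.g. a small routing or
# self-organization problem with integer-indexed states and actions.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=1000,
               alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy trade-off between exploration and exploitation
            if np.random.rand() < eps:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            # update towards the Bellman target r + gamma * max_a' Q[s', a']
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s, done = s_next, terminated or truncated
    return Q  # greedy policy: pi(s) = argmax_a Q[s, a]
```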
2) RL WITH FUNCTION APPROXIMATION
The rise of RL as a powerful tool for decision-making is largely due to the effective use of function approximation. When the state space S becomes large or continuous, the traditional algorithms become impractical. Function approximation solves this problem by enabling RL agents to infer information about unseen state-action pairs from observed state-action pairs. The approximation may be used in policy space, in value space, or in both. For example, a Q-function Q(s, a) can be approximated by a function Qθ(s, a) with parameters θ. This is the basis of deep Q-learning, to be explained in Section III-C. Traditionally, function approximation was an important part of RL even before the rise of deep neural networks [48].

In Section III-C, we will discuss RL with deep neural networks as function approximators. Here, we highlight the traditional class of stochastic policy gradient algorithms with policy function approximation for MDPs with finite action space. Define a stochastic policy πθ(s, a) with parameters θ; πθ(s, a) maps states to a distribution on A.

The stochastic policy gradient theorem [49] has given rise to a large class of algorithms, where Q^{πθ}(s, a) is replaced by a suitable estimator. E.g., the REINFORCE algorithm [50] uses a Monte-Carlo estimator; actor-critic algorithms add approximation in value space with an additional function approximator Qw(s, a) with parameters w in place of Q^{πθ}(s, a) [51]; and the famous Advantage-Actor-Critic (A2C) algorithm [52] uses an approximation Aw(s, a) of the advantage function A^{πθ}(s, a) := Q^{πθ}(s, a) − V^{πθ}(s), where V^{πθ}(s) := E_{a∼πθ}[Q^{πθ}(s, a)]. These algorithms have been used for various scheduling and resource allocation tasks in data centers [53], wireless networks [54], edge computing [55] or vehicular networks [56].
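As an illustration, a sketch of one REINFORCE update with a softmax policy follows. The policy network, the optimizer, and the collection of episode data (states, actions, and Monte-Carlo returns) are assumed to exist; this is one simple variant, not the only formulation:

```python
# Sketch: one REINFORCE policy gradient step for a finite action space,
# with a softmax policy pi_theta implemented in PyTorch.
import torch

def reinforce_update(policy_net, optimizer, states, actions, returns):
    """Gradient step on -E[log pi_theta(a|s) * R] (Monte-Carlo estimator)."""
    logits = policy_net(states)                        # shape (T, |A|)
    log_probs = torch.log_softmax(logits, dim=-1)
    taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(taken * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```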
3) EXPLORATION AND CURIOSITY IN RL
RL methods trade off exploration vs. exploitation during training, i.e., agents either explore some random action or exploit their current best guess of the optimal action for the current state. On the other hand, some methods purely focus on exploration as a metric for learning. Such methods may seek to explore as many unseen states during training as possible. Another approach is to explore promising states, e.g., those parts of the state space where the current approximation of a certain value function is particularly bad. Such methods are known as curiosity-driven RL [57], [58].

4) MULTI-AGENT RL
Multi-Agent RL (MARL) problems are formulated as Markov games [59], where depending on the local reward structure, the agents may cooperate or compete. An illustration is given in Figure 4 from the perspective of some agent i in a Markov game environment.
Most notably, the environment typically transitions to a new state as a function of all local actions and all local states. There are several additional properties to classify MARL settings, such as whether the setting is decentralized, or whether or to what extent agents transition independently [60]. We will discuss the two most common deep MARL algorithms in Section III-C. For a survey of MARL algorithms and various applications and challenges in computer networks, see [61] and [62].

FIGURE 4. Markov game feedback loop over discrete time n.

5) RL WITH CONSTRAINTS
Constrained RL (CRL) [63] is a paradigm for constrained MDPs. The goal is to ensure that the agent's actions do not violate any environmental constraints. A set of constraints can be specified as hard (absolute and must always be satisfied) or soft (desired but can be violated if necessary) constraints. Safe RL (SRL) [64], on the other hand, aims to learn policies that minimize the likelihood of unsafe actions while maximizing the long-term expected reward for safe actions. Accordingly, CRL and SRL both focus on ensuring that the agent's actions do not violate certain constraints, but the formulation of constraints takes slightly different approaches. Both approaches are used when resources (such as bandwidth, computation, and energy) are limited [65], [66], [67] or when some applications may impose additional constraints (such as throughput and latency) [68], [69].

III. DEEP LEARNING—THE COOL KID OF ML
Deep Learning is a subfield of ML that aims at facilitating the learning of complex data representations by learning hierarchies of simpler intermediate representations [5], [70]. The resulting "stacking" of model blocks (predominantly Neural Network (NN) layers) is what gives Deep Learning its name. While the term Deep Learning has been around for decades, it only started to gain widespread traction in 2012 with the widely visible success of AlexNet [71], which won a widely popular image classification challenge with a deep Convolutional Neural Network (CNN) (see Section III-B). Since then, the rate of progress concerning deep NN architectures, paradigms and learning techniques has skyrocketed, and the development of specialized hardware such as high-end GPUs or TPUs has led to deep learning models with millions or even billions of parameters [72]. As a consequence, a growing proportion of ML applications is now built on deep learning models.

A. NEURAL NETWORKS
Given its central role in almost any cognitive process, neuroscientists have long tried to understand the inner workings and mechanisms of the brain. In [73], a mathematical model for a neuron was introduced that has since inspired an emerging class of ML model architectures: Artificial NNs [5]. In NNs, a neuron j receives inputs ai from nodes i = 1, . . . , n and a bias input a0 = 1, and first computes a weighted sum using link weights wij: a′j = Σ_{i=0}^{n} wij ai. Then, it computes the output (also called activation) oj = g(a′j) using an activation function g [74]. If multiple such neurons (mostly called perceptrons in the ML community) are connected in a directed and acyclic manner, they form a so-called feed-forward network that is usually arranged in layers. In such layered feed-forward networks, each neuron receives the outputs of the neurons of the previous layer, with the first layer receiving the overall model input and the last layer providing the overall model output. The resulting compound function expressed with the network can be highly complex; in fact, it is shown in [75] that with as little as one intermediate neuron layer and by choosing any squashing activation function (e.g., a non-decreasing function converging towards 0 and 1 on its respective ends), NNs can theoretically approximate any continuous function uniformly on any compact set to an arbitrary degree of accuracy. This statement is also known as the Universal Approximation Theorem (UAT).

The UAT becomes even more interesting once we view the entire NN as a function hw(x) of the input vector x and the NN weights w [5]. The UAT implies that there exists a weight parameter configuration that sufficiently approximates the function which describes the desired solution to a problem. Hence, many learning problems can be viewed as a problem of function approximation to find the right NN weights. The most common technique to update the NN weights is gradient descent in combination with the so-called backpropagation algorithm [76]. For example, to update a NN using an input-output tuple (x, y), backpropagation calculates the derivative ∂/∂w |y − hw(x)|² of the output error with respect to the NN weights. This calculation is done sequentially starting from the output layer by applying the chain rule to the above derivative. The calculated gradients are then used to update the NN weights iteratively. Various algorithms have been proposed throughout the last decade to use these gradients most effectively. The most well-known algorithm is the ADAM optimizer [77], which adaptively selects the stepsize for individual NN weights based on the calculated gradient information. See [78] and the references therein for various other gradient-based methods. Note that most tools (such as PyTorch and TensorFlow) offer these optimizers as black boxes without having to deal with the implementation details.
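As a minimal illustration of this training mechanism, the following sketch performs one backpropagation/ADAM update of a small NN on a single input-output tuple (x, y) using PyTorch; the layer sizes, learning rate, and random data are illustrative assumptions:

```python
# Sketch: one gradient descent step via backpropagation and ADAM,
# mirroring the derivative d/dw |y - h_w(x)|^2 described above.
import torch
import torch.nn as nn

h_w = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(h_w.parameters(), lr=1e-3)  # ADAM optimizer [77]

x, y = torch.randn(1, 8), torch.randn(1, 1)  # one input-output tuple
loss = ((y - h_w(x)) ** 2).mean()            # squared output error
optimizer.zero_grad()
loss.backward()    # backpropagation: chain rule from the output layer back
optimizer.step()   # adaptive, gradient-based weight update
```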
B. DEEP NEURAL NETWORK ARCHITECTURES
The UAT seems to advocate that rather simple feed-forward NN architectures can be used for any problem that might be solvable with ML. In practice, however, the findings of the UAT are greatly humbled by the excessive amounts of training data, the size of the NN models, and the training time necessary to achieve satisfactory results on a complex task. Furthermore, for many tasks, it can be observed that the members of the underlying data domain are semantically composable into simpler entities, spanning a hierarchy of concepts. As a consequence, researchers have started to add more structure to their models.

Different model architectures have proven effective for different tasks. An overview of common deep learning model architectures is given in Table 4. In the following subsections, we briefly present the most popular model archetypes and refer to the provided references for further reading. Interestingly, all of the model archetypes introduced below are derivable from the same basic mathematical framework and only differ in the shape of the data and the assumptions made about regularities in the data [79].

1) MULTILAYER PERCEPTRONS (MLP)
Standard Deep Neural Networks (DNNs) consisting of multiple hidden layers of neurons are also called Multilayer Perceptrons (MLPs). Together with non-linear activation functions, they have long become a standard tool for processing vector-shaped inputs, as their hierarchy of non-linear function approximators is widely applicable across many problem domains [80]. However, given that MLPs make rather few assumptions about the input and output data besides them being shaped as vectors, for many tasks these models often perform unfavorably compared to more specialized models of similar size. A detailed introduction to MLPs is given in [70]. MLPs have already been used in computer and wireless networks, e.g., for channel decoding [81], in resource allocation [82], and in intrusion detection [83].

2) CONVOLUTIONAL NEURAL NETWORK (CNN)
In many problem domains, data exists on a grid-like structure where spatial patterns carry the same semantic information regardless of their location in the grid (also referred to as translation invariance). Examples include images (2D grids) but also time-series data (1D grids). To exploit this symmetry, Convolutional Neural Networks (CNNs) utilize spatial convolution, which applies the same learnable spatial parametric kernels (i.e., small matrices with learnable individual entries) on evenly spaced patches of the input grid [70]. The re-usage of a set of such kernels across multiple image positions is called weight sharing and greatly reduces the number of parameters needed to learn and extract the patterns of the input data.

3) RECURRENT NEURAL NETWORK (RNN)
For dealing with sequential data such as time series, Recurrent Neural Network (RNN) elements such as the Long Short-Term Memory (LSTM) [84] or the Gated Recurrent Unit (GRU) [85] have proven very useful. The commonality between all RNNs is feeding a portion of the output back into the RNN block for subsequent computations, enabling NN architectures with recurrent elements to capture sequential dependencies within the data [70].
4) GRAPH NEURAL NETWORK (GNN)
Recently, Graph Neural Networks (GNNs) have emerged as powerful architectures for handling graph-structured data. They utilize permutation-invariant aggregation/pooling operations and permutation-equivariant message passing operations to learn patterns in the data while respecting the graph topology rather than assuming any specific ordering of its nodes and edges [86].

5) GENERATIVE ADVERSARIAL NETWORK (GAN)
Generative Adversarial Networks (GANs) [87] have emerged as a powerful tool for generating realistic data samples, including images [88], videos [89], and audio [90], but also network traffic [91]. GANs consist of two NNs: a generator that creates synthetic samples, and a discriminator that tries to distinguish between real and fake samples. These two networks are trained simultaneously in an adversarial setting, where the generator tries to fool the discriminator, while the discriminator tries to correctly identify the real samples. In the context of computer networks, GANs are useful for generating synthetic network traffic patterns that mimic real-world traffic. This is useful for testing and evaluating network performance metrics, intrusion detection systems, and network security protocols. See Section IV-B3 for example applications of GANs being used to generate data.

6) TRANSFORMERS
In many ML domains with complex long-range dependencies within data points, the attention mechanism [92] and its implementation in the Transformer architecture [93] have proven to be extremely powerful. Works like [94] and [95] show that Transformers can outperform both CNNs and RNNs in problems with spatial and temporal data, as each token/component of the input can relate to any other component. While we refer to [93] for a detailed explanation of the attention mechanism, it is worth noting that transformers are a special case of GNNs operating on a fully-connected computation graph [96]. This implies that for large inputs, using transformers is computationally intensive.

7) LARGE LANGUAGE MODEL (LLM)
Large Language Models (LLMs) have recently gained significant attention in the field of Natural Language Processing (NLP). These models are trained on vast amounts
of text data and can generate human-like text based on their input. One popular type of LLM is the generative pre-trained transformer (GPT), which uses a transformer-based architecture and is pre-trained on large amounts of text data using a self-supervised learning approach. During pre-training, the model learns to predict the next word in a sentence, which enables it to generate coherent and contextually relevant text. GPTs can be fine-tuned on specific NLP tasks, such as text classification, summarization, and translation, by adding a task-specific output layer and training on a smaller dataset. Unlike traditional NLP models that rely on hand-crafted features, LLMs learn to represent the meaning of words and phrases in a continuous vector space, enabling them to perform a wide range of NLP tasks. In the context of computer networks, LLMs and GPTs have been used, for example, to generate synthetic network traffic [97], to explain decisions in intrusion and anomaly detection systems [98], [99], and for managing networks [100]. For an overview of applications, techniques, and challenges, we refer to [101].

8) GENERATIVE AI (GENAI)
Generative AI (GenAI) is a broader concept that can apply to any type of data [102]. It uses ML models, such as GPTs, GANs, and/or others, to learn the patterns and structure of the given training data, and can then be used to generate realistic and novel outputs that are similar but not identical to the data. Additionally, and closely related, GenAI can be used with retrieval-augmented generation (RAG) [103] to automate collecting information from the network, analyzing it, and pushing new configurations if necessary [104], [105], [106]. This removes the pain of learning new documentation or writing new scripts, and simplifies the user interaction. Recent GenAI models have shown impressive and/or human-like capabilities in an unprecedented range of downstream tasks. As a consequence, several industrial networking companies have started to develop or adjust commercial products that leverage GenAI, e.g. to generate threat intelligence reports [107], security policies, and incident response plans [104], and to proactively identify and fix network issues [105].

C. DEEP REINFORCEMENT LEARNING (DRL)
Deep Reinforcement Learning (DRL) refers to the use of DNNs as function approximators for RL algorithms. The general idea of RL with function approximation has been briefly described in Section II-D2. With the advent of deep learning libraries such as Keras and TensorFlow (see Section IV), as well as standardized APIs such as Gymnasium (formerly known as OpenAI Gym), training of DRL algorithms has become very accessible. However, the success of RL with DNNs⁴ relies on some key techniques. This subsection focuses on the most important DRL algorithms and the tools and techniques to train DRL models.

⁴ Historically, it was known that training RL with a NN could potentially lead to systematic overestimation of utility values (such as the Q-function) and thus to failed learning [108].

First, we will explain two key techniques for DRL based on Deep Q-Learning, also known as Deep Q-Networks (DQN) [109]. DQN is a DRL algorithm for MDPs with finite action space. DQN seeks to approximate the optimal Q-function by a DNN Qθ(s, a) with parameters θ. Specifically, a DQN takes a state s as input and outputs Qθ(s, a) for every action a of the finite number of actions. The key techniques introduced for DQN are an experience replay buffer and a so-called target network. During training, DQN interacts with its environment, generating data tuples (s, a, r, s′). These data tuples are stored in an experience replay buffer. During training, DQN samples a mini-batch from this memory and applies a stochastic gradient descent step on the average squared Bellman error of the samples from the mini-batch. This rather simple technique reduces the bias of Q-Learning towards its recent interaction with the environment and thereby helps to stabilize training. In NN terminology, the right-hand side of the Bellman loss, i.e., r + γ max_{a′} Qθ(s′, a′), is the training target for Qθ(s, a) given the data tuple (s, a, r, s′). In other words, the DQN itself is used to compute its training targets. The idea behind target networks is to use a separate target network Qθ′(s, a) to compute the aforementioned training targets. The target parameters θ′ are then chosen to slowly track the actual training parameters. With this, target networks provide more stable training targets, which has been shown to generally improve DRL training, see [109] and [110]. However, more recent theoretical and numerical studies suggest that gradient clipping is superior to the use of target networks [111].
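The following sketch illustrates both techniques in PyTorch: sampling a mini-batch from the replay buffer and computing Bellman targets with a slowly tracked target network. The networks, optimizer, and buffer contents are assumptions of the sketch, and the Polyak-averaging target update shown here is one common variant:

```python
# Sketch: one DQN training step with experience replay and a target network.
import random
import torch

def dqn_step(q_net, target_net, optimizer, replay_buffer,
             batch_size=32, gamma=0.99, tau=0.005):
    # Replay buffer assumed to be a list of (s, a, r, s_next, done) tuples.
    s, a, r, s_next, done = zip(*random.sample(replay_buffer, batch_size))
    s, s_next = torch.stack(s), torch.stack(s_next)
    a, r, done = torch.tensor(a), torch.tensor(r), torch.tensor(done)

    with torch.no_grad():  # target: r + gamma * max_a' Q_theta'(s', a')
        target = r + gamma * target_net(s_next).max(dim=1).values * (~done)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = torch.nn.functional.mse_loss(q_sa, target)  # squared Bellman error
    optimizer.zero_grad(); loss.backward(); optimizer.step()

    # Target parameters slowly track the training parameters (Polyak averaging)
    for p, p_t in zip(q_net.parameters(), target_net.parameters()):
        p_t.data.mul_(1 - tau).add_(tau * p.data)
```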
DQN is also an integral component of the Deep Deterministic Policy Gradient (DDPG) algorithm [110], which is one of the most well-known actor-critic algorithms for continuous action spaces. In DDPG, a critic is trained using the DQN algorithm, while a deterministic policy is trained to maximize the approximated Q-function. DQN and DDPG, in turn, are the basis for the two common deep MARL algorithms Independent Deep Q-Learning [112] and Multi-Agent DDPG (MADDPG) [113]. However, only DDPG has a truly distributed version that can be run with nearly arbitrary communication delays over a communication network. This is known as the Distributed DDPG (3DPG) algorithm [114].

Another important technique for successful DRL training was proposed as part of the deep actor-critic algorithm Asynchronous-Advantage-Actor-Critic (A3C) [52]. The asynchronous part refers to using several agents in parallel simulated environments to improve and speed up DRL training. In other words, the training progress of several agents on the same problem is combined to enhance the training performance. This is especially important for complex tasks since multiple parallel processors can significantly reduce the overall training time.
The success of DRL has been demonstrated across various sub-areas in computer networks like management of satellite-terrestrial networks [115], multi-objective service coordination [116], scheduling for large-scale networked control systems [117], acoustic sensor networks [118], adaptability of wireless sensor networks [119] and other applications in communications and networking [120].

1) GENERAL ADVICE FOR TRAINING DRL AGENTS
Training a DRL agent to successfully solve a given problem can be a challenging task. In this section, we provide some general advice from our experience in the hope of easing this task.
1) It is good practice to normalize the states and actions, e.g., to [−1, 1]^d, d ∈ N. Linear scaling always makes this possible when the state space S is bounded in real dimensional space. When S is unbounded, let's say R^d, one needs to use, e.g., a scaled version of the hyperbolic tangent or the inverse stereographic projection. Such nonlinear transformations, however, change the environment, and the resulting policies may perform poorly in the actual environment if the normalization is not chosen carefully. Ideally, one should aim at linear scaling throughout the state space's "expected" dominant part. As the action space is typically bounded, action normalization is less problematic. A small sketch of both scaling variants follows after this list.
2) Reward normalization should be used even more carefully than state normalization. In general, changing the reward changes the perception of an agent about the environment and results in different learned policies.
3) The design of the reward signal is an integral part of the design of an MDP. One has to craft a reward function that incentivizes the desired behavior to get an algorithm to learn the desired goal. Some additional comments in no particular order: Make it easy for an agent to distinguish good from bad scenarios; continuous or dense rewards typically make it easier for algorithms to learn; if possible, avoid sparse rewards and instead shape the rewards to give gradual feedback; strictly positive rewards incentivize agents to avoid terminal states; strictly negative rewards incentivize agents to reach terminal states.
4) Training DRL models with dropout should be avoided. Dropout is a regularization technique that was introduced in [121] to train NN models with less overfitting while improving generalization. However, it leads to increased training variance, which is generally undesirable for DRL training.
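The sketch below shows the linear scaling recommended in item 1, plus a tanh-based squashing for unbounded state spaces; the bounds low and high (and the tanh scale) are environment-specific assumptions:

```python
# Sketch: state normalization helpers for DRL training.
import numpy as np

def normalize_state(s, low, high):
    """Linear scaling of a bounded state vector to [-1, 1]^d."""
    return 2.0 * (np.asarray(s) - low) / (high - low) - 1.0

def squash_unbounded(s, scale=1.0):
    """Scaled tanh for unbounded state spaces (note: this changes the
    environment, so the scale must be chosen carefully)."""
    return np.tanh(np.asarray(s) / scale)
```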
2) ALGORITHM CATEGORIZATION
The sheer number of available DRL algorithms can be overwhelming for newcomers to the field, making it challenging to find appropriate algorithms for a given problem. To ease the algorithm selection, we provide categorizations of widely used single-agent and multi-agent DRL algorithms in Figure 5 and Figure 6, respectively. We note that the tree structures are simplified. For example, the model-based algorithms in Figure 5 can further be classified into value-based, policy-based, actor-critic and on/off-policy algorithms. Furthermore, only selected and widely used algorithms are shown. These categorizations should serve as starting points. The final algorithm selection for a specific problem should also consider additional factors such as sampling efficiency, algorithm stability and exploration strategy.

Single-Agent DRL Algorithm Categorization: Single-agent DRL algorithms can be coarsely categorized by their supported action space (discrete/continuous), whether they are model-based or model-free, and whether they are value-based, policy-based or a combination of both, called actor-critic. Considering the tree structure in Figure 5, it can be seen that some algorithms (e.g., A2C/A3C, SAC, PPO) can be used for both discrete and continuous action spaces, while others, such as DQN and DDPG, are only compatible with one of them.

MARL algorithms can generally be categorized based on the same factors as single-agent DRL algorithms. However, additional multi-agent factors can be included. These are mainly centralized/decentralized learning and cooperative/independent learning. To preserve clarity, some of the traditional single-agent factors have been omitted in Figure 6.
IV. DATASETS, TOOLS, AND FRAMEWORKS
Now that we have discussed what ML is and what its potential applications are, we introduce here the most popular datasets in the field of networks, as well as emulators and simulators that can be used to run ML experiments. Since ML model parameters are learned from data, the datasets used are crucial in accomplishing the intended task, such as network latency prediction or decision-making for traffic routes.

Additionally, ML models need to be tested before being applied in a productive environment. Thus, well-known network tools and frameworks can aid in prototyping, tracking, and evaluating these models.

A. DATASETS
Datasets are usually not plug-and-play and require preprocessing. The type of preprocessing required depends on the specific problem being addressed and the type of data being used. In general, preprocessing includes the following steps:
• Data cleaning: This involves removing any missing, inconsistent, or irrelevant data to ensure the quality of the data being used for training.
• Data normalization: This involves transforming the data onto a common scale, such as normalizing the values between 0 and 1, to ensure that no variable has an undue influence on the model.
• Data selection: This involves selecting the relevant features or variables from the dataset that are most important for the problem at hand. This step is important to reduce the dimensionality of the data and to improve the performance of the model; by removing irrelevant or redundant features, it can help to speed up the training process and reduce the computational resources required for analysis.
• Data transformation: This involves transforming the data into a format suitable for the ML algorithm being used, such as converting categorical variables into numerical values using one-hot encoding; this generates a vector whose length corresponds to the number of categories in the dataset, where data points belonging to the category are assigned 1, otherwise 0.
• Data splitting: This involves splitting the dataset into a training set to train the model, a validation set to evaluate its performance during training, and a test set to evaluate the model's performance after training.
It is important to note that the specific preprocessing steps required may vary depending on the dataset, the problem being addressed, and the type of ML model used. The preprocessing steps should be chosen carefully to ensure that the data is suitable for training and that the model can accurately represent the underlying relationships in the data. In the following, we present the most popular network domain datasets in the literature for different applications.
California, San Diego, and the University of Kansas. The Abi-
1) MOBILE NETWORK THROUGHPUT DATASETS lene dataset provides information about the communication
A common problem in networking research is replicat- patterns of a backbone network that connects several research
ing realistic network conditions, especially throughputs. institutions in the United States. It includes information
Dynamic Adaptive Streaming over HTTP (DASH) is about the network topologies, routing algorithms, and traffic
one such exemplary research area. Depending on the patterns of the network. It contains information about the
mobile network, different datasets containing traces of routes taken by packets, the number of packets sent and
real-world measurements have been created in order to received, and the size of the packets. The dataset is commonly
allow for a better comparison between different research used for research in the areas of network routing, network
approaches. management, and network performance evaluation.
For 3G mobile networks, the dataset by Riiser et al. [122] is The Global Environment for Network Innovations (GENI)
widely used [123]. It contains 86 traces from measurements dataset [132] provides network traces from real-world
conducted on commute paths in Oslo, Norway, using six deployments. GENI is a large-scale research infrastructure
different mobility patterns (cf. Table 5). Besides the download that provides a platform for conducting experiments and
throughput, it also contains the GPS latitude and longitude evaluating new network technologies and protocols. The
coordinates of the measurement device. dataset includes network traces that were collected from a
For 4G mobile networks, the dataset by Van Der Hooft variety of testbeds and networks, including campus networks,
et al. [124] (we call it 4G_a in this paper) is commonly data centers, and wide-area networks. It includes information
used [125]. It contains traces of 40 measurements with about the network topology, routing algorithms, and traffic
different mobility patterns (cf. Table 5) conducted in Ghent, patterns of the network. It also contains information about
Belgium. It is similar to the 3G dataset by containing the the routes taken by packets, the number of packets sent and
download throughput, as well as the GPS coordinates of the received, and the size of the packets. The dataset is commonly
measurement device. used for research in the areas of network routing, network
Another widely used [126] dataset for 4G networks was management, and network performance evaluation.
created by Raca et al. [127] (we call it 4G_b in this The Cooperative Association for Internet Data Analysis
paper). A total of 135 measurements were conducted in (CAIDA) anonymized internet traces dataset is a collection
Ireland. In comparison to the 4G_a dataset, this one is larger, of network traces that were collected by the CAIDA
also contains different mobility patterns (cf. Table 5), and project [133]. CAIDA is a non-profit research organization
contains significantly more metrics, such as the download that collects and analyzes data about the internet to gain
and upload throughput, additional channel-related metrics, insights into its structure and behavior. It includes data from a
context-related metrics, and cell-related metrics. variety of sources, including routers, switches, and end hosts.
In Farthofer et al. [128] an LTE dataset for the use of ML The data includes information about the network topology,
is described. The dataset is measured on an Austrian highway routing algorithms, and network traffic patterns. Additionally,
and contains over 2000 measurement points per month over it contains information about the routes taken by packets,
a time period of two years. Additionally, there are different
signal parameters measured in the dataset like SINR, RSSI, 5 https://www.crawdad.org/
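To illustrate how such traces are typically consumed, the following is a minimal sketch that loads a throughput trace and replays it as a step-wise bandwidth schedule, e.g., to throttle a simulated link. The file name and column names are illustrative assumptions, since the exact format differs between the datasets above.

```python
import csv

def load_trace(path):
    """Load a mobile throughput trace; assumes columns 'timestamp' (s) and 'throughput' (bit/s)."""
    trace = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            trace.append((float(row["timestamp"]), float(row["throughput"])))
    return trace

def bandwidth_at(trace, t):
    """Return the most recent throughput sample at simulation time t (step-wise replay)."""
    current = trace[0][1]
    for ts, bw in trace:
        if ts > t:
            break
        current = bw
    return current

trace = load_trace("4g_trace.csv")  # hypothetical file derived from one of the datasets above
print(bandwidth_at(trace, 12.0))
```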
2) ROUTING DATASETS
As routing is an important part of networking, having a real-world dataset for it can be beneficial for training ML models, evaluating their performance, and testing their robustness in the case of network failures and other real-world issues. In the following, we discuss some of these datasets.

The Abilene dataset [131] is a real-world network trace that captures the communication patterns of a backbone network. It was collected by the Abilene project, a collaboration between researchers from the University of California, San Diego, and the University of Kansas. The Abilene dataset provides information about the communication patterns of a backbone network that connects several research institutions in the United States. It includes information about the network topologies, routing algorithms, and traffic patterns of the network, such as the routes taken by packets, the number of packets sent and received, and the size of the packets. The dataset is commonly used for research in the areas of network routing, network management, and network performance evaluation.

The Global Environment for Network Innovations (GENI) dataset [132] provides network traces from real-world deployments. GENI is a large-scale research infrastructure that provides a platform for conducting experiments and evaluating new network technologies and protocols. The dataset includes network traces collected from a variety of testbeds and networks, including campus networks, data centers, and wide-area networks. It covers the network topology, routing algorithms, and traffic patterns of the network, including the routes taken by packets, the number of packets sent and received, and the size of the packets. The dataset is commonly used for research in the areas of network routing, network management, and network performance evaluation.

The Cooperative Association for Internet Data Analysis (CAIDA) anonymized internet traces dataset is a collection of network traces collected by the CAIDA project [133]. CAIDA is a non-profit research organization that collects and analyzes data about the internet to gain insights into its structure and behavior. It includes data from a variety of sources, including routers, switches, and end hosts: information about the network topology, routing algorithms, and network traffic patterns, as well as the routes taken by packets, the number of packets sent and received, and the size of the packets. The data is collected using active and passive measurements:

Active measurement involves actively sending test packets or requests to a network and then analyzing the resulting responses to gain insight into the network's behavior and performance. Examples of active measurement techniques include pinging, tracerouting, and bandwidth testing (see the sketch at the end of this subsection). Active measurements are often more accurate and provide more detailed information about the network, but they can also introduce more overhead into the network and disrupt normal network traffic.

Passive measurement involves observing network traffic without actively generating any test traffic. Passive measurements are typically less disruptive to the network and do not introduce any overhead, but they provide a more limited view of the network's behavior and performance. Examples of passive measurement techniques include network traffic analysis, packet capture and analysis, and log file analysis.

The RocketFuel dataset [134] is a collection of network topology data that was collected by the RocketFuel project, a research effort aimed at studying the structure and behavior of the Internet at the level of individual routers and links. The project collected data from several large Internet Service Providers (ISPs) and used it to create a high-resolution map of the Internet. It includes information about the network topology and the paths that packets take through the network, as well as the capacities of the links between routers and the location and characteristics of the routers themselves.

The Internet Topology Zoo [135] is a collection of network topology datasets that provide information about the physical structure of different networks. The datasets in the Internet Topology Zoo come from a variety of sources, including measurements of the internet, testbeds, and simulations. Its datasets provide information about the connections between nodes in a network, the capacities of the links between nodes, and the characteristics of the nodes themselves, such as their locations and capabilities. One of the main strengths of the Internet Topology Zoo is its comprehensive coverage of different types of networks, including wide-area networks, data centers, and other large-scale networks. This makes it a valuable resource for researchers and practitioners working in the field of network routing and network management, as it provides a diverse set of datasets for evaluating and comparing different algorithms and technologies.

To sum up, the GENI and Abilene datasets primarily focus on network infrastructure, providing researchers access to national research networks. Conversely, CAIDA and RocketFuel are designed to facilitate the measurement and analysis of network traffic and topology. The Internet Topology Zoo, meanwhile, is a collection of publicly available network topologies that researchers can use for various purposes. Thus, the size of the network varies depending on the scope and focus of the dataset. The GENI and Abilene datasets tend to cover larger networks compared to CAIDA and RocketFuel, which prioritize measurement and analysis tools [136]. The CAIDA and RocketFuel datasets use passive measurements, while the GENI and Abilene datasets use both active and passive measurements. The Internet Topology Zoo is a collection of network topologies and does not involve any measurements. A further comparison is shown in Table 6.
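As a concrete illustration of active measurement, the following is a minimal sketch that estimates the round-trip time to a host by timing a TCP connection attempt. The host and port are illustrative assumptions; a production tool would rather use ICMP echo (ping) or dedicated bandwidth probes.

```python
import socket
import time

def tcp_rtt(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Estimate the round-trip time (in ms) by timing a TCP handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; handshake complete
    return (time.perf_counter() - start) * 1000.0

# Probe a host several times and report the median, since single probes are noisy.
samples = sorted(tcp_rtt("example.org") for _ in range(5))
print(f"median RTT: {samples[len(samples) // 2]:.1f} ms")
```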
3) DYNAMIC ADAPTIVE STREAMING OVER HTTP (DASH)
Video streaming via Dynamic Adaptive Streaming over HTTP (DASH) is a large research area in networking. Recent rate adaptation algorithms often aim to optimize the user's Quality of Experience (QoE) under the given network conditions, such as a constrained bandwidth [137]. While these algorithms were initially conventional heuristics, DRL-based approaches have recently shown excellent performance and are now considered state-of-the-art [138]. In order to benchmark different solutions, publicly available DASH datasets are often used [138]. Typically, the datasets contain videos that are encoded under a controlled set of parameters, e.g., resolution and bitrate, and split into segments of certain lengths. The solutions are commonly evaluated via simulations where the videos from the DASH datasets are streamed over simulated networks [139]. Realistic network conditions, especially the download bandwidth, are commonly simulated using the network traces from the datasets presented in Section IV-A1 [138]. In the following, we present four commonly used DASH video datasets. Table 7 provides an overview of their most important properties.

The DASH dataset [140] is an old (2012) but still widely used dataset, e.g., to test new QoE schemes [139]. It contains 6 videos of different genres, split into segments ranging from 1 to 15 seconds in length.

The Distributed DASH (D-DASH) dataset [141] was published in 2013 and is intended to be used in real-world testbeds. It contains one video that is distributed on servers in Klagenfurt, Paris, and Prague. This enables a client to choose the requested location for each segment individually.

The ultra high definition HEVC DASH dataset [142] was published in 2014 and includes one video. In contrast to the DASH and D-DASH datasets, the video is encoded with the newer and more efficient H.265 (HEVC) video codec. Furthermore, it is encoded in UHD resolution, at 30 and 60 Frames per Second (FPS), and at 8 and 10 bits.

The multi-codec DASH dataset [143] is a rather new dataset from 2018. It consists of 10 videos that are encoded with four different video codecs: H.264, H.265, VP9, and AV1. In addition, three different video FPS are included: 24, 30, and 60.

4) MOBILITY AND AUTONOMOUS VEHICLES
In the context of mobility or autonomous driving using a wireless network infrastructure (be it cellular or V2X), most of the studies in the literature discussing solutions and their results do not make the datasets publicly available for scrutiny by third parties. As such, the results are difficult to verify and validate properly. Nevertheless, many studies rely on simulated datasets. An open-source traffic simulation software called Simulation of Urban Mobility (SUMO)6 provides datasets for simulating realistic urban traffic scenarios. Among the datasets are road networks, traffic demand patterns, and vehicle behavior models, which can be customized for different traffic scenarios and urban environments [144] (see the sketch at the end of this subsection).

Although SUMO is popular among researchers and practitioners in the industry, another software called Multi-Agent Transport Simulation (MATSim)7 is often used in academic research. While SUMO focuses on macroscopic traffic flow modeling, MATSim uses an agent-based approach to model individual travel behavior [145]. As a result, MATSim can capture more complex individual decision processes, while SUMO is better suited for overall traffic flow modeling. Another open-source software is CityFlow,8 which includes a range of features that are not available in SUMO, such as real-time simulation and the ability to model pedestrian and bicycle traffic [146].

There exist other, yet commercial, alternatives that also provide mobility datasets, such as Aimsun,9 Vissim,10 and TransModeler.11 We present in Table 8 an overview of these datasets, while a comparison of the use cases for some of these datasets is shown in [147].

6 https://sumo.dlr.de/docs/Data/Scenarios.html
7 https://www.matsim.org/open-scenario-data
8 https://github.com/cybercore-co-ltd/track2_aicity_2021
9 https://www.aimsun.com/
10 https://www.ptvgroup.com/
11 https://www.caliper.com/transmodeler/default.htm
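The following is a minimal sketch of driving a SUMO scenario from Python via its TraCI interface to extract per-vehicle mobility data, e.g., as input for ML models. The configuration file name is an illustrative assumption.

```python
import traci  # shipped with SUMO (the sumolib/traci Python packages)

# Start SUMO headless with a scenario configuration (hypothetical file name).
traci.start(["sumo", "-c", "urban_scenario.sumocfg"])

positions = []
for _ in range(3600):  # simulate one hour at 1-s steps
    traci.simulationStep()
    for veh_id in traci.vehicle.getIDList():
        x, y = traci.vehicle.getPosition(veh_id)
        positions.append((traci.simulation.getTime(), veh_id, x, y,
                          traci.vehicle.getSpeed(veh_id)))

traci.close()
print(f"collected {len(positions)} mobility samples")
```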
5) (ENCRYPTED) NETWORK TRAFFIC ANALYTICS
Another common task for ML in networking is network traffic analytics. This includes the task of traffic/service classification, i.e., identifying an active service or traffic type in the network. Examples of such a task are distinguishing between video and web traffic, between services like YouTube and Netflix, or even between different Android apps. Due to pervasive encryption, for example with TLS on the application and transport layers, protocols like HTTPS, DNS over TLS (DoT), and QUIC [154] do not yield sufficient unencrypted data to reliably identify services and traffic types. Instead, new techniques have to be developed that make use of the available unencrypted data. For encrypted network traffic analytics, a common approach is therefore to extract packet sizes, directions, and inter-arrival times, as well as potential additional information like port numbers, to build features. These features describe the network traffic of a specific service or traffic type [155], [156], [157], [158]. These features are then fed to ML models to learn specific patterns exhibited by different traffic types or services (see the sketch below). Beyond traffic classification, this type of analytics is often used for security-related tasks like intrusion detection or fingerprinting of websites, browsers, devices, and operating systems, and for estimating the QoE of services [159], [160].
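A minimal sketch of this feature-based approach is shown below. The statistical features and the random forest classifier are common choices but not mandated by the works cited above, and the synthetic flows merely stand in for flows parsed from real packet captures.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def flow_features(pkt_sizes, pkt_dirs, iats):
    """Summarize one flow: packet sizes (bytes), directions (+1 up / -1 down), inter-arrival times (s)."""
    sizes = np.asarray(pkt_sizes, dtype=float)
    iats = np.asarray(iats, dtype=float)
    return [sizes.mean(), sizes.std(), sizes.max(),
            iats.mean(), iats.std(),
            float(np.mean(np.asarray(pkt_dirs) > 0)),  # fraction of upstream packets
            float(len(sizes))]                          # packets per flow

# Stand-in for flows parsed from a capture: video-like flows have larger
# downstream packets and shorter inter-arrival times than web-like flows.
def synthetic_flow(video: bool):
    n = rng.integers(20, 60)
    sizes = rng.normal(1200 if video else 600, 150, n).clip(60, 1500)
    dirs = rng.choice([1, -1], n, p=[0.2, 0.8] if video else [0.45, 0.55])
    iats = rng.exponential(0.01 if video else 0.05, n)
    return sizes, dirs, iats

labels = rng.integers(0, 2, 400)  # 0 = web, 1 = video
X = np.array([flow_features(*synthetic_flow(bool(c))) for c in labels])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, stratify=labels, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```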
Due to the prevalence of those topics, there also exists a variety of datasets for the different network traffic analytics tasks.

An overview of the topic of traffic classification, along with a list of existing works (and solutions) and datasets, is provided in [161]. However, many of these datasets are quite old and, thus, outdated. A dataset for encrypted network traffic classification of YouTube and Netflix is provided in [162]. Here, the authors collected three classes of flows, namely web flows, YouTube flows, and Netflix flows, for the most popular websites and videos, while using different end devices, browsers, and operating systems. A new dataset for app traffic classification is Mirage [163]. This dataset was generated using three different mobile devices, which were used by real experimenters (students) once or twice a day. Overall, each experimenter generated 12 captures with a duration of 5 to 10 minutes. Experimenters were hereby instructed to use the app as they would usually do in their day-to-day life. The resulting dataset consists of 40 Android apps from 16 different categories.
Another new dataset for traffic classification of mobile apps is AppClassNet [164]. It was designed as an ImageNet for encrypted network traffic analytics and, therefore, is significantly larger in terms of tested apps and available samples than other datasets. The corresponding public dataset contains 500 apps with a volume of around 10 TB and stems from passive measurements.

For security tasks, in particular network intrusion detection, a comprehensive survey can be found in [165]. In this survey, the authors describe over 30 datasets and list the corresponding attacks and data types. A variety of datasets for different security tasks is also provided by the Canadian Institute for Cybersecurity, University of New Brunswick (UNB) [166]. They provide more than 25 datasets from different categories. These categories include IoT, dark web, DNS, IDS, traffic classification (web and apps using Tor or VPN), malware, and operational technology. A dataset for fingerprinting of devices and operating systems in the potential presence of VPNs is provided in [167]. The dataset contains around 20000 examples suitable for fingerprinting browsers, operating systems, and apps.

B. DATA GENERATION
Using real-world data is often challenging due to its limited availability and applicability. In this section, we explore the use of simulators, emulators, and synthetic methods, such as GANs, for generating data that can be used to train ML models. These approaches have the potential to help when datasets are not available or suited, and can enable the creation of diverse and complex datasets.

1) SIMULATION TOOLS
One of the paramount parts of designing a new scheme or protocol is the evaluation process. There are various methods, including real-world experiments, simulation, emulation, or analytical models, to perform a detailed investigation of the newly designed scheme. Nevertheless, each method has its advantages and disadvantages. When employing practical tests, the accuracy of the results can be excellent, but complexity and cost increase. On the other hand, modeling a new protocol based on conceptualization is beneficial for obtaining an analytical model, yet the complexity remains a drawback, and the accuracy can decline due to the limited capability of reflecting real-world scenarios. Considering the above challenges, using simulation and emulation environments can strike a good balance between complexity, cost, and accuracy. They might not depict real-world conditions minutely, but even so, they are eminent tools that can assist a researcher or developer in developing novel schemes. A thorough analysis of different simulators for networking can be found in [168] and [169].

NETWORK SIMULATOR 3
One of the most powerful simulation tools in networking is ns-3 (Network Simulator 3)12 [170], which is a discrete-event open-source simulator under the GNU GPLv2 license. This tool comes with various modules such as Wi-Fi, LTE, or even a recently released mmWave (millimeter wave) [171] module, giving researchers somewhat reliable test environments for newly developed approaches and reducing development time for various kinds of research interests. Indeed, ns-3 can assist researchers in network performance evaluation; however, by default it does not support ML approaches, so extending it through open-source AI frameworks makes it far more useful for ML problems. An attempt to do so was made in ns3-gym [172], an extension of ns-3 connecting the simulator to the OpenAI Gym toolkit. This connection is established using ZeroMQ sockets for inter-process communication (IPC). Moreover, the capability and adaptability of OpenAI Gym for reinforcement learning are favorable, as it is a widespread library, like TensorFlow and Scikit-Learn. ns3-gym aims at ameliorating the process of network prototyping that employs reinforcement learning. The module also improves scalability, which is important for running several ns-3 instances, and makes the conversion and deployment of ns-3 scripts in OpenAI Gym feasible. Furthermore, debugging and use of the module remain uncomplicated, as it behaves like a conventional ns-3 module consisting of two main blocks, OpenAI Gym and ns-3, that interact with each other (a minimal interaction loop is sketched below). Another interface extension that bridges ns-3 and Python-side ML implementations is ns3-ai [173], which claims to greatly increase the interaction speed by facilitating communication through a shared memory block.
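A minimal sketch of the agent side of ns3-gym, assuming the ns3gym Python package is installed and a corresponding ns-3 simulation script is running; the random policy stands in for an actual RL agent.

```python
import gym
import ns3gym  # noqa: F401 -- registers the 'ns3-v0' environment that proxies a running ns-3 simulation

env = gym.make("ns3-v0")  # connects to the ns-3 process via ZeroMQ
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()           # placeholder for a learned policy
    obs, reward, done, info = env.step(action)   # one message exchange with ns-3
    total_reward += reward

env.close()
print("episode return:", total_reward)
```

Observation and action spaces are defined on the ns-3 side of the bridge, so the same Python loop can be reused across different simulation scripts.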
OMNET++
Another popular discrete-event network simulator is OMNeT++,13 which can be used free of charge for academic and educational purposes under a license with rights similar to the GPL,14 but requires a paid license for commercial use. While OMNeT++ itself only contains the core simulation framework, various models can be added via external frameworks. The most important one is the INET Framework,15 which is maintained by the OMNeT++ core team and provides models for network standards like IEEE 802.3 and IEEE 802.11, as well as higher-layer protocols like IP, UDP, and TCP.

In terms of ML, Veins-Gym [174] exposes an OMNeT++ simulation as an OpenAI Gym environment, analogous to ns3-gym for ns-3. Despite its name, Veins-Gym can be used not only in combination with the Veins framework but also with any OMNeT++ simulation.

An overview with examples of how to use different ML frameworks such as TensorFlow in OMNeT++ can be found in [175].

12 https://www.nsnam.org
13 https://omnetpp.org/
14 https://omnetpp.org/intro/license
15 https://inet.omnetpp.org/
2) EMULATORS
A network emulator, unlike a simulator, creates a virtual copy of a physical device, including all hardware and software configurations, to functionally replace it. Hence, emulation is more accurate than simulation, but also more expensive in terms of computation resources. There are many network emulation tools, including but not limited to:
• Mininet16: a Python-based tool focused on emulating software-defined networks (SDNs) using OpenFlow switches.
• GNS317: supports a wide range of network devices and protocols using virtual machines and real devices.
• Mahimahi18: a lightweight network emulator that is designed to emulate low-bandwidth networks with high latency.
• WANEM19: a Linux-based tool that can be used to emulate various network conditions such as latency, packet loss, and bandwidth limitations in WANs.
• TENS20: a VM tool that can be used to generate emulated network traffic for security evaluation purposes. It can generate various types of traffic, such as HTTP, FTP, SMTP, etc.
• CORE21: similar to GNS3 but with further emulation capabilities beyond traditional networks, such as SDN and virtualization technologies.
• FlowEmu [176]22: a modular network link emulator with a flow-based programming inspired user interface that integrates TensorFlow for writing custom ML modules.

Many of these tools have been used for training and evaluating ML algorithms. For example, SDWAN-gym23 and IROKO [177] are Python-based platforms built on top of Mininet for training and evaluating reinforcement learning algorithms in software-defined WANs and data centers, respectively. It is often the case that emulated data is mixed with real data to obtain a large, reliable dataset. There exist many datasets that adopt this approach in cybersecurity, such as the Canadian Institute for Cybersecurity database.24 They provide the ''CICIDS2017'' dataset, labeled network flows with full packet payloads in PCAP format, for ML and deep learning purposes. Also, they provide the AndMal 2020 dataset to identify and classify Android malware based on ML.

3) SYNTHETIC
Synthetic data is needed because it can help to overcome the lack of up-to-date real-world data and privacy constraints, which limit the development of new models. In addition, synthetic data can provide an efficient mechanism to surmount the lack of labeled datasets and post-processing overhead. In the context of network traffic analysis, synthetic data can be used, for example, to train ML models to detect cyber-attacks and resolve network congestion as well as other performance issues.

SynGAN (Synthetic Generative Adversarial Network) [178] is a packet-level GAN designed to generate synthetic traffic data. It generates synthetic packets that closely resemble real-world traffic by simultaneously training the generator and discriminator networks (see the sketch at the end of this subsection). The generator network takes random noise as input and produces synthetic network traffic data as output, while the discriminator network distinguishes between synthetic and real data. Adversarial training ensures that the synthetic data produced by SynGAN is representative of real network traffic.

To make sure that the generated data satisfies certain constraints, PAC-GAN (Projection Adversarial Constraint GAN) [91] uses a projection operator to map the generated data onto a feasible set that satisfies the desired constraints. In addition to the standard GAN loss, PAC-GAN uses a constraint loss to ensure that the generated data is not only realistic but also satisfies the desired constraints.

Another type of traffic generator is flow-based GANs, which, unlike packet generators that focus on individual packets, generate flows of packets that share common characteristics, such as source and destination IP addresses, source and destination ports, and protocol type. Additionally, they can reduce the amount of data that needs to be generated by producing a single flow instead of multiple individual packets.

The authors in [179] propose different preprocessing approaches for transforming IP addresses of flows into a continuous feature, since GANs can only process continuous features. Then, they use domain knowledge, such as packet size, inter-arrival time, and flow duration distributions, to evaluate the quality of the generated data.

16 http://mininet.org/
17 https://docs.gns3.com/
18 https://manpages.org/mahimahi/
19 https://github.com/PJO2/wanem
20 https://github.com/vmware/te-ns
21 http://coreemu.github.io/core/
22 https://github.com/ComNetsHH/FlowEmu
23 https://github.com/amitnilams/sdwan-gym
24 https://www.unb.ca/cic/datasets/index.html
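To make the adversarial training idea above concrete, the following is a minimal PyTorch sketch of a GAN over flow-level feature vectors. The feature dimensionality, network sizes, and stand-in data are illustrative assumptions and do not reproduce SynGAN or PAC-GAN.

```python
import torch
import torch.nn as nn

FEATS, NOISE = 8, 16  # flow features (e.g., sizes, IATs, duration) and latent size -- assumptions

G = nn.Sequential(nn.Linear(NOISE, 64), nn.ReLU(), nn.Linear(64, FEATS))
D = nn.Sequential(nn.Linear(FEATS, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_flows = torch.randn(1024, FEATS)  # stand-in for normalized real flow features

for step in range(1000):
    real = real_flows[torch.randint(0, len(real_flows), (64,))]
    fake = G(torch.randn(64, NOISE))

    # Discriminator: push real samples towards 1 and generated samples towards 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the discriminator into predicting 1 for generated samples.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

synthetic = G(torch.randn(100, NOISE)).detach()  # 100 synthetic flow-feature vectors
```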
OpenAI Gym / Gymnasium: OpenAI Gym32 (lately continued as Gymnasium33 by the Farama Foundation) is an open-source Python library that provides a standardized API for the interaction between RL algorithms and environments. Additionally, it includes a wide range of environments of different complexities, including classic control tasks, Atari games, robotic simulations, as well as physical simulations. This allows researchers to reproducibly benchmark RL algorithms on a standardized set of environments. Furthermore, Gym can be extended by custom environments, allowing users to easily compare the performance of different RL algorithms on customized problems.

One challenge of RL research is that different implementations of the same RL algorithm can have significantly different performances in the same environment, making RL algorithms highly sensitive not only to hyperparameters but also to small implementation details [186].

Stable Baselines3: Stable Baselines3 (SB3) [187] is an open-source Python library that contains reference implementations of seven widely used DRL algorithms. Tab. 10 lists all supported algorithms. The performance of these algorithms has been thoroughly tested. The library is compatible with the OpenAI Gym/Gymnasium API, enabling users to train RL agents in just a few lines of code (see the sketch below). Moreover, the library supports custom Gym environments, custom policies for the algorithms, TensorBoard, as well as data logging customization through custom callbacks.

Additional RL algorithms are implemented in the Stable Baselines3 Contrib (SB3-Contrib)34 package. These are implementations of newly published algorithms. They are less tested and therefore considered experimental.

RL Baselines3 Zoo: RL Baselines3 Zoo35 is a Python library that provides pre-trained agents and a set of optimized hyperparameters for the algorithms from SB3 and the Gym environments. Moreover, it provides useful helper scripts for training and evaluating agents, for tuning hyperparameters, and for plotting results.

CleanRL: CleanRL [188] is a DRL framework that provides thoroughly benchmarked single-file Python implementations of eight DRL algorithms (cf. Tab. 10). Its goal is to provide researchers full control over an algorithm in a single file, making it easier to 1) fully understand all implementation details, and 2) quickly prototype novel DRL features. In addition, it provides support for TensorBoard. In comparison to SB3, CleanRL does not provide a high-level user-friendly API for model training. It is instead tailored to provide a development environment for DRL researchers with implementations that are easy to read, debug, modify, and study. The desired workflow is to first prototype new RL ideas in CleanRL and afterwards port them to a library offering a higher-level API like SB3.

32 https://www.gymlibrary.dev
33 https://gymnasium.farama.org
34 https://sb3-contrib.readthedocs.io/en/master/
35 https://github.com/DLR-RM/rl-baselines3-zoo
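The following is a minimal sketch of a custom Gymnasium environment trained with PPO from SB3. The environment (a toy link-rate selection task) and its observation and reward definitions are invented here for illustration; they are not taken from the paper.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

class LinkRateEnv(gym.Env):
    """Toy environment: pick one of 4 sending rates; reward = throughput minus an overload penalty."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(4)                            # 4 candidate rates
        self.observation_space = spaces.Box(0.0, 1.0, (1,), np.float32)  # normalized link capacity

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.capacity = self.np_random.uniform(0.2, 1.0)
        self.t = 0
        return np.array([self.capacity], np.float32), {}

    def step(self, action):
        rate = (action + 1) / 4.0
        # Achieved throughput, minus a penalty for exceeding the (unknown next-step) capacity.
        reward = min(rate, self.capacity) - 2.0 * max(0.0, rate - self.capacity)
        self.capacity = float(np.clip(self.capacity + self.np_random.normal(0, 0.05), 0.2, 1.0))
        self.t += 1
        return np.array([self.capacity], np.float32), reward, self.t >= 200, False, {}

model = PPO("MlpPolicy", LinkRateEnv(), verbose=0)
model.learn(total_timesteps=50_000)  # training in a few lines, as promised by SB3
```

Because the environment implements the standard Gymnasium API, the same class can be plugged into any of the compatible libraries discussed in this section without modification.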
OpenAI SpinningUp: OpenAI SpinningUp36 is a great resource for aspiring researchers and practitioners who are excited to apply DRL to their problems but are overwhelmed by the implementation complexity of algorithms in frameworks like Stable Baselines3. It provides detailed explanations of the most important concepts of DRL, as well as explanations and implementations of key DRL algorithms. The algorithm implementations specifically focus on simplicity, with the aim of being easy to follow for people new to the field. This simplicity is achieved by narrowing down the implementations to the core concepts of the algorithms and by omitting more complex features that can significantly improve an algorithm's performance. As a result, OpenAI SpinningUp should primarily be seen as an educational resource and should not be used in production systems.

PettingZoo: PettingZoo37 is an open-source Python library that contains a set of environments for multi-agent reinforcement learning (MARL). While it is similar to OpenAI Gym/Gymnasium in its functionality and API, the application scenario of MARL is different from that of single-agent RL. Among others, it contains multi-agent environments of Atari games and classic games like chess and Go. Furthermore, it can be extended by custom environments.

Ray RLlib: Ray RLlib [189] is an open-source Python library for RL. Out of the RL libraries presented in this section, it is the most comprehensive one. It supports a wide range of performance-tested RL algorithms, offers a high-level user-friendly API to train agents, supports single-agent, multi-agent, and custom environments, offers high scalability by supporting both single-machine and distributed training, and offers tools for managing, tracking, and visualizing the results of experiments. Because it is built on the Ray platform, it is also seamlessly compatible with other Ray libraries and tools for distributed computing and parameter tuning.

When to use which RL library? An important question to answer in this primer is when to use which of the presented RL libraries. CleanRL is recommended either to fully understand how an algorithm is implemented or for RL researchers to quickly prototype new ideas, since its design decision to separate each algorithm into its own file lets the researcher focus on the algorithm instead of the complex software architecture of other RL algorithm libraries with intertwined modular implementations. SB3 is primarily intended to offer well-tested baseline implementations of important DRL algorithms as a benchmark baseline for new RL developments. However, along with its extensions SB3-Contrib and Zoo, it is recommended if a high-level interface for fast training of well-established and well-tested RL algorithms on single-agent environments is desired and no scalability via distributed learning is required. RLlib offers a production-ready framework for large-scale projects. It is recommended for multi-agent environments, as well as when high scalability via distributed learning is required.

36 https://spinningup.openai.com
37 https://pettingzoo.farama.org
• Optuna [219]: an open-source hyperparameter optimization framework for ML. It provides a flexible and modular platform for automating the process of selecting optimal hyperparameters for a given model architecture. Optuna uses various algorithms to search the hyperparameter space, including TPE, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Non-Dominated Sorting Genetic Algorithm II (NSGA-II), and adaptive sampling. It also supports distributed optimization across multiple nodes for faster and more efficient tuning (see the sketch after this list).
• Ray Tune [220]: the hyperparameter tuning component of the Ray framework. It handles the execution of experiments, including parameter studies with possibly multiple repetitions, as well as scheduling the runs for parallel execution. For hyperparameter tuning, it supports a wide variety of approaches. These include basic strategies such as grid or random search, but also more advanced approaches such as Bayesian optimization or Population Based Training [221]. While some algorithms are implemented internally, it relies heavily on third-party optimization libraries such as Hyperopt [222] and Optuna [219], and provides a unified interface to them.
• Keras Tuner [223]: a library customized for Keras that provides an easy-to-use API for defining a hyperparameter search space, choosing search algorithms such as random search and Bayesian optimization, and running hyperparameter search processes. Furthermore, Keras Tuner is easy to integrate with other Keras workflows and can optimize both single-node and distributed hyperparameters.
• Hyperopt [224]: a Python library for hyperparameter optimization that uses a combination of random search and Bayesian optimization to efficiently explore and exploit the hyperparameter search space. It provides an easy-to-use API for defining the hyperparameter search space, selecting optimization algorithms, and executing the hyperparameter search process. Hyperopt uses a Tree-structured Parzen Estimator (TPE) algorithm to model the relationship between hyperparameters and model performance and to guide the search for better hyperparameters. Hyperopt also allows for the parallelization of the search process, making it scalable to large hyperparameter search spaces and parallel computing environments. It can be used with a variety of machine-learning frameworks, including Scikit-learn, Keras, and PyTorch.
• Scikit-Optimize [225]: a Python library for sequential model-based optimization that aims to efficiently explore and exploit the hyperparameter search space while minimizing the number of model evaluations. It provides a simple and flexible API for defining the hyperparameter search space and selecting optimization algorithms, including Bayesian optimization and gradient-based optimization. Scikit-Optimize also supports parallel evaluation of the search process, making it scalable to large hyperparameter search spaces and parallel computing environments. In addition to hyperparameter optimization, it can be used for function optimization and global optimization tasks. Furthermore, it integrates easily with popular ML frameworks such as Scikit-learn and Keras, while including features such as early stopping and warm-starting to further improve the efficiency of the hyperparameter search process.

Note that Table 12 presents only the most commonly used algorithms for each tool. While other algorithms may be added, the mileage may vary depending on the specific use case and requirements. Overall, the choice of which tool to use depends on the specific requirements and use case. For example, if there is a need for scalability and distributed training, Ray Tune is a good choice. If there is a need for a general-purpose optimization library, then Scikit-Optimize might be a good choice.
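As an illustration of how such a tool is used, here is a minimal Optuna sketch tuning two hyperparameters of a scikit-learn classifier; the search space and model are arbitrary choices for demonstration.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def objective(trial):
    # Define the search space per trial; Optuna's sampler (TPE by default) proposes values.
    n_estimators = trial.suggest_int("n_estimators", 50, 400)
    max_depth = trial.suggest_int("max_depth", 2, 16)
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()  # objective to maximize

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```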
E. TESTBEDS
As previously outlined, it is hard to replicate realistic network conditions, and using existing datasets might not always fit the problem. While simulation tools can help with that, there is also the possibility of using existing testbeds or building your own. Access to existing real-world testbeds is usually open or free for researchers, but you might have to schedule your experiments and wait, depending on utilization. In the following, some popular real-world testbeds and some devices one could use to build a testbed will be introduced. There are two types of testbeds: wired ones and wireless ones. The wireless ones are wireless sensor networks without any routers or switches, and communication is broadcast. Testbeds are relatively versatile and can be used, for example, either to test ML applications that rely on networks, like distributed ML, or to test ML algorithms that perform traffic routing.

1) REAL-WORLD TESTBEDS
For a more extensive overview, [226], [227], [228], and [229] provide surveys that either include a section about testbeds or are entirely about testbeds. We present a selection of popular testbeds, starting with wireless testbeds.

FlockLab [230] is an experimental platform that enables researchers to test and evaluate the performance of wireless sensor networks (WSNs) and IoT systems. It is a flexible, open-source testbed that provides a controlled and repeatable environment for the evaluation of various applications. An advantage of FlockLab is its flexibility, as it can be used to test and evaluate a wide range of wireless sensor networks and IoT systems [230]. It supports various wireless technologies, such as Zigbee, Z-Wave, and LoRaWAN, and it can be easily extended to support new technologies. FlockLab is widely used in the field of WSNs and IoT systems [231], and it has been developed and maintained by the Communication Systems Group at ETH Zurich.

FIT IoT Lab [232] is an open-access testbed for IoT experiments provided by the French Institute of Technology. It contains over 1500 nodes offering a wide range of low-power wireless devices that can be used to test and evaluate various IoT applications, protocols, and algorithms. In addition, its large-scale infrastructure and easy-to-use web interface provide a flexible and convenient platform for IoT experimentation.

D-Cube [233] is a testbed by Graz University of Technology. It contains about 50 nodes with two platforms, nRF52840 and TelosB, and provides a set of predefined scenarios. These scenarios allow researchers to evaluate protocol performance and easily compare protocols against each other.

CLOVES [234] is a part of the IoT Testbed at the University of Trento. It contains 275 indoor devices spread over 8000 square meters. Communication is possible using ultra-wideband or narrowband, and all nodes are remotely accessible.

Next, we are going to introduce some wired testbeds. Note that some of them also provide wireless capabilities.

PlanetLab [235] was founded in 2002 by researchers from several universities, including Princeton University, the University of California at Berkeley, and Stanford University. While it was shut down in March 2020, PlanetLab Europe38 continues to operate. It is a collection of interconnected computers located at over 250 sites in more than 40 countries across Europe and beyond, available for researchers to use in their experiments. PlanetLab Europe provides researchers with virtual machines, storage, and network connectivity. In addition, researchers can deploy their software on the nodes and create custom network topologies to simulate various network scenarios.

EmuLab39 [236] is a network testbed developed by the University of Utah that provides users with a virtual network environment to test and evaluate various networking systems and applications. EmuLab allows researchers to create and configure network topologies, deploy software and network services, and generate different types of network traffic to test and evaluate various networking scenarios.

38 https://www.planet-lab.eu/
39 https://www.emulab.net/
GENI (Global Environment for Network Innovations) [237] is a US national-scale network testbed that provides researchers with a virtual laboratory for developing and testing new networking technologies and applications. It comprises a large-scale network of interconnected computing resources, including servers, routers, switches, and other network devices.

2) BUILDING YOUR OWN TESTBED
When seeking greater control over a testbed, building a customized one emerges as a viable option. Fortunately, there are several cost-effective devices available for this purpose, with some even incorporating machine learning accelerators [238]. These accelerators enable the deployment of machine learning models for training and inference within the testbed and offer a variety of communication approaches. In this section, we will provide a list of the most common and popular devices used for this purpose, along with detailed explanations of their respective advantages.

NVIDIA Jetson40 is a series of embedded computing boards designed for IoT and ML applications. They include NVIDIA GPUs and CPUs, as well as a variety of interfaces and sensors for connecting to other devices. Jetson boards are designed to be low-power and compact, making them suitable for portable and battery-powered applications. They can be used for various tasks, including image and video processing, deep learning, and robotic control.

Google Coral41 includes a range of hardware and software products, such as the Coral Dev Board, the Coral USB Accelerator, and the Edge TPU software. The Coral Dev Board is a single-board computer that is designed to be small and low-power, making it suitable for use in portable and battery-powered devices. It has a system-on-a-chip (SoC) that includes a Google Edge TPU, which is a custom-built Tensor Processing Unit (TPU) for running ML/DL models. The Coral USB Accelerator is a small USB device that can add Edge TPU capabilities to existing devices. The Edge TPU software provides a set of libraries and tools for developing and deploying ML models.

The Raspberry Pi boards are equipped with a variety of interfaces and peripherals, such as USB ports, Ethernet, HDMI, and a 40-pin expansion header. They also have high CPU and memory capacity, which makes them powerful enough to run various applications. The Raspberry Pi can run TensorFlow Lite and other ML frameworks, enabling researchers to run pre-trained models and perform basic ML tasks (see the sketch below). It can also be used as an edge device for collecting and preprocessing data before sending it to the cloud for further analysis.

The Intel Movidius Neural Compute Stick42 is a USB device that provides on-device AI inference for various applications in networked systems. It features a Myriad 2 VPU, which can run deep neural networks with low power consumption. The Neural Compute Stick can accelerate computer vision, speech recognition, and natural language processing tasks in networked devices.

40 https://www.nvidia.com/de-de/autonomous-machines/embedded-systems/
41 https://coral.ai/
42 https://www.intel.com/content/www/us/en/developer/articles/tool/neural-compute-stick
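A minimal sketch of on-device inference with TensorFlow Lite, as it might run on a Raspberry Pi in such a testbed; the model file name and its input shape are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf  # on a Raspberry Pi, the lighter 'tflite-runtime' package can be used instead

# Load a pre-trained, converted model (hypothetical file, e.g., a traffic classifier).
interpreter = tf.lite.Interpreter(model_path="traffic_classifier.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# One inference on a dummy feature vector shaped like the model's input.
x = np.random.rand(*inp["shape"]).astype(np.float32)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print("class scores:", interpreter.get_tensor(out["index"]))
```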
V. EXPLAINABLE ARTIFICIAL INTELLIGENCE
While ML and especially DL models are powerful tools for network service providers, they come with the major drawback that their reasoning is difficult for humans to understand due to their black-box characteristics [239]. This lack of understanding may result in stakeholders, e.g., network service providers, not deploying ML models in production environments, as they do not trust their reasoning and, thus, fear outages or revenue losses. To alleviate these concerns, Explainable AI (XAI) is well-suited, as it helps to understand the underlying reasoning of ML models. This is achieved by intelligently relating inputs and outputs; the thereby learned transformation function, or at least some parts of it, becomes interpretable. Usually, this interpretability comes in the form of mathematical functions or heatmaps describing the influence of the inputs on the model's decision. In addition, a quantification of a model's uncertainty is fundamental for risk assessment during deployment, thereby paving the way for Responsible AI.

There are plenty of use cases for applying XAI in communication networks [240]. These use cases include network planning and engineering [241], resource allocation [242], [243], performance management [128], [244], and security management [245], [246]. Most of these works use the methods presented in this chapter to make their models explainable.

A. TAXONOMY OF XAI METHODS
A general overview of XAI techniques is provided in [247], and an extensive survey on XAI methods as well as a taxonomy for XAI methods in general can be found in [248]. XAI methods can be classified into techniques that explain a model locally or globally. A local explanation technique provides model explanations for a single input, e.g., why a specific packet is routed that way, while a global explanation technique provides general explanation strategies of a model, e.g., how the model routes packets in general.

Further, XAI methods can be classified into post-hoc explainers and interpretable models. Post-hoc explainers are utilized to explain various already trained black-box models, e.g., neural networks or ensemble models. Ensemble models like Random Forest are composed of multiple smaller models jointly determining the output. This makes interpretation difficult. Interpretable, transparent, or glass-box models provide an explanation for how the model obtains the output by design. Prevalent models are, for example, the well-known linear models and decision trees, as well as the less-known generalized additive models.

Finally, model-agnostic methods and model-specific methods are distinguished. Model-agnostic methods can be used on top of every kind of model, while model-specific methods can only be used with specific model families. A prominent example of model-specific methods are saliency maps [249], which are computed from the feature maps learned by a model and can be used in computer vision to highlight the regions on which the model focuses when processing input. They are generally applicable when using CNNs. This also implies that the nature of the data directly influences the applicable XAI techniques for the different use cases; e.g., time series XAI techniques are not usable for graph data.

B. SPECIFIC XAI METHODS
Since there are many different categories of XAI techniques, there is a wide spectrum of specific XAI methods. Thus, the following explained methods are only a small selection. Because in many XAI scenarios a black-box ML model should be made intelligible, the methods introduced first focus on post-hoc explainers. While it is common to perform post-hoc explanations, the authors of [250] argue that we should stop using post-hoc explainers and instead directly use interpretable models. Interpretable models often perform worse than black-box models, but are interpretable by design.

1) POST-HOC EXPLAINERS
As a majority of advances in ML happen in computer vision, there exists a huge variety of post-hoc explainers for explaining the learnt filters of a CNN, e.g., saliency maps. As a consequence, these techniques are model-specific and usually not applicable to network data. Nevertheless, there exist approaches where network data is transformed into images beforehand, e.g., for encrypted network traffic classification in [251], and processed with a CNN, so saliency maps could be applied here.

Layer-wise Relevance Propagation (LRP) is a post-hoc method that uses the neural network's forward pass and propagates its output backwards through the layers until the input layer to derive the relevance of an input to the model's prediction.

A prevalent local model-agnostic post-hoc explainer is called SHapley Additive exPlanations (SHAP), which uses methods from game theory to judge the importance of different feature inputs. Although this method can explain the black box of an ML model very well, it comes with the drawback that it needs high computational power. Thus, it is only feasible for models with fewer input parameters [252] (see the sketch at the end of this subsection).

A well-working method for obtaining an explanation of classification models in a model-agnostic fashion is Local Interpretable Model-agnostic Explanations (LIME) [253]. LIME belongs to the class of surrogate models, where a model is used to approximate the predictions of a target black-box model to infer the reasoning of the black-box model. LIME trains a local surrogate model to explain the predictions for a specific sample by first aggregating permutations of the original feature inputs of the sample into a new dataset, weighting the samples of the dataset according to their proximity to the original sample, and then training an interpretable model on this dataset to approximate the predictions of the black-box model. After training, the local model can be interpreted to understand the black-box model's reasoning.

Another type of local model-agnostic post-hoc explainer are counterfactual explanations [247]. Counterfactual explanations are used for causal reasoning and may serve to answer what-if questions, i.e., ''would Y have occurred if X had not occurred before''. These techniques may be helpful for network operators when they try to analyze and manage their network with respect to critical situations, e.g., how to avoid congestion in a network. In a nutshell, they work by deriving causal relationships from the input features and then manipulating input features to perform specific reasoning.
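To illustrate the post-hoc explainers above, here is a minimal SHAP sketch explaining a tree-based model; the synthetic data stands in for real flow features and a QoE-style regression target.

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Stand-in for flow features (e.g., mean packet size, mean IAT, ...) and a QoE score target.
X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer exploits the tree structure, which keeps SHAP tractable for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])  # one attribution per sample and feature

# Contribution of each feature to the first sample's prediction (sign = direction of influence).
print(shap_values[0])
```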
2) INTERPRETABLE MODELS
The easiest to interpret and most well-known interpretable models include decision trees, which are interpretable in an if-else fashion, and linear models like linear regression or logistic regression, where slope and intercept directly characterize the input mapping. Generalized linear models (GLMs) and generalized additive models (GAMs) extend linear models to better reflect non-linear functions and target distributions other than the Gaussian distribution assumed by linear regression [247]. Especially for GAMs, many different models exist by now that are directly interpretable.
C. UNCERTAINTY QUANTIFICATION
Note that many approaches quantify only epistemic or aleatoric uncertainty, but not both simultaneously. A survey of existing approaches is provided in [260]. In the following, some selected ways to quantify uncertainty are shortly introduced. A way to estimate epistemic uncertainty is, for example, the use of ensembles, i.e., training the same model with different seeds and considering the predictions of each model and, in particular, the differences in these predictions. The stronger the differences between the models' predictions, the higher the uncertainty. This approach can be used for any kind of model. Another simple approach for estimating epistemic uncertainty in neural networks is the use of Monte Carlo Dropout. With Monte Carlo Dropout, the dropout layers, which are usually used for improved model generalizability during training, are also kept active during inference. Generating multiple model predictions with active dropout can also be considered as approximate Bayesian inference. Again, the variation in the returned predictions quantifies the degree of uncertainty (see the sketch below). To learn aleatoric uncertainty, it is usually required that the model learns not only mean responses, but also the variance [261]. With neural networks and a regression task, this is, for example, easily possible by simply adding another head, i.e., output neuron, to the neural network, which learns the variance, and by adjusting the loss function accordingly. Using the negative log-likelihood of a Normal distribution (or any other distribution) as the loss function, it is thus possible to learn a Normal distribution for an input, thereby allowing the uncertainty for that input to be quantified in the form of the variance. Finally, Bayesian Neural Networks as proposed by Kendall and Gal [261] can model both aleatoric and epistemic uncertainty. With Bayesian Neural Networks, model weights are assigned a probability distribution instead of a single value. Using these probability distributions, it is then possible to quantify epistemic uncertainty. For aleatoric uncertainty, they simply use two heads, where they learn both mean and variance for a data point.
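A minimal PyTorch sketch of Monte Carlo Dropout as described above; the network architecture and the number of stochastic forward passes are illustrative choices.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Dropout(p=0.2),  # kept active at inference time for MC Dropout
    nn.Linear(64, 1),
)

x = torch.randn(1, 4)  # one input sample (e.g., normalized network features)

model.train()  # train mode keeps dropout stochastic during the forward passes
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(100)])  # 100 stochastic forward passes

mean = preds.mean()
epistemic_std = preds.std()  # spread across passes approximates epistemic uncertainty
print(f"prediction: {mean:.3f} +/- {epistemic_std:.3f}")
```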
D. RESPONSIBLE AI
Strongly related to uncertainty is the concept of Responsible AI. According to Arrieta et al. [248], XAI alone is not sufficient for an ethical and responsible usage of ML models. Responsible AI is in general a much broader topic than XAI [262]. With Responsible AI, there are additional principles which must be kept in mind when developing and deploying ML models. These principles include the prevention of discrimination against persons, groups, or races, i.e., the model must be fair. In the context of communication networks, discrimination could, for example, mean that a model disadvantages specific users by assigning them lower bandwidth shares and higher latency. Additionally, Responsible AI ensures that users or stakeholders are always aware of the usage of ML models. Specifically, it must be transparent to everybody that ML has been used and how it has been used. An example for communication networks is the adaptive change of a routing table by an ML model. The model must be able to outline why a change was required and why it has changed specific routes. Next, the use of ML models should always benefit humanity in all aspects of life. They should not be used in disruptive ways, e.g., generating downtimes in a network for specific users on purpose. Finally, privacy and security are also very important topics. ML models require data. Here, the privacy and security of sensitive data must be maintained throughout the whole lifecycle of preparing and deploying the model. Responsible AI is still a young field of research. Nevertheless, all the mentioned principles must be kept in mind when preparing and deploying ML models in practice. It is one of those topics already diligently discussed in the conceptualization of future networks, e.g., 6G [263]. Meanwhile, [264] is a more generic survey of best practices to ensure that AI environments are responsible.
E. LIBRARIES
Several XAI libraries are available for all kinds of frameworks, e.g., Scikit-learn, PyTorch, and TensorFlow (cf. Section IV). Microsoft created a Python library named InterpretML [254], which unifies black-box explainers, e.g., SHAP values, LIME, or Partial Dependence Plots, and transparent models, e.g., linear models, decision trees, decision rules, and also EBM, a tree-based generalized additive model. OmniXAI [265], AIX360 [266], and Alibi [267] also provide a collection of various post-hoc explainers and models for all kinds of data types and backends. In contrast to the libraries containing several different tools, individual explainers like SHAP or Anchors, but also interpretable models like the attention-based model TabNet,43 are available as separate Python packages.

All gradient-based methods, e.g., Integrated Gradients [268], can be directly implemented within PyTorch and TensorFlow, or additional libraries like Captum [269], TorchRay [270], and TF-Explain [271] can be used. Captum also comprises a huge number of techniques for explaining image-based data.

VI. NETWORKS FOR MACHINE LEARNING
In the previous sections, we primarily explained ML methods, architectures, and principles to develop ML models. Hence, we focused on applying ML to design and optimize networks, detect patterns and anomalies, and predict network behavior autonomously. We refer to this application as ''ML for Networks'' [272], [273], where ML models are developed from network data to, e.g., design the communication topology of a network or to balance the traffic load.

However, networks and ML form a mutual relationship in which networks support ML, e.g., by using a network as an infrastructure for ML algorithms, both for training and inference. As we will see throughout this section, networks are thus a key success factor for ML by connecting and providing computational power and data storage [274], [275]. We refer to this support and infrastructure functionality of networks as ''Networks for ML''. Important to note is that it is detached from the ML model application. Instead, any ML model can be trained or deployed in a networked system. As ML is a relatively new network task, challenges for networks arising from ML traffic and possible effects of networks on ML are still the subject of research. ''Networks for ML'' generally comprises these open research questions.

ML algorithms primarily use a network to access data from memory or to exchange model parameters/updates. The traffic load generated, the traffic shape, and the network requirements, e.g., regarding latency and robustness, are unknown for many ML methods and are likely to be application-specific and method-specific. All this can pose new challenges for networks and make a better understanding of the mutual relationship between ML and networks necessary. Thus, it is no longer sufficient to evaluate ML model performance alone, but also the network performance. Hence, one might ask the question: Which metrics should be used to evaluate model and network performance when applying ML in networks?

From Section II, we know that several metrics can be used to evaluate the performance of ML models. These metrics could depend on the specific task or application of the model [276]. Although these metrics were introduced for ML models with network applications (''ML for Networks''), it is worth noting that some metrics can also help answer the questions arising in ''Networks for ML''. The choice of metrics will depend on the specific problem and the desired outcome. Hence, ''ML for Networks'' and ''Networks for ML'' are not mutually exclusive [277].

For instance, Data Quality is a metric that can be used for evaluating both. As ML is generally data-driven, data quality is very important for model development. Thus, when ML is applied for network tasks, data quality is primarily measured by the correctness and representativeness of events/classes. This can also be utilized in ''Networks for ML''. However, as it focuses on decentralized data sources, data distribution can additionally be considered. Other metrics typically considered by the ML community are: Privacy, Robustness, Energy Efficiency, and Fairness. However, as ''Networks for ML'' also focuses on network behavior, typical network metrics are often applied, such as Throughput, Latency, Packet loss rate, and Spectral efficiency. Table 14 further explains the metrics and their impact.

So why do these network metrics influence the ML models? High latency and low throughput (as well as low spectral efficiency) can cause delays in the training process, leading to slower training times and increased iteration cycles. Packet loss can impact the accuracy and also the consistency of ML models, because it can lead to incorrect or incomplete data inputs and can cause inconsistent data transfer in the case of retransmissions. This, in turn, can affect the model's ability to generalize, converge, and make accurate predictions.

Different network topologies could affect ''Networks for ML'' performance, scalability, and security. Furthermore, when considering ML for Networks, the choice of network topology can also affect the accuracy and efficiency of the models.

For example, in a star topology, all nodes are directly connected to a central hub, which can make the network easier to manage and administer. From an ML perspective, this topology would lend itself to centralized learning, where data from all nodes is collected and processed in a central location. This approach could simplify the deployment and maintenance of the ML model, but it could also lead to a single point of failure and potential privacy concerns.

On the other hand, a mesh topology, in which nodes are connected in a decentralized fashion, can be more resilient to failures and provide more privacy, but it can also be more difficult to manage. In terms of ML, this topology can be suitable for distributed learning, where each node trains a local model and shares its knowledge with the other nodes. This approach could improve the scalability and privacy of the model, but it could also increase the synchronization overhead.

43 https://dreamquark-ai.github.io/tabnet/
TABLE 14. Examples of metrics for ML for Networks and Networks for ML.
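As a rough illustration of how these metrics interact, the following back-of-the-envelope sketch models one synchronous training iteration as local compute plus one parameter exchange over the network. All numbers, and the simple retransmission model for packet loss, are illustrative assumptions rather than measurements.

```python
# Back-of-the-envelope sketch: one synchronous data-parallel training iteration
# = local compute + parameter/update exchange (illustrative assumptions only).

def iteration_time(t_compute_s, update_bytes, throughput_bps, rtt_s, loss_rate):
    """Estimate seconds per training iteration over a network link."""
    transfer_s = 8 * update_bytes / throughput_bps   # serialize the model update
    # Crude retransmission model: a loss rate p inflates the transferred
    # volume by roughly 1 / (1 - p); real congestion control behaves worse.
    transfer_s /= (1.0 - loss_rate)
    return t_compute_s + rtt_s + transfer_s

# Hypothetical setting: 100 MB of gradients, 1 Gbit/s link, 20 ms RTT, 1% loss.
t = iteration_time(t_compute_s=0.5, update_bytes=100e6,
                   throughput_bps=1e9, rtt_s=0.020, loss_rate=0.01)
print(f"~{t:.2f} s per iteration")  # ~1.33 s: communication dominates compute
```

In this toy setting, more than half of every iteration is spent on the network, which is exactly why throughput, latency, and loss rate appear as ''Networks for ML'' metrics.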
Different network topologies can affect ''Networks for ML'' performance, scalability, and security. Furthermore, when considering ML for Networks, the choice of network topology can also affect the accuracy and efficiency of the models.

For example, in a star topology, all nodes are directly connected to a central hub, which can make the network easier to manage and administer. From an ML perspective, this topology lends itself to centralized learning, where data from all nodes is collected and processed in a central location. This approach can simplify the deployment and maintenance of the ML model, but it can also lead to a single point of failure and potential privacy concerns.

On the other hand, a mesh topology, in which nodes are connected in a decentralized fashion, can be more resilient to failures and provide more privacy, but it can also be more difficult to manage. In terms of ML, this topology is suitable for distributed learning, where each node trains a local model and shares its knowledge with the other nodes. This approach can improve the scalability and privacy of the model, but it can also increase the synchronization overhead.

There are also other network topologies, such as bus, ring, tree, and hybrid, which come with different tradeoffs in terms of network metrics. Choosing the right topology for an ML application depends on several factors, such as the size and complexity of the network, the nature of the data and task, and the available resources and constraints.

Examples of these constraints are computational resources and data availability. Regarding the former, a star topology requires a central server with sufficient computational resources to process all the data, whereas, in a mesh topology, each node can contribute computational resources, reducing the burden on any one node. Regarding the latter, a star network topology stores the data in a single location, which can limit the amount of data available for training. In contrast, a mesh topology can distribute data across multiple nodes, providing a larger and more diverse data set for training. We refer to [279] for a comprehensive survey on the convergence, robustness, and privacy of ML algorithms with respect to network architecture and implementation in the context of 5G networks.

In the following, we explain advanced ML topics with distributed implementations, exploiting both the ''ML for Networks'' and ''Networks for ML'' domains.
A. CENTRALIZED ML
Centralized ML refers to training ML models on a central node of the network using data from multiple nodes, and it is widely applied in networked systems such as the IoT [280], [281]. The data is first collected from various nodes in the network and then transmitted to a central server to train the ML model. Typically, the data is also preprocessed for training, which can happen both on the collecting nodes and on the central server. In many cases, the central server has more computing resources and larger storage space than the collection nodes. Since training and inference are independent, the resulting model can be used both centrally and decentrally for inference. In centralized inference, a central computing node (server) employs the model to infer from the data of various collection nodes. The collection nodes usually send their observed data to the central computing node and receive the model predictions. However, it is also common to distribute the centrally trained ML model to different nodes, which then independently infer from their local data.

Centralized ML takes advantage of consolidation on central servers (e.g., in the cloud) with powerful computing resources that can handle the processing and training of computationally heavy models using large datasets [282]. While the increase in training speed and better resource utilization is obvious, the benefit of more accurate predictions requires a more detailed explanation. Unlike the case where each node trains its model using its local data, centralized ML training benefits from aggregating data from multiple nodes [283]. Thus, the model not only trains on a larger dataset, but the aggregated data also better represents the overall data distribution, allowing the model to generalize better. For example, an ML model can extract significant information from data from different sensor types or locations. This is particularly useful in network applications such as smart cities, environment monitoring, and industrial IoT.

However, there are also several disadvantages associated with centralized ML in networked systems. First, the dependence on a central computing node for model training and inference introduces a single point of failure and scalability issues, potentially impacting the reliability and availability of the system [284]. The centralized approach places high demands on the central server in terms of computing and network performance, making its acquisition and maintenance expensive. Secondly, the data collected by networked devices (e.g., multimedia sensors, intelligent vehicles) is transmitted in large quantities over the network, requiring high data rates. Transferring large amounts of data to a central node can cause network congestion and degrade real-time performance [285]. Moreover, sensitive data may need to be transmitted to the central server, potentially compromising user privacy. Recently, there have been growing concerns about privacy in networked systems with data generated by networked devices, such as wearable devices or sensors, where data is often very private or sensitive [286]. This results in additional requirements for the network over which the data is transmitted, processed, and stored.
B. DISTRIBUTED ML
In various fields of application, the complexity of the tasks being tackled by ML models has led to an increase in the number of model parameters. To cope with this complexity, distributed ML techniques make use of networks of interconnected computing machines to address challenges such as handling larger and distributed datasets, accommodating heightened computing resource demands, and dealing with models that surpass the memory capacity of a single machine. Here, two approaches are prevalent, and both usually take advantage of networking to enhance model training: 1) data-parallel and 2) model-parallel. Combinations of data- and model-parallel methods are also possible.

Data parallelism corresponds to scale-out parallelization and, therefore, increases computational capacity. During training, several machines, so-called workers, train instances of the ML model. These instances operate on distinct and usually non-overlapping portions of the dataset. All instances have the same model structure, number of layers, and number of neurons per layer, but the parameter values can vary. The workers periodically communicate to exchange model parameters and aggregate their updates after processing a predefined number of samples locally. Various data-parallel methods have been formulated, differing primarily in the manner of cooperation among workers during training, encompassing how workers communicate and where update aggregations occur. From this perspective, architectures can be primarily distinguished into Client-Server and Peer-to-Peer methods. Client-Server methods use a set of decentralized workers that process model updates as clients and a centralized server. The server can be a single worker or multiple workers organized equally or in hierarchical layers. Regardless of the server's internal structure, the server maintains the shared model state and stores all model parameters. Clients receive the current model state with its parameter set from the server and communicate their updates only to it. All communication is thus handled by the server, which can become a bottleneck. In contrast, Peer-to-Peer methods entail direct communication of updates among workers without a central server managing the global model state. Which workers can communicate with each other is defined in a communication topology; here, all-to-all but also graph-based topologies such as trees and rings are possible. In addition to the cooperation relationship, data-parallel methods differ in whether workers transmit their updates synchronously or asynchronously and in the amount of communication overhead incurred. In production, where the model is usually only used for inference, the machines share the same model instance.

Model parallelism, on the other hand, splits the model and distributes it across multiple workers, allowing for model sizes larger than the memory of a single machine. Each worker trains and infers only its part of the model, which requires less memory. Consequently, the model in its entirety is upheld collectively by all workers, necessitating constant communication among them during both the training and inference phases. The data is fed to the workers that maintain the input layer of the model, and each worker forwards its computed output to the worker holding the next part of the model. In the backpropagation step during training, the workers holding the output layer first compute the updates. The updates are then propagated to the workers in reverse order and applied. A central challenge within model parallelism lies in devising an effective strategy for partitioning a given model across multiple networked machines. This partitioning determines how the model segments are distributed among workers to optimize communication and computation while maintaining overall model coherence.
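As a toy illustration of this partitioning challenge, the following sketch greedily assigns contiguous layers to two workers while balancing parameter memory. The layer names and sizes are hypothetical, and the heuristic is deliberately simple; production partitioners additionally model activation traffic and per-layer compute time.

```python
# Greedy contiguous model partitioning by parameter memory (illustrative only).
layers = [("embed", 50e6), ("block1", 120e6), ("block2", 120e6),
          ("block3", 120e6), ("head", 90e6)]          # (name, parameter count)
n_workers = 2
budget = sum(p for _, p in layers) / n_workers        # ideal memory per worker

assignment, used, worker = {}, 0.0, 0
for name, params in layers:
    # Move on to the next worker once the current one would exceed its budget;
    # keeping the split contiguous keeps the forward/backward pipeline simple.
    if used + params > budget and worker < n_workers - 1:
        worker, used = worker + 1, 0.0
    assignment[name] = worker
    used += params
print(assignment)  # {'embed': 0, 'block1': 0, 'block2': 1, 'block3': 1, 'head': 1}
```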
Common methods for distributed ML, both data-parallel and model-parallel, are explained below.

C. PARAMETER SERVER
Parameter Server [287], [288] is a data-parallel Client-Server method (cf. Figure 9a). Here, multiple decentralized clients (workers) are connected to a centralized server (the parameter server). The parameter server stores the model parameters, assigns data to workers, and aggregates the updates received from workers. Often, the parameter server is a single machine, but it can also be a set of equivalent or hierarchically structured machines [289]. Each worker maintains an instance of the model and individually computes parameter updates based on its data. Typically, SGD is used for parameter optimization during training. The processed data can either be captured and stored on the worker machine or transmitted from the parameter server. Usually, workers access only (non-overlapping) portions of the data; the complete dataset is thus distributed across multiple workers. After processing a predefined number of data samples, the workers first propose their parameter updates to the parameter server and then receive the updated model. How many other workers have contributed to the updated model, however, depends on the Parameter Server implementation. In synchronous implementations, the parameter server considers updates from all workers, and the workers do not continue processing until the updated model has been broadcast. Therefore, the slowest worker significantly impacts the time for a model update. In contrast, in asynchronous implementations, the parameter server updates and broadcasts the model immediately after receiving an update from the sending worker. Here, workers proceed on different model instances. This is a problem in heterogeneous environments with different computing resources and transmission delays: slower workers working on outdated model instances can derange SGD's solution with their updates, causing the model to converge incorrectly. In homogeneous cluster environments, this is not the case, and the asynchronous variant is often faster than synchronous systems [290]. Since synchronous and asynchronous Parameter Server implementations struggle in heterogeneous environments, time-wise and model-quality-wise, respectively, Parameter Server is typically applied in data centers.
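The following minimal sketch imitates synchronous Parameter Server training in a single process; the toy quadratic loss and the in-memory function calls stand in for the gradient/parameter exchanges that a real deployment performs over the network.

```python
import numpy as np

# Single-process sketch of synchronous Parameter Server rounds (illustrative).
rng = np.random.default_rng(1)
params = np.zeros(10)                                 # global state on the server
shards = [rng.normal(loc=1.0, size=(100, 10)) for _ in range(4)]  # worker data

def local_gradient(theta, data):
    """Worker-side gradient of a toy quadratic loss ||theta - mean(data)||^2."""
    return 2 * (theta - data.mean(axis=0))

for _ in range(50):
    # Server broadcasts `params`; every worker answers with its local gradient.
    grads = [local_gradient(params, shard) for shard in shards]
    # Synchronous server: wait for *all* workers, aggregate, update, rebroadcast.
    params -= 0.1 * np.mean(grads, axis=0)
print(params[:3])  # ≈ 1.0: the optimum of the aggregated objective
# An asynchronous server would instead apply each gradient as it arrives,
# letting fast workers proceed on slightly stale model instances.
```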
D. FEDERATED LEARNING
Federated Learning (FL) [291] is another data-parallel Client-Server distributed ML method; it enables multiple devices to collaboratively train a shared model without sharing their raw data. This approach has gained significant attention in recent years due to its ability to protect user privacy and to enable learning on edge devices with limited computational resources.

In FL, multiple devices, such as smartphones, IoT devices, or edge servers, participate in the training process by locally training a model using their own data and then sending their updated model parameters to a central server. The server aggregates the updates from all devices and uses them to update the global model. Figure 9b shows an FL scenario with three connected devices and a central server. The key idea behind FL is that the global model is trained using a large amount of data from multiple devices, while each device only needs to share its model updates. This allows FL to approach the performance of traditional centralized learning while preserving user privacy.

One of the most widely used FL algorithms is Federated Averaging (FedAvg). FedAvg is designed to address several challenges that arise in FL, including the need to preserve data privacy, mitigate bias and inconsistency across devices, reduce communication overhead, and enable model convergence. FedAvg works by having each device train its own local model using its local data; the local models are then aggregated into a global model that is distributed back to the devices for further training. To address the challenge of bias and inconsistency across devices, FedAvg uses a weighted average of the local models, with the weights determined by the amount of data each device contributes. This ensures that each device's contribution is weighted appropriately, producing a more representative and robust global model.

By training a model locally, FL allows devices to make predictions and decisions without a constant network connection to a central server. This is particularly useful in applications such as autonomous vehicles, drones, and medical devices, where data needs to be processed in real-time. Additionally, FL is beneficial in scenarios where data is sensitive and cannot be shared, such as medical imaging or financial data. Another essential benefit of FL is its ability to handle data that is non-IID (i.e., not Independent and Identically Distributed), a common characteristic of data collected from networked devices.

In traditional centralized learning, data is often assumed to be IID, meaning that it has the same distribution across all devices. In practice, however, each device can have its own data distribution, which can lead to biased or suboptimal models. FL algorithms such as Federated Averaging [291], Federated Transfer Learning [292], and Federated Meta-Learning [293] have been proposed to address these issues.
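A minimal single-process sketch of FedAvg's weighted aggregation follows; the toy quadratic loss stands in for local neural-network training, and the in-memory loop stands in for the device-server communication.

```python
import numpy as np

def fedavg_round(global_w, device_data, local_steps=5, lr=0.1):
    """One FedAvg round: local SGD on every device, then a weighted average."""
    new_weights, sizes = [], []
    for data in device_data:                 # runs on each device in practice
        w = global_w.copy()
        for _ in range(local_steps):         # local SGD on a toy quadratic loss
            w -= lr * 2 * (w - data.mean(axis=0))
        new_weights.append(w)
        sizes.append(len(data))
    # Devices holding more data contribute proportionally more to the average.
    return np.average(new_weights, axis=0, weights=np.asarray(sizes, float))

rng = np.random.default_rng(2)
devices = [rng.normal(loc=mu, size=(n, 3))   # non-IID: per-device distributions
           for mu, n in [(0.0, 200), (1.0, 50), (2.0, 50)]]
w = np.zeros(3)
for _ in range(20):
    w = fedavg_round(w, devices)
print(w)  # ≈ [0.5, 0.5, 0.5]: the size-weighted mean of the device optima
```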
E. ALL-REDUCE
The All-Reduce approach [294] is a data-parallel distributed ML method for training ML models that implements the Peer-to-Peer concept. It therefore dispenses with a central server; instead, workers communicate directly. Which workers communicate with one another is specified by the communication topology used. Multiple communication topologies are possible for the All-Reduce approach, e.g., ring [295], butterfly [296], and tree [297] topologies. The communication topologies affect the data rate and latency of the network differently. In some cases, the topology also restricts access to the data set.

In principle, each worker maintains an instance of the model and individually computes updates on its assigned portion of the data. The data is usually distributed at the beginning of the training. After processing a predefined number of data samples, the workers communicate their local updates to all their peers. Shortly after, they receive the updates of their peers and aggregate them with their own. This step of communication and aggregation can be repeated several times. When all updates are distributed to all workers, each worker adjusts its model instance parameters according to the aggregated updates and proceeds to produce the next local updates. The repetitive and expensive communication of updates guarantees that all workers work with the same model instance [294]. Figure 9c illustrates the communication topology of a ring All-Reduce approach.
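The ring topology can be made concrete: classic ring All-Reduce runs a reduce-scatter phase followed by an all-gather phase, so that after 2·(N−1) steps every worker holds the complete aggregate while each link carries only one chunk per step. The following single-process simulation is illustrative only; real implementations ship the chunks between neighboring machines.

```python
import numpy as np

def ring_allreduce(grads):
    """Simulate ring All-Reduce: every worker ends up with the element-wise sum."""
    n = len(grads)
    chunks = [list(np.array_split(g.astype(float), n)) for g in grads]
    for s in range(n - 1):                    # phase 1: reduce-scatter
        for r in range(n):
            c = (r - s - 1) % n               # chunk received from neighbor r-1
            chunks[r][c] = chunks[r][c] + chunks[(r - 1) % n][c]
    for s in range(n - 1):                    # phase 2: all-gather
        for r in range(n):
            c = (r - s) % n                   # fully reduced chunk from r-1
            chunks[r][c] = chunks[(r - 1) % n][c].copy()
    return [np.concatenate(ch) for ch in chunks]

grads = [np.full(8, float(rank)) for rank in range(4)]   # one vector per worker
out = ring_allreduce(grads)
assert all(np.allclose(o, 0 + 1 + 2 + 3) for o in out)   # everyone holds the sum
```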
F. SPLIT LEARNING AND INFERENCE
Split Learning (SL) [298], [299] is a model-parallel distributed ML method that decouples model training from the need for direct access to the raw data; the model is split into at least two sub-models. It is similar to FL, but it focuses on the case where devices have low computational power, memory constraints, or a limited energy budget. In contrast to FL, where devices typically train a model locally and send the updated model parameters to a central server, in SL the devices only forward a feature representation of their data to the central server, which performs the model updates.

In SL, the model is split into at least two parts, with one part running on the device and the other part running on the central server. Figure 9d shows the SL representation with three devices. The key idea behind SL is that the device part of the model is lightweight and can be run on devices with low computational power, instead of the entire, computationally demanding model. Thus, SL enables model training and inference on devices with low computing resources.

SL over networked devices is particularly useful in scenarios where devices have low computational power but high communication bandwidth. For example, in a network of smartphones, each smartphone may have a camera that captures images, but the device may not have the computational power to process the images. SL can be used to train a model that can classify images without needing to process the images on the device.
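A minimal numerical sketch of SL follows, assuming a single hidden layer as the device-side head and a toy regression loss. Only the smashed activations and the cut-layer gradient would cross the network; labels are assumed to be shared with the server, as in vanilla SL.

```python
import numpy as np

rng = np.random.default_rng(3)
W_head = rng.normal(size=(16, 4)) * 0.1   # stays on the device
W_tail = rng.normal(size=(4, 1)) * 0.1    # stays on the server
x, y = rng.normal(size=(64, 16)), rng.normal(size=(64, 1))
lr = 0.05

for _ in range(200):
    smashed = np.tanh(x @ W_head)     # device forward; send `smashed` (and labels)
    pred = smashed @ W_tail           # server forward
    err = (pred - y) / len(x)         # server: gradient of a 0.5*MSE loss
    grad_smashed = err @ W_tail.T     # server returns the cut-layer gradient
    W_tail -= lr * smashed.T @ err    # server updates its sub-model
    W_head -= lr * x.T @ (grad_smashed * (1 - smashed**2))  # device update
```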
1) FEDERATED SPLIT LEARNING
Federated Split Learning (FSL) [300], [301] is a distributed algorithm that combines the idea of computing a weighted average, characteristic of the FL architecture, with the neural network split between client and server of the SL architecture. It thus combines data and model parallelism. In FSL, all clients compute in parallel and independently. They send/receive their smashed data to/from the server in parallel. The client-side sub-network synchronization, i.e., forming the global client-side network, is done by aggregating (e.g., weighted averaging) all local client-side networks on a separate server.
2) SPLIT COMPUTING
Splitting a neural network for inference tasks is usually called Split Computing (SC). It is very similar to SL, as a model is split into sub-models that are then distributed on multiple devices communicating with each other. It is helpful in scenarios where sensor devices are resource-limited and cannot deploy full models. Instead of offloading the sensor data, the sensor can compute a part of the model and then transmit the compressed feature representation, resulting in a smaller end-to-end latency [302].

Most works focus on a simple client-server scenario. The model is then split into a head and a tail part. The client, a sensor, gathers sensor data, feeds it into the head of the model, and then transmits the feature representation to the server. The server receives the feature representation and completes the inference process using the tail. In this client-server scenario, the main challenges are to minimize the head with regard to computation and size on the client, as sensors have limited resources, and to minimize the amount of communication while making sure that the model does not lose too much accuracy.
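The communication saving can be illustrated with hypothetical tensor shapes (the cut-layer shape below is an assumption, not taken from any specific model):

```python
import numpy as np

# Offloading a compressed feature instead of the raw input (illustrative sizes).
raw = np.zeros((3, 224, 224), dtype=np.uint8)        # e.g., a camera frame
feature = np.zeros((16, 14, 14), dtype=np.float16)   # hypothetical bottleneck output

print(f"raw: {raw.nbytes / 1e3:.0f} kB, feature: {feature.nbytes / 1e3:.1f} kB "
      f"({raw.nbytes / feature.nbytes:.0f}x less to transmit)")
# raw: 151 kB, feature: 6.3 kB (24x less to transmit)
# Over a 10 Mbit/s uplink this is roughly 120 ms vs. 5 ms of airtime per frame.
```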
Matsubara et al. [303] provide a comprehensive survey describing many proposed methods to optimize SC; it also contains links to code where available. With sc2bench [304], there is also a pip package to test and compare several SC techniques while providing a framework for creating your own method.
VII. FURTHER READINGS
Several related survey and tutorial papers exist that cover parts of the interplay between ML and networking, to a varying extent and on varying scales of granularity. Table 16 lists the most related of these papers while highlighting their ML scope, their covered network applications, and whether they focus on ML for Networks (ML4N) or Networks for ML (N4ML).

TABLE 16. Selective surveys & tutorials on using ML for Networks (ML4N) and Networks for ML (N4ML).

Perhaps the most comprehensive survey on ML for Networks, [308] discusses ML approaches for a wide range of networking challenges and provides further references to more specialized surveys about ML approaches in certain networking domains. The work of [276] considers itself an update to [308], covering more recent developments and discussing recent IDS datasets. Additionally, several surveys consider ML approaches for a subset of networked systems, such as vehicular networks in [316] and [317], Software-Defined Networks (SDN) in [307], mobile/wireless/ubiquitous networks in [4] and [279], edge computing in [305] and [306], or network traffic monitoring and analysis in [314]. The work of [309] takes a unique stance and covers the joint application of recent ML and Blockchain technologies to networking problems. Other surveys focus on specific ML subdomains such as unsupervised learning [29], deep learning [272], or distributed ML [310].

The work presented in [311] and [312] specifically considers the role of FL in networking. While [311] discusses several FL applications in the domain of communications and networking, [312] focuses on mobile edge computing but also discusses how communication techniques influence FL methods. The studies [285], [313] provide an overview of various applications of ML methods in IoT systems and analyze various approaches for distributing and processing ML models in the cloud-to-things continuum. The survey [284] discusses the convergence of edge computing methods and ML; specifically, it provides a comprehensive view of how networking can be utilized for cooperative processing of deep learning models on edge devices. The survey [303] provides insights into how networked devices such as smartphones and autonomous vehicles are used for collaborative training of ML models, and into inference operations over the network using split computing and early-exit methods.

Concerning the role of XAI in networking, the amount of survey work is limited. The work of [319] motivates the usage of XAI methods for networking challenges but only covers a single concrete problem. While there exist survey papers on XAI [320] and Explainable Reinforcement Learning (XRL) [321] in general (i.e., not limited to networking), to the best of our knowledge only [318] surveys XAI techniques in the domain of networking, namely in challenges related to wireless/6G.
VIII. CHALLENGES AND FUTURE DIRECTIONS
The adoption of ML in networks also brings forth several challenges and opens up exciting future directions for research and development, both for ML for Networks and for Networks for ML. In this section, we touch on some of these challenges, while we refer to [322] for further discussions on the limitations and challenges of ML in general and, more specifically, of applying ML for Networks [276], [323]. The following are some of the current challenges in ML for Networks:
• Scalability: One of the critical challenges in ML for Networks is scaling up models to handle large-scale networks with millions of nodes and edges. Most ML approaches are initially developed and tested on small-scale networks to better debug them and understand their effect on individual network components. However, making them work at scale is not always trivial, because large-scale network structures might lead to computation time explosions (as has been indicated, e.g., for SDN in [324]), especially for problems where global decisions are taken in a centralized manner.
• Limited data: Another challenge in ML for Networks is the limited amount of data available for training. Collecting and labeling network data is a time-consuming and costly process, and in some cases, data may be proprietary or sensitive, making it difficult to obtain.
• Interpretability: A further challenge is the lack of interpretability of ML models. In many cases, it is difficult to understand how a model arrived at a particular decision or prediction, making it challenging to debug or troubleshoot issues.
• Heterogeneous data: Networks often contain heterogeneous data from multiple sources, such as text, images, and numerical data. Incorporating this data into ML models and designing models that can effectively handle heterogeneous data is another challenge that requires further research.
• Robustness: ML models are vulnerable to attacks and adversarial examples, especially in network environments where data may be noisy or corrupted.
• Real-time decision-making in closed-loop systems: In many Network Control System (NCS) environments, decisions must be made in real-time, requiring efficient and fast ML model inference [325], [326]. Developing algorithms that can make accurate but fast decisions in real-time is a significant challenge in ML for Networks. One of the core problems is the potential for unstable system behavior caused by a mismatch between the intended NCS sampling time and the time required for inference of an ML model. As a result, input delays affect the resulting system and must be handled carefully [327]. Hence, there is a trade-off between large models that can handle large-scale networks (the scalability challenge) and the time required for their inference. In general, the inference time required by AI and ML models will be a non-trivial function of the resulting closed-loop system in which they are embedded. For RL, delays due to model inference can be explicitly included in the modeling, resulting in the notion of real-time MDPs and real-time RL algorithms [328]. Beyond cyber-physical closed-loop systems, model inference delay impacts user experience when prompt-driven LLM, IoT, or VR services are run via edge computing networks [329]. In other words, in these cases the system loop is closed via human feedback, where unstable behavior will eventually result in performance loss.
• Energy efficiency: ML models often require significant computational resources, which can be challenging in resource-constrained network environments. As the current trend points towards ever-increasing model scales, energy efficiency might become an even more important aspect in even more situations.
• Privacy and security: Networks can contain sensitive and private data, which requires ML algorithms to be developed with strong privacy and security safeguards. ML algorithms for networks must maintain data privacy while providing accurate predictions.
• Network complexity: Computer networks can be highly complex and dynamic, with large numbers of interconnected nodes, an interplay of various different protocols, and changing operating conditions. This makes it challenging to develop accurate ML models, since formulating ML problems for complex application domains, or for sub-problems where suitable training data is available, often requires several simplifying and/or narrowing assumptions at the start [330]. Leaving out such assumptions one by one brings ML systems closer to deployment in real-world scenarios, but this is often a non-trivial task that brings unexpected challenges at every step along the way.
On the other hand, the challenges related to Networks for ML include:
• Resource constraints: ML algorithms often require significant computational resources, including processing power, memory, and storage. Moreover, the training of ML models requires large amounts of data, and transferring this data across networks can be time-consuming and resource-intensive. This can be a challenge in resource-constrained networks, such as those in IoT devices and edge computing environments, or when specialized networking hardware disallows certain compute operations. In addition, storing data in a centralized location can create a bottleneck and security issues.
• Latency: Network latency can affect the performance of ML algorithms, particularly in real-time applications where decisions must be made quickly. High latency can lead to delays in data transmission and processing, which can negatively impact the accuracy and effectiveness of the algorithm [331].
• Bandwidth: ML algorithms often require large amounts of bandwidth to transfer data, which can be a challenge in networks with limited bandwidth. High bandwidth requirements can also lead to increased costs for network infrastructure in a real-world deployment.
• Network topology: The topology of a network can impact the performance of ML algorithms. For example, networks with high levels of congestion or interference may not be suitable for real-time applications.
• Privacy and security: ML algorithms require access to data, which can create potential privacy and security risks, increasing the risk of data breaches and cyber-attacks during transmission over the network or remote processing of user data.
• Heterogeneous resources: The computing and communication resources of the devices used for processing ML algorithms over the network may vary widely, leading to unstable training processes. Furthermore, this can lead to the presence of slower devices (stragglers) that slow down the training of a global model and affect the model's efficiency.
As mentioned earlier in Section VI, some of these challenges may overlap, such as privacy and security. Overall, ML for Networks and Networks for ML are rapidly growing fields with many challenges and opportunities for future research. Addressing these challenges will require collaboration between researchers from different disciplines. In the following sections, we discuss some of the trending applications that focus on these challenges.
A. A NEW PARADIGM FOR NEXT-GENERATION WIRELESS NETWORKS
The rapid advancement of AI and ML technologies has also opened up new vistas for next-generation wireless networks like 5G Advanced and 6G. These next-generation networks essentially serve two purposes: data transport and service delivery. They comprise various types of devices, from User Equipments (UEs), base stations, switches, and routers to servers in a data center. With the integration of SDN and Network Function Virtualization (NFV), all devices can now constantly adapt to new situations, such as changing traffic patterns, better function placements, or new service demands, and incorporate AI and ML [332]. These technologies promise to revolutionize the way we design and manage wireless networks, leading to the emergence of AI-native networks and AI-native air interfaces.

On the one hand, AI-native networks are networks designed with AI integration at their core, rather than as an afterthought or add-on. Hence, AI (partially) replaces human-defined rules, models, and algorithms, which may not be optimal or scalable for complex and dynamic wireless scenarios, so that these networks can learn, adapt, and optimize themselves autonomously and intelligently.

On the other hand, an AI-native air interface is an air interface that uses AI and ML to define and configure its physical and medium access control layer parameters, such as waveforms, constellations, pilots, coding, modulation, synchronization, channel estimation, equalization, detection, decoding, and access schemes [333].

One of the main challenges here is the complexity and heterogeneity of wireless networks. This complexity makes it difficult to collect, process, and analyze data in real-time [333]. However, this can be mitigated by using distributed AI engines, which can process data closer to the source and reduce latency. Another challenge is the lack of standardized frameworks and architectures for implementing AI in networks. To address this challenge, industry and academia collaborate to develop standardized AI frameworks and tools that can be used across different networks [334], [335]. There are four aspects to addressing this challenge [336]:

1) DATA INFRASTRUCTURE
A distributed data infrastructure that can handle massive amounts of varied, distributed, and dynamic data, and that enables data ingestion, processing, and exposure across layers and domains.

2) INTELLIGENCE EVERYWHERE
A comprehensive and automated management of AI models, from training to deployment to monitoring, and the ability to handle model drift, retraining, and versioning. This would take place at every network layer and on every network device.

3) ZERO TOUCH
A high degree of automation and autonomy for the management of AI and data, and the ability to express and supervise high-level goals rather than low-level actions.

4) AI AS A SERVICE
The exposure of AI and data services to external parties, such as service providers or customers, and the creation of a platform for innovation and collaboration.

For further reading on the evaluation metrics of such networks, we refer to [337]. The authors in [338] also provide a roadmap with potential frameworks for building such networks.
B. DEEP NEURAL NETWORK MODEL COMPLEXITY AND ENERGY CONSUMPTION
The increasing complexity of DNNs has direct implications for energy consumption, a critical factor in both environmental sustainability and practical deployment [339]. The complexity of DNNs is largely driven by the depth and breadth of the network architecture. As DNNs grow deeper (with more layers) and wider (with more neurons in each layer), they can capture more intricate patterns in data. This increased capacity, while potentially beneficial for model accuracy, leads to a higher number of computations during both the training and inference phases [284]. Each computation requires a certain amount of energy, and thus, as models grow more complex, their energy requirements escalate.

The energy consumption of DNNs is a multifaceted issue. Training DNNs is an energy-intensive process that requires substantial computational resources [340]. This phase often necessitates the use of high-performance GPUs or even clusters of GPUs, which are power-hungry devices [341]. The electricity consumption during this phase is considerable, contributing to the overall energy footprint of developing DNNs. The inference phase, where DNNs make predictions on new data, also demands a considerable amount of energy [342]. This phase is critical in real-world applications where continuous or on-demand operation of DNNs is required, such as in autonomous systems or real-time analysis applications.

The substantial energy consumption of DNNs poses a significant challenge for environmental sustainability. As these networks become more prevalent across various sectors, the need for energy-efficient neural network architectures and training methods becomes increasingly important [343]. In energy-constrained environments (e.g., with battery-operated devices), the energy demands of DNNs are a crucial consideration. This has led to a focus on balancing model complexity with energy efficiency, driving innovation in optimization techniques and the development of specialized hardware to run these models more efficiently [344]. Moreover, different models and benchmarks are used to estimate and plan the energy consumption of DNNs [345], [346], [347].
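A crude way to reason about this is to multiply estimated training FLOPs by an assumed accelerator efficiency. Every number in the following sketch is an illustrative assumption rather than a benchmark result; see [345], [346], [347] for proper estimation models.

```python
# Back-of-the-envelope training-energy sketch (all values are assumptions).
params = 1e9                   # hypothetical 1B-parameter dense model
samples = 20e9                 # tokens/samples processed during training
flops = 6 * params * samples   # common rule of thumb: ~6 FLOPs/parameter/sample
flops_per_joule = 1e11         # assumed effective GPU efficiency incl. overheads

energy_kwh = flops / flops_per_joule / 3.6e6
print(f"~{energy_kwh:.0f} kWh")  # ~333 kWh for this toy configuration
```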
C. TINY MACHINE LEARNING
Tiny Machine Learning (TinyML) is an emerging field that combines ML with ultra-low-power computing, typically found in microcontrollers and small IoT devices [348]. Its goal is to deploy efficient ML models that can operate in environments with limited memory, processing power, and energy. This is particularly relevant for applications where traditional ML models would be impractical due to their size and energy requirements.

The primary motivation for TinyML is the need for localized data processing, especially in situations where privacy, speed, and power efficiency are critical, rather than transmitting the data to a centralized server or cloud [349]. This applies to many applications, spanning from smart home devices and wearable technology to healthcare monitoring and environmental sensors [285].

The core implementation of TinyML relies on ML model quantization, which reduces a model's numerical precision and size. Nevertheless, implementing TinyML in environments with limited resources presents several ongoing challenges: the low computational capabilities and storage capacities of smaller devices restrict the complexity of the models that can be deployed [350]. This constraint can adversely affect the efficacy and precision of TinyML-based applications. To address this, some research suggests the integration of cooperative ML (Section VI) and TinyML approaches [342], [351]. This strategy would enable devices with constrained resources to work collaboratively on ML tasks. Moreover, progress in hardware development, particularly in creating more efficient microcontrollers and sensors, is expected to broaden the range of possible applications for TinyML. For a recent survey of TinyML applications and techniques, we refer to [352].
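A minimal sketch of the underlying idea, using per-tensor affine quantization from float32 to 8-bit integers; real TinyML toolchains add calibration data, per-channel scales, and quantized operators.

```python
import numpy as np

# Post-training affine quantization: map float32 weights to uint8 (~4x smaller).
w = np.random.default_rng(4).normal(scale=0.2, size=1000).astype(np.float32)

scale = (w.max() - w.min()) / 255.0            # one (scale, zero_point) per tensor
zero_point = np.round(-w.min() / scale)
q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
w_hat = (q.astype(np.float32) - zero_point) * scale   # dequantize to check error

print(f"size: {w.nbytes} B -> {q.nbytes} B, "
      f"max abs error: {np.abs(w - w_hat).max():.5f}")  # error ≈ scale / 2
```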
IX. CONCLUSION
The aim of this paper is to provide interested but inexperienced readers with an inspiring and practical jumpstart for research at the intersection of ML and computer networking. This encompasses not only the creation of novel ML-powered solutions for the covered networking scenarios but also leveraging established networking technology to enhance existing ML approaches.

Compared to the aforementioned surveys and tutorials (Section VII), we are the first to provide a comprehensive bidirectional overview of ML and XAI techniques across different networking fields, and vice versa.44 Furthermore, in addition to an overview of the current state of the art, our work provides practical guidance for aspiring researchers to shortcut their way into meaningful research:
• Many of the mentioned related papers do not consider datasets and/or starting points to reproduce the results or even to just start experimenting. In contrast, we refer to publicly available datasets as well as to methods and tools to generate synthetic datasets (Section IV) and to design ML models suitable for the respective task.
• We categorize existing approaches as ML serving networks (ML4N) and networks serving ML (N4ML) based on the metrics used, which helps to identify research gaps and possible future directions of research.
We introduced the most popular ML techniques, model types, and tools, as well as several practical aspects to consider when practicing ML, such as obtaining high-quality data for the learning algorithm or incorporating inductive biases (more specifically, for networking data and network topologies) into ML models in order to reduce resource requirements. Secondly, we introduced the most common computer networking problem domains and pointed to existing tools and datasets to accelerate and facilitate ML research on networking problems. Thirdly, we introduced how XAI methods can improve the transparency of ML models' decisions and thus push their acceptance in the computer networks research domain and their suitability in productive environments. We also elaborated on how networking techniques can boost the performance of existing ML setups and workflows, e.g., through several approaches for distributed learning.

Lastly, we provided a large number of pointers for further reading, such as surveys on more specific ML/networking domains, example research works for some of the problems introduced in this paper, and links to many of the mentioned datasets and tools.

Despite our comprehensive coverage of established tools, approaches, and recent breakthroughs, it is important to acknowledge the dynamic nature of ML research. The field is characterized by the emergence of new algorithms, the potential availability of additional tools and features in the future, and the hopeful prospect of more open-sourced datasets. While this evolution is happening at an unprecedented pace, this paper still serves as a valuable starting point for researchers and newcomers alike and provides a timely and relevant contribution to the intersection of the fields of ML and computer networking.

44 We do not aim for a comprehensive review of state-of-the-art research in ML or its sub-disciplines, as there are numerous survey and tutorial resources that provide an excellent ML-focused overview. Rather, we view ML techniques solely in relation to networking, either as facilitators (ML for Networks) or beneficiaries (Networks for ML).
ACKNOWLEDGMENT
The authors alone are responsible for the content. This work is a result of a cooperation and continuous knowledge exchange between participants of the MaLeNe Workshop 2022.
REFERENCES
[1] J. M. Stokes, K. Yang, K. Swanson, W. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, V. M. Tran, A. Chiappino-Pepe, A. H. Badran, I. W. Andrews, E. J. Chory, G. M. Church, E. D. Brown, T. S. Jaakkola, R. Barzilay, and J. J. Collins, ''A deep learning approach to antibiotic discovery,'' Cell, vol. 180, no. 4, pp. 688–702, Feb. 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0092867420301021
[2] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, ''High-resolution image synthesis with latent diffusion models,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 10674–10685.
[3] A. Davies, P. Veličković, L. Buesing, S. Blackwell, D. Zheng, N. Tomašev, R. Tanburn, P. Battaglia, C. Blundell, A. Juhász, M. Lackenby, G. Williamson, D. Hassabis, and P. Kohli, ''Advancing mathematics by guiding human intuition with AI,'' Nature, vol. 600, no. 7887, pp. 70–74, Dec. 2021. [Online]. Available: https://www.nature.com/articles/s41586-021-04086-x
[4] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, ''Artificial neural networks-based machine learning for wireless networks: A tutorial,'' IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3039–3071, 4th Quart., 2019.
[5] S. J. Russell, Artificial Intelligence: A Modern Approach. London, U.K.: Pearson, 2010.
[6] M. F. A. Fauzi, R. Nordin, N. F. Abdullah, and H. A. H. Alobaidy, ''Mobile network coverage prediction based on supervised machine learning algorithms,'' IEEE Access, vol. 10, pp. 55782–55793, 2022.
[7] C. Ioannou and V. Vassiliou, ''Classifying security attacks in IoT networks using supervised learning,'' in Proc. 15th Int. Conf. Distrib. Comput. Sensor Syst. (DCOSS), May 2019, pp. 652–658.
[8] W. Hu, Y. Liao, and R. Vemuri, ''Robust anomaly detection using support vector machines,'' in Proc. Int. Conf. Mach. Learn., Jun. 2003, pp. 282–289.
[9] B. Mohammed, I. Awan, H. Ugail, and M. Younas, ''Failure prediction using machine learning in a virtualised HPC system and application,'' Cluster Comput., vol. 22, no. 2, pp. 471–485, Jun. 2019.
[10] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, ''Support vector machines,'' IEEE Intell. Syst. Appl., vol. 13, no. 4, pp. 18–28, Aug. 1998.
[11] S. B. Kotsiantis, ''Decision trees: A recent overview,'' Artif. Intell. Rev., vol. 39, no. 4, pp. 261–283, Apr. 2013.
[12] L. Breiman, ''Random forests,'' Mach. Learn., vol. 45, pp. 5–32, Oct. 2001.
[13] G. Shakhnarovich, T. Darrell, and P. Indyk, ''Nearest-neighbor methods in learning and vision,'' IEEE Trans. Neural Netw., vol. 19, no. 2, p. 377, Feb. 2008.
[14] M. Nasri and M. Hamdi, ''LTE QoS parameters prediction using multivariate linear regression algorithm,'' in Proc. 22nd Conf. Innov. Clouds, Internet Netw. Workshops (ICIN), Feb. 2019, pp. 145–150.
[15] A. Y. Nikravesh, S. A. Ajila, C.-H. Lung, and W. Ding, ''Mobile network traffic prediction using MLP, MLPWD, and SVM,'' in Proc. IEEE Int. Congr. Big Data (BigData Congr.), Jun. 2016, pp. 402–409.
[16] A. J. Smola and B. Schölkopf, ''A tutorial on support vector regression,'' Statist. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.
[17] C.-Y. Hsu, P.-Y. Chen, S. Lu, S. Liu, and C.-M. Yu, ''Adversarial examples can be effective data augmentation for unsupervised machine learning,'' in Proc. AAAI Conf. Artif. Intell., 2021, pp. 6926–6934.
[18] D. Kim and J. Choi, ''Unsupervised representation learning for binary networks by joint classifier learning,'' 2021, arXiv:2110.08851.
[19] E. Schubert, J. Sander, M. Ester, H. P. Kriegel, and X. Xu, ''DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN,'' ACM Trans. Database Syst., vol. 42, no. 3, pp. 1–21, Sep. 2017.
[20] J. Li, H. Izakian, W. Pedrycz, and I. Jamal, ''Clustering-based anomaly detection in multivariate time series data,'' Appl. Soft Comput., vol. 100, Mar. 2021, Art. no. 106919.
[21] I. Ullah and H. Y. Youn, ''Task classification and scheduling based on K-means clustering for edge computing,'' Wireless Pers. Commun., vol. 113, no. 4, pp. 2611–2624, Aug. 2020.
[22] Z. Fan and R. Liu, ''Investigation of machine learning based network traffic classification,'' in Proc. Int. Symp. Wireless Commun. Syst. (ISWCS), Aug. 2017, pp. 1–6.
[23] R. Bellman, ''Dynamic programming,'' Science, vol. 153, no. 3731, pp. 34–37, 1966.
[24] H. Abdi and L. J. Williams, ''Principal component analysis,'' WIREs Comput. Statist., vol. 2, no. 4, pp. 433–459, Jul./Aug. 2010.
[25] C. Fefferman, S. Mitter, and H. Narayanan, ''Testing the manifold hypothesis,'' J. Amer. Math. Soc., vol. 29, no. 4, pp. 983–1049, Feb. 2016.
[26] U. Narayanan, A. Unnikrishnan, V. Paul, and S. Joseph, ''A survey on various supervised classification algorithms,'' in Proc. Int. Conf. Energy, Commun., Data Anal. Soft Comput. (ICECDS), Aug. 2017, pp. 2118–2124.
[27] J. E. van Engelen and H. H. Hoos, ''A survey on semi-supervised learning,'' Mach. Learn., vol. 109, no. 2, pp. 373–440, Feb. 2020, doi: 10.1007/s10994-019-05855-6.
[28] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, ''Machine learning in wireless sensor networks: Algorithms, strategies, and applications,'' IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 1996–2018, 4th Quart., 2014.
[29] M. Usama, J. Qadir, A. Raza, H. Arif, K. A. Yau, Y. Elkhatib, A. Hussain, and A. Al-Fuqaha, ''Unsupervised machine learning for networking: Techniques, applications and research challenges,'' IEEE Access, vol. 7, pp. 65579–65615, 2019.
[30] Z. Ghahramani, ''Probabilistic machine learning and artificial intelligence,'' Nature, vol. 521, no. 7553, pp. 452–459, May 2015. [Online]. Available: https://www.nature.com/articles/nature14541
[31] K. P. Murphy, Probabilistic Machine Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2022. [Online]. Available: https://probml.github.io/pml-book/book1.html
[32] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, and J. Tang, ''Self-supervised learning: Generative or contrastive,'' IEEE Trans. Knowl. Data Eng., vol. 35, no. 1, pp. 857–876, Jan. 2023.
[33] F. Ebert, C. Finn, A. X. Lee, and S. Levine, ''Self-supervised visual planning with temporal skip connections,'' in Proc. Conf. Robot Learn., 2017, pp. 1–13.
[34] S. Meyn, Control Systems and Reinforcement Learning. Cambridge, U.K.: Cambridge Univ. Press, 2022.
[35] Y. Xu, G. Gui, H. Gacanin, and F. Adachi, ''A survey on resource allocation for 5G heterogeneous networks: Current research, future trends, and challenges,'' IEEE Commun. Surveys Tuts., vol. 23, no. 2, pp. 668–695, 2nd Quart., 2021.
[36] M. M. Sadeeq, N. M. Abdulkareem, S. R. M. Zeebaree, D. M. Ahmed, A. S. Sami, and R. R. Zebari, ''IoT and cloud computing issues, challenges and opportunities: A review,'' Qubahan Academic J., vol. 1, no. 2, pp. 1–7, Mar. 2021.
[37] P. Kumar and R. Kumar, ''Issues and challenges of load balancing techniques in cloud computing: A survey,'' ACM Comput. Surveys, vol. 51, no. 6, pp. 1–35, Nov. 2019.
[38] A. Alwarafy, M. Abdallah, B. S. Ciftler, A. Al-Fuqaha, and M. Hamdi, ''Deep reinforcement learning for radio resource allocation and management in next generation heterogeneous wireless networks: A survey,'' 2021, arXiv:2106.00574.
[39] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[40] D. Bertsekas, Dynamic Programming and Optimal Control, vol. 2. Nashua, NH, USA: Athena Scientific, 2012.
[41] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, vol. 5. Belmont, MA, USA: Athena Scientific, 1996.
[42] T. M. Moerland, J. Broekens, A. Plaat, and C. M. Jonker, ''Model-based reinforcement learning: A survey,'' Found. Trends Mach. Learn., vol. 16, no. 1, pp. 1–118, 2023.
[43] Y.-P. Hsu, E. Modiano, and L. Duan, ''Age of information: Design and analysis of optimal scheduling algorithms,'' in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jun. 2017, pp. 561–565.
[44] Q. Sykora, M. Ren, and R. Urtasun, ''Multi-agent routing value iteration network,'' in Proc. Int. Conf. Mach. Learn., 2020, pp. 9300–9310.
[45] S. S. Mwanje, L. C. Schmelz, and A. Mitschele-Thiel, ''Cognitive cellular networks: A Q-learning framework for self-organizing networks,'' IEEE Trans. Netw. Service Manage., vol. 13, no. 1, pp. 85–98, Mar. 2016.
[46] Y. Kim, S. Kim, and H. Lim, ''Reinforcement learning based resource management for network slicing,'' Appl. Sci., vol. 9, no. 11, p. 2361, Jun. 2019.
[47] H. Afifi and H. Karl, ''Reinforcement learning for virtual network embedding in wireless sensor networks,'' in Proc. 16th Int. Conf. Wireless Mobile Comput., Netw. Commun. (WiMob), Oct. 2020, pp. 123–128.
[48] A. Geramifard, ''A tutorial on linear function approximators for dynamic programming and reinforcement learning,'' Found. Trends Mach. Learn., vol. 6, no. 4, pp. 375–451, 2013.
[49] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, ''Policy gradient methods for reinforcement learning with function approximation,'' in Proc. Adv. Neural Inf. Process. Syst., vol. 12, 1999, pp. 1–12.
[50] R. J. Williams, ''Simple statistical gradient-following algorithms for connectionist reinforcement learning,'' in Reinforcement Learning. Boston, MA, USA: Springer, 1992, pp. 5–32.
[51] I. Grondman, L. Busoniu, G. A. D. Lopes, and R. Babuska, ''A survey of actor-critic reinforcement learning: Standard and natural policy gradients,'' IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 42, no. 6, pp. 1291–1307, Nov. 2012.
[52] V. Mnih, ''Asynchronous methods for deep reinforcement learning,'' in Proc. Int. Conf. Mach. Learn., 2016, pp. 1928–1937.
[53] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, ''Resource management with deep reinforcement learning,'' in Proc. 15th ACM Workshop Hot Topics Netw., Nov. 2016, pp. 50–56.
[54] C. Zhong, Z. Lu, M. C. Gursoy, and S. Velipasalar, ''A deep actor-critic reinforcement learning framework for dynamic multichannel access,'' IEEE Trans. Cognit. Commun. Netw., vol. 5, no. 4, pp. 1125–1139, Dec. 2019.
[55] S. Tuli, S. Ilager, K. Ramamohanarao, and R. Buyya, ''Dynamic scheduling for stochastic edge-cloud computing environments using A3C learning and residual recurrent neural networks,'' IEEE Trans. Mobile Comput., vol. 21, no. 3, pp. 940–954, Mar. 2022.
[56] M. Chen, T. Wang, K. Ota, M. Dong, M. Zhao, and A. Liu, ''Intelligent resource allocation management for vehicles network: An A3C learning approach,'' Comput. Commun., vol. 151, pp. 485–494, Feb. 2020.
[57] S. Still and D. Precup, ''An information-theoretic approach to curiosity-driven reinforcement learning,'' Theory Biosciences, vol. 131, no. 3, pp. 139–148, Sep. 2012.
[58] Y. Burda, H. Edwards, A. Storkey, and O. Klimov, ''Exploration by random network distillation,'' 2018, arXiv:1810.12894.
[59] M. L. Littman, ''Markov games as a framework for multi-agent reinforcement learning,'' in Proc. Mach. Learn., Jan. 1994, pp. 157–163.
[60] T. Gabel, ''Multi-agent reinforcement learning approaches for distributed job shop scheduling problems,'' Ph.D. dissertation, Dept. Math. Comput. Sci., Osnabrück Univ., Osnabrück, Germany, 2009.
[61] L. Canese, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, and S. Spanò, ''Multi-agent reinforcement learning: A review of challenges and applications,'' Appl. Sci., vol. 11, no. 11, p. 4948, May 2021.
[62] T. Li, K. Zhu, N. C. Luong, D. Niyato, Q. Wu, Y. Zhang, and B. Chen, ''Applications of multi-agent reinforcement learning in future internet: A comprehensive survey,'' IEEE Commun. Surveys Tuts., vol. 24, no. 2, pp. 1240–1279, 2nd Quart., 2022.
[63] E. Altman, Constrained Markov Decision Processes, vol. 7. Boca Raton, FL, USA: CRC Press, 1999.
[64] S. Gu, L. Yang, Y. Du, G. Chen, F. Walter, J. Wang, Y. Yang, and A. Knoll, ''A review of safe reinforcement learning: Methods, theory and applications,'' 2022, arXiv:2205.10330.
[65] A. Avranas, M. Kountouris, and P. Ciblat, ''Deep reinforcement learning for resource constrained multiclass scheduling in wireless networks,'' 2020, arXiv:2011.13634.
[66] S. Khairy, P. Balaprakash, L. X. Cai, and Y. Cheng, ''Constrained deep reinforcement learning for energy sustainable multi-UAV based random access IoT networks with NOMA,'' 2020, arXiv:2002.00073.
[67] C. Sun, C. She, and C. Yang, ''Unsupervised deep learning for optimizing wireless systems with instantaneous and statistic constraints,'' 2020, arXiv:2006.01641.
[68] Constrained Unsupervised Learning for Wireless Network Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2022, pp. 182–211.
[69] D. Wu, L. Deng, Z. Liu, Y. Zhang, and Y. S. Han, ''Reinforcement learning random access for delay-constrained heterogeneous wireless networks: A two-user case,'' in Proc. IEEE Globecom Workshops (GC Wkshps), Dec. 2021, pp. 1–7.
[70] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org
[71] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[72] T. B. Brown, ''Language models are few-shot learners,'' in Proc. NIPS, 2020, pp. 1877–1901. [Online]. Available: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
[73] W. Mcculloch and W. Pitts, ''A logical calculus of the ideas immanent in nervous activity,'' Bull. Math. Biol., vol. 52, nos. 1–2, pp. 99–115, 1990.
[74] S. Sharma, S. Sharma, and A. Athaiya, ''Activation functions in neural networks,'' Towards Data Sci., vol. 6, no. 12, pp. 310–316, 2017.
[75] K. Hornik, M. Stinchcombe, and H. White, ''Multilayer feedforward networks are universal approximators,'' Neural Netw., vol. 2, no. 5, pp. 359–366, Jan. 1989. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0893608089900208
[76] R. Hecht-Nielsen, ''Theory of the backpropagation neural network,'' in Neural Networks for Perception. Amsterdam, The Netherlands: Elsevier, 1992, pp. 65–93.
[77] D. P. Kingma and J. Ba, ''Adam: A method for stochastic optimization,'' 2014, arXiv:1412.6980.
[78] G. Lan, First-Order and Stochastic Optimization Methods for Machine Learning, vol. 1. Cham, Switzerland: Springer, 2020.
[79] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, ''Geometric deep learning: Grids, groups, graphs, geodesics, and gauges,'' 2021, arXiv:2104.13478.
[80] R. Eldan and O. Shamir, ''The power of depth for feedforward neural networks,'' in Proc. Conf. Learn. Theory, 2016, pp. 907–940.
[81] T. Gruber, S. Cammerer, J. Hoydis, and S. T. Brink, ''On deep learning-based channel decoding,'' in Proc. 51st Annu. Conf. Inf. Sci. Syst. (CISS), Mar. 2017, pp. 1–6.
[82] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, ''Learning to optimize: Training deep neural networks for wireless resource management,'' in Proc. IEEE 18th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Jul. 2017, pp. 1–6.
[83] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, ''Deep learning approach for network intrusion detection in software defined networking,'' in Proc. Int. Conf. Wireless Netw. Mobile Commun. (WINCOM), Oct. 2016, pp. 258–263.
[84] S. Hochreiter and J. Schmidhuber, ''Long short-term memory,'' Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[85] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, ''Learning phrase representations using RNN encoder–decoder for statistical machine translation,'' in Proc.
[86] P. Veličković, ''Everything is connected: Graph neural networks,'' 2023, arXiv:2301.08210.
[87] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, ''Generative adversarial nets,'' in Proc. Adv. Neural Inf. Process. Syst., Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Weinberger, Eds., vol. 27. Red Hook, NY, USA: Curran Associates, 2014, pp. 2672–2680. [Online]. Available: https://proceedings.neurips.cc/paperfiles/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
[88] C. Han, H. Hayashi, L. Rundo, R. Araki, W. Shimoda, S. Muramatsu, Y. Furukawa, G. Mauri, and H. Nakayama, ''GAN-based synthetic brain MR image generation,'' in Proc. IEEE 15th Int. Symp. Biomed. Imag. (ISBI), Apr. 2018, pp. 734–738.
[89] Y. Chen, Y. Pan, T. Yao, X. Tian, and T. Mei, ''Mocycle-GAN: Unpaired video-to-video translation,'' in Proc. 27th ACM Int. Conf. Multimedia, Oct. 2019, pp. 647–655.
[90] J. Kong, J. Kim, and J. Bae, ''HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis,'' in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 17022–17033.
[91] A. Cheng, ''PAC-GAN: Packet generation of network traffic using generative adversarial networks,'' in Proc. IEEE 10th Annu. Inf. Technol., Electron. Mobile Commun. Conf. (IEMCON), Oct. 2019, pp. 0728–0734.
[92] D. Bahdanau, K. Cho, and Y. Bengio, ''Neural machine translation by jointly learning to align and translate,'' 2014, arXiv:1409.0473.
[93] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, ''Attention is all you need,'' in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–14.
[94] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ''BERT: Pre-training of deep bidirectional transformers for language understanding,'' 2018, arXiv:1810.04805.
[95] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, ''An image is worth 16×16 words: Transformers for image recognition at scale,'' 2020, arXiv:2010.11929.
[96] C. Joshi, ''Transformers are graph neural networks,'' in The Gradient, vol. 12, 2020. Accessed: Apr. 12, 2024. [Online]. Available: https://thegradient.pub/transformers-are-graph-neural-networks/
[97] D. K. Kholgh and P. Kostakos, ''PAC-GPT: A novel approach to generating synthetic network traffic with GPT-3,'' IEEE Access, vol. 11, pp. 114936–114951, 2023.
[98] N. Ziems, G. Liu, J. Flanagan, and M. Jiang, ''Explaining tree model decisions in natural language for network intrusion detection,'' 2023, arXiv:2310.19658.
[99] T. Ali and P. Kostakos, ''HuntGPT: Integrating machine learning-based anomaly detection and explainable AI with large language models (LLMs),'' 2023, arXiv:2309.16021.
[100] S. K. Mani, Y. Zhou, K. Hsieh, S. Segarra, T. Eberl, E. Azulai, I. Frizler, R. Chandra, and S. Kandula, ''Enhancing network management using code generated by large language models,'' in Proc. 22nd ACM Workshop Hot Topics Netw., Nov. 2023, pp. 196–204.
[101] Y. Huang, H. Du, X. Zhang, D. Niyato, J. Kang, Z. Xiong, S. Wang, and T. Huang, ''Large language models for networking: Applications, enabling techniques, and challenges,'' 2023, arXiv:2311.17474.
[102] J. Sun, Q. V. Liao, M. Müller, M. Agarwal, S. Houde, K. Talamadupula, and J. D. Weisz, ''Investigating explainability of generative AI for code through scenario-based design,'' in Proc. 27th Int. Conf. Intell. User Interface. New York, NY, USA: Association for Computing Machinery, Mar. 2022, pp. 212–228, doi: 10.1145/3490099.3511119.
[103] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, M. Wang, and H. Wang, ''Retrieval-augmented generation for large language models: A survey,'' 2023, arXiv:2312.10997.
[104] Cisco. (2023). Cisco Unveils Next-Gen Solutions That Empower Security and Productivity With Generative AI. [Online]. Available: https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2023/m06/cisco-unveils-next-gen-solutions-that-empower-security-and-productivity-with-generative-ai.html
[105] Juniper. (2023). AI for IT Operations (AIOps). [Online]. Available: https://www.juniper.net/us/en/solutions/artificial-intelligence-for-it-operations-aiops.html
[106] O. Santos. (2023). Securing AI: Navigating the Complex Landscape of
Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, Models, Fine-Tuning, and Rag. [Online]. Available: https://blogs.cisco.
pp. 1724–1734. [Online]. Available: http://aclweb.org/anthology/D14- com/security/securing-ai-navigating-the-complex-landscape-of-models-
1179 fine-tuning-and-rag
[107] Cisco Systems, Inc. (2023). Cisco AI Assistant. [Online]. Available: https://www.cisco.com/site/us/en/solutions/artificial-intelligence/ai-assistant/index.html
[108] S. Thrun and A. Schwartz, "Issues in using function approximation for reinforcement learning," in Proc. 4th Connectionist Models Summer School, vol. 255, 1993, p. 263.
[109] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529–533, Feb. 2015.
[110] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," 2015, arXiv:1509.02971.
[111] A. Ramaswamy, S. Bhatnagar, and N. Saxena, "A framework for provably stable and consistent training of deep feedforward networks," 2023, arXiv:2305.12125.
[112] A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, "Multiagent cooperation and competition with deep reinforcement learning," PLoS ONE, vol. 12, no. 4, Apr. 2017, Art. no. e0172395.
[113] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–21.
[114] A. Redder, A. Ramaswamy, and H. Karl, "3DPG: Distributed deep deterministic policy gradient algorithms for networked multi-agent systems," 2022, arXiv:2201.00570.
[115] C. Qiu, H. Yao, F. R. Yu, F. Xu, and C. Zhao, "Deep Q-learning aided networking, caching, and computing resources allocation in software-defined satellite-terrestrial networks," IEEE Trans. Veh. Technol., vol. 68, no. 6, pp. 5871–5883, Jun. 2019.
[116] S. Schneider, R. Khalili, A. Manzoor, H. Qarawlus, R. Schellenberg, H. Karl, and A. Hecker, "Self-learning multi-objective service coordination using deep reinforcement learning," IEEE Trans. Netw. Service Manage., vol. 18, no. 3, pp. 3829–3842, Sep. 2021.
[117] A. Redder, A. Ramaswamy, and D. E. Quevedo, "Deep reinforcement learning for scheduling in large-scale networked control systems," IFAC-PapersOnLine, vol. 52, no. 20, pp. 333–338, 2019.
[118] H. Afifi, A. Ramaswamy, and H. Karl, "Reinforcement learning for autonomous vehicle movements in wireless sensor networks," in Proc. IEEE Int. Conf. Commun., Jun. 2021, pp. 1–6.
[119] B. Jang, M. Kim, G. Harerimana, and J. W. Kim, "Q-learning algorithms: A comprehensive classification and applications," IEEE Access, vol. 7, pp. 133653–133667, 2019.
[120] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, "Applications of deep reinforcement learning in communications and networking: A survey," IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3133–3174, 4th Quart., 2019.
[121] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," 2012, arXiv:1207.0580.
[122] H. Riiser, P. Vigmostad, C. Griwodz, and P. Halvorsen, "Commute path bandwidth traces from 3G networks: Analysis and applications," in Proc. 4th ACM Multimedia Syst. Conf., Feb. 2013, pp. 114–118.
[123] X. Zuo, J. Yang, M. Wang, and Y. Cui, "Adaptive bitrate with user-level QoE preference for video streaming," in Proc. IEEE INFOCOM Conf. Comput. Commun., May 2022, pp. 1279–1288.
[124] J. van der Hooft, S. Petrangeli, T. Wauters, R. Huysegems, P. R. Alface, T. Bostoen, and F. De Turck, "HTTP/2-based adaptive streaming of HEVC video over 4G/LTE networks," IEEE Commun. Lett., vol. 20, no. 11, pp. 2177–2180, Nov. 2016.
[125] L. Zhang, Y. Zhang, X. Wu, F. Wang, L. Cui, Z. Wang, and J. Liu, "Batch adaptative streaming for video analytics," in Proc. IEEE INFOCOM Conf. Comput. Commun., May 2022, pp. 2158–2167.
[126] A. Alhilal, T. Braud, B. Han, and P. Hui, "Nebula: Reliable low-latency video transmission for mobile cloud gaming," in Proc. ACM Web Conf., Apr. 2022, pp. 3407–3417.
[127] D. Raca, J. J. Quinlan, A. H. Zahran, and C. J. Sreenan, "Beyond throughput: A 4G LTE dataset with channel and context metrics," in Proc. 9th ACM Multimedia Syst. Conf., Jun. 2018, pp. 460–465.
[128] S. Farthofer, M. Herlich, C. Maier, S. Pochaba, J. Lackner, and P. Dorfinger, "An open mobile communications drive test data set and its use for machine learning," IEEE Open J. Commun. Soc., vol. 3, pp. 1688–1701, 2022.
[129] J. Wu, L. Wang, Q. Pei, X. Cui, F. Liu, and T. Yang, "HiTDL: High-throughput deep learning inference at the hybrid mobile edge," IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 12, pp. 4499–4514, Dec. 2022.
[130] D. Raca, D. Leahy, C. J. Sreenan, and J. J. Quinlan, "Beyond throughput, the next generation: A 5G dataset with channel and context metrics," in Proc. 11th ACM Multimedia Syst. Conf., May 2020, pp. 303–308.
[131] Geant/Abilene Network Topology Data and Traffic Traces, 3rd Party, Ocala, FL, USA, 2020.
[132] GENI. Accessed: Apr. 12, 2024. [Online]. Available: https://www.geni.net/
[133] (Nov. 2020). CAIDA Data Completed Datasets. [Online]. Available: https://www.caida.org/catalog/datasets/completed-datasets/
[134] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, "Measuring ISP topologies with Rocketfuel," ACM Trans. Netw., vol. 12, no. 1, pp. 2–16, 2004.
[135] (2021). The Internet Topology Zoo. [Online]. Available: http://www.topology-zoo.org/dataset.html
[136] M. Roughan, "A case study of the accuracy of SNMP measurements," J. Electr. Comput. Eng., vol. 2010, pp. 1–7, May 2010, doi: 10.1155/2010/812979.
[137] J. Kua, G. Armitage, and P. Branch, "A survey of rate adaptation techniques for dynamic adaptive streaming over HTTP," IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1842–1866, 3rd Quart., 2017.
[138] G. Zhou, R. Wu, M. Hu, Y. Zhou, T. Z. J. Fu, and D. Wu, "Vibra: Neural adaptive streaming of VBR-encoded videos," in Proc. 31st ACM Workshop Netw. Operating Syst. Support Digit. Audio Video, Jul. 2021, pp. 1–8.
[139] Y. Yuan, W. Wang, Y. Wang, S. S. Adhatarao, B. Ren, K. Zheng, and X. Fu, "VSiM: Improving QoE fairness for video streaming in mobile environments," in Proc. IEEE INFOCOM Conf. Comput. Commun., May 2022, pp. 1309–1318.
[140] S. Lederer, C. Müller, and C. Timmerer, "Dynamic adaptive streaming over HTTP dataset," in Proc. 3rd Multimedia Syst. Conf., Feb. 2012, pp. 89–94.
[141] S. Lederer, C. Mueller, C. Timmerer, C. Concolato, J. Le Feuvre, and K. Fliegel, "Distributed DASH dataset," in Proc. 4th ACM Multimedia Syst. Conf., 2013, pp. 131–135.
[142] J. Le Feuvre, J.-M. Thiesse, M. Parmentier, M. Raulet, and C. Daguet, "Ultra high definition HEVC DASH data set," in Proc. 5th ACM Multimedia Syst. Conf., Mar. 2014, pp. 7–12.
[143] A. Zabrovskiy, C. Feldmann, and C. Timmerer, "Multi-codec DASH dataset," in Proc. 9th ACM Multimedia Syst. Conf., Jun. 2018, pp. 438–443.
[144] A. Chandramohan, M. Poel, B. Meijerink, and G. Heijenk, "Machine learning for cooperative driving in a multi-lane highway environment," in Proc. Wireless Days (WD), Apr. 2019, pp. 1–4.
[145] L. N. Alegre, T. Ziemke, and A. L. C. Bazzan, "Using reinforcement learning to control traffic signals in a real-world scenario: An approach based on linear function approximation," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 9126–9135, Jul. 2022.
[146] C. Liu, Y. Zhang, W. Chen, F. Wang, H. Li, and Y.-D. Shen, "Adaptive matching strategy for multi-target multi-camera tracking," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2022, pp. 2934–2938.
[147] M. Maciejewski, "A comparison of microscopic traffic flow simulation systems for an urban area," Transp. Problems, vol. 5, no. 4, pp. 29–40, 2010.
[148] F. K. Karnadi, Z. H. Mo, and K.-C. Lan, "Rapid generation of realistic mobility models for VANET," in Proc. IEEE Wireless Commun. Netw. Conf., Mar. 2007, pp. 2506–2511.
[149] M. Tsao, D. Milojevic, C. Ruch, M. Salazar, E. Frazzoli, and M. Pavone, "Model predictive control of ride-sharing autonomous mobility-on-demand systems," in Proc. Int. Conf. Robot. Autom. (ICRA), May 2019, pp. 6665–6671.
[150] C. M. Moyano, J. F. Ortega, and D. E. Mogrovejo, "Efficiency analysis during calibration of traffic microsimulation models in conflicting intersections near Universidad del Azuay, using Aimsun 8.1," in Proc. MOVICI-MOYCOT Joint Conf. Urban Mobility Smart City, Apr. 2018, pp. 1–6.
[151] L. Yang and W. Lan, "On secondary development of PTV-VISSIM for traffic optimization," in Proc. 13th Int. Conf. Comput. Sci. Educ. (ICCSE), Aug. 2018, pp. 1–5.
[152] L. Lu, T. Yun, L. Li, Y. Su, and D. Yao, "A comparison of phase transitions produced by PARAMICS, TransModeler, and VISSIM," IEEE Intell. Transp. Syst. Mag., vol. 2, no. 3, pp. 19–24, Fall 2010.
[153] Z. Tang, M. Naphade, M.-Y. Liu, X. Yang, S. Birchfield, S. Wang, R. Kumar, D. Anastasiu, and J.-N. Hwang, "CityFlow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 8789–8798.
[154] Z. Wang, B. Li, and B. Liang, "Quick: Quality-of-service improvement with cooperative relaying and network coding," in Proc. IEEE Int. Conf. Commun., Jun. 2010, pp. 1–5.
[155] T. Mangla, E. Halepovic, M. Ammar, and E. Zegura, "eMIMIC: Estimating HTTP-based video QoE metrics from encrypted network traffic," in Proc. Netw. Traffic Meas. Anal. Conf. (TMA), Jun. 2018, pp. 1–8.
[156] C. Gutterman, K. Guo, S. Arora, X. Wang, L. Wu, E. Katz-Bassett, and G. Zussman, "Requet: Real-time QoE detection for encrypted YouTube traffic," in Proc. 10th ACM Multimedia Syst. Conf., Jun. 2019, pp. 48–59.
[157] M. Seufert, P. Casas, N. Wehner, L. Gang, and K. Li, "Stream-based machine learning for real-time QoE analysis of encrypted video streaming traffic," in Proc. 22nd Conf. Innov. Clouds, Internet Netw. Workshops (ICIN), Feb. 2019, pp. 76–81.
[158] N. Wehner, M. Ring, J. Schüler, A. Hotho, T. Hoßfeld, and M. Seufert, "On learning hierarchical embeddings from encrypted network traffic," in Proc. NOMS IEEE/IFIP Netw. Oper. Manage. Symp., Apr. 2022, pp. 1–7.
[159] K. Dietz, M. Mühlhauser, M. Seufert, N. Gray, T. Hoßfeld, and D. Herrmann, "Browser fingerprinting: How to protect machine learning models and data with differential privacy?" Electron. Commun. EASST, vol. 80, pp. 1–7, Sep. 2021.
[160] N. Wehner, M. Seufert, J. Schüler, P. Casas, and T. Hoßfeld, "How are your apps doing? QoE inference and analysis in mobile devices," in Proc. 17th Int. Conf. Netw. Service Manage. (CNSM), Oct. 2021, pp. 49–55.
[161] A. Azab, M. Khasawneh, S. Alrabaee, K.-K.-R. Choo, and M. Sarsour, "Network traffic classification: Techniques, datasets, and challenges," Digit. Commun. Netw., Sep. 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2352864822001845
[162] D. Shamsimukhametov, M. Liubogoshchev, E. Khorov, and I. Akyildiz, "YouTube Netflix web dataset for encrypted traffic classification," in Proc. Int. Conf. Eng. Telecommun., 2021, pp. 1–5.
[163] G. Aceto, D. Ciuonzo, A. Montieri, V. Persico, and A. Pescapé, "MIRAGE: Mobile-app traffic capture and ground-truth creation," in Proc. 4th Int. Conf. Comput., Commun. Secur. (ICCCS), Oct. 2019, pp. 1–8.
[164] C. Wang, A. Finamore, L. Yang, K. Fauvel, and D. Rossi, "AppClassNet: A commercial-grade dataset for application identification research," ACM SIGCOMM Comput. Commun. Rev., vol. 52, no. 3, pp. 19–27, Jul. 2022.
[165] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and A. Hotho, "A survey of network-based intrusion detection data sets," Comput. Secur., vol. 86, pp. 147–167, Sep. 2019.
[166] (2023). Datasets. [Online]. Available: https://www.unb.ca/cic/datasets/
[167] A. Dvir, Y. Zion, J. Muehlstein, O. Pele, C. Hajaj, and R. Dubin, "Robust machine learning for encrypted traffic classification," 2016, arXiv:1603.04865.
[168] R. Poorzare and O. P. Waldhorst, "Toward the implementation of MPTCP over mmWave 5G and beyond: Analysis, challenges, and solutions," IEEE Access, vol. 11, pp. 19534–19566, 2023.
[169] R. Poorzare and A. C. Augé, "Challenges on the way of implementing TCP over 5G networks," IEEE Access, vol. 8, pp. 176393–176415, 2020.
[170] T. R. Henderson, M. Lacage, G. F. Riley, C. Dowell, and J. Kopena, "Network simulations with the NS-3 simulator," SIGCOMM Demonstration, vol. 14, no. 14, p. 527, 2008.
[171] M. Mezzavilla, M. Zhang, M. Polese, R. Ford, S. Dutta, S. Rangan, and M. Zorzi, "End-to-end simulation of 5G mmWave networks," IEEE Commun. Surveys Tuts., vol. 20, no. 3, pp. 2237–2263, 3rd Quart., 2018.
[172] P. Gawłowicz and A. Zubow, "Ns3-gym: Extending OpenAI gym for networking research," 2018, arXiv:1810.03943.
[173] H. Yin, P. Liu, K. Liu, L. Cao, L. Zhang, Y. Gao, and X. Hei, "Ns3-AI: Fostering artificial intelligence algorithms for networking research," in Proc. Workshop Ns-3, New York, NY, USA: Association for Computing Machinery, Jun. 2020, pp. 57–64, doi: 10.1145/3389400.3389404.
[174] M. Schettler, D. S. Buse, A. Zubow, and F. Dressler, "How to train your ITS? Integrating machine learning with vehicular network simulation," in Proc. IEEE Veh. Netw. Conf. (VNC), Dec. 2020, pp. 1–4.
[175] D. Stolpmann. (2021). Machine Learning in OMNeT++. GitHub repository. [Online]. Available: https://github.com/ComNetsHH/omnetpp-ml
[176] "FlowEmu: An open-source flow-based network emulator," Electron. Commun. EASST, vol. 80, Sep. 2021.
[177] F. Ruffy, M. Przystupa, and I. Beschastnikh, "Iroko: A framework to prototype reinforcement learning for data center traffic control," 2018, arXiv:1812.09975.
[178] J. Charlier, A. Singh, G. Ormazabal, R. State, and H. Schulzrinne, "SynGAN: Towards generating synthetic network attacks using GANs," 2019, arXiv:1908.09899.
[179] M. Ring, D. Schlör, D. Landes, and A. Hotho, "Flow-based network traffic generation using generative adversarial networks," Comput. Secur., vol. 82, pp. 156–172, May 2019, doi: 10.1016/j.cose.2018.12.012.
[180] A. Mozo, Á. González-Prieto, A. Pastor, S. Gómez-Canaval, and E. Talavera, "Synthetic flow-based cryptomining attack generation through generative adversarial networks," Sci. Rep., vol. 12, no. 1, p. 2091, Feb. 2022, doi: 10.1038/s41598-022-06057-2.
[181] Y. Guo, G. Xiong, Z. Li, J. Shi, M. Cui, and G. Gou, "Combating imbalance in network traffic classification using GAN based oversampling," in Proc. IFIP Netw. Conf. (IFIP Networking), Jun. 2021, pp. 1–9.
[182] T. J. Anande and M. S. Leeson, "Generative adversarial networks (GANs): A survey on network traffic generation," Int. J. Mach. Learn. Comput., vol. 12, no. 6, pp. 333–343, 2022.
[183] M. Rigaki and S. Garcia, "Bringing a GAN to a knife-fight: Adapting malware communication to avoid detection," in Proc. IEEE Secur. Privacy Workshops (SPW), May 2018, pp. 70–75.
[184] C. Zhang, X. Ouyang, and P. Patras, "ZipNet-GAN: Inferring fine-grained mobile traffic patterns via a generative adversarial neural network," 2017, arXiv:1711.02413.
[185] B. Dowoo, Y. Jung, and C. Choi, "PcapGAN: Packet capture file generator by style-based generative adversarial networks," in Proc. 18th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2019, pp. 1149–1154.
[186] L. Engstrom, A. Ilyas, S. Santurkar, D. Tsipras, F. Janoos, L. Rudolph, and A. Madry, "Implementation matters in deep RL: A case study on PPO and TRPO," in Proc. Int. Conf. Learn. Represent., 2020, pp. 1–14.
[187] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, "Stable-Baselines3: Reliable reinforcement learning implementations," J. Mach. Learn. Res., vol. 22, no. 268, pp. 12348–12355, 2021.
[188] S. Huang, R. F. J. Dossa, C. Ye, J. Braga, D. Chakraborty, K. Mehta, and J. G. Araújo, "CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms," J. Mach. Learn. Res., vol. 23, no. 274, pp. 1–18, 2022. [Online]. Available: http://jmlr.org/papers/v23/21-1342.html
[189] E. Liang, R. Liaw, P. Moritz, R. Nishihara, R. Fox, K. Goldberg, J. E. Gonzalez, M. I. Jordan, and I. Stoica, "RLlib: Abstractions for distributed reinforcement learning," 2017, arXiv:1712.09381.
[190] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel, and W. Zaremba, "Hindsight experience replay," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–14.
[191] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," 2017, arXiv:1707.06347.
[192] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, and S. Levine, "Soft actor-critic algorithms and applications," 2018, arXiv:1812.05905.
[193] S. Fujimoto, H. van Hoof, and D. Meger, "Addressing function approximation error in actor-critic methods," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1587–1596.
[194] H. Mania, A. Guy, and B. Recht, "Simple random search provides a competitive approach to reinforcement learning," 2018, arXiv:1803.07055.
[195] W. Dabney, M. Rowland, M. Bellemare, and R. Munos, "Distributional reinforcement learning with quantile regression," in Proc. AAAI Conf. Artif. Intell., 2018, vol. 32, no. 1, pp. 1–10.
[196] S. Huang, R. F. J. Dossa, A. Raffin, A. Kanervisto, and W. Wang, "The 37 implementation details of proximal policy optimization," in Proc. ICLR Blog Track, 2023, pp. 1–12.
[197] A. Kuznetsov, P. Shvechikov, A. Grishin, and D. Vetrov, "Controlling overestimation bias with truncated mixture of continuous distributional quantile critics," in Proc. Int. Conf. Mach. Learn., 2020, pp. 5556–5566.
[198] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in Proc. 32nd Int. Conf. Mach. Learn., vol. 37, Lille, France, 2015, pp. 1889–1897.
[199] S. Huang and S. Ontañón, "A closer look at invalid action masking in policy gradient algorithms," 2020, arXiv:2006.14171.
[200] M. G. Bellemare, W. Dabney, and R. Munos, "A distributional perspective on reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2017, pp. 449–458.
[201] K. W. Cobbe, J. Hilton, O. Klimov, and J. Schulman, "Phasic policy gradient," in Proc. ICML, 2021, pp. 2020–2027.
[202] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," 2017, arXiv:1712.01815.
[203] Q. Wang, J. Xiong, L. Han, H. Liu, and T. Zhang, "Exponentially weighted imitation learning for batched historical data," in Proc. Adv. Neural Inf. Process. Syst., vol. 31, 2018, pp. 1–6.
[204] A. Kumar, A. Zhou, G. Tucker, and S. Levine, "Conservative Q-learning for offline reinforcement learning," in Proc. Int. Conf. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 1179–1191.
[205] Z. Wang et al., "Critic regularized regression," in Proc. Int. Conf. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 7768–7778.
[206] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi, "Dream to control: Learning behaviors by latent imagination," 2019, arXiv:1912.01603.
[207] L. Espeholt et al., "IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1407–1416.
[208] S. Kapturowski, G. Ostrovski, J. Quan, R. Munos, and W. Dabney, "Recurrent experience replay in distributed reinforcement learning," in Proc. Int. Conf. Learn. Represent., 2019, pp. 1–12.
[209] M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, "Rainbow: Combining improvements in deep reinforcement learning," in Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, 2018, pp. 3215–3222.
[210] E. Ie, V. Jain, J. Wang, S. Narvekar, R. Agarwal, R. Wu, H.-T. Cheng, T. Chandra, and C. Boutilier, "SlateQ: A tractable decomposition for reinforcement learning with recommendation sets," in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 2592–2599.
[211] E. Wijmans, A. Kadian, A. Morcos, S. Lee, I. Essa, D. Parikh, M. Savva, and D. Batra, "DD-PPO: Learning near-perfect PointGoal navigators from 2.5 billion frames," 2019, arXiv:1911.00357.
[212] TensorFlow, "TensorBoard: A unified platform for visualizing live, rich data for TensorFlow models," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jun. 2016.
[213] L. Biewald. (2020). Experiment Tracking With Weights and Biases. [Online]. Available: https://www.wandb.com/
[214] Comet.ml. (2018). Comet.ML: Machine Learning Operations Platform. [Online]. Available: https://www.comet.ml/
[215] A. Chen, A. Chow, A. Davidson, A. DCunha, A. Ghodsi, S. A. Hong, A. Konwinski, C. Mewald, S. Murching, T. Nykodym, P. Ogilvie, M. Parkhe, A. Singh, F. Xie, M. Zaharia, R. Zang, J. Zheng, and C. Zumar, "Developments in MLflow: A system to accelerate the machine learning lifecycle," in Proc. 4th Int. Workshop Data Manage. End-to-End Mach. Learn., New York, NY, USA: Association for Computing Machinery, Jun. 2020, pp. 1–4, doi: 10.1145/3399579.3399867.
[216] I. Habibie, M. Kleinsorge, Z. Al-Ars, J. Schneider, W. Kessler, and T. Kuhlen, "Visdom: A tool for visualization and monitoring of machine learning experiments," Tech. Rep., Mar. 2017.
[217] Microsoft. (2021). TensorWatch. GitHub repository. [Online]. Available: https://github.com/microsoft/tensorwatch
[218] Microsoft Research Asia. (2021). NNI (Neural Network Intelligence): An Open-Source AutoML Toolkit for Neural Architecture Search and Hyper-Parameter Tuning. GitHub repository. [Online]. Available: https://github.com/microsoft/nni
[219] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 2623–2631.
[220] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica, "Tune: A research platform for distributed model selection and training," 2018, arXiv:1807.05118.
[221] M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, C. Fernando, and K. Kavukcuoglu, "Population based training of neural networks," 2017, arXiv:1711.09846.
[222] J. Bergstra, D. Yamins, and D. Cox, "Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures," in Proc. 30th Int. Conf. Mach. Learn., vol. 28, S. Dasgupta and D. McAllester, Eds. Atlanta, GA, USA: PMLR, Feb. 2013, pp. 115–123. [Online]. Available: https://proceedings.mlr.press/v28/bergstra13.html
[223] R. Ostrovskiy and A. Gordon. (2020). Keras Tuner. GitHub repository. [Online]. Available: https://github.com/keras-team/keras-tuner
[224] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, "Algorithms for hyper-parameter optimization," in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2546–2554. [Online]. Available: http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization
[225] M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter, "Efficient and robust automated machine learning," in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 2962–2970. [Online]. Available: http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning
[226] M. Merenda, C. Porcaro, and D. Iero, "Edge machine learning for AI-enabled IoT devices: A review," Sensors, vol. 20, no. 9, p. 2533, Apr. 2020.
[227] A.-S. Tonneau, N. Mitton, and J. Vandaele, "A survey on (mobile) wireless sensor network experimentation testbeds," in Proc. IEEE Int. Conf. Distrib. Comput. Sensor Syst., May 2014, pp. 263–268.
[228] M. Chernyshev, Z. Baig, O. Bello, and S. Zeadally, "Internet of Things (IoT): Research, simulators, and testbeds," IEEE Internet Things J., vol. 5, no. 3, pp. 1637–1647, Jun. 2018.
[229] S. Zhu, S. Yang, X. Gou, Y. Xu, T. Zhang, and Y. Wan, "Survey of testing methods and testbed development concerning Internet of Things," Wireless Pers. Commun., vol. 123, no. 1, pp. 165–194, Mar. 2022.
[230] R. Lim, F. Ferrari, M. Zimmerling, C. Walser, P. Sommer, and J. Beutel, "FlockLab: A testbed for distributed, synchronized tracing and profiling of wireless embedded systems," in Proc. ACM/IEEE Int. Conf. Inf. Process. Sensor Netw. (IPSN), Apr. 2013, pp. 153–165.
[231] R. Trüb, R. Da Forno, L. Daschinger, A. Biri, J. Beutel, and L. Thiele, "Non-intrusive distributed tracing of wireless IoT devices with the FlockLab 2 testbed," ACM Trans. Internet Things, vol. 3, no. 1, pp. 1–31, 2021.
[232] C. Adjih, E. Baccelli, E. Fleury, G. Harter, N. Mitton, T. Noel, R. Pissard-Gibollet, F. Saint-Marcel, G. Schreiner, J. Vandaele, and T. Watteyne, "FIT IoT-LAB: A large scale open experimental IoT testbed," in Proc. IEEE 2nd World Forum Internet Things (WF-IoT), Dec. 2015, pp. 459–464.
[233] M. Schuß, C. A. Boano, M. Weber, and K. Römer, "A competition to push the dependability of low-power wireless protocols to the edge," in Proc. 14th EWSN Conf., 2017, pp. 54–65.
[234] D. Molteni, G. P. Picco, M. Trobinger, and D. Vecchia, "Cloves: A large-scale ultra-wideband testbed," in Proc. 20th ACM Conf. Embedded Netw. Sensor Syst., New York, NY, USA: Association for Computing Machinery, Nov. 2022, pp. 808–809, doi: 10.1145/3560905.3568072.
[235] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman, "PlanetLab: An overlay testbed for broad-coverage services," SIGCOMM Comput. Commun. Rev., vol. 33, no. 3, pp. 3–12, Jul. 2003, doi: 10.1145/956993.956995.
[236] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar, "An integrated experimental environment for distributed systems and networks," ACM SIGOPS Operating Syst. Rev., vol. 36, pp. 255–270, Dec. 2002.
[237] M. Berman, J. S. Chase, L. Landweber, A. Nakao, M. Ott, D. Raychaudhuri, R. Ricci, and I. Seskar, "GENI: A federated testbed for innovative network experiments," Comput. Netw., vol. 61, pp. 5–23, Mar. 2014.
[238] L. Yang, F. Wen, J. Cao, and Z. Wang, "EdgeTB: A hybrid testbed for distributed machine learning at the edge with high fidelity," IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 10, pp. 2540–2553, Oct. 2022.
[239] F. Hussain, R. Hussain, and E. Hossain, "Explainable artificial intelligence (XAI): An engineering perspective," 2021, arXiv:2101.03613.
[240] S. Mukherjee, J. Rupe, and J. Zhu, "XAI for communication networks," in Proc. IEEE Int. Symp. Softw. Rel. Eng. Workshops (ISSREW), Oct. 2022, pp. 359–364.
[241] C. Liaskos, S. Nie, A. Tsioliaridou, A. Pitsillides, S. Ioannidis, and I. Akyildiz, "End-to-end wireless path deployment with intelligent surfaces using interpretable neural networks," IEEE Trans. Commun., vol. 68, no. 11, pp. 6792–6806, Nov. 2020.
[242] A.-D. Marcu, S. K. G. Peesapati, J. M. Cortes, S. Imtiaz, and J. Gross, "Explainable artificial intelligence for energy-efficient radio resource management," in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Mar. 2023, pp. 1–6.
[243] P. Barnard, I. Macaluso, N. Marchetti, and L. A. DaSilva, "Resource reservation in sliced networks: An explainable artificial intelligence (XAI) approach," in Proc. IEEE Int. Conf. Commun., May 2022, pp. 1530–1535.
[244] A. Palaios, C. L. Vielhaus, D. F. Külzer, C. Watermann, R. Hernangomez, S. Partani, P. Geuer, A. Krause, R. Sattiraju, M. Kasparick, G. Fettweis, F. H. P. Fitzek, H. D. Schotten, and S. Stanczak, "The story of QoS prediction in vehicular communication: From radio environment statistics to network-access throughput prediction," 2023, arXiv:2302.11966.
[245] S. Hariharan, A. Velicheti, A. S. Anagha, C. Thomas, and N. Balakrishnan, "Explainable artificial intelligence in cybersecurity: A brief review," in Proc. 4th Int. Conf. Secur. Privacy (ISEA-ISAP), Oct. 2021, pp. 1–12.
[246] N. Capuano, G. Fenza, V. Loia, and C. Stanzione, "Explainable artificial intelligence in CyberSecurity: A survey," IEEE Access, vol. 10, pp. 93575–93600, 2022.
[247] C. Molnar, Interpretable Machine Learning, 2nd ed., 2022. [Online]. Available: https://christophm.github.io/interpretable-ml-book and https://www.amazon.de/Interpretable-Machine-Learning-Making-Explainable/dp/B09TMWHVB4
[248] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, "Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Inf. Fusion, vol. 58, pp. 82–115, Jun. 2020.
[249] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep inside convolutional networks: Visualising image classification models and saliency maps," 2013, arXiv:1312.6034.
[250] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Mach. Intell., vol. 1, no. 5, pp. 206–215, May 2019.
[251] T. Shapira and Y. Shavitt, "FlowPic: Encrypted Internet traffic classification is as easy as image recognition," in Proc. IEEE INFOCOM Conf. Comput. Commun. Workshops (INFOCOM WKSHPS), Apr. 2019, pp. 680–687.
[252] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–7.
[253] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?' Explaining the predictions of any classifier," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 1135–1144.
[254] H. Nori, S. Jenkins, P. Koch, and R. Caruana, "InterpretML: A unified framework for machine learning interpretability," 2019, arXiv:1909.09223.
[255] R. Agarwal et al., "Neural additive models: Interpretable machine learning with neural nets," in Proc. Adv. Neural Inf. Process. Syst., vol. 34. Red Hook, NY, USA: Curran Associates, 2021, pp. 4699–4711.
[256] N. Wehner, A. Seufert, T. Hoßfeld, and M. Seufert, "Explainable data-driven QoE modelling with XAI," in Proc. 15th Int. Conf. Quality Multimedia Exper. (QoMEX), Jun. 2023, pp. 7–12.
[257] K. Brunnström, S. A. Beker, K. De Moor, A. Dooms, S. Egger, M.-N. Garcia, T. Hossfeld, S. Jumisko-Pyykkö, C. Keimel, and M.-C. Larabi, "Qualinet white paper on definitions of quality of experience," Tech. Rep., 2013. [Online]. Available: https://hal.science/hal-00977812
[258] N. Wehner, M. Seufert, J. Schuler, S. Wassermann, P. Casas, and T. Hossfeld, "Improving web QoE monitoring for encrypted network traffic through time series modeling," ACM SIGMETRICS Perform. Eval. Rev., vol. 48, no. 4, pp. 37–40, May 2021.
[259] E. Hüllermeier and W. Waegeman, "Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods," Mach. Learn., vol. 110, no. 3, pp. 457–506, Mar. 2021.
[260] A. F. Psaros, X. Meng, Z. Zou, L. Guo, and G. E. Karniadakis, "Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons," J. Comput. Phys., vol. 477, Mar. 2023, Art. no. 111902.
[261] A. Kendall and Y. Gal, "What uncertainties do we need in Bayesian deep learning for computer vision?" in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–5.
[262] V. Dignum, Responsible Artificial Intelligence: How To Develop and Use AI in a Responsible Way, vol. 2156. Cham, Switzerland: Springer, 2019.
[263] Y. Siriwardhana, P. Porambage, M. Liyanage, and M. Ylianttila, "AI and 6G security: Opportunities and challenges," in Proc. Joint Eur. Conf. Netw. Commun. 6G Summit (EuCNC/6G Summit), Jun. 2021, pp. 616–621.
[264] Q. Lu, L. Zhu, X. Xu, J. Whittle, D. Zowghi, and A. Jacquet, "Responsible AI pattern catalogue: A collection of best practices for AI governance and engineering," ACM Comput. Surv., vol. 56, no. 7, pp. 1–35, Oct. 2023, doi: 10.1145/3626234.
[265] W. Yang, H. Le, T. Laud, S. Savarese, and S. C. H. Hoi, "OmniXAI: A library for explainable AI," 2022, arXiv:2206.01612.
[266] V. Arya, R. K. E. Bellamy, P.-Y. Chen, A. Dhurandhar, M. Hind, S. C. Hoffman, S. Houde, Q. V. Liao, R. Luss, A. Mojsilović, S. Mourad, P. Pedemonte, R. Raghavendra, J. Richards, P. Sattigeri, K. Shanmugam, M. Singh, K. R. Varshney, D. Wei, and Y. Zhang, "AI explainability 360 toolkit," in Proc. 3rd ACM India Joint Int. Conf. Data Sci. Manage. Data (8th ACM IKDD CODS 26th COMAD), Jan. 2021, pp. 376–379.
[267] J. Klaise, A. Van Looveren, G. Vacanti, and A. Coca, "Alibi explain: Algorithms for explaining machine learning models," J. Mach. Learn. Res., vol. 22, no. 1, pp. 8194–8200, 2021.
[268] M. Sundararajan, A. Taly, and Q. Yan, "Axiomatic attribution for deep networks," in Proc. Int. Conf. Mach. Learn., vol. 70, 2017, pp. 3319–3328.
[269] N. Kokhlikyan, V. Miglani, M. Martin, E. Wang, B. Alsallakh, J. Reynolds, A. Melnikov, N. Kliushkina, C. Araya, S. Yan, and O. Reblitz-Richardson, "Captum: A unified and generic model interpretability library for PyTorch," 2020, arXiv:2009.07896.
[270] (2019). TorchRay. [Online]. Available: https://github.com/facebookresearch/TorchRay
[271] (2019). TF-Explain. [Online]. Available: https://tf-explain.readthedocs.io/en/latest/index.html
[272] C. Zhang, P. Patras, and H. Haddadi, "Deep learning in mobile and wireless networking: A survey," IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2224–2287, 3rd Quart., 2019.
[273] H. Hellström, J. Mairton B. da Silva Jr., M. M. Amiri, M. Chen, V. Fodor, H. V. Poor, and C. Fischione, "Wireless for machine learning," 2020, arXiv:2008.13492.
[274] D. Jin, Z. Yu, P. Jiao, S. Pan, D. He, J. Wu, P. S. Yu, and W. Zhang, "A survey of community detection approaches: From statistical modeling to deep learning," IEEE Trans. Knowl. Data Eng., vol. 35, no. 2, pp. 1149–1170, Feb. 2023.
[275] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, "Graph neural networks: A review of methods and applications," 2018, arXiv:1812.08434.
[276] M. A. Ridwan, N. A. M. Radzi, F. Abdullah, and Y. E. Jalil, "Applications of machine learning in networking: A survey of current issues and future challenges," IEEE Access, vol. 9, pp. 52523–52556, 2021.
[277] F. Tang, B. Mao, N. Kato, and G. Gui, "Comprehensive survey on machine learning in vehicular network: Technology, applications and challenges," IEEE Commun. Surveys Tuts., vol. 23, no. 3, pp. 2027–2057, 3rd Quart., 2021.
[278] E. García-Martín, C. F. Rodrigues, G. Riley, and H. Grahn, "Estimation of energy consumption in machine learning," J. Parallel Distrib. Comput., vol. 134, pp. 75–88, Dec. 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0743731518308773
[279] L. Song, X. Hu, G. Zhang, P. Spachos, K. N. Plataniotis, and H. Wu, "Networking systems of AI: On the convergence of computing and communications," IEEE Internet Things J., vol. 9, no. 20, pp. 20352–20381, Oct. 2022.
[280] G. Drainakis, K. V. Katsaros, P. Pantazopoulos, V. Sourlas, and A. Amditis, "Federated vs. centralized machine learning under privacy-elastic users: A comparative analysis," in Proc. IEEE 19th Int. Symp. Netw. Comput. Appl. (NCA), Nov. 2020, pp. 1–8.
[281] I. A. Majeed, S. Kaushik, A. Bardhan, V. S. K. Tadi, H.-K. Min, K. Kumaraguru, and R. D. Muni, "Comparative assessment of federated and centralized machine learning," 2022, arXiv:2202.01529.
[282] W. Hassan, T.-S. Chou, O. Tamer, J. Pickard, P. Appiah-Kubi, and L. Pagliari, "Cloud computing survey on services, enhancements and challenges in the era of machine learning and data science," Int. J. Informat. Commun. Technol. (IJ-ICT), vol. 9, no. 2, p. 117, Aug. 2020.
[283] Y. Ko, K. Choi, H. Jei, D. Lee, and S.-W. Kim, "ALADDIN: Asymmetric centralized training for distributed deep learning," in Proc. 30th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2021, pp. 863–872.
[284] X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan, and X. Chen, "Convergence of edge computing and deep learning: A comprehensive survey," IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 869–904, 2nd Quart., 2020.
[285] F. Samie, L. Bauer, and J. Henkel, "From cloud down to things: An overview of machine learning in Internet of Things," IEEE Internet Things J., vol. 6, no. 3, pp. 4921–4934, Jun. 2019.
[286] W. Toussaint and A. Y. Ding, "Machine learning systems in the IoT: Trustworthiness trade-offs for edge intelligence," in Proc. IEEE 2nd Int. Conf. Cognit. Mach. Intell. (CogMI), Oct. 2020, pp. 177–184.
[287] A. Smola and S. Narayanamurthy, "An architecture for parallel topic models," Proc. VLDB Endowment, vol. 3, nos. 1–2, pp. 703–710, Sep. 2010.
[288] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su, "Scaling distributed machine learning with the parameter server," in Proc. 11th USENIX Symp. Operating Syst. Des. Implement. (OSDI), 2014, pp. 583–598.
[289] J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng, "Large scale distributed deep networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1–8.
[290] J. Jiang, B. Cui, C. Zhang, and L. Yu, "Heterogeneity-aware distributed parameter servers," in Proc. ACM Int. Conf. Manage. Data, New York, NY, USA: Association for Computing Machinery, May 2017, pp. 463–478.
[291] B. McMahan et al., "Communication-efficient learning of deep networks from decentralized data," in Proc. 20th Int. Conf. Artif. Intell. Statist., 2017, pp. 1273–1282.
[292] Y. Liu, Y. Kang, C. Xing, T. Chen, and Q. Yang, "A secure federated transfer learning framework," IEEE Intell. Syst., vol. 35, no. 4, pp. 70–82, Jul. 2020.
[293] F. Chen, M. Luo, Z. Dong, Z. Li, and X. He, "Federated meta-learning with fast convergence and efficient communication," 2018, arXiv:1802.07876.
[294] A. Gibiansky, "Bringing HPC techniques to deep learning," Baidu Res., Beijing, China, Tech. Rep., 2017.
[295] P. Patarasuk and X. Yuan, "Bandwidth optimal all-reduce algorithms for clusters of workstations," J. Parallel Distrib. Comput., vol. 69, no. 2, pp. 117–124, Feb. 2009.
[296] H. Zhao and J. Canny, "Butterfly mixing: Accelerating incremental-update algorithms on clusters," in Proc. SIAM Int. Conf. Data Mining, May 2013, pp. 785–793.
[297] X. Wan, H. Zhang, H. Wang, S. Hu, J. Zhang, and K. Chen, "RAT–Resilient allreduce tree for distributed machine learning," in Proc. 4th Asia–Pacific Workshop Netw., Aug. 2020, pp. 52–57.
[298] O. Gupta and R. Raskar, "Distributed learning of deep neural network over multiple agents," J. Netw. Comput. Appl., vol. 116, pp. 1–8, Aug. 2018.
[299] E. Samikwa, A. D. Maio, and T. Braun, "ARES: Adaptive resource-aware split learning for Internet of Things," Comput. Netw., vol. 218, Dec. 2022, Art. no. 109380.
[300] V. Turina, Z. Zhang, F. Esposito, and I. Matta, "Federated or split? A performance and privacy analysis of hybrid split and federated learning architectures," in Proc. IEEE 14th Int. Conf. Cloud Comput. (CLOUD), Sep. 2021, pp. 250–260.
[301] Y. Gao, M. Kim, C. Thapa, A. Abuadbba, Z. Zhang, S. Camtepe, H. Kim, and S. Nepal, "Evaluation and optimization of distributed machine learning techniques for Internet of Things," IEEE Trans. Comput., vol. 71, no. 10, pp. 2538–2552, Oct. 2022.
[302] E. Samikwa, A. Di Maio, and T. Braun, "Adaptive early exit of computation for energy-efficient and low-latency machine learning over IoT networks," in Proc. IEEE 19th Annu. Consum. Commun. Netw. Conf. (CCNC), Jan. 2022, pp. 200–206.
[303] Y. Matsubara, M. Levorato, and F. Restuccia, "Split computing and early exiting for deep learning applications: Survey and research challenges," ACM Comput. Surv., vol. 55, no. 5, pp. 1–30, May 2023.
[304] Y. Matsubara, R. Yang, M. Levorato, and S. Mandt, "SC2 benchmark: Supervised compression for split computing," 2022, arXiv:2203.08875.
[305] M. G. S. Murshed, C. Murphy, D. Hou, N. Khan, G. Ananthanarayanan, and F. Hussain, "Machine learning at the network edge: A survey," ACM Comput. Surveys, vol. 54, no. 8, pp. 1–37, Oct. 2021, doi: 10.1145/3469029.
[306] J. Shuja, K. Bilal, W. Alasmary, H. Sinky, and E. Alanazi, "Applying machine learning techniques for caching in next-generation edge networks: A comprehensive survey," J. Netw. Comput. Appl., vol. 181, May 2021, Art. no. 103005.
[307] J. Xie, F. R. Yu, T. Huang, R. Xie, J. Liu, C. Wang, and Y. Liu, "A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges," IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 393–430, 1st Quart., 2019.
[308] R. Boutaba, M. A. Salahuddin, N. Limam, S. Ayoubi, N. Shahriar, F. Estrada-Solano, and O. M. Caicedo, "A comprehensive survey on machine learning for networking: Evolution, applications and research opportunities," J. Internet Services Appl., vol. 9, no. 1, pp. 1–99, Dec. 2018.
[309] Y. Liu, F. R. Yu, X. Li, H. Ji, and V. C. M. Leung, "Blockchain and machine learning for communications and networking systems," IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 1392–1431, 2nd Quart., 2020.
[310] O. Nassef, W. Sun, H. Purmehdi, M. Tatipamula, and T. Mahmoodi, "A survey: Distributed machine learning for 5G and beyond," Comput. Netw., vol. 207, Apr. 2022, Art. no. 108820.
[311] O. A. Wahab, A. Mourad, H. Otrok, and T. Taleb, "Federated machine learning: Survey, multi-level classification, desirable criteria and future directions in communication and networking systems," IEEE Commun. Surveys Tuts., vol. 23, no. 2, pp. 1342–1397, 2nd Quart., 2021.
[312] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, "Federated learning in mobile edge networks: A comprehensive survey," IEEE Commun. Surveys Tuts., vol. 22, no. 3, pp. 2031–2063, 3rd Quart., 2020.
[313] A. Imteaj, K. Mamun Ahmed, U. Thakker, S. Wang, J. Li, and M. H. Amini, "Federated learning for resource-constrained IoT devices: Panoramas and state of the art," in Federated and Transfer Learning. Cham, Switzerland: Springer, 2022, pp. 7–27.
[314] M. Abbasi, A. Shahraki, and A. Taherkordi, "Deep learning for network traffic monitoring and analysis (NTMA): A survey," Comput. Commun., vol. 170, pp. 19–41, Mar. 2021.
[315] F. Hussain, S. A. Hassan, R. Hussain, and E. Hossain, "Machine learning for resource management in cellular and IoT networks: Potentials, current solutions, and open challenges," IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 1251–1275, 2nd Quart., 2020.
[316] A. Talpur and M. Gurusamy, "Machine learning for security in vehicular networks: A comprehensive survey," IEEE Commun. Surveys Tuts., vol. 24, no. 1, pp. 346–379, 1st Quart., 2022.
[317] M. A. Hossain, R. M. Noor, K. A. Yau, S. R. Azzuhri, M. R. Z'aba, and I. Ahmedy, "Comprehensive survey of machine learning approaches in cognitive radio-based vehicular ad hoc networks," IEEE Access, vol. 8, pp. 78054–78108, 2020.
[318] W. Guo, "Explainable artificial intelligence for 6G: Improving trust between human and machine," IEEE Commun. Mag., vol. 58, no. 6, pp. 39–45, Jun. 2020.
[319] Y. Zheng, Z. Liu, X. You, Y. Xu, and J. Jiang, "Demystifying deep learning in networking," in Proc. 2nd Asia–Pacific Workshop Netw., Aug. 2018, pp. 1–7.
[320] A. Adadi and M. Berrada, "Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)," IEEE Access, vol. 6, pp. 52138–52160, 2018.
[321] A. Heuillet, F. Couthouis, and N. Díaz-Rodríguez, "Explainability in deep reinforcement learning," Knowl.-Based Syst., vol. 214, Feb. 2021, Art. no. 106685.
[322] A. Paleyes, R.-G. Urma, and N. D. Lawrence, "Challenges in deploying machine learning: A survey of case studies," ACM Comput. Surv., vol. 55, no. 6, pp. 1–29, Jul. 2023.
[323] S. Faezi and A. Shirmarz, "A comprehensive survey on machine learning using in software defined networks (SDN)," Hum.-Centric Intell. Syst., vol. 3, no. 3, pp. 312–343, Jun. 2023.
[324] S. Sezer, S. Scott-Hayward, P. K. Chouhan, B. Fraser, D. Lake, J. Finnegan, N. Viljoen, M. Miller, and N. Rao, "Are we ready for SDN? Implementation challenges for software-defined networks," IEEE Commun. Mag., vol. 51, no. 7, pp. 36–43, Jul. 2013.
[325] C. Lu, A. Saifullah, B. Li, M. Sha, H. Gonzalez, D. Gunatilaka, C. Wu, L. Nie, and Y. Chen, "Real-time wireless sensor-actuator networks for industrial cyber-physical systems," Proc. IEEE, vol. 104, no. 5, pp. 1013–1024, May 2016.
[326] H. Kopetz and W. Steiner, "Real-time communication," in Real-Time Systems: Design Principles for Distributed Embedded Applications. Cham, Switzerland: Springer, 2022, pp. 177–200.
[327] D. Zhang, P. Shi, Q.-G. Wang, and L. Yu, "Analysis and synthesis of networked control systems: A survey of recent advances and challenges," ISA Trans., vol. 66, pp. 376–392, Jan. 2017.
[328] S. Ramstedt and C. Pal, "Real-time reinforcement learning," in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 1–9.
[329] J. Mendez, K. Bierzynski, M. P. Cuéllar, and D. P. Morales, "Edge intelligence: Concepts, architectures, applications, and future directions," ACM Trans. Embedded Comput. Syst., vol. 21, no. 5, pp. 1–41, Sep. 2022.
[330] L. E. Lwakatare, A. Raj, I. Crnkovic, J. Bosch, and H. H. Olsson, "Large-scale machine learning systems in real-world industrial settings: A review of challenges and solutions," Inf. Softw. Technol., vol. 127, Nov. 2020, Art. no. 106368.
[331] A. Redder, A. Ramaswamy, and H. Karl, "Stability and convergence of distributed stochastic approximations with large unbounded stochastic information delays," 2023, arXiv:2305.07091.
[332] R. Bless, B. Bloessl, M. Hollick, M. Corici, H. Karl, D. Krummacker, D. Lindenschmitt, H. D. Schotten, and L. Wimmer, "Dynamic network (re-)configuration across time, scope, and structure," in Proc. Joint Eur. Conf. Netw. Commun. 6G Summit (EuCNC/6G Summit), Jun. 2022, pp. 547–552.
[333] J. Hoydis, F. A. Aoudia, A. Valcarce, and H. Viswanathan, "Toward a 6G AI-native air interface," IEEE Commun. Mag., vol. 59, no. 5, pp. 76–81, May 2021.
[334] F. Ait Aoudia, J. Hoydis, A. Valcarce, and H. Viswanathan. (2021). Toward a 6G AI-Native Air Interface. [Online]. Available: https://www.bell-labs.com/institute/white-papers/toward-6g-ai-native-air-interface/
[335] Rohde & Schwarz. (2023). Enabling an AI-Native Air Interface for 6G: Rohde & Schwarz Showcases AI/ML-Based Neural Receiver With Optimized Modulation at Brooklyn 6G Summit, in Collaboration With Nvidia. [Online]. Available: https://www.rohde-schwarz.com/se/about/news-press/all-news/enabling-an-ai-native-air-interface-for-6g-rohde-schwarz-showcases-ai-ml-based-neural-receiver-with-optimized-modulation-at-brooklyn-6g-summit-in-collaboration-with-nvidia-press-release-detailpage229356-1425541.html
[336] Ericsson. (2021). Defining AI Native: A Key Enabler for Advanced Intelligent Telecom Networks. [Online]. Available: https://www.ericsson.com/en/reports-and-papers/white-papers/ai-native
[337] C. Chaccour, W. Saad, M. Debbah, Z. Han, and H. V. Poor, "Less data, more knowledge: Building next generation semantic communication networks," 2022, arXiv:2211.14343.
[338] C. K. Thomas, C. Chaccour, W. Saad, M. Debbah, and C. S. Hong, "Causal reasoning: Charting a revolutionary course for next-generation AI-native wireless networks," IEEE Veh. Technol. Mag., vol. 19, no. 1, pp. 16–31, Mar. 2024.
[339] A. Mughees, M. Tahir, M. A. Sheikh, and A. Ahad, "Towards energy efficient 5G networks using machine learning: Taxonomy, research challenges, and future research directions," IEEE Access, vol. 8, pp. 187498–187522, 2020.
[340] D. Li, X. Chen, M. Becchi, and Z. Zong, "Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs," in Proc. IEEE Int. Conf. Big Data Cloud Comput. (BDCloud), Social Comput. Netw. (SocialCom), Sustain. Comput. Commun. (SustainCom) (BDCloud-SocialCom-SustainCom), Oct. 2016, pp. 477–484.
[341] M. Svedin, S. W. D. Chien, G. Chikafa, N. Jansson, and A. Podobas, "Benchmarking the Nvidia GPU lineage: From early K80 to modern A100 with asynchronous memory transfers," in Proc. 11th Int. Symp. Highly Efficient Accel. Reconfigurable Technol., Jun. 2021, pp. 1–6.
[342] E. Samikwa, A. D. Maio, and T. Braun, "DISNET: Distributed micro-split deep learning in heterogeneous dynamic IoT," IEEE Internet Things J., vol. 11, no. 4, pp. 6199–6216, Feb. 2024.
[343] Y. Chen, T.-J. Yang, J. Emer, and V. Sze, "Understanding the limitations of existing energy-efficient design approaches for deep neural networks," in Proc. SysML Conf., 2018.
[344] E. Hossain and F. Fredj, "Editorial energy efficiency of machine-learning-based designs for future wireless systems and networks," IEEE Trans. Green Commun. Netw., vol. 5, no. 3, pp. 1005–1010, Sep. 2021.
[345] R. Desislavov, F. Martínez-Plumed, and J. Hernández-Orallo, "Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning," Sustain. Comput., Informat. Syst., vol. 38, Apr. 2023, Art. no. 100857. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2210537923000124
[346] T.-J. Yang, Y.-H. Chen, J. Emer, and V. Sze, "A method to estimate the energy consumption of deep neural networks," in Proc. 51st Asilomar Conf. Signals, Syst., Comput., Oct. 2017, pp. 1916–1920.
[347] T.-J. Yang, Y.-H. Chen, and V. Sze, "Deep neural network energy estimation tool," MIT, Tech. Rep., Jan. 2017. Accessed: Jan. 25, 2024. [Online]. Available: https://energyestimation.mit.edu/
[348] J. Lin, W.-M. Chen, Y. Lin, C. Gan, S. Han, and J. Cohn, "MCUNet: Tiny deep learning on IoT devices," in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 11711–11722.
[349] H. Cai, C. Gan, L. Zhu, and S. Han, "TinyTL: Reduce activations, not trainable parameters for efficient on-device learning," 2020, arXiv:2007.11622.
[350] L. Heim, A. Biri, Z. Qu, and L. Thiele, "Measuring what really matters: Optimizing neural networks for TinyML," 2021, arXiv:2104.10645.
[351] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, "Deep gradient compression: Reducing the communication bandwidth for distributed training," 2017, arXiv:1712.01887.
[352] Y. Abadade, A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki, and A. S. Hafid, "A comprehensive survey on TinyML," IEEE Access, vol. 11, pp. 96892–96922, 2023.

HAITHAM AFIFI (Member, IEEE) received the B.Sc. degree in information engineering and technology and the M.Sc. degree in communication engineering from German University in Cairo, in 2014 and 2015, respectively, and the Ph.D. degree from the Hasso Plattner Institute, in 2023. He has industry experience as a Network Engineer with Orange Business Services and as an IT Consultant integrating generative AI into network operations. His research interests include wireless network virtualization, reinforcement learning, and network optimization.

SABRINA POCHABA received the master's degree in mathematics from Ruprecht Karls University, Heidelberg, Germany. She is currently pursuing the Ph.D. degree with Paris Lodron University of Salzburg. Since 2021, she has been a Data Scientist with Salzburg Research Forschungsgesellschaft, where she works on various machine learning methods, focusing on networks and communication.

ANDREAS BOLTRES received the B.S. and M.S. degrees in informatics from Karlsruhe Institute of Technology (KIT), Germany, in 2017 and 2021, respectively, where he is currently pursuing the Ph.D. degree with the Autonomous Learning Robots Laboratory. His research interests include multi-agent and swarm reinforcement learning and their applications to robotics and computer networking, in particular routing optimization and traffic engineering.

DOMINIC LANIEWSKI (Graduate Student Member, IEEE) received the B.S. degree in information systems from the University of Münster, in 2017, and the M.S. degree in computer science from Osnabrück University, in 2019, where he is currently pursuing the Ph.D. degree with the Distributed Systems Group. His research interests include machine learning for networks, QoE of streaming applications, video and point cloud streaming, and robot communications.
JANEK HABERER received the B.Sc. and M.Sc. degrees in computer science from Kiel University, Germany, in 2019 and 2021, respectively, where he is currently pursuing the Ph.D. degree with the Distributed Systems Group. His research interests include distributed machine learning and its applications in the Internet of Things, particularly edge computing and split learning.

LEONARD PAELEKE received the B.Eng. degree in mechanical engineering from Berliner Hochschule für Technik (BHT), in 2018, and the M.S. degree in computational engineering from TU Berlin, in 2022. Since April 2022, he has been a Ph.D. Researcher with the Internet-Technologies and Softwarization Group, Hasso Plattner Institute (HPI). His research interests include machine learning for networks, networks for machine learning, distributed machine learning, and their application in mobile telecommunication networks, such as 6G.

REZA POORZARE (Member, IEEE) received the B.S. and M.S. degrees in computer engineering from Azad University, Iran, in 2010 and 2014, respectively, and the Ph.D. degree in network engineering from Universitat Politècnica de Catalunya, Barcelona, Spain, in 2022. Currently, he is a Postdoctoral Researcher with the Data-Centric Software Systems (DSS) Research Group, Institute of Applied Research, Karlsruhe University of Applied Sciences, Karlsruhe, Germany. His research interests include 5G, mmWave, wireless mobile networks, TCP, MPTCP, congestion control, and artificial intelligence.

NIKOLAS WEHNER received the master's degree in computer science from the University of Würzburg, Germany. In 2018, he started working as a Research Engineer with the Center for Technology Experience, AIT Austrian Institute of Technology, Vienna, Austria. Since October 2019, he has been a Ph.D. Researcher with the Chair of Communication Networks, University of Würzburg. His research interests include QoE of Internet applications, machine learning for networks, and user-centric communication networks.

ADRIAN REDDER received the Master of Science degree in electrical engineering from Paderborn University (UPB), Germany, in 2019, with a major in control and information theory, and the Ph.D. degree in computer science, with a thesis on distributed stochastic approximation algorithms, in April 2024. After his graduate studies, he was a member of the Computer Networks Group, UPB; since October 2023, he has been a member of the Automatic Control Group, UPB. His research interests include control theory, reinforcement learning, distributed systems, and stochastic approximation algorithms.

ERIC SAMIKWA received the M.Sc. degree in computer science and engineering from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 2020. He is currently pursuing the Ph.D. degree with the Communication and Distributed Systems Group, Institute of Computer Science, University of Bern, Bern, Switzerland. His research interests include distributed machine learning, federated learning, split learning, edge computing, and the Internet of Things.