A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
ABSTRACT Machine learning has found many applications in network contexts, including solving optimization problems and managing network operations. Conversely, networks are essential for facilitating machine learning training and inference, whether performed centrally or in a distributed fashion. To conduct rigorous research in this area, researchers must have a comprehensive understanding of fundamental techniques and specific frameworks, as well as access to relevant datasets. Additionally, access to training data can serve as a benchmark or a springboard for further investigation. All of these techniques are summarized in this article, which serves as a primer paper and hopefully provides an efficient start for anybody doing research on machine learning for networks or on using networks for machine learning.
representative portion of the problem's data domain. Also, more sophisticated models can quickly explode in terms of parameter/compute operation count and thus often require specialized training hardware (i.e., memory and compute). Nevertheless, the continuous improvement of the hardware used, as well as the increased attention towards training data acquisition, preparation and generation, has paved the way for ML to enter more and more application domains.

Computer networking is a highly complex problem domain with a plethora of tasks and problems that, to this day, are solved predominantly through hand-crafted, algorithmic, or heuristic methods. These methods have to respect a wide range of topologies, network types and scopes, configurations, hardware and protocol stacks, traffic patterns, and other sources of variation. Furthermore, there are many different ways to assess network performance, and in many cases, minimum performance guarantees and security policies add special constraints to the optimization problem. Additionally, contemporary networks use specialized hardware to deliver optimized performance, e.g. for forwarding packets at line speed. Oftentimes, this hardware does not easily allow ML models to replace existing functionality, e.g. because certain types of computations are not supported or because the storage is not available for more complex ML models. Finally, while network administrators and networking researchers do monitor their networks in action, the amount of useful ML training data in networking – data that is neither noisy nor incomplete, is publicly available, and is diverse enough to cover large parts of the problem's underlying data domain – is only a fraction of what other problem domains have at their disposal. As a consequence, optimizing network performance has so far been largely beyond the reach of ML research. However, given the increased visibility of ML, researchers are beginning to take on the aforementioned challenges of the networking community with ML, and combining ML and networking in research seems more attractive than ever. Furthermore, computer network infrastructures have recently been used to improve the performance of existing ML approaches, e.g. by distributing the training process or the data collection to improve resource utilization or training speed.

ML is a very active and rapidly expanding research field that includes an abundance of learning techniques, model types, tools and frameworks, practices, and application possibilities. Although we focus here on ML models, some applications require considering the whole running system, i.e., the AI system, to properly evaluate and understand the output, instead of focusing solely on the ML models [4]. This paper is intended as a primer and practical guide for researchers who are keen on quickly applying ML to problems in computer networking and/or leveraging networking techniques to improve the performance of their ML systems but feel overwhelmed by the possibilities the intersection of ML and computer networking provides. The key points of the paper are the following:
• It first introduces the most relevant concepts and model architectures of ML and then puts them into the context of the different networking problem domains and the latest advancements therein,
• It exposes the currently open problems within computer networking and introduces a selection of different tools, data sets, and approaches that have been popular among the research community and might serve as a starting point for future work,
• It covers several techniques for utilizing networks to improve ML efficiency, such as reducing resource requirements via Split Learning (SL), distributed training via Federated Learning (FL), or incorporating the right inductive biases into ML models to improve their ability to generalize from limited data,
• It discusses challenges related to networks for ML, such as resource constraints, security concerns, and the lack of understanding of how ML models make decisions (and how techniques such as Explainable Artificial Intelligence (XAI) may help in gaining understanding),
• It comprehensively provides pointers for further study on related surveys and research.

The organization of the paper is visualized in Figure 1, and the remainder is organized as follows: Section II explains the basic concepts and categories of ML and relates common networking problems to them. Section III introduces the ML subfield of deep learning, which has been responsible for most of the recent ML breakthroughs, elaborating on the most common model architectures and how and why they are suited to specific tasks within computer networking. Thereafter, Section IV sheds light on the variety of accessible data sets, tools, and frameworks that ease the development and training of ML-powered networking systems. Section V discusses explainability in Artificial Intelligence (XAI), which is rightfully gaining traction because many recently tapped application domains (including computer networks) come with amounts of complexity and risk that disqualify fully black-box ML models for widespread adoption. Section VI broadens the scope presented up until now and introduces ML techniques and paradigms such as distributed and parallel learning. These techniques leverage existing networking concepts and technology and seem useful, if not mandatory, for many problems in the networking domain. Section VII and Section VIII give an overview of related survey papers and open challenges in the concerned areas, and finally, Section IX concludes this paper by summarizing the presented content and providing perspectives on the open challenges and questions of ML in networking and vice versa.
FIGURE 1: Organization of the paper, mapping its sections (Introduction; An Overview of Machine Learning with supervised learning, unsupervised learning, and further ML methods; Split Learning and Inference; All-Reduce; Further Readings; Challenges and Future Directions; Conclusion) to the covered topics.
II. AN OVERVIEW OF MACHINE LEARNING
AI is the discipline of machines that solve problems by perceiving the environment and using some form of knowledge model in order to derive solutions and conclusions. ML is an integral part of AI, of which it is considered a major subfield [5]. ML models are statistically and computationally derived from evidence in the form of historical data or experience instead of explicitly programming a machine for a task. The three traditional ML paradigms are supervised, unsupervised, and Reinforcement Learning (RL). Methods can be categorized into these paradigms by the type of feedback the learning system receives. In supervised learning, exact feedback is available in the form of data labels. In unsupervised learning, on the other hand, data is only partially labeled or completely unlabeled. Finally, in RL, implicit feedback is available for observed data in terms of a so-called reward function that labels data by a numerical value. We will now discuss the three main ML paradigms with a focus on the most popular ones. We then briefly touch on some additional branches of ML that are relevant to computer networking.

A. SUPERVISED LEARNING
Supervised learning is the first of the three main types of ML and encompasses models that predict target values yi for given data points xi. The starting point for the learning problem is a data set that consists of input-output data points D = {(x1, y1), (x2, y2), ..., (xN, yN)}. The goal is to learn a function h mapping from the input domain to the target domain such that ŷi = h(xi) for all data points. Both input and output domains can take various shapes, such as Boolean or scalar values, Euclidean vectors, or more complex representations such as graphs. Depending on the type of output domain, supervised learning is generally divided into classification and regression problems. Examples of popular network applications that use supervised learning are traffic prediction [6] and classifying security attacks [7].
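As a minimal, self-contained illustration of this supervised setup (the flow features, labels, and model choice below are synthetic assumptions for illustration only, not taken from the paper):

```python
# Minimal supervised-learning sketch: fit a classifier h(x) -> y on synthetic "flow" features.
# All data below is randomly generated and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# x_i: three illustrative per-flow features (e.g., mean packet size, mean inter-arrival time, duration)
X = rng.random((1000, 3))
# y_i: binary label, e.g., 0 = benign traffic, 1 = attack traffic (synthetic rule plus label noise)
y = ((X[:, 0] + 0.5 * X[:, 1] > 0.8) ^ (rng.random(1000) < 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
h = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, h.predict(X_test)))
```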
FIGURE: Two-class example data over Feature 1 and Feature 2 (Class 1 vs. Class 2), shown as the resulting decision regions together with the corresponding decision tree splits (e.g., "Feature 2 <= -0.189, samples = 30").
• Random Forests [12] for regression use the average of the individual trees' predictions as the final prediction value.
• The KNN [13] method for regression calculates the label for the new data point by calculating the average target value of its k-nearest neighbors.
• The most popular regression method is least-squares fitting, in which the model is updated to minimize the squared L2 norms of the difference between the predicted values and their associated labels. This is known as the Mean Squared Error (MSE).
In linear regression, the fitted model is represented by a linear function, while in logarithmic regression, it is represented by a logarithmic function. In other words, the least-squares method fits a line to the data points in a way that minimizes the sum of the squared vertical distances between the line and the points.
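A minimal sketch of this least-squares fitting with synthetic data, using NumPy's closed-form solver and reporting the MSE:

```python
# Least-squares fit of a line y = w*x + b via the normal equations, evaluated with the MSE.
# The data is synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 1.5 * x - 0.7 + rng.normal(scale=0.5, size=200)    # noisy linear ground truth

A = np.column_stack([x, np.ones_like(x)])              # design matrix [x, 1]
w, b = np.linalg.lstsq(A, y, rcond=None)[0]            # minimizes ||A @ [w, b] - y||^2

y_hat = w * x + b
mse = np.mean((y - y_hat) ** 2)
print(f"w={w:.3f}, b={b:.3f}, MSE={mse:.3f}")
```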
B. UNSUPERVISED LEARNING
As opposed to supervised learning, in unsupervised learning, the data comes without output/target values. Consequently, ML models are tasked with finding the underlying regularities in the data domain by inferring them from the given training data. The two main types of unsupervised learning, namely clustering and dimensionality reduction, differ in their use case. Unsupervised learning has been used for tasks such as anomaly detection, intrusion detection [17] and data traffic analyses [18].

1) Clustering
Clustering approaches use the data points' feature values to find regularities in the data domain and thus divide them into multiple semantically meaningful categories. Clustering approaches such as k-means or Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [19] differ in the way cluster affiliation is calculated, for example, through data density or neighbor connectivity via a measurable distance between the data points. Within the networking domain, data grouping can serve as a useful starting point for further analysis and action in a variety of problem settings, such as anomaly detection and resolution [20], task classification for scheduling [21], or traffic characterization for traffic engineering [22].

In general, there are different metrics to evaluate the performance of ML algorithms. Table 3 shows the most common metrics, which appear in the literature, used for supervised learning (with an emphasis on classification metrics that are typically used for evaluating traffic prediction) and unsupervised learning (with an emphasis on clustering metrics as seen in intrusion detection as well as node(s) selection for data collection).
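A corresponding clustering sketch (synthetic two-dimensional traffic features; the silhouette score is one example of the clustering metrics referred to above):

```python
# k-means clustering of synthetic per-flow features, evaluated with the silhouette score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two synthetic "traffic types" in a 2-D feature space (e.g., mean packet size vs. mean rate).
X = np.vstack([rng.normal(loc=[0.2, 0.3], scale=0.05, size=(200, 2)),
               rng.normal(loc=[0.7, 0.8], scale=0.05, size=(200, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("silhouette score:", silhouette_score(X, kmeans.labels_))
```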
TABLE 1: Summary of supervised learning methods.

Support Vector Machine (SVM)
Type: Regression and classification.
Description: A supervised learning algorithm that finds a hyperplane that separates the data into different classes or predicts a continuous value based on the input features.
Advantages: Can handle high-dimensional data, nonlinear relationships, and outliers. Can use different kernels to customize the model.
Disadvantages: Can be computationally expensive, sensitive to hyperparameters, and difficult to interpret.

Support Vector Regression (SVR)
Type: Regression.
Description: A type of SVM that predicts a continuous value based on the input features. It uses an epsilon-insensitive loss function to measure the error between the predicted and actual values.
Advantages: Can handle high-dimensional data, nonlinear relationships, and outliers. Can use different kernels to customize the model.
Disadvantages: Can be computationally expensive, sensitive to hyperparameters, and difficult to interpret.

Decision Trees
Type: Regression and classification.
Description: A supervised learning algorithm that splits the data into smaller subsets based on some criteria until the leaf nodes are reached. The leaf nodes represent the class labels or the predicted values.
Advantages: Easy to understand, interpret, and visualize. Can handle both numerical and categorical data. Can capture complex nonlinear relationships.
Disadvantages: Prone to overfitting, instability, and bias. Sensitive to noise and missing values.

Random Forests
Type: Regression and classification.
Description: An ensemble learning algorithm that combines multiple decision trees and aggregates their predictions using majority voting or averaging. It uses bootstrap sampling and feature selection to introduce randomness and reduce correlation among the trees.
Advantages: Can improve the accuracy and robustness of decision trees. Can handle both numerical and categorical data. Can capture complex nonlinear relationships. Can estimate feature importance.
Disadvantages: Prone to overfitting, especially with noisy data. Can be computationally expensive and slow to train and test. Less interpretable than single decision trees.

k-Nearest Neighbors (KNN)
Type: Regression and classification.
Description: A lazy learning algorithm that predicts the class label or the value of a new instance based on the similarity with its k nearest neighbors in the training data. It uses a distance metric, such as the Euclidean distance, to measure similarity.
Advantages: Simple and intuitive to implement and understand. No training required. Can handle both numerical and categorical data. Can adapt to new data dynamically.
Disadvantages: Sensitive to noise, outliers, and irrelevant features. Can be computationally expensive and slow to test. Requires storage of the entire training data. Difficult to choose the optimal value of k.

Least Squares
Type: Regression.
Description: A method of fitting a linear model to the data by minimizing the sum of squared errors between the observed and predicted values. It can be solved analytically using normal equations or iteratively using gradient descent or other algorithms.
Advantages: Simple and fast to implement and solve. Provides a closed-form solution for linear models. Can handle multiple features and multicollinearity.
Disadvantages: Sensitive to noise, outliers, and nonlinearity. Prone to overfitting or underfitting, depending on the complexity of the model. May suffer from numerical instability or singularity issues.
2) Dimensionality Reduction
This type of learning analyzes the statistical properties of the data in order to reduce the number of dimensions that sufficiently describes the data. This is particularly useful when dealing with more complex learning problems, as theoretical results show that the amount of data points needed to learn an accurate model scales exponentially with the dimensionality of the input data domain [23] (this phenomenon has been coined the "curse of dimensionality"). While approaches like Decision Trees or Random Forest can reduce the dimensionality of the relevant portions of the data by considering only the most meaningful features, approaches like Principal Component Analysis (PCA) [24] find a reduced-cardinality combination of new features. Like clustering, this type of unsupervised learning is beneficial as a preparative step before further analysis or model training, especially since, in many real-world scenarios, it has been observed that the given data lies on manifolds of much lower dimensionality than the actual input space (the presumed general rule for this is called the manifold hypothesis [25]).
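A small PCA sketch illustrating the idea (synthetic data that lies in a low-dimensional subspace of a 50-dimensional input space; the number of retained components is arbitrary):

```python
# PCA sketch: project 50-dimensional synthetic measurements onto a few principal
# components and inspect how much variance they retain. Purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))                        # data effectively lives in a 3-D subspace
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))   # embedded in 50 dimensions plus noise

pca = PCA(n_components=5).fit(X)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
X_reduced = pca.transform(X)                              # 500 x 5 representation for downstream models
```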
Table 1 summarizes supervised methods, while Table 2 summarizes unsupervised methods. Further details can be found in [26], [27]. Regardless of which method is used, it is important to watch out for over- and underfitting. Overfitting is a condition where a statistical model begins to describe the random error in the data rather than the relationships between variables. This problem occurs when the model is too complex.

Underfitting, on the other hand, is the inverse of overfitting. It means that the statistical or ML model is too simplistic to accurately capture the patterns in the data. A sign of underfitting is a high bias and low variance in the current model.

C. FURTHER ML METHODS
There are various other branches of ML that are of use in computer networks, see [28], [29]. Here, we discuss two additional ML frameworks that are presumably relevant in the networking domain.

1) Probabilistic ML
Oftentimes, neither all relevant information is known or attainable prior to making a decision, nor is the environment that reacts to the taken decision purely deterministic [5]. Uncertainty may exist in the input data, in the decision model parameters and output values, and even in the architecture of the decision model itself [30]. In all of these cases, probability theory provides a unified framework to cope by using probability distributions to model uncertain quantities. This framework is, in principle, applicable to all ML learning paradigms, model architectures and problem domains that come with some notion of uncertainty. Since its comprehensive introduction would exceed this paper's scope, we point the interested reader to [30] for a high-level overview, and [31] for an extended overview of the core concepts of probabilistic ML.

2) Hybrid Learning Approaches
Many ML contributions do not fully fall into one of the aforementioned learning paradigms but rather combine their ideas and create new sources for learning signals. Some of these "hybrid" learning approaches are popular enough to earn their own description. In semi-supervised learning, typically, only parts of the training data are labeled [27]. To train a model in a supervised or unsupervised manner, the auxiliary information is extracted by respectively using the other learning type. Self-supervised learning, on the other hand, tackles shortcomings of supervised learning approaches (i.e., the need for large amounts of data and vulnerability to adversarial inputs) by using parts or representations of the input data as labels [32]. For example, in [33], a model is trained to predict future video frames by only feeding it the first few frames of a video and using the remaining frames as "comparison" labels.

D. REINFORCEMENT LEARNING
In the spectrum of traditional learning paradigms for intelligent agents, Reinforcement Learning (RL) is located between the two extreme domains of fully supervised and unsupervised learning. RL is particularly suitable for decision, control, and optimization problems where data and observations are received sequentially [34]. As such, RL can be applied to various challenging problems in network science [35]–[37]. In particular, Deep RL (DRL) methods, to be discussed in Section III-C, have seen tremendous success in solving resource allocation problems in computer networking [38].

The implementation of RL is based on an RL agent that receives performance feedback called rewards as the agent interacts with an environment over time [39]. The algorithm designer typically crafts the reward as a function of the agent's sequential observations. The rewards, however, do not provide exact instructive feedback on how to change the agent's behavior, which is why RL is placed between the two extremes in the spectrum of learning paradigms. In this section, we will describe the basics of RL and the most fundamental algorithms. Throughout, we will directly refer to applications in computer networking for almost all mentioned algorithms.

The interaction of an RL agent with its environment is described by a Markov Decision Process (MDP) as illustrated in Figure 3. Whenever one seeks to solve a problem using RL, the first step (arguably the most important) is to define the problem as an MDP. Based on this MDP, one then chooses or designs a suitable RL
weight is given to rewards in the near future, and the weights of future rewards decay geometrically. For background on average and total cost MDPs see [40, Chapter 4 & 5].

Given a policy π, the associated action-value function (also called Q-function) is defined as

Q-Learning is guaranteed to converge to the optimal Q-function if all states and actions are explored infinitely often [40, Section 6.6.1]. Q-Learning has been successfully applied to various problems in computer networks, e.g., network self-organization [45], network slicing [46] or virtual network embedding [47].
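For the standard discounted setting with discount factor γ ∈ (0, 1), the Q-function of a policy π and the tabular Q-Learning update to which the convergence result above refers are commonly written as follows (standard textbook formulation):

```latex
% Discounted action-value function of a policy \pi:
Q^{\pi}(s,a) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \;\Big|\; s_0 = s,\ a_0 = a,\ a_t \sim \pi\right]
% Tabular Q-Learning update with step size \alpha after observing (s, a, r, s'):
Q(s,a) \leftarrow Q(s,a) + \alpha \left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]
```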
above derivative. Calculated gradients are then used to update the NN weights iteratively. Various algorithms have been proposed throughout the last decade to use the aforementioned calculated gradients most effectively. The most well-known algorithm is the Adam optimizer [77], which adaptively selects the step size for individual NN weights based on the calculated gradient information. See [78] and the references therein for various other gradient-based methods. Note that most tools (such as PyTorch and TensorFlow) offer these optimizers as black boxes without having to deal with the implementation details.

B. DEEP NEURAL NETWORK ARCHITECTURES
The UAT seems to advocate that rather simple feed-forward NN architectures can be used for any problem that might be solvable with ML. In practice, however, the findings of the UAT are greatly humbled by the excessive amounts of training data, the size of the NN models, and the training time required to achieve satisfactory results on a complex task. Furthermore, for many tasks, it can be observed that the members of the underlying data domain are semantically composable into simpler entities, spanning a hierarchy of concepts. As a consequence, researchers have started to add more structure to their models.

Different model architectures have proven effective for different tasks. An overview of common deep learning model architectures is given in Table 4. In the following subsections, we briefly present the most popular model archetypes and refer to the provided references for further reading. Interestingly, all of the model archetypes introduced below are derivable from the same basic mathematical framework and only differ in the shape of data and the assumptions made about regularities in the data [79].

2) Convolutional Neural Network (CNN)
In many problem domains, data exists on a grid-like structure where spatial patterns carry the same semantic information regardless of their location in the grid (also referred to as translation invariance). Examples include images (2D grids) but also time-series data (1D grids). To exploit this symmetry, Convolutional Neural Networks (CNNs) utilize spatial convolution, which applies the same learnable spatial parametric kernels (i.e., small matrices with learnable individual entries) on evenly spaced patches of the input grid [70]. The re-usage of a set of such kernels across multiple image positions is called weight sharing and greatly reduces the number of parameters needed to learn and extract the patterns of the input data.

3) Recurrent Neural Network (RNN)
For dealing with sequential data such as time series, Recurrent Neural Network (RNN) elements such as the Long Short-Term Memory (LSTM) [84] or the Gated Recurrent Unit (GRU) [85] have proven very useful. The commonality between all RNNs is feeding a portion of the output back into the RNN block for subsequent computations, enabling NN architectures with recurrent elements to capture sequential dependencies within the data [70].

4) Graph Neural Network (GNN)
Recently, Graph Neural Networks (GNNs) have emerged as powerful architectures for handling graph-structured data. They utilize permutation-invariant aggregation/pooling operations and permutation-equivariant message passing operations to learn patterns in the data while respecting the graph topology rather than assuming any specific ordering of its nodes and edges [86].
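As a brief, self-contained illustration of the recurrent idea (and of using an optimizer such as Adam as a black box), the following PyTorch sketch trains a small LSTM to predict the next value of a synthetic throughput sequence; all shapes and hyperparameters are illustrative:

```python
# Minimal PyTorch sketch: an LSTM that predicts the next value of a (synthetic)
# throughput time series, trained with the Adam optimizer. Purely illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic data: 256 sequences of length 20 with one feature (e.g., Mb/s); target = next value.
seq = torch.rand(256, 20, 1)
target = seq[:, -1, :] * 0.9 + 0.05              # synthetic "next throughput" target

class ThroughputLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)                     # out: (batch, seq_len, hidden)
        return self.head(out[:, -1, :])           # predict from the last time step

model = ThroughputLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(seq), target)
    loss.backward()                               # backpropagation computes the gradients
    opt.step()                                    # Adam applies the per-parameter update
print("final training loss:", loss.item())
```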
Networks (DQN) [109]. DQN is a DRL algorithm for MDPs with finite action space. DQN seeks to approximate the optimal Q-function by a DNN Qθ(s, a) with parameters θ. Specifically, a DQN takes a state s as input and outputs Qθ(s, a) for every action a of the finite number of actions. The key techniques introduced for DQN are an experience replay buffer and a so-called target network. During training, DQN interacts with its environment, generating data tuples (s, a, r, s′). These data tuples are stored in an experience replay buffer. During training, DQN samples a mini-batch from this memory and applies a stochastic gradient descent step on the average squared Bellman error of the samples from the mini-batch. This rather simple technique reduces the bias of Q-Learning towards its recent interaction with the environment and thereby helps to stabilize training. In NN terminology, the right-hand side of the Bellman loss, i.e., r + γ max_a′ Qθ(s′, a′), is the training target for Qθ(s, a) given the data tuple (s, a, r, s′). In other words, the DQN itself is used to compute its training targets. The idea behind target networks is to use a separate target network Qθ′(s, a) to compute the aforementioned training targets. The target parameters θ′ are then chosen to track the actual training parameters slowly. With this, target networks provide more stable training targets, which has been shown to generally improve DRL training, see [109], [110]. However, more recent theoretical and numerical studies suggest that gradient clipping is superior to the use of target networks [111].
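The following PyTorch sketch illustrates the two ingredients described above, the experience replay buffer and a slowly tracking target network; environment interaction, exploration, and buffer filling are omitted, and all dimensions are illustrative:

```python
# Sketch of the two core DQN components: replay buffer and target network.
# Environment interaction, exploration, and episode handling are omitted.
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA, TAU = 4, 3, 0.99, 0.005

def make_q_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())           # start from identical parameters
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
# Replay buffer; each entry is a tuple (state, action, reward, next_state) of torch tensors.
replay = deque(maxlen=10_000)

def train_step(batch_size=32):
    batch = random.sample(replay, batch_size)
    s, a, r, s_next = (torch.stack(x) for x in zip(*batch))
    # Training target: r + gamma * max_a' Q_target(s', a')
    with torch.no_grad():
        target = r + GAMMA * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
    # Let the target network slowly track the trained parameters (Polyak averaging).
    with torch.no_grad():
        for p, p_t in zip(q_net.parameters(), target_net.parameters()):
            p_t.mul_(1 - TAU).add_(TAU * p)
```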
DQN is also an integral component of the Deep Deterministic Policy Gradient (DDPG) algorithm [110], which is one of the most well-known actor-critic algorithms for continuous action spaces. In DDPG, a critic is trained using the DQN algorithm, while a deterministic policy is trained to maximize the approximated Q-function. DQN and DDPG, in turn, are the basis for the two common deep MARL algorithms Independent Deep Q-Learning [112] and Multi-Agent DDPG (MADDPG) [113]. However, only DDPG has a truly distributed version that can be run with nearly arbitrary communication delays over a communication network. This is known as the Distributed DDPG (3DPG) algorithm [114].

Another important technique for successful DRL training was proposed as part of the deep actor-critic algorithm Asynchronous-Advantage-Actor-Critic (A3C) [52]. The asynchronous part refers to using several agents in parallel simulated environments to improve and speed up DRL training. In other words, the training progress of several agents on the same problem is combined to enhance the training performance. This is especially important for complex tasks since multiple parallel processors can significantly reduce the overall training time.

The success of DRL has been demonstrated across various sub-areas in computer networks like management of satellite-terrestrial networks [115], multi-objective service coordination [116], scheduling for large-scale networked control systems [117], acoustic sensor networks [118], adaptability of wireless sensor networks [119] and other applications in communications and networking [120].

1) General Advice for Training DRL Agents
Training a DRL agent to successfully solve a given problem can be a challenging task. In this section, we provide some general advice from our experience in the hope of easing this task.
1) It is good practice to normalize the states and actions, e.g., to [−1, 1]^d, d ∈ N. Linear scaling always makes this possible when the state space S is bounded in real dimensional space. When S is unbounded, let's say R^d, one needs to use, e.g., a scaled version of the hyperbolic tangent or the inverse stereographic projection. Such nonlinear transformations, however, change the environment, and the resulting policies may perform poorly in the actual environment if the normalization is not chosen carefully. Ideally, one should aim at linear scaling throughout the state space's "expected" dominant part. As the action space is typically bounded, action normalization is less problematic.
2) Reward normalization should be used even more carefully than state normalization. In general, changing the reward changes the perception of an agent about the environment and results in different learned policies.
3) The design of the reward signal is an integral part of the design of an MDP. One has to craft a reward function that incentivizes the desired behavior to get an algorithm to learn the desired goal. Some additional comments in no particular order: Make it easy for an agent to distinguish good from bad scenarios; continuous or dense rewards typically make it easier for algorithms to learn; if possible, avoid sparse rewards and instead shape the rewards to give gradual feedback; strictly positive rewards incentivize agents to avoid terminal states; strictly negative rewards incentivize agents to reach terminal states.
4) Training DRL models with drop-out should be avoided. Drop-out is a regularization technique that was introduced in [121] to train NN models with less overfitting while improving generalization. However, it leads to increased training variance, which is generally undesirable for the training of DRL.

2) Algorithm Categorization
The sheer amount of available DRL algorithms can be overwhelming for starters in the field, making it challenging to find appropriate algorithms for a given problem.
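A minimal sketch of the linear state scaling recommended in item 1 above, written as an observation wrapper for the classic OpenAI Gym API (the bounds and the wrapped environment are placeholders):

```python
# Observation-normalization wrapper: linearly rescale a bounded state space to [-1, 1].
# The bounds below are illustrative placeholders for a concrete environment.
import numpy as np
import gym

class ScaleObservation(gym.ObservationWrapper):
    def __init__(self, env, low, high):
        super().__init__(env)
        self.low = np.asarray(low, dtype=np.float32)
        self.high = np.asarray(high, dtype=np.float32)
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=self.low.shape, dtype=np.float32)

    def observation(self, obs):
        # Linear map from [low, high] to [-1, 1]
        return 2.0 * (obs - self.low) / (self.high - self.low) - 1.0

# env = ScaleObservation(make_network_env(), low=[0.0, 0.0], high=[1e3, 100.0])  # hypothetical env and bounds
```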
To ease the algorithm selection, we provide categorizations of widely used single-agent and multi-agent DRL algorithms in Figure 5 and Figure 6, respectively. We note that the tree structures are simplified. For example, the model-based algorithms in Figure 5 can further be classified into value-based, policy-based, actor-critic and on/off-policy algorithms. Furthermore, only selected and widely used algorithms are shown. These categorizations should serve as starting points. The final algorithm selection for a specific problem should also consider additional factors such as sampling efficiency, algorithm stability and exploration strategy.

Single-Agent DRL Algorithm Categorization: Single-agent DRL algorithms can be coarsely categorized by their supported action space (discrete/continuous), by whether they are model-based or model-free, and by whether they are value-based, policy-based or a combination of both, called actor-critic. Considering the tree structure in Figure 5, it can be seen that some algorithms (e.g., A2C/A3C, SAC, PPO) can be used for both discrete and continuous action spaces, while others, such as DQN and DDPG, are only compatible with one of them.

MARL algorithms can generally be categorized based on the same factors as single-agent DRL algorithms. However, additional multi-agent factors can be included. These are mainly centralized/decentralized learning and cooperative/independent learning. To preserve clarity, some of the traditional single-agent factors have been omitted in Figure 6.

IV. DATASETS, TOOLS AND FRAMEWORKS
Now that we have discussed what ML is and its potential applications, we will introduce here the most popular datasets in the field of networks, as well as emulators and simulators that can be used to run ML experiments. Since ML model parameters are learned from data, the datasets used are crucial in accomplishing the intended task, such as network latency prediction or decision-making for traffic routes. Additionally, ML models need to be tested before being applied in a productive environment. Thus, well-known network tools and frameworks can aid in prototyping, tracking, and evaluating these models.

A. DATASETS
Datasets are usually not plug-and-play and require preprocessing. The type of preprocessing required for the datasets depends on the specific problem being addressed and the type of data being used. In general, preprocessing includes the following steps:
• Data cleaning: This involves removing any missing, inconsistent, or irrelevant data to ensure the quality of the data being used for training.
• Data normalization: This involves transforming the data into a common scale, such as normalizing the values between 0 and 1, to ensure that no variable has an undue influence on the model.
• Data selection: This involves selecting the relevant features or variables from the dataset that are most important for the problem at hand. This step is important to reduce the dimensionality of the data, making it easier to improve the performance of the model; it removes irrelevant or redundant features and can help to speed up the training process and reduce the computational resources required for analysis.
• Data transformation: This involves transforming the data into a format suitable for the ML algorithm being used, such as converting categorical variables into numerical values using one-hot encoding, which generates a vector whose length corresponds to the number of categories in the dataset; data points belonging to a category are assigned 1 in the corresponding entry and 0 otherwise.
• Data splitting: This involves splitting the dataset into a training set to train the model, a validation set to evaluate its performance during training, and a test set to evaluate the model's performance after training.
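These steps can be composed, for example, with pandas and scikit-learn; in the following sketch, the file name and column names are hypothetical and only illustrate the mechanics:

```python
# Sketch of common preprocessing: cleaning, scaling, one-hot encoding, and splitting.
# The input file and DataFrame columns used here are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

df = pd.read_csv("traces.csv")                     # hypothetical input file
df = df.dropna()                                   # data cleaning: drop incomplete rows

numeric = ["throughput", "rtt"]                    # hypothetical numeric features
categorical = ["mobility_pattern"]                 # hypothetical categorical feature
prep = ColumnTransformer([
    ("scale", MinMaxScaler(), numeric),            # normalization to [0, 1]
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = prep.fit_transform(df[numeric + categorical])
y = df["label"]                                    # hypothetical target column
# Data splitting: 70% train, 15% validation, 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
```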
It is important to note that the specific preprocessing steps required may vary depending on the dataset, the problem being addressed, and the type of ML model used. The preprocessing steps should be chosen carefully to ensure that the data is suitable for training and that the model can accurately represent the underlying relationships in the data. In the following, we present the most popular network domain datasets in the literature for different applications.

1) Mobile network throughput datasets
A common problem in networking research is replicating realistic network conditions, especially throughputs. Dynamic Adaptive Streaming over HTTP (DASH) is one such exemplary research area. Depending on the mobile network, different datasets containing traces of real-world measurements have been created in order to allow for a better comparison between different research approaches.

For 3G mobile networks, the dataset by Riiser et al. [122] is widely used [123]. It contains 86 traces from measurements conducted on commute paths in Oslo, Norway, using six different mobility patterns (cf. Table 5). Besides the download throughput, it also contains the GPS latitude and longitude coordinates of the measurement device.

For 4G mobile networks, the dataset by Van Der Hooft et al. [124] (we call it 4G_a in this paper) is commonly used [125]. It contains traces of 40 measurements with different mobility patterns (cf. Table 5) conducted in Ghent, Belgium. It is similar to the 3G dataset in that it contains the download throughput as well as the GPS coordinates of the measurement device.
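When working with such traces, loading and inspecting them is usually the first step; in the sketch below, the file name and column names are assumptions, since the actual formats of the datasets above differ and must be checked against their documentation:

```python
# Sketch: load a mobile throughput trace (1-second granularity) and derive simple statistics,
# e.g., for use as a bandwidth trace in a DASH simulation. Column names are assumptions.
import pandas as pd

trace = pd.read_csv("trace.csv")                               # hypothetical file, one row per second
trace["throughput_mbps"] = trace["bytes_received"] * 8 / 1e6   # assumed raw column in bytes per second

print("duration [s]:", len(trace))
print("mean DL throughput [Mb/s]:", trace["throughput_mbps"].mean())
print("5th/95th percentile [Mb/s]:",
      trace["throughput_mbps"].quantile([0.05, 0.95]).tolist())
```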
FIGURE 5: Categorization of single-agent DRL algorithms by action space (discrete/continuous) and into value-based, policy-based, and actor-critic methods; model-based examples shown include MCTS, I2A, MPC, and GPS.
FIGURE 6: Categorization of multi-agent DRL algorithms by action space (discrete/continuous).
Another widely used [126] dataset for 4G networks was created by Raca et al. [127] (we call it 4G_b in this paper). A total of 135 measurements were conducted in Ireland. In comparison to the 4G_a dataset, this one is larger, also contains different mobility patterns (cf. Table 5), and contains significantly more metrics, such as the download and upload throughput, additional channel-related metrics, context-related metrics, and cell-related metrics.

In Farthofer et al. [128], an LTE dataset for the use of ML is described. The dataset is measured on an Austrian highway and contains over 2000 measurement points per month over a time period of two years. Additionally, there are different signal parameters measured in the dataset like SINR, RSSI, and RSRP as well as GPS data, time, data rate, etc., and it is published at CRAWDAD5.

Raca et al. also created a widely used [129] dataset for 5G networks [130]. It contains 83 traces of measurements in Ireland with two different mobility patterns (cf. Table 5). The measurement setup and dataset structure are comparable to their 4G_b dataset. In addition, it also contains ping statistics.

Table 5 provides a comparative overview of the presented datasets.

5 https://www.crawdad.org/
TABLE 5: Overview of throughput datasets for 3G, 4G, and 5G mobile networks.

3G (Year 2013) [122]
Measurement method: HTTP video streaming. Number of traces: 86. Trace lengths (min; max): 195 s; 12224 s. DL throughput (min; max): 0; approx. 9 Mb/s. UL throughput (min; max): /. Measurement interval: 1 s. Mobility patterns: Metro, Tram, Train, Bus, Ferry, Car. Logged metrics: GPS latitude, longitude; DL throughput.

4G_a (Year 2016) [124]
Measurement method: HTTP file download. Number of traces: 40. Trace lengths (min; max): 166 s; 758 s. DL throughput (min; max): 0; approx. 111 Mb/s. UL throughput (min; max): /. Measurement interval: 1 s. Mobility patterns: Foot, Bicycle, Bus, Tram, Train, Car. Logged metrics: GPS latitude, longitude; DL throughput.

4G_b (Year 2018) [127]
Measurement method: TCP file download, Youtube streaming. Number of traces: 135. Trace lengths (min; max): 383 s; 11141 s. DL throughput (min; max): 0; approx. 173 Mb/s. UL throughput (min; max): 0; approx. 4 Mb/s. Measurement interval: 1 s. Mobility patterns: Static, Pedestrian, Car, Bus, Train. Logged metrics: 19 metrics, e.g.: GPS longitude, latitude; DL & UL throughput; RSSI, RSRP, RSRQ; CQI, SNR.

5G (Year 2020) [130]
Measurement method: TCP file download, Netflix video streaming, Amazon video streaming. Number of traces: 83. Trace lengths (min; max): 266 s; 7752 s. DL throughput (min; max): 0; approx. 533 Mb/s. UL throughput (min; max): 0; approx. 7 Mb/s. Measurement interval: 1 s. Mobility patterns: Static, Car. Logged metrics: 25 metrics, e.g.: GPS longitude, latitude; DL & UL throughput; RSSI, RSRP, RSRQ; CQI, SNR.
as information about the location and characteristics of the routers themselves.

The Internet Topology Zoo [135] is a collection of network topology datasets that provide information about the physical structure of different networks. The datasets in the Internet Topology Zoo come from a variety of sources, including measurements of the internet, testbeds, and simulations. Its datasets provide information about the connections between nodes in a network, the capacities of the links between nodes, and the characteristics of the nodes themselves, such as their locations and capabilities. One of the main strengths of the Internet Topology Zoo is its comprehensive coverage of different types of networks, including wide-area networks, data centers, and other large-scale networks. This makes it a valuable resource for researchers and practitioners working in the field of network routing and network management, as it provides a diverse set of datasets for evaluating and comparing different algorithms and technologies.

To sum up, the GENI and Abilene datasets primarily focus on network infrastructure, providing researchers access to national research networks. Conversely, CAIDA and RocketFuel are designed to facilitate the measurement and analysis of network traffic and topology. The Internet Topology Zoo, meanwhile, is a collection of publicly available network topologies that researchers can use for various purposes. Thus, the size of the network varies depending on the scope and focus of the dataset. The GENI and Abilene datasets tend to cover larger networks compared to CAIDA and RocketFuel, which prioritize measurement and analysis tools [136]. CAIDA and RocketFuel datasets use passive measurements, while GENI and Abilene datasets use both active and passive measurements. The Internet Topology Zoo is a collection of network topologies and does not involve any measurements. A further comparison is shown in Table 6.

3) Dynamic Adaptive Streaming over HTTP (DASH)
Video streaming via Dynamic Adaptive Streaming over HTTP (DASH) is a large research area in networking. Recent rate adaptation algorithms often aim to optimize the user's Quality of Experience (QoE) under the given network conditions, such as a constrained bandwidth [137]. While these algorithms were initially conventional heuristics, DRL-based approaches have recently shown excellent performance and are now considered state-of-the-art [138]. In order to benchmark different solutions, publicly available DASH datasets are often used [138]. Typically, the datasets contain videos that are encoded under a controlled set of parameters, e.g., resolution and bitrate, and split into segments of certain lengths. The solutions are commonly evaluated via simulations in which the videos from the DASH datasets are streamed, while the network conditions, especially the download bandwidth, are commonly simulated using the network traces from the datasets presented in Section IV-A1 [138]. In the following, we present four commonly used DASH video datasets. Table 7 provides an overview of their most important properties.

The DASH dataset [140] is an old (2012) but still widely used dataset, e.g., to test new QoE schemes [139]. It contains 6 videos of different genres split into segments ranging from 1 to 15 seconds in length.

The Distributed DASH (D-DASH) dataset [141] was published in 2013 and is intended to be used in real-world testbeds. It contains one video that is distributed on servers in Klagenfurt, Paris, and Prague. This enables a client to choose the requested location for each segment individually.

The ultra high definition HEVC DASH dataset [142] was published in 2014 and includes one video. In contrast to the DASH and D-DASH datasets, the video is encoded with the newer and more efficient H.265 (HEVC) video codec. Furthermore, it is encoded in UHD resolution, at 30 and 60 Frames per Second (FPS), and at 8 and 10 bits.

The multi-codec DASH dataset [143] is a rather new dataset from 2018. It consists of 10 videos that are encoded with four different video codecs: H.264, H.265, VP9, and AV1. In addition, three different video FPS are included: 24, 30, and 60.

4) Mobility and Autonomous Vehicles
In the context of mobility or autonomous driving using a wireless network infrastructure (be it cellular or V2X), most of the studies in the literature discussing solutions and their results do not make the datasets publicly available for scrutiny by third parties. As such, the results are difficult to verify and validate properly. Nevertheless, many studies rely on simulated datasets. An open-source traffic simulation software called Simulation of Urban Mobility (SUMO)6 provides datasets for simulating realistic urban traffic scenarios. Among the datasets are road networks, traffic demand patterns, and vehicle behavior models, which can be customized for different traffic scenarios and urban environments [144].

6 https://sumo.dlr.de/docs/Data/Scenarios.html

Although SUMO is popular among researchers and practitioners in the industry, another software called Multi-Agent Transport Simulation (MATSim)7 is often used in academic research. While SUMO focuses on macroscopic traffic flow modeling, MATSim uses an agent-based approach to model individual travel behavior [145]. As a result, MATSim can capture more complex individual decision processes, while SUMO is better suited for overall traffic flow modeling.

7 https://www.matsim.org/open-scenario-data
Another open-source software is CityFlow8, which includes a range of features that are not available in SUMO, such as real-time simulation and the ability to model pedestrian and bicycle traffic [146].

There exist other, yet commercial, alternatives that also provide mobility datasets, such as Aimsun9, Vissim10 and TransModeler11. We present in Table 8 an overview of these datasets, while a comparison of the use cases for some of these datasets was shown in [147].

9 https://www.aimsun.com/
10 https://www.ptvgroup.com/
11 https://www.caliper.com/transmodeler/default.htm

5) (Encrypted) Network Traffic Analytics
Another common task for ML in networking is network traffic analytics. This includes the task of traffic/service classification, i.e., identifying an active service or traffic type in the network. Examples of such a task are distinguishing between video and web traffic, between services like YouTube and Netflix, or even between different Android apps. Due to pervasive encryption with, for example, TLS on the application and transport layers, protocols like HTTPS, DNS over TLS (DoT), and QUIC [154] do not yield sufficient unencrypted data to reliably identify services and traffic types. Instead, new techniques have to be developed which make use of the available unencrypted data. For encrypted network traffic analytics, a common approach is therefore to extract packet sizes, directions, and inter-arrival times as well as potential additional information like port numbers to build features. These features describe the network traffic of a specific service or traffic type [155]–[158]. These features are then fed to ML models to learn specific patterns exhibited by different traffic types or services. Beyond traffic classification, this type of analytics is often used for security-related tasks like intrusion detection or fingerprinting of websites, browsers, devices, and operating systems, and for estimating the QoE of services [159], [160]. Due to the prevalence of those topics, there also exists a variety of datasets for the different network traffic analytic tasks.
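To make the described feature construction concrete, the sketch below turns per-packet metadata (timestamp, size, direction) of one flow into simple flow-level features; the input format is an assumption:

```python
# Per-flow feature extraction from packet metadata (timestamps in seconds, sizes in bytes,
# direction +1 = upstream, -1 = downstream). The input format is an assumption.
import numpy as np

def flow_features(timestamps, sizes, directions):
    timestamps = np.asarray(timestamps, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    directions = np.asarray(directions, dtype=int)
    iat = np.diff(timestamps) if len(timestamps) > 1 else np.array([0.0])  # inter-arrival times
    up = directions > 0
    return {
        "pkt_count": len(sizes),
        "mean_pkt_size": sizes.mean(),
        "std_pkt_size": sizes.std(),
        "mean_iat": iat.mean(),
        "std_iat": iat.std(),
        "upstream_ratio": up.mean(),
        "bytes_up": sizes[up].sum(),
        "bytes_down": sizes[~up].sum(),
    }

# Example with one tiny synthetic flow; in practice, features of many labeled flows feed an ML model.
print(flow_features([0.00, 0.02, 0.05, 0.30], [120, 1500, 1500, 80], [+1, -1, -1, +1]))
```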
An overview on the topic of traffic classification along with a list of existing works (and solutions) and datasets
such as Wi-Fi, LTE, or even a recently released mmWave (millimeter wave) [171] module, giving researchers reasonably reliable test environments for newly developed approaches and reducing development time for various kinds of research interests. ns-3 can thus assist researchers in network performance evaluation; however, it does not support ML approaches by default, so extending it with open-source AI frameworks makes it considerably more useful for ML problems. One attempt to do so is ns3-gym [172], an extension that connects ns-3 to the OpenAI Gym toolkit. The connection is realized with ZeroMQ sockets for inter-process communication (IPC). OpenAI Gym's focus on reinforcement learning is an advantage here, as it is as widespread as libraries such as TensorFlow and Scikit-Learn. ns3-gym aims at simplifying the prototyping of networks that employ reinforcement learning. The module improves scalability, which is important for running several ns-3 instances, and makes the conversion and deployment of ns-3 scripts in OpenAI Gym feasible. Furthermore, debugging and use of the module remain uncomplicated for users, since it behaves like a conventional ns-3 module consisting of two main blocks, OpenAI Gym and ns-3, that interact with each other. Another interface extension that bridges ns-3 and Python-side ML implementations is ns3-ai [173], which claims to greatly increase the interaction speed by facilitating communication through a shared memory block.
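For illustration, interacting with an ns-3 scenario through the Gym API roughly looks as follows. This is a minimal sketch, not taken from the ns3-gym paper; it assumes the ns3gym Python package is installed and a matching ns-3 scenario exposing the OpenGym interface has been started separately, and class and argument names may differ between versions:

from ns3gym import ns3env   # assumed package/module name from the ns3-gym project

# Connect to an already running ns-3 scenario that exposes an OpenGym interface.
env = ns3env.Ns3Env(port=5555, startSim=False)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # replace with an RL agent's policy
    obs, reward, done, info = env.step(action)  # one simulation step per interaction
env.close()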
OMNeT++
Another popular discrete-event network simulator is OMNeT++ (https://omnetpp.org/), which can be used free of charge for academic and educational purposes under a license with rights similar to the GPL (https://omnetpp.org/intro/license), but requires a paid license for commercial use. While OMNeT++ itself only contains the core simulation framework, various models can be added via external frameworks. The most important one is the INET Framework, which is maintained by the OMNeT++ core team and provides models for network standards like IEEE 802.3 and IEEE 802.11 as well as higher layer protocols like IP, UDP, and TCP.
In terms of ML, Veins-Gym [174] exposes an OMNeT++ simulation as an OpenAI Gym environment, analogous to ns3-gym for ns-3. Despite its name, Veins-Gym can be used not only in combination with the Veins framework but also with any OMNeT++ simulation. An overview with examples of how to use different ML frameworks such as TensorFlow in OMNeT++ can be found at https://github.com/ComNetsHH/omnetpp-ml.

2) Emulators
A network emulator, unlike a simulator, creates a virtual copy of a physical device, including all hardware and software configurations, to functionally replace it. Hence, emulation is more accurate than simulation, but also more expensive in terms of computation resources. There are many network emulation tools, including but not limited to:
• Mininet (http://mininet.org/): a Python-based tool focused on emulating software-defined networks (SDNs) using OpenFlow switches.
• GNS3 (https://docs.gns3.com/): supports a wide range of network devices and protocols using virtual machines and real devices.
• Mahimahi (https://manpages.org/mahimahi/): a lightweight network emulator that is designed to emulate low-bandwidth networks with high latency.
• WANEM (https://github.com/PJO2/wanem): a Linux-based tool that can be used to emulate various network conditions such as latency, packet loss, and bandwidth limitations in WANs.
• TENSE (https://github.com/vmware/te-ns): a VM-based tool that can be used to generate emulated network traffic for security evaluation purposes. It can generate various types of traffic, such as HTTP, FTP, SMTP, etc.
• CORE (http://coreemu.github.io/core/): similar to GNS3 but with further emulation capabilities beyond traditional networks, such as SDN and virtualization technologies.
• FlowEmu (https://github.com/ComNetsHH/FlowEmu): a modular network link emulator with a flow-based-programming inspired user interface that integrates TensorFlow for writing custom ML modules.
Many of these tools have been used for training and evaluating ML algorithms. For example, SDWAN-gym (https://github.com/amitnilams/sdwan-gym) and IROKO [175] are Python-based platforms built on top of Mininet for training and evaluating reinforcement learning algorithms in software-defined WANs and data centers, respectively.
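Tools like Mininet can be driven directly from Python, which is what makes them convenient back-ends for such RL platforms. The following minimal sketch (independent of SDWAN-gym and IROKO) starts a small emulated network with Mininet's Python API and runs a connectivity test; it assumes Mininet is installed and the script is run with root privileges:

from mininet.net import Mininet
from mininet.topo import SingleSwitchTopo

# Three hosts attached to a single OpenFlow switch.
net = Mininet(topo=SingleSwitchTopo(k=3))
net.start()
net.pingAll()                      # measured reachability/loss could feed an RL reward
h1, h2 = net.get('h1', 'h2')
print(h1.cmd('ping -c 1 %s' % h2.IP()))
net.stop()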
It is often the case that emulated data is mixed with real data to obtain a large, reliable dataset. There exist many datasets that adopt this approach in cybersecurity, such as those of the Canadian Institute for Cybersecurity (https://www.unb.ca/cic/datasets/index.html). They provide the "CICIDS2017" dataset, labeled network flows with full packet payloads in PCAP format, for ML and deep learning purposes. Also, they provide the AndMal 2020 dataset to identify and classify Android malware based on ML.

3) Synthetic
Synthetic data is needed because it can help to overcome the lack of up-to-date real-world data and privacy constraints, which limit the development of new models. In addition, synthetic data can provide an efficient mechanism to surmount the lack of labeled datasets and post-processing overhead. In the context of network traffic analysis, synthetic data can be used, for example, to train ML models to detect cyber-attacks and to resolve network congestion as well as other performance issues.
SynGAN (Synthetic Generative Adversarial Network) [176] is a packet-level GAN designed to generate synthetic traffic data. It generates synthetic packets that closely resemble real-world traffic by simultaneously training the generator and discriminator networks. The generator network takes random noise as input and produces synthetic network traffic data as output, while the discriminator network distinguishes between synthetic and real data. Adversarial training ensures that the synthetic data produced by SynGAN is representative of real network traffic.
To make sure that the generated data satisfies certain constraints, PAC-GAN (Projection Adversarial Constraint GAN) [91] uses a projection operator to map the generated data onto a feasible set that satisfies the desired constraints. In addition to the standard GAN loss, PAC-GAN uses a constraint loss to ensure that the generated data is not only realistic but also satisfies the desired constraints.
Another type of traffic generator is flow-based GANs, which, unlike packet generators that focus on individual packets, generate flows of packets that share common characteristics, such as source and destination IP addresses, source and destination ports, and protocol type. Additionally, they can reduce the amount of data needed to be generated by producing a single flow instead of multiple individual packets.
The authors in [177] propose different preprocessing approaches for transforming IP addresses of flows into a continuous feature, since GANs can only process continuous features. Then, they use domain knowledge, such as packet size, inter-arrival time, and flow duration distributions, to evaluate the quality of the generated data. Another example is MAIGAN (Massive Attack Generator via GAN) [178], which generates synthetic network traffic that mimics various types of cyber attacks, including Distributed Denial of Service (DDoS) attacks, port scanning attacks, and brute-force attacks, and which is able to bypass black-box ML-based detection models (https://github.com/JayWalker512/PacketGAN). To handle the problem of imbalanced traffic classification, i.e., the data used for training the classification model contains a disproportionate number of samples from one class compared to the others, ITC-GAN [179] uses a modified GAN architecture with a class-balancing loss based on the inter-packet time characteristics, which helps to balance the number of samples from each class during training and removes the bias in the classification model.
The work in [180] discusses other GAN models for network traffic generation, including Facebook Chat GAN [181], which generates chat message sequences based on Facebook Messenger data, ZipNet GAN [182], which generates compressed network packets using Huffman coding, and PcapGAN [183], which generates network packet captures (pcap files) by learning from real-world pcap files.
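To make the underlying mechanism concrete, the following is a minimal, generic sketch of a GAN that learns to generate flow-level feature vectors (e.g., normalized packet size, inter-arrival time, and duration). It is not an implementation of any of the systems cited above; network sizes, training data, and hyperparameters are placeholders:

import torch
import torch.nn as nn

NOISE_DIM, FEAT_DIM = 16, 3   # 3 flow features: size, inter-arrival time, duration

G = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
D = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_flows = torch.rand(256, FEAT_DIM)   # placeholder for real, normalized flow features

for step in range(1000):
    # Discriminator update: real flows -> 1, generated flows -> 0
    z = torch.randn(64, NOISE_DIM)
    fake = G(z).detach()
    real = real_flows[torch.randint(0, len(real_flows), (64,))]
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to fool the discriminator
    z = torch.randn(64, NOISE_DIM)
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

synthetic_flows = G(torch.randn(10, NOISE_DIM))   # 10 synthetic flow feature vectors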
C. MACHINE LEARNING TOOLS
Selecting the right tools to solve ML problems can be challenging. This section gives an overview of the most commonly used Python-based tools for classic ML, deep learning, and reinforcement learning. Furthermore, we provide guidelines for when to use which tool.

1) Classic Machine Learning
Scikit-learn (https://scikit-learn.org/stable/), also known as sklearn, is a machine-learning library for Python that provides a wide range of tools for data analysis and modeling. It is built on top of other popular Python libraries such as NumPy (https://numpy.org) and SciPy (https://scipy.org) and is designed to be easy to use and efficient. Sklearn provides a variety of ML algorithms for various tasks such as classification, regression, clustering, and dimensionality reduction. These algorithms are carefully designed to be intuitive, fast, and efficient, allowing for quick prototyping and testing of ML models. Some of
the commonly used algorithms available in sklearn include support vector machines, decision trees, k-nearest neighbors, logistic regression, and random forests. It also includes tools for model selection, preprocessing, and evaluation that enable researchers to preprocess data and select the right model for a problem. Some of the popular tools available in sklearn include cross-validation, grid search, PCA, and feature selection. A key advantage of sklearn is its wide range of algorithms for different tasks that are designed to be easily utilized by beginners to get started with ML.
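As a brief illustration of this workflow (the dataset and hyperparameters below are arbitrary placeholders rather than a networking use case), a typical sklearn pipeline combines preprocessing, a model, and cross-validated model selection:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # stand-in for network traffic features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC())
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5)   # grid search with cross-validation
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))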
2) Deep Learning
Keras (https://keras.io/): Keras is an open-source library for deep learning in Python that provides a high-level API for building, training, and evaluating DL models. A key advantage of Keras is its simplicity due to its high-level API, which abstracts away the complexities of building and training deep learning models. Keras provides a user-friendly API that makes it easy to create and experiment with different neural network architectures and offers a wide range of pre-trained models. Keras provides a wide range of pre-built layers, optimizers, and other building blocks that can be easily combined to create complex models. Keras can also be run on top of several backends, including TensorFlow, Theano, and CNTK. Keras is best suited for building simple to medium-complexity neural networks and deep learning models.
TensorFlow (https://www.tensorflow.org/): TensorFlow is an open-source library for ML and deep learning developed by Google. It is widely used for a variety of tasks, such as image and speech recognition, natural language processing, and for training and deploying large-scale ML models. The key advantages of TensorFlow are flexibility and scalability. It allows for the building and training of complex models, including deep neural networks, and can run on a variety of platforms, including CPUs, GPUs, and TPUs. TensorFlow can be used for distributed ML training across multiple machines, making it well-suited for large-scale ML tasks. The TensorFlow ecosystem includes a wide range of tools and libraries for tasks such as data pre-processing, model visualization, and deployment. TensorFlow is a good choice for building complex deep-learning models that require a high degree of customization and is suitable for both research and production environments.
PyTorch (https://pytorch.org/): PyTorch is an open-source ML library for Python that is primarily developed by Facebook's AI research group. A distinctive feature of PyTorch is its dynamic computational graph, which allows for more flexibility in building and modifying models compared to the static computation graph used in other libraries, such as TensorFlow. This feature facilitates the easy implementation of advanced deep-learning models, including those with conditional logic, loops, and dynamic inputs. PyTorch is designed to be modular and flexible, with a wide range of building blocks (e.g., layers, activations, loss functions, and optimizers) that can be used to create custom deep-learning models with various complexity levels. Furthermore, PyTorch supports distributed training, allowing for the efficient use of multiple GPUs and machines. PyTorch is suitable for building complex deep learning models, and its dynamic computational graph makes it easy to write and debug code.
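A minimal, generic PyTorch training loop illustrating these building blocks (the model, data, and hyperparameters are placeholders and not tied to a particular networking task):

import torch
import torch.nn as nn

# Tiny fully connected classifier: 10 input features -> 2 traffic classes.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(128, 10)              # placeholder features
y = torch.randint(0, 2, (128,))       # placeholder labels

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)       # forward pass; the graph is built dynamically
    loss.backward()                   # backward pass
    optimizer.step()
    print(epoch, loss.item())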
3) Reinforcement Learning
Many tools for RL are built on top of classical ML and deep learning tools to support various algorithms.
OpenAI Gym / Gymnasium: OpenAI Gym (https://www.gymlibrary.dev), lately continued as Gymnasium (https://gymnasium.farama.org) by the Farama Foundation, is an open-source Python library that provides a standardized API for the interaction between RL algorithms and environments. Additionally, it includes a wide range of environments of different complexities, including classic control tasks, Atari games, robotic simulations, as well as physical simulations. This allows researchers to reproducibly benchmark RL algorithms on a standardized set of environments. Furthermore, Gym can be extended by custom environments, allowing users to easily compare the performance of different RL algorithms for customized problems.
One challenge of RL research is that different implementations of the same RL algorithm can have significantly different performances in the same environment, making RL algorithms highly sensitive not only to hyperparameters but also to small implementation details [184].
Stable Baselines3: Stable Baselines3 (SB3) [185] is an open-source Python library that contains reference implementations of seven widely used DRL algorithms. Tab. 10 lists all supported algorithms. The performance of those algorithms has been thoroughly tested. The library is compatible with the OpenAI Gym/Gymnasium API, enabling users to train RL agents in just a few lines of code. Moreover, the library supports custom Gym environments, custom policies for the algorithms, TensorBoard, as well as data logging customization through custom callbacks.
Additional RL algorithms are implemented in the Stable Baselines3 Contrib (SB3-Contrib, https://sb3-contrib.readthedocs.io/en/master/) package. These are implementations of newly published algorithms. They are less tested and therefore considered experimental.
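Training an agent with SB3 indeed takes only a few lines. The following sketch uses the standard CartPole environment as a stand-in for a custom (e.g., network-control) environment and arbitrary hyperparameters, and assumes a recent SB3 release that uses the Gymnasium API:

import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")            # replace with a custom network environment
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)      # train the agent

obs, info = env.reset()
action, _ = model.predict(obs, deterministic=True)   # use the trained policy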
RL Baselines3 Zoo: RL Baselines3 Zoo (https://github.com/DLR-RM/rl-baselines3-zoo) is a Python library that provides pre-trained agents and a set of optimized hyperparameters for the algorithms from SB3 and the Gym environments. Moreover, it provides useful helper scripts for training and evaluating agents, for tuning hyperparameters, and for plotting results.
CleanRL: CleanRL [186] is a DRL framework that provides thoroughly benchmarked single-file Python implementations of eight DRL algorithms (cf. Tab. 10). Its goal is to provide researchers full control over an algorithm in a single file, making it easier to 1) fully understand all implementation details, and 2) quickly prototype novel DRL features. In addition, it provides support for TensorBoard. In comparison to SB3, CleanRL does not provide a high-level user-friendly API for model training. It is instead tailored to provide a development environment for DRL researchers with implementations that are easy to read, debug, modify, and study. The intended workflow is to first prototype new RL ideas in CleanRL and afterwards port them to a library offering a higher-level API like SB3.
OpenAI SpinningUp: OpenAI SpinningUp (https://spinningup.openai.com) is a great resource for aspiring researchers and practitioners that are excited to apply DRL to their problems but are overwhelmed by the implementation complexity of algorithms in frameworks like Stable Baselines3. It provides detailed explanations of the most important concepts of DRL, as well as explanations and implementations of key DRL algorithms. The algorithm implementations specifically focus on simplicity with the aim of being easy to follow for people new to the field. This simplicity is achieved by narrowing down the implementations to the core concepts of the algorithms, and by omitting more complex features that can significantly improve an algorithm's performance. As a result, OpenAI SpinningUp should be primarily seen as a resource for education and should not be used in production systems.
PettingZoo: PettingZoo (https://pettingzoo.farama.org) is an open-source Python library that contains a set of environments for multi-agent reinforcement learning. While it is similar to OpenAI Gym/Gymnasium in its functionality and API, the application scenario of MARL is different from the one of single-agent RL. Among others, it contains multi-agent environments of Atari games and classic games like chess and Go. Furthermore, it can be extended by custom environments.
Ray RLlib: Ray RLlib [187] is an open-source Python library for RL. Out of the RL libraries presented in this section, it is the most comprehensive one. It supports a wide range of performance-tested RL algorithms, offers a high-level user-friendly API to train agents, supports single-agent, multi-agent, and custom environments, offers high scalability by supporting both single-machine and distributed training, and offers tools for managing, tracking, and visualizing the results of experiments. Because it is built on the Ray platform, it is also seamlessly compatible with other Ray libraries and tools for distributed computing and parameter tuning.
When to use which RL library? An important question to answer in this primer is when to use which of the presented RL libraries. CleanRL is recommended either for fully understanding how an algorithm is implemented or for RL researchers who want to quickly prototype new ideas, since its design decision to separate each algorithm into its own file lets the researcher focus on the algorithm instead of the complex software architecture of other RL libraries with intertwined modular implementations. SB3 is primarily intended to offer well-tested baseline implementations of important DRL algorithms as a benchmark baseline for new RL developments. However, along with its extensions SB3-Contrib and Zoo, it is recommended if a high-level interface for fast training of well-established and well-tested RL algorithms on single-agent environments is desired and no scalability via distributed learning is required. RLlib offers a production-ready framework for large-scale projects. It is recommended for multi-agent environments, as well as when high scalability via distributed learning, e.g., on clusters, is required. Furthermore, RLlib includes tested implementations of cutting-edge RL algorithms. Concerning environments and the interaction of agents with them, Gymnasium and PettingZoo are arguably the most recognized and thus most important standard APIs for single-agent and multi-agent RL, respectively. Creating environment interfaces that adhere to their API definitions makes it much easier to experiment with different RL algorithms and variations, since many implementations expect Gymnasium or PettingZoo environment instances.
Table 10 provides an overview of the algorithms implemented in the different RL frameworks. Besides the listed algorithms, RLlib supports further algorithms specifically for MARL.

D. DATA LOGGING AND PARAMETER TUNING
Prototyping is an essential aspect of ML development, and the ability to log and monitor experiments is crucial for efficient iteration. Creating individual names for logs and artifacts might work after ten runs but becomes overwhelming after hundreds of runs. When trying to compare different runs or when looking for the respective parameters of a run, having to look into log files is cumbersome. Fortunately, a range of tools has been developed for this task, organizing every run with its parameters and visualizing runs with plots, with the option to filter and search for specific runs; they will be introduced in this section. These tools provide
a suite of features beyond just logging and monitoring, including the ability to perform hyperparameter tuning. Additionally, most of them easily integrate with popular ML frameworks such as TensorFlow, PyTorch, Keras, and Scikit-learn.

TABLE 10: Overview of the implemented single-agent algorithms of different RL frameworks.
Algorithm                SB3   SB3 Con.   CleanRL   Spin.UP   RLlib
A2C [52]                  +       -          -         -        +
DDPG [110]                +       -          +         +        +
DQN [109]                 +       -          +         -        +
HER [188]                 +       -          -         -        -
PPO [189]                 +       -          +         +        +
SAC [190]                 +       -          +         +        +
TD3 [191]                 +       -          +         +        +
ARS [192]                 -       +          -         -        +
QR-DQN [193]              -       +          -         -        -
RecurrentPPO [194]        -       +          -         -        -
TQC [195]                 -       +          -         -        -
TRPO [196]                -       +          -         +        -
Maskable PPO [197]        -       +          -         -        -
Categorical DQN [198]     -       -          +         -        -
PPG [199]                 -       -          +         -        -
RND [58]                  -       -          +         -        -
VPG [49]                  -       -          -         +        +
A3C [52]                  -       -          -         -        +
AlphaZero [200]           -       -          -         -        +
Behavior Cloning [201]    -       -          -         -        +
CQL [202]                 -       -          -         -        +
CRR [203]                 -       -          -         -        +
Dreamer [204]             -       -          -         -        +
IMPALA [205]              -       -          -         -        +
R2D2 [206]                -       -          -         -        +
Rainbow [207]             -       -          -         -        +
SlateQ [208]              -       -          -         -        +
DD-PPO [209]              -       -          -         -        +

1) Data Logging
TensorBoard is by far the most popular data logging and visualization tool in ML. It is widely used in conjunction with deep learning frameworks like TensorFlow and PyTorch. However, there are other popular data logging and visualization tools as well, such as Weights & Biases (WandB) and Comet.ml. The popularity of these tools may vary depending on the specific use case and developer preferences. We present in Table 11 a list of the most popular tools. These tools all support different ML frameworks and offer real-time monitoring and visualization of ML models. They also handle various data types and provide customizable views. Weights & Biases and Comet.ml provide experiment tracking and collaboration features that allow for easy collaboration and sharing of results, while Visdom and TensorWatch are great options for debugging and development since they offer configurable logging options and real-time tensor viewing.
It is important to keep in mind that the developer's individual demands and preferences will determine the visualization tool that fits best. Some developers might favor a tool that integrates well with their preferred ML framework, whereas others would value flexibility and customization possibilities. Also, some developers could need more sophisticated features like collaboration tools or hyperparameter optimization, while others might only need the most fundamental logging and visualization options.

2) Tuning Tools
In ML, hyperparameters are often used to control the behavior of the ML model. These hyperparameters include the usual parameters for training, such as batch size, which optimizer to use, learning rate and other optimizer-specific parameters, but can also include the structure of the model, like the number, type, and topology of layers, or other custom parameters specific to the application. Selecting optimal hyperparameters is critical to achieving the best possible performance of a ML model. However, searching for the optimal hyperparameters can be a complex and time-consuming task, especially for large datasets and complex models. Tuning tools help to automate this process by systematically searching for optimal hyperparameters based on a user-defined search space and optimization criteria. This can save significant time and resources in the ML development process and help to achieve better model performance. Below is a list of the most popular tools:
• NNI (Neural Network Intelligence) [216]: an open-source toolkit developed by Microsoft for automating and optimizing the hyperparameter tuning process of deep learning models. It provides a framework for designing and conducting experiments using various search algorithms and techniques, such as grid search, random search, Bayesian optimization, evolutionary optimization, and the tree-structured Parzen estimator (TPE). It also supports distributed training and can scale up to thousands of nodes for high-performance computing.
• Optuna [217]: an open-source hyperparameter optimization framework for ML. It provides a flexible and modular platform for automating the process of selecting optimal hyperparameters for a given model architecture. Optuna uses various algorithms to search the hyperparameter space, including TPE, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Non-Dominated Sorting Genetic Algorithm II (NSGA-II), and adaptive sampling. It also supports distributed optimization across multiple nodes for faster and more efficient tuning (a minimal usage sketch is shown after this list).
• Ray Tune [218]: the hyperparameter tuning component of the Ray framework. It handles the execution of experiments, including parameter studies with possibly multiple repetitions, as well as scheduling the runs for parallel execution. For hyperparameter tuning, it supports a wide variety of approaches. These include basic strategies such as grid or random search, but also more advanced approaches such as Bayesian optimization or Population Based Training [219]. While some algorithms are implemented internally, it relies heavily on third-party optimization libraries such as Hyperopt [220] and
Optuna [217], and provides a unified interface to them.
• Keras Tuner [221]: a library customized for Keras that provides an easy-to-use API for defining a hyperparameter search space, choosing search algorithms such as random search and Bayesian optimization, and running hyperparameter search processes. Furthermore, Keras Tuner is easy to integrate with other Keras workflows and can optimize hyperparameters in both single-node and distributed settings.
• HyperOpt [222]: a Python library for hyperparameter optimization that uses a combination of random search and Bayesian optimization to efficiently explore and exploit the hyperparameter search space. It provides an easy-to-use API for defining the hyperparameter search space, selecting optimization algorithms, and executing the hyperparameter search process. HyperOpt uses a Tree-structured Parzen Estimator (TPE) algorithm to model the relationship between hyperparameters and model performance and to guide the search for better hyperparameters. HyperOpt also allows for the parallelization of the search process, making it scalable to large hyperparameter search spaces and parallel computing environments. It can be used with a variety of machine-learning frameworks, including Scikit-learn, Keras, and PyTorch.
• Scikit-Optimize [223]: a Python library for sequential model-based optimization that aims to efficiently explore and exploit the hyperparameter search space while minimizing the number of model evaluations. It provides a simple and flexible API for defining the hyperparameter search space and selecting optimization algorithms, including Bayesian optimization and gradient-based optimization. Scikit-Optimize also supports parallel evaluation of the search process, making it scalable to large hyperparameter search spaces and parallel computing environments. In addition to hyperparameter optimization, it can be used for function optimization and global optimization tasks. Furthermore, it integrates easily with popular ML frameworks such as Scikit-learn and Keras, while including features such as early stopping and warm-starting to further improve the efficiency of the hyperparameter search process.
Note that Table 12 presents only the most commonly used algorithms for each tool. While other algorithms may be added, their suitability may vary depending on the specific use case and requirements. Overall, the choice of which tool to use depends on the specific requirements and use case. For example, if there is a
need for scalability and distributed training, Ray Tune is a good choice. If there is a need for a general-purpose optimization library, then Scikit-Optimize might be a good choice.

E. TESTBEDS
As previously outlined, it is hard to replicate realistic network conditions, and using existing datasets might not always fit the problem. While simulation tools can help with that, there is also the possibility of using existing testbeds or building your own. Access to the existing real-world testbeds is usually open or free for researchers, but you might have to schedule your experiments and wait depending on utilization. In the following, some popular real-world testbeds and some devices one could use to build a testbed will be introduced. There are two types of testbeds, wired ones and wireless ones. The wireless ones are wireless sensor networks without any routers or switches, and communication is broadcast. Testbeds are relatively versatile and can be used either to test ML applications that rely on networks, like distributed ML, or to test ML algorithms that, for example, perform traffic routing.

1) Real-world Testbeds
For a more extensive overview, [224], [225], [226], and [227] provide surveys that either include a section about testbeds or are entirely about testbeds. We present a selection of popular testbeds, starting with wireless testbeds.
FlockLab [228] is an experimental platform that enables researchers to test and evaluate the performance of wireless sensor networks (WSNs) and IoT systems. It is a flexible, open-source testbed that provides a controlled and repeatable environment for the evaluation of various applications. An advantage of FlockLab is its flexibility, as it can be used to test and evaluate a wide range of wireless sensor networks and IoT systems [228]. It supports various wireless technologies, such as Zigbee, Z-Wave, and LoRaWAN, and it can be easily extended to support new technologies. FlockLab is widely used in the field of WSNs and IoT systems [229], and it has been developed and maintained by the Communication Systems Group at ETH Zurich.
FIT IoT Lab [230] is an open-access testbed for IoT experiments provided by the French Institute of Technology. It contains over 1500 nodes offering a wide range of low-power wireless devices that can be used to test and evaluate various IoT applications, protocols, and algorithms. In addition, its large-scale infrastructure and easy-to-use web interface provide a flexible and convenient platform for IoT experimentation.
D-Cube [231] is a testbed by Graz University of Technology. It contains about 50 nodes with two platforms, nRF52840 and TelosB, and provides a set of predefined scenarios. These scenarios allow researchers to easily evaluate protocol performance and compare protocols against each other.
CLOVES [232] is a part of the IoT Testbed at the University of Trento. It contains 275 indoor devices spread over 8000 square meters. Communication is possible using ultra-wideband or narrowband, and all nodes are remotely accessible.
Next, we are going to introduce some wired testbeds. Note that some of them also provide wireless capabilities.
PlanetLab [233] was founded in 2002 by researchers from several universities, including Princeton University, the University of California at Berkeley, and Stanford University. While it was shut down in March 2020, PlanetLab Europe (https://www.planet-lab.eu/) continues to operate. It is a collection of interconnected computers located at over 250 sites in more than 40 countries across Europe and beyond, available for researchers to use in their experiments. PlanetLab Europe provides researchers with virtual machines, storage, and network connectivity. In addition, researchers can deploy their software on the nodes and create custom network topologies to simulate various network scenarios.
EmuLab (https://www.emulab.net/) [234] is a network testbed developed by the University of Utah that provides users with a virtual network environment to test and evaluate various
networking systems and applications. Emulab allows researchers to create and configure network topologies, deploy software and network services, and generate different types of network traffic to test and evaluate various networking scenarios.
GENI (Global Environment for Network Innovations) [235] is a US national-scale network testbed that provides researchers with a virtual laboratory for developing and testing new networking technologies and applications. It comprises a large-scale network of interconnected computing resources, including servers, routers, switches, and other network devices.

The Raspberry Pi can run TensorFlow Lite and other ML frameworks, enabling researchers to run pre-trained models and perform basic ML tasks. It can also be used as an edge device for collecting and preprocessing data before sending it to the cloud for further analysis.
The Intel Movidius Neural Compute Stick is a USB device that provides on-device AI inference for various applications in networked systems. It features a Myriad 2 VPU, which can run deep neural networks with low power consumption. The Neural Compute Stick can accelerate computer vision, speech recognition, and natural language processing tasks in networked devices.
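As a small illustration of on-device inference on such hardware, the following generic sketch assumes the tflite_runtime package is installed and that a model has already been converted to a file named model.tflite (both are assumptions, not part of the text above):

import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")   # hypothetical, pre-converted model
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed one input sample with the shape and dtype the model expects.
sample = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(out["index"])
print(prediction)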
are utilized to explain various already trained black-box models, e.g., neural networks or ensemble models. Ensemble models like Random Forest are composed of multiple smaller models jointly determining the output. This makes interpretation difficult. Interpretable, transparent, or glass-box models provide an explanation for how the model obtains the output by design. Prevalent models are, for example, the well-known linear models and decision trees, as well as the less-known generalized additive models.
Finally, model-agnostic methods and model-specific methods are distinguished. Model-agnostic methods can be used on top of every kind of model, while model-specific methods can only be used by specific model families. A prominent example of model-specific methods are saliency maps [247], which are computed from the feature maps learned by a model and can be used in computer vision to highlight the regions on which the model focuses when processing input. They are generally applicable when using CNNs. This also implies that the nature of the data directly influences the applicable XAI techniques for the different use cases, e.g., time series XAI techniques are not usable for graph data.

B. SPECIFIC XAI METHODS
Since there are many different categories of XAI techniques, there is a wide spectrum of specific XAI methods. Thus, the methods explained in the following are only a small selection. Because in many XAI scenarios a black-box ML model should be made intelligible, the methods introduced first focus on post-hoc explainers. While it is common to perform post-hoc explanations, the authors of [248] argue that we should stop using post-hoc explainers and instead directly use interpretable models. Interpretable models often perform weaker than black-box models, but are interpretable by design.

1) Post-Hoc Explainers
As a majority of advances in ML happen in computer vision, there exists a huge variety of post-hoc explainers explaining the learnt filters of a CNN, e.g., saliency maps. As a consequence, these techniques are model-specific and usually not applicable to network data. Nevertheless, there exist approaches where network data is transformed into images beforehand, e.g., for encrypted network traffic classification in [249], and processed with a CNN, so saliency maps could be applied there.
Layer-wise Relevance Propagation (LRP) is a post-hoc method that uses the neural network's forward pass and propagates its output backwards through the layers until the input layer to derive the relevance of an input on the model's prediction.
A prevalent local model-agnostic post-hoc explainer is SHapley Additive exPlanations (SHAP), which uses methods from game theory to judge the importance of different feature inputs. Although this method can explain the black box of a ML model very well, it comes with the drawback that it needs high computational power. Thus, it is only feasible for models with fewer input parameters [250].
A well-working method for getting an explanation of classification models in a model-agnostic fashion is Local Interpretable Model-agnostic Explanations (LIME) [251]. LIME belongs to the class of surrogate models, where a model is used to approximate the predictions of a target black-box model to infer the reasoning of the black-box model. LIME trains a local surrogate model to explain the predictions for a specific sample by first aggregating permutations of the original feature inputs of the sample into a new dataset, weighting the samples of the dataset according to their proximity to the original sample, and then training an interpretable model on this dataset to approximate the predictions of the black-box model. After training, the local model can be interpreted to understand the black-box model's reasoning.
Another type of local model-agnostic post-hoc explainer are counterfactual explanations [245]. Counterfactual explanations are used for causal reasoning and may serve to answer what-if questions, i.e., "would Y have occurred if X had not occurred before?". These techniques may be helpful for network operators when they try to analyze and manage their network with respect to critical situations, e.g., how to avoid congestion in a network. In a nutshell, they work by deriving causal relationships from the input features and then manipulating input features to perform specific reasoning.
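As a brief illustration of such post-hoc explanations (the shap package is assumed; the model and data below are placeholders rather than a networking example, and the exact shape of the returned values depends on the shap version), SHAP values for a tree-based classifier can be obtained as follows:

import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)   # stand-in for traffic features
model = RandomForestClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)          # efficient explainer for tree ensembles
shap_values = explainer.shap_values(X.iloc[:100])
# Each value quantifies how much a feature pushes one prediction up or down.
print(type(shap_values))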
2) Interpretable Models
The easiest to interpret and most known interpretable models include decision trees, which are interpretable in an if-else fashion, and linear models like linear regression or logistic regression, where slope and intercept directly characterize the input mapping. Generalized linear models (GLMs) and generalized additive models (GAMs) extend linear models to better reflect non-linear functions and target distributions other than the Gaussian distribution assumed by linear regression [245]. Especially for GAMs, many different models exist by now that are directly interpretable. The main idea behind GAMs is that a separate function fi is learnt for each feature xi and that the outputs of these functions are then summed up with a bias β to perform the prediction. The bias can hereby be learnt or fixed beforehand. Variants of GAM include the Explainable Boosting Machine (EBM) [252], which is a tree-based GAM and uses gradient boosting to improve performance, and the Neural Additive Model (NAM) [253], which consists of n neural subnetworks (fi) transforming each of the n input features to a new representation. The architecture of NAM with its n subnetworks and the bias β is depicted in Fig. 7.
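Written out, the GAM prediction described above takes the following additive form (using the same symbols as in the text):

\hat{y} = \beta + \sum_{i=1}^{n} f_i(x_i)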
In Fig. 8, we show the interpretability of EBM and
[Figure 7: NAM architecture — each input feature xi is fed to its own subnetwork fi(xi), and the subnetwork outputs are summed to form the prediction.]
[Figure 8: Feature effects on the MOS learned by EBM and NAM versus average bitrate [Mbps], initial delay [s], and number of stallings, shown alongside the IQX, WQL, and LIN models.]
ensembles, i.e., training the same model with different seeds and considering the predictions of each model and, in particular, the differences in these predictions. The stronger the differences between the models' predictions, the higher the uncertainty. This approach can be used for any kind of model. Another simple approach for estimating epistemic uncertainty in neural networks is the use of Monte Carlo Dropout. With Monte Carlo Dropout, the dropout layers, which are usually used for improved model generalizability during training, are also kept active during inference. Generating multiple model predictions with active dropout can also be considered as approximate Bayesian inference. Again, the variation in the returned predictions quantifies the degree of uncertainty. To learn aleatoric uncertainty, it is usually required that the model learns not only mean responses but also the variance [259]. With neural networks and a regression task, this is, for example, easily possible by adding another head, i.e., output neuron, to the neural network, which learns the variance, and by adjusting the loss function accordingly. Using the negative log-likelihood of a Normal distribution (or any other distribution) as the loss function, it is thus possible to learn a Normal distribution for an input, thereby allowing the uncertainty for that input to be quantified in the form of its variance. Finally, Bayesian Neural Networks as proposed by Kendall and Gal [259] can model both aleatoric and epistemic uncertainty. With Bayesian Neural Networks, model weights are assigned a probability distribution instead of a single value. Using these probability distributions, it is then possible to quantify epistemic uncertainty. For aleatoric uncertainty, they simply use two heads, where they learn both mean and variance for a data point.
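A minimal PyTorch sketch of Monte Carlo Dropout as described above (the model, data, and number of forward passes are placeholders): dropout is kept active at inference time, and the spread of repeated predictions serves as an uncertainty estimate.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(64, 1))
# (Assume the model has already been trained.)

x = torch.randn(1, 10)           # one input sample
model.train()                    # keeps the dropout layer active during inference
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(100)])   # 100 stochastic forward passes

mean = preds.mean(dim=0)         # predictive mean
uncertainty = preds.std(dim=0)   # spread across passes as an (epistemic) uncertainty estimate
print(mean.item(), uncertainty.item())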
D. RESPONSIBLE AI
Strongly related to uncertainty is the concept of Responsible AI. According to Arrieta et al. [246], XAI alone is not sufficient for an ethical and responsible usage of ML models. Responsible AI is in general a much broader topic than XAI [260]. With responsible AI, there are additional principles which must be kept in mind when developing and deploying ML models. These principles include the prevention of discrimination against persons, groups, or races, i.e., the model must be fair. In the context of communication networks, discrimination could, for example, mean that a model assigns specific users lower bandwidth shares and higher latency. Additionally, responsible AI ensures that users or stakeholders are always aware of the usage of ML models. Specifically, it must be transparent to everybody that ML has been used and how it has been used. An example for communication networks is the adaptive change of a routing table by an ML model. The model must be able to outline why a change was required and why it has changed specific routes. Next, the use of ML models should always benefit humanity in all aspects of life. They should not be used in disruptive ways, e.g., generating downtimes in a network for specific users on purpose. Finally, privacy and security are also very important topics. ML models require data. Here, the privacy and security of sensitive data must be maintained throughout the whole lifecycle of preparing and deploying the model. Responsible AI is still a young field of research. Nevertheless, all the mentioned principles must be kept in mind when preparing and deploying ML models in practice. It is one of those topics already diligently discussed in the conceptualization of future networks, e.g., 6G [261]. Meanwhile, [262] is a more generic survey of best practices to ensure that AI environments are responsible.

E. LIBRARIES
Several XAI libraries are available for all kinds of frameworks, e.g., Scikit-learn, PyTorch, and TensorFlow (cf. Section IV). Microsoft created a Python library named InterpretML [252], which unifies black-box explainers, e.g., SHAP values, LIME, or Partial Dependence Plots, and transparent models, e.g., linear models, decision trees, decision rules, and also EBM, a tree-based generalized additive model. OmniXAI [263], AIX360 [264],
and Alibi [265] also provide a collection of various post-hoc explainers and models for all kinds of data types and backends. In contrast to the libraries containing several different tools, individual explainers like SHAP or Anchors, but also interpretable models like the attention-based model TabNet (https://dreamquark-ai.github.io/tabnet/), are available as separate Python packages.
All gradient-based methods, e.g., Integrated Gradients [266], can be directly implemented within PyTorch and TensorFlow, or additional libraries like Captum [267], TorchRay [268] and TF-Explain [269] can be used. Captum also comprises a huge number of techniques for explaining image-based data.

VI. NETWORKS FOR MACHINE LEARNING
In the previous sections, we primarily explained ML methods, architectures, and principles to develop ML models. Hence, we focused on applying ML to design and optimize networks, detect patterns and anomalies, and predict network behavior autonomously. We refer to this application as "ML for Networks" [270], [271], where ML models are developed from network data to, e.g., design the communication topology of a network or to balance the traffic load.
However, networks and ML form a mutual relationship in which networks support ML, e.g., by using a network as an infrastructure for ML algorithms, both for training and inference. As we will see throughout this section, networks are thus a key success factor for ML by connecting and providing computational power and data storage [272], [273]. We refer to this support and infrastructure functionality of networks as "Networks for ML". Important to note is that it is detached from the ML model application. Instead, any ML model can be trained or deployed in a networked system. As ML is a relatively new network task, challenges for networks arising from ML traffic and possible effects on ML from networks are still the subject of research. "Networks for ML" generally comprises these open research questions.
ML algorithms primarily use a network to access data from memory or to exchange model parameters/updates. The traffic load generated, the traffic shape, and the network requirements, e.g., regarding latency and robustness, are unknown for many ML methods and are likely to be application-specific and method-specific. All this can pose new challenges for networks and makes a better understanding of the mutual relationship between ML and networks necessary. Thus, it is no longer sufficient to evaluate ML model performance alone, but also the network performance. Hence, one might ask the question: which metrics should be used to evaluate model and network performance when applying ML in networks?
From Section II, we know that several metrics can be used to evaluate the performance of ML models. These metrics can depend on the specific task or application of the model [274]. Although these metrics were introduced for ML models with network applications ("ML for Networks"), it is worth noting that some metrics can also help answer the questions arising in "Networks for ML". The choice of metrics will depend on the specific problem and the desired outcome. Hence, "ML for Networks" and "Networks for ML" are not mutually exclusive [275].
For instance, Data Quality is a metric that can be used for evaluating both. As ML is generally data-driven, data quality is very important for model development. Thus, when ML is applied for network tasks, data quality is primarily measured by the correctness and representativeness of events/classes. This can also be utilized in "Networks for ML". However, as it focuses on decentralized data sources, data distribution can additionally be considered. Other metrics typically considered by the ML community are: Privacy, Robustness, Energy Efficiency, and Fairness. However, as "Networks for ML" also focuses on network behavior, typical network metrics are often applied, such as Throughput, Latency, Packet loss rate, and Spectral efficiency. Table 14 further explains the metrics and their impact.
So why do these network metrics influence the ML models? High latency and low throughput (as well as low spectral efficiency) can cause delays in the training process, leading to slower training times and increased iteration cycles. Packet loss can impact the accuracy and also the consistency of ML models, because it can lead to incorrect or incomplete data inputs, and can cause inconsistent data transfer in case of retransmissions. This, in turn, can affect the model's ability to generalize, converge, and make accurate predictions.
Different network topologies could affect the "Networks for ML" performance, scalability, and security. Furthermore, when considering ML for Networks, the choice of network topology can also affect the accuracy and efficiency of the models.
For example, in a star topology, all nodes are directly connected to a central hub, which can make the network easier to manage and administer. From an ML perspective, this topology would lend itself to centralized learning, where data from all nodes is collected and processed in a central location. This approach could simplify the deployment and maintenance of the ML model, but it could also lead to a single point of failure and potential privacy concerns.
On the other hand, a mesh topology, in which nodes are connected in a decentralized fashion, can be more resilient to failures and provide more privacy, but it can also be more difficult to manage. In terms of ML, this topology can be suitable for distributed learning, where each node trains a local model and shares its knowledge with the other nodes. This approach could improve the scalability and privacy of the model, but it could also
TABLE 14: Examples of metrics for ML for Networks and Networks for ML.
• Throughput — Usage: measures the amount of data that can be transmitted over a network in a given time period. Networks for ML: implicitly impacts the convergence rate of ML models. ML for Networks: the main objective is to optimize this metric for network applications.
• Latency — Usage: measures the time it takes for a packet to travel from the source to the destination. Networks for ML: implicitly impacts the convergence rate of ML models. ML for Networks: the main objective is to optimize this metric for network applications.
• Spectral efficiency — Usage: measures the amount of data that can be transmitted per unit of wireless bandwidth. Networks for ML: implicitly impacts the convergence rate of ML models. ML for Networks: the main objective is to optimize this metric for network applications.
• Packet loss rate — Usage: measures the proportion of packets that are lost during transmission. Networks for ML: implicitly impacts the convergence rate of ML models and also impacts the model accuracy. ML for Networks: the main objective is to optimize this metric for network applications.
• Privacy — Usage: measures the level of data privacy that is maintained during the training/learning process. Networks for ML: concerns any data, such as medical data or images. ML for Networks: concerns network data, such as HTTP or IP traffic.
• Robustness — Usage: measures the ability to cope with changes in the network. Networks for ML: the ability of a model to maintain its performance when tested on new data that is different from the training data. ML for Networks: the ability of a network to continue operating in the presence of failures or attacks.
• Energy Efficiency — Usage: relates to the power consumption of network devices. Networks for ML: the product of the power consumption per inference/training step and the amount of time for executing the task [276]. ML for Networks: for wireless networks, it measures the amount of energy consumed per bit of data transmitted.
• Fairness — Usage: for resource allocation, measures the distribution of resources among different users or devices. Networks for ML: refers to correcting and eliminating bias in ML decisions; hence, the resources are data. ML for Networks: the resources are bandwidth and computation.
• Data Quality — Usage: measures the quality of data that is used for training across different devices. High-quality data may compromise privacy or suffer from high latency when collecting it; there is a trade-off between this metric and the aforementioned ones.
and better resource utilization is obvious, the benefit of more accurate predictions requires a more detailed explanation. Unlike the case where each node trains its model using its local data, centralized ML training benefits from aggregating data from multiple nodes [281]. Thus, the model trains on a larger dataset, but the aggregated data also better represents the overall data distribution, allowing the model to generalize better. For example, an ML model can extract significant information from data from different sensor types or locations. This is particularly useful in network applications such as smart cities, environment monitoring, and industrial IoT.
However, there are also several disadvantages associated with centralized ML in networked systems. First, the dependence on a central computing node for model training and inference introduces a single point of failure and scalability issues, potentially impacting the reliability and availability of the system [282]. The centralized approach places high demands on the central server in terms of computing and network performance, making its acquisition and maintenance expensive. Secondly, the data collected by networked devices (e.g., multimedia sensors, intelligent vehicles) is transmitted in large quantities over the network, requiring high data rates. Moreover, transferring large amounts of data to a central node can cause network congestion and degrade real-time performance [283]. Sensitive data may need to be transmitted to the central server, potentially compromising user privacy. Recently, there have been growing concerns about privacy in networked systems with data generated by networked devices, such as wearable devices or sensors, where data is often very private or sensitive [284]. This results in additional requirements for the network over which the data is transmitted, processed, and stored.

B. DISTRIBUTED ML
In various fields of application, the complexity of tasks being tackled by ML models has led to an increase in the number of model parameters. To cope with this complexity, distributed ML techniques make use of networks of interconnected computing machines to address challenges such as handling larger and distributed datasets, accommodating heightened computing resource demands, and dealing with models that surpass the memory capacity of a single machine. Here, two approaches are prevalent and usually take advantage of networking to enhance model training: 1) data-parallel and 2) model-parallel. Combinations of data- and model-parallel methods are also possible.
Data-parallel corresponds to scale-out parallelization and, therefore, increases computational capacity. During training, several machines, so-called workers, train instances of the ML model. These instances operate on distinct and usually non-overlapping portions of the dataset. All instances have the same model structure, number of layers, and number of neurons per layer, but the parameter values can vary. The workers periodically communicate to exchange model parameters and aggregate their updates after processing a predefined number of samples locally. Various data-parallel methods have been formulated, differing primarily in the manner of cooperation among workers during training, encompassing how workers communicate and where update aggregations occur. From this perspective, architectures can be primarily distinguished into Client-Server and Peer-to-Peer methods. Client-Server methods use a set of decentralized workers that process model updates (the clients) and a centralized server. The server can be a single worker or multiple workers organized equally or in hierarchical layers. Regardless of the server's internal structure, the server maintains the shared model state and stores all model parameters. Clients receive the current model state with its parameter set from the server and communicate their updates only to it. All communication is thus handled by the server, which can become a bottleneck. In contrast, Peer-to-Peer methods entail direct communication of updates among workers without a central server managing the global model state. Which workers can communicate with each other is defined in a communication topology. Here, all-to-all but also graph-based topologies such as trees and rings are possible. In addition to the cooperation relationship, data-parallel methods differ in whether workers transmit their updates synchronously or asynchronously and in the amount of communication overhead incurred. In production, where the model is typically only used for inference, all machines use the same shared model instance.
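To make the data-parallel idea concrete, the following minimal sketch shows four workers computing gradients on disjoint shards of the data and applying the averaged update; the toy linear model, synthetic data, and hyperparameters are illustrative assumptions only. This averaging step is exactly what Parameter Server, Federated Learning, and All-Reduce realize with different communication patterns, as discussed below.

```python
import numpy as np

def local_gradient(w, X, y):
    # Gradient of the mean squared error 0.5*||Xw - y||^2 / n w.r.t. w.
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 8)), rng.normal(size=1000)
shards = np.array_split(np.arange(1000), 4)   # 4 data-parallel workers, disjoint shards
w = np.zeros(8)                               # every worker holds the same model instance

for step in range(100):
    # Each worker computes a gradient on its own shard (in parallel in practice).
    grads = [local_gradient(w, X[idx], y[idx]) for idx in shards]
    # Synchronous aggregation: average the workers' updates, then broadcast.
    w -= 0.1 * np.mean(grads, axis=0)
```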
Model parallelism, on the other hand, splits the model and distributes it across multiple workers, allowing for model sizes larger than the memory of a single machine. Each worker trains and infers only its part of the model, which requires less memory. Consequently, the model in its entirety is upheld collectively by all workers, necessitating constant communication among them during both the training and inference phases. The data is fed to the workers that maintain the input layer of the model, and each worker forwards its computed output to the worker holding the next part of the model. In the backpropagation step during training, the workers holding the output layer first compute the updates. The updates are then propagated to the workers in reverse order and applied. A central challenge within model parallelism lies in devising an effective strategy for partitioning a given model across multiple networked machines. This partitioning determines how the model segments are distributed among workers to optimize communication and computation while maintaining overall model coherence. Common methods for distributed ML, both data-parallel and model-parallel, are explained below.
C. PARAMETER SERVER
Parameter Server [285], [286] is a data-parallel Client-Server method (cf. Figure 9a). Here, multiple decentralized clients (workers) are connected to a centralized server, the parameter server. The parameter server stores the model parameters, assigns data to workers, and aggregates the updates received from workers. Often, the parameter server is a single machine, but it can also be a set of equivalent or hierarchically structured machines [287]. Each worker maintains an instance of the model and individually processes parameter updates based on its data. Typically, SGD is used for parameter optimization during training. The processed data can either be captured and stored on the worker machine or transmitted from the parameter server. Usually, workers access only (non-overlapping) portions of the data. The complete dataset is thus distributed across multiple workers. After processing a predefined number of data samples, the workers first propose their parameter updates to the parameter server and then receive the updated model. However, how many other workers have contributed to the updated model depends on the Parameter Server implementation. In synchronous implementations, the parameter server considers updates from all workers. The workers do not continue processing until the updated model has been broadcast. Therefore, the slowest worker significantly impacts the time for a model update. In contrast, in asynchronous implementations, the parameter server updates and broadcasts the model immediately after receiving an update from the sending worker. Here, workers proceed on different model instances. This is a problem in heterogeneous environments with different computing resources and transmission delays. Slower workers working on outdated model instances can derange SGD's solution with their updates, causing the model to converge incorrectly. For homogeneous cluster environments, this is not the case, and the asynchronous approach is often faster than synchronous systems [288]. Since synchronous and asynchronous Parameter Server implementations struggle in heterogeneous environments, time-wise and model-quality-wise, respectively, Parameter Server is typically applied in data centers.
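A minimal synchronous parameter-server loop can be sketched as follows; the toy model (a plain weight vector), the class and function names, and the learning rate are illustrative assumptions, not part of the cited implementations. Workers pull the current parameters and push gradients, and the server only applies an update once all workers have reported; an asynchronous variant would instead apply each pushed gradient immediately.

```python
import numpy as np

class ParameterServer:
    """Toy synchronous parameter server: collects one gradient per worker,
    applies the averaged update, and lets every worker pull the new model."""
    def __init__(self, dim, num_workers, lr=0.1):
        self.w = np.zeros(dim)
        self.num_workers = num_workers
        self.lr = lr
        self.pending = []

    def pull(self):
        return self.w.copy()                       # workers fetch the global model

    def push(self, grad):
        self.pending.append(grad)                  # a worker proposes its update
        if len(self.pending) == self.num_workers:  # synchronous barrier
            self.w -= self.lr * np.mean(self.pending, axis=0)
            self.pending.clear()
            # An asynchronous server would instead apply each gradient immediately.

rng = np.random.default_rng(1)
X, y = rng.normal(size=(400, 4)), rng.normal(size=400)
shards = np.array_split(np.arange(400), 2)         # two workers with disjoint data
ps = ParameterServer(dim=4, num_workers=2)
for step in range(50):
    for idx in shards:                             # sequential stand-in for parallel workers
        w = ps.pull()
        ps.push(X[idx].T @ (X[idx] @ w - y[idx]) / len(idx))
```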
D. FEDERATED LEARNING
Federated Learning (FL) [289] is another data-parallel Client-Server distributed ML method that enables multiple devices to collaboratively train a shared model without sharing their raw data. This approach has gained significant attention in recent years due to its ability to protect user privacy and enable learning on edge devices with limited computational resources.
In FL, multiple devices, such as smartphones, IoT devices, or edge servers, participate in the training process by locally training a model using their own data and then sending their updated model parameters to a central server. The server aggregates the updates from all devices and uses them to update the global model. Figure 9b shows an FL scenario with three connected devices and a central server. The key idea behind FL is that the global model is trained using a large amount of data from multiple devices, while each device only needs to share the model updates. This allows FL to achieve the same performance as traditional centralized learning while preserving user privacy.
One of the most widely used FL algorithms is Federated Averaging (FedAvg). FedAvg is designed to address several challenges that arise in FL, including the need to preserve data privacy, mitigate bias and inconsistency across devices, reduce communication overhead, and enable model convergence. FedAvg works by having each device train its own local model using its local data; the local models are then aggregated to form a global model that is distributed back to the devices for further training. To address the challenge of bias and inconsistency across devices, FedAvg uses a weighted average of the local models, with the weights determined based on the amount of data each device contributes to the model. This approach ensures that each device's contribution is weighted appropriately, producing a more representative and robust global model.
By training a model locally, FL allows devices to make predictions and decisions without the need for a constant network connection to a central server. This is particularly useful in applications such as autonomous vehicles, drones, and medical devices, where data needs to be processed in real time. Additionally, FL is beneficial in scenarios where data is sensitive and cannot be shared, such as medical imaging or financial data.
Another essential benefit of FL is its ability to handle non-IID data, i.e., data that is not Independent and Identically Distributed, a common characteristic of data collected from networked devices. In traditional centralized learning, data is often assumed to be IID, which means that it has the same distribution across all devices. However, in practice, each device can have its own data distribution, which can lead to biased or suboptimal models. FL algorithms such as Federated Averaging [289], Federated Transfer Learning [290], and Federated Meta-Learning [291] have been proposed to address these issues.
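The weighted aggregation at the heart of FedAvg can be sketched in a few lines; the toy linear-regression clients, the function name, and the hyperparameters below are illustrative assumptions rather than the reference implementation of [289]. Each client runs a few local SGD steps on its private data and only the resulting model parameters are averaged, weighted by the amount of local data.

```python
import numpy as np

def fedavg_round(global_w, clients, local_steps=5, lr=0.05):
    """One FedAvg round: every client trains locally, then the server forms a
    data-size-weighted average of the client models."""
    local_models, sizes = [], []
    for X, y in clients:                          # each client keeps (X, y) private
        w = global_w.copy()
        for _ in range(local_steps):              # local training; no raw data is shared
            w -= lr * X.T @ (X @ w - y) / len(y)
        local_models.append(w)
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)        # contribution proportional to local data
    return sum(a * m for a, m in zip(weights, local_models))

rng = np.random.default_rng(2)
clients = [(rng.normal(size=(n, 6)), rng.normal(size=n)) for n in (50, 200, 120)]
w = np.zeros(6)
for communication_round in range(10):
    w = fedavg_round(w, clients)
```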
E. ALL-REDUCE
The All-Reduce approach [292] is a data-parallel distributed ML method for training ML models that implements the Peer-to-Peer concept. It therefore dispenses with a central server; instead, workers communicate directly. Which workers communicate with one another is specified by the communication topology used. Multiple communication topologies are possible for the All-Reduce approach, e.g., ring [293], butterfly [294], and trees [295].
FIGURE 9: Cooperation topologies for common distributed ML architectures: (a) Parameter Server, (b) Federated Learning, (c) Ring All-Reduce, (d) Split Learning.
• Federated Learning — Description: a method of training an ML model across multiple decentralized devices or servers while keeping data on the devices. Advantages: privacy and security of data; ability to handle large-scale decentralized data. Disadvantages: communication overhead; devices may have different data distributions and update rates.
• All-Reduce — Description: a data-parallel distributed ML method for training ML models that implements a Peer-to-Peer concept. Advantages: low implementation complexity; allows parallel computation on multiple devices. Disadvantages: requires communication between all devices; may be limited in terms of scalability.
• Split Learning — Description: a method of training an ML model where the model is split between two or more devices and only the output activations are transmitted between devices. Advantages: better privacy and security compared to traditional centralized learning; reduced communication overhead. Disadvantages: potential loss of accuracy due to approximation in forward computation; synchronization of model parameters across devices may be difficult.
• Federated Split Learning — Description: a hybrid approach that combines elements of Federated Learning and Split Learning. Advantages: achieves better privacy and security compared to traditional centralized training; reduced communication overhead compared to Federated Learning. Disadvantages: may be more complex to implement than other methods; devices may have different data distributions and update rates.
The communication topologies affect the data rate and latency of the network differently. In some cases, the topology also restricts access to the data set.
In principle, each worker maintains an instance of the model and individually computes updates based on its assigned portion of the data. The data is usually distributed at the beginning of the training. After processing a predefined number of data samples, the workers communicate their local updates to all their peers. Shortly after, they receive the updates of their peers and aggregate them with their own. This step of communication and aggregation can be repeated several times. When all updates are distributed to all workers, each worker adjusts its model instance parameters according to the aggregated updates and proceeds to produce the next local updates. The repetitive and expensive communication of updates guarantees that all workers work with the same model instance [292]. Figure 9c illustrates the communication topology of a ring All-Reduce approach.
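The ring variant can be simulated in a few lines of Python; the chunking scheme and the function name below are a sketch of the general idea under simplifying assumptions, not the exact algorithm of the cited works. Each worker splits its gradient into as many chunks as there are workers; a reduce-scatter pass accumulates each chunk along the ring, and an all-gather pass circulates the completed chunks, so every worker ends up with the same averaged gradient while only ever exchanging one chunk per step with its neighbour.

```python
import numpy as np

def ring_all_reduce_mean(grads):
    """Simulate a ring all-reduce that leaves every worker with the mean gradient."""
    n = len(grads)
    chunks = [np.array_split(np.asarray(g, dtype=float), n) for g in grads]

    # Reduce-scatter: after n-1 steps, worker i holds the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        for i in range(n):
            c = (i - step) % n                    # chunk worker i forwards this step
            chunks[(i + 1) % n][c] += chunks[i][c]

    # All-gather: circulate the completed chunks so everyone holds every sum.
    for step in range(n - 1):
        for i in range(n):
            c = (i + 1 - step) % n                # completed chunk worker i forwards
            chunks[(i + 1) % n][c] = chunks[i][c].copy()

    return [np.concatenate(ch) / n for ch in chunks]   # averaged gradient per worker

workers = [np.arange(8.0) * (k + 1) for k in range(4)]  # four toy gradients
reduced = ring_all_reduce_mean(workers)
assert all(np.allclose(r, np.mean(workers, axis=0)) for r in reduced)
```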
F. SPLIT LEARNING AND INFERENCE
Split Learning (SL) [296], [297] is a model-parallel distributed ML method that decouples model training from the need for direct access to the raw data by splitting the model into at least two sub-models. It is similar to FL, but it focuses on the case where devices have low computational power, memory constraints, or
limited energy budget. In contrast to FL, where devices typically train a model locally and send the updated model parameters to a central server, in SL the devices only forward a feature representation of their data to the central server, which performs the model updates.
In SL, the model is split into at least two parts, with one part running on the device and the other part running on the central server. Figure 9d shows the SL setup with three devices. The key idea behind SL is that the device-side part of the model is lightweight and can be run on devices with low computational power instead of the entire, computationally demanding model. Thus, SL enables model training and inference on devices with low computing resources.
SL over networked devices is particularly useful in scenarios where devices have low computational power and high communication bandwidth. For example, in a network of smartphones, each smartphone may have a camera that captures images, but the device may not have the computational power to process the images. SL can be used to train a model that can classify images without needing to process the images on the device.
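A minimal sketch of one SL training step for a two-layer toy model is shown below; the layer sizes, learning rate, and variable names are illustrative assumptions. The device computes the head and transmits only the activations ("smashed data"), the server finishes the forward pass, updates its own part, and returns the gradient of the activations, so the device can update the head while the raw data never leaves it.

```python
import numpy as np

rng = np.random.default_rng(0)
W_head = rng.normal(scale=0.1, size=(16, 8))   # device-side sub-model ("head")
W_tail = rng.normal(scale=0.1, size=(8, 1))    # server-side sub-model ("tail")
lr = 0.05

x, y = rng.normal(size=(32, 16)), rng.normal(size=(32, 1))   # raw data stays on device

for step in range(100):
    # Device: forward through the head, send only the activations.
    h = x @ W_head                              # transmitted device -> server
    # Server: finish the forward pass and compute the error.
    err = (h @ W_tail) - y
    grad_h = err @ W_tail.T                     # transmitted server -> device
    W_tail -= lr * h.T @ err / len(x)           # server updates its part
    # Device: finish backpropagation locally using only grad_h.
    W_head -= lr * x.T @ grad_h / len(x)
```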
1) Federated Split Learning
Federated Split Learning (FSL) [298], [299] is a distributed algorithm that combines the ideas of computing the weighted average, a characteristic of the FL architecture, and of splitting the neural network between client and server, as in the SL architecture. It thus combines data and model parallelism. In FSL, all clients compute in parallel and independently. They send/receive their smashed data to/from the server in parallel. The client-side sub-network synchronization, i.e., forming the global client-side network, is done by aggregating (e.g., weighted averaging) all local client-side networks on a separate server.
2) Split Computing
Splitting a neural network for inference tasks is usually called Split Computing (SC). It is very similar to SL, as a model is split into sub-models that are then distributed on multiple devices communicating with each other. It is helpful in scenarios where sensor devices are resource-limited and cannot deploy full models. Instead of offloading the sensor data, the sensor can compute a part of the model and then transmit the compressed feature representation, resulting in a smaller end-to-end latency [300].
Most works focus on a simple client-server scenario. The model is then split into a head and a tail part. The client, a sensor, gathers sensor data, feeds it into the head of the model, and then transmits the feature representation to the server. The server receives the feature representation and completes the inference process using the tail. In this client-server scenario, the main challenges are to minimize the head with regard to computation and size on the client, as sensors have limited resources, and to minimize the amount of communication while making sure that the model does not lose too much accuracy.
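The communication-saving step can be illustrated with a tiny sketch; the 8-bit quantization of the intermediate feature, the sizes, and the function names are assumptions for illustration and not one of the surveyed SC methods. The client runs the head and transmits a quantized feature instead of the raw sensor data; the server dequantizes it and runs the tail.

```python
import numpy as np

rng = np.random.default_rng(0)
W_head = rng.normal(scale=0.1, size=(128, 16))   # runs on the sensor
W_tail = rng.normal(scale=0.1, size=(16, 4))     # runs on the server

def client_side(x):
    h = np.maximum(x @ W_head, 0.0)              # head forward pass (ReLU)
    scale = h.max() / 255.0 or 1.0               # 8-bit quantization of the feature
    return np.round(h / scale).astype(np.uint8), scale

def server_side(q, scale):
    h = q.astype(np.float32) * scale             # dequantize the received feature
    return h @ W_tail                            # tail completes the inference

x = rng.normal(size=(1, 128))                    # raw sensor reading
q, s = client_side(x)
print("bytes sent:", q.nbytes, "instead of", x.astype(np.float32).nbytes)
print("prediction:", server_side(q, s))
```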
Matsubara et al. [301] provide a comprehensive survey describing many proposed methods to optimize SC. Additionally, it contains links to code where available. With sc2bench [302], there is also a pip package to test and compare several SC techniques while providing a framework for creating your own method.

VII. FURTHER READINGS
Several related survey and tutorial papers exist that cover parts of the interplay between ML and networking to a varying extent and on varying scales of granularity. Table 16 lists the most related of these papers while highlighting their ML scope, covered network applications, and whether they focus on ML for Networks (ML4N) or Networks for ML (N4ML).
Perhaps the most comprehensive survey on ML for Networks, [306] discusses ML approaches for a wide range of networking challenges and provides further references to more specialized surveys about ML approaches in certain networking domains. The work of [309] considers itself an update to [306], covering more recent developments and discussing recent IDS datasets. Additionally, several surveys consider ML approaches for a subset of networked systems, such as vehicular networks in [315], [316], Software-Defined Networks (SDN) in [305], mobile/wireless/ubiquitous networks in [4], [277], edge computing in [303], [304], or network traffic monitoring and analysis in [313]. The work of [307] takes a unique stance and covers the joint application of recent ML and Blockchain technologies for networking problems. Other surveys focus on specific ML subdomains such as unsupervised learning [29], deep learning [270], or distributed ML [308].
The works presented in [310] and [311] specifically consider the role of FL in networking. While [310] discusses several applications of FL in the domain of communications and networking, [311] focuses on mobile edge computing but also discusses how communication techniques influence FL methods. The studies [283], [312] provide an overview of various applications of ML methods in IoT systems and analyze various approaches for distributing and processing ML models in the cloud-to-things continuum. The survey [282] discusses the convergence of edge computing methods and ML; specifically, it provides a comprehensive view of how networking can be utilized for cooperative processing of deep learning models on edge devices. The survey [301] provides insights into how networked devices such as smartphones and autonomous vehicles are used for collaborative training of ML models, and into inference operations over the network using split computing and early exit methods.
TABLE 16: Selective surveys & tutorials on using ML for Networks (ML4N) and Networks for ML (N4ML)
• [303], [304] — Network Application: edge computing (video processing, autonomous vehicles, smart homes, privacy, security, and caching). ML Scope: distributed training/inference and model compression (DNN, FL, Transfer Learning (TL), and shallow ML). Focus: both.
• [305], [306] — Network Application: prediction and configuration (routing, congestion control, traffic prediction/classification, and security). ML Scope: shallow ML, Deep Learning (DL), and RL. Focus: ML4N.
• [282] — Network Application: edge, fog, and cloud computing (intelligent manufacturing, security, real-time video processing, smart city). ML Scope: distributed ML, FL, RL, TL. Focus: N4ML.
• [270] — Network Application: wireless and mobile computing (e.g., user localization, mobility/mobile data analysis, signal processing). ML Scope: DNN (e.g., CNN, RNN, and GAN) and deep RL. Focus: ML4N.
• [307] — Network Application: traffic classification, routing optimization, resource management, security. ML Scope: shallow ML, DL, RL. Focus: ML4N.
• [4] — Network Application: wireless (Virtual Reality (VR), Mobile Edge Computing (MEC), Internet of Things (IoT), and Unmanned Aerial Vehicle (UAV)). ML Scope: DNN (e.g., Spiking Neural Network (SNN) and RNN). Focus: both.
• [308] — Network Application: 5G (privacy and resource allocation). ML Scope: distributed ML, model compression. Focus: both.
• [301] — Network Application: edge computing (video/audio processing, autonomous vehicles, smart homes, industrial IoT). ML Scope: Split Computing (SC), distributed ML, Early Exit (EEoI). Focus: N4ML.
• [29] — Network Application: traffic classification, network optimization, and Intrusion Detection System (IDS). ML Scope: unsupervised ML. Focus: ML4N.
• [309] — Network Application: IDS, routing, network optimization, and resource allocation. ML Scope: supervised and unsupervised ML, RL, and DNN. Focus: ML4N.
• [310] — Network Application: IoT, vehicular networks, edge computing, wireless networks, UAV swarms. ML Scope: FL, RL. Focus: ML4N.
• [311] — Network Application: edge computing (privacy, security, and model compression). ML Scope: FL. Focus: both.
• [283], [312] — Network Application: distributed computing (smart cities, autonomous vehicles, smart homes, industrial IoT). ML Scope: distributed ML, FL, EEoI. Focus: N4ML.
• [277] — Network Application: mobile communication (privacy, security, robustness, and other exemplary applications). ML Scope: distributed learning (FL, multi-agent, and parameter server). Focus: both.
• [313] — Network Application: traffic prediction/classification and security. ML Scope: DNN (e.g., CNN, RNN, and GAN). Focus: ML4N.
• [314] — Network Application: IoT (resource management and traffic classification). ML Scope: shallow ML and DL. Focus: ML4N.
• [315], [316] — Network Application: vehicular network (security, privacy, and resource allocation). ML Scope: shallow ML, DL, TL, and RL. Focus: ML4N.
• [317] — Network Application: wireless/6G (e.g., signal detection, channel and sensor network diagnostics, antenna selection). ML Scope: explainable DL (e.g., symbolic representation, feature visualization, model reduction). Focus: ML4N.
Concerning the role of XAI in networking, the amount of survey work is limited. The work of [318] motivates the usage of XAI methods for networking challenges but only covers a single concrete problem. While there exist survey papers on XAI [319] and Explainable Reinforcement Learning (XRL) [320] in general (i.e., not limited to networking), to the best of our knowledge only [317] surveys XAI techniques in the domain of networking, namely in challenges related to wireless/6G.

VIII. CHALLENGES AND FUTURE DIRECTIONS
The adoption of ML in networks also brings forth several challenges and opens up exciting future directions for research and development, both for ML for Networks and for Networks for ML. In this section, we touch on some of these challenges, while we refer to [321] for further discussions on the limitations and challenges of ML in general and, more specifically, of applying ML for Networks [322], [323]. The following are some of the current challenges in ML for Networks:
• Scalability: One of the critical challenges in ML for Networks is scaling up models to handle large-scale networks with millions of nodes and edges. Most ML approaches are initially developed and tested for small-scale networks to better debug them and understand their effect on individual network components. However, making them work at scale is not always trivial because large-scale network structures might lead to computation time explosions (as has been indicated, e.g., for SDN in [324]), especially for problems where global decisions are taken in a centralized manner.
• Limited data: One of the challenges in ML for Networks is the limited amount of data available for training. Collecting and labeling network data is a time-consuming and costly process, and in some cases, data may be proprietary or sensitive, making it difficult to obtain.
• Interpretability: Another challenge is the lack of interpretability of ML models. In many cases, it is difficult to understand how a model arrived at a particular decision or prediction, making it challenging to debug or troubleshoot issues.
• Heterogeneous data: Networks often contain heterogeneous data from multiple sources, such as text, images, and numerical data. Incorporating this data into ML models and designing models that can effectively handle heterogeneous data is another challenge that requires further research.
• Robustness: ML models are vulnerable to attacks and adversarial examples, especially in network environments where data may be noisy or corrupted.
• Real-time decision-making in closed-loop systems: In many Network Control Systems (NCSs) environments, decisions must be made in real time, requiring efficient and fast ML model inference [325], [326]. Developing algorithms that can make accurate but fast decisions in real time is a significant challenge in ML for Networks. One of the core problems is the potential for unstable system behavior caused by a mismatch between the intended NCS sampling time and the time required for inference of an ML model. As a result, input delays affect the resulting system and must be handled carefully [327]. Hence, there is a trade-off between large models that can handle large-scale networks (the first mentioned challenge) and the time required for their inference. In general, the inference time required by AI and ML models will be a non-trivial function of the resulting closed-loop system in which they are embedded. For RL, delays due to model inference can be explicitly included in the modeling, resulting in the notion of real-time MDPs and real-time RL algorithms [328]. Beyond cyber-physical closed-loop systems, model inference delay impacts user experience when LLM, IoT, or VR services are run via edge computing networks [329]. In other words, in these cases, the system loop is closed via human feedback, where unstable behavior will eventually result in performance loss.
• Energy efficiency: ML models often require significant computational resources, which can be challenging in resource-constrained network environments. As the current trend points towards ever-increasing model scales, energy efficiency might become an even more important aspect in even more situations.
• Privacy and security: Networks can contain sensitive and private data, which requires ML algorithms to be developed with strong privacy and security safeguards. ML algorithms for networks must maintain data privacy while providing accurate predictions.
• Network complexity: Computer networks can be highly complex and dynamic, with large numbers of interconnected nodes, an interplay of various different protocols, and changing operating conditions. This makes it challenging to develop accurate ML models, since formulating ML problems for complex application domains or sub-problems where suitable training data is available often requires several simplifying and/or narrowing assumptions at the start [330]. Removing such assumptions one by one brings ML systems closer to deployment in real-world scenarios, but this is often a non-trivial task that brings unexpected challenges at every step along the way.
On the other hand, the challenges related to Networks for ML include:
• Resource constraints: ML algorithms often require significant computational resources, including processing power, memory, and storage. Moreover, the training of ML models requires large amounts of data, and transferring this data across networks can be time-consuming and resource-intensive. This can be a challenge in resource-constrained networks, such as those in IoT devices and edge computing environments, or when specialized networking hardware disallows certain compute operations. In addition, storing data in a centralized location can create a bottleneck and security issues.
• Latency: Network latency can affect the performance of ML algorithms, particularly in real-time applications where decisions must be made quickly. High latency can lead to delays in data transmission and processing, which can negatively impact the accuracy and effectiveness of the algorithm [331].
• Bandwidth: ML algorithms often require large amounts of bandwidth to transfer data, and this can be a challenge in networks with limited bandwidth. High bandwidth requirements can also lead to increased costs for network infrastructure in a real-world deployment.
• Network topology: The topology of a network can impact the performance of ML algorithms. For example, networks with high levels of congestion or interference may not be suitable for real-time applications.
• Privacy and security: ML algorithms require access to data, which can create potential privacy and security risks, increasing the risk of data breaches and cyber-attacks during transmission over the network or remote processing of user data.
• Heterogeneous resources: The computing and communication resources in devices used for processing ML algorithms over the network may vary widely, leading to unstable training processes. Furthermore, this can lead to the presence of slower devices (stragglers) that slow down the training of a global model and affect the model's efficiency.
As mentioned earlier in Section VI, some of these challenges may overlap, such as privacy and security. Overall, ML for Networks and Networks for ML are
rapidly growing fields with many challenges and opportunities for future research. Addressing these challenges will require collaboration between researchers from different disciplines. In the following sections, we discuss some of the trending applications that focus on these challenges.

A. A NEW PARADIGM FOR NEXT-GENERATION WIRELESS NETWORKS
The rapid advancement of AI and ML technologies has also opened up new vistas for next-generation wireless networks like 5G Advanced and 6G. These next-generation networks essentially serve two purposes: data transport and service delivery. They comprise various types of devices, from User Equipments (UEs), base stations, switches, and routers to servers in a data center. With the integration of SDN and Network Function Virtualization (NFV), all devices can now constantly adapt to new situations, such as changing traffic patterns, better function placements, or new service demands, and incorporate AI and ML [332]. These technologies promise to revolutionize the way we design and manage wireless networks, leading to the emergence of AI-native networks and AI-native air interfaces.
On the one hand, AI-native networks are networks designed with AI integration at their core, rather than as an afterthought or add-on. Hence, AI (partially) replaces human-defined rules, models, and algorithms, which may not be optimal or scalable for complex and dynamic wireless scenarios, so that these networks can learn, adapt, and optimize themselves autonomously and intelligently.
On the other hand, an AI-native air interface is an air interface that uses AI and ML to define and configure its physical and medium access control layer parameters, such as waveforms, constellations, pilots, coding, modulation, synchronization, channel estimation, equalization, detection, decoding, and access schemes [333].
One of the main challenges here is the complexity and heterogeneity of wireless networks. This complexity makes it difficult to collect, process, and analyze data in real time [334]. However, this can be mitigated by using distributed AI engines, which can process data closer to the source and reduce latency. Another challenge is the lack of standardized frameworks and architectures for implementing AI in networks. To address this challenge, industry and academia collaborate to develop standardized AI frameworks and tools that can be used across different networks [335], [336]. There are four aspects to addressing this challenge [337]:

1) Data infrastructure
A distributed data infrastructure that can handle massive amounts of varied, distributed, and dynamic data, and enable data ingestion, processing, and exposure across layers and domains.

2) Intelligence everywhere
A comprehensive and automated management of AI models, from training to deployment to monitoring, and the ability to handle model drift, retraining, and versioning. This would take place on every network layer and on every network device.

3) Zero touch
A high degree of automation and autonomy for the management of AI and data, and the ability to express and supervise high-level goals rather than low-level actions.

4) AI as a service
The exposure of AI and data services to external parties, such as service providers or customers, and the creation of a platform for innovation and collaboration.

For further reading on the evaluation metrics of such networks, we refer to [338]. The authors in [339] also provide a road map with potential frameworks to build such networks.

B. DEEP NEURAL NETWORKS MODEL COMPLEXITY AND ENERGY CONSUMPTION
The increasing complexity of DNNs has direct implications for energy consumption, a critical factor in both environmental sustainability and practical deployment [340]. The complexity of DNNs is largely driven by the depth and breadth of the network architecture. As DNNs grow deeper (with more layers) and wider (with more neurons in each layer), they can capture more intricate patterns in data. This increased capacity, while potentially beneficial for model accuracy, leads to a higher number of computations during both the training and inference phases [282]. Each computation requires a certain amount of energy, and thus, as models grow more complex, their energy requirements escalate.
The energy consumption of DNNs is a multifaceted issue. Training DNNs is an energy-intensive process that requires substantial computational resources [341]. This phase often necessitates the use of high-performance GPUs or even clusters of GPUs, which are power-hungry devices [342]. The electricity consumption during this phase is considerable, contributing to the overall energy footprint of developing DNNs. The inference phase, where DNNs make predictions on new data, also demands a considerable amount of energy [343]. This phase is critical in real-world applications where continuous or on-demand operation of DNNs is required, such as in autonomous systems or real-time analysis applications.
The substantial energy consumption of DNNs poses a significant challenge for environmental sustainability. As these networks become more prevalent across various
sectors, the need for energy-efficient neural network architectures and training methods becomes increasingly important [344]. In energy-constrained environments (e.g., with battery-operated devices), the energy demands of DNNs are a crucial consideration. This has led to a focus on balancing model complexity with energy efficiency, driving innovation in optimization techniques and the development of specialized hardware to run these models more efficiently [345]. Moreover, different models and benchmarks are used to estimate and plan the energy consumption of DNNs [346]–[348].
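As a back-of-the-envelope illustration of such planning, the sketch below counts the multiply-accumulate operations of a small fully connected model and converts them into an energy figure; the layer sizes and the energy-per-operation constant are purely illustrative assumptions (the real value depends heavily on the hardware) and are not taken from the cited benchmarks.

```python
# Rough estimate of compute and energy for one inference of a small MLP.
layers = [(784, 256), (256, 128), (128, 10)]      # (inputs, outputs) of each dense layer
macs = sum(i * o for i, o in layers)              # multiply-accumulate operations

ENERGY_PER_MAC_J = 5e-12      # assumed energy per MAC in joules; hardware-dependent
inferences_per_day = 1_000_000

daily_energy_j = macs * ENERGY_PER_MAC_J * inferences_per_day
print(f"{macs:,} MACs per inference")
print(f"~{daily_energy_j:.2f} J per day for {inferences_per_day:,} inferences")
```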
C. TINY MACHINE LEARNING
Tiny Machine Learning (TinyML) is an emerging field that combines ML with ultra-low-power computing, typically found in microcontrollers and small IoT devices [349]. Its goal is to deploy efficient ML models that can operate in environments with limited memory, processing power, and energy. This is particularly relevant for applications where traditional ML models would be impractical due to their size and energy requirements.
The primary motivation for TinyML is the need for localized data processing, rather than transmitting data to a centralized server or cloud, especially in situations where privacy, speed, and power efficiency are critical [350]. This applies to many applications, spanning from smart home devices and wearable technology to healthcare monitoring and environmental sensors [283].
The core implementation of TinyML relies on ML model quantization, which reduces a model's numerical precision and size. Still, implementing TinyML in environments with limited resources presents several ongoing challenges. The low computational capabilities and storage capacities of smaller devices restrict the complexity of the models that can be deployed [351]. This constraint can adversely affect the efficacy and precision of TinyML-based applications. To address this, some research suggests the integration of cooperative ML (Section VI) and TinyML approaches [343], [352]. This strategy would enable devices with constrained resources to work collaboratively on ML tasks. Moreover, progress in hardware development, particularly in creating more efficient microcontrollers and sensors, is expected to broaden the range of possible applications for TinyML. For a recent survey of TinyML applications and techniques, we refer to [353].
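The model quantization mentioned above can be sketched in a few lines; the symmetric per-tensor int8 scheme and the names below are illustrative assumptions, and production toolchains additionally perform calibration, per-channel scaling, and quantized operator execution. Weights are stored as 8-bit integers plus one scale factor, cutting memory roughly fourfold compared to float32 at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of a float32 weight tensor to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(64, 32)).astype(np.float32)
q, scale = quantize_int8(w)

print("memory:", w.nbytes, "->", q.nbytes, "bytes")             # roughly 4x smaller
print("max abs rounding error:", np.abs(w - dequantize(q, scale)).max())
```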
IX. CONCLUSION
The aim of this paper is to provide interested but inexperienced readers with an inspiring and practical jumpstart for research at the intersection of ML and computer networking. This encompasses not only the creation of novel ML-powered solutions for the covered networking scenarios but also leveraging established networking technology to enhance existing ML approaches.
Compared to the aforementioned surveys and tutorials (Section VII), we are the first to provide a comprehensive bidirectional overview of ML and XAI techniques across different networking fields, and vice versa.43 Furthermore, in addition to an overview of the current state of the art, our work provides practical guidance for aspiring researchers to shortcut their way into meaningful research:
• Many of the mentioned related papers do not consider datasets and/or starting points to reproduce the results or even to just start experimenting. In contrast, we refer to publicly available datasets as well as methods and tools to generate synthetic datasets (Section IV) and to design ML models suitable for the respective task.
• We categorize existing approaches as ML serving networks (ML4N) and networks serving ML (N4ML) based on the used metrics, which helps to identify research gaps and possible future directions of research.
We introduced the most popular ML techniques, model types, and tools, as well as several practical aspects to consider when practicing ML, such as obtaining high-quality data for the learning algorithm or incorporating inductive biases (more specifically, for networking data and network topologies) into ML models in order to reduce resource requirements. Secondly, we introduced the most common computer networking problem domains and pointed to existing tools and datasets to accelerate and facilitate ML research on networking problems.
Thirdly, we introduced how XAI methods can improve the transparency of ML models' decisions and thus push their acceptance in the computer networks research domain and their suitability in productive environments. We also elaborated on how networking techniques can boost the performance of existing ML setups and workflows, e.g., through several approaches for distributed learning.
Lastly, we provided a large number of pointers for further reading, such as surveys on more specific ML/networking domains, example research works for some of the problems introduced in this paper, or links to many of the mentioned datasets or tools.
Despite our comprehensive coverage of established tools, approaches, and recent breakthroughs, it is important to acknowledge the dynamic nature of ML research. The field is characterized by the emergence of new algorithms, the potential availability of additional tools and features in the future, and the hopeful prospect
of more open-sourced datasets. While this evolution is happening at an unprecedented pace, this paper still serves as a valuable starting point for researchers and newcomers alike and provides a timely and relevant contribution to the intersection of the fields of ML and computer networking.

43 We do not aim for a comprehensive review of state-of-the-art research in ML or its sub-disciplines, as there are numerous survey and tutorial resources that provide an excellent ML-focused overview. Rather, we view ML techniques solely in relation to networking, either as facilitators (ML for Networks) or beneficiaries (Networks for ML).

REFERENCES
[1] J. M. Stokes, K. Yang, K. Swanson, W. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, V. M. Tran, A. Chiappino-Pepe, A. H. Badran, I. W. Andrews, E. J. Chory, G. M. Church, E. D. Brown, T. S. Jaakkola, R. Barzilay, and J. J. Collins, "A Deep Learning Approach to Antibiotic Discovery," Cell, vol. 180, no. 4, pp. 688–702.e13, Feb. 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0092867420301021
[2] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-Resolution Image Synthesis With Latent Diffusion Models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695. [Online]. Available: https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html
[3] A. Davies, P. Veličković, L. Buesing, S. Blackwell, D. Zheng, N. Tomašev, R. Tanburn, P. Battaglia, C. Blundell, A. Juhász, M. Lackenby, G. Williamson, D. Hassabis, and P. Kohli, "Advancing mathematics by guiding human intuition with AI," Nature, vol. 600, no. 7887, pp. 70–74, Dec. 2021. [Online]. Available: https://www.nature.com/articles/s41586-021-04086-x
[4] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, "Artificial neural networks-based machine learning for wireless networks: A tutorial," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3039–3071, 2019.
[5] S. J. Russell, Artificial Intelligence: A Modern Approach. Pearson Education, Inc., 2010.
[6] M. F. Ahmad Fauzi, R. Nordin, N. F. Abdullah, and H. A. H. Alobaidy, "Mobile network coverage prediction based on supervised machine learning algorithms," IEEE Access, vol. 10, pp. 55782–55793, 2022.
[7] C. Ioannou and V. Vassiliou, "Classifying security attacks in iot networks using supervised learning," in 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), 2019, pp. 652–658.
[8] W. Hu, Y. Liao, and V. R. Vemuri, "Robust anomaly detection using support vector machines," in Proceedings of the International Conference on Machine Learning. Morgan Kaufmann Publishers, 2003, pp. 282–289.
[9] B. Mohammed, I. Awan, H. Ugail, and M. Younas, "Failure prediction using machine learning in a virtualised hpc system and application," Cluster Computing, vol. 22, pp. 471–485, 2019.
[10] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, "Support vector machines," IEEE Intelligent Systems and their Applications, vol. 13, no. 4, pp. 18–28, 1998.
[11] S. B. Kotsiantis, "Decision trees: a recent overview," Artificial Intelligence Review, vol. 39, pp. 261–283, 2013.
[12] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5–32, 2001.
[13] G. Shakhnarovich, T. Darrell, and P. Indyk, "Nearest-neighbor methods in learning and vision," IEEE Trans. Neural Networks, vol. 19, no. 2, p. 377, 2008.
[14] M. Nasri and M. Hamdi, "Lte qos parameters prediction using multivariate linear regression algorithm," in 2019 22nd Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN). IEEE, 2019, pp. 145–150.
[15] A. Y. Nikravesh, S. A. Ajila, C.-H. Lung, and W. Ding, "Mobile network traffic prediction using mlp, mlpwd, and svm," in 2016 IEEE International Congress on Big Data (BigData Congress). IEEE, 2016, pp. 402–409.
[16] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Statistics and Computing, vol. 14, pp. 199–222, 2004.
[17] C.-Y. Hsu, P.-Y. Chen, S. Lu, S. Liu, and C.-M. Yu, "Adversarial examples can be effective data augmentation for unsupervised machine learning," 2021.
[18] D. Kim and J. Choi, "Self-supervised learning for binary networks by joint classifier training," CoRR, vol. abs/2110.08851, 2021. [Online]. Available: https://arxiv.org/abs/2110.08851
[19] E. Schubert, J. Sander, M. Ester, H. P. Kriegel, and X. Xu, "Dbscan revisited, revisited: why and how you should (still) use dbscan," ACM Transactions on Database Systems (TODS), vol. 42, no. 3, pp. 1–21, 2017.
[20] J. Li, H. Izakian, W. Pedrycz, and I. Jamal, "Clustering-based anomaly detection in multivariate time series data," Applied Soft Computing, vol. 100, p. 106919, 2021.
[21] I. Ullah and H. Y. Youn, "Task classification and scheduling based on k-means clustering for edge computing," Wireless Personal Communications, vol. 113, pp. 2611–2624, 2020.
[22] Z. Fan and R. Liu, "Investigation of machine learning based network traffic classification," in 2017 International Symposium on Wireless Communication Systems (ISWCS). IEEE, 2017, pp. 1–6.
[23] R. Bellman, "Dynamic programming," Science, vol. 153, no. 3731, pp. 34–37, 1966.
[24] H. Abdi and L. J. Williams, "Principal component analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.
[25] C. Fefferman, S. Mitter, and H. Narayanan, "Testing the manifold hypothesis," Journal of the American Mathematical Society, vol. 29, no. 4, pp. 983–1049, 2016.
[26] U. Narayanan, A. Unnikrishnan, V. Paul, and S. Joseph, "A survey on various supervised classification algorithms," in 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), 2017, pp. 2118–2124.
[27] J. E. van Engelen and H. H. Hoos, "A survey on semi-supervised learning," Machine Learning, vol. 109, no. 2, pp. 373–440, Feb. 2020. [Online]. Available: https://doi.org/10.1007/s10994-019-05855-6
[28] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, "Machine learning in wireless sensor networks: Algorithms, strategies, and applications," IEEE Communications Surveys & Tutorials, vol. 16, no. 4, pp. 1996–2018, 2014.
[29] M. Usama, J. Qadir, A. Raza, H. Arif, K.-L. A. Yau, Y. Elkhatib, A. Hussain, and A. Al-Fuqaha, "Unsupervised machine learning for networking: Techniques, applications and research challenges," IEEE Access, vol. 7, pp. 65579–65615, 2019.
[30] Z. Ghahramani, "Probabilistic machine learning and artificial intelligence," Nature, vol. 521, no. 7553, pp. 452–459, May 2015. [Online]. Available: https://www.nature.com/articles/nature14541
[31] K. P. Murphy, Probabilistic Machine Learning: An Introduction. MIT Press, 2022. [Online]. Available: https://probml.github.io/pml-book/book1.html
[32] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, and J. Tang, "Self-Supervised Learning: Generative or Contrastive," IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 1, pp. 857–876, Jan. 2023.
[33] F. Ebert, C. Finn, A. X. Lee, and S. Levine, "Self-Supervised Visual Planning with Temporal Skip Connections," in Conference on Robot Learning (CoRL), 2017.
[34] S. Meyn, Control Systems and Reinforcement Learning. Cambridge University Press, 2022.
[35] Y. Xu, G. Gui, H. Gacanin, and F. Adachi, "A survey on resource allocation for 5g heterogeneous networks: Current research, future trends, and challenges," IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 668–695, 2021.
[36] M. M. Sadeeq, N. M. Abdulkareem, S. R. Zeebaree, D. M. Ahmed, A. S. Sami, and R. R. Zebari, "Iot and cloud computing issues, challenges and opportunities: A review," Qubahan Academic Journal, vol. 1, no. 2, pp. 1–7, 2021.
[37] P. Kumar and R. Kumar, "Issues and challenges of load balancing techniques in cloud computing: A survey," ACM Computing Surveys (CSUR), vol. 51, no. 6, pp. 1–35, 2019.
[38] A. Alwarafy, M. Abdallah, B. S. Ciftler, A. Al-Fuqaha, and M. Hamdi, "Deep reinforcement learning for radio resource allocation and management in next generation heterogeneous wireless networks: A survey," arXiv preprint arXiv:2106.00574, 2021.
[39] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2018.
[40] D. Bertsekas, Dynamic Programming and Optimal Control: Volume II. Athena Scientific, 2012, vol. 2.
[41] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, 1996, vol. 5.
[42] T. M. Moerland, J. Broekens, A. Plaat, C. M. Jonker et al., "Model-based reinforcement learning: A survey," Foundations and Trends® in Machine Learning, vol. 16, no. 1, pp. 1–118, 2023.
[43] Y.-P. Hsu, E. Modiano, and L. Duan, "Age of information: Design and analysis of optimal scheduling algorithms," in 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 561–565.
[44] Q. Sykora, M. Ren, and R. Urtasun, "Multi-agent routing value iteration network," in International Conference on Machine Learning. PMLR, 2020, pp. 9300–9310.
[45] S. S. Mwanje, L. C. Schmelz, and A. Mitschele-Thiel, "Cognitive cellular networks: A q-learning framework for self-organizing networks," IEEE Transactions on Network and Service Management, vol. 13, no. 1, pp. 85–98, 2016.
[46] Y. Kim, S. Kim, and H. Lim, "Reinforcement learning based resource management for network slicing," Applied Sciences, vol. 9, no. 11, p. 2361, 2019.
[47] H. Afifi and H. Karl, "Reinforcement learning for virtual network embedding in wireless sensor networks," in 2020 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). IEEE, 2020, pp. 123–128.
[48] A. Geramifard, T. J. Walsh, S. Tellex, G. Chowdhary, N. Roy, J. P. How et al., "A tutorial on linear function approximators for dynamic programming and reinforcement learning," Foundations and Trends® in Machine Learning, vol. 6, no. 4, pp. 375–451, 2013.
[49] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems, vol. 12, 1999.
[50] R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Reinforcement Learning, pp. 5–32, 1992.
[51] I. Grondman, L. Busoniu, G. A. Lopes, and R. Babuska, "A survey of actor-critic reinforcement learning: Standard and natural policy gradients," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 6, pp. 1291–1307, 2012.
[52] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap,
environments using a3c learning and residual recurrent neural networks," IEEE Transactions on Mobile Computing, vol. 21, no. 3, pp. 940–954, 2020.
[56] M. Chen, T. Wang, K. Ota, M. Dong, M. Zhao, and A. Liu, "Intelligent resource allocation management for vehicles network: An a3c learning approach," Computer Communications, vol. 151, pp. 485–494, 2020.
[57] S. Still and D. Precup, "An information-theoretic approach to curiosity-driven reinforcement learning," Theory in Biosciences, vol. 131, pp. 139–148, 2012.
[58] Y. Burda, H. Edwards, A. Storkey, and O. Klimov, "Exploration by random network distillation," arXiv preprint arXiv:1810.12894, 2018.
[59] M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Machine Learning Proceedings 1994. Elsevier, 1994, pp. 157–163.
[60] T. Gabel, "Multi-agent reinforcement learning approaches for distributed job shop scheduling problems," Ph.D. dissertation, Osnabrück, Univ., Diss., 2009.
[61] L. Canese, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, and S. Spanò, "Multi-agent reinforcement learning: A review of challenges and applications," Applied Sciences, vol. 11, no. 11, p. 4948, 2021.
[62] T. Li, K. Zhu, N. C. Luong, D. Niyato, Q. Wu, Y. Zhang, and B. Chen, "Applications of multi-agent reinforcement learning in future internet: A comprehensive survey," IEEE Communications Surveys & Tutorials, 2022.
[63] E. Altman, Constrained Markov Decision Processes. CRC Press, 1999, vol. 7.
[64] S. Gu, L. Yang, Y. Du, G. Chen, F. Walter, J. Wang, Y. Yang, and A. Knoll, "A review of safe reinforcement learning: Methods, theory and applications," arXiv preprint arXiv:2205.10330, 2022.
[65] A. Avranas, M. Kountouris, and P. Ciblat, "Deep reinforcement learning for wireless scheduling with multiclass services," CoRR, vol. abs/2011.13634, 2020. [Online]. Available: https://arxiv.org/abs/2011.13634
[66] S. Khairy, P. Balaprakash, L. X. Cai, and Y. Cheng, "Constrained deep reinforcement learning for energy sustainable multi-uav based random access iot networks with NOMA," CoRR, vol. abs/2002.00073, 2020. [Online]. Available: https://arxiv.org/abs/2002.00073
[67] C. Sun, C. She, and C. Yang, "Optimizing ultra-reliable and low-latency communication systems with unsupervised learning," CoRR, vol. abs/2006.01641, 2020. [Online]. Available: https://arxiv.org/abs/2006.01641
[68] Constrained Unsupervised Learning for Wireless Network Optimization. Cambridge University Press, 2022, pp. 182–211.
[69] D. Wu, L. Deng, Z. Liu, Y. Zhang, and Y. S. Han, "Reinforcement learning random access for delay-constrained heterogeneous wireless networks: A two-user case," in 2021 IEEE Globecom Workshops (GC Wkshps), 2021, pp. 1–7.
[70] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[71] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017. [Online]. Available: https://dl.acm.org/doi/10.1145/
T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous 3065386
methods for deep reinforcement learning,” in International [72] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D.
conference on machine learning. PMLR, 2016, pp. 1928– Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry,
1937. A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger,
[53] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Re- T. Henighan, R. Child, A. Ramesh, D. Ziegler,
source management with deep reinforcement learning,” in J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler,
Proceedings of the 15th ACM workshop on hot topics in M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner,
networks, 2016, pp. 50–56. S. McCandlish, A. Radford, I. Sutskever, and D. Amodei,
[54] C. Zhong, Z. Lu, M. C. Gursoy, and S. Velipasalar, “A deep “Language Models are Few-Shot Learners,” in Advances in
actor-critic reinforcement learning framework for dynamic Neural Information Processing Systems, vol. 33. Curran
multichannel access,” IEEE Transactions on Cognitive Com- Associates, Inc., 2020, pp. 1877–1901. [Online]. Avail-
munications and Networking, vol. 5, no. 4, pp. 1125–1139, able: https://proceedings.neurips.cc/paper/2020/hash/
2019. 1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
[55] S. Tuli, S. Ilager, K. Ramamohanarao, and R. Buyya, [73] W. S. McCulloch and W. Pitts, “A logical calculus of
“Dynamic scheduling for stochastic edge-cloud computing the ideas immanent in nervous activity,” The bulletin of
mathematical biophysics, vol. 5, no. 4, p. 115–133, Dec 1943.
42 VOLUME 11, 2023
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3384460
Afifi et al.: A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
[74] S. Sharma, S. Sharma, and A. Athaiya, “Activation functions [91] A. Cheng, “Pac-gan: Packet generation of network traffic
in neural networks,” Towards Data Sci, vol. 6, no. 12, pp. using generative adversarial networks,” in 2019 IEEE 10th
310–316, 2017. Annual Information Technology, Electronics and Mobile
[75] K. Hornik, M. Stinchcombe, and H. White, “Multilayer Communication Conference (IEMCON), 2019, pp. 0728–
feedforward networks are universal approximators,” Neural 0734.
Networks, vol. 2, no. 5, pp. 359–366, Jan. 1989. [Online]. [92] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine
Available: https://www.sciencedirect.com/science/article/ translation by jointly learning to align and translate,” arXiv
pii/0893608089900208 preprint arXiv:1409.0473, 2014.
[76] R. Hecht-Nielsen, “Theory of the backpropagation neural [93] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones,
network,” in Neural networks for perception. Elsevier, 1992, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is
pp. 65–93. all you need,” Advances in neural information processing
[77] D. P. Kingma and J. Ba, “Adam: A method for stochastic systems, vol. 30, 2017.
optimization,” arXiv preprint arXiv:1412.6980, 2014. [94] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert:
[78] G. Lan, First-order and stochastic optimization methods for Pre-training of deep bidirectional transformers for language
machine learning. Springer, 2020, vol. 1. understanding,” arXiv preprint arXiv:1810.04805, 2018.
[79] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, [95] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn,
“Geometric Deep Learning: Grids, Groups, Graphs, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer,
Geodesics, and Gauges,” May 2021. [Online]. Available: G. Heigold, S. Gelly et al., “An image is worth 16x16 words:
http://arxiv.org/abs/2104.13478 Transformers for image recognition at scale,” arXiv preprint
[80] R. Eldan and O. Shamir, “The power of depth for feedfor- arXiv:2010.11929, 2020.
ward neural networks,” in Conference on learning theory. [96] C. Joshi, “Transformers are graph neural networks,” The
PMLR, 2016, pp. 907–940. Gradient, vol. 12, 2020.
[81] T. Gruber, S. Cammerer, J. Hoydis, and S. Ten Brink, “On [97] D. K. Kholgh and P. Kostakos, “Pac-gpt: A novel approach
deep learning-based channel decoding,” in 2017 51st annual to generating synthetic network traffic with gpt-3,” IEEE
conference on information sciences and systems (CISS). Access, 2023.
IEEE, 2017, pp. 1–6. [98] N. Ziems, G. Liu, J. Flanagan, and M. Jiang, “Explaining
[82] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. tree model decisions in natural language for network intru-
Sidiropoulos, “Learning to optimize: Training deep neural sion detection,” arXiv preprint arXiv:2310.19658, 2023.
networks for wireless resource management,” in 2017 IEEE [99] T. Ali and P. Kostakos, “Huntgpt: Integrating ma-
18th International Workshop on Signal Processing Advances chine learning-based anomaly detection and explainable
in Wireless Communications (SPAWC). IEEE, 2017, pp. ai with large language models (LLMs),” arXiv preprint
1–6. arXiv:2309.16021, 2023.
[83] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, [100] S. K. Mani, Y. Zhou, K. Hsieh, S. Segarra, T. Eberl, E. Azu-
and M. Ghogho, “Deep learning approach for network in- lai, I. Frizler, R. Chandra, and S. Kandula, “Enhancing net-
trusion detection in software defined networking,” in 2016 work management using code generated by large language
international conference on wireless networks and mobile models,” in Proceedings of the 22nd ACM Workshop on Hot
communications (WINCOM). IEEE, 2016, pp. 258–263. Topics in Networks, 2023, pp. 196–204.
[84] S. Hochreiter and J. Schmidhuber, “Long Short-Term Mem- [101] Y. Huang, H. Du, X. Zhang, D. Niyato, J. Kang, Z. Xiong,
ory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. S. Wang, and T. Huang, “Large language models for net-
1997. working: Applications, enabling techniques, and challenges,”
[85] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, arXiv preprint arXiv:2311.17474, 2023.
F. Bougares, H. Schwenk, and Y. Bengio, “Learning [102] J. Sun, Q. V. Liao, M. Muller, M. Agarwal,
Phrase Representations using RNN Encoder–Decoder for S. Houde, K. Talamadupula, and J. D. Weisz,
Statistical Machine Translation,” in Proceedings of the 2014 “Investigating explainability of generative ai for code
Conference on Empirical Methods in Natural Language through scenario-based design,” in 27th International
Processing (EMNLP). Doha, Qatar: Association for Conference on Intelligent User Interfaces, ser. IUI ’22.
Computational Linguistics, 2014, pp. 1724–1734. [Online]. New York, NY, USA: Association for Computing
Available: http://aclweb.org/anthology/D14-1179 Machinery, 2022, p. 212–228. [Online]. Available:
[86] P. Veličković, “Everything is Connected: Graph Neural https://doi.org/10.1145/3490099.3511119
Networks,” Jan. 2023. [Online]. Available: http://arxiv.org/ [103] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai,
abs/2301.08210 J. Sun, Q. Guo, M. Wang, and H. Wang, “Retrieval-
[87] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, augmented generation for large language models: A survey,”
D. Warde-Farley, S. Ozair, A. Courville, and arXiv preprint arXiv:2312.10997, 2024.
Y. Bengio, “Generative adversarial nets,” in Advances [104] Cisco, Cisco Unveils Next-Gen Solutions that Empower
in Neural Information Processing Systems, Z. Ghahramani, Security and Productivity with Generative AI, 2023.
M. Welling, C. Cortes, N. Lawrence, and K. Weinberger, [Online]. Available: https://newsroom.cisco.com/c/r/
Eds., vol. 27. Curran Associates, Inc., 2014. [Online]. Avail- newsroom/en/us/a/y2023/m06/cisco-unveils-next-gen-
able: https://proceedings.neurips.cc/paper_files/paper/ solutions-that-empower-security-and-productivity-with-
2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf generative-ai.html
[88] C. Han, H. Hayashi, L. Rundo, R. Araki, W. Shimoda, [105] Juniper, AI for IT Operations (AIOps), 2023. [On-
S. Muramatsu, Y. Furukawa, G. Mauri, and H. Nakayama, line]. Available: https://www.juniper.net/us/en/solutions/
“Gan-based synthetic brain mr image generation,” in 2018 artificial-intelligence-for-it-operations-aiops.html
IEEE 15th International Symposium on Biomedical Imaging [106] O. Santos. (2023) Securing ai: Navigating
(ISBI 2018), 2018, pp. 734–738. the complex landscape of models, fine-
[89] Y. Chen, Y. Pan, T. Yao, X. Tian, and T. Mei, “Mocycle- tuning, and rag. [Online]. Available:
gan: Unpaired video-to-video translation,” in Proceedings https://blogs.cisco.com/security/securing-ai-navigating-
of the 27th ACM international conference on multimedia, the-complex-landscape-of-models-fine-tuning-and-rag
2019, pp. 647–655. [107] I. Cisco Systems. (2023) Cisco ai assistant - cisco. [Online].
[90] J. Kong, J. Kim, and J. Bae, “Hifi-gan: Generative adversar- Available: [4](https://www.cisco.com/site/us/en/solutions/
ial networks for efficient and high fidelity speech synthesis,” artificial-intelligence/ai-assistant/index.html)
Advances in Neural Information Processing Systems, vol. 33, [108] S. Thrun and A. Schwartz, “Issues in using function ap-
pp. 17 022–17 033, 2020. proximation for reinforcement learning,” in Proceedings of
the Fourth Connectionist Models Summer School, vol. 255.
Hillsdale, NJ, 1993, p. 263.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3384460
Afifi et al.: A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
[109] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, Proceedings of the ACM Web Conference 2022, 2022, pp.
M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, 3407–3417.
G. Ostrovski et al., “Human-level control through deep [127] D. Raca, J. J. Quinlan, A. H. Zahran, and C. J. Sreenan,
reinforcement learning,” nature, vol. 518, no. 7540, pp. 529– “Beyond throughput: A 4g lte dataset with channel and con-
533, 2015. text metrics,” in Proceedings of the 9th ACM Multimedia
[110] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Systems Conference, 2018, pp. 460–465.
Y. Tassa, D. Silver, and D. Wierstra, “Continuous con- [128] S. Farthofer, M. Herlich, C. Maier, S. Pochaba, J. Lackner,
trol with deep reinforcement learning,” arXiv preprint and P. Dorfinger, “An open mobile communications drive
arXiv:1509.02971, 2015. test data set and its use for machine learning,” IEEE Open
[111] A. Ramaswamy, S. Bhatnagar, and N. Saxena, “A framework Journal of the Communications Society, vol. 3, pp. 1688–
for provably stable and consistent training of deep feedfor- 1701, 2022.
ward networks,” arXiv preprint arXiv:2305.12125, 2023. [129] J. Wu, L. Wang, Q. Pei, X. Cui, F. Liu, and T. Yang, “Hitdl:
[112] A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Kor- High-throughput deep learning inference at the hybrid mo-
jus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation bile edge,” IEEE Transactions on Parallel and Distributed
and competition with deep reinforcement learning,” PloS Systems, vol. 33, no. 12, pp. 4499–4514, 2022.
one, vol. 12, no. 4, p. e0172395, 2017. [130] D. Raca, D. Leahy, C. J. Sreenan, and J. J. Quinlan, “Beyond
[113] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and throughput, the next generation: a 5g dataset with channel
I. Mordatch, “Multi-agent actor-critic for mixed cooperative- and context metrics,” in Proceedings of the 11th ACM
competitive environments,” Advances in neural information Multimedia Systems Conference, 2020, pp. 303–308.
processing systems, vol. 30, 2017. [131] 3rd Party, “Geant/abilene network topology data and traffic
[114] A. Redder, A. Ramaswamy, and H. Karl, “3dpg: Distributed traces,” 2020. [Online]. Available: ^1^
deep deterministic policy gradient algorithms for networked [132] [Online]. Available: https://www.geni.net/
multi-agent systems,” arXiv preprint arXiv:2201.00570v2, [133] “Caida data - completed datasets,” Nov 2020. [Online]. Avail-
2022. able: https://www.caida.org/catalog/datasets/completed-
[115] C. Qiu, H. Yao, F. R. Yu, F. Xu, and C. Zhao, “Deep datasets/
q-learning aided networking, caching, and computing re- [134] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson,
sources allocation in software-defined satellite-terrestrial “Measuring isp topologies with rocketfuel,” IEEE/ACM
networks,” IEEE Transactions on Vehicular Technology, Transactions on Networking, vol. 12, no. 1, pp. 2–16, 2004.
vol. 68, no. 6, pp. 5871–5883, 2019. [135] “The internet topology zoo.” [Online]. Available: http:
[116] S. Schneider, R. Khalili, A. Manzoor, H. Qarawlus, R. Schel- //www.topology-zoo.org/dataset.html
lenberg, H. Karl, and A. Hecker, “Self-learning multi- [136] M. Roughan, “A case study of the accuracy of snmp
objective service coordination using deep reinforcement measurements,” Journal of Electrical and Computer
learning,” IEEE Transactions on Network and Service Man- Engineering, vol. 2010, p. 812979, May 2010. [Online].
agement, vol. 18, no. 3, pp. 3829–3842, 2021. Available: https://doi.org/10.1155/2010/812979
[117] A. Redder, A. Ramaswamy, and D. E. Quevedo, “Deep re- [137] J. Kua, G. Armitage, and P. Branch, “A survey of rate
inforcement learning for scheduling in large-scale networked adaptation techniques for dynamic adaptive streaming over
control systems,” IFAC-PapersOnLine, vol. 52, no. 20, pp. http,” IEEE Communications Surveys & Tutorials, vol. 19,
333–338, 2019. no. 3, pp. 1842–1866, 2017.
[118] H. Afifi, A. Ramaswamy, and H. Karl, “Reinforcement learn- [138] G. Zhou, R. Wu, M. Hu, Y. Zhou, T. Z. Fu, and D. Wu,
ing for autonomous vehicle movements in wireless sensor “Vibra: Neural adaptive streaming of vbr-encoded videos,”
networks,” in ICC 2021-IEEE International Conference on in Proceedings of the 31st ACM Workshop on Network and
Communications. IEEE, 2021, pp. 1–6. Operating Systems Support for Digital Audio and Video,
[119] B. Jang, M. Kim, G. Harerimana, and J. W. Kim, “Q- 2021, pp. 1–8.
learning algorithms: A comprehensive classification and ap- [139] Y. Yuan, W. Wang, Y. Wang, S. S. Adhatarao, B. Ren,
plications,” IEEE access, vol. 7, pp. 133 653–133 667, 2019. K. Zheng, and X. Fu, “Vsim: Improving qoe fairness for video
[120] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.- streaming in mobile environments,” in Proceedings of the
C. Liang, and D. I. Kim, “Applications of deep reinforcement IEEE International Conference on Computer Communica-
learning in communications and networking: A survey,” tions (INFOCOM). IEEE, 2022, pp. 1309–1318.
IEEE Communications Surveys & Tutorials, vol. 21, no. 4, [140] S. Lederer, C. Müller, and C. Timmerer, “Dynamic adaptive
pp. 3133–3174, 2019. streaming over http dataset,” in Proceedings of the 3rd
[121] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and Multimedia Systems Conference, 2012, pp. 89–94.
R. R. Salakhutdinov, “Improving neural networks by pre- [141] S. Lederer, C. Mueller, C. Timmerer, C. Concolato,
venting co-adaptation of feature detectors,” arXiv preprint J. Le Feuvre, and K. Fliegel, “Distributed dash dataset,” in
arXiv:1207.0580, 2012. Proceedings of the 4th ACM multimedia systems conference,
[122] H. Riiser, P. Vigmostad, C. Griwodz, and P. Halvorsen, 2013, pp. 131–135.
“Commute path bandwidth traces from 3g networks: anal- [142] J. Le Feuvre, J.-M. Thiesse, M. Parmentier, M. Raulet, and
ysis and applications,” in Proceedings of the 4th ACM C. Daguet, “Ultra high definition hevc dash data set,” in Pro-
Multimedia Systems Conference, 2013, pp. 114–118. ceedings of the 5th ACM Multimedia Systems Conference,
[123] X. Zuo, J. Yang, M. Wang, and Y. Cui, “Adaptive bitrate 2014, pp. 7–12.
with user-level qoe preference for video streaming,” in Pro- [143] A. Zabrovskiy, C. Feldmann, and C. Timmerer, “Multi-codec
ceedings of the IEEE International Conference on Computer dash dataset,” in Proceedings of the 9th ACM Multimedia
Communications (INFOCOM). IEEE, 2022, pp. 1279–1288. Systems Conference, 2018, pp. 438–443.
[124] J. Van Der Hooft, S. Petrangeli, T. Wauters, R. Huysegems, [144] A. Chandramohan, M. Poel, B. Meijerink, and G. Heijenk,
P. R. Alface, T. Bostoen, and F. De Turck, “Http/2-based “Machine learning for cooperative driving in a multi-lane
adaptive streaming of hevc video over 4g/lte networks,” highway environment,” in 2019 Wireless Days (WD), 2019,
IEEE Communications Letters, vol. 20, no. 11, pp. 2177– pp. 1–4.
2180, 2016. [145] L. N. Alegre, T. Ziemke, and A. L. C. Bazzan, “Using rein-
[125] L. Zhang, Y. Zhang, X. Wu, F. Wang, L. Cui, Z. Wang, and forcement learning to control traffic signals in a real-world
J. Liu, “Batch adaptative streaming for video analytics,” scenario: An approach based on linear function approxi-
in Proceedings of the IEEE International Conference on mation,” IEEE Transactions on Intelligent Transportation
Computer Communications (INFOCOM). IEEE, 2022, pp. Systems, vol. 23, no. 7, pp. 9126–9135, 2022.
2158–2167. [146] C. Liu, Y. Zhang, W. Chen, F. Wang, H. Li, and Y.-D.
[126] A. Alhilal, T. Braud, B. Han, and P. Hui, “Nebula: Reliable Shen, “Adaptive matching strategy for multi-target multi-
low-latency video transmission for mobile cloud gaming,” in camera tracking,” in ICASSP 2022 - 2022 IEEE Interna-
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3384460
Afifi et al.: A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
tional Conference on Acoustics, Speech and Signal Process- traffic classification,” in Proc. Int. Conf. Eng. Telecommun.,
ing (ICASSP), 2022, pp. 2934–2938. 2021, pp. 1–5.
[147] M. Maciejewski, “A comparison of microscopic traffic flow [163] G. Aceto, D. Ciuonzo, A. Montieri, V. Persico, and
simulation systems for an urban area,” Transport Problems, A. Pescapé, “Mirage: Mobile-app traffic capture and ground-
vol. 5, no. 4, pp. 29–40, 2010. truth creation,” in 2019 4th International Conference
[148] F. K. Karnadi, Z. H. Mo, and K.-c. Lan, “Rapid generation of on Computing, Communications and Security (ICCCS).
realistic mobility models for vanet,” in 2007 IEEE Wireless IEEE, 2019, pp. 1–8.
Communications and Networking Conference, 2007, pp. [164] C. Wang, A. Finamore, L. Yang, K. Fauvel, and D. Rossi,
2506–2511. “Appclassnet: A commercial-grade dataset for application
[149] M. Tsao, D. Milojevic, C. Ruch, M. Salazar, E. Frazzoli, identification research,” ACM SIGCOMM Computer Com-
and M. Pavone, “Model predictive control of ride-sharing munication Review, vol. 52, no. 3, pp. 19–27, 2022.
autonomous mobility-on-demand systems,” in 2019 Inter- [165] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and
national Conference on Robotics and Automation (ICRA), A. Hotho, “A survey of network-based intrusion detection
2019, pp. 6665–6671. data sets,” Computers & Security, vol. 86, pp. 147–167, 2019.
[150] C. M. Moyano, J. F. Ortega, and D. E. Mogrovejo, “Effi- [166] “Datasets,” 2023. [Online]. Available: https://www.unb.ca/
ciency analysis during calibration of traffic microsimulation cic/datasets/
models in conflicting intersections near universidad del [167] A. Dvir, Y. Zion, J. Muehlstein, O. Pele, C. Hajaj, and
azuay, using aimsun 8.1,” in MOVICI-MOYCOT 2018: Joint R. Dubin, “Robust machine learning for encrypted traffic
Conference for Urban Mobility in the Smart City, 2018, pp. classification,” arXiv preprint arXiv:1603.04865, 2016.
1–6. [168] R. Poorzare and O. P. Waldhorst, “Toward the implemen-
[151] L. Yang and W. Lan, “On secondary development of ptv- tation of mptcp over mmwave 5g and beyond: Analysis,
vissim for traffic optimization,” in 2018 13th International challenges, and solutions,” IEEE Access, feb 2023.
Conference on Computer Science & Education (ICCSE), [169] R. Poorzare and A. Calveras Augé, “Challenges on the way
2018, pp. 1–5. of implementing tcp over 5g networks,” IEEE Access, sep
[152] L. Lu, T. Yun, L. Li, Y. Su, and D. Yao, “A comparison of 2020.
phase transitions produced by paramics, transmodeler, and [170] T. Henderson, M. Lacage, G. Riley, C. Dowell, and
vissim,” IEEE Intelligent Transportation Systems Magazine, J. Kopena, “Network simulations with the ns-3 simulator,”
vol. 2, no. 3, pp. 19–24, 2010. SIGCOMM Demonstration, vol. 14, p. 527, 2008.
[153] Z. Tang, M. Naphade, M.-Y. Liu, X. Yang, S. Birchfield, [171] M. Mezzavilla, M. Zhang, M. Polese, R. Ford, S. Dutta,
S. Wang, R. Kumar, D. Anastasiu, and J.-N. Hwang, S. Rangan, and Z. M, “End-to-end simulation of 5g mmwave
“Cityflow: A city-scale benchmark for multi-target multi- networks,” IEEE Communications Surveys & Tutorials,
camera vehicle tracking and re-identification,” in 2019 vol. 20, no. 3, pp. 2237–2263, 2018.
IEEE/CVF Conference on Computer Vision and Pattern [172] P. Gawłowicz and Zubow, “ns3-gym: Extending ope-
Recognition (CVPR), 2019, pp. 8789–8798. nai gym for networking research,” [Online]. Available:
[154] Z. Wang, B. Li, and B. Liang, “Quick: Quality-of-service im- arXiv:1810.03943.
provement with cooperative relaying and network coding,” [173] H. Yin, P. Liu, K. Liu, L. Cao, L. Zhang, Y. Gao, and X. Hei,
in 2010 IEEE International Conference on Communications. “Ns3-ai: Fostering artificial intelligence algorithms for
IEEE, 2010, pp. 1–5. networking research,” in Proceedings of the 2020 Workshop
[155] T. Mangla, E. Halepovic, M. Ammar, and E. Zegura, on Ns-3, ser. WNS3 ’20. New York, NY, USA: Association
“emimic: Estimating http-based video qoe metrics from for Computing Machinery, 2020, p. 57–64. [Online].
encrypted network traffic,” in 2018 Network Traffic Mea- Available: https://doi.org/10.1145/3389400.3389404
surement and Analysis Conference (TMA). IEEE, 2018, [174] M. Schettler, D. S. Buse, A. Zubow, and F. Dressler, “How to
pp. 1–8. train your its? integrating machine learning with vehicular
[156] C. Gutterman, K. Guo, S. Arora, X. Wang, L. Wu, E. Katz- network simulation,” in 2020 IEEE Vehicular Networking
Bassett, and G. Zussman, “Requet: Real-time qoe detection Conference (VNC), 2020, pp. 1–4.
for encrypted youtube traffic,” in Proceedings of the 10th [175] F. Ruffy, M. Przystupa, and I. Beschastnikh, “Iroko: A
ACM Multimedia Systems Conference, 2019, pp. 48–59. framework to prototype reinforcement learning for data
[157] M. Seufert, P. Casas, N. Wehner, L. Gang, and K. Li, center traffic control,” CoRR, vol. abs/1812.09975, 2018.
“Stream-based machine learning for real-time qoe analysis [Online]. Available: http://arxiv.org/abs/1812.09975
of encrypted video streaming traffic,” in 2019 22nd Con- [176] J. Charlier, A. Singh, G. Ormazabal, R. State, and
ference on innovation in clouds, internet and networks and H. Schulzrinne, “Syngan: Towards generating synthetic net-
workshops (ICIN). IEEE, 2019, pp. 76–81. work attacks using gans,” CoRR, vol. abs/1908.09899, 2019.
[158] N. Wehner, M. Ring, J. Schüler, A. Hotho, T. Hoßfeld, [177] M. Ring, D. Schlör, D. Landes, and A. Hotho, “Flow-based
and M. Seufert, “On learning hierarchical embeddings from network traffic generation using generative adversarial
encrypted network traffic,” in NOMS 2022-2022 IEEE/IFIP networks,” Comput. Secur., vol. 82, no. C, p. 156–172,
Network Operations and Management Symposium. IEEE, may 2019. [Online]. Available: https://doi.org/10.1016/j.
2022, pp. 1–7. cose.2018.12.012
[159] K. Dietz, M. Mühlhauser, M. Seufert, N. Gray, T. Hoßfeld, [178] A. Mozo, Á. González-Prieto, A. Pastor, S. Gómez-Canaval,
and D. Herrmann, “Browser fingerprinting: How to protect and E. Talavera, “Synthetic flow-based cryptomining
machine learning models and data with differential privacy?” attack generation through generative adversarial networks,”
Electronic Communications of the EASST, vol. 80, 2021. Scientific Reports, vol. 12, no. 1, p. 2091, Feb 2022. [Online].
[160] N. Wehner, M. Seufert, J. Schüler, P. Casas, and T. Hoßfeld, Available: https://doi.org/10.1038/s41598-022-06057-2
“How are your apps doing? qoe inference and analysis in [179] Y. Guo, G. Xiong, Z. Li, J. Shi, M. Cui, and G. Gou, “Com-
mobile devices,” in 2021 17th International Conference on bating imbalance in network traffic classification using gan
Network and Service Management (CNSM). IEEE, 2021, based oversampling,” in 2021 IFIP Networking Conference
pp. 49–55. (IFIP Networking), 2021, pp. 1–9.
[161] A. Azab, M. Khasawneh, S. Alrabaee, K.-K. R. Choo, [180] T. J. Anande and M. S. Leeson, “Generative adversarial
and M. Sarsour, “Network traffic classification: Techniques, networks (gans): A survey on network traffic generation,”
datasets, and challenges,” Digital Communications and Net- International Journal of Machine Learning and Computing,
works, 2022. vol. 12, no. 6, 2022.
[162] D. Shamsimukhametov, M. Liubogoshchev, E. Khorov, and [181] M. Rigaki and S. Garcia, “Bringing a gan to a knife-fight:
I. Akyildiz, “Youtube netflix web dataset for encrypted Adapting malware communication to avoid detection,” in
2018 IEEE Security and Privacy Workshops (SPW), 2018,
pp. 70–75.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3384460
Afifi et al.: A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
[182] C. Zhang, X. Ouyang, and P. Patras, “Zipnet-gan: Inferring [201] Q. Wang, J. Xiong, L. Han, H. Liu, T. Zhang et al., “Expo-
fine-grained mobile traffic patterns via a generative nentially weighted imitation learning for batched historical
adversarial neural network,” CoRR, vol. abs/1711.02413, data,” Advances in Neural Information Processing Systems,
2017. [Online]. Available: http://arxiv.org/abs/1711.02413 vol. 31, 2018.
[183] B. Dowoo, Y. Jung, and C. Choi, “Pcapgan: Packet cap- [202] A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative
ture file generator by style-based generative adversarial q-learning for offline reinforcement learning,” Advances in
networks,” in 2019 18th IEEE International Conference On Neural Information Processing Systems, vol. 33, pp. 1179–
Machine Learning And Applications (ICMLA), 2019, pp. 1191, 2020.
1149–1154. [203] Z. Wang, A. Novikov, K. Zolna, J. S. Merel, J. T. Sprin-
[184] L. Engstrom, A. Ilyas, S. Santurkar, D. Tsipras, F. Janoos, genberg, S. E. Reed, B. Shahriari, N. Siegel, C. Gulcehre,
L. Rudolph, and A. Madry, “Implementation matters in N. Heess et al., “Critic regularized regression,” Advances in
deep rl: A case study on ppo and trpo,” in International Neural Information Processing Systems, vol. 33, pp. 7768–
Conference on Learning Representations, 2020. 7778, 2020.
[185] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, [204] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi, “Dream to
and N. Dormann, “Stable-baselines3: Reliable reinforcement control: Learning behaviors by latent imagination,” arXiv
learning implementations,” The Journal of Machine Learn- preprint arXiv:1912.01603, 2019.
ing Research, vol. 22, no. 1, pp. 12 348–12 355, 2021. [205] L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih,
[186] S. Huang, R. F. J. Dossa, C. Ye, J. Braga, D. Chakraborty, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning
K. Mehta, and J. G. AraÚjo, “Cleanrl: High-quality single- et al., “Impala: Scalable distributed deep-rl with importance
file implementations of deep reinforcement learning algo- weighted actor-learner architectures,” in International con-
rithms,” Journal of Machine Learning Research, vol. 23, no. ference on machine learning. PMLR, 2018, pp. 1407–1416.
274, pp. 1–18, 2022. [206] S. Kapturowski, G. Ostrovski, J. Quan, R. Munos, and
[187] E. Liang, R. Liaw, P. Moritz, R. Nishihara, R. Fox, W. Dabney, “Recurrent experience replay in distributed rein-
K. Goldberg, J. E. Gonzalez, M. I. Jordan, and I. Stoica, forcement learning,” in International conference on learning
“Rllib: Abstractions for distributed reinforcement learning,” representations, 2019.
2017. [Online]. Available: https://arxiv.org/abs/1712.09381 [207] M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostro-
[188] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, vski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver,
P. Welinder, B. McGrew, J. Tobin, O. Pieter Abbeel, and “Rainbow: Combining improvements in deep reinforcement
W. Zaremba, “Hindsight experience replay,” Advances in learning,” in Proceedings of the AAAI conference on artifi-
neural information processing systems, vol. 30, 2017. cial intelligence, vol. 32, no. 1, 2018.
[189] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and [208] E. Ie, V. Jain, J. Wang, S. Narvekar, R. Agarwal, R. Wu, H.-
O. Klimov, “Proximal policy optimization algorithms,” T. Cheng, T. Chandra, and C. Boutilier, “Slateq: a tractable
arXiv preprint arXiv:1707.06347, 2017. decomposition for reinforcement learning with recommen-
[190] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, dation sets,” in Proceedings of the 28th International Joint
J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel et al., “Soft Conference on Artificial Intelligence, 2019, pp. 2592–2599.
actor-critic algorithms and applications,” arXiv preprint [209] E. Wijmans, A. Kadian, A. Morcos, S. Lee, I. Essa,
arXiv:1812.05905, 2018. D. Parikh, M. Savva, and D. Batra, “Dd-ppo: Learning near-
[191] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function perfect pointgoal navigators from 2.5 billion frames,” arXiv
approximation error in actor-critic methods,” in Interna- preprint arXiv:1911.00357, 2019.
tional conference on machine learning. PMLR, 2018, pp. [210] TensorFlow, “Tensorboard: A unified platform for visual-
1587–1596. izing live, rich data for tensorflow models,” in The IEEE
[192] H. Mania, A. Guy, and B. Recht, “Simple random search Conference on Computer Vision and Pattern Recognition
provides a competitive approach to reinforcement learning,” (CVPR) Workshops, 2016.
arXiv preprint arXiv:1803.07055, 2018. [211] L. Biewald, “Experiment tracking with weights and
[193] W. Dabney, M. Rowland, M. Bellemare, and R. Munos, biases,” 2020, software available from wandb.com. [Online].
“Distributional reinforcement learning with quantile regres- Available: https://www.wandb.com/
sion,” in Proceedings of the AAAI Conference on Artificial [212] Comet.ml, “Comet.ml: Machine learning operations plat-
Intelligence, vol. 32, no. 1, 2018. form,” https://www.comet.ml/, 2018.
[194] S. Huang, R. F. J. Dossa, A. Raffin, A. Kanervisto, and [213] A. Chen, A. Chow, A. Davidson, A. DCunha, A. Ghodsi,
W. Wang, “The 37 implementation details of proximal policy S. A. Hong, A. Konwinski, C. Mewald, S. Murching,
optimization,” The ICLR Blog Track 2023, 2022. T. Nykodym, P. Ogilvie, M. Parkhe, A. Singh, F. Xie,
[195] A. Kuznetsov, P. Shvechikov, A. Grishin, and D. Vetrov, M. Zaharia, R. Zang, J. Zheng, and C. Zumar,
“Controlling overestimation bias with truncated mixture of “Developments in mlflow: A system to accelerate the
continuous distributional quantile critics,” in International machine learning lifecycle,” in Proceedings of the Fourth
Conference on Machine Learning. PMLR, 2020, pp. 5556– International Workshop on Data Management for End-to-
5566. End Machine Learning, ser. DEEM’20. New York, NY,
[196] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, USA: Association for Computing Machinery, 2020. [Online].
“Trust region policy optimization,” in International confer- Available: https://doi.org/10.1145/3399579.3399867
ence on machine learning. PMLR, 2015, pp. 1889–1897. [214] I. Habibie, M. Kleinsorge, Z. Al-Ars, J. Schneider,
[197] S. Huang and S. Ontañón, “A closer look at invalid ac- W. Kessler, and T. Kuhlen, “Visdom: A Tool for Visual-
tion masking in policy gradient algorithms,” arXiv preprint ization and Monitoring of Machine Learning Experiments,”
arXiv:2006.14171, 2020. in ArXiv e-prints, Mar. 2017.
[198] M. G. Bellemare, W. Dabney, and R. Munos, “A distribu- [215] Microsoft, “Tensorwatch,” https://github.com/microsoft/
tional perspective on reinforcement learning,” in Interna- tensorwatch, 2021.
tional conference on machine learning. PMLR, 2017, pp. [216] M. R. Asia, “Nni (neural network intelligence): An
449–458. open-source automl toolkit for neural architecture search
[199] K. W. Cobbe, J. Hilton, O. Klimov, and J. Schulman, and hyper-parameter tuning,” GitHub repository, 2021.
“Phasic policy gradient,” in International Conference on [Online]. Available: https://github.com/microsoft/nni
Machine Learning. PMLR, 2021, pp. 2020–2027. [217] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama,
[200] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, “Optuna: A next-generation hyperparameter optimization
A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel framework,” in Proceedings of the 25rd ACM SIGKDD
et al., “Mastering chess and shogi by self-play with a International Conference on Knowledge Discovery and Data
general reinforcement learning algorithm,” arXiv preprint Mining, 2019.
arXiv:1712.01815, 2017.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3384460
Afifi et al.: A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
[218] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, [233] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson,
and I. Stoica, “Tune: A research platform for distributed M. Wawrzoniak, and M. Bowman, “Planetlab: An overlay
model selection and training,” 2018. [Online]. Available: testbed for broad-coverage services,” SIGCOMM Comput.
https://arxiv.org/abs/1807.05118 Commun. Rev., vol. 33, no. 3, p. 3–12, jul 2003. [Online].
[219] M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, Available: https://doi.org/10.1145/956993.956995
J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, [234] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad,
K. Simonyan, C. Fernando, and K. Kavukcuoglu, M. Newbold, M. Hibler, C. Barb, and A. Joglekar,
“Population based training of neural networks,” 2017. “An integrated experimental environment for distributed
[Online]. Available: https://arxiv.org/abs/1711.09846 systems and networks,” SIGOPS Oper. Syst. Rev.,
[220] J. Bergstra, D. Yamins, and D. Cox, “Making a science of vol. 36, no. SI, p. 255–270, dec 2003. [Online]. Available:
model search: Hyperparameter optimization in hundreds of https://doi.org/10.1145/844128.844152
dimensions for vision architectures,” in Proceedings of the [235] M. Berman, J. S. Chase, L. H. Landweber, A. Nakao,
30th International Conference on Machine Learning, ser. M. Ott, D. Raychaudhuri, R. Ricci, and I. Seskar, “Geni:
Proceedings of Machine Learning Research, S. Dasgupta and A federated testbed for innovative network experiments,”
D. McAllester, Eds., vol. 28, no. 1. Atlanta, Georgia, USA: Comput. Networks, vol. 61, pp. 5–23, 2014.
PMLR, 17–19 Jun 2013, pp. 115–123. [Online]. Available: [236] L. Yang, F. Wen, J. Cao, and Z. Wang, “Edgetb: A hy-
https://proceedings.mlr.press/v28/bergstra13.html brid testbed for distributed machine learning at the edge
[221] R. Ostrovskiy and A. Gordon, “Keras tuner,” GitHub with high fidelity,” IEEE Transactions on Parallel and
repository, 2020. [Online]. Available: https://github.com/ Distributed Systems, vol. 33, no. 10, pp. 2540–2553, 2022.
keras-team/keras-tuner [237] F. Hussain, R. Hussain, and E. Hossain, “Explainable arti-
[222] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, ficial intelligence (xai): An engineering perspective,” arXiv
“Algorithms for hyper-parameter optimization,” in Advances preprint arXiv:2101.03613, 2021.
in Neural Information Processing Systems, 2013, pp. 2546– [238] S. Mukherjee, J. Rupe, and J. Zhu, “Xai for communication
2554. [Online]. Available: http://papers.nips.cc/paper/ networks,” in 2022 IEEE International Symposium on Soft-
4443-algorithms-for-hyper-parameter-optimization ware Reliability Engineering Workshops (ISSREW). IEEE,
[223] M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, 2022, pp. 359–364.
M. Blum, and F. Hutter, “Efficient and robust automated [239] C. Liaskos, S. Nie, A. Tsioliaridou, A. Pitsillides, S. Ioan-
machine learning,” in Advances in Neural Information nidis, and I. Akyildiz, “End-to-end wireless path deploy-
Processing Systems, 2015, pp. 2962–2970. [Online]. ment with intelligent surfaces using interpretable neural
Available: http://papers.nips.cc/paper/5872-efficient-and- networks,” IEEE Transactions on Communications, vol. 68,
robust-automated-machine-learning no. 11, pp. 6792–6806, 2020.
[224] M. Merenda, C. Porcaro, and D. Iero, “Edge machine learn- [240] A.-D. Marcu, S. K. G. Peesapati, J. M. Cortes, S. Imtiaz,
ing for ai-enabled iot devices: A review,” Sensors, vol. 20, and J. Gross, “Explainable artificial intelligence for energy-
no. 9, p. 2533, 2020. efficient radio resource management,” in 2023 IEEE Wire-
[225] A.-S. Tonneau, N. Mitton, and J. Vandaele, “A survey on less Communications and Networking Conference (WCNC).
(mobile) wireless sensor network experimentation testbeds,” IEEE, 2023, pp. 1–6.
in 2014 IEEE International Conference on Distributed Com- [241] P. Barnard, I. Macaluso, N. Marchetti, and L. A. DaSilva,
puting in Sensor Systems. IEEE, 2014, pp. 263–268. “Resource reservation in sliced networks: an explainable
[226] M. Chernyshev, Z. Baig, O. Bello, and S. Zeadally, “Internet artificial intelligence (xai) approach,” in ICC 2022-IEEE
of things (iot): Research, simulators, and testbeds,” IEEE International Conference on Communications. IEEE, 2022,
Internet of Things Journal, vol. 5, no. 3, pp. 1637–1647, pp. 1530–1535.
2017. [242] A. Palaios, C. L. Vielhaus, D. F. Külzer, C. Watermann,
[227] S. Zhu, S. Yang, X. Gou, Y. Xu, T. Zhang, and Y. Wan, “Sur- R. Hernangomez, S. Partani, P. Geuer, A. Krause, R. Sat-
vey of testing methods and testbed development concerning tiraju, M. Kasparick, G. Fettweis, F. H. P. Fitzek, H. D.
internet of things,” Wireless Personal Communications, pp. Schotten, and S. Stanczak, “The story of qos prediction in
1–30, 2022. vehicular communication: From radio environment statistics
[228] R. Lim, F. Ferrari, M. Zimmerling, C. Walser, P. Sommer, to network-access throughput prediction,” 2023.
and J. Beutel, “Flocklab: A testbed for distributed, synchro- [243] S. Hariharan, A. Velicheti, A. Anagha, C. Thomas, and
nized tracing and profiling of wireless embedded systems,” N. Balakrishnan, “Explainable artificial intelligence in cy-
in Proceedings of the 12th international conference on bersecurity: A brief review,” in 2021 4th International Con-
Information processing in sensor networks, 2013, pp. 153– ference on Security and Privacy (ISEA-ISAP). IEEE, 2021,
166. pp. 1–12.
[229] R. Trüb, R. Da Forno, L. Daschinger, A. Biri, J. Beutel, and [244] N. Capuano, G. Fenza, V. Loia, and C. Stanzione, “Explain-
L. Thiele, “Non-intrusive distributed tracing of wireless iot able artificial intelligence in cybersecurity: A survey,” IEEE
devices with the flocklab 2 testbed,” ACM Transactions on Access, vol. 10, pp. 93 575–93 600, 2022.
Internet of Things, vol. 3, no. 1, pp. 1–31, 2021. [245] C. Molnar, Interpretable Machine Learning, 2nd ed.,
[230] C. Adjih, E. Baccelli, E. Fleury, G. Harter, N. Mitton, 2022. [Online]. Available: https://christophm.github.io/
T. Noel, R. Pissard-Gibollet, F. Saint-Marcel, G. Schreiner, interpretable-ml-book
J. Vandaele et al., “Fit iot-lab: A large scale open experimen- [246] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot,
tal iot testbed,” in 2015 IEEE 2nd World Forum on Internet S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina,
of Things (WF-IoT). IEEE, 2015, pp. 459–464. R. Benjamins et al., “Explainable artificial intelligence (xai):
[231] M. Schuß, C. A. Boano, M. Weber, and K. Römer, “A Concepts, taxonomies, opportunities and challenges toward
competition to push the dependability of low-power wireless responsible ai,” Information fusion, 2020.
protocols to the edge,” in Proceedings of the 14th Inter- [247] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep in-
national Conference on Embedded Wireless Systems and side convolutional networks: Visualising image classification
Networks (EWSN). Junction Publishing, Feb. 2017, pp. models and saliency maps,” arXiv preprint arXiv:1312.6034,
54–65. 2013.
[232] D. Molteni, G. P. Picco, M. Trobinger, and [248] C. Rudin, “Stop explaining black box machine learning mod-
D. Vecchia, “Cloves: A large-scale ultra-wideband els for high stakes decisions and use interpretable models
testbed,” in Proceedings of the 20th ACM Conference instead,” Nature machine intelligence, vol. 1, no. 5, pp. 206–
on Embedded Networked Sensor Systems, ser. SenSys 215, 2019.
’22. New York, NY, USA: Association for Computing [249] T. Shapira and Y. Shavitt, “Flowpic: Encrypted internet
Machinery, 2023, p. 808–809. [Online]. Available: traffic classification is as easy as image recognition,” in IEEE
https://doi.org/10.1145/3560905.3568072 INFOCOM 2019-IEEE Conference on Computer Communi-
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3384460
Afifi et al.: A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
cations Workshops (INFOCOM WKSHPS). IEEE, 2019, C. Araya, S. Yan et al., “Captum: A unified and generic
pp. 680–687. model interpretability library for pytorch,” arXiv preprint
[250] S. M. Lundberg and S.-I. Lee, “A unified approach to inter- arXiv:2009.07896, 2020.
preting model predictions,” Advances in neural information [268] “TorchRay,” 2019. [Online]. Available: https://github.com/
processing systems, vol. 30, 2017. facebookresearch/TorchRay
[251] M. T. Ribeiro, S. Singh, and C. Guestrin, “" why should [269] “TF-Explain,” 2019. [Online]. Available: https://tf-explain.
i trust you?" explaining the predictions of any classifier,” readthedocs.io/en/latest/index.html
in Proceedings of the 22nd ACM SIGKDD international [270] C. Zhang, P. Patras, and H. Haddadi, “Deep learning in
conference on knowledge discovery and data mining, 2016, mobile and wireless networking: A survey,” IEEE Commu-
pp. 1135–1144. nications Surveys & Tutorials, vol. 21, no. 3, pp. 2224–2287,
[252] H. Nori, S. Jenkins, P. Koch, and R. Caruana, “Interpretml: 2019.
A unified framework for machine learning interpretability,” [271] H. Hellström, J. M. B. d. Silva Jr, V. Fodor, and C. Fis-
arXiv preprint arXiv:1909.09223, 2019. chione, “Wireless for machine learning,” arXiv preprint
[253] R. Agarwal, L. Melnick, N. Frosst, X. Zhang, B. Lengerich, arXiv:2008.13492, 2020.
R. Caruana, and G. E. Hinton, “Neural additive models: [272] D. Jin, Z. Yu, P. Jiao, S. Pan, D. He, J. Wu, P. Yu, and
Interpretable machine learning with neural nets,” Advances W. Zhang, “A survey of community detection approaches:
in Neural Information Processing Systems, vol. 34, pp. 4699– From statistical modeling to deep learning,” IEEE Transac-
4711, 2021. tions on Knowledge and Data Engineering, 2021.
[254] N. Wehner, A. Seufert, T. Hoßfeld, and M. Seufert, “Ex- [273] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and
plainable Data-Driven QoE Modelling with XAI,” in 2022 M. Sun, “Graph neural networks: A review of methods and
15th International Conference on Quality of Multimedia applications,” CoRR, vol. abs/1812.08434, 2018. [Online].
Experience (QoMEX), 2023, pp. 1–6. Available: http://arxiv.org/abs/1812.08434
[255] K. Brunnström, S. A. Beker, K. De Moor, A. Dooms, [274] M. A. Ridwan, N. A. M. Radzi, F. Abdullah, and Y. E. Jalil,
S. Egger, M.-N. Garcia, T. Hossfeld, S. Jumisko-Pyykkö, “Applications of machine learning in networking: A survey
C. Keimel, M.-C. Larabi et al., “Qualinet white paper on of current issues and future challenges,” IEEE Access, vol. 9,
definitions of quality of experience,” 2013. pp. 52 523–52 556, 2021.
[256] N. Wehner, M. Seufert, J. Schuler, S. Wassermann, P. Casas, [275] F. Tang, B. Mao, N. Kato, and G. Gui, “Comprehensive
and T. Hossfeld, “Improving web qoe monitoring for en- survey on machine learning in vehicular network: Technol-
crypted network traffic through time series modeling,” ACM ogy, applications and challenges,” IEEE Communications
SIGMETRICS Performance Evaluation Review, vol. 48, Surveys & Tutorials, vol. 23, no. 3, pp. 2027–2057, 2021.
no. 4, pp. 37–40, 2021. [276] E. García-Martín, C. F. Rodrigues, G. Riley, and H. Grahn,
[257] E. Hüllermeier and W. Waegeman, “Aleatoric and epistemic “Estimation of energy consumption in machine learning,”
uncertainty in machine learning: An introduction to con- Journal of Parallel and Distributed Computing, vol. 134, pp.
cepts and methods,” Machine Learning, vol. 110, pp. 457– 75–88, 2019. [Online]. Available: https://www.sciencedirect.
506, 2021. com/science/article/pii/S0743731518308773
[258] A. F. Psaros, X. Meng, Z. Zou, L. Guo, and G. E. Kar- [277] L. Song, X. Hu, G. Zhang, P. Spachos, K. N. Plataniotis, and
niadakis, “Uncertainty quantification in scientific machine H. Wu, “Networking systems of ai: On the convergence of
learning: Methods, metrics, and comparisons,” Journal of computing and communications,” IEEE Internet of Things
Computational Physics, vol. 477, p. 111902, 2023. Journal, vol. 9, no. 20, pp. 20 352–20 381, 2022.
[259] A. Kendall and Y. Gal, “What uncertainties do we need in [278] G. Drainakis, K. V. Katsaros, P. Pantazopoulos, V. Sourlas,
bayesian deep learning for computer vision?” Advances in and A. Amditis, “Federated vs. centralized machine learning
neural information processing systems, vol. 30, 2017. under privacy-elastic users: A comparative analysis,” in 2020
[260] V. Dignum, Responsible artificial intelligence: how to de- IEEE 19th International Symposium on Network Comput-
velop and use AI in a responsible way. Springer, 2019, vol. ing and Applications (NCA). IEEE, 2020, pp. 1–8.
2156. [279] I. A. Majeed, S. Kaushik, A. Bardhan, V. S. K. Tadi, H.-
[261] Y. Siriwardhana, P. Porambage, M. Liyanage, and K. Min, K. Kumaraguru, and R. D. Muni, “Comparative
M. Ylianttila, “Ai and 6g security: Opportunities and chal- assessment of federated and centralized machine learning,”
lenges,” in 2021 Joint European Conference on Networks arXiv preprint arXiv:2202.01529, 2022.
and Communications & 6G Summit (EuCNC/6G Summit). [280] W. Hassan, T.-S. Chou, O. Tamer, J. Pickard, P. Appiah-
IEEE, 2021, pp. 616–621. Kubi, and L. Pagliari, “Cloud computing survey on services,
[262] Q. Lu, L. Zhu, X. Xu, J. Whittle, D. Zowghi, and enhancements and challenges in the era of machine learning
A. Jacquet, “Responsible ai pattern catalogue: A collection and data science,” International Journal of Informatics and
of best practices for ai governance and engineering,” ACM Communication Technology (IJ-ICT), vol. 9, no. 2, pp. 117–
Comput. Surv., oct 2023, just Accepted. [Online]. Available: 139, 2020.
https://doi.org/10.1145/3626234 [281] Y. Ko, K. Choi, H. Jei, D. Lee, and S.-W. Kim, “Aladdin:
[263] W. Yang, H. Le, S. Savarese, and S. C. Hoi, “Omnixai: A Asymmetric centralized training for distributed deep learn-
library for explainable ai,” arXiv preprint arXiv:2206.01612, ing,” in Proceedings of the 30th ACM International Confer-
2022. ence on Information & Knowledge Management, 2021, pp.
[264] V. Arya, R. K. Bellamy, P.-Y. Chen, A. Dhurandhar, 863–872.
M. Hind, S. C. Hoffman, S. Houde, Q. V. Liao, R. Luss, [282] X. Wang, Y. Han, V. C. Leung, D. Niyato, X. Yan,
A. Mojsilović et al., “Ai explainability 360 toolkit,” in and X. Chen, “Convergence of edge computing and deep
Proceedings of the 3rd ACM India Joint International learning: A comprehensive survey,” IEEE Communications
Conference on Data Science & Management of Data (8th Surveys & Tutorials, vol. 22, no. 2, pp. 869–904, 2020.
ACM IKDD CODS & 26th COMAD), 2021, pp. 376–379. [283] F. Samie, L. Bauer, and J. Henkel, “From cloud down
[265] J. Klaise, A. Van Looveren, G. Vacanti, and A. Coca, to things: An overview of machine learning in internet of
“Alibi explain: Algorithms for explaining machine learning things,” IEEE Internet of Things Journal, vol. 6, no. 3, pp.
models,” The Journal of Machine Learning Research, vol. 22, 4921–4934, 2019.
no. 1, pp. 8194–8200, 2021. [284] W. Toussaint and A. Y. Ding, “Machine learning systems
[266] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attri- in the iot: Trustworthiness trade-offs for edge intelligence,”
bution for deep networks,” in International conference on in 2020 IEEE Second International Conference on Cognitive
machine learning. PMLR, 2017, pp. 3319–3328. Machine Intelligence (CogMI). IEEE, 2020, pp. 177–184.
[267] N. Kokhlikyan, V. Miglani, M. Martin, E. Wang, [285] A. Smola and S. Narayanamurthy, “An architecture for par-
B. Alsallakh, J. Reynolds, A. Melnikov, N. Kliushkina, allel topic models,” Proceedings VLDB Endowment, vol. 3,
no. 1-2, pp. 703–710, Sep. 2010.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3384460
Afifi et al.: A Primer in Machine Learning with Computer Networks: Techniques, Datasets and Models
HAITHAM AFIFI obtained his B.Sc. degree in Information Engineering and Technology in 2014 and his M.Sc. degree in Communication Engineering in 2015 from the German University in Cairo, and received his PhD from the Hasso Plattner Institute in 2023. His research interests include wireless network virtualization, reinforcement learning, and network optimization. He also has industry experience as a network engineer at Orange Business Services and as an IT consultant integrating generative AI into network operations.
SABRINA POCHABA completed her Master's degree in mathematics at Ruprecht-Karls-University in Heidelberg, Germany. Since 2021 she has been working as a data scientist at Salzburg Research Forschungsgesellschaft while pursuing her doctoral studies at Paris-Lodron University Salzburg, where she works on different machine learning methods with a focus on networks and communication.

REZA POORZARE (Member, IEEE) received the B.S. and M.S. degrees in computer engineering from the Azad University of Iran, in 2010 and 2014, respectively, and the Ph.D. degree in network engineering from Universitat Politècnica de Catalunya, Barcelona, Spain, in 2022. He is currently a Postdoctoral Researcher with the Data-Centric Software Systems (DSS) Research Group, Institute of Applied Research, Karlsruhe University of Applied Sciences, Karlsruhe, Germany. His research interests include 5G, mmWave, wireless mobile networks, TCP, MPTCP, congestion control, and artificial intelligence.
ERIC SAMIKWA received the M.Sc. degree in computer science and engineering from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 2020. He is currently pursuing the Ph.D. degree with the Communication and Distributed Systems Group, Institute of Computer Science, University of Bern, Bern, Switzerland. His research interests are in the areas of distributed machine learning, federated learning, split learning, edge computing, and Internet of Things.

MICHAEL SEUFERT is a Full Professor at the University of Augsburg, Germany, heading the Chair of Networked Embedded Systems and Communication Systems. He received the Bachelor's degree in economathematics and the Diploma, PhD, and Habilitation degrees in computer science from the University of Würzburg, Germany, and holds the First State Examination degree in mathematics, computer science, and education for teaching in secondary schools. His research focuses on user-centric communication networks, including QoE of Internet applications, AI/ML for QoE-aware network management, as well as performance modeling and evaluation of communication systems.