Smart User Consumption Profiling: Incremental Learning-Based OTT Service Degradation
All content following this page was uploaded by Juan Sebastián Rojas Meléndez on 27 November 2020.
ABSTRACT Data caps and service degradation are techniques used to control subscribers’ data consumption. These techniques have emerged mainly due to the growing demands placed on the networking stack created by the continuous increase in the number of connected users and their feature-rich, bandwidth-heavy Over-the-Top (OTT) applications. In the mobile network’s scope, where traditional operators offer user data plans with limited resources, service degradation is a standard mechanism used to throttle consumption. Limiting user data usage helps to utilize resources better and to ensure the network’s reliable performance. Nevertheless, this degradation is applied in a generalized way, affecting all user applications without considering behavior. In this paper, we propose a reference model aiming to address this constraint. Specifically, we attempt to personalize service degradation policies by providing a guideline for users’ OTT consumption behavior classification based on Incremental Learning (IL). We evaluated our model’s viability in a case study by investigating the efficacy of several IL algorithms on a dataset containing real-world users’ OTT application consumption behavior. The algorithms include Naive Bayes (NB), K-Nearest Neighbor (KNN), Adaptive Random Forest (ARF), Leverage Bagging (LB), Oza Bagging (OB), Learn++, and Multilayer Perceptron (MLP). The obtained results show that ARF and a composition between LB and ARF achieve the best performance yielding a classification precision and recall of over 90%. Based on the obtained results, we propose service degradation policies to support decision making in mission-critical systems. We argue the strong applicability of our model in real-world scenarios, especially in user consumption profiling. Our reference model offers a conceptual basis for the tasks that need to be performed when defining personalized service degradation policies in current and future networks like 5G. To the best of our knowledge, this work is the first effort in this matter.
INDEX TERMS Over-the-top application, classification, incremental learning, service degradation, decision
making.
I. INTRODUCTION
Data caps and monthly traffic limits are techniques used to control subscribers’ data usage [1], [2]. They have emerged mainly due to the growing demands placed on the networking stack created by the continuous increase in connected users, new applications, and services. Historically, these techniques have been used by Internet Service Providers (ISP). Today, they are more commonly seen among mobile operators [3]. Australia, Canada, South Africa, the UK, and the US are just a few countries to name where data caps are regularly used to assist mobile operators [4].
A typical application of data caps is service degradation policies. Traditional mobile network operators offer user data plans with consumption limits. In such networks, service degradation is a standard mechanism used to limit the amount of information that can be transferred by the users over a specific period [2], [5]–[7]. This mechanism is usually applied when a user exceeds the predefined consumption limit to save resources and ensure the high performance of the network.
The associate editor coordinating the review of this manuscript and approving it for publication was Shagufta Henna.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
207426 VOLUME 8, 2020
J. S. Rojas et al.: Smart User Consumption Profiling: Incremental Learning-Based OTT Service Degradation
Nevertheless, this degradation is applied in a generalized way, affecting the performance of all user applications.
Service degradation is heavily based on traffic analysis, especially in the investigation of Over-the-Top (OTT) media and communications services. OTT refers to applications that deliver audio, video, and other media over the Internet by leveraging the infrastructure deployed by network operators but without their involvement in its management or distribution of the content [8]. These applications are well-known for their large consumption of network resources to offer their different functionalities, generating a massive impact on the network’s operation and management. While the implementation of service degradation policies seems straightforward, the analysis of users’ consumption behavior is rarely considered. However, by including such analysis, operators can make well-founded decisions to maintain network performance while affecting the users’ perception of the service provisioning as little as possible.
Machine Learning (ML) is regularly used in users’ consumption behavior analysis [9]–[13]. However, most of the models used are based on traditional batch learning, which is well-known for several limitations. (i) Depending on the algorithm and the dataset’s size, the training phase might take a considerable amount of time and demand high computational resources. (ii) When training with a small dataset with no representative number of samples, the algorithm might perform poorly. (iii) Once an algorithm has been trained and implemented, it cannot acquire new knowledge from new samples. Consequently, if there is a change in the data’s statistical properties (i.e., a concept drift), a new model must be generated.
Incremental Learning (IL) constitutes an alternative to traditional ML. It is built under the premise of continuous learning and adaptation. Its main advantage is that it does not require previous perception data. Therefore, it appears to be a suitable solution to address the OTT service degradation limitations where consumption behaviors can change over time in a non-deterministic manner.
This paper proposes a reference model that can serve as a guideline for users’ OTT consumption behavior classification based on IL. We evaluated our model’s viability in a case study by investigating the efficacy of several IL algorithms on a dataset containing real-world users’ OTT application consumption behavior. We rigorously assessed the NB, KNN, ARF, LB, OB, Learn++, and MLP algorithms. The obtained results show that ARF and a composition between LB and ARF achieve the best performance yielding a classification precision and recall of over 90%. Subsequently, we proposed service degradation policies based on the users’ consumption classification and the 5G technical specification [14]. Our reference model offers a conceptual basis for smart personalization of service degradation policies.
The rest of this paper is organized as follows. Section 2 presents the related work in the area of user profiling. In Section 3, we describe our reference model explaining its actors, components, and workflow. Then, Section 4 presents the case study focused on the proposal of personalized service degradation policies. This section includes the experimental scenarios and the analyses that were obtained through the implementation of the reference model. Finally, in Section 5, we conclude this paper and sketch future work.
II. RELATED WORK
We performed a systematic mapping [15] of existing works to identify related research. In this mapping, we analyzed recent works published between 2015 and 2020 in four scientific databases: Scopus, ScienceDirect, IEEE Xplore, and Google Scholar. Figure 1 shows the obtained map with six different research contexts and six research types. The contexts were defined by us, while the types are specified by the mapping methodology itself (note, we did not modify the predefined types). A general observation (also evident from Figure 1) is that most works belong to the social network domain and are of the solution proposal type. No evaluation research, philosophical, or experience works were identified. Hence, neither new proposals nor papers on experiences gained and lessons learned were included in this study.
FIGURE 1. Systematic map – User profiling.
The primary focus of interest in this study is network traffic flow measurement and analysis. The obtained knowledge is used for smart user consumption profiling and service degradation policy development. Therefore, we investigated those works that were published in the domain that includes this area: information and telecommunication technologies. Additionally, selected works related to social networks and mobile services were also included in our list, as we found some of their conclusions informative. In what follows, we briefly discuss these related works.
Vinupaul et al. [16] utilized network flow analysis as a viable method for user identification. They introduced the concept of flow-bundle-level features to characterize and identify users. Four different ML models were validated using a dataset of flow records that contained 65 users. The authors achieved an accuracy of 83% when identifying users using the random forest algorithm.
Hess et al. [17] defined a profiling model that characterizes the user behavior and temporal dynamics. The spatial
locations during an hour showed that roughly 80% of users visit six cells daily. Also, 22% of users did not revisit a site, and around 30% revisited the same sites between 6 and 10 days. Finally, holiday users showed higher upload/download rates. The knowledge of such regularity in mobility can help predict high traffic loads and achieve better network management.
Rajashekar et al. [18] studied the extent to which phone behaviors can be associated with users. To achieve this, they used a multilayer perceptron configured to act as an autoencoder. They applied a self-organizing map (SOM) to unveil unique characterizations of user behavior. The application, cell tower, and website usage logs were used to generate user behavior profiles and isolate a single user for a specific device based on the behavior.
Bakhshandeh et al. [19] showed that identifying the users based on their behavior instead of IP addresses is vital since dynamic address assignment can easily circumvent security mechanisms. The authors proposed an efficient user identification method that is based on NetFlow traffic flows. This method can extract a set of features from the network flows and use a random forest model to classify users. Their model achieved a precision of 94% in user identification. The results showed that such a method is suitable for forensic science. It does not require examining the total traffic (header with payload) for user identification and characterization.
Zhao et al. [20] investigated the options for inferring users’ gender based on application snapshots installed on mobile devices. Their main goal was to make smartphone systems more user-friendly and provide better personalized services and products. They observed that it is possible to infer gender from usage habits. They also identified significant differences between application types, functionalities, and icon designs through an empirical study. Their prediction model achieved 76.62% accuracy.
Paul et al. [21] characterized the network of verified users on Twitter. Their analysis extracted verified users on Twitter and obtained 231,246 user profiles and 79,213,811 connections. Their analysis confirmed that the sub-graph of verified users mirrors the full Twitter user graph in some aspects, such as possessing a short diameter. They also observed interesting differences, such as the possession of a power-law out-degree distribution, slight disassortativity, and a significantly higher reciprocity rate. Finally, the authors observed stationarity in the time series of verified user activity levels that differs from traditional Twitter users.
In summary, when compared with our approach, related works discussed above are aimed at user profiling in a different context. Accordingly, they fall short of addressing the classification domain of OTT user consumption behavior. None of them provides a reference model proposal similar to ours. Nor do they aim to obtain an overview of the users’ OTT consumption behavior to support personalized service degradation policies. Furthermore, the proposed models in related research are based on traditional ML. Lastly, related works do not offer a guideline on how to perform user profiling either.
III. REFERENCE MODEL FOR IL-BASED USER OTT CONSUMPTION CLASSIFICATION
This section presents our reference model that is depicted in Figure 2. The description of its main components, actors, and workflow is presented as follows.
A. ACTORS
We identify four actors with the following roles:
• A network user is any person that has access to a device capable of connecting to the network (e.g., smartphone, laptop, and tablet). Their main activity is network traffic generation through the consumption of OTT applications or any other activity that can be performed through the Internet.
• The network expert represents the person or staff who knows about network architecture and devices’ technical activities. This actor can execute network devices’ configuration, gather and summarize network information, and configure environments for network experiments.
• The data analyst is in charge of developing all the activities related to data analysis and ML models. Those activities include data cleaning and preprocessing, data transformation, clustering and pattern recognition, training, selection, and deployment of ML models.
• Finally, the network analyst is the person or group of people in charge of the decision-making process related to network maintenance, network policies application, resource allocation, and business strategies. This actor can leverage the data analyst’s valuable information and the IL model to make strategic decisions. In some cases, depending on the available staff, the network analyst and network expert’s activities can be handled by the same person or group. In our case study, the decision-making process is related to the proposal of personalized service degradation policies.
B. COMPONENTS
We also define three macro components. These are depicted in Figure 2. Except for the network user, each actor is responsible for one specific component. The description of the components is presented as follows.
• Data Gathering & Flow Generation comprise all the processes that must be performed by the network expert. Typically, the network expert holds all the knowledge related to network equipment and architecture. These processes (packet persistence and flow generation) involve all the activities related to configuring network devices, gathering and storing IP packets, packet aggregation for flow generation, and application labeling.
• Data Preprocessing and Model Selection comprises all the processes related to the data analyst’s skills and
knowledge. All raw data preprocessing is carried out in this component. It includes flow cleaning, user consumption estimation, clustering and pattern recognition, and data cleaning. Furthermore, the data analyst is also responsible for the IL algorithms’ comparison and evaluation to obtain the best model that maintains its performance consistency and knowledge retention.
• Finally, the Decision-Making involves the processes that the network analyst must perform based on their knowledge and the user information obtained from the IL model. This way, the network analyst can make better decisions related to network resource allocation, QoS policies, service pricing, and service offering. In our case study, decision-making is used for proposing service degradation policies based on users’ classification according to their OTT consumption behavior.
C. WORKFLOW
IP Packets are the fundamental elements that describe the communications established by network users. Therefore, packet persistence is the first step in our workflow. Packet capture is usually achieved by a device that observes all the packets being transferred in the network. Once the packets are captured, they are stored in a specific format known as packet capture (PCAP) files. Depending on the traffic, the size of PCAPs can grow to several hundreds of GB. An infrastructure capable of storing large amounts of data is therefore highly recommended.
In the second step, flow generation is performed. Network flows are a statistical representation of the communications generated by user devices. They carry information related to the amount of transferred data and the communication duration, among many others. Flows are obtained by aggregating packets [22]. Aggregation is performed based on a 5-tuple, namely the source and destination IP addresses and port numbers, and the protocol identifier. Two tasks are required to be carried out in this context (steps 2a and 2b):
• Network statistics calculation: as packets belonging to traffic flows are observed, a set of statistics can be calculated to obtain a statistical representation of the communication. Common network statistics include the packet sizes, number of packets, and inter-arrival times.
• Application labeling: to obtain an overview of network users’ consumption behavior, it is necessary to know the applications that generate the flows. Consequently, all the network flows are labeled with the respective application name that is being consumed. The most common approach for flow labeling is Deep Packet Inspection (DPI) [23]. Through this approach, the payload of network packets is inspected to obtain the application-specific information.
The resulting flow records with their respective application labels are typically stored in a CSV file. However, other
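The 5-tuple aggregation and the statistics of step 2a described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the packet representation (a dict with the keys shown in the docstring) and the function name aggregate_flows are assumptions made for illustration, and flows are treated as unidirectional for simplicity.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets by 5-tuple and compute basic per-flow statistics.

    Each packet is assumed to be a dict with the keys
    src_ip, dst_ip, src_port, dst_port, proto, size (bytes), ts (seconds).
    """
    groups = defaultdict(list)
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["proto"])
        groups[key].append(p)

    flows = {}
    for key, pkts in groups.items():
        pkts.sort(key=lambda p: p["ts"])
        # Inter-arrival times between consecutive packets of the same flow.
        gaps = [b["ts"] - a["ts"] for a, b in zip(pkts, pkts[1:])]
        flows[key] = {
            "packets": len(pkts),
            "bytes": sum(p["size"] for p in pkts),
            "duration": pkts[-1]["ts"] - pkts[0]["ts"],
            "mean_iat": sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return flows
```

Application labeling (step 2b) is deliberately left out of the sketch, since it depends on a DPI engine rather than on the 5-tuple alone.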
Our architecture shown in Figure 4 aims to extend the network with a knowledge plane by gathering information relevant to IL-based users’ OTT consumption behavior classification. With such an extension, we aim to support the network administrators in defining personalized service degradation policies. The description of its components is as follows:
• Application Plane: this plane contains the network users that connect through different kinds of devices and consume the OTT applications.
• Data & Control Plane: includes all the devices, network topology, exchanged data, and functional configuration set up by the network administrator.
• Knowledge Plane: contains the elements that offer valuable knowledge about the network to the administrator. The main elements include:
  (i) The capture server configured to gather the network traffic containing the information about OTT consumption behavior;
  (ii) Data cleaning & preprocessing where the raw data is processed and analyzed;
  (iii) The users’ consumption profiles dataset containing OTT consumption behavior data;
  (iv) The IL classification model used for 1) classifying the users into their corresponding consumption profiles, and 2) administrator support during the definition of personalized service degradation policies.
• Management Plane: represents the plane where the network administrator makes all the decisions to define the service degradation policies based on the users’ behavior analysis.
B. GROUND TRUTH DEVELOPMENT
Following Steps 1-6 of our reference model’s workflow (see Section 3.C), we systematically developed a ground truth. The tasks associated with these steps are summarized as follows.
Step 1: packet persistence. We started with capturing packet traces in the network of Universidad del Cauca, Colombia. The packets were captured using Wireshark on a server configured as a mirror port of a core network switch over ten days in 2019. All the packets were stored in PCAP files whose total size was roughly 3 TB.
Step 2: flow generation. Several applications can organize IP packets into flows while obtaining flow statistics. However, the majority of the widely recognized tools cannot perform layer seven (application) labeling. To overcome this challenge, we developed our own application, named Flow Labeler [26], which performs network statistics calculation and application labeling. Flow Labeler is based on the FlowRecorder [27] and NFStream [28] network traffic flow metering tools. It follows the architectural principles of the traditional network traffic measurement paradigm [22]. Using Flow Labeler, we processed all the captured PCAP files. Eventually, we obtained a dataset containing 2,704,839 records with 50 attributes and 141 application labels. The features measured by Flow Labeler are shown in Table 1.
Step 3: flow cleaning. During flow cleaning, we first discarded the flow records that were not generated by user devices. This was achieved by a filter that retained only those flows that belonged to a specific range of IP addresses holding user devices. Then a selection process was applied to preserve only the most representative OTT applications. The original dataset contained flows generated by 141 different applications. After the selection process, flows belonging to 56 OTT applications were retained. These flows were generated by popular applications such as Facebook, Netflix, Spotify, Twitter, Dropbox, WhatsApp, and YouTube.
Step 4: user consumption estimation. After flow cleaning, we proceeded with user consumption estimation.
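The two filters of Step 3 can be sketched as follows. This is an illustrative example only: the record layout (dicts with src_ip and app keys), the function name clean_flows, and the address range used in the usage note are hypothetical, and the sketch assumes that a user device is identified by the flow's source address.

```python
import ipaddress


def clean_flows(flows, user_network, kept_apps):
    """Keep flows that originate in the user address range and belong to a retained application.

    flows        -- iterable of dicts with at least "src_ip" and "app" keys (assumed layout)
    user_network -- CIDR string of the range holding user devices (hypothetical value)
    kept_apps    -- set of application labels preserved by the selection process
    """
    net = ipaddress.ip_network(user_network)
    return [
        f for f in flows
        if ipaddress.ip_address(f["src_ip"]) in net and f["app"] in kept_apps
    ]
```

For example, clean_flows(records, "10.20.0.0/16", {"Netflix", "YouTube"}) would retain only Netflix and YouTube flows whose source address lies in the hypothetical 10.20.0.0/16 range.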
occupations. However, considering the clusters’ behavior, this case could be considered as an outlier in the data.
Step 6: data preprocessing. In data preprocessing, we focused on two goals: (i) to remove anomalies and (ii) to improve the class balance. To remove anomalies, we applied the interquartile range (IQR) [32] approach to detect and remove the outliers in the dataset. To improve the class balance, we used the SMOTE algorithm [33]. As a result, we obtained a dataset with 1,249 instances (users). The classes were distributed based on their consumption, leading to the following statistics:
• 510 low consumption users,
• 333 medium consumption users, and
• 406 high consumption users.
C. EVALUATION OF IL ALGORITHMS
KDN is a commonly used paradigm in network traffic management. One of its disadvantages is poor efficiency, stemming mainly from its use of traditional ML.
However, the OTT applications market is highly volatile and unpredictable (e.g., a wide variety of applications, constant new trends, and technologies). These constraints must be addressed through an approach that can adapt to the dynamic aspects of the network traffic and users’ behavior changes. That being said, our goal was to compare the IL algorithms’ classification performance in the domain of user OTT consumption behavior. To achieve this, we examined several IL algorithms’ performance using the dataset described in Section 4.B.
1) OVERVIEW OF THE ALGORITHMS
The most commonly used IL algorithms [34]–[39] include decision trees, rule-based systems, NB, KNN, SVM, and neural networks. Other common alternatives are ensemble methods such as OB, ARF, LB, and Learn++. We evaluated the Python scikit-multiflow [40] implementation of these algorithms. The main characteristics of the algorithms are shown in Table 4 as per [41]. A brief description of each algorithm follows.
The Hoeffding Tree (HT) or Very Fast Decision Tree (VFDT) [42] is an incremental decision tree induction algorithm. It exploits the fact that a small sample of data can often be enough to choose an optimal splitting attribute for the construction of the decision tree. This idea is supported mathematically by the Hoeffding bound, which gives a level of confidence that allows choosing the best attribute to split the tree [43].
The NB algorithm’s incremental adaptation is a classifier known for its simplicity and low computational cost. This algorithm classifies by applying Bayes’ theorem under a strong (naive) assumption: all the input data features are independent of each other.
The KNN algorithm, in an incremental setting, works by keeping track of a fixed number of training samples of the last window of observed samples. Then, whenever new input data is received, the algorithm searches within these stored samples and finds the closest neighbors using a selected distance metric (e.g., Euclidean distance). To keep search times low, scikit-multiflow stores the samples in a structure called a K Dimensional Tree.
OB [44] is an incremental ensemble learning method that adapts the traditional Bagging from the batch setting. It simulates the training phase by using each arriving sample to train the base estimator k times, where k is drawn from a binomial distribution. This method can be considered a good substitute for the drawing with replacement implemented in traditional batch learning.
LB [45] is based on OB. It tries to obtain better results by modifying the Poisson distribution parameters obtained from the binomial distribution when assuming an infinite input data stream. This Poisson distribution is used to perform the drawing with replacement to reduce zero values in the distribution’s probability mass function. This is achieved by increasing the value of λ from 1 to 6.
ARF [46] is an adaptation of the traditional random forest algorithm applied to the incremental learning scope. ARF generates multiple decision trees and decides how to classify the input data through a weighted voting system. Within the voting system, the individual tree with the best performance (in terms of accuracy or the Kappa statistic) has a more substantial weight in the votes, i.e., higher priority in the decision.
Learn++ [47] is an ensemble method inspired by the Adaptive Boosting (AdaBoost) algorithm. It was originally developed to improve the classification performance of weak classifiers. In essence, Learn++ generates an ensemble of weak classifiers, each trained using a different distribution of training samples. The outputs of these classifiers are then combined using a majority-voting scheme to obtain the final classification.
The MLP algorithm [48], [49] is a neural network with three or more layers. It is used to classify data that cannot be separated linearly. It is a type of artificial neural network that is fully connected. Every node in a layer is connected to each node in the following layer. It uses a nonlinear activation function (mainly hyperbolic tangent or logistic function).
Several conclusions can be drawn from Table 4. All algorithms have advantages and disadvantages that must be taken into consideration during selection. Neural Networks, for example, can handle noise but need a considerable amount of data for training to offer good performance. Besides, their implementation lacks a systematic approach, making it mostly empirical. Lazy methods like KNN require an appropriate number of neighbors to be determined. However, once this is obtained, they usually offer good performance on both regression and classification tasks. Decision trees are easy to interpret but are quite sensitive when there is a large number of attributes to handle. SVMs have strong mathematical foundations, but their training and hyperplane calculations can take a considerable amount of time when there is a large
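The effect of raising λ from 1 to 6 in the Poisson-based resampling described above for LB can be checked numerically. The following is a small worked example, not code from scikit-multiflow; the helper name poisson_pmf is introduced here for illustration only.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Probability that an arriving sample receives weight 0, i.e. is skipped
# by a given base estimator, for the two values of lambda named in the text.
p_skip_lambda_1 = poisson_pmf(0, 1)  # equals exp(-1), roughly 0.37
p_skip_lambda_6 = poisson_pmf(0, 6)  # equals exp(-6), roughly 0.0025
```

With λ = 1, each base estimator ignores more than a third of the arriving samples, whereas with λ = 6 almost every sample is used at least once, at the price of more training updates per sample.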
2) EVALUATION OF CLASSIFICATION ACCURACY
Considering the available evaluation approaches for IL [50], we first focused on analyzing the algorithms’ accuracy and learning rate when performing classification. Each algorithm’s performance in terms of averaged success or error rate can be analyzed through accuracy measurements. These measurements include precision, recall, and the Kappa statistic [51], [52]. As the dataset does not hold many instances, the storage and computational requirements are not considered an evaluation metric for the current scenarios. As shown in Figure 8, a set of prequential evaluations was carried out for each algorithm, changing the warm-up data sample size between 10% (125 instances), 15% (187 instances), 20% (250 instances), and 50% (625 instances) of the original dataset.
The grace period (i.e., the number of instances a leaf should wait before doing a split) for the decision trees (HT and ARF) was set to 100 instances. The split criterion was based on the information gain. Furthermore, ten estimators were selected for the ensemble methods (OB, LB, Learn++, and ARF) [53]. Finally, a series of tests were carried out to obtain the best results for MLP.
FIGURE 8. Configuration for the precision, recall and Kappa statistics evaluation.
Figure 9 depicts the obtained results for the algorithms tested in terms of precision, recall, and the Kappa statistic with a warm-up of 10%. In comparison, Figure 10 shows the achieved results with a warm-up of 15%. Similarly, Figures 11 and 12 show the obtained results for warm-ups with samples of 20% and 50% of the original dataset.
From the results, several conclusions can be derived. In general, in all tests, we observed similar performance behaviors. We noticed that the NB, HT, and KNN algorithms and the ensemble methods used as base estimators in the composition achieved the worst performance.
For NB, the obtained performance could be due to its (naive) assumption of statistical independence between the input data attributes. Since the time and data consumption of the same OTT applications are related, assuming all the attributes are independent could increase the error rate throughout the classification process. For the classification process, the KNN algorithm maintains a small window of previously seen instances (previous perception data). It looks
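The prequential (test-then-train) scheme with a warm-up sample, as used in the evaluation above, can be sketched in a few lines. This is a simplified illustration and not the scikit-multiflow evaluator: the MajorityClass learner is a toy stand-in for the IL algorithms, and all names are assumptions. The helper warmup_size reproduces the warm-up sizes quoted in the text for the 1,249-instance dataset using round-half-up.

```python
from collections import Counter


def warmup_size(n_instances, percent):
    """Round-half-up share of the dataset used for the warm-up (integer percent)."""
    return (n_instances * percent + 50) // 100


class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


def prequential_accuracy(stream, model, n_warmup):
    """Each sample after the warm-up is first used for testing, then for training."""
    correct = tested = 0
    for i, (x, y) in enumerate(stream):
        if i >= n_warmup:
            correct += int(model.predict(x) == y)
            tested += 1
        model.partial_fit(x, y)
    return correct / tested if tested else 0.0


sizes = [warmup_size(1249, p) for p in (10, 15, 20, 50)]  # [125, 187, 250, 625]
```

Because every sample is scored before the model learns from it, the resulting metric reflects performance on unseen data without a separate hold-out set.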
3) EVALUATION OF CLASSIFICATION PERFORMANCE MAINTENANCE
We assessed how well the algorithms determined in the previous section maintain their classification performance. We also assessed how their learning rates are affected when a change occurs in users’ OTT consumption behavior. First, the algorithms were pre-trained using the entire dataset. Then, we investigated their performance maintenance capabilities through a prequential evaluation by systematically modifying the behaviors of 60 users. This was to simulate a real-world implementation use case where user behavior changes are likely to happen over time. The introduced changes are as follows:
• 20 Low Consumption users had their behavior changed (10 users to Medium Consumption and 10 to High Consumption),
• 20 Medium Consumption users had their behavior changed (10 users to Low Consumption and 10 to High Consumption), and
• 20 High Consumption users had their behavior changed (10 users to Low Consumption and 10 users to Medium Consumption).
Figure 13 depicts the configuration for this evaluation. The obtained results are shown in Figure 14, while each algorithm’s precision evolution is depicted in Figure 15. From the figures, we can observe that all three algorithms presented an excellent overall performance. However, as Figure 16 shows, MLP is affected considerably by the changes introduced in the users’ behavior at the beginning of the data stream. While the results eventually improve, the adaptation to the changes is slow. On the other hand, both ARF and LB with ARF can maintain their performance throughout the experiment, as shown in Figure 16.
FIGURE 14. Achieved results for the precision, recall, and Kappa statistics with changing samples.
FIGURE 15. Precision evolution.
Considering the above, we can conclude that both ARF and the composition between LB and ARF are suitable for performing user OTT application consumption behavior classification. More importantly, these algorithms can provide useful information, even in the event of dynamic user behavior change. Network administrators could make fair use of such an approach. IL can make
D. DECISION MAKING
Our case study’s final step was the applicability evaluation of
our reference model in a specific scenario. For this purpose,
we have selected decision-making as a process focused on
the recommendation of a set of personalized service degra-
dation policies. The policies were developed based on the
users’ OTT consumption behavior classified using IL. Since
service degradation is mostly implemented in mobile networks,
the recommendation was developed within the scope of the Policy
and Charging Control (PCC) architecture. Specifically,
the concept of a 5G network PCC rule [14] was taken as the
basis for service degradation.
The elements considered in the structure of the PCC rule
are presented in Table 5. PCC rules can be of two types:
predefined or dynamic. Considering the nature of service
degradation, we proposed a dynamic PCC rule since it is
activated by a specified event (e.g., a user exceeds the con-
sumption limit).
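The event-driven activation of a dynamic rule can be sketched in a few lines of Python. This is only an illustration of the trigger logic, not the PCC rule structure of Table 5 or of the 3GPP specification: the class names `DynamicRule` and `UsageMonitor`, their fields, and the byte values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DynamicRule:
    """Illustrative stand-in for a dynamic PCC rule (hypothetical fields)."""
    rule_id: str
    active: bool = False


@dataclass
class UsageMonitor:
    """Accumulates a user's volume and activates the rule past the limit."""
    limit_bytes: int
    rule: DynamicRule
    used_bytes: int = 0

    def report(self, n_bytes: int) -> bool:
        """Add usage; return True only for the report that activates the rule."""
        self.used_bytes += n_bytes
        if not self.rule.active and self.used_bytes > self.limit_bytes:
            self.rule.active = True
            return True
        return False


rule = DynamicRule("degrade-user-42")
monitor = UsageMonitor(limit_bytes=1_000, rule=rule)
events = [monitor.report(n) for n in (400, 500, 200, 300)]
print(events)  # [False, False, True, False]
```

The rule stays inactive until the cumulative volume passes the limit, which mirrors the "specified event" in the text; a predefined rule, by contrast, would be configured ahead of time rather than triggered this way.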
In the 3GPP technical specification [14], the ARP is
defined by the 5QI parameter. It logically follows that if a
QoS flow has a GBR (5QI 1 to 4) or a non-GBR (5QI 5 to 9),
there are only two possibilities for defining a service degradation policy (see Table 5). Either (i) degrade the service by affecting the Maximum Bit Rate (MBR) associated with the SDF or (ii) block the service by modifying the Gate status parameter.
Considering that 56 OTT applications were identified in the dataset, we focused on identifying which applications were most commonly consumed in each user group. With this in mind, the 20 most used applications were identified in terms of time and data occupation. Figures 16-18 illustrate the 20 applications most commonly used in terms of time occupation for low, medium, and high consumption users, respectively. Similarly, Figures 19-21 illustrate the 20 most used applications in terms of data occupation for the three groups. Essentially, a larger time occupation does not imply higher data occupation. Therefore, our recommendation considers that data occupation is the main indicator for determining how much the users’ consumption demands network resources.
Figures 16-21 led to the following observations. Generally, the applications that demanded the most data occupation from highest to lowest were HTTP (browsing), GoogleDocs, Youtube, Gmail, Google, GoogleDrive, and AmazonVideo (7 applications). The Google application was used most of the time by the low consumption users, with 10.26 hours (see Figure 16). For the medium consumption users, Google remained the application where users spent the most time with 77.19 hours (see Figure 17). High consumption users exhibited a large amount of time spent on Google with 18.52 hours, as shown in Figure 18. Low consumption users exhibited their largest data occupation on HTTP and GoogleDocs (see Figure 19). Figure 20 shows that applications that required the most network resources were Youtube, GoogleDocs, Gmail, GoogleDrive, GoogleHangoutDuo, Spotify, Twitter, and AmazonVideo (8 applications). In terms of data occupation (see Figure 21), these users consumed the most resources using Youtube, GoogleDrive, GoogleHangoutDuo, Amazon, Facebook, Google, WhatsApp, and HTTP (8 applications).
FIGURE 18. Time occupation (sec) – High consumption users.
FIGURE 19. Data occupation (bytes) – Low consumption users.
Considering the above observations, when a user exceeds their allowed consumption limit, we recommend degrading the most commonly used applications for all user groups. In other words, when a service degradation status is identified, the MBR parameter is modified for the most commonly used applications, while the rest are blocked by closing the Gate Status. This way, the user can keep using the applications that are consumed most frequently. Simultaneously, the network operator can save network resources by blocking the remaining applications, which the user accesses less frequently. To incorporate fairness in our proposal, we also take into consideration the consumption profiles of the users that exceed the limits. Consequently, the personalized service degradation policies for each group are defined as follows:
• Low consumption group – HTTP, GoogleDocs, Youtube, Gmail, Google, GoogleDrive, and AmazonVideo will be degraded while the rest will be blocked.
• Medium consumption group – Youtube, GoogleDocs, Gmail, GoogleDrive, GoogleHangoutDuo, Spotify, Twitter, and AmazonVideo will be degraded while the rest will be blocked.
• High consumption group – Youtube, GoogleDrive, GoogleHangoutDuo, Amazon, Facebook, Google, WhatsApp, and HTTP will be degraded while the rest will be blocked.
E. SUMMARY
This case study was aimed at demonstrating the applicability of our reference model when assisting network operators in critical decision-making processes. The network administrators execute decisions based on the model that guides them through several tasks and processes. Following the model, they eventually achieve the ultimate goal: maintaining user experience and reliable network performance. The defined representation model helped to comprehend users’ OTT consumption behavior and ensure that data preprocessing requirements are met. The decision-making capabilities of the architecture were shown to be efficient. The capabilities and applicability of our approach extend far beyond this case study. Several components, tasks, and processes can be customized and modified to suit the network, services, and user needs. As a result, our model is highly flexible, adaptable, and replicable.
ACKNOWLEDGMENT
The authors would like to thank Universidad del Cauca for supporting this research.
REFERENCES
[1] J. K. MacKie-Mason and H. R. Varian, “Some FAQs about usage-based pricing,” Comput. Netw. ISDN Syst., vol. 28, no. 1, pp. 257–265, Dec. 1995, doi: 10.1016/0169-7552(95)00096-1.
[2] J. Krämer and L. Wiewiorra, “Data caps and two-sided pricing: Evaluating managed service business models,” presented at the Eur. Conf. Inf. Syst. (ECIS), Tel Aviv, Israel, 2014. [Online]. Available: https://aisel.aisnet.org/ecis2014/proceedings/track10/10/
[3] J. Wortham, “As mobile networks speed up, data gets capped,” New York Times, Aug. 14, 2011.
[4] M. Lasar. (Apr. 4, 2011). It could be worse: Data caps around the world. Ars Technica. [Online]. Available: https://arstechnica.com/tech-policy/news/2011/04/how-internet-users-are-disciplined-around-the-world.ars
[5] M. Chetty, R. Banks, A. J. Brush, J. Donner, and R. Grinter, “You’re capped: Understanding the effects of bandwidth caps on broadband use in the home,” in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., New York, NY, USA, 2012, pp. 3021–3030, doi: 10.1145/2207676.2208714.
[6] M. Chetty, H. Kim, S. Sundaresan, S. Burnett, N. Feamster, and W. K. Edwards, “uCap: An Internet data management tool for the home,” in Proc. 33rd Annu. ACM Conf. Hum. Factors Comput. Syst., 2015, pp. 3093–3102, doi: 10.1145/2702123.2702218.
[7] Ixia. (2013). Quality of Service (QoS) and Policy Management in Mobile Data Networks. [Online]. Available: https://www.celemetrix.co.nz/assets/images/Ixia%20QoS.pdf
[8] T. Sudtasan and H. Mitomo, “Effects of OTT services on consumer’s willingness to pay for optical fiber broadband connection in Thailand,” presented at the 27th Eur. Regional Conf. Int. Telecommun. Soc. (ITS), Evol. North-South Telecommun. Divide, Role Europe, Cambridge, U.K., 2016. [Online]. Available: https://www.econstor.eu/handle/10419/148709
[9] S. Ponmaniraj, R. Rashmi, and M. V. Anand, “IDS based network security architecture with TCP/IP parameters using machine learning,” in Proc. Int. Conf. Comput., Power Commun. Technol. (GUCON), Sep. 2018, pp. 111–114, doi: 10.1109/GUCON.2018.8674974.
[10] L. Zhang, X. Meng, and H. Zhou, “Network fault diagnosis using hierarchical SVMs based on kernel method,” in Proc. 2nd Int. Workshop Knowl. Discovery Data Mining, Jan. 2009, pp. 753–756, doi: 10.1109/WKDD.2009.79.
[11] S. Troia, D. E. Martinez, I. Martin, L. M. M. Zorello, G. Maier, J. A. Hernandez, O. G. de Dios, M. Garrich, J. L. Romero-Gazquez, F.-J. Moreno-Muro, P. P. Marino, and R. Casellas, “Machine learning-assisted planning and provisioning for SDN/NFV-enabled metropolitan networks,” in Proc. Eur. Conf. Netw. Commun. (EuCNC), Jun. 2019, pp. 438–442, doi: 10.1109/EuCNC.2019.8801956.
[12] M. Shafiq, X. Yu, A. A. Laghari, L. Yao, N. K. Karn, and F. Abdessamia, “Network traffic classification techniques and comparative analysis using machine learning algorithms,” in Proc. 2nd IEEE Int. Conf. Comput. Commun. (ICCC), Oct. 2016, pp. 2451–2455, doi: 10.1109/CompComm.2016.7925139.
[13] J. S. Rojas, Á. R. Gallón, and J. C. Corrales, “Personalized service degradation policies on OTT applications based on the consumption behavior of users,” in Computational Science and Its Applications - ICCSA (Lecture Notes in Computer Science), vol. 10962. Cham, Switzerland: Springer, 2018, doi: 10.1007/978-3-319-95168-3_37.
[14] Policy and Charging Control Framework for the 5G System (5GS), document 123.503, 3GPP Technical Specification (TS), Sep. 2020. [Online]. Available: https://www.etsi.org/deliver/etsi_ts/123500_123599/123503/16.05.01_60/ts_123503v160501p.pdf
[15] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, “Systematic mapping studies in software engineering,” in Proc. 12th Int. Conf. Eval. Assessment Softw. Eng., Swinton, U.K., 2008, pp. 68–77. [Online]. Available: http://dl.acm.org/citation.cfm?id=2227115.2227123
[16] M. V. Vinupaul, R. Bhattacharjee, R. Rajesh, and G. S. Kumar, “User characterization through network flow analysis,” in Proc. Int. Conf. Data Sci. Eng. (ICDSE), Aug. 2016, pp. 1–6, doi: 10.1109/ICDSE.2016.7823965.
[17] A. Hess, I. Marsh, and D. Gillblad, “Exploring communication and mobility behavior of 3G network users and its temporal consistency,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2015, pp. 5916–5921, doi: 10.1109/ICC.2015.7249265.
[18] D. Rajashekar, A. N. Zincir-Heywood, and M. I. Heywood, “Smart phone user behaviour characterization based on autoencoders and self organizing maps,” in Proc. IEEE 16th Int. Conf. Data Mining Workshops (ICDMW), Dec. 2016, pp. 319–326, doi: 10.1109/ICDMW.2016.0052.
[19] A. Bakhshandeh and Z. Eskandari, “An efficient user identification approach based on netflow analysis,” in Proc. 15th Int. ISC, Iranian Soc. Cryptol., Conf. Inf. Secur. Cryptol. (ISCISC), Aug. 2018, pp. 1–5, doi: 10.1109/ISCISC.2018.8546856.
[20] S. Zhao, Y. Xu, X. Ma, Z. Jiang, Z. Luo, S. Li, L. T. Yang, A. Dey, and G. Pan, “Gender profiling from a single snapshot of apps installed on a smartphone: An empirical study,” IEEE Trans. Ind. Informat., vol. 16, no. 2, pp. 1330–1342, Feb. 2020, doi: 10.1109/TII.2019.2938248.
[21] I. Paul, A. Khattar, P. Kumaraguru, M. Gupta, and S. Chopra, “Elites tweet? Characterizing the Twitter verified user network,” Mar. 2019, arXiv:1812.09710. Accessed: Jul. 16, 2020. [Online]. Available: http://arxiv.org/abs/1812.09710
[22] R. Hofstede, P. Celeda, B. Trammell, I. Drago, R. Sadre, A. Sperotto, and A. Pras, “Flow monitoring explained: From packet capture to data analysis with NetFlow and IPFIX,” IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 2037–2064, 4th Quart., 2014, doi: 10.1109/COMST.2014.2321898.
[23] T. Bujlow, V. Carela-Español, and P. Barlet-Ros, “Independent comparison of popular DPI tools for traffic classification,” Comput. Netw., vol. 76, pp. 75–89, Jan. 2015, doi: 10.1016/j.comnet.2014.11.001.
[24] D. D. Clark, C. Partridge, J. C. Ramming, and J. T. Wroclawski, “A knowledge plane for the Internet,” in Proc. Conf. Appl., Technol., Archit., Protocols Comput. Commun., Karlsruhe, Germany, Aug. 2003, pp. 3–10, doi: 10.1145/863955.863957.
[25] A. Mestres, A. Rodriguez-Natal, J. Carner, P. Barlet-Ros, E. Alarcón, M. Solé, V. Muntés-Mulero, D. Meyer, S. Barkai, M. J. Hibbett, G. Estrada, K. Ma’ruf, F. Coras, V. Ermagan, H. Latapie, C. Cassar, J. Evans, F. Maino, J. Walrand, and A. Cabellos, “Knowledge-defined networking,” ACM SIGCOMM Comput. Commun. Rev., vol. 47, no. 3, pp. 2–10, Sep. 2017, doi: 10.1145/3138808.3138810.
[26] J. S. Rojas. (2020). Jsrojas/FlowLabeler. GitHub. [Online]. Available: https://github.com/jsrojas/FlowLabeler
[27] A. Pekar. (2020). Drnpkr/FlowRecorder. GitHub. [Online]. Available: https://github.com/drnpkr/flowRecorder
[28] Z. Aouini. (2020). Nfstream/Nfstream. GitHub. [Online]. Available: https://github.com/nfstream/nfstream
[29] M. Charrad, N. Ghazzali, V. Boiteau, and A. Niknafs, “NbClust: An R Package for determining the relevant number of clusters in a data set,” J. Stat. Softw., vol. 61, no. 6, 2014, Art. no. 1, doi: 10.18637/jss.v061.i06.
[30] K. Djouzi and K. Beghdad-Bey, “A review of clustering algorithms for big data,” in Proc. Int. Conf. Netw. Adv. Syst. (ICNAS), Jun. 2019, pp. 1–6, doi: 10.1109/ICNAS.2019.8807822.
[31] M. Hassani, “Overview of efficient clustering methods for high-dimensional big data streams,” in Clustering Methods for Big Data Analytics: Techniques, Toolboxes and Applications, O. Nasraoui and C.-E. B. N’Cir, Eds. Cham, Switzerland: Springer, 2019, pp. 25–42.
[32] G. Upton and I. Cook, Understanding Statistics, Illustrated. Oxford, U.K.: OUP Oxford, 1996.
[33] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002.
[34] P. Joshi, “Incremental learning: Areas and methods - A survey,” Int. J. Data Mining Knowl. Manage. Process, vol. 2, no. 5, pp. 43–51, Sep. 2012, doi: 10.5121/ijdkp.2012.2504.
[35] R. R. Ade and P. R. Deshmukh, “Methods for incremental learning: A survey,” Int. J. Data Mining Knowl. Manage. Process, vol. 3, no. 4, pp. 119–125, Jul. 2013, doi: 10.5121/ijdkp.2013.3408.
[36] V. Lemaire, C. Salperwyck, and A. Bondu, “A survey on supervised classification on data streams,” in Proc. 4th Eur. Bus. Intell. Summer School (eBISS), in Tutorial Lectures, Berlin, Germany, E. Zimányi and R.-D. Kutsche, Eds. Cham, Switzerland: Springer, Jul. 2014, pp. 88–125.
[37] J. Zhong, Z. Liu, Y. Zeng, L. Cui, and Z. Ji, “A survey on incremental learning,” presented at the 5th Int. Conf. Comput., Automat. Power Electron., Nov. 2017. [Online]. Available: https://webofproceedings.org/proceedings_series/article/artId/777.html
[38] Q. Yang, Y. Gu, and D. Wu, “Survey of incremental learning,” in Proc. Chin. Control Decis. Conf. (CCDC), Jun. 2019, pp. 399–404, doi: 10.1109/CCDC.2019.8832774.
[39] S. Madhavan and N. Kumar, “Incremental methods in face recognition: A survey,” Artif. Intell. Rev., Aug. 2019, doi: 10.1007/s10462-019-09734-3.
[40] J. Montiel, J. Read, A. Bifet, and T. Abdessalem, “Scikit-multiflow: A multi-output streaming framework,” J. Mach. Learn. Res., vol. 19, no. 72, pp. 1–5, 2018.
[41] A. Chefrour, “Incremental supervised learning: Algorithms and applications in pattern recognition,” Evol. Intell., vol. 12, no. 2, pp. 97–112, Jun. 2019, doi: 10.1007/s12065-019-00203-y.
[42] G. Hulten, L. Spencer, and P. Domingos, “Mining time-changing data streams,” in Proc. 7th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, San Francisco, CA, USA, Aug. 2001, pp. 97–106, doi: 10.1145/502512.502529.
[43] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, “MOA: Massive online analysis,” J. Mach. Learn. Res., vol. 11, no. 52, pp. 1601–1604, 2010.
[44] N. C. Oza, “Online bagging and boosting,” in Proc. IEEE Int. Conf. Syst., Man Cybern., vol. 3, Oct. 2005, pp. 2340–2345, doi: 10.1109/ICSMC.2005.1571498.
[45] A. Bifet, G. Holmes, and B. Pfahringer, “Leveraging bagging for evolving data streams,” in Machine Learning and Knowledge Discovery in Databases. ECML PKDD (Lecture Notes in Computer Science), vol. 6321, J. L. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, Eds. Berlin, Germany: Springer, 2010, doi: 10.1007/978-3-642-15880-3_15.
[46] H. M. Gomes, A. Bifet, J. Read, J. P. Barddal, F. Enembreck, B. Pfahringer, G. Holmes, and T. Abdessalem, “Adaptive random forests for evolving data stream classification,” Mach. Learn., vol. 106, nos. 9–10, pp. 1469–1495, Oct. 2017, doi: 10.1007/s10994-017-5642-8.
[47] R. Polikar, L. Upda, S. S. Upda, and V. Honavar, “Learn++: An incremental learning algorithm for supervised neural networks,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 31, no. 4, pp. 497–508, Nov. 2001, doi: 10.1109/5326.983933.
[48] R. Lippmann, “An introduction to computing with neural nets,” IEEE ASSP Mag., vol. 4, no. 2, pp. 4–22, Apr. 1987, doi: 10.1109/MASSP.1987.1165576.
[49] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[50] A. Cervantes, C. Gagné, P. Isasi, and M. Parizeau, “Evaluating and characterizing incremental learning from non-stationary data,” Jun. 2018, arXiv:1806.06610. [Online]. Available: http://arxiv.org/abs/1806.06610
[51] D. M. W. Powers. (Dec. 2007). Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. [Online]. Available: https://www.researchgate.net/publication/228529307_Evaluation_From_Precision_Recall_and_F-Factor_to_ROC_Informedness_Markedness_Correlation
[52] M. Banerjee, M. Capozzoli, L. McSweeney, and D. Sinha, “Beyond kappa: A review of interrater agreement measures,” Can. J. Statist., vol. 27, no. 1, pp. 3–23, Mar. 1999, doi: 10.2307/3315487.
[53] (Oct. 23, 2020). Scikit-Multiflow’s Documentation. Accessed: Sep. 23, 2020. [Online]. Available: https://scikit-multiflow.github.io/scikit-multiflow/
[54] P. K. Srimani and M. M. Patil, “Performance analysis of Hoeffding trees in data streams by using massive online analysis framework,” Int. J. Data Mining Model. Manage., vol. 7, no. 4, pp. 293–313, 2015.
[55] D. Stathakis, “How many hidden layers and nodes?” Int. J. Remote Sens., vol. 30, no. 8, pp. 2133–2147, Apr. 2009, doi: 10.1080/01431160802549278.
[56] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Jan. 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[57] (Oct. 29, 2020). Labeled Network Traffic Flows - 141 Applications. Accessed: Sep. 29, 2020. [Online]. Available: https://kaggle.com/jsrojas/labeled-network-traffic-flows-114-applications
[58] (Oct. 29, 2020). User OTT Consumption Profile - 2019. Accessed: Sep. 29, 2020. [Online]. Available: https://kaggle.com/jsrojas/user-ott-consumption-profile-2019
JUAN SEBASTIÁN ROJAS received the bachelor’s degree in electronics and telecommunications engineering and the M.Sc. degree in telematics engineering from the Universidad del Cauca, Colombia, in 2015 and 2018, respectively. He is currently pursuing the Ph.D. degree in telematics engineering. He is also an Active Researcher with the Telematics Engineering Group, Universidad del Cauca. Since 2016, he has been granted a Ph.D. Scholarship by MinCiencias, the lead entity of the science, technology and innovation system in Colombia. Furthermore, he has been an Assistant Professor in several courses related to converged services, data mining, and machine learning. His research topic involves the study of users’ Over-the-Top consumption behavior aimed at the personalization of service degradation through incremental learning supported by the Knowledge Defined Networking paradigm.
ADRIAN PEKAR received the Ph.D. degree in computer science from the Technical University of Košice, Slovakia, in 2014. He is currently an Assistant Professor with the Department of Networked Systems and Services, Budapest University of Technology and Economics, Hungary. Prior to this, he held research, teaching, and engineering positions in New Zealand and Slovakia. His research interests include network traffic classification and management, traffic measurement data reduction and visualization, and stream processing.
ÁLVARO RENDÓN (Senior Member, IEEE) received the B.S. degree in electronics engineering and the M.S. degree in telematics engineering from the Universidad del Cauca, Colombia, in 1979 and 1989, respectively, and the Ph.D. degree in telecommunications engineering from the Technical University of Madrid, in 1997. He is currently a Full Professor with the Department of Telematics, University of Cauca, where he is also the Director of the Doctoral Program in Telematics Engineering. His research interests include telecommunications services, e-health, and S&T management. He was a recipient of the Outstanding Paper Award in the category of Tools Track at the First IEEE International Conference on Engineering of Complex Computer Systems in 1995.
JUAN CARLOS CORRALES received the bachelor’s and master’s degrees in telematics engineering from the University of Cauca, Colombia, in 1999 and 2004, respectively, and the Ph.D. degree in sciences, specialty computer science from the University of Versailles Saint-Quentin-en-Yvelines, France, in 2008. He is currently a full-time Professor and leads the Telematics Engineering Group, University of Cauca. His research interests include service composition and data analysis.