Smart User Consumption Profiling: Incremental Learning-Based OTT Service Degradation

This document summarizes a research paper that proposes a model for personalized service degradation policies for Over-the-Top (OTT) services based on incremental learning of user consumption profiles. The model was evaluated using a case study with real user data, testing several incremental learning algorithms. The Adaptive Random Forest and Leverage Bagging algorithms achieved over 90% precision and recall in classifying user profiles, demonstrating the viability of the proposed model for personalized policies that consider individual usage behaviors.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/345990587

Smart User Consumption Profiling: Incremental Learning-Based OTT Service Degradation

Article in IEEE Access · November 2020

DOI: 10.1109/ACCESS.2020.3037971

4 authors:

Juan Sebastián Rojas Meléndez, Universidad del Cauca
Adrian Pekar, Budapest University of Technology and Economics
Alvaro Rendón, Universidad del Cauca
Juan Carlos Corrales, Universidad del Cauca


Received October 27, 2020, accepted November 9, 2020, date of publication November 16, 2020,
date of current version November 27, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3037971

Smart User Consumption Profiling: Incremental Learning-Based OTT Service Degradation
JUAN SEBASTIÁN ROJAS 1, ADRIAN PEKAR 2, ÁLVARO RENDÓN 1, (Senior Member, IEEE), AND JUAN CARLOS CORRALES 1
1 Telematics Engineering Research Group, Universidad del Cauca, Popayán 190003, Colombia
2 Department of Networked Systems and Services, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, 1117 Budapest, Hungary
Corresponding author: Adrian Pekar ([email protected])
This work was supported by the National Research, Development and Innovation Office (NKFIH), Hungary, under Grant BME IE-NAT
TKP2020. The work of Juan Sebastián Rojas was supported by the Administrative Department of Science, Technology, and Innovation of
Colombia (MINCIENCIAS) through the Scholarship Granted in the Call for National Doctorates under Grant 727-2015.

ABSTRACT Data caps and service degradation are techniques used to control subscribers’ data consumption. These techniques have emerged mainly due to the growing demands placed on the networking stack created by the continuous increase in the number of connected users and their feature-rich, bandwidth-heavy Over-the-Top (OTT) applications. In the mobile network’s scope, where traditional operators offer user data plans with limited resources, service degradation is a standard mechanism used to throttle consumption. Limiting user data usage helps to utilize resources better and to ensure the network’s reliable performance. Nevertheless, this degradation is applied in a generalized way, affecting all user applications without considering behavior. In this paper, we propose a reference model aiming to address this constraint. Specifically, we attempt to personalize service degradation policies by providing a guideline for users’ OTT consumption behavior classification based on Incremental Learning (IL). We evaluated our model’s viability in a case study by investigating the efficacy of several IL algorithms on a dataset containing real-world users’ OTT application consumption behavior. The algorithms include Naive Bayes (NB), K-Nearest Neighbor (KNN), Adaptive Random Forest (ARF), Leverage Bagging (LB), Oza Bagging (OB), Learn++, and Multilayer Perceptron (MLP). The obtained results show that ARF and a composition between LB and ARF achieve the best performance yielding a classification precision and recall of over 90%. Based on the obtained results, we propose service degradation policies to support decision making in mission-critical systems. We argue the strong applicability of our model in real-world scenarios, especially in user consumption profiling. Our reference model offers a conceptual basis for the tasks that need to be performed when defining personalized service degradation policies in current and future networks like 5G. To the best of our knowledge, this work is the first effort in this matter.

INDEX TERMS Over-the-top application, classification, incremental learning, service degradation, decision
making.

I. INTRODUCTION
Data caps and monthly traffic limits are techniques used to control subscribers’ data usage [1], [2]. They have emerged mainly due to the growing demands placed on the networking stack created by the continuous increase in connected users, new applications, and services. Historically, these techniques have been used by Internet Service Providers (ISP). Today, they are more commonly seen among mobile operators [3]. Australia, Canada, South Africa, the UK, and the US are just a few of the countries where data caps are regularly used to assist mobile operators [4].
A typical application of data caps is service degradation policies. Traditional mobile network operators offer user data plans with consumption limits. In such networks, service degradation is a standard mechanism used to limit the amount of information that can be transferred by the users over a specific period [2], [5]–[7]. This mechanism is usually applied when a user exceeds the predefined consumption limit to save resources and ensure the high performance of the network.

The associate editor coordinating the review of this manuscript and approving it for publication was Shagufta Henna.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
207426 VOLUME 8, 2020
J. S. Rojas et al.: Smart User Consumption Profiling: Incremental Learning-Based OTT Service Degradation

Nevertheless, this degradation is applied in a generalized way, affecting the performance of all user applications.
Service degradation is heavily based on traffic analysis, especially in the investigation of Over-the-Top (OTT) media and communications services. OTT refers to applications that deliver audio, video, and other media over the Internet by leveraging the infrastructure deployed by network operators but without their involvement in its management or distribution of the content [8]. These applications are well-known for their large consumption of network resources to offer their different functionalities, generating a massive impact on the network’s operation and management. While the implementation of service degradation policies seems straightforward, the analysis of users’ consumption behavior is rarely considered. However, by including such analysis, operators can make well-founded decisions to maintain network performance while affecting the users’ perception of the service provisioning as little as possible.
Machine Learning (ML) is regularly used in users’ consumption behavior analysis [9]–[13]. However, most of the models used are based on traditional batch learning, which is well-known for several limitations. (i) Depending on the algorithm and the dataset’s size, the training phase might take a considerable amount of time and demand high computational resources. (ii) When training with a small dataset with no representative number of samples, the algorithm might present a poor performance. (iii) Once an algorithm has been trained and implemented, it cannot acquire new knowledge from new samples. Consequently, if there is a change in the data’s statistical properties (i.e., a concept drift), a new model must be generated.
Incremental Learning (IL) constitutes an alternative to traditional ML. It is built under the premise of continuous learning and adaptation. Its main advantage is that it does not require previous perception data. Therefore, it appears to be a suitable solution to address the OTT service degradation limitations where consumption behaviors can change over time in an indeterministic manner.
This paper proposes a reference model that can serve as a guideline for users’ OTT consumption behavior classification based on IL. We evaluated our model’s viability in a case study by investigating the efficacy of several IL algorithms on a dataset containing real-world users’ OTT application consumption behavior. We rigorously assessed the NB, KNN, ARF, LB, OB, Learn++, and MLP algorithms. The obtained results show that ARF and a composition between LB and ARF achieve the best performance yielding a classification precision and recall of over 90%. Subsequently, we proposed service degradation policies based on the users’ consumption classification and the 5G technical specification [14]. Our reference model offers a conceptual basis for smart personalization of service degradation policies.
The rest of this paper is organized as follows. Section 2 presents the related work in the area of user profiling. In Section 3, we describe our reference model explaining its actors, components, and workflow. Then, Section 4 presents the case study focused on the proposal of personalized service degradation policies. This section includes the experimental scenarios and the analyses that were obtained through the implementation of the reference model. Finally, in Section 5, we conclude this paper and sketch future work.

II. RELATED WORK
We performed a systematic mapping [15] of existing works to identify related research. In this mapping, we analyzed recent works published between 2015 and 2020 in four scientific databases – Scopus, ScienceDirect, IEEE Xplore, and Google Scholar. Figure 1 shows the obtained map with six different research contexts and six research types. The former was self-defined while the methodology itself specified the latter (note, we did not modify the predefined types). A general observation (also evident from Figure 1) is that most works belong to the social network domain and are of the solution proposal type. No evaluation research, philosophical or experience works were identified. Hence, neither new proposals nor experiences gained, lessons learned papers were included in this study.

FIGURE 1. Systematic map – User profiling.

The primary focus of interest in this study is network traffic flow measurement and analysis. The obtained knowledge is used for smart user consumption profiling and service degradation policy development. Therefore, we investigated those works that were published in the domain that includes this area – information and telecommunication technologies. Additionally, selected works related to social networks and mobile services were also included in our list as we found some derived conclusions informational. In what follows, we briefly discuss these related works.
Vinupaul et al. [16] utilized network flow analysis as a viable method for user identification. They introduced the concept of flow-bundle-level features to characterize and identify users. Four different ML models were validated using a dataset of flow records that contained 65 users. The authors achieved an accuracy of 83% when identifying users using the random forest algorithm.
Hess et al. [17] defined a profiling model that characterizes the user behavior and temporal dynamics. The spatial


locations during an hour showed that roughly 80% of users visit six cells daily. Also, 22% of users did not revisit a site, and around 30% revisited the same sites between 6-10 days. Finally, holiday users showed a higher number of upload/download rates. The knowledge of such regularity in mobility can help predict high traffic loads and achieve better network management.
Rajashekar et al. [18] studied the extent to which phone behaviors can be associated with users. To achieve this, they used a multilayer perceptron configured to act as an autoencoder. They applied a self-organizing map (SOM) to unveil unique characterizations of user behavior. The application, cell tower, and website usage logs were used to generate user behavior profiles and isolate a single user for a specific device based on the behavior.
Bakhshandeh et al. [19] showed that identifying the users based on their behavior instead of IP addresses is vital since the dynamic assignment of addresses can easily avoid security mechanisms. The authors proposed an efficient user identification method that is based on NetFlow traffic flows. This method can extract a set of features from the network flows and use a random forest model to classify users. Their model achieved a precision of 94% in user identification. The results showed that such a method is suitable for forensic science. It does not require examining the total traffic (header with payload) for user identification and characterization.
Zhao et al. [20] investigated the options for inferring users’ gender based on application snapshots installed on mobile devices. Their main goal was to make smartphone systems more user-friendly and provide better personalized services and products. They observed that it is possible to infer gender from usage habits. They also identified significant differences between application types, functionalities, and icon designs through an empirical study. Their prediction model achieved 76.62% accuracy.
Paul et al. [21] characterized the network of verified users on Twitter. Their analysis extracted verified users on Twitter and obtained 231,246 user-profiles and 79,213,811 connections. Their analysis confirmed that the sub-graph of verified users mirrors the full Twitter user graph in some aspects, such as possessing a short diameter. They also obtained interesting differences, such as the possession of a power-law out-degree distribution, slight disassortativity, and a significantly higher reciprocity rate. Finally, the authors observed stationarity in the time series of verified user activity levels that differs from traditional Twitter users.
In summary, when compared with our approach, related works discussed above are aimed at user profiling in a different context. Accordingly, they fall short of addressing the classification domain of OTT user consumption behavior. None of them provides a reference model proposal similar to ours. Nor do they aim to obtain an overview of the users’ OTT consumption behavior to support personalized service degradation policies. Furthermore, the proposed models in related research are based on traditional ML. Lastly, related works do not offer a guideline on how to perform user profiling either.

III. REFERENCE MODEL FOR IL-BASED USER OTT CONSUMPTION CLASSIFICATION
This section presents our reference model that is depicted in Figure 2. The description of its main components, actors, and workflow is presented as follows.

A. ACTORS
We identify four actors with the following roles:
• A network user is any person that has access to a device capable of connecting to the network (e.g., smartphone, laptop, and tablet). Their main activity is network traffic generation through the consumption of OTT applications or any other activity that can be performed through the Internet.
• The network expert represents the person or staff who knows about network architecture and devices’ technical activities. This actor can execute network devices’ configuration, gather and summarize network information, and configure environments for network experiments.
• The data analyst is in charge of developing all the activities related to data analysis and ML models. Those activities include data cleaning and preprocessing, data transformation, clustering and pattern recognition, training, selection, and deployment of ML models.
• Finally, the network analyst is the person or group of people in charge of the decision-making process related to network maintenance, network policies application, resource allocation, and business strategies. This actor can leverage the data analyst’s valuable information and the IL model to make strategic decisions. In some cases, depending on the available staff, the network analyst and network expert’s activities can be handled by the same person or group. In our case study, the decision-making process is related to the proposal of personalized service degradation policies.

B. COMPONENTS
We also define three macro components. These are depicted in Figure 2. Except for the network user, each actor is responsible for one specific component. The description of the components is presented as follows.
• Data Gathering & Flow Generation comprises all the processes that must be performed by the network expert. Typically, the network expert holds all the knowledge related to network equipment and architecture. These processes (packet persistence and flow generation) involve all the activities related to configuring network devices, gathering and storing IP packets, packet aggregation for flow generation, and application labeling.
• Data Preprocessing and Model Selection comprises all the processes related to the data analyst’s skills and


FIGURE 2. Reference model.

knowledge. All raw data preprocessing is carried out in this component. It includes flow cleaning, user consumption estimation, clustering and pattern recognition, and data cleaning. Furthermore, the data analyst is also responsible for the IL algorithms’ comparison and evaluation to obtain the best model that maintains its performance consistency and knowledge retention.
• Finally, the Decision-Making involves the processes that the network analyst must perform based on their knowledge and the user information obtained from the IL model. This way, the network analyst can make better decisions related to network resource allocation, QoS policies, service pricing, and service offering. In our case study, decision-making is used for proposing service degradation policies based on users’ classification according to their OTT consumption behavior.

C. WORKFLOW
IP Packets are the fundamental elements that describe the communications established by network users. Therefore, packet persistence is the first step in our workflow. Packet capture is usually achieved by a device that observes all the packets being transferred in the network. Once the packets are captured, they are stored in a specific format known as packet capture (PCAP) files. Depending on the traffic, the size of PCAPs can grow to several hundreds of GB. An infrastructure capable of storing large amounts of data is therefore highly recommended.
In the second step, flow generation is performed. Network flows are a statistical representation of the communications generated by user devices. They carry information related to the amount of transferred data and the communication duration, among many others. Flows are obtained by aggregating packets into flows [22]. Aggregation is performed based on a 5-tuple, precisely the source and destination IP addresses and port numbers, and the protocol identifier. Two tasks are required to be carried out in this context (steps 2a and 2b):
• Network statistics calculation: as packets belonging to traffic flows are observed, a set of statistics can be calculated to obtain a statistical representation of the communication. Common network statistics include the packet sizes, number of packets, and inter-arrival times.
• Application labeling: to obtain an overview of network users’ consumption behavior, it is necessary to know the applications that generate the flows. Consequently, all the network flows are labeled with the respective application name that is being consumed. The most common approach for flow labeling is Deep Packet Inspection (DPI) [23]. Through this approach, the payload of network packets is inspected to obtain the application-specific information.
The resulting flow records with their respective application labels are typically stored in a CSV file. However, other


storage formats can also be considered to store flow measurement data. CSV files are a popular storage format among the ML community.
The third step concerns the flow cleaning process. In this step, unnecessary information is removed. Such unnecessary data is, for example, flows that do not belong to user devices’ communications. This process usually requires a collaboration between the data analyst and the network expert. By leveraging their knowledge, the network experts can ensure that the flows to be removed are not data with high information values.
Then, in the fourth step, information regarding user consumption estimation is determined. This is achieved by summarizing each user’s behavior, which requires performing two main tasks (steps 4a and 4b):
• Occupation time calculation: the amount of time spent by each user consuming OTT applications offers essential insights for the network operator. Knowing this aspect enables the identification of their preferred applications and the high demand for network resources. Therefore, the amount of time that the user spent consuming each application must be calculated by leveraging the network flows’ time-related information.
• Occupation data calculation: similarly, a further feature that offers essential insights on users’ behavior is the amount of data (bytes) exchanged by each application. As in the previous case, this aspect allows observing whether the user demands significant network resources and whether a different data plan might better suit their needs. Therefore, data occupation is also calculated for each OTT application consumed by the users by leveraging the number of bytes exchanged in the network flows.
Next, clustering and pattern recognition and data preprocessing are performed in the fifth and sixth steps. These steps aim to identify the best approach to organize the users based on their consumption behavior. This process yields a dataset labeled with the set of groups that can be used for user classification.
Subsequently, IL model selection and training are performed in the seventh step. IL is built on the premise of continuous learning and adaptation, enabling the autonomous incremental development of complex skills and knowledge. In ML context, IL aims to smoothly update the prediction model to account for different tasks and data distributions while still being able to re-use and retain knowledge over time. Figure 3 illustrates a typical IL workflow. IL builds its prediction capabilities iteratively on top of what was previously learned. This way, it can amend previous errors and shortcomings while efficiently adapting to new environmental conditions as new data becomes available. As a result, the systematic re-trainings typical for traditional batch learning can be avoided. Consequently, time and computational resources can be saved while maintaining the model’s performance.
As shown in Figure 3, IL provides a continuous iterative process. First, after feature extraction and instance labelling applied to raw data, a consolidated dataset is obtained. This dataset is obtained via Steps 1-6, as shown in the reference model (see Figure 2). Then, the dataset is used to build the IL model through training and testing. It is not uncommon to see that more than one IL algorithm is used. This results in a set of models.

FIGURE 3. IL algorithms workflow.

Next, each algorithm’s classification result is reviewed on every iteration in a label revision process, and feedback is provided to the algorithm to maintain the IL setting. This way, every algorithm is tested and compared, aiming at selecting the one showing the best classification performance.
The model capable of performing user classification based on consumption behavior is prepared for application in the eighth step. It maintains its adaptiveness, scalability, and efficiency. Furthermore, the model is built in a way so that it provides useful information about the users.
Once all the previous steps have been completed, the network analyst can finally take appropriate actions supporting the decision-making process in the ninth step. With the users’ information provided by the classification model, the network analyst can develop new strategies to apply network policies, manage resource allocation, and issue data plans.

IV. PERSONALIZED SERVICE DEGRADATION – A CASE STUDY
We assess the usefulness and real-world applicability of our reference model in a case study. Specifically, we evaluated our approach using prototype-level evaluations when evolving towards proposing personalized service degradation policies. In what follows, we first describe the architecture used for both data measurement and decision-making. Then we discuss ground truth development as one of the key enablers of our approach. This is followed by the performance evaluation of IL algorithms when investigated in OTT consumption behavior classification. Finally, we elaborate on developing service degradation policies as an end goal for the decision-making process.

A. ARCHITECTURE FOR SMART DECISION MAKING
We evaluate our approach in the context of the Knowledge-Defined Networking (KDN) paradigm [24], [25].
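
As a rough illustration of the incremental workflow described in Section III.C (Figure 3), the sketch below shows one common way to run such a loop: every profile is first classified, and only then is its true label fed back to the model (often called test-then-train or prequential evaluation). This is an assumption about how the feedback step could be realized, not the authors’ implementation. The small Gaussian Naive Bayes class, the two features, the class names, and the synthetic stream are all hypothetical.

```python
import math
import random
from collections import defaultdict


class IncrementalGaussianNB:
    """Tiny Gaussian Naive Bayes that learns one sample at a time (illustrative)."""

    def __init__(self):
        self.count = defaultdict(int)  # samples seen per class
        self.mean = {}                 # per-class running feature means
        self.m2 = {}                   # per-class running sums of squared deviations

    def learn_one(self, x, y):
        if y not in self.mean:
            self.mean[y] = [0.0] * len(x)
            self.m2[y] = [0.0] * len(x)
        self.count[y] += 1
        n = self.count[y]
        for i, v in enumerate(x):      # Welford's online mean/variance update
            delta = v - self.mean[y][i]
            self.mean[y][i] += delta / n
            self.m2[y][i] += delta * (v - self.mean[y][i])

    def predict_one(self, x):
        if not self.count:
            return None                # nothing learned yet
        total = sum(self.count.values())
        best, best_score = None, -math.inf
        for y, n in self.count.items():
            score = math.log(n / total)            # log prior
            for i, v in enumerate(x):
                var = self.m2[y][i] / n + 1e-6     # variance with a small floor
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (v - self.mean[y][i]) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = y, score
        return best


def prequential_accuracy(stream, model):
    """Test-then-train: predict each sample first, then learn from its label."""
    correct = seen = 0
    for x, y in stream:
        if model.predict_one(x) == y:
            correct += 1
        seen += 1
        model.learn_one(x, y)
    return correct / seen


def synthetic_stream(n, seed=0):
    """Hypothetical profiles with two features: (hours online, GB transferred)."""
    rng = random.Random(seed)
    centers = {"low": (1.0, 0.5), "high": (6.0, 8.0)}
    for _ in range(n):
        label = rng.choice(list(centers))
        cx, cy = centers[label]
        yield (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label


accuracy = prequential_accuracy(synthetic_stream(500), IncrementalGaussianNB())
print(f"prequential accuracy: {accuracy:.2f}")
```

Because the model never stores past samples, memory use stays constant as the stream grows, which is the property that makes this family of learners attractive when consumption behavior drifts over time.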


FIGURE 4. Data extraction environment.

Our architecture shown in Figure 4 aims to extend the network with a knowledge plane by gathering information relevant to IL-based users’ OTT consumption behavior classification. With such an extension, we aim to support the network administrators in defining personalized service degradation policies. The description of its components is as follows:
• Application Plane: this plane contains the network users that connect through different kinds of devices and consume the OTT applications.
• Data & Control Plane: includes all the devices, network topology, exchanged data, and functional configuration set up by the network administrator.
• Knowledge Plane: contains the elements that offer valuable knowledge about the network to the administrator. The main elements include:
(i) The capture server configured to gather the network traffic containing the information about OTT consumption behavior;
(ii) Data cleaning & preprocessing where the raw data is processed and analyzed;
(iii) The users’ consumption profiles dataset containing OTT consumption behavior data;
(iv) The IL classification model used for 1) classifying the users into their corresponding consumption profiles, and 2) administrator support during the definition of personalized service degradation policies.
• Management Plane: represents the plane where the network administrator makes all the decisions to define the service degradation policies based on the users’ behavior analysis.

B. GROUND TRUTH DEVELOPMENT
Following Steps 1-6 of our reference model’s workflow (see Section 3.C), we systematically developed a ground truth. The tasks associated with these steps are summarized as follows.
Step 1: packet persistence. We started with capturing packet traces in the network of Universidad del Cauca, Colombia. The packets were captured using Wireshark on a server configured as a mirror port of a core network switch over ten days in 2019. All the packets were stored in PCAP files whose total size was roughly 3 TB.
Step 2: flow generation. Several applications can organize IP packets into flows while obtaining flow statistics. However, the majority of the widely recognized tools cannot perform layer seven (application) labeling. To overcome this challenge, we developed our own application, named Flow Labeler [26], which performs network statistics calculations and application labeling. Flow Labeler is based on FlowRecorder [27] and NFStream [28] network traffic flow metering tools. It follows the architectural principles of the traditional network traffic measurement paradigm [22]. Using Flow Labeler, we processed all the captured PCAP files. Eventually, we obtained a dataset containing 2,704,839 records with 50 attributes and 141 application labels. The features measured by Flow Labeler are shown in Table 1.
Step 3: flow cleaning. During flow cleaning, we first discarded the flow records that were not generated by user devices. This was achieved by a filter that retained only those flows that belonged to a specific range of IP addresses holding user devices. Then a selection process was applied to preserve only the most representative OTT applications. The original dataset contained flows generated by 141 different applications. After the selection process, flows belonging to 56 OTT applications were retained. These flows were generated by popular applications such as Facebook, Netflix, Spotify, Twitter, Dropbox, WhatsApp, and YouTube.
Step 4: user consumption estimation. After flow cleaning, we proceeded with user consumption estimation.
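
A minimal sketch of what the consumption estimation in Step 4 might look like is given here, assuming each cleaned flow record carries a user IP address, an application label, a duration, and a byte count. The record layout, the field names, and the sample values are hypothetical and are not taken from Flow Labeler. Summing flow durations is also a simplifying assumption, since overlapping flows of the same application would be counted twice.

```python
from collections import defaultdict

# Hypothetical cleaned flow records: (user IP, application, duration in s, bytes).
flows = [
    ("10.0.0.5", "YouTube", 120.0, 45_000_000),
    ("10.0.0.5", "YouTube", 60.0, 20_000_000),
    ("10.0.0.5", "WhatsApp", 30.0, 200_000),
    ("10.0.0.9", "Netflix", 300.0, 150_000_000),
]


def consumption_profiles(flow_records):
    """Sum time occupation and data occupation per user and per application."""
    profiles = defaultdict(lambda: defaultdict(lambda: {"time_s": 0.0, "bytes": 0}))
    for user, app, duration, n_bytes in flow_records:
        entry = profiles[user][app]
        entry["time_s"] += duration
        entry["bytes"] += n_bytes
    return profiles


def to_feature_row(profile, apps):
    """Flatten one user's profile into a fixed-order vector (time, bytes per app)."""
    row = []
    for app in apps:
        entry = profile.get(app, {"time_s": 0.0, "bytes": 0})  # unused apps count as zero
        row.extend([entry["time_s"], entry["bytes"]])
    return row


profiles = consumption_profiles(flows)
apps = ["Netflix", "WhatsApp", "YouTube"]
for user, profile in profiles.items():
    print(user, to_feature_row(profile, apps))
```

Keeping a fixed application order in `to_feature_row` gives every user a vector of the same length, which is what a clustering or classification algorithm expects as input.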


TABLE 1. Attributes generated by flow labeler.

We considered only the attributes directly related to users’ consumption behavior. Eventually, OTT service consumption was measured using the amount of time a user spent consuming the application and the amount of data exchanged through the network. This led to a dataset with 716 OTT consumption profiles with 114 attributes for each instance, whose information is provided in Table 2.
Step 5: clustering and pattern recognition. Subsequently, since the users’ data remained unlabeled, we performed clustering to classify the users. First, we focused on finding the optimal number of clusters for the data. We determined the ideal number of clusters for the dataset supported by the well-known elbow and average silhouette methods [29].

FIGURE 5. Elbow method – the curve of within-cluster sum of squares according to the number of clusters k.

In the elbow method, the bend (knee) location in the plot is generally considered an indicator of the appropriate number of clusters. In the silhouette method, by contrast, the location of the maximum is considered as the appropriate number of clusters. Figures 5 and 6 show the plots obtained with the two methods. Based on the figures, the optimal number of clusters in our dataset appears to be 4 or 7.


TABLE 2. Users’ consumption dataset – 114 attributes.

TABLE 3. Clusters labeling.

FIGURE 6. Silhouette method – the curve of average silhouette according to the number of clusters k.

With these two values determined as the candidate numbers of clusters, we proceeded with the clustering process. In our rigorous process, we analyzed our dataset using several clustering algorithms. Specifically, we used Cobweb, Hierarchical (connectivity-based algorithms), Expectation-Maximization (Gaussian mixture), DBSCAN (density-based), and K-means (centroid-based) clustering [30], [31].
Among all the algorithms tested, K-Means yielded the most reasonable results regarding the division of instances. Specifically, we achieved the best results with 4 clusters. After analyzing the centroids, the distributions, and the total time and data occupation, we eventually developed a dataset whose samples were divided into (labeled with) four classes, as shown in Table 3. Since data occupation is considered the most critical feature, the group of users who occupied more data was labeled as the highest consumption users.
Finally, by implementing a correlation study between the dataset attributes and the target classes, the most correlated attributes were determined. Specifically, we determined that AppleIcloud data occupation and Gmail time occupation exhibited the highest correlations. Although the applications are different and several examples can be observed through different correlations, we focused mostly on the target class for visualization purposes. Consequently, in Figure 7, a plot of these two attributes illustrates the cluster distribution.

FIGURE 7. Cluster visualization.

A common observation is that medium consumption users spend more time consuming data while maintaining a low data occupation rate. High consumption users present higher data occupations in shorter periods. Furthermore, there is only one very high consumption user that exceeds all the other clusters. This user exhibits the highest data and time
occupations. However, considering the clusters’ behavior, this case could be considered as an outlier in the data.
Step 6: data preprocessing. In data preprocessing, we focused on two goals: (i) to remove anomalies and (ii) to improve the class balance. To achieve this, we applied the Inter Quartile Range (IQR) [32] approach to detect and remove the outliers in the dataset. To improve the class balance, we used the SMOTE algorithm [33]. As a result, we obtained a dataset with 1,249 instances (users). The classes were distributed based on their consumption, leading to the following statistics:
• 510 low consumption users,
• 333 medium consumption users, and
• 406 high consumption users.

C. EVALUATION OF IL ALGORITHMS
KDN is a commonly used paradigm in network traffic management. One of its disadvantages is poor efficiency, stemming mainly from its use of traditional ML.
However, the OTT applications market is highly volatile and unpredictable (e.g., a wide variety of applications, constant new trends, and technologies). These constraints must be addressed through an approach that can adapt to the dynamic aspects of the network traffic and users’ behavior changes. That being said, our goal was to compare the IL algorithms’ classification performance in the domain of user OTT consumption behavior. To achieve this, we examined several IL algorithms’ performance using the dataset described in Section 4.B.

1) OVERVIEW OF THE ALGORITHMS
The most commonly used IL algorithms [34]–[39] include decision trees, rule-based systems, NB, KNN, SVM, and neural networks. Other common alternatives are ensemble methods such as OB, ARF, LB, and Learn++. We evaluated the Python scikit-multiflow [40] implementation of these algorithms. The main characteristics of the algorithms are shown in Table 4 as per [41]. Furthermore, their brief description is as follows.
The Hoeffding Tree (HT) or Very Fast Decision Tree (VFDT) [42] is an incremental decision tree induction algorithm. It exploits the fact that a small sample of data can often be enough to choose an optimal splitting attribute for the construction of the decision tree. This idea is supported mathematically by the Hoeffding bound, which gives a level of confidence that allows choosing the best attribute to split the tree [43].
The NB algorithm’s incremental adaptation is a classifier known for its simplicity and low computational cost. This algorithm performs the classification process while applying the Bayes theorem, where a strong (naive) assumption is made (all the input data features are independent of each other).
The KNN algorithm, in an incremental setting, works by keeping track of a fixed number of training samples of the last window of observed samples. Then, whenever new input data is received, the algorithm searches within these stored samples and finds the closest neighbors using a selected distance metric (e.g., Euclidean distance). To keep search times low, scikit-multiflow stores the samples in a structure called a K Dimensional Tree.
OB [44] is an incremental ensemble learning method that adapts the traditional Bagging from the batch setting. It simulates the training phase by using each arriving sample to train the base estimator k times, where k is drawn from a binomial distribution. This can be considered a good substitute for the drawing with replacement implemented in traditional batch learning.
LB [45] is based on OB. It tries to obtain better results by modifying the parameters of the Poisson distribution that is obtained from the binomial distribution when assuming an infinite input data stream. This Poisson distribution is used to perform the drawing with replacement and to reduce zero values in the distribution’s probability mass function. This is achieved by increasing the value of λ from 1 to 6.
ARF [46] is an adaptation of the traditional random forest algorithm applied to the incremental learning scope. ARF generates multiple decision trees and decides how to classify the input data through a weighted voting system. Within the voting system, the individual tree with the best performance (in terms of accuracy or the Kappa statistic) has a more substantial weight in the votes, i.e., higher priority in the decision.
Learn++ [47] is an ensemble method inspired by the Adaptive Boosting (AdaBoost) algorithm. It was originally developed to improve the classification performance of weak classifiers. In essence, Learn++ generates an ensemble of weak classifiers, each trained using a different distribution of training samples. The outputs of these classifiers are then combined using a majority-voting scheme to obtain the final classification.
The MLP algorithm [48], [49] is a neural network with three or more layers. It is used to classify data that cannot be separated linearly. It is a type of artificial neural network that is fully connected. Every node in a layer is connected to each node in the following layer. It uses a nonlinear activation function (mainly hyperbolic tangent or logistic function).

TABLE 4. General overview of common IL algorithms [41].

Several conclusions can be drawn from Table 4. All algorithms have advantages and disadvantages that must be taken into consideration during selection. Neural Networks, for example, can handle noise but need a considerable amount of data for training to offer good performance. Besides, their implementation lacks a systematic approach, which makes the process mostly empirical. Lazy methods like KNN require an appropriate number of neighbors to be determined. However, once this is obtained, they usually offer good performance on both regression and classification tasks. Decision trees are easy to interpret but are quite sensitive when there is a large number of attributes to handle. SVMs have strong mathematical foundations, but their training and hyperplane calculations can take a considerable amount of time when there is a large amount of data. Naïve Bayes is the fastest approach. However, it can produce poor results due to its assumption of independence among the features. Finally, ensemble methods offer an advanced approach by improving the performance of individual algorithms through their combination. However, it is crucial to choose a composition that is not too complex. Otherwise, its training can take a large amount of time and require considerable computational resources.
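To make the difference between the OB and LB resampling more concrete, the following sketch draws per-sample training weights from a Poisson distribution with λ = 1 and λ = 6, the two values named in the LB description above. It is only an illustration under stated assumptions: the function names are hypothetical, the sampler is Knuth's textbook method, and nothing here reproduces the scikit-multiflow implementation.

```python
import math
import random


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def prob_zero_weight(lam):
    """Probability that an estimator skips an arriving sample: P(k = 0) = e^(-lam)."""
    return math.exp(-lam)


def ensemble_weights(lam, n_estimators, rng):
    """One weight per base estimator: how many times each trains on the sample."""
    return [poisson(lam, rng) for _ in range(n_estimators)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
for name, lam in (("lambda = 1", 1.0), ("lambda = 6", 6.0)):
    weights = ensemble_weights(lam, 10, rng)
    print(name, "weights:", weights, "P(weight = 0):", round(prob_zero_weight(lam), 4))
```

With λ = 1 an estimator ignores an arriving sample with probability e^(-1), roughly 37%, while with λ = 6 that probability falls to about 0.25%. This is the reduction of zero values in the probability mass function that the text attributes to LB.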
In conclusion, when selecting the best algorithm for a specific goal, several algorithms should be considered, tested, and compared.
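As a complement to the HT description in this overview, the sketch below evaluates the Hoeffding bound in its commonly cited form, ε = sqrt(R² ln(1/δ) / (2n)). The concrete numbers are assumptions for illustration only: R = log2(3) corresponds to information gain over three classes, δ = 1e-7 is a typical default rather than a value reported in the paper, and the gain values passed to `can_split` are made up.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean lies within
    epsilon of the mean observed over n samples of a variable with range value_range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, value_range, delta, n):
    """Split when the gap between the two best attributes exceeds the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


R = math.log2(3)   # assumed: range of information gain for three classes
DELTA = 1e-7       # assumed confidence parameter

for n in (100, 1000, 10000):
    print(n, round(hoeffding_bound(R, DELTA, n), 3))

# Hypothetical gains for the two best candidate attributes after 100 samples.
print(can_split(0.60, 0.10, R, DELTA, 100))
print(can_split(0.60, 0.30, R, DELTA, 100))
```

The bound shrinks as more samples reach a leaf, so a small gap between the two best attributes only justifies a split after enough instances have been observed.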

2) EVALUATION OF CLASSIFICATION ACCURACY
Considering the available evaluation approaches for IL [50], we first focused on analyzing the algorithms’ accuracy and learning rate when performing classification. Each algorithm’s performance in terms of averaged success or error rate can be analyzed through accuracy measurements. These measurements include precision, recall, and the Kappa statistic [51], [52]. As the dataset does not hold many instances, the storage and computational requirements are not considered an evaluation metric for the current scenarios. As shown in Figure 8, a set of prequential evaluations was carried out for each algorithm, varying the warm-up data sample size among 10% (125 instances), 15% (187 instances), 20% (250 instances), and 50% (625 instances) of the original dataset.

FIGURE 8. Configuration for the precision, recall and Kappa statistics evaluation.

The grace period (i.e., the number of instances a leaf should wait before doing a split) for the decision trees (HT and ARF) was set to 100 instances. The split criterion was based on the information gain. Furthermore, ten estimators were selected for the ensemble methods (OB, LB, Learn++, and ARF) [53]. Finally, a series of tests were carried out to obtain the best results for MLP.
Figure 9 depicts the obtained results for the algorithms tested in terms of precision, recall, and the Kappa statistic with a warm-up of 10%. In comparison, Figure 10 shows the achieved results with a warm-up of 15%. Similarly, Figures 11 and 12 show the obtained results for warm-ups with samples of 20% and 50% of the original dataset.
From the results, several conclusions can be derived. In general, in all tests, we observed similar performance behaviors. We noticed that the NB, HT, and KNN algorithms, as well as the ensemble methods that used them as base estimators, achieved the worst performance.
For NB, the obtained performance could be due to its (naive) assumption of statistical independence between the input data attributes. Since the time and data consumption of the same OTT applications are related, assuming all the attributes are independent could increase the error rate throughout the classification process. For the classification process, the KNN algorithm maintains a small window of previously seen instances (previous perception data). It looks
among them for the nearest neighbors to the current input. Consequently, the data within the stored window likely does not provide a correct classification for the input. This can also be the case for the ensemble methods that use KNN as the base estimator (LB and OB). The ensemble methods are composed of 10 KNN estimators, and each estimator stores a data window. Therefore, the ensemble method’s performance becomes compromised if such a window does not provide a correct classification for an estimator. As a result, a wrong classification for the input is passed into the voting process.

FIGURE 9. Results – Warm-up with 10%.

FIGURE 10. Results – Warm-up with 15%.

FIGURE 11. Results – Warm-up with 20%.

FIGURE 12. Results – Warm-up with 50%.

Furthermore, the HT algorithm and the OB with the HT as a base estimator exhibited poor performance. This algorithm builds a decision tree while exploiting the following assumptions. (i) A small sample can often be enough to choose an optimal splitting attribute. (ii) The distribution that generates the data does not change over time. These assumptions make the algorithm efficient in terms of computational resources. However, in some instances, they can be a disadvantage because once a node is created in the tree, it cannot be changed anymore [54]. Therefore, once the users present different consumption behaviors, the performance might not achieve the best results since the tree had a warm-up with a small sample, and the nodes built will be unable to classify the subsequent data inputs correctly.
On the other hand, we observed that four algorithms (ARF, LB-ARF, Learn++ with ARF, and MLP) achieved an excellent overall performance. ARF’s excellent performance is likely due to the multiple decision trees that this approach generates and the weighted voting system used for classification. Furthermore, this algorithm also has a warning- and drift-detection method. The warning-detection method tries to detect when possible concept drift occurs in the target class. When a warning is raised, the algorithm starts training another decision tree in the background, alongside the main ensemble. Once the drift-detection method confirms a concept drift in one of the individual decision trees, that tree is replaced with the tree trained in the background to preserve the classification accuracy. The ARF algorithm was used as a base estimator for Learn++ and LB in order to observe if its excellent individual performance could be further improved through ensemble methods. Such improvement was achieved with LB but not with Learn++.
Finally, MLP also achieved an excellent performance, especially after performing several experimental setups to help find the optimal network structure and activation function. We altered the number of hidden layers, the number
of nodes, and the activation function as per [55]. The best results were obtained with a 5-layer network (including the input and output layers). This allowed a fast warm-up stage and low computational resource use. The number of nodes in each layer was set between 1 and 113. The best results were obtained with 113 nodes for the first hidden layer and 40 nodes for the two subsequent layers (113 × 40 × 40). As an activation function, we considered ReLU, hyperbolic tangent, sigmoid, and identity functions. The best results were achieved using the hyperbolic tangent function. Finally, we selected Adam as the optimization algorithm. It was shown to be an efficient extension of stochastic gradient descent [56].

FIGURE 13. Classification performance maintenance setting.

In conclusion, the best results were achieved using the ARF, LB-ARF, and MLP algorithms. This is also evident in Figures 9–12. These algorithms appeared to be suitable for user OTT application consumption behavior classification. However, the transfer of these approaches into real-world applications depends on how well they maintain their accuracy whenever a user consumption change occurs. Therefore, we examined the classification performance maintenance of these algorithms, which will be discussed in the following section.
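The three measurements used throughout this evaluation (precision, recall, and the Kappa statistic) can be computed from paired true and predicted labels as sketched below. The label lists are invented for illustration, and macro-averaging over the classes is an assumption of this sketch, since the averaging scheme is not stated in the text.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with equal class weight."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> frequency
    precisions, recalls = [], []
    for label in labels:
        tp = counts[(label, label)]
        predicted = sum(v for (t, p), v in counts.items() if p == label)
        actual = sum(v for (t, p), v in counts.items() if t == label)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and truth, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    expected = sum(true_freq[l] * pred_freq[l] for l in true_freq) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for ten users.
y_true = ["low"] * 4 + ["medium"] * 3 + ["high"] * 3
y_pred = ["low", "low", "low", "medium",
          "medium", "medium", "high",
          "high", "high", "low"]

precision, recall = macro_precision_recall(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(kappa, 3))
```

In this toy case 7 of 10 predictions are correct, but the Kappa statistic is lower than the raw accuracy because part of that agreement would be expected by chance given the class frequencies.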

3) EVALUATION OF CLASSIFICATION PERFORMANCE MAINTENANCE
We assessed how well the algorithms selected in the previous section maintain their classification performance. We also assessed how their learning rates are affected when a change occurs in users’ OTT consumption behavior.

FIGURE 14. Achieved results for the precision, recall, and Kappa statistics with changing samples.

FIGURE 15. Precision evolution.

First, the algorithms were pre-trained using the entire dataset. Then, we investigated their performance maintenance capabilities through a prequential evaluation by systematically modifying the behaviors of 60 users. This was to simulate a real-world implementation use case where user behavior changes are likely to happen over time. The introduced changes are as follows:
• 20 Low Consumption users had their behavior changed (10 users to Medium Consumption and 10 to High Consumption),
• 20 Medium Consumption users had their behaviors changed (10 users to Low Consumption and 10 to High Consumption), and
• 20 High Consumption users had their behaviors changed (10 users to Low Consumption and 10 users to Medium Consumption).
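The change plan listed above can be checked with a short bookkeeping sketch. It tracks class labels only and does not model how the underlying time and data attributes were modified; the user identifiers and function names are hypothetical. The class sizes are those reported for the preprocessed dataset.

```python
from collections import Counter

CLASS_SIZES = {"low": 510, "medium": 333, "high": 406}

# source class -> {target class: number of users moved}
CHANGE_PLAN = {
    "low": {"medium": 10, "high": 10},
    "medium": {"low": 10, "high": 10},
    "high": {"low": 10, "medium": 10},
}


def build_users(sizes):
    """Create one record per user with a synthetic id and its class label."""
    users = []
    for label, n in sizes.items():
        users.extend({"id": f"{label}-{i}", "label": label} for i in range(n))
    return users


def apply_changes(users, plan):
    """Relabel users according to the plan; returns the ids of changed users."""
    # Snapshot the original members of each class so nobody is moved twice.
    by_class = {label: [u for u in users if u["label"] == label] for label in plan}
    changed = []
    for source, targets in plan.items():
        pool = iter(by_class[source])
        for target, count in targets.items():
            for _ in range(count):
                user = next(pool)
                user["label"] = target
                changed.append(user["id"])
    return changed


users = build_users(CLASS_SIZES)
changed_ids = apply_changes(users, CHANGE_PLAN)
after = Counter(u["label"] for u in users)
print(len(changed_ids), dict(after))
```

Because every class loses 20 users and gains 10 from each of the other two, the class sizes stay the same after the change. Only the membership shifts, which is what a classifier has to track during the prequential run.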
Figure 13 depicts the configuration for this evaluation. The obtained results are shown in Figure 14, while each algorithm’s precision evolution is depicted in Figure 15. From the figures, we can observe that all three algorithms presented an excellent overall performance. However, as Figure 15 shows, MLP is affected considerably by the changes introduced in the users’ behavior at the beginning of the data stream. While the results eventually improve, the adaptation to the changes is slow. On the other hand, both ARF and LB with ARF can maintain their performance throughout the experiment, as shown in Figure 15.
Considering the above, we can conclude that both ARF and the composition of LB and ARF are suitable for performing user OTT application consumption behavior classification. More importantly, these algorithms can provide useful information, even in the event of dynamic user behavior change. Network administrators could make fair use of such an approach. IL can make
network traffic-related managerial tasks faster, more robust, and resource-efficient.

TABLE 5. General overview of common IL algorithms.

FIGURE 16. Time occupation (sec) – Low consumption users.

D. DECISION MAKING
Our case study’s final step was the applicability evaluation of our reference model in a specific scenario. For this purpose, we have selected decision-making as a process focused on the recommendation of a set of personalized service degradation policies. The policies were developed based on the users’ OTT consumption behavior classified using IL. Since service degradation is mostly implemented in mobile networks, the recommendation was developed in the Policy and Charging Control (PCC) architecture scope. Specifically, the concept of a 5G network PCC rule [14] was taken as the basis for service degradation.
The elements considered in the structure of the PCC rule are presented in Table 5. PCC rules can be of two types: predefined or dynamic. Considering the nature of service degradation, we proposed a dynamic PCC rule since it is activated by a specified event (e.g., a user exceeds the consumption limit).
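A dynamic rule of this kind can be pictured with the simplified sketch below. It is not a 3GPP data model: the class, its field names, and the numeric limits are hypothetical, and the two enforcement options it uses (capping the maximum bit rate or closing the gate) are the ones the discussion that follows derives from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimplifiedPccRule:
    """Hypothetical, heavily reduced stand-in for a per-application PCC rule."""
    app: str
    gate_open: bool = True
    max_bit_rate_kbps: Optional[int] = None  # None means no cap is applied


def on_limit_exceeded(rules, keep_apps, degraded_kbps):
    """Degrade the applications in keep_apps and block all the others."""
    for rule in rules:
        if rule.app in keep_apps:
            rule.max_bit_rate_kbps = degraded_kbps
        else:
            rule.gate_open = False
    return rules


def enforce(rules, used_bytes, limit_bytes, keep_apps, degraded_kbps=512):
    """Event trigger: the rules change only when the consumption limit is exceeded."""
    if used_bytes > limit_bytes:
        on_limit_exceeded(rules, keep_apps, degraded_kbps)
        return True
    return False


# Hypothetical user with three applications and a 5 GB limit.
rules = [SimplifiedPccRule(app) for app in ("Youtube", "Spotify", "Dropbox")]
triggered = enforce(
    rules,
    used_bytes=6 * 10**9,
    limit_bytes=5 * 10**9,
    keep_apps={"Youtube", "Spotify"},
)
for rule in rules:
    print(rule)
```

The point of the sketch is the event-driven activation: nothing changes while the user stays under the limit, and the per-application outcome depends on which applications are placed in `keep_apps` for that user's consumption group.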
In the 3GPP technical specification [14], the ARP is defined by the 5QI parameter. It logically follows that if a QoS flow has a GBR (5QI 1 to 4) or a non-GBR (5QI 5 to 9), there are only two possibilities for defining a service degradation policy (see Table 5). Either (i) degrade the service by affecting the Maximum Bit Rate (MBR) associated with the SDF or (ii) block the service by modifying the Gate status parameter.
Considering that 56 OTT applications were identified in the dataset, we focused on identifying which applications were most commonly consumed in each user group. With this in mind, the 20 most used applications were identified in time and data occupation. Figures 16-18 illustrate the 20 applications most commonly used in terms of time occupation for low, medium, and high consumption users, respectively. Similarly, Figures 19-21 illustrate the 20 most used applications in data occupation for the three groups. Essentially, a larger time occupation does not imply higher data occupation. Therefore, our recommendation considers that data occupation is the main indicator for determining how much the users’ consumption demands network resources.
Figures 16-21 led to the following observations. Generally, the applications that demanded the most data occupation from highest to lowest were HTTP (browsing), GoogleDocs, Youtube, Gmail, Google, GoogleDrive, and AmazonVideo (7 applications). The Google application was used most of the time by the low consumption users, with 10.26 hours (see Figure 16). For the medium consumption users, Google remained the application where users spent the most time with 77.19 hours (see Figure 17). High consumption users exhibited a large amount of time spent on Google with 18.52 hours, as shown in Figure 18. Low consumption users exhibited their largest data occupation on HTTP and Google
docs (see Figure 19). For medium consumption users, Figure 20 shows that the applications that required the most network resources were Youtube, GoogleDocs, Gmail, GoogleDrive, GoogleHangoutDuo, Spotify, Twitter, and AmazonVideo (8 applications). In terms of data occupation (see Figure 21), high consumption users consumed the most resources using Youtube, GoogleDrive, GoogleHangoutDuo, Amazon, Facebook, Google, WhatsApp, and HTTP (8 applications).

FIGURE 17. Time occupation (sec) – Medium consumption users.

FIGURE 18. Time occupation (sec) – High consumption users.

FIGURE 19. Data occupation (bytes) – Low consumption users.

FIGURE 20. Data occupation (bytes) – Medium consumption users.

FIGURE 21. Data occupation (bytes) – High consumption users.

Considering the above observations, when a user exceeds their allowed consumption limit, we recommend, for all user groups, degrading the most commonly used applications. In other terms, when a service degradation status is identified, the MBR parameter is modified for the most commonly used applications, while the rest are blocked by closing the Gate Status. This way, the user can keep using the applications that are consumed most frequently. Simultaneously, the network operator can save network resources by blocking the rest of the applications, which the user accesses less frequently. To incorporate fairness in our proposal, we also take into consideration the consumption profiles of the users that exceed the limits. Consequently, the personalized service degradation policies for each group are defined as follows:
• Low consumption group – HTTP, GoogleDocs, Youtube, Gmail, Google, GoogleDrive, and AmazonVideo will be degraded while the rest will be blocked.
• Medium consumption group – Youtube, GoogleDocs, Gmail, GoogleDrive, GoogleHangoutDuo, Spotify, Twitter, and AmazonVideo will be degraded while the rest will be blocked.
• High consumption group – Youtube, GoogleDrive, GoogleHangoutDuo, Amazon, Facebook, Google,
WhatsApp, and HTTP will be degraded while the rest will be blocked.
Table 6 shows a summary of our recommendations for the set of personalized service degradation policies, separated by group of users.

TABLE 6. Policies recommendation – DS (degrade service) - BS (block service).

E. SUMMARY
This case study was aimed at demonstrating the applicability of our reference model when assisting network operators in critical decision-making processes. The network administrators execute decisions based on the model that guides them through several tasks and processes. Following the model, they eventually achieve the ultimate goal of maintaining user experience and reliable network performance. The defined representation model helped to comprehend users’ OTT consumption behavior and ensure that data preprocessing requirements are met. The decision-making capabilities of the architecture were shown to be efficient. The capabilities and applicability of our approach extend far beyond this case study. Several components, tasks, and processes can be customized and modified to suit the network, services, and user needs. As a result, our model is highly flexible, adaptable, and replicable.

V. CONCLUSION
Service degradation is a standard method to address unfair consumption behaviors. Unfortunately, degradation policies lack configuration at a finer granularity. Consequently, degradation is applied globally instead of on a user-by-user basis. As a result, consumption patterns and habits are rarely incorporated into the decision-making process. With this work, we aim to address this limitation.
We proposed a reference model and showed that it can help to achieve smart service degradation. It enables the incorporation of user consumption behavior when defining and applying service degradation policies. The proposed model provides a guideline for users’ OTT consumption behavior classification based on IL. We defined its main components, actors, and workflow. To the best of our knowledge, this work is the first effort in this matter. It can be regarded as a guideline for understanding OTT applications and their demands on network technologies to propose more efficient solutions. As part of our case study evaluation, we created two datasets. We publish them both via [57], [58] to support the reproducibility of research in the domain.
In future work, we plan to automate the definition process of personalized service degradation policies. We also plan to implement ARF and LB/ARF in a real-world network monitoring architecture. Furthermore, our plans also include tests focused on incremental Support Vector Machines.

ACKNOWLEDGMENT
The authors would like to thank Universidad del Cauca for supporting this research.

REFERENCES
[1] J. K. MacKie-Mason and H. R. Varian, “Some FAQs about usage-based pricing,” Comput. Netw. ISDN Syst., vol. 28, no. 1, pp. 257–265, Dec. 1995, doi: 10.1016/0169-7552(95)00096-1.
[2] J. Krämer and L. Wiewiorra, “Data caps and two-sided pricing: Evaluating managed service business models,” presented at the Eur. Conf. Inf. Syst. (ECIS), Tel Aviv, Israel, 2014. [Online]. Available: https://aisel.aisnet.org/ecis2014/proceedings/track10/10/
[3] J. Wortham, “As mobile networks speed up, data gets capped,” New York Times, Aug. 14, 2011.
[4] M. Lasar. (Apr. 4, 2011). It could be worse: Data caps around the world. Ars Technica. [Online]. Available: https://arstechnica.com/tech-policy/news/2011/04/how-internet-users-are-disciplined-around-the-world.ars
[5] M. Chetty, R. Banks, A. J. Brush, J. Donner, and R. Grinter, “You’re capped: Understanding the effects of bandwidth caps on broadband use in the home,” in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., New York, NY, USA, 2012, pp. 3021–3030, doi: 10.1145/2207676.2208714.
[6] M. Chetty, H. Kim, S. Sundaresan, S. Burnett, N. Feamster, and W. K. Edwards, “uCap: An Internet data management tool for the home,” in Proc. 33rd Annu. ACM Conf. Hum. Factors Comput. Syst., 2015, pp. 3093–3102, doi: 10.1145/2702123.2702218.
[7] Ixia. (2013). Quality of Service (QoS) and Policy Management in Mobile Data Networks. [Online]. Available: https://www.celemetrix.co.nz/assets/images/Ixia%20QoS.pdf
[8] T. Sudtasan and H. Mitomo, “Effects of OTT services on consumer’s willingness to pay for optical fiber broadband connection in Thailand,” presented at the 27th Eur. Regional Conf. Int. Telecommun. Soc. (ITS), Evol. North-South Telecommun. Divide, Role Europe, Cambridge, U.K., 2016. [Online]. Available: https://www.econstor.eu/handle/10419/148709


[9] S. Ponmaniraj, R. Rashmi, and M. V. Anand, “IDS based network security architecture with TCP/IP parameters using machine learning,” in Proc. Int. Conf. Comput., Power Commun. Technol. (GUCON), Sep. 2018, pp. 111–114, doi: 10.1109/GUCON.2018.8674974.
[10] L. Zhang, X. Meng, and H. Zhou, “Network fault diagnosis using hierarchical SVMs based on kernel method,” in Proc. 2nd Int. Workshop Knowl. Discovery Data Mining, Jan. 2009, pp. 753–756, doi: 10.1109/WKDD.2009.79.
[11] S. Troia, D. E. Martinez, I. Martin, L. M. M. Zorello, G. Maier, J. A. Hernandez, O. G. de Dios, M. Garrich, J. L. Romero-Gazquez, F.-J. Moreno-Muro, P. P. Marino, and R. Casellas, “Machine learning-assisted planning and provisioning for SDN/NFV-enabled metropolitan networks,” in Proc. Eur. Conf. Netw. Commun. (EuCNC), Jun. 2019, pp. 438–442, doi: 10.1109/EuCNC.2019.8801956.
[12] M. Shafiq, X. Yu, A. A. Laghari, L. Yao, N. K. Karn, and F. Abdessamia, “Network traffic classification techniques and comparative analysis using machine learning algorithms,” in Proc. 2nd IEEE Int. Conf. Comput. Commun. (ICCC), Oct. 2016, pp. 2451–2455, doi: 10.1109/CompComm.2016.7925139.
[13] J. S. Rojas, Á. R. Gallón, and J. C. Corrales, “Personalized service degradation policies on OTT applications based on the consumption behavior of users,” in Computational Science and Its Applications—ICCSA (Lecture Notes in Computer Science), vol. 10962. Cham, Switzerland: Springer, 2018, doi: 10.1007/978-3-319-95168-3_37.
[14] Policy and Charging Control Framework for the 5G System (5GS), document 123.503, 3GPP Technical Specification (TS), Sep. 2020. [Online]. Available: https://www.etsi.org/deliver/etsi_ts/123500_123599/123503/16.05.01_60/ts_123503v160501p.pdf
[15] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, “Systematic mapping studies in software engineering,” in Proc. 12th Int. Conf. Eval. Assessment Softw. Eng., Swinton, U.K., 2008, pp. 68–77. [Online]. Available: http://dl.acm.org/citation.cfm?id=2227115.2227123
[16] M. V. Vinupaul, R. Bhattacharjee, R. Rajesh, and G. S. Kumar, “User characterization through network flow analysis,” in Proc. Int. Conf. Data Sci. Eng. (ICDSE), Aug. 2016, pp. 1–6, doi: 10.1109/ICDSE.2016.7823965.
[17] A. Hess, I. Marsh, and D. Gillblad, “Exploring communication and mobility behavior of 3G network users and its temporal consistency,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2015, pp. 5916–5921, doi: 10.1109/ICC.2015.7249265.
[18] D. Rajashekar, A. N. Zincir-Heywood, and M. I. Heywood, “Smart phone user behaviour characterization based on autoencoders and self organizing maps,” in Proc. IEEE 16th Int. Conf. Data Mining Workshops (ICDMW), Dec. 2016, pp. 319–326, doi: 10.1109/ICDMW.2016.0052.
[19] A. Bakhshandeh and Z. Eskandari, “An efficient user identification approach based on netflow analysis,” in Proc. 15th Int. ISC, Iranian Soc. Cryptol., Conf. Inf. Secur. Cryptol. (ISCISC), Aug. 2018, pp. 1–5, doi: 10.1109/ISCISC.2018.8546856.
[27] A. Pekar. (2020). Drnpkr/FlowRecorder. GitHub. [Online]. Available: https://github.com/drnpkr/flowRecorder
[28] Z. Aouini. (2020). Nfstream/Nfstream. GitHub. [Online]. Available: https://github.com/nfstream/nfstream
[29] M. Charrad, N. Ghazzali, V. Boiteau, and A. Niknafs, “NbClust: An R package for determining the relevant number of clusters in a data set,” J. Stat. Softw., vol. 61, no. 6, 2014, Art. no. 1, doi: 10.18637/jss.v061.i06.
[30] K. Djouzi and K. Beghdad-Bey, “A review of clustering algorithms for big data,” in Proc. Int. Conf. Netw. Adv. Syst. (ICNAS), Jun. 2019, pp. 1–6, doi: 10.1109/ICNAS.2019.8807822.
[31] M. Hassani, “Overview of efficient clustering methods for high-dimensional big data streams,” in Clustering Methods for Big Data Analytics: Techniques, Toolboxes and Applications, O. Nasraoui and C.-E. B. N’Cir, Eds. Cham, Switzerland: Springer, 2019, pp. 25–42.
[32] G. Upton and I. Cook, Understanding Statistics, Illustrated. Oxford, U.K.: OUP Oxford, 1996.
[33] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002.
[34] P. Joshi, “Incremental learning: Areas and methods—A survey,” Int. J. Data Mining Knowl. Manage. Process, vol. 2, no. 5, pp. 43–51, Sep. 2012, doi: 10.5121/ijdkp.2012.2504.
[35] R. R. Ade and P. R. Deshmukh, “Methods for incremental learning: A survey,” Int. J. Data Mining Knowl. Manage. Process, vol. 3, no. 4, pp. 119–125, Jul. 2013, doi: 10.5121/ijdkp.2013.3408.
[36] V. Lemaire, C. Salperwyck, and A. Bondu, “A survey on supervised classification on data streams,” in Proc. 4th Eur. Bus. Intell. Summer School (eBISS), in Tutorial Lectures, Berlin, Germany, E. Zimányi and R.-D. Kutsche, Eds. Cham, Switzerland: Springer, Jul. 2014, pp. 88–125.
[37] J. Zhong, Z. Liu, Y. Zeng, L. Cui, and Z. Ji, “A survey on incremental learning,” presented at the 5th Int. Conf. Comput., Automat. Power Electron., Nov. 2017. [Online]. Available: https://webofproceedings.org/proceedings_series/article/artId/777.html
[38] Q. Yang, Y. Gu, and D. Wu, “Survey of incremental learning,” in Proc. Chin. Control Decis. Conf. (CCDC), Jun. 2019, pp. 399–404, doi: 10.1109/CCDC.2019.8832774.
[39] S. Madhavan and N. Kumar, “Incremental methods in face recognition: A survey,” Artif. Intell. Rev., Aug. 2019, doi: 10.1007/s10462-019-09734-3.
[40] J. Montiel, J. Read, A. Bifet, and T. Abdessalem, “Scikit-multiflow: A multi-output streaming framework,” J. Mach. Learn. Res., vol. 19, no. 72, pp. 1–5, 2018.
[41] A. Chefrour, “Incremental supervised learning: Algorithms and applications in pattern recognition,” Evol. Intell., vol. 12, no. 2, pp. 97–112, Jun. 2019, doi: 10.1007/s12065-019-00203-y.
[42] G. Hulten, L. Spencer, and P. Domingos, “Mining time-changing data streams,” in Proc. 7th ACM SIGKDD Int. Conf. Knowl. Discovery
[20] S. Zhao, Y. Xu, X. Ma, Z. Jiang, Z. Luo, S. Li, L. T. Yang, A. Dey, and
G. Pan, ‘‘Gender profiling from a single snapshot of apps installed on Data Mining, San Francisco, CA, USA, Aug. 2001, pp. 97–106, doi:
a smartphone: An empirical study,’’ IEEE Trans. Ind. Informat., vol. 16, 10.1145/502512.502529.
no. 2, pp. 1330–1342, Feb. 2020, doi: 10.1109/TII.2019.2938248.
[43] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, ‘‘MOA: Massive online
[21] I. Paul, A. Khattar, P. Kumaraguru, M. Gupta, and S. Chopra,
analysis,’’ J. Mach. Learn. Res., vol. 11, no. 52, pp. 1601–1604, 2010.
‘‘Elites tweet? Characterizing the Twitter verified user network,’’
[44] N. C. Oza, ‘‘Online bagging and boosting,’’ in Proc. IEEE Int.
Mar. 2019, arXiv:1812.09710. Accessed: Jul. 16, 2020. [Online]. Avail-
Conf. Syst., Man Cybern., vol. 3, Oct. 2005, pp. 2340–2345, doi:
able: http://arxiv.org/abs/1812.09710
10.1109/ICSMC.2005.1571498.
[22] R. Hofstede, P. Celeda, B. Trammell, I. Drago, R. Sadre, A. Sperotto, and
A. Pras, ‘‘Flow monitoring explained: From packet capture to data analysis [45] A. Bifet, G. Holmes, and B. Pfahringer, ‘‘Leveraging bagging for evolv-
with NetFlow and IPFIX,’’ IEEE Commun. Surveys Tuts., vol. 16, no. 4, ing data streams,’’ in Machine Learning and Knowledge Discovery in
pp. 2037–2064, 4th Quart., 2014, doi: 10.1109/COMST.2014.2321898. Databases. ECML PKDD (Lecture Notes in Computer Science), vol. 6321,
[23] T. Bujlow, V. Carela-Español, and P. Barlet-Ros, ‘‘Independent comparison J. L. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, Eds. Berlin, Germany:
of popular DPI tools for traffic classification,’’ Comput. Netw., vol. 76, Springer, 2010, doi: 10.1007/978-3-642-15880-3_15.
pp. 75–89, Jan. 2015, doi: 10.1016/j.comnet.2014.11.001. [46] H. M. Gomes, A. Bifet, J. Read, J. P. Barddal, F. Enembreck, B. Pfharinger,
[24] D. D. Clark, C. Partridge, J. C. Ramming, and J. T. Wroclawski, ‘‘A knowl- G. Holmes, and T. Abdessalem, ‘‘Adaptive random forests for evolv-
edge plane for the Internet,’’ in Proc. Conf. Appl., Technol., Archit., Pro- ing data stream classification,’’ Mach. Learn., vol. 106, nos. 9–10,
tocols Comput. Commun., Karlsruhe, Germany, Aug. 2003, pp. 3–10, doi: pp. 1469–1495, Oct. 2017, doi: 10.1007/s10994-017-5642-8.
10.1145/863955.863957. [47] R. Polikar, L. Upda, S. S. Upda, and V. Honavar, ‘‘Learn++: An incre-
[25] A. Mestres, A. Rodriguez-Natal, J. Carner, P. Barlet-Ros, E. Alarcón, mental learning algorithm for supervised neural networks,’’ IEEE Trans.
M. Solé, V. Muntés-Mulero, D. Meyer, S. Barkai, M. J. Hibbett, G. Estrada, Syst., Man, Cybern. C, Appl. Rev., vol. 31, no. 4, pp. 497–508, Nov. 2001,
K. Ma’ruf, F. Coras, V. Ermagan, H. Latapie, C. Cassar, J. Evans, F. Maino, doi: 10.1109/5326.983933.
J. Walrand, and A. Cabellos, ‘‘Knowledge-defined networking,’’ ACM [48] R. Lippmann, ‘‘An introduction to computing with neural nets,’’
SIGCOMM Comput. Commun. Rev., vol. 47, no. 3, pp. 2–10, Sep. 2017, IEEE ASSP Mag., vol. 4, no. 2, pp. 4–22, Apr. 1987, doi:
doi: 10.1145/3138808.3138810. 10.1109/MASSP.1987.1165576.
[26] J. S. Rojas. (2020). Jsrojas/FlowLabeler. GitHub. [Online]. Available: [49] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge,
https://github.com/jsrojas/FlowLabeler. MA, USA: MIT Press, 2016.


[50] A. Cervantes, C. Gagné, P. Isasi, and M. Parizeau, ‘‘Evaluating and characterizing incremental learning from non-stationary data,’’ Jun. 2018, arXiv:1806.06610. [Online]. Available: http://arxiv.org/abs/1806.06610
[51] D. M. W. Powers. (Dec. 2007). Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. [Online]. Available: https://www.researchgate.net/publication/228529307_Evaluation_From_Precision_Recall_and_F-Factor_to_ROC_Informedness_Markedness_Correlation
[52] M. Banerjee, M. Capozzoli, L. McSweeney, and D. Sinha, ‘‘Beyond kappa: A review of interrater agreement measures,’’ Can. J. Statist., vol. 27, no. 1, pp. 3–23, Mar. 1999, doi: 10.2307/3315487.
[53] (Oct. 23, 2020). Scikit-Multiflow’s Documentation. Accessed: Sep. 23, 2020. [Online]. Available: https://scikit-multiflow.github.io/scikit-multiflow/
[54] P. K. Srimani and M. M. Patil, ‘‘Performance analysis of Hoeffding trees in data streams by using massive online analysis framework,’’ Int. J. Data Mining Model. Manage., vol. 7, no. 4, pp. 293–313, 2015.
[55] D. Stathakis, ‘‘How many hidden layers and nodes?’’ Int. J. Remote Sens., vol. 30, no. 8, pp. 2133–2147, Apr. 2009, doi: 10.1080/01431160802549278.
[56] D. P. Kingma and J. Ba, ‘‘Adam: A method for stochastic optimization,’’ Jan. 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[57] (Oct. 29, 2020). Labeled Network Traffic Flows—114 Applications. Accessed: Sep. 29, 2020. [Online]. Available: https://kaggle.com/jsrojas/labeled-network-traffic-flows-114-applications
[58] (Oct. 29, 2020). User OTT Consumption Profile—2019. Accessed: Sep. 29, 2020. [Online]. Available: https://kaggle.com/jsrojas/user-ott-consumption-profile-2019

JUAN SEBASTIÁN ROJAS received the bachelor’s degree in electronics and telecommunications engineering and the M.Sc. degree in telematics engineering from the Universidad del Cauca, Colombia, in 2015 and 2018, respectively. He is currently pursuing the Ph.D. degree in telematics engineering. He is also an Active Researcher with the Telematics Engineering Group, Universidad del Cauca. Since 2016, he has held a Ph.D. scholarship granted by MinCiencias, the lead entity of the science, technology, and innovation system in Colombia. Furthermore, he has been an Assistant Professor in several courses related to converged services, data mining, and machine learning. His research topic involves the study of users’ Over The Top consumption behavior aimed at the personalization of service degradation through incremental learning supported by the Knowledge Defined Networking paradigm.

ADRIAN PEKAR received the Ph.D. degree in computer science from the Technical University of Košice, Slovakia, in 2014. He is currently an Assistant Professor with the Department of Networked Systems and Services, Budapest University of Technology and Economics, Hungary. Prior to this, he held research, teaching, and engineering positions in New Zealand and Slovakia. His research interests include network traffic classification and management, traffic measurement data reduction and visualization, and stream processing.

ÁLVARO RENDÓN (Senior Member, IEEE) received the B.S. degree in electronics engineering and the M.S. degree in telematics engineering from the Universidad del Cauca, Colombia, in 1979 and 1989, respectively, and the Ph.D. degree in telecommunications engineering from the Technical University of Madrid, in 1997. He is currently a Full Professor with the Department of Telematics, University of Cauca, where he is also the Director of the Doctoral Program in Telematics Engineering. His research interests include telecommunications services, e-health, and S&T management. He was a recipient of the Outstanding Paper Award in the category of Tools Track at the First IEEE International Conference on Engineering of Complex Computer Systems in 1995.

JUAN CARLOS CORRALES received the bachelor’s and master’s degrees in telematics engineering from the University of Cauca, Colombia, in 1999 and 2004, respectively, and the Ph.D. degree in sciences, specialty computer science, from the University of Versailles Saint-Quentin-en-Yvelines, France, in 2008. He is currently a full-time Professor and leads the Telematics Engineering Group, University of Cauca. His research interests include service composition and data analysis.