
Semi-Supervised Multivariate Statistical Network

This document presents a semi-supervised approach for intrusion detection that extends an unsupervised multivariate statistical network monitoring approach based on principal component analysis. It introduces a supervised optimization technique to learn the optimum scaling of input data to detect targeted threats. Specifically, it uses an extension of the gradient descent method based on partial least squares to optimize the feature scaling for detecting a specific, targeted anomaly. This makes the approach semi-supervised by enhancing the original unsupervised model to be optimized for detecting particular security threats. The method is demonstrated on a real case study, showing improvements to detection performance and interpretability of attacks.


IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 14, NO. 8, AUGUST 2019 2179

Semi-Supervised Multivariate Statistical Network Monitoring for Learning Security Threats
José Camacho, Gabriel Maciá-Fernández, Noemí Marta Fuentes-García, and Edoardo Saccenti

Abstract: This paper presents a semi-supervised approach for intrusion detection. The method extends the unsupervised multivariate statistical network monitoring approach based on principal component analysis by introducing a supervised optimization technique to learn the optimum scaling of the input data. It combines the advantages of the unsupervised strategy, capable of uncovering new threats, with those of supervised strategies, capable of learning the pattern of a targeted threat. The supervised learning is based on an extension of the gradient descent method based on partial least squares (PLS). Moreover, we enhance this method by using sparse PLS variants. The practical application of the system is demonstrated on a recently published real case study, showing relevant improvements in detection performance and in the interpretation of the attacks.

Index Terms: Multivariate statistical network monitoring, anomaly detection, intrusion detection, semi-supervised learning, partial least squares regression, principal components analysis.

Manuscript received June 18, 2018; revised October 11, 2018 and November 26, 2018; accepted January 14, 2019. Date of publication January 29, 2019; date of current version May 16, 2019. This work was supported in part by the Spanish Ministry of Economy and Competitiveness and FEDER funds under Grant TIN2014-60346-R and Grant TIN2017-83494-R. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Guofei Gu. (Corresponding author: José Camacho.)
J. Camacho, G. Maciá-Fernández, and N. M. Fuentes-García are with the Department of Signal Theory, Telematics and Communications, School of Computer Science and Telecommunications, CITIC-University of Granada, 18014 Granada, Spain (e-mail: [email protected]).
E. Saccenti is with the Laboratory of Systems and Synthetic Biology, Wageningen University and Research, 6708 PB Wageningen, The Netherlands.
Digital Object Identifier 10.1109/TIFS.2019.2894358
1556-6013 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

CYBERSECURITY incidents are considered one of the most relevant threats for businesses in almost any market. According to the VERIZON annual ‘Data Breach Investigation Report’ (DBIR) [1], tens of thousands of attacks targeted private and public companies during 2017. To effectively fight against this real menace, the security industry has identified that an essential line of defense should be based on the joint effort of all stakeholders combining their technical skills and information [2]. In this regard, the use of anomaly-based Intrusion Detection Systems (IDS) [3] is fundamental to unveil previously unknown attack strategies and thwart potential attacks on organizations.

Main technical challenges in the field of IDS design are the need for handling massive and disparate sources of information, the extraction of useful knowledge for the forensic analysis of incident data and the optimization of the detection system to targeted threats. Dealing with very different sources of information and a vast amount of data has fostered the use of machine learning and data mining techniques [4]. However, it becomes essential to utilize tools and approaches that provide interpretability of results, that is, information about the features and the real cause of an attack in order to efficiently respond to it. Regretfully, the bulk of machine learning and data mining approaches does not satisfy this requirement. Regarding the optimization to targeted threats, proper system update is a relevant feature, in particular to make the most of information sharing technologies that distribute warnings among corporations when new threats arise. The adaptation to targeted threats of machine learning models is more technically challenging than that of traditional rule-based systems (e.g. antivirus, rule-based IDS like snort, correlation engines, etc.), which are more widespread in industry.

Multivariate Analysis has been recognized as an outstanding approach for anomaly detection in several domains, including industrial monitoring [5] and networking [6]. In the field of industrial processing, a well-developed strategy is Multivariate Statistical Process Control (MSPC). A main tool within MSPC is Principal Component Analysis (PCA), which was proposed by Lakhina et al. [7] for intrusion detection. Some of the benefits of PCA are its unsupervised nature, which does not require any a priori knowledge of the data, and its capability to provide diagnosis information for a given anomaly, a main advantage over other machine learning methodologies. This diagnosis support allows shortening the lag from detection to response in a security incident, and therefore has practical implications.

In previous work [8], we introduced a methodology named Multivariate Statistical Network Monitoring (MSNM), an extension of PCA-MSPC applied to the intrusion detection problem. MSNM overcomes some reported limitations of the original PCA approach of Ringberg et al. [9]. It is based on an over-parameterization of the feature space, that is, on defining a large number of data features within the detection system. This combines with the multivariate approach based on PCA, which can handle high-dimensional data with millions of variables. However, an open problem and main limitation in MSNM is how to select the relative relevance of the features in the system. Experiments in [8] showed high sensitivity of the detection to the relative relevance (scaling) of the features in PCA. We can make the most of this sensitivity to enhance the detection ability for a set of targeted threats. In particular, defining an optimum scaling of the features to identify the pattern of a recently identified threat is useful to update the monitoring system. This would enhance MSNM
with an adaptation mechanism equivalent to what is done in a traditional rule-based detection system when adding rules with the fingerprint of recently identified threats.

This paper presents a new procedure to optimize the scaling of the features in MSNM for the detection of targeted threats. This makes MSNM a semi-supervised learning approach, where the original unsupervised PCA model is enhanced to be optimum for the detection of specific anomalies. For this purpose, we employ and improve an optimization algorithm [10], [11] originally introduced in the context of process optimization. This algorithm is based on Partial Least Squares (PLS) [12], [13], a multivariate regression technique. We refer to this algorithm as run-to-run PLS (R2R-PLS). The R2R-PLS has been shown to outperform state-of-the-art optimization techniques, like genetic algorithms, in large search spaces [10].

The contributions of this paper are the following:
• We cast the problem of adapting an anomaly detection system to targeted threats into an optimization problem. In particular, we apply an optimization scheme in the context of anomaly detection with MSNM, overcoming one of its principal limitations: the sensitivity of detection to the scaling of the features. While this is especially interesting in the context of intrusion detection, this contribution also applies to different application domains [5].
• For this purpose, we extend the original R2R-PLS in [10] and [11] to a general optimization problem, simplifying its implementation and improving its computational efficiency.
• We further extend the R2R-PLS optimization to sparse PLS variants, very popular in biological sciences [14], leading to an improvement of the optimization performance and in the understanding of the connection between the scale of the features and the MSNM detection.
• We demonstrate previous contributions in a recently published real case study [15].

The rest of the paper is organized as follows. Section 2 discusses related work to this paper. Section 3 introduces the MSNM technique. Section 4 reviews PLS and two sparse variants. Section 5 presents our particular implementation of the R2R-PLS algorithm. Its application in the adaptation of MSNM to new threats is presented in Section 6. Section 7 demonstrates the application of the approach to the real case study. Section 8 draws some concluding remarks.

II. RELATED WORK

Machine learning (ML) techniques have been widely applied to cybersecurity problems. ML refers to the combination of statistics and artificial intelligence to learn a model from data [4], [16], [17]. This is a global term widely used to refer to the task in which one automatically calibrates (trains) a model or algorithm to obtain a descriptive output for a given input. If the value for the output is previously known and used in the training, then the learning is called supervised and it usually applies to classification and regression problems. However, if only the input data are known and the objective is to extract patterns or common behavior from the data, the learning is known as unsupervised [16], [18]. Mixed approaches are considered to be semi-supervised learning [18]. Support vector machines, neural networks or decision trees are common examples of supervised learning, though extensions exist in the unsupervised setting. Factorization methods like PCA and clustering algorithms such as K-means are often applied for unsupervised analyses and often combined with supervised methods.

In cybersecurity, unsupervised ML methods are applied to the anomaly detection problem, a.k.a. intrusion detection problem, while supervised methods can be used to detect and classify previously observed attacks. In this context, the use of PCA was proposed more than a decade ago [19]. Due to its unsupervised nature, PCA does not require (and is not limited by) an a priori specification of potential anomalies in the system. This means that PCA is still useful to detect new types of anomalies, something mandatory in real world anomaly detection. The most cited work for PCA anomaly detection is that of Lakhina et al. [7], from which alternative proposals have been developed [20]–[23]. One of them, the MSNM methodology [8], is the base of the approach of this paper. This methodology allows combining traffic data with other security data sources, demonstrating high detection capabilities and with the advantage of providing diagnosis support [24].

The learning process in ML yields a model from a training data set. This learning process is also referred to as model calibration, and it is generally performed by optimizing the model parameters in consecutive steps until convergence [4], [17]. The calibration of the parameters is performed following optimization strategies, which aim to find, at least, local optima for their values and, hopefully, global solutions [25]. These approaches can be classified as follows [16]: stochastic approximation methods, expectation-maximization methods and greedy optimization. Within stochastic optimization, gradient descent methods apply the derivative of the optimization function to obtain the direction of search. In recent papers [10], [11] a variation of this type of optimization based on Partial Least Squares (PLS) was presented. PLS is a particular form of regression model suitable to handle high dimensional data sets. For this reason, the PLS-based optimization is especially suited to solve optimization problems where the search space is of high dimensionality. Therefore, it is a practical choice in ML to optimize a large number of model parameters.

While PLS is well suited to model high dimensional data, a recent trend of research has explored the ability of a type of methods that perform PLS regression combined with variable selection. These are generally referred to as sparse PLS (SPLS) methods. Several variants of SPLS have been proposed [14], [26]–[29] with the goal of performing variable selection during model calibration, discarding non-informative variables. Results show that SPLS variants are more stable and often yield improved performance in very high dimensional set-ups.

This paper presents a semi-supervised approach for intrusion detection. The method extends the unsupervised MSNM approach by introducing a supervised optimization technique to learn the optimum scaling of the features. It inherits the
advantages of the unsupervised strategy, capable of uncovering new threats, with those of supervised strategies, capable of learning the pattern of a given threat. Considering that the number of features of MSNM corresponds to the dimension of the search space in the optimization problem, and that this number can be very large, we apply the PLS-based optimization for the supervised learning. Furthermore, we extend this optimization technique using sparse variants of PLS, improving the learning ability and model interpretability.

III. MULTIVARIATE STATISTICAL NETWORK MONITORING

The MSNM follows 4 main steps: 1) Parsing, 2) Fusion, 3) Detection, and 4) Diagnosis. The first three steps are equivalent to what is commonly done in other machine learning methodologies. However, step 4 is a main advantage in MSNM. This step is possible thanks to the white-box, exploratory characteristics of PCA as the core of the approach. PCA is a linear model and as such it is easy to interpret in terms of the connection between anomalies and features, something much more complicated in the non-linear machine learning variants.

A. Parsing

The information captured from a network is usually presented in the form of system logs or network traces, and cannot be directly used to feed a typical tool for anomaly detection. Therefore, some sort of parsing and feature engineering needs to be done in order to generate quantitative features that can be used for data modeling.

Lakhina et al. [7] proposed the definition of counters obtained from Netflow records as quantitative features for anomaly detection using PCA. In [30], we generalized this definition to consider several sources of data, proposing the feature-as-a-counter approach. Each feature contains the number of times a given event (e.g. number of packets sent from public IPs or number of flows associated to destination port 80) takes place during a given time window. The parsing transforms the raw data into a stream of features, where each time interval of e.g. 1 minute is represented by a feature vector of counts.

B. Fusion

The feature-as-a-counter definition simplifies the fusion of different data sources in a single set of features. For each different source of data, a set of features (counters) is defined. The sampling rate for each source may be different, due to the specific dynamics of the source. Thus, to combine the features from different sources these need to be stretched/compressed to a common sampling rate, yielding a unique matrix of data of high dimensionality. In practice, when possible, all sources are parsed at the same time rate, so that the fusion operation is done by simply appending the features associated to the different sources.

The combination of the feature-as-a-counter and the fusion procedure is especially suited for the subsequent multivariate analysis. It yields high dimensional feature vectors that need to be analyzed with dimension reduction techniques, like PCA. The diagnosis procedure benefits from the definition of a large number of features for a better description of the anomaly taking place. Furthermore, counters and their correlation are easy to interpret.

C. Detection

The core of MSNM is PCA. PCA is applied to data sets where $M$ variables or features are measured on $N$ observations with the aim of finding the subspace of maximum variance in the $M$-dimensional feature space. The original features are linearly transformed into the Principal Components (PCs), which are the eigenvectors of $XX := X^T \cdot X$, typically for mean centred (MC) $X$ and sometimes also after auto-scaling (AS, normalization to unit variance).

The PCA model follows the expression:

$$X = T_A \cdot P_A^t + E_A \tag{1}$$

where $A$ is the number of PCs, $T_A$ is the $N \times A$ score matrix, $P_A$ is the $M \times A$ loading matrix and $E_A$ is the $N \times M$ matrix of residuals.

For each observation, corresponding to a feature vector collected in a given time interval, the corresponding score vector is computed as follows:

$$t_n = x_n \cdot P_A \tag{2}$$

where $x_n$ is a $1 \times M$ vector representing the observation and $t_n$ a $1 \times A$ vector with the corresponding scores, while

$$e_n = x_n - t_n \cdot P_A^t \tag{3}$$

corresponds to the residuals.

For the detection of anomalies in MSNM, a pair of charts are monitored: the Q-statistic (Q-st), which compresses the residuals; and the D-statistic (D-st) or Hotelling's $T^2$ statistic, computed from the scores. The D-st and the Q-st for an observation can be computed from the following equations:

$$D_n = t_n \cdot (\Sigma_T)^{-1} \cdot t_n^t \tag{4}$$

$$Q_n = e_n \cdot e_n^t \tag{5}$$

where $\Sigma_T$ represents the covariance matrix of the scores in the calibration data.

With the statistics computed from the calibration data, upper control limits (UCL), i.e. detection thresholds, can be established in the charts at a certain confidence level [31] to decide if future events are anomalous. A straightforward approach to define the UCLs is by using percentiles over the statistics computed from the calibration data $X$. Once the system is calibrated and control limits computed, it can be applied to incoming data/traffic. An anomaly is identified when either the D-st or the Q-st exceeds the pre-defined UCLs.

D. Diagnosis

Once an anomaly is detected, a diagnosis step is performed to identify the features associated with it. This information is very useful to identify and, eventually, troubleshoot the possible root causes of the anomaly. The contribution of the
features to a given anomaly can be investigated with the contribution plots or similar tools, like oMEDA [32]. Thus, anomalies are detected in the D-st and/or Q-st statistics, and then the diagnosis is performed with e.g. oMEDA. The output of oMEDA is a $1 \times M$ vector where each element contains the contribution of the corresponding feature to the anomaly under study. Those contributions with large magnitude, either positive or negative, are considered to be relevant.

While the diagnosis capabilities of MSNM are a main advantage over other ML techniques, these capabilities are not the focus of this paper. The interested reader is referred to [24].

IV. PARTIAL LEAST SQUARES REGRESSION AND SPARSE VARIANTS

This section introduces the regression techniques that we will employ within the optimization of the scaling parameters of MSNM. The regression models are used to estimate the gradient in the optimization. Since the scaling parameters, and so the gradient, are high dimensional, we need regression techniques that are suitable for the analysis of high dimensional data sets. This is the case of PLS and sparse variants.

A. Partial Least Squares

PLS is a particular form of regression that is suitable in presence of collinearity in the predictors, common in high dimensional problems. Collinearity makes the classical multivariate least-squares (LS) regression break down due to singularity of the covariance matrix. PLS extends LS by inheriting the philosophy of PCA.

Briefly, given an $N \times O$ response matrix $Y$ and an $N \times M$ set $X$ of predictor variables, the PLS algorithm defines a subspace of $X$ which maximizes its covariance with $Y$. The PLS model can be written as:

$$X = T \cdot P^T + E \tag{6}$$

$$Y = T \cdot Q^T + F \tag{7}$$

where $T$ is the $N \times A$ score matrix, $A$ the number of latent variables (LVs) to be used to fit the model, $P$ and $Q$ are the $M \times A$ and $O \times A$ loading matrices for the predictors and the response, respectively, and $E$ and $F$ are the $N \times M$ and $N \times O$ residual matrices of $X$ and $Y$. Setting

$$\hat{B}_{PLS} = W \cdot (P^T \cdot W)^{-1} \cdot Q^T \tag{8}$$

with $W$ the $M \times A$ matrix of weights, the two models given by Equations (6) and (7) can be re-arranged in a single regression model given by

$$Y = X \cdot \hat{B}_{PLS} + F \tag{9}$$

The PLS model can be obtained, among others, using the NIPALS algorithm, and the optimal number of LVs $A$ can be estimated by cross-validation.

B. Sparse PLS

Although PLS can effectively handle noisy predictor variables, the inclusion of variables which are non-relevant for the prediction of the response usually decreases the prediction performance of the model. Several variants of the PLS algorithm have been proposed [14], [26]–[29] with the goal of performing variable selection during model calibration. This family of algorithms is termed Sparse PLS (SPLS), and is often reported to show improved performance over PLS in high dimensional data. The most widespread approach to define sparse models is based on the LASSO regularization [33], often applied using a soft-thresholding operation [34].

One popular SPLS algorithm is the variant proposed by Lê Cao et al. [35] where the (sparse) PLS problem is solved using singular value decomposition [36]. Briefly, given the response ($Y$) and predictor ($X$) matrices, the $R$-rank matrix

$$C = X^T Y \tag{10}$$

can be decomposed as

$$C = G D U^T \tag{11}$$

where the matrices $G$ ($N \times R$) and $U$ ($L \times R$) are orthonormal and $D$ is $R \times R$ diagonal containing the singular values of $C$. Using this formulation, the loading vectors $p_r$ and $q_r$ for $X$ and $Y$ are the singular vectors $g_r$ and $u_r$ of $G$ and $U$, respectively.

A sparse PLS solution can be obtained by penalizing (i.e. forcing to 0) the loadings [37], which are a measure of the relative importance of each variable to the PLS model, by solving the optimization problem:

$$\arg\min_{p,q} \; \|C - p q^T\|_F^2 + \lambda_1 \|p\|_1 + \lambda_2 \|q\|_1 \tag{12}$$

whose solution is given by the soft-thresholding $g_\lambda(x) = \mathrm{sign}(x)\,(|x| - \lambda)_+$ applied to the standard PLS solution. The two parameters $\lambda_1$ and $\lambda_2$ control the sparsity for $X$ and $Y$, respectively. However, a more practical and equivalent alternative is to select the number of non-zero components of the loadings, $N_x$ and $N_y$ [35], in the soft-thresholding [34]. In the context of this paper we restrict ourselves to setting $N_x$, so that only loadings in the x-block are sparse, since the response is univariate. Optimum values of this parameter can be found by cross-validation.

C. Group-Wise PLS

A different approach to obtain sparse PLS solutions is the recently proposed Group-wise PLS (GPLS) [38], where the solution is found by defining groups of correlated predictor variables rather than using regularization.

Briefly, the GPLS algorithm starts by defining a set of $K$ (possibly overlapping) groups of correlated variables that are obtained from an $M \times M$ correlation map $M$ computed from $C = X^T Y$. Subsequently, the weights and scores of $K$ PLS models of 1 LV, each of them considering only the set of variables corresponding to one of the groups, are computed. From these, the one with the largest correlation with $Y$ is retained while the others are discarded. This is repeated for a number of LVs.

The GPLS approach is particularly suited for data exploration, but when data is sparse in a group-wise fashion (i.e. when there are groups of correlated variables related to the response) the algorithm outperforms PLS and SPLS in terms of goodness of prediction.
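As a rough illustration of the regression models of this section in the univariate-response, single-LV case, the sketch below computes the regression vector of Equation (8) and, optionally, soft-thresholds the weight vector so that only a chosen number of predictors stay active. For a univariate response the matrix $C = X^T y$ is a vector, so thresholding it directly is taken here as the one-component analogue of the SPLS step. The function names, the `n_keep` parameter (playing the role of $N_x$) and the synthetic data are assumptions made for this example, not part of the original method description.

```python
import numpy as np

def soft_threshold_top(w, n_keep):
    """Soft-threshold w so that, barring ties, n_keep entries stay non-zero."""
    if n_keep >= w.size:
        return w.copy()
    # Threshold = (n_keep + 1)-th largest magnitude (n_keep must be >= 1).
    lam = np.sort(np.abs(w))[::-1][n_keep]
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def pls1_one_lv(X, y, n_keep=None):
    """Regression vector of a one-LV PLS model for a univariate response.

    If n_keep is given, the weights are soft-thresholded first (sparse variant).
    """
    Xc = X - X.mean(axis=0)            # mean-centred predictors
    yc = y - y.mean()                  # mean-centred response
    w = Xc.T @ yc                      # direction of maximum covariance with y
    if n_keep is not None:
        w = soft_threshold_top(w, n_keep)
    w = w / np.linalg.norm(w)          # unit-norm weights
    t = Xc @ w                         # scores
    p = Xc.T @ t / (t @ t)             # X loadings
    q = (yc @ t) / (t @ t)             # y loading
    return w * q / (p @ w)             # Eq. (8) with A = 1

# Synthetic example (hypothetical data): only features 0 and 1 drive y.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(200)

b_dense = pls1_one_lv(X, y)
b_sparse = pls1_one_lv(X, y, n_keep=2)
print("active predictors in sparse model:", np.flatnonzero(b_sparse))
```

The dense vector spreads small coefficients over all 20 predictors, while the sparse one keeps only the requested number, which is the property exploited later to make the estimated gradient easier to interpret.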
The fitting of a GPLS model requires the definition of a threshold, $\gamma$, to identify the groups of variables in $M$, controlling simultaneously the number and size of the groups of variables to be used. Optimal values for $\gamma$ can be chosen by graphically inspecting the correlation map $M$, by controlling the trade-off between group size and dimension or by using cross-validation.

Fig. 1. Optimization scheme.

V. RUN-TO-RUN xPLS OPTIMIZATION

The optimization of the scaling parameters of MSNM is defined as a gradient descent algorithm, where the gradient is estimated with the PLS-based methods discussed in the previous section. This optimization approach has been shown to outperform state-of-the-art optimization techniques, in particular genetic algorithms, in high dimensional search spaces [10].

Let us define $u$ as a set of inputs we can vary as desired, typically within some specific bounds or constraints, to a system we would like to optimize. Let us also define $y$ as the set of outputs we would like to optimize (either maximize or minimize) by setting appropriate values to $u$. The goal of the optimization algorithm is to find those values in the input that optimize the output, see Fig. 1. Without loss of generality, in the following we will assume we desire to maximize the values in $y$ by properly setting $u$, which is contrary to the convention in the optimization literature but more appropriate for our specific case.

The run to run (R2R) optimization algorithm [10], extended for a general optimization problem and for PLS, SPLS and GPLS, can be summarized as follows:
0. Select user-defined parameters $K$ (number of individual solutions in each iteration) and $r_c$ (level of exploration). Initialize input solution candidate $u_i$ for $i = 0$.
1. Repeat for $k = \{1 \ldots K\}$:
1.1. Generate random variant solution $\tilde{u}_i^k = u_i + r_c \cdot r_i^k$, for $r_i^k$ drawn from a multinormal random distribution.
1.2. Apply input $\tilde{u}_i^k$ to the system in Fig. 1 and measure output $\tilde{y}_i^k$.
2. Fit an xPLS model with the $K$ instances of the inputs $\tilde{u}_i^k$ arranged in the rows of $\tilde{U}_i$, and outputs $\tilde{y}_i^k$ arranged in rows of $\tilde{Y}_i$:

$$\tilde{Y}_i = \tilde{U}_i \cdot \hat{B}_i + F_i \tag{13}$$

3. Compute the next input solution candidate (a) as: $u_{i+1} = u_i + 3 \cdot \hat{B}_i$
4. Check for convergence in the solution and otherwise loop back to Step 1.
(a) For minimization, the second addend is subtracted from the current solution.

In each iteration, the xPLS model captures the variability around $u_i$ related to the response. This allows the estimation of the gradient in the optimization. A random signal is added to the input to ensure an adequate trade-off between exploration and exploitation.

We have intentionally overlooked the problem of meta-parameter estimation within the optimization. As already discussed, we need to define the number of LVs in PLS and there is one additional meta-parameter in each of the two sparse variants. Unfortunately, the use of cross-validation or other automatic means for meta-parameter selection is computationally intensive, but this problem can be overcome. Within a gradient based optimization, we can simplify meta-parameter selection for the sake of computational efficiency. This is done by fixing meta-parameters during the optimization. To minimize a detrimental effect on performance, we set PLS, SPLS or GPLS to be very parsimonious, with the intuition that if the model is parsimonious in excess, this is overcome by performing more iterations in the optimization. First, we suggest using a single LV in Step 2, since any further contribution of additional LVs can be captured in future iterations. Regarding parameters $N_x$ in SPLS and $\gamma$ in GPLS, we suggest setting them so that models are very sparse, since again any missing variability in one iteration can be taken into account in future iterations.

VI. SEMI-SUPERVISED MULTIVARIATE STATISTICAL NETWORK MONITORING

This section particularizes the R2R optimization algorithm to our specific application: the optimization of the scaling parameters in MSNM. Here, the input $u$ we would like to set is the scaling of the features in MSNM so as to maximize its detection performance, which is the output $y$. In the following we discuss in detail inputs and outputs and the complete system.

A. Output of the Optimizer

There are several possibilities to measure the detection performance of an anomaly detection system like MSNM. Here we will use the Area Under the ROC Curve (AUROC or AUC) computed from a labeled data set, where observations are labeled as normal or attacks. The receiver operating characteristic (ROC) curve shows the evolution of the true positive rate (TPR) versus the false positive rate (FPR) for different values of the classifying threshold, discussed below. The TPR is the percentage of true attacks that are identified by the MSNM system, while the FPR is the percentage of normal observations identified as attacks. The AUC is a scalar that quantifies the quality of the anomaly detector. An anomaly detector should present an AUC as close to 1 as possible, while an AUC around 0.5 corresponds to a random classifier.

The ROC curves for MSNM are obtained by varying a threshold in a specific combination of the Q-st and the D-st:

$$MSNM_n = \frac{A \cdot D_n}{M \cdot UCL_D} + \frac{(M - A) \cdot Q_n}{M \cdot UCL_Q} \tag{14}$$
where $MSNM_n$ is the output of the anomaly detector at a given observation $n$, $D_n$ and $Q_n$ the corresponding statistics and $UCL_D$ and $UCL_Q$ the corresponding 99% Upper Control Limits (UCL), computed as percentiles in the calibration data. Recall that $A$ is the number of components in PCA and $M$ the number of features in the data.
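The chain from the PCA model to the AUC can be sketched as follows: fit PCA on calibration data, compute the D-st and Q-st of Equations (4) and (5), set the 99% UCLs as percentiles, combine both statistics as in Equation (14), and score a labeled test set. This is only a minimal sketch under stated assumptions: the synthetic "traffic", the injected attack on one feature, and all function names are hypothetical, and the AUC is computed with a plain pairwise (Mann-Whitney style) count rather than any particular library routine.

```python
import numpy as np

def fit_pca(X_cal, A):
    """Fit an A-component PCA on mean-centred calibration data."""
    mean = X_cal.mean(axis=0)
    Xc = X_cal - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:A].T                                    # M x A loading matrix
    T = Xc @ P                                      # N x A calibration scores
    inv_cov_T = np.linalg.inv(np.atleast_2d(np.cov(T, rowvar=False)))
    return mean, P, inv_cov_T

def d_q_statistics(X, mean, P, inv_cov_T):
    """D-statistic and Q-statistic of every row of X."""
    Xc = X - mean
    T = Xc @ P                                      # Eq. (2)
    E = Xc - T @ P.T                                # Eq. (3)
    D = np.einsum("na,ab,nb->n", T, inv_cov_T, T)   # Eq. (4)
    Q = np.sum(E * E, axis=1)                       # Eq. (5)
    return D, Q

def msnm_statistic(D, Q, ucl_d, ucl_q, A, M):
    """Combined detector output of Eq. (14)."""
    return A * D / (M * ucl_d) + (M - A) * Q / (M * ucl_q)

def auc(scores, is_attack):
    """Fraction of (attack, normal) pairs ranked correctly; ties count 0.5."""
    diff = scores[is_attack][:, None] - scores[~is_attack][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Hypothetical data: 10 counters driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
N, M, A = 500, 10, 2
mixing = rng.standard_normal((A, M))

def normal_traffic(n):
    return rng.standard_normal((n, A)) @ mixing + 0.1 * rng.standard_normal((n, M))

X_cal = normal_traffic(N)
X_test = normal_traffic(200)
is_attack = np.zeros(200, dtype=bool)
is_attack[:20] = True
X_test[is_attack, 3] += 5.0                         # injected anomaly on one counter

mean, P, inv_cov_T = fit_pca(X_cal, A)
D_cal, Q_cal = d_q_statistics(X_cal, mean, P, inv_cov_T)
ucl_d, ucl_q = np.percentile(D_cal, 99), np.percentile(Q_cal, 99)
D_test, Q_test = d_q_statistics(X_test, mean, P, inv_cov_T)
scores = msnm_statistic(D_test, Q_test, ucl_d, ucl_q, A, M)
auc_value = auc(scores, is_attack)
print("AUC:", auc_value)
```

Because the statistic is a single scalar per observation, sweeping a threshold over `scores` traces the ROC curve, and the pairwise count above gives its area directly.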
B. Input of the Optimizer
To optimize the AUC, we modify the values of the scaling of the features. These features were computed in the parsing step of MSNM. The scale of the features changes their relative importance in the PCA model. Since PCA maximizes variance, the higher the relative scale (weight) of a variable, the larger the share of its information that is captured in the scores and the smaller the share left in the residuals of the model in Equation (1). However, understanding how this scaling impacts the detection ability of an MSNM system for a given attack is a real challenge, due to the non-linear nature of the detection statistics.
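The effect of scale on what PCA captures can be checked on a two-feature toy case. The sketch below is only an illustration under stated assumptions: two unit-variance features with correlation `rho`, a one-component PCA model, and a hypothetical helper `captured_fraction` that uses the closed-form eigen-decomposition of a symmetric 2 × 2 covariance matrix.

```python
import math


def captured_fraction(w1, w2, rho):
    """Fraction of each scaled feature's variance captured by the first
    principal component, for two unit-variance features with correlation
    rho that are multiplied by the weights w1 and w2."""
    # Covariance matrix of the scaled features: [[a, b], [b, c]].
    a, b, c = w1 * w1, w1 * w2 * rho, w2 * w2
    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    lam = (a + c) / 2.0 + math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    # Corresponding eigenvector, i.e. the loadings of the first component.
    if b != 0.0:
        v1, v2 = lam - c, b
    else:
        v1, v2 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1, v2)
    v1, v2 = v1 / norm, v2 / norm
    # Variance of feature j explained by the component is lam * v_j^2.
    return lam * v1 * v1 / a, lam * v2 * v2 / c


equal = captured_fraction(1.0, 1.0, 0.5)     # both features weighted equally
upweighted = captured_fraction(3.0, 1.0, 0.5)  # first feature scaled by 3
print(equal, upweighted)
```

With equal weights both features are captured to the same extent; once the first feature is up-weighted, almost all of its variance moves into the scores while most of the second feature's variance is left in the residuals.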
Fig. 2. Semi-supervised MSNM optimization scheme.

TABLE I. Features of the calibration and the test sets in the UGR'16 dataset.

Given a vector $x_m$ of size N × 1, corresponding to a column of X with the values of one given feature (counter) in the N calibration observations, which we assume to be mean centred, its scaled version is given by
$$\tilde{x}_m = x_m w_m \quad (15)$$
Here we propose to define a set of values $w_1, w_2, \ldots, w_M$ to be chosen such that, when applied to the calibration data X, detection by MSNM will be optimal in terms of AUC for one type or a set of types of attacks identified in a labeled data set.
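One way to search for such weights is a perturb, evaluate, regress and update loop in the spirit of the R2R procedure used in this paper. The following is a minimal sketch under several assumptions: plain one-component PLS1 stands in for the xPLS model, `toy_objective` is a hypothetical smooth stand-in for the AUC of the MSNM system, the update factor 3 is taken as it appears in the update rule of the text, and the function names and the convergence tolerance are illustrative choices.

```python
import random


def pls1_coefficients(S, y):
    """Regression coefficients of a one-component PLS1 model of y on S.
    Both blocks are mean centred internally."""
    k, m = len(S), len(S[0])
    col_means = [sum(row[j] for row in S) / k for j in range(m)]
    y_mean = sum(y) / k
    Sc = [[row[j] - col_means[j] for j in range(m)] for row in S]
    yc = [value - y_mean for value in y]
    # PLS weight vector: direction of maximum covariance between S and y.
    direction = [sum(Sc[i][j] * yc[i] for i in range(k)) for j in range(m)]
    norm = sum(v * v for v in direction) ** 0.5
    if norm == 0.0:
        return [0.0] * m  # flat objective: nothing to learn from this batch
    direction = [v / norm for v in direction]
    scores = [sum(Sc[i][j] * direction[j] for j in range(m)) for i in range(k)]
    slope = sum(t * v for t, v in zip(scores, yc)) / sum(t * t for t in scores)
    return [slope * v for v in direction]


def r2r_optimize(objective, n_features, k=100, rc=0.01, step=3.0,
                 max_iter=100, tol=1e-4, seed=0):
    """Iteratively move the weight vector along the PLS regression vector."""
    rng = random.Random(seed)
    w = [1.0] * n_features  # initial solution: all weights equal to one
    for _ in range(max_iter):
        # K perturbed candidates around w and their measured performance.
        S = [[wj + rc * rng.gauss(0.0, 1.0) for wj in w] for _ in range(k)]
        y = [objective(row) for row in S]
        b = pls1_coefficients(S, y)  # regress performance on the weights
        new_w = [wj + step * bj for wj, bj in zip(w, b)]
        converged = max(abs(n - o) for n, o in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


# Hypothetical stand-in for the AUC: peaks at 1.0 when w equals TARGET.
TARGET = [1.5, 0.5, 1.0]


def toy_objective(w):
    return 1.0 - 0.05 * sum((wj - tj) ** 2 for wj, tj in zip(w, TARGET))


w_opt = r2r_optimize(toy_objective, n_features=3)
print([round(v, 2) for v in w_opt])
```

In a real setting the objective would apply the weights to the calibration data as in Equation (15), refit the MSNM model and return the AUC on labeled data; the toy objective only keeps the sketch self-contained.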
C. Optimization Procedure
The complete semi-supervised system is depicted in Fig. 2.
A detailed description of the optimization procedure follows:
1) Row vector $w_0$, 1 × M, is initialized such that $w_0 = (1, 1, \ldots, 1)_M$. The number of individual solutions in each iteration, K >> 1, and the level of exploration $r_c$ are chosen.
2) Row vector $w_{01}$, 1 × M, is generated such that $w_{01} = w_0 + r_c \cdot r_{01}$, where $r_{01}$ is a 1 × M random vector whose entries are ∼ N(0, 1).
3) The weighting vector $w_{01}$ is applied to the calibration data X and the detection performance of the resulting MSNM system, $AUC_{01}$, is recorded.
4) Steps 2 and 3 are repeated K times: the resulting vectors $w_{01}, w_{02}, \ldots, w_{0K}$ are arranged in a K × M matrix $S_0$ and the corresponding system performances in a K × 1 vector $y_0 = (AUC_{01}, AUC_{02}, \ldots, AUC_{0K})^t$. The k-th row of $S_0$ contains the k-th vector of weights $w_{0k}$, and the k-th entry of $y_0$ is $AUC_{0k}$.
5) An xPLS model is fitted regressing $y_0$ on $S_0$, obtaining a set of regression coefficients $\hat{B}_0$.
6) The current solution is updated: $w_1 = w_0 + 3 \cdot \hat{B}_0$.
7) We check for convergence in w, and otherwise loop to step 2.

VII. CASE STUDY: UGR'16 DATA SET

A. Experimental Framework
We evaluate our approach to optimize variables scaling in a real scenario. The dataset considered is the publicly available UGR'16 dataset [15]. These data consist of Netflow network traces taken from a real Tier 3 ISP network composed of virtualized and hosted services of many companies and clients. Netflow sensors were located in the border routers of the network, capturing all the incoming and outgoing traffic from the ISP. All the details related to the dataset are summarized in Table I and can be consulted in [15]. Two blocks of data are provided, one for training models (calibration set), and the other for testing the results obtained from those models (test set).

The UGR'16 dataset is especially interesting for our experiments because the collected traffic includes controlled attack traffic against fake victims generated by 25 virtual machines that were deployed within the network. Thus, our aim is to test whether our optimization algorithm is able to capture the relevant variables for every attack type and properly scale them to achieve good detection results. Although the variety of attacks in this dataset is limited, this is the only recent dataset (see survey of datasets in [15]) that includes real background traffic for a considerable amount of time (4 months), which is an essential requisite to properly evaluate the false positive rate in our detection results.

The attack traffic was performed during the test set collection, in particular during its first 12,000 observations, and it presents these different patterns:
CAMACHO et al.: SEMI-SUPERVISED MULTIVARIATE STATISTICAL NETWORK MONITORING 2185

• Low-rate DoS (dos): TCP SYN attack during 3 minutes by using the tool hping3. There are three different variants, where one-to-one or many-to-one are combined with different schedulings.
• Port scanning (scan11): Continuous one-to-one scanning from an attacker to a single victim's IP during 3 minutes by using the nmap tool.
• Port scanning (scan44): Continuous scanning from 4 different attacker machines to four victim's IP in parallel during 3 minutes by using the nmap tool.
• Botnet traffic (nerisbotnet): The test set includes botnet traffic traces obtained from the execution of the malware known as Neris, corresponding to the capture CTU-Malware-Capture-Botnet-42 available in [39]. In this malware version, infected bots send SPAM, connect to an HTTP C&C server and use HTTP to perform some ClickFraud.

For the processing of the dataset, we consider time intervals of one minute. All the flows during an interval are aggregated and summarized into an M-dimensional vector, which corresponds to an observation. In particular, we define a set of M = 138 network-related features, corresponding to 11 different Netflow variables, as shown in Table II.

TABLE II. Variable values considered as features in our detection system.

Calibration data was cleaned of outliers following the Phase 1 approach in [8], so we expect it to be mainly composed of normal observations. A main real threat identified and cleaned from the calibration and test data is a SPAM campaign driven from some of the virtual machines located in the ISP. The cleaned calibration set is used to calibrate the MSNM system. The first 12,000 observations in the cleaned test data were split in two independent parts with 6,000 observations. The first set is used within the optimization to select the optimum scaling. The second set is used to validate the results. We also used the unclean second test data set, including the SPAM traffic, to assess the performance of the methods with previously unseen attacks.

B. Results
To assess the performance of the semi-supervised MSNM approach, we compare it with the unsupervised MSNM and the Support Vector Machine (SVM) technique, the latter being a representative of a supervised approach. Two variants of the unsupervised MSNM are considered, corresponding to two preprocessing schemes: Mean-Centering (MSNM-MC) and Auto-Scaling (MSNM-AS). Two variants of the SVM are also considered, one based on the linear kernel (SVM-L) and one on the radial basis function kernel (SVM-RBF). To calibrate the anomaly detector, unsupervised techniques only make use of the (cleaned) calibration data, described in the previous section. Supervised techniques employ only the (cleaned) first test data set, which includes both normal traffic and all the types of artificial attacks. SVM metaparameters are selected according to recommendations in the Matlab documentation of function 'fitcsvm'. The semi-supervised MSNM is initially set to the MSNM-MC, and then optimized with the R2R algorithm using the (cleaned) first test data set. In all optimization runs, K is set to 100 and $r_c$ is set to 0.01. Methods are compared in terms of the AUC computed for the (cleaned) second test set, and thus from independent data to that used in the calibration of the anomaly detectors. We derive confidence intervals in the performance using resampling techniques without replacement.

Fig. 3. AUC values per artificial attack type. Supervised and semi-supervised approaches calibrated without distinguishing attack types. Quartiles (25% and 75%) and median are shown in the bars.

Fig. 3 shows the AUCs of the different detection approaches for the four artificial attack patterns. For the calibration, no distinction is made on the type of attack, that is, the anomaly detectors are calibrated using two types of labels: normal and attack. For the evaluation, each group of AUCs is computed comparing normal data with the specific type of attack, leaving out the observations corresponding to the other attack patterns. We expect supervised approaches to outperform the others in this experiment, since all the attacks under evaluation were used in their calibration. As expected, unsupervised methods generally yield the worst results, in particular the MSNM-MC approach. The MSNM-AS is generally outperformed by semi-supervised techniques. The improvement is more notable in the case of the nerisbotnet pattern. It is remarkable that this is generated by a botnet that is equipped with mechanisms to hide its traffic and, thus, the detection of this traffic following unsupervised methodologies is a real challenge. In general we can conclude that the R2R optimization is improving the

performance of the MSNM, but this improvement is limited when diverse attack patterns are optimized at the same time, since different patterns may counteract in the optimization. Supervised approaches based on SVM yield very good results. In particular, the SVM-RBF shows a similar performance to semi-supervised MSNM, and the linear kernel is among the best anomaly detectors for all the anomalies and clearly outperforms the other methods in 'nerisbotnet'.

One typical advantage that is claimed for unsupervised (and most semi-supervised) methods in comparison to supervised approaches is that they are capable of identifying new threats. This is because they are based on a model of normality, and any new threat that does not follow that model can be detected. To check whether the semi-supervised MSNM retains this capability after the scaling optimization, we compared its performance with the other methods in the detection of the real SPAM traffic in the (unclean) second test data set. The anomaly detectors are the same as those used before, and therefore they were calibrated using independent and cleaned data, in absence of any SPAM trace. Therefore, supervised techniques are not expected to outperform the others in this experiment. Results are shown in Fig. 4. We can see that the semi-supervised methods provide a good performance in comparison to the rest, and even outperform the unsupervised MSNM. This cannot be understood as a general result. In principle, all MSNM variants have the same a priori probability to detect new threats, since this depends very much on the relationship between the new threat and the normality model. As the new threat can be anything, this relationship will likely vary from case to case. However, this experiment shows that the optimization did not have a negative effect on the ability of MSNM to detect the unseen threat. Regarding supervised approaches, the SVM-L yielded a very poor result on the new attack type, as expected for a supervised technique, but the SVM-RBF showed the best result among the methods. This result can be explained by the special properties of SVM-RBF. This method is supervised since during calibration, the classifier is optimized to distinguish between the two classes of data. However, the model identifies a support that contains one of the classes, and the rest of the feature space is assigned to the other class. If the former is the class for normal observations, and the latter models the attacks, the configuration of SVM-RBF mimics that in MSNM, or in general of an anomaly detector, and is indeed very useful to detect new anomalies. Note, however, that if the labels for normal and anomalous classes are switched, the SVM-RBF provides the worst performance, and that, according to our previous discussion, this specific result is very dependent on the relationship between the new threat and the normality model.

Fig. 4. AUC values for the real SPAM traffic. Quartiles (25% and 75%) and median are shown in the bars.

Another experiment we can conduct is to calibrate the semi-supervised and supervised methods specifically per attack type. This approach can be used to calibrate ensembles of detectors, which may be a suitable approach when the profiles of different attacks vary to a large extent. Mimicking rule-based detectors, where each rule corresponds to a single type of attack, we can have an anomaly detector optimized for each attack type. Performance results are shown in Fig. 5. We can see that this approach is generally beneficial for semi-supervised approaches, while in supervised methods it even degrades the results. Among the semi-supervised methods, sparse PLS methods tend to outperform standard PLS, in particular for the 'nerisbotnet' attack.

Fig. 5. AUC values per artificial attack type. Supervised and semi-supervised approaches calibrated per attack type. Quartiles (25% and 75%) and the median are shown in the bars.

From the previous experiments, we can conclude that the R2R optimization leads to a general improvement of the MSNM performance, yielding a semi-supervised approach that is competitive with state-of-the-art techniques and that retains the capability to detect new threats. Note that the unsupervised MSNM was generally outperformed by supervised techniques, but this comes at a price: supervised techniques are not generally applicable to most real cases, where the labeling of observations is not available. This is actually commonplace in the cybersecurity industry [40]. Semi-supervised techniques can still be used when no labeling or only a partial labeling is available. Furthermore, the MSNM approach has the additional advantage over state-of-the-art supervised techniques of being an

interpretable model, and thus easier to use and understand by practitioners [8]. While above we only compared in terms of detection, MSNM also provides diagnosis support, that is, information about why an attack was identified as such. This is actually a principal ability to reduce the time of response to an attack or to quickly identify a false positive. Black-box models, like the non-linear SVM, cannot be interpreted. Therefore, they do not provide the information about why an attack was identified. Finally, it should be noted that none of the SVMs generally outperformed the semi-supervised methods in all the detection experiments performed.

It turns out that the result of the R2R optimization can also be interpreted. Fig. 6 compares the scaling profiles obtained from the optimization with PLS, SPLS and GPLS, and with and without distinguishing among the attack types. In general, sparse methodologies provide clearer profiles, with fewer peaks and easier to interpret. The peaks reflect those features that are relevant for the detection of the type of attack. We can also see that the optimization using all attack types (GlobOpt) is dominated by the 'nerisbotnet' attack pattern, since in all cases the profile shows a large peak in the same feature as the profile specifically optimized for that attack. Again, this illustrates the limitation of using this semi-supervised approach for a set of disparate attacks, rather than on a per-attack basis.

Fig. 6. Optimized weights by PLS (a), SPLS (b) and GPLS (c). GlobOpt makes reference to the optimized profile obtained with no distinction among attack types. SpecOpt makes reference to the optimized profile per attack type.

To interpret the profiles in Fig. 6, we selected those peaks exceeding the average scaling value plus one standard deviation and listed them in Tables III, IV and V. Recall that these variables will have a higher influence on the MSNM detector, but the rest of variables will also impact the detection, to a lesser extent.

In Table III (PLS) we see that the GlobOpt selects three features: the number of connections with source port 1080 (sport_socks), with source port between 49152 and 65535 (sport_private) and with destination port 6667 (dport_irc). The first feature is related to the SOCKS proxy, an Internet proxy service. That port has been associated in the past with several types of attacks, mainly trojans and SPAM. The second feature is a very general one, and might have been incorrectly selected due to the low signal-to-noise ratio in R2R-PLS. The last feature is related to the nerisbotnet. There is no question that the IRC port is also related to normal activity. However, in the traffic of the network we are monitoring, the amount of IRC is low and the counter in dport_irc can be a valid means to detect malicious activity.

TABLE III. Variables with highest weights from PLS optimization.

TABLE IV. Variables with highest weights from SPLS optimization.

TABLE V. Variables with highest weights from GPLS optimization.

Looking at the selection in Table IV, for SPLS, we can see that the number of features selected is reduced and more interpretable, but we still find some potentially inconsistent results. For instance, in the list of the most relevant variables for nerisbotnet, we can see that dport_oracle is present, despite the fact that this botnet does not generate any traffic towards the oracle port. This potential inaccuracy is not affecting the detection results of this attack, as shown in Fig. 5, which are considerably improved in comparison to the other MSNM variants. Using GPLS (Table V) the features are further reduced. While this is in general convenient, it does not necessarily imply a benefit in terms of performance. For instance, the global optimizer is mainly focused on the nerisbotnet attack. Differences between SPLS and GPLS may be associated with the degree of sparseness of the models, which is governed by the specific metaparameters used but also by the specificities of the training data. In SPLS we used Nx = 2, the most parsimonious value that results in multivariate regression vectors. In GPLS we set γ = 0.8. The same metaparameters may result in opposite sparseness levels for a different data set. However, we can generally conclude that the use of sparse methods within the R2R algorithm led to improvements in terms of AUC and parsimony, and therefore of interpretability of results.

VIII. DISCUSSION AND CONCLUSION
In this paper, we provide a solution to combine the advantages of unsupervised and supervised learning in the context of intrusion detection. For this, we use the Multivariate Statistical Network Monitoring approach, recently proposed, and we enhance an optimization algorithm based on Partial Least Squares, specially suited for multivariate optimization problems. The result is an anomaly detection system that can be optimized for the detection of a set of attack patterns. Our approach provides a machine learning technique with similar flexibility to update to new attack patterns as in a rule-based system. Combined with unsupervised methods, we can still identify unseen (zero-day) patterns of malicious activity. This paper also introduces for the first time the application of sparse methodologies in intrusion detection, which were seen to be very effective within the proposed semi-supervised detection machine. Results with real traffic showed the practical applicability of the approach.

ACKNOWLEDGMENT
Anonymous reviewers are acknowledged for their useful comments.

REFERENCES
[1] Data Breach Investigation Report, Verizon, New York, NY, USA, 2017.
[2] M. Solomon. The Multiplier Effect of Collaboration for Security Operations. Accessed: Apr. 1, 2018. [Online]. Available: https://www.securityweek.com/multiplier-effect-collaboration-security-operations
[3] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” Comput. Secur., vol. 28, nos. 1–2, pp. 18–28, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404808000692
[4] S. Dua and X. Du, Data Mining and Machine Learning in Cybersecurity. Boca Raton, FL, USA: CRC Press, 2016.
[5] A. Ferrer, “Latent structures-based multivariate statistical process control: A paradigm shift,” Qual. Eng., vol. 26, no. 1, pp. 72–91, 2014.
[6] G. Fernandes, Jr., L. F. Carvalho, J. J. P. C. Rodrigues, and M. L. Proença, “Network anomaly detection using IP flows with principal component analysis and ant colony optimization,” J. Netw. Comput. Appl., vol. 64, pp. 1–11, Apr. 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1084804516000618
[7] A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” ACM SIGCOMM Comput. Commun. Rev., vol. 34, no. 4, pp. 219–230, Oct. 2004. [Online]. Available: http://dl.acm.org/citation.cfm?id=1030194.1015492
[8] J. Camacho, A. Pérez-Villegas, P. García-Teodoro, and G. Maciá-Fernández, “PCA-based multivariate statistical network monitoring for anomaly detection,” Comput. Secur., vol. 59, pp. 118–137, Jun. 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404816300116
[9] H. Ringberg, A. Soule, J. Rexford, and C. Diot, “Sensitivity of PCA for traffic anomaly detection,” ACM SIGMETRICS Perform. Eval. Rev., vol. 35, no. 1, pp. 109–120, Jun. 2007. [Online]. Available: http://dl.acm.org/citation.cfm?id=1269899.1254895
[10] J. Camacho, J. Picó, and A. Ferrer, “Self-tuning run to run optimization of fed-batch processes using unfold-PLS,” AIChE J., vol. 53, no. 7, pp. 1789–1804, 2007.
[11] J. Camacho, D. Lauri, B. Lennox, M. Escabias, and M. Valderrama, “Evaluation of smoothing techniques in the run to run optimization of fed-batch processes with u-PLS,” J. Chemometrics, vol. 29, no. 6, pp. 338–348, 2015, doi: 10.1002/cem.2711.
[12] H. Martens and T. Næs, Multivariate Calibration. Hoboken, NJ, USA: Wiley, 1992.
[13] P. Geladi and B. R. Kowalski, “Partial least-squares regression: A tutorial,” Anal. Chim. Acta, vol. 185, pp. 1–17, 1986.

[14] C. Colombani et al., “A comparison of partial least squares (PLS) and sparse PLS regressions in genomic selection in French dairy cattle,” J. Dairy Sci., vol. 95, no. 4, pp. 2120–2131, 2012.
[15] G. Maciá-Fernández, J. Camacho, R. Magán-Carrión, P. García-Teodoro, and R. Therón, “UGR'16: A new dataset for the evaluation of cyclostationarity-based network IDSs,” Comput. Secur., vol. 73, pp. 411–424, Mar. 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404817302353
[16] V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Hoboken, NJ, USA: Wiley, 2007.
[17] E. Alpaydin, Introduction to Machine Learning. Cambridge, MA, USA: Massachusetts Institute of Technology, 2010.
[18] S. Skansi, Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence. New York, NY, USA: Springer, 2018.
[19] A. Kanaoka and E. Okamoto, “Multivariate statistical analysis of network traffic for intrusion detection,” in Proc. 14th Int. Workshop Database Expert Syst. Appl., Sep. 2003, pp. 472–476.
[20] C. Callegari, L. Gazzarrini, S. Giordano, M. Pagano, and T. Pepe, “A novel PCA-based network anomaly detection,” in Proc. IEEE Int. Conf. Commun., Jun. 2011, pp. 1–5.
[21] A. Delimargas et al., “Evaluating a modified PCA approach on network anomaly detection,” in Proc. 5th Int. Conf. Next Gener. Netw. Services (NGNS), May 2014, pp. 124–131.
[22] C. Callegari, L. Gazzarrini, S. Giordano, M. Pagano, and T. Pepe, “Improving PCA-based anomaly detection by using multiple time scale analysis and Kullback–Leibler divergence,” Int. J. Commun. Syst., vol. 27, no. 10, pp. 1731–1751, Oct. 2014, doi: 10.1002/dac.2432.
[23] M. Aiello, M. Mongelli, E. Cambiaso, and G. Papaleo, “Profiling DNS tunneling attacks with PCA and mutual information,” Logic J. IGPL, vol. 24, no. 6, pp. 957–970, 2016. [Online]. Available: http://jigpal.oxfordjournals.org/lookup/doi/10.1093/jigpal/jzw056
[24] J. Camacho, P. García-Teodoro, and G. Maciá-Fernández, “Traffic monitoring and diagnosis with multivariate statistical network monitoring: A case study,” in Proc. IEEE Secur. Privacy Int. Workshop Traffic Meas. Cybersecurity (WTMC), May 2017, pp. 241–246.
[25] D. G. Luenberger, Linear and Nonlinear Programming (International Series in Operations Research & Management Science), vol. 228. Springer, 2008.
[26] H. Chun and S. Keleş, “Sparse partial least squares regression for simultaneous dimension reduction and variable selection,” J. Roy. Stat. Soc. Stat. Methodol. B, vol. 72, no. 1, pp. 3–25, 2010.
[27] E. Andries, “Sparse models by iteratively reweighted feature scaling: A framework for wavelength and sample selection,” J. Chemometrics, vol. 27, nos. 3–4, pp. 50–62, 2013.
[28] R. Calvini, A. Ulrici, and J. M. Amigo, “Practical comparison of sparse methods for classification of Arabica and Robusta coffee species using near infrared hyperspectral imaging,” Chemometrics Intell. Lab. Syst., vol. 146, pp. 503–511, Aug. 2015, doi: 10.1016/j.chemolab.2015.07.010.
[29] J. Camacho and E. Saccenti, “Group-wise partial least square regression,” J. Chemometrics, vol. 32, no. 3, p. e2964, 2018. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/cem.2964
[30] J. Camacho, G. Maciá-Fernández, J. Díaz-Verdejo, and P. García-Teodoro, “Tackling the big data 4 vs for anomaly detection,” in Proc. IEEE INFOCOM, Apr./May 2014, pp. 500–505.
[31] P. Nomikos and J. F. MacGregor, “Multivariate SPC charts for monitoring batch processes,” Technometrics, vol. 37, no. 1, pp. 41–59, 1995.
[32] J. Camacho, “Observation-based missing data methods for exploratory data analysis to unveil the connection between observations and variables in latent subspace models,” J. Chemometrics, vol. 25, no. 11, pp. 592–600, 2011.
[33] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy. Stat. Soc., B, Methodol., vol. 58, no. 1, pp. 267–288, 1994.
[34] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations. London, U.K.: Chapman & Hall, 2015.
[35] K.-A. Lê Cao, D. Rossouw, C. Robert-Granié, and P. Besse, “A sparse PLS for variable selection when integrating omics data,” Stat. Appl. Genet. Mol. Biol., vol. 7, no. 1, 2008, Art. no. 35.
[36] A. Lorber, L. E. Wangen, and B. R. Kowalski, “A theoretical foundation for the PLS algorithm,” J. Chemometrics, vol. 1, no. 1, pp. 19–31, 1987.
[37] H. Shen and J. Z. Huang, “Sparse principal component analysis via regularized low rank matrix approximation,” J. Multivariate Anal., vol. 99, no. 6, pp. 1015–1034, Jul. 2008.
[38] J. Camacho and E. Saccenti, “Group-wise partial least square regression,” J. Chemometrics, vol. 32, no. 3, p. e2964, 2018.
[39] CTU-13 Dataset. Accessed: Apr. 1, 2018. [Online]. Available: https://stratosphereips.org/category/dataset.html
[40] K. Kavanagh and O. Rochford, Critical Capabilities for Security Information and Event Management, Gartner, document G00348945, 2015.

José Camacho received a degree in computer science from the University of Granada in 2003, and the Ph.D. degree from the Technical University of Valencia in 2007. He is currently an Associate Professor with the Department of Signal Theory, Telematics and Communication and a Researcher with the Information and Communication Technologies Research Centre, University of Granada, Spain. His current research interests include exploratory data analysis, anomaly detection, and optimization with multivariate techniques applied to data of very different nature, including manufacturing processes, chemometrics, and communication networks. He is especially interested in the use of exploratory data analysis to big data for network security. His Ph.D. was awarded with the second Rosina Ribalta Prize to the best Ph.D. projects in the field of information and communication technologies from the EPSON Foundation, and with the D. L. Massart Award in chemometrics from the Belgian Chemometrics Society.

Gabriel Maciá-Fernández received the M.S. degree in telecommunications engineering from the University of Seville, Spain, and the Ph.D. degree in telecommunications engineering from the University of Granada. He is currently an Associate Professor with the Department of Signal Theory, Telematics and Communications, University of Granada, Spain. From 1999 to 2005, he was a Specialist Consultant at Vodafone España. His research interests include computer and network security, with a special focus on intrusion detection, reliable protocol design, network information leakage, and denial of service.

Noemí Marta Fuentes-García received a degree in computer science from the University of Granada, Spain, and the master's degree in software development from the University of Granada, in 2015, where she is currently pursuing the Ph.D. degree with the Department of Signal Theory, Telematics and Communications. Her Ph.D. is based in network monitoring for anomalies detection and diagnosis by using data analysis. Since 2012, she has been working for several companies, which has driven her to find a link between research and enterprise.

Edoardo Saccenti received the M.Sc. degree in physics and the Ph.D. degree in structural biology from the University of Florence, Italy. His main research is multivariate statistics, in particular: principal component analysis and related methods with a focus on the problem of dimensionality assessment and its relationships with inferential statistics in the frame of random matrix theory; power analysis and sample size determination in the context of PCA, PLS-DA, and network inference; and sparse component methodologies for data exploration and interpretation.
