Variables Influencing The Effectiveness of Signature-Based Network Intrusion Detection Systems
To cite this article: Teodor Sommestad, Hannes Holm & Daniel Steinvall (2022)
Variables influencing the effectiveness of signature-based network intrusion detection
systems, Information Security Journal: A Global Perspective, 31:6, 711-728, DOI:
10.1080/19393555.2021.1975853
ABSTRACT
Contemporary organizations often employ signature-based network intrusion detection systems to increase the security of their computer networks. The effectiveness of a signature-based system primarily depends on the quality of the rules used to associate system events to known malicious behavior. However, the variables that determine the quality of rulesets are relatively unknown. This paper empirically analyzes the detection probability in a test involving Snort for 1143 exploitation attempts and 12 Snort rulesets created by the Emerging Threats Labs and the Sourcefire Vulnerability Research Team. The default rulesets from Emerging Threats raised priority-1-alerts for 39% of the exploit attempts compared to 31% for rulesets from the Vulnerability Research Team. The following features predict detection probability: if the exploit is publicly known, if the ruleset references the exploited vulnerability, the payload, the type of software targeted, and the operating system of the targeted software. The importance of these variables depends on the ruleset used and whether default rules are used. A logistic regression model with these variables classifies 69-92% of the cases correctly for the different rulesets.

KEYWORDS
Infrastructure protection; network-level security and protection; network monitoring; intrusion detection
CONTACT Teodor Sommestad [email protected] Swedish Defence Research Agency, Stockholm, Sweden
© 2021 The Author(s). Published with license by Taylor & Francis Group, LLC.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
false positives is reduced and the probability of detecting attacks is reasonable (Werlinger et al., 2008), (Goodall et al., 2009), (Sommestad et al., 2013). The problem with false positives is substantial. For example, Tjhai et al. (2008) reported that 96% of the alerts were false positives in a test where Snort monitored a university's web server, Sommestad and Franke (2015) reported that greater than 98% of the alerts Snort produced during a cyber security exercise were false positives, and Cotroneo et al. (2019) refers to analyses where 99% of the alerts are false positives. The problem of detecting attacks is also considerable, and a frequent argument against signature-based NIDSs and for the use of anomaly-based alternatives is that signature-based solutions are poor at detecting undisclosed (zero-day) exploits (Liao et al., 2013) (Patcha & Park, 2007) (Holm, 2014) (Khraisat et al., 2019). Because of this, many scholars have focused on anomaly-based intrusion detection (Khraisat et al., 2019). However, as will be described below, it is not known when a signature-based NIDS will perform well and when it will not, and to what extent they are capable of detecting undisclosed exploits. Clearly, a system administrator making decisions concerning the use of an NIDS would benefit from knowing what type of attacks it can be expected to detect.

This paper studies how the detection probability of the NIDS Snort is associated with variables related to the exploit code, the targeted software, and the ruleset. The assessment was made by executing 267 exploits dated between 2008 and 2019 with different targets and payloads, producing 1143 test cases. Snort rulesets released between 2011 and 2019 and developed by the Emerging Threats Lab (ET) and Sourcefire Vulnerability Research Team (VRT) were employed for the analyses.

The remainder of the paper is structured as follows. Section 2 describes existing ideas concerning detection capabilities of signature-based NIDS, previous work related to these ideas, and presents hypotheses that are tested in the current paper. This is followed by a description of the materials and methods used to test the hypotheses in section 3. Section 4 describes the results and section 5 discusses the results.

2. Related work and hypotheses

The body of research that addresses NIDS is considerable. For instance, the database Scopus contains more than 2000 records matching the phrase "network intrusion detection system" or the phrase "network-based intrusion detection system." Few of these records present comprehensive models describing variables that are associated with the efficacy or usefulness of signature-based NIDS. This section presents nine hypotheses related to detection probability drawn from previous tests, the use of threat intelligence in intrusion detection, and research focusing on specific software and protocols.

2.1. Previous tests of rules and rulesets

First, the design of signature-based NIDS and the presence of different rulesets make it natural to expect that different rulesets will have different capabilities. Thus, detection probability ought to depend on the ruleset used.

Detection probability is dependent on the ruleset used. (Hypothesis 1)

This hypothesis is well established and clearly holds true. However, no formal tests of different rulesets can be found. The literature contains a number of more or less realistic tests of signature-based NIDS that assess detection probability. These tests are limited to a few attacks and a few NIDS solutions. More importantly, they only investigate how attack type, hardware, and NIDS software influence detection capabilities. For example, the test performed by Erlacher and Dressler (2018) on Snort focuses explicitly on HTTP traffic, and the tests performed in Milenkoski et al. (2015) are limited to attacks on a specific web server. Other research looks at rulesets but does not focus on detection capability. For instance, Nyasore et al. (2020) evaluate the overlap between different rules in two rulesets for the popular NIDS Snort. Thus, differences between different rulesets have not been explicitly tested.

To the authors' knowledge, the only previous works that explicitly evaluate how different variables influence detection probability are those by
Holm (2014) and Raftopoulos and Dimitropoulos (2013). Holm (2014) tested how much lower the detection probability is when the creators of a ruleset do not know about the vulnerability that is exploited. Raftopoulos and Dimitropoulos (2013) tested how detection probability relates to characteristics of specific rules, such as the fields that are checked.

On the other hand, a considerable number of ideas have been documented on how to automatically develop rules for NIDS, and these give hints about what ought to predict a ruleset's ability to detect different attacks. Ideas on automated rule generation can be traced back to the early 2000s (Levine et al., 2003). Many papers on automated rule generation focus on the detection of activity performed after a vulnerability has been exploited (e.g., connecting to command-and-control networks (Zand et al., 2014)), performance improvements (e.g., using field programmable gate arrays (Hieu et al., 2013)), or the creation of new rules by aggregation of existing rules (e.g., by correlating alerts from different NIDSs (Vasilomanolakis et al., 2015)). These contributions are beyond the scope of the study described in this paper as they concern specific solutions that are not (yet) used by most practitioners. Nevertheless, we were able to identify a number of contributions describing methods for constructing rules to detect actual exploits. These methods can be divided into two broad categories:

● Solutions relying on knowledge or data on known or suspected attacks, i.e., threat intelligence.
● Solutions built to generate signatures for a certain type of software or protocol, e.g., web applications.

These are described further below together with related hypotheses.

2.2. Variables related to threat intelligence

Many ideas have been presented on how to use information about known or suspected attacks, i.e., some form of threat intelligence, to improve intrusion detection. For instance, a number of proposals on the correlation of IDS alerts based on some model of the threat have been presented (Hofmann & Sick, 2011) (Shittu et al., 2015) (Ramaki et al., 2015) (Mahdavi et al., 2020). When it comes to specific signatures that generate IDS alerts, a number of factors related to threat intelligence are worth noting.

First, a number of research efforts have demonstrated how traffic traces that contain actual exploit code in transit can be used to produce rules. For instance, Wuu et al. (2007) used a dataset they had produced themselves, Portokalidis et al. (2006) used data from honeypots, Khamphakdee et al. (2014) used the DARPA dataset, and Lee et al. (2018) used malware samples. These ideas involve a "ground truth" stating which parts of the traffic are benign and malicious. They thereby suggest that it is easier to generate signatures when the malicious traffic is known, i.e., when the exploit is readily available.

Detection probability is higher for disclosed exploits than for undisclosed (zero-day) exploits. (Hypothesis 2)

It should be noted that there might be cases where this notion does not hold. In particular, it is sufficient that future attacks can be predicted to create signatures that detect attacks. For instance, some solutions use knowledge about known vulnerabilities to identify potential exploit codes automatically (Chandrasekaran et al., 2006). The test described by Holm (2014) confirms that knowledge about attacks is of relevance but not necessary. More specifically, previously unknown attacks were detected if they involved requests that concerned files or commands known to be sensitive, contained a sequence of no-operation instructions, or interacted with an authentication mechanism (Holm, 2014).

Second, the amount of time that traffic traces and other information about attacks have been available can be expected to increase detection probability. Gascon et al. (2011) refer to this as the "NIDS-Exploit update delay," and found that the median delay was 146 days in a sample they analyzed.

Detection probability increases as the number of days an exploit has been known increases. (Hypothesis 3)

There are reasons to question this hypothesis, too. The number of attacks has been found to peak immediately after the public release of a
vulnerability with little activity long before and after the "day zero" (Bilge & Dumitras, 2012). Thus, while traffic traces from the days immediately after the release of an exploit code ought to be useful, traffic traces years after the release may add little in terms of threat intelligence.

Third, NIDS rules occasionally have an explicit reference to the exploit or vulnerability they cover. In this case, the rule should be more effective, as the attack undoubtedly is known and considered relevant by the rule's creator.

Detection probability is higher when the exploited vulnerability is referenced in the signatures. (Hypothesis 4)

Fourth, exploits typically carry a payload with instructions that are to be executed on the targeted machine, e.g., some piece of software to be executed. Attackers usually want certain functions in these payloads, and they are therefore reused between attacks. Some research addresses this and develops signatures searching for payloads in network traffic. For instance, Rubin et al. (2004) present a method to thwart techniques that attempt to split the payload over several TCP packets. Thus, knowledge about the payload carried by an exploit and the way it is transferred to the target is also relevant. In particular, payloads associated with maliciousness should raise alerts when benign nonsense payloads do not.

Detection probability is lower when payloads with nonsense content (e.g., a debug trap) are used. (Hypothesis 5)

In summary, various knowledge about exploits and payloads ought to increase the probability that an attack is detected by a NIDS.

2.3. Variables related to software and protocol

A number of ideas have been presented on the use of context information such as network topology in intrusion detection, e.g., the work by Mitchell and Chen (2015) and Pan et al. (2019). Similarly, solutions for generating rules are often limited to a certain type of software or a specific protocol. For instance, Nivethan and Papa (2016) identified rules for SCADA software, Nadler et al. (2019) focus on rules for the DNS protocol, and Garcia-Teodoro et al. (2015) generate rules for the HTTP protocol. These proposals occasionally make use of knowledge related to the software and identify events that would be potential threats. For instance, a command sent to a server software prior to a login is considered suspicious in the research by Tran et al. (2012).

It is not apparent which protocols and software producers of rules for NIDS focus on. However, it is reasonable to expect that they will focus on protocols and software that are frequently used by attackers in network-based attacks. Given their exposure to external traffic, web applications and web browsers are two examples of software that is expected to be prioritized. In line with this, Snort has special support for decoding and normalizing HTTP traffic, suggesting that it should cope with such exploits better.

Detection probability is higher than average for attacks on web browsers. (Hypothesis 6)

Detection probability is higher than average for attacks on web applications. (Hypothesis 7)

The test performed by Holm (2014) provides reasons to doubt these hypotheses. It reported a relatively low detection probability for HTTP-based attacks compared to other types of protocols. A possible reason for this result is that those attacks were encrypted with HTTPS in the test of Holm (2014), and therefore impossible to process for the NIDS.

The targeted software platform also may influence how well signatures detect attacks, because the platform may require particular code and because other variables correlate with the software platform. In particular, software can vary in popularity for both users and attackers. Signature developers are likely to prioritize the more popular, vulnerable, and critical software. The research on rule generation for SCADA-based systems is an example of this notion. Based on the widespread use of Windows-based operating systems, we theorize that signature producers will focus more on attacks targeting this platform.

Detection probability is higher than average for attacks targeting Windows machines. (Hypothesis 8)

Based on the same reasoning, we expect that rules will focus on the more popular software versions. For instance, software versions for English users are expected to receive more attention from signature producers than software versions for Norwegian users. It is difficult and complex to
quantify the popularity of different software versions over the period addressed in this study. We therefore settle with the claim that software versions will be associated with the probability of detection.

Detection probability depends on the version of the software being exploited. (Hypothesis 9)

3. Materials and methods

This test launched exploits against virtual machines deployed in CRATE, a cyber range operated by the Swedish Defense Research Agency. Traffic recorded from this process was fed to different instances of Snort to obtain data on which cases were detected. The sections below describe the database of exploits, signature databases, test procedure, operationalization of variables, and statistical analysis.

3.1. Sample of exploits

The test used exploit modules that are part of the Metasploit Framework and provide privileges on the targeted machine when they are successful. The Metasploit Framework has long been one of the most renowned platforms for managing cyber-attacks (Ramirez-Silva & Dacier, 2007) and offers a database of exploits that can be considered representative for contemporary cyber threats. In fact, a clear link has been established between the availability of exploits for the Metasploit Framework and their use in cyber-attacks (Ramirez-Silva & Dacier, 2007). The database used in this research, which is dated spring 2019, contains a total of 1867 exploit modules that provide privileges on the targeted machine.

The number of exploits required for this test was determined based on power calculations on the number of exploits from the years 2008-2019 needed to make statements on the relevance of exploits being publicly known (i.e., hypothesis 2). Exploits were selected to be representative of the exploit types in Metasploit (e.g., "linux/http") and to be relevant for a network-based NIDS. The latter means that the exploit needs to be associated with a specific network communication pattern. Consequently, file format exploits, such as malicious PDF files targeting a PDF reader, which can be delivered to the vulnerable machine in many ways, were considered out of scope. Local exploits, typically elevating privileges on a machine, were also excluded since they are typically not associated with network communication.

Altogether, 267 exploit modules satisfying the abovementioned conditions were selected for inclusion using stratified randomization, with the years used as strata to ensure a good distribution over 2008-2019. Most exploits can be set to target different types of software and use different payloads. Each exploit was executed against all available targets to enable assessments of whether software version matters. Three payloads were randomly selected for each exploit configuration to allow analysis of whether the payload matters.

3.2. Test control

The 267 exploits targeted a wide variety of ports and services. Instead of obtaining all the vulnerable products and installing them in a test environment, the exploits were changed such that they would behave properly when they interacted with a generic target responding with the right protocol on the right port. Some exploits contained checks that would abort execution if the target did not match a predefined pattern (e.g., responding with a certain web browser version), and some contained a nontrivial two-way interaction (e.g., involving the use of a session ID in a web request). Code was inserted to ensure that the exploits would execute as if a vulnerable target machine were on the other side, regardless of whether this was the case. For instance, if-statements requiring specific banners in the responses of the target machine were removed, and a simple string was used as session ID if the exploit did not receive a proper session ID from the simulated server.

A few sampled exploits were too complicated to revise in a reliable manner. For instance, an SMB implementation no longer available on the market, required for the exploit windows/smb/timbuktu_plughntcommand_bof, proved too difficult to simulate. For every abandoned exploit module, a new module from the same year was randomly selected.
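The stratified selection described in section 3.1 can be illustrated with a minimal sketch. The catalog contents, year labels, and per-year quota below are illustrative assumptions, not the authors' actual tooling.

```python
# Minimal sketch of stratified random selection of exploit modules by year.
import random
from collections import defaultdict

def stratified_sample(modules, per_year):
    """modules: iterable of (module_name, disclosure_year) tuples."""
    by_year = defaultdict(list)
    for name, year in modules:
        by_year[year].append(name)
    selected = []
    for year in sorted(by_year):
        pool = by_year[year]
        # Years are the strata; draw the same number from each year
        # (or the whole stratum if it is smaller than the quota).
        selected.extend(random.sample(pool, min(per_year, len(pool))))
    return selected

# Hypothetical usage: a catalog spread over 2008-2019.
catalog = [("linux/http/example_rce", 2011), ("windows/smb/example_bof", 2008)]
print(stratified_sample(catalog, per_year=22))
```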
In addition to these changes, code was added to the exploits that logged when the exploits sent nonmalicious traffic (e.g., connected to a server) and malicious traffic (e.g., overflowed a buffer). This process enabled verification of the execution and simplified the matching between exploits and alerts.

All exploits were executed against one of three types of machines that were chosen depending on the requirements of the exploit. The three machines were a Linux-based server, a Windows-based server, and a Windows machine running a web browser. Twenty instances of each machine type were deployed in CRATE. For example, web server attacks targeted an Apache web server running on the Linux machine type, and web browser attacks targeted Internet Explorer running on the Windows client machine type. Many server-side attacks only involved simple one-way interaction with the target machine. All such exploits targeted the Linux machine type, which was preconfigured to respond to TCP and UDP requests on all targeted ports.

3.3. NIDS and rulesets

This test used the NIDS Snort. Snort is one of the more popular NIDSs (Bhosale & Mane, 2015), is signature-based, has had an active community for more than a decade, and has used the same format for signatures for a long time. These features made it possible to compare rulesets produced at different points in time with each other and to compare rulesets produced by different parts of the community with each other.

Unfortunately, there is no publicly available repository of old rulesets. To facilitate the tests, this project collected rulesets available through old versions of security-related Linux distributions (e.g., Security Onion) and Internet repositories. Thus, the selected rulesets are considered a convenience sample of rules. Rulesets dated approximately one year apart were included. It should be noted that the Snort Community rules were also retrieved and tested. However, these rulesets were excluded from further analysis as they only detected between 0% and 2% of the tested attacks.

Table 1 describes the rulesets used in the tests. Each ruleset had a number of rules that were disabled, potentially because they were no longer deemed relevant to the typical user or caused too many false positives. Tests were performed both with and without these rules activated.

Table 1. Rulesets used in the tests.
Producer   Release date   Number of default rules   Total number of rules
ET         2011-04-02     7846                      10186
ET         2012-01-25     12403                     14788
ET         2013-07-24     14795                     18163
ET         2014-02-18     16022                     19499
ET         2015-01-29     17229                     21116
ET         2016-01-01     14795                     18164
ET         2017-01-15     20543                     24912
ET         2018-01-09     19704                     26225
ET         2019-02-26     19539                     27062
VRT        2014-09-22     5252                      21654
VRT        2015-04-06     6721                      23492
VRT        2019-02-26     11066                     46896

The rulesets from VRT and ET were used with version 2.9.9.0 of Snort (dated 2016). A few ET rules (at most eight rules from a single ruleset) were removed to ensure compatibility with the Snort version used in the test. These removed rules concerned situations outside the scope of this test (e.g., attacks from 2006), and their absence is not believed to influence the results or conclusions from the study in any significant manner. Furthermore, Snort is shipped with a set of preprocessing rules. As this study is only concerned with the rulesets in Table 1, preprocessor rules were disabled.

Two other Snort versions with and without the preprocessor rules were tested to ensure that the preprocessor and signature database were independent from each other with respect to producing alerts: versions 2.9.2.1 (dated 2012) and 2.9.12 (dated 2018). Our tests showed that the Snort versions produced the same results without the preprocessor rules enabled and that the influence of preprocessor rules was stable across the three tested versions. The preprocessor increased the probability of priority-1-alerts by 1%, priority-2-alerts by 12%, and priority-3-alerts by 11%.

3.4. Test procedure

All tests were performed according to a standardized procedure implemented in the test platform SVED (Holm & Sommestad, 2016) in the cyber range CRATE. SVED allows automated execution and logging of graphs of connected actions, such as attacks, in CRATE.
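The replay-and-score step that the procedure relies on can be sketched as follows. This is not the SVED implementation: the command invocation, file paths, and the assumption that fast-format alerts are written to a file named "alert" in the log directory are illustrative.

```python
# Sketch: read a recorded pcap back through Snort with a given ruleset and
# collect the priority-1 alerts from the fast-alert output.
import subprocess
from pathlib import Path

def priority1_alerts(snort_conf: str, pcap: str, log_dir: str) -> list[str]:
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    # -r: read packets from a pcap, -c: configuration (ruleset),
    # -A fast: one-line alerts, -l: log directory, -q: suppress banner output.
    subprocess.run(
        ["snort", "-q", "-A", "fast", "-c", snort_conf, "-r", pcap, "-l", log_dir],
        check=True,
    )
    alert_file = Path(log_dir) / "alert"   # assumed fast-alert file name
    if not alert_file.exists():
        return []
    return [line for line in alert_file.read_text().splitlines()
            if "[Priority: 1]" in line]

# Hypothetical usage: one ruleset configuration against one recording.
# alerts = priority1_alerts("et-2015-01-29/snort.conf", "exploit_0421.pcap", "logs/0421")
```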
Server-side exploits were triggered by simply sending the traffic to the server machine; client-side exploits (exclusively web browser exploits) were triggered by commanding the target machine to access a particular URL using the web browser. The target machine was reset into a known responsive state and traffic recordings were restarted before each exploit was executed.

The recorded traffic was replayed to the different versions of Snort and the rulesets listed in Table 1. SVED's logs enabled straightforward comparisons between the carried-out attacks and the alerts produced by Snort. As a part of this comparison, IP addresses and timestamps were checked to ensure that alerts were generated for the attack in question and not for some background traffic, such as a network broadcast in the cyber range.

3.5. Variables and their operationalizations

● The attack was considered referenced by the ruleset (hypothesis 4) if the exploited vulnerability's CVE code was mentioned in one of the active rules of the tested ruleset.
● Payloads containing a debug trap or a custom command (set as a file listing command in the test) were considered (benign) nonsense payloads (hypothesis 5).
● Whether the target was a web browser, a web application, or a Windows machine (hypotheses 6-8) was determined based on the exploit module's name.
● Each target of an exploit module in the Metasploit Framework is considered an attack against a different software version (hypothesis 9).

All these operationalizations relate to the execution of one exploit module with certain settings and one ruleset configured in a certain manner.
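Two of these operationalizations lend themselves to a short sketch: counting the active (non-commented) rules of a ruleset, as in the "default rules" column of Table 1, and checking whether an active rule references a given CVE, as in hypothesis 4. The file layout and the example CVE identifier are illustrative assumptions; only the "reference:cve,..." keyword format is standard Snort rule syntax.

```python
# Sketch: active-rule counting and CVE-reference lookup over plain-text .rules files.
from pathlib import Path

def active_rules(ruleset_dir: str) -> list[str]:
    rules = []
    for rules_file in Path(ruleset_dir).glob("*.rules"):
        for line in rules_file.read_text(errors="ignore").splitlines():
            stripped = line.strip()
            # Disabled rules ship commented out, e.g. "# alert tcp ...".
            if stripped.startswith(("alert", "drop", "reject", "sdrop")):
                rules.append(stripped)
    return rules

def references_cve(rules: list[str], cve: str) -> bool:
    # Snort rules reference CVEs as "reference:cve,2008-4250;".
    needle = f"reference:cve,{cve.lower().removeprefix('cve-')}"
    return any(needle in rule.lower() for rule in rules)

rules = active_rules("rulesets/et-2015-01-29")        # hypothetical path
print(len(rules))                                     # compare with Table 1 counts
print(references_cve(rules, "CVE-2008-4250"))         # operationalization of hypothesis 4
```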
4. Results

This section first presents bivariate analyses testing each of the nine hypotheses. This is followed by the multivariate analysis testing whether the hypotheses hold when other variables are controlled for.

Figure 1. Number of rulesets associated with different probability of detection for the attacks.
Table 2. Number of cases detected given that the attack code pre-dates the signatures (i.e., is a zero-day) or not.
                              Detected as known attack
                              Yes        No
Detected as zero-day   Yes    11318      48
                       No     2405       6293

4.3. Time since disclosure of the exploit

As it may take some time to produce rules for new attacks, hypothesis 3 states that the time a vulnerability has been publicly known increases the detection probability. Figure 2 and Figure 3 illustrate the relationship between time and the probability of detection. Overall, the correlation between the probability of priority-1-alerts and the time an attack has been known is positive (r = 0.08) and significant. However, the observant reader may notice that the relationship to time appears to be nonlinear in the figures. In particular, Figure 3 suggests that VRT's default rulesets have low detection probabilities before an exploit is released and that detection probability peaks approximately ten quarters (2.5 years) after the exploit has become known. The reason may be that rules typically are deactivated when attack levels have reduced and the rules become obsolete. The observant reader may also notice that the probability of detection appears to decrease slightly over time for ET when all rules are activated (cf. Figure 2). However, the correlation with days since disclosure for ET with all rules activated is close to zero (<0.01) and insignificant. Furthermore, the correlation across all rulesets is close to zero (<0.01) when the binary variables disclosure and reference to the exploited vulnerability are controlled for.

Thus, hypothesis 3 is only partially supported. The time since disclosure has a complicated relationship with detection probability.

4.4. Reference to the exploited vulnerability

Attacks that are referenced in the ruleset are more likely to raise a priority-1-alert. Table 3 details how detection relates to references in the dataset. The 95% confidence interval for the probability of detection is 0.55-0.57 for un-referenced attacks and 0.69-0.71 for referenced attacks. Thus, hypothesis 4 is supported in the data.

4.5. Nonsense payload

Paired comparisons where only the payload is varied reveal a difference in detection in 4% of the cases. In other words, the payload matters. However, hypothesis 5 is more specific and states that nonsense payloads should produce alerts less often than other payloads. Table 4 illustrates how attacks executed with and without payloads representing nonsense are detected. These 1334 pairwise comparisons illustrate that attacks are detected more seldom when a nonsense payload is used.
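The confidence intervals quoted in section 4.4 can be checked with a short worked computation. The paper does not state which interval method was used; a normal approximation for a proportion, applied to the counts reported in Table 3, is an assumption that happens to reproduce the reported ranges.

```python
# Worked check of the 95% confidence intervals for detection probability.
from math import sqrt

def proportion_ci(detected: int, total: int, z: float = 1.96) -> tuple[float, float]:
    p = detected / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width

print(proportion_ci(5609, 5609 + 2460))    # referenced attacks:   ~ (0.685, 0.705)
print(proportion_ci(10855, 10855 + 8508))  # un-referenced attacks: ~ (0.554, 0.568)
```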
Figure 2. ET ruleset detection probability as a function of the number of quarters the vulnerability has been known. (Plot of probability of detection, 0-1, against the number of quarters publicly known, -40 to 50.)
Figure 3. VRT ruleset detection probability as a function of the number of quarters the exploit has been known. (Plot of probability of detection, 0-1, against the number of quarters publicly known, -40 to 50; series: VRT all rules activated, VRT default rules activated.)

Table 3. Number of detected and un-detected cases when exploited vulnerabilities are referenced or not.
                   Reference to vuln. in ruleset
                   Yes       No
Detected    Yes    5609      10855
            No     2460      8508

[...] many different types of software (e.g., web browsers and web servers) and variants of the same software (e.g., on different operating systems). Table 5 shows the detection probability for exploit modules containing different strings. Hypotheses 6-8 suggest that attacks associated with web browsers ("browser"), web applications ("webapp" and "http"), and Windows ("windows") should be associated with a higher probability of detection. This is definitely the case for web browsers (r = 0.20) and Windows (r = 0.07). This is the case for web applications (r = 0.03) when the operating system is controlled for, but not otherwise (r = -0.02). Thus, hypotheses 6 and 8 are supported. Hypothesis 7 is only supported when other variables are controlled for.

Tables 7 and 8 describe the regression models fitted to the data for each of the rulesets. The probability of detection has no clear relationship to the time since disclosure when the related variables are controlled for, and this variable is therefore excluded from the regression model. In addition, the regression model does not include software version given the multinomial nature of this variable.

All models are statistically significant, i.e., they help to predict when priority-1-alerts occur. However, their goodness-of-fit varies. Nagelkerke's R2 varies between 0.05 and 0.73. The portion of correctly classified cases varies between 0.65 and 0.92, and the area under curve varies between 0.62 and 0.93. An example of how the logistic regression model's predictions match actual values is given in Figure 4 for the case when all ET rules are included. The model whose prediction errors are illustrated in Figure 4 has a Nagelkerke's R2 of 0.36, classifies 88% of the cases correctly, and has an area under ROC curve of 0.84.

Figure 4. Prediction errors (residuals) for all ET rules with all rules activated. Negative error implies that the model predicted a detected attack but the attack was not detected. The line represents the normal distribution.

The relationship of variables to alerts is expressed as an odds-ratio (OR). An OR is not to be confused with probabilities or relative risks. In this model, it reflects the odds of a priority-1-alert given that the condition is met divided by the odds that an alert is produced when the condition is not met, where the odds are Palert/(1 - Palert). Values above one imply that the variable increases the odds of an alert (and thus alert probability) given that other variables in the model remain unchanged. For instance, the OR of 2.5 for Windows in Table 8 (VRT 2015-04-06) means that the odds of an alert for attacks on Windows machines are more than twice as high as for attacks on other machines. Conversely, values below one imply that the variable is associated with a reduced detection probability. For instance, the value of 0.51 for exploits that are not public in the same ruleset implies that alerts are produced less often if the attack is a zero-day.

As expected, most of the ORs suggest that zero-day attacks are detected less often, nonsense payloads generate fewer alerts, referenced attacks are detected more often, and attacks on Windows machines are detected more often. The large differences between ORs in the regression models also provide further support for hypothesis 1, i.e., that the ruleset matters. However, some results are not as hypothesized.

First, cases with no reference to the exploited vulnerabilities are associated with an OR value less than one for ET with default rules, contradicting hypothesis 4. This unexpected result for default rules is largely due to references to CVE-2008-4250 (the module "windows/smb/ms08_067_netapi"). This module has many targets, and the ET rules that reference this vulnerability are only able to detect two of all 172 tested target and payload combinations. With all rules activated, general rules for detecting shellcode also trigger alerts for other versions of these attacks.

Second, attacks on web browsers have a lower probability of detection for one VRT ruleset with default rules, and attacks on web applications have a lower probability of detection for two VRT rulesets with all
rules activated. We find no simple explanation for these results. Thus, these hypotheses cannot be said to hold across rulesets when other variables are controlled for.

Third, not all ORs are statistically significant, and the ORs vary substantially between different rulesets. For instance, the presence of a reference to a vulnerability in VRT's default rulesets (Table 7) influences the odds of an alert much more than it does in the VRT rulesets with all rules activated (Table 8). It is difficult to see any trends in what producers focus on, and much of the variation appears to be stochastic. Specific rules often explain the fluctuations. For instance, the OR of 18.9 for web browser attacks in ET's default rules from 2015-01-29 is partly because that version had a rule (sid 2019091) that alerts on patterns corresponding to random strings generated by the MSF. This rule was removed before the next ruleset was released, and the OR decreased to 5.85 until the rule was reintroduced. Another anomaly is the extreme ORs in the magnitude of 10^10 for browser attacks in ET with all rules activated. These arise due to a set of generic rules that raise alerts for every single browser attack in the test. The reason is that ET has rules that raise alerts (sid 1:2010706:7) due to the user agent used in these tests.

Overall, it appears that producers of rulesets focus on different things. For instance, the ORs provide some support for the name ET (Emerging Threats): when all rules are activated, the detection probability of ET is less dependent on public information about the exploit (i.e., the OR is closer to one). On the other hand, zero-days still reduce the detection probability in most of the tested ET rulesets.

5. Discussion

This section first summarizes the results of the statistical tests. Thereafter, the validity and reliability of this test are discussed. Finally, implications for practice and research are discussed.

5.1. Overview

The nine hypotheses were tested through both bivariate and multivariate analyses. As summarized in Table 9, some hypotheses are only partially supported as they hold only under certain conditions. More specifically, the number of days between the release date of an exploit and a ruleset has no clear link to detection probability when references to the exploited vulnerability and disclosure of the vulnerability are already considered. References to the exploited vulnerability are related to detection probability in all rulesets but inversely related to detection probability when all ET rules are used. The main reason is that references to one exploit code are frequently used in the tests. Web browsers and web applications decrease the detection probability in some VRT rulesets but have a positive relationship to detection probability overall.

5.2. Validity and reliability

There are many possible objections to the tests described in this paper.

First, the selection of exploits, rulesets, and Snort version may be biased in several ways. The exploits were selected from the database of the Metasploit Framework. While this database represents relevant attacks, they are not necessarily representative of the attacks that system administrators should be concerned about. For instance, the selection resulted in no attacks on mobile phones, which may be a concern in today's enterprises. In addition, the NIDS rulesets are a convenience sample. While they are evenly distributed over time, VRT rulesets cover a shorter period. The absence of VRT rulesets dated before 2014 also means that VRT was tested with fewer zero-day attacks (18% compared to 26% for ET). As discussed earlier, very small differences were obtained in tests involving three different Snort versions. However, software development occasionally leads to significant differences between versions, and it is possible that the selection of version (and thereby preprocessor) would influence detection rates.
Table 7. Prediction models for detection with default rules activated. Bold ORs are statistically significant.

                                          VRT                                        ET
                                          All      2014-  2015-  2019-   All      2011-  2012-  2013-  2014-  2015-  2016-  2017-  2018-  2019-
                                          versions 09-22  04-06  02-26   versions 04-02  01-25  07-24  02-18  01-29  01-01  01-15  01-09  02-26
Detection probability (priority-1-alert)  0.31     0.30   0.32   0.30    0.39     0.22   0.29   0.38   0.40   0.44   0.38   0.44   0.45   0.48
Odds-ratios
  Exploited vuln. is referenced           37.2     60.6   38.6   31.5    0.42     0.33   0.14   0.49   0.34   0.30   0.49   0.32   0.36   0.55
  The exploit is not public               0.64     0.75   0.71   -       0.46     1.00   0.51   0.57   0.63   0.80   0.79   0.55   0.71   -
  The attack targets a web browser        0.71     0.42   0.49   1.14    8.78     9.53   0.51   5.87   8.40   18.9   5.85   19.4   17.7   12.2
  The payload is nonsense code            0.69     0.60   0.41   1.27    0.34     0.28   0.38   0.35   0.31   0.28   0.39   0.33   0.32   0.39
  The attack targets a web app.           0.99     0.62   0.68   1.90    2.28     4.41   1.18   2.36   2.20   1.75   2.24   1.88   1.78   2.98
  The attack targets a win-machine        8.76     17.5   7.28   7.32    1.06     1.74   1.66   1.18   1.31   1.03   1.42   0.96   0.95   0.93
Model fit
  Nagelkerke's R2                         0.58     0.73   0.48   0.54    0.18     0.21   0.19   0.12   0.16   0.22   0.11   0.23   0.21   0.18
  Portion correctly classified            0.90     0.92   0.90   0.88    0.70     0.82   0.76   0.66   0.67   0.70   0.67   0.70   0.69   0.65
  Area under ROC curve                    0.90     0.93   0.91   0.87    0.73     0.73   0.72   0.69   0.72   0.75   0.69   0.76   0.75   0.74
Table 8. Prediction models for detection with all rules activated. Bold ORs are statistically significant.

                                          VRT                                        ET
                                          All      2014-  2015-  2019-   All      2011-  2012-  2013-  2014-  2015-  2016-  2017-  2018-  2019-
                                          versions 09-22  04-06  02-26   versions 04-02  01-25  07-24  02-18  01-29  01-01  01-15  01-09  02-26
Detection probability (priority-1-alert)  0.74     0.66   0.72   0.85    0.87     0.83   0.87   0.87   0.87   0.87   0.87   0.87   0.87   0.87
Odds-ratios
  Exploited vuln. is referenced           2.49     2.96   2.77   1.90    3.58     13.4   4.23   4.65   4.68   5.06   4.69   4.86   5.35   0.68
  The exploit is not public               0.36     0.57   0.51   -       0.84     1.50   0.68   0.92   0.94   0.72   0.92   0.53   1.41   -
  The attack targets a web browser        1.02     1.34   1.28   0.62    10^9     10^10  10^9   10^9   10^9   10^9   10^9   10^9   10^9   10^9
  The payload is nonsense code            0.63     0.51   0.60   0.87    0.31     0.28   0.32   0.31   0.31   0.28   0.31   0.24   0.27   0.36
  The attack targets a web app.           0.61     0.47   0.76   0.55    20.5     35.1   21.5   21.3   21.2   25.1   21.5   17.3   15.9   12.2
  The attack targets a win-machine        1.90     2.50   2.42   1.21    1.81     1.64   1.59   1.80   1.83   1.88   1.85   1.26   1.36   2.20
Model fit
  Nagelkerke's R2                         0.20     0.26   0.28   0.05    0.36     0.50   0.35   0.35   0.35   0.37   0.35   0.34   0.34   0.28
  Portion correctly classified            0.78     0.79   0.76   0.85    0.88     0.89   0.88   0.89   0.89   0.89   0.89   0.90   0.88   0.88
  Area under ROC curve                    0.77     0.81   0.78   0.62    0.84     0.88   0.84   0.84   0.84   0.85   0.83   0.83   0.84   0.81
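To make the quantities in Tables 7 and 8 concrete, a minimal sketch of the kind of analysis behind them follows: fit a logistic regression on binary predictors, exponentiate the coefficients to obtain odds-ratios, and compute Nagelkerke's R2 from the model and null log-likelihoods. The simulated data and coefficient values are illustrative assumptions, not the study's data or analysis code.

```python
# Sketch: logistic regression, odds-ratios, and Nagelkerke's R^2 on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
referenced = rng.integers(0, 2, n)   # exploited vuln. referenced in ruleset
zero_day   = rng.integers(0, 2, n)   # exploit not publicly known
nonsense   = rng.integers(0, 2, n)   # nonsense payload used

# Simulate priority-1 alerts from an assumed "true" logistic model.
true_logit = -0.5 + 1.2 * referenced - 0.8 * zero_day - 0.9 * nonsense
detected = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(np.column_stack([referenced, zero_day, nonsense]))
fit = sm.Logit(detected, X).fit(disp=False)

odds_ratios = np.exp(fit.params)     # OR = exp(coefficient); OR > 1 raises the odds
cox_snell = 1 - np.exp(2 * (fit.llnull - fit.llf) / n)
nagelkerke = cox_snell / (1 - np.exp(2 * fit.llnull / n))
print(odds_ratios)
print(round(nagelkerke, 2))
```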
Table 9. Support for the hypotheses concerning probability of detection.
Hypothesis                                                                                            Support
Detection probability depends on the ruleset.                                                         Full
Detection probability is higher for disclosed exploits than for undisclosed (zero-day) exploits.      Full
Detection probability increases with the number of days an exploit has been known.                    Partial
Detection probability is higher when the exploited vulnerability is referenced in the signatures.     Partial
Detection probability is lower when payloads with nonsense content are used.                          Full
Detection probability is higher than average for attacks on web browsers.                             Partial
Detection probability is higher than average for attacks on web applications.                         Partial
Detection probability is higher than average for attacks targeting Windows machines.                  Full
Detection probability depends on the version of the software being exploited.                         Full

Second, there are some issues with the test procedure worth noting. One issue is the realism of the simulation. The tests seldom involved responses from the actual software targets to requests made by attacks that involved two-way interactions. Instead, responses were generally simulated by either 1) a different software that ran the same protocol stack as the target (e.g., an apache web server, vsftpd or openssh-server), 2) a general script that answered with dummy TCP/UDP data, 3) altering the exploit source code to remove the need for correct network responses, or 4) a combination of 1 and 3 or 2 and 3. While this design was necessary from a cost perspective, and our manual reviews of the rulesets suggest alerts seldom are given from responses from targeted machines, it is possible that it suppressed alerts.

Third, each of the 267 unique exploits was tested with all available targets and (up to) three payloads per target. Thus, exploits with very few targets and few applicable payloads are undersampled. While this may be aligned with the exploit's relative importance in security, it is important to recognize this when results are interpreted. For instance, the large number of targets and payloads available for the module "windows/smb/ms08_067_netapi" contributed to the peculiar result related to ET and referenced attacks (see section 4.2.2).

Fourth, there are conditions in the tests that make them relevant to a limited threat scenario. For instance, encryption was not used in communication with web servers, and attacks were performed from a network configured to be external to the target. In practice, however, encryption often poses an obstacle to NIDSs, and attacks should be expected from within a "home" network too, where the NIDS may not be configured to look for attacks. Furthermore, many rules in the rulesets (especially in ET) concern malicious code's communication with command-and-control servers and trigger based on destination IPv4 addresses. Analyzing the quality of such IP-based rules is beyond the scope of this paper.

Fifth, the variables in these tests are high-level generalizations that aim to predict whether attacks will trigger alerts. In practice, a skilled NIDS operator is likely able to make qualified guesses based on many other variables. For instance, the references to vulnerabilities exploited in the attacks are based on CVE codes in our operationalization. However, an NIDS operator may be able to guess whether an attack is covered by general rules based on a vulnerability description or an attack description.
Sixth, some statistical results are peculiar, at least at first glance. For instance, the extreme odds-ratio for web browsers in the ET ruleset with all rules activated may be considered strange. Likewise, the probability of detection was surprisingly independent of the time a vulnerability has been known in the ET rulesets with all rules activated. The abovementioned explanations to these results demonstrate properties in the data: a few (generic) rules can make a significant difference in detection probability. This situation is not ideal for probabilistic models, such as logistic regression. Consequently, not all effect sizes reported here reflect general tendencies, and the reader should interpret individual effect sizes of the different models carefully.

5.3. Implications to practice

The tests described in this paper both confirm widely accepted ideas and report on less widely known relationships.

First, readers should be cautioned that this test only focuses on variables that improve alert probability. Unfortunately, generic rules tend to produce alerts for both benign and malicious traffic and may in fact decrease system administrators' detection capability. To illustrate this problem, we exposed the NIDS solutions used in the tests to 544 benign traffic recordings (pcap files) obtained from Wireshark's sample captures (Boukris, 2019). The ruleset detection probability and the number of false priority-1-alerts they produce for these data are strongly correlated (r = 0.81). For instance, when all rules are activated in the rulesets, 255% more alerts are produced for the benign sample captures. Thus, interpretations of the results from this study should recognize that false alerts are an important issue that often occurs together with increased probability of detection. The tests in this study do not take this into account.

Second, the result shows that ruleset producers focus on different things and think differently. VRT seems more accurate at covering the attacks that are referenced by its rules, has more specific rules, and therefore detects fewer attacks. This notion is illustrated by the number of times the rulesets raise the same alert for different attacks. The alerts raised by VRT occur on average 37 times in the data; the alerts raised by ET are triggered on average 89 times. Some examples that illustrate the type of rules triggered include common alerts for shellcode detected in traffic (e.g., sid 2013273 and 2012258) or excessive use of the heap using no-operations (sid 2012111). With all rules active, ET also produces alerts for commonly benign things, such as when HTTP posts go to unusual ports and there is a tilde sign after a URI. The deactivated rules in VRT triggered by the attacks are not as generic but seldom point to a specific attack. For instance, alerts are raised when a pattern suggesting obfuscated data is detected (sid 19884 and 21038). In practice, the specificity of rules can be important because a system administrator must interpret the alert and identify what action to take. This is not the focus of this paper. However, as stated above, our informal assessment is that rulesets by VRT are better in this regard.

Third, this study confirms the results of Holm (2014): available signature databases can detect zero-day attacks, although odds-ratios show that the detection probability is lower for zero-days. Furthermore, other variables are often more important for detection. For instance, in the ET default rulesets, the odds of detection are approximately nine times higher if a web application is attacked and only reduced by a factor of approximately two if the attack is a zero-day.

Fourth, the detection rates show that Snort raises alerts for approximately 66-87% of known and unknown attacks with all provided rules activated. Many of the rules triggered by attacks could be described as generic detection rules not targeting a specific exploit or threat. For example, they raise alerts for HTTP traffic that goes to unusual ports or when arbitrary shellcode is seen in the traffic. While these provided rules come with the cost of many false alerts, they also illustrate that signature-based NIDSs can be used to detect attacks even if the specifics of the exploit are unknown. Many system administrators write similar rules that capture events they consider suspicious anomalies in their organization. With such site-specific rules, even higher detection rates can be expected.
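As a worked illustration of the odds-to-probability statements in section 5.3, the sketch below converts an odds-ratio into a new detection probability using the odds definition P/(1-P) given in section 4. The baseline probability matches the 39% reported for the ET default rulesets, while the two ORs are round values reflecting the "approximately nine times" and "factor of approximately two" phrasing, not exact figures from the tables.

```python
# Convert an odds-ratio into a change in detection probability.
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

baseline = 0.39                         # ET default rulesets, priority-1 alerts
print(apply_odds_ratio(baseline, 9.0))  # web application attacked: ~0.85
print(apply_odds_ratio(baseline, 0.5))  # zero-day exploit:         ~0.24
```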
5.4. Implications for research

The main contribution of this study is the assessment of how different variables influence the probability that alerts are produced in contemporary NIDS. We here describe four potential takeaways for researchers.

First, the prediction of the logistic regression model can be improved. While the variables included in the model are relevant, the model is far from perfect. The operationalizations of variables used in this paper are neither encompassing nor believed to be optimal. For instance, an attack can be referenced by other means than the CVE code it targets, and the perceived threat level at the time rules are written may be worth adding to the model.

Second, future research should use another, more refined dependent variable than the probability of priority-1-alerts. In particular, tests of the specificity and usefulness of the alerts would be relevant. For instance, alerts should provide descriptions that would help a system administrator identify the threat.

Third, other types of predictions and models may illuminate other important relationships. For instance, as we illustrated in section 5.3, there is a very strong relationship between detection probability and the number of false alerts produced by the rulesets (r = 0.81). The number of false alerts produced by a ruleset is not an attribute of the unit of analysis in this test (attacks) and is therefore impossible to fit into the logistic regression model. However, further analysis of this relationship may be performed. For example, analyses could be performed to assess how rule specificity (e.g., defined as in the work by Raftopoulos and Dimitropoulos (2013)) is related to utility. Such research may determine whether rules that generate alerts for zero-day attacks always are unspecific and prone to producing false alerts. Other alternatives include advanced methods for the generation of representative network traffic, e.g., as in the work by Shiravi et al. (2012) or Ring et al. (2019).

Fourth, the result of this study shows the limitations of signature-based solutions and where further research is warranted. As mentioned in section 2 of this paper, previous research has demonstrated solutions that automatically generate rules for known attacks and certain types of vulnerabilities, software, or protocols. However, as this study demonstrates, available rulesets do not even cover well-documented exploit codes, such as those in the Metasploit Framework, and few attacks are explicitly referenced in the rulesets. Thus, research improving existing rule-generation solutions or making them more accessible to practitioners ought to be welcome.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Bhosale, D. A., & Mane, V. M. (2015). Comparative study and analysis of network intrusion detection tools. In 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) (Vol. 9, pp. 312-315). Davangere, India: IEEE. https://doi.org/10.1109/ICATCCT.2015.7456901

Bilge, L., & Dumitras, T. (2012). Before we knew it: An empirical study of zero-day attacks in the real world. Proceedings of the 2012 ACM Conference on Computer and Communications Security - CCS'12, Raleigh, North Carolina, USA, 833-844. https://doi.org/10.1145/2382196.2382284

Boukris, I. (2019). Wireshark - Sample Captures. The Wireshark Foundation. Retrieved July 15, 2019, from https://wiki.wireshark.org/SampleCaptures#Sample_Captures

Chandrasekaran, M., Baig, M., & Upadhyaya, S. (2006). AVARE: Aggregated vulnerability assessment and response against zero-day exploits. In 2006 IEEE International Performance Computing and Communications Conference (Vol. 2006, pp. 603-610). Phoenix, AZ: IEEE. https://doi.org/10.1109/.2006.1629458

Cotroneo, D., Paudice, A., & Pecchia, A. (2019). Empirical analysis and validation of security alerts filtering techniques. IEEE Transactions on Dependable and Secure Computing, 16(5), 856-870. https://doi.org/10.1109/TDSC.2017.2714164

Debar, H., Dacier, M., & Wespi, A. (1999). Towards a taxonomy of intrusion-detection systems. Computer Networks, 31(8), 805-822. https://doi.org/10.1016/S1389-1286(98)00017-6

Erlacher, F., & Dressler, F. (2018). How to test an IDS? GENESIDS: An automated system for generating attack traffic. In WTMC 2018 - Proceedings of the 2018 Workshop on Traffic Measurements for Cybersecurity, Part of SIGCOMM 2018 (pp. 46-51). Budapest, Hungary: Association for Computing Machinery. https://doi.org/10.1145/3229598.3229601
Garcia-Teodoro, P., Diaz-Verdejo, J. E., Tapiador, J. E., & Salazar-Hernandez, R. (2015). Automatic generation of HTTP intrusion signatures by selective identification of anomalies. Computers and Security, 55, 159-174. https://doi.org/10.1016/j.cose.2015.09.007

Gascon, H., Orfila, A., & Blasco, J. (2011). Analysis of update delays in signature-based network intrusion detection systems. Computers and Security, 30(8), 613-624. https://doi.org/10.1016/j.cose.2011.08.010

Goodall, J. R., Lutters, W. G., & Komlodi, A. (2009). Developing expertise for network intrusion detection. Information Technology & People, 22(2), 92-108. https://doi.org/10.1108/09593840910962186

Hieu, T. T., Thinh, T. N., & Tomiyama, S. (2013). ENREM: An efficient NFA-based regular expression matching engine on reconfigurable hardware for NIDS. Journal of Systems Architecture, 59(4-5), 202-212. https://doi.org/10.1016/j.sysarc.2013.03.013

Hofmann, A., & Sick, B. (2011). Online intrusion alert aggregation with generative data stream modeling. IEEE Transactions on Dependable and Secure Computing, 8(2), 282-294. https://doi.org/10.1109/TDSC.2009.36

Holm, H. (2014). Signature based intrusion detection for zero-day attacks: (Not) a closed chapter? In 2014 47th Hawaii International Conference on System Sciences (pp. 4895-4904). Big Island, HI, United States: IEEE. https://doi.org/10.1109/HICSS.2014.600

Holm, H., & Sommestad, T. (2016). SVED: Scanning, Vulnerabilities, Exploits and Detection. In MILCOM 2016 - 2016 IEEE Military Communications Conference (pp. 976-981). Baltimore, MD: IEEE. https://doi.org/10.1109/MILCOM.2016.7795457

Khamphakdee, N., Benjamas, N., & Saiyod, S. (2014). Network traffic data to ARFF converter for association rules technique of data mining. In 2014 IEEE Conference on Open Systems (ICOS) (pp. 89-93). Subang, Malaysia: IEEE. https://doi.org/10.1109/ICOS.2014.7042635

Khraisat, A., Gondal, I., Vamplew, P., & Kamruzzaman, J. (2019). Survey of intrusion detection systems: Techniques, datasets and challenges. Cybersecurity, 2(1), 1. https://doi.org/10.1186/s42400-019-0038-7

Kumar, S., & Spafford, E. H. (1994). A pattern matching model for misuse intrusion detection. In Proceedings of the 17th National Computer Security Conference (pp. 11-21). Baltimore, MD.

Lee, S., Kim, S., Lee, S., Choi, J., Yoon, H., Lee, D., & Lee, J.-R. (2018). LARGen: Automatic signature generation for malwares using latent dirichlet allocation. IEEE Transactions on Dependable and Secure Computing, 15(5), 771-783. https://doi.org/10.1109/TDSC.2016.2609907

Levine, J., LaBella, R., Owen, H., Contis, D., & Culver, B. (2003). The use of honeynets to detect exploited systems across large enterprise networks. IEEE Systems, Man and Cybernetics Society Information Assurance Workshop, West Point, NY, USA, (June), 92-99. https://doi.org/10.1109/SMCSIA.2003.1232406

Liao, H.-J., Richard Lin, C.-H., Lin, Y.-C., & Tung, K.-Y. (2013). Intrusion detection system: A comprehensive review. Journal of Network and Computer Applications, 36(1), 16-24. https://doi.org/10.1016/j.jnca.2012.09.004

Mahdavi, E., Fanian, A., & Amini, F. (2020). A real-time alert correlation method based on code-books for intrusion detection systems. Computers and Security, 89, 101661. https://doi.org/10.1016/j.cose.2019.101661

Milenkoski, A., Vieira, M., Kounev, S., Avritzer, A., & Payne, B. D. (2015). Evaluating computer intrusion detection systems: A survey of common practices. ACM Computing Surveys, 48(1), 1. https://doi.org/10.1145/2808691

Mitchell, R., & Chen, I.-R. (2015). Behavior rule specification-based intrusion detection for safety critical medical cyber physical systems. IEEE Transactions on Dependable and Secure Computing, 12(1), 16-30. https://doi.org/10.1109/TDSC.2014.2312327

Nadler, A., Aminov, A., & Shabtai, A. (2019). Detection of malicious and low throughput data exfiltration over the DNS protocol. Computers and Security, 80, 36-53. https://doi.org/10.1016/j.cose.2018.09.006

Nivethan, J., & Papa, M. (2016). Dynamic rule generation for SCADA intrusion detection. 2016 IEEE Symposium on Technologies for Homeland Security, HST 2016, Waltham, MA, USA, (May). https://doi.org/10.1109/THS.2016.7568964

Nyasore, O. N., Zavarsky, P., Swar, B., Naiyeju, R., & Dabra, S. (2020). Deep packet inspection in industrial automation control system to mitigate attacks exploiting modbus/TCP vulnerabilities. Proceedings - 2020 IEEE 6th Intl Conference on Big Data Security on Cloud, BigDataSecurity 2020, 2020 IEEE Intl Conference on High Performance and Smart Computing, HPSC 2020 and 2020 IEEE Intl Conference on Intelligent Data and Security, IDS 2020, Baltimore, MD, 241-245. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS49724.2020.00051

Pan, Z., Hariri, S., & Pacheco, J. (2019). Context aware intrusion detection for building automation systems. Computers and Security, 85, 181-201. https://doi.org/10.1016/j.cose.2019.04.011

Patcha, A., & Park, J.-M. (2007). An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks, 51(12), 3448-3470. https://doi.org/10.1016/j.comnet.2007.02.001

Portokalidis, G., Slowinska, A., & Bos, H. (2006). Argos. ACM SIGOPS Operating Systems Review, 40(4), 15-27. https://doi.org/10.1145/1218063.1217938

Raftopoulos, E., & Dimitropoulos, X. (2013). A quality metric for IDS signatures: In the wild the size matters. EURASIP Journal on Information Security, 2013(1), 7. https://doi.org/10.1186/1687-417X-2013-7

Ramaki, A. A., Amini, M., & Ebrahimi Atani, R. (2015). RTECA: Real time episode correlation algorithm for multi-step attack scenarios detection. Computers and Security, 49, 206-219. https://doi.org/10.1016/j.cose.2014.10.006
Ramirez-Silva, E., & Dacier, M. (2007). Empirical study of the impact of metasploit-related attacks in 4 years of attack traces. In Cervesato, I. (Ed.), Advances in Computer Science - ASIAN 2007. Computer and Network Security. Lecture Notes in Computer Science, vol. 4846. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-540-76929-3_19

Ring, M., Schlör, D., Landes, D., & Hotho, A. (2019). Flow-based network traffic generation using generative adversarial networks. Computers and Security, 82, 156-172. https://doi.org/10.1016/j.cose.2018.12.012

Roesch, M. (1999). Snort: Lightweight intrusion detection for networks. In LISA '99: 13th Systems Administration Conference (pp. 229-238). Seattle, Washington: USENIX Association.

Rubin, S., Jha, S., & Miller, B. P. (2004). Automatic generation and analysis of NIDS attacks. In Proceedings - Annual Computer Security Applications Conference, ACSAC (pp. 28-38). Tucson, AZ: IEEE. https://doi.org/10.1109/CSAC.2004.9

Shiravi, A., Shiravi, H., Tavallaee, M., & Ghorbani, A. A. (2012). Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers and Security, 31(3), 357-374. https://doi.org/10.1016/j.cose.2011.12.012

Shittu, R., Healing, A., Ghanea-Hercock, R., Bloomfield, R., & Rajarajan, M. (2015). Intrusion alert prioritisation and attack detection using post-correlation analysis. Computers and Security, 50, 1-15. https://doi.org/10.1016/j.cose.2014.12.003

Sommestad, T., & Franke, U. (2015). A test of intrusion alert filtering based on network information. Security and Communication Networks, 8(13), 2291-2301. https://doi.org/10.1002/sec.1173

Sommestad, T., Hunstad, A., & Furnell, S. M. (2013). Intrusion detection and the role of the system administrator. Information Management & Computer Security, 21(1), 30-40. https://doi.org/10.1108/09685221311314400

Tjhai, G., Papadaki, M., Furnell, S. M., & Clarke, N. L. (2008). Investigating the problem of IDS false alarms: An experimental study using snort. In Proceedings of The IFIP TC 11 23rd International Information Security Conference (pp. 253-267). Boston, MA: Springer US. https://doi.org/10.1007/978-0-387-09699-5_17

Tran, T., Aib, I., Al-Shaer, E., & Boutaba, R. (2012). An evasive attack on SNORT flowbits. In Proceedings of the 2012 IEEE Network Operations and Management Symposium, NOMS 2012 (pp. 351-358). Maui, HI: IEEE. https://doi.org/10.1109/NOMS.2012.6211918

Vasilomanolakis, E., Karuppayah, S., Muhlhauser, M., & Fischer, M. (2015). Taxonomy and survey of collaborative intrusion detection. ACM Computing Surveys, 47(4), 1-33. https://doi.org/10.1145/2716260

Werlinger, R., Hawkey, K., Muldner, K., Jaferian, P., & Beznosov, K. (2008). The challenges of using an intrusion detection system. In Proceedings of the 4th Symposium on Usable Privacy and Security - SOUPS '08 (p. 107). New York, NY, USA: ACM Press. https://doi.org/10.1145/1408664.1408679

Wuu, L.-C., Hung, C.-H., & Chen, S.-F. (2007). Building intrusion pattern miner for snort network intrusion detection system. Journal of Systems and Software, 80(10), 1699-1715. https://doi.org/10.1016/j.jss.2006.12.546

Zand, A., Vigna, G., Yan, X., & Kruegel, C. (2014). Extracting probable command and control signatures for detecting botnets. In Proceedings of the ACM Symposium on Applied Computing (pp. 1657-1662). Gyeongju: Association for Computing Machinery. https://doi.org/10.1145/2554850.2554896