Detecting Network Threats Using OSINT Knowledge-Based IDS: 2018 14th European Dependable Computing Conference
Abstract: Cybercrime has steadily increased over the last years, being nowadays the greatest security concern of most enterprises. Institutions often protect themselves from attacks by employing intrusion detection systems (IDS) that analyze the payload of packets to find matches with rules representing threats. However, the accuracy of these systems is as good as the knowledge they have about the threats. Nowadays, with the continuous flow of novel forms of sophisticated attacks and their variants, it is a challenge to keep an IDS updated. Open Source Intelligence (OSINT) could be explored to effectively obtain this knowledge, by retrieving information from diverse sources. This paper proposes a fully automated approach to update the IDS knowledge, covering the full cycle from OSINT data feed collection until the installation of new rules and blacklists. The approach was implemented as the IDSoSint system and was assessed with 49 OSINT feeds and production traffic. It was able to identify in real time various forms of malicious activities, including botnet C&C servers communications, remote access applications, brute-force attacks, and phishing events.

Keywords: Threat detection, Intrusion detection systems, OSINT, Threat intelligence, Network security

I. INTRODUCTION

Nowadays, institutions are often being targeted by cybercrime groups. Attacks on different hosts (from servers to devices) are getting increasingly frequent and with a profound impact, many times taking the form of distributed denial of service (DDoS), spamming, and malware. For example, it is estimated that in 2019 the costs associated with these criminal actions can reach 2 trillion dollars [1]. Therefore, cybersecurity has become one of the highest concerns and priorities of institutions, independently of their business areas [2].

Many times attacks are carried out by first infecting the victims' hosts with malware or a backdoor, to later control them remotely to perform malicious actions [3]. Botnet attacks have increased over the last years. For example, according to the 2016 and 2017 Akamai reports, such attacks have risen 129% from year 2015 to 2016 [4], and year 2017 closed with an increase of 10% in web application attacks and 14% in DDoS attacks [5]. The reason behind the growth of DDoS attacks is in part explained by the larger number of connected IoT (Internet of Things) devices, which reached approximately 3 billion in 2017 [6]. For instance, Mirai was the first known botnet that used a considerable number of IoT devices. In 2016, Mirai attacked the Dyn servers (organization that controls DNS) with 100.000 IoT devices that generated a traffic volume in the order of 1.2 Tbps, causing Internet unavailability across Europe and USA [7].

Institutions normally protect themselves from attacks by deploying IDS that monitor the network traffic, analyzing the payload of the packets to find matches with rules about threats, identifying anomalies and generating alerts [8]–[10]. The most common IDSs employ signatures, generating an alert whenever the observed traffic is similar to one of the behaviours represented in a rule [11]. Sometimes IDSs also resort to machine learning techniques to improve precision [12], [13]. Other approaches just inspect network flows, i.e., they do not check the payload of the packets, but only their headers [14]–[16]. In all cases, however, the accuracy of an IDS is as good as the knowledge it has about the threats. However, it is a challenge to keep IDSs updated with the continuous emergence of novel forms of sophisticated attacks. In addition, usually for each kind of attack, there are many slight variants that can cause missing detection (false negatives) [10]. Knowledge about recent attacks might also be hard to obtain because it is considered a business secret [17].

OSINT could be explored to collect rapidly (and often affordably) knowledge about threats, by retrieving information from diverse sources (both public and private). OSINT is normally composed of shared data provided for instance by the examination of honeypots and other security event descriptions about any kind of damage on services and systems, such as credential theft, phishing or DDoS [11]. Sometimes the data included in individual events is not enough to characterize real threats, being useless for detection either by behaviour or by patterns. However, if the data from multiple events/sources could be aggregated and complemented with other information, it would be possible to create indicators of attacks (IoA) [18] that could clearly characterize certain threats, allowing their discovery.

This paper presents an approach to enhance attack discovery capabilities, by automatically updating the IDS threat knowledge. Our solution covers the full cycle, from collecting OSINT feeds until installing rules at the intrusion detector. It involves the processing and correlation of OSINT data in order to create IoAs, which then support the generation of rules and blacklists, and their integration in the IDS. The current prototype is called IDSoSint, and it was assessed with 49 OSINT sources and real traffic. During a period of time it was used to analyze the traffic of a few of the main network
Authorized licensed use limited to: New York University. Downloaded on October 09,2021 at 11:34:18 UTC from IEEE Xplore. Restrictions apply.
links of the University of Lisbon, scanning a large amount of packets. IDSoSint was able to identify in real time various forms of malicious activities, including botnet C&C servers communications, remote access applications, brute-force attacks, and phishing events.

In summary, the main contributions of the paper are: (1) an approach for improving IDS capabilities by leveraging OSINT; (2) a process to generate IoAs based on OSINT processing; (3) a rule and blacklist generator for IDS from IoAs; (4) a system, IDSoSint, that implements the approach, and its experimental evaluation with real traffic, showing the ability to detect various types of attacks in a (near) production environment.

II. CONTEXT AND RELATED WORK

A. Botnets, attacks and IDSs

Botnets are networks of vulnerable devices (e.g., computers and mobile devices) that were compromised, and then controlled by criminal entities – botmasters – through a command and control center (C&C) [19]. Botnets are one of the main roots of SPAM, malware propagation, ransomware, phishing, and DDoS attacks. Botnet C&Cs are a valuable target, both for entities who try to detect and destroy them and for other cybercriminals who attempt to steal them for their benefit. Botmasters use cryptographic mechanisms to protect the communications within the botnet, and dissimulation mechanisms to prevent their identification and capture [20], [21]. Therefore, it is difficult to eliminate such threats because they are getting more sophisticated, and each day they appear in many variants, precluding easy forms of detection.

IDSs contribute greatly to the discovery of those threats. They can analyze the infrastructure of an organization and identify attack patterns. There are two kinds of IDS, network-based and host-based. While the former can validate the network traffic to find malicious communication patterns, the latter analyzes the system itself (e.g., computer). Both IDS types can operate by signatures or by behavior, but they are as good as the knowledge they have about the threats. In this paper, we will focus on network-based IDSs.

Signature-based IDS use a database where each entry contains a representation of some characteristic of an attack, which is called a signature. When a packet is processed by the IDS, its content is compared with the database entries in order to find a matching signature, detecting in this way the malicious traffic. This kind of solution only identifies known threats, i.e., those that have their signatures stored in the database. For this reason, this solution has the benefit of having a low number of false positives, and can provide good alert information about the identified threats. On the other hand, it can have a high number of false negatives, namely when "unknown" threats do not appear in the database [10], [11].

A behavior-based IDS relies on the behavior of the systems it knows and monitors. It starts by learning how a system should behave, and it detects suspicious and incorrect actions when the monitored system deviates from the expected behavior, and generates an alarm every time this occurs. These systems are complex and require an extensive and continuous study about the system under analysis. Contrary to signature-based IDS, this solution can have a lower false negative rate because it can detect unknown and uncommon behaviors. On the other hand, it has a high false positive rate because any behavioral deviation generates an alarm, even if there is no illicit action.

B. Intrusion Detection Tools

Zeng et al. proposed a hybrid architecture to detect botnet communications by matching network traffic and host information. The approach first analyzes network traffic to find machines that have a different behavior than expected, and then it inspects those hosts to discover malicious artefacts. This combination allows better coverage and evaluation of the monitored systems as it can detect common botnet communication behaviors and inspect hosts individually [8].

Sperotto et al. [22] proposed an IDS based on inspecting network traffic packet headers instead of their packet payloads. This IDS only detects attacks that can be found by header analysis. In contrast, the approach we present inspects packet payloads, allowing the detection of botnets and other types of anomalies such as phishing.

A survey on botnet detection showed that, despite the current level of expertise and the technologies used by IDS, only a few methods are effective, and the better ones are based on anomaly approaches. Nevertheless, the detection of botnets remains difficult due to the cloaking techniques used by botmasters continuously [10]. Aviv et al. proposed a cooperation of entities, since they have confidential and private information that, although it cannot be shared and made public, can be used to mitigate the impact of botnets by detecting them [17].

FeatureSmith detects malware in Android applications based on knowledge gathered from these applications and employing data mining techniques. A dataset was obtained from Google Scholar scientific papers related to Android malware, and the data mining technique was applied to semantic relationships and malware behaviour. The system detected 92.5% of malware and had a false positive rate of 1% [9]. Unlike FeatureSmith, the approach we propose detects threats by using OSINT data, which is processed to construct IDS rules and obtain blacklists, and to be integrated in IDS as knowledge.

MISP is a collaborative platform to share information about security threats. Its architecture is split into organization and community. These groups allow information to be shared in a secure, simple and controlled way at the organization level or in a given community [23]. MISP follows an approach similar to ours but does not perform correlations among the collected OSINT, and the rules it generates, when applied to an IDS, decrease performance, rendering it unusable.

III. OSINT-BASED IDS APPROACH

Our approach enhances the attack detection capabilities of an organization by renewing the information about threats configured in the IDSs, inserting novel rules and blacklists
built automatically from OSINT data. It performs the complete cycle of IDS knowledge update, from collecting threat intelligence to generating rules and installing them.

OSINT-based threat events are provided by many feeds and Internet sources, and they have diverse formats and data about security related incidents. The rules that we intend to create vary significantly in terms of complexity depending on the attack. This means that sometimes a single OSINT event contains enough data to construct a simple rule. However, in other cases, the derivation of more complex rules requires the combination of multiple OSINT events and other external information. Still, it often happens that these complex rules are the ones necessary to actually support the detection of more sophisticated attacks.

Therefore, we process OSINT data with the goal of verifying the interrelation among different security events in order to complement each other and obtain the relevant and complete information necessary to describe attacks. Moreover, based on this information, IDS rules (both simple and complex) are built automatically, ready to be integrated into an IDS. As OSINT reports characterize many threat categories, rules for quite distinct types of attacks are constructed, improving the IDS capabilities in two directions: the detection of a large number of different attacks, and the robust discovery of sophisticated attacks.

The approach acts continuously in a loop, updating the IDS with the knowledge that was acquired through OSINT, in order to achieve better performance of detection. This means that for each loop iteration all phases of the approach are executed, as shown in Fig. 1. These phases are summarized as follows:

1) Information gathering: collects the threat intelligence provided by OSINT feeds, and aggregates the security events by threat categories.
2) Knowledge generation: processes the aggregated data, establishing associations between security events and other external information, and generating knowledge that can describe attacks clearly. Based on this knowledge, IDS rules and IP blacklists are constructed.
3) Incident detection: updates the IDS knowledge database with the rules and IP blacklists, allowing the system to remain up to date and able to identify new malicious activities. Incidents are registered based on matching the observed traffic with the rules and blacklists.

[Fig. 1: diagram not reproduced; surviving labels: OSINT feeds, Threat information extractor, Rules, Network traffic.]

The first two phases are carried out periodically, within a predefined time range (e.g., daily) that is dependent on the availability of OSINT updates, to process event feeds and generate knowledge, whereas the third phase is always in execution.

The next sections explain the phases in detail.

A. Information gathering

This phase comprises three steps implemented by the components: OSINT collector, threat information extractor, and threat information aggregator. These components are respectively responsible for gathering the OSINT events from the pre-configured sources, as indicated by the SOC (Security Operations Center) analyst; extracting the data from the events; and aggregating the information from related events.

[Diagram labels: Collector 1 ... Collector n, Parser 1 ... Parser n, Deduplicator, IoCs, Threat information aggregator, IoCs aggr.]
Fig. 2: Information gathering data flow in detail.

The data flow of these steps is displayed in Fig. 2. It starts with several collectors, each associated with a particular feed (e.g., phishtank, malwaredomainlist), where they obtain an influx of OSINT events. Multiple collectors can be configured for the same threat category (phishing, malware domain) depending on the number of feeds the SOC analyst wants to gather. Therefore, various events can relate to the same threat category. However, OSINT data is available in different formats (e.g., csv, txt, doc), and thus it is necessary to harmonize events to a single format before processing. To do so, the individual parsers analyze each of these events, classifying them into one of the pre-defined threat categories, and translating the events into indicators of compromise (IoC). IoCs are standardized forensic information artifacts used in the identification of potential malicious activities of a compromised system or network [18]. The IoC structure is specified as a JSON object composed of a set of <key:value> pairs. Each pair contains data that can characterize malicious activity, and the set of pairs is the union of data collected from a part of an event, since an event can contain various IoCs related to the same security issue. Afterwards, a deduplicator checks for repeated IoCs by pattern matching, because feeds from the same category can emit equal threats; this ensures that similar IoCs are erased and improves the performance of the whole process. Next, an
extractor examines the IoCs, detecting and extracting relevant data about threats (e.g., domains, malicious IP for a specific protocol), and the threat information aggregator groups them by category of threat.

B. Knowledge generation

This phase builds knowledge that will improve the detection capability of the IDS. It is composed of two main steps implemented by the event generator and the rules & blacklist generator. They are responsible for processing the aggregated IoCs and creating rules & blacklists based on such data.

1) Event generator: The actions carried out by the event generator are detailed in Fig. 3. The aggregated IoCs are processed by the information experts to enhance them with other external data, not coming from OSINT feeds. Experts are configured by administrators, for example, to run network commands to obtain detailed data about the IP provided in the IoC, such as whois, geo location, and asn source, or to obtain features about the malware domain, such as the port. Next, the protocol expert correlates all the information, identifying interconnection points between them and enhancing them, with the objective of getting together the most complete description of an attack. The various data is then merged as an indicator of attack (IoA). IoAs allow the attacks to be identified on systems executing at runtime [18], [24]. The IoA structure follows the structure of an IoC, i.e., a set of <key:value> pairs, where each pair contains data that characterizes an attack, and the set [...]

[Diagram labels: IoCs aggr, External information, Information Experts, Protocol Expert, Event Writer, IoAs.]

    {
      "source.ip":"103.205.96.53",
      "classification.taxonomy":"intrusion attempts",
      "event_description.text":"IP reported as having run attacks on the service Apache, Apache-DDoS, RFI-Attacks",
      "source.geolocation.cc":"VN",
      "feed.url":"https://lists.blocklist.de/lists/apache.txt",
      "protocol.application":"http",
      "source.asn":63731,
      "source.network":"103.205.96.0/22",
      "source.registry":"APNIC",
      "time.observation":"2018-04-05T16:21:51+00:00",
      "feed.name":"Blocklist.de Apache"
    }

Fig. 3: Indicator of attacks (IoA) generation data flow.

The bottom of Fig. 3 shows an example of an IoA. This IoA describes attacks against the Apache service, namely DDoS and RFI (remote file inclusion) attacks. To construct this IoA, the data it contains was obtained as follows: a collector was configured to get OSINT data from the feed https://lists.blocklist.de/lists/apache.txt, named Blocklist.de Apache, and with the description "IP reported as having run (...) RFI-Attacks". Notice that this feed only provides a list of IPs that attacked the Apache service, so not all the information contained in this IoA comes from it. Next, after the parser extracts the 103.205.96.53 IP from that list, an IoC is created for it, and then aggregated with other IoCs associated with the Apache service. In this case, this IoC is unique. However, if there were other IoCs belonging to the same IP network, they would be represented (aggregated) as a sub-network or an IP range. The information experts get external information about that IP (source fields in the IoA). In addition, the field protocol.application (sixth pair in the example) contains the protocol targeted by the attack, which was added by the protocol expert.

2) Rules & blacklist generator: This component uses the data contained in the IoAs to build IDS rules and compose lists of IP addresses considered malicious (blacklist). Fig. 4 displays this process. To create a rule, first it checks the protocol indication contained in protocol.application to determine both the source and destination ports. Next, it constructs the rule with this data, adding the remaining information from the IoA to make the rule more specific for a particular attack. Next, it stores the rule in the rules database. In parallel, if the IoA only contains IPs tagged as malicious, the generator creates an IP blacklist with them, and stores the result in the blacklist database.

In general, an IDS rule is specified as follows:

    alert [action][proto][srcIP][srcPort] -> [destIP][destPort] ([rule options])

where proto indicates the protocol that the rule refers to, such as TCP, UDP or ICMP; srcIP/destIP and srcPort/destPort specify the source/destination IP and port; rule options defines the patterns used to analyze the packet payload, checking if it matches with some of them
and detecting an attack. If so, the defined action is executed. In such case, an alert message is emitted stating that the rule was triggered, and the packet payload is stored temporarily so that later it can be analyzed by the SOC team.

For example, from the IoA in Fig. 3 the generated rule is:

    alert tcp 103.205.96.53 22 -> $HOME_NET 80 (msg:"Intrusion attempts to Apache"; sid:100;)

Here the rule is identified by ID 100 (sid:100), and the message "Intrusion attempts to Apache" is emitted when TCP traffic coming from the machine 103.205.96.53 through source port 22 tries to reach the Apache service running on port 80 in the UL network, identified by the variable $HOME_NET. Notice that port 80 is determined from the information given by protocol.application.

C. Incident detection

The detection of malicious activities by the IDS is done with the help of the rules and blacklists generated by the previous phase. However, before updating the IDS with this knowledge, the rules & blacklist manager needs to perform some checks. This manager (see Fig. 1) administers all rules and blacklists with the objective of avoiding the duplication of rules in the IDS, thus enhancing its performance, and handling the IP's reputation. The checking task is carried out at each pre-defined time interval (e.g., daily) before updating the IDS. Afterwards, the new rules and blacklisted IPs are deployed into the IDS, and it starts running with them.

For its part, before the IDS analyzes the packet payload, it first checks the IP headers against the blacklists. If no match occurs, it then uses the rules. This sequence of analysis improves the IDS performance, since the validation of packet headers is faster than looking at the packet contents. The actions and alerts are triggered if any rule finds illicit activity.

Following the example described above, that rule is inserted into the IDS, and then if the host 103.205.96.53, via an SSH connection (port 22), tries to communicate with the UL web server (Apache service) which is running on port 80 in a host belonging to the $HOME_NET network, an alert will be emitted and the corresponding packet stored for later analysis.

IV. IDSOSINT SYSTEM

The approach was implemented in our IDSoSint system, which is illustrated in its current form in Fig. 5. It is divided into two parts – knowledge management and incident detection – where the former implements the first two phases of the approach and includes three modules we have developed, and the second component corresponds to the third phase.

[Diagram not reproduced; surviving labels: Internet, OSINT feeds, External rules, IntelMQ, Firewall, Internal network.]

IDSoSint uses a few open-source tools and some modules that we programmed in Python to facilitate the integration with other software packages. We set up the IntelMQ platform [25] to collect various OSINT feeds and to produce the IoCs. Then, the system is composed of three modules: the events correlator, for correlating IoCs in order to get relevant information that characterizes attacks; the IoA generator, for generating IoAs based on the correlation results; and the rule & blacklist generator, to get the IDS rules and blacklist based on the IoAs. In addition, we configured the Pulledpork platform [26] to manage the rules and blacklists, and the IDS Snort [27] to process the packets with our rules.

V. EXPERIMENTAL EVALUATION

The objective of the experimental evaluation is to answer the following questions: (1) Is IDSoSint able to detect attacks by processing real data? (2) Is IDSoSint able to identify the type of attacks performed? (3) Is IDSoSint able to explore OSINT data to generate knowledge that can improve the detection capabilities of IDSs?

In order to validate our approach, we evaluated IDSoSint in a production environment, by scanning for 8 days the traffic of a few of the main network links of the University of Lisbon (UL), while analyzing a large amount of packets.

The rest of the section is organized as follows: Subsection V-A presents the system set-up, focusing on the OSINT feeds; and Subsections V-B and V-C describe and discuss the results. These last two subsections answer questions 1 to 3.

A. Setting-up IDSoSint

The IDSoSint system was set up in an HP ProLiant DL360 G6 machine, with 2 Intel Xeon CPU X5550 at 2.67GHz, 12GB of RAM, 165GB hard disk, and 3 network interfaces (a 10Gb/s optic fibre channel and 2 1Gb/s ethernet interfaces).
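The rules that the IDS runs in this set-up are the ones derived from IoAs as described in Section III-B. The following is a minimal sketch of that derivation, assuming an IoA is available as a Python dictionary with the keys shown in Fig. 3. The lookup table `APP_PORTS`, the function names, the message template and the `src_port` parameter are illustrative assumptions; the text does not state how the source port is derived, so the sketch simply takes it as an argument. It is not the IDSoSint implementation.

```python
# Hypothetical lookup from the IoA field "protocol.application"
# to a (transport protocol, destination port) pair.
APP_PORTS = {
    "ftp": ("tcp", 21),
    "ssh": ("tcp", 22),
    "telnet": ("tcp", 23),
    "smtp": ("tcp", 25),
    "http": ("tcp", 80),
    "snmp": ("udp", 161),
}


def ioa_to_rule(ioa, sid, src_port="any", home_net="$HOME_NET"):
    """Return a Snort-style alert rule string for one IoA (illustrative only)."""
    proto, dst_port = APP_PORTS[ioa["protocol.application"]]
    msg = ioa["classification.taxonomy"].capitalize()
    header = f"alert {proto} {ioa['source.ip']} {src_port} -> {home_net} {dst_port}"
    options = f'(msg:"{msg}"; sid:{sid};)'
    return f"{header} {options}"


def ioas_to_blacklist(ioas):
    """Collect the distinct source IPs of a list of IoAs, sorted for stable output."""
    return sorted({ioa["source.ip"] for ioa in ioas})


# Subset of the IoA fields shown in Fig. 3.
example_ioa = {
    "source.ip": "103.205.96.53",
    "classification.taxonomy": "intrusion attempts",
    "protocol.application": "http",
}

rule = ioa_to_rule(example_ioa, sid=100, src_port=22)
blacklist = ioas_to_blacklist([example_ioa, example_ioa])
print(rule)
print(blacklist)
```

Passing the same IoA twice to `ioas_to_blacklist` shows that repeated IPs collapse into a single blacklist entry, which mirrors the duplicate avoidance that the rules & blacklist manager is described as performing.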
The installed operating system was the Security Onion Linux distribution, which is already fully equipped with network security analysis applications such as Snort. By default Snort does not come with rules defined. Although the Talos Cisco team releases a Snort rule-set, we opted to use only the rules that IDSoSint generated, since the former are released once per week and ours are constructed and deployed daily.

TABLE I: Distribution of feeds by OSINT events.
OSINT event                          Feeds
Blacklist IP                            16
IPs trying remote access                 9
Malicious domain                         9
Malicious IP against VoIP server         4
Malicious IP against MAIL server         2
Malicious IP against FTP server          1
Malicious IP against SNMP server         1
Phishing or malicious links              7

The IntelMQ threat intelligence platform was installed in a virtual machine with 4GB of RAM, and 17GB of hard disk. IntelMQ was configured to collect 49 OSINT feeds from different categories, provided by 44 public and open source repositories belonging to 19 entities.¹

¹ Eight of them: https://zeustracker.abuse.ch, https://www.openphish.com, https://www.spamhaus.org, https://reputation.alienvault.com, https://lists.blocklist.de, https://www.malwaredomainlist.com, https://www.cinsscore.com, https://www.team-cymru.org.

Some of these feeds are well known, such as Phishtank [28] that reports phishing incidents, CINSScore [29] for blacklisted IPs, and Malware Domain [30] for sharing malware domains. Table I distributes the 49 OSINT feeds by our 8 event categories, which allowed the collection of threat intelligence information about diverse kinds of security incidents.

As stated, we analyzed a few connection links of the computer infrastructure of the UL. Fig. 6 gives an overview of the network, displaying the three links where we deployed IDSoSint (checkpoints in the figure). The UL central services network includes for example the administrative services and human resources; the data center stores the data of most of the institution (e.g., websites); and eduroam is the academic European wireless network. We selected these links because of their size and characteristics, as they connect parts of the infrastructure that have distinct purposes, therefore ensuring that a wider range of traffic is observed.

[Diagram labels: UL central services network, Public network, eduroam network, Data center, Firewall, Core legacy, Router 1, Router 2.]
Fig. 6: University of Lisbon network overview with the three links that were evaluated by IDSoSint.

B. Incident Record

We analyzed the rules and blacklists that were created, and it was possible to verify that they were correctly constructed by the rule & blacklist generator, and that they could be imported by Snort. During the evaluation period, on average, 5.290 rules and 14.590 blacklist entries were generated daily, with a total of 42.320 and 116.720 entries, respectively. IDSoSint registered about 5.5 million incidents overall when it applied these rules and blacklists to the observed traffic. For those alerts triggered by rules, the analyzed packets were stored for later investigation. However, since the rules used by Snort were the ones generated by the rule & blacklist generator and are based on reputational OSINT feeds, all the incidents registered were effective and most of them concerned access attempts to unavailable services on the UL network.

TABLE II: Number of registered incidents (8 days).
Day 1     Day 2     Day 3     Day 4     Day 5     Day 6     Day 7     Day 8
410.795   523.005   411.344   505.541   472.309   840.062   605.033   1.683.571
Total: 5.451.660

Table II summarizes the number of incidents detected in each day. On average, 540.000 alerts were emitted per day during the initial 7 days, whilst on the last day (day 8) there was a significant increase in alerts. This increase is most probably due to a novel high impact malware that appeared on that day.

TABLE III: Number of collected OSINT events that belonged to each of the eight categories used to classify the rules & blacklisted IPs, and the corresponding incidents that were detected.
Category                            OSINT events    Incidents
Blacklist IP                         116.313.458    3.779.638
IPs trying remote access                     864    1.588.024
Malicious domain                           4.223       45.843
Malicious IP against VoIP server             118       23.823
Malicious IP against MAIL server             397       12.005
Malicious IP against FTP server               23        1.536
Malicious IP against SNMP server              48          791
Phishing or malicious links               19.453            0
Total                                116.338.584    5.451.660

Table III presents the number of OSINT events that were collected and processed in each one of our threat categories, i.e., extracted from the 49 OSINT feeds (column 2). It also displays how the registered incidents were distributed per category (column 3). It is possible to observe that the number of collected OSINT events was much larger than the number of entries that were set in the IDS, respectively, around 116 million and 160 thousand. This reduction is explained, on one hand, by an efficient detection of duplicates, and on the other hand, by the successful aggregation that can be achieved by representing the threats as IoAs. Recall that an IoA describes an attack using the information included in a set of related events. In addition, the table shows the effectiveness of the rules at producing diverse kinds of alerts, where the most common incident was related to communications of blacklisted IPs. This gives evidence that OSINT data can be explored to create knowledge to be used by protection assets in a network infrastructure, such as an IDS. Moreover, these numbers demonstrate how cybercrime is well present in public networks, and that it has to be an actual concern for organizations.
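The figures quoted in this subsection can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text and in Tables II and III; the variable names are illustrative.

```python
# Per-category counts from Table III: (OSINT events collected, incidents detected).
table3 = {
    "Blacklist IP": (116_313_458, 3_779_638),
    "IPs trying remote access": (864, 1_588_024),
    "Malicious domain": (4_223, 45_843),
    "Malicious IP against VoIP server": (118, 23_823),
    "Malicious IP against MAIL server": (397, 12_005),
    "Malicious IP against FTP server": (23, 1_536),
    "Malicious IP against SNMP server": (48, 791),
    "Phishing or malicious links": (19_453, 0),
}
total_events = sum(events for events, _ in table3.values())
total_incidents = sum(incidents for _, incidents in table3.values())

# Daily averages of generated entries reported in the text, over the 8-day evaluation.
days = 8
total_rules = 5_290 * days
total_blacklist_entries = 14_590 * days
ids_entries = total_rules + total_blacklist_entries

# Daily incident counts from Table II.
daily = [410_795, 523_005, 411_344, 505_541, 472_309, 840_062, 605_033, 1_683_571]
avg_first_7_days = sum(daily[:7]) / 7

# How many collected OSINT events there are per entry installed in the IDS.
events_per_entry = total_events / ids_entries

print(total_events, total_incidents)
print(total_rules, total_blacklist_entries, ids_entries)
print(round(avg_first_7_days), round(events_per_entry, 1))
```

The sums reproduce the totals printed in Tables II and III, and the 159.040 generated entries correspond to the "around 160 thousand" quoted above, i.e., roughly one IDS entry for every 730 collected OSINT events.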
TABLE IV: Number of incidents by category.

| Event | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Total alerts |
|---|---|---|---|---|---|---|---|---|---|
| Traffic made by a blacklisted IP | 399652 | 378032 | 375775 | 482454 | 454109 | 770526 | 568797 | 350293 | 3,779,638 |
| SSH communication by a malicious IP | 7928 | 137997 | 30030 | 5788 | 5076 | 60203 | 9091 | 1319138 | 1,575,251 |
| Telnet communication by a malicious IP | 101 | 117 | 136 | 127 | 140 | 173 | 128 | 81 | 1,003 |
| Communication with a malicious domain | 663 | 2476 | 1564 | 10533 | 7008 | 3492 | 19726 | 381 | 45,843 |
| Web communication by a malicious IP | 228 | 42 | 58 | 156 | 0 | 0 | 750 | 10536 | 11,770 |
| IMAP communication by a malicious IP | 333 | 220 | 421 | 1251 | 2271 | 1452 | 1214 | 68 | 7,230 |
| SMTP communication by a malicious IP | 64 | 1597 | 636 | 52 | 125 | 542 | 799 | 960 | 4,775 |
| FTP communication by a malicious IP | 7 | 14 | 22 | 1415 | 33 | 0 | 36 | 9 | 1,536 |
| SNMP communication by a malicious IP | 171 | 150 | 84 | 85 | 108 | 115 | 58 | 20 | 791 |
| SIP OPTIONS request by a malicious IP | 1648 | 2210 | 2470 | 3270 | 3295 | 2997 | 4365 | 1871 | 22,126 |
| SIP REGISTER request by a malicious IP | 0 | 75 | 74 | 410 | 0 | 486 | 69 | 214 | 1,328 |
| SIP communication by a malicious IP | 0 | 75 | 74 | 0 | 144 | 76 | 0 | 0 | 369 |
| Total | 410795 | 523005 | 411344 | 505541 | 472309 | 840062 | 605033 | 1683571 | 5,451,660 |
TABLE V: Number of alerts created due to blacklists, organized by source and destination ports.

| Source port | Alerts | Destination port | Alerts |
|---|---|---|---|
| 0 | 125324 | 0 | 125315 |
| 135 | 1647 | 22 | 668036 |
| 443 | 2150 | 23 | 689073 |
| 5081 | 3543 | 25 | 42086 |
| 6000 | 23225 | 53 | 104006 |
| 9090 | 2110 | 80 | 108549 |
| 9224 | 4523 | 123 | 16392 |
| 10000 | 75759 | 443 | 78799 |
| 21888 – 48284 | 54410 | 1433 | 672640 |
| 50263 – 65535 | 406591 | 1900 | 101632 |
| | | 2323 | 19761 |
| | | 5060 | 225279 |
| | | 43526 | 228788 |
| Total | 699282 | Total | 3080356 |

A more detailed analysis of the rules and registered incidents shows that, in some cases, a few rules of certain categories, such as FTP, were sufficient to detect a considerable number of illicit actions. However, in other cases, although the rules to detect such actions were correct, no incidents were recorded (as in the phishing or malicious links category). This absence is explained by the pre-processing performed by Snort, where the packet headers are checked before the analysis of their payloads. In that case, if the phishing IP belongs to the list of malicious IPs, Snort detects the problem in the pre-processing phase, emits an alert, and ends the analysis for that packet. In this situation, the associated rules ended up not being used during the packet validation.
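The short-circuit described in this paragraph can be made concrete with a toy model in Python. It is a sketch of the described behaviour only, not Snort's implementation or API: the `Packet` type, the `inspect` function, the rule representation as (name, byte pattern) pairs, and the sample addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src_ip: str
    payload: bytes


def inspect(packet, blacklist, payload_rules):
    """Return the alerts raised for one packet.

    Stage 1 looks only at the header (source IP). On a blacklist hit it
    alerts and stops, so stage 2 never sees the packet.
    """
    if packet.src_ip in blacklist:
        return ["blacklisted IP"]
    # Stage 2: payload rules, modelled here as (name, byte pattern) pairs.
    return [name for name, pattern in payload_rules if pattern in packet.payload]


# Hypothetical data: one blacklisted address and one phishing payload rule.
blacklist = {"198.51.100.9"}
payload_rules = [("phishing link", b"login.example/verify")]

phish_from_listed = Packet("198.51.100.9", b"GET http://login.example/verify")
phish_from_unlisted = Packet("192.0.2.44", b"GET http://login.example/verify")

print(inspect(phish_from_listed, blacklist, payload_rules))    # ['blacklisted IP']
print(inspect(phish_from_unlisted, blacklist, payload_rules))  # ['phishing link']
```

Both packets carry the same phishing payload, but only the one from the unlisted address reaches the payload rule. The other is counted as blacklisted-IP traffic, which matches the explanation given for the empty phishing category.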
Fig. 7: Top 10 countries that produce the most alerts.

This could indicate that a group of machines might have been compromised and are communicating with blacklisted IPs on a specific port. The remaining most accessed ports were VoIP (5060), HTTP (80), DNS (53), SSDP (1900), and HTTPS (443). These results give evidence that several of the IPs tagged as malicious (blacklisted) could be blocked at any institution's border firewall, as they attempt to access reserved services such as Telnet.

Finally, Fig. 7 gives an overview of the 10 countries most involved in the incidents; the top three are China, the USA, and the Netherlands. These results agree with the statistics included in Akamai’s 2017 report [5].

Based on the results discussed and analyzed in Sections V-B and V-C, questions 1 to 3 can be answered positively.

VI. CONCLUSION

The paper presents an approach to improve the detection capabilities of IDS by resorting to threat intelligence data gathered from OSINT feeds. Our approach automatically processes OSINT data, aggregating and correlating it in order to generate IoAs. Afterwards, these IoAs are used to build IDS rules and blacklists, which are then installed in the IDS. We implemented the IDSoSint system and evaluated the approach with production traffic from a few links of the UL. The results show that OSINT data is useful for generating new forms of representing threat intelligence knowledge, which can be applied in defence mechanisms. IDSoSint is able to detect illicit activities by employing the generated knowledge, alerting on problems such as botnet communications and remote access applications.

Acknowledgements. This work was partially supported by the EC through funding of the H2020 DiSIEM project (H2020-700692), and by the LASIGE Research Unit (UID/CEC/00408/2013). We warmly thank CNCS for their availability during this work.

REFERENCES

[1] S. Morgan, “Cyber crime costs projected to reach $2 trillion by 2019,” https://www.forbes.com/sites/stevemorgan/2016/01/17/cyber-crime-costs-projected-to-reach-2-trillion-by-2019/.
[2] I. Society, “Global internet report 2016,” https://www.internetsociety.org/globalinternetreport/2016/.
[3] P. Bächer, T. Holz, M. Kötter, and G. Wicherski, “Know your enemy: Tracking botnets,” https://www.honeynet.org/book/export/html/50.
[4] Akamai, “[State of Internet] / Security. Q2 2016 Report,” Akamai, Tech. Rep., 2016.
[5] Akamai, “[State of Internet] / Security. Q4 2017 Report,” Akamai, Tech. Rep., 2017.
[6] statista, “Internet of things (iot) connected devices installed base worldwide from 2015 to 2025 (in billions),” https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/.
[7] SearchSecurity, “Details emerging on dyn dns ddos attack, mirai iot botnet,” http://searchsecurity.techtarget.com/news/450401962/Details-emerging-on-Dyn-DNS-DDoS-attack-Mirai-IoT-botnet.
[8] Y. Zeng, X. Hu, and K. G. Shin, “Detection of botnets using combined host-and network-level information,” in IEEE/IFIP International Conference on Dependable Systems & Networks, 2010, pp. 291–300.
[9] Z. Zhu and T. Dumitras, “Featuresmith: Automatically engineering features for malware detection by mining the security literature,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 767–778.
[10] S. Silva, R. Silva, R. Pinto, and R. Salles, “Botnets: A survey,” Computer Networks, vol. 57, no. 2, pp. 378–403, 2013.
[11] G. Bruneau, “The history and evolution of intrusion detection,” SANS Institute, Tech. Rep., 2001.
[12] B. Li, J. Springer, G. Bebis, and M. H. Gunes, “A survey of network flow applications,” Journal of Network and Computer Applications, vol. 36, no. 2, pp. 567–581, 2013.
[13] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller, “An overview of ip flow-based intrusion detection,” IEEE Communications Surveys & Tutorials, vol. 12, no. 3, pp. 343–356, 2010.
[14] L. Hellemons, L. Hendriks, R. Hofstede, A. Sperotto, R. Sadre, and A. Pras, “SSHCure: A flow-based SSH intrusion detection system,” in IFIP International Conference on Autonomous Infrastructure, Management and Security, 2012, pp. 86–97.
[15] A.-S. Kim, H.-J. Kong, S.-C. Hong, S.-H. Chung, and J. W. Hong, “A flow-based method for abnormal network traffic detection,” in IEEE/IFIP Network Operations and Management Symposium, vol. 1. IEEE, 2004, pp. 599–612.
[16] T. Dubendorfer, A. Wagner, and B. Plattner, “A framework for real-time worm attack detection and backbone monitoring,” in 1st IEEE International Workshop on Critical Infrastructure Protection, 2005.
[17] A. H. Haeberlen and A. Aviv, “Challenges in experimenting with botnet detection systems,” in Proceedings of the 4th Conference on Cyber Security Experimentation and Test, 2011, pp. 6–15.
[18] I. S. McAfee, “Indicators of attack (ioa),” Intel Security McAfee, Tech. Rep., 2014.
[19] N. Gamer, “The state of botnets in late 2015 and early 2016,” http://blog.trendmicro.com/the-state-of-botnets-in-late-2015-and-early-2016/.
[20] J. M. Butle, “Finding hidden threats by decrypting ssl,” SANS Institute, Tech. Rep., 2013.
[21] M. Rouse, “Metamorphic and polymorphic malware,” http://searchsecurity.techtarget.com/definition/metamorphic-and-polymorphic-malware.
[22] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller, “An overview of ip flow-based intrusion detection,” IEEE Communications Surveys and Tutorials, vol. 12, no. 3, pp. 343–356, 2010.
[23] C. Wagner, A. Dulaunoy, G. Wagener, and A. Iklody, “Misp: The design and implementation of a collaborative threat intelligence sharing platform,” in Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security, 2016, pp. 49–56.
[24] N. Lord, “What are indicators of compromise?” https://digitalguardian.com/blog/what-are-indicators-compromise.
[25] n0where, “Automate incident handling process: Intelmq,” https://n0where.net/automate-incident-handling-process-intelmq/.
[26] Shirkdog, “Pulledpork,” https://github.com/shirkdog/pulledpork.
[27] Snort, “Snort,” https://www.snort.org.
[28] OpenDNS, “phishtank,” https://www.phishtank.com.
[29] CINScore, “The cins score,” http://cinsscore.com.
[30] MDL, “Malware domain list,” https://www.malwaredomainlist.com.
[31] M. Kezys, “Sip attack: Friendly-scanner,” http://blog.kolmisoft.com/sip-attack-friendly-scanner/.
[32] S. Gauci, “Sipvicious,” https://github.com/EnableSecurity/sipvicious.
[33] B. Mitchell, “Port 0 in tcp and udp,” https://www.lifewire.com/port-0-in-tcp-and-udp-818145.
[34] Speedguide, “speedguide,” https://www.speedguide.net/port.php?port=6000.