Anomaly-Based IDS To Detect Attack Using Various...
IP, destination port) can be used to identify machines exhibiting anomalous behavior. If we assume that systems are configured properly and behave in predictable ways, then outliers will appear as network scans, probes, misconfigurations, etc., and be detected by clustering.

The data for this experiment will be taken from a production university datacenter firewall. The datacenter firewall employs extensive logging and captures audit data on every connection and attempted connection.

IV. DATA ACQUISITION AND PREPARATION

A Perl script was used to extract the following data from the firewall logs: date and time of connection attempt, permit/deny, source IP, source port, destination IP, destination port, protocol (e.g., TCP/UDP/ICMP), bytes transferred to server, and bytes transferred to client. Below is a sanitized sample of the data records.

03/10/2006 02:00:35,PERMIT,10.10.222.11,16285,10.10.224.61,80,6,33,3467
03/10/2006 02:00:35,PERMIT,10.10.222.11,16288,10.10.234.49,443,6,440,11671
03/10/2006 02:00:35,DENY,192.168.250.172,4212,192.168.4.164,80,6,0,423
03/10/2006 02:00:36,PERMIT,192.168.250.172,4210,10.10.224.45,8080,6,0,0

This raw data and/or its aggregations are used to determine features that, in turn, can detect anomalous patterns. The log data required some preparation to get to this point, which is to be expected. For example, bugs were discovered in the audit logging system of the firewall, which affected data acquisition. Also, redundant log entries were generated for denied traffic. Finally, the time formats in the logs were not compatible with our database's date/time data type. Fortunately, our Perl filter program and other simple techniques were able to deal with each of these issues.

With the log extraction and preparation, the data acquisition step is complete. To analyze the relationships between data records, we experimented with loading subsets of data into a relational database. Using SQL, a search for various aggregate relationships was carried out. This proved to be well aligned with the goals of detecting anomalous traffic. For example, it was fairly easy to observe port scan activity by reviewing source IPs associated with a large number of destination IPs over a span of time. This kind of statistical analysis is also used to determine useful features from aggregations of the data records.

V. IDENTIFICATION, DISCOVERY, AND ANALYSIS OF FEATURES

Little information can be gained from analyzing individual records. Often, the presence of intrusion behavior or other anomalous activity can only be detected by looking at the aggregate behavior of related records. For this reason, the following aggregate records were prepared for analysis as feature candidates:

* Repeated attempts of access by a single IP
* Number of source IPs per destination IP
* Number of destination IPs per source IP
* Number of destination ports on a given source/destination IP pair
* Unique IPs
* Maximum activity from a single IP
* Failed and successful connections from the same IP
* Attempts to access invalid IPs
* Inbound/Outbound bytes per unit time

Firewall logs for one day were processed into the flow-like data format described in Section IV. This yielded just over one million records that were imported into a database for further analysis. SQL queries were developed to aggregate the data in accordance with the feature candidates. The following feature vectors were derived from these candidates:

* Source IP address, number of destination IP addresses
* Destination IP, number of failed access attempts
* Source IP, destination IP
* Destination Perspective Vector (destination IP, count of source IPs, number of successful accesses, number of failed accesses, count of destination ports, number of bytes transferred inbound, number of bytes transferred outbound)

These feature vectors formed the basis for the analysis of anomalies. Clustering and classification models were used to explore the utility of these vectors. The process and results are described below in Section VI.

Several analysis techniques were employed to analyze the firewall log data. Some features were analyzed with boxplots to look at the distribution of selected features and to spot and analyze outliers. Clustering was performed on the destination perspective vector, which also led to creating a classification model using JRIP in WEKA. WEKA is a data mining tool from the University of Waikato. JRIP is one of the many machine learning algorithms supported by WEKA. The techniques and how to apply them with WEKA can be found in the book "Data Mining: Practical Machine Learning Tools and Techniques" [3].
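As an illustration of the aggregation step, the following is a minimal Python sketch, not the Perl and SQL tooling actually used in this work. It parses records in the comma-separated format of Section IV, builds a destination perspective vector per destination IP, counts destination IPs per source IP, and applies the conventional boxplot rule (values beyond 1.5 times the interquartile range from the quartiles) to flag outliers. The field names, the function names, the mapping of "bytes to server" to inbound and "bytes to client" to outbound, and the 1.5 multiplier are all assumptions made for this example.

```python
import csv
import io
import statistics
from collections import defaultdict

# Sanitized sample records in the format shown in Section IV.
SAMPLE_LOG = """\
03/10/2006 02:00:35,PERMIT,10.10.222.11,16285,10.10.224.61,80,6,33,3467
03/10/2006 02:00:35,PERMIT,10.10.222.11,16288,10.10.234.49,443,6,440,11671
03/10/2006 02:00:35,DENY,192.168.250.172,4212,192.168.4.164,80,6,0,423
03/10/2006 02:00:36,PERMIT,192.168.250.172,4210,10.10.224.45,8080,6,0,0
"""

# Hypothetical field names chosen for this sketch.
FIELDS = ["timestamp", "action", "src_ip", "src_port", "dst_ip",
          "dst_port", "protocol", "bytes_to_server", "bytes_to_client"]


def parse_records(text):
    """Parse comma-separated log lines into dictionaries."""
    records = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        row["bytes_to_server"] = int(row["bytes_to_server"])
        row["bytes_to_client"] = int(row["bytes_to_client"])
        records.append(row)
    return records


def destination_perspective(records):
    """Return {dst_ip: (source IP count, successes, failures,
    destination port count, bytes inbound, bytes outbound)}.
    Assumes bytes_to_server is inbound and bytes_to_client is outbound."""
    acc = defaultdict(lambda: {"srcs": set(), "ports": set(),
                               "ok": 0, "fail": 0, "in": 0, "out": 0})
    for r in records:
        a = acc[r["dst_ip"]]
        a["srcs"].add(r["src_ip"])
        a["ports"].add(r["dst_port"])
        if r["action"] == "PERMIT":
            a["ok"] += 1
        else:
            a["fail"] += 1
        a["in"] += r["bytes_to_server"]
        a["out"] += r["bytes_to_client"]
    return {dst: (len(a["srcs"]), a["ok"], a["fail"],
                  len(a["ports"]), a["in"], a["out"])
            for dst, a in acc.items()}


def destinations_per_source(records):
    """Return {src_ip: number of distinct destination IPs contacted}."""
    fanout = defaultdict(set)
    for r in records:
        fanout[r["src_ip"]].add(r["dst_ip"])
    return {src: len(dsts) for src, dsts in fanout.items()}


def boxplot_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; needs >= 2 values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


if __name__ == "__main__":
    recs = parse_records(SAMPLE_LOG)
    for dst, vector in destination_perspective(recs).items():
        print(dst, vector)
    print(destinations_per_source(recs))
```

On a full day of logs, a source IP whose destination count is flagged by `boxplot_outliers` would be a candidate for the port scan behavior discussed in Section IV; the four sample records are too few to show this.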
[...] monitoring service. The remaining activity was found to correspond with the previously detected SSH web scanners.
Cluster   Instances
0         5 (0%)
1         2016 (84%)
2         36 (...)
3         344 (14%)