Intrusion Detection System Using Hierarchical GMM and Dimensionality Reduction
www.ijcait.com International Journal of Computer Applications & Information Technology
Vol. 1, No.1, July 2012
subsequent similar attempts. These specific patterns are called
signatures.
Anomaly Detection
Anomaly detection is concerned with identifying
events that appear to be anomalous with respect to normal
system behavior. Designed to uncover abnormal patterns
of behavior, the IDS establishes a baseline of normal usage
patterns, and anything that deviates widely from it gets
flagged as a possible intrusion. Thus these techniques identify
new types of intrusion as deviations from normal usage. It is
an extremely powerful and novel tool, but a potential
drawback is the high false alarm rate; that is, previously
unseen (yet legitimate) system behaviours may also be
recognized as anomalies, and hence flagged as potential
intrusions. If a user in the graphics department suddenly starts
accessing accounting programs or compiling code, the system
can properly alert its administrators.
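The baseline-and-deviation idea can be illustrated with a minimal sketch. It assumes a single numeric activity measure per user and a fixed threshold in standard deviations; the function names, the sample counts and the threshold of 3.0 are illustrative assumptions, not part of the system described here.

```python
import statistics

def build_baseline(history):
    """Return (mean, standard deviation) of a user's normal activity counts."""
    return statistics.mean(history), statistics.pstdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the baseline mean."""
    mean, std = baseline
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold

# Hypothetical daily counts of compiler invocations for a graphics-department user.
normal_days = [0, 1, 0, 0, 2, 1, 0, 1]
baseline = build_baseline(normal_days)
print(is_anomalous(1, baseline))   # an ordinary day
print(is_anomalous(40, baseline))  # a sudden burst of compiling
```

A lower threshold catches more deviations but also flags more legitimate behaviour, which is the false alarm trade-off noted above.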
In a network-based system, or NIDS, every
individual packet flowing through a network is analyzed. The
NIDS can detect malicious packets that are designed to be
overlooked by a firewall's simplistic filtering rules. In a
host-based system, the IDS examines the activity on each
individual computer or host.
A wide variety of techniques, including neural networks,
decision trees and hidden Markov models, have been explored
as different ways to cluster the data for rule creation. Each
technique has its own pros and cons. The hidden Markov model
is slow: a full search on a database of 400,000 sequences can
take 15 hours. The decision tree approach is unstable when
handling large volumes of data.

Data Collection Issues: For accurate intrusion detection,
we must have reliable and complete data about the target
system's activities. Reliable data collection is a complex issue
in itself. Most operating systems offer some form of auditing
that provides an operations log for different users. These logs
might be limited to the security-relevant events (such as failed
login attempts) or they might offer a complete report on every
system call invoked by every process. Similarly, routers and
firewalls provide event logs for network activity. These logs
might contain simple information, such as network connection
openings and closings, or a complete record of every packet
that appeared on the wire.

The amount of system activity information a system
collects is a trade-off between overhead and effectiveness. A
system that records every action in detail could have
substantially degraded performance and require enormous
disk storage. For example, collecting a complete log of a
100-Mbit Ethernet link's network packets could require
hundreds of Gbytes per day.

Fig. 1.1 Overall system architecture

Reducing The Data Features For Intrusion Detection Systems
Using GMM

Current intrusion detection systems aimed at web server
attacks all adopt the rule-based method, like the well-known
intrusion detection system Snort 2.0, whose detection rules are
written after the features have been extracted from each
intrusion behavior. Thus a rules library is formed. Captured
data packets are then matched against the rules library one by
one. If a match succeeds, the behavior is regarded as an
intrusion.

Since the amount of audit data that an IDS needs to verify
is very large even for a small network, rule matching is
difficult even with computer assistance, because extraneous
features can make it harder to detect suspicious behavior
patterns. Complex relationships exist between the features,
which are difficult for humans to discover. An IDS must
therefore reduce the amount of data to be processed. This is
very important if real-time detection is desired. Reduction can
occur in one of several ways. Data that is not considered useful
can be filtered, leaving only the potentially interesting data.
Data can be grouped or clustered to reveal hidden patterns; by
storing the characteristics of the clusters instead of the data,
overhead can be reduced. Finally, some data sources can be
eliminated using feature selection.
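The three reduction routes just listed (filtering, storing group characteristics, and feature selection) can be sketched on a toy audit log. The record layout, event names and helper functions below are hypothetical, and grouping by an exact key stands in for a real clustering step; this is an illustration under those assumptions rather than the procedure used in this paper.

```python
from collections import defaultdict

# Hypothetical audit records: (event_type, source_host, bytes_transferred).
records = [
    ("login_ok", "10.0.0.5", 120),
    ("login_failed", "10.0.0.9", 80),
    ("heartbeat", "10.0.0.5", 10),
    ("login_failed", "10.0.0.9", 82),
    ("heartbeat", "10.0.0.7", 10),
    ("file_read", "10.0.0.5", 4096),
]

def filter_records(rows, ignored_events):
    """Filtering: drop events assumed to carry no security-relevant information."""
    return [row for row in rows if row[0] not in ignored_events]

def summarize_groups(rows):
    """Grouping: keep per-group characteristics (count, mean size) instead of raw rows."""
    totals = defaultdict(lambda: [0, 0])
    for event, host, size in rows:
        entry = totals[(event, host)]
        entry[0] += 1
        entry[1] += size
    return {key: (n, total / n) for key, (n, total) in totals.items()}

def drop_constant_features(rows):
    """Feature selection in its crudest form: remove columns that take a single value."""
    keep = [i for i in range(len(rows[0])) if len({row[i] for row in rows}) > 1]
    return [tuple(row[i] for i in keep) for row in rows]

interesting = filter_records(records, ignored_events={"heartbeat"})
summary = summarize_groups(interesting)
print(len(records), "->", len(interesting), "records after filtering")
print(summary)
print(drop_constant_features([("tcp", 80, 1), ("tcp", 443, 1)]))
```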
3. OVERVIEW OF GAUSSIAN MIXTURE MODEL
Mixture models are a type of density model comprising
a number of component functions, usually Gaussian. The
distribution being modeled is that of feature vectors extracted
from packets in the network. A Gaussian Mixture Model (GMM)
is used to construct a Bayesian classification procedure on the
observations, which leads to the system behavior model. The
parameters of the mixture model are estimated by the
Expectation Maximization (EM) algorithm.

4. IMPROVING THE RULE MATCHING SPEED BY GMM
Gaussian Mixture Models (GMMs) are among the most
statistically mature methods for clustering. They take a
model-based approach to clustering problems, which consists
in using certain models for clusters and attempting to optimize
the fit between the data and the model. In practice, each cluster
can be mathematically represented by a parametric distribution,
like a Gaussian. The entire data set is therefore modeled by a
mixture of these distributions. An individual distribution used
to model a specific cluster is often referred to as a component
distribution.
A mixture model with high likelihood tends to have
component distributions with high “peaks”, and the mixture
model “covers” the data well. Main advantages of
model-based clustering:
well-studied statistical inference techniques are available;
flexibility in choosing the component distribution;
a density estimate is obtained for each cluster;
a “soft” classification is available.
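The “soft” classification mentioned in the list above can be illustrated with a small sketch: instead of assigning a point to exactly one cluster, the mixture gives a probability for each component. The one-dimensional setting, the two components and all parameter values are assumptions chosen for illustration only.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def soft_assign(x, weights, means, variances):
    """Return P(component j | x) for every component of the mixture."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # mixture density at x
    return [value / total for value in joint]

# Hypothetical two-component mixture over packet length in bytes.
weights = [0.7, 0.3]
means = [60.0, 120.0]
variances = [400.0, 900.0]

for length in (64.0, 90.0, 150.0):
    print(length, [round(p, 3) for p in soft_assign(length, weights, means, variances)])
```

A point lying between the two means receives a split membership, whereas points close to one mean are assigned almost entirely to that component.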
Mixture of Gaussians
The most widely used clustering method of this kind is the
one based on learning a mixture of Gaussians: we can
consider clusters as Gaussian distributions centred on their
barycentres, as we can see in the picture (Fig. 2), where the
grey circle represents the first variance of the distribution.

Fig. 2 GMM cluster

The algorithm first chooses the component (the Gaussian).
We can obtain the likelihood of the sample. What we really
want to maximise is the probability of a datum given the
centres of the Gaussians.

Fig. 3 Overview of the structure of GMM

In the Expectation Maximization algorithm the number of
mixture components k is decided beforehand. The algorithm
updates the parameters of a given k-component mixture with
respect to the data set X_n = {x_1, ..., x_n} such that the
likelihood of X_n is never smaller under the new mixture. The
estimates are obtained by iterating the following equations for
all components j ∈ {1, ..., k}:

\[ \pi_j = \frac{1}{n} \sum_{i=1}^{n} P(j \mid x_i) \]

\[ \Sigma_j = \frac{1}{n \pi_j} \sum_{i=1}^{n} P(j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^{T} \]

where θ_j is the model of component j, with mean μ_j and
covariance matrix Σ_j; π_j is the mixing weight; and
φ(x; θ_j) is the mixture component.
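One iteration of these updates can be sketched in NumPy as follows. The sketch assumes that P(j | x_i) is obtained by Bayes' rule from π_j φ(x_i; θ_j), and it uses the standard weighted-mean update for μ_j, which does not appear among the equations above; the synthetic data, the initialisation and all function names are likewise illustrative assumptions.

```python
import numpy as np

def gaussian_density(X, mean, cov):
    """Multivariate Gaussian density phi(x; theta_j) at every row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    exponent = -0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.exp(exponent) / norm

def em_step(X, weights, means, covs):
    """One EM iteration for a k-component Gaussian mixture."""
    n = X.shape[0]
    k = len(weights)
    # E-step: responsibilities P(j | x_i), one column per component.
    joint = np.column_stack(
        [weights[j] * gaussian_density(X, means[j], covs[j]) for j in range(k)]
    )
    resp = joint / joint.sum(axis=1, keepdims=True)
    # M-step: mixing weights, then means (assumed standard update), then covariances.
    new_weights = resp.sum(axis=0) / n
    new_means = (resp.T @ X) / (n * new_weights)[:, None]
    new_covs = []
    for j in range(k):
        diff = X - new_means[j]
        new_covs.append((resp[:, j, None] * diff).T @ diff / (n * new_weights[j]))
    # Log-likelihood of the data under the parameters that were passed in.
    log_likelihood = np.log(joint.sum(axis=1)).sum()
    return new_weights, new_means, np.array(new_covs), log_likelihood

# Synthetic two-cluster data, used only to exercise the update.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), rng.normal(5.0, 1.0, size=(200, 2))])
weights = np.array([0.5, 0.5])
means = X[[0, 200]].copy()          # one starting point taken from each cluster
covs = np.array([np.eye(2), np.eye(2)])
history = []
for _ in range(20):
    weights, means, covs, ll = em_step(X, weights, means, covs)
    history.append(ll)
print(weights)
print(means)
```

Recording the log-likelihood at each iteration makes it possible to check the property stated above, namely that the likelihood of the data never decreases from one mixture to the next.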
Rule generalization

We propose to generate new rules by generalising
SNORT rules. Given an Internet packet that contains a
variation of a known attack, there should be some automated
way to identify the packet as nearly matching a NIDS attack
signature. If a particular statement has a set of conditions
against it, an item may match some of the conditions.

Whereas Boolean logic would give the value false to the
query ‘does this item match the conditions’, our logic could
allow the item to match to a lesser extent rather than not at all.
This principle can be applied when comparing an Internet
packet against a set of conditions in a SNORT rule. Our
hypothesis is that if all but one of the conditions are met, an
alert with a lower priority can be issued against the Internet
packet, as the packet may contain a variation of a known
attack. In our implementation, generalisation in the case of
matching network packets against rules involves allowing a
packet to generate an alert if:

The conditions in the rule do not all match, yet most of
them do;
The only conditions that do not match exactly nearly
match.

As an example, assume a certain rule states that an alert
should be generated if a packet is a particular length, on a
particular port and contains a certain bit pattern. Using our
generalisation, a packet matching those criteria, except perhaps
on a different port, or with a slightly different bit pattern,
would still count as matching, and a (modified) alert would be
generated.

Block diagram of a complete network intrusion detection
system consisting of Snort, MySQL, Apache, ACID, PHP,
GD Library and PHPLOT

5. PROPOSED SYSTEM

the abnormal users' behavior multigigabit per second speeds
with a moderately small amount of embedded memory and a
few megabytes of external memory.

Dimensionality Reduction Algorithm

Dimension reduction techniques are proposed as a data
pre-processing step. This process identifies a suitable
low-dimensional representation of the original data. Reducing
the dimensionality improves the computational efficiency and
accuracy of the data analysis.

Steps:
Select the dataset.
Perform discretization for pre-processing the data.
Apply the Best First Search algorithm to filter out redundant
and superfluous attributes.
Using the reduced attributes, apply the classification
algorithms and compare their performance.
Identify the best one.

The original dataset consists of 41 attributes and one class
label. The attribute names are listed below.

(i) 41 Attributes: duration, protocol type, service, flag,
src_bytes, dst_bytes, land, wrong_fragment, urgent, hot,
num_field_logins, logged_in, num_compromised, root_shell,
su_attempted, num_root, num_file_creation, srv_count,
serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate,
same_srv_rate, diff_srv_rate, srv_diff_host_rate,
dst_host_count, dst_host_srv_count, dst_host_same_srv_rate,
dst_host_diff_srv_rate, dst_host_same_src_port_rate,
dst_host_srv_diff_host_rate, dst_host_serror_rate,
dst_host_srv_serror_rate, dst_host_rerror_rate,
dst_host_srv_rerror_rate.

Using the Best First Search method we obtained two sets of
reduced dimensionality: 7 potential attributes and 14
potential attributes, which are listed in Tables 2 and 3
respectively.

(ii) 14 Attributes: duration, service, flag, src_bytes, dst_bytes,
count, srv_count, serror_rate, rerror_rate,
dst_host_same_srv_count, dst_host_srv_rate,
dst_host_rerror_rate, dst_host_diff_srv_byte,
dst_host_same_src_port_rate.

(iii) 7 Attributes: protocol type, service, src_bytes,
dst_bytes, count, diff_srv_rate, dst_host_srv_count.
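The attribute filtering step can be illustrated with a generic best-first forward search over attribute subsets. The paper does not state which merit function or stopping rule was used, so the sketch below takes the merit function as a parameter and stops after a fixed number of non-improving expansions; the relevance scores, the redundancy penalty and the `patience` parameter are made up for illustration and are not results.

```python
import heapq
from itertools import count

def best_first_search(features, merit, patience=5):
    """Best-first forward search over feature subsets.

    `merit` scores a frozenset of feature names (higher is better). The search
    always expands the best-scoring unexpanded subset and stops after
    `patience` consecutive expansions that fail to improve the best merit.
    """
    tie = count()  # tie-breaker so the heap never has to compare frozensets
    start = frozenset()
    best_subset, best_score = start, merit(start)
    frontier = [(-best_score, next(tie), start)]
    visited = {start}
    stale = 0
    while frontier and stale < patience:
        _, _, subset = heapq.heappop(frontier)
        improved = False
        for feature in features:
            child = subset | {feature}
            if feature in subset or child in visited:
                continue
            visited.add(child)
            score = merit(child)
            heapq.heappush(frontier, (-score, next(tie), child))
            if score > best_score:
                best_subset, best_score = child, score
                improved = True
        stale = 0 if improved else stale + 1
    return best_subset, best_score

# Made-up relevance scores and one redundant pair, purely for illustration.
relevance = {"src_bytes": 0.9, "dst_bytes": 0.8, "count": 0.6, "land": 0.05, "urgent": 0.02}
redundant_pairs = {frozenset({"src_bytes", "dst_bytes"}): 0.75}

def toy_merit(subset):
    """Reward relevant attributes, penalise redundant pairs and subset size."""
    gain = sum(relevance[f] for f in subset)
    penalty = sum(p for pair, p in redundant_pairs.items() if pair <= subset)
    return gain - penalty - 0.1 * len(subset)

selected, score = best_first_search(list(relevance), toy_merit)
print(sorted(selected), round(score, 2))
```

Unlike a purely greedy search, the priority queue lets the search return to an earlier, less promising subset when the current path stops improving.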
6. CONCLUSION
In order to protect a web server, the intrusion detection
system is indispensable as a security tool. The GMM technique
has been applied to the classification of the rule set so as to
improve on the traditional classification technique, reduce the
matching time and eventually improve the detection
efficiency. In this paper we proposed a novel method based on
the Hierarchical Gaussian Mixture Model (HGMM) for
intrusion detection. HGMM is an effective model for detecting
computer attacks of unknown patterns. The
Expectation-Maximization algorithm is used to compute the
parameters of a parametric mixture model distribution. If the
threshold value is made too low, the IDS engine suffers from a
high false alarm rate. Here, new scan detection techniques that
have a much lower false alarm rate and much higher coverage
than existing techniques are used to reduce the overall false
alarm rate. Some of the methods used are filtering the
unwanted packets and setting a medium threshold value.

Dimensionality reduction is applied at three dimensionalities
(41 attributes, 14 attributes and 7 attributes). The attacks are
classified at each dimensionality, and the evaluation criteria
Specificity, Accuracy and Sensitivity are computed to obtain
the respective true positive and false positive rates for both
algorithms.
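The evaluation criteria named above follow directly from confusion-matrix counts. The sketch below uses the usual definitions; the function name and the counts are hypothetical and are not results from this paper.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute the evaluation criteria from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for one classifier at one dimensionality.
metrics = evaluation_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```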
SNORT RESULTS: