CHAPTER 1
INTRODUCTION
Overview
This chapter introduces cloud services and the security issues associated with them. To address these security challenges, the intrusion detection system (IDS) and its role in cloud security are briefly described. The chapter also lists various approaches to detecting malicious activity in the cloud, together with the motivation and objectives of the work as a whole.
1.1 Introduction
A denial-of-service (DoS) attack is one in which excessive messages are generated artificially and directed at a victim resource, leaving the server unable to extend service to legitimate users [2]. It renders services ineffective and inaccessible and disrupts network traffic and connection interfaces. In this thesis, three hybrid IDS models are proposed to identify the patterns of DoS attacks effectively.
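The flooding idea behind a DoS attack can be illustrated with a minimal detection sketch: flag a source that sends more requests than a threshold within a short sliding window. This is only an illustrative toy (the class name, threshold, and window values are assumptions, not one of the hybrid models proposed in the thesis):

```python
from collections import defaultdict, deque

class RateDetector:
    """Toy sliding-window rate detector for DoS-style request floods."""

    def __init__(self, threshold=100, window=1.0):
        self.threshold = threshold      # max requests allowed per window
        self.window = window            # window length in seconds
        self.history = defaultdict(deque)  # source IP -> recent timestamps

    def observe(self, src_ip, timestamp):
        """Record one request; return True if src_ip now exceeds the rate."""
        q = self.history[src_ip]
        q.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.threshold

detector = RateDetector(threshold=3, window=1.0)
# Four requests from one source inside one second trip the detector.
flags = [detector.observe("198.51.100.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(flags)  # [False, False, False, True]
```

Real detectors must of course cope with distributed sources and spoofed addresses, which is what makes DDoS detection substantially harder than this single-source sketch.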
The National Institute of Standards and Technology (NIST) defined cloud computing as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction" [3]. According to NIST, the cloud model is composed of five essential characteristics, three service models, and four deployment models [4].
Cloud computing arose in response to the growing need for Internet use, interaction, and related activities; it typically entails offering a dynamically expandable Internet service built on virtualized resources. The term "cloud" is a metaphor for networks or the Internet: cloud images were used to illustrate telecommunication networks in the past, and they are now used for the abstraction of the Internet and its underlying infrastructure. "Cloud computing" thus refers to a manner of renting and using IT infrastructure: the required resources are obtained over the network according to principles such as on-demand access and easy expansion. In its generalised sense, cloud computing refers to this rent-and-use computing mode, of which IT, software, and Internet-related services are examples.
___________________
[2] P. Sekhara Rath, M. Mohanty, S. Acharya, M. Aich, http://ijieee.org.in/paper_detail.php?paper_id=4309&
[3] NIST cloud computing definition, http://www.nist.gov/itl/csd/cloud-102511
[4] Peter Mell and Timothy Grance, https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf
This means that computational ability can be treated as a commodity and exchanged over the Internet in the same way as other utilities such as water, gas, and electricity.
Additional characteristics listed in [5] that should be considered when designing a cloud-based intrusion detection system are as follows:
• Quality of Service (QoS) ensures that specific requirements are met by the provided services or resources. This can be achieved by capturing the service quality agreed with the cloud user in a Service Level Agreement (SLA), which covers response time, throughput, safety, etc.
• Agility and adaptability refer to timely reaction to changes in the size and number of resource requests, to which resource management should adapt automatically.
• Availability is the ability to provide redundant services and data in order to avoid service provision failures.
1.2.4 OpenStack
OpenStack is "a cloud operating system that controls large pools of computing, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface," according to the OpenStack website [6]. Choosing a cloud platform for the experiments in this research was a challenge; OpenStack was the preferred option for several reasons.
• Free and open source, with a huge community that contributes to enhancing its features.
• A trusted platform that has been used and supported by leading companies such as AT&T, Walmart, Red Hat, Canonical, Dell, IBM, HP, Cisco, and PayPal.
• Flexibility: it can be integrated with other services and technologies.
• Compatibility: OpenStack's APIs are designed to be compatible with public cloud platforms [7].
• Security: a high level of security can be achieved through role-based access controls.
___________
[6] OpenStack official website, https://docs.openstack.org/queens/
[7] OpenStack benefits, https://vexxhost.com/blog/
1.3.1 Amazon Web Services
AWS claims to protect against: 1) distributed denial-of-service (DDoS) attacks, due to 'world class infrastructure, proprietary DDoS mitigation techniques and homing across a number of providers'; 2) man-in-the-middle (MITM) attacks, due to SSL-encrypted endpoints; 3) IP spoofing, due to the infrastructural design preventing hosts from sending traffic with a source other than their own; 4) port scanning, due to all ports being closed by default; and 5) packet sniffing, due to the hypervisor not allowing VMs running in promiscuous mode to sniff traffic intended for another VM [9]. Furthermore, AWS provides monitoring and logging via its CloudTrail and CloudWatch services. CloudTrail logs any API activity, including SDKs and command-line tools. With CloudWatch it is possible to import any type of logging for processing: logs from applications, systems, networks, or even services like CloudTrail. Log analysis from systems can be used to search for, among other things, malicious logins or DDoS attacks. Additionally, AWS uses security groups that can be assigned to any instance or group of instances to administer what traffic can flow where. AWS also provides a web application firewall (WAF) service with similar functionality to a security group, with the addition of conventional rules that block regular attack patterns such as SQL injection and cross-site scripting (XSS). Lastly, AWS offers a virtual private cloud (VPC), which provides almost all the benefits of an on-premises infrastructure together with the benefits of the cloud. The VPC can be monitored with VPC flow logs, which keep track of all packet movement across the VPC. These logs can, once more, be integrated with CloudWatch.
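To make the flow-log idea concrete, the sketch below counts rejected connections per source address from VPC flow log lines in the default space-separated format (where, to my understanding, the fourth field is the source address and the thirteenth is the ACCEPT/REJECT action). The sample lines are fabricated for illustration:

```python
from collections import Counter

# Fabricated VPC flow log records in the default field order:
# version account-id interface-id srcaddr dstaddr srcport dstport
# protocol packets bytes start end action log-status
sample_lines = [
    "2 123456789010 eni-abc123 203.0.113.5 10.0.0.4 44332 22 6 3 180 1600000000 1600000060 REJECT OK",
    "2 123456789010 eni-abc123 203.0.113.5 10.0.0.4 44333 22 6 3 180 1600000000 1600000060 REJECT OK",
    "2 123456789010 eni-abc123 198.51.100.9 10.0.0.4 50000 443 6 10 840 1600000000 1600000060 ACCEPT OK",
]

def rejected_sources(lines):
    """Return a Counter mapping srcaddr -> number of REJECTed flow records."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        srcaddr, action = fields[3], fields[12]
        if action == "REJECT":
            counts[srcaddr] += 1
    return counts

print(rejected_sources(sample_lines))  # Counter({'203.0.113.5': 2})
```

A burst of REJECT records from one address is a crude but useful signal of port scanning or brute forcing against a security group.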
__________________
[8] Amazon Web Services Inc., https://aws.amazon.com/
[9] Amazon Web Services Inc., https://d0.awsstatic.com/aws-answers/VPC_Security_Capabilities.pdf
1.3.2 Microsoft Azure
Azure is Microsoft's version of an IaaS, full of cloud services that consumers can use to build, deploy, and manage applications. Within Azure the services are similar to AWS: there are options for compute, networking, storage, and more. Azure's Security Center enables the user to prevent, detect, and respond to threats [10]. On top of that, Azure comes with standard capabilities such as encryption for data in transit and at rest. Furthermore, Microsoft offers logging with Azure Monitor, which provides infrastructure-, system-, and application-level data about the throughput of a service and the surrounding environment.
Security-specific event logs are customizable in the platform and can be managed in the Azure Security Center. For securing the network, Azure defines network access controls [11]. These controls can be enforced via Network Security Groups (NSGs), which are cloud-based stateful packet-filtering firewalls that evaluate any traffic flowing into or out of a VM, a group of VMs, or even entire subnets. Note that the packet filtering is stateful and cannot be compared with the full packet-inspection capabilities a traditional NIDS encapsulates. Similar to AWS's VPC, Azure also includes the ability to deploy virtual networks (VNets).
1.3.3 Google Cloud
Google's cloud computing environment has been around since 2008, but its current IaaS offering, Google Compute Engine, has only been available for a couple of years. Even though Google is relatively new to the market compared with the two providers above, it cannot be left out. Google Cloud is supported by the same infrastructure Google uses for core services like YouTube and its search engine. In terms of security, Google has implemented several essential defaults across its entire infrastructure. Examples of these security services include, but are not limited to, DoS mitigation, encryption in transit and at rest, secure authentication, and inter-service access management [12]. Within its own infrastructure Google applies monitoring
____________
[10] Y. Diogenes, https://docs.microsoft.com/en-us/azure/security-center/security-center-detection-capabilities
[11] Y. Diogenes et al., https://www.microsoftpressstore.com/articles/article.aspx?p=2730118
[12] Google Inc., https://cloud.google.com/solutions
(and network intrusion detection) focused on information gathered from the internal network, employee actions, and knowledge of vulnerabilities. These sources do not surface at the consumer level. However, all platform API requests are logged for the user to monitor with Google's Stackdriver platform tools for logging and monitoring. Interestingly, both of these services are also available for AWS. For private networking, Google has also deployed a VPC with a comprehensive set of networking capabilities, including granular IP address range selection, routes, firewalls, VPN, and Cloud Router.
Cloud computing has faced many challenges and attacks owing to the broadness of both its vulnerabilities and its attack vectors. Multiple types of challenges are posed to this environment by different technologies and by different levels of software and hardware. This section presents an introduction to these attacks and challenges to give the reader a comprehensive view of the state of the art of cloud security; a detailed description follows in the next two chapters. There are three types of cloud deployment: private, deployed within an organization for internal use; public, which provides different services for various organizations; and hybrid, which combines the two previous types. These forms of deployment are accompanied by security threats such as data leakage due to cloning and resource pooling [14], and unauthorised access to obsolete data residuals left on different servers of the public cloud [15]. Private clouds produce an 'elastic perimeter' due to alternating between centralizing and distributing data according to user need, which can result in data loss when storing data in zones with higher authorization levels [16]. The multi-tenant aspect of the cloud poses another security threat when illegal access to data is attempted by an authorised customer sharing the same hardware as the victim. As mentioned previously, cloud computing offers a number of services such as Software as a Service (SaaS),
_______________
[14] B. Grobauer, T. Walloschek, and E. Stöcker, https://ieeexplore.ieee.org/document/5487489
[15] R. Bhadauria and S. Sanyal, https://www.researchgate.net/publication/222101173
[16] G. Galante and L. C. E. De Bona, https://www.researchgate.net/publication/235534087
Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), which in turn pose different challenges to cloud security according to the type of service offered. While the security threats at the deployment-model level depend more on exploiting managerial faults or bugs, the threats at the service level mostly use technical vulnerabilities to achieve their malicious purposes. This is most obvious in the security threats caused by the virtualization concept of the cloud, such as Virtual Machine (VM) Hopping, VM Mobility, VM Denial of Service, and Service Hijacking. In VM Hopping, an attacker gains access to and control of another VM on the same server, severely impacting the confidentiality and availability of the victim. In VM Mobility, the VM and its contents are saved as portable soft copies that can easily be relocated from one server to another via portable devices without a copy remaining on fixed storage [14]; this process can lead to serious security threats such as data leakage and loss, and to targeted attacks like the man-in-the-middle attack. VM Denial of Service (DoS) happens when a VM consumes all the available resources, so that the server cannot operate the other VMs residing on it. Service hijacking, a further threat to the cloud, occurs when unauthorized users gain illegal control of certain services [20]. Backups of data, when mishandled, may lead to security threats resulting in data leakage or misuse by unauthorised parties [21]. Network-based attacks are another type of security threat, whether they target the network infrastructure or software. Browsing attacks are those in which the browsing software is used to launch attacks such as sniffing, SQL injection, and XML signature wrapping. In a sniffing attack, the attacker installs malware on an intermediary host to steal the victim's credentials, which leads to illegal use of legitimate credentials. SQL injection attacks work by inserting malicious code into SQL input fields, granting the attacker unauthorised access to the database and consequently to other confidential information [22].
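The SQL injection mechanism described above can be demonstrated in a few lines with an in-memory SQLite database (purely illustrative; the table and values are fabricated and unrelated to any system in this thesis). Concatenating attacker input into the query string lets an injected OR clause match every row, while a parameterized query treats the same input as a harmless literal:

```python
import sqlite3

# Set up a toy database with one "confidential" row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

malicious = "nobody' OR '1'='1"

# Vulnerable: the input is spliced into the SQL text, so the injected
# OR '1'='1' clause makes the WHERE condition true for every row.
vulnerable = conn.execute(
    "SELECT secret FROM users WHERE name = '" + malicious + "'"
).fetchall()
print(vulnerable)  # [('s3cret',)] -- data leaked

# Safe: a parameterized query binds the input as a single literal value.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (malicious,)
).fetchall()
print(safe)  # [] -- no user is literally named "nobody' OR '1'='1"
```

The same contrast (string concatenation versus bound parameters) applies to every SQL dialect, which is why parameterized queries are the standard defence against this attack class.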
___________________
[21] S. Subashini et al., https://www.sciencedirect.com/science/article/abs/pii/S1084804510001281
[22] S. Roschke, F. Cheng et al., https://www.infona.pl/resource/bwmeta1.element.ieee-art-000005380611
XML Signature Element Wrapping attacks aim to disturb the cloud service by inserting malicious data into the signature part of the message, which may lead the cloud interface to execute arbitrary methods [22]. Other types of attack target the network infrastructure, such as flooding attacks, which target the bandwidth of the network in order to affect service availability [23], in addition to the many network-based attacks that target conventional networking systems and are applied to the cloud for malicious purposes, such as the DoS attack. To demonstrate the impact of cyber-attacks on the cloud computing environment, this section presents a recent snapshot of these attacks and their implications for both the global economy and people. In recent years, cyber-attacks have increased rapidly in volume and diversity. In 2013, for example, over 552 million customers' identities and items of crucial information were revealed through data breaches worldwide.
These growing threats are further demonstrated by the 50,000 daily attacks on the London Stock Exchange [24]. It has been predicted that cyber-attacks will cost the global economy $3 trillion in aggregate by 2020 [25]. These immense effects and implications have urged the United States Department of Defense to categorize cyber-attacks as an act of war meriting physical military force in response [24]. Such a categorization depicts the severe view countries across the globe now take of cyber threats. Classical cyber-attacks such as Distributed Denial of Service (DDoS) continue to target the cloud: four in five organizations were targeted by a DDoS attack during 2017 [24]. Furthermore, the Massachusetts Institute of Technology (MIT) predicted that sophisticated ransomware attacks would target the cloud in 2018 [25].
________________
[23] M. N. Ismail, A. Aborujilah, S. Musa, https://dl.acm.org/doi/10.1145/2448556.2448592
[24] A. Chadd et al., https://www.comparethecloud.net/articles/2017-cyberattacks/
[25] W. Ashford et al., https://www.computerweekly.com/news/450432488/Ransomware-to-hit-cloudcomputing-in-2018-predicts-MIT
Against this cyber-security backdrop, the cloud security market is expected to grow from $1.4 billion to $12 billion by 2024 [26]. The consequences of cyber-attacks, such as huge economic expense, business impact, and personal damage, encourage security researchers to focus on this area in order to mitigate them.
________________
[26] A. D. Rayome et al., https://www.techrepublic.com/article/cloud-security-market-to-reach-12b-by-2024-driven-by-rise-of-cyber-attacks/
SQL injection attack: In this attack the attacker abuses the input fields that feed the user database. The most common example of this type of attack is the one that occurred on the Sony PlayStation website in 2008.
Command injection attack: This attack is named after its role: it injects commands, which are then run by the runtime environment or may spawn a shell.
3. Abuse and nefarious use of cloud services
The main difference between this attack and the insider attack is the attacker's background; otherwise they have everything in common. In the insider attack the attacker is an authorised user of the data, while here the attacker is an external hacker who targets poorly secured databases or clouds. Because of this, there is no need for expensive DoS or brute-force attacks on the target.
4. Denial of service attack
This type of attack is mainly carried out by flooding networks with many packets, such as TCP, UDP, ICMP, or combinations of them. Owing to the risk of intruder attacks on the distributed services of the computer, some of those services become unavailable even to authorised users. As the attack overloads the systems, legal users are unable to use them. These attacks are especially dangerous for single-cloud data and servers, since many users depend on that cloud's distributed network.
5. Side channel attack
This type of attack targets the cryptographic algorithms of the system. The attacker places a malicious virtual machine on the same physical host as the targeted system, exploiting the virtualization layer managed by the Virtual Machine Monitor (VMM), also known as the hypervisor, which supervises the guest VMs.
6. User to root attack
In this attack, the attacker sniffs the password used to authenticate to the targeted user's system and then combines various traditional methods to raise their privileges to super-user access. An example of such an escalation technique is stack smashing, in which input to a set-UID root program corrupts its address space so that control returns from the instruction stream into a subshell.
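The command injection item above hinges on whether user input is interpreted by a shell. The hedged sketch below contrasts the unsafe and safe patterns in Python (the file name and injected suffix are fabricated); passing an argument list keeps the input as one literal argument that no shell ever parses:

```python
import subprocess
import sys

user_input = "file.txt; echo INJECTED"

# Unsafe pattern (shown only as a comment, not executed): with
# shell=True, the "; echo INJECTED" suffix would run as a second
# shell command:
#   subprocess.run("some_tool " + user_input, shell=True)

# Safe pattern: the whole string travels as a single argv entry.
# Here a Python one-liner stands in for "some_tool" and just echoes
# the argument it received.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True, text=True,
)
print(result.stdout.strip())  # file.txt; echo INJECTED  (one literal argument)
```

The semicolon survives intact in the output because it was never given to a shell to interpret, which is exactly the property that defeats this attack class.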
_________________
[27] https://www.dnsstuff.com/intrusion-detection-system#types-of-intrusion-detection-system
An Intrusion Detection System (IDS) is a monitoring technique that inspects all inbound and outbound network activity, identifies suspicious patterns, and discards them. An IDS has three main components: the data source, the analysis engine, and the response manager [4]. The data source is the primary component of any IDS and is also called the event generator. Data sources can be classified into four categories: host-based monitors, network-based monitors, application-based monitors, and target-based monitors. The second component of an intrusion detection system is the analysis engine. This component receives information from the data source and examines the data for symptoms of attacks or other policy violations. There are two broadly used techniques for IDS analysis, misuse/signature-based detection and anomaly/statistical detection (Fig. 1.1); the analysis engine uses one or both of them.
Anomaly-Based Intrusion Detection System: Anomaly detection focuses on identifying unusual behaviour in a host or a network. Such detectors operate on the assumption that attacks differ from normal activity, and they construct profiles representing the normal behaviour of users, hosts, or network connections.
These profiles are constructed from historical data collected during normal operation. The detectors collect data from events and use a variety of measures to determine when the monitored activity deviates from normal activity. The measures and techniques used in anomaly detection include:
• Statistical measures, which can be parametric, where the distribution of the profiled attributes is assumed to fit a certain pattern, or non-parametric, where the distribution of the profiled attributes is learnt from historical values observed over time.
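A minimal sketch of the parametric case: assume a profiled attribute (here, requests per minute) is roughly normal, build the profile from historical data collected during normal operation, and flag observations more than three standard deviations from the mean. The numbers and the attribute choice are illustrative assumptions, not data from this thesis:

```python
import statistics

# Historical values observed during normal operation form the profile.
history = [52, 48, 50, 47, 53, 49, 51, 50, 48, 52]  # requests per minute
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_anomalous(value, threshold=3.0):
    """Flag values whose z-score against the profile exceeds the threshold."""
    return abs(value - mean) / stdev > threshold

print(is_anomalous(50))   # False -- typical load
print(is_anomalous(400))  # True  -- far outside the learned profile
```

A non-parametric detector would instead compare new values against the empirical distribution of `history` (e.g., percentile ranks) without assuming any particular shape.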
2. Programmed: Programmed learning needs an agent, whether a user or something else, who teaches the system, programming it to detect different anomalies or malicious events. The user or programmer of the system thus forms an opinion on what behaviour is considered abnormal enough for the system to signal a security violation.
Advantages
• IDSs based on anomaly recognition detect unusual behaviour, and thus can detect attacks of which they have no specific knowledge.
• Anomaly detectors produce information that is very useful for defining new patterns for signature detection.
Disadvantages
• The detection of anomalies produces a high number of false alarms due to the
unpredictable behaviour of users and networks.
The proper operation of such a system depends not only on good installation and configuration, but also on keeping the database of stored attack patterns up to date.
Misuse detection can be further classified into the following subgroups:
1. State Modelling: A state model is a representation of the process model for one type of change request, with a state representing the status of an individual change request. State modelling encodes the intrusion as a number of different states, each of which has to be present in the observation space for the intrusion to be considered to have taken place. These are by nature time-series models. Two subclasses exist: in the first, state transition, the states that make up the intrusion form a simple chain that has to be traversed from beginning to end; in the second, Petri net, the states form a Petri net and can therefore have a more general tree structure, in which several preparatory states can be fulfilled in any order.
2. Expert System: An expert system is based on statistical profiles of users, events, etc., which it then uses for the intrusion detection process. The expert system is thus employed to reason about the security state of the system, given rules that describe intrusive behaviour. The system works on the principle of a previously defined set of rules which, when assembled in sequence, represent an attack. In an expert system, all the events that have been incorporated into an audit trail are translated into if-then-else rules. However, it is very difficult to derive a perfect rule for an input data stream.
3. String Matching: String matching is very simple, based on character-pattern matching, and case sensitive in nature. In this technique, substrings of characters are matched in the text that is meant for transmission. These systems are not flexible, but they have the virtue of being simple to understand.
4. Simple Rule Based: These systems are similar to the more powerful expert systems, but not as advanced, which often leads to speedier execution. The system observes events on the system and applies rules to decide whether activity is suspicious. Rule-based anomaly detection analyzes historical audit records to identify usage patterns and auto-generate rules for them; it then observes current behaviour and matches it against the rules to see whether it conforms. Like statistical anomaly detection, it does not require prior knowledge of security flaws.
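The string-matching subgroup above can be sketched in a few lines: each signature is a literal, case-sensitive substring, and a payload is flagged when any signature occurs in it verbatim. The signatures here are simplistic illustrations, not rules from any real IDS:

```python
# Illustrative literal signatures (real rule sets are far richer).
SIGNATURES = [
    "' OR '1'='1",   # crude SQL injection marker
    "<script>",      # crude cross-site scripting marker
]

def match_signatures(payload):
    """Return the list of signatures found verbatim in the payload."""
    return [sig for sig in SIGNATURES if sig in payload]

print(match_signatures("GET /?q=<script>alert(1)</script>"))  # ['<script>']
print(match_signatures("GET /?q=<SCRIPT>"))  # [] -- case sensitivity misses it
```

The second call illustrates the inflexibility noted above: a trivial case change evades a literal, case-sensitive signature, which is one reason tight signature patterns miss attack variants.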
Advantages
• Signature detectors are very effective at detecting attacks without generating a large number of false alarms.
• They can quickly and accurately diagnose the use of a specific attack technique, which helps those responsible for security to follow up security problems easily and to prioritize corrective actions.
• Signatures may also trigger alerts on network traffic, for instance from known malicious IP addresses attempting to access a system.
Disadvantages
• Signature detectors only detect attacks they already know, so they must be constantly updated with signatures of new attacks.
• They rely on a preprogrammed list of known indicators of compromise, so they will not detect zero-day exploits or unknown suspicious behaviour.
• Many signature detectors are designed to use very tight patterns that prevent them from detecting variants of common attacks.
• They require trained staff to configure and maintain them appropriately.
The main limitation of this approach is that it looks only for known weaknesses and may fail to detect unknown future intrusions. Anomaly/Statistical Detection: An anomaly-based detection engine searches for something rare or unusual [6]; in other words, it takes care of attacks that cannot be caught by the misuse detection method. It analyzes system event streams using statistical techniques to find patterns of activity that appear abnormal. The primary disadvantages of such systems are that they are computationally expensive and that intrusive behaviour may be misclassified as normal because of insufficient data. The third component of an intrusion detection system is the response manager. In basic terms, the response manager acts only when possible inaccuracies (intrusions) are found in the system, by informing someone or something in the form of a response.
Intrusion Detection Systems use a diversity of techniques to gather data, but in general, as shown in Figure 1.2, an IDS consists of the following:
• Data gathering (sensors): the device responsible for gathering information from the system.
• Detection engine: analyzes the data collected from the sensors to identify any attacks.
• Knowledge base (database): the component in which the IDS holds information about the traffic collected by the sensors; security professionals usually provide such information.
• Configuration device: reveals the current state of the Intrusion Detection System.
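The component layout of Figure 1.2 can be sketched as a minimal pipeline: a sensor feeds events to a detection engine, which consults a knowledge base and notifies a response manager on a match. Every class name, event shape, and rule below is a hypothetical illustration of the architecture, not any concrete IDS:

```python
class KnowledgeBase:
    """Holds the information the engine matches against (here, bad ports)."""

    def __init__(self, bad_ports):
        self.bad_ports = set(bad_ports)

class Engine:
    """Detection engine: checks sensor events against the knowledge base."""

    def __init__(self, kb, respond):
        self.kb = kb
        self.respond = respond  # callback standing in for the response manager

    def analyse(self, event):
        if event["dst_port"] in self.kb.bad_ports:
            self.respond(f"suspicious connection to port {event['dst_port']}")

# Wire the components together; the alert list plays the response manager.
alerts = []
engine = Engine(KnowledgeBase(bad_ports=[23, 2323]), alerts.append)

# A sensor would feed events like these from the monitored system.
for event in ({"dst_port": 443}, {"dst_port": 23}):
    engine.analyse(event)

print(alerts)  # ['suspicious connection to port 23']
```

The separation mirrors the figure: the sensor only gathers, the engine only analyses, the knowledge base only stores, and the response manager only reacts.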
Without taking into account any hybrid or distributed combinations, there are two types of intrusion detection systems: host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS). For completeness of this research, we also take into account additional log sources that can be analysed; for example, one might use firewall logs as an alternative source for verifying intrusions on the network boundary. This is relevant in the cloud setting, as CSPs might provision certain infrastructural, activity, diagnostic, or application logs. HIDS and NIDS are usually interleaved: HIDS catches intrusions the NIDS misses, and vice versa.
system). HIDS then checks logs and the activity occurring on these objects for unwanted modifications, memory and data integrity, system calls, and more.
Like any intrusion detection system, HIDSs also report multiple false positives. Once the system is tuned, however, the reduction in false positives is remarkable, and this type of IDS then misses very few attacks against the system. In contrast to NIDSs, HIDSs can see the result of an attempted attack, as well as directly access and monitor the data files and processes of the attacked system [14]. Although NIDSs are more developed and more widely accepted these days, HIDSs have certain advantages over them:
Advantages
• Host-based IDSs, having the ability to monitor the local events of a host, can detect attacks that cannot be seen by a network-based IDS.
• They can often operate in an environment in which network traffic travels encrypted, since the source of information is analysed before the data is encrypted on the origin host and/or after it is decrypted on the destination host.
Disadvantages
• Host-based IDSs are more costly (in time and money) to administer, as they must be managed and configured on each monitored host: while NIDSs use one IDS for multiple monitored systems, HIDSs need an IDS for each of them.
• If the analysis station is within the monitored host, the IDS can be disabled by a successful attack on the machine.
• They are not adequate for detecting attacks on an entire network (for example, port scans), since the IDS only analyses the network packets sent to its host.
• HIDSs use resources of the host they are monitoring, influencing its performance.
As a subclass of HIDS, we should mention multi-host-based IDSs. They analyse information collected from two or more hosts, trying to catch any threat. Their approach is very similar to the classic HIDS, with the additional difficulty of having to coordinate the data from several sources. As Snort is mentioned as an open-source NIDS project, Osiris should be mentioned as an example of a HIDS.
Network-Based Intrusion Detection
A NIDS, on the other hand, monitors packets that flow through the network, checking them for malicious content or policy violations. The detection unit is usually placed at a test access point (TAP) or switch port analyzer (SPAN) on a switch that mirrors the data elsewhere. Traditionally, there are two placement options for a NIDS sensor.
Firstly, there is the inline option: a device placed on the network route. Consequently, the NIDS can actually stop packets from reaching their destination, possibly turning the intrusion detection system into an intrusion prevention system. However, this requires the packet analysis to happen inside the inline NIDS device, which introduces additional latency.
Secondly, there is the out-of-band NIDS, which sits outside the network path and instead uses copies of the data, usually provided via a mirroring port on a switch or a TAP. Out-of-band placement introduces little to no extra latency, which is desirable, especially in networks under heavy load. On the downside, an out-of-band NIDS looks at copies, so the data is no longer real time.
Advantages
• A well-located NIDS can monitor a large network, as long as it has enough capacity to analyse the traffic in its totality.
• NIDSs have a small impact on the network, usually remaining passive and not interfering with its normal operation.
• NIDSs can be configured to be invisible to the network in order to increase security against attacks.
Disadvantages
• The sensors analyse not only the headers of the packets but also their content, so they may have difficulty processing all packets in a large or heavily loaded network and may fail to recognize attacks during periods of high traffic. Some vendors are trying to solve this problem by implementing IDSs completely in hardware, which makes them much faster.
• Network-based IDSs do not know whether an attack was successful; the only thing known is that it was launched. This means that after a NIDS detects an attack, administrators must manually investigate every attacked host to determine whether the attempt succeeded.
• Due to their general configuration, NIDSs may have a high false positive rate: they may report many normal activities as attacks. The problem arises when the number of such alarms is unacceptably high.
• Perhaps the biggest drawback of NIDSs is that their implementation of the network protocol stack may differ from the stack of the systems they protect. Many servers and desktop systems deviate from the current TCP/IP standards in some respects, so it is possible that they discard packets the NIDS has accepted. An example of an open-source NIDS, and one of the most used nowadays, is Snort, on which we will focus more in-depth later.
1. Real-time based: In this type of system, the data is analyzed for intrusions while the session is in progress, and an alarm is raised immediately when the system detects suspicious data as an attack. Thus the data on the network is checked for intrusions in real time.
2. Offline based: In this type of system, the data to be analyzed for intrusions has been collected previously, i.e. it is already stored somewhere as logs and is processed later for intrusion detection. These types of systems are mainly used for understanding attack behaviour.
Kaiyuan et al. propose a network intrusion detection algorithm that combines a deep hierarchical network with hybrid sampling. To balance the majority and minority classes, one-sided selection (OSS) is first applied to reduce noise in the majority class, and then SMOTE (synthetic minority over-sampling technique) is applied to augment the minority class; this allows the minority-class features to be trained while decreasing model training time. After the sampling is done, a deep hierarchical network combining a CNN (convolutional neural network) and a BiLSTM (bi-directional long short-term memory) is applied to extract spatial and temporal features respectively.
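The over-sampling step can be illustrated with a minimal sketch of the SMOTE idea, interpolating between a minority-class point and one of its nearest minority neighbours. This is an illustrative reimplementation of the general technique, not the code used by Kaiyuan et al.:

```python
import numpy as np

def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Generate synthetic minority samples by interpolating between a
    point and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # distances from x to every minority point
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]        # skip the point itself
        z = minority[rng.choice(neighbours)]
        gap = rng.random()                         # interpolation factor in [0, 1)
        synthetic.append(x + gap * (z - x))
    return np.array(synthetic)

# Toy minority class: four points; generate six synthetic ones.
minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]])
new_pts = smote_like_oversample(minority, n_new=6, rng=0)
print(new_pts.shape)  # (6, 2)
```

Each synthetic point lies on the segment between two real minority points, so the new samples stay inside the region the minority class already occupies.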
_______________
31. Govinda K., Kevin Thomas, https://www.irjet.net/archives/V4/i7/IRJET-V4I703.pdf
Singular Value Decomposition (SVD) is a procedure from linear algebra used to reduce the dimensionality of data by factorizing a matrix. In gene-expression analysis, the main motive of SVD is to identify and remove systematic artefacts from the data and retain what is significant. SVD is applied to compute the eigenvalues and eigenvectors of the covariance matrix of the sample gene matrix. To retain most of the variability of the matrix, the eigenvectors associated with the largest eigenvalues are kept, so the number of eigenvectors is reduced. These eigenvectors yield the principal components (PCs), which represent the data on a smaller scale while capturing most of its variability.
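As a toy illustration of this idea, the sketch below (with random data standing in for a gene-expression matrix) uses NumPy's SVD to project samples onto the top two components and report how much variance they capture:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))           # 6 samples x 4 variables (toy data)
Xc = X - X.mean(axis=0)               # centre each column

# SVD of the centred matrix; rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
X_reduced = Xc @ Vt[:k].T             # project onto the top-k components
var_explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(X_reduced.shape)                # (6, 2)
```

Because the singular values are sorted in decreasing order, the first two components always explain at least half of the total variance of a four-variable matrix.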
Researchers then faced the problem of dealing with ill-conditioned responses and a large number of variables in predictive models, e.g. spectrograph data.
Locally Linear Embedding (LLE) can convert high-dimensional data to a low-dimensional representation with little loss of information. The most common method used for dimension reduction is PCA (principal component analysis), in which the data points are projected onto the subspace spanned by the directions of greatest variance, i.e. an orthogonal projection onto a low-dimensional subspace of components or factors. For nonlinear data, the stated methods are LLE and ISOMAP. These methods map high-dimensional data into a low-dimensional subspace while preserving the important components in a nonlinear way. LLE does this by assuming the data lies on a smooth manifold and reconstructing each point from its neighbours in local coordinates.
Genetic algorithm research, building on Darwinian evolution and natural selection, has produced many theories and models for solving optimization problems. The algorithm is an alternative for such problems, operating on a population through mutation, selection and recombination. Genetic algorithms are applied to pattern recognition, classification and optimization problems because they are parallel and iterative in nature.
Anomaly detection mainly works on the detection of attacks that differ from normal behaviour in terms of both type and amount. If we know in detail what the normal behaviour pattern is, any violation can be identified, whether or not it is part of a threat model. However, the advantage of detecting previously unknown attacks is paid for with the high false-positive rates of anomaly detection systems. Training an anomaly detection system in a highly dynamic environment is a very difficult task. Anomaly detection systems are intrinsically complex, and it is also difficult to determine the event that triggered an alarm.
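As a minimal illustration of the anomaly principle (not any specific IDS product), a simple statistical detector can flag values that deviate too far from the observed normal behaviour; the traffic numbers below are made up for the example:

```python
import numpy as np

def zscore_anomalies(x, threshold=3.0):
    """Flag points whose deviation from the mean exceeds `threshold`
    standard deviations -- a minimal statistical anomaly detector."""
    x = np.asarray(x, dtype=float)
    z = np.abs(x - x.mean()) / x.std()
    return np.where(z > threshold)[0]

# Hypothetical packets-per-second samples; the last one is a sudden burst.
traffic = np.array([10, 11, 9, 10, 12, 11, 10, 95.0])
print(zscore_anomalies(traffic, threshold=2.0))  # [7]
```

Note how the threshold directly trades detection sensitivity against the false-positive rate discussed above: lowering it catches smaller deviations but also flags more normal fluctuations.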
_______________
8. Amazon Web Services Inc., https://aws.amazon.com/.
On the other hand, misuse-based IDSs recognize patterns of attack: they essentially contain attack descriptions or signatures and match them against the audit data stream, looking for evidence of known attacks [8]. In contrast to firewalls, a misuse-based IDS scans all packets at layers 3 and 4 as well as the application-level protocols, looking for backdoor Trojans, denial-of-service attacks, worms, buffer overflow attacks, scans against the network, etc. The main advantage of misuse detection systems is that the analysis is focused on the audit data and very few false alarms are produced.
The main disadvantage of misuse detection systems is that they can detect only known attacks for which they have a defined signature. As new attacks are discovered, developers must design new models and add them to the signature database. In addition, signature-based IDSs are prone to attacks aimed at triggering a high volume of detection alerts by injecting traffic specifically crafted to match the signatures used in the analysis process. This type of attack can be used to exhaust the resources of the IDS computing platform and to hide real attacks within the large number of alerts produced. An IDS provides much greater visibility into signs of attacks and compromised hosts, and it is also needed to make sure that the traffic that gets past the firewall is monitored.
Detection Methods
For both HIDS and NIDS, detection follows two main principles: an IDS either looks for behaviour that diverges from normal activity, or it matches incoming data against known patterns and signatures. These two methods are called anomaly-based detection and signature-based (or misuse-based) detection. Evidently, the latter method only works against known attacks; on the other hand, anomaly detection tends to generate more false positives than its signature-based counterpart. In practice, signature- and anomaly-based detection are often used in conjunction.
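The signature-matching principle can be sketched in a few lines; the signature names and byte patterns below are made up for illustration and are not real Snort rules:

```python
# Hypothetical signature database: name -> byte pattern to look for.
SIGNATURES = {
    "shellcode-nop-sled": b"\x90\x90\x90\x90",
    "php-remote-exec": b"<?php system(",
}

def match_signatures(payload: bytes):
    """Return the names of every known signature found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

print(match_signatures(b"GET /index.php?c=<?php system('id'); ?> HTTP/1.1"))
```

As the text notes, such a matcher can only ever flag patterns already in its database, which is exactly why it is usually paired with an anomaly-based component.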
HIDS and NIDS in the Public Cloud
Both of the above detection approaches have their own advantages and disadvantages; essentially, they supplement each other.
Log Monitoring and Analysis
Event and log correlation from different detection units is often performed on a centralised machine. This so-called SIEM (security information and event management) system is essentially an extra layer on top of all security controls. Besides IDSs and other security devices, a SIEM can also collect logs from most sources within an environment. In the cloud, many SaaS, PaaS and even IaaS instances generate logs, ranging from system and application logs to Active Directory, user activity, virtual machine (VM) and cloud security appliance logs. Cloud providers also provide security controls of their own, so in light of this research they are definitely something to take into account on top of an IDS.
one. Based on the given information, the machine will be able to see the relationship
between different data and predict the time it takes to travel from a location to another.
Unsupervised learning is a technique where a machine learning model does not need to be supervised. Instead, the model tries to discover the information by itself, which means that unsupervised learning deals with unlabeled datasets. In unsupervised learning, the machine can find all kinds of data patterns, which also helps in identifying the features needed to categorize the data.
Every neuron in the network is programmed according to its properties, and the neurons work together to solve artificial intelligence challenges without having to create a model of a real system [33]. A neural network is a system based on a model of the human brain's anatomy. The human brain is made up of a dense network of nerve cells that communicate with one another; these nerve cells, known as neurons, are the fundamental components of the brain's information-processing machinery. In a healthy human brain there are about 10 billion neurons and 60 trillion connections (synapses) between them. The brain uses these neurons to process information, and its information-processing capability is faster and more powerful than any computer currently in use. Despite the fact that each neuron has a fairly basic structure, a vast number of neurons can combine to produce a massive and mature processing machine.
______________
33. Wikipedia, http://en.wikipedia.org/wiki/Neural_network.
A neuron has a cell body (soma), many short fibres called dendrites, and a long fibre called the axon, as shown in Figure 1.5. The human brain can be viewed as a multi-layered, nonlinear, parallel information-processing system. The information-processing and storage mechanisms in the brain are not fully independent; they may occur concurrently in the same neural network. In other words, these two processes occur globally, not locally, in the neural network. The ability to learn is the most important feature of a biological neural network, and computers are used to replicate biological learning processes in order to fulfil the required tasks. An artificial neural network is made up of a number of very simple processors known as neurons, which are analogous to the biological neurons in the brain.
There are weighted links between the neurons that connect them into a full network, and signals are passed from one neuron to the next over these links. A neuron's output signal is split into multiple branches, each transmitting the same signal, and these outgoing branches terminate at the incoming connections of other neurons in the network. Neural networks are now widely used: in the field of computer science, there are a variety of algorithms and works based on neural networks that have been applied to many problems. Because of their self-learning and self-organising abilities, neural networks are an excellent choice when several variables appear at random and the user needs to determine or clarify something based on them.
Neural Networks (NN)
In simple terms, a neural network [53] is a network that contains a number of neurons used to process information. A neural network consists of three main components, the input layer, hidden layers and output layer, as shown in Figure 1.6.
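A forward pass through such a three-part network can be sketched with NumPy; the random weights below are purely for illustration (a real IDS would learn them by training):

```python
import numpy as np

def sigmoid(z):
    """Standard logistic activation, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """One forward pass: input layer -> hidden layer -> output layer."""
    h = sigmoid(W1 @ x + b1)      # hidden-layer activations
    return sigmoid(W2 @ h + b2)   # output-layer activation

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs, 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 1 output unit
y = forward(np.array([0.5, -0.2, 0.1]), W1, b1, W2, b2)
print(y.shape)  # (1,)
```

For a binary classifier (e.g. normal vs. attack traffic), the single output in (0, 1) would be thresholded to produce a decision.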
Here are a few examples. In [35], an interesting study is conducted by inputting variables such as moisture, titratable acidity, free fatty acids, tyrosine, and peroxide value into an ANN, developing a radial basis (exact fit) artificial neural network model for estimating the shelf life of burfi stored at 30 °C;
___________________
53. N. Moustafa, https://www.researchgate.net/publication/287330529_UNSW-
35. Sumit Goyal, et al., https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.674.2003
the output value of this model is acceptability, and the results are quite promising. Another interesting paper is [36], which discusses the potential of ANN models based on the cascade backpropagation algorithm for identifying the shelf life of processed cheese stored at 30 °C. The cascade backpropagation algorithm (CBA) was utilised to speed up learning in ANNs, and the Bayesian regularisation approach was employed to train the network.
In [37], the authors offer a new method for analysing road photos that frequently contain vehicles and extracting licence plates (LP) from natural features by looking for vertical and horizontal edges; in this paper, an artificial neural network (ANN)-based approach is employed to recognise Korean plate characters. A radial basis function artificial neural network is introduced in reference [38], in which a multilayer feed-forward network is utilised to deal with hydrological data. Spread and centre values are the model parameters in an RBFANN, computed by inducing appropriate weight values. The authors address the problem of coal calorific value uncertainty by proposing a soft measurement model for the calorific value of coal based on the RBF neural network.
A genetic algorithm with a fitness function, combined with the idea of k-fold cross-validation, was used to optimise the RBF network parameters. [39] presents a BP-based neural network technique for handling the traffic flow problem in an Intelligent Transport System, a difficult non-linear prediction problem for a large-scale system; the neural network model is effective because it is adaptive and self-learning. [40] also presents the classification of different categories of targets (vehicles), where the soft computing tool for classification is a supervised artificial neural network,
______________
36. Sumit Goyal, et al., https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.463.7045
37. Kaushik Deb, et al., https://www.sciencedirect.com/science/article/pii/S2212017312004124
38. Yuan Jing, et al., https://www.researchgate.net/publication/241633478
39. Anuja Nagare, Shalini Bhatia, https://www.ijais.org/archives/volume1/number2/68-0115
40. Priyabrata Karmakar, et al., https://www.researchgate.net/publication/343771159
where targets are categorised based on the energy returned to the radar, i.e. Radar Cross Section (RCS) values taken at various aspect angles. The authors of [41] examine and determine the elements that influence the usage of lifeless-repairable spares; they integrate a BP neural network with a genetic algorithm to optimise the weights and thresholds of the BP neural network in order to forecast consumption. The research in [42] attempts to build reference image quality measurement techniques for JPEG images, using an Elman neural network to classify each image depending on its quality.
In [43], a new strategy combining the Modular Radial Basis Function Neural Network (M-RBF-NN) technique with relevant data-preprocessing techniques such as Singular Spectrum Analysis (SSA) and Partial Least Squares (PLS) regression is provided to increase rainfall forecasting performance. Like the examples above, IDS is a viable direction for applying neural network techniques. In a later section of this thesis, a backpropagation method based on neural networks and implemented in an IDS will be explored.
RF (Random Forest) is a supervised machine learning algorithm [44] that mainly operates on classification problems. The algorithm combines multiple decision trees, and the more trees there are, the more accurate the results. These decision trees are fed with data and trained to produce outputs (predictions); the Random Forest algorithm then chooses the best prediction (solution) by voting.
___________
41. Feng Guo, et al., https://www.researchgate.net/publication/261318207
42. Paulraj M. P., et al., https://citeseerx.ist.psu.edu/viewdoc/download
43. Jiansheng Wu Yu, et al., http://yadda.icm.edu.pl/yadda/element/bwmeta1.
44. Nabila Farnaaz, et al., https://www.sciencedirect.com/science/article/pii/S1877050916311127
Figure 1.7 shows an example of a decision tree. A Random Forest algorithm starts by selecting random data samples from a dataset. Then, for each chosen sample, a decision tree is built, and prediction results are gathered from each one. When the results are gathered, a voting process is performed in order to select the best prediction as the final solution. A simple illustration of this process is shown in Figure 1.8.
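The final voting step can be sketched as follows; the per-tree predictions are hard-coded here in place of real trained trees:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-tree predictions by majority vote, as in a random forest."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical labels produced by five decision trees for one traffic sample.
tree_predictions = ["attack", "normal", "attack", "attack", "normal"]
print(majority_vote(tree_predictions))  # attack
```

With more trees, individual errors are increasingly outvoted, which is why accuracy tends to improve with forest size as described above.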
A decision tree [45] consists of three components: nodes, where each node represents an attribute or feature; links, which represent the rules (decisions); and leaves, which represent the outcomes.
Distance-based methods
According to [46], distance-based methods fall into two primary categories: clustering-based anomaly detection and nearest neighbour-based anomaly detection approaches. These methods are based on a similarity function between data instances.
_______________
45. Y. Chang, et al., https://ieeexplore.ieee.org/abstract/document/8005870
46. M. A. F. Pimentel, et al., https://www.sciencedirect.com/science/article/pii/S016516841300515X
Clustering-based methods
As the name implies, this technique arranges similar objects into groups. For instance, the well-known k-means algorithm can be used for anomaly detection; it works by assuming that anomalous data points lie far from their clusters, or are not assigned to any cluster at all. Clustering-based anomaly detection methods fall into three main categories:
• The first category relies on the assumption that normal data points (instances) belong to a cluster, whereas anomalies do not (noise).
• The techniques in the second category assume that normal instances are close to the centroid of their cluster, while anomalies lie far away from it. For this approach to work, two things are required: an algorithm to cluster the data, and a way to compute an anomaly score for each data instance based on the distance between that instance and its cluster centroid.
_______________
47. M. Amer, https://www.researchgate.net/publication/230856452
• The third category addresses the issue with the previous two methods after the anomaly clusters are formed: normal data points are grouped into large and dense clusters, whereas anomalies belong to scattered or small clusters.
Clustering-based anomaly detection methods also suffer from scalability issues. However, the test phase is much faster than in nearest neighbour-based methods, as the algorithm only needs to compare a small number of clusters with each other. Efficient variants exist, such as heuristic techniques (e.g., k-means), approximate clustering and advanced indexing techniques for data partitioning.
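The second category above can be sketched with plain NumPy, assuming k-means centroids are fit on normal traffic only and anomalies are then scored by their distance to the nearest centroid (the two-dimensional toy data is made up for the example):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns the final cluster centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)                  # assign each point to a cluster
        for j in range(k):
            if np.any(labels == j):                # recompute non-empty centroids
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def anomaly_score(points, centroids):
    """Distance from each point to its nearest cluster centroid."""
    d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
    return d.min(axis=1)

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 0.5, size=(50, 2))   # dense "normal" traffic
outlier = np.array([[6.0, 6.0]])              # a far-away anomaly

centroids = kmeans(normal, k=2)
print(anomaly_score(outlier, centroids) > anomaly_score(normal, centroids).max())
```

The outlier's score is far larger than that of any normal point, so a simple threshold on this distance separates the two, exactly the second-category criterion described above.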
Genetic approach
A genetic algorithm typically requires two things to be defined: a genetic representation of the solution domain and a fitness function to evaluate it. A typical representation of a solution is an array of symbols, usually a binary string. A GA's evolution starts with a group of randomly produced individuals, or candidate solutions, known as chromosomes, and proceeds in generations.
In every generation the fitness of each individual in the population is evaluated, and individuals are selected from the current population with a probability proportionate to their fitness. The fitness function is defined over the genetic representation and measures the quality of the represented solution. The selected individuals (chromosomes) are then altered using genetic operators such as crossover and mutation to create a completely new population.
The algorithm terminates either after a maximum number of generations or when a target fitness value is reached. The fundamental genetic algorithm is as follows: create an initial population of individuals and evaluate their fitness; while the termination condition is not met, select individuals with high fitness for reproduction, recombine them, and evaluate the fitness of the modified individuals to create a new population.
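That loop can be sketched on a toy problem, evolving an all-ones bitstring as a stand-in for a real IDS fitness function:

```python
import random

random.seed(0)

TARGET = [1] * 12                      # toy goal: evolve an all-ones bitstring

def fitness(chrom):
    """Number of bits matching the target (the quantity being maximised)."""
    return sum(c == t for c, t in zip(chrom, TARGET))

def ga(pop_size=20, generations=60, p_mut=0.05):
    # initial population of random chromosomes
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(c) + 1 for c in pop]     # fitness-proportionate selection
        new_pop = []
        for _ in range(pop_size):
            a, b = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, len(TARGET))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]  # mutation
            new_pop.append(child)
        pop = new_pop
        if max(fitness(c) for c in pop) == len(TARGET):  # target fitness reached
            break
    return max(pop, key=fitness)

best = ga()
print(fitness(best))
```

Selection, crossover, mutation, and the two termination conditions from the text all appear in the loop; in an IDS setting the fitness function would instead score candidate detection rules or feature subsets.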
Biological approach: the details of the genetic algorithm are as follows. It is a heuristic search technique inspired by natural selection and genetics, providing cost-effective methods for optimization that are especially useful when the search space is large or analysis is challenging. The key concepts of the algorithm [24] are as follows:
1. Individual - any potential solution
2. Genes - attributes of an individual
3. Population - the collection of all individuals
4. Search space - all potential solutions to the problem
5. Chromosome - the set of genes for an individual
_____________
24. A. Chadd, et al., https://www.comparethecloud.net/articles/2017-cyberattacks/
ACO (Ant Colony Optimization) is made up of three key steps: constructing ant solutions, applying a local search, and updating pheromones.
_______________
27. https://www.dnsstuff.com/intrusion-detection-system#types-of-intrusion-detection-system.
A bee colony contains tens of thousands of worker bees, which perform all the maintenance and administration tasks in the colony, while scout bees and forager bees, operating separately, are in charge of the colony's foraging. Scout bees and forager bees are thus the two roles of the worker bees. All three types of bees take part in this nature-inspired method for discovering the best path or selecting data from a group of elements, through what is known as Path Restructuring (PFB) and Path Selection (PS).
1.10 Dataset
UNSW-NB15 was created using a commercial penetration tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS). This tool can generate a hybrid of synthetic modern normal activities and contemporary attack behaviours as network traffic. The authors collected tcpdump traces for a total duration of 31 hours. From these network traces, they extracted 49 features categorized into five groups: flow features, basic features, content features, time features, and additional generated features. Feature and statistical analyses are the most common methods used in the published papers employing UNSW-NB15 [46-48]. While [46] obtained 97% accuracy using 23 features, [47] incorporated the XGBoost algorithm for feature reduction, using several traditional machine learning algorithms for evaluation, such as Artificial Neural Network (ANN), Logistic Regression (LR), k-Nearest Neighbour (kNN), Support Vector Machine (SVM) and Decision Tree (DT).
This dataset contains over 2 million samples from two different simulation periods. UNSW-NB15 includes nine categories of attacks [38]:
1. Fuzzers: a technique where the attacker tries to uncover new and unknown vulnerabilities in a program, operating system, or network by feeding it with the widest possible range of unexpected, random input to make it crash.
2. Analysis: a variety of intrusions that aim to penetrate web applications via ports (e.g., port scans), emails (e.g., spam), and web scripts (e.g., HTML files).
3. Backdoor: an intruder attempts to gain remote access to a device through bypassing
authentication methods. As a result, he/she will be able to alter files, steal sensitive data
and/or install malicious software.
4. Denial of Service (DoS): an intrusion that causes computer resources to be so heavily
used as to prevent the authorised requests from accessing a device.
5. Exploit: a sequence of instructions that makes use of a glitch, bug or vulnerability and generates unintentional or unexpected behaviour on a host or network.
6. Generic: an attack executed against any block cipher to produce a collision, with no consideration of how that block cipher is implemented.
7. Reconnaissance: can be defined as a probe; an attack that gathers information about a
computer network to evade its security controls.
8. Shellcode: an attack in which the attacker injects a small piece of code starting from a shell to control the compromised machine.
9. Worm: a type of computer malware which, once installed by unaware users, can spread itself on the targeted system. It damages the host network by exhausting its bandwidth and/or modifying and deleting system files.
There were 47 different reliable features extracted from the raw network packets. The final features adopted in our experiments are listed in Table 1.1 alongside their descriptions. The UNSW-NB15 dataset features are generated using tools such as Argus and Bro-IDS. In addition, the authors wrote procedures to create new features based on relations among the extracted features (e.g., is_sm_ips_ports and ct_state_ttl). They included a variety of packet-based and flow-based features. These features are grouped into five sets:
(a) Flow features: the identifier attributes between hosts (i.e. client and server), for instance IP address, port number and protocol type.
(b) Basic features: protocol connection properties.
(c) Content features: the attributes of TCP/IP; additionally, they include some properties of HTTP services.
(d) Time features: time attributes, for instance inter-packet arrival time, start/end packet time, and the round-trip time of the TCP protocol.
(e) Additional generated features: this category can be further divided into two groups: general-purpose features, whereby each feature has its own purpose with regard to protecting the service of protocols, and connection features that are built from the flow of 100 recorded connections based on the sequential order of the last time feature.
Table 1.1 UNSW-NB15 Dataset Features Description [38]
Id  Feature Name        Description
1   dur                 Record total duration
2   sbytes              Source to destination bytes
3   dbytes              Destination to source bytes
4   rate                Number of packets per second
5   sttl                Source to destination time to live
6   dttl                Destination to source time to live
7   sloss               Source packets retransmitted or dropped
8   dloss               Destination packets retransmitted or dropped
9   sload               Source bits per second
10  dload               Destination bits per second
11  spkts               Source to destination packet count
12  dpkts               Destination to source packet count
13  swin                Source TCP window advertisement value
14  dwin                Destination TCP window advertisement value
15  stcpb               Source TCP base sequence number
16  dtcpb               Destination TCP base sequence number
17  smeansz             Mean of the packet size transmitted by the srcip
18  dmeansz             Mean of the packet size transmitted by the dstip
19  trans_depth         The connection of http request/response transaction
20  response_body_len   The content size of the data transferred from http
21  sjit                Source jitter (mSec)
22  djit                Destination jitter (mSec)
23  sinpkt              Source inter-packet arrival time
24  dinpkt              Destination inter-packet arrival time
25  tcprtt              Setup round-trip time, the sum of 'synack' and 'ackdat'
26  synack              The time between the SYN and the SYN_ACK packets
27  ackdat              The time between the SYN_ACK and the ACK packets
28  is_sm_ips_ports     If srcip = dstip and sport = dsport, assign 1, else 0
29  ct_state_ttl        No. of each state according to values of sttl and dttl
30  ct_flw_http_mthd    No. of methods such as Get and Post in http service
31  is_ftp_login        If the ftp session is accessed by user and password then 1, else 0
32  ct_ftp_cmd          No. of flows that have a command in ftp session
33  ct_srv_src          No. of rows of the same service and srcip in 100 rows
34  ct_srv_dst          No. of rows of the same service and dstip in 100 rows
35  ct_dst_ltm          No. of rows of the same dstip in 100 rows
36  ct_src_ltm          No. of rows of the srcip in 100 rows
37  ct_src_dport_ltm    No. of rows of the same srcip and the dsport in 100 rows
38  ct_dst_sport_ltm    No. of rows of the same dstip and the sport in 100 rows
39  ct_dst_src_ltm      No. of rows of the same srcip and the dstip in 100 records
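Two of the simpler features can be computed directly from a flow record. The dictionary layout below is an assumption made for illustration, and the exact formula the dataset authors used for `rate` is not spelled out in the table, so total packets over duration is a guess:

```python
def is_sm_ips_ports(record):
    """Feature 28: 1 if source and destination IP and port match, else 0."""
    return int(record["srcip"] == record["dstip"]
               and record["sport"] == record["dsport"])

def rate(record):
    """Feature 4 (assumed formula): total packets divided by record duration."""
    total_pkts = record["spkts"] + record["dpkts"]
    return total_pkts / record["dur"] if record["dur"] > 0 else 0.0

# Hypothetical flow record using the field names from Table 1.1.
flow = {"srcip": "10.0.0.1", "dstip": "10.0.0.2", "sport": 443, "dsport": 52100,
        "spkts": 12, "dpkts": 10, "dur": 2.0}
print(is_sm_ips_ports(flow), rate(flow))  # 0 11.0
```

A flow whose source and destination addresses and ports coincide (the condition is_sm_ips_ports encodes) is a classic land-attack indicator, which is why it was added as a generated feature.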
IDS
There are many ways to add IDS tools to a network, each with its advantages and disadvantages. The best choice is a compromise between cost and desired properties, maintaining a high level of benefit and a controlled number of disadvantages, all in accordance with the needs of the organization. For this reason, different positions of the IDS within a network provide different characteristics, and the administrator will see different possibilities in the same network. Suppose we have a network where one firewall divides the Internet from the demilitarized zone (DMZ), and another divides the DMZ from the intranet of the organization, as shown in the next figure. The DMZ is the area between the Internet and the internal network. It is used to provide public services without having to allow access to the private network of the organization; the main services, such as HTTP and DNS servers, are usually located in this subnet.
The main drawbacks of this location are that the IDS cannot detect attacks whose communications hide information using methods such as encryption, and that at this location the traffic rate is usually so high that the IDS cannot monitor all the packets.
Behind the second firewall
In this case the IDS is located between the second firewall and the internal network. Since it is not inside the internal network, it will not listen to any internal traffic. This IDS can be less powerful than those discussed before, as the volume of traffic is smaller at this point, and any atypical traffic that shows up here must be considered hostile. At this point in the network fewer false alarms will occur, so any alarm from the IDS should be studied immediately. This placement makes these systems particularly vulnerable to attacks, not only from the outside but also from inside their own infrastructure. It is vital to keep this in mind when implementing an intrusion detector in this location, in order to detect attacks produced from within the network itself, such as those launched by internal staff.
1.13 Hypothesis
The cloud intrusion detection system in this work was developed under the following set
of hypotheses:
1. Each record in the input dataset carries session attribute values together with a
class label: either normal or a type of attack.
2. Sessions were captured in a cloud environment for each type of attack.
3. The proposed model can be deployed at any point in the network, either before or
after the firewall.
4. The selected feature set is also available in the testing dataset, and its attribute
values appear in the same sequence as in the training dataset.
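Hypothesis 4 can be checked mechanically before testing begins. The following is a
minimal sketch in Python (the thesis implementation is in MATLAB); the helper name and
the sample attribute headers are illustrative, not taken from the thesis code:

```python
# Hedged sketch: hypothesis 4 requires that every selected training feature
# also appears in the test split, in the same relative order. The function
# name and sample headers below are hypothetical.

def check_feature_alignment(train_features, test_features):
    """Return (ok, missing): ok is True only if every selected training
    feature appears in the test header in the same relative order."""
    missing = [f for f in train_features if f not in test_features]
    if missing:
        return False, missing
    # positions of the selected features inside the test header
    positions = [test_features.index(f) for f in train_features]
    return positions == sorted(positions), missing

# Example headers (illustrative UNSW-NB15-style attribute names)
train_sel = ["dur", "sbytes", "dbytes", "ct_srv_src"]
test_hdr = ["dur", "proto", "sbytes", "dbytes", "rate", "ct_srv_src"]

ok, missing = check_feature_alignment(train_sel, test_hdr)
```

A dataset that fails this check (a missing attribute, or attributes in a different
order) would violate the hypothesis and invalidate the evaluation.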
Based on the difficulties described above, an IDS architecture consisting of nodes
running ANNs on the cloud platform is presented. It is expected to provide greater
flexibility, scalability, and performance by design. A Genetic Algorithm (GA) operates
on a population of potential solutions, applying the principle of survival of the
fittest to produce successively better approximations to the solution of the problem.
At each generation, a new set of approximations is created by selecting individuals
according to their fitness in the problem domain and breeding them together using
operators borrowed from the genetic process in nature, namely crossover and mutation.
GA is chosen for several useful properties: it is robust to noise, requires no gradient
information to find a globally optimal or sub-optimal solution, and has self-learning
capabilities. Using GAs for network intrusion detection has proven to be a
cost-effective approach. Moreover, the Self-Organizing Feature Map (SOFM) is chosen
among the soft computing algorithms because it is a proven technique for automated
clustering, visual organization, and anomaly detection in IDSs. Our main aim is to
provide a network intrusion detection system based on soft computing algorithms, namely
the Self-Organizing Feature Map artificial neural network and the Genetic Algorithm.
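The GA cycle described above (fitness-based selection, crossover, mutation over
generations) can be sketched in a few lines. This is an illustrative Python toy, not
the thesis implementation: the fitness function simply counts ones in a bit-mask,
loosely mimicking a feature-selection chromosome, and all parameter values are
assumptions:

```python
import random

# Minimal GA sketch: tournament selection, single-point crossover, bit-flip
# mutation. Fitness here is a stand-in objective (count of set bits); the
# population size, generations, and mutation rate are illustrative only.

random.seed(7)

GENES, POP, GENERATIONS, MUT_RATE = 16, 20, 40, 0.02

def fitness(mask):
    return sum(mask)  # toy objective: number of "selected features"

def select(pop):
    # tournament selection: the fitter of two random individuals survives
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    cut = random.randint(1, GENES - 1)  # single-point crossover
    return p1[:cut] + p2[cut:]

def mutate(mask):
    return [g ^ 1 if random.random() < MUT_RATE else g for g in mask]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]

best = max(population, key=fitness)
```

Selection pressure drives the average fitness upwards generation by generation, which
is the "better and better approximations" behaviour the text refers to.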
1.15 Motivation
This research project is motivated by the prevailing and evolving technology of cloud
computing and its related challenges, particularly security. Such computing paradigms
have attracted the attention of researchers because of their benefits on the one hand
and, on the other, the need to address the challenges that come with a growing number
and variety of customers. The defence systems used to secure the cloud environment
still need improvement, as attacks continue to occur despite the multiple techniques
and systems employed for cloud security. Since these countermeasures work by preventing
intrusions, or by detecting them when prevention has failed, providing information
about potential attacks improves the ability to prevent them or complicates the attacks
launched by adversaries. The motivation to address intrusion prediction in cloud
computing and similar paradigms was inspired by the fact that prediction systems could
significantly enhance security in such environments, and hence their reliability and
adoption.
The proposed model was developed with the aim of detecting intrusions in the cloud
environment. The work optimizes the feature set of the dataset. The proposed model
traces patterns from the cloud environment that help classify traffic as normal or
malicious. Learning is done by first passing the training dataset through a convolution
filter; the filtered traffic is then used to train the Error Back-Propagation Neural
Network, which produces a two-class output (Normal, Malicious). The work was further
divided into three modules: pre-processing of the input training dataset in the first
module, feature selection by the IWD algorithm in the second, and training of the
neural network in the third.
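The error back-propagation training used in the third module can be illustrated with a
small network. This is a hedged sketch in Python rather than the MATLAB implementation
used in the thesis: the toy two-feature dataset, network size, and learning rate are
all assumptions, chosen only to show the two-class (Normal/Malicious) output:

```python
import math
import random

# Hedged sketch of an error back-propagation network with one hidden layer
# and a single sigmoid output giving a two-class decision (0 = Normal,
# 1 = Malicious). All data and hyperparameters here are illustrative.

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# toy data: "Malicious" (label 1) when both features are high
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]

H, LR = 3, 0.5
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]  # input->hidden
b_h = [0.0] * H
w_o = [random.uniform(-1, 1) for _ in range(H)]                      # hidden->output
b_o = 0.0

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x)) + b_h[j]) for j in range(H)]
    return h, sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)

for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        d_o = (y - t) * y * (1 - y)          # output error, propagated backwards
        for j in range(H):
            d_h = d_o * w_o[j] * h[j] * (1 - h[j])   # hidden-layer error
            w_o[j] -= LR * d_o * h[j]
            for i in range(2):
                w_h[j][i] -= LR * d_h * x[i]
            b_h[j] -= LR * d_h
        b_o -= LR * d_o

def predict(x):
    return int(forward(x)[1] > 0.5)
```

After training, the network separates the two toy classes; in the thesis the same
mechanism is trained on the selected UNSW-NB15 session features instead.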
The experimental work for the above model was carried out on a machine with 4 GB of RAM
and a 6th-generation Intel i3 processor. The proposed and comparison models were
implemented in MATLAB 2016a. The cloud-based malicious session dataset UNSW-NB15 was
used to obtain the evaluation parameter values.
Some limitations are associated with the proposed system; this section discusses them
as follows:
1. Specialized dataset for cloud computing: unfortunately, there is a lack of
experimental datasets specialized for testing cloud security solutions. Although this
work uses a well-known intrusion detection dataset generated with cloud setup
parameters, more precise results would be produced with data collected from real-life
cloud traffic.
2. Sensitivity issue: the port scan detection algorithm (the core of our prediction
system) suffers from a high ratio of false positive alarms, in spite of its high
detection ratio. This results in a considerable number of alarms that may confuse the
security team. More refinement is needed to decrease the false positive ratio, which in
turn would improve the sensitivity of our prediction system.
3. Location centric: our proposed system monitors the traffic incoming to the cloud,
which means it is located on the cloud network interface. This limits its effectiveness
to attacks that target the cloud from the outside; threats launched from the inside,
such as insider attacks, remain challenging for the proposed solution.
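The sensitivity trade-off in limitation 2 can be made concrete with a threshold-based
port-scan detector of the general kind described: a source that touches many distinct
destination ports in one time window is flagged. The sketch below is illustrative only
(the thesis does not specify its detector's internals); the threshold, function name,
and event data are assumptions:

```python
from collections import defaultdict

# Hedged sketch of a simple threshold-based port-scan detector. A low
# THRESHOLD raises the detection ratio but also the false-positive ratio,
# which is exactly the sensitivity trade-off noted in limitation 2.

THRESHOLD = 3  # distinct destination ports per source in one window

def flag_scanners(events):
    """events: iterable of (src_ip, dst_port) pairs seen in one time window."""
    ports_by_src = defaultdict(set)
    for src, port in events:
        ports_by_src[src].add(port)
    return {src for src, ports in ports_by_src.items() if len(ports) > THRESHOLD}

window = [("10.0.0.5", p) for p in (22, 23, 80, 443, 8080)]          # scanner-like
window += [("10.0.0.9", 443), ("10.0.0.9", 443), ("10.0.0.9", 80)]   # benign client
alerts = flag_scanners(window)
```

Lowering THRESHOLD to 1 would also flag the benign client, illustrating how aggressive
settings inflate the false-alarm volume that confuses the security team.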
This work focuses on the network intrusion detection techniques proposed by researchers
in this field. Intruders mount different types of attacks on systems, networks, clouds,
and so on. Related research is summarized in Chapter 2 of this document, including a
survey of feature reduction work.
Chapter 2: Literature Survey. This chapter presents existing work on intrusion
detection using machine learning algorithms. The different models are detailed with a
short description of how they work. Approaches and results are presented in turn, then
compared in a common section to determine the best techniques. Finally, the identified
problems are listed along with ideas on how to improve these different points. The
insights gathered in this chapter inform the design of the IDSs in the following
chapters.
Chapter 4: Genetic Algorithm and Neural Network Based Detection. This chapter describes
two methods to improve two aspects of intrusion detection. First, the detection model
is improved by using a genetic algorithm and Intelligent Water Drops to reduce the
dimensionality of the input dataset. Second, a neural network is developed because the
environments where IDSs are deployed rarely provide labeled datasets containing
attacks; a hybrid IDS could then self-populate its own signature database. The chapter
also details the whole working steps with a block diagram.
Chapter 5: Experiment and Results. The proposed models are compared in this chapter
over different dataset sizes. The chapter shows how the proposed model improves the
various evaluation parameters of the work.