
Digital Forensics Using Data Mining: Abstract

This document discusses using data mining techniques to help address challenges in digital forensics investigations involving large amounts of data. It begins with an introduction to digital forensics and data mining processes. Several proposed methods that combine these areas are then reviewed, including frameworks that apply data mining algorithms like clustering, classification, and association rule learning to evidence extraction and analysis. The document evaluates these approaches and notes that data mining can help analyze large forensic datasets more efficiently and generate patterns to aid investigations.

Uploaded by

Tahir Hussain
Copyright
© All Rights Reserved

Digital forensics using Data Mining

Abstract:
In this paper, we review the challenges faced by forensics analysts when examining and analysing
large amounts of data gathered as digital evidence. Digital forensics is the process of identifying,
preserving, extracting and documenting digital evidence. Forensic investigations become
very complex when dealing with large volumes of data. As a result, more time is spent
producing even minimal outputs and results. Using data mining techniques alongside
the digital forensics process helps to overcome this issue. We discuss different proposed
methods that combine digital forensics with data mining techniques. One base paper is selected and
analysed comprehensively in this paper.

Keywords: Digital Forensics, Digital Evidence, KMT, Apriori-gen, Data Mining.

1 Introduction:
Digital forensics, also known as computer forensics, is the process of identifying, collecting,
analysing and examining digital evidence while preserving the information and
maintaining the integrity of the evidence. The main goal of digital forensics is to
preserve electronic evidence in its original form throughout the investigation process. An
investigation in digital forensics has four phases: collection, examination, analysis
and presentation. In the collection phase, digital evidence is collected from the suspect's device.
This may include seizing devices present at the crime scene, as well as other devices that may
contain potential digital evidence, with maximum care taken to avoid contaminating the
evidence. The second phase is the examination phase, in which data is extracted from the evidence
containers and then examined. It also includes the extraction of relevant pieces of information
and the recovery of any deleted files and folders. In the analysis phase, the extracted data is
analysed in detail to draw conclusions. The presentation phase is the final phase,
in which the forensics analyst prepares and presents detailed reports of the outcomes
of the analysis phase. Figure 1.1 shows the digital forensics investigation process.

Today, every organization holds a huge amount of data, including employee details,
contracts, business information and other sensitive information. All these details are stored in
some kind of database, either relational or non-relational (referred to as NoSQL). In digital
forensics, the hard part is getting the data out of these large databases for analysis. Forensics
analysts, law enforcement and intelligence agencies face an extremely difficult challenge when
analysing large volumes of data collected from organizational databases involved in crimes and
terrorism. A suitable technique for tackling this issue is data mining. Data mining is a scientific
method of extracting meaningful information from existing databases. By using data
mining techniques, we can identify different patterns in related data and construct
relationships among data items to support analysis. This comes in handy in the analysis phase of
digital forensics, as it saves time and makes work easier for the forensics analyst. We can also
generate crime-related patterns through data mining to anticipate future crimes. Just like digital
forensics, data mining has four general phases: data collection, data storage and
management, data access and operations, and data presentation. Figure 1.2 shows the
process of data mining.

In general, digital forensics software only assists the forensics analyst during investigations.
Solving a crime requires a lot more than assistance, and this is where data mining techniques
become useful for digital forensics. By using data mining, not only can a large amount of data be
analysed to produce crime-related patterns and correlations, but investigation time and complexity
are also reduced, which is a basic need in digital forensics.

2 Related Work
In this section, various solutions and frameworks proposed for digital forensics through data
mining are discussed. Researchers have introduced many frameworks for extracting meaningful
data and crime-related patterns from databases and various other sources of data. Some of them
are the following:

2.1 Improving Digital Forensics Through Data Mining[1]

In 2014, Chrysoula Tsochataridou, Avi Arampatzis, and Vasilios Katos proposed a
mechanism for digital forensics based on data mining clustering techniques. Weka, a
collection of data mining algorithms, was used for performing operations on the data. The
scheme was to first store all the collected data in a MySQL database and then perform different
operations on the data using Weka. The operations include data pre-processing, classification,
regression, clustering and visualization.

2.2 Data mining Techniques for Digital Forensic Analysis[2]

In 2016, Ashwinkumar Malwadkar and Prof. Sonali Patil proposed a system for digital
forensics comprising computer forensics tools and data mining techniques. A data mining
algorithm based on the Apriori algorithm was proposed for working with data gathered by
computer forensics tools, in order to find crime patterns, correlations and associations between
data items.

2.3 Digital Forensics and Cyber Crime Data Mining[3]

In 2012, K. K. Sindhu and B. B. Meshram proposed a tool for digital forensics to find the
motive, cyber crime patterns and the frequency with which a specific attack happened over a
certain period of time. The proposed system is a combination of digital forensics investigation and
crime data mining techniques. Data gathered from different areas, e.g. network traffic, file
system analysis and log file analysis, was analysed by a data mining algorithm.
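
As a rough illustration of the frequency analysis described above, the following sketch counts how often each attack type appears in a log within a chosen time window. It is not the tool from [3]; the log records, the attack names and the function name `attack_frequency` are hypothetical and chosen only for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log records: (timestamp, attack type). Illustrative data only.
LOG = [
    ("2012-03-01 10:15", "port_scan"),
    ("2012-03-01 11:40", "sql_injection"),
    ("2012-03-02 09:05", "port_scan"),
    ("2012-03-05 16:30", "port_scan"),
    ("2012-03-09 08:00", "brute_force"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M"


def attack_frequency(log, start, end):
    """Count each attack type whose timestamp satisfies start <= t < end."""
    window_start = datetime.strptime(start, TIME_FORMAT)
    window_end = datetime.strptime(end, TIME_FORMAT)
    counts = Counter()
    for stamp, attack in log:
        if window_start <= datetime.strptime(stamp, TIME_FORMAT) < window_end:
            counts[attack] += 1
    return counts


# Frequency of each attack type during the first five days of the log.
frequency = attack_frequency(LOG, "2012-03-01 00:00", "2012-03-06 00:00")
print(frequency.most_common())
```

A `Counter` returns zero for attack types that never occur in the window, so the result can be queried for any attack name without extra checks.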

2.4 Applying Data Mining Principles in the Extraction of Digital Evidence[4]

In 2018, Raburu George, Omollo Richard, and Okumu Daniel proposed different data mining
techniques for digital forensics to extract data from extremely large databases. The main
focus was merging data mining techniques with digital forensics to extract digital evidence
and reduce complexity.

2.5 Data Mining based Crime-Dependent Triage in Digital Forensics Analysis[5]

In 2012, Rosamaria Bertè, Fabio Marturana, Gianluigi Me, and Simone Tacconi proposed an
approach for digital forensics investigations based on data mining techniques and knowledge
management theory (KMT). The main aim of the proposed system was to speed up
investigations by prioritizing seized devices for analysis.

2.6 Framework for Live Digital Forensics using Data Mining[6]

In 2015, Prof. Sonal Honale and Jayshree Borkar presented a framework for live digital
forensics using data mining. The data mining algorithms used are K-Means and the Apriori
algorithm, which determine the cyber attacks that occur and count the number of times any
specific attack occurs while the system is running. Various system tools are also used, namely
WinPcap, Jpcap and WMIC.
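
To make the clustering step more concrete, here is a minimal K-Means sketch in plain Python. It is not the implementation from [6]: the two-dimensional sample points are hypothetical stand-ins for numeric features of captured traffic, and the centroids are simply initialised to the first k points so that the run is deterministic.

```python
import math

# Hypothetical samples: two numeric features per observation (illustrative only).
SAMPLES = [
    (1.0, 2.0), (8.0, 8.0), (1.5, 1.8),
    (9.0, 9.5), (1.2, 2.2), (8.5, 9.0),
]


def kmeans(points, k, iterations=20):
    """Plain K-Means on 2-D points; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(SAMPLES, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic; the fixed initialisation here only keeps the sketch reproducible.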

2.7 A digital forensic model based on data mining[7]

In 2015, Peng Cheng and Hui Qu proposed a model for digital forensics combined with
data mining. Data mining techniques were applied during the data analysis process. A network
traffic forensics model was proposed based on hardtop. The model monitors network
traffic and performs real-time collection of traffic data for sampling and analysis.

2.8 Data Mining Methods Applied to a Digital Forensics Task for Supervised Machine
Learning[8]

In 2014, Antonio J. Tallón-Ballesteros and José C. Riquelme proposed a system based on
certain data mining techniques for digital evidence gathering. The system performs
experiments on digital evidence using decision trees, Bayes classifiers, artificial neural
networks and nearest neighbours, and draws its results from them.
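
Of the classifiers listed above, nearest neighbours is the simplest to sketch. The example below is a minimal k-nearest-neighbours classifier in plain Python. It is not the experimental setup from [8]; the feature vectors, the labels "benign" and "suspicious" and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training set: (feature vector, label). Illustrative data only.
TRAIN = [
    ((0.1, 0.2), "benign"),
    ((0.2, 0.1), "benign"),
    ((0.3, 0.3), "benign"),
    ((0.9, 0.8), "suspicious"),
    ((0.8, 0.9), "suspicious"),
    ((0.7, 0.7), "suspicious"),
]


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort the training samples by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (0.15, 0.15)))
print(knn_predict(TRAIN, (0.85, 0.85)))
```

An odd value of k avoids tied votes when there are only two classes, which is why k=3 is used as the default here.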

2.9 Imminent accession of Artificial Intelligence based Forensic Exploratory with Data
Mining Analysis[9]

In 2017, S. Umar, A. Praveen, S. Gouse, and N. Deepthi discussed the role of data mining
techniques in digital forensics. They also discussed how, and at which point in the procedure,
data mining techniques should be applied in digital forensics investigations. Different methods
such as association, grouping, classification and unpredictable are discussed.

All the methods discussed above have their own advantages and disadvantages. A brief
summary of the literature is given in the table below:

Table 2.1: Brief summary of discussed methodologies

Each entry lists the proposed method, followed by its Mechanism/Approach, Novel Contribution,
Tools/Parameters and Limitations/Drawbacks.

[1] Mechanism/Approach: Model based approach
    Novel Contribution: A model combined with data mining techniques and system tools for
    gathering data.
    Tools/Parameters: WEKA, a collection of data mining algorithms, is used. Electronic mail
    messages are used as unstructured data.
    Limitations/Drawbacks: It works well for small data sets, but memory can overflow if the
    data volume is very large.

[2] Mechanism/Approach: System based approach
    Novel Contribution: A system combined with computer forensics tools and a data mining
    algorithm.
    Tools/Parameters: The Apriori algorithm is used for data mining. Operations are performed
    on data gathered by computer forensics tools.
    Limitations/Drawbacks: The Apriori algorithm can become slow while processing large item
    sets to generate candidates and relations among them.

[3] Mechanism/Approach: Tool based approach
    Novel Contribution: A tool combined with digital forensics exploration and cyber crime data
    mining techniques.
    Tools/Parameters: Digital forensics tools gather data from network traffic, file system
    analysis and log files, to which cyber crime mining techniques are applied to determine
    results and the frequency of particular attacks.
    Limitations/Drawbacks: A non-standard algorithm is used. Limited to network-related data
    only.

[4] Mechanism/Approach: Techniques based approach
    Novel Contribution: Suggests a way to implement data mining techniques at the right place
    and the right time to reduce investigation time and complexity.
    Tools/Parameters: No tools are used. It is just a literature review of certain data mining
    techniques.
    Limitations/Drawbacks: A non-standard model based diagram which shows the point where
    data mining should be applied.

[5] Mechanism/Approach: Priority model based approach
    Novel Contribution: Knowledge management theory is used to give priority to evidence
    containers to speed up investigations.
    Tools/Parameters: Knowledge management theory with data mining techniques. The
    parameters were seized devices that contain electronic evidence.
    Limitations/Drawbacks: Appearances can be misleading, so there is a chance that a
    seemingly worthless device contains more promising and straightforward evidence. Because
    the approach is priority based, it may become useless in such a case.

[6] Mechanism/Approach: Framework based approach
    Novel Contribution: Framework for live digital forensics with K-Means and the Apriori
    algorithm during system running time.
    Tools/Parameters: The tools used are WinPcap, Jpcap, WMIC, K-Means and the Apriori
    algorithm. The parameters used are network traffic packets and log files.
    Limitations/Drawbacks: The Apriori algorithm can be slow for large item sets. Another
    problem is that it works well for finding frequently occurring events, but at the same time
    it makes rarely occurring events very hard to find.

[7] Mechanism/Approach: Monitoring model based approach
    Novel Contribution: Network traffic forensics model based on data mining techniques to
    monitor network traffic in real time.
    Tools/Parameters: A network monitoring tool gathers data from the network layer, and data
    mining techniques analyse the data to determine results.
    Limitations/Drawbacks: It is a layer based approach and the layers depend on each other, so
    if a failure occurs at one layer the process will not continue to work.

[8] Mechanism/Approach: Experimental model based approach
    Novel Contribution: Digital forensics system with data mining techniques to gather
    electronic evidence.
    Tools/Parameters: Data mining techniques are used to perform experiments based on decision
    trees, Bayes classifiers and nearest neighbours on raw data.
    Limitations/Drawbacks: The data given as parameters for the experiments requires fine
    tuning to get better results. The results produced are accurate up to 73% only, which is
    much less than the required percentage.

[9] Mechanism/Approach: No model
    Novel Contribution: Procedure in which data mining techniques should be applied during
    digital forensics for faster results.
    Tools/Parameters: Different methods of data mining, e.g. grouping, classification,
    association and identification.
    Limitations/Drawbacks: Not sufficient details. No standard model or approach is proposed
    to work with.

3 Base Paper: Data mining Techniques for Digital Forensic Analysis[2]


This research is based on digital forensics and data mining techniques. Digital forensics, or
computer forensics, is the process that involves finding, collecting, analysing and presenting
digital evidence. Mostly, digital evidence (also called electronic evidence) is
magnetically stored information. At the organization level, different devices can hold digital
evidence. These devices can be computer systems, laptops, mobiles, PDAs, networking media, and
other consumer electronic devices. Digital forensics is a scientific method consisting of four
phases: identification, preservation, extraction and documentation. In the identification
phase, the potential evidence containers are identified and seized. In the preservation phase, the
digital evidence is preserved. The main focus during this phase is to preserve the
evidence in its most original form and protect it from contamination. In the extraction phase, the
evidence is extracted from the containers. This also involves the recovery of any hidden or deleted
files and folders. The forensics analyst then performs analysis on the evidence and reports all
the findings and results in the documentation phase.

Data is an important factor for companies competing in the marketplace. Because of this,
companies hold huge amounts of data, which is stored in databases hosted on servers. As the
importance of data increases, its security becomes a critical issue. Due to distributed
architecture and large volumes of data, it becomes very difficult for organizations to determine
which events caused a crime such as exposure of sensitive data or a data breach. Because of the
large amount of data, forensics analysts and law enforcement agencies have difficulty handling
critical crime-related cases. The methods used to analyse large volumes of data are outmoded.
These methods do not provide all the details because they rely on exhaustive human interaction.

Data mining is the solution to this problem. Data mining is a scientific method of constructing
useful information from raw data stored in databases. Data mining techniques such as clustering
and classification can help us to identify and track crime patterns and thus speed up the
investigation process. Digital forensics is pivotal in prosecuting a cyber criminal. In this
research, six crucial steps for digital forensics are defined, which are shown in the diagram
below:

The different types of digital forensics discussed by the authors are:

3.1 Computer Forensics[2]

3.2 File System Analysis[2]

3.3 Boot Sector Analysis[2]


3.4 Network Forensics[2]

3.5 Email Forensics[2]

Data mining is a scientific field concerned with constructing interesting structures from data
stored in advance in a certain format. These structures are basically patterns, or statistical or
graphical representations of the data. Data mining is a young field in the context of criminal and
intelligence analysis. The cyber crime data mining techniques that are helpful for solving crime
are clustering, association, deviation detection, classification, and string operator techniques.
Below is a diagram that shows data mining techniques:

The data mining algorithm used for finding associations is the Apriori algorithm. It works
according to the following steps:

1: In the first step, we identify the item sets from the evidence case report.

2: Then we make a set of items/variables, Item Set Is = {I1, I2, I3, I4, I5, ..., In}

3: In the third step, we determine the action set used to perform actions on the item set,
Action Set As = {A1, A2, A3, A4, ..., AN}

4: In the fourth step, we find the relevant item sets by using the Apriori algorithm

5: In the fifth step, we associate item sets by making association rules

6: In the sixth step, we write SQL queries in accordance with the association rules

7: In the seventh step, we run the SQL queries to retrieve data
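
Steps 1, 2, 4 and 5 above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation from [2]: the case records and item names are hypothetical, the action set of step 3 is not modelled, and candidate generation uses a simplified join (the union of pairs of frequent item sets) followed by the usual subset-based pruning. Rules are filtered by confidence, i.e. support(X and Y) divided by support(X).

```python
from itertools import combinations

# Hypothetical item sets identified from case reports (steps 1 and 2).
CASES = [
    {"usb_device", "deleted_files", "email_attachment"},
    {"usb_device", "deleted_files"},
    {"usb_device", "email_attachment"},
    {"deleted_files", "email_attachment"},
    {"usb_device", "deleted_files", "email_attachment"},
]


def apriori(transactions, min_support):
    """Return {item set (frozenset): support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count: number of transactions containing the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    level = {c: n for c, n in count(singles).items() if n >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Join: union pairs of frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Yield (X, Y, confidence) for every rule X -> Y meeting min_confidence."""
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    yield lhs, itemset - lhs, confidence


frequent_sets = apriori(CASES, min_support=3)
for lhs, rhs, conf in association_rules(frequent_sets, min_confidence=0.75):
    print(sorted(lhs), "->", sorted(rhs), conf)
```

The lookup `frequent[lhs]` is safe because every subset of a frequent item set is itself frequent, so it was recorded at an earlier level.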

The item sets extracted from the case report are stored as attributes of different tables.
Association rules are based on the fact that a second event will occur as a result of a first
event. We can denote this as X ------> Y, where the arrow shows the association between events X
and Y. This means that Y will occur if X occurs. Below is a flowchart diagram that shows the
working of the proposed system:
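
Steps 6 and 7 of the procedure, in which an association rule X -> Y is turned into an SQL query and executed, might look like the following sketch. It is only an assumption about how such queries could be built: the source does not give the schema or the queries, so the `evidence` table, its columns and the function names are hypothetical, SQLite (via Python's standard `sqlite3` module) stands in for whatever database the authors used, and a single table is used for brevity although the text mentions different tables.

```python
import sqlite3

# Hypothetical attributes; each item of an item set is stored as a 0/1 column.
COLUMNS = ("usb_device", "deleted_files", "email_attachment")


def build_db():
    """Create an in-memory database holding illustrative case records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE evidence (case_id INTEGER PRIMARY KEY, "
        "usb_device INTEGER, deleted_files INTEGER, email_attachment INTEGER)"
    )
    rows = [(1, 1, 1, 1), (2, 1, 1, 0), (3, 1, 0, 1), (4, 0, 1, 1), (5, 1, 1, 1)]
    conn.executemany("INSERT INTO evidence VALUES (?, ?, ?, ?)", rows)
    return conn


def cases_matching_rule(conn, lhs, rhs):
    """Return ids of cases in which every item of the rule lhs -> rhs is present."""
    items = sorted(set(lhs) | set(rhs))
    # Column names are interpolated into the SQL text, so accept only known ones.
    if not set(items) <= set(COLUMNS):
        raise ValueError("unknown item in rule")
    where = " AND ".join(f"{column} = 1" for column in items)
    sql = f"SELECT case_id FROM evidence WHERE {where} ORDER BY case_id"
    return [row[0] for row in conn.execute(sql)]


connection = build_db()
# Cases supporting the rule {usb_device} -> {deleted_files}.
print(cases_matching_rule(connection, {"usb_device"}, {"deleted_files"}))
```

Checking the item names against a fixed list before building the query keeps untrusted text out of the SQL statement.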

The proposed system is a combination of digital forensics tools and data mining techniques. Its
flow of work is shown in the complete system diagram below:
4 Conclusion:
Digital forensics is the process of identifying, preserving, extracting and presenting digital
evidence. Digital evidence is data gathered from the devices of suspects or criminals.
These devices range from small to large and include mobiles, PDAs, laptops, iPods,
computers and other electronic devices. There can be a very large amount of data, and
it becomes very difficult for investigators to analyse it. Because of this, forensic
investigations can be very time consuming and complex. To overcome this issue, data mining
techniques can be used. Data mining is the process of developing data structures that
reveal crime-related patterns, and it allows large amounts of data to be analysed with minimum
time and complexity. This research addressed the need for data mining techniques in the digital
forensics process. Different proposed methods that combine digital forensics with data mining
techniques were discussed. Using data mining techniques along with digital forensics
techniques can help forensics analysts and intelligence agencies to reduce the work complexity
and reach results in less time.

5 References:
[1] Chrysoula Tsochataridou, Avi Arampatzis, Vasilios Katos, "Improving Digital Forensics
Through Data Mining," Department of Electrical and Computer Engineering, Democritus
University of Thrace, Xanthi, Greece.

[2] Ashwinkumar Malwadkar, Prof. Sonali Patil, "Data mining Techniques for Digital Forensic
Analysis," Department of Information Technology, K. J. Somaiya College of Engineering,
Mumbai, Maharashtra.

[3] K. K. Sindhu, B. B. Meshram, "Digital Forensics and Cyber Crime Data mining," Computer
Engineering Department, Shah and Anchor Kutchhi Engineering College, Mumbai, India;
Computer Engineering Department, Veermata Jijabai Technological Institute, Mumbai, India.

[4] Raburu George, Omollo Richard, Okumu Daniel, "Applying Data Mining Principles in the
Extraction of Digital Evidence," Department of Computer Science and Software Engineering,
Jaramogi Oginga Odinga University of Science and Technology, Kenya.

[5] Rosamaria Bertè, Fabio Marturana, Gianluigi Me, Simone Tacconi, "Data Mining based
Crime-Dependent Triage in Digital Forensics Analysis," Department of Computer Science,
Systems and Production, University of Tor Vergata, Rome, Italy; Servizio Polizia Postale e
delle Comunicazioni, Rome, Italy.

[6] Prof. Sonal Honale, Jayshree Borkar, "Framework for Live Digital Forensics using Data
Mining," Computer Science and Engineering Department, Aabha Gaikwad College of
Engineering, Nagpur, India.

[7] Peng Cheng, Hui Qu, "A digital forensic model based on data mining," Training
Department, Engineering College of CAPF, Xi'an 710086, China; Faculty of Science,
Engineering College of CAPF, Xi'an 710086, China.

[8] Antonio J. Tallón-Ballesteros, José C. Riquelme, "Data Mining Methods Applied to a
Digital Forensics Task for Supervised Machine Learning," Department of Languages and
Computer Systems, University of Seville, Reina Mercedes Avenue, Seville, 41012 Spain.

[9] S. Umar, A. Praveen, S. Gouse, N. Deepthi, "Imminent accession of Artificial Intelligence
based Forensic Exploratory with Data Mining Analysis," Department Of Computer Science
Engineering, MLRIT, Hyderabad; Department Of Computer Science Engineering, IARE,
Hyderabad; Department Of Computer Science Engineering, MLRIT, Hyderabad; Department Of
Computer Science Engineering, CMR ENGG & TECH, Hyderabad.
