Data Leakage Detection System
DATABASE
SUBMITTED BY:
Reg.no:RA2011042020047
Class: CSE-BS
ABSTRACT:
Data mining is the process of extracting patterns from data. Data mining is
becoming an increasingly important tool to transform the data into information. It is
commonly used in a wide range of profiling practices, such as marketing, surveillance,
fraud detection and scientific discovery. Data mining can be used to uncover patterns in
data but is often carried out only on samples of data. The mining process will be
ineffective if the samples are not a good representation of the larger body of data. Data
mining cannot discover patterns that may be present in the larger body of data if those
patterns are not present in the sample being "mined". Inability to find patterns may
become a cause for some disputes between customers and service providers.
Therefore, data mining is not foolproof but may be useful if
sufficiently representative data samples are collected. The discovery of a particular pattern
in a particular set of data does not necessarily mean that the pattern is found elsewhere in
the larger data from which that sample was drawn. An important part of the process is the
verification and validation of patterns on other samples of data.
TABLE OF CONTENTS
● ABSTRACT
● CHAPTER 1: INTRODUCTION
➔ 1.1 INTRODUCTION
➔ 1.2 AIM
➔ 1.3 EXISTING SYSTEM
➔ 1.4 PROPOSED SYSTEM
➔ 1.5 TYPES OF EMPLOYEES WHO MAY LEAK DATA
● CHAPTER 2: SOFTWARE REQUIREMENT SPECIFICATIONS
➔ 2.1 GENERAL DESCRIPTION
➔ 2.2 FUNCTIONAL REQUIREMENTS
➔ 2.3 INTERFACE REQUIREMENTS
➔ 2.4 NON FUNCTIONAL ATTRIBUTES
➔ 2.5 SOFTWARE REQUIREMENTS
➔ 2.6 HARDWARE REQUIREMENTS
● CHAPTER 3: MODULE ANALYSIS
➔ 3.1 ARCHITECTURE DIAGRAM
➔ 3.2 PROBLEM SETUP AND NOTATION
➔ 3.3 GUILTY AGENTS
➔ 3.4 IMPLEMENTATION METHODS
➔ 3.4.1 DATA ALLOCATION
➔ 3.4.2 FAKE OBJECT
➔ 3.4.3 OPTIMIZATION
● CHAPTER 4: DESIGN AND PLANNING
➔ 4.1 SOFTWARE DEVELOPMENT LIFE CYCLE MODEL
➔ 4.2 DATA FLOW DIAGRAM
➔ 4.3 PROJECT FLOW DIAGRAM
➔ 4.4 ACTIVITY
➔ 4.5 SEQUENCE
➔ 4.6 COLLABORATION DIAGRAM
➔ 4.7 ENTITY RELATIONSHIP DIAGRAM
● CHAPTER 5: IMPLEMENTATION DETAILS
➔ 5.1 FRONT END
➔ 5.2 BACK END
● CHAPTER 6: SYSTEM TESTING
➔ 6.1 UNIT TESTING
➔ 6.2 INTEGRATION TESTING
➔ 6.3 ACCEPTANCE TESTING
● CHAPTER 7: FUTURE ENHANCEMENTS
CHAPTER 1: INTRODUCTION
1.1. INTRODUCTION:
1.2. AIM:
To provide security when a data breach occurs, to prevent the loss of data, to
find the culprits or third parties who try to corrupt the database, and to retrieve
the data through the Data Leakage Detection System.
1.3. EXISTING SYSTEM:
Traditionally, leakage detection is handled by watermarking, e.g., a unique
code is embedded in each distributed copy. If that copy is later discovered in the
hands of an unauthorized party, the leaker can be identified. Watermarks can be very
useful in some cases, but they involve some modification of the original data.
Furthermore, watermarks can sometimes be destroyed if the data recipient is
malicious.
The existing system can detect the hackers, but the total amount of evidence
(cookies) collected will be small. The organization may not be able to take further
legal action for lack of sufficient evidence, so the hackers' chances of escaping
remain high.
1.4. PROPOSED SYSTEM:
In the proposed system we study unobtrusive techniques for detecting leakage of a set
of objects or records. Specifically, we study the following scenario: After giving a set
of objects to agents, the distributor discovers some of those same objects in an
unauthorized place. (For example, the data may be found on a website, or may be
obtained through a legal discovery process.) At this point, the distributor can assess
the likelihood that the leaked data came from one or more agents, as opposed to
having been independently gathered by other means.
If it turns out that an agent was given one or more fake objects that were leaked, then the
distributor can be more confident that the agent was guilty. In the Proposed System
the hackers can be traced with a good amount of evidence.
1.5. TYPES OF EMPLOYEES WHO MAY LEAK DATA:
Employees who may leak data are those who:
➔ Use the company IT resources in ways they shouldn't, e.g., by storing music or
movies, or by playing games
➔ Gain access to areas of the IT system to which they shouldn't
➔ Send corporate data (e.g., customer lists, R&D, etc.) to third parties
CHAPTER 2: SOFTWARE REQUIREMENT SPECIFICATIONS
2.1. GENERAL DESCRIPTION:
This section describes the general functions of the product, including the user's
objectives, user characteristics, features, benefits, and the reasons the product is
important. It also describes the features of the user community.
2.2. FUNCTIONAL REQUIREMENTS:
This section fully explains the possible outcomes of the software system, including the
effects of operating the program. All functional requirements, which may include
calculations, data processing, etc., are placed in ranked order.
2.3. INTERFACE REQUIREMENTS:
This section describes the software interfaces, that is, how software programs communicate
with each other or with users, whether through a language, code, or messages.
Examples include shared memory, data streams, etc.
2.4. NON FUNCTIONAL ATTRIBUTES:
This section explains the non-functional attributes that the software system requires
for better performance. Examples include security, portability, reliability,
reusability, application compatibility, data integrity, scalability capacity, etc.
2.5. SOFTWARE REQUIREMENTS:
CHAPTER 3: MODULE ANALYSIS
3.2 PROBLEM SETUP AND NOTATION:
A distributor owns a set T = {t1, ..., tm} of valuable data objects. The distributor wants to
share some of the objects with a set of agents U1, U2, ..., Un, but does not wish the
objects to be leaked to other third parties. The objects in T could be of any type and size,
e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a
subset of objects Ri ⊆ T, determined either by a sample request or an explicit request:
Sample request Ri = SAMPLE(T, mi): Any subset of mi records from T can be given to Ui.
Explicit request Ri = EXPLICIT(T, condi): Agent Ui receives all T objects that satisfy
condi.
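The two request types above can be sketched in Java as follows. This is a minimal illustration, not the project's actual code: the class name RequestSketch, the method names, and the choice of a random subset for SAMPLE are all assumptions (the text only says that any subset of mi records may be given). T is assumed to contain distinct objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative request handling; all names here are hypothetical. */
final class RequestSketch {

    /** SAMPLE(T, m): returns m distinct objects from T (assumption: chosen at random). */
    static <E> Set<E> sample(List<E> t, int m, Random rng) {
        if (m < 0 || m > t.size()) {
            throw new IllegalArgumentException("m must be between 0 and |T|");
        }
        List<E> shuffled = new ArrayList<>(t);   // copy so that T itself is not reordered
        Collections.shuffle(shuffled, rng);
        return new LinkedHashSet<>(shuffled.subList(0, m));
    }

    /** EXPLICIT(T, cond): returns every object of T that satisfies cond. */
    static <E> Set<E> explicit(List<E> t, Predicate<E> cond) {
        Set<E> result = new LinkedHashSet<>();
        for (E obj : t) {
            if (cond.test(obj)) {
                result.add(obj);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> t = List.of(10, 20, 30, 40, 50);
        System.out.println(sample(t, 2, new Random()));   // two objects from T
        System.out.println(explicit(t, x -> x >= 30));    // [30, 40, 50]
    }
}
```

A sample request leaves the distributor free to choose which objects to hand out, while an explicit request fixes the answer completely. That freedom is what the allocation strategies discussed later can exploit.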
Suppose that after giving objects to agents, the distributor discovers that a set S ⊆ T has
leaked. This means that some third party, called the target, has been caught in possession
of S. For example, this target may be displaying S on its website, or perhaps as part of a
legal discovery process, the target turned over S to the distributor. Since the agents
U1, ..., Un have some of the data, it is reasonable to suspect them of leaking the data. However, the
agents can argue that they are innocent, and that the S data were obtained by the target
through other means.
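To make the notion of "suspecting an agent" concrete, the sketch below estimates how likely it is that agent Ui leaked at least one object of S. The report does not state a formula, so the model here is an assumption made for illustration only: each leaked object was obtained by the target through other means with probability p, and otherwise was leaked by one of the agents holding it, each equally likely. Under that assumption the estimate is 1 minus the product, over the leaked objects t held by Ui, of (1 - (1 - p) / k), where k is the number of agents that hold t. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Illustrative guilt estimate under the assumed model described in the lead-in. */
final class GuiltSketch {

    /**
     * @param leaked      the leaked set S
     * @param allocations the sets R1..Rn given to the agents
     * @param agent       zero-based index of the agent being assessed
     * @param p           assumed probability that the target got an object by other means
     * @return estimated probability that the agent leaked at least one object of S
     */
    static <E> double guiltProbability(Set<E> leaked, List<Set<E>> allocations,
                                       int agent, double p) {
        double probInnocent = 1.0;
        for (E obj : leaked) {
            if (!allocations.get(agent).contains(obj)) {
                continue;                     // the agent never held this object
            }
            int holders = 0;                  // number of agents that were given obj (at least 1)
            for (Set<E> r : allocations) {
                if (r.contains(obj)) {
                    holders++;
                }
            }
            // Probability that this particular agent did NOT leak obj.
            probInnocent *= 1.0 - (1.0 - p) / holders;
        }
        return 1.0 - probInnocent;
    }
}
```

For example, with R1 = {1, 2}, R2 = {1}, S = {1, 2} and p = 0.2, the estimate is roughly 0.88 for U1 and 0.4 for U2: object 2 was held only by U1, so its appearance in S weighs heavily against that agent.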
3.4 IMPLEMENTATION METHODS:
3.4.1 DATA ALLOCATION:
The main focus of this project is the data allocation problem: How can the distributor
“intelligently” give data to agents in order to improve the chances of detecting a guilty
agent? As illustrated in Fig. 1, there are four instances of this problem we address,
depending on the type of data requests made by agents and whether “fake objects” are
allowed.
3.4.2 FAKE OBJECT:
The distributor may be able to add fake objects to the distributed data in order to improve
his effectiveness in detecting guilty agents. However, fake objects may impact the
correctness of what agents do, so they may not always be allowable. The idea of
perturbing data to detect leakage is not new. However, in most cases, individual objects
are perturbed, e.g., by adding random noise to sensitive salaries, or adding a watermark to
an image. In our case, we are perturbing the set of distributor objects by adding fake
elements. In some applications, fake objects may cause fewer problems than perturbing
real objects. Our use of fake objects is inspired by the use of “trace” records in mailing
lists.
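The idea can be sketched in Java as follows. This is a simplified illustration, not the project's implementation: the class FakeObjectSketch, its methods, and the "FAKE-n" record format are hypothetical, and records are modeled as plain strings. In practice a fake object would have to look realistic to the agent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative distributor that adds one traceable fake record per agent. */
final class FakeObjectSketch {

    // Maps each fake record to the single agent that received it.
    private final Map<String, String> fakeOwner = new HashMap<>();
    private int counter = 0;

    /** Returns the agent's real records plus one fake record unique to that agent. */
    Set<String> distribute(String agentId, Set<String> realRecords) {
        String fake = "FAKE-" + (counter++);   // hypothetical placeholder format
        fakeOwner.put(fake, agentId);
        Set<String> out = new HashSet<>(realRecords);
        out.add(fake);
        return out;
    }

    /** Returns the agents whose fake records appear in the leaked set. */
    Set<String> suspects(Set<String> leaked) {
        Set<String> result = new TreeSet<>();
        for (String record : leaked) {
            String owner = fakeOwner.get(record);
            if (owner != null) {
                result.add(owner);
            }
        }
        return result;
    }
}
```

Because each fake record is given to exactly one agent, finding it in a leaked set points to that agent even when the real records are shared among several agents. The real records themselves are not modified.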
3.4.3 OPTIMIZATION:
The distributor’s data allocation to agents has one constraint and one objective. The
distributor’s constraint is to satisfy agents’ requests, by providing them with the number
of objects they request or with all available objects that satisfy their conditions. His
objective is to be able to detect an agent who leaks any portion of his data. We consider
the constraint as strict. The distributor may not deny serving an agent request and may
not provide agents with different perturbed versions of the same objects. We consider
fake object distribution as the only possible constraint relaxation.
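One way to pursue this objective for sample requests is sketched below. It is a heuristic chosen for illustration, not the allocation strategy of this project: it assumes that the less two agents' sets overlap, the easier it is to tell which agent leaked, and so it serves every request in full (the strict constraint) with the objects currently held by the fewest agents. The class name AllocationSketch and its methods are hypothetical, and objects are identified by their index in T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative greedy allocation for sample requests (assumed heuristic). */
final class AllocationSketch {

    /** Serves each request in order with the objects held by the fewest agents so far. */
    static List<Set<Integer>> allocate(int numObjects, int[] requestSizes) {
        int[] holders = new int[numObjects];   // how many agents hold each object
        List<Set<Integer>> result = new ArrayList<>();
        for (int size : requestSizes) {
            if (size > numObjects) {
                throw new IllegalArgumentException("request exceeds |T|");
            }
            Set<Integer> r = new LinkedHashSet<>();
            while (r.size() < size) {
                int best = -1;
                for (int t = 0; t < numObjects; t++) {
                    if (!r.contains(t) && (best < 0 || holders[t] < holders[best])) {
                        best = t;              // least-shared object not yet given to this agent
                    }
                }
                r.add(best);
                holders[best]++;
            }
            result.add(r);
        }
        return result;
    }

    /** Number of shared objects summed over all pairs of agents; lower is better. */
    static int totalOverlap(List<Set<Integer>> allocation) {
        int total = 0;
        for (int i = 0; i < allocation.size(); i++) {
            for (int j = i + 1; j < allocation.size(); j++) {
                Set<Integer> common = new HashSet<>(allocation.get(i));
                common.retainAll(allocation.get(j));
                total += common.size();
            }
        }
        return total;
    }
}
```

With four objects and two agents each requesting two, the sketch hands out disjoint sets, so any leaked object points to a single agent. Once the requests exceed the number of distinct objects, some overlap is unavoidable, which is where fake objects become useful as a relaxation.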
CHAPTER 4: DESIGN AND PLANNING
4.3 OBJECT DIAGRAM:
4.3 PROJECT FLOW DIAGRAM:
4.4 ACTIVITY:
4.4 USE CASE DIAGRAM:
4.5 SEQUENCE:
4.6 COLLABORATION DIAGRAM:
4.7 ENTITY RELATIONSHIP DIAGRAM:
CHAPTER 5: IMPLEMENTATION DETAILS
This section analyzes the technologies used to implement the project.
5.1 FRONT END:
JSP:
JavaServer Pages (JSP) is a technology for developing web pages that support dynamic
content. It lets developers insert Java code in HTML pages by making use of special
JSP tags, most of which start with <% and end with %>. A JavaServer Pages component
is a type of Java servlet that is designed to fulfill the role of a user interface for a Java
web application. Web developers write JSPs as text files that combine HTML or XHTML
code, XML elements, and embedded JSP actions and commands. Using JSP, you can
collect input from users through web page forms, present records from a database or
another source, and create web pages dynamically. JSP tags can be used for a variety of
purposes, such as retrieving information from a database or registering user preferences,
accessing JavaBeans components, passing control between pages, and sharing
information between requests, pages, etc.
SERVLET:
Servlet technology is used to create web applications that reside on the server side and
generate dynamic web pages. Servlet technology is robust and scalable because it is built
on the Java language. Before servlets, CGI (Common Gateway Interface) scripts were the
common approach to server-side programming. However, there were many
disadvantages to this technology. There are
many interfaces and classes in the Servlet API, such as Servlet, GenericServlet,
HttpServlet, ServletRequest, ServletResponse, etc.
5.2 BACK END:
MySQL:
CHAPTER 6: SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an unacceptable manner.
There are various types of tests. Each test type addresses a specific testing requirement.
6.1 UNIT TESTING:
Unit testing is usually conducted as part of a combined code and unit test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be
conducted as two distinct phases.
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
Features to be tested
6.3 ACCEPTANCE TESTING:
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects
were encountered.
CHAPTER 7: FUTURE ENHANCEMENTS
Our future work includes the investigation of agent guilt models that capture leakage
scenarios that are not studied in this project. For example, what is the appropriate model
for cases where agents can collude and identify fake tuples? A preliminary discussion of
such a model is available in. Another open problem is the extension of our allocation
strategies so that they can handle agent requests in an online fashion (the presented
strategies assume that there is a fixed set of agents with requests known in advance).
No application ends with a single version. This one can be improved to include new
features.