Data Leakage Detection System

The document describes a data leakage detection system for databases. It discusses detecting when sensitive data distributed to third parties has been leaked, and aims to identify the agent responsible. The proposed system studies techniques to detect leakage without modifying data, by distributing real and fake objects to agents and assessing which agents leaked objects found in unauthorized places.

DATA LEAKAGE DETECTION SYSTEM FOR DATABASE

SUBMITTED BY:

V. Sai Surya Teja

Reg. No.: RA2011042020047

Class: CSE-BS

ABSTRACT:

In the course of doing business, sometimes sensitive data must be handed over to supposedly trusted third parties. For example, a hospital may give patient
records to researchers who will devise new treatments. Similarly, a company may have
partnerships with other companies that require sharing customer data. Another enterprise
may outsource its data processing, so data must be given to various other companies. We
call the owner of the data the distributor and the supposedly trusted third parties the
agents. Our goal is to detect when the distributor's sensitive data has been leaked by
agents and, if possible, to identify the agent that leaked the data.

Data mining is the process of extracting patterns from data. Data mining is
becoming an increasingly important tool to transform the data into information. It is
commonly used in a wide range of profiling practices, such as marketing, surveillance,
fraud detection and scientific discovery. Data mining can be used to uncover patterns in
data but is often carried out only on samples of data. The mining process will be
ineffective if the samples are not a good representation of the larger body of data. Data
mining cannot discover patterns that may be present in the larger body of data if those
patterns are not present in the sample being "mined". An inability to find patterns may
cause disputes between customers and service providers.

Therefore, data mining is not foolproof, but it may be useful if sufficiently
representative data samples are collected. The discovery of a particular pattern in a
particular set of data does not necessarily mean that the pattern will be found elsewhere
in the larger data from which that sample was drawn. An important part of the process is
the verification and validation of patterns on other samples of data.

TABLE OF CONTENTS
● ABSTRACT
● CHAPTER 1: INTRODUCTION
➔ 1.1 INTRODUCTION
➔ 1.2 AIM
➔ 1.3 EXISTING SYSTEM
➔ 1.4 PROPOSED SYSTEM
➔ 1.5 TYPES OF EMPLOYEES WHO MAY LEAK DATA
● CHAPTER 2: SOFTWARE REQUIREMENT SPECIFICATIONS
➔ 2.1 GENERAL DESCRIPTION
➔ 2.2 FUNCTIONAL REQUIREMENTS
➔ 2.3 INTERFACE REQUIREMENTS
➔ 2.4 NON-FUNCTIONAL ATTRIBUTES
➔ 2.5 SOFTWARE REQUIREMENTS
➔ 2.6 HARDWARE REQUIREMENTS
● CHAPTER 3: MODULE ANALYSIS
➔ 3.1 ARCHITECTURE DIAGRAM
➔ 3.2 PROBLEM SETUP AND NOTATION
➔ 3.3 GUILTY AGENTS
➔ 3.4 IMPLEMENTATION METHODS
➔ 3.4.1 DATA ALLOCATION
➔ 3.4.2 FAKE OBJECT
➔ 3.4.3 OPTIMIZATION
● CHAPTER 4: DESIGN AND PLANNING
➔ 4.1 SOFTWARE DEVELOPMENT LIFE CYCLE MODEL
➔ 4.2 DATA FLOW DIAGRAM
➔ 4.3 PROJECT FLOW DIAGRAM
➔ 4.4 ACTIVITY
➔ 4.5 SEQUENCE
➔ 4.6 COLLABORATION DIAGRAM
➔ 4.7 ENTITY RELATIONSHIP DIAGRAM
● CHAPTER 5: IMPLEMENTATION DETAILS
➔ 5.1 FRONT END
➔ 5.2 BACK END
● CHAPTER 6: SYSTEM TESTING
➔ 6.1 UNIT TESTING
➔ 6.2 INTEGRATION TESTING
➔ 6.3 ACCEPTANCE TESTING
● CHAPTER 7: FUTURE ENHANCEMENTS

CHAPTER 1: INTRODUCTION

1.1. INTRODUCTION:

Data leakage is the uncontrolled or unauthorized transmission of classified information
to the outside. It poses a serious problem to companies, as the cost of incidents
continues to increase. Many software solutions have been developed to provide data
protection; however, they cannot provide absolute protection. Thus, it is essential to
discover data leakage as soon as possible.

Data leakage can be defined as an event in which classified information (e.g.,
sensitive, protected, or confidential data) has been viewed, stolen, or used by somebody
who is not authorized to do so. Data leakage causes serious and expensive problems for
companies and organizations, and the number of such events continues to rise. Data leak
prevention helps ensure that confidential data, such as customer information, personal
employee information, trade secrets, financial data, and research and development data,
remains safe and secure. Data leak prevention solutions protect confidential data by
securing the data itself. Once the most critical data and its location on the network are
identified, it can be monitored to determine who is accessing and using it, and where it
is being sent, copied, or transmitted. Several methods and systems have been developed
to prevent data leakage.

1.2. AIM:

To provide security when a data breach occurs, to prevent the loss of data, to find the
culprits or third parties who try to corrupt the database, and to retrieve the data, all
through the Data Leakage Detection System.

1.3. EXISTING SYSTEM:
Traditionally, leakage detection is handled by watermarking: a unique code is
embedded in each distributed copy. If that copy is later discovered in the hands of an
unauthorized party, the leaker can be identified. Watermarks can be very useful in some
cases, but they involve some modification of the original data. Furthermore,
watermarks can sometimes be destroyed if the data recipient is malicious.

The existing system can detect the hackers, but the amount of evidence ("cookies") it
collects is small. Without enough evidence, the organization may not be able to pursue
further legal proceedings, and the hackers have a high chance of escaping.
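To make these two drawbacks concrete, here is a toy per-copy watermark in Java (the language of the JSP/Servlet front end in Chapter 5). It is only an illustration under stated assumptions: the `ToyWatermark` class, the `;wm=` field format, and the use of a truncated HMAC-SHA256 code are hypothetical choices, not part of the existing system described above. It shows that the record itself must be modified, and that a recipient who strips the field leaves nothing to trace.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.List;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical toy watermark: each agent's copy of a record carries a short code
// derived from a secret key, the agent ID, and the record. This MODIFIES the data.
final class ToyWatermark {
    private final SecretKeySpec key;

    ToyWatermark(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // First 4 bytes of HMAC-SHA256(agentId|record), as 8 hex characters.
    String code(String agentId, String record) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] digest = mac.doFinal((agentId + "|" + record).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Embed the code by appending it as an extra field (assumed format).
    String embed(String agentId, String record) {
        return record + ";wm=" + code(agentId, record);
    }

    // Return the agent whose code matches the leaked copy, if the watermark survived.
    Optional<String> identify(String leakedCopy, List<String> agentIds) {
        int idx = leakedCopy.lastIndexOf(";wm=");
        if (idx < 0) {
            return Optional.empty(); // watermark destroyed: leaker cannot be identified
        }
        String record = leakedCopy.substring(0, idx);
        String found = leakedCopy.substring(idx + 4);
        for (String agent : agentIds) {
            if (code(agent, record).equals(found)) {
                return Optional.of(agent);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ToyWatermark wm = new ToyWatermark("demo-secret".getBytes(StandardCharsets.UTF_8));
        String copy = wm.embed("U2", "patient=1042,age=57");
        System.out.println(copy); // the record now carries an extra field
        System.out.println(wm.identify(copy, List.of("U1", "U2", "U3")));
        System.out.println(wm.identify("patient=1042,age=57", List.of("U1", "U2", "U3"))); // Optional.empty
    }
}
```

`identify` can attribute a copy only while the `;wm=` field survives; once a malicious recipient removes it, the leaked record carries no trace of its source.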

1.4. PROPOSED SYSTEM:

In the proposed system we study unobtrusive techniques for detecting leakage of a set
of objects or records. Specifically, we study the following scenario: After giving a set
of objects to agents, the distributor discovers some of those same objects in an
unauthorized place. (For example, the data may be found on a website, or may be
obtained through a legal discovery process.) At this point, the distributor can assess
the likelihood that the leaked data came from one or more agents, as opposed to
having been independently gathered by other means.

In the proposed approach, we develop a model for assessing the "guilt" of agents. We
also present algorithms for distributing objects to agents in a way that improves our
chances of identifying a leaker. Finally, we also consider the option of adding "fake"
objects to the distributed set. Such objects do not correspond to real entities but
appear realistic to the agents. In a sense, the fake objects act as a type of watermark
for the entire set, without modifying any individual members. If it turns out that an
agent was given one or more fake objects that were leaked, then the distributor can be
more confident that the agent was guilty. In the proposed system, the hackers can be
traced with a good amount of evidence.

1.5. TYPES OF EMPLOYEES WHO MAY LEAK DATA:

The security illiterate

➔ The majority of employees, with little or no knowledge of security
➔ A corporate risk because of accidental breaches

The gadget nerds

➔ Introduce a variety of devices to their work PCs
➔ Download software

The unlawful residents

➔ Use the company IT resources in ways they shouldn't, e.g., by storing music or
movies, or by playing games

The malicious/disgruntled employees

➔ Typically a minority of employees
➔ Gain access to areas of the IT system to which they shouldn't have access
➔ Send corporate data (e.g., customer lists, R&D data) to third parties

CHAPTER 2: SOFTWARE REQUIREMENT SPECIFICATIONS
2.1. GENERAL DESCRIPTION:

This section describes the general functions of the product, including the user's
objectives, user characteristics, features, benefits, and why the product is important.
It also describes the features of the user community.

2.2. FUNCTIONAL REQUIREMENTS:

This section fully explains the possible outcomes of the software system, including the
effects of the program's operation. All functional requirements, which may include
calculations, data processing, etc., are listed in ranked order.

2.3. INTERFACE REQUIREMENTS:

This section fully describes the software interfaces, that is, how software programs
communicate with each other or with users, whether through a language, code, or
messages. Examples include shared memory and data streams.

2.4. NON-FUNCTIONAL ATTRIBUTES:

This section explains the non-functional attributes that the software system requires for
better performance. Examples include security, portability, reliability, reusability,
application compatibility, data integrity, and scalability.

2.5. SOFTWARE REQUIREMENTS:

➔ The module is written in ASP.NET and C#.
➔ It is developed on the Visual Basic platform.
➔ Windows is the operating system chosen for the module.
➔ The database used in the project is Microsoft SQL Server 2005 or higher.

2.6. HARDWARE REQUIREMENTS:

➔ PROCESSOR: Pentium 4 or above.
➔ RAM: 256 MB or more.
➔ Hard disk space: 500 MB to 1 GB.

CHAPTER 3: MODULE ANALYSIS

3.1 ARCHITECTURE DIAGRAM:

3.2 PROBLEM SETUP AND NOTATION:

A distributor owns a set $T = \{t_1, \ldots, t_m\}$ of valuable data objects. The
distributor wants to share some of the objects with a set of agents $U_1, U_2, \ldots, U_n$,
but does not wish the objects to be leaked to other third parties. The objects in $T$ could
be of any type and size; e.g., they could be tuples in a relation, or relations in a
database. An agent $U_i$ receives a subset of objects $R_i \subseteq T$, determined either
by a sample request or an explicit request:
➔ Sample request $R_i = \mathrm{SAMPLE}(T, m_i)$: any subset of $m_i$ records from $T$ can be given to $U_i$.
➔ Explicit request $R_i = \mathrm{EXPLICIT}(T, \mathit{cond}_i)$: agent $U_i$ receives all objects in $T$ that satisfy $\mathit{cond}_i$.
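The two request types can be sketched in Java as follows. This is a minimal sketch under assumptions: objects are plain values, SAMPLE is implemented as a uniformly random choice of $m_i$ objects (the definition permits any subset of that size), and $\mathit{cond}_i$ is modeled as a `Predicate`. The `Requests` class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class Requests {
    // SAMPLE(T, m_i): any m_i objects of T may be given; this sketch picks them uniformly at random.
    static <O> List<O> sample(List<O> t, int m, Random rng) {
        if (m < 0 || m > t.size()) {
            throw new IllegalArgumentException("m must be between 0 and |T|");
        }
        List<O> shuffled = new ArrayList<>(t);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, m));
    }

    // EXPLICIT(T, cond_i): the agent receives every object of T that satisfies cond_i.
    static <O> List<O> explicit(List<O> t, Predicate<? super O> cond) {
        return t.stream().filter(cond).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> t = List.of(1, 2, 3, 4, 5, 6, 7, 8); // stand-ins for t1..t8
        System.out.println(sample(t, 3, new Random(42)));  // three distinct objects from T
        System.out.println(explicit(t, o -> o % 2 == 0));  // [2, 4, 6, 8]
    }
}
```

The distinction matters for allocation: an explicit request fixes $R_i$ exactly, whereas a sample request leaves the distributor free to choose which objects to hand out.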

3.3 GUILTY AGENTS:

Suppose that after giving objects to agents, the distributor discovers that a set
$S \subseteq T$ has leaked. This means that some third party, called the target, has been
caught in possession of $S$. For example, this target may be displaying $S$ on its website,
or perhaps, as part of a legal discovery process, the target turned over $S$ to the
distributor. Since the agents $U_1, \ldots, U_n$ have some of the data, it is reasonable to
suspect them of leaking it. However, the agents can argue that they are innocent, and that
the data in $S$ were obtained by the target through other means.
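The report does not state how guilt is quantified. One model from the data-leakage-detection literature assumes that each leaked object $t \in S$ was either obtained independently by the target with probability $p$, or else leaked by one of the agents in $V_t$ (the agents whose $R_i$ contains $t$), each equally likely, with agents acting independently. Under those assumptions, $\Pr\{G_i \mid S\} = 1 - \prod_{t \in S \cap R_i} \left(1 - \frac{1-p}{|V_t|}\right)$. The Java sketch below computes this; the `GuiltModel` class and the example value of $p$ are illustrative assumptions.

```java
import java.util.List;
import java.util.Set;

final class GuiltModel {
    // Pr{G_i | S} = 1 - product over t in (S intersect R_i) of (1 - (1 - p) / |V_t|).
    // Assumes: target guesses each object independently with probability p; otherwise
    // one of the |V_t| holders leaked it, each equally likely; agents act independently.
    static double guilt(int i, Set<String> leaked, List<Set<String>> allocations, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        Set<String> ri = allocations.get(i);
        double probInnocent = 1.0;
        for (String t : leaked) {
            if (!ri.contains(t)) {
                continue; // only objects that U_i actually received count against it
            }
            long vt = allocations.stream().filter(r -> r.contains(t)).count(); // |V_t| >= 1
            probInnocent *= 1.0 - (1.0 - p) / vt;
        }
        return 1.0 - probInnocent;
    }

    public static void main(String[] args) {
        List<Set<String>> r = List.of(Set.of("t1", "t2"), Set.of("t1", "t3"));
        Set<String> s = Set.of("t1", "t2");
        System.out.println(guilt(0, s, r, 0.5)); // 0.625
        System.out.println(guilt(1, s, r, 0.5)); // 0.25
    }
}
```

With $R_1 = \{t_1, t_2\}$, $R_2 = \{t_1, t_3\}$, $S = \{t_1, t_2\}$ and $p = 0.5$, $U_1$ scores 0.625 and $U_2$ scores 0.25: $U_1$ is the only holder of $t_2$, so that leaked object weighs more heavily against it.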

3.4 IMPLEMENTATION METHODS:

3.4.1 DATA ALLOCATION:

The main focus of this work is the data allocation problem: how can the distributor
“intelligently” give data to agents in order to improve the chances of detecting a guilty
agent? As illustrated in Fig. 1, there are four instances of this problem we address,
depending on the type of data requests made by agents (sample or explicit) and whether
“fake objects” are allowed.
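As a rough illustration of "intelligent" allocation for sample requests, the Java sketch below gives each agent the objects currently held by the fewest agents, so that the agents' sets overlap as little as possible. This greedy rule is an assumption chosen for simplicity, not the allocation algorithm of this project or of the original work; the `GreedyAllocation` class is hypothetical, and objects are identified by index 0 to m-1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class GreedyAllocation {
    // Serve sample requests one agent at a time; each agent gets the m_i objects
    // that currently have the fewest holders. Ties go to the lower object index.
    static List<Set<Integer>> allocate(int numObjects, int[] requestSizes) {
        int[] holders = new int[numObjects]; // how many agents hold each object so far
        List<Set<Integer>> allocations = new ArrayList<>();
        for (int m : requestSizes) {
            if (m < 0 || m > numObjects) {
                throw new IllegalArgumentException("request size must be between 0 and |T|");
            }
            List<Integer> order = new ArrayList<>();
            for (int k = 0; k < numObjects; k++) {
                order.add(k);
            }
            order.sort(Comparator.comparingInt((Integer k) -> holders[k]).thenComparingInt(k -> k));
            Set<Integer> ri = new TreeSet<>(order.subList(0, m));
            for (int k : ri) {
                holders[k]++;
            }
            allocations.add(ri);
        }
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println(allocate(4, new int[] {2, 2, 2})); // [[0, 1], [2, 3], [0, 1]]
    }
}
```

For four objects and three requests of size 2, the first two agents receive disjoint sets; overlap appears only once every object is already in use, which keeps each leaked object pointing at as few agents as possible.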

3.4.2 FAKE OBJECT:

The distributor may be able to add fake objects to the distributed data in order to improve
his effectiveness in detecting guilty agents. However, fake objects may impact the
correctness of what agents do, so they may not always be allowable. The idea of
perturbing data to detect leakage is not new. However, in most cases, individual objects
are perturbed, e.g., by adding random noise to sensitive salaries, or adding a watermark to
an image. In our case, we are perturbing the set of distributor objects by adding fake
elements. In some applications, fake objects may cause fewer problems than perturbing
real objects. Our use of fake objects is inspired by the use of “trace” records in mailing
lists.
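A minimal sketch of fake-object distribution, assuming each agent receives $b$ fake objects that no other agent gets, and that the distributor privately keeps a map from each fake to its recipient. The `fake-<agent>-<k>` naming is a placeholder only: real fake objects must look like genuine records, or agents could simply filter them out. The `FakeObjects` class and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FakeObjects {
    // Returns a copy of the allocations in which each agent also gets b fake objects
    // unique to that agent; fakeOwner records (privately) who received each fake.
    static Map<String, Set<String>> addFakes(Map<String, Set<String>> allocations, int b,
                                             Map<String, String> fakeOwner) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : allocations.entrySet()) {
            Set<String> withFakes = new LinkedHashSet<>(e.getValue());
            for (int k = 0; k < b; k++) {
                // Placeholder ID; a real fake must be indistinguishable from genuine records.
                String fake = "fake-" + e.getKey() + "-" + k;
                withFakes.add(fake);
                fakeOwner.put(fake, e.getKey());
            }
            result.put(e.getKey(), withFakes);
        }
        return result;
    }

    // Agents whose private fake objects appear in the leaked set S.
    static Set<String> implicated(Set<String> leaked, Map<String, String> fakeOwner) {
        Set<String> agents = new TreeSet<>();
        for (String obj : leaked) {
            String owner = fakeOwner.get(obj);
            if (owner != null) {
                agents.add(owner);
            }
        }
        return agents;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> alloc = new LinkedHashMap<>();
        alloc.put("U1", Set.of("t1", "t2"));
        alloc.put("U2", Set.of("t2", "t3"));
        Map<String, String> owner = new LinkedHashMap<>();
        addFakes(alloc, 1, owner);
        System.out.println(implicated(Set.of("t2"), owner));              // []
        System.out.println(implicated(Set.of("t2", "fake-U2-0"), owner)); // [U2]
    }
}
```

Leaking the shared real object t2 alone implicates nobody through fakes, but a leaked set that contains U2's private fake points directly at U2, without any real record having been altered.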

3.4.3 OPTIMIZATION:

The distributor’s data allocation to agents has one constraint and one objective. The
distributor’s constraint is to satisfy agents’ requests, by providing them with the number
of objects they request or with all available objects that satisfy their conditions. His
objective is to be able to detect an agent who leaks any portion of his data. We consider
the constraint as strict. The distributor may not deny serving an agent request and may
not provide agents with different perturbed versions of the same objects. We consider
fake object distribution as the only possible constraint relaxation.
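One way to make the objective measurable, taken from the data-leakage-detection literature rather than stated in this report, is to score an allocation by how much the agents' sets overlap: the sum-objective $\sum_{i=1}^{n} \frac{1}{|R_i|} \sum_{j \ne i} |R_i \cap R_j|$ and the max-objective $\max_{i \ne j} \frac{|R_i \cap R_j|}{|R_i|}$, both to be minimized. Less overlap means a leaked object points to fewer candidate agents. The `OverlapObjective` class below is a hypothetical Java sketch of both scores.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OverlapObjective {
    // |a intersect b|
    private static int overlap(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        return inter.size();
    }

    // Sum over i of (1 / |R_i|) * sum over j != i of |R_i intersect R_j|; lower is better.
    static double sumObjective(List<Set<Integer>> r) {
        double total = 0.0;
        for (int i = 0; i < r.size(); i++) {
            if (r.get(i).isEmpty()) {
                throw new IllegalArgumentException("R_" + i + " is empty");
            }
            int shared = 0;
            for (int j = 0; j < r.size(); j++) {
                if (j != i) {
                    shared += overlap(r.get(i), r.get(j));
                }
            }
            total += (double) shared / r.get(i).size();
        }
        return total;
    }

    // Max over i != j of |R_i intersect R_j| / |R_i|; lower is better.
    static double maxObjective(List<Set<Integer>> r) {
        double worst = 0.0;
        for (int i = 0; i < r.size(); i++) {
            if (r.get(i).isEmpty()) {
                throw new IllegalArgumentException("R_" + i + " is empty");
            }
            for (int j = 0; j < r.size(); j++) {
                if (j != i) {
                    worst = Math.max(worst, (double) overlap(r.get(i), r.get(j)) / r.get(i).size());
                }
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        List<Set<Integer>> r = List.of(Set.of(0, 1), Set.of(1, 2, 3, 4));
        System.out.println(sumObjective(r)); // 0.75
        System.out.println(maxObjective(r)); // 0.5
    }
}
```

For $R_1 = \{0, 1\}$ and $R_2 = \{1, 2, 3, 4\}$, the sum-objective is $1/2 + 1/4 = 0.75$ and the max-objective is $0.5$; disjoint allocations score 0 on both.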

CHAPTER 4: DESIGN AND PLANNING

4.1 SOFTWARE DEVELOPMENT LIFE CYCLE MODEL:

A software development life cycle (SDLC) model is a conceptual framework describing
all activities in a software development project, from planning to maintenance. This
process is associated with several models, each including a variety of tasks and
activities.

4.2 DATA FLOW DIAGRAM:

4.3 OBJECT DIAGRAM:

4.3 PROJECT FLOW DIAGRAM:

4.4 ACTIVITY:

4.4 USE CASE DIAGRAM:

4.5 SEQUENCE:


4.6 COLLABORATION DIAGRAM:

4.7 ENTITY RELATIONSHIP DIAGRAM:

CHAPTER 5: IMPLEMENTATION DETAILS
In this chapter, we analyze the technologies used to implement the project.

5.1 FRONT END:

JAVASERVER PAGES (JSP):

JavaServer Pages (JSP) is a technology for developing web pages that support dynamic
content. It lets developers insert Java code into HTML pages using special JSP tags,
most of which start with <% and end with %>. A JavaServer Pages component is a type of
Java servlet designed to fulfill the role of a user interface for a Java web application.
Web developers write JSPs as text files that combine HTML or XHTML code, XML elements,
and embedded JSP actions and commands. Using JSP, you can collect input from users
through web page forms, present records from a database or another source, and create
web pages dynamically. JSP tags can be used for a variety of purposes, such as retrieving
information from a database or registering user preferences, accessing JavaBeans
components, passing control between pages, and sharing information between requests
and pages.

SERVLET:

Servlet technology is used to create web applications: a servlet resides on the server
side and generates dynamic web pages. Servlet technology is robust and scalable because
it is based on the Java language. Before servlets, CGI (Common Gateway Interface)
scripting was common as a server-side programming technique. However, CGI has several
disadvantages; for example, it starts a new process for each client request. The Servlet
API provides many interfaces and classes, such as Servlet, GenericServlet, HttpServlet,
ServletRequest, and ServletResponse.

5.2 BACK END:

MySQL:

MySQL is an open-source relational database management system (RDBMS) based on
Structured Query Language (SQL). It is one part of the very popular LAMP platform,
consisting of Linux, Apache, MySQL, and PHP. MySQL is currently owned by Oracle. MySQL
is available on all major OS platforms; it runs on BSD Unix, Linux, Windows, and Mac OS.
Wikipedia and YouTube use MySQL, and these sites handle millions of queries each day.
MySQL comes in two versions: the MySQL server system and the MySQL embedded system.

CHAPTER 6: SYSTEM TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an unacceptable manner.
There are various types of tests. Each test type addresses a specific testing requirement.

6.1 UNIT TESTING:

Unit testing is usually conducted as part of a combined code and unit test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be
conducted as two distinct phases.

Test strategy and approach:

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested

● Verify that the entries are of the correct format
● No duplicate entries should be allowed
● All links should take the user to the correct page
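The format and duplicate checks listed here could also be automated. The Java sketch below assumes a hypothetical agent-registration form whose key field is an e-mail address; the `AgentRegistry` class and its deliberately simple regular expression are illustrative assumptions, not the project's actual validation rules.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class AgentRegistry {
    // Deliberately simple format rule (assumption): name@domain.tld
    private static final Pattern EMAIL = Pattern.compile("^[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+$");
    private final Set<String> registered = new HashSet<>();

    // Accepts a well-formed, previously unseen address; rejects malformed entries
    // and duplicates (compared case-insensitively).
    boolean register(String email) {
        if (email == null || !EMAIL.matcher(email).matches()) {
            return false; // wrong format
        }
        return registered.add(email.toLowerCase(Locale.ROOT)); // false if already present
    }

    public static void main(String[] args) {
        AgentRegistry reg = new AgentRegistry();
        System.out.println(reg.register("agent1@example.com")); // true
        System.out.println(reg.register("AGENT1@example.com")); // false (duplicate)
        System.out.println(reg.register("not-an-email"));       // false (format)
    }
}
```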

6.2 INTEGRATION TESTING:

Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by
interface defects. The task of the integration test is to check that components or
software applications (e.g., components in a software system or, one step up, software
applications at the company level) interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.

6.3 ACCEPTANCE TESTING:

User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.

CHAPTER 7: FUTURE ENHANCEMENTS
Our future work includes the investigation of agent guilt models that capture leakage
scenarios that are not studied in this project. For example, what is the appropriate model
for cases where agents can collude and identify fake tuples? A preliminary discussion of
such a model is available in. Another open problem is the extension of our allocation
strategies so that they can handle agent requests in an online fashion (the presented
strategies assume that there is a fixed set of agents with requests known in advance).

No application ends with a single version; it can always be improved to include new
features.
