Data Leakage Detection

We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leaka

Uploaded by

Sai Phani

Chapter -1

INTRODUCTION

1.1 PROJECT DESCRIPTION


In the course of doing business, sometimes sensitive data must be
handed over to supposedly trusted third parties. For example, a hospital may
give patient records to researchers who will devise new treatments. Similarly,
a company may have partnerships with other companies that require sharing
customer data. Another enterprise may outsource its data processing, so data
must be given to various other companies. We call the owner of the data the
distributor and the supposedly trusted third parties the agents. Our goal is to
detect when the distributor’s sensitive data has been leaked by agents, and if
possible to identify the agent that leaked the data. We consider applications
where the original sensitive data cannot be perturbed. Perturbation is a very
useful technique where the data is modified and made “less sensitive” before
being handed to agents. For example, one can add random noise to certain
attributes, or one can replace exact values by ranges. However, in some
cases it is important not to alter the original distributor’s data. For example,
if an outsourcer is doing our payroll, he must have the exact salary and
customer bank account numbers. If medical researchers will be treating
patients (as opposed to simply computing statistics), they may need accurate
data for the patients. Traditionally, leakage detection is handled by
watermarking, e.g., a unique code is embedded in each distributed
copy. If that copy is later discovered in the hands of an unauthorized party,
the leaker can be identified. Watermarks can be very useful in some cases,
but again, involve some modification of the original data.
Furthermore, watermarks can sometimes be destroyed if the data recipient is
malicious. In this paper we study unobtrusive techniques for detecting
leakage of a set of objects or records. Specifically, we study the following
scenario: After giving a set of objects to agents, the distributor discovers
some of those same objects in an unauthorized place. (For example, the data
may be found on a web site, or may be obtained through a legal discovery
process.) At this point the distributor can assess the likelihood that the
leaked data came from one or more agents, as opposed to having
been independently gathered by other means. Using an analogy with cookies
stolen from a cookie jar, if we catch Freddie with a single cookie, he can
argue that a friend gave him the cookie. But if we catch Freddie with 5
cookies, it will be much harder for him to argue that his hands were not in
the cookie jar. If the distributor sees “enough evidence” that an agent leaked
data, he may stop doing business with him, or may initiate legal
proceedings. In this paper we develop a model for assessing the “guilt” of
agents. We also present algorithms for distributing objects to agents, in a
way that improves our chances of identifying a leaker. Finally, we also
consider the option of adding “fake” objects to the distributed set. Such
objects do not correspond to real entities but appear realistic to the agents. In
a sense, the fake objects act as a type of watermark for the entire set,
without modifying any individual members. If it turns out an agent was
given one or more fake objects that were leaked, then the distributor can be
more confident that agent was guilty.

Chapter -2

PROBLEM DEFINITION

2.1 Existing System

Traditionally, leakage detection is handled by watermarking: a unique code is
embedded in each distributed copy, and if that copy is later discovered in the
hands of an unauthorized party, the leaker can be identified. Watermarks can be
very useful in some cases, but they involve some modification of the original
data, and they can sometimes be destroyed if the data recipient is malicious.

Perturbation is another useful technique, in which the data is modified and made
“less sensitive” before being handed to agents, for example by adding random
noise to certain attributes or by replacing exact values by ranges. However, in
some cases it is important not to alter the original distributor’s data. If an
outsourcer is doing our payroll, he must have the exact salary and customer bank
account numbers. If medical researchers will be treating patients (as opposed to
simply computing statistics), they may need accurate data for the patients.

2.2 Proposed System


Our goal is to detect when the distributor’s sensitive data has
been leaked by agents, and if possible to identify the agent that leaked
the data. Perturbation is a very useful technique where the data is
modified and made “less sensitive” before being handed to agents.
However, we consider applications where the original data cannot be
perturbed, so we develop unobtrusive techniques for detecting leakage
of a set of objects or records.

In this section we develop a model for assessing the “guilt” of
agents. We also present algorithms for distributing objects to agents,
in a way that improves our chances of identifying a leaker. Finally, we
also consider the option of adding “fake” objects to the distributed set.
Such objects do not correspond to real entities but appear realistic to
the agents. In a sense, the fake objects act as a type of watermark for
the entire set, without modifying any individual members. If it turns
out an agent was given one or more fake objects that were leaked,
then the distributor can be more confident that agent was guilty.

Problem Setup and Notation:

A distributor owns a set T = {t1, …, tm} of valuable data objects. The
distributor wants to share some of the objects with a set of agents
U1, U2, …, Un, but does not wish the objects to be leaked to other third
parties. The objects in T could be of any type and size, e.g., they could
be tuples in a relation, or relations in a database. An agent Ui receives
a subset of objects, determined either by a sample request or an
explicit request:

1. Sample request: the agent asks for a given number of objects, and any
subset of T of that size may be given to it.
2. Explicit request: the agent receives all the objects in T that satisfy
the condition stated in its request.
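The two request types can be sketched in code as follows. This is only an illustration under stated assumptions: the class name AgentRequests, the method names, and the use of strings as data objects are hypothetical and do not come from the project itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper illustrating the two kinds of agent request.
final class AgentRequests {

    // Sample request: any subset of T with the requested size may be returned.
    static <E> List<E> sample(List<E> all, int count, Random rng) {
        if (count < 0 || count > all.size()) {
            throw new IllegalArgumentException("count must be between 0 and |T|");
        }
        List<E> copy = new ArrayList<>(all);
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, count));
    }

    // Explicit request: every object of T that satisfies the agent's condition.
    static <E> List<E> explicit(List<E> all, Predicate<E> condition) {
        List<E> result = new ArrayList<>();
        for (E item : all) {
            if (condition.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> t = List.of("t1", "t2", "t3", "t4", "t5");
        System.out.println("sample:   " + sample(t, 2, new Random(7)));
        System.out.println("explicit: " + explicit(t, s -> s.compareTo("t3") >= 0));
    }
}
```

A sample request leaves the distributor free to choose which objects to hand out, whereas an explicit request fixes the subset completely.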

Guilt Model Analysis:

In order to see how our model parameters interact and to check if the
interactions match our intuition, in this section we study two simple
scenarios: the impact of the probability p, and the impact of the overlap
between Ri and S (Ri being the set of objects given to agent Ui and S the
set of leaked objects). In each scenario we have a target that has obtained
all the distributor’s objects, i.e., T = S.
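To make the two scenarios concrete, here is a small sketch of a guilt estimate. This section does not state a formula, so the code assumes the estimate used in the published guilt model that this report builds on: Pr{Gi | S} = 1 - product over t in (S ∩ Ri) of (1 - (1 - p) / |Vt|), where p is the probability that the target obtained an object by other means and Vt is the set of agents holding t. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the assumed guilt estimate; not the project's code.
final class GuiltModel {

    // allocations.get(i) is Ri, the set given to agent i; leaked is S.
    static double guiltProbability(int agent, List<Set<String>> allocations,
                                   Set<String> leaked, double p) {
        double notGuilty = 1.0;
        for (String t : allocations.get(agent)) {
            if (!leaked.contains(t)) {
                continue; // only objects in S ∩ Ri count as evidence
            }
            int holders = 0; // |Vt|: number of agents that received t
            for (Set<String> r : allocations) {
                if (r.contains(t)) {
                    holders++;
                }
            }
            notGuilty *= 1.0 - (1.0 - p) / holders;
        }
        return 1.0 - notGuilty;
    }

    public static void main(String[] args) {
        // Scenario T = S: the target holds every distributed object.
        List<Set<String>> allocations =
                List.of(Set.of("t1", "t2", "t3"), Set.of("t1"));
        Set<String> leaked = Set.of("t1", "t2", "t3");
        System.out.println(guiltProbability(0, allocations, leaked, 0.5)); // 0.8125
        System.out.println(guiltProbability(1, allocations, leaked, 0.5)); // 0.25
    }
}
```

Under this assumption, a larger p lowers every agent's estimate, and an agent whose set overlaps more with S gets a higher estimate.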

Algorithms:

1. Evaluation of Explicit Data Request Algorithms

In the first place, the goal of these experiments was to see
whether fake objects in the distributed data sets yield a
significant improvement in our chances of detecting a guilty
agent. In the second place, we wanted to evaluate our e-optimal
algorithm relative to a random allocation.

2. Evaluation of Sample Data Request Algorithms

With sample data requests, agents are not interested in
particular objects. Hence, object sharing is not explicitly
defined by their requests. The distributor is “forced” to
allocate certain objects to multiple agents only if the number
of requested objects exceeds the number of objects in set T.
The more data objects the agents request in total, the more
recipients on average an object has; and the more objects are
shared among different agents, the more difficult it is to
detect a guilty agent.
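The effect of the total request size on sharing can be illustrated with a short sketch. The greedy "least shared first" rule below is an assumption chosen for illustration, not the algorithm evaluated in this project, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: allocate sample requests while keeping overlap low.
final class SampleAllocation {

    // Gives agent i requests[i] distinct objects (indices 0..objectCount-1),
    // always choosing the objects handed out the fewest times so far.
    static List<List<Integer>> allocate(int objectCount, int[] requests) {
        int[] recipients = new int[objectCount];
        List<List<Integer>> result = new ArrayList<>();
        for (int m : requests) {
            if (m > objectCount) {
                throw new IllegalArgumentException("request exceeds |T|");
            }
            boolean[] taken = new boolean[objectCount];
            List<Integer> given = new ArrayList<>();
            for (int k = 0; k < m; k++) {
                int best = -1;
                for (int t = 0; t < objectCount; t++) {
                    if (!taken[t] && (best < 0 || recipients[t] < recipients[best])) {
                        best = t;
                    }
                }
                taken[best] = true;
                recipients[best]++;
                given.add(best);
            }
            result.add(given);
        }
        return result;
    }

    // Average number of recipients per object: total requested / |T|.
    static double averageRecipients(int objectCount, int[] requests) {
        int total = 0;
        for (int m : requests) {
            total += m;
        }
        return (double) total / objectCount;
    }

    public static void main(String[] args) {
        System.out.println(allocate(4, new int[] {2, 2}));          // no sharing needed
        System.out.println(allocate(4, new int[] {3, 3}));          // sharing is forced
        System.out.println(averageRecipients(4, new int[] {3, 3})); // 1.5
    }
}
```

With four objects and two requests of size two, no object has to be shared. With two requests of size three, the total of six exceeds four, so at least two objects end up with both agents.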

Chapter – 3

FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a
business proposal is put forth with a very general plan for the project
and some cost estimates. During system analysis the feasibility study
of the proposed system is carried out. This is to ensure that the
proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system
is essential.

Three key considerations involved in the feasibility analysis are:

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the
system will have on the organization. The amount of funds that the company
can pour into the research and development of the system is limited. The
expenditures must be justified. The developed system was well within the
budget, and this was achieved because most of the technologies used are
freely available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is,
the technical requirements of the system. Any system developed must not
place a high demand on the available technical resources, as this would
in turn place high demands on the client. The developed system must have
modest requirements, as only minimal or no changes are required for
implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system
by the user. This includes the process of training the user to use the system
efficiently. The user must not feel threatened by the system, but must instead
accept it as a necessity. The level of acceptance by the users depends solely
on the methods that are employed to educate the user about the system and
to make him familiar with it. His level of confidence must be raised so that
he is also able to make some constructive criticism, which is welcomed, as
he is the final user of the system.

Chapter – 4

SYSTEM ANALYSIS

SOFTWARE REQUIREMENT SPECIFICATION

Question: What is SRS?

Answer: Software Requirement Specification (SRS) is the starting point
of the software development activity. As systems grew more
complex, it became evident that the goal of the entire system
cannot be easily comprehended. Hence the need for the
requirement phase arose. The software project is initiated by
the client’s needs. The SRS is the means of translating the ideas
in the minds of clients (the input) into a formal document (the
output of the requirement phase).

The SRS phase consists of two basic activities:

1) Problem/Requirement Analysis:

This activity is the harder and more nebulous of the two. It deals
with understanding the problem, the goal, and the constraints.

2) Requirement Specification:

Here, the focus is on specifying what has been found
during analysis. Issues such as representation, specification languages and
tools, and checking the specifications are addressed during this
activity.

The requirement phase terminates with the production of the
validated SRS document. Producing the SRS document is the basic
goal of this phase.

ROLE OF SRS
The purpose of the Software Requirement Specification is to
reduce the communication gap between the clients and the
developers. Software Requirement Specification is the medium
through which the client and user needs are accurately specified. It
forms the basis of software development. A good SRS should
satisfy all the parties involved in the system.

SCOPE

This document is the only one that describes the requirements
of the system. It is meant for use by the developers, and will also
be the basis for validating the final delivered system. Any changes
made to the requirements in the future will have to go through a
formal change approval process. The developer is responsible for
asking for clarifications, where necessary, and will not make any
alterations without the permission of the client.

System Specification

System Requirements:

Hardware Requirements:

• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15" VGA Colour.
• Mouse : Logitech.
• RAM : 512 MB.

Software Requirements:

• Operating System : Windows XP.
• Coding Language : DOT NET
• Data Base : SQL Server 2005

Chapter -5

SYSTEM DESIGN

System design is the transition from a user-oriented document to a
document oriented to programmers or database personnel. The design is a
solution: a “how to” approach to the creation of a new system. It is
composed of several steps.
It provides the understanding and procedural details necessary for
implementing the system recommended in the feasibility study. Design
goes through logical and physical stages of development. Logical design
reviews the present physical system, prepares input and output specifications,
details the implementation plan, and prepares a logical design walkthrough.

SOFTWARE DESIGN
In designing the software, the following principles are followed:

1. Modularity and partitioning: software is designed such that each
system consists of a hierarchy of modules that serve to partition it into
separate functions.

2. Coupling: modules should have little dependence on other modules of a
system.

3. Cohesion: modules should carry out a single processing function.

4. Shared use: avoid duplication by allowing a single module to be called by
others that need the function it provides.
Proposed Modules:

1. Data Allocation Module
2. Fake Object Module
3. Optimization Module
4. Data Distributor

MODULES:

1. Data Allocation Module:

The main focus of our project is the data allocation problem: how
can the distributor “intelligently” give data to agents in order to
improve the chances of detecting a guilty agent?

2. Fake Object Module:

Fake objects are objects generated by the distributor in order to
increase the chances of detecting agents that leak data. The distributor
may be able to add fake objects to the distributed data in order to
improve his effectiveness in detecting guilty agents. Our use of fake
objects is inspired by the use of “trace” records in mailing lists.

3. Optimization Module:

In the Optimization Module, the distributor’s data allocation to
agents has one constraint and one objective. The distributor’s
constraint is to satisfy agents’ requests, by providing them with the
number of objects they request or with all available objects that satisfy
their conditions. His objective is to be able to detect an agent who
leaks any portion of his data.

4. Data Distributor:

A data distributor has given sensitive data to a set of supposedly
trusted agents (third parties). Some of the data is leaked and found in
an unauthorized place (e.g., on the web or somebody’s laptop). The
distributor must assess the likelihood that the leaked data came from
one or more agents, as opposed to having been independently
gathered by other means.
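The idea behind the Fake Object Module can be sketched as below. This is a minimal illustration under assumptions: the class FakeObjectTracker and its methods are hypothetical, each agent gets exactly one fake record, and the fake record is a random identifier rather than a realistic-looking record as a real system would need.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

// Hypothetical sketch: track which fake object was given to which agent.
final class FakeObjectTracker {
    private final Map<String, String> fakeToAgent = new HashMap<>();

    // Returns the agent's real objects plus one fake record only this agent gets.
    List<String> distribute(String agentId, List<String> realObjects) {
        String fake = "fake-" + UUID.randomUUID(); // placeholder, not realistic data
        fakeToAgent.put(fake, agentId);
        List<String> out = new ArrayList<>(realObjects);
        out.add(fake);
        return out;
    }

    // Agents whose fake objects appear in the leaked set.
    Set<String> suspects(Set<String> leaked) {
        Set<String> result = new TreeSet<>();
        for (String item : leaked) {
            String agent = fakeToAgent.get(item);
            if (agent != null) {
                result.add(agent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FakeObjectTracker tracker = new FakeObjectTracker();
        List<String> toA = tracker.distribute("agentA", List.of("t1", "t2"));
        tracker.distribute("agentB", List.of("t1", "t3"));
        System.out.println(tracker.suspects(Set.copyOf(toA)));
    }
}
```

A real object such as t1 that both agents hold says little on its own, while a leaked fake object points at the one agent who received it.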

SYSTEM DESIGN

Data Flow Diagram / Use Case Diagram / Flow Diagram

The DFD is also called a bubble chart. It is a simple graphical
formalism that can be used to represent a system in terms of the input data to
the system, the various processing carried out on these data, and the output data
generated by the system.

Data Flow Diagram:

[Figure: data flow diagram. Nodes: Login (Admin, Agent); Check whether the agent exists (no: Create Account; yes: Select Agent); Upload File to Agent; View And Update Agent Details; File Maintenance and Secret Key; File Details; File Lock with Secret Key (File Locked / File Unlocked); File Download with Secret Key; Data Leaker; End. Decision "if Secret key exists": yes gives the Original File, no gives a Duplicate File.]
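The secret key branch of the flow above can be sketched as follows. The sketch is hypothetical: the class SecretKeyGate, the 16-byte key length, and the ORIGINAL and DUPLICATE labels are assumptions made for illustration and are not taken from the project's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical sketch of the "File Download with Secret Key" decision.
final class SecretKeyGate {
    private static final SecureRandom RNG = new SecureRandom();

    // Generates a random key to issue together with an uploaded file.
    static String generateKey() {
        byte[] raw = new byte[16]; // assumed key length
        RNG.nextBytes(raw);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    // Decides which file to send: the original if the key matches, else a duplicate.
    static String fileToSend(String storedKey, String suppliedKey) {
        boolean matches = suppliedKey != null && MessageDigest.isEqual(
                storedKey.getBytes(StandardCharsets.UTF_8),
                suppliedKey.getBytes(StandardCharsets.UTF_8));
        return matches ? "ORIGINAL" : "DUPLICATE";
    }

    public static void main(String[] args) {
        String key = generateKey();
        System.out.println(fileToSend(key, key));
        System.out.println(fileToSend(key, "guess"));
    }
}
```

MessageDigest.isEqual is used for the comparison so that the check does not stop at the first differing byte.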

Use Case Diagram:

[Figure: use case diagram. Actors: Admin, Agent, Data Leaker. Use cases: Create an Account, Login, Upload Files to Agent, Generate Secret Key, Download Files, Lock/UnLock.]

Class Diagram:

Upload Files
Attributes: FileID, FileName, AgentID, FileType, Filepath, UploadDate
Operations: SenttoAgent(), ViewFileDetails()

Agent Account
Attributes: AgentName, AgentID, AgentPassword, EmailID
Operations: CreateAccount(), GenerateKey()

Lock/UnLock
Attributes: FileID, FilePassword, ReTypePassword, SecretKey
Operations: Lock(), UnLock()

Edit Account
Attributes: AgentName, EmailID, OldPassword, NewPassword, ReType NewPassword
Operations: Update()

Sequence Diagram:

[Figure: sequence diagram. Participants: Agent, Admin, DataBase, Data Leaker. Messages: Create an Account; Upload Files; Store Files; Lock/UnLock Files; View Agent Account; View Files; Download Files; Send Required File (if the secret key matches); Send Duplicate File (if the secret key does not match).]

Activity Diagram:

[Figure: activity diagram. Activities: Login; Check whether the account exists (No: Create Account; Yes: Upload Files); Files Maintenance; Lock/UnLock File; File Download; Data Leaker. Decision "If secret key exists": if it exists, Download Original File; if not, Receive Duplicate File.]

5.1 DATABASE DESIGN:

The database tables are designed by analyzing the functions involved in
the system, and the format of the fields is also designed. The fields in the
database tables should define their role in the system. Unnecessary fields
should be avoided because they take up storage space in the system. In
the input and output screen design, the design should be made user friendly.
The menu should be precise and compact.

5.2 INPUT/OUTPUT DESIGN


INPUT DESIGN

The input design is the link between the information system and
the user. It comprises developing the specifications and procedures
for data preparation, that is, the steps necessary to put transaction
data into a usable form for processing. This can be achieved by
instructing the computer to read data from a written or printed
document, or by having people key the data directly
into the system. The design of input focuses on controlling the
amount of input required, controlling errors, avoiding delay,
avoiding extra steps, and keeping the process simple. The input is
designed so that it provides security and ease of use
while retaining privacy. Input design considers the following
things:
• What data should be given as input?
• How should the data be arranged or coded?
• The dialog to guide the operating personnel in providing
input.
• Methods for preparing input validations and the steps to follow
when errors occur.

OBJECTIVES

1. Input design is the process of converting a user-oriented
description of the input into a computer-based system. This design
is important to avoid errors in the data input process and to show the
correct direction to the management for getting correct information
from the computerized system.

2. It is achieved by creating user-friendly screens for data entry
that can handle large volumes of data. The goal of designing input is to
make data entry easier and free from errors. The data entry
screen is designed in such a way that all the data manipulations can
be performed. It also provides record viewing facilities.

3. When the data is entered, it is checked for validity. Data can be
entered with the help of screens. Appropriate messages are
provided as and when needed so that the user
is not left in a maze at any instant. Thus the objective of input design
is to create an input layout that is easy to follow.

OUTPUT DESIGN

A quality output is one which meets the requirements of the end
user and presents the information clearly. In any system, the results of
processing are communicated to the users and to other systems
through outputs. In output design it is determined how the
information is to be displayed for immediate need, and also as hard
copy output. It is the most important and direct source of information
to the user. Efficient and intelligent output design improves the
system’s relationship with the user and helps decision-making.

Designing computer output should proceed in an organized, well
thought out manner. The right output must be developed while
ensuring that each output element is designed so that people will
find the system easy and effective to use. When analysts
design computer output, they should:
1. Identify the specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain
information produced by the system.

The output form of an information system should accomplish one
or more of the following objectives:
• Convey information about past activities, current status, or
projections of the future.
• Signal important events, opportunities, problems, or
warnings.
• Trigger an action.
• Confirm an action.

Chapter -6

IMPLEMENTATION

OVERVIEW OF SOFTWARE DEVELOPMENT TOOLS

6.1 HTML

HTML is a language used to create web pages. With HTML you
mark up a page to indicate its format, telling the web browser
where you want a new line to begin, how you want text or images
aligned, and more.

We used the following tags in our project.

TABLE:

Tables are popular with web page authors because they let you arrange the
elements of a web page in such a way that the browser won’t rearrange them.
Web page authors frequently use tables to structure web pages.

TR:
TR is used to create a row in a table; it encloses <TH> and
<TD> elements. <TR> has many attributes. Some of them are:
• ALIGN: Specifies the horizontal alignment of the text in the table row.
• BGCOLOR: Specifies the background color for the row.
• BORDERCOLOR: Sets the external border color for the row.
• VALIGN: Sets the vertical alignment of the data in this row.

TH:
TH is used to create a table heading.
• ALIGN: Sets the horizontal alignment of the content in the table cell.
Set to LEFT, RIGHT, or CENTER.
• BACKGROUND: Specifies the background image for the table cell.
• BGCOLOR: Specifies the background color of the table cell.
• VALIGN: Sets the vertical alignment of the data. Set to TOP,
MIDDLE, BOTTOM, or BASELINE.
• WIDTH: Specifies the width of the cell. Set to a pixel width or a
percentage of the display area.

TD:
TD is used to create table data that appears in the cells of a
table.
• ALIGN: Specifies the horizontal alignment of content in the table cell.
Set to LEFT, CENTER, or RIGHT.
• BACKGROUND: Specifies the background image for the table cell.
• BGCOLOR: Sets the background color of the table cell.
• WIDTH: Specifies the width of the cell.

FRAMES:
Frames divide the browser window into separate regions, each of which
displays its own document. To configure the frames we
use the <FRAMESET> element. There are two important points to consider when
working with <FRAMESET>:
• The <FRAMESET> element actually takes the place of the <BODY>
element in a document.
• Specifying actual pixel dimensions for frames can produce frames that
either run off the page or display only small slices of what is supposed
to be shown.

<FRAME> elements are used to create the actual frames.
From the frameset point of view, dividing the browser into two vertical
frames means creating two columns using the <FRAMESET> element’s
COLS attribute.
The syntax for vertical fragmentation is:
<FRAMESET COLS="50%, 50%">
</FRAMESET>
Similarly, if we replace COLS with ROWS then we get horizontal
fragmentation.
The syntax for horizontal fragmentation is:
<FRAMESET ROWS="50%, 50%">
</FRAMESET>

FORM:
The purpose of FORM is to create an HTML form; it is used
to enclose HTML controls, like buttons and text fields.

ATTRIBUTES:
• ACTION: Gives the URL that will handle the form
data.
• NAME: Gives a name to the form so you can
reference it in code. Set to an alphanumeric string.
• METHOD: The method or protocol used for sending
data to the target action URL. The GET method is the default; it
sends all form name/value pair information in the URL.
With the POST method, the contents of the form are encoded as with
the GET method, but are sent in the body of the HTTP request rather
than in the URL.

CONTROLS IN HTML

<INPUT TYPE=BUTTON>:
Creates an HTML button in a form.
ATTRIBUTES:
• NAME: Gives the element a name. Set to alphanumeric characters.
• SIZE: Sets the size.
• VALUE: Sets the caption of the element.

<INPUT TYPE=PASSWORD>:
Creates a password text field, which masks typed input.
ATTRIBUTES:
• NAME: Gives the element a name. Set to alphanumeric characters.
• VALUE: Sets the default content of the element.

<INPUT TYPE=RADIO>:
Creates a radio button in a form.
ATTRIBUTES:
• NAME: Gives the element a name. Set to alphanumeric characters.
• VALUE: Sets the default content of the element.

<INPUT TYPE=SUBMIT>:
Creates a submit button that the user can click to send data in the form
back to the web server.
ATTRIBUTES:
• NAME: Gives the element a name. Set to alphanumeric characters.
• VALUE: Gives this button another label besides the default, Submit Query.
Set to alphanumeric characters.

<INPUT TYPE=TEXT>:
Creates a text field that the user can enter or edit text in.
ATTRIBUTES:
• NAME: Gives the element a name. Set to alphanumeric characters.
• VALUE: Holds the initial text in the text field. Set to alphanumeric
characters.

6.2 JAVASCRIPT

JavaScript, originally supported by Netscape Navigator, is the
most popular web scripting language today. JavaScript lets you
embed programs right in your web pages and run these programs
using the web browser. You place these programs in a <SCRIPT>
element, usually within the <HEAD> element. If you want the
script to write directly to the web page, place it in the <BODY> element.

JAVASCRIPT METHODS:
writeln:
document.writeln() is a method used to write some text to
the current web page.
onClick:
Occurs when an element is clicked.
onLoad:
Occurs when the page loads.
onMouseDown:
Occurs when a mouse button goes down.
onMouseMove:
Occurs when the mouse moves.
onUnload:
Occurs when a page is unloaded.

JDBC DRIVERS:
The JDBC API only defines interfaces for the objects used for
performing various database-related tasks like opening and
closing connections, executing SQL commands, and retrieving the
results. We write our programs against the interfaces, not the
implementations. Either the resource manager vendor or a third
party provides the implementation classes for the standard JDBC
interfaces. These software implementations are called JDBC drivers.
JDBC drivers transform the standard JDBC calls to the external
resource manager-specific API calls. The diagram below depicts
how a database client written in Java accesses an external resource
manager using the JDBC API and JDBC driver:

Depending on the mechanism of implementation, JDBC drivers are broadly
classified into four types.

TYPE1:
Type1 JDBC drivers implement the JDBC API on top of a
lower level API like ODBC. These drivers are not generally
portable because of their dependency on native libraries. These
drivers translate the JDBC calls to ODBC calls, and ODBC sends the
request to the external data source using native library calls. The
JDBC-ODBC driver that comes with the software distribution for J2SE
is an example of a type1 driver.

TYPE2:
Type2 drivers are written in a mixture of Java and native code.
Type2 drivers use vendor-specific native APIs for accessing the
data source. These drivers transform the JDBC calls to vendor-specific
calls using the vendor’s native library.
Like type1 drivers, these drivers are not portable because
of the dependency on native code.

TYPE3:
Type3 drivers use an intermediate middleware server for
accessing the external data sources. The calls to the middleware
server are database independent. However, the middleware
server makes vendor-specific native calls for accessing the data
source. In this case, the driver is written purely in Java.

TYPE4:
Type4 drivers are written in pure Java. They implement
the JDBC interfaces and translate the JDBC-specific calls to
vendor-specific access calls. They implement the data transfer and network
protocol for the target resource manager. Most of the leading
database vendors provide type4 drivers for accessing their database
servers.

DRIVER MANAGER AND DRIVER:

The java.sql package defines an interface called java.sql.Driver
that must be implemented by all JDBC drivers, and a class
called java.sql.DriverManager that acts as the interface to the
database clients for performing tasks like connecting to external
resource managers and setting log streams. When a JDBC client
requests the DriverManager to make a connection to an external
resource manager, it delegates the task to an appropriate driver
class implemented by the JDBC driver provided either by the
resource manager vendor or a third party.


JAVA.SQL.DRIVERMANAGER:

The primary task of the DriverManager class is to manage the
registered JDBC drivers. It also provides methods for:
 Getting connections to the databases.
 Managing JDBC logs.
 Setting login timeout.

MANAGING DRIVERS:

JDBC clients specify the JDBC URL when they request a
connection. The driver manager searches the list of registered drivers
for one that matches the requested URL and, if it finds a match, delegates
the connection request to that driver. JDBC URLs
normally take the following format:
<protocol>:<sub-protocol>:<resource>
The protocol is always jdbc, and the sub-protocol and resource depend
on the type of resource manager. The URL for PostgreSQL is in the
format:
jdbc:postgresql://<host>:<port>/<database>
Here host is the host address on which the postmaster is running, port is
the port on which it listens, and database is the name of the database to
which the client wishes to connect.
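
The URL format described above can be illustrated with a minimal sketch. The class name JdbcUrlSketch, its two helper methods, and the host, port, and database values are assumptions made for illustration; they are not part of the project code.

```java
import java.util.Arrays;

public class JdbcUrlSketch {
    /** Builds a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database. */
    public static String buildUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    /** Splits a JDBC URL into protocol, sub-protocol, and resource. */
    public static String[] splitUrl(String url) {
        // A limit of 3 keeps colons inside the resource part (e.g. host:port) intact.
        String[] parts = url.split(":", 3);
        if (parts.length != 3 || !parts[0].equals("jdbc")) {
            throw new IllegalArgumentException("Not a JDBC URL: " + url);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical host, port, and database name.
        String url = buildUrl("localhost", 5432, "leakdb");
        System.out.println(url);
        System.out.println(Arrays.toString(splitUrl(url)));
    }
}
```

The sub-protocol returned by splitUrl is the part a driver inspects when deciding whether it can handle a URL.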

MANAGING CONNECTION:
The DriverManager class is responsible for managing connections
to the databases:
public static Connection getConnection(String url, Properties info)
throws SQLException
This method gets a connection to the database identified by the specified
JDBC URL, using the connection properties supplied in info (normally at
least the user name and password). This method
throws an instance of SQLException if a database access error occurs.
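
A minimal sketch of calling this method follows. The class name ConnectionSketch, the helper methods, the URL, and the credentials are hypothetical; a matching JDBC driver must be on the classpath for the connection to succeed, and otherwise the SQLException branch runs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class ConnectionSketch {
    /** Packs the user name and password into the Properties object that getConnection takes. */
    public static Properties credentials(String user, String password) {
        Properties info = new Properties();
        info.setProperty("user", user);
        info.setProperty("password", password);
        return info;
    }

    /** Asks DriverManager for a connection; throws SQLException if none can be made. */
    public static Connection open(String url, Properties info) throws SQLException {
        return DriverManager.getConnection(url, info);
    }

    public static void main(String[] args) {
        // Hypothetical URL and credentials; replace with real values.
        String url = "jdbc:postgresql://localhost:5432/leakdb";
        try (Connection con = open(url, credentials("distributor", "secret"))) {
            System.out.println("Connected, auto-commit = " + con.getAutoCommit());
        } catch (SQLException e) {
            // Raised when no registered driver accepts the URL or the database is unreachable.
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

The try-with-resources statement closes the connection automatically, which avoids leaking connections when an error occurs later in the block.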

CONNECTIONS:

The interface java.sql.Connection defines the methods required for a
persistent connection to the database. The JDBC driver vendor
implements this interface. A database ‘vendor-neutral’ client never
uses the implementation class and always uses only the interface.
This interface defines methods for the following tasks:
 Creating statements, prepared statements, and callable statements,
the different types of statements that JDBC clients use for issuing SQL
to the database.
 Getting and setting auto-commit mode.
 Getting meta information about the database.
 Committing and rolling back transactions.

CREATING STATEMENTS:
The interface java.sql.Connection defines a set of methods for
creating database statements. Database statements are used for sending SQL
statements to the database:
public Statement createStatement() throws SQLException
This method is used for creating instances of the interface
java.sql.Statement. This interface can be used for sending SQL statements to
the database. The interface java.sql.Statement is normally used for sending
SQL statements that don’t take any arguments. This method throws an
instance of SQLException if a database access error occurs. An overloaded
form also takes the result set type and concurrency:
public Statement createStatement(int resType, int resConcurrency)
throws SQLException

JDBC RESULTSETS:

A JDBC resultset represents a two-dimensional array of data
produced as a result of executing SQL SELECT statements
against databases using JDBC statements. JDBC resultsets are
represented by the interface java.sql.ResultSet. The JDBC driver
vendor provides the implementation class for this interface.

SCROLLING RESULTSETS:

public boolean next() throws SQLException
public boolean previous() throws SQLException
public boolean first() throws SQLException
public boolean last() throws SQLException


ACCESSING RESULTSET DATA:

Method name                              Purpose

public boolean getBoolean(int i)         Gets the data in the specified column
public boolean getBoolean(String col)    as a boolean.

public int getInt(int i)                 Gets the data in the specified column
public int getInt(String col)            as an int.

public String getString(int i)           Gets the data in the specified column
                                         as a string.

STATEMENT:

The interface java.sql.Statement is normally used for sending
SQL statements that do not have IN or OUT parameters. The JDBC driver
vendor provides the implementation class for this interface. The
common methods required by the different JDBC statements are
defined in this interface. The methods defined by java.sql.Statement
can be broadly categorized as follows:
 Executing SQL statements
 Querying results and result sets
 Handling SQL batches
 Other miscellaneous methods

The interface java.sql.Statement defines
methods for executing different SQL statements such as SELECT,
UPDATE, INSERT, DELETE, and CREATE:

public ResultSet executeQuery(String sql) throws SQLException
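
The way a statement and its result set are typically used together can be sketched as follows. The class name QuerySketch and the assumption that column 1 is an int and column 2 is a string are made for illustration only; they do not describe the project's actual tables.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class QuerySketch {
    /** Walks the result set forward with next() and reads each row by column index. */
    public static List<String> readRows(ResultSet rs) throws SQLException {
        List<String> rows = new ArrayList<>();
        while (rs.next()) {                 // false once the cursor moves past the last row
            int id = rs.getInt(1);          // column 1 read as an int (assumed layout)
            String name = rs.getString(2);  // column 2 read as a String (assumed layout)
            rows.add(id + ":" + name);
        }
        return rows;
    }

    /** Creates a Statement, runs a SELECT, and returns the formatted rows. */
    public static List<String> query(Connection con, String sql) throws SQLException {
        // try-with-resources closes the ResultSet first, then the Statement.
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            return readRows(rs);
        }
    }
}
```

Given an open connection con, a call such as QuerySketch.query(con, "SELECT id, name FROM agents") would return one string per row; the table name agents is hypothetical.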


The following figure shows how the DriverManager, Driver, Connection,
Statement, and ResultSet classes are connected.

[Figure: DriverManager; Driver (driver layer); Connection (application
layer); PreparedStatement, Statement, and CallableStatement, each with its
ResultSet.]


6.4 JAVA SERVER PAGES (JSP)


INTRODUCTION:
Java Server Pages (JSP) technology enables you to mix regular, static
HTML with dynamically generated content. You simply write the regular
HTML in the normal manner, using familiar Web-page-building tools. You
then enclose the code for the dynamic parts in special tags, most of which
start with <% and end with %>.

THE NEED FOR JSP:


Servlets are indeed useful, and JSP by no means makes them obsolete.
However, with servlets alone:
 It is hard to write and maintain the HTML.
 You cannot use standard HTML tools.
 The HTML is inaccessible to non-Java developers.

BENEFITS OF JSP:
JSP provides the following benefits over servlets alone:
 It is easier to write and maintain the HTML: there are no extra
backslashes, no double quotes, and no lurking Java syntax.
 You can use standard Web-site development tools:
We use Macromedia Dreamweaver for most of the JSP pages.
Even HTML tools that know nothing about JSP can be used because they
simply ignore the JSP tags.
 You can divide up your development team:
The Java programmers can work on the dynamic code.
The Web developers can concentrate on the presentation layer.
On large projects, this division is very important. Depending
on the size of your team and the complexity of your project, you
can enforce a weaker or stronger separation between the static
HTML and the dynamic content.

CREATING TEMPLATE TEXT:

A large percentage of our JSP document consists of static text
known as template text. In almost all respects, this HTML looks just
like normal HTML, follows all the same syntax rules, and is
simply “passed through” to the client by the servlet created to handle
the page. Not only does the HTML look normal, it can be created
by whatever tools you already are using for building Web pages.
There are two minor exceptions to the “template
text passed through” rule. First, if you want to have <% or %> in
the output, you need to put <\% or %\> in the template text. Second,
if you want a comment to appear in the JSP page but not in the
resultant document, use a JSP comment of the form:

<%-- JSP Comment --%>

HTML comments of the form:

<!-- HTML Comment -->

are passed through to the client normally.

TYPES OF JSP SCRIPTING ELEMENTS:


JSP scripting elements allow you to insert Java code into the servlet
that will be generated from the JSP page. There are three forms:

1. Expressions of the form <%= Java Expression %>, which are
evaluated and inserted into the servlet’s output.
2. Scriptlets of the form <% Java code %>, which are inserted into the
servlet’s _jspService method (called by service).
3. Declarations of the form <%! Field/Method Declaration %>, which
are inserted into the body of the servlet class, outside any existing
methods.

USING JSP EXPRESSIONS:

A JSP expression is used to insert values directly into the output. It has
the following form:
<%= Java Expression %>
The expression is evaluated, converted to a string, and inserted in the
page. This evaluation is performed at runtime (when the page is
requested) and thus has full access to the information about the request.
For example, the following shows the date/time that the page was
requested.
Current time: <%= new java.util.Date() %>

PREDEFINED VARIABLES:
To simplify expressions we can use a number of predefined variables (or
“implicit objects”). There is nothing special about these variables: the
system simply tells you what names it will use for the local variables in
_jspService. The most important of these are:
 request, the HttpServletRequest.
 response, the HttpServletResponse.
 session, the HttpSession associated with the request
 out, the writer used to send output to clients.
 application, the ServletContext. This is a data structure shared by all
servlets and JSP pages in the web application and is good for storing
shared data.
Here is an example:

Your hostname: <%= request.getRemoteHost () %>

COMPARING SERVLETS TO JSP PAGES

JSP works best when the structure of the HTML page is fixed but the
values at various places need to be computed dynamically. If the structure of
the page is dynamic, JSP is less beneficial, and servlets are sometimes better
in such a case. If the page consists of binary data or has little static content,
servlets are clearly superior. Sometimes the answer is neither servlets nor
JSP alone, but rather a combination of both.

WRITING SCRIPTLETS
If you want to do something more complex than output the value of a
simple expression, JSP scriptlets let you insert arbitrary code into the
servlet’s _jspService method. Scriptlets have the following form:
<% Java code %>
Scriptlets have access to the same automatically defined variables as do
expressions (request, response, session, out, etc.). So, for example, if you
want to explicitly send output to the resultant page, you could use the out
variable, as in the following example:
<%
String queryData = request.getQueryString();
out.println("Attached GET data: " + queryData);
%>
SCRIPTLET EXAMPLE:
As an example of code that is too complex for a JSP expression alone,
consider a JSP page that uses the bgColor request parameter to set the
background color of the page. Simply using

<BODY BGCOLOR="<%= request.getParameter("bgColor") %>">

would violate the cardinal rule of reading form data: always check for
missing or malformed values.
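
The kind of check a scriptlet would perform before using the parameter can be sketched in plain Java. The class name BgColorSketch, the default color WHITE, and the accepted pattern (a color name or a #RRGGBB value) are assumptions for illustration; in a JSP page the argument would come from request.getParameter("bgColor").

```java
public class BgColorSketch {
    // Assumed fallback color; the text does not specify one.
    private static final String DEFAULT_COLOR = "WHITE";

    /** Returns the requested color, or the default when it is missing or malformed. */
    public static String resolveBgColor(String param) {
        String value = (param == null) ? "" : param.trim();
        // Accept only a plain color name or a #RRGGBB value; anything else is rejected,
        // which also keeps stray markup out of the BGCOLOR attribute.
        if (value.matches("[A-Za-z]+|#[0-9A-Fa-f]{6}")) {
            return value;
        }
        return DEFAULT_COLOR;
    }

    public static void main(String[] args) {
        System.out.println(resolveBgColor(null));
        System.out.println(resolveBgColor("YELLOW"));
        System.out.println(resolveBgColor("\"><script>"));
    }
}
```

A scriptlet could store the returned value in a local variable and a JSP expression could then insert that variable into the BODY tag.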

USING DECLARATIONS
A JSP declaration lets you define methods or fields that get inserted
into the main body of the servlet class. A declaration has the following form:
<%! Field or Method Definition %>
Since declarations do not generate output, they are normally used in
conjunction with JSP expressions or scriptlets. In principle, JSP declarations
can contain field (instance variable) definitions, method definitions, inner
class definitions, or even static initializer blocks: anything that is legal to put
inside a class definition but outside any existing methods. In practice,
declarations almost always contain field or method definitions.
We should not use JSP declarations to override the standard servlet life cycle
methods. The servlet into which the JSP page gets translated already makes
use of these methods. There is no need for declarations to gain access to
service, doGet, or doPost, since calls to service are automatically dispatched
to _jspService, which is where code resulting from expressions and
scriptlets is put. However, for initialization and cleanup, we can use jspInit
and jspDestroy: the standard init and destroy methods are guaranteed to call
these methods in the servlets that come from JSP.

6.5 TOMCAT

Tomcat 6.0 web server

Tomcat is an open source web server developed by the Apache Group.
Apache Tomcat is the servlet container that is used in the official
Reference Implementation for the Java Servlet and JavaServer Pages
technologies. The Java Servlet and JavaServer Pages specifications are
developed by Sun under the Java Community Process. Web servers
like Apache Tomcat support only web components, while an application
server supports web components as well as business components
(BEA's WebLogic is one of the popular application servers). To develop a
web application with JSP/servlets, install any web server like JRun or
Tomcat to run your application.

Fig Tomcat Webserver

TERMINOLOGY:
Context – a Context is a web application.
$CATALINA_HOME – This represents the root of the Tomcat
installation.
DIRECTORIES AND FILES:
/bin – Startup, shutdown, and other scripts. The *.sh files (for Unix
systems) are functional duplicates of the *.bat files (for Windows systems).
Since the Win32 command-line lacks certain functionality, there are some
additional files in here.

/conf – Configuration files and related DTDs. The most important file
in here is server.xml. It is the main configuration file for the container.

/logs – Log files are here by default.

/webapps – This is where webapps go.

INSTALLATION:
Tomcat will operate under any Java Development Kit (JDK)
environment that provides a JDK 1.2 (also known as Java2 Standard
Edition, or J2SE) or later platform. JDK is needed so that servlets, other
classes, and JSP pages can be compiled.

DEPLOYMENT DIRECTORIES FOR DEFAULT WEB APPLICATION:

HTML and JSP Files

 Main Location
$CATALINA_HOME/webapps/ROOT
 Corresponding URLs
http://host/SomeFile.html
http://host/SomeFile.jsp

 More Specific Location (Arbitrary Subdirectory)
$CATALINA_HOME/webapps/ROOT/SomeDirectory
 Corresponding URLs
http://host/SomeDirectory/SomeFile.html
http://host/SomeDirectory/SomeFile.jsp

Individual Servlet and Utility Class Files

 Main Location (Classes without Packages)
$CATALINA_HOME/webapps/ROOT/WEB-INF/classes
 Corresponding URL (Servlets)
http://host/servlet/ServletName

 More Specific Location (Classes in Packages)
$CATALINA_HOME/webapps/ROOT/WEB-INF/classes/packageName
 Corresponding URL (Servlets in Packages)
http://host/servlet/packageName.ServletName

Servlet and Utility Class Files Bundled in JAR Files

 Location
$CATALINA_HOME/webapps/ROOT/WEB-INF/lib
 Corresponding URLs (Servlets)
http://host/servlet/ServletName
http://host/servlet/packageName.ServletName

Chapter -7


TESTING

SOFTWARE TESTING

Testing
Software testing is a critical element of software quality assurance and
represents the ultimate review of specification, design and code generation.

7.1 TESTING OBJECTIVES

 To ensure that during operation the system will perform as per
specification.
 To make sure that the system meets the user requirements during
operation.
 To make sure that during operation, incorrect input, processing,
and output will be detected.
 To see that when correct inputs are fed to the system the outputs are
correct.
 To verify that the controls incorporated in the system function as
intended.
 Testing is a process of executing a program with the intent of finding
an error.
 A good test case is one that has a high probability of finding an as yet
undiscovered error.

The software developed has been tested using the
following testing strategies. Any errors encountered are corrected,
and the affected part of the program, procedure, or function is tested
again until all the errors are removed. A successful test is one that uncovers
an as yet undiscovered error.

Note that the results of system testing show whether the system is
working correctly. They give confidence to the system designer and the users
of the system, and help prevent frustration during the implementation process.


7.2 TEST CASE DESIGN:

White box testing

White box testing is a test case design method that uses
the control structure of the procedural design to derive test cases.
All independent paths in a module are exercised at least once, all
logical decisions are exercised on their true and false sides, all loops are
executed at their boundaries and within their operational bounds, and
internal data structures are exercised to ensure their validity. Here the
customer is given three chances to enter a valid choice from the given menu,
after which control exits the current menu.

Black Box Testing

Black box testing attempts to find errors in the following
categories: incorrect or missing functions, interface errors, errors in data
structures, performance errors, and initialization and termination errors. Here
all the input data must match the expected data type to become a valid entry.
The following are the different tests at various levels:

Unit Testing:
Unit testing is essentially for the verification of the code
produced during the coding phase, and the goal is to test the internal logic
of the module/program. In the Generic code project, unit testing is
done during the coding phase of the data entry forms to check whether the
functions are working properly. In this phase all the drivers are also tested
to check whether they are correctly connected.

Integration Testing:
All the tested modules are combined into subsystems, which are
then tested. The goal is to see if the modules are properly integrated,
with the emphasis on testing the interfaces between the modules.
In the generic code, integration testing is done mainly on the table creation
module and the insertion module.

Validation Testing

This testing concentrates on confirming that the software is error-free
in all respects. All the specified validations are verified and the software is
subjected to hard-core testing. It also aims at determining the degree of
deviation of the designed software from the specification; the deviations
are listed and corrected.
System Testing

This testing is a series of different tests whose primary purpose is to fully
exercise the computer-based system. This involves:
 Implementing the system in a simulated production environment and
testing it.
 Introducing errors and testing for error handling.


TEST CASE 1 :

Test case for Login form:

When a user tries to log in by submitting an incorrect ID or an
incorrect password, the system displays the error message “NOT A VALID
USER NAME”.

TEST CASE 2:

Test case for User Registration form:

When a user enters a user ID to register and that ID already exists,
the system displays the error message “USER ID ALREADY EXISTS”.

TEST CASE 3 :

Test case for Change Password:

When the old password does not match the new password, the system
displays the error message “OLD PASSWORD
DOES NOT MATCH WITH THE NEW PASSWORD”.

TEST CASE 4 :

Test case for Forget Password:

When a user forgets the password, the user is asked to enter the login name,
ZIP code, and mobile number. If these match the values already stored, the
user gets back the original password.
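
The checks behind test cases 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the class AccountChecksSketch, its in-memory map, and the "OK" return value are hypothetical and stand in for the project's actual forms and database tables. Only the two error messages are taken from the test cases above.

```java
import java.util.HashMap;
import java.util.Map;

public class AccountChecksSketch {
    // Hypothetical in-memory store standing in for the project's user table.
    private final Map<String, String> passwords = new HashMap<>();

    /** Test case 2: registering an ID that already exists yields the error message. */
    public String register(String userId, String password) {
        if (passwords.containsKey(userId)) {
            return "USER ID ALREADY EXISTS";
        }
        passwords.put(userId, password);
        return "OK"; // assumed success value
    }

    /** Test case 1: an unknown ID or a wrong password yields the error message. */
    public String login(String userId, String password) {
        String stored = passwords.get(userId);
        if (stored == null || !stored.equals(password)) {
            return "NOT A VALID USER NAME";
        }
        return "OK"; // assumed success value
    }

    public static void main(String[] args) {
        AccountChecksSketch accounts = new AccountChecksSketch();
        System.out.println(accounts.register("agent1", "pw"));
        System.out.println(accounts.register("agent1", "other"));
        System.out.println(accounts.login("agent1", "wrong"));
    }
}
```

The sketch keeps passwords in plain text only to stay short; a deployed system would normally store a salted hash instead.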

Chapter – 8

OUTPUT SCREENS

Distributor Login

Distributor Home Page

Distributor Send file

View Sent files

View Leak Files

Agent Home

View Files Sent By Distributor

View Key

Files Sent By Agent

Send File To Agent

Edit Account Details

User Registration

9. CONCLUSION

In a perfect world there would be no need to hand over sensitive data to


agents that may unknowingly or maliciously leak it. And even if we had to
hand over sensitive data, in a perfect world we could watermark each object
so that we could trace its origins with absolute certainty. However, in many
cases we must indeed work with agents that may not be 100% trusted, and
we may not be certain if a leaked object came from an agent or from some
other source, since certain data cannot admit watermarks. In spite of these
difficulties, we have shown it is possible to assess the likelihood that an
agent is responsible for a leak, based on the overlap of his data with the
leaked data and the data of other agents, and based on the probability that


objects can be “guessed” by other means. Our model is relatively simple, but
we believe it captures the essential trade-offs. The algorithms we have
presented implement a variety of data distribution strategies that can
improve the distributor’s chances of identifying a leaker. We have shown
that distributing objects judiciously can make a significant difference in
identifying guilty agents, especially in cases where there is large overlap in
the data that agents must receive. Our future work includes the investigation
of agent guilt models that capture leakage scenarios that are not studied in
this paper. For example, what is the appropriate model for cases where
agents can collude and identify fake tuples? A preliminary discussion of such
a model is available in Another open problem is the extension of our
allocation strategies so that they can handle agent requests in an online
fashion (the presented strategies assume that there is a fixed set of agents
with requests known in advance).


APPENDIX

ABBREVIATIONS:

HTML : Hyper Text Markup Language
JSCRIPT : JavaScript
DFD : Data Flow Diagrams
HTTP : Hyper Text Transfer Protocol
JDBC : Java Database Connectivity

FAQs

Question: What is JavaScript?

Answer: JavaScript is a compact, object-based scripting
language for developing client and server Internet
applications.

Client-side vs server-side JavaScript?

Client-side JavaScript is interpreted only within a browser
that supports it, and the code is visible to the user. Server-side
JavaScript is stored in a pre-compiled state on the server, so it is
browser-independent, and only the results of the JavaScript programs
are passed to the browser, so the code is never revealed.

Where can <script> container tags be placed within an HTML document?

In general, the <script> container tags may appear anywhere within
the HTML document. It is preferable to place the tags within the
<head> container.


10. BIBLIOGRAPHY

Good teachers are worth more than a thousand books; we have them in
our department.

References Made From:


1. User Interfaces in C#: Windows Forms and Custom Controls, by Matthew
MacDonald.

2. Applied Microsoft® .NET Framework Programming (Pro-Developer), by
Jeffrey Richter.

3. Practical .Net2 and C#2: Harness the Platform, the Language, and the
Framework, by Patrick Smacchia.

4. Data Communications and Networking, by Behrouz A. Forouzan.

5. Computer Networking: A Top-Down Approach, by James F. Kurose.

6. Operating System Concepts, by Abraham Silberschatz.

7. R. Agrawal and J. Kiernan. Watermarking relational databases. In VLDB
’02: Proceedings of the 28th international conference on Very Large Data
Bases, pages 155–166. VLDB Endowment, 2002.

8. P. Bonatti, S. D. C. di Vimercati, and P. Samarati. An algebra for
composing access control policies. ACM Trans. Inf. Syst. Secur., 5(1):1–35,
2002.

9. P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization
of data provenance. In J. V. den Bussche and V. Vianu, editors, Database
Theory - ICDT 2001, 8th International Conference, London, UK, January
4-6, 2001, Proceedings, volume 1973 of Lecture Notes in Computer
Science, pages 316–330. Springer, 2001.

10. P. Buneman and W.-C. Tan. Provenance in databases. In SIGMOD ’07:
Proceedings of the 2007 ACM SIGMOD international conference on
Management of data, pages 1171–1173, New York, NY, USA, 2007. ACM.

11. Y. Cui and J. Widom. Lineage tracing for general data warehouse
transformations. In The VLDB Journal, pages 471–480, 2001.

12. S. Czerwinski, R. Fromm, and T. Hodes. Digital music distribution and
audio watermarking.

13. F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li. An improved algorithm to
watermark numeric relational data. In Information Security
Applications, pages 138–149. Springer, Berlin / Heidelberg, 2006.

14. F. Hartung and B. Girod. Watermarking of uncompressed and compressed
video. Signal Processing, 66(3):283–301, 1998.

15. S. Jajodia, P. Samarati, M. L. Sapino, and V. S. Subrahmanian. Flexible
support for multiple access control policies. ACM Trans. Database Syst.,
26(2):214–260, 2001.

16. Y. Li, V. Swarup, and S. Jajodia. Fingerprinting relational databases:
Schemes and specialties. IEEE Transactions on Dependable and Secure
Computing, 02(1):34–45, 2005.

17. B. Mungamuru and H. Garcia-Molina. Privacy, preservation and
performance: The 3 p’s of distributed data management. Technical report,
Stanford University, 2008.

Sites Referred:
http://www.sourcefordgde.com
http://www.networkcomputing.com/
http://www.ieee.org
http://www.almaden.ibm.com/software/quest/Resources/
http://www.computer.org/publications/dlib
http://www.ceur-ws.org/Vol-90/
http://www.microsoft.com/isapi/redir.dll?prd=ie&pver=6&ar=msnhome

Abbreviations:

OOPS : Object Oriented Programming Concepts
TCP/IP : Transmission Control Protocol/Internet Protocol
CLR : Common Language Runtime
