Data Leakage Detection
By
Prisha Sondhi
RA2111026010391
Ravi Mounika krishna
RA2111026010411
Abstract
We study the following problem: A data distributor has given sensitive
data to a set of supposedly trusted agents (third parties). Some of the data are
leaked and found in an unauthorized place (e.g., on the web or somebody’s
laptop). The distributor must assess the likelihood that the leaked data came
from one or more agents, as opposed to having been independently gathered
by other means.
We propose data allocation strategies (across the agents) that improve
the probability of identifying leakages. These methods do not rely on
alterations of the released data (e.g., watermarks). In some cases, we can also
inject “realistic but fake” data records to further improve our chances of
detecting leakage and identifying the guilty party.
Introduction:
Data leakage is a growing security threat to organizations, particularly
when the leak is carried out by trusted agents. In this paper, we
present unobtrusive techniques for detecting data leakage and
assessing the “guilt” of agents. Watermarking is the long-established
technique for data leakage detection, but it requires some
modification of the original data.
To overcome the disadvantages of watermarking, data allocation
strategies are used to improve the chances of detecting a guilty agent.
The distributor “intelligently” allocates data, for both sample requests
and explicit requests, using allocation strategies that improve the
effectiveness of detecting a guilty agent.
Fake objects, encrypted with a private key, are designed to look like
real objects and are distributed to agents together with the requested
data. In this way, the distributor can identify the guilty agent who
leaked the data by decrypting that agent’s fake object.
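The idea of binding each fake object to one agent with a secret key can be sketched as follows. This is only an illustration under stated assumptions: it swaps the encryption described above for a keyed hash (HMAC) from the Python standard library, and the names `make_fake_tag`, `identify_agent`, the agent IDs, and the serial-number scheme are hypothetical, not part of the project.

```python
import hashlib
import hmac
from typing import List, Optional


def make_fake_tag(secret_key: bytes, agent_id: str, serial: int) -> str:
    """Derive a realistic-looking token for one fake record.

    Assumption: the token is stored in some field of the fake record.
    Only the distributor, who holds secret_key, can recompute it.
    """
    message = f"{agent_id}:{serial}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


def identify_agent(secret_key: bytes, leaked_tag: str,
                   agent_ids: List[str], max_serial: int) -> Optional[str]:
    """Return the agent whose fake record carries leaked_tag, or None."""
    for agent_id in agent_ids:
        for serial in range(max_serial):
            candidate = make_fake_tag(secret_key, agent_id, serial)
            # Constant-time comparison of the two hex strings.
            if hmac.compare_digest(candidate, leaked_tag):
                return agent_id
    return None
```

A distributor would call `make_fake_tag` when preparing an agent's data set and `identify_agent` when a suspicious record turns up in a leak. A keyed hash cannot be forged or recognized by an agent without the key, which is the property the fake objects rely on.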
Aim of the Project
The aim of the project is to solve the data allocation problem and to send
data securely to third-party agents. Our goal is to detect when the distributor’s sensitive
data has been leaked by agents, and if possible to identify the agent that leaked the
data. We develop unobtrusive techniques for detecting leakage of a set of objects or
records.
Watermarks can be very useful in some cases, but again, involve some
modification of the original data. Furthermore, watermarks can sometimes be
destroyed if the data recipient is malicious.
For example, a hospital may give patient records to researchers who will devise new
treatments. Similarly, a company may have partnerships with other companies
that require sharing customer data.
Another enterprise may outsource its data processing, so data must be given
to various other companies. We call the owner of the data the distributor and
the supposedly trusted third parties the agents.
Disadvantages of Existing System
Cannot detect leaked data
Cannot detect the source of leaked data
Not secure
Can lead to huge losses
PROPOSED SYSTEM:
Our goal is to detect when the distributor’s sensitive data has been leaked by agents,
and if possible to identify the agent that leaked the data. Perturbation is a very useful
technique in which the data is modified and made “less sensitive” before being handed to
agents, but it alters the original data. We therefore develop unobtrusive techniques for
detecting leakage of a set of objects or records.
In this section we develop a model for assessing the “guilt” of agents. We also present
algorithms for distributing objects to agents, in a way that improves our chances of
identifying a leaker.
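One way to make the “guilt” assessment concrete is sketched below. The text above does not state a formula, so this is an assumption: each leaked object is taken to have been obtained independently by other means with probability `p`, and otherwise to have come, with equal likelihood, from one of the agents that hold it. The function name and the sample data are hypothetical.

```python
from typing import Dict, Set


def guilt_probability(leaked: Set[str],
                      agent_objects: Dict[str, Set[str]],
                      agent: str,
                      p: float) -> float:
    """Estimate Pr(agent leaked at least one object | leaked set).

    p is the assumed probability that a leaked object was gathered
    independently, without any agent's help.
    """
    prob_not_guilty = 1.0
    for obj in leaked & agent_objects[agent]:
        # Number of agents that were given this object.
        holders = sum(1 for objs in agent_objects.values() if obj in objs)
        # Chance that this particular agent is the one who leaked obj.
        prob_leaked_by_agent = (1.0 - p) / holders
        prob_not_guilty *= 1.0 - prob_leaked_by_agent
    return 1.0 - prob_not_guilty


# Hypothetical allocation: A1 and A2 share t1; t2 went only to A1.
agent_objects = {"A1": {"t1", "t2"}, "A2": {"t1", "t3"}}
leaked = {"t1", "t2"}
estimates = {a: guilt_probability(leaked, agent_objects, a, p=0.5)
             for a in agent_objects}
```

Under these assumptions the agent that alone holds a leaked object receives the higher estimate, which is the intuition behind allocating data so that agents' sets overlap as little as possible.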
Finally, we also consider the option of adding “fake” objects to the distributed set.
Such objects do not correspond to real entities but appear realistic to the agents.
In a sense, the fake objects act as a type of watermark for the entire set, without
modifying any individual members. If it turns out an agent was given one or more fake
objects that were leaked, then the distributor can be more confident that agent was
guilty.
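The check described above, finding which agents' fake objects appear in a leaked set, can be sketched as a set intersection. This is a minimal illustration; the function name and the record identifiers are hypothetical, and it assumes the distributor kept a record of which fake objects went to which agent.

```python
from typing import Dict, Set


def agents_with_leaked_fakes(leaked: Set[str],
                             fakes_by_agent: Dict[str, Set[str]]
                             ) -> Dict[str, Set[str]]:
    """Map each agent to the fake objects of theirs found in the leak.

    Agents with no fake object in the leaked set are omitted.
    """
    hits = {}
    for agent, fakes in fakes_by_agent.items():
        found = fakes & leaked
        if found:
            hits[agent] = found
    return hits


# Hypothetical bookkeeping: one distinct fake record per agent.
fakes_by_agent = {"A1": {"fake-17"}, "A2": {"fake-42"}, "A3": {"fake-99"}}
leaked = {"real-3", "real-8", "fake-42"}
suspects = agents_with_leaked_fakes(leaked, fakes_by_agent)
```

Because a fake object does not correspond to a real entity, it is unlikely to have been gathered independently, so a hit is stronger evidence than a leaked real record.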
Advantages of Proposed System:
We can secure our data during its distribution or
transmission, and we can detect if it gets leaked
We implement a variety of data distribution strategies
that can improve the distributor’s chances of identifying a leaker
Quick response time
Customized processing
Small memory factor
Highly secure
Replication in heterogeneous databases
Easy updating.
MODULES:
1. Data Allocation Module:
The main focus of our project is the data allocation problem: how can
the distributor “intelligently” give data to agents in order to improve the chances
of detecting a guilty agent? The admin can send files to authenticated users,
and users can edit their account details. Each agent receives the secret key details by
mail.
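One possible way to allocate data “intelligently” for sample requests is sketched below. It is a simple greedy heuristic chosen for illustration, not necessarily the strategy used in this project: each agent asks for a number of objects, and the distributor hands out the objects that have so far been shared with the fewest agents, so that agents' sets overlap as little as possible. All names and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def allocate_samples(objects: List[str],
                     requests: Dict[str, int]) -> Dict[str, Set[str]]:
    """Give each agent the requested number of least-shared objects."""
    share_count = {obj: 0 for obj in objects}
    allocation: Dict[str, Set[str]] = {}
    for agent, wanted in requests.items():
        if wanted > len(objects):
            raise ValueError(f"{agent} requested more objects than exist")
        # Prefer objects held by the fewest agents; break ties by name
        # so the result is deterministic.
        ranked = sorted(objects, key=lambda obj: (share_count[obj], obj))
        chosen = ranked[:wanted]
        for obj in chosen:
            share_count[obj] += 1
        allocation[agent] = set(chosen)
    return allocation


# Hypothetical request: three agents sampling from four objects.
objects = ["t1", "t2", "t3", "t4"]
allocation = allocate_samples(objects, {"A1": 2, "A2": 2, "A3": 1})
```

With little overlap between agents, a leaked object points to fewer possible sources, which makes a guilty agent easier to single out.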
4. Data Distributor:
A data distributor has given sensitive data to a set of supposedly
trusted agents (third parties). Some of the data is leaked and found in an
unauthorized place (e.g., on the web or somebody’s laptop). The distributor
must assess the likelihood that the leaked data came from one or more agents,
as opposed to having been independently gathered by other means. The admin can
view which file was leaked and the fake user’s details.
Flowchart (nodes as listed): Start; User Login; Admin / Client; Authentication; Stop
Advantages & Disadvantages:
Disadvantages of Existing System:
Cannot detect leaked data
Cannot detect the source of leaked data
Not secure
Can lead to huge losses