
Sanjay REPORT

The document presents a probability-based model for detecting data leakage in smart cities, addressing the critical issue of protecting sensitive information from unauthorized access. It introduces a Bigraph-based approach to allocate data among various agents and employs a guilt identification model to determine the responsible party in case of data breaches. The proposed framework is evaluated through experiments, demonstrating its effectiveness in identifying guilty agents and ensuring data security.

A Probability based Model for Big Data Security in Smart City

Vishal Dattana
Department of Computing, Middle East College, Muscat, Sultanate of Oman
Email: [email protected]

Kishu Gupta
Department of Computer Science & Applications, Kurukshetra University, Kurukshetra-136119, India
Email: [email protected]

Ashwani Kush
University College, Kurukshetra University, Kurukshetra-136119, India
Email: [email protected]

Abstract: Smart technologies have facilitated the generation and collection of huge volumes of data on a daily basis. This data is highly sensitive and diverse, covering personal, organisational, environmental, energy, transport and economic data. Data analytics provides solutions to various issues faced by smart cities, such as crisis response, disaster resilience, emergency management and smart traffic management; it requires the distribution of sensitive data among various entities within or outside the smart city. Sharing sensitive data creates a need for efficient usage of smart city data, so that smart applications and utilities reach end users in a trustworthy and safe manner. If this shared sensitive data is leaked, it can cause damage and severe risk to the city's resources. Protecting critical data from unauthorized disclosure is the biggest issue for the success of any project. Data Leakage Detection provides a set of tools and technologies that can efficiently resolve the concerns related to critical smart city data. This paper showcases an approach to detect leakage that is caused intentionally or unintentionally. The model represents the allotment of data objects among diverse agents using a Bigraph. The objective is to secure critical data by revealing the guilty agent who caused the data leakage.

Keywords: Big data; Bigraph; Data Analytics; Data Leakage; Guilt Model; IoT; Smart City.

I. INTRODUCTION

Innovation in communication technology has enabled organizations to keep a record of nearly every activity or event occurring within their premises. Big data does not simply mean a huge volume of data collected through sensors; it is data available to be analyzed using advanced tools to endow a city with smartness by determining trends, opportunities and the associated risks. A city possessing intelligent infrastructure in terms of social, economic and physical parameters is considered a smart one [1], [2]. The most crucial concern about confidential smart city data at present is data breaching, which hampers the privacy and security of crucial data. This gigantic volume of sensitive strategic data must be protected from data leakage [3]. The existing scenario of rapid growth requires the sharing of an entity's sensitive data among diverse stakeholders within or outside the organization's (here, the city's) premises for analysis [4], [5], [6]. However, the receiving entity may misuse this data and leak it, deliberately or by mistake, to some unauthorized third party [7], [8]. Data leakage is defined as the deliberate or accidental distribution of sensitive information or data to an unauthorized malicious entity [9]. Critical data in various organizations, as shown in Fig. 1 [10], includes Intellectual Property (IP), demographic information, infrastructure details, public sector data, financial information and various other information depending upon the city [11].

Figure 1. Smart city architecture.

Data leakage poses a big challenge and a great threat to an organization's confidentiality because, as the count of breaches increases, the cost incurred due to these leakages also continues to increase [12], [13], [14]. It is essential to protect confidential information, since leakage increases the risk of sensitive information falling into unauthorized hands, where it can be misused by an unauthorized third party [15], [16]. Thus, it has become essential for any organization to detect and prevent such leakage [17]. However, restricting data sharing in order to regulate the security and privacy of sensitive information might reduce the organization's growth [18].

Traditional approaches to data leakage detection, such as watermarking and steganography, involve modification of the original data [19], [20], [21]. As an alternative, this paper presents a model that identifies the malicious guilty agent who caused the leakage of critical information and provides security to safeguard the sensitive information. The model predicts the guilty agent by observing the pattern of data allocation among various agents. In the model, the distributor allocates the requested data items among various agents, and this allocation is represented through a Bigraph. If, after receiving the crucial data, an agent discloses it to some malicious third party and the data is later found at some unauthorized place, a leakage detection mechanism is used to unveil the leaker. The paper is structured as follows. Section II presents the guilt identification model. Section III discusses the model in detail: it describes the allotment of required data objects among diverse agents and computes the probability of an entity being guilty. Section IV provides the experimental results, followed by the conclusion.

II. GUILTY AGENT IDENTIFICATION MODEL

This section introduces some basic definitions that form the basis of the model and then presents an abstract view of the proposed model.

• Entities and Agents

A distributor has to distribute the data $\mathcal{D} = \{D_1, D_2, \ldots, D_n\}$ among various agents $\mathcal{A} = \{A_1, A_2, \ldots, A_m\}$ and does not want the data to be leaked to any malicious entity. An agent $A_j$ makes a request to the data distributor for the required data and in response receives a subset of data objects $X_j \subseteq \mathcal{D}$. An agent is considered a Guilty Agent $G_A$ if it leaks its allotted data $X_j$ to any unauthorized party that can misuse the crucial data.

• Bigraph

A graph $G(U, V, E)$ is considered a Bigraph if its vertices can be divided into two disjoint sets $U$ and $V$ such that $U \cap V = \emptyset$, and the set of edges from set $U$ to set $V$ is represented by $E$. For any $u', u'' \in U$, $e(u', u'') \notin E$, and for any $v', v'' \in V$, $e(v', v'') \notin E$. In other words, no edge $e$ can exist between two vertices of the same set.

The model employs a probability estimation approach to identify the guilty agent. Furthermore, the scheme introduces a strong cryptography technique to secure the protocol. The conceptual structure of the model is shown in Fig. 2. The distributor shares the data objects $D_i$ $(1 \le i \le n)$ among various agents $A_j$ $(1 \le j \le m)$ according to their demand. If any agent later leaks the data at some unauthorized place and the distributor finds it, a leakage recognition technique is applied to unveil the agent responsible for the data leakage, i.e., the guilty agent. A probability is calculated to assess the likelihood of each entity being guilty by comparing the data allocated to the various agents, and the guilty agent is then identified.

Figure 2. Guilt Detection Model.
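The Bigraph conditions defined in this section can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' implementation: the function names `is_bigraph` and `allocated_sets`, the string labels and the edge list are hypothetical, and edges are assumed to be stored as ordered pairs running from a data object to an agent.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # assumed orientation: (vertex in U, vertex in V)


def is_bigraph(u: Set[str], v: Set[str], edges: Iterable[Edge]) -> bool:
    """Check the two Bigraph conditions: U and V are disjoint, and every
    edge runs from a vertex of U to a vertex of V."""
    if u & v:
        return False
    return all(a in u and b in v for a, b in edges)


def allocated_sets(agents: Set[str], edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Collect X_j, the data objects allocated to each agent, from the edges."""
    x: Dict[str, Set[str]] = {agent: set() for agent in agents}
    for data_object, agent in edges:
        x[agent].add(data_object)
    return x


# Hypothetical example: three data objects shared between two agents.
data = {"D1", "D2", "D3"}
agents = {"A1", "A2"}
edges = [("D1", "A1"), ("D2", "A1"), ("D2", "A2"), ("D3", "A2")]
```

An edge such as `("D1", "D2")` joins two vertices of the same set, so adding it makes `is_bigraph` return `False`.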

III. DATA DISTRIBUTIONS & PROBABILITY COMPUTATION

Agent $A_j$ sends the request $R_q$ for the required data objects to the distributor. The distributor checks the background of the agent to determine whether it is a trustworthy agent, and whether the required document is available in the database $D_b$. To secure the data object, the distributor encrypts the document and then provides it to the agent. Similarly, the distributor fulfills the requests of all the agents by checking their malicious record and the availability of data maintained in the database $D_b$. The allocation and distribution of data among various agents is represented using a directed Bigraph $G(\mathcal{D}, \mathcal{A}, E)$. If any data object $D_i$ is allocated to an agent $A_j$, then an edge $e$ exists between the nodes $D_i$ and $A_j$, where $D_i \in \mathcal{D}$, $A_j \in \mathcal{A}$ and $e \in E$. The directed Bigraph can be represented as a matrix, as shown in Eq. (1). Matrix $B$ is an $n \times m$ matrix, where $n$ is the number of data objects and $m$ is the number of agents.

The entry $b_{ij} \in B$ in Eq. (2) is 1 if there exists an edge between data object $D_i$ and agent $A_j$, for $1 \le i \le n$ and $1 \le j \le m$. In Eq. (3A), $C_{D_i}$ represents the number of agents to whom data object $D_i$ is allocated, and in Eq. (3B), $R_{A_j}$ represents the number of fulfilled requests of agent $A_j$.

The encrypted document $\xi^{*}(D_i)$ is passed to the agent $A_j$ and then decrypted by it. If any agent $A_j$ later leaks the data at some unauthorized place and the distributor discovers it, a detection technique is applied to find the guilty agent $G_A$ by calculating a probability on the basis of the data allocated to the various agents. Let $G_{A_j}$ denote the event that agent $A_j$ is the guilty agent $G_A$; the probability $P_b$ of agent $A_j$ being a guilty agent is to be computed. $P_b\{G_{A_j} \mid \mathcal{L}\}$ is the probability when the leak dataset $\mathcal{L}$ is given.

It is assumed that $\forall D_i \in \mathcal{L}$, where $i \in \{\upsilon'_1, \upsilon'_2, \ldots, \upsilon'_n\}$, there are only two possibilities. Either a single agent from the set $\mathcal{Z}_{D_i} = \{A_j \mid D_i \in X_j\}$ has leaked object $D_i$ to the target $t$, where $\mathcal{Z}_{D_i}$ is the set of agents having $D_i$ in their allocated dataset $X_j$, $\forall j \in \{1, 2, \ldots, m\}$; or the target $t$ retrieved the data object $D_i$ by guessing or through some other means, without the intervention of any agent $A_j$. The probability of leaking a data object $D_i$ to the leak dataset $\mathcal{L}$, i.e., $P_b\{\text{leak } D_i \text{ to } \mathcal{L}\}$, is equal $\forall A_j \in \mathcal{Z}_{D_i}$ if the object is leaked by an agent $A_j \in \mathcal{Z}_{D_i}$; otherwise $P_b\{\text{leak } D_i \text{ to } \mathcal{L}\}$ is $\alpha$ if the object is obtained by the target $t$ itself. The decision of $A_j$ to leak a data object $D_i$ is considered independent of the leaking of any other data object $D_{i'}$, $\forall D_i, D_{i'} \in \mathcal{L}$ where $D_i \ne D_{i'}$. The probability $P_b\{G_{A_j} \mid \mathcal{L}\}$ of the agent $A_j$ being a guilty agent $G_A$ is computed as given in Eq. (4).

If $A_j$ leaks all the data objects from its allocated set $X_j$, such that $\mathcal{L} = X_j$, then we compute the probability $P_b\{G_{A_j} \mid X_j\}$ of $A_j$ being $G_A$. We define a difference function $\vartheta^{*}_{(j,k)}(G_A)$, given in Eq. (5), to maximize the possibility of identifying the $G_A$ who leaked all its data.

$$\vartheta^{*}_{(j,k)}(G_A) = P_b\{G_{A_j} \mid X_j\} - P_b\{G_{A_k} \mid X_j\} \quad \forall\, j, k \in \{1, 2, \ldots, m\} \tag{5}$$

To evaluate and analyze the performance of the proposed approach, we find $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$ as shown in Eq. (6) and Eq. (7) respectively. The pseudo code for the proposed framework is given in Algorithm 1.
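The quantities described in this section can be illustrated with a short Python sketch. Eq. (4), Eq. (6) and Eq. (7) are not reproduced in this text, so the sketch relies on three assumptions rather than on the authors' exact formulas: the guilt probability uses the product form of Papadimitriou and Molina [4], which is consistent with the equal-share and guessing assumptions stated above; $R_{A_j}$ is read as the column sum of $B$; and $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$ are read as the average and minimum of Eq. (5) over ordered pairs $j \ne k$. All function names and the tiny allocation are hypothetical.

```python
from itertools import permutations
from math import prod
from typing import Dict, List, Set

Matrix = List[List[int]]


def allocation_matrix(n: int, m: int, alloc: Dict[int, Set[int]]) -> Matrix:
    """n x m matrix B: b[i][j] is 1 when data object i is allocated to agent j."""
    return [[1 if i in alloc.get(j, set()) else 0 for j in range(m)] for i in range(n)]


def object_counts(b: Matrix) -> List[int]:
    """C_{D_i}: number of agents holding each data object (row sums of B)."""
    return [sum(row) for row in b]


def request_counts(b: Matrix) -> List[int]:
    """R_{A_j}, ASSUMED here to be the column sums of B."""
    return [sum(col) for col in zip(*b)]


def guilt_probability(j: int, leaked: Set[int], b: Matrix, alpha: float) -> float:
    """ASSUMED form of Eq. (4), taken from [4]:
    1 - product over D_i in (L intersect X_j) of (1 - (1 - alpha) / C_{D_i})."""
    c = object_counts(b)
    return 1.0 - prod(1.0 - (1.0 - alpha) / c[i] for i in leaked if b[i][j] == 1)


def difference(j: int, k: int, b: Matrix, alpha: float) -> float:
    """Eq. (5): P_b{G_Aj | X_j} - P_b{G_Ak | X_j}, where the leak set is X_j."""
    x_j = {i for i, row in enumerate(b) if row[j] == 1}
    return guilt_probability(j, x_j, b, alpha) - guilt_probability(k, x_j, b, alpha)


# Hypothetical allocation: agent 0 holds objects {0, 1}, agent 1 holds {1, 2}.
b = allocation_matrix(3, 2, {0: {0, 1}, 1: {1, 2}})
pairs = list(permutations(range(2), 2))        # ordered pairs with j != k
diffs = [difference(j, k, b, alpha=0.5) for j, k in pairs]
avg_diff = sum(diffs) / len(diffs)             # ASSUMED reading of Eq. (6)
min_diff = min(diffs)                          # ASSUMED reading of Eq. (7)
```

Under these assumptions, with $\alpha = 0.5$ and a leak set containing objects 0 and 1, agent 0 obtains a guilt probability of 0.625 and agent 1 obtains 0.25. Agent 0 scores higher because object 0 is held by no other agent.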
IV. EXPERIMENTAL RESULT

The proposed model is implemented in C/C++, and the experiments are conducted in a simulated environment for the data leakage problem. The performance of the framework is evaluated by computing the probability against a parameter called the weight factor $\mathcal{WF}$. The weight factor is the ratio of the total number of data objects that have been allocated, summed over all agents, to the total number of data objects available for allocation.
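As a rough illustration of the weight factor definition above, the following Python sketch computes it from a list of allocated sets. The function name `weight_factor` and the allocation are hypothetical. The allocation was simply chosen so that 985 objects are handed out from a pool of 200, which reproduces the value $\mathcal{WF} = 4.925$ quoted in the results. It is not the allocation used in the authors' experiments.

```python
from typing import List, Set


def weight_factor(allocations: List[Set[int]], total_objects: int) -> float:
    """Ratio of allocated data objects, summed over all agents, to the
    number of data objects available for allocation."""
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    return sum(len(x) for x in allocations) / total_objects


# Hypothetical allocation: 49 agents hold 20 objects each and one agent holds 5,
# so 49 * 20 + 5 = 985 objects are allocated out of 200 available.
allocations = [set(range(20)) for _ in range(49)] + [set(range(5))]
wf = weight_factor(allocations, total_objects=200)
```

Because the same object can be allocated to several agents, the weight factor can exceed 1, as it does here.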
Figure 3. Evaluation of probability $P_b$ to find guilty agent $G_A$ when (a) $\alpha = 0.1$ (b) $\alpha = 0.5$ (c) $\alpha = 0.9$ (d) $|\mathcal{L}|$ varies.

In this experimental scenario, the numbers of data objects and agents are fixed at $|\mathcal{D}| = 200$ and $|\mathcal{A}| = 50$ in $G(\mathcal{D}, \mathcal{A}, E)$. In every scenario, some or all agents send requests for data objects, and the weight factor increases every time. The number of requests per agent is chosen from the interval $[1, 6]$ in each scenario. At the end, each agent holds between 1 and 25 data objects.

The first experiment considers 100 objects in the leak dataset, i.e., $|\mathcal{L}| = 100$, and varies the weight factor $\mathcal{WF}$ and the guessing probability $\alpha$. In Fig. 3, $P1$ represents $P_b\{G_{A_j} \mid \mathcal{L}\}$, the average probability when the leak dataset $\mathcal{L}$ is given. $P2$ is the curve for $P_b\{G_{A_j} \mid X_j\}$, the average probability when an agent leaks its entire allocated dataset. $PP1$ and $PP2$ are the curves for the performance parameters $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$ respectively. The average success rate of detecting the guilty agent is represented by $\bar{\vartheta}^{*}$, and $\min \vartheta^{*}$ represents the detection rate in the case where two agents have the same probability of being guilty.

From Fig. 3, it is observed that the values of $P_b\{G_{A_j} \mid \mathcal{L}\}$, $P_b\{G_{A_j} \mid X_j\}$, $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$ decrease as the weight factor $\mathcal{WF}$ increases, i.e., the probability of identifying the guilty agent decreases as the weight factor grows. In Fig. 3a, $P_b\{G_{A_j} \mid \mathcal{L}\} = 0.792717$, $P_b\{G_{A_j} \mid X_j\} = 0.952254$, $\bar{\vartheta}^{*} = 0.732435$ and $\min \vartheta^{*} = 0.27591$ when $\mathcal{WF} = 4.925$. This shows that the probability remains very high even when the weight factor $\mathcal{WF}$ is high.

It is also observed that when $\alpha$ is small, the values of all four evaluation parameters, i.e., $P_b\{G_{A_j} \mid \mathcal{L}\}$, $P_b\{G_{A_j} \mid X_j\}$, $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$, are high. The reason is that guessing the data is unlikely: it is more likely that some agent has leaked the data objects, and each agent holds a large share of the leaked data. Fig. 3b and Fig. 3c show that as the value of $\alpha$ increases, the probability of identifying the malicious entity, the average success rate and the detection rate all decrease, because it becomes more and more likely that the data was guessed by the target, and the agents' guilt probability decreases accordingly. From Fig. 3b, when $\alpha = 0.5$ and $\mathcal{WF} = 4.925$, $P_b\{G_{A_j} \mid \mathcal{L}\} = 0.640587$, $P_b\{G_{A_j} \mid X_j\} = 0.874787$, $\bar{\vartheta}^{*} = 0.728014$ and $\min \vartheta^{*} = 0.38134$. The experimental result shows that the probability of detecting the malicious entity is high even when both the weight factor $\mathcal{WF}$ and the guessing probability $\alpha$ are high. In Fig. 3c, where $\alpha$ is very high at 0.9, the average success rate is still 0.300168.

In the next experiment, $\alpha$ is fixed at 0.3 and the probability $P_b\{G_{A_j} \mid \mathcal{L}\}$ is evaluated at various loads by varying the number of objects in the leak dataset $\mathcal{L}$. Fig. 3d shows the curve for $P_b\{G_{A_j} \mid \mathcal{L}\}$ when the size of $\mathcal{L}$ varies, i.e., $|\mathcal{L}| \in \{50, 100, 150, 200\}$. For a fixed allocation of the dataset among agents, i.e., when the allocated dataset $X_j$ of each agent $A_j$ $(1 \le j \le m)$ remains the same, it is observed that as the amount of data in the leak dataset increases, the probability of identifying the malicious entity also increases. This can be presented as evidence against the leaker responsible for the data leakage. The method detects the guilty agent at a high rate, which indicates the effectiveness of the scheme.

V. CONCLUSION

A smart city is data driven: big data collected by ubiquitous smart things such as sensors and audio-visual cameras transforms the lives of residents by enabling many smart and intelligent applications and by aiding decision making. Successful implementation of the smart city concept depends on the efficient usage and security of sensitive data. The given model addresses the data leakage problem using a guilty agent identification model to detect leakages that are caused intentionally or unintentionally. It finds the chance of an agent being guilty by computing a probability that depends on the data allocated among diverse agents through a Bigraph. The information leaker is identified by comparing the calculated probabilities of leaking the data, and the confidential information is thereby preserved. Future efforts could improve the security of the most sensitive information by considering a threshold value.

REFERENCES

[1] A. Ismail, "Utilizing Big Data Analytics as a solution for Smart Cities," in 3rd MEC International Conference on Big Data and Smart City, 2016, pp. 1-5.

[2] A. Sharif, J. Li, M. Khalil, R. Kumar, and M. I. Sharif, "Internet of Things- Smart Traffic Management System for Smart Cities using Big Data Analytics," in IEEE, 2017, pp. 281-284.

[3] C. Xu, X. Huang, J. Zhu, and K. Zhang, "Research on the Construction of Sanya Smart Tourism City based on Internet and Big Data," in International Conference on Intelligent Transportation, Big Data & Smart City, 2018, pp. 125-128.

[4] P. Papadimitriou and H. G. Molina, "Data Leakage Detection," IEEE Transaction on Knowledge and Data Engineering, vol. 23, no. 1, pp. 51-63, January 2011.

[5] J. Croft and M. Caesar, "Towards Practical Avoidance of Information Leakage in Enterprise Networks," in 6th USENIX conference Hot Topics Security (HotSec), CA, USA, 2011, p. 7.

[6] I. Gupta and A. K. Singh, "A Probabilistic Approach for Guilty Agent Detection using Bigraph after Distribution of Sample Data," in Procedia Computer Science, vol. 125, 2018, pp. 662-668.

[7] K. Kaur, I. Gupta, and A. K. Singh, "A Comparative Evaluation of Data Leakage/Loss prevention Systems (DLPS)," in 4th International Conference on Computer Science & Information Technology (CS & IT-CSCP), Dubai, UAE, 2017, pp. 87-95.

[8] M. Backes, N. Grimm, and A. Kate, "Lime: Data Lineage in the Malicious Environment," in 10th International Workshop Security Trust Management, 2014, pp. 183-187.

[9] A. Kumar, A. Goyal, A. Kumar, N. K. Chaudhary, and S. S. Kamath, "Comparative Evaluation of Algorithms for Effective Data Leakage Detection," in IEEE Conference on Information and Communication Technologies (ICT 2013), vol. 13, 2013, pp. 177-182.

[10] S. Sholla, R. Naaz, and M. A. Chishti, "Semantic Smart City: Context Aware Application Architecture," in 2nd International Conference on Electronics, Communication and Technology (ICECA), 2018, pp. 721-724.

[11] A. Shabtai, Y. Elovici, and L. Rokach. New York: Springer, 2012, ch. Introduction to Information Security and Data Leakage, pp. 1-87.

[12] X. Shu and D. Yao, "Data Leak Detection as a Service," in Springer, International Conference on Security and Privacy in Communication Systems, 2012, pp. 222-240.

[13] F. Liu, X. Shu, D. Yao, and A. R. Butt, "Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce," in 5th ACM Conference Data Application Security, Privacy (CODASPY), Texas, USA, 2015, pp. 195-206.

[14] X. Shu, J. Zhang, D. Yao, and W. C. Feng, "Fast Detection of Transformed Data Leaks," IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 528-542, March 2016.

[15] M. Gafny, A. Shabtai, L. Rokach, and Y. Elovici, "Detecting Data Misuse by Applying Context-Based Data Linkage," ACM workshop Insider Threats, pp. 3-12, 2010.

[16] K. Kaur, I. Gupta, and A. K. Singh, "A Comparative Study of the Approach Provided for Preventing the Data Leakage," vol. 9, no. 5, pp. 21-33, 2017.

[17] X. Shu and D. Yao, "Privacy-Preserving Detection of Sensitive Data Exposure," IEEE Transactions on Information Forensics and Security, vol. 10, no. 5, pp. 1092-1103, May 2015.

[18] A. Harel, A. Shabtai, L. Rokach, and Y. Elovici, "M-Score: A Misuseability Weight Measure," IEEE: Dependable Secure Comput., vol. 9, no. 3, pp. 414-428, 2012.

[19] K. Gupta and A. Kush, "A Review on Data Leakage Detection for Secure," International Journal of Engineering and Advanced Technology (IJEAT), vol. 7, no. 1, pp. 153-159, October 2017.

[20] K. Gupta and A. Kush, "Performance Evaluation on Data Leakage Detection for Secure Communication," in 5th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 2018, pp. 3957-3960.

[21] K. Kaur, I. Gupta, and A. K. Singh, "Data Leakage Prevention: E-Mail Protection via Gateway," in IOP Conf. Series: Journal of Physics: Conf. Series, 2017, pp. 1-5.
