A Probability Based Model For Big Data Security in Smart City
Vishal Dattana
Department of Computing, Middle East College, Muscat, Sultanate of Oman
Email: [email protected]

Kishu Gupta
Department of Computer Science & Applications, Kurukshetra University, Kurukshetra-136119, India
Email: [email protected]

Ashwani Kush
University College, Kurukshetra University, Kurukshetra-136119, India
Email: [email protected]
Abstract— Smart technologies have facilitated the generation and collection of huge volumes of data on a daily basis. This includes highly sensitive and diverse data such as personal, organisational, environmental, energy, transport and economic data. Data analytics provides solutions for various issues faced by smart cities, such as crisis response, disaster resilience, emergency management and smart traffic management; it requires distribution of sensitive data among various entities within or outside the smart city. Sharing sensitive data creates a need for efficient usage of smart city data to provide smart applications and utility to end users in a trustworthy and safe manner. If this shared sensitive data is leaked, it can cause damage and severe risk to the city's resources. Protecting critical data from unauthorized disclosure is the biggest issue for the success of any project. Data Leakage Detection provides a set of tools and technologies that can efficiently resolve the concerns related to smart city critical data. This paper presents an approach to detect leakage that is caused intentionally or unintentionally. The model represents the allocation of data objects among diverse agents using a Bigraph. The objective is to secure critical data by revealing the guilty agent who caused the data leakage.

I. INTRODUCTION

Innovation in communication technology has enabled organizations to keep a record of nearly every activity or event that occurs within their premises. Big data does not simply mean a huge volume of data collected through sensors; it is data available to be analyzed using advanced tools to endow smartness to a city by determining trends, opportunities and various associated risks. A city with intelligent infrastructure in terms of social, economic and physical parameters is considered a smart one [1], [2].

The most crucial concern regarding smart city confidential data at present is data breaching, which hampers the privacy and security of crucial data. This gigantic volume of sensitive strategic data must be protected from data leakage [3].

The existing scenario of rapid growth requires sharing an entity's sensitive data among diverse stakeholders within or outside the organization's (here, the city's) premises for analysis purposes [4], [5], [6]. However, the receiving entity may misuse this data and leak it, deliberately or by mistake, to an unauthorized third party [7], [8]. Data leakage is defined as the deliberate or accidental distribution of sensitive information or data to an unauthorized malicious entity [9]. Critical data in various organizations, as shown in Fig. 1 [10], includes Intellectual Property (IP), demographic information, infrastructure details, public sector data, financial information and various other information depending upon the city [11].

Figure 1. Smart city architecture.

Data leakage poses a big challenge and a great threat to organizational confidentiality because, as the number of breaches increases, the cost incurred due to these leakages also continues to increase [12], [13], [14]. It is essential to protect confidential information, as leakage increases the risk of sensitive information falling into unauthorized hands, where it can be misused by an unauthorized third party [15], [16]. Thus, it has become essential for any organization to detect and prevent such leakage [17]. On the other hand, restricting data sharing to regulate the security and privacy of sensitive information might reduce the organization's growth [18].
[Figure: block diagram with the blocks Distributor, Database, Database Maintenance, Data Allocation & Distribution, Probability Calculation, and Leak the data.]
Figure 3. Evaluation of probability $P_b$ to find guilty agent $GA$ when (a) $\alpha = 0.1$, (b) $\alpha = 0.5$, (c) $\alpha = 0.9$, (d) $|\mathcal{L}|$ varies. Each panel plots probability against weight factor.
$$\mathcal{WF} = \frac{\sum_{j=1}^{m} R_{A_j}}{|\mathcal{D}|}$$

In this experimental scenario, the number of data objects and agents is fixed at $|\mathcal{D}| = 200$ and $|\mathcal{A}| = 50$ in $G(\mathcal{D}, \mathcal{A}, E)$. In every scenario, some or all agents send requests for data objects, and the weight factor increases each time. In each scenario, the number of objects an agent requests is chosen from the interval [1, 6]. In the end, each agent holds between 1 and 25 data objects.

The first experiment considers 100 objects in the leak dataset, i.e. $|\mathcal{L}| = 100$, and varies the weight factor $\mathcal{WF}$ and the guessing probability $\alpha$. In Fig. 3, $P1$ represents $P_b\{GA_j \mid \mathcal{L}\}$, the average probability when the leak dataset $\mathcal{L}$ is given. $P2$ is the curve for $P_b\{GA_j \mid X_j\}$, the average probability when an agent leaks its entire allocated dataset. $PP1$ and $PP2$ are the curves for the performance parameters $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$, respectively. $\bar{\vartheta}^{*}$ represents the average success rate in detecting the guilty agent, and $\min \vartheta^{*}$ represents the detection rate when two agents have the same probability of being guilty.
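The experimental setup above can be illustrated with a small simulation. The following Python sketch is not the authors' implementation. It assumes that $R_{A_j}$ denotes the number of data objects held by agent $A_j$, draws each agent's allocation uniformly at random, and estimates guilt with the agent guilt model of Papadimitriou and Garcia-Molina [4], where the guilt probability of agent $A_j$ is $1 - \prod_{t \in X_j \cap \mathcal{L}} \left(1 - (1-\alpha)/n_t\right)$ and $n_t$ is the number of agents holding object $t$. The function names `allocate`, `weight_factor` and `guilt_probability` are hypothetical. Under the same assumption, $\mathcal{WF} = 4.925$ with $|\mathcal{D}| = 200$ corresponds to 985 allocated objects in total, or 19.7 objects per agent on average across 50 agents.

```python
import random
from math import prod


def allocate(num_objects, num_agents, min_req, max_req, rng):
    """Bigraph G(D, A, E) as a dict: agent index j -> set X_j of object ids.

    Assumption: each agent ends up with a uniformly random subset whose
    size is drawn from [min_req, max_req].
    """
    objects = range(num_objects)
    return {
        j: set(rng.sample(objects, rng.randint(min_req, max_req)))
        for j in range(num_agents)
    }


def weight_factor(allocation, num_objects):
    """WF = (sum over agents of |X_j|) / |D|, reading R_{A_j} as |X_j|."""
    return sum(len(x) for x in allocation.values()) / num_objects


def guilt_probability(allocation, leaked, alpha):
    """Per-agent guilt estimate under the model of [4] (an assumption here).

    alpha is the probability that the target guessed a leaked object
    instead of obtaining it from an agent.
    """
    # n_t: number of agents that hold object t
    holders = {}
    for x in allocation.values():
        for t in x:
            holders[t] = holders.get(t, 0) + 1
    return {
        j: 1.0 - prod(1.0 - (1.0 - alpha) / holders[t] for t in x & leaked)
        for j, x in allocation.items()
    }


# Demo with the sizes quoted in the text: |D| = 200, |A| = 50.
rng = random.Random(0)
alloc = allocate(200, 50, 1, 25, rng)
leaked = set(alloc[0])  # scenario: agent 0 leaks everything it holds
for alpha in (0.1, 0.5, 0.9):
    guilt = guilt_probability(alloc, leaked, alpha)
    print(f"alpha={alpha}: WF={weight_factor(alloc, 200):.3f}, "
          f"guilt of agent 0={guilt[0]:.4f}")
```

Storing the bigraph as a mapping from each agent to its set of objects keeps the intersection with the leak dataset a single set operation. Under this model, the guilt estimate for a fixed agent falls as `alpha` rises, which matches the trend described for Fig. 3.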
Fig. 3 shows that the values of $P_b\{GA_j \mid \mathcal{L}\}$, $P_b\{GA_j \mid X_j\}$, $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$ decrease as the weight factor $\mathcal{WF}$ increases, i.e. the probability of identifying the guilty agent decreases as the weight factor grows. In Fig. 3a, $P_b\{GA_j \mid \mathcal{L}\} = 0.792717$, $P_b\{GA_j \mid X_j\} = 0.952254$, $\bar{\vartheta}^{*} = 0.732435$ and $\min \vartheta^{*} = 0.27591$ when $\mathcal{WF} = 4.925$. This shows that the probability remains very high even when the weight factor $\mathcal{WF}$ is high.

It is also observed that when $\alpha$ is small, the values of all four evaluation parameters, i.e. $P_b\{GA_j \mid \mathcal{L}\}$, $P_b\{GA_j \mid X_j\}$, $\bar{\vartheta}^{*}$ and $\min \vartheta^{*}$, are high. The reason is that the data is unlikely to have been guessed; it is more likely that an agent leaked the data objects, and each agent holds a large share of the leaked data. Fig. 3b and Fig. 3c show that as $\alpha$ increases, the probability of identifying the malicious entity, the average success rate and the detection rate decrease, because it becomes more and more likely that the data was guessed by the target, so the agents' guilt probability decreases.

Fig. 3b shows that when $\alpha = 0.5$ and $\mathcal{WF} = 4.925$, $P_b\{GA_j \mid \mathcal{L}\} = 0.640587$, $P_b\{GA_j \mid X_j\} = 0.874787$, $\bar{\vartheta}^{*} = 0.728014$ and $\min \vartheta^{*} = 0.38134$. The experimental results show that the probability of detecting the malicious entity is high even when the weight factor $\mathcal{WF}$ and the guessing probability $\alpha$ are high. In Fig. 3c, $\alpha$ is very high, equal to 0.9, and the average success rate is 0.300168, which is also high.

In the next experiment, $\alpha$ is fixed at 0.3 and the probability $P_b\{GA_j \mid \mathcal{L}\}$ is evaluated at various loads by varying the number of objects in the leak dataset $\mathcal{L}$. Fig. 3d shows the curve for $P_b\{GA_j \mid \mathcal{L}\}$ when the size of $\mathcal{L}$ varies, i.e. $|\mathcal{L}| \in \{50, 100, 150, 200\}$. For a fixed allocation of the dataset among agents, i.e. when the allocated dataset $X_j$ of each agent $A_j$ $(1 \le j \le m)$ remains the same, the probability of identifying the malicious entity increases as the amount of data in the leak dataset increases. This can be presented as evidence against the leaker responsible for the data leakage. The method detects the guilty agent with a high success rate, which demonstrates the effectiveness of the scheme.

V. CONCLUSION

A smart city is data driven: big data collected by ubiquitous smart things such as sensors and audio-visual cameras transforms the lives of residents by making available plenty of smart and intelligent applications and by aiding decision making. Successful implementation of the smart city concept depends on the efficient usage and security of sensitive data. The presented model addresses the data leakage problem using a guilty agent identification model to detect leakages that are caused intentionally or unintentionally. It estimates the chance of an agent being guilty by computing a probability that depends on the data allocated among diverse agents through a Bigraph. The information leaker is identified by comparing the calculated probabilities of leaking the data, and the confidential information is preserved. Future work could improve the security of the most sensitive information by considering a threshold value.

REFERENCES

[1] A. Ismail, "Utilizing Big Data Analytics as a solution for Smart Cities," in 3rd MEC International Conference on Big Data and Smart City, 2016, pp. 1-5.
[2] A. Sharif, J. Li, M. Khalil, R. Kumar, and M. I. Sharif, "Internet of Things - Smart Traffic Management System for Smart Cities using Big Data Analytics," in IEEE, 2017, pp. 281-284.
[3] C. Xu, X. Huang, J. Zhu, and K. Zhang, "Research on the Construction of Sanya Smart Tourism City based on Internet and Big Data," in International Conference on Intelligent Transportation, Big Data & Smart City, 2018, pp. 125-128.
[4] P. Papadimitriou and H. Garcia-Molina, "Data Leakage Detection," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 51-63, January 2011.
[5] J. Croft and M. Caesar, "Towards Practical Avoidance of Information Leakage in Enterprise Networks," in 6th USENIX Conference on Hot Topics in Security (HotSec), CA, USA, 2011, p. 7.
[6] I. Gupta and A. K. Singh, "A Probabilistic Approach for Guilty Agent Detection using Bigraph after Distribution of Sample Data," in Procedia Computer Science, vol. 125, 2018, pp. 662-668.
[7] K. Kaur, I. Gupta, and A. K. Singh, "A Comparative Evaluation of Data Leakage/Loss Prevention Systems (DLPS)," in 4th International Conference on Computer Science & Information Technology (CS & IT-CSCP), Dubai, UAE, 2017, pp. 87-95.
[8] M. Backes, N. Grimm, and A. Kate, "Lime: Data Lineage in the Malicious Environment," in 10th International Workshop on Security and Trust Management, 2014, pp. 183-187.
[9] A. Kumar, A. Goyal, A. Kumar, N. K. Chaudhary, and S. S. Kamath, "Comparative Evaluation of Algorithms for Effective Data Leakage Detection," in IEEE Conference on Information and Communication Technologies (ICT 2013), vol. 13, 2013, pp. 177-182.
[10] S. Sholla, R. Naaz, and M. A. Chishti, "Semantic Smart City: Context Aware Application Architecture," in 2nd International Conference on Electronics, Communication and Technology (ICECA), 2018, pp. 721-724.
[11] A. Shabtai, Y. Elovici, and L. Rokach. New York: Springer, 2012, ch. Introduction to Information Security and Data Leakage, pp. 1-87.
[12] X. Shu and D. Yao, "Data Leak Detection as a Service," in International Conference on Security and Privacy in Communication Systems, Springer, 2012, pp. 222-240.
[13] F. Liu, X. Shu, D. Yao, and A. R. Butt, "Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce," in 5th ACM Conference on Data and Application Security and Privacy (CODASPY), Texas, USA, 2015, pp. 195-206.
[14] X. Shu, J. Zhang, D. Yao, and W. C. Feng, "Fast Detection of Transformed Data Leaks," IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 528-542, March 2016.
[15] M. Gafny, A. Shabtai, L. Rokach, and Y. Elovici, "Detecting Data Misuse by Applying Context-Based Data Linkage," ACM Workshop on Insider Threats, pp. 3-12, 2010.
[16] K. Kaur, I. Gupta, and A. K. Singh, "A Comparative Study of the Approach Provided for Preventing the Data Leakage," vol. 9, no. 5, pp. 21-33, 2017.
[19] K. Gupta and A. Kush, "A Review on Data Leakage Detection for Secure," International Journal of Engineering and Advanced Technology (IJEAT), vol. 7, no. 1, pp. 153-159, October 2017.
[21] K. Kaur, I. Gupta, and A. K. Singh, "Data Leakage Prevention: E-Mail Protection via Gateway," in IOP Conf. Series: Journal of Physics: Conf. Series, 2017, pp. 1-5.