An Access Control Scheme For Big Data Processing
Vincent C. Hu, Tim Grance, David F. Ferraiolo, D. Rick Kuhn
Abstract— Access Control (AC) systems are among the most critical of network security components. A system's privacy and security controls are more likely to be compromised due to the misconfiguration of access control policies than by the failure of cryptographic primitives or protocols. This problem becomes increasingly severe as software systems become more and more complex, such as Big Data (BD) processing systems, which are deployed to manage a large amount of sensitive information and resources organized into a sophisticated BD processing cluster. Basically, BD access control requires the collaboration among cooperating processing domains to be protected as computing environments that consist of computing units under distributed AC management. Many BD architecture designs have been proposed to address BD challenges; however, most of them focus on the processing capabilities of the "three Vs" (Velocity, Volume, and Variety). Considerations for security in protecting BD are mostly ad hoc and patchwork efforts. Even with some inclusion of security in recent BD systems, a critical security component, AC (authorization), for protecting BD processing components and their users from insider attacks remains elusive. This paper proposes a general-purpose AC scheme for distributed BD processing clusters.

Keywords—Access Control, Authorization, Big Data, Distributed System

I. INTRODUCTION

Data IQ News [1] estimates that the global data population will reach 44 zettabytes (one zettabyte is a billion terabytes) by 2020. This growth trend is influencing the way data is being mass collected and produced for high-performance computing or for operations and planning analysis. Big Data (BD) refers to data that is too large to process with a traditional data processing system, for example, when analyzing Internet data traffic or editing video data of hundreds of gigabytes. (Note that each case depends on the capabilities of a system; it has been argued that an organization that can efficiently process terabytes of text, audio, and video data per day is not handling BD, while for organizations that cannot process such data efficiently, it is BD [2].) BD technology is gradually reshaping current data systems and practices. Government Computer News [3] estimates that the volume of data stored by federal agencies alone will increase from 1.6 to 2.6 petabytes within two years, and U.S. state and local governments are just as keen on harnessing the power of BD to boost security, prevent fraud, enhance service delivery, and improve emergency response. It is estimated that successfully leveraging BD technologies can reduce IT cost by an average of 48% [4].

BD has denser and higher-resolution content such as media, photos, and videos from sources such as social media, mobile applications, public records, and databases; the data arrives either in static batches or is dynamically generated by machines and users through the advanced capabilities of hardware, software, and network technologies. Examples include data from sensor networks or from tracking user behavior. Rapidly increasing volumes of data and data objects put enormous pressure on existing IT infrastructures, which face scaling difficulties in capabilities such as data storage, advanced analysis, and security. These difficulties result from BD's large and growing files, arriving at high speed and in various formats, as measured by: Velocity (the data comes at high speed, e.g., scientific data such as data from weather patterns); Volume (the data results from large files, e.g., Facebook generates 25 TB of data daily); and Variety (the files come in various formats: audio, video, text messages, etc. [2]). Therefore, BD processing systems must be able to collect, analyze, and secure BD, which requires processing very large data sets that defy conventional data management, analysis, and security technologies. In simple cases, some solutions use a dedicated system for their BD processing. However, to maximize scalability and performance, most BD processing systems apply massively parallel software running on many commodity computers in distributed computing frameworks that may include columnar databases and other BD management solutions [5].

Access Control (AC) systems are among the most critical of network security components. It is more likely that privacy or security will be compromised due to the misconfiguration of access control policies than from a failure of a cryptographic primitive or protocol. This problem becomes increasingly severe as software systems become more and more complex, such as BD processing systems, which are deployed to manage a large amount of sensitive information and resources organized into a sophisticated BD processing cluster. Basically, BD AC systems require collaboration among cooperating processing domains as protected computing environments, which consist of computing units under distributed AC management [6].
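To make the notion of computing units under distributed AC management concrete, the following minimal Python sketch shows one way a request could be checked against the local policy of each cooperating domain before a distributed task is accepted. It is only an illustration; the policy format and all names (LOCAL_POLICIES, local_check, authorize_distributed_task) are assumptions made for this sketch, not the scheme proposed in this paper.

```python
# Illustrative sketch: each cooperating computing unit enforces its own
# local access control policy before it accepts a distributed BD task.
# (Policy format and names are assumed for illustration only.)

LOCAL_POLICIES = {
    "domain-a": {("analyst", "read"), ("admin", "read"), ("admin", "write")},
    "domain-b": {("admin", "read"), ("admin", "write")},
}


def local_check(domain, subject_role, action):
    # Each domain answers only for its own resources.
    return (subject_role, action) in LOCAL_POLICIES.get(domain, set())


def authorize_distributed_task(task_domains, subject_role, action):
    # A BD task touching several domains needs every involved domain to agree.
    denied = [d for d in task_domains if not local_check(d, subject_role, action)]
    return (len(denied) == 0, denied)


if __name__ == "__main__":
    ok, denied = authorize_distributed_task(["domain-a", "domain-b"], "analyst", "read")
    print(ok, denied)  # False ['domain-b'] -- domain-b's local policy blocks the analyst
```

The point of the sketch is only that authorization decisions are made per domain rather than by a single central policy store, which is what distributed AC management across cooperating domains implies.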
Many architecture designs have been proposed to address BD challenges; however, most of them have focused on the processing capabilities of the "three Vs" (Velocity, Volume, and Variety). Considerations for security in protecting BD are mostly ad hoc and patchwork efforts. Even with the inclusion of some security capability in recent BD systems, practical AC (authorization) for BD processing components is not readily available.

This paper proposes a general AC scheme for distributed BD processing clusters. Section II describes current BD tools and implementations. Section III discusses BD AC requirements. Section IV introduces related work. Section V illustrates our BD AC scheme. Section VI discusses implementation considerations for the general BD model. Section VII concludes the paper.

II. GENERAL BIG DATA MODEL AND EXAMPLE

The fundamental model for most of the current BD
architecture designs is based on the concept of distributed
processing [2, 4], which contains a set of generic processing
systems as shown in Figure 1:
1. Master System (MS) receives data from BD data source
providers, and determines processing steps in response to a
user’s request. MS has the following three major functions:
• Task Distribution (TD) function is responsible for
distributing processes to the cooperated (slave) systems of the
BD cluster.
• Data Distribution (DD) function is responsible for distributing data to the cooperated systems of the BD cluster.
• Result Collection (RC) function processes, collects, and analyzes information provided by the cooperated systems of the BD cluster, and generates an aggregated result for users.
Unless restricted by specific applications, the three
functions are usually installed and managed in the same host
machine for easy and secure management and maintenance.
2. Cooperated System (CS) (or slave system) is assigned and trusted by the MS for BD processing. A CS reports progress or problems to the TD and DD; otherwise, it returns the computed result to the RC of the MS.
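The following minimal Python sketch is only an illustration of the division of labor just described: a master that splits the data (DD), assigns the task to cooperated systems (TD), and aggregates their reports (RC). The class and function names (MasterSystem, CooperatedSystem, td, dd, rc) are hypothetical and do not correspond to any particular BD framework.

```python
# Minimal sketch of the general BD processing model (illustrative only).
# MS = Master System with Task Distribution (TD), Data Distribution (DD),
# and Result Collection (RC); CS = Cooperated (slave) System.
from concurrent.futures import ThreadPoolExecutor


class CooperatedSystem:
    """A CS executes an assigned task on its assigned data partition."""

    def __init__(self, name):
        self.name = name

    def run(self, task, partition):
        try:
            return {"cs": self.name, "ok": True, "result": task(partition)}
        except Exception as err:  # report problems back to the MS
            return {"cs": self.name, "ok": False, "error": str(err)}


class MasterSystem:
    """The MS distributes tasks (TD) and data (DD), then collects results (RC)."""

    def __init__(self, slaves):
        self.slaves = slaves

    def dd(self, data):
        # DD: split the data set into one partition per cooperated system.
        n = len(self.slaves)
        return [data[i::n] for i in range(n)]

    def td(self, task, partitions):
        # TD: assign the task and a partition to every cooperated system.
        with ThreadPoolExecutor(max_workers=len(self.slaves)) as pool:
            futures = [pool.submit(cs.run, task, part)
                       for cs, part in zip(self.slaves, partitions)]
            return [f.result() for f in futures]

    def rc(self, reports):
        # RC: aggregate the partial results returned by the cooperated systems.
        return sum(r["result"] for r in reports if r["ok"])

    def process(self, task, data):
        return self.rc(self.td(task, self.dd(data)))


if __name__ == "__main__":
    ms = MasterSystem([CooperatedSystem(f"cs-{i}") for i in range(3)])
    # Example request: total the values held in the BD source.
    print(ms.process(sum, list(range(100))))  # -> 4950
```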
A widely used implementation of this general BD model is Apache Hadoop, an open-source framework deployed throughout industry and government. Hadoop keeps data and processing resources in close proximity within the cluster. It runs on a distributed model composed of numerous low-cost computers (e.g., Linux-based machines with simple architecture), with two main MS components, TD (MapReduce) and DD (the Hadoop Distributed File System, HDFS), and a set of tools. The TD provides distributed data processing across the cluster, and the DD distributes large data sets across the servers in the cluster [3]. Hadoop's CSs are called Slaves, and each has two components: a Task Tracker and a Data Node. The MS contains two additional components: a Job Tracker and a Name Node. The Job Tracker and Task Trackers are grouped as MapReduce, and the Name Node and Data Nodes fall under HDFS.

Fig. 2. Hadoop BD cluster example

Hadoop combines storage, servers, and networking to break data as well as computation down into small pieces. Each computation is assigned a small piece of data, so that instead of one big computation, numerous small computations are performed much faster, with the results aggregated and sent back to the application [2]. Thus, Hadoop provides linear scalability, using as many computers as required. Cluster communication between the MS and CSs manages what traditional data service systems would not be able to handle.
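The "numerous small computations" idea can be illustrated without any Hadoop dependency. The toy word count below is a conceptual sketch of the MapReduce pattern that the Job Tracker and Task Trackers carry out at scale; it does not use the Hadoop API, and the function names (split_records, map_task, reduce_results) are assumptions made for this example.

```python
# Conceptual word-count sketch of the MapReduce pattern (not the Hadoop API).
from collections import Counter
from functools import reduce


def split_records(lines, n_workers):
    # The DD/Name Node role: break the input into small pieces (splits).
    return [lines[i::n_workers] for i in range(n_workers)]


def map_task(split):
    # The Task Tracker role: one small computation on one small piece of data.
    counts = Counter()
    for line in split:
        counts.update(line.lower().split())
    return counts


def reduce_results(partials):
    # The RC/Job Tracker role: aggregate partial results for the application.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    lines = ["big data needs access control",
             "access control protects big data",
             "data data data"]
    partial_counts = [map_task(s) for s in split_records(lines, n_workers=2)]
    print(reduce_results(partial_counts).most_common(3))
    # -> [('data', 5), ('big', 2), ('access', 2)]
```

Each map_task call is independent, which is what lets the work scale linearly with the number of cooperated systems, while the single reduce step plays the role of result collection at the MS.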