Automated Methods For Generating Least Privilege Access Control Policies
by
Matthew W. Sanders
© Copyright by Matthew W. Sanders, 2019
All Rights Reserved
A thesis submitted to the Faculty and the Board of Trustees of the Colorado School
of Mines in partial fulfillment of the requirements for the degree of Doctor of Philosophy
(Computer Science).
Golden, Colorado
Date
Signed:
Matthew W. Sanders
Signed:
Dr. Chuan Yue
Thesis Advisor
Golden, Colorado
Date
Signed:
Dr. Tracy Camp
Professor and Head
Department of Computer Science
ABSTRACT
Access controls are the processes and mechanisms that allow only authorized users to
perform operations upon the resources of a system. Using access controls, administrators
attempt to implement the Principle of Least Privilege, a design principle where privileged
entities operate using the minimal set of privileges necessary to complete their job. This
protects the system against threats and vulnerabilities by reducing exposure to unauthorized
activities. Although access control can be considered only one area of security research, it
is a pervasive and omnipresent aspect of information security.
But achieving the Principle of Least Privilege is a difficult task. It requires the ad-
ministrators of the access control policies to have an understanding of the overall system,
each user’s job function, the operations and resources necessary to those job functions, and
how to express these using the access control model and language of the system. In almost
all production systems today, this process of defining access control policies is performed
manually. It is error prone and done without quantitative metrics to help administrators
and auditors determine if the Principle of Least Privilege has been achieved for the system.
In this dissertation, we explore the use of automated methods to create least privilege
access control policies. Specifically, we (1) develop a framework for policy generation al-
gorithms, derive metrics for determining adherence to the Principle of Least Privilege, and
apply these to evaluate a real world dataset, (2) develop two machine learning based algo-
rithms for generating role based policies and compare their performance to naive methods,
and (3) develop a rule mining based algorithm to create attribute based policies and evaluate
its effectiveness against role based methods. By quantifying the performance of access control
policies, developing methods to create least privilege policies, and evaluating their perfor-
mance using real world data, the projects presented in this dissertation advance the state of
access control research and address a problem of great significance to security professionals.
TABLE OF CONTENTS
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.6.4 Results Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.7 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.7.1 Rule Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.8 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
CHAPTER 5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
REFERENCES CITED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
LIST OF FIGURES
Figure 4.3 Comparison of Methods for Calculating Coverage Rates . . . . . . . . . . 90
LIST OF TABLES
LIST OF ABBREVIATIONS
False Negative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FN
False Positive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FP
Role Mining Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RMP
True Negative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TN
True Positive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TP
ACKNOWLEDGMENTS
I would like to express my utmost gratitude to my advisor Professor Chuan Yue. His
wisdom, insights, guidance and unwavering patience were all crucial in my research and my
personal growth as a researcher. I hope that one day I can exhibit the same virtuous qualities
that he has shown while mentoring me. I am also grateful to my committee members,
Professor Tracy Camp, Professor Nils Tilton, Professor Bo Wu, and Professor Dejun Yang
for their time and support.
I would also like to thank my family for their love and encouragement. I would like to
thank my mother Ruth, a teacher who taught me the importance of education and to always
continue learning. I would like to thank my father Wiley, who instilled in me the perseverance
needed to sustain me during my years of research. Finally, I would like to thank my wife
Elizabeth, words cannot express how grateful I am for her support and sacrifices made during
countless late nights and weekends. Without her, my research and Ph.D. pursuit would not
have been possible.
CHAPTER 1
INTRODUCTION
Access controls are the processes and mechanisms that allow only authorized users to
perform operations upon the resources of a system. They allow administrators and resource
owners to specify which users can access a system, what resources those users can access, and
what operations those users can perform. Using access controls, administrators implement
the Principle of Least Privilege (PoLP), a design principle where privileged entities oper-
ate using the minimal set of privileges necessary to complete their job. This protects the
system against threats and vulnerabilities by reducing exposure to unauthorized activities
and providing access only to those who have been approved. Although access control can
be considered only one area of security research, it is the most pervasive and omnipresent
aspect of information security [1]. Because the PoLP is so fundamental to secure design, it
is specified in all widely accepted security compliance standards:
• Payment Card Industry (PCI) Data Security Standard (DSS) v3.1, Requirement 7: Re-
strict access to cardholder data by business need to know.
• National Institute of Standards and Technology (NIST) Special Publication 800-53, Secu-
rity and Privacy Controls for Federal Information Systems and Organizations, AC-6: The
organization employs the principle of least privilege, allowing only authorized accesses for
users (or processes acting on behalf of users) which are necessary to accomplish assigned
tasks in accordance with organizational missions and business functions.
• National Institute of Standards and Technology (NIST) Special Publication 800-171, Pro-
tecting Controlled Unclassified Information in Nonfederal Systems and Organizations,
3.1.5: Employ the principle of least privilege, including for specific security functions and
privileged accounts.
As information systems have become more complex, access controls have also evolved
to meet the diverse requirements of these information systems. Early access control models
such as Access Control Lists (ACLs) consisting of a list of user permissions attached to each
system object were sufficient for simpler systems. But these models are woefully inadequate
for modern systems where it is not uncommon to deal with thousands of users with federated
identities from multiple systems, each system with its own type of resources and operations,
possibly using different access control models.
In modern systems, the complexity of managing access controls and implementing the
PoLP often exceeds the capacity of manual management. While implementing the PoLP is
a desirable and sometimes mandatory requirement for software systems, proper implementa-
tion can be difficult and is often not even attempted. Previous research into the use of least
privilege practices in the context of operating systems [2] revealed that the overwhelming
majority of study participants did not utilize least privilege policies. This was due to their
partial understanding of the security risks, as well as a lack of motivation to create and
enforce such policies.
In addition to information systems becoming more complex, they have also become more
empowering for their users, increasing the possible damage that may be caused by access
control errors. For example, Cloud Computing provides cheap on demand access to com-
puting and storage resources for its users. With this increased power also come increased
consequences of access control mistakes. The Amazon Simple Storage Service (S3) is just
one of many popular cloud services. S3 provides the ability for users to easily and securely
store data in the cloud and allow other users to read or modify that data. While the access
controls and operations of the S3 service are relatively simple to understand and manage,
there were at least seven major incidents in 2017 where the mismanagement of S3 access
controls led to significant data breaches [3]:
• May 2017: Booz Allen Hamilton exposed battlefield imagery and administrator credentials
to sensitive systems of the National Geospatial Agency (NGA).
• June 2017: Deep Root Analytics exposed personal data of 198 million American voters.
• July 2017: Dow Jones & Co. exposed personally identifiable information of 2.2 million
people.
• July and September 2017: Verizon Wireless exposed personally identifiable information of
over 6 million customers and sensitive corporate information.
• September 2017: Accenture exposed hundreds of gigabytes of data, including private sign-
ing keys and plaintext passwords.
Another common class of security breaches resulting from poor access control and the
power of cloud computing is cryptojacking attacks enabled by compromised cloud creden-
tials. Cryptojacking is any attack involving the unauthorized use of computing resources to
mine cryptocurrency. The cloud computing form of cryptojacking attack occurs when users
accidentally expose their cloud computing credentials such as in publicly shared source code.
Attackers find these credentials and use them to mine cryptocurrency at the victim’s
expense. Many such incidents have been documented in news articles with organizations
such as Tesla [4], The L.A. Times [4], Gemalto [5], and Aviva [5] being just some of the
documented victims of such attacks. These attacks are increasingly common with attack-
ers continually searching open source code repositories such as GitHub for access keys [6].
Improved authentication methods may have prevented these attacks, but even with perfect
authentication, insider threats and accidental misuse are still security issues. The PoLP
helps reduce the damage possible from such threats. In the cryptojacking scenario, reducing
the number of users that can create virtual instances or reducing the number of instances
any single user can create would, by itself, reduce the damage caused by such attacks.
It is important to note that these breaches are not the result of previously unknown
vulnerabilities being exploited, nor due to the efforts of unusually capable and determined
attackers. Instead, these are attacks of opportunity made possible by human errors in man-
aging the access controls of an organization’s resources. The negative impacts of such access
control misconfigurations are pervasive and growing. In 2017, security research firm RedLock
found that 53% of organizations using cloud storage services such as Amazon S3 had inad-
vertently exposed one or more such services to the public. It appears that this is trending
upwards despite growing awareness about the risks of misconfigurations [5]. The damage
from such incidents may have been reduced or prevented altogether by stricter adherence
to the PoLP which would restrict the access to such resources to fewer people.
This thesis presents metrics, methods, and experimental results of using automated meth-
ods to implement least privilege access control policies across three separate but related
projects. While the cloud computing environment is the focus of this work because of access
to available data and because it is one of the most complex environments in terms of access
control, the problems of access control errors are not unique to the cloud environment and
this work is relevant to addressing such problems in other environments as well.
Before describing solutions, we must first analyze and define the problem of automating
least privileges. There exists a large body of work mining Role Based Access Control (RBAC)
policies from existing permissions or audit logs in order to create the smallest
(and most maintainable) RBAC policies with metrics to support these goals. However, these
previous works have neglected to address methods and metrics for measuring the security
of policies in terms of least privilege. Instead of focusing on maintainability, we argue
that the security of policies and their adherence to the PoLP are the most important goals
when considering automated methods of building access control policies. Our first project,
“Automated Least Privileges in Cloud-Based Web Services” provides an analysis of over-
privilege present in the access control policies of a real world dataset. It also defines a
methodology and metrics for quantifying the security of policies in terms of over-privilege
and under-privilege. Unlike previous approaches which often treat access control policies and
audit logs as fixed sets, our approach considers how these both change over time to better
analyze the risk of over-privilege in policies.
In our second project, “Minimizing Privilege Assignment Errors in Cloud Services”,
we implement three separate policy generation algorithms to create RBAC least privilege
policies by mining a real world dataset of audit logs. Our algorithms consist of a naive
approach, an unsupervised algorithm based on clustering, and a supervised algorithm based
on machine learning classification. Using the same metrics and evaluation methodology as
the first project, we analyze and compare the performance of these three algorithms. These
metrics include a weighting that allows administrators to express how much they value
minimizing under-privilege vs. minimizing over-privilege which we use to determine which
algorithm performs ‘best’ as this weighting varies.
While RBAC is the de-facto access control model in government and industry, Attribute
Based Access Control (ABAC) is becoming more popular. ABAC provides the ability
to create security policies using attributes that may be associated with users, objects, or the
operating environment. By using the wealth of attribute information in the audit logs and
the greater expressive power of ABAC policies it is possible to create access control policies
which simultaneously reduce under- and over-privilege when compared to RBAC. Creating
such ABAC policies is the focus of our third project, “Mining Least Privilege Attribute
Based Access Control Policies”. In this project, we implement an algorithm based on as-
sociation rule mining techniques to create ABAC least privilege policies by mining a real
world dataset of audit logs. We adapt the metrics of our previous works and use the same
methods to evaluate policies over time in terms of under- and over-privilege errors.
In addition to showing the effectiveness of our own algorithm, this project also provides
a methodology and quantitative comparison showing the ability of ABAC to reduce under-
privilege and over-privilege when compared to RBAC which may be valuable to access control
researchers regardless of their interest in policy mining techniques.
The remainder of this chapter briefly describes each of these three projects, one in each
subsection. Each project’s goals, methods, and results are described in detail in separate
chapters of this thesis.
The PoLP is a fundamental guideline for secure computing that restricts privileged en-
tities to only the permissions they need to perform their authorized tasks. Achieving least
privileges in an environment composed of many heterogeneous web services provided by a
third party is an important but difficult and error prone task for many organizations. This
paper explores the challenges that make achieving least privileges uniquely difficult in the
cloud environment and the potential benefits of automated methods to assist with creating
least privilege policies from audit logs. To accomplish these goals, we implement two frame-
works: a Policy Generation Framework for automatically creating policies from audit log
data, and an Evaluation Framework to quantify the security provided by generated roles.
We apply these frameworks to a real world dataset of audit log data with 4.3 million events
from a small company and present results describing the policy generator’s effectiveness. Re-
sults show that it is possible to significantly reduce over-privilege and administrative burden
of permission management.
The PoLP is a security objective of granting users only those accesses they need to perform
their duties. Creating least privilege policies in the cloud environment with many diverse
services, each with unique privilege sets, is significantly more challenging than policy creation
previously studied in other environments. Such security policies are always imperfect and
must balance between the security risk of granting over-privilege and the effort to correct for
under-privilege. In this paper, we formally define the problem of balancing between over-
privilege and under-privilege as the Privilege Error Minimization Problem (PEMP) and
present a method for quantitatively scoring security policies. We design and compare three
algorithms for automatically generating policies: a naive algorithm, an unsupervised learning
algorithm, and a supervised learning algorithm. We present the results of evaluating these
three policy generation algorithms on a real-world dataset consisting of 5.2 million Amazon
Web Service (AWS) audit log entries. The application of these methods can help create
policies that balance between an organization’s acceptable level of risk and effort to correct
under-privilege.
Implementing effective and secure access control policies is a significant challenge. Too
much over-privilege increases the risk of damage to the system via compromised credentials,
insider threats, and accidental misuse. Policies that are under-privileged prevent users from
being able to perform their duties. Access control policies are rarely perfect in these regards
and administrators must create policies that balance between the two competing goals of
minimizing under-privilege vs. minimizing over-privilege. The access control model used to
implement policies plays a large role in the ability to construct secure policies and the At-
tribute Based Access Control (ABAC) model continues to gain in popularity as the solution
to many access control use cases because of its advantages in granularity, flexibility, and us-
ability. ABAC allows administrators to create access control policies based on the attributes
of the users, operations, resources, and environment. Due to the flexibility of ABAC, however,
it can be difficult to determine which attributes and value combinations would create the
best policies in terms of minimizing under- and over-privilege. To address this problem, we
introduce a method of mining ABAC policies from audit logs to generate ABAC policies
which minimize both under- and over-privilege. We also explore optimization methods for
dealing with large ABAC privilege spaces, and present experimental results of our methods
using a real-world dataset demonstrating the effectiveness of our methods.
CHAPTER 2
AUTOMATED LEAST PRIVILEGES IN CLOUD-BASED WEB SERVICES
2.1 Introduction
The commoditization of web services by cloud computing providers enables the outsourc-
ing of IT services on a massive scale. The business model of providing software, platform,
and infrastructure components via web services has seen tremendous growth over the last
decade and is forecast to continue expanding at a rapid pace [7]. From small startups to
large companies such as Netflix, Expedia, and Yelp [8], many organizations rely on services
provided by a third party for their mission critical operations. While the adoption of these
hosted web services continues, there are significant security and usability concerns yet to be
solved. Privilege management is a key issue in managing the operation of the diverse array
of web services available.
The principle of least privilege is a design principle where privileged entities operate using
the minimal set of privileges necessary to complete their job [9]. Least privileges protect
against several threats, chief among them the compromise of privileged entities’
credentials and functions by a malicious party. Other relevant threats mitigated by least
privileges include accidental misuse, whereby privileged entities may delete or misconfigure
resources which they do not require access to. Another threat is intentional misuse, where
insiders can abuse over-privileges to cause more damage than they would be able to under a
least privilege policy.
While implementing the principle of least privilege is a desirable and sometimes manda-
tory requirement for software systems, proper implementation can be difficult and is often
not even attempted. Previous research into the use of least privilege practices in the context
of operating systems [2] revealed that the overwhelming majority of study participants did
not utilize least privilege policies. This was due to their partial understanding of the security
risks, as well as a lack of motivation to create and enforce such policies. In comparison to
the operating system environment, the use of third party web services presents a much larger
number of services, resource types, access control policy languages, and audit mechanisms,
even within a single service provider, making it significantly more difficult to manage access
control.
The main contributions of this paper are: (1) an exploration of the challenges and ben-
efits of implementing an automated least privileges approach for third party web services
using real world data, (2) a concrete implementation of a framework for generating least
privilege policies from audit log data, and (3) metrics and methodology for quantifying the
effectiveness of least privilege policies. Related works are described in Section 2.2. The
motivating example of a real world dataset of manually created policies is analyzed in Sec-
tion 2.3. The automated least privilege generation and evaluation frameworks used are described
in Section 2.4, the metrics used to evaluate adherence to PoLP are described in Section 2.5
and the results of our analysis are described in Section 2.6.
include measuring similarity with existing roles, minimizing the number of user-role assign-
ment and permission-role assignment relations, metrics that seek to reduce administrative
cost, weighted structures that assign adjustable weights to assignment relationships, and
minimizing the number of edges in the role hierarchy graph.
Another related area of research uses audit data to create least privilege policies. Priv-
ileged entities often already possess the privileges necessary to do their jobs, thus roles can
be derived from existing permissions via data mining methods [13]. Notable examples of
mining data to create least privilege policies include EASEAndroid [14] for mobile devices,
ProgramCutter [15] for desktop applications, and Passe [16] for web applications. However,
these approaches do not provide a quantified assessment of how well they achieve the PoLP.
Like role mining, our research aims to reduce the administrative burden of creating access
control policies. However, instead of seeking to make roles more easily maintainable, we
seek to reduce administrator burden by generating secure and complete policies via easily
and frequently repeatable automated methods. The focus of this research is directly on
quantifying and improving the security of automatically generated privilege assignments
regardless of their size and complexity, thus we are addressing a problem different from the
RMP.
To illustrate the challenges of creating least privilege policies and to highlight the po-
tential of using an automated approach to policy generation, we examine a real world
dataset of policies manually created by administrators. The Amazon Web Services (AWS)
CloudTrail [17] logs of a company which provides a Software as a Service (SaaS) product were
analyzed (with permission). The audit logs contained 4.3M events collected over a period
of 307 days. During this period, 37 unique roles and 15 unique users exercised privileges.
Data gathered from the logs were analyzed and compared with the account Identity and
Access Management (IAM) [18] policies as they existed at the end of the collection period.
To quantify the effectiveness of these manually created policies at limiting over-privilege, we
compare the actions and services granted by these policies to those exercised in the audit
log data.
The privileged entities considered in this paper are users and virtual machine instances
which can both be assigned to roles. In our dataset, users were granted unconstrained access
making their comparison with exercised privileges somewhat uninteresting, but also demon-
strating a situation where achieving least privilege policies on users was not even attempted.
In contrast to users, virtual machines in our dataset were not granted unrestricted access
but were assigned roles manually created by administrators with the intent of constraining
the virtual machines to least privilege policies. While data for both users and roles were
analyzed, this section focuses on role policies granted to virtual machines to illustrate the
over-privilege present in manually created policies. As the results show, over-privilege was
common for these roles even though the role creators had the benefit of familiarity with the
application and the privileges it required. Services and actions not supported by CloudTrail
were excluded from these results.
Of the 37 unique roles identified in the dataset, 14 were present in the AWS IAM data at
the end of the collection period (those not found in the IAM policies had been deleted during
the collection period). Figure 2.1 shows a comparison between the actions granted and used
by virtual machine roles during the observation period. Even though the policies for each
role were intended to approximate least privileges, clearly there is a significant difference
between the number of actions granted and number of actions used. The average number of
actions granted to these 14 roles was 61.14, while the average number of privileges used was
2.92.
The comparison of privileges granted to those actually used at the service level of gran-
ularity is shown in Figure 2.2. Significant over-privilege is present at the service level, with
every role being granted privileges to at least one service for which it did not perform any
actions. The average number of services used by roles was 1.71 while the average number of
services granted was 5.07.
[Figure 2.1: Number of Granted & Used Actions by Role (Role1–Role14), shown on a logarithmic scale.]
This section describes the frameworks for generating and evaluating least privilege poli-
cies. First we present a framework for generating least privilege policies from audit logs. We
then present a framework for evaluating the effectiveness of a policy generator.
Figure 2.2: Number of Granted & Used Services by Role
The process of generating policies begins with ingesting the raw audit log data for
a given observation period into a datastore. Once ingested, the logs are normalized by
creating a projection of the events onto each unique privileged entity identified in the audit
logs for a specified observation period. Next, the policy generator algorithm is applied to the
normalized data. The generator implemented for this paper uses a simple counting based
approach which creates policy grants for each action an entity successfully exercised during
the observation phase. After policy generation is complete, additional modifications may
be made to the policies such as denying access to privileges which can be used to escalate
privileges. The policy generation framework is a bottom-up approach to building RBAC
policies where exercised permissions are used to create roles. This design can also be applied
to audit log data that have been previously collected in an organization’s environment, and
does not require an active presence in the cloud environment during log collection.
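As an illustrative sketch of this counting-based generator (not the exact implementation used in this work), the following assumes audit events have been normalized into records carrying an entity identifier, a (service, action) pair, a timestamp, and a success flag; the field names are hypothetical rather than the CloudTrail schema.

```python
from collections import defaultdict

def generate_policies(events, observation_start, observation_end):
    """Counting-based generator: grant each entity exactly the
    (service, action) pairs it successfully exercised during the
    observation period."""
    grants = defaultdict(set)
    for e in events:
        # Keep only successful events that fall inside the observation window.
        if not (observation_start <= e["timestamp"] < observation_end):
            continue
        if not e["success"]:
            continue
        grants[e["entity"]].add((e["service"], e["action"]))
    # Each entity's policy is its set of observed privileges; post-processing
    # (e.g., denying actions usable for privilege escalation) would be applied
    # to this result.
    return dict(grants)
```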
We next implemented a framework for evaluating the generated policies. This evaluation
framework simulates the application of an automated least privileges policy generator across
varying observation periods and operation periods. The purpose of these simulations is to
provide a quantified evaluation of the effectiveness of our current and future policy generators
if they were to be adopted in production by an organization. The information obtained from
these simulations can help determine how long the observation period should be, how long
these generated policies should be used for, and how effective the policy generator is. For
these evaluations we chose one day as the finest granularity of time period as this provides
enough time for entities to complete tasks requiring related privileges.
The evaluation framework uses a sliding window approach to perform its duties. It
repeatedly generates observation and operation phases of predetermined sizes and compares
the policy generated during the observation phase to the privileges exercised during the
operation phase. Each of these single evaluations is a trial and multiple trials for the same
evaluation parameters are achieved by incrementing the dates of the observation phase and
operation phase by a fixed amount. Figure 2.3 provides a visual representation of how the
sliding window technique is used to generate evaluation trials using the available audit log
data.
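A sketch of how the sliding window could generate trials, assuming the generate_policies helper sketched above, day-granularity windows, and a fixed step between trials (names and parameters are illustrative):

```python
from datetime import timedelta

def exercised_privileges(events, start, end):
    """Per-entity set of (service, action) pairs successfully exercised in [start, end)."""
    used = {}
    for e in events:
        if start <= e["timestamp"] < end and e["success"]:
            used.setdefault(e["entity"], set()).add((e["service"], e["action"]))
    return used

def sliding_window_trials(events, log_start, log_end,
                          observation_days, operation_days, step_days=1):
    """Yield one (generated policy, exercised privileges) pair per trial by
    sliding an observation window followed by an operation window over the logs."""
    obs = timedelta(days=observation_days)
    opp = timedelta(days=operation_days)
    start = log_start
    while start + obs + opp <= log_end:
        policy = generate_policies(events, start, start + obs)
        exercised = exercised_privileges(events, start + obs, start + obs + opp)
        yield policy, exercised
        start += timedelta(days=step_days)
```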
2.5 Metrics
be Precision = 1 because there is no possibility of over-privilege, and the case where all
privileges are granted is redefined to be Recall = 1 because there is no possibility of under-
privilege. To present more intuitive metrics, we take the complement of precision and recall
to create metrics where lower values are more favorable: the Over Privilege Rate (OPR) in
Equation 2.1 and Under Privilege Rate (UPR) in Equation 2.2, respectively.
OPR = 1 − Precision = UnexercisedGranted / AllGranted    (2.1)

UPR = 1 − Recall = ExercisedDenied / AllExercised    (2.2)
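In code, Equations 2.1 and 2.2 amount to two set operations per entity, where granted and exercised denote the privileges granted by the generated policy and the privileges exercised during the operation period (a sketch with illustrative names; the zero-denominator handling is an assumption):

```python
def over_privilege_rate(granted, exercised):
    """OPR: fraction of granted privileges that went unexercised (Equation 2.1)."""
    if not granted:
        return 0.0  # nothing granted, so no possibility of over-privilege
    return len(granted - exercised) / len(granted)

def under_privilege_rate(granted, exercised):
    """UPR: fraction of exercised privileges that were denied (Equation 2.2)."""
    if not exercised:
        return 0.0  # nothing exercised, so no possibility of under-privilege
    return len(exercised - granted) / len(exercised)
```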
It is important to consider the amount of time for which over-privilege exists. While the cost
of under-privilege is a decreased ability for privileged entities to perform their tasks, high
over-privilege can result in compromises of confidentiality, integrity, and availability if the
over-privilege is exploited by an attacker. The longer that over-privilege exists the greater
the possibility of it being exploited, thus we introduce an additional weight on the OPR to
account for the amount of time for which unused privilege grants existed. The Temporal Over
Privilege Rate (TOPR) in Equation 2.3 is the OPR multiplied by the number of days the
privileges went unused (the length of the operation period).
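That is,

TOPR = OPR × OperationPeriodLength    (2.3)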
OPR and UPR are two individual metrics for measuring the generated least privilege
policies. To provide a single metric that weights minimal over-privilege vs. minimal under-
privilege, we use the F-score metric (Equation 2.4). Higher β values for the F-score indicate
a higher weight for recall, which indicates a higher weight for minimal under-privilege. Lower
β values for the F-score weight minimal over-privilege higher. We use a temporally weighted
version of the F-score, TFβ (Equation 2.5), that accounts for the length of time for which an
over-privilege was granted. To incorporate a temporal weighting of over-privilege in TFβ,
we divide the precision by the operation period length because precision is the complement
of OPR and thus is directly tied to how we score over-privilege. Note that Fβ and T Fβ
are equivalent for the finest granularity of the operation period which is one day in our
simulations.
Fβ = (1 + β²) · (Precision · Recall) / ((β² · Precision) + Recall)    (2.4)

TFβ = (1 + β²) · ((Precision / OperationPeriodLength) · Recall) / ((β² · (Precision / OperationPeriodLength)) + Recall)    (2.5)
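Once precision and recall are computed for a trial, Equations 2.4 and 2.5 translate directly into code (a sketch; the zero-denominator guard is an assumption):

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (Equation 2.4)."""
    denom = (beta ** 2) * precision + recall
    return 0.0 if denom == 0 else (1 + beta ** 2) * precision * recall / denom

def tf_beta(precision, recall, beta, operation_period_length):
    """Temporally weighted F-score (Equation 2.5): precision is divided by the
    operation period length before the F-score formula is applied."""
    return f_beta(precision / operation_period_length, recall, beta)
```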
The F-score is the harmonic mean of precision and recall. The advantage of using the
harmonic mean F-score over arithmetic mean is that low scores for either precision or recall
will result in an overall low F-score which avoids allowing extreme policies to achieve favorable
scores. Consider an example policy which grants all privileges to an entity. This would result
in a perfect score in terms of precision (1), but the worst possible score in terms of recall (0).
The resulting F-score in this example would be 0 while arithmetic mean score would be 0.5,
the same as if precision and recall were both 0.5. This equal scoring between an extreme
policy and a balanced policy is not desirable in applications which value both precision and
recall.
2.6 Results
This section presents the results of our analysis tying together all of the work described
thus far. We consider the behavior of users and roles granted to virtual machines separately
when evaluating the effectiveness of their policies because they have different usage pat-
terns which produce significantly different scores. The behavior of virtual machines is fairly
consistent in both the actions and resources used while users are less predictable.
The results of evaluating the least privilege policy generator for observation periods of 7
and 28 days as the operation phase varies from 1 to 7 days are shown for users in Figure 2.4
and for virtual machine roles in Figure 2.5. The results for both entity types show that as the
length of the operation phase increases, the UPR also increases which is to be expected as
privileged entities use privileges that were not observed during shorter operation phases. For
virtual machine roles, there is very little difference between the UPR for 7 days of operation
vs. 28 days of operation. As we will see later in the metrics, the most variability in virtual
machine permissions exercised occurs during the first few days of the observation phase.
[Figure 2.4: OPR, UPR, and TOPR for users as the operation period varies from 1 to 7 days, for observation periods of 7 and 28 days.]
As the operation phase increases entities are more likely to use privileges they may
not have exercised previously during shorter periods. Thus the unweighted OPR decreases
for both entity types as the operation period increases. However, the TOPR in Figure 2.4
increases as the operation phase increases, indicating that the new privileges exercised during
each additional day of the operation phase do not reduce over-privilege enough to offset the
over-privilege caused by leaving the unexercised privileges granted to the entities longer. The
effect is more pronounced for users than for virtual machine roles - the virtual machine roles have
lower TOPR scores for all operation and observation periods.

[Figure 2.5: OPR, UPR, and TOPR for virtual machine roles as the operation period varies from 1 to 7 days, for observation periods of 7 and 28 days.]
To determine a recommended operation period based on how much one values minimal
over-privilege vs. minimal under-privilege, we use the T Fβ metric (Formula 2.5). Figure 2.6
shows the combined T Fβ score for both user and virtual machine role data for varying
operation period lengths and β values. In these charts β = 10 represents that minimal
under-privilege is considered to be 10 times more important than minimal over-privilege
while β = 0.1 represents that minimal over-privilege is 10 times more important than min-
imal under-privilege. All of the calculated TFβ scores consistently decrease as the operation
period increases, indicating that the smallest operation period of one day is the optimal choice
for minimizing temporal weighted over-privilege and under-privilege. The higher β values
show generally higher scores which decrease less as the operation period increases, indicating
that increasing the operation period would have a less negative impact for those that value
minimal under-privilege.

[Figure 2.6: Combined TFβ scores for users and virtual machine roles (β = 0.1, 1, 2, 5, 10) as the operation period varies from 1 to 7 days.]
Next we evaluate the impact of varying the observation period. The results of evaluating
the automated least privilege policy generator for operation phases of lengths 1 and 7 days as
the observation phase varies from 1 to 28 days are shown for users in Figure 2.7 and for virtual
machine roles in Figure 2.8. As the observation period increases the UPR decreases for users
at a logarithmic rate because more privileges exercised by users are captured during longer
observation phases. For virtual machine roles however there is little benefit in increasing
the observation period beyond two days as these virtual machines are unlikely to exercise
additional privileges that have not been exercised after the first day of observation. For both
entity types the UPR is again lower for the 1 day operation period vs. the 7 day operation
period.

[Figure 2.7: OPR, UPR, and TOPR for users as the observation period varies from 1 to 28 days, for operation periods of 1 and 7 days.]
For both entity types the OPR and TOPR increase as the observation phase increases
because longer observation phases result in entities being granted more privileges. This is
intuitively obvious for users as they are likely to use some privileges periodically which are
captured during the observation phase, and then not use them again for extended periods of
time or at all during the operation phase. Although the virtual machine roles are unlikely to
spontaneously use new privileges like users, not all privileges are exercised on a daily basis.
To determine a recommended observation period based on how much one values minimal
over-privilege vs. minimal under-privilege, we again use the T Fβ metric. For this evaluation
the user and virtual machine role scores are presented separately because (unlike varying
the operation phase in Figure 2.6) the dissimilar behavior patterns of users and virtual
machines produce different recommended observation periods.

[Figure 2.8: OPR, UPR, and TOPR for virtual machine roles as the observation period varies from 1 to 28 days, for operation periods of 1 and 7 days.]

Figure 2.9 displays the TFβ
scores for user entities as the observation phase varies and the operation phase remains fixed
at one day. The decreasing scores for β = 0.1, 1, 2 imply that organizations which value
minimal over-privilege should choose a shorter observation period. Even if minimal under-
privilege is valued twice as much as minimal over-privilege as indicated by β = 2, the OPR
rises significantly faster than the under-privilege rate decreases as the observation period
increases (as shown in Figure 2.7). For β = 5, 10 the T Fβ increases as the observation period
increases before eventually decreasing at 8 days for β = 5 and stabilizing at 13 days for
β = 10 as the increasing OPR outweighs the more heavily weighted but slower to decline UPR.
The T Fβ scores for virtual machine roles are presented in Figure 2.10. The role based scores
for low β again show that organizations which value minimal over-privilege should use small
observation periods, while organizations which value minimal under-privilege will see little
or no benefit in extending the observation period for these roles as the under-privilege rate
showed little decline for observation periods over two days (as shown in Figure 2.8).

[Figure 2.9: TFβ scores (β = 0.1, 1, 2, 5, 10) for user entities as the observation period varies with the operation period fixed at one day.]
The results of this section quantify the effectiveness of our policy generator applied to
a real world hosted web service audit log dataset. They describe how the performance of
the policy generator is affected by varying the observation period and operation period.
Based on this evaluation, we found that the actions of users were relatively difficult to
predict compared to virtual machine roles, with incidents of under-privilege being much
higher for users. Virtual machines could be constrained to their actions used during their
first couple days of operation to significantly reduce the over-privilege present in their policies. For
both types of privileged entities, increasing the operation period increased under-privilege
while increasing the observation period increased over-privilege.

[Figure 2.10: TFβ scores (β = 0.1, 1, 2, 5, 10) for virtual machine roles as the observation period varies.]
The conclusions drawn from these results are valuable because they quantify the perfor-
mance that can be expected by adopting an automated least privilege approach and they
provide a benchmark by which to judge future policy generation algorithms. The generation
of these results also demonstrates the application of the policy generation and evaluation
frameworks which can be used for evaluating future algorithms.
2.7 Summary
This paper explored the challenges and benefits of automating least privilege policies in
a cloud computing environment. Previous research in role mining approaches in other envi-
ronments was examined, and unique aspects of automated role mining in a cloud computing
environment were identified. A bottom-up design to generate least privilege policies was
implemented to illustrate the potential of an automated least privilege approach and the
results of evaluation on real world audit log data were presented. The results showed that
even when administrators attempt to manually create least privilege policies there is signif-
icant room for improvement upon these policies. Metrics for evaluating the effectiveness of
least privilege policy generators were presented for the same data set. These results showed
the trade-offs between over-privilege and under-privilege that can be achieved by varying
the observation period, operation period, and resource constraints for the presented policy
generator and these results provide benchmarks for future policy generators to be evaluated
against.
CHAPTER 3
MINIMIZING PRIVILEGE ASSIGNMENT ERRORS IN CLOUD SERVICES
3.1 Introduction
risks, as well as a lack of motivation to create and enforce such policies. Failing to create
least privilege policies in a cloud computing environment is especially high risk due to the
potentially severe security consequences. However, it is also significantly more difficult
to achieve least privilege in the cloud computing environment than in other environments
due to the large variety of services and actions as detailed in Section 3.3.
Automatic methods for creating security policies that are highly maintainable have re-
ceived a significant amount of research in works that address the Role Mining Problem
(RMP). However, the maintainability of policies does not directly address how secure or
complete a policy is. To directly address the goals of security and completeness in policies,
we define the Privilege Error Minimization Problem (PEMP) where automatically
generated policies for future use are evaluated directly on their security and completeness.
The most important metric of a generated security policy should be how secure it is (mini-
mizing over-privilege) and how complete it is (minimizing under-privilege).
We use machine learning methods to address the PEMP, which is fundamentally a pre-
diction problem. Audit logs contain the richest source of data from which to derive policies
that assign privileges to entities. We mine audit logs of cloud services using one unsupervised
and one supervised learning algorithm to address the PEMP along with a naive algorithm
for comparison. Note that researchers often take a program analysis approach to find which
privileges are needed by specific mobile or other types of applications; we do not take this
approach to address PEMP because the privilege errors in PEMP are associated with priv-
ileged entities, not an application. The F-Measure is a commonly used metric for scoring
in binary classification problems which we adapt to our problem. We show how the β vari-
able of the F-Measure can be used to provide a weighted scoring between under-privilege
and over-privilege. We present the results of our algorithms across a range of β values to
demonstrate how an organization can determine which approach to use based on its level of
acceptable risk.
The main contributions of this paper are: (1) a formal definition of the PEMP which
describes the problem of creating complete and secure privilege policies regardless of the
access control mechanism, (2) a metric to assess how well the PEMP is solved based on
the F-Measure, (3) a methodology of training and validating policy generation algorithms,
and (4) one supervised and one unsupervised learning algorithm applied to generating least
privilege policies and an analysis of their performance.
Section 3.2 reviews related works on role mining and automated least privileges. Sec-
tion 3.3 presents a comparison of the privilege spaces of various environments and a de-
scription of our dataset. Section 3.4 formally defines the PEMP and a scoring metric for
evaluating how well it is solved. Section 3.5 details specific algorithms and methods used
in our approach to addressing the PEMP and Section 3.6 analyzes the results of these al-
gorithms. Section 3.7 concludes this work and discusses potential research areas for future
work.
There are two areas of work closely related to ours: role mining and implementing least
privilege policies in other environments. Role mining refers to automated approaches to
creating Role Based Access Control (RBAC) policies. Role mining can be performed in
a top-down manner where organizational information is used or in a bottom-up manner
where existing privilege assignments such as access-control lists are used to derive RBAC
policies [22]. The problem of discovering an optimal set of roles from existing user permissions
is referred to as the Role Mining Problem (RMP) [23].
While we do not directly attempt to solve the RMP or one of its variations, our work
has aspects in common with works that do. The authors of [22] defined role mining as
being a prediction problem which seeks to create permission assignments that are complete
and secure by mining user permission relations. We also employ prediction to mine user
permission relations and create policies to balance completeness and security. Our work
differs from those that address RMPs in several key ways however. We mine audit log
data produced by a system in operation, not existing or manually created user-permission
assignments. We do not assume that the given data naturally fits into an RBAC policy
that is easy to maintain and secure. Most importantly, instead of evaluating an RBAC
configuration based on its maintainability, we focus on evaluating user privilege assignments
based on their completeness (minimizing under-privilege) and security (minimizing over-
privilege). We view our work as complementary to RMP research as once balanced user
permission assignments are generated, existing RMP methods can be used to derive roles
which are more compact.
Another area of research closely related to ours is works that use audit log data to achieve
least privilege. Privileged entities often already possess the privileges necessary to do their
jobs, thus roles can be derived from existing permissions via data mining methods [13].
Methods of automated policy generation have been studied in several environments. Pol-
gen [24] is one of the earliest works in this area which generates policies for programs on
SELinux based on patterns in the programs’ behavior. Other notable examples of mining au-
dit data to create policies include EASEAndroid [14] for mobile devices, ProgramCutter [15]
for desktop applications, and Passe [16] for web applications. [25] used Latent Dirichlet
Allocation (LDA), a machine learning technique to create roles from source code version
control usage logs. In [26], the same group used a similar approach to evaluate conformance
to least privilege and measured the over-privilege of mined roles in operating systems.
Previous approaches have several shortcomings which are addressed in this paper. Polgen
guides policy creation based on logs but does not provide over-privilege or under-privilege
metrics. EASEAndroid’s goal is to identify malicious programs for a single-user mobile
environment, not to create user policies. ProgramCutter and Passe help partition system
components to improve least privilege but do not create policies for privileged entities. Only
[25], [26] and [27] present metrics on over-privilege and under-privilege by comparing policies
to usage. Key issues with these works is that they assume roles are stable, not accounting
for change in user behavior over time, and use cross-validation for model evaluation which
is not appropriate for environments where temporal relationships should be considered. We
address these shortcomings using the rolling forecasting and sliding simulation methods
discussed in Sections 3.4.3.2 and 3.5.3, respectively. Finally, our work addresses the trade-off
between over- and under-privilege and the selection of different algorithms based on how an
organization values over- vs. under-privilege. A metric based on the F-Measure for scoring
over-privilege and under-privilege by comparing policies to usage, along with a naive policy-building
algorithm, was presented in [27]; we expand upon that metric and use the naive algorithm
presented in that work for comparison purposes.
The cloud environment is multi-user and multi-service, and it is high risk: errors in
privilege assignments can cause significant damage to an organization if exploited. With a
large number of services, unique privileges to each service, as well as federated identities and
identity delegation, the cloud also presents more complexity to security policy adminis-
trators than environments previously studied for policy creation such as mobile, desktop,
or applications. To quantify the scale of privilege complexity, we consider the size of the
privilege spaces for three environments: Android 7, IBM z/OS 1.13, and AWS. Android [28]
requires an application’s permissions to be specified in a manifest included with the appli-
cation with 128 possible privileges that can be granted. For IBM z/OS [29], we consider the
number of services derived from the different types of system resource classes; there are 213
resource classes and five permission states that can be granted to every class. The privilege
space of AWS is much larger however, with over 104 services and 2,823 unique privileges as
of August 2017 [30].
Our dataset for training and evaluation consists of 5.2M AWS CloudTrail audit events
representing one year of cloud audit data provided by a small Software As A Service (SaaS)
company. To better understand how much of the privilege space is used in our dataset,
statistics about privileged user behavior are shown in Table 3.1. This table separates
the metrics by the first month, last month, and total for one year of data. Users is the number
of active users during that time period. Unique Services Avg. is the average number of unique
services used by active users. Unique Actions Avg. is the average number of unique actions
exercised by active users, and ΣAction Avg. is the average of the total actions exercised by
active users. The standard deviation is also provided for the Unique Services, Unique Actions,
and ΣActions metrics to understand the variation between individual users. For example,
looking at both the Unique and ΣActions metrics, we observe that their standard deviation is
higher than the average for all time periods, indicating a high degree of variation between
how many actions users exercise.
The problem we address is that of automatically creating least privilege access control
policies in the cloud environment.
We refer to the problem formally as the Privilege Error Minimization Problem (PEMP)
and define it using the notation from the NIST definition of RBAC [31].
Additionally we define the following terms:
• OBP observation period, the time-period during which exercised permissions (UPE) are
observed and used for creating user-to-permission assignment UPA.
While both UPE and UPA are user-to-permission relations, UPE represents exercised
permissions but UPA represents all assignments. Using the preceding terms, we now define
the PEMP.
Definition 1. Privilege Error Minimization Problem (PEMP). Given a set of users USERS,
a set of all possible permissions PRMS, and a set of user-permissions exercised UPE, find
the set of user-permissions assignments UPA that minimizes the over-privilege and under-
privilege errors for a given operation period OPP.
3.4.2 Algorithm Overview
Now that we have defined the PEMP as being a prediction problem, we adapt existing
prediction algorithms to address it. We utilize two machine learning methods in this paper
to generate privilege policies from mining audit log data. First, we employ clustering to find
privileged entities which use similar permissions, making the problem analogous to that of
finding similar documents in a text corpus. After finding similar users, we generate policies
that combine the privileges used by clustered entities. The second machine learning method
we employ is classification. Using a set of user-to-privilege relations exercised during the
observation period, we train a classifier to learn which user-to-privilege relations should be
classified as grant and which should be denied. Once trained, we use the classifier to generate
policies for an operation period. More details on the application of these algorithms to
generate least privilege policies are discussed in Section 3.5.
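As a rough illustration of the clustering approach (the actual algorithms and parameters used in this work are described in Section 3.5), each user can be represented as a binary vector over the permissions exercised during the observation period, similar users can be clustered, and each user can then be granted the union of permissions used by its cluster. The sketch below uses scikit-learn's KMeans purely as a stand-in for the clustering step; the function and parameter names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_based_policies(upe, all_permissions, n_clusters=5):
    """Grant each user the union of permissions exercised by the users in its
    cluster. `upe` maps user -> set of permissions exercised during the
    observation period; `all_permissions` fixes the feature (column) order."""
    users = sorted(upe)
    perm_index = {p: i for i, p in enumerate(sorted(all_permissions))}
    X = np.zeros((len(users), len(perm_index)))
    for row, user in enumerate(users):
        for perm in upe[user]:
            X[row, perm_index[perm]] = 1.0
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    # The union of exercised permissions within a cluster becomes its policy.
    cluster_perms = {}
    for user, label in zip(users, labels):
        cluster_perms.setdefault(label, set()).update(upe[user])
    return {user: cluster_perms[label] for user, label in zip(users, labels)}
```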
We borrow techniques and terminology used in machine learning literature for assessing
the effectiveness of our algorithms in addressing the PEMP. Using a standard approach
for evaluating the effectiveness of a predictive model [32], we take a test dataset for which
we know the expected (target) predictions that the model should make, present it to a
trained model, record the actual predictions that it made, and compare them to the expected
predictions. We first present our method for scoring individual predictions, and then our
method for splitting up the dataset into multiple partitions.
Policy generation for a given operation period is a two-class classification problem where
every user-to-permission mapping in a generated policy falls into one of two possible classes:
grant or deny. By comparing the predicted privileges to the target privileges, we can cate-
gorize each prediction into one of four outcomes:
• True Positive (TP): a privilege that was granted in the predicted policy and exercised
during the OPP.
• True Negative (TN): a privilege that was denied in the predicted policy and not exercised
during the OPP.
• False Positive (FP): a privilege that was granted in the predicted policy but not exercised
during the OPP.
• False Negative (FN): a privilege that was denied in the predicted policy but attempted to
be exercised during the OPP.
Using the above outcomes we can then calculate precision, recall, and the F1 mea-
sure, a frequently used set of performance metrics in machine learning and information
retrieval [32]. Precision and recall are defined as follows [32]:

precision = TP / (TP + FP)    (3.1)

recall = TP / (TP + FN)    (3.2)
In terms of this problem domain, precision is the fraction of permissions accurately
granted by the predictor (T P ) over all permissions granted by the predictor (T P + F P ).
If there were no permissions granted by the predictor that went unused in the OPP, then
precision = 1. Thus a high precision value is an indicator of low over-privilege. Similarly,
recall is the fraction of permissions accurately granted by the predictor (T P ) over all permis-
sions exercised in the OPP (T P + F N ). If there were no permissions denied by the predictor
that should have been granted, then recall = 1. Thus a high recall value is an indicator of
low under-privilege.
Precision and recall can be collapsed into a single performance metric, the F1 measure,
which is the harmonic mean of precision and recall. For predictive assessment, it is often
preferable to use a harmonic mean as opposed to an arithmetic mean. Arithmetic means
are susceptible to large outliers which can dominate the performance metrics. The harmonic
mean, however, emphasizes the importance of smaller values and thus gives a more realistic
measure of model performance [32]. For example, the arithmetic mean when precision=0 and
recall=1 is 0.5, however the harmonic mean of those same values is 0.
The F1 measure is “balanced” because it gives equal weighting to precision and recall.
For our assessment we utilize a general form that allows for a variable weighting between
recall and precision (or, under-privilege and over-privilege), β. High β values increase the
importance of recall, while low β values increase the importance of precision. The weighted
measure Fβ is defined in Equation 3.3.

Fβ = (1 + β²) · (Precision · Recall) / ((β² · Precision) + Recall)    (3.3)
The β weighting is important because it is not reasonable to expect all potential users
of a policy generation tool to value over-privilege and under-privilege equally. Molloy et al.
identified equal weighting between over- and under-assignments as a problem in several pre-
vious works addressing the RMP [33], and preferred to weight more importance to reducing
over-privilege. It is also reasonable to expect that some organizations are willing to accept
more risk from over-privilege to minimize the cost of privileged entities not being able to
perform their duties due to under-privilege.
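To make the weighting concrete, the following short Python sketch (with hypothetical outcome counts) computes precision, recall, and the weighted Fβ measure of Equations 3.1–3.3; it is an illustration only, not the evaluation code used in this work.

```python
def f_beta(tp, fp, fn, beta):
    """Weighted F measure (Equation 3.3). beta > 1 emphasizes recall (penalizes
    under-privilege); beta < 1 emphasizes precision (penalizes over-privilege)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta ** 2) * (precision * recall) / (beta ** 2 * precision + recall)

# Hypothetical counts for a single operation period:
print(f_beta(tp=80, fp=20, fn=10, beta=1))       # balanced F1
print(f_beta(tp=80, fp=20, fn=10, beta=1 / 10))  # strongly favors low over-privilege
```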
Following the standard approach for evaluating model effectiveness described earlier,
we will compare predicted results to expected (target) results. Rather than using a single
operation period for our evaluation which may not be representative of the entire dataset,
we must partition the dataset into multiple training and test sets using a sampling method.
We then aggregate the results of evaluating these partitions to produce a single score for a
proposed solution.
For our scenario however, we observe that there is a temporal aspect to permissions
and there are interdependencies between the exercised actions, which impose specific restrictions
on how we should partition the dataset. For example, a resource such as a virtual
machine must be created before it can be used, modified or deleted. Methods such as hold-
out sampling and k-fold cross validation which randomly partition a dataset do not account
for interdependencies in the data and may not allow learning algorithms to observe these
dependent actions. Thus we use a sampling approach suited to scenarios like ours, which
considers a time dimension with interdependent data, referred to as “out-of-time sampling”;
it is a form of hold-out sampling which uses data from one time period to build a training
set and another period to build a test set [32]. The application of out-of-time sampling to
generate and score multiple training and test sets is sometimes known as “rolling forecasting
origin”, which is similar to cross-validation but the training set consists only of observations
that occurred prior to those in the test set [34]. Suppose k observations are required to
produce a reliable forecast. Then rolling forecasting origin works as follows [34].
1. Select the observation at time k + i for the test set, and use the observations at times
1, 2, ..., k + i − 1 to estimate the forecasting model. Compute the error on the forecast
for time k + i.
2. Repeat the above step for i = 1, 2, ..., T −k where T is the total number of observations.
Adapting the above method to our domain, we allow the training set/observation period
to be comprised of any set of dates before time k + i, and the test set/operation period is
specifically at time k + i. We define the step size i to be of one day, which is an adequate
amount of time to complete most tasks using related permissions. Also, when using an
automated solution to generate permission policies, it is reasonable to expect that new
solutions can be generated on at least a daily basis.
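A minimal sketch of this rolling evaluation loop is shown below; generate_policy and score are placeholders for a policy generation algorithm and the Fβ scoring of Section 3.4.3.1, and the observation period is simplified here to all days prior to the operation day.

```python
def rolling_forecast_evaluation(daily_logs, k, generate_policy, score):
    """daily_logs: audit events grouped per day, in chronological order.
    k: number of days of data required before the first forecast.
    Returns the mean score across all operation periods (Equation 3.4)."""
    scores = []
    for i in range(k, len(daily_logs)):
        observation_period = daily_logs[:i]   # training set: days before day i
        operation_period = daily_logs[i]      # test set: the single day i
        policy = generate_policy(observation_period)
        scores.append(score(policy, operation_period))
    return sum(scores) / len(scores)
```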
The measure of forecast accuracy in our scenario is the Fβ score for a given operation
period described in Section 3.4.3.1, where a perfect prediction with no over-privilege and no
under-privilege present would score a 1.0. We use a rolling mean to compute the accuracy
of a proposed solution across all operation periods. Thus our quality measure used for
assessing an automated solution to creating permission policies should maximize the average
Fβ measure across all operation periods:
(1 / (T − k)) · Σ_{i=1}^{T−k} Fβ(Precision_i, Recall_i)    (3.4)
3.5 Methodology
This section describes the algorithms and techniques we design to address the PEMP in
the cloud environment. We first present a naive algorithm used to establish a performance
baseline against which to compare our learning-based approaches.
While the naive algorithm merely uses a privileged entity’s observed privileges to build policies,
the learning-based approaches also account for the behavior of other users when generating
policies. Each of these methods is applied for a single operation period. The evaluation of
an algorithm across multiple operation periods is done using the method described in Section
3.4.3.2.
The naive approach shown in Algorithm 1 takes all privileges exercised during the obser-
vation period as input and combines them to form a privilege policy to be used during the
operation period. This seems a reasonable approach for a policy administrator to take if they
needed to implement a least privilege policy in an environment where all privileged entities
previously had unrestricted access to all permissions. By examining all previous access logs
or only the access logs up to a specific point in the past, they can discover all privileges
used by each privileged entity and thus expect this to be the set of privileges required for a
privileged entity to perform their duties. Although infrequently used privileges will not be
captured if they are outside of the observation period, policy generation algorithms can still
achieve good results without knowing the frequency with which these privileges are exercised
because infrequently used privileges will have little impact on the Fβ score, particularly for
low β values which value minimizing over-privilege. Furthermore, in a low β environment
it is likely that infrequently used privileges should be denied by default and granted by
exception instead of always being granted by a long-term policy.
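Algorithm 1 is not reproduced here, but the naive approach reduces to granting each privileged entity exactly the permissions it exercised during the observation period, as in this minimal Python sketch:

```python
from collections import defaultdict

def naive_policy(upe):
    """upe: iterable of (user, permission) pairs exercised during the OBP.
    Returns UPA: each user is granted exactly the permissions they exercised."""
    upa = defaultdict(set)
    for user, perm in upe:
        upa[user].add(perm)
    return dict(upa)
```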
Our unsupervised learning policy generation method (Algorithm 2) uses a clustering al-
gorithm to find clusters of similar privileged entities based on their permissions exercised.
By placing each permission exercised by an entity into a separate document and applying
clustering to the document corpus (lines 2-5), we have made the problem analogous to find-
ing similar text documents in a corpus. Once similar entities are grouped by clustering,
each group is assigned a shared role and granted the combined permissions of all entities
in that role (lines 6-14). Entities which do not belong to any cluster are granted only the
privileges they used during the observation period just as in the naive method (lines 15-19).
It is important to note that using this method of combining similar entities only grants
permissions additional to those used during the observation period. This is useful in
environments where minimizing under-privilege is more important than minimizing over-privilege.
There are several details of our application of clustering worth describing here. Each
document is converted to a feature vector for clustering using a Term Frequency-Inverse
Document Frequency (TF-IDF) vectorizer. TF-IDF is a common approach for finding similar
Algorithm 2: Unsupervised Policy Generator
Input: UPE, the set of user-permissions exercised during the observation period OBP.
Output: UPA, the mapping of user-to-permission assignments.
1  UPA, documents ← ∅;
2  for user, perm ∈ UPE do
3      documents_user ← documents_user ∪ perm;
4  end
5  clusters, outliers ← DBSCAN(documents);
6  for cluster ∈ clusters do
7      role ← ∅;
8      for user, document ∈ cluster do
9          for perm ∈ document do
10             role ← role ∪ perm;
11         end
12     end
13     UPA_user ← role for each user ∈ cluster;
14 end
15 for user, document ∈ outliers do
16     for perm ∈ document do
17         UPA_user ← UPA_user ∪ perm;
18     end
19 end
20 return UPA
documents in information retrieval [35]. The TF-IDF weighting has the advantage that
it preserves information about how often each permission is exercised by a user. Once
vectorization is complete, the specific clustering algorithm we use for finding similar users is
the DBSCAN algorithm of the scikit-learn library [36], an implementation of the algorithm
originally published in [37]. The DBSCAN algorithm has several advantages for our scenario,
primary among them being that we do not need to specify the expected number of clusters
ahead of time, unlike other popular clustering algorithms such as k-means. The performance
of DBSCAN also scales well with regard to the number of samples when compared to
other clustering algorithms [38]. There is one relevant hyper-parameter for DBSCAN which
we vary in our policy generation experiments: ε, the maximum distance between two
samples for them to be considered part of the same cluster. We explore three methods for
calculating ε: the mean distance between all points, the median distance between all points,
and the middle point between the minimum and maximum points in the vector space.
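A sketch of this vectorization and clustering step using scikit-learn is shown below; the eps value is assumed to have been computed by one of the three methods above, and min_samples=2 is an illustrative choice not specified in the text.

```python
from collections import defaultdict
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_policy(upe, eps):
    """upe: iterable of (user, permission) pairs exercised during the OBP.
    eps: DBSCAN distance threshold (assumed precomputed).
    Returns UPA as a dict mapping each user to a set of granted permissions."""
    docs = defaultdict(list)
    for user, perm in upe:
        docs[user].append(perm)
    users = list(docs)
    corpus = [" ".join(docs[u]) for u in users]

    # One document per user, vectorized with TF-IDF; tokens are whole permission strings.
    vectors = TfidfVectorizer(token_pattern=r"\S+").fit_transform(corpus)
    labels = DBSCAN(eps=eps, min_samples=2, metric="cosine").fit_predict(vectors)

    # Users in a cluster share the union of that cluster's permissions;
    # outliers (label -1) keep only their own exercised permissions.
    cluster_perms = defaultdict(set)
    for user, label in zip(users, labels):
        if label != -1:
            cluster_perms[label].update(docs[user])
    return {user: (cluster_perms[label] if label != -1 else set(docs[user]))
            for user, label in zip(users, labels)}
```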
Algorithm 3: Supervised Policy Generator
Input: UPE, User-Permissions Exercised: the set of user-permissions exercised during the observation period OBP.
Input: PRMS, the set of possible permissions.
Input: TSP, Training Set Parameters: mapping of parameters used to build the training set.
Input: CAP, Classifier Algorithm Parameters: mapping of parameters used to configure the classifier.
Input: PGP, Policy Generation Parameters: mapping of parameters used to build the predicted policy from a trained classifier.
Output: UPA, mapping storing the roles generated by each of the classifier instances.
1  UPA ← ∅;
2  for tParams ∈ permute(TSP) do
3      featureVector, labelSet ← createTrainingSet(tParams, UPE);
4      for clfParams ∈ permute(CAP) do
5          clf ← decisionTree(clfParams);
6          clf ← clf.train(featureVector, labelSet);
7          for pParams ∈ permute(PGP) do
8              roles ← ∅;
9              possiblePrivs ← createPossiblePrivs(pParams, PRMS);
10             for user, perm ∈ possiblePrivs do
11                 if clf.predict(user, perm) == 'granted' then
12                     roles_user ← roles_user ∪ perm;
13                 end
14             end
15             UPA_{tParams, clfParams, pParams} ← roles;
16         end
17     end
18 end
19 return UPA
3.5.3.1 Classification Algorithm and Feature Selection
We use a decision tree (DT) classification algorithm for supervised learning, also from
the scikit-learn library [36]. The algorithm implemented in the library is an optimized
implementation of the CART algorithm published in [39]. The advantages of the
decision tree algorithm used are speed and the ability to display the set of rules learned during
classification. It was also the top performing classification algorithm in our preliminary
comparison of 15 different classification algorithms in the scikit-learn library.
We utilize five features available directly from the audit log data for training: the time
at which a permission was exercised, the unique identifier of the executing entity, the type
of entity (user or delegated role), the service to which the action belonged, and the type
of action performed. Instead of using the absolute time of an action, we derive features
capturing whether it was exercised on a weekend or weekday, as well as the specific day
of the week. These are all bottom-up data attributes available directly from the access
logs. Other top-down information such as job role or organization department was not
available with our dataset (nor does it exist in many small organizations), but could easily
be integrated with the exercised privilege information if available.
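As an illustration of this feature derivation (the column names below are assumptions, not the CloudTrail schema), the training features could be built with pandas as follows:

```python
import pandas as pd

def build_features(events):
    """events: DataFrame of audit log entries with assumed columns
    'timestamp', 'entity_id', 'entity_type', 'service', and 'action'."""
    df = events.copy()
    ts = pd.to_datetime(df["timestamp"])
    # Replace the absolute time with coarser, recurring temporal features.
    df["day_of_week"] = ts.dt.dayofweek                  # 0 = Monday ... 6 = Sunday
    df["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)
    df = df.drop(columns=["timestamp"])
    # One-hot encode the categorical attributes for the decision tree.
    return pd.get_dummies(df, columns=["entity_id", "entity_type", "service", "action"])
```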
Several hyper-parameters must be selected for our supervised learning approach. These
include parameters for the decision tree classifier, the construction of the training set, and the
policy construction from the trained classifier. Our method for selecting optimized hyper-
parameters uses only out-of-sample data and is an adaptation of the “sliding simulation”
method presented in [40].
The sliding simulation method of [40] is based on three premises. First, a model should
be selected based on how well it predicts out-of-sample actual data, not on how well it fits
historical data. Second, a model is selected from among many candidates run in parallel on
the out-of-sample data. Third, models are optimized for each forecast horizon separately,
making it possible to use different models and optimize parameters within models. The
method operates by running several prediction models in parallel across a sliding window of
data, computing the accuracy of each model for a given period and selecting the model(s)
with the best score to be used in creating the forecast for the next period. Using this
technique, the author in [40] showed that it outperformed the best method of a previous
competition in statistical forecasting (the M -Competition [41]) by a large margin.
As in the sliding simulation method, we run many permutations of parameters in parallel
on out-of-sample data and use the best performing parameters to create a future prediction.
Modifications were implemented to adapt sliding simulation to our problem domain. Slid-
ing simulation originally dealt with making numerical predictions and measuring the error
between a predicted and actual value. In our scenario a security policy is the prediction and
we use the Fβ score presented in Section 3.4.3.1 as our scoring criteria. While [40] used all
observation points before the forecast period, the most recent exercised permissions are most
relevant to predicting future permissions; training a classifier with older and less relevant
permissions had a negative effect on prediction accuracy.
We use two methods of decomposing the time series data which we term filter decomposi-
tion and filler decomposition. For the filter method, the days which do not fit into the chosen
model are filtered out of each observation period in the sliding window evaluation before the
data are used by the algorithms. With the filler method, the end date of the sliding window
evaluation is used as a starting point and the observation period is created by enlarging the
window by moving the start date backward until the observation period is “filled” with only
data matching the chosen model. Consider a sliding window evaluation with a window size
of 10 days using these two decomposition methods. For the filter method, the number of
days fitting the weekday model will vary from 6 to 8, and the number of days fitting the
weekend model will vary from 2 to 4. For the filler method, the number of days fitting a
model will always be 10 days when the sliding window size is 10 days.
The decomposition method used for evaluation is chosen based on the β value we wish
to optimize for. For algorithms seeking to score well for β > 1, increasing the window
size results in better scores, and the filter approach is used where the variations in the
observation dataset size are smoothed out across larger windows. For experiments which
seek to score well for β < 1, smaller window sizes score more favorably but the variable
number of matching days which fit within a chosen time period can have undesirable effects
on the results when using small window sizes. Thus the filler method is used in experiments
for β < 1, which gives a consistent number of days of data in each window.
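The two decomposition methods can be sketched as follows for a weekday model; the window is anchored at the evaluation end date, and the helper names are ours rather than the implementation's.

```python
from datetime import date, timedelta

def filter_window(end_date, window_days, matches_model):
    """Filter decomposition: take the fixed-size window ending at end_date and
    keep only the days that fit the chosen model (e.g. weekdays)."""
    days = [end_date - timedelta(d) for d in range(1, window_days + 1)]
    return [d for d in days if matches_model(d)]

def filler_window(end_date, window_days, matches_model):
    """Filler decomposition: walk backwards from end_date until the window is
    'filled' with exactly window_days days matching the chosen model."""
    days, cursor = [], end_date
    while len(days) < window_days:
        cursor -= timedelta(1)
        if matches_model(cursor):
            days.append(cursor)
    return days

is_weekday = lambda d: d.weekday() < 5  # Monday..Friday
# filter_window(date(2017, 8, 15), 10, is_weekday) -> 6 to 8 weekdays
# filler_window(date(2017, 8, 15), 10, is_weekday) -> exactly 10 weekdays
```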
3.6 Results
This section analyzes the performance of our algorithms for generating security policies.
We first examine the results using the complete model and then show how decomposition
and the use of multiple decomposed models can improve on those results.
The Receiver Operating Characteristic (ROC) curve is a graphic commonly used to chart
the performance of binary classifiers. It charts the trade-off between the True Positive Rate
(Recall) and the False Positive Rate. Figure 3.1 presents this trade-off for each of our algorithms.
[Figure 3.1: True Positive Rate (Recall) vs. False Positive Rate (log scale, 0.00001 to 1) for DBSCAN-Average, DBSCAN-Median, DBSCAN-Middle, Naïve, and DT-SOD-Recall.]
[Figure 3.2: Fβ score vs. β (1/100 to 100) for DBSCAN-Average, DBSCAN-Median, DBSCAN-Middle, Naïve, DT-SOD, DT-SND, and AllowAll.]
The supervised algorithms score significantly better than the naive algorithm as β decreases, with the performance gap
widening until β < 1/30, where the scores of the supervised and naive algorithms cease to
improve as β decreases. The unsupervised algorithms score relatively poorly for β < 1.
The trends in these charts highlight the strengths and weaknesses of each algo-
rithm. By granting users the privileges used by similar users, the unsupervised algorithms
predict privileges a user may use in the future. But there is no mechanism for the unsuper-
vised learning algorithm to learn which possible privilege grants may result in over-privilege
and restrict these privileges accordingly. The supervised algorithms attempt to learn any
patterns in the past data and use these to predict future privilege assignments. While priv-
ileges used previously are likely to be used again and rarely used privileges can be denied
with some degree of confidence, it is difficult to predict the usage of a future privilege that
has never been used before using only past patterns.
Figure 3.1 and Figure 3.2 show the scores of algorithms regardless of the size of the
observation period. We next examine the performance of these algorithms for fixed β values
as the observation period size varies. We chose values β = 80 and β = 1/10 because these
seemed the most interesting in terms of the trade-offs between the various methods. The
performance of the unsupervised and naive algorithms for β = 80 are shown in Figure 3.3.
The choice of ǫ as the threshold for determining which users are alike presents interesting
trade-offs between window size and score. In general, using the median for calculating ǫ
consistently provides slightly better scores than the naive approach across all window sizes
with the scores for both the unsupervised algorithm (with the middle method) and naive
algorithm peaking at 115 days. Using the average and middle methods for calculating ǫ both
provide better scores for observation periods < 40 days, but their scores level off there and
begin to gradually decrease after peaking at 59 days for the average method and 68 days for
the median method.
The performance of the supervised and naive algorithms for β = 1/10 is shown in
Figure 3.4. The naive algorithm achieves its best performance with an observation period
[Figure 3.3: Score vs. observation period size in days (0–120) for β = 80; DBSCAN-Average, DBSCAN-Median, DBSCAN-Middle, and Naïve.]
[Figure 3.4: Score vs. observation period size in days (1–7) for β = 1/10; Naïve, DT-SOD-B1/10, and DT-SND-B1/10.]
In this section we present the results after decomposing the dataset into separate models
for weekday and weekend data using the decomposition methods discussed previously in
Section 3.5.4.
The performance of the complete and decomposed models for β values >= 1 for both the
naive algorithm and the unsupervised algorithm (with the average method for calculating ε)
are shown in Figure 3.5. For both algorithms, the weekday model performance is superior
to the complete model for β values >= 1.
[Figure 3.5: Score vs. β (1 to 100) for the complete, weekday, and weekend models of the Naïve and DBSCAN-Average algorithms.]
The trend previously illustrated in Figure 3.2 continues here, as similar users will exercise similar privileges in a cluster if one is identified.
[Figure 3.6: Score vs. β (1/100 to 1) for the complete, weekday, and weekend models of the Naïve and DT-SND algorithms.]
The performance of the complete and decomposed models for β values <= 1 for both the
naive algorithm and the supervised algorithm (using the SND labeling method) are shown
in Figure 3.6. As with the unsupervised algorithm and β values >= 1, the weekday model
outperforms the complete model while the weekend model under-performs the complete
model for β values <= 1 as well.
models for the supervised algorithm is much larger than in previously examined experiments.
With the inconsistent activity of the weekend actions removed, the supervised algorithm is
better able to identify and leverage patterns to create security policies. The performance
of the supervised algorithm for the weekend model decreased substantially compared to the
complete model however. For β = 1/30, the supervised weekend model scored 39% lower
than the complete model, while the naive weekend model scored only 19% lower than its
complete model. The reasons for the lower weekend model scores for the supervised algorithm
are the same as the lower weekend model scores for the unsupervised algorithm: there is less
data to work with and higher variability in that data.
Section 3.6.2 illustrated how decomposition improved scoring for the weekday model, but
we are interested in finding the highest possible score across all days in the available dataset.
To improve the overall score, we combine two previously examined models, using
one model and algorithm for the weekday policies and another model and algorithm for the
weekend policies; we refer to this as a recomposed model. To build the recomposed
model, we use policies from the weekday model when evaluating weekdays, but as the pre-
viously examined results have shown, the weekend models performed fairly poorly, so we
instead use policies generated by the complete model when evaluating weekends.
The performance of the complete and recomposed models for β values >= 1 for both
the naive algorithm and the unsupervised algorithm (with the average method used for
calculating ε) are shown in Figure 3.7. For the unsupervised algorithm, the recomposed
model outscores the complete model for β values >= 5, and outscores the naive algorithm
for both the complete and recomposed models for β >= 50, with the performance gap
increasing after that as β increases. For the naive algorithm however, the improved scores
of the weekday model are not enough to offset the poorer scores of the complete model for
the weekend days, thus the recomposed model using the naive algorithm scores almost the
same as the complete model for β > 5. The scores for the highest β value tested are 0.9379
for the recomposed model with the unsupervised algorithm and 0.9149 for the recomposed
model with the naive algorithm, an improvement of 2.5% over an already fairly high score.
[Figure 3.7: Score vs. β (1 to 100) for the complete and recomposed (weekday/complete) models of the Naïve and DBSCAN-Average algorithms.]
[Figure 3.8: Score vs. β (1/100 to 1) for the complete and recomposed (weekday/complete) models of the Naïve and DT-SND algorithms.]
Creating security policies is inherently an optimization problem that must balance be-
tween minimizing over-privilege and minimizing under-privilege. How much one values
achieving one of these objectives vs. the other can be expressed using the β value as described
in Section 3.4.3. The results of this section demonstrate the effectiveness of algorithms and
decomposition methods that can be used to create better security policies for a
cloud environment, with “better” being expressed in terms of the Fβ score.
We also presented the results of using decomposition methods to decompose the dataset
into weekday and weekend models and then use the best aspects of the weekday and complete
models for scoring across the complete dataset time period. Not all audit log datasets will
exhibit similar behavior that benefits from such decomposition, but it is reasonable to expect
many datasets consisting of audit log events generated by human privileged entities working
a five-day work week will. Regardless of the decomposition method used, we find that the
unsupervised algorithm performs more favorably as β increases due to its ability to
use information from similar users to predict the future use of privileges. The unsupervised
algorithm does not have a mechanism to deny privileges however, so its scores are relatively
low for small β values. Conversely, the supervised algorithm performs more favorably
as β decreases but poorly for large β values. The supervised algorithm is able to use
the recurring patterns in data to score well for restricting privileges, but scores poorly at
predicting possible new privileges that privileged entities may use. The naive approach
performs well only for values near β = 1, representing its favorability for environments
which value balancing over- and under-privilege nearly equally but it is outperformed by
the other algorithms as the β value increases or decreases away from β = 1. The key
takeaway from these results is that how an organization values over-privilege vs.
under-privilege will determine which algorithm is best suited for generating that
organization’s security policies; none of the three examined algorithms is clearly
superior to the others for all likely scenarios.
3.7 Summary
This paper addressed issues related to automatically creating least privilege policies in
the cloud environment. We defined the Privilege Error Minimization Problem (PEMP)
to directly address the goals of completeness and security when creating privilege policies,
and introduced a weighted scoring mechanism to evaluate a policy against these goals. We
adapted techniques from statistical forecasting and machine learning to train and evalu-
ate a supervised and an unsupervised learning algorithm for automated policy generation.
The results of our analysis show that the supervised algorithm performed well for reducing
over-privilege while the unsupervised algorithm performed well for reducing under-privilege
compared to a naive approach. These results demonstrate the potential to apply such au-
tomated methods to create more secure roles based on an organization’s acceptable level of
risk in accepting over-privilege vs. its desire to minimize the effort to correct under-privilege.
This paper suggests many possibilities for future research in automated least privilege
approaches. The policy generation approaches described in this paper are based on features
directly available in the audit logs such as the service name, user name, and privilege ex-
ercised. For future research we would consider additional features, such as properties of the
requesting entity and the resources being operated on (for example, a user’s job title and organiza-
tional unit, or the subnet(s) within which a virtual resource operates). Combining the ability
of the unsupervised algorithm (to predict the use of future privileges based on clusters of
similar users) with the ability of the supervised algorithm (to restrict privileges which are
unlikely to be used in the future) may also improve scoring.
CHAPTER 4
MINING LEAST PRIVILEGE ATTRIBUTE BASED ACCESS CONTROL POLICIES
4.1 Introduction
Access control is a key component of all secure computing systems but implementing
effective and secure access control policies is a significant challenge. Access control policies
are predictions about which privileged entities will exercise specific operations upon specific
objects under various conditions and accurately predicting the future is always difficult. Too
much over-privilege increases the risk of damage to the system via compromised credentials,
insider threats, and accidental misuse. Policies that are under-privileged prevent users from
being able to perform their duties. Both of these conflicting goals are expressed by the
principle of least privilege which requires every privileged entity of a system to operate using
the minimal set of privileges necessary to complete its job [20]. The principle of least privilege
is a fundamental access control principle in information security [1] and is a requirement in
security compliance standards such as the Payment Card Industry Data Security Standard
(PCI-DSS), Health Insurance Portability and Accountability Act (HIPAA) and ISO 17799
Code of Practice for Information Security Management [21].
Many access control models have been introduced to address the challenges of creating
and administrating secure and effective access control policies, with different approaches
attempting to balance between the competing goals of ease of use, granularity, flexibility,
the ability to leverage aspects unique to a specific domain, and scalability. Access control
models are constantly evolving, but Attribute Based Access Control (ABAC) continues to
gain in popularity as the solution to many access control use cases because of its flexibility,
usability, and ability to support information sharing across disparate organizations. ABAC
allows security policies to be created based on the attributes of the user, operation, and
environment at the time of an access request.
The flexibility of ABAC policies is both a major strength and a hindrance. With the
ability to create policies based on many attributes, administrators face difficult questions:
what constitutes a “good” ABAC policy, how should such policies be created, and how can
they be validated? Additionally, the ABAC privilege space of a system can be extremely large, so how can
administrators determine which attributes are most relevant in their systems? We address
these issues by taking a rule mining approach to create ABAC policies from audit logs. Rule
mining methods are a natural fit for creating ABAC policies because security policies are
a set of rules regarding the actions that users can perform upon resources. By identifying
common patterns of usage between the attributes and values from audit logs, rules can be
created based on an organization’s acceptable level of risk regarding under- vs. over-privilege.
By using out-of-sample validation to evaluate the effectiveness of the generated policies on a
dataset of 4.7M Amazon Web Service (AWS) log events, our experiments show that our rule
mining based approach is effective at generating policies which minimize the instances of
under-privilege (which allows users to perform their necessary duties), while also minimizing
over-privilege (which reduces security risks to the system).
We address the problem of creating least privilege ABAC policies using rule mining tech-
niques in this research through the following contributions: 1) a definition for the ABAC
Privilege Error Minimization Problem (P EM PABAC ) which addresses balancing between
under- and over-privilege errors in security policies, 2) an algorithm for automatically gen-
erating least privilege ABAC policies from mining audit logs, 3) an algorithm for scoring
ABAC policies using out-of-sample validation, 4) feature selection, scalability, and perfor-
mance optimization methods for processing large ABAC privilege spaces, 5) a quantitative
analysis of the performance of our mining algorithm using a real-world dataset consisting
of over 4.7M audit log entries, and 6) a performance comparison of automatically generated
ABAC policies created by our mining algorithm with automatically generated role based
policies.
The rest of this paper is organized as follows. Section 4.2 provides background information
on the ABAC model and rule mining methods. Section 4.3 reviews related work specific
to mining access control policies. Section 4.6 formally defines the ABAC version of the
privilege error minimization problem of mining ABAC policies with minimal under- and over-
privilege assignment errors and defines metrics for evaluating policies. Section 4.7 details
specific algorithms and methods used in our approach for addressing the problem defined
in Section 4.6. Section 4.8 analyzes the results of applying our algorithms to a real-world
dataset. Section 4.9 concludes and discusses potential future work.
4.2 Background
4.2.1 Attribute Based Access Control (ABAC)
4.2.1.1 ABAC Definition
NIST defines ABAC as “An access control method where subject requests to perform op-
erations on objects are granted or denied based on assigned attributes of the subject, assigned
attributes of the object, environment conditions, and a set of policies that are specified in
terms of those attributes and conditions” [42]. Attributes are any property of the subjects,
objects, and environment encoded as a name:value pair. Subjects may be a person or non-
person entity (such as an autonomous service), objects are system resources, operations are
functions executed upon objects at the request of subjects and environment conditions are
characteristics of the context in which access requests occur and are independent of subjects
and objects [42].
By using identity federation and basing access decisions on policies using an abstracted
common set of attributes, decisions can be externalized with policies established across orga-
nizational boundaries [43]. Because of these characteristics, the Federal Identity, Credential,
and Access Management (FICAM) Roadmap 2.0 called out ABAC as a recommended access
control model for promoting information sharing between diverse and disparate organiza-
tions [42].
The Role Based Access Control (RBAC) model has been the de-facto access control
standard for industry and academia for more than two decades [44]. Using RBAC, admin-
istrators identify privileges needed for common job functions, create roles for each function
and assign users to their appropriate roles for performing their duties. This simplifies the
administrators’ task compared to DAC and provides more granularity than MAC.
However, as access control needs have become more complex and applied to more di-
verse domains, organizations have found that RBAC does not provide sufficient granularity,
becomes too difficult to manage, or does not support their information sharing needs. Orga-
nizations facing these challenges may meet them using an ABAC based system. Consider the
case of an administrator that wishes to restrict operations needed for performing a database
backup to a specific maintenance window timeframe and a specific location or IP address
range. Such constraints can be easily expressed using ABAC attributes, but cannot be ex-
pressed using only the user, operation, and object semantics of the RBAC model. Another
common problem with RBAC is “Role Explosion”, where the need to define and assign users
many roles to access diverse sets of different applications within an organization makes main-
tenance of the many roles unmanageable. ABAC is able to address this problem by defining
policies based on user attributes (for example their job title, supervisor, or skill set in an
HR database) so that access control decisions are made according to attributes of the user
at the time of the access request.
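As an illustration only (the attribute names and rule structure below are hypothetical and not tied to any particular ABAC policy language), the database-backup constraint described above could be expressed as a predicate over subject, operation, and environment attributes:

```python
from ipaddress import ip_address, ip_network

# Hypothetical ABAC rule for the database-backup example; attribute names
# and operation identifiers are illustrative assumptions.
def backup_rule(subject, operation, environment):
    return (
        subject.get("role") == "dba"
        and operation in {"db:StartBackup", "db:ExportSnapshot"}
        and 2 <= environment["hour_utc"] < 4                        # maintenance window
        and ip_address(environment["source_ip"]) in ip_network("10.0.5.0/24")
    )

# Example request evaluation:
print(backup_rule({"role": "dba"}, "db:StartBackup",
                  {"hour_utc": 3, "source_ip": "10.0.5.17"}))       # True
```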
4.2.2 Rule Mining Methods
Frequent itemset mining and association rule mining are two popular rule mining methods
for identifying patterns in commercial databases [45] with applications in many diverse fields.
Frequent itemset mining is the first step in association rule mining and is a deterministic
method that identifies common patterns in a database of transactions. The frequent itemset
problem is defined as follows: given a transaction database DB and a minimum support
threshold ε, find the complete set of frequent patterns in the database. The set of items is
I = {a1, ..., an} and a transaction database is DB = ⟨T1, ..., Tm⟩, where Ti (i ∈ [1...m]) is a
transaction which contains a set of items in I. The support of a pattern A (where A is a set
of items) is the fraction of transactions containing A in DB:

support(A) = |{Ti ∈ DB : A ⊆ Ti}| / |DB|

A pattern is frequent if A's support is ≥ ε (the minimum support threshold) [46].
Association rule mining uses itemsets identified by a frequent itemset mining algorithm
to identify rules of the form X ⇒ Y where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The first itemset
X is the “antecedent” and the second itemset Y is the “consequent”. The confidence of
a rule X ⇒ Y is the proportion of the transactions that contain X which also contain Y:

confidence(X ⇒ Y) = support(X ∪ Y) / support(X)

Given a transaction database DB, minimum support
threshold ε, and minimum confidence c, the association rule mining problem is to find all of
the rules in DB that have support ≥ ε and confidence ≥ c [47].
The output of frequent itemset mining is many subsets of items that occurred within the
transaction database DB, while the output of association rule mining is two subsets (X ⇒ Y)
implying the probability that Y occurs in the transaction database given X. In the context
of creating security policies, there is a clear translation of frequent itemsets into ABAC rules.
Just as frequent itemsets state whether a pattern occurred or not (with a given support ≥ ǫ),
security policies must make a binary decision about whether a request should be allowed or
denied.
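As a toy illustration of these definitions (the transactions below are hypothetical log-derived itemsets, not drawn from our dataset):

```python
# Toy transaction database: each transaction is the set of attribute:value
# items observed in one hypothetical audit log event.
DB = [
    {"service:s3", "action:GetObject", "user:alice"},
    {"service:s3", "action:GetObject", "user:bob"},
    {"service:s3", "action:PutObject", "user:alice"},
    {"service:ec2", "action:StartInstances", "user:carol"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in DB) / len(DB)

def confidence(antecedent, consequent):
    """Proportion of transactions containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"service:s3"}))                            # 0.75
print(confidence({"service:s3"}, {"action:GetObject"}))   # 2/3
```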
4.3 Related Work
We group related work into two categories: those that deal with generating RBAC least
privilege policies, and those that address the problems of modifying existing ABAC policies
or creating ABAC policies of minimal size. To the best of our knowledge, our work is the
first to address the problem of automatically creating least privilege ABAC policies.
We first consider the set of related works which generate RBAC least privilege poli-
cies from audit logs. In [48], the authors formally define the Privilege Error Minimization
Problem (PEMP) which seeks to minimize the under-privilege and over-privilege assignment
errors of a policy put into operation. Naive, unsupervised learning, and supervised learning
algorithms are designed to mine RBAC policies using attribute information from audit logs.
Policy evaluation was performed by using out-of-sample validation over discretized time pe-
riods. Our work uses a similar evaluation method but designs a rule mining algorithm to
generate ABAC policies. With the ability to use attributes in mined policies (vs. user, oper-
ation, and resource ids only in RBAC), we are able to generate policies that simultaneously
reduce both under- and over-privilege when compared to RBAC policies in [48].
Another important work in generating least privilege policies is [25] which used Latent
Dirichlet Allocation (LDA) to create least-privilege RBAC policies from source code version
control usage logs. This method also used user attribute information in the mining process
although the resulting policies were RBAC policies. In [25], the authors introduced the
λ−Distance metric for evaluating candidate rules, which added the total number of under-
assignments to the total number of over-assignments with λ acting as a weighting factor on
the over-assignments to specify how much the metric values over-privilege vs. under-privilege
for a particular application.
Because the under- and over-assignments in λ−Distance are not normalized before being
added, it is easy for one side to dominate the equation. Extreme changes in λ may be needed
to trade off between under- and over-privilege, or slight changes to λ may cause extreme
changes in the resulting policies depending on the sizes of the log entries and privilege space.
This makes it difficult for an administrator to choose a λ value which accurately captures
their organization’s desired balance between under- and over-privilege.
One early work on applying association rule mining to ABAC policies was [49], which
used the Apriori algorithm [50] to detect statistical patterns from access logs of a set of
lab doors in a research lab. The dataset consisted of 25 physical doors and 29 users who
used a smart-phone application and Bluetooth to open the doors. The authors used the
output of the mining algorithm to identify policy misconfigurations by comparing mined
rules with existing rules. The performance of the algorithm was measured in terms of the
trade-off between success in detecting and guiding the repair of misconfigurations vs. the
inconvenience to users of suggesting incorrect modifications to policies. The dataset used
in [49] was “somewhat small” as the authors noted, leaving questions as to its scalability in
terms of users and attributes, whereas we use a much larger dataset in terms of the number of
events and attributes.
Another work, [51] presents a tool named Rhapsody which builds upon Apriori-SD [52],
a version of the Apriori algorithm modified for subgroup discovery. This work is similar to
our own in that it also seeks to create ABAC policies of minimal over-privilege by mining
logs however, it does not provide a weighting method for balancing between under-privilege
and over-privilege, nor does it consider large and complex privilege spaces. Instead, Rhap-
sody uses a simpler model of attributes of Users and Permissions only instead of the Users,
Operations, Resources and Environment attributes we use. Rhapsody uses a metric termed
reliability to quantify the confidence of a rule and all its significant refinements to assist in
simplifying and reducing over-privilege of policies. While Rhapsody is designed to operate
on “sparse” audit logs where only a small amount (≤ 10%) of all possible log entries are
likely to occur in the mined logs, our work is designed to operate on logs several orders of
magnitude more sparse than those of Rhapsody using optimization techniques described in
Section 4.7.3. One important limitation of Rhapsody is that run-time grows exponentially
with the maximum number of rules a request may satisfy, limiting the number of attributes that
can be considered to “less than 20” [51], which would prevent a direct comparison using our
dataset of over 1,700 attributes. We also employ different metrics and scoring methodology
for evaluating policies compared to Rhapsody. The authors of [51] use the F-score metric
which was suitable for RBAC policy evaluation in our previous works, but which we found to be
too dominated by the Precision component when scoring ABAC policies, so we have cho-
sen to evaluate policies in terms of True Positive Rate and False Positive Rate separately.
Furthermore, we use a sliding window approach to evaluate policies over time which retains
their temporal dependencies vs. the random sampling cross-validation approach used in [51].
In [53], the authors used association rule mining to mutate existing ABAC policies as
a moving target defense against attackers who could compromise values of attribute stores
(with stores possibly distributed across multiple organizations). By expanding an existing
policy with new rules that use highly correlated attributes identified by using association rule
mining techniques on audit logs, this method provides additional protection in the event that
attribute values used by the original policy rules are compromised. While [53] also used rule
mining of audit logs, it did not create new policies nor did it aim to achieve least privilege
policies. Experimental results dealt with identifying correlations between attributes but the
analysis of the security of the results was qualitative so there were no metrics of goodness
similar to ours to use as a comparison between [53] and this work.
A few papers have been published to address the ABAC Mining Problem which deals
with finding ABAC policies of minimal size given a set of authorizations or audit log entries.
The ABAC Mining Problem was addressed by Xu & Stoller in [54], then formally defined by
another group of researchers in [55]. The metric for evaluating the minimal size of ABAC
policies in these works is Weighted Structural Complexity (WSC), which was introduced in
[56] to measure the size of RBAC policies and adapted to ABAC policies in [57].
In [54], Xu & Stoller presented an algorithm for mining ABAC policies from operational
logs. Their algorithm attempts to create policies that cover all the entries found in an audit
log while also minimizing the number of over-assignments and the WSC of the policy through
a process of merging and simplifying candidate rules. The authors defined Qrul (Equation
4.1), a quality metric for evaluating candidate rules during the mining process. In this quality
metric, |[[p]]| is the number of user-permissions in the possible permission universe covered by
a candidate rule p. |[[p]] ∩ UP| represents the number of user-permissions in the logs covered
by p, but not covered by existing rules in the policy. WSC(p) represents the WSC score of
rule p. The number of over-assignments granted by the rule is |[[p]] \ UP(L)|, where L is the
operation log. Balancing between the number of over-assignments produced by a rule p and
the number of log entries covered by p is achieved by varying the over-assignment weight,
ω0′.
Both [54] and [55] mine rules and calculate coverage based on user-permission tuples,
where a tuple ⟨u, o, r⟩ contains a user, operation, and resource only, instead of considering
all of the valid attribute combinations in the privilege space. This reduces the computational
complexity of mining and evaluating rules, but unfortunately presents a problem for accurately
evaluating ABAC policies because such a tuple may be both allowed and denied unless
considering all the attributes of the user, operation, and resource at the time of the user
request. The authors of [55] identify and address this problem by denying all instances
of a tuple if any single instance of that tuple is denied. This significantly reduces the
granularity and flexibility advantages of the ABAC model. This issue is further complicated
when evaluating policies over time because attribute values may change. To address these
problems, we base our metrics on the entire ABAC privilege space of valid attribute:value
pairs instead of the individual users, operations, and resources.
Another key difference between our work and all previous works cited in ABAC mining
is the evaluation of policies for least-privilege over time. None of the previous works on
ABAC policy mining captured the performance of mined policies in terms of under-privilege
vs. over-privilege when put into operation, which we contend is the most important measure
of a security policy. We use out-of-sample validation on a real world dataset to evaluate
the under-privilege and over-privilege rates of policies over time using a sliding window of
observation and operation periods, a method originally described in [48]. While minimizing
complexity (evaluated by WSC) is desirable in that it makes policies easier to maintain by
administrators, we see it as less important than least privilege performance over time. This
is especially true when using automated methods to build policies where less administrator
involvement is necessary. Methods for minimizing ABAC policy complexity are complemen-
tary to our work as once least privilege policies are identified, then methods for minimizing
policy complexity can be applied.
4.6 Problem Definition and Metrics
The problem we address in this paper is minimizing privilege assignment errors in ABAC
policies. Access control can be viewed as a prediction problem. The statements that comprise
a policy are predictions about which entities should be granted privileges to perform specific
operations upon the specific resources necessary to perform their jobs. The goal of this work
is to automatically generate policies that are accurate access control predictions. There have
been many access control related papers with similar but not entirely the same goals. To
help clarify the specific problem this paper addresses we formally define it as the ABAC
Privilege Error Minimization Problem (P EM PABAC ) in this section. We also define specific
metrics to be used in evaluating the performance of proposed solutions (in the form of ABAC
policies).
Our problem definition is based on the Privilege Error Minimization Problem (PEMP)
originally defined in [48]. The PEMP defined the problem of creating least privilege RBAC
policies which consisted of users, operations, and objects. Like the original PEMP, our
problem seeks to minimize the under- and over-privilege assignment errors in policies and uses
the notions of observation and operation periods for evaluation. However, users, operations,
and resources are only some of the attributes available when creating ABAC policies, so we
adapt the problem definition to the ABAC privilege space.
The size of an ABAC privilege space is determined by the attributes and values of valid
ABAC policies. A is the set of valid attributes which can be used in policies. As in other
ABAC mining works [49, 53, 54], we assume all attributes and values present in the logs
can also be used in building policies. Each individual attribute ai ∈ A has a set of atomic
values Vi which are valid for that attribute. All values for an attribute are the attribute’s
range Range(ai ) = Vi . The Cartesian product of all possible attribute:value combinations is
ξ = V1 ×...×Vn = {(v1 , ..., vn )|vi ∈ Vi for every i ∈ {1, ..., n}}. However, some attribute:value
pairs are not valid when present in combination with other attribute:value pairs because of
dependencies between them. For example, some operations are only valid on certain resource
types, so combinations including both operation:DeleteUser and resourceType:File are not
valid. The valid privilege universe ξ′ is the set of all possible attribute:value combinations
when considering the dependency relationships between all attributes and values.
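A toy illustration of ξ versus ξ′, with hypothetical attribute ranges and a single operation/resource-type dependency constraint, is given below:

```python
from itertools import product

# Hypothetical attribute ranges observed in the logs (names are illustrative).
ranges = {
    "user":         {"alice", "bob"},
    "operation":    {"DeleteUser", "GetObject"},
    "resourceType": {"User", "File"},
}

# Dependency constraint: each operation is only valid on certain resource types.
valid_targets = {"DeleteUser": {"User"}, "GetObject": {"File"}}

def is_valid(combo):
    return combo["resourceType"] in valid_targets[combo["operation"]]

attrs = sorted(ranges)
# ξ: the full Cartesian product of attribute:value combinations.
xi = [dict(zip(attrs, values)) for values in product(*(ranges[a] for a in attrs))]
# ξ′: only the combinations that satisfy the dependency relationships.
xi_prime = [combo for combo in xi if is_valid(combo)]
print(len(xi), len(xi_prime))  # 8 total combinations, 4 valid
```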
Any measure of security policy accuracy must also take time into account because the
amount of risk from over-privileges accumulates over time. Over-privilege carries the risk that
an unnecessary privilege will be misused, and this risk increases the longer the over-privilege
exists. To capture risk across a specified time period, we define the Operation Period (OPP)
as the time period during which security policies are evaluated against user operations. With
the concepts of the valid privilege universe ξ′ and operation period OPP defined, we now
define the ABAC-specific version of the Privilege Error Minimization Problem, PEMP_ABAC
(Definition 1).
Definition 1. PEMP_ABAC: ABAC Privilege Error Minimization Problem. Given the uni-
verse of all valid attribute:value combinations ξ′, find the set of attribute:value constraints
that minimizes the over-privilege and under-privilege errors for a given operation period
OPP.
We use terminology from statistical hypothesis testing for assessing the effectiveness
of our algorithm in addressing the PEMP_ABAC. We first present our method for scoring
individual predictions, and then our method for splitting up the dataset and evaluating the
algorithm’s performance over multiple time periods.
Policy evaluation for a given operation period is a two-class classification problem where
every possible event in the ABAC privilege space falls into one of two possible classes:
grant or deny. By applying the policies generated from the observation period data to the
privileges exercised in the operation period, we can categorize each prediction into one of
four outcomes:
• True Positive (TP): a privilege that was granted in the predicted policy and exercised
during the OPP.
• True Negative (TN): a privilege that was denied in the predicted policy and not exercised
during the OPP.
• False Positive (FP): a privilege that was granted in the predicted policy but not exercised
during the OPP.
• False Negative (FN): a privilege that was denied in the predicted policy but attempted to
be exercised during the OPP.
Using the above outcomes we then calculate the True Positive Rate (TPR), also known as
Recall, and the False Positive Rate (FPR), as shown in Equations 4.2 and 4.3, respectively.

TPR = TP / (TP + FN)    (4.2)

FPR = FP / (FP + TN)    (4.3)
As with the problem definition, these metrics are also derived from [48] but adapted from
RBAC to be more suitable to the ABAC privilege space. Where [48] used metrics based
on TPR and Precision, we used TPR and FPR instead. Precision (TP / (TP + FP)) is suitable
when considering the users and operations because the universe of possible grants is roughly
on the same order of magnitude as the number of unique log events. When dealing with
the ABAC universe, the number of possible unique attribute:value combinations is likely to
be many orders of magnitude greater than the number of events in the operational logs.
To avoid over-fitting, ABAC rules must grant a large number of attribute:value privileges in
absolute terms (on the order of hundreds or thousands of attribute:value combinations in our
experiments), but are actually still quite small relative to the universe of possible attribute
combinations (which totals in the millions or billions). Stated another way, Precision is not
a suitable metric for use in mining ABAC policies from logs because it uses one term (TP)
which is driven primarily by the number of entries in the log, and another term (FP) which is
driven by the size of the privilege universe. On the other hand, both terms in the TPR (TP
and FN) are log derived, and both terms in FPR (FP and TN) are policy derived metrics.
TPR and FPR are the metrics used to evaluate a policy in terms of under-privilege
and over-privilege, respectively. If all privileges exercised in the OPP were granted, there
was no under-privilege for the policy being evaluated, so FN = 0 and TPR = 1. As the
number of erroneously denied privileges (FNs) grows, TPR → 0; thus TPR represents under-
privilege. For the edge case that no privileges were exercised in the OPP, we redefine TPR
to be TPR = 1, as no under-privilege is possible in this case. If all privileges granted by
the policy were exercised during the OPP, there was no over-privilege for the policy being
evaluated, so FP = 0 and FPR = 0. As the number of erroneously granted privileges (FPs)
grows, FPR → 1; thus FPR represents over-privilege.
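A minimal sketch of this scoring for a single operation period, including the edge case above, could look like the following (set-based, with the valid privilege universe ξ′ passed in explicitly):

```python
def tpr_fpr(granted, exercised, universe):
    """granted, exercised: sets of attribute:value combinations (granted by the
    policy / exercised during the OPP); universe: the valid privilege universe ξ′."""
    tp = len(granted & exercised)
    fn = len(exercised - granted)
    fp = len(granted - exercised)
    tn = len(universe - granted - exercised)
    # Edge case from the text: if nothing was exercised, no under-privilege is possible.
    tpr = 1.0 if not exercised else tp / (tp + fn)
    fpr = 0.0 if (fp + tn) == 0 else fp / (fp + tn)
    return tpr, fpr
```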
To score policies across multiple time periods, we use out-of-time validation [32], a tem-
poral form of out-of-sample validation. In out-of-sample validation, a set of data is used to
train an algorithm (training set) and a separate set of non-overlapping data is used to test
the performance of the trained algorithm (test set). In our evaluation, the training and test
sets are contiguous and the test time period immediately follows the training time period.
The training set is referred to as the Observation Period (OBP), while the test set is the
Operation Period (OP P ) defined previously in Section 4.6.1. It is important to note that
this method preserves the temporal interdependencies between actions. For example, if an
employee moves to a new position within the organization, one would expect the privileges
mined for that employee in the future time periods would be very different from those mined
in the past time periods. Methods such as k-fold cross validation which randomly partition
a dataset (and used in [25] to evaluate policies) do not account for these temporal inter-
dependencies. When charting metrics for multiple time periods, we use the average of all
individual scores. This gives equal weight to each operation period score.
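As an illustration only, the following sketch shows one way such contiguous observation/operation splits could be produced from a timestamped log with pandas; the timestamp column name and the choice to slide forward by one operation period are assumptions, not details taken from this work.

import pandas as pd

def out_of_time_splits(log, obp_days, opp_days):
    """Yield (observation, operation) slices of a timestamped log DataFrame.

    The operation period always immediately follows the observation period,
    preserving temporal interdependencies, unlike random k-fold partitioning.
    """
    start = log["timestamp"].min()
    end = log["timestamp"].max()
    while start + pd.Timedelta(days=obp_days + opp_days) <= end:
        obp_end = start + pd.Timedelta(days=obp_days)
        opp_end = obp_end + pd.Timedelta(days=opp_days)
        obp = log[(log["timestamp"] >= start) & (log["timestamp"] < obp_end)]
        opp = log[(log["timestamp"] >= obp_end) & (log["timestamp"] < opp_end)]
        yield obp, opp
        start = start + pd.Timedelta(days=opp_days)  # slide by one operation period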
Quantifying the number of resources allowed or denied by a policy implies that there is a
known value for the number of possible resources in the system. This presents a challenge for
any least-privilege scoring approach that is not unique to the ABAC model or our method-
ology. While every system has finite limits on the resource identifier length and number of
resources, these can be so numerous that we consider them too large to quantify and treat them as effectively infinite. For example, consider how many possible file names there are for the ext4 file system: with up to 255 bytes allowed for the file name, (2^8)^255 possible distinct file names exist, excluding the file path [58].
Instead of counting all possible resource identifiers, we use the resource identifiers present
in the OBP and OP P for our policy scoring calculations. This approach presents several
advantages over other possible approaches such as using all values in the dataset, or in-
trospecting the environment for the resources present (which would be prohibitively time
consuming for our dataset). Only the recently used resources are counted, giving them
greater importance, and all necessary data is available in the audit logs. This also implies
that the valid privilege space ξ‘ may vary in size between scoring periods depending on the
resource identifiers present.
4.7 Methodology
This section presents both the algorithm we used to generate policies for addressing the
PEMPABAC problem as well as the algorithm we used to score these policies across multiple
operation periods.
4.7.1 Rule Mining
4.7.1.1 Scoring Candidate Rules
Our rule mining algorithm operates similarly to the mining algorithms presented in [25,
54] in that it considers the set of uncovered log entries and iteratively generates many
candidate rules, scores them, and selects the best scoring rule for the next iteration until all
of the given log events are covered by the set of generated rules. A critical component of
this approach is the metric used to evaluate candidate rules. Before describing the algorithm
design, we will first detail the metric used for evaluating candidate rules generated during
the mining process. We propose a candidate scoring metric termed the Cscore in this paper
using the following definitions.
• c is an ABAC constraint specified as an attribute:value pair, or a single key and a set of values key:{values}. Values are required to be discrete; continuous values must be binned to be used by the mining algorithm. r is a rule consisting of one or more constraints. p is a
policy consisting of one or more rules.
• L is the complete set of log entries for the dataset, LOBP is the set of logs in the observation
period OBP , LOBP ⊆ L.
• LOBP(c) is the set of log entries which meet (are "covered by") the set of constraints c.
The constraint set may be specified by the use of a rule r or policy p, LOBP (c) ⊆ LOBP .
• ξ ′ is the privilege universe of valid log events as defined previously in Section 4.6.1.
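One possible in-memory representation of these definitions is sketched below; it is illustrative only (the thesis does not prescribe a data structure), with a rule held as a mapping from attribute keys to sets of allowed values and the example entries made up.

def covers(rule, entry):
    """True if a log entry satisfies every constraint of a rule.

    rule:  dict mapping attribute key -> set of allowed values
    entry: dict mapping attribute key -> the value observed in the log
    """
    return all(entry.get(key) in values for key, values in rule.items())

def covered_logs(rule, logs):
    """L_OBP(r): the subset of log entries covered by rule r."""
    return [entry for entry in logs if covers(rule, entry)]

# Hypothetical example with two log entries and a single-constraint rule.
rule = {"eventName": {"DescribeInstances", "ListBuckets"}}
logs = [{"eventName": "DescribeInstances", "userName": "alice"},
        {"eventName": "TerminateInstances", "userName": "bob"}]
print(len(covered_logs(rule, logs)))  # 1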
The CoverageRate (Equation 4.4) is the ratio of all logs in the observation period covered
by a candidate rule r that are not already covered by other rules in the policy p (|LOBP (r) \
LOBP (p)|) to the remaining number of log entries not covered by any rules in the policy
(|LOBP \ LOBP (p)|). A candidate rule that covers more log entries is considered higher
quality than a rule that covers fewer log entries. The numerator of the OverPrivilegeRate
(Equation 4.5) first finds the number of valid attribute:value combinations in the privilege
universe which are covered by a rule (ξ‘(r)), minus those attribute:value combinations which occur in the set of uncovered logs LOBP(r) \ LOBP(p); the result is the total number of over-assignments for rule r. The total over-assignments are then normalized using the total
number of valid combinations in the valid privilege universe |ξ ′ |. A candidate rule which
has fewer over-assignments is considered higher quality than a rule that has more over-
assignments. The candidate score Cscore (Equation 4.6) is then the ω weighted addition of
the CoverageRate and the complement of the OverPrivilegeRate. By normalizing the under-
assignments using the number of log entries and the over-assignments using the size of the
valid privilege universe, the effect of varying the weight ω in the Cscore is more predictable and
results in better performance when compared to the λ−Distance metric which also uses a
variable weighting between over-assignments and under-assignments but does not normalize
these values (see Section 4.8.2 for Cscore vs. λ−Distance comparison details).
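Written out from this description and from lines 8–10 of Algorithm 4, the referenced quantities take roughly the following form (a reconstruction, since the display equations themselves are not reproduced here):

\begin{align}
\mathit{CoverageRate}(r,p) &= \frac{|L_{OBP}(r) \setminus L_{OBP}(p)|}{|L_{OBP} \setminus L_{OBP}(p)|} \tag{4.4}\\
\mathit{OverPrivilegeRate}(r,p) &= \frac{|\xi'(r)| - |L_{OBP}(r) \setminus L_{OBP}(p)|}{|\xi'|} \tag{4.5}\\
C_{score}(r,p) &= \mathit{CoverageRate}(r,p) + \omega\,\bigl(1 - \mathit{OverPrivilegeRate}(r,p)\bigr) \tag{4.6}
\end{align}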
Our algorithm for mining an ABAC policy from the logs of a given observation period is
presented in Algorithm 4. Note that we use arithmetic operators =, +, − when describing
integer operations, and set operators ←, ∪, \, ∈, |size| when describing set operations. As
mentioned previously, the algorithm iteratively generates candidate rules from the set of
uncovered logs. To avoid confusion between the original set of log entries for the observation
period LOBP and the current set of uncovered log entries which is updated for each iteration
of the algorithm, we copy LOBP to Luncov at line 2. The FP-growth algorithm [46] is used
to mine frequent itemsets from the set of uncovered observation period log entries (line 4).
Algorithm 4: Rule Mining Algorithm
Input: LOBP The set of log entries representing user actions during the observation
period OBP .
Input: ω under-privilege vs. over-privilege weighting variable.
Input: ǫ Threshold value for minimum itemset frequency.
Input: ξ‘ The set of all valid attribute:value combinations that comprise the privilege
universe.
Output: policy The set of ABAC rules that make up the policy to be applied during
the operation period OP P .
1 policy ← ∅;
2 Luncov ← LOBP ;
3 while |Luncov | > 0 do
4 itemsets ← F P −growth.f requentItemsets(Luncov , ǫ);
5 candidateRules ← ∅;
6 for itemset ∈ itemsets do
7 rule = createRule(itemset);
8 coverageRate = |Luncov(rule)| / |Luncov|;
9 overAssignmentRate = (|ξ‘(rule)| − |Luncov(rule)|) / |ξ‘|;
10 rule.Cscore = coverageRate + ω × (1 − overAssignmentRate);
11 candidateRules ← candidateRules ∪ rule;
12 end
13 bestRule = sortDescending(candidateRules, Cscore )[0];
14 policy ← policy ∪ bestRule;
15 Luncov ← Luncov \ Luncov (bestRule);
16 end
17 return policy
The itemsets returned by the FP-growth algorithm are sets of attribute:value statements,
and each of these itemsets is used to create a candidate rule which is then scored using the
Cscore metric (lines 6-12). After all candidates are scored, the highest scoring rule is selected
and added to the policy, then all log entries covered by that rule are removed from the set
of uncovered log entries (lines 13-15). The mining process continues until all log entries are
covered (lines 3-16).
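A compressed Python sketch mirroring the structure of Algorithm 4 is given below. It is an approximation, not the thesis implementation: mlxtend's FP-growth is used as a stand-in frequent-itemset miner, rules are restricted to single-valued constraints, and valid_combinations_covered() is a hypothetical helper standing in for the |ξ‘(rule)| lookup described later in the partitioning discussion.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

def mine_policy(obp_logs, omega, epsilon, universe_size, valid_combinations_covered):
    """Greedy ABAC rule mining in the spirit of Algorithm 4 (illustrative sketch)."""
    uncovered = list(obp_logs)          # each log entry: dict of attribute -> value
    policy = []
    while uncovered:
        # Encode each uncovered entry as a transaction of "attr=value" items.
        transactions = [[f"{k}={v}" for k, v in e.items()] for e in uncovered]
        encoder = TransactionEncoder()
        onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                              columns=encoder.columns_)
        itemsets = fpgrowth(onehot, min_support=epsilon, use_colnames=True)
        if itemsets.empty:
            break                       # no candidate meets the support threshold
        best_rule, best_score = None, float("-inf")
        for items in itemsets["itemsets"]:
            rule = dict(item.split("=", 1) for item in items)
            covered = [e for e in uncovered
                       if all(e.get(k) == v for k, v in rule.items())]
            coverage_rate = len(covered) / len(uncovered)
            over_rate = (valid_combinations_covered(rule) - len(covered)) / universe_size
            score = coverage_rate + omega * (1 - over_rate)       # the C_score
            if score > best_score:
                best_rule, best_score = rule, score
        policy.append(best_rule)
        uncovered = [e for e in uncovered
                     if not all(e.get(k) == v for k, v in best_rule.items())]
    return policy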
Once the observation period logs have been mined to create a policy, that policy is scored
using the events that took place during the operation period immediately following the mined
observation period as described in Algorithm 5. Each event during the operation period is
evaluated against the mined policy (lines 3-10); events allowed by the policy are TPs, while events denied by the policy are FNs. A unique combination of attribute:value pairs may
occur multiple times within the same time period. The TPs and FNs are both values based
on the number of times an event occurs in the log. The set of unique events that were
exercised in the operation period and granted by the policy is also maintained (line 6) in
order to calculate the FPs later (line 15). By counting each TP and FN instead of unique
occurrences, the resulting TPR is frequency weighted. Events that occur more frequently
in the operation period have a greater impact on the resulting TPR than those events that
occur less frequently.
While the TPs, FNs, and resulting TPR are based on the frequency weighted count of
events present in the log, the FPs, TNs and resulting FPR cannot be frequency weighted
because each unique valid event of the privilege universe is either granted or denied only once
by the policy. To obtain these values (FP, TN, FPR), we first determine how many unique
events out of the valid privilege space are granted by the policy (lines 11-14). It is important
to note that enumerating the entire privilege space and testing every valid event against the
policy would be much more computationally intensive than our approach, which is to use
information about the valid privilege space to enumerate only the valid events allowed by
Algorithm 5: Policy Scoring Algorithm
Input: LOP P The set of log entries representing user actions during the operation
period OP P .
Input: ξ‘ The set of all valid attribute:value combinations that comprise the privilege
universe.
Input: policy The set of ABAC rules that make up the policy to be applied during
the operation period OP P .
Output: T P R, F P R The true positive and false positive rates of the policy
evaluated against the operation period OP P .
1 T P = F N = 0;
2 exercisedGrantedEvents ← ∅ ;
3 for event ∈ LOP P do
4 if policyAllowsEvent(policy, event) then
5 T P = T P + 1;
6 exercisedGrantedEvents ← exercisedGrantedEvents ∪ event;
7 else
8 F N = F N + 1;
9 end
10 end
11 eventsAllowedByPolicy ← ∅;
12 for r ∈ policy do
13 eventsAllowedByPolicy ← eventsAllowedByPolicy ∪ ξ‘(r);
14 end
15 F P = |eventsAllowedByP olicy \ exercisedGrantedEvents|;
16 T N = |ξ‘| − (T P + F N + F P );
17 if T P + F N == 0 then
18 T P R = 1;
19 else
20 T P R = T P/(T P + F N );
21 end
22 F P R = F P/(F P + T N );
23 return T P R, F P R
each rule. Most mined rules only allow a small percentage of the privilege space except in
cases of extreme ω values.
Once the set of all the unique events allowed by a policy has been enumerated, we remove
the set of unique events which occurred and were granted during the operation period to
obtain the number of total FP events for the policy (line 15). At this point we have obtained
the unique sets of TPs, FNs, and FPs, so any remaining privilege in the valid privilege
universe not in these sets must be a TN (line 16). With these values calculated, we can
then calculate the TPR and FPR, with the caveat that in the case where no privileges were
exercised during the operation period, we define T P R = 1 because there could not be any
instances of under-privilege (lines 18-22). The purpose of the policyAllowsEvent() function
is self-explanatory and trivial to implement, so the implementation of this method is omitted
due to space considerations.
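A condensed Python rendering of this scoring logic is shown below as a sketch only; event_matches() plays the role of policyAllowsEvent() for a single rule, and enumerate_allowed_events() is a hypothetical helper that yields the unique valid events granted by a rule (lines 11-14 of Algorithm 5).

def event_matches(rule, event):
    """policyAllowsEvent() for one rule: every constraint must be satisfied."""
    return all(event.get(k) == v for k, v in rule.items())

def score_policy(opp_logs, policy, universe_size, enumerate_allowed_events):
    """Frequency-weighted TPR and unique-event FPR, in the spirit of Algorithm 5."""
    tp = fn = 0
    exercised_granted = set()
    for event in opp_logs:                       # event: dict of attribute -> value
        if any(event_matches(rule, event) for rule in policy):
            tp += 1                              # counted once per occurrence
            exercised_granted.add(frozenset(event.items()))
        else:
            fn += 1
    allowed = set()                              # unique valid events granted
    for rule in policy:
        allowed.update(frozenset(e.items()) for e in enumerate_allowed_events(rule))
    fp = len(allowed - exercised_granted)
    tn = universe_size - (tp + fn + fp)
    tpr = 1.0 if (tp + fn) == 0 else tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr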
Dealing with the large number of possible attribute:value combinations that may com-
prise an ABAC privilege space can be a significant challenge compared to the simpler RBAC
privilege space. Using all attributes and values present from logs may make the privilege
universe computationally impractical to process. But discarding too many attributes or im-
portant attributes may result in less secure policies. We address these issues by using feature
selection and partitioning methods to make large ABAC privilege spaces more manageable.
Intuitively, attributes which occur infrequently in the logs or have highly unique values
are poor candidates for use in creating ABAC policies. Uncommon attributes are difficult
to mine meaningful patterns from because there is less data available to identify patterns
from. Also, rules created with uncommon attributes are less useful in access control decisions
because future access requests are unlikely to use these attributes as well. Using attributes
with unique values (the attribute value is never or rarely duplicated across log entries) is likely
to result in over-fitting for any rules created with those attributes. Following this reasoning,
we perform preprocessing on our dataset to select and bin the most useful attributes as
follows.
1. Compute the Uniqueness of each attribute's values. Remove attributes whose values are nearly always unique (Uniqueness ≈ 1.0) or nearly always identical (Uniqueness ≈ 0.0), with resource identifiers excepted from this test.
2. Identify attributes whose values have a 1:1 correlation with another attribute and remove the redundant duplicates.
3. Sort attributes by Frequency = AttributeOccurrences / TotalLogEntries. Select attributes above a frequency threshold, θ.
4. Sort the remaining attributes by Uniqueness; those with high Uniqueness are candidates for binning or removal.
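A minimal pandas sketch of this selection process follows; the exact definitions of Frequency and Uniqueness used here (fraction of entries in which an attribute appears, and fraction of those appearances carrying a distinct value) are assumptions consistent with the description above rather than the precise implementation.

import pandas as pd

def rank_attributes(log, theta):
    """Rank attributes by Frequency and Uniqueness and apply the threshold theta."""
    rows = []
    for col in log.columns:
        present = log[col].dropna()
        frequency = len(present) / len(log)
        uniqueness = present.nunique() / len(present) if len(present) else 0.0
        rows.append({"attribute": col, "frequency": frequency, "uniqueness": uniqueness})
    stats = pd.DataFrame(rows)
    # Step (1): drop attributes that are always unique or always constant
    # (resource identifiers would be exempted from this filter).
    stats = stats[(stats["uniqueness"] < 0.99) & (stats["uniqueness"] > 0.01)]
    # Step (3): keep attributes above the frequency threshold.
    return stats[stats["frequency"] >= theta].sort_values("frequency", ascending=False)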
Our full dataset contained 1,748 distinct attributes (see Section 4.8.1 for dataset descrip-
tion). In step (1) attributes with U niqueness ≈ 1.0 nearly always have unique values, and
U niqueness ≈ 0.0 implies the attribute values are nearly always the same. Resource identi-
fiers are given an exception to the uniqueness test in this step as they are expected to have
high uniqueness. For our dataset, we identified and removed two always unique attributes,
eventID and requestID, and one attribute that always had the same value, accountId. We
confirmed that these attributes would always meet the uniqueness criteria with the AWS
documentation. Applying step (2), we identified three distinct attributes for the user name
with a 1:1 correlation and removed two of them. For step (3) we selected two thresholds to
build two datasets for experimentation, θ = 0.1 and θ = 0.005; we term the privilege universes built using these thresholds ξ‘0.1 and ξ‘0.005, respectively. Figure 4.1
charts the rank of the top 50 most common attributes after our feature selection process was
complete. The attribute frequency follows the common power law distribution with a “long
tail”; the remaining attributes not charted here occurred in less than 0.2% of the log entries.
Next we apply step (4) of our process to our dataset. Some of the remaining attributes still
have fairly high Uniqueness values which are difficult to mine meaningful rules from. In our
Figure 4.1: Top 50 Attributes Ranked by Frequency
dataset, some of these attributes such as checksum values are not relevant to creating security
policies and can be discarded. Others are attributes which may benefit from binning into a
smaller subset of values. There were three such attributes in our dataset: sourceIPAddress,
userAgent, and eventName. The sourceIPAddress is an IPv4 address with over 4 billion
possible values. After consulting with the system administrator of the dataset provider, we
found that it was unlikely they would use rules based on the raw IP address since users will
change IPs frequently. Instead, they preferred to derive the geographical location from the
IP address so IPs were binned by U.S. states and each country the organization’s users may
log in from. The userAgent attribute is the AWS Command Line Interface (CLI), Software
Development Kit (SDK), or web browser version used when making a request. This field
benefits from binning as users are likely to perform similar requests from a web browser,
but they may upgrade their browser version regularly. Without binning the many different
browser versions into a single group, a mining algorithm would not effectively learn user
patterns. Again, the dataset provider agreed that the raw value was too granular for use so
the userAgent attribute was binned into 10 buckets. The eventName attribute is the name
of the operation. This attribute is already effectively binned because each eventName can
only be associated with one eventSource which is the AWS service name associated with the
operation. We derived two additional attributes to bin eventName, one based on whether it
was a Create, Read, Update, Delete, or Execute operation, and a second derived attribute
based on the first word of the eventName. For example the operation “StartInstance” is
binned into a bucket with other attributes that begin with “Start”. Experiments showed
this improved T P R with a negligible decrease in F P R at a ratio of 20:1.
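As a concrete illustration of the eventName binning, the sketch below derives the two additional attributes from an operation name; the prefix-to-CRUD-class mapping shown is a made-up example, not the mapping actually used for the dataset.

import re

# Hypothetical prefix-to-CRUD-class mapping; the real mapping is dataset specific.
CRUD_CLASS = {
    "Create": "Create", "Put": "Create", "Run": "Create",
    "Get": "Read", "Describe": "Read", "List": "Read",
    "Update": "Update", "Modify": "Update",
    "Delete": "Delete", "Terminate": "Delete",
    "Start": "Execute", "Stop": "Execute", "Invoke": "Execute",
}

def derive_event_bins(event_name):
    """Return (first_word, crud_class) for an eventName, e.g. "StartInstance"."""
    match = re.match(r"[A-Z][a-z]*", event_name)
    first_word = match.group(0) if match else event_name
    return first_word, CRUD_CLASS.get(first_word, "Other")

print(derive_event_bins("StartInstance"))      # ('Start', 'Execute')
print(derive_event_bins("DescribeInstances"))  # ('Describe', 'Read')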
The resulting ABAC privilege space may still be quite large even for a modest dataset
after applying the feature selection and binning methods as just described in Section 4.7.3.1.
This section describes partitioning techniques we applied to split up the privilege space during
the policy mining process. Partitioning techniques (as used in databases to split large tables
into smaller parts) are used to both reduce the memory footprint of our algorithms, and to
improve performance by performing operations in parallel across multiple processors.
The rule mining algorithm (Algorithm 4) uses partitioning to improve the run time
and space efficiency for storing and searching the privilege universe ξ‘. The total number
of valid combinations of ξ‘ was on the order of billions for some of our experiments, but
Algorithm 4 only needs to determine the number of privileges covered by a rule and it does
not need to enumerate and store all possible privilege combinations in memory. This is
a subtle but important difference because it means we can calculate the number of valid
privilege combinations by splitting ξ‘ into smaller sets of independent partitions to perform
this calculation. The total number of valid privilege combinations covered by a rule is the
product of the number of valid privilege combinations covered by each separate partition,
i.e., |ξ‘(r)| = |P1 (r)| × ... × |Pn (r)| where the attributes of each partition Pi are independent
of the attributes in all other partitions.
To create these partitions, the AWS documentation was used to identify dependencies
between attributes in our dataset. Next, a simple depth first search was used to identify
connected components of interdependent attributes. The valid attribute:value combinations
for all attributes in each connected component were then enumerated and stored into one
inverted index for each partition. Finding the number of valid privilege combinations covered
by a rule in a partition (|Pn (r)|) is accomplished by searching the inverted index using the
rule’s attribute:value constraints as search terms. As a result of this partitioning, our queries
were performed against three indexes on the order of thousands to hundreds of thousands of
documents vs. a single index that would have been on the order of hundreds of millions to
billions of documents if such a partitioning scheme were not in use.
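The sketch below illustrates the product-over-partitions computation; each partition is represented as an inverted index (attribute -> value -> set of document ids) together with its total document count, and rule constraints are assumed to be single values rather than sets for brevity.

def combinations_covered(rule, partitions):
    """|ξ'(r)| computed as a product over independent partitions (sketch).

    partitions: list of (inverted_index, size) pairs, where inverted_index maps
    attribute -> value -> set of ids of valid combinations in that partition.
    """
    total = 1
    for index, size in partitions:
        matching = None
        for attr, value in rule.items():
            if attr in index:                      # rule constrains this partition
                docs = index[attr].get(value, set())
                matching = docs if matching is None else matching & docs
        # An unconstrained partition contributes all of its valid combinations.
        total *= size if matching is None else len(matching)
    return total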
For our dataset, a depth first search identified one connected component of all user
attributes, and another connected component of operations and resources. Operations and
resources were connected because most operations are specific to a single or set of resource
types. We grouped all other attributes that were independent of users and operations into
a third component which included environment attributes such as the sourceIPAddress and
userAgent. Although this grouping of attributes by components was obtained from processing
our specific dataset, it is reasonable to assume that the user attributes are independent of
the valid operation and resource attribute combinations in other datasets as well. This is
also consistent with the NIST ABAC guide which defines environment conditions as being
independent of subjects and objects [42].
Due to the large number of candidate rules generated by the F P −growth algorithm,
scoring of candidate rules is the most computationally intensive part of Algorithm 4 in our
experiments (except for those with fairly large ǫ values which generate few candidates). The
search against the inverted index is also parallelized to improve performance.
To improve the run time performance of the policy scoring algorithm (Algorithm 5)
and enable it to deal with a privilege space larger than the available memory, we again
employ partitioning and parallelization methods. As mentioned in 4.7.2, Algorithm 5 must
enumerate the set of all privilege combinations covered by a rule in order to identify the total
unique number of privilege combinations covered by a policy. If extreme values for ω are
chosen, it is possible for Algorithm 4 to generate rules with a large number of over-privileges,
possibly the entire privilege space. Therefore, Algorithm 5 must be able to deal with the
possibility that it will have to enumerate all privilege combinations of ξ‘, although again,
this only happens for extreme values of ω, and this is for the out-of-sample validation for
policy scoring only, not the rule mining algorithm.
To deal with the possible need to enumerate a large portion or even all of the privilege
space, we partitioned ξ‘ along two attributes so that the values of those attributes are placed
into separate partitions. As with any partitioning, choosing a key that nearly equally splits
the universe of possible values is important. For our experiments, we chose to partition
the ξ‘ space along the attributes associated with the operation name and the user name.
The overall correctness of the algorithm is independent of the partition keys used, and 1...n
partitions may be used for each attribute depending on the size of the privilege space and
available memory.
Each of these partitions is operated on in parallel when evaluating each rule of the
policy. Unique hashes of the enumerated events are used in order to deduplicate events
which may be generated by more than one rule. This partitioning and parallelization takes
place within lines 11-14 of Algorithm 5. We describe these optimizations here because they
are useful in speeding up and scaling the algorithm when dealing with a large number of
attribute:value pairs, but we omit it from the pseudo-code in Algorithm 5 in order to simplify
the presentation of the parts of the algorithm necessary for correctness.
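A rough sketch of this partitioned, parallel enumeration is shown below; enumerate_rule_events() is a simplified stand-in that filters a partition's pre-enumerated valid events (the thesis instead enumerates directly from privilege-space metadata), and events are reduced to deterministic hashes so that duplicates produced by overlapping rules are cheap to discard.

import hashlib
from concurrent.futures import ProcessPoolExecutor

def event_hash(event):
    """Deterministic hash of an event so per-process results can be merged safely."""
    canon = ";".join(f"{k}={v}" for k, v in sorted(event.items()))
    return hashlib.sha1(canon.encode()).hexdigest()

def enumerate_rule_events(rule, partition):
    """Stand-in: yield the valid events of one partition covered by a rule."""
    for event in partition:                     # event: dict of attribute -> value
        if all(event.get(k) == v for k, v in rule.items()):
            yield event

def partition_worker(args):
    """Enumerate, within one partition, hashes of the events granted by any rule."""
    policy, partition = args
    allowed = set()
    for rule in policy:
        for event in enumerate_rule_events(rule, partition):
            allowed.add(event_hash(event))      # deduplicates overlapping rules
    return allowed

def unique_allowed_event_hashes(policy, partitions, workers=4):
    """Process the partitions in parallel and union the deduplicated results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(partition_worker, [(policy, p) for p in partitions])
    allowed = set()
    for partial in results:
        allowed |= partial
    return allowed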
4.8 Results
We use the Receiver Operating Characteristic (ROC) curve to compare the performance
of various algorithms and parameters. The ROC curve is a graphic commonly used to chart
the performance of binary classifiers. It charts the trade-off between the TPR and the FPR
of a binary classifier, with the ideal performance having a TPR value of one and FPR value
of zero. Our charts also include the Area Under the Curve (AUC) which measures the area
underneath the ROC curve. This provides a single quantitative score that incorporates both
the F P R and T P R as the weighting metric is varied with higher AU C scores being more
favorable.
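For reference, such an AUC value can be computed from the swept (FPR, TPR) operating points with a simple trapezoidal integration; the sketch below uses NumPy and made-up points, and is not necessarily how the values in this section were produced.

import numpy as np

def roc_auc(fpr_points, tpr_points):
    """Area under an ROC curve given the (FPR, TPR) pairs from one parameter sweep."""
    order = np.argsort(fpr_points)
    fpr = np.asarray(fpr_points, dtype=float)[order]
    tpr = np.asarray(tpr_points, dtype=float)[order]
    return float(np.trapz(tpr, fpr))            # trapezoidal rule over sorted FPR

# Hypothetical operating points produced by sweeping the weighting parameter.
print(roc_auc([0.0, 0.05, 1.0], [0.90, 0.99, 1.0]))  # ~0.99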
First, we describe our dataset used for these experiments. Next we present experimental
results and analysis to justify our choices for the candidate evaluation metric Cscore , including
a comparison of several possible methods for normalizing the CoverageRate variable. Then
we examine the effect of varying the two adjustable input variables to the mining algorithm,
the length of the observation period (|LOBP |), and the minimum support threshold (ǫ).
Finally, we compare the performance of our ABAC algorithm and policies to that of an
RBAC based approach.
Avg. is the average number of unique actions exercised by active users, and ΣAction Avg. is the average of the total actions exercised by active users. The standard deviation is also provided for the Unique Services, Unique Actions, and ΣActions metrics to understand the variation between individual users. For example, looking at both the Unique and ΣActions metrics, we observe that their standard deviation is higher than the average for all time periods, indicating a high degree of variation between the number of actions that users exercise.
Based on our dataset of 4.7M user generated events, we derive two privilege universes
using our feature selection methodology described in Section 4.7.3.1. ξ‘0.1 used 15 attributes
and consisted of 510M unique attribute:value combinations. ξ‘0.005 used 40 attributes, 25
of which were resource identifiers so the universe size varied between 1.5B and 8.6B unique
attribute:value combinations depending on the number of resources used during the OBP
and OP P periods. All of the experiments in this section use ξ‘0.1 except for Section 4.8.4
which uses ξ‘0.005 .
We consider three criteria in the design and evaluation of the Cscore metric for selecting
a single rule from many candidate rules generated by the F P −growth algorithm during
each iteration of our rule mining algorithm. C1:AU C is the Area Under the ROC Curve.
C2:Smoothness means that T P R values should increase monotonically as the F P R in-
creases. And, C3:Interpretability means that the effect of changing the weighting variable
should be predictable and easy to understand by an administrator who uses the metric in a
policy mining algorithm.
We propose the candidate scoring metric Cscore in Section 4.7.1.1, λ−Distance is pre-
sented in [25], and Qrul is presented in [54]. All of these metrics use the number of over-
assignments and number of log entries covered with a weighting variable for adjusting the
importance between over-assignments and coverage in their scoring of candidates. However,
these metrics differ in how they normalize these numbers (if at all) and how they implement
the weighting between them. The results of varying the over-assignment weightings for these
candidate evaluation methods are shown in Figure 4.2.
Four distinct versions of the Qrul metric are presented in Figure 4.2. Qrul is the metric
as presented in [54] (and in this paper as Equation 4.1). In [54], the authors also described
QrulF req, a frequency weighted variant of Qrul which should be a fairer comparison with our
frequency weighted policy scoring algorithm (Algorithm 5). The authors of [54] provide their
source code on their website. After inspecting this source code, it appears that the scoring
algorithms implemented in the source code for Qrul and QrulF req are slightly different
from those presented in the paper. Instead of using the number of privileges covered by a
rule out of the entire privilege universe ([[p]]) as the denominator for the over-assignments
side of the metric, the implemented metrics instead use the number of privileges covered by
a rule out of the log entries not covered by other rules already in the policy (|[[p]] ∩ U P |).
These “as-implemented” metrics, QrulImpl and QrulF reqImpl, perform more favorably
than their counterparts so we include them in our comparison here along with the versions
as documented in [54].
All of the examined metrics performed relatively well with high AU C values, but the
Cscore metric has the highest AUC value, thus being the most favorable metric per the C1:AUC criterion.
Figure 4.2: ROC curves (TPR vs. FPR) for the candidate scoring metrics: C-Score AUC=0.9993, QrulImpl AUC=0.9983, QrulFreqImpl AUC=0.9978, λ-Distance AUC=0.9972, QrulFreq AUC=0.9946, Qrul AUC=0.9922
Regarding C2:Smoothness, an inflection point is visible in the charted portion of the ROC curve for Qrul (QrulFreq has a similar inflection point that is difficult to discern in Figure 4.2 at FPR = 0.0013).
Unlike the Qrul and λ−Distance metrics, Cscore normalizes both the number of logs
covered and over-assignments into a ratio between [0, 1] before applying the weighting. This
makes the weighting variable independent of the size of the privilege universe and number of
log entries, and thus easier to understand and apply. In Figure 4.2, varying the ω weighting of the Cscore between ω = 1/10 and ω = 10 varies the charted FPR between FPR = 0.05 and FPR = 0.998 at relatively even intervals. To achieve a similar spread across the FPR scores with QrulFreqImpl and λ−Distance, the variable weighting for those metrics must be varied between 1/100 and 1/2000. QrulImpl achieved the second highest AUC score due to an unusually good score near FPR = 0.34, but QrulImpl is difficult to assign a weighting to with predictable results. For example, the QrulImpl score at FPR = 0.34, TPR = 0.9998 was achieved with ω‘0 = 1/100000, but the next score at FPR = 0.49, TPR = 0.9988 was achieved with ω‘0 = 1/500000, which is a significant difference that is difficult to determine without experimentation and consideration of the privilege space and log sizes. Because of its predictability and more even distribution of results, we
find Cscore best meets our evaluation criterion C3:Interpretability.
The CoverageRate (Equation 4.4) of the Cscore (Equation 4.6) is the number of log entries
covered by rule r normalized to the range [0, 1], so that it can be compared with the weighted
value of the OverPrivilegeRate (Equation 4.5) normalized to the same range. There are
several possible ways to compute such a coverage rate however, and it is not immediately
clear which would perform the best without experimentation. We consider four possible
methods of computing the CoverageRate and analyze their performance here:
• |Luncov(r)| / |Luncov|: The frequency weighted number of logs covered out of the total number of uncovered logs.
• |{Luncov(r)}| / |{Luncov}|: The unique number of logs covered out of the set of unique uncovered logs.
• |Luncov(r)| / |LOBP|: The frequency weighted number of logs covered out of the total number of logs in the observation period.
• |{Luncov(r)}| / |{LOBP}|: The unique number of logs covered out of the set of unique log entries during the observation period.
The results of applying the four separate methods of computing the CoverageRate are
presented in Figure 4.3 and identified in that chart by the denominator of each method.
As evident in Figure 4.3, the |Luncov(r)| / |Luncov| method performed the best for two of our criteria for selecting a candidate metric: C1:AUC and C2:Smoothness. The frequency weighted methods |Luncov(r)| / |Luncov| and |Luncov(r)| / |LOBP| performed about the same in terms of C3:Interpretability, with ω = 1/10 resulting in scores in the upper-left most part of the chart. The methods using the number of unique log entries performed less favorably in terms of C3:Interpretability, with their upper-left most points being reached near ω = 1/256, a value farther away from 1 and more difficult to find without experimentation.
In addition to the ω variable which is varied to generate the points along all of the ROC
curves in this section (with the exception of the RBAC algorithm curve in Figure 4.6), there
are two other parameters which can be varied as inputs to Algorithm 4: the threshold value
used by the F P −growth algorithm, ǫ, and the length of the observation period |LOBP |.
The minimum support threshold (ǫ) is used to specify that a pattern is considered a "frequent" pattern if that pattern occurs in at least a fraction ǫ of the examined entries. Increasing ǫ
causes fewer candidate patterns to be identified by the F P −growth algorithm. The results
of varying ǫ between [0.05, 0.1, 0.2, 0.3] are shown in Figure 4.4. For both ǫ = 0.2 and ǫ = 0.3, we observe inflection points in the chart as ω decreases, because a lower ω value favors more coverage regardless of over-assignment.
Figure 4.3: Comparison of the CoverageRate computation methods, identified by denominator (|Luncov| AUC=0.9993, |LOBP| AUC=0.9974)
Figure 4.4: Effect of varying the minimum support threshold (ε=0.3 AUC=0.9640, ε=0.2 AUC=0.9940, ε=0.1 AUC=0.9993, ε=0.05 AUC=0.9996)
When mining policies with a variable observation period length, a larger observation
window generally results in higher T P R but also higher F P R as a result of the mining
algorithms being given more privileges in larger observation periods as previously observed
in [48]. While this trend is also present with our mining algorithm, it is much less noticeable
than with the naive RBAC mining approach.
Figure 4.6: Comparison of ABAC vs. RBAC Performance (ABAC AUC=0.9973, RBAC AUC=0.9269)
4.8.4 ABAC vs. RBAC Performance
The final experiment we run is to compare the performance of our ABAC algorithm
against an RBAC mining algorithm. For this comparison, we use the naive algorithm pre-
sented in [48], which builds an RBAC policy based on the permissions exercised during an
observation period. Other role mining algorithms would perform very similarly because the
role mining problem is designed to fit a set of roles to a given matrix of user to permission
assignments, just with variations on how those users and permissions are grouped by roles to
minimize WSC. Although this RBAC algorithm is fairly simple, it performed quite well in
the scenario that sought an equal balance between low over-privilege and low under-privilege
when compared to more sophisticated algorithms [48].
The ROC curve of our ABAC algorithm and the naive RBAC algorithm from [48] are
presented in Figure 4.6. For this comparison, the ABAC algorithm used a fixed observation
period size of 30 days, an itemset frequency ǫ = 0.1, and the over-privilege weight varied between ω = [1/8192, ..., 16] by powers of 2 to generate the data points. For the RBAC algorithm,
there is no variable similar to ω that can be used as a parameter to instruct the algorithm to
directly vary the importance between under-privilege and over-privilege. However, varying
the observation period length effectively serves this purpose by causing more or fewer priv-
ileges to be granted by the algorithm, so the observation period length was varied between
[3, 7, 15, 30, 45, 60, 75, 90, 105, 120] days to generate the data points for the RBAC algorithm
in Figure 4.6.
The ABAC algorithm significantly outperformed the RBAC algorithm across the ROC
curves in Figure 4.6. With only 30 days worth of data, the ABAC algorithm was able to
correctly grant more privileges (higher TPR) than the RBAC algorithm with 120 days of
data. The ABAC algorithm was also able to correctly restrict more unnecessary privileges
(lower FPR) than the RBAC algorithm operating on only 3 days of data. This is due to
the ability of the ABAC algorithm to identify and use patterns and create policies based on attributes, whereas the RBAC algorithm is restricted to using only RBAC semantics.
4.9 Summary
This paper explored methods for automatically generating least privilege ABAC policies
that balance between minimizing under- and over-privilege assignment errors. We defined
the ABAC Privilege Error Minimization Problem (PEMPABAC). We also presented metrics
and methodology for evaluating ABAC policies using out-of-sample validation. We adapted
techniques from unsupervised rule mining to create an algorithm which automatically per-
forms ABAC policy generation by mining audit logs with a variable weighting between under-
and over-privilege. We described optimization methods using feature selection, partitioning,
and parallelization to mine and score large ABAC privilege spaces. Finally, we presented the
results of applying our algorithm on a real-world dataset which demonstrated its effectiveness
as well as the better performance of our ABAC policies over mined RBAC policies.
This work suggests many possibilities for future research in generating secure ABAC
policies. Our candidate rule scoring metric, Cscore , can be expanded to consider policy com-
plexity (WSC), or our method can be combined with those which minimize policy complexity
only. Additional attributes may be incorporated from sources other than just audit logs such
as HR databases of user attributes, or by introspecting the application environment and ex-
tracting attribute information about existing resources. As the number of attributes grows,
so does the importance of feature selection for selecting highly relevant attributes that can
help improve the security of the generated policies without greatly increasing the runtime
and memory required by a mining algorithm.
CHAPTER 5
CONCLUSION
As access controls have evolved to cover the complex and various use cases of modern
computing, the burden of defining access control policies has also increased, often exceeding
the human ability to define policies that implement the Principle of Least Privilege. Along
with increasing complexity, the commoditization of computing power, such as cloud comput-
ing, has made it easier than ever for organizations to rapidly deploy computing resources with
minimal effort (or training), thus increasing the risks and damages that may be caused as a
result of poor access control policies. The research presented in this thesis demonstrates the
effectiveness of several automated methods for creating access control policies that achieve
the principle of least privilege with quantitatively evaluations of their performance at reduc-
ing under-privilege and over-privilege on real world datasets. More specifically, the individual
projects that comprise this thesis have made the following contributions to advance the state
of access control research:
2. We formally defined the Privilege Error Minimization Problem (PEMP) which de-
scribed the problem of creating complete and secure RBAC privilege policies. Using
our previously defined metrics and policy generation framework, we presented a methodology for training and validating one naive and two machine learning based algorithms. Again using real world data, we presented evaluation results for these algorithms.
3. We presented an association rule mining based algorithm to address the problem of
automatically creating ABAC policies. We also presented feature selection, scalability,
and performance optimization methods for processing the large privilege spaces that
are inherent to the ABAC environment. Using metrics adapted from our previous work
to better suit ABAC policies, we presented a quantitative analysis of the performance
of our mining algorithm using a real-world dataset and a comparison of our automat-
ically generated ABAC policies created by our mining algorithm with automatically
generated RBAC based policies.
REFERENCES CITED
[1] Harold F Tipton and Kevin Henry. Official (ISC) 2 guide to the CISSP CBK. Auerbach
Publications, 2006.
[2] Sara Motiee, Kirstie Hawkey, and Konstantin Beznosov. Do windows users follow the
principle of least privilege?: investigating user account control practices. In Symposium
on Usable Privacy and Security (SOUPS), 2010.
[6] Darren Pauli. Dev put AWS keys on Github. Then BAD THINGS hap-
pened. https://www.theregister.co.uk/2015/01/06/dev_blunder_shows_github_
crawling_with_keyslurping_bots, 2015. Accessed: 2018-10-21.
[7] U.S. Department of Commerce. 2016 Top Markets Report Cloud Computing. http://
trade.gov/topmarkets/pdf/Cloud_Computing_Top_Markets_Report.pdf, 2016. Ac-
cessed: 2017-03-23.
[9] Jerome H Saltzer and Michael D Schroeder. The protection of information in computer
systems. IEEE, 63(9):1278–1308, 1975.
[10] Ravi Sandhu, David Ferraiolo, and Richard Kuhn. The NIST model for role-based
access control: towards a unified standard. In ACM workshop on Role-based access
control, 2000.
[11] Jaideep Vaidya, Vijayalakshmi Atluri, and Janice Warner. Roleminer: mining roles
using subset enumeration. In Proceedings of the 13th ACM conference on Computer
and communications security, pages 144–153. ACM, 2006.
[12] Hassan Takabi and James BD Joshi. Stateminer: an efficient similarity-based approach
for optimal mining of role hierarchy. In Proceedings of the 15th ACM symposium on
Access control models and technologies, pages 55–64. ACM, 2010.
[13] Jürgen Schlegelmilch and Ulrike Steffens. Role mining with ORCA. In ACM Symposium
on Access control models and technologies (SACMAT), 2005.
[14] Ruowen Wang, William Enck, Douglas Reeves, Xinwen Zhang, Peng Ning, Dingbang
Xu, Wu Zhou, and Ahmed M. Azab. Easeandroid: Automatic policy analysis and
refinement for security enhanced android via large-scale semi-supervised learning. In
USENIX Security Symposium, 2015.
[15] Yongzheng Wu, Jun Sun, Yang Liu, and Jin Song Dong. Automatically partition
software into least privilege components using dynamic data dependency analysis. In
IEEE/ACM International Conference on Automated Software Engineering (ASE), 2013.
[16] Aaron Blankstein and Michael J. Freedman. Automating isolation and least privilege
in web services. In IEEE Symposium on Security and Privacy, pages 133–148. IEEE,
2014.
[18] Amazon Web Services. AWS Identity and Access Management (IAM). https://aws.
amazon.com/iam/, 2017. Accessed: 2017-02-20.
[19] Bob Violino. Cloud Computing Sees Huge Growth Rates Across All Seg-
ments. http://www.information-management.com/news/infrastructure/
cloud-computing-sees-huge-growth-rates-across-all-segments-10030682-1.
html, 2017. Accessed: 2017-09-07.
[20] Jerome H Saltzer and Michael D Schroeder. The protection of information in computer
systems. Proceedings of the IEEE, 63(9):1278–1308, 1975.
[22] Mario Frank, Joachim M Buhmann, and David Basin. On the definition of role mining.
In ACM Symposium on Access control models and technologies (SACMAT), pages 35–44.
ACM, 2010.
[23] Jaideep Vaidya, Vijayalakshmi Atluri, and Qi Guo. The role mining problem: finding
a minimal descriptive set of roles. In ACM Symposium on Access control models and
technologies (SACMAT), pages 175–184. ACM, 2007.
[24] Brian T. Sniffen, David R. Harris, and John D. Ramsdell. Guided policy generation for
application authors. In SELinux Symposium, 2006.
[25] Ian Molloy, Youngja Park, and Suresh Chari. Generative models for access control
policies: Applications to role mining over logs with attribution. In ACM Symposium on
Access control models and technologies (SACMAT). ACM, 2012.
[26] Suresh Chari, Ian Molloy, Youngja Park, and Wilfried Teiken. Ensuring continuous
compliance through reconciling policy with usage. In ACM Symposium on Access control
models and technologies (SACMAT), pages 49–60, 2013.
[27] Matthew Sanders and Chuan Yue. Automated least privileges in cloud-based web ser-
vices. In Hot Topics in Web Systems and Technologies (HotWeb). IEEE, 2017.
[29] IBM Corporation. z/OS Security Server RACF General User’s Guide.
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_1.13.0/com.ibm.
zos.r13.icha100/toc.htm, 2012. Accessed: 2017-05-17.
[30] Amazon Web Services. IAM Policy Generator Source Code. https://awsiamconsole.
s3.amazonaws.com/iam/assets/js/bundles/policies.js, 2017. Accessed: 2017-05-
04.
[31] David F Ferraiolo, Ravi Sandhu, Serban Gavrila, D Richard Kuhn, and Ramaswamy
Chandramouli. Proposed nist standard for role-based access control. ACM Transactions
on Information and System Security (TISSEC), 4(3):224–274, 2001.
[32] John D. Kelleher, Brian Mac Namee, and Aoife D’Arcy. Fundamentals of Machine
Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Stud-
ies. MIT Press, 2015. ISBN 0262029448, 9780262029445.
[33] Ian Molloy, Ninghui Li, Tiancheng Li, Ziqing Mao, Qihua Wang, and Jorge Lobo.
Evaluating role mining algorithms. In ACM Symposium on Access control models and
technologies (SACMAT), pages 95–104. ACM, 2009.
[34] Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice.
OTexts, 2014.
[36] Fabian Pedregosa et al. Scikit-learn: Machine learning in Python. Journal of Machine
Learning Research, 12:2825–2830, 2011.
[37] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based
algorithm for discovering clusters in large spatial databases with noise. In Knowledge
discovery in databases (KDD), volume 96, pages 226–231. AAAI Press, 1996.
[39] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. Classification
and regression trees. CRC press, 1984.
[40] Spyros Makridakis. Sliding simulation: A new approach to time series forecasting.
Management Science, 36(4):505–512, 1990.
[41] Spyros Makridakis, A Andersen, Robert Carbone, Robert Fildes, Michele Hibon, Rudolf
Lewandowski, Joseph Newton, Emanuel Parzen, and Robert Winkler. The accuracy of
extrapolation (time series) methods: Results of a forecasting competition. Journal of
forecasting, 1(2):111–153, 1982.
[42] Vincent C Hu et al. NIST SP 800-162: Guide to Attribute Based Access Control (ABAC) Definition and Considerations (Draft), 2013.
[43] Bill Fisher, Norm Brickman, Prescott Burden, Santos Jha, Brian Johnson, Andrew
Keller, Ted Kolovos, Sudhi Umarji, and Sarah Weeks. Attribute based access control.
NIST Special Publication 1800-3B.
[44] Arjumand Fatima, Yumna Ghazi, Muhammad Awais Shibli, and Abdul Ghafoor Abassi.
Towards attribute-centric access control: an abac versus rbac argument. Security and
Communication Networks, 9(16):3152–3166, 2016.
[45] Trevor Hastie, Jerome Friedman, and Robert Tibshirani. The elements of statistical
learning. Springer series in statistics New York, NY, USA, 2001.
[46] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. Mining frequent patterns without
candidate generation: A frequent-pattern tree approach. Data mining and knowledge
discovery, 8(1):53–87, 2004.
[47] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between
sets of items in large databases. In ACM sigmod record, volume 22, pages 207–216. ACM,
1993.
[48] Matthew W Sanders and Chuan Yue. Minimizing privilege assignment errors in cloud
services. In Proceedings of the ACM Conference on Data and Application Security and
Privacy, pages 2–12, 2018.
[49] Lujo Bauer, Scott Garriss, and Michael K Reiter. Detecting and resolving policy mis-
configurations in access-control systems. ACM Transactions on Information and System
Security (TISSEC), 14(1):2, 2011.
[50] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast algorithms for mining association
rules. In Proceedings of the International Conference on Very Large Data Bases, VLDB,
volume 1215, pages 487–499, 1994.
[51] Carlos Cotrini Jiménez, Thilo Weghorn, and David A. Basin. Mining abac rules from
sparse logs. 2018 IEEE European Symposium on Security and Privacy (EuroS&P),
pages 31–46, 2018.
[52] Branko Kavšek, Nada Lavrač, and Viktor Jovanoski. Apriori-sd: Adapting association
rule learning to subgroup discovery. In Michael R. Berthold, Hans-Joachim Lenz, Eliz-
abeth Bradley, Rudolf Kruse, and Christian Borgelt, editors, Advances in Intelligent
Data Analysis V, pages 230–241, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
ISBN 978-3-540-45231-7.
[53] Carlos E. Rubio-Medrano, Josephine Lamp, Adam Doupé, Ziming Zhao, and Gail-Joon
Ahn. Mutated policies: Towards proactive attribute-based defenses for access control. In
Proceedings of the Workshop on Moving Target Defense, 2017. ISBN 978-1-4503-5176-8.
[54] Zhongyuan Xu and Scott D Stoller. Mining attribute-based access control policies.
IEEE Transactions on Dependable and Secure Computing, 12(5), 2015.
[55] Tanay Talukdar, Gunjan Batra, Jaideep Vaidya, Vijayalakshmi Atluri, and Shamik
Sural. Efficient bottom-up mining of attribute based access control policies. In Proceed-
ings of the IEEE International Conference on Collaboration and Internet Computing
(CIC), pages 339–348, 2017.
[56] Ian Molloy, Hong Chen, Tiancheng Li, Qihua Wang, Ninghui Li, Elisa Bertino, Seraphin
Calo, and Jorge Lobo. Mining roles with semantic meanings. In ACM Symposium on
Access control models and technologies (SACMAT), pages 21–30, 2008.
[57] Zhongyuan Xu and Scott D Stoller. Mining attribute-based access control policies from
logs. In Proceedings of the IFIP Working Conference on Data and Applications Security
and Privacy, pages 276–291. Springer, 2014.