Automated Methods For Generating Least Privilege Access Control Policies
by
Matthew W. Sanders
© Copyright by Matthew W. Sanders, 2019
All Rights Reserved
A thesis submitted to the Faculty and the Board of Trustees of the Colorado School
of Mines in partial fulfillment of the requirements for the degree of Doctor of Philosophy
(Computer Science).
Golden, Colorado
Date
Signed:
Matthew W. Sanders
Signed:
Dr. Chuan Yue
Thesis Advisor
Golden, Colorado
Date
Signed:
Dr. Tracy Camp
Professor and Head
Department of Computer Science
ABSTRACT
Access controls are the processes and mechanisms that allow only authorized users to
perform operations upon the resources of a system. Using access controls, administrators
attempt to implement the Principle of Least Privilege, a design principle where privileged
entities operate using the minimal set of privileges necessary to complete their job. This
protects the system against threats and vulnerabilities by reducing exposure to unauthorized
activities. Although access control can be considered only one area of security research, it
is a pervasive and omnipresent aspect of information security.
But achieving the Principle of Least Privilege is a difficult task. It requires the ad-
ministrators of the access control policies to have an understanding of the overall system,
each user’s job function, the operations and resources necessary to those job functions, and
how to express these using the access control model and language of the system. In almost
all production systems today, this process of defining access control policies is performed
manually. It is error prone and done without quantitative metrics to help administrators
and auditors determine if the Principle of Least Privilege has been achieved for the system.
In this dissertation, we explore the use of automated methods to create least privilege
access control policies. Specifically, we (1) develop a framework for policy generation al-
gorithms, derive metrics for determining adherence to the Principle of Least Privilege, and
apply these to evaluate a real world dataset, (2) develop two machine learning based algo-
rithms for generating role based policies and compare their performance to naive methods,
and (3) develop a rule mining based algorithm to create attribute based policies and evaluate
its effectiveness against role based methods. By quantifying the performance of access control
policies, developing methods to create least privilege policies, and evaluating their perfor-
mance using real world data, the projects presented in this dissertation advance the state of
access control research and address a problem of great significance to security professionals.
TABLE OF CONTENTS
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.6.4 Results Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.7 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.7.1 Rule Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.8 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
CHAPTER 5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
REFERENCES CITED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
LIST OF FIGURES
Figure 4.3 Comparison of Methods for Calculating Coverage Rates . . . . . . . . . . 90
LIST OF TABLES
LIST OF ABBREVIATIONS
False Negative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FN
False Positive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FP
Role Mining Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RMP
True Negative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TN
True Positive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TP
ACKNOWLEDGMENTS
I would like to express my utmost gratitude to my advisor Professor Chuan Yue. His
wisdom, insights, guidance and unwavering patience were all crucial in my research and my
personal growth as a researcher. I hope that one day I can exhibit the same virtuous qualities
that he has shown while mentoring me. I am also grateful to my committee members,
Professor Tracy Camp, Professor Nils Tilton, Professor Bo Wu, and Professor Dejun Yang
for their time and support.
I would also like to thank my family for their love and encouragement. I would like to
thank my mother Ruth, a teacher who taught me the importance of education and to always
continue learning. I would like to thank my father Wiley, who instilled in me the perseverance
needed to sustain me during my years of research. Finally, I would like to thank my wife
Elizabeth, words cannot express how grateful I am for her support and sacrifices made during
countless late nights and weekends. Without her, my research and Ph.D. pursuit would not
have been possible.
CHAPTER 1
INTRODUCTION
Access controls are the processes and mechanisms that allow only authorized users to
perform operations upon the resources of a system. They allow administrators and resource
owners to specify which users can access a system, what resources those users can access, and
what operations those users can perform. Using access controls, administrators implement
the Principle of Least Privilege (PoLP), a design principle where privileged entities oper-
ate using the minimal set of privileges necessary to complete their job. This protects the
system against threats and vulnerabilities by reducing exposure to unauthorized activities
and providing access only to those who have been approved. Although access control can
be considered only one area of security research, it is the most pervasive and omnipresent
aspect of information security [1]. Because the PoLP is so fundamental to secure design, it
is specified in all widely accepted security compliance standards:
• Payment Card Industry (PCI) Data Security Standard (DSS) v3.1, Requirement 7: Re-
strict access to cardholder data by business need to know.
• National Institute of Standards and Technology (NIST) Special Publication 800-53, Secu-
rity and Privacy Controls for Federal Information Systems and Organizations, AC-6: The
organization employs the principle of least privilege, allowing only authorized accesses for
users (or processes acting on behalf of users) which are necessary to accomplish assigned
tasks in accordance with organizational missions and business functions.
• National Institute of Standards and Technology (NIST) Special Publication 800-171, Pro-
tecting Controlled Unclassified Information in Nonfederal Systems and Organizations,
3.1.5: Employ the principle of least privilege, including for specific security functions and
privileged accounts.
As information systems have become more complex, access controls have also evolved
to meet the diverse requirements of these information systems. Early access control models
such as Access Control Lists (ACLs) consisting of a list of user permissions attached to each
system object were sufficient for simpler systems. But these models are woefully inadequate
for modern systems where it is not uncommon to deal with thousands of users with federated
identities from multiple systems, each system with its own type of resources and operations,
possibly using different access control models.
In modern systems, the complexity of managing access controls and implementing the
PoLP often exceeds the capacity of manual management. While implementing the PoLP is
a desirable and sometimes mandatory requirement for software systems, proper implementa-
tion can be difficult and is often not even attempted. Previous research into the use of least
privilege practices in the context of operating systems [2] revealed that the overwhelming
majority of study participants did not utilize least privilege policies. This was due to their
partial understanding of the security risks, as well as a lack of motivation to create and
enforce such policies.
In addition to information systems becoming more complex, they have also become more
empowering for their users, increasing the possible damage that may be caused by access
control errors. For example, Cloud Computing provides cheap on demand access to com-
puting and storage resources for its users. With this increased power also come increased
consequences of access control mistakes. The Amazon Simple Storage Service (S3) is just
one of many popular cloud services. S3 provides the ability for users to easily and securely
store data in the cloud and allow other users to read or modify that data. While the access
controls and operations of the S3 service are relatively simple to understand and manage,
there were at least seven major incidents in 2017 where the mismanagement of S3 access
controls led to significant data breaches [3]:
• May 2017: Booz Allen Hamilton exposed battlefield imagery and administrator credentials
to sensitive systems of the National Geospatial Agency (NGA).
• June 2017: Deep Root Analytics exposed personal data of 198 million American voters.
• July 2017: Dow Jones & Co. exposed personally identifiable information of 2.2 million
people.
• July and September 2017: Verizon Wireless exposed personally identifiable information of
over 6 million customers and sensitive corporate information.
• September 2017: Accenture exposed hundreds of gigabytes of data, including private sign-
ing keys and plaintext passwords.
Another common class of security breaches resulting from poor access control and the
power of cloud computing is cryptojacking attacks enabled by compromised cloud creden-
tials. Cryptojacking is any attack involving the unauthorized use of computing resources to
mine cryptocurrency. The cloud computing form of cryptojacking attack occurs when users
accidentally expose their cloud computing credentials such as in publicly shared source code.
Attackers find these credentials and use them to mine cryptocurrency at the victim’s
expense. Many such incidents have been documented in news articles with organizations
such as Tesla [4], The L.A. Times [4], Gemalto [5], and Aviva [5] being just some of the
documented victims of such attacks. These attacks are increasingly common with attack-
ers continually searching open source code repositories such as GitHub for access keys [6].
Improved authentication methods may have prevented these attacks, but even with perfect
authentication, insider threats and accidental misuse are still security issues. The PoLP
helps reduce the damage possible from such threats. In the cryptojacking scenario, reducing
the number of users that can create virtual instances or reducing the number of instances
any single user can create would, by itself, reduce the damage caused by such attacks.
It is important to note that these breaches are not the result of previously unknown
vulnerabilities being exploited, nor due to the efforts of unusually capable and determined
attackers. Instead, these are attacks of opportunity made possible by human errors in man-
aging the access controls of an organization’s resources. The negative impacts of such access
control misconfigurations are pervasive and growing. In 2017, security research firm RedLock
found that 53% of organizations using cloud storage services such as Amazon S3 had inad-
vertently exposed one or more such services to the public. It appears that this is trending
upwards despite growing awareness about the risks of misconfigurations [5]. The damage
from such incidents may have been reduced or prevented altogether by stricter adherence
to the PoLP which would restrict the access to such resources to fewer people.
This thesis presents metrics, methods, and experimental results of using automated meth-
ods to implement least privilege access control policies across three separate but related
projects. While the cloud computing environment is the focus of this work because of access
to available data and because it is one of the most complex environments in terms of access
control, the problems of access control errors are not unique to the cloud environment and
this work is relevant to addressing such problems in other environments as well.
Before describing solutions, we must first analyze and define the problem of automating
least privileges. There exists a large body of work mining Role Based Access Control (RBAC)
policies from existing permissions or audit logs in order to create the smallest
(and most maintainable) RBAC policies with metrics to support these goals. However, these
previous works have neglected to address methods and metrics for measuring the security
of policies in terms of least privilege. Instead of focusing on maintainability, we argue
that the security of policies and their adherence to the PoLP are the most important goals
when considering automated methods of building access control policies. Our first project,
“Automated Least Privileges in Cloud-Based Web Services” provides an analysis of over-
privilege present in the access control policies of a real world dataset. It also defines a
methodology and metrics for quantifying the security of policies in terms of over-privilege
and under-privilege. Unlike previous approaches which often treat access control policies and
audit logs as fixed sets, our approach considers how these both change over time to better
analyze the risk of over-privilege in policies.
In our second project, “Minimizing Privilege Assignment Errors in Cloud Services”,
we implement three separate policy generation algorithms to create RBAC least privilege
policies by mining a real world dataset of audit logs. Our algorithms consist of a naive
approach, an unsupervised algorithm based on clustering, and a supervised algorithm based
on machine learning classification. Using the same metrics and evaluation methodology as
the first project, we analyze and compare the performance of these three algorithms. These
metrics include a weighting that allows administrators to express how much they value
minimizing under-privilege vs. minimizing over-privilege which we use to determine which
algorithm performs ‘best’ as this weighting varies.
While RBAC is the de-facto access control model in government and industry, Attribute
Based Access Control (ABAC) is becoming more popular. ABAC provides the ability
to create security policies using attributes that may be associated with users, objects, or the
operating environment. By using the wealth of attribute information in the audit logs and
the greater expressive power of ABAC policies it is possible to create access control policies
which simultaneously reduce under- and over-privilege when compared to RBAC. Creating
such ABAC policies is the focus of our third project, “Mining Least Privilege Attribute
Based Access Control Policies”. In this project, we implement an algorithm based on as-
sociation rule mining techniques to create ABAC least privilege policies by mining a real
world dataset of audit logs. We adapt the metrics of our previous works and use the same
methods to evaluate policies over time in terms of under- and over-privilege errors.
In addition to showing the effectiveness of our own algorithm, this project also provides
a methodology and quantitative comparison showing the ability of ABAC to reduce under-
privilege and over-privilege when compared to RBAC which may be valuable to access control
researchers regardless of their interest in policy mining techniques.
The remainder of this chapter briefly describes each of these three projects, one in each
subsection. Each project’s goals, methods, and results are described in detail in separate
chapters of this thesis.
The PoLP is a fundamental guideline for secure computing that restricts privileged en-
tities to only the permissions they need to perform their authorized tasks. Achieving least
privileges in an environment composed of many heterogeneous web services provided by a
third party is an important but difficult and error prone task for many organizations. This
paper explores the challenges that make achieving least privileges uniquely difficult in the
cloud environment and the potential benefits of automated methods to assist with creating
least privilege policies from audit logs. To accomplish these goals, we implement two frame-
works: a Policy Generation Framework for automatically creating policies from audit log
data, and an Evaluation Framework to quantify the security provided by generated roles.
We apply these frameworks to a real world dataset of audit log data with 4.3 million events
from a small company and present results describing the policy generator’s effectiveness. Re-
sults show that it is possible to significantly reduce over-privilege and administrative burden
of permission management.
The PoLP is a security objective of granting users only those accesses they need to perform
their duties. Creating least privilege policies in the cloud environment with many diverse
services, each with unique privilege sets, is significantly more challenging than policy creation
previously studied in other environments. Such security policies are always imperfect and
must balance between the security risk of granting over-privilege and the effort to correct for
under-privilege. In this paper, we formally define the problem of balancing between over-
privilege and under-privilege as the Privilege Error Minimization Problem (PEMP) and
present a method for quantitatively scoring security policies. We design and compare three
algorithms for automatically generating policies: a naive algorithm, an unsupervised learning
algorithm, and a supervised learning algorithm. We present the results of evaluating these
three policy generation algorithms on a real-world dataset consisting of 5.2 million Amazon
Web Service (AWS) audit log entries. The application of these methods can help create
policies that balance between an organization’s acceptable level of risk and effort to correct
under-privilege.
Implementing effective and secure access control policies is a significant challenge. Too
much over-privilege increases the risk of damage to the system via compromised credentials,
insider threats, and accidental misuse. Policies that are under-privileged prevent users from
being able to perform their duties. Access control policies are rarely perfect in these regards
and administrators must create policies that balance between the two competing goals of
minimizing under-privilege vs. minimizing over-privilege. The access control model used to
implement policies plays a large role in the ability to construct secure policies and the At-
tribute Based Access Control (ABAC) model continues to gain in popularity as the solution
to many access control use cases because of its advantages in granularity, flexibility, and us-
ability. ABAC allows administrators to create access control policies based on the attributes
of the users, operations, resources, and environment. Due to the flexibility of ABAC, however,
it can be difficult to determine which attributes and value combinations would create the
best policies in terms of minimizing under- and over-privilege. To address this problem, we
introduce a method of mining ABAC policies from audit logs to generate ABAC policies
which minimize both under- and over-privilege. We also explore optimization methods for
dealing with large ABAC privilege spaces, and present experimental results of our methods
using a real-world dataset demonstrating the effectiveness of our methods.
CHAPTER 2
AUTOMATED LEAST PRIVILEGES IN CLOUD-BASED WEB SERVICES
2.1 Introduction
The commoditization of web services by cloud computing providers enables the outsourc-
ing of IT services on a massive scale. The business model of providing software, platform,
and infrastructure components via web services has seen tremendous growth over the last
decade and is forecast to continue expanding at a rapid pace [7]. From small startups to
large companies such as Netflix, Expedia, and Yelp [8], many organizations rely on services
provided by a third party for their mission critical operations. While the adoption of these
hosted web services continues, there are significant security and usability concerns yet to be
solved. Privilege management is a key issue in managing the operation of the diverse array
of web services available.
The principle of least privilege is a design principle where privileged entities operate using
the minimal set of privileges necessary to complete their job [9]. Least privileges protect
against several threats, chief among them the compromise of privileged entities’
credentials and functions by a malicious party. Other relevant threats mitigated by least
privileges include accidental misuse, whereby privileged entities may delete or misconfigure
resources which they do not require access to. Another threat is intentional misuse, where
insiders can abuse over-privileges to cause more damage than they would be able to under a
least privilege policy.
While implementing the principle of least privilege is a desirable and sometimes manda-
tory requirement for software systems, proper implementation can be difficult and is often
not even attempted. Previous research into the use of least privilege practices in the context
of operating systems [2] revealed that the overwhelming majority of study participants did
not utilize least privilege policies. This was due to their partial understanding of the security
risks, as well as a lack of motivation to create and enforce such policies. In comparison to
the operating system environment, the use of third party web services presents a much larger
number of services, resource types, access control policy languages, and audit mechanisms,
even within a single service provider, making it significantly more difficult to manage access
control.
The main contributions of this paper are: (1) an exploration of the challenges and ben-
efits of implementing an automated least privileges approach for third party web services
using real world data, (2) a concrete implementation of a framework for generating least
privilege policies from audit log data, and (3) metrics and methodology for quantifying the
effectiveness of least privilege policies. Related works are described in Section 2.2. The
motivating example of a real world dataset of manually created policies is analyzed in Sec-
tion 2.3. The automated least privilege generation and evaluation frameworks used are described
in Section 2.4, the metrics used to evaluate adherence to PoLP are described in Section 2.5
and the results of our analysis are described in Section 2.6.
include measuring similarity with existing roles, minimizing the number of user-role assign-
ment and permission-role assignment relations, metrics that seek to reduce administrative
cost, weighted structures that assign adjustable weights to assignment relationships, and
minimizing the number of edges in the role hierarchy graph.
Another related area of research uses audit data to create least privilege policies. Priv-
ileged entities often already possess the privileges necessary to do their jobs, thus roles can
be derived from existing permissions via data mining methods [13]. Notable examples of
mining data to create least privilege policies include EASEAndroid [14] for mobile devices,
ProgramCutter [15] for desktop applications, and Passe [16] for web applications. However,
these approaches do not provide a quantified assessment of how well they achieve the PoLP.
Like role mining, our research aims to reduce the administrative burden of creating access
control policies. However, instead of seeking to make roles more easily maintainable, we
seek to reduce administrator burden by generating secure and complete policies via easily
and frequently repeatable automated methods. The focus of this research is directly on
quantifying and improving the security of automatically generated privilege assignments
regardless of their size and complexity, thus we are addressing a problem different from the
RMP.
To illustrate the challenges of creating least privilege policies and to highlight the po-
tential of using an automated approach to policy generation, we examine a real world
dataset of policies manually created by administrators. The Amazon Web Services (AWS)
CloudTrail [17] logs of a company which provides a Software as a Service (SaaS) product were
analyzed (with permission). The audit logs contained 4.3M events collected over a period
of 307 days. During this period, 37 unique roles and 15 unique users exercised privileges.
Data gathered from the logs were analyzed and compared with the account Identity and
Access Management (IAM) [18] policies as they existed at the end of the collection period.
To quantify the effectiveness of these manually created policies at limiting over-privilege, we
compare the actions and services granted by these policies to those exercised in the audit
log data.
The privileged entities considered in this paper are users and virtual machine instances
which can both be assigned to roles. In our dataset, users were granted unconstrained access
making their comparison with exercised privileges somewhat uninteresting, but also demon-
strating a situation where achieving least privilege policies on users was not even attempted.
In contrast to users, virtual machines in our dataset were not granted unrestricted access
but were assigned roles manually created by administrators with the intent of constraining
the virtual machines to least privilege policies. While data for both users and roles were
analyzed, this section focuses on role policies granted to virtual machines to illustrate the
over-privilege present in manually created policies. As the results show, over-privilege was
common for these roles even though the role creators had the benefit of familiarity with the
application and the privileges it required. Services and actions not supported by CloudTrail
were excluded from these results.
Of the 37 unique roles identified in the dataset, 14 were present in the AWS IAM data at
the end of the collection period (those not found in the IAM policies had been deleted during
the collection period). Figure 2.1 shows a comparison between the actions granted and used
by virtual machine roles during the observation period. Even though the policies for each
role were intended to approximate least privileges, clearly there is a significant difference
between the number of actions granted and number of actions used. The average number of
actions granted to these 14 roles was 61.14, while the average number of privileges used was
2.92.
The comparison of privileges granted to those actually used at the service level of gran-
ularity is shown in Figure 2.2. Significant over-privilege is present at the service level, with
every role being granted privileges to at least one service for which it did not perform any
actions. The average number of services used by roles was 1.71 while the average number of
services granted was 5.07.
[Figure 2.1: Number of Granted & Used Actions by Role (Role1–Role14), shown on a logarithmic scale.]
This section describes the frameworks for generating and evaluating least privilege poli-
cies. First we present a framework for generating least privilege policies from audit logs. We
then present a framework for evaluating the effectiveness of a policy generator.
Figure 2.2: Number of Granted & Used Services by Role
The process of generating policies begins with ingesting the raw audit log data for
a given observation period into a datastore. Once ingested, the logs are normalized by
creating a projection of the events onto each unique privileged entity identified in the audit
logs for a specified observation period. Next, the policy generator algorithm is applied to the
normalized data. The generator implemented for this paper uses a simple counting based
approach which creates policy grants for each action an entity successfully exercised during
the observation phase. After policy generation is complete, additional modifications may
be made to the policies such as denying access to privileges which can be used to escalate
privileges. The policy generation framework is a bottom-up approach to building RBAC
policies where exercised permissions are used to create roles. This design can also be applied
to audit log data that have been previously collected in an organization’s environment, and
does not require an active presence in the cloud environment during log collection.
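As an illustrative sketch of this counting-based generator (not the exact implementation used in this work), the following assumes audit events have been normalized into records carrying an entity identifier, a (service, action) pair, a timestamp, and a success flag; the field names are hypothetical rather than the CloudTrail schema.

```python
from collections import defaultdict

def generate_policies(events, observation_start, observation_end):
    """Counting-based generator: grant each entity exactly the
    (service, action) pairs it successfully exercised during the
    observation period."""
    grants = defaultdict(set)
    for e in events:
        # Keep only successful events that fall inside the observation window.
        if not (observation_start <= e["timestamp"] < observation_end):
            continue
        if not e["success"]:
            continue
        grants[e["entity"]].add((e["service"], e["action"]))
    # Each entity's policy is its set of observed privileges; post-processing
    # (e.g., denying actions usable for privilege escalation) would be applied
    # to this result.
    return dict(grants)
```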
We next implemented a framework for evaluating the generated policies. This evaluation
framework simulates the application of an automated least privileges policy generator across
varying observation periods and operation periods. The purpose of these simulations is to
provide a quantified evaluation of the effectiveness of our current and future policy generators
if they were to be adopted in production by an organization. The information obtained from
these simulations can help determine how long the observation period should be, how long
these generated policies should be used for, and how effective the policy generator is. For
these evaluations we chose one day as the finest granularity of time period as this provides
enough time for entities to complete tasks requiring related privileges.
The evaluation framework uses a sliding window approach to perform its duties. It
repeatedly generates observation and operation phases of predetermined sizes and compares
the policy generated during the observation phase to the privileges exercised during the
operation phase. Each of these single evaluations is a trial and multiple trials for the same
evaluation parameters are achieved by incrementing the dates of the observation phase and
operation phase by a fixed amount. Figure 2.3 provides a visual representation of how the
sliding window technique is used to generate evaluation trials using the available audit log
data.
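A sketch of how the sliding window could generate trials, assuming the generate_policies helper sketched above, day-granularity windows, and a fixed step between trials (names and parameters are illustrative):

```python
from datetime import timedelta

def exercised_privileges(events, start, end):
    """Per-entity set of (service, action) pairs successfully exercised in [start, end)."""
    used = {}
    for e in events:
        if start <= e["timestamp"] < end and e["success"]:
            used.setdefault(e["entity"], set()).add((e["service"], e["action"]))
    return used

def sliding_window_trials(events, log_start, log_end,
                          observation_days, operation_days, step_days=1):
    """Yield one (generated policy, exercised privileges) pair per trial by
    sliding an observation window followed by an operation window over the logs."""
    obs = timedelta(days=observation_days)
    opp = timedelta(days=operation_days)
    start = log_start
    while start + obs + opp <= log_end:
        policy = generate_policies(events, start, start + obs)
        exercised = exercised_privileges(events, start + obs, start + obs + opp)
        yield policy, exercised
        start += timedelta(days=step_days)
```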
2.5 Metrics
be Precision = 1 because there is no possibility of over-privilege, and the case where all
privileges are granted is redefined to be Recall = 1 because there is no possibility of under-
privilege. To present more intuitive metrics, we take the complement of precision and recall
to create metrics where lower values are more favorable: the Over Privilege Rate (OPR) in
Equation 2.1 and Under Privilege Rate (UPR) in Equation 2.2, respectively.
OPR = 1 − Precision = UnexercisedGranted / AllGranted    (2.1)

UPR = 1 − Recall = ExercisedDenied / AllExercised    (2.2)
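In code, Equations 2.1 and 2.2 amount to two set operations per entity, where granted and exercised denote the privileges granted by the generated policy and the privileges exercised during the operation period (a sketch with illustrative names; the zero-denominator handling is an assumption):

```python
def over_privilege_rate(granted, exercised):
    """OPR: fraction of granted privileges that went unexercised (Equation 2.1)."""
    if not granted:
        return 0.0  # nothing granted, so no possibility of over-privilege
    return len(granted - exercised) / len(granted)

def under_privilege_rate(granted, exercised):
    """UPR: fraction of exercised privileges that were denied (Equation 2.2)."""
    if not exercised:
        return 0.0  # nothing exercised, so no possibility of under-privilege
    return len(exercised - granted) / len(exercised)
```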
It is important to consider the amount of time for which over-privilege exists. While the cost
of under-privilege is a decreased ability for privileged entities to perform their tasks, high
over-privilege can result in compromises of confidentiality, integrity, and availability if the
over-privilege is exploited by an attacker. The longer that over-privilege exists the greater
the possibility of it being exploited, thus we introduce an additional weight on the OPR to
account for the amount of time for which unused privilege grants existed. The Temporal Over
Privilege Rate (TOPR) in Equation 2.3 is the OPR multiplied by the number of days the
privileges went unused (the length of the operation period).
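That is,

TOPR = OPR × OperationPeriodLength    (2.3)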
OPR and UPR are two individual metrics for measuring the generated least privilege
policies. To provide a single metric that weights minimal over-privilege vs. minimal under-
privilege, we use the F-score metric (Equation 2.4). Higher β values for the F-score indicate
a higher weight for recall, which indicates a higher weight for minimal under-privilege. Lower
β values for the F-score weight minimal over-privilege higher. We use a temporally weighted
version of the F-score, TFβ (Equation 2.5), that accounts for the length of time for which an
over-privilege was granted. To incorporate a temporal weighting of over-privilege in TFβ,
we divide the precision by the operation period length because precision is the complement
of OPR and thus is directly tied to how we score over-privilege. Note that Fβ and T Fβ
are equivalent for the finest granularity of the operation period which is one day in our
simulations.
Fβ = (1 + β²) · (Precision · Recall) / ((β² · Precision) + Recall)    (2.4)

TFβ = (1 + β²) · ((Precision / OperationPeriodLength) · Recall) / ((β² · (Precision / OperationPeriodLength)) + Recall)    (2.5)
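Once precision and recall are computed for a trial, Equations 2.4 and 2.5 translate directly into code (a sketch; the zero-denominator guard is an assumption):

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (Equation 2.4)."""
    denom = (beta ** 2) * precision + recall
    return 0.0 if denom == 0 else (1 + beta ** 2) * precision * recall / denom

def tf_beta(precision, recall, beta, operation_period_length):
    """Temporally weighted F-score (Equation 2.5): precision is divided by the
    operation period length before the F-score formula is applied."""
    return f_beta(precision / operation_period_length, recall, beta)
```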
The F-score is the harmonic mean of precision and recall. The advantage of using the
harmonic mean F-score over arithmetic mean is that low scores for either precision or recall
will result in an overall low F-score which avoids allowing extreme policies to achieve favorable
scores. Consider an example policy which grants all privileges to an entity. This would result
in a perfect score in terms of precision (1), but the worst possible score in terms of recall (0).
The resulting F-score in this example would be 0 while arithmetic mean score would be 0.5,
the same as if precision and recall were both 0.5. This equal scoring between an extreme
policy and a balanced policy is not desirable in applications which value both precision and
recall.
2.6 Results
This section presents the results of our analysis tying together all of the work described
thus far. We consider the behavior of users and roles granted to virtual machines separately
when evaluating the effectiveness of their policies because they have different usage pat-
terns which produce significantly different scores. The behavior of virtual machines is fairly
consistent in both the actions and resources used while users are less predictable.
The results of evaluating the least privilege policy generator for observation periods of 7
and 28 days as the operation phase varies from 1 to 7 days are shown for users in Figure 2.4
and for virtual machine roles in Figure 2.5. The results for both entity types show that as the
length of the operation phase increases, the UPR also increases which is to be expected as
privileged entities use privileges that were not observed during shorter operation phases. For
virtual machine roles, there is very little difference between the UPR for 7 days of operation
vs. 28 days of operation. As we will see later in the metrics, the most variability in virtual
machine permissions exercised occurs during the first few days of the observation phase.
[Figure 2.4: OPR, UPR, and TOPR for users as the operation period varies from 1 to 7 days, for observation periods of 7 and 28 days.]
As the operation phase increases entities are more likely to use privileges they may
not have exercised previously during shorter periods. Thus the unweighted OPR decreases
for both entity types as the operation period increases. However, the TOPR in Figure 2.4
increases as the operation phase increases, indicating that the new privileges exercised during
each additional day of the operation phase do not reduce over-privilege enough to offset the
over-privilege caused by leaving the unexercised privileges granted to the entities longer. The
effect is more pronounced for users than for virtual machine roles - the virtual machine roles have
lower TOPR scores for all operation and observation periods.

[Figure 2.5: OPR, UPR, and TOPR for virtual machine roles as the operation period varies from 1 to 7 days, for observation periods of 7 and 28 days.]
To determine a recommended operation period based on how much one values minimal
over-privilege vs. minimal under-privilege, we use the T Fβ metric (Formula 2.5). Figure 2.6
shows the combined T Fβ score for both user and virtual machine role data for varying
operation period lengths and β values. In these charts β = 10 represents that minimal
under-privilege is considered to be 10 times more important than minimal over-privilege
while β = 0.1 represents that minimal over-privilege is 10 times more important than min-
imal under-privilege. All of the calculated TFβ scores consistently decrease as the operation
period increases, indicating that the smallest operation period of one day is the optimal choice
for minimizing temporal weighted over-privilege and under-privilege. The higher β values
show generally higher scores which decrease less as the operation period increases, indicating
that increasing the operation period would have a less negative impact for those that value
minimal under-privilege.

[Figure 2.6: Combined TFβ scores for users and virtual machine roles (β = 0.1, 1, 2, 5, 10) as the operation period varies from 1 to 7 days.]
Next we evaluate the impact of varying the observation period. The results of evaluating
the automated least privilege policy generator for operation phases of lengths 1 and 7 days as
the observation phase varies from 1 to 28 days are shown for users in Figure 2.7 and for virtual
machine roles in Figure 2.8. As the observation period increases the UPR decreases for users
at a logarithmic rate because more privileges exercised by users are captured during longer
observation phases. For virtual machine roles however there is little benefit in increasing
the observation period beyond two days as these virtual machines are unlikely to exercise
additional privileges that have not been exercised after the first day of observation. For both
entity types the UPR is again lower for the 1 day operation period vs. the 7 day operation
period.

[Figure 2.7: OPR, UPR, and TOPR for users as the observation period varies from 1 to 28 days, for operation periods of 1 and 7 days.]
For both entity types the OPR and TOPR increase as the observation phase increases
because longer observation phases result in entities being granted more privileges. This is
intuitively obvious for users as they are likely to use some privileges periodically which are
captured during the observation phase, and then not use them again for extended periods of
time or at all during the operation phase. Although the virtual machine roles are unlikely to
spontaneously use new privileges like users, not all privileges are exercised on a daily basis.
To determine a recommended observation period based on how much one values minimal
over-privilege vs. minimal under-privilege, we again use the T Fβ metric. For this evaluation
the user and virtual machine role scores are presented separately because (unlike varying
the operation phase in Figure 2.6) the dissimilar behavior patterns of users and virtual
machines produce different recommended observation periods.

[Figure 2.8: OPR, UPR, and TOPR for virtual machine roles as the observation period varies from 1 to 28 days, for operation periods of 1 and 7 days.]

Figure 2.9 displays the TFβ
scores for user entities as the observation phase varies and the operation phase remains fixed
at one day. The decreasing scores for β = 0.1, 1, 2 imply that organizations which value
minimal over-privilege should choose a shorter observation period. Even if minimal under-
privilege is valued twice as much as minimal over-privilege as indicated by β = 2, the OPR
rises significantly faster than the under-privilege rate decreases as the observation period
increases (as shown in Figure 2.7). For β = 5, 10 the T Fβ increases as the observation period
increases before eventually decreasing at 8 days for β = 5 and stabilizing at 13 days for
β = 10 as the increasing OPR outweighs the more heavily weighted but slower to decline UPR.
The T Fβ scores for virtual machine roles are presented in Figure 2.10. The role based scores
for low β again show that organizations which value minimal over-privilege should use small
observation periods, while organizations which value minimal under-privilege will see little
or no benefit in extending the observation period for these roles as the under-privilege rate
showed little decline for observation periods over two days (as shown in Figure 2.8).

[Figure 2.9: TFβ scores (β = 0.1, 1, 2, 5, 10) for user entities as the observation period varies with the operation period fixed at one day.]
The results of this section quantify the effectiveness of our policy generator applied to
a real world hosted web service audit log dataset. They describe how the performance of
the policy generator is affected by varying the observation period and operation period.
Based on this evaluation, we found that the actions of users were relatively difficult to
predict compared to virtual machine roles, with incidents of under-privilege being much
higher for users. Virtual machines could be constrained to their actions used during their
first couple days of operation to significantly reduce the over-privilege present in their policies. For
both types of privileged entities, increasing the operation period increased under-privilege
while increasing the observation period increased over-privilege.

[Figure 2.10: TFβ scores (β = 0.1, 1, 2, 5, 10) for virtual machine roles as the observation period varies.]
The conclusions drawn from these results are valuable because they quantify the perfor-
mance that can be expected by adopting an automated least privilege approach and they
provide a benchmark by which to judge future policy generation algorithms. The generation
of these results also demonstrates the application of the policy generation and evaluation
frameworks which can be used for evaluating future algorithms.
2.7 Summary
This paper explored the challenges and benefits of automating least privilege policies in
a cloud computing environment. Previous research in role mining approaches in other envi-
ronments was examined, and unique aspects of automated role mining in a cloud computing
environment were identified. A bottom-up design to generate least privilege policies was
implemented to illustrate the potential of an automated least privilege approach and the
results of evaluation on real world audit log data were presented. The results showed that
even when administrators attempt to manually create least privilege policies there is signif-
icant room for improvement upon these policies. Metrics for evaluating the effectiveness of
least privilege policy generators were presented for the same data set. These results showed
the trade-offs between over-privilege and under-privilege that can be achieved by varying
the observation period, operation period, and resource constraints for the presented policy
generator and these results provide benchmarks for future policy generators to be evaluated
against.
CHAPTER 3
MINIMIZING PRIVILEGE ASSIGNMENT ERRORS IN CLOUD SERVICES
3.1 Introduction
risks, as well as a lack of motivation to create and enforce such policies. Failing to create
least privilege policies in a cloud computing environment is especially high risk due to the
potentially severe security consequences. However, it is also significantly more difficult
to achieve least privilege in the cloud computing environment than in other environments
due to the large variety of services and actions as detailed in Section 3.3.
Automatic methods for creating security policies that are highly maintainable have re-
ceived a significant amount of research in works that address the Role Mining Problem
(RMP). However, the maintainability of policies does not directly address how secure or
complete a policy is. To directly address the goals of security and completeness in policies,
we define the Privilege Error Minimization Problem (PEMP) where automatically
generated policies for future use are evaluated directly on their security and completeness.
The most important metric of a generated security policy should be how secure it is (mini-
mizing over-privilege) and how complete it is (minimizing under-privilege).
We use machine learning methods to address the PEMP, which is fundamentally a pre-
diction problem. Audit logs contain the richest source of data from which to derive policies
that assign privileges to entities. We mine audit logs of cloud services using one unsupervised
and one supervised learning algorithm to address the PEMP along with a naive algorithm
for comparison. Note that researchers often take a program analysis approach to find which
privileges are needed by specific mobile or other types of applications; we do not take this
approach to address PEMP because the privilege errors in PEMP are associated with priv-
ileged entities, not an application. The F-Measure is a commonly used metric for scoring
in binary classification problems which we adapt to our problem. We show how the β vari-
able of the F-Measure can be used to provide a weighted scoring between under-privilege
and over-privilege. We present the results of our algorithms across a range of β values to
demonstrate how an organization can determine which approach to use based on its level of
acceptable risk.
The main contributions of this paper are: (1) a formal definition of the PEMP which
describes the problem of creating complete and secure privilege policies regardless of the
access control mechanism, (2) a metric to assess how well the PEMP is solved based on
the F-Measure, (3) a methodology of training and validating policy generation algorithms,
and (4) one supervised and one unsupervised learning algorithm applied to generating least
privilege policies and an analysis of their performance.
Section 3.2 reviews related works on role mining and automated least privileges. Sec-
tion 3.3 presents a comparison of the privilege spaces of various environments and a de-
scription of our dataset. Section 3.4 formally defines the PEMP and a scoring metric for
evaluating how well it is solved. Section 3.5 details specific algorithms and methods used
in our approach to addressing the PEMP and Section 3.6 analyzes the results of these al-
gorithms. Section 3.7 concludes this work and discusses potential research areas for future
work.
There are two areas of work closely related to ours: role mining and implementing least
privilege policies in other environments. Role mining refers to automated approaches to
creating Role Based Access Control (RBAC) policies. Role mining can be performed in
a top-down manner where organizational information is used or in a bottom-up manner
where existing privilege assignments such as access-control lists are used to derive RBAC
policies [22]. The problem of discovering an optimal set of roles from existing user permissions
is referred to as the Role Mining Problem (RMP) [23].
While we do not directly attempt to solve the RMP or one of its variations, our work
has aspects in common with works that do. The authors of [22] defined role mining as
being a prediction problem which seeks to create permission assignments that are complete
and secure by mining user permission relations. We also employ prediction to mine user
permission relations and create policies to balance completeness and security. Our work
differs from those that address RMPs in several key ways however. We mine audit log
data produced by a system in operation, not existing or manually created user-permission
assignments. We do not assume that the given data naturally fits into an RBAC policy
that is easy to maintain and secure. Most importantly, instead of evaluating an RBAC
configuration based on its maintainability, we focus on evaluating user privilege assignments
based on their completeness (minimizing under-privilege) and security (minimizing over-
privilege). We view our work as complementary to RMP research as once balanced user
permission assignments are generated, existing RMP methods can be used to derive roles
which are more compact.
Another area of research closely related to ours is works that use audit log data to achieve
least privilege. Privileged entities often already possess the privileges necessary to do their
jobs, thus roles can be derived from existing permissions via data mining methods [13].
Methods of automated policy generation have been studied in several environments. Pol-
gen [24] is one of the earliest works in this area which generates policies for programs on
SELinux based on patterns in the programs’ behavior. Other notable examples of mining au-
dit data to create policies include EASEAndroid [14] for mobile devices, ProgramCutter [15]
for desktop applications, and Passe [16] for web applications. [25] used Latent Dirichlet
Allocation (LDA), a machine learning technique to create roles from source code version
control usage logs. In [26], the same group used a similar approach to evaluate conformance
to least privilege and measured the over-privilege of mined roles in operating systems.
Previous approaches have several shortcomings which are addressed in this paper. Polgen
guides policy creation based on logs but does not provide over-privilege or under-privilege
metrics. EASEAndroid’s goal is to identify malicious programs for a single-user mobile
environment, not to create user policies. ProgramCutter and Passe help partition system
components to improve least privilege but do not create policies for privileged entities. Only
[25], [26] and [27] present metrics on over-privilege and under-privilege by comparing policies
to usage. Key issues with these works is that they assume roles are stable, not accounting
for change in user behavior over time, and use cross-validation for model evaluation which
is not appropriate for environments where temporal relationships should be considered. We
address these shortcomings using the rolling forecasting and sliding simulation methods
discussed in Sections 3.4.3.2 and 3.5.3, respectively. Finally, our work addresses the trade-off
between over- and under-privilege and the selection of different algorithms based on how an
organization values over- vs. under-privilege. A metric based on the F-Measure for scoring
over-privilege and under-privilege by comparing policies to usage, along with a naive policy-building
algorithm, was presented in [27]; we expand upon that metric and use the naive algorithm
presented in that work for comparison purposes.
The cloud environment is multi-user and multi-service, and it is high risk: errors in
privilege assignments can cause significant damage to an organization if exploited. With a
large number of services, unique privileges to each service, as well as federated identities and
identity delegation, the cloud also presents more complexity to security policy adminis-
trators than environments previously studied for policy creation such as mobile, desktop,
or applications. To quantify the scale of privilege complexity, we consider the size of the
privilege spaces for three environments: Android 7, IBM z/OS 1.13, and AWS. Android [28]
requires an application’s permissions to be specified in a manifest included with the appli-
cation with 128 possible privileges that can be granted. For IBM z/OS [29], we consider the
number of services derived from the different types of system resource classes; there are 213
resource classes and five permission states that can be granted to every class. The privilege
space of AWS is much larger however, with over 104 services and 2,823 unique privileges as
of August 2017 [30].
Our dataset for training and evaluation consists of 5.2M AWS CloudTrail audit events
representing one year of cloud audit data provided by a small Software As A Service (SaaS)
company. To better understand how much of the privilege space is used in our dataset,
statistics about privileged user behavior are shown in Table 3.1. This table separates
the metrics by the first month, last month, and total for one year of data. Users is the number
of active users during that time period. Unique Services Avg. is the average number of unique
services used by active users. Unique Actions Avg. is the average number of unique actions
exercised by active users, and ΣAction Avg. is the average of the total actions exercised by
active users. The standard deviation is also provided for the Unique Services, Unique Actions,
and ΣActions metrics to understand the variation between individual users. For example,
looking at both the Unique and ΣActions metrics, we observe that their standard deviation is
higher than the average for all time periods, indicating a high degree of variation between
how many actions users exercise.
The problem we address is that of automatically creating least privilege access control
policies in the cloud environment.
We refer to the problem formally as the Privilege Error Minimization Problem (PEMP)
and define it using the notation from the NIST definition of RBAC [31].
Additionally we define the following terms:
• OBP observation period, the time-period during which exercised permissions (UPE) are
observed and used for creating user-to-permission assignment UPA.
While both UPE and UPA are user-to-permission relations, UPE represents exercised
permissions but UPA represents all assignments. Using the preceding terms, we now define
the PEMP.
Definition 1. Privilege Error Minimization Problem (PEMP). Given a set of users USERS,
a set of all possible permissions PRMS, and a set of user-permissions exercised UPE, find
the set of user-permissions assignments UPA that minimizes the over-privilege and under-
privilege errors for a given operation period OPP.
3.4.2 Algorithm Overview
Now that we have defined the PEMP as being a prediction problem, we adapt existing
prediction algorithms to address it. We utilize two machine learning methods in this paper
to generate privilege policies from mining audit log data. First, we employ clustering to find
privileged entities which use similar permissions, making the problem analogous to that of
finding similar documents in a text corpus. After finding similar users, we generate policies
that combine the privileges used by clustered entities. The second machine learning method
we employ is classification. Using a set of user-to-privilege relations exercised during the
observation period, we train a classifier to learn which user-to-privilege relations should be
classified as grant and which should be denied. Once trained, we use the classifier to generate
policies for an operation period. More details on the application of these algorithms to
generate least privilege policies are discussed in Section 3.5.
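As a rough illustration of the clustering approach (the actual algorithms and parameters used in this work are described in Section 3.5), each user can be represented as a binary vector over the permissions exercised during the observation period, similar users can be clustered, and each user can then be granted the union of permissions used by its cluster. The sketch below uses scikit-learn's KMeans purely as a stand-in for the clustering step; the function and parameter names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_based_policies(upe, all_permissions, n_clusters=5):
    """Grant each user the union of permissions exercised by the users in its
    cluster. `upe` maps user -> set of permissions exercised during the
    observation period; `all_permissions` fixes the feature (column) order."""
    users = sorted(upe)
    perm_index = {p: i for i, p in enumerate(sorted(all_permissions))}
    X = np.zeros((len(users), len(perm_index)))
    for row, user in enumerate(users):
        for perm in upe[user]:
            X[row, perm_index[perm]] = 1.0
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    # The union of exercised permissions within a cluster becomes its policy.
    cluster_perms = {}
    for user, label in zip(users, labels):
        cluster_perms.setdefault(label, set()).update(upe[user])
    return {user: cluster_perms[label] for user, label in zip(users, labels)}
```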
We borrow techniques and terminology used in machine learning literature for assessing
the effectiveness of our algorithms in addressing the PEMP. Using a standard approach
for evaluating the effectiveness of a predictive model [32], we take a test dataset for which
we know the expected (target) predictions that the model should make, present it to a
trained model, record the actual predictions that it made, and compare them to the expected
predictions. We first present our method for scoring individual predictions, and then our
method for splitting up the dataset into multiple partitions.
Policy generation for a given operation period is a two-class classification problem where
every user-to-permission mapping in a generated policy falls into one of two possible classes:
grant or deny. By comparing the predicted privileges to the target privileges, we can cate-
gorize each prediction into one of four outcomes:
• True Positive (TP): a privilege that was granted in the predicted policy and exercised
during the OPP.
• True Negative (TN): a privilege that was denied in the predicted policy and not exercised
during the OPP.
• False Positive (FP): a privilege that was granted in the predicted policy but not exercised
during the OPP.
• False Negative (FN): a privilege that was denied in the predicted policy but attempted to
be exercised during the OPP.
Using the above outcomes we can then calculate precision, recall, and the F1 mea-
sure, a frequently used set of performance metrics in machine learning and information
retrieval [32]. Precision and recall are defined as follows [32]:

precision = TP / (TP + FP)    (3.1)

recall = TP / (TP + FN)    (3.2)
In terms of this problem domain, precision is the fraction of permissions accurately
granted by the predictor (T P ) over all permissions granted by the predictor (T P + F P ).
If there were no permissions granted by the predictor that went unused in the OPP, then
precision = 1. Thus a high precision value is an indicator of low over-privilege. Similarly,
recall is the fraction of permissions accurately granted by the predictor (T P ) over all permis-
sions exercised in the OPP (T P + F N ). If there were no permissions denied by the predictor
that should have been granted, then recall = 1. Thus a high recall value is an indicator of
low under-privilege.
Precision and recall can be collapsed into a single performance metric, the F1 measure,
which is the harmonic mean of precision and recall. For predictive assessment, it is often
preferable to use a harmonic mean as opposed to an arithmetic mean. Arithmetic means
are susceptible to large outliers which can dominate the performance metrics. The harmonic
mean, however, emphasizes the importance of smaller values and thus gives a more realistic
measure of model performance [32]. For example, the arithmetic mean when precision=0 and
recall=1 is 0.5, however the harmonic mean of those same values is 0.
The F1 measure is “balanced” because it gives equal weighting to precision and recall.
For our assessment we utilize a general form that allows for a variable weighting between
recall and precision (or, under-privilege and over-privilege), β. High β values increase the
importance of recall, while low β values increase the importance of precision. The weighted
measure Fβ is defined in Equation 3.3.

Fβ = (1 + β²) · (Precision · Recall) / ((β² · Precision) + Recall)    (3.3)
The β weighting is important because it is not reasonable to expect all potential users
of a policy generation tool to value over-privilege and under-privilege equally. Molloy et al.
identified equal weighting between over- and under-assignments as a problem in several pre-
vious works addressing the RMP [33], and preferred to weight more importance to reducing
over-privilege. It is also reasonable to expect that some organizations are willing to accept
more risk from over-privilege to minimize the cost of privileged entities not being able to
perform their duties due to under-privilege.
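To make the weighting concrete, the following short Python sketch (with hypothetical outcome counts) computes precision, recall, and the weighted Fβ measure of Equations 3.1–3.3; it is an illustration only, not the evaluation code used in this work.

```python
def f_beta(tp, fp, fn, beta):
    """Weighted F measure (Equation 3.3). beta > 1 emphasizes recall (penalizes
    under-privilege); beta < 1 emphasizes precision (penalizes over-privilege)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta ** 2) * (precision * recall) / (beta ** 2 * precision + recall)

# Hypothetical counts for a single operation period:
print(f_beta(tp=80, fp=20, fn=10, beta=1))       # balanced F1
print(f_beta(tp=80, fp=20, fn=10, beta=1 / 10))  # strongly favors low over-privilege
```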
Following the standard approach for evaluating model effectiveness described earlier,
we will compare predicted results to expected (target) results. Rather than using a single
operation period for our evaluation which may not be representative of the entire dataset,
we must partition the dataset into multiple training and test sets using a sampling method.
We then aggregate the results of evaluating these partitions to produce a single score for a
proposed solution.
For our scenario however, we observe that there is a temporal aspect to permissions
and there are interdependencies between the exercised actions, which impose specific restrictions
on how we should partition the dataset. For example, a resource such as a virtual
machine must be created before it can be used, modified or deleted. Methods such as hold-
out sampling and k-fold cross validation which randomly partition a dataset do not account
for interdependencies in the data and may not allow learning algorithms to observe these
dependent actions. Thus we use a sampling approach suited to scenarios like ours, which
considers a time dimension with interdependent data, referred to as “out-of-time sampling”;
it is a form of hold-out sampling which uses data from one time period to build a training
set and another period to build a test set [32]. The application of out-of-time sampling to
generate and score multiple training and test sets is sometimes known as “rolling forecasting
origin”, which is similar to cross-validation but the training set consists only of observations
that occurred prior to those in the test set [34]. Suppose k observations are required to
produce a reliable forecast. Then rolling forecasting origin works as follows [34].
1. Select the observation at time k + i for the test set, and use the observations at times
1, 2, ..., k + i − 1 to estimate the forecasting model. Compute the error on the forecast
for time k + i.
2. Repeat the above step for i = 1, 2, ..., T −k where T is the total number of observations.
Adapting the above method to our domain, we allow the training set/observation period
to be comprised of any set of dates before time k + i, and the test set/operation period is
specifically at time k + i. We define the step size i to be of one day, which is an adequate
amount of time to complete most tasks using related permissions. Also, when using an
automated solution to generate permission policies, it is reasonable to expect that new
solutions can be generated on at least a daily basis.
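A minimal sketch of this rolling evaluation loop is shown below; generate_policy and score are placeholders for a policy generation algorithm and the Fβ scoring of Section 3.4.3.1, and the observation period is simplified here to all days prior to the operation day.

```python
def rolling_forecast_evaluation(daily_logs, k, generate_policy, score):
    """daily_logs: audit events grouped per day, in chronological order.
    k: number of days of data required before the first forecast.
    Returns the mean score across all operation periods (Equation 3.4)."""
    scores = []
    for i in range(k, len(daily_logs)):
        observation_period = daily_logs[:i]   # training set: days before day i
        operation_period = daily_logs[i]      # test set: the single day i
        policy = generate_policy(observation_period)
        scores.append(score(policy, operation_period))
    return sum(scores) / len(scores)
```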
The measure of forecast accuracy in our scenario is the Fβ score for a given operation
period described in Section 3.4.3.1, where a perfect prediction with no over-privilege and no
under-privilege present would score a 1.0. We use a rolling mean to compute the accuracy
of a proposed solution across all operation periods. Thus our quality measure used for
assessing an automated solution to creating permission policies should maximize the average
Fβ measure across all operation periods:
(1 / (T − k)) · Σ_{i=1}^{T−k} Fβ(Precision_i, Recall_i)    (3.4)
3.5 Methodology
This section describes the algorithms and techniques we design to address the PEMP in
the cloud environment. We first present a naive algorithm used to establish a performance
baseline against which to compare our learning-based approaches.
While the naive algorithm merely uses a privileged entity’s observed privileges to build policies,
the learning-based approaches also account for the behavior of other users when generating
policies. Each of these methods is applied for a single operation period. The evaluation of
an algorithm across multiple operation periods is done using the method described in Section
3.4.3.2.
The naive approach shown in Algorithm 1 takes all privileges exercised during the obser-
vation period as input and combines them to form a privilege policy to be used during the
operation period. This seems a reasonable approach for a policy administrator to take if they
needed to implement a least privilege policy in an environment where all privileged entities
previously had unrestricted access to all permissions. By examining all previous access logs
or only the access logs up to a specific point in the past, they can discover all privileges
used by each privileged entity and thus expect this to be the set of privileges required for a
privileged entity to perform their duties. Although infrequently used privileges will not be
captured if they are outside of the observation period, policy generation algorithms can still
achieve good results without knowing the frequency with which these privileges are exercised
because infrequently used privileges will have little impact on the Fβ score, particularly for
low β values which value minimizing over-privilege. Furthermore, in a low β environment
it is likely that infrequently used privileges should be denied by default and granted by
exception instead of always being granted by a long-term policy.
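Algorithm 1 is not reproduced here, but the naive approach reduces to granting each privileged entity exactly the permissions it exercised during the observation period, as in this minimal Python sketch:

```python
from collections import defaultdict

def naive_policy(upe):
    """upe: iterable of (user, permission) pairs exercised during the OBP.
    Returns UPA: each user is granted exactly the permissions they exercised."""
    upa = defaultdict(set)
    for user, perm in upe:
        upa[user].add(perm)
    return dict(upa)
```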
Our unsupervised learning policy generation method (Algorithm 2) uses a clustering al-
gorithm to find clusters of similar privileged entities based on their permissions exercised.
By placing each permission exercised by an entity into a separate document and applying
clustering to the document corpus (lines 2-5), we have made the problem analogous to find-
ing similar text documents in a corpus. Once similar entities are grouped by clustering,
each group is assigned a shared role and granted the combined permissions of all entities
in that role (lines 6-14). Entities which do not belong to any cluster are granted only the
privileges they used during the observation period just as in the naive method (lines 15-19).
It is important to note that using this method of combining similar entities only grants
permissions additional to those used during the observation period. This is useful in
environments where minimizing under-privilege is more important than minimizing over-privilege.
There are several details of our application of clustering worth describing here. Each
document is converted to a feature vector for clustering using a Term Frequency-Inverse
Document Frequency (TF-IDF) vectorizer. TF-IDF is a common approach for finding similar
Algorithm 2: Unsupervised Policy Generator
Input: UPE, the set of user-permissions exercised during the observation period OBP.
Output: UPA, the mapping of user-to-permission assignments.
1  UPA, documents ← ∅;
2  for user, perm ∈ UPE do
3      documents_user ← documents_user ∪ perm;
4  end
5  clusters, outliers ← DBSCAN(documents);
6  for cluster ∈ clusters do
7      role ← ∅;
8      for user, document ∈ cluster do
9          for perm ∈ document do
10             role ← role ∪ perm;
11         end
12     end
13     UPA_user ← role for each user ∈ cluster;
14 end
15 for user, document ∈ outliers do
16     for perm ∈ document do
17         UPA_user ← UPA_user ∪ perm;
18     end
19 end
20 return UPA
documents in information retrieval [35]. The TF-IDF weighting has the advantage that
it preserves information about how often each permission is exercised by a user. Once
vectorization is complete, the specific clustering algorithm we use for finding similar users is
the DBSCAN algorithm of the scikit-learn library [36], an implementation of the algorithm
originally published in [37]. The DBSCAN algorithm has several advantages for our scenario,
primary among them being that we do not need to specify the expected number of clusters
ahead of time, unlike other popular clustering algorithms such as k-means. The performance
of DBSCAN also scales well with regard to the number of samples when compared to
other clustering algorithms [38]. There is one relevant hyper-parameter for DBSCAN which
we vary in our policy generation experiments: ε, the maximum distance between two
samples for them to be considered part of the same cluster. We explore three methods for
calculating ε: the mean distance between all points, the median distance between all points,
and the middle point between the minimum and maximum points in the vector space.
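A sketch of this vectorization and clustering step using scikit-learn is shown below; the eps value is assumed to have been computed by one of the three methods above, and min_samples=2 is an illustrative choice not specified in the text.

```python
from collections import defaultdict
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_policy(upe, eps):
    """upe: iterable of (user, permission) pairs exercised during the OBP.
    eps: DBSCAN distance threshold (assumed precomputed).
    Returns UPA as a dict mapping each user to a set of granted permissions."""
    docs = defaultdict(list)
    for user, perm in upe:
        docs[user].append(perm)
    users = list(docs)
    corpus = [" ".join(docs[u]) for u in users]

    # One document per user, vectorized with TF-IDF; tokens are whole permission strings.
    vectors = TfidfVectorizer(token_pattern=r"\S+").fit_transform(corpus)
    labels = DBSCAN(eps=eps, min_samples=2, metric="cosine").fit_predict(vectors)

    # Users in a cluster share the union of that cluster's permissions;
    # outliers (label -1) keep only their own exercised permissions.
    cluster_perms = defaultdict(set)
    for user, label in zip(users, labels):
        if label != -1:
            cluster_perms[label].update(docs[user])
    return {user: (cluster_perms[label] if label != -1 else set(docs[user]))
            for user, label in zip(users, labels)}
```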
Algorithm 3: Supervised Policy Generator
Input: UPE, User-Permissions Exercised: the set of user-permissions exercised during the observation period OBP.
Input: PRMS, the set of possible permissions.
Input: TSP, Training Set Parameters: mapping of parameters used to build the training set.
Input: CAP, Classifier Algorithm Parameters: mapping of parameters used to configure the classifier.
Input: PGP, Policy Generation Parameters: mapping of parameters used to build the predicted policy from a trained classifier.
Output: UPA, mapping storing the roles generated by each of the classifier instances.
1  UPA ← ∅;
2  for tParams ∈ permute(TSP) do
3      featureVector, labelSet ← createTrainingSet(tParams, UPE);
4      for clfParams ∈ permute(CAP) do
5          clf ← decisionTree(clfParams);
6          clf ← clf.train(featureVector, labelSet);
7          for pParams ∈ permute(PGP) do
8              roles ← ∅;
9              possiblePrivs ← createPossiblePrivs(pParams, PRMS);
10             for user, perm ∈ possiblePrivs do
11                 if clf.predict(user, perm) == 'granted' then
12                     roles_user ← roles_user ∪ perm;
13                 end
14             end
15             UPA_{tParams, clfParams, pParams} ← roles;
16         end
17     end
18 end
19 return UPA
3.5.3.1 Classification Algorithm and Feature Selection
We use a decision tree (DT) classification algorithm for supervised learning, also from
the scikit-learn library [36]. The algorithm implemented in the library is an optimized
implementation of the CART algorithm published in [39]. The advantages of the
decision tree algorithm used are speed and the ability to display the set of rules learned during
classification. It was also the top performing classification algorithm in our preliminary
comparison of 15 different classification algorithms in the scikit-learn library.
We utilize five features available directly from the audit log data for training: the time
at which a permission was exercised, the unique identifier of the executing entity, the type
of entity (user or delegated role), the service to which the action belonged, and the type
of action performed. Instead of using the absolute time of an action, we derive features
capturing whether it was exercised on a weekend or weekday, as well as the specific day
of the week. These are all bottom-up data attributes available directly from the access
logs. Other top-down information such as job role or organization department was not
available with our dataset (nor does it exist in many small organizations), but could easily
be integrated with the exercised privilege information if available.
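As an illustration of this feature derivation (the column names below are assumptions, not the CloudTrail schema), the training features could be built with pandas as follows:

```python
import pandas as pd

def build_features(events):
    """events: DataFrame of audit log entries with assumed columns
    'timestamp', 'entity_id', 'entity_type', 'service', and 'action'."""
    df = events.copy()
    ts = pd.to_datetime(df["timestamp"])
    # Replace the absolute time with coarser, recurring temporal features.
    df["day_of_week"] = ts.dt.dayofweek                  # 0 = Monday ... 6 = Sunday
    df["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)
    df = df.drop(columns=["timestamp"])
    # One-hot encode the categorical attributes for the decision tree.
    return pd.get_dummies(df, columns=["entity_id", "entity_type", "service", "action"])
```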
Several hyper-parameters must be selected for our supervised learning approach. These
include parameters for the decision tree classifier, the construction of the training set, and the
policy construction from the trained classifier. Our method for selecting optimized hyper-
parameters uses only out-of-sample data and is an adaptation of the “sliding simulation”
method presented in [40].
The sliding simulation method of [40] is based on three premises. First, a model should
be selected based on how well it predicts out-of-sample actual data, not on how well it fits
historical data. Second, a model is selected from among many candidates run in parallel on
the out-of-sample data. Third, models are optimized for each forecast horizon separately,
making it possible to use different models and optimize parameters within models. The
method operates by running several prediction models in parallel across a sliding window of
data, computing the accuracy of each model for a given period and selecting the model(s)
with the best score to be used in creating the forecast for the next period. Using this
technique, the author in [40] showed that it outperformed the best method of a previous
competition in statistical forecasting (the M -Competition [41]) by a large margin.
As in the sliding simulation method, we run many permutations of parameters in parallel
on out-of-sample data and use the best performing parameters to create a future prediction.
Modifications were implemented to adapt sliding simulation to our problem domain. Slid-
ing simulation originally dealt with making numerical predictions and measuring the error
between a predicted and actual value. In our scenario a security policy is the prediction and
we use the Fβ score presented in Section 3.4.3.1 as our scoring criteria. While [40] used all
observation points before the forecast period, the most recent exercised permissions are most
relevant to predicting future permissions; training a classifier with older and less relevant
permissions had a negative effect on prediction accuracy.
We use two methods of decomposing the time series data which we term filter decomposi-
tion and filler decomposition. For the filter method, the days which do not fit into the chosen
model are filtered out of each observation period in the sliding window evaluation before the
data are used by the algorithms. With the filler method, the end date of the sliding window
evaluation is used as a starting point and the observation period is created by enlarging the
window by moving the start date backward until the observation period is “filled” with only
data matching the chosen model. Consider a sliding window evaluation with a window size
of 10 days using these two decomposition methods. For the filter method, the number of
days fitting the weekday model will vary from 6 to 8, and the number of days fitting the
weekend model will vary from 2 to 4. For the filler method, the number of days fitting a
model will always be 10 days when the sliding window size is 10 days.
The decomposition method used for evaluation is chosen based on the β value we wish
to optimize for. For algorithms seeking to score well for β > 1, increasing the window
size results in better scores, and the filter approach is used where the variations in the
observation dataset size are smoothed out across larger windows. For experiments which
seek to score well for β < 1, smaller window sizes score more favorably but the variable
number of matching days which fit within a chosen time period can have undesirable effects
on the results when using small window sizes. Thus the filler method is used in experiments
for β < 1, which gives a consistent number of days of data in each window.
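The two decomposition methods can be sketched as follows for a weekday model; the window is anchored at the evaluation end date, and the helper names are ours rather than the implementation's.

```python
from datetime import date, timedelta

def filter_window(end_date, window_days, matches_model):
    """Filter decomposition: take the fixed-size window ending at end_date and
    keep only the days that fit the chosen model (e.g. weekdays)."""
    days = [end_date - timedelta(d) for d in range(1, window_days + 1)]
    return [d for d in days if matches_model(d)]

def filler_window(end_date, window_days, matches_model):
    """Filler decomposition: walk backwards from end_date until the window is
    'filled' with exactly window_days days matching the chosen model."""
    days, cursor = [], end_date
    while len(days) < window_days:
        cursor -= timedelta(1)
        if matches_model(cursor):
            days.append(cursor)
    return days

is_weekday = lambda d: d.weekday() < 5  # Monday..Friday
# filter_window(date(2017, 8, 15), 10, is_weekday) -> 6 to 8 weekdays
# filler_window(date(2017, 8, 15), 10, is_weekday) -> exactly 10 weekdays
```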
3.6 Results
This section analyzes the performance of our algorithms for generating security policies.
We first examine the results using the complete model and then show how decomposition
and the use of multiple decomposed models can improve on those results.
The Receiver Operating Characteristic (ROC) curve is a graphic commonly used to chart
the performance of binary classifiers. It charts the trade-off between the True Positive Rate
(Recall) and the False Positive Rate. Figure 3.1 presents this trade-off for each of our algorithms.
[Figure 3.1: True Positive Rate (Recall) vs. False Positive Rate (log scale, 0.00001 to 1) for DBSCAN-Average, DBSCAN-Median, DBSCAN-Middle, Naïve, and DT-SOD-Recall.]
[Figure 3.2: Fβ score vs. β (1/100 to 100) for DBSCAN-Average, DBSCAN-Median, DBSCAN-Middle, Naïve, DT-SOD, DT-SND, and AllowAll.]
The supervised algorithms score significantly better than the naive algorithm as β decreases, with the performance gap
widening until β < 1/30, where the scores of the supervised and naive algorithms cease to
improve as β decreases. The unsupervised algorithms score relatively poorly for β < 1.
The trends in these charts highlight the strengths and weaknesses of each algo-
rithm. By granting users the privileges used by similar users, the unsupervised algorithms
predict privileges a user may use in the future. But there is no mechanism for the unsuper-
vised learning algorithm to learn which possible privilege grants may result in over-privilege
and restrict these privileges accordingly. The supervised algorithms attempt to learn any
patterns in the past data and use these to predict future privilege assignments. While priv-
ileges used previously are likely to be used again and rarely used privileges can be denied
with some degree of confidence, it is difficult to predict the usage of a future privilege that
has never been used before using only past patterns.
Figure 3.1 and Figure 3.2 show the scores of algorithms regardless of the size of the
observation period. We next examine the performance of these algorithms for fixed β values
as the observation period size varies. We chose values β = 80 and β = 1/10 because these
seemed the most interesting in terms of the trade-offs between the various methods. The
performance of the unsupervised and naive algorithms for β = 80 are shown in Figure 3.3.
The choice of ǫ as the threshold for determining which users are alike presents interesting
trade-offs between window size and score. In general, using the median for calculating ǫ
consistently provides slightly better scores than the naive approach across all window sizes
with the scores for both the unsupervised algorithm (with the middle method) and naive
algorithm peaking at 115 days. Using the average and middle methods for calculating ǫ both
provide better scores for observation periods < 40 days, but their scores level off there and
begin to gradually decrease after peaking at 59 days for the average method and 68 days for
the median method.
The performance of the supervised and naive algorithms for β = 1/10 is shown in
Figure 3.4. The naive algorithm achieves its best performance with an observation period
[Figure 3.3: Score vs. observation period size in days (0–120) for β = 80; DBSCAN-Average, DBSCAN-Median, DBSCAN-Middle, and Naïve.]
[Figure 3.4: Score vs. observation period size in days (1–7) for β = 1/10; Naïve, DT-SOD-B1/10, and DT-SND-B1/10.]
In this section we present the results after decomposing the dataset into separate models
for weekday and weekend data using the decomposition methods discussed previously in
Section 3.5.4.
The performance of the complete and decomposed models for β values >= 1 for both the
naive algorithm and the unsupervised algorithm (with the average method for calculating ε)
are shown in Figure 3.5. For both algorithms, the weekday model performance is superior
to the complete model for β values >= 1.
[Figure 3.5: Score vs. β (1 to 100) for the complete, weekday, and weekend models of the Naïve and DBSCAN-Average algorithms.]
The trend previously illustrated in Figure 3.2 continues here, as similar users will exercise similar privileges in a cluster if one is identified.
[Figure 3.6: Score vs. β (1/100 to 1) for the complete, weekday, and weekend models of the Naïve and DT-SND algorithms.]
The performance of the complete and decomposed models for β values <= 1 for both the
naive algorithm and the supervised algorithm (using the SND labeling method) are shown
in Figure 3.6. As with the unsupervised algorithm and β values >= 1, the weekday model
outperforms the complete model while the weekend model under-performs the complete
model for β values <= 1 as well.
models for the supervised algorithm is much larger than in previously examined experiments.
With the inconsistent activity of the weekend actions removed, the supervised algorithm is
better able to identify and leverage patterns to create security policies. The performance
of the supervised algorithm for the weekend model decreased substantially compared to the
complete model however. For β = 1/30, the supervised weekend model scored 39% lower
than the complete model, while the naive weekend model scored only 19% lower than its
complete model. The reasons for the lower weekend model scores for the supervised algorithm
are the same as the lower weekend model scores for the unsupervised algorithm: there is less
data to work with and higher variability in that data.
Section 3.6.2 illustrated how decomposition improved scoring for the weekday model, but
we are interested in finding the highest possible score across all days in the available dataset.
To improve the overall score, we combine two previously examined models, using
one model and algorithm for the weekday policies and another model and algorithm for the
weekend policies; we refer to this as a recomposed model. To build the recomposed
model, we use policies from the weekday model when evaluating weekdays, but as the pre-
viously examined results have shown, the weekend models performed fairly poorly, so we
instead use policies generated by the complete model when evaluating weekends.
The performance of the complete and recomposed models for β values >= 1 for both
the naive algorithm and the unsupervised algorithm (with the average method used for
calculating ε) are shown in Figure 3.7. For the unsupervised algorithm, the recomposed
model outscores the complete model for β values >= 5, and outscores the naive algorithm
for both the complete and recomposed models for β >= 50, with the performance gap
increasing after that as β increases. For the naive algorithm however, the improved scores
of the weekday model are not enough to offset the poorer scores of the complete model for
the weekend days, thus the recomposed model using the naive algorithm scores almost the
same as the complete model for β > 5. The scores for the highest β value tested are 0.9379
for the recomposed model with the unsupervised algorithm and 0.9149 for the recomposed
model with the naive algorithm, an improvement of 2.5% over an already fairly high score.
[Figure 3.7: Score vs. β (1 to 100) for the complete and recomposed (weekday/complete) models of the Naïve and DBSCAN-Average algorithms.]
[Figure 3.8: Score vs. β (1/100 to 1) for the complete and recomposed (weekday/complete) models of the Naïve and DT-SND algorithms.]
Creating security policies is inherently an optimization problem that must balance be-
tween minimizing over-privilege and minimizing under-privilege. How much one values
achieving one of these objectives vs. the other can be expressed using the β value as described
in Section 3.4.3. The results of this section demonstrate the effectiveness of algorithms and
decomposition methods that can be used to create better security policies for a
cloud environment, with “better” being expressed in terms of the Fβ score.
We also presented the results of using decomposition methods to decompose the dataset
into weekday and weekend models and then use the best aspects of the weekday and complete
models for scoring across the complete dataset time period. Not all audit log datasets will
exhibit similar behavior that benefits from such decomposition, but it is reasonable to expect
many datasets consisting of audit log events generated by human privileged entities working
a five-day work week will. Regardless of the decomposition method used, we find that the
unsupervised algorithm performs more favorably as β increases due to its ability to
use information from similar users to predict the future use of privileges. The unsupervised
algorithm does not have a mechanism to deny privileges however, so its scores are relatively
low for small β values. Conversely, the supervised algorithm performs more favorably
as β decreases but poorly for large β values. The supervised algorithm is able to use
the recurring patterns in data to score well for restricting privileges, but scores poorly at
predicting possible new privileges that privileged entities may use. The naive approach
performs well only for values near β = 1, representing its favorability for environments
which value balancing over- and under-privilege nearly equally but it is outperformed by
the other algorithms as the β value increases or decreases away from β = 1. The key
takeaway from these results is that how an organization values over-privilege vs.
under-privilege will determine which algorithm is best suited for generating that
organization’s security policies; none of the three examined algorithms is clearly
superior to the others for all likely scenarios.
3.7 Summary
This paper addressed issues related to automatically creating least privilege policies in
the cloud environment. We defined the Privilege Error Minimization Problem (PEMP)
to directly address the goals of completeness and security when creating privilege policies,
and introduced a weighted scoring mechanism to evaluate a policy against these goals. We
adapted techniques from statistical forecasting and machine learning to train and evalu-
ate a supervised and an unsupervised learning algorithm for automated policy generation.
The results of our analysis show that the supervised algorithm performed well for reducing
over-privilege while the unsupervised algorithm performed well for reducing under-privilege
compared to a naive approach. These results demonstrate the potential to apply such au-
tomated methods to create more secure roles based on an organization’s acceptable level of
risk in accepting over-privilege vs. its desire to minimize the effort to correct under-privilege.
This paper suggests many possibilities for future research in automated least privilege
approaches. The policy generation approaches described in this paper are based on features
directly available in the audit logs such as the service name, user name, and privilege ex-
ercised. For future research we would consider additional features, such as properties of the
requesting entity and the resources being operated on (for example, a user’s job title and organiza-
tional unit, or the subnet(s) within which a virtual resource operates). Combining the ability
of the unsupervised algorithm (to predict the use of future privileges based on clusters of
similar users) with the ability of the supervised algorithm (to restrict privileges which are
unlikely to be used in the future) may also improve scoring.
CHAPTER 4
MINING LEAST PRIVILEGE ATTRIBUTE BASED ACCESS CONTROL POLICIES
4.1 Introduction
Access control is a key component of all secure computing systems but implementing
effective and secure access control policies is a significant challenge. Access control policies
are predictions about which privileged entities will exercise specific operations upon specific
objects under various conditions and accurately predicting the future is always difficult. Too
much over-privilege increases the risk of damage to the system via compromised credentials,
insider threats, and accidental misuse. Policies that are under-privileged prevent users from
being able to perform their duties. Both of these conflicting goals are expressed by the
principle of least privilege which requires every privileged entity of a system to operate using
the minimal set of privileges necessary to complete its job [20]. The principle of least privilege
is a fundamental access control principle in information security [1] and is a requirement in
security compliance standards such as the Payment Card Industry Data Security Standard
(PCI-DSS), Health Insurance Portability and Accountability Act (HIPAA) and ISO 17799
Code of Practice for Information Security Management [21].
Many access control models have been introduced to address the challenges of creating
and administrating secure and effective access control policies, with different approaches
attempting to balance between the competing goals of ease of use, granularity, flexibility,
the ability to leverage aspects unique to a specific domain, and scalability. Access control
models are constantly evolving, but Attribute Based Access Control (ABAC) continues to
gain in popularity as the solution to many access control use cases because of its flexibility,
usability, and ability to support information sharing across disparate organizations. ABAC
allows security policies to be created based on the attributes of the user, operation, and
environment at the time of an access request.
The flexibility of ABAC policies is both a major strength and a hindrance. With the
ability to create policies based on many attributes, administrators face difficult questions:
what constitutes a “good” ABAC policy, how should such policies be created, and how can
they be validated? Additionally, the ABAC privilege space of a system can be extremely large, so how can
administrators determine which attributes are most relevant in their systems? We address
these issues by taking a rule mining approach to create ABAC policies from audit logs. Rule
mining methods are a natural fit for creating ABAC policies because security policies are
a set of rules regarding the actions that users can perform upon resources. By identifying
common patterns of usage between the attributes and values from audit logs, rules can be
created based on an organization’s acceptable level of risk regarding under- vs. over-privilege.
By using out-of-sample validation to evaluate the effectiveness of the generated policies on a
dataset of 4.7M Amazon Web Service (AWS) log events, our experiments show that our rule
mining based approach is effective at generating policies which minimize the instances of
under-privilege (which allows users to perform their necessary duties), while also minimizing
over-privilege (which reduces security risks to the system).
We address the problem of creating least privilege ABAC policies using rule mining tech-
niques in this research through the following contributions: 1) a definition for the ABAC
Privilege Error Minimization Problem (P EM PABAC ) which addresses balancing between
under- and over-privilege errors in security policies, 2) an algorithm for automatically gen-
erating least privilege ABAC policies from mining audit logs, 3) an algorithm for scoring
ABAC policies using out-of-sample validation, 4) feature selection, scalability, and perfor-
mance optimization methods for processing large ABAC privilege spaces, 5) a quantitative
analysis of the performance of our mining algorithm using a real-world dataset consisting
of over 4.7M audit log entries, and 6) a performance comparison of automatically generated
ABAC policies created by our mining algorithm with automatically generated role based
policies.
The rest of this paper is organized as follows. Section 4.2 provides background information
on the ABAC model and rule mining methods. Section 4.3 reviews related work specific
to mining access control policies. Section 4.6 formally defines the ABAC version of the
privilege error minimization problem of mining ABAC policies with minimal under- and over-
privilege assignment errors and defines metrics for evaluating policies. Section 4.7 details
specific algorithms and methods used in our approach for addressing the problem defined
in Section 4.6. Section 4.8 analyzes the results of applying our algorithms to a real-world
dataset. Section 4.9 concludes and discusses potential future work.
4.2 Background
4.2.1 Attribute Based Access Control (ABAC)
4.2.1.1 ABAC Definition
NIST defines ABAC as “An access control method where subject requests to perform op-
erations on objects are granted or denied based on assigned attributes of the subject, assigned
attributes of the object, environment conditions, and a set of policies that are specified in
terms of those attributes and conditions” [42]. Attributes are any property of the subjects,
objects, and environment encoded as a name:value pair. Subjects may be a person or non-
person entity (such as an autonomous service), objects are system resources, operations are
functions executed upon objects at the request of subjects and environment conditions are
characteristics of the context in which access requests occur and are independent of subjects
and objects [42].
By using identity federation and basing access decisions on policies using an abstracted
common set of attributes, decisions can be externalized with policies established across orga-
nizational boundaries [43]. Because of these characteristics, the Federal Identity, Credential,
and Access Management (FICAM) Roadmap 2.0 called out ABAC as a recommended access
control model for promoting information sharing between diverse and disparate organiza-
tions [42].
The Role Based Access Control (RBAC) model has been the de-facto access control
standard for industry and academia for more than two decades [44]. Using RBAC, admin-
istrators identify privileges needed for common job functions, create roles for each function
and assign users to their appropriate roles for performing their duties. This simplifies the
administrators’ task compared to DAC and provides more granularity than MAC.
However, as access control needs have become more complex and applied to more di-
verse domains, organizations have found that RBAC does not provide sufficient granularity,
becomes too difficult to manage, or does not support their information sharing needs. Orga-
nizations facing these challenges may meet them using an ABAC based system. Consider the
case of an administrator that wishes to restrict operations needed for performing a database
backup to a specific maintenance window timeframe and a specific location or IP address
range. Such constraints can be easily expressed using ABAC attributes, but cannot be ex-
pressed using only the user, operation, and object semantics of the RBAC model. Another
common problem with RBAC is “Role Explosion”, where the need to define and assign users
many roles to access diverse sets of different applications within an organization makes main-
tenance of the many roles unmanageable. ABAC is able to address this problem by defining
policies based on user attributes (for example their job title, supervisor, or skill set in an
HR database) so that access control decisions are made according to attributes of the user
at the time of the access request.
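As an illustration only (the attribute names and rule structure below are hypothetical and not tied to any particular ABAC policy language), the database-backup constraint described above could be expressed as a predicate over subject, operation, and environment attributes:

```python
from ipaddress import ip_address, ip_network

# Hypothetical ABAC rule for the database-backup example; attribute names
# and operation identifiers are illustrative assumptions.
def backup_rule(subject, operation, environment):
    return (
        subject.get("role") == "dba"
        and operation in {"db:StartBackup", "db:ExportSnapshot"}
        and 2 <= environment["hour_utc"] < 4                        # maintenance window
        and ip_address(environment["source_ip"]) in ip_network("10.0.5.0/24")
    )

# Example request evaluation:
print(backup_rule({"role": "dba"}, "db:StartBackup",
                  {"hour_utc": 3, "source_ip": "10.0.5.17"}))       # True
```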
4.2.2 Rule Mining Methods
Frequent itemset mining and association rule mining are two popular rule mining methods
for identifying patterns in commercial databases [45] with applications in many diverse fields.
Frequent itemset mining is the first step in association rule mining and is a deterministic
method that identifies common patterns in a database of transactions. The frequent itemset
problem is defined as follows: given a transaction database DB and a minimum support
threshold ε, find the complete set of frequent patterns in the database. The set of items is
I = {a1, ..., an} and a transaction database is DB = ⟨T1, ..., Tm⟩, where Ti (i ∈ [1...m]) is a
transaction which contains a set of items in I. The support of a pattern A (where A is a set
of items) is the fraction of transactions containing A in DB:

support(A) = |{Ti ∈ DB : A ⊆ Ti}| / |DB|

A pattern is frequent if A's support is ≥ ε (the minimum support threshold) [46].
Association rule mining uses itemsets identified by a frequent itemset mining algorithm
to identify rules of the form X ⇒ Y where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The first itemset
X is the “antecedent” and the second itemset Y is the “consequent”. The confidence of
a rule X ⇒ Y is the proportion of the transactions that contain X which also contain Y:

confidence(X ⇒ Y) = support(X ∪ Y) / support(X)

Given a transaction database DB, minimum support
threshold ε, and minimum confidence c, the association rule mining problem is to find all of
the rules in DB that have support ≥ ε and confidence ≥ c [47].
The output of frequent itemset mining is many subsets of items that occurred within the
transaction database DB, while the output of association rule mining is two subsets (X ⇒ Y)
implying the probability that Y occurs in the transaction database given X. In the context
of creating security policies, there is a clear translation of frequent itemsets into ABAC rules.
Just as frequent itemsets state whether a pattern occurred or not (with a given support ≥ ǫ),
security policies must make a binary decision about whether a request should be allowed or
denied.
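As a toy illustration of these definitions (the transactions below are hypothetical log-derived itemsets, not drawn from our dataset):

```python
# Toy transaction database: each transaction is the set of attribute:value
# items observed in one hypothetical audit log event.
DB = [
    {"service:s3", "action:GetObject", "user:alice"},
    {"service:s3", "action:GetObject", "user:bob"},
    {"service:s3", "action:PutObject", "user:alice"},
    {"service:ec2", "action:StartInstances", "user:carol"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in DB) / len(DB)

def confidence(antecedent, consequent):
    """Proportion of transactions containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"service:s3"}))                            # 0.75
print(confidence({"service:s3"}, {"action:GetObject"}))   # 2/3
```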
4.3 Related Work
We group related work into two categories: those that deal with generating RBAC least
privilege policies, and those that address the problems of modifying existing ABAC policies
or creating ABAC policies of minimal size. To the best of our knowledge, our work is the
first to address the problem of automatically creating least privilege ABAC policies.
We first consider the set of related works which generate RBAC least privilege poli-
cies from audit logs. In [48], the authors formally define the Privilege Error Minimization
Problem (PEMP) which seeks to minimize the under-privilege and over-privilege assignment
errors of a policy put into operation. Naive, unsupervised learning, and supervised learning
algorithms are designed to mine RBAC policies using attribute information from audit logs.
Policy evaluation was performed by using out-of-sample validation over discretized time pe-
riods. Our work uses a similar evaluation method but designs a rule mining algorithm to
generate ABAC policies. With the ability to use attributes in mined policies (vs. user, oper-
ation, and resource ids only in RBAC), we are able to generate policies that simultaneously
reduce both under- and over-privilege when compared to RBAC policies in [48].
Another important work in generating least privilege policies is [25] which used Latent
Dirichlet Allocation (LDA) to create least-privilege RBAC policies from source code version
control usage logs. This method also used user attribute information in the mining process
although the resulting policies were RBAC policies. In [25], the authors introduced the
λ−Distance metric for evaluating candidate rules, which added the total number of under-
assignments to the total number of over-assignments with λ acting as a weighting factor on
the over-assignments to specify how much the metric values over-privilege vs. under-privilege
for a particular application.
Because the under- and over-assignments in λ−Distance are not normalized before being
added, it is easy for one side to dominate the equation. Extreme changes in λ may be needed
to trade off between under- and over-privilege, or slight changes to λ may cause extreme
changes in the resulting policies depending on the sizes of the log entries and privilege space.
This makes it difficult for an administrator to choose a λ value which accurately captures
their organization’s desired balance between under- and over-privilege.
One early work on applying association rule mining to ABAC policies was [49], which
used the Apriori algorithm [50] to detect statistical patterns from access logs of a set of
lab doors in a research lab. The dataset consisted of 25 physical doors and 29 users who
used a smart-phone application and Bluetooth to open the doors. The authors used the
output of the mining algorithm to identify policy misconfigurations by comparing mined
rules with existing rules. The performance of the algorithm was measured in terms of the
trade-off between success in detecting and guiding the repair of misconfigurations vs. the
inconvenience to users of suggesting incorrect modifications to policies. The dataset used
in [49] was “somewhat small” as the authors noted, leaving questions as to its scalability in
terms of users and attributes, whereas we use a much larger dataset in terms of the number of
events and attributes.
Another work, [51] presents a tool named Rhapsody which builds upon Apriori-SD [52],
a version of the Apriori algorithm modified for subgroup discovery. This work is similar to
our own in that it also seeks to create ABAC policies of minimal over-privilege by mining
logs however, it does not provide a weighting method for balancing between under-privilege
and over-privilege, nor does it consider large and complex privilege spaces. Instead, Rhap-
sody uses a simpler model of attributes of Users and Permissions only instead of the Users,
Operations, Resources and Environment attributes we use. Rhapsody uses a metric termed
reliability to quantify the confidence of a rule and all its significant refinements to assist in
simplifying and reducing over-privilege of policies. While Rhapsody is designed to operate
on “sparse” audit logs where only a small amount (≤ 10%) of all possible log entries are
likely to occur in the mined logs, our work is designed to operate on logs several orders of
magnitude more sparse than those of Rhapsody using optimization techniques described in
Section 4.7.3. One important limitation of Rhapsody is that run-time grows exponentially
with the maximum number of rules a request may satisfy, limiting the number of attributes that
can be considered to “less than 20” [51], which would prevent a direct comparison using our
dataset of over 1,700 attributes. We also employ different metrics and scoring methodology
for evaluating policies compared to Rhapsody. The authors of [51] use the F-score metric
which was suitable for RBAC policy evaluation in our previous works, but which we found to be
too dominated by the Precision component when scoring ABAC policies, so we have cho-
sen to evaluate policies in terms of True Positive Rate and False Positive Rate separately.
Furthermore, we use a sliding window approach to evaluate policies over time which retains
their temporal dependencies vs. the random sampling cross-validation approach used in [51].
In [53], the authors used association rule mining to mutate existing ABAC policies as
a moving target defense against attackers who could compromise values of attribute stores
(with stores possibly distributed across multiple organizations). By expanding an existing
policy with new rules that use highly correlated attributes identified by using association rule
mining techniques on audit logs, this method provides additional protection in the event that
attribute values used by the original policy rules are compromised. While [53] also used rule
mining of audit logs, it did not create new policies nor did it aim to achieve least privilege
policies. Experimental results dealt with identifying correlations between attributes but the
analysis of the security of the results was qualitative so there were no metrics of goodness
similar to ours to use as a comparison between [53] and this work.
A few papers have been published to address the ABAC Mining Problem which deals
with finding ABAC policies of minimal size given a set of authorizations or audit log entries.
The ABAC Mining Problem was addressed by Xu & Stoller in [54], then formally defined by
another group of researchers in [55]. The metric for evaluating the minimal size of ABAC
policies in these works is Weighted Structural Complexity (WSC), which was introduced in
[56] to measure the size of RBAC policies and adapted to ABAC policies in [57].
In [54], Xu & Stoller presented an algorithm for mining ABAC policies from operational
logs. Their algorithm attempts to create policies that cover all the entries found in an audit
log while also minimizing the number of over-assignments and the WSC of the policy through
a process of merging and simplifying candidate rules. The authors defined Qrul (Equation
4.1), a quality metric for evaluating candidate rules during the mining process. In this quality
metric, |[[p]]| is the number of user-permissions in the possible permission universe covered by
a candidate rule p. |[[p]] ∩ UP| represents the number of user-permissions in the logs covered
by p, but not covered by existing rules in the policy. WSC(p) represents the WSC score of
rule p. The number of over-assignments granted by the rule is |[[p]] \ UP(L)|, where L is the
operation log. Balancing between the number of over-assignments produced by a rule p and
the number of log entries covered by p is achieved by varying the over-assignment weight,
ω0′.
Both [54] and [55] mine rules and calculate coverage based on user-permission tuples,
where a tuple ⟨u, o, r⟩ contains a user, operation, and resource only, instead of considering
all of the valid attribute combinations in the privilege space. This reduces the computational
complexity of mining and evaluating rules, but unfortunately presents a problem for accurately
evaluating ABAC policies because such a tuple may be both allowed and denied unless
considering all the attributes of the user, operation, and resource at the time of the user
request. The authors of [55] identify and address this problem by denying all instances
of a tuple if any single instance of that tuple is denied. This significantly reduces the
granularity and flexibility advantages of the ABAC model. This issue is further complicated
when evaluating policies over time because attribute values may change. To address these
problems, we base our metrics on the entire ABAC privilege space of valid attribute:value
pairs instead of the individual users, operations, and resources.
Another key difference between our work and all previous works cited in ABAC mining
is the evaluation of policies for least-privilege over time. None of the previous works on
ABAC policy mining captured the performance of mined policies in terms of under-privilege
vs. over-privilege when put into operation, which we contend is the most important measure
of a security policy. We use out-of-sample validation on a real world dataset to evaluate
the under-privilege and over-privilege rates of policies over time using a sliding window of
observation and operation periods, a method originally described in [48]. While minimizing
complexity (evaluated by WSC) is desirable in that it makes policies easier to maintain by
administrators, we see it as less important than least privilege performance over time. This
is especially true when using automated methods to build policies where less administrator
involvement is necessary. Methods for minimizing ABAC policy complexity are complemen-
tary to our work as once least privilege policies are identified, then methods for minimizing
policy complexity can be applied.
4.6 Problem Definition and Metrics
The problem we address in this paper is minimizing privilege assignment errors in ABAC
policies. Access control can be viewed as a prediction problem. The statements that comprise
a policy are predictions about which entities should be granted privileges to perform specific
operations upon the specific resources necessary to perform their jobs. The goal of this work
is to automatically generate policies that are accurate access control predictions. There have
been many access control related papers with similar but not entirely the same goals. To
help clarify the specific problem this paper addresses we formally define it as the ABAC
Privilege Error Minimization Problem (P EM PABAC ) in this section. We also define specific
metrics to be used in evaluating the performance of proposed solutions (in the form of ABAC
policies).
Our problem definition is based on the Privilege Error Minimization Problem (PEMP)
originally defined in [48]. The PEMP defined the problem of creating least privilege RBAC
policies which consisted of users, operations, and objects. Like the original PEMP, our
problem seeks to minimize the under- and over-privilege assignment errors in policies and uses
the notions of observation and operation periods for evaluation. However, users, operations,
and resources are only some of the attributes available when creating ABAC policies, so we
adapt the problem definition to the ABAC privilege space.
The size of an ABAC privilege space is determined by the attributes and values of valid
ABAC policies. A is the set of valid attributes which can be used in policies. As in other
ABAC mining works [49, 53, 54], we assume all attributes and values present in the logs
can also be used in building policies. Each individual attribute ai ∈ A has a set of atomic
values Vi which are valid for that attribute. All values for an attribute are the attribute’s
range Range(ai ) = Vi . The Cartesian product of all possible attribute:value combinations is
ξ = V1 ×...×Vn = {(v1 , ..., vn )|vi ∈ Vi for every i ∈ {1, ..., n}}. However, some attribute:value
pairs are not valid when present in combination with other attribute:value pairs because of
dependencies between them. For example, some operations are only valid on certain resource
types, so combinations including both operation:DeleteUser and resourceType:File are not
valid. The valid privilege universe ξ′ is the set of all possible attribute:value combinations
when considering the dependency relationships between all attributes and values.
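A toy illustration of ξ versus ξ′, with hypothetical attribute ranges and a single operation/resource-type dependency constraint, is given below:

```python
from itertools import product

# Hypothetical attribute ranges observed in the logs (names are illustrative).
ranges = {
    "user":         {"alice", "bob"},
    "operation":    {"DeleteUser", "GetObject"},
    "resourceType": {"User", "File"},
}

# Dependency constraint: each operation is only valid on certain resource types.
valid_targets = {"DeleteUser": {"User"}, "GetObject": {"File"}}

def is_valid(combo):
    return combo["resourceType"] in valid_targets[combo["operation"]]

attrs = sorted(ranges)
# ξ: the full Cartesian product of attribute:value combinations.
xi = [dict(zip(attrs, values)) for values in product(*(ranges[a] for a in attrs))]
# ξ′: only the combinations that satisfy the dependency relationships.
xi_prime = [combo for combo in xi if is_valid(combo)]
print(len(xi), len(xi_prime))  # 8 total combinations, 4 valid
```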
Any measure of security policy accuracy must also take time into account because the
amount of risk from over-privileges accumulates over time. Over-privilege carries the risk that
an unnecessary privilege will be misused, and this risk increases the longer the over-privilege
exists. To capture risk across a specified time period, we define the Operation Period (OPP)
as the time period during which security policies are evaluated against user operations. With
the concepts of the valid privilege universe ξ′ and operation period OPP defined, we now
define the ABAC-specific version of the Privilege Error Minimization Problem, PEMP_ABAC
(Definition 1).
Definition 1. PEMP_ABAC: ABAC Privilege Error Minimization Problem. Given the uni-
verse of all valid attribute:value combinations ξ′, find the set of attribute:value constraints
that minimizes the over-privilege and under-privilege errors for a given operation period
OPP.
We use terminology from statistical hypothesis testing for assessing the effectiveness
of our algorithm in addressing the PEMP_ABAC. We first present our method for scoring
individual predictions, and then our method for splitting up the dataset and evaluating the
algorithm’s performance over multiple time periods.
Policy evaluation for a given operation period is a two-class classification problem where
every possible event in the ABAC privilege space falls into one of two possible classes:
grant or deny. By applying the policies generated from the observation period data to the
privileges exercised in the operation period, we can categorize each prediction into one of
four outcomes:
• True Positive (TP): a privilege that was granted in the predicted policy and exercised
during the OPP.
• True Negative (TN): a privilege that was denied in the predicted policy and not exercised
during the OPP.
• False Positive (FP): a privilege that was granted in the predicted policy but not exercised
during the OPP.
• False Negative (FN): a privilege that was denied in the predicted policy but attempted to
be exercised during the OPP.
Using the above outcomes we then calculate the True Positive Rate (TPR), also known as
Recall, and the False Positive Rate (FPR), as shown in Equations 4.2 and 4.3, respectively.

TPR = TP / (TP + FN)    (4.2)

FPR = FP / (FP + TN)    (4.3)
As with the problem definition, these metrics are also derived from [48] but adapted from
RBAC to be more suitable to the ABAC privilege space. Where [48] used metrics based
on TPR and Precision, we used TPR and FPR instead. Precision (TP / (TP + FP)) is suitable
when considering the users and operations because the universe of possible grants is roughly
on the same order of magnitude as the number of unique log events. When dealing with
the ABAC universe, the number of possible unique attribute:value combinations is likely to
be many orders of magnitude greater than the number of events in the operational logs.
To avoid over-fitting, ABAC rules must grant a large number of attribute:value privileges in
absolute terms (on the order of hundreds or thousands of attribute:value combinations in our
experiments), but are actually still quite small relative to the universe of possible attribute
combinations (which totals in the millions or billions). Stated another way, Precision is not
a suitable metric for use in mining ABAC policies from logs because it uses one term (TP)
which is driven primarily by the number of entries in the log, and another term (FP) which is
driven by the size of the privilege universe. On the other hand, both terms in the TPR (TP
and FN) are log derived, and both terms in FPR (FP and TN) are policy derived metrics.
TPR and FPR are the metrics used to evaluate a policy in terms of under-privilege
and over-privilege, respectively. If all privileges exercised in the OPP were granted, there
was no under-privilege for the policy being evaluated, so FN = 0 and TPR = 1. As the
number of erroneously denied privileges (FNs) grows, TPR → 0; thus TPR represents under-
privilege. For the edge case that no privileges were exercised in the OPP, we redefine TPR
to be TPR = 1, as no under-privilege is possible in this case. If all privileges granted by
the policy were exercised during the OPP, there was no over-privilege for the policy being
evaluated, so FP = 0 and FPR = 0. As the number of erroneously granted privileges (FPs)
grows, FPR → 1; thus FPR represents over-privilege.
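A minimal sketch of this scoring for a single operation period, including the edge case above, could look like the following (set-based, with the valid privilege universe ξ′ passed in explicitly):

```python
def tpr_fpr(granted, exercised, universe):
    """granted, exercised: sets of attribute:value combinations (granted by the
    policy / exercised during the OPP); universe: the valid privilege universe ξ′."""
    tp = len(granted & exercised)
    fn = len(exercised - granted)
    fp = len(granted - exercised)
    tn = len(universe - granted - exercised)
    # Edge case from the text: if nothing was exercised, no under-privilege is possible.
    tpr = 1.0 if not exercised else tp / (tp + fn)
    fpr = 0.0 if (fp + tn) == 0 else fp / (fp + tn)
    return tpr, fpr
```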
To score policies across multiple time periods, we use out-of-time validation [32], a tem-
poral form of out-of-sample validation. In out-of-sample validation, a set of data is used to
train an algorithm (training set) and a separate set of non-overlapping data is used to test
the performance of the trained algorithm (test set). In our evaluation, the training and test
sets are contiguous and the test time period immediately follows the training time period.
The training set is referred to as the Observation Period (OBP), while the test set is the
Operation Period (OP P ) defined previously in Section 4.6.1. It is important to note that
this method preserves the temporal interdependencies between actions. For example, if an
employee moves to a new position within the organization, one would expect the privileges
mined for that employee in the future time periods would be very different from those mined
in the past time periods. Methods such as k-fold cross validation which randomly partition
a dataset (and used in [25] to evaluate policies) do not account for these temporal inter-
dependencies. When charting metrics for multiple time periods, we use the average of all
individual scores. This gives equal weight to each operation period score.
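As an illustration only, the following sketch shows one way such contiguous observation/operation splits could be produced from a timestamped log with pandas; the timestamp column name and the choice to slide forward by one operation period are assumptions, not details taken from this work.

import pandas as pd

def out_of_time_splits(log, obp_days, opp_days):
    """Yield (observation, operation) slices of a timestamped log DataFrame.

    The operation period always immediately follows the observation period,
    preserving temporal interdependencies, unlike random k-fold partitioning.
    """
    start = log["timestamp"].min()
    end = log["timestamp"].max()
    while start + pd.Timedelta(days=obp_days + opp_days) <= end:
        obp_end = start + pd.Timedelta(days=obp_days)
        opp_end = obp_end + pd.Timedelta(days=opp_days)
        obp = log[(log["timestamp"] >= start) & (log["timestamp"] < obp_end)]
        opp = log[(log["timestamp"] >= obp_end) & (log["timestamp"] < opp_end)]
        yield obp, opp
        start = start + pd.Timedelta(days=opp_days)  # slide by one operation period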
Quantifying the number of resources allowed or denied by a policy implies that there is a
known value for the number of possible resources in the system. This presents a challenge for
any least-privilege scoring approach that is not unique to the ABAC model or our method-
ology. While every system has finite limits on the resource identifier length and number of
resources, these can be so numerous that we consider them too large to quantify and treat them as effectively infinite. For example, consider how many possible file names there are for the ext4 file system: with up to 255 bytes allowed for the file name, (2^8)^255 possible distinct file names exist, excluding the file path [58].
Instead of counting all possible resource identifiers, we use the resource identifiers present
in the OBP and OP P for our policy scoring calculations. This approach presents several
advantages over other possible approaches such as using all values in the dataset, or in-
trospecting the environment for the resources present (which would be prohibitively time
consuming for our dataset). Only the recently used resources are counted, giving them
greater importance, and all necessary data is available in the audit logs. This also implies
that the valid privilege space ξ‘ may vary in size between scoring periods depending on the
resource identifiers present.
4.7 Methodology
This section presents both the algorithm we used to generate policies for addressing the
PEMPABAC problem as well as the algorithm we used to score these policies across multiple
operation periods.
4.7.1 Rule Mining
4.7.1.1 Scoring Candidate Rules
Our rule mining algorithm operates similarly to the mining algorithms presented in [25,
54] in that it considers the set of uncovered log entries and iteratively generates many
candidate rules, scores them, and selects the best scoring rule for the next iteration until all
of the given log events are covered by the set of generated rules. A critical component of
this approach is the metric used to evaluate candidate rules. Before describing the algorithm
design, we will first detail the metric used for evaluating candidate rules generated during
the mining process. We propose a candidate scoring metric termed the Cscore in this paper
using the following definitions.
• c is an ABAC constraint specified as an attribute:value pair, or a single key and a set of values key:{values}. Values are required to be discrete; continuous values must be binned to be used by the mining algorithm. r is a rule consisting of one or more constraints. p is a
policy consisting of one or more rules.
• L is the complete set of log entries for the dataset, LOBP is the set of logs in the observation
period OBP , LOBP ⊆ L.
• LOBP(c) is the set of log entries which meet (are "covered by") the set of constraints c.
The constraint set may be specified by the use of a rule r or policy p, LOBP (c) ⊆ LOBP .
• ξ ′ is the privilege universe of valid log events as defined previously in Section 4.6.1.
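One possible in-memory representation of these definitions is sketched below; it is illustrative only (the thesis does not prescribe a data structure), with a rule held as a mapping from attribute keys to sets of allowed values and the example entries made up.

def covers(rule, entry):
    """True if a log entry satisfies every constraint of a rule.

    rule:  dict mapping attribute key -> set of allowed values
    entry: dict mapping attribute key -> the value observed in the log
    """
    return all(entry.get(key) in values for key, values in rule.items())

def covered_logs(rule, logs):
    """L_OBP(r): the subset of log entries covered by rule r."""
    return [entry for entry in logs if covers(rule, entry)]

# Hypothetical example with two log entries and a single-constraint rule.
rule = {"eventName": {"DescribeInstances", "ListBuckets"}}
logs = [{"eventName": "DescribeInstances", "userName": "alice"},
        {"eventName": "TerminateInstances", "userName": "bob"}]
print(len(covered_logs(rule, logs)))  # 1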
The CoverageRate (Equation 4.4) is the ratio of all logs in the observation period covered
by a candidate rule r that are not already covered by other rules in the policy p (|LOBP (r) \
LOBP (p)|) to the remaining number of log entries not covered by any rules in the policy
(|LOBP \ LOBP (p)|). A candidate rule that covers more log entries is considered higher
quality than a rule that covers fewer log entries. The numerator of the OverPrivilegeRate
(Equation 4.5) first finds the number of valid attribute:value combinations in the privilege
universe which are covered by a rule (ξ‘(r)), minus those attribute:value combinations which occur in the set of uncovered logs LOBP(r) \ LOBP(p); the result is the total number of over-assignments for rule r. The total over-assignments are then normalized using the total
number of valid combinations in the valid privilege universe |ξ ′ |. A candidate rule which
has fewer over-assignments is considered higher quality than a rule that has more over-
assignments. The candidate score Cscore (Equation 4.6) is then the ω weighted addition of
the CoverageRate and the complement of the OverPrivilegeRate. By normalizing the under-
assignments using the number of log entries and the over-assignments using the size of the
valid privilege universe, the effect of varying the weight ω in the Cscore is more predictable and
results in better performance when compared to the λ−Distance metric which also uses a
variable weighting between over-assignments and under-assignments but does not normalize
these values (see Section 4.8.2 for Cscore vs. λ−Distance comparison details).
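Written out from this description and from lines 8–10 of Algorithm 4, the referenced quantities take roughly the following form (a reconstruction, since the display equations themselves are not reproduced here):

\begin{align}
\mathit{CoverageRate}(r,p) &= \frac{|L_{OBP}(r) \setminus L_{OBP}(p)|}{|L_{OBP} \setminus L_{OBP}(p)|} \tag{4.4}\\
\mathit{OverPrivilegeRate}(r,p) &= \frac{|\xi'(r)| - |L_{OBP}(r) \setminus L_{OBP}(p)|}{|\xi'|} \tag{4.5}\\
C_{score}(r,p) &= \mathit{CoverageRate}(r,p) + \omega\,\bigl(1 - \mathit{OverPrivilegeRate}(r,p)\bigr) \tag{4.6}
\end{align}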
Our algorithm for mining an ABAC policy from the logs of a given observation period is
presented in Algorithm 4. Note that we use arithmetic operators =, +, − when describing
integer operations, and set operators ←, ∪, \, ∈, |size| when describing set operations. As
mentioned previously, the algorithm iteratively generates candidate rules from the set of
uncovered logs. To avoid confusion between the original set of log entries for the observation
period LOBP and the current set of uncovered log entries which is updated for each iteration
of the algorithm, we copy LOBP to Luncov at line 2. The FP-growth algorithm [46] is used
to mine frequent itemsets from the set of uncovered observation period log entries (line 4).
Algorithm 4: Rule Mining Algorithm
Input: LOBP The set of log entries representing user actions during the observation
period OBP .
Input: ω under-privilege vs. over-privilege weighting variable.
Input: ǫ Threshold value for minimum itemset frequency.
Input: ξ‘ The set of all valid attribute:value combinations that comprise the privilege
universe.
Output: policy The set of ABAC rules that make up the policy to be applied during
the operation period OP P .
1 policy ← ∅;
2 Luncov ← LOBP ;
3 while |Luncov | > 0 do
4 itemsets ← F P −growth.f requentItemsets(Luncov , ǫ);
5 candidateRules ← ∅;
6 for itemset ∈ itemsets do
7 rule = createRule(itemset);
8 coverageRate = |Luncov(rule)| / |Luncov|;
9 overAssignmentRate = (|ξ‘(rule)| − |Luncov(rule)|) / |ξ‘|;
10 rule.Cscore = coverageRate + ω × (1 − overAssignmentRate);
11 candidateRules ← candidateRules ∪ rule;
12 end
13 bestRule = sortDescending(candidateRules, Cscore )[0];
14 policy ← policy ∪ bestRule;
15 Luncov ← Luncov \ Luncov (bestRule);
16 end
17 return policy
The itemsets returned by the FP-growth algorithm are sets of attribute:value statements,
and each of these itemsets is used to create a candidate rule which is then scored using the
Cscore metric (lines 6-12). After all candidates are scored, the highest scoring rule is selected
and added to the policy, then all log entries covered by that rule are removed from the set
of uncovered log entries (lines 13-15). The mining process continues until all log entries are
covered (lines 3-16).
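A compressed Python sketch mirroring the structure of Algorithm 4 is given below. It is an approximation, not the thesis implementation: mlxtend's FP-growth is used as a stand-in frequent-itemset miner, rules are restricted to single-valued constraints, and valid_combinations_covered() is a hypothetical helper standing in for the |ξ‘(rule)| lookup described later in the partitioning discussion.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

def mine_policy(obp_logs, omega, epsilon, universe_size, valid_combinations_covered):
    """Greedy ABAC rule mining in the spirit of Algorithm 4 (illustrative sketch)."""
    uncovered = list(obp_logs)          # each log entry: dict of attribute -> value
    policy = []
    while uncovered:
        # Encode each uncovered entry as a transaction of "attr=value" items.
        transactions = [[f"{k}={v}" for k, v in e.items()] for e in uncovered]
        encoder = TransactionEncoder()
        onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                              columns=encoder.columns_)
        itemsets = fpgrowth(onehot, min_support=epsilon, use_colnames=True)
        if itemsets.empty:
            break                       # no candidate meets the support threshold
        best_rule, best_score = None, float("-inf")
        for items in itemsets["itemsets"]:
            rule = dict(item.split("=", 1) for item in items)
            covered = [e for e in uncovered
                       if all(e.get(k) == v for k, v in rule.items())]
            coverage_rate = len(covered) / len(uncovered)
            over_rate = (valid_combinations_covered(rule) - len(covered)) / universe_size
            score = coverage_rate + omega * (1 - over_rate)       # the C_score
            if score > best_score:
                best_rule, best_score = rule, score
        policy.append(best_rule)
        uncovered = [e for e in uncovered
                     if not all(e.get(k) == v for k, v in best_rule.items())]
    return policy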
Once the observation period logs have been mined to create a policy, that policy is scored
using the events that took place during the operation period immediately following the mined
observation period as described in Algorithm 5. Each event during the operation period is
evaluated against the mined policy (lines 3-10); events allowed by the policy are TPs, while events denied by the policy are FNs. A unique combination of attribute:value pairs may
occur multiple times within the same time period. The TPs and FNs are both values based
on the number of times an event occurs in the log. The set of unique events that were
exercised in the operation period and granted by the policy is also maintained (line 6) in
order to calculate the FPs later (line 15). By counting each TP and FN instead of unique
occurrences, the resulting TPR is frequency weighted. Events that occur more frequently
in the operation period have a greater impact on the resulting TPR than those events that
occur less frequently.
While the TPs, FNs, and resulting TPR are based on the frequency weighted count of
events present in the log, the FPs, TNs and resulting FPR cannot be frequency weighted
because each unique valid event of the privilege universe is either granted or denied only once
by the policy. To obtain these values (FP, TN, FPR), we first determine how many unique
events out of the valid privilege space are granted by the policy (lines 11-14). It is important
to note that enumerating the entire privilege space and testing every valid event against the
policy would be much more computationally intensive than our approach, which is to use
information about the valid privilege space to enumerate only the valid events allowed by
Algorithm 5: Policy Scoring Algorithm
Input: LOP P The set of log entries representing user actions during the operation
period OP P .
Input: ξ‘ The set of all valid attribute:value combinations that comprise the privilege
universe.
Input: policy The set of ABAC rules that make up the policy to be applied during
the operation period OP P .
Output: T P R, F P R The true positive and false positive rates of the policy
evaluated against the operation period OP P .
1 T P = F N = 0;
2 exercisedGrantedEvents ← ∅ ;
3 for event ∈ LOP P do
4 if policyAllowsEvent(policy, event) then
5 T P = T P + 1;
6 exercisedGrantedEvents ← exercisedGrantedEvents ∪ event;
7 else
8 F N = F N + 1;
9 end
10 end
11 eventsAllowedByPolicy ← ∅;
12 for r ∈ policy do
13 eventsAllowedByPolicy ← eventsAllowedByPolicy ∪ ξ‘(r);
14 end
15 F P = |eventsAllowedByP olicy \ exercisedGrantedEvents|;
16 T N = |ξ‘| − (T P + F N + F P );
17 if T P + F N == 0 then
18 T P R = 1;
19 else
20 T P R = T P/(T P + F N );
21 end
22 F P R = F P/(F P + T N );
23 return T P R, F P R
each rule. Most mined rules only allow a small percentage of the privilege space except in
cases of extreme ω values.
Once the set of all the unique events allowed by a policy has been enumerated, we remove
the set of unique events which occurred and were granted during the operation period to
obtain the number of total FP events for the policy (line 15). At this point we have obtained
the unique sets of TPs, FNs, and FPs, so any remaining privilege in the valid privilege
universe not in these sets must be a TN (line 16). With these values calculated, we can
then calculate the TPR and FPR, with the caveat that in the case where no privileges were
exercised during the operation period, we define T P R = 1 because there could not be any
instances of under-privilege (lines 18-22). The purpose of the policyAllowsEvent() function
is self-explanatory and trivial to implement, so the implementation of this method is omitted
due to space considerations.
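A condensed Python rendering of this scoring logic is shown below as a sketch only; event_matches() plays the role of policyAllowsEvent() for a single rule, and enumerate_allowed_events() is a hypothetical helper that yields the unique valid events granted by a rule (lines 11-14 of Algorithm 5).

def event_matches(rule, event):
    """policyAllowsEvent() for one rule: every constraint must be satisfied."""
    return all(event.get(k) == v for k, v in rule.items())

def score_policy(opp_logs, policy, universe_size, enumerate_allowed_events):
    """Frequency-weighted TPR and unique-event FPR, in the spirit of Algorithm 5."""
    tp = fn = 0
    exercised_granted = set()
    for event in opp_logs:                       # event: dict of attribute -> value
        if any(event_matches(rule, event) for rule in policy):
            tp += 1                              # counted once per occurrence
            exercised_granted.add(frozenset(event.items()))
        else:
            fn += 1
    allowed = set()                              # unique valid events granted
    for rule in policy:
        allowed.update(frozenset(e.items()) for e in enumerate_allowed_events(rule))
    fp = len(allowed - exercised_granted)
    tn = universe_size - (tp + fn + fp)
    tpr = 1.0 if (tp + fn) == 0 else tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr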
Dealing with the large number of possible attribute:value combinations that may com-
prise an ABAC privilege space can be a significant challenge compared to the simpler RBAC
privilege space. Using all attributes and values present from logs may make the privilege
universe computationally impractical to process. But discarding too many attributes or im-
portant attributes may result in less secure policies. We address these issues by using feature
selection and partitioning methods to make large ABAC privilege spaces more manageable.
Intuitively, attributes which occur infrequently in the logs or have highly unique values
are poor candidates for use in creating ABAC policies. Uncommon attributes are difficult
to mine meaningful patterns from because there is less data available to identify patterns
from. Also, rules created with uncommon attributes are less useful in access control decisions
because future access requests are unlikely to use these attributes as well. Using attributes
with unique values (the attribute value is never or rarely duplicated across log entries) is likely
to result in over-fitting for any rules created with those attributes. Following this reasoning,
we perform preprocessing on our dataset to select and bin the most useful attributes as
follows.
1. Compute the Uniqueness of each attribute's values. Remove attributes whose values are nearly always unique (Uniqueness ≈ 1.0) or nearly always identical (Uniqueness ≈ 0.0), with resource identifiers excepted from this test.
2. Identify attributes whose values have a 1:1 correlation with another attribute and remove the redundant duplicates.
3. Sort attributes by Frequency = AttributeOccurrences / TotalLogEntries. Select attributes above a frequency threshold, θ.
4. Sort the remaining attributes by Uniqueness; those with high Uniqueness are candidates for binning or removal.
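A minimal pandas sketch of this selection process follows; the exact definitions of Frequency and Uniqueness used here (fraction of entries in which an attribute appears, and fraction of those appearances carrying a distinct value) are assumptions consistent with the description above rather than the precise implementation.

import pandas as pd

def rank_attributes(log, theta):
    """Rank attributes by Frequency and Uniqueness and apply the threshold theta."""
    rows = []
    for col in log.columns:
        present = log[col].dropna()
        frequency = len(present) / len(log)
        uniqueness = present.nunique() / len(present) if len(present) else 0.0
        rows.append({"attribute": col, "frequency": frequency, "uniqueness": uniqueness})
    stats = pd.DataFrame(rows)
    # Step (1): drop attributes that are always unique or always constant
    # (resource identifiers would be exempted from this filter).
    stats = stats[(stats["uniqueness"] < 0.99) & (stats["uniqueness"] > 0.01)]
    # Step (3): keep attributes above the frequency threshold.
    return stats[stats["frequency"] >= theta].sort_values("frequency", ascending=False)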
Our full dataset contained 1,748 distinct attributes (see Section 4.8.1 for dataset descrip-
tion). In step (1) attributes with U niqueness ≈ 1.0 nearly always have unique values, and
U niqueness ≈ 0.0 implies the attribute values are nearly always the same. Resource identi-
fiers are given an exception to the uniqueness test in this step as they are expected to have
high uniqueness. For our dataset, we identified and removed two always unique attributes,
eventID and requestID, and one attribute that always had the same value, accountId. We
confirmed that these attributes would always meet the uniqueness criteria with the AWS
documentation. Applying step (2), we identified three distinct attributes for the user name
with a 1:1 correlation and removed two of them. For step (3) we selected two thresholds to
build two datasets for experimentation, θ = 0.1 and θ = 0.005; we term the privilege universes built using these thresholds ξ‘0.1 and ξ‘0.005, respectively. Figure 4.1
charts the rank of the top 50 most common attributes after our feature selection process was
complete. The attribute frequency follows the common power law distribution with a “long
tail”; the remaining attributes not charted here occurred in less than 0.2% of the log entries.
Next we apply step (4) of our process to our dataset. Some of the remaining attributes still
have fairly high Uniqueness values which are difficult to mine meaningful rules from. In our
Figure 4.1: Top 50 Attributes Ranked by Frequency
dataset, some of these attributes such as checksum values are not relevant to creating security
policies and can be discarded. Others are attributes which may benefit from binning into a
smaller subset of values. There were three such attributes in our dataset: sourceIPAddress,
userAgent, and eventName. The sourceIPAddress is an IPv4 address with over 4 billion
possible values. After consulting with the system administrator of the dataset provider, we
found that it was unlikely they would use rules based on the raw IP address since users will
change IPs frequently. Instead, they preferred to derive the geographical location from the
IP address so IPs were binned by U.S. states and each country the organization’s users may
log in from. The userAgent attribute is the AWS Command Line Interface (CLI), Software
Development Kit (SDK), or web browser version used when making a request. This field
benefits from binning as users are likely to perform similar requests from a web browser,
but they may upgrade their browser version regularly. Without binning the many different
browser versions into a single group, a mining algorithm would not effectively learn user
patterns. Again, the dataset provider agreed that the raw value was too granular for use so
the userAgent attribute was binned into 10 buckets. The eventName attribute is the name
of the operation. This attribute is already effectively binned because each eventName can
only be associated with one eventSource which is the AWS service name associated with the
operation. We derived two additional attributes to bin eventName, one based on whether it
was a Create, Read, Update, Delete, or Execute operation, and a second derived attribute
based on the first word of the eventName. For example the operation “StartInstance” is
binned into a bucket with other attributes that begin with “Start”. Experiments showed
this improved T P R with a negligible decrease in F P R at a ratio of 20:1.
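As a concrete illustration of the eventName binning, the sketch below derives the two additional attributes from an operation name; the prefix-to-CRUD-class mapping shown is a made-up example, not the mapping actually used for the dataset.

import re

# Hypothetical prefix-to-CRUD-class mapping; the real mapping is dataset specific.
CRUD_CLASS = {
    "Create": "Create", "Put": "Create", "Run": "Create",
    "Get": "Read", "Describe": "Read", "List": "Read",
    "Update": "Update", "Modify": "Update",
    "Delete": "Delete", "Terminate": "Delete",
    "Start": "Execute", "Stop": "Execute", "Invoke": "Execute",
}

def derive_event_bins(event_name):
    """Return (first_word, crud_class) for an eventName, e.g. "StartInstance"."""
    match = re.match(r"[A-Z][a-z]*", event_name)
    first_word = match.group(0) if match else event_name
    return first_word, CRUD_CLASS.get(first_word, "Other")

print(derive_event_bins("StartInstance"))      # ('Start', 'Execute')
print(derive_event_bins("DescribeInstances"))  # ('Describe', 'Read')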
The resulting ABAC privilege space may still be quite large even for a modest dataset
after applying the feature selection and binning methods as just described in Section 4.7.3.1.
This section describes partitioning techniques we applied to split up the privilege space during
the policy mining process. Partitioning techniques (as used in databases to split large tables
into smaller parts) are used to both reduce the memory footprint of our algorithms, and to
improve performance by performing operations in parallel across multiple processors.
The rule mining algorithm (Algorithm 4) uses partitioning to improve the run time
and space efficiency for storing and searching the privilege universe ξ‘. The total number
of valid combinations of ξ‘ was on the order of billions for some of our experiments, but
Algorithm 4 only needs to determine the number of privileges covered by a rule and it does
not need to enumerate and store all possible privilege combinations in memory. This is
a subtle but important difference because it means we can calculate the number of valid
privilege combinations by splitting ξ‘ into smaller sets of independent partitions to perform
this calculation. The total number of valid privilege combinations covered by a rule is the
product of the number of valid privilege combinations covered by each separate partition,
i.e., |ξ‘(r)| = |P1 (r)| × ... × |Pn (r)| where the attributes of each partition Pi are independent
of the attributes in all other partitions.
To create these partitions, the AWS documentation was used to identify dependencies
between attributes in our dataset. Next, a simple depth first search was used to identify
connected components of interdependent attributes. The valid attribute:value combinations
for all attributes in each connected component were then enumerated and stored into one
inverted index for each partition. Finding the number of valid privilege combinations covered
by a rule in a partition (|Pn (r)|) is accomplished by searching the inverted index using the
rule’s attribute:value constraints as search terms. As a result of this partitioning, our queries
were performed against three indexes on the order of thousands to hundreds of thousands of
documents vs. a single index that would have been on the order of hundreds of millions to
billions of documents if such a partitioning scheme were not in use.
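The sketch below illustrates the product-over-partitions computation; each partition is represented as an inverted index (attribute -> value -> set of document ids) together with its total document count, and rule constraints are assumed to be single values rather than sets for brevity.

def combinations_covered(rule, partitions):
    """|ξ'(r)| computed as a product over independent partitions (sketch).

    partitions: list of (inverted_index, size) pairs, where inverted_index maps
    attribute -> value -> set of ids of valid combinations in that partition.
    """
    total = 1
    for index, size in partitions:
        matching = None
        for attr, value in rule.items():
            if attr in index:                      # rule constrains this partition
                docs = index[attr].get(value, set())
                matching = docs if matching is None else matching & docs
        # An unconstrained partition contributes all of its valid combinations.
        total *= size if matching is None else len(matching)
    return total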
For our dataset, a depth first search identified one connected component of all user
attributes, and another connected component of operations and resources. Operations and
resources were connected because most operations are specific to a single or set of resource
types. We grouped all other attributes that were independent of users and operations into
a third component which included environment attributes such as the sourceIPAddress and
userAgent. Although this grouping of attributes by components was obtained from processing
our specific dataset, it is reasonable to assume that the user attributes are independent of
the valid operation and resource attribute combinations in other datasets as well. This is
also consistent with the NIST ABAC guide which defines environment conditions as being
independent of subjects and objects [42].
Due to the large number of candidate rules generated by the F P −growth algorithm,
scoring of candidate rules is the most computationally intensive part of Algorithm 4 in our
experiments (except for those with fairly large ǫ values which generate few candidates). The
search against the inverted index is also parallelized to improve performance.
To improve the run time performance of the policy scoring algorithm (Algorithm 5)
and enable it to deal with a privilege space larger than the available memory, we again
employ partitioning and parallelization methods. As mentioned in 4.7.2, Algorithm 5 must
enumerate the set of all privilege combinations covered by a rule in order to identify the total
unique number of privilege combinations covered by a policy. If extreme values for ω are
chosen, it is possible for Algorithm 4 to generate rules with a large number of over-privileges,
possibly the entire privilege space. Therefore, Algorithm 5 must be able to deal with the
possibility that it will have to enumerate all privilege combinations of ξ‘, although again,
this only happens for extreme values of ω, and this is for the out-of-sample validation for
policy scoring only, not the rule mining algorithm.
To deal with the possible need to enumerate a large portion or even all of the privilege
space, we partitioned ξ‘ along two attributes so that the values of those attributes are placed
into separate partitions. As with any partitioning, choosing a key that nearly equally splits
the universe of possible values is important. For our experiments, we chose to partition
the ξ‘ space along the attributes associated with the operation name and the user name.
The overall correctness of the algorithm is independent of the partition keys used, and 1...n
partitions may be used for each attribute depending on the size of the privilege space and
available memory.
Each of these partitions is operated on in parallel when evaluating each rule of the
policy. Unique hashes of the enumerated events are used in order to deduplicate events
which may be generated by more than one rule. This partitioning and parallelization takes
place within lines 11-14 of Algorithm 5. We describe these optimizations here because they
are useful in speeding up and scaling the algorithm when dealing with a large number of
attribute:value pairs, but we omit it from the pseudo-code in Algorithm 5 in order to simplify
the presentation of the parts of the algorithm necessary for correctness.
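A rough sketch of this partitioned, parallel enumeration is shown below; enumerate_rule_events() is a simplified stand-in that filters a partition's pre-enumerated valid events (the thesis instead enumerates directly from privilege-space metadata), and events are reduced to deterministic hashes so that duplicates produced by overlapping rules are cheap to discard.

import hashlib
from concurrent.futures import ProcessPoolExecutor

def event_hash(event):
    """Deterministic hash of an event so per-process results can be merged safely."""
    canon = ";".join(f"{k}={v}" for k, v in sorted(event.items()))
    return hashlib.sha1(canon.encode()).hexdigest()

def enumerate_rule_events(rule, partition):
    """Stand-in: yield the valid events of one partition covered by a rule."""
    for event in partition:                     # event: dict of attribute -> value
        if all(event.get(k) == v for k, v in rule.items()):
            yield event

def partition_worker(args):
    """Enumerate, within one partition, hashes of the events granted by any rule."""
    policy, partition = args
    allowed = set()
    for rule in policy:
        for event in enumerate_rule_events(rule, partition):
            allowed.add(event_hash(event))      # deduplicates overlapping rules
    return allowed

def unique_allowed_event_hashes(policy, partitions, workers=4):
    """Process the partitions in parallel and union the deduplicated results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(partition_worker, [(policy, p) for p in partitions])
    allowed = set()
    for partial in results:
        allowed |= partial
    return allowed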
4.8 Results
We use the Receiver Operating Characteristic (ROC) curve to compare the performance
of various algorithms and parameters. The ROC curve is a graphic commonly used to chart
the performance of binary classifiers. It charts the trade-off between the TPR and the FPR
of a binary classifier, with the ideal performance having a TPR value of one and FPR value
of zero. Our charts also include the Area Under the Curve (AUC) which measures the area
underneath the ROC curve. This provides a single quantitative score that incorporates both
the F P R and T P R as the weighting metric is varied with higher AU C scores being more
favorable.
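For reference, such an AUC value can be computed from the swept (FPR, TPR) operating points with a simple trapezoidal integration; the sketch below uses NumPy and made-up points, and is not necessarily how the values in this section were produced.

import numpy as np

def roc_auc(fpr_points, tpr_points):
    """Area under an ROC curve given the (FPR, TPR) pairs from one parameter sweep."""
    order = np.argsort(fpr_points)
    fpr = np.asarray(fpr_points, dtype=float)[order]
    tpr = np.asarray(tpr_points, dtype=float)[order]
    return float(np.trapz(tpr, fpr))            # trapezoidal rule over sorted FPR

# Hypothetical operating points produced by sweeping the weighting parameter.
print(roc_auc([0.0, 0.05, 1.0], [0.90, 0.99, 1.0]))  # ~0.99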
First, we describe our dataset used for these experiments. Next we present experimental
results and analysis to justify our choices for the candidate evaluation metric Cscore , including
a comparison of several possible methods for normalizing the CoverageRate variable. Then
we examine the effect of varying the two adjustable input variables to the mining algorithm,
the length of the observation period (|LOBP |), and the minimum support threshold (ǫ).
Finally, we compare the performance of our ABAC algorithm and policies to that of an
RBAC based approach.
Avg. is the average number of unique actions exercised by active users, and ΣAction Avg. is the average of the total actions exercised by active users. The standard deviation is also provided for the Unique Services, Unique Actions, and ΣActions metrics to understand the variation between individual users. For example, looking at both the Unique and ΣActions metrics, we observe that their standard deviation is higher than the average for all time periods, indicating a high degree of variation between the number of actions that users exercise.
Based on our dataset of 4.7M user generated events, we derive two privilege universes
using our feature selection methodology described in Section 4.7.3.1. ξ‘0.1 used 15 attributes
and consisted of 510M unique attribute:value combinations. ξ‘0.005 used 40 attributes, 25
of which were resource identifiers so the universe size varied between 1.5B and 8.6B unique
attribute:value combinations depending on the number of resources used during the OBP
and OP P periods. All of the experiments in this section use ξ‘0.1 except for Section 4.8.4
which uses ξ‘0.005 .
We consider three criteria in the design and evaluation of the Cscore metric for selecting
a single rule from many candidate rules generated by the F P −growth algorithm during
each iteration of our rule mining algorithm. C1:AU C is the Area Under the ROC Curve.
C2:Smoothness means that T P R values should increase monotonically as the F P R in-
creases. And, C3:Interpretability means that the effect of changing the weighting variable
should be predictable and easy to understand by an administrator who uses the metric in a
policy mining algorithm.
We propose the candidate scoring metric Cscore in Section 4.7.1.1, λ−Distance is pre-
sented in [25], and Qrul is presented in [54]. All of these metrics use the number of over-
assignments and number of log entries covered with a weighting variable for adjusting the
importance between over-assignments and coverage in their scoring of candidates. However,
these metrics differ in how they normalize these numbers (if at all) and how they implement
the weighting between them. The results of varying the over-assignment weightings for these
candidate evaluation methods are shown in Figure 4.2.
Four distinct versions of the Qrul metric are presented in Figure 4.2. Qrul is the metric
as presented in [54] (and in this paper as Equation 4.1). In [54], the authors also described
QrulF req, a frequency weighted variant of Qrul which should be a fairer comparison with our
frequency weighted policy scoring algorithm (Algorithm 5). The authors of [54] provide their
source code on their website. After inspecting this source code, it appears that the scoring
algorithms implemented in the source code for Qrul and QrulF req are slightly different
from those presented in the paper. Instead of using the number of privileges covered by a
rule out of the entire privilege universe ([[p]]) as the denominator for the over-assignments
side of the metric, the implemented metrics instead use the number of privileges covered by
a rule out of the log entries not covered by other rules already in the policy (|[[p]] ∩ U P |).
These “as-implemented” metrics, QrulImpl and QrulF reqImpl, perform more favorably
than their counterparts so we include them in our comparison here along with the versions
as documented in [54].
All of the examined metrics performed relatively well with high AU C values, but the
Cscore metric has the highest AUC value, thus being the most favorable metric per the C1:AUC criterion.
Figure 4.2: ROC curves (TPR vs. FPR) for the candidate scoring metrics: C-Score AUC=0.9993, QrulImpl AUC=0.9983, QrulFreqImpl AUC=0.9978, λ-Distance AUC=0.9972, QrulFreq AUC=0.9946, Qrul AUC=0.9922
Regarding C2:Smoothness, an inflection point is visible in the charted portion of the ROC curve for Qrul (QrulFreq has a similar inflection point that is difficult to discern in Figure 4.2 at FPR = 0.0013).
Unlike the Qrul and λ−Distance metrics, Cscore normalizes both the number of logs
covered and over-assignments into a ratio between [0, 1] before applying the weighting. This
makes the weighting variable independent of the size of the privilege universe and number of
log entries, and thus easier to understand and apply. In Figure 4.2, varying the ω weighting of the Cscore between ω = 1/10 and ω = 10 varies the charted FPR between FPR = 0.05 and FPR = 0.998 at relatively even intervals. To achieve a similar spread across the FPR scores with QrulFreqImpl and λ−Distance, the variable weighting for those metrics must be varied between 1/100 and 1/2000. QrulImpl achieved the second highest AUC score due to an unusually good score near FPR = 0.34, but QrulImpl is difficult to assign a weighting to with predictable results. For example, the QrulImpl score at FPR = 0.34, TPR = 0.9998 was achieved with ω‘0 = 1/100000, but the next score at FPR = 0.49, TPR = 0.9988 was achieved with ω‘0 = 1/500000, which is a significant difference that is difficult to determine without experimentation and consideration of the privilege space and log sizes. Because of its predictability and more even distribution of results, we
find Cscore best meets our evaluation criterion C3:Interpretability.
The CoverageRate (Equation 4.4) of the Cscore (Equation 4.6) is the number of log entries
covered by rule r normalized to the range [0, 1], so that it can be compared with the weighted
value of the OverPrivilegeRate (Equation 4.5) normalized to the same range. There are
several possible ways to compute such a coverage rate however, and it is not immediately
clear which would perform the best without experimentation. We consider four possible
methods of computing the CoverageRate and analyze their performance here:
• |Luncov(r)| / |Luncov|: The frequency weighted number of logs covered out of the total number of uncovered logs.
• |{Luncov(r)}| / |{Luncov}|: The unique number of logs covered out of the set of unique uncovered logs.
• |Luncov(r)| / |LOBP|: The frequency weighted number of logs covered out of the total number of logs in the observation period.
• |{Luncov(r)}| / |{LOBP}|: The unique number of logs covered out of the set of unique log entries during the observation period.
The results of applying the four separate methods of computing the CoverageRate are
presented in Figure 4.3 and identified in that chart by the denominator of each method.
As evident in Figure 4.3, the |Luncov(r)| / |Luncov| method performed the best for two of our criteria for selecting a candidate metric: C1:AUC and C2:Smoothness. The frequency weighted methods |Luncov(r)| / |Luncov| and |Luncov(r)| / |LOBP| performed about the same in terms of C3:Interpretability, with ω = 1/10 resulting in scores in the upper-left most part of the chart. The methods using the number of unique log entries performed less favorably in terms of C3:Interpretability, with their upper-left most points being reached near ω = 1/256, a value farther away from 1 and more difficult to find without experimentation.
In addition to the ω variable which is varied to generate the points along all of the ROC
curves in this section (with the exception of the RBAC algorithm curve in Figure 4.6), there
are two other parameters which can be varied as inputs to Algorithm 4: the threshold value
used by the F P −growth algorithm, ǫ, and the length of the observation period |LOBP |.
The minimum support threshold (ǫ) is used to specify that a pattern is considered a "frequent" pattern if that pattern occurs in at least a fraction ǫ of the examined entries. Increasing ǫ
causes fewer candidate patterns to be identified by the F P −growth algorithm. The results
of varying ǫ between [0.05, 0.1, 0.2, 0.3] are shown in Figure 4.4. For both ǫ = 0.2 and ǫ = 0.3, we observe inflection points in the chart as ω decreases, because a lower ω value favors more coverage regardless of over-assignment.
Figure 4.3: Comparison of the CoverageRate computation methods, identified by denominator (|Luncov| AUC=0.9993, |LOBP| AUC=0.9974)
Figure 4.4: Effect of varying the minimum support threshold (ε=0.3 AUC=0.9640, ε=0.2 AUC=0.9940, ε=0.1 AUC=0.9993, ε=0.05 AUC=0.9996)
When mining policies with a variable observation period length, a larger observation
window generally results in higher T P R but also higher F P R as a result of the mining
algorithms being given more privileges in larger observation periods as previously observed
in [48]. While this trend is also present with our mining algorithm, it is much less noticeable
than with the naive RBAC mining approach.
Figure 4.6: Comparison of ABAC vs. RBAC Performance (ABAC AUC=0.9973, RBAC AUC=0.9269)
4.8.4 ABAC vs. RBAC Performance
The final experiment we run is to compare the performance of our ABAC algorithm
against an RBAC mining algorithm. For this comparison, we use the naive algorithm pre-
sented in [48], which builds an RBAC policy based on the permissions exercised during an
observation period. Other role mining algorithms would perform very similarly because the
role mining problem is designed to fit a set of roles to a given matrix of user to permission
assignments, just with variations on how those users and permissions are grouped by roles to
minimize WSC. Although this RBAC algorithm is fairly simple, it performed quite well in
the scenario that sought an equal balance between low over-privilege and low under-privilege
when compared to more sophisticated algorithms [48].
The ROC curve of our ABAC algorithm and the naive RBAC algorithm from [48] are
presented in Figure 4.6. For this comparison, the ABAC algorithm used a fixed observation
period size of 30 days, an itemset frequency ǫ = 0.1, and the over-privilege weight varied between ω = [1/8192, ..., 16] by powers of 2 to generate the data points. For the RBAC algorithm,
there is no variable similar to ω that can be used as a parameter to instruct the algorithm to
directly vary the importance between under-privilege and over-privilege. However, varying
the observation period length effectively serves this purpose by causing more or fewer priv-
ileges to be granted by the algorithm, so the observation period length was varied between
[3, 7, 15, 30, 45, 60, 75, 90, 105, 120] days to generate the data points for the RBAC algorithm
in Figure 4.6.
The ABAC algorithm significantly outperformed the RBAC algorithm across the ROC
curves in Figure 4.6. With only 30 days worth of data, the ABAC algorithm was able to
correctly grant more privileges (higher TPR) than the RBAC algorithm with 120 days of
data. The ABAC algorithm was also able to correctly restrict more unnecessary privileges
(lower FPR) than the RBAC algorithm operating on only 3 days of data. This is due to
the ability of the ABAC algorithm to identify and use patterns and create policies based on attributes, whereas the RBAC algorithm is restricted to using only RBAC semantics.
4.9 Summary
This paper explored methods for automatically generating least privilege ABAC policies
that balance between minimizing under- and over-privilege assignment errors. We defined
the ABAC Privilege Error Minimization Problem (PEMPABAC). We also presented metrics
and methodology for evaluating ABAC policies using out-of-sample validation. We adapted
techniques from unsupervised rule mining to create an algorithm which automatically per-
forms ABAC policy generation by mining audit logs with a variable weighting between under-
and over-privilege. We described optimization methods using feature selection, partitioning,
and parallelization to mine and score large ABAC privilege spaces. Finally, we presented the
results of applying our algorithm on a real-world dataset which demonstrated its effectiveness
as well as the better performance of our ABAC policies over mined RBAC policies.
This work suggests many possibilities for future research in generating secure ABAC
policies. Our candidate rule scoring metric, Cscore , can be expanded to consider policy com-
plexity (WSC), or our method can be combined with those which minimize policy complexity
only. Additional attributes may be incorporated from sources other than just audit logs such
as HR databases of user attributes, or by introspecting the application environment and ex-
tracting attribute information about existing resources. As the number of attributes grows,
so does the importance of feature selection for selecting highly relevant attributes that can
help improve the security of the generated policies without greatly increasing the runtime
and memory required by a mining algorithm.
CHAPTER 5
CONCLUSION
As access controls have evolved to cover the complex and various use cases of modern
computing, the burden of defining access control policies has also increased, often exceeding
the human ability to define policies that implement the Principle of Least Privilege. Along
with increasing complexity, the commoditization of computing power, such as cloud comput-
ing, has made it easier than ever for organizations to rapidly deploy computing resources with
minimal effort (or training), thus increasing the risks and damages that may be caused as a
result of poor access control policies. The research presented in this thesis demonstrates the
effectiveness of several automated methods for creating access control policies that achieve
the principle of least privilege with quantitatively evaluations of their performance at reduc-
ing under-privilege and over-privilege on real world datasets. More specifically, the individual
projects that comprise this thesis have made the following contributions to advance the state
of access control research:
2. We formally defined the Privilege Error Minimization Problem (PEMP) which de-
scribed the problem of creating complete and secure RBAC privilege policies. Using
our previously defined metrics and policy generation framework, we presented a methodology for training and validating one naive and two machine learning based algorithms. Again using real world data, we presented evaluation results for these algorithms.
3. We presented an association rule mining based algorithm to address the problem of
automatically creating ABAC policies. We also presented feature selection, scalability,
and performance optimization methods for processing the large privilege spaces that
are inherent to the ABAC environment. Using metrics adapted from our previous work
to better suit ABAC policies, we presented a quantitative analysis of the performance
of our mining algorithm using a real-world dataset and a comparison of our automat-
ically generated ABAC policies created by our mining algorithm with automatically
generated RBAC based policies.
REFERENCES CITED
[1] Harold F Tipton and Kevin Henry. Official (ISC) 2 guide to the CISSP CBK. Auerbach
Publications, 2006.
[2] Sara Motiee, Kirstie Hawkey, and Konstantin Beznosov. Do windows users follow the
principle of least privilege?: investigating user account control practices. In Symposium
on Usable Privacy and Security (SOUPS), 2010.
[6] Darren Pauli. Dev put AWS keys on Github. Then BAD THINGS hap-
pened. https://www.theregister.co.uk/2015/01/06/dev_blunder_shows_github_
crawling_with_keyslurping_bots, 2015. Accessed: 2018-10-21.
[7] U.S. Department of Commerce. 2016 Top Markets Report Cloud Computing. http://
trade.gov/topmarkets/pdf/Cloud_Computing_Top_Markets_Report.pdf, 2016. Ac-
cessed: 2017-03-23.
[9] Jerome H Saltzer and Michael D Schroeder. The protection of information in computer
systems. IEEE, 63(9):1278–1308, 1975.
[10] Ravi Sandhu, David Ferraiolo, and Richard Kuhn. The NIST model for role-based
access control: towards a unified standard. In ACM workshop on Role-based access
control, 2000.
[11] Jaideep Vaidya, Vijayalakshmi Atluri, and Janice Warner. Roleminer: mining roles
using subset enumeration. In Proceedings of the 13th ACM conference on Computer
and communications security, pages 144–153. ACM, 2006.
[12] Hassan Takabi and James BD Joshi. Stateminer: an efficient similarity-based approach
for optimal mining of role hierarchy. In Proceedings of the 15th ACM symposium on
Access control models and technologies, pages 55–64. ACM, 2010.
[13] Jürgen Schlegelmilch and Ulrike Steffens. Role mining with ORCA. In ACM Symposium
on Access control models and technologies (SACMAT), 2005.
[14] Ruowen Wang, William Enck, Douglas Reeves, Xinwen Zhang, Peng Ning, Dingbang
Xu, Wu Zhou, and Ahmed M. Azab. Easeandroid: Automatic policy analysis and
refinement for security enhanced android via large-scale semi-supervised learning. In
USENIX Security Symposium, 2015.
[15] Yongzheng Wu, Jun Sun, Yang Liu, and Jin Song Dong. Automatically partition
software into least privilege components using dynamic data dependency analysis. In
IEEE/ACM International Conference on Automated Software Engineering (ASE), 2013.
[16] Aaron Blankstein and Michael J. Freedman. Automating isolation and least privilege
in web services. In IEEE Symposium on Security and Privacy, pages 133–148. IEEE,
2014.
[18] Amazon Web Services. AWS Identity and Access Management (IAM). https://aws.
amazon.com/iam/, 2017. Accessed: 2017-02-20.
[19] Bob Violino. Cloud Computing Sees Huge Growth Rates Across All Seg-
ments. http://www.information-management.com/news/infrastructure/
cloud-computing-sees-huge-growth-rates-across-all-segments-10030682-1.
html, 2017. Accessed: 2017-09-07.
[20] Jerome H Saltzer and Michael D Schroeder. The protection of information in computer
systems. Proceedings of the IEEE, 63(9):1278–1308, 1975.
[22] Mario Frank, Joachim M Buhmann, and David Basin. On the definition of role mining.
In ACM Symposium on Access control models and technologies (SACMAT), pages 35–44.
ACM, 2010.
[23] Jaideep Vaidya, Vijayalakshmi Atluri, and Qi Guo. The role mining problem: finding
a minimal descriptive set of roles. In ACM Symposium on Access control models and
technologies (SACMAT), pages 175–184. ACM, 2007.
[24] Brian T. Sniffen, David R. Harris, and John D. Ramsdell. Guided policy generation for
application authors. In SELinux Symposium, 2006.
[25] Ian Molloy, Youngja Park, and Suresh Chari. Generative models for access control
policies: Applications to role mining over logs with attribution. In ACM Symposium on
Access control models and technologies (SACMAT). ACM, 2012.
[26] Suresh Chari, Ian Molloy, Youngja Park, and Wilfried Teiken. Ensuring continuous
compliance through reconciling policy with usage. In ACM Symposium on Access control
models and technologies (SACMAT), pages 49–60, 2013.
[27] Matthew Sanders and Chuan Yue. Automated least privileges in cloud-based web ser-
vices. In Hot Topics in Web Systems and Technologies (HotWeb). IEEE, 2017.
[29] IBM Corporation. z/OS Security Server RACF General User’s Guide.
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_1.13.0/com.ibm.
zos.r13.icha100/toc.htm, 2012. Accessed: 2017-05-17.
[30] Amazon Web Services. IAM Policy Generator Source Code. https://awsiamconsole.
s3.amazonaws.com/iam/assets/js/bundles/policies.js, 2017. Accessed: 2017-05-
04.
[31] David F Ferraiolo, Ravi Sandhu, Serban Gavrila, D Richard Kuhn, and Ramaswamy
Chandramouli. Proposed nist standard for role-based access control. ACM Transactions
on Information and System Security (TISSEC), 4(3):224–274, 2001.
[32] John D. Kelleher, Brian Mac Namee, and Aoife D’Arcy. Fundamentals of Machine
Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Stud-
ies. MIT Press, 2015. ISBN 0262029448, 9780262029445.
[33] Ian Molloy, Ninghui Li, Tiancheng Li, Ziqing Mao, Qihua Wang, and Jorge Lobo.
Evaluating role mining algorithms. In ACM Symposium on Access control models and
technologies (SACMAT), pages 95–104. ACM, 2009.
[34] Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice.
OTexts, 2014.
[36] Fabian Pedregosa et al. Scikit-learn: Machine learning in Python. Journal of Machine
Learning Research, 12:2825–2830, 2011.
[37] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based
algorithm for discovering clusters in large spatial databases with noise. In Knowledge
discovery in databases (KDD), volume 96, pages 226–231. AAAI Press, 1996.
[39] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. Classification
and regression trees. CRC press, 1984.
[40] Spyros Makridakis. Sliding simulation: A new approach to time series forecasting.
Management Science, 36(4):505–512, 1990.
[41] Spyros Makridakis, A Andersen, Robert Carbone, Robert Fildes, Michele Hibon, Rudolf
Lewandowski, Joseph Newton, Emanuel Parzen, and Robert Winkler. The accuracy of
extrapolation (time series) methods: Results of a forecasting competition. Journal of
forecasting, 1(2):111–153, 1982.
[42] Vincent C Hu et al. NIST SP 800-162: Guide to Attribute Based Access Control (ABAC) Definition and Considerations (Draft), 2013.
[43] Bill Fisher, Norm Brickman, Prescott Burden, Santos Jha, Brian Johnson, Andrew
Keller, Ted Kolovos, Sudhi Umarji, and Sarah Weeks. Attribute based access control.
NIST Special Publication 1800-3B.
[44] Arjumand Fatima, Yumna Ghazi, Muhammad Awais Shibli, and Abdul Ghafoor Abassi.
Towards attribute-centric access control: an abac versus rbac argument. Security and
Communication Networks, 9(16):3152–3166, 2016.
[45] Trevor Hastie, Jerome Friedman, and Robert Tibshirani. The elements of statistical
learning. Springer series in statistics New York, NY, USA, 2001.
[46] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. Mining frequent patterns without
candidate generation: A frequent-pattern tree approach. Data mining and knowledge
discovery, 8(1):53–87, 2004.
[47] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between
sets of items in large databases. In ACM sigmod record, volume 22, pages 207–216. ACM,
1993.
[48] Matthew W Sanders and Chuan Yue. Minimizing privilege assignment errors in cloud
services. In Proceedings of the ACM Conference on Data and Application Security and
Privacy, pages 2–12, 2018.
[49] Lujo Bauer, Scott Garriss, and Michael K Reiter. Detecting and resolving policy mis-
configurations in access-control systems. ACM Transactions on Information and System
Security (TISSEC), 14(1):2, 2011.
[50] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast algorithms for mining association
rules. In Proceedings of the International Conference on Very Large Data Bases, VLDB,
volume 1215, pages 487–499, 1994.
[51] Carlos Cotrini Jiménez, Thilo Weghorn, and David A. Basin. Mining abac rules from
sparse logs. 2018 IEEE European Symposium on Security and Privacy (EuroS&P),
pages 31–46, 2018.
[52] Branko Kavšek, Nada Lavrač, and Viktor Jovanoski. Apriori-sd: Adapting association
rule learning to subgroup discovery. In Michael R. Berthold, Hans-Joachim Lenz, Eliz-
abeth Bradley, Rudolf Kruse, and Christian Borgelt, editors, Advances in Intelligent
Data Analysis V, pages 230–241, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
ISBN 978-3-540-45231-7.
[53] Carlos E. Rubio-Medrano, Josephine Lamp, Adam Doupé, Ziming Zhao, and Gail-Joon
Ahn. Mutated policies: Towards proactive attribute-based defenses for access control. In
Proceedings of the Workshop on Moving Target Defense, 2017. ISBN 978-1-4503-5176-8.
[54] Zhongyuan Xu and Scott D Stoller. Mining attribute-based access control policies.
IEEE Transactions on Dependable and Secure Computing, 12(5), 2015.
[55] Tanay Talukdar, Gunjan Batra, Jaideep Vaidya, Vijayalakshmi Atluri, and Shamik
Sural. Efficient bottom-up mining of attribute based access control policies. In Proceed-
ings of the IEEE International Conference on Collaboration and Internet Computing
(CIC), pages 339–348, 2017.
[56] Ian Molloy, Hong Chen, Tiancheng Li, Qihua Wang, Ninghui Li, Elisa Bertino, Seraphin
Calo, and Jorge Lobo. Mining roles with semantic meanings. In ACM Symposium on
Access control models and technologies (SACMAT), pages 21–30, 2008.
[57] Zhongyuan Xu and Scott D Stoller. Mining attribute-based access control policies from
logs. In Proceedings of the IFIP Working Conference on Data and Applications Security
and Privacy, pages 276–291. Springer, 2014.