Received 1 November 2023, accepted 19 November 2023, date of publication 7 December 2023, date of current version 15 December 2023.


Digital Object Identifier 10.1109/ACCESS.2023.3340690

Analyzing Component Composability of Cloud Security Configurations

KANDASAMY MUNIASAMY 1, ROHIT CHADHA 2, PRASAD CALYAM 2 (Senior Member, IEEE), AND M. SETHUMADHAVAN 1
1 TIFAC CORE in Cyber Security, Amrita Vishwa Vidyapeetham, Amritanagar, Coimbatore, Tamil Nadu 641112, India
2 Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
Corresponding author: Kandasamy Muniasamy
The work of Rohit Chadha was supported in part by Grant NSF CNS 1553548 and Grant NSF SHF 1900924.

ABSTRACT Security is a major concern when building large-scale computer systems. Cloud services have
made it easier to provision large-scale systems on demand over the Internet. While the cloud service providers
provide the required building blocks such as compute units, database servers, and storage, customers are
still responsible for securely combining these systems to satisfy their organization’s security policy. The
secure development and operation of such large-scale systems present technical challenges. Composing a
larger system using components with known security properties that satisfy a given security policy without
re-analyzing the individual components is a difficult problem. In this study, we attempted to analyze the
composability of components from a security perspective using first-order predicate logic. We posit that if
we build a system using individual components that satisfy a security policy, the composed system will be
sound with regard to that policy. Additionally, the methodology can be used to identify drifts or violations
during future changes in the system by running checks during the system release cycles for continuous
verification.

INDEX TERMS Cloud security, composability, formal analysis, policy-based verification.

The associate editor coordinating the review of this manuscript and approving it for publication was Giovanni Pau.
2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023

I. INTRODUCTION
Though cloud computing offered by major players, such as Amazon with Amazon Web Services (AWS) and Microsoft Azure, has replaced traditional data centers for big and small companies, security remains a major responsibility for customers. AWS, for example, presents a shared responsibility model whereby Amazon is responsible for the security of the cloud, and each customer is responsible for security in the cloud [1]. AWS provides security building blocks such as Identity and Access Management (IAM), Security Groups, and Continuous Monitoring. However, customers are responsible for securing their AWS accounts and their applications from common security threats through not only the better design of their applications but also securely composing their operational systems with those building blocks.
In a distributed system as complex as AWS, one can design a secure Virtual Private Cloud (VPC) system composed of building blocks in various ways. Not all designs would exhibit the desired security properties. A misstep in security design is disastrous and costly for any organization. Like the discipline of programming, where programming with mathematical rigor yields programs of desired quality, correctness, simplicity, and elegance, proper use of security abstractions is needed for composing cloud systems.
We consider a sample application deployed in a VPC (Figure 1) that consists of a database, several compute instances, and a load balancer. This simple example is a model of how large-scale systems such as video conferencing services and credit card processing systems are deployed. This VPC has three subnets, two private and one public. Instances in the private subnet are not publicly addressable. The public subnet has an application load balancer that end users connect to using HTTPS over port 443. It also has a bastion host that provides Secure Shell (SSH) access to the compute instances in the private subnet. Web-based access to
the VPC resources is controlled by the Identity and Access Management (IAM) framework provided by AWS. Larger systems utilize several VPCs and many building blocks, such as those used in this sample application, in addition to other AWS components.

FIGURE 1. A sample VPC.

The fundamental difficulty with operational security in the cloud is due to the improper and insecure configuration of components [2], the selection of components with properties inconsistent with the rest of the system, and the lack of near-real-time evaluation and detection of offending components. In a public cloud environment, under the shared responsibility model, each customer is responsible for ensuring security in their operating environment. Although individual customers may follow certain best practices for building their system in the cloud using the building blocks from providers, there are no formal ways to ensure the security assurance of the overall system. Trust in cloud providers and process rigor cannot be substituted for sound practices that link the design of a system with analysis and evaluation in the ever-changing security landscape. Thus, securely composing a system will remain elusive without sound modeling and evaluation approaches that can provide proof of the desired outcome. To aid customers in building systems securely in the cloud, we pose a composability problem and a method to solve it. We state the problem of how to compose a larger system that satisfies a given security policy using individual components and ensure that such a system remains secure during its lifetime. To this end, we answer two central questions regarding this problem:
• How do we compose a system securely using the existing building blocks provided by cloud service providers?
• How do we verify that the composed system is “sound”? That is, the security properties always satisfy the given security policy.
We use mathematical logic to represent the security properties of the individual components and network semantics. The empirical security policies are coded as rules. We can then reason about these representations to prove that the composed system is sound from the given security policy perspective. We use this approach to discuss the sample application’s compositional security. Our incremental approach can be scaled to larger systems as more components are added. Because the security properties of cloud-based components can be automatically constructed using APIs, this model can be applied to any public cloud and on-premise systems that support APIs. The formal reasoning supported by the model can make useful predictions during the initial design as well as during the evolution of the system. Finally, practitioners can easily understand mathematical logic without any learning curve, and the barrier to adoption is minimal. We connect technical and operational validity by developing a formal methodology, implementing our approach, and experimenting with real-life AWS systems consisting of 60 VPCs with thousands of resources spread over multiple regions. We automated the generation of logical statements that describe the configuration of VPC resources. We coded the axioms that describe the behavior of the system components and security policies as logical implications. We then analyzed the combined knowledge base to identify components that compose and the ones that are non-compliant with regard to security policies.
The remainder of this paper is organized into seven sections. Section II covers the related research. Section III discusses our approach to representing the security properties of components and security policies. Section IV presents the implementation of the proposed approach for a sample application. Section V discusses the experimental results and scalability of the approach. Section VI identifies the study’s limitations and future research directions. Finally, section VII concludes this paper.

II. RELATED RESEARCH
The early use of formal methods dates back to the 80s when researchers studied the security of access control models such as Lampson, Bell-LaPadula, and Biba. This was followed by efforts to prove the correctness of operating system kernels and authentication protocols [3]. The 90s saw the increasing adoption of model checking in hardware verification to complement simulation, the use of notations such as Z and TLA+ for software specification, and the evolution of tools and their use in partial analysis in specification and verification [4], [5]. With the widespread use of cloud systems and the increased focus on thwarting cyber threats, formal methods are finding a niche again. Engineers at Amazon Web Services have used formal specification and model checking using TLA+ since 2011 to help solve difficult design problems in critical systems as complexity and scale increase [6]. In a multi-tenant cloud platform, each customer’s operating environment for their applications is built from the building blocks offered by the providers and should be verifiably secure from common threats. In our work, we gather the security properties of cloud system building blocks, such as storage, compute units (i.e., virtualized hosts), and databases, and check them against a security policy to ensure that there is no contradiction using first-order logic to prove composability.
The Hookup theorem, based on the “restrictiveness” property [7], and conditional composability, based on conditional non-dependency [8], are rigid and cannot be easily applied to
practical large-scale systems. The composition proposed for cryptographic protocols by Canetti et al. [9] does not work for general system composition.
AWS uses automated reasoning tools to verify the network reachability of resources in Virtual Private Clouds (VPCs) [10]. The tools used include Vampire, MonoSAT, and Soufflé, which formalize network semantics into logic and perform reasoning to answer network configuration and reachability questions. Examples include identifying any resources that are tagged ‘Bastion’ or any compute instances that are reachable from the Internet using the Secure Shell (SSH) protocol.
Microsoft Azure has developed a tool to validate network connectivity policies automatically [11]. The tool can check selected properties of policies, such as whether some traffic is permitted or denied, and compare two policies to identify drifts. The tool uses bit-vectors to encode policies and the theorem prover Z3 as the underlying solver.
These tools use logic to answer questions on the configuration that return a list or questions on reachability that return yes/no. Reference [12] postulates an approach for security vulnerability management in systems by representing the vulnerabilities in a system and corrective actions as a SAT problem. Reference [13] describes an automated SMT-based approach to prevent published vulnerabilities in components used in a composite cloud service. Our research focuses on representing the security properties of individual components and verifying whether they hold good in conjunction as well as satisfy a given set of security policy statements of an organization to prove the composability of those components. Although [10], [11], [12], and [13] have made good progress in applying formal methods to answer questions on network reachability, security configurations, and vulnerability remediation, these do not address the general composability of components with respect to a security policy. Our work builds on utilizing the security property representation to verify whether a given set of components can be composed to assemble a larger system, focusing primarily on platform components such as compute units, databases, and network elements, which are the fundamental building blocks of a cloud-based system.

III. REPRESENTATION OF SYSTEMS AND PROPERTIES
We require a simple and elegant notation to express the properties of systems, problems, and proofs. Notations based on mathematical symbols with a vocabulary much smaller than that of a natural language can be easily combined into expressions and manipulated using rules to produce new expressions. Propositional Logic and First-Order Logic (FOL) have been used extensively to represent knowledge about the world and reason with that knowledge. One of the major strengths of logical representation is expressiveness. We use FOL, specifically Prolog, for the representation and reasoning for our problem domain. Other choices include C.A.R. Hoare’s Communicating Sequential Processes (CSP) [14], TLA+ by Leslie Lamport [15], Portunes Algebra [16], Vampire [18] and Datalog with Soufflé [19]. We preferred to use Prolog because it is well known and has a better chance of adoption given that people in academia and industry are familiar with it. This section briefly introduces FOL and discusses the representation of security and network properties for a sample application.

A. A SHORT INTRODUCTION TO FOL
FOL uses terms, variables, constants, functions, and predicates as its main elements. The signature of the FOL formula is defined as follows:
$$\Sigma = \langle F, P, \mathit{arr} \rangle$$
where $F$ and $P$ are disjoint sets of function and predicate symbols, and $\mathit{arr}: F \cup P \to \mathbb{N}$ is an arity function giving the number of parameters for each symbol in $F$ and $P$. A term can be an object, a variable, a constant, or a function. A predicate $p(t_1, \ldots, t_n)$ is an atomic formula (atom for short) where each $t_i$ is a term. Predicates with zero arity are treated as propositions; hence, FOL subsumes propositional logic. Predicates capture the properties of objects and their relationship as in
    placement(computer_c1, vpc1, subnet1)
where the relationship indicates that computer c1 is in subnet1 of vpc1. Logical connectives such as conjunction (∧), disjunction (∨), negation (¬), and quantification (universal (∀) and existential (∃)) produce formulas. If a quantifier does not cover a variable in a formula, it is free; otherwise, it is bound. The use of variables helps compactly represent knowledge. A formula or a term is called ground if it has no occurrence of variables. Henceforth, facts will be ground formulas in a FOL signature. The logical implications are if-then rules. Quantified variables, predicates, logical connectives, and implications are combined into expressive formulas.
Reasoning is done by asserting facts and then deducing new facts using implications. A definite clause is a generalized implication of the form [20]:
$$(p_1 \wedge \cdots \wedge p_k) \rightarrow q.$$
All the free variables in a definite clause are assumed to be universally quantified across the clause. If $q$ is a propositional symbol, a predicate, or a logical constant such as false, then it is known as a Horn clause. An example is:
    likes(X, Y) ∧ likes(Y, X) → buddies(X, Y).
This implies that if X and Y like each other, they are buddies, assuming the domain of discourse to be people. When combined with facts likes(alice, bob) and likes(bob, alice), we obtain a new assertion buddies(alice, bob).
We use many-sorted FOL to represent the VPC resources as liberal relations. We explain the sorts, predicates, and their arguments used in the modeling later in subsection III-C.
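As a minimal sketch of the reasoning described above, the buddies implication can be written as a Prolog rule and queried against a few ground facts. The person carol and the one-sided fact likes(alice, carol) are hypothetical additions, included only to show a case in which the rule does not fire; the placement fact is the one used earlier in this subsection.

```prolog
% Ground facts: atoms with no variables.
likes(alice, bob).
likes(bob, alice).
likes(alice, carol).        % hypothetical one-sided fact, not from the text

% Definite clause: likes(X, Y) ∧ likes(Y, X) → buddies(X, Y).
% The consequent is the rule head; the antecedents form the body.
buddies(X, Y) :- likes(X, Y), likes(Y, X).

% Ground fact relating a resource to its VPC and subnet.
placement(computer_c1, vpc1, subnet1).
```

With these clauses loaded, the query ?- buddies(alice, bob). succeeds, ?- buddies(alice, carol). fails because likes(carol, alice) cannot be derived, and ?- placement(C, vpc1, subnet1). binds C to computer_c1.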

Interested readers are referred to [21] for a comprehensive review of FOL.

B. PROLOG
To reason using facts and implications automatically, we use a programming language. Prolog is the most widely used logic programming language and follows the FOL style. Implications in FOL are represented as rules in Prolog and are valid Horn clauses. For example, an implication:
    male(X) ∧ child(X, Y) → son(X, Y)
(X is male, and X is the child of Y implies X is the son of Y) is represented in Prolog as follows:
    son(X, Y) :- male(X), child(X, Y).
The variables in the Prolog rules are implicitly universally quantified. A Prolog program comprises facts and rules. Given a knowledge base of facts and rules, we can pose queries that are predicates with or without variables that return yes or no answers or all objects for which the query evaluates to true. For example, on starting the Prolog program that has loaded the knowledge base of facts corresponding to the predicates male(·) and child(·) and the above rule, one can pose two types of queries:
• A query that returns yes or no as to whether someone is a son of a given parent, as in ?- son(bob, john). The query may return a true or false answer based on whether the goal is evaluated to be true or false.
• A list query that returns the names of all sons of parent john, as in ?- son(X, john). Variable X is implicitly existentially quantified to mean returning at least one object that evaluates this goal to be true. Prolog returns all objects that satisfy the query using backtracking and unification.

C. REPRESENTING AWS VPC RESOURCES AS LOGICAL STATEMENTS
We briefly introduce the AWS VPC and explain our methodology for representing VPC resources as logical statements. The sample application (Figure 1) is representative of typical two-tier architectures with load balancers, where the incoming traffic is distributed evenly to the back-end servers (i.e., compute units) that connect to a database. The load balancer is in an external-facing subnet, whereas the back-end servers and the database are in internal subnets that are private and are not open to the Internet. Operations team members who need to access the servers will utilize Secure Shell (SSH) via a bastion host, which could be either open to the Internet or to their company’s VPN IP address range that passes traffic to the destination servers. In a distributed video conferencing application, for example, the initial interaction of the user is with such a system where the user’s credentials and meeting identifiers will be validated by the back-end servers utilizing the profile data of users stored in the databases and then redirected to the media servers where the meeting itself will be hosted in a geographical location closest to the user.

1) AWS VPC
To establish the security context for our sample application (Figure 1), we briefly describe the VPC concept in AWS. A VPC is a service that allows customers to launch AWS resources in a logically isolated virtual network that they define. An account may contain multiple VPC instances. One can specify an IP address range for the VPC, specify subnets (which are a range of IP addresses from the main block) that are public and private, and use multiple layers of security using Route Tables, Security Groups, and Network Access Control Lists (NACL). A route table specifies the VPC-level routing that is allowed, such as the connection to the Internet via the Internet Gateway, peering with other VPCs, and traffic between subnets. Security groups in AWS explicitly define allowed traffic flows, often specified as a tuple (protocol, port or port range, source IP). Only explicitly specified traffic patterns are allowed. The other flows are denied by default. In addition to the default security group at the VPC level, the resources in the VPC, such as the compute units and databases, will have associated security groups. The NACL at the VPC subnet level controls traffic to or from a subnet according to the inbound and outbound rules. Security groups and NACLs offer ‘defense in depth’ protection for VPC resources. These equivalent Software Defined Networking (SDN) constructs act as virtual firewalls to protect the network and resources.

2) VPC RESOURCES AS LOGICAL STATEMENTS
We now describe the modeling of the VPC construct and its resources as many-sorted structures to build a knowledge base (KB) of logical statements representing our sample application [17].
We list the sorts that are used in our modeling to describe the attributes of the system resources in Table 1.
The sort visibility_attribute is a set with two values, private and public. This attribute denotes whether the resource is public-facing or internal to the VPC. The direction_attribute is a set with two values, ingress and egress, and denotes a resource’s traffic direction. The protocolnum_attribute is a set of integers {6, 17, 1, −1}, where 6 denotes TCP, 17 denotes UDP, 1 denotes ICMP, and −1 denotes all protocols. The protocol_attribute is a set of protocol names {tcp, udp, icmp, −1}, where −1 denotes all protocols. The permission_attribute is a set of two values, allow and deny. The encryption_attribute is a set of two values, encrypted and unencrypted.
We describe the predicates used in our implementation over these sorts as follows. Table 2 presents the predicate names and their arguments.
• The predicate vpc describes the VPC construct, a logically isolated network in an AWS account in a specific region with an IP address block. The sorts
involved are inferred from Table 1. VpcId refers to unique identifiers for VPCs in an AWS account, IPAddressBlock refers to the non-intersecting IP address blocks in the CIDR notation (a.b.c.d/n, where a - d are octets and n is the number of consecutive leading 1-bits from left to right in the subnet mask) for the individual VPCs in an account, AccountId refers to the unique numeric account Ids in AWS and AWSRegion refers to the names of the AWS regions. In the knowledge base, we will have several instances of the relation vpc, one each for a VPC with a unique ID, the corresponding IP address block, and the AWS region name for a given AWS account where the VPC is created.

TABLE 1. Predicate argument names and their sorts.

TABLE 2. Predicates and their arguments.

• The predicate subnet describes a subnet within a VPC identified by a unique subnet ID with the corresponding IP address block and whether the subnet is public (i.e., Internet-facing) or private (internal-facing).
• The predicate nacl describes the Network Access Control List (NACL) associated with the VPC and its subnets. A VPC will have one or more NACLs, but a subnet can be associated with only one NACL. nacl_association associates a NACL with the VPC and a subnet. AclId is a unique identifier for a NACL in a region, AclAssocId is a unique identifier that associates a NACL with a VPC and a subnet in it, Direction indicates whether the traffic is ingress or egress, RuleNo is an integer, and ProtocolNo identifies the transport protocol. SourceIPAddress is the IP address of the source where traffic originates from in CIDR notation, FromPort and ToPort denote the starting and ending port numbers respectively, and Permission denotes whether the traffic is allowed or denied.
• The predicate secgroup describes the Security Group construct that enforces allowed traffic flows for AWS resources such as compute units and databases. SgName is a string, SgId is a unique identifier for the security group, Direction is one of (ingress, egress), RuleNo is the security group rule number, Protocol identifies the transport protocol, and FromPort and ToPort are the starting and ending port numbers. SourceIPAddress is the IP address of the source where traffic originates from in CIDR notation, and Description is a string.
• The predicate compute describes a compute unit. InstanceName is a set of strings that may be used to identify virtual machine instances, InstanceId is a unique ID created by AWS to identify a virtual machine instance, EncryptionStatus denotes whether or not the virtual machine has its disk drives encrypted, ModelName refers to one of the supported compute types in AWS and PublicOrPrivate indicates whether the compute unit is public-facing or internal. Additionally, a compute unit’s placement in a subnet of a VPC and its association with a security group are described by placement and secgrp_association predicates.
• The predicate rds describes a database unit. RdsName is a string, EncryptionStatus indicates whether the database is encrypted or not, and RdsType is a string that represents the type of database such as Postgres, and Aurora-MySQL. An rds relation will have one or more associated placement relations depending on how many subnets the RDS is created in as
well as one or more secgrp_association relations connecting it to the associated security groups.
• The predicate alb describes a load balancer. AlbName is a string, and AlbType is a string and refers to the type of load balancer, such as a network or application. Similar to compute and rds predicates, alb relation will have one or more associated placement relations depending on how many subnets the alb is created in as well as one or more secgrp_association relations connecting it to the associated security groups.
• The predicate placement describes a resource’s placement in a subnet within a VPC. Here, the ResourceType will refer to VPC resources compute, rds, or alb, and the ResourceName and ResourceId will be the corresponding name and id for the resource, respectively and SubnetId is the identifier of the subnet the resource is created in.
• The predicate secgrp_association describes an association between a VPC resource and a security group that governs traffic into and out of the resource. Here, the ResourceName and ResourceId will be the name and identifier for the resource associated with the security group with the name SgName and identifier SgId.

D. COMPOSABILITY VERIFICATION MODELING
1) KNOWLEDGE BASE
The set $P$ consists of all resource configuration predicates above (See Table 2). We assume, in addition to resource predicates, a set $S$ of system predicates is used to describe certain behaviors or specifications of the system. Each element of the set $S$ is a predicate, $s_i$, whose sorts we will leave unspecified since these depend upon the application. We describe the system behavior using the resource configuration predicates, and the system predicates as axioms that are FOL implications. These axioms are of the form:
$$p_1(\tilde{T}_1) \wedge \cdots \wedge p_k(\tilde{T}_k) \rightarrow s_i(\tilde{T})$$
where $p_1, \ldots, p_k$ belong to $P$, $s_i$ is a predicate in set $S$ and $\tilde{T}$ is a tuple of variables or constants of sorts described in Table 1. These implications are axioms that are system-dependent. Intuitively, we use these axioms to define certain characteristics, such as what it means for a VPC resource to be public or private based on network placement. Now, we formally define Knowledge Base.
Definition 1: A Knowledge Base (KB) is a triple $(P, S, A)$ consisting of a set of resource configuration predicates $P$, a set of system predicates $S$, and a set of system behavioral axioms $A$ of the form:
$$p_1(\tilde{T}_1) \wedge \cdots \wedge p_k(\tilde{T}_k) \rightarrow s_i(\tilde{T}).$$
We explain the KB contents with an example.
Example 1: Consider the following facts using the resource configuration predicates for a compute unit and a relational database unit.
    compute("compute_name", "compute_id", unencrypted, "r5_2xlarge", public).
    rds("rds_name", encrypted, "postgres").
    placement("compute_name", "compute_id", compute, "vpc_id", "subnet_id").
    placement("rds_name", "rds_id", rds, "vpc_id", "subnet_id").
    subnet("vpc_id", "subnet_id", ip(10,136,214,0/23), public).
The first fact describes a compute unit named compute_name with an id compute_id. The second fact describes the database unit named rds_name. The placement facts for the compute unit and the rds unit associate them with their corresponding subnets where these resources are created. The subnet fact describes the subnet configuration with id subnet_id.
A resource is public if it is created in a public subnet. Expressing this behavior as an axiom requires us to describe a system predicate isPublic and an implication statement in terms of the resource predicates placement and subnet. We express this as follows:
    placement(X, _, _, _, Y), subnet(_, Y, _, public) → isPublic(X).

2) SECURITY POLICIES
Once we have the knowledge base of the system, we can verify whether the components in the system satisfy the security policies set by the administrators. For our study, we assume a set $SPred$ of security policy goal predicates. Each element of the set $SPred$ is a predicate, $sp_i$, whose sorts we will leave unspecified since these depend upon the application. We represent security policies themselves as a set $SP$ of logical implication statements using the resource configuration and system predicates we described earlier and security policy goal predicates. These security policies are of the form:
$$p_1(\tilde{T}_1) \wedge \cdots \wedge p_k(\tilde{T}_k) \wedge s_1(\tilde{T}'_1) \wedge \cdots \wedge s_n(\tilde{T}'_n) \rightarrow sp_i(\tilde{T})$$
where $p_1, \ldots, p_k$ are the resource predicates, $s_1, \ldots, s_n$ are the system predicates, $\tilde{T}$ is a tuple of variables or constants of sorts, as explained before, and $sp_i$ is a security policy goal predicate in the set $SPred$.
Definition 2: Given a knowledge base $KB = (P, S, A)$, a set of security policy statements $SP$, a security policy goal predicate $sp_i(\tilde{T}) \in SPred$, and $\tilde{T}$ a tuple of constants and variables, the query $?- KB \cup SP, sp_i(\tilde{T})$ returns a set of assignments Assn_Set to variables in $\tilde{T}$ such that for each assignment $\rho \in$ Assn_Set,
• $KB \cup SP \models sp_i(\tilde{T}/\rho)$, where $\tilde{T}/\rho$ is interpreted as $\rho$ applied to $\tilde{T}$.
In case $\tilde{T}$ consists of only constants, it returns true if $KB \cup SP \models sp_i(\tilde{T})$ and false otherwise.
Remark 1: When we encode the above query in Prolog, false may be returned if Prolog finds no assignments or if
139940 VOLUME 11, 2023


K. Muniasamy et al.: Analyzing Component Composability of Cloud Security Configurations

Prolog is not able to prove that $KB \cup SP \models sp_i(\tilde{T})$ in the case $\tilde{T}$ consists of only constants. Thus, a false output by Prolog is distinct from the logical false, and care must be taken to interpret it.

Given a goal predicate, say, composableCompute pertaining to compute units, ?- KB ∪ SP, composableCompute(X̃) will return a set of assignments to X, which are the compute units that satisfy the security policy on the configuration requirements such as encryption, network placement, open ports, etc. The remaining compute units do not compose securely with regard to this policy, and those units could be obtained using the negative goal predicate expression: ?- KB ∪ SP, compute(X̃), not(composableCompute(X̃)). We explain this using the following example.

Example 2: Suppose we have the following system axioms that describe system characteristics regarding the encryption of a compute unit and placement in a private subnet. An underscore denotes a don't care value.

    compute(X, _, Y, _, _) ∧ Y = encrypted → encrypted(X).
    placement(X, _, _, _, Y) ∧ subnet(_, Y, _, private) → isPrivate(X).

Assume we have a composableCompute goal predicate defined in a security policy SP using the above system predicates and the resource predicate compute:

    compute(X, _, _, _, _) ∧ encrypted(X) ∧ isPrivate(X) → composableCompute(X).

The query: ?- KB ∪ SP, composableCompute(X) will return all compute units that are encrypted and created in a private subnet.

The query:

    ?- KB ∪ SP, compute(X, _, _, _, _), not(composableCompute(X))

will return the compute units that do not satisfy the policy.

IV. IMPLEMENTATION OF OUR APPROACH FOR A SAMPLE APPLICATION
A. USING PROLOG TO REPRESENT THE FACTS AND BEHAVIORS OF SYSTEM COMPONENTS
Using the predicates described above, we represent the ground facts regarding a VPC and its resources using Prolog. We describe the salient security properties of the components using atomic formulas (i.e., Prolog facts). For example, a compute unit is described by the following predicates:

    compute(instance_name, instance_id, encrypted, private),

that describes the compute unit.

    placement(instance_name, instance_id, compute, vpc_id, subnet_id),

which describes the network placement for the compute unit, such as the subnet in which it is created and the vpc for the subnet, and

    secgrp_association(instance_name, instance_id, sg_name, sg_id),

which associates with the security group that specifies the ingress and egress rules for the compute unit.

Using rules in Prolog, we represent the characteristics or behaviors that are FOL implications. An example is whether a compute unit is encrypted. An underscore (_) denotes the do not care values in the following statement:

    encrypted(X) :- compute(X, _, Y, _, _), Y = encrypted.

This rule is equivalent to the logical statement: compute(X, _, Y, _, _) ∧ Y = encrypted → encrypted(X). The consequent of the implication becomes the head of the Prolog rule, and the antecedents are in the body of the rule. The head and the body are separated by :-.

Another example of a behavior or a specification is to specify if a resource is accessible over the Internet using the Secure Shell protocol. The rule:

    canSSHFromInternet(X) :- compute(X, _, _, _, public),
        secgrp_association(X, _, Y, _),
        secgroup(Y, _, ingress, _, _, Fromport, Toport, ip(0,0,0,0/0), _),
        Fromport \= null, between(Fromport, Toport, 22),

states that one can access a compute unit using SSH if the compute unit is public-facing and is associated with a security group that permits ingress traffic on port 22. These general rules are unlikely to change from one system to another for a cloud provider.

We generated the knowledge base of logical statements for the sample application using the model described in Section III. The proposed implementation approach is illustrated in Figure 2. The inference model comprises two components. The first is the knowledge base of the security properties of the components in the system, which is an AWS VPC in this example. These properties are ground atomic formulas, which are Prolog facts in our implementation corresponding to the individual component configurations in the VPC. Additionally, we added network properties that are common to all VPCs, such as what is meant by a component that is public facing (i.e., open to the Internet), the reachability of one component to the other, and so on. These properties are system behavioral axioms. We automated the generation of Prolog facts corresponding to the component configurations using a bash shell program that utilized AWS' Command Line Interface (CLI) [23]. This program operates at an account level to generate Prolog facts for all VPCs in that account or individually for a given VPC through a command line argument containing VPC_id [22]. The network properties are coded manually as Prolog rules, which are implication statements in the FOL. These facts and rules form the knowledge base. The Security Policy is a set of Prolog rules that we code manually because this can vary from organization to organization and can be found in policy and


procedure documents in unstructured ways. The knowledge base and the Policy statements were fed to the SWI-Prolog interpreter for policy checking.

FIGURE 2. Proposed implementation.

B. AUTOMATIC GENERATION OF THE KNOWLEDGE BASE FOR A VPC
In AWS, an application is deployed in a VPC for network isolation. A distributed application may be deployed in multiple VPCs in the same region as well as in different regions and peered together if direct connectivity (i.e., using private IP addresses) between them is needed. Each region is a separate geographical area, such as us-west-2 (Oregon), us-east-1 (Virginia), or eu-west-1 (Ireland). Each region will have multiple availability zones for redundancy. Each availability zone comprises one or more discrete data centers [24]. A distributed application is designed and deployed with 'strong cohesion and loose coupling' by having subsystems with different functionalities in their own VPCs and then replicated to other regions for high availability. Certain subsystems will be centralized in one or two regions for disaster recovery and failover, and other subsystems will be deployed in different regions for proximity to users. For example, in a Video Conferencing Service that is used globally, an application layer that identifies and authenticates users can be deployed in its own VPCs in two regions. The application layer routes users to the media subsystem deployed in the region geographically closest to the users. The media layer is deployed in several regions worldwide for high availability and reduced latency. However, each region's individual subsystem design and composition are the same except for the IP address assignment, which will vary from region to region.

From a composability perspective, if an individual subsystem composition is sound in one VPC, we can infer that such a composition will be sound in every VPC with a similar configuration. Similarly, if the connectivity between subsystems is sound in a region (for example, in the case of the application layer and media layer subsystems in the Video Conferencing system discussed above), we can draw the same conclusion for other regions without having to repeat the analysis.

To draw composability reasoning for a subsystem deployed in a VPC with regard to a given security policy, we must gather logical statements about each component deployed in that VPC. AWS CLI, written in the Python programming language, is used to query AWS services from command line shells (such as bash, zsh, and tcsh on Linux and macOS, and command prompt and PowerShell on Windows). The basic structure of the command is [22]:

    aws <command> <subcommand> [options and parameters].

For example, to get information about the compute units in a VPC, the CLI command used is:

    aws ec2 describe-instances [options and parameters]

A sample set of options and parameters could be:

    --filters Name=vpc-id,Values=<actual vpc id>,

that selects the EC2 instances in the VPC with the given ID. The option:

    --query 'Reservations[*].Instances[*].{
        Instance:InstanceId,
        Subnet:SubnetId, Tags:Tags,
        SecurityGroups:SecurityGroups,
        PrivateIpAddress:PrivateIpAddress,
        PublicIpAddress:PublicIpAddress,
        InstanceType:InstanceType,
        BlockDeviceMappings:BlockDeviceMappings }'
    --output json

specifies the parameters and the output format.

We implemented a bash shell program that used AWS CLI to gather configuration details and encoded them as Prolog facts. The program can be run on any system that supports a bash shell. The credential required to run the program is an access key and a secret access key pair. The program can fetch the configurations for all VPCs in a region or a single VPC of interest based on the argument passed at run time. We discuss the implementation details in the next two subsections.

C. GATHERING VPC CONFIGURATION
The information about a VPC is queried using the command:

    aws ec2 describe-vpcs --vpc-ids vpcId

The response is returned as a JSON object such as:

    {
      "Vpcs": [
        {
          "CidrBlock": "10.0.0.0/16",
          "State": "available",
          "VpcId": "vpc-id",
          "OwnerId": "acct-id",
          "CidrBlockAssociationSet": [
            { ... }
          ],
          "Tags": [
            { ... }
          ]
        }


      ]
    }

We encode this configuration in Prolog as:

    vpc("vpc-id", ip(10,22,12,0/22), "acct-id", us-west-2).

The fact vpc describes the VPC by its ID, its CIDR block, the AWS account ID, and the region it is created in. The parameters vpc-id and acct-id are the actual identifiers returned in the JSON response, and us-west-2 is the AWS region where the VPC is located and is passed on as a configuration parameter for the shell program. We represent the IP addresses in the CIDR notation as ip(a, b, c, d/n), where a through d are the octets of the IP address and n is the number of bits in the subnet mask. We use this notation to easily match the IP addresses and check whether a given IP address is in the CIDR block using Prolog.

Similarly, we fetch salient VPC configurations, such as subnets, network access control lists (NACL), and security groups, and encode them as Prolog statements as explained below:

A subnet is a range of IP addresses in a single availability zone. Resources such as compute units and databases are deployed in a subnet. The AWS CLI command:

    aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpcId"

returns a JSON array such as:

    {
      "Subnets": [
        {
          "AvailabilityZone": "us-west-2b",
          "CidrBlock": "10.0.1.0/24",
          "MapPublicIpOnLaunch": true,
          "SubnetId": "subnet-id1",
          "VpcId": "vpc-id",
          "OwnerId": "acct-id",
          "Tags": [
          ],
          "SubnetArn": "subnet arn"
        },
        {
          "AvailabilityZone": "us-west-2a",
          "CidrBlock": "10.0.0.0/24",
          "MapPublicIpOnLaunch": false,
          "SubnetId": "subnet-id2",
          "VpcId": "vpc-id",
          "OwnerId": "acct-id",
          "Tags": [
          ],
          "SubnetArn": "subnet arn"
        }
      ]
    }

The MapPublicIpOnLaunch attribute indicates whether a subnet is private or public. We encode these as the following Prolog facts:

    subnet("vpc-id", "subnet-id1", ip(10,0,1,0/24), public).
    subnet("vpc-id", "subnet-id2", ip(10,0,0,0/24), private).

A network access control list (NACL) acts as a firewall for the VPC that controls traffic in and out of the subnets. Multiple subnets can be associated with a NACL entry identified by an id, but each subnet can be associated with only a single NACL entry. NACL entries are evaluated in ascending order of rule number, so lower-numbered rules take precedence. The default-deny rule will apply if none of the NACL entries matches. We fetch the NACL configuration using the CLI command:

    aws ec2 describe-network-acls --filters "Name=vpc-id,Values=vpcId".

The response is an array of NACL entries as JSON objects. A NACL entry has a NACL id, an association id with each subnet to which the NACL applies, and a set of ingress and egress rules. A sample is provided below. Please note that we have masked the suffix of various ids that are in hex and replaced it with a placeholder 'id1'. JSON format does not allow comments. We have included the comments for explanatory purposes only.

    {
      "NetworkAcls": [
        {
          "Associations": [
            {
              "NetworkAclAssociationId": "acl-id1",
              "NetworkAclId": "acl-id1",
              "SubnetId": "subnet-id1"
            },
            { ..... }
          ],
          "Entries": [ // Egress Ingress rules
            { // Permits SSH traffic to the Internet
              "CidrBlock": "0.0.0.0/0",
              "Egress": true,
              "PortRange": {
                "From": 22,
                "To": 22
              },
              "Protocol": "6",
              "RuleAction": "allow",
              "RuleNumber": 200
            },
            { // Default deny rule
              "CidrBlock": "0.0.0.0/0",
              "Egress": true,
              "Protocol": "-1",



              "RuleAction": "deny",
              "RuleNumber": 1000
            },
            { // Permits SSH traffic from subnets
              "CidrBlock": "10.0.1.0/24",
              "Egress": false,
              "PortRange": {
                "From": 22,
                "To": 22
              },
              "Protocol": "6",
              "RuleAction": "allow",
              "RuleNumber": 200
            },
            { ..... }
          ],
          "IsDefault": false,
          "NetworkAclId": "acl-id1",
          "Tags": [
            {
              "Key": "BILLING",
              "Value": "RD"
            },
            < more tags >
          ]
        },
        < more associations >
      ]
    }

The corresponding Prolog facts are:

    nacl_association("vpc-id", "aclassoc-id1", "acl-id1", "subnet-id1").
    nacl("vpc-id", "acl-id1", egress, 200, 6, 22, 22, ip(0,0,0,0/0), allow).
    nacl("vpc-id", "acl-id1", ingress, 200, 6, 22, 22, ip(10,0,1,0/24), allow).
    nacl("vpc-id", "acl-id1", egress, 1000, -1, -1, -1, ip(0,0,0,0/0), deny).

The fact nacl_association describes the relationship between the subnets and the NACL groups in a VPC. The fact nacl describes a nacl entry by the vpc-id it is associated with, the acl ID, the rule type (ingress or egress), the rule number, the protocol (such as TCP as identified by the numeral 6), beginning port number, ending port number, the source IP address, and the action, allow or deny.

Similarly, we encoded the Prolog facts corresponding to the security groups based on the responses for the corresponding CLI commands. In addition to NACLs, security groups are associated with resources in the VPC. While NACLs support 'allow' and 'deny' rules, security groups only support 'allow' rules. NACLs are stateless, and return traffic has to be permitted by explicit rules, unlike security groups where return traffic is automatically permitted. Because NACLs are at the subnet level, the rules apply to all resources at this subnet level and provide an extra layer of protection if the security groups are overly permissive.

    secgroup("sg-name", "sg-id", ingress, 0, tcp, 0, 65535, ip(a,b,c,d/32), null).
    secgroup("sg-name", "sg-id", ingress, 2, tcp, 22, 22, ip(w,x,y,z/32), null).
    secgroup("sg-name", "sg-id", egress, 0, -1, 0, 65535, ip(0,0,0,0/0), null).

The fact secgroup uses the following schema:

    secgroup(name, id, ingress/egress, rule type, protocol (tcp, udp, etc.), from port, to port, source/destination, and description).

For example,

    secgroup(launch-wizard-2, "sg-0e37d454d98079623", ingress, 0, tcp, 22, 22, ip(0,0,0,0/0), null)

specifies that ingress access from the Internet (0.0.0.0/0) to port 22 (SSH) is allowed.

D. REPRESENTATION OF VPC RESOURCES
The fact compute describes a compute unit with its instance name, instance ID, encrypted or unencrypted volume used, type or model of the unit, and whether it is public or private. The fact placement describes the vpc and the subnet in which the compute unit is created, and the fact secgrp_association relates the security group name and the security group ID with the unit.

    secgrp_association("instance-name", "instance-id", "sg-name", "sg-id").
    compute("instance-name", "instance-id", unencrypted, "c5_2xlarge", private).
    placement("instance-name", "instance-id", compute, "vpc-id", "subnet-id1").

The fact alb describes a load balancer by its name and type, such as 'application' or 'network'. The fact placement connects the vpc and the subnets the load balancer is associated with.

    alb("alb-name", application).
    placement("lb-name", "lbname-id.elb.us-west-2.amazonaws.com", alb, "vpc-id", "subnet-id1").
    placement("lb-name", "lbname-id.elb.us-west-2.amazonaws.com", alb, "vpc-id", "subnet-id2").
    placement("lb-name", "lbname-id.elb.us-west-2.amazonaws.com", alb, "vpc-id", "subnet-id3").

The fact rds describes the rds instance by its name, whether encrypted at rest or not, and the type of


database (MySQL, Postgres, and so on). Similar to the fact compute, there is an associated placement fact that describes the vpc and subnet in which the rds instance is created, and a secgrp_association fact that connects it with its security group name and its ID.

    rds("rds-name", encrypted, "postgres").
    placement("rds-name", "db-resourceid", rdms, "vpc-id", "subnet-id1").
    placement("rds-name", "db-resourceid", rdms, "vpc-id", "subnet-id2").
    secgrp_association("rds-name", "db-resourceid", "sg-name", "sg-id").

Prolog facts for our sample VPC (Figure 1) and its resources are presented in Tables 3 and 4. The resources considered are compute units, a bastion host, a load balancer, and a relational database. We have substituted placeholders for the actual resource names and resource IDs. In the next subsection, we discuss how we use Prolog rules to represent the semantics of network properties and security policies. As discussed above, the facts are automatically created from the AWS account using a shell program and AWS CLI.

TABLE 3. Sample VPC configuration.

TABLE 4. Sample VPC resource configuration.

E. REPRESENTATION OF NETWORK PROPERTIES
We represent the network properties in the form of rules. The properties are generic and are based on the schema we decided on for the security properties of the components in subsections IV-C and IV-D. For example, a resource is public if it is created in a public-facing subnet. A compute unit is accessible from the Internet via SSH if the unit is public and has an associated security group that permits the SSH protocol.

    isPublic(X) :- placement(X, _, _, _, Y),
        subnet(_, Y, _, public).
    isPrivate(X) :- placement(X, _, _, _, Y),
        subnet(_, Y, _, private).

A resource X is public if it is in a subnet Y, which is public.

    canSSHFromInternet(X) :- isPublic(X),
        secgrp_association(X, _, Y, _),
        secgroup(Y, _, ingress, _, -1, null, null, ip(0,0,0,0/0), _).
    canSSHFromInternet(X) :- isPublic(X),
        secgrp_association(X, _, Y, _),
        secgroup(Y, _, ingress, _, tcp, FromPort, ToPort, ip(0,0,0,0/0), _),
        FromPort \= null, between(FromPort, ToPort, 22).

The first SSH rule handles a security group that opens all ports (indicated by null) pertaining to all protocols (indicated by -1) to the Internet and includes TCP port 22. The second rule explicitly checks whether the TCP port 22 is within the specified range.

F. REPRESENTATION OF SECURITY POLICIES IN PROLOG
Security policies can be rules or facts that specify required conditions. For example, encryption at rest for compute and database instances is coded as:

    encrypted(X) :- compute(X, _, Y, _, _), Y = encrypted.
    rdsEncryptionAtRest(X) :- rds(X, Y, _), Y = encrypted.

Similarly, traffic to standard ports such as TCP ports 22, 80, and 443 can be specified, and the security groups that permit traffic to other ports can be identified using the following rule:

    nonCompliantSecGroup(ID) :- secgroup(ID, _, ingress, _, _, X, _, ip(0,0,0,0/0), _),
        not(memberchk(X, [443, 80, 22])).

The following policy rules specify the conditions for the composability of EC2 instances and database instances.

    composableCompute(X) :- encrypted(X), (
        isPrivate(X);
        (not(canSSHFromInternet(X)),
         secgrp_association(X, _, Y, _),
         not(nonCompliantSecGroup(Y)))).
    composableRDS(X) :- rdsEncryptionAtRest(X), isPrivate(X).
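The network and policy rules of subsections IV-E and IV-F can be exercised end to end on a small knowledge base. The following is a minimal sketch, assuming SWI-Prolog. The facts are hypothetical placeholders chosen for illustration (they are not the contents of Tables 3 and 4); the rules are reproduced from the text so that the file loads on its own.

```prolog
% --- Hypothetical facts (made-up names and IDs, schemas as in IV-C and IV-D) ---
subnet("vpc-1", "subnet-pub",  ip(10,0,1,0/24), public).
subnet("vpc-1", "subnet-priv", ip(10,0,0,0/24), private).

compute("web-1", "i-web1", encrypted,   "t3_micro", public).
compute("api-1", "i-api1", encrypted,   "t3_micro", public).
compute("app-1", "i-app1", encrypted,   "t3_micro", private).
compute("old-1", "i-old1", unencrypted, "t3_micro", private).

placement("web-1", "i-web1",  compute, "vpc-1", "subnet-pub").
placement("api-1", "i-api1",  compute, "vpc-1", "subnet-pub").
placement("app-1", "i-app1",  compute, "vpc-1", "subnet-priv").
placement("old-1", "i-old1",  compute, "vpc-1", "subnet-priv").
placement("db-1",  "db-res1", rdms,    "vpc-1", "subnet-priv").

rds("db-1", encrypted, "postgres").

secgrp_association("web-1", "i-web1", "sg-web", "sg-0001").
secgrp_association("api-1", "i-api1", "sg-api", "sg-0002").
secgrp_association("app-1", "i-app1", "sg-app", "sg-0003").
secgrp_association("old-1", "i-old1", "sg-app", "sg-0003").

% sg-web opens SSH to the Internet; sg-api opens only HTTPS;
% sg-app accepts HTTPS from the public subnet only.
secgroup("sg-web", "sg-0001", ingress, 0, tcp, 22,  22,  ip(0,0,0,0/0),   null).
secgroup("sg-api", "sg-0002", ingress, 0, tcp, 443, 443, ip(0,0,0,0/0),   null).
secgroup("sg-app", "sg-0003", ingress, 0, tcp, 443, 443, ip(10,0,1,0/24), null).

% --- Network properties (as in subsection IV-E) ---
isPublic(X)  :- placement(X, _, _, _, Y), subnet(_, Y, _, public).
isPrivate(X) :- placement(X, _, _, _, Y), subnet(_, Y, _, private).

canSSHFromInternet(X) :- isPublic(X), secgrp_association(X, _, Y, _),
    secgroup(Y, _, ingress, _, -1, null, null, ip(0,0,0,0/0), _).
canSSHFromInternet(X) :- isPublic(X), secgrp_association(X, _, Y, _),
    secgroup(Y, _, ingress, _, tcp, FromPort, ToPort, ip(0,0,0,0/0), _),
    FromPort \= null, between(FromPort, ToPort, 22).

% --- Security policy rules (as in subsection IV-F) ---
encrypted(X) :- compute(X, _, Y, _, _), Y = encrypted.
rdsEncryptionAtRest(X) :- rds(X, Y, _), Y = encrypted.

nonCompliantSecGroup(ID) :-
    secgroup(ID, _, ingress, _, _, X, _, ip(0,0,0,0/0), _),
    not(memberchk(X, [443, 80, 22])).

composableCompute(X) :- encrypted(X),
    (   isPrivate(X)
    ;   not(canSSHFromInternet(X)),
        secgrp_association(X, _, Y, _),
        not(nonCompliantSecGroup(Y))
    ).
composableRDS(X) :- rdsEncryptionAtRest(X), isPrivate(X).

% Example queries:
% ?- composableCompute(X).
% ?- composableRDS(X).
% ?- compute(X, _, _, _, _), not(composableCompute(X)).
```

With these assumed facts, composableCompute(X) succeeds for "api-1" (public, but only port 443 is open) and "app-1" (private), composableRDS(X) succeeds for "db-1", and the negative query enumerates "web-1" (SSH open to the Internet) and "old-1" (unencrypted).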


The compute composability rule requires an instance to use only encrypted disk storage, to be in a private subnet or not publicly reachable over SSH, and not have TCP ports open to the Internet other than the standard ports permitted by the policy. The rds composability requires the rds instance to be encrypted at rest and to be in a private subnet. If we apply these two rules to the facts in Tables 3 and 4, none of the compute units will satisfy the composability conditions because all of them are in public subnets, but the rds instance will be composable because it is in a private subnet and is encrypted.

G. VULNERABILITIES IDENTIFIABLE FROM COMPONENT CONFIGURATIONS
While AWS provides building blocks, it is still left to the customers to configure them securely under the shared responsibility model. For example, one can launch a compute unit but could configure it in the following insecure ways:
• Use the elastic block storage without encryption, leaving the information stored in the volume available for cloud providers to access.
• Set up the compute unit to be Internet-facing with SSH ports open, leaving it discoverable on the Internet for password brute forcing and other attacks.
• Expose the services to the Internet directly without a load balancer, allowing hackers to fingerprint the services running on the compute instance and plan attacks based on published vulnerabilities.
Similarly, one can launch a database instance without encryption at rest and with a public IP address, making it susceptible to password attacks and eventual data breaches.

We consider these two categories of vulnerabilities arising from misconfigurations in this paper. This can be expanded to include improper handling of access key/secret key pairs, not enabling multi-factor authentication for users, not using the right TLS protocol versions for protecting data in transit, etc.

The security policies are designed to detect and address vulnerabilities. While we can identify the components that are composable with regard to a given policy, we can also enumerate the ones that do not compose using the negative goal predicate. For example, if a compute unit does not satisfy the goal predicate encrypted(X) for compute units, then we can infer that there is a vulnerability due to the lack of encryption at rest for the elastic block storage. Similarly, we can identify compute units that are in the public subnet and hence will be discoverable for brute force attacks.

Example 3: Given a KB of compute units, we can list the compute units that are not encrypted or in a public subnet using the following queries. An underscore denotes don't care values.

    compute(X, _, _, _, _), not(encrypted(X)).
    compute(X, _, _, _, _), not(isPrivate(X)).

Example 4: Another example is the composability rule for database instances. We can identify vulnerable database instances using the negative predicate goal as follows:

    rds(X, _, _), not(composableRDS(X)).

Any values returned for X will denote database instances that are either not encrypted or public-facing.

We presented an approach to building the KB of security properties of individual components as logical facts, representing security policies as rules and reasoning about them to verify whether the facts satisfy the policies. This approach is general and can be applied to large distributed systems such as banking and financial or video conferencing applications because these systems are built using components such as VPCs, compute units, and databases, which are the basic building blocks. There will be several thousands of similar facts for such systems. Resource configuration facts and system behavior rules can be added to the framework we discussed for more components, such as queuing, notification, and identity and access management. Security policy rules can be coded based on the application domain and verified using the properties of the individual components to prove the soundness of the composition.

H. VERIFYING THE SECURITY POLICY STATEMENTS
The components compose securely if and only if every statement in the security policy is satisfied by the combined model of the components. For example, if we want to verify the security policy that all compute units should have their EBS volumes encrypted, the statement is posed as a query: encrypted(X). Prolog returns the compute units with encrypted EBS volumes. Similarly, assume a security policy where all compute units and database instances must be in private subnets as specified in the statement:

    isPrivate(X), (compute(X, _, _, _, _); rds(X, _, _)).

When issuing this query, X will be instantiated to the compute units and rds instances in the private subnet, and the query will succeed.

Thus, composable building blocks such as compute units and database instances can be assembled from a given VPC configuration and network properties. Two such sample rules are composableCompute and composableRDS presented in subsection IV-F.

V. PERFORMANCE EVALUATION AND COMPOSITION AS A WAY TO SCALE
We implemented our prototype using a bash script and AWS Command Line Interface to fetch the configuration of VPCs in the AWS account and the compute, rds, and load balancer resources contained within it and translate them into Prolog facts as we described in section IV on Page 139941. The program is general and can be run on any AWS account using an access key and a secret access key pair. The program enumerates through VPC-level network configurations such as subnets, network access control lists (NACLs), security groups, and resources such as compute units and database instances and their attributes. The configuration information is written as Prolog facts in a file. We manually verified


the program output using the AWS resource configuration information for subnets, NACLs, security groups, compute units, load balancers, and RDS instances to ensure the correctness of the mapping. Network properties, such as canSSHFromInternet, and Security Policies are coded as Prolog rules manually. The program includes the required Prolog libraries and the rules file in the output file, so the SWI-Prolog (swipl) interpreter can be directly invoked to process it. KB generation was tested using the bash script and AWS CLI on a MacBook Pro running macOS Monterey version 12.5 with a 2.6GHz 6-Core Intel Core i7 processor and 16GB of memory. The statistics for generating the KB for a VPC with 423 resources and their corresponding 1561 facts are shown in Table 5. The CLI takes 1 to 3 seconds to return the resource information in the JSON format. More time is spent parsing and generating facts about the security groups and compute units. We use the jq utility to parse and extract data elements from JSON documents. Multiple invocations of jq contribute to the increased time spent processing security groups and compute unit information.

TABLE 5. Time taken for KB generation (s).

We experimented with the knowledge base generation program on larger AWS systems, iterating through every AWS region in an account, and then within each region, every VPC and gathered the resource configuration information at the VPC level. The seven KBs we generated from AWS accounts consisted of 520 to 27,305 facts corresponding to 5 to 60 VPCs, as shown in Table 7. The running time to generate the KBs roughly corresponds to the per-fact elapsed time listed in Table 5.

We used SWI-Prolog version 8.2.4 on a MacBook Pro for the study. When the Prolog KB with policy rules outlined in subsection IV-F on Page 139945 was run in SWI-Prolog, we measured the elapsed time for the typical composition rules. We used the time predicate to measure the number of inferences involved in a goal and the average time taken based on 10,000 executions of the goal. For example, for the goal composableRDS(X), the number of inferences was four, and the elapsed time was 4 µs to return one answer. For the list query, composableRDSList(), that returns 11 of the 12 RDS instances from our KB of 6,722 facts, the number of inferences was 102, and the elapsed time was 55 µs. The one RDS instance that was not returned in the results was due to the instance not using encryption at rest, which violated the security policy. The results are presented in Table 6. The average time taken per RDS instance is 4 to 5.5 µs.

TABLE 6. Execution time for verification - DB instances (time in µs).

Similarly, for the goal composableCompute(X), the number of inferences was four, and the time taken was 5 µs to return a single answer, whereas, for composableComputeList(X), the average response time for a compute instance varies from 6.63 µs to 12.43 µs depending on the amount of backtracking involved during processing. The results are presented in Table 7 for various counts of compute instances in a real-life system we experimented with. As shown in Figure 3, the average response time increases as the number of compute units increases due to search cost. The average number of inferences decreases slightly as the number of compute units increases, but this could be impacted by the increased amount of backtracking if there are more non-compliant compute units. Figure 4 shows that as the number of facts increases in the KB, the average response time per result returned increases due to increased search cost. This aligns with the average response time trend in Figure 3 since the compute units contribute to the facts more significantly than the RDS instances. Overall, we conclude that the response time range of 5 to 12.43 µs per component is acceptable in practical systems where security policy evaluation is conducted as a background activity during an initial design of the system and when there are changes to the system during release cycles, unlike online transaction processing involving users.

TABLE 7. Execution time for verification - compute instances (µs).

We further analyzed the response time and the number of inferences for the subgoals in the composability rule for compute units:

    composableCompute(X) :- encrypted(X), (
        isPrivate(X);
        (not(canSSHFromInternet(X)),
         secgrp_association(_, X, Y, _),
         not(nonCompliantSecGroup(Y)))).

VOLUME 11, 2023 139947


K. Muniasamy et al.: Analyzing Component Composability of Cloud Security Configurations

TABLE 8. Comparison of response time for the combined goal and the conditions in the body of the rule (µs).
FIGURE 3. Average response time in µs and number of inferences vs. number of compute units involved.
FIGURE 4. Average response time in µs vs. total number of facts involved.
FIGURE 5. Analysis of response times for subconditions.
TABLE 9. Comparison of inferences for the combined goal and the conditions in the body of the rule.

    composableComputeList(L) :-
        findall(X, composableCompute(X), Y),
        sort(Y, L).
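As a minimal sketch of how these two rules behave, the following self-contained toy KB can be loaded in SWI-Prolog. All facts, instance names, and the argument roles of secgrp_association/4 are hypothetical and chosen only for illustration; in the study the facts are generated from AWS configurations. The sketch also uses the standard \+ operator in place of not.

```prolog
% Toy knowledge base. Every fact below is a hypothetical example,
% not data from the study.
encrypted(i1).
encrypted(i2).
encrypted(i3).

isPrivate(i1).

canSSHFromInternet(i3).

% Assumed argument roles: secgrp_association(Vpc, Instance, SecGroup, Region).
secgrp_association(vpc1, i1, sg1, r1).
secgrp_association(vpc1, i2, sg2, r1).
secgrp_association(vpc1, i3, sg3, r1).
secgrp_association(vpc1, i4, sg2, r1).

nonCompliantSecGroup(sg3).

% Composability rule for compute units, as given in the text,
% with \+ (negation as failure) written in place of not/1.
composableCompute(X) :-
    encrypted(X),
    (   isPrivate(X)
    ;   (   \+ canSSHFromInternet(X),
            secgrp_association(_, X, Y, _),
            \+ nonCompliantSecGroup(Y)
        )
    ).

% Collect all composable compute units; sort/2 also removes duplicates.
composableComputeList(L) :-
    findall(X, composableCompute(X), Y),
    sort(Y, L).
```

Under these assumed facts, the query composableComputeList(L) yields L = [i1, i2]: i3 can be reached by SSH from the Internet, and i4 is not encrypted. Instance i1 satisfies both branches of the disjunction, so findall collects it twice and sort removes the duplicate.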
We measured the average response times and the number of inferences, based on 10,000 executions, for each of the conditions in the body of the rule: encrypted, isPrivate, and the not conditions
    (not(canSSHFromInternet(X)),
     secgrp_association(_, X, Y, _),
     not(nonCompliantSecGroup(Y))),
as well as the OR (isPrivate; not conditions) combination.
The sum of splits is the sum of the results for encrypted and
the OR condition. The results are presented in Table 8 and
Figure 5.
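The per-condition measurement can be approximated with a small helper. This is a sketch rather than the measurement code used in the study: the study used the time predicate, whereas this version reads the inferences and cputime counters through SWI-Prolog's statistics/2. The helper name measure/4 and the one-fact KB are assumptions made for illustration. The loop itself adds a few inferences per iteration, so the averages slightly overstate the cost of the goal alone.

```prolog
% Hypothetical one-fact KB so that the example is self-contained.
encrypted(i1).

% measure(+Goal, +N, -AvgInferences, -AvgMicros)
% Runs Goal N times (first solution only) and reports the average
% number of inferences and the average CPU time in microseconds.
measure(Goal, N, AvgInferences, AvgMicros) :-
    statistics(inferences, I0),
    statistics(cputime, T0),
    % ignore/1 succeeds whether or not Goal succeeds, so failing
    % subgoals are measured as well.
    forall(between(1, N, _), ignore(Goal)),
    statistics(cputime, T1),
    statistics(inferences, I1),
    AvgInferences is (I1 - I0) / N,
    AvgMicros is (T1 - T0) * 1.0e6 / N.
```

A call such as measure(encrypted(_), 10000, I, T) binds I and T to the two averages. Measuring each condition of the rule body separately in this way and comparing the totals with the combined goal mirrors the "sum of splits" comparison described above.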
We can conclude from this analysis that the sum of the response times of the individual conditions in the body is roughly equal to that of the overall goal, except where individual subgoal failures cause more backtracking and increase the overall average response time; the response times of individual subgoals measured without the influence of other subgoals are lower.
Similarly, we analyzed the inferences for the overall goal and the conditions in the body of the rule and have presented the results in Table 9 and Figure 6.
FIGURE 6. Analysis of average inferences for subconditions.
We observe that there is a constant overhead when the number of compute units in the result is smaller, which


increases the average number of inferences but decreases as the number increases. In result sets where more instances do not satisfy the goal, there is more backtracking, which increases the number of inferences. The general pattern of response time and inferences is linear and remains consistent for both the combined goal and the subgoals.
We also studied the response time and average number of inferences for retrieving the list of compute units that do not conform to the security policy and hence are vulnerable to the common attacks explained in subsection IV-G on Page 12. This list complements the list of composable compute units and helps operations personnel remediate those units so that they comply with the security policy. The results are presented in Table 10 and Figure 7.
TABLE 10. Average response time (µs) and inferences for the non-compliant compute units.
FIGURE 7. Average response time in µs and inferences to retrieve non-compliant compute units in the system.
Practitioners can concentrate on just getting the list of non-compliant components in a system to address the issues and then, as a double check, run a compliance check to ensure that all components are returned by the composability query.

A. EXTENSIBILITY OF THE PROLOG FRAMEWORK
While Prolog is not efficient for number crunching and representing complex data structures and lacks rich input/output interfaces, its declarative style makes it easy to represent knowledge as facts and rules, and its built-in backtracking, pattern matching, and recursion make it suitable for searching through the KB, returning all possible results, and writing compact and understandable programs for symbolic computation.
Our framework can be extended with additional components and their corresponding facts and rules; doing so requires declaring the arity of the new predicates at the beginning of the KB file.

B. COMPOSITION AS A WAY TO SCALE
When the security properties of individual components are known, they can be analyzed for conformance with a given security policy. When combining such components to form a larger system, we do not have to re-analyze them individually but only perform a quick check on the combined set of properties. Thus, composing a system using components with known security properties is a practical method for building large-scale systems. Given a set of security policy statements, any component that satisfies these is a composable construct. A larger system built using such composable constructs would offer sound assurance from a security perspective. If the security properties of a component $c_1$ entail a security policy $SP$ ($P_{c_1} \models SP$) and the security properties of another component $c_2$ individually entail the same policy ($P_{c_2} \models SP$), then the combined system consisting of $c_1$ and $c_2$ entails the policy $SP$ ($P_{c_1} \cap P_{c_2} \models SP$). We extend this to $c_1, \ldots, c_n$ as $\bigcap_{i=1}^{n} P_{c_i} \models SP$. Though the knowledge base of security properties of components is the union of the statements, semantically, the model shrinks as we add more facts and rules due to the implied conjunction.

C. EXTENSION TO LARGE SCALE SYSTEM COMPOSITION
Large-scale systems utilize several AWS regions worldwide. Each region uses multiple availability zones for redundancy. When a software system is deployed in such distributed environments, the deployment is standardized with templates using software tools such as AWS CloudFormation or Terraform. The configuration of the system in each region, in terms of VPCs and the topology of the application with load balancers, compute units, database servers, and the associated security groups, is templatized. This way, a larger distributed system is built using the building blocks deployed in each region utilizing multiple availability zones. Our approach automatically converts the security groups and network access control lists of such building blocks into equivalent Prolog statements. The statements can then be verified against the organization’s security policy. If the statements satisfy the security policy in one region, the combined system will also satisfy the same policy because of the identical configuration of the system in the other regions. As a double-check, verification per region can still be performed.

1) CONTINUOUS VERIFICATION
Security is not a point in time but a continuous process. While VPC configurations and individual resource configurations change during the application life cycle from release to release, the security policies to be satisfied are comparatively static. During each release cycle, the VPC configuration can be gathered, versioned, and analyzed against security policies. Thus, one can evaluate how configurations change or


drift over a period of time. The framework shown in Figure 8 can be automated for DevSecOps.
FIGURE 8. Security policy verification framework.

VI. LIMITATIONS OF THE STUDY AND FUTURE WORK
Our study covers a few cloud infrastructure components that are most prevalent in practical systems. To make the approach more practical, we should extend our implementation to cover all relevant infrastructure components, including serverless compute functions known as Lambdas and queuing, notification, and logging frameworks. We map the component configuration into logical statements using a schema that we think is sufficiently expressive to prove whether the configuration satisfies a given security policy. We can add further features of the components to the facts, such as backup retention and replication in the case of databases. Additionally, security teams generally write security policy statements in natural language. Translating them into Prolog facts and rules requires manual work unless a powerful natural language processing approach is used for automation. We used a shell script and the AWS Command Line Interface to automate the generation of Prolog facts corresponding to the configuration of the chosen components. For flexibility and better performance, this could be done in a language such as Java and made more generic to handle other cloud platforms such as Microsoft Azure and Google Cloud Platform. Although we provided the necessary logical framework for the study, we need to extend the formalism to be more general.

A. FUTURE WORK
KB generation will be implemented using a high-level language for portability. This framework can be incorporated into a cloud system, as in Figure 8. The KB generator and updater can run in the AWS account as a Lambda, and the knowledge base can be stored in a database such as DynamoDB indexed by the pair (region, VPC). To continuously update the facts when changes occur in the VPC resources, a Lambda function can be invoked when change notifications occur via a Simple Notification Service (SNS) and Simple Queue Service (SQS). Resource changes can be gathered and added to the KB in a version-controlled manner. The policy checker, run in another Lambda function, can be triggered by the same notification to analyze whether there are violations in near real time. Automatic remediation can also roll back the changes and notify stakeholders. The configurations of individual resources become composable constructs for an organization’s security policy. Standard Terraform or CloudFormation templates can be developed for these components with known security properties to compose new systems guaranteed to satisfy the organization’s policies.

VII. CONCLUSION
Although composability is a hard problem, identifying the components that satisfy a given security policy to compose a larger system is of practical importance, particularly in cloud environments. We presented a framework using FOL to analyze the composability of components and identify the composable constructs that satisfy a given security policy. Our approach focused on the composability of components from a security policy perspective. We applied our method to analyze the virtual private cloud in AWS by translating the configuration information into Prolog facts and combining them with rules representing the policies and network properties. We could show the components that satisfy a given policy and those that do not. We posit that assembling a large-scale system with composable constructs (compute units, databases, and perimeter security nodes such as bastion hosts) ensures that the combined system will satisfy the overall security policy without re-analyzing the components after the system is built. This study can be extended to create composable constructs, assemble a large-scale system using those constructs, and re-validate when the system changes to identify and correct drifts.


KANDASAMY MUNIASAMY received the B.E. degree in mechanical engineering from the University of Madras, Chennai, India, in 1983, and the M.S. degree in computer science from the University of Louisiana, Lafayette, in 1991. He is currently pursuing the Ph.D. degree in cyber security with Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, India. As a software professional focused on database APIs and enterprise security products and services, he worked for various companies, including Sybase, Oracle, Netscape, Verisign, and Symantec. He also leads a security team in Verizon, USA, focusing on the security of applications hosted in public cloud infrastructures and on-premise data centers. His research interests include using formal methods to prove the security of systems and secure composition.

ROHIT CHADHA received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology, New Delhi, India, in 1997, and the Ph.D. degree from the Department of Mathematics, University of Pennsylvania, in 2003. He is currently an Associate Professor with the Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, and the Director of the Mizzou Cybersecurity Center. He has held research positions with INRIA, Saclay, France; the University of Illinois at Urbana–Champaign; Instituto Superior Tecnico, Portugal; and the University of Sussex, U.K. His research interest includes formal engineering methods for computer security.

PRASAD CALYAM (Senior Member, IEEE) received the M.S. and Ph.D. degrees from the Department of Electrical and Computer Engineering, Ohio State University, in 2002 and 2007, respectively. He is currently a Full Professor with the Department of Computer Science, University of Missouri, Columbia. Previously, he was the Research Director with the Ohio Supercomputer Center. His research interests include distributed and cloud computing, computer networking, and cybersecurity.

M. SETHUMADHAVAN received the Ph.D. degree in mathematics from the University of Calicut, India, in 1997. He is currently a Professor in mathematics and computer science with Amrita Vishwa Vidyapeetham, Coimbatore, India. He has been the Head of the TIFAC-Centre of Relevance and Excellence in Cyber Security, since 2005. His research interests include number theory and cryptology.
