Analyzing Component Composability of Cloud Security Configurations
ABSTRACT Security is a major concern when building large-scale computer systems. Cloud services have
made it easier to provision large-scale systems on demand over the Internet. While the cloud service providers
provide the required building blocks such as compute units, database servers, and storage, customers are
still responsible for securely combining these systems to satisfy their organization’s security policy. The
secure development and operation of such large-scale systems present technical challenges. Composing a
larger system using components with known security properties that satisfy a given security policy without
re-analyzing the individual components is a difficult problem. In this study, we attempted to analyze the
composability of components from a security perspective using first-order predicate logic. We posit that if
we build a system using individual components that satisfy a security policy, the composed system will be
sound with regard to that policy. Additionally, the methodology can be used to identify drifts or violations
during future changes in the system by running checks during the system release cycles for continuous
verification.
practical large-scale systems. The composition proposed for cryptographic protocols by Canetti et al. [9]
does not work for general system composition.
AWS uses automated reasoning tools to verify the network reachability of resources in Virtual Private
Clouds (VPCs) [10]. The tools used include Vampire, MonoSAT, and Soufflé, which formalize network
semantics into logic and perform reasoning to answer network configuration and reachability questions.
Examples include identifying any resources that are tagged ‘Bastion’ or any compute instances that are
reachable from the Internet using the Secure Shell (SSH) protocol.
Microsoft Azure has developed a tool to validate network connectivity policies automatically [11]. The
tool can check selected properties of policies, such as whether some traffic is permitted or denied, and
compare two policies to identify drifts. The tool uses bit-vectors to encode policies and the theorem
prover Z3 as the underlying solver.
These tools use logic to answer configuration questions that return a list or reachability questions that
return yes/no. Reference [12] postulates an approach for security vulnerability management in systems by
representing the vulnerabilities in a system and corrective actions as a SAT problem. Reference [13]
describes an automated SMT-based approach to prevent published vulnerabilities in components used in a
composite cloud service. Our research focuses on representing the security properties of individual
components and verifying whether they hold in conjunction and satisfy a given set of security policy
statements of an organization, to prove the composability of those components. Although [10], [11], [12],
and [13] have made good progress in applying formal methods to answer questions on network reachability,
security configurations, and vulnerability remediation, they do not address the general composability
of components with respect to a security policy. Our work builds on the security property representation
to verify whether a given set of components can be composed to assemble a larger system, focusing
primarily on platform components such as compute units, databases, and network elements, which are the
fundamental building blocks of a cloud-based system.

III. REPRESENTATION OF SYSTEMS AND PROPERTIES
We require a simple and elegant notation to express the properties of systems, problems, and proofs.
Notations based on mathematical symbols with a vocabulary much smaller than that of a natural language
can be easily combined into expressions and manipulated using rules to produce new expressions.
Propositional Logic and First-Order Logic (FOL) have been used extensively to represent knowledge about
the world and reason with that knowledge. One of the major strengths of logical representation is
expressiveness. We use FOL, specifically Prolog, for the representation and reasoning for our problem
domain. Other choices include C.A.R. Hoare’s Communicating Sequential Processes (CSP) [14], TLA+ by
Leslie Lamport [15], Portunes Algebra [16], Vampire [18] and Datalog with Soufflé [19]. We preferred to
use Prolog because it is well known and has a better chance of adoption given that people in academia
and industry are familiar with it. This section briefly introduces FOL and discusses the representation
of security and network properties for a sample application.

A. A SHORT INTRODUCTION TO FOL
FOL uses terms, variables, constants, functions, and predicates as its main elements. The signature of
the FOL formula is defined as follows:

    Σ = ⟨F, P, arr⟩

where F and P are disjoint sets of function and predicate symbols, and arr is an arity function
F ∪ P → ℕ giving the number of parameters for the symbols in F and P. A term can be an object, a
variable, a constant, or a function. A predicate p(t1, ..., tn) is an atomic formula (atom for short)
where each ti is a term. Predicates with zero arity are treated as propositions; hence, FOL subsumes
propositional logic. Predicates capture the properties of objects and their relationships, as in

    placement(computer_c1, vpc1, subnet1)

where the relationship indicates that computer c1 is in subnet1 of vpc1. Logical connectives such as
conjunction (∧), disjunction (∨), and negation (¬), together with quantifiers (universal (∀) and
existential (∃)), produce formulas. If no quantifier binds a variable in a formula, the variable is
free; otherwise, it is bound. The use of variables helps compactly represent knowledge. A formula or a
term is called ground if it has no occurrence of variables. Henceforth, facts will be ground formulas in
a FOL signature. The logical implications are if-then rules. Quantified variables, predicates, logical
connectives, and implications are combined into expressive formulas.
Reasoning is done by asserting facts and then deducing new facts using implications. A definite clause
is a generalized implication of the form [20]:

    (p1 ∧ ··· ∧ pk) → q.

All the free variables in a definite clause are assumed to be universally quantified across the clause.
If q is a propositional symbol, a predicate, or a logical constant such as false, then the clause is
known as a Horn clause. An example is:

    likes(X, Y) ∧ likes(Y, X) → buddies(X, Y).

This states that if X and Y like each other, they are buddies, assuming the domain of discourse to be
people. When combined with the facts likes(alice, bob) and likes(bob, alice), we obtain a new assertion
buddies(alice, bob).
We use many-sorted FOL to represent the VPC resources as liberal relations. We explain the sorts,
predicates, and their arguments used in the modeling later in subsection III-C.
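As a minimal sketch, the likes/buddies Horn clause from the FOL introduction can be written directly in Prolog, the language adopted in this paper. The fact likes(carol, alice) is a hypothetical addition used only to show a query that fails, and SWI-Prolog is assumed as the interpreter.

```prolog
% Facts: ground atomic formulas.
likes(alice, bob).
likes(bob, alice).
likes(carol, alice).   % hypothetical extra fact, not from the paper

% Horn clause: likes(X, Y) /\ likes(Y, X) -> buddies(X, Y).
% The consequent is the rule head; the antecedents form the body.
buddies(X, Y) :- likes(X, Y), likes(Y, X).

% ?- buddies(alice, bob).    succeeds
% ?- buddies(carol, alice).  fails: likes(alice, carol) is not asserted
```

Note that the second query fails because the fact is absent from the knowledge base, not because its negation has been proved.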
Interested readers are referred to [21] for a comprehensive review of FOL.

redirected to the media servers where the meeting itself will be hosted in a geographical location
closest to the user.
TABLE 1. Predicate argument names and their sorts.

TABLE 2. Predicates and their arguments.

involved are inferred from Table 1. VpcId refers to unique identifiers for VPCs in an AWS account,
IPAddressBlock refers to the non-intersecting IP address blocks in CIDR notation (a.b.c.d/n, where a
through d are octets and n is the number of consecutive leading 1-bits from left to right in the subnet
mask) for the individual VPCs in an account, AccountId refers to the unique numeric account IDs in AWS,
and AWSRegion refers to the names of the AWS regions. In the knowledge base, we will have several
instances of the relation vpc, one each for a VPC with a unique ID, the corresponding IP address block,
and the AWS region name for a given AWS account where the VPC is created.
• The predicate subnet describes a subnet within a VPC identified by a unique subnet ID with the
  corresponding IP address block and whether the subnet is public (i.e., Internet-facing) or private
  (internal-facing).
• The predicate nacl describes the Network Access Control List (NACL) associated with the VPC and its
  subnets. A VPC will have one or more NACLs, but a subnet can be associated with only one NACL.
  nacl_association associates a NACL with the VPC and a subnet. AclId is a unique identifier for a
  NACL in a region, AclAssocId is a unique identifier that associates a NACL with a VPC and a subnet in
  it, Direction indicates whether the traffic is ingress or egress, RuleNo is an integer, and
  ProtocolNo identifies the transport protocol. SourceIPAddress is the source IP address, in CIDR
  notation, from which traffic originates, FromPort and ToPort denote the starting and ending port
  numbers respectively, and Permission denotes whether the traffic is allowed or denied.
• The predicate secgroup describes the Security Group construct that enforces allowed traffic flows for
  AWS resources such as compute units and databases. SgName is a string, SgId is a unique identifier
  for the security group, Direction is one of (ingress, egress), RuleNo is the security group rule
  number, Protocol identifies the transport protocol, and FromPort and ToPort are the starting and
  ending port numbers. SourceIPAddress is the source IP address, in CIDR notation, from which traffic
  originates, and Description is a string.
• The predicate compute describes a compute unit. InstanceName is a set of strings that may be used to
  identify virtual machine instances, InstanceId is a unique ID created by AWS to identify a virtual
  machine instance, EncryptionStatus denotes whether or not the virtual machine has its disk drives
  encrypted, ModelName refers to one of the supported compute types in AWS, and PublicOrPrivate
  indicates whether the compute unit is public-facing or internal. Additionally, a compute unit’s
  placement in a subnet of a VPC and its association with a security group are described by the
  placement and secgrp_association predicates.
• The predicate rds describes a database unit. RdsName is a string, EncryptionStatus indicates whether
  the database is encrypted or not, and RdsType is a string that represents the type of database, such
  as Postgres and Aurora-MySQL. An rds relation will have one or more associated placement relations
  depending on how many subnets the RDS is created in as
  well as one or more secgrp_association relations connecting it to the associated security groups.
• The predicate alb describes a load balancer. AlbName is a string, and AlbType is a string that refers
  to the type of load balancer, such as network or application. Similar to the compute and rds
  predicates, an alb relation will have one or more associated placement relations depending on how
  many subnets the alb is created in as well as one or more secgrp_association relations connecting it
  to the associated security groups.
• The predicate placement describes a resource’s placement in a subnet within a VPC. Here, the
  ResourceType will refer to the VPC resources compute, rds, or alb, the ResourceName and ResourceId
  will be the corresponding name and id for the resource, respectively, and SubnetId is the identifier
  of the subnet the resource is created in.
• The predicate secgrp_association describes an association between a VPC resource and a security group
  that governs traffic into and out of the resource. Here, the ResourceName and ResourceId will be the
  name and identifier for the resource associated with the security group with the name SgName and
  identifier SgId.

D. COMPOSABILITY VERIFICATION MODELING
1) KNOWLEDGE BASE
The set P consists of all resource configuration predicates above (see Table 2). We assume that, in
addition to resource predicates, a set S of system predicates is used to describe certain behaviors or
specifications of the system. Each element of the set S is a predicate, si, whose sorts we will leave
unspecified since these depend upon the application. We describe the system behavior using the resource
configuration predicates and the system predicates as axioms that are FOL implications. These axioms
are of the form:

    p1(T̃1) ∧ ··· ∧ pk(T̃k) → si(T̃)

where p1, ..., pk belong to P, si is a predicate in set S, and T̃ is a tuple of variables or constants of
sorts described in Table 1. These implications are axioms that are system-dependent. Intuitively, we use
these axioms to define certain characteristics, such as what it means for a VPC resource to be public or
private based on network placement. Now, we formally define Knowledge Base.
Definition 1: A Knowledge Base (KB) is a triple (P, S, A) consisting of a set of resource configuration
predicates P, a set of system predicates S, and a set of system behavioral axioms A of the form:

    p1(T̃1) ∧ ··· ∧ pk(T̃k) → si(T̃).

We explain the KB contents with an example.
Example 1: Consider the following facts using the resource configuration predicates for a compute unit
and a relational database unit.

    compute("compute_name", "compute_id", unencrypted, "r5_2xlarge", public).
    rds("rds_name", encrypted, "postgres").
    placement("compute_name", "compute_id", compute, "vpc_id", "subnet_id").
    placement("rds_name", "rds_id", rdms, "vpc_id", "subnet_id").
    subnet("vpc_id", "subnet_id", ip(10,136,214,0/23), public).

The first fact describes a compute unit named compute_name with an id compute_id. The second fact
describes the database unit named rds_name. The placement facts for the compute unit and the rds unit
associate them with the subnets where these resources are created. The subnet fact describes the
configuration of the subnet with id subnet_id.
A resource is public if it is created in a public subnet. Expressing this behavior as an axiom requires
us to describe a system predicate isPublic and an implication statement in terms of the resource
predicates placement and subnet. We express this as follows:

    placement(X, _, _, _, Y) ∧ subnet(_, Y, _, public) → isPublic(X).

2) SECURITY POLICIES
Once we have the knowledge base of the system, we can verify whether the components in the system
satisfy the security policies set by the administrators. For our study, we assume a set SPred of
security policy goal predicates. Each element of the set SPred is a predicate, spi, whose sorts we will
leave unspecified since these depend upon the application. We represent security policies themselves as
a set SP of logical implication statements using the resource configuration and system predicates we
described earlier and security policy goal predicates. These security policies are of the form:

    p1(T̃1) ∧ ··· ∧ pk(T̃k) ∧ s1(T̃′1) ∧ ··· ∧ sn(T̃′n) → spi(T̃)

where p1 ... pk are the resource predicates, s1 ... sn are the system predicates, T̃ is a tuple of
variables or constants of sorts, as explained before, and spi is a security policy goal predicate in the
set SPred.
Definition 2: Given a knowledge base KB = (P, S, A), a set of security policy statements SP, a security
policy goal predicate spi(T̃) ∈ SPred, and T̃ a tuple of constants and variables, ?- KB ∪ SP, spi(T̃)
returns a set of assignments Assn_Set to variables in T̃ such that for each assignment ρ ∈ Assn_Set,
• KB ∪ SP ⊨ spi(T̃/ρ), where T̃/ρ is interpreted as ρ applied to T̃.
In case T̃ consists of only constants, it returns true if KB ∪ SP ⊨ spi(T̃) and false otherwise.
Remark 1: When we encode the above query in Prolog, false may be returned if Prolog finds no assignments
or if
Prolog is not able to prove that KB ∪ SP ⊨ spi(T̃) in the case T̃ consists of only constants. Thus, a
false output by Prolog is distinct from the logical false, and care must be taken to interpret it.
Given a goal predicate, say, composableCompute pertaining to compute units,
?- KB ∪ SP, composableCompute(X̃) will return a set of assignments to X, which are the compute units
that satisfy the security policy on the configuration requirements such as encryption, network
placement, open ports, etc. The remaining compute units do not compose securely with regard to this
policy, and those units could be obtained using the negative goal predicate expression:
?- KB ∪ SP, compute(X̃), not(composableCompute(X̃)). We explain this using the following example.
Example 2: Suppose we have the following system axioms that describe system characteristics regarding
the encryption of a compute unit and placement in a private subnet. An underscore denotes a don’t care
value.

    compute(X, _, Y, _, _) ∧ Y = encrypted → encrypted(X).
    placement(X, _, _, _, Y) ∧ subnet(_, Y, _, private) → isPrivate(X).

Assume we have a composableCompute goal predicate defined in a security policy SP using the above system
predicates and the resource predicate compute:

    compute(X, _, _, _, _) ∧ encrypted(X) ∧ isPrivate(X) → composableCompute(X).

The query: ?- KB ∪ SP, composableCompute(X) will return all compute units that are encrypted and created
in a private subnet.
The query:

    ?- KB ∪ SP, compute(X, _, _, _, _), not(composableCompute(X))

will return the compute units that do not satisfy the policy.

IV. IMPLEMENTATION OF OUR APPROACH FOR A SAMPLE APPLICATION
A. USING PROLOG TO REPRESENT THE FACTS AND BEHAVIORS OF SYSTEM COMPONENTS
Using the predicates described above, we represent the ground facts regarding a VPC and its resources
using Prolog. We describe the salient security properties of the components using atomic formulas (i.e.,
Prolog facts). For example, a compute unit is described by the following predicates:

    compute(instance_name, instance_id, encrypted, private),

that describes the compute unit.

    placement(instance_name, instance_id, compute, vpc_id, subnet_id),

which describes the network placement for the compute unit, such as the subnet in which it is created
and the vpc for the subnet, and

    secgrp_association(instance_name, instance_id, sg_name, sg_id),

which associates the compute unit with the security group that specifies its ingress and egress rules.
Using rules in Prolog, we represent the characteristics or behaviors that are FOL implications. An
example is whether a compute unit is encrypted. An underscore (_) denotes a don’t care value in the
following statement:

    encrypted(X) :- compute(X, _, Y, _, _), Y = encrypted.

This rule is equivalent to the logical statement: compute(X, _, Y, _, _) ∧ Y = encrypted →
encrypted(X). The consequent of the implication becomes the head of the Prolog rule, and the antecedents
are in the body of the rule. The head and the body are separated by :-.
Another example of a behavior or a specification is to specify if a resource is accessible over the
Internet using the Secure Shell protocol. The rule:

    canSSHFromInternet(X) :- compute(X, _, _, _, public),
        secgrp_association(X, _, Y, _),
        secgroup(Y, _, ingress, _, _, Fromport, Toport, ip(0,0,0,0/0), _),
        Fromport \= null, between(Fromport, Toport, 22).

states that one can access a compute unit using SSH if the compute unit is public-facing and is
associated with a security group that permits ingress traffic on port 22. These general rules are
unlikely to change from one system to another for a cloud provider.
We generated the knowledge base of logical statements for the sample application using the model
described in Section III. The proposed implementation approach is illustrated in Figure 2. The inference
model comprises two components. The first is the knowledge base of the security properties of the
components in the system, which is an AWS VPC in this example. These properties are ground atomic
formulas, which are Prolog facts in our implementation corresponding to the individual component
configurations in the VPC. Additionally, we added network properties that are common to all VPCs, such
as what is meant by a component that is public facing (i.e., open to the Internet), the reachability of
one component to the other, and so on. These properties are system behavioral axioms. We automated the
generation of Prolog facts corresponding to the component configurations using a bash shell program that
utilized the AWS Command Line Interface (CLI) [23]. This program operates at an account level to
generate Prolog facts for all VPCs in that account or individually for a given VPC through a command
line argument containing VPC_id [22]. The network properties are coded manually as Prolog rules, which
are implication statements in the FOL. These facts and rules form the knowledge base. The Security
Policy is a set of Prolog rules that we code manually because this can vary from organization to
organization and can be found in policy and

                    "RuleAction": "deny",
                    "RuleNumber": 1000
                },
                { // Permits SSH traffic from subnets
                    "CidrBlock": "10.0.1.0/24",
                    "Egress": false,
                    "PortRange": {
                        "From": 22,
                        "To": 22
                    },
                    "Protocol": "6",
                    "RuleAction": "allow",
                    "RuleNumber": 200
                },
                { ..... }
            ],
            "IsDefault": false,
            "NetworkAclId": "acl-id1",
            "Tags": [
                {
                    "Key": "BILLING",
                    "Value": "RD"
                },
                < more tags >
            ]
        },
        < more associations >
    ]
}

The corresponding Prolog facts are:

    nacl_association("vpc-id", "aclassoc-id1", "acl-id1", "subnet-id1").
    nacl("vpc-id", "acl-id1", egress, 200, 6, 22, 22, ip(0,0,0,0/0), allow).
    nacl("vpc-id", "acl-id1", ingress, 200, 6, 22, 22, ip(10,0,1,0/24), allow).
    nacl("vpc-id", "acl-id1", egress, 1000, -1, -1, -1, ip(0,0,0,0/0), deny).

The fact nacl_association describes the relationship between the subnets and the NACL groups in a VPC.
The fact nacl describes a NACL entry by the vpc-id it is associated with, the acl ID, the rule direction
(ingress or egress), the rule number, the protocol (such as TCP as identified by the numeral 6), the
beginning port number, the ending port number, the source IP address, and the action, allow or deny.
Similarly, we encoded the Prolog facts corresponding to the security groups based on the responses for
the corresponding CLI commands. In addition to NACLs, security groups are associated with resources in
the VPC. While NACLs support ‘allow’ and ‘deny’ rules, security groups only support ‘allow’ rules. NACLs
are stateless, and return traffic has to be permitted by explicit rules, unlike security groups where
return traffic is automatically permitted. Because NACLs are at the subnet level, the rules apply to all
resources at this subnet level and provide an extra layer of protection if the security groups are
overly permissive.

    secgroup("sg-name", "sg-id", ingress, 0, tcp, 0, 65535, ip(a,b,c,d/32), null).
    secgroup("sg-name", "sg-id", ingress, 2, tcp, 22, 22, ip(w,x,y,z/32), null).
    secgroup("sg-name", "sg-id", egress, 0, -1, 0, 65535, ip(0,0,0,0/0), null).

The fact secgroup uses the following schema:

    secgroup(name, id, ingress/egress, rule number, protocol (tcp, udp, etc.),
             from port, to port, source/destination, description).

For example,

    secgroup("launch-wizard-2", "sg-0e37d454d98079623", ingress, 0, tcp, 22, 22, ip(0,0,0,0/0), null)

specifies that ingress access from the Internet (0.0.0.0/0) to port 22 (SSH) is allowed.

D. REPRESENTATION OF VPC RESOURCES
The fact compute describes a compute unit with its instance name, instance ID, encrypted or unencrypted
volume used, type or model of the unit, and whether it is public or private. The fact placement
describes the vpc and the subnet in which the compute unit is created, and the fact secgrp_association
relates the security group name and the security group ID with the unit.

    secgrp_association("instance-name", "instance-id", "sg-name", "sg-id").
    compute("instance-name", "instance-id", unencrypted, "c5_2xlarge", private).
    placement("instance-name", "instance-id", compute, "vpc-id", "subnet-id1").

The fact alb describes a load balancer by its name and type, such as ‘application’ or ‘network’. The
fact placement connects the vpc and the subnets the load balancer is associated with.

    alb("alb-name", application).
    placement("lb-name", "lbname-id.elb.us-west-2.amazonaws.com", alb, "vpc-id", "subnet-id1").
    placement("lb-name", "lbname-id.elb.us-west-2.amazonaws.com", alb, "vpc-id", "subnet-id2").
    placement("lb-name", "lbname-id.elb.us-west-2.amazonaws.com", alb, "vpc-id", "subnet-id3").

The fact rds describes the rds instance by its name, whether encrypted at rest or not, and the type of
database (MySQL, Postgres, and so on). Similar to the fact compute, there is an associated placement
fact that describes the vpc and subnet in which the rds instance is created, and a secgrp_association
fact that connects it with its security group name and its ID.

    rds("rds-name", encrypted, "postgres").
    placement("rds-name", "db-resourceid", rdms, "vpc-id", "subnet-id1").
    placement("rds-name", "db-resourceid", rdms, "vpc-id", "subnet-id2").
    secgrp_association("rds-name", "db-resourceid", "sg-name", "sg-id").

Prolog facts for our sample VPC (Figure 1) and its resources are presented in Tables 3 and 4. The
resources considered are compute units, a bastion host, a load balancer, and a relational database. We
have substituted placeholders for the actual resource names and resource IDs. In the next subsection, we
discuss how we use Prolog rules to represent the semantics of network properties and security policies.
As discussed above, the facts are automatically created from the AWS account using a shell program and
the AWS CLI.

TABLE 3. Sample VPC configuration.

TABLE 4. Sample VPC resource configuration.

E. REPRESENTATION OF NETWORK PROPERTIES
We represent the network properties in the form of rules. The properties are generic and are based on
the schema we decided on for the security properties of the components in subsections IV-C and IV-D. For
example, a resource is public if it is created in a public-facing subnet. A compute unit is accessible
from the Internet via SSH if the unit is public and has an associated security group that permits the
SSH protocol.

    isPublic(X) :- placement(X, _, _, _, Y), subnet(_, Y, _, public).
    isPrivate(X) :- placement(X, _, _, _, Y), subnet(_, Y, _, private).

A resource X is public if it is in a subnet Y, which is public.

    canSSHFromInternet(X) :- isPublic(X),
        secgrp_association(X, _, Y, _),
        secgroup(Y, _, ingress, _, -1, null, null, ip(0,0,0,0/0), _).
    canSSHFromInternet(X) :- isPublic(X),
        secgrp_association(X, _, Y, _),
        secgroup(Y, _, ingress, _, tcp, FromPort, ToPort, ip(0,0,0,0/0), _),
        FromPort \= null, between(FromPort, ToPort, 22).

The first SSH rule handles a security group that opens all ports (indicated by null) pertaining to all
protocols (indicated by -1) to the Internet, which includes TCP port 22. The second rule explicitly
checks whether TCP port 22 is within the specified range.

F. REPRESENTATION OF SECURITY POLICIES IN PROLOG
Security policies can be rules or facts that specify required conditions. For example, encryption at
rest for compute and database instances is coded as:

    encrypted(X) :- compute(X, _, Y, _, _), Y = encrypted.
    rdsEncryptionAtRest(X) :- rds(X, Y, _), Y = encrypted.

Similarly, traffic to standard ports such as TCP ports 22, 80, and 443 can be specified, and the
security groups that permit traffic to other ports can be identified using the following rule:

    nonCompliantSecGroup(ID) :- secgroup(ID, _, ingress, _, _, X, _, ip(0,0,0,0/0), _),
        not(memberchk(X, [443, 80, 22])).

The following policy rules specify the conditions for the composability of EC2 instances and database
instances.

    composableCompute(X) :- encrypted(X),
        ( isPrivate(X)
        ; ( not(canSSHFromInternet(X)),
            secgrp_association(X, _, Y, _),
            not(nonCompliantSecGroup(Y)) ) ).
    composableRDS(X) :- rdsEncryptionAtRest(X), isPrivate(X).
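To illustrate how the network-property rules and the policy rules interact, the following sketch loads them together with a small, hypothetical knowledge base. The resource names ("web", "batch", "old", "api", "db"), identifiers, and address blocks are invented for illustration and are not the facts of Tables 3 and 4. SWI-Prolog is assumed, not(...) is written as \+, and only the range-checking clause of canSSHFromInternet is included.

```prolog
% --- Hypothetical facts, following the schemas described in the text ---
subnet("vpc-1", "subnet-pub",  ip(10,0,0,0/24), public).
subnet("vpc-1", "subnet-priv", ip(10,0,1,0/24), private).

compute("web",   "i-1", encrypted,   "c5_2xlarge", public).
compute("batch", "i-2", encrypted,   "c5_2xlarge", private).
compute("old",   "i-3", unencrypted, "c5_2xlarge", private).
compute("api",   "i-4", encrypted,   "c5_2xlarge", public).

placement("web",   "i-1",  compute, "vpc-1", "subnet-pub").
placement("batch", "i-2",  compute, "vpc-1", "subnet-priv").
placement("old",   "i-3",  compute, "vpc-1", "subnet-priv").
placement("api",   "i-4",  compute, "vpc-1", "subnet-pub").
placement("db",    "db-1", rdms,    "vpc-1", "subnet-priv").

rds("db", encrypted, "postgres").

secgrp_association("web", "i-1", "sg-web", "sg-1").
secgrp_association("api", "i-4", "sg-api", "sg-2").

secgroup("sg-web", "sg-1", ingress, 0, tcp, 22,  22,  ip(0,0,0,0/0), null).
secgroup("sg-api", "sg-2", ingress, 0, tcp, 443, 443, ip(0,0,0,0/0), null).

% --- Network properties (system behavioral axioms) ---
isPublic(X)  :- placement(X, _, _, _, Y), subnet(_, Y, _, public).
isPrivate(X) :- placement(X, _, _, _, Y), subnet(_, Y, _, private).

canSSHFromInternet(X) :-
    isPublic(X),
    secgrp_association(X, _, Y, _),
    secgroup(Y, _, ingress, _, tcp, From, To, ip(0,0,0,0/0), _),
    From \= null,
    between(From, To, 22).

% --- Security policy rules ---
encrypted(X) :- compute(X, _, Y, _, _), Y = encrypted.
rdsEncryptionAtRest(X) :- rds(X, Y, _), Y = encrypted.

nonCompliantSecGroup(Name) :-
    secgroup(Name, _, ingress, _, _, From, _, ip(0,0,0,0/0), _),
    \+ memberchk(From, [443, 80, 22]).

composableCompute(X) :-
    encrypted(X),
    (   isPrivate(X)
    ;   \+ canSSHFromInternet(X),
        secgrp_association(X, _, Y, _),
        \+ nonCompliantSecGroup(Y)
    ).

composableRDS(X) :- rdsEncryptionAtRest(X), isPrivate(X).

% ?- findall(X, composableCompute(X), L).
%    L = ["batch", "api"]
% ?- findall(X, (compute(X, _, _, _, _), \+ composableCompute(X)), L).
%    L = ["web", "old"]
% ?- findall(X, composableRDS(X), L).
%    L = ["db"]
```

In this sketch, "web" is rejected because it is public and reachable over SSH from 0.0.0.0/0, "old" because its volume is unencrypted, and "api" is accepted through the second branch of the disjunction because its only Internet-facing port is 443. Each negated goal is called only after X (or Y) is bound, which keeps negation as failure well behaved.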
The compute composability rule requires an instance to use only encrypted disk storage, to be in a
private subnet or not publicly reachable over SSH, and not have TCP ports open to the Internet other
than the standard ports permitted by the policy. The rds composability rule requires the rds instance to
be encrypted at rest and to be in a private subnet. If we apply these two rules to the facts in Tables 3
and 4, none of the compute units will satisfy the composability conditions because all of them are in
public subnets, but the rds instance will be composable because it is in a private subnet and is
encrypted.

G. VULNERABILITIES IDENTIFIABLE FROM COMPONENT CONFIGURATIONS
While AWS provides building blocks, it is still left to the customers to configure them securely under
the shared responsibility model. For example, one can launch a compute unit but could configure it in
the following insecure ways:
• Use the elastic block storage without encryption, leaving the information stored in the volume
  available for cloud providers to access.
• Set up the compute unit to be Internet-facing with SSH ports open, leaving it discoverable on the
  Internet for password brute forcing and other attacks.
• Expose the services to the Internet directly without a load balancer, allowing attackers to
  fingerprint the services running on the compute instance and plan attacks based on published
  vulnerabilities.
Similarly, one can launch a database instance without encryption at rest and with a public IP address,
making it susceptible to password attacks and eventual data breaches. We consider these two categories
of vulnerabilities arising from misconfigurations in this paper. This can be expanded to include
improper handling of access key/secret key pairs, not enabling multi-factor authentication for users,
not using the right TLS protocol versions for protecting data in transit, etc.
The security policies are designed to detect and address vulnerabilities. While we can identify the
components that are composable with regard to a given policy, we can also enumerate the ones that do not
compose using the negative goal predicate. For example, if a compute unit does not satisfy the goal
predicate encrypted(X) for compute units, then we can infer that there is a vulnerability due to the
lack of encryption at rest for the elastic block storage. Similarly, we can identify compute units that
are in the public subnet and hence will be discoverable for brute force attacks.
Example 3: Given a KB of compute units, we can list the

    rds(X, _, _), not(composableRDS(X)).

Any values returned for X will denote database instances that are either not encrypted or public-facing.
We presented an approach to building the KB of security properties of individual components as logical
facts, representing security policies as rules, and reasoning about them to verify whether the facts
satisfy the policies. This approach is general and can be applied to large distributed systems such as
banking and financial or video conferencing applications because these systems are built using
components such as VPCs, compute units, and databases, which are the basic building blocks. There will
be several thousands of similar facts for such systems. Resource configuration facts and system behavior
rules can be added to the framework we discussed for more components, such as queuing, notification, and
identity and access management. Security policy rules can be coded based on the application domain and
verified using the properties of the individual components to prove the soundness of the composition.

H. VERIFYING THE SECURITY POLICY STATEMENTS
The components compose securely if and only if every statement in the security policy is satisfied by
the combined model of the components. For example, if we want to verify the security policy that all
compute units should have their EBS volumes encrypted, the statement is posed as a query: encrypted(X).
Prolog returns the compute units with encrypted EBS volumes. Similarly, assume a security policy where
all compute units and database instances must be in private subnets as specified in the statement:

    isPrivate(X), (compute(X, _, _, _, _) ; rds(X, _, _)).

When issuing this query, X will be instantiated to the compute units and rds instances in the private
subnet, and the query will succeed.
Thus, composable building blocks such as compute units and database instances can be assembled from a
given VPC configuration and network properties. Two such sample rules are composableCompute and
composableRDS presented in subsection IV-F.

V. PERFORMANCE EVALUATION AND COMPOSITION AS A WAY TO SCALE
We implemented our prototype using a bash script and the AWS Command Line Interface to fetch the
configuration of VPCs in the AWS account and the compute, rds, and load balancer resources contained
within it and translate them into Prolog
compute units that are not encrypted or in a public subnet facts as we described in section IV on Page 139941. The
using the following queries. An underscore denotes don’t care program is general and can be run on any AWS account
values. using an access key and a secret access key pair. The program
compute(X , _, _, _, _), not(encrypted(X )). enumerates through VPC-level network configurations such
compute(X , _, _, _, _), not(isPrivate(X )). as subnets, network access control lists (NACLs), security
Example 4: Another example is the composability rule groups, and resources such as compute units and database
for database instances. We can identify vulnerable database instances and their attributes. The configuration information
instances using the negative predicate goal as follows: is written as Prolog facts in a file. We manually verified
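To make the negative-goal queries of Example 3 and the private-subnet policy statement of subsection H concrete, the following is a minimal sketch on a toy knowledge base. It assumes SWI-Prolog. Only the predicate names and arities (compute/5, rds/3, encrypted/1, isPrivate/1) come from the text; the resource identifiers, the placeholder attribute atoms, the helper predicate names, and the choice to state encrypted/1 and isPrivate/1 as plain facts rather than derived rules are illustrative assumptions. The standard operator \+ is used in place of not/1.

```prolog
% Toy KB. Identifiers (i1, i2, db1) and attribute atoms are hypothetical
% placeholders; only predicate names and arities follow the paper.
compute(i1, attr1, attr2, attr3, attr4).
compute(i2, attr1, attr2, attr3, attr4).
rds(db1, attr1, attr2).

% Assumption: stated directly as facts here. In the framework these
% properties would be derived from configuration facts by rules.
encrypted(i1).
encrypted(db1).
isPrivate(i1).
isPrivate(db1).

% Example 3: compute units that violate a policy (negative goals).
unencryptedCompute(X) :- compute(X, _, _, _, _), \+ encrypted(X).
publicCompute(X)      :- compute(X, _, _, _, _), \+ isPrivate(X).

% Subsection H: resources that satisfy the private-subnet statement.
privateResource(X) :-
    isPrivate(X),
    (   compute(X, _, _, _, _)
    ;   rds(X, _, _)
    ).
```

On this toy KB, the query findall(X, unencryptedCompute(X), L) binds L to [i2], and findall(X, privateResource(X), L) binds L to [i1, db1]. Note that the compute/5 goal is placed before the negated goal so that X is bound when \+ is evaluated; negation as failure on an unbound X would not enumerate the violating resources.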
TABLE 5. Time taken for KB generation (s).
TABLE 6. Execution time for verification - DB instances (time in µs).
TABLE 8. Comparison of response time for the combined goal and the conditions in the body of the rule (µs).
composableComputeList(L) :-
    findall(X, composableCompute(X), Y),
    sort(Y, L).
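As an illustration of how the list-collecting rule above behaves, the following sketch runs it on a small toy knowledge base, assuming SWI-Prolog. The body of composableCompute/1 shown here is an assumption reconstructed from the conditions named in this section (encrypted, isPrivate, and the not conditions on SSH exposure and security groups); the actual rule is the one given in subsection IV-E. All identifiers and placeholder attribute values are hypothetical.

```prolog
% Toy facts; identifiers and attribute placeholders are hypothetical.
compute(i1, attr1, attr2, attr3, attr4).
compute(i2, attr1, attr2, attr3, attr4).
compute(i3, attr1, attr2, attr3, attr4).

encrypted(i1).
encrypted(i2).
encrypted(i3).

isPrivate(i1).              % i2 and i3 are in public subnets

canSSHFromInternet(i3).

% secgrp_association(_, Instance, SecurityGroup, _); the first and last
% arguments are not inspected by this sketch.
secgrp_association(assoc1, i2, sg_ok,   none).
secgrp_association(assoc2, i3, sg_open, none).

nonCompliantSecGroup(sg_open).

% Assumed rule body: encrypted AND (private OR (no SSH from the Internet
% AND an associated security group that is not non-compliant)).
composableCompute(X) :-
    compute(X, _, _, _, _),
    encrypted(X),
    (   isPrivate(X)
    ;   \+ canSSHFromInternet(X),
        secgrp_association(_, X, Y, _),
        \+ nonCompliantSecGroup(Y)
    ).

% findall/3 collects every solution; sort/2 orders them and removes
% duplicates that backtracking through the disjunction could produce.
composableComputeList(L) :-
    findall(X, composableCompute(X), Y),
    sort(Y, L).
```

With these facts, composableComputeList(L) binds L to [i1, i2]: i1 is encrypted and private, i2 is public but not reachable by SSH and uses a compliant security group, and i3 fails because it accepts SSH from the Internet.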
We measured the average response times and the number of inferences, based on 10,000 executions, for each of the conditions in the body of the rule: encrypted, isPrivate, and the not conditions
(not(canSSHFromInternet(X)), secgrp_association(_, X, Y, _), not(nonCompliantSecGroup(Y))),
as well as the OR combination (isPrivate; not conditions). The sum of splits is the sum of the results for encrypted and the OR condition. The results are presented in Table 8 and Figure 5.
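A measurement of this kind can be sketched with the statistics/2 built-in of SWI-Prolog. The predicate below is a hypothetical helper, not the harness used for the reported numbers: it assumes SWI-Prolog, runs a goal N times while exhausting all of its solutions on each run, and averages CPU time and inference counts. The averages include the small overhead of the measuring loop itself.

```prolog
% avg_cost(+Goal, +N, -AvgMicros, -AvgInferences)
% Hypothetical helper: runs Goal N times (all solutions each time) and
% returns the average CPU time in microseconds and the average number
% of inferences per run. Loop overhead is included in both figures.
avg_cost(Goal, N, AvgMicros, AvgInferences) :-
    N > 0,
    statistics(inferences, I0),
    statistics(cputime, T0),              % CPU seconds as a float
    forall(between(1, N, _),
           forall(Goal, true)),           % exhaust every solution of Goal
    statistics(cputime, T1),
    statistics(inferences, I1),
    AvgMicros     is (T1 - T0) * 1.0e6 / N,
    AvgInferences is (I1 - I0) / N.
```

For example, avg_cost(composableComputeList(_), 10000, T, I) would report averages for the combined goal once a KB defining that predicate is loaded, and the same call with a single body condition as the goal would give the corresponding split.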
We can conclude from this analysis that the sum of the response times of the individual conditions in the body is roughly equal to that of the overall goal, except where failures of individual subgoals cause additional backtracking that increases the overall average response time; the response times of the individual subgoals, measured without the influence of the other subgoals, are lower.
Similarly, we analyzed the inferences for the overall goal and the conditions in the body of the rule; the results are presented in Table 9 and Figure 6.
FIGURE 6. Analysis of average inferences for subconditions.
We observe that there is a constant overhead when the number of compute units in the result is smaller, which
increases the average number of inferences but decreases as the number increases. For result sets in which more instances fail to satisfy the goal, there is more backtracking, which increases the number of inferences. The general pattern of response time and inferences is linear and remains consistent for both the combined goal and the subgoals.
We also studied the response time and average number of inferences for retrieving the list of compute units that do not conform to the security policy and hence are vulnerable to the common attacks explained in subsection IV-G. This list complements the list of composable compute units and helps operations personnel remediate those units to comply with the security policy. The results are presented in Table 10 and Figure 7.
TABLE 10. Average response time (µs) and inferences for the non-compliant compute units.
Practitioners can concentrate on just getting the list of non-compliant components in a system to address the issues and then, as a double check, perform a compliance check to ensure that all components are returned in the composability query.

A. EXTENSIBILITY OF THE PROLOG FRAMEWORK
While Prolog is not efficient for number crunching and representing complex data structures and lacks rich input/output interfaces, its declarative style makes it easy to represent knowledge as facts and rules, and the built-in backtracking, pattern matching, and recursion make it suitable for searching through the KB, returning all possible results, and writing compact and understandable programs for symbolic computation. Our framework can be extended with additional components and the corresponding facts and rules; doing so requires declaring the arity of the new predicates at the beginning of the KB file.

C. EXTENSION TO LARGE SCALE SYSTEM COMPOSITION
Large-scale systems utilize several AWS regions worldwide. Each region uses multiple availability zones for redundancy. When a software system is deployed in such distributed environments, the deployment is standardized with templates using tools such as AWS CloudFormation or Terraform. The configuration of the system in each region, in terms of VPCs and the topology of the application with load balancers, compute units, database servers, and the associated security groups, is templatized. This way, a larger distributed system is built using the building blocks deployed in each region utilizing multiple availability zones. Our approach automatically converts the security groups and network access control lists of such building blocks into equivalent Prolog statements. The statements can then be verified against the organization's security policy. If the statements satisfy the security policy in one region, the combined system will also satisfy the same policy because of the identical configuration of the system in the other regions. As a double-check, verification per region can still be performed.

1) CONTINUOUS VERIFICATION
Security is not a point in time but a continuous process. While VPC configurations and individual resource configurations change during the application life cycle from release to release, the security policies to be satisfied are comparatively static. During each release cycle, the VPC configuration can be gathered, versioned, and analyzed against security policies. Thus, one can evaluate how configurations change or
drift over a period of time. The framework shown in Figure 8 can be automated for DevSecOps.
FIGURE 8. Security policy verification framework.

VI. LIMITATIONS OF THE STUDY AND FUTURE WORK
Our study covers a few cloud infrastructure components that are most prevalent in practical systems. To make the approach more practical, we should extend our implementation to cover all relevant infrastructure components, including serverless compute units known as Lambdas and the queuing, notification, and logging frameworks. We map the component configuration into logical statements using a schema that we think is sufficiently expressive to prove whether the configuration satisfies a given security policy. We can add additional features of the components to the facts, such as backup retention and replication in the case of databases. Additionally, security teams generally write security policy statements in natural language. Translating them into Prolog facts and rules requires manual work unless a powerful natural language processing approach is used in automation. We used a shell script and the AWS Command Line Interface to automate the generation of Prolog facts corresponding to the configuration of the chosen components. For flexibility and better performance, this could be done in a language such as Java and made more generic to handle other cloud platforms such as Microsoft Azure and Google Cloud Platform. Although we provided the necessary logical framework for the study, we need to extend the formalism to be more general.

A. FUTURE WORK
KB generation will be implemented using a high-level language for portability. This framework can be incorporated into a cloud system, as in Figure 8. The KB generator and updater can run in the AWS account as a Lambda, and the knowledge base can be stored in a database such as DynamoDB indexed by the pair (region, VPC). To continuously update the facts when changes occur in the VPC resources, a Lambda function can be invoked when change notifications occur via the Simple Notification Service (SNS) and Simple Queue Service (SQS). Resource changes can be gathered and added to the KB in a version-controlled manner. The policy checker run in another Lambda function

VII. CONCLUSION
Although composability is a hard problem, identifying the components that satisfy a given security policy to compose a larger system is of practical importance, particularly in cloud environments. We presented a framework using FOL to analyze the composability of components and identify the composable constructs that satisfy a given security policy. Our approach focused on the composability of components from a security policy perspective. We applied our method to analyze the virtual private cloud in AWS by translating the configuration information into Prolog facts and combining them with rules representing the policies and network properties. We could show the components that satisfy a given policy and those that do not. We posit that assembling a large-scale system with composable constructs (compute units, databases, and perimeter security nodes such as bastion hosts) ensures that the combined system will satisfy the overall security policy without re-analyzing the components after the system is built. This study can be extended to create composable constructs, assemble a large-scale system using those constructs, and re-validate when the system changes to identify and correct drifts.

REFERENCES
[1] Amazon Web Services. (2023). AWS Shared Responsibility. [Online]. Available: https://aws.amazon.com/compliance/shared-responsibility-model/
[2] The Open Worldwide Application Security Project. OWASP Top 10. Accessed: Nov. 26, 2023. [Online]. Available: https://owasp.org/www-project-top-ten/
[3] J. M. Wing, “A symbiotic relationship between formal methods and security,” in Proc. Comput. Secur., Dependability, Assurance, Needs Solutions, Williamsburg, VA, USA, 1998, pp. 26–38.
[4] E. M. Clarke and J. M. Wing, “Formal methods: State of the art and future directions,” ACM Comput. Surveys, vol. 28, no. 4, pp. 626–643, Dec. 1996.
[5] L. Lamport, “The temporal logic of actions,” ACM Trans. Program. Lang. Syst., vol. 16, no. 3, pp. 872–923, May 1994.
[6] C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff, “How Amazon uses formal methods,” Commun. ACM, vol. 58, no. 4, pp. 66–73, Apr. 2015.
[7] D. McCullough, “A hookup theorem for multilevel security,” IEEE Trans. Softw. Eng., vol. 16, no. 6, pp. 563–568, Jun. 1990.
[8] J. A. McDermid and Q. Shi, “Secure composition of systems,” in Proc. 8th Annu. Comput. Secur. Appl. Conf., San Antonio, TX, USA, 1992, pp. 112–122.
[9] R. Canetti, Y. Dodis, R. Pass, and S. Walfish, “Universally composable security with global setup,” in Theory of Cryptography (Lecture Notes in Computer Science), vol. 4392. Berlin, Germany: Springer, 2007, pp. 61–85, doi: 10.1007/978-3-540-70936-7_4.
[10] J. Backes, S. Bayless, B. Cook, C. Dodge, A. Gacek, A. J. Hu, T. Kahsai, B. Kocik, E. Kotelnikov, J. Kukovec, and S. McLaughlin, “Reachability analysis for AWS-based networks,” in Computer Aided Verification (Lecture Notes in Computer Science), vol. 11562. Cham, Switzerland: Springer, 2019, pp. 231–241, doi: 10.1007/978-3-030-25543-5_14.
[11] K. Jeyaraman, N. Bjorner, G. Outhred, and C. Haufman. Automated Analysis and Debugging of Network Policies. Accessed: Nov. 26, 2023. [Online]. Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/secguru.pdf
[12] M. Barrére, R. Badonnel, and O. Festor, “A SAT-based autonomous strategy for security vulnerability management,” in Proc. IEEE Netw. Operations Manag. Symp. (NOMS), Krakow, Poland, May 2014, pp. 1–9, doi: 10.1109/NOMS.2014.6838309.
[13] M. Oulaaffart, R. Badonnel, and C. Bianco, “An automated SMT-based security framework for supporting migrations in cloud composite services,” in Proc. IEEE/IFIP Netw. Operations Manage. Symp., Budapest, Hungary, Apr. 2022, pp. 1–9, doi: 10.1109/NOMS54207.2022.9789768.
[14] C. A. R. Hoare, Communicating Sequential Processes. Upper Saddle River, NJ, USA: Prentice-Hall, 1985.
[15] L. Lamport, Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Reading, MA, USA: Addison-Wesley, 2002.
[16] M. Dickinson, S. Debroy, P. Calyam, S. Valluripally, Y. Zhang, R. B. Antequera, T. Joshi, T. White, and D. Xu, “Multi-cloud performance and security driven federated workflow management,” IEEE Trans. Cloud Comput., vol. 9, no. 1, pp. 240–257, Mar. 2021.
[17] J. Väänänen. (2014). Many-Sorted Logic. [Online]. Available: https://docplayer.net/155578006-Many-sorted-logic-jouko-vaananen-1-2-easllc-july-department-of-mathematics-and-statistics-university-of-helsinki.html
[18] L. Kovács and A. Voronkov, “First-order theorem proving and vampire,” in Computer Aided Verification (Lecture Notes in Computer Science), vol. 8044. Berlin, Germany: Springer, 2013, pp. 1–35.
[19] E. Kotelnikov, “Checking network reachability properties by automated reasoning in first-order logic,” in Automated Theorem Proving With Extensions of First-Order Logic. Gothenburg, Sweden: Chalmers Univ. Technology, 2018, ch. 5, pp. 114–131.
[20] D. Sadig. (2018). CS 221 Lecture 16 [Powerpoint Slides]. [Online]. Available: https://web.stanford.edu/class/archive/cs/cs221/cs221.1186/lectures/index.html#include=logic1.js&mode=print1pp
[21] H. B. Enderton, A Mathematical Introduction to Logic, 2nd ed. Amsterdam, The Netherlands: Elsevier, 2001.
[22] Amazon Web Services. AWS CLI Reference. Accessed: Nov. 26, 2023. [Online]. Available: https://docs.aws.amazon.com/cli/latest/reference/
[23] Github. Implementation Source Code. Accessed: Nov. 26, 2023. [Online]. Available: https://github.com/kandamuniasamy2016/composability
[24] Amazon Web Services. Regions and Availability Zones. Accessed: Nov. 26, 2023. [Online]. Available: https://docs.aws.amazon.com/managedservices/latest/appguide/cfn-ingest-ex-3-tier.html

ROHIT CHADHA received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology, New Delhi, India, in 1997, and the Ph.D. degree from the Department of Mathematics, University of Pennsylvania, in 2003. He is currently an Associate Professor with the Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, and the Director of the Mizzou Cybersecurity Center. He has held research positions with INRIA, Saclay, France; the University of Illinois at Urbana–Champaign; Instituto Superior Tecnico, Portugal; and the University of Sussex, U.K. His research interest includes formal engineering methods for computer security.

PRASAD CALYAM (Senior Member, IEEE) received the M.S. and Ph.D. degrees from the Department of Electrical and Computer Engineering, Ohio State University, in 2002 and 2007, respectively. He is currently a Full Professor with the Department of Computer Science, University of Missouri, Columbia. Previously, he was the Research Director with the Ohio Supercomputer Center. His research interests include distributed and cloud computing, computer networking, and cybersecurity.