0% found this document useful (0 votes)
3 views15 pages

Automated_Infrastructure_as_Code_Program_Testing

The document discusses Automated Configuration Testing (ACT), a novel methodology for efficiently testing Infrastructure as Code (IaC) programs using general-purpose programming languages. It highlights the challenges of testing IaC due to the complexity of modern deployments and the low adoption of testing practices in existing IaC programs. The proposed ProTI tool implements ACT for Pulumi TypeScript, enabling rapid unit testing of IaC programs with minimal development effort and demonstrating effectiveness in bug detection.

Uploaded by

nawrami
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views15 pages

Automated_Infrastructure_as_Code_Program_Testing

The document discusses Automated Configuration Testing (ACT), a novel methodology for efficiently testing Infrastructure as Code (IaC) programs using general-purpose programming languages. It highlights the challenges of testing IaC due to the complexity of modern deployments and the low adoption of testing practices in existing IaC programs. The proposed ProTI tool implements ACT for Pulumi TypeScript, enabling rapid unit testing of IaC programs with minimal development effort and demonstrating effectiveness in bug detection.

Uploaded by

nawrami
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 50, NO.

6, JUNE 2024 1585

Automated Infrastructure as Code Program Testing


Daniel Sokolowski , David Spielmann , and Guido Salvaneschi

Abstract—Infrastructure as Code (IaC) enables efficient de- declarative IaC, developers describe the target state of the de-
ployment and operation, which are crucial to releasing software ployment instead of the deployment steps [7], [8], which are
quickly. As setups can be complex, developers implement IaC automatically derived. Deployments can be complex—a trend
programs in general-purpose programming languages like Type-
Script and Python, using PL-IaC solutions like Pulumi and AWS also driven by modern systems often consisting of several small
CDK. The reliability of such IaC programs is even more relevant components. For example, applications that have consisted of a
than in traditional software because a bug in IaC impacts the monolithic web server and a database now may comprise tens or
whole system. Yet, even though testing is a standard development hundreds of serverless functions and microservices. This trend
practice, it is rarely used for IaC programs. For instance, in transfers complexity from components to their composition,
August 2022, less than 1 % of the public Pulumi IaC programs
on GitHub implemented tests. Available IaC program testing resulting in long, structured IaC scripts. To cope with such
techniques severely limit the development velocity or require complexity, developers implement IaC programs—in contrast
much development effort. To solve these issues, we propose to IaC scripts—with recent declarative IaC solutions that adopt
Automated Configuration Testing (ACT), a methodology to test general-purpose languages, e.g., TypeScript, Python, or Java,
IaC programs in many configurations quickly and with low effort. and not only configuration languages and DSLs with con-
ACT automatically mocks all resource definitions in the IaC pro-
gram and uses generator and oracle plugins for test generation strained expressivity like JSON and YAML. Such Programming
and validation. We implement ACT in ProTI, a testing tool for Languages IaC (PL-IaC) solutions come with all abstractions
Pulumi TypeScript with a type-based generator and oracle, and (and tools) of well-known general-purpose programming lan-
support for application specifications. Our evaluation with 6 081 guages. To the best of our knowledge, the industrial-strength
programs from GitHub and artificial benchmarks shows that PL-IaC solutions available today are Pulumi [9], the Cloud
ProTI can directly be applied to existing IaC programs, quickly
finds bugs where current techniques are infeasible, and enables Development Kit (CDK) of Amazon Web Services (AWS CDK)
reusing existing generators and oracles thanks to its plugg- [10], and the CDK for Terraform (CDKTF) [11]. They have
able architecture. existed since 2018–2020 with quickly growing communities.
Index Terms—Property-based testing, fuzzing, The NPM core packages of AWS CDK, CDKTF, and Pulumi
infrastructure as code, DevOps. alone grew from 11 M downloads in 2020 to 146 M downloads
in 2023.1 Pulumi reported growth from hundreds to 2 000
customers and tens of thousands to 150 000 end users in the
I. INTRODUCTION
same period [12], [13].

I NFRASTRUCTURE as Code (IaC) automates software


operations [1] and is a key tool in organizations that aim
for reliable, high-throughput software development and deploy-
Testing IaC programs is an open research problem and crit-
ical in practice. For example, Rahman et al. [6] urge in their
mapping study of IaC research for more work on testing, and
ment [2]. With IaC, developers specify provisioning, deploy- Guerriero et al. [3] found that declarativity and “impossible
ment, and configuration in text-based files that are amenable testing” are the most mentioned differences between IaC and
to well-known software engineering practices like version traditional software in 44 semi-structured interviews with senior
control, code review, and continuous integration. As a re- developers. The lack of suitable testing techniques is especially
sult, IaC enables faster, more reproducible software operations apparent for PL-IaC: while studies found that more than 50 %
[3], [4], [5], [6]. of public software projects on GitHub use testing [14], [15],
IaC started with imperative scripts, but meanwhile, many we found only 25 % of the PL-IaC programs use testing,
more robust, declarative IaC solutions are available. With dropping to 1 % for general PL-IaC [16], which only Pulumi
implements. Because Pulumi provides more-advanced support
and is more open than the CDKs (Section II-A), we focus
on Pulumi.
Manuscript received 24 November 2023; revised 8 April 2024; Current testing techniques for PL-IaC (Section II-B) pose a
accepted 13 April 2024. Date of publication 1 May 2024; date of current dilemma (Section II-C): integration testing is notoriously slow
version 14 June 2024. This work is partially supported by the Swiss Na- and potentially causes high infrastructure costs. Unit testing
tional Science Foundation (SNSF) under Grant 200429 and by Armasuisse
S+T. Recommended for acceptance by P. Runeson. (Corresponding author: is the only alternative without these issues, but insightful unit
Daniel Sokolowski.) tests for IaC programs require high development effort. Every
The authors are with the University of St. Gallen, 9000 St. Gallen, Switzer-
land (e-mail: [email protected]; [email protected];
[email protected]). 1 According to https://npm-stat.com/ for aws-cdk-lib, @aws-cdk/
Digital Object Identifier 10.1109/TSE.2024.3393070 core, cdktf, and @pulumi /pulumi.

0098-5589 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
1586 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 50, NO. 6, JUNE 2024

resource definition has to be replaced with a mock faithfully


modeling the cloud resource and implementing configuration
validation and generation logic. Mocking code easily becomes
more complex than the IaC program under test itself, resulting
Fig. 1. High-level architecture of PL-IaC solutions.
in very few projects using systematic testing—despite testing
being crucial for the high-velocity development of reliable soft-
ware [17], [18]. 4) An evaluation with 6 081 Pulumi TypeScript programs
To enable efficient testing of IaC programs, we propose Au- from GitHub and benchmarks, showing that ProTI ap-
tomated Configuration Testing (ACT). ACT is an automated plies to existing IaC programs, can efficiently find bugs,
framework allowing developers to rapidly unit-test IaC pro- and can leverage existing tools as test generators and
grams in hundreds of configurations without writing any code. oracles through plugins.
ACT uses existing mechanisms to mock all resource definitions
in the IaC program automatically. In the mocks, a generator
provides test input, and oracles validate resource configurations. II. PL-IAC AND THE TESTING PROBLEM
The approach is open and enables reuse across projects through
pluggable third-party generators and oracles. We introduce PL-IaC (Section II-A) and IaC program testing
We implement ACT in the ProTI testing tool for Pulumi (Section II-B), discuss how existing testing techniques fall short
TypeScript with a default generator and oracle leveraging type (Section II-C), and outline our solution in Section II-D.
information from Pulumi package schemas. The evaluation on
6 081 Pulumi TypeScript programs from GitHub and generated
artificial benchmarks shows that (1) ProTI can find bugs reli- A. Programming Languages IaC (PL-IaC)
ably and quickly compared to existing testing techniques, (2) PL-IaC solutions adopt a general-purpose programming lan-
ProTI can be applied to IaC programs without any changes, guage, e.g., Python or TypeScript. IaC programs define the
(3) ProTI finds bugs often within seconds or tens of seconds, declarative target state of the deployment as a directed acyclic
and (4) ProTI can leverage existing generator and oracle tools graph (DAG). Each node is a resource with its configuration,
through simple plugins. and the arcs are dependencies between resources that constrain
This work is the first investigation of testing for IaC pro- the deployment order. For instance, a (virtual) server must be
grams. It is relevant because the popularity of PL-IaC is contin- created before a web application on it. Similarly, the server
uously increasing. Further, testing IaC is open research with a should be created before its DNS record.
high impact on the security and reliability of software systems. While IaC programs describe the target state, the deployment
ACT and ProTI are novel because they introduce efficient test- engine performs the actual deployment actions and maintains
ing of IaC programs and significant because they can improve the deployment’s state. The deployment engine receives the
the velocity of correct IaC program development. Also, ProTI’s target state from the IaC program, compares it with the current
pluggable architecture enables researchers to experiment with state, and performs the required actions to fill the gap (Fig. 1).
new oracles and generators specialized for PL-IaC. Lastly, our Further, it provides the IaC program with information about the
work follows scientific standards [19] rigorously, and we dis- deployment state such that the program can observe infrastruc-
close all developed software, analysis scripts, and data to en- ture information only available after a resource was created,
sure verifiability, transparency, reusability, and recoverability. e.g., a dynamically assigned IP.
ProTI2 is open source and publicly maintained on GitHub3 To define the target state, PL-IaC solutions provide an em-
with long-term archived releases [20]. The remaining evaluation bedded DSL that is available as libraries for many programming
material is published under the CC-BY-4.0 license on Zenodo languages. These SDK libraries simply provide a class for each
[21]. We only exclude analyzed third-party code that does not deployable resource type. In an IaC program, developers define
permit re-distribution. In summary, this paper contributes: a resource (i.e., a node in the target state) by instantiating an
1) A comparison of testing methods for IaC programs, es- object of the resource type’s class. The resource’s input con-
tablishing the testing dilemma of PL-IaC, which is backed figuration is provided as an argument to the constructor. After
by our previous repository mining study [16]. the deployment engine deploys a resource, its post-deployment
2) Automated Configuration Testing (ACT): A novel ap- output configuration is available as properties on the resource’s
proach for efficient unit testing of IaC programs. object. Developers explicitly define a dependency from a re-
3) ProTI: A testing tool implementing ACT for Pulumi source A to a resource B (i.e., an arc from node A to B in the
TypeScript that is pluggable with third-party generators target state) by referencing B or one of its output properties
and oracles and provides default type-based generators in A’s input configuration. Alternatively, such a dependency
and oracles based on Pulumi package schemas. can be defined implicitly by instantiating A in a program part
that depends on an output property of B. Defined resources,
their properties, and their dependencies are immutable. Thus,
the target state is monotonically growing throughout the IaC
2 https://proti-iac.github.io. program execution, and defined resources and dependencies can
3 https://github.com/proti-iac/proti. neither be changed nor removed.

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
SOKOLOWSKI et al.: AUTOMATED INFRASTRUCTURE AS CODE PROGRAM TESTING 1587

Listing 1. RWW example: Pulumi TypeScript program that deploys a static


website on AWS S3 showing a random word.4

Fig. 3. Roles and relationships in PL-IaC solutions.

the component in the PL-IaC architecture that implements the


cloud-specific SDK library and deployment engine plugin to de-
fine and control the cloud’s resources (for Pulumi, e.g., there are
providers for AWS, Azure, Google Cloud, etc.). IaC programs
use the provider SDK libraries (e.g., Pulumi’s AWS and random
provider SDK libraries are imported in Line ) and instantiate
objects of their resource types (e.g., the S3 bucket in Lines
1.6–1.8) to define resources in the target state. The deployment
engine receives the target state from the IaC program and com-
pares it to the current state, which it maintains. To reach the
target state, the deployment engine uses the provider plugins
to control the specific clouds, i.e., to perform actions to create,
Fig. 2. Example of a target state described by Listing 1.4 read, update, delete, and list the resources (CRUDL actions).
To our knowledge, Pulumi is the only industrial-grade IaC
solution implementing this PL-IaC approach to its full extent.
As a running example, we introduce the Pulumi TypeScript Pulumi features a CLI that orchestrates the concurrent exe-
IaC program of the Random Word Website (RWW) in List- cution of the IaC program and the deployment engine. Both
ing 1, which defines the target state in Fig. 2. It deploys a other available PL-IaC solutions—AWS CDK and CDKTF—
static website on AWS S3 [22] that displays a word randomly support a weaker approach, two-phase PL-IaC (Section VI).
selected from the array in Line 1.5. Lines 1.6–1.8 define the S3 Hence, we focus on Pulumi and TypeScript, which is the most
bucket and Line 1.10 the word-id resource. It receives range popular programming language in PL-IaC [16]. We do not
(Line 1.9) as input configuration and is assigned to rng. After investigate cloud SDKs, e.g., AWS SDK [23] and Azure SDK
word-id is deployed, the deployment engine provides a ran- [24], because they target the imperative, low-level management
domly drawn number as the result field of the resource’s of resources. PL-IaC abstracts the complexities of CRUD oper-
output configuration. Such output configuration values are ations for IaC program developers and hides such SDKs in the
available as properties of the resource objects, in this case provider plugins.
as rng.result. To access the value, apply (Line 1.11)
registers a callback (Lines 1.11–1.18), which executes as soon
B. Testing IaC Programs
as the random number is available. The number is used to
select a word from the words array in Line 1.16, which is Fig. 4 shows the PL-IaC testing techniques available for
capitalized and set as content in the input configuration of Pulumi programs [25], ordered top-to-bottom by time consump-
the index resource (Lines 1.12–1.17). The dependence of tion. Unit Testing IaC programs is like in traditional software:
index on word-id is defined implicitly by defining index IaC users run (parts of) the program with a unit testing frame-
in the apply callback, a program part depending on word-id’s work, mock objects with side effects, i.e., every resource def-
output configuration. The dependence on the S3 bucket is made inition in an IaC program, and add checks. Even with runtime
explicit by referencing its object in the input configuration (Line mocking—like supported by Pulumi—developers still have to
1.13). Finally, Line 1.20 exports the website’s URL. provide the mocking logic. Dry Running simply executes the
Fig. 3 shows the PL-IaC architecture in detail. The IaC IaC program without executing deployment actions, providing
program (e.g., Listing 1) uses the IaC solutions’ SDK (e.g., a quick indication of whether the program terminates and a
Pulumi’s SDK is imported in Line 1.1) to deploy an application preview of the target state. Yet, the preview of dry running
in the cloud. A cloud is a collection of resources controllable is incomplete and neither supports specific checks nor ensures
through an API, e.g., the random number generator and AWS sufficient coverage. Dry running does not execute code paths
public cloud in Listing 1. For each cloud, there is a provider, that depend on values available only after a resource was cre-
ated. Resource Property Testing and Stack Property Testing,
4 For brevity, we omit the bucket’s ownership controls, public access block, e.g., with CrossGuard [26], solve these issues by performing the
and policy resources that are required to allow public access from the Internet. deployment, making them integration testing techniques. They

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
1588 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 50, NO. 6, JUNE 2024

Unit testing PL-IaC is labor-intensive and error-prone com-


pared to developing the program. First, one has to mock all
resource definitions—three in Listing 1. This step is not prob-
lematic per se, e.g., by adopting the runtime mocking Pulumi
provides. Yet, to create effective mocks, developers must imple-
ment validation logic for the input configuration and generate
output configurations as test inputs for the rest of the program.
Such code simulates the logic of cloud configuration, which is
complex and requires a correct model. Lastly, developers must
Fig. 4. PL-IaC testing techniques and their coverage, ordered by expected ensure the tests cover all relevant cases and may need to update
run time and feedback cycle frequency.
mocks with every change.
In summary, developers face a dilemma when testing IaC
TABLE I programs: They either invest excessive programming effort for
TESTING TECHNIQUES IN PUBLIC PULUMI PROGRAMS
ON GITHUB (AUGUST 2022) [16] efficient unit testing or resort to expensive integration testing,
hampering development velocity.
# Projects (% of Total)
Pulumi projects 12 945 (100 %) D. Automated IaC Testing to the Rescue
with unit testing 118 (1 %)
with property testing 33 (0 %)
To solve this issue, we propose Automated Configuration
with end-to-end testing 22 (0 %) Testing (ACT), a novel unit testing methodology for fast testing
of IaC programs with low development effort. ACT automati-
cally mocks all resource definitions. Each mock implements a
check resource configurations against policies before their de- test oracle to validate the resources’ input configurations and a
ployment and the final observed post-deployment state. End-to- test generator that provides the resources’ output configurations.
end Testing, e.g., Pulumi’s integration testing framework [27], With this level of automation, the PL-IaC program can be tested
runs the IaC program and validates the resulting deployment— in many different configurations without writing testing code.
not only its observed state. We implement ACT in ProTI, a testing tool for Pulumi
To shed light on the adoption of these testing techniques, TypeScript. ProTI’s results depend on oracles and generators,
we analyzed all public Pulumi projects on GitHub in August which are pluggable to foster reuse across programs. ProTI is
2022 in our PIPr dataset. For the detailed method, results, and equipped with a default oracle and generator based on types of
replication scripts, we refer to the PIPr dataset paper [16]. resource configurations from Pulumi package schemas. Further,
Briefly, we extracted specific keywords and the file extensions ProTI provides a specification mechanism to refine oracles and
of all files in the projects’ directory and mapped this data to generators in the IaC program where needed. ProTI tests the
testing techniques. Table I summarizes the results, showing example in Listing 1 in hundreds of different configurations in
that less than 1 % of all 12 945 projects implement systematic a short time with no changes to its code. In each test, ProTI
testing. This minuscule share indicates that developers perceive validates all resource configurations, including different num-
systematic testing impractical for IaC programs. ber values for wordId. ProTI will likely test a case where
Line 1.16 fails, detecting the bug in seconds.
C. The Dilemma of IaC Program Testing
III. AUTOMATED CONFIGURATION TESTING
Even though neglected by PL-IaC developers, systematic
testing is crucial for the high-velocity development of IaC— We now introduce Automated Configuration Testing (ACT),
no less than for traditional software [17], [18]. Without testing, a novel testing methodology for IaC programs. To effectively
e.g., it is easy to miss the bug in Listing 1: The random number address the testing dilemma (Section II-C), ACT is a unit testing
ranges from zero to three (Line 1.9), but the words array technique because the core issue of integration testing, being
index only from zero to two. If three is drawn, Line 1.16 calls slow and resource-intensive, is caused by the cloud providers,
toUpperCase() on undefined, causing an error. We now e.g., AWS and Azure, and cannot be significantly improved at
consider today’s PL-IaC testing techniques for Listing 1. the side of IaC developers. Thus, we aim to understand and
For integration testing IaC programs, including end-to-end minimize the developer’s unit testing effort.
and property testing, a single run of Listing 1 takes at least
A. Why Unit Testing IaC Programs is Effortful: Mocks
seconds. Programs with more complex resources may require
hours and cause high infrastructure costs. Testing only a few Efficient unit testing requires eliminating of integration with
configurations can miss corner-case bugs, like in Listing 1. external, slow, and resource-intensive components. For IaC
Dry running is fast and does not require coding. Yet, it cannot programs, this means mocking the interaction with the cloud,
find many errors, including the one in Listing 1, because it which is encapsulated in resource definitions. To this end, all
does not execute code depending on output configuration that resource object instantiations, a substantial part of the IaC pro-
is only available post-deployment, e.g., the apply callback in gram’s code, must be mocked—most of the code in the RWW
Lines 1.11–1.18. example (Listing 1).

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
SOKOLOWSKI et al.: AUTOMATED INFRASTRUCTURE AS CODE PROGRAM TESTING 1589

A test generator provides a separate value producer for each


test case. During the execution of a test case, its value producer
receives the resources’ input configurations and returns for each
an output configuration. As the output configurations are test
input, the test case is ultimately defined as the sequence of
output configurations its value producer returns. An oracle is
a predicate function that decides whether the resource’s input
configurations are valid. We distinguish between two kinds
Fig. 5. Overview of ACT.
of oracles. Resource oracles receive individual resource input
configurations during the IaC program execution. Deployment
Mocking all resource definitions with a naïve mock is trivial, oracles receive the input configuration of all resources after the
requiring, e.g., in Pulumi TypeScript, only a couple of lines IaC program completed, enabling holistic validations.
of code—independent of the IaC program’s size. Yet, for ef- In ACT, both the generator and the oracles are plugins, al-
fective unit testing, the mocks have to implement the cloud lowing for exchange, adoption, and experimentation with test
logic in two crucial aspects. (1) The mocks have to return generators and oracles. Ideally, these plugins implement gener-
an output configuration for each resource input configuration alized, reusable generation and validation strategies decoupled
they receive. This is because, in a real deployment, the cloud from a specific IaC program. ACT solves the issue of unit test-
provides the resource’s output configuration to the IaC program ing IaC programs by moving the development effort of testing
after the resource deployment. As the output configurations are code from the developers of an individual IaC program to the
accessible in the remaining IaC program, they indeed constitute community. Once the community instantiates ACT for a specific
test input. Thus, the returned output configurations have to be platform (e.g., .Net or Python; our reference implementation
realistic to test the remaining IaC program precisely. Further, to covers Pulumi Typescript) and provides suitable plugins, de-
cover all paths, it may be necessary to return different output velopers can test the basic correctness of the imperative IaC
configurations across test executions. (2) To test the declarative program and its target state without implementing any code.
target state the IaC program defines, i.e., the cloud configuration ACT’s approach fosters the reuse of plugins across dif-
to set up and not only the imperative IaC program execution, ferent applications. However, to ensure that testing is also
the mocks have to validate the received resource input con- based on application-specific knowledge (e.g., intentful oracles,
figurations (i.e., they have to implement test oracles). This is Section III-A), a mechanism to augment the community-
because the cloud provides feedback to the IaC program on provided generators and oracles with application-specific gen-
the resource input configurations by reporting an error when eration and validation specifications is needed. For this, ACT
an invalid configuration is deployed. implementations can leverage various approaches, e.g., specifi-
Such oracles should be intentless, i.e., they reject config- cation DSLs separated from or embedded into the IaC program
urations that are generally invalid, independent of the IaC code. ProTI features ad-hoc specifications, an embedded DSL
program’s context. Ideally, they are further intentful, i.e., integrated into the IaC program code (Section IV-C).
they also reject configurations that violate the IaC program’s
application-specific goals. C. Running Test Sequences With ACT
Finally, the significant challenge is that mocks have to im-
plement both suitable test generators and oracles. Suitable test With automated test execution, generation, and validation,
generators ensure coverage and minimize false positives be- ACT can execute the IaC program in many different configura-
cause they do not generate unrealistic test inputs that trigger tions. For a sequence of tests, the generator plugin provides a
issues that would never occur in practice. Suitable test oracles different value producer for each test case. The test case selec-
verify the cloud configuration the IaC program defines. Both are tion it performs is crucial, i.e., which value producer instances
non-trivial and require a significant amount of code—likely a it chooses, as it determines which and how parts of the IaC
multiple of the IaC program’s code. Further, such mocks mirror program are tested. ACT terminates once an oracle finds a bug,
the logic of the IaC program under test and the cloud it uses, the program under test crashes, or, if no bug is found, after
leading to code tightly coupled with the IaC program, ultimately a defined amount of runs or a timeout. Thus, a generator’s
slowing down any future changes. prioritization and selection of test cases is crucial to ensure
relevant bugs are triggered (early).
Conceptually, ACT combines property-based testing (PBT)
B. Automating Unit Testing With ACT
[28], [29] and fuzzing [30] techniques for IaC programs. Both
To solve these issues, we propose ACT (Fig. 5). ACT systematically test a program s in many configurations c ∈ C,
automatically mocks all resource definitions by intercepting which are put into relation by a property p, leading to ∀c ∈
the constructors of resource classes, e.g., the constructor of C. p(c, s(c)) if s is correct. However, the pessimistic assump-
aws.s3.Bucket in Lines 1.6–1.8 of Listing 1. The ACT tion is that s contains a bug, yielding the goal to find and test a
resource mocks receive the input configuration of each resource configuration c leading to ¬p(c, s(c)) as early as possible in a
and return suitable output configurations. The resource mocks sequence of tests. As the generator plugin is exchangeable, ACT
implement both a test generator and a set of test oracles. is amenable to new state-of-the-art fuzzing and PBT test case

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
1590 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 50, NO. 6, JUNE 2024

selection strategies, e.g., based on testing feedback [31], search- models, including types. For instance, Pulumi providers, i.e.,
based techniques [32], code coverage [33], and combinatorial vendor-specific plugins (cf. Section II-A) used by Pulumi to
coverage [34]. interact with the cloud, are distributed as packages that con-
tain a schema JSON file defining the types of the resources’
target and output configuration. Such type definitions are a
D. Discussion configuration model that is by design available for all resources
We now discuss bugs in IaC programs, ACT’s design, its Pulumi supports—even for dynamically typed languages—and
relation to cloud models, and the resulting limitations. they can be leveraged for type-based generators and oracles
1) IaC Program Bugs: We propose a bug taxonomy for [29]. ACT’s open architecture ensures that developers can adopt
IaC programs. In contrast to previous, more fine-grained bug and combine available models and plug in domain-specific
taxonomies, e.g., for IaC defects by Rahman et al. [35], we optimizations. ACT is not limited to functional properties. For
focus purely on the required oracle to find a bug. Recent fuzzing instance, models of cloud performance and security, predicting
literature, e.g., Su et al. [36] and Li et al. [33], commonly bad performance and insecure setups based on resource config-
distinguishes crash bugs that cause the program to crash and urations, can be embedded in ACT oracle plugins to cover such
non-crashing logic bugs, which require a more precise oracle non-functional aspects.
than crash detection to identify erroneous computations. We add Ideally, models for ACT generators and oracles are (1) com-
two categories for bugs where the program logic may be correct, plete, i.e., they can produce all valid configurations, and (2)
but the resulting resource configuration is faulty. Configuration correct, i.e., they include only valid configurations. Incomplete
bugs are the wrong configuration of an isolated resource, e.g., models in a generator systematically prevent generating test
setting an IPv4 address to the invalid value 400.0.0.1. With cases that may be needed to find bugs, and incorrect models can
configuration interaction bugs, the configuration of the individ- yield test cases that never occur in practice. Incomplete models
ual resources is valid but invalid in combination. For example, in oracles can trigger false positives (i.e., alerts in the absence of
there is a subnet 192.168.0.1/24 and a server in it has a bug) and incorrect models false negatives (i.e., missing bugs).
the IP address 192.168.1.2, which is invalid in this subnet. In practice, cloud models are not perfect. For instance, Pulumi
In contrast to crash and logic bugs, configuration bugs require package schema types are complete but not fully correct. In
oracles that can identify invalid cloud configurations and, for RWW (Listing 1), a correct generator should generate integers
configuration interaction bugs, even across multiple resources. in the range (Line 1.9) for RandomInteger’s result field
Crash bugs and logic bugs are related to “traditional” code, (Line 1.11). Yet, a type-based generator provides any number,
while configuration (interaction) bugs are related to the embed- including outside the range and fractions, because the type
ded DSL code in IaC programs that defines the target state of of RandomInteger.result is number. Similarly, a cor-
the deployment through instantiating objects of the resource rect oracle only accepts valid HTML for the content field
types’ classes. However, IaC programs mix traditional code (Line 1.16), but a type-based one accepts any string.
(Lines 1.1–1.5, Line 1.9, and Line 1.20 of Listing 1) with the In practice, useful test generators and oracles may still gen-
embedded DSL code (Lines 1.6–1.8 and Lines 1.10–1.18). This erate irrelevant tests or miss bugs. Even if application-specific
mixing prevents testing the kinds of code in isolation and causes knowledge can further limit the configuration space, correcting
existing testing methods to be only applicable with a huge the model in generator and oracle plugins may overfit the plu-
mocking effort (Section III-A). gins to the specific program, reducing reusability or slowing
2) ACT’s Approach: ACT focuses on finding configuration down development. ACT addresses these issues by enabling
(interaction) bugs. To this end, static analysis is a suitable alter- fine-tuning of test generation and oracles for a specific ap-
native; for example, it can easily find the bug in Listing 1. Yet, plication, e.g., ProTI provides an ad-hoc specifications syntax
we base ACT on automated testing because it does not incur the (Section IV-C).
limitations of static analysis when covering complex dynamic
behavior of the IaC program code and supporting all features
IV. ProTI: ACT FOR PULUMI TYPESCRIPT
of the host language. In such systematic testing, the generator
has to exercise the IaC program in different configurations We present ProTI, an instantiation of ACT for Pulumi Type-
to find crash and logic bugs that yield wrong configurations Script. ProTI is built upon the popular JavaScript testing tool
effectively. We argue that covering such configuration-related Jest [37], fast-check [38] for the test execution strategy and
crash and logic bugs is sufficient because IaC programs focus arbitraries, and Pulumi’s runtime mocking. ProTI comprises
on the configuration, and all relevant logic drives this pur- six TypeScript packages (Table II). The first four packages
pose. If an IaC program implements complicated configuration- implement the core abstractions and Jest plugins for a Jest
unrelated logic, it should be separated from the embedded DSL runner, test runner, and reporter. @proti-iac/pulumi-
code and specifically checked with existing, well-established packages-schema is a Pulumi-packages-schema-based or-
testing techniques. acle and a generator plugin. @proti-iac/spec implements
3) Cloud Configuration Models: Generators and oracles the ad-hoc specification syntax. ProTI is used through Jest’s
implicitly define models of cloud resource configuration. Such CLI, which’s configuration it facilitates with a preset. ProTI
models could be derived from specifications, be hand-crafted, preserves Jest’s pre-test features and optimizations, e.g., an in-
or, more realistically, be derived from existing approximate memory file system for the code.

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
SOKOLOWSKI et al.: AUTOMATED INFRASTRUCTURE AS CODE PROGRAM TESTING 1591

TABLE II B. Test Generator and Oracle Plugins


PROTI PACKAGES: NON-BLANK, NON-COMMENT SLOC
In an execution, ProTI loads exactly one generator plugin
Package Description Source SLOC Test SLOC and a variable number of oracle plugins, which are invoked in
@proti-iac/core Core abstractions 758 863 parallel. We do not provide an explicit mechanism to compose
@proti-iac/runner Jest runner 26 51 different plugins; however, when developers write a plugin’s
@proti-iac/test-runner Jest test runner 429 90
@proti-iac/reporter Jest reporter for check results 149 19 code, they can also combine other plugins programmatically.
@proti-iac/spec Ad-hoc specifications 12 74 ProTI plugins are implemented as NodeJS modules, export-
@proti-iac/pulumi- Pulumi packages schema infras- 1 334 1 960
packages-schema tructure, oracle, and generator ing the respective plugin as default and, optionally, an init
Total 2 708 3 057
function of ProTI’s TestModuleInitFn type that can im-
plement initialization code called by ProTI when loading the
plugin and also implements a plugin configuration interface.
@proti-iac/core implements all plugin-related types.
Generator plugins are implemented as fast-check value gen-
erators of ProTI’s Generator type, i.e., type Arbitrary
<Generator>. The arbitrary is called once for each test
run to provide a Generator and may implement shrinking,
a technique from property-based testing where, once an error
is found, simplified versions are tested and presented to the
developer as an easier-to-understand alternative if they still
trigger the bug [29]. The test run’s generator is invoked for
Fig. 6. ProTI test execution: i. – iii. initialization, a. – b. run initialization,
each resource with its target configuration and returns its output
1. – 4. resource mocking, I. – II. reporting. configuration for the run. Further, the generator is invoked with
the arbitrary of each ad-hoc generator specification, guiding
its execution to enable deterministic test generation strategies,
A. Test Execution With ProTI including shrinking.
Oracle plugins are implemented as a class inheriting from
Jest runners distribute tests over multiple workers. They in-
ProTI’s Oracle<S> type and can leverage state of type S
voke a test runner for each test suite. ProTI’s runner extends
that is initialized for every test run through a function they
Jest’s default by (1) verifying the test configuration and (2)
implement and passed to all invocations of the oracle in the run.
forwarding file system and module resolution information to
For these invocations, oracles implement at least one out of four
ProTI’s test runners, which they had to re-generate otherwise.
resource input configuration validation interfaces, which are
ProTI’s test runner is invoked once on the Pulumi.yaml
separately called for each resource or once with all resources,
of each IaC program and implements ACT (Section III). Fig. 6
both available synchronously and asynchronously.
details the test execution. First, the IaC program and its de-
For now, ProTI provides default generator and oracle plugins
pendencies are transpiled to JavaScript (i) and a configured
based on Pulumi packages schema types in @proti-iac/
set of dependencies is preloaded (ii). Preloaded modules are
pulumi-packages-schema. The package implements the
shared among all IaC program runs, breaking isolation but
infrastructure to automatically retrieve the schemas of all re-
reducing overhead. For technical reasons, Pulumi’s SDK must
sources in the IaC program under test. The oracle translates the
be preloaded. Further, the test coordinator loads the genera-
schemas’ resource types to validation functions to dynamically
tor and oracle plugins (iii). ProTI checks the IaC program in
check each resource input configuration. The generator com-
several runs, each configured with its own test run coordi-
poses fast-check arbitraries to generate output configurations,
nator (a), managing isolated run states for the generator and
inheriting fast-check’s random value generation strategy, which
oracles. Each run executes the IaC program once (b). ProTI
is biased towards generating extremes, e.g., instead of using
mocks all resource definitions by intercepting the constructors
an even distribution, it prioritizes generating small and big
of all resource classes with Pulumi’s runtime mocking feature.
values. However, ProTI can be easily extended with oracles
This way, each resource input configuration CI is run through
and generator arbitraries based on other model sources, e.g.,
validations and transformations that the provider’s SDK may
codified policies and cloud specifications.
implement in the resources’ constructors. We call the checked
and potentially transformed CI target configuration CT . For
instance, a resource’s CI may contain additional fields, which C. Ad-Hoc Specifications in ProTI
is valid in TypeScript’s structural type system, but the resource To fine-tune generators and oracles, ProTI provides ad-
constructor does not add them to CT . ProTI uses CT instead hoc specification syntax. generate(e).with(a) defines
of CI in the remaining ACT workflow. The mock provides all an ad-hoc specification to replace values returned by e with
resources’ target configuration CT to the generator and oracle values from a fast-check arbitrary a. For ad-hoc oracles,
plugins (1 – 2), and receives the output configurations CO to expect(e).to(p) applies an oracle predicate function to
use in the remaining program execution (3 – 4). Finally, ProTI an expression e. In a regular execution, the ad-hoc syntax only
reports the test results (I – II). returns the evaluation of the wrapped expression e, with no

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
1592 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 50, NO. 6, JUNE 2024

Listing 2. Listing 1 with ProTI ad-hoc specifications (orange). and scalability to ensure it is fast enough for realistic
IaC programs.
RQ4: Can existing test generation and oracle tools be
integrated into ProTI? We investigated whether ACT
allows to leverage third-party oracles and generators.
The following four subsections present our experiments and
Section V-E discusses their results and threats to validity. We
ran all experiments on serverless AWS Fargate [39] containers
with 1 vCPU and 4 GB of memory on AWS Elastic Container
Service (ECS) [40] in the eu-west-1 region (Ireland).

A. Finding Errors in IaC Programs


We compared ProTI with the available testing techniques
change in the semantics of the IaC program. When running with for Pulumi TypeScript programs (cf. Section II-B) on nine
ProTI, however, generate(e).with(a) calls the gener- variants of the RWW example (Section II-A). The variants are
ator plugin with a and returns a value from a. The oracle the following. VC is correct, VS is Listing 2, i.e., VC with
syntax still returns the evaluation of e, but it introduces a check, ProTI ad-hoc specifications, and VSDB adds the deployment
reporting an error if p(e) is false or fails. of a serverless database to VS. Most remaining variants have
Listing 2 fixes the indexing bug in Listing 1 and is fine-tuned a crash bug according to our bug taxonomy (Section III-D):
with ad-hoc specifications. In ProTI executions, the ad-hoc gen- VNT has syntax errors, VE always throws an error, which VAE
erator specification in Line 2.12 addresses the imprecision of throws asynchronously, VO is Listing 1, i.e., it has a one-off bug
the type-based oracle in Line 1.11, which generates any number, in asynchronous code that leads to a crash, which we combine
not only realistic output configuration values. Instead, Line 2.12 with the ad-hoc specifications of Listing 2 in VSO. VSB is VS
specifies to only generate integer values in the correct range with a configuration bug, setting a string instead of an object
interval. Further, Lines 2.18–2.19 specify an oracle that checks for the bucket’s website property (cf. Line 1.7).
that the webpage’s content is not empty, encoding the develop- ProTI was configured with the type-based oracle and gen-
ers’ application-specific intent to show a non-empty webpage. erator (Section IV) and up to 100 runs. Unit testing used Jest
While ProTI implements this embedded specification DSL [37] and ran the program once with a naïve mock that returned
for application-specific generator and oracle directives, ACT empty configurations. Dry running executed pulumi pre-
implementations could use external DSLs or encourage sepa- view. (Dry) property testing executed Pulumi CrossGuard [26]
rating the specification code into other files, as common with via (pulumi preview) pulumi up with the AWSGuard
most testing frameworks today. Such separation is also possible policy pack [41]. All Pulumi commands were non-interactive
with ProTI’s ad-hoc specifications but would require restructur- with skipped previews. End-to-end testing used Pulumi’s Go
ing the IaC program code to improve testability. For instance, integration testing framework [27], checking the content of the
the code augmented with specifications in Listing 2 could be deployed website. We executed each experiment 10 times after
wrapped in functions that separate testing files mock during warmup. Table III reports whether an error was (always) found
testing. We support inlining in ProTI for simplicity, assuming and the minimum and average run time.
that few ad-hoc specifications are required with good plugins. As expected, dry running did not find asynchronous errors
Yet, if a lot of ad-hoc specifications are required, separation is (VAE, VO, and VSO) as it does not run code depending on
preferable to avoid the added complexity of mixing concerns, unknown output configurations. Property testing and end-to-
obfuscating the IaC program code, and potentially introducing end testing found the one-off bugs (VO and VSO) only occa-
new error sources. sionally. ProTI was the only technique that spotted all errors
reliably. However, the imprecision of the type-based generator,
V. EVALUATION
i.e., generating any number for rng.result and not only
We evaluate ACT’s effectiveness, applicability, performance, integer values in the defined range, increased the likelihood of
and extensibility by answering the following research questions finding the error in VO, but also caused that ProTI identified
about its ProTI implementation. VC as faulty; a false positive. This imprecision is resolved in
RQ1: Can ProTI find bugs reliably? We determined VS, VSO, VSB, and VSDB with ProTI ad-hoc specifications
whether ProTI can find bugs quickly and reli- (cf. Section IV-B). ProTI always identified bugs in the first test
ably compared to existing PL-IaC testing techniques run, except in VSO, where it required 2 to 6 tests, causing a
(cf. Section II-B). slightly longer run time compared to VO.
RQ2: Is ProTI applicable to real-world open-source code?
We explored whether ProTI can be applied to exist-
ing, real-world IaC programs. RQ1: ProTI can find bugs reliably and is able to uncover
RQ3: How long does ProTI run, and how does the run errors in edge cases without explicitly testing for them.
time scale? We measured ProTI’s execution duration

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
SOKOLOWSKI et al.: AUTOMATED INFRASTRUCTURE AS CODE PROGRAM TESTING 1593

TABLE III errors by common causes and report their frequency. Both the
PL-IAC TESTING TECHNIQUES ON VARIANTS OF THE RWW EXAMPLE categorization and error labeling are based on string matching
(Listing 1). ∗ FAULTY VARIANT. ERROR FOUND ∗ (ALWAYS ), MINIMUM
(AVERAGE) RUN TIME OVER 10 REPETITIONS on the execution logs, and the error grouping by open coding.
This process was incrementally performed and implemented by
ProTI Dry Run Property Test the first author and reviewed by the second author. The authors
Unit Test Dry Property Test End-to-end Test know Pulumi and ProTI well through their research.
VNT: 
16.7 s (16.8 s) 10.0 s (10.3 s)  12.4 s (12.4 s)

Non-transpilable 
1.9 s (2.0 s) 11.6 s (11.7 s)  47.9 s (65.7 s)
On a technical level, ProTI was able to test 40 % of the IaC

7.0 s (7.2 s)  2.3 s (2.4 s)  4.4 s (4.5 s) programs out of the box. This share is extremely remarkable and
∗ VE: Error

2.2 s (2.2 s)  3.7 s (3.8 s)  52.5 s (59.6 s) exceeds our initial expectations because (1) we did not filter for
VAE: 
7.4 s (7.6 s) 3.4 s (3.5 s)  9.4 s (9.6 s)
∗ buggy or non-functional programs, (2) ran all programs with
Async Error 
2.4 s (2.5 s) 4.8 s (4.9 s)  50.8 s (60.7 s)
VC: Correct

7.5 s (7.6 s) 3.4 s (3.4 s) 9.5 s (9.7 s) current NodeJS and TypeScript versions, and (3) did neither
2.7 s (2.7 s) 4.8 s (4.9 s) 53.5 s (59.0 s) look into nor provide any program-specific environments. We
VS: Listing 2 21.0 s (21.1 s) 3.5 s (3.5 s) 9.5 s (9.7 s)
(ad-hoc specs.) 2.8 s (2.9 s) 5.0 s (5.0 s) 52.6 s (62.3 s) suspect that ProTI can be used for most of the remaining IaC
VO: Listing 1  7.4 s (7.6 s) 3.4 s (3.4 s) ∗ 9.4 s (9.6 s) programs, too, after little effort is invested to understand their

(one-off bug) 2.7 s (2.7 s) 4.8 s (4.9 s) ∗ 51.9 s (58.4 s)
VSO: Listing 2  8.1 s (8.3 s) 3.5 s (3.6 s) ∗ 9.5 s (9.7 s)
expected execution environment or bug.
∗ The most common reasons why ProTI could not test a pro-
with one-off bug 2.8 s (2.9 s) 4.9 s (5.0 s) ∗ 59.5 s (66.6 s)
VSB: Listing 2  7.6 s (7.8 s)  3.5 s (3.5 s)  5.6 s (5.7 s) gram are module resolution and type checking, failing 1 745

with config. bug 2.8 s (2.9 s)  4.8 s (4.9 s)  48.4 s (57.4 s)
VSDB: Listing 2 39.2 s (39.6 s) 8.1 s (8.4 s) 163.4 s (189.9 s)
(29 %) and 984 (16 %) executions. The causes include in-
with AWS RDS 3.1 s (3.1 s) 8.0 s (8.1 s) 212.5 s (265.7 s) compatibility with PNPM, the TypeScript version, unmet envi-
ronment assumptions, and incomplete, broken setups. Among
the programs ProTI was able to test, it found issues in 68 %.
The tests found 659 (11 %) executions where the setup was
TABLE IV
EXECUTION TIME AND RESULT CLASSIFICATION OF PROTI EXECUTIONS incomplete, e.g., missing configuration or programs. Mocking
ON 6 081 PULUMI TYPESCRIPT PROGRAMS failed in 468 (8 %) executions, which can be caused by incom-
patible, outdated Pulumi versions. Our type-based oracle and
Category Execution Time
Error Reason [# Programs. (% in Category)] average generator failed to find type definitions in 416 (7 %) execu-
# programs. (std)
tions because they are dynamic resources, stack references, or
Project 1.6 s
2 (0 %)
invalid Pulumi.yaml 2 (100 %)
(0.1 s) missing in the provider’s schema. Our oracle identified invalid
Transpilation module resolution 1 335 (50 %), type checking 984 8.9 s resource configurations in 58 (1 %) executions. ProTI ran only
2 649 (44 %) (37 %), program resolution 324 (12 %), legacy (5.6 s)
NodeJS 5 (0 %), JSX 1 (0 %) an unknown number of tests in crashed executions, 100 tests in
Preloading module resolution 410 (85 %), legacy 7.8 s the passing ones, and only a single test in 98 % of the executions
482 (8 %) NodeJS/Pulumi 20 (4 %), unknown 18 (4 %), syntax (5.9 s) under checking. In the other 26 checking executions, ProTI ran
error 18 (4 %), config 16 (3 %)
Checking setup 659 (40 %), mocking 468 (29 %), missing type 17.2 s between 2 and 38 tests until an error was found. Due to a lack
1 633 (27 %) definition 416 (25 %), application 86 (5 %), other 64 (17.2 s) of ground truth, we cannot determine the precision and recall
(4 %), oracle 58 (4 %)
Passed 23.4 s
of the experiment.
772 (13 %) (11.4 s)
Crashed 25.9 s
out of memory 473 (87 %), unknown 70 (13 %) RQ2: ProTI can be applied to existing IaC programs.
543 (9 %) (38.9 s)
Total 14.4 s
6 081 (100 %) (17.0 s)

C. Execution Duration and Scaling Behavior


B. Applicability to Real-World Programs We performed time measurements on Pulumi programs that
We executed ProTI on all 6 081 Pulumi TypeScript programs define 0, 1, 10, 50, and 100 AWS S3 bucket resources. The
in the PIPr dataset [16] of all public IaC programs on GitHub experiment considered two program variants, one defining the
in August 2022. PIPr contains examples, toy projects, and pro- resources independently for parallel deployment, and one in a
duction projects in unknown shares and is only filtered by the dependency chain for sequential deployment. We ran ProTI
relevance criteria inherent to the GitHub Code Search API [42] three times on each program and repeated the experiment five
we used for the evaluation and that we discuss in detail in the times. As the programs are correct, ProTI runs them 100 times
dataset’s paper [16]. PNPM was used to install dependencies in each execution without identifying a bug. Table V and Fig. 7
and TypeScript version 5.1.6 for the execution. report the average execution time in total and separated by
The first two columns of Table IV show the results. We phase. Table V shows the absolute values separately for the first
categorized the executions by the phase in the ProTI run and consecutive runs. Fig. 7 separates the first, second, and third
where a problem was detected: invalid project files that prevent runs and also shows results as relative values.
execution, failures during transpilation, failures during mod- Execution times are higher for first runs because the transpi-
ule preloading, failures during checking, successfully passed, lation overhead is significant and, on average, 76 % lower in
and crashed executions. Within each category, we grouped subsequent runs (Fig. 7). Test runs, transpilation, and module

Authorized licensed use limited to: Arizona State University. Downloaded on December 06,2024 at 16:34:27 UTC from IEEE Xplore. Restrictions apply.
1594 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 50, NO. 6, JUNE 2024

TABLE V
TOTAL AVERAGE EXECUTION TIME OF PROTI OVER 5 REPETITIONS OF THE DURATION EXPERIMENTS BY FIRST/CONSECUTIVE EXECUTION AND PHASE FOR IAC PROGRAMS WITH 0, 1, 10, 50, AND 100 RESOURCES WITH BOTH INDEPENDENT AND CHAINED DEPENDENCIES

Resources:            0        1     10 indep  10 chain  50 indep  50 chain  100 indep  100 chain
Run 1
  Remaining         1.7 s    1.6 s     2.2 s     2.2 s     6.3 s     4.6 s     14.8 s     15.9 s
  100 Runs          1.0 s    9.3 s    57.4 s    69.5 s   274.5 s   262.8 s    563.3 s    535.8 s
  Preloading        0.7 s    0.7 s     0.7 s     0.7 s     0.7 s     0.7 s      0.7 s      0.7 s
  Transpilation    15.1 s   15.2 s    15.3 s    15.3 s    15.2 s    15.3 s     15.3 s     15.3 s
  Total            18.5 s   26.8 s    75.6 s    87.7 s   296.6 s   283.4 s    594.2 s    567.8 s
Run 2 & 3
  Remaining         1.6 s    1.6 s     2.2 s     2.3 s     6.3 s     6.3 s     11.1 s     10.0 s
  100 Runs          1.4 s    7.4 s    51.7 s    50.3 s   260.8 s   243.1 s    520.2 s    493.6 s
  Preloading        0.8 s    0.8 s     0.8 s     0.8 s     0.7 s     0.8 s      0.8 s      0.8 s
  Transpilation     3.7 s    3.7 s     3.7 s     3.7 s     3.7 s     3.7 s      3.7 s      3.7 s
  Total             7.5 s   13.5 s    58.3 s    57.1 s   271.6 s   253.9 s    535.7 s    508.1 s

Fig. 7. Average execution time of ProTI over 5 repetitions of the duration experiments (Table V) by phase, resource count, and dependency. Results for three consecutive executions (1, 2, 3). In total (top row) and relative (bottom row).

Execution times are higher for first runs because the transpilation overhead is significant; it is, on average, 76 % lower in subsequent runs (Fig. 7). Test runs, transpilation, and module preloading are the only actions of the test runner taking significant time. The remaining execution time was consumed outside the test runner, including Jest's setup and reporting. A single test run in the experiments took 10 ms to 5.9 s, and the duration scales linearly with the resource number.

We found similar execution times in the IaC programs from GitHub (Table IV). Conservatively approximating a single test duration by dividing the total run time of all passed ProTI executions by 100 (the number of runs), we measured test run durations from 34 ms to 1.0 s; 234 ms on average. It is an approximation because the total run time also includes overhead like setup and reporting, and it is conservative because we assume these contributors are instant, i.e., the test run duration is likely a bit lower. The RWW experiments (Table III) confirm these durations, too. Lastly, our experiments show that ProTI is quicker when it finds a bug because of early termination.

The passing RWW experiments with 6 resources (VS) and 25 resources (VSDB) in Table III confirm that test time grows with the number of resources (on average, 21 s and 40 s including overhead). They further show that the performance of integration testing heavily depends on the deployment time of the resources—which ProTI is independent of. Deploying AWS RDS databases takes longer than deploying AWS S3 resources, so testing VSDB takes 20× and 4× longer than VS with property testing and end-to-end testing, respectively, while ProTI was only 2× slower.

RQ3: A single test run of ProTI typically takes hundreds of milliseconds, and test duration scales with the number of resources—not with their deployment time—permitting hundreds of configurations to be checked quickly.

D. Integrating Existing Tools Into ProTI

ACT's effectiveness crucially depends on the quality of its plugins. Many techniques have been developed for test generation and oracles (cf. Section III-C). To leverage advanced techniques from related work, ProTI must be open to extension with them. To demonstrate ProTI's extendability, we implemented ProTI plugins using the Radamsa fuzzer [43] and the Daikon invariant detector [44], i.e., a generator and an oracle plugin based on existing tools. This experiment assesses the feasibility of integrating existing approaches; optimizing them and evaluating their effectiveness and efficiency is the subject of future work focusing on test generation and oracle techniques, while this paper focuses on the overall approach.

Radamsa [43] is a fuzzing tool that derives fresh test inputs from an example. We adopted it for a ProTI generator plugin that, separately for each resource type, uses the type-based generator to generate an output configuration example, which is passed to Radamsa as JSON to generate a list of derived test inputs. We filter non-parsable configurations from Radamsa's results and use the remaining ones as test inputs in ProTI. Whenever ProTI runs out of Radamsa-generated inputs, we repeat the procedure. The generator implementation required 83 SLOC, of which only 48 differ from a naïve generator returning empty configurations.
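The following sketch illustrates how such a plugin can drive Radamsa; the exampleConfigFor helper stands in for ProTI's type-based generator, the function signature is simplified and hypothetical, and a radamsa binary is assumed to be available on the PATH.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: produces an example output configuration for a resource
// type, e.g., via ProTI's type-based generator.
declare function exampleConfigFor(resourceType: string): Record<string, unknown>;

// Derive up to `count` fuzzed configurations for one resource type.
function radamsaConfigs(resourceType: string, count: number): Record<string, unknown>[] {
    const seed = JSON.stringify(exampleConfigFor(resourceType));
    const derived: Record<string, unknown>[] = [];
    for (let attempts = 0; derived.length < count && attempts < count * 10; attempts++) {
        // One Radamsa invocation per input: the seed goes to stdin, the mutation to stdout.
        const out = execFileSync("radamsa", [], { input: seed }).toString();
        try {
            derived.push(JSON.parse(out)); // keep only parsable configurations
        } catch {
            // drop non-parsable outputs and try again
        }
    }
    return derived;
}
```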
Daikon [44] is a dynamic invariant detector that identifies application invariants in a set of program traces. We used it for an invariant regression oracle that detects behavior changes across different versions of an IaC program. In the first ProTI execution, the oracle records all resources' target and output states and invokes Daikon on them to find resource configuration invariants over all runs, e.g., a particular bucket's id equals a field of a policy, independent of the concrete value. In consecutive ProTI executions, we repeat the procedure and additionally compare the obtained invariants with the previously generated ones, issuing a warning if an invariant cannot be found anymore, i.e., it may be violated in the new program version. The oracle plugin comprises only 120 SLOC, mainly for converting resource configurations between ProTI and Daikon and managing state across executions.
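The regression logic of the oracle can be sketched as follows; the detectInvariants helper stands in for the conversion to Daikon's trace format and the Daikon invocation, and the invariant file location is a hypothetical choice.

```typescript
import * as fs from "node:fs";

// Hypothetical helper that runs Daikon on the recorded resource states and
// returns the detected invariants as strings (e.g., "bucket.id == policy.bucket").
declare function detectInvariants(states: Record<string, unknown>[]): string[];

// Hypothetical location for persisting invariants across ProTI executions.
const invariantFile = ".proti/daikon-invariants.json";

// Called once per ProTI execution after all test runs recorded their resource states.
function checkInvariantRegression(recordedStates: Record<string, unknown>[]): string[] {
    const current = detectInvariants(recordedStates);
    const warnings: string[] = [];
    if (fs.existsSync(invariantFile)) {
        const previous: string[] = JSON.parse(fs.readFileSync(invariantFile, "utf8"));
        for (const inv of previous) {
            if (!current.includes(inv)) {
                warnings.push(`Invariant no longer holds: ${inv}`); // possible behavior change
            }
        }
    }
    fs.writeFileSync(invariantFile, JSON.stringify(current)); // state for the next execution
    return warnings;
}
```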
RQ4: Existing tools can be integrated into ProTI by implementing a plugin, demonstrating ProTI's openness to third-party techniques.

E. Limitations, Threats to Validity, and Implications

Our experiments on ProTI show that ACT can find bugs quickly and reliably in IaC programs, even in edge cases (RQ1), can be applied to IaC programs without adjustments (RQ2), can be fast enough to run hundreds of tests in a short time (RQ3), and can be extended with existing tools through generator and oracle plugins (RQ4). Yet, our experiments do not provide quantitative insight into ACT's effectiveness, i.e., the likelihood that all bugs and no false positives are found and after which time. Such insights require an IaC program dataset with correctness annotations, i.e., precise knowledge about the bugs in them. Such an evaluation is planned in future work to assess advanced generator and oracle plugins. This paper focuses on the feasibility of the ACT approach to test IaC, not on the precision and recall of a specific testing technique.

Relevant threats to validity in this work include that we evaluate ACT through ProTI, a single instantiation for one specific PL-IaC solution and language. Yet, we expect that implementations for other languages and PL-IaC solutions yield similar results because IaC programs for other tools and other languages, i.e., other embedded PL-IaC DSLs, are technically analogous. The IaC program selection in our experiments is also a threat. For RQ1, the set of variants in RWW suffices to demonstrate the behavioral differences of ACT compared to other techniques; yet, more experiments are needed to show with statistical significance that these differences are relevant in practice such that ACT is beneficial on other IaC programs. For RQ2, we inherit the limitations and validity threats of the PIPr dataset [16]—including generalizability—but, based on our experience, we expect the qualitative insight to apply to other IaC programs. For RQ3, we focused on the number of resources and their dependencies in IaC programs, showing how they influence performance. We rely on our experience that resource number and dependency are the factors that most significantly impact performance, but other factors can be studied with a more comprehensive sensitivity analysis. The categorization, as well as the error labeling and grouping in RQ2, may be subjective, an issue we limited through the review of a second author. Another potential issue is that ProTI is a random-based testing tool, which, in case of a bug, may cause the bug to be inconsistently (not) caught by different test cases across executions. Hence, we apply 10 repetitions for RQ1. For RQ2, we saw negligible variance in tests. As the programs in RQ3 are correct, they are not impacted by this threat. RQ4 is also not affected because it only demonstrates that existing tools can be leveraged in ACT. RQ4 does not measure ProTI executions to quantify the effectiveness of specific tools in the context of IaC. This aspect must be evaluated for each plugin and crucially depends on the implemented method.

For practitioners, ACT and ProTI are new techniques whose effectiveness depends, in the long run, on a community effort to maintain the framework and the test generation and oracle plugins. Practitioners can now try out ACT with low effort on existing Pulumi TypeScript IaC programs. This solution can already reduce development time through earlier bug detection and increase the reliability of IaC programs, supporting faster-evolving, functional, secure systems. A user study assessing user acceptance of ACT and ProTI is left to future work. For researchers, ACT and ProTI are novel testbeds that facilitate exploring advanced test generation and oracle techniques for IaC programs and correct and secure cloud configuration.

VI. RELATED WORK

We summarize the limitations of two-phase PL-IaC solutions and related work on infrastructure deployment quality, automated mocking, and related software testing techniques.

A. Limitations of Two-Phase PL-IaC

General PL-IaC solutions like Pulumi can observe a resource's state after deployment, the output configuration, and process the values in the general-purpose language. In contrast, two-phase PL-IaC solutions like AWS CDK and CDKTF prohibit IaC programs from accessing the deployment state. Two-phase PL-IaC solutions (1) execute the IaC program to generate the target state as a JSON file and (2) provide it to the deployment engine, i.e., AWS CloudFormation or Terraform. This exchange is uni-directional, i.e., there is no arrow from the deployment engine to the IaC program in Fig. 1. Due to this approach, two-phase PL-IaC can only compute on resource state that can be expressed in the deployment engine's DSL—practically limited to referencing values, string interpolation, and simple value processing. Yet, using an expressive language to process the externally generated state is the reason for using general-purpose languages in IaC programs in the first place. Accordingly, two-phase PL-IaC only provides a subset of PL-IaC's capabilities. In fact, AWS CDK code can be embedded into Pulumi programs, but not vice versa [45].
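A minimal Pulumi TypeScript example illustrates the difference: the program processes a resource's deployment output with arbitrary TypeScript before configuring a dependent resource, which two-phase solutions can only approximate within their engine's DSL. The concrete resources are chosen for illustration only.

```typescript
import * as aws from "@pulumi/aws";

const bucket = new aws.s3.Bucket("data");

// `bucket.arn` is only known after deployment. Pulumi lets the program process
// this output configuration with full TypeScript, e.g., to derive a policy document.
const policy = new aws.s3.BucketPolicy("data-policy", {
    bucket: bucket.id,
    policy: bucket.arn.apply(arn =>
        JSON.stringify({
            Version: "2012-10-17",
            Statement: [{
                Effect: "Allow",
                Principal: "*",
                Action: ["s3:GetObject"],
                Resource: [`${arn}/*`], // computed from the deployed resource's state
            }],
        })
    ),
});
```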
Unit testing two-phase PL-IaC, i.e., for the CDKs [46], [47], is simpler than for general PL-IaC and does not require mocking, as two-phase IaC programs do not interact with the deployment engine. We believe this simplification is the reason unit testing is much more common in CDK projects than in Pulumi projects [16]. Also, template projects set up through the CLI include a unit testing setup with a simple test for the CDKs (commented out in the templates by default), but not for Pulumi.

B. Infrastructure as Code Quality

Previous work discussed IaC quality and how to improve it, but it is mainly focused on Puppet, Ansible, and Chef. These Configuration as Code (CaC) tools have been designed to configure existing, mutable infrastructure—even though they also support provisioning. In contrast, PL-IaC does not only employ general-purpose programming languages instead of DSLs; it also focuses on infrastructure provisioning (like, e.g., Terraform and AWS CloudFormation), typically implementing immutable infrastructure management. Research on PL-IaC has been limited to deployment coordination [48], [49].

Hummer et al. [50] proposed an idempotency testing approach for Chef scripts, which Ikeshita et al. [51] augmented with verification techniques to minimize the size of the required test suite. Shambaugh et al. [52] proposed Rehearsal to verify the determinacy and idempotency of Puppet scripts. Yet, declarative IaC ensures idempotency by design.

Sharma et al. [53] were the first to identify code smells in Puppet scripts. Later studies confirmed them for Chef [54]. Rahman et al. surveyed CaC research [6] and identified source code properties correlating with defects in Puppet scripts [55], such as hard-coded strings. They further recognized security
smells and proposed linters for Puppet, Ansible, and Chef [56], [57]. Saavedra and Ferreira [58] introduced GLITCH for linters on a CaC-solution-agnostic intermediate representation. Reis et al. [59] found that such linters are too imprecise but can be improved through user feedback.

Opdebeeck et al. [60], [61] analyzed the quality of semantic versioning and variable-precedence-related code smells in Ansible. Further, they applied program dependence graph analysis to Ansible scripts, motivating control- and data-flow analysis for IaC security smell detection techniques [62]. Dalla Palma et al. [63], [64], [65] proposed various quality metrics and an AI defect prediction framework for Ansible scripts. Kumara et al. [4] and Guerriero et al. [3] explored IaC best practices and issues in the industry through a grey literature survey and practitioner interviews. Hassan and Rahman [66] studied bugs in open-source Ansible test scripts. Borovits et al. [67] proposed FindICI, an AI-based tool to identify linguistic inconsistency between documentation, comments, and code in Ansible scripts, and Chiari et al. [68] surveyed work on static analysis for IaC, focusing mainly on CaC.

This paper is the first about quality in PL-IaC, which focuses—unlike CaC—on declarative infrastructure provisioning through programs in popular imperative programming languages. Further, we propose ACT and implement it in ProTI, enabling efficient unit testing of IaC programs.

C. Correctness of Infrastructure and Architecture Modeling

Modeling languages are textual or graphical languages that express a system's structure. The Topology and Orchestration Specification for Cloud Applications (TOSCA) is a modeling language for the topology of cloud applications, resources, and their orchestration [69]. Bellendorf and Mann [70] surveyed existing literature on TOSCA. Wurster et al. [71] presented a systematic review of declarative deployment technologies and introduced a metamodel for their common core. A follow-up work [72] leverages this core to define TOSCA Light, a subset of TOSCA, aiming to reconcile research modeling and industrial practice. TOSCA Light enables the transformation of compliant deployment models to technology-specific models.

Architecture description languages (ADLs) define application components and their relationships. For example, ArchJava [73] embeds such specifications in Java source code, enabling architecture compliance checks at compile time. Krüger et al. [74] introduced ORS for the compositional specification, deployment, and dynamic reconfiguration of systems of services. In contrast to other established ADLs, ORS separates the application from infrastructure concerns. Terra and de Oliveira Valente [75] proposed specifying and statically enforcing dependencies in software architectures to avoid erosion. Placement types are a language approach where the type system checks architectural conformance [76], [77].

ProTI verifies the correct composition of infrastructure configuration, e.g., through type-based oracles, and enables application-specific checks through ad-hoc specifications. On top, ProTI tests the imperative IaC program generating it.

D. Infrastructure Verification

Ensuring infrastructure correctness has been extensively studied. AWS investigated automatically verifying infrastructure properties [78], [79], [80], leading to at least two automated services in production: AWS Tiros verifies reachability queries on virtual networks [81], and AWS Zelkova performs access verification on role-based AWS IAM policies [82]. These solutions verify already deployed setups, but their techniques should be applicable pre-deployment on IaC programs, which encode the infrastructure's configuration. Such pre-deployment infrastructure verification could also leverage more foundational techniques. E.g., Alloy [83] is a language and analysis tool to verify structural properties of software. Ahrens et al. [84] developed a proof system for invariants on reconfigurable distributed systems. Evangelidis et al. [85] proposed probabilistic verification of performance properties of rule-based auto-scaling policies. Lastly, Abu Jabal et al. [86] gave a comprehensive overview of techniques for policy verification, focused on access control and network management.

Program verification remains an open challenge, either requiring significant manual effort or being limited to specific properties [87]. Augmenting ACT with automated verification of domain-specific properties, e.g., network access constraints, is a promising direction, orthogonal to ACT's contribution to the testing of IaC programs.

E. Automated Mocking

In a study on mocking in open source systems, Spadini et al. [88] found that developers mock components that are difficult to handle and that mocking code increases the coupling between system and test code. According to the authors, the results motivate the need for mock synthesis. Taneja et al. [89] proposed MODA, using an efficient, SQL-aware mock and advanced test generation techniques to automatically test database applications. Solms and Marshall [90] automatically generated mocks from explicit component contracts. Various works synthesize mocks from interaction traces of components [91], [92], [93]. In contrast, Zhu et al.'s StubCoder [94] synthesizes mocks for regression testing solely from the tests' code, without running the mocked component.

Mocking resource definitions in IaC programs is trivial because PL-IaC solutions provide an interface to intercept them, eliminating the need for advanced mocking techniques. Yet, the mocks' test generation and validation logic are complex. ACT encapsulates them into plugins, enabling the integration of mocking techniques from the literature into ProTI.
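For instance, Pulumi's Node.js runtime exposes such an interception hook for unit tests; a minimal sketch that returns each resource's inputs as its output state could look as follows (the id scheme is an arbitrary choice).

```typescript
import * as pulumi from "@pulumi/pulumi";

// Intercept all resource registrations and provider calls instead of deploying them.
pulumi.runtime.setMocks({
    newResource(args: pulumi.runtime.MockResourceArgs): { id: string; state: any } {
        // A test generator or oracle plugin can decide here which output
        // configuration to return for `args.type` and `args.inputs`.
        return { id: `${args.name}-test-id`, state: args.inputs };
    },
    call(args: pulumi.runtime.MockCallArgs) {
        // Provider function calls (e.g., data lookups) can be stubbed the same way.
        return args.inputs;
    },
});

// The IaC program under test is imported only after the mocks are installed, e.g.:
// import("./index");
```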
F. Fuzz and Property-Based Testing

Fuzz testing (fuzzing) discovers software vulnerabilities, typically by treating the program as a closed box and testing it for hangs and crashes. Yet, input-value-generation-guided approaches exist; for example, grammar-based fuzzing is an active research field [95], [96]. Li et al. [33] and Zeller et al. [30] provided an overview of state-of-the-art fuzzing techniques. Property-based testing (PBT) [28], [29] is a related approach,
where code is exercised on randomly generated tests, and results are checked against invariants—the properties.
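As an illustration, with a TypeScript PBT library such as fast-check [38], a property over randomly generated inputs can be expressed as follows; the bucket-name normalization function is only a stand-in for code under test.

```typescript
import * as fc from "fast-check";

// A toy system under test: normalizes an S3 bucket name.
const normalizeBucketName = (name: string) => name.trim().toLowerCase();

// Property: normalization is idempotent for any generated string.
fc.assert(
    fc.property(fc.string(), (name) => {
        const once = normalizeBucketName(name);
        return normalizeBucketName(once) === once;
    })
);
```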
provisioning based on TOSCA,” in Proc. IEEE Int. Conf. Cloud Eng.,
Various works investigate effective PBT test generators. Boston, MA, USA. Los Alamitos, CA, USA: IEEE Comput. Soc. Press,
Lampropoulos et al. proposed Luck, a language for PBT gen- 2014, pp. 87–96, doi: 10.1109/IC2E.2014.56.
erators [97], and coverage-guided PBT [98]. Löscher and Sag- [9] “Pulumi: Infrastructure as code in any programming language.”
Pulumi. Accessed: Nov. 29, 2023. [Online]. Available: https://github.
onas introduced targeted PBT [32] and automated it [99] using com/pulumi/pulumi
search-based techniques to guide the generation. Kuhn et al. [10] “Cloud development framework: AWS Cloud Development Kit.” Ama-
[100] found that most bugs are caused by the interaction of only zon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://
aws.amazon.com/cdk/
a few parameters, motivating combinatorial testing [101], which [11] “CDK for Terraform.” HashiCorp. Accessed: Nov. 29, 2023. [Online].
Goldstein et al. [34] applied to PBT generators by modifying the Available: https://developer.hashicorp.com/terraform/cdktf
random generator distributions. On the intersection with formal [12] J. Duffy, “Pulumi raises Series B to build the future of cloud engineer-
ing.” Pulumi Blog. Accessed: Nov. 30, 2023. [Online]. Available: https://
methods, Paraskevopoulou et al. [102] integrated PBT into a www.pulumi.com/blog/series-b/
proof assistant to verify tests, and Lampropoulos et al. [103] [13] J. Duffy, “Building the best infrastructure as code with $41M Series C
compiled logical conditions (inductive relations) to generators funding.” Pulumi Blog. Accessed: Nov. 30, 2023. [Online]. Available:
https://www.pulumi.com/blog/series-c/
and to their soundness and completeness proofs. De Angelis [14] M. Madeja, J. Porubän, S. Chodarev, M. Sulír, and F. Gurbáľ, “Empirical
et al. [104] leveraged symbolic execution and constraint logic study of test case and test framework presence in public projects on
programming to automatically derive generators. GitHub,” Appl. Sci., vol. 11, no. 16, pp. 1–22, 2021, doi:10.3390/
app11167250.
ACT is fuzzing and PBT for IaC programs. For ProTI, type- [15] P. Singh Kochhar, T. F. Bissyandé, D. Lo, and L. Jiang, “An empirical
based generators and oracles, prototypes demonstrating third- study of adoption of software testing in open source projects,” in Proc.
party tool integration, and an ad-hoc specification syntax are 13th Int. Conf. Qual. Softw., Najing, China. Piscataway, NJ, USA: IEEE
Press, 2013, pp. 103–112, doi: 10.1109/QSIC.2013.57.
available. The approaches above can be integrated or imple- [16] D. Sokolowski, D. Spielmann, and G. Salvaneschi, “The PIPr dataset
mented in ProTI plugins to use them for IaC programs. of public infrastructure as code programs,” in Proc. 21st IEEE/ACM
Int. Conf. Mining Softw. Repositories (MSR), Lisbon, Portugal, 2024,
pp. 498–503, doi: 10.1145/3643991.3644888.
[17] H. Holmström Olsson, H. Allahyari, and J. Bosch, “Climbing the
VII. CONCLUSION ”stairway to heaven” - A mulitiple-case study exploring barriers in the
transition from agile development towards continuous deployment of
Testing is rarely used for IaC programs, and available tech- software,” in Proc. 38th Euromicro Conf. Softw. Eng. Adv. Appl. (SEAA),
Cesme, Izmir, Turkey, V. Cortellessa, H. Muccini, and O. Demirörs,
niques either hinder development velocity or require much pro- Eds., Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2012,
gramming effort. We present Automated Configuration Testing pp. 392–399, doi: 10.1109/SEAA.2012.54.
(ACT) for quick IaC program testing at low effort and imple- [18] J. Humble and D. Farley, Continuous Delivery: Reliable Software
Releases Through Build, Test, and Deployment Automation. Reading,
ment it for Pulumi TypeScript in ProTI. ProTI is effective on MA, USA: Addison-Wesley, 2010.
existing IaC programs, and its modular architecture enables the [19] P. Ralph et al., “Empirical standards for software engineering research,”
use of existing third-party and novel test generators and oracles, 2021, arXiv:2010.03525.
[20] D. Sokolowski, D. Spielmann, and G. Salvaneschi, “ProTI: Automated
breaking ground for future research on effective test generators unit testing of Pulumi TypeScript infrastructure as code programs,” 2023,
and oracles for IaC programs. doi: 10.5281/zenodo.10028479.
[21] D. Sokolowski, D. Spielmann, and G. Salvaneschi, “Evaluation of auto-
mated infrastructure as code program testing,” 2024. doi: 10.5281/zen-
odo.10908273.
REFERENCES [22] “Cloud object storage: Amazon S3,” Amazon Web Services. Accessed:
Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/s3/
[1] K. Morris, Infrastructure as Code: Dynamic Systems for the Cloud Age, [23] “Developer tools: SDKs and programming toolkits for building on AWS:
2nd ed. Sebastopol, CA, USA: O’Reilly Media, Inc., 2021. SDKs,” Amazon Web Services. Accessed: Nov. 29, 2023. [Online].
[2] G. Kim, J. Humble, P. Debois, J. Willis, and N. Forsgren, The DevOps Available: https://aws.amazon.com/developer/tools/#SDKs
hHandbook: How to Create World-class Agility, Rel., & Secur. Technol. [24] “Azure SDK releases.” Microsoft Azure. Accessed: Nov. 29, 2023.
Organizations, 2nd ed. IT Revolution Press, 11 2021. [Online]. Available: https://azure.github.io/azure-sdk/
[3] M. Guerriero, M. Garriga, D. A. Tamburri, and F. Palomba, “Adoption, [25] “Testing of Pulumi programs.” Pulumi. Accessed: Nov. 29, 2023.
support, and challenges of Infrastructure-as-Code: Insights from indus- [Online]. Available: https://www.pulumi.com/docs/using-pulumi/testing/
try,” in Proc. IEEE Int. Conf. Softw. Maintenance Evolution (ICSME), [26] “Policy as code for any cloud with Pulumi: Pulumi CrossGuard.” Pulumi.
Cleveland, OH, USA. Piscataway, NJ, USA: IEEE Press, 2019, pp. 580– Accessed: Nov. 29, 2023. [Online]. Available: https://www.pulumi.com/
589, doi: 10.1109/ICSME.2019.00092. crossguard/
[4] I. Kumara et al., “The do’s and don’ts of infrastructure code: [27] “Integration testing for Pulumi programs.” Pulumi. Accessed: Nov. 29,
A systematic gray literature review,” Inf. Softw. Technol., vol. 137, 2021, 2023. [Online]. Available: https://www.pulumi.com/docs/using-pulumi/
Art. no. 106593. doi: 10.1016/J.INFSOF.2021.106593. testing/integration/
[5] L. A. F. Leite, C. Rocha, F. Kon, D. S. Milojicic, and P. Meirelles, [28] G. Fink and M. Bishop, “Property-based testing: A new approach
“A survey of DevOps concepts and challenges,” ACM Comput. Surv., to testing for assurance,” ACM SIGSOFT Softw. Eng. Notes, vol. 22,
vol. 52, no. 6, pp. 127: 1–127:35, 2020, doi: 10.1145/3359981. no. 4, pp. 74–80, 1997, doi: 10.1145/263244.263267.
[6] A. Rahman, R. Mahdavi-Hezaveh, and L. A. Williams, “A systematic [29] K. Claessen and J. Hughes, “A lightweight tool for random test-
mapping study of infrastructure as code research,” Inf. Softw. Technol., ing of Haskell programs,” in Proc. Fifth ACM SIGPLAN Int. Conf.
vol. 108, pp. 65–77, 2019, doi: 10.1016/j.infsof.2018.12.004. Functional Program. (ICFP ’00), Montreal, Canada, M. Odersky and
[7] C. Endres, U. Breitenbücher, M. Falkenthal, O. Kopp, F. Leymann, P. Wadler, Eds., New York, NY, USA: ACM, 2000, pp. 268–279, doi:
and J. Wettinger, “Declarative vs. imperative: Two modeling patterns 10.1145/351240.351266.
for the automated deployment of applications,” Accessed: Nov. 30, [30] A. Zeller, R. Gopinath, M. Böhme, G. Fraser, and C. Holler, “Fuzzing:
2023. [Online]. Available: https://www.iaas.uni-stuttgart.de/publications/ Breaking things with random inputs.” Fuzzing. Accessed: Nov. 30, 2023.
INPROC-2017-12-Declarative-vs-Imperative-Modeling-Patterns.pdf [Online]. Available: https://www.fuzzingbook.org/html/Fuzzer.html

[31] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, "Feedback-directed random test generation," in Proc. 29th Int. Conf. Softw. Eng. (ICSE 2007), Minneapolis, MN, USA. Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2007, pp. 75–84, doi: 10.1109/ICSE.2007.37.
[32] A. Löscher and K. Sagonas, "Targeted property-based testing," in Proc. 26th ACM SIGSOFT Int. Symp. Softw. Testing Anal., Santa Barbara, CA, USA, T. Bultan and K. Sen, Eds., New York, NY, USA: ACM, 2017, pp. 46–56, doi: 10.1145/3092703.3092711.
[33] J. Li, B. Zhao, and C. Zhang, "Fuzzing: A survey," Cybersecurity, vol. 1, no. 1, 2018, Art. no. 6, doi: 10.1186/S42400-018-0002-Y.
[34] H. Goldstein, J. Hughes, L. Lampropoulos, and B. C. Pierce, "Do judge a test by its cover - Combining combinatorial and property-based testing," in Proc. Program. Lang. Syst. 30th Eur. Symp. Program. (ESOP), Luxembourg City, Luxembourg, N. Yoshida, Ed., vol. 12648, Cham, Switzerland: Springer-Verlag, 2021, pp. 264–291, doi: 10.1007/978-3-030-72019-3_10.
[35] A. Rahman, E. Farhana, C. Parnin, and L. A. Williams, "Gang of Eight: A defect taxonomy for infrastructure as code scripts," in Proc. 42nd Int. Conf. Softw. Eng., Seoul, South Korea, G. Rothermel and D. Bae, Eds., New York, NY, USA: ACM, 2020, pp. 752–764, doi: 10.1145/3377811.3380409.
[36] T. Su et al., "Fully automated functional fuzzing of Android apps for detecting non-crashing logic bugs," in Proc. ACM Program. Lang., vol. 5, no. OOPSLA, 2021, pp. 1–31, doi: 10.1145/3485533.
[37] "Jest: Delightful JavaScript testing." Meta Platforms. Accessed: Nov. 29, 2023. [Online]. Available: https://jestjs.io/
[38] N. Dubien, "fast-check official documentation." fast-check. Accessed: Nov. 29, 2023. [Online]. Available: https://fast-check.dev/
[39] "Serverless compute: AWS Fargate." Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/fargate/
[40] "Amazon Elastic Container Service." Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/ecs/
[41] "Policies for AWS (AWSGuard)." Pulumi. Accessed: Nov. 29, 2023. [Online]. Available: https://www.pulumi.com/docs/using-pulumi/crossguard/awsguard/
[42] "GitHub docs: Searching code (legacy)." GitHub. Accessed: Nov. 30, 2023. [Online]. Available: https://docs.github.com/en/search-github/searching-on-github/searching-code
[43] A. Helin, "Radamsa: A general-purpose fuzzer." GitLab. Accessed: Nov. 30, 2023. [Online]. Available: https://gitlab.com/akihe/radamsa
[44] M. D. Ernst et al., "The Daikon system for dynamic detection of likely invariants," Sci. Comput. Program., vol. 69, nos. 1–3, pp. 35–45, 2007, doi: 10.1016/j.scico.2007.01.015.
[45] L. Hoban, "Introducing AWS CDK on Pulumi." Pulumi Blog. Accessed: Nov. 30, 2023. [Online]. Available: https://www.pulumi.com/blog/aws-cdk-on-pulumi/
[46] "Testing constructs: AWS Cloud Development Kit (AWS CDK) v2." Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://docs.aws.amazon.com/cdk/v2/guide/testing.html
[47] "Unit tests: CDK for Terraform." HashiCorp. Accessed: Nov. 29, 2023. [Online]. Available: https://developer.hashicorp.com/terraform/cdktf/test/unit-tests
[48] D. Sokolowski, P. Weisenburger, and G. Salvaneschi, "Automating serverless deployments for DevOps organizations," in Proc. 29th ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng., Athens, Greece, D. Spinellis, G. Gousios, M. Chechik, and M. D. Penta, Eds., New York, NY, USA: ACM, 2021, pp. 57–69, doi: 10.1145/3468264.3468575.
[49] D. Sokolowski, P. Weisenburger, and G. Salvaneschi, "Decentralizing infrastructure as code," IEEE Softw., vol. 40, no. 1, pp. 50–55, 2023, doi: 10.1109/MS.2022.3192968.
[50] W. Hummer, F. Rosenberg, F. Oliveira, and T. Eilam, "Testing idempotence for infrastructure as code," in Proc. Middleware ACM/IFIP/USENIX 14th Int. Middleware Conf., Beijing, China, D. M. Eyers and K. Schwan, Eds., vol. 8275, Berlin, Heidelberg, Germany: Springer-Verlag, 2013, pp. 368–388, doi: 10.1007/978-3-642-45065-5_19.
[51] K. Ikeshita, F. Ishikawa, and S. Honiden, "Test suite reduction in idempotence testing of infrastructure as code," in Proc. Tests Proofs - 11th Int. Conf., Marburg, Germany, S. Gabmeyer and E. B. Johnsen, Eds., vol. 10375, Springer-Verlag, 2017, pp. 98–115, doi: 10.1007/978-3-319-61467-0_6.
[52] R. Shambaugh, A. Weiss, and A. Guha, "Rehearsal: A configuration verification tool for Puppet," in Proc. 37th ACM SIGPLAN Conf. Program. Lang. Des. Implementation (PLDI), Santa Barbara, CA, USA, C. Krintz and E. D. Berger, Eds., New York, NY, USA: ACM, 2016, pp. 416–430, doi: 10.1145/2908080.2908083.
[53] T. Sharma, M. Fragkoulis, and D. Spinellis, "Does your configuration code smell?" in Proc. 13th Int. Conf. Mining Softw. Repositories (MSR), Austin, TX, USA, M. Kim, R. Robbes, and C. Bird, Eds., New York, NY, USA: ACM, 2016, pp. 189–200, doi: 10.1145/2901739.2901761.
[54] J. Schwarz, A. Steffens, and H. Lichter, "Code smells in infrastructure as code," in Proc. 11th Int. Conf. Qual. Inf. Commun. Technol. (QUATIC), Coimbra, Portugal, A. Bertolino, V. Amaral, P. Rupino, and M. Vieira, Eds., Los Alamitos, CA, USA: IEEE Comput. Soc., 2018, pp. 220–228, doi: 10.1109/QUATIC.2018.00040.
[55] A. Rahman and L. A. Williams, "Source code properties of defective infrastructure as code scripts," Inf. Softw. Technol., vol. 112, pp. 148–163, 2019, doi: 10.1016/j.infsof.2019.04.013.
[56] A. Rahman, C. Parnin, and L. A. Williams, "The seven sins: Security smells in infrastructure as code scripts," in Proc. IEEE/ACM 41st Int. Conf. Softw. Eng. (ICSE), Montreal, QC, Canada, J. M. Atlee, T. Bultan, and J. Whittle, Eds., 2019, pp. 164–175, doi: 10.1109/ICSE.2019.00033.
[57] A. Rahman, M. R. Rahman, C. Parnin, and L. A. Williams, "Security smells in Ansible and Chef scripts: A replication study," ACM Trans. Softw. Eng. Methodol., vol. 30, no. 1, pp. 3:1–3:31, 2021, doi: 10.1145/3408897.
[58] N. Saavedra and J. F. Ferreira, "GLITCH: Automated polyglot security smell detection in infrastructure as code," in Proc. 37th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Rochester, MI, USA. New York, NY, USA: ACM, 2022, pp. 47:1–47:12, doi: 10.1145/3551349.3556945.
[59] S. Reis, R. Abreu, M. d'Amorim, and D. Fortunato, "Leveraging practitioners' feedback to improve a security linter," in Proc. 37th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Rochester, MI, USA. New York, NY, USA: ACM, 2022, pp. 66:1–66:12, doi: 10.1145/3551349.3560419.
[60] R. Opdebeeck, A. Zerouali, C. Velázquez-Rodríguez, and C. De Roover, "On the practice of semantic versioning for Ansible Galaxy roles: An empirical study and a change classification model," J. Syst. Softw., vol. 182, 2021, Art. no. 111059, doi: 10.1016/j.jss.2021.111059.
[61] R. Opdebeeck, A. Zerouali, and C. De Roover, "Smelly variables in Ansible infrastructure code: Detection, prevalence, and lifetime," in Proc. 19th IEEE/ACM Int. Conf. Mining Softw. Repositories (MSR), Pittsburgh, PA, USA. New York, NY, USA: ACM, 2022, pp. 61–72, doi: 10.1145/3524842.3527964.
[62] R. Opdebeeck, A. Zerouali, and C. De Roover, "Control and data flow in security smell detection for infrastructure as code: Is it worth the effort?" in Proc. 20th IEEE/ACM Int. Conf. Mining Softw. Repositories (MSR), Melbourne, Australia. Piscataway, NJ, USA: IEEE Press, 2023, pp. 534–545, doi: 10.1109/MSR59073.2023.00079.
[63] S. Dalla Palma, D. Di Nucci, F. Palomba, and D. A. Tamburri, "Within-project defect prediction of Infrastructure-as-Code using product and process metrics," IEEE Trans. Softw. Eng., vol. 48, no. 6, pp. 2086–2104, Jun. 2022, doi: 10.1109/TSE.2021.3051492.
[64] S. Dalla Palma, D. Di Nucci, F. Palomba, and D. A. Tamburri, "Toward a catalog of software quality metrics for infrastructure code," J. Syst. Softw., vol. 170, 2020, Art. no. 110726, doi: 10.1016/J.JSS.2020.110726.
[65] S. Dalla Palma, D. Di Nucci, and D. A. Tamburri, "AnsibleMetrics: A Python library for measuring Infrastructure-as-Code blueprints in Ansible," SoftwareX, vol. 12, 2020, Art. no. 100633, doi: 10.1016/J.SOFTX.2020.100633.
[66] M. M. Hassan and A. Rahman, "As code testing: Characterizing test quality in open source Ansible development," in Proc. 15th IEEE Conf. Softw. Testing, Verification Validation (ICST), Valencia, Spain. Piscataway, NJ, USA: IEEE Press, 2022, pp. 208–219, doi: 10.1109/ICST53961.2022.00031.
[67] N. Borovits et al., "FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in Infrastructure-as-Code," Empir. Softw. Eng., vol. 27, no. 7, 2022, Art. no. 178, doi: 10.1007/s10664-022-10215-5.
[68] M. Chiari, M. De Pascalis, and M. Pradella, "Static analysis of infrastructure as code: A survey," in Proc. IEEE 19th Int. Conf. Softw. Archit. Companion (ICSA), Honolulu, HI, USA. Piscataway, NJ, USA: IEEE Press, 2022, pp. 218–225, doi: 10.1109/ICSA-C54293.2022.00049.
[69] "Topology and Orchestration Specification for Cloud Applications version 1.0," OASIS. Accessed: Nov. 29, 2023. [Online]. Available: http://docs.oasis-open.org/tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.html
[70] J. Bellendorf and Z. Á. Mann, "Specification of cloud topologies and orchestration using TOSCA: A survey," Computing, vol. 102, no. 8, pp. 1793–1815, 2020, doi: 10.1007/S00607-019-00750-3.
[71] M. Wurster et al., "The essential deployment metamodel: A systematic review of deployment automation technologies," SICS Softw.-Intensive Cyber-Phys. Syst., vol. 35, nos. 1–2, pp. 63–75, 2020, doi: 10.1007/S00450-019-00412-X.
[72] M. Wurster, U. Breitenbücher, L. Harzenetter, F. Leymann, J. Soldani, and V. Yussupov, "TOSCA Light: Bridging the gap between the TOSCA specification and production-ready deployment technologies," in Proc. 10th Int. Conf. Cloud Comput. Services Sci. (CLOSER), Prague, Czech Republic, D. Ferguson, M. Helfert, and C. Pahl, Eds., Rijeka, Croatia: SciTech, 2020, pp. 216–226, doi: 10.5220/0009794302160226.
[73] J. Aldrich, C. Chambers, and D. Notkin, "ArchJava: Connecting software architecture to implementation," in Proc. 24th Int. Conf. Softw. Eng. (ICSE), Orlando, Florida, USA, W. Tracz, M. Young, and J. Magee, Eds., New York, NY, USA: ACM, 2002, pp. 187–197, doi: 10.1145/581339.581365.
[74] I. Krüger, B. Demchak, and M. Menarini, "Dynamic service composition and deployment with OpenRichServices," in Software Service and Application Engineering - Essays Dedicated to Bernd Krämer on the Occasion of His 65th Birthday, M. Heisel, Ed., vol. 7365, Springer-Verlag, 2012, pp. 120–146, doi: 10.1007/978-3-642-30835-2_9.
[75] R. Terra and M. T. de Oliveira Valente, "A dependency constraint language to manage object-oriented software architectures," Softw. Pract. Exp., vol. 39, no. 12, pp. 1073–1094, 2009, doi: 10.1002/SPE.931.
[76] P. Weisenburger, M. Köhler, and G. Salvaneschi, "Distributed system development with ScalaLoci," in Proc. ACM Program. Lang., vol. 2, no. OOPSLA, pp. 129:1–129:30, 2018, doi: 10.1145/3276499.
[77] G. Zakhour, P. Weisenburger, and G. Salvaneschi, "Type-safe dynamic placement with first-class placed values," in Proc. ACM Program. Lang., vol. 7, no. OOPSLA2, Oct. 2023, doi: 10.1145/3622873.
[78] B. Cook, "Formal reasoning about the security of Amazon Web Services," in Proc. Comput. Aided Verification 30th Int. Conf. (CAV), Oxford, UK, H. Chockler and G. Weissenbacher, Eds., vol. 10981, Cham, Switzerland: Springer-Verlag, 2018, pp. 38–47, doi: 10.1007/978-3-319-96145-3_3.
[79] J. Backes et al., "One-click formal methods," IEEE Softw., vol. 36, no. 6, pp. 61–65, Nov./Dec. 2019, doi: 10.1109/MS.2019.2930609.
[80] M. Bouchet et al., "Block public access: Trust safety verification of access control policies," in Proc. 28th ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng. (ESEC/FSE '20), Virtual Event, USA, P. Devanbu, M. B. Cohen, and T. Zimmermann, Eds., New York, NY, USA: ACM, 2020, pp. 281–291, doi: 10.1145/3368089.3409728.
[81] J. Backes et al., "Reachability analysis for AWS-based networks," in Proc. Comput. Aided Verification 31st Int. Conf. (CAV), New York City, NY, USA, I. Dillig and S. Tasiran, Eds., vol. 11562, Cham, Switzerland: Springer-Verlag, 2019, pp. 231–241, doi: 10.1007/978-3-030-25543-5_14.
[82] J. Backes et al., "Semantic-based automated reasoning for AWS access policies using SMT," in Proc. Formal Methods Comput. Aided Des. (FMCAD), Austin, TX, USA, N. S. Bjørner and A. Gurfinkel, Eds., Piscataway, NJ, USA: IEEE Press, 2018, pp. 1–9, doi: 10.23919/FMCAD.2018.8602994.
[83] D. Jackson, "Alloy: A lightweight object modelling notation," ACM Trans. Softw. Eng. Methodol., vol. 11, no. 2, pp. 256–290, 2002, doi: 10.1145/505145.505149.
[84] E. Ahrens, M. Bozga, R. Iosif, and J. Katoen, "Reasoning about distributed reconfigurable systems," in Proc. ACM Program. Lang., vol. 6, no. OOPSLA2, pp. 145–174, 2022, doi: 10.1145/3563293.
[85] A. Evangelidis, D. Parker, and R. Bahsoon, "Performance modelling and verification of cloud-based auto-scaling policies," Future Gener. Comput. Syst., vol. 87, pp. 629–638, Oct. 2018, doi: 10.1016/j.future.2017.12.047.
[86] A. Abu Jabal et al., "Methods and tools for policy analysis," ACM Comput. Surv., vol. 51, no. 6, pp. 121:1–121:35, 2019, doi: 10.1145/3295749.
[87] G. Zakhour, P. Weisenburger, and G. Salvaneschi, "Type-checking CRDT convergence," in Proc. ACM Program. Lang., vol. 7, no. PLDI, pp. 1365–1388, 2023, doi: 10.1145/3591276.
[88] D. Spadini, M. F. Aniche, M. Bruntink, and A. Bacchelli, "To mock or not to mock?: An empirical study on mocking practices," in Proc. 14th Int. Conf. Mining Softw. Repositories (MSR), Buenos Aires, Argentina, J. M. González-Barahona, A. Hindle, and L. Tan, Eds., Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2017, pp. 402–412, doi: 10.1109/MSR.2017.61.
[89] K. Taneja, Y. Zhang, and T. Xie, "MODA: Automated test generation for database applications via mock objects," in Proc. 25th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Antwerp, Belgium, C. Pecheur, J. Andrews, and E. D. Nitto, Eds., New York, NY, USA: ACM, 2010, pp. 289–292, doi: 10.1145/1858996.1859053.
[90] F. Solms and L. Marshall, "Contract-based mocking for services-oriented development," in Proc. Annu. Conf. South Afr. Inst. Comput. Sci. Inf. Technol. (SAICSIT), Johannesburg, South Africa, F. F. Blauw, M. Coetzee, D. A. Coulter, E. M. Ehlers, W. S. Leung, C. Marnewick, and D. van der Haar, Eds., New York, NY, USA: ACM, 2016, pp. 40:1–40:8, doi: 10.1145/2987491.2987534.
[91] D. Saff and M. D. Ernst, "Mock object creation for test factoring," in Proc. ACM SIGPLAN-SIGSOFT Workshop Program Anal. Softw. Tools Eng. (PASTE '04), Washington, DC, USA, C. Flanagan and A. Zeller, Eds., New York, NY, USA: ACM, 2004, pp. 49–51, doi: 10.1145/996821.996838.
[92] S. Joshi and A. Orso, "SCARPE: A technique and tool for selective capture and replay of program executions," in Proc. 23rd IEEE Int. Conf. Softw. Maintenance (ICSM 2007), Paris, France. Los Alamitos, CA, USA: IEEE Comput. Soc., 2007, pp. 234–243, doi: 10.1109/ICSM.2007.4362636.
[93] M. Fazzini, A. Gorla, and A. Orso, "A framework for automated test mocking of mobile apps," in Proc. 35th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Melbourne, Australia. Piscataway, NJ, USA: IEEE Press, 2020, pp. 1204–1208, doi: 10.1145/3324884.3418927.
[94] H. Zhu et al., "StubCoder: Automated generation and repair of stub code for mock objects," ACM Trans. Softw. Eng. Methodol., vol. 33, no. 1, pp. 1–31, Aug. 2023, doi: 10.1145/3617171.
[95] V. Vikram, R. Padhye, and K. Sen, "Growing a test corpus with bonsai fuzzing," in Proc. 43rd IEEE/ACM Int. Conf. Softw. Eng. (ICSE 2021), Madrid, Spain. Piscataway, NJ, USA: IEEE Press, 2021, pp. 723–735, doi: 10.1109/ICSE43902.2021.00072.
[96] D. Steinhöfel and A. Zeller, "Input invariants," in Proc. 30th ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng. (ESEC/FSE), Singapore, Singapore, A. Roychoudhury, C. Cadar, and M. Kim, Eds., New York, NY, USA: ACM, 2022, pp. 583–594, doi: 10.1145/3540250.3549139.
[97] L. Lampropoulos, D. Gallois-Wong, C. Hritcu, J. Hughes, B. C. Pierce, and L. Xia, "Beginner's Luck: A language for property-based generators," in Proc. 44th ACM SIGPLAN Symp. Princ. Program. Lang. (POPL 2017), Paris, France, G. Castagna and A. D. Gordon, Eds., New York, NY, USA: ACM, 2017, pp. 114–129, doi: 10.1145/3009837.3009868.
[98] L. Lampropoulos, M. Hicks, and B. C. Pierce, "Coverage guided, property-based testing," in Proc. ACM Program. Lang., vol. 3, no. OOPSLA, pp. 181:1–181:29, 2019, doi: 10.1145/3360607.
[99] A. Löscher and K. Sagonas, "Automating targeted property-based testing," in Proc. 11th IEEE Int. Conf. Softw. Testing, Verification Validation (ICST), Västerås, Sweden. Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2018, pp. 70–80, doi: 10.1109/ICST.2018.00017.
[100] R. Kuhn, D. R. Wallace, and A. M. Gallo, "Software fault interactions and implications for software testing," IEEE Trans. Softw. Eng., vol. 30, no. 6, pp. 418–421, 2004, doi: 10.1109/TSE.2004.24.
[101] R. Kuhn, Y. Lei, and R. Kacker, "Practical combinatorial testing: Beyond pairwise," IT Prof., vol. 10, no. 3, pp. 19–23, May/Jun. 2008, doi: 10.1109/MITP.2008.54.
[102] Z. Paraskevopoulou, C. Hritcu, M. Dénes, L. Lampropoulos, and B. C. Pierce, "Foundational property-based testing," in Interactive Theorem Proving - 6th Int. Conf. (ITP), Nanjing, China, C. Urban and X. Zhang, Eds., vol. 9236, Cham, Switzerland: Springer-Verlag, 2015, pp. 325–343, doi: 10.1007/978-3-319-22102-1_22.
[103] L. Lampropoulos, Z. Paraskevopoulou, and B. C. Pierce, "Generating good generators for inductive relations," in Proc. ACM Program. Lang., vol. 2, no. POPL, pp. 45:1–45:30, 2018, doi: 10.1145/3158133.
[104] E. De Angelis, F. Fioravanti, A. Palacios, A. Pettorossi, and M. Proietti, "Property-based test case generators for free," in Tests Proofs - 13th Int. Conf., Porto, Portugal, D. Beyer and C. Keller, Eds., vol. 11823, Cham, Switzerland: Springer-Verlag, 2019, pp. 186–206, doi: 10.1007/978-3-030-31157-5_12.
