Automated Infrastructure as Code Program Testing
Abstract—Infrastructure as Code (IaC) enables efficient deployment and operation, which are crucial to releasing software quickly. As setups can be complex, developers implement IaC programs in general-purpose programming languages like TypeScript and Python, using PL-IaC solutions like Pulumi and AWS CDK. The reliability of such IaC programs is even more relevant than in traditional software because a bug in IaC impacts the whole system. Yet, even though testing is a standard development practice, it is rarely used for IaC programs. For instance, in August 2022, less than 1 % of the public Pulumi IaC programs on GitHub implemented tests. Available IaC program testing techniques severely limit the development velocity or require much development effort. To solve these issues, we propose Automated Configuration Testing (ACT), a methodology to test IaC programs in many configurations quickly and with low effort. ACT automatically mocks all resource definitions in the IaC program and uses generator and oracle plugins for test generation and validation. We implement ACT in ProTI, a testing tool for Pulumi TypeScript with a type-based generator and oracle, and support for application specifications. Our evaluation with 6 081 programs from GitHub and artificial benchmarks shows that ProTI can directly be applied to existing IaC programs, quickly finds bugs where current techniques are infeasible, and enables reusing existing generators and oracles thanks to its pluggable architecture.

Index Terms—Property-based testing, fuzzing, infrastructure as code, DevOps.

I. INTRODUCTION

In declarative IaC, developers describe the target state of the deployment instead of the deployment steps [7], [8], which are automatically derived. Deployments can be complex—a trend also driven by modern systems often consisting of several small components. For example, applications that used to consist of a monolithic web server and a database may now comprise tens or hundreds of serverless functions and microservices. This trend transfers complexity from components to their composition, resulting in long, structured IaC scripts. To cope with such complexity, developers implement IaC programs—in contrast to IaC scripts—with recent declarative IaC solutions that adopt general-purpose languages, e.g., TypeScript, Python, or Java, and not only configuration languages and DSLs with constrained expressivity like JSON and YAML. Such Programming Languages IaC (PL-IaC) solutions come with all abstractions (and tools) of well-known general-purpose programming languages. To the best of our knowledge, the industrial-strength PL-IaC solutions available today are Pulumi [9], the Cloud Development Kit (CDK) of Amazon Web Services (AWS CDK) [10], and the CDK for Terraform (CDKTF) [11]. They have existed since 2018–2020 with quickly growing communities. The NPM core packages of AWS CDK, CDKTF, and Pulumi alone grew from 11 M downloads in 2020 to 146 M downloads in 2023. Pulumi reported growth from hundreds to 2 000 customers and from tens of thousands to 150 000 end users in the same period [12], [13].
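For illustration, a minimal Pulumi TypeScript IaC program may look as follows. This is our own sketch for orientation, not the paper's running example (Listing 1); the resource and its options are arbitrary:

```typescript
import * as aws from "@pulumi/aws";

// Declarative resource definition: the IaC program describes the
// target state (an S3 bucket hosting a website); Pulumi derives the
// deployment steps automatically.
const bucket = new aws.s3.Bucket("website", {
  website: { indexDocument: "index.html" },
});

// Resource outputs are first-class values that general-purpose
// TypeScript code can process further.
export const bucketName = bucket.id;
```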
…selection strategies, e.g., based on testing feedback [31], search-based techniques [32], code coverage [33], and combinatorial coverage [34].

D. Discussion

We now discuss bugs in IaC programs, ACT's design, its relation to cloud models, and the resulting limitations.

1) IaC Program Bugs: We propose a bug taxonomy for IaC programs. In contrast to previous, more fine-grained bug taxonomies, e.g., for IaC defects by Rahman et al. [35], we focus purely on the oracle required to find a bug. Recent fuzzing literature, e.g., Su et al. [36] and Li et al. [33], commonly distinguishes crash bugs, which cause the program to crash, from non-crashing logic bugs, which require a more precise oracle than crash detection to identify erroneous computations. We add two categories for bugs where the program logic may be correct, but the resulting resource configuration is faulty. Configuration bugs are the wrong configuration of an isolated resource, e.g., setting an IPv4 address to the invalid value 400.0.0.1. With configuration interaction bugs, the configuration of the individual resources is valid but invalid in combination. For example, there is a subnet 192.168.0.1/24, and a server in it has the IP address 192.168.1.2, which is invalid in this subnet. In contrast to crash and logic bugs, configuration bugs require oracles that can identify invalid cloud configurations and, for configuration interaction bugs, even across multiple resources.

Crash bugs and logic bugs are related to "traditional" code, while configuration (interaction) bugs are related to the embedded DSL code in IaC programs that defines the target state of the deployment through instantiating objects of the resource types' classes. However, IaC programs mix traditional code (Lines 1.1–1.5, Line 1.9, and Line 1.20 of Listing 1) with the embedded DSL code (Lines 1.6–1.8 and Lines 1.10–1.18). This mixing prevents testing the two kinds of code in isolation and causes existing testing methods to be applicable only with a huge mocking effort (Section III-A).
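To make the configuration interaction bug category concrete, the following sketch shows the subnet example from above in Pulumi TypeScript. It is our illustration, not code from Listing 1; the resource names and the AMI id are placeholders:

```typescript
import * as aws from "@pulumi/aws";

const vpc = new aws.ec2.Vpc("vpc", { cidrBlock: "192.168.0.0/16" });

// Valid in isolation: a /24 subnet in the VPC.
const subnet = new aws.ec2.Subnet("subnet", {
  vpcId: vpc.id,
  cidrBlock: "192.168.0.0/24",
});

// Also valid in isolation, but 192.168.1.2 lies outside the subnet's
// range: a configuration interaction bug across two resources.
const server = new aws.ec2.Instance("server", {
  ami: "ami-0123456789abcdef0", // placeholder
  instanceType: "t3.micro",
  subnetId: subnet.id,
  privateIp: "192.168.1.2",
});
```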
2) ACT's Approach: ACT focuses on finding configuration (interaction) bugs. To this end, static analysis is a suitable alternative; for example, it can easily find the bug in Listing 1. Yet, we base ACT on automated testing because testing does not incur the limitations of static analysis when covering complex dynamic behavior of the IaC program code and supporting all features of the host language. In such systematic testing, the generator has to exercise the IaC program in different configurations to effectively find crash and logic bugs that yield wrong configurations. We argue that covering such configuration-related crash and logic bugs is sufficient because IaC programs focus on the configuration, and all relevant logic drives this purpose. If an IaC program implements complicated configuration-unrelated logic, it should be separated from the embedded DSL code and specifically checked with existing, well-established testing techniques.

3) Cloud Configuration Models: Generators and oracles implicitly define models of cloud resource configuration. Such models could be derived from specifications, be hand-crafted, or, more realistically, be derived from existing approximate models, including types. For instance, Pulumi providers, i.e., vendor-specific plugins (cf. Section II-A) used by Pulumi to interact with the cloud, are distributed as packages that contain a schema JSON file defining the types of the resources' target and output configuration. Such type definitions are a configuration model that is by design available for all resources Pulumi supports—even for dynamically typed languages—and they can be leveraged for type-based generators and oracles [29]. ACT's open architecture ensures that developers can adopt and combine available models and plug in domain-specific optimizations. ACT is not limited to functional properties. For instance, models of cloud performance and security, predicting bad performance and insecure setups based on resource configurations, can be embedded in ACT oracle plugins to cover such non-functional aspects.

Ideally, models for ACT generators and oracles are (1) complete, i.e., they can produce all valid configurations, and (2) correct, i.e., they include only valid configurations. Incomplete models in a generator systematically prevent generating test cases that may be needed to find bugs, and incorrect models can yield test cases that never occur in practice. Incomplete models in oracles can trigger false positives (i.e., alerts in the absence of a bug) and incorrect models false negatives (i.e., missed bugs). In practice, cloud models are not perfect. For instance, Pulumi package schema types are complete but not fully correct. In RWW (Listing 1), a correct generator should generate integers in the range (Line 1.9) for RandomInteger's result field (Line 1.11). Yet, a type-based generator provides any number, including values outside the range and fractions, because the type of RandomInteger.result is number. Similarly, a correct oracle only accepts valid HTML for the content field (Line 1.16), but a type-based one accepts any string.

In practice, useful test generators and oracles may still generate irrelevant tests or miss bugs. Even if application-specific knowledge can further limit the configuration space, correcting the model in generator and oracle plugins may overfit the plugins to the specific program, reducing reusability or slowing down development. ACT addresses these issues by enabling fine-tuning of test generation and oracles for a specific application; e.g., ProTI provides an ad-hoc specification syntax (Section IV-C).

IV. ProTI: ACT FOR PULUMI TYPESCRIPT

We present ProTI, an instantiation of ACT for Pulumi TypeScript. ProTI is built upon the popular JavaScript testing tool Jest [37], fast-check [38] for the test execution strategy and arbitraries, and Pulumi's runtime mocking. ProTI comprises six TypeScript packages (Table II). The first four packages implement the core abstractions and Jest plugins for a Jest runner, test runner, and reporter. @proti-iac/pulumi-packages-schema is a Pulumi-packages-schema-based oracle and generator plugin. @proti-iac/spec implements the ad-hoc specification syntax. ProTI is used through Jest's CLI, whose configuration it facilitates with a preset. ProTI preserves Jest's pre-test features and optimizations, e.g., an in-memory file system for the code.
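To illustrate the idea behind such a type-based generator (a sketch of the concept, not the plugin's actual implementation), a Pulumi schema type can be mapped to a fast-check arbitrary. The PropertyType shape below is a simplification we assume for the example; note how number yields arbitrary doubles, which is complete but not correct for constrained fields like RandomInteger.result:

```typescript
import fc from "fast-check";

// Simplified property types, loosely following Pulumi's package schema.
type PropertyType =
  | { type: "string" }
  | { type: "number" }
  | { type: "boolean" }
  | { type: "array"; items: PropertyType };

// Derive a fast-check arbitrary from a (simplified) schema type.
const arbitraryFor = (t: PropertyType): fc.Arbitrary<unknown> => {
  switch (t.type) {
    case "string":
      return fc.string();
    case "number":
      return fc.double(); // any number, including out-of-range values
    case "boolean":
      return fc.boolean();
    case "array":
      return fc.array(arbitraryFor(t.items));
  }
};
```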
Listing 2. Listing 1 with ProTI ad-hoc specifications (orange).

…and scalability to ensure it is fast enough for realistic IaC programs.

RQ4: Can existing test generation and oracle tools be integrated into ProTI? We investigated whether ACT allows leveraging third-party oracles and generators.

The following four subsections present our experiments, and Section V-E discusses their results and threats to validity. We ran all experiments on serverless AWS Fargate [39] containers with 1 vCPU and 4 GB of memory on AWS Elastic Container Service (ECS) [40] in the eu-west-1 region (Ireland).
TABLE III
PL-IaC testing techniques on variants of the RWW example (Listing 1). ∗ marks faulty variants and, before a time, that the technique found the error (always). Minimum (average) run time over 10 repetitions.

| Variant | ProTI | Unit Test | Dry Run | Dry Property Test | Property Test | End-to-end Test |
| --- | --- | --- | --- | --- | --- | --- |
| VNT: Non-transpilable ∗ | 16.7 s (16.8 s) | 1.9 s (2.0 s) | 10.0 s (10.3 s) | 11.6 s (11.7 s) | 12.4 s (12.4 s) | 47.9 s (65.7 s) |
| VE: Error ∗ | 7.0 s (7.2 s) | 2.2 s (2.2 s) | 2.3 s (2.4 s) | 3.7 s (3.8 s) | 4.4 s (4.5 s) | 52.5 s (59.6 s) |
| VAE: Async Error ∗ | 7.4 s (7.6 s) | 2.4 s (2.5 s) | 3.4 s (3.5 s) | 4.8 s (4.9 s) | 9.4 s (9.6 s) | 50.8 s (60.7 s) |
| VC: Correct | 7.5 s (7.6 s) | 2.7 s (2.7 s) | 3.4 s (3.4 s) | 4.8 s (4.9 s) | 9.5 s (9.7 s) | 53.5 s (59.0 s) |
| VS: Listing 2 (ad-hoc specs.) | 21.0 s (21.1 s) | 2.8 s (2.9 s) | 3.5 s (3.5 s) | 5.0 s (5.0 s) | 9.5 s (9.7 s) | 52.6 s (62.3 s) |
| VO: Listing 1 (one-off bug) ∗ | 7.4 s (7.6 s) | 2.7 s (2.7 s) | 3.4 s (3.4 s) | 4.8 s (4.9 s) | ∗ 9.4 s (9.6 s) | ∗ 51.9 s (58.4 s) |
| VSO: Listing 2 with one-off bug ∗ | 8.1 s (8.3 s) | 2.8 s (2.9 s) | 3.5 s (3.6 s) | 4.9 s (5.0 s) | ∗ 9.5 s (9.7 s) | ∗ 59.5 s (66.6 s) |
| VSB: Listing 2 with config. bug ∗ | 7.6 s (7.8 s) | 2.8 s (2.9 s) | 3.5 s (3.5 s) | 4.8 s (4.9 s) | 5.6 s (5.7 s) | 48.4 s (57.4 s) |
| VSDB: Listing 2 with AWS RDS | 39.2 s (39.6 s) | 3.1 s (3.1 s) | 8.1 s (8.4 s) | 8.0 s (8.1 s) | 163.4 s (189.9 s) | 212.5 s (265.7 s) |

TABLE IV
Execution time and result classification of ProTI executions on 6 081 Pulumi TypeScript programs.

| Category [# programs (%)] | Error reason [# programs (% in category)] | Avg. execution time (std) |
| --- | --- | --- |
| Project, 2 (0 %) | invalid Pulumi.yaml 2 (100 %) | 1.6 s (0.1 s) |
| Transpilation, 2 649 (44 %) | module resolution 1 335 (50 %), type checking 984 (37 %), program resolution 324 (12 %), legacy NodeJS 5 (0 %), JSX 1 (0 %) | 8.9 s (5.6 s) |
| Preloading, 482 (8 %) | module resolution 410 (85 %), legacy NodeJS/Pulumi 20 (4 %), unknown 18 (4 %), syntax error 18 (4 %), config 16 (3 %) | 7.8 s (5.9 s) |
| Checking, 1 633 (27 %) | setup 659 (40 %), mocking 468 (29 %), missing type definition 416 (25 %), application 86 (5 %), other 64 (4 %), oracle 58 (4 %) | 17.2 s (17.2 s) |
| Passed, 772 (13 %) | | 23.4 s (11.4 s) |
| Crashed, 543 (9 %) | out of memory 473 (87 %), unknown 70 (13 %) | 25.9 s (38.9 s) |
| Total, 6 081 (100 %) | | 14.4 s (17.0 s) |

…errors by common causes and report their frequency. Both the categorization and error labeling are based on string matching on the execution logs, and the error grouping on open coding. This process was incrementally performed and implemented by the first author and reviewed by the second author. The authors know Pulumi and ProTI well through their research.

On a technical level, ProTI was able to test 40 % of the IaC programs out of the box. This share is remarkable and exceeds our initial expectations because we (1) did not filter for buggy or non-functional programs, (2) ran all programs with current NodeJS and TypeScript versions, and (3) neither looked into nor provided any program-specific environments. We suspect that ProTI can be used for most of the remaining IaC programs, too, after little effort is invested to understand their expected execution environment or bug.

The most common reasons why ProTI could not test a program are module resolution and type checking, failing 1 745 (29 %) and 984 (16 %) executions. The causes include incompatibility with PNPM, the TypeScript version, unmet environment assumptions, and incomplete, broken setups. Among the programs ProTI was able to test, it found issues in 68 %. The tests found 659 (11 %) executions where the setup was incomplete, e.g., missing configuration or programs. Mocking failed in 468 (8 %) executions, which can be caused by incompatible, outdated Pulumi versions. Our type-based oracle and generator failed to find type definitions in 416 (7 %) executions because they are dynamic resources, stack references, or missing in the provider's schema. Our oracle identified invalid resource configurations in 58 (1 %) executions. ProTI ran an unknown number of tests in crashed executions, 100 tests in the passing ones, and only a single test in 98 % of the checking executions. In the other 26 checking executions, ProTI ran between 2 and 38 tests until an error was found. Due to a lack of ground truth, we cannot determine the precision and recall of the experiment.

RQ2: ProTI can be applied to existing IaC programs.
TABLE V
Total average execution time of ProTI over 5 repetitions of the duration experiments, by first/consecutive execution and phase, for IaC programs with 0, 1, 10, 50, and 100 resources with both independent and chained dependencies.

| Phase | 0 | 1 | 10 indep. | 10 chain | 50 indep. | 50 chain | 100 indep. | 100 chain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Remaining | 1.7 s | 1.6 s | 2.2 s | 2.2 s | 6.3 s | 4.6 s | 14.8 s | 15.9 s |
| 100 Runs (Run 1) | 1.0 s | 9.3 s | 57.4 s | 69.5 s | 274.5 s | 262.8 s | 563.3 s | 535.8 s |
| 100 Runs (consecutive) | 1.4 s | 7.4 s | 51.7 s | 50.3 s | 260.8 s | 243.1 s | 520.2 s | 493.6 s |
| Preloading | 0.8 s | 0.8 s | 0.8 s | 0.8 s | 0.7 s | 0.8 s | 0.8 s | 0.8 s |
| Transpilation | 3.7 s | 3.7 s | 3.7 s | 3.7 s | 3.7 s | 3.7 s | 3.7 s | 3.7 s |
| Total | 7.5 s | 13.5 s | 58.3 s | 57.1 s | 271.6 s | 253.9 s | 535.7 s | 508.1 s |

Fig. 7. Average execution time of ProTI over 5 repetitions of the duration experiments (Table V) by phase, resource count, and dependency. Results for three consecutive executions (1, 2, 3). In total (top row) and relative (bottom row).

…preloading are the only actions of the test runner taking significant time. The remaining execution time was consumed outside the test runner, including Jest's setup and reporting. A single test run in the experiments took 10 ms to 5.9 s, and the duration scales linearly with the resource number.

We found similar execution times in the IaC programs from GitHub (Table IV). Conservatively approximating a single test duration by dividing the total run time of all passed ProTI executions by 100 (the number of runs), we measured test run durations from 34 ms to 1.0 s, 234 ms on average. It is an approximation because the total run time also includes overhead like setup and reporting, and it is conservative because we assume these contributors are instant, i.e., the test run duration is likely a bit lower. The RWW experiments (Table III) confirm these durations, too. Lastly, our experiments show that ProTI is quicker when it finds a bug because of early termination.

The passing RWW experiments with 6 resources (VS) and 25 resources (VSDB) in Table III confirm that test time grows with the number of resources (on average, 21 s and 40 s including overhead). They further show that the performance of integration testing heavily depends on the deployment time of the resources—which ProTI is independent of. Deploying AWS RDS databases takes longer than AWS S3 resources; hence, testing VSDB takes 20× and 4× longer than VS with property testing and end-to-end testing, respectively, while ProTI was only 2× slower.

RQ3: A single test run of ProTI typically takes hundreds of milliseconds, and test duration scales with the number of resources—not with their deployment time—permitting ProTI to quickly check hundreds of configurations.

D. Integrating Existing Tools Into ProTI

To leverage test generation and oracle techniques from related work, ProTI must be open to extension with them. To demonstrate ProTI's extendability, we implemented ProTI plugins using the Radamsa fuzzer [43] and the Daikon invariant detector [44], i.e., a generator and an oracle plugin based on existing tools. This experiment assesses the feasibility of integrating existing approaches; optimizing them and evaluating their effectiveness and efficiency is the subject of future work focusing on test generation and oracle techniques, while this paper focuses on the overall approach.

Radamsa [43] is a fuzzing tool that derives fresh test inputs from an example. We adopted it for a ProTI generator plugin that, separately for each resource type, uses the type-based generator to generate an output configuration example, which is passed to Radamsa as JSON to generate a list of derived test inputs. We filter non-parsable configurations from Radamsa's results and use the remaining ones as test inputs in ProTI. Whenever ProTI runs out of Radamsa-generated inputs, we repeat the procedure. The generator implementation required 83 SLOC, of which only 48 differ from a naïve generator returning empty configurations.
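The derivation step of this generator plugin can be sketched as follows. This is our sketch, not the plugin's code: it assumes a radamsa binary on the PATH and omits the ProTI plugin wiring and the type-based example generation:

```typescript
import { execFileSync } from "node:child_process";

// Derive test inputs from one example output configuration by mutating
// its JSON serialization with Radamsa, keeping only mutations that
// still parse as JSON (mirroring the filtering described above).
const deriveConfigs = (example: unknown, count: number): unknown[] => {
  const seed = JSON.stringify(example);
  const derived: unknown[] = [];
  for (let i = 0; i < count; i++) {
    const mutated = execFileSync("radamsa", [], { input: seed }).toString();
    try {
      derived.push(JSON.parse(mutated));
    } catch {
      // skip non-parsable mutations
    }
  }
  return derived;
};
```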
Daikon [44] is a dynamic invariant detector that identifies application invariants in a set of program traces. We used it for an invariant regression oracle that detects behavior changes across different versions of an IaC program. In the first ProTI execution, the oracle records all resources' target and output states and invokes Daikon on them to find resource configuration invariants over all runs, e.g., that a particular bucket's id equals a field of a policy, independent of the concrete value. In consecutive ProTI executions, we repeat the procedure and additionally compare the obtained invariants with the previously generated ones, issuing a warning if an invariant cannot be found anymore, i.e., it may be violated in the new program version. The oracle plugin comprises only 120 SLOC, mainly for converting resource configurations between ProTI and Daikon and managing state across executions.
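The regression check at the core of this oracle plugin boils down to a set comparison. The sketch below is ours and treats invariants as opaque strings; recording resource states and running Daikon are omitted:

```typescript
// Report every invariant of the previous program version that is no
// longer found for the current version; such invariants may be
// violated by the new version and trigger a warning.
const missingInvariants = (
  previous: ReadonlySet<string>,
  current: ReadonlySet<string>,
): string[] => [...previous].filter((inv) => !current.has(inv));

// Usage sketch with a hypothetical invariant from the example above:
const warnings = missingInvariants(
  new Set(["bucket.id == policy.bucket"]),
  new Set<string>(),
);
warnings.forEach((inv) => console.warn(`Invariant no longer found: ${inv}`));
```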
RQ4: Existing tools can be integrated into ProTI by implementing a plugin, demonstrating ProTI's openness to third-party techniques.

E. Limitations, Threats to Validity, and Implications

Our experiments on ProTI show that ACT can find bugs quickly and reliably in IaC programs, even in edge cases (RQ1),
can be applied to IaC programs without adjustments (RQ2), can be fast enough to run hundreds of tests in a short time (RQ3), and can be extended with existing tools through generator and oracle plugins (RQ4). Yet, our experiments do not provide quantitative insight into ACT's effectiveness, i.e., the likelihood that all bugs and no false positives are found, and after which time. Such insights require an IaC program dataset with correctness annotations, i.e., precise knowledge about the bugs in the programs. Such an evaluation is planned in future work to assess advanced generator and oracle plugins. This paper focuses on the feasibility of the ACT approach to test IaC, not on the precision and recall of a specific testing technique.

Relevant threats to validity in this work include that we evaluate ACT through ProTI, a single instantiation for one specific PL-IaC solution and language. Yet, we expect that implementations for other languages and PL-IaC solutions yield similar results because IaC programs for other tools and other languages, i.e., the embedded PL-IaC DSL, are technically analogous. The IaC program selection in our experiments is also a threat. For RQ1, the set of variants in RWW suffices to demonstrate the behavioral differences of ACT compared to other techniques; yet, more experiments are needed to show with statistical significance that these differences are relevant in practice such that ACT is beneficial on other IaC programs. For RQ2, we inherit the limitations and validity threats of the PIPr dataset [16]—including generalizability—but, based on our experience, we expect the qualitative insight to apply to other IaC programs. For RQ3, we focused on the number of resources and their dependencies in IaC programs, showing how they influence performance. We rely on our experience that resource number and dependency are the factors that most significantly impact performance, but other factors can be studied with a more comprehensive sensitivity analysis. The categorization, as well as the error labeling and grouping in RQ2, may be subjective, an issue we limited through the review of a second author. Another potential issue is that ProTI is a random-based testing tool, which, in case of a bug, may cause the bug to be inconsistently (not) caught by different test cases across executions. Hence, we apply 10 repetitions for RQ1. For RQ2, we saw negligible variance in tests. As the programs in RQ3 are correct, they are not impacted by this threat. RQ4 is also not affected because it only demonstrates that existing tools can be leveraged in ACT. RQ4 does not measure ProTI executions to quantify the effectiveness of specific tools in the context of IaC; this aspect must be evaluated for each plugin and crucially depends on the implemented method.

For practitioners, ACT and ProTI are new techniques whose effectiveness depends, in the long run, on a community effort to maintain the framework and the test generation and oracle plugins. Practitioners can now try out ACT with low effort on existing Pulumi TypeScript IaC programs. This solution can already reduce development time through earlier bug detection and increase the reliability of IaC programs, supporting faster-evolving, functional, secure systems. A user study assessing user acceptance of ACT and ProTI is left to future work. For researchers, ACT and ProTI are novel testbeds that facilitate exploring advanced test generation and oracle techniques for IaC programs and correct and secure cloud configuration.

VI. RELATED WORK

We summarize the limitations of two-phase PL-IaC solutions and related work on infrastructure deployment quality, automated mocking, and related software testing techniques.

A. Limitations of Two-Phase PL-IaC

General PL-IaC solutions like Pulumi can observe a resource's state after deployment, the output configuration, and process the values in the general-purpose language. In contrast, two-phase PL-IaC solutions like AWS CDK and CDKTF prohibit IaC programs from accessing the deployment state. Two-phase PL-IaC solutions (1) execute the IaC program to generate the target state as a JSON file and (2) provide it to the deployment engine, i.e., AWS CloudFormation or Terraform. This exchange is uni-directional, i.e., there is no arrow from the deployment engine to the IaC program in Fig. 1. Due to this approach, two-phase PL-IaC can only compute on resource state that can be expressed in the deployment engine's DSL—practically limited to referencing values, string interpolation, and simple value processing. Yet, using an expressive language to process the externally generated state is the reason for using general-purpose languages in IaC programs in the first place. Accordingly, two-phase PL-IaC provides only a subset of PL-IaC's capabilities. In fact, AWS CDK code can be embedded into Pulumi programs, but not vice versa [45].

Unit testing two-phase PL-IaC, i.e., for the CDKs [46], [47], is simpler than for general PL-IaC and does not require mocking, as two-phase IaC programs do not interact with the deployment engine. We believe this simplification is the reason unit testing is much more common in CDK projects than in Pulumi projects [16]. Also, template projects set up through the CLI include a unit testing setup with a simple test for the CDKs (commented out in the templates by default), but not for Pulumi.
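The difference can be sketched in Pulumi TypeScript (our example, not from the paper): the deployed value flows back into the program, where arbitrary host-language code can process it, which two-phase solutions cannot express:

```typescript
import * as aws from "@pulumi/aws";

const bucket = new aws.s3.Bucket("logs");

// `arn` is produced by the deployment engine during deployment;
// `apply` runs arbitrary TypeScript on the value once it is known.
export const partition = bucket.arn.apply((arn) => arn.split(":")[1]);
```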
B. Infrastructure as Code Quality

Previous work discussed IaC quality and how to improve it, but it is mainly focused on Puppet, Ansible, and Chef. These Configuration as Code (CaC) tools have been designed to configure existing, mutable infrastructure—even though they also support provisioning. In contrast, PL-IaC does not only employ general-purpose programming languages instead of DSLs; it also focuses on infrastructure provisioning (like, e.g., Terraform and AWS CloudFormation), typically implementing immutable infrastructure management. Research on PL-IaC has been limited to deployment coordination [48], [49].

Hummer et al. [50] proposed an idempotency testing approach for Chef scripts, which Ikeshita et al. [51] augmented with verification techniques to minimize the size of the required test suite. Shambaugh et al. [52] proposed Rehearsal to verify the determinacy and idempotency of Puppet scripts. Yet, declarative IaC ensures idempotency by design.

Sharma et al. [53] were the first to identify code smells in Puppet scripts. Later studies confirmed them for Chef [54]. Rahman et al. surveyed CaC research [6] and identified source code properties correlating with defects in Puppet scripts [55], such as hard-coded strings. They further recognized security
smells and proposed linters for Puppet, Ansible, and Chef [56], [57]. Saavedra and Ferreira [58] introduced GLITCH for linters on a CaC-solution-agnostic intermediate representation. Reis et al. [59] found that such linters are too imprecise but can be improved through user feedback.

Opdebeeck et al. [60], [61] analyzed the quality of semantic versioning and variable-precedence-related code smells in Ansible. Further, they applied program dependence graph analysis to Ansible scripts, motivating control- and data-flow analysis for IaC security smell detection techniques [62]. Dalla Palma et al. [63], [64], [65] proposed various quality metrics and an AI defect prediction framework for Ansible scripts. Kumara et al. [4] and Guerriero et al. [3] explored IaC best practices and issues in the industry through a grey literature survey and practitioner interviews. Hassan and Rahman [66] studied bugs in open-source Ansible test scripts. Borovits et al. [67] proposed FindICI, an AI-based tool to identify linguistic inconsistency between documentation, comments, and code in Ansible scripts, and Chiari et al. [68] surveyed work on static analysis for IaC, focusing mainly on CaC.

This paper is the first about quality in PL-IaC, which focuses—unlike CaC—on declarative infrastructure provisioning through programs in popular imperative programming languages. Further, we propose ACT and implement it in ProTI, enabling efficient unit testing of IaC programs.

D. Infrastructure Verification

Ensuring infrastructure correctness has been extensively studied. AWS investigated automatically verifying infrastructure properties [78], [79], [80], leading to at least two automated services in production: AWS Tiros verifies reachability queries on virtual networks [81] and AWS Zelkova performs access verification on role-based AWS IAM policies [82]. These solutions verify already deployed setups, but their techniques should be applicable pre-deployment on IaC programs, which encode the infrastructure's configuration. Such pre-deployment infrastructure verification could also leverage more foundational techniques. E.g., Alloy [83] is a language and analysis tool to verify structural properties of software. Ahrens et al. [84] developed a proof system for invariants on reconfigurable distributed systems. Evangelidis et al. [85] proposed probabilistic verification of performance properties of rule-based auto-scaling policies. Lastly, Abu Jabal et al. [86] gave a comprehensive overview of techniques for policy verification, focused on access control and network management.

Program verification remains an open challenge, either requiring significant manual effort or being limited to specific properties [87]. Augmenting ACT with automated verification of domain-specific properties, e.g., network access constraints, is a promising direction, orthogonal to ACT's contribution to the testing of IaC programs.
…where code is exercised on randomly generated tests, and results are checked against invariants—the properties.

Various works investigate effective PBT test generators. Lampropoulos et al. proposed Luck, a language for PBT generators [97], and coverage-guided PBT [98]. Löscher and Sagonas introduced targeted PBT [32] and automated it [99] using search-based techniques to guide the generation. Kuhn et al. [100] found that most bugs are caused by the interaction of only a few parameters, motivating combinatorial testing [101], which Goldstein et al. [34] applied to PBT generators by modifying the random generator distributions. On the intersection with formal methods, Paraskevopoulou et al. [102] integrated PBT into a proof assistant to verify tests, and Lampropoulos et al. [103] compiled logical conditions (inductive relations) to generators and to their soundness and completeness proofs. De Angelis et al. [104] leveraged symbolic execution and constraint logic programming to automatically derive generators.

ACT is fuzzing and PBT for IaC programs. For ProTI, type-based generators and oracles, prototypes demonstrating third-party tool integration, and an ad-hoc specification syntax are available. The approaches above can be integrated or implemented in ProTI plugins to use them for IaC programs.

VII. CONCLUSION

Testing is rarely used for IaC programs, and available techniques either hinder development velocity or require much programming effort. We present Automated Configuration Testing (ACT) for quick IaC program testing at low effort and implement it for Pulumi TypeScript in ProTI. ProTI is effective on existing IaC programs, and its modular architecture enables the use of existing third-party and novel test generators and oracles, breaking ground for future research on effective test generators and oracles for IaC programs.

REFERENCES

[1] K. Morris, Infrastructure as Code: Dynamic Systems for the Cloud Age, 2nd ed. Sebastopol, CA, USA: O'Reilly Media, Inc., 2021.
[2] G. Kim, J. Humble, P. Debois, J. Willis, and N. Forsgren, The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations, 2nd ed. IT Revolution Press, 2021.
[3] M. Guerriero, M. Garriga, D. A. Tamburri, and F. Palomba, "Adoption, support, and challenges of Infrastructure-as-Code: Insights from industry," in Proc. IEEE Int. Conf. Softw. Maintenance Evolution (ICSME), Cleveland, OH, USA. Piscataway, NJ, USA: IEEE Press, 2019, pp. 580–589, doi: 10.1109/ICSME.2019.00092.
[4] I. Kumara et al., "The do's and don'ts of infrastructure code: A systematic gray literature review," Inf. Softw. Technol., vol. 137, 2021, Art. no. 106593, doi: 10.1016/J.INFSOF.2021.106593.
[5] L. A. F. Leite, C. Rocha, F. Kon, D. S. Milojicic, and P. Meirelles, "A survey of DevOps concepts and challenges," ACM Comput. Surv., vol. 52, no. 6, pp. 127:1–127:35, 2020, doi: 10.1145/3359981.
[6] A. Rahman, R. Mahdavi-Hezaveh, and L. A. Williams, "A systematic mapping study of infrastructure as code research," Inf. Softw. Technol., vol. 108, pp. 65–77, 2019, doi: 10.1016/j.infsof.2018.12.004.
[7] C. Endres, U. Breitenbücher, M. Falkenthal, O. Kopp, F. Leymann, and J. Wettinger, "Declarative vs. imperative: Two modeling patterns for the automated deployment of applications." Accessed: Nov. 30, 2023. [Online]. Available: https://www.iaas.uni-stuttgart.de/publications/INPROC-2017-12-Declarative-vs-Imperative-Modeling-Patterns.pdf
[8] U. Breitenbücher, T. Binz, K. Képes, O. Kopp, F. Leymann, and J. Wettinger, "Combining declarative and imperative cloud application provisioning based on TOSCA," in Proc. IEEE Int. Conf. Cloud Eng., Boston, MA, USA. Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2014, pp. 87–96, doi: 10.1109/IC2E.2014.56.
[9] "Pulumi: Infrastructure as code in any programming language." Pulumi. Accessed: Nov. 29, 2023. [Online]. Available: https://github.com/pulumi/pulumi
[10] "Cloud development framework: AWS Cloud Development Kit." Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/cdk/
[11] "CDK for Terraform." HashiCorp. Accessed: Nov. 29, 2023. [Online]. Available: https://developer.hashicorp.com/terraform/cdktf
[12] J. Duffy, "Pulumi raises Series B to build the future of cloud engineering." Pulumi Blog. Accessed: Nov. 30, 2023. [Online]. Available: https://www.pulumi.com/blog/series-b/
[13] J. Duffy, "Building the best infrastructure as code with $41M Series C funding." Pulumi Blog. Accessed: Nov. 30, 2023. [Online]. Available: https://www.pulumi.com/blog/series-c/
[14] M. Madeja, J. Porubän, S. Chodarev, M. Sulír, and F. Gurbáľ, "Empirical study of test case and test framework presence in public projects on GitHub," Appl. Sci., vol. 11, no. 16, pp. 1–22, 2021, doi: 10.3390/app11167250.
[15] P. Singh Kochhar, T. F. Bissyandé, D. Lo, and L. Jiang, "An empirical study of adoption of software testing in open source projects," in Proc. 13th Int. Conf. Qual. Softw., Nanjing, China. Piscataway, NJ, USA: IEEE Press, 2013, pp. 103–112, doi: 10.1109/QSIC.2013.57.
[16] D. Sokolowski, D. Spielmann, and G. Salvaneschi, "The PIPr dataset of public infrastructure as code programs," in Proc. 21st IEEE/ACM Int. Conf. Mining Softw. Repositories (MSR), Lisbon, Portugal, 2024, pp. 498–503, doi: 10.1145/3643991.3644888.
[17] H. Holmström Olsson, H. Allahyari, and J. Bosch, "Climbing the 'stairway to heaven' - A multiple-case study exploring barriers in the transition from agile development towards continuous deployment of software," in Proc. 38th Euromicro Conf. Softw. Eng. Adv. Appl. (SEAA), Cesme, Izmir, Turkey, V. Cortellessa, H. Muccini, and O. Demirörs, Eds., Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2012, pp. 392–399, doi: 10.1109/SEAA.2012.54.
[18] J. Humble and D. Farley, Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation. Reading, MA, USA: Addison-Wesley, 2010.
[19] P. Ralph et al., "Empirical standards for software engineering research," 2021, arXiv:2010.03525.
[20] D. Sokolowski, D. Spielmann, and G. Salvaneschi, "ProTI: Automated unit testing of Pulumi TypeScript infrastructure as code programs," 2023, doi: 10.5281/zenodo.10028479.
[21] D. Sokolowski, D. Spielmann, and G. Salvaneschi, "Evaluation of automated infrastructure as code program testing," 2024, doi: 10.5281/zenodo.10908273.
[22] "Cloud object storage: Amazon S3," Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/s3/
[23] "Developer tools: SDKs and programming toolkits for building on AWS: SDKs," Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/developer/tools/#SDKs
[24] "Azure SDK releases." Microsoft Azure. Accessed: Nov. 29, 2023. [Online]. Available: https://azure.github.io/azure-sdk/
[25] "Testing of Pulumi programs." Pulumi. Accessed: Nov. 29, 2023. [Online]. Available: https://www.pulumi.com/docs/using-pulumi/testing/
[26] "Policy as code for any cloud with Pulumi: Pulumi CrossGuard." Pulumi. Accessed: Nov. 29, 2023. [Online]. Available: https://www.pulumi.com/crossguard/
[27] "Integration testing for Pulumi programs." Pulumi. Accessed: Nov. 29, 2023. [Online]. Available: https://www.pulumi.com/docs/using-pulumi/testing/integration/
[28] G. Fink and M. Bishop, "Property-based testing: A new approach to testing for assurance," ACM SIGSOFT Softw. Eng. Notes, vol. 22, no. 4, pp. 74–80, 1997, doi: 10.1145/263244.263267.
[29] K. Claessen and J. Hughes, "QuickCheck: A lightweight tool for random testing of Haskell programs," in Proc. Fifth ACM SIGPLAN Int. Conf. Functional Program. (ICFP '00), Montreal, Canada, M. Odersky and P. Wadler, Eds., New York, NY, USA: ACM, 2000, pp. 268–279, doi: 10.1145/351240.351266.
[30] A. Zeller, R. Gopinath, M. Böhme, G. Fraser, and C. Holler, "Fuzzing: Breaking things with random inputs." The Fuzzing Book. Accessed: Nov. 30, 2023. [Online]. Available: https://www.fuzzingbook.org/html/Fuzzer.html
[31] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, "Feedback-directed random test generation," in Proc. 29th Int. Conf. Softw. Eng. (ICSE 2007), Minneapolis, MN, USA. Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2007, pp. 75–84, doi: 10.1109/ICSE.2007.37.
[32] A. Löscher and K. Sagonas, "Targeted property-based testing," in Proc. 26th ACM SIGSOFT Int. Symp. Softw. Testing Anal., Santa Barbara, CA, USA, T. Bultan and K. Sen, Eds., New York, NY, USA: ACM, 2017, pp. 46–56, doi: 10.1145/3092703.3092711.
[33] J. Li, B. Zhao, and C. Zhang, "Fuzzing: A survey," Cybersecurity, vol. 1, no. 1, 2018, Art. no. 6, doi: 10.1186/S42400-018-0002-Y.
[34] H. Goldstein, J. Hughes, L. Lampropoulos, and B. C. Pierce, "Do judge a test by its cover - combining combinatorial and property-based testing," in Proc. Program. Lang. Syst. 30th Eur. Symp. Program. (ESOP), Luxembourg City, Luxembourg, N. Yoshida, Ed., vol. 12648, Cham, Switzerland: Springer-Verlag, 2021, pp. 264–291, doi: 10.1007/978-3-030-72019-3_10.
[35] A. Rahman, E. Farhana, C. Parnin, and L. A. Williams, "Gang of Eight: A defect taxonomy for infrastructure as code scripts," in Proc. 42nd Int. Conf. Softw. Eng., Seoul, South Korea, G. Rothermel and D. Bae, Eds., New York, NY, USA: ACM, 2020, pp. 752–764, doi: 10.1145/3377811.3380409.
[36] T. Su et al., "Fully automated functional fuzzing of Android apps for detecting non-crashing logic bugs," in Proc. ACM Program. Lang., vol. 5, no. OOPSLA, 2021, pp. 1–31, doi: 10.1145/3485533.
[37] "Jest: Delightful JavaScript testing." Meta Platforms. Accessed: Nov. 29, 2023. [Online]. Available: https://jestjs.io/
[38] N. Dubien, "fast-check official documentation." fast-check. Accessed: Nov. 29, 2023. [Online]. Available: https://fast-check.dev/
[39] "Serverless compute: AWS Fargate." Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/fargate/
[40] "Amazon Elastic Container Service." Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://aws.amazon.com/ecs/
[41] "Policies for AWS (AWSGuard)." Pulumi. Accessed: Nov. 29, 2023. [Online]. Available: https://www.pulumi.com/docs/using-pulumi/crossguard/awsguard/
[42] "GitHub Docs: Searching code (legacy)." GitHub. Accessed: Nov. 30, 2023. [Online]. Available: https://docs.github.com/en/search-github/searching-on-github/searching-code
[43] A. Helin, "Radamsa: A general-purpose fuzzer." GitLab. Accessed: Nov. 30, 2023. [Online]. Available: https://gitlab.com/akihe/radamsa
[44] M. D. Ernst et al., "The Daikon system for dynamic detection of likely invariants," Sci. Comput. Program., vol. 69, nos. 1–3, pp. 35–45, 2007, doi: 10.1016/j.scico.2007.01.015.
[45] L. Hoban, "Introducing AWS CDK on Pulumi." Pulumi Blog. Accessed: Nov. 30, 2023. [Online]. Available: https://www.pulumi.com/blog/aws-cdk-on-pulumi/
[46] "Testing constructs: AWS Cloud Development Kit (AWS CDK) v2." Amazon Web Services. Accessed: Nov. 29, 2023. [Online]. Available: https://docs.aws.amazon.com/cdk/v2/guide/testing.html
[47] "Unit tests: CDK for Terraform." HashiCorp. Accessed: Nov. 29, 2023. [Online]. Available: https://developer.hashicorp.com/terraform/cdktf/test/unit-tests
[48] D. Sokolowski, P. Weisenburger, and G. Salvaneschi, "Automating serverless deployments for DevOps organizations," in Proc. 29th ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng., Athens, Greece, D. Spinellis, G. Gousios, M. Chechik, and M. D. Penta, Eds., New York, NY, USA: ACM, 2021, pp. 57–69, doi: 10.1145/3468264.3468575.
[49] D. Sokolowski, P. Weisenburger, and G. Salvaneschi, "Decentralizing infrastructure as code," IEEE Softw., vol. 40, no. 1, pp. 50–55, 2023, doi: 10.1109/MS.2022.3192968.
[50] W. Hummer, F. Rosenberg, F. Oliveira, and T. Eilam, "Testing idempotence for infrastructure as code," in Proc. Middleware ACM/IFIP/USENIX 14th Int. Middleware Conf., Beijing, China, D. M. Eyers and K. Schwan, Eds., vol. 8275, Berlin, Heidelberg, Germany: Springer-Verlag, 2013, pp. 368–388, doi: 10.1007/978-3-642-45065-5_19.
[51] K. Ikeshita, F. Ishikawa, and S. Honiden, "Test suite reduction in idempotence testing of infrastructure as code," in Proc. Tests Proofs - 11th Int. Conf., Marburg, Germany, S. Gabmeyer and E. B. Johnsen, Eds., vol. 10375, Springer-Verlag, 2017, pp. 98–115, doi: 10.1007/978-3-319-61467-0_6.
[52] R. Shambaugh, A. Weiss, and A. Guha, "Rehearsal: A configuration verification tool for Puppet," in Proc. 37th ACM SIGPLAN Conf. Program. Lang. Des. Implementation (PLDI), Santa Barbara, CA, USA, C. Krintz and E. D. Berger, Eds., New York, NY, USA: ACM, 2016, pp. 416–430, doi: 10.1145/2908080.2908083.
[53] T. Sharma, M. Fragkoulis, and D. Spinellis, "Does your configuration code smell?" in Proc. 13th Int. Conf. Mining Softw. Repositories (MSR), Austin, TX, USA, M. Kim, R. Robbes, and C. Bird, Eds., New York, NY, USA: ACM, 2016, pp. 189–200, doi: 10.1145/2901739.2901761.
[54] J. Schwarz, A. Steffens, and H. Lichter, "Code smells in infrastructure as code," in Proc. 11th Int. Conf. Qual. Inf. Commun. Technol. (QUATIC), Coimbra, Portugal, A. Bertolino, V. Amaral, P. Rupino, and M. Vieira, Eds., Los Alamitos, CA, USA: IEEE Comput. Soc., 2018, pp. 220–228, doi: 10.1109/QUATIC.2018.00040.
[55] A. Rahman and L. A. Williams, "Source code properties of defective infrastructure as code scripts," Inf. Softw. Technol., vol. 112, pp. 148–163, 2019, doi: 10.1016/j.infsof.2019.04.013.
[56] A. Rahman, C. Parnin, and L. A. Williams, "The seven sins: Security smells in infrastructure as code scripts," in Proc. IEEE/ACM 41st Int. Conf. Softw. Eng. (ICSE), Montreal, QC, Canada, J. M. Atlee, T. Bultan, and J. Whittle, Eds., 2019, pp. 164–175, doi: 10.1109/ICSE.2019.00033.
[57] A. Rahman, M. R. Rahman, C. Parnin, and L. A. Williams, "Security smells in Ansible and Chef scripts: A replication study," ACM Trans. Softw. Eng. Methodol., vol. 30, no. 1, pp. 3:1–3:31, 2021, doi: 10.1145/3408897.
[58] N. Saavedra and J. F. Ferreira, "GLITCH: Automated polyglot security smell detection in infrastructure as code," in Proc. 37th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Rochester, MI, USA. New York, NY, USA: ACM, 2022, pp. 47:1–47:12, doi: 10.1145/3551349.3556945.
[59] S. Reis, R. Abreu, M. d'Amorim, and D. Fortunato, "Leveraging practitioners' feedback to improve a security linter," in Proc. 37th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Rochester, MI, USA. New York, NY, USA: ACM, 2022, pp. 66:1–66:12, doi: 10.1145/3551349.3560419.
[60] R. Opdebeeck, A. Zerouali, C. Velázquez-Rodríguez, and C. De Roover, "On the practice of semantic versioning for Ansible Galaxy roles: An empirical study and a change classification model," J. Syst. Softw., vol. 182, 2021, Art. no. 111059, doi: 10.1016/j.jss.2021.111059.
[61] R. Opdebeeck, A. Zerouali, and C. De Roover, "Smelly variables in Ansible infrastructure code: Detection, prevalence, and lifetime," in Proc. 19th IEEE/ACM Int. Conf. Mining Softw. Repositories (MSR), Pittsburgh, PA, USA. New York, NY, USA: ACM, 2022, pp. 61–72, doi: 10.1145/3524842.3527964.
[62] R. Opdebeeck, A. Zerouali, and C. De Roover, "Control and data flow in security smell detection for infrastructure as code: Is it worth the effort?" in Proc. 20th IEEE/ACM Int. Conf. Mining Softw. Repositories (MSR), Melbourne, Australia. Piscataway, NJ, USA: IEEE Press, 2023, pp. 534–545, doi: 10.1109/MSR59073.2023.00079.
[63] S. Dalla Palma, D. Di Nucci, F. Palomba, and D. A. Tamburri, "Within-project defect prediction of Infrastructure-as-Code using product and process metrics," IEEE Trans. Softw. Eng., vol. 48, no. 6, pp. 2086–2104, Jun. 2022, doi: 10.1109/TSE.2021.3051492.
[64] S. Dalla Palma, D. Di Nucci, F. Palomba, and D. A. Tamburri, "Toward a catalog of software quality metrics for infrastructure code," J. Syst. Softw., vol. 170, 2020, Art. no. 110726, doi: 10.1016/J.JSS.2020.110726.
[65] S. Dalla Palma, D. Di Nucci, and D. A. Tamburri, "AnsibleMetrics: A Python library for measuring Infrastructure-as-Code blueprints in Ansible," SoftwareX, vol. 12, 2020, Art. no. 100633, doi: 10.1016/J.SOFTX.2020.100633.
[66] M. M. Hassan and A. Rahman, "As code testing: Characterizing test quality in open source Ansible development," in Proc. 15th IEEE Conf. Softw. Testing, Verification Validation (ICST), Valencia, Spain. Piscataway, NJ, USA: IEEE Press, 2022, pp. 208–219, doi: 10.1109/ICST53961.2022.00031.
[67] N. Borovits et al., "FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in Infrastructure-as-Code," Empir. Softw. Eng., vol. 27, no. 7, 2022, Art. no. 178, doi: 10.1007/s10664-022-10215-5.
[68] M. Chiari, M. De Pascalis, and M. Pradella, "Static analysis of infrastructure as code: A survey," in Proc. IEEE 19th Int. Conf. Softw. Archit. Companion (ICSA-C), Honolulu, HI, USA. Piscataway, NJ, USA: IEEE Press, 2022, pp. 218–225, doi: 10.1109/ICSA-C54293.2022.00049.
[69] "Topology and Orchestration Specification for Cloud Applications version 1.0," OASIS. Accessed: Nov. 29, 2023. [Online]. Available: http://docs.oasis-open.org/tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.html
[70] J. Bellendorf and Z. Á. Mann, "Specification of cloud topologies and orchestration using TOSCA: A survey," Computing, vol. 102, no. 8, pp. 1793–1815, 2020, doi: 10.1007/S00607-019-00750-3.
[71] M. Wurster et al., "The essential deployment metamodel: A systematic review of deployment automation technologies," SICS Softw.-Intensive Cyber-Phys. Syst., vol. 35, nos. 1–2, pp. 63–75, 2020, doi: 10.1007/S00450-019-00412-X.
[72] M. Wurster, U. Breitenbücher, L. Harzenetter, F. Leymann, J. Soldani, and V. Yussupov, "TOSCA Light: Bridging the gap between the TOSCA specification and production-ready deployment technologies," in Proc. 10th Int. Conf. Cloud Comput. Services Sci. (CLOSER), Prague, Czech Republic, D. Ferguson, M. Helfert, and C. Pahl, Eds., Rijeka, Croatia: SciTech, 2020, pp. 216–226, doi: 10.5220/0009794302160226.
[73] J. Aldrich, C. Chambers, and D. Notkin, "ArchJava: Connecting software architecture to implementation," in Proc. 24th Int. Conf. Softw. Eng. (ICSE), Orlando, FL, USA, W. Tracz, M. Young, and J. Magee, Eds., New York, NY, USA: ACM, 2002, pp. 187–197, doi: 10.1145/581339.581365.
[74] I. Krüger, B. Demchak, and M. Menarini, "Dynamic service composition and deployment with OpenRichServices," in Software Service and Application Engineering - Essays Dedicated to Bernd Krämer on the Occasion of His 65th Birthday, M. Heisel, Ed., vol. 7365, Springer-Verlag, 2012, pp. 120–146, doi: 10.1007/978-3-642-30835-2_9.
[75] R. Terra and M. T. de Oliveira Valente, "A dependency constraint language to manage object-oriented software architectures," Softw. Pract. Exp., vol. 39, no. 12, pp. 1073–1094, 2009, doi: 10.1002/SPE.931.
[76] P. Weisenburger, M. Köhler, and G. Salvaneschi, "Distributed system development with ScalaLoci," in Proc. ACM Program. Lang., vol. 2, no. OOPSLA, pp. 129:1–129:30, 2018, doi: 10.1145/3276499.
[77] G. Zakhour, P. Weisenburger, and G. Salvaneschi, "Type-safe dynamic placement with first-class placed values," in Proc. ACM Program. Lang., vol. 7, no. OOPSLA2, Oct. 2023, doi: 10.1145/3622873.
[78] B. Cook, "Formal reasoning about the security of Amazon Web Services," in Proc. Comput. Aided Verification 30th Int. Conf. (CAV), Oxford, UK, H. Chockler and G. Weissenbacher, Eds., vol. 10981, Cham, Switzerland: Springer-Verlag, 2018, pp. 38–47, doi: 10.1007/978-3-319-96145-3_3.
[79] J. Backes et al., "One-click formal methods," IEEE Softw., vol. 36, no. 6, pp. 61–65, Nov./Dec. 2019, doi: 10.1109/MS.2019.2930609.
[80] M. Bouchet et al., "Block public access: Trust safety verification of access control policies," in Proc. 28th ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng. (ESEC/FSE '20), Virtual Event, USA, P. Devanbu, M. B. Cohen, and T. Zimmermann, Eds., New York, NY, USA: ACM, 2020, pp. 281–291, doi: 10.1145/3368089.3409728.
[81] J. Backes et al., "Reachability analysis for AWS-based networks," in Proc. Comput. Aided Verification 31st Int. Conf. (CAV), New York City, NY, USA, I. Dillig and S. Tasiran, Eds., vol. 11562, Cham, Switzerland: Springer-Verlag, 2019, pp. 231–241, doi: 10.1007/978-3-030-25543-5_14.
[82] J. Backes et al., "Semantic-based automated reasoning for AWS access policies using SMT," in Proc. Formal Methods Comput. Aided Des. (FMCAD), Austin, TX, USA, N. S. Bjørner and A. Gurfinkel, Eds., Piscataway, NJ, USA: IEEE Press, 2018, pp. 1–9, doi: 10.23919/FMCAD.2018.8602994.
[83] D. Jackson, "Alloy: A lightweight object modelling notation," ACM Trans. Softw. Eng. Methodol., vol. 11, no. 2, pp. 256–290, 2002, doi: 10.1145/505145.505149.
[84] E. Ahrens, M. Bozga, R. Iosif, and J. Katoen, "Reasoning about distributed reconfigurable systems," in Proc. ACM Program. Lang., vol. 6, no. OOPSLA2, pp. 145–174, 2022, doi: 10.1145/3563293.
[85] A. Evangelidis, D. Parker, and R. Bahsoon, "Performance modelling and verification of cloud-based auto-scaling policies," Future Gener. Comput. Syst., vol. 87, pp. 629–638, Oct. 2018, doi: 10.1016/j.future.2017.12.047.
[86] A. Abu Jabal et al., "Methods and tools for policy analysis," ACM Comput. Surv., vol. 51, no. 6, pp. 121:1–121:35, 2019, doi: 10.1145/3295749.
[87] G. Zakhour, P. Weisenburger, and G. Salvaneschi, "Type-checking CRDT convergence," in Proc. ACM Program. Lang., vol. 7, no. PLDI, 2023, pp. 1365–1388, doi: 10.1145/3591276.
[88] D. Spadini, M. F. Aniche, M. Bruntink, and A. Bacchelli, "To mock or not to mock?: An empirical study on mocking practices," in Proc. 14th Int. Conf. Mining Softw. Repositories (MSR), Buenos Aires, Argentina, J. M. González-Barahona, A. Hindle, and L. Tan, Eds., Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2017, pp. 402–412, doi: 10.1109/MSR.2017.61.
[89] K. Taneja, Y. Zhang, and T. Xie, "MODA: Automated test generation for database applications via mock objects," in Proc. 25th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Antwerp, Belgium, C. Pecheur, J. Andrews, and E. D. Nitto, Eds., New York, NY, USA: ACM, 2010, pp. 289–292, doi: 10.1145/1858996.1859053.
[90] F. Solms and L. Marshall, "Contract-based mocking for services-oriented development," in Proc. Annu. Conf. South Afr. Inst. Comput. Sci. Inf. Technol. (SAICSIT), Johannesburg, South Africa, F. F. Blauw, M. Coetzee, D. A. Coulter, E. M. Ehlers, W. S. Leung, C. Marnewick, and D. van der Haar, Eds., New York, NY, USA: ACM, 2016, pp. 40:1–40:8, doi: 10.1145/2987491.2987534.
[91] D. Saff and M. D. Ernst, "Mock object creation for test factoring," in Proc. ACM SIGPLAN-SIGSOFT Workshop Program Anal. Softw. Tools Eng. (PASTE'04), Washington, DC, USA, C. Flanagan and A. Zeller, Eds., New York, NY, USA: ACM, 2004, pp. 49–51, doi: 10.1145/996821.996838.
[92] S. Joshi and A. Orso, "SCARPE: A technique and tool for selective capture and replay of program executions," in Proc. 23rd IEEE Int. Conf. Softw. Maintenance (ICSM 2007), Paris, France. Los Alamitos, CA, USA: IEEE Comput. Soc., 2007, pp. 234–243, doi: 10.1109/ICSM.2007.4362636.
[93] M. Fazzini, A. Gorla, and A. Orso, "A framework for automated test mocking of mobile apps," in Proc. 35th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Melbourne, Australia. Piscataway, NJ, USA: IEEE Press, 2020, pp. 1204–1208, doi: 10.1145/3324884.3418927.
[94] H. Zhu et al., "StubCoder: Automated generation and repair of stub code for mock objects," ACM Trans. Softw. Eng. Methodol., vol. 33, no. 1, pp. 1–31, Aug. 2023, doi: 10.1145/3617171.
[95] V. Vikram, R. Padhye, and K. Sen, "Growing a test corpus with bonsai fuzzing," in Proc. 43rd IEEE/ACM Int. Conf. Softw. Eng. (ICSE 2021), Madrid, Spain. Piscataway, NJ, USA: IEEE Press, 2021, pp. 723–735, doi: 10.1109/ICSE43902.2021.00072.
[96] D. Steinhöfel and A. Zeller, "Input invariants," in Proc. 30th ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng. (ESEC/FSE), Singapore, Singapore, A. Roychoudhury, C. Cadar, and M. Kim, Eds., New York, NY, USA: ACM, 2022, pp. 583–594, doi: 10.1145/3540250.3549139.
[97] L. Lampropoulos, D. Gallois-Wong, C. Hritcu, J. Hughes, B. C. Pierce, and L. Xia, "Beginner's Luck: A language for property-based generators," in Proc. 44th ACM SIGPLAN Symp. Princ. Program. Lang. (POPL 2017), Paris, France, G. Castagna and A. D. Gordon, Eds., New York, NY, USA: ACM, 2017, pp. 114–129, doi: 10.1145/3009837.3009868.
[98] L. Lampropoulos, M. Hicks, and B. C. Pierce, "Coverage guided, property-based testing," in Proc. ACM Program. Lang., vol. 3, no. OOPSLA, pp. 181:1–181:29, 2019, doi: 10.1145/3360607.
[99] A. Löscher and K. Sagonas, "Automating targeted property-based testing," in Proc. 11th IEEE Int. Conf. Softw. Testing, Verification Validation (ICST), Västerås, Sweden. Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 2018, pp. 70–80, doi: 10.1109/ICST.2018.00017.
[100] R. Kuhn, D. R. Wallace, and A. M. Gallo, "Software fault interactions and implications for software testing," IEEE Trans. Softw. Eng., vol. 30, no. 6, pp. 418–421, 2004, doi: 10.1109/TSE.2004.24.
[101] R. Kuhn, Y. Lei, and R. Kacker, "Practical combinatorial testing: Beyond pairwise," IT Prof., vol. 10, no. 3, pp. 19–23, May/Jun. 2008, doi: 10.1109/MITP.2008.54.
[102] Z. Paraskevopoulou, C. Hritcu, M. Dénes, L. Lampropoulos, and B. C. Pierce, "Foundational property-based testing," in Interactive Theorem Proving - 6th Int. Conf. (ITP), Nanjing, China, C. Urban and X. Zhang, Eds., vol. 9236, Cham, Switzerland: Springer-Verlag, 2015, pp. 325–343, doi: 10.1007/978-3-319-22102-1_22.
[103] L. Lampropoulos, Z. Paraskevopoulou, and B. C. Pierce, "Generating good generators for inductive relations," in Proc. ACM Program. Lang., vol. 2, no. POPL, pp. 45:1–45:30, 2018, doi: 10.1145/3158133.
[104] E. De Angelis, F. Fioravanti, A. Palacios, A. Pettorossi, and M. Proietti, "Property-based test case generators for free," in Proc. Tests Proofs - 13th Int. Conf., Porto, Portugal, D. Beyer and C. Keller, Eds., vol. 11823, Cham, Switzerland: Springer-Verlag, 2019, pp. 186–206, doi: 10.1007/978-3-030-31157-5_12.