
2024 IEEE/ACM 46th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)

Leveraging Large Language Models to Improve REST API Testing

Myeongsoo Kim, Georgia Institute of Technology, Atlanta, Georgia, USA ([email protected])
Tyler Stennett, Georgia Institute of Technology, Atlanta, Georgia, USA ([email protected])
Dhruv Shah, Georgia Institute of Technology, Atlanta, Georgia, USA ([email protected])
Saurabh Sinha, IBM Research, Yorktown Heights, New York, USA ([email protected])
Alessandro Orso, Georgia Institute of Technology, Atlanta, Georgia, USA ([email protected])

ABSTRACT

The widespread adoption of REST APIs, coupled with their growing complexity and size, has led to the need for automated REST API testing tools. Current tools focus on the structured data in REST API specifications but often neglect valuable insights available in unstructured natural-language descriptions in the specifications, which leads to suboptimal test coverage. Recently, to address this gap, researchers have developed techniques that extract rules from these human-readable descriptions and query knowledge bases to derive meaningful input values. However, these techniques are limited in the types of rules they can extract and prone to produce inaccurate results. This paper presents RESTGPT, an innovative approach that leverages the power and intrinsic context-awareness of Large Language Models (LLMs) to improve REST API testing. RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification. It then augments the original specification with these rules and values. Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation. Given these promising results, we outline future research directions for advancing REST API testing through LLMs.

CCS CONCEPTS

• Information systems → RESTful web services; • Software and its engineering → Software testing and debugging.

KEYWORDS

Large Language Models for Testing, OpenAPI Specification Analysis

ACM Reference Format:
Myeongsoo Kim, Tyler Stennett, Dhruv Shah, Saurabh Sinha, and Alessandro Orso. 2024. Leveraging Large Language Models to Improve REST API Testing. In New Ideas and Emerging Results (ICSE-NIER'24), April 14–20, 2024, Lisbon, Portugal. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3639476.3639769

This work is licensed under a Creative Commons Attribution International 4.0 License.
ICSE-NIER'24, April 14–20, 2024, Lisbon, Portugal
© 2024 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0500-7/24/04.
https://doi.org/10.1145/3639476.3639769

1 INTRODUCTION

In today's digital era, web applications and cloud-based systems have become ubiquitous, making REpresentational State Transfer (REST) Application Programming Interfaces (APIs) pivotal elements in software development [29]. REST APIs enable disparate systems to communicate and exchange data seamlessly, facilitating the integration of a wide range of services and functionalities [11]. As their intricacy and prevalence grow, effective testing of REST APIs has emerged as a significant challenge [12, 19, 40].

Automated REST API testing tools (e.g., [3, 5, 6, 9, 14–16, 18, 20, 22, 39]) primarily derive test cases from API specifications [2, 23, 25, 34]. Their struggle to achieve high code coverage [19] often stems from difficulties in comprehending the semantics and constraints present in parameter names and descriptions [1, 17, 19]. To address these issues, assistant tools have been developed. These tools leverage Natural Language Processing (NLP) to extract constraints from parameter descriptions [17] and query parameter names against databases [1], such as DBPedia [7]. However, attaining high accuracy remains a significant challenge for these tools. Moreover, they are limited in the types and complexity of rules they can extract.

This paper introduces RESTGPT, a new approach that harnesses Large Language Models (LLMs) to enhance REST API specifications by identifying constraints and generating relevant parameter values. Given an OpenAPI Specification [25], RESTGPT augments it by deriving constraints and example values. Existing approaches such as NLP2REST [17] require a validation process to improve precision, which involves not just the extraction of constraints but also executing requests against the APIs to dynamically check these constraints. Such a process demands significant engineering effort and a deployed service instance, making it cumbersome and time-consuming. In contrast, RESTGPT achieves higher accuracy without requiring expensive validation. Furthermore, unlike ARTE [1], RESTGPT excels in understanding the context of a parameter name based on an analysis of the parameter description, thus generating more contextually relevant values.

Our preliminary results demonstrate the significant advantage of our approach over existing tools. Compared to NLP2REST without the validation module, our method improves precision from 50% to 97%. Even when compared to NLP2REST equipped with its validation module, our approach still increases precision from 79% to 97%. Additionally, RESTGPT successfully generates both syntactically and semantically valid inputs for 73% of the parameters over the analyzed services and their operations, a considerable improvement over ARTE, which could generate valid inputs for only 17% of the parameters. Given these encouraging results, we outline a number of research directions for leveraging LLMs in other ways for further enhancing REST API testing.

/institutions:
  get:
    operationId: searchInstitutions
    produces:
      - application/json
    parameters:
      - name: filters
        in: query
        required: false
        type: string
        description: |
          The filter for the bank search.
          Examples:
          * Filter by State name
            `STNAME:"West Virginia"`
          * Filter for any one of multiple State names
            `STNAME:("West Virginia", "Delaware")`
      - name: sort_order
        in: query
        required: false
        type: string
        description: Indicator if ascending (ASC) or descending (DESC)
    responses:
      '200':
        description: successful operation
        schema:
          type: object

Figure 1: A part of FDIC Bank Data's OpenAPI specification.

2 BACKGROUND AND MOTIVATING EXAMPLE

2.1 REST APIs and OpenAPI Specification

REST APIs are interfaces built on the principles of Representational State Transfer (REST), a design paradigm for networked applications [11]. Designed for the web, REST APIs facilitate data exchange between clients and servers through predefined endpoints primarily using the HTTP protocol [30, 35]. Each client interaction can include headers and a payload, while the corresponding response typically contains headers, content, and an HTTP status code indicating the outcome.

OpenAPI Specification (OAS) [25] is arguably the industry standard for defining RESTful API interfaces. It offers the advantage of machine-readability, supporting automation processes, while also presenting information in a clear, human-readable format. Key features of OAS include the definition of endpoints, the associated HTTP methods, expected input parameters, and potential responses. As an example, Figure 1 shows a portion of the FDIC Bank Data's API specification. This part of the specification illustrates how one might query information about institutions. It also details an expected response, such as the 200 status code, which indicates a successfully processed scenario.
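As a concrete illustration of such an interaction, the snippet below issues a GET request to the searchInstitutions operation of Figure 1 using Python's requests library. This is only a sketch: the base URL is an assumption rather than something stated in the specification excerpt, and the parameter values come from the description in Figure 1.

import requests

# Assumed base URL for the FDIC Bank Data service; adjust to the real deployment.
BASE_URL = "https://banks.data.fdic.gov/api"

response = requests.get(
    f"{BASE_URL}/institutions",
    params={
        "filters": 'STNAME:"West Virginia"',  # example value from Figure 1
        "sort_order": "DESC",
    },
)

# The response carries headers, content, and a status code indicating the outcome.
print(response.status_code)
print(response.headers.get("Content-Type"))
print(response.json() if response.ok else response.text)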
2.2 REST API Testing and Assistant Tools

Automated REST API testing tools [5, 6, 9, 14–16, 18, 20, 22, 39] derive test cases from widely-accepted specifications, primarily OpenAPI [25]. However, these tools often struggle to achieve comprehensive coverage [19]. A significant reason for this is their inability to interpret human-readable parts of the specification [17, 19]. For parameters such as filters and sort_order shown in Figure 1, testing tools tend to generate random string values, which are often not valid inputs for such parameters.

In response to these challenges, assistant tools have been introduced to enhance the capabilities of these testing tools. For instance, ARTE [1] taps into DBPedia [7] to generate relevant parameter example values. Similarly, NLP2REST applies natural language processing to extract example values and constraints from descriptive text portions of the specifications [17].

2.3 Large Language Model

Large Language Models (LLMs) [13, 24, 36] represent a transformative leap in the domains of natural language processing (NLP) and Machine Learning. Characterized by their massive size, often containing billions of parameters, these models are trained on vast text corpora to generate, understand, and manipulate human-like text [28]. The architecture behind LLMs is primarily based on transformer designs [37]. Notable models based on this architecture include GPT (Generative Pre-trained Transformer) [27], designed mainly for text generation, and BERT (Bidirectional Encoder Representations from Transformers) [10], which excels in understanding context. These models capture intricate linguistic nuances and semantic contexts, making them adept at a wide range of tasks from text generation to answering questions.

2.4 Motivating Example

The OpenAPI specification for the Federal Deposit Insurance Corporation (FDIC) Bank Data's API, shown in Figure 1, serves to offer insights into banking data. Using this example, we highlight the challenges in parameter value generation faced by current REST API testing assistant tools and illustrate how RESTGPT addresses these challenges.

(1) Parameter filters: Although the description provides guidance on how the parameter should be used, ARTE's dependency on DBPedia results in no relevant value generation for filters. NLP2REST, with its keyword-driven extraction, identifies examples from the description, notably aided by the term "example". Consequently, patterns such as STNAME: "West Virginia" and STNAME: ("West Virginia", "Delaware") are accurately captured.

(2) Parameter sort_order: Here, both tools exhibit limitations. ARTE, while querying DBPedia, fetches unrelated values such as "List of colonial heads of Portuguese Timor", highlighting its contextual inadequacy. In the absence of identifiable keywords, NLP2REST fails to identify "ASC" or "DESC" as potential values.
In contrast to these tools, RESTGPT is much more effective: with a deeper semantic understanding, RESTGPT accurately discerned that the filters parameter was contextualized around state names tied to bank records, and generated test values such as STNAME: "California" and multi-state filters such as STNAME: ("California", "New York"). It also successfully identified the values "ASC" and "DESC" from the description of the sort_order parameter. This example illustrates RESTGPT's superior contextual understanding, which enables it to outperform the constrained or context-blind methodologies of existing tools.

Figure 2: Overview of our approach.

3 OUR APPROACH

3.1 Overview

Figure 2 illustrates the RESTGPT workflow, which starts by parsing the input OpenAPI specification. During this phase, both machine-readable and human-readable sections of each parameter are identified. The human-readable sections provide insight into four constraint types: operational constraints, parameter constraints, parameter type and format, and parameter examples [17].

The Rule Generator, using a set of crafted prompts, extracts these four types of rules. We selected GPT-3.5 Turbo as the LLM for this work, given its accuracy and efficiency, as highlighted in a recent report by OpenAI [24]. The inclusion of few-shot learning further refines the model's output. By providing the LLM with concise, contextually rich instructions and examples, the few-shot prompts ensure the generated outputs are both relevant and precise [8, 21]. Finally, RESTGPT combines the generated rules with the original specification to produce an enhanced specification.
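To make this workflow concrete, the following minimal Python sketch walks through the three stages (specification parsing, LLM-based rule extraction, and augmentation). It is an illustration rather than RESTGPT's actual implementation: the prompt text is abridged, the x-restgpt-rules field is a hypothetical place to store the raw model output, and any chat-completion client could stand in for the OpenAI one used here.

import yaml
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_rules(parameter: dict) -> str:
    """Ask the LLM for machine-interpretable rules for a single parameter."""
    prompt = (
        "Identify the parameter using its name and description, extract logical "
        "constraints and example values, and interpret the description in the "
        "least constraining way.\n\n"
        f"name: {parameter.get('name')}\n"
        f"description: {parameter.get('description', '')}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def enhance_specification(spec_path: str) -> dict:
    """Parse an OpenAPI spec, extract rules per parameter, and return the augmented spec."""
    with open(spec_path) as f:
        spec = yaml.safe_load(f)
    for path_item in spec.get("paths", {}).values():
        for operation in path_item.values():
            if not isinstance(operation, dict):
                continue  # skip path-level fields that are not operations
            for parameter in operation.get("parameters", []):
                # Keep the raw model output next to the parameter; a real pipeline
                # would parse it into OpenAPI keywords (enum, example, minimum, ...).
                parameter["x-restgpt-rules"] = extract_rules(parameter)
    return spec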

3.2 Rule Generator

To best instruct the model on rule interpretation and output formatting, our prompts are designed around four core components: guidelines, cases, grammar highlights, and output configurations.

Guidelines
1. Identify the parameter using its name and description.
2. Extract logical constraints from the parameter description, adhering strictly to the provided format.
3. Interpret the description in the least constraining way.

The provided guidelines serve as the foundational instructions for the model, framing its perspective and clarifying its primary objectives. Using the guidelines as a basis, RESTGPT can then proceed with more specific prompting.

Cases
Case 1: If the description is non-definitive about parameter requirements: Output "None".
...
Case 10: For complex relationships between parameters: Combine rules from the grammar.

The implementation of cases in model prompting plays a pivotal role in directing the model's behaviour, ensuring that it adheres to precise criteria as depicted in the example. Drawing inspiration from Chain-of-Thought prompting [38], we decompose rule extraction into specific, manageable pieces to mitigate ambiguity and, consequently, improve the model's processing abilities.

Grammar Highlights
Relational Operators: '<', '>', '<=', '>=', '==', '!='
Arithmetic Operators: '+', '-', '*', '/'
Dependency Operators: 'AllOrNone', 'ZeroOrOne', ...

The Grammar Highlights emphasize key operators and vocabulary that the model should recognize and employ during rule extraction. By providing the model with a fundamental context-specific language, RESTGPT identifies rules within text.

Output Configurations
Example Parameter Constraint: min [minimum], max [maximum], default [default]
Example Parameter Format: type [type], items [item type], format [format], collectionFormat [collectionFormat]

After guiding the model through the rule-extraction process via specific prompting, we lastly define output formatting to compile the model's findings into a simple structure for subsequent processing.

Additionally, the Rule Generator oversees the value-generation process, which is executed during the extraction of parameter example rules. Our artifact [31, 32] provides details of all the prompts and their corresponding results.
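To illustrate how these four components could be combined into a single few-shot prompt, consider the sketch below. The strings are abridged paraphrases of the boxes above, not RESTGPT's exact prompts (those are in the artifact [31, 32]), and build_prompt is a hypothetical helper introduced only for this example.

GUIDELINES = (
    "1. Identify the parameter using its name and description.\n"
    "2. Extract logical constraints from the parameter description, "
    "adhering strictly to the provided format.\n"
    "3. Interpret the description in the least constraining way."
)

CASES = (
    'Case 1: If the description is non-definitive about parameter requirements: Output "None".\n'
    "...\n"
    "Case 10: For complex relationships between parameters: Combine rules from the grammar."
)

GRAMMAR_HIGHLIGHTS = (
    "Relational operators: '<', '>', '<=', '>=', '==', '!='\n"
    "Arithmetic operators: '+', '-', '*', '/'\n"
    "Dependency operators: 'AllOrNone', 'ZeroOrOne', ..."
)

OUTPUT_CONFIGURATION = (
    "Constraint keywords: min [minimum], max [maximum], default [default]\n"
    "Format keywords: type [type], items [item type], format [format], "
    "collectionFormat [collectionFormat]"
)

def build_prompt(name: str, description: str) -> str:
    """Assemble the four prompt components around the parameter under analysis."""
    return "\n\n".join([
        GUIDELINES,
        CASES,
        GRAMMAR_HIGHLIGHTS,
        OUTPUT_CONFIGURATION,
        f"Parameter name: {name}",
        f"Parameter description: {description}",
    ])

# Example: the sort_order parameter from Figure 1.
print(build_prompt("sort_order", "Indicator if ascending (ASC) or descending (DESC)"))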
3.3 Specification Enhancement

The primary objective of RESTGPT is to improve the effectiveness of REST API testing tools. We accomplish this by producing enhanced OpenAPI specifications, augmented with rules derived from the human-readable natural-language descriptions in conjunction with the machine-readable OpenAPI keywords [33].

As illustrated in Figure 2, the Specification Parsing stage extracts the machine-readable and human-readable components from the API specification. After rules from the natural language inputs have been identified by the Rule Generator, the Specification Building phase begins. During this phase, the outputs from the model are processed and combined with the machine-readable components, ensuring that there is no conflict between restrictions. For example, the resulting specification must have the style attribute only if the data type is array or object. The final result is an enriched API specification that contains constraints, examples, and rules extracted from the human-readable descriptions.
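The following Python sketch shows what such a merge could look like for the sort_order parameter of Figure 1. The function name and the extracted-rule dictionary are illustrative assumptions, not RESTGPT's code; only the style-attribute conflict rule is taken directly from the text above.

def build_enhanced_parameter(machine_readable: dict, extracted: dict) -> dict:
    """Merge LLM-extracted rules into one parameter object, resolving conflicts."""
    enhanced = dict(machine_readable)
    for keyword, value in extracted.items():
        # Keywords already present in the original specification take precedence.
        enhanced.setdefault(keyword, value)
    # Conflict rule from Section 3.3: 'style' is kept only for array or object types.
    if enhanced.get("type") not in ("array", "object"):
        enhanced.pop("style", None)
    return enhanced

# The sort_order parameter from Figure 1 and rules RESTGPT could plausibly extract.
parameter = {"name": "sort_order", "in": "query", "required": False, "type": "string"}
extracted_rules = {"enum": ["ASC", "DESC"], "example": "ASC", "style": "form"}

print(build_enhanced_parameter(parameter, extracted_rules))
# -> {'name': 'sort_order', 'in': 'query', 'required': False, 'type': 'string',
#     'enum': ['ASC', 'DESC'], 'example': 'ASC'}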
Table 1: Effectiveness of NLP2REST and RESTGPT.
No. of Rules in NLP2REST Without Validation Process NLP2REST With Validation Process RESTGPT
REST Service Ground Truth TP FP FN Precision Recall F1 TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
FDIC 45 42 36 3 54% 93% 68% 42 25 3 63% 93% 75% 44 0 1 100% 98% 99%
Genome Nexus 81 79 3 2 96% 98% 97% 79 3 2 96% 98% 97% 75 0 6 100% 93% 96%
LanguageTool 20 20 12 0 63% 100% 77% 18 2 2 90% 90% 90% 18 0 3 100% 86% 92%
OCVN 17 15 2 2 88% 88% 88% 13 1 4 93% 76% 84% 15 2 1 88% 94% 91%
OhSome 14 13 66 1 16% 93% 28% 12 11 2 52% 80% 63% 12 3 2 80% 86% 83%
OMDb 2 2 0 0 100% 100% 100% 2 0 0 100% 100% 100% 2 0 0 100% 100% 100%
REST Countries 32 28 1 4 97% 88% 92% 28 0 4 100% 88% 93% 30 0 2 100% 94% 97%
Spotify 88 83 68 5 55% 94% 69% 82 28 6 75% 93% 83% 86 2 4 98% 96% 97%
YouTube 34 30 126 4 19% 88% 32% 28 9 6 76% 82% 79% 24 2 8 92% 75% 83%
Total 333 312 314 21 50% 94% 65% 304 79 29 79% 91% 85% 306 9 27 97% 92% 94%

Table 2: Accuracy of ARTE and RESTGPT.
Service Name     ARTE     RESTGPT
FDIC             25.35%   77.46%
Genome Nexus     9.21%    38.16%
Language-Tool    0%       82.98%
OCVN             33.73%   39.76%
OhSome           4.88%    87.80%
OMDb             36.00%   96.00%
REST-Countries   29.66%   92.41%
Spotify          14.79%   76.06%
Youtube          0%       65.33%
Average          16.93%   72.68%

4 PRELIMINARY RESULTS

4.1 Evaluation Methodology

We collected nine RESTful services from the NLP2REST study. The motivation behind this selection is the availability of a ground truth of extracted rules in the NLP2REST work [17]. Having this data, we could easily compare our work with NLP2REST.

To establish a comprehensive benchmark, we incorporated a comparison with ARTE as well. Our approach was guided by the ARTE paper, from which we extracted the necessary metrics for comparison. Adhering to ARTE's categorization of input values as Syntactically Valid and Semantically Valid [1], two of the authors meticulously verified the input values generated by RESTGPT and ARTE. Notably, we emulated ARTE's approach in scenarios where more than ten values were generated by randomly selecting ten from the pool for analysis.

4.2 Results and Discussion

Table 1 presents a comparison of the rule-extraction capabilities of NLP2REST and RESTGPT. RESTGPT excels in precision, recall, and the F1 score across a majority of the REST services. NLP2REST, while effective, hinges on a validation process that involves evaluating server responses to filter out unsuccessful rules. This methodology demands engineering effort, and its efficacy is constrained by the validator's performance.

In contrast, RESTGPT eliminates the need for such validation entirely with its high precision. Impressively, RESTGPT's precision of 97% surpasses even the precision of NLP2REST post-validation, which stands at 79%. This emphasizes that RESTGPT is able to deliver superior results without a validation stage. This result shows an LLM's superior ability in nuanced rule detection, unlike conventional NLP techniques that rely heavily on specific keywords.
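These figures follow the standard definitions of precision (TP / (TP + FP)), recall (TP / (TP + FN)), and F1 (their harmonic mean); the short script below, provided only as a sanity check, reproduces the totals row of Table 1.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision/recall/F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Totals row of Table 1: (TP, FP, FN) per tool.
totals = {
    "NLP2REST without validation": (312, 314, 21),
    "NLP2REST with validation": (304, 79, 29),
    "RESTGPT": (306, 9, 27),
}
for tool, (tp, fp, fn) in totals.items():
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"{tool}: precision={p:.0%}, recall={r:.0%}, F1={f1:.0%}")
# RESTGPT: precision=97%, recall=92%, F1=94%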
Furthermore, Table 2 presents data on the accuracy of ARTE and RESTGPT. The data paint a clear picture: RESTGPT consistently achieves higher accuracy than ARTE across all services. This can be attributed to the context-awareness capabilities of LLMs, as discussed in Section 2. For example, in the language-tool service, we found that, for the language parameter, ARTE generates values such as "Arabic", "Chinese", "English", and "Spanish". However, RESTGPT understands the context of the language parameter and generates language codes such as "en-US" and "de-DE".

5 FUTURE PLANS

Given our encouraging results on LLM-based rule extraction, we next outline several research directions that we plan to pursue in leveraging LLMs for improving REST API testing more broadly.

Model Improvement. There are two ways in which we plan to create improved models for supporting REST API testing. First, we will perform task-specific fine-tuning of LLMs using data from APIs-guru [4] and RapidAPI [26], which contain thousands of real-world API specifications. We will fine-tune RESTGPT with these datasets, which should enhance the model's capability to comprehend diverse API contexts and nuances. We believe that this dataset-driven refinement will help RESTGPT understand a broader spectrum of specifications and generate even more precise testing suggestions. Second, we will focus on creating lightweight models for supporting REST API testing, such that the models do not require expensive computational resources and can be deployed on commodity CPUs. To this end, we will explore approaches for trimming the model, focusing on retaining the essential neurons and layers crucial for our task.

Improving fault detection. RESTGPT is currently restricted to detecting faults that manifest as 500 server response codes. By leveraging LLMs, we intend to expand the types of bugs that can be detected, such as bugs related to CRUD semantic errors or discrepancies in producer-consumer relationships. By enhancing RESTGPT's fault-finding ability in this way, we aim to make automated REST API testing more effective and useful in practice.

LLM-based Testing Approach. We aim to develop a REST API testing tool that leverages server messages. Although server messages often contain valuable information, current testing tools fail to leverage this information [17]. For instance, if a server hint suggests crafting a specific valid request, RESTGPT, with its semantic understanding, could autonomously generate relevant tests. This would not only enhance the testing process but also ensure that potential loopholes that the server messages may indicate would not be overlooked.

ACKNOWLEDGMENTS

This work was partially supported by NSF, under grant CCF-0725202, DOE, under contract DE-FOA-0002460, and gifts from Facebook, Google, IBM Research, and Microsoft Research.

REFERENCES
[1] J. C. Alonso, A. Martin-Lopez, S. Segura, J. Garcia, and A. Ruiz-Cortes. 2023. ARTE: Automated Generation of Realistic Test Inputs for Web APIs. IEEE Transactions on Software Engineering 49, 01 (jan 2023), 348–363. https://doi.org/10.1109/TSE.2022.3150618
[2] API Blueprint. 2023. API Blueprint. https://apiblueprint.org/
[3] Apiary. 2023. Dredd. https://github.com/apiaryio/dredd
[4] APIs.guru. 2023. APIs-guru. https://apis.guru/
[5] Andrea Arcuri. 2019. RESTful API Automated Test Case Generation with EvoMaster. ACM Transactions on Software Engineering and Methodology (TOSEM) 28, 1, Article 3 (jan 2019), 37 pages. https://doi.org/10.1145/3293455
[6] Vaggelis Atlidakis, Patrice Godefroid, and Marina Polishchuk. 2019. RESTler: Stateful REST API Fuzzing. In Proceedings of the 41st International Conference on Software Engineering (Montreal, Quebec, Canada) (ICSE '19). IEEE Press, Piscataway, NJ, USA, 748–758. https://doi.org/10.1109/ICSE.2019.00083
[7] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. DBpedia - A crystallization point for the Web of Data. Journal of Web Semantics 7, 3 (2009), 154–165.
[8] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[9] Davide Corradini, Amedeo Zampieri, Michele Pasqua, Emanuele Viglianisi, Michael Dallago, and Mariano Ceccato. 2022. Automated black-box testing of nominal and error scenarios in RESTful APIs. Software Testing, Verification and Reliability 32 (01 2022). https://doi.org/10.1002/stvr.1808
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]
[11] Roy Thomas Fielding. 2000. Architectural Styles and the Design of Network-Based Software Architectures. Ph.D. Dissertation. University of California, Irvine.
[12] Amid Golmohammadi, Man Zhang, and Andrea Arcuri. 2023. Testing RESTful APIs: A Survey. ACM Trans. Softw. Eng. Methodol. 33, 1, Article 27 (nov 2023), 41 pages. https://doi.org/10.1145/3617175
[13] Google. 2023. Google Bard. https://bard.google.com/
[14] Zac Hatfield-Dodds and Dmitry Dygalo. 2022. Deriving Semantics-Aware Fuzzers from Web API Schemas. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings (Pittsburgh, Pennsylvania) (ICSE '22). Association for Computing Machinery, New York, NY, USA, 345–346. https://doi.org/10.1145/3510454.3528637
[15] Stefan Karlsson, Adnan Causevic, and Daniel Sundmark. 2020. QuickREST: Property-based Test Generation of OpenAPI-Described RESTful APIs. In 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST). IEEE Press, Piscataway, NJ, USA, 131–141. https://doi.org/10.1109/ICST46399.2020.00023
[16] Stefan Karlsson, Adnan Čaušević, and Daniel Sundmark. 2021. Automatic Property-based Testing of GraphQL APIs. In 2021 IEEE/ACM International Conference on Automation of Software Test (AST). IEEE Press, Piscataway, NJ, USA, 1–10. https://doi.org/10.1109/AST52587.2021.00009
[17] Myeongsoo Kim, Davide Corradini, Saurabh Sinha, Alessandro Orso, Michele Pasqua, Rachel Tzoref-Brill, and Mariano Ceccato. 2023. Enhancing REST API Testing with NLP Techniques. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 1232–1243. https://doi.org/10.1145/3597926.3598131
[18] Myeongsoo Kim, Saurabh Sinha, and Alessandro Orso. 2023. Adaptive REST API Testing with Reinforcement Learning. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE Press, Piscataway, NJ, USA, 446–458. https://doi.org/10.1109/ASE56229.2023.00218
[19] Myeongsoo Kim, Qi Xin, Saurabh Sinha, and Alessandro Orso. 2022. Automated Test Generation for REST APIs: No Time to Rest Yet. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (Virtual, South Korea) (ISSTA 2022). Association for Computing Machinery, New York, NY, USA, 289–301. https://doi.org/10.1145/3533767.3534401
[20] Kerry Kimbrough. 2023. Tcases. https://github.com/Cornutum/tcases
[21] Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2023. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. Comput. Surveys 55, 9 (2023), 1–35.
[22] Alberto Martin-Lopez, Sergio Segura, and Antonio Ruiz-Cortés. 2021. RESTest: Automated Black-Box Testing of RESTful Web APIs. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (Virtual, Denmark) (ISSTA 2021). Association for Computing Machinery, New York, NY, USA, 682–685. https://doi.org/10.1145/3460319.3469082
[23] MuleSoft, LLC, a Salesforce company. 2020. RAML. https://raml.org/
[24] OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
[25] OpenAPI. 2023. OpenAPI standard. https://www.openapis.org
[26] R Software Inc. 2023. RapidAPI. https://rapidapi.com/terms/
[27] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training.
[28] Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, and Ilya Sutskever. 2019. Better language models and their implications.
[29] Leonard Richardson, Mike Amundsen, and Sam Ruby. 2013. RESTful Web APIs: Services for a Changing World. O'Reilly Media, Inc., Sebastopol, CA, USA.
[30] Alex Rodriguez. 2008. RESTful web services: The basics. IBM developerWorks 33, 2008 (2008), 18.
[31] SE@GT. 2024. Experiment infrastructure, data, and results for RESTGPT (GitHub). https://github.com/selab-gatech/RESTGPT
[32] SE@GT. 2024. Experiment infrastructure, data, and results for RESTGPT (Zenodo). https://doi.org/10.5281/zenodo.10467805
[33] SmartBear Software. 2023. OpenAPI data model. https://swagger.io/docs/specification/data-models/keywords/
[34] SmartBear Software. 2023. Swagger. https://swagger.io/specification/v2/
[35] Stefan Tilkov. 2007. A brief introduction to REST.
[36] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971 [cs.CL]
[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc., Red Hook, NY, USA. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[38] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
[39] Huayao Wu, Lixin Xu, Xintao Niu, and Changhai Nie. 2022. Combinatorial Testing of RESTful APIs. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE '22). Association for Computing Machinery, New York, NY, USA, 426–437. https://doi.org/10.1145/3510003.3510151
[40] Man Zhang and Andrea Arcuri. 2023. Open Problems in Fuzzing RESTful APIs: A Comparison of Tools. ACM Trans. Softw. Eng. Methodol. 32, 6, Article 144 (sep 2023), 45 pages. https://doi.org/10.1145/3597205
