Large Language Model Powered Test Case Generation for Software Applications
September 2023
Recommended Citation
Dantas, Victor, "Large Language Model Powered Test Case Generation for Software Applications",
Technical Disclosure Commons, (September 26, 2023)
https://www.tdcommons.org/dpubs_series/6279
Large Language Model Powered Test Case Generation for Software Applications
ABSTRACT
Test cases for software are designed to provide code coverage as well as to test the code
against real user behavior. Building such test cases can be challenging. Current tools to build
test cases rely on macros (scripts) to simulate user behavior. Building such macros requires
manually programming test cases and/or randomizing user actions during the test. This
disclosure describes the use of generative artificial intelligence (AI) techniques to learn the
patterns of user behavior on a website, app, or other software to be tested. For example, a large
language model (LLM) can be utilized to learn the patterns. The LLM can be prompted to
automatically generate test cases during the automated testing phase of the software
development life cycle for the software that is to be tested. The automatically generated test
cases can encapsulate specific user personas based on the set of actions in the LLM response.
KEYWORDS
● Code testing
● Software development
● Prompt engineering
● User persona
● User behavior
BACKGROUND
The software development lifecycle includes testing code using multiple test cases. Test
cases are designed to provide code coverage as well as to test the code against real user
behavior. Building such test cases can be challenging. Current tools to build test cases rely on
macros (scripts) to simulate user behavior. Building such macros requires manually
programming test cases and/or randomizing user actions during the test.
DESCRIPTION
This disclosure describes the use of generative artificial intelligence (AI) techniques to
learn the patterns of user behavior on a website, app, or other software to be tested. For
example, a large language model (LLM) can be utilized to learn the patterns. The LLM can be
prompted to automatically generate test cases during the automated testing phase of the
software development life cycle for the software that is to be tested. The automatically
generated test cases can encapsulate specific user personas based on the set of actions (e.g.,
carried out by different real users, obtained for training purposes with specific user permission).
The set of user actions corresponding to each test case is mapped to a set of application
programming interface (API) calls. The API calls can be represented in text form and can
therefore be learned by a language model. This is different from the use of macros or scripts,
since the user behavior is not explicitly programmed into the test case generator. Also, the user
actions are not randomly generated; rather, they are based on the learned patterns of user
behavior. Prompt engineering and other suitable fine-tuning techniques can be applied
as necessary to guide the model into using API calls that simulate real user behavior.
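To illustrate this mapping, the following sketch (in Python, with hypothetical action names and endpoint paths) renders recorded user actions as API calls in plain text form suitable for consumption by a language model.

from dataclasses import dataclass

@dataclass
class UserAction:
    kind: str    # e.g., "search", "view_item", "add_to_cart"
    detail: str  # e.g., a search query or an item identifier

# Hypothetical mapping from action kinds to API call templates.
ACTION_TO_API = {
    "search": "GET /search?q={detail}",
    "view_item": "GET /products/{detail}",
    "add_to_cart": 'POST /cart {{"item_id": "{detail}"}}',
    "checkout": "POST /checkout",
}

def to_api_text(action: UserAction) -> str:
    """Render one recorded user action as an API call in plain text."""
    return ACTION_TO_API[action.kind].format(detail=action.detail)

journey = [
    UserAction("search", "shirts"),
    UserAction("view_item", "shirt-123"),
    UserAction("add_to_cart", "shirt-123"),
    UserAction("checkout", ""),
]
print("\n".join(to_api_text(a) for a in journey))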
Fig. 1 illustrates LLM-powered simulated user testing for software, e.g., websites/web
applications, mobile applications, or any other software. Log files (102) with multiple user
journeys captured with specific user permission from a large number of users are obtained. The
log files are provided as input to a large language model (104). The LLM learns different
patterns of user behavior and can generate learned user personas (106) based on the input set of
log files. The LLM can be trained on application log files or any other suitable input where real
user behavior is captured.
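As a sketch of the corresponding data preparation step, per-user journeys might be reconstructed from raw logs before serialization for the model; the tab-separated log format below is a hypothetical example.

from collections import defaultdict

def journeys_from_logs(log_lines):
    """Group 'user_id<TAB>api_call' log lines into one journey per user."""
    journeys = defaultdict(list)
    for line in log_lines:
        user_id, api_call = line.rstrip("\n").split("\t", 1)
        journeys[user_id].append(api_call)
    return journeys

logs = [
    "u1\tGET /search?q=shirts",
    "u1\tPOST /cart",
    "u2\tGET /search?q=pants",
]
for user, calls in journeys_from_logs(logs).items():
    print(user, "->", " ; ".join(calls))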
When testing the application, a test case generator (108) sends prompts to the LLM
requesting generation of test cases. An example prompt is: “Generate 100 test cases, each
consisting of a search query on a clothing retail website and a set of actions corresponding to API
calls that a user would make while browsing the website and making transactions.”
The prompts can be generated based on a list of supported APIs (112) for the
application backend (110) that is to be tested. For example, a website or web application may
support a set of GET APIs and a set of POST APIs. For instance, the available GET API calls for a
retail website may include searching the catalog and retrieving product details, while the
available POST calls may include adding an item to the cart and checking out.
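Such a prompt can be assembled programmatically from the supported API list. The sketch below assumes hypothetical endpoints and asks for a JSON output format so that responses are easy to parse.

SUPPORTED_APIS = [
    "GET /search?q=<query>",
    "GET /products/<id>",
    "POST /cart",
    "POST /checkout",
]

def build_prompt(n_cases: int) -> str:
    """Build a test-generation prompt constrained to the supported APIs."""
    api_list = "\n".join(f"- {api}" for api in SUPPORTED_APIS)
    return (
        f"Generate {n_cases} test cases for a clothing retail website.\n"
        "Each test case is a JSON list of API calls, with fields\n"
        '"method", "path", and "body", that a real user might make while\n'
        "browsing and making transactions. Use only these APIs:\n"
        f"{api_list}"
    )

# `call_llm` is a hypothetical stand-in for whichever LLM API is used:
# llm_response = call_llm(build_prompt(100))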
The LLM generates test cases and sends them to the test case generator. For example, a
test case may include the following sequence of API calls that mimics a user browsing the
website for shirts and pants, adding items to their cart, and performing a purchase action.
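A plausible instance of such a sequence, using the hypothetical endpoints and JSON format assumed in the sketches here, is:

[
  {"method": "GET",  "path": "/search?q=shirts",    "body": null},
  {"method": "GET",  "path": "/products/shirt-123", "body": null},
  {"method": "POST", "path": "/cart",               "body": {"item_id": "shirt-123"}},
  {"method": "GET",  "path": "/search?q=pants",     "body": null},
  {"method": "POST", "path": "/cart",               "body": {"item_id": "pant-456"}},
  {"method": "POST", "path": "/checkout",           "body": null}
]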
The test case generator can parse the response provided by the LLM and execute the test
case. Additional test cases can be generated via additional prompts. Prompts can be selected to
provide appropriate test coverage and/or simulate user behavior that tests the application
thoroughly.
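As a final sketch, assuming the JSON response format above and a hypothetical test-environment URL, the test case generator might parse and replay each call as follows.

import json
import requests  # third-party HTTP client

BASE_URL = "https://test.example.com"  # hypothetical test deployment

def run_test_case(llm_response: str) -> None:
    """Execute one test case returned by the LLM as a JSON list of calls."""
    for call in json.loads(llm_response):
        resp = requests.request(
            call["method"],
            BASE_URL + call["path"],
            json=call.get("body"),
            timeout=10,
        )
        # A simple health check; a real harness would assert on behavior too.
        assert resp.status_code < 500, f"{call} failed: {resp.status_code}"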
CONCLUSION
This disclosure describes the use of generative artificial intelligence (AI) techniques to
learn the patterns of user behavior on a website, app, or other software to be tested. For
example, a large language model (LLM) can be utilized to learn the patterns. The LLM can be
prompted to automatically generate test cases during the automated testing phase of the
software development life cycle for the software that is to be tested. The automatically
generated test cases can encapsulate specific user personas based on the set of actions in the
LLM response.
REFERENCES
1. Kang, Sungmin, Juyeon Yoon, and Shin Yoo. “Large language models are few-shot
testers: Exploring LLM-based general bug reproduction.” In Proceedings of the 45th
IEEE/ACM International Conference on Software Engineering (ICSE), 2023.
2. “Locust: an open source load testing tool,” available online at https://locust.io/, accessed
September 5, 2023.