
2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Software Engineering Using Autonomous Agents: Are We There Yet?

Samdyuti Suri, Sankar Narayan Das, Kapil Singi, Kuntal Dey, Vibhu Saujanya Sharma, Vikrant Kaulgud
Accenture Tech Labs, Bangalore, India
{samdyuti.suri, sankar.naryan.das, kapil.singi, kuntal.dey, vibhu.sharma, vikrant.kaulgud}@accenture.com

979-8-3503-2996-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/ASE56229.2023.00174

Abstract—Autonomous agents equipped with Large Language Models (LLMs) are rapidly gaining prominence as a revolutionary technology in Software Engineering. These intelligent, autonomous systems can perform tasks and make independent decisions by drawing on their intrinsic reasoning and decision-making abilities.
This paper examines the current state of autonomous agents: their capabilities, challenges, and opportunities in Software Engineering practice. By comparing prompts with and without context, we show the advantages of context-rich prompts for autonomous agents. Prompts with context improve the agent's understanding of user requirements and avoid irrelevant details that could hinder task comprehension and degrade model performance, particularly when dealing with complex frameworks such as Spring Boot, Django, Flask, etc.
This exploration is conducted using Auto-GPT (v0.3.0), an open-source application powered by GPT-3.5 and GPT-4 that connects the “thoughts” of Large Language Models (LLMs) to independently accomplish the assigned goals or tasks.

Index Terms—Autonomous agents, Large Language Models (LLMs), SDLC.

I. INTRODUCTION

The Software Development Life Cycle (SDLC) provides a structured framework for managing software projects, with the aim of delivering high-quality software rapidly and within budget. Agile methods, DevOps, Test Automation, Cloud Computing, Continuous Integration/Continuous Deployment (CI/CD) and Artificial Intelligence (AI) enhance the SDLC, improving collaboration, time to market, and software quality aligned with customer needs.

Generative AI (GenAI) has grown significantly in recent years, with extensive research and development across various domains. Different versions of the popular generative model, the Generative Pre-trained Transformer (GPT) [1], generate realistic content by leveraging patterns and knowledge learned from existing data, without explicit human programming. GPT models can generate both natural language text and code. Among GPT-based applications, ChatGPT [2], Auto-GPT [3] and GitHub Copilot [4] serve distinct purposes. ChatGPT relies on human-provided prompts for task completion and conversational engagement. Auto-GPT operates autonomously, addressing parts of a problem independently by generating its own prompts. It functions as a comprehensive personal assistant, with capabilities spanning code writing, debugging, test case generation, script execution, and self-improvement. Auto-GPT marks a significant step forward for autonomous agents, demonstrating the potential of AI-driven decision-making and task execution. GitHub Copilot, by contrast, is engineered for code generation and developer assistance, providing code snippets and explanations that take the surrounding context into account.

Both GitHub Copilot and Auto-GPT build on GPT models to support developers in application development, but their capabilities differ. GitHub Copilot is designed to assist developers by suggesting code snippets that fit the existing codebase and the developer's comments, whereas autonomous agents like Auto-GPT are software frameworks that can make decisions and take actions in the real world on their own. Comparing the quality of the code generated by the two tools is outside the scope of this paper.

II. SOFTWARE ENGINEERING USING AUTONOMOUS AGENTS

Autonomous Agent Workflow: Figure 1 illustrates the structure of an Autonomous Agent Workflow. The Task Agent captures the prompts provided by the User, creates tasks, and adds them to the Task List Stack. The Prioritization Agent assigns a priority to each task and sends the tasks back to the Task List Stack. The Task Agent then selects an incomplete task and assigns it to the Execution Agent, which retrieves context from memory and executes the task. Intermediate results are stored in memory until all tasks are completed, and the final result is delivered to the User. Let us now apply this workflow to Software Engineering.

Fig. 1. Basic Structure of Autonomous GPT

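To make the Autonomous Agent Workflow of Fig. 1 more concrete, the loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Auto-GPT's implementation: the class names (TaskAgent, PrioritizationAgent, ExecutionAgent, Memory) are hypothetical labels for the roles in the figure, the LLM calls are replaced by stub methods, splitting the prompt on ";" stands in for task creation, and the prioritization policy (keep creation order) is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    priority: int = 0
    done: bool = False


@dataclass
class Memory:
    # Intermediate results, kept in execution order until all tasks finish.
    results: list[str] = field(default_factory=list)


class TaskAgent:
    """Hypothetical: captures the prompt, creates tasks, picks the next incomplete one."""

    def create_tasks(self, prompt: str) -> list[Task]:
        # Assumed stand-in for an LLM call: one task per ';'-separated clause.
        return [Task(part.strip()) for part in prompt.split(";") if part.strip()]

    def next_incomplete(self, stack: list[Task]) -> Task | None:
        return next((task for task in stack if not task.done), None)


class PrioritizationAgent:
    """Hypothetical: assigns priorities and returns the reordered Task List Stack."""

    def prioritize(self, stack: list[Task]) -> list[Task]:
        # Assumed policy: priority simply follows creation order.
        for rank, task in enumerate(stack):
            task.priority = rank
        return sorted(stack, key=lambda task: task.priority)


class ExecutionAgent:
    """Hypothetical: executes one task using context retrieved from memory."""

    def execute(self, task: Task, context: list[str]) -> str:
        # Stub for the LLM/tool call (e.g. writing a file or running tests).
        return f"done: {task.description} (context items: {len(context)})"


def run_workflow(prompt: str) -> list[str]:
    task_agent = TaskAgent()
    prioritizer = PrioritizationAgent()
    executor = ExecutionAgent()
    memory = Memory()

    # User prompt -> Task Agent -> Task List Stack -> Prioritization Agent.
    stack = prioritizer.prioritize(task_agent.create_tasks(prompt))

    # The Task Agent hands each incomplete task to the Execution Agent.
    while (task := task_agent.next_incomplete(stack)) is not None:
        context = list(memory.results)  # context retrieved from memory
        memory.results.append(executor.execute(task, context))
        task.done = True  # marked complete without any validation

    # Final result delivered to the User once every task is complete.
    return memory.results


if __name__ == "__main__":
    for line in run_workflow("Create index.html; Create Calculate.java; Generate CalcTest.java"):
        print(line)
    # prints:
    # done: Create index.html (context items: 0)
    # done: Create Calculate.java (context items: 1)
    # done: Generate CalcTest.java (context items: 2)
```

In a real agent the three stubs would be model and tool calls. Note that, as in the workflow described here, nothing in the loop checks whether a task actually succeeded before it is marked done; this is the gap later listed as the absence of a task completion assessment mechanism.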
Autonomous Orchestration in SDLC: Figure 2 shows how Autonomous Agents can work together to complete the end-to-end cycle in Software Engineering. GenAI agents may be assigned the tasks of each role, such as Business Analyst, Software Designer, Software Developer and Quality Analyst. Through orchestration, the tasks of all roles (shown in the dotted upper layer) can be accomplished in a unified approach (shown in the solid lower layer) by a GenAI agent like Auto-GPT.

Fig. 2. Autonomous Orchestration in SDLC

III. EXPERIMENTS AND REVIEW

Let us investigate Auto-GPT's ability to autonomously perform different phases of the SDLC, i.e., the code generation process, including generating code and test cases, testing the developed code, and checking it into the GitHub repository. Consider a scenario where a user provides Web Application requirements to Auto-GPT, which recommends Spring Boot as the preferred framework. Auto-GPT fetches the framework and its dependencies from the web and develops the Front-end and Back-end using HTML, CSS, and Java. Code generation is followed by the generation and execution of test cases, including edge cases. If any test fails, Auto-GPT rectifies the code and re-runs the test cases. Finally, the completed code is checked into the GitHub repository. For this experiment, we consider two kinds of prompts: High-Level Prompts (Without Context) and Prompts with Context.

A. High-Level Prompts:
Example: “Create html and css files to create a calculator application and handle the backend with controller class and methods. Generate test cases and execute them. Then clone the GitHub repository, copy the files and push the changes.”
Running this prompt produced files that were not saved in the correct target paths and had inconsistent variable and method names. Hence, the application failed to compile.

B. Prompts with Context:
Example: “Create index.html to take the input numbers num1 and num2, for a calculator application in the folder myproject/src/main/resources/template and style.css in the folder myproject/src/main/resources/static. Then create GetSetCalc.java to get the getters and setters of num1 and num2 and the operator; Calculate class with the name Calculate.java with all the calculator functions like add, subtract, multiply and divide to calculate num1 and num2; Controller class with the name Controller.java in the folder myproject/src/main/java/com/example/myproject to control the logic and flow of the application. Then generate Test cases in the file CalcTest.java in the path myproject/src/test/java/com/example/myproject including all the edge cases and run the test cases. The next action is to clone the GitHub repository, copy all the files and then commit and push the changes.”
Here, the prompt supplies context such as file names (index.html, style.css, GetSetCalc.java, Calculate.java, Controller.java and CalcTest.java), target paths (myproject/src/main/resources/template, myproject/src/main/resources/static, etc.), methods (add, subtract, multiply, divide) and variable names (num1, num2 and operator). When this prompt was executed, the files were saved in the designated target paths, and variable and method names stayed consistent across the Frontend and Backend components, which ultimately allowed the application to run successfully.

During our initial exploration with Auto-GPT, we employed a Single or High-Level Prompt that was concise but lacked detailed context. This led to conflicts and ambiguity in the framework, so we realized that a comprehensive evaluation required more diverse inputs. While High-Level prompts were effective for generating and testing simple code, they fell short for complex code, where the autonomous agents needed more context to achieve the desired results. This limitation emphasizes the significance of providing comprehensive and precise context to Auto-GPT: when we supplied accurate target paths, required methods, and consistent variable names, conflicts and ambiguities were avoided, resulting in a successful and functional web application.

We encountered another limitation related to context length, i.e., the maximum number of tokens the model can accommodate in a single prompt. Complex code generation requires specific context, but if the input prompt becomes excessively long, the autonomous agents may take too long to process it or disregard the additional content, and thus fail to produce the desired output.

To conclude, prompts with context proved highly beneficial for autonomous agents in various ways. Firstly, they significantly
improved the understanding of user requirements, as the context provided specific and relevant information. Avoiding irrelevant or insufficient information was crucial to prevent distractions that could degrade the model's performance and hinder task comprehension [5]. Secondly, the context ensured consistency in code development and test case generation by aligning both with the provided information, thus minimizing contradictions. Additionally, the contextual information helped generate accurate code by reducing ambiguity when dealing with files that depend on each other through methods, resources, and other factors.

Challenges:
While Autonomous Agents represent the future of Software Engineering, several challenges require attention. For Auto-GPT, the following challenges need to be addressed:
A. Task or Goal Skipping: Auto-GPT may skip a goal when an error arises while executing a particular task or goal within a stack of multiple goals. Furthermore, when multiple tasks are presented simultaneously, it may become confused and inadvertently skip some of them, or overlook them and fail to address them effectively.
B. Hallucinations: When executing tasks from High-Level prompts, Auto-GPT tends to invoke classes, methods, or variables that were not requested, or to generate files in unintended target paths. This hallucinatory tendency needs to be addressed comprehensively to improve the reliability and accuracy of the system's output.
C. The Complexity of Tasks: Auto-GPT exhibits limitations in handling complex tasks. Complex operations require precise contextual information; without it, execution fails. Furthermore, specifying the correct target path is crucial for intricate frameworks, since otherwise files are generated in incorrect locations.
D. Gets Stuck in Repetitive or Circular Responses: Even with unambiguous and well-formulated prompts, Auto-GPT may still get stuck in loops. This is primarily attributed to its limited contextual understanding, which prevents it from generating diverse and contextually appropriate responses.
E. Absence of a Task Completion Assessment Mechanism: Auto-GPT marks a task as complete after saving its results, without subsequently validating that the task was actually completed.

IV. FUTURE WORK
Future work for autonomous agents involves addressing these challenges to enhance their capabilities. We aim to refine complex code and test case generation by providing better context, avoiding errors and improving accuracy. To further enhance the performance of autonomous agents, we need to address task or goal skipping, hallucinations, complex tasks, and repetitive or circular responses. We also aim to investigate parallelism in the SDLC using multiple distributed autonomous agents, such as MetaGPT [6], that accomplish goals or tasks collaboratively, which can improve efficiency, speed up outcomes and enhance performance.
Additionally, a robust risk management framework is imperative to safeguard the integrity of autonomous agents and their systems. The capabilities of these agents can inadvertently expose sensitive information, leading to data leaks and privacy infringements; agents may also download inappropriate files and fall prey to cyber-attacks. A comprehensive risk management approach lets us tackle these challenges proactively, so that the benefits of autonomous agents can be realized while preserving the security and integrity of essential systems.

V. CONCLUSION
Autonomous agents have gained significant traction due to their remarkable adaptability and capability to learn from user input, revolutionizing the work processes of the SDLC. We have showcased the potential of autonomous agents, particularly Auto-GPT, in requirement gathering, code and test case generation, and pushing the framework to the repository, demonstrating their decision-making power. Our experiments have also underscored the importance of context-specific prompts for obtaining precise results with Auto-GPT in complex frameworks such as Spring Boot, Django, Flask, etc. While it excels at simpler tasks, Auto-GPT was observed to struggle with ambiguity and complexity, highlighting the need for precise context.
Since autonomous agents are still developing, further research and enhancements are needed before they can reshape Software Engineering practices. By addressing challenges including task skipping, hallucinations, task complexity, circular responses, and task completion assessment, along with improving code and test case quality, setting up strong risk management, and exploring multi-agent collaboration for parallel execution, we envision a future where agents like Auto-GPT transform the SDLC and enhance software quality.

REFERENCES
[1] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” OpenAI, Tech. Rep. 1, 2018.
[2] OpenAI, ChatGPT, https://openai.com/blog/chatgpt, 2021.
[3] Significant-Gravitas, Auto-GPT, https://github.com/Significant-Gravitas/Auto-GPT, 2023.
[4] GitHub, OpenAI and Microsoft, GitHub Copilot documentation, https://docs.github.com/en/copilot, 2023.
[5] F. Shi, X. Chen, K. Misra, et al., “Large language models can be easily distracted by irrelevant context,” in International Conference on Machine Learning, PMLR, 2023, pp. 31210–31227.
[6] S. Hong, X. Zheng, J. Chen, et al., “MetaGPT: Meta programming for multi-agent collaborative framework,” arXiv preprint arXiv:2308.00352v2, 2023.
