Software Engineering Using Autonomous Agents: Are We There Yet?
Abstract—Autonomous agents equipped with Large Language Models (LLMs) are rapidly gaining prominence as a revolutionary technology within the realm of Software Engineering. These intelligent and autonomous systems demonstrate the capacity to perform tasks and make independent decisions, leveraging their intrinsic reasoning and decision-making abilities.
This paper delves into the current state of autonomous agents, their capabilities, challenges, and opportunities in Software Engineering practices. By employing different prompts (with or without context), we demonstrate the advantages of context-rich prompts for autonomous agents. Prompts with context enhance user requirement understanding, avoiding irrelevant details that could hinder task comprehension and degrade model performance, particularly when dealing with complex frameworks such as Spring Boot, Django, Flask, etc.
This exploration is conducted using Auto-GPT (v0.3.0), an open-source application powered by GPT-3.5 and GPT-4 which intelligently connects the “thoughts” of Large Language Models (LLMs) to independently accomplish the assigned goals or tasks.

Index Terms—Autonomous agents, Large Language Models (LLMs), SDLC.

I. INTRODUCTION
The Software Development Life Cycle (SDLC) provides a structured framework for managing software projects. It ensures the rapid delivery of high-quality software within budget. Agile methods, DevOps, Test Automation, Cloud Computing, Continuous Integration/Continuous Deployment (CI/CD) and Artificial Intelligence (AI) enhance the SDLC, improving collaboration, time to market, and software quality aligned with customer needs.
Generative AI (GenAI) has witnessed significant growth in recent years with extensive research and development in various domains. Different versions of the popular generative model, Generative Pre-trained Transformer (GPT) [1], generate realistic content by leveraging learned patterns and knowledge from existing data, without explicit human programming. GPT demonstrates its prowess in generating natural language text and code. In the realm of GPT-based models, we observe distinct applications like ChatGPT [2], Auto-GPT [3] and GitHub Copilot [4]. ChatGPT relies on human-provided prompts for task completion and conversational engagement. Auto-GPT, operating autonomously, efficiently addresses subsets of problems independently by leveraging self-generated prompts. It functions as a comprehensive personal assistant, encompassing code writing, debugging, test case generation, script execution, and self-improvement capabilities. Auto-GPT marks a significant stride in autonomous agents, demonstrating the potential of AI-driven decision-making and task execution. GitHub Copilot is meticulously engineered to excel in code generation and provide developer assistance through the provision of code snippets, comprehensive explanations, and awareness of the context.
Both GitHub Copilot and Auto-GPT leverage the GPT framework to facilitate developer engagement in application development, but they have different capabilities. While GitHub Copilot is strategically designed to assist developers by suggesting contextually relevant code snippets in alignment with the codebase and the developer comments, autonomous agents like Auto-GPT are software frameworks acting as agents that are endowed with the capacity to make decisions and take actions in the real world on their own. The comparison of the quality of code generated by both is out of the scope of this paper.

II. SOFTWARE ENGINEERING USING AUTONOMOUS AGENTS
Autonomous Agent Workflow: Figure 1 illustrates the structure of an Autonomous Agent Workflow. The User provides prompts that are captured by the Task Agent. It creates tasks and adds them to the Task List Stack. The Prioritization Agent assigns priority to each task and sends them back to the Task List Stack. The Task Agent selects an incomplete task and assigns it to the Execution Agent, which retrieves context from memory and executes the task. Intermediate results are stored in memory until all tasks are completed. The final result is delivered to the User.
Let us integrate this workflow into the field of Software Engineering.

Fig. 1. Basic Structure of Autonomous GPT
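The Autonomous Agent Workflow of Section II (Task Agent, Prioritization Agent, Execution Agent, and memory) can be illustrated with a minimal Python sketch. It is not Auto-GPT's implementation: the class AgentWorkflow, the Task record, and the execute and prioritize callables are hypothetical names introduced here, and splitting the prompt into one task per line is an assumption standing in for LLM-driven task creation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    description: str
    priority: int = 0
    done: bool = False


@dataclass
class AgentWorkflow:
    # Hypothetical stand-in for the Execution Agent (e.g. an LLM call).
    # It receives the task text and the context retrieved from memory.
    execute: Callable[[str, List[str]], str]
    # Hypothetical stand-in for the Prioritization Agent (lower runs first).
    prioritize: Callable[[Task], int]
    tasks: List[Task] = field(default_factory=list)
    memory: List[str] = field(default_factory=list)

    def add_prompt(self, prompt: str) -> None:
        # Task Agent: capture the user prompt and create tasks from it.
        # Assumption: one task per non-empty line of the prompt.
        for line in prompt.splitlines():
            if line.strip():
                self.tasks.append(Task(line.strip()))

    def run(self) -> str:
        # Prioritization Agent: assign a priority to each task and
        # return the tasks to the task list in priority order.
        for task in self.tasks:
            task.priority = self.prioritize(task)
        self.tasks.sort(key=lambda t: t.priority)

        # Task Agent selects each incomplete task; the Execution Agent
        # runs it with the context currently held in memory.
        for task in self.tasks:
            if task.done:
                continue
            result = self.execute(task.description, list(self.memory))
            self.memory.append(result)  # store the intermediate result
            task.done = True

        # The final result is delivered to the user.
        return self.memory[-1] if self.memory else ""
```

A caller would construct AgentWorkflow with its own execute and prioritize functions, pass the user prompt to add_prompt, and call run to obtain the final result; intermediate results remain available in memory.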
Authorized licensed use limited to: UNIVERSIDADE DE SAO PAULO. Downloaded on May 11,2024 at 15:06:08 UTC from IEEE Xplore. Restrictions apply.
improved the understanding of user requirements, as the context provided specific and relevant information. Avoiding irrelevant or insufficient information was crucial to prevent distractions that could degrade the model’s performance and hinder task comprehension [5]. Secondly, it ensured consistency in code development and test case generation by aligning them with the provided information, thus minimizing contradictions. Additionally, the contextual information assisted in generating accurate code, reducing ambiguity when dealing with interdependent files based on methods, resources, and other factors.
Challenges:
While Autonomous Agents represent the future of Software Engineering, several challenges require attention. With respect to Auto-GPT, the following challenges need to be addressed:
A. Task or Goal Skipping: Auto-GPT may skip a goal when an error arises during the execution of a particular task or goal within a stack of multiple goals. Furthermore, when multiple tasks are presented simultaneously, it may experience confusion and inadvertently skip certain tasks. Additionally, there is a chance that certain tasks may be overlooked or not effectively addressed by Auto-GPT.
B. Hallucinations: During the execution of tasks with High-Level prompts, Auto-GPT is prone to either invoke unrequired classes, methods, or variables or generate files in unrequired target paths. This inherent hallucinatory tendency needs to be comprehensively addressed in order to enhance the reliability and accuracy of the system’s output.
C. The complexity of Tasks: Auto-GPT exhibits limitations in handling complex tasks. When tasked with performing complex operations, precise contextual information is necessary for successful execution; otherwise, it encounters failures. Furthermore, specifying the correct target path is crucial when dealing with intricate frameworks, as failure to do so results in file generation in incorrect locations.
D. Gets stuck in repetitive or circular responses: Despite having unambiguous and well-formulated prompts, Auto-GPT may still get stuck in loops. This is primarily attributed to its limited contextual understanding, which prevents it from generating diverse and contextually appropriate responses.
E. Absence of task completion assessment mechanism: Auto-GPT marks a task as complete after saving results, lacking subsequent validation for actual completeness.

IV. FUTURE WORK
Future work for autonomous agents involves addressing challenges to enhance their capabilities. We aim to refine complex code and test case generation by providing better context, avoiding errors and improving accuracy. To further enhance the performance of autonomous agents, we need to address issues such as task or goal skipping, handling hallucinations and complex tasks, and preventing repetitive or circular responses.
We aim to investigate parallelism in SDLC, using multiple distributed autonomous agents, like MetaGPT [6], to accomplish the goals or tasks collaboratively, as it can enhance efficiency, speed up the outcomes and enhance performance.
Additionally, a robust risk management framework is imperative to safeguard the integrity of autonomous agents and their systems. The inherent capabilities of these agents can inadvertently expose sensitive information, leading to potential data leaks and privacy infringements; the agents may also download inappropriate files or fall prey to cyber-attacks. Through a comprehensive risk management approach, we can proactively tackle these challenges, enabling the benefits of autonomous agents to prosper while preserving the security and integrity of essential systems.

V. CONCLUSION
Autonomous agents have gained significant traction due to their remarkable adaptability and capability to learn from user input, revolutionizing the work processes of SDLC. We have showcased the potential of autonomous agents, particularly Auto-GPT, in requirement gathering, code and test case generation and pushing the framework to the repository, demonstrating their decision-making power. Our experiments have also underscored the importance of context-specific prompts for precise results in complex frameworks such as Spring Boot, Django, Flask, etc., using Auto-GPT. While excelling in simpler tasks, the agent (Auto-GPT) is observed to encounter challenges with ambiguity and complexity, highlighting the need for precise context.
Since autonomous agents are still developing, it is imperative to conduct further research and enhancements for reshaping Software Engineering practices. By addressing challenges including task skipping, hallucinations, task complexity, circular responses, and task completion, along with improving code and test case quality, setting up strong risk management, and exploring multi-agent collaboration for parallel execution, we see a future where agents like Auto-GPT will transform SDLC, enhancing software quality.

REFERENCES
[1] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” OpenAI, Tech. Rep. 1, 2018.
[2] OpenAI, ChatGPT, https://openai.com/blog/chatgpt, 2021.
[3] Significant-Gravitas, Auto-GPT, https://github.com/Significant-Gravitas/Auto-GPT, 2023.
[4] GitHub, OpenAI and Microsoft, GitHub Copilot documentation, https://docs.github.com/en/copilot, 2023.
[5] F. Shi, X. Chen, K. Misra, et al., “Large language models can be easily distracted by irrelevant context,” in International Conference on Machine Learning, PMLR, 2023, pp. 31210–31227.
[6] S. Hong, X. Zheng, J. Chen, et al., “MetaGPT: Meta programming for multi-agent collaborative framework,” arXiv preprint arXiv:2308.00352v2, 2023.
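As an illustration of how challenges A (task skipping), D (circular responses), and E (no completion assessment) might be mitigated, the following Python sketch retries a failed task instead of skipping it, stops when responses start repeating, and marks a task complete only after an explicit validation step. This is a sketch under stated assumptions, not a feature of Auto-GPT: the functions is_stuck and run_with_validation, and the execute and validate callables, are hypothetical names introduced here.

```python
from typing import Callable, List


def is_stuck(responses: List[str], window: int = 3) -> bool:
    """Return True if the last `window` responses are identical (challenge D)."""
    if len(responses) < window:
        return False
    tail = responses[-window:]
    return all(r == tail[0] for r in tail)


def run_with_validation(
    task: str,
    execute: Callable[[str], str],    # hypothetical agent call
    validate: Callable[[str], bool],  # hypothetical completeness check
    max_attempts: int = 3,
    window: int = 3,
) -> str:
    """Retry on error rather than skipping (A); accept only validated results (E)."""
    responses: List[str] = []
    for _ in range(max_attempts):
        try:
            result = execute(task)
        except Exception as exc:
            # Record the failure and retry instead of silently skipping.
            result = f"error: {exc}"
        else:
            if validate(result):
                return result  # complete only after validation passes
        responses.append(result)
        if is_stuck(responses, window):
            break  # identical responses repeated: stop looping
    raise RuntimeError(f"task not completed: {task!r}")
```

In practice the validate callable could run the generated test cases or check that the expected files exist at the intended target path; raising an error when validation never passes makes an incomplete task visible rather than leaving it marked as done.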