Agents For Software Development
Graham Neubig
My Profile
● Professor at CMU
● Maintainer of OpenHands
https://github.com/All-Hands-AI/OpenHands
● Software developer
More and more major businesses and
industries are being run on software and
delivered as online services—from movies to
agriculture to national defense. […] Over the
next 10 years, I expect many more industries
to be disrupted by software […].
— Marc Andreessen - Why Software is Eating the World (2011)
[Figure: pie chart with slices Coding, Bug fixing, Testing, Documents/Reviews, Communication, and Other; values shown: 36%, 17%, 15%, 14%, 10%, 8%]
https://github.com/All-Hands-AI/OpenHands-resolver
How Promising?
• Code generation leads to large improvements in
productivity (GitHub 2023)
Challenges in Coding Agents
• Defining the Environment
• Designing an Observation/Action Space
• Code Generation (atomic actions)
• File Localization (exploration)
• Planning and Error Recovery
• Safety
Software Development
Environments
Types of Environments
• Actual Environments:
• Source Repositories: GitHub, GitLab
• Task Management Software: Jira, Linear
• Office Software: Google Docs, Microsoft Office
• Communication Tools: Gmail, Slack
• Testing Environments:
• Mostly focused on coding!
• Developers do more, e.g. browse the web (next session)
Simple Coding
(Chen et al. 2021, Austin et al. 2021)
• e.g. HumanEval/MBPP
• Examples of usage of the Python standard library
• Includes docstring, some example inputs/outputs, and tests
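The shape of such a problem can be sketched as follows. This is a made-up problem in the HumanEval/MBPP style, not an item from either dataset: `running_max`, the completion, and the `passes` helper are all assumptions for illustration.

```python
# Hypothetical HumanEval/MBPP-style item: a prompt (signature + docstring with
# an example input/output) and tests that check a generated completion.
PROMPT = '''def running_max(xs):
    """Return a list whose i-th element is the max of xs[:i+1].

    >>> running_max([1, 3, 2, 5])
    [1, 3, 3, 5]
    """
'''

# A candidate completion, standing in for what a code LM might generate.
COMPLETION = '''    out, best = [], None
    for x in xs:
        best = x if best is None else max(best, x)
        out.append(best)
    return out
'''

TESTS = '''
assert running_max([1, 3, 2, 5]) == [1, 3, 3, 5]
assert running_max([]) == []
assert running_max([4, 4, 1]) == [4, 4, 4]
'''


def passes(prompt: str, completion: str, tests: str) -> bool:
    """Run prompt + completion, then the tests; True if nothing raises."""
    namespace: dict = {}
    try:
        # Generated code is untrusted; a real harness would run this sandboxed.
        exec(prompt + completion + tests, namespace)
    except Exception:
        return False
    return True
```

Calling `passes(PROMPT, COMPLETION, TESTS)` checks one sample; benchmark scores are typically aggregated over many problems and samples.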
Broader Domains:
CoNaLa/ODEX
(Yin et al. 2018, Wang et al. 2022)
An Aside: Dataset Leakage
• Leakage of datasets is a big problem
• ARCADE shows that novel notebooks are harder than online notebooks
• LiveCodeBench (Jain et al. 2023) shows that some code LMs overperform on HumanEval relative to newly collected problems
Dataset: Design2Code
(Si et al. 2024)
• Code generation from web sites
• Defines an “event stream” for coding, execution, and browsing actions/observations
• Implements SWE-agent-style actions as “agent skills” that can be called
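One way to picture an event stream is as an ordered log of actions and the observations they produce. The sketch below is only an illustration under that assumption; the class and field names are hypothetical and are not taken from any framework's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Action:
    kind: str     # hypothetical kinds: "run_code", "run_command", "browse"
    content: str  # the code, command, or URL the agent wants to execute


@dataclass
class Observation:
    kind: str     # mirrors the kind of the action that produced it
    content: str  # what the environment returned (output, error, page text)


Event = Union[Action, Observation]


@dataclass
class EventStream:
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        """Append an action or observation in the order it happened."""
        self.events.append(event)

    def history(self) -> str:
        """Render the stream as text that could be placed in an LM prompt."""
        lines = []
        for e in self.events:
            tag = "ACTION" if isinstance(e, Action) else "OBSERVATION"
            lines.append(f"{tag}[{e.kind}]: {e.content}")
        return "\n".join(lines)
```

Keeping coding, execution, and browsing in a single stream means the agent sees one chronological history regardless of which tool produced each entry.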
Code-based LLMs
Basic Method: Code-generating LM
https://github.com/All-Hands-AI/OpenHands/issues/4259
https://github.com/All-Hands-AI/openhands-resolver/issues/146
Solution 2:
Prompt the Agent w/ Search Tools
• e.g. SWE-agent provides a tool for searching repositories
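A search tool of this kind can be as simple as a substring search over the files in a repository. The function below is a minimal sketch under that assumption; `search_repo` and its parameters are hypothetical and do not reproduce SWE-agent's actual tool interface.

```python
import os
from typing import List, Tuple


def search_repo(root: str, query: str, max_hits: int = 20) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, line text) for lines containing query."""
    hits: List[Tuple[str, int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git so results stay relevant.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    for lineno, line in enumerate(f, start=1):
                        if query in line:
                            hits.append((os.path.relpath(path, root), lineno, line.strip()))
                            # Cap the output so it fits in the agent's context.
                            if len(hits) >= max_hits:
                                return hits
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
    return hits
```

The cap on hits reflects a practical design concern: the tool's output is fed back into the prompt, so unbounded results can crowd out the rest of the context.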
Solution 3:
A-priori Map the Repo
• Create a map of the repo and prompt agent with it
• Aider repomap creates a tree-structured map of the
repo
• Agentless (Xia et al. 2024) does a hierarchical
search for every issue
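A rough idea of what a repository map contains can be sketched with Python's standard `ast` module. This is a simplified illustration for Python files only, not Aider's or Agentless's implementation; `map_repo` and `render_map` are hypothetical names.

```python
import ast
import os
from typing import Dict, List


def map_repo(root: str) -> Dict[str, List[str]]:
    """Map each .py file under root to its top-level classes and functions."""
    repo_map: Dict[str, List[str]] = {}
    funcs = (ast.FunctionDef, ast.AsyncFunctionDef)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    tree = ast.parse(f.read())
            except (SyntaxError, UnicodeDecodeError, OSError):
                continue  # skip files that cannot be parsed
            symbols: List[str] = []
            for node in tree.body:
                if isinstance(node, funcs):
                    symbols.append(f"def {node.name}")
                elif isinstance(node, ast.ClassDef):
                    symbols.append(f"class {node.name}")
                    # Methods are indented under their class.
                    symbols.extend(f"  def {n.name}" for n in node.body if isinstance(n, funcs))
            repo_map[os.path.relpath(path, root)] = symbols
    return repo_map


def render_map(repo_map: Dict[str, List[str]]) -> str:
    """Render the map as an indented tree suitable for inclusion in a prompt."""
    lines: List[str] = []
    for path in sorted(repo_map):
        lines.append(path)
        lines.extend(f"  {s}" for s in repo_map[path])
    return "\n".join(lines)
```

The rendered text would be placed in the agent's prompt so it can pick files to open without exploring the repository step by step.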
Solution 4: Retrieval-augmented Code Generation
• Retrieve similar code, and fill it in with a retrieval-augmented LM (Hayati et al. 2018)
• Particularly, in code there is also documentation, which
can be retrieved (Zhou et al. 2022)
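The retrieve-then-generate pattern can be sketched with a toy retriever. The example below assumes a small corpus of (description, code) pairs and uses word-overlap (Jaccard) similarity as a stand-in for a real retriever; it is not the method of either cited paper, and all names are hypothetical.

```python
import re
from typing import List, Set, Tuple

Snippet = Tuple[str, str]  # (natural-language description, code)


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: List[Snippet], k: int = 1) -> List[Snippet]:
    """Return the k snippets whose descriptions overlap most with the query."""
    q = tokenize(query)

    def score(item: Snippet) -> float:
        d = tokenize(item[0])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(corpus, key=score, reverse=True)[:k]


def build_prompt(query: str, corpus: List[Snippet], k: int = 1) -> str:
    """Place retrieved examples before the query so an LM can adapt them."""
    parts = [f"# {desc}\n{code}" for desc, code in retrieve(query, corpus, k)]
    parts.append(f"# {query}\n")
    return "\n\n".join(parts)
```

The same structure applies when the corpus holds documentation entries instead of code snippets: only the contents of the retrieved items change.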
Fixing Based on Error
Messages
• e.g. InterCode (Yang et al. 2023)
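The general loop of running code, reading the error, and revising can be sketched as below. This is an illustration of the idea only, not InterCode's implementation: `toy_fix` is a hypothetical stand-in for a language model that would receive the code and the error message.

```python
import re
from typing import Callable, Optional, Tuple


def run_and_capture(code: str) -> Optional[str]:
    """Execute code; return None on success or a short error message."""
    try:
        exec(code, {})  # untrusted code: a real agent would run this sandboxed
    except Exception as e:
        return f"{type(e).__name__}: {e}"
    return None


def repair_loop(code: str, fix: Callable[[str, str], str], max_rounds: int = 3) -> Tuple[str, bool]:
    """Repeatedly run the code and pass any error message to `fix`."""
    for _ in range(max_rounds):
        error = run_and_capture(code)
        if error is None:
            return code, True
        code = fix(code, error)
    return code, run_and_capture(code) is None


def toy_fix(code: str, error: str) -> str:
    """Stand-in for an LM: initialise any undefined name to 0."""
    match = re.search(r"name '(\w+)' is not defined", error)
    if match:
        return f"{match.group(1)} = 0\n{code}"
    return code
```

Bounding the number of rounds matters in practice, since a model that cannot fix an error may otherwise loop indefinitely.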
Safety
Coding Models
can Cause Harm!
• By accident
• The coding model accidentally pushes to your
main branch
• The coding model is told to “make the tests
pass”, so it deletes the tests
• Intentionally
• Coding agents can be used for hacking (Yang et
al. 2023)
Safety Mitigation 1:
Sandboxing
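One common way to sandbox an agent is to run its commands inside a container with no network access and only the working directory mounted. The sketch below just builds such a `docker run` command line; the image name and the `/workspace` mount point are assumptions, and the function does not launch anything itself.

```python
import os
from typing import List


def build_sandbox_command(workdir: str, command: str, image: str = "python:3.12-slim") -> List[str]:
    """Build a `docker run` invocation that confines `command` to one directory."""
    return [
        "docker", "run",
        "--rm",                 # remove the container when the command exits
        "--network", "none",    # no network access from inside the sandbox
        "-v", f"{os.path.abspath(workdir)}:/workspace",  # expose only this directory
        "-w", "/workspace",     # run the command from the mounted directory
        image,
        "sh", "-c", command,
    ]
```

The resulting list could be passed to `subprocess.run`; anything the agent does is then limited to the mounted directory and the container's lifetime.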
https://github.com/settings/tokens?type=beta
Safety Mitigation 3:
Post-hoc Auditing
• e.g. OpenHands security analyzer
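A very simple form of auditing checks each proposed command against a list of risky patterns. The sketch below is a rule-based illustration only and is not how the OpenHands security analyzer works; the patterns and the `audit` function are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (regular expression, reason the command is flagged).
RISKY_PATTERNS: List[Tuple[str, str]] = [
    (r"\brm\s+-(rf|fr)\b", "recursive force delete"),
    (r"\bgit\s+push\b.*\b(main|master)\b", "push to main branch"),
    (r"\bcurl\b.*\|\s*(sh|bash)\b", "pipe download to shell"),
]


def audit(commands: List[str]) -> List[Tuple[str, str]]:
    """Return (command, reason) for every command matching a risky pattern."""
    findings: List[Tuple[str, str]] = []
    for cmd in commands:
        for pattern, reason in RISKY_PATTERNS:
            if re.search(pattern, cmd):
                findings.append((cmd, reason))
                break  # one reason per command is enough to flag it
    return findings
```

For instance, `audit(["rm -rf tests/", "git push origin main"])` flags both commands, matching the accidental harms of deleting tests and pushing to the main branch.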
[Figure: diagram with Action, an OK/NO check, and Observation]
https://github.com/All-Hands-AI/OpenHands
Questions?