AI Can Write Code Like Humans-Bugs and All - WIRED
2021 07:00 AM
SOME SOFTWARE DEVELOPERS are now letting artificial intelligence help write their code. They’re finding that AI is
Last June, GitHub, a subsidiary of Microsoft that provides tools for hosting and collaborating on code, released a beta
version of a program that uses AI to assist programmers. Start typing a command, a database query, or a request to an
API, and the program, called Copilot, will guess your intent and write the rest.
Alex Naka, a data scientist at a biotech firm who signed up to test Copilot, says the program can be very helpful, and it
has changed the way he works. “It lets me spend less time jumping to the browser to look up API docs or examples on
Stack Overflow,” he says. “It does feel a little like my work has shifted from being a generator of code to being a
discriminator of it.”
But Naka has found that errors can creep into his code in different ways. “There have been times where I've missed some
kind of subtle error when I accept one of its proposals,” he says. “And it can be really hard to track this down, perhaps
because it seems like it makes errors that have a different flavor than the kind I would make.”
The risks of AI generating faulty code may be surprisingly high. Researchers at NYU recently analyzed code generated by
Copilot and found that, for certain tasks where security is crucial, the code contains security flaws around 40 percent of
the time.
The figure “is a little bit higher than I would have expected,” says Brendan Dolan-Gavitt, a professor at NYU involved with
the analysis. “But the way Copilot was trained wasn’t actually to write good code—it was just to produce the kind of text
that would follow a given prompt.”
Despite such flaws, Copilot and similar AI-powered tools may herald a sea change in the way software developers write
code. There’s growing interest in using AI to help automate more mundane work. But Copilot also highlights some of the
pitfalls of today’s AI techniques.
While analyzing the code made available for a Copilot plugin, Dolan-Gavitt found that it included a list of restricted
phrases. These were apparently introduced to prevent the system from blurting out offensive messages or copying
well-known code written by someone else.
Oege de Moor, vice president of research at GitHub and one of the developers of Copilot, says security has been a
concern from the start. He says the percentage of flawed code cited by the NYU researchers is only relevant for a subset
of code where security flaws are more likely.
De Moor invented CodeQL, a tool used by the NYU researchers that automatically identifies bugs in code. He says GitHub
recommends that developers use Copilot together with CodeQL to ensure their work is safe.
The GitHub program is built on top of an AI model developed by OpenAI, a prominent AI company doing cutting-edge
work in machine learning. That model, called Codex, consists of a large artificial neural network trained to predict the
next characters in both text and computer code. The algorithm ingested billions of lines of code stored on GitHub—not all
of it perfect—in order to learn how to write code.
OpenAI has built its own AI coding tool on top of Codex that can perform some stunning coding tricks. It can turn a typed
instruction, such as “Create an array of random variables between 1 and 100 and then return the largest of them,” into
working code in several programming languages.
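For a sense of scale, the instruction quoted above maps to only a few lines of code. The following is a minimal Python sketch written for illustration, not output from Codex. The function name `largest_random`, the default array size of 10, and the choice of whole numbers are assumptions, since the instruction leaves them open.

```python
import random


def largest_random(count=10, low=1, high=100, seed=None):
    """Create `count` random integers in [low, high] and return the largest."""
    # A dedicated generator so that passing a seed makes runs repeatable.
    rng = random.Random(seed)
    values = [rng.randint(low, high) for _ in range(count)]
    return max(values)


if __name__ == "__main__":
    print(largest_random())
```

Even here a reviewer has decisions to check: whether the endpoints 1 and 100 are included, and how many values "an array" should hold.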
Another version of the same OpenAI program, called GPT-3, can generate coherent text on a given subject, but it can also
regurgitate offensive or biased language learned from the darker corners of the web.
Copilot and Codex have led some developers to wonder if AI might automate them out of work. In fact, as Naka’s
experience shows, developers need considerable skill to use the program, as they often must vet or tweak its suggestions.
Hammond Pearce, a postdoctoral researcher at NYU involved with the analysis of Copilot code, says the program
sometimes produces problematic code because it doesn’t fully understand what a piece of code is trying to do.
“Vulnerabilities are often caused by a lack of context that a developer needs to know,” he says.
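One hypothetical illustration of the kind of flaw Pearce describes, not taken from the NYU analysis or from Copilot output: a database lookup that builds its query by pasting user input into a string reads like a plausible completion, yet it allows SQL injection. The sketch below uses Python's standard sqlite3 module and contrasts that pattern with a parameterized query. The table and function names are made up for the example.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver treats `name` as data, never as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def make_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
    print(find_user_safe(conn, payload))    # no user has that literal name
```

Both functions return the same result for ordinary names, which is why the difference is easy to miss when accepting a suggestion. It only shows up when the input comes from an untrusted source, context the tool may not have.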
Some developers worry that AI is already picking up bad habits. “We have worked hard as an industry to get away from
copy-pasting solutions, and now Copilot has created a supercharged version of that,” says Maxim Khailo, a software
developer who has experimented with using AI to generate code but has not tried Copilot.
Khailo says it might be possible for hackers to mess with a program like Copilot. “If I was a bad actor, what I would do
would be to create vulnerable code projects on GitHub, artificially boost their popularity by buying GitHub stars on the
black market, and hope that it will become part of the corpus for the next training round.”
Both GitHub and OpenAI say that, on the contrary, their AI coding tools are only likely to become less error prone.
OpenAI says it vets projects and code both manually and using automated tools.
De Moor at GitHub says recent updates to Copilot should have reduced the frequency of security vulnerabilities. But he
adds that his team is exploring other ways of improving the output of Copilot. One is to remove bad examples that the
underlying AI model learns from. Another may be to use reinforcement learning, an AI technique that has produced some
impressive results in games and other areas, to automatically spot bad output, including previously unseen examples.
“Enormous improvements are happening,” he says. “It’s almost unimaginable what it will look like in a year.”
Will Knight is a senior writer for WIRED, covering artificial intelligence. He was previously a senior editor at MIT Technology Review,
where he wrote about fundamental advances in AI and China’s AI boom. Before that, he was an editor and writer at New Scientist. He
studied anthropology and journalism in...