France, 2024, Navigating Software Development in The ChatGPT and GitHub Copilot Era

The article discusses the impact of generative artificial intelligence (GenAI) technologies, particularly large language models (LLMs) like ChatGPT and GitHub Copilot, on the software development landscape. It explores the potential for these tools to either supplement developers' skills or lead to job losses, while providing insights from a literature review and developer feedback. Additionally, it introduces a capability maturity model (CMM) framework for assessing the integration of LLMs in software development practices.

Business Horizons (2024) 67, 649–661

Available online at www.sciencedirect.com

ScienceDirect
www.journals.elsevier.com/business-horizons

Navigating software development in the ChatGPT and GitHub Copilot era
Stephen L. France

Mississippi State University, Mailstop 9582, Mississippi State, MS 39762, USA

KEYWORDS
Generative AI; Large language models; Software developers; AI prompting; Prompt engineering; Capability maturity model

Abstract
Generative artificial intelligence (GenAI) technologies using LLMs (large language models), such as ChatGPT and GitHub Copilot, with the ability to create code, have the potential to change the software-development landscape. Will this process be incremental, with software developers learning GenAI skills to supplement their existing skills, or will the process be more destructive, with the loss of large numbers of development jobs and a radical change in the responsibilities of the remaining developers? Given the rapid growth of AI capabilities, it is impossible to provide a crystal ball, but this article aims to give insight into the adoption of GenAI with LLMs in software development. The article gives an overview of the software-development industry and of the job functions of software developers. A literature review, combined with a content analysis of online comments from developers, gives insight into how GenAI implemented with LLMs is changing software development and how developers are responding to these changes. The article ties the academic and developer insights together into recommendations for software developers, and it describes a CMM (capability maturity model) framework for assessing and improving LLM development usage.
© 2024 Kelley School of Business, Indiana University. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Generative AI: A coding revolution in the making?

Generative artificial intelligence (GenAI), based on LLMs (large language models), exploded into public consciousness in November 2022, spurred by the release of a trial version of the chatbot ChatGPT. ChatGPT was developed by OpenAI utilizing the GPT-3 neural network, which has a size of 175 billion possible parameters and was trained on 300 billion words (Hughes, 2023). The previous generations of AI often had quite well-defined inputs and outputs. For example, a chess-playing AI would be trained using the rules of chess and many different past game scenarios. The current generation of GenAI using LLMs, powered by billions of neurons, has more breadth and scope for nonstructured tasks. ChatGPT and similar LLMs can generate creative output from simple user prompts and have the potential to create unstructured outputs, such as student essays, news releases, computer code, music, and art (Sommers, 2023).

E-mail address: [email protected]

https://doi.org/10.1016/j.bushor.2024.05.009
0007-6813/© 2024 Kelley School of Business, Indiana University. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

While the effects of LLMs on education have grabbed headlines and policy interest, software development has the highest rate of LLM adoption and usage, with nearly 25% of software-development companies implementing LLM-based GenAI initiatives (MacRae, 2023). Since the launch of the first preview of GitHub Copilot in 2021 (Gershgorn, 2021), coders have been utilizing LLMs in their workflows, and as of July 2023, a survey on Stack Overflow of over 90,000 developers (Vincent, 2023) found that over 70% of developers were using or planned to use LLM-based tools for coding in 2023, though fewer than 50% of developers replied that they “somewhat trust” or “highly trust” AI coding tools.

But what do the rapid changes outlined above mean for developers? In general, there are contrasting views of the effects of GenAI on employment. It may be that AI will replace categories of skilled jobs that were previously not at risk from automation. But it may be that LLMs will be used to supplement workers’ existing skills, enabling workers to adapt and to improve their productivity by incorporating AI into their workflows (Chatravorti, 2023). This adaption is a key focus of this article. How can developers and development teams best incorporate LLMs into their workflows? How can they modify their skill sets to thrive in an environment where LLM-based tools are ubiquitous?

This article looks to help software developers and business managers who utilize the services of software developers (for example, a marketing manager who needs custom reports or sales-automation macros) appraise this rapidly changing environment. One could pose the question: Is GenAI with LLMs going to lead to a radical transformation of the software industry with mass job losses among developers, or will the effects be more positive, with developers using LLMs to improve productivity? Given the pace of technological advancement, it is impossible to provide a definite prediction of the overall future of software development; however, it is possible to appraise the current technological situation and initial implementations of LLMs, and to make concrete, forward-looking recommendations. Thus, the major aims of this article are, first, to understand how LLMs are being implemented in the software industry; second, to provide developers with a framework for implementing LLMs; and third, to provide advice to developers on how to adapt to this new environment.

The article starts with a brief overview of the current state of the global software industry, followed by analysis of the working environment for software developers, with a focus on factors related to the productivity of developers. This is followed by a review of the initial findings on the effects of LLMs on software development and a content analysis of developer conversations. While similar analyses have been performed in the software-engineering literature, this analysis has a broader focus than previous analyses, and includes insights into developers’ hopes and worries for their careers and into future-looking trends in the software development industry. A capability-maturity model (CMM) framework is introduced for evaluating LLM-based software development, and content analysis is used to help evaluate the maturity of current LLM-based development efforts.

2. Software development: Industry and skills

The scale of the global software-development industry is enormous. As of 2023, globally there are approximately 35 million software developers, with the most common programming languages (in order) being JavaScript, Java, Python, C/C++, and C# (Noll et al., 2023). Most projections show that the demand for programmers will grow. In addition to dedicated software developers, there are many millions more people who perform some coding tasks as part of their job requirements, a segment of developers that may grow with increased use of LLMs that democratize coding and allow more people to easily create code (Davenport et al., 2023). Examples include accountants writing VBA code to automate spreadsheets, business analysts running analyses in R or Python code to find actionable insights from data, and artists writing programming scripts to help animate artwork and add visual effects.

Despite high-profile cutbacks in the tech sector in 2023 by companies such as Amazon, Meta, and Salesforce (Trueman, 2023), long-term growth is expected in software development jobs. In the US, the Bureau of Labor Statistics (2022) has estimated around 15% growth in computer and information technology jobs between 2021 and 2031 (footnote 1). In addition, commentators have predicted a global shortage of technology workers, with a potential talent shortage of 85 million software developers by 2030 (Sloyan, 2021).

Footnote 1. There is a projected increase for the categories of “Web Developers and Digital Designers” and “Software Developers, Quality Assurance Analysts, and Testers”, but a slight decrease for the “Computer Programmers” category. This may indicate a change in development focus or perhaps a change in job title nomenclature.

3. The development environment and developer productivity

Thus far, this article has explored how LLMs are increasingly being used by software developers. To fully understand how LLMs impact developers, it is important to understand some of the basics of developer roles and workflows. This section acts as a prerequisite for the remainder of the article by summarizing academic software development work and describing aspects of developer jobs that will be affected by LLMs, along with associated terminology.

Working as a software developer is a multifaceted job. Besides coding, job responsibilities can include testing, documentation, systems design and analysis, technical support, and interacting with clients. Software development is a creative field, but it is important to approach coding tasks systematically. For example, research found that when making code changes, developers who methodically worked through the existing code to understand it had better performance than developers who took a more piecemeal approach (Robillard et al., 2004).

Generally, as developers become more experienced, they take on broader responsibilities, with senior developers taking on both supervisory responsibilities over junior developers and management responsibilities over areas such as coding standards and processes (Ardimento et al., 2022). Developers have both external and intrinsic motivations for good performance. For example, the tech industry is notorious for stack-ranking reviews, where all employees are ranked on a scale, and a certain bottom percentage of employees are marked as having unacceptable performance (e.g., Hess, 2023), which leaves employees with constant performance anxiety. Intrinsic motivation often comes from a sense of pride in one’s technical ability. Research has shown that software developers value a sense of agency and control over their work and have a sense of achievement from working productively on coding tasks (Meyer et al., 2021), and they often test their skills with competitive coding challenges (Kumar et al., 2018).

A developer does not exist in a vacuum, and a developer’s success depends on a range of factors, including interactions with other team members and the use of tools and search resources to find information. A developer may work on a piece of code individually or with other people, and quality control and code reviews are a big part of the development process. Code reviews can be informal among team members or can be formal processes built into the software development process, with assigned reviewers and scores given to reviewed output, and automated processes checking for common errors and coding-standards violations (Badampudi et al., 2023). Code reviews can differ between companies. For example, pair programming incorporates reviewing into the programming process by having one programmer working as a driver and focusing on the coding and syntax, while another programmer works as a navigator, guiding the coding and design process and reviewing code (Bird et al., 2022). Google employs a lightweight review process, with changes often reviewed by only one reviewer, albeit with high reviewing standards and quick review times (Sadowski et al., 2018).

The knowledge required by software developers can be immense, with developers needing to know programming-language commands, coding standards and syntax, licensing information, tool usage, bug fixing, and how to access both general and organization-specific coding libraries and resources. It is impossible for a developer to have all this required information available on tap from memory. To get additional information and knowledge, practitioners utilize a range of resources, including help searches in integrated development environments (IDEs), text searches on the web, colleagues, product documentation, online forums and bulletin boards, online tutorials, code repositories, and blogs (e.g., Xia et al., 2017). Developers often engage in opportunistic reuse of code found on the web but need to edit and adapt code to fit into their codebases (Ciborowska et al., 2018). This practice saves time but leads to potential security and licensing issues. The Stack Overflow community (including the Stack Exchange coding forums) is particularly important for programmers, allowing them to ask questions and receive answers. The whole archive of questions and answers is searchable. In fact, querying Stack Overflow to find useful code snippets and to help find solutions for debugging code is an important skill (Li et al., 2022).

The environment described above is one in which developers utilize a range of knowledge and skills but frequently need to search for additional resources and knowledge. Developers work with other team members, review code, and manage code in repositories. A developer’s responsibilities depend on their level of experience. Many of the facets of development described above are in the process of being altered by LLMs. Despite the recent introduction of LLM-based tools for coding, with the first preview of GitHub Copilot, released in 2021 (Gershgorn, 2021) and a demo version of ChatGPT, released in 2022 (Marr, 2023), the use of these tools for coding has become widespread.

Along with the rapid commercial implementation of LLMs, researchers have noted the potential for GenAI to transform development, but they have also found that there are limits to what LLMs can achieve. Overall, most studies have found that GenAI using LLMs works well on small, well-defined coding samples. But LLM usage stumbles with larger, more complex problems, and though LLMs can create good-quality code, they are not at the stage of being completely independent from human control. Most of the empirical studies on code correctness (e.g., Nguyen et al., 2022) found that LLMs could only solve a proportion of the posed problems correctly. Additionally, LLMs do not always generate code to high security standards, and since they are reliant on the code base used to train them, it is possible for security risks and malicious code to be propagated through the LLM models (e.g., Pearce et al., 2022). However, there is great potential for LLMs to help speed up code development by offering suggestions and by creating initial versions of code (e.g., Imai et al., 2022; Peng et al., 2023), and LLMs could be an effective alternative to a human pair programmer. The currently deployed LLM models are a long way from artificial general intelligence. The actual performance of LLMs may depend on the application and on how much training code is available. For example, Nguyen et al. (2022) found that LLMs perform differently on different programming languages. Using LLMs with a niche language, one with nonstandard syntax, is likely to give poorer performance than with a more popular language, such as Java or Python.

4. Qualitative content analysis of developer views

The previous section gave some insight into the usage of LLMs in software development and into some of the potential benefits and limitations of LLM usage. But how do developers feel about these new technologies? Can we gain any additional insights from the initial implementation of LLMs by developers? I performed a social-media content analysis to gain further insight into how software developers are utilizing GenAI and LLMs and to understand trends in LLM usage, with a view to giving forward-looking recommendations for LLM use. This work follows previous work that analyzed LLMs using social-media and discussion-board posts with both ChatGPT (Feng et al., 2023) and GitHub Copilot (Zhang et al., 2023), which found such benefits as helpfulness in generating code, reduced coding time, and integration with standard editors. The same studies also found some drawbacks, including poor code quality, privacy issues, issues with the pricing or subscription structure, and limitations on the size of the code that could be created. My analysis builds on this previous work but aims to provide a wider understanding of the environment for GenAI and software development, and of the opportunities and anxieties expressed by developers regarding GenAI. Reddit was an important source, as it has a much broader scope than Stack Overflow or GitHub and includes more general discussion of careers (footnote 2).

To find relevant posts, I searched for each of the different LLMs along with terms related to coding. The search was not restricted to any specific subreddit, though the majority of posts came from coding- and development-focused subreddits. Posts were chosen that (1) had both LLM and coding content, (2) were written by commenters who had at least a minimal knowledge of coding or development, and (3) had original content (i.e., were not purely a reply to other posts). A saturation-sampling approach was used to ensure a broad range of posts for each of the initial thematic categories, which resulted in 2,567 posts being retrieved. I then used a semiautomated content-analysis procedure (e.g., Horne et al., 2020) to help categorize the posts. Categories were formed by analyzing the topics in the literature review and then matching the categories with those found from a cluster analysis of the text comments (footnote 3). Cluster analysis is a method of grouping items and is used in the business world for grouping or segmenting products or customers. It is also used for finding and extracting topics from text data. In this case, I used cluster analysis to categorize the Reddit comments into homogeneous groups or topics. A word-cloud visualization of the categories is given in Figure 1. The word cloud gives the top 50 terms based on occurrence in the category comments over the baseline rate across all topics, with the size of each term based on relative occurrence.

Figure 1. Category word cloud

To aid further analysis, each comment was assigned both a centrality score (how core the comment was to the category; footnote 4) and a positive/negative sentiment score, derived using sentiment analysis. This information was used to find core concepts in the categories and core representative comments. To name each of the categories, the top terms from the word cloud were analyzed, along with the comments in each category with the highest centrality. A summary of the resulting six categories is given in Table 1 (footnote 5). I selected 10 representative comments for each category, and I combined some very similar comments. The

Footnote 2. I considered using Twitter too, but given its recent removal of academic API access and its ongoing platform instability, I discounted it.
Footnote 3. The comments had punctuation and low-information “stop” words removed, and data were converted into text feature data (with words + bigrams), and k-medoids clustering was used. An eight-cluster solution was chosen using stability analysis, and these eight clusters were mapped to six categories, with clusters for the different LLMs mapped into one platform category.
Footnote 4. This was operationalized as a silhouette score for cluster analysis, with larger values of the score indicating more central comments.
Footnote 5. A full dataset with the comments and cluster analysis is available from the author.
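The clustering procedure is described only briefly in the footnotes, so a small illustration may help readers who want to see how the pieces fit together. The Python sketch below strings the stated steps together: stop-word removal, word and bigram features, k-medoids clustering, and a silhouette score used as a centrality measure. Everything specific in it is an assumption rather than the author's implementation: the stop-word list, the use of cosine distance, the simple alternating k-medoids routine with a fixed starting point, the choice of two clusters, and the six toy comments, which were invented for this example. The stability analysis used in the article to select eight clusters is not reproduced.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the article does not list its stop words).
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "it", "me", "of", "the", "to"}


def featurize(comment):
    """Drop punctuation and stop words, then count single words plus bigrams."""
    words = [w for w in re.findall(r"[a-z0-9]+", comment.lower()) if w not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return Counter(words + bigrams)


def cosine_distance(u, v):
    """1 - cosine similarity between two term-count vectors (assumed distance measure)."""
    dot = sum(count * v[term] for term, count in u.items())
    norms = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return 1.0 - dot / norms if norms else 1.0


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]]) for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Simple alternating k-medoids on a precomputed distance matrix."""
    medoids = list(range(k))  # deterministic start: the first k items (assumption)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to its cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members)) if members else medoids[c]
            updated.append(best)
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def silhouette(dist, labels, i):
    """Silhouette of item i: (b - a) / max(a, b); higher means more central to its cluster."""
    own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
    other_clusters = set(labels) - {labels[i]}
    if not own or not other_clusters:
        return 0.0
    a = sum(dist[i][j] for j in own) / len(own)  # mean distance within own cluster
    b = min(  # mean distance to the nearest other cluster
        sum(dist[i][j] for j, lab in enumerate(labels) if lab == c) / labels.count(c)
        for c in other_clusters
    )
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0


# Toy comments invented for illustration; they are not from the article's dataset.
comments = [
    "Copilot writes unit tests quickly",
    "Open source license terms are unclear",
    "It autocompletes unit tests for me",
    "The license of open source code matters",
    "Unit tests from Copilot save time",
    "Copying code breaks the open source license",
]
features = [featurize(c) for c in comments]
dist = [[cosine_distance(u, v) for v in features] for u in features]
medoids, labels = k_medoids(dist, k=2)  # the article chose eight clusters via stability analysis
scores = [silhouette(dist, labels, i) for i in range(len(comments))]

# List comments by cluster, most central (highest silhouette) first.
for label, score, text in sorted(zip(labels, scores, comments), key=lambda r: (r[0], -r[1])):
    print(label, f"{score:.2f}", text)
```

On this toy input the testing comments and the licensing comments fall into separate clusters, and the comments listed first within each cluster are the ones a reader would pick as most representative. A real analysis would more likely rely on an established clustering library and a data-driven choice of the number of clusters.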

Table 1. Summary content for thematic categories

Category 1: Developer issues and jobs
Why do businesses hire developers when they can build their apps with no code solutions?
Most programmers are not writing difficult code at all, I think AI today can compete with entry level positions.
A lot of folks are going to become prompt engineers. Using ChatGPT these days, I think of myself as a solution
engineer or solutions architect.
I never got into coding and now I’m doing stuff I usually had to ask our engineers to do.
I see a lot of lower/middle people getting a bite taken out of their salaries in the coming years, and there being
increased requirements/competition.
It is to modern programming what high-level languages were to assembly designers, or what digital programming
was to card-punchers.
This will just speed up and change the workflow. Mechanical engineers aren’t obsolete because CAD is faster than
drawings.
Freezing entry level hiring because of AI is incredibly dumb and short-sighted. You don’t hire juniors to do simple,
menial tasks. You hire juniors to turn them into seniors.
It can help entry level people scale their work faster, help them write test cases, help them think through
problems, and so on.
AI probably does kill several jobs, but it also is going to be best when paired with skilled workers to enhance what
they can do, or help train lower skill workers to improve themselves more quickly.
I have never met a client who could explain what they needed well enough to get an AI to code it.
I think this is my problem with inexperienced analysts/coders using the tool. If they use a LLM to write their code
and get a result, they have no idea how to validate that result. Learning the technology is still a good idea IMO. A
job interviewer for an analyst position may expect you to know a few things.
Category 2: General coding issues and platform performance
The GitHub copilot is the best coding tool I’ve ever seen in 25 years of coding. Increased my speed by 50% by doing
all the boring repetitions and quickly suggesting whole functions based on comments.
Just spend most of the time writing good documentation and have copilot fill in the rest.
I do more complex algorithms and generation in ChatGPT4 and use Copilot just to speed things up in my editing.
Copilot consistently saves me about 1-4 hours a week of writing code manually.
GitHub Copilot is very good at what it does, suggestions and autocomplete, but is not fully functional by itself. It
needs good prompts and guidance from a human developer.
I used GitHub Copilot; it saves my fingers from keystrokes for boiler plate code. But it also generates utter
garbage where I stare at it.
Unless it knows my entire codebase, APIs, and database, it really doesn’t do much more than give me a template
for functions.
It generates patterns. But it doesn’t spend time thinking: “Okay, so this is easy, but that is best practice.”
Copilot code cleanup is horrible. It’s good for writing boiler-plate code, and that’s about it.
He’s right in saying that writing code goes quicker, except now you’re spending 10x time fixing bugs.
Every coder will tell you ChatGPT produced horrendous code and makes $%# up as it goes along.
Category 3: Testing code
I find it useful for writing generic unit tests and to document the tests with comments.
The largest benefit we’ve found is how adept it is at extrapolating from one unit test and autocompleting 5–6
additional test cases for a function.
It honestly generates an entire 30-line test exactly how I would have.
It is useful for regexp optimizing, unit test auto generation, and static analysis.
Once a test is done, it autocompletes my other tests, which follow the same pattern.
My belief is we will be the ones writing the tests. What better way to verify the output?
I could easily fall into the trap of expecting ChatGPT to write all the tests for me and then just have it enter a
feedback loop where it’s only looking at tests it has generated meaning the tests become less reliable over time.
Tests are the check to make sure the AI is in check. If AI is writing tests, you have to double check the tests.
I love that people would rather have AI write tests for them than admit that our testing practices are rudimentary
and could use substantial improvement.
I find best use of ChatGPT is to run a check on your own code to see if it needs improvements, and for writing unit
tests.
It seems a little risky to rely on it for tests, since it can still hallucinate sometimes.
Category 4: Understanding code
I can now understand the code BETTER than I ever did, through prompting questions and getting really good
explanations.
Point to a line number of a code snippet and it’ll tell you exactly what, why, and how it works.
I ask it to explain how each piece of code works then save the snippet of code for future use after gaining full
understanding.
I always understand the code first, then rewrite it to my coding standards.
I’m good enough of a programmer to be able to use and modify it as I need it, but I don’t fully understand the
code.
I would never use ChatGPT to write code. I need to understand my code.
I think I have never committed a piece of code that I did not understand. I am responsible for it, so I need to know
what it is doing.
I always understand the code first, then rewrite it to my coding standards. If you don’t, your codebase will turn
into an unmaintainable heap of $%#^.
Having to fix things forces me to learn and understand the code, so a win-win all around.
Sometimes it can make weird mistakes or miss details that require you to understand the code after all.
It doesn’t understand code. It’s predicting what should come next. For simpler things that have a ton of examples
it is easy. For others it is hard.
ChatGPT has been teaching me to be a better programmer because I review the code to make sure it’ll actually
work. Some of which I don’t understand why and I ask it to explain.
Category 5: Open-source and legal issues
The copilot AI frequently copypastes code directly from its training data. That would be fine, since that’s what a
lot of developers do anyway, were it not for open source/free software licenses.
Open source licenses will need to be updated to include specific verbiage regarding usage in training sets since
this is a grey area
This could be like sampling music, but it still has to be differentiated enough and code has less leeway to claim
artistic expression or parody.
Things like the Quake code snippet that included the comments of the license and the original comments on the
implementation make using large snippets of code from copilot concerning legally.
If an engineer memorizes and replicates a copyrighted section of code, it will still violate copyright. Fair use for AI
is a gray area.
They need to include the relevant license and attribution if the output is an exact match from something within
the training set.
If it’s using code with copy left license to train, it must also be licensed properly.
If this is regularly direct copying unlicensed code, I would expect it to be banned by large corporations purely out
of risk management.
xxxxxx shouldn’t be able to destroy open source licensing by putting an AI in front of code copying.
Category 6: Stack Overflow and information searches
It is like Stack Overflow, but you can ask follow-up questions to help diagnose code you struggle with.
It is like having someone on Stack Exchange answer my question instantly.
It is preferable to Stack Overflow as you do not need to deal with “stuck up $%#^&,” sarcasm, or condescension.
It’s just a more efficient Stack Overflow.
What’s crazy is that it did 99% of its training on Stack Overflow, and somehow got the technical knowledge
WITHOUT becoming an $%#^&%.
It saves you a Google search and a couple clicks through outdated stack overflow posts.
It is good for small, trivial, or minor questions that impede your progress but aren’t important enough for Stack
Overflow.
If you feel like after some time ChatGPT is taking you in a loop, then you have to ask aka Slack Overflow.
It doesn’t explain as well as Stack Overflow and it misses lots of nuances/gives incorrect explanations.
I’d much rather use an article where the author validated and explained the code, like Stack Overflow.
It tries to guess the answer you want, not to deliver the real answer. You are better off Googling for code on
GitHub or Stack Overflow.
One of the barriers I faced when starting to code (and probably most people do too) is dealing with small, trivial,
or minor questions that impede your progress but aren’t considered important enough to ask on forums like Stack
Overflow or would take up to an hour of Googling.

comments were standardized for grammar and consistency.

The results show a large degree of concurrence with the literature summarized in the previous section, but there are some interesting additional insights, particularly concerning the developer job market and practical limitations of LLM models.

4.1. Category 1: Developer issues and jobs

This category focuses on the broad effects of LLMs on developer job security. More pessimistic commenters thought that junior developers, who typically undertake simpler coding tasks, would become automated or at least face tougher competition. Other commenters noted that companies still needed a pipeline for senior developers and that automating basic coding functions could risk this. LLMs can be used to enhance the skills of more senior developers and help train more junior developers. Some commenters noted that development has gradually moved from low-level development (assembly language) through to high-level development and no- or low-code environments, and they observed that the use of prompts is just an additional level of abstraction for developers.

Key takeaways: While GenAI tools may improve efficiency, trying to completely replace junior developers will harm the development pipeline, so there will still be opportunities for junior developers. But all developers must learn new workflows and new skills, such as prompting LLMs.

4.2. Category 2: General coding issues and platform performance

This category contains discussions on the actual utility of the new LLM-based tools. Positive comments noted time savings of 50%, removing repetitive tasks, and usefulness in simple coding tasks and generating documentation. Negative comments noted poor-quality code, a lack of knowledge of proprietary software and application programming interfaces (APIs), time lost in additional testing, and the need for skilled, knowledgeable developers to provide meaningful LLM prompts.

Key takeaways: Developers should understand how best to utilize LLMs to improve efficiency without compromising quality, and they should learn how to write effective prompts for LLMs. The exact balance will depend on the development environment, developer experience, and the goals of the project (e.g., business reports or mission critical software). Senior developers should mentor junior developers on how to make use of LLMs while maintaining coding quality.

4.3. Category 3: Testing code

This category focuses on the usage of LLM tools for testing. Positive commenters noted that LLMs are good at writing small unit tests (i.e., for individual functions) from documentation; once given some initial tests, they can then autocomplete a comprehensive range of tests. More cautious commenters worried that automating testing and coding could lead to a lack of controls, or that LLMs can get stuck in a feedback loop by basing tests only on previous tests, while others maintained that since LLMs can hallucinate, it is too risky to entrust testing to LLMs.

Key takeaways: At the current stage of maturity, LLMs should not be entrusted both to create and also test the same code. LLMs can be used to increase the efficiency of testing and documentation (particularly on unit tests), but strong quality controls should be introduced to ensure proper test coverage and accuracy.

4.4. Category 4: Understanding code

This category focuses on the broad topic of understanding code. Some commentators note that when used with appropriate prompts, LLMs can improve developers’ understanding of code. There is a general view that when using code generated by LLMs, developers should ensure that they fully understand the code before using it, as a developer committing code is responsible for it, including errors generated by LLMs. Some commentators admit that they do not always fully understand the code that they utilize, whether from an existing code base or generated by LLMs.

Key takeaways: Understanding code is key to quality software implementations. Code taken piecemeal from LLMs can lead to unexpected functionality and potential security and privacy risks. Any code generated by LLMs should be read through line by line and thoroughly documented. LLMs have strong potential for helping developers understand existing code and can be
ChatGPT and Copilot development 657

utilized to help the learning curve of new de- 5. A roadmap for developers
velopers assigned to existing projects.
The content analysis provides key takeaways for
4.5. Category 5: Open-source and legal the implementation of LLMs and builds on the
issues summaries of research on the best practices for
the initial introduction of LLMs to the development
This category focuses on issues with licensing. environment. But how can developer LLM usage be
Commenters note that code used to train LLMs evaluated effectively? A commonly used method-
may have licenses that allow free use, but often, ology for appraising an organization’s current
the licenses do not allow commercial distribution implementation status and maturity level with a
or using the code without attribution. There have development technology is the capability maturity
been instances where LLMs have replicated copy- model (CMM). The CMM was originally developed as
righted code exactly, such as code from the game a methodology for process improvement in soft-
Quake. This poses a challenge from a risk- ware development/engineering at IBM in the 1980s
management standpoint. Commenters note that (Paulk, 2009), but models of similar maturity have
the legal concept of fair use is currently untested been deployed in a range of business areasdfor
for coding applications, and it may take lawsuits example, sourcing innovation (Legenvre &
and court rulings to develop precedence. Gualandris, 2018) and human resources
(Wademan et al., 2007)dand have already been
Key takeaways: Any company engaging in soft- implemented for general AI technologies (e.g.,
ware development needs a clear policy on code Sadiq et al., 2021).
reuse and copyright. This is true for code snip- The general idea of a maturity model is that in
pets taken from Stack Overflow as well as code the initial stage, when a new technology or busi-
taken from LLMs. In particular, there should be ness practice is introduced, this is often done in an
rules to prevent large chunks of code being used ad-hoc manner with little coherent documentation
verbatim with comments. The legal situation or process. In the repeatable stage, there is some
and any precedent-setting court cases should be degree of documentation and process control at
monitored carefully. the project level, but processes are still quite
chaotic. At the defined stage, processes are
4.6. Category 6: Stack Overflow and properly documented at the organizational level,
information searches and process standards are developed. In the
managed stage, process metrics are gathered at an
This category puts the use of LLMs into context organizational level to guide evaluation, and in the
with existing methods of searching for informa- optimized stage, these metrics are used to opti-
tion. Multiple comments (over 20 in the full data- mize processes incrementally. This use of metrics
set) focused on the general unfriendliness of Stack and statistics for process improvement is key to
Overflow participants and the fact that novice the top two maturity stages of most CMM imple-
programmers were put off from asking questions mentations. Different variants of the CMM use
by potential sarcasm or condescension. Com- different terminology for the stages, but there is
menters also noted that using LLMs is faster than some commonality in that, over time, processes go
waiting for answers on Stack Overflow. But some from unmanaged and reactive to managed and
commenters noted that Stack Overflow provides proactive. An organization can analyze its pro-
high-quality explanations of the rationale for cesses on different CMM dimensions and then
coding decisions, which were sometimes superior develop a plan for process improvement to reach a
to answers created by LLMs. higher level of the CMM.
The CMM implemented for this article is given in
Key takeaways: Developers should be encouraged Table 2. The dimensions chosen were Tools, Cod-
to utilize a full range of search resources, ing, Copyright & Security, Testing, and Mentorship.
including internal documentation, IDE help func- This is not an exact one-to-one mapping to the
tionality, coworkers, Stack Overflow, and LLMs. All content analysis, but the dimensions include the
have advantages and disadvantages. Developers main elements of the development process found
could utilize LLMs for syntax and library queries, in the content analysis and literature review. Each
rely on colleagues for questions about company of the first five rows of the table contains a
coding standards, and utilize Stack Overflow for description of one of the CMM stages. The last row
more complex coding issues and queries. contains summaries of category discussions

Table 2. A capability maturity model (CMM) for LLM-based software development

| Stage | Tools | Coding | Copyright & security | Testing | Mentorship |
| --- | --- | --- | --- | --- | --- |
| Initial | Individual developers utilize GenAI tools in an ad-hoc fashion. | There are no rules or policies on the use of LLMs for generating code. | There is no process for checking the copyright of generated code or code that introduces privacy or security issues. | There are no policies for the usage of LLMs in testing. | There is no mentorship for junior developers on appropriate use of LLMs. |
| Repeatable | All developers have access to specified LLM tools (e.g., ChatGPT, GitHub Copilot). | There is awareness of best practices for the use of LLMs, which is relayed to developers. | There are basic rules for code usage, but potential copyright violations and security breaches are not tracked. | There is some usage of LLMs for testing, and LLMs are also used in creating tests. | There is ad-hoc knowledge transfer between junior and senior developers on best practices for using LLMs. |
| Defined | Developers have access to standardized tools, along with documentation on how to utilize the LLMs for different tasks. | There are rules governing when to use LLMs for coding and procedures for ensuring that rules are enforced. | There are documentation and rules for avoiding potential copyright violations and privacy or security issues. | There is documentation on how to utilize LLMs for unit testing and quality standards to ensure full test coverage. | There is organized mentoring for junior developers to ensure appropriate use of LLMs. |
| Managed | Developers have access to LLM tools, along with usage statistics, which can be used to help adapt these tools and the IDE for optimal usage. | Use of LLMs in coding, along with developer statistics (time taken for tasks, errors, etc.), is tracked. Coders can utilize these statistics to improve efficiency. | There are documentation and audits of LLM code usage, to ensure that large chunks of code or potentially copyrighted algorithms are avoided, and that privacy and security issues do not occur. | LLMs are fully incorporated into the testing process. Tests and results are fully documented, with statistics available on the efficacy of LLM coding and LLM testing. | Statistics are made available on developer productivity and coding quality. Mentors use these statistics to help advise junior developers. |
| Optimized | Using statistics and developer feedback, LLM tools and IDE features are optimized to improve coding quality and productivity. | Developer metrics of LLM usage are used to understand optimal LLM usage and to improve company-wide processes for usage of LLMs. | LLM usage statistics, along with records of potential copyright, privacy, and security breaches from LLM use, are used to proactively alter processes. | Statistics on developer productivity and coding quality are made available. Mentors use these statistics to help advise junior developers. | Metrics should be used to help individual junior developers improve their development processes and training procedures for the use of LLMs. |
| Content-analysis insights | Comments in C2 and C4 indicate that many developers have access to LLM tools. However, most developers utilize LLMs in an ad-hoc manner to support specific coding tasks. | Discussions in C4 on how developers adapt code to coding standards and use LLMs to understand code. C2 contains usage scenarios and limitations. | The comments in C5 show that developers understand licensing issues and that LLMs can regurgitate copyrighted code. But there is little discussion of specific corporate rules, policies, and processes. | Discussion in C3 gives ideas on how to test with LLMs in different testing scenarios (e.g., unit tests), along with limitations and risks (e.g., completely automated coding and testing). | Discussions in C1 on the importance of training new developers and on the dangers of using LLMs without knowing how to evaluate code. |
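
As a minimal sketch of how an organization might record a self-assessment against Table 2, the Python snippet below encodes the five stages and the five dimensions and derives a simple improvement plan. The function names, the example scores, and the rule of planning one stage at a time per dimension are illustrative assumptions and are not part of the framework described in the article.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five CMM stages from Table 2, ordered from least to most mature."""
    INITIAL = 1
    REPEATABLE = 2
    DEFINED = 3
    MANAGED = 4
    OPTIMIZED = 5


# The five dimensions (columns) of Table 2.
DIMENSIONS = ("Tools", "Coding", "Copyright & security", "Testing", "Mentorship")


def weakest_dimensions(assessment: dict[str, Stage]) -> list[str]:
    """Return the dimensions at the lowest assessed stage, in table order."""
    lowest = min(assessment[dim] for dim in DIMENSIONS)
    return [dim for dim in DIMENSIONS if assessment[dim] == lowest]


def improvement_plan(assessment: dict[str, Stage]) -> dict[str, Stage]:
    """Map each dimension that is not yet OPTIMIZED to the next stage up.

    Assumption (not from the article): improvement is planned one stage
    at a time for each dimension.
    """
    missing = set(DIMENSIONS) - set(assessment)
    if missing:
        raise ValueError(f"unassessed dimensions: {sorted(missing)}")
    return {
        dim: Stage(assessment[dim] + 1)
        for dim in DIMENSIONS
        if assessment[dim] < Stage.OPTIMIZED
    }


# Hypothetical self-assessment for an imaginary team (illustrative values only).
example = {
    "Tools": Stage.REPEATABLE,
    "Coding": Stage.REPEATABLE,
    "Copyright & security": Stage.INITIAL,
    "Testing": Stage.INITIAL,
    "Mentorship": Stage.DEFINED,
}

plan = improvement_plan(example)
print("Lowest-maturity dimensions:", weakest_dimensions(example))
for dimension, target in plan.items():
    print(f"{dimension}: {example[dimension].name} -> {target.name}")
```

Keeping the stages as an ordered enumeration makes it straightforward to compare dimensions and to see which ones lag behind; an organization could equally weight dimensions differently or set targets more than one stage ahead.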

relating to the maturity levels of current LLM-based development efforts. Unsurprisingly, given the recent introduction of LLM-based coding tools, much of the implementation of these tools is still very much ad-hoc. Many of the content-analysis discussions focused on trying out LLMs for some aspect (e.g., coding, testing, or documentation) of a developer’s workflow and then evaluating the performance of LLM-based development as opposed to traditional development. In these discussions, amid concerns about best practices for coding, testing, and training, one can see the emergence of more repeatable and defined processes.

While this CMM framework should provide developers with a roadmap to measure and improve LLM processes, it is not intended to be inflexible or overly prescriptive. As described previously, the development world is a broad church. An organization developing mission-critical software for a power station or an airline may initially want to restrict LLM usage to purely informational usage, for example, to help developers understand an existing code base. An organization developing custom macros for business or sales automation software, on the other hand, may take a more aggressive approach to LLM usage, as this area has lower barriers to entry than mission-critical software.

6. Remarks for developers

Software developers have many different job functions. Besides coding, job functions include working with clients to build requirements, testing software (both unit and system testing), and integrating systems with other systems and data sources. Application development requires strong domain knowledge that must be built over time. Having rapid access to a large amount of knowledge and being able to query this knowledge efficiently is of great use to developers. To use a computing analogy, developers only have a small portion of the knowledge required to complete their jobs stored in accessible memory, and LLMs provide a rapid interface to the remaining knowledge in long-term memory (i.e., the internet, code repositories, documentation). Software development has gone through a process of abstraction over the last 50 years, moving from assembly language to compiled languages, such as Fortran or C/C++, through interpreted languages, such as Python and Java, to templated, no- or low-code solutions (e.g., Sundberg & Holmström, 2023). The use of prompting with LLMs is another step on this path, and developers need to adapt to this environment to prosper.

LLMs are here to stay. With LLMs, productivity requirements for developers may increase, so developers need to learn how to integrate LLMs into their workflows to improve efficiency and keep up with other developers. This may be achieved by benchmarking different strategies, for example, by using LLMs just for quickly finding syntax (and not for more complex tasks) or by supplementing them with human testing and commenting, and by testing these strategies in the types of coding challenges discussed in the literature review. LLM skills are often complementary to non-LLM skills. For example, there are cases where Stack Overflow queries may receive insightful answers from
expert coders, but for simple syntax queries, LLMs provide quicker answers and avoid some of the problematic aspects of human interaction noted by commenters in the content analysis.

Many commenters in the content analysis stated that LLMs will sometimes hallucinate code, so it is a good idea to understand how well different types of prompts perform for different aspects of the specific programming language being used. For example, commenters noted that LLMs work well with Python and Java, so it may be that complex queries in these languages have a high rate of success, which may not be true for all languages. But developers need to be able to work without LLMs. Security issues in certain fields (e.g., defense), or well-publicized copyright issues or lawsuits (such as the ongoing lawsuit against OpenAI discussed in the content-analysis comments), may cause risk-averse corporations to bar LLMs, an event mentioned by several commenters in the content analysis, so developers wishing to maximize opportunities must still work on improving their traditional coding and non-LLM search skills.

Prompting skills are important. Prompting can be used for both generating code and for queries to understand code. Multiple commentators in the content analysis noted the importance of good prompts, and several speculated that prompt engineer will soon become a developer job title. In fact, courses and documentation are now available for prompt engineering. A coder’s exact prompting style will depend on the particular LLM and the language and domain of the coding problem, but it is likely that skills can be built through trying queries at different levels of abstraction and carefully examining results. In a similar fashion to the Google algorithm for search-engine optimization, it is likely that the optimal method of prompting will change with updates to the LLM algorithms, so developers will need to conduct constant testing and stay abreast of updated training materials to keep their prompting skills honed.

7. Concluding comments

Overall, GenAI and LLMs have already had a large effect on development practices and workflows. While commenters in the content analysis displayed some unease about the future of development as LLMs become more widespread, it is unlikely that software development will be destroyed as a profession. However, developers do need to adapt to this new environment. They should understand the potential of LLMs and be able to implement them to improve coding efficiency while preserving quality and accounting for potential copyright and security issues. They can achieve this by implementing and improving processes and standards for LLM usage, using frameworks such as the CMM.

Developers need to supplement traditional coding skills with skills in prompting LLMs to aid in the creation of code, documentation, and tests. As several commenters in the content analysis noted, software developers will still be needed until the advent of completely sentient AI, a hypothesized event that will transform society, and which is beyond the scope of this analysis.

References

Ardimento, P., Bernardi, M. L., Cimitile, M., Redavid, D., & Ferilli, S. (2022). Understanding coding behavior: An incremental process mining approach. Electronics, 11(3), 389.

Badampudi, D., Unterkalmsteiner, M., & Britto, R. (2023). Modern code reviews - survey of literature and practice. ACM Transactions on Software Engineering and Methodology, 32(4), 1–61.

Bird, C., Ford, D., Zimmermann, T., Forsgren, N., Kalliamvakou, E., Lowdermilk, T., & Gazit, I. (2022). Taking flight with copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue, 20(6), 35–57.

Bureau of Labor Statistics. (2022, September 8). Occupational outlook handbook: Computer and information technology occupations. Available at https://www.bls.gov/ooh/computer-and-information-technology/home.htm

Chatravorti, B. (2023, June 25). How will AI change work? A look back at the ‘productivity paradox’ of the computer age shows it won’t be so simple. Fortune. Available at https://fortune.com/2023/06/25/ai-effect-jobs-remote-work-productivity-paradox-computers-iphone-chatgpt/

Ciborowska, A., Kraft, N. A., & Damevski, K. (2018, May). Detecting and characterizing developer behavior following opportunistic reuse of code snippets from the web. In Proceedings of the 15th International Conference on Mining Software Repositories (pp. 94–97). New York, NY: Association for Computing Machinery.

Davenport, T. H., Barkin, I., & Tomak, K. (2023). We’re all programmers now. Harvard Business Review, 101(5), 98–107.

Feng, Y., Vanam, S., Cherukupally, M., Zheng, W., Qiu, M., & Chen, H. (2023). Investigating code generation performance of Chat-GPT with crowdsourcing social data. In Proceedings of the 47th IEEE Computer Software and Applications Conference (pp. 876–885). Piscataway, NJ: Institute of Electrical and Electronics Engineers.

Gershgorn, D. (2021, June 29). GitHub and OpenAI launch a new AI tool that generates its own code. Verge. Available at https://www.theverge.com/2021/6/29/22555777/github-openai-ai-tool-autocomplete-code

Hess, A. J. (2023, February 16). Ranking workers can hurt morale and productivity. Tech companies are doing it anyway. Fast Company. Available at https://www.fastcompany.com/90850190/stack-ranking-workers-hurt-morale-productivity-tech-companies

Horne, J., Recker, M., Michelfelder, I., Jay, J., & Kratzer, J. (2020). Exploring entrepreneurship related to the sustainable development goals-mapping new venture activities with semi-automated content analysis. Journal of Cleaner Production, 242, 118052.

Hughes, A. (2023, September 25). ChatGPT: Everything you need to know about OpenAI’s GPT-4 tool. BBC. Available at https://www.sciencefocus.com/future-technology/gpt-3/

Imai, S. (2022, May). Is GitHub Copilot a substitute for human pair-programming? An empirical study. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering (pp. 319–321). New York, NY: Association for Computing Machinery.

Kumar, R., Hasteer, N., & Van Belle, J. P. (2018, January). Evaluating factors influencing contestant behavior in competitive software development. In 2018 8th International Conference on Cloud Computing, Data Science, and Engineering (pp. 20–25). Piscataway, NJ: Institute of Electrical and Electronics Engineers.

Legenvre, H., & Gualandris, J. (2018). Innovation sourcing excellence: Three purchasing capabilities for success. Business Horizons, 61(1), 95–106.

Li, A., Endres, M., & Weimer, W. (2022, May). Debugging with Stack Overflow: Web search behavior in novice and expert programmers. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering (pp. 69–81). New York, NY: Association for Computing Machinery.

MacRae, D. (2023, May 16). A quarter of tech firms use generative AI for software development. Developer. Available at https://www.developer-tech.com/news/2023/may/16/a-quarter-of-tech-firms-use-generative-ai-for-software-development/

Marr, B. (2023, May 19). A short history of ChatGPT: How we got to where we are today. Forbes. Available at https://www.forbes.com/sites/bernardmarr/2023/05/19/a-short-history-of-chatgpt-how-we-got-to-where-we-are-today/?sh=299ae6a4674f

Meyer, A. N., Murphy, G. C., Zimmermann, T., & Fritz, T. (2021). Enabling good work habits in software developers through reflective goal-setting. IEEE Transactions on Software Engineering, 47(9), 1872–1885.

Nguyen, N., & Nadi, S. (2022, May). An empirical evaluation of GitHub Copilot’s code suggestions. In Proceedings of the 19th International Conference on Mining Software Repositories (pp. 1–5). New York, NY: Association for Computing Machinery.

Noll, B., Korakitis, K., Solodkov, N., Dodd, L., & Muir, D. (2023, May 1). State of the Developer Nation 24th Edition - Q1 2023. Available at https://www.developernation.net/resources/reports/state-of-the-developer-nation-24th-edition-q1-2023

Paulk, M. C. (2009). A history of the capability maturity model for software. ASQ Software Quality Professional, 12(1), 5–19.

Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2022, May). Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions. In 2022 IEEE Symposium on Security and Privacy (pp. 754–768). Piscataway, NJ: Institute of Electrical and Electronics Engineers.

Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot. arXiv. Available at https://arxiv.org/abs/2302.06590

Robillard, M. P., Coelho, W., & Murphy, G. C. (2004). How effective developers investigate source code: An exploratory study. IEEE Transactions on Software Engineering, 30(12), 889–903.

Sadiq, R. B., Safie, N., Abd Rahman, A. H., & Goudarzi, S. (2021). Artificial intelligence maturity model: A systematic literature review. PeerJ Computer Science, 7, e661.

Sadowski, C., Söderberg, E., Church, L., Sipko, M., & Bacchelli, A. (2018, May). Modern code review: A case study at Google. In Proceedings of the 40th International Conference on Software Engineering (pp. 181–190). New York, NY: Association for Computing Machinery.

Sloyan, T. (2021, June 8). Is there a developer shortage? Yes, but the problem is more complicated than it looks. Forbes. Available at https://www.forbes.com/sites/forbestechcouncil/2021/06/08/is-there-a-developer-shortage-yes-but-the-problem-is-more-complicated-than-it-looks

Sommers, J. (2023, June 13). How to create with AI, including art, music, and writing, according to people who’ve written songs, stories, and letters. Business Insider. Available at https://www.businessinsider.com/how-to-create-with-ai-art-music-writing-chatgpt-dall-e-2-2023-6

Sundberg, L., & Holmström, J. (2023). Democratizing artificial intelligence: How no-code AI can leverage machine learning operations. Business Horizons, 66(6), 777–788.

Trueman, C. (2023, June 19). Tech layoffs in 2023: A timeline. Computerworld. Available at https://www.computerworld.com/article/3685936/tech-layoffs-in-2023-a-timeline.html

Vincent, J. (2023, June 13). Stack Overflow survey finds developers are ready to use AI tools - Even if they don’t fully trust them. Verge. Available at https://www.theverge.com/2023/6/13/23759101/stack-overflow-developers-survey-ai-coding-tools-moderators-strike

Wademan, M. R., Spuches, C. M., & Doughty, P. L. (2007). The people capability maturity model. Performance Improvement Quarterly, 20(1), 97–123.

Xia, X., Bao, L., Lo, D., Kochhar, P. S., Hassan, A. E., & Xing, Z. (2017). What do developers search for on the web? Empirical Software Engineering, 22, 3149–3185.

Zhang, B., Liang, P., Zhou, X., Ahmad, A., & Waseem, M. (2023). Practices and challenges of using GitHub Copilot: An empirical study. arXiv. Available at https://arxiv.org/abs/2303.08733