AI Copilot Code Quality 2025
But if you ask a Senior Developer “what would unlock your team’s full potential?” their answer won’t be “more code lines added.” To retain high project velocity over years, research suggests that a DRY (Don’t Repeat Yourself), modular approach to building is essential[4][5]. Canonical systems are documented, well-tested, reused, and periodically upgraded.
The data in this report contains multiple signs of eroding code quality. This is not to say that AI isn’t incredibly useful. But it is to say that the frequency of copy/pasted lines in commits grew 6% faster than our 2024 prediction. Meanwhile, the percent of commits with duplicated blocks grew even faster. Our research suggests a path by which developers can continue to generate distinct value from code assistants into the foreseeable future.
But here was something we did not predict: how much effort Google would invest into exploring this particular question. They went so far as to attempt to formalize the exact rate at which “greater AI adoption” could be mapped to “higher defect rate”:
Google DORA 2024’s extrapolated change in delivery stability per 25% increase in AI adoption
In a common theme across 2024 AI projections: we got the general direction right, but
underestimated the magnitude of the change.
Bottom line: 2024 marked the first year GitClear has ever measured where the number of “Copy/Pasted” lines exceeded the count of “Moved” lines. Moved lines strongly suggest refactoring activity. If the current trend continues, we believe it
could soon bring about a phase change in how developer energy is spent, especially
among long-lived repos. Instead of developer energy being spent principally on
developing new features, in coming years we may find “defect remediation” as the
leading day-to-day developer responsibility.
2024: The first year on record where within-commit copy/paste exceeded moved lines
Even when managers focus on more substantive productivity metrics, like “tickets solved” or “commits without a security vulnerability,” AI can juice these metrics by
duplicating large swaths of code in each commit. Unless managers insist on finding
metrics that approximate “long-term maintenance cost,” the AI-generated work their
team produces will take the path of least resistance: expand the number of lines
requiring indefinite maintenance.
In our 2024 research (analyzing data from 2023), we cautioned that the decline of
“Moved” code, toward our projection of 13.4%, coupled with the rise of “Copy/Paste”
code, toward a projected 11.6%, would create challenges to code maintainability in
2024.
The actual breakdown of code lines committed during 2024 was substantially worse than our projection.
Beyond that, the developer needs to understand the code they are refactoring well enough to be confident they won’t break existing callers that relied on the original behavior of the potentially refactored method.
The good news is that AI is already helping developers with the most trivial types of rewrites. When considering the 17% increase in Find/Replace code, credit is due to AI IDEs like Cursor that will, by default, rewrite code that does not adhere to the project’s linting rules[13]. In practice, it means that projects are, at least, more consistent on a per-line basis. Unfortunately, per-line consistency is not the primary impediment to long-term maintainability.
Refactored systems in general, and moved code in particular, are the signature of code reuse. As a product grows in scope, developers traditionally rearrange existing code into new modules and files, to reduce the number of systems that need to be maintained. Code reuse leaves fewer concepts (and “gotchas”) for new team members to learn. It also increases the testing and documentation infrastructure that can accumulate each time the system is invoked by a new team member.
In four years, the rate of this code reuse artifact has fallen to less than half its 2020 (“fully human”) rate, sinking from 25% of changed lines in 2021 to under 10% in 2024. If this were a matter of “reduced developer communication,” the trend line would not have dropped by 44% in 2024 alone (from 16.9% to 9.5%). The growing “back to office” mandates suggest that “less reuse” will carry the day even when developers sit in the same room. Will leaders proactively champion the cause of “code reuse,” or will it be discarded on behalf of “more lines equals more done”? Difficult to predict; we would welcome opinions from CTOs sent to our Contact Information. We can report on what we hear in next year’s report.
Google’s latest report suggests that AI has shown great promise in its ability to improve code documentation. But if we keep duplicating snippets within functions, capturing the potential benefit of “well-documented functions” will remain elusive.
Modern code assistants are constantly suggesting multi-line code blocks to be inserted through a single press of the “tab” key. It’s readily apparent, from using these tools, that many of the suggested code blocks have their origins in existing code. Since the most popular code assistant of 2024 was limited to roughly 10 files that could fit in its context window[A6], we wondered if it would be possible to measure an uptick in the percent of newly authored code that duplicated adjacent existing code lines[16].
The following table shows the measured frequency of commits that contain a duplicate block, by the year the code was authored:
Year | Commits scanned | Total dupe blocks found | Commits containing dupe block | Duplicate block % | Median dupe block size
2020 | 19,805 | 9,227 | 139 | 0.70% | 10
2021 | 29,912 | 9,295 | 143 | 0.48% | 11
2022 | 40,010 | 10,685 | 182 | 0.45% | 11
2023 | 41,561 | 20,448 | 747 | 1.80% | 10
2024 | 56,495 | 63,566 | 3,764 | 6.66% | 10
According to our duplicate block detection method[A8], 2024 was without precedent
in the likelihood that a commit would contain a duplicated code block. The prevalence
of duplicate blocks in 2024 was observed to be approximately 10x higher than it had
been two years prior.
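For intuition about what this measurement entails, here is a minimal sketch of within-commit Type 1 duplicate detection. The block-size threshold and normalization rules are illustrative assumptions; GitClear’s production method is described in [A8].

```python
from collections import defaultdict

# Minimal sketch of within-commit duplicate block detection, NOT GitClear's
# production method [A8]. MIN_BLOCK and the TRIVIAL set are assumed values.
MIN_BLOCK = 6                               # assumed minimum block length
TRIVIAL = {"", "}", "end", "})", "];"}      # delimiter-only lines to skip

def normalize(line: str) -> str:
    """Collapse whitespace so indentation changes don't defeat matching."""
    return " ".join(line.split())

def duplicate_blocks(added_lines_by_file: dict) -> list:
    """added_lines_by_file maps file path -> lines added by one commit.
    Returns each normalized MIN_BLOCK-line run that appears in 2+ files."""
    locations = defaultdict(set)            # block -> files containing it
    for path, lines in added_lines_by_file.items():
        norm = [normalize(l) for l in lines if normalize(l) not in TRIVIAL]
        for i in range(len(norm) - MIN_BLOCK + 1):
            block = tuple(norm[i:i + MIN_BLOCK])
            locations[block].add(path)
    return [block for block, files in locations.items() if len(files) > 1]
```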
Instead of focusing on the specific Jira ticket they had been assigned, the developer must now tack on the effort of understanding every system in the repo that duplicates the code block they are changing. Likewise for every developer who reviews the pull request that changes code from disparate domains. They, too, must acquaint themselves with every context where a duplicated block was changed, to render an opinion on whether the change in context seems prone to defect.
Next, each changed code domain must be tested. Picking the method to test these “prerequisite sub-tasks” saps the mental stamina of the developer. And because the impromptu chore of “updating duplicate blocks” is rarely, if ever, budgeted into the team’s original Story Point estimate, duplication is a significant obstacle to keeping on schedule. As a developer falls further behind their target completion time, their morale is prone to drop further. From the developer’s perspective, a task that had seemed so simple has now ballooned into systems where they might have little or no familiarity. All this burden, without any observable benefit to teammates or executives.
By analyzing 3,113 pairs of co-changed code lines, the researchers observe that “57.1% of all co-changed clones are involved in bugs.”
In breaking down the prevalence of the three different types of code clones[A5], the
researchers find:
That is, Type 1 clones (clones identical save for spaces and new lines) are often
present in different folders, but aside from that, the three types of clones are
distributed evenly between “same file,” “same folder,” and “different folder.”
When it comes to assessing the effect of these clones, the authors observe:
Numerous studies have shown that code clones in traditional software systems
could cause bugs[4][7][8][9][10][11][12]
Their research strongly suggests that the elevated rate of bugs in cloned code is
contributing to the higher baseline of errors and defects observed since 2022.
Further research exploring the relationship between “Cloned code” and “Development outcomes” can be found in the appendix[A9]. While there are many different approaches that researchers have used to evaluate the consequences of cloned blocks, all research we reviewed over the past 10 years affirmed cloned code (in particular, cloned code blocks that do not consistently co-evolve) as a source of observed defects, proportional to how often such clones exist within a repo.
Google DORA’s 2024 survey included 39,000 respondents, enough sample size to evaluate how the reported AI benefit of “increased developer productivity” mixed with the AI liability of “lowered code quality.” That research has since been released, with Google researchers commenting:
This graph shows that for every 25% increase in the adoption of AI, their model projects a 7.2% decrease in “delivery stability.” But Google does not have a clear explanation for the decline:
Given the evidence from the survey that developers are rapidly adopting AI, relying on it, and perceiving it as a positive performance contributor, we found the overall lack of trust in AI surprising.
Google researchers see AI adoption increasing, and they see developers reporting greater productivity, and so the decrease in quality is interpreted as unexpected.
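To make the shape of that extrapolation concrete, here is a compounding interpretation of the 7.2%-per-25% figure. The multiplicative form is our assumption; DORA does not publish the model in this shape.

```python
# Hypothetical compounding of DORA's reported relationship: 7.2% lower
# delivery stability per 25% increase in AI adoption. The multiplicative
# form is an assumption, not DORA's published formula.
def projected_stability(baseline: float, adoption_increase_pct: float) -> float:
    steps = adoption_increase_pct / 25.0     # number of 25% adoption steps
    return baseline * (1 - 0.072) ** steps

print(projected_stability(100.0, 25.0))      # 92.8
print(projected_stability(100.0, 50.0))      # ~86.1: two steps compound
```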
Here is the age of about eight million changed lines across our ~10,000 repo sample set over the past five years[A0]:
Note that this data does not include newly added lines – we analyzed those in the
“Trends in Code Changes” section. Here’s how the percentage of code being changed
within a month of authorship looks since 2020:
But the 2024 ratios for “what type of code is being revised” do not paint an encouraging picture. During the past year, only 20% of all modified lines were changing code that was authored more than a month earlier, whereas in 2020, 30% of modified lines were in service of refactoring existing code.
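This provenance measurement can be approximated on any repo by comparing each modified line’s blame date to the modifying commit’s date. The sketch below illustrates the bucketing logic only, not GitClear’s pipeline; the timestamps are assumed to have been pre-extracted (e.g., via git blame).

```python
from datetime import datetime, timedelta

# Illustrative bucketing only; assumes (authored_at, modified_at) pairs have
# already been extracted per modified line, e.g., via `git blame`.
def age_bucket(authored_at: datetime, modified_at: datetime) -> str:
    age = modified_at - authored_at
    if age <= timedelta(weeks=2):
        return "under 2 weeks"
    if age <= timedelta(days=30):
        return "2 weeks to 1 month"
    return "over 1 month"    # revisions here resemble legacy maintenance

changed = [(datetime(2024, 3, 1), datetime(2024, 3, 4)),    # young code
           (datetime(2023, 1, 10), datetime(2024, 3, 4))]   # legacy code
counts: dict = {}
for authored_at, modified_at in changed:
    bucket = age_bucket(authored_at, modified_at)
    counts[bucket] = counts.get(bucket, 0) + 1
print(counts)   # {'under 2 weeks': 1, 'over 1 month': 1}
```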
In git repos that maintain their change velocity over the course of years, developers must divide their attention between implementing new features and updating/removing legacy code. If every new feature is implemented via added code (without seeking out update, move, and delete opportunities), the repo grows crowded with cloned and near-cloned blocks. Adding new developers becomes increasingly expensive, as they struggle to determine which function, among a multitude of similar choices, should be considered the “canonical” version that their code should call.
To gauge whether the shift toward modifying newer code was driven by “changing team focus” vs. “changing author tendencies,” here’s another set of yearly data points:
In the data table from [A7], we analyzed only lines that were new to the repo. For the 45 million lines that met this criterion, we measured the percentage of newly added lines that went on to be revised within 2 or 4 weeks.
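A minimal sketch of that measurement as we understand [A7]: given each new line’s add date and first revision date (if any), count the share revised inside a window. The input shape is assumed for illustration.

```python
from datetime import date, timedelta

# Sketch of the [A7]-style measurement. Each tuple is a newly added line:
# (added_on, first_revised_on or None). The input shape is an assumption.
def pct_revised_within(lines, window_days: int) -> float:
    window = timedelta(days=window_days)
    revised = sum(1 for added_on, revised_on in lines
                  if revised_on is not None and revised_on - added_on <= window)
    return 100.0 * revised / len(lines)

sample = [(date(2024, 5, 1), date(2024, 5, 10)),   # revised after 9 days
          (date(2024, 5, 1), None),                # never revised
          (date(2024, 5, 1), date(2024, 7, 1))]    # revised after 61 days
print(pct_revised_within(sample, 14))   # 33.3...: one of three within 2 weeks
print(pct_revised_within(sample, 28))   # 33.3...: still one within 4 weeks
```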
The trend line here is a little cagey, with 2023 faking a return toward pre-AI levels. But
if we consider 2021 as the “pre-AI” baseline, this data tells us that, during 2024, there
was a 20-25% increase in the percent of new lines that get revised within a month.
Google DORA’s 2024 report shows that developers trust the current generation of AI assistants about as much as we trusted the previous generation, i.e., not much:
The median developer trusts AI-generated code “Somewhat,” with good reason
There is a lot of utility that AI provides, but the data from this year affirms why
long-term-oriented devs might eye their “tab” key with a faint sense of foreboding.
2. Stack Overflow: 2024 State of Development Survey [Stack Overflow, 2024]
3. An empirical study on bug propagation through code cloning [ScienceDirect, Mondal et al., 2019]
4. On the Relationship of Inconsistent Software Clones and Faults [Arxiv, Wagner et al., 2016]
5. Exploring the Impact of Code Clones on Deep Learning Software [Association for Computing Machinery, 2023]
6. Four Worst Software Metrics Agitating Developers [GitClear, 2021]
7. Bug Replication in Code Clones: An Empirical Study [IEEE, Islam, Mondal, 2016]
8. Context-based detection of clone-related bugs [FSE Conference, Jiang, Su, 2007]
9. Assessing the effect of clones on changeability [IEEE, Lozano, Wermelinger, 2008]
10. Tracking clones' imprint [IWSC Proceedings of the 4th International Workshop on Software Clones, Lozano, Wermelinger, 2010]
11. Comparative stability of cloned and non-cloned code: an empirical study [ACM Symposium, Mondal et al., 2012]
12. A Change-Type Based Empirical Study on the Stability of Cloned Code [IEEE, Rahman, Roy, 2014]
13. Cursor IDE automatically rewrites linter violations [Cursor, 2025]
14. CCFinder: a multilinguistic token-based code clone detection system for large scale source code [IEEE, Kamiya et al., 2002]
15. Oreo: detection of clones in the twilight zone [ACM, Saini, 2018]
16. Duplicate Code Block Detection [GitClear, 2025]
1. Added code. Newly committed lines of code that are distinct, excluding lines that incrementally change an existing line (labeled "Updates"). "Added code" also does not include lines that are added, removed, and then re-added (these lines are labeled as "Updated" and "Churned").
2. Deleted code. Lines of code that are removed, committed, and not subsequently re-added for at least the next two weeks.
3. Moved code. A line of code that is cut and pasted to a new file, or a new function within the same file. By definition, the content of a "Moved" operation doesn't change within a commit, except for (potentially) the white space that precedes the content.
4. Updated code. A committed line of code, based off an existing line of code, that modifies the existing line by approximately three words or less.
5. Find/Replaced code. A pattern of code change where the same string is removed from 3+ locations and substituted with consistent replacement content.
6. Copy/Pasted code. Identical line contents, excluding programming language keywords and delimiters (e.g., end, }, [), that are committed to multiple files or functions within a commit.
7. No-op code. Trivial code changes, such as changes to white space, or changes in line number within the same code block. No-op code is excluded from this research.
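To make the taxonomy concrete, here is a deliberately simplified sketch of how one changed line might be routed among these labels. The ordering and thresholds are our assumptions; the authoritative rules live in the Diff Delta documentation.

```python
# Toy router for the operations defined above. Real Diff Delta classification
# also tracks two-week revert windows, cross-file moves, and 3+ location
# find/replace patterns; the ordering and heuristics here are assumptions.
def classify_line(new_line: str, prior_line, moved_intact: bool,
                  duplicated_elsewhere: bool) -> str:
    if prior_line is not None and new_line.strip() == prior_line.strip():
        return "No-op"            # only whitespace or line position changed
    if moved_intact:
        return "Moved"            # cut and pasted to a new file or function
    if duplicated_elsewhere:
        return "Copy/Pasted"      # identical content in 2+ files/functions
    if prior_line is not None:
        word_delta = abs(len(new_line.split()) - len(prior_line.split()))
        if word_delta <= 3:
            return "Updated"      # small revision of an existing line
    return "Added"                # genuinely new, distinct content

print(classify_line("total += price", None, False, False))               # Added
print(classify_line("  total += price", "total += price", False, False)) # No-op
```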
Specific examples of GitClear's code operations can be found in the Diff Delta documentation. GitClear has been classifying git repos by these operations since 2020. As of January 2025, GitClear has analyzed and classified around a billion lines of code over five years, from a mixture of commercial customers (e.g., NextGen Health, Bank of Georgia) and popular open source repos (e.g., Facebook React, Google Chrome). 211 million lines of code were meaningful (not No-op) line changes, used for this research.
Along with the evolution of code change operations, we are also exploring the change in "Churned code." This is not treated as a code operation, because a churned line can be paired with many operations, including "Added," "Deleted," or "Updated" code. For a line to qualify as "churned," it must have been authored, pushed to the git repo, and then reverted or substantially revised within the subsequent two weeks. Churn is best understood as "changes that were quickly reverted or rewritten."
[Table: yearly line counts for Added, Deleted, Updated, Moved, Copy/Pasted, Find/Replaced, All lines changed, and Churn*]
* This is any line that was authored, and then subsequently changed within 2 weeks. Since this includes lines that might get re-re-revised, we opted to focus this year more on the percent of churned lines that had been newly authored (in the Churn Percent section)
Open source repos that were analyzed as part of the data set included:
1. Chromium (chromium/chromium)
2. Facebook React (facebook/react)
1. Added code
2. Updated code
3. Deleted code
4. Copy/pasted code
5. Find/replaced code
6. Moved code
7. No-op code
GitClear’s data is split about two-thirds private corporations that have opted in to anonymized data sharing, and one-third open source projects (mostly those run by Google, Facebook, and Microsoft). In addition to the code operation data, GitClear’s data set also segments and excludes lines if they exist within auto-generated files, subrepo commits, and other exclusionary criteria enumerated in this documentation. As of February 2025, that documentation suggests that a little less than half of the “lines changed” by a conventional git stats aggregator (e.g., GitHub) would qualify for analysis among the 211m lines in this study. The study does include commented lines – future research could compare comment vs. non-comment lines. It could also compare “test code” vs. “other types of code,” which probably influences the levels of copy/paste.
If you know of other companies that report code operations of comparable granularity, please contact [email protected] and this section will be updated, and a new PDF document will be uploaded with credit given to the contributor (if desired).
Here’s a rundown of the pertinent documentation for the reports that were referenced in this research:
1. Viewing “Copy/Paste” and “Moved” percent compared to industry benchmarks
2. Setting up team goals to track where cloned code blocks are being introduced
3. “Code Operation” and “Diff Delta velocity” reports
4. Code Line Provenance graph: How old is the code that is being revised, by team, by repo, or by time range?
5. Tech Debt browser helps to identify directories with atypical velocity or bug characteristics
6. Pull Request Review tool: provably reduces lines to review by roughly 30%
For the metrics that matter most, many teams opt to set up Slack notifications. For example, notifications can be set up to trigger if any of the following occur (a sketch of one trigger’s logic follows the list):
● A pull request has been awaiting review longer than one business day
● More than n business days (n = user configurable) have passed since the last activity (comment or commit) on a pull request
● A pull request has experienced more than n rounds of feedback
● Multiple cloned code blocks are found, but one was modified while the other(s) weren’t
● The team’s overall percentage of “Bug work” exceeds some pre-determined threshold
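As a sketch of what the “n business days without PR activity” trigger might evaluate (GitClear configures these rules in its UI, so this code shape is hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical logic for the "n business days without PR activity" trigger;
# GitClear configures these rules in its UI, so this shape is illustrative.
def pr_needs_nudge(last_activity: datetime, now: datetime, n: int) -> bool:
    business_days, cursor = 0, last_activity
    while cursor.date() < now.date():
        cursor += timedelta(days=1)
        if cursor.weekday() < 5:        # Monday-Friday count as business days
            business_days += 1
    return business_days > n

print(pr_needs_nudge(datetime(2025, 1, 3), datetime(2025, 1, 7), 1))  # True
```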
GitClear also offers Automated Changelogs, which can be used within a GitHub Profile [example], a repo readme [same invocation as on profile], or embedded within a product’s website [example].
Thankfully, short-term churn has not yet reached 7%, but the overall shift toward increasingly
recent code being revised was sufficient to drive the increase in defects noted by the 2024
Google DORA report.
● Type 1 indicates two code fragments are identical except for blank spaces or comments. It is also called an exact clone.
● Type 2 indicates two code fragments have similar syntax, but contain different variable names, constant names, function names, and so on. It is also called a renamed clone. Figure 1 shows an example of a Type 2 clone.
● Type 3 indicates two code fragments that add, delete, or modify statements on top of a Type 2 clone. It is also called a gap clone. Figure 2 shows an example of a Type 3 clone.
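Our own minimal Python illustration of the three types (the paper’s figures use different code):

```python
# Type 1 (exact clone): identical except for whitespace/comments.
def order_total(items):
    return sum(item.price for item in items)

def order_total_copy(items):
    # only this comment differs from order_total
    return sum(item.price for item in items)

# Type 2 (renamed clone): same syntax, different identifiers.
def invoice_total(records):
    return sum(record.cost for record in records)

# Type 3 (gap clone): a Type 2 clone with statements added or modified.
def invoice_total_guarded(records):
    if not records:                # the added guard creates the "gap"
        return 0
    return sum(record.cost for record in records)
```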
How much context window did GitHub Copilot offer in 2024? GitHub does not offer a documented number for this, but it’s not especially difficult to piece together an estimate:
● Reddit post from August 2024 asking why context window is limited to 4-8k depending on backend chosen
● GitHub blog post from December 2024 announcing the new availability of 64k context for queries made to ChatGPT-4o implies that the context window couldn’t have been more than 32k previously in 2024
● GitHub issue from 2023 asking why Copilot is limited to 8192 tokens
Taken together, it seems likely that GitHub Copilot offered context in the 4-8k range, as the Reddit post describes. If a project’s files average 300 lines of 50-character-long lines, each file would consume roughly (300 lines * 50 characters) / (4 characters per token) = 3,750 tokens, which implies that around two full files of this magnitude could fit into an 8k context window.
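Written out as code, with every input being one of the assumptions stated above:

```python
# Back-of-napkin estimate only; all inputs are the assumptions stated above.
lines_per_file = 300
chars_per_line = 50
chars_per_token = 4                 # common rule of thumb for code tokenizers
context_tokens = 8_192              # upper end of the reported 4-8k range

tokens_per_file = lines_per_file * chars_per_line / chars_per_token
print(tokens_per_file)                    # 3750.0 tokens per average file
print(context_tokens / tokens_per_file)   # ~2.2 full files fit in 8k
```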
While this extrapolation relies on a heavy dose of “back-of-napkin math,” it is consistent with all published information to estimate that the most popular code assistant could probably fit no more than 10 full files of context during 2024.
no more than 10 full files of context during 2024. The rest of the most popular code-writing
Year | New lines added | Revised in under a month | Percent | Revised in under 2 weeks | Percent | New lines added (excl. test) | Churned in under a month | Churn percent | Churned in under 2 weeks | Percent
2020 | 5,434,044 | 298,369 | 5.49% | 259,921 | 4.78% | 4,759,520 | 271,202 | 5.70% | 235,464 | 4.33%
2021 | 6,374,719 | 405,372 | 6.36% | 357,602 | 5.61% | 5,499,090 | 371,847 | 6.76% | 329,250 | 5.16%
2022 | 7,118,977 | 529,287 | 7.43% | 462,304 | 6.49% | 6,261,462 | 480,336 | 7.67% | 418,699 | 5.88%
2023 | 13,805,181 | 901,014 | 6.53% | 773,878 | 5.61% | 12,079,904 | 824,071 | 6.82% | 706,422 | 5.12%
2024 | 12,762,612 | 1,009,552 | 7.91% | 898,991 | 7.04% | 10,700,097 | 887,648 | 8.30% | 788,268 | 6.18%
To backfill years prior to 2024, we queued up the largest (by Diff Delta) 1,000 commits made to each repo between 2020 and 2023. Each of these 1,000 commits was evaluated for the presence of Type 1 cloned code blocks[A5].
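In sketch form, the backfill sampling amounts to the following (the commit tuple shape is assumed for illustration; the real pipeline differs):

```python
# Sketch of the backfill sampling described above. Each commit is assumed to
# be a (sha, diff_delta, authored_year) tuple; the real pipeline differs.
def backfill_sample(repo_commits, per_repo_limit: int = 1000):
    eligible = [c for c in repo_commits if 2020 <= c[2] <= 2023]
    # Largest commits by Diff Delta get precedence, matching the method above.
    return sorted(eligible, key=lambda c: c[1], reverse=True)[:per_repo_limit]
```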
Since our backfill method gave precedence to analyzing large commits, and “more lines”
implies “more opportunity to contain a duplicate block,” our method should bias toward
reporting a greater percentage of “commits with a duplicate block” for the years prior to 2024.
That the analyzed commits showed negligible code cloning, relative to 2024, further supports
the notion that cloned code has been on a steep upward trajectory.
While their methods (and repos analyzed) were disparate, the papers agreed within 1.4% in
their assessment of bug prevalence within code clones. The fact that both research teams
detected a 17-18% rate of propagated bugs lurking in cloned code suggests a level of
robustness to their collective estimation.
● No updates have been made since publication on February 4, 2025. We will list substantive revisions to the research here, to the extent they transpire.
Contact information
It’s tempting to predict that the momentum of “Added,” “Churned,” and “Copy/Paste” code lines is fueling a rise in the number of lower-quality pull requests being fielded by dev
teams. If you have any experience with, or interest in, this question, drop us a line and we
might investigate it in 2025. If we do investigate it, we pledge to write up and make our results
publicly available, whether they demonstrate a positive, negative, or null result.
If you would like to discuss this research, or have ideas on how to improve it, please contact
[email protected] or [email protected] directly.
We are happy to consider improvements to the clarity of this writing, or to explain how
GitClear can help teams measure the metrics explored by this research.