Understanding Bugs in Rust Compilers
Abstract: Rust compilers play a foundational role in the Rust language. Like any complex system, they are susceptible to bugs, which can impact the correctness and reliability of the compiled Rust programs. To gain a deeper understanding of these bugs, this paper presents the first comprehensive analysis of historical bugs in two widely used Rust compilers: Rustc and Rust-GCC. The analysis delves into the bugs’ characteristics, bug-proneness locations, bug root causes, and bug-fixing efforts. The findings reveal that the majority of bugs in Rustc are associated with the compiler’s kernel, while Rust-GCC experiences most bugs related to the cleanup process. Among all modules, the ‘src/librustc’ module exhibits the highest bug-proneness in the Rustc compiler, whereas the ‘gcc/rust’ module demonstrates the highest bug-proneness in the Rust-GCC compiler. Furthermore, the study reveals that the bug-fixing process is accelerated when test cases utilize Rust’s concurrency features.

Keywords: Empirical Study; Rust Compiler; Historical Bug

1. INTRODUCTION

Rust is a young programming language known for its safety, concurrency, and performance [18]. It relies on its compilers to translate Rust code into efficient machine code. However, like any complex software system, Rust compilers are not immune to bugs. These bugs may introduce critical errors and impact the correctness and reliability of the compiled programs. Identifying and diagnosing bugs in Rust compilers is an important but challenging task. Rust compilers consist of multiple components and complex algorithms for parsing, semantic analysis, optimization, and code generation. The language itself prioritizes safety and reliability through features like ownership, borrowing, and strict static typing. While these features reduce the likelihood of bugs, when bugs do occur, they often manifest as subtle behavioral differences or unexpected output, making them difficult to identify. Moreover, the translation of high-level code to machine code introduces additional complexities and potential bug opportunities. These bugs may not have an evident source in the original code, making it harder to pinpoint their origin. As a result, locating and fixing bugs in Rust compilers poses challenges for developers and maintainers. Thus, it is crucial to understand these bugs, as doing so enables efficient bug detection and resolution. This understanding instills confidence in using Rust as a programming language, knowing that efforts are made to minimize risks associated with compiler bugs. The insights gained from studying these bugs may also provide inspiration for researchers in the field of programming languages.

There have been many recent empirical studies on open-source software bugs, while empirical studies on the bugs of Rust compilers are still lacking. Current research on Rust mainly focuses on memory and thread safety practices. For instance, Qin et al. [21] analyze memory and thread safety practices in real-world Rust programs, identifying memory-safety issues and concurrency bugs. Their findings offer insights into Rust program behaviors and propose directions for building bug detectors. Evans et al. [9] analyze the usage and safety implications of unsafe Rust in real-world libraries and applications. They reveal the presence of unsafe Rust in call chains and discuss the challenges for Rust’s memory safety guarantees.

To bridge the existing research gap regarding bugs in Rust compilers, this paper presents the first empirical study on the characteristics of bugs in Rust compilers. Our investigation focuses on the historical bugs in two widely used open-source Rust compilers: Rustc and Rust-GCC. The open-source nature of these compilers facilitates our empirical study. To gather comprehensive data, we collected issues, commits, pull requests, and source code from GitHub, covering the period up until June 19, 2023. This data enables us to address the following four research questions.
• RQ1: What are the characteristics of bug-triggering test cases? (Bug-triggering Test Cases)
• RQ2: What are the underlying root causes of bugs in Rust compilers? (Bug Root Causes)
• RQ3: Which parts of Rust compilers are more susceptible to bugs? (Bug-Proneness Locations)
• RQ4: What factors influence the duration of bug fixing in Rust compilers? (Bug-Fixing Factors)

The four research questions provide a comprehensive exploration of bugs in Rust compilers, examining them from various perspectives: bug-triggering test cases, root causes of bugs, locations prone to bugs, and factors influencing bug fixing. The findings addressing these research questions can be summarized as follows:
• Answer to RQ 1: Bug-triggering test cases in both Rustc and Rust-GCC have an average of fewer than 14 lines of code (LoC) and fewer than 3 variables. In Rustc, bug-triggering test cases exhibit an average of 2.15 usages of ownership and borrowing, 0.35 usages of error handling, 0.19 usages of pattern matching, and 0.22 usages of unsafe Rust code. In Rust-GCC, bug-triggering test cases have an average of 1.14 usages of ownership and borrowing, 0.19 usages of error handling, and 0.23 usages of unsafe Rust code.
RQ 4 explores the factors influencing bug fixing. The objective of RQ 4 is to investigate the factors that influence the duration required to fix bugs in the Rust compiler. The findings may contribute to understanding bug-fixing duration and provide insights for improvements that streamline bug resolution and enhance overall development efficiency. To answer RQ 4, we examine how developers’ activities and the Rust features of test cases influence bug duration, and we investigate their relationships with Pearson [19] and Spearman [26] correlation coefficients.

to 203,146, indicating active development and maintenance. The modified files in Rust-GCC reach 1,014,668, reflecting substantial code modifications.

TABLE II
THE STATISTICS OF ISSUES.
           Open            Reopen      Completed        Not Planned
Rustc      8,915 (18.2%)   82 (0.2%)   39,326 (80.4%)   566 (1.2%)
Rust-GCC   278 (28.9%)     3 (0.3%)    679 (70.5%)      3 (0.3%)
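As a sanity check on Table II, the percentage printed next to each count can be recomputed from the raw counts. The sketch below is illustrative only: the counts are copied from the table, while the dictionary layout and the function name `status_shares` are assumptions made for this example and are not part of the study's tooling.

```python
# Issue counts copied from Table II; the data layout is an assumption for illustration.
ISSUE_COUNTS = {
    "Rustc": {"Open": 8915, "Reopen": 82, "Completed": 39326, "Not Planned": 566},
    "Rust-GCC": {"Open": 278, "Reopen": 3, "Completed": 679, "Not Planned": 3},
}


def status_shares(counts):
    """Return each status as a percentage of all issues, rounded to one decimal."""
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


for project, counts in ISSUE_COUNTS.items():
    # Print the project name, its total number of issues, and the share per status.
    print(project, sum(counts.values()), status_shares(counts))
```

Under these assumptions the recomputed shares agree with the percentages reported in the table for both projects.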
TABLE IV
THE STATISTICS OF COMMITS.
TABLE V
THE STATISTICS OF BUG-TRIGGERING TEST CASES.
closed. Notably, a significant percentage of the closed pull requests, i.e., 1,222 (91.3%), have been successfully merged, indicating a high level of acceptance and integration. However, a small proportion, i.e., 117 (8.7%), of the closed pull requests have not been merged. The table highlights the differences in pull request statuses between the two projects. Rustc exhibits a higher number of closed pull requests, with a considerable percentage already merged. Rust-GCC also demonstrates a predominantly closed status, with a high merge rate. However, the proportion of open pull requests and those not yet merged is higher in Rustc than in Rust-GCC.

3.3 Commits
Table IV presents the statistics of commits in Rustc and Rust-GCC, including the verification status and types of commits. For Rustc, out of a total of 227,016 commits, 40,440 (17.8%) have been verified, while the remaining 186,576 (82.2%) are yet to be verified. In terms of code changes, additions account for 38.0% of total code changes, whereas deletions comprise 62.0%. In contrast, for Rust-GCC, a total of 202,225 commits were recorded, with only 921 (0.5%) being verified and the majority, i.e., 201,304 (99.5%), remaining unverified. Code additions in Rust-GCC make up 37.8% of total code changes, while deletions constitute 62.2%. These statistics reveal both differences and similarities between the two projects. Rustc exhibits a higher proportion of verified commits than Rust-GCC, although most commits are yet to be verified in both projects. The split between additions and deletions, in contrast, is nearly identical: in both projects deletions make up about 62% of code changes and additions about 38%.

4. EMPIRICAL BUG ANALYSIS
This section explores the bug features in the Rustc and Rust-GCC compilers from four aspects: properties of the bug-triggering test cases, the root causes of the bugs, the locations of the bugs, and the factors related to bug duration.

4.1 Properties of Bug-triggering Test Cases (RQ 1)
4.1.1 Statistics of Bug-triggering Test Cases
Table V presents the statistics of bug-triggering test cases in Rustc and Rust-GCC. The table exhibits the average and median number of the lines of code, variables, functions, and classes in the bug-triggering test cases of Rustc and Rust-GCC. For Rustc, the bug-triggering test cases have an average LoC of 13.68, with a median of 9. This indicates that the test cases tend to be relatively concise. In terms of variables, the average number is 2.39, with a median of 2. The test cases also contain an average of 0.99 functions and 0.2 classes. Overall, the test cases in Rustc account for a total of 19,543 files. In the case of Rust-GCC, the bug-triggering test cases have an average LoC of 12.54, with a median of 7. Similarly to Rustc, the test cases in Rust-GCC are relatively concise. The average number of variables is 1.97, with a median of 1. Additionally, the test cases contain an average of 1.45 functions and 0.34 classes. The total number of files covered by the test cases in Rust-GCC is 101.
In summary, this table highlights the differences in the characteristics of bug-triggering test cases between Rustc and Rust-GCC. Both projects exhibit relatively concise test cases with similar patterns in terms of lines of code and variables. However, there are variations in the number of functions and classes between the two projects. Rust-GCC has a higher average number of functions (1.45 versus 0.99) and classes (0.34 versus 0.2) than Rustc.

Finding 1: On average, the number of lines of code (LoC) in bug-triggering test cases is less than 14, and the number of variables is less than 3.

4.1.2 Rust Features in Bug-triggering Test Cases
Table VI provides statistics on the occurrence of Rust features in bug-triggering test cases for Rustc and Rust-GCC. The table consists of two sections, each representing the average (Avg.) and sum (Sum.) of the Rust features in the test cases for both compilers. We investigate seven Rust features:
1) Ownership and Borrowing: Rust’s ownership model ensures memory safety and prevents bugs by controlling resource allocation and deallocation. Borrowing allows multiple references to a resource without transferring ownership, enabling safe concurrent programming.
2) Pattern Matching: Rust’s pattern matching feature enables concise control flow and data manipulation by matching and deconstructing data structures based on patterns.
3) Error Handling: Rust provides robust error handling through the Result type and the match or ? operator, promoting explicit and reliable error handling.
TABLE VI
THE STATISTICS OF RUST FEATURES IN BUG-TRIGGERING TEST CASES.
TABLE VII
THE TOP 10 LABELS IN ISSUES.
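The text does not describe the tooling used to measure test cases, so the following is only a rough sketch of how per-test-case metrics of the kind summarized in Tables V and VI (lines of code, variables, functions, and feature usages) could be approximated. The regular expressions, the feature names, the function `measure_test_case`, and the sample program are all hypothetical choices made for this illustration; a real study would more plausibly rely on a Rust parser than on token matching.

```python
import re

# Hypothetical token patterns. These are crude heuristics chosen for this
# example; they are not the counting rules used in the study.
FEATURE_PATTERNS = {
    # A single '&' (not part of '&&') is treated as a borrow.
    "ownership_and_borrowing": re.compile(r"(?<!&)&(?!&)"),
    "pattern_matching": re.compile(r"\b(?:match|if\s+let|while\s+let)\b"),
    "error_handling": re.compile(r"\bResult\b|\?"),
    "unsafe": re.compile(r"\bunsafe\b"),
}


def measure_test_case(source):
    """Count non-blank lines, `let` bindings, `fn` items, and feature tokens."""
    lines = [line for line in source.splitlines() if line.strip()]
    metrics = {
        "loc": len(lines),
        # Note: this also counts the `let` in `if let` / `while let`.
        "variables": len(re.findall(r"\blet\b", source)),
        "functions": len(re.findall(r"\bfn\b", source)),
    }
    for name, pattern in FEATURE_PATTERNS.items():
        metrics[name] = len(pattern.findall(source))
    return metrics


# A small made-up Rust test case used only to exercise the function.
SAMPLE = '''
fn first(v: &Vec<i32>) -> Result<i32, String> {
    let head = v.get(0);
    match head {
        Some(x) => Ok(*x),
        None => Err(String::from("empty")),
    }
}
'''

print(measure_test_case(SAMPLE))
```

Averaging such per-file dictionaries over all bug-triggering test cases would give figures comparable in form to the Avg. and Sum. columns, although the exact values depend heavily on how each feature is defined.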
14) cleanup: This label is used for issues related to code cleanup or refactoring. It indicates tasks aimed at improving code quality, readability, or maintainability.
15) upstream: This label is assigned to issues regarding the upstreaming of Rust-GCC into GCC. It indicates tasks related to integrating Rust-GCC changes or features into the main GCC codebase.
16) diagnostic: This label is used for issues related to diagnostic static analysis. It indicates tasks related to improving the static analysis and diagnostic capabilities of Rust-GCC.
17) plan: This label is assigned to issues that require planning or discussion before implementation. It helps to track and manage tasks that need further analysis or coordination.
18) GCC: This label is used for issues specifically related to the GCC compiler. It indicates tasks that are directly related to the GCC codebase or functionality.
19) parser: This label is assigned to issues related to the parser component of Rust-GCC. It indicates tasks related to parsing and processing Rust code.
20) community: This label is used for issues related to community engagement or initiatives. It indicates tasks aimed at fostering community participation or addressing community-related concerns.

For Rustc, the most frequently used label is ‘T-compiler’, appearing in 14,046 issues, which accounts for 12.6% of the total labeled issues. The second most common label is ‘C-bug’, found in 13,106 issues, representing 11.8% of the labeled issues. Additionally, the label ‘I-ICE’ is applied to 6,182 issues, comprising 5.6% of the labeled issues. The table further lists other frequently used labels, such as ‘A-diagnostics’, ‘C-enhancement’, ‘T-rustdoc’, ‘T-libs-api’, ‘E-easy’, ‘T-lang’, and ‘P-medium’, together with their occurrence frequencies and percentages within Rustc issues. In the case of Rust-GCC, the most prevalent label is ‘bug’, applied to 433 issues, accounting for 36.1% of the labeled issues. The label ‘enhancement’ is assigned to 289 issues, representing 24.1% of the labeled issues. Furthermore, the labels ‘good-first-pr’, ‘cleanup’, and ‘upstream’ are applied to 153, 92, and 34 issues, respectively. The remaining labels in the top 10, namely ‘diagnostic’, ‘plan’, ‘GCC’, ‘parser’, and ‘community’, are listed with their occurrence frequencies and percentages within Rust-GCC issues.

In summary, this table highlights the variations in labels between Rustc and Rust-GCC. Most bugs in Rustc are related to the kernel of compilers (‘T-compiler’). In Rust-GCC, ‘bug’, ‘enhancement’, and ‘good-first-pr’ do not imply the bug’s root causes. Thus, most bugs in Rust-GCC are related to the cleanup process (‘cleanup’).

Finding 3: Most bugs in Rustc are related to the kernel of compilers, while most bugs in Rust-GCC are related to the cleanup process.

4.3 Bug-Proneness Locations in Rust Compilers (RQ 3)
4.3.1 Bug-Proneness Modules
Table VIII exhibits the top 10 bug-proneness modules in Rustc and Rust-GCC.

TABLE VIII
THE TOP 10 BUG-PRONENESS MODULES
         Rustc                            Rust-GCC
         Module Name           # Issue    Module Name      # Issue
Top 1    src/test              6754       gcc/rust         323
Top 2    src/librustc          1782       gcc/testsuite    231
Top 3    src/librustdoc        1265       README.md        11
Top 4    src/libsyntax         922        gcc/config       9
Top 5    src/libstd            886        Dockerfile       5
Top 6    src/librustc_typeck   817        gcc/c-family     4
Top 7    src/tools             767        gcc/ada          4
Top 8    src/libcore           648        gcc/DATESTAMP    4
Top 9    src/librustc_mir      634        Makefile.def     4
Top 10   tests/ui              624        Makefile.in      4

In Rustc, the module ‘src/test’ is related to 6,754 reported issues. This is caused by the update of test cases in Rustc’s regression testing. ‘src/librustc’ follows with 1,782 reported issues, highlighting issues related to the core functionality of the Rustc compiler. Similarly, ‘src/librustdoc’ has 1,265 reported issues, suggesting the need for improvements in the Rust documentation generator. In Rust-GCC, ‘gcc/rust’ is the most problematic module with 323 reported issues, indicating difficulties in the integration of Rust within the GCC compiler. ‘gcc/testsuite’ follows with 231 reported issues, emphasizing the need for effective testing of Rust integration. The remaining modules in both projects demonstrate a lower number of reported issues.

Finding 4: The module “src/librustc” exhibits the highest bug-proneness in the Rustc compiler, while the module “gcc/rust” demonstrates the highest bug-proneness in the Rust-GCC compiler.

4.3.2 Bug-proneness Files
Table IX presents the top 10 bug-proneness files in the two projects, Rustc and Rust-GCC.
In Rustc, the file ‘src/librustc_typeck/check/mod.rs’ is identified as the most problematic file with 373 reported issues, indicating potential challenges in the type checking process. ‘src/libsyntax/parse/parser.rs’ follows closely with 336 reported issues, highlighting potential issues in the parsing phase. ‘src/librustdoc/clean/mod.rs’ ranks third with 299 reported issues, emphasizing the need for improvements in the Rust documentation generator. For Rust-GCC, the file ‘gcc/rust/typecheck/rust-tyty.h’ stands out with 73 reported issues, suggesting difficulties in the type checking process for Rust integration within the GCC compiler. ‘gcc/rust/typecheck/rust-tyty.cc’ follows with 69 reported issues, indicating potential bugs in the type checking implementation. The other files in the top 10 for both projects also demonstrate a significant number of reported issues, highlighting the importance of examining and addressing bugs within these files.
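Module-level and file-level counts of the kind shown in Tables VIII and IX can be obtained by linking each issue to the files changed in its fix and grouping those files by path prefix. The sketch below illustrates that idea under stated assumptions: the two-component module rule is inferred from the module names in Table VIII, and the issue identifiers, the `ISSUE_FILES` mapping, the path `src/test/ui/example.rs`, and both function names are hypothetical.

```python
from collections import Counter


def module_of(path, depth=2):
    """Map a file path to its module: the first `depth` path components.

    Assumption: a module is a two-level directory prefix such as
    'src/librustc'; top-level files such as 'README.md' map to themselves.
    """
    return "/".join(path.split("/")[:depth])


def issues_per_module(issue_files):
    """Count, for every module, the number of issues touching at least one of its files."""
    counts = Counter()
    for files in issue_files.values():
        # A set ensures an issue is counted once per module, however many files it touches.
        counts.update({module_of(path) for path in files})
    return counts


# Hypothetical mapping from issue id to the files changed by its fix.
ISSUE_FILES = {
    101: ["src/librustc_typeck/check/mod.rs", "src/test/ui/example.rs"],
    102: ["src/libsyntax/parse/parser.rs", "src/libsyntax/ast.rs"],
    103: ["src/librustc_typeck/check/mod.rs"],
}

for module, n_issues in issues_per_module(ISSUE_FILES).most_common():
    print(module, n_issues)
```

Counting per file instead of per module only requires replacing `module_of(path)` with `path` in the set comprehension.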
TABLE IX
THE TOP 10 BUG-PRONENESS FILES.
         Rustc                                        Rust-GCC
         File Name                          # Issue   File Name                                           # Issue
Top 1    src/librustc_typeck/check/mod.rs   373       gcc/rust/typecheck/rust-tyty.h                      73
Top 2    src/libsyntax/parse/parser.rs      336       gcc/rust/typecheck/rust-tyty.cc                     69
Top 3    src/librustdoc/clean/mod.rs        299       gcc/rust/typecheck/rust-hir-type-check-expr.h       64
Top 4    Cargo.lock                         277       gcc/rust/parse/rust-parse-impl.h                    47
Top 5    src/librustc_resolve/lib.rs        249       gcc/rust/Make-lang.in                               44
Top 6    src/librustc/middle/ty.rs          206       gcc/rust/hir/tree/rust-hir-item.h                   41
Top 7    src/librustdoc/lib.rs              195       gcc/rust/hir/rust-ast-lower-item.h                  40
Top 8    src/tools/miri                     187       gcc/rust/typecheck/rust-hir-type-check-implitem.h   40
Top 9    src/libstd/lib.rs                  187       gcc/rust/backend/rust-compile-expr.h                39
Top 10   src/libsyntax/ast.rs               173       gcc/rust/resolve/rust-ast-resolve-item.h            35
TABLE X
THE RELATIONS BETWEEN ISSUE DURATION AND THE MAINTENANCE OF RUST COMPILER.
                  Rustc                                           Rust-GCC
                  Pearson                 Spearman                Pearson                 Spearman
                  statistic   p-value     correlation p-value     statistic   p-value     correlation p-value
Changes of Files  4.78E-02    1.30E-07    2.48E-01    3.08E-170   -1.91E-02   6.40E-01    1.84E-01    5.40E-06
Pull Comments     1.75E-01    8.46E-84    2.44E-01    6.52E-164   -3.68E-02   3.68E-01    6.24E-02    1.26E-01
Pull Duration     2.34E-01    1.81E-150   4.15E-01    0.00E+00    1.68E-01    3.53E-05    2.32E-01    8.06E-09
Issue Comments    1.85E-01    3.44E-305   3.48E-01    0.00E+00    4.21E-01    1.14E-30    2.81E-01    7.33E-14
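For readers who want to see how coefficients of the kind reported in Table X are defined, the following sketch computes the Pearson coefficient directly and obtains the Spearman coefficient as the Pearson coefficient of average ranks. It is a minimal pure-Python illustration, not the study's analysis script; it does not compute p-values, and the input lists `issue_comments` and `duration_days` are made-up numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: number of comments on an issue and days until it was closed.
issue_comments = [1, 2, 3, 4, 5]
duration_days = [2, 4, 8, 16, 32]

print(pearson(issue_comments, duration_days))
print(spearman(issue_comments, duration_days))
```

In this toy data the relationship is monotonic but not linear, so the Spearman coefficient is higher than the Pearson coefficient. The same pattern of a larger Spearman than Pearson value appears in several rows of the table, which is consistent with monotonic but non-linear relationships, though other explanations such as outliers are possible.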
Finding 5: The file “src/librustc_typeck/check/mod.rs” exhibits the highest level of bug-proneness in Rustc, while “gcc/rust/typecheck/rust-tyty.h” demonstrates the highest bug-proneness in Rust-GCC.

4.4 Factors Impacting Bug-Fix Duration (RQ 4)
4.4.1 Relationships between Issue Duration and Compiler Maintenance
Table X provides findings regarding the relationship between issue duration and various aspects of the Rust compiler’s maintenance. The table presents correlation coefficients and p-values calculated using both Pearson and Spearman methods for Rustc and Rust-GCC.
For the Rustc compiler, the table shows correlations and statistical significance values for four variables: changes of files, pull comments, pull duration, and issue comments. The Pearson results indicate that changes of files have a positive correlation of 0.048 with issue duration, which is statistically significant (p-value = 1.30E-07). Pull comments also show a statistically significant positive correlation (0.175) with issue duration (p-value = 8.46E-84). Pull duration demonstrates the strongest positive correlation among the four variables (0.234), and its statistical significance is confirmed by a p-value of 1.81E-150. Issue comments exhibit a positive correlation of 0.185 and the lowest p-value, 3.44E-305. In the case of Rust-GCC, the Pearson correlations with issue duration are generally weaker than for Rustc, with the exception of issue comments. Changes of files show a weak negative correlation (-0.019), which is not statistically significant (p-value = 0.640). Pull comments also lack statistical significance (p-value = 0.368) and have a weak negative correlation (-0.037). Pull duration demonstrates a weak positive correlation (0.168) with issue duration, and its p-value of 3.53E-05 confirms statistical significance. Issue comments display the strongest positive correlation in this section, with a coefficient of 0.421 and a highly significant p-value of 1.14E-30.

Finding 6: The more files that are modified, the longer it takes to fix the issue.

Finding 7: Bugs that are discussed more frequently tend to require more time for the fix.

Finding 8: As the duration of pull requests increases, so does the duration of the associated issues.

4.4.2 Relationships between Issue Duration and Rust Features
Table XI presents the statistical relationships between issue duration and various features of the Rust programming language. The table is also divided into two sections, Rustc and Rust-GCC, and provides correlation coefficients and p-values calculated using both Pearson and Spearman methods.

TABLE XI
THE RELATIONS BETWEEN ISSUE DURATION AND THE RUST FEATURES.
                          Rustc                                           Rust-GCC
                          Pearson                 Spearman                Pearson                 Spearman
                          statistic   p-value     correlation p-value     statistic   p-value     correlation p-value
Ownership and Borrowing   4.46E-03    4.19E-01    4.41E-02    1.52E-15    -1.78E-02   7.97E-01    2.46E-02    7.22E-01
Pattern Matching          3.93E-04    9.43E-01    1.70E-04    9.75E-01    1.56E-01    2.27E-02    1.06E-01    1.23E-01
Error Handling            -1.77E-02   1.38E-03    -1.45E-03   7.93E-01    -4.35E-02   5.28E-01    -6.92E-02   3.16E-01
Concurrency               -3.24E-02   4.72E-09    -4.06E-02   1.89E-13    -4.35E-02   5.28E-01    -6.92E-02   3.16E-01
Macros                    -3.65E-03   5.09E-01    -8.91E-03   1.07E-01    n/a         n/a         n/a         n/a
Trait                     -2.60E-03   6.38E-01    -8.33E-05   9.88E-01    n/a         n/a         n/a         n/a
Unsafe                    5.44E-03    3.25E-01    2.28E-03    6.80E-01    -3.26E-03   9.62E-01    -1.75E-01   1.06E-02

In the Rustc section, the table analyzes the correlations and statistical significance values for 7 Rust features, i.e., ownership and borrowing, pattern matching, error handling, concurrency, macros, trait, and unsafe Rust code. The Pearson results indicate that ownership and borrowing have a positive correlation of 0.00446 with issue duration, but it is not statistically significant (p-value = 0.419). Pattern matching exhibits a very weak positive correlation (0.000393) that is also not statistically significant (p-value = 0.943). Error handling shows a negative correlation (-0.0177) with issue duration, and its p-value of 0.00138 confirms statistical significance. Concurrency demonstrates a negative correlation (-0.0324) and is highly statistically significant (p-value = 4.72E-09). Macros (-0.00365, p-value = 0.509) and traits (-0.00260, p-value = 0.638) show negligible negative correlations that are not statistically significant. In the Rust-GCC section, the correlations with issue duration are less apparent. Ownership and borrowing show a weak negative correlation (-0.0178) that is not statistically significant (p-value = 0.797). Pattern matching has a positive correlation of 0.156, which is statistically significant (p-value = 0.0227). Error handling exhibits a negative correlation (-0.0435) with issue duration, but it is not statistically significant (p-value = 0.528). Concurrency displays a negative correlation (-0.0435), similar in sign to Rustc, but it is not statistically significant (p-value = 0.528). Macros and trait features are not available in the Rust-GCC section. Unsafe Rust code exhibits a negative Spearman correlation (-0.175) with a p-value of 0.0106, indicating statistical significance, whereas its Pearson correlation (-0.00326) is not significant (p-value = 0.962).
In summary, this table provides insights into the correlations between issue duration and different features of the Rust programming language. The findings suggest that concurrency and error-handling features are the ones most clearly associated with issue duration in Rustc, although the coefficients are small. In contrast, ownership and borrowing, pattern matching, macros, traits, and unsafe Rust code features show weaker or less conclusive associations.

Finding 9: The more frequently concurrency features are utilized in test cases, the faster bugs can be fixed.

5. THREATS TO VALIDITY

This section discusses threats to validity.

5.1 The Selection of Rust Compilers
This paper endeavors to gain insights from historical bugs in Rust compilers. However, it is important to acknowledge that conducting experiments on all Rust compilers is not feasible within the scope of this study. Therefore, to achieve meaningful results, we have focused our investigation on two of the most prominent and extensively utilized Rust compilers: Rustc and Rust-GCC. These compilers boast active development teams and have garnered over 2k stars on GitHub, attesting to their widespread adoption and popularity within the community.

5.2 The Representativeness of Rust Features
While Rust boasts numerous features, it is impractical to cover all of them in our experiments. As a result, we have carefully chosen seven of the most critical and widely used features for our study. While some features remain unexplored, the selected features in our experiments yield many insights and can serve as practical guidance for developers to effectively maintain their projects.

5.3 Issue Duration and Bug-fixing
RQ 4 delves into the factors influencing bug-fixing and utilizes issue duration as a measure of the bug-fixing effort. While issue duration represents only a portion of the overall effort, it plays a crucial role in bug-fixing processes. Issue duration reflects the time and resources spent in addressing specific bugs, aiding in project planning and resource allocation. Longer durations may indicate more complex or critical bugs, while shorter durations may suggest simpler fixes. Therefore, we believe that issue duration can effectively represent the bug-fixing effort in most cases.

6. IMPLICATIONS

This section presents the implications arising from the findings of this paper. From different perspectives, i.e., users, developers, software testers, and project managers, these implications shed light on various aspects of bug identification, resolution, and overall software quality.

6.1 Implications to Rust Software Developers
From the perspective of the software developers of Rust, these findings provide specific insights that can guide their coding practices. Finding 2 highlights that the usage of ownership and borrowing, error handling, and pattern matching in bug-triggering test cases may tend to cause bugs in Rust compilers. Developers can utilize this information to identify potential areas of concern and pay closer attention to these language features during code development and testing. By ensuring proper usage of these features and conducting thorough testing around them, developers can reduce the occurrence of bugs and enhance the overall quality of their code.

6.2 Implications to Testers of Rust Compilers
Software testers can benefit from several findings. Finding 1 reveals that bug-triggering test cases typically involve a small number of lines of code and variables. Finding 2 highlights that the usage of ownership and borrowing, error handling, and pattern matching may relate to bugs. Thus, testers can
use this information to design small test cases containing these features, which may guide them to identify and resolve bugs more efficiently. Finding 3 highlights that most bugs in Rustc are related to the kernel of compilers, while bugs in Rust-GCC are more frequently associated with the cleanup process. Testers can focus their testing efforts on these specific areas to identify and report bugs early in the development cycle. It is crucial for testers to thoroughly understand the kernel and cleanup processes to effectively identify and address potential issues. Additionally, Findings 4 and 5 identify the modules and files with the highest bug-proneness. Testers can prioritize testing activities in these areas to maximize bug detection and contribute to overall quality assurance efforts.

6.3 Implications to Project Managers
From a project management perspective, Findings 6, 7, and 8 provide implications for bug fixing and issue resolution. Finding 6 suggests that minimizing the number of files modified during the bug-fixing process can expedite issue resolution. Project managers can enforce best practices and code review guidelines to ensure that developers make minimal code modifications when addressing bugs. Finding 7 highlights that bugs that are more frequently discussed require more time to fix. Project managers can encourage efficient and effective communication channels among team members to streamline discussions and expedite bug resolution. Finding 8 establishes a correlation between the duration of pull requests and the associated issue duration. Project managers can ensure timely code reviews and pull request handling to prevent delays in bug fixing and minimize the impact on project timelines.

6.4 Implications to Researchers
From a research perspective, these findings provide insights into bug patterns and areas for further investigation. Researchers can delve deeper into the identified modules and files with the highest bug-proneness (Findings 4 and 5) to understand the underlying reasons and potential solutions. They can explore the kernel of compilers in Rustc and the cleanup process in Rust-GCC to identify opportunities for optimization and improvement. Finding 9 suggests that utilizing concurrency features more frequently in test cases can expedite bug resolution. This encourages researchers to investigate the underlying reasons why concurrency features positively impact bug-fixing speed and overall development efficiency. The findings can guide research efforts toward enhancing the overall reliability, performance, and efficiency of Rust compilers.

7. RELATED WORK
This section presents the related work, primarily focusing on three aspects: empirical studies on software bugs, empirical studies on compiler bugs, and studies on the Rust features.

7.1 Empirical Studies on Software Bugs
Understanding bugs in complex software systems is a critical research area. D’Ambros and Lanza [6] propose a visual approach to analyze the relationship between evolving software and software bugs by integrating versioning systems and bug reporting systems. This approach enables the characterization of software artifact evolution. Li et al. [15] examine bug characteristics in modern open-source software, revealing changes in bug trends, the persistence of simple memory-related bugs, the prevalence of semantic bugs, and the increasing presence of security bugs. Ibrahim et al. [12] investigate the relationship between comment update practices and software bugs. They find that inconsistent changes between code and comments, especially when previously consistent, pose a higher risk of introducing bugs. Guo et al. [11] focus on the bug reassignment process in the Microsoft Windows Vista operating system project. They reveal that bug reassignments can be beneficial in determining the most suitable person to fix a bug. Shihab et al. [25] aim to predict re-opened bugs in open-source software. They identify key dimensions, such as work habits, bug reports, bug fixes, and team dynamics, to build decision trees for predicting re-opened bugs. Thung et al. [29] analyze bugs in machine learning systems, exploring bug categories, severities, fix time and effort, and bug impacts. They emphasize the need for further research in addressing bugs in algorithm-intensive machine learning systems. Saha et al. [23] focus on long-lived bugs in software development, analyzing their proportion, severity, assignment, reasons, and nature of fixes. They highlight the adverse impact of long-lived bugs on user experience and suggest careful prioritization for quicker bug fixes. Tan et al. [28] analyze real-world bugs in the Linux kernel, Mozilla, and Apache projects. They reveal that semantic bugs are the main root cause, calling for more research on addressing them. Chen et al. [4] examine dormant bugs in software systems, highlighting their prevalence and impact on understanding software quality. They find that dormant bugs differ from non-dormant bugs in various aspects and emphasize the need to incorporate the study of dormant bugs in assessing software quality. Vahabzadeh et al. [31] focus on bugs in test code, comparing properties of test bugs with production bugs and categorizing test bugs based on impact and root causes. They highlight the importance of addressing test code bugs and their distinct characteristics. Sahoo et al. [24] explore the characteristics of reported bugs in server software and their implications for automated bug diagnosis. They find that a high percentage of bug symptoms can be reliably reproduced at the production site using the same set of inputs. Cotroneo et al. [5] investigate the bug manifestation process and its impact on system failures. They identify failure-exposing conditions and analyze bug reports to understand bug-triggering conditions, their evolution over time, and their impact on the user. Cairo et al. [2] conduct a systematic literature review on the influence of code smells on software bugs. They identify influential and less influential code smells based on evidence from selected studies and provide insights into analyzing the impact of code smells. Timperley et al. [30] introduce BugZoo, a decentralized platform for distributing, reproducing, and interacting with historical software bugs. BugZoo ensures reproducibility, extensibility, and usability for conducting experiments in bug-related research. Zampetti
et al. [36] present an empirical characterization of software bugs in open-source Cyber-Physical Systems, identifying root causes and highlighting the need for specialized verification and validation techniques for these systems. Ding et al. [7] analyze bugs discovered by the OSS-Fuzz continuous fuzzing service in open-source software projects. They examine the characteristics and lifecycles of fuzzer-found faults, challenges posed by flaky bugs, and the limited filing of CVEs for security vulnerabilities.

These studies provide insights into various aspects of software bugs, including their characteristics, prediction, fixing processes, and implications for software quality and user experience. Different from the research on traditional software, our empirical study focuses on bugs in fundamental software, i.e., Rust compilers. As the foundation of all software, compilers may involve intricate designs, and the consequences of errors are more severe. Therefore, our research aims to delve into the historical bugs in Rust compilers to gain a deeper understanding of them.

7.2 Empirical Studies on Compiler Bugs
Compilers/interpreters are crucial components of programming languages, and understanding bugs in these complex software systems is essential. Li et al. [14] study performance bugs in Markdown compilers, highlighting the main cause as the handling of context-sensitive features. They develop MdPerfFuzz, a fuzzing framework, to detect unknown bugs and identify new performance bugs in real-world Markdown compilers. Sun et al. [27] examine bug-fix revisions in GCC and LLVM compilers, revealing C++ as the most error-prone component and providing insights into bug-fixing patterns and priorities. Their findings contribute to improving compiler testing and debugging. Zhou et al. [38] investigate optimization bugs in GCC and LLVM compilers, analyzing bug characteristics, misoptimizations, bug lifespan, and average fix duration. They highlight the need for effective techniques and tools for testing and debugging compiler optimizations. Romano et al. [22] focus on bugs in WebAssembly compilers, analyzing bug characteristics, lifecycle, impact, and bug-inducing inputs and fixes. Their findings offer insights for enhancing development and testing efforts and suggest opportunities for practical tools to test and debug WebAssembly compilers. Du et al. [8] examine bugs in deep learning (DL) compilers, identifying root causes and exploring bug types, consequences, and fixing durations. Their findings have practical implications for DL compiler developers and users, improving development quality and understanding bug impacts. Wang et al. [33] analyze bug characteristics in JavaScript engines, revealing buggy components, test programs for bug discovery, bug-fixing durations, priority assignments, and common root causes. Their study enhances our understanding of JavaScript engine bugs, facilitating effective bug detection and fixing. Wang et al. [34] investigate bugs in PHP, analyzing bug reports, revisions, and root causes. They identify bug distribution, impact on packages, repair duration, and common root causes. These findings provide insights for improving development quality, bug detection, and risk mitigation in PHP. Wang et al. [32] study bugs in Python interpreters (CPython and PyPy), analyzing bug distribution, test programs, bug-fixing duration, priority correlation, and root causes. Their findings offer insights for detecting and fixing Python interpreter bugs, improving interpreter quality, and addressing potential issues. Liu et al. [17] conduct a large-scale empirical study on bugs in Python interpreters, identifying bug locations, symptoms, root causes, and fixing time. They highlight problematic components, common symptoms, and root causes, providing practical implications for testing, debugging, and improving Python interpreters.

These studies offer insights into bugs found in compilers or interpreters. Rust, as a growing language, has gained increasing popularity, yet there is a lack of prior research on bugs specific to Rust compilers. Consequently, this paper presents the first empirical study aimed at understanding bugs in Rust compilers. Our findings and implications may serve as a resource for testers, developers, and researchers working with Rust compilers.

7.3 Studies on the Rust Features
Rust, known for its safety, concurrency, and performance features, has attracted significant research attention. Qin et al. [21] analyze memory and thread safety practices in real-world Rust programs, identifying memory-safety issues and concurrency bugs. Their findings offer insights into Rust program behaviors and propose directions for building bug detectors. Chakraborty et al. [3] explore developer discussions and support for new programming languages, i.e., Go, Swift, Rust, on Stack Overflow. They examine difficult topics, resource availability, and the relationship between developer activity and language growth. Li et al. [13] investigate the usage and impact of the yank mechanism in the Rust package registry, revealing the proportion of yanked releases and their reasons. They also highlight the adoption of yanked releases and the resulting unresolved dependencies in the ecosystem. Li et al. [16] develop FFIChecker, a tool for detecting memory management issues across the Rust/C Foreign Function Interface (FFI). They demonstrate its effectiveness in addressing real-world cross-language memory management issues. Xu et al. [35] conduct an in-depth study of memory-safety issues in Rust, analyzing existing common vulnerabilities and exposures (CVEs) and categorizing these bugs. They propose best practices and methods to enhance the security of Rust development. Astrauskas et al. [1] empirically examine the usage of unsafe code in Rust, evaluating the validity of the Rust hypothesis and classifying the purposes for which unsafe code is employed. Evans et al. [9] analyze the usage and safety implications of Unsafe Rust in real-world libraries and applications. They reveal the presence of Unsafe Rust in call chains and discuss the challenges it poses to Rust’s memory safety guarantees. Zhang et al. [37] focus on understanding the run-time performance of Rust compared to C. They investigate the performance of Rust using micro benchmarks and highlight the performance overhead compared to C due to run-time checks
and language design restrictions. Zhu et al. [39] examine the learning and programming challenges of Rust’s safety rules through empirical analysis of Stack Overflow questions and an online survey. Their study provides insights for Rust learners, practitioners, and language designers.

These studies make significant contributions to our understanding of various facets of Rust, such as memory safety, concurrency, performance, and developer experiences. In this paper, we delve into a distinct aspect of Rust: bugs in Rust compilers. Our research fills a crucial gap in the realm of Rust, complementing existing studies on the language. We believe these investigations work together to foster the growth and development of the Rust community.

8. CONCLUSION
This paper presents an empirical study on the bugs of two mainstream Rust compilers, i.e., Rustc and Rust-GCC. Specifically, based on the historical bugs crawled from GitHub, we investigate four research questions on bug-triggering test cases, bug root causes, bug locations, and bug-fixing effort. The results show that most bugs in Rustc are related to the kernel of compilers, while most bugs in Rust-GCC are related to the cleanup process. The module “src/librustc” exhibits the highest bug-proneness in the Rustc compiler, while the modules “gcc/rust” demonstrate the highest bug-proneness in the Rust-GCC compiler. The more frequently concurrency features of Rust are utilized in test cases, the faster bugs can be fixed. These findings shed light on improving compiler quality and reliability, help developers build more robust Rust applications, and point to areas where future efforts can improve Rust’s compilation process and overall language stability.

ACKNOWLEDGMENT
We would like to thank anonymous reviewers for their insightful and constructive comments. This research was partially funded by the National Natural Science Foundation of China under Grant No. 62172209, and the Science, Technology and Innovation Commission of Shenzhen Municipality (No. CJGJZD20200617103001003).

REFERENCES
[1] Vytautas Astrauskas, Christoph Matheja, Federico Poli, Peter Müller, and Alexander J Summers. How do programmers use unsafe Rust? Proceedings of the ACM on Programming Languages, 4(OOPSLA):1–27, 2020.
[2] Aloisio S Cairo, Glauco de F Carneiro, and Miguel P Monteiro. The impact of code smells on software bugs: A systematic literature review. Information, 9(11):273, 2018.
[3] Partha Chakraborty, Rifat Shahriyar, Anindya Iqbal, and Gias Uddin. How do developers discuss and support new programming languages in technical Q&A site? An empirical study of Go, Swift, and Rust in Stack Overflow. Information and Software Technology, 137:106603, 2021.
[4] Tse-Hsun Chen, Meiyappan Nagappan, Emad Shihab, and Ahmed E Hassan. An empirical study of dormant bugs. In Proceedings of the 11th Working Conference on Mining Software Repositories, pages 82–91, 2014.
[5] Domenico Cotroneo, Roberto Pietrantuono, Stefano Russo, and Kishor Trivedi. How do bugs surface? A comprehensive study on the characteristics of software bugs manifestation. Journal of Systems and Software, 113:27–43, 2016.
[6] Marco D’Ambros and Michele Lanza. Software bugs and evolution: A visual approach to uncover their relationship. In Conference on Software Maintenance and Reengineering (CSMR’06), pages 10–pp. IEEE, 2006.
[7] Zhen Yu Ding and Claire Le Goues. An empirical study of OSS-Fuzz bugs. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pages 131–142. IEEE, 2021.
[8] Xiaoting Du, Zheng Zheng, Lei Ma, and Jianjun Zhao. An empirical study on common bugs in deep learning compilers. In 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), pages 184–195. IEEE, 2021.
[9] Ana Nora Evans, Bradford Campbell, and Mary Lou Soffa. Is Rust used safely by software developers? In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, pages 246–257, 2020.
[10] https://docs.github.com/en/rest?apiVersion=2022-11-28. Last accessed on Jul. 12, 2023.
[11] Philip J Guo, Thomas Zimmermann, Nachiappan Nagappan, and Brendan Murphy. “Not my bug!” and other reasons for software bug report reassignments. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, pages 395–404, 2011.
[12] Walid M Ibrahim, Nicolas Bettenburg, Bram Adams, and Ahmed E Hassan. On the relationship between comment update practices and software bugs. Journal of Systems and Software, 85(10):2293–2304, 2012.
[13] Hao Li, Filipe R Cogo, and Cor-Paul Bezemer. An empirical study of yanked releases in the Rust package registry. IEEE Transactions on Software Engineering, 49(1):437–449, 2022.
[14] Penghui Li, Yinxi Liu, and Wei Meng. Understanding and detecting performance bugs in Markdown compilers. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 892–904. IEEE, 2021.
[15] Zhenmin Li, Lin Tan, Xuanhui Wang, Shan Lu, Yuanyuan Zhou, and Chengxiang Zhai. Have things changed now? An empirical study of bug characteristics in modern open source software. In Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, pages 25–33, 2006.
[16] Zhuohua Li, Jincheng Wang, Mingshen Sun, and John CS Lui. Detecting cross-language memory management
issues in Rust. In European Symposium on Research in Computer Security, pages 680–700. Springer, 2022.
[17] Di Liu, Yang Feng, Yanyan Yan, and Baowen Xu. Towards understanding bugs in Python interpreters. Empirical Software Engineering, 28(1):19, 2023.
[18] Nicholas D Matsakis and Felix S Klock. The Rust language. ACM SIGAda Ada Letters, 34(3):103–104, 2014.
[19] Karl Pearson. VII. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352):240–242, 1895.
[20] https://github.com/PyGithub/PyGithub. Last accessed on Jul. 12, 2023.
[21] Boqin Qin, Yilun Chen, Zeming Yu, Linhai Song, and Yiying Zhang. Understanding memory and thread safety practices and issues in real-world Rust programs. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 763–779, 2020.
[22] Alan Romano, Xinyue Liu, Yonghwi Kwon, and Weihang Wang. An empirical study of bugs in WebAssembly compilers. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 42–54. IEEE, 2021.
[23] Ripon K Saha, Sarfraz Khurshid, and Dewayne E Perry. An empirical study of long lived bugs. In 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), pages 144–153. IEEE, 2014.
[24] Swarup Kumar Sahoo, John Criswell, and Vikram Adve. An empirical study of reported bugs in server software with implications for automated bug diagnosis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, pages 485–494, 2010.
[25] Emad Shihab, Akinori Ihara, Yasutaka Kamei, Walid M Ibrahim, Masao Ohira, Bram Adams, Ahmed E Hassan, and Ken-ichi Matsumoto. Studying re-opened bugs in open source software. Empirical Software Engineering, 18:1005–1042, 2013.
[26] Charles Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 100(3/4):441–471, 1987.
[27] Chengnian Sun, Vu Le, Qirun Zhang, and Zhendong Su. Toward understanding compiler bugs in GCC and LLVM. In Proceedings of the 25th International Symposium on Software Testing and Analysis, pages 294–305, 2016.
[28] Lin Tan, Chen Liu, Zhenmin Li, Xuanhui Wang, Yuanyuan Zhou, and Chengxiang Zhai. Bug characteristics in open source software. Empirical Software Engineering, 19:1665–1705, 2014.
[29] Ferdian Thung, Shaowei Wang, David Lo, and Lingxiao Jiang. An empirical study of bugs in machine learning systems. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering, pages 271–280. IEEE, 2012.
[30] Christopher Steven Timperley, Susan Stepney, and Claire Le Goues. BugZoo: A platform for studying software bugs. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, pages 446–447, 2018.
[31] Arash Vahabzadeh, Amin Milani Fard, and Ali Mesbah. An empirical study of bugs in test code. In 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 101–110. IEEE, 2015.
[32] Ziyuan Wang, Dexin Bu, Aiyue Sun, Shanyi Gou, Yong Wang, and Lin Chen. An empirical study on bugs in Python interpreters. IEEE Transactions on Reliability, 71(2):716–734, 2022.
[33] Ziyuan Wang, Dexin Bu, Nannan Wang, Sijie Yu, Shanyi Gou, and Aiyue Sun. An empirical study on bugs in JavaScript engines. Information and Software Technology, 155:107105, 2023.
[34] Ziyuan Wang, Dexin Bu, Xingpeng Xuan, and Jia Gu. An empirical study on bugs in PHP. International Journal of Software Engineering and Knowledge Engineering, 32(06):845–870, 2022.
[35] Hui Xu, Zhuangbin Chen, Mingshen Sun, Yangfan Zhou, and Michael R Lyu. Memory-safety challenge considered solved? An in-depth study with all Rust CVEs. ACM Transactions on Software Engineering and Methodology (TOSEM), 31(1):1–25, 2021.
[36] Fiorella Zampetti, Ritu Kapur, Massimiliano Di Penta, and Sebastiano Panichella. An empirical characterization of software bugs in open-source cyber–physical systems. Journal of Systems and Software, 192:111425, 2022.
[37] Yuchen Zhang, Yunhang Zhang, Georgios Portokalidis, and Jun Xu. Towards understanding the runtime performance of Rust. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, pages 1–6, 2022.
[38] Zhide Zhou, Zhilei Ren, Guojun Gao, and He Jiang. An empirical study of optimization bugs in GCC and LLVM. Journal of Systems and Software, 174:110884, 2021.
[39] Shuofei Zhu, Ziyi Zhang, Boqin Qin, Aiping Xiong, and Linhai Song. Learning and programming challenges of Rust: A mixed-methods study. In Proceedings of the 44th International Conference on Software Engineering, pages 1269–1281, 2022.