
2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) | DOI: 10.1109/SANER56733.2023.00022

Efficient Pattern-based Static Analysis Approach via Regular-Expression Rules
Xiaowen Zhang∗, Ying Zhou∗, Shin Hwei Tan†
Dept. of Comp. Sci. and Engr., Southern University of Sci. and Tech., Shenzhen, China
[email protected], [email protected], [email protected]
∗ Joint first authors   † Corresponding author

Abstract—Pattern-based static analyzers like SpotBugs, which use bug patterns (rules) to detect bugs, may have several limitations: (1) they are too slow, (2) they do not usually support analysis of partial programs, (3) they require parsing code into an AST/CFG, and (4) they have a high false positive rate. Each pattern relies on analysis contexts (e.g., data flow analysis) to improve the accuracy of the analysis. To understand the analysis contexts required by each pattern, we study the design of bug patterns in SpotBugs. Based on our study, we present Codegex, an efficient pattern-based static analysis approach that uses regular expressions with several strategies to extract more information from program texts (syntax and type information). It can analyze partial and complete code quickly without parsing code into an AST. We evaluate Codegex using two settings. First, we compare the effectiveness and efficiency of Codegex and SpotBugs in analyzing 52 projects. Our results show that Codegex can detect bugs with comparable accuracy to SpotBugs but up to 590X faster, showing the potential of using Codegex as the fast stage of SpotBugs in a two-stage approach for instant feedback. Second, we evaluate Codegex in automated code review by running it on 4256 PRs, where it generated 372 review comments and received 116 pieces of feedback. Overall, 78.45% of the feedback that we received is positive, indicating the promise of using Codegex for automated code review.

Index Terms—Partial Program Analysis, Static Analysis, Regex

I. INTRODUCTION

Pattern-based static analyzers detect bugs via a set of bug patterns (rules for detecting a potential problem in a given program), but there exist several barriers that hinder their wide adoption. First, developers think that static analyzers are too slow to run [1], [2]. Most static analysis tools (e.g., Coverity [3] and Fortify [4]) are designed to run in batch mode, and are not well-integrated into the development environment (IDE), where instant feedback is required [5]. Second, developers prefer static analyzers that support partial analysis, analyzing only recent code changes [2], [6], [7]. Third, most static analyzers rely heavily on compiler technologies. They either (1) require users to manually set up build configurations (so they cannot be easily deployed to support gazillions of repositories on GitHub [6]) or (2) fail to run due to compilation errors in the analyzed project (27.5% of evaluated programs in SpotBugs fail to run [8]). In fact, compiled classes may be unavailable [9]–[11], and only Checkstyle [12], which checks for coding rules, does not need compilation. Fourth, the high false positive (FP) rate is the key barrier to using static analyzers [1], [13], [14].

We propose Codegex, a pattern-based static analysis approach based on regular expressions (regex). Our key insight is that many bug patterns checked by pattern-based static analyzers can be naturally represented by regex rules, especially patterns that rely on string matching. By using regex to match these patterns, our approach can address the aforementioned limitations: it (1) provides instant feedback by saving compilation overhead, (2) supports analysis of partial programs, and (3) does not require parsing code into an AST/CFG, which may cause build failures. However, relying solely on regex may lead to a high FP rate.

Pattern-based static analyzers usually use one or several types of analysis contexts (e.g., data flow analysis) to reduce FPs. To understand the analysis contexts required by each bug pattern in a pattern-based static analyzer, we conducted a study of SpotBugs. We select SpotBugs as it (1) is one of the most popular static analyzers, and (2) has a larger number of bug patterns than other tools (e.g., Error Prone, PMD, Infer) [8], [15]. Our study revealed that: (1) most bug patterns in SpotBugs do not require analysis contexts beyond the class under analysis, (2) method information (e.g., the method name) required in some bug patterns can be extracted directly from the program texts, and (3) data type information is the most important one among all analysis contexts.

Inspired by our study, we design each pattern in Codegex by first using a regex to match the bug within a single statement, and then employing several heuristics to improve the efficiency and accuracy on demand if a bug pattern requires more analysis contexts. These strategies include (1) syntax-guided matching (using keywords to encode class/method signature information), (2) explicit type-driven matching (matching implicit data types for patterns requiring type checking), (3) matching at word boundaries (optimizing the analysis), (4) broadening the analysis scope via "diff" search (searching across all code changes instead of a single statement) and online search (searching all files in the repository), (5) encoding operator precedence (increasing the accuracy of analyzing arithmetic and bitwise operations), and (6) enforcing anti-patterns (rules for filtering FPs).

Codegex can benefit developers in two settings: (1) Codegex provides instant feedback in the IDE via 87 patterns. Motivated by a prior study where developers expressed the need for a two-stage approach in a program analyzer [2], Codegex is used on top of SpotBugs in a two-stage approach where the first stage runs the detectors for the 87 patterns in Codegex to provide real-time feedback, and the second stage runs the remaining detectors for more sophisticated analysis during nightly builds. (2) Codegex analyzes the code snippets within a pull request (PR) during code review. Prior studies show that static analysis approaches could assist developers in automated code review [16], [17]. A prior attempt was to integrate FindBugs (a deprecated predecessor of SpotBugs) into a review bot [18]. As FindBugs requires bytecode for analysis, the Review Bot works around this by (1) maintaining a local copy of the project source code, (2) synchronizing the local copy to the changelist determined by a trial-and-error approach (trying each candidate changelist until the build succeeds), (3) copying merged files, and (4) building the project. The workaround may still incur build failures, causing delays in running SpotBugs for automated code review.

Overall, our contributions can be summarized as follows:
• We perform the first study of the analysis contexts of 438 bug patterns in SpotBugs by reading their documentation and implementation. It helps us identify the set of bug patterns that can be run quickly in the two-stage approach.
• We propose Codegex, a novel static analysis approach that uses regex-based rules and several strategies that augment the rules with analysis contexts. Codegex can perform quick yet accurate analysis of partial code snippets without parsing them into an AST/CFG.
• We compare the effectiveness of Codegex and SpotBugs on 52 open-source projects. Our results show that Codegex can analyze real-world projects quickly with accuracy comparable to SpotBugs. Codegex runs up to approximately 24KX faster than SpotBugs when considering the initial compilation time. Moreover, we manually analyzed the FP and false negative (FN) cases reported by SpotBugs. Our study reveals several limitations of SpotBugs, and provides insights on potential improvements to its bug detection capabilities. Overall, we have reported 16 bugs to SpotBugs, of which ten are confirmed and eight are fixed.
• We evaluate the effectiveness of Codegex in automated code review by running it against 4256 PRs from 2769 different projects. As Codegex automatically analyzes PRs and leaves code review comments, we assess whether it follows bot ethics [19]. To our knowledge, this is the largest evaluation reported for automated code review on unresolved PRs. In the end, we received 91 pieces of positive feedback from the developers. The source code and our dataset are available at the anonymous link https://codegex-analysis.github.io.

II. MOTIVATING EXAMPLE

As Codegex only requires limited contexts (in a PR), one may think that it essentially trades accuracy for speed. In this section, we use a simplified example from a PR [20] to show how Codegex can be faster and yet more accurate in detecting certain patterns. Consider the SA_FIELD_SELF_COMPUTATION and SA_LOCAL_SELF_COMPUTATION patterns that check for nonsensical self computation (e.g., x & x) in global fields and local variables, respectively. Since Codegex relies on program text information in partial snippets, it cannot distinguish between a global field and a local variable without analyzing its scope. However, as self computation is considered problematic in any scope (local or global), Codegex can still detect the self computation. Listing 1 shows a self computation reported by Codegex in the expression endDate.getTime() - endDate.getTime(). Codegex detects the self computation by (1) searching for keywords representing arithmetic operators ('|', '^', '&', '-'), and (2) if a keyword is found, using the following regex to detect the self computation:

(\w(?:[\w.]|(?P<aux1>...))*)\s*([|^&-])\s*(\w(?:[\w.]|(?&aux1))*)

Here, the first group matches operand1 and defines the named capturing group aux1, the middle group matches the operator, and the last group matches operand2 by reusing aux1 via the subroutine call (?&aux1). The regex above checks if operand1 matches operand2.

In contrast, SpotBugs fails to detect the self computation in Listing 1 because (1) it performs bytecode analysis, where the checked expression needs to be parsed and the two method invocations need to be found on the stack, and (2) it needs to match the program text with the object endDate and the method getTime() (if an expression includes a long method call chain like a.b.c(), it needs to iteratively traverse the call chain). Based on the example (x-x) provided in the bug description for SA_LOCAL_SELF_COMPUTATION, we encode the '-' operator as one of the operators for self computation, thinking that SpotBugs should be able to detect the self computation temp-temp in Listing 2, where the expression endDate.getTime() is stored in temp. But SpotBugs still fails to detect the self computation because (1) it fails to detect double-typed variables (the opcodes for integer subtraction and floating-point subtraction are different, and adding this support requires matching opcodes for all supported data types); and (2) a bug exists in its current implementation of self computation for local variables. Although the expression x-x is listed as an example in its description, the current implementation only matches expressions containing the xor operator "^", and ignores all other operators. We reported both limitations and provided corresponding PRs to the developers of SpotBugs [21], [22], and the developers have accepted them. The example shows two problems in SpotBugs: (1) incomplete modeling of sibling types (e.g., double and integer), and (2) failure to handle complex expressions involving method calls. Although it may seem trivial to extend SpotBugs to support more opcodes, it requires the designer of a bug detector to think exhaustively about the possible scenarios (e.g., different data types) in which the detector will be invoked. Our example also shows that the scope of the variables (local versus field) is irrelevant for checking for self computation. The key to checking for self computations lies in the string matching of the form x op x. Hence, we argue that bug patterns that involve string matching (e.g., specified method names or operators) can be more easily checked using regex rules.

double exampleSelfComputation1(Date endDate) {
    // Codegex reported a self computation
    double temp = endDate.getTime() - endDate.getTime();
    return temp;
}
Listing 1: Self computation detected by Codegex

double exampleSelfComputation2(Date endDate) {
    double temp = endDate.getTime();
    // SpotBugs should report this self computation
    double second = temp - temp;
    return second;
}
Listing 2: Self computation that should be detected by SpotBugs
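To make the detection logic concrete, below is a minimal Python sketch of the two-phase check described above (keyword scan, then regex). It is an illustration, not Codegex's actual implementation: it uses the built-in re module and compares the two captured operands for equality in Python instead of the (?&aux1) subroutine construct (which requires the third-party regex package), and the operand pattern is simplified.

```python
import re

# Simplified operand: an identifier with optional dotted members and
# empty-argument calls, e.g. endDate.getTime()
OPERAND = r"(\w[\w.]*(?:\(\s*\))?(?:\.\w+\(\s*\))*)"
SELF_COMPUTATION = re.compile(OPERAND + r"\s*([|^&-])\s*" + OPERAND)

def find_self_computation(statement):
    # Phase 1: cheap keyword scan before running the full regex.
    if not any(op in statement for op in "|^&-"):
        return None
    # Phase 2: regex match, then compare the captured operands.
    for m in SELF_COMPUTATION.finditer(statement):
        lhs, op, rhs = m.group(1), m.group(2), m.group(3)
        if lhs == rhs:  # x op x: nonsensical self computation
            return (lhs, op, rhs)
    return None

print(find_self_computation(
    "double temp = endDate.getTime() - endDate.getTime();"))
# -> ('endDate.getTime()', '-', 'endDate.getTime()')
```

The same check also fires on Listing 2's temp - temp, since both captured operands equal "temp".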

III. MOTIVATING STUDY

Static analysis tools like SpotBugs rely on bug patterns, and each pattern depends on different types of analysis contexts to detect a bug. However, according to SpotBugs's official documentation [23], most of its analysis is local, which means that many of the analysis contexts are not required to detect a bug pattern. To investigate the required analysis contexts, we conducted a study of 451 bug patterns supported by SpotBugs. Our study aims to answer the research questions below:
RQ1: What is the scope of analysis needed to detect a bug pattern in SpotBugs?
RQ2: What program analysis techniques are important for detecting a bug pattern in SpotBugs?

RQ1 aims to identify the scope of analysis required for the bug patterns in SpotBugs. In RQ2, we studied the program analysis techniques needed for each pattern in SpotBugs. For each question, two authors of the paper categorized the results independently and met to resolve any disagreement.

For each bug pattern b, we first obtained a high-level understanding of its design by reading b's documentation (also known as the bug description in SpotBugs). Each bug description contains (1) the rationale behind the bug pattern (explaining why certain program behavior is problematic), (2) the condition that will trigger the bug, and (3) examples of the bug detected by the bug pattern. Then, if we failed to answer the two research questions based on the bug descriptions, we referred to the implementation of each pattern in SpotBugs. In total, there are 451 patterns listed on SpotBugs' bug description page. We excluded 12 of them as they are internal patterns used only by SpotBugs in experiments, and one of them as the pattern has been deleted from the implementation. This results in 438 patterns in our study. To study the analysis contexts required by each of these 438 patterns, we consider four scopes of analysis: (1) inter-class, (2) class, (3) method, and (4) statement. If a bug pattern requires multiple analysis scopes, we select the highest scope of analysis (e.g., if a pattern needs class-level and method-level information, we consider that it needs class-level information). For RQ2, we study the program analysis techniques (type, annotation, Java version, data flow, control flow, call graph, inheritance graph) that are used in each bug pattern.

Table I shows the results of our study, where the "Context" column denotes the context information used, the "Description" column explains each context, and the "Patterns (%)" column shows the percentage of patterns that require a specific context. The "Implemented (%)" column shows the percentage of patterns implemented in Codegex among all patterns using a particular analysis context. Table I shows that most patterns in SpotBugs require analyzing only method-level (41.32%) or statement-level (31.28%) information for their analyses, which indicates that most patterns do not require information beyond the class under analysis. Moreover, we observe that some method information (e.g., the method name) can be obtained by analyzing the program texts. In fact, eight patterns in Codegex use the method signature information for their analysis. Among all program analysis techniques required by each pattern, our study shows that data type information is the most important one (44.29%). Meanwhile, the last rows of Table I show that most bug patterns do not require information from various graph representations (control flow, data flow, call graph, inheritance graph). Specifically, most graph-based analysis techniques are required in less than 10% of the bug patterns in SpotBugs (except for the inheritance graph).

TABLE I: Analysis contexts used in the bug patterns in SpotBugs and those in the included patterns in Codegex

Context | Description | Patterns (%)² | Implemented (%)²
Inter-class | Check in multiple classes whether a field/method exists, or look at their content | 9.36 | 0
Class (overall)¹ | | 18.04 | 16.46
  Import | Check if the fully-qualified names (the import statements) contain a specific substring | 0.91 | 0
  Signature | Check the kind (class/interface/enum), modifiers and name of the inspected class | 3.2 | 35.71
  Field | Check the signature of fields (modifiers, type and name) in the inspected class | 11.19 | 16.33
  Body | Use information from (1) at least two methods, or (2) one method and other class-level info | 10.96 | 2.08
Method (overall)¹ | | 41.32 | 18.78
  Signature | Check modifiers, return type, method name, parameters of the inspected method | 13.7 | 13.33
  Local variable | Check the signature or usage of local variables | 4.57 | 40
  Field | Check the usage of fields in the scope of the inspected method | 4.79 | 14.29
  Body | Use information from (1) at least two statements, or (2) one statement and other method-level info | 34.02 | 10.74
Statement | Use information from only one statement or expression | 31.28 | 30.66
Analysis techniques:
  Type | Check the data type information of constants, variables or fields | 44.29 |
  Annotation | Check annotations of classes, fields, methods or parameters | 5.02 |
  Java version | Check the JDK version | 1.14 |
  Data flow | Calculate the set of possible values at each program point | 9.36 |
  Control flow | Check the control flow of the program | 9.36 |
  Call graph | Check the relationship between method calls | 3.2 |
  Inheritance graph | Check the inheritance relationship between classes | 17.81 |

¹ The numbers for the inter-class, class, method and statement levels do not overlap and sum to 100% (9.36+18.04+41.32+31.28=100). The numbers for the sub-contexts of the class level, the method level, and the analysis techniques are intersecting (e.g., one pattern can use method information from both the signature and a local variable).
² We compute these values using the total number of patterns (438) as the denominator.

IV. METHODOLOGY

Fig. 1: Workflow of Codegex

Figure 1 shows the workflow of Codegex. Given a PR or a complete program, Codegex first splits the code into program statements S. Then, it uses our regex-based analyzer with several heuristics to check if S matches any pattern. As a complete program has already been downloaded, Codegex skips the online search for it and relies on diff search for broadening its analysis scope. The analyzer produces information about the pattern type, the bug description, the source information (file name, line number), and the priority of a warning, which can be directly used as the output for a complete program. For PRs, our PR comment generator automatically produces review comments with the annotated code.

A. Preprocessing

PR: A PR in the unified diff format contains contexts, additions, and deletions. Our analyzer checks for violations in the contexts and additions but ignores the deletions (as they no longer exist in the new version).
Complete program: A complete program is treated as a code change that adds all files within the project repository.

Given a PR or a complete program with code changes C, Codegex parses the program text in C and splits the text into program statements. Specifically, it separates statements using the terminators of Java programs (i.e., the semicolon, '{', and '}'). Preprocessing allows each pattern to be matched statement-by-statement, giving the exact position of each statement.
B. Regex-based Analyzer

Given the statements S extracted by our preprocessing, our analyzer detects violations of bug patterns in S. The key technical challenge is to represent the selected patterns from SpotBugs using regex rules without compromising accuracy. We use several heuristics described below to solve this problem:

Syntax-guided matching. Table I shows that many SpotBugs patterns use class signature information (3.20%) or method signature information (13.70%) for detection. To encode such information into our regex rules, we use keywords representing class/method/variable/field names, modifiers (e.g., "static"), Java keywords (e.g., "if"), and operators (e.g., "&&") that are within the Java syntax for the bug detection. To support syntax-guided matching, we use the layered analysis approach [24] by checking a pattern in two phases: (1) keyword matching, and (2) pattern-based matching using regex. Keyword matching, which filters out statements that cannot match any pattern, is faster than regex rules. Specifically, it checks whether a statement contains a keyword that represents the condition for a bug pattern or a group of bug patterns. For example, the SE_NONSTATIC_SERIALVERSIONID pattern checks whether the field named serialVersionUID of a serializable class is declared as static. Statements without the keyword "serialVersionUID" are skipped by keyword matching.
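A sketch of this two-phase layering is shown below. The rule structure, keyword, and regex here are illustrative rather than Codegex's real rule format:

```python
import re

# Each rule pairs a cheap substring keyword with a full regex; the
# keyword acts as a fast prefilter so that most statements never
# reach the (more expensive) regex phase.
RULES = [
    {
        "pattern": "SE_NONSTATIC_SERIALVERSIONID",
        "keyword": "serialVersionUID",
        # a 'long serialVersionUID' declaration without 'static'
        "regex": re.compile(
            r"^(?!.*\bstatic\b).*\blong\s+serialVersionUID\b"),
    },
]

def check(statement):
    hits = []
    for rule in RULES:
        if rule["keyword"] not in statement:    # phase 1: keyword
            continue
        if rule["regex"].search(statement):     # phase 2: regex
            hits.append(rule["pattern"])
    return hits

print(check("private final long serialVersionUID = 1L;"))
# -> ['SE_NONSTATIC_SERIALVERSIONID']
print(check("private static final long serialVersionUID = 1L;"))
# -> []
```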
Explicit type-driven matching. Our study in Section III shows that data type information is an important analysis context used in many bug patterns. Although Codegex essentially treats code changes as plain text for pattern matching, we incorporate explicit type information into bug patterns by using data types as keywords in the analysis. For example, when detecting the pattern RV_01_TO_INT, which gives a warning when a random value from 0 to 1 is being coerced to an integer value, Codegex uses the regex \(\s*int\s*\)\s*(\w+)\.(?:random|nextDouble|nextFloat)\(\s*\) for its detection, where the \(\s*int\s*\) part detects the type coercion. By including the data type information, Codegex can report this pattern with high confidence by setting the bug pattern to high priority (the same priority used in SpotBugs).
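The regex quoted above can be exercised directly; a small demo with made-up statements:

```python
import re

# The RV_01_TO_INT regex from the text: an (int) cast applied to a
# random value in [0, 1), which always truncates to 0.
RV_01_TO_INT = re.compile(
    r"\(\s*int\s*\)\s*(\w+)\.(?:random|nextDouble|nextFloat)\(\s*\)")

for stmt in ["int r = (int) Math.random();",     # warns: always 0
             "int r = (int) rng.nextDouble();",  # warns: always 0
             "double d = rng.nextDouble();"]:    # no cast: no warning
    print("warn" if RV_01_TO_INT.search(stmt) else "ok", "|", stmt)
```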
Optimization by matching at word boundary. One of the commonly used heuristics to optimize regex performance is to do "whole word" matching by using word boundaries [25]. In the regex syntax, '\b' matches a word boundary (the edge between a sequence of alphanumeric characters or the underscore character, and any other character). For example, the regex \bif\b matches the standalone string "if" but does not match "ifa" because there is no word boundary to the right of "if". As program texts usually consist of whole-word tokens, we restrict each bug pattern to search for whole words so that it will skip over non-matching input quickly.
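A two-line illustration of the boundary effect on a toy text:

```python
import re

text = "identifier ifa gift " * 3 + "if (x > 0) { return; }"

# Without \b, 'if' also matches inside 'identifier', 'ifa' and 'gift'.
print(len(re.findall(r"if", text)))      # 10 (9 spurious hits)
# With word boundaries, only the standalone keyword matches, and the
# engine can reject non-matching positions quickly.
print(len(re.findall(r"\bif\b", text)))  # 1
```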
Broaden analysis scope via diff search and online search. Our study in Section III shows that some bug patterns in SpotBugs require more analysis contexts to detect certain patterns. Codegex includes two heuristics to enhance the analysis contexts: (1) diff search, and (2) online search. When these heuristics identify the relevant analysis contexts, Codegex adjusts the priority of a given bug pattern, as it gains higher confidence with more contexts. For most implemented bug patterns, Codegex matches a regex against a single program statement st. When the diff search heuristic is activated in a bug pattern, Codegex will use additional contexts around st by searching through all code changes in the input PR. For example, consider the SpotBugs pattern UI_INHERITANCE_UNSAFE_GETRESOURCE, which warns that the usage of this.getClass().getResource() is unsafe if this class is extended by a class in another package. Detecting this pattern requires checking whether (1) a statement contains the getClass().getResource() method invocation (regex can be used to match this), and (2) the class is extended (SpotBugs will increase the priority of the warning if this condition is met). To check condition (2), Codegex searches for the "extends ClassA" keywords (ClassA is the filename of this instance) within the code changes ("diff") in the given PR using the diff search heuristic (note that this will not work for an inner class with a different filename). If the diff search fails, Codegex will deploy online search to check condition (2). As Codegex only pre-downloads the "diff" of the PR, other classes that extend ClassA are not available for analysis (i.e., they may be in different files or different folders than the modified files in the PR). Hence, we implement online search using the GitHub Search API¹ to perform code search over the entire repository of the given PR for the "extends ClassA" keywords. If the query is found within the repository of the PR, Codegex will increase the priority of the bug pattern because condition (2) is satisfied. Currently, Codegex uses online search in only one pattern because (1) it is expensive, as it relies on the speed of the GitHub Search API, and (2) it requires defining a search query with exact matching (e.g., if we change the query to "extend Class", the search may return irrelevant results).
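A sketch of how a rule might escalate priority via diff search follows. The function and priority names are hypothetical, and the fallback to the GitHub Search API is stubbed out:

```python
import re

GETRESOURCE = re.compile(r"\bgetClass\(\)\s*\.\s*getResource\(")

def check_unsafe_getresource(statement, diff_text, class_name):
    # Condition (1): the statement calls getClass().getResource().
    if not GETRESOURCE.search(statement):
        return None
    # Condition (2): diff search for a class extending this class;
    # if found, raise the priority.  A real implementation would fall
    # back to the GitHub Search API when the diff search fails.
    priority = "LOW"
    if re.search(r"\bextends\s+" + re.escape(class_name) + r"\b",
                 diff_text):
        priority = "HIGH"
    return ("UI_INHERITANCE_UNSAFE_GETRESOURCE", priority)

diff = ('class Child extends Base { }\n'
        'url = this.getClass().getResource("/a.png");')
print(check_unsafe_getresource(
    'url = this.getClass().getResource("/a.png");', diff, "Base"))
# -> ('UI_INHERITANCE_UNSAFE_GETRESOURCE', 'HIGH')
```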
Encode operator precedence. We encode the Java operator precedence (used for determining the order in which operators are evaluated) in our analyzer to increase the accuracy of analyzing arithmetic and bitwise operations. For example, when detecting the SA_LOCAL_SELF_COMPUTATION pattern, which checks for nonsensical self computation, in return i|i&j;, a naive regex extracting the bitwise operation would match the first expression i|i instead of the correct grouping i|(i&j) (the precedence of & is higher than that of |). Encoding operator precedence therefore reduces Codegex's FP rate.
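One way to realize this, sketched below under the assumption that expressions are split at the loosest-precedence operator first (so the operands mirror the parse tree); it ignores unary minus and string contexts:

```python
# Java binary operator precedence for the operators relevant here
# (earlier in the list = binds more loosely = split first).
PRECEDENCE = ["|", "^", "&", "-"]  # illustrative subset

def split_lowest(expr):
    # Split at the first top-level operator of the loosest precedence
    # level present, skipping anything inside parentheses.
    for op in PRECEDENCE:
        depth = 0
        for i, ch in enumerate(expr):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == op and depth == 0:
                return expr[:i].strip(), op, expr[i + 1:].strip()
    return None

print(split_lowest("i|i&j"))    # ('i', '|', 'i&j')  -> not x op x
print(split_lowest("i&j|i&j"))  # ('i&j', '|', 'i&j') -> self computation
```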
Most patterns in SpotBugs have anti-patterns (i.e., rules that disallow matching certain elements). To reduce FPs, we encode anti-patterns using two heuristics:

Encode anti-patterns via keyword filtering. To understand and reuse the design of each pattern, we refer to: (1) the bug description, (2) the source code, and (3) the test cases in SpotBugs. We extract anti-patterns from these resources to improve the accuracy of our analysis. For example, the pattern NM_CLASS_NAMING_CONVENTION checks for upper camel case in the name of a Java class. To prevent FPs when analyzing special classes, SpotBugs added a filter for class names with the underscore character. We reuse this filter to skip the check for class names with underscore characters.

Encode anti-patterns via negative lookahead. As Codegex analyzes incomplete programs that may contain only the declaration site or the call site of methods, it uses negative lookahead (a regex construct q(?!u) that matches q not followed by the regex u) for filtering negative/corner cases. For example, to detect the NM_METHOD_NAMING_CONVENTION pattern, which checks whether a Java method name is in lower camel case format, we include the regex (?!new) to avoid matching constructors (e.g., new Object()), whose names can start with a capital letter.
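A sketch of this anti-pattern filter. Codegex's published description mentions a negative lookahead around 'new'; the sketch below shows the same effect with the sibling construct, a fixed-width negative lookbehind, which is the natural fit for "not preceded by new" in a standalone example:

```python
import re

# A capitalized name followed by '(' is flagged only if it is NOT
# preceded by 'new ', i.e. it is not a constructor invocation.
BAD_METHOD_NAME = re.compile(r"(?<!new\s)\b([A-Z]\w*)\s*\(")

print(BAD_METHOD_NAME.search("obj = new Object();"))         # None
print(BAD_METHOD_NAME.search("void DoWork() { }").group(1))  # 'DoWork'
```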
¹ https://docs.github.com/en/rest/reference/search

C. PR Comment Generator

For each code snippet in a PR for which our analyzer produces a warning, our PR comment generator gives a review comment with the annotated code. We reuse SpotBugs's bug descriptions for the comments. In SpotBugs, code that violates a bug pattern pat has (1) a bug category cat (e.g., STYLE), (2) a short description sd, and (3) a long description ld. Codegex produces review comments using the template below:

"I detect that this code is problematic. According to the cat, sd (pat). ld"

Figure 3 shows an example of Codegex's generated comment and the annotated code for the NM_METHOD_NAMING_CONVENTION pattern, which belongs to the BAD_PRACTICE category.
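Rendering this template is straightforward; a sketch with abridged, assumed description strings (the real sd and ld come verbatim from SpotBugs' bug metadata):

```python
# Fill the review-comment template above from SpotBugs-style bug
# metadata (category, short and long description).
TEMPLATE = ("I detect that this code is problematic. "
            "According to the {cat}, {sd} ({pat}). {ld}")

def render_comment(pat, cat, sd, ld):
    return TEMPLATE.format(pat=pat, cat=cat, sd=sd, ld=ld)

print(render_comment(
    pat="NM_METHOD_NAMING_CONVENTION",
    cat="BAD_PRACTICE",
    sd="Method names should start with a lower case letter",
    ld="Methods should be verbs, in mixed case with the first "
       "letter lowercase."))
```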

Implementation. Codegex uses the Python built-in regex library, whose regex pattern language has been studied [26], as well as its extension that offers extra functionalities (e.g., named capturing groups). Table II shows the statistics of the implemented bug patterns across different categories. The "# implemented patterns" column shows the number of patterns implemented in Codegex, and the "Total" column denotes the total number of patterns in SpotBugs. We only implement 87 patterns because 212 patterns require support for multiline regex, and 139 patterns cannot be detected using regex. As shown in Table II, most patterns Codegex currently supports belong to the CORRECTNESS and BAD_PRACTICE categories. We prioritize these categories because prior studies of FindBugs have shown their importance (i.e., most development efforts focus on these categories [27], and their warnings have a shorter lifetime, implying that they are more serious [28]).

TABLE II: Statistics of implemented bug patterns.

Category | # implemented patterns | Total
CORRECTNESS | 37 | 145
BAD_PRACTICE | 22 | 91
PERFORMANCE | 14 | 37
STYLE | 8 | 86
MT_CORRECTNESS | 5 | 46
MALICIOUS_CODE | 1 | 17
Others | 0 | 16
Total | 87 | 438

V. EVALUATION

We evaluate two settings in which Codegex may be useful: (1) giving instant feedback for real-world projects, and (2) providing review comments for PRs. Codegex uses several heuristics to improve the effectiveness of its analysis (Section IV-B). As all heuristics (except for online search) are tightly coupled with the design of each bug pattern, we did not evaluate each heuristic separately.

All experiments were conducted on a machine with an Intel(R) Core(TM) i7-8700 CPU @ 3.2 GHz and 32 GB RAM.

Comparison with SpotBugs. While there are many static analyzers [12], [29]–[32], we only evaluate against SpotBugs because the patterns in Codegex are derived from SpotBugs, and the patterns determine the types of detected bugs (a fair comparison with other analyzers is infeasible because each tool detects different types of bugs). Our evaluation aims to address the questions below:
RQ3: Compared to SpotBugs, what is the quality of the warnings generated by Codegex?
RQ4: How responsive is Codegex compared to SpotBugs?

Selection of projects. We evaluate Codegex and SpotBugs on 52 open-source Java projects from GitHub. Projects are selected by building a crawler to get the top 100 Java projects that (1) have the greatest number of stars, and (2) use Maven for compilation (the SpotBugs Maven plugin is our baseline). Although Codegex does not require compilation, SpotBugs can only be run on compiled code, so we excluded 48 uncompilable projects. We manually classified the root causes of the build errors from the 48 projects' build logs using the definitions in [33]. We conclude that these errors occur due to: (1) 19 compilation errors caused by incompatible Java versions or syntax errors in the code, (2) 13 resolution errors due to missing dependencies or dependency version issues, (3) 12 errors with other causes (e.g., network problems), and (4) four build file parsing failures. The high percentage of build errors in popular Java projects is in line with the findings of prior work that broken snapshots occur in most projects [33]. It motivates the need for a tool that does not require compilation. We did not crawl further projects due to limited resources and high manual inspection costs. In total, we evaluate on 52 Java projects. To the best of our knowledge, the number of evaluated projects is the largest reported so far in all prior evaluations of static analyzers [13], [34]–[36]. Table III presents the performance comparison with SpotBugs. The "Project" column denotes the abbreviated name of each project, and the "KSLOC" column gives the size in 1000 (K) Source Lines of Code (SLOC). Overall, the evaluated projects are diverse in terms of size (0.01–1279.49 KSLOC).

TABLE III: Performance of Codegex versus SpotBugs (in seconds; IC, SC and A(S) are for SpotBugs, A(C) is for Codegex, and S1–S3 are speedups)

Project | KSLOC | IC | SC | A(S) | A(C) | S1 | S2 | S3
community | 3.72 | 160.80 | 6.55 | 5.13 | 0.58 | 283.91 | 19.98 | 8.78
Angular2AndJavaEE | 1.42 | 239.20 | 92.80 | 5.44 | 0.25 | 963.40 | 386.87 | 21.42
biojava | 117.78 | 59.22 | 13.70 | 42.44 | 19.30 | 5.27 | 2.91 | 2.20
nacos-spring | 1.80 | 584.01 | 12.84 | 15.36 | 0.31 | 1925.08 | 90.57 | 49.33
spring-boot | 0.48 | 59.09 | 3.59 | 3.60 | 0.07 | 904.59 | 103.78 | 51.92
spring-boot-java | 5.30 | 70.00 | 9.80 | 5.00 | 0.92 | 81.80 | 16.14 | 5.46
quickfixj | 1279.49 | 442.60 | 180.20 | 116.80 | 3.25 | 171.90 | 91.26 | 35.89
spring-comparing | 0.91 | 178.40 | 7.22 | 4.36 | 0.14 | 1304.40 | 82.65 | 31.09
tij4-maven | 17.89 | 54.19 | 0.24 | 1.05 | <0.01 | 5524.60 | 129.80 | 105.40
fabric8-maven-plugin | 48.01 | 2654.20 | 17.39 | 43.60 | 3.79 | 712.10 | 16.10 | 11.51
java-microservice | 2.59 | 883.80 | 8.12 | 11.04 | 0.04 | 24213.24 | 518.55 | 298.78
spring-cloud-release | 0.49 | 68.37 | 59.86 | 1.55 | <0.01 | 6991.80 | 6140.40 | 154.60
java-uuid-generator | 3.07 | 7.13 | 2.17 | 1.80 | 0.26 | 34.02 | 15.14 | 6.86
cloud-opensource | 10.51 | 721.80 | 322.60 | 11.17 | 0.90 | 817.39 | 372.22 | 12.46
flyer-maker | 0.76 | 13.37 | 1.83 | 3.51 | 0.15 | 114.54 | 36.25 | 23.83
gchisto | 6.57 | 8.60 | 2.70 | 5.61 | 1.23 | 11.54 | 6.75 | 4.56
travels-java-api | 2.95 | 204.60 | 5.16 | 4.49 | 0.35 | 597.25 | 27.56 | 12.82
spring-zeebe | 1.58 | 951.80 | 8.38 | 10.51 | 0.22 | 4326.06 | 84.93 | 47.26
visualee | 3.76 | 37.26 | 3.48 | 4.04 | 0.32 | 130.81 | 23.82 | 12.80
javaee7-essentials | 0.01 | 1.61 | 1.29 | 1.11 | <0.01 | 272.40 | 239.60 | 111.00
webcam-capture | 16.02 | 183.20 | 7.24 | 32.07 | 1.93 | 111.40 | 20.34 | 16.60
cloud-espm-v2 | 4.67 | 4.19 | 3.70 | 5.94 | 0.45 | 22.40 | 21.30 | 13.13
reactive-ms-example | 1.49 | 570.00 | 3.55 | 3.58 | 0.10 | 5986.65 | 74.42 | 37.32
osgi.enroute | 0.99 | 3351.60 | 163.20 | 15.84 | 0.24 | 13815.62 | 734.53 | 64.97
kafka-streams | 99.40 | 4203.80 | 23.62 | 9.35 | 16.94 | 248.64 | 1.95 | 0.55
code-assert | 7.61 | 970.20 | 10.46 | 11.03 | 1.10 | 889.68 | 19.48 | 10.00
opencc4j | 0.87 | 80.80 | 2.62 | 3.36 | 0.10 | 815.89 | 57.97 | 32.61
hprose-java | 15.85 | 12.37 | 3.72 | 6.06 | 2.85 | 6.47 | 3.43 | 2.13
SpringBootUnity | 5.86 | 2682.20 | 17.76 | 31.70 | 1.19 | 2287.90 | 41.70 | 26.72
triava | 5.92 | 7.26 | 2.09 | 4.52 | 0.98 | 12.07 | 6.78 | 4.63
jol | 7.20 | 14.01 | 3.13 | 10.73 | 0.86 | 28.73 | 16.10 | 12.46
javaee8-essentials | 0.02 | 1.38 | 1.10 | 1.14 | <0.01 | 251.80 | 224.00 | 113.80
cargotracker | 5.97 | 626.00 | 6.40 | 4.54 | 0.73 | 864.99 | 15.01 | 6.23
hope-cloud | 0.14 | 168.80 | 5.94 | 8.38 | 0.01 | 11935.33 | 964.24 | 564.37
java-speech-api | 1.37 | 8.60 | 1.68 | 3.74 | 0.28 | 43.49 | 19.09 | 13.18
jmh | 249.88 | 73.42 | 16.23 | 29.73 | 3.71 | 27.80 | 12.39 | 8.01
javaee-javascript | 0.35 | 27.69 | 1.90 | 3.29 | 0.06 | 547.18 | 91.67 | 58.14
paho.mqtt.java | 28.44 | 31.92 | 11.35 | 13.59 | 3.43 | 13.26 | 7.26 | 3.96
bitfinex-v2 | 6.41 | 24.89 | 2.78 | 4.51 | 0.91 | 32.18 | 7.98 | 4.93
aem-component | 1.65 | 88.96 | 6.62 | 4.20 | 0.33 | 281.43 | 32.70 | 12.69
superword | 9.08 | 115.00 | 23.13 | 6.14 | 1.83 | 66.14 | 15.98 | 3.35
reddit-bot | 1.50 | 15.27 | 3.45 | 3.84 | 0.20 | 95.71 | 36.50 | 19.24
asmsupport | 26.71 | 28.78 | 6.28 | 16.78 | 4.44 | 10.26 | 5.19 | 3.78
Benchmark | 146.26 | 110.00 | 15.03 | 21.79 | 28.79 | 4.58 | 1.28 | 0.76
wro4j | 33.41 | 356.20 | 44.72 | 11.91 | 3.34 | 110.26 | 16.96 | 3.57
spring-mvc | 2.00 | 83.20 | 4.80 | 4.25 | 0.34 | 261.02 | 27.03 | 12.70
spring-context | 3.06 | 14.42 | 1.94 | 3.86 | 0.38 | 48.06 | 15.25 | 10.15
iot-dc3 | 14.67 | 295.20 | 28.01 | 45.70 | 2.97 | 114.65 | 24.79 | 15.37
nacos-spring-project | 7.40 | 104.20 | 6.94 | 8.31 | 0.90 | 124.99 | 16.94 | 9.23
spring-boot-graalvm | 0.05 | 59.06 | 3.21 | 3.37 | 0.01 | 10927.37 | 1151.37 | 589.86
cms-admin-end | 8.23 | 74.80 | 5.19 | 5.64 | 1.47 | 54.91 | 7.39 | 3.85
spring4.x-project | 0.58 | 348.80 | 2.08 | 22.42 | 0.11 | 3495.71 | 230.69 | 211.12
Average | 42.73 | 425.70 | 23.07 | 12.67 | 2.18 | 1979.28 | 237.06 | 55.72

When running SpotBugs, we use the default configurations except for two differences: (1) we use the debug option in SpotBugs to output a list of analyzed files, and give the list to Codegex to ensure that both tools analyze the same files (note that we disable the debug option when computing the analysis time, to avoid adding overhead to SpotBugs); (2) we run both tools only on the 87 implemented patterns, filtering out the unimplemented patterns in SpotBugs. In our experiments, we use version v4.1.4 of SpotBugs and the SpotBugs Maven Plugin (v4.2.3). To ensure a fair comparison with SpotBugs, the entire repository is downloaded and the same set of files is given to both tools for analysis.

Quality of generated warnings. We measure the quality of the warnings generated by each tool. Given the set C of Codegex's generated warnings and the set S of SpotBugs' generated warnings, we use S ∪ C as our ground truth because (1) a labeled dataset for all the evaluated projects is unavailable, and manually labeling each statement as buggy or not is time-consuming, and (2) as FNs are absent warnings, we can only rely on the extra warnings (S′ or C′) to determine the FNs of each tool. We compute the relative true positives (TP_R), i.e., the true warnings in S ∪ C; the relative false positives (FP_R), i.e., the false warnings in S ∪ C; and the relative false negatives (FN_R), i.e., the unreported true warnings in S ∪ C:

Relative Accuracy = (TP_R + TN_R) / (TP_R + FP_R + FN_R + TN_R)
Relative Recall = TP_R / (TP_R + FN_R)
J(S, C) = |S ∩ C| / |S ∪ C| = 760/883 ≈ 0.86

Relative accuracy and recall are used to compare the quality of the analysis results, whereas the Jaccard index J(S, C) is used to measure the similarity between the two sets of data (S and C). The precision is 100% for each tool, as all results are included in our ground truth dataset. The high Jaccard index between the warnings generated by SpotBugs and those generated by Codegex (0.86) indicates that the analysis results of Codegex are comparable to those of SpotBugs. Since the Jaccard index shows high similarity between S and C, we divided the set of warnings given by both tools into:

(O) Overlaps: The set of warnings generated by both tools that match the same bug instance (the same statement within the same class of the same project). When both tools give the same warnings, we assume that these warnings share the same quality and label them as TP_R. We manually analyzed them to verify that they refer to the same bug instance.
(S′) Unique to SpotBugs: The set of warnings generated exclusively by SpotBugs but not produced by Codegex.
(C′) Unique to Codegex: The set of warnings generated exclusively by Codegex but not produced by SpotBugs.
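A quick check of these definitions against the totals reported later in Table IV (SpotBugs: TP_R=829, FN_R=54; Codegex: TP_R=814, FN_R=69); since the ground truth is S ∪ C, TN_R = FP_R = 0 and accuracy reduces to recall:

```python
# Relative metrics from the definitions above.
def relative_metrics(tp_r, fn_r, fp_r=0, tn_r=0):
    accuracy = (tp_r + tn_r) / (tp_r + fp_r + fn_r + tn_r)
    recall = tp_r / (tp_r + fn_r)
    return accuracy, recall

print(relative_metrics(829, 54))  # SpotBugs: ~ (0.9388, 0.9388)
print(relative_metrics(814, 69))  # Codegex:  ~ (0.9219, 0.9219)
print(760 / 883)                  # Jaccard index ~ 0.8607
```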
RQ3: Results for effectiveness. Table IV presents the results for the effectiveness of SpotBugs and Codegex for the patterns in S′ or C′. The "O" column shows the number of overlapping warnings (O) for each bug pattern, and the "Others" row denotes 20 bug patterns for which both tools produce the same sets of warnings (i.e., achieve the same accuracy and precision). The "Accuracy" and "Recall" columns show the relative accuracy and relative recall, respectively. We observe that the values of the relative accuracy and recall for a tool are the same in each row. We explain this scenario by including the "TP_R" columns to show the relative true positives and the "FN_R" columns to show the relative false negatives (we do not show the values for the relative TNs and FPs, as TN_R = FP_R = 0 for all patterns). The relative FP_R is 0 for all patterns because (1) our manual analysis shows that all evaluated tools only generate true warnings (no theoretical FPs), and (2) these warnings may be marked as effective FPs [37] by developers, but we only evaluate theoretical FPs here. Overall, Codegex achieved results comparable to SpotBugs in terms of overall accuracy and recall. As highlighted in Table IV, Codegex outperforms SpotBugs in accuracy and recall for seven patterns. We observe that these seven patterns mostly rely on string matching (e.g., matching the program text of the operands for the self computation patterns, and matching class/method names for patterns like DMI_RANDOM_USED_ONLY_ONCE). This observation is in line with our hypothesis that bug patterns relying on string matching can be more easily matched via regex rules. Refer to Section VII for the limitations of each tool.

TABLE IV: Effectiveness of Codegex versus SpotBugs (S = SpotBugs, C = Codegex)

Pattern | TP_R S | TP_R C | FN_R S | FN_R C | O | Acc. S | Acc. C | Rec. S | Rec. C
ES_COMPARING_STRINGS_WITH_EQ | 16 | 5 | 0 | 11 | 5 | 100.0 | 31.25 | 100.0 | 31.25
SA_SELF_COMPUTATION | 0 | 1 | 1 | 0 | 0 | 0.0 | 100.0 | 0.0 | 100.0
EQ_COMPARING_CLASS_NAMES | 0 | 1 | 1 | 0 | 0 | 0.0 | 100.0 | 0.0 | 100.0
DMI_RANDOM_USED_ONLY_ONCE | 176 | 215 | 39 | 0 | 176 | 81.86 | 100.0 | 81.86 | 100.0
SA_SELF_COMPARISON | 0 | 1 | 1 | 0 | 0 | 0.0 | 100.0 | 0.0 | 100.0
VA_FORMAT_STRING_USES_NEWLINE | 60 | 66 | 6 | 0 | 60 | 90.91 | 100.0 | 90.91 | 100.0
NM_CLASS_NAMING_CONVENTION | 6 | 5 | 0 | 1 | 5 | 100.0 | 83.33 | 100.0 | 83.33
DM_STRING_CTOR | 5 | 1 | 0 | 4 | 1 | 100.0 | 20.0 | 100.0 | 20.0
UI_INHERITANCE_UNSAFE_GETRESOURCE | 8 | 9 | 1 | 0 | 8 | 88.89 | 100.0 | 88.89 | 100.0
DMI_USELESS_SUBSTRING | 0 | 1 | 1 | 0 | 0 | 0.0 | 100.0 | 0.0 | 100.0
DM_BOXED_PRIMITIVE_FOR_COMPARE | 2 | 1 | 0 | 1 | 1 | 100.0 | 50.0 | 100.0 | 50.0
DM_BOXED_PRIMITIVE_FOR_PARSING | 13 | 5 | 0 | 8 | 5 | 100.0 | 38.46 | 100.0 | 38.46
IIO_INEFFICIENT_LAST_INDEX_OF | 26 | 25 | 0 | 1 | 25 | 100.0 | 96.15 | 100.0 | 96.15
DMI_HARDCODED_ABSOLUTE_FILENAME | 46 | 16 | 1 | 31 | 15 | 97.87 | 34.04 | 97.87 | 34.04
IIO_INEFFICIENT_INDEX_OF | 363 | 354 | 3 | 12 | 351 | 99.18 | 96.72 | 99.18 | 96.72
Others (20 unique patterns) | 108 | 108 | 0 | 0 | 108 | 100.0 | 100.0 | 100.0 | 100.0
Total | 829 | 814 | 54 | 69 | 760 | 93.88 | 92.19 | 93.88 | 92.19

Answer to RQ3: Codegex achieves accuracy comparable to SpotBugs.

Responsiveness. We compute the metrics below:
(IC) Initial compilation time: the time taken to build a project.
(SC) Subsequent compilation time: the time taken to build a project with all dependencies pre-downloaded.
(A(t)) Analysis time: the time taken for a tool t to produce its analysis report. We use A(S) to denote SpotBugs' analysis time, and A(C) to denote Codegex's analysis time.

S1 = (IC + A(S)) / A(C)    S2 = (SC + A(S)) / A(C)    S3 = A(S) / A(C)
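For illustration, the speedup formulas applied to the first row of Table III (project 'community'):

```python
# S1/S2/S3 from the formulas above; IC=160.80, SC=6.55,
# A(S)=5.13, A(C)=0.58 for project 'community'.
def speedups(ic, sc, a_s, a_c):
    s1 = (ic + a_s) / a_c
    s2 = (sc + a_s) / a_c
    s3 = a_s / a_c
    return round(s1, 2), round(s2, 2), round(s3, 2)

print(speedups(160.80, 6.55, 5.13, 0.58))
# -> (286.09, 20.14, 8.84); close to the 283.91/19.98/8.78 reported
# in Table III, where every time is averaged over five runs.
```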
We calculate IC and SC only for SpotBugs, as it requires compilation. We include both IC and SC since one may argue that IC is unimportant because downloading dependencies is only a one-time effort. To account for performance differences across multiple runs, we repeated each time measurement five times and report the average time in Table III.

RQ4: Results for response time. The last seven columns in Table III show the performance comparison between Codegex and SpotBugs. The "IC" column shows the initial compilation time (IC) needed for building each project, whereas the "SC" column denotes the subsequent compilation time (SC). The "A(S)" and "A(C)" columns denote the analysis time for SpotBugs and Codegex, respectively. The "S1", "S2" and "S3" columns show the speedup achieved by Codegex over SpotBugs when taking the sum of the initial compilation time and the analysis time (S1), when taking the sum of the subsequent compilation time and the analysis time (S2), and when considering only the analysis time (S3). We observe quite impressive speedups of up to 24K for the initial compilation setting, as it takes time to download dependencies for some Java projects, but Codegex does not have this limitation. Moreover, Codegex also outperforms SpotBugs in terms of the other speedups (S2 and S3). Specifically, Codegex achieves an average speedup of 237.06 over SpotBugs for S2, and an average speedup of 55.72 for S3. Considering only the analysis time of both tools (A(S) and A(C)) for measuring the response time, SpotBugs has an average analysis time of 12.67s, whereas Codegex has an average analysis time of 2.18s. On average, the analysis time of Codegex is below Nielsen's recommended 10s threshold for interactive feedback, suggesting that Codegex allows "keeping the user's attention focused on the dialogue", whereas SpotBugs' results exceed the limit, indicating that "users will want to perform other tasks while waiting for the analysis to finish" [38].

Answer to RQ4: Codegex can provide instant feedback with an average analysis time of 2.18s (SpotBugs takes an average of 12.67s).

Effectiveness of Codegex for code review. We evaluate the effectiveness of Codegex in generating code review comments for open PRs by answering the following questions:
RQ5: What is the quality of Codegex's generated comments?
RQ6: How efficient is Codegex in performing code review?

Crawling PRs. Existing datasets for evaluating automated code review approaches are unsuitable for our evaluation because they only contain manual code review comments as the ground truth [39]–[42], which may not cover the problems reported by a static analyzer. To evaluate the real capability of Codegex in generating review comments, we built a crawler to get the 10977 most recently opened PRs on GitHub. We select the PRs that have at least one code change in Java files (because our tool only analyzes Java files), resulting in 4256 PRs from 2769 different projects. A supplementary table shows that the evaluated PRs are quite diverse, as the involved patches modify 0–25267 lines of code and span 1–30 files.

Measuring the quality of generated reviews. As manually checking the correctness of the 372 generated comments for 4256 PRs is time-consuming, we rely on developer feedback to measure their quality. Given a PR and its corresponding feedback f, we manually classify f into:
(AF) Accept and fixed: We consider f to be AF if the developer (1) gave positive feedback or acknowledgment, and (2) modified the code or mentioned that they will fix it in the future.
(AC) Accept: We consider f to be AC if the developer (1) gave positive feedback or acknowledgment, or (2) used a positive emoji.
(NI) Not interested: We consider f to be NI if the developer (1) used a neutral emoji, (2) wanted to unsubscribe from our service, (3) said that our suggestions have little impact, or (4) ignored our comment.
(MR) Mark as resolved: We consider f to be MR if the developer marked our comments as resolved but did not reply.
(FP) Mark as FP: We consider f to be FP if the developer (1) used a negative emoji, (2) found an inconsistency between their code and our comment, or (3) said that our comment is inapplicable.

RQ5: Results for quality of generated reviews. Some code changes have multiple violations of a bug pattern. Instead of leaving one comment per violation, Codegex only gives one comment per bug pattern to avoid spamming developers with too many comments. In total, we received 116 pieces of feedback from developers of the corresponding PRs, where AF=55, AC=36, NI=7, MR=6, and FP=12. The effective false positives (12/116 ≈ 10%) match well with the expectation in a prior study [37]. We ran the 12 FPs through SpotBugs and found that it gave the same ten FPs. The two remaining FPs are due to the failure of Codegex in matching special AST elements.

Fig. 2: Quality of Codegex's generated review comments

Figure 2 presents a bar chart where the x-axis shows the number of pieces of feedback received for a given category, and the y-axis shows the names of the patterns with at least one piece of feedback. Different shades denote different categories, where "AF" (unshaded) and "AC" (diagonally shaded) are positive feedback, "NI" (vertically shaded) and "MR" (horizontally shaded) are neutral feedback, and "FP" (black) denotes FPs. Overall, most feedback is positive (only five patterns have ≥1 negative feedback).

Answer to RQ5: Among the 116 pieces of feedback that we received from developers, 78.45% are positive.

Example feedback. Figure 2 shows that developers tend to give feedback for patterns related to naming conventions (patterns with names ending with NAMING_CONVENTION). This observation is in line with prior studies of code review showing that developers usually fix maintainability-related issues [44], [45].

Fig. 3: Example feedback received for a PR in OpenJDK [43]

Figure 3 shows positive feedback for the NM_METHOD_NAMING_CONVENTION pattern in an OpenJDK PR. In his response, the OpenJDK developer said that Codegex helped in detecting method renaming problems in several 20-year-old methods. Consider another piece of positive feedback for the PR [46], where the developer not only agreed with Codegex's generated comment, but also added the SpotBugs plugin after getting our automatically generated comment. As SpotBugs and Codegex give similar outputs (all comments in Codegex are derived from the bug descriptions in SpotBugs), these developers mistakenly treated our tool as SpotBugs. This result is in line with our finding that Codegex can achieve accuracy comparable to SpotBugs. Moreover, the positive feedback also shows the role of Codegex as a lightweight frontend for SpotBugs that provides a better user experience (i.e., it does not require build configuration, and provides quick yet accurate feedback), prompting users to install SpotBugs after trying our lightweight frontend.

RQ6: PR analysis time. For the 4256 evaluated PRs, Codegex automatically produced 372 review comments. In total, Codegex takes 702 seconds to analyze the 4256 PRs. If we turn off the online search strategy, Codegex only takes 168 seconds to analyze the 4256 PRs, i.e., an average analysis time of 0.039 seconds per PR. These results are below Nielsen's recommended 0.1 second time limit for a system to react instantaneously [38], indicating the promise of using Codegex in an online setting (as a code review bot for checking PRs).

VI. SURVEY ON THE EASE OF WRITING REGEX RULES

To evaluate the ease of writing regex rules compared to the non-regex rules in SpotBugs, we surveyed five students at the authors' university (two first-year graduate students and three juniors) about their experience in implementing regex rules in Codegex while referring to the corresponding implementations in SpotBugs. The survey contains questions about (1) prior programming experience, (2) familiarity with regex, and (3) the ease of writing bug detection rules (see the supplementary material for the questions). The participants have 2–5 years of Java programming experience (SpotBugs is written in Java) and 1–2 years of Python programming experience (Codegex is implemented in Python). Overall, our survey results show that although most participants are only moderately familiar with regex (average Likert score = 2.6, with 5 being expert), they think that implementing a bug pattern using a regex-based rule is easier than using a non-regex rule in SpotBugs, as they rated the difficulty of using regex (average Likert score = 2.8, with 5 being very difficult) lower than that of using non-regex rules (average Likert score = 3.6).
material for the questions). The participants have 2–5 years the same class by checking to see if the names of their
of Java programming experience (SpotBugs is written in Java) classes are equal” so we expect SpotBugs to warn about
and 1–2 years of Python programming experience (Codegex is c.getClass().getName().equals(c2.getClass()
implemented in Python). Overall, our survey results show that .getName()) but it has an F N R because it only checks
although most participants are moderately familiar with regex if the comparison is inside the equals method. When
(average Likert score=2.6 with 5 being expert), they think that we read the bug descriptions for other related patterns, we
implementing a bug pattern using regex-based rule is easier think that it should be changed to ”This class defines an
than using non-regex in SpotBugs as they rated the difficulty equals method that ...” [47]. As the users rely on the bug
of using regex (average Likert score=2.8 with 5 being very description to understand warnings, future researches could be
difficult) lower than that of using non-regex (average Likert approaches that automatically detect inconsistencies between
score=3.6). bug descriptions of related patterns.
Limitations of a regex-based approach. Based on the F N R s
VII. L IMITATIONS for Codegex in Table IV, we identify the following limitations:
SpotBugs’ limitations. Below are SpotBugs’ limitations based Only support single-statement (44/69): Due to
on F N R s in Table IV: the lack of multiline regex support, Codegex fails
Missed specific kinds of operands (35/54): As Spot- to detect patterns that require sophisticated analysis
Bugs analyzes bytecode, its bug detection has to con- (data flow analysis). For example, the pattern
sider the variants of the same operation given differ- DMI HARDCODED ABSOLUTE FILENAME requires
ent kinds of operands. For example, when detecting the checking (1) a File object is created, and (2) there exists an
DMI RANDOM USED ONLY ONCE pattern that warns absolute path string in the File object creation. As Codegex
when a random object is created and used only once, can only detect explicit object creation (e.g., new File(”/abs”)),
SpotBugs has F N R s when analyzing int randNumber = it fails to detect the string usage in indirect object construction
new Random().nextInt(99); since it only checks for (e.g., inside the method parseZip("/abs")). In future,
instructions that load a local variable but misses those that load we plan to use the multiline mode in the regex library to
a constant (99). Future research can work on testing SpotBugs support more patterns .
with different kinds of operands to find these F N R s. Fails to infer data type (24/69): As Codegex relies on
Missed compound expressions (8/54): The bug explicit type driven matching, it fails to detect patterns that
detection rules in SpotBugs usually only consider require type inference. For example, when detecting the
simple expressions, and may miss violations in ES COMPARING STRINGS WITH EQ pattern which com-
compound expressions. For example, when detecting pares String objects for reference equality using the ==
the VA FORMAT STRING USES NEWLINE pattern or != operators, Codegex fails to warn about the expression
that gives a warning when a format string statement this.getName()==that.getName() due to failure to
includes a newline character ’\n’, SpotBugs has an F N R s infer the return type of the getName() method.
for String.format(var+"GitHub.\n"); with a Missed special AST elements (1/69): As Codegex
compound expression var+ "GitHub.\n" due to the does not parse code into ASTs, it may miss
unsupported string concatenation operation. some AST nodes. For example, when checking the
Incomplete modeling of sibling types (6/54): As sibling types NM CLASS NAMING CONVENTION pattern that checks
(e.g., the two floating-point types: float and double) share for upper camel cases, Codegex did not warn about the enum
similar behaviors, one would expect SpotBugs to give similar complexFeaturesAppendEnum expression because we
warnings during its analysis. However, when checking for the do not consider enum as a type of special Java class.
pattern RV 01 TO INT that reports a warning when a random
value is being coerced to the integer value 0, SpotBugs only VIII. R ELATED W ORK
checks for certain APIs that produce a random value (e.g., Enhancing static analysis. Several techniques were pro-
nextDouble()), and omit other similar APIs with sibling posed to prioritize more important generated warnings in
types (e.g., nextFloat()). FindBugs [13], [14], [36]. Previous work (e.g., [5], [48]–
Handling method calls (4/54): In Section II, we propose ex- [50])focus on improving performance of static analysis via
panding the detection of self computation by treating method staged analyses. Although Codegex uses a two-stage approach
calls as expression. For example, SpotBugs has an F N R when for quick analysis, it uses different techniques (regex rules and
checking size+=(dom.getSegmentAtPos(a).get several strategies) from existing methods. Several approaches
From()-dom.getSegmentAtPos(a).getFrom()+1); support analysis of partial programs [7], [51], [52]. Prior
as it fails to detect the self computation in the method call framework resolves ambiguities in partial Java programs via
dom.getSegmentAtPos(a).getFrom(). several heuristics [7]. Although several strategies are used to
Inconsistent bug description (1/54): We found an inconsistency in the bug description for the EQ_COMPARING_CLASS_NAMES pattern. Specifically, it stated that "This method checks to see if two objects are
absolute path string in the File object creation. As Codegex can only detect explicit object creation (e.g., new File("/abs")), it fails to detect the string usage in indirect object construction (e.g., inside the method call parseZip("/abs")). In the future, we plan to use the multiline mode in the regex library to support more patterns.

Fails to infer data type (24/69): As Codegex relies on explicit type-driven matching, it fails to detect patterns that require type inference. For example, when detecting the ES_COMPARING_STRINGS_WITH_EQ pattern, which compares String objects for reference equality using the == or != operators, Codegex fails to warn about the expression this.getName()==that.getName() because it cannot infer the return type of the getName() method.
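The following sketch (ours; the getName() helper mirrors the expression above) contrasts a comparison whose String types are visible in the program text, which explicit type-driven matching can in principle pick up, with one hidden behind method calls:

    public class StringEq {
        private String name;

        String getName() { return name; }

        boolean visible(String a, String b) {
            // The operand types are declared as String in the text,
            // so a type-driven rule can recognize the risky ==.
            return a == b;
        }

        boolean missed(StringEq that) {
            // The types are hidden behind method calls; without inferring
            // the return type of getName(), Codegex cannot warn here.
            return this.getName() == that.getName();
        }
    }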
Missed special AST elements (1/69): As Codegex does not parse code into ASTs, it may miss some AST nodes. For example, when checking the NM_CLASS_NAMING_CONVENTION pattern that checks for upper camel case, Codegex did not warn about the enum complexFeaturesAppendEnum because we do not consider enum as a special type of Java class.

VIII. RELATED WORK

Enhancing static analysis. Several techniques were proposed to prioritize more important generated warnings in FindBugs [13], [14], [36]. Previous work (e.g., [5], [48]–[50]) focuses on improving the performance of static analysis via staged analyses. Although Codegex uses a two-stage approach for quick analysis, it uses different techniques (regex rules and several strategies) from existing methods. Several approaches support analysis of partial programs [7], [51], [52]. A prior framework resolves ambiguities in partial Java programs via several heuristics [7]. Although several strategies are used to improve the accuracy of Codegex, it can analyze smaller programs (programs with only one statement versus one class in prior work [7]), and it is designed for bug pattern detection whereas prior work is designed for type inference. The most relevant
work, μchex, performs bug detection on AST nodes built from code snippets by a sliding window and micro-grammars [51]. While it benefits from the strength of the tree representation for complex analysis (e.g., flow analysis), Codegex improves the comprehensibility and ease of implementation of bug detection rules based on the text representation. We did not compare Codegex with μchex since μchex is not open-source.
Code Reviews. Prior automated code review approaches either rely on deep learning for modeling code changes and review comments [39]–[41], [53] or on code reviewer recommendation [54], [55]. Although these techniques can potentially discover new problems in given code changes, they are more suitable for code review of mature projects where many PRs and review comments exist. Codegex can handle any type of project, including new projects that do not have enough review comments for training. Meanwhile, several studies have shown the usefulness of static analysis in automated code review [16], [17], [56], [57]. One of the most relevant works, Review Bot, produces review comments based on the output of static analyzers [18]. While Review Bot works around the compilation requirement of SpotBugs, Codegex improves over SpotBugs via regex rules and heuristics to skip compilation.
Regex-based approaches. Regex matching has been widely used in many tasks (e.g., mutant generation [58] and detecting security vulnerabilities [59]–[61]). In the security domain, DevSkim is an IDE plugin [61] that uses regex rules for inline checks of security vulnerabilities (e.g., invoking dangerous APIs like strcpy). Such regex rules cannot be directly used in other domains to detect the bug patterns in SpotBugs because (1) a general-purpose analyzer may have rules that are not domain-specific and may require more context for accurate detection, and (2) these tools are not designed for checking partial programs. While Codegex uses regex as its core for analysis, it differs from existing approaches in several aspects: (1) it uses strategies to improve effectiveness, and (2) it is more general than domain-specific techniques that target security vulnerabilities.
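To illustrate the difference, the sketch below phrases a SpotBugs-style check as a regex rule over raw program text. It is our own illustration in java.util.regex, not a rule shipped with Codegex, and it deliberately omits the context-recovery strategies described earlier:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexRuleSketch {
        // Hypothetical rule in the spirit of ES_COMPARING_STRINGS_WITH_EQ:
        // an identifier or a string literal compared to a string literal
        // with == or !=.
        private static final Pattern STRING_EQ = Pattern.compile(
                "([\\w.]+|\"[^\"]*\")\\s*[=!]=\\s*\"[^\"]*\"");

        public static void main(String[] args) {
            String hunk = "if (name == \"admin\") { grant(); }\n"
                    + "boolean ok = \"a\" != \"b\";";
            Matcher m = STRING_EQ.matcher(hunk);
            while (m.find()) {
                // A deployed rule additionally needs syntax and type
                // information (e.g., to skip comments and string contents)
                // to keep false positives low.
                System.out.println("warning: " + m.group());
            }
        }
    }

Because such a rule sees only text, the strategies that recover syntax and type information are what keep its false positive rate acceptable.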
A. Threats to Validity
External. Our study and our evaluation results may not generalize beyond the evaluated open-source Java projects. To mitigate this threat, we include a large number of open-source projects of diverse sizes. We also ensure that the projects used in the two experiments (Section V) do not overlap. Since we only evaluate on SpotBugs, our results may not generalize to other static analysis tools (e.g., Infer [31]). During the manual inspection of the generated warnings, two authors of the paper reviewed the results independently and met to resolve any disagreement. Moreover, whenever we found a bug or an FN in SpotBugs, we confirmed its validity with the developers by filing bug reports, leading to a total of 16 bug reports. Furthermore, due to limited resources, we conducted all experiments on a single machine. Although the initial compilation time and subsequent compilation time depend on the machine used and network latency, our results show that the speedup that Codegex offers is quite substantial.

Internal. Our code and scripts may have bugs that could affect our results. To mitigate this threat, we wrote tests for each implemented pattern. Moreover, we evaluate the effectiveness of Codegex in automated code review via developers' feedback, as it is time-consuming to manually label each warning. We mitigate this threat by manually analyzing each piece of feedback.

Ethical considerations. In principle, it is straightforward to extend Codegex to a bot automatically triggered after a new PR is submitted. However, to ensure reproducible results, we first obtained a fixed list of open PRs and then ran Codegex to generate review comments for each PR in the list (i.e., our experiments only affect developers of the 372 PRs with review comments). Instead of spamming developers by contacting them via surveys [62], we evaluate whether Codegex is ethical based on bot ethics [19], which checks if the bot (1) is lawbreaking, (2) involves deception, or (3) violates social norms. To check for lawbreaking, we (1) obtained ethical approval from the Institutional Review Board (IRB) of our institute, and (2) manually checked the contribution guidelines and signed the Contributor License Agreement (CLA) of each repository in the PR list (in most CLAs, "submitted" includes any form of communication, which covers code review comments). Overall, there are 31 projects with CLAs, and we have signed all of them. To avoid deception, we did not hide the fact that the comments are sent by a bot. In fact, six developers were aware that the comments are sent by a bot (e.g., one developer replied to us saying "good bot"). For (3), the fact that our bot achieves similar accuracy as SpotBugs (a widely used tool) for many patterns shows that our bot is beneficent [63] and did not "create more evil than good". Moreover, we also manually replied to 57 developers to discuss the analysis results.

IX. CONCLUSION

We present Codegex, a novel regex-based approach for efficient static analysis. To perform fast yet accurate analysis, our approach uses several heuristics to enrich the analysis contexts. Our experiments that compare Codegex and SpotBugs show that Codegex can analyze up to 590X faster than SpotBugs with comparable accuracy. For automated code review, we evaluate Codegex against 4256 PRs and received 116 feedback responses, of which 78.45% are positive. Our experiments confirm the two settings in which Codegex can enhance existing static analyzers like SpotBugs: (1) acting as the fast stage in a two-stage approach where more sophisticated analysis can be run as part of a nightly build, and (2) supporting incremental analysis that performs automated code review for partial code in a PR without setting up build configurations.

X. ACKNOWLEDGEMENT

This work was supported by the National Natural Science Foundation of China (Grant No. 61902170).
REFERENCES

[1] B. Johnson, Y. Song, E. Murphy-Hill, and R. Bowdidge, "Why don't software developers use static analysis tools to find bugs?" in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 672–681.
[2] M. Christakis and C. Bird, "What developers want and need from program analysis: an empirical study," in Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016, pp. 332–343.
[3] Coverity, 2022. [Online]. Available: https://scan.coverity.com/
[4] Fortify, 2022. [Online]. Available: https://www.microfocus.com/en-us/cyberres/application-security
[5] L. N. Q. Do, K. Ali, B. Livshits, E. Bodden, J. Smith, and E. Murphy-Hill, "Just-in-time static analysis," in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2017, pp. 307–317.
[6] T. Clem and P. Thomson, "Static analysis at github: An experience report," Queue, vol. 19, no. 4, pp. 42–67, Aug. 2021. [Online]. Available: https://doi.org/10.1145/3487019.3487022
[7] B. Dagenais and L. Hendren, "Enabling static analysis for partial java programs," in Proceedings of the 23rd ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, 2008, pp. 313–328.
[8] D. A. Tomassi, "Bugs in the wild: examining the effectiveness of static analyzers at finding real-world bugs," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018, pp. 980–982.
[9] [java] Allow MissingOverride rule to work without bytecode, 2020. [Online]. Available: https://github.com/pmd/pmd/issues/2428
[10] Analyze Java without running the build, 2018. [Online]. Available: https://github.com/facebook/infer/issues/969
[11] Analyse jar and not java file, 2018. [Online]. Available: https://github.com/spotbugs/spotbugs/issues/695
[12] Checkstyle. [Online]. Available: http://checkstyle.sourceforge.net/
[13] H. Shen, J. Fang, and J. Zhao, "EFindBugs: Effective error ranking for findbugs," in 2011 Fourth IEEE International Conference on Software Testing, Verification and Validation. IEEE, 2011, pp. 299–308.
[14] S. Heckman and L. Williams, "A systematic literature review of actionable alert identification techniques for automated static code analysis," Information and Software Technology, vol. 53, no. 4, pp. 363–387, 2011.
[15] S. Wagner, J. Jürjens, C. Koller, and P. Trischberger, "Comparing bug finding tools with reviews and tests," in Proceedings of the 17th IFIP TC6/WG 6.1 International Conference on Testing of Communicating Systems, ser. TestCom'05. Berlin, Heidelberg: Springer-Verlag, 2005, pp. 40–55. [Online]. Available: https://doi.org/10.1007/11430230_4
[16] S. Panichella, V. Arnaoudova, M. Di Penta, and G. Antoniol, "Would static analysis tools help developers with code reviews?" in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 2015, pp. 161–170.
[17] D. Singh, V. R. Sekar, K. T. Stolee, and B. Johnson, "Evaluating how static analysis tools can reduce code review effort," in 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE, 2017, pp. 101–105.
[18] V. Balachandran, "Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 931–940.
[19] C. A. de Lima Salge and N. Berente, "Is that social bot behaving unethically?" Communications of the ACM, vol. 60, no. 9, pp. 29–31, 2017.
[20] Network status check & optimize up log. [Online]. Available: https://github.com/qiniu/android-sdk/pull/453/files/7356a6c3eccb942fa90edc9e537b36b349ebc89a#diff-7f67181
[21] Sa local self computation bug. [Online]. Available: https://github.com/spotbugs/spotbugs/issues/1472
[22] False negatives on sa field self computation and sa local self computation. [Online]. Available: https://github.com/spotbugs/spotbugs/issues/1473
[23] The architecture of findbugs. [Online]. Available: https://github.com/spotbugs/spotbugs/blob/a6f9acb2932b54f5b70ea8bc206afb552321a222/spotbugs/design/architecture/architecture.tex
[24] C. Cifuentes and B. Scholz, "Parfait: designing a scalable bug checker," in Proceedings of the 2008 Workshop on Static Analysis, 2008, pp. 4–11.
[25] Five invaluable techniques to improve regex performance. [Online]. Available: https://www.loggly.com/blog/five-invaluable-techniques-to-improve-regex-performance/
[26] C. Chapman and K. T. Stolee, "Exploring regular expression usage and context in python," in Proceedings of the 25th International Symposium on Software Testing and Analysis, 2016, pp. 282–293.
[27] N. Ayewah and W. Pugh, "The google findbugs fixit," 2010, pp. 241–252.
[28] S. Kim and M. D. Ernst, "Prioritizing warning categories by analyzing software history," in Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007), 2007, pp. 27–27.
[29] Pmd. [Online]. Available: https://pmd.github.io/
[30] Jlint. [Online]. Available: http://jlint.sourceforge.net/
[31] C. Calcagno and D. Distefano, "Infer: An automatic program verifier for memory safety of C programs," in NASA Formal Methods Symposium. Springer, 2011, pp. 459–465.
[32] Error prone. [Online]. Available: https://errorprone.info/
[33] M. Tufano, F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, and D. Poshyvanyk, "There and back again: Can you compile that snapshot?" Journal of Software: Evolution and Process, vol. 29, no. 4, p. e1838, 2017.
[34] N. Ayewah, W. Pugh, J. D. Morgenthaler, J. Penix, and Y. Zhou, "Evaluating static analysis defect warnings on production software," in Proceedings of the 7th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, 2007, pp. 1–8.
[35] A. Habib and M. Pradel, "How many of all bugs do we find? A study of static bug detectors," in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 317–328.
[36] S. Kim and M. D. Ernst, "Which warnings should I fix first?" in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2007, pp. 45–54.
[37] C. Sadowski, E. Aftandilian, A. Eagle, L. Miller-Cushon, and C. Jaspan, "Lessons from building static analysis tools at google," Communications of the ACM, vol. 61, no. 4, pp. 58–66, 2018.
[38] J. Nielsen, Usability Engineering. Morgan Kaufmann, 1994.
[39] J. K. Siow, C. Gao, L. Fan, S. Chen, and Y. Liu, "CORE: Automating review recommendation for code changes," in 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2020, pp. 284–295.
[40] A. Gupta and N. Sundaresan, "Intelligent code reviews using deep learning," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18) Deep Learning Day, 2018.
[41] S.-T. Shi, M. Li, D. Lo, F. Thung, and X. Huo, "Automatic code review by learning the revision of source code," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 4910–4917.
[42] R. Chatley and L. Jones, "Diggit: Automated code review via software repository mining," in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 567–571.
[43] 8262881: port JVM/DI tests from JDK-4413752 to JVM/T. [Online]. Available: https://github.com/openjdk/jdk/pull/2899
[44] M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens, "Modern code reviews in open-source projects: Which problems do they fix?" in Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 202–211.
[45] A. Bacchelli and C. Bird, "Expectations, outcomes, and challenges of modern code review," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 712–721.
[46] Changelog. [Online]. Available: https://github.com/foo4u/conventional-commits-for-java/pull/12
[47] Inconsistent bug description on eq comparing class names. [Online]. Available: https://github.com/spotbugs/spotbugs/issues/1523
[48] B. Hardekopf and C. Lin, "Flow-sensitive pointer analysis for millions of lines of code," in International Symposium on Code Generation and Optimization (CGO 2011). IEEE, 2011, pp. 289–298.
[49] V. Kahlon, Y. Yang, S. Sankaranarayanan, and A. Gupta, "Fast and accurate static data-race detection for concurrent programs," in International Conference on Computer Aided Verification. Springer, 2007, pp. 226–239.
[50] S. J. Fink, E. Yahav, N. Dor, G. Ramalingam, and E. Geay, "Effective typestate verification in the presence of aliasing," ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 17, no. 2, pp. 1–34, 2008.
[51] F. Brown, A. Nötzli, and D. Engler, "How to build static checking systems using orders of magnitude less code," in Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016, pp. 143–157.
[52] H. Zhong and X. Wang, "Boosting complete-code tool for partial program," in 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2017, pp. 671–681.
[53] H.-Y. Li, S.-T. Shi, F. Thung, X. Huo, B. Xu, M. Li, and D. Lo, "Deepreview: automatic code review using deep multi-instance learning," in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2019, pp. 318–330.
[54] P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, N. Yoshida, H. Iida, and K.-i. Matsumoto, "Who should review my code? a file location-based code-reviewer recommendation approach for modern code review," in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 2015, pp. 141–150.
[55] Y. Yu, H. Wang, G. Yin, and T. Wang, "Reviewer recommendation for pull-requests in github: What can we learn from code review and bug assignment?" Information and Software Technology, vol. 74, pp. 204–218, 2016.
[56] C. Vassallo, S. Panichella, F. Palomba, S. Proksch, H. C. Gall, and A. Zaidman, "How developers engage with static analysis tools in different contexts," Empirical Software Engineering, vol. 25, no. 2, pp. 1419–1457, 2020.
[57] M. V. Mäntylä and C. Lassenius, "What types of defects are really discovered in code reviews?" IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 430–448, 2008.
[58] A. Groce, J. Holmes, D. Marinov, A. Shi, and L. Zhang, "An extensible, regular-expression-based tool for multi-language mutant generation," in 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion). IEEE, 2018, pp. 25–28.
[59] C. R. Meiners, J. Patel, E. Norige, E. Torng, and A. X. Liu, "Fast regular expression matching using small tcams for network intrusion detection and prevention systems," in Proceedings of the 19th USENIX Conference on Security, 2010, pp. 8–8.
[60] T. Liu, Y. Sun, A. X. Liu, L. Guo, and B. Fang, "A prefiltering approach to regular expression matching for network security systems," in International Conference on Applied Cryptography and Network Security. Springer, 2012, pp. 363–380.
[61] Devskim. [Online]. Available: https://github.com/microsoft/devskim
[62] S. Baltes and S. Diehl, "Worse than spam: Issues in sampling software developers," in Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016, pp. 1–6.
[63] N. E. Gold and J. Krinke, "Ethical mining: A case study on msr mining challenges," in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 265–276.