Learning How To Listen - Automatically Finding Bug Patterns in Event-Driven JavaScript APIs

arXiv:2107.13708v3 [cs.SE] 11 Feb 2022

Abstract—Event-driven programming is widely practiced in the JavaScript community, both on the client side to handle UI events and AJAX requests, and on the server side to accommodate long-running operations such as file or network I/O. Many popular event-based APIs allow event names to be specified as free-form strings without any validation, potentially leading to lost events for which no listener has been registered and dead listeners for events that are never emitted. In previous work, Madsen et al. presented a precise static analysis for detecting such problems, but their analysis does not scale because it may require a number of contexts that is exponential in the size of the program. Concentrating on the problem of detecting dead listeners, we present an approach to learn how to use event-based APIs by first mining a large corpus of JavaScript code using a simple static analysis to identify code snippets that register an event listener, and then applying statistical modeling to identify anomalous patterns, which often indicate incorrect API usage. In a large-scale evaluation on 127,531 open-source JavaScript code bases, our technique was able to detect 75 anomalous listener-registration patterns, while maintaining a precision of 90.9% and recall of 7.5% over a validation set, demonstrating that a learning-based approach to detecting event-handling bug patterns is feasible. In an additional experiment, we investigated instances of these patterns in 25 open-source projects, and reported 30 issues to the project maintainers, of which 7 have been confirmed as bugs.

Index Terms—static analysis, JavaScript, event-driven programming, bug finding, API modeling

• E. Arteca and F. Tip are with the Khoury College of Computer Sciences at Northeastern University. E-mail: {arteca.e,f.tip}@northeastern.edu
• M. Schäfer is with GitHub. E-mail: [email protected]

1 INTRODUCTION

Event-driven programming has been the dominant paradigm in JavaScript since its early days. This is quite natural on the client side, since most web applications are GUI-based and hence are centered around reacting to user actions such as clicking a button or pressing a key. The W3C UI Events standard [1] defines the low-level event API supported by all modern browsers, while popular libraries such as jQuery [2], Angular [3] and React [4] provide higher-level abstractions on top of it. Many other client-side APIs such as Web Workers and Web Sockets are likewise programmed in an event-driven style. On the desktop, the popular Electron [5] framework enforces an architecture where applications are split into a main process and a renderer process, which communicate via an event-based API. Finally, the Node.js platform [6], which is dominant in server-side JavaScript, advocates an asynchronous programming style centered around a collection of event-based APIs for accessing resources like the file system, the network, or databases.

The precise APIs implemented by individual platforms and frameworks differ, but a common feature across all of JavaScript is the notion of a central event loop that handles event dispatching. Events are identified by an event name and may optionally have a payload. When an event happens, it is associated with a particular object, which is known as the event target in many client-side frameworks and the event emitter in Node.js. We will follow the latter terminology in this paper. Client code can register listener functions (or listeners for short) for a particular event on an event emitter. When an event is emitted, all the listener callbacks registered for it on the emitter object are run in sequence. While many events are emitted by framework code, application code can also emit events explicitly.

Most of the event-based APIs mentioned above are intrinsically dynamic and untyped. By “dynamic” we mean that the association between events and listeners can change over time, with new listeners being registered and existing listeners being removed throughout an event emitter’s lifecycle. Indeed, it is common for listeners themselves to register or remove listeners on their own or on other emitter(s). By “untyped” we mean that event names are free-form strings that are not validated in any way, and can be associated with any emitter and any payload. In particular, applications can emit and listen for custom events on emitters defined by a library.

While these two properties are prized by some for their flexibility, they also give rise to several classes of subtle bugs [7]. For example, if a listener registration misspells the name of the event or registers the listener on the wrong object, the listener will never be invoked. This is known as a dead listener. Dead listeners can also arise if a listener is registered at the wrong time, for instance after the event has already been emitted. The dual of a dead listener is a lost event, which can happen if an event emission misspells the event name or emits it on the wrong object. Both dead listeners and lost events are particularly hard to debug, as they manifest in the lack of execution of the listener function rather than an explicit error message.

In this paper, we concentrate mostly on dead-listener bugs. Our goal is to detect such bugs automatically and statically, i.e., without having to run the code under analysis.
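To make these two failure modes concrete, here is a minimal illustration of our own (using the Node.js events API, which Section 2 describes in detail; the handler is a stand-in):

const EventEmitter = require('events');
const server = new EventEmitter();
const startSession = () => console.log('session started');

server.on('conect', startSession); // dead listener: the event name is misspelled, so it never fires
server.emit('connect');            // lost event: emitted, but no listener is registered for it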
Fig. 1. Overview of approach: the top half depicts the model-construction pipeline (data mining over 127K projects producing listener-registration pairs ⟨a, e⟩ @ loc, followed by classification into API models), while the bottom half shows their potential applications (a dead-listener bug analysis and an IDE plugin for smart completion). This paper focuses on the shaded areas.
Prior work by Madsen et al. [7] employs context-sensitive static analysis techniques to infer a semantic model of event emission and listener registration to identify dead listeners. Unfortunately, their analysis does not scale well because it may require a number of contexts that is exponential in the size of the program.

We propose instead to learn how to use event-based APIs by first mining a large corpus of JavaScript code with a simple static analysis to identify code snippets that register an event listener, and then applying statistical modeling to identify anomalous patterns. Intuitively, if we look at enough code we would expect most API usages to correspond to their designed use, so particularly rare patterns are likely bugs. We formalize this concept of “particularly rare” as thresholds in our statistical analysis, and identify patterns that meet these thresholds as potential bugs. Using the same thresholds, our approach also addresses the dual problem of learning expected uses, with “particularly common” uses of the APIs corresponding to the intended use.

Figure 1 visualizes our approach. The top of the figure shows how models of event-driven APIs are constructed in two steps: First, a data mining analysis is applied to a large number of JavaScript projects to obtain a list of event listener registrations. These are represented as listener-registration pairs ⟨a, e⟩, where a is an event-emitting API endpoint symbolically represented by an access path [8] as explained in Section 3, and e is the name of the event the listener is registered for. The second step is classification, i.e., performing a statistical analysis of the occurrence distributions of e’s and a’s, and using this to identify pairs ⟨a, e⟩ where the access path a and event e are rare relative to each other. In other words, we look for cases where e is rarely listened for on a, and a rarely registers a listener for e.

Considering one of these conditions in isolation, or only the absolute number of occurrences of a pair, is not usually sufficient, since the data may be too sparse to conclude that it is anomalous. For example, a may be a rarely-used API, or e may be a custom event that is used only by one particular code base. If, however, both the event emitter and the event name are rare for each other yet otherwise common, then that is a strong indication that this pair represents a mistake.

Our statistical analysis has four parameters shown as inputs to the classification stage in Figure 1: rarity thresholds pa and pe defining when paths and events are considered rare, respectively, and confidence thresholds pca and pce defining the statistical confidence we demand for paths and events to be considered rare, respectively. The output of classification is a set of pairs learned to be expected, and a set learned to be anomalous. Pairs are left unclassified if they do not meet the thresholds for being common or rare.

These sets constitute API models for the APIs analyzed. Once constructed, these API models can be used, e.g., in bug finding tools (see bottom left part of Figure 1), or for smart completion in an IDE (see bottom right part). In this work, we focus on the set of pairs that are learned to be anomalous, as they are likely to indicate dead listener bugs.

The effectiveness of our approach crucially depends on how we configure the threshold parameters for classification. In our evaluation, we systematically explore the space of possible configurations, computing for each of them the set of anomalous listener-registration pairs from more than 532,000 pairs mined from over 127,500 open-source code bases. To quantitatively assess the quality of the models generated with a particular configuration, we then compute the true-positive rate (the precision) and the percentage of true positives detected (the recall) with respect to a validation set of pairs that we semi-automatically labeled as correct or incorrect according to the API documentation.1

In general, configurations with lower precision yield higher recall. For practically useful tools, however, a precision of at least 90% is generally considered essential [9], [10]. Several configurations achieve this rate over the labeled set.

1. Event-listener registration pairs in the validation set may also be designated as being imprecise, to reflect situations where the access path is insufficiently precise to make a determination (see Section 6).
To gain confidence that this is not simply an artifact of the data, we performed a 10-fold cross-validation experiment. We partitioned the labeled set into 10 sets; for each set, we found the optimal configuration for the other 9 sets (which together form the training data), and computed the precision and recall of that configuration over the remaining set (which comprises the validation data). Our results show that the optimal configuration for the training data consistently achieves good results over the validation data.

To qualitatively assess the usefulness of our approach, we investigated uses of anomalous pairs in 25 open-source projects, reporting 30 issues to the project maintainers. At the time of writing, 7 of these have been confirmed as dead-listener bugs, and two have been patched.

The rest of the paper is structured as follows. Section 2 provides background on event-driven JavaScript programming and reviews a dead-listener bug in an open-source project. Sections 3 and 4 explain our approach in detail, while Section 5 covers the implementation. Section 6 covers the experimental methodology used in the evaluation that is presented in Section 7. Next, Section 8 presents a case study of false positives and false negatives observed in our results, and discusses threats to validity. Section 9 discusses to what extent our techniques are applicable to detecting lost events. Section 10 reviews related work, and Section 11 concludes and outlines directions for future work.

The source code of our implementation, experimental data, and reproduction instructions are available online at https://github.com/emarteca/JSEventAPIModelling

2 BACKGROUND

We begin by recapitulating the basics of event-driven programming in Node.js and some of the most common kinds of mistakes programmers make when writing event-driven code. We then show a concrete example of such a bug, based on code we found using our approach in an open-source project on GitHub, and finally explain how we go about identifying this sort of bug automatically.

2.1 Event-driven programming in Node.js

All event emitters in Node.js are instances of the EventEmitter class [11] or one of its subclasses. Listeners are associated with an event by invoking one of several listener registration methods (such as on or addListener); these all take two arguments: an event name, which is a free-form string, and the listener function itself. Events can be emitted by invoking the emit method, which takes as its first argument an event name; any further arguments are passed as arguments to the listener functions associated with the event.
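A minimal sketch of this API in action (our own illustration; the event name and payload are arbitrary):

const EventEmitter = require('events');

const emitter = new EventEmitter();
// on(name, listener): register a listener; the event name is a free-form, unvalidated string
emitter.on('greet', (name, lang) => console.log(lang, 'greeting for', name));
// emit(name, ...args): the extra arguments become the listener's arguments
emitter.emit('greet', 'world', 'en'); // prints: en greeting for world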
A typical example of this event-driven style is the request function from the http package in the Node.js standard library. Normally invoked as http.request(url, fn) where url is the URL to make a request to, and fn is a listener function, it creates an event emitter object of class http.ClientRequest representing the pending request to url and associates fn with the response event of the request.

When a response to the request is received, the response event is emitted, causing fn to be invoked with an argument that is an instance of http.IncomingMessage representing the HTTP response. This object is itself an event emitter, emitting data events when response data becomes available and an end event once all data has been received. If, on the other hand, the request times out before receiving a response, the request object emits a timeout event.

 1  const http = require('http');
 2  module.exports.request = (url) =>
 3    new Promise((resolve, reject) => {
 4      const req = http.request(url, res => {
 5        res.on('data', /* omitted */);
 6        res.on('end', () => {
 7          /* omitted */
 8          resolve(res);
 9        });
10        res.on('timeout', () => reject(req)); // bug here
11      });
12      req.end();
13    });

Fig. 2. An example of a dead-listener bug

2.2 Motivating example

Consider the code shown in Figure 2, which is a condensed version of a bug our approach identified in the min-req-promise npm package.

min-req-promise turns the somewhat intricate event-based http.request API discussed above into a simpler promise-based API. It exports a function request, which returns a promise wrapped around a call to http.request. The pending request (an instance of http.ClientRequest) is stored in variable req (line 4), and a listener function is passed to http.request on the same line, which associates it with the response event on req. Finally, req.end() is called on line 12 to dispatch the request. Once a response arrives, the http library invokes the listener provided on line 4, passing it a res object representing the response, which is an instance of http.IncomingMessage. On this object, handlers for three events are installed: data, end and timeout. The first event is emitted whenever a chunk of response data arrives, the second when the response has been received in its entirety. For simplicity, we have omitted the handler functions for these two events; the interested reader is referred to the project’s GitHub page [12].

The third event, timeout, is the problematic one: this event is actually never emitted by http.IncomingMessage objects, so the listener on line 10 is dead code. There is a timeout event on http.ClientRequest, however, so presumably the event should have been registered on req, not res. We contacted the author of min-req-promise, who confirmed our analysis of the issue.

Note that there are no compile-time or runtime diagnostics to alert the developer to this problem: not only is it very difficult to infer precise types for variables in JavaScript in general, but there is not even anything semantically wrong with registering a handler for a timeout event on http.ClientRequest. While the http library will never emit this event, client code could do so itself by calling the emit method (although in this case it does not). Moreover, since dead-listener bugs do not cause a crash at runtime, they may go undetected for a long time: in the case of min-req-promise, the bug had been present since its initial version (released in March 2018).
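Under this diagnosis, a sketch of the presumable fix is to move the listener to req, the http.ClientRequest, which does emit a timeout event (http and url as in Figure 2; this is our reading of the fix, not necessarily the patch the maintainers applied):

new Promise((resolve, reject) => {
  const req = http.request(url, res => {
    res.on('data', () => { /* omitted */ });
    res.on('end', () => { /* omitted */ resolve(res); });
  });
  // http.ClientRequest emits 'timeout', so the listener belongs on req, not res:
  req.on('timeout', () => reject(req));
  req.end();
});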
At present, the only way for a developer to detect this sort of problem is to carefully reason about types and the events they support (as we have done above), or to write extensive unit tests to ensure all events are handled as expected. In the above example, this would require adding a test involving a request that times out, which is an edge case that is easy to overlook. Clearly, a more automated approach is desirable.

2.3 Automatically detecting dead listeners

We have argued that the dynamic nature of the JavaScript event-driven APIs makes it unrealistic to detect dead listeners at runtime. However, an approach based on static analysis faces the usual dilemma of having to trade off precision against performance: an imprecise analysis is likely to report many false positives, while a very precise analysis will not usually scale to realistic code bases.

Ideally, a static analysis would analyze client code as in Figure 2 along with the implementation of the Node.js standard libraries and any other third-party libraries it depends on, derive a precise model of which types support which events, and then flag dead listeners based on this information. In practice, we know of no static analyzer for JavaScript precise enough to derive such a model that scales to the size and complexity of the libraries involved. As a comparatively benign example, the Node.js http package transitively depends on more than 60 modules, for a total of around 20,000 lines of code. While this is quite manageable for, say, type inference or taint tracking, it is out of reach for techniques that precisely model event dispatch, such as that of Madsen et al. [7].

The usual answer is to instead provide the analysis with simplified models of the libraries involved. This is indeed a good approach for frequently used and well-documented packages like http, but the modern JavaScript library landscape is vast, with npm alone hosting well over one million packages. While many of these are very rarely used, the number of popular packages is still too large to allow manual modeling, especially since packages tend to go in and out of style quite frequently.

2.4 Approach

Our proposed solution to this dilemma is to turn the size of the JavaScript ecosystem to our advantage in a two-step approach illustrated in Figure 1: first, we mine large amounts of open-source code from GitHub and other hosting platforms for real-world examples of event-listener registrations; then we perform a statistical analysis to determine whether a certain pattern is rare and hence suggestive of incorrect API usage, or whether it is common and therefore likely to be a correct use. This allows us to automatically derive models instead of writing them by hand.

In the next two sections we explain the data mining and classification steps in more detail.

3 DATA MINING

The mining step is implemented as a context- and flow-insensitive static analysis that finds event-listener registrations and records them as listener-registration pairs of the form ⟨a, e⟩, where a represents the object on which the listener is registered, and e the event for which it is registered.

Both a and e need to be represented in a code base-independent way to enable us to meaningfully collate results obtained on many different code bases.

For events, this is easy: e is the event name annotated with the emitter package. For instance, timeout events on a’s rooted in the http package are considered to be different from timeout events rooted in the process package. This is important, as events with the same name in different packages may behave differently.

To represent event emitters, we use access paths similar to those proposed by Mezzetti et al. [8]: starting from a package import, the access path records a sequence of property reads, method calls and function parameters that need to be traversed to reach a particular point in the program. More precisely, a conforms to the following grammar:

    a ::= require(m)    an import of package m
        | a.f           property f of an object represented by a
        | a()           return value of a function represented by a
        | a(i)          ith argument of a function represented by a
        | new a()       instance of a class represented by a

Note that access paths are always rooted at a package import, so we can always tell which package any program element derives from.

For instance, in Figure 2, the access path associated with the variable req is require(http).request(), meaning that req is initialized to the result of calling the method request on the result of importing the http module.2

2. Note that the argument to request is not recorded in the access path; see also Section 8.

The access path of res, on the other hand, is require(http).request(1)(0): starting from the import of http, we look at a call to request as above, but instead of considering the result we look instead at its second argument,3 which is the listener function on line 4, and then the first argument to that function, which is the variable res. As above, the value of the first argument to request is not recorded in the access path.

3. We index arguments starting from zero, so the argument at index one is the second argument.

Upon analyzing this snippet of code, we would record three pairs of access paths and events, corresponding to the three explicit event listener registrations:

1) ⟨require(http).request(1)(0), data⟩, corresponds to line 5
2) ⟨require(http).request(1)(0), end⟩, corresponds to line 6
3) ⟨require(http).request(1)(0), timeout⟩, corresponds to line 10

Our approach is based on the assumption that if such pairs are collected over a lot of code, we are likely to see many instances of the first two (correct) pairs, but few instances of the last (incorrect) pair. This is indeed the case: in our experiments (further detailed below) we found 996 instances of the first pair and 898 of the second, but only one of the third.
To detect event-listener registrations, our analysis looks for calls to methods named on, once, addListener, prependOnceListener or prependListener (the standard Node.js listener registration methods), where the receiver can be represented by an access path, the first argument is a constant string (the event name), and the second argument is a function (the callback).
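For illustration, the following sketch of our own shows call shapes that this syntactic criterion would and would not record:

const EventEmitter = require('events');
const emitter = new EventEmitter();       // reachable from a package import, so it has an access path

const handleChunk = chunk => { /* ... */ };
emitter.on('data', handleChunk);          // recorded: constant event name, function callback
emitter.once('end', () => { /* ... */ }); // recorded: once is also a registration method

const name = Math.random() < 0.5 ? 'end' : 'error';
emitter.on(name, handleChunk);            // not recorded: the event name is not a constant string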
3.1 Access Path Imprecision

Due to its simplicity, our mining analysis is fairly imprecise. As we will show in Section 7, this does not matter: the statistical analysis in the classification step compensates for much of the imprecision and yields high-quality results. There are two main sources of imprecision: our choice of access paths to represent runtime objects, and the lack of context and flow sensitivity of the analysis.

3.1.1 Imprecision due to access path representation

The formulation of access paths we use is attractive in its simplicity, but it is imprecise because access paths are both overapproximate (the same access path may represent many different runtime objects) and non-canonical (two different access paths may represent the same runtime object).

As an example of the former, consider again line 4 in Figure 2. This line can equivalently be written like this:

const req = http.request(url);
req.on('response', res => { ... });

Here, the access path for res becomes require(http).request().on(1)(0): it is the first parameter of the second argument to on invoked on the result of http.request. This does not record the other arguments to on; so, the access path does not include the fact that the first argument to on is response. While in actual fact res is an instance of http.IncomingMessage (since the listener is associated with the response event), the parameter of a listener associated with, for example, the socket event has the same access path, but is an instance of net.Socket. This means that in some cases we cannot determine event registration correctness based purely on the object’s access path: for example, while both http.IncomingMessage and net.Socket have a data event, the former has an aborted event that the latter lacks.

As an example of the lack of canonicity of access paths, note that the event registration method on returns the event emitter on which it is invoked, so lines 5–10 of Figure 2 could be rewritten as a single statement with three chained listener registrations:

res.on('data', /* omitted */)
   .on('end', /* omitted */)
   .on('timeout', () => reject(req));

While this does not affect the pair recorded for the first registration, the second becomes ⟨require(http).request(1)(0).on(), end⟩. Semantically, require(http).request(1)(0).on() and require(http).request(1)(0) denote the same set of concrete runtime objects, i.e., they are aliases.

3.1.2 Access path alias removal

Such chained listener registrations are a very common pattern in event-driven JavaScript, since all listener-registration methods return their receiver object, that is, the event emitter object itself. To mitigate the resulting aliasing, we replace access paths representing the result of a listener-registration method with the access path of the receiver object of the call.

For example, recall that the access path for res in Figure 2 is require(http).request(1)(0). This means that the access path for res.on('data', ...) is require(http).request(1)(0).on(). Similarly, the access path for res.on('data', ...).on('end', ...) is require(http).request(1)(0).on().on(). However, the latter two access paths are aliases of the first one, so we replace both of them with require(http).request(1)(0), which enables our analysis to recognize that all three event listeners are registered on the same API element.

Note that in general, cycles in the data-flow graph can give rise to infinitely many access paths that all alias each other. Such cycles are already detected and collapsed by the access-path library used in our implementation.

3.1.3 Imprecision due to lack of context and flow sensitivity

The second source of imprecision is the lack of context and flow sensitivity of the analysis, which may cause listener-registration pairs to be reported that can never actually happen at runtime.

var eos = function(stream, opts, callback) {
  // ...
  if (isRequest(stream)) {
    stream.on('complete', /* ... */);
    stream.on('abort', /* ... */);
  }
  // ...

Fig. 3. Listener registration with explicit type check

A typical example of this is shown in Figure 3.4 The function eos accepts a variety of streams. Since the complete and abort events are not emitted by all types of streams, it first checks whether the stream is a request before registering listeners for these two events. Our analysis lacks flow sensitivity, and hence reports complete and abort event listeners being registered on all streams passed as arguments to eos that do not support these events (in this particular example, stream objects of class http.IncomingMessage).

Finally, note that our mining analysis does not account for code that explicitly emits an event. This means that it may report a pair ⟨a, e⟩ that is, in general, incorrect because a is a library API that does not emit event e, but happens to be a correct use for a particular code base, because that code base explicitly emits e on a.

For example, consider Figure 4.5 On line 8, we see a listener to response registered on the result of a call to http.createServer(), which is an object that is an instance of http.Server. According to the API documentation of the http library, http.Server does not emit the response event. However, on line 5, the client application

4. Adapted from the mafintosh/end-of-stream project on GitHub
5. Adapted from the strongloop/strong-pm project on GitHub
experiments, and these experiments are independent, since checking if one pair has an access path matching a has no effect on the checking of any other pairs. Hence it makes sense to model the probability p of such an a′ being the a we are interested in as a binomial distribution. In general, the binomial distribution describes the probability distribution of the number of “successes” in a sequence of independent experiments, where in this case success means a′ being a.

We cannot measure the probability p directly, since our data set only covers a small fraction of the universe of all existing or possible JavaScript code. Instead, we use a confidence test to determine how likely it is, based on our limited data set, that a is a rare access path for e, that is, that the true (but unknown) probability p is smaller than pa.

As is usual for hypothesis tests, we will actually test the converse: how unlikely it is that a is a common access path for e, that is, that p ≥ pa.

Since we model the probability of an access path occurring with an event as a binomial distribution, we can use the binomial cumulative distribution function (BCDF) to implement this test [14]: for a series of ne independent experiments each succeeding with probability pa, BCDF(k, ne, pa) is the likelihood of at most k of the experiments being successful.

In our setting, the ne experiments we consider are checks of all listener registration pairs of the form ⟨a′, e⟩, and k is the number of pairs where a′ = a. As explained above, pa is the rarity threshold we use to classify an access path as rare for an event. If this likelihood BCDF(k, ne, pa) is less than a (small) confidence threshold pca, then this means that based on our data it is unlikely that p is at least pa, and so it is likely that it is, in fact, less than pa.

As a concrete example, for the pair ⟨require(http).request(1)(0), timeout⟩ corresponding to the bug in Figure 2 we have ne = 216 and k = 2: the timeout event occurs in 216 pairs, but only twice with this access path. Intuitively, since we see this access path in 2/216 pairs, we might expect a p value around 0.01, but higher values like p > 0.05 seem unlikely.

Plugging these values into the BCDF formula, we get BCDF(2, 216, 0.05) ≈ 0.001, meaning that based on our observations the likelihood of p being greater than 0.05 is 0.1%. Turning this statement around, we are 99.9% certain that a occurs in 5% or less of all access pairs involving e. Now, to conclude that a is indeed rare for e (with the rarity threshold pa = 0.05), we need this 99.9% certainty to satisfy the chosen confidence threshold pca. If, for example, we chose pca = 0.05, the confidence threshold would be 95% and so we would conclude that a is rare for e.

This confidence threshold pca is also a parameter of the statistical analysis, so that we ultimately end up with four parameters: two rarity thresholds pa and pe, and two confidence thresholds pca and pce, all of which range between 0 and 1 (as they represent probabilities).

The rarity threshold pa determines when we consider an access path a to be rare for an event e, and the rarity threshold pe determines when we consider an event e to be rare for an access path a. The confidence threshold pca determines how confident we want to be that a is actually rare for e based on the data, and similarly for pce.

Putting it all together, then, we consider a listener-registration pair ⟨a, e⟩ to be rare if both rarity tests succeed, that is, if the following condition holds:

    BCDF(k, na, pe) < pce ∧ BCDF(k, ne, pa) < pca

4.3 Refining the statistical analysis

Applying this condition in practice, we noticed one particular scenario where it led to misclassifications: if for an event e there are many pairs ⟨a, e⟩, but each individual pair occurs infrequently, we will end up classifying all access paths a for this event as rare. This pattern arises, for instance, with custom events used in tests.

As a concrete example, there are 522 ⟨a, e⟩ pairs registering a listener for the doge event on an a rooted at the npm package socket.io-client. This nonsensical event name is commonly used for a placeholder or test event – this is reflected in the data, as we see that these 522 pairs involve 520 unique paths, 519 of which occur in exactly one pair. In other words, the usage of doge follows no discernible pattern in the data.

For one of the pairs ⟨a, doge⟩ where a only occurs once and a rarity threshold pa of 0.01, we get BCDF(1, 522, 0.01) ≈ 0.03, so we would conclude with 97% confidence that a is rare for doge and might then label it as anomalous. This is undesirable: we should not conclude anything about this pair, since the data is too sparse.

We encode this into the statistical analysis by changing the occurrence count k to not only count occurrences of the pair ⟨a, e⟩, but also occurrences of pairs ⟨a′, e⟩ where a′ appears together with e as often or less often than a. Formally, we write ke(a) for the number of times the pair ⟨a, e⟩ occurs in the data (for which we used k above), and then define

    ke(⌈a⌉) = Σ { ke(a′) | 0 < ke(a′) ≤ ke(a) }

Intuitively, this means that we are now not only taking into account the absolute number of times we see a together with e, but also how that number compares to that of other a’s (on the same e). For example, for the 519 access paths that only appear once together with doge, we now have ke(⌈a⌉) = 519, making them very unlikely to be considered rare.

Defining ka(⌈e⌉) symmetrically as the number of occurrences of pairs ⟨a, e′⟩ where e′ appears together with a as often or less often than e, we refine the overall condition for a pair ⟨a, e⟩ being classified as anomalous as follows:

    BCDF(ka(⌈e⌉), na, pe) < pce ∧ BCDF(ke(⌈a⌉), ne, pa) < pca

In particular, the single-occurrence access paths above now fail the second condition since BCDF(ke(⌈a⌉), 522, 0.01) = BCDF(519, 522, 0.01) ≈ 1, that is, we are almost 100% confident that these access paths do not meet the rarity threshold of 0.01.

It should be noted, however, that this formulation does result in more false negatives: if any of these access paths is actually incorrect, they will no longer be flagged. Since we are mostly interested in automated bug detection, we are willing to accept additional false negatives in exchange for fewer false positives.
8
true, mainly for two reasons11 : choose a set of parameter values to test. For the rar-
ity thresholds pa and pe , we chose values from the
1) The access path may be rooted in the import of an
set {0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.25}. A value of
API that is not among the 18 packages considered
pa = 0.005, for instance, means that we consider an
in the validation set.
access path to be rare for an event if it occurs in
2) The access path and/or the event are not mentioned
less than 0.5% of all pairs with this event. For the
in the API documentation. This is, for example, the
confidence thresholds we chose values from the set
case for custom API extensions or events like the
{0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 1}. A value of pca =
spotify property in the example in Section 4, or
0.005, for instance, means that we want to be 99.5% sure
for deprecated APIs that have been removed from
that an a is rare for an e before classifying it as rare. The
the documentation.
extreme value of pa = 1 has the effect of classifying every a
as rare for e, thereby reducing the statistical analysis to just
6.2 Measuring analysis quality checking whether events are rare for access paths (and vice
Given the validation set, we can now define the usual met- versa). This allows us to test the sensitivity of the statistical
rics for assessing the effectiveness of the statistical analysis. analysis to the rarity of access paths and events individually.
A false positive is a pair haccess path, eventi that is Altogether, this results in a space of 4,096 configurations.
classified as anomalous by the statistical analysis but that is
labeled as correct or imprecise12 in the validation set. Con- 7 E VALUATION
versely, a false negative is a pair that is labeled as incorrect
To evaluate the practicality of our approach, we run the
in the validation set but that the statistical analysis does not
statistical analysis over the mined data for each of the 4,096
classify as anomalous (i.e., we consider both the case where
configurations, and assess the results quantitatively and
the statistical analysis classifies it as expected and the case
qualitatively with the following research questions:
where the statistical analysis has left it unclassified as false
negatives). Finally, a true positive is a pair that is labeled as RQ1. Impact of configuration parameters on precision/recall:
incorrect in the validation set and that the statistical analysis How do precision/recall change as the configuration
also classifies as anomalous. parameters vary?
In Section 8.1 we investigate reasons for false positives RQ2. Impact of training set selection on precision/recall: How
and false negatives that occur in the results of the statistical do precision/recall change as the training set selec-
analysis, and present a case study of some specific demon- tion varies?
strative examples. RQ3. Impact of training set size on precision/recall: How
Based on these definitions, we now can now define the do precision/recall change as the training set size
recall of the statistical analysis as the percentage of pairs varies?
labeled as incorrect in the validation set that are classified as RQ4. Utility of results: Does the approach identify practi-
anomalous by the statistical analysis. Moreover, the precision cally relevant mistakes?
of the statistical analysis is defined as the percentage of RQ5. Performance: Is the approach practical in terms of
true positives among all anomalous pairs reported by the performance/resources?
statistical analysis.
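These definitions translate directly into code; a minimal sketch of our own, assuming a hypothetical record format for labeled pairs:

// Each labeled pair: { label: 'correct' | 'incorrect' | 'imprecise',
//                      classified: 'anomalous' | 'expected' | 'unclassified' }
function precisionAndRecall(labeledPairs) {
  const reported = labeledPairs.filter(p => p.classified === 'anomalous');
  const truePositives = reported.filter(p => p.label === 'incorrect').length;
  const incorrect = labeledPairs.filter(p => p.label === 'incorrect').length;
  return {
    // 'correct' and 'imprecise' pairs reported as anomalous both count as false positives
    precision: truePositives / reported.length,
    // incorrect pairs classified as 'expected' or left 'unclassified' are false negatives
    recall: truePositives / incorrect,
  };
}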
6.3 Data Mining and Statistical Analysis

We ran the mining analysis on all 127,531 JavaScript projects available on LGTM.com at the time of writing this paper. With this, we collected a total of 532,004 ⟨a, e⟩ listener-registration pairs (160,195 of which were unique), from 35,757 projects. The remaining projects did not use event-based APIs recognized by the analysis.

Of this mined data, we have labeled 959 pairs as being correct API uses, 4,323 pairs as having an imprecise access path, and 399 as incorrect API uses, for a total of 5,681 labeled pairs forming our validation set.

Section 7 will explore how recall and precision of the statistical analysis are affected by the selection of different parameters for the rarity thresholds pa and pe and the confidence thresholds pca and pce, and which configurations generally yield the best tradeoff between recall and precision. For these experiments, we needed to choose a set of parameter values to test. For the rarity thresholds pa and pe, we chose values from the set {0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.25}. A value of pa = 0.005, for instance, means that we consider an access path to be rare for an event if it occurs in less than 0.5% of all pairs with this event. For the confidence thresholds we chose values from the set {0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 1}. A value of pca = 0.005, for instance, means that we want to be 99.5% sure that an a is rare for an e before classifying it as rare. The extreme value of pca = 1 has the effect of classifying every a as rare for e, thereby reducing the statistical analysis to just checking whether events are rare for access paths (and vice versa). This allows us to test the sensitivity of the statistical analysis to the rarity of access paths and events individually. Altogether, this results in a space of 4,096 configurations.

11. Supplemental materials provide some examples of pairs in our mined data that are not included in the validation set.
12. We consider pairs that the statistical analysis classifies as being anomalous but that are labeled as imprecise in the validation set as false positives, to report the most pessimistic results for our technique.

7 EVALUATION

To evaluate the practicality of our approach, we run the statistical analysis over the mined data for each of the 4,096 configurations, and assess the results quantitatively and qualitatively with the following research questions:

RQ1. Impact of configuration parameters on precision/recall: How do precision/recall change as the configuration parameters vary?
RQ2. Impact of training set selection on precision/recall: How do precision/recall change as the training set selection varies?
RQ3. Impact of training set size on precision/recall: How do precision/recall change as the training set size varies?
RQ4. Utility of results: Does the approach identify practically relevant mistakes?
RQ5. Performance: Is the approach practical in terms of performance/resources?

We now address each of these research questions in turn.

7.1 RQ1: Impact of configuration parameters on precision/recall

Since the goal is to automatically find dead listener patterns, our approach has to achieve two things to be practically useful: it should classify as anomalous as many as possible of the listener-registration pairs that are labeled as incorrect in the validation set, while at the same time minimizing the number of pairs classified as anomalous that are labeled as correct in the validation set. In other words, we should maximize for both recall and precision.

How well the approach achieves these goals depends on the parameters of the statistical analysis, so we systematically explore the space of parameter configurations to find one that maximizes recall while maintaining an acceptable precision rate (defined as 90% in accordance with the literature [10], [9]). For each configuration we run the classification to find anomalous listener-registration pairs, and use the validation set described in Section 6.1 to determine the precision and recall of the statistical analysis.
Fig. 5. Precision and recall for all configurations (blue dots); Pareto front in red

Figure 5 shows the results of this experiment. Unsurprisingly, there is an inverse correlation between the recall and precision: configurations that classify many pairs as being anomalous have many true positives, but also many false positives. Hence it is not meaningful to optimize either metric in isolation.

Instead, we want to concentrate on the Pareto front [20, Chapter 16], that is, the set of configurations for which there is no other solution that is better on both metrics (the red line in Figure 5): a configuration is in the Pareto front if there is no configuration with the same (or higher) precision that has a higher recall.

Altogether, there are eight configurations on the Pareto front with precision of 80% or above, as detailed in Table 1. For each configuration we show the values of the four parameters, the precision and recall, and the number of unique true positives, false positives, and pairs in the validation set that remain unclassified by the statistical analysis. We also show the number of times these true positives occur in the entire data set (roughly speaking, this is the number of potential bugs the configuration finds) and the number of projects they occur in. For example, the first row reads as follows: a parameter configuration of pa, pe, pca, and pce as 5%, 5%, 2%, and 10% respectively, results in a precision of 100% and recall of 3.0% over the validation set. This corresponds to 12 true positives (#TP), no false positives (#FP), and no unclassified pairs (#UP); these true positive pairs occur 23 times in the mined data (Occ TP), across 22 projects (#Proj).

(pa, pe, pca, pce)          % Precision  % Recall  # TP  # FP  # UP  Occ TP  # Proj
(0.05, 0.05, 0.02, 0.1)     100.0        3.0       12    0     0     23      22
(0.1, 0.05, 0.05, 0.1)      95.8         5.8       23    1     0     57      48
(0.1, 0.05, 0.1, 0.1)       92.3         6.0       24    2     0     58      49
(0.1, 0.1, 0.03, 0.01)      90.9         7.5       30    3     1     75      64
(0.25, 0.04, 0.01, 0.005)   88.6         7.8       31    4     3     77      61
(0.25, 0.05, 0.01, 0.01)    86.5         8.0       32    5     3     79      63
(0.25, 0.01, 1, 0.04)       85.4         8.8       35    6     9     48      36
(0.25, 0.01, 1, 0.1)        84.8         9.8       39    7     12    55      41

TABLE 1
Configurations with ≥80% precision; optimal configuration (fourth row) highlighted in gray.

To answer RQ1, then, we found that there are indeed configurations with more than 90% precision. The fourth row represents the configuration we consider optimal: this is the configuration that yields the highest recall for a precision over 90%. The rarity thresholds pa and pe are 10% and 10%, and the confidence thresholds pca and pce are 3% and 1%, respectively. Over the validation set, this configuration yields three false positives and 30 true positives, for a precision of 90.9%. The true-positive pairs occur 75 times in total across 64 projects. All the false positives for this configuration were cases where the access path is overly imprecise.

7.2 RQ2: Impact of training set selection on precision/recall

The configuration we identified as optimal in RQ1 performs very well on the full validation set, but of course this does not imply that it would do as well on another data set. In order to address this concern without having to manually label even more pairs, we conducted a 10-fold cross-validation experiment. We divided the data into 10 random partitions. Then, we determine the best configuration (i.e., the highest recall with at least 90% precision) over nine of these partitions (the training data) and validate it on the remaining partition (the validation data). We repeat this procedure ten times, once for each of the partitions as the validation data.

The results of the experiment are shown in Table 2. Each row represents the results of using a group of nine partitions as the training data and the remaining partition as the validation data. The second column shows the optimal configuration over the training data. Columns 3-5 show the precision, recall, and absolute true positive count on the training data, while columns 6-8 show the same on the validation set.
For example, the first row reads as follows: in the first round, the optimal configuration on the training data was pa = 0.25, pe = 0.04, pca = 0.01, pce = 0.005, which achieved a 90.6% precision with 8.1% recall, finding 29 true positive results. On the validation data, that same configuration resulted in a precision of 87.5% and recall of 5.0%, with 2 true positive results.

We see consistent results with the cross-validation experiment. Concretely: across the 10 rounds of the experiment, in the training data we see an average precision of 90.8% (standard deviation 0.7%) and an average recall of 7.4% (standard deviation 1.0%). Then, in the validation data we see an average precision of 86.3% (standard deviation 10.3%) and an average recall of 7.8% (standard deviation 4.0%). From this we see that not only is the quality of results consistent between training runs, but that it also results in consistent results on the validation data.

Looking at the configurations determined to be optimal, we see a high occurrence rate of each of the parameters determined optimal over the whole set. Concretely: the optimal pa = 0.1 is found in 7 runs and pa = 0.25 (resulting in a precision of 88.6% over the whole set) is found in the other 3. Similarly, the optimal pe = 0.1 is found in 5 runs, pca = 0.03 in 4 runs, and pce = 0.01 in 3 runs. In conclusion, the choice of training data does not substantially affect the choice of optimal configuration.

7.3 RQ3: Impact of training set size on precision/recall

Having shown that the selection of the training data does not matter much, we will now explore the effect of the size of the training data: how well does the statistical analysis perform when trained over smaller data sets?

In order to test this, we designed an experiment where we randomly sampled a given percentage of the data, and then determined the optimal configuration on this subset13. As before, we define “optimal” to mean the configuration with the highest recall that achieves a precision of at least 90%. Then, we take this configuration and report the precision/recall it achieves over the whole data set. We repeat this process 10 times for each percentage, and test this on samples of 2%, 5%, 10%, 25%, and 50% of the total data. For each of these sample percentages, we report the average (harmonic mean) precision and recall computed over all 10 iterations.

Table 3 presents the results of this experiment, with one row per sampling of a given percentage. For example, the first row can be read as follows: for the first random sampling of 2% of the data, the optimal configuration is (0.25, 0.01, 1, 0.1). This achieves a precision of 100% and a recall of 50% over the 2% subset, and a precision of 84.8% and a recall of 9.8% over the whole dataset.

Considering only average precision and recall over the whole dataset for the moment, we can see two interesting trends: when training on 2% of the data, precision over the whole dataset is fairly low at 82.4%, and so is recall at 6.9%. As the amount of training data increases, precision increases as well, reaching 91% for 25% of the data. Recall, on the other hand, at first decreases, hitting a low of 3.6% on 25% of the data, before slightly increasing again to 3.9% at 50%. (Compare this to the 7.5% recall reported in RQ1 when training on all 100% of the data.) This suggests that stable precision can already be achieved with relatively little training data, but more data is needed to improve recall.

13. Alternatively, one could apply the analysis to a subset of projects and examine the stability of the results. However, we decided against this approach because the number of modeled API calls varies considerably between projects.
2% of data sampled for subset
Iter  Optimal config  % Precision (subset)  % Recall (subset)  % Precision (whole)  % Recall (whole)
1 (0.25, 0.01, 1, 0.1) 100.0 50.0 84.8 9.8
2 (0.25, 0.01, 1, 0.1) 100.0 25.0 84.8 9.8
3 (0.25, 0.02, 1, 0.005) 100.0 71.4 78.1 12.5
4 (0.25, 0.005, 1, 0.1) 100.0 12.5 84.2 4.0
5 (0.25, 0.02, 1, 0.005) 100.0 42.9 78.1 12.5
6 (0.25, 0.005, 1, 0.1) 100.0 46.2 84.2 4.0
7 (0.25, 0.03, 1, 0.005) 100.0 41.7 71.8 15.3
8 (0.25, 0.01, 1, 0.05) 100.0 50.0 84.1 9.3
9 (0.25, 0.005, 1, 0.05) 100.0 14.3 93.3 3.5
10 (0.25, 0.01, 1, 0.02) 100.0 37.5 83.8 7.8
Average (harmean): 82.4 6.9
5% of data sampled for subset
Iter  Optimal config  % Precision (subset)  % Recall (subset)  % Precision (whole)  % Recall (whole)
1 (0.25, 0.03, 0.005, 0.05) 100.0 25.0 87.5 7.0
2 (0.25, 0.04, 0.1, 0.005) 90.0 30.8 82.9 8.5
3 (0.1, 0.1, 0.02, 0.05) 100.0 21.1 81.6 7.8
4 (0.25, 0.02, 1, 0.03) 90.0 37.5 76.4 13.8
5 (0.04, 0.02, 0.005, 0.05) 100.0 7.1 100.0 1.3
6 (0.25, 0.02, 1, 0.1) 100.0 13.6 81.3 6.5
7 (0.25, 0.03, 0.05, 0.03) 100.0 10.0 81.1 7.5
8 (0.25, 0.005, 1, 0.01) 100.0 29.4 90.0 4.8
9 (0.25, 0.02, 1, 0.05) 100.0 36.4 80.6 6.3
10 (0.25, 0.02, 1, 0.1) 100.0 22.7 81.3 6.5
Average (harmean): 83.8 4.8
10% of data sampled for subset
Iter  Optimal config  % Precision (subset)  % Recall (subset)  % Precision (whole)  % Recall (whole)
1 (0.1, 0.05, 0.05, 0.005) 100.0 5.1 95.2 5.0
2 (0.25, 0.005, 1, 0.05) 100.0 29.0 93.3 3.5
3 (0.1, 0.04, 0.05, 0.01) 100.0 15.8 95.0 4.8
4 (0.25, 0.02, 0.1, 0.1) 100.0 13.5 81.3 6.5
5 (0.05, 0.1, 0.01, 0.01) 92.9 33.3 90.5 4.8
6 (0.25, 0.005, 1, 0.03) 90.0 22.5 92.9 3.3
7 (0.25, 0.005, 1, 0.03) 100.0 31.3 92.9 3.3
8 (0.25, 0.005, 1, 0.1) 100.0 28.1 84.2 4.0
9 (0.25, 0.02, 0.1, 0.1) 100.0 13.9 81.3 6.5
10 (0.25, 0.03, 0.05, 0.03) 90.0 20.5 81.1 7.5
Average (harmean): 88.4 4.5
25% of data sampled for subset
Iter  Optimal config  % Precision (subset)  % Recall (subset)  % Precision (whole)  % Recall (whole)
1 (0.04, 0.05, 0.1, 0.03) 92.3 10.4 90.5 4.8
2 (0.1, 0.01, 0.02, 0.05) 90.0 8.7 90.0 2.3
3 (0.25, 0.04, 0.1, 0.02) 92.6 21.6 81.0 8.5
4 (0.03, 0.03, 0.05, 0.04) 100.0 6.9 100.0 2.0
5 (0.25, 0.005, 1, 0.03) 90.0 12.0 92.9 3.3
6 (0.25, 0.005, 1, 0.04) 100.0 12.6 92.9 3.3
7 (0.25, 0.005, 1, 0.05) 100.0 15.2 93.3 3.5
8 (0.25, 0.03, 0.02, 0.02) 91.7 11.8 84.4 6.8
9 (0.25, 0.005, 1, 0.03) 90.9 9.8 92.9 3.3
10 (0.1, 0.05, 0.05, 0.05) 90.9 8.5 95.7 5.5
Average (harmean): 91.0 3.6
50% of data sampled for subset
Iter  Optimal config  % Precision (subset)  % Recall (subset)  % Precision (whole)  % Recall (whole)
1 (0.05, 0.05, 0.1, 0.05) 100.0 7.0 92.9 3.3
2 (0.05, 0.05, 0.05, 0.1) 93.8 7.9 92.9 3.3
3 (0.02, 0.05, 0.1, 0.02) 100.0 5.9 100.0 1.8
4 (0.04, 0.05, 0.01, 0.02) 92.9 6.6 100.0 2.8
5 (0.25, 0.03, 0.02, 0.04) 94.4 8.1 84.8 7.0
6 (0.25, 0.04, 0.01, 0.005) 90.0 13.2 88.6 7.8
7 (0.25, 0.01, 1, 0.04) 92.6 12.6 85.4 8.8
8 (0.05, 0.05, 0.05, 0.04) 92.9 7.2 92.3 3.0
9 (0.25, 0.02, 0.1, 0.005) 91.3 10.2 81.5 5.5
10 (0.25, 0.04, 0.02, 0.005) 90.5 9.6 86.1 7.8
Average (harmean): 91.0 3.9
TABLE 3
Optimal configurations over smaller percentages of the data, and the corresponding precision/recall over the whole dataset
Data mining and classification: Our approach involves mining and classifying listener registration pairs from a large number of projects. The data mining step requires about 404 hours of compute time for the 127,531 projects in the data set. Since LGTM.com runs queries concurrently, this step was completed in about two days. The classification stage is much faster: classifying the pairs for a given configuration takes only 35 to 40 seconds on commodity hardware. We expect these steps to be applied infrequently as event-driven APIs tend to evolve slowly, and our experimental results suggest that the set of optimal analysis thresholds is fairly stable.

Per-project costs: Once an API model has been constructed, it can be used for a variety of purposes, e.g., in a bug-detection tool that flags uses of event-driven APIs that are likely to be buggy, or in an IDE plugin for smart completion. Running the mining analysis on a single JavaScript project is quite fast: for 52% of all projects in the data set, the analysis takes ten seconds or less, with another 45% taking between ten seconds and a minute. There are only 151 projects (0.1%) for which the analysis takes more than ten minutes.

We consider these results to be encouraging as, while the upfront cost of constructing an API model is quite high, our experimental results suggest that the per-project costs are sufficiently low to allow integration of our approach in a realistic continuous-integration workflow.

8 DISCUSSION

This section reports on a case study in which we investigated a few specific examples where the statistical analysis produced false positives and false negatives, and considers threats to the validity of our results.

8.1 Case study of false positives and false negatives

8.1.1 False negatives

False negatives are listener-registration pairs that are labeled as incorrect in the validation set but that the statistical analysis does not classify as anomalous. Whether or not a given pair ⟨a, e⟩ is classified as anomalous by the statistical analysis is entirely determined by the frequency with which a, e, and the combination ⟨a, e⟩ occur in the mined data. Pairs for which both the access path and the event occur very frequently (but the pair itself is rare) will satisfy the criteria for being classified as anomalous with more statistical analysis parameter configurations than those incorrect pairs that appear more rarely.

As an example, consider the pairs ⟨require(net).createServer(), end⟩ and ⟨require(net).connect().setNoDelay(), secureConnect⟩, which are both labeled as incorrect in the validation set:

1) For the first pair, the access path require(net).createServer() occurs 1109 times, the event end occurs 872 times, and they occur as a pair only twice (in the net package). In other words, this rare pair is made up of a very common access path and a very common event; indeed, it meets the thresholds of all of the statistical analysis parameter configurations we tested. Therefore, this pair is always classified as anomalous, i.e., it is a true positive.
2) For the second pair, the access path require(net).connect().setNoDelay() occurs twice, the event secureConnect occurs 26 times, and they occur only once as a pair. Since the access path is so rare, this incorrect pair represents 50% of the uses of this access path, and therefore it is very unlikely to be classified as anomalous by the statistical analysis. Indeed, this pair is not classified as anomalous with any of the parameter configurations, unless the rarity of the access path is not considered at all. Therefore, it is almost always a false negative.

In other words, false negatives may occur in cases where a given access path or event is used rarely, making it difficult for the statistical analysis to conclude that the particular event-listener registration pair is rarer still.

8.1.2 False positives

The false positives that we observed correspond to event-listener registration pairs that are labeled as correct in the validation set but that show up rarely in the mined data and thus get classified as anomalous.

As an example of a false positive, consider the pair ⟨require(process).stdin, drain⟩, which is one of the three false positives that arises when the analysis is run with the optimal configuration that achieves a precision of 90.9%. In the mined data, we see the access path require(process).stdin 1,948 times, the event drain 234 times, but this pair itself only shows up once (in access paths rooted in process). The vast majority of the drain events (209 of the 234) are seen with require(process).stdout. This is because drain is an event on Writable streams [23], and according to the documentation [24], [25] process.stdout is either a net.Socket or a Writable stream while process.stdin is either a net.Socket or a Readable stream. Since net.Sockets are Duplex streams (i.e., both readable and writable), according to the documentation registering a listener for drain on process.stdin is a correct use of the API. However, from the data it seems that although this is a correct API usage, this use is rare, and therefore it ends up being classified as anomalous by the statistical analysis.

As another example of a false positive, consider the pair ⟨require(zlib).createGunzip(), drain⟩. In the mined data, the access path shows up 1,649 times, the event 441 times, but the pair itself only once (in access paths rooted in zlib). The most common events we see with this access path are data (395 times), end (388 times), error (364 times), and close (344 times). These are all events emitted by objects of class stream.Readable [26]. This is noteworthy because, according to the documentation, zlib.createGunzip() returns a Gunzip object [27], and this inherits from stream.Transform which is a Duplex stream (i.e., both readable and writable) [28]. The drain event is an event on writable streams only. So, it seems that although the streams returned by zlib.createGunzip() are both readable and writable, they are almost always used as readable streams, causing the statistical analysis to flag the rare occurrence where this is not the case as anomalous.
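To make the two usages concrete, here is a small sketch (our own illustration, not code from the mined corpus; chunk and writeMore are hypothetical):

// Common: wait for 'drain' on the writable process.stdout after
// write() signals backpressure by returning false.
if (!process.stdout.write(chunk)) {
  process.stdout.once('drain', writeMore);
}

// Rare but documented as legal: when process.stdin is a net.Socket
// (a Duplex stream), it is also writable, so a 'drain' listener may
// be registered on it as well.
process.stdin.on('drain', writeMore);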
As another example of a false positive, consider the pair ⟨require(zlib).createGunzip(), drain⟩. In the mined data, the access path shows up 1,649 times and the event 441 times, but the pair itself only once (among access paths rooted in zlib). The most common events we see with this access path are data (395 times), end (388 times), error (364 times), and close (344 times). These are all events emitted by objects of class stream.Readable [26]. This is noteworthy because, according to the documentation, zlib.createGunzip() returns a Gunzip object [27], which inherits from stream.Transform, a Duplex stream (i.e., both readable and writable) [28]. The drain event is an event on writable streams only. So, it seems that although the streams returned by zlib.createGunzip() are both readable and writable, they are almost always used as readable streams, causing the statistical analysis to flag the rare occurrence where this is not the case as anomalous.
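The following sketch (again our own illustration; compressedChunk and the callback bodies are hypothetical) contrasts the dominant readable-side usage with the rare writable-side usage behind this false positive:

const zlib = require('zlib');
const gunzip = zlib.createGunzip();

// Dominant usage in the mined data: consume the stream as readable.
gunzip.on('data', (chunk) => { /* use decompressed data */ });
gunzip.on('end', () => { /* decompression finished */ });
gunzip.on('error', (err) => { /* handle error */ });

// Rare but correct usage: treat the same Duplex stream as writable
// and wait for 'drain' after a write() that reported backpressure.
if (!gunzip.write(compressedChunk)) {
  gunzip.once('drain', () => { /* resume writing compressed input */ });
}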
8.2 Threats to Validity

We are aware of several potential threats to validity.

Our results depend on the set of code bases that have been mined, and this set may not be representative. However, we simply used the set of all JavaScript projects on LGTM.com that were available at the time of writing this paper, which includes many popular open-source projects as well as projects added by users of LGTM.com. These code bases were not specifically selected for this project, and provide a reasonable sample of real-world JavaScript code.

Our measurements of precision and recall are based on a relatively small set of listener-registration pairs that we semi-automatically labeled as correct or incorrect (5,681 out of 160,195 unique pairs) and might not generalize beyond this set. Exhaustively labeling all pairs was infeasible, so we focused on the most popular packages to ensure that the results are relevant for widely used APIs. Cross-validation showed that the choice of optimal configuration does not crucially depend on the chosen training data set.

The semi-automatic labeling of pairs during validation-set generation involved a review of API documentation by the authors, so there was potential for human error in this process: if we misread the documentation, some pairs could be mislabeled in the validation set. In practice, we saw no examples of this in any of the pairs we examined.

The validation set is itself biased in that it contains a relatively small number of pairs labeled as incorrect (399 out of 5,681). This affects the accuracy of the reported precision, since we are much more likely to find that a pair classified as anomalous by the statistical analysis is actually correct (and hence a false positive) than incorrect (and hence a true positive). Consequently, our reported precision underestimates the actual precision; remedying this imbalance would only improve it.

There is potential for bias in generalizing our results from the 18 packages we model to other npm packages. In particular, the results might be affected by the number of events and emitters available in a package. However, the 18 packages in the validation set cover a range of different scenarios in terms of the number of events, emitters, and listener-registration pairs ⟨a, e⟩ that constitute correct usage of the API: these values range from 102 correct usages over 12 access paths and 13 events for the stream package, to 77 correct usages over 15 access paths and 24 events for the fs package, to only 2 correct usages over one access path and 2 events for the http2 package. Table 7 in the supplemental materials shows these values for all the packages in the validation set. Thus, we have some confidence that the approach generalizes over packages with differing numbers of events and emitter objects.

The values chosen for the parameters of the statistical analysis obviously greatly influence the quality of the results. However, our evaluation considered a large number of different combinations, over which we determined the optimal configuration for a particular set of conditions (here, a precision of at least 90%). Moreover, a cross-validation experiment revealed the configuration parameters to be quite stable across subsets of the data.

Finally, the static analysis used in the mining phase is relatively simple and imprecise, e.g., due to inherent imprecision of the access path representation. Our evaluation accounted for this by considering all pairs involving imprecise access paths to be false positives. A more sophisticated analysis using more precise access paths would also increase the precision of the statistical analysis.

9 LEARNING LOST EVENTS

At first glance, it seems that we could apply this same learning approach to the dual problem of finding bug patterns in lost events [7], i.e., events that are emitted but never listened for. We modified our static analysis to identify event emissions instead of event registrations, and reran the data mining over the same set of projects to collect information on this dual problem. From this analysis, we mined a total of 22,900 ⟨a, e⟩ pairs (10,432 unique).

From this data, we determined that the learning approach cannot be effectively used to identify bug patterns in lost events, since the vast majority of events emitted in projects are custom events. The use of a custom event is specific to the project it appears in, and so patterns observed in other projects cannot be used to learn about its proper use. We discussed this in Section 4.1 with respect to listener registrations, but the same logic applies to event emissions. For the remainder of this section, we discuss some details about the data mined on event emissions.

Base event emitter: Of the 22,900 pairs mined, we observed that 5,248 have the emitter access path require(events).EventEmitter.new() and 1,220 have the access path require(events).new(). No other access paths occur this frequently in our data set.

In code, these access paths correspond to new require('events').EventEmitter() and new require('events')(), respectively. Looking at the documentation of the EventEmitter class,15 we see that these are aliases, as the EventEmitter class is the default export of the events package. The same documentation shows that only 2 events make up this API: newListener and removeListener. Any other events emitted on objects that are instances of EventEmitter are therefore custom events. In our data, 6,466 of these 6,468 pairs emit a custom event on objects of the base event emitter class.

15. https://nodejs.org/api/events.html#events_class_eventemitter
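The following minimal sketch shows both instantiation forms together with a custom event (the event name job-done is our own hypothetical example):

const events = require('events');

// The two instantiation forms behind the access paths above; they
// construct the same class, since EventEmitter is the default
// export of the events package.
const e1 = new events.EventEmitter();
const e2 = new events();

// Only 'newListener' and 'removeListener' belong to the base API;
// any other event, like 'job-done' here, is a custom event.
e1.on('job-done', (result) => console.log('finished:', result));
e1.emit('job-done', 42);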
We examined what users actually do with the custom events on base event emitters. From our exploration, a few common patterns by which users build their own custom event infrastructures on top of the base EventEmitter class could be observed; examples of these are included below. Developers often create custom EventEmitter classes extending the base EventEmitter class in a classic object-oriented style. Consider the following demonstrative example, condensed from the update manager class in vscode:

export class UpdateManager extends events.EventEmitter {
  // methods that emit and listen for custom events
  private initRaw(): void {
    // ...
    this.emit('checking-for-update');
  }

  public initialize(): void {
    this.on('checking-for-update', /* ... */);
  }
}

export const Instance = new UpdateManager();
Since the access path representation does not reason about the inheritance hierarchy, the new UpdateManager() is represented abstractly as require(events).EventEmitter.new(). Other common custom event usage patterns include extending the EventEmitter prototype or including an EventEmitter as a class field. In each of these cases, the developers are encapsulating the base EventEmitter so as to build their own custom event-based infrastructure.
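The following sketch illustrates these other two patterns (our own illustration; the names Downloader, Uploader, and progress are hypothetical):

const EventEmitter = require('events');

// Extending the EventEmitter prototype (pre-class style).
function Downloader() {
  EventEmitter.call(this);
}
Object.setPrototypeOf(Downloader.prototype, EventEmitter.prototype);

// Including an EventEmitter as a class field and delegating
// listener registration and event emission to it.
class Uploader {
  constructor() {
    this.events = new EventEmitter();
  }
  onProgress(listener) {
    this.events.on('progress', listener);
  }
  reportProgress(pct) {
    this.events.emit('progress', pct);
  }
}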
There are a variety of ways in which developers make use of custom event infrastructures in their code bases. Building something on top of the base EventEmitter is one of the most common patterns we observed, as discussed above; after this, the next most frequently observed emitters were objects of the socket.io Socket class. Examining the documentation of the socket.io API on emitting events, we see that there are no standard events; therefore, all events emitted on socket.io client- or server-based emitters are custom events. This accounts for 8,398 of the pairs (1,911 client-side and 6,487 server-side).
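For illustration, the following sketch is condensed from the canonical socket.io chat example (the event name chat message is application-defined, not part of the socket.io API):

const { Server } = require('socket.io');
const io = new Server(3000);

io.on('connection', (socket) => {
  // 'chat message' is a custom event: only this application emits
  // it and listens for it.
  socket.on('chat message', (msg) => {
    io.emit('chat message', msg); // broadcast to all connected clients
  });
});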
Manual analysis of a subset of remaining pairs: So far, we have determined that 14,866 of the 22,900 pairs mined correspond to emissions of custom events, either on the base event emitter or via socket.io. This does not mean, however, that the rest of the pairs in our dataset do not correspond to custom events.

There are 2,668 unique pairs remaining. Of these, we manually examined a random sample of 200, split across all the APIs.16 In this manual analysis, we found that 82.5% correspond to custom events.

From this, we conclude that the vast majority of events explicitly emitted correspond to custom events. As discussed, the use of a custom event is specific to the project in which the custom emitter is defined, and so patterns observed in other projects cannot be used to learn about its proper use. Therefore, our learning approach cannot be effectively used to identify bug patterns in lost events.

16. All pairs manually analyzed are included in a table in the supplemental materials.

10 RELATED WORK

A considerable amount of research has focused on detecting and characterizing bugs in JavaScript applications, including bug detection tools using static analysis [29] and dynamic analysis [30], [31]; evaluations of the effectiveness of type systems for preventing bugs [32]; development of benchmarks [33]; and studies of real-world bugs [34].

The most closely related work to ours is by Madsen et al. [7]. They describe a static analysis for detecting dead listeners, lost events, and other event-handling bugs based on the notion of an event-based call graph, which augments a traditional call graph with edges corresponding to event-listener registration, event emission, and callback invocation. Event-handling bugs are detected by looking for patterns in these augmented call graphs. Unfortunately, their approach does not scale well because their context-sensitive analysis employs notions of contexts corresponding to the sets of events emitted and listeners registered, which may be exponential in the size of the program. This exponential behavior appears to manifest itself in practice: on their largest subject program (a mere 390 LOC), one of their analyses incurs a running time of 17 seconds, and the other does not terminate at all. Our approach targets only dead listeners, and only those cases where the event the listener is meant to handle is never emitted (excluding cases where it is emitted at a time when the listener is not registered). This allows us to use a simple and scalable static analysis in our mining phase, and to rely on statistical reasoning over a large data set to offset the noise.

Unfortunately, it is not possible to compare our technique directly against Madsen's, given that their implementation was a proof-of-concept static analysis for a small subset of ECMAScript 5. As such, it did not support modern JavaScript features such as classes or promises, which are pervasive in the subject applications that we analyzed. Furthermore, upon inquiry, we were informed that Madsen's tool is no longer available [35]. That said, we investigated the bugs reported in Madsen's work and found that, of the 12 real-world bugs considered in their work, three are dead-listener bugs of the kind targeted by our analysis, and our optimal configuration identifies all of them. The others concern dead emits, listeners for custom events, or listeners that are dead due to the order in which they were added, and are outside the scope of our work.

Our work also stands in a long line of research viewing bugs as “deviant behavior”: statistical methods are used to infer beliefs or rules that are implicit in the code, and violations of these rules are flagged as likely defects.

Engler et al. [36] distinguish between “MUST beliefs” and “MAY beliefs”. The former are directly implied by the code, and often boil down to simple data-flow properties: for example, dereferencing a pointer implies the belief that it is not a null pointer, and a subsequent null check of the same pointer is inconsistent with that belief. MAY beliefs, on the other hand, are patterns such as two functions that are often invoked in a particular order, which might reflect an implicit rule (such as the second one freeing a resource allocated by the first one), or might be a coincidence. They use an analysis based on the z statistic to distinguish the two. Our work also aims to infer a MAY belief, but of a more complex kind than those considered by Engler, since the relationship between event emitters and events is many-to-many. Custom events pose an additional challenge that requires more sophisticated statistical methods than the z statistic.
The PR-Miner system [37] targets a broader class of rules: using frequent itemset mining, it extracts association rules A ⇒ B, where A and B are sets of program elements such as function calls. Such a rule expresses the observation that functions containing all elements in A also contain all elements in B, with a certain level of confidence. Violations of high-confidence rules are then likely to be bugs. Again, the relationship between event emitters and events does not immediately fit this pattern: association rules are “forall” rules in the sense that if all elements in A are present then all elements in B must be present. By contrast, we are interested in “exists” rules in the sense that, in any given context, a particular type of event emitter emits one event from a certain set, but not necessarily all of them.

WN-Miner [38] and PF-Miner [39] focus more narrowly on the problem of inferring temporal specifications, specifically pairs of functions f and g such that g must always be invoked after f, usually because it performs some sort of cleanup. Acharya et al. [40] generalize this to inferring partial orders between functions. Gruska et al. [41], on the other hand, generalize in a different direction and employ association rules of a similar kind as PR-Miner, but where the sets A and B now contain candidate function pairs, thus allowing the inference of context-dependent specifications. Murali et al. [42] apply a Bayesian framework for learning probabilistic API specifications, which is more robust on noisy and heterogeneous data than more lightweight approaches. Although dead-listener detection shares some general principles with temporal-specification mining, the concrete setup is rather different, and it is not immediately obvious that their techniques apply to our problem.

Monperrus et al. [43] propose type usages as a particularly useful kind of specification to infer for object-oriented programs: a type usage is a set of methods invoked on a variable of a given type, all within the body of a method with a given signature. They define a metric termed s-score, which can be used to identify type usages that are themselves rare but similar to a very common type usage. This parallels our goal of finding rare event-emitter pairs where a pair with a different event or a different emitter is very common, although the technical details of our approach are again somewhat more complex owing to the problem of custom events.

A significant amount of research has been devoted to the detection of event races using static [44] and dynamic [45], [46] analysis. Recent work has focused on event races that have observable effects [47], on classifying event races [48], and on developing specialized techniques for event races that occur during page initialization [49] or that are associated with AJAX requests [50]. The access paths used in this paper are not precise enough to capture the ordering constraints necessary for event-race detection, so our approach is not immediately applicable to this problem.

Other researchers have used statistical reasoning to predict properties of programs for use in bug finding. Raychev et al. [51] derive probabilistic models from existing data using structured prediction with conditional random fields (CRFs). They apply their analysis to JavaScript programs to predict the names of identifiers and the types of variables in new, unseen programs, and suggest that the computed results can be useful for de-obfuscation and for adding or checking type annotations. Eberhardt et al. [52] apply unsupervised machine learning to a large corpus of Java and Python programs obtained from public repositories to infer aliasing specifications for popular APIs, which are then used to enhance a may-alias analysis that is applied to applications using such APIs. The resulting enhanced analysis is demonstrated to lead to improvements in client analyses such as typestate analysis (by eliminating a false positive result) and taint analysis (by eliminating a false negative result). Chibotaru et al. [53] present a semi-supervised method for inferring taint analysis specifications: a propagation graph is inferred from each program in a dataset, and it is assumed that a small number of nodes corresponding to API functions are annotated as sources, sinks, or sanitizers. To infer situations where unannotated nodes also play one of these roles, a set of linear constraints is derived from the propagation graph, such that the solution to the constraints represents the likelihood of an unannotated node being a source, sink, or sanitizer. The program properties these works are designed to identify are API types and function signatures; they do not discuss applications to message-passing systems like those seen in event-driven programming.

Hanam et al. [54] present a technique for discovering JavaScript bug patterns by analyzing many bug-fix commits. They decompose commits into sets of language-construct changes, represent these as feature vectors, and apply unsupervised machine learning to identify bug patterns. The identified patterns are low-level issues such as dereferencing undefined and incorrect error handling. They do not discuss bug patterns related to event handling.

DeepBugs [55] detects name-based bugs using a learned classifier that distinguishes correct from anomalous code; training data for the classifier is obtained automatically by applying simple bug-seeding program transformations to code that is assumed to be correct. The approach is evaluated for three types of errors (swapped function arguments, wrong binary operator, wrong operand in a binary operation), and detected dozens of real bugs, with a false positive rate of around 30%. It is unclear how well this approach would work for less syntactic bugs like the dead-listener bugs we consider.

Ryu et al. [56] present the SAFE tools for detecting type mismatch bugs that cause runtime errors (e.g., accesses to undefined) in JavaScript web applications. They construct simple models of browser runtime constructs such as the HTML Document Object Model (DOM) through a dynamic analysis; this is used as input for their bug detector. The SAFE tools differ from our work in three key ways: most importantly, the class of bugs SAFE tracks does not include dead-listener bugs; also, their target runtime is the browser while ours is Node.js; and our analysis is purely static.
11 CONCLUSION

We have presented an approach for detecting dead-listener patterns in event-driven JavaScript programs that relies on a combination of static analysis and statistical reasoning. The static analysis computes a set of listener-registration pairs ⟨a, e⟩, where a is an access path and e the name of an event, reflecting the fact that a listener is registered for e on an object represented by a. After applying the static analysis to a large corpus of JavaScript applications, statistical modeling is used to differentiate expected event-listener registrations that are commonly observed from rarely observed anomalous cases that are likely to be incorrect. In a large-scale evaluation on 127,531 open-source JavaScript code bases, our technique was able to detect 75 anomalous listener-registration patterns, while maintaining a precision of 90.9%
and recall of 7.5% over a validation set, demonstrating that a learning-based approach to detecting event-handling bug patterns is feasible.

We report on several additional experiments to better assess the impact of the data set analyzed by the statistical analysis, the utility of the results, and the practicality of the technique. One experiment revealed that the selection of the particular subset of data that the statistical analysis is trained on does not substantially affect the choice of optimal configuration. On the other hand, we found the size of the subset used for training to have a significant impact, with smaller training-set sizes generally resulting in classifiers that have unstable precision and lower recall on the full data set. Furthermore, we demonstrated that our approach is effective at identifying buggy listener registrations in real code bases: of the 30 issues we recently reported to developers of 25 open-source projects on GitHub, 7 were confirmed as bugs. While the statistical analysis requires a significant amount of compute time, we would expect this cost to be incurred infrequently, as APIs tend to evolve slowly. Checking a specific project for dead listeners typically takes no more than a few minutes for all but the largest projects.

As future work, we plan to explore more precise notions of access paths that would allow us to build distinct representations for function calls where some arguments are string literals and others are callbacks. In principle, this would enable us to distinguish access paths in the presence of nested event handlers.
ACKNOWLEDGMENT

The authors would like to thank Albert Ziegler for insightful and helpful discussions about the statistical modeling. E. Arteca and F. Tip were supported in part by National Science Foundation grants CCF-1715153 and CCF-1907727. E. Arteca was also supported in part by the Natural Sciences and Engineering Research Council of Canada.

REFERENCES

[1] World Wide Web Consortium, “UI Events - W3C Working Draft,” https://www.w3.org/TR/DOM-Level-3-Events, 2021, accessed: 2021-07-18.
[2] OpenJS Foundation, “jQuery,” https://jquery.com/, 2021, accessed: 2021-07-18.
[3] Angular, https://angular.io/, 2021, accessed: 2021-07-18.
[4] Facebook, “React: A JavaScript library for building user interfaces,” https://reactjs.org, 2021, accessed: 2021-07-18.
[5] Electron Administrative Working Group, “Electron: Build cross platform desktop apps with JavaScript, HTML, and CSS,” https://electronjs.org, 2021, accessed: 2021-07-18.
[6] OpenJS Foundation, “Node.js,” https://nodejs.org, 2021, accessed: 2021-07-18.
[7] M. Madsen, F. Tip, and O. Lhoták, “Static analysis of event-driven Node.js JavaScript applications,” in Proc. ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2015.
[8] G. Mezzetti, A. Møller, and M. T. Torp, “Type Regression Testing to Detect Breaking Changes in Node.js Libraries,” in 32nd European Conference on Object-Oriented Programming, ECOOP 2018, July 16-21, 2018, Amsterdam, The Netherlands, ser. LIPIcs, T. D. Millstein, Ed., vol. 109. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018, pp. 7:1–7:24.
[9] C. Sadowski, E. Aftandilian, A. Eagle, L. Miller-Cushon, and C. Jaspan, “Lessons from Building Static Analysis Tools at Google,” Commun. ACM, vol. 61, no. 4, pp. 58–66, Mar. 2018. [Online]. Available: http://doi.acm.org/10.1145/3188720
[10] C. Sadowski, J. van Gogh, C. Jaspan, E. Söderberg, and C. Winter, “Tricorder: Building a Program Analysis Ecosystem,” in 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1, 2015, pp. 598–608. [Online]. Available: https://doi.org/10.1109/ICSE.2015.76
[11] OpenJS Foundation, “EventEmitter documentation,” https://nodejs.org/api/events.html#events_emitter_on_eventname_listener, 2021, accessed: 2021-12-19.
[12] B. Vidis, “Minimal request promise,” https://github.com/benoitvidis/min-req-promise/, 2021, accessed: 2021-07-18.
[13] F. L. Norvell, “Add dynamic instrumentation to emitters,” https://www.npmjs.com/package/emitter-listener/, 2021, accessed: 2021-07-18.
[14] J. G. Bryan and G. P. Wadsworth, “Introduction to probability and random variables,” p. 52, 1960.
[15] P. Avgustinov, O. de Moor, M. P. Jones, and M. Schäfer, “QL: object-oriented queries on relational data,” in 30th European Conference on Object-Oriented Programming, ECOOP 2016, July 18-22, 2016, Rome, Italy, 2016, pp. 2:1–2:25. [Online]. Available: https://doi.org/10.4230/LIPIcs.ECOOP.2016.2
[16] GitHub, “QL standard libraries,” https://github.com/github/codeql, 2021, accessed: 2021-07-18.
[17] ——, “LGTM.com,” https://lgtm.com/, 2021, accessed: 2021-07-18.
[18] pandas developers, “pandas: Python data analysis library,” https://pandas.pydata.org/, 2021, accessed: 2021-07-18.
[19] SciPy developers, “SciPy,” https://www.scipy.org/, 2021, accessed: 2021-07-18.
[20] A. Mas-Colell, M. D. Whinston, J. R. Green et al., Microeconomic Theory. Oxford University Press, New York, 1995, vol. 1.
[21] “eris bug fix commit,” https://github.com/abalabahaha/eris/commit/6fcfc8c5472bdbc41f5cfbff499774534f6429ad, 2020.
[22] “Haraka bug fix commit,” https://github.com/haraka/Haraka/commit/99cb9f93451e54b16cfad856de0c4811fb6d4ccf, 2020.
[23] Node.js, “Stream drain event documentation,” https://nodejs.org/api/stream.html#event-drain, 2021, accessed: 2021-12-07.
[24] ——, “Process stdin documentation,” https://nodejs.org/api/process.html#processstdin, 2021, accessed: 2021-12-07.
[25] ——, “Process stdout documentation,” https://nodejs.org/api/process.html#processstdout, 2021, accessed: 2021-12-07.
[26] OpenJS Foundation, “stream.Readable documentation,” https://nodejs.org/dist/latest-v16.x/docs/api/stream.html#class-streamreadable, 2021, accessed: 2021-12-19.
[27] ——, “zlib.Gunzip documentation,” https://nodejs.org/dist/latest-v16.x/docs/api/zlib.html#class-zlibgunzip, 2021, accessed: 2021-12-19.
[28] ——, “zlib.ZlibBase documentation,” https://nodejs.org/dist/latest-v16.x/docs/api/zlib.html#class-zlibzlibbase, 2021, accessed: 2021-12-19.
[29] S. Bae, H. Cho, I. Lim, and S. Ryu, “SAFEWAPI: web API misuse detector for web applications,” in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE-22), Hong Kong, China, November 16-22, 2014, 2014, pp. 507–517. [Online]. Available: https://doi.org/10.1145/2635868.2635916
[30] M. Pradel, P. Schuh, and K. Sen, “TypeDevil: Dynamic type inconsistency analysis for JavaScript,” in 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1, 2015, pp. 314–324. [Online]. Available: https://doi.org/10.1109/ICSE.2015.51
[31] S. Alimadadi, D. Zhong, M. Madsen, and F. Tip, “Finding broken promises in asynchronous JavaScript programs,” Proc. ACM Program. Lang., vol. 2, no. OOPSLA, pp. 162:1–162:26, 2018. [Online]. Available: https://doi.org/10.1145/3276532
[32] Z. Gao, C. Bird, and E. T. Barr, “To type or not to type: quantifying detectable bugs in JavaScript,” in Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017, 2017, pp. 758–769. [Online]. Available: https://doi.org/10.1109/ICSE.2017.75
[33] P. Gyimesi, B. Vancsics, A. Stocco, D. Mazinanian, Á. Beszédes, R. Ferenc, and A. Mesbah, “BugsJS: a benchmark of JavaScript bugs,” in 12th IEEE Conference on Software Testing, Validation and Verification, ICST 2019, Xi’an, China, April 22-27, 2019, 2019, pp. 90–101. [Online]. Available: https://doi.org/10.1109/ICST.2019.00019
[34] J. Wang, W. Dou, Y. Gao, C. Gao, F. Qin, K. Yin, and J. Wei, “A comprehensive study on real world concurrency bugs in Node.js,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017, 2017, pp. 520–531. [Online]. Available: https://doi.org/10.1109/ASE.2017.8115663
[35] M. Madsen, 2021, private communication.
[36] D. R. Engler, D. Y. Chen, and A. Chou, “Bugs as Inconsistent Behavior: A General Approach to Inferring Errors in Systems Code,” in Proceedings of the 18th ACM Symposium on Operating System Principles, SOSP 2001, Chateau Lake Louise, Banff, Alberta, Canada, October 21-24, 2001, K. Marzullo and M. Satyanarayanan, Eds. ACM, 2001, pp. 57–72. [Online]. Available: https://doi.org/10.1145/502034.502041
[37] Z. Li and Y. Zhou, “PR-Miner: Automatically Extracting Implicit Programming Rules and Detecting Violations in Large Software Code,” in Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, M. Wermelinger and H. C. Gall, Eds. ACM, 2005, pp. 306–315. [Online]. Available: https://doi.org/10.1145/1081706.1081755
[38] W. Weimer and G. C. Necula, “Mining Temporal Specifications for Error Detection,” in Tools and Algorithms for the Construction and Analysis of Systems, 11th International Conference, TACAS 2005, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2005, Edinburgh, UK, April 4-8, 2005, Proceedings, ser. Lecture Notes in Computer Science, N. Halbwachs and L. D. Zuck, Eds., vol. 3440. Springer, 2005, pp. 461–476. [Online]. Available: https://doi.org/10.1007/978-3-540-31980-1_30
[39] H. Liu, Y. Wang, J. Bai, and S. Hu, “PF-Miner: A practical paired functions mining method for Android kernel in error paths,” J. Syst. Softw., vol. 121, pp. 234–246, 2016. [Online]. Available: https://doi.org/10.1016/j.jss.2016.02.007
[40] M. Acharya, T. Xie, J. Pei, and J. Xu, “Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications,” in Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2007, Dubrovnik, Croatia, September 3-7, 2007, I. Crnkovic and A. Bertolino, Eds. ACM, 2007, pp. 25–34. [Online]. Available: https://doi.org/10.1145/1287624.1287630
[41] N. Gruska, A. Wasylkowski, and A. Zeller, “Learning from 6,000 Projects: Lightweight Cross-Project Anomaly Detection,” in Proceedings of the Nineteenth International Symposium on Software Testing and Analysis, ISSTA 2010, Trento, Italy, July 12-16, 2010, P. Tonella and A. Orso, Eds. ACM, 2010, pp. 119–130. [Online]. Available: https://doi.org/10.1145/1831708.1831723
[42] V. Murali, S. Chaudhuri, and C. Jermaine, “Bayesian specification learning for finding API usage errors,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017, 2017, pp. 151–162. [Online]. Available: https://doi.org/10.1145/3106237.3106284
[43] M. Monperrus and M. Mezini, “Detecting Missing Method Calls As Violations of the Majority Rule,” ACM Trans. Softw. Eng. Methodol., vol. 22, no. 1, pp. 7:1–7:25, 2013. [Online]. Available: https://doi.org/10.1145/2430536.2430541
[44] Y. Zheng, T. Bao, and X. Zhang, “Statically locating web application bugs caused by asynchronous calls,” in Proceedings of the 20th International Conference on World Wide Web. ACM, 2011, pp. 805–814.
[45] B. Petrov, M. Vechev, M. Sridharan, and J. Dolby, “Race detection for web applications,” in ACM SIGPLAN Notices, vol. 47, no. 6. ACM, 2012, pp. 251–262.
[46] V. Raychev, M. Vechev, and M. Sridharan, “Effective race detection for event-driven programs,” in ACM SIGPLAN Notices, vol. 48, no. 10. ACM, 2013, pp. 151–166.
[47] E. Mutlu, S. Tasiran, and B. Livshits, “Detecting JavaScript races that matter,” in Proc. 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE), 2015, pp. 381–392.
[48] L. Zhang and C. Wang, “RClassify: classifying race conditions in web applications via deterministic replay,” in Proceedings of the 39th International Conference on Software Engineering. IEEE Press, 2017, pp. 278–288.
[49] C. Q. Adamsen, A. Møller, and F. Tip, “Practical initialization race detection for JavaScript web applications,” in Proc. ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), October 2017, pp. 66:1–66:22.
[50] C. Q. Adamsen, A. Møller, S. Alimadadi, and F. Tip, “Practical AJAX race detection for JavaScript web applications,” in Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018, 2018, pp. 38–48. [Online]. Available: https://doi.org/10.1145/3236024.3236038
[51] V. Raychev, M. T. Vechev, and A. Krause, “Predicting program properties from ‘big code’,” in Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, 2015, pp. 111–124. [Online]. Available: https://doi.org/10.1145/2676726.2677009
[52] J. Eberhardt, S. Steffen, V. Raychev, and M. T. Vechev, “Unsupervised learning of API aliasing specifications,” in Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2019.
[53] V. Chibotaru, B. Bichsel, V. Raychev, and M. T. Vechev, “Scalable taint specification inference with big code,” in Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2019.
[54] Q. Hanam, F. S. D. M. Brito, and A. Mesbah, “Discovering bug patterns in JavaScript,” in Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, November 13-18, 2016, 2016, pp. 144–156. [Online]. Available: https://doi.org/10.1145/2950290.2950308
[55] M. Pradel and K. Sen, “DeepBugs: a learning approach to name-based bug detection,” Proc. ACM Program. Lang., vol. 2, no. OOPSLA, pp. 147:1–147:25, 2018. [Online]. Available: https://doi.org/10.1145/3276517
[56] S. Ryu, J. Park, and J. Park, “Toward analysis and bug finding in JavaScript web applications in the wild,” IEEE Software, vol. 36, no. 3, pp. 74–82, 2018.

Ellen Arteca is currently a PhD candidate at the Khoury College of Computer Sciences, Northeastern University, advised by Dr. Frank Tip. She received her MMath degree from the University of Waterloo in 2018. Her research interests include program analysis, bug finding, and test generation for dynamic languages.

Max Schäfer received a DPhil in computer science from the University of Oxford in 2010. He is currently a principal software engineer with GitHub in Oxford, UK. His research interests include program analysis, software supply-chain security, and advanced programming tools.

Frank Tip received the PhD degree from the University of Amsterdam, Amsterdam, Netherlands, in 1995. He is currently a professor at the Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts. His research interests include asynchronous programming, static and dynamic program analysis, refactoring, and test generation.