0% found this document useful (0 votes)
16 views

SP-Automatic Documentation Generation via Source Code Summarization of Method Context

This paper presents a novel approach for automatic documentation generation that enhances existing methods by incorporating context information from Java methods. The proposed technique analyzes method invocations to provide programmers with more comprehensive summaries that explain not only what methods do but also why they exist and how to use them. A user study demonstrated that the generated documentation significantly improves understanding compared to traditional documentation methods.

Uploaded by

pokat10168
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views

SP-Automatic Documentation Generation via Source Code Summarization of Method Context

This paper presents a novel approach for automatic documentation generation that enhances existing methods by incorporating context information from Java methods. The proposed technique analyzes method invocations to provide programmers with more comprehensive summaries that explain not only what methods do but also why they exist and how to use them. A user study demonstrated that the generated documentation significantly improves understanding compared to traditional documentation methods.

Uploaded by

pokat10168
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Automatic Documentation Generation via

Source Code Summarization of Method Context


Paul W. McBurney and Collin McMillan
Department of Computer Science and Engineering
University of Notre Dame
Notre Dame, IN, USA
{pmcburne, cmc}@nd.edu

ABSTRACT ments in the software’s source code. The key advantage is


A documentation generator is a programming tool that cre- that they relieve programmers of many tedious tasks while
ates documentation for software by analyzing the statements writing documentation. They offer a valuable opportunity
and comments in the software’s source code. While many to improve and standardize the quality of documentation.
of these tools are manual, in that they require specially- Still, a majority of documentation generators are manual.
formatted metadata written by programmers, new research They need considerable human intervention. Prominent ex-
has made inroads towards automatic generation of documen- amples include Doxygen [54] and JavaDoc [25]. These tools
tation. These approaches work by stitching together key- streamline the task of writing documentation by standardiz-
words from the source code into readable natural language ing its format and presentation. But, they rely on program-
sentences. These approaches have been shown to be effec- mers to write the documentation’s content (in particular, a
tive, but carry a key limitation: the generated documents summary of each function or method) as specially-formatted
do not explain the source code’s context. They can describe metadata in the source code comments. The tools cannot
the behavior of a Java method, but not why the method generate documentation without this metadata. The burden
exists or what role it plays in the software. In this paper, of writing the documentation still lies with the programmers.
we propose a technique that includes this context by ana- Recent research has made inroads towards automatic gen-
lyzing how the Java methods are invoked. In a user study, eration of natural language descriptions of software [2, 31,
we found that programmers benefit from our generated doc- 34, 46–48, 55]. In particular, work by Sridhara et al. can
umentation because it includes context information. form natural language summaries of Java methods [46]. The
summaries can then be aggregated to create the software’s
Categories and Subject Descriptors documentation. The technique works by first selecting a
method’s most important statements, and then extracting
D.2 [Software]: Software Engineering; D.2.9 [Software keywords from the identifier names in those statements. Next,
Engineering]: Management—Productivity a natural language generator stitches the keywords into En-
General Terms glish sentences. Finally, these sentences are used to make a
method summary. The process is automatic; so long as the
Algorithms, Documentation source code contains meaningful identifiers, the summaries
Keywords will describe the main behaviors of a given Java method.
What is missing from the method summaries is informa-
Source code summarization tion about the context which surrounds the method being
1. INTRODUCTION summarized. The context includes the dependencies of the
Different studies of program comprehension show that method, and any other methods which rely on the output of
programmers rely on good software documentation. [12, 24, the method [26]. A method’s context is important for pro-
29, 52]. Unfortunately, manually-written documentation is grammers to know because it helps answer questions about
notorious for being incomplete, either because it is very why a method exists and what role it plays in the soft-
time-consuming to create [7, 21], or because it must con- ware [8, 42, 43]. Because they summarize only select state-
stantly be updated [11, 19, 41]. One result has been the ments within a method, existing techniques will supply only
invention of the documentation generator. A documenta- limited context about a method.
tion generator is a programming tool that creates documen- In this paper, we hypothesize that existing documentation
tation for software by analyzing the statements and com- generators would be more effective if they included informa-
tion from the context of the methods, in addition to the
data from within the methods. We define “more effective”
Permission to make digital or hard copies of all or part of this work for in terms of three criteria: programmers find the documenta-
personal or classroom use is granted without fee provided that copies are not tion’s method summaries to be more helpful in understand-
made or distributed for profit or commercial advantage and that copies bear ing 1) what the methods do internally, 2) why the methods
this notice and the full citation on the first page. Copyrights for components exist, and 3) how to use the methods. To test our hypothe-
of this work owned by others than ACM must be honored. Abstracting with sis, we introduce a novel technique to automatically gener-
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee.
ate documentation that includes context. We then perform
ICPC ’14, June 2–3, 2014, Hyderabad, India a case study with 12 Java programmers as participants.
Copyright 2014 ACM 978-1-4503-2879-1/14/05 $15.00.
During the study, the participants evaluated two config- that our generated documentation contains structural infor-
urations of our technique in comparison to documentation mation in each method’s summary.
generated by a state-of-the-art solution [46]. In one config- We include this structural information from the context
uration, the documentation consisted only of our generated surrounding each method in the program. A method’s “con-
summaries. In the second configuration, the documenta- text” is the environment in which the method is invoked [26].
tion contained both our summaries and the state-of-the-art It includes the statement which called the method, the state-
summaries. We found that our summaries were in general ments which supplied the method’s inputs, and the state-
of higher quality and made a key contribution by provid- ments which use the method’s output. Context-sensitive
ing more-thorough contextual information. The program- program slicing has emerged as one effective technique for
mers consistently rated our summaries as more-helpful in extracting context [26]. Given a method, these techniques
understanding why given Java methods exist, and how to will return all statements in its context. However, some
use them, than the state-of-the-art method. statements in the context are more-relevant to the method
Our tool works by collecting contextual data about Java than other statements. This issue of relevance is impor-
methods from the source code, namely method calls, and tant for this paper because we must limit the size of the
then using the keywords from the context of a method to text summaries, and therefore select only a small number of
describe how that method is used. We use related work, the statements for use in generating the summaries.
Software Word Usage Model by Hill et al. [17], to identify the Consider the manually-written examples of method sum-
parts of speech for the different keywords. We choose the maries from NanoXML, a Java program for parsing XML,
contextual information to summarize using the algorithm below. Item 1 is an example method we selected. It demon-
PageRank, which we compute for the program’s call graph. strates how the default summary from documentation can
We then build a novel Natural Language Generation system be incomplete. In isolation, the method summary leaves a
to interpret the keywords and infer meaning from the con- programmer to guess: What is the purpose of reading the
textual information. Our system then generates a readable character? For what is the character used? Why does the
English description of the context for each method in a Java method even exist?
program. We will describe typical natural language genera-
tion systems and supporting technologies for our approach in Example method with default summary
from JavaDocs
Section 3, followed by our approach, our evaluation, and our
1) StdXMLReader.read() / Method Name
evaluation results. Specifically, we contribute the following:
“Reads a character.” / Method Summary
• A novel approach for generating natural language de- Methods from context of example, with
scriptions of source code. Our approach is different summaries from JavaDocs
from previous approaches in that we summarize con- 2) XMLUnit.skipWhitespace()
text as readable English text. “Skips whitespace from the reader.”
3) XMLElement.addChild()
• A case study evaluating our approach and comparing it “Adds a child element.”
against documentation generated by a state-of-the-art 4) StdXMLBuilder.startElement()
approach. Our case study shows that our approach can “This method is called when a new XML element is en-
improve existing documentation by adding important countered.”
contextual information. 5) StdXMLBuilder.addAttribute()
“This method is called when a new attribute of an XML
• A complete implementation of our approach for Java element is encountered.”
methods. For the purpose of reproducibility of our re-
sults, we have released our implementation to the pub- These questions can be answered by reading the context.
lic as an open-source project via our online appendix1 . The example method may be easier to understand when we
know that Items 2 through 5 are in the example’s context.
2. THE PROBLEM These methods are in the context because they all rely on
the method read (e.g., they either call read directly, or call
The long-term problem we target in this paper is that a method that calls read). We selected Items 2 through
much software documentation is incomplete [30], which costs 5 above by hand to demonstrate this motivating example.
programmers time and effort when trying to understand the However, in the remainder of this paper we will discuss how
software [12]. In Java programs, a typical form of this doc- we automatically choose methods from the context and gen-
umentation is a list of inputs, outputs, and text summaries erate natural language descriptions, such as the one below in
for every method in the software (e.g., JavaDocs). Only if Item 6, for arbitrary Java methods. Our summaries provide
these summaries are incomplete, do the programmers resort programmers with key clues about how a method is used,
to reading the software’s source code [40]. What they must and provide this information as English readable sentences:
look for are clues in the source code’s structure about how
the methods interact [18,24,51]. The term “structure” refers Example method with summary including the me-
to both the control flow relationships and the data depen- thod’s contextual information
dencies in source code. The structure is important because
it defines the behavior of the program: methods invoke other 6) StdXMLReader.read()
methods, and the chain of these invocations defines how the “This method reads a character. That character is
program acts. In this paper, we aim to generate documen- used in methods that add child XML elements and
tation that is more-complete than previous approaches, in attributes of XML elements. Called from method that
skips whitespace.”
1
http://www.nd.edu/~pmcburne/summaries/
3. BACKGROUND “document structuring” takes place which sorts the messages
This section describes three supporting technologies for into a sequence that makes sense to a human reader. This
our work: the Software Word Usage Model (SWUM) [17], sequence of messages is known as the document plan.
the design of Natural Language Generation (NLG) systems The next main component, the Microplanner, decides which
[39], and the algorithm PageRank [27]. These techniques words will be used to describe each message. In “lexicaliza-
were proposed and evaluated elsewhere. We emphasize them tion”, the microplanner assigns specific words as parts of
here because they are important concepts for our approach. speech in a “phrase” about each message. Typically the sub-
ject, verb, and object for a given message are identified ,
3.1 Software Word Usage Model along with any modifiers such as adjectives and adverbs.
Next, two steps smooth the phrases so that they are more
The Software Word Usage Model (SWUM) is a technique naturally read. “Reference generation” decides how nouns
for representing program statements as sets of nouns, verbs, will be referred to in the phrases, such as whether to use
and prepositional phrases. SWUM works by making as- a proper name or a pronoun. Finally, “aggregation” joins
sumptions about different Java naming conventions, and us- phrases based on how they are related, e.g., causally (joined
ing these assumptions to interpret different programs state- by because) or via coordination (joined by and/or).
ments. Consider a method from NanoXML which has the The final component of NLG is the Surface Realizer. The
signature static String scanPublicId(StringBuffer, XML- surface realizer generates natural language sentences from
Reader, char XMLEntityResolver). SWUM first splits the the phrases. Different grammar rules for the natural lan-
identifier names using the typical Java convention of camel guage dictate how the sentences should be formed. The sur-
case. Next, it reads verbs from the method as the starting face realizer follows these rules to create sentences that con-
word from the method identifier (e.g., “scan”). SWUM also tain the parts of speech and words given by the microplan-
extracts noun phrases, such as “public id”, and deduces a ner. These sentences are the surface text. They are human-
relationship of the nouns to the verbs. For example, “public readable descriptions of the information in the messages,
id” is assumed to be the direct object of “scan” because it interpreted from the facts given to the document planner,
follows “scan” in the method identifier. Other nouns, such as and in the order defined in the document plan.
“string” or “xml reader”, are read from the return types and
arguments, and are interpreted under different assumptions. 3.3 PageRank
We direct readers to the relevant literature on SWUM for PageRank is an algorithm for approximating the impor-
complete details [16, 17]. tance of the nodes in a graph [27]. While a complete dis-
One strategy for using SWUM for text generation is to cussion of PageRank is beyond the scope of this paper, in
define templates of natural language sentences, and use the general, PageRank calculates importance based on the num-
output from SWUM to fill these templates [46]. For exam- ber of edges which point to a given node as well as the
ple, a template for method call statements is “action theme importance of the nodes from which those edges originate.
args and get return-type”. The template may be further pro- PageRank is well-known for its usefulness in ranking web
cessed so that items such as return-type actually display as pages for web search engines. However, PageRank has seen
the variable name. Given a method call statement systemID growing relevance in its applications in software engineer-
= XMLUtil.scanPublicID(publicID, reader, &, this.en- ing. In particular, a body of work has shown how PageRank
tityResolver);, a summary for the statement is “scan pub- can highlight important functions or methods in a software
lic id and get system id”. To summarize an entire method program [5, 20, 33, 38]. A common and effective strategy is
from these summaries of statements, Sridhara et al. se- to model a software program as a “call graph”: a graph in
lected a subset of key statements by defining rules for which which the nodes are functions or methods, and the edges
statements are typically the most-important (e.g., return or
control-flow statements). A method summary was a combi-
nation of the summaries of these key statements.
3.2 Natural Language Generation Systems
The design of a Natural Language Generation (NLG) sys-
tems typically follows an architecture described by Reiter
and Dale [39]. Figure 1 illustrates this architecture. Concep-
tually, the architecture is not complicated: a “communica-
tive goal” is translated from a series of facts into readable
natural language sentences, known as “surface text.” The
NLG system has three main components, each of which is
made up of several individual steps.
The first main component is the Document Planner. The
input to this component is a list of facts that need to be com-
municated to a human reader. Through “content determina-
tion”, the document planner interprets the facts and creates
“messages.” Messages are an intermediate representation be-
tween the communicative goal and readable text. For exam-
ple, in a weather forecast generator such as FOG [14], facts Figure 1: The typical design of a Natural Lan-
about the temperature on given days result in a message of- guage Generation system as described by Reiter and
fering an interpretation of those facts, e.g., that it is colder Dale [39]. We built our NLG system around each of
today than it was yesterday. After the messages are created, these seven steps.
are call relationships among the methods. Methods that
are called many times or that are called by other important
methods are ranked as more important than methods which
are called rarely, and thus have few edges in the call graph.
We follow this model of using PageRank for this paper.

4. APPROACH
This section describes the details of our approach, includ-
ing each step of our natural language generation system.
Generally speaking, our approach creates a summary
of a given method in three steps: 1) use PageRank to Figure 2: Overview of our approach.
discover the most-important methods in the given method’s
context, 2) use data from SWUM to extract keywords about
the actions performed by those most-important methods,
and 3) use a custom NLG system to generate English sen- extracted by SWUM from the method’s signature. Our sys-
tences describing for what the given method is used. tem makes a simplifying assumption that all methods per-
The architecture of our approach is shown in Figure 2. In form some action on some input. If the keyword associated
theory our system could summarize functions in many lan- with the input is labeled as a noun by SWUM, and the
guages, but in this paper we limit the scope to Java methods. keyword associated with the method name is a verb, we as-
The data we collect about these Java methods is our “com- sume that there is a verb/direct-object relationship between
municative goal” (see Section 3.2) and is the basis for the the method name and the input name. This relationship is
information we convey via NLG. recorded as a Quick Summary Message.
4.1 Data Collection Another type of message is the Importance Message. The
idea behind an importance message is to give programmers
The comment generator requires three external tools to
clues about how much time to spend reading a method.
produce the necessary input data: SWUM, the call graph
The importance message is created by interpreting both the
generator, and PageRank. SWUM parses the grammati-
PageRank value of the method and the PageRank values of
cal structure from the function and argument names in a
all other methods. The importance message represents how
method declaration. This allows us to describe the method
high this value is above or below average. At the same time,
based on the contents of its static features. Specifically,
an importance message will trigger our NLG system to in-
SWUM outputs the keywords describing the methods, with
clude more information in the method’s description if the
each keyword tagged with a part-of-speech (Figure 2, area
method is ranked highly (see Aggregation below).
3). Next, we produce a call graph of the project for which
A third message type is the Output Usage Message. This
we are generating comments. Our call graph2 allows us to
message conveys information about the method’s output,
see where a method is called so that we can determine the
such as “the character returned by this method is used to
method’s context (Figure 2, area 2). Finally, we obtain a
skip whitespace in character streams.” Our system uses data
PageRank value for every method by executing the PageR-
from quick summary messages, importance messages, and
ank algorithm with the procedure outlined in Section 3.3.
the call graph to create output usage messages. Given a
In addition to gleaning this information from the project
method, our system creates an output usage message by first
to produce our comments, we also use the source code of
finding the methods in the call graph which depend on the
the project itself. For every method call in the call graph,
given method. Then, it picks the two of those methods with
the Data Organizer searches through the code to find the
the highest PageRank. It uses the quick summary message
statement that makes that call. The purpose of collecting
from those two methods to describe how the output is used.
these statements is to provide a concrete usage example to
The last message type we will examine in detail is the Use
the programmer. The Data Organizer combines these ex-
Message. This message serves to illustrate how a program-
ample statements with the call graph and SWUM keywords
mer can use the method by highlighting a specific example in
to create the Project Metadata (Figure 2, area 4).
the code. For example, one message we generated was “the
4.2 Natural Language Generation method can be used in an assignment statement ; for exam-
ple: Date releaseDate=getReleaseDate();.” Our system uses
This section covers our NLG system. Our system pro-
the call graph to find a line of code that calls the method
cesses the Project Metadata as input (Figure 2, area 5),
for which we are generating the message. It then classifies,
following each of the NLG steps shown in Figure 1.
based on static features with the line of code, whether the
Content Determination. We create four different types
calling statement is a conditional, iteration, assignment, or
of “messages” (see Section 3.2) that represent information
procedural statement. If a source code example cannot be
about a method’s context. While all message types may be
found, the Use Message is omitted.
downloaded from our online appendix, due to space limi-
Document Structuring. After generating the initial
tations, we discuss only four representative messages here.
messages in the content determination phase, we organize
First, a Quick Summary Message represents a brief, high-
all the messages into a single document plan. We use a tem-
level action summarizing a whole method. For example,
plated document plan where messages occur in a pre-defined
“skips whitespace in character streams.” We create these
order: Quick Summary Messages, Return Messages, Output
messages from the noun/verb labeling of identifier names
Used Messages, Called Messages, Importance Messages, and
2 then Use Messages. Note that this order may change during
Generated using java-callgraph, available via https://
github.com/gousiosg/java-callgraph, verified 9/12/2013 the Aggregation phase below.
Lexicalization. Each type of message needs a different Consider getResult() from StdXMLBuilder.java in Nano-
type of phrase to describe it. This section will describe how XML. The method’s signature, public Object getResult(),
we decide on the words to be used in each of those phrases, is parsed by SWUM which will tell us the verb is “get”and
for the four message types described under Content Determi- the object is “result.” Additionally, it will note the return
nation. Note that the phrases we generate are not complete type as “object.” This will be used to generate the Quick
sentences; they will be grouped with other phrases during Summary Message “This method gets the result and returns
Aggregation and formed into sentences during realization. an Object.” Then, using the call graph, we determine that
The Quick Summary Message records a verb/direct-object the top two methods (as scored by PageRank) that call get-
relationship between two words extracted by SWUM. The Result() are scanData() and parse(). Initially, in the doc-
conversion to a sentence is simple in this case: the verb be- ument planning phase, we generate two separate messages,
comes the verb in the sentence, and likewise for the direct- one using the SWUM information for each function. How-
object. The subject is assumed to be “the method”, but is ever, these are combined in the aggregation step with the
left out for brevity. To give the reader further information conjunction “and”, and eventually produces the Output Us-
about the method’s purpose, we add the input parameter age Message “That Object is used by methods that scans
type as an indirect object using the preposition “in”. the data and that parses the std XML parser.”
We create a phrase for an Output Usage Message by set- The last message we generate is the Use Message. We
ting the object as the return type of the method, and the search through the most important calling method, which
verb as “is”. The subject is the phrase generated from the in this case is scanData(). We take a line of code that calls
Quick Summary Message. We set the voice of the phrase getResult(), and determine based on its content whether it
to be passive. We decided to use passive voice to empha- is a conditional, iteration, assignment, or procedural state-
size how the return data is used, rather than the contents of ment. Using this information, we generate the Use Message
the Quick Summary Message. An example of the phrase we “The method can be used in an iteration statement ; for ex-
output is under the Content Determination section. ample: while ((!this.reader.atEOF()) && (this.build-
The Use Message is created with the subject “this method”, er.getResult() == null)) { ”. Each of these messages
the verb phrase “can be used”, and appending the prepo- are then appended together to make the final summary.
sitional phrase ”as a statement type;”. Statement type is
pulled from the data structures populated in our content 6. EVALUATION
determination step. Additionally, we append a second de- Our evaluation compares our approach to the state-of-the-
pendent clause “for example: code”. art approach described by Sridhara et al. [46]. The objec-
Reference Generation and Aggregation. During Ag- tive of our evaluation is three-fold: 1) to assess the degree
gregation, we create more-complex and readable phrases to which our summaries meet the quality of summaries gen-
from the phrases generated during Lexicalization. Our sys- erated by a state-of-the-art solution, 2) to assess whether
tem works by looking for patterns of message types, and then the summaries provide useful contextual information about
grouping the phrases of those messages into a sentence. For the Java methods, and 3) to determine whether the gener-
example, if two Output Usage Messages are together, and ated summaries can be used to improve, rather than replace,
both refer to the same method, then the phrases of those two existing documentation.
messages are conjoined with an “and” and the subject and Assessing Overall Quality. One goal of our evaluation
verb for the second phrase is hidden. In another case, if a is to quantify any difference in quality between our approach
Quick Summary Message follows a Quick Summary Message presented in this paper and the existing state-of-the-art ap-
for a different method, then it implies that the messages are proach, and to determine in what areas the quality of the
related, and we connect them using the preposition “for”. summaries can be most improved. To assess quality, we ask
The result is a phrase such as “skips whitespace in charac- the three following Research Questions (RQs):
ter streams for a method that processes xml”. Notice that
Reference Generation occurs alongside Aggregation. Rather RQ1 To what degree do the summaries from our approach
than hiding the subject in the phrase “processes xml”, we and the state-of-the-art approach differ in overall ac-
make it explicit as “method” and non-specific using the arti- curacy?
cle “a” rather than “the.” Due to space limitations, we direct RQ2 To what degree do the summaries from our approach
readers to our online appendix for a complete listing of the and the state-of-the-art approach differ in terms of
Aggregation techniques we follow. missing important information?
Surface Realization. We use an external library, sim-
plenlg [13], to realize complete sentences from the phrases RQ3 To what degree do the summaries from our approach
formed during Aggregation. In the above steps, we set all and the state-of-the-art approach differ in terms of in-
words and parts-of-speech and provided the structure of the cluding unnecessary information?
sentences. The external library follows English grammar
These Research Questions are derived from two earlier
rules to conjugate verbs, and ensure that the word order,
evaluations of source code summarization [34,46], where the
plurals, and articles are correct. The output from this step
“quality” of the generated comments was assessed in terms
is the English summary of the method (Figure 2, area 6).
of accuracy, content adequacy, and conciseness. Content ad-
equacy referred to whether there was missing information,
5. EXAMPLE while conciseness referred to limiting unnecessary informa-
In this section, we explore an example of how we form a tion in the summary. This strategy for evaluating generated
summary for a specific method. We will elaborate on how comments is supported by a recent study of source code com-
we use SWUM, call graph, PageRank, and source code to ments [49] in which quality was modeled as a combination of
form our messages. factors correlating to accuracy, adequacy, and conciseness.
Assessing Contextual Information. Contextual in-
formation about a method is meant to help programmers Table 1: The cross-validation design of our user
understand the behavior of that method. But, rather than study. Different participants read different sum-
describe that behavior directly from the internals of the maries for different programs.
method itself, context explains how that method interacts Round Group Summary Program 1 Program 2
with other methods in a program. By reading the context, A Our NanoXML Jajuk
programmers then can understand what the method does, 1 B S.O.T.A. Siena JEdit
why it exists, and how to use it (see Section 2). Therefore, C Combined JTopas JHotdraw
we study these three Research Questions: A Combined Siena Jajuk
2 B Our JTopas JEdit
RQ4 Do the summaries help programmers understand what C S.O.T.A. NanoXML JHotdraw
the methods do internally? A S.O.T.A. JTopas Jajuk
3 B Combined NanoXML JEdit
RQ5 Do the summaries help programmers understand why
C Our Siena JHotdraw
the methods exist?
RQ6 Do the summaries help programmers understand how
to use the methods?
Table 2: The questions we ask during the user study.
The rationale behind RQ4 is that a summary should pro- The first six are answerable as “Strongly Agree”,
vide programmers with enough details to understand the “Agree”, “Disagree”, and “Strongly Disagree.” The
most-important internals of the method—for example, the last two are open-ended.
type of algorithm the method implements—without forcing Independent of other factors, I feel that the sum-
them to read the method’s source code. Our summaries Q1 mary is accurate.
aim to include this information solely from the context. If
The summary is missing important information,
our summaries help programmers understand the methods’
key internals, it means that this information came from the Q2 and that can hinder the understanding of the
method.
context. For RQ5 , a summary should help programmers
understand why the method is important to the program The summary contains a lot of unnecessary infor-
as a whole. For example, the programmers should be able Q3 mation.
to know, from reading the summary, what the consequences The summary contains information that helps me
might be of altering or removing the method. Likewise, for understand what the method does (e.g., the inter-
RQ6 , the summary should explain the key details about how Q4
nals of the method).
a programmer may use the method in his or her own code.
Orthogonality. While the ultimate goal of this research The summary contains information that helps me
is to generate documentation purely from data in the source understand why the method exists in the project
Q5 (e.g., the consequences of altering or removing the
code, we also aim to improve existing documentation by
adding contextual information. In particular, we ask: method).
The summary contains information that helps me
RQ7 Do the summaries generated by our solution contain Q6 understand how to use the method.
orthogonal information to the information already in
the summaries from the state-of-the-art solution? In a sentence or two, please summarize the
Q7 method in your own words.
The idea behind this RQ is that to improve existing sum- Do you have any general comments about the
maries, the generated summaries should contribute new in- Q8
given summary?
formation, not merely repeat what is already in the sum-
maries. We generate summaries by analyzing the context
of methods, so it is plausible that we add information from The purpose of this rotation was to ensure that all evalu-
this context, which does not exist in the summaries from the ators would read summaries from each different approach
state-of-the-art solution. for several different Java programs, and to mitigate any bias
from the order in which the approaches and methods were
6.1 Cross-Validation Study Methodology presented [32]. Table 1 shows our study design in detail.
To answer our Research Questions, we performed a cross- Upon starting the study, each participant was randomly as-
validation study in which human experts (e.g., Java pro- signed to one of three groups. Each of those groups was then
grammers) read the source code of different Java methods, assigned to see one of three types of summary: summaries
as well as summaries of those methods, for three different from our approach, summaries from the state-of-the-art ap-
rounds. For each method and summary, the experts an- proach, or both summaries at the same time.
swered eight questions that covered various details about
the summary. Table 2 lists these questions. The first six 6.2 Subject Java Programs
correspond to each of the Research Questions above, and The summaries in the study corresponded to Java meth-
were multiple choice. The final two were open-ended ques- ods from six different subject Java programs, listed in Ta-
tions; we study the responses to these two questions in a ble 3. We selected these programs for a range of size (5 to
qualitative evaluation in Section 8. 117 KLOC, 318 to 7161 methods) and domain (including
In the cross-validation study design, we rotated the sum- text editing, multimedia, and XML parsing, among others).
maries and Java methods that the human evaluators read. During the study, participants were assigned to see methods
from four of these applications. During each of three differ- in programming experience. We attempted to mitigate these
ent rounds, we rotated one of the programs that the groups threats through our cross-validation study design, which al-
saw, but retained the fourth program. The reason is so that tered the order in which the participants viewed the Java
the group would evaluate different types of summaries for methods and summaries. We also recruited our participants
different programs, but also evaluate different types of sum- from a diverse body of professionals and students, and con-
maries from a single application. From each application, we firmed our results with accepted statistical testing proce-
pre-selected (randomly) a pool of 20 methods from each ap- dures. Still, we cannot guarantee that a different group of
plication. At the start of each round, we randomly selected participants would not produce a different result.
four methods from the pool for the rotated application, and Another source for a threat to validity is the set of Java
four from the fixed application. Over three rounds, partici- programs we selected. We chose a variety of applications
pants read a total of 24 methods. Because the methods were of different sizes and from different domains. In total, we
selected randomly from a pool, the participants did not all generated summaries for over 19,000 Java methods from six
see the same set of 24 methods. The programmers could projects, and randomly selected 20 of these methods from
always read and navigate the source code for these applica- each project to be included in the study (four to twelve of
tions, though we removed all comments from this code to which were ultimately shown to each participant). Even
avoid introducing a bias from these comments. with this large pool of methods, it is still possible that our
results would change with a different projects. To help mit-
6.3 Participants igate this threat, we have released our tool implementation
We had 12 participants in our study. Nine were graduate and all evaluation data in an online appendix, so that other
students and from the Computer Science and Engineering researchers may reproduce our work in independent studies.
Department at the University of Notre Dame. The remain-
ing three were professional programmers from two different 7. EMPIRICAL RESULTS
organizations, not listed due to our privacy policy. This section reports the results of our evaluation. First,
we present our statistical process and evidence. Then, we
6.4 Metrics and Statistical Tests explain our interpretation of this evidence and answer our
Each of the multiple choice questions could be answered
research questions.
as “Strongly Agree”, “Agree”, “Disagree”, or “Strongly Dis-
agree.” We assigned a values to these answers as 4 for 7.1 Statistical Analysis
“Strongly Agree”, 3 for “Agree”, 2 for “Disagree”, and 1 for The main independent variable was the type of summary
“Strongly Disagree.” For questions 1, 4, 5, and 6, higher rated by the participants: summaries generated by our solu-
values indicate stronger performance. For questions 2 and tion, summaries from the state-of-the-art solution, or both
3, lower values are preferred. We aggregated the responses presented together. The dependent variables were the rat-
for each question by approach. For example, all responses ings for each question: 4 for “Strongly Agree” to 1 for “Strongly
to question 1 for the summaries from our approach, and all Disagree”.
responses to question 1 for the summaries from the state-of- For each question, we compare the mean of the partic-
the-art approach. ipants’ ratings for our summaries to the summaries from
To determine the statistical significance of the differences the state-of-the-art approach. We also compare the ratings
in these groups, we used the two-tailed Mann-Whitney U given when both summaries were shown, versus only the
test [44]. The Mann-Whitney test is non-parametric, and state-of-the-art summaries. We compared these values us-
it does not assume that the data are normally distributed. ing the Mann-Whitney test (see Section 6.4). Specifically,
However, the results of these tests may not be accurate if we posed 12 hypotheses of the form:
the number of participants in the study is too small. There- Hn The difference in the reported ratings of the responses
fore, to confirm statistical significance, we use the procedure for Qm is not statistically-significant.
outlined by Morse [35] to determine the minimum popula-
tion size for a tolerated p value of 0.05. We calculated these where n ranges from 1 to 12, and m ranges from 1 to 6,
minimum sizes using observed, not expected, values of U . depending on which question is being tested. For example,
in H11 , we compare the answers to Q5 for the state-of-the-
6.5 Threats to Validity art summaries to the answers to Q5 for the combined sum-
As with any study, our evaluation carries threats to valid- maries.
ity. We identified two main sources of these threats. First, Table 4 shows the rejected hypotheses (e.g., the means
our evaluation was conducted by human experts, who may with a statistically-significant difference). We made a deci-
be influenced by factors such as stress, fatigue, or variations sion to reject a hypothesis only when three criteria were met.
First, |Z| must be greater than Zcrit . Second, p must be less
Table 3: The six Java programs used in our evalu- than the tolerated error 0.05. Finally, the calculated min-
ation. KLOC reported with all comments removed. imum number of participants (Nmin ) must be less than or
All projects are open-source. equal to the number of participants in our study (13). Note
Methods KLOC Java Files that due to space limitations we do not include values for the
NanoXML 318 5.0 28 four hypotheses for which we do not have evidence to reject.
Siena 695 44 211 For reproducibility purposes, these results are available at
JTopas 613 9.3 64 our online appendix.
Jajuk 5921 70 544 7.2 Interpretation
JEdit 7161 117 555 Figure 3 showcases the key evidence we study in this eval-
JHotdraw 5263 31 466 uation. We use this evidence to answer our Research Ques-
tions along the three areas highlighted in Section 6.
Table 4: Statistical summary of the results for the participants’ ratings for each question. “Samp.” is the
number of responses for that question for a given summary type, for all rounds. Mann-Whitney test values
are U , Uexpt , and Uvari . Decision criteria are Z, Zcrit , and p. Nmin is the calculated minimum number of
participants needed for statistical significance.
H Q Summary n x̃ µ Vari. U Uexpt Uvari Z Zcrit p Nmin Decision
Our 65 3 3.015 0.863
H1 Q1 1223 1917 34483 3.74 1.96 <1e-3 2 Reject
S.O.T.A. 59 3 2.390 0.863
Our 65 3 2.492 0.973
H2 Q2 2272 1885 36133 2.03 1.96 0.042 2 Reject
S.O.T.A. 58 3 2.862 1.139
Our 65 2 1.815 0.497
H3 Q3 2011 1917 34204 0.503 1.96 0.615 15 Not Reject
S.O.T.A. 59 2 1.983 0.982
Our 65 3 2.877 0.641
H4 Q4 1410 1917 34475 2.736 1.96 0.006 4 Reject
S.O.T.A. 59 3 2.407 0.832
Our 65 3 2.585 0.809
H5 Q5 1251 1885 35789 3.351 1.96 0.001 2 Reject
S.O.T.A. 58 3 1.983 0.930
Our 65 3 2.769 0.649
H6 Q6 799 1885 35447 5.771 1.96 <1e-3 3 Reject
S.O.T.A. 58 3 1.776 0.773
Combined 59 3 2.847 0.580
H7 Q1 1266 1741 29026 2.788 1.96 0.005 3 Reject
S.O.T.A. 59 3 2.390 0.863
Combined 59 2 2.322 0.843
H8 Q2 2219 1711 31205 2.876 1.96 0.004 3 Reject
S.O.T.A. 58 3 2.862 1.139
Combined 59 2 2.542 1.149
H9 Q3 1227 1741 31705 2.884 1.96 0.004 5 Reject
S.O.T.A. 59 2 1.983 1.149
Combined 58 3 2.879 0.564
H10 Q4 1241 1711 27964 2.811 1.96 0.005 5 Reject
S.O.T.A. 59 3 2.407 0.832
Combined 59 3 2.508 0.634
H11 Q5 1176 1711 30423 3.064 1.96 0.002 4 Reject
S.O.T.A. 58 2 1.983 0.930
Combined 59 3 2.746 0.503
H12 Q6 705 1711 29972 5.814 1.96 <1e-3 2 Reject
S.O.T.A. 58 2 1.776 0.773

(a) Our vs. S.O.T.A. Summaries (b) Combined vs. S.O.T.A. Summaries

Figure 3: Performance comparison of the summaries. The chart shows the difference in the means of the
responses to each question. For example in (a), the mean of Q5 for our approach is 0.602 higher than for the
state-of-the-art summaries. The sign is reversed for Q2 and Q3 because lower scores, not higher scores, are
better values for those questions. Solid bars indicate differences which are statistically-significant. In general,
our summaries were more accurate and provided more-thorough contextual information.
Overall Quality. The summaries from our approach are Contextual Information. The summaries from our ap-
superior in overall quality to the summaries from the state- proach included more contextual information than the state-
of-the-art approach. Figure 3(a) shows the difference in of-the-art summaries. The differences in responses for ques-
the means of the responses for survey questions. Questions tions Q4 , Q5 , and Q6 are higher for our summaries by a
Q1 through Q3 refer to aspects of the summaries related statistically-significant margin. These results mean that,
to overall quality, in particular to our Research Questions in comparison to the state-of-the-art summaries, our sum-
RQ1 to RQ3 . In short, participants rated our summaries maries helped the programmers understand why the meth-
as more accurate and as missing less required information ods exist and how to use those methods. Therefore, we
by a statistically-significant margin. While these results are answer RQ4 , RQ5 , and RQ6 with a positive result. The
encouraging progress, they nevertheless still point to a need answers to these research questions point to an important
to improve. In Section 8, we explore what information that niche filled by our approach: the addition of contextual in-
the participants felt was unnecessary in our summaries. formation to software documentation.
Orthogonality. We found substantial evidence showing • “A bit sparse and missing a lot of information.”
that our summaries improve the state-of-the-art summaries.
When participants read both types of summary for a given • “Comment details the inner workings but provides no
method, the responses for Q4 , Q5 , and Q6 improved by a sig- big picture summary.”
nificant margin, pointing to an increase in useful contextual • “Only provides a detail for one of the possible branches.”
information in the documentation. Overall quality did not
decrease by a significant margin compared to when only our • “It seems the summary is generated only based on the
solutions were given, except in terms of unnecessary infor- last line of the method.”
mation added. Consider Figure 3(b): Accuracy and missing These comments occur more frequently with the state-of-
information scores showed similar improvement. While the the-art compared to our approach. A possible reason for
combined summaries did show a marked increase in unneces- this is our approach focuses much more on method inter-
sary information, we still find evidence to positively answer actions (e.g., method calls), and avoids the internal details
RQ7 : the information added to state-of-the-art summaries of the function. By contrast, the state-of-the-art approach
by our approach is orthogonal. This answer suggests that focuses on a method’s internal execution, selecting a small
our approach can be used, after future work to reduce un- subset of statements to use in the summary. Participants
necessary information, to improve existing documentation. felt this selection often leaves out error checking and alter-
nate branches, focusing too narrowly on particular internal
8. QUALITATIVE RESULTS operations while ignoring others.
Participants in the evaluation study had the opportunity Several of our generated summaries and the state-of-the-
to write an opinion about each summary (see Q8 in Table 2). art generated summaries had grammar issues that distracted
In this section, we explore these opinions for feedback on our users. Additionally, the state-of-the-art approach often se-
approach and directions for future work. lected lines of source code, but did not generate English
One of the results in our study was the significantly worse summaries for those lines. Several users commented on these
performance of Q3 in the combined comments, suggesting an issues, noting that it made the summaries either impossible
increase in the amount of unnecessary information. Several or difficult to understand. Our aim is to correct these issues
user comments from our survey note concerns of repetitious going forward with refinement of our NLG tool.
information, as well as difficulties in processing the longer Another common theme of participant comments in both
comments that result from the combination. our approach and the state-of-the-art centered on function
• “The description is too verbose and contains too many input parameters. Many participants felt an explanation of
details.” input parameters was lacking in both approaches, as well
as the combination approach. A selection of these com-
• “The summary contains too much information and con- ments follows. These comments were selected from our ap-
fuses the purpose of the method...” proach, the state-of-the-art, and the combined approach re-
spectively:
• “The summary seems accurate but too verbose.”
• “The input parameters publicID and systemID are not
• “Too much information, I cannot understand the com- defined – what are they exactly?”
ment.”
• “The summary could mention the input required is the
Another result is the increase in the scores for Q5 and path for the URL”
Q6 , which deal with how a programmer can use the method
within the system. This increase appears to be due to the • “... It would be better if the summary described the
Use Message. Several users noted a lack of any form of types of the inputs...”
usage message in the state-of-the-art approach. A selection 9. RELATED WORK
of these comments follows.
The related work closest to our approach is detailed in a
• “Nice and concise, but lacking information on uses...” recent thesis by Sridhara [45]. In Section 3, we summarized
certain elements of this work that inspired our approach.
• “The summary is clear. An example is expected.” Two aspects we did not discuss are as follows. First, one
• “The summary...does not tell me where the method is approach creates summaries of the “high level actions” in a
called or how it is used.” method [47]. A high level action is defined as a behavior
at a level of abstraction higher than the method. The ap-
Additionally, in a method summary from our approach proach works by identifying which statements in a method
that did not generate a Use Message, a participant noted implement that behavior, and summarizing only those state-
“I feel that an example should be provided.” However, one ments. A second approach summarizes the role of the pa-
participant in our study had a largely negative opinion of rameters to a method. This approach creates a description of
the Use Message. This participant repeatedly referred to key statements related to the parameter inside the method.
the “last sentence” (the Use Message) as “unnecessary”, even Our approach is different from both of these approaches in
stating “Assume every one of these boxes comments about that we create summaries from the context of the method –
removing the last line of the provided comment.” that is, where the method is invoked. We help programmers
Participants often felt the state-of-the-art approach lacked understand the role the method plays in the software.
critical information about the function. Comments indicat- There are a number of other approaches that create nat-
ing a lack of information appeared consistently from many ural language summaries of different software artifacts and
participants. The following comments (each from a different behaviors. Moreno et al. describe a summarization tech-
participant) support this criticism: nique for Java classes that match one of 13 “stereotypes.” [34]
The technique selects statements from the class based on summaries written by a state-of-the-art solution. We found
this stereotype, and then uses the approach by Sridhara [46] that our summaries were superior in quality and that our
to summarize those statements. Work by Buse et al. fo- generated summaries fill a key niche by providing contextual
cuses on Java exceptions [3]. Their technique is capable of information. That context is missing from the state-of-the-
identifying the conditions under which an exception will be art summaries. Moreover, we found that by combining our
thrown, and producing brief descriptions of those conditions. summaries with the state-of-the-art summaries, we can im-
Recent work by Zhang et al. performs a similar function by prove existing software documentation. Finally, the source
explaining failed tests [56]. That approach modifies a failed code for our tool’s implementation and evaluation data are
test by swapping different expressions into the test to find publically available for future researchers.
the failure conditions. Summary comments of those con-
ditions are added to the test. Another area of focus has 11. ACKNOWLEDGMENTS
been software changes. One approach is to improve change
The authors would like to thank Dr. Emily Hill for provid-
log messages [4]. Alternatively, work by Kim et al. infers
ing key assistance with the SWUM tool. We also thank and
change rules, as opposed to individual changes, that explain
acknowledge the Software Analysis and Compilation Lab
the software’s evolution [23]. The technique can summarize
at the University of Delaware for important help with the
the high-level differences between two versions of a program.
state-of-the-art summarization tool. Finally, we thank the
Another approach, developed by Panichella et al., uses ex-
12 participants who spent time and effort completing our
ternal communications between developers, such as bug re-
evaluation.
ports and e-mails, and structures them to produce source
code documentation [37].
The key difference between our approach and these ex- 12. REFERENCES
isting approaches is that we summarize the context of the [1] J. Aponte and A. Marcus. Improving traceability link
source code, such as how the code is called or the output recovery methods through software artifact
is used. Structural information has been summarized be- summarization. In Proceedings of the 6th International
fore, in particular by Murphy [36], in order to help program- Workshop on Traceability in Emerging Forms of
mers understand and evolve software. Murphy’s approach, Software Engineering, TEFSE ’11, pages 46–49, New
the software reflexion model, notes the connections between York, NY, USA, 2011. ACM.
low-level software artifacts in order to point out connections [2] H. Burden and R. Heldal. Natural language generation
between higher-level artifacts. There are techniques which from class diagrams. In Proceedings of the 8th
give programmers some contextual information by listing the International Workshop on Model-Driven Engineering,
important keywords from code. For example, Haiduc et al. Verification and Validation, MoDeVVa, pages 8:1–8:8,
use a Vector Space Model to rank keywords from the source New York, NY, USA, 2011. ACM.
code, and present those keywords to programmers [15]. The [3] R. P. Buse and W. R. Weimer. Automatic
approach is based on the idea that programmers read source documentation inference for exceptions. In Proceedings
code cursorily by reading these keywords, and use that in- of the 2008 international symposium on Software
formation to deduce the context behind the code. Follow- testing and analysis, ISSTA ’08, pages 273–282, New
up studies have supported the conclusions that keyword-list York, NY, USA, 2008. ACM.
summarization is useful to programmers [1], and that VSM [4] R. P. Buse and W. R. Weimer. Automatically
is an effective strategy for extracting these keywords [6, 9]. documenting program changes. In Proceedings of the
Tools such as Jadeite [51], Apatite [10], and Mica [50] IEEE/ACM international conference on Automated
are related to our approach in that they add API usage software engineering, ASE ’10, pages 33–42, New
information to documentation of those APIs. These tools York, NY, USA, 2010. ACM.
visualize the usage information as part of the interface for [5] W.-K. Chan, H. Cheng, and D. Lo. Searching
exploring or locating the documentation. We take a differ- connected api subgraph via text phrases. In
ent strategy by summarizing the information as natural lan- Proceedings of the ACM SIGSOFT 20th International
guage text. What is similar is that this work demonstrates Symposium on the Foundations of Software
a need for documentation to include the usage data, as con- Engineering, FSE ’12, pages 10:1–10:11, New York,
firmed by studies of programmers during software mainte- NY, USA, 2012. ACM.
nance [22, 28, 53]. [6] A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella,
and S. Panichella. Using ir methods for labeling source
10. CONCLUSION code artifacts: Is it worthwhile? In Program
We have presented a novel approach for automatically Comprehension (ICPC), 2012 IEEE 20th
generating summaries of Java methods. Our approach dif- International Conference on, pages 193–202, June
fers from previous approaches in that we summarize the con- 2012.
text surrounding a method, rather than details from the in- [7] S. C. B. de Souza, N. Anquetil, and K. M. de Oliveira.
ternals of the method. We use PageRank to locate the most- A study of the documentation essential to software
important methods in that context, and SWUM to gather maintenance. In Proceedings of the 23rd annual
relevant keywords describing the behavior of those methods. international conference on Design of communication:
Then, we designed a custom NLG system to create natural documenting & designing for pervasive information,
language text about this context. The output is a set of SIGDOC ’05, pages 68–75, New York, NY, USA, 2005.
English sentences describing why the method exists in the ACM.
program, and how to use the method. In a cross-validation [8] E. Duala-Ekoko and M. P. Robillard. Asking and
study, we compared the summaries from our approach to answering questions about unfamiliar apis: an
exploratory study. In Proceedings of the 2012 Conference on Software Engineering, ICSE ’03, pages
International Conference on Software Engineering, 14–24, Washington, DC, USA, 2003. IEEE Computer
ICSE 2012, pages 266–276, Piscataway, NJ, USA, Society.
2012. IEEE Press. [21] M. Kajko-Mattsson. A survey of documentation
[9] B. Eddy, J. Robinson, N. Kraft, and J. Carver. practice within corrective maintenance. Empirical
Evaluating source code summarization techniques: Softw. Engg., 10(1):31–55, Jan. 2005.
Replication and expansion. In Proceedings of the 21st [22] T. Karrer, J.-P. Krämer, J. Diehl, B. Hartmann, and
International Conference on Program Comprehension, J. Borchers. Stacksplorer: call graph navigation helps
ICPC ’13, 2013. increasing code maintenance efficiency. In Proceedings
[10] D. S. Eisenberg, J. Stylos, and B. A. Myers. Apatite: of the 24th annual ACM symposium on User interface
a new interface for exploring apis. In Proceedings of software and technology, UIST ’11, pages 217–224,
the SIGCHI Conference on Human Factors in New York, NY, USA, 2011. ACM.
Computing Systems, CHI ’10, pages 1331–1334, New [23] M. Kim, D. Notkin, D. Grossman, and G. Wilson.
York, NY, USA, 2010. ACM. Identifying and summarizing systematic code changes
[11] B. Fluri, M. Wursch, and H. C. Gall. Do code and via rule inference. IEEE Transactions on Software
comments co-evolve? on the relation between source Engineering, 39(1):45 –62, Jan. 2013.
code and comment changes. In Proceedings of the 14th [24] A. J. Ko, B. A. Myers, and H. H. Aung. Six learning
Working Conference on Reverse Engineering, WCRE barriers in end-user programming systems. In
’07, pages 70–79, Washington, DC, USA, 2007. IEEE Proceedings of the 2004 IEEE Symposium on Visual
Computer Society. Languages - Human Centric Computing, VLHCC ’04,
[12] A. Forward and T. C. Lethbridge. The relevance of pages 199–206, Washington, DC, USA, 2004. IEEE
software documentation, tools and technologies: a Computer Society.
survey. In Proceedings of the 2002 ACM symposium on [25] D. Kramer. Api documentation from source code
Document engineering, DocEng ’02, pages 26–33, New comments: a case study of javadoc. In Proceedings of
York, NY, USA, 2002. ACM. the 17th annual international conference on Computer
[13] A. Gatt and E. Reiter. Simplenlg: a realisation engine documentation, SIGDOC ’99, pages 147–153, New
for practical applications. In Proceedings of the 12th York, NY, USA, 1999. ACM.
European Workshop on Natural Language Generation, [26] J. Krinke. Effects of context on program slicing. J.
ENLG ’09, pages 90–93, Stroudsburg, PA, USA, 2009. Syst. Softw., 79(9):1249–1260, Sept. 2006.
Association for Computational Linguistics. [27] A. N. Langville and C. D. Meyer. Google’s PageRank
[14] E. Goldberg, N. Driedger, and R. Kittredge. Using and Beyond: The Science of Search Engine Rankings.
natural-language processing to produce weather Princeton University Press, Princeton, NJ, USA, 2006.
forecasts. IEEE Expert, 9(2):45–53, April 1994. [28] T. D. LaToza and B. A. Myers. Developers ask
[15] S. Haiduc, J. Aponte, L. Moreno, and A. Marcus. On reachability questions. In Proceedings of the 32nd
the use of automated text summarization techniques ACM/IEEE International Conference on Software
for summarizing source code. In Proceedings of the Engineering - Volume 1, ICSE ’10, pages 185–194,
2010 17th Working Conference on Reverse New York, NY, USA, 2010. ACM.
Engineering, WCRE ’10, pages 35–44, Washington, [29] D. Lawrie, C. Morrell, H. Feild, and D. Binkley.
DC, USA, 2010. IEEE Computer Society. What’s in a name? a study of identifiers. In In 14th
[16] E. Hill. Integrating Natural Language and Program International Conference on Program Comprehension,
Structure Information to Improve Software Search and pages 3–12. IEEE Computer Society, 2006.
Exploration. PhD thesis, Newark, DE, USA, 2010. [30] T. C. Lethbridge, J. Singer, and A. Forward. How
AAI3423409. software engineers use documentation: The state of
[17] E. Hill, L. Pollock, and K. Vijay-Shanker. the practice. IEEE Softw., 20(6):35–39, Nov. 2003.
Automatically capturing source code context of [31] S. Mani, R. Catherine, V. S. Sinha, and A. Dubey.
nl-queries for software maintenance and reuse. In Ausum: approach for unsupervised bug report
Proceedings of the 31st International Conference on summarization. In Proceedings of the ACM SIGSOFT
Software Engineering, ICSE ’09, pages 232–242, 20th International Symposium on the Foundations of
Washington, DC, USA, 2009. IEEE Computer Society. Software Engineering, FSE ’12, pages 11:1–11:11, New
[18] R. Holmes and G. C. Murphy. Using structural York, NY, USA, 2012. ACM.
context to recommend source code examples. In [32] C. D. Manning, P. Raghavan, and H. Schtze.
Proceedings of the 27th international conference on Introduction to Information Retrieval. Cambridge
Software engineering, ICSE ’05, pages 117–125, New University Press, New York, NY, USA, 2008.
York, NY, USA, 2005. ACM. [33] C. McMillan, M. Grechanik, D. Poshyvanyk, Q. Xie,
[19] W. M. Ibrahim, N. Bettenburg, B. Adams, and A. E. and C. Fu. Portfolio: finding relevant functions and
Hassan. Controversy corner: On the relationship their usage. In Proceedings of the 33rd International
between comment update practices and software bugs. Conference on Software Engineering, ICSE ’11, pages
J. Syst. Softw., 85(10):2293–2304, Oct. 2012. 111–120, New York, NY, USA, 2011. ACM.
[20] K. Inoue, R. Yokomori, H. Fujiwara, T. Yamamoto, [34] L. Moreno, J. Aponte, S. Giriprasad, A. Marcus,
M. Matsushita, and S. Kusumoto. Component rank: L. Pollock, and K. Vijay-Shanker. Automatic
relative significance rank for software component generation of natural language summaries for java
search. In Proceedings of the 25th International classes. In Proceedings of the 21st International
Conference on Program Comprehension, ICPC ’13, K. Vijay-Shanker. Towards automatically generating
2013. summary comments for java methods. In Proceedings
[35] D. T. Morse. Minsize2: A computer program for of the IEEE/ACM international conference on
determining effect size and minimum sample size for Automated software engineering, ASE ’10, pages
statistical significance for univariate, multivariate, and 43–52, New York, NY, USA, 2010. ACM.
nonparametric tests. Educational and Psychological [47] G. Sridhara, L. Pollock, and K. Vijay-Shanker.
Measurement, 59(3):518–531, June 1999. Automatically detecting and describing high level
[36] G. C. Murphy. Lightweight structural summarization actions within methods. In Proceedings of the 33rd
as an aid to software evolution. PhD thesis, University International Conference on Software Engineering,
of Washington, July 1996. ICSE ’11, pages 101–110, New York, NY, USA, 2011.
[37] S. Panichella, J. Aponte, M. Di Penta, A. Marcus, and ACM.
G. Canfora. Mining source code descriptions from [48] G. Sridhara, L. Pollock, and K. Vijay-Shanker.
developer communications. In Program Generating parameter comments and integrating with
Comprehension (ICPC), 2012 IEEE 20th method summaries. In Proceedings of the 2011 IEEE
International Conference on, pages 63–72, June 2012. 19th International Conference on Program
[38] D. Puppin and F. Silvestri. The social network of java Comprehension, ICPC ’11, pages 71–80, Washington,
classes. In Proceedings of the 2006 ACM symposium DC, USA, 2011. IEEE Computer Society.
on Applied computing, SAC ’06, pages 1409–1413, [49] D. Steidl, B. Hummel, and E. Juergens. Quality
New York, NY, USA, 2006. ACM. analysis of source code comments. In Proceedings of
[39] E. Reiter and R. Dale. Building natural language the 21st International Conference on Program
generation systems. Cambridge University Press, New Comprehension, ICPC ’13, 2013.
York, NY, USA, 2000. [50] J. Stylos and B. A. Myers. Mica: A web-search tool for
[40] T. Roehm, R. Tiarks, R. Koschke, and W. Maalej. finding api components and examples. In Proceedings
How do professional developers comprehend software? of the Visual Languages and Human-Centric
In Proceedings of the 2012 International Conference Computing, VLHCC ’06, pages 195–202, Washington,
on Software Engineering, ICSE 2012, pages 255–265, DC, USA, 2006. IEEE Computer Society.
Piscataway, NJ, USA, 2012. IEEE Press. [51] J. Stylos, B. A. Myers, and Z. Yang. Jadeite:
[41] L. Shi, H. Zhong, T. Xie, and M. Li. An empirical improving api documentation using usage information.
study on evolution of api documentation. In In CHI ’09 Extended Abstracts on Human Factors in
Proceedings of the 14th international conference on Computing Systems, CHI EA ’09, pages 4429–4434,
Fundamental approaches to software engineering: part New York, NY, USA, 2009. ACM.
of the joint European conferences on theory and [52] A. A. Takang, P. A. Grubb, and R. D. Macredie. The
practice of software, FASE’11/ETAPS’11, pages Effects of Comments and Identifier Names on Program
416–431, Berlin, Heidelberg, 2011. Springer-Verlag. Comprehensibility: An Experimental Study. Journal
[42] J. Sillito, G. C. Murphy, and K. De Volder. Asking of Programming Languages, 4(3):143–167, 1996.
and answering questions during a programming [53] Y. Tao, Y. Dang, T. Xie, D. Zhang, and S. Kim. How
change task. IEEE Trans. Softw. Eng., 34(4):434–451, do software engineers understand code changes?: an
July 2008. exploratory study in industry. In Proceedings of the
[43] S. E. Sim, C. L. A. Clarke, and R. C. Holt. Archetypal ACM SIGSOFT 20th International Symposium on the
source code searches: A survey of software developers Foundations of Software Engineering, FSE ’12, pages
and maintainers. In Proceedings of the 6th 51:1–51:11, New York, NY, USA, 2012. ACM.
International Workshop on Program Comprehension, [54] D. van Heesch. Doxygen website, 2013.
IWPC ’98, pages 180–, Washington, DC, USA, 1998. [55] A. T. T. Ying and M. P. Robillard. Code fragment
IEEE Computer Society. summarization. In Proceedings of the 2013 9th Joint
[44] M. D. Smucker, J. Allan, and B. Carterette. A Meeting on Foundations of Software Engineering,
comparison of statistical significance tests for ESEC/FSE 2013, pages 655–658, New York, NY,
information retrieval evaluation. In CIKM, pages USA, 2013. ACM.
623–632, 2007. [56] S. Zhang, C. Zhang, and M. D. Ernst. Automated
[45] G. Sridhara. Automatic Generation of Descriptive documentation inference to explain failed tests. In
Summary Comments for Methods in Object-oriented Proceedings of the 2011 26th IEEE/ACM
Programs. PhD thesis, University of Delaware, Jan. International Conference on Automated Software
2012. Engineering, ASE ’11, pages 63–72, Washington, DC,
[46] G. Sridhara, E. Hill, D. Muppaneni, L. Pollock, and USA, 2011. IEEE Computer Society.

You might also like