Jss 18
Abstract
In large and active software projects, it becomes impractical for a developer
to stay aware of all project activity. While it might not be necessary to know
about each commit or issue, it is arguably important to know about the ones
that are unusual. To investigate this hypothesis, we identified unusual events
in 200 GitHub projects using a comprehensive list of ways in which an artifact
can be unusual and asked 140 developers responsible for or affected by these
events to comment on the usefulness of the corresponding information. Based
on 2,096 answers, we identify the subset of unusual events that developers
consider particularly useful, including large code modifications and unusual
amounts of reviewing activity, along with qualitative evidence on the reasons
behind these answers. Our findings provide a means for reducing the amount
of information that developers need to parse in order to stay up to date with
development activity in their projects.
Keywords: awareness, unusual events, GitHub
1. Introduction
As part of their work, software developers create, modify, and delete many
artifacts on any given day. While some of these artifacts follow regular pat-
terns (e.g., an issue is closed by a new commit addressing the issue, or a
pull request is merged quickly after a few code review comments), others are
unusual: A difficult issue might take a particularly long time to address, a
why some types of unusual events are useful to know about and others are
not. We define an unusual event as an artifact that is unusual in at least
one way (e.g., a commit with an unusually large number of files added), and
an unusual event type as one way in which an artifact could be considered
unusual (e.g., unusually large number of files added in a commit). One
artifact could be unusual according to more than one unusual event type
at any point in time. In this work, we consider commits, issues, and pull
requests as artifacts, since they are the main artifacts on GitHub capturing
developer activity.
To achieve our research goal of identifying the set of unusual event types
that developers consider useful to be kept aware of, we presented 140 de-
velopers from 200 randomly sampled GitHub projects with a list of unusual
events we had detected in their projects and asked them to rate the useful-
ness of the corresponding information. Based on a total of 2,096 ratings of
different unusual events by the developers that were directly responsible for
and/or affected by these unusual events and their reasoning, we compiled a
list of types of unusual events that developers want to be kept aware of.
In particular, we investigated the following research questions:
RQ2 Which types of unusual events do developers find most useful and why?
• a list of types of unusual events that developers want to be kept aware
of, based on empirical evidence,
• the reasons for including and excluding specific unusual event types
from this list,
• data from 200 randomly sampled GitHub projects about the frequency
of unusual events and their types, and
2. Motivating Examples
RxSwift1 is a GitHub project that ports ReactiveX, an API for asyn-
chronous programming with observable streams, to Swift. When we down-
loaded its data, the repository contained 1,605 commits, 352 issues, and 443
pull requests. A typical issue on RxSwift is closed after being open for less
than 5 days (median: 4.65 days, first quartile: 21.74 hours, third quartile:
16.20 days). Considering these numbers, issue #206 is unusual: more than
10 weeks passed between the moment it was opened and the moment it was
closed. When we pointed this out to one of RxSwift’s contributors, they
stated: “I think the info is really useful actually, having a long standing
issue could [...] be an indicator of a difficult issue”.
Another project we analyzed for this work is LaTeXML,2 a converter from
LaTeX to XML, HTML, and other formats. The corresponding repository
contained 4,520 commits, 675 issues, and 119 pull requests when we downloaded
1 https://github.com/ReactiveX/RxSwift
2 https://github.com/brucemiller/LaTeXML
its data. Out of the 675 issues, 21 were labeled with wontfix. These issues
usually did not attract much discussion: the median number of comments for
these 21 issues was 2, with the first quartile at 1 and the third quartile at 3.5.
Issue #724 is unusual in this regard with 13 comments. When we asked one
of LaTeXML’s contributors about this unusual event, they responded: “In
this case it indicates an interesting discussion that spans beyond the concrete
issue”.
Finally, the Elixir repository3 on GitHub hosts a dynamic, functional
language for building scalable and maintainable applications, with 11,548
commits, 2,402 issues, and 2,696 pull requests at the time of our data down-
load. Issue #3413 is unusual in terms of time between open and closed with
a duration of almost 11 months, considering all issues in this project assigned
to GitHub user josevalim. This user typically closes issues in less than 7 days
(median: 6.94 days, first quartile: 21.26 hours, third quartile: 36.21 days).
Given these numbers, one of his colleagues commented: “This information
is useful. Knowing José [...] closes issues quickly makes it appear that this
was a difficult problem”.
The goal of our work is to provide developers with useful insights such as
the ones illustrated in these examples through a systematic investigation of
different types of unusual events and their perceived usefulness.
3 https://github.com/elixir-lang/elixir
Table 1: Descriptive statistics of the 200 GitHub projects
4 We performed the random selection by randomly selecting GitHub project IDs between
1 and 70,000,000 and testing whether the corresponding projects fulfilled our sampling
criteria.
5 http://tinyurl.com/unusual-events-github
Table 2: Number of unusual commits. The column called “project” shows how many
unusual events we identified using the entire project as context, whereas the remaining
columns show the number of unusual events created by context-specific types.
Table 3: Number of unusual commits, when data is grouped by files and filetypes.
Table 4: Number of unusual issues. The column called “project” shows how many unusual
events we identified using the entire project as context, whereas the remaining columns
show the number of unusual events created by context-specific types.
Table 5: Number of unusual pull requests. The column called “project” shows how many
unusual events we identified using the entire project as context, whereas the remaining
columns show the number of unusual events created by context-specific types.
preliminary work on unusual events in SVN repositories [7], our past work
on awareness [1], productivity metrics [5, 8], other related work [9, 10, 11],
and the data available through the GitHub API [12]. It is important to
note that the goal of this comprehensive list of unusual event types was not
to identify types that we as researchers considered particularly useful, but
rather to follow a systematic and inclusive approach that would allow us to
ask developers about the usefulness of the different types of unusual events
while reducing the bias that we were bringing to this project. The next
section lists all unusual event types we considered in this work.
In order not to introduce bias due to different formulas being used for
different types of unusual events, we used the same definition of what we
consider as unusual for all types: the extreme outlier definition, also used by
Alali et al. [9], according to which x is an extreme outlier if x < Q1 − 3 · IQR
or x > Q3 + 3 · IQR, where IQR denotes the inter-quartile range between the
first quartile Q1 and the third quartile Q3 of the underlying distribution. In
other words, a value in a distribution is considered as an outlier if it is either
three inter-quartile ranges above the third quartile or three inter-quartile
ranges below the first quartile of that distribution.
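This rule can be sketched in a few lines of Python. This is an illustrative sketch only; `statistics.quantiles` defaults to the exclusive quartile method, which may differ slightly from the quartile computation used in our analysis.

```python
from statistics import quantiles

def extreme_outliers(values, k=3.0):
    """Return the values lying more than k inter-quartile ranges
    outside [Q1, Q3]; k = 3.0 matches the extreme-outlier rule."""
    q1, _, q3 = quantiles(values, n=4)  # first and third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# e.g., 20 moderate metric values plus one extreme value
print(extreme_outliers(list(range(1, 21)) + [1000]))  # [1000]
```

Applied to, say, the distribution of commit message lengths in a project, the returned values correspond to the unusual commits for that event type.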
An important dimension of unusual events is context since what is unusual
depends on many factors, including team size, work dynamics, software pro-
cess, development cycle, domain, product size, criticality, and development
model [7]. To account for that, some of the types of unusual events that
we defined use the complete set of artifacts (i.e., commits, issues, or pull re-
quests) in a project to compute the corresponding distributions while others
are context-specific. For example, the types of unusual events related to com-
mit message length take all commits in a project into account and consider
a commit as unusual if that commit’s message length is extremely short or
extremely long, according to the definition of extreme outliers given in the
previous paragraph. On the other hand, the unusual event types related to
commit message length for a particular committer look at the commits of
each committer in a project separately, and consider a commit as unusual
if that commit’s message length is extremely short or extremely long given
the set of commits authored by the particular committer. Since what is
unusual depends on the particular project and development team, none of our
unusual event types take data from more than one project into account, i.e.,
all computations are project-specific.
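A context-specific type can be sketched by applying the same extreme-outlier rule within each group of artifacts. The snippet below flags commits whose message length is extreme relative to the same committer's other commits; the (committer, message) input shape is a simplifying assumption for illustration, not the format of our data set.

```python
from collections import defaultdict
from statistics import quantiles

def unusual_for_committer(commits, k=3.0):
    """commits: iterable of (committer, message) pairs (assumed shape).
    Flags commits whose message length is an extreme outlier within
    the set of commits authored by the same committer."""
    groups = defaultdict(list)
    for committer, message in commits:
        groups[committer].append(message)
    flagged = []
    for committer, messages in groups.items():
        if len(messages) < 4:  # quartiles are not meaningful for tiny samples
            continue
        lengths = [len(m) for m in messages]
        q1, _, q3 = quantiles(lengths, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        flagged += [(committer, m) for m in messages
                    if not low <= len(m) <= high]
    return flagged
```

Project-wide types follow the same pattern with a single group containing all artifacts.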
4. Unusual event types and their frequency
In this section, we present the unusual event types considered in this
work along with empirical data on how frequently each type occurred in our
sample of 200 GitHub projects.
such as committer / merge? to be able to distinguish between merge com-
mits and regular commits of different developers. In addition, we consider
the files that a commit touched along with their filetype. These pieces of
contextual information were again inspired by our previous work [5] which
identified “time between commits to a particular file” as a potentially useful
piece of information. As Table 3 shows, we only considered file-level context
for some of the commit-related types of unusual events since the remain-
ing ones would not be sensible (e.g., combinations such as “number of files
deleted for a particular file”).
4.4. Overlap between types of unusual events
As these tables show, we found instances of all types of unusual events in
the 200 GitHub projects. In general, file-specific types, such as the number
of days between commits to a particular file, account for a large number of
unusual events (26% of all commits), whereas types related to title length of
issues or pull requests have a much lower yield (at most 1% of all issues or
pull requests). Artifacts can be unusual in more ways than one. While less
than half of the issues (29.81%) and pull requests (46.22%) in our data are
unusual according to our list of types of unusual events, 58.81% of all commits
in our data are unusual in at least one way, with a maximum of 1,316 types
per commit. Note that this number includes many context-specific types of
unusual events, such as a different type for each file, filetype, or label.
A tool that detects more than half of all artifacts as unusual is arguably
not useful. In the next section, we describe the research method we followed
to narrow down the initial comprehensive list of types of unusual events to
the much smaller subset that developers consider useful.
5. Research Method
In this section, we present our research questions and the methods used
for data collection.
unusual events in a systematic way. If unusual events are perceived differ-
ently from artifacts that are not detected as unusual by any of the unusual
event types, we can argue that being aware of such unusual events might be
useful.
The investigation of why developers want to be aware of unusual events
is the goal of our second research question:
RQ2 Which types of unusual events do developers find most useful and why?
This research question aims at filtering the list of types of unusual events
down to those that developers consider to be most useful. In addition to
identifying this subset, we are interested in the reasons why certain unusual
event types are considered useful or not useful, respectively.
6 We downloaded the relevant data from GitHub on July 18th, 2016.
Table 6: Survey excerpt—note that there were up to 12 instances of Questions 5 and 6 in
each survey and that Question 6 only appeared after participants had answered Question 5
• 3 artifacts (one commit, one issue, and one pull request) that they had
authored that were not unusual,
• 3 artifacts (one commit, one issue, and one pull request) that they had
authored that were unusual,
• 3 artifacts (one commit, one issue, and one pull request) that somebody
else on their project had authored that were not unusual, and
• 3 artifacts (one commit, one issue, and one pull request) that somebody
else on their project had authored that were unusual.
Table 7: Demographics about the 140 survey participants
7 https://github.com/BristolTopGroup/AnalysisSoftware
development (median: 10 years—3 participants did not answer this question)
and involved in more than one project (median: 4—12 participants did not
answer this question, see Table 7). Our participants held different jobs in
industry and academia. The majority (66 participants) called themselves
“software developer” or “software engineer”. Our sample also contained con-
sultants (3), technical leads (7), directors (3), managers (5), students (11),
CEOs or founders (5), and researchers (7).
6. Findings
In this section, we present our findings along with the details on data
analysis.
Figure 1: Odds of artifacts being perceived as difficult and atypical (log scale). If the 95%
confidence interval does not contain the value 1.0, the association is statistically significant.
[Figure not reproduced; panels range from (01) unusual vs. difficult to (07) unusual vs.
atypical, with odds ratios plotted from 0.5 to 10.0.]
owned by developers other than the participant (3.27, 95% CI [2.10, 5.09]),
commits (3.31, 95% CI [1.42, 7.72]), and pull requests (4.97, 95% CI [2.61,
9.48]) are even higher than that. These results are a first indication that
one of the use cases of our approach is the detection of difficult artifacts, in
particular pull requests.
In contrast, the odds ratios for perceptions of typicality (06–12 in Fig-
ure 1) are much lower. None of the odds ratios is higher than 2, and in most
cases, the 95% confidence interval includes values below 1.0, rendering the
results inconclusive.
To investigate our research question RQ1.2 about developers’ perceptions
of artifacts affected by particular unusual event types, we calculated the odds
ratios for all project-wide unusual event types (second columns in Tables 2
through 5).8 Figure 2 shows the results. The top row shows the odds that
an artifact is perceived as being difficult if it is affected by an unusual event
type compared to the odds of being perceived as difficult if it is not affected
by that unusual event type. The bottom row has the corresponding data for
artifacts being perceived as atypical. The figure shows unusual event types
related to commits, issues, and pull requests from left to right, and we only
compare commits to other commits, issues to other issues, and pull requests
to other pull requests.
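For reference, an odds ratio and its Wald 95% confidence interval can be computed from a 2×2 contingency table as follows. This is a standard textbook sketch; the cell counts in the usage example are invented for illustration and are not values from our data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """2x2 table: a = unusual & perceived difficult, b = unusual & not,
    c = not unusual & difficult, d = not unusual & not difficult."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# invented counts for illustration
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 4.0 1.37 11.7
```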
With one exception each, all types of unusual events for commits and
pull requests present an odds ratio for difficulty greater than 1.0, even if
we take the 95% confidence interval into account. The highest odds ratio is
presented by the number of comments on pull requests: the odds of a pull
request that is unusual according to this particular type being perceived as
difficult are 11.29 times higher than the odds of other pull requests (95%
CI [5.22, 24.45]). Interestingly, this does not necessarily imply that the pull
request is perceived as atypical: As the bottom row of Figure 2 shows, many
of the odds ratios for perception of typicality are close to 1.0, with the 95%
confidence interval containing values on either side of 1.0.
Other noteworthy findings include that the number of labels on pull re-
quests does not appear to give information about difficulty (odds ratio: 1.42,
95% CI [0.39, 5.19]), whereas large pull requests in terms of number of lines of
code added are perceived as atypical (odds ratio: 6.18, 95% CI [2.59, 14.77]).
8 For context-specific types of unusual events, we did not have enough data to calculate
reasonable confidence intervals.
Figure 2: Odds of artifacts affected by different unusual event types being perceived as
difficult and atypical (log scale). The top row shows the odds ratios for an artifact to be
considered difficult and the bottom row shows the odds ratios for an artifact to be perceived
as atypical. If the 95% confidence interval does not contain the value 1.0, the association
is statistically significant at an alpha level of 0.05. From left to right: commits, issues,
and pull requests. [Figure not reproduced; panel labels include “commits: message length”
and “issues: body length”.]
Files being renamed is not a common event, which explains our lack of data
and the corresponding large confidence interval for this type of unusual event.
A long time between commits is not perceived as atypical (odds ratio: 1.04,
95% CI [0.57, 1.88]).
Table 8: The most useful types of unusual events
voiced the opposite(2): “Comments are not really a good indicator of much
in and of themselves”.
The response time to pull requests and issues is also an important
metric(9): “The responsiveness of the repo owner is valuable when evaluating
using/contributing to an open source project”. A long-standing pull request
or issue could also indicate low priority(5): “It’s useful to see we have a long
outstanding documentation issue, as they tend to be neglected unless we are
reminded accordingly” or difficulty(1): “The open-closed time usually means
low priority, improper issue statement or something very difficult to fix”.
Seeing which issues have a long time-to-close can be useful(9): “Yes, seeing
which issues have been open the longest can be vital”, but is less useful after
the fact(1): “It would be good to know which issues have been outstanding for
a long time but once the issue has been addressed/closed, this information
becomes irrelevant”.
Another important metric is churn(3): “It is useful because it can specify
the amount the code that is modified, and might even help spot the problem
if anything goes wrong in the future”. In particular, knowing about unusual
deletions can be useful(3): “LOC deleted relative to owner can be a good mea-
sure of difficulty of the pull request”, although one participant disagreed(1):
“Core developer deleting one file isn’t unusual”.
Knowing about gaps in the commit activity can also be useful(4): “Days
between commits for filetypes would be good to determine what kinds of things
are being worked on (code vs. assets vs. build scripts)”, similar to the number
of commits on a pull request(1): “Detect unusual PRs to have more people
check it”. Types of unusual events related to long commit messages or issue
and pull request bodies generally received low ratings(9): “Commit comment
length isn’t a useful metric unless its unusually short”, although some partic-
ipants saw value(4): “A longer message can indicate the why for the change
is not immediately obvious”.
In some cases, the insights extracted by our analysis helped developers
reflect on their projects: “I realized that code review comments rarely hap-
pened” and could possibly be used to gamify some aspects of the development
process(5): “Maybe the typical number of commits a user usually has would
be useful for encouraging developers to commit more often”. This theme is
echoed in related work [15].
Missing context was prevalent among the main criticisms of some types
of unusual events(5): “I need to see them in context. Adding 100 lines of
documentation probably doesn’t need as much attention as adding 100 new
functions”. Commit-related types were often affected by changes to the doc-
umentation or formatting(10): “It is a change in text. Not code. So there
are no changes”, and some commits were generated automatically(4): “It’s
a commit made by an automated system. Not interested in getting statistics
for these events”. Gaps in commit activity could be explained by exter-
nal reasons(8): “Open source projects are side-time for most of us”. More
advanced code metrics could be useful to address some of these issues(3):
“Displaying a raw complexity score may be useful”. Most of these limitations
can be traced back to our decision to keep the types of unusual events in-
dependent of a particular programming language in order to be applicable
to all GitHub repositories. Future work will have to investigate this tradeoff
further.
In other cases, the information provided by the types of unusual events
was too fine-grained(7): “I think splitting LOC metrics by file and label is
probably too granular”, and unusual events based on labels were generally not
seen as useful(4): “Different projects use GitHub labels for different purposes.
Some of them are not using label at all. It’s not significant”.
For types of unusual events outside of the six that we identified as being
the most useful, the information was often seen as not useful in a practi-
cal sense(26): “It’s interesting but not useful as a contributor”. In addition,
many participants used our survey to explain the unusual events rather than
indicate whether the information was useful(78): “It was just a small improve-
ment to run samples”, or they only answered whether the information was
useful without stating why(66): “Useful”, “Not useful”.
7. Discussion
In this section, we discuss our findings, in particular related to the de-
velopers’ perceptions of unusual events, the events’ verifiability, implications
for a user interface for displaying unusual events, and actions that developers
can take based on unusual events.
that awareness of such unusual events is considered useful. While developers
might be aware of difficult unusual events among their own artifacts, knowing
about difficult work of somebody else on the team can help prevent potential
problems early on, can encourage discussion where it is needed, and can give
important pointers to events in a project’s history to be reviewed when trying
to understand a project. Interestingly, our participants rated unusual events
among their own artifacts as less difficult compared to unusual events among
the artifacts of their team members.
The relationship between the unusual event information and whether an
artifact is considered atypical is much weaker, suggesting that there is a differ-
ence between observed typicality (via metrics) and perceptions of typicality,
and that developers do not view difficult tasks as atypical.
7.2. Verifiability
Our findings provide evidence that developers value simple and easily
understandable metrics over complex ones. With one exception, the unusual
event types that were rated as being most useful are based on project-wide
metrics that can easily be verified by looking at the raw data. While our
initial set of unusual event types contained more complex and context-specific
unusual event types, we found that developers generally did not find these
as useful as unusual event types based on project-wide data. In addition,
our findings indicate that awareness tools based on commit or source code
activity alone are not sufficient to communicate all the information developers
care about in a project: half of the six most useful unusual event types are
related to issues and pull requests.
We note that while the raw numbers can easily be verified, simply looking
at the metrics of a given commit, issue, or pull request will not tell developers
whether this artifact is unusual. To acquire this information, developers
have to download all data, calculate the metric values for each artifact, and
investigate the distributions of the different values—just as we did in this
work. In other words, while the raw numbers are verifiable, deciding what is
unusual requires a considerable amount of work that developers are unlikely
to undertake.
A particularly promising approach would be the integration into commu-
nication tools through bots [16]. For example, the current integration of
GitHub events into the cloud-based team collaboration tool Slack creates a
notification for each action taken on GitHub. Arguably, it would be more
useful to generate these notifications only in cases where something unusual
happened that deserves attention. Our empirical investigation of different
unusual event types and their perceived usefulness provides the empirical
foundation for building such tool support.
7.4. Implications
Tool support that surfaces unusual events is useful for software devel-
opers and their managers. Unusual events have the potential to significantly
reduce the amount of information that developers and managers need to parse
to stay on top of everything that is going on in their projects. We found that
only 15% of all commits, 8% of all issues, and 4% of all pull requests in our
data set were classified as unusual. Having to only look at this small subset
of information will save developers’ time and make it less likely for them
to miss important events that require their attention. Notifications about
unusual events can trigger a wide range of actions by software developers
or managers. In our previous preliminary work on unusual events in SVN
repositories [7], we found that notifications about such events could serve as
a discussion starter or a meeting agenda. As one developer explained: “It
would be useful to be aware of unusual events from other developers. [...] If
I notice a strange modification or many modifications I can promptly talk to
the developer about it.” Unusual events can also play a significant role for
managers who are in charge of monitoring project progress: “As a manager,
it is a way to look closer to what newcomers are doing. [...] The informa-
tion would be useful in the meetings, since I could question and talk to them
about their tasks without being too passive and waiting for them to tell me
something” [7].
Based on these preliminary findings, in this work, we have conducted
a systematic exploration of different kinds of unusual events that can be
detected for GitHub projects, and we have identified the subset of unusual
events that developers find particularly useful. We have uncovered additional
use cases for unusual events, for example detailed in Section 2: An issue with
an unusually long time between when it was opened and when it was closed
can point to difficulties that might require support from other developers,
and an issue with an unusually large number of comments can indicate a
discussion that other developers should be aware of. While many artifacts in
a GitHub repository can be considered as unusual according to some metric,
the goal of our work was to identify those types of unusual events that de-
velopers find useful. These findings are also important for researchers who
are interested in developer awareness in general or GitHub repositories in
particular since they uncover a new category of events that developers care
about: unusual events related to commits, issues, and pull requests.
8. Limitations
In terms of construct validity (i.e., the degree to which a test measures
what it claims, or purports, to be measuring), while our initial list of types
of unusual events was designed to be comprehensive and based on related
work, there could be other important unusual event types that we did not
ask our participants about. Our definition of unusual, although based on
related work [9], is only one possible way of detecting unusual values in a
distribution. Other approaches may have led to different results, and we will
continue our empirical investigation into the impact of different definitions
on the results. However, our results provide a first systematic and empirical
exploration of the idea of unusual events, and without empirical evidence,
we cannot determine to what extent other approaches would have resulted
in different outcomes.
In terms of internal validity (i.e., the extent to which a causal conclusion
based on a study is warranted), even the most useful unusual event types that
we defined still received negative ratings. As our qualitative data shows, it is
unrealistic to assume that all developers agree on wanting awareness of the
same information. Our qualitative analysis may have introduced bias and
error into our interpretation of the developer responses. We mitigated this
threat by having two of the authors do the coding.
In terms of external validity (i.e., the extent to which the results of a study
can be generalized to other situations and to other people), we cannot gen-
eralize our findings to development platforms other than GitHub. However,
GitHub now hosts more than 19.4 million active repositories,9 making it a
good starting point for this research. To distribute our survey, we contacted
all developers that had contributed at least one unusual commit to one of
9 https://octoverse.github.com/
the 200 projects in our sample within the last six months. However, the 140
individuals who contributed to this study were self-selected volunteers within
this sample. The general population on GitHub might have different charac-
teristics and opinions. Thus, we cannot claim that our results generalize to
all GitHub users or to the entire population of developers.
9. Related Work
Existing work on detecting unusual events in software repositories has
mostly focused on detecting specific unusual events, often focusing on bug
detection and prevention. Crystal [17], for example, can detect if a developer
has not committed for a long time, and if a developer has made changes
that conflict with other developers’ changes, break the build, or make a test
fail. WeCode [4] identifies the outcomes of merging all the developers’ code
at once. We take a broader approach by enumerating a large number of
unusual events that can happen in GitHub repositories, and by collecting
empirical data about their usefulness.
There is also a substantial body of work on the detection of buggy com-
mits. Kim et al. [18] employed machine learning to determine whether a
new software change is more similar to prior buggy changes or prior clean
changes. Eyolfson et al. [19] found commits submitted between midnight and
4am to be significantly more bug-prone than those submitted at other times,
and daily-committing developers to produce less buggy commits. The focus
of our work is not on bug detection, but rather on making developers aware
of unusual events in their repositories.
The detection of unusual events can be supported by visualizations of the
software process [20], the change history [21], or an individual commit [22].
While some of these allow for the identification of unusual events, they are
not as comprehensive as our unusual event types and do not include unusual
events on issues or pull requests.
Awareness tools for software developers have historically focused on
awareness at source code level. For example, Seesoft [23] maps each line
of source code to a thin row and uses colours to indicate changes. Au-
gur [24] extends the idea behind Seesoft by adding software development
activities to a Seesoft-style visualization, allowing developers to explore rela-
tionships between artifacts and activities. Palantı́r [25, 26] provides insight
into workspaces of other developers, focusing on artifact changes. Need-
Feed [27] models code relevance and highlights changes that a developer may
need to review. Relevant changes are determined using models that incorporate
data mined from a project’s software repository. With FASTDash [28], a
developer can determine which team members have source files checked out,
which files are being viewed, and what methods and classes are currently
being changed. Going beyond source code, WIPDash [29] was designed to
increase awareness of work items and code activity. Similarly, the dashboard
component of IBM’s Jazz [1] is intended to provide information at a glance
and to allow for easy navigation to more complete information. Our work was
inspired by the dashboard component in Jazz, but using unusual events as
content rather than high-level summaries of artifact counts over time. With
its inherent transparency [30], GitHub affords group awareness in distributed
software development [31, 32], and external websites have started to aggre-
gate data from GitHub [33]. Our work adds to this body of knowledge by
exploring the concept of unusual event awareness on GitHub.
Acknowledgments
We thank all developers who participated in our survey.
References
[1] C. Treude, M.-A. Storey, Awareness 2.0: Staying aware of projects,
developers and tasks using dashboards and feeds, in: Proceedings of
the 32nd International Conference on Software Engineering - Volume 1,
2010, pp. 365–374.
[8] J. Lima, C. Treude, F. Figueira Filho, U. Kulesza, Assessing developer
contribution with repository mining-based metrics, in: Proceedings of
the International Conference on Software Maintenance and Evolution,
2015, pp. 536–540.
ings of the 19th Conference on Computer Supported Cooperative Work
and Social Computing Companion, 2016, pp. 333–336.
[17] Y. Brun, R. Holmes, M. Ernst, D. Notkin, Early detection of collabo-
ration conflicts and risks, IEEE Transactions on Software Engineering
39 (10) (2013) 1358–1375.
[18] S. Kim, E. J. Whitehead, Jr., Y. Zhang, Classifying software changes:
Clean or buggy?, IEEE Transactions on Software Engineering 34 (2)
(2008) 181–196.
[19] J. Eyolfson, L. Tan, P. Lam, Do time of day and developer experience af-
fect commit bugginess?, in: Proceedings of the 8th Working Conference
on Mining Software Repositories, 2011, pp. 153–162.
[20] A. Hindle, M. W. Godfrey, R. C. Holt, Software process recovery using
recovered unified process views, in: Proceedings of the International
Conference on Software Maintenance, 2010, pp. 1–10.
[21] F. Van Rysselberghe, S. Demeyer, Studying software evolution infor-
mation by visualizing the change history, in: Proceedings of the 20th
International Conference on Software Maintenance, 2004, pp. 328–337.
[22] M. D’Ambros, M. Lanza, R. Robbes, Commit 2.0, in: Proceedings of
the 1st Workshop on Web 2.0 for Software Engineering, 2010, pp. 14–19.
[23] S. G. Eick, J. L. Steffen, E. E. Sumner, Jr., Seesoft-a tool for visual-
izing line oriented software statistics, IEEE Transactions on Software
Engineering 18 (11) (1992) 957–968.
[24] J. Froehlich, P. Dourish, Unifying artifacts and activities in a visual tool
for distributed software development teams, in: Proceedings of the 26th
International Conference on Software Engineering, 2004, pp. 387–396.
[25] A. Sarma, Z. Noroozi, A. van der Hoek, Palantı́r: Raising awareness
among configuration management workspaces, in: Proceedings of the
25th International Conference on Software Engineering, 2003, pp. 444–
454.
[26] A. Sarma, A. van der Hoek, Towards awareness in the large, in: Proceed-
ings of the International Conference on Global Software Engineering,
2006, pp. 127–131.
[27] R. Padhye, S. Mani, V. S. Sinha, Needfeed: Taming change notifications
by modeling code relevance, in: Proceedings of the 29th International
Conference on Automated Software Engineering, 2014, pp. 665–676.