Empirical Software Engineering at Microsoft Research

http://research.microsoft.com/en-us/projects/esm/
Christian Bird Brendan Murphy Nachiappan Nagappan Thomas Zimmermann
Microsoft Research, Redmond, USA and Cambridge, UK
(Authors are in alphabetical order.)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CSCW 2011, March 19–23, 2011, Hangzhou, China.
Copyright 2011 ACM 978-1-4503-0556-3/11/3...$10.00.

ABSTRACT
We describe the activities of the Empirical Software Engineering (ESE) group at Microsoft Research. We highlight our research themes and activities using examples from our research on socio technical congruence, bug reporting and triaging, and data-driven software engineering to illustrate our relationship to the CSCW community. We highlight our unique ability to leverage industrial data and developers and the ability to make near term impact on Microsoft via the results of our studies. We also present the collaborations our group has with academic researchers.

Author Keywords
Software engineering, Socio technical congruence, bug tracking and triaging, data-driven software engineering

ACM Classification Keywords
D.2.5 [Software Engineering]: Testing and Debugging; D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement

ACM General Terms
Human Factors, Management, Measurement

INTRODUCTION
The Empirical Software Engineering (ESE) group at Microsoft Research focuses on working in the intersection of the Software Engineering and CSCW communities.

“Over the last decade, it has become clear that empirical studies are a fundamental component of software engineering research and practice: Software development practices and technologies must be investigated by empirical means in order to be understood, evaluated, and deployed in proper contexts. This stems from the observation that higher software quality and productivity have more chances to be achieved if well-understood, tested practices and technologies are introduced in software development. Empirical studies usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies.”
(Empirical Software Engineering journal, http://www.springer.com/computer/swe/journal/10664)

At a high level, the goals of the ESE group follow two guiding principles:

• Empower software development teams
• Gain insight from product, process, people, and customers

by employing a qualitative and quantitative approach to the software development process.

In this paper we discuss three broad themes of the ESE group:

• Socio technical congruence;
• Bug reporting and triaging; and
• Data-driven software engineering.

In each of these sections, our studies draw on techniques and methods from both the Software Engineering and CSCW communities, combining empirical case studies in practice with CSCW aspects, since all software systems built by teams inherently have a significant collaborative aspect. We also present our collaborations and discuss the uniqueness of our fit in the middle of these two communities.

SOCIO TECHNICAL CONGRUENCE
“Design and programming are human activities; forget that and all is lost”
– Bjarne Stroustrup

As software projects grow in size and complexity, so do the teams of engineers that develop and maintain them.

Brooks, in his seminal work, “The Mythical Man Month” [1], discussed coordination as one of the key problems of running a software project with many developers. The coordination effort required to help each member of a team stay in sync and keep a project on schedule is enormous.

Socio Technical Congruence is a term that has emerged recently in the software engineering literature to describe the relationship between the “social” side of development, meaning the developers, their relationships to each other, how they communicate, work together on software, etc., and the “technical” side, which encapsulates features of the software itself such as dependencies between components, component complexity, and software quality. The idea behind the term has its origins in Conway’s Law, originally presented in Conway’s paper “How Do Committees Invent?” [2]. This is of importance to the CSCW community because
of the emphasis placed on understanding how, and under what circumstances, developers should work together on projects.

In an effort to aid the development effort at Microsoft and understand the effect of human factors, we have gathered data and investigated the relationship of software quality with developer attributes such as collaboration behavior, geographic location, position within the organization, and work assignment. Below, we describe some of our key results.

Contribution Behavior and Quality
In a study of collaboration behavior, Nachi, in joint work with Martin Pinzger, developed a developer-module network, which characterized the contributions of developers to modules within a system [3]. Figure 1 shows an example developer-module network. Gray circles represent developers and boxes represent modules within a system. Edges connect developers to the modules that they have contributed to, with edge weights representing the number of source code repository commits. Note that the developer-module network for Windows Vista is quite large, with thousands of developers and thousands of binaries – executables (.exe), shared libraries (.dll) and drivers (.sys).

Figure 1. A developer-module network characterizes the contributions of developers within a system.

We found that topological properties of this network were highly related to post-release faults. For instance, modules that were more central, as defined by traditional social network analysis centrality measures, tended to have more faults than other modules. We also found that less complex measures, such as the number of distinct authors and the number of distinct commits, were both significant predictors for the probability of post-release failures. By using a host of social network measures (we refer the reader to the original paper for details and descriptions) in conjunction with principal component analysis, we were able to train a logistic regression model for predicting failure-prone modules that achieved an average precision of 83% and recall of 89%. We summarize our important results:

• Network centrality measures can predict failure-prone binaries in Windows Vista.
• Network centrality measures can predict the number of post-release failures.
• Advanced centrality measures can improve the prediction of the number of post-release failures.

In summary, we found that a strong relationship exists between the developers’ commit behavior and the software quality of modules within the system.

Adding Technical Relationships
In later work, Christian and Nachi built upon this result by adding module dependencies to the developer-module network [4]. In previous work, Tom and Nachi found that dependencies can predict failures in both modules [5] and subsystems [6]. A network that incorporates both developer contributions and dependencies is a socio-technical network because edges may represent contributions from people to modules or dependencies between modules within a system. Figure 2 depicts a portion of a socio-technical network. Circles represent modules and boxes are developers. Solid directed edges are dependencies, indicating that one module may use functions or types defined in another, may make RPC calls to another, etc. Dashed lines indicate that a developer contributed to a module (we used weights in our analysis, but do not depict them in the figure).

Figure 2. A socio-technical network between modules (circles) and developers (boxes).

By combining both types of relationships into one graph, we were able to increase the power of fault predicting regression models. Using principal component analysis, we found that models based on this network had higher recall than networks with only contribution edges (developer-module networks) or only dependency edges to a statistically significant degree in Vista (recall was similar to previous models).

To see if such models are specific to Microsoft or if they are more generally applicable, we also applied the same techniques to 6 major releases of the Eclipse Java IDE (2.0 through 3.3) and achieved precision and recall rates of failure-prone plug-ins ranging from 75% to 86%. Further, we were able to train a regression model on one release of Eclipse and achieve recall and precision values ranging from 75% to 93% on the next release, showing that cross-release prediction works well for network-based regression models.
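As an illustration of the developer-module network described in this section, the following is a minimal Python sketch that builds such a network from a commit log and derives the simpler measures mentioned in the text (distinct authors, commit counts) plus a basic normalized degree. The commit data, the names `build_network` and `module_measures`, and the choice of degree centrality are assumptions made for illustration only; the studies cited here used a much larger set of social network measures together with principal component analysis and logistic regression.

```python
from collections import defaultdict

# Hypothetical commit log: one (developer, module) pair per commit.
commits = [
    ("alice", "kernel.dll"), ("alice", "kernel.dll"), ("alice", "net.sys"),
    ("bob", "kernel.dll"), ("bob", "shell.exe"),
    ("carol", "shell.exe"),
]

def build_network(commits):
    """Return {module: {developer: commit_count}}, a weighted bipartite graph."""
    network = defaultdict(lambda: defaultdict(int))
    for developer, module in commits:
        network[module][developer] += 1  # edge weight = number of commits
    return network

def module_measures(network):
    """Compute simple per-module measures from the developer-module network."""
    developers = {d for edges in network.values() for d in edges}
    measures = {}
    for module, edges in network.items():
        measures[module] = {
            "authors": len(edges),           # number of distinct contributors
            "commits": sum(edges.values()),  # total commits to the module
            # Normalized degree: share of all developers who touched the module.
            "degree_centrality": len(edges) / len(developers),
        }
    return measures

measures = module_measures(build_network(commits))
for module, values in sorted(measures.items()):
    print(module, values)
```

Measures like these would then serve as inputs to a prediction model; the sketch stops at feature extraction and does not attempt to reproduce the reported results.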
The key contributions of this work were:

• We found that using technical and contribution relationships together has more power than either in isolation for predicting bugs.
• We showed that such techniques are general by using them on projects that differ in size, domain, and process (commercial vs. open source).
• We demonstrated how such techniques can be used in practice to accurately predict failure-prone modules in one release using data from a prior release.

In all of these models, the inclusion of developer behavior significantly improved the results over models that did not include it. Clearly, the human side of software engineering has a profound effect on quality.

Does Organizational Structure Affect Bugs?
One of the unique advantages of working within an organization like Microsoft is that we have access to types of data that may not be available in academia. One such form of data is the organizational structure of the teams that develop the software.

Brooks stated that product quality is strongly affected by organization structure [1]. In order to empirically evaluate this claim, Nachi and a visiting researcher, Vic Basili, developed a suite of metrics to quantify organizational complexity [7] and investigated the relationship of these measures with software quality, which we summarize below. The term “owning organization” is used to denote the organization that owns the binary.

• Number of engineers that worked on a binary
• Number of engineers who worked on a binary and left the organization prior to release
• Total number of contributions to a binary
• Number of levels up the organization required to reach the person who oversees the engineers making at least 75% of the contributions to a binary
• Proportion of engineers in the owning organization who contributed to a binary
• Proportion of edits to a binary that were made by the owning organization
• Ratio of proportion of engineers reporting to the owning manager relative to the total number of engineers editing a binary
• Number of different organizations that contribute at least 10% of the edits to a binary

Each of these measures is based on a hypothesis related to software quality. For instance, a large loss of team members (2nd measure) affects knowledge retention and thus quality. The more cohesive the contributors are (organizationally, 5th measure), the higher the quality.

We gathered these metrics for Windows Vista and correlated each with post-release faults in the first six months. We also evaluated the accuracy of a predictive model based on these metrics. Our results indicate that all eight measures are important because a step-wise regression retained every measure. We also created a predictor based on principal component analysis (due to high correlation between some measures) and compared it to prior approaches that included attributes such as code churn, code complexity, dependencies, code coverage during testing, and pre-release bugs. Surprisingly, the model based purely on organizational metrics performed better, in terms of precision and recall, than all of these models to a statistically significant degree.

We were able to build a better predictor using attributes of the organization that developed the software instead of using attributes of the software itself. This finding highlights the importance of coordination and collaboration in software development, as it implies that perhaps high levels of coordination are able to maintain code quality in the face of factors known to result in faults, such as higher levels of complexity or code churn.

Vista is a large project, in terms of code and developers. In an attempt to determine how large a project needs to be before these organizational measures begin to have an effect, we replicated our study on a smaller data set and found that a team size of 30 engineers and three levels of organizational depth should be sufficient for a model to predict failure-proneness.

Investigating the Effects of Geographic Distribution
An additional form of information that we have related to software development is the geographic location of all developers. This enabled us to address an issue that many have wondered about and that may have consequences for Microsoft’s development process: “Does distributed development affect software quality?”

In 2009, Chris and Nachi investigated this question by examining the locations of the developers that worked on each binary that shipped with Windows Vista [8]. We grouped binaries into 6 categories depending on how spread out the developers were that contributed to them. Some binaries were developed mostly by developers in the same building while others had a team that spanned multiple countries.

When we compared the defect rates for the different groups, we found that no group had more than 16% more defects than binaries developed by engineers in the same building. While this is not a trivial increase, we had expected the effect of geographic distribution to be much larger due to the barriers imposed, such as lack of familiarity, time zone issues, and less-rich communication. Following a similar type of analysis to that of Herbsleb and Mockus [9], we examined the effects of distribution when controlling for team size. There was very little difference in failures, 6% at most, between distributed and collocated binaries.

We also examined attributes of the binaries in each group in order to determine if, for instance, managers only distributed binaries that were smaller, less critical to the system, or made up for distribution by testing more. In all, we examined over 50 measures in categories such as complexity, churn, test coverage, dependencies, and organizational metrics, and determined that there is very little difference between distributed and collocated binaries other than team size. Thus, it appears that within Microsoft, distributed development doesn’t negatively affect quality. There are a number of reasons that we believe this may be (and we invite the reader to examine them in the original paper [8]), but we have yet to empirically verify them.

This result is all the more surprising in light of the findings of our study on organizational metrics, as it may seem to be at odds with those findings. The resolution lies in the fact that organizational structure spans geography at low levels within Microsoft. While some companies have an Asian organization, a European organization, etc., within Microsoft it is not uncommon to have a team with developers in India and others in Beijing who report to a second line manager in Redmond. This approach may be one reason that geography has less of an effect, but we plan to study this further to provide more conclusive evidence.

BUG REPORTING AND TRIAGING
In the past years, Tom Zimmermann has collaborated with other researchers on studies on bug tracking systems. The advantage of these collaborations is that academic researchers can analyze open-source projects, while we can analyze projects at Microsoft. Thus, the findings generalize more broadly.

Bug reports are a perfect data source for CSCW research. They capture collaboration, communication, and coordination among people. Especially in open source projects, bug tracking systems directly involve the users of the software and not just the engineers. This leads to communities of several thousand people who discuss and work towards a resolution of software bugs. In our research we studied bug tracking in open source and closed source environments.

What Makes a Good Bug Report?
Tom, in joint work with Rahul Premraj, Nicolas Bettenburg, Sascha Just, Adrian Schröter, and Cathrin Weiss, conducted a survey among developers and users of the Apache, Eclipse, and Mozilla projects [10]. The 466 responses revealed several interesting findings on how to improve bug tracking systems.

First, we observed a mismatch between the information considered most useful by developers and the information provided by reporters (see Figure 3). Developers want steps to reproduce, stack traces, and test cases in bug reports; however, this is not the information that reporters provide. Yet, when asked, the reporters’ responses indicated that they know what is most helpful to developers and the rankings matched almost perfectly (see Figure 4). There are two implications for bug tracking systems: (1) Tell users while they are reporting a bug what information is important. (2) At the same time, systems should provide better tools to collect important information automatically, because often this information is difficult to obtain for users.

Figure 3. Information in bug reports that is considered most helpful by developers vs. information provided by reporters.

Figure 4. Information in bug reports that is considered most helpful by developers vs. what reporters believe is important.

Next, we analyzed the comments by the survey respondents to identify additional design recommendations:

• Support different levels of users (novice, expert) and provide different user interfaces for each level. Inexperienced users should receive more guidance when reporting bugs.
• Integrate bug report reputation. Several developers pointed out that reporters who are well known, either personally or through well-written past bug reports, will get more attention. Experienced reporters could be marked in their user profiles.
• Provide a powerful, yet simple and easy-to-use search feature. Several respondents to our survey complained about the limited search functionality, which is often only basic keyword search.

For a complete list of recommendations, we refer to our main publication on this work [10].

Reassignment of Bug Reports
Many people collaborate on fixing bugs and bug reports are often reassigned to other developers. Together with Gaeul Jeong and Sung Kim, Tom proposed bug tossing graphs [11] to capture frequent reassignment patterns. In these graphs, nodes represent developers and weighted edges represent the number of reassignments between two developers. On two large open-source projects, we showed that bug tossing graphs combined with Markov chains can reduce the number of reassignments substantially (also known as the ticket routing problem [12]).

However, not all bug reassignments are necessarily bad. Sometimes reassignments are actually needed to locate the root cause for a bug and to find the right person who can fix the bug. Such “beneficial” reassignments can increase the chances of a bug report getting fixed (see next subsection). We are currently working on a characterization of bug report reassignments to identify potential improvements for bug tracking systems.

Characterizing which Bugs Get Fixed
Often, the cost or risk of fixing a bug can be too high, or the impact of a bug report can be too low (only few people affected, easy workaround). Thus not all bug reports get fixed in real software development. In joint work with Philip Guo, we characterized which bugs get fixed in Windows Vista and Windows 7 [13]. We made several observations related to how people collaborate and coordinate:

• People who have been more successful in getting their submitted bugs fixed are more likely to get their bugs fixed in the future.
• Reassignments are not always detrimental to bug-fix likelihood; several might be needed to find the optimal bug fixer.
• Bugs assigned across teams or locations are less likely to get fixed, due to less communication and lowered trust.

Collaboration and Information Needs in Bug Reports
Especially in open source, bug tracking systems play a central role in supporting collaboration between the developers and the users of the software. To better understand this collaboration, we quantitatively and qualitatively analyzed the questions asked in a sample of 600 bug reports from the MOZILLA and ECLIPSE projects (joint work with Silvia Breu, Rahul Premraj, and Jonathan Sillito) [14].

We categorized the questions into a catalogue of frequently asked questions (eight categories, 40 subcategories) and then analyzed response rates and times by category and project. Key findings of this study include:

• Empirical analysis of response rate and time. Out of all questions, 67.66% were responded to. Of the questions with responses, 79.4% received responses within the day.
• Evolving information needs. We learned that the kind of questions and thus the information needs change over a bug’s life cycle.
• Community-based bug tracking. Bug reporting and tracking should be understood as a social activity within a community, supported by the bug tracking system.

Our results showed that the role of users goes beyond simply reporting bugs: their active and ongoing participation is important for making progress on the bugs they report. Based on the results, we suggested four ways in which bug tracking systems can be improved (see the main publication on this work [14]).

DATA-DRIVEN SOFTWARE ENGINEERING
A significant proportion of empirical research is done via case studies which collect and analyze data from software artifacts and the associated processes and variables to quantify, characterize and explore the relationship between different variables to deliver high quality secure software on time and within budget. Data-Driven Software Engineering forms a crucial part of empirical software engineering as it can be used to understand the successful development of software systems. Nachi Nagappan and Brendan Murphy were some of the first at Microsoft to begin collecting and analyzing software engineering artifact data for this purpose.

In this section we explain, at a very high level, three of our projects that involve data-driven software engineering. They range from software product to software practice issues. The three projects are:

1. Software product – Failure-prediction/Risk analysis: Using software development data obtained during the development process to predict failures and identify the best predictors.
2. Software practice – Does test-driven development work? If so, is there any supporting data for teams to make decisions to use test-driven development?
3. Software practice – Is there data available on how effective unit testing is? What is the cost associated with unit testing, and do developers offer resistance to unit testing?

Failure-Prediction/Risk Analysis
An important application of data-driven software engineering is in the field of failure-prediction. Failure prediction can be used to understand the overall success of the development process and plan for maintenance activities. Software organizations can benefit greatly from an early estimation regarding the quality of their product. Because product quality information is available late in the process, corrective actions tend to be expensive [15].
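To make the precision and recall figures quoted for failure-prediction models in this paper concrete, the following minimal Python sketch flags modules as failure-prone using a single metric and scores the prediction against observed failures. The churn values, the 0.25 threshold, the module names, and the function `precision_recall` are hypothetical and chosen only for illustration; a plain threshold on one metric stands in for the regression models used in the actual studies.

```python
def precision_recall(predicted, actual):
    """Score a prediction; both arguments are sets of module names."""
    true_positives = len(predicted & actual)
    # Precision: fraction of flagged modules that really failed.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of failing modules that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical relative code churn per module (changed lines / total lines).
churn = {"a.dll": 0.40, "b.dll": 0.05, "c.sys": 0.30, "d.exe": 0.10}

# Hypothetical set of modules with post-release failures.
failed = {"a.dll", "d.exe"}

# Assumed decision rule: flag modules whose churn exceeds a fixed threshold.
THRESHOLD = 0.25
predicted = {module for module, value in churn.items() if value > THRESHOLD}

precision, recall = precision_recall(predicted, failed)
print(f"predicted={sorted(predicted)} precision={precision:.2f} recall={recall:.2f}")
```

High precision means little review effort is wasted on modules that turn out to be fine, while high recall means few failing modules are missed; which one matters more depends on how a team intends to act on the prediction.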
During the development cycle, different metrics can be collected that can be related to product quality. The goal is to use such metrics to make estimates of post-release failures early in the software development cycle, during the implementation and testing phases. Such estimates can, for example, help focus testing, code and design reviews and affordably guide corrective actions. Across a span of several years, Nachi and Brendan (in collaboration with others) have used different metrics for failure prediction: code coverage [16]; code churn [17]; code complexity [18]; code dependencies [19]; people and organizational metrics [7].

Based on the results from using these metrics, either individually or in a composite model, effective failure prediction models have been built and are used in a wide variety of products at Microsoft. These failure-prediction models are built as services which allow engineers to predict risk; identify other engineers who share dependencies with their code which might be affected by changes; prioritize testing; identify ownership to have the best person fix bugs; and plan for staffing up for maintenance activities.

Test-Driven Development
Test-driven development (TDD) [20] is an “opportunistic” software development practice that has been used sporadically for decades. With this practice, a software engineer cycles minute-by-minute between writing failing unit tests and writing implementation code to pass those tests. Test-driven development has recently re-emerged as a critical enabling practice of agile software development methodologies [21], in particular Extreme Programming (XP) [22]. However, little empirical evidence supports or refutes the utility of this practice in an industrial context.

For this purpose, Nachi collected and analyzed [23] data from three different teams at Microsoft (in Windows, MSN and Visual Studio) to build up an empirical body of knowledge on the efficacy of TDD. This has enabled teams to decide on the utility of TDD as a development practice. Further, by documenting contextual information about the human factors of the engineers involved (their experience, programming expertise, whether collocated or distributed), teams can make a data-driven decision on whether to adopt TDD for software development.

Software Unit Testing
Unit testing is the testing of individual hardware or software units or groups of related units (IEEE [24]) and has been widely used in commercial software development for decades. But academic research has produced little empirical evidence via a large scale industrial case study on the experiences, costs, and benefits of unit testing. Does automated unit testing produce higher quality code?

To help other teams make a data-driven decision, Nachi, Laurie Williams, and Gunnar Kudrjavets observed [25] one large Microsoft team of 32 developers as it transitioned from ad hoc and individualized unit testing practices to the utilization of the NUnit automated unit testing framework by all members of the team. We quantified the quality and effort required to transition from ad hoc testing to a more formal unit testing process. To further quantify developer perceptions, we also conducted a survey and interviews with the team to determine the tradeoffs of doing unit testing. These results can help other teams decide on the cost and overhead of transitioning towards a more formal unit testing process.

In general, the three projects in the data-driven software engineering domain focus on empirical data analysis, with the results made accessible to engineers via tools, techniques and processes.

Analytics for Software Development
The previous subsection presented studies where the ESE group collaborated with product teams at Microsoft. Our future work will focus on making data-driven software engineering accessible to a wider audience of engineers and managers.

We plan to build tools that allow easy access to data to simplify data-driven decision making. For example, existing development environments such as Microsoft's Team Foundation Server and IBM's Jazz provide dashboards to inform engineers of the status of various events. However, while showing status and indicators is fairly straightforward, it is unclear what the most important factors are for development teams to make data-driven decisions. What do we need to surface so that development data becomes actionable for teams so that they can improve how they work together?

Furthermore, we plan to evangelize empirical methods in software development and will provide analytics tools to empower development teams to run studies that go beyond the use of simple dashboards. In particular, we foresee the role of a software development analyst who combines the expertise in collecting and analyzing data with the knowledge of processes specific to the product team. Right now, this expertise is often split across Microsoft Research (who have the analytics knowledge) and product teams (who have the domain knowledge).

WHAT MAKES EMPIRICAL SOFTWARE ENGINEERING RESEARCH AT MICROSOFT UNIQUE?
An industrial research lab such as Microsoft Research offers several advantages for conducting research.

Easy access to industrial data. During software development a large amount of data is generated and recorded in software repositories. Being inside Microsoft simplifies the access to such data and enables empirical studies such as the ones presented in this paper.

Easy access to developers. Not only is the access to data easier, but also the access to engineers. This allows validation of empirical findings, user studies of prototypes, interviews, surveys, etc. and makes an ideal environment to study collaboration in software development.

Near term impact. Since Microsoft’s core business is developing software, findings that result from our studies can
Main location
ns of the ESM grroup (black pinss):
Redmond (US SA), Cambridge ((UK)
Collaboration ns with other Miicrosoft Researcch Labs:
Microsoft Ressearch India (Banngalore), Microsooft Research
Asia (Beijing)), European Microosoft Innovation Center
(Aachen, Germ many)
Interns 2007--2010: Universityy of Virginia (Rayy Buse);
University of California, Santaa Cruz (Ken Hulleett);
National Instittute of Technologgy, Tiruchirappallli, India
(Kalaikumarann Ramamurthy); Stanford Universsity
(Philip Guo); Boğaziçi Universsity, Turkey (Aysse Tosun);
Darmstadt Unniversity of Technnology (Andreas JJohansson);
North Carolinna State Universityy (Lucas Laymann,
Meiyappan Naagappan)
Visitors 20077-2010: Hong Konng University of Science and
Technology (S Sung Kim); Univversity of Zurich ((Harald Gall,
Martin Pinzgeer); Saarland Univversity (Andreas Zeller);
North Carolinna State Universityy (Laurie William
ms);
University of Maryland (Victorr Basili); Darmstadt
University of Technology (Neeeraj Suri).
F
Figure 5. Colla
aborations of the
t Empirical Software Eng
gineering and Measurementt (ESM) Group
p at Microsoft Research.

be put into practice immediately within the company. This serves to validate the findings and results in higher levels of quality and productivity.

Collaboration with other Microsoft researchers. There are plenty of opportunities to collaborate with other researchers inside Microsoft. At the moment Microsoft Research has more than 800 researchers, working in eight locations around the world. For most areas, experts are easily accessible and allow for multidisciplinary research when needed.

Collaboration with external researchers. There are many opportunities for our group to collaborate with researchers outside Microsoft. Often we conduct research in tandem: we analyze data from Microsoft projects, while an academic researcher analyzes open-source projects. This increases the generality of our empirical findings. Selected academic researchers also get the opportunity to access Microsoft data, either as interns (typically PhD students) or as visiting researchers (for example, professors during a sabbatical).

Figure 5 shows a Bing map with the locations of the Empirical Software Engineering Group and our collaborators' locations. In the past years we have worked with 8 interns and 7 professors from all over the world. We are always looking for outstanding visitors and interns. To learn more about visits or internships, visit our website and/or contact one of the authors of this paper. In fact, three ESE group members interned before they joined Microsoft full-time (Nachi Nagappan in 2005, Tom Zimmermann in 2006, and Christian Bird in 2008 and 2009).

CONCLUSION
In this paper, we presented three main themes that showcase the research of the ESE group at Microsoft: the analysis of socio technical congruence and bug tracking systems allows us to understand how development teams collaborate and to build tools that help them collaborate with each other. With data-driven software engineering, we want to take this one step further: development teams should have the knowledge and tools to analyze their collaboration patterns themselves and monitor their improvement over time.

Being located inside Microsoft offers us many unique opportunities to pursue our goals. Microsoft has many large software projects and the “cooperative” aspect is omnipresent. Rather than just coming in after the fact and investigating (like many studies in the mining software repositories field do), we can watch software development while it happens. We can also test tools that help support cooperative work and understand when cooperation is needed and when engineers can work on their own.

For more information about the ESE group at Microsoft and/or to apply for an internship, visit
http://research.microsoft.com/en-us/projects/esm/

ACKNOWLEDGMENTS
We thank Tom Ball, Robin Moeur, Wolfram Schulte; our visitors Sung Kim (2010), Harald Gall (2008, 2009), Laurie Williams (2009), Andreas Zeller (2005, 2009), Victor R. Basili (2007), Neeraj Suri (2007), Martin Pinzger (2007); our interns Ray Buse, Ken Hullett, Meiyappan Nagappan, Kalaikumaran Ramamurthy (all 2010), Philip Guo, Ayse Tosun (both 2009), Lucas Layman, Andreas Johansson (both 2007); and we thank our collaborators. Thanks for the great work!

We also thank our colleagues at Microsoft Research and the many product groups at Microsoft who have worked with us and helped with studies. You rock!

REFERENCES
1. Brooks Jr., F.P. The mythical man-month. Addison-Wesley, 1975.
2. Conway, M.E. How do committees invent. Datamation, 14, 4 (1968), 28-31.
3. Pinzger, M., Nagappan, N., and Murphy, B. Can developer-module networks predict failures? In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (2008), 2-12.
4. Bird, C., Nagappan, N., Devanbu, P., Gall, H., and Murphy, B. Putting it All Together: Using Socio-Technical Networks to Predict Failures. In Proceedings of the 17th International Symposium on Software Reliability Engineering (2009), 109-119.
5. Zimmermann, T. and Nagappan, N. Predicting Defects using Social Network Analysis on Dependency Graphs. In Proceedings of the 30th International Conference on Software Engineering (2008), 531-540.
6. Zimmermann, T. and Nagappan, N. Predicting subsystem failures using dependency graph complexities. In Predicting subsystem failures using dependency graph complexities (2007), 227-236.
7. Nagappan, N., Murphy, B., and Basili, V. The influence of organizational structure on software quality: an empirical case study. In Proceedings of the 30th International Conference on Software Engineering (2008), 521-530.
8. Bird, C., Nagappan, N., Devanbu, P., Gall, H., and Murphy, B. Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista. In Proceedings of the International Conference on Software Engineering (2009), 518-528.
9. Herbsleb, J.D. and Mockus, A. An empirical study of speed and communication in globally distributed software development. IEEE Transactions on Software Engineering, 29, 6 (2003), 481-494.
10. Zimmermann, T., Premraj, R., Bettenburg, N., Just, S., Schröter, A., and Weiss, C. What Makes a Good Bug Report? IEEE Transactions on Software Engineering. To appear. http://doi.ieeecomputersociety.org/10.1109/TSE.2010.63.
11. Jeong, G., Kim, S., and Zimmermann, T. Improving bug triage with bug tossing graphs. In Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (2009), 111-120.
12. Shao, Q., Chen, Y., Tao, S., Yan, X., and Anerousis, N. Efficient ticket routing by resolution sequence mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008), 605-613.
13. Guo, P.J., Zimmermann, T., Nagappan, N., and Murphy, B. Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (2010), 495-504.
14. Breu, S., Premraj, R., Sillito, J., and Zimmermann, T. Information needs in bug reports: improving cooperation between developers and users. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (2010), 301-310.
15. Boehm, B.W. Software Engineering Economics. Prentice Hall, 1981.
16. Mockus, A., Nagappan, N., and Dinh-Trong, T.T. Test coverage and post-verification defects: A multiple case study. In Proceedings of the 3rd International Symposium on Empirical Software Engineering and Measurement (2009), 291-301.
17. Nagappan, N. and Ball, T. Use of relative code churn measures to predict system defect density. In Proceedings of the 27th International Conference on Software Engineering (2005), 284-292.
18. Bhat, T. and Nagappan, N. Building Scalable Failure-proneness Models Using Complexity Metrics for Large Scale Software Systems. In Proc. of the Asia Pacific Software Engineering Conference (2006), 361-366.
19. Nagappan, N. and Ball, T. Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study. In Proceedings of the First International Symposium on Empirical Software Engineering and Measurement (2007), 364-373.
20. Beck, K. Test Driven Development: By Example. Addison-Wesley Professional, 2002.
21. Cockburn, A. Agile Software Development. Addison-Wesley Professional, 2001.
22. Beck, K. and Andres, C. Extreme Programming Explained: Embrace Change. Addison-Wesley Professional, 2004.
23. Bhat, T. and Nagappan, N. Evaluating the efficacy of test-driven development: industrial case studies. In Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering (2006), 356-363.
24. IEEE. IEEE Standard 610.12-1990, IEEE Standard Glossary of Software Engineering Terminology, 1990.
25. Williams, L., Kudrjavets, G., and Nagappan, N. On the Effectiveness of Unit Test Automation at Microsoft. In Proceedings of the IEEE International Symposium on Software Reliability Engineering (2009).