Beyond The Personal Software Process: Metrics Collection and Analysis For The Differently Disciplined
Figure 1. Distinguishing characteristics of the three generations of metrics collection and analysis.

Characteristic        Generation 1        Generation 2                 Generation 3
                      (manual PSP)        (Leap, PSP Studio,           (Hackystat)
                                          PSP Dashboard)
Collection overhead   High                Medium                       None
Analysis overhead     High                Low                          None
Context switching     Yes                 Yes                          No
Metrics changes       Simple              Software edits               Tool dependent
Adoption barriers     Overhead,           Context switching            Privacy,
                      context switching                                sensor availability
indicating that PSP data can support both project estimation and quality assurance. In addition, researchers from the Software Engineering Institute analyzed data submitted to them by instructors of 23 PSP classes, and concluded that the PSP improved the students' estimation accuracy and product quality [2].

To our knowledge, there is no published empirical research directly addressing the second conjecture, such as studies reporting the actual percentage of PSP students who continue to use it one, two, and three years after having taken the class. (Publications on PSP adoption generally consist of speculative guidelines and/or short-term data which do not address the second conjecture.) However, anecdotal evidence does not support the second conjecture. For example, a report on a workshop of PSP instructors reveals that in one course of 78 students, 72 of them "abandoned" the PSP because they felt "it would impose an excessively strict process on them and that the extra work would not pay off." None of the remaining six students reported any perceived process improvements [1]. Our experiences teaching the PSP are similar: despite classroom improvements in estimation and quality assurance, few if any students adopted PSP-specific concepts.

If the first conjecture is true but the second is false, then studying the PSP is similar to studying Latin: a task that advocates suggest you learn for its indirect benefits, rather than because you'll actually use it in your daily life. The workshop report echoes this PSP-as-Latin viewpoint when it conjectures, "Even if students don't use the PSP again, improving and making them aware of their programming habits will help them in their future academic and professional careers."

This paper presents a perspective on our research and educational experiences over the past six years regarding metrics collection and analysis for individual developers. We began by teaching and using the PSP in its original form, but students found the overhead of metrics collection and analysis to be excessive. To address this issue, we next developed a comprehensive toolkit for PSP-style metrics collection and analysis called Leap [8, 6]. Despite the automated support, adoption was still low, and this led us to the discovery of another adoption barrier: the need for students to continuously "context switch" between product development and process recording. We have now implemented a new system called Hackystat and have deployed it in two software engineering classes. Hackystat completely automates both collection and analysis of metric data and thus addresses both the overhead and context switching barriers to adoption. The next section discusses this research trajectory in more detail.

2. Three generations of metrics for individuals

Looking back, we can divide our research on metrics for individuals into three generations. Figure 1 illustrates five distinguishing characteristics of these generations.

The first generation approach uses the PSP as originally described in A Discipline for Software Engineering [4]. Users of the PSP create and print out forms in which they manually log effort, size, and defect information. Additional forms use this data to support project estimation and quality assurance. This approach creates substantial overhead due to form filling. For example, the PSP requires students to write down every compiler error that occurs during development. It also recommends that the developer keep a stopwatch by their desk in order to keep track of all interruptions. A benefit of using forms is that changing metrics simply involves editing the affected forms and/or creating new ones.

We began teaching the PSP in 1996, and had success similar to that reported in other case studies. Most of our students were able to estimate both the size and time of their final project with 90% or better accuracy, and one student achieved 100% yield on one project. (This means that the student eliminated all syntax and semantic errors from the system prior to the first compile of that system.) Despite the obvious discipline displayed by these students, followup email indicated that none of them continued using the manual PSP after finishing the semester. We attributed this to the overhead involved in collection and analysis, and began the Leap research project in 1998 to pursue low overhead approaches to collection and analysis of individual software engineering metric data.
[Figure 2. The Hackystat architecture: tool- and metric-specific sensors attached to development tools send XML sensor data to a web server backed by a database; server-side analyses trigger alerts/URLs that a mailer delivers to the developer by email.]
A second generation approach uses Leap or another automated tool for PSP-style metrics such as the PSP Studio [3] and PSP Dashboard [11]. These tools all have the same basic approach to user interaction: they display dialog boxes where the user records effort, size, and defect information. The tools also display various analyses when requested by the user. Second generation approaches do an excellent job of lowering the overhead associated with metrics analysis, and substantially reduce the overhead of metrics collection. However, metrics changes require changes to the software and are thus more complicated than in the first generation approach.

After teaching and using the Leap system we found that, similar to the manual PSP, developers can utilize their Leap historical data to substantially improve their project planning and quality assurance activities. Followup email indicated that adoption improved slightly: a handful of students continued to use Leap after the end of the semester, and a small number of industrial developers discovered the tool online and began using it. A few former students and developers continue to use at least some parts of Leap.

While "some adoption" is definitely an improvement over "no adoption", we were still surprised by the very low level of adoption of a toolkit that provided so much automated support. We then discovered that a major adoption barrier is the requirement that the user constantly switch back and forth between doing work and "telling the tool" what work is being done. Even if telling the tool is as simple and fast as pressing a button, this continual context switching is still too intrusive for many users who desire long periods of uninterrupted focus for efficient and effective development.

In May 2001, we began the Hackystat project, in which metrics are collected automatically by attaching sensors to development tools, metric data is sent by the sensors to a server, analyses over the gathered data are performed by the server, and alerts are emailed to the developer when triggered. With Hackystat, the overhead of metrics collection is effectively eliminated, developers never context switch between working and telling the tool that they're working, and analysis results can be provided in a "just in time" manner. While Hackystat successfully addresses the barriers to adoption identified in first and second generation approaches, it changes the nature of the metric data that is collected, imposes requirements on development tools, and introduces new adoption issues. The remainder of the paper describes the system and our results in more detail.

3. An overview of Hackystat

Figure 2 shows the basic architecture of Hackystat and how information flows between the user and the system. Hackystat requires the development of client-side sensors that attach to development tools and that unobtrusively collect effort, size, defect, and other metrics regarding the user's development activities. Not every development tool is amenable to Hackystat instrumentation: Emacs is easy to integrate, Notepad is not.

The current system includes sensors for the Emacs and JBuilder IDEs, the Ant build system, and the JUnit testing tool. These sensors collect activity data (such as which file, if any, is under active modification by the developer
at 30 second intervals), size data (such as the Chidamber-Kemerer object-oriented metrics and non-comment source lines of Java code), and defect data (invocation of unit tests and their pass/fail status).

The developer begins using Hackystat by installing one or more sensors and registering with a Hackystat server. During registration, the server sets up an account for the developer and sends her an email containing a randomly generated 12-character key that serves as her account password. This password prevents others from accessing her metric data or uploading their data to her account.
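As a concrete illustration of the key generation step, the sketch below (our own minimal example, not Hackystat's actual implementation; the class name and key alphabet are assumptions) produces a random 12-character key suitable for use as an account password:

import java.security.SecureRandom;

/** Illustrative sketch of account key generation; not Hackystat source code. */
public class AccountKeySketch {
  // Lowercase letters and digits; the actual key alphabet is an assumption.
  private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Returns a randomly generated 12-character key. */
  public static String newKey() {
    StringBuilder key = new StringBuilder(12);
    for (int i = 0; i < 12; i++) {
      key.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
    }
    return key.toString();
  }
}

SecureRandom, rather than java.util.Random, is the appropriate source here because the key doubles as the account password.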
Once the developer has registered with a server and installed the sensors, she can return to her development activities. Metrics are collected by the sensors and sent unobtrusively to the server at regular intervals (if the developer is connected to the net) or cached locally for later sending (if the developer is working off line).
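The following sketch suggests how such a sensor might behave; it is an illustration under assumptions, not actual Hackystat sensor code. ActiveFileSource stands in for the tool-specific hook (for example, the JBuilder or Emacs call that reports the buffer under edit), and the wire format is a simplified placeholder:

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.*;

/** Illustrative sensor sketch: samples the active file, sends or caches it. */
public class SensorSketch {
  /** Hypothetical stand-in for a tool-specific "active buffer" query. */
  interface ActiveFileSource {
    String activeFile(); // path of the file under modification, or null if idle
  }

  private final ActiveFileSource source;
  private final String serverUrl; // e.g., the developer's Hackystat server
  private final List<String> buffer = new ArrayList<String>();
  private final File offlineCache =
      new File(System.getProperty("user.home"), ".sensor-cache");

  SensorSketch(ActiveFileSource source, String serverUrl) {
    this.source = source;
    this.serverUrl = serverUrl;
  }

  /** Samples the active file every 30 seconds, as described above. */
  public void start() {
    new Timer(true).scheduleAtFixedRate(new TimerTask() {
      public void run() { sample(); }
    }, 0, 30 * 1000);
  }

  private synchronized void sample() {
    String file = source.activeFile();
    if (file != null) {
      buffer.add(System.currentTimeMillis() + "\t" + file);
    }
    if (buffer.size() >= 10) {
      flush(); // send unobtrusively at regular intervals
    }
  }

  private void flush() {
    try {
      HttpURLConnection c = (HttpURLConnection) new URL(serverUrl).openConnection();
      c.setDoOutput(true);
      c.setRequestMethod("POST");
      Writer w = new OutputStreamWriter(c.getOutputStream(), "UTF-8");
      for (String entry : buffer) { w.write(entry + "\n"); }
      w.close();
      if (c.getResponseCode() == 200) { buffer.clear(); return; }
    } catch (IOException offline) {
      // fall through: the developer is off line
    }
    cacheLocally(); // keep the data for later sending
  }

  private void cacheLocally() {
    try (Writer w = new FileWriter(offlineCache, true)) {
      for (String entry : buffer) { w.write(entry + "\n"); }
      buffer.clear();
    } catch (IOException ignored) { }
  }
}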
On the server side, analysis programs are run regularly over all of the metrics for each developer. A fundamental analysis is the abstraction of the raw metric data stream into a representation of the developer's activity at five-minute intervals over the course of a day. We call this abstraction the "Daily Diary", and it is illustrated in Figure 3. This Daily Diary shows that the developer began work on Friday, June 21, 2002, at approximately 9:30am, and that during the first five minutes of work the file edited most frequently was Controller.java. The location of this file is also indicated, along with its Chidamber-Kemerer metrics and size, computed from the .class file associated with the most recent compilation of this file in the developer's workspace. Among other things, this Diary excerpt also shows that between 9:45am and 9:55am, the developer invoked 60 JUnit tests that passed, 1 that failed, and none that aborted due to exceptional conditions.
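The heart of this abstraction is a bucketing computation. The sketch below is our illustration (hypothetical class names; the actual server-side analysis is considerably richer): it groups raw activity samples into five-minute intervals and reports the most frequently edited file in each interval:

import java.util.*;

/** Illustrative sketch of the five-minute Daily Diary abstraction. */
public class DailyDiarySketch {
  /** One raw sensor event: which file was active at a given time. */
  public static class Activity {
    final long timestamp; // milliseconds since the epoch
    final String file;
    Activity(long timestamp, String file) {
      this.timestamp = timestamp;
      this.file = file;
    }
  }

  private static final long INTERVAL_MS = 5 * 60 * 1000; // five minutes

  /** Returns interval start time -> most frequently edited file. */
  public static SortedMap<Long, String> mostActiveFilePerInterval(List<Activity> events) {
    // interval start -> (file -> number of samples in that interval)
    SortedMap<Long, Map<String, Integer>> buckets =
        new TreeMap<Long, Map<String, Integer>>();
    for (Activity a : events) {
      long start = (a.timestamp / INTERVAL_MS) * INTERVAL_MS;
      Map<String, Integer> counts = buckets.get(start);
      if (counts == null) {
        counts = new HashMap<String, Integer>();
        buckets.put(start, counts);
      }
      Integer n = counts.get(a.file);
      counts.put(a.file, n == null ? 1 : n + 1);
    }
    // Pick the file with the highest sample count in each interval.
    SortedMap<Long, String> diary = new TreeMap<Long, String>();
    for (Map.Entry<Long, Map<String, Integer>> e : buckets.entrySet()) {
      String best = null;
      int bestCount = -1;
      for (Map.Entry<String, Integer> f : e.getValue().entrySet()) {
        if (f.getValue() > bestCount) {
          best = f.getKey();
          bestCount = f.getValue();
        }
      }
      diary.put(e.getKey(), best);
    }
    return diary;
  }
}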
The Daily Diary is useful for visualizing and explaining Hackystat's representation of developer behavior, but it is not intended as the "user interface" to the system. Instead, the Daily Diary representation serves as a basis for generating other analyses, such as: the amount of developer effort spent on a given module per day (or week, or month); the change in size of a module per day (or week, or month); the distribution of unit tests across a module, their invocation rate, and their success rate per day (or week, or month); the average number of new classes, methods, or lines of code written in a given module per day (or week, or month); and so forth.
Analysis results are available to each developer from her account home page on the web server, and can be retrieved manually to support, for example, project planning activities. However, a more interesting mechanism in Hackystat is the ability to define alerts, which are analyses that run periodically over developer data and that specify some sort of threshold value. If the threshold is exceeded, the server sends the developer an email indicating that an analysis has discovered data that may be of interest, along with a URL to display more details about the data in question at the server.
One alert is the "Complexity Threshold Alert", which the developer can configure to analyze the Chidamber-Kemerer metrics associated with each class she worked on during the previous seven days and to trigger an email if the values of these metrics exceed developer-specified thresholds. This enables the system to monitor the complexity of the classes that the developer works on and to send an email if they exceed the specified values.
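The logic of such an alert reduces to a threshold comparison, as in the sketch below (illustrative class and metric names of our own; the seven-day data selection and the mailer integration are omitted):

import java.util.*;

/** Illustrative sketch of a Complexity Threshold Alert check. */
public class ComplexityThresholdAlertSketch {
  // Developer-specified ceilings, e.g. "WMC" -> 25.0 (names are assumptions).
  private final Map<String, Double> thresholds;

  public ComplexityThresholdAlertSketch(Map<String, Double> thresholds) {
    this.thresholds = thresholds;
  }

  /**
   * Checks one class's Chidamber-Kemerer metric values against the
   * developer-specified thresholds; a non-empty result means the server
   * should email the developer about this class.
   */
  public List<String> exceededMetrics(Map<String, Double> ckMetrics) {
    List<String> exceeded = new ArrayList<String>();
    for (Map.Entry<String, Double> t : thresholds.entrySet()) {
      Double value = ckMetrics.get(t.getKey());
      if (value != null && value > t.getValue()) {
        exceeded.add(t.getKey() + "=" + value + " (limit " + t.getValue() + ")");
      }
    }
    return exceeded;
  }
}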
Student usage creates an opportunity for specialized analyses and alerts. For example, analyses can help students see whether "last minute hacking" leads to more testing failures, less testing in general, and lower productivity. Alerts can help students monitor their usage and inform them when their effort falls below a certain level of consistency (such as at least one hour of effort at least four days a week).

Alerts provide a kind of "just-in-time" approach to metrics collection and analysis. The developer can effectively "forget about" metrics collection and analysis during her daily work, but the metrics will still be gathered and available to her when she has a need for them. Furthermore, the alert mechanism can make her aware of impending problems without her having to regularly "poll" her dataset looking for them.

4. Results

The Hackystat project began in early 2001, and the first operational release of the server and a small set of sensors occurred in July 2001. The server is written in Java and contains approximately 200 classes, 1000 methods, and 15,000 non-comment lines of code. Client-side, tool-specific code is much smaller: the JBuilder sensor code is approximately 200 lines of Java, and the Emacs sensor code is approximately 400 lines of Lisp. Hackystat is available without charge under an open source license and can be downloaded at http://csdl.ics.hawaii.edu/Tools/Hackystat. In addition, we maintain a public server running the latest release of Hackystat at http://hackystat.ics.hawaii.edu/.

Hackystat is currently being used by approximately 40 students in undergraduate and graduate software engineering classes at the University of Hawaii, as well as by one industry site. One user has development data for over 250 days spanning 15 months of usage. We will be gathering adoption data regarding the ongoing use of Hackystat throughout 2003.

Our research confirms the quote that begins this paper: we did not discover a means to automatically collect PSP-style effort, size, and defect data. On the other hand, our research shows that automatic collection of Hackystat-style effort, size, and defect data is indeed possible.
Figure 3. The Daily Diary: Developer metrics at five-minute intervals.
It is instructive to compare and contrast the two approaches to these three metrics and their implications.

Effort. In the PSP, effort data (whether recorded by hand or using a tool) is always associated with a "project" and a "phase" of development. So, a developer might record that from 10:00am to 11:00am, she was working on Project "Timer1.2" in the phase "design". Although in theory this seems simple enough, in practice it incurs significant overhead to define unique "projects" for every development activity, determine the "phase" to be assigned, and record individual entries each time the developer switches to a different task or project. In addition, the PSP requires that you record "idle time", so every phone call or colleague's appearance at your door generates an additional recording activity.
In Hackystat, effort data is associated with active modification of a file, and has a fixed grain size of five-minute increments. If the developer is not actively changing a file, then they are "idle". Instead of a "project", Hackystat has the concept of a "locale", which generally corresponds to a subdirectory (or package) hierarchy. There is currently no attempt to represent development "phase" in Hackystat.
While the PSP effort representation has the potential to be more accurate than Hackystat's, the reality is that the overhead and context switching required to conform to PSP effort collection makes it exceedingly costly to the developer. Hackystat effort data, on the other hand, is effectively "free". Another difference is in the application of effort data to planning and estimation. In the PSP, one plans using "projects", which are associated with various sizes and effort levels. In Hackystat, one can plan using "locales", which are also associated with various sizes and effort levels. However, one can also plan using simple "work week" data, which involves examining size and effort over a representative period of weeks.
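The paper does not specify how file paths map to locales, so the following sketch makes an assumption for illustration: localeOf simply truncates a path to a fixed number of leading directory components, and each five-minute active interval contributes five minutes of effort to its file's locale:

import java.util.*;

/** Illustrative sketch of locale-based effort aggregation. */
public class LocaleEffortSketch {
  /** Maps a file path to its locale: the leading directory components. */
  public static String localeOf(String filePath, int depth) {
    String[] parts = filePath.split("/");
    StringBuilder locale = new StringBuilder();
    for (int i = 0; i < Math.min(depth, parts.length - 1); i++) {
      if (i > 0) locale.append('/');
      locale.append(parts[i]);
    }
    return locale.toString();
  }

  /**
   * Sums effort per locale, given the most active file of each five-minute
   * interval; each interval contributes five minutes to its file's locale.
   */
  public static Map<String, Integer> effortMinutesByLocale(
      List<String> activeFilePerInterval, int depth) {
    Map<String, Integer> minutes = new HashMap<String, Integer>();
    for (String file : activeFilePerInterval) {
      String locale = localeOf(file, depth);
      Integer m = minutes.get(locale);
      minutes.put(locale, (m == null ? 0 : m) + 5);
    }
    return minutes;
  }
}

Summing these totals over a representative period of weeks yields the simple "work week" planning data described above.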
Size. In the PSP, the developer invokes a source code analysis tool to collect size data at the end of a project (and perhaps at the beginning, if the project is an incremental extension of an existing system). Size data consists of counts of classes, methods, and non-comment lines of code.

In Hackystat, similar size data is collected, but this data needs to be collected incrementally since there are no projects, much less defined start or end dates. This poses a problem, since source code files parsed by source code analysis tools are frequently syntactically incorrect while they are under active development. Hackystat solves this for Java by parsing the .class file associated with the most recent compilation of the source file. This enables Hackystat to provide size information such as the actual number of new methods added during a given day.
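The sketch below approximates this computation under stated assumptions: rather than analyzing the bytecode of the .class file directly (as a real implementation presumably would), it loads the compiled class from the build directory via reflection and diffs two daily snapshots of method signatures:

import java.io.File;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.*;

/** Illustrative sketch of method-level size data from compiled classes. */
public class ClassSizeSketch {
  /** Returns the declared method signatures of a compiled class. */
  public static Set<String> methodSignatures(File classesDir, String className)
      throws Exception {
    URLClassLoader loader =
        URLClassLoader.newInstance(new URL[] { classesDir.toURI().toURL() });
    Set<String> signatures = new TreeSet<String>();
    for (Method m : loader.loadClass(className).getDeclaredMethods()) {
      signatures.add(m.getName() + Arrays.toString(m.getParameterTypes()));
    }
    return signatures;
  }

  /** New methods added during a given day: today's snapshot minus yesterday's. */
  public static Set<String> newMethods(Set<String> yesterday, Set<String> today) {
    Set<String> added = new TreeSet<String>(today);
    added.removeAll(yesterday);
    return added;
  }
}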
Defects. In the PSP, the developer must record every defect, including compilation defects, as well as the time it took to remove them, any other defects injected as a result of each defect, the phase in which the defect was injected into the product, and the phase in which it was removed.

In Hackystat, pre-release defect data is automatically collected by attaching a sensor to a unit testing mechanism such as JUnit, and post-release defect data is automatically collected by attaching a sensor to a bug reporting system such as Bugzilla.
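As one illustration of such a sensor, the sketch below uses JUnit 4's RunListener API (which postdates the sensor described in this paper) to capture the pass/fail outcome of a test invocation; a real sensor would forward these counts to the Hackystat server rather than print them:

import org.junit.runner.JUnitCore;
import org.junit.runner.Result;
import org.junit.runner.notification.RunListener;

/** Illustrative sketch of a unit-test sensor built on JUnit 4. */
public class JUnitSensorSketch extends RunListener {
  @Override
  public void testRunFinished(Result result) {
    int failed = result.getFailureCount();
    int passed = result.getRunCount() - failed;
    // A real sensor would send this invocation record to the server.
    System.out.println(passed + " passed, " + failed + " failed");
  }

  /** Runs the named test class with the sensor attached. */
  public static void main(String[] args) throws Exception {
    JUnitCore core = new JUnitCore();
    core.addListener(new JUnitSensorSketch());
    core.run(Class.forName(args[0]));
  }
}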
PSP defect collection supports a number of analyses not possible with Hackystat defect collection, such as the relationship between the cost of removing a defect and the interval in phases between its injection and removal. On the other hand, PSP defect collection creates substantial developer overhead, and is quite sensitive to "collection fatigue". For example, if developers stop recording defects as conscientiously over time, then potentially incorrect and misleading analyses (such as a trend toward decreased defect density) can result. Hackystat defect collection is not susceptible to these problems, and does support activities such as complexity measurement validation (the development of models that predict post-release defect rates from pre-release complexity measures).

While Hackystat reduces barriers to adoption due to developer overhead, it creates a new adoption issue of its own: the specter of "Big Brother". As Figure 3 illustrates, Hackystat servers provide a fairly detailed log of developer activities, which may cause privacy concerns, particularly among professional developers who might worry about access to the data by management. We have taken several steps to address privacy. First, data access requires a password that should be known only to the developer who owns the data. Second, we maintain a public Hackystat server that allows developers to keep their data "off site" and thus unavailable to management. Third, a developer might alternatively decide to download the Hackystat server and run it locally so that all data is kept under their immediate control. We may investigate further measures, such as PGP encryption of data, if privacy issues are revealed to be a major barrier to adoption.

5. Conclusions and Future Directions

Our first conclusion is the need for further research on the issue of PSP adoption. While there now exists ample case study evidence that the PSP can provide software engineering benefits in a classroom setting, our own experience and other anecdotal evidence suggest that most developers abandon PSP practices after its use is no longer mandated.

Our second conclusion is that a significant barrier to adoption of metrics by individual developers occurs when there is the need to regularly "context switch" between product development and process recording. This indicates that second generation approaches that simply automate PSP-style effort, size, and defect collection might not be widely adopted.

Our third conclusion is that third generation approaches such as the Hackystat system present a promising means to eliminate the need for context switching by developers through automatic collection of metric data. However, the approach changes the nature of the data that is collected, and raises new adoption issues related to privacy. We hope that other researchers will download and evaluate Hackystat to explore these issues, or be inspired to develop their own third (or fourth!) generation approach to metrics collection and analysis for individuals.

Hackystat is under active development, and we are currently developing sensors for Eclipse, Forte, and CVS. We are also beginning empirical studies regarding the construct validity of measures such as "Most Active File". Finally, we will be assessing the long-term adoption of Hackystat by following changes in usage patterns by students as they move on to other classes or professional work.

6. Acknowledgments

We gratefully acknowledge the hardworking and disciplined students from our current and prior software engineering courses. Support for this research was provided in part by grants CCR98-04010 and CCR02-34568 from the National Science Foundation.

References

[1] J. Börstler, D. Carrington, G. Hislop, S. Lisack, K. Olson, and L. Williams. Teaching PSP: Challenges and lessons learned. IEEE Software, 19(5), September 2002.
[2] W. Hayes and J. W. Over. The Personal Software Process (PSP): An empirical study of the impact of PSP on individual engineers. Technical Report CMU/SEI-97-TR-001, Software Engineering Institute, Pittsburgh, PA, 1997.
[3] J. Henry. Personal Software Process Studio. http://www-cs.etsu.edu/softeng/psp/, 1997.
[4] W. S. Humphrey. A Discipline for Software Engineering. Addison-Wesley, January 1995.
[5] P. M. Johnson and A. M. Disney. The personal software process: A cautionary case study. IEEE Software, 15(6), November 1998.
[6] P. M. Johnson, C. A. Moore, J. A. Dane, and R. S. Brewer. Empirically guided software effort guesstimation. IEEE Software, 17(6), December 2000.
[7] S. Khajenoori and I. Hirmanpour. An experiential report on the implications of the Personal Software Process for software quality improvement. In Proceedings of the Fifth International Conference on Software Quality, pages 303-312, October 1995.
[8] C. A. Moore. Lessons learned from teaching reflective software engineering using the Leap toolkit. In Proceedings of the 2000 International Conference on Software Engineering, Workshop on Software Engineering Education, Limerick, Ireland, May 2000.
[9] M. Ramsey. Experiences teaching the Personal Software Process in academia and industry. In Proceedings of the 1996 SEPG Conference, 1996.
[10] B. Shostak. Adapting the Personal Software Process to industry. Software Process Newsletter #5, Winter 1996.
[11] D. Tuma. PSP Dashboard. http://processdash.sourceforge.net/, 2000.
[12] E. F. Weller. Lessons learned from three years of inspection data. IEEE Software, pages 38-45, September 1993.
[13] L. A. Williams. The Collaborative Software Process. PhD thesis, University of Utah, 2000.