Communications of the ACM
10/2009 Vol. 52 No. 10
cacm.acm.org

A View of Parallel Computing
A Conversation with David E. Shaw
Smoothed Analysis
A Retrospective from C.A.R. Hoare
Debating Net Neutrality

Association for Computing Machinery
springer.com

Noteworthy Titles

Computational Geometry: Algorithms and Applications
M. de Berg, TU Eindhoven, The Netherlands; O. Cheong, KAIST, Daejeon, Korea; M. van Kreveld, M. Overmars, Utrecht University, Utrecht, The Netherlands
This well-accepted introduction to computational geometry is a textbook for high-level undergraduate and low-level graduate courses. The focus is on algorithms and hence the book is well suited for students in computer science and engineering. The book is largely self-contained and can be used for self-study by anyone with a basic background in algorithms. In this third edition, besides revisions to the second edition, new sections discussing Voronoi diagrams of line segments, farthest-point Voronoi diagrams, and realistic input models have been added.
3rd ed. 2008. XII, 386 p. 370 illus. Hardcover. ISBN 978-3-540-77973-5. $49.95

More Math Into LaTeX
G. Grätzer, University of Manitoba, Winnipeg, MB, Canada
For close to two decades, Math into LaTeX has been the standard introduction and complete reference for writing articles and books containing mathematical formulas. In this fourth edition, the reader is provided with important updates on articles and books. An important new topic is discussed: transparencies (computer projections).
2007. XXXIV, 619 p. 44 illus. Softcover. ISBN 978-0-387-32289-6. $49.95

User Interface Design for Programmers
J. Spolsky, Fog Creek Software, New York, NY, USA
The author of one of the most popular independent web sites gives you a brilliantly readable book with what programmers need to know about User Interface Design. Spolsky concentrates especially on the common mistakes that too many programs exhibit. Most programmers dislike user interface programming, but this book makes it quintessentially easy, straightforward, and fun.
2006. 160 p. Softcover. ISBN 978-1-893115-94-1. $34.99

Robot Building for Beginners
D. Cook, Motorola, Whitestown, IN, USA
Learning robotics by yourself isn't easy. This book by an experienced software developer and self-taught mechanic tells how to build robots from scratch with individual electronic components, in easy-to-understand language with step-by-step instructions.
1st ed. 2002. Corr. 2nd printing 2005. XV, 568 p. Softcover. ISBN 978-1-893115-44-6. $29.95

The Algorithm Design Manual
S. S. Skiena, State University of New York, Stony Brook, NY, USA
This expanded and updated second edition of a classic bestseller continues to take the "mystery" out of designing and analyzing algorithms and their efficacy and efficiency. Expanding on the highly successful formula of the first edition, the book now serves as the primary textbook of choice for any algorithm design course while maintaining its status as the premier practical reference guide to algorithms.
2nd ed. 2008. XVI, 736 p. 115 illus. With online files/update. Hardcover. ISBN 978-1-84800-069-8. $79.95

Fast Track to MDX
M. Whitehorn, University College Worcester, UK; R. Zare, M. Pasumansky, Microsoft Corporation, Redmond, WA, USA
Fast Track to MDX provides all the necessary background needed to write useful, powerful MDX expressions and introduces the most frequently used MDX functions and constructs. No prior knowledge is assumed and examples are used throughout the book to rapidly develop MDX skills to the point where they can solve real business problems. A CD-ROM containing examples from within the book, and a time-limited version of ProClarity, are included.
2nd ed. 2006. XXVI, 310 p. 199 illus. With CD-ROM. Softcover. ISBN 978-1-84628-174-7. $54.95
COMMUNICATIONS OF THE ACM
34 Probing Biomolecular Machines with Graphics Processors
GPU acceleration and other computer performance increases will offer critical benefits to biomedical science.
By James C. Phillips and John E. Stone

42 Unifying Biological Image Formats with HDF5
The biosciences need an image format capable of high performance and long-term maintenance. Is HDF5 the answer?
By Matthew T. Dougherty, Michael J. Folk, Erez Zadok, Herbert J. Bernstein, Frances C. Bernstein, Kevin W. Eliceiri, Werner Benger, and Christoph Best

48 A Conversation with David E. Shaw
Stanford professor Pat Hanrahan sits down with the noted hedge fund founder, computational biochemist, and (above all) computer scientist.

56 A View of the Parallel Computing Landscape
Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.
By Krste Asanovic, Rastislav Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Nelson Morgan, David Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Katherine Yelick

68 Automated Support for Managing Feature Requests in Open Forums
The result is stable, focused, dynamic discussion threads that avoid redundant ideas and engage thousands of stakeholders.
By Jane Cleland-Huang, Horatiu Dumitru, Chuan Duan, and Carlos Castro-Herrera

Review Articles

76 Smoothed Analysis: An Attempt to Explain the Behavior of Algorithms in Practice
This Gödel Prize-winning work traces the steps toward modeling real data.
By Daniel A. Spielman and Shang-Hua Teng

Research Highlights

86 Technical Perspective: Relational Query Optimization—Data Management Meets Statistical Estimation
By Surajit Chaudhuri

87 Distinct-Value Synopses for Multiset Operations
By Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis

96 Technical Perspective: Data Stream Processing—When You Only Get One Look
By Johannes Gehrke

97 Finding the Frequent Items in Streams of Data
By Graham Cormode and Marios Hadjieleftheriou

Virtual Extension

As with all magazines, page limitations often prevent the publication of articles that might otherwise be included in the print edition. To ensure timely publication, ACM created Communications' Virtual Extension (VE). VE articles undergo the same rigorous review process as those in the print edition and are accepted for publication on their merit. These articles are now available to ACM members in the Digital Library.

Balancing Four Factors in System Development Projects
Girish H. Subramanian, Gary Klein, James J. Jiang, and Chien-Lung Chan

Attaining Superior Complaint Resolution
Sridhar R. Papagari Sangareddy, Sanjeev Jha, Chen Ye, and Kevin C. Desouza

Making Ubiquitous Computing Available
Vivienne Waller and Robert B. Johnson

De-escalating IT Projects: The DMM Project
Donal Flynn, Gary Pan, Mark Keil, and Magnus Mahring

Human Interaction for High-Quality Machine Translation
Francisco Casacuberta, Jorge Civera, Elsa Cubel, Antonio L. Lagardia, Guy Lampalme, Elliott Macklovitch, and Enrique Vidal

How Effective is Google's Translation Service in Search?
Jacques Savoy and Ljiljana Dolamic

Overcoming the J-Shaped Distribution of Product Reviews
Nan Hu, Paul A. Pavlou, and Jie Zhang

Technical Opinion: Do SAP Successes Outperform Themselves and Their Competitors?
Richard J. Goeke and Robert H. Faley

About the Cover: Leonello Calvetti brings to life the bridge analogy connecting users to a parallel IT industry as noted by the authors of the cover story beginning on page 56. Calvetti, a photo-realistic illustrator, is a graduate of Benvenuto Cellini College in Florence, Italy, where he studied technical design and developed his artistic applications of 3D software and digital imagery.
Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
DOI:10.1145/1562764.1562767
EDWARD A. LEE'S "Computing Needs Time" (May 2009) might be the most important, stimulating, and timely article I have read in Communications over the past 50 years. It dealt with a major technical topic long neglected by ACM but that is today more important than ever. One sentence in particular resonated: "It is easy to envision new capabilities that are technically within striking distance but that would be extremely difficult to deploy with today's methods." Having been involved in the development of embedded systems in military aircraft from 1968 to 1994, I would say that to succeed, computerized performance systems need the following:

Knowledge. Of the operational objectives and requirements of the system;

Discipline. At all technical and management levels to assure system resources are well defined and carefully budgeted, with timing at the top of the list; and

Selection. Of exceptionally self-directed project personnel, including pruning out those unsuited for such system-development work.

You might view them as platitudes, so consider, too, several examples from four programs involving military aircraft:

In both the P-3C maritime patrol aircraft and the carrier-based S-3A antisubmarine aircraft, senior software engineers (in both the contracting agency and the contractor) developed a set of real-time software "requirements" that could not be met with available memory. In each case Navy program managers were steeped in the operational requirements and had exercised their authority to severely limit them.

In the case of the F-117 stealth fighter, an important new operational subsystem was never deployed. The customer and the Lockheed Advanced Development Company jointly invented a set of utopian system "requirements," basically ignoring management directions. The customer's program manager and Lockheed finally agreed to cancel the project.

It was no surprise 20 or 30 years ago that technical people involved in the development of demanding embedded systems learned primarily on the job, with more trial and error than we care to admit but that we accepted. Are universities today educating the new generation of systems and software engineers so they are able to develop computerized performance systems? In my own recent cynical moments rebooting my Mac, I think such software development is being done by people who would, perhaps, be downright dangerous if assigned to develop critical computerized performance systems.

In any case, Lee's technical vision and thinking were truly impressive, as was his skill in communicating them.

Sherm Mullin, Oxnard, CA

How the Blind Learn Math and Design

Kristen Shinohara and Josh Tenenberg were absolutely correct in "A Blind Person's Interactions with Technology" (Aug. 2009) saying that replacing text with voice does not automatically provide the blind interaction with or even access to technology. However, they might have added that this says more about how the sighted value text and its characteristics (such as left to right, fixed character sets, and lamina) than it does about the absence of that value in the blind.

In trying to teach computer science to two completely blind students, I have found that two key issues—mathematics and design—are both so dependent on spatial arrangement that the standard approaches are at best unreliable and at worst confusing. Even a cursory glance at the excellent Blindmath notice board (http://www.nfbnet.org/mailman/listinfo/blindmath_nfbnet.org) demonstrates the importance of Abraham Nemeth's Braille coding for math-symbol layout to both experienced practitioners and as a tool for the young learning skills in the field. However, symbolic and spatial presentation (for the most part the essence of the mathematical teaching method) has little to do with the ability to achieve proficiency in mathematics or computing. Simply linearizing the symbols and spatial arrangement for access by the blind through voice (or Braille) techniques could still miss the point but seems to be the best we can do today.

Similarly, teaching design as hierarchical decomposition and spatial layout redolent in techniques (such as UML) also misses the point; it might thus be replaced by the simple text "generate a hierarchical decomposition and group the resulting units" with examples of what is meant and any of the tricks that might be available. Noting that time arrangement could be replaced with spatial-arrangement restrictions compounds the challenge. (A bundle of tactile knotted strings might be a better exemplar of hierarchy than a neat 2.5D graph, as I believe Donald Knuth once said.)

It may be that seeing the world as the sighted see it is the goal of practitioners and educators alike; the motes, as it were, could indeed be in the wrong eyes.

Bernard M. Diaz, Liverpool, U.K.

Correction

Brad Chen of Google contributed to the development of the news story "Toward Native Web Execution" (July 2009).

Communications welcomes your opinion. To submit a Letter to the Editor, please limit your comments to 500 words or less and send to [email protected].

© 2009 ACM 0001-0782/09/1000 $10.00
DOI:10.1145/1562764.1562768
Balancing Four Factors in System Development Projects
Girish H. Subramanian, Gary Klein, James J. Jiang, and Chien-Lung Chan
It is often the methodology that dictates system development success criteria. However, an organization will benefit most from a perspective that looks at both the quality of the product and an efficient process. This imperative will more likely incorporate the multiple goals of various stakeholders. To this end, a project view of system development should adjust to incorporate controls into a process that stresses flexibility, promote learning in a process that is more rigid, and be certain to use evaluation criteria that stress the quality of the product as well as the efficiency of the process.

Attaining Superior Complaint Resolution
Sridhar R. Papagari Sangareddy, Sanjeev Jha, Chen Ye, and Kevin C. Desouza
Why is customer service more important than ever for consumer technology companies? How can they retain existing customers and prevent them from discontinuing their products? What can they do to provide superior complaint management and resolution? The authors answer these questions and more by proposing and evaluating a model of complaint management and service evaluation. A holistic approach toward complaint management is recommended for retaining customers.

Making Ubiquitous Computing Available
Vivienne Waller and Robert B. Johnson
Computing artifacts that seamlessly support everyday activities must be made both physically and cognitively 'available' to users. Artifacts that are designed using a traditional model of computing tend to get in the way of what we want to do. Drawing on Heidegger, the authors delve deeper into the concept of 'availability' than existing studies in human-computer interaction have done. They find two ways that ubiquitous computing can be truly available are through manipulating the space of possible actions and through indicating the possibility for action. This article translates the conceptual findings into principles for design.

De-escalating IT Projects: The DMM Project
Donal Flynn, Gary Pan, Mark Keil, and Magnus Mahring
Taming runaway information technology projects is a continuing challenge for many managers. These are projects that grossly exceed their planned budgets and schedules, often by a factor of 2–3 fold or greater. Many end in failure, not only in the sense of budget or schedule, but in terms of delivered functionality. This article examines three approaches that have been suggested for managing de-escalation. By combining the best elements from the approaches, we provide an integrated framework as well as introducing a de-escalation management maturity (DMM) model that provides a useful roadmap for improving practice.

Human Interaction for High-Quality Machine Translation
Francisco Casacuberta, Jorge Civera, Elsa Cubel, Antonio L. Lagardia, Guy Lampalme, Elliott Macklovitch, and Enrique Vidal
The interactive-predictive approach allows for the construction of machine translation systems that produce high-quality results in a cost-effective manner by placing a human operator at the center of the production process. The human serves as the guarantor of high quality and the role of the automated systems is to ensure increased productivity by proposing well-formed extensions to the current target text, which the operator may then accept, correct, or ignore. Interactivity allows the system to take advantage of the human-validated portion of the text to improve the accuracy of subsequent predictions. (A schematic sketch of this loop appears after these abstracts.)

How Effective is Google's Translation Service in Search?
Jacques Savoy and Ljiljana Dolamic
Using freely available translation services, bilingual search is possible and even effective. Compared to a monolingual search, the automatic query translation hurts the retrieval effectiveness (from 12% to 30% depending on the language pairs). Various translation difficulties as well as linguistic features may explain such degradation. Instead of providing a direct translation for all language pairs, we can select an intermediary language or pivot (for example, English) and such a strategy does not always further degrade the search quality.

Overcoming the J-Shaped Distribution of Product Reviews
Nan Hu, Paul A. Pavlou, and Jie Zhang
Product review systems rely on a simple database technology that allows people to rate products. Following the view of information systems as socio-technical systems, product review systems denote the interaction between people and technology. This article provides evidence to support the finding that online product reviews suffer from two kinds of potential biases: purchasing bias and under-reporting bias. Therefore the average of ratings alone may not be representative of product quality, and consumers need to look at the entire distribution of the reviews.

Technical Opinion: Do SAP Successes Outperform Themselves and Their Competitors?
Richard J. Goeke and Robert H. Faley
Managers and researchers have long debated how to measure the business value of IT investments. ERP systems have recently entered this debate, as research measuring the business value of these large IT investments has produced mixed results. Using regression discontinuity analysis, the authors found that successful SAP implementers improved inventory turnover in their post-implementation periods. These SAP successes also significantly improved inventory turnover relative to their competitors. However, profitability improvements were more difficult to find.
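The interactive-predictive workflow summarized in the Casacuberta et al. abstract is essentially a propose/validate loop. The following is a minimal Python sketch of that loop under stated assumptions: propose_extension() is a hypothetical stand-in for a real translation model, and the console interaction is invented for illustration; this is not the authors' actual system.

```python
# Sketch of an interactive-predictive translation loop (illustrative only).
# propose_extension() is a hypothetical placeholder for a real MT model.

def propose_extension(source: str, validated: str) -> str:
    # A real system would condition its prediction on the human-validated
    # prefix, which is what lets accuracy improve as the loop runs.
    # Here we simply echo an untranslated remainder as a dummy suggestion.
    return source[len(validated):]

def interactive_translate(source: str) -> str:
    validated = ""  # the human-approved portion of the target text
    while True:
        suggestion = propose_extension(source, validated)
        print(f"proposal: {validated}[{suggestion}]")
        action = input("(a)ccept, (c)orrect, or (f)inish? ").strip().lower()
        if action == "a":
            validated += suggestion        # operator accepts the extension
        elif action == "c":
            validated += input("type your correction: ")  # operator overrides
        else:
            return validated               # operator declares the text done

if __name__ == "__main__":
    print(interactive_translate("el gato negro"))
```

The key design point, per the abstract, is that every prediction is recomputed from the validated prefix, so each human keystroke constrains the next machine proposal.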
DOI:10.1145/1562764.1562769 http://cacm.acm.org/blogs/blog-cacm
Computer Science
but not with particularly impressive results.
Without a clear metric for success

toolset, lasted for just two hours, and the other was the final presentation session of an eight-week project on Second Life programming. Both of them went very well from the point of view of introducing young people to the fun aspects of computer science. Whether they pay off in terms of recruiting people to study our degree courses in CS remains to be seen. But you have to start somewhere, right? Here are some things I noticed that might be useful to others who are interested in schools' outreach and recruitment.

• A relaxed atmosphere prevailed. The young people were joking around and enjoying themselves. Importantly, they were laughing with the staff rather than at them. Having some handpicked students who I knew to be friendly and approachable really helped with this.

• The young people were doing stuff rather than listening to me drone on. The games workshop kids spent most of their time exploring the software with minimal time spent in demos. The Second Life project groups were presenting their projects and giving demos while their classmates assessed their work. They seemed to be taking the assessment task seriously and responsibly. And I'll tell you what: It really makes them ask sensible questions at the end of each presentation. This is a contrast to the usual setup in class where students sit like turnips when you ask, "Are there any questions?"

• Both the workshops involved creative tasks where the teenagers chose for themselves what to build. This does have the drawback of revealing my ignorance of the latest pop culture fads, but at least I do know what South Park is. Seriously, though, this is very important. If you want people to take pride in their work, they need to take some ownership of it. For that to happen, they need to have the choice to work on personally meaningful projects and this often means embracing popular culture in a way which we, as grown-up computer scientists, might find baffling or intensely irritating.

Rather than pushing our agenda of what we think is important and berating young people that they ought to find it interesting, we need to meet them halfway. We need to start from their interests, and then help them to see how computer science knowledge can help them achieve something that appeals to them. As in "You're interested in alcohol and The Simpsons. Ideal. How about you make a 3D Homer Simpson whose arm can move up and down to drink beer?" At that point you can start explaining the necessary programming and math concepts to do the rotation in 3D space. Or even just admire what they have figured out by themselves. Once you have them hooked on programming or signed up on your degree program, you can build on it. I'm not saying we don't need to teach sober, serious, and worthy aspects of computer science. Of course we do. I'm just saying we don't need to push it immediately. It's kind of like when you have a new boyfriend and you know you have to introduce him to your weird family. Do you take him to meet the mad uncle with the scary eyebrows straight off? No, you introduce him to a friendly cousin who will make him feel at home and has something in common with him.

What I'm suggesting is not new—there are pockets of excellent outreach work with kids in various parts of the world. I think it's time we tried more of it, even although it is time consuming. After all, we know we can recruit hardcore computer scientists to our degree programs with our current tactics (you know, the people who are born with silver Linux kernels in their mouths). But given there aren't that many of them, it's well worth the effort to reach out to the normal population. Unleash the inner computer scientist in everyone!

From Michael Conover's "Advertainment"

Mobile phones are a way of life in Japan, and this aspect of the culture manifests itself in many ways. Among the more remarkable are the ubiquitous quick response (QR) codes that adorn a sizable percentage of billboards, magazines, and other printed media. In brief, these two-dimensional bar codes offer camera phones with the appropriate software an opportunity to connect with Web-based resources relating to the product or service featured in an advertisement. Encoding a maximum of 4,296 alphanumeric characters, or 1,817 kanji, QR codes are a forerunner of ubiquitous computing technology and portend great things to come. (A small generation sketch follows this post.)

What's remarkable to me is, for all our similarities, how widely divergent American and Japanese urban cultures can be. The market penetration numbers aren't that strikingly different; a March 2008 study showed that more than 84% of Americans own a cell phone, where a Wolfram Alpha query shows that 83% of Japanese own one. The differences in practice, however, could not be more pronounced. In terms of mobile phone use, walking the streets of Japan is like being on a college campus all the time. It's not unreasonable to estimate that every fifth person is interacting in some way with a mobile device, and here's the rub on this point—Americans make calls on their phones, the Japanese interact.

Ubiquitous Web access and widespread support for the mobile platform, in addition to the vastly increased data-transfer capabilities, mean Japan is a society in which cell phones are a practical mobile computing platform. QR codes have blossomed in this culture not only because they're immensely useful to both organizations and consumers, but because the cultural soil is ripe for their adoption. QR codes have been met with lukewarm response in the U.S., and I fear it may be yet another mobile technology to which we get hip three to five years behind the curve.

Irrespective of this, the applications of QR codes in Japan are at times astounding. For many high-dollar corporations, such as Louis Vuitton and Coca-Cola, the QR code is the ad (art?) itself. Oftentimes, the QR code is the actual content, made of something unexpected or even a medium for digital activism. Because of its robust digital format, creative marketers have a lot of wiggle room when it comes to creating eye-catching, market-driven applications of this technology and, like ubiquitous translation technology, it's the widespread use of Internet-enabled phones that underlies this technological paradigm shift.

Greg Linden is the founder of Geeky Ventures. Michael Conover, a Ph.D. candidate at Indiana University, is a visiting researcher at IBM Tokyo Research Laboratory. Judy Robertson is a lecturer at Heriot-Watt University.

© 2009 ACM 0001-0782/09/1000 $10.00
DOI:10.1145/1562764.1562770 David Roman

THE HERBARIUM AT the University of Alaska's Museum of the North houses one of the world's largest collections of arctic plant specimens and Russian flora. For scientists studying the ecology of this region or the changing biodiversity caused by human encroachment and climate change, it's an invaluable resource. However, the Herbarium is not ideally located for most scientists. Because of the considerable expense of traveling to Alaska, scientists often have specimens temporarily shipped to their institutions, but that also costs money, and delicate specimens suffer from the attendant wear and tear. As a result, the Herbarium is scanning and digitizing its extensive collection, making the images and text available on the Internet to scientists, not to mention enthusiastic amateurs, everywhere in the world.

The amount of data involved—about five terabytes so far—is hardly intimidating by today's standards, but it proved to be overwhelming for the Herbarium's previous database partner. Consequently, the Herbarium teamed up with the University of Texas at Austin's Texas Advanced Computing Center (TACC), gaining access to TACC's Corral, an online 1.2 petabyte data repository, via the National Science Foundation's TeraGrid cyberinfrastructure. "We take images and they're immediately downloaded to Texas, and in live time we have a link to that file from our database," says Steffi

220,000 specimens have been digitized by hand, with students laboriously keying in detailed information from the specimens' labels, some of which are handwritten and some of which are in Russian. To record the remaining 140,000 specimens, however, the project is shifting to automated label scanning using Google's Tesseract optical character recognition engine. (A minimal usage sketch appears after this article.)

Handling varied data, such as text, numbers, images, and sounds, no longer poses any great technical difficulties, says Chris Jordan, who is in charge of data infrastructure at TACC. Corral uses Sun Microsystems' Lustre file-handling program to present users with a seamless interface to the data. What's trickier is building a presentation layer that gives scientists in a particular discipline access to the resources in an intuitive and user-friendly way. Unlike researchers in physics and engineering, for example, those working in museums or the humanities aren't accustomed to using computers at the command line level, says Jordan, so "a lot of our effort now is on [building] interfaces for people to locate data along with descriptive metadata in a way that's reasonably easy."

Accessible and Useful

Making vast repositories of biological information widely accessible and useful is the aim of the Encyclopedia of Life (EOL), an ambitious project whose long-term goal is to create a Web page for every known species on Earth. "We want to provide all information about all organisms to all audiences," says Cyndy Parr, EOL's species pages group leader, who is based at the Smithsonian's National Museum of Natural History (NMNH) in Washington, D.C. So far, EOL has about 1.4 million species pages, but most of them are little more than placeholders for known organisms, frequently directing visitors to an external link with more information. Some 158,000 pages, however, contain at least one data object that's been scientifically vetted by EOL. The project is chasing a moving target, since the number of species on the planet is unknown. A figure of 1.8 million known species is commonly accepted, says Parr, but new species are being discovered at a rate of 20,000 or more each year, and some extrapolations suggest that there may be as many as 10 million distinct species.

[Photo caption: The ambitious Encyclopedia of Life aims to create a Web page for every known species on Earth.]

Far from competing with efforts such as those by the University of Alaska's Herbarium, EOL aims to build species pages that deliver essential information in a uniform style and lead visitors who want to dig deeper to more specialized, detailed resources. Indeed, says EOL Executive Director Jim Edwards, also at NMNH, a principal motive behind EOL was the realization that many research communities are building their own databases—such as AntWeb, FishBase, and others—each with its own design, interface, and search procedures, created to meet the needs of its particular community.

Launched with grants from the MacArthur and Sloan foundations, EOL receives support from several museums and other institutions. EOL invites scientific contributions from amateurs and academics, but uses a network of researchers to decide what information will be included. The site tries to steer a middle course between a pure top-down model, with pages created only by experts, and a self-policed wiki. EOL resides on servers at the Smithsonian and the Marine Biological Laboratory in Woods Hole, MA, but since it is mainly an index to other information, the data cached on those servers amounts to a few hundred megabytes. EOL's informatics challenges derive in large part from the historical origins of biological information. Even the formal names of species can be treacherous, since some species have been "discovered" on more than one occasion and received duplicate names, and sometimes molecular or DNA analysis demonstrates that what's long been regarded as a single species is, in fact, two distinct species.

In addition, it's essential to be able to search in less formal ways, such as by habitat type, mating behavior, or leaf shape, characteristics that aren't often described by a standardized set of terms. That's a particular issue for one of EOL's partner efforts, the Biodiversity Heritage Library (BHL), a collaboration of 10 natural history museum libraries, botanical libraries, and research institutions in the U.S. and the U.K. that has put nearly 15 million digitized pages from 37,000 books online. Although optical character recognition software has become reliable, scanning millions of pages is labor-intensive and time consuming, says Chris Freeland, BHL's technical director and manager of the bioinformatics department at the Missouri Botanical Garden in St. Louis. Someone—usually a volunteer student—must turn the pages of every book and correctly position them for the camera. After that phase, though, the digitizing process is automated and efficient. Typewritten or commercially printed material is optically well

[Photo caption: Galaxy Zoo uses crowdsourcing to help categorize galaxies as either spiral or elliptical.]
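The automated label scanning mentioned in the Herbarium story can be reproduced in miniature with Tesseract's Python wrapper. This is a sketch under assumptions: pytesseract and Pillow are third-party installs, the file name is invented, and the article does not say how the project actually invokes Tesseract.

```python
# Minimal sketch of label digitization with Google's Tesseract OCR engine,
# via the third-party pytesseract wrapper (pip install pytesseract pillow;
# Tesseract itself must be installed separately). File name is illustrative.
from PIL import Image
import pytesseract

label = Image.open("specimen_label.png")  # a scanned specimen label
# Some labels are in Russian, so request English plus Russian; this needs
# Tesseract's "rus" language data to be installed. Handwritten labels will
# still defeat most OCR engines and would remain a manual task.
text = pytesseract.image_to_string(label, lang="eng+rus")
print(text)
```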
THE CONTROVERSY ABOUT network neutrality—the principle that Internet users should be able to access any Web content or use any applications, without restrictions or limitations from their Internet service provider (ISP)—remains unresolved in the U.S. over who, if anyone, has a legal or commercial right to regulate Internet traffic. Net neutrality proponents advocate for legislation that would keep broadband service providers from controlling Internet content or gaining the ability to impose extra charges for heavy users of the Internet. Opponents argue that existing rules enforced by the Federal Communications Commission (FCC) and others make additional laws unnecessary, or could jeopardize service providers' First Amendment rights.

But now some analysts are warning that the battle may ultimately mean much more than decisions about bits, bandwidth, pricing, and flow control. "The battle for Net neutrality is also a battle over the shape of American culture," says Tim Wu, a Columbia University law professor specializing in copyrights and communications who wrote a 2003 paper, "Network Neutrality, Broadband Discrimination," which popularized the concept of Net neutrality.

Wu fears that in an under-regulated Internet ruled by ISPs, commercial motives could stifle entertainment, culture, and political diversity. For example, Wu says, the current open Internet is "bad for the business models" of traditional newspapers, many of which don't charge for the news they publish on their Web sites. But what if a news organization, for financial means, aligned itself with an individual ISP to sell premium content?

"If you imagine a non-open Internet, there could be the official newspapers of AT&T's network, let's say," Wu explains. Part of an AT&T customer's monthly bill would support the Wall Street Journal, for example, but these subscribers "can't get access to USA Today or something else. It starts to become a little bit dicey when you can see how that would help [news organization's] business models."

Innovation and experimentation could also suffer, Wu warns. "In the early days of mobile phones, the only way an application appeared on a mobile phone was if it made money for the phone company," he says. But in an Internet context, because some now-familiar features don't have a direct commercial bent, they may never have been introduced. "In an Internet that [ISPs] can control, why would you put Wikipedia on it?" he asks. "It doesn't make sense because it doesn't make money."

Vinton Cerf, chief Internet evangelist at Google, also sees potential accessibility problems if broadband providers wield too much power. "All of us, in the consumer space especially, stand to lose the opportunity to access new products and services if we don't get [the debate] right," Cerf warns.

Cerf cites the well-publicized legal wrangling over broadband provider Comcast's deliberate slowing down of traffic for customers using BitTorrent and other file-sharing applications on its network in 2007. "As long as we have clumsy or consumer-unfriendly regimes for network management," he says, "we will see problems for some reasonable applications and thus some inhibition of innovative new services."

Cerf envisions a safer Net world where regulations would constrain anti-competitive practices, such as unfair pricing by ISPs that compete with providers of online movie-on-demand services. "Fairly high aggregate caps on total monthly transfers or, preferably, a cap on the minimum assured rate for a given price would be very attractive," Cerf says. The "key point," he adds, is that "the user gets to pay for a guarantee of a certain minimum rate. If the user doesn't use it, others may. That's the whole dynamic of capacity sharing."

[Photo caption: Musical artist Moby, left, and U.S. Representative Ed Markey at a SavetheInternet.com press conference on net neutrality in Washington, D.C., with a counterdemonstrator's sign blocking the view of the Capitol dome.]

Net Neutrality Skeptics

Others dismiss such fears. "The threat level is not red or orange. We are way down in yellow," says Barbara Esbin, director of the Center for Communications and Competition Policy at the Progress and Freedom Foundation, a pro-business Washington, D.C.-based think tank supported in part by broadband providers such as AT&T, Comcast, and Verizon.

Net neutrality skeptics dismiss the notion that the lack of additional Internet controls is endangering society, and argue that the status quo engenders a
A PROTOTYPE OF your company's latest product sits on the conference table. You suggest sleeker styling. The 3D model reacts, fluidly changing its angles. A colleague recommends different proportions, and again the model reflects those edicts. A pass of the hands changes the model's color; another reveals its interior, while a third folds it back into a flat sheet. Computer-aided design software records all the changes to replay them later, and to easily produce the final version.

That's the vision of researchers at Carnegie Mellon University and elsewhere as they embark in the new field of robotics known as claytronics. In it, a shape-shifting object is comprised of thousands of miniature robots known as claytronic atoms, or catoms. Each catom has the power to move, receive power, compute, and communicate with others, resulting in an ensemble that dynamically envisions three-dimensional space much as the pixels on a computer monitor depict a two-dimensional plane. Functional scenarios for claytronics include a general-purpose computer changing from laptop to cellphone form as needed, or, say, a plumber's tool that changes from a scuttling, insect-like device in crawl spaces to a slithering snake-like shape when inside pipes.

Ultimately, the claytronics dream goes, we may have morphing humanoids like T-1000 in the movie Terminator 2 or Odo in the TV series "Star Trek: Deep Space Nine." Because their shape-shifting abilities could make them excellent mimics of human forms, such androids might act as reusable stand-ins for surgeons, repair technicians, and other experts who control them remotely.

Collaborative Research

Two joint ventures have taken the lead in this field. One partnership—between Carnegie Mellon University's Collaborative Research in Programmable Matter project and Intel Research Pittsburgh's Dynamic Physical Rendering project—focuses on programmable materials' shape-shifting aspects as well as the software needed to drive it. The other major effort eyes military applications; this collaboration is between the Defense Advanced Research Projects Agency (DARPA) and a consortium of colleges including the University of California at Berkeley, Harvard University, the University of Pennsylvania, Massachusetts Institute of Technology, and Cornell University.

[Photo caption: A top view of two magnet rings, with individual driver boards, from the self-actuating Planar Catom V7 developed by the Carnegie Mellon University-Intel claytronics research team.]

Both projects face a host of challenges. Most immediately, the individual mechanical devices need to be much smaller than the units most recently demonstrated, which were about the size of large salt shakers. Jason Campbell, a senior staff research scientist at Intel's Pittsburgh laboratory, says his group is working with catoms of a tubular structure that are much smaller—1 millimeter in diameter and 10 millimeters in length. He believes that they need only be small enough to disappear to our eyes and touch. "That gives us a size target between a tenth of a millimeter and one millimeter across," he says, with the size requirement depending on application. A shape-changing radio antenna doesn't need to be very small, for example, whereas for a handheld device, "the consumer is the perceptual system" and miniaturization has great value, says Campbell.

Claytronics research flows out of the field of modular robotics, where individual units are typically a few inches across. But as the units get smaller—and more numerous—the physics controlling them changes. Kasper Støy, associate professor of computer systems engineering at the University of Southern Denmark, points to one difference between the macro and micro levels. With larger modules, he says, "we have to fight a lot with gravity. All the modules have to be able to lift the others. We've long used electromechanical actuators to overcome gravity in large modules. But at the micro level, you don't want to have a big actuator in every little unit. The question is, How do you go from a bunch of really weak modules and get a big, strong robot?"

Size is not the only challenge. To form stable ensembles, catoms must have power, communication, and adherence to one another. The Carnegie Mellon project is currently attempting to address most of these issues by attaching electrostatic plates to the surface of each catom. Plates given opposite charges would cling to each other; changing the charges in a precisely programmed sequence would force the plates to roll over one another into the desired configuration. Information could be exchanged between adjoining catoms by manipulating their voltage differences.

Claytronics researchers are considering several options for providing power. In early versions, "catoms will get power from the table they sit on, through capacitive coupling," says Seth Goldstein, an associate professor of computer science who leads the Carnegie Mellon team. But additional technologies are also under consideration, Goldstein says. "We are looking at magnetic resonance coupling, which goes over a longer distance," he says. "We're also looking at solar power. Because of their size, the catoms are translucent. My guess is that we'll use multiple ways of getting power in."

Mark Yim, an associate professor of mechanical engineering at the University of Pennsylvania who is involved with the DARPA project, has experimented with drawing power from the environment and letting the catoms' intelligence and adhesion turn it into useful motion. "I asked, What if we shake the table to get the modules to move around each other, and the modules determine when to grab or let go?" says Yim. "If they get really small, you might be able to move them with sound waves that shake the air."

As catom production ramps up, these questions assume greater urgency. Goldstein notes that his lab, in collaboration with the U.S. Air Force Research Laboratory, has "refined the process to print the catom's circuits on a standard silicon die, then post-process it to curl up into a sphere or a tube." By this December, Goldstein says, he hopes to fabricate a silicon cylinder about one millimeter in diameter that will "move around under its own control, transfer power from a ground plane, and be able to communicate with other units."

The Shapes to Come

A pile of catoms is useless without coordination. And although the challenge of creating claytronic software is similar to that of any massively parallel computer system, there is one key difference: The catoms' physical nature makes error-correction more important. "Catoms are mechanical devices, and mechanical devices fail," says Campbell.

There's also the question of external control: How much instruction should the aggregate get from an outside source? Boston University electrical and computer engineering professor Tommaso Toffoli expresses skepticism about the ability of a claytronic ensemble to be self-directed. In the 1990s, he says, companies attempting to build massively parallel computers failed because they misread this problem. "People eventually found that instead of having millions of tiny computers the size of ants trying to simulate the brain, they got more work done by having just a few hundred that act as slaves, carrying sand and water to a floating-point processor," Toffoli says.

Campbell disagrees. He believes that for catoms to be as effective as possible, they will each need to have onboard intelligence and that processing tasks will need to be distributed throughout the ensemble. "The more parts you have, the more it matters that you have processing on each part. There are still things you can centralize, but decisions about what modules to move when have to be done inside the aggregate," says Campbell.

The Carnegie Mellon team uses simulations to address these questions, and has created two languages that are optimized to program the ensemble. One, called Meld, is a stateless, declarative, logic-based language used to control the behavior of individual catoms. The second, locally distributed predicates (LDP), is used to recognize patterns and evaluate conditions throughout the ensemble as a whole. Meld and LDP are complementary, says language developer Michael Ashley-Rollman, a computer science graduate student at Carnegie Mellon. "If one robot stops working," he explains, "Meld just throws everything away from that robot and pretends that it was never there." LDP, on the other hand, "permits programmers to manage states themselves"—at the expense of error correction. LDP, Ashley-Rollman says, "gives Meld rules through which it can determine what the state can be." (An illustrative sketch of this declarative style appears after this article.)

The Reality Challenge

Commercial viability for catoms will require major advances in the ability to produce large quantities of the devices at low cost. Goldstein hopes that his lab at Carnegie Mellon will build working, submillimeter catoms by the end of 2009. By 2012, he believes, the researchers will have "hundreds of catoms working together intelligently." More fantastically, Goldstein hopes that in 30 years there will be a claytronic humanoid that looks, feels, and moves in ways almost identical to a person.

But such ambitious goals may obscure a more fundamental issue. "Say, you have something like claytronics," Yim says. "You have the mechanical part working. You've got the computational part working so you can control them in a distributed fashion. The question is, What do you do with it? Maybe you want it to go get a beer from the refrigerator. How does it know what is the best shape to be in to open the door?"

Indeed, making claytronics truly useful depends on developments in areas beyond these projects' current scope, including artificial intelligence and human modeling. But the scenarios described only illustrate possible applications for claytronics, and in fact researchers are quicker to say what the technology will be than what it will do. But that's the very problem that general-purpose computers faced until the mid-1980s; like PCs, claytronic ensembles may ultimately prove their worth in ways now unimagined.

[Photo caption: A pair of Planar Catom V8s, each of which weighs 100 grams, with their stack of magnet-sensor rings on the bottom and solid-state electronic control rings on the top.]

Education: U.S. CS Courses Decline

The number of computer science courses being taught in high schools throughout America is steadily decreasing, according to the 2009 CSTA Secondary Computer Science Survey conducted by the Computer Science Teachers Association. The survey, based on responses from 1,094 high school teachers in spring 2009, found that only 65% of schools offer an introductory CS course, compared with 73% in 2007 and 78% in 2005. Likewise, the number of advanced placement CS courses has also declined, with 27% in 2009, 32% in 2007, and 40% in 2005.

"There are really three big requirements for reversing the trends highlighted in the survey," said CSTA Director Chris Stephenson in an email interview. "First, we need to ensure that computer science is classified as a rigorous academic subject in K-12 and not as a 'technical' or 'vocational' subject. Second, as a community, we have to do a much better job of explaining to policymakers, parents, and students what unique skills and knowledge computer science provides and what future opportunities it offers. Finally, we have to fix the mess that is computer science teacher preparation and certification in most states, so that computer science teachers can be certified as CS teachers and not as math, science, business, or tech teachers.

"Our highly decentralized education system in the U.S. makes it very difficult to achieve systematic, long-term change," said Stephenson. "In each state, the players, the priorities, and the policies are different."

The survey's findings were not unexpected. "This is the third time we have done this survey," said Stephenson, "and so while we were not really surprised by the results, we were more troubled than ever with the continued decline in the number of courses being offered and the fact that it is getting increasingly harder for students to fit elective computer science courses into their timetables."

Tom Geller is an Oberlin, Ohio-based science, technology, and business writer.

© 2009 ACM 0001-0782/09/1000 $10.00
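The article quotes Meld's fault-tolerant, declarative behavior but shows no code. Purely as an illustration, and not actual Meld syntax, the Python sketch below evaluates one logic-style rule ("a catom is reachable if it is the anchor or touches a reachable catom") over a toy ensemble; re-running it after a catom fails shows facts derived through the dead unit simply disappearing, in the spirit Ashley-Rollman describes.

```python
# Illustrative sketch (not Meld): one stateless, declarative rule evaluated
# to a fixed point over a toy catom ensemble. Names and topology are invented.

ensemble = {  # adjacency map: catom -> set of touching catoms
    "c1": {"c2"},
    "c2": {"c1", "c3"},
    "c3": {"c2", "c4"},
    "c4": {"c3"},
}

def reachable(anchor: str, alive: set) -> set:
    """Fixed point of: reachable(X) if X is the anchor, or X touches
    some alive, reachable catom."""
    facts = {anchor} if anchor in alive else set()
    changed = True
    while changed:
        changed = False
        for catom in alive:
            if catom not in facts and ensemble[catom] & facts:
                facts.add(catom)
                changed = True
    return facts

print(reachable("c1", {"c1", "c2", "c3", "c4"}))  # all four catoms
# After c2 fails, everything derived through it vanishes on re-evaluation,
# mirroring how Meld "throws everything away from that robot."
print(reachable("c1", {"c1", "c3", "c4"}))        # only {'c1'} remains
```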
IN THE LATE 1970s the president of a company I worked with asked the company's economists and financial forecasters to provide their future market predictions for the next decade. This was a capital-intensive business with long lead times for equipment purchases, so a realistic appraisal of the likely future business environment was a pretty important thing. When the predictions arrived, they did not match what the president wanted to see—not at all.

It is not uncommon that people in positions of power are strong-willed, opinionated individuals who are used to getting their own way. The president was no exception and his response to the figures presented was quite dramatic. After some blustering, shouting, and numerous derogatory remarks directed at the professional expertise

the drive to do things is a very important attribute. Certainly, having a strong conviction that something cannot be done is usually a self-fulfilling prophecy. If people are convinced that something is not achievable, then they usually won't achieve it—if we argue for our limitations, we get to keep them. But sometimes things cannot be accomplished simply through force of will. Just because we really, really want something doesn't mean we will be able to get it—even if we are the president of a major corporation.

Infectious Conduct

Capricious behavior, particularly on the part of powerful or influential people, can be infectious. When one person in a system starts acting oddly, nearby people have two choices: to label the behavior as odd or to act like it is normal. If, for whatever reason, people act as if someone's weird actions are OK, they themselves start behaving weirdly. It is almost as if the odd behavior is catching. To compound the problem, we humans have built-in rationalizing capability that kicks in like a reflex when we act in an odd or unethical manner. This rationalization employs a thing called "cognitive dissonance" and allows us to continue to act in a weird way while simultaneously retaining the conviction that we are not acting in a weird way at all.(a)

The Project Managers: Can-Do

A friend of mine, a project manager at a large electronics company, described this behavior in the planning meetings he attends. The project managers reporting to the strong-willed and forceful vice president in charge of their division almost compete with each other to promise things to the boss and pretend they and their teams can do things they really don't think they can do at all. The boss is the instigator. He puts enormous pressure on his people and browbeats them if they come up with numbers he doesn't like. When one manager "caves" and agrees to something he (secretly and privately) doesn't think can be done, the others feel they have to do so as well. The compliant managers then get praised by the vice president, which reinforces the behavior.

Privately, the managers bemoan their fate and wring their collective managerial hands over what their boss has forced them to commit to. But, until recently, they didn't change their behavior.

The Construction Manager: First Law of Behavior

Years ago, I was coaching a (non-software) manager working in the construction industry. An affable and customer-centric person, his life was being made very difficult by his primary customer. The construction company built telephone switch centers and no matter what the manager promised and agreed to do for his customer, the customer always wanted more. In fact, it seemed like the more he gave the customer the more was wanted. The "customer is always right" approach did not seem to be working. After much discussion we decided that:

• The customer was a very strong-willed and decisive person.
• Asking for "more" is a perfectly appropriate thing for a customer to do, especially from the customer's perspective.
• The customer was asking for more because more was being provided.
• If the construction manager agreed to do more, it meant he must be able to do it.
• As a strong-willed decisive person, the customer was expecting the manager to be equally strong-willed and decisive in deciding what could not be done.
• The customer would continue piling on until the project manager said "no."

This could be called "Newton's First Law of Behavior": every behavior will continue until acted upon by another behavior. We could even extend this to Newton's Third Law and infer that the other behavior must be equal and opposite. This meant the project manager had to learn how to apply equal force in saying "no" to the extra work to balance the force the customer was applying in demanding the extra work.

Breaking the Cycle

To stop this behavior, people and organizations must somehow get out of the cycle. For the construction manager it was to learn to be firm and to realize the customer is not best served by trying to do everything. The customer is best served by most effectively doing the most important things.

For the software project managers dealing with the vice president, my friend bravely decided to break the cycle himself. After working a lot on his project's estimation practice, he vigorously defended his estimates to the vice president and simply refused to back down when pressured to reduce the projections. At one point, he even challenged the vice president to fire him if necessary. The vice president wisely chose not to do this and privately commented that it "took guts" to stand up and hold your ground like that.

Then an interesting thing started happening. Since there were dependencies operating between the division's projects, other project managers started intentionally "hooking" their project estimates and plans to my friend's project plan. Their reasoning was that since the boss doesn't mess with that project if my project has dependencies on it, he won't mess with my project either. As each project stabilized by being strongly coupled to more realistic project deadlines, everyone started calming down. Inexorably, sanity spread across the organization as people started committing only to things they really believed they could do.

Infectious Insanity

The opposite can be true too, as evidenced in the case of the financial forecasting scenario mentioned earlier. Confronted with the president's demand to "…make these numbers happen…" the Ph.D. economists returned to their financial forecasting groups and had to figure out how to ignore factors or promote factors to make it happen. One can imagine the chief economist's assistant saying "…but why would we ignore this factor, boss? We'd be crazy to do that! That's not how it works!" To which the chief economist might reply "Well, that's how it works if you want to keep your job…".

Then cognitive dissonance kicks in and people start rationalizing: "…well maybe the factor really isn't that important…" and, starting from the top, everyone starts separating from reality.

In the business of software, this cycle results in significant and perennial overcommitments. Lacking well-defined estimation practices, these commitments are simply the wishes of the strongest-willed people with the highest authority—unless the organizations and people that work for them can provide the appropriate counterresponse.

It seems that sensible behavior or weird behavior will grow within organizations rather like a disease propagates. It is an upward or a downward spiral. But we can choose the direction.

(a) Tavris, C. and Aronson, E. Mistakes Were Made (But Not by Me). Harvest Books, 2003.

Phillip G. Armour ([email protected]) is a senior consultant at Corvus International Inc., Deer Park, IL.

Copyright held by author.
Historical Reflections
Computing in the Depression Era
Insights from difficult times in the computer industry.

SINCE THE BEGINNING of the computer industry, it has been through several major recessions. The first big one was in 1969; there was a major shakeout in the mid-1980s; and most recently there was the dot-com bust in 2001. A common pattern observed in these downturns was that they occurred approximately five years after the establishment of a new computing paradigm—the launch of the IBM System/360 platform in 1964, the personal computer in the late 1970s, and the commercial Internet in 1995. These new computing modes created massive opportunities that the entrepreneurial economy rapidly supplied and then oversupplied. It then took only a small downturn in the wider economy to cause a major recession in the computing industry.

[Photo: Despite the Great Depression, IBM increases its employment, trains more salesmen, and increases engineering efforts.]

The current recession appears to be quite different from these earlier downturns. Unlike in earlier recessions, computing is not a victim of its own excess, but is suffering from the general economic malaise. Computing is not much different than any other industry in the current recession—it has no unique immunity. However, it has no unique vulnerability either, which offers a small amount of comfort.

Lessons from the Great Depression
To get some insight into what is happening today we have to look back to the Great Depression. Electronic computers did not exist at the time of the Wall Street crash of 1929, of course. But there was an office machine industry whose products—typewriters, adding machines, and accounting equipment—performed many of the tasks now done with computers. Indeed, the most successful of the early computer firms sprang from the office machine industry—IBM, NCR, Remington Rand (later Univac), and Burroughs among them.

During the depression years protectionism was seen as one of the policy options. Because this option still surfaces from time to time, it is instructive to see what happened in the 1930s. The U.S. was a net exporter of office machines, so it was not interested in protectionism in this sector, of course. Most of the drive for protection came from nations that imported office machinery—but these policies were often the result of a tit-for-tat elsewhere in the economy. The world's two biggest exporters of office machinery in the 1930s were the U.S. and Germany. Most other advanced countries, such as Britain and France, had their own office machine industries, but they found it difficult to compete at the best of times. Their response in the depression was to impose "ad valorem" import duties of 25% or 50%. In these countries import duties combined with a general lack of business investment made office machines formidably costly, and the domestic products were not always an adequate substitute. Although the import duties on office machinery may have briefly helped domestic manufacturers, retaliatory protectionist measures in other industries simply led to a downward spiral of economic activity. At the height of the depression in 1932, office machine consumption worldwide was down a staggering 60%. It is a difficult lesson, but selective protectionism did not help in the Great Depression and it won't help today.

IBM and NCR
The cases of IBM and NCR make interesting contrasts in weathering the economic storm.a NCR fared badly—it didn't go out of business, but it was not until World War II that it fully recovered. In 1928, the year before the crash, NCR was ranked the world's second largest office machine firm (after Remington Rand) with annual sales of $49 million; IBM was ranked the fourth largest with sales of less than $20 million. A decade later, and well into the recovery, NCR sales were only $37 million, whereas IBM had sales approaching $40 million and was the largest office machine supplier in the world.

The depression years 1929–1932 were a desperate time for NCR. Before the crash it had been a wonderful firm to work for. Headquartered in Dayton, Ohio, it had built a "daylight factory" set in urban parkland in the 1890s and it had pioneered in employee welfare, with all kinds of health, recreational, and cultural benefits. During the depression years NCR's sales fell catastrophically. Overseas sales, which had formerly amounted to 45% of total sales, were badly affected by protectionism. According to the then-CEO Edward Deeds, "commercial treaties, tariff barriers, trades restrictions, and money complications" took "productivity from the Dayton factory." It had to cut its work force by more than half in order to survive—from 8,600 down to 3,500 workers. At the worst of times, all that could be done was to sponsor a relief kitchen, run by the NCR Women's Club, to feed the unemployed and their families. Mirroring the fall in business, NCR's shares fell from a peak of $165 in 1928 to $6.79 in 1932. Recovery was very slow, and was only fully achieved with the coming of World War II, when NCR was put to work making armaments, analog computer bombsights, and code-breaking machinery.

The story of IBM in the Great Depression could hardly be more different than NCR's. IBM's main product line between the wars was punched card accounting equipment, which was the most "high-tech" office machinery. There were machines for punching cards and verifying cards, others for sorting and rearranging them, and the most complex machine—the tabulator—could perform sophisticated accounting procedures and report generation. Although orders for new machines fell drastically, IBM's president Thomas J. Watson, Sr. decided to maintain the manufacturing operation. Watson reasoned that rather than disband IBM's skilled design and manufacturing work force it would be more economical, as well as socially desirable, to stockpile machines for when the upturn came.

For IBM, the upturn came in 1935 when President Franklin D. Roosevelt launched the New Deal and the Social Security Administration. The new administration was hailed as the "world's biggest bookkeeping operation." IBM turned out to be the only office machine firm left that had an adequate inventory of machines to service the operation, and it supplied 400 accounting machines to the government. It was a turning point for IBM, and its profits soared for the remainder of the 1930s. Watson was justly celebrated for his faith in the future. He became a confidant of Roosevelt and chairman of the International Chamber of Commerce.

Heroic as Watson's strategy was, it would not have been possible for NCR, Remington Rand, or Burroughs to do the same. IBM had an income that was practically recession-proof. First, IBM's machines were not sold but leased. The manufacturing cost of an accounting machine was recovered in the first one or two years' rental, and after that, apart from the cost of maintenance, the machine revenue was pure profit. The accounting machines had an average life of at least 10 years, so it was an extremely profitable business. During the depression, although new orders stalled, very few firms gave up their accounting machines—not only were they dependent on the equipment, but they needed them more than ever to improve efficiency. During the depression years, while IBM did not lease many new machines, it was kept afloat by the revenues from those already in the field.

IBM's second big revenue source was the sale of punched cards. IBM enforced a rule—and got away with it until an antitrust suit in 1936—that only IBM cards could be used on IBM machines. Cards were sold for $1.40 a thousand, far more than they cost to produce. Card revenues accounted for an astounding 30% of IBM's total revenues and an even higher percentage of its profits. Because cards were a consumable, firms had to continually purchase them so long as they were still in business.

The key difference between NCR and IBM was that NCR made almost all its money from the sale of new machines, whereas IBM made its money from two sources: leasing and the sale of punched card consumables. Looking back at the NCR and IBM experiences with the benefit of hindsight, we can see it was an early incarnation of the product-versus-services business models. When they start out, product firms have the advantage that they get a very rapid return on investment. In a single sale, the firm gets all the returns it will ever get from a customer. This helps a firm to grow organically in its early years. In contrast, when a services firm takes a new order it gets a modest annual income extending over many years. This slower rate of return makes it difficult for a firm to retain profits to achieve organic growth, and it may need access to capital until it starts to generate a positive income. But the slower growth and steady income make for much less volatility. As Communications columnist Michael Cusumano noted in 2003—writing in the aftermath of the dot-com bust—the trick for survival of all firms is getting the right mix of products and services.b Products generate a rapid return on investment, but services provide a steady income that gives some immunity against recessions. It's not easy to get the balance right, but IBM did it in the 1930s.

a An excellent economic and business history of these firms is: James W. Cortada, Before the Computer: IBM, NCR, Burroughs, and Remington Rand and the Industry They Created 1865–1965, Princeton University Press, 1993.
b Michael A. Cusumano, "Finding Your Balance in the Products and Services Debate," Commun. ACM 46, 3 (Mar. 2003), 15–17.

Martin Campbell-Kelly ([email protected]) is a professor in the Department of Computer Science at the University of Warwick, where he specializes in the history of computing.

Copyright held by author.
Inside Risks
Reflections on Conficker
An insider's view of the analysis and implications of the Conficker conundrum.

IN MID-FEBRUARY 2009, something unusual, but not unprecedented, occurred in the malware defense community. Microsoft posted its fifth bounty on the heads of those responsible for one of the latest multimillion-node malware outbreaks.4 Previous bounties have included Sasser (awarded), Mydoom, MSBlaster, and Sobig. Conficker's alarming growth rate in early 2009, along with the apparent mystery surrounding its ultimate purpose, had raised enough concern among whitehat security researchers that reports were being distributed to the general public and raising concerns in Washington, D.C.

Was it all hype and of relatively small importance among an ever-increasing stream of new and sophisticated malware families? What weaknesses in the ways of the Internet had this botnet brought into focus? Why was it here and when would it end? More broadly, why do some malware outbreaks garner wide attention while other multimillion-victim epidemics (such as Seneka) receive little notice? All are fair questions, and to some degree they still remain open.

In several ways, Conficker was not fundamentally novel. The primary infiltration method used by Conficker to infect PCs around the world was known well before Conficker began to spread in late November 2008. The earliest accounts of the Microsoft Windows buffer overflow used by Conficker arose in early September 2008, and a patch to this vulnerability had been distributed nearly a month before Conficker was released. Neither was Conficker the first to introduce dynamic domain generation as a method for selecting the daily Internet rendezvous points used to coordinate its infected population. Prior malware such as Bobax, Kraken, and more recently Torpig and a few other malware families have used dynamic domain generation as well. Conficker's most recent introduction of an encrypted peer-to-peer (P2P) channel to upgrade its ability to rapidly disseminate malware binaries is also preceded by other well-established kin, Storm worm being perhaps the most well-known example.

Nevertheless, among the long history of malware epidemics, very few can claim sustained worldwide infiltration of multiple millions of infected drones. The rapidly evolving set of Conficker variants does represent the state of the…

…cessible computers. Even on those occasions when patches are diligently produced, widely publicized, and auto-disseminated by operating system (OS) and application manufacturers, Conficker demonstrates that millions of Internet-accessible machines may remain permanently vulnerable. In some cases, even security-conscious environments may elect to forgo automated software patching, choosing to trade off vulnerability exposure for some perceived notion of platform stability.7

Another lesson of Conficker is the ability of malware to manipulate the current facilities through which the Internet name space is governed. Dynamic domain generation algorithms (DGAs), along with fast flux (domain name lookups that translate to hundreds or thousands of potential IP addresses), are increasingly adopted by malware perpetrators as a retort to the growing efficiency with which whitehats have been able to behead whole botnets by quickly identifying and removing their command-and-control sites and redirecting all bot client links.
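The mechanics behind a DGA are simple enough to sketch in a few lines of C, which is part of why the technique has spread. The fragment below is a hypothetical illustration, not Conficker's actual algorithm: the generator, constants, domain count, and ".com" suffix are all invented. The point is that every infected host, knowing only today's date, derives the same list of candidate rendezvous domains.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical date-seeded domain generator (illustration only; not
     * Conficker's real algorithm or constants). Each bot computes the
     * same daily candidate domains from the date alone. */
    static uint32_t step(uint32_t *s) {
        *s = *s * 1103515245u + 12345u;    /* simple LCG, for illustration */
        return *s >> 16;                   /* high bits are better mixed */
    }

    int main(void) {
        uint32_t date_seed = 20090216u;    /* all hosts agree: Feb. 16, 2009 */
        for (int i = 0; i < 5; i++) {      /* a handful of names per day */
            uint32_t s = date_seed + (uint32_t)i * 2654435761u;
            char name[11];
            for (int j = 0; j < 10; j++)
                name[j] = (char)('a' + step(&s) % 26);
            name[10] = '\0';
            printf("%s.com\n", name);      /* candidate rendezvous domain */
        }
        return 0;
    }

Defenders who recover the generator can run the same arithmetic ahead of time, which is exactly the daily registration race described above: whoever claims or blocks tomorrow's names first wins the day.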
While not an original concept, Conficker's DGA produced a new and unique struggle between Conficker's authors and the whitehat community, who fought for control of the daily sets of domains used as Conficker's Internet…
Technology Strategy and Management
Dealing with the Venture Capital Crisis
How New Zealand and other small, isolated markets can act as "natural incubators."

THE VENTURE CAPITAL industry, like financial services in general, has fallen on hard times. Venture funds historically have returned about 20% annually to investors, twice the average of U.S. stocks. Like stocks, though, returns over the past year have been sharply negative and investing has fallen dramatically. U.S. investments have dropped from the 2000 peak of $105 billion to a low of $20 billion in 2003 and recovered only to $28 billion in 2008.1 The 2009 numbers are running at half the 2008 level. Other countries see similar trends, including Israel, which usually has the highest level of venture capital investment given the size of its economy.3 Investment there was merely $800 million in 2008, down from $2.7 billion in 2000.a

Part of the problem is that large payoffs have become increasingly scarce: Average valuations of venture-backed startups in 2008 were half those of 2007, while public offerings (IPOs) have dwindled—only six in the U.S. in 2008, compared to 86 in 2007.b But creativity may be another problem for the industry.

Venture capital firms have been investing recently in energy and the environment as well as more traditional areas, such as software, health care, biotechnology, medical devices, multimedia, and communications. But perhaps the biggest future challenge will be not the sector but the geography: VC firms put most of their money in home markets to keep a close watch on their entrepreneurs. U.S.-based VC firms also can argue that their home market offers the biggest public offerings and asset sales. But a recent survey of 700 VC firms found that 52% are now investing overseas. Nearly this same number planned to invest more in China, India, and other parts of Asia, while others want to invest in South America.2 It makes sense to explore these big, growing regions. But there is still a lot of competition. What really might jump-start the industry is more creative globalization, with an eye toward using some overseas markets as "natural incubators."

Governments and universities have long supported entrepreneurs with incubators that offer seed money, office space, financial and business advice, and introductions to potential executives and customers. But most incubators I know of have done poorly (I was an advisor to four firms in the late 1990s and worked for several venture funds). In the U.S., entrepreneurs with the best ideas usually do not need to be incubated and get funding directly from VC firms.

…testing tools, and SAP's NetWeaver) and was discovered in the 1980s and 1990s by the VC community as well as American and European multinationals. Also, the U.S. provides nearly $3 billion a year in aid to Israel. This, along with large defense and security spending, helps fuel demand for high technology. Another small, advanced market is Finland (population five million). But…

…on London's AIM exchange and reported revenues in the last fiscal year of about US$30 million, with customers in 30 countries.
• Framecad (http://www.framecad.com) sells machinery and design software as well as consulting services that enable construction companies to create small or medium-sized steel-framed buildings. These are relatively inexpensive and do not require much skilled labor to construct. In addition to New Zealand, the company has found customers in developing economies such as China, the Middle East, and Southeast Asia.
• GFG Group (http://www.gfg-group.com) provides off-the-shelf electronic payment solutions (credit and debit cards, mobile phone applications, ATM and POS applications) and related services to 115 million customers in 40 countries. Most of its business outside New Zealand is in Australia, Singapore, the Philippines, and the United Arab Emirates.
• Inro Technologies (http://www.inro.co.nz) sells robotics technology to retrofit manual forklifts and turn them into automated vehicles. Fonterra, New Zealand's largest firm and a major exporter of dairy products, invested because finding forklift drivers is difficult and expensive. It is even more expensive to build new automated warehouses from scratch.
• Methodware (http://www.methodware.com) provides customized risk management and internal audit software to financial services, energy, and utilities companies as well as the public sector. It originally targeted small and medium-size firms but through partnerships has been able to sell to 1,800 corporate clients in 80 countries.
• NextWindow (http://www.nextwindow.com) sells touch-screen computer displays and overlay touch-screen devices and software, initially for large screens and kiosks. It has grown very rapidly through international distribution partnerships and alliances with PC manufacturers around the world, especially in the U.S.
• OpenCloud (http://www.opencloud.com) sells a suite of real-time Java application servers (Rhino) as well as provides development tools and consulting for companies interested in building multimedia products and services, especially for mobile phones. It moved its headquarters to the U.K. to gain better access to customers.
• Smallworlds (http://www.smallworlds.com) is a 3D "virtual world" and social networking site that enables users to set up their own room spaces and then do instant messaging as well as share audio and video content or play games with their friends. Outside companies can use the site and tools as a development platform. The applications run inside a browser and do not require large (and slow) software downloads.
• Weta Workshop (http://www.wetaworkshop.com) is one of the largest video-effects design and production companies in the world serving the movie industry and now branching out into other markets, such as animation for children's TV and technology for video game producers. It is best known for video effects in the Lord of the Rings movies (which were shot in New Zealand). It has won five Academy Awards for visual effects, costumes, and makeup.
• Wherescape (http://www.wherescape.com) combines unique in-house tools with an agile development methodology to build data warehouses quickly and cheaply for a variety of industries. It has hundreds of customers, mainly in New Zealand and Australia, but also works with partners around the world.
• Virtual Infrastructure Professionals (http://www.vipl.co.nz) provides custom-built virtualization, hosting, and disaster recovery solutions (using mostly VMWare and Citrix). It partners with most of the major software and hardware producers, but does most of its business in New Zealand.

These firms seemed very interested in exports, though many lack capital and experienced difficulties growing beyond a certain size. Some companies succeeded only because there was little international competition. The critical decision for government as well as private investors is to determine—before they put in too much time and money—which firms can export in volume. "Horizontal products" that can be sold to almost anyone in any market because they represent common needs, are highly standardized, and do not need customization or specialized knowledge are the easiest to scale and export. But these businesses attract the most competition (see my July 2003 Communications column, "Beware the Lure of the Horizontal"). The easiest markets to gain a foothold in are "vertical services," such as custom-built applications or specialized services for a particular industry. But labor-intensive or skill-dependent work is difficult to scale and more difficult to export—which is why having an incubator and time to experiment can be important.

The population, physical characteristics, or other unique local requirements of a country can also inspire entrepreneurial creativity. For example, New Zealand has a severe shortage of people, so we see firms use software and other technologies to foster automation (such as retrofit forklifts) and devise inexpensive, fast solutions to common problems (building construction, data warehouse design, virtual hosting, electronic payments). Other firms take advantage of New Zealand's breathtaking scenery and creative art communities. But perhaps most important is for VC firms as well as entrepreneurs investing in small markets to think big—and follow the lead of Nokia or one of the 70 or so Israeli companies that have been listed on the NASDAQ stock exchange (more than any other foreign country). With the proper level of ambition, talent, and opportunity, even a small, isolated company can turn the world into its market.

a Israel Venture Capital Research Center; http://www.ivc-online.com/
b See the VC Industry Statistics Archive from the National Venture Capital Association at http://www.nvca.org/

References
1. Cain Miller, C. What will fix the venture capital crisis? The New York Times (May 4, 2009); http://bits.blogs.nytimes.com/2009/05/04/what-will-fix-the-venture-capital-crisis/; and Sales of startups plummet, along with prices. The New York Times (Apr. 1, 2009); http://bits.blogs.nytimes.com/2009/04/01/sales-of-start-ups-plummet-along-with-prices/
2. Global Economic Downturn Driving Evolution of Venture Capital Industry. National Venture Capital Association Press Release (June 10, 2009); www.nvca.org
3. Megginson, W.L. Towards a global model of venture capital? Journal of Applied Corporate Finance 16 (2004), 89–107.
4. New Zealand Private Equity and Venture Capital Association and Ernst & Young. The New Zealand Private Equity and Venture Capital Monitor 2008; http://www.nzvca.co.nz/Shared/Content/Documents/

Michael Cusumano ([email protected]) is Sloan Management Review Distinguished Professor of Management and Engineering Systems at the MIT Sloan School of Management and School of Engineering in Cambridge, MA.

Copyright held by author.
Kode Vicious
Kode Reviews 101
A review of code review do's and don'ts.

Dear KV,
My company recently went through a round of layoffs, and at the same time a lot of people left the company voluntarily. While trying to pick up the pieces, we've discovered that there are just some components of our system that no one understands. Now management is considering hiring back some folks as "consultants" to try to fix this mess. It's not like the code is uncommented or spaghetti; it's just that there are bits of it that no one remaining with the company understands. It all seems pretty stupid and wasteful, but perhaps I'm just a bit grumpy because I didn't get a nice farewell package and instead have to clean up the mess.
Holding the Bag

Dear Holding,
Maybe you should quit and see if you get hired back as a consultant; I hear they get really good rates! Maybe that's not the right advice here. I meant to say, "Welcome to the latest round of recession," wherein companies that grew bloated in good times try to grow lean in bad times, but realize they can't shed all the useless pounds they thought they could. In my career this is round three of this wheel of fortune, and I am sure it will not be the last.

The best way to make sure that everyone on a team, or at least that enough people in a group or company, are able to maintain a significant piece of software is to institute system code reviews and to beat senseless anyone who does not attend. Right now, that advice is a bit like closing the barn door after the train has left the station, or…whatever. It does bring me to a few things I would like to say about how to do a proper code review, which is something I don't think most people ever learn how to do—and certainly most programmers would not learn to do this if it were a choice.

There are three phases to any code review: preparation, the review, and afterward. One of the things most people miss when they call for—or in the case of managers, order—a code review is that it is unproductive just to shove a bunch of people in a room and show them unfamiliar code. When I say unproductive, what I mean is that it is a teeth-grinding, head-banging-on-the-desk, vein-pulsing-in-the-head kind of experience. I've stopped such code reviews after 10 minutes when I realized no one in the room had read a single line of the code before they showed up in the meeting room. Please, to preserve life and sanity, prepare for a code review.

Preparations for a code review include selecting a constrained section of the code to look at, informing everyone which piece of code you have picked, and telling them that if they don't read the code before the appointed meeting time, you will be very displeased. When I say constrained, I do not normally mean a single page of code, nor do I mean a set of 30 files. Balance, as in all things KV talks about, is important to the process. If you're reviewing a particularly nasty bit of code—for example, a job scheduler written by someone who is no longer with the company—then you're going to want to take smaller chunks for each review. The reason for the smaller chunks is that the person who wrote it is not present to explain it to you. A code review without a guide takes about two to three times as long as one conducted with the author of the code.

The next step is to schedule a time to review the code. Do not review code directly after lunch. KV once worked for someone who would schedule meetings at 2 p.m., when lunch was normally from noon to 1 p.m. I never failed to snore in his meetings. Code reviews should be done when people have plenty of brainpower and energy because reading someone else's code normally takes more energy than reading or writing your own: it's extremely difficult to get into someone else's mind-set. Trying to adjust your way of thinking to that of some other programmer takes quite an effort, if only to keep yourself from beating on that other programmer for being so apparently foolish. When I call a code review, it is either early in the day or two to three hours after lunch. Do not perform a code review for more than two hours. There is no such thing as a productive four-hour meeting, except for management types who equate the number of hours they've blathered on with the amount of work they've done.

Now for the review itself. Providing coffee and food is probably a good idea. Food has the effect of making sure people show up, and coffee has the effect of keeping them awake. The person leading the code review, who may or may not be the author of the code, should give a short (no more than 10- to 15-minute) introduction to the code to be reviewed. Make sure to keep the person focused on the code being reviewed. Letting a programmer wax poetic leads to poor poetry and to wishing that, like Ulysses, you had wax in your ears. Once the introduction is complete, you can walk through any header files or places where data structures, base classes, and other elements are defined. With the basics of the code covered, you can move on to the functions or methods and review those next.

One of the most difficult challenges in a code review is to avoid getting distracted by minutiae. Many people like to point out every spelling and grammatical error, as if they're scoring points on a test. Such people are simply distracting the group from understanding the overall structure of the code and possibly digging for deeper problems. The same is true for language fascists, who feel they need to quote the language standard, chapter and verse, as if an ANSI standard is a holy book. Both of these types of issues should be noted, quickly, and then you should move on. Do not dwell here, for it is here that you will lose your way and be dashed upon the rocks by the Scylla of spelling and the Charybdis of syntax.
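To make the distinction concrete, here is an invented fragment of the sort a review might see; nothing about it comes from a real review. The misspelled parameter is worth a two-second note, while the real find is the loop bound:

    /* Invented review example. The misspelling ("lenght") is minutiae:
     * note it and move on. The structural bug is the loop condition,
     * which reads one element past the end of the array. */
    int find_max(const int *vals, int lenght) {
        int max = vals[0];
        for (int i = 1; i <= lenght; i++)   /* bug: should be i < lenght */
            if (vals[i] > max)
                max = vals[i];
        return max;
    }

A group that spends its two hours on the spelling will walk out having "reviewed" the code and missed the out-of-bounds read.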
As in any other engineering endeavor, someone will need to take notes on what problems or issues were found in the review. It is infuriating in the extreme to review the same piece of code twice in six months and to find the same issues, all because everyone was too lazy to write down the issues. A whiteboard or flip chart is fine for this, and in a pinch you might be able to trust the author of the code to do this for you. I would follow the latter path only if I trusted the author of the code, because programmers are generally lazy and will subconsciously avoid work. It's not even that they'll know they left off something to fix, but three months later you'll say to them, "Wait, we told you to fix this. Why isn't this fixed?!" To which you will receive curious looks accompanied by "What? This? Oh, you meant this?" If it's an issue, write it down while the group is thinking about it, and go over the list at the end of the review.

When the review is over, there is still work to be done. Of course, someone has to fix all the spelling, grammatical, and language conformance issues, as well as the genuine bugs, but that's not all. Copies of the notes should be distributed to all participants, just in case something happens to the author of the code, and the marked-up copies of the code should be kept somewhere for future reference. There are now some tools that will handle this electronically. Google has a code-review tool called Rietveld for code kept in the Subversion source-code control system. Although an electronic system for recording and acting on code-review issues is an excellent tool, it is not a substitute for formal code reviews where you discuss the design, as well as the implementation, of a piece of code. A compiler reads your code, but it doesn't understand its purpose or design; for the moment, only a person can do that.
KV

Related articles on queue.acm.org
A Conversation with Steve Bourne, Eric Allman, and Bryan Cantrill
http://queue.acm.org/detail.cfm?id=1454460
Kode Vicious: The Return
http://queue.acm.org/detail.cfm?id=1039521
The Yin and Yang of Software Development
http://queue.acm.org/detail.cfm?id=1388787

George V. Neville-Neil ([email protected]) is the proprietor of Neville-Neil Consulting and a member of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.

Copyright held by author.
Viewpoint
Retrospective: An Axiomatic Basis for Computer Programming
C.A.R. Hoare revisits his past Communications article on the axiomatic approach to programming and uses it as a touchstone for the future.

THIS MONTH MARKS the 40th anniversary of the publication of the first article I wrote as an academic.a I have been invited to give my personal view of the advances that have been made in the subject since then, and the further advances that remain to be made. Which of them did I expect, and which of them surprised me?

Retrospective (1969–1999)
My first job (1960–1968) was in the computer industry; and my first major project was to lead a team that implemented an early compiler for ALGOL 60. Our compiler was directly structured on the syntax of the language, so elegantly and so rigorously formalized as a context-free language. But the semantics of the language was even more important, and that was left informal in the language definition. It occurred to me that an elegant formalization might consist of a collection of axioms, similar to those introduced by Euclid to formalize the science of land measurement. My hope was to find axioms that would be strong enough to enable programmers to discharge their responsibility to write correct and efficient programs. Yet I wanted them weak enough to permit efficient implementation on any of the widely varying hardware architectures prevalent at the time.

I expected that research into the axiomatic method would occupy me for my entire working life; and I expected that its results would not find widespread practical application in industry until after I reached retirement age. These expectations led me in 1968 to move from an industrial to an academic career. And when I retired in 1999, both the positive and the negative expectations had been entirely fulfilled.

The main attraction of the axiomatic method was its potential provision of an objective criterion of the quality of a programming language, and the ease with which programmers could use it. For this reason, I appealed to academic researchers engaged in programming language design to help me in the research. The latest response comes from hardware designers, who are using axioms in anger (and for the same reasons as given above) to define the properties of modern multicore chips with weak memory consistency.

One thing I got spectacularly wrong. I could see that programs were getting larger, and I thought that testing would be an increasingly ineffective way of removing errors from them. I did not realize that the success of tests is that they test the programmer, not the program. Rigorous testing regimes rapidly persuade error-prone programmers (like me) to remove themselves from the profession. Failure in test immediately punishes any lapse in programming concentration, and (just as important) the failure count enables implementers to resist management pressure for premature delivery of unreliable code. The experience, judgment, and intuition of programmers who have survived the rigors of testing are what make programs of the present day useful, efficient, and (nearly) correct. Formal methods for achieving correctness must support the intuitive judgment of programmers, not replace it.

My basic mistake was to set up proof in opposition to testing, where in fact both of them are valuable and mutually supportive ways of accumulating evidence of the correctness and serviceability of programs. As in other branches of engineering, it is the responsibility of the individual software engineer to use all available and practicable methods, in a combination adapted to the needs of a particular project, product, client, or environment. The best contribution of the scientific researcher is to extend and improve the methods available to the engineer, and to provide convincing evidence of their range of applicability. Any more direct advocacy of personal research results actually excites resistance from the engineer.

Progress (1999–2009)
On retirement from University, I accepted a job offer from Microsoft Research in Cambridge (England). I was surprised to discover that assertions, sprinkled more or less liberally in the program text, were used in development practice, not to prove correctness of programs, but rather to help detect and diagnose programming errors. They are evaluated at runtime during overnight tests, and indicate the occurrence of any error as close as possible to the place in the program where it actually occurred. The more expensive assertions were removed from customer code before delivery. More recently, the use of assertions as contracts between one module of program and another has been incorporated in Microsoft implementations of standard programming languages. This is just one example of the use of formal methods in debugging, long before it becomes possible to use them in proof of correctness.
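A minimal sketch of the practice described above, with assertions serving as runtime checks that pinpoint where a contract is first violated rather than as proof obligations. The function and its contract here are invented for illustration:

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical development-time assertions: each check documents part
     * of the function's contract and, during an overnight test run, halts
     * at the first point where the contract fails. Compiling with NDEBUG
     * defined strips the checks from code shipped to customers. */
    static double mean(const double *samples, size_t n) {
        assert(samples != NULL);   /* precondition: a valid array */
        assert(n > 0);             /* precondition: at least one sample */
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += samples[i];
        return sum / (double)n;    /* safe: n > 0 was asserted above */
    }

    int main(void) {
        double xs[3] = {1.0, 2.0, 3.0};
        return mean(xs, 3) == 2.0 ? 0 : 1;   /* exercises the contract */
    }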
In 1969, my proof rules for programs were devised to extract easily from a well-asserted program the mathematical "verification conditions," the proof of which is required to establish program correctness.
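As a reminder of how such conditions arise (a standard textbook instance, not an example drawn from this article), the assignment axiom obtains a precondition by substituting the assigned expression into the postcondition:

    \[ \{P[e/x]\}\; x := e \;\{P\} \]

    For the command $x := x + 1$ annotated with precondition $x < N$ and
    postcondition $x \le N$, the extracted verification condition is

    \[ x < N \;\Rightarrow\; x + 1 \le N, \]

an arithmetic fact over the integers that a standard prover discharges instantly.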
I expected that these conditions would be proved by the reasoning methods of standard logic, on the basis of standard axioms and theories of discrete mathematics. What has happened in recent years is exactly the opposite of this, and even more interesting. New branches of applied discrete mathematics have been developed to formalize the programming concepts that have been introduced since 1969 into standard programming languages (for example, objects, classes, heaps, pointers). New forms of algebra have been discovered for application to distributed, concurrent, and communicating processes. New forms of modal logic and abstract domains, with carefully restricted expressive power, have been invented to simplify human and mechanical reasoning about programs. They include the dynamic logic of actions, temporal logic, linear logic, and separation logic. Some of these theories are now being reused in the study of computational biology, genetics, and sociology.

Equally spectacular (and to me unexpected) progress has been made in the automation of logical and mathematical proof. Part of this is due to Moore's Law. Since 1969, we have seen steady exponential improvements in computer capacity, speed, and cost, from megabytes to gigabytes, from megahertz to gigahertz, and from megabucks to kilobucks. There has also been at least a thousand-fold increase in the efficiency of algorithms for proof discovery and counterexample (test case) generation. Crudely multiplying these factors, a trillion-fold improvement has brought us over a tipping point, at which it has become easier (and certainly more reliable) for a researcher in verification to use the available proof tools than not to do so. There is a prospect that the activities of a scientific user community will give back to the tool-builders a wealth of experience, together with realistic experimental and competition material, leading to yet further improvements of the tools.

For many years I used to speculate about the eventual way in which the results of research into verification might reach practical application. A general belief was that some accident or series of accidents involving loss of life, perhaps followed by an expensive suit for damages, would persuade software managers to consider the merits of program verification.

This never happened. When a bug occurred, like the one that crashed the maiden flight of the Ariane V spacecraft in 1996, the first response of the manager was to intensify the test regimes, on the reasonable grounds that if the erroneous code had been exercised on test, it would have been easily corrected before launch. And if the issue ever came to court, the defense of "state-of-the-art" practice would always prevail. It was clearly a mistake to try to frighten people into changing their ways. Far more effective is the incentive of reduction in cost. A recent report from the U.S. Department of Commerce has suggested that the cost of programming error to the world economy is measured in tens of billions of dollars per year, most of it falling (in small but frequent doses) on the users of software rather than on the producers.

The phenomenon that triggered interest in software verification from the software industry was totally unpredicted and unpredictable. It was the attack of the hacker, leading to an occasional shutdown of worldwide commercial activity, costing an estimated $4 billion on each occasion. A hacker exploits vulnerabilities in code that no reasonable test strategy could ever remove (perhaps by provoking race conditions, or even bringing dead code cunningly to life). The only way to reach these vulnerabilities is by automatic analysis of the text of the program itself. And it is much cheaper, whenever possible, to base the analysis on mathematical proof, rather than to deal individually with a flood of false alarms. In the interests of security and safety, other industries (automobile, electronics, aerospace) are also pioneering the use of formal tools for programming. There is now ample scope for employment of formal methods researchers in applied industrial research.

Prospective (2009–)
In 1969, I was afraid industrial research would dispose of such vastly superior resources that the academic researcher would be well advised to withdraw from competition and move to a new area of research. But again, I was wrong. Pure academic research and applied industrial research are complementary, and should be pursued concurrently and in collaboration. The goal of industrial research is (and should always be) to pluck the "low-hanging fruit"; that is, to solve the easiest parts of the most prevalent problems, in the particular circumstances of here and now. But the goal of the pure research scientist is exactly the opposite: it is to construct the most general theories, covering the widest possible range of phenomena, and to seek certainty of knowledge that will endure for future generations. It is to avoid the compromises so essential to engineering, and to seek ideals like accuracy of measurement, purity of materials, and correctness of programs, far beyond the current perceived needs of industry or popularity in the marketplace. For this reason, it is only scientific research that can prepare mankind for the unknown unknowns of the forever uncertain future.

So I believe there is now a better scope than ever for pure research in computer science. The research must be motivated by curiosity about the fundamental principles of computer programming, and the desire to answer the basic questions common to all branches of science: what does this program do; how does it work; why does it work; and what is the evidence for believing the answers to all these questions? We know in principle how to answer them. It is the specification that describes what a program does; it is assertions and other internal interface contracts between component modules that explain how it works; it is programming language semantics that explains why it works; and it is mathematical and logical proof, nowadays constructed and checked by computer, that ensures mutual consistency of specifications, interfaces, programs, and their implementations.

There are grounds for hope that progress in basic research will be much faster than in the early days. I have already described the vastly broader theories that have been proposed to understand the concepts of modern programming. I have welcomed the enormous increase in the power of automated tools for proof. The remaining opportunity and obligation for the scientist is to conduct convincing experiments, to check whether the tools, and the theories on which they are based, are adequate to cover the vast range of programs, design patterns, languages, and applications of today's computers. Such experiments will often be the rational reengineering of existing realistic applications. Experience gained in the experiments is expected to lead to revisions and improvements in the tools, and in the theories on which the tools were based. Scientific rivalry between experimenters and between tool builders can thereby lead to an exponential growth in the capabilities of the tools and their fitness to purpose. The knowledge and understanding gained in worldwide long-term research will guide the evolution of sophisticated design automation tools for software, to match the design automation tools routinely available to engineers of other disciplines.

The End
No exponential growth can continue forever. I hope progress in verification will not slow down until our programming theories and tools are adequate for all existing applications of computers, and for supporting the continuing stream of innovations that computers make possible in all aspects of modern life. By that time, I hope the phenomenon of programming error will be reduced to insignificance: computer programming will be recognized as the most reliable of engineering disciplines, and computer programs will be considered the most reliable components in any system that includes them.

Even then, verification will not be a panacea. Verification technology can only work against errors that have been accurately specified, with as much accuracy and attention to detail as all other aspects of the programming task. There will always be a limit at which the engineer judges that the cost of such specification is greater than the benefit that could be obtained from it, and that testing will be adequate for the purpose, and cheaper. Finally, verification cannot protect against errors in the specification itself. All these limits can be freely acknowledged by the scientist, with no reduction in enthusiasm for pushing back the limits as far as they will go.

a Hoare, C.A.R. An axiomatic basis for computer programming. Commun. ACM 12, 10 (Oct. 1969), 576–580.

C.A.R. Hoare ([email protected]) is a principal researcher at Microsoft Research in Cambridge, U.K., and Emeritus Professor of Computing at Oxford University.

Copyright held by author.
Probing Biomolecular Machines with Graphics Processors

COMPUTER SIMULATION HAS become an integral part of the study of the structure and function of biological molecules. For years, parallel computers have been used to conduct these computationally demanding simulations and to analyze their results. These simulations function as a "computational microscope," allowing the scientist to observe details of molecular processes too small, fast, or delicate to capture with traditional instruments. Over time, commodity GPUs (graphics processing units) have evolved into massively parallel computing devices, and more recently it has become possible to program them… …to desktop computers, making them accessible to application scientists lacking experience with clustering, queuing systems, and the like.

This article is based on our experiences developing software for use by and in cooperation with scientists, often graduate students, with backgrounds in physics, chemistry, and biology. Our programs, NAMD18 and VMD10 (Visual Molecular Dynamics), run on computer systems ranging from laptops to supercomputers and are used to model proteins, nucleic acids, and lipids at the atomic level in order to understand how protein structure enables cellular functions such as catalyzing reactions, harvesting sunlight, generating forces, and sculpting membranes (for additional scientific applications, see http://www.ks.uiuc.edu/). In 2007 we began working with the Nvidia CUDA (Compute Unified Device Architecture) system for general-purpose graphics processor programming to bring the power of many-core computing to practical scientific applications.22
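To give a flavor of the programming model (a generic sketch of our own, not code from NAMD or VMD), CUDA turns a loop over independent data elements into a kernel executed by thousands of lightweight threads, one element per thread:

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* Generic CUDA sketch: scale an array on the GPU. Each thread
     * handles one element; blocks of 256 threads tile the array. */
    __global__ void scale(float *x, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global index */
        if (i < n)                                      /* guard the last block */
            x[i] *= a;
    }

    int main(void) {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *h = (float *)malloc(bytes), *d;
        for (int i = 0; i < n; i++) h[i] = 1.0f;
        cudaMalloc((void **)&d, bytes);
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
        scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);    /* 4,096 blocks */
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
        printf("h[0] = %f\n", h[0]);                    /* prints 2.000000 */
        cudaFree(d);
        free(h);
        return 0;
    }

The same pattern, one thread per data element, is what makes many-core GPUs attractive for the simulations described below.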
Bottom-up Biology
If one were to design a system to safeguard critical data for thousands of years, it would require massive redundancy, self-replication, easily replaceable components, and easily interpreted formats. These are the same challenges faced by our genes, which build around themselves cells, organisms, populations, and entire species for the sole purpose of continuing their own survival. The DNA of every cell contains both data (the amino acid sequences of every protein required for life) and metadata (large stretches of "junk" DNA that interact with hormones to control whether a sequence is exposed to the cell's protein expression machinery or hidden deep inside the coils of the chromosome).

The protein sequences of life, once expressed as a one-dimensional chain of amino acids by the ribosome, then fold largely unaided into the unique three-dimensional structures required for their functions. The same protein from different species may have similar structures despite greatly differing sequences. Protein sequences have been selected for the capacity to fold, as random chains of amino acids do not self-assemble into a unique structure in a reasonable time. Determining the folded structure of a protein based only on its sequence is one of the great challenges in biology, for while DNA sequences are known for entire organisms, protein structures are available only through the painstaking work of crystallographers.

Simply knowing the folded structure of a protein is not enough to understand its function. Many proteins serve a mechanical role of generating, transferring, or diffusing forces and torques. Others control and catalyze chemical reactions, efficiently harnessing and expending energy obtained from respiration or photosynthesis. While the amino acid chain is woven into a scaffold of helices and sheets, and some components are easily recognized, there are no rigid shafts, hinges, or gears to simplify the analysis.

To observe the dynamic behavior of proteins and larger biomolecular aggregates, we turn to a computational microscope in the form of a molecular dynamics simulation.
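In outline, each step of such a simulation evaluates the forces on every atom and then advances Newton's equations of motion by a tiny time increment. The toy CUDA kernel below is our illustration, not NAMD code: it uses a bare all-pairs Coulomb term, unit masses, and a simple Euler update, where a production code uses the full bonded and nonbonded force field, cutoff lists with fast long-range solvers, and a velocity Verlet integrator.

    /* Toy molecular dynamics step (illustration only, not NAMD code).
     * One thread per atom accumulates the Coulomb force from all other
     * atoms -- the naive all-pairs method -- then advances the atom by
     * one small time step, assuming unit mass. */
    __global__ void md_step(const float4 *pos, float4 *pos_out,
                            float3 *vel, int n, float dt, float k) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float4 pi = pos[i];                          /* x, y, z, charge */
        float fx = 0.f, fy = 0.f, fz = 0.f;
        for (int j = 0; j < n; j++) {                /* every other atom */
            if (j == i) continue;
            float4 pj = pos[j];
            float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
            float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;    /* avoid r = 0 */
            float s = k * pi.w * pj.w * rsqrtf(r2) / r2; /* k*qi*qj / r^3 */
            fx += s * dx; fy += s * dy; fz += s * dz;
        }
        vel[i].x += dt * fx; vel[i].y += dt * fy; vel[i].z += dt * fz;
        /* Write to a second buffer so other threads still read the old
         * positions; the two buffers are swapped between launches. */
        pos_out[i] = make_float4(pi.x + dt * vel[i].x,
                                 pi.y + dt * vel[i].y,
                                 pi.z + dt * vel[i].z, pi.w);
    }

Everything the next paragraphs describe, from the force components to femtosecond time steps and methods faster than all-pairs, refines some piece of this naive sketch.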
be constructed for any given protein, in- nearly identical initial conditions will
cluding bonded, electrostatic, and van quickly diverge, but over time their aver-
der Waals components. Newton’s laws age properties will converge if the pos-
of motion then describe the dynamics sible conformations of the system are
of the protein over time. When experi-
mental observation is insufficient in If one were to sufficiently well sampled. To guarantee
that an expected transition is observed,
resolving power, with the computer we
have a perfect view of this simple and
design a system to it is often necessary to apply steering
forces to the calculation. Analysis is per-
limited model. safeguard critical formed, both during the calculation and
Is it necessary to simulate every atom
in a protein to understand its function?
data for thousands later, on periodically saved snapshots to
measure average properties of the sys-
Answering no would require a complete of years, it would tem and to determine which transitions
knowledge of the mechanisms involved,
in which case the simulation could pro-
require massive occur with what frequency.
As the late American mathematician
duce little new insight. Proteins are not redundancy, R. W. Hamming said, “The purpose of
designed cleanly from distinct compo-
nents but are in a sense hacked together self-replication, computing is insight, not numbers.”
Simulations would spark little insight if
from available materials. Rising above easily replaceable scientists could not see the biomolecu-
the level of atoms necessarily abandons
some detail, so it is best to reserve this components, and lar model in three dimensions on the
computer screen, rotate it in space, cut
for the study of aggregate-level phenom-
ena that are otherwise too large or slow
easily interpreted away obstructions, simplify representa-
tions, incorporate other data, and ob-
to simulate. formats. These serve its behavior to generate hypothe-
Tracking the motions of atoms re-
quires advancing positions and veloci-
are the same ses. Once a mechanism of operation for
a protein is proposed, it can be tested by
ties forward through millions or bil- challenges faced both simulation and experiment, and
lions of femtosecond (10–15 second)
time steps to simulate nanoseconds or by our genes. the details refined. Excellent visual rep-
resentation is then needed to an equal
microseconds of simulated time. Simu- extent to publicize and explain the dis-
lation sizes range from a single protein covery to the biomedical community.
in water with fewer than 100,000 atoms
to large multicomponent structures Biomedical Users
of 1–10 million atoms. Although every We have more than a decade of experi-
atom interacts with every other atom, ence guiding the development of the
numerical methods have been devel- NAMD (http://www.ks.uiuc.edu/Re-
oped to calculate long-range interac- search/namd/) and VMD (http://www.
tions for N atoms with order N or N log ks.uiuc.edu/Research/vmd/) programs
N rather than N2 complexity. for the simulation, analysis, and visual-
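To make the time stepping concrete, here is one velocity Verlet update, a standard molecular dynamics integrator. This is a sketch only: the article does not state which integrator NAMD uses, and compute_forces is an assumed placeholder for the force model described above.

// Assumed force routine: fills f with bonded, electrostatic, and
// van der Waals forces for the current positions x.
void compute_forces(int n, const double *x, double *f);

// One velocity Verlet step for n atoms. x, v, f are arrays of length
// 3*n; dt is on the order of one femtosecond (1e-15 s).
void verlet_step(int n, double dt, const double *mass,
                 double *x, double *v, double *f)
{
    for (int i = 0; i < n; i++)
        for (int k = 0; k < 3; k++) {
            v[3*i+k] += 0.5 * dt * f[3*i+k] / mass[i];  // half-step kick
            x[3*i+k] += dt * v[3*i+k];                  // drift
        }

    compute_forces(n, x, f);   // forces at the new positions

    for (int i = 0; i < n; i++)
        for (int k = 0; k < 3; k++)
            v[3*i+k] += 0.5 * dt * f[3*i+k] / mass[i];  // second half kick
}

Advancing a million-atom system one microsecond at this step size means on the order of a billion such updates, which is why the force evaluation inside the loop dominates the cost.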
Before a molecular dynamics simulation can begin, a model of the biomolecular system must be assembled in as close to a typical state as possible. First, a crystallographic structure of any proteins must be obtained (pdb.org provides a public archive), missing details filled in by comparison with other structures, and the proteins embedded in a lipid membrane or bound to DNA molecules as appropriate. The entire complex is then surrounded by water molecules and an appropriate concentration of ions, located to minimize their electrostatic energy. The simulation must then be equilibrated at the proper temperature and pressure until the configuration stabilizes.

Processes at the atomic scale are stochastic, driven by random thermal fluctuations across barriers in the energy landscape. Simulations starting from nearly identical initial conditions will quickly diverge, but over time their average properties will converge if the possible conformations of the system are sufficiently well sampled. To guarantee that an expected transition is observed, it is often necessary to apply steering forces to the calculation. Analysis is performed, both during the calculation and later, on periodically saved snapshots to measure average properties of the system and to determine which transitions occur with what frequency.
As the late American mathematician R. W. Hamming said, "The purpose of computing is insight, not numbers." Simulations would spark little insight if scientists could not see the biomolecular model in three dimensions on the computer screen, rotate it in space, cut away obstructions, simplify representations, incorporate other data, and observe its behavior to generate hypotheses. Once a mechanism of operation for a protein is proposed, it can be tested by both simulation and experiment, and the details refined. Excellent visual representation is then needed to an equal extent to publicize and explain the discovery to the biomedical community.

These simulations function as a "computational microscope," allowing the scientist to observe details of molecular processes too small, fast, or delicate to capture with traditional instruments. Over time, commodity GPUs (graphics processing units) have evolved into massively parallel computing devices, and more recently it has become possible to program them.

Biomedical Users
We have more than a decade of experience guiding the development of the NAMD (http://www.ks.uiuc.edu/Research/namd/) and VMD (http://www.ks.uiuc.edu/Research/vmd/) programs for the simulation, analysis, and visualization of large biomolecular systems. The community of scientists that we serve numbers in the tens of thousands and circles the globe, ranging from students with only a laptop to leaders of their fields with access to the most powerful supercomputers and graphics workstations. Some are highly experienced in the art of simulation, while many are primarily experimental researchers turning to simulation to help explain their results and guide future work.

The education of the computational scientist is quite different from that of the scientifically oriented computer scientist. Most start out in physics or another mathematically oriented field and learn scientific computing informally from their fellow lab mates and advisors, originally in Fortran 77 and today in Matlab. Although skilled at solving complex problems, they are seldom taught any software design process or the reasons to prefer one solution to another. Some go for years in this environment before being introduced to revision-control systems, much less automated unit tests.

As software users, scientists are similar to programmers in that they are comfortable adapting examples to suit their needs and working from documentation. The need to record and repeat computations makes graphical interfaces usable primarily for interactive exploration, while batch-oriented input and output files become the primary artifacts of the research process.

One of the great innovations in scientific software has been the incorporation of scripting capabilities, at first rudimentary but eventually in the form of general-purpose languages such as Tcl and Python. The inclusion of scripting in NAMD and VMD has blurred the line between user and developer, exposing a safe and supportive programming language that allows the typical scientist to automate complex tasks and even develop new methods. Since no recompilation is required, the user need not worry about breaking the tested, performance-critical routines implemented in C++. Much new functionality in VMD has been developed by users in the form of script-based plug-ins, and C-based plug-in interfaces have simplified the development of complex molecular structure analysis tools and readers for dozens of molecular file formats.

Scientists are quite capable of developing new scientific and computational approaches to their problems, but it is unreasonable to expect the biomedical community to extend their interest and attention so far as to master the ever-changing landscape of high-performance computing. We seek to provide users of NAMD and VMD with the experience of practical supercomputing, in which the skills learned with toy systems running on a laptop remain of use on both the departmental cluster and national supercomputer, and the complexities of the underlying parallel decomposition are hidden. Rather than a fearful and complex instrument, the supercomputer now becomes just another tool to be called upon as the user's work requires.

Figure 1. The recently constructed Lincoln GPU cluster at the National Center for Supercomputing Applications contains 1,536 CPU cores, 384 GPUs, 3TB of memory, and achieves an aggregate peak floating-point performance of 47.5TFLOPS.

Given the expense and limited availability of high-performance computing hardware, we have long sought better options for bringing larger and faster simulations to the scientific masses. The last great advance in this regard was the evolution of commodity-based Linux clusters from cheap PCs on shelves into the dominant platform today. The next advance, practical acceleration, will require a commodity technology with strong commercial support, a sustainable performance advantage over several generations, and a programming model that is accessible to the skilled scientific programmer. We believe that this next advance is to be found in 3D graphics accelerators inspired by public demand for visual realism in computer games.

GPU Computing
Biomolecular modelers have always had a need for sophisticated graphics to elucidate the complexities of the large molecular structures commonly studied in structural biology. In 1995, 3D visualization of such molecular structures required desk-side workstations costing tens of thousands of dollars. Gradually, the commodity graphics hardware available for personal computers began to incorporate fixed-function hardware for accelerating 3D rendering. This led to widespread development of 3D games and funded a fast-paced cycle of continuous hardware evolution that has ultimately resulted in the GPUs that have become ubiquitous in modern computers.

GPU hardware design. Modern GPUs have evolved to a high state of sophistication necessitated by the complex interactive rendering algorithms used by contemporary games and various engineering and scientific visualization software. GPUs are now fully programmable massively parallel computing devices that support standard integer and floating-point arithmetic operations.11 State-of-the-art GPU devices contain more than 240 processing units and are capable of performing up to 1TFLOPS. High-end devices contain multiple gigabytes of high-bandwidth on-board memory complemented by several small on-chip memory systems that can be used as program-managed caches, further amplifying effective memory bandwidth.

GPUs are designed as throughput-oriented devices. Rather than optimizing the performance of a single thread or a small number of threads of execution, GPUs are designed to provide high aggregate performance for tens of thousands of independent computations. This key design choice allows GPUs to spend the vast majority of chip die area (and thus transistors) on arithmetic units rather than on caches. Similarly, GPUs sacrifice the use of independent instruction decoders in favor of SIMD (single-instruction multiple-data) hardware designs wherein groups of
processing units share an instruction decoder. This design choice maximizes the number of arithmetic units per mm² of chip die area, at the cost of reduced performance whenever branch divergence occurs among threads on the same SIMD unit.

The lack of large caches on GPUs means that a different technique must be used to hide the hundreds of clock cycles of latency to off-chip GPU or host memory. This is accomplished by multiplexing many threads of execution onto each physical processing unit, managed by a hardware scheduler that can exchange groups of active and inactive threads as queued memory operations are serviced. In this way, the memory operations of one thread are overlapped with the arithmetic operations of others. Recent GPUs can simultaneously schedule as many as 30,720 threads on an entire GPU. Although it is not necessary to saturate a GPU with the maximum number of independent threads, this provides the best opportunity for latency hiding. The requirement that the GPU be supplied with large quantities of fine-grained data-parallel work is the key factor that determines whether or not an application or algorithm is well suited for GPU acceleration.

As a direct result of the large number of processing units, high-bandwidth main memory, and fast on-chip memory systems, GPUs have the potential to outperform traditional CPU architectures significantly on highly data-parallel problems that are well matched to the architectural features of the GPU.

GPU programming. Until recently, the main barrier to using GPUs for scientific computing had been the availability of general-purpose programming tools. Early research efforts such as Brook2 and Sh13 demonstrated the feasibility of using GPUs for nongraphical calculations. In mid-2007 Nvidia released CUDA,16 a new GPU programming toolkit that addressed many of the shortcomings of previous efforts and took full advantage of a new generation of compute-capable GPUs. In late 2008 the Khronos Group announced the standardization of OpenCL,14 a vendor-neutral GPU and accelerator programming interface. AMD, Nvidia, and many other vendors have announced plans to provide OpenCL implementations for their GPUs and CPUs. Some vendors also provide low-level proprietary interfaces that allow third-party compiler vendors such as RapidMind12 and the Portland Group to target GPUs more easily. With the major GPU vendors providing officially supported toolkits for GPU computing, the most significant barrier to widespread use has been eliminated.

Although we focus on CUDA, many of the concepts we describe have analogs in OpenCL. A full overview of the CUDA programming model is beyond the scope of this article, but John Nickolls et al. provide an excellent introduction in their article, "Scalable Parallel Programming with CUDA," in the March/April 2008 issue of ACM Queue.15 CUDA code is written in C/C++ with extensions to identify functions that are to be compiled for the host, the GPU device, or both. Functions intended for execution on the device, known as kernels, are written in a dialect of C/C++ matched to the capabilities of the GPU hardware. The key programming interfaces CUDA provides for interacting with a device include routines that do the following (a minimal example follows the list):
• Enumerate available devices and their properties
• Attach to and detach from a device
• Allocate and deallocate device memory
• Copy data between host and device memory
• Launch kernels on the device
• Check for errors
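As an illustration of these interfaces, and of the kernel configuration discussed next, here is a minimal self-contained CUDA program. It is our own sketch, not code from NAMD or VMD; the kernel and all names are invented for the example.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Illustrative kernel: each thread scales one array element.
__global__ void scale(float *d, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // block/thread indices
    if (i < n)
        d[i] *= s;
}

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);                 // enumerate devices
    if (count == 0) return 1;
    cudaSetDevice(0);                           // attach to a device

    const int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    float *d = NULL;
    cudaMalloc((void **)&d, n * sizeof(float)); // allocate device memory
    cudaMemcpy(d, h, n * sizeof(float),
               cudaMemcpyHostToDevice);         // copy host -> device

    int threads = 256;                          // threads per block
    int blocks = (n + threads - 1) / threads;   // blocks per grid
    scale<<<blocks, threads>>>(d, 2.0f, n);     // launch kernel on the device

    cudaError_t err = cudaGetLastError();       // check for errors
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaMemcpy(h, d, n * sizeof(float),
               cudaMemcpyDeviceToHost);         // copy device -> host
    cudaFree(d);                                // deallocate device memory
    free(h);
    return 0;
}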
When launched on the device, the kernel function is instantiated thousands of times in separate threads according to a kernel configuration that determines the dimensions and number of threads per block and blocks per grid. The kernel configuration maps the parallel calculations to the device hardware and can be selected at runtime for the specific combination of input data and CUDA device capabilities. During execution, a kernel uses its thread and block indices to select desired calculations and input and output data. Kernels can contain all of the usual control structures such as loops and if/else branches, and they can read and write data to shared device memory or global memory as needed. Thread synchronization primitives provide the means to coordinate memory accesses among threads in the same block, allowing them to operate cooperatively on shared data.

Figure 2. GPU-accelerated electrostatics algorithms enable researchers to place ions appropriately during early stages of model construction. Placement of ions in large structures such as the ribosome shown here previously required the use of HPC clusters for calculation but can now be performed on a GPU-accelerated desktop computer in just a few minutes.

The key challenges involved in developing high-performance kernels lie in supplying enough fine-grained data-parallel work and in making effective use of the GPU's specialized memory systems.
An early success in the use of GPUs for molecular dynamics was the Folding@Home project,5,7 where continuing efforts on development of highly optimized GPU algorithms have demonstrated speedups of more than a factor of 100 for a particular class of simulations (for example, protein folding) of very small molecules (5,000 atoms and less). Folding@Home is a distributed computing application deployed on thousands of computers worldwide. GPU acceleration has helped make it the most powerful distributed computing cluster in the world, with GPU-based clients providing the dominant computational power (http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats).

HOOMD (Highly Optimized Object-oriented Molecular Dynamics), a recently developed package specializing in molecular dynamics simulations of polymer systems, is unique in that it was designed from the ground up for execution on GPUs.1 Though in its infancy, HOOMD is being used for a variety of coarse-grain particle simulations and achieves speedups of up to a factor of 30 through the use of GPU-specific algorithms and approaches.

NAMD18 is another early success in the use of GPUs for molecular dynamics.17,19,22 It is a highly scalable parallel program that targets all-atom simulations of large biomolecular systems containing hundreds of thousands to many millions of atoms. Because of the large number of processor-hours consumed by NAMD users on supercomputers around the world, we investigated a variety of acceleration options and have used CUDA to accelerate the calculation of nonbonded forces on GPUs. CUDA acceleration mixes well with task-based parallelism, allowing NAMD to run on clusters with multiple GPUs per node. Using the CUDA streaming API for asynchronous memory transfers and kernel invocations to overlap GPU computation with communication and other work done by the CPU yields speedups of up to a factor of nine over CPU-only runs.19

At every iteration NAMD must calculate the short-range interaction forces between all pairs of atoms within a cutoff distance. By partitioning space into patches slightly larger than the cutoff distance, we can ensure that all of an atom's interactions are with atoms in the same or neighboring patches. Each block in our GPU implementation is responsible for the forces on the atoms in a single patch due to the atoms in either the same or a neighboring patch. The kernel copies the atoms from the first patch in the assigned pair to shared memory and keeps the atoms from the second patch in registers. All threads iterate in unison over the atoms in shared memory, accumulating forces for the atoms in registers only. The accumulated forces for each atom are then written to global memory. Since the forces between a pair of atoms are equal and opposite, the number of force calculations could be cut in half, but the extra coordination required to sum forces on the atoms in shared memory outweighs any savings.
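A highly simplified sketch of this pattern follows. It is not the production NAMD kernel: it handles a single patch pair, computes only a bare Coulomb term, and omits the cutoff test, exclusions, and periodic boundaries. PATCH_MAX and all names are ours.

#include <cuda_runtime.h>

#define PATCH_MAX 128  // assumed maximum atoms per patch

// One thread block per patch pair; launch as
// patch_forces<<<1, PATCH_MAX>>>(...) for each pair (real code would
// map pairs to blocks via blockIdx). xq holds x, y, z, charge.
__global__ void patch_forces(const float4 *xq1, int n1,
                             const float4 *xq2, int n2,
                             float4 *force2)
{
    __shared__ float4 s[PATCH_MAX];      // atoms of the first patch

    int tid = threadIdx.x;
    if (tid < n1)
        s[tid] = xq1[tid];               // stage patch 1 in shared memory
    __syncthreads();

    if (tid < n2) {
        float4 a = xq2[tid];             // one patch-2 atom per thread, in registers
        float fx = 0.f, fy = 0.f, fz = 0.f;
        for (int j = 0; j < n1; j++) {   // threads sweep shared memory in unison
            float dx = a.x - s[j].x;
            float dy = a.y - s[j].y;
            float dz = a.z - s[j].z;
            float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;      // avoid divide-by-zero
            float rinv = rsqrtf(r2);
            float g = a.w * s[j].w * rinv * rinv * rinv;   // q_i q_j / r^3
            fx += g * dx;  fy += g * dy;  fz += g * dz;
        }
        force2[tid] = make_float4(fx, fy, fz, 0.f);  // single write to global memory
    }
}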
NAMD uses constant memory to store a compressed lookup table of bonded atom pairs for which the standard short-range interaction is not valid. This is efficient because the table fits entirely in the constant cache and is referenced for only a small fraction of pairs. The texture unit, a specialized feature of GPU hardware designed for rapidly mapping images onto surfaces, is used to interpolate the short-range interaction from an array of values that fits entirely in the texture cache. The dedicated hardware of the texture unit can return a separate interpolated value for every thread that requires it faster than the potential function could be evaluated analytically.
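In outline, interpolating a tabulated potential through the texture unit looks like the following under the legacy CUDA texture API of that era. This is our sketch; the actual NAMD table contents and indexing are more involved.

#include <cuda_runtime.h>

// File-scope texture reference with hardware linear filtering.
texture<float, 1, cudaReadModeElementType> pot_tex;

__global__ void table_lookup(const float *r2, float *u, int n, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        u[i] = tex1D(pot_tex, r2[i] * scale + 0.5f);  // interpolated read
}

// Host-side setup: copy the table into a CUDA array and bind it.
void bind_table(const float *h_table, int len)
{
    cudaArray *arr;
    cudaChannelFormatDesc d = cudaCreateChannelDesc<float>();
    cudaMallocArray(&arr, &d, len, 1);
    cudaMemcpyToArray(arr, 0, 0, h_table, len * sizeof(float),
                      cudaMemcpyHostToDevice);
    pot_tex.filterMode     = cudaFilterModeLinear;  // interpolate between texels
    pot_tex.addressMode[0] = cudaAddressModeClamp;
    cudaBindTextureToArray(pot_tex, arr);
}

Building, visualizing, and analyzing molecular models. Another area where GPUs show great promise is in accelerating many of the most computationally intensive tasks involved in preparing models for simulation, visualizing them, and analyzing simulation results (http://www.ks.uiuc.edu/Research/gpu/).

One of the critical tasks in the simulation of viruses and other structures containing nucleic acids is the placement of ions to reproduce natural biological conditions. The correct placement of ions (see Figure 2) requires knowledge of the electrostatic field in the volume of space occupied by the simulated system. Ions are placed by evaluating the electrostatic potential on a regularly spaced lattice, inserting ions at the minima in the electrostatic field, updating the field with the potential contribution of the newly added ion, and repeating the insertion process as necessary. Of these steps, the initial electrostatic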
field calculation dominates runtime and is therefore the part best suited for GPU acceleration.

A simple quadratic-time direct Coulomb summation algorithm computes the electrostatic field at each lattice point by summing the potential contributions of all atoms. When implemented optimally, taking advantage of fast reciprocal square-root instructions and making extensive use of near-register-speed on-chip memories, a GPU direct summation algorithm can outperform a CPU core by a factor of 44 or more.17,22 By employing a so-called "short-range cutoff" distance beyond which contributions are ignored, the algorithm can achieve linear time complexity while still outperforming a CPU core by a factor of 26 or more.20 To take into account the long-range electrostatic contributions from distant atoms, the short-range cutoff algorithm must be combined with a long-range contribution. A GPU implementation of the linear-time multilevel summation method, combining both the short-range and long-range contributions, has achieved speedups in excess of a factor of 20 compared with a CPU core.9
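In outline, the direct summation maps naturally onto one thread per lattice point. The following kernel is a sketch in the spirit of the published implementations,17,22 not a transcription of them; the production kernels also stage atom data in constant or shared memory and compute several lattice points per thread, which is where much of the reported speedup comes from.

#include <cuda_runtime.h>

// Direct Coulomb summation over one z-slice of the potential map.
// atoms[] holds x, y, z, charge; map is nx-by-ny; spacing is the
// lattice spacing. One thread computes one lattice point.
__global__ void direct_coulomb(const float4 *atoms, int natoms,
                               float *map, int nx, int ny,
                               float spacing, float z)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx || iy >= ny) return;

    float x = ix * spacing;
    float y = iy * spacing;
    float pot = 0.0f;
    for (int j = 0; j < natoms; j++) {          // sum over all atoms
        float dx = x - atoms[j].x;
        float dy = y - atoms[j].y;
        float dz = z - atoms[j].z;
        pot += atoms[j].w *
               rsqrtf(dx*dx + dy*dy + dz*dz);   // fast reciprocal square root
    }
    map[iy * nx + ix] = pot;
}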
GPU acceleration techniques have proven successful for an increasingly diverse range of other biomolecular applications, including quantum chemistry simulation (http://mtzweb.stanford.edu/research/gpu/) and visualization,23,25 calculation of solvent-accessible surface area,4 and others (http://www.hicomb.org/proceedings.html). It seems likely that GPUs and other many-core processors will find even greater applicability in the future.

Looking Forward
Both CPU and GPU manufacturers now exploit fabrication technology improvements by adding cores to their chips as feature sizes shrink. This trend is anticipated to continue. We expect GPUs to maintain their current factor-of-10 advantage in peak performance relative to CPUs, while their obtained performance advantage for well-suited problems continues to grow. GPUs have maintained this performance lead despite historically lagging CPUs by a generation in fabrication technology, a handicap that may fade with growing demand.

The great benefits of GPU acceleration and other computer performance increases for biomedical science will come in three areas. The first is doing the same calculations as today, but faster and more conveniently, providing results over lunch rather than overnight to allow hypotheses to be tested while they are fresh in the mind. The second is in enabling new types of calculations that are prohibitively slow or expensive today, such as evaluating properties throughout an entire simulation rather than for a few static structures. The third and greatest is in greatly expanding the user community for high-end biomedical computation to include all experimental researchers around the world, for there is much work to be done and we are just now beginning to uncover the wonders of life at the atomic scale.

Related articles on queue.acm.org
GPUs: A Closer Look
Kayvon Fatahalian and Mike Houston
http://queue.acm.org/detail.cfm?id=1365498
Scalable Parallel Programming with CUDA
John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron
http://queue.acm.org/detail.cfm?id=1365500
Future Graphics Architectures
William Mark
http://queue.acm.org/detail.cfm?id=1365501

References
1. Anderson, J.A., Lorenz, C.D., and Travesset, A. General-purpose molecular dynamics simulations fully implemented on graphics processing units. Journal of Chemical Physics 227, 10 (2008), 5342–5359.
2. Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P. Brook for GPUs: Stream computing on graphics hardware. In Proceedings of 2004 ACM SIGGRAPH. ACM, NY, 777–786.
3. Davis, D., Lucas, R., Wagenbreth, G., Tran, J., and Moore, J. A GPU-enhanced cluster for accelerated FMS. In Proceedings of the 2007 DoD High-performance …
7. … simulation on graphics processing units. Journal of Computational Chemistry 30, 6 (2009), 864–872.
8. Göddeke, D., Strzodka, R., Mohd-Yusof, J., McCormick, P., Buijssen, S.H.M., Grajewski, M., and Turek, S. Exploring weak scalability for FEM calculations on a GPU-enhanced cluster. Parallel Computing 33, 10–11 (2007), 685–699.
9. Hardy, D.J., Stone, J.E., and Schulten, K. Multilevel summation of electrostatic potentials using graphics processing units. Parallel Computing 35 (2009), 164–177.
10. Humphrey, W., Dalke, A., and Schulten, K. VMD: Visual molecular dynamics. Journal of Molecular Graphics 14 (1996), 33–38.
11. Lindholm, E., Nickolls, J., Oberman, S., and Montrym, J. Nvidia Tesla: A unified graphics and computing architecture. IEEE Micro 28, 2 (2008), 39–55.
12. McCool, M. Data-parallel programming on the Cell BE and the GPU using the RapidMind development platform. In Proceedings of GSPx Multicore Applications Conference (Oct.–Nov. 2006).
13. McCool, M., Du Toit, S., Popa, T., Chan, B., and Moule, K. Shader algebra. ACM Transactions on Graphics 23, 3 (2004), 787–795.
14. Munshi, A. OpenCL specification version 1.0 (2008); http://www.khronos.org/registry/cl/.
15. Nickolls, J., Buck, I., Garland, M., and Skadron, K. Scalable parallel programming with CUDA. ACM Queue 6, 2 (2008), 40–53.
16. Nvidia CUDA (Compute Unified Device Architecture) Programming Guide. Nvidia, Santa Clara, CA, 2007.
17. Owens, J.D., Houston, M., Luebke, D., Green, S., Stone, J.E., and Phillips, J.C. GPU computing. Proceedings of the IEEE 96 (2008), 879–899.
18. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L., and Schulten, K. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26 (2005), 1781–1802.
19. Phillips, J.C., Stone, J.E., and Schulten, K. Adapting a message-driven parallel application to GPU-accelerated clusters. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. IEEE Press, 2008.
20. Rodrigues, C.I., Hardy, D.J., Stone, J.E., Schulten, K., and Hwu, W.W. GPU acceleration of cutoff pair potentials for molecular modeling applications. In Proceedings of the 2008 ACM Conference on Computing Frontiers (2008), 273–282.
21. Showerman, M., Enos, J., Pant, A., Kindratenko, V., Steffen, C., Pennington, R., and Hwu, W. QP: A heterogeneous multi-accelerator cluster. In Proceedings of the 10th LCI International Conference on High-performance Clustered Computing (Mar. 2009).
22. Stone, J.E., Phillips, J.C., Freddolino, P.L., Hardy, D.J., Trabuco, L.G., and Schulten, K. Accelerating molecular modeling applications with graphics processors. Journal of Computational Chemistry 28 (2007), 2618–2640.
23. Stone, J.E., Saam, J., Hardy, D.J., Vandivort, K.L., Hwu, W.W., and Schulten, K. High-performance computation and interactive display of molecular orbitals on GPUs and multicore CPUs. In Proceedings of the 2nd Workshop on General-purpose Processing on Graphics Processing Units. ACM International Conference Proceeding Series 383 (2009), 9–18.
24. Takizawa, H. and Kobayashi, H. Hierarchical parallel processing of large-scale data clustering on a PC cluster with GPU coprocessing. Journal of Supercomputing 36, 3 (2006), 219–234.
25. Ufimtsev, I.S. and Martinez, T.J. Quantum chemistry on graphical processing units. Strategies for two-electron integral evaluation. Journal of Chemical Theory and Computation 4, 2 (2008), 222–231.
Unifying Biological Image Formats with HDF5

THE BIOLOGICAL SCIENCES need a generic image format suitable for long-term storage and capable of handling very large images. Images convey profound ideas in biology, bridging across disciplines. Digital imagery began 50 years ago as an obscure technical phenomenon. Now it is an indispensable computational tool. It has produced a variety of incompatible image file formats, most of which are already obsolete. Several factors are forcing the obsolescence, among them rapid increases in the number of pixels per image.

There is a need to bridge biological and scientific disciplines with an image framework capable of high computational performance and interoperability. Suitable for archiving, such a framework must be able to maintain images far into the future. Some frameworks represent partial solutions: a few, such as XML, are primarily suited for interchanging metadata; others, such as CIF (Crystallographic Information Framework),2 are primarily suited for the database structures needed for crystallographic data mining; still others, such as DICOM (Digital Imaging and Communications in Medicine),3 are primarily suited for the domain of clinical medical imaging.

What is needed is a common image framework able to interoperate with all of these disciplines, while providing high computational performance. HDF (Hierarchical Data Format)6 is such a framework, presenting a historic opportunity to establish a coin of the realm by coordinating the imagery of many biological communities. Overcoming the digital confusion of incoherent bio-imaging formats will result in better science and wider accessibility to knowledge.

Semantics: Formats, Frameworks, and Images
Digital imagery and computer technology serve a number of diverse biological communities with terminology differences that can result in very different perspectives. Consider the word format. To the data-storage community the hard-drive format will play a major
role in the computer performance of a community's image format, and to some extent, they are inseparable. A format can describe a standard, a framework, or a software tool; and formats can exist within other formats.

Image is also a term with several uses. It may refer to transient electrical signals in a CCD (charge-coupled device), a passive dataset on a storage device, a location in RAM, or a data structure written in source code. Another example is framework. An image framework might implement an image standard, resulting in image files created by a software-imaging tool. The framework, the standard, the files, and the tool, as in the case of HDF,6 may be so interrelated that they represent different facets of the same specification. Because these terms are so ubiquitous and varied due to perspective, we shall use them interchangeably, with the emphasis on the storage and management of pixels throughout their lifetime, from acquisition through archiving.

Hierarchical Data Format Version 5
HDF5 is a generic scientific data format with supporting software. Introduced in 1998, it is the successor to the 1988 version, HDF4. NCSA (National Center for Supercomputing Applications) developed both formats for high-performance management of large heterogeneous scientific data. Designed to move data efficiently between secondary storage and memory, HDF5 translates across a variety of computing architectures. Through support from NASA (National Aeronautics and Space Administration), NSF (National Science Foundation), DOE (Department of Energy), and others, HDF5 continues to support international research. The HDF Group, a nonprofit spin-off from the University of Illinois, manages HDF5, reinforcing the long-term business commitment to maintain the format for purposes of archiving and performance.

Because an HDF5 file can contain almost any collection of data entities in a single file, it has become the format of choice for organizing heterogeneous collections consisting of very large and complex datasets. HDF5 is used for some of the largest scientific data collections, such as the NASA Earth Observation System's petabyte repository of earth science data. In 2008, netCDF (network Common Data Form)10 began using HDF5, bringing in the atmospheric and climate communities. HDF5 also supports the neutron and X-ray communities for instrument data acquisition. Recently, MATLAB implemented HDF5 as its primary storage format. Soon HDF5 will formally be adopted by the International Organization for Standardization (ISO) as part of specification 10303 (STEP, Standard for the Exchange of Product model data). Also of note is the creation of BioHDF1 for organizing rapidly growing genomics data volumes.

The HDF Group's digital preservation efforts make HDF5 well suited for archival tasks; specific examples are its involvement with NARA (National Archives and Records Administration), its familiarity with the ISO standard Reference Model for an Open Archival Information System (OAIS),13 and the HDF5 implementation of the Metadata Encoding and Transmission Standard (METS)8 developed by the Digital Library Federation and maintained by the Library of Congress.

Technical Features of HDF5
An HDF5 file is a data container, similar to a file system. Within it, user communities or software applications define their organization of data objects. The basic HDF5 data model is simple, yet extremely versatile in terms of the scope of data that it can store. It contains two primary objects: groups, which provide the organizing structures, and datasets, which are the basic storage structures. HDF5 groups and datasets may also have attributes attached, a third type of data object consisting of small textual or numeric metadata defined by user applications.

An HDF5 dataset is a uniform multidimensional array of elements. The elements might be common data types (for example, integers, floating-point numbers, text strings), n-dimensional memory chunks, or user-defined compound data structures consisting of floating-point vectors or an arbitrary bit-length encoding (for example, a 97-bit floating-point number). An HDF5 group is similar to a directory, or folder, in a computer file system. An HDF5 group contains links to groups or datasets, together with supporting metadata. The organization of an HDF5 file is a directed graph structure in which groups and datasets are nodes, and links are edges. Although the term HDF implies a hierarchical structuring, its topology allows for other arrangements such as meshes or rings.

HDF5 is a completely portable file format with no limit on the number or size of data objects in the collection. During I/O operations, HDF5 automatically takes care of data-type differences, such as byte ordering and data-type size. Its software library runs on Linux, Windows, Mac, and most other operating systems and architectures, from laptops to massively parallel systems. HDF5 implements a high-level API with C, C++, Fortran 90, Python, and Java interfaces. It includes many tools for manipulating and viewing HDF5 data, and a wide variety of third-party applications and tools are available.

The design of the HDF5 software provides a rich set of integrated performance features that allow for access-time and storage-space optimizations. For example, it supports efficient extraction of subsets of data, multiscale representation of images, generic dimensionality of datasets, parallel I/O, tiling (2D), bricking (3D), chunking (nD), regional compression, and the flexible management of user metadata that is interoperable with XML. HDF5 transparently manages byte ordering in its detection of hardware. Its software extensibility allows users to insert custom software "filters" between secondary storage and memory; such filters allow for encryption, compression, or image processing. The HDF5 data model, file format, API, library, and tools are open source and distributed without charge.
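To make the API concrete, the following C sketch creates a bricked (chunked), gzip-compressed 3D dataset. All file, group, and dataset names are invented for the example, and error checking is omitted.

#include "hdf5.h"

int main(void)
{
    hsize_t dims[3]  = {1024, 1024, 512};   // a 3D volume
    hsize_t chunk[3] = {64, 64, 64};        // 64^3 bricks

    hid_t file  = H5Fcreate("tomogram.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t group = H5Gcreate(file, "/em",
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);           // bricked (3D) storage
    H5Pset_deflate(dcpl, 6);                // regional gzip compression

    hid_t dset = H5Dcreate(group, "volume", H5T_IEEE_F32LE, space,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* ... write pixels with H5Dwrite; attach community metadata
       as attributes or 1D byte datasets ... */

    H5Dclose(dset);  H5Pclose(dcpl);  H5Sclose(space);
    H5Gclose(group); H5Fclose(file);
    return 0;
}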
MEDSBIO
X-ray crystallographers formed MEDSBIO (Consortium for Management of Experimental Data in Structural Biology)7 in 2005 to coordinate various research interests. Later the electron4 and optical14 microscopy communities began attending. During the past 10 years, each community considered HDF5 as a framework to create their independent next-generation image file formats. In the case of NeXus,11 the format developed by the neutron and synchrotron facilities, HDF5 has been the operational infrastructure in its design since 1998.

Ongoing discussions by MEDSBIO have led to the realization that common computational storage algorithms and formats for managing images would tremendously benefit the X-ray, neutron, electron, and optical acquisition communities. Significantly, the entire biological community would benefit from coherent imagery and better-integrated data models. With four bio-imaging communities concluding that HDF5 is essential to their future image strategy, this is a rare opportunity to establish comprehensive agreements on a common scientific image standard across biological disciplines.

Concerns Identified
The following deficiencies impede the immediate and long-term usefulness of digital images:
• The increase in pixels caused by improving digital acquisition resolutions, faster acquisition speeds, and expanding user expectations for "more and faster" is unmanageable. The solution requires technical analysis of the computational infrastructure: the image designer must analyze the interactions of computer hardware, application software, and the operating system, a moving target monitored over a period of decades. For example, today's biologists use computers having 2GB–16GB of RAM. What method should be used to access a four-dimensional, 1TB image having 30 hyperspectral values per pixel? (One answer, chunked partial reads, is sketched after this list.) Virtually all of the current biological image formats organize pixels as 2D XY image planes. A visualization program may require the entire set of pixels to be read into RAM or virtual memory. This, coupled with the poor performance of mass storage under random disk seeks, paging, and memory swaps, effectively makes the image unusable. For a very large image, it is desirable to store it in multiple resolutions (multiscale), allowing interactive access to regions of interest. Visualization software may intensively compute these intermediate data resolutions, later discarded upon exit from the software.
• The inflexibility of current biological image file designs prevents them from adapting to future modalities and dimensionality. Rapid advances in biological instrumentation and computational analysis are leading to complex imagery involving novel physical and statistical pixel specifications.
• The inability to assemble different communities' imagery into an overarching image model allows for ambiguity in the analysis. The integration of various coordinate systems can be an impassable obstacle if not properly organized. There is an increasing need to correlate images of different modalities in order to observe spatial continuity from millimeter to angstrom resolutions.
• The non-archival quality of images undermines their long-term value. Current designs usually do not provide the basic archival features recommended by the Digital Library Federation, nor do they address issues of provenance. Frequently, the documentation of a community image format is incomplete, outdated, or unavailable, eroding the ability to interpret the digital artifact properly.
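Chunked storage plus partial I/O is one concrete answer to the 1TB-image question above: a viewer reads only the bricks intersecting a region of interest. A minimal HDF5 sketch follows (dataset path and sizes are invented; error checking omitted).

#include "hdf5.h"

// Read a 64x64x64 subvolume from a large 3D dataset without
// touching the rest of the file.
void read_subvolume(const char *path, float *buf)
{
    hsize_t start[3] = {512, 512, 256};     // offset of the region
    hsize_t count[3] = {64, 64, 64};        // extent of the region

    hid_t file   = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen(file, "/em/volume", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET,
                        start, NULL, count, NULL);  // select the bricks

    hid_t mspace = H5Screate_simple(3, count, NULL);
    H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, buf);

    H5Sclose(mspace); H5Sclose(fspace);
    H5Dclose(dset);   H5Fclose(file);
}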
Consensus
It would be desirable to adopt an existing scientific, medical, or computer image format, and simply benefit from the consequences. All image formats have their strengths and weaknesses, and they tend to fall into two categories: generic and specialized. Generic image formats usually have fixed dimensionality or pixel design. For example, MPEG2,9 is suitable for many applications as long as the content is 2D spatial plus 1D temporal, using a red-green-blue modality that is lossy-compressed for the physiological response of the eye. Alternatively, the specialized image formats suffer the difficulties of the image formats we are already using. For example, DICOM3 (the medical imaging standard) and FITS5 (the astronomical imaging standard) store their pixels as 2D slices, although DICOM does incorporate MPEG2 for video-based imagery.

The ability to tile (2D), brick (3D), or chunk (nD) is required to access very large images. Although this is conceptually simple, the software is not, and it must be tested carefully or subsequent datasets risk corruption. That risk would be unacceptable for operational software used in data repositories and research. This function and its certification testing are critical features of HDF software that are not readily available in any other format.

Common Objectives
The objectives of these acquisition communities are identical, requiring performance, interoperability, and archiving. There is a real need for the different bio-imaging communities to coordinate within the same HDF5 data file: using identical high-performance methods to manage pixels; avoiding namespace collisions between the biological communities; and adopting the same archival best practices. All of these would benefit downstream communities such as visualization developers and global repositories.

Performance. The design of an image file format and the subsequent organization of stored pixels determine the performance of computation because of various hardware and software data-path bottlenecks. For example, many specialized biological image formats use simple 2D pixel organizations, frequently without the benefit of compression. These 2D pixel organizations are ill suited for very large 3D images such as electron tomograms or 5D optical images. Those bio-imaging files have sizes that are orders of magnitude larger than the RAM of computers. Worse, widening gaps have formed between CPU/memory speeds, persistent storage speeds, and network speeds. These gaps lead to significant delays in processing massive datasets. Any file format for massive data has to account for the complex behavior of software layers, all the way from the application, through middleware, down to operating-system device drivers. A generic n-dimensional multimodal image format will require new instantiation and infrastructure to implement new types of data buffers and caches to scale large datasets into much smaller RAM; much of this has been resolved within HDF5.

Interoperability. Historically the acquisition communities have defined custom image formats. Downstream communities, such as visualization and modeling, attempt to implement these formats, forcing the communities to confront design deficiencies. Basic image metadata definitions such as rank, dimension, and modality must be explicitly defined so the downstream communities can easily participate. Different research communities must be able to append new types of metadata to the image, enhancing the imagery as it progresses through the pipeline. Ongoing advances in the acquisition communities will continue to produce new and significant image modalities that feed this image pipeline. Enabling downstream users to easily access pixels and append their community metadata supports interoperability, ultimately leading to fundamental breakthroughs in biology. This is not to suggest that different communities' metadata can or should be uniformly defined as a single biological metadata schema and ontology in order to achieve an effective image format.

Archiving. Scientific images have a general lack of archival design features. As the sophistication of bio-imagery improves, the demand for the placement of this imagery into long-term global repositories will grow. This is being done by the Electron Microscopy Data Bank4 in joint development by the National Center for Macromolecular Imaging, the RCSB (Research Collaboratory for Structural Bioinformatics) at Rutgers University, and the European Bioinformatics Institute. Efforts such as the Open Microscopy Environment14 are also developing bio-image informatics tools for lab-based data sharing and data mining of biological images, and these too require practical image formats for long-term storage and retrieval. Because of the evolving complexity of bio-imagery and the need to subscribe to archival best practices, an archive-ready image format must be self-describing. That is, there must be sufficient infrastructure within the image file design to properly document the content, context, and structure of the pixels and related community metadata, thereby minimizing the reliance on external documentation for interpretation.

The Inertia of Legacy Software
Implementing a new unified image
format supporting legacy software across the biological disciplines is a Gordian knot. Convincing software developers to make this a high priority is a difficult proposition. Implementation occurring across hundreds of legacy packages, flawlessly fielded in thousands of laboratories, is not a trivial task. Ideally, presenting images simultaneously in their legacy formats and in a new advanced format would mitigate the technical, social, and logistical obstacles. However, this must be accomplished without duplicating the pixels in secondary storage.

One proposal is to mount an HDF5 file as a VFS (virtual file system) so that HDF5 groups become directories and HDF5 datasets become regular files. Such a VFS using FUSE (Filesystem in User Space) would execute simultaneously across the user-process space and the operating-system space. This hyperspace would manage all HDF-VFS file activity by interpreting, intercepting, and dynamically rearranging legacy image files. A single virtual file presented by the VFS could be composed of several concatenated HDF5 datasets, such as a metadata header dataset and a pixel dataset. Such a VFS file could have multiple simultaneous filenames and legacy formats depending on the virtual folder name that contains it, or on the software application attempting to open it.

The design and function of an HDF-VFS has several possibilities. First, non-HDF5 application software could interact transparently with HDF5 files: PDF files, spreadsheets, and MPEGs would be written and read as routine file-system byte streams. Second, this VFS, when combined with transparent on-the-fly compression, would act as an operationally usable compressed tarball. Third, the VFS could be designed with unique features such as interpreting incoming files as image files; community-based legacy image format filters would rearrange legacy image files. For example, the pixels would be stored as HDF5 datasets in the appropriate dimensionality and modality, and the related metadata would be stored as a separate HDF5 1D byte dataset. When legacy application software opens the legacy image file, the virtual file is dynamically recombined and presented by the VFS to the legacy software in the same byte order as defined by the legacy image format. The fourth possibility is to endow the VFS with archival and performance-analysis tools that could transparently provide those services to legacy application software.

Recommendations
To achieve the goal of an exemplary image design having wide, long-term support, we offer the following recommendations to be considered through a formal standards process:
1. Permit and encourage scientific communities to continually evolve their own image designs. They know the demands of their disciplines best. Implementing community image formats through HDF5 provides these communities flexible routes to a common image model.
2. Adopt the archival community's recommendations on archive-ready datasets. Engaging the digital preservation community from the onset, rather than as an afterthought, will produce better long-term image designs.
3. Establish a common image model. The specification must be conceptually simple and should merely distinguish the image's pixels from the various metadata. The storage of pixels should be in an appropriately dimensional dataset. The encapsulation of community metadata should be in 1D byte datasets or attributes.
4. Recognize that the majority of the metadata is uniquely specific to the biological community that designs it. The use of binary or XML is an internal concern of the community creating the image design; however, universal image metadata will overlap across disciplines, such as rank, dimensionality, and pixel modality. Common image nomenclature should be defined to bridge metadata namespace conversions to legacy formats.
5. Use RDF (Resource Description Framework)15 as the primary mechanism to manage the association of pixel datasets and the community metadata. A Subject-Predicate-Object-Time tuple stored as a dataset can benefit from HDF5's B-tree search features. Such an arrangement provides useful time stamps for provenance and generic logging for administration and performance testing. The definition of RDF predicates and objects should follow the extensible design strategy used in the organization of NFS (Network File System) version 4 protocol metadata.12
6. In some circumstances it will be desirable to define adjuncts to the common image model. An example is MPEG video, where the standardized compression is the overriding reason to store the data as a 1D byte stream rather than decompressing it into the standard image model as a 3D YCbCr pixel dataset. A proprietary image format is another type of adjunct requiring 1D byte encapsulation rather than translation into the common image model. In this scenario, images are merely flagged as such and routine archiving methods applied.
7. Provide a comprehensively tested software API in lockstep with the image model. Lack of a common API requires each scientific group to develop and test the software tools from scratch or borrow them from others, resulting in not only increased cost for each group, but also increased likelihood of errors and inconsistencies among implementations.
8. Implement HDF5 as a virtual file system. HDF-VFS could interpret incoming legacy image file formats by storing them as pixel datasets and encapsulated metadata. HDF-VFS could also present such a combination of HDF datasets as a single legacy-format image file, byte-stream identical. Such a file system could allow legacy applications to access and interact with the images through standard file I/O calls, obviating the requirement and burden for legacy software to include, compile, and link HDF5 API libraries in order to access images. The duality of presenting an image as both a file and an HDF5 dataset offers a number of intriguing possibilities for managing images and non-image datasets such as spreadsheets or PDF files, or for managing provenance without changes to legacy application software.
9. Make the image specification and software API freely accessible and available without charge. Preferably, such software should be available under an open source license that allows a community of software developers to contribute to its development. Charging the individual biological imaging communities and laboratories adds
financial complexity to the pursuit of scientific efforts that are frequently underfunded.
10. Establish methods for verification and performance testing. A critical requirement is the ability to determine compliance. Not having compliance testing significantly weakens the archival value by undermining the reliability and integrity of the image data. Performance testing using prototypical test cases assists in the design process by flagging proposed community image designs that will have severe performance problems. Defining baseline test cases will quickly identify software problems in the API.
11. Establish ongoing administrative support. Formal design processes can take considerable time to complete, but some needs—such as technical support, consultation, publishing technical documentation, and managing registration of community image designs—require immediate attention. Establishing a mechanism for imaging communities to register their HDF5 root-level groups as community-specific data domains will provide an essential cornerstone for image design and avoid namespace collisions with other imaging communities.
12. Examine how other formal standards have evolved. Employ the successful strategies and avoid the pitfalls. Developing strategies and alliances with these standards groups will further strengthen the design and adoption of a scientific image standard.
13. Establish the correct forum. This is crucial and will require the guidance of a professional standards organization—or organizations—that perceives the development of such an image standard as part of its mission to serve the public and its membership.

Broad consensus and commitment by the participating communities will also be essential; without them, biologists will continue with incompatible methods for solving similar problems, such as not having a common image model.

The failure to establish a scalable n-dimensional scientific image standard that is efficient, interoperable, and archival will result in a less-than-optimal research environment and a less-certain future capability for image repositories. The strategic danger of not having a comprehensive scientific image storage framework is the massive generation of unsustainable bio-images. The long-term risks and costs of comfortable inaction will likely be enormous and irreversible.

The challenge for the biosciences is to establish a world-class imaging specification that will endow these indispensable and nonreproducible observations with long-term maintenance and high-performance computational access. The issue is not whether the biosciences will adopt HDF5 as a useful imaging framework—that is already happening—but whether it is time to gather the many separate pieces of the currently highly fragmented patchwork of biological image formats and place them under HDF5 as a common framework. This is the time to unify the imagery of biology, and we encourage readers to contact the authors with their views.

Acknowledgments
This work was funded by the National Center for Research Resources (P41-RR-02250), National Institute of General Medical Sciences (5R01GM079429), Department of Energy (ER64212-1027708-0011962), National Science Foundation (DBI-0610407, CCF-0621463), and National Institutes of Health (1R13RR023192-…).

Related articles on queue.acm.org
Better Scripts, Better Games
http://queue.acm.org/detail.cfm?id=1483106
Concurrency's Shysters
http://blogs.sun.com/bmc/entry/concurrency_s_shysters

References
1. BioHDF; http://www.geospiza.com/research/biohdf/.
2. CIF (Crystallographic Information Framework). International Union of Crystallography; http://www.iucr.org/resources/cif/.
3. DICOM (Digital Imaging and Communications in Medicine); http://medical.nema.org.
4. EMDB (Electron Microscopy Data Bank); http://emdatabank.org/.
5. FITS (Flexible Image Transport System); http://fits.gsfc.nasa.gov/.
6. HDF (Hierarchical Data Format); http://www.hdfgroup.org.
7. MEDSBIO (Consortium for Management of Experimental Data in Structural Biology); http://www.medsbio.org.
8. METS (Metadata Encoding and Transmission Standard); http://www.loc.gov/standards/mets/.
9. MPEG (Moving Picture Experts Group); http://www.chiariglione.org/mpeg/.
10. netCDF (network Common Data Form); http://www.unidata.ucar.edu/software/netcdf/.
11. NeXus (neutron, x-ray and muon science); http://www.nexusformat.org.
12. NFS (Network File System); http://www.ietf.org/rfc/rfc3530.txt.
13. OAIS (Open Archival Information System); http://nost.gsfc.nasa.gov/isoas/overview.html.
14. OME (Open Microscopy Environment); http://www.openmicroscopy.org/.
15. RDF (Resource Description Framework); http://www.w3.org/RDF/.

Matthew T. Dougherty ([email protected]) is at the National Center for Macromolecular Imaging, specializing in cryo-electron microscopy, visualization, and animation.
Michael J. Folk ([email protected]) is president of The HDF Group.
Erez Zadok ([email protected]) is associate professor at Stony Brook University, specializing in computer storage systems performance and design.
Herbert J. Bernstein ([email protected]) is professor of computer science at Dowling College, active in the development of IUCr standards.
Frances C. Bernstein ([email protected]) is retired from Brookhaven National Laboratory after 24 years at the Protein Data Bank, active in macromolecular data representation and validation.
Kevin W. Eliceiri ([email protected]) is director at the Laboratory for Optical and Computational Instrumentation, University of Wisconsin-Madison, active in the development of tools for bio-image informatics.
A Conversation with David E. Shaw

DAVID SHAW CONSIDERS himself first and foremost a computer scientist. It's a fact that's sometimes overshadowed by the activities of his two highly successful, yet very different, ventures: the hedge fund D. E. Shaw & Co., which he founded 20 years ago, and the research lab D. E. Shaw Research, where he now conducts hands-on research in the field of computational biochemistry. The former makes money through rigorous quantitative and qualitative investment techniques, while the latter spends money simulating complex biochemical processes. But a key element to both organizations' success has been Shaw's background in computer science. Serving as interviewer, computer graphics researcher and Stanford professor Pat Hanrahan points out that one of Shaw's unique gifts is his ability to effectively apply computer science techniques…

…simulations by several orders of magnitude. Four 512-processor machines are now active and already helping scientists to understand how proteins interact with each other and with other molecules at an atomic level of detail. Shaw's hope is that these "molecular microscopes" will help unravel some biochemical mysteries that could lead to the development of new drugs.
nancial world applying quantitative and computational techniques to the process of investment management. During the early years of D. E. Shaw & Co., the financial firm, I'd been personally involved in research aimed at understanding various financial markets and phenomena from a mathematical viewpoint. But as the years went by and the company grew, I had to spend more time on general management, and I could feel myself getting stupider with each passing year. I didn't like that, so I started solving little theoretical problems at night just for fun—things I could tackle on my own, since I no longer had a research group like I did when I was on the faculty at Columbia. As time went by, I realized that I was enjoying that more and more, and that I missed doing research on a full-time basis.

I had a friend, Rich Friesner, who was a chemistry professor at Columbia, and he was working on problems like protein folding and protein dynamics, among other things. Rich is a computational chemist, so a lot of what he did involved algorithms, but his training was in chemistry rather than computer science, so when we got together socially, we often talked about some of the intersections between our fields. He'd say, "You know, we have this problem: the inner loop in this code does such-and-such, and we can't do the studies we want to do because it's too slow. Do you have any ideas?"

Although I didn't understand much at that point about the chemistry and biology involved, I'd sometimes take the problem home, work on it a little bit, and try to come up with a solution that would speed up his code. In some cases, the problem would turn out to be something that any computer scientist with a decent background in algorithms would have been able to solve. After thinking about it for a little while, you'd say, "Oh, that's really just a special case of this known problem, with the following wrinkle."

One time I managed to speed up the whole computation by about a factor of 100, which was very satisfying to me. It didn't require any brilliant insight; it was just a matter of bringing a bit of computer science into a research area where there hadn't yet been all that much of it.

At a certain point, when I was approaching my 50th birthday, I felt like it was a natural time to think about what I wanted to do over the coming years. Since my graduate work at Stanford and my research at Columbia were focused in part on parallel architectures and algorithms, one of the things I spent some time thinking about was whether there might be some way to apply those sorts of technologies to one of the areas Rich had been teaching me about. After a fair amount of reading and talking to people, I found one application—the simulation of molecular dynamics—where it seemed like a massive increase in speed could, in principle, have a major impact on our understanding of biological processes at the molecular level.

It wasn't immediately clear to me whether it would actually be possible to get that sort of speedup, but it smelled like the sort of problem where there might be some nonobvious way to make it happen. The time seemed ripe from a technological viewpoint, and I just couldn't resist the impulse to see if it could be done. At that point, I started working seriously on the problem, and I found that I loved being involved again in hands-on research. That was eight years ago, and I still feel the same way.

HANRAHAN: In terms of your goals at D. E. Shaw Research, are they particularly oriented toward computational chemistry or is there a broader mission?

SHAW: The problems I'm most interested in have a biochemical or biophysical focus. There are lots of other aspects of computational chemistry that are interesting and important—nanostructures and materials science and things like that—but the applications that really drive me are biological, especially things that might lead not only to fundamental insights into molecular biological processes, but also to tools that someone might use at some point to develop lifesaving drugs more effectively.

Our particular focus is on the structure and function of biological molecules at an atomic level of detail, and not so much at the level of systems biology, where you try to identify and understand networks of interacting proteins, figure out how genetic variations affect an individual's susceptibility to various human diseases, and so forth. There are a lot of computer scientists working in the area that's commonly referred to as bioinformatics, but not nearly as many who work on problems involving the three-dimensional structures and structural changes that underlie the physical behavior of individual biological molecules. I think there's still a lot of juicy, low-hanging fruit in this area, and maybe even some important unifying principles that haven't yet been discovered.

HANRAHAN: When you mention drug discovery, do you see certain applications like that in the near-term or are you mostly trying to do pure research at this point?

SHAW: Although my long-term hope is that at least some of the things we discover might someday play a role in curing people, that's not something I expect to happen overnight. Important work is being done at all stages in the drug development pipeline, but our own focus is on basic scientific research with a relatively long time horizon, but a large potential payoff. To put this in perspective, many of the medications we use today were discovered more or less by accident, or through a brute-force process that's not based on a detailed understanding of what's going on at the molecular level. In many areas, these approaches seem to be running out of steam, which is leading researchers to focus more on targeting drugs toward specific proteins and other biological macromolecules based on an atomic-level understanding of the structure and behavior of those targets.

The techniques and technologies we've been working on are providing new tools for understanding the biology and chemistry of pharmaceutically relevant molecular systems. Although developing a new drug can take as long as 15 years, our scientific progress is occurring over a much shorter timescale, and we're already discovering things that we hope might someday be useful in the process of drug design. But I also enjoy being involved in the unraveling of biological mysteries, some of which have puzzled researchers for 40 or 50 years.

HANRAHAN: This machine you've built, Anton, is now operational. Can you tell us a little bit about that machine and the key ideas behind it?

SHAW: Anton was designed to run molecular dynamics (MD) simulations a couple orders of magnitude faster than the world's fastest supercomputers. MD simulations are con-

"Our latest benchmark measurement was 16,400 nanoseconds per day on a 512-node Anton configuration, so Anton would run three or four orders of magnitude faster, and roughly two orders of magnitude faster than the fastest that can be achieved under practical conditions on supercomputers or massively parallel clusters."

HANRAHAN: When I read the Anton paper in Communications, it reminded me a lot of what I worked on, which were graphics chips—just in the way you're choreographing communication and keeping everything busy and all these really important tricks. Can
the best way to compute the interactions of two particles involves sending them both somewhere else is just amazing. I understand your proof but it still boggles me because it seems so counterintuitive.

SHAW: It is kind of weird, but if you look back through the literature, you can find various pieces of the puzzle in a number of different contexts. Although I wasn't aware of this at the time, it turns out that the idea of "meeting on neutral territory" can be found in different forms in publications dating back as far as the early 1990s, although these early approaches didn't offer an asymptotic advantage over traditional spatial decomposition methods. Later, Marc Snir independently came up with an algorithm that achieved the same asymptotic properties as mine in a different way, and he described his method in a very nice paper that appeared shortly before my own. His paper included a proof that this is the best you can do from an asymptotic viewpoint. Although the constant factors were such that his method wouldn't have performed well in the sorts of applications we're interested in, it's clear with the benefit of hindsight that the straightforward addition of certain features from the NT method would have made his algorithm work nearly as well as NT itself for that class of applications.
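The flavor of "meeting on neutral territory" can be sketched in a few lines of Python. This toy is our illustration, not Shaw's actual NT algorithm or its communication analysis: each pair of particles is assigned to the grid cell that combines the first particle's (x, y) home coordinates with the second particle's z coordinate, so most pairs are evaluated in a cell that is home to neither particle.

    # Toy illustration (not Shaw's actual NT scheme): pairs of particles
    # "meet" in a cell that combines one particle's (x, y) home
    # coordinates with the other's z home coordinate.
    import itertools, random

    GRID = 4  # a 4x4x4 grid of cells, each owned by one logical node

    def home(p):
        return tuple(int(c * GRID) % GRID for c in p)

    random.seed(1)
    particles = [(random.random(), random.random(), random.random())
                 for _ in range(20)]

    owner = {}  # pair -> cell where the interaction is evaluated
    for i, j in itertools.combinations(range(len(particles)), 2):
        hi, hj = home(particles[i]), home(particles[j])
        owner[(i, j)] = (hi[0], hi[1], hj[2])  # the "neutral territory" cell

    neutral = sum(1 for (i, j), cell in owner.items()
                  if cell != home(particles[i]) and cell != home(particles[j]))
    print(f"{neutral}/{len(owner)} pairs computed on a node owning neither particle")

By construction every pair is assigned to exactly one cell, which is the property a parallel decomposition needs before any communication analysis can begin.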
we’ve taken to the design of a special-
But the important thing from my purpose machine. On the other hand,
perspective was not that the NT algo- there are a lot of scientific problems
rithm had certain asymptotic advan- for which our approach would clearly
tages, but that with the right kind of not be effective. For one thing, some
machine architecture it removed a key problems are subject to unavoidable
bottleneck that would otherwise have communication bottlenecks that
kept me from exploiting the enormous would dominate any speedup that
amount of application-specific arith- might be achieved using an applica-
metic horsepower I’d wanted to place tion-specific chip. And in some other
on each chip. applications, flexibility may be impor-
HANRAHAN: I think it’s a great exam- tant enough that a cluster-based solu-
ple of computer science thinking be- tion, or maybe one based on the use of
cause it’s a combination of a new algo- GPUs, would be a better choice.
rithm, which is a new way of organizing One of the things I found attractive
the computation, as well as new hard- about our application from the start
ware. Some people think all advances was that although it was nowhere close
in computing are due to software or to “embarrassingly parallel,” it had
hardware, but I think some of the most several characteristics that seemed
interesting ones are where those things to beg for the use of special-purpose
coevolve in some sense. hardware. First, the inner loop that
SHAW: I agree completely. The his- accounted for a substantial majority
tory of special-purpose machines has of the time required for a biologically
been marked by more failures than oriented molecular dynamics simula-
successes, but I suspect that the cre- tion was highly regular, and could be
mapped onto silicon in an extremely area- and power-efficient way. It also turned out that these inner-loop calculations could be structured in such a way that the same data was used a number of times, and with well-localized data transfers that minimized the need to send data across the chip, much less to and from off-chip memory.

There were some parts of our application where any function-specific hardware would have been grossly underutilized or unable to take advantage of certain types of new algorithms or biophysical models that might later be discovered. Fortunately, most of those calculations aren't executed all that frequently in a typical biomolecular simulation. That made it feasible to incorporate a set of programmable on-chip processors that could be used for various calculations that fell outside the inner loop. We were also able to make use of problem-specific knowledge to provide hardware support for a specific type of inter-chip communication that was especially important in our application.

Since my own interest is in the application of molecular dynamics simulations to biological problems, I haven't been forced to think very hard about what aspects of the approach we've followed might be applicable to the codesign of special-purpose machines and algorithms for other applications. If I had to guess, I'd say that at least some aspects of our general approach might wind up being relevant to researchers who are looking for an insane increase in speed for certain other applications, but that hunch isn't based on anything very solid.

HANRAHAN: In the Communications paper, Anton was described as a computational microscope. I really liked that phrase and that the name Anton came from van Leeuwenhoek, who was one of the first microscopists.

SHAW: Part of the reason I like the metaphor of a computational microscope is that it emphasizes one of the key things that Anton isn't. Although we sometimes describe the machine as a "special-purpose supercomputer," its range of applicability is in practice so narrow that thinking of Anton as a computer is a bit like thinking of a microscope as a general-purpose laboratory instrument. Like the optical microscope, Anton is really no more than a specialized tool for looking at a particular class of objects that couldn't be seen before.

HANRAHAN: So now that you have this microscope, what do you want to point it at? I know you must be collaborating, and you have computational chemists and biologists at D. E. Shaw Research. Do you have some problems that you want to go after with it?

SHAW: There's a wide range of specific biological phenomena we'd like to know more about, but at this point, there's a lot we can learn by simply putting things under the microscope and seeing what's there. When Anton van Leeuwenhoek first started examining pond water and various bodily fluids, there's no way he could have predicted that he'd see what we now know to be bacteria, protozoa, and blood and sperm cells, none of which had ever been seen before. Although we have no illusions about our machine having an impact comparable to the optical microscope, the fact is that nobody has ever seen a protein move over a period of time even remotely close to what we're seeing now, so in some ways, we're just as clueless as van Leeuwenhoek was when he started looking.

All that being said, there are some biological systems and processes that we've been interested in for a while, and we're beginning to learn more about some of them now that we're able to run extremely long simulations. The one that's probably most well known is the process of protein folding, which is when a string of amino acids folds up into a three-dimensional protein. We've already started to learn some interesting things related to folding that we wouldn't have known if it hadn't been for Anton, and we're hoping to learn more over time. We've also conducted studies of a class of molecules called kinases, which play a central role in the development and treatment of cancer. And we're looking at several proteins that transfer either ions or signals through membranes in the cell. We're also using Anton to develop new algorithms and methods, and to test and improve the quality of the physical models that are used in the simulation process itself.

HANRAHAN: It seems like you're almost at a tipping point. I worked at Pixar, and one of the singular events was when computer graphics were used in Jurassic Park. Even bigger was Toy Story. Once the graphics software reached a certain maturity, and once you showed that it could be used to make a blockbuster, then it wasn't that long afterward that almost every movie included computer-generated effects. How close are we to that tipping point in structural biology? If you're able to solve some important problems in structural biology, then people might begin considering molecular dynamics simulations as part of standard practice, and then routinely deploy this approach to solving biological problems.

SHAW: That's a great analogy. Although the evidence is still what I'd characterize as preliminary, I think there's enough of it at this point to predict that MD simulations will probably be playing a much larger role within the field of structural biology a few years from now. It's hard to tell in advance whether there's a Toy Story around the corner, since the biggest breakthroughs in biology often turn out to be ones that would have been difficult to even characterize before they happened. It may be that there are some deep principles out there that are waiting to be discovered in silico—ones that can tell us something fundamental about some of the mechanisms nature has evolved to do what it wants to get done. It's hard to plan for a discovery like that; you just have to be in the right territory with the tools and skills you think you might need in order to recognize it when you see it.

HANRAHAN: There's no place that's more fun to be when you have a new microscope. van Leeuwenhoek must have had a good time. I've read a bunch about Robert Hooke, too, who was a contemporary of van Leeuwenhoek. He was part of the Royal Society. Every week or two, they would get together and make all these discoveries because they were looking at the world in a different way.

SHAW: I've always thought it would be great to live during a period when a lot of fundamental discoveries were being made, and a single scientist could stay on top of most of the important advances being made across the full breadth of a given discipline. It's a bit sad that we can't do that anymore—victims of our success.

HANRAHAN: But one thing that amazes me about you is the number of fields you've had an impact on. I'm trying to figure out your secret sauce. You seem to be able to bring computation to bear in problem areas that other people haven't been as successful in. Obviously you've had a huge impact on the financial industry with the work you did on modeling stocks and portfolios. And now you're doing biochemistry. How do you approach these new areas? How do you bring computers to bear on these new problems?

SHAW: I'm not sure I deserve those very kind words, but for what it's worth, I tend to generate a plentiful supply of ideas, the vast majority of which turn out to be bad ones. In some cases, they involve transplanting computational techniques from one application to another, and there's usually a good reason why the destination field isn't already using that technique. I also have a remarkable capacity to delude myself into thinking that each idea has a higher probability of working than it really does, which provides me with the motivation I need to keep working on it. And, every once in a while, I stumble on an idea that actually works.

HANRAHAN: It sounds like you have this "gene" for computing. You know algorithms, you know architecture, but yet you still are fascinated with applying them to new problems. That's what's often missing in our field. People learn the techniques of the field but they don't know how to apply them in a new problem domain.

SHAW: I love learning about new fields, but in some ways I feel like a tourist whose citizenship is computer science. I think to myself, "I'm doing computational finance, but I am a computer scientist. I'm doing computational biology, but I am a computer scientist." When we computer scientists start infiltrating a new discipline, what we bring to the table is often more than just a bag of tricks. What's sometimes referred to as "computational thinking" is leaving its mark on one field after another—and the night is still young.

Related articles on queue.acm.org

A Conversation with Kurt Akeley and Pat Hanrahan
http://queue.acm.org/detail.cfm?id=1365496

Beyond Beowulf Clusters
Philip Papadopoulos, Greg Bruno, Mason Katz
http://queue.acm.org/detail.cfm?id=1242501

Databases of Discovery
James Ostell
http://queue.acm.org/detail.cfm?id=1059806

© 2009 ACM 0001-0782/09/1000 $10.00
contributed articles
DOI:10.1145/1562764.1562783
A View of the Parallel Computing Landscape

Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.

BY KRSTE ASANOVIC, RASTISLAV BODIK, JAMES DEMMEL, TONY KEAVENY, KURT KEUTZER, JOHN KUBIATOWICZ, NELSON MORGAN, DAVID PATTERSON, KOUSHIK SEN, JOHN WAWRZYNEK, DAVID WESSEL, AND KATHERINE YELICK

INDUSTRY NEEDS HELP from the research community to succeed in its recent dramatic shift to parallel computing. Failure could jeopardize both the IT industry and the portions of the economy that depend on rapidly improving information technology. Here, we review the issues and, as an example, describe an integrated approach we're developing at the Parallel Computing Laboratory, or Par Lab, to tackle the parallel challenge.

technology advances to double performance every 18 months. The implicit hardware/software contract was that increased transistor count and power dissipation were OK as long as architects maintained the existing sequential programming model. This contract led to innovations that were inefficient in terms of transistors and power (such as multiple instruction issue, deep pipelines, out-of-order execution, speculative execution, and prefetching) but that increased performance while preserving the sequential programming model.

The contract worked fine until we hit the power limit a chip is able to dissipate. Figure 1 reflects this abrupt change, plotting the projected microprocessor clock rates of the International Technology Roadmap for Semiconductors in 2005 and then again just two years later.16 The 2005 prediction was that clock rates should have exceeded 10GHz in 2008, topping 15GHz in 2010. Note that Intel products are today far below even the conservative 2007 prediction.

After crashing into the power wall, architects were forced to find a new paradigm to sustain ever-increasing performance. The industry decided the only viable option was to replace the single power-inefficient processor with many efficient processors on the same chip. The whole microprocessor industry thus declared that its future was in parallel computing, with increasing numbers of processors, or cores, each technology generation every two years. This style of chip was labeled a multicore microprocessor. Hence, the leap to multicore is not based on a breakthrough in programming or architecture and is actually a retreat from the more difficult task of building power-efficient, high-clock-rate, single-core chips.5
Research, MasPar, nCUBE, Sequent, Silicon Graphics, and Thinking Machines are just the best-known members of the Dead Parallel Computer Society. Given this sad history, multicore pessimism abounds. Quoting computing pioneer John Hennessy, President of Stanford University: "…when we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. …I would be panicked if I were in industry."19

Jeopardy for the IT industry means opportunity for the research community. If researchers meet the parallel challenge, the future of IT is rosy. If they don't, it's not. Hence, there are few restrictions on potential solutions. Given an excuse to reinvent the whole software/hardware stack, this opportunity is also a once-in-a-career chance to fix other weaknesses in computing that have accumulated over the decades like barnacles on the hull of an old ship.

Here, we lay out one view of the opportunities, then, as an example, describe in more depth the approach of the Berkeley Parallel Computing Lab, or Par Lab, updating two long technical reports4,5 that include more detail. Our goal is to recruit more parallel revolutionaries.

Parallel Bridge
The bridge in Figure 2 represents an analogy connecting computer users on the right to the IT industry on the left. The left tower is hardware, the right tower is applications, and the long span in between is software. We use the bridge analogy throughout this article. The aggressive goal of the parallel revolution is to make it as easy to write programs that are as efficient, portable, and correct (and that scale as the number of cores per microprocessor increases biennially) as it has been to write programs for sequential computers. Moreover, we can fail overall if we fail to deliver even one of these "parallel virtues." For example, if parallel programming is unproductive, this weakness will delay and reduce the number of programs that are able to exploit new multicore architectures.

Hardware tower. The power wall forces the change in the traditional programming model, but the question for parallel researchers is what kind of computing architecture should take its place. There is a technology sweet spot around a pipelined processor of five-to-eight stages that is most efficient in terms of performance per joule and silicon area.5 Using simple cores means there is room for hundreds of them on the same chip. Moreover, having many such simple cores on a chip simplifies hardware design and verification, since each core is simple, and replication of cores is nearly trivial. Just as it's easy to add spares to mask manufacturing defects, "manycore" computers can also have higher yield.

One example of a manycore computer is from the world of network processors, which has seen a great deal of innovation recently due to the growth of the networking market. The best-designed network processor is arguably the Cisco Silicon Packet Processor, also known as Metro, which has 188 five-stage RISC cores, plus four spares to help yield, and dissipates just 35 watts.

It may be reasonable to assume that manycore computers will be homogeneous, like the Metro, but there is an argument for heterogeneous manycores as well. For example, suppose 10% of the time a program gets no speedup on a 100-core computer. To run this sequential piece twice as fast, assume a single fat core would need 10 times as many resources as a thin core due to larger caches, a vector unit, and other features. Applying Amdahl's Law, here are the speedups (relative to one thin core) of 100 thin cores and 90 thin cores for the parallel code plus one fat core for the sequential code:

Speedup100 = 1 / (0.1 + 0.9/100) = 9.2 times faster
Speedup91 = 1 / (0.1/2 + 0.9/90) = 16.7 times faster

In this example of manycore processor speedup, a fat core needing 10 times as many resources would be more effective than the 10 thin cores it replaces.5,15
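A few lines of Python (a sketch reproducing the arithmetic above, with the 2x fat-core speedup and the 10% sequential fraction as the given assumptions) make the comparison easy to replay for other configurations:

    # Amdahl's Law for a program with a sequential fraction, run on
    # n_thin thin cores, with the sequential piece running seq_speed
    # times as fast as on one thin core.
    def speedup(seq_frac, seq_speed, n_thin):
        return 1.0 / (seq_frac / seq_speed + (1 - seq_frac) / n_thin)

    print(speedup(0.1, 1, 100))  # 100 thin cores          -> ~9.2
    print(speedup(0.1, 2, 90))   # 90 thin cores + 1 fat   -> ~16.7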
One notable challenge for the hardware tower is that it takes four to five years to design and build chips and port software to evaluate them. Given this lengthy cycle, how could researchers innovate more quickly?

Software span. Software is the main problem in bridging the gap between users and the parallel IT industry. Hence, the long distance of the span in Figure 2 reflects the daunting magnitude of the software challenge.

Figure 1. Microprocessor clock rates of Intel products vs. projections from the International Technology Roadmap for Semiconductors in 2005 and 2007.16 (Chart: clock rate in GHz, 0 to 25, over 2001 to 2013, with curves for the 2005 and 2007 roadmaps and points for Intel single-core and multicore products.)

Figure 2. Bridge analogy connecting users to a parallel IT industry, inspired by the view of the Golden Gate Bridge from Berkeley, CA. (Diagram: the IT industry at one end and users at the other, with a hardware tower, an applications tower, and a software span between them.)

One especially vexing challenge for the parallel software span is that sequential programming accommodates the wide range of skills of today's programmers. Our experience teaching parallelism suggests that not every programmer is able to understand the nitty gritty of concurrent software and parallel hardware; difficult steps include locks, barriers, deadlocks, load balancing, scheduling, and memory consistency. How can researchers develop technology so all programmers benefit from the parallel revolution?

A second challenge is that two critical pieces of system software—compilers and operating systems—have grown large and unwieldy and hence
resistant to change. One estimate is that it takes a decade for a new compiler optimization to become part of production compilers. How can researchers innovate rapidly if compilers and operating systems evolve so glacially?

A final challenge is how to measure improvement in parallel programming languages. The history of these languages largely reflects researchers deciding what they think would be better and then building it for others to try. As humans write programs, we wonder whether human psychology and human-subject experiments shouldn't be allowed to play a larger role in this revolution.17

Applications tower. The goal of research into parallel computing should be to find compelling applications that thirst for more computing than is currently available and absorb the biennially increasing number of cores for the next decade or two. Success does not require improvement in the performance of all legacy software. Rather, we need to create compelling applications that effectively utilize the growing number of cores while providing software environments that ensure that legacy code still works with acceptable performance.

Note that the notion of "better" is not defined by only average performance; advances could be in, say, worst-case response time, battery life, reliability, or security. To save the IT industry, researchers must demonstrate greater end-user value from an increasing number of cores.

Par Lab
As a concrete example of the parallel landscape, we describe Berkeley's Par Lab project,a exploring one of many potential approaches, though we won't know for years which of our ideas will bear fruit. We hope it inspires more researchers to participate, increasing the chance of finding a solution before it's too late for the IT industry.

Given a five-year project, we project the state of the field in five to 10 years, anticipating that IT will be driven to extremes in size due to the increasing popularity of software as a service, or SaaS:

The datacenter is the server. Amazon, Google, Microsoft, and other major IT vendors are racing to construct buildings with 50,000 or more servers to run SaaS, inspiring the new catchphrase "cloud computing."b They have also begun renting thousands of machines by the hour to enable smaller companies to benefit from cloud computing. We expect these trends to accelerate; and

The mobile device (laptops and handhelds) is the client. In 2007, Hewlett-Packard, the largest maker of PCs, shipped more laptops than desktops. Millions of cellphones are shipped each day with ever-increasing functionality, a trend we expect to accelerate as well.

Surprisingly, these extremes in computing share many characteristics. Both concern power and energy—the datacenter due to the cost of power and cooling and the mobile client due to battery life. Both concern cost—the datacenter because server cost is replicated 50,000 times and mobile clients because of a lower unit-price target. Finally, the software stacks are becoming similar, with more layers for mobile clients and increasing concern about protection and security.

a In March 2007, Intel and Microsoft invited 25 universities to propose five-year centers for parallel computing research; the Berkeley and Illinois efforts were ranked first and second.
b See Armbrust, M. et al. Above the Clouds: A Berkeley View of Cloud Computing. University of California, Berkeley, Technical Report EECS-2009-28.
Many datacenter applications have ample parallelism across independent users, so the Par Lab focuses on parallelizing applications for clients. The multicore and manycore chips in the datacenter stand to benefit from the same tools and techniques developed for similar chips in mobile clients.

Given this projection, we decided to take a fresh approach: the Par Lab will be driven top-down, applications first, then software, and finally hardware.

Par Lab application tower. An unfortunate computer science tradition is we build research prototypes, then wonder why applications people don't use them. In the Par Lab, we instead selected applications up-front to drive research and provide concrete goals and metrics to evaluate progress. We selected each application based on five criteria: compelling in terms of likely market or social impact, with short-term feasibility and longer-term potential; requiring significant speedup or smaller, more efficient platform to work as intended; covering the possible platforms and markets likely to dominate usage; enabling technology for other applications; and involvement of a local committed expert application partner to help design, use, and evaluate our technology.

Here are the five initial applications we're developing:

Music/hearing. High-performance signal processing will permit: concert-quality sound-delivery systems for home sound systems and conference calls; composition and gesture-driven live-performance systems; and much improved hearing aids;

Speech understanding. Dramatically improved automatic speech recognition in moderately noisy and reverberant environments would greatly improve existing applications and enable new ones, like, say, a real-time meeting transcriber with rewind and search. Depending on acoustic conditions, current transcribers can generate many errors;

Content-based image retrieval. Consumer-image databases are growing so dramatically they require automated search instead of manual labeling. Low error rates require processing very high dimensional feature spaces. Current image classifiers are too slow to deliver adequate response times;

Intraoperative risk assessment for stroke patients. Advanced physiological blood-flow modeling based on computational analysis of 3D medical images of a patient's cerebral vasculature enables "virtual stress testing" to risk-stratify stroke victims intraoperatively. Patients thus identified at low risk of complications can then be treated to mitigate the effects of the stroke. This technology will ultimately lower complication rates in treating stroke victims, improve quality of life, reduce medical care expenditures, and save lives; and

Parallel browser. The browser will be the largest and most important application on many mobile devices. We will first parallelize sequential browser bottlenecks. Rather than parallelizing JavaScript programs, we are pursuing an actor language with implicit parallelism. Such a language may be accessible to Web programmers while allowing them to extract the parallelism in the browser's JIT compiler, thereby turning all Web-site developers unknowingly into parallel programmers.

Application-domain experts are first-class members of the Par Lab project. Rather than try to answer design questions abstractly, we ask our experts what they prefer in each case. Project success is judged by the user experience with the collective applications on our hardware-software prototypes. If successful, we imagine building on these five applications to create other applications that are even more compelling, as in the following two examples:

Name Whisperer. Imagine that your mobile client peeking out of your shirt pocket is able to recognize the person walking toward you to shake your hand. It would search a personal image database, then whisper in your ear, "This man is John Smith. He got an A– from you in CS152 in 1993"; and

Health Coach. As your mobile client is always with you, you could take pictures and weigh your dishes (assuming it has a built-in scale) before and after each meal. It would also record how much you exercise. Given calories consumed and burned and an image of your body, it could visualize what you're likely to look like in six months at this rate and what you'd look like if you ate less or exercised more.

Par Lab software span. Software is the major effort of the project, and we're taking a different path from previous parallel projects, emphasizing software architecture, autotuning, and separate support for productivity vs. performance programming.

Architecting parallel software with design patterns, not just parallel programming languages. Our situation is similar to that found in other engineering disciplines where a new challenge emerges that requires a top-to-bottom rethinking of the entire engineering process; for example, in civil architecture, Filippo Brunelleschi's solution in 1418 for how to construct the dome for the Cathedral of Florence required innovations in tools and building techniques, as well as rethinking the whole process of developing an architecture. All computer science faces a similar challenge; parallel programming is overdue for a fundamental rethinking of the process of designing software.

Programmers have been trying to craft parallel code for decades and learned a great deal about what works and what doesn't work. Automatic parallelism doesn't work. Compilers are great at low-level scheduling decisions but can't discover new algorithms to exploit concurrency. Programmers in high-performance computing have shown that explicit technologies (such as MPI and OpenMP) can be made to work but too often require heroic effort untenable for most commercial software vendors.

To engineer high-quality parallel software, we plan to rearchitect the software through a "design pattern language." As explored in his 1977 book, civil architect Christopher Alexander wrote that "design patterns" describe time-tested solutions to recurring problems within a well-defined context.3 An example is Alexander's "family of entrances" pattern, addressing how to simplify comprehension of multiple entrances for a first-time visitor to a site. He defined a "pattern language" as a collection of related and interlocking patterns, constructed such that the patterns flow into each other as the designer solves a design problem.

Computer scientists are trained to think in well-defined formalisms. Pattern languages encourage a less formal, more associative way of thinking about a problem. A pattern language does not impose a rigid methodology; rather, it fosters creative problem
for software design is not new. An example is Gamma et al.'s 1994 book Design Patterns, which outlined patterns useful for object-oriented programming.12 In building our own pattern language, we found Shaw's and Garlan's report,23 which described a variety of architectural styles useful for organizing software, to be very effective. That these architectural styles may also be viewed as design patterns was noted earlier by Buschmann in his 1996 book Pattern-Oriented Software Architecture.7 In particular, we adopted Pipe-and-Filter, Agent-and-Repository, Process Control, and Event-Based architectural styles as structural patterns within our pattern language. To this list, we add MapReduce and Iterator as structural design patterns.
and Iterator as structural design patterns. employed by the Par Lab to develop the
These patterns define the structure software architectures and parallel im-
of a program but do not indicate what plementations of such diverse applica-
is actually computed. To address this tions as content-based image retrieval,
blind spot, another key part of our pat- large-vocabulary continuous speech
tern language is the set of “dwarfs” of recognition, and timing analysis for in-
the Berkeley View reports4,5 (see Fig- tegrated circuit design.
ure 3). Dwarfs are best understood as Patterns are conceptual tools that
computational patterns providing the help a programmer reason about a
computational interior of the structural software project and develop an ar-
patterns discussed earlier. By analogy, chitecture but are not themselves
the structural patterns describe a fac- implementation mechanisms for
tory’s physical structure and general producing code.
workflow. The computational patterns Split productivity and efficiency lay-
describe the factory’s machinery, flow ers, not just a single general-purpose
of resources, and work products. Struc- layer. A key Par Lab research objective
tural and computational patterns can is to enable programmers to easily
be combined to architect arbitrarily write programs that run as efficiently
complex parallel software systems. on manycore systems as on sequential
Convention holds that truly useful ones. Productivity, efficiency, and cor-
patterns are not invented but mined rectness are inextricably linked and
from successful software applications. must be addressed together. These ob-
To arrive at our list of useful compu- jectives cannot be accomplished with
tational patterns we began with those a single-point solution (such as a uni-
compiled by Phillip Collela of Law- versal language). In our approach, pro-
rence Berkeley National Laboratory of ductivity is addressed in a productivity
the “seven dwarfs of high-performance layer that uses a common composition
computing.” Then, in 2006 and 2007 and coordination language to glue to-
we worked with domain experts to gether the libraries and programming
broadly survey other application ar- frameworks produced by the efficien-
eas, including embedded systems, cy-layer programmer. Efficiency is prin-
general-purpose computing (SPEC cipally handled through an efficiency
benchmarks), databases, games, arti- layer that is targeted for use by expert
ficial intelligence/machine learning, parallel programmers.
computer-aided design of integrated The key to generating a successful
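A toy sketch of the split, with hypothetical names and Python standing in for the composition language: the efficiency-layer expert hides parallelism behind a call with serial semantics, and the productivity-layer code merely composes such calls.

    # Efficiency layer: expert-written, parallelism is an internal detail.
    from multiprocessing import Pool

    def _chunk_sum(chunk):
        return sum(x * x for x in chunk)

    def sum_of_squares(data, workers=4):
        # Serial semantics for the caller; the Pool is hidden inside.
        chunks = [data[i::workers] for i in range(workers)]
        with Pool(workers) as pool:
            return sum(pool.map(_chunk_sum, chunks))

    # Productivity layer: composition only, no concurrency to reason about.
    if __name__ == "__main__":
        print(sum_of_squares(list(range(1_000_000))))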
The key to generating a successful multicore software developer community is to maximally leverage the efforts of parallel programming experts by encapsulating their software for use by the programming masses. We use the term "programming framework" to mean a software environment that supports implementation of the solution proposed by the associated design pattern. The difference between a programming framework and a general programming model or language is that in a programming framework the customization is performed only at specified points that are harmonious with the style embodied in the original design pattern. An example of a successful sequential programming framework is the Ruby on Rails framework, which is based on the Model-View-Controller pattern.26 Users have ample opportunity to customize the framework but only in harmony with the core Model-View-Controller pattern.

Frameworks include libraries, code generators, and runtime systems that assist programmers with implementation by abstracting difficult portions of the computation and incorporating them into the framework itself. Historically successful parallel frameworks encode the collective experience of the programming community's solutions to recurring problems. Basing frameworks on pervasive design patterns will help make parallel frameworks broadly applicable.

Productivity-layer programmers will compose libraries and programming frameworks into applications with the help of a composition and coordination language.13 The language will be implicitly parallel; that is, its composition will have serial semantics, meaning the composed programs will be safe (such as race-free) and virtualized with respect to processor resources. It will document and check interface restrictions to avoid concurrency bugs resulting from incorrect composition, as in, say, instantiating a framework with a stateful function when a stateless one is required. Finally, it will support definition of domain-specific abstractions for constructing frameworks for specific applications, offering a programming experience similar to MATLAB and SQL.

Parallel programs in the efficiency layer are written very close to the machine, with the goal of allowing the best possible algorithm to be written in the primitives of the layer. Unfortunately, existing multicore systems do not offer a common low-level programming model for parallel code. We are thus defining a thin portability layer that runs efficiently across single-socket platforms and includes features for parallel job creation, synchronization, memory allocation, and bulk-memory access. To provide a common model of memory across machines with coherent caches, local stores, and relatively slow off-chip memory, we are defining an API based on the idea of logically partitioned shared memory, inspired by our experience with Unified Parallel C,27 which partitions memory among processors but not (currently) between on- and off-chip.

We may implement this efficiency language either as a set of runtime primitives or as a language extension of C. It will be extensible with libraries to experiment with various architectural features (such as transactions, dynamic multithreading, active messages, and collective communication). The API will be implemented on some existing multicore and manycore platforms and on our own emulated manycore design.

Figure 3. The color of a cell (for 12 computational patterns in seven general application areas and five Par Lab applications) indicates the presence of that computational pattern in that application; red/high; orange/moderate; green/low; blue/rare. (Application areas: Embed, Games, SPEC, CAD, HPC, ML, DB; patterns shown include 2. Circuits, 3. Graph Algorithms, 4. Structured Grid, 5. Dense Matrix, 6. Sparse Matrix, 7. Spectral (FFT), 8. Dynamic Prog, 9. Particle Methods, 10. Backtrack/B&B.)
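The logically partitioned shared memory idea can be pictured with a toy class; this is our illustration of the concept, not the Par Lab API or UPC syntax. Every element of a shared array has affinity to one processor, and an access is local or remote depending on who asks:

    # Toy partitioned global address space: one flat array, with each
    # element assigned a home processor by a cyclic layout.
    class PartitionedArray:
        def __init__(self, n, nprocs):
            self.data = [0.0] * n
            self.nprocs = nprocs

        def affinity(self, i):
            return i % self.nprocs  # which processor "owns" element i

        def get(self, i, me):
            kind = "local" if self.affinity(i) == me else "remote"
            return self.data[i], kind

    a = PartitionedArray(8, nprocs=4)
    print(a.get(5, me=1))  # element 5 has affinity to processor 1: local
    print(a.get(6, me=1))  # element 6 lives on processor 2: remote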
To engineer parallel software, programmers must be able to start with effective software architectures, and the software engineer would describe the solution to a problem in terms of a design pattern language. Based on this language, the Par Lab is creating a family of frameworks to help turn a design into working code. The general-purpose programmer will work largely with the frameworks and stay within what we call the productivity layer. Specialist programmers trained in the details of parallel programming technology will work within the efficiency layer to implement the frameworks and map them onto specific hardware platforms. This approach will help general-purpose programmers create parallel software without having to master the low-level details of parallel programming.

Generating code with search-based autotuners, not compilers. Compilers that automatically parallelize sequential code may have great commercial value as computers go from one to two to four cores, though as described earlier, history suggests they will be unable to scale from 32 to 64 to 128 cores. Compiling will be even more difficult, as the switch to multicore means microprocessors are becoming more diverse, since conventional wisdom is not yet established for multicore architectures. For example, the table here shows the diversity in designs of x86 and SPARC multicore computers. In addition, as the number of cores increase, manufacturers will likely offer products with differing numbers of cores per chip to cover multiple price-performance points. They will also allow each core to vary its clock frequency to save power. Such diversity will make the goals of efficiency, scaling, and portability even more difficult for conventional compilers, at a time when innovation is desperately needed.

In recent years, autotuners have become popular for producing high-quality, portable scientific code for serial microprocessors,10 optimizing a set of library kernels by generating many variants of a kernel and measuring each variant by running on the target platform. The search process effectively tries many or all optimization switches; hence, searching may take hours to complete on the target platform. However, search is performed only once, when the library is installed. The resulting code is often several times faster than naive implementations. A single autotuner can be used to generate high-quality code for a variety of machines. In many cases, the autotuned code is faster than vendor libraries that were specifically hand-tuned for the target machine. This surprising result is partly explained by the way the autotuner tirelessly tries many unusual variants of a particular routine. Unlike libraries, autotuners also allow tuning to the particular problem size. Autotuners also preserve clarity and support portability by reducing the temptation to mangle the source code to improve performance for a particular computer.

Autotuning also helps with production of parallel code. However, parallel architectures introduce many new optimization parameters; so far, there are few successful autotuners for parallel codes. For any given problem, there may be several parallel algorithms, each with alternative parallel data layouts. The optimal choice may depend not only on the processor architecture but also on the parallelism of the computer and memory bandwidth. Consequently, in a parallel setting, the search space will be much larger than for traditional serial hardware.

The table lists the results of autotuning on three multicores for three kernels related to the dwarfs' sparse matrix, stencil for PDEs, and structured grids9,30,31 mentioned earlier. This autotuned code is the fastest known for these kernels for all three computers. Performance increased by factors of two to four over standard code, much better than you would expect from an optimizing compiler.

Efficiency-layer programmers will be able to build autotuners for use by domain experts and other efficiency-layer programmers to help deliver on the goals of efficiency, portability, and scalability.
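In miniature, the autotuning loop looks like the following Python sketch (a toy kernel and search space of our own, not a real autotuner): generate variants (here, different block sizes), time each one on the target machine, and keep the winner.

    # Toy autotuner: search over blocked variants of a matrix-vector
    # kernel and keep the fastest for this machine.
    import time, random

    N = 512
    A = [[random.random() for _ in range(N)] for _ in range(N)]
    x = [random.random() for _ in range(N)]

    def matvec_blocked(block):
        y = [0.0] * N
        for ii in range(0, N, block):       # the tunable parameter
            for i in range(ii, min(ii + block, N)):
                s, row = 0.0, A[i]
                for j in range(N):
                    s += row[j] * x[j]
                y[i] = s
        return y

    best = None
    for block in (8, 16, 32, 64, 128):      # the "search space"
        t0 = time.perf_counter()
        matvec_blocked(block)
        dt = time.perf_counter() - t0
        if best is None or dt < best[1]:
            best = (block, dt)
    print("best block size:", best)

The point is that the search, not a human or a compiler heuristic, picks the variant; a production autotuner explores thousands of variants and caches the result at install time.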
Synthesis with sketching. One challenge for autotuning is how to produce the high-performance implementations explored by the search. One approach is to synthesize these complex programs. In doing so, we rely on the search for performance tuning, as well as for programmer productivity. To address the main challenge of traditional synthesis—the need for experts to communicate their insight with a formal domain theory—we allow that insight to be communicated directly by programmers who write an incomplete program, or "sketch." In it, they provide an algorithmic skeleton, and the synthesizer supplies the low-level mechanics by filling in the holes in the sketch. The synthesized mechanics could be barrier synchronization expressions or tricky loop bounds in stencil loops. Our sketching-based synthesis is to traditional, deductive synthesis what model checking is to theorem proving; rather than interactively deriving a program, our system searches a space of candidate programs with constraint solving. Efficiency is achieved by reducing the problem to one solved with two communicating SAT solvers. In future work, we hope to synthesize parallel sparse matrix codes and data-parallel algorithms for additional problems (such as parsing).
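A toy version of sketching, with brute-force enumeration standing in for the communicating SAT solvers: the programmer leaves a constant "hole" in a bit-manipulation skeleton, and the synthesizer searches for the value that makes the sketch match a reference implementation on all test inputs.

    # Reference implementation: isolate the lowest set bit of x.
    def spec(x):
        return x & -x

    # Programmer's sketch: the skeleton is given, the constant is a hole.
    def sketch(x, hole):
        return x & (~x + hole)

    candidates = range(-4, 5)
    tests = range(1, 64)
    solutions = [h for h in candidates
                 if all(sketch(x, h) == spec(x) for x in tests)]
    print("synthesized hole value:", solutions)  # [1], since ~x + 1 == -x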
Table. Autotuned performance for three kernels (SpMV, Stencil, LBMHD) on three multicore computers; each group of three columns is one machine, and rows show successive levels of optimization.

    Optimization   SpMV  Stencil  LBMHD   SpMV  Stencil  LBMHD   SpMV  Stencil  LBMHD
    Standard        1.0    1.3     3.5     1.4    1.5     3.0     2.1    0.5     3.4
    Prefetching     1.1    1.7     4.6     2.9    3.8     8.1     3.6    0.5    10.5
    Final           1.5    2.5     5.6     3.6    8.0    14.1     4.1    6.7    10.5
Verification and testing, not one or the other. Correctness is addressed differently at the two layers. The productivity layer is free from concurrency problems because the parallelism models are restricted, and the restrictions are enforced. The efficiency-layer code is checked automatically for subtle concurrency errors.

A key challenge in verification is obtaining specifications for programs to verify. Modular verification and automated unit-test generation require the specification of high-level serial semantic constraints on the behavior of the individual modules (such as parallel frameworks and parallel libraries). To simplify specification, we use executable sequential programs with the same behavior as a parallel component, augmented with atomicity constraints on a task,21 predicate abstractions of the interface of a module,14 or multiple ownership types.8

Programmers often find it difficult to specify such high-level contracts involving large modules; however, most find it convenient to specify local properties of programs using assert statements and type annotations. Local assertions and type annotations are often generated from a program's implicit correctness requirements (such as data race, deadlock freedom, and memory safety). The system propagates implications of these local assertions to the module boundaries through a combination of static verification and directed automated unit testing. These implications create serial contracts that specify how the modules (such as frameworks) are used correctly. When the contracts for the parallel modules are in place, programmers use static program verification to check if the client code composed with the contracts is correct.

Static program analysis in the presence of pointers and heap memory falsely reports many errors that cannot really occur. For restricted parallelism models with global synchronization, this analysis becomes more tractable, and a recently introduced technique called "directed automated testing," or concolic unit testing, has shown promise for improving software quality through automated test generation using a combination of static and dynamic analyses.21 The Par Lab combines directed testing with model-checking algorithms to unit-test parallel frameworks and libraries composed with serial contracts. Such techniques enable programmers to quickly test executions for data races and deadlocks directly, since a combination of directed test input generation and model checking hijacks the underlying scheduler and controls the synchronization primitives. Our testing techniques will provide deterministic replay and debugging capabilities at low cost. We will also develop randomized extensions of our directed testing techniques to build a probabilistic model of path coverage. The probabilistic models will give a more realistic estimate of coverage of race and other concurrency errors in parallel programs.
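The flavor of this kind of schedule exploration can be shown with a toy enumeration (our illustration, not the Par Lab tools): exhaustively trying the interleavings of two unsynchronized read-modify-write updates exposes the schedule that loses an update.

    # Enumerate all schedules of two threads, each doing an
    # unsynchronized load/store increment of a shared counter.
    import itertools

    def run(schedule):
        shared = 0
        regs = {}
        for tid, op in schedule:
            if op == "load":
                regs[tid] = shared
            else:  # store
                shared = regs[tid] + 1
        return shared

    ops = [(0, "load"), (0, "store"), (1, "load"), (1, "store")]
    results = set()
    for perm in set(itertools.permutations(ops)):
        # keep only schedules where each thread's load precedes its store
        if all(perm.index((t, "load")) < perm.index((t, "store")) for t in (0, 1)):
            results.add(run(perm))
    print(results)  # {1, 2}: the schedules yielding 1 are the lost updates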
ic analyses.21 The Par Lab combines completing it in the same time sequen- Space-time partitioning for decon-
directed testing with model-checking tially on one fast core; structed operating systems. Space-time
algorithms to unit-test parallel frame- Energy amortization. Preferring data- partitioning is crucial for manycore cli-
works and libraries composed with se- parallel algorithms over other styles of ent operating systems. A spatial partition
rial contracts. Such techniques enable parallelism, as SIMD and vector com- (partition for short) is an isolated unit
programmers to quickly test executions puters amortize the energy expended containing a subset of physical machine
for data races and deadlocks directly, on instruction delivery; and resources (such as cores, cache parti-
since a combination of directed test Energy savings. Message-passing pro- tions, guaranteed fractions of memory
input generation and model checking grams may be able to save the energy or network bandwidth, and energy
hijacks the underlying scheduler and used by cache coherence. budget). Space-time partitioning virtu-
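A back-of-the-envelope model suggests why the first source can pay off. The cubic power-versus-frequency relationship and all the numbers below are hypothetical textbook idealizations, not measurements of any Par Lab system.

def energy_joules(work_ops, cores, watts_per_core, ghz):
    """Energy = total power x elapsed time for a fixed amount of work."""
    seconds = work_ops / (cores * ghz * 1e9)
    return cores * watts_per_core * seconds

def core_power(ghz, base_ghz=2.0, base_watts=10.0):
    """Idealized dynamic power: scales with frequency cubed (voltage tracks frequency)."""
    return base_watts * (ghz / base_ghz) ** 3

work = 1e12  # operations
fast_serial = energy_joules(work, cores=1, watts_per_core=core_power(2.0), ghz=2.0)
slow_parallel = energy_joules(work, cores=4, watts_per_core=core_power(0.5), ghz=0.5)
print(fast_serial, slow_parallel)  # same 500s deadline; ~16x less energy in parallel

The payoff assumes the parallel version does no extra work, which is exactly the work-efficiency requirement discussed next.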
We apply these principles in our work on parallel Web browsers. In algorithm design, we observe that to save energy with parallelization, parallel algorithms must be close to "work efficient," that is, they should perform no more total work than a sequential algorithm, or else parallelization is counterproductive. The same argument applies to optimistic parallelization. Work efficiency is a demanding requirement, since, for some "inherently sequential" problems, like finite-state machines, only work-inefficient algorithms are known. In this context, we developed a nearly work-efficient algorithm for lexical analysis. We are also working on data-parallel algorithms for Web-page layout and identifying parallelism in future Web-browser applications, attempting to implement them with efficient message passing.

Space-time partitioning for deconstructed operating systems. Space-time partitioning is crucial for manycore client operating systems. A spatial partition (partition for short) is an isolated unit containing a subset of physical machine resources (such as cores, cache partitions, guaranteed fractions of memory or network bandwidth, and energy budget). Space-time partitioning virtualizes spatial partitions by time-multiplexing whole partitions onto available hardware but at a coarse-enough granularity to allow efficient programmer-level scheduling in a partition.

The presence of space-time partitioning leads to restructuring systems
to perform scheduling and resource management at the partition granularity. Applications and OS services (such as file systems) run within their own partitions. Partitions are lightweight and can be resized or suspended with overheads similar to a process-context swap.

A key tenet of our approach is that resources given to a partition are either exclusive (such as cores or private caches) or guaranteed via a quality-of-service contract (such as a minimum fraction of network or memory bandwidth). During a scheduling quantum, the application runtime within a partition is given unrestricted "bare metal" access to its resources and may schedule tasks onto them in some way. Within a partition, our approach has much in common with the Exokernel.11 In the common case, we expect many application runtimes to be written as libraries (similar to libOS). Our Tessellation kernel is a thin layer responsible for only the coarse-grain scheduling and assignment of resources to partitions and implementation of secure restricted communications among partitions. The Tessellation kernel is much thinner than traditional kernels or even hypervisors. It avoids many of the performance issues associated with traditional microkernels by providing OS services through secure messaging to spatially co-resident service partitions, rather than context-switching to time-multiplexed service processes.

Par Lab hardware tower. Past parallel projects were often driven by the hardware determining the application and software environment. The Par Lab is driven top down from the applications, so the question this time is: what should architects do to help with the goals of productivity, efficiency, correctness, portability, and scalability? Here are four examples of this kind of help that illustrate our approach:

Supporting OS partitioning. Our hardware architecture enforces partitioning of not only the cores and on-chip/off-chip memory but also the communication bandwidth among these components.

Optional explicit control of the memory hierarchy. Caches were invented so hardware could manage a memory hierarchy without troubling the programmer. When it takes hundreds of clock cycles to go to memory, programmers and compilers try to reverse-engineer the hardware controllers to make better use of the hierarchy. This backward situation is especially apparent for hardware prefetchers when programmers try to create a particular pattern that will invoke good prefetching. Our approach aims to allow programmers to quickly turn a cache into an explicitly managed local store and the prefetch engines into explicitly controlled Direct Memory Access engines. To make it easy for programmers to port software to our architecture, we also support a traditional memory hierarchy. The low-overhead mechanism we use allows programs to be composed of methods that rely on local stores and methods that rely on memory hierarchies.

Accurate, complete counters of performance and energy. Sadly, performance counters on current single-core computers often miss important measurements (such as prefetched data) or are unique to a computer and only understandable by the machine's designers. We will include performance enhancements in the Par Lab architecture only if they have counters to measure them accurately and coherently. Since energy is as important as performance, we also include energy counters so software can improve both. Moreover, these counters must be integrated with the software stack to provide insightful measurements to the efficiency-layer and productivity-layer programmers. Ideally, this research will lead to a standard for performance counters so schedulers and software development kits can count on them on any multicore.

Intuitive performance model. The multicore diversity mentioned earlier exacerbates the already difficult jobs performed by programmers, compiler writers, and architects. Hence, we developed an easy-to-understand visual
model with built-in performance guidelines to identify bottlenecks in the dozen dwarfs in Figure 3.29 The Roofline model plots computational and memory-bandwidth limits, then determines the best possible performance of a kernel by examining the average number of operations per memory access. It also plots ceilings below the "roofline" to suggest the optimizations that might be useful for improving performance. One goal of the performance counters should be to provide everything needed to automatically create Roofline models.
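The model's basic bound is simple enough to compute directly. Here is a minimal sketch, with hypothetical peak-performance and bandwidth numbers standing in for measured machine parameters.

def roofline_gflops(peak_gflops, peak_gbps, flops_per_byte):
    """Attainable performance is the lower of the compute roof and the
    memory roof (bandwidth times arithmetic intensity)."""
    return min(peak_gflops, peak_gbps * flops_per_byte)

# Hypothetical machine: 75 GFLOP/s peak compute, 20 GB/s memory bandwidth.
for ai in (0.25, 1.0, 4.0, 16.0):  # operations per byte of memory traffic
    print(f"intensity {ai:5.2f} flops/byte -> {roofline_gflops(75.0, 20.0, ai):6.1f} GFLOP/s")

Kernels whose intensity falls left of the ridge point (here 75/20 = 3.75 flops/byte) are bandwidth-bound, which is exactly what the ceilings below the roofline help diagnose.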
A notable challenge from our earlier description of the hardware tower is how to rapidly innovate at the hardware/software interface, when it can take four to five years to build chips and run the programs needed to evaluate them. Given the capacity of field-programmable gate arrays (FPGAs), researchers can prototype full hardware and software systems that run fast enough to investigate architectural innovations. This flexibility means researchers can "tape out" every day, rather than over years. We will leverage the Research Accelerator for Multiple Processors (RAMP) Project (http://ramp.eecs.berkeley.edu/) to build flexible prototypes fast enough to run full software stacks—including new operating systems and our five compelling applications—to enable rapid architecture innovation using future prototype software, rather than past benchmarks.28

Reasons for Optimism

Given the history of parallel computing, it's easy to be pessimistic about our chances. The good news is that there are plausible reasons researchers could succeed this time:

No killer microprocessor. Unlike in the past, no one is building a faster serial microprocessor; programmers needing more performance have no option other than parallel hardware;

New measures of success. Rather than the traditional goal of linear speedup for all software as the number of processors increases, success can reflect improved responsiveness or MIPS/Joule for a few new parallel killer apps;

All the wood behind one arrow. As there is no alternative, the whole IT industry is committed, meaning many more people and companies are working on the problem;

Manycore synergy with cloud computing. SaaS applications in data centers with millions of users are naturally parallel and thus aligned with manycore, even if client apps are not;

Vitality of open source software. The OSS community is a meritocracy, so it's likely to embrace technical advances rather than be limited by legacy code. Though OSS has existed for years, it is more important commercially today than it was;

Single-chip multiprocessors enable innovation. Having all processors on the same chip enables inventions that were impractical or uneconomical when spread across many chips; and

FPGA prototypes shorten the hardware/software cycle. Systems like RAMP help researchers explore designs of easy-to-program manycore architectures and build prototypes more quickly than they ever could with conventional hardware prototypes.

Given the importance of the challenges to our shared future in the IT industry, pessimism is not a sufficient excuse to sit on the sidelines. The sin is not lack of success but lack of effort.

Related Projects

Computer science hasn't solved the parallel challenge, though not for lack of trying. There are a dozen conferences dedicated to parallelism, including Principles and Practice of Parallel Programming, Parallel Algorithms and Architectures, Parallel and Distributed Processing, and Supercomputing. All traditionally focus on high-performance computing; the target hardware is usually large-scale computers with thousands of microprocessors. Similarly, there are many high-performance computing research centers. Rather than review this material, here we highlight four centers focused on multicore computers and their approaches to the parallel challenge in academia:

Illinois. The Universal Parallel Computing Research Center (http://www.upcrc.illinois.edu/) at the University of Illinois focuses on making it easy for domain experts to take advantage of parallelism, so the emphasis is more on productivity in specific domains than on generality or performance.1 It relies on advancing compiler technology to find opportunities for parallelism, whereas the Par Lab focuses on autotuning. The Center is pursuing deterministic models that allow programmers to reason with sequential semantics for testing while naturally exposing a parallel performance model for WYSIWYG performance. For reactive programs where parallelism is part of the problem, it is pursuing a shared-nothing approach that leverages actor-like models used in distributed systems. For application domains that allow greater specialization, it is developing a framework to generate domain-specific environments that either hide concurrency or expose only specialized forms of concurrency to the end user while exploiting domain-specific optimizations and performance measures. Initial applications and domains include teleimmersion via "virtual teleportation" (multimedia), dynamic real-time virtual environments (computer graphics), learning by reading, and authoring assistance (natural language processing).

Stanford. The Pervasive Parallelism Laboratory (http://ppl.stanford.edu/wiki/index.php/Pervasive_Parallelism_Laboratory) at Stanford University takes an application-driven approach toward parallel computing that extends from programming models down to hardware architecture. The key technical concepts are domain-specific languages for increasing programmer productivity and a common parallel runtime environment combining dynamic and static approaches for concurrency and locality management. There are domain-specific languages for artificial intelligence and robotics, business data analysis, and virtual worlds and gaming. The experimental platform is the Flexible Architecture Research Machine, or FARM, system, combining commercial processors with FPGAs in the memory fabric.

Georgia Tech. The Sony, Toshiba, IBM Center of Competence for the Cell Broadband Engine Processor (http://sti.cc.gatech.edu/) at Georgia Tech focuses on a single multicore computer, as its name suggests. Researchers explore versions of programs on Cell, including image compression6 and financial modeling.2 The Center also sponsors workshops and provides remote access to Cell hardware.

Rice University. The Habanero Multicore Software Project (http://habanero.rice.edu/Habanero_Home.html) at
Rice University is developing languages, compilers, managed runtimes, concurrency libraries, and tools that support portable parallel abstractions with high productivity and high performance for multicores; examples include parallel language extensions25 and optimized synchronization primitives.24

Conclusion

We've provided a general view of the parallel landscape, suggesting that the goal of computer science should be making parallel computing productive, efficient, correct, portable, and scalable. We highlighted the importance of finding new compelling applications and the advantages of manycore and heterogeneous hardware. We also described the research of the Berkeley Par Lab. While it will take years to learn which of our ideas work well, we share it here as a concrete example of a coordinated attack on the problem.

Unlike the traditional approach of making hardware king, the Par Lab is application-driven, working with domain experts to create compelling applications in music, image- and speech-recognition, health, and parallel browsers.

The software span connecting applications to hardware relies more on parallel software architectures than on parallel programming languages. Instead of traditional optimizing compilers, we depend on autotuners, using a combination of empirical search and performance modeling to create highly optimized libraries tailored to specific machines. By splitting the software stack into a productivity layer and an efficiency layer and targeting them at domain experts and programming experts, respectively, we hope to bring parallel computing to all programmers while keeping domain experts productive and allowing expert programmers to achieve maximum efficiency. Our approach to correctness relies on verification where possible, then uses the same tools to reduce the amount of testing where verification is not possible.

The hardware tower of the Par Lab serves the software span and application tower. Examples of such service include support for OS partitioning, explicit control of the memory hierarchy, accurate measurement of performance and energy, and an intuitive, multicore performance model. We also plan to try to scrape off the barnacles that have accumulated on the hardware/software stack over the years.

This parallel challenge offers the worldwide research community an opportunity to help IT remain a growth industry, sustain the parts of the worldwide economy that depend on the continuous improvement in IT cost-performance, and take a once-in-a-career chance to reinvent the whole software/hardware stack. Though there are reasons for optimism, the difficulty of the challenge is reflected in the numerous parallel failures of the past.

Combining upside and downside, this research challenge represents the most significant of all IT challenges over the past 50 years. We hope many more innovators will join this quest to build a parallel bridge.

Acknowledgments

This research is sponsored by the Universal Parallel Computing Research Center, which is funded by Intel and Microsoft (Award #20080469) and by matching funds from U.C. Discovery (Award #DIG07-10227). Additional support comes from the Par Lab Affiliate companies: National Instruments, NEC, Nokia, NVIDIA, and Samsung. We wish to thank our colleagues in the Par Lab and the Lawrence Berkeley National Laboratory collaborations who shaped these ideas.

References

1. Adve, S. et al. Parallel Computing Research at Illinois: The UPCRC Agenda. White Paper. University of Illinois, Urbana-Champaign, IL, Nov. 2008.
2. Agarwal, V., Liu, L.-K., and Bader, D. Financial modeling on the Cell broadband engine. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (Miami, FL, Apr. 14–18, 2008).
3. Alexander, C. et al. A Pattern Language: Towns, Buildings, Construction. Oxford University Press, 1977.
4. Asanovic, K. et al. The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View. UCB/EECS-2008-23, University of California, Berkeley, Mar. 21, 2008.
5. Asanovic, K. et al. The Landscape of Parallel Computing Research: A View from Berkeley. UCB/EECS-2006-183, University of California, Berkeley, Dec. 18, 2006.
6. Bader, D.A. and Patel, S. High-performance MPEG-2 software decoder on the Cell broadband engine. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (Miami, FL, Apr. 14–18, 2008).
7. Buschmann, F. et al. Pattern-Oriented Software Architecture: A System of Patterns. John Wiley & Sons, Inc., New York, 1996.
8. Clarke, D.G. et al. Ownership types for flexible alias protection. In Proceedings of the OOPSLA Conference (Vancouver, BC, Canada, 1998), 48–64.
9. Datta, K. et al. Stencil computation optimization and autotuning on state-of-the-art multicore architectures. In Proceedings of the ACM/IEEE Supercomputing (SC) 2008 Conference (Austin, TX, Nov. 15–21). IEEE Press, Piscataway, NJ, 2008.
10. Demmel, J., Dongarra, J., Eijkhout, V., Fuentes, E., Petitet, A., Vuduc, R., Whaley, R., and Yelick, K. Self-adapting linear algebra algorithms and software. Proceedings of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation 93, 2 (Feb. 2005), 293–312.
11. Engler, D.R. Exokernel: An operating system architecture for application-level resource management. In Proceedings of the 15th Symposium on Operating Systems Principles (Copper Mountain, CO, Dec. 3–6, 1995), 251–266.
12. Gamma, E. et al. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, Reading, MA, 1994.
13. Gelernter, D. and Carriero, N. Coordination languages and their significance. Commun. ACM 35, 2 (Feb. 1992), 97–107.
14. Henzinger, T.A. et al. Permissive interfaces. In Proceedings of the 10th European Software Engineering Conference (Lisbon, Portugal, Sept. 5–9). ACM Press, New York, 2005, 31–40.
15. Hill, M. and Marty, M. Amdahl's Law in the multicore era. IEEE Computer 41, 7 (2008), 33–38.
16. International Technology Roadmap for Semiconductors. Executive Summary, 2005 and 2007; http://public.itrs.net/.
17. Kantowitz, B. and Sorkin, R. Human Factors: Understanding People-System Relationships. John Wiley & Sons, Inc., New York, 1983.
18. Mattson, T., Sanders, B., and Massingill, B. Patterns for Parallel Programming. Addison-Wesley Professional, Reading, MA, 2004.
19. O'Hanlon, C. A conversation with John Hennessy and David Patterson. Queue 4, 10 (Dec. 2005/Jan. 2006), 14–22.
20. Patterson, D. and Hennessy, J. Computer Organization and Design: The Hardware/Software Interface, Fourth Edition. Morgan Kaufmann Publishers, Boston, MA, Nov. 2008.
21. Sen, K. and Viswanathan, M. Model checking multithreaded programs with asynchronous atomic methods. In Proceedings of the 18th International Conference on Computer-Aided Verification (Seattle, WA, Aug. 17–20, 2006).
22. Sen, K. et al. CUTE: A concolic unit testing engine for C. In Proceedings of the Fifth Joint Meeting European Software Engineering Conference (Lisbon, Portugal, Sept. 5–9). ACM Press, New York, 2005, 263–272.
23. Shaw, M. and Garlan, D. An Introduction to Software Architecture. Technical Report CMU/SEI-94-TR-21, ESC-TR-94-21. CMU Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, 1994.
24. Shirako, J., Peixotto, D., Sarkar, V., and Scherer, W. Phasers: A unified deadlock-free construct for collective and point-to-point synchronization. In Proceedings of the 22nd ACM International Conference on Supercomputing (Island of Kos, Greece, June 7–12). ACM Press, New York, 2008, 277–288.
25. Shirako, J., Kasahara, H., and Sarkar, V. Language extensions in support of compiler parallelization. In Proceedings of the 20th Workshop on Languages and Compilers for Parallel Computing (Urbana, IL, Oct. 11–13). Springer-Verlag, Berlin, 2007, 78–94.
26. Thomas, D. et al. Agile Web Development with Rails, Second Edition. The Pragmatic Bookshelf, Raleigh, NC, 2008.
27. UPC Language Specifications, Version 1.2. Technical Report LBNL-59208. Lawrence Berkeley National Laboratory, Berkeley, CA, 2005.
28. Wawrzynek, J. et al. RAMP: Research Accelerator for Multiple Processors. IEEE Micro 27, 2 (Mar. 2007), 46–57.
29. Williams, S., Waterman, A., and Patterson, D. Roofline: An insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52, 4 (Apr. 2009), 65–76.
30. Williams, S. et al. Lattice Boltzmann simulation optimization on leading multicore platforms. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (Miami, FL, Apr. 14–18, 2008).
31. Williams, S. et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proceedings of the Supercomputing (SC07) Conference (Reno, NV, Nov. 10–16). ACM Press, New York, 2007.

The authors are all affiliated with the Par Lab (http://parlab.eecs.berkeley.edu/) at the University of California, Berkeley.
contributed articles

Automated Support for Managing Feature Requests in Open Forums

BY JANE CLELAND-HUANG, HORATIU DUMITRU, CHUAN DUAN, AND CARLOS CASTRO-HERRERA

AS SOFTWARE PROJECTS grow larger and more complex and involve more stakeholders across geographical and organizational boundaries, project managers increasingly rely on open discussion forums to elicit requirements and otherwise communicate with other stakeholders. Unfortunately, open forums generally don't support all aspects of the requirements-elicitation process; for example, in most forums, stakeholders create their own discussion threads, introducing significant redundancy of ideas and possibly causing them to miss important discussions.

Such forums are common in open source projects and increasingly common across a number of enterprise-level projects. They can grow quite large, with many thousands of stakeholders. Most use discussion threads to help focus the conversation; however, user-defined threads tend to result in redundant ideas, while predefined categories might be overly rigid, possibly leading to coarse-grain topics that fail to facilitate focused discussions.

To better understand the problems and challenges of capturing requirements in open forums, we surveyed several popular open source projects, evaluating how the structure of their forums and organization of their feature requests help stakeholders work collaboratively toward their goals. The survey covered a number of tools and frameworks: a customer relationship management system called SugarCRM11; a UML modeling tool called Poseidon10; an enterprise resource planning tool called Open Bravo10; a groupware tool called ZIMBRA10; a Web tool for administering the MySQL server called PHPMyAdmin10; an open .NET architecture for Linux called Mono10; and the large Web-based immersive game world Second Life.9 All exhibited a significantly high percentage of discussion threads consisting of only one or two feature requests. For example, as shown in Figure 1, 59% of Poseidon threads, 70% of SugarCRM threads, 48% of Open Bravo threads, and 42% of Zimbra threads included only one or two requests. The presence of so many small threads suggests either a significant number of distinct discussion topics or that users had created unnecessary new threads. An initial analysis found the second case—unnecessary new threads—held for all forums we surveyed; for example, in the SugarCRM forum, stakeholders' requests related to email lists were found across 20 different clusters, 13 of which included only one or two comments.

Figure 1. Forums are characterized by numerous small threads, many with only one or two feature requests. (Histograms of number of threads by thread size; the Poseidon panel covers 1,423 threads, 5,764 feature requests, 377 stakeholders, and a largest thread of 48 requests.)

We designed an experiment to determine quantitatively if user feature requests and comments (placed in new threads) should instead have been placed in existing threads. We conducted it using the data from SugarCRM, an open source customer relationship management system supporting campaign management, email marketing, and lead management.
Figure 2. The extent to which feature requests assigned to individual vs. larger threads fit into global topics. (Number of feature requests in small and large threads, plotted against proximity score.)

Figure 3. Two-stage spherical K-means. (Algorithm box; input: unlabeled instances X = {x_i}, i = 1..N, and the number of clusters K.)

Feature requests are preprocessed to remove common words (such as "this" and "because") not useful for identifying underlying themes. The remaining terms are then stemmed to their grammatical roots. Each feature request x is represented as a vector of terms (t_{x,1}, t_{x,2}, ..., t_{x,w}) that is then normalized through a technique called term frequency, inverse document frequency (tf-idf), where tf represents the original term frequency of term t in the feature request, and idf represents the inverse document frequency; idf is often computed as log2(N/df_t), where N represents the number of feature requests in the entire forum, and df_t represents the number of feature requests containing t. The similarity between each pair of normalized vectors is then measured as the cosine of the angle between them.
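The weighting just described is easy to state in code. The sketch below assumes feature requests have already been tokenized, stopped, and stemmed; it follows the log2(N/df_t) form of idf used here, with cosine similarity between the unit-length vectors.

import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns unit-length tf-idf vectors as dicts."""
    N = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    vecs = []
    for d in docs:
        tf = Counter(d)
        v = {t: tf[t] * math.log2(N / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vecs.append({t: w / norm for t, w in v.items()})
    return vecs

def cosine(u, v):
    # For unit-length vectors, cosine similarity is simply the dot product.
    return sum(w * v.get(t, 0.0) for t, w in u.items())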
Clustering is then performed around K initial centroids selected by the algorithm, with the objective that they are as dissimilar from each other as possible. The distance from each feature request to each centroid is computed, and each feature request is placed in the cluster associated with its closest centroid. Once all feature requests are assigned to clusters, the centroids are repositioned so as to increase their average proximity to all feature requests in the cluster. This is followed by a series of incremental optimizations in which an attempt is made to move a randomly selected feature request to the cluster for which it maximizes the overall cohesion gain. The process continues until no further optimization is possible. This simple post-processing step has been shown to significantly improve the quality of clustering results.6
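Here is a compact sketch of this clustering loop, assuming unit-length tf-idf rows in a NumPy matrix. The farthest-first seeding and fixed iteration cap are simplifications, and the randomized single-request moves of the post-processing step are omitted; the `seeds` parameter illustrates the warm start that the Stable SPKMeans variant described later relies on.

import numpy as np

def spk_means(X, K, seeds=None, iters=50, rng=None):
    """Spherical K-means sketch: X rows are unit-length tf-idf vectors,
    similarity is the dot product (cosine), centroids are re-normalized means."""
    rng = rng or np.random.default_rng(0)
    if seeds is None:
        # Greedy farthest-first seeding: pick mutually dissimilar points.
        chosen = [X[rng.integers(len(X))]]
        while len(chosen) < K:
            sims = np.stack(chosen) @ X.T          # similarity of each point to each seed
            chosen.append(X[int(np.argmin(sims.max(axis=0)))])
        seeds = np.stack(chosen)
    C = seeds.copy()
    for _ in range(iters):
        labels = np.argmax(X @ C.T, axis=1)        # assign each request to closest centroid
        for k in range(K):
            members = X[labels == k]
            if len(members):
                m = members.sum(axis=0)
                C[k] = m / (np.linalg.norm(m) or 1.0)  # reposition and re-normalize
    return labels, C

Re-running with seeds=C from the previous clustering is the essence of the warm start that keeps threads stable between re-clusterings.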
Clustering results are strongly influenced by initial centroid selection, meaning poor selection can lead to low-quality clusterings. However, this problem can be alleviated through a consensus technique for performing the initial clustering, an approach that generates multiple individual clusterings, then uses a voting mechanism to create a final result.8 Though it has a much longer running time than SPKMeans, consensus clustering has been shown to consistently deliver clusterings that are of higher-than-average quality compared to the standalone SPK clusterings.6 In the AFM system, consensus clustering is used only for the initial clustering in order to create the best possible set of start-up threads.
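One simple way to realize such voting, in the spirit of evidence accumulation, is to accumulate a co-association matrix across runs and group requests that co-occur in a majority of them. This sketch is a simplification, not the article's exact consensus procedure.

import numpy as np

def co_association(labelings):
    """Entry (i, j): fraction of clusterings in which items i and j share a cluster."""
    n = len(labelings[0])
    co = np.zeros((n, n))
    for labels in labelings:
        labels = np.asarray(labels)
        co += (labels[:, None] == labels[None, :])
    return co / len(labelings)

def vote(co, threshold=0.5):
    """Greedy majority vote: group items that co-occur in most of the runs."""
    n, assigned, clusters = co.shape[0], set(), []
    for i in range(n):
        if i in assigned:
            continue
        members = [j for j in range(n) if j not in assigned and co[i, j] > threshold]
        assigned.update(members)
        clusters.append(members)
    return clusters

runs = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 2], [0, 0, 0, 1, 1]]
print(vote(co_association(runs)))  # -> [[0, 1], [2, 3], [4]]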
Following the arrival of each new feature request, the algorithm recomputes the ideal granularity to determine if a new cluster should be added. To add a new cluster in a way that preserves the stability of existing clusters and minimizes clustering time, the AFM approach identifies the least-cohesive cluster, then bisects it using SPK, with K = 2. Feature requests from neighboring clusters are then reevaluated to determine if they exhibit closer proximity to one of the two new centroids than they do to their own currently assigned centroids. If this closer proximity occurs they are reassigned to the relevant cluster. To ensure continued cluster quality, the entire data set is re-clustered periodically following the arrival of a fixed number of new feature requests. Re-clustering is performed through a modified SPKMeans algorithm (we call Stable SPKMeans) designed to minimize the movement of feature requests between clusters through reuse of the current set of centroids as seeds for the new clustering.

Cluster quality is also improved through user feedback specifying whether a pair of feature requests belong together. For example, users not happy with the quality of a cluster can specify that a given feature request does not fit the cluster. They might also provide additional tags to help place the feature request in a more appropriate cluster. These user constraints, along with the tag information, are then incorporated into the SPK algorithm of future clusterings. This reassignment maximizes the quality of the individual clusters and optimizes conformance to user constraints. Our prior work in this area demonstrated significant improvement in cluster quality when constraints are considered.13

Figure 4. A partial cluster with security-related requirements generated after gathering 1,000 constraints from the Student data set.

(1) The system must protect stored confidential information.
(2) The system must encrypt purchase/transaction information.
(3) A privacy policy must describe in detail to the users how their information is stored and used.
(4) Transmission of personal information must be encrypted.
(5) Transmission of financial transactions must be encrypted.
(6) The system must use both encrypt and decrypt in some fields.
(7) The system must allow users to view their previous transactions.
(8) Databases must use the TripleDES encryption standard for database security. AES is still new and has had compatibility issues with certain types of databases, including SQL Server express edition.
(9) The site must ensure that payment information is confidential and credit card transactions are encrypted to prevent hackers from retrieving information.
(10) Because the system will be used to buy books, we must focus on security and consider transaction control in the architecture used to build it.
(11) Correct use of cryptography techniques must be applied in the Amazon portal system to protect student information from outsiders and staff who might potentially acquire the information if left unprotected.
(12) Sessions that handle payment transactions must be encrypted.

Evaluating AFM

We conducted a series of experiments designed to evaluate the AFM's ability to quickly deliver cohesive, distinct, and stable clusters. They utilized three data sets: the first was the SugarCRM data set discussed earlier; the second included 4,205 feature requests we mined from Second Life, an Internet-based virtual-world video game in which stakeholders are represented by interacting avatars; and the third described the features of an online Amazon-like portal designed specifically for students. In spring 2008 we asked 36 master's-level students enrolled in two different advanced software-engineering classes at DePaul University to consider the needs of a typical student, textbook reseller, textbook author, textbook publisher, instructor, portal administrator, architect, project manager, and developer and create relevant feature requests. The result was 366 feature requests.

To evaluate cluster quality, we constructed an "ideal" clustering for the SugarCRM data by reviewing and modifying the natural discussion threads created by SugarCRM users. Modifications included merging closely related singleton threads, decomposing large megathreads into smaller, more cohesive ones, and reassigning misfits to new clusters. The answer set enabled us to compare the quality of the generated clusters using a metric called Normalized Mutual Information (NMI) to measure the extent to which the knowledge of one cluster reduces uncertainty of other clusters.9
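NMI can be computed directly from two labelings of the same items. The sketch below uses one common normalization (the geometric mean of the entropies); other normalizations exist, and the article does not say which variant its experiments used.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalized Mutual Information between two clusterings of the same items:
    0 = knowing one says nothing about the other, 1 = identical partitions."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    mi = sum((c / n) * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    denom = math.sqrt(entropy(a) * entropy(b)) or 1.0
    return mi / denom

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # relabeled but identical partition -> 1.0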
Figure 5. Stability of threads in the AFM clustering process. (Sugar, Second Life, and Student data sets; the panels plot percentage of feature requests and, in panel c, the number of feature requests moved during the incremental clustering process, against iteration number.)

On a scale of 0 (no similarity) to 1 (identical clusterings), the SugarCRM clusterings scored 0.57, indicating some degree of similarity between the generated clusters and the answer set. In experiments using the answer set to simulate the gathering of nonconflicting user feedback, NMI scores increased to 0.62 after 1,000 pairwise constraints were randomly selected from the pairs of requests exhibiting borderline (neither very strong nor very weak) proximity scores.6 We also created an answer set for the Student Portal data set; initial NMI clustering scores of 0.58 increased to 0.7 following 1,000 pairwise constraints. Figure 4 outlines a cohesive cluster of feature requests for the Student data set generated using 1,000 system-wide constraints. Note that 1,000 constraints represent only 1.5% of possible constraints for the Student Portal data set and only 0.2% for SugarCRM.

These results demonstrate that many of the themes we identified in our two answer sets were also detected by the unsupervised clustering algorithms. However, because more than one clustering is possible for a given data set, these metrics alone lack sufficient insight to judge whether the cluster quality is better or worse than that of the user-created threads, so we also compared the two thread structures topic by topic.

For the first topic, distribution lists, human stakeholders placed the 14 related requests in a thread called "email management," while the AFM grouped them in a cluster called "send email to sugar contacts." In this case, both approaches were relatively successful.

For the second topic, appointment scheduling, we identified 32 feature requests. Human stakeholders placed them across six different threads, with 15 in a thread called "scheduling," 12 in a thread called "how to improve calendering [sic] and activities in sugar sales," two in a thread called "calendar," and one each in threads related to mass updates, printing, and email management. The AFM placed 18 feature requests in a discussion related to daily appointment scheduling, another 10 in a discussion related to meeting scheduling, and the remaining four in two clusters related to email, Web-enabled calendars, and integration with other tools. In this case—appointment scheduling—the AFM performed marginally better than the user-defined threads, as the feature requests were slightly less dispersed.

For the third topic, language support, we identified 11 feature requests. Human stakeholders placed them across
seven relatively small threads epitomizing the kinds of problems we found in the open source forums we studied. The AFM created a focused discussion forum on the topic of languages in which it placed nine of the 11 requests. The AFM approach excelled here.

The fourth and final topic, attach documents, represents a case in which a fairly dominant concern was dispersed by human users across 13 different threads, while the AFM placed all related requests in a single highly focused discussion thread.

It was evident that in each of the four cases the AFM approach either matched or improved on the results of the user-created threads (Table 1).

AFM was designed to minimize the movement of feature requests among clusters in order to preserve stability and quality. In a series of experiments against the SugarCRM, Second Life, and Student data sets, we found that the Stable SPKMeans clustering algorithm had no significant negative effect on the quality of the final clusters. In fact, the NMI scores for the two data sets—SugarCRM and Student—for which "target clusterings" were available showed a slight improvement when we used the stable algorithm.

To evaluate the stability of the modified SPKMeans algorithm, we tracked the movement of feature requests between clusters for re-clustering intervals of 25 feature requests. Figure 5a shows the number of moves per feature request, reported in terms of percentage. Approximately 62%–65% of feature requests were never moved, 20%–30% of feature requests were moved once or twice, and only about 5%–8% were moved more frequently. A significant amount of this movement is accounted for by the arrival of new topics and the subsequent creation of new threads, causing reclassification of some existing feature requests. Figure 5b shows that the percentage of feature requests moved across clusters gradually decreases across subsequent iterations. However, as shown in Figure 5c, the actual number of movements increases in early iterations, then becomes relatively stable. Though not discussed here, we also conducted extensive experiments that found the traditional unmodified SPKMeans algorithm resulted in approximately 1.6–2.6 times more volatility of feature requests than our modified version.

We analyzed the total execution time of the Stable SPKMeans algorithm for each of the three data sets using MATLAB on a 2.33GHz machine; Table 2 outlines the total time needed to perform the clusterings at various increments.
Table 1. User-defined vs. AFM-defined threads for the four topics studied (# FRs = number of the topic's feature requests in the thread; size = total thread or cluster size).

Topic: Distribution lists
  User-defined threads (# FRs / size):
    Email management (14 / 74)
  AFM-defined threads (# FRs / size):
    Send email to sugar contacts (14 / 62)

Topic: Scheduling appointments
  User-defined threads:
    Scheduling (15 / 37)
    How to improve calendaring and activities in sugar sales (12 / 42)
    Calendar (2 / 32)
    Mass updates (1 / 14)
    Printing (1 / 5)
    Email management (1 / 74)
  AFM-defined threads:
    View daily appointments in calendar (18 / 58)
    Schedule meetings (10 / 39)
    Calendar support features (3 / 27)
    Send email to sugar contacts (1 / 62)

Topic: Language support
  User-defined threads:
    International salutations using one language pack (2 / 3)
    Customer salutation in email template (2 / 2)
    Outlook category synchronization (2 / 7)
    Project: Translation system (2 / 4)
    Fallback option in language files (1 / 1)
    Language Preference option (1 / 1)
    A simplified workflow for small business (1 / 7)
  AFM-defined threads:
    Contact languages (9 / 11)
    Users roles and defaults (1 / 32)
    Contact categories and fields (1 / 13)

Topic: Document attachments
  User-defined threads:
    Attaching documents to cases, projects and sugar documents (6 / 7)
    Attachments for bug tracker (1 / 1)
    Display attachment (3 / 3)
    Display size of attachments (1 / 1)
    Documents to a customer (2 / 3)
    Email management (2 / 74)
    File upload field (2 / 5)
    Koral document management integration (1 / 1)
    Link documents with project (2 / 2)
    Module builder (1 / 6)
    New feature request: Attach file to an opportunities (1 / 1)
    View all in all lists (2 / 3)
    WebDAV access to "documents" (1 / 3)
  AFM-defined threads:
    Attach and manage documents (25 / 32)

Note: AFM-generated threads are initially labeled according to the most common terms and then renamed by stakeholders. For example, the thread "Send email to sugar contacts" was first generated as "email, send, list, sugar, contact." The threads here have been renamed. Spelling and grammar are maintained from the original user forums.
We excluded initial consensus clustering times, which, with typical parameters of 50–100 requests, could take up to two minutes. As depicted in Table 2, the SugarCRM data set took a total of 57.83 seconds to complete all 29 clusterings at increments of 50 feature requests. Individual clusterings are notably fast; for example, the complete Second Life data set, consisting of 4,205 requests, was clustered in 75.18 seconds using standard SPKMeans and in only 1.02 seconds using our stable approach.

Table 2. Performance measured by total time spent clustering (in seconds).

Data set                                Increment size                              Entire set,
                                        1          10         25        50          one clustering
Student (366 feature requests)
  Stable SPKMeans                       7.49       0.98       0.62      0.41        0.04
  Standard SPKMeans                     101.82     10.78      4.74      2.92        0.73
Sugar (1,000 feature requests)
  Stable SPKMeans                       84.54      13.24      6.66      4.27        0.22
  Standard SPKMeans                     2,374.31   249.92     108.62    57.83       6.69
Second Life (4,205 feature requests)
  Stable SPKMeans                       1,880.69   268.15     146.75    96.11       1.02
  Standard SPKMeans                     11,409.57  12,748.63  5,917.94  2,619.90    75.18

The Stable SPKMeans algorithm significantly improves the performance of the AFM, mainly because it takes less time to converge on a solution when quality seeds are passed forward from the previous clustering. Smaller increments require more clusterings, so the overall clustering time increases as the increment size decreases. However, our experiments found that increasing the increment size to 25 or even 50 feature requests has negligible effect on the quality and stability of the clusters.

Conclusion

We have identified some of the problems experienced in organizing discussion threads in open forums. The survey we conducted in summer 2008 of several open source forums suggests that expecting users to manually create and manage threads may not be the most effective approach. In contrast, we described an automated technique involving our own AFM for creating stable, high-quality clusters to anchor related discussion groups. Though no automated technique always delivers clusters that are cohesive and distinct from other clusters, our reported experiments and case studies demonstrate the advantages of using data-mining techniques to help manage discussion threads in open discussion forums. Our ongoing work aims to improve techniques for incorporating user feedback into the clustering process so clusters that appear ad hoc to users or contain multiple themes can be improved.

These findings are applicable across a range of applications, including those designed to gather comments from a product's user base, support activities (such as event planning), and capture requirements in large projects when stakeholders are dispersed geographically. Our ongoing work focuses on the use of forums to support the gathering and prioritizing of requirements, where automated forum managers improve the allocation of feature requests to threads and use recommender systems to help include stakeholders in relevant discussion groups.3 They also improve the precision of forum search and enhance browsing capabilities by predicting and displaying stakeholders' interest in a given discussion thread.

From the user's perspective, AFM facilitates the process of entering feature requests. Enhanced search features help users decide where to place new feature requests more accurately. Underlying data-mining functions then test the validity of the choice and (when placement is deemed incorrect) recommend moving the feature request to another existing discussion group or sometimes to an entirely new thread.

All techniques described here are being implemented in the prototype AFM tool we are developing to test and evaluate the AFM as an integral component of large-scale, distributed-requirements processes.

Acknowledgments

This work was partially funded by National Science Foundation grant CCR-0447594, including a Research Experiences for Undergraduates summer supplement to support the work of Horatiu Dumitru. We would also like to acknowledge Brenton Bade, Phik Shan Foo, and Adam Czauderna for their work developing the prototype.

References

1. Basu, C., Hirsh, H., and Cohen, W. Recommendation as classification: Using social and content-based information in recommendation. In Proceedings of the 15th National Conference on Artificial Intelligence (Madison, WI, July 26–30). MIT Press, Cambridge, MA, 1998, 714–720.
2. Can, F. and Ozkarahan, E.A. Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases. ACM Transactions on Database Systems 15, 4 (Dec. 1990), 483–517.
3. Castro-Herrera, C., Duan, C., Cleland-Huang, J., and Mobasher, B. A recommender system for requirements elicitation in large-scale software projects. In Proceedings of the 2009 ACM Symposium on Applied Computing (Honolulu, HI, Mar. 9–12). ACM Press, New York, 2008, 1419–1426.
4. Davis, A., Dieste, O., Hickey, A., Juristo, N., and Moreno, A. Effectiveness of requirements elicitation techniques. In Proceedings of the 14th IEEE International Requirements Engineering Conference (Minneapolis, MN, Sept.). IEEE Computer Society, 2006, 179–188.
5. Dhillon, I.S. and Modha, D.S. Concept decompositions for large sparse text data using clustering. Machine Learning 42, 1–2 (Jan. 2001), 143–175.
6. Duan, C., Cleland-Huang, J., and Mobasher, B. A consensus-based approach to constrained clustering of software requirements. In Proceedings of the 17th ACM International Conference on Information and Knowledge Management (Napa, CA, Oct. 26–30). ACM Press, New York, 2008, 1073–1082.
7. Frakes, W.B. and Baeza-Yates, R. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1992.
8. Fred, A.L. and Jain, A.K. Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 6 (June 2005), 835–850.
9. Second Life virtual 3D world; http://secondlife.com; feature requests downloaded from the Second Life issue tracker, https://jira.secondlife.com/secure/Dashboard.jspa.
10. SourceForge. Repository of open source code and applications; feature requests for Open Bravo, ZIMBRA, PHPMyAdmin, and Mono downloaded from SourceForge forums, http://sourceforge.net/.
11. Sugar CRM. Commercial open source customer relationship management software; http://www.sugarcrm.com/crm/; feature requests mined from http://www.sugarcrm.com/forums/.
12. Wagstaff, K., Cardie, C., Rogers, S., and Schrödl, S. Constrained K-means clustering with background knowledge. In Proceedings of the 18th International Conference on Machine Learning (June 28–July 1). Morgan Kaufmann Publishers, Inc., San Francisco, 2001, 577–584.
Jane Cleland-Huang ([email protected]) is an associate professor in the School of Computing at DePaul University, Chicago, IL.

Horatiu Dumitru ([email protected]) is an undergraduate student studying computer science at the University of Chicago, Chicago, IL.

Chuan Duan ([email protected]) is a post-doctoral researcher in the School of Computing at DePaul University, Chicago, IL.

Carlos Castro-Herrera ([email protected]) is a Ph.D. student in the School of Computing at DePaul University, Chicago, IL.
review articles

DOI:10.1145/1562764.1562785

This Gödel Prize-winning work traces the steps toward modeling real data.

BY DANIEL A. SPIELMAN AND SHANG-HUA TENG

Smoothed Analysis: An Attempt to Explain the Behavior of Algorithms in Practice

Theorists have traditionally demanded that an algorithm performs well in the worst case: if one can prove that an algorithm performs well in the worst case, then one can be confident that it will work well in every domain. However, there are many algorithms that work well in practice that do not work well in the worst case. Smoothed analysis provides a theoretical framework for explaining why some of these algorithms do work well in practice.

The performance of an algorithm is usually measured by its running time, expressed as a function of the input size of the problem it solves. The performance profiles of algorithms across the landscape of input instances can differ greatly: some algorithms run in time linear in the input size on all instances, some take quadratic or higher-order polynomial time, while some may take an exponential amount of time on some instances.
The Knapsack problem, for example, cannot be solved in polynomial time unless NP is equal to P. Shortly before he passed away, Tim Russert of NBC's "Meet the Press" commented that the 2008 election could end in a tie between the Democratic and the Republican candidates. In other words, he solved a 51-item Knapsack problem^a by hand within a reasonable amount of time, and most likely without using the pseudo-polynomial-time dynamic-programming algorithm for Knapsack!
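For reference, the pseudo-polynomial dynamic program is only a few lines. The electoral weights below are made-up stand-ins for the real 51 numbers, used only to show the tie question as a subset-sum instance.

def knapsack_max_value(values, weights, capacity):
    """Pseudo-polynomial 0/1 knapsack: O(n * capacity) time, which is polynomial
    in n but exponential in the bit-length of capacity."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):  # descending so each item is used once
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

# Tie question as subset-sum: can some subset of electoral weights reach exactly
# half the total? (Illustrative weights, not the real 51 numbers.)
weights = [55, 38, 29, 27, 21, 20, 18, 16, 15, 13]
half = sum(weights) // 2
print(knapsack_max_value(weights, weights, half) == half)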
In our field, the simplex algorithm is the classic example of an algorithm that is known to perform well in practice but has poor worst-case complexity. The simplex algorithm solves a linear program, for example, of the form

$$\max\; c^T x \quad \text{subject to} \quad Ax \le b, \qquad (1)$$

where A is an m × n matrix, b is an m-place vector, and c is an n-place vector. In the worst case, the simplex algorithm takes exponential time.25

Developing rigorous mathematical theories that explain the observed performance of practical algorithms and heuristics has become an increasingly important task in Theoretical Computer Science. However, modeling observed data and practical problem instances is a challenging task, as insightfully pointed out in the 1999 "Challenges for Theory of Computing" Report for an NSF-Sponsored Workshop on Research in Theoretical Computer Science:^b

    While theoretical work on models of computation and methods for analyzing algorithms has had enormous payoff, we are not done. In many situations, simple algorithms do well. Take for example the Simplex algorithm for linear programming, or the success of simulated annealing of certain supposedly intractable problems. We don't understand why! It is apparent that worst-case analysis does not provide useful insights on the performance of algorithms and heuristics and our models of computation need to be further developed and refined. Theoreticians are investing increasingly in careful experimental work leading to identification of important new questions in algorithms area. Developing means for predicting the performance of algorithms and heuristics on real data and on real computers is a grand challenge in algorithms.

Needless to say, there are a multitude of algorithms beyond simplex and simulated annealing whose performance in practice is not well explained by worst-case analysis. We hope that theoretical explanations will be found for the success in practice of many of these algorithms, and that these theories will catalyze better algorithm design.

The Behavior of Algorithms

When A is an algorithm for solving problem P, we let T_A[x] denote the running time of algorithm A on an input instance x. If the input domain Ω has only one input instance x, then we can use the instance-based measures T_{A_1}[x] and T_{A_2}[x] to decide which of the two algorithms A_1 and A_2 more efficiently solves P. If Ω has two instances x and y, then the instance-based measure of an algorithm A defines a two-dimensional vector (T_A[x], T_A[y]). It could be the case that T_{A_1}[x] < T_{A_2}[x] but T_{A_1}[y] > T_{A_2}[y]. Then, strictly speaking, these two algorithms are not comparable. Usually, the input domain is much more complex, both in theory and in practice. The instance-based complexity measure T_A[·] defines an |Ω|-dimensional vector when Ω is finite. In general, it can be viewed as a function from Ω to ℝ₊, but it is unwieldy. To compare two algorithms, we require a more concise complexity measure.

An input domain Ω is usually viewed as the union of a family of subdomains {Ω₁, ..., Ω_n, ...}, where Ω_n represents all instances in Ω of size n. For example, in sorting, Ω_n is the set of all tuples of n elements; in graph algorithms, Ω_n is the set of all graphs with n vertices; and in computational geometry, we often have Ω_n ⊆ ℝⁿ. In order to succinctly express the performance of an algorithm A, for each Ω_n one defines a scalar T_A(n) that summarizes the instance-based complexity measure, T_A[·], of A over Ω_n. One often further simplifies this expression by using big-O or big-Θ notation to express T_A(n) asymptotically.

Traditional Analyses. It is understandable that different approaches to summarizing the performance of an algorithm over Ω_n can lead to very different evaluations of that algorithm. In Theoretical Computer Science, the most commonly used measures are the worst-case measure and the average-case measures. The worst-case measure is defined as

$$\mathrm{WC}_A(n) = \max_{x \in \Omega_n} T_A[x].$$

The average-case measures have more parameters. In each average-case measure, one first determines a distribution of inputs and then measures the expected performance of an algorithm assuming inputs are drawn from this distribution. Supposing D provides a distribution over each Ω_n, the average-case measure according to D is

$$\mathrm{Ave}^{D}_A(n) = \mathop{\mathbf{E}}_{x \sim_D \Omega_n}\left[T_A[x]\right],$$

where we use x ∼_D Ω_n to indicate that x is randomly chosen from Ω_n according to distribution D.

Critique of Traditional Analyses. Low worst-case complexity is the gold standard for an algorithm. When low, the worst-case complexity provides an absolute guarantee on the performance of an algorithm no matter which input it is given. Algorithms with good worst-case performance have been developed for a great number of problems. However, there are many problems that need to be solved in practice for which we do not know algorithms with good worst-case performance. Instead, scientists and engineers typically use heuristic algorithms to solve these problems. Many of these algorithms work well in practice, in spite of having a poor, sometimes exponential, worst-case running time.

a In presidential elections in the United States, each of the 50 states and the District of Columbia is allocated a number of electors. All but the states of Maine and Nebraska use a winner-take-all system, with the candidate winning the majority of votes in each state being awarded all of that state's electors. The winner of the election is the candidate who is awarded the most electors.
b Available at http://sigact.acm.org/
Practitioners justify the use of these heuristics by observing that worst-case instances are usually not "typical" and rarely occur in practice. The worst-case analysis can be too pessimistic. This theory-practice gap is not limited to heuristics with exponential complexity. Many polynomial-time algorithms, such as interior-point methods for linear programming and the conjugate gradient algorithm for solving linear equations, are often much faster than their worst-case bounds would suggest. In addition, heuristics are often used to speed up the practical performance of implementations that are based on algorithms with polynomial worst-case complexity. These heuristics might in fact worsen the worst-case performance, or make the worst-case complexity difficult to analyze.

Average-case analysis was introduced to overcome this difficulty. In average-case analysis, one measures the expected running time of an algorithm on some distribution of inputs. While one would ideally choose the distribution of inputs that occurs in practice, this is difficult, as it is rare that one can determine the distribution of practical inputs. A random graph with average degree six, for example, is very different from the graph of a triangulation of a point set in two dimensions, which will also have average degree approximately six. In fact, random objects such as random graphs and random matrices have special properties with exponentially high probability, and these special properties might dominate the average-case analysis. Edelman14 writes of random matrices:

    What is a mistake is to psychologically link a random matrix with the intuitive notion of a "typical" matrix or the vague concept of "any old matrix." In contrast, we argue that "random matrices" are very special matrices.

Smoothed Analysis: A Step Toward Modeling Real Data. Because of the intrinsic difficulty in defining practical distributions, we consider an alternative approach to modeling real data. The basic idea is to identify typical properties of practical data, define an input model that captures these properties, and then rigorously analyze the performance of algorithms under this input model. Practical data are often subject to some small degree of random noise: they are neither completely random nor completely arbitrary. At a high level, each input is generated from a two-stage model: In the first stage, an instance is generated, and in the second stage, the instance from the first stage is slightly perturbed. The perturbed instance is the input to the algorithm.

In smoothed analysis, we assume that an input to an algorithm is subject to a slight random perturbation. The smoothed measure of an algorithm on an input instance is its expected performance over the perturbations of that instance. We define the smoothed complexity of an algorithm to be the maximum smoothed measure over input instances.

For concreteness, consider the case Ω_n = ℝⁿ, which is a common input domain in computational geometry, scientific computing, and optimization. For these continuous inputs and applications, the family of Gaussian distributions provides a natural model of noise or perturbation. Recall that a univariate Gaussian distribution with mean 0 and standard deviation σ has density

$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}.$$
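Before stating the formal definition, note that the smoothed measure is straightforward to estimate numerically: perturb a fixed input with Gaussian noise and average the observed cost. In this illustrative sketch, which is ours rather than the article's, the cost is the comparison count of insertion sort on a reverse-sorted input, and the estimate moves from worst-case toward average-case behavior as σ grows.

import numpy as np

def insertion_sort_comparisons(x):
    """Cost model: number of comparisons insertion sort makes on input x."""
    a, comps = list(x), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comps += 1
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return comps

def smoothed_estimate(cost, x_bar, sigma, trials=100, seed=0):
    """Monte Carlo estimate of E_g[cost(x_bar + g)] with g ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    return float(np.mean([cost(x_bar + sigma * rng.standard_normal(len(x_bar)))
                          for _ in range(trials)]))

n = 200
worst = np.arange(n, 0, -1, dtype=float)  # reverse-sorted: ~n^2/2 comparisons
for sigma in (0.0, 1.0, 10.0, float(n)):
    print(sigma, smoothed_estimate(insertion_sort_comparisons, worst, sigma))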
In this definition, the "original" input x̄ is perturbed to obtain the input x̄ + g, which is then fed to the algorithm. For each original input, this measures the expected running time of algorithm A on random perturbations of that input. The maximum out front tells us to measure the smoothed analysis by the expectation under the worst possible original input.

The smoothed complexity of an algorithm measures the performance of the algorithm both in terms of the input size n and in terms of the magnitude σ of the perturbation. By varying σ between zero and infinity, one can use smoothed analysis to interpolate between worst-case and average-case analysis. When σ = 0, one recovers the ordinary worst-case analysis. As σ grows large, the random perturbation g dominates the original x̄, and one obtains an average-case analysis. We are most interested in the situation in which σ is small relative to ‖x̄‖, in which case x̄ + g may be interpreted as a slight perturbation of x̄. The dependence on the magnitude σ is essential, and much of the work in smoothed analysis demonstrates that noise often makes a problem easier to solve.

Definition 2. A has polynomial smoothed complexity if there exist positive constants n_0, σ_0, c, k_1, and k_2 such that for all n ≥ n_0 and 0 ≤ σ ≤ σ_0,

    Smoothed_A^σ(n) ≤ c · σ^(−k_2) · n^(k_1).   (2)

From Markov's inequality, we know that if an algorithm A has smoothed complexity T(n, σ), then

    max_{x̄ ∈ [−1, 1]^n} Pr_g[T_A(x̄ + g) ≥ δ^(−1) T(n, σ)] ≤ δ.   (3)

Thus, if A has polynomial smoothed complexity, then for any x̄, with probability at least (1 − δ), A can solve a random perturbation of x̄ in time polynomial in n, 1/σ, and 1/δ. However, the probabilistic upper bound given in (3) does not necessarily imply that the smoothed complexity of A is O(T(n, σ)). Blum and Dunagan7 and subsequently Beier and Vöcking6 introduced a relaxation of polynomial smoothed complexity.

Definition 3. A has probably polynomial smoothed complexity if there exist constants n_0, σ_0, c, and α, such that for all n ≥ n_0 and 0 ≤ σ ≤ σ_0,

    max_{x̄ ∈ [−1, 1]^n} E_g[T_A(x̄ + g)^α] ≤ c · σ^(−1) · n.   (4)

They show that some algorithms have probably polynomial smoothed complexity, in spite of the fact that their smoothed complexity according to Definition 1 is unbounded.

Examples of Smoothed Analysis
In this section, we give a few examples of smoothed analysis. We organize them in five categories: mathematical programming, machine learning, numerical analysis, discrete mathematics, and combinatorial optimization. For each example, we will give the definition of the problem, state the worst-case complexity, explain the perturbation model, and state the smoothed complexity under the perturbation model.

Mathematical Programming. The typical problem in mathematical programming is the optimization of an objective function subject to a set of constraints. Because of its importance to economics, management science, industry, and military planning, many optimization algorithms and heuristics have been developed, implemented, and applied to practical problems. Thus, this field provides a great collection of algorithms for smoothed analysis.

Linear Programming. Linear programming is the most fundamental optimization problem. A typical linear program is given in Equation 1. The most commonly used linear programming algorithms are the simplex algorithm12 and the interior-point algorithms.

The simplex algorithm, first developed by Dantzig in 1951,12 is a family of iterative algorithms. Most of them are two-phase algorithms: Phase I determines whether a given linear program is infeasible, unbounded in the objective direction, or feasible with a bounded solution, in which case a vertex v_0 of the feasible region is also computed. Phase II is iterative: in the ith iteration, the algorithm finds a neighboring vertex v_i of v_{i−1} with a better objective value or terminates by returning v_{i−1} when no such neighboring vertex exists. The simplex algorithms differ in their pivot rules, which determine which vertex v_i to choose when there are multiple choices. Several pivot rules have been proposed; however, almost all existing pivot rules are known to have exponential worst-case complexity.25

Spielman and Teng36 considered the smoothed complexity of the simplex algorithm with the shadow-vertex pivot rule, developed by Gass and Saaty.18 They used Gaussian perturbations to model noise in the input data and proved that the smoothed complexity of this algorithm is polynomial. Vershynin38 improved their result to obtain a smoothed complexity of

    O(max(n^5 log² m, n^9 log^4 n, n^3 σ^(−4))).

See Blum and Dunagan, and Dunagan et al.7, 13 for smoothed analyses of other linear programming algorithms such as the interior-point algorithms and the perceptron algorithm.
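The experiment behind such results is easy to mimic numerically. The fragment below (our sketch, assuming the Optimization Toolbox's linprog; its default algorithm is a dual-simplex variant rather than the shadow-vertex rule analyzed above) compares iteration counts on a linear program before and after a σ-Gaussian perturbation of its constraint matrix. Problem sizes are arbitrary.

% Sketch (ours): iteration counts of a simplex-type LP solver on an
% instance and on a sigma-Gaussian perturbation of it.
% Requires the Optimization Toolbox (linprog).
n = 20; m = 60; sigma = 0.01;
A = randn(m, n); b = ones(m, 1); c = randn(n, 1);
opts = optimoptions('linprog', 'Display', 'none');
% maximize c'x subject to Ax <= b and box bounds, written as a minimization
[~, ~, ~, out0] = linprog(-c, A, b, [], [], -ones(n,1), ones(n,1), opts);
Ap = A + sigma * randn(m, n);        % perturb the constraint matrix
[~, ~, ~, out1] = linprog(-c, Ap, b, [], [], -ones(n,1), ones(n,1), opts);
fprintf('iterations: original %d, perturbed %d\n', ...
        out0.iterations, out1.iterations);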
Quasi-Concave Minimization. Another fundamental optimization problem is quasi-concave minimization. Recall that a function f : ℝ^n → ℝ is quasi-concave if all of its upper level sets L_γ = {x | f(x) ≥ γ} are convex. In quasi-concave minimization, one is asked to find the minimum of a quasi-concave function subject to a set of linear constraints. Even when restricted to concave quadratic functions over the hypercube, concave minimization is NP-hard.

In applications such as stochastic and multiobjective optimization, one often deals with data from low-dimensional subspaces. In other words, one needs to solve a quasi-concave minimization problem with a low-rank quasi-concave function.23 Recall that a function f : ℝ^n → ℝ has rank k if it can be written in the form

    f(x) = g(a_1^T x, a_2^T x, …, a_k^T x),

for a function g : ℝ^k → ℝ and linearly independent vectors a_1, a_2, …, a_k.

Kelner and Nikolova23 proved that, under some mild assumptions on the feasible convex region, if k is a constant then the smoothed complexity of quasi-concave minimization is polynomial when f is perturbed by noise. Key to their analysis is a smoothed bound on the size of the k-dimensional shadow of the high-dimensional polytope that defines the feasible convex region. Their result is a nontrivial extension of the analysis of two-dimensional shadows of Kelner and Spielman, and Spielman and Teng.24, 36

Machine Learning. Machine learning provides many natural problems for smoothed analysis. The field has many heuristics that work in practice, but not in the worst case, and the data defining most machine learning problems is inherently noisy.

K-means. One of the fundamental problems in machine learning is that of k-means clustering: the partitioning of a set of d-dimensional vectors Q = {q_1, …, q_n} into k clusters {Q_1, …, Q_k} so that the intracluster variance

    V = Σ_{i=1}^{k} Σ_{q_j ∈ Q_i} ‖q_j − m_{Q_i}‖²

is minimized, where m_{Q_i} = (Σ_{q_j ∈ Q_i} q_j)/|Q_i| is the centroid of Q_i.

One of the most widely used clustering algorithms is Lloyd's algorithm (IEEE Transactions on Information Theory, 1982). It first chooses an arbitrary set of k centers and then uses the Voronoi diagram of these centers to partition Q into k clusters. It then repeats the following process until it stabilizes: use the centroids of the current clusters as the new centers, and then repartition Q accordingly.

Two important questions about Lloyd's algorithm are how many iterations it takes to converge, and how close to optimal is the solution it finds? Smoothed analysis has been used to address the first question. Arthur and Vassilvitskii proved that in the worst case, Lloyd's algorithm requires 2^Ω(√n) iterations to converge,3 but that it has smoothed complexity polynomial in n^k and σ^(−1).4 Manthey and Röglin28 recently reduced this bound to polynomial in n^(√k) and σ^(−1).

Perceptrons, Margins, and Support Vector Machines. Blum and Dunagan's analysis of the perceptron algorithm7 for linear programming implicitly contains results of interest in machine learning. The ordinary perceptron algorithm solves a fundamental problem: given a collection of points x_1, …, x_n ∈ ℝ^d and labels b_1, …, b_n ∈ {±1}, find a hyperplane separating the positively labeled examples from the negatively labeled ones, or determine that no such plane exists. Under a smoothed model in which the points x_1, …, x_n are subject to a σ-Gaussian perturbation, Blum and Dunagan show that the perceptron algorithm has probably polynomial smoothed complexity, with exponent α = 1. Their proof follows from a demonstration that if the positive points can be separated from the negative points, then they can probably be separated by a large margin. That is, there probably exists a plane separating the points for which no point is too close to that separating plane. It is known that the perceptron algorithm converges quickly in this case. Moreover, this margin is exactly what is maximized by support vector machines.

PAC Learning. Probably approximately correct learning (PAC learning) is a framework in machine learning introduced by Valiant in which a learner is provided with a polynomial number of labeled examples from a given distribution, and must produce a classifier that is usually correct with reasonably high probability. In standard PAC learning, the distribution from which the examples are drawn is fixed. Recently, Kalai and Teng21 applied smoothed analysis to this problem by perturbing the input distribution. They prove that polynomial-sized decision trees are PAC-learnable in polynomial time under perturbed product distributions. In contrast, under the uniform distribution even super-constant size decision trees are not known to be PAC-learnable in polynomial time.

Numerical Analysis. One of the foci of numerical analysis is the determination of how much precision is required by numerical methods. For example, consider the most fundamental problem in computational science—that of solving systems of linear equations. Because of the round-off errors in computation, it is crucial to know how many bits of precision a linear solver should maintain so that its solution is meaningful. For example, Wilkinson (JACM 1961) demonstrated a family of linear systems^c of n variables and {0, −1, 1} coefficients for which Gaussian elimination with partial pivoting—the most popular variant in practice—requires n bits of precision.

^c See the second line of the Matlab code at the end of the Discussion section for an example.

Precision Requirements of Gaussian Elimination. In practice one almost always obtains accurate answers using much less precision. High-precision solvers are rarely used or needed. For example, Matlab uses 64 bits. Building on the smoothed analysis of condition numbers (discussed below), Sankar et al.33, 34 proved that it is sufficient to use O(log²(n/σ)) bits of precision to run Gaussian elimination with partial pivoting when the matrices of the linear systems are subject to σ-Gaussian perturbations.

The Condition Number. The smoothed analysis of the condition number of a matrix is a key step toward understanding the numerical precision required in practice. For a square matrix A, its condition number κ(A) is given by κ(A) = ‖A‖_2 ‖A^(−1)‖_2, where ‖A‖_2 = max_x ‖Ax‖_2/‖x‖_2. The condition number of A measures how much the solution to a system Ax = b changes as one makes slight changes to A and b: If one solves the linear system using fewer than log(κ(A)) bits of precision, then one is likely to obtain a result far from a solution.

The quantity 1/‖A^(−1)‖_2 = min_x ‖Ax‖_2/‖x‖_2 is known as the smallest singular value of A. Sankar et al.34 proved the following statement: For any square matrix Ā in ℝ^(n×n) satisfying ‖Ā‖_2 ≤ √n, and for any x > 1,

    Pr_A[‖A^(−1)‖_2 ≥ x] ≤ √n/(xσ),

where A is a σ-Gaussian perturbation of Ā. Wschebor40 improved this bound to show that for σ ≤ 1 and ‖Ā‖_2 ≤ 1,

    Pr[κ(A) ≥ x] ≤ O(√n/(xσ)).

See Bürgisser et al. and Dunagan et al.10, 13 for smoothed analysis of the condition numbers of other problems.
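A minimal Matlab rendering of the Lloyd iteration discussed above (our sketch; initialization and tie-handling choices are arbitrary, and implicit array expansion requires Matlab R2016b or later):

% Lloyd's algorithm (our minimal sketch): alternate between assigning
% points to the nearest center and recentering, until the partition
% stabilizes. Q is n-by-d; k is the number of clusters.
function [centers, labels, iters] = lloyd(Q, k)
    [n, ~] = size(Q);
    centers = Q(randperm(n, k), :);       % arbitrary initial centers
    labels = zeros(n, 1); iters = 0;
    while true
        dists = zeros(n, k);              % squared distances to centers
        for i = 1:k
            dists(:, i) = sum((Q - centers(i, :)).^2, 2);
        end
        [~, newlabels] = min(dists, [], 2);   % Voronoi partition of Q
        if isequal(newlabels, labels), break; end
        labels = newlabels; iters = iters + 1;
        for i = 1:k                       % centroids become the new centers
            if any(labels == i)
                centers(i, :) = mean(Q(labels == i, :), 1);
            end                           % empty cluster: keep old center
        end
    end
end

The returned iteration count iters is exactly the quantity bounded by the smoothed analyses of Arthur and Vassilvitskii and of Manthey and Röglin cited above.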
Discrete Mathematics. For problems in discrete mathematics, it is more natural to use Boolean perturbations: Let x̄ = (x̄_1, …, x̄_n) ∈ {0, 1}^n or {−1, 1}^n. The σ-Boolean perturbation of x̄ is a random string x = (x_1, …, x_n) ∈ {0, 1}^n or {−1, 1}^n, where x_i = x̄_i with probability 1 − σ and x_i ≠ x̄_i with probability σ. That is, each bit is flipped independently with probability σ.

Believing that σ-perturbations of Boolean matrices should behave like Gaussian perturbations of real matrices, Spielman and Teng35 made the following conjecture:

    Let A be an n by n matrix of independently and uniformly chosen ±1 entries. Then

        Pr_A[‖A^(−1)‖_2 ≥ x] ≤ √n/x + α^n

    for some constant α.

Statements close to this conjecture and its generalizations were recently proved by Vu and Tao39 and Rudelson and Vershynin.32

The σ-Boolean perturbation of a graph can be viewed as a smoothed extension of the classic Erdős–Rényi random graph model. The σ-perturbation of a graph Ḡ, which we denote by G_Ḡ(n, σ), is a distribution of random graphs in which every edge is removed with probability σ and every nonedge is included with probability σ. Clearly for p ∈ [0, 1], G(n, p) = G_∅(n, p), i.e., the p-Boolean perturbation of the empty graph. One can define a smoothed extension of other random graph models. For example, for any m and Ḡ = (V, E), Bohman et al.9 define G(Ḡ, m) to be the distribution of the random graphs (V, E ∪ T), where T is a set of m edges chosen uniformly at random from the complement of E, i.e., chosen from Ē = {(i, j) ∉ E}.

A popular subject of study in the traditional Erdős–Rényi model is the phenomenon of phase transition: for many properties such as being connected or being Hamiltonian, there is a critical p below which a graph is unlikely to have each property and above which it probably does have the property. Related phase transitions have also been found in the smoothed Erdős–Rényi models G_Ḡ(n, σ).17, 26

Smoothed analysis based on Boolean perturbations can be applied to other discrete problems. For example, Feige16 used the following smoothed model for 3CNF formulas. First, an adversary picks an arbitrary formula with n variables and m clauses. Then, the formula is perturbed at random by flipping the polarity of each occurrence of each variable independently with probability σ. Feige gave a randomized polynomial-time refutation algorithm for this problem.

Combinatorial Optimization. Beier and Vöcking6 and Röglin and Vöcking31 considered the smoothed complexity of integer linear programming. They studied programs of the form

    max c^T x subject to Ax ≤ b and x ∈ D^n,   (5)

where A is an m × n real matrix, b ∈ ℝ^m, and D ⊂ ℤ.

Recall that ZPP denotes the class of decision problems solvable by a randomized algorithm that always returns the correct answer, and whose expected running time (on every input) is polynomial. Beier, Röglin, and Vöcking6, 31 proved the following statement: For any constant c, let Π be a class of integer linear programs of form (5) with |D| = O(n^c). Then Π has an algorithm of probably polynomial smoothed complexity if and only if Π_u ∈ ZPP, where Π_u is the "unary" representation of Π. Consequently, the 0/1-knapsack problem, the constrained shortest path problem, the constrained minimum spanning tree problem, and the constrained minimum weighted matching problem have probably polynomial smoothed complexity.

Smoothed analysis has been applied to several other optimization problems such as local search and TSP,15 scheduling,5 motion planning,11 superstring approximation,27 multiobjective optimization,31 string alignment,2 and multidimensional packing.22
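The σ-Boolean perturbation defined earlier in this section is easy to realize in code. The sketch below (ours) flips each entry of a ±1 matrix independently with probability σ and reports ‖A^(−1)‖_2, the quantity appearing in the conjecture above; sizes are illustrative.

% Sketch (ours): sigma-Boolean perturbation of a +/-1 matrix, and the
% resulting norm of the inverse (reciprocal of the smallest singular value).
n = 200; sigma = 0.1;
Abar = sign(rand(n) - 0.5);       % an arbitrary "original" +/-1 matrix
flips = rand(n) < sigma;          % flip each entry independently w.p. sigma
A = Abar .* (1 - 2*flips);        % flipped entries change sign
fprintf('||A^-1||_2 = %.3g\n', 1/min(svd(A)));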
Discussion
A Theme in Smoothed Analyses. One idea is present in many of these smoothed analyses: perturbed inputs are rarely ill-conditioned. Informally speaking, the condition number of a problem measures how much its answer can be made to change by slight modifications of its input. Many numerical algorithms are known to run faster when their inputs are well conditioned. Some smoothed analyses, such as that of Blum and Dunagan,7 exploit this connection explicitly. Others rely on similar ideas.

For many problems, the condition number of an input is approximately the inverse of the distance of that input to the set of degenerate or ill-posed problems. As these degenerate sets typically have measure zero, it is intuitively reasonable that a perturbed instance should be unlikely to be too close to a degenerate one.

Other Performance Measures. Although we normally evaluate the performance of an algorithm by its running time, other performance parameters are often important. These performance parameters include the amount of space required, the number of bits of precision required to achieve a given output accuracy, the number of cache misses, the error probability of a decision algorithm, the number of random bits needed in a randomized algorithm, the number of calls to a particular subroutine, and the number of examples needed in a learning algorithm. The quality of an approximation algorithm could be its approximation ratio; the quality of an online algorithm could be its competitive ratio; and the parameter of a game could be its price of anarchy or the rate of convergence of its best-response dynamics. We anticipate future results on the smoothed analysis of these performance measures.

Precursors to Smoothed Complexity. Several previous probabilistic models have also combined features of worst-case and average-case analyses. Haimovich19 and Adler1 considered the following probabilistic analysis: Given a linear program L = (A, b, c) of form (1), they defined the expected complexity of L to be the expected complexity of the simplex algorithm when the inequality sign of each constraint is uniformly flipped. They proved that the expected complexity of the worst possible L is polynomial.

Blum and Spencer8 studied the design of polynomial-time algorithms for the semi-random model, which combines the features of the semi-random source with the random graph model that has a "planted solution." This model can be illustrated with the k-Coloring Problem: An adversary plants a solution by partitioning the set V of n vertices into k subsets V_1, …, V_k. Let

    F = {(u, v) | u and v are in different subsets}

be the set of potential inter-subset edges. A graph is then constructed by the following semi-random process that perturbs the decisions of the adversary: In a sequential order, the adversary decides whether to include each edge of F in the graph, and then a random process reverses the decision with probability σ. Note that every graph generated by this semi-random process has the planted coloring c(v) = i for all v ∈ V_i, as both the adversary and the random process preserve this solution by only considering edges from F.

As with the smoothed model, one can work with the semi-random model by varying σ between 0 and 1 to interpolate between worst-case and average-case complexity for k-coloring.

Algorithm Design and Analysis for Special Families of Inputs. Probabilistic approaches are not the only means of characterizing practical inputs. Much work has been spent on designing and analyzing inputs that satisfy certain deterministic but practical input conditions. We mention a few examples that excite us. In parallel scientific computing, one may often assume that the input graph is a well-shaped finite element mesh. In VLSI layout, one often only considers graphs that are planar or nearly planar. In geometric modeling, one may assume that there is an upper bound on the ratio among the distances between points. In Web analysis, one may assume that the input graph satisfies some power-law degree distribution or some small-world properties. When analyzing hash functions, one may assume that the data being hashed has some non-negligible entropy.30

Limits of Smoothed Analysis. The goal of smoothed analysis is to explain why some algorithms have much better performance in practice than predicted by the traditional worst-case analysis. However, for many problems, there may be better explanations. For example, the worst-case complexity and the smoothed complexity of the problem of computing a market equilibrium are essentially the same.20 So far, no polynomial-time pricing algorithm is known for general markets. On the other hand, pricing seems to be a practically solvable problem; as Kamal Jain put it, "If a Turing machine can't compute then an economic system can't compute either."

A key step to understanding the behaviors of algorithms in practice is the construction of analyzable models that are able to capture some essential aspects of practical input instances. For practical inputs, there may often be multiple parameters that govern the process of their formation. One way to strengthen the smoothed analysis framework is to improve the model of the formation of input instances. For example, if the input instances to an algorithm A come from the output of another algorithm B, then algorithm B, together with a model of B's input instances, provides a description of A's inputs. In finite-element calculations, for instance, the inputs to the linear solver A are stiffness matrices that are produced by a meshing algorithm B. The meshing algorithm B, which could be a randomized algorithm, generates a stiffness matrix from a geometric domain Ω and a partial differential equation F. So, the distribution of the stiffness matrices input to algorithm A is determined by the distribution of the geometric domains Ω and the set of partial differential equations, and the randomness in algorithm B. If, for example, Ω is the design of an advanced rocket from a set of "blueprints" and F is from a set of PDEs describing physical parameters such as pressure, speed, and temperature, and Ω is generated by a perturbation model P of the blueprints, then one may further measure the performance of A by the smoothed value

    max_{F, Ω̄} E_{Ω ← P(Ω̄)} E_{X ← B(Ω, F)} [Q(A, X)],

where Ω ← P(Ω̄) indicates that Ω is obtained from a perturbation of Ω̄, X ← B(Ω, F) indicates that X is the output of the randomized algorithm B, and Q(A, X) measures the performance of A on input X.

A simpler way to strengthen smoothed analysis is to restrict the family of perturbations considered. For example, one could employ zero-preserving perturbations, which only apply to nonzero entries. Or, one could use relative perturbations, which perturb every real number individually by a small multiplicative factor.

Algorithm Design Based on Perturbations and Smoothed Analysis. Finally, we hope insights gained from smoothed analysis will lead to new ideas in algorithm design. On a theoretical front, Kelner and Spielman24 exploited ideas from the smoothed analysis of the simplex method to design a (weakly) polynomial-time simplex method that functions by systematically perturbing its input program. On a more practical level, we suggest that it might be possible to solve some problems more efficiently by perturbing their inputs. For example, some algorithms in computational geometry implement variable-precision arithmetic to correctly handle exceptions that arise from geometric degeneracy.29 However, degeneracies and near-degeneracies occur with exceedingly small probability under perturbations of inputs. To prevent perturbations from changing answers, one could employ quad-precision arithmetic, placing the perturbations into the least-significant half of the digits.

Our smoothed analysis of Gaussian elimination suggests a more stable solver for linear systems: When given a linear system Ax = b, we first use the standard Gaussian elimination with partial pivoting algorithm to solve Ax = b. Suppose x* is the solution computed. If ‖b − Ax*‖ is small enough, then we simply return x*. Otherwise, we can determine a parameter ε and generate a new linear system (A + εG)y = b, where G is a Gaussian matrix with mean 0 and variance 1. Instead of solving Ax = b, we solve the perturbed linear system (A + εG)y = b. It follows from standard analysis that if ε is sufficiently smaller than κ(A), then the solution to the perturbed linear system is a good approximation to the original one. One could use practical experience or binary search to set ε.
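A Matlab rendering of this procedure (our sketch, not the authors' code; the residual tolerance and the schedule for ε are illustrative choices, standing in for the experience or binary search the text suggests):

% Perturbation-based linear solver (our sketch of the idea above).
function y = perturbed_solve(A, b, tol)
    n = size(A, 1);
    y = A \ b;                             % partial-pivoting solve
    if norm(A*y - b) <= tol, return; end   % accurate enough: done
    eps0 = 1e-9 * norm(A, 1);              % illustrative starting epsilon
    for attempt = 1:60                     % crude search over epsilon
        G = randn(n);                      % Gaussian matrix, mean 0, var 1
        y = (A + eps0*G) \ b;              % solve the perturbed system
        if norm(A*y - b) <= tol, return; end
        eps0 = eps0 / 2;                   % residual too large: shrink eps
    end
    warning('perturbed_solve:tol', 'tolerance not reached');
end

The article's own fragment below instantiates the same idea with a fixed perturbation of size 10^(-9).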
The new algorithm has the property that its success depends only on the machine precision and the condition number of A, while the original algorithm may fail due to large growth factors. For example, the following is a fragment of Matlab code that first solves a linear system whose matrix is the 70 × 70 matrix Wilkinson designed to trip up partial pivoting, using the Matlab linear solver. We then perturb the system, and apply the Matlab solver again.

>> % Using the Matlab Solver
>> n = 70; A = 2*eye(n)-tril(ones(n)); A(:,n)=1;
>> b = randn(70,1); x = A\b;
>> norm(A*x-b)
2.762797463910437e+004
>> % FAILED because of large growth factor
>> % Using the new solver
>> Ap = A + randn(n)/10^9; y = Ap\b;
>> norm(Ap*y-b)
6.343500222435404e-015
>> norm(A*y-b)
4.434147778553908e-008

Note that while the Matlab linear solver fails to find a good solution to the linear system, our new perturbation-based algorithm finds a good solution. While there are standard algorithms for solving linear equations that do not have the poor worst-case performance of partial pivoting, they are rarely used as they are less efficient.

For more examples of algorithm design inspired by smoothed analysis and perturbation theory, see Teng.37

Acknowledgments
We would like to thank Alan Edelman for suggesting the name "Smoothed Analysis" and thank Heiko Röglin and Don Knuth for helpful comments on this writing. Due to Communications space constraints, we have had to restrict our bibliography to 40 references. We apologize to those whose work we have been forced not to reference.

References
1. Adler, I. The expected number of pivots needed to solve parametric linear programs and the efficiency of the self-dual simplex method. Technical report, University of California at Berkeley (May 1983).
2. Andoni, A. and Krauthgamer, R. The smoothed complexity of edit distance. In Proceedings of ICALP, volume 5125 of Lecture Notes in Computer Science (Springer, 2008), 357–369.
3. Arthur, D. and Vassilvitskii, S. How slow is the k-means method? In SOCG '06: the 22nd Annual ACM Symposium on Computational Geometry (2006), 144–153.
4. Arthur, D. and Vassilvitskii, S. Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. In FOCS '06: the 47th Annual IEEE Symposium on Foundations of Computer Science (2006), 153–164.
5. Becchetti, L., Leonardi, S., Marchetti-Spaccamela, A., Schäfer, G., and Vredeveld, T. Average-case and smoothed competitive analysis of the multilevel feedback algorithm. Math. Oper. Res. 31, 1 (2006), 85–108.
6. Beier, R. and Vöcking, B. Typical properties of winners and losers in discrete optimization. In STOC '04: the 36th Annual ACM Symposium on Theory of Computing (2004), 343–352.
7. Blum, A. and Dunagan, J. Smoothed analysis of the perceptron algorithm for linear programming. In SODA '02 (2002), 905–914.
8. Blum, A. and Spencer, J. Coloring random and semi-random k-colorable graphs. J. Algorithms 19, 2 (1995), 204–234.
9. Bohman, T., Frieze, A., and Martin, R. How many random edges make a dense graph Hamiltonian? Random Struct. Algorithms 22, 1 (2003), 33–42.
10. Bürgisser, P., Cucker, F., and Lotz, M. Smoothed analysis of complex conic condition numbers. J. de Mathématiques Pures et Appliquées 86, 4 (2006), 293–309.
11. Damerow, V., Meyer auf der Heide, F., Räcke, H., Scheideler, C., and Sohler, C. Smoothed motion complexity. In Proceedings of the 11th Annual European Symposium on Algorithms (2003), 161–171.
12. Dantzig, G.B. Maximization of linear function of variables subject to linear inequalities. Activity Analysis of Production and Allocation. T.C. Koopmans, ed. 1951, 339–347.
13. Dunagan, J., Spielman, D.A., and Teng, S.-H. Smoothed analysis of condition numbers and complexity implications for linear programming. Mathematical Programming, Series A, 2009. To appear. Preliminary version available at http://arxiv.org/abs/cs/0302011v2.
14. Edelman, A. Eigenvalue roulette and random test matrices. Linear Algebra for Large Scale and Real-Time Applications. Marc S. Moonen, Gene H. Golub, and Bart L. R. De Moor, eds. NATO ASI Series, 1992, 365–368.
15. Englert, M., Röglin, H., and Vöcking, B. Worst case and probabilistic analysis of the 2-opt algorithm for the TSP: extended abstract. In SODA '07: the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (2007), 1295–1304.
16. Feige, U. Refuting smoothed 3CNF formulas. In the 48th Annual IEEE Symposium on Foundations of Computer Science (2007), 407–417.
17. Flaxman, A. and Frieze, A.M. The diameter of randomly perturbed digraphs and some applications. In APPROX-RANDOM (2004), 345–356.
18. Gass, S. and Saaty, T. The computational algorithm for the parametric objective function. Naval Res. Logist. Quart. 2 (1955), 39–45.
19. Haimovich, M. The simplex algorithm is very good!: On the expected number of pivot steps and related properties of random linear programs. Technical report, Columbia University (April 1983).
20. Huang, L.-S. and Teng, S.-H. On the approximation and smoothed complexity of Leontief market equilibria. In Frontiers of Algorithms Workshop (2007), 96–107.
21. Kalai, A. and Teng, S.-H. Decision trees are PAC-learnable from most product distributions: a smoothed analysis. ArXiv e-prints (December 2008).
22. Karger, D. and Onak, K. Polynomial approximation schemes for smoothed and random instances of multidimensional packing problems. In SODA '07: the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (2007), 1207–1216.
23. Kelner, J.A. and Nikolova, E. On the hardness and smoothed complexity of quasi-concave minimization. In the 48th Annual IEEE Symposium on Foundations of Computer Science (2007), 472–482.
24. Kelner, J.A. and Spielman, D.A. A randomized polynomial-time simplex algorithm for linear programming. In the 38th Annual ACM Symposium on Theory of Computing (2006), 51–60.
25. Klee, V. and Minty, G.J. How good is the simplex algorithm? Inequalities – III. O. Shisha, ed. Academic Press, 1972, 159–175.
26. Krivelevich, M., Sudakov, B., and Tetali, P. On smoothed analysis in dense graphs and formulas. Random Struct. Algorithms 29 (2005), 180–193.
27. Ma, B. Why greed works for shortest common superstring problem. In CPM '08: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (2008), 244–254.
28. Manthey, B. and Röglin, H. Improved smoothed analysis of the k-means method. In SODA '09 (2009).
29. Mehlhorn, K. and Näher, S. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, New York, 1999.
30. Mitzenmacher, M. and Vadhan, S. Why simple hash functions work: exploiting the entropy in a data stream. In SODA '08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2008), 746–755.
31. Röglin, H. and Vöcking, B. Smoothed analysis of integer programming. In Proceedings of the 11th International Conference on Integer Programming and Combinatorial Optimization, M. Jünger and V. Kaibel, eds. Volume 3509 of Lecture Notes in Computer Science, Springer, 2005, 276–290.
32. Rudelson, M. and Vershynin, R. The Littlewood–Offord problem and invertibility of random matrices. Adv. Math. 218 (June 2008), 600–633.
33. Sankar, A. Smoothed analysis of Gaussian elimination. Ph.D. Thesis, MIT, 2004.
34. Sankar, A., Spielman, D.A., and Teng, S.-H. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM J. Matrix Anal. Appl. 28, 2 (2006), 446–476.
35. Spielman, D.A. and Teng, S.-H. Smoothed analysis of algorithms. In Proceedings of the International Congress of Mathematicians (2002), 597–606.
36. Spielman, D.A. and Teng, S.-H. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. ACM 51, 3 (2004), 385–463.
37. Teng, S.-H. Algorithm design and analysis with perturbations. In Fourth International Congress of Chinese Mathematicians (2007).
38. Vershynin, R. Beyond Hirsch conjecture: Walks on random polytopes and smoothed complexity of the simplex method. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (2006), 133–142.
39. Vu, V.H. and Tao, T. The condition number of a randomly perturbed matrix. In STOC '07: the 39th Annual ACM Symposium on Theory of Computing (2007), 248–255.
40. Wschebor, M. Smoothed analysis of κ(A). J. Complexity 20, 1 (February 2004), 97–107.

Daniel A. Spielman ([email protected]) is a professor of Applied Mathematics and Computer Science at Yale University, New Haven, CT.

Shang-Hua Teng ([email protected]) is a professor in the Department of Computer Science at Boston University, and a senior research scientist at Akamai Technologies, Inc.
Technical Perspective
Relational Query Optimization—Data
Management Meets Statistical Estimation
By Surajit Chaudhuri
RELATIONAL SYSTEMS HAVE made it possible to query large collections of data in a declarative style through languages such as SQL. The queries are translated into expressions consisting of relational operations but do not refer to any implementation details. There is a key component that is needed to support this declarative style of programming, and that is the query optimizer. The optimizer takes the query expression as input and determines how best to execute that query. This amounts to a combinatorial optimization on a complex search space: finding a low-cost execution plan among all plans that are equivalent to the given query expression (considering possible ordering of operators, alternative implementations of logical operators, and different use of physical structures such as indexes). The success that relational databases enjoy today in supporting complex decision-support queries would not have been a reality without innovation in query optimization technology.

In trying to identify a good execution plan, the query optimizer must be aware of statistical properties of data over which the query is defined because these statistical properties strongly influence the cost of executing the query. Examples of such statistical properties are the total number of rows in the relation, the distribution of values of attributes of a relation, and the number of distinct values of an attribute. Because the optimizer needs to search among many alternative execution plans for the given query and tries to pick one with low cost, it needs such statistical estimation not only for the input relations, but also for many sub-expressions that it considers part of its combinatorial search. Indeed, statistical properties of the sub-expressions guide the exploration of alternatives considered by the query optimizer. Since access to a large data set can be costly, it is not feasible to determine statistical properties of these sub-expressions by computing them. It is also important to be able to maintain these statistics efficiently in the face of data updates. These requirements demand a judicious trade-off between quality of estimation and the overheads of doing this estimation.

The early commercial relational systems used estimation techniques using summary structures such as simple histograms on attributes of the relation. Each bucket of the histogram represented a range of values. The histogram captured the total number of rows and the number of distinct values for each bucket of the histogram. For larger query expressions, histograms were derived from histograms on their sub-expressions in an ad hoc manner. Since the mid-1990s, the unique challenges of statistical estimation in the context of query optimization attracted many researchers with backgrounds and interests in algorithms and statistics. We saw the development of principled approaches to these statistical estimation problems that leveraged randomized algorithms such as probabilistic counting. In keeping with the long-standing tradition of close relation between database research and the database industry, some of these solutions have been adopted in commercial database products.

The following paper by Beyer et al. showcases recent progress in statistical estimations in the context of query optimization. It revisits the difficult problem of efficient estimation of the number of distinct values in an attribute and makes a number of contributions by building upon past work that leverages randomized algorithms. It suggests an unbiased estimator for distinct values that has a lower mean squared error than previously proposed estimators based on a single scan of data. The authors propose a summary structure (synopsis) for a relation such that the number of distinct values in a query using multiset union, multiset intersection, and multiset difference operations over a set of relations can be estimated from the synopses of base relations. Furthermore, if only one of the many partitions of a relation is updated, the synopsis for just that partition must be rebuilt to derive the distinct value estimations for the entire relation.

It has been 30 years since the framework of query optimization was defined by System-R, and relational query optimization has been a great success story commercially. Yet, statistical estimation problems for query expressions remain one area where significant advances are needed to take the next big leap in the state of the art for query optimization. Recently, researchers have been trying to understand how additional knowledge on statistical properties of data and queries can best be gleaned from past executions to enhance the core statistical estimation abilities. Although I have highlighted query optimization, such statistical estimation techniques also have potential applications in other areas such as data profiling and approximate query processing. I invite you to read the following paper to sample a subfield that lies at the intersection of database management systems, statistics, and algorithms.

Surajit Chaudhuri ([email protected]) is a principal researcher and research area manager at Microsoft. He is an ACM Fellow and the recipient of the 2004 ACM SIGMOD Contributions Award.
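As a concrete aside on the bucket statistics described in this perspective (our illustration, not part of the column): an equi-width histogram records a row count per bucket, from which a range predicate's selectivity can be estimated without touching the data. The column of values, bucket count, and predicate below are arbitrary.

% Illustrative sketch (ours) of an early-style optimizer histogram:
% per-bucket row counts over an attribute, used to estimate how many
% rows satisfy a range predicate without scanning the data.
vals = randi(1000, 1e5, 1);               % synthetic attribute column
edges = linspace(0, 1000, 21);            % 20 equi-width buckets
rows_per_bucket = histcounts(vals, edges);
lo = 100; hi = 300;                       % predicate: lo < v <= hi
covered = edges(1:end-1) >= lo & edges(2:end) <= hi;  % covered buckets
est = sum(rows_per_bucket(covered));      % histogram-based estimate
fprintf('estimated %d rows, actual %d\n', est, sum(vals > lo & vals <= hi));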
Distinct-Value Synopses
for Multiset Operations
By Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis
… number of DVs, or by taking a single pass through the data and using hashing techniques to compute an estimate using …

(Footnote: the multiplicities of a value v in A ⊎ B, A ∩ B, and A \ B are given respectively by n_A(v) + n_B(v), min(n_A(v), n_B(v)), and max(n_A(v) − n_B(v), 0).)
… in Section 2, a simple "KMV" (K Minimum hash Values) synopsis2 for a single partition—obtained by hashing the DVs in the partition and recording the K smallest hash values—along with a "basic" DV estimator based on the synopsis; see Equation 1. In Section 3, we briefly review prior work and show that virtually all prior DV estimators can be viewed as versions of, or approximations to, the basic estimator. In Section 4, we propose a new DV estimator—see Equation 2—that improves upon the basic estimator. The new estimator also uses the KMV synopsis and is a deceptively simple modification of the basic estimator. Under a probabilistic model of hashing, we show that the new estimator is unbiased and has lower mean-squared error than the basic estimator. Moreover, when there are many DVs and the synopsis size is large, we show that the new unbiased estimator has essentially the minimal possible variance of any DV estimator. To help users assess the precision of specific DV estimates produced by the unbiased estimator, we provide probabilistic error bounds. We also show how to determine appropriate synopsis sizes for achieving a desired error level.

In Section 5, we augment the KMV synopsis with counters—in the spirit of Ganguly et al. and Shukla et al.10, 18—to obtain an "AKMV synopsis." We then provide methods for combining AKMV synopses such that the collection of these synopses is "closed" under multiset operations on the parent partitions. The AKMV synopsis can also handle deletions of individual partition elements. We also show how to extend our simple unbiased estimator to exploit the AKMV synopsis and provide unbiased estimates in the presence of multiset operations, obtaining an unbiased estimate of Jaccard distance in the process; see Equations 7 and 8. Section 6 concerns some recent complements to, and extensions of, our original results in Beyer et al.3

2. A BASIC ESTIMATOR AND SYNOPSIS
The idea behind virtually all DV estimators can be viewed as follows. Each of the D DVs in a dataset is mapped to a random location on the unit interval, and we look at the position U_(k) of the kth point from the left, for some fixed value of k; see Figure 1. The larger the value of D, i.e., the greater the number of points on the unit interval, the smaller the value of U_(k). Thus D can plausibly be estimated by a decreasing function of U_(k).

Specifically, if D ≫ 1 points are placed randomly and uniformly on the unit interval, then, by symmetry, the expected distance between any two neighboring points is 1/(D + 1) ≈ 1/D, so that the expected value of U_(k), the kth smallest point, is E[U_(k)] ≈ Σ_{j=1}^{k} (1/D) = k/D. Thus D ≈ k/E[U_(k)]. The simplest estimator of E[U_(k)] is simply U_(k) itself—the so-called "method-of-moments" estimator—and yields the basic estimator

    D̂_k^BE = k/U_(k).   (1)

The above 1-to-1 mapping from the D DVs to a set of D uniform random numbers can be constructed perfectly using O(D log D) memory, but this memory requirement is clearly infeasible for very large datasets. Fortunately, a hash function—which typically only requires an amount of memory logarithmic in D—often "looks like" a uniform random number generator. In particular, let 𝒟(A) = {v_1, v_2, …, v_D} be the domain of multiset A, i.e., the set of DVs in A, and let h be a hash function from 𝒟(A) to {0, 1, …, M}, where M is a large positive integer. For many hash functions, the sequence h(v_1), h(v_2), …, h(v_D) looks like the realization of a sequence of independent and identically distributed (i.i.d.) samples from the discrete uniform distribution on {0, 1, …, M}. Provided that M is sufficiently greater than D, the sequence U_1 = h(v_1)/M, U_2 = h(v_2)/M, …, U_D = h(v_D)/M will approximate the realization of a sequence of i.i.d. samples from the continuous uniform distribution on [0, 1]. This assertion requires that M be much larger than D to avoid collisions, i.e., to ensure that, with high probability, h(v_i) ≠ h(v_j) for all i ≠ j. A "birthday problem" argument16, p. 45 shows that collisions will be avoided when M = Ω(D²). We assume henceforth that, for all practical purposes, any hash function that arises in our discussion avoids collisions. We use the term "looks like" in an empirical sense, which suffices for applications. Thus, in practice, the estimator D̂_k^BE can be applied with U_(k) taken as the kth smallest hash value (normalized by a factor of 1/M). In general, E[1/X] ≥ 1/E[X] for a non-negative random variable X,17, p. 351 and hence

    E[D̂_k^BE] = E[k/U_(k)] ≥ k/E[U_(k)] ≈ D.
Algorithm 1 (KMV Computation).
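The pseudocode of Algorithm 1 is not reproduced in this version. As a rough stand-in, the following Matlab sketch (ours) builds a KMV synopsis under the idealized model used in the analysis, in which each DV receives an i.i.d. Uniform[0, 1] hash value, and applies the basic estimator of Equation (1) together with the unbiased variant (k − 1)/U_(k) introduced in Section 4. A real one-pass implementation would hash each arriving item and maintain the k smallest hash values in a bounded priority queue.

% KMV synopsis and basic estimator (our sketch, idealized-hash model).
data = randi(5000, 1e5, 1);        % a multiset with at most 5000 DVs
k = 256;
dv = unique(data);                 % distinct values actually present
u = rand(size(dv));                % idealized hash value per DV
kmv = mink(u, k);                  % K smallest hash values: the synopsis
Uk = kmv(k);                       % kth smallest (normalized) hash value
fprintf('true D = %d, k/U(k) = %.0f, (k-1)/U(k) = %.0f\n', ...
        numel(dv), k/Uk, (k-1)/Uk);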
The earliest synopses are bit vectors. The simplest bit-vector scheme ("linear counting") hashes each DV to a position in a bit vector V of length M = O(D), and uses the number of 1-bits to estimate the DV count. Its O(D) storage requirement is typically unacceptable for modern datasets.

The "logarithmic counting" method of Flajolet and Martin1, 9 uses a bit vector of length L = O(log D). The idea is to hash each of the DVs in A to the set {0, 1}^L of binary strings of length L, and look for patterns of the form 0^j 1 in the leftmost bits of the hash values. For a given value of j, the probability of such a pattern is 1/2^(j+1), so the expected observed number of such patterns after D DVs have been hashed is D/2^(j+1). Assuming that the longest observed pattern, say with j* leading 0's, is expected to occur once, we set D/2^(j*+1) = 1, so that D = 2^(j*+1); see Figure 2, which has been adapted from Astrahan et al.1 The value of j* is determined approximately, by taking each hash value, transforming the value by zeroing out all but the leftmost 1, and computing the logarithmic-counting synopsis as the bitwise-OR of the transformed values. Let r denote the position (counting from the left, starting at 0) of the leftmost 0 bit in the synopsis. Then r is an upper bound for j*, and typically a lower bound for j* + 1, leading to a crude (under)estimate of 2^r. For example, if r = 2, so that the leftmost bits of the synopsis are 110 (as in Figure 2), we know that the pattern 001 did not occur in any of the hash values, so that j* < 2.

Figure 2. Logarithmic counting.
Color   Hash value   Bit string with only the leftmost "1"
Red     111          100
Green   101          100
Red     111          100
Blue    011          010
Pink    010          010
Bitwise OR (bit positions 0 1 2): 110
j* = 1;  r = position of leftmost "0" = 2;  2^r = 4

The actual DV estimate is obtained by multiplying 2^r by a factor that corrects for the downward bias, as well as for hash collisions. In the complete algorithm, several independent values of r are, in effect, averaged together (using a technique called "stochastic averaging") and then exponentiated. Subsequent work by Durand and Flajolet8 improves on the storage requirement of the logarithmic counting algorithm by tracking and maintaining j* directly. The number of bits needed to encode j* is O(log log D), and hence the technique is called LogLog counting.

The main drawback of the above bit-vector data structures, when used as synopses in our partitioned-data setting, is that union is the only supported set operation. One must, e.g., resort to the inclusion/exclusion formula to handle set intersections. As the number of set operations increases, this approach becomes extremely cumbersome, expensive, and inaccurate. Several authors10, 18 have proposed replacing each bit in the logarithmic-counting bit vector by an exact or approximate counter, in order to permit DV estimation in the presence of both insertions and deletions to the dataset. This modification does not ameliorate the inclusion/exclusion problem, however.

Sample-Counting Synopsis: Another type of synopsis arises from the "sample counting" DV-estimation method—also called "adaptive sampling"—credited to Wegman.1 Here the synopsis for partition A comprises a subset of {h(v): v ∈ 𝒟(A)}, where h: 𝒟(A) → {0, 1, …, M} is a hash function as before. In more detail, the synopsis comprises a fixed-size buffer that holds binary strings of length L = log(M), together with a "reference" binary string s, also of length L. The idea is to hash the DVs in the partition, as in logarithmic counting, and insert the hash values into a buffer that can hold up to k > 0 hash values; the buffer tracks only the distinct hash values inserted into it. When the buffer fills up, it is purged by removing all hash values whose leftmost bit is not equal to the leftmost bit of s; this operation removes roughly half of the hash values in the buffer. From this point on, a hash value is inserted into the buffer if and only if the first bit matches the first bit of s. The next time the buffer fills up, a purge step (with subsequent filtering) is performed by requiring that the two leftmost bits of each hash value in the buffer match the two leftmost bits of the reference string. This process continues until all the values in the partition have been hashed. The final DV estimate is roughly equal to K2^r, where r is the total number of purges that have occurred and K is the final number of values in the buffer. For sample-counting algorithms with reference string equal to 00…0, the synopsis holds the K smallest hash values encountered, where K lies roughly between k/2 and k.

The Bellman Synopsis: In the context of the Bellman system, the authors in Dasu et al.6 propose a synopsis related to DV estimation. This synopsis comprises k entries and uses independent hash functions h_1, h_2, …, h_k; the ith synopsis entry is given by the ith minHash value m_i = min_{v ∈ 𝒟(A)} h_i(v). The synopsis for a partition is not actually used to directly compute the number of DVs in the partition, but rather to compute the Jaccard distance between partition domains; see Section 3.3. (The Jaccard distance between ordinary sets A and B is defined as J(A, B) = |A ∩ B|/|A ∪ B|. If J(A, B) = 1, then A = B; if J(A, B) = 0, then A and B are disjoint.) Indeed, this synopsis cannot be used directly for DV estimation because the associated DV estimator is basically D̂_1^BE, which has infinite expectation; see Section 2. When constructing the synopsis, each scanned data item must be hashed k times for comparison to the k current minHash values; for the KMV synopsis, each scanned item need only be hashed once.

3.2. DV estimators
The basic estimator D̂_k^BE was proposed in Bar-Yossef et al.,2 along with conservative error bounds based on Chebyshev's inequality. Interestingly, both the logarithmic and sample-counting estimators can be viewed as approximations to the basic estimator. For logarithmic counting—specifically the Flajolet–Martin algorithm—consider the binary decimal representation of the normalized hash values h(v)/M, where …
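Before moving on, here is a compact sketch (ours) of the logarithmic-counting synopsis described above, again with idealized random hashes: it ORs together the hash values reduced to their leftmost 1 bit and reads off the crude estimate 2^r. In a real implementation, bits would be the deterministic hash of v; since each DV is visited once here, fresh random bits per DV are a faithful idealization.

% Logarithmic counting (our sketch): OR the hash values reduced to their
% leftmost 1 bit; r = position of the leftmost 0 gives the crude estimate.
L = 32; data = randi(50000, 2e5, 1);
synopsis = false(1, L);                % the L-bit synopsis
for v = unique(data)'                  % idealized: each DV gets random bits
    bits = rand(1, L) < 0.5;           % stand-in for the hash of v
    j = find(bits, 1);                 % position of the leftmost 1
    if ~isempty(j), synopsis(j) = true; end
end
idx = find(~synopsis, 1);              % leftmost 0 bit
if isempty(idx), idx = L + 1; end
r = idx - 1;                           % counting positions from 0
fprintf('crude estimate 2^r = %d (true D = %d)\n', ...
        2^r, numel(unique(data)));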
4. AN IMPROVED DV ESTIMATOR
As discussed previously, the basic estimator D̂_k^BE is biased upwards for the true number of DVs D, and so somehow needs to be adjusted downward. We therefore consider the estimator

    D̂_k^UB = (k − 1)/U_(k).   (2)

In particular, E[D̂_k^UB] = D provided that k > 1, and Var[D̂_k^UB] = D(D − k + 1)/(k − 2) provided that k > 2.

Proof. For k > r ≥ 0, it follows from (3) that …

To compare the basic and unbiased estimators, note that, by (5), E[D̂_k^BE] = kD/(k − 1) and … Thus D̂_k^BE is biased high for D, as discussed earlier, and has higher MSE than D̂_k^UB.

We can also use the result in (3) to obtain probabilistic (relative) error bounds for the estimator D̂_k^UB. Specifically, set I_x(a, b) = ∫_0^x t^(a−1) (1 − t)^(b−1) dt / B(a, b), so that P{U_(k) ≤ x} = I_x(k, D − k + 1). Then, for 0 < ε < 1 and k ≥ 1, we have …

… Our test database comprises 24 relational tables, with a total of 504 attributes and roughly 2.6 million tuples. For several different hash functions, we computed the average value of the relative error RE(D̂_k^UB) = |D̂_k^UB − D|/D over multiple datasets in the database. The hash functions are described in detail in Beyer et al.3; for example, the Advanced Encryption Standard (AES) hash function is a well-established cipher function that has been studied extensively. The "baseline" curve corresponds to an idealized hash function as used in our analysis. As can be seen, the real-world accuracies are consistent with the idealized results, and reasonable accuracy can be obtained even for synopsis sizes of k < 100. In Beyer et al.,3 we found that the relative performance of different hash functions was sensitive to the degree of regularity in the data; the AES hash function is relatively robust to such data properties, and is our recommended hash function.

[Figure: average relative error versus synopsis size k, for k from 10 to 1000.]
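A quick simulation (ours) of the two estimators under the idealized uniform-hashing model makes the bias correction of Equation (2) visible:

% Simulation (ours): bias of the basic estimator k/U(k) versus the
% unbiased estimator (k-1)/U(k), in the idealized uniform-hashing model.
D = 10000; k = 64; trials = 2000;
est_be = zeros(trials, 1); est_ub = zeros(trials, 1);
for t = 1:trials
    u = sort(rand(D, 1));          % idealized hash values of D DVs
    est_be(t) = k / u(k);          % basic estimator
    est_ub(t) = (k - 1) / u(k);    % unbiased estimator
end
fprintf('true D = %d, mean basic = %.0f, mean unbiased = %.0f\n', ...
        D, mean(est_be), mean(est_ub));

With these parameters the mean of the basic estimator should come out near kD/(k − 1), about 1.6% high, while the unbiased estimator should average close to D.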
… records the multiplicity of v in E; observe that c_E(v) = 0 if v is not present in E, so that there may be one or more zero-valued counters; see line 4 of Figure 4 for the case E = A ∩ B.

With these definitions, the collection of AKMV synopses over compound partitions is closed under multiset operations, and an AKMV synopsis can be built up incrementally from the synopses for the base partitions. Specifically, suppose that we combine (base or compound) partitions A and B—having respective AKMV synopses (L_A, c_A) and (L_B, c_B)—to create E = A ◊ B, where ◊ ∈ {⊎, ∩, \}. Then the AKMV synopsis for E is (L_A ⊕ L_B, c_E), where

    c_E(v) = c_A(v) + c_B(v)          if ◊ = ⊎
    c_E(v) = min(c_A(v), c_B(v))      if ◊ = ∩
    c_E(v) = max(c_A(v) − c_B(v), 0)  if ◊ = \

As discussed in Beyer et al. and Gemulla,3, 11 the AKMV synopsis can sometimes be simplified. If, for example, all partitions are ordinary sets (not multisets) and ordinary set operations are used to create compound partitions, then the AKMV counters can be replaced by a compact bit vector, yielding an O(k log D) synopsis size.

5.2. DV estimator for AKMV synopsis
We now show how to estimate the number of DVs for a compound partition using the partition's AKMV synopsis; to this end, we need to generalize the unbiased DV estimator of Section 4. Consider a compound partition E created from n ≥ 2 base partitions A_1, A_2, …, A_n using multiset operations, along with the AKMV synopsis (L_E, c_E), where L_E = (h(v_1), h(v_2), …, h(v_k)). Denote by V_L = {v_1, v_2, …, v_k} the set of data values corresponding to the elements of L_E. (Recall our assumption that there are no hash collisions.) Set K_E = |{v ∈ V_L : c_E(v) > 0}|; for the example E = A ∩ B in Figure 4, K_E = 2. It follows from Theorem 3 that L_E is a size-k KMV synopsis of the multiset A_⊎ = A_1 ⊎ A_2 ⊎ ⋯ ⊎ A_n. The key observation is that, under our random hashing model, V_L can be viewed as a random sample of size k drawn uniformly and without replacement from 𝒟(A_⊎); denote by D_⊎ = |𝒟(A_⊎)| the number of DVs in A_⊎. The quantity K_E is a random variable that represents the number of elements in V_L that also belong to the set 𝒟(E). It follows that K_E has a hypergeometric distribution: setting D_E = |𝒟(E)|, we have

    P{K_E = j} = C(D_E, j) C(D_⊎ − D_E, k − j) / C(D_⊎, k),

where C(a, b) denotes the binomial coefficient.

We now estimate D_E using K_E, as defined above, and U_(k), the largest hash value in L_E. From Section 4.1 and Theorem 3, we know that D̂_⊎ = (k − 1)/U_(k) is an unbiased estimator of D_⊎; we would like to "correct" this estimator via multiplication by the ratio ρ = D_E/D_⊎, which we estimate by

    ρ̂ = K_E/k,   (7)

the fraction of elements in the sample V_L ⊆ 𝒟(A_⊎) that belong to 𝒟(E). This leads to our proposed estimator

    D̂_E = (K_E/k) · (k − 1)/U_(k).   (8)

Theorem 4. If k > 1 then E[D̂_E] = D_E. If k > 2, then

    Var[D̂_E] = D_E(kD_⊎ − k² − D_⊎ + k + D_E) / (k(k − 2)).

Proof. The distribution of K_E does not depend on the hash values {h(v): v ∈ 𝒟(A_⊎)}. It follows that the random variables K_E and U_(k), and hence the variables ρ̂ and U_(k), are statistically independent. By standard properties of the hypergeometric distribution, E[K_E] = kD_E/D_⊎, so that E[D̂_E] = E[ρ̂ D̂_⊎] = E[ρ̂] E[D̂_⊎] = ρ D_⊎ = D_E. The second assertion follows in a similar manner. □

Thus D̂_E is unbiased for D_E. It also follows from the proof that the estimator ρ̂ is unbiased for D_E/D_⊎. In the special case where E = A ∩ B for two ordinary sets A and B, the ratio ρ corresponds to the Jaccard distance J(A, B), and we obtain an unbiased estimator of this quantity. We can also obtain probability bounds that generalize the bounds in (6); see Beyer et al.3 for details.

Figure 5 displays the accuracy of the AKMV estimator when estimating the number of DVs in a compound partition. For this experiment, we computed a KMV synopsis of size k = 8192 for each dataset in the RDW database. Then, for every possible pair of synopses, we used D̂_E to estimate the DV count for the union and intersection, and also estimated the Jaccard distance between set domains using our new unbiased estimator ρ̂ defined in (7). We also estimated these quantities using a best-of-breed estimator: the SDLogLog estimator, which is a highly tuned implementation of the LogLog estimator given in Durand and Flajolet.8 For this latter estimator, we estimated the number of DVs in the union directly, and then used the inclusion/exclusion rule to estimate the DV count for the intersection and then for the Jaccard distance. The figure displays, for each estimator, a relative-error histogram for each of these three multiset operations. (The histogram shows, for each possible RE value, the number of dataset pairs for which the DV estimate yielded that value.) For the …

[Figure 5. Accuracy comparison for union, intersection, and Jaccard distance on the RDW dataset: relative-error histograms (frequency versus relative error, 0 to 0.15) for Unbiased-KMV and SDLogLog on intersections, unions, and Jaccard distance.]
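To make the closure property and estimator (8) concrete, the following sketch (ours, idealized-hash model) combines two size-k KMV synopses into a synopsis for A ∩ B and applies D̂_E. Shared DVs receiving the same hash value is the crucial invariant; membership in the truncated lists suffices because any hash among the k smallest of the combined list is also among the k smallest of each base list it belongs to.

% AKMV-style intersection estimate (our sketch, idealized-hash model).
k = 256; DA = 4000; DB = 3000; Dcommon = 1000;
uA = rand(DA, 1);                             % hash values of A's DVs
uB = [uA(1:Dcommon); rand(DB - Dcommon, 1)];  % shared DVs share hashes
LA = mink(uA, k); LB = mink(uB, k);           % the two size-k KMV synopses
L = mink(union(LA, LB), k);                   % combined synopsis of A (+) B
% counter > 0 for both A and B <=> the value lies in the intersection
KE = sum(ismember(L, LA) & ismember(L, LB));
Uk = L(k);                                    % largest hash value in L
DE_hat = (KE / k) * (k - 1) / Uk;             % estimator (8)
fprintf('true |A intersect B| = %d, estimate = %.0f\n', Dcommon, DE_hat);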
Technical Perspective
Data Stream Processing—
When You Only Get One Look
By Johannes Gehrke
THE DATABASE AND systems communities have made great progress in developing database systems that allow us to store and query huge amounts of data. My first computer cost about $1,000; it was a Commodore 64 with a 170KB floppy disk drive. Today (September 2009), I can configure a 1.7TB file server for the same price. Companies are responding to this explosion in storage availability by building bigger and bigger data warehouses, where every digital trail we leave is stored for later analysis. Data arrives 24/7, and real-time "on-the-fly" analysis—where answers are always available—is becoming mandatory. Here is where data stream processing comes to the rescue.

In data stream processing scenarios, data arrives at high speeds and must be analyzed in the order it is received using a limited amount of memory. The area has two main directions: a systems side and an algorithmic side. On the systems side, researchers have developed data stream processing systems that work like database systems turned upside down: Long-running queries are registered with the system and data is streamed through the system. Startups now sell systems that analyze streaming data for solutions in areas such as fraud detection, algorithmic trading, and network monitoring. They often offer at least an order of magnitude performance improvement over traditional database systems.

On the algorithmic side, there has been much research on novel one-pass algorithms. These algorithms have no need for secondary index structures, and they do not require expensive sorting operations. They are online—an answer of the query over the current prefix of the stream is available at any time. These so-called data stream algorithms achieve these properties by trading exact query answers against approximate answers, but these approximations come with provable quality guarantees.

The following paper by Graham Cormode and Marios Hadjieleftheriou gives an overview of recent progress for the important primitive of finding frequent items in a data stream. Informally, an item is frequent in a prefix of the stream if its relative frequency exceeds a user-defined threshold. Another formulation of the problem just looks for the most frequently occurring items. The authors present an algorithmic framework that encompasses previous work and shows the results of a thorough experimental comparison of the different approaches.

This paper is especially timely since some of these algorithms are already in use (and those that are in use are not necessarily the best, according to the authors). For example, inside Google's analysis infrastructure, in the map-reduce framework, there exist several prepackaged "aggregators" that in one pass collect statistics over a huge dataset. The "quantile" aggregate, which collects a value at each quantile of the data, uses a previously developed algorithm that is covered in the paper, and the "top" aggregate estimates the most popular values in a dataset, again using an algorithm captured by the framework in the paper.

Within AT&T (the home institution of the authors) a variety of streaming algorithms are deployed today for network monitoring, based on real-time analysis of packet header data. For example, quantile aggregates are used to track the distribution of round-trip delays between different points in the network over time. Similarly, heavy-hitter aggregates are used to find sources that send the most traffic to a given destination over time.

Although the paper surveys about 30 years of research, there is still much progress in the area. Moreover, finding frequent items is just one statistic. In practice, much more sophisticated queries such as frequent combinations of items, mining clusters, or other statistical models require data stream algorithms with quality guarantees. For example, recent work from Yahoo! for content optimization shows how to use time series models that are built online to predict the click-through rate of an article based on the stream of user clicks.

I think we have only scratched the surface both for applications and in novel algorithms, and I am looking forward to another 30 years of innovation. I recommend this paper to learn about the types of techniques that have been developed over the years and see how ideas from algorithms, statistics, and databases have come together in this problem.

Johannes Gehrke ([email protected]) is an associate professor at Cornell University, Ithaca, NY.
made to an Internet search engine, and the frequent items are now the (currently) popular terms. These are not simply hypothetical examples, but genuine cases where algorithms for this problem have been applied by large corporations: AT&T11 and Google,23 respectively. Given the size of the data (which is being generated at high speed), it is important to find algorithms which are capable of processing each new update very quickly, without blocking. It also helps if the working space of the algorithm is very small, so that the analysis can happen over many different groups in parallel, and because small structures are likely to have better cache behavior and hence further help increase the throughput.

Obtaining efficient and scalable solutions to the frequent items problem is also important since many streaming applications need to find frequent items as a subroutine of another, more complex computation. Most directly, mining frequent itemsets inherently builds on finding frequent items as a basic building block. Finding the entropy of a stream requires learning the most frequent items in order to directly compute their contribution to the entropy, and remove their contribution before approximating the entropy of the residual stream.8 The HSS (Hierarchical Sampling from Sketches) technique uses hashing to derive multiple substreams, the frequent elements of which are extracted to estimate the frequency moments of the stream.4 The frequent items problem is also related to the recently popular area of Compressed Sensing.

Other work solves generalized versions of the frequent items problem by building on algorithms for the "vanilla" version of the problem. Several techniques for finding the frequent items in a "sliding window" of recent updates (instead of all updates) operate by keeping track of the frequent items in many sub-windows.2,13 In the "heavy hitters distinct" problem, with applications to detecting network scanning attacks, the count of an item is the number of distinct pairs containing that item paired with a secondary item. It is typically solved by extending a frequent items algorithm with distinct counting algorithms.25 Frequent items have also been applied to models of probabilistic streaming data,17 and within faster "skipping" techniques.3

Thus the problem is an important one to understand and study in order to produce efficient streaming implementations. It remains an active area, with many new research contributions produced every year on the core problem and its variations. Due to the amount of work on this problem, it is easy to miss some important references or to fail to appreciate the properties of certain algorithms. There are several cases where algorithms first published in the 1980s have been "rediscovered" two decades later; existing work is sometimes claimed to be incapable of a certain guarantee, which in truth it can provide with only minor modifications; and experimental evaluations do not always compare against the most suitable methods.

In this paper, we present the main ideas in this area by describing some of the most significant algorithms for the core problem of finding frequent items, using common notation and terminology. In doing so, we also present the historical development of these algorithms. Studying these algorithms is instructive, as they are relatively simple, but can be shown to provide formal guarantees on the quality of their output as a function of an accuracy parameter ε. We also provide baseline implementations of many of these algorithms against which future algorithms can be compared, and on top of which algorithms for different problems can be built. We perform experimental evaluation of the algorithms over a variety of data sets to indicate their performance in practice. From this, we are able to identify clear distinctions among the algorithms that are not apparent from their theoretical analysis alone.

2. DEFINITIONS
We first provide a formal definition of the stream and of the frequencies fi of the items within the stream, where fi is the number of times item i is seen in the stream.

Definition 1. Given a stream S of n items t1 … tn, the frequency of an item i is fi = |{ j | tj = i}|. The exact φ-frequent items comprise the set {i | fi > φn}.

Example. The stream S = (a, b, a, c, c, a, b, d) has fa = 3, fb = 2, fc = 2, fd = 1. For φ = 0.2, the frequent items are a, b, and c.

A streaming algorithm which finds the exact φ-frequent items must use a lot of space, even for large values of φ, based on the following information-theoretic argument. Given an algorithm that claims to solve this problem for φ = 50%, we could insert a set S of N items, where every item has frequency 1. Then, we could also insert N − 1 copies of item i, giving a stream of 2N − 1 items in which fi = N > (2N − 1)/2 if i ∈ S, and fi = N − 1 < (2N − 1)/2 otherwise. So if i is now reported as a frequent item (occurring more than 50% of the time), then i ∈ S, else i ∉ S. Consequently, since correctly storing a set of size N requires Ω(N) space, Ω(N) space is also required to solve the exact frequent items problem. That is, any algorithm which promises to solve the exact problem on a stream of length n must (in the worst case) store an amount of information that is proportional to the length of the stream, which is impractical for the large stream sizes we consider.

Because of this fundamental difficulty in solving the exact problem, an approximate version is defined based on a tolerance for error, which is parametrized by ε.

Definition 2. Given a stream S of n items, the ε-approximate frequent items problem is to return a set of items F so that for all items i ∈ F, fi > (φ − ε)n, and there is no i ∉ F such that fi > φn.

Since the exact (ε = 0) frequent items problem is hard in general, we use "frequent items" or "the frequent items problem" to refer to the ε-approximate frequent items problem. A closely related problem is to estimate the frequency of items on demand.

Definition 3. Given a stream S of n items defining frequencies fi as above, the frequency estimation problem is to process a stream so that, given any i, an f̂i is returned satisfying fi ≤ f̂i ≤ fi + εn.

A solution to the frequency estimation problem allows the frequent items problem to be solved (slowly): one can estimate the frequency of every possible item i, and report those i's whose frequency is estimated above (φ − ε)n. Exhaustively enumerating all items can be very time consuming (or infeasible,
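To make Definitions 2 and 3 concrete, here is a minimal Python sketch of the classic Misra–Gries "Frequent" algorithm, the counter-based scheme evaluated as F below. This is an illustration written for this text, in the spirit of, but not copied from, the authors' C baselines; the function name is hypothetical. With roughly 1/ε counters, each stored count undercounts the true frequency by at most εn:

    from collections import Counter

    def misra_gries(stream, eps):
        """One-pass Frequent (Misra-Gries) summary using at most ceil(1/eps) counters.
        Each stored count c_i satisfies f_i - eps*n <= c_i <= f_i, so c_i + eps*n
        meets the guarantee of Definition 3, and reporting items with
        c_i > (phi - eps)*n solves the eps-approximate problem of Definition 2."""
        k = int(1 / eps) + 1
        counters = Counter()
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < k - 1:
                counters[item] = 1
            else:
                for key in list(counters):   # decrement all counters; drop zeros
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
        return counters

    # Example: the stream from Definition 1's example, with phi = 0.2, eps = 0.1
    stream = list("abaccabd")
    summary = misra_gries(stream, eps=0.1)
    n, phi, eps = len(stream), 0.2, 0.1
    frequent = [i for i, c in summary.items() if c > (phi - eps) * n]
    # 'frequent' is guaranteed to contain a, b, and c; it may also contain d,
    # which Definition 2 permits, since f_d = 1 > (phi - eps)*n = 0.8.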
Figure 4. Performance of counter-based algorithms (F, LC, LCD, SSL, SSH) on real network data: (a) HTTP: speed vs. φ; (b) HTTP: precision vs. φ; (c) HTTP: average relative error (ARE) vs. φ of frequent items. Each panel plots φ on a log scale from 0.0001 to 0.01.
Figure 5. Performance of sketch algorithms (CS, CMH, CGT) on real network data: (a) HTTP: speed vs. φ (updates/ms); (b) HTTP: size vs. φ (bytes); (c) HTTP: precision vs. φ (%). Each panel plots φ on a log scale from 0.0001 to 0.01.
for small values of φ. CMH is the most space-efficient sketch, but it still consumes three times as much space as the least space-efficient counter-based algorithm.

Precision, Recall, and Error: The sketch algorithms all have near-perfect recall, as is the case with the counter-based algorithms. Figure 5(c) shows that they also have good precision, with CS reporting the largest number of false positives. Nevertheless, on some other datasets we tested (not shown here), the results were reversed. We also tested the average relative error in the frequency estimation of the truly frequent items. For sufficiently skewed distributions all algorithms can estimate item frequencies very accurately, and the results from all sketches were similar, since all hierarchical sketch algorithms essentially correspond to a single instance of a CountSketch or CountMin sketch of equal size.

Conclusions: There is no clear winner among the sketch algorithms. CMH has small size and high update throughput, but is only accurate for highly skewed distributions. CGT consumes a lot of space, but it is the fastest sketch and is very accurate in all cases, with high precision and good frequency estimation accuracy. CS has low space consumption and is very accurate in most cases, but has a slow update rate and exhibits some random behavior.

5. CONCLUDING REMARKS
We have attempted to present algorithms for finding frequent items in streams, and to give an experimental comparison of their behavior to serve as a baseline for comparison. For insert-only streams, the clear conclusion of our experiments is that the SpaceSaving algorithm, a relative newcomer, has surprisingly clear benefits over others. We observed that implementation choices, such as whether to use a heap or lists of items grouped by frequencies, trade off speed and space. For sketches to find frequent items over streams including negative updates, there is not such a clear answer, with different algorithms excelling at different aspects of the problem. We do not consider this the end of the story, and continue to experiment with other implementation choices. Our source code and experimental test scripts are available from http://www.research.att.com/~marioh/frequent-items/ so that others can use these as baseline implementations.

We conclude by outlining some of the many variations of the problem:

In the weighted input case, each update comes with an associated weight (such as a number of bytes, or a number of units sold). Here, sketching algorithms directly handle weighted updates because of their linearity. The SpaceSaving algorithm also extends to the weighted case, but this is not known to be the case for the other counter-based algorithms discussed.

In the distributed data case, different parts of the input are seen by different parties (different routers in a network, or different stores making sales). The problem is then to find items which are frequent over the union of all the inputs. Again, due to their linearity properties, sketches can easily solve such problems (see the illustrative sketch below). It is less clear whether one can merge together multiple counter-based summaries to obtain a summary with the same accuracy and worst-case space bounds.

Often, the item frequencies are known to follow some statistical distribution, such as the Zipfian distribution. With this assumption, it is sometimes possible to prove a smaller space requirement for the algorithm, as a function of the amount of "skewness" in the distribution.9,21

In some applications, it is important to find how many distinct observations there have been, leading to a distinct heavy hitters problem. Now the input stream S is of the form (i, j), and fi is defined as |{ j | (i, j) ∈ S}|; multiple occurrences of (i, j) only count once towards fi. Techniques for "distinct frequent items" rely on combining frequent items algorithms with "count distinct" algorithms.25

While processing a long stream, it may be desirable to weight more recent items more heavily than older ones. Various models of time decay have been proposed to achieve this. In a sliding window, only the most recent items are considered to define the frequent items.2 More generally, time decay can be formalized via a function which assigns a weight to each item in the stream as a function of its (current) age, and the frequency of an item is the sum of its decayed weights.

Each of these problems has also led to considerable effort from the research community to propose and analyze algorithms. This research is ongoing, cementing the position of the frequent items problem as one of the most enduring and intriguing in the realm of algorithms for data streams.
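To illustrate the linearity that makes the weighted and distributed cases easy for sketches, the following is a simplified Count-Min-style sketch in Python. It is our own toy construction, not the authors' code: the class and method names are hypothetical, and the salted built-in hash merely stands in for the pairwise-independent hash functions a real implementation would use (so merged sketches must be built in the same process with the same seed).

    import numpy as np

    class CountMinSketch:
        """Toy Count-Min sketch: estimates satisfy f_i <= est <= f_i + eps*n
        with probability 1 - delta when width ~ e/eps and depth ~ ln(1/delta)."""
        def __init__(self, width, depth, seed=0):
            self.width, self.depth = width, depth
            self.table = np.zeros((depth, width), dtype=np.int64)
            self.salts = np.random.default_rng(seed).integers(1, 2**31, size=depth)

        def _cells(self, item):
            for row, salt in enumerate(self.salts):
                yield row, hash((int(salt), item)) % self.width

        def update(self, item, weight=1):
            # Linearity: weighted updates are handled directly.
            for row, col in self._cells(item):
                self.table[row, col] += weight

        def estimate(self, item):
            return min(int(self.table[row, col]) for row, col in self._cells(item))

        def merge(self, other):
            # Linearity again: the sketch of the union of two streams is the
            # cell-wise sum of the two sketches (same hash functions required).
            assert (self.salts == other.salts).all()
            self.table += other.table

Two routers (or stores) can each maintain such a sketch locally and ship it to a coordinator, which simply adds the tables; the merged estimates match what a single sketch over the combined stream would have produced.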
CAREERS
Open Positions at INRIA
For Faculty and Research Scientists

INRIA is a French public research institute in Computer Science and Applied Mathematics. It is an outstanding and highly visible scientific organization, and a major player in European research and development programs.
INRIA has eight research centers, in Paris, Bordeaux, Grenoble, Lille, Nancy, Nice - Sophia Antipolis, Rennes, and Saclay. These centers host more than 170 groups in partnership with universities and other research organizations. INRIA focuses the activity of over 1200 researchers and faculty members, 1200 PhD students, and 500 post-docs and engineers on fundamental research at the best international level, as well as on development and transfer activities, in the following areas:
- Modeling, simulation and optimization of complex dynamic systems
- Formal methods in programming secure and reliable computing systems
- Networks and ubiquitous information, computation and communication systems
- Vision and human-computer interaction modalities, virtual worlds and robotics
- Computational Engineering, Computational Sciences and Computational Medicine
In 2010, INRIA will be opening several positions in its 8 research centers across France:
- Junior and senior level positions,
- Tenured and tenure-track positions,
- Research and joint faculty positions with universities
These positions cover all the above research areas. INRIA centers provide outstanding scientific environments and excellent working conditions. The institute welcomes applications from all nationalities. It offers competitive salaries and social benefit programs. French schooling, medical coverage, and social programs are highly regarded. Visas and working permits for the applicant and the spouse will be provided.
Calendar and detailed application information at: http://www.inria.fr/travailler/index.en.html
More about our research groups: http://www.inria.fr/recherche/equipes/index.en.html
More about INRIA: www.inria.fr

Japan Advanced Institute of Science and Technology (JAIST)
Assistant Professor

Japan Advanced Institute of Science and Technology invites applicants for an Assistant Professor position in the field of knowledge media in the SCHOOL OF KNOWLEDGE SCIENCE. The appointee is expected to start his/her academic and educational activities at JAIST as early as possible after April 1, 2010. Candidates have to be highly competent in conducting research and education in the areas of human science, i.e., anthropometry, ergonomics, human factors, brain science, and/or skill science.
1. Mission: A primary mission of this position is to promote international research and education in the field of knowledge science based on human science.
2. Qualification: Applicants have to hold a Ph.D. degree and be qualified for highly scientific activities by participating in domestic and international research initiatives. The search committee may ask candidates to present research activities together with a research plan in Japanese and/or English.
3. Contract: 5 years (extendable)
4. Selection: The search committee shall evaluate the candidates' expertise, research activities, and teaching skills. The evaluation result for each applicant will not be released after the selection. The evaluation procedure is strictly impartial, unbiased, and fair. An Equal Opportunity Employer, JAIST values diversity and strongly encourages applications from foreigners and women.
5. Application: Details can be found at: http://www.jaist.ac.jp/jimu/syomu/koubo/knowledge_media-e.htm
6. Deadline: Applications must be received no later than November 30, 2009.
For more information, please contact: http://www.jaist.ac.jp/index-e.html

KAIST
Professor

KAIST, the top Science & Engineering University in Korea, invites applications for tenure-track positions at all levels in Computer Science. We welcome applications from outstanding candidates in all major fields of computer science and its interdisciplinary areas.
KAIST offers a competitive start-up research fund and joint appointments with KAIST Institutes, which will expand opportunities in interdisciplinary research and funding. KAIST also provides housing for five years. KAIST is committed to increasing the number of female and non-Korean faculty members.
The required qualification is a Ph.D. or an equivalent degree in computer science or a closely related field by the time of appointment. Strong candidates who are expected to receive their Ph.D. degrees within a year can be offered an appointment. Applicants must demonstrate strong research potential and commitment to teaching. KAIST attracts the nation's top students pursuing B.S., M.S., and Ph.D. degrees. The teaching load is three hours per semester.
For more information on available positions, please visit our website: http://cs.kaist.ac.kr/service/employment.cs

National Taiwan University
Professor-Associate Professor-Assistant Professor

The Department of Computer Science and Information Engineering, the Graduate Institute of Networking and Multimedia, and the Graduate Institute of Biomedical Electronics and Bioinformatics, all of National Taiwan University, have faculty openings at all ranks beginning in February 2010. Highly qualified candidates in all areas of computer science/engineering and bioinformatics are invited to apply. A Ph.D. or its equivalent is required. Applicants are expected to conduct outstanding research and be committed to teaching. Candidates should send a curriculum vitae, three letters of reference, and supporting materials before October 15, 2009, to Prof Kun-Mao Chao, Department of Computer Science and Information Engineering, National Taiwan University, No 1, Sec 4, Roosevelt Rd., Taipei 106, Taiwan.

NEC Laboratories America, Inc.
Research Staff Members - Data Management

NEC Laboratories America, Inc. (www.nec-labs.com) is a vibrant industrial research center renowned for technical excellence and high-impact innovations that conducts research and development in support of global businesses by building upon NEC's 100-year history of innovation. Our research programs cover a wide range of technology areas and maintain a balanced mix of fundamental and applied research as they focus on innovations which are ripe for technical breakthrough. Our progressive environment provides exposure to industry-leading technologies and nurtures close collaborations with leading research universities and institutions. Our collaborative atmosphere, commitment to developing talent, and extremely competitive benefits ensure that we attract the sharpest minds in their respective fields. NEC Labs is headquartered in Princeton, NJ, and has a second location in Cupertino, CA.
We are seeking researchers to join our Data Management group in Cupertino, CA. The current research focus of the group is to create cutting-edge technologies for Data Management in the Cloud. Candidates must have a Ph.D. in Computer Science (or related fields) with a solid data management background and a strong publication record in related areas, must be proactive with a "can-do" attitude, and must be able to conduct research independently. Experience in Cloud Computing, SaaS, or Service Oriented Computing areas is a major plus.
The requirements for one position are:
- Deep understanding of data management systems and database internals
- Strong hands-on system building and prototyping skills
- Experience in distributed data management
- Good knowledge of emerging data models and data processing techniques (e.g., Key/Value Stores, Column-Oriented Databases, MapReduce, etc.)
- Knowledge of middleware technologies
For consideration, please forward a resume and research statement to [email protected] and reference "DM-R1" in the subject line.
For another position, the researcher will create new models to capture, analyze, and predict the state of data management systems deployed in cloud environments, and combine the insights provided by those models with the database internals to deliver leading-edge data management technologies for unparalleled efficiency gains. The requirements are:
- Demonstrated knowledge of statistical and probabilistic models in large-scale data and system analysis
- Strong experience in data mining and data analytics
- Good hands-on system building and prototyping skills
- Experience in data warehousing
For consideration, please forward a resume and research statement to [email protected] and reference "DM-R2" in the subject line.
EOE/AA/MFDV

Texas State University-San Marcos
Department of Computer Science

Applications are invited for a tenure-track position at the rank of Assistant, Associate or Professor. Consult the department recruiting page at http://www.cs.txstate.edu/recruitment/ for job duties, required and preferred qualifications, application procedures, and information about the university and the department.
Texas State University-San Marcos is an equal opportunity educational institution and as such does not discriminate on grounds of race, religion, sex, national origin, age, physical or mental disabilities, or status as a disabled or Vietnam era veteran. Texas State is committed to increasing the number of women and minorities in faculty and senior administrative positions. Texas State University-San Marcos is a member of the Texas State University System.

Toyota Technological Institute at Chicago (TTI-C)
Computer Science at TTI-Chicago
Faculty Positions at All Levels

Toyota Technological Institute at Chicago (TTI-C) is a philanthropically endowed degree-granting institute for computer science located on the University of Chicago campus. The Institute is expected to soon reach a steady state of 12 traditional faculty (tenure and tenure track) and 12 limited-term faculty. Applications are being accepted in all areas, but we are particularly interested in:
- Theoretical computer science
- Speech processing
- Machine learning
- Computational linguistics
- Computer vision
- Scientific computing
- Programming languages
Positions are available at all ranks, and we have a large number of limited-term positions currently available.
For all positions we require a Ph.D. degree or Ph.D. candidacy, with the degree conferred prior to the date of hire. Submit your application electronically at: http://ttic.uchicago.edu/facapp/
Toyota Technological Institute at Chicago is an Equal Opportunity Employer

University of Chicago
Professor, Associate Professor, Assistant Professor, and Instructor

The Department of Computer Science at the University of Chicago invites applications from exceptionally qualified candidates in all areas of Computer Science for faculty positions at the ranks of Professor, Associate Professor, Assistant Professor, and Instructor. The University of Chicago has the highest standards for scholarship and faculty quality, and encourages collaboration across disciplines.
The Chicago metropolitan area provides a diverse and exciting environment. The local economy is vigorous, with international stature in banking, trade, commerce, manufacturing, and transportation, while the cultural scene includes diverse cultures, vibrant theater, world-renowned symphony, opera, jazz, and blues. The University is located in Hyde Park, a pleasant Chicago neighborhood on the Lake Michigan shore.
All applicants must apply through the University's Academic Jobs website, http://academiccareers.
SWARTHMORE COLLEGE invites applications for a tenure-track appointment as Assistant or Associate Professor of Engineering in the area of Computer Engineering, to begin in September. A doctorate in Computer or Electrical Engineering or a related field is required, along with strong interests in undergraduate teaching and in developing a laboratory research program involving undergraduates. Teaching responsibilities include robotics and digital design, and elective courses in the candidate's area of specialization, examples of which could include image processing/vision, embedded systems, graphics, and other areas related to computer hardware. Supervision of student research and senior design projects, as well as student advising, is required. Sabbatical leave with support is available every fourth year.
Swarthmore College is an undergraduate liberal arts institution on a suburban arboretum campus southwest of Philadelphia. Eight faculty in the Department of Engineering offer a rigorous ABET-accredited program for the BS in Engineering. The department has an endowed equipment budget, and there is support for faculty-student collaborative research. For program details see http://engin.swarthmore.edu. Interested candidates should submit a CV, brief statements describing teaching philosophy and research interests, and undergraduate and graduate transcripts, along with three letters of reference, to: Chair, Department of Engineering, Swarthmore College, College Avenue, Swarthmore, PA, or to lmolter@swarthmore.edu with the word "candidate" in the subject line by December. Swarthmore College is an equal opportunity employer; women and minority candidates are strongly encouraged to apply.

Graduate School of Computer and Information Sciences Dean

Nova Southeastern University (NSU) invites applications for Dean of its Graduate School of Computer and Information Sciences. The School offers a unique mix of innovative M.S. and Ph.D. programs in computer science, information systems, information security, and educational technology.
As the chief academic and administrative officer of the Graduate School of Computer and Information Sciences (GSCIS), the Dean will be responsible for leadership of the school's academic and administrative affairs. The Dean will provide innovative vision and leadership in order to maintain and advance the stature of the GSCIS. The Dean will foster and enhance the multidisciplinary structure of the GSCIS, which includes the Computer Science, Information Systems, and Information Technology in Education disciplines. The Dean will ensure that quality educational services are provided to students.
Qualifications include a doctoral degree in computer science, information systems, or a related field. Candidates should have the ability to work with faculty in their continued pursuit of academic excellence and a shared vision towards preeminence in research and scholarship. Candidates should have experience related to graduate education, a demonstrated record of developing and facilitating research, and senior administrative experience. Candidates should have a sophisticated knowledge of the use of technology in the delivery of education and of distance learning and/or hybrid curricula.
Located on a beautiful 330-acre campus in Fort Lauderdale, Florida, NSU has more than 28,000 students and is the sixth largest independent, not-for-profit university in the United States. NSU awards associate's, bachelor's, master's, educational specialist, doctoral, and first-professional degrees in more than 100 disciplines. It has a college of arts and sciences and schools of medicine, dentistry, pharmacy, allied health and nursing, optometry, law, computer and information sciences, psychology, education, business, oceanography, and humanities and social sciences.
Applications should be submitted online at www.nsujobs.com for position #997648
Visit our websites: www.nova.edu & http://scis.nova.edu
Nova Southeastern University is an Equal Opportunity/Affirmative Action Employer.
The Department of Computer and Information Science seeks individuals with exceptional promise for, or a proven record of, research achievement who will excel in teaching undergraduate and graduate courses and take a position of international leadership in defining their field of study. While exceptional candidates in all areas of core computer science may apply, of particular interest this year are candidates who are working on the foundations of Market and Social Systems Engineering - the formalization, analysis, optimization, and realization of systems that increasingly integrate engineering, computational, and economic systems and methods. Candidates should have a vision and interest in defining the research and educational frontiers of this rapidly growing field.
The University of Pennsylvania is an Equal Opportunity/Affirmative Action Employer. The Penn CIS Faculty is sensitive to "two-body problems" and would be pleased to assist with opportunities in the Philadelphia region.
For more detailed information regarding this position and an application link, please visit: http://www.cis.upenn.edu/departmental/facultyRecruiting.shtml

University of Pennsylvania
Lecturer

The University of Pennsylvania invites applicants for the position of Lecturer in Computer Science to start July 1, 2010. Applicants should hold a graduate degree (preferably a Ph.D.) in Computer Science or Computer Engineering, and have a strong interest in teaching with practical application. Lecturer duties include undergraduate and graduate level courses within the Master of Computer and Information Technology program (www.cis.upenn.edu/grad/mcit/). Of particular interest are applicants with expertise and/or interest in teaching computer hardware and architecture. The position is for one year and is renewable annually up to three years. Successful applicants will find Penn to be a stimulating environment conducive to professional growth in both teaching and research.
The University of Pennsylvania is an Equal Opportunity/Affirmative Action Employer. The Penn CIS Faculty is sensitive to "two-body problems" and would be pleased to assist with opportunities in the Philadelphia region.
For more detailed information regarding this position and an application link, please visit: http://www.cis.upenn.edu/departmental/facultyRecruiting.shtml

University of San Francisco
Tenure-track Position

The Department of Computer Science at the University of San Francisco invites applications for a tenure-track position beginning in August 2010. While special interest will be given to candidates in bioinformatics, game engineering, systems, and networking, qualified applicants from all areas of Computer Science are encouraged to apply. For full consideration, applications should be submitted by December 1, 2009.
More details, including how to apply, can be found here: http://www.cs.usfca.edu/job

University of Tartu
Senior Research Fellow

The successful candidate will lead an industry-driven research project in the area of agile software development environments. The position is for 4 years with a monthly salary of 2500-4000 euro. For details see: http://tinyurl.com/pnn5qn

The University of Washington Bothell
Assistant Professor — Software Engineering

The University of Washington Bothell Computing and Software Systems Program invites applications for a tenure-track position in Software Engineering to begin fall 2010. Areas of research and teaching interest include: Requirements Engineering, Quality Assurance, Testing Methodologies, Software Development Processes, Software Design Methodologies, Software Project Management, and Collaborative and Team Development.
The Bothell campus of the University of Washington was founded in 1990 as an innovative, interdisciplinary campus within the University of Washington system – one of the premier institutions of higher education in the US. Faculty members have full access to the resources of a major research university, with the culture and close relationships with students of a small liberal arts college.
For additional information, including application procedures, please see our website at http://www.uwb.edu/CSS/. All University faculty engage in teaching, research, and service. The University of Washington, Bothell is an affirmative action, equal opportunity employer.

US Air Force Academy
Visiting Professor of Computer Science

The United States Air Force Academy Department of Computer Science is accepting applications for a Visiting Professor position for the 2010-11 academic year. Interested candidates should contact the Search Committee Chair, Dr. Barry Fagin, at [email protected] or 719-333-7377. Detailed information is available at http://www.usafa.edu/df/dfcs/visiting/index.cfm.

Wake Forest University
Faculty Position in Computational Biophysics

Wake Forest University invites applications for a tenure-track faculty position at the level of Assistant Professor with a joint appointment in the Departments of Computer Science and Physics, to begin in the fall semester of 2010. Applicants should have completed a PhD in an appropriate field by the time of appointment. Wake Forest University is a highly ranked, private university with about 4500 undergraduates, 750 graduate students, and 1700 students in the professional schools of medicine, law, divinity and business. The Physics Department has a major concentration in biophysics, with approximately one third of the departmental faculty working in that field. Several computer science faculty are actively engaged in scientific computing, computational systems biology, biological modeling, and bioinformatics. Interdisciplinary research is highly valued and encouraged by the departments and the University.
The successful candidate will have a strong research record in computational biophysics, and should also have demonstrated ability to teach courses relating to topics in physics, biophysics, or computer science. The successful candidate will be expected to teach in both departments at the undergraduate and graduate levels. Excellence in research, teaching, and obtaining external funding will be expected.
Applicants should send a copy of their CV, statements regarding their research interests and teaching philosophy, and the names of three references to the Computational Biophysics Search Committee, Box 7507, Wake Forest University, Winston-Salem, NC 27109-7507. Application materials can be sent electronically in the form of a single PDF document to [email protected]. Review of applications will begin November 1, 2009 and will continue through January 15, 2010. More information is available at http://www.wfu.edu/csphy/recruiting/. Wake Forest University is an equal opportunity/affirmative action employer.

Williams College
Visiting Faculty Position

The Department of Computer Science at Williams College invites applications for an anticipated one-semester, half-time visiting faculty position in the spring of 2010.
We are particularly interested in candidates who can teach an undergraduate course in artificial intelligence or a related field. Candidates should either hold or be pursuing a Ph.D. in computer science or a closely related discipline. This position might be particularly attractive to candidates who are pursuing an advanced degree and seek the opportunity to incorporate additional classroom experience into their professional preparation.
Applications in the form of a vita, a teaching statement, and three letters of reference, at least one of which speaks to the candidate's promise as a teacher, may be sent to:
Professor Thomas Murtagh, Chair
Department of Computer Science
TCL, 47 Lab Campus Drive
Williams College
Williamstown, MA 01267
Electronic mail may be sent to [email protected]. Applications should be submitted by October 15, 2009, and will be considered until the position is filled.
The Department of Computer Science consists of eight faculty members supporting a thriving undergraduate computer science major in a congenial working environment with small classes, excellent students, and state-of-the-art facilities. Williams College is a highly selective, coeducational, liberal arts college of 2100 students located in the scenic Berkshires of Western Massachusetts. Beyond meeting fully its legal obligations for non-discrimination, Williams College is committed to building a diverse and inclusive community where members from all backgrounds can live, learn, and thrive.
Q&A
The Networker
Jon Kleinberg talks about algorithms, information flow, and the connections between Web search and social networks.

A PROFESSOR OF computer science at Cornell University, Jon Kleinberg has been studying how people maintain social connections—both on- and offline—for more than a decade. Kleinberg has received numerous honors and awards, most recently the 2008 ACM-Infosys Foundation Award in Computing Sciences.

[CONTINUED FROM P. 112] …you could carry out manually. Suppose I were trying to buy a new laptop, for example. I'd find lots of people blogging, writing product reviews, and so on. And I'd see certain things be… …force each other.

Where did your work go from there?
Once I had created the algorithm, I realized there's something very basic…

…Milgram's …in the 1960s established that we're all linked by short paths—the proverbial "six degrees of separation." The thing that intrigued me was how creative… [CONTINUED ON P. 111]