Communications201608 DL PDF
COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 08/2016 VOL.59 NO.08
Computational
Biology in the
21st Century
Scaling with
Compressive
Algorithms
Onward! Radical new ideas and visions related to programming and software
SPLASH-I World class speakers on current topics in software, systems, and languages research
SPLASH-E Researchers and educators share educational results, ideas, and challenges
Biermann
SPLASH General Chair: Eelco Visser SLE General Chair: Tijs van der Storm
OOPSLA Papers: Yannis Smaragdakis SLE Papers: Emilie Balland, Daniel Varro
OOPSLA Artifacts: Michael Bond, Michael Hind GPCE General Chair: Bernd Fischer
Onward! Papers: Emerson Murphy-Hill GPCE Papers: Ina Schaefer
Onward! Essays: Crista Lopes Mövenpick
Student Research Competition: Sam Guyer, Patrick Lam
SPLASH-I: Eelco Visser, Tijs van der Storm Posters: Jeff Huang, Sebastian Erdweg
SPLASH-E: Matthias Hauswirth, Steve Blackburn Publications: Alex Potanin Amsterdam
DLS: Roberto Ierusalimschy Publicity and Web: Tijs van der Storm, Ron Garcia
Workshops: Jan Rellermeyer, Craig Anslow Student Volunteers: Daco Harkes
5 Editor’s Letter
From the New ACM President
By Vicki L. Hanson

7 Cerf’s Up
Star Struck in Lindau
By Vinton G. Cerf

10 BLOG@CACM
Inside the Great Wall
During a trip to China, Jason Hong watches for signs of new technologies.

12 Reinforcement Renaissance
The power of deep neural networks has sparked renewed interest in reinforcement learning, with applications to games, robotics, and beyond.
By Marina Krakovsky

22 Privacy and Security
Computer Security Is Broken: Can Better Hardware Help Fix It?
Computer security problems have far exceeded the limits of the human brain. What can we do about it?
By Paul Kocher

25 Calendar

30 Viewpoint
Teamwork in Computing Research
Considering the benefits and downsides of collaborative research.
By Ben Shneiderman

32 Debugging Distributed Systems
ShiViz is a new distributed system debugging visualization tool.
By Ivan Beschastnikh, Patty Wang, Yuriy Brun, and Michael D. Ernst

38 The Singular Success of SQL
SQL has a brilliant future as a major figure in the pantheon of data representations.
By Pat Helland

42 The Hidden Dividends of Microservices
Microservices aren’t for every company, and the journey isn’t easy.
By Tom Killalea

Articles’ development led by queue.acm.org

46 Smart Cities: Concepts, Architectures, Research Opportunities
The aim is to improve cities’ management of natural and municipal resources and in turn the quality of life of their citizens.
By Rida Khatoun and Sherali Zeadally

58 Adaptive Computation: The Multidisciplinary Legacy of John H. Holland
John H. Holland’s general theories of adaptive processes apply across biological, cognitive, social, and computational systems.
By Stephanie Forrest and Melanie Mitchell

64 Skills for Success at Different Stages of an IT Professional’s Career
The skills and knowledge that earn promotions are not always enough to ensure success in the new position.
By Leon Kappelman, Mary C. Jones, Vess Johnson, Ephraim R. Mclean, and Kittipong Boonme

72 Computational Biology in the 21st Century: Scaling with Compressive Algorithms
Algorithmic advances take advantage of the structure of massive biological data landscape.
By Bonnie Berger, Noah M. Daniels, and Y. William Yu

Research Highlights

82 Technical Perspective
Toward Reliable Programming for Unreliable Hardware
By Todd Millstein

83 Verifying Quantitative Reliability for Programs that Execute on Unreliable Hardware
By Michael Carbin, Sasa Misailovic, and Martin C. Rinard

92 Technical Perspective
Why Didn’t I Think of That?
By Philip Wadler

93 Ur/Web: A Simple Model for Programming the Web
By Adam Chlipala
Watch the author discuss his work in this exclusive Communications video. http://cacm.acm.org/videos/ur-web

101 Careers

Last Byte

About the Cover: Researchers today are able to amass unprecedented amounts of biological data,

PHOTO BY MARYNA PLESHKUN
IMAGES FROM SHUTTERSTOCK.COM
Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
ACM, the world’s largest educational STA F F EDITORIAL BOARD ACM Copyright Notice
and scientific computing society, delivers DIRECTOR OF GROUP PU BLIS HING E DITOR- IN- C HIE F Copyright © 2016 by Association for
resources that advance computing as a Scott E. Delman Moshe Y. Vardi Computing Machinery, Inc. (ACM).
science and profession. ACM provides the [email protected] [email protected] Permission to make digital or hard copies
computing field’s premier Digital Library of part or all of this work for personal
and serves its members and the computing Executive Editor NE W S or classroom use is granted without
profession with leading-edge publications, Diane Crawford Co-Chairs fee provided that copies are not made
conferences, and career resources. Managing Editor William Pulleyblank and Marc Snir or distributed for profit or commercial
Thomas E. Lambert Board Members advantage and that copies bear this
Executive Director and CEO Senior Editor Mei Kobayashi; Michael Mitzenmacher; notice and full citation on the first
Bobby Schnabel Andrew Rosenbloom Rajeev Rastogi page. Copyright for components of this
Deputy Executive Director and COO Senior Editor/News work owned by others than ACM must
VIE W P OINTS
Patricia Ryan Larry Fisher be honored. Abstracting with credit is
Director, Office of Information Systems Co-Chairs permitted. To copy otherwise, to republish,
Web Editor Tim Finin; Susanne E. Hambrusch;
Wayne Graves David Roman to post on servers, or to redistribute to
Director, Office of Financial Services John Leslie King lists, requires prior specific permission
Rights and Permissions Board Members
Darren Ramdin Deborah Cotton and/or fee. Request permission to publish
Director, Office of SIG Services William Aspray; Stefan Bechtold; from [email protected] or fax
Donna Cappo Michael L. Best; Judith Bishop; (212) 869-0481.
Art Director Stuart I. Feldman; Peter Freeman;
Director, Office of Publications
Andrij Borys Mark Guzdial; Rachelle Hollander;
Bernard Rous For other copying of articles that carry a
Associate Art Director Richard Ladner; Carl Landwehr;
Director, Office of Group Publishing code at the bottom of the first or last page
Margaret Gray Carlos Jose Pereira de Lucena;
Scott E. Delman or screen display, copying is permitted
Assistant Art Director Beng Chin Ooi; Loren Terveen; provided that the per-copy fee indicated
Mia Angelica Balaquiot Marshall Van Alstyne; Jeannette Wing
ACM CO U N C I L in the code is paid through the Copyright
Designer
President Clearance Center; www.copyright.com.
Iwona Usakiewicz
Alexander L. Wolf Production Manager P R AC TIC E
Vice-President Subscriptions
Lynn D’Addesio Co-Chair
Vicki L. Hanson An annual subscription cost is included
Director of Media Sales Stephen Bourne
Secretary/Treasurer in ACM member dues of $99 ($40 of
Jennifer Ruzicka Board Members
Erik Altman which is allocated to a subscription to
Publications Assistant Eric Allman; Peter Bailis; Terry Coatta;
Past President Communications); for students, cost
Juliet Chance Stuart Feldman; Benjamin Fried;
Vinton G. Cerf is included in $42 dues ($20 of which
Pat Hanrahan; Tom Killalea; Tom Limoncelli; is allocated to a Communications
Chair, SGB Board Columnists Kate Matsudaira; Marshall Kirk McKusick;
Patrick Madden subscription). A nonmember annual
David Anderson; Phillip G. Armour; George Neville-Neil; Theo Schlossnagle; subscription is $269.
Co-Chairs, Publications Board Michael Cusumano; Peter J. Denning; Jim Waldo
Jack Davidson and Joseph Konstan Mark Guzdial; Thomas Haigh;
Members-at-Large The Practice section of the CACM ACM Media Advertising Policy
Leah Hoffmann; Mari Sako; Communications of the ACM and other
Eric Allman; Ricardo Baeza-Yates; Editorial Board also serves as
Pamela Samuelson; Marshall Van Alstyne ACM Media publications accept advertising
Cherri Pancake; Radia Perlman; the Editorial Board of .
Mary Lou Soffa; Eugene Spafford; in both print and electronic formats. All
CO N TAC T P O IN TS C ONTR IB U TE D A RTIC LES advertising in ACM Media publications is
Per Stenström Copyright permission
SGB Council Representatives Co-Chairs at the discretion of ACM and is intended
[email protected] Andrew Chien and James Larus to provide financial support for the various
Paul Beame; Jenna Neefe Matthews; Calendar items
Barbara Boucher Owens Board Members activities and services for ACM members.
[email protected] William Aiello; Robert Austin; Elisa Bertino; Current advertising rates can be found
Change of address Gilles Brassard; Kim Bruce; Alan Bundy; by visiting http://www.acm-media.org or
BOARD C HA I R S [email protected] Peter Buneman; Peter Druschel; Carlo Ghezzi; by contacting ACM Media Sales at
Education Board Letters to the Editor Carl Gutwin; Yannis Ioannidis; (212) 626-0686.
Mehran Sahami and Jane Chu Prey [email protected] Gal A. Kaminka; James Larus; Igor Markov;
Practitioners Board Gail C. Murphy; Bernhard Nebel; Single Copies
George Neville-Neil W E B S IT E
http://cacm.acm.org Lionel M. Ni; Kenton O’Hara; Sriram Rajamani; Single copies of Communications of the
Marie-Christine Rousset; Avi Rubin; ACM are available for purchase. Please
REGIONA L C O U N C I L C HA I R S AU T H O R G U ID E L IN ES Krishan Sabnani; Ron Shamir; Yoav contact [email protected].
ACM Europe Council http://cacm.acm.org/ Shoham; Larry Snyder; Michael Vitale;
Dame Professor Wendy Hall Wolfgang Wahlster; Hannes Werthner; COMMUN ICATION S OF THE ACM
ACM India Council ACM ADVERTISIN G DEPARTM E NT Reinhard Wilhelm (ISSN 0001-0782) is published monthly
Srinivas Padmanabhuni 2 Penn Plaza, Suite 701, New York, NY by ACM Media, 2 Penn Plaza, Suite 701,
ACM China Council 10121-0701 RES E A R C H HIGHLIGHTS New York, NY 10121-0701. Periodicals
Jiaguang Sun T (212) 626-0686 Co-Chairs postage paid at New York, NY 10001,
F (212) 869-0481 Azer Bestovros and Gregory Morrisett and other mailing offices.
PUB LICATI O N S BOA R D
Co-Chairs Board Members
Director of Media Sales Martin Abadi; Amr El Abbadi; Sanjeev Arora; POSTMASTER
Jack Davidson; Joseph Konstan Jennifer Ruzicka Please send address changes to
Board Members Nina Balcan; Dan Boneh; Andrei Broder;
[email protected] Doug Burger; Stuart K. Card; Jeff Chase; Communications of the ACM
Ronald F. Boisvert; Anne Condon; 2 Penn Plaza, Suite 701
Nikil Dutt; Roch Guerrin; Carol Hutchins; For display, corporate/brand advertising: Jon Crowcroft; Sandhya Dwaekadas;
Matt Dwyer; Alon Halevy; Norm Jouppi; New York, NY 10121-0701 USA
Yannis Ioannidis; Catherine McGeoch; Craig Pitcher
M. Tamer Ozsu; Mary Lou Soffa; Alex Wade; [email protected] T (408) 778-0300 Andrew B. Kahng; Sven Koenig; Xavier Leroy;
Keith Webster William Sleight Steve Marschner; Kobbi Nissim;
[email protected] T (408) 513-3408 Steve Seitz; Guy Steele, Jr.; David Wagner; Printed in the U.S.A.
ACM U.S. Public Policy Office Margaret H. Wright; Andreas Zeller
Renee Dopplick, Director Media Kit [email protected]
1828 L Street, N.W., Suite 800 WEB
Washington, DC 20036 USA Association for Computing Machinery Chair
T (202) 659-9711; F (202) 667-1066 (ACM) James Landay
2 Penn Plaza, Suite 701 Board Members
Computer Science Teachers Association New York, NY 10121-0701 USA Marti Hearst; Jason I. Hong;
Mark R. Nelson, Executive Director T (212) 869-7440; F (212) 869-0481 Jeff Johnson; Wendy E. MacKay
Join ACM-W: ACM-W supports, celebrates, and advocates internationally for the full engagement of women in
all aspects of the computing field. Available at no additional cost.
cerf’s up
DOI:10.1145/2963167
HIGHLY CLASSIFIED INFORMATION on personal computers in the U.S. today is largely protected from the attacks Daniel Genkin et al. described in their article “Physical Key Extraction Attacks on PCs” (June 2016). Here, I outline cost-effective defenses that will in the future completely defeat such attacks, while making even stronger cyberattacks extremely difficult.

For example, tiny Faraday cages can be constructed in a processor package so encryption/decryption can be performed without the possibility of inadvertent emanations that could be measured or exploited, because all external communication to a cage would be through optical fiber and the cage’s power supply is filtered.1 This way, encryption keys and encryption/decryption processes would be completely protected against the attacks described in the article. In such a Faraday cage, advanced cryptography (such as Learning With Errors) could not be feasibly attacked through any known method, including quantum computing.

Hardware can likewise help protect software, including operating systems and applications, through RAM-processor package encryption. All traffic between a processor package and RAM can be encrypted using a Faraday cage to protect a potentially targeted app (which is technically a process) from operating systems and hypervisors, other apps, and other equipment, including baseband processors, disk controllers, and USB controllers. Even a cyberattack that compromises an entire operating system or hypervisor would permit only denial of service to its applications and not give access to any application or related data storage.

Similarly, every-word-tagged extensions of ARM and X86 processors can be used to protect each Linux kernel object and each Java object in an app from other such objects by using a tag on each word of memory that controls how memory can be used. Tagged memory can make it much more difficult for a cyberattacker to find and exploit software vulnerabilities because compromising a Linux kernel object or a Java application object does not automatically give power over any other object. Security-access operations by such objects can be strengthened through “inconsistency robustness” providing technology for valid formal inference on inconsistent information.1 Such inconsistency robust inference is important because security-access decisions are often made on the basis of conflicting and inconsistent information.

Such individual processor-package cyberdefense methods and technologies will make it possible, within, say, the next five years, to construct a highly secure board with more than 1,000 general-purpose coherent cores, greatly diminishing dependence on datacenters and thereby decreasing centralized points of vulnerability.1

These technologies promise to provide a comprehensive advantage over cyberattacks. The U.S., among all countries, has the most to lose through its current cyberdefense vulnerabilities. In light of its charter to defend the nation and its own multitudinous cyberdefense vulnerabilities, the Department of Defense is the logical agency to initiate an urgent program to develop and deploy strong cyberdefenses using these technologies and methods. Industry and other government agencies should then quickly incorporate them into their own operations.

Reference
1. Hewitt, C. Security Without IoT Mandatory Backdoors: Using Distributed Encrypted Public Recording to Catch & Prosecute Suspects. Social Science Research Network, June 16, 2016; http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2795682

Carl Hewitt, Palo Alto, CA

Authors Respond:
We agree that high-security designs can mitigate physical side channels. However, they are typically difficult to design, expensive to manufacture, reduce performance, and are unavailable in most commercial and consumer contexts. Moreover, their security depends on understanding the feasible attacks. Faraday cages, optical couplers, and power filters would not stop acoustic leakage through vents or self-amplification attacks that induce leakage at frequencies below the filter’s design specification. The problem of creating inexpensive hardware or software that adequately mitigates all feasible side-channel attacks thus remains open.

Daniel Genkin, Lev Pachmanov, Itamar Pipman, Adi Shamir, and Eran Tromer, Tel Aviv, Israel

Fewer Is Better Than More Samples When Tuning Software
The emphasis on visualizing large numbers of stack samples, as in, say, flame graphs in Brendan Gregg’s article “The Flame Graph” (June 2016) actually works against finding some performance bottlenecks, resulting in suboptimal performance of the software being tuned. Any such visualization must necessarily discard information, resulting in “false negatives,” or failure to identify some bottlenecks. For example, time can be wasted by lines of code that happen to be invoked in numerous places in the call tree. The call hierarchy, which is what flame graphs display, cannot draw attention to these lines of code.1 Moreover, one cannot assume the bottlenecks can be ignored; even a particular bottleneck that starts small does not stay small, on a percentage basis, after other bottlenecks have been removed. Gregg made much of 60,000 samples and how difficult they are to visualize. However, he also discussed finding and fixing a bottleneck that resulted in saving 40% of execution time. That means the fraction of samples displaying the bottleneck was at least 40%. The bottleneck would thus have been displayed, with statistical certainty, in a purely human examination of 10 or 20 random stack samples, with no need
tion. Spend time in the mind-set of the
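The sampling claim in the letter on tuning software can be checked with a short calculation. The sketch below is an illustration, not something from the letter: it assumes each stack sample is independent and shows the bottleneck with probability 0.4 (the letter's 40% figure), and the function name `prob_at_least` is hypothetical. Under those assumptions the chance of seeing the bottleneck at least once in 10 samples is 1 - 0.6^10, or about 99.4%, which is very likely but not literally certain.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n independent samples show a
    bottleneck that is on the stack a fraction p of the time
    (a plain binomial tail sum; independence is an assumption)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # 0.4 is the fraction quoted in the letter; 10 and 20 are its sample counts.
    for n in (10, 20):
        seen_once = prob_at_least(1, n, 0.4)
        seen_twice = prob_at_least(2, n, 0.4)
        print(f"n={n}: at least once {seen_once:.5f}, at least twice {seen_twice:.5f}")
```

Asking for at least two sightings (`k=2`) is a stricter test, since a single sample showing some line of code could be a coincidence; the same function covers both cases.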
DOI:10.1145/2950111 http://cacm.acm.org/blogs/blog-cacm

Jason Hong
The Emerging Technology Landscape in China
http://bit.ly/1Rtm2SR
May 25, 2016

I have been in China about two weeks so far, and have been amazed at how much the technology landscape has changed since I last spent time here in 2008. I wanted to share some of these observations because they represent some really unique and compelling uses of technology.

WeChat Mobile Social Networking
Perhaps the most obvious change is the ubiquity of WeChat, a combined mobile chat and social networking app. Pretty much every Chinese person I met used WeChat, and I saw people using it in restaurants, on the subway, waiting for the bus, pretty much everywhere. WeChat was developed by powerhouse Tencent, and its features are basically the same as any social networking site: you can add friends, post status updates that friends can see, and send messages to friends. The app feels much cleaner than Facebook’s mobile app, though, probably because WeChat started mobile first and as a result had to be very simple.

A cool feature of WeChat is the ability to send virtual red envelopes to others. Real red envelopes are given by close friends and family members on birthdays and Chinese New Year, and contain money. I remember always looking forward to getting these envelopes as a kid. WeChat’s virtual envelopes can also be used to transfer money to others, and this feature is already very popular for these special occasions. There is even a Wikipedia page about these virtual red envelopes (http://bit.ly/1r4gwAc) reporting over a billion were sent last year for Chinese New Year.

Mobile Payment Systems
Another obvious change is the widespread adoption of mobile payment systems. Many people who hosted us for dinner used their smartphones to pay, just by taking a picture of a QR code on the receipt. One person told us how you can use AliPay (from Alibaba, a major e-commerce company), WeChat, or several other apps to buy things from vending machines, again through QR codes. The fact they were compatible was really cool, since I would have expected companies to try to do proprietary systems.

Another person told us how he forgot his wallet at home one day. He almost turned around, but decided to see how well he could do just with his smartphone. It turns out he did perfectly fine, even being able to buy food from street vendors without any problems.

I found the popularity of mobile payment in China amazing, especially since I have yet to directly see anyone in the U.S. use any smartphone mobile payment apps (other than Square, which addresses a different payment space). I have only heard secondhand stories from a few students at my university about using Venmo or other payment apps.

My best explanation is relative advantage. In the U.S., credit cards are already very popular, and so mobile payment apps only offer a small advantage in this context. In contrast, in China, credit cards were not very common, and so mobile payment systems conveniently filled in the gap, allowing China to leapfrog ahead.

Taxi Hailing Systems
Didi is a very popular app for hailing taxis. It is only about four years old, and in about half the taxis I was in, the drivers had a holder for their smartphone and were actively using it. One driver even had his app in some kind of active mode, where you could see
and hear every single nearby request, and it was going off about every 20 seconds. (I should also note driving safety is a dodgy concept in China. Half the taxis I was in didn’t have safety belts, pedestrians walk anytime across the street, and motorcycles often go against traffic.)

Didi used to be two competing taxi hailing companies, but they merged a short while back. I do not know much about Chinese law, but there do not seem to be antitrust laws to prevent large mergers like this. Some people I talked to also said taxi hailing was much better before the merger, since the two companies offered heavy discounts, making it very cheap to get around, but no longer.

Didi is also really interesting when compared with Uber’s business model. Unlike Uber, which takes a cut out of the fare, Didi does not seem to; instead, people told me Didi analyzes and sells the data from riders. For example, they can determine origin and destination pairs easily, making it possible to infer demographics and interests of riders. They can then sell that data and even use it for highly targeted advertisements or offers. I was told Didi recently closed a major deal with some banks to do some offers. Apple also announced it will be investing US$1 billion into Didi, perhaps to help with Apple Pay in China, to improve their maps, and to sell other software and services.

It is also not clear to me how well Uber can do in China. Uber was reported to have lost about $1 billion in China last year because of its heavy discounts to win market share. It is not clear to me if Uber China also works with taxis, but Uber’s standard model of having everyday people act as drivers might not work as well in China, since the density of taxis in major Chinese cities is already pretty high.

Another interesting story I heard is many of these companies are suffering from fraud. For example, one friend told me how Uber was essentially paying riders to use their service, and how one person had 100 smartphones and colluded with drivers to do fake rides to cash in. In fact, a lot of people commented how all of these discounts were essentially VCs fueling the economy, since a lot of these popular apps were offering deep cuts in prices to gain market share.

Compounding things is the fact that a lot of similar companies exist. Many friends I talked with observed that if you come up with a good startup idea, it does not take very long for copycat companies to appear and offer their VC-funded discounts. One person commented how there used to be a dozen companies that would wash your car overnight (for about 20 RMB, or about $3), but now it has dropped to about two. So competition is very stiff, and it is going to be exciting to see how things play out in the next few years.

Same- or Next-Day Delivery
If you walk on a major street of any Chinese city, you will likely see a dozen motorized bikes with logos like TMall, JD.com, or even Amazon. Same- or next-day delivery is very common and convenient in China. My wife and I used these services to buy diapers and clothes, and they deliver right to your door. You can also pay on delivery, probably because of the lack of online payment infrastructure, though the mobile payment systems described earlier can also be used.

The density of cities in China is one clear reason these delivery services work. One VC explained another non-obvious reason: in the U.S., retail infrastructure is mostly saturated, meaning e-commerce offers limited relative advantage over what already exists, and often works by cannibalizing brick-and-mortar stores. In contrast, retail infrastructure in China is not as well established, meaning e-commerce has a lot more room for potential growth. These factors, along with relatively low labor and delivery costs, make same- or next-day delivery in China very effective in practice.

Other Odds and Ends
A few more observations about the tech landscape in China that did not fit in the above categories. One friend told me how the iPhone was sometimes called the “kidney” because one teen literally sold his kidney so he could buy an iPhone; he figured he had two kidneys and really wanted an iPhone. More surprisingly, I was told that for many people, the iPhone costs as much as two months of salary, but I still saw Apple’s smartphone almost everywhere.

Chinese people also have a strong affinity for numbers. One interesting quirk is that most telephone numbers in China do not have any dashes or spaces, just a string of 10 or 11 digits. One colleague explained Chinese people have a strong capacity for remembering numbers. However, I can assure readers this is not genetic, given my lack of this facility.

More interestingly, many “holidays” have been created based on special numbers. You may know how Chinese people avoid the number 4 (since it sounds like the word “death” in Chinese) and really like the number 8 (since it sounds like “good fortune”). Riffing on this idea, May 20 (5/20) has been advanced by companies as a new kind of Valentine’s Day, since if you say the individual digits it sounds a little bit like “I love you.” Yeah, it is sort of a stretch, but WeChat has taken full advantage of this by making it easy to send 520 RMB (about $75) to one’s significant other; there is no option for 5.20 or 52.00 RMB. From what I was told, 5/20 was invented just a few years ago, but it has already caught on.

November 11 is another new “holiday,” representing single people (11/11, or all 1’s). Students I talked to said it was very similar to Black Friday in the U.S., the day of frenzied shopping after Thanksgiving, because what better way is there to celebrate being single than going shopping? Apparently, Alibaba has been breaking sales records every year on this date.

Parting Thoughts
I hope I have conveyed some of the energy and excitement I have been seeing the past few weeks in China. Even though I am supposed to be on vacation, it is hard to miss all of the innovation and advances everyday people are making use of. I am really looking forward to seeing what the next decade of change will bring.

Jason Hong is an associate professor in the School of Computer Science at Carnegie Mellon University.

© 2016 ACM 0001-0782/16/08 $15.00
AUGUST 2016 | VOL. 59 | NO. 8 | COMMUNICATIONS OF THE ACM 11
news

Reinforcement Renaissance
The power of deep neural networks has sparked renewed interest in reinforcement learning, with applications to games, robotics, and beyond.

IMAGE BY MAXUSER

EACH TIME DEEPMIND has announced an amazing accomplishment in game-playing computers in recent months, people have taken notice. First, the Google-owned, London-based artificial intelligence (AI) research center wowed the world with a computer program that had taught itself to play nearly 50 different 1980s-era Atari games (from Pong and Breakout to Pac-Man, Space Invaders, Boxing, and more), using as input nothing but pixel positions and game scores, performing at or above the human level in more than half these varied games. Then, this January, DeepMind researchers impressed experts with a feat in the realm of strategy games: AlphaGo, their Go-playing program, beat the European champion in the ancient board game, which poses a much tougher AI challenge than chess. Less than two months later, AlphaGo scored an even greater victory: it won 4 games in a best-of-5 series against the best Go player in the world, surprising the champion himself.

The idea that a computer can learn to play such complex games from scratch and achieve a proficient level elicits gee-whiz reactions from the general public, and DeepMind’s triumphs have heightened academic and commercial interest in the AI field behind DeepMind’s methods: a blend of deep neural networks and reinforcement learning called “deep reinforcement learning.”

“Reinforcement learning is a model of learning where you’re not given a solution; you have to discover it by trial and error,” explains Sridhar Mahadevan, a professor at the University of Massachusetts Amherst, a longtime center of research into reinforcement learning.
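To make the trial-and-error idea in this opening concrete, here is a minimal sketch of tabular Q-learning, a temporal-difference method, on a toy problem. It is not DeepMind's system and uses no neural network. The five-cell corridor, the reward of 1 only on reaching the last cell, and every name and parameter value (`train_corridor`, `alpha`, `gamma`, `epsilon`) are assumptions made purely for illustration.

```python
import random


def train_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Learn action values for a hypothetical corridor of n_states cells.

    The agent starts in cell 0 and can step left (action 0) or right
    (action 1). The only reward is 1.0 for entering the last cell, so the
    agent gets no feedback until an episode ends.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random sometimes (and whenever values are tied);
            # otherwise repeat the action that currently looks best.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Temporal-difference target: reward now plus the discounted
            # value of the best action available afterward.
            future = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal cell."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor()))
```

No one tells the agent which moves are good. The single reward at the end of the corridor is propagated backward through the table over repeated episodes, so earlier cells gradually acquire values that favor moving toward the goal.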
The clearest contrast is with supervised learning, the kind used to train image recognition software, in which the supervision comes in the form of labeled examples (and requires people to label them). Reinforcement learning, on the other hand, “is a way of not needing labels, or labeling automatically by who’s winning or losing—by the rewards,” explains University of Alberta computer scientist Rich Sutton, a co-founder of the field of reinforcement learning and co-author of the standard textbook on the subject. In reinforcement learning, the better your moves are, the more rewards you get, “so you can learn to play the Go game by playing the moves and winning or losing, and no one has to tell you if that was a good move or a bad move because you can figure it out for yourself; it led to a win, so it was a good move.”

Sutton knows the process is not as simple as that. Even in the neat, tightly controlled world of a game, deducing which moves lead to a win is a notoriously difficult problem because of the delay between an action and its reward, a key feature of reinforcement learning. In many games, you receive no feedback at all until the end of the game, such as a 1 for a win or a –1 for a loss.

“Typically, you have to go through hundreds of actions before your score increases,” explains Pieter Abbeel, an associate professor at the University of California, Berkeley, who applies deep reinforcement learning to robotics. (For a robot, a reward comes for completing a task, such as correctly assembling two Lego pieces.) “Before you understand how the game works, and are learning through your own trial and error,” Abbeel says, “you just kind of do things, and every now and then your score goes up, and every now and then it goes down, or doesn’t go up. How do you tease apart which subset of things that you did contributed to your score going up, and which subset was just kind of a waste of time?”

This thorny question—known as the credit assignment problem—remains a major challenge in reinforcement learning. “Reinforcement learning is unique in that it’s the only machine learning field that’s focused on solving the credit assignment problem, which I think is at the core of intelligence,” says computer scientist Itamar Arel, a professor of electrical engineering and computer sciences at the University of Tennessee and CEO of Osaro, a San Francisco-based AI startup. “If something good happens now, can I think back to the last n steps and figure out what I did that led to the positive or negative outcome?”

Just as for human players, figuring out smart moves enables the program to repeat that move the next time it faces the same situation—or to try a new, possibly better move in hope of stumbling into an even higher reward. In extremely small worlds (think of a simple game like Tic-Tac-Toe, also known as Noughts and Crosses), the same exact situations come up again and again, so the learning agent can store the best action for every possible situation in a lookup table. In complex games like chess and Go, however, it is impossible to enumerate all possible situations. Even checkers has so much branching and depth that the game yields a mind-boggling number of different positions. So imagine what happens when you move from games to real-world interactions.

“You’re never going to see the same situation a second time in the real world,” says Abbeel, “so instead of a table, you need something that understands when situations are similar to situations you’ve seen before.” This is where deep learning comes in, because understanding similarity—being able to extract general features from many specific examples—is the great strength of deep neural networks. These networks, whose multiple layers learn relevant features at increasingly higher levels of abstraction, are currently the best available way to measure similarities between situations, Abbeel explains.

The two types of learning—reinforcement learning and deep learning through deep neural networks—complement each other beautifully, says Sutton. “Deep learning is the greatest thing since sliced bread, but it quickly becomes limited by the data,” he explains. “If we can use reinforcement learning to automatically generate data, even if the data is more weakly labeled than having humans go in and label everything, there can be much more of it because we can generate it automatically, so these two together really fit well.”

Despite the buzz around DeepMind, combining reinforcement learning with neural networks is not new. TD-Gammon, a backgammon-playing program developed by IBM’s Gerald Tesauro in 1992, was a neural network that learned to play backgammon through reinforcement learning (the TD in the name stands for Temporal-Difference learning, still a dominant algorithm in reinforcement learning). “Back then, computers were 10,000 times slower per dollar, which meant you couldn’t have very deep networks because those are harder to train,” says Jürgen Schmidhuber, a professor of artificial intelligence at the University of Lugano in Switzerland who is known for seminal contributions to both neural networks and reinforcement learning. “Deep reinforcement learning is just a buzzword for traditional reinforcement learning combined with deeper neural networks,” he says.

Schmidhuber also notes the technique’s successes, though impressive, have so far been in narrow domains, in which the current input (such as the board position in Go or the current screen in Atari) tells you everything you need to know to guide your next move. However, this “Markov property” does not normally hold outside of the world of games. “In the real world, you see just a tiny fraction of the world through your sensors,” Schmidhuber points out, speaking of both robots and humans. As humans, we complement our limited perceptions through selective memories of past observations; we also draw on decades of experience to combine existing knowledge and skills to solve new problems. “Our current reinforcement learners can do this in principle, but humans still do it much better,” he says.

Researchers continue to push reinforcement learners’ capabilities, and are already finding practical applications.

“Part of intelligence is knowing what to remember,” says University of Michigan reinforcement learning expert Satinder Singh, who has used the world-building game Minecraft to test how machines can choose which details in their environment to look at, and how they can use those stored memories to behave better.

Singh and two colleagues recently co-founded Cogitai, a software company that aims to use deep reinforcement learning to “build machines that can learn from experience the way humans can learn from experience,” Singh says. For example, devices like thermostats and refrigerators that are connected through the Internet of Things could continually get smarter and smarter by learning not only from their own past experiences, but also from the aggregate experiences of other connected devices, and as a result become increasingly better at taking the right action at the right time.

Osaro, Arel’s software startup, also uses deep reinforcement learning, but promises to eliminate the costly initial part of the learning curve. For example, a computer learning to play the Atari game Pong from scratch starts out completely clueless, and therefore requires tens of thousands of plays to become proficient—whereas humans’ experience with the physics of balls bouncing off walls and paddles makes Pong intuitive even to children.

“Deep reinforcement learning is a promising framework, but applying it from scratch is a bit problematic for real-world problems,” Arel says. A factory assembling smartphones, for example, requires its robotic manufacturing equipment to get up to speed on a new design within days, not months. Osaro’s solution is to show the learning agent what good performance looks like “so it gets a starting point far better than cluelessness,” enabling the agent to rapidly improve its performance.

Even modest amounts of demonstration, Arel says, “give the agent a head start to make it practical to apply these ideas to robotics and other domains where acquiring experience is prohibitively expensive.”

Further Reading

Mnih, V., et al.
Human-level control through deep reinforcement learning. Nature (2015), vol. 518, pp. 529–533.

Silver, D., et al.
Mastering the game of Go with deep neural networks and tree search. Nature (2016), vol. 529, pp. 484–489.

Tesauro, G.
Temporal difference learning and TD-Gammon. Communications of the ACM (1995), vol. 38, issue 3, pp. 58–68.

Sutton, R.S., and Barto, A.G.
Reinforcement Learning: An Introduction. MIT Press, 1998, https://mitpress.mit.edu/books/reinforcement-learning

Based in San Francisco, Marina Krakovsky is the author of The Middleman Economy: How Brokers, Agents, Dealers, and Everyday Matchmakers Create Value and Profit (Palgrave Macmillan, 2015).

© 2016 ACM 0001-0782/16/08 $15.00
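
The lookup-table idea described in the reinforcement learning article above (store a value for every situation, and learn only from a reward that arrives at the end of the game) can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than anything the researchers quoted describe: the seven-cell corridor game, the function names `step`, `choose`, `train`, and `greedy_policy`, and the values of `alpha`, `gamma`, and `epsilon` are all hypothetical. The update rule is tabular Q-learning, one member of the temporal-difference family; it is not TD-Gammon's network or DeepMind's system.

```python
import random

N_STATES = 7            # corridor positions 0..6; both ends finish the game
START = 3               # every game starts in the middle
LEFT, RIGHT = 0, 1      # action indices into each table row


def step(state, action):
    """Hypothetical toy game: move one cell; feedback arrives only at the end."""
    nxt = state + (1 if action == RIGHT else -1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True      # win
    if nxt == 0:
        return nxt, -1.0, True     # loss
    return nxt, 0.0, False         # no feedback yet


def choose(row, rng, epsilon):
    """Mostly repeat the best-known move; sometimes (or on ties) try another."""
    if rng.random() < epsilon or row[LEFT] == row[RIGHT]:
        return rng.randrange(2)
    return LEFT if row[LEFT] > row[RIGHT] else RIGHT


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The lookup table: one learned value per (position, action) pair.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = START, False
        while not done:
            action = choose(q[state], rng, epsilon)
            nxt, reward, done = step(state, action)
            # Temporal-difference target: reward now plus the discounted
            # best current estimate for the position we landed in.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best-known move for each non-terminal position 1..5."""
    return ["right" if q[s][RIGHT] > q[s][LEFT] else "left"
            for s in range(1, N_STATES - 1)]


if __name__ == "__main__":
    table = train()
    print(greedy_policy(table))
```

With these settings the learned policy should favor "right" in every position, even though only the final move is ever rewarded directly: each update passes a little credit one step back along the path, which is one simple answer to the credit assignment question. A table like this stops being practical once positions can no longer be enumerated, which is where the article's researchers replace it with a neural network.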
Milestones
In the spring of 1991, a 21-year-old Finnish student named Linus Torvalds sat down to write code that would ultimately revolutionize the world of software development. In a Usenet newsgroup post late that summer, he told the world about his work: “I’m doing a (free) operating system (just a hobby, won’t be big and professional like gnu) for 386(486) AT clones,” he wrote. “This has been brewing since April and is starting to get ready. I’d like to know what features most people would want. Any suggestions are welcome, but I won’t promise I’ll implement them :-).”

Indeed, users of his Linux operating system have wanted a lot of features over the past quarter-century, and Torvalds has not had to add them himself. Linux today has more than 18 million lines of source code and some 12,000 participating developers. There are tens of millions of Linux users worldwide, from owners of Android smartphones to corporate data center managers to scientists at supercomputer centers.

It is a remarkable success story by almost any measure, and much of that success owes to a model of software development called “open source.” There are slightly different definitions of the idea, but essentially open source refers to software that is publicly available as source code and may be freely used, modified, and redistributed, without charge for the license (but sometimes with a charge for the service of distributing the software).

History

Software offered to the public in open source format is not a new idea. Beginning in the 1950s, a user group called SHARE, working with IBM, published applications and utilities as source code for use on IBM mainframes. In the 1970s, AT&T Bell Labs licensed Unix source code to government and academic researchers pretty much without restriction.

The modern open source software (OSS) movement got its start with Richard Stallman, a staunch “free software” advocate, who wrote an open source text editor called Emacs in 1976. In 1983, he launched the GNU Project to develop a free Unix-like operating system and related utilities. At the same time he also founded the non-profit Free Software Foundation to promote wide collaboration in the development, distribution, and modification of free software, including GNU Project software such as GNU Emacs and the GCC C compiler.

The creation and use of OSS grew steadily after 1983, but users and developers, especially large corporations, often have looked at it askance. “Who would trust a mission-critical application to software written by a bunch of long-haired anarchists?” users asked. “Why would you give away software created at great expense to competitors?” developers wanted to know. William Scherlis, a computer science professor at Carnegie Mellon University and director of the university’s Institute for Software Research, says these objections are no longer valid, if they ever were.

Scherlis says the view of the open source movement as a combination of anarchy and demagoguery is a myth. “It looks like hundreds or thousands of people contributing from around the world, but generally it’s a very small core group that has intellectual ownership of the code base,” he says. “There’s usually a hierarchical structure so that control is maintained, and the major successful projects, like Apache and Eclipse, have elaborate ownership and governance structures: the Apache Foundation, the Eclipse Foundation, and so on.”

Allison Randal, president of the Open Source Initiative, which advocates for OSS and maintains a list of industry-standard OSS licenses, says the open source community essentially declared victory in 2010, by which time she says the tide of opinion had flowed overwhelmingly from proprietary software to OSS. She cites a recent survey of 1,300 IT professionals by Black Duck Software that showed the percentage of companies running part or all of their operations on OSS had almost doubled between 2010 and 2015, from 42% to 78%. The number reporting they contribute to open source projects rose from 50% in 2014 to 64% last year, she adds.

Why Do It?

“It comes down to economic necessity,” says Randal, who is also a development manager at Hewlett Packard Enterprise. “If nobody else was using open source, you could ignore it, but if others use it, they are getting something free and you are not, and they have an advantage. You can’t be a startup in Silicon Valley and not use it.”

There are also reasons why large companies like HP increasingly use OSS, Randal says. “They get bogged down by their massive patent portfolios. OSS is a good way for them to innovate because they can pool their patents.” For example, she says, the 500 companies that participate in the OpenStack project for cloud computing, including AT&T, IBM, and Intel, agree to license their patents to the neutral, non-profit OpenStack Foundation and thereby to all OpenStack users and contributors. “The companies have agreed not to attack each other with patent disputes around their collaborative work,” Randal says. “It’s a safe space for all of them to work in.”

Last year, Google surprised some observers by releasing for general use the source code for its TensorFlow software, a set of tools for developing deep learning applications, including neural networks. It is the AI engine behind various Google apps—such as Google Photos, which can identify objects in pictures it has never seen before—but the code previously was off limits to external parties wanting to develop such apps.

Work on a predecessor system for deep learning, called DistBelief, began four years ago and resulted in a development and production tool for Google, but it was not flexible enough for others to adapt to their purposes, says Google senior fellow Jeffrey Dean. “If you wanted to do a more exotic neural network, some of those were hard,” he says. TensorFlow was developed from the start to be open sourced and was written to minimize its dependencies on other internal Google tools and libraries.

Dean says Google traditionally has published the ideas behind its technologies in journals. By open sourcing TensorFlow, Google has gone further by making it easier for others to try Google’s ideas and code in their own software. That will enable users to try different machine-learning techniques, spawning advances that may help Google in return. “We hope that a whole community will spring up around this, and we will get a wide variety of contributors, from students and hobbyists to large companies,” Dean says.

For years, Microsoft lagged behind many other developers in its embrace of OSS. In a speech in 2001, Microsoft senior vice president Craig Mundie said, “The OSS development model leads to a strong possibility of unhealthy ‘forking’ of a code base, resulting in the development of multiple incompatible versions of programs, weakened interoperability, product instability, and hindering businesses’ ability to strategically plan for the future ... It has inherent security risks and can force intellectual property into the public domain ... [OSS] isn’t successful in building a mass market and making powerful, easy-to-use software broadly accessible to consumers.”

Yet in 2004, Microsoft dipped its giant toe into the OSS waters with the release of the open source Windows Installer XML Toolset (WiX). In 2005, Microsoft open sourced the F# programming language and, soon after, a number of other things. Last year, it released the open source development framework and runtime system .NET Core, a free implementation of its .NET Framework, for Windows, Linux, and Mac OS X.

Microsoft now participates in more than 2,000 open source projects, says Anders Hejlsberg, a technical fellow and a lead developer of the open source tools C# and TypeScript. “New projects today are open source by default, unless there are good reasons why they shouldn’t be,” he says. “That’s a complete switch from the proprietary mind-set of earlier days.”

Microsoft is collaborating with Google on the development of Google’s next version of Angular, the popular Web user-interface framework. The open source project will combine features of Angular with features from Microsoft’s TypeScript, a superset of JavaScript. Says Hejlsberg, “Previously it was, ‘Those are our competitors; we can’t work with them.’ But now it’s, ‘Oh, gosh, they are trying to solve problems we’ve already solved, and vice versa. We should work together for the benefit of both companies and the community at large.’”

While altruism toward the external development community sometimes plays a part, the open sourcing of .NET Core was mostly a financial decision, says Randal of the Open Source Initiative. “.NET is pretty old, and they hit a point where they realized they’d get more value out of releasing it as open source, getting a lot of eyes on the code, and getting contributions back.”

Carnegie Mellon’s Scherlis says recent open source projects have shown an increased focus on software assurance. “With OSS we have a chance to do better at providing users with not just code, but evidence of quality.” That might take the form of test cases, performance evaluations, code analyses, or inspection reports, he notes.
Downsides

Scherlis cautions not to get swept away with open source euphoria; major efforts like Apache may have tight governance, but some projects do not. He points to the devastating Heartbleed security bug discovered in the OpenSSL library in 2014 as an example; the bug left an estimated 500,000 trusted computers vulnerable to breaches of cryptographic security. “OpenSSL wasn’t a well-funded consortium, it was just a small group doing it,” Scherlis says. “But it was so good and so essential, everybody used it.”

Scherlis says users are dreaming when they think, “Many eyes have cast their gaze upon this code, and so it is good.” He explains, “It’s possibly true for shallow bugs, but not so much for the deep bugs that vex all projects—the global quality properties of the system, architectural flaws, concurrency problems, deep security problems, timing and performance problems, and so on.”

Finally, Scherlis warns OSS is not really “free.” In reality, most users will pay someone to adapt the software to work in their data centers and will incur internal support and maintenance costs for it. If the software is mission critical, the company will want to devote staff to the external open source project to ensure its needs are met over time.

OSS has become big business since Stallman started the Free Software Foundation. The company GitHub has become the go-to place for developers and users of open software, from large companies like Apple, Google, and Microsoft to thousands of startups. According to GitHub’s Brandon Keepers, the company hosts 31 million open source projects used by 12 million developers.

Keepers, GitHub’s head of open source software, commends Apple for the way it released in 2014 its open source programming language Swift. “The way they did their launch was one of the most impressive we’ve seen,” Keepers says. “They invited the community into the process.”

That is the wave of the future as developers take open source more and more seriously, Keepers predicts. “We are seeing companies treating open source launches like product launches. They want to make a big splash, but they want to make sure there is support for the project after the launch. So you have not just coders, you have community managers, marketing teams, and product managers looking at what is the experience of users coming to the project.”

Further Reading

Charny, B.
Microsoft Raps Open-Source Approach, CNET News, May 3, 2001
http://www.cnet.com/news/microsoft-raps-open-source-approach/

Google
Interviews with Google’s Angular team about their use of Microsoft’s open source TypeScript
https://www.youtube.com/watch?v=hvYnjJc88OI

Kon, F. and Souza, B.
The Open-Source Ecosystem, Open Source Initiative, 2012
http://flosscc.org/spread-the-word [video]

Meeker, H.
Open (Source) for Business: A Practical Guide to Open-Source Software Licensing, CreateSpace Independent Publishing Platform, April 6, 2015
http://www.amazon.com/exec/obidos/ASIN/1511617772/flatwave-20

Stallman, R.
Free Software, Free Society: Selected Essays, 3rd ed., Free Software Foundation, Oct. 2015
http://shop.fsf.org/product/free-software-free-society-3-paperback/

Gary Anthes is a technology writer and editor based in Arlington, VA.

© 2016 ACM 0001-0782/16/08 $15.00

ACM Member News

MAKING SOFTWARE SAFER, RATHER THAN FASTER

Michael Franz, a professor in the computer science department at the University of California, Irvine, is co-inventor of the trace compilation technology that eventually became the JavaScript engine in Mozilla’s Firefox browser. He has spent much of his career making software go faster.

Franz says his “original background” is in programming languages and compilers. He recalls reading a book on data structures and algorithms by Niklaus Wirth, the designer of Pascal and recipient of the 1984 ACM A.M. Turing Award, then writing him a letter saying, “this would be interesting” and asking “what can I do to study under you?”

He received a response from the university registrar’s office, Franz recalls, “saying they had taken the liberty to enroll me in their series of entrance examinations, and asked could I be there in two weeks. I passed, and that is how I got the opportunity to study computer science at ETH Zurich, the Swiss Federal Institute of Technology.”

Today, Franz sees computer security as a more immediate problem. As a result, he has been helping to pioneer artificial diversity. “The idea behind artificial diversity is to generate multiple versions of a program, so an attacker doesn’t know which version is running, making it far more difficult to exploit.”

During the last few years, Franz says he has been focused on making things safer, rather than faster, although not at the expense of performance. “We invent a security feature, which creates a drag, and then we put on our other hat and optimize the drag away again. So in the end, we have a new security feature which is almost performance-neutral.”

—John Delaney
Smartphone Apps
for Social Good
Mobile apps make it easier, faster, and cheaper to create massive impact
on social causes ranging from world hunger to domestic violence.
The Internet is chock-full of gripes about Millennials, the smartphone-obsessed generation that reached young adulthood at the turn of the century. Millennials are entitled, lazy, self-—and selfie-—absorbed, and uninterested in the world at large. They vex and puzzle employers in equal measure, and they cannot be counted on to do anything other than, well, whatever they feel like doing.

Tell that to a new generation of app makers who are busy building programs that make it easy and fun to do massive good around the world. Their apps feed the hungry, clothe the naked, and shelter the homeless, all with a tap of that little screen typically reserved for Angry Birds or Amazon purchases.

The first wave of smartphone apps—the Instagrams, Foursquares, and Snapchats of the world—prided themselves on being social. This new generation of apps prides itself on being socially good, and they are being adopted most frequently by Millennials.

Apps that do social good range from ethical marketplaces like Orange Harp, which “makes the world more socially conscious and sustainable by providing people access to amazing products and behind-the-scenes details about how they are made,” to Feedie, which donates a meal to the non-profit Lunchbox Fund each time a user shares pictures of his or her own food at participating restaurants.

Far from wasting their time ordering stuff, broadcasting their breakfast plans, or gaming with friends, Millennials are using apps like these that do social good to change the world, because they are conditioned to do so, says Steve de Brun, co-creator of MicroHero, a free app that allows users to earn money for charity by taking online surveys.

“Smartphone users are ordering meals, cars, making appointments, and conducting more and more aspects of their personal and work lives from their devices,” de Brun says. “Why wouldn’t they also be able to donate, give back, or effect social change from their phones as well?”

To all the Millennial naysayers out there, it might be time to revise your criticisms: the Millennial generation is connected, conscientious, and ready to combat social ills.

Charity Meets Cool

Perhaps Millennials like to pat themselves on the back a little too much (we are only human). The human race has always had an altruistic streak; it is not exclusive to recent generations. Yet the youth of today have a few advantages that help them do more good faster, says Anbu Anbalagapandian, who works for the Orange Harp ethical mobile marketplace.

“With advances in technology, it is much easier to create solutions and have a bigger impact,” says Anbalagapandian. “For example, donating food from a local restaurant to a homeless shelter, or microlending to a small business in remote parts of the world, have been made incredibly easy.”

The Lunchbox Fund fosters education by providing nourishing meals to children in rural areas of South Africa. Photo courtesy of The Lunchbox Fund.
Phone screens displaying apps that allow users to give to those in need; from left, Orange Harp, Share the Meal, and Project Noah.
ShareTheMeal is another app that does good in a cost-effective way by leveraging cross-border engagement. Developed by the United Nations World Food Program, the app makes it easy for a user to tap once and donate $0.50, enough to feed a child for an entire day in most of the world. Instead of costly outreach, users may download the app from anywhere in the world and donate. The U.N. claims this initiative’s administrative costs are among the lowest in the non-profit world, with 90% of donations going directly to operations.

“So many people want to dedicate time and energy to charities that they feel passionate about, but there may be certain factors preventing them from doing so, like geographic barriers,” says Sophie Barnett, the Lunchbox Fund’s digital coordinator. “Social good apps, accessible from anywhere, at any time, have made doing good infinitely easier.”

Social good apps succeed because they are a win-win-win for charitable organizations, users, and participating businesses. In the case of Feedie, restaurants pay $500 as a tax-deductible donation in anticipation of 2,000 photos of their meals being posted. They receive baked-in buzz as photos of their culinary creations are shared by users who feel good when they do good, and they garner media attention when Feedie is featured by media outlets like Mashable, The Huffington Post, and Time, marketing exposure for which many businesses would gladly pay. Thanks to Feedie’s ingenuity, businesses pay to feed the hungry instead.

“We want to make taking a Feedie photo as ubiquitous as posting to Instagram and create a sustainable, scalable impact,” says Barnett.

Doing Well While Doing Good

Feedie highlights an important truth about social good apps of any type: those that create the biggest impact are the ones that help others do well while doing good. That is exhibited in the business model of the MicroHero survey-taking app.

Users are asked to complete free surveys in MicroHero; each completed survey results in a donation to the user’s favorite charity by companies that conduct market research, or businesses that have pressing questions they need answered. The result is a type of conscientious capitalism that gives businesses a direct incentive to support charitable causes. Currently, MicroHero works with over 200 charities, including the World Wildlife Fund and the American Red Cross.

This model only works when you understand how consumers actually use smartphones; you cannot simply use an app notification to ask for a donation and call it a day, says MicroHero’s de Brun, adding that apps must provide an experience just like any consumer product.

“Smartphone operating systems and app user interfaces are getting better at serving end users in natural, even delightful ways,” he says. “If done well, social good actions can be very spontaneous and efficient from a phone.”

In short, charitable giving can not only become widespread on mobile devices; it can become second nature if done right, just like opening Facebook or checking Snapchat. There is an added financial benefit for consumers, too: you can claim many types of charitable donations as tax deductions. While the apps themselves do not offer much help with this (you still must claim a charitable donation on your own), they do make it easier to give in the first place.

De Brun believes social good apps like MicroHero are just getting started. As user bases grow, social good app developers will find more fluid ways to connect in-app actions with actual monetary impact. Gamification will help: the more giving feels like play, the more engagement we will see from socially conscious users. Widespread adoption, says de Brun, will cause more corporate social responsibility programs to get in the game, and that is when things will start to get really interesting.

When social good apps appeal not just to individuals but entire corporations, their impact grows by orders of magnitude.

That is because the incentives for doing good and doing well are aligned. “Entrepreneurs are inventing business models that provide social good and provide for a profitable, sustainable business,” de Brun says.

Other app makers agree. Nick Marino is director of Social Change at TangoTab, an app that donates a meal to local food charities each time a user checks in at a local restaurant. “I believe we have seen an overall increase in businesses working with causes,” Marino says.

Restaurants like Starbucks, Nobu, and Maggiano’s have signed up to work with TangoTab, because the best social good apps leverage consumer behavior to create real change, Marino explains.

“Apps like TangoTab are making it easier for people to make a difference by doing things they already do daily. People dine out all the time. We give people the opportunity to impact someone in need just by doing what they’re already doing.”

Further Reading

Wigglesworth, V.
(2015) SafeNight app comes to the aid of domestic violence victims in North Texas. The Dallas Morning News. http://www.dallasnews.com/news/domestic-violence/20151009-safenight-app-comes-to-the-aid-of-domestic-violence-victims-in-north-texas.ece

Godfrey, M.
(2013) Feedie App Turns Food Photos Into Charitable Giving. ABC News. http://abcnews.go.com/Technology/feedie-app-turns-food-photos-charitable-giving/story?id=20060750

Chang, L.
(2015) With the ShareTheMeal App, Your 50-Cent Donation Can Help End World Hunger. Digital Trends. http://www.digitaltrends.com/mobile/help-end-world-hunger-with-the-uns-sharethemeal-app/

Logan Kugler is a freelance technology writer based in Tampa, FL. He has written for over 60 major publications.

© 2016 ACM 0001-0782/16/08 $15.00
[Advertisement: XRDS: Crossroads, The ACM Magazine for Students. “... goal is to make the magazine accessible to anyone with an interest in ... the word. Our editors represent a team of students with diverse interests who are undergrads and graduate students from around the globe.” http://xrds.acm.org/volunteer.cfm Association for Computing Machinery]
viewpoints
In 1967, the Silver Bridge collapsed into the Ohio River during rush hour. Instead of redundancy the bridge used high-strength steel. The failure of a single eyebar was catastrophic.[a] Today’s computing devices resemble the Silver Bridge, but are much more complicated. They have billions of lines of code, logic gates, and other elements that must work perfectly. Otherwise, adversaries can compromise the system. The individual failure rates of many of these components are small, but aggregate complexity makes vulnerability statistically certain.

This is a scaling problem. Security-critical aspects of our computing platforms have seen exponential increases in complexity, rapidly overwhelming defensive improvements. The futility of the situation leads to illogical reasoning that structural engineers would never accept, such as claiming that obviously weak systems are “strong” simply because we are ignorant as to which specific elements will fail.

Security Building Blocks
To build strong security systems, we need reliable building blocks. Cryptographic algorithms are arguably the most important building block we have today. Well-designed algorithms can provide extraordinary strength against certain attacks. For example, Diffie-Hellman, RSA, and triple DES (all known since the 1970s) continue to provide practical security today if sufficiently large keys are used.

Protocols can greatly simplify security by removing reliance on communications channels, but in practice they have proven deceptively tricky. In 1996, I co-authored the SSL v3.0 protocol for Netscape, which became the basis for the TLS standard. Despite nearly 20 years of extensive analysis, researchers have identified new issues and corner cases related to SSL/TLS even relatively recently. Still, I believe we are reaching the point of having cryptographically sound protocols. Although breakthroughs in quantum computing or other attacks are conceivable, I am cautiously optimistic that current versions of the TLS standard (when conservative key sizes and configurations are chosen) will resist cryptanalysis over many decades.

Unfortunately, my optimism does not extend to actual implementations. Most computing devices running SSL/TLS are riddled with vulnerabilities that allow adversaries to make an end-run around the cryptography. For example, errant pointers in device drivers or bugs in CPU memory management units can destroy security for all software on a device. To make progress, we need another building block: simple, high-assurance hardware for secure computation.

Secure Computation in Hardware
Consider the problem of digitally signing messages while securing a private key. Meaningful security assurances are impossible for software implemented on a typical PC or smartphone due to reliance on complex and untrustworthy components, including the hardware, operating system, and so forth. Alternatively, if the computation is moved to isolated hardware, the private key’s security depends only on the logic of a comparatively simple hardware block (see Figure 1). The amount of security-critical logic is reduced by many orders of magnitude, turning an unsolvable security problem into a reasonably well-bounded one.

In the 1990s, I started investigating secure hardware, assuming that simple mathematical operations like encryption and digital signing would be straightforward to secure. The problems I encountered were a lot more interesting and less intuitive than I expected.

I noticed small data-dependent correlations in timing measurements of devices’ cryptographic operations. Cryptographic algorithms are extremely brittle; they are very difficult to break by analyzing binary input and output messages, but fail if attackers get any other information. The timing variations violated the algorithms’ security model, and in practice allowed me to factor RSA keys and break other algorithms.[b]

I bought the cheapest analog oscilloscope from Fry’s Electronics and placed a resistor in the ground input of a chip doing cryptographic operations. The scope showed power consumption varying with the pattern of branches taken by the device’s processor. I could easily identify the conditions for these branches: the bits of the secret key. Upgrading to a digital storage oscilloscope enabled far more advanced analysis methods. With my colleagues Joshua Jaffe and Benjamin Jun, I developed statistical techniques (Differential Power Analysis, or DPA) to solve for keys by leveraging tiny correlations in noisy power consumption or RF measurements.[c]

Side channels weren’t the only issue. For example, scan chains and other test modes can be abused by attackers. Researchers and pay TV pirates independently discovered that glitches and other computation errors can be devastating for security.[d]

Fortunately, practical and effective solutions to these issues have been found and implemented. For example, nearly 10 billion chips are made annually with DPA countermeasures.

Image by Alicia Kubista/Andrij Borys Associates

a “The Collapse of the Silver Bridge: NBS Determines Cause,” 2009; https://1.usa.gov/21cRgUV
b P. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” 1996; https://bit.ly/25S86vt
c P. Kocher, J. Jaffe, and B. Jun, “Differential Power Analysis,” 1999; https://bit.ly/1XLZhSZ
d D. Boneh, R. DeMillo, and R. Lipton, “On the Importance of Checking Cryptographic Protocols for Faults,” 1997; https://stanford.io/1PPPFnj
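The data-dependent timing leak described in this section can be illustrated with a toy model. The sketch below is not the attack from the cited timing paper. It assumes a hypothetical early-exit byte comparison (`leaky_equal`) and uses the number of loop iterations as a stand-in for measured running time. The helper `recover_secret` is likewise invented for illustration. Python's standard `hmac.compare_digest` is shown as a comparison designed to avoid content-dependent timing.

```python
import hmac
from typing import Tuple


def leaky_equal(secret: bytes, guess: bytes) -> Tuple[bool, int]:
    """Early-exit comparison (hypothetical). Returns (match, steps).

    `steps` counts loop iterations and models running time: the more
    leading bytes of the guess are correct, the longer the loop runs.
    """
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps


def recover_secret(secret: bytes) -> bytes:
    """Recover the secret byte by byte using only the step count."""
    n = len(secret)
    known = b""
    for pos in range(n):
        for candidate in range(256):
            guess = known + bytes([candidate]) + b"\x00" * (n - pos - 1)
            match, steps = leaky_equal(secret, guess)
            # A correct byte at `pos` lets the loop run past that position.
            if match or steps > pos + 1:
                known += bytes([candidate])
                break
    return known


if __name__ == "__main__":
    secret = b"k3y!"
    print(recover_secret(secret) == secret)
    # Standard-library comparison intended to avoid this kind of leak.
    print(hmac.compare_digest(secret, b"k3y!"))
```

In this model the attacker needs at most 256 guesses per byte rather than 256 to the power of the key length, which is why extra information outside the input and output messages is so damaging.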
AUGUST 2016 | VOL. 59 | NO. 8 | COMMUNICATIONS OF THE ACM 23
Figure 1. A simple fixed-function secure computation block. (Diagram labels: Security perimeter, Key Generation, Keypair, Private key.)
Figure 2. Vulnerability risks increase with the number of potential interactions between system elements.
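The relationship in Figure 2 can be made concrete with a simple count. The sketch below assumes, as a simplification that the article does not state, that every unordered pair of elements counts as one potential interaction. Under that assumption n elements give n(n-1)/2 pairs, which grows as the square of n. The function name `pairwise_interactions` is invented for illustration.

```python
def pairwise_interactions(n: int) -> int:
    """Number of unordered pairs among n elements: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, pairwise_interactions(n))
    # prints:
    # 10 45
    # 100 4950
    # 1000 499500
```

Each tenfold increase in elements gives roughly a hundredfold increase in pairs, so shrinking a security perimeter from 1,000 elements to 10 cuts the pair count from 499,500 to 45.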
Although there is a possibility that unexpected new categories of attack may be discovered, based on what we know, a well-designed chip can be robust against non-invasive attacks. Strategies for addressing invasive attacks have also improved greatly, although they still generally assume some degree of security through obscurity.

Adding Secure Computation to Legacy Architectures
Today’s computing architectures are too entrenched to abandon, yet too complex to secure. It is practical, however, to add extra hardware where critical operations can be isolated. Actual efforts to do this in practice vary wildly in cost, programming models, features, and level of security assurance.

Early attempts used standalone security chips, such as SIM cards in mobile devices, TPMs in PCs, and conditional access cards in pay TV systems. These were limited to single-purpose applications that could bear the cost, typically a dollar or more. The security chip’s electrical interface was also a security risk, for example allowing pay TV pirates to steal video decryption keys for redistribution.

Another strategy is to add security modes to existing designs. Because the existing processor and other logic are reused, these approaches add almost no die area. Unfortunately, this reuse brings significant security risks due to bugs in the shared logic and separation mechanisms. Intel’s Software Guard Extensions (SGX)[e] leave almost all of the (very complex[f]) processor in the security perimeter, and do not even appear to mitigate side channel or glitch attacks. Trusted Execution Environments (TEEs) typically use ARM’s TrustZone CPU mode to try to isolate an independent “trusted” operating system, but security dependencies include the CPU(s), the chip’s test/debug modes, the memory subsystem/RAM, the TEE operating system, and other high-privilege software.

The approach I find most compelling is to integrate security blocks onto large multi-function chips. These cores can create an intra-chip security perimeter that does not trust the RAM, legacy processors, operating system, or other logic. In addition to providing much better security integration than separate chips, on-die cores cost 1–2 orders of magnitude less. Examples of on-chip security hardware include Apple’s Secure Enclave, AMD’s Secure Processor, and Rambus’s CryptoManager Cores. Depending on the application, a security core may offload specific functions like authentication, or can be programmable. Over time, these secure domains can improve and evolve to take on a growing range of security-sensitive operations.

Limits of Human Comprehension
Security building blocks must be simple enough for people to comprehend their intended security properties. With remarkable regularity, teams working on data security dramatically overestimate what can be implemented safely. I set a requirement when working on the design of SSL v3.0 that a technically proficient person could read and understand the protocol in one day. Despite this, multiple reviewers and I missed several important but subtle design problems, some of which were not found until many years later.

Vulnerability risks grow with the number of potential interactions between elements in the system. If interactions are not constrained, risks scale as the square of the number of elements (see Figure 2).

Although secure hardware blocks may appear to be simple, they are still challenging to verify. Formal methods, security proofs, and static analysis tools can help to some extent by augmenting our human brains. Still, there are gaps between these methods and the messiness of the real world. As a result, these ap-

e M. Hoekstra, “Intel SGX for Dummies,” 2013; https://intel.ly/1eQpY4P
f J. Rutkowska, “Intel x86 Considered Harmful,” 2015; https://bit.ly/1ObbBaA
Education
From Computational Thinking
to Computational Participation
in K–12 Education
Seeking to reframe computational thinking as computational participation.
Computational thinking has become a battle cry for coding in K–12 education. It is echoed in statewide efforts to develop standards, in changes to teacher certification and graduation requirements, and in new curriculum designs.[1] The annual Hour of Code has introduced millions of kids to coding inspired by Apple cofounder Steve Jobs who said, “everyone should learn how to program a computer because it teaches you how to think.” Computational thinking has garnered much attention but people seldom recognize that the goal is to bring programming back into the classroom.

In the 1980s many schools featured Basic, Logo, or Pascal programming computer labs. Students typically received weekly introductory programming instruction.[6] These exercises were often of limited complexity, disconnected from classroom work, and lacking in relevance. They did not deliver on promises. By the mid-1990s most schools had turned away from programming. Pre-assembled multimedia packages burned onto glossy CD-ROMs took over. Toiling over syntax typos and debugging problems were no longer classroom activities.

Computer science is making a comeback in schools. We should not repeat earlier mistakes, but leverage what we have learned.[5] Why are students interested in programming? Under what circumstances do they do it, and how?[2] Computational thinking and programming are social, creative practices. They offer a context for making applications of significance for others, communities in which design sharing and collaboration with others are paramount. Computational thinking should be reframed as computational participation.

Computational Participation
This idea expands on Jeannette Wing’s original definition of computational thinking.[7] Computational participation involves solving problems, designing systems, and understanding human behavior in the context of computing. It allows for participation in digital activities. Many kids use code outside of school to create and share. Youth-generated websites have appeared to make and share programmable media online. These sites include video games, interactive art projects, and digital stories. They are inherently do-it-yourself (DIY), encouraging youth programming as an effective way to create and share online,

Photo: Students in a Makey Makey workshop conducted by volunteers from Robogals Wellesley. Photo courtesy of Robogals Wellesley.
Kode Vicious
Chilling the
Messenger
Keeping ego out of software-design review.
Dear KV,
I was recently hired as a mid-level Web developer working on version 2 of a highly successful but outdated Web application. It will be implemented with ASP.Net WebAPI. Our architect designed a layered architecture, roughly like Web Service > Data Service > Data Access. He noted that data service should be agnostic to Entity Framework ORM (object-relational mapping), and it should use unit-of-work and repository patterns. I guess my problem sort of started there.

Our lead developer has created a solution to implement the architecture, but the implementation does not apply the unit-of-work and repository patterns correctly. Worse, the code is really difficult to understand and it does not actually fit the architecture. So I see a lot of red flags coming up with this implementation. It took me almost an entire weekend to work through the code, and there are still gaps in my understanding.

This week our first sprint starts, and I feel a responsibility to speak up and try to address this issue. I know that I will face a lot of resistance, just based on the fact that the lead developer wrote that code and understands it more than the alternatives. He may not see the issue that I will try to convey. I need to convince him and the rest of the team that the code needs to be refactored or reworked. I feel apprehensive, because I am like the new kid on the block trying to change the game. I also don’t want to be perceived as Mr. Know-It-All, even though I might be a little more opinionated than I should be sometimes.

My question is, how can I convince the team that there is a real problem with the implementation without offending anyone?
~Opinionated

Dear ~Opinionated,
Let me work backward through your letter from the end. You are asking me, Kode Vicious, how to point out problems without offending anyone? Have you read any of my previous columns? Let’s just start out with the KV ground rules: it’s only the law and other deleterious side effects that keep me on the “right” side of violence in some

Image by Maryna Pleshkun
Viewpoint
Teamwork in
Computing Research
Considering the benefits and downsides
of collaborative research.
Teamwork has long been a part of computing research, but now advanced technologies and widespread proficiency with collaboration technologies are creating new opportunities. The capacity to share data, computing resources, and research instruments has been growing steadily, just as predicted when Bill Wulf coined the term “collaboratories” a quarter of a century ago.[2,15,16]

Teamwork has become the overwhelmingly dominant form of research, so rewarding effective teams, teaching our students how to collaborate, and supporting research on what works and what doesn’t have taken on new importance.

The Case For and Against Teamwork
In recent years, the tools for locating relevant documents, finding team members with special skills, coordinating schedules, and refining reports collaboratively have grown substantially.[7] In addition to these technology advances, the willingness and fluency with which young researchers appear to use video and audio conferencing, curated datasets, shared document editors, task managers, and other collaboration tools grows steadily.

Another driving force for teamwork and larger group collaborations is what I see as the increased ambition of team members and the growing expectations from research leaders. Teamwork brings more than larger capacity for work; it opens new possibilities when different disciplines, research methods, and personalities are fruitfully combined.[6] In short, research teams of two to 10 members and larger groups of hundreds of researchers can accomplish much more and conduct higher quality work than they could just two decades ago when the Internet was a novelty.[10]

The growth of research teams was well documented in a series of papers by Wuchty, Jones, and Uzzi.[5,13,14] They reported that from the 1950s to 2000 the average number of journal paper co-authors in science and engineering grew from 1.9 to 3.5. They also showed that the impact as measured by journal citation counts increased as the number of authors grew. Furthermore, the benefit of teamwork grew over time. In 1955, team papers attracted 1.7 times as many citations as solo-authored papers, but by 2000 the advantage grew to 2.1 times as many citations, suggesting that technologies and teamwork skills had enabled teams to be more effective than they were in the past.

A study of the papers for the ACM SIGKDD 2014 Conference, by program chairs Jure Leskovec and Wei Wang, added evidence of the benefits of teamwork. The reviewer ratings of the 1,036 submitted papers increased steadily for papers with up to five co-authors, then remained level. Reviewer ratings may be imperfect, but this bit of evidence seems potent, especially since this conference had an impressively rigorous acceptance rate of approximately 14%. Another outcome was tied to the ratings for papers. Those papers whose authors included a mix of academics and business practitioners had statistically significantly higher ratings than papers whose authors were either all academics or all business practitioners. This added to the evidence that diversity in teams is a catalyst for high quality.[8]

However, teamwork has downsides, requiring extra coordination among collaborators, learning new disciplines, adjusting to fresh research methods, and accommodating different personalities.[4] These downsides mean that those who engage in teamwork will need to learn how to do it effectively, so as to attain the full benefits.

Doing Teamwork Right
The interest in research teams and larger groups has now accelerated with the publication of the National Academies report Enhancing the Effectiveness of Team Science.[3] This report adds recent evidence that teams and larger groups are a growing phenomenon in science and engineering research, where multiple authorship has risen to 90% of all papers in 2013.

The report makes another useful contribution by characterizing seven dimensions that challenge today’s research teams (see the accompanying table). Teams on the left side of the range are easier to manage, while teams on the right side of the range are more difficult to manage, suggesting that these deserve more study.[3] The report offers suggestions of how to improve team processes and calls for increased research on teams and larger groups.

Teams and larger groups of academics and practitioners seem likely to be more
effective in choosing meaningful problems, forming successful research plans, and in testing hypotheses in living laboratories at scale. Teaming between applied and basic researchers is likely to be a growing trend, as indicated by a recent National Science Foundation program announcement, Algorithms in the Field:[11] “Algorithms in the Field encourages closer collaboration between two groups of researchers: (i) theoretical computer science researchers, who focus on the design and analysis of provably efficient and provably accurate algorithms for various computational models; and (ii) applied researchers including a combination of systems and domain experts.”

Teamwork between academics and practitioners can have strong benefits, as does multidisciplinary collaboration within academic communities. A clear testimonial for joint research bridging computer science and other disciplines comes from David Patterson’s review of his 35-plus years running research labs on computer systems: “The psychological support of others also increases the collective courage of a group. Multidisciplinary teams, which increasingly involve disciplines outside computer science, have greater opportunity if they are willing to take chances that individuals and companies will not.”[9]

Patterson concludes with his vision of the growth of multidisciplinary teams: “Whereas early computing problems were more likely to be solved by a single investigator within a single discipline, I believe the fraction of computing problems requiring multidisciplinary teams will increase.”

There is always room for solitary researchers who wish to pursue their own projects. There are substantial advantages to working alone, but those who learn team skills are more likely to become part of breakthroughs than those who go it alone.

A near-term impediment to teamwork is the difficulty that researchers expect to have when facing hiring, tenure, and promotion committees, who are perceived as having trouble in assessing individuals who contribute to teams. Even team members who have participated in many award-winning papers fear they will find it difficult to convince review committees of their contributions. Being a first author helps gain recognition, as does documenting the role of each team member in the acknowledgments section. Writing a single-author paper, when this is warranted, may also help in many disciplines.

Recommendations
In light of the growing interest in teamwork, appointment, promotion, and tenure committees would do well to update their methods for documenting and assessing teamwork, so as to encourage and reward effective team participation and leadership. One example to follow is the University of Southern California, which has developed guidelines to emphasize a variety of forms of collaborative scholarship and to introduce attribution standards for contributions to larger projects.[12]

A second step, for computing educators, would be to increase teamwork training and experiences, so as to raise the quality of students’ work. Team projects in undergraduate and graduate courses would train students in using collaboration tools and nurture their communication skills for future professional or research jobs. The National Academies report makes many recommendations that need to be tailored to fit local computing cultures and the seven dimensions on which today’s research teams differ in complexity.

A third step, for government agencies, would be to increase funding to study computing research teams so as to enable leaders to form and manage successful teams. Teamwork is difficult since it requires different skills than working alone, but the potential for greater impact makes teamwork attractive. A strong research agenda would include applied and basic components to understand which incentives and rewards best amplify success within the seven dimensions of the National Academies report.

Seven dimensions that challenge today’s research teams.
Dimension                                       Range (from)       Range (to)
Diversity of team or group membership           Homogeneous        Heterogeneous
Disciplinary integration                        Unidisciplinary    Transdisciplinary
Team or group size                              Small (2)          Mega (1000s)
Goal alignment across teams                     Aligned            Divergent or Misaligned
Permeable team and organizational boundaries    Stable             Fluid
Proximity of team or group members              Co-located         Globally distributed
Task interdependence                            Low                High

References
1. ACM 2014 Conference on Knowledge Discovery and Data Mining (ACM-KDD); www.kdd2014.org
2. Cerf, V.G. et al. National Collaboratories: Applying Information Technologies for Scientific Research. National Academy Press, Washington, D.C. (1993).
3. Cooke, N.J. and Hilton, M.L., Eds. Enhancing the Effectiveness of Team Science. National Academies Press, Washington, D.C. (2015); http://www.nap.edu/19007
4. Cummings, J. and Kiesler, S. Collaborative research across disciplinary and organizational boundaries. Social Studies of Science 35, 5 (2005), 703–722.
5. Jones, B.F., Wuchty, S., and Uzzi, B. Multi-University research teams: Shifting impact, geography, and stratification in science. Science 322, 5905 (2008), 1259–1262; DOI:10.1126/science.1158357
6. Nelson, B. The data on diversity. Commun. ACM 57, 11 (Nov. 2014), 86–95.
7. Olson, J.S. and Olson, G.M. Working Together Apart: Collaboration over the Internet. Morgan & Claypool Publishers, 2013.
8. Page, S.E. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton University Press, NJ, 2007.
9. Patterson, D. How to build a bad research center. Commun. ACM 57, 3 (Mar. 2014), 33–36.
10. Spector, A., Norvig, P., and Petrov, S. Google’s hybrid approach to research. Commun. ACM 55, 7 (July 2012), 34–37.
11. U.S. National Science Foundation Program on Algorithms in the Field (AitF); http://www.nsf.gov/pubs/2015/nsf15515/nsf15515.htm
12. University of Southern California, Guidelines for Assigning Authorship and for Attributing Contributions to Research Products and Creative Works (Sept. 16, 2011); http://www.usc.edu/academe/acsen/Documents/senate%20news/URC_on_Authorship_and_Attribution_20110916.pdf
13. Wuchty, S., Jones, B.F., and Uzzi, B. The increasing dominance of teams in production of knowledge. Science 316, 5827 (2007a), 1036–1039; DOI:10.1126/science.1136099
14. Wuchty, S., Jones, B.F., and Uzzi, B. Why do team-authored papers get cited more? Response. Science 317, 5844 (2007b), 1497–1498.
15. Wulf, W.A. The collaboratory opportunity. Science 261, 5123 (1993), 854–855; DOI:10.1126/science.8346438
16. Wulf, W.A. The National Collaboratory: A White Paper. In Towards a National Collaboratory, the unpublished report of a workshop held at Rockefeller University, March 17–18, 1989 (Joshua Lederberg and Keith Uncapher, co-chairs).

Ben Shneiderman is a Distinguished University Professor of computer science at the University of Maryland, a Member of the National Academy of Engineering, and the author of The New ABCs of Research: Achieving Breakthrough Collaborations (Oxford University Press, April 2016).

Thanks to the anonymous reviewers and readers of early drafts, including Nancy Cooke, Gerhard Fischer, Kara Hall, Margaret Hilton, Judy Olson, Scott Page, Dave Patterson, and Jennifer Preece.
practice
DOI:10.1145/2909480
Debugging
Distributed
Systems
alization of a distributed system’s execution, ShiViz displays the happens-before relation. Given event e at node n, the happens-before relation indicates all the events that logically precede e. Other events might have already occurred at other nodes according to wall-clock time, but node n cannot tell whether those other events happened before or after e, and they do not affect the behavior of e. This partial order can rule out which events do not cause others, identify concurrent events, and help developers mentally replay parts of the execution.

Figure 1 illustrates an execution of the two-phase commit protocol with one transaction manager and two replicas.[1] This time-space diagram is a visualization of the underlying happens-before partial order, showing an execution with three nodes. Lines with arrows denote the partial ordering of events, each of which has an associated vector timestamp in brackets. (See the accompanying sidebar on timestamps.)

Figure 2 shows a screenshot of ShiViz visualizing an execution of a distributed data-store system called Voldemort.[12] In the middle of the screen is the time-space diagram, with time flowing from top to bottom. The colored boxes at the top represent nodes, and the vertical lines below them are the node timelines. Circles on each node’s timeline represent events executed by that node. Edges connect events, representing the recorded happens-before relation: an event that is higher in the graph happened before an event positioned lower in the graph that it is connected to via a downward path. ShiViz augments the time-space diagram with operations to help developers explore distributed-system executions and corresponding logs. Figure 2 details some of these operations.

Understanding Distributed-System Executions
ShiViz helps developers to understand the relative ordering of events and the likely chains of causality between events, which is important for debugging concurrent behavior; to query for certain events and interaction patterns between hosts; and to identify structur-

Figure 1. Time-space diagram of an execution with three nodes.
Nodes, left to right: replica 1, tx manager, replica 2.
Event         Vector timestamp
tx prepare    [0,1,0]
commit        [0,1,1]
abort         [1,1,0]
r2 commit     [0,2,1]
r1 abort      [1,3,1]
tx abort      [1,4,1]
tx aborted    [2,4,1]
tx aborted    [1,4,2]
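The vector timestamps in Figure 1 can be compared mechanically to recover the happens-before partial order. The sketch below is a minimal illustration and is not taken from ShiViz. The function names are invented, and the component order (replica 1, tx manager, replica 2) is an assumption read off the figure's node labels. It uses the standard vector-clock rule: event a happens before event b when every component of a is less than or equal to the matching component of b and at least one is strictly smaller.

```python
from typing import Sequence


def happens_before(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if the event stamped `a` logically precedes the event stamped `b`."""
    if len(a) != len(b):
        raise ValueError("timestamps must have the same number of components")
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def concurrent(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if neither event precedes the other (and they are distinct)."""
    return (tuple(a) != tuple(b)
            and not happens_before(a, b)
            and not happens_before(b, a))


# Timestamps copied from Figure 1; component order is assumed to be
# (replica 1, tx manager, replica 2).
TX_PREPARE = (0, 1, 0)
COMMIT = (0, 1, 1)      # labeled "commit" in the figure
ABORT = (1, 1, 0)       # labeled "abort" in the figure
TX_ABORT = (1, 4, 1)

if __name__ == "__main__":
    print(happens_before(TX_PREPARE, COMMIT))  # True
    print(concurrent(COMMIT, ABORT))           # True
    print(happens_before(ABORT, TX_ABORT))     # True
```

The two replica responses are concurrent: neither timestamp dominates the other, so no ordering between them can be inferred, which matches the description of events a node cannot order relative to one another.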
Protocol negotiated for Socket[addr=/127.0.0.1,port=64172,localport=64169]: voldemort-native-v1
Protocol negotiated for Socket[addr=/127.0.0.1,port=64173,localport=64166]: voldemort-native-v1
systems more effectively.

Clustering executions. To help manage many executions, ShiViz supports grouping executions into clusters. A user can cluster by the number of nodes or by comparison to a base execution, using as a distance metric the differencing mechanism described earlier. Cluster results are presented as distinct groups of listed execution names.

Execution clusters aid in the inspection and comparison of multiple executions by providing an overview of all executions at once. Users can quickly scan through cluster results to see how executions are alike or different, based on the groups into which they are sorted. Clustering also helps users pinpoint executions of interest by allowing them to inspect a subset of executions matching a desired measure. This subset can be further narrowed by performing a keyword search or a structured search on top of the clustering results. Execution names among clusters are highlighted if their corresponding graphs contain instances matching the user’s search query.

ShiViz helps developers visualize the event order, search for communication patterns, and identify potential event causality. This can help developers reason about the concurrency of events in an execution, distributed system state, and distributed failure modes, as well as formulate hypotheses about system behavior and verify them via execution visualizations. Meanwhile, the generality of logging makes ShiVector and ShiViz broadly applicable to systems deployed on a wide range of devices.

ShiViz has some limitations. ShiViz surfaces low-level ordering information, which makes it a poor choice for understanding high-level system behavior. The ShiViz visualization is based on logical and not realtime ordering, and cannot be used to study certain performance characteristics. The ShiViz tool is implemented as a client-side-only browser application, making it portable and appropriate for analyzing sensitive log data. This design choice, however, also limits its scalability. A related and complementary tool to ShiViz is Ravel, which can scalably visualize parallel execution traces.6

ShiViz is an open source tool with an online deployment (http://bestchai.bitbucket.org/shiviz/). Watch a video demonstrating key ShiViz features at http://bestchai.bitbucket.org/shiviz-demo/.

Acknowledgments
We thank Perry Liu and Albert Xing, who helped develop ShiViz; Jenny Abrahamson, who developed the initial ShiVector and ShiViz prototypes; and Donald Acton and Colin Scott, who helped evaluate ShiViz. This work is supported by NSERC USRA, the NSERC Discovery grant, and the National Science Foundation under grants CCF-1453474 and CNS-1513055. This material is based on research sponsored by DARPA under agreement number FA8750-12-2-0107. The U.S. government is authorized to reproduce and distribute reprints for governmental purposes, notwithstanding any copyright notices thereon.

Related articles on queue.acm.org

Advances and Challenges in Log Analysis
Adam Oliner, Archana Ganapathi, and Wei Xu
http://queue.acm.org/detail.cfm?id=2082137

Leveraging Application Frameworks
Douglas C. Schmidt, Aniruddha Gokhale, and Balachandran Natarajan
http://queue.acm.org/detail.cfm?id=1017005

Postmortem Debugging in Dynamic Environments
David Pacheco
http://queue.acm.org/detail.cfm?id=2039361

References
1. Bernstein, P., Hadzilacos, V., Goodman, N. Distributed recovery. Concurrency Control and Recovery in Database Systems, Chapter 7. Addison-Wesley, 1986; http://research.microsoft.com/en-us/people/philbe/chapter7.pdf.
2. Corbett, J. C. et al. Spanner: Google’s globally distributed database. In Proceedings of the 10th Usenix Symposium on Operating Systems Design and Implementation, 2012; https://www.usenix.org/conference/osdi12/technical-sessions/presentation/corbett.
3. Garduno, E., Kavulya, S. P., Tan, J., Gandhi, R., Narasimhan, P. Theia: Visual signatures for problem diagnosis in large Hadoop clusters. In Proceedings of the 26th International Conference on Large Installation System Administration, 2012, 33–42; https://users.ece.cmu.edu/~spertet/papers/hadoopvis-lisa12-cameraready-v3.pdf.
4. Geels, D., Altekar, G., Maniatis, P., Roscoe, T., Stoica, I. Friday: Global comprehension for distributed replay. In Proceedings of the 4th Usenix Conference on Networked Systems Design and Implementation, (2007); https://www.usenix.org/legacy/event/nsdi07/tech/full_papers/geels/geels.pdf.
5. Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J. R., Parno, B., Roberts, M. L., Setty, S., Zill, B. IronFleet: Proving practical distributed systems correct. In Proceedings of the 25th Symposium on Operating Systems Principles; 2015; http://sigops.org/sosp/sosp15/current/2015-Monterey/250-hawblitzel-online.pdf.
6. Isaacs, K.E. et al. Combing the communication hairball: Visualizing parallel execution traces using logical time. IEEE Transactions on Visualization and Computer Graphics 20, 12 (Dec 2014), 2349–2358.
7. Killian, C., Anderson, J. W., Jhala, R., Vahdat, A. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the 4th Usenix Conference on Networked Systems Design and Implementation, (2007); https://www.usenix.org/legacy/event/nsdi07/tech/killian/killian.pdf.
8. Liu, X., Guo, Z., Wang, X., Chen, F., Lian, X., Tang, J., Wu, M., Kaashoek, M. F., Zhang, Z. D3S: Debugging deployed distributed systems. In Proceedings of the 5th Usenix Symposium on Networked Systems Design and Implementation, 2008; 423–437; http://static.usenix.org/event/nsdi08/tech/full_papers/liu_xuezheng/liu_xuezheng.pdf.
9. Mace, J., Roelke, R., Fonseca, R. Pivot tracing: Dynamic causal monitoring for distributed systems. In Proceedings of the 25th Symposium on Operating Systems Principles, (2015); 378–393; http://sigops.org/sosp/sosp15/current/2015-Monterey/122-mace-online.pdf.
10. Mattern, F. Virtual time and global states of distributed systems. In Proceedings of the International Workshop on Parallel and Distributed Algorithms, 1989; http://homes.cs.washington.edu/~arvind/cs425/doc/mattern89virtual.pdf
11. Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuff, M. How Amazon Web Services uses formal methods. Commun. ACM 58, 4 (2015), 66–73; http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext.
12. Project Voldemort; http://www.project-voldemort.com/voldemort/.
13. Sambasivan, R.R., Fonseca, R., Shafer, I., Ganger, G. So, you want to trace your distributed system? Key design insights from years of practical experience. Parallel Data Laboratory, Carnegie Mellon University, 2014; http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf.
14. Scott, C. et al. Minimize faulty executions of distributed systems. In Proceedings of the 13th Usenix Symposium on Networked Design and Implementation (Santa Clara, CA, Mar. 16–18, 2016) 291–309.
15. Sigelman, B. H., Barroso, L. A., Burrows, M., Stephenson, P., Plakal, M., Beaver, D., Jaspan, S., Shanbhag, C. Dapper, a large-scale distributed systems tracing infrastructure. Research at Google, 2010; http://research.google.com/pubs/pub36356.html.
16. Wilcox, J. R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M. D., Anderson, T. Verdi: A framework for implementing and formally verifying distributed systems. In Proceedings of the 36th SIGPLAN Conference on Programming Language Design and Implementation, 2015, 357–368; https://homes.cs.washington.edu/~ztatlock/pubs/verdi-wilcox-pldi15.pdf.
17. Xu, W., Huang, L., Fox, A., Patterson, D., Jordan, M. Experience mining Google’s production console logs. In Proceedings of the Workshop on Managing Systems via Log Analysis and Machine Learning Techniques, 2010; http://iiis.tsinghua.edu.cn/~weixu/files/slaml10.pdf.
18. Yang, J., et al. MoDist: Transparent model checking of unmodified distributed systems. In Proceedings of the 6th Usenix Symposium on Networked Systems Design and Implementation, 2009, 213–228; https://www.usenix.org/legacy/event/nsdi09/tech/full_papers/yang/yang_html/.

Ivan Beschastnikh (http://www.cs.ubc.ca/~bestchai/) works on improving the design, implementation, and operation of complex systems. He is an assistant professor in the department of computer science at the University of British Columbia, where he leads a team of students on projects that span distributed systems, software engineering, security, and networks, with a particular focus on program analysis.

Patty Wang has explored approaches to helping developers understand and compare multiple distributed executions, focusing on summarizing similarities and differences across traces.

Yuriy Brun (http://people.cs.umass.edu/~brun/) works on automating system building and creating self-adaptive systems. He is an assistant professor at the University of Massachusetts, Amherst.

Michael D. Ernst (http://homes.cs.washington.edu/~mernst/) researches ways to make software more reliable, more secure, and easier to produce. His primary technical interests are in software engineering, programming languages, type theory, and security, among others.

Copyright held by authors. Publication rights licensed to ACM. $15.00.
DOI:10.1145/2948983
Article development led by queue.acm.org

SQL has a brilliant future as a major figure in the pantheon of data representations.

BY PAT HELLAND

The Singular Success of SQL

IMAGE BY FOCAL POINT

SQL HAS BEEN singularly successful in its impact on the database industry. Nothing has come remotely close to its ubiquity. Its success comes from its high-level use of relational algebra allowing set-oriented operations on data shaped as rows, columns, cells, and tables.

SQL’s impact can be seen in two broad areas. First, the programmer can accomplish a lot very easily with set-oriented operations. Second, the high-level expression of the programmer’s intent has empowered huge performance gains.

This article discusses how these features are dependent on SQL creating a notion of stillness through transactions and a notion of a tight group of tables with schema fixed at the moment of the transaction. These characteristics are what make SQL different from the increasingly pervasive distributed systems.

SQL has a brilliant past and a brilliant future. That future is not as the singular and ubiquitous holder of data but rather as a major figure in the pantheon of data representations. What the heck happens when data is not kept in SQL?

I launched my career in database implementation when Jimmy Carter was president. At the time, there were a couple of well-accepted representations for data storage: the network model was expressed in the CODASYL (Conference/Committee on Data Systems Languages) standard with data organized in sets having one set owner (parent) and multiple members (children); the hierarchical model ensured all data was captured in a tree structure with records having a parent-child-grandchild relationship. Both of these models required the programmer to navigate from record to record.

Then along came these new-fangled relational things. INGRES (and its language QUEL) came from UC Berkeley. System-R (and its language SQL) came from IBM Research. Both leveraged relational algebra to support set-oriented abstractions allowing powerful access to data.

At first, they were really, really, really slow. I remember lively debates with database administrators who fervently believed they must be able to know the cylinder on disk holding their records! They most certainly did not want to change from their hierarchical and
network databases. As time went on, SQL became inexorably faster and more powerful. Soon, SQL meant database and database meant SQL.

A funny thing happened by the early 2000s, though. People started putting data in places other than “the database.” The old-time database people (including yours truly) predicted their demise. Boy, were we wrong!

Of course, for most of us who had worked so hard to build transactional, relational, and strongly consistent systems, these new representations of data in HTML, XML, JSON, and other formats didn’t fit into our worldview. The radicals of the 1970s and 1980s became the fuddy-duddies of the 2000s. A new schism had emerged.

[Pull quote] Distributed transactions across different SQL databases are rare and challenging.

SQL, Values, and Relational Algebra
Relational databases have tables with rows and columns. Each column in a row provides a cell that is of a well-known type. Data Definition Language (DDL) specifies the tables, rows, and columns and can be dynamically changed at any time, transforming the shape of the data.

The fundamental principle in the relational model is that all interrelating is achieved by means of comparisons of values, whether these values identify objects in the real world or indicate properties of those objects. A pair of values may be meaningfully compared, however, if and only if these values are drawn from a common domain.

The stuff being compared in a query must have matching DDL or it doesn’t make sense. SQL depends on its DDL being rigid for the duration of the query.

There is not really a notion of some portion of the SQL data having extensible metadata that arrives with the data. All of the metadata is defined before the query is issued. Extensible data is, by definition, not defined (at least at the receiver’s system).

SQL’s strength depends on a well-defined schema. Its set-oriented nature uses the well-defined schema for the duration of the operations. The data and metadata (schema) must remain still while SQL does its thing.

The Stillness and Isolation of Transactions
SQL is set oriented. Bad stuff happens when the set of data slides around during the computation. SQL is supposed to produce consistent results. Those consistent results are dependent on input data that appears to be unchanging.

Transactions and, specifically, transactional isolation provide the sense that nothing else is happening in the world.

The Holy Grail of transaction isolation is serializability. The idea is to make transactions appear as if they happened in a serial order. They don’t actually have to occur in a serial order; it just has to seem like they do.

In the accompanying figure, the red transaction Ti depends upon changes made by the green transactions (Ta, Tb, Tc, Td, and Tf). The blue transactions (Tk, Tl, Tm, Tn, and To) depend on the changes made by Ti. Ti definitely is ordered after the green transactions and before the blue ones. It doesn’t matter if any of the yellow transactions (Te, Tg, Tj, and Th) occur before or after Ti. There are many correct serial orders. What matters is that the concurrency implemented in the system provides a view that is serializable.

Suddenly, the world is still and set orientation can smile on it.

A Sense of Place
SQL and its relational data are almost always kept inside a single system or a few systems close to each other. Each SQL database is almost always contained within a trust boundary and protected by surrounding application code.

I don’t know of any systems that allow untrusted third parties to access their back-end databases. My bank’s ATM, for example, has never let me directly access its back-end database with Java Database Connectivity (JDBC). So far, the bank has constrained me to a handful of operations such as deposit, withdrawal, or transfer. It’s really annoying! In fact, I can’t think of any enterprise databases that allow untrusted third parties to “party” on their databases. All of them insist on using application code to mitigate the foreigners’ access to the system.

Interactions occur across these systems, but they are implemented with some messages or other data exchange that is loosely coupled to the underlying databases on each side. The messages hit the application code and not the database.
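As a rough illustration of application code standing between callers and the database, here is a minimal sketch in Python using the standard sqlite3 module. The `Bank` class, the `accounts` table, and the method names are hypothetical and invented for this example; they are not taken from the article or from any real banking system. Callers get only a handful of operations, and each transfer runs inside a single transaction so that a failed transfer leaves the data untouched.

```python
import sqlite3


class Bank:
    """Hypothetical application layer: callers never touch the tables directly."""

    def __init__(self):
        # In-memory database; the schema is fixed before any operation runs.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE accounts ("
            " id TEXT PRIMARY KEY,"
            " balance INTEGER NOT NULL CHECK (balance >= 0))"
        )

    def open_account(self, account_id, balance=0):
        with self._db:  # commit on success, roll back on error
            self._db.execute(
                "INSERT INTO accounts VALUES (?, ?)", (account_id, balance)
            )

    def balance(self, account_id):
        row = self._db.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None:
            raise KeyError(account_id)
        return row[0]

    def transfer(self, src, dst, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        try:
            with self._db:  # both updates commit together or not at all
                debit = self._db.execute(
                    "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                    (amount, src),
                )
                credit = self._db.execute(
                    "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, dst),
                )
                if debit.rowcount != 1 or credit.rowcount != 1:
                    # Raising inside the block rolls back any partial update.
                    raise KeyError("unknown account")
        except sqlite3.IntegrityError:
            # The CHECK constraint rejected a negative balance.
            raise ValueError("insufficient funds") from None
```

A caller in this sketch can open accounts, read a balance, and transfer funds, but has no way to issue arbitrary SQL, which loosely mirrors the deposit, withdrawal, and transfer operations the bank example describes.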
Different Places Means Different Times
Multiple databases sharing a transactional scope is extremely rare. When a transaction executes on a database,
DOI:10.1145/2948985
Article development led by queue.acm.org

Microservices aren’t for every company, and the journey isn’t easy.

BY TOM KILLALEA

The Hidden Dividends of Microservices

MICROSERVICES ARE AN approach to building distributed systems in which services are exposed only through hardened APIs; the services themselves have a high degree of internal cohesion around a specific and well-bounded context or area of responsibility, and the coupling between them is loose. Such services are typically simple, yet they can be composed into very rich and elaborate applications. The effort required to adopt a microservices-based approach is considerable, particularly in cases that involve migration from more monolithic architectures. The explicit benefits of microservices are well known and numerous, however, and can include increased agility, resilience, scalability, and developer productivity. This article identifies some of the hidden dividends of microservices that implementers should make a conscious effort to reap.

The most fundamental of the benefits driving the momentum behind microservices is the clear separation of concerns, focusing the attention of each service upon some well-defined aspect of the overall application. These services can be composed in novel ways with loose coupling between the services, and they can be deployed independently. Many implementers are drawn by the allure of being able to make changes more frequently and with less risk of negative impact. Robert C. Martin described the single responsibility principle: “Gather together those things that change for the same reason. Separate those things that change for different reasons.”5 The clear separation of concerns, minimal coupling across domains of concern, and the potential for a higher rate of change lead to increased business agility and engineering velocity.

Martin Fowler argues the adoption of continuous delivery and the treatment of infrastructure as code are more important than moving to microservices, and some implementers adopt these practices on the way to implementing microservices, with positive effects on resilience, agility, and productivity. An additional key benefit of microservices is they can enable owners of different parts of an overall architecture to make very different decisions with respect to the hard problems of building large-scale distributed systems in the areas of persistence mechanism choices, consistency, and concurrency. This gives service owners greater autonomy, can lead to faster adoption of new technologies, and can allow them to pursue custom approaches that might be optimal for only a few or even for just one service.

The Dividends
While difficult to implement, a microservices-based approach can pay dividends to the organization that takes the trouble, though some of the benefits are not always obvious. What follows is a description of a few of the less obvious ones that may make the adoption of microservices worth the effort.
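To make the idea of a service exposed only through its API, with persistence left to the owner's discretion, a little more concrete, here is a minimal sketch in Python. Every name in it (`ProfileStore`, `InMemoryStore`, `SqliteStore`, `ProfileService`) is hypothetical and chosen for illustration only; the article does not prescribe any particular design. The point is simply that consumers call the service interface, so the owner can swap one storage mechanism for another without consumers noticing.

```python
import json
import sqlite3
from typing import Optional, Protocol


class ProfileStore(Protocol):
    """Persistence contract that stays private to the service (hypothetical)."""

    def get(self, user_id: str) -> Optional[dict]: ...

    def put(self, user_id: str, profile: dict) -> None: ...


class InMemoryStore:
    """One possible backend: a plain dictionary."""

    def __init__(self):
        self._rows = {}

    def get(self, user_id):
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None

    def put(self, user_id, profile):
        self._rows[user_id] = dict(profile)


class SqliteStore:
    """Another possible backend: JSON documents in a SQLite table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS profiles"
            " (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def get(self, user_id):
        row = self._db.execute(
            "SELECT body FROM profiles WHERE id = ?", (user_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, user_id, profile):
        with self._db:  # commit on success, roll back on error
            self._db.execute(
                "INSERT OR REPLACE INTO profiles VALUES (?, ?)",
                (user_id, json.dumps(profile)),
            )


class ProfileService:
    """The only surface consumers see; storage is an internal detail."""

    def __init__(self, store: ProfileStore):
        self._store = store

    def rename(self, user_id, name):
        profile = self._store.get(user_id) or {}
        profile["name"] = name
        self._store.put(user_id, profile)
        return profile

    def lookup(self, user_id):
        return self._store.get(user_id)
```

Constructing `ProfileService(InMemoryStore())` or `ProfileService(SqliteStore())` yields the same observable behavior in this sketch, which is the property that lets a service owner change the persistence tier independently.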
enabled, it can lead to innovations by consumers of a set of interfaces that the designers of those interfaces might

to the degree possible, a simple test is to look at the prevalence of meetings between teams (as distinct from within teams). Cross-team meetings suggest

that has embraced permissionless innovation should have a high rate of experimentation and a low rate of cross-team meetings.
vices, it’s easy to get a clear view of a service’s call volume, to stand up different and potentially competing versions of a service, or to build a new service that shares nothing with the old service other than backward compatibility with those interfaces that consumers care about the most.

In a world of permissionless innovation, services can and should routinely come and go. It’s worth investing some effort to make it easier to deprecate services that have not meaningfully caught on. One approach to doing this is to have a sufficiently high degree of competition for resources so that any resource-constrained team that is responsible for a languishing service is drawn to spending most of their time on other services that matter more to customers. As this occurs, responsibility for the unsuccessful service should be transferred to the consumer who cares about it the most. This team may rightfully consider themselves to have been left “holding the can,” although the deprecation decision also passes into their hands. Other teams that wish not to be left holding the can have an added incentive to migrate or terminate their dependencies. This may sound brutal, but it’s an important part of “failing fast.”

Dividend #6: End Centralized Metadata
In Amazon’s early years, a small number of relational databases were used for all of the company’s critical transactional data. In the interest of data integrity and performance, any proposed schema change had to be reviewed and approved by the DB Cabal, a gatekeeping group of well-meaning enterprise modelers, database administrators, and software engineers. With microservices, consumers should not know or care about how data persists behind a set of APIs on which they depend, and indeed it should be possible to swap out one persistence mechanism for another without consumers noticing or needing to be notified.

Dividend #7: Concentrate The Pain
A move to microservices should enable an organization to take on very different approaches to the governance expectations that it has of different services. This will start with a consistent companywide model for data classification and with the classification of the criticality of the integrity of different business processes. This will typically lead to threat modeling for the services that handle the most important data and processes, and the implementation of the controls necessary to serve the company’s security and compliance needs. As microservices proliferate, it can be possible to ensure the most severe burden of compliance is concentrated in a very small number of services, releasing the remaining services to have a higher rate of innovation, comparatively unburdened by such concerns.

Dividend #8: Test Differently
Engineering teams often view the move to microservices as an opportunity to think differently about testing. Frequently, they will start thinking about how to test earlier in the design phase, before they start to build their service. A clearer definition of ownership and scope can provide an incentive to achieve greater coverage. As stated by Yelp in setting forth its service principles, “Your interface is the most vital component to test. Your interface tests will tell you what your client actually sees, while your remaining tests will inform you on how to ensure your clients see those results.”7

The adoption of practices such as continuous deployment, smoke tests, and phased deployment can lead to tests with higher fidelity and lower time-to-repair when a problem is discovered in production. The effectiveness of a set of tests can be measured less by their rate of problem detection and more by the rate of change that they enable.

Warning Signs
The following indicators are helpful in determining that the journey to microservices is incomplete. You are probably not doing microservices if:
• Different services do coordinated deployments.
• You ship client libraries.
• A change in one service has unexpected consequences or requires a change in other services.
• Services share a persistence store.
• You cannot change your service’s persistence tier without anyone caring.
• Engineers need intimate knowledge of the designs and schemas of other teams’ services.
• You have compliance controls that apply uniformly to all services.
• Your infrastructure isn’t programmable.
• You can’t do one-click deployments and rollbacks.

Conclusion
Microservices aren’t for every company, and the journey isn’t easy. At times the discussion about their adoption has been effusive, focusing on autonomy, agility, resilience, and developer productivity. The benefits don’t end there, however, and to make the journey worthwhile, it’s important to reap the additional dividends.

Related articles on queue.acm.org

A Conversation with Werner Vogels
http://queue.acm.org/detail.cfm?id=1142065

The Verification of a Distributed System
Caitie McCaffrey
http://queue.acm.org/detail.cfm?id=2889274

There’s Just No Getting around It: You’re Building a Distributed System
Mark Cavage
http://queue.acm.org/detail.cfm?id=2482856

References
1. Arkko, J. Permissionless innovation. IETF; https://www.ietf.org/blog/2013/05/permissionless-innovation/.
2. Conway, M.E. How do committees invent? Datamation Magazine (1968); http://www.melconway.com/Home/Committees_Paper.html.
3. Gray, J. A conversation with Werner Vogels. ACM Queue 4, 4 (2006); http://queue.acm.org/detail.cfm?id=1142065.
4. Honest Status Page. @honest_update, 2015; https://twitter.com/honest_update/status/651897353889259520.
5. Martin, R.C. The single responsibility principle; http://blog.8thlight.com/uncle-bob/2014/05/08/SingleReponsibilityPrinciple.html.
6. Perera, D. The crypto warrior. Politico; http://www.politico.com/agenda/story/2015/12/crypto-war-cyber-security-encryption-000334.
7. Yelp service principles; https://github.com/Yelp/service-principles.

Tom Killalea was with Amazon for 16 years and now consults and sits on several company boards, including those of Capital One, ORRECO, and MongoDB.

Copyright held by author. Publications rights licensed to ACM.
contributed articles
DOI:10.1145/2858789

The aim is to improve cities’ management of natural and municipal resources and in turn the quality of life of their citizens.

BY RIDA KHATOUN AND SHERALI ZEADALLY

Smart Cities: Concepts, Architectures, Research

from rural areas in search of better jobs and education. Consequently, cities’ services and infrastructures are being stretched to their limits in terms of scalability, environment, and security as they adapt to support this population growth. Visionaries and planners are thus seeking a sustainable, post-carbon economy20 to improve energy efficiency and minimize carbon-emission levels. Along with cities’ growth, innovative solutions are crucial for improving productivity (increasing operational efficiencies) and reducing management costs.

A smart city is an ultra-modern urban area that addresses the needs of businesses, institutions, and especially citizens. Here we should differentiate between a smart city and smart urbanism. The objective of these concepts is the same: the life of citizens. The architects of ancient cities did not take into consideration long-term scalability (housing accessibility, sustainable development, transport systems, and growth), and there is no scalable resource management that may be applied from one decade to another. Unfortunately, smart urbanism is not well represented in smart cities’ development. Smart urbanism must also be considered as an aspect of a smart city, including information-communication technologies. In recent years, a significant increase in global energy consumption and the number of connected devices and other objects has led government and industrial institutions to deploy the smart city concept.

• A smart city is a complex system, meaning even a single vulnerability could affect all citizens’ security.
• Future research must address high energy consumption, security, privacy, lack of investment, smart citizens, and other related challenges to enable secure, robust, scalable smart city development and adoption.
producing more than 60% of all CO2 emissions (http://unhabitat.org/).

Innovative solutions are imperative to address cities’ social, economic, and environmental effects. Those solutions involve three key objectives:

Optimized management of energy resources. This objective could be realized through the Internet of Energy (IoE), or smart grid technology. The IoE[a,b] connects energy grids to the Internet, dispatching units of energy as needed, representing a set of distributed renewable electricity generators linked and managed through the Internet. The IoE enables accurate, real-time monitoring and optimization of power flows;

Decentralized energy production. The IoE concept allows consumers to be energy producers themselves, using renewable energy sources and combined heat and power units; decentralization enables smarter demand-response management of consumers’ energy use; and

Integrated business models and economic models. These models describe how organizations should deliver and reap benefits from their services (such as transport, energy consumption, and charging tolls); such models must be designed to support city development.

Note the “smart city” label is not a marketing slogan. A city is “smart” if it provides better efficiency for urban planning through a variety of technologies. Smart cities are also defined, according to Anthony Townsend in his book Smart Cities (W.W. Norton & Company, 2014), as “places where information technology is combined with infrastructure, architecture, everyday objects, and our bodies to address social, economic, and environmental

[a] http://www.artemis-ioe.eu/ioe_consortium_area.htm
[b] http://www.bdi.eu/BDI_english/download_content/Marketing/Brochure_Internet_of_Energy.pdf

PHOTO BY YOSHIKAZU TSUNO/AFP/GETTY IMAGES
problems.” The European Parliament and new public-transport solutions to sidered before cities are able to reap
proposedc this definition: “A smart city reduce car use; the benefits. Here, we describe some
is a city seeking to address public is- Home energy management. Options of the basic concepts and architectural
sues via information and communica- include timely energy billing, optimal components of smart cities, includ-
tion technology (ICT)-based solutions energy management, saving, perhaps, ing energy management (ISO 50001),
on the basis of a multi-stakeholder, 30%–40% on electricity bills; the Euro- smart homes, vehicular networks,
municipality based partnership.” This pean Commission estimates approxi- smart grids, and quality of life (ISO
is quite broad, encompassing many mately 72% of European electricity con- 37120). We then examine recent smart
fields, while the Japanese definition is more specific, focusing on energy, infrastructure, ICT, and lifestyle. From these definitions, we deduce ICT plays a pivotal role in developing a city that can adapt to the needs of its citizens. By leveraging advanced power systems, networking, and communication technologies, a smart city aims to enhance the lives of its citizens and optimize territorial, economic, and environmental resources. Smart cities promise multiple benefits:
Safety and security. This includes surveillance cameras, enhanced emergency-response services, and automated messages for alerting citizens; real-time information about a city should be available;
Environment and transportation. This entails controlled pollution levels, smart street lights, congestion rules,

[c] http://www.smartcities.at/assets/Publikationen/Weitere-Publikationen-zum-Thema/mappingsmartcities.pdf

sumers will have smart meters by 2020;
Educational facilities. More investment is needed to improve educational opportunities for all, lifelong learning, education through remote learning, and smart devices in classrooms;
Tourism. Preserving a city’s natural resources promotes the growth of tourism; additionally, smart devices offer direct and localized access to information;
Citizens’ health. Using new technologies could improve people’s health; citizens need full access to high-quality, affordable healthcare, and wireless body-area network technology (including sensors attached to the body or clothes and implanted under the skin) can acquire health information (such as heartbeat, blood sugar, and blood pressure) and transmit it in real time or offline through a smartphone to remote servers accessible by healthcare professionals for monitoring or treatment.
Despite this potential, many elements must be understood and con-

city projects around the world, identifying some of the challenges and future research opportunities. We also highlight some of the risks introduced through information systems in the urban environment.

Implementation and Deployment
Designing and deploying smart cities needs experts from multiple fields, including economics, sociology, engineering, ICT, and policy and regulation. Various frameworks describing the architecture of smart cities have been proposed by both industry and academic sources. One of the most widely adapted and adopted models is the reference model proposed by the U.S. National Institute of Standards and Technology. Smart cities are complex systems, often called “systems of systems,” including people, infrastructure, and process components (see Figure 1). Most smart cities models consist of six components: government, economy, mobility, environment, living, and people. The European Parliament Policy Department said in 2014 that 34% of smart cities in Europe have only one such component.

Figure 1. A smart city model. [Figure labels: Smart City; Smart Environment, Smart Mobility, Smart Economy, Smart Governance, Smart People, Smart Living; Internet of Things, Internet of Services, Internet of Data, Internet of People.]

Multiple approaches and methods have been proposed to evaluate smart cities from multiple perspectives, including an urban Internet of Things (IoT) system for smart cities, sustainability, global city performance, future urban environments, urban competitiveness, and resilience. But several fundamental architectural components must be in place to make a city smart.
Essential components. The basic underpinnings of a smart city include five components:
Broadband infrastructure. This infrastructure is pivotal, offering connectivity to citizens, institutions, and organizations. However, today’s Internet lacks the robustness needed to support smart cities’ services and data volume.
It includes both wired and wireless networks. Wireless broadband is important for smart cities, especially with the explosive growth of mobile applications and the popularity and connectivity of smart devices;
E-services. The concept of “electronic services” involves using ICT in the provision of services, including sales, customer service, and delivery. The Internet is today the most important way to provide them (such as for tourism, city environment, energy, transport, security, education, and health). A European Union research initiative (called the innovation framework H2020) focuses on developing such e-services; and
Open government data. Open government data (OGD) means data can be used freely, reused, and redistributed by anyone. A multinational initiative to promote worldwide adoption of OGD was launched in 2012 with input from the Microsoft Open Data initiative, Organization for Economic Cooperation and Development, and U.S. Open Data Initiative (http://www.data.gov).
A smart city can be seen as an open data generator. Implementation questions include: How can we efficiently sort and filter the data being produced? Who are the legal owners of the data? And what are the restrictions on the data? Navigating OGD can be related to the open data barriers of task complexity and information quality. With OGD, there is no explanation of the data’s meaning, and discovering the appropriate data is difficult. Data is often duplicated, and data formats and datasets are likewise often too complex for humans and even machines to handle. In 2015, the Spanish standardization normative UNE178301 was published to help cities evaluate the maturity of their own open data projects by relying on five characteristics of their data: political, legal, organizational, technical, and social. OGD will have an enormously positive effect on services offered to citizens, thus improving their daily lives. In this context, linked open data (LOD) could support complex and interdisciplinary data mining analysis.[12] LOD is complex because data viewed in isolation may be irrelevant but when aggregated from various sources can yield more meaningful results and fresh insights. LOD is used in such applications as linked data in libraries for creating globally interlinked library data, linked data in biomedicine for creating orthogonal interoperable reference ontologies, and linked government data for improving internal administrative processes.
Sustainable infrastructures. The International Electrotechnical Commission (IEC) says cities aiming to develop into smart cities should start with three pillars of sustainability: economic, social, and environmental. One of the first steps in addressing sustainability is to increase resource efficiency in all domains (such as energy, transport, and ICT). An efficient and sustainable ICT infrastructure is essential for managing urban systems development. Adepetu et al.[1] explained how an ICT model works and can be used in sustainable city planning. For a sustainable ICT infrastructure, they defined various green performance indicators for ICT resource use, application lifecycle, energy impact, and organizing impact.
E-governance. This component focuses on a government’s performance through the electronic medium to facilitate an efficient, speedy, transparent process for disseminating information to the public and also for performing administration activities. An e-government system consists of three components: government-to-citizen, government-to-business, and government-to-government. E-government allows citizens to fulfill their civic and social responsibilities through a Web portal. A growing number of governments around the world are deploying Web 2.0 technologies, an architecture referred to as “e-government 2.0,” linking citizens, businesses, and government institutions in a seamless network of resources, capabilities, and information exchange.
Fundamental technologies. The design and implementation of smart cities also involves a number of technologies:
Ubiquitous computing. Ubiquitous devices include heterogeneous ones that communicate directly through heterogeneous networks. The UrBan Interactions Research Program[d] at the University of Oulu, Oulu, Finland, studies urban computing, interaction among urban spaces, humans,

[d] Open Ubiquitous Oulu; http://www.ubioulu.fi/en/home
AUGUST 2016 | VOL. 59 | NO. 8 | COMMUNICATIONS OF THE ACM 49
contributed articles
be continuously connected, in public places, in public transportation, and at home, in order to share their knowledge and experience.

enhanced to provide a better understanding of networks, services, users, and users’ devices with various access connections. Lee[13] also identified six capabilities and functions of smart ubiquitous networks including context awareness, content awareness, programmability, smart resource management, autonomic network management, and ubiquity.
Big data. Traditional database management tools and data processing applications cannot process such a huge amount of information. Data from multiple sources (such as email messages, video, and text) are distributed in different systems. Copying all of it from each system to a centralized location for processing is impractical for performance reasons. In addition, the data is unstructured. Deploying thousands of sensors and devices in a city poses significant challenges in managing, processing, and interpreting the big data they generate. Big data,[10] reflecting such properties as volume, variety, and velocity, is a broad term for complex quantitative data that requires advanced tools and techniques for analyzing and extracting relevant information. Several challenges must be addressed, including capture, storage, search, processing, analysis, and visualization. Also needed is a scalable analytics infrastructure to store, manage, and analyze large volumes of unstructured data. Smart cities can use multiple hardware and software technologies to process the big data being produced, including parallel system architectures (such as cluster-based high-performance systems and cloud platforms), parallel file systems and parallel input/output (such as parallel file systems for big data storage and NoSQL databases for big data), programming (such as low-level programming models, skeletal parallel programming, and generic parallel programming), data management on multilevel memory, and hierarchy task scheduling.[15]
Networking. Networking technologies enable devices and people to have

their throughput and transmission range. New wireless technologies (such as WiMAX and Long-Term Evolution) are unsuitable due to their high energy consumption. Novel Wi-Fi technology (such as by the IEEE 802.11ah Task Group) could be an efficient solution for smart city services.[11] IEEE 802.11ah aims to help design an energy-efficient protocol allowing thousands of indoor and outdoor devices to work in the same area and a transmission range up to 1km at default transmission power of 200mW.[11]
IoT. One of the main IoT goals is to make the Internet more immersive and pervasive. As a network of highly connected devices, IoT technology works for a range of heterogeneous devices (such as sensors, RFID tags, and smartphones). Multiple forms of communications are possible among such “things” and devices. IoTs must be designed to support a smart city’s vision in terms of size, capability, and functionality, including noise monitoring, traffic congestion, city energy consumption, smart parking meters and regulations, smart lighting, automation, and the salubrity of public buildings.[11] They must exploit the most advanced communication technologies, thus supporting added-value services for a city’s administration and citizens.
Cloud computing. Cloud computing enables network access to shared, configurable, reliable computing resources. The cloud is considered a resource environment that is dynamically configured to bring together testbeds, applets, and services in specific instances where people’s social interaction would call for such services;
Service-oriented architectures (SOAs). An SOA is a principle for structuring software around services. A smart city’s development should focus on SOA-based design architectures to address its challenges. A smart city thus requires a new IT infrastructure, from both a technical and an organizational perspective.
Cybersecurity architectures. Smart cities pose challenges to the security and privacy of citizens and government alike. The security issues associated with the information produced in a smart city extend to relationships among those citizens, as well as their personal safety. Some smart cities are already confronted by identity spoofing, data tampering, eavesdropping, malicious code, and lack of e-services availability. Other related challenges include scalability, mobility, deployment, interoperability (of multiple technologies), legal, resources, and latency. Critical infrastructure needs protection from attacks that could cripple or severely damage a city’s ability to function, from industrial plants to essential services, including access to electricity, water, and gas. Attackers exploiting vulnerabilities in industrial control systems (such as the supervisory control and data acquisition, or SCADA, systems that manage it) can cause significant disruption in service delivery. In vehicular ad hoc networks (VANETs), the IEEE 1609.2-2013 standard recommends using pseudonymous authentication for protecting the location privacy of vehicles. Adding layers of security (such as VANETs, SCADA, and mobile networks) in an independent way will make the architecture of a smart city very complex. A smart city thus requires a cybersecurity architecture by design.

Projects and Standardization Efforts
Many cities have piloted their own smart city projects; see Table 1 for specific smart city developments in Asia, Europe, and North America.
Smart cities incorporate digital services as part of a livable and sustainable environment for their citizens. Along with their benefits comes a set of new issues regarding, say, open data and interoperability for policymakers, companies, and citizens alike. Cities need performance-evaluation indicators to measure how much they might possibly improve quality of life and sustainability. Performance evaluation indicators often lack standardization, consistency, or comparability from city to city. A series of international standards is being developed to provide a holistic, integrated approach to sustainable development and resilience under ISO/TC 268 (sustainable cities and communities standard).
ISO 37120 establishes a set of standard performance-evaluation indicators that provide a uniform approach to what is measured and how that measurement is to be undertaken. The International Telecommunication Union defines various key performance indicators for smart sustainable cities through the standard ISO/TR 37150 for fire and emergency response, health, education, safety, transportation, energy (the percentage of a city’s population with authorized electrical service), water (the percentage of a city’s population with a potable water supply service), social equity, technology and innovation (such as number of Internet connections per 100,000 people), CO2 levels and reduction strategies, and buildings (such as energy consumption of residential buildings). Fujitsu has also proposed a performance indicator for ICT in smart cities, using such factors as a city’s environmental impact, the ratio of renewable energy to total energy consumed, and a community’s power-outage frequency rate.
Some type of benchmarking method is pertinent to compare smart cities. Various benchmarking methods were proposed in 2014 for such comparison. For instance, Pires et al.[19] analyzed a Portuguese initiative that uses common indicators to benchmark sustainable development across 25 Portuguese cities and municipalities.[e] The Smarter City assessment tool developed by IBM can be used to measure a city’s performance against indicators for each smart city system. It identifies opportunities for improvement by evaluating component effectiveness, taking into account their degree of modernity, CO2 emissions, and the city’s energy management. This analysis is based on multiple indicators to measure the current situation and the potential for improving a city for the following components of civil society: civic life (degree of modernity, public safety, education, and housing); economic life (business attraction strategy and online services for businesses); mobility and transport (wireless networks and telecommunications infrastructures); energy management (production rate of renewable energy, smart metering, and CO2 emissions); water management (water quality and smart metering); and city services. Another smart city tool, developed by Boyd Cohen[f] of Universidad del Desarrollo in Chile, includes the environment, mobility, governance, economy, people, and living components (such as health, safety, cultural level, and happiness).
Table 2 outlines the benchmarking methods aimed at measuring smart cities from multiple perspectives.
Various organizations and academic institutions have been working on smart city models. As mentioned earlier, the IEC uses three pillars of sustainability (economic, social, and environmental) to develop smart cities. IBM combines instrumentation, interconnection, and intelligence in its smart cities model. Table 3 outlines some of the efforts being undertaken by various organizations.

Challenges and Research Opportunities
Here, we highlight some of the challenges faced by smart cities while exploring research opportunities that need more attention to assist smart city development and adoption.
Challenges. The following are the most noteworthy challenges to be addressed.
Lack of investment. The concept of smart cities reflects strong potential for investment and business opportunities.

[e] Smart Cities Portugal; http://www.inteli.pt/uploads/documentos/documento_1400235009_2055.pdf
[f] http://smartcitiescouncil.com/resources/smart-city-index-master-indicators-survey

Pires et al.[19] | Local sustainable development indicators | Calculation of a single percentage synthesizing multiple themes: environmental sustainable development education, energy, water, transport, noise, agriculture, tourism, nature conservation, marine and coastal environment, and biodiversity. | Quantitative
IBM* | IBM smarter city assessment tool | A tool to measure a city’s performance, perform benchmarks, and identify challenges and opportunities for improvement; tool relies on multiple themes: civic life, economic life, mobility and transport, energy management, water management, and city services. | Qualitative, quantitative
Lee et al.[14] | Framework for building smart cities | Taking various practical perspectives from case studies in San Francisco and Seoul Metropolitan City, the framework relies on urban openness, service innovation, partnership formation, urban proactiveness, smart city infrastructure integration, and smart city governance. | Qualitative
Desouza et al.[6] | Resilience framework for cities | An approach to designing, planning, and managing for resilience, including evaluation of cultural and process dynamics within cities. | Qualitative
Debnath et al.[5] | Benchmarking for smart-transport cities | A composite scoring system to measure the “smartness index” of a city’s transportation system; proposed indicators rely on private transport (traffic flow prediction, parking information sharing, paying tolls/parking charges/enforcement fines, and automated and coordinated traffic signal control), public transport (detection of passengers, passenger information management, and detection of passengers), and emergency transport (with emergency vehicles able to provide a priority signal). | Quantitative
* http://www.ibm.com/smarterplanet/us/en/smarter_cities/solutions/solution/S868511G94528M58.html
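Several of the benchmarking methods above reduce many indicators to a single figure (a single percentage, or a composite “smartness index”). The sketch below shows one common way such a composite can be computed: min-max normalization of each indicator followed by a weighted average. It is an illustrative assumption, not the scoring rule of any cited method; the indicator names, bounds, and weights are invented for the example.

```python
from typing import Dict, Tuple


def normalize(value: float, worst: float, best: float) -> float:
    """Map a raw indicator to [0, 1], where `best` scores 1 and `worst` scores 0.

    Passing best < worst handles indicators where lower is better (e.g., CO2).
    """
    if best == worst:
        raise ValueError("best and worst bounds must differ")
    score = (value - worst) / (best - worst)
    return min(1.0, max(0.0, score))  # clamp values outside the bounds


def composite_score(
    indicators: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],  # name -> (worst, best)
    weights: Dict[str, float],
) -> float:
    """Weighted average of normalized indicators, expressed as a percentage."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[name] * normalize(indicators[name], *bounds[name])
        for name in weights
    )
    return 100.0 * weighted / total_weight


# Hypothetical city data; names, bounds, and equal weights are assumptions.
indicators = {
    "renewable_share_pct": 30.0,
    "co2_tonnes_per_capita": 6.0,
    "internet_connections_per_100k": 50_000.0,
}
bounds = {
    "renewable_share_pct": (0.0, 100.0),
    "co2_tonnes_per_capita": (10.0, 0.0),  # lower is better
    "internet_connections_per_100k": (0.0, 100_000.0),
}
weights = {name: 1.0 for name in indicators}

score = composite_score(indicators, bounds, weights)
print(f"composite score: {score:.1f}%")
```

The choice of bounds and weights drives the result, which is one reason indicators are hard to compare from city to city without a shared standard.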
neutrality in public transport by 2025. The bus system in Copenhagen, Denmark, costs €125 million annually. In India, the national government’s annual budget for development of 100 smart cities[h] is $1.27 billion, adding 11.5 million homes annually. In the European Union, smart city market projections are expected to exceed $1 trillion by the end of 2016. China’s future smart cities allocations exceed $322 billion for more than 600 cities nationwide.[i] All these projects demonstrate how substantial the rate of investment in smart cities is. However, if some of the challenges (such as cybersecurity) are not addressed early, the ultimate cost of smart cities will only increase.

Figure 2. Smart city infrastructure investments by industry, 2014–2023. [Chart: investment in $ millions (axis from $0 to $30,000) for each year from 2014 to 2023; series: North America, Europe, Asia Pacific, Latin America, Middle East and Africa. Source: Navigant Research.]

High energy consumption. The U.S. Energy Information Administration estimates approximately 21% of the world’s electricity generation was from renewable energy in 2011, with a projected increase to nearly 25% by 2040. The absence of natural resources in the

Figure 3. Estimating future energy consumption. [Chart: World Energy Consumption by Fuel; history up to 2010, projections thereafter.]
city’s functionality that, exploited correctly, yields dividends for both citizens and the city.
Privacy. Privacy will play a pivotal role in any smart city strategy. Citizens interact with smart city services through their smartphones and computers connected through heterogeneous networks and systems. It is thus imperative smart cities, founded on the use of ICT, be adept at handling important privacy issues (such as eavesdropping and confidentiality). Domingo-Ferrer[7] divided privacy into three dimensions: respondent, user, and owner. Martinez-Ballesté et al.[16] proposed the concept of citizens’ privacy based on statistical disclosure control, including methods for safeguarding the confidentiality of information about individuals when releasing their data. Examples of such methods are private information retrieval (obtaining database information belonging to someone and hiding it), privacy-preserving data mining (collaboration between entities to get results without sharing all the data), location privacy, anonymity and pseudonyms, privacy in RFID, and privacy in video surveillance.
Cyberattacks. As with any infrastructure, smart cities are prone to cyberattack, and the current attack surface for cities is wide open. IOActive Labs identified several causes of cyberattack: lack of cybersecurity testing, poor or nonexistent security features in connected devices, poor implementation of security features, encryption (outdated and weak encryption algorithms), lack of computer emergency response teams, large and complex attack surfaces, patch deployment issues, insecure legacy systems, lack of cyberattack emergency plans, and denial of service (DoS).
A 2015 workshop[j] identified several challenges, including vulnerabilities in the transfer of data, physical consequences for cyberattacks, collection and storage of large amounts of data in the cloud, and exploitation of city data by attackers.
[j] Designed-In Cybersecurity for Smart Cities Workshop, May 2015; http://www.nist.gov/cps/cybersec_smartcities.cfm
Detecting behavioral anomalies in daily human life is very important for developing smart systems. Table 4 outlines some research anomaly-detection frameworks being proposed to detect behavioral anomalies in human daily life in smart-home and smart-grid environments in the context of smart cities. Multiple challenges include characterization of the behaviors of sensor nodes (such as the accuracy of detection, false positive rate, and low computational cost).
Research opportunities. Consider these research opportunities.
IoT management. The IoT needs an efficient, secure architecture that enhances urban data harvesting. As others have noted, ubiquitous and collaborative urban sensing integrated with smart objects can provide an intelligent environment. Otherwise, packet latencies and packet loss are inevitably not controllable. One such proposal is the Mobile Ad hoc Networks (MANET) coordination protocol to opportunistically exploit MANET nodes as mobile relays for the fast collection of urgent data from wireless sensor networks without sacrificing battery lifetime. Simulation results show that their cluster formation protocol is reliable and always delivers over 98% of packets in street and square scenarios. Other issues, including the convergence of IoT and intelligent transportation systems, require further investigation.
Data management. Data plays a key role in a smart city. A huge quantity of data will be generated by smart cities; understanding, handling, and treating it will be a challenge. However, mobile phone data can help achieve several smart city objectives. Smartphone data can be used to develop a variety of urban applications. For example, transportation analysis through mobile phone data can be applied for estimating road traffic volume and transport demands. Real-time information from mobile-phone data about the origins of visitors combined with taxis’ Global Positioning System data could help manage transportation resources, as in, say, the public’s future demand for taxis.
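As a rough illustration of the data management point above (estimating transport demand from mobile phone data), the sketch below averages historical arrivals per zone and hour. It assumes a hypothetical, already anonymized record format of `(day, hour, zone)` tuples and uses a plain historical mean; it is not a method taken from the article or its references.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, str]  # (day label, hour of day, zone), all hypothetical


def hourly_demand(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Return the mean number of arrivals per (zone, hour) across observed days."""
    counts: Counter = Counter()
    days = set()
    for day, hour, zone in records:
        counts[(zone, hour)] += 1
        days.add(day)
    if not days:
        return {}
    # Divide by the number of distinct days so the result is a per-day average.
    return {key: count / len(days) for key, count in counts.items()}


records = [
    ("mon", 8, "station"),
    ("mon", 8, "station"),
    ("tue", 8, "station"),
    ("tue", 18, "stadium"),
]
demand = hourly_demand(records)
print(demand[("station", 8)], demand[("stadium", 18)])
```

A dispatcher could use such averages as a baseline and then adjust them with real-time signals (such as taxi GPS positions), which is the combination the text suggests.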
Zhu et al.[24] | Spatial anomaly (sleeping at the wrong place); timing anomaly (sleepwalking at midnight); duration anomaly (working on the computer for a long time); sequence anomaly (working on the computer for a very long time without eating) | Dynamic Bayesian network; maximum-likelihood estimation algorithm; and Laplace smoothing | Accuracy of 99%, low false positive rate | User intervention needed
Usman et al.[23] | Transmission errors, node faults, or attacks | Fuzzy logic-based cross-layer rule-base | Accuracy of 98%, low energy consumption, simplicity | Tested on limited profiles of sensor nodes
Mitchell et al.[17] | Attacks performed by a compromised device in smart grid | Behavior-rule based intrusion detection system | High intrusion detection rate, false positives less than 6% | No repair strategies

British Standards Institution | Definition of standardization strategy | British Standards Institution | Definition of standardization strategy | Develop a guide for strategies for smart cities and communities, a smart city data concept model, and collaborative relationship management
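Among the anomaly-detection rows above, one technique pairs maximum-likelihood estimation with Laplace smoothing. The sketch below shows only that estimation step in a deliberately simplified form: it estimates P(location | hour) from counts with add-alpha smoothing and flags low-probability observations as spatial or timing anomalies. It is not the dynamic Bayesian network of Zhu et al.; the class name, room labels, and the 0.1 threshold are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


class LocationModel:
    """Smoothed estimate of P(location | hour) from observed (hour, location) pairs."""

    def __init__(self, locations: List[str], alpha: float = 1.0) -> None:
        self.locations = list(locations)
        self.alpha = alpha  # alpha = 1.0 is classic Laplace (add-one) smoothing
        self.pair_counts: Counter = Counter()
        self.hour_counts: Counter = Counter()

    def fit(self, observations: Iterable[Tuple[int, str]]) -> None:
        for hour, location in observations:
            self.pair_counts[(hour, location)] += 1
            self.hour_counts[hour] += 1

    def probability(self, hour: int, location: str) -> float:
        # Smoothing keeps unseen (hour, location) pairs from getting probability 0.
        numerator = self.pair_counts[(hour, location)] + self.alpha
        denominator = self.hour_counts[hour] + self.alpha * len(self.locations)
        return numerator / denominator

    def is_anomalous(self, hour: int, location: str, threshold: float = 0.1) -> bool:
        return self.probability(hour, location) < threshold


model = LocationModel(["bedroom", "kitchen", "bathroom"])
model.fit([(2, "bedroom")] * 18)  # 18 nights observed in the bedroom at 2 a.m.

print(model.probability(2, "bedroom"))   # (18 + 1) / (18 + 3)
print(model.is_anomalous(2, "kitchen"))  # (0 + 1) / (18 + 3) is below 0.1
print(model.is_anomalous(12, "kitchen")) # unseen hour falls back to 1/3
```

With no observations for an hour, the smoothed estimate is uniform over locations, so the model raises no alarm until it has evidence of a routine; that behavior is a consequence of the smoothing, not an extra rule.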
posed integrating multimedia, human factors, user-centered system methodology, and design principles. Smart cities can provide solutions to many sustainability problems, but their cohesive development requires an effective policy to be in place as part of any solution.
Information system risks. In a smart city, everything is interconnected, including the public water system, traffic control, public transportation, and critical infrastructure. Each one involves its own vulnerabilities. Although a smart city is a complex system, its interconnected nature means a single vulnerability could greatly affect citizens’ security; for example, an attacker might be able to connect to the electric power system to gain access to the network and alter public transportation to potentially paralyze intelligent transportation systems, with thousands of passengers on board at rush hour. The attacker could also launch false alarms and modify traffic lights and controllers. Finding workable solutions is critical; otherwise the public will not trust smart city projects or deem them viable. Devices (such as smartphones) used in the IoT depend on security features (such as the ability to do secure email, secure Web browsing, and other transactions). These devices need efficient, secure implementations of all these features in a smart city. In this context, elliptic curve cryptography (ECC) is considered the most trusted solution for providing security on resource-constrained devices and embedded systems. ECC uses public key cryptography and is based on the algebraic structure of elliptic curves over finite fields. ECC advantages include the ability to provide the same level of security as other cryptographic algorithms but with smaller keys, low memory requirements and computational overheads, and much faster computations.
The scientific community must address these cybersecurity projects. Unresolved challenges and opportunities for participation involve DoS-attack detection for distribution systems, cryptographic countermeasures, and authentication in IoT and critical infrastructure, as well as in key management.

Conclusion
The strong interest by municipal and local governments worldwide in smart cities stems from their ability to improve their citizens’ quality of life. Here, we described some of the basic concepts of smart cities, identifying challenges and future research opportunities to enable large-scale deployment of smart cities. Developers, architects, and designers should now focus on aspects of IoT management, data management, smart city assessment, VANET security, and renewable technologies (such as solar power). We underscore that, when designing smart cities, security and privacy remain considerable challenges that demand proactive solutions. We hope to see smart city developers, architects, and designers provide scalable, cost-effective solutions to address them in the future.

Acknowledgments
We express our gratitude to Teodora Sanislav, Nils Walravens, and Lyes Khoukhi for their comments and suggestions on early drafts of this article, helping improve its content, quality, and presentation. We would also like to thank the reviewers for their valuable feedback and reviews.

References
1. Adepetu, A., Arnautovic, E., Svetinovic, D., and de Weck, L. Complex urban systems ICT infrastructure modeling: A sustainable city case study. IEEE Transactions on Systems, Man, and Cybernetics: Systems 44, 3 (Mar. 2014), 363–374.
2. Akselrod, G., Argyropoulos, C., Hoang, T., Ciracì, C., Fang, C., and Huang, J. Probing the mechanisms of large Purcell enhancement in plasmonic nanoantennas. Nature Photonics 8 (2014), 835–840.
3. Amaba, B. Industrial and business systems for smart cities. In Proceedings of the First International Workshop on Emerging Multimedia Applications and Services for Smart Cities. ACM Press, New York, 2014, 21–22.
4. Angelidou, M. Smart cities: A conjuncture of four forces. Cities 47 (Sept. 2015), 95–106.
5. Debnath, A., Chin, H., Haque, M., and Yue, B. A methodological framework for benchmarking smart transport cities. Cities 37 (Apr. 2014), 47–56.
6. Desouza, K. and Flanery, T. Designing, planning, and managing resilient cities: A conceptual framework. Cities 35 (Dec. 2013), 89–99.
7. Domingo-Ferrer, J. A three-dimensional conceptual framework for database privacy. Chapter in Secure Data Management, W. Jonker and M. Petkovic, Eds. Springer, Berlin, Heidelberg, Germany, 2007, 193–202.
8. Ferreira, D. AWARE: A Mobile Context Instrumentation Middleware to Collaboratively Understand Human Behavior. Ph.D. dissertation, Faculty of Technology, University of Oulu, Oulu, Finland, 2013; http://jultika.oulu.fi/Record/isbn978-952-62-0190-0
9. Huang, L., Chen, X., Mühlenbernd, H., Zhang, H., Chen, S., Bai, B., Tan, Q., Jin, G., Cheah, K., Qiu, C., Li, J., Zentgraf, T., and Zhang, S. Three-dimensional optical holography using a plasmonic metasurface. Nature Communications 4, 2808 (Nov. 2013).
10. Jagadish, H.V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J., Ramakrishnan, R., and Shahabi, C. Big data and its technical challenges. Commun. ACM 57, 7 (July 2014), 86–94.
11. Khorov, E., Lyakhov, A., Krotov, A., and Guschin, A. A survey on IEEE 802.11ah: An enabling networking technology for smart cities. Computer Communications 58, 1 (Mar. 2015), 53–69.
12. Lausch, A., Schmidt, A., and Tischendorf, L. Data mining and linked open data – New perspectives for data analysis in environmental research. Ecological Modelling 295, 10 (Jan. 2015), 5–17.
13. Lee, C., Gyu, M., and Woo, S. Standardization and challenges of smart ubiquitous networks. IEEE Communications Magazine 51, 10 (Oct. 2013), 102–110.
14. Lee, J., Gong, M., and Mei-Chih Hu, H. Towards an effective framework for building smart cities: Lessons from Seoul and San Francisco. Technological Forecasting and Social Change 89 (Nov. 2014), 80–99.
15. Ma, Y., Wu, H., Wang, L., Huang, B., Ranjan, R., Zomaya, A., and Jie, W. Remote sensing big data computing: Challenges and opportunities. Future Generation Computer Systems 51 (Oct. 2015), 47–60.
16. Martinez-Balleste, A., Perez-Martinez, P., and Solanas, A. The pursuit of citizens’ privacy: A privacy-aware smart city is possible. IEEE Communications Magazine 51, 6 (June 2013), 136–141.
17. Mitchell, R. and Chen, I. Behavior-rule-based intrusion detection systems for safety-critical smart grid applications. IEEE Transactions on Smart Grid 4, 3 (Sept. 2013), 1254–1263.
18. Perttunen, M., Riekki, J., Kostakos, V., and Ojala, T. Spatio-temporal patterns link your digital identities. Computers, Environment and Urban Systems 47 (Sept. 2014), 58–67.
19. Pires, S., Fidélis, T., and Ramos, T. Measuring and comparing local sustainable development through common indicators: Constraints and achievements in practice. Cities 39 (Aug. 2014), 1–9.
20. Rifkin, J. The Zero Marginal Cost Society: The Internet of Things, The Collaborative Commons, and The Eclipse of Capitalism. St. Martin’s Press, St. Martin’s Griffin, New York, 2015.
21. Sheldon, M., Van de Groep, J., Brown, A., Polman, A., and Atwater, H. Plasmoelectric potentials in metal nanostructures. Science 346, 828 (Nov. 2014), 828–831.
22. Silva, A., Monticone, F., Castaldi, G., Galdi, V., Alù, A., and Engheta, N. Performing mathematical operations with metamaterials. Science 343, 6167 (Jan. 2014), 160–163.
23. Usman, M., Muthukkumarasamy, V., and Wu, X. Mobile agent-based cross-layer anomaly detection in smart home sensor networks using fuzzy logic. IEEE Transactions on Consumer Electronics 61, 2 (May 2015), 197–205.
24. Zhu, C., Sheng, W., and Liu, M. Wearable sensor-based behavioral anomaly detection in smart assisted-living systems. IEEE Transactions on Automation Science and Engineering 12, 4 (Oct. 2015), 1225–1234.

Rida Khatoun is an associate professor at Telecom ParisTech, Paris, France.
Sherali Zeadally is an associate professor in the College of Communication and Information at the University of Kentucky, Lexington, KY.

© 2016 ACM 0001-0782/16/08 $15.00
AUGUST 2016 | VOL. 59 | NO. 8 | COMMUNICATIONS OF THE ACM 57
contributed articles
DOI:10.1145/2964342

Adaptive Computation: Legacy of

John H. Holland’s general theories of adaptive processes apply across biological, cognitive, social, and computational systems.

BY STEPHANIE FORREST AND MELANIE MITCHELL

contact with. As a result, even though he received what was arguably one of the world’s first computer science Ph.D. degrees in 1959,23 his contributions are sometimes better known outside computer science than within.

Holland is best known for his invention of genetic algorithms (GAs), a family of search and optimization methods inspired by biological evolution. Since their invention in the 1960s, GAs have inspired many related methods and led to the thriving field of evolutionary computation, with widespread scientific and commercial applications. Although the mechanisms and applications of GAs are well known, they were only one offshoot of Holland’s broader motivation: to develop a general theory of adaptation in complex systems.

Here, we consider this larger frame-

of adaptive systems: discovery and dynamics in adaptive search; internal models and prediction; exploratory

Exploratory models of complex adaptive systems lead to insights about essential mechanisms and principles, a use of modeling that is different from simply making accurate predictions.
mentioned the Krebs cycle, a core cellular metabolic pathway that is highly conserved across living systems.) Successful individuals are discovered in stages, first by finding useful building blocks through stochastic sampling, and over time recombining them to create higher-fitness individuals out of

ized “two-armed bandit” problem. Given a slot machine with two arms, each of which has an unknown payoff probability, how should you allocate trials (pulls) between the arms so as to maximize your total payoff? Holland argued that the optimal strategy allocates trials to the observed best arm at

blocks making up that individual. The question of balancing exploitation and exploration (how to optimally allocate trials to different arms based on their observed payoffs) now becomes the question of how to sample optimally in the vast space of possible building blocks, based on their
estimated contribution to fitness. Evolution deals in populations of individuals, of course, not building blocks. There is no explicit mechanism that keeps statistics on how building blocks contribute to fitness. Holland’s central idea here is that nearly optimal building-block sampling occurs implicitly, as an emergent property of evolutionary population dynamics.

Holland’s early papers10,11 and his influential 1975 book Adaptation in Natural and Artificial Systems12 developed a general, formal setting in which these ideas could be expressed mathematically. It was this formalization that led to the invention of genetic algorithms that featured stochastic population-based search, as well as crossover between individuals as a critical operation that allows successful building blocks to be recombined and tested in new contexts.

However, Holland’s aim was more general than a new class of algorithms. He aspired to develop an interdisciplinary theory of adaptation, one that would inform, say, biology as much as computer science.5 The later, successful application of genetic algorithms to real-world optimization and learning tasks was, for Holland, just icing on the cake. His broader view of adaptation has inspired many and engendered criticism from others, leading to spirited intellectual debates and controversies. Most controversial is the extent to which it can be demonstrated, either empirically or mathematically, that the behavior of adaptive systems is actually governed by Holland’s proposed principles. Regardless of one’s position on this question, the ideas are compelling and provide a set of unifying concepts for thinking about adaptation.

Internal Models and Prediction
Internal models are central to Holland’s theory of adaptive systems. He posited that all adaptive systems create and use internal models to prosper in the face of “perpetual novelty.”

Models can be tacit and learned over evolutionary time, as in the case of bacteria swimming up a chemical gradient, or explicit and learned over a single lifespan, as in the case of cognitive systems that incorporate experience into internal representations through learning. In Holland’s view, the key activity of an adaptive agent involves building and refining these data-driven models of the environment.

In his second book, Induction,14 Holland and his co-authors proposed inductive methods by which cognitive agents can construct, and profit from, internal models by combining environmental inputs and rewards with stored knowledge. In their framework, an internal model defines a set of equivalence classes over states of the world, together with a set of transition rules between the equivalence classes, all of which are learned over time based on environmental rewards (or punishments). The many-to-one mapping from states of the world to states of the model (the equivalence classes) is called a homomorphism. Models that form valid homomorphisms with the part of the world being modeled allow the system to make accurate predictions. In Holland’s conception, the equivalence classes are initially very general, as with, say, “moving object” and “stationary object.” Through experience and learning, these classes can be specialized into more useful and precise subclasses, as in, say, “insect” and “nest.” Over time, the adaptive system builds up a default hierarchy of rules covering general cases and refinements for specific classes.

At the time of Holland’s work, the idea of default hierarchies was prevalent in knowledge representation systems. Holland made two key contributions to this literature. The first was his emphasis on homomorphisms as a formal way to evaluate model validity, an idea that dates back to W. Ross Ashby’s An Introduction to Cybernetics.2 Holland’s student Bernard Zeigler developed this idea into a formal theory of computer modeling and simulation.30 Even today, these early homomorphic theories of modeling stand as the most elegant approach we know of to characterize how consistent a model is with its environment and how an intelligent agent, human or artificial, can update a model to better reflect reality.

Holland’s second key contribution was describing a computational mechanism, the “learning classifier system,”13,21 to illustrate how a cognitive system could iteratively build up a detailed and hierarchical model of its environment to enhance survival. The key learning elements of this method, the “bucket-brigade” algorithm, combined with a genetic algorithm, presaged many of the ideas in modern reinforcement learning, notably the temporal-difference methods introduced by Sutton and Barto.28

Holland’s inspiration for classifier systems came from several different disciplines, including Hebbian learning, artificial intelligence, evolutionary biology, economics, psychology, and control theory. Knowledge representation in the form of a population of “if-then” rules was a natural choice due to its popularity in AI at the time, as well as Holland’s early work on modeling Hebbian cell assemblies: “In Hebb’s view, a cell assembly makes a simple statement: If such and such an event occurs, then I will fire for a while at a high rate.”29 The if-then rules, when activated, compete to post their results on a shared “message list,” serving as the system’s short-term memory, again inspired by Hebb’s work, as well as by AI blackboard systems of the day. Unlike blackboard systems, however, new rules are generated automatically in a trial-and-error fashion, and can be selected and recombined by a genetic algorithm.

Successful rules are strengthened over time if their predictions lead to positive rewards from the environment (and weakened otherwise) through a credit-assignment method called the bucket-brigade algorithm, in which rules gaining rewards from the environment, or from other rules, transfer some of their gains to earlier-firing “stage-setting” rules that set up the conditions for the eventual reward. Holland credited Arthur Samuel’s pioneering work on machine learning applied to checkers26 as a key inspiration for these ideas.

Holland was primarily interested in how the two learning mechanisms (discovery of new rules and apportioning credit to existing rules) could work together to create useful default hierarchies of rules. He emphasized that the competition inherent in the learning and action mechanisms would allow the system to adapt to a continually changing environment without losing what it had learned in the past.
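The credit-assignment idea behind the bucket brigade can be illustrated with a minimal sketch. This is not Holland's classifier system: it assumes a fixed chain of rules that always fire in the same order, omits rule matching, message lists, and the genetic algorithm, and simply discards the first rule's bid. The function name, rule names, bid fraction, and reward value are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional


def bucket_brigade_episode(
    strengths: Dict[str, float],
    firing_chain: List[str],
    reward: float,
    bid_fraction: float = 0.1,  # assumed share of strength paid as a bid
) -> Dict[str, float]:
    """Run one episode and return a new dict of rule strengths.

    Each rule that fires pays a bid to the rule that fired just before it
    (its "stage-setter"). The last rule in the chain collects the reward
    from the environment.
    """
    updated = dict(strengths)  # leave the caller's dict untouched
    previous: Optional[str] = None
    for rule in firing_chain:
        bid = bid_fraction * updated[rule]
        updated[rule] -= bid
        if previous is not None:
            updated[previous] += bid  # credit flows one step backward
        previous = rule
    if previous is not None:
        updated[previous] += reward  # environmental payoff to the last rule
    return updated


if __name__ == "__main__":
    # Hypothetical three-rule chain; only the last rule is rewarded directly.
    strengths = {"spot_food": 10.0, "approach": 10.0, "eat": 10.0}
    chain = ["spot_food", "approach", "eat"]
    for episode in range(1, 6):
        strengths = bucket_brigade_episode(strengths, chain, reward=5.0)
        print(episode, {name: round(value, 2) for name, value in strengths.items()})
```

Running the loop shows the reward reaching the last rule first and then spreading to earlier rules over repeated episodes, which is the sense in which stage-setting rules are eventually strengthened.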
markets, cities, and the brain. In the early 1980s, he teamed up with a small group of scientists, primarily physicists, with a sprinkling of economists and biologists, to discuss what properties this wide swath of “complex adaptive systems” have in common. The discussions helped define the intellectual mission of the Santa Fe Institute, the first institution dedicated to developing a science of complexity, as well as the other complexity institutes that followed. Holland brought to these discussions his lifelong study of adaptation and a reminder that serious theories about complexity would need to look deeper than phenomenological descriptions and account for the “how” and “why” of these systems.

As the discussions about complex adaptive systems matured, a consensus developed about their basic properties. Such systems are composed of many components with nonlinear interactions; are characterized by complex emergent behavior; exhibit higher-order patterns; operate at multiple (and often nested) spatial and temporal scales, with some behavior conserved across all scales and other behaviors changing at different scales; and are adaptive, with behavioral rules continually adjusted through evolution and learning. Although this list is far from a formal characterization of complex adaptive systems, most people working in the field today are interested in systems that have these properties.

Holland’s view was that a system in stable equilibrium is essentially dead.

In the early 1990s, Holland teamed up with other Santa Fe Institute researchers, including several economists, to tackle the mismatch between the predictions of rational expectations theory (the dominant theory in economics at the time) and empirically observed stock-market behaviors. In brief, most economic theory of the day assumed that all participants in an economy or financial market are fully rational, acting to maximize their individual gain. In real life, however, actors in economies and markets are rarely wholly rational, and financial markets often deviate from rationality, as in, say, speculative bubbles and crashes.

The Santa Fe Institute Artificial Stock Market project1,24 developed an exploratory model in which rational traders were replaced by adaptive traders, those who learn to forecast stock prices over time. The model tested for the emergence of three different trading strategies: fundamental, technical, and uninformed. The simulated market with adaptive trading agents was run many times, and the dynamics of price and trading volumes were compared to observed patterns in real markets. Holland and his collaborators found that the model’s dynamics replicated several otherwise puzzling features of real-life markets.

Although the Santa Fe Institute Stock Market model was highly simplified, it was very influential and led to many follow-on projects. It demonstrated clearly the essential role that adaptation plays in complex systems and illustrated how Holland’s theories of continual learning in response to intermittent feedback from the environment could be integrated into domain-specific settings.

Echo17,22 was an even more ambitious exploratory model Holland and his collaborators developed during the 1990s. Echo formalized Holland’s idealization of complex adaptive systems into a runnable computational system where agents evolved external markers (called tags) and internal preferences, then used them to acquire resources and form higher-level aggregate structures (such as trading relationships, symbiotic groups, trophic cascades, and interdependent organizations). Echo agents sometimes discovered mimicry and used it to deceive competitors. These complex interacting patterns arose throughout the model’s execution, which started with a single minimal agent. When the runs stabilized, the resulting population of agents was found to reproduce several well-known patterns observed in nature, most famously the rank-frequency distribution of species diversity known as the Preston curve in ecology. In lay terms, Echo evolved populations where “most species are rare.”

The broad scope of the model, together with its ability to produce easily identifiable and well-known patterns from nature, was appealing to immunologists, economists, and evolutionary biologists alike. Many of the insights behind the project are described in Holland’s third book, Hidden Order.16

Holland’s later books, Emergence,18 Signals and Boundaries,19 and Complexity: A Very Short Introduction20 show how the theories of adaptation Holland developed during the earlier part of his career fit into the larger landscape of complex systems research. In these works, he especially emphasized the open-ended nature of complex systems, where change and adaptation are continual, systems co-evolve with each other, and ecological niches arise and decay spontaneously depending on resource availability and competition.

Holland’s focus on understanding the mechanisms by which complex patterns emerge and change, rather than simply characterizing the patterns themselves (such as describing chaotic attractors or power laws), reflected his determination to get to the heart of complex adaptive systems. This determination represents the best of science. Holland’s willingness to tackle the most difficult questions, develop his own formalisms, and use mathematics and simulation to provide insight sets a high bar for scientists in all disciplines.

Conclusion
John Holland was unusual in his ability to absorb the essence of other disciplines, articulate grand overarching principles, and then back them up with computational mechanisms and mathematics. Unlike most researchers, Holland moved seamlessly among these three modes of thinking, developing models that were years ahead of their time. A close reading of his work reveals the antecedents of many ideas prevalent in machine learning today (such as reinforcement learning in non-Markovian environments and active learning). His seminal genetic algorithm spawned the field of evolutionary computation, and his insights and wisdom helped define what are today referred to as the “sciences of complexity.”

Holland’s many books and papers have influenced scientists around the world and across many disciplines. Closer to home, he introduced several generations of students at the University of Michigan to computation in natural systems, an idea that even today remains somewhat controversial, despite successes in genetic algorithms for engineering design, biomimicry for robotics, abstractions of pheromone trails in ant colonies for optimization methods, and mechanisms from immunology to improve computer security.

Behind the ideas is the man himself. Everyone who knew John personally remembers the gleam in his eye when encountering a new idea; his willingness to talk to anyone, no matter how famous or obscure; and his incredible generosity and patience. His personality and humanity were somehow inextricably entangled with his intellectual contributions. Since his death in 2015, many of Holland’s former students and colleagues have movingly described their desire to emulate his personal qualities as much as his scientific excellence. His ideas, intellectual passion, and personal approach will serve as beacons for research in intelligent and complex systems for many years to come.

Acknowledgments
We thank R. Axelrod, L. Booker, R. Riolo, K. Winter, and the anonymous reviewers for their careful reading of the manuscript and many helpful suggestions. Stephanie Forrest gratefully acknowledges the partial support of NSF (NeTS 1518878, CNS 1444500), DARPA (FA8750-15-C-0118), Air Force Office of Scientific Research (FA8750-15-2-0075), and the Santa Fe Institute. Melanie Mitchell gratefully acknowledges the partial support of NSF (IIS-1423651). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References
1. Arthur, W.B., Holland, J.H., LeBaron, B.D., Palmer, R.G., and Tayler, P. Asset pricing under endogenous expectations in an artificial stock market. In The Economy As an Evolving Complex System II, W.B. Arthur, S. Durlauf, and D. Lane, Eds. Addison-Wesley, Reading, MA, 1997.
2. Ashby, W.R. An Introduction to Cybernetics. Chapman & Hall, London, 1956.
3. Bellman, R.E. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, NJ, 1961.
4. Booker, L.B., Goldberg, D.E., and Holland, J.H. Classifier systems and genetic algorithms. Artificial Intelligence 40, 1–3 (1989), 235–282.
5. Christiansen, F.B. and Feldman, M.W. Algorithms, genetics, and populations: The schemata theorem revisited. Complexity 3, 3 (1998), 57–64.
6. Dennett, D.C. Elbow Room: The Varieties of Free Will Worth Wanting. MIT Press, Cambridge, MA, 1984.
7. Eddington, A. The Nature of the Physical World: Gifford Lectures, 1927. Cambridge University Press, Cambridge, U.K., 1927.
8. Fisher, R.A. The Genetical Theory of Natural Selection: A Complete Variorum Edition. Oxford University Press, Oxford, U.K., 1930.
9. Goldberg, D.E. Computer-Aided Gas Pipeline Operation Using Genetic Algorithms and Rule Learning. Ph.D. Dissertation, University of Michigan, Ann Arbor, MI, 1983; http://www.worldcat.org/title/computer-aided-gas-pipeline-operation-using-genetic-algorithms-and-rule-learning/oclc/70390568
10. Holland, J.H. Outline for a logical theory of adaptive systems. Journal of the ACM 9, 3 (1962), 297–314.
11. Holland, J.H. Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing 2, 2 (1973), 88–105.
12. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. University of Michigan Press, Ann Arbor, MI, 1975.
13. Holland, J.H. Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In Machine Learning: An Artificial Intelligence Approach, Volume 2, R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, Eds. Morgan Kaufmann, San Francisco, CA, 1986, 593–623.
14. Holland, J.H. Induction: Processes of Inference, Learning, and Discovery. MIT Press, Cambridge, MA, 1989.
15. Holland, J.H. Genetic algorithms. Scientific American 267 (July 1992), 44–50.
16. Holland, J.H. Hidden Order: How Adaptation Builds Complexity. Perseus Books, New York, 1995.
17. Holland, J.H. Echoing emergence: Objectives, rough definitions, and speculations for Echo-class models. In Complexity: Metaphor, Models, and Reality, G.A. Cowan, D. Pines, and D. Meltzer, Eds. Perseus Books, New York, 1999, 309–342.
18. Holland, J.H. Emergence: From Chaos to Order. Oxford University Press, Oxford, U.K., 2000.
19. Holland, J.H. Signals and Boundaries: Building Blocks for Complex Adaptive Systems. MIT Press, Cambridge, MA, 2012.
20. Holland, J.H. Complexity: A Very Short Introduction. Oxford University Press, Oxford, U.K., 2014.
21. Holland, J.H. and Reitman, J.S. Cognitive systems based on adaptive algorithms. ACM SIGART Bulletin 63 (June 1977), 49.
22. Hraber, P.T., Jones, T., and Forrest, S. The ecology of Echo. Artificial Life 3, 3 (1997), 165–190.
23. London, R.L. Who Earned First Computer Science Ph.D.? blog@cacm (Jan. 15, 2013); http://cacm.acm.org/blogs/blog-cacm/159591-who-earned-first-computer-science-phd/fulltext
24. Palmer, R.G., Arthur, W.B., Holland, J.H., LeBaron, B., and Tayler, P. Artificial economic life: A simple model of a stock market. Physica D: Nonlinear Phenomena 75, 1–3 (1994), 264–274.
25. Rochester, N., Holland, J.H., Haibt, L.H., and Duda, W. Tests on a cell assembly theory of the action of the brain, using a large digital computer. IRE Transactions on Information Theory 2, 3 (1956), 80–93.
26. Samuel, A.L. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3, 3 (1959), 210–229.
27. Smith, S.F. A Learning System Based on Genetic Adaptive Algorithms. Ph.D. Dissertation, University of Pittsburgh, Pittsburgh, PA, 1980.
28. Sutton, R.S. and Barto, A.G. Time derivative models of Pavlovian reinforcement. In Learning and Computational Neuroscience: Foundations of Adaptive Networks, M. Gabriel and J. Moore, Eds. MIT Press, Cambridge, MA, 1990, 497–537.
29. Waldrop, M.M. Complexity: The Emerging Science at the Edge of Order and Chaos. Simon and Schuster, New York, 1993.
30. Zeigler, B. Theory of Modeling and Simulation. Wiley Interscience, New York, 1976.

Stephanie Forrest ([email protected]) is Distinguished Professor of Computer Science at the University of New Mexico, Albuquerque, NM, External Professor at the Santa Fe Institute, Santa Fe, NM, and a Fellow of the IEEE; John Holland was her Ph.D. advisor at the University of Michigan.

Melanie Mitchell ([email protected]) is Professor of Computer Science at Portland State University and External Professor at the Santa Fe Institute, Santa Fe, NM; John Holland was co-advisor for her Ph.D. at the University of Michigan.

Copyright held by the authors. Publication rights licensed to ACM. $15.00.
contributed articles
DOI:10.1145/2888391

Skills for Success

The skills and knowledge that earn promotions are not always enough to ensure success in the new position.

LEON KAPPELMAN, MARY C. JONES, VESS JOHNSON, EPHRAIM R. MCLEAN, AND KITTIPONG BOONME

excellent communication skills, strong business acumen, and strategic thinking.6–9 Although these qualities are important, a more comprehensive set of skills is increasingly important, as organizations grapple with the costs of hiring, training, retention, and turnover.5,7,12,13,18–20

Though many studies have considered IT skills, most focus on specific technical skills (such as programming languages, hardware, and systems analysis) and are framed from the perspective of employers, IT degree programs, or recent IT graduates,18,23 focusing primarily on skills for new hires.1,2

Limited research has informed our understanding of the skills needed for an IT professional who wants to progress up the ranks of an organization. There is also little or no information on the expectations of senior and mid-level IT managers with regard to their own skills or of those above and below

most important for success at each of the stages. This research continues to provide insight into the skills IT pro-

The task of identifying the skills needed by IT employees as they rise through the ranks of an organiza-

ness and technology conditions.3 In the 1960s and 1970s, driven by the predominance of mainframe computers and long system development cycles, technical skills were the most valued, followed closely by managerial skills.4,16 In the 1980s and 1990s, the widespread use of personal computers and local-area networks triggered a shift to a more strategic focus.21 However, IT was still viewed primarily as a support activity responsible for managing the infrastructure needed for the functional groups within the organization.3,11,15 This brought greater need for broader technical skills, along with greater awareness of business and technology changes, as well as greater strategic focus. In the 2000s, the emergence of social networking, mobile computing, and e-commerce made it even more important for IT managers to address organization issues.12,17,25 Interpersonal, technical, and organizational skills have long been important for the success of IT professionals at all stages of their careers, but the relative mix of skills at each stage is not always clear or straightforward.5,7–9,15,16,25 For example, although nontechnical skills are often viewed as more important for senior-management career success, technical skills are still highly valued, particularly in organizations that offer dual career ladders, allowing employees to pursue either technical or managerial career paths.10,16,24

Understanding this evolution of skill requirements is important for several reasons. First, as IT organizations become more strategic, understanding issues related to recruiting, developing, and retaining skilled IT employees becomes even more important.5,8,13,15,21 Second, because IT has become more strategic and business-focused in most organizations, the skills needed at all levels of IT personnel have evolved and expanded. A good understanding of the

Table 1. Respondents’ work tenures.

Tenure               IT       Current Organization   Current Position
< 5 years            8.9%     44.4%                  65.4%
>=5 to <10 years     5.6%     21.0%                  22.4%
>=10 to <15 years    11.0%    15.2%                  6.7%
>=15 to <20 years    18.3%    9.5%                   4.4%
>=20 years           56.2%    9.8%                   1.1%

[Figure 1. Annual revenue of respondents’ organizations ($U.S.). Bar chart; vertical axis 0% to 30%.]
We received responses from 312 CIOs, 384 mid-level IT managers, and 306 practitioners who either did not answer the skills questions or self-identified as “others,” or consultants, vendors, academics, retired, or in transition. “CIO” represented the highest-ranking IT person in an organization, regardless of title. The term “mid-level IT manager/professional” was used for respondents in IT management positions below CIO. Everyone in the sample was an IT manager since SIM is a professional society for IT managers. Table 1 reports the distribution of the respondents’ tenure. For distribution by organizational revenue, see Figure 1, and by industry type, see Figure 2.

Participants were presented with a list of 36 skills and asked to identify the three most important for the success of new IT hires, mid-level IT managers, and themselves in their current positions. Table 2 reports the top 10 rankings and percentage of CIOs selecting each success skill for each career stage.

“Providing leadership” was selected by 34.3% of CIOs as one of the top three skills for success in their jobs and was highest ranked. Not surprisingly, the next most selected CIO skills were directly related to leadership and big-picture thinking: “people management,” “strategic planning,” and “decision making.” Two of the skills CIOs perceived as most important to their jobs were also skills they perceived as highly important to both mid-level IT managers and new IT hires: “oral com-

CIOs indicated that many of the most important skills for mid-level IT

Table 2. CIO perceptions of essential success skills at three stages of an IT career (n = 312).

CIO                         % Selecting   IT Middle Management        % Selecting   New IT Hire                    % Selecting
Providing leadership        34.29%        Collaboration with others   33.65%        Technical knowledge            47.44%
People management           29.49%        Problem solving             19.87%        Problem solving                39.42%
Strategic planning          23.72%        Technical knowledge         19.87%        Collaboration with others      38.46%
Decision making             23.40%        People management           17.63%        Functional area knowledge      22.12%
Communication (oral)        20.83%        Functional area knowledge   16.99%        Communication (oral)           18.27%
Collaboration with others   20.19%        Communication (oral)        16.99%        Honesty/credibility            15.38%
Emotional intelligence      16.03%        Business analysis           16.67%        User relationship management   8.65%
Honesty/credibility         15.38%        Project management          14.10%        Business analysis              8.65%
Business analysis           11.86%        Decision making             11.54%        Communication (written)        8.65%
Change management           11.22%        Honesty/credibility         11.22%        Emotional intelligence         7.69%

Table 3. Mid-level IT manager perceptions of skills* (non-CIOs; n = 384).

IT Middle Management Skills   % Selecting   New IT Hire Skills          % Selecting
Collaboration with others     34.4%         Collaboration with others   41.7%
Functional area knowledge     22.1%         Technical knowledge         38.5%
Problem solving               20.6%         Problem solving             32.0%
People management             20.1%         Communication (oral)        22.1%
Technical knowledge           17.7%         Functional area knowledge   18.0%
Communication (oral)          16.9%         Honesty                     17.2%
Emotional intelligence        14.3%         Business analysis           12.2%
Communication (written)       12.5%         Emotional intelligence      10.7%
Managing expectations         9.9%          Programming                 9.4%

* Only the top 10 skills are included.
[Figure 3. Perception of top skills for mid-level IT managers by level of manager. Bar chart; series: 1 level from CIO (n=159), 2 levels from CIO (n=106), 3 levels from CIO (n=40); vertical axis 0% to 35%; skills shown: communication (oral), emotional intelligence, functional area knowledge, problem solving, technical knowledge, decision making, people management, collaboration with others.]

ment” as being as critical as the two higher levels did.

Figure 3 outlines the most important personal success skills at different reporting distances from the CIO, using the same data as Table 4 but including only those skills selected by at least 15% of the respondents. The eight skills in Figure 3 are also in the top six of at least one level in Table 4. Note how the relative importance of “emotional intelligence,” “problem solving,” and “functional area knowledge” increases as one moves up the IT hierarchy, while the importance of “oral communications” and “decision making” decreases.

move up the career ladder and oc-
findings may be limited because all respondents were members of one professional association, SIM. Because SIM is an organization for IT professionals who have chosen a managerial career path, rather than a technical career path, findings might not be entirely applicable to those on a more technical track. On the other hand, a plus for generalizability is the significant degree of variation in the organization size and industries represented in the sample. However, since smaller organizations tend to have flatter hierarchies, size-related effects could be an issue for the mid-level perceptions in Table 3, Table 4, and Figure 3. Moreover, because we gathered only frequencies of responses, it was difficult to know with confidence the relative importance among the skills respondents selected; for example, if 20% of the respondents chose skill X and 10% chose skill Z, this does not necessarily mean skill X is twice as important as skill Z.

Implications and Lessons Learned
A picture emerges of the variety of skills IT managers deem necessary for success at various stages of an IT career, providing a broader, more dynamic view of skills for IT career success than the typical in-depth look at specific skills required for a given career stage or the skill shortages at a particular moment in time. The study also helps improve our understanding of the progression of skills needed as an IT professional’s career advances from individual performer to manager of individual performers to leader.

This progression is reflected in Figure 4, illustrating the evolution of IT professional skill requirements as a career progresses and using the top five success skills most frequently selected by CIOs across the three career stages in Table 2. Figure 4 depicts how CIOs’ perceptions of essential skills change for those IT professionals at different career stages. For example, “technical knowledge,” “problem solving,” and “collaboration” were the top skills for new IT hires, yet were relatively less important for mid-level managers (even though still highly ranked) and significantly less important for CIOs, while “people management,” “strategic planning,” and “decision making” were of little importance to the success of new hires, somewhat important for mid-level managers, and among the most important skills reported by respondents for CIO success. There is thus an advance and a decline in the relative importance of various skills as an IT professional’s career progresses. However, Figure 4 also makes clear that throughout one’s entire career, “collaboration” and “oral communications” skills remain important.

Figure 4. CIO perceptions of the top five skills across three levels of IT careers. [Chart. Vertical axis: 0% to 50%. Labels recovered: providing leadership, functional area knowledge, people management/relationships.]

These findings, and recognition of the different skills needed at different career stages, represent an important contribution to our understanding of IT skills and careers; they also have implications for organizations, as well as individuals. On the one hand, such understanding can help organizations improve their IT hiring practices and invest in designing and implementing appropriate training, education, retention, and career-advancement programs. On the other, they provide a framework whereby IT employees can develop and advance their own careers, acquiring the skills needed for success in their current positions and preparing for their next career stage.

In the first stage, new hires are expected to have the technical knowledge and problem-solving skills needed to perform the job. These skills would come from their college preparation in computer science or information systems, technical certifications (such as Cisco and Microsoft), or previous employment, internships, and work-study programs. At this stage, IT employees function as individual performers or members of a team. They also begin to gain the functional knowledge and business understanding needed to be a more effective performer.

At mid-level, IT professionals leverage their technical experience to be managers of individual performers, heading teams, IT projects, or programs. Here “decision making,” “business knowledge,” and “people management” skills become more important than they were previously. Obtaining an MBA or a master’s degree in IS, or participating in personal- or management-development programs, can be a valuable career enhancer at this stage, enabling opportunities for greater management responsibilities.

Upon achieving executive leadership responsibilities, as in, say, a promotion to CIO, “providing leadership,” along with “strategic planning” and “decision making,” become key ingredients for success. It is here that knowledge of the business must be linked to knowledge of the business’s customers, suppliers, and indeed the industry itself. Here, senior management and executive leadership programs (such as at Harvard, MIT, Stanford, and SIM’s Regional Leadership Forum) can help provide this broader perspective.

All this highlights the importance of continuous, lifelong learning to the success of IT professionals. In addition to the formal academic programs discussed here, mentoring by a more senior manager in the organization (but not necessarily one’s boss) can be very helpful toward career advancement. If not formally provided by an employer, individuals should seek out such mentors informally through professional associations like ACM and personal networks. Similarly, external career coaches can be helpful, particularly at critical decision points in one’s career.

Conclusion
This research study, which was based on SIM member data, provides insights into the diverse and dynamic nature of skill requirements at different stages of an IT career and levels of responsibility from the perspective of most senior and mid-level IT managers. Organizations can use these insights to enhance their IT workforce practices, and IT professionals can use them to achieve their personal career objectives and help others do so, too.

Leon Kappelman ([email protected]) is a professor in the Department of Information Technology & Decision Sciences in the College of Business at the University of North Texas, Denton, TX.

Mary C. Jones ([email protected]) is department chair and a professor in the Department of Information Technology & Decision Sciences in the College of Business at the University of North Texas, Denton, TX.

Vess Johnson ([email protected]) is an assistant professor and MIS coordinator at the H-E-B School of Business at the University of the Incarnate Word, San Antonio, TX.

Ephraim R. McLean ([email protected]) is a Regents’ professor and George E. Smith Eminent Scholar’s chair in the Computer Information Systems Department in the J. Mack Robinson College of Business at Georgia State University, Atlanta, GA.

Kittipong Boonme ([email protected]) is an assistant professor in the School of Management at Texas Woman’s University, Denton, TX.

© 2016 ACM 0001-0782/16/08 $15.00
Computational Biology in the 21st Century: Scaling with Compressive Algorithms
• Of course, we are dealing with a massive amount of data so that compression becomes important for efficiency.
• We highlight recent research that capitalizes on structural properties of biological data (low metric entropy and fractal dimension) to allow us to design algorithms that run in sublinear time and space.

explosion of data. However, this explosion is a mixed blessing. On the one hand, the scale and scope of data should allow new insights into genetic and infectious diseases, cancer, basic biology, and even human migration patterns. On the other hand, researchers are generating datasets so massive that it has become
capacity (Figure 1).

In 2002, when the first human genome was sequenced, the growth in computing power was still matching the growth rate of genomic data. However, the sequencing technology used for the Human Genome Project, Sanger sequencing, was supplanted around 2004, with the advent of what

This growth in data poses significant challenges for researchers [25]. Currently, many biological “omics” applications require us to store, access, and analyze large libraries of data. One approach to solving these challenges is to embrace cloud computing. Google, Inc. and the Broad Institute have collaborated to bring the GATK (Genome Analysis Tool-

sources are not needed continuously, it is no panacea. First and foremost, the computer systems that make up those cloud datacenters are themselves bound by improvements in semiconductor technology and Moore’s Law. Thus, cloud computing does not truly address the problem posed by the faster-than-Moore’s-Law exponential growth
review articles
in omics data. Moreover, in the face of disease outbreaks such as the 2014 Ebola virus epidemic in West Africa, analysis resources are needed at often-remote field sites. While it is now possible to bring sequencing equipment and limited computing resources to remote sites, Internet connectivity is still highly constrained; accessing cloud resources for analytics may not be possible.

Figure 1. (a) Moore’s and (b) Kryder’s laws contrasted with genomic sequence data. [Two panels, (a) and (b). Horizontal axis: Year, 2002 to 2014. Vertical axes: logarithmic, 10 to 10^9. Legend: Genomic Data, Computing Power.]

Computer scientists routinely exploit the structure of various data in order to reduce time or space complexity. In computational biology, this approach has implicitly served researchers well. Now-classical approaches such as principal component analysis (PCA) reduce the dimensionality of data in order to simplify analysis and uncover salient features [3]. As another example, clever indexing techniques such as the Burrows-Wheeler Transform (BWT) take advantage of aspects of sequence structure [3] to speed up computation and save storage. This article focuses on cutting-edge algorithmic advances for dealing with the growth in biological data by explicitly taking advantage of its unique structure; algorithms for gaining novel biological insights are not its focus.

Types of Biological Data
In the central dogma of molecular biology, DNA is transcribed into RNA, which is translated by the ribosome into polypeptide chains, sequences of amino acids, which singly or in complexes are known as proteins. Proteins fold into sophisticated, low-energy structures, which function as cellular machines; the DNA sequence determines the amino acid sequence, which in turn determines the folded structure of a protein. This structure ultimately determines a protein’s function within the cell. Certain kinds of RNA also function as cellular machines. Methods have been developed to gather biological data from every level of this process, resulting in a massive influx of data on sequence, abundance, structure, function, and interaction of DNA, RNA, and proteins. Much of this data is amenable to standard Big Data analysis methods; however, in this article we focus on examples of biological data that exhibit additional exploitable structure

sequences (using a four-letter alphabet representing the four DNA or RNA bases) or protein sequences (using a 20-letter alphabet representing the 20 standard amino acids) are obtained in several ways. For both protein and RNA sequence data, mass spectrometry, which can determine protein sequence and interactions, and RNA-seq, which can determine RNA sequence and abundance, allow scientists to also infer the expression of the gene to which it might

est volume of sequence data available is that of DNA. To better understand the structure of NGS sequence data, we will expand on NGS methodologies.

At the dawn of the genomic era, Sanger sequencing was the most widely used method for reading a genome. More recently, however, NGS approaches, beginning with Illumina’s “sequencing by synthesis,” have enabled vastly greater throughput due to massive parallelism, low cost, and simple sample preparation. Illumina sequencing and other NGS approaches such as SOLiD, Ion Torrent, and 454 pyrosequencing do not read a single DNA molecule end-to-end as one could read through a bound book. Instead, in shotgun sequencing, DNA molecules are chopped into many small fragments; from these fragments we generate reads from one or both ends (Figure 2a). These reads must be put together in the correct order to piece together an entire genome. Current reads typically range from 50 to 200 bases long, though longer reads are available with some technologies (for example, PacBio). Because no sequencing technology is completely infallible, sequencing machines also provide a quality score (or measure of the confidence in the DNA base called) associated with each position. Thus, an NGS read is a string of DNA letters, coupled with a string of ASCII characters that encode the quality of the base call. A sequencing run will produce many overlapping reads.
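To make the pairing of bases and quality characters concrete, the following is a minimal sketch of how such a quality string might be decoded. It assumes the widely used Phred convention, in which a score Q corresponds to an error probability of 10^(-Q/10), and an ASCII offset of 33; the article does not specify an encoding, so both are assumptions, and the read and quality string are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into integer Phred scores.

    The offset of 33 is an assumption (a common FASTQ convention),
    not something stated in the article.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probabilities(quality_string, offset=33):
    """Convert each Phred score Q into an error probability p = 10^(-Q/10)."""
    return [10 ** (-q / 10) for q in phred_scores(quality_string, offset)]


# Hypothetical read: one quality character per called base.
bases = "AAGCCTACCAC"
quality = "IIIIIIIII5#"

for base, q, p in zip(bases, phred_scores(quality), error_probabilities(quality)):
    print(f"{base}  Q={q:2d}  P(error)={p:.4f}")
```

Under these assumptions, a high-scoring character such as `I` marks a confidently called base, while a low-scoring character such as `#` marks a base that downstream tools may choose to discount.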
While measuring abundance to generate gene expression data (for more information, see the Source Material that accompanies this article in the ACM Digital Library) lends itself to cluster analysis and probabilistic approaches, the high dimensionality and noise in the data present significant challenges. Principal Component Analysis has shown promise in reducing the dimensionality of gene expression data. Such data and its challenges have been the focus of other articles [3], and thus will be only lightly touched upon here.

As mentioned earlier, function follows form, so in addition to sequence and expression, structure plays an important role in biological data science. However, we are not interested in only RNA and protein structures; small chemical compounds represent an additional source of relevant structural data, as they often interact with their larger RNA and protein brethren. Physical structures of molecules can be determined by X-ray crystallography, NMR, electron microscopy, and other techniques. Once determined, there are a variety of ways of representing these structures, from labeled graphs of molecular bonds to summaries of protein domains. These representations can then be stored in databases such as PubChem or the Protein Data Bank, and are often searched through, for example, for potential small molecule agonists for protein targets. Importantly, as we will expand upon later, interesting biomolecules tend to be sparse and non-randomly distributed in many representational spaces, which can be used for accelerating the aforementioned searches.

When examining more complex phenomena than single proteins or compounds, we often look to synthesize things together into a systems-level understanding of biology. To that end, we frequently use networks to represent biological data, such as the genetic and physical interactions among proteins, as well as those in metabolic pathways [3]. While standard network science tools have been employed in these analyses (for example, several approaches make use of diffusion or random walks to explore the topology of networks [9,11]), they are often paired with more specific biological data, as seen in IsoRank [32] and IsoRankN’s [21] use of conserved biological function in addition to random walks for global multiple network alignment. Other tools solve other biological problems, such as MONGOOSE [10], which analyzes metabolic networks. However, given its breadth, biological network science is beyond the scope of this article.

Challenges with Biological Data
Given DNA or RNA reads from NGS technologies, the first task is to assemble those fragments of sequence into contiguous sequences. The assembly problem is analogous to the problem of reconstructing a book with all its pages torn out. De novo assembly is beyond the scope of this article, but is possible because the sequence is covered by many overlapping reads [3]; for this task, the de Bruijn graph data structure is commonly used [6]. Often, however, a reference genome (or in the case of RNA, transcriptome) is available for the organism being sequenced; the

Figure 2. The next-generation sequencing (NGS) pipeline.
(a) ‘Shotgun’ sequencing breaks DNA molecules into many short fragments, which are read from one or both ends in the form of reads, and relies on high coverage to produce a statistically likely representation of a whole genome. [Diagram labels: cut many times at random (shotgun); read ∼100 bases (bp) from one or both ends.]
(b) Single-nucleotide polymorphisms, or SNPs, are the simplest type of genomic variant, and form the bulk of ‘variant-calling’ analysis. [Diagram: two double-stranded sequences, AAGCCTACCAC / TTCGGATGGTG and AAGCTTACCAC / TTCGAATGGTG, differing at a single SNP position.]
(c) The NGS downstream analysis pipeline. Shotgun reads are mapped to a reference genome with tools such as BWA, Bowtie, or CORA. The resulting genomic sequence is analyzed for variants with tools such as GATK or Samtools. This allows relationships between genes and diseases to be uncovered. [Diagram: Raw Reads, mapped to reference with Bowtie2 or BWA, giving Mapped Reads; then GATK or Samtools, giving Variant Calls.]
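As a small illustration of what a single-nucleotide variant looks like in sequence terms, the sketch below compares the two top-strand sequences shown in Figure 2b. It assumes the two sequences are already aligned and of equal length with no insertions or deletions; treating the first as the "reference" is also an assumption, and real variant callers such as GATK or Samtools work from mapped reads and quality scores rather than a direct string comparison.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 0-based. The inputs are assumed to be pre-aligned and
    of equal length, so insertions and deletions are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Top strands from Figure 2b; calling the first one the reference is an assumption.
reference = "AAGCCTACCAC"
sample = "AAGCTTACCAC"

for position, ref_base, alt_base in find_snps(reference, sample):
    print(f"position {position}: {ref_base} -> {alt_base}")
```

The two sequences differ at exactly one position, which is the SNP marked in the figure.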
free hash table. The alphabet reduction, as it is reversible, can be thought of as a form of lossless compression; a 20-letter amino acid alphabet is mapped onto a smaller alphabet, with offsets stored to recover the original sequence in the full alphabet. The hash table provides an efficient index of the database to be searched. DIAMOND [7] also relies on alphabet reduction, but uses “shaped seeds” (essentially, k-mers of length 15–24 with wildcards at 9–12 specific positions) instead of simple k-mer seeds to index the database. DIAMOND demonstrates search performance three to four orders of magnitude faster than BLASTX, but still linear in the size of the database being searched.

Recent work on gene expression has explored additional ways to exploit the high-dimensional structure of the data. SPARCLE (SPArse ReCovery of Linear combinations of Expression) [28] brings ideas from compressed sensing [8] to gene expression analysis. Another recent and novel approach to exploiting the structure of gene expression space is Parti (Pareto task inference) [17], which describes a set of data as a polytope, and infers the specific tasks represented by vertices of that polytope from the features most highly enriched at those vertices.

The most widely used chemogenomics search is the Small Molecule Subgraph Detector (SMSD) [29], which applies one of several MCS algorithms based on the size and complexity of the graphs in question. Notably, large chemical compound databases, such as PubChem, cannot be searched on a laptop computer with current tools such as SMSD.

Structure of Biological Data
Fortunately, biological data has unique structure, which we later take advantage of to perform search that scales sublinearly in the size of the database [38]. The first critical observation is that much biological data is highly redundant; if a computation is performed on one human genome, and a researcher wishes to perform the same computation on another human genome, most of the work has already been done [22]. When dealing with redundant data, clustering comes to mind. While cluster-based search is well studied [20], conventional wisdom holds that it provides a constant-factor speed-up over exhaustive search.

Beyond redundancy, however, another attribute of large biological datasets stands out. Far fewer biological sequences exist than could be enumerated, but even more so, those that exist tend to be highly similar to many others. Thanks to evolution, only those genes that exhibit useful biological function survive, and most random sequences of amino acids would not be expected to form stable structures. Since two human genomes differ on average by only 0.1%, a collection of 1,000 human genomes contains less than twice the unique information of a single genome [22]. Thus, not only does biological data exhibit redundancy, it also tends not to inhabit anywhere near the entire feasible space (Figure 3). It seems that physical laws (in this case, evolution) constrain the data to a particular subspace of the Cartesian space.

One key insight related to redundancy is that such datasets exhibit low metric entropy [38]. That is, for a given cluster radius $r_c$ and a database $D$, the number $k$ of clusters needed to cover $D$ is bounded by $N_{r_c}(D)$, the metric entropy, which is relatively small compared to $|D|$, the number of entries in the database (Figure 3). In contrast, if the points were uniformly distributed about the Cartesian space, $N_{r_c}(D)$ would be larger.

A second key insight is that biological datasets have low fractal dimension [38]. That is, within some range of radii $r_1$ and $r_2$ about an arbitrary point in the database $D$, the fractal dimension $d$ is $d = \frac{\log(n_2/n_1)}{\log(r_2/r_1)}$, where $n_1$ and $n_2$ are the number of points within $r_1$ and $r_2$ respectively (Figure 3).

Cluster-based search, as exemplified by “compressive omics” (the use of compression to accelerate analysis), can perform approximate search within a radius $r$ of a query $q$ on a database $D$ with fractal dimension $d$ and metric entropy $k$ at the scale $r_c$ in time proportional to

Figure 3. Cartoon depiction of points in an arbitrary high-dimensional space, as might arise from genomes generated by mutation and selection during the course of evolution. Although high dimensional locally, at the global scale of covering spheres, the data cloud looks nearly 1-dimensional, which enables entropy scaling of similarity search. Clusters cover the data points but do not cover unoccupied regions of space. The green triangle represents a query, with two concentric search radii (red circles) around it. Thanks to low fractal dimension, the large circle does not contain vastly more points than the small circle.
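The two quantities depicted in Figure 3 can be estimated on a toy dataset. The sketch below is illustrative only: it assumes Euclidean distance, uses a simple greedy covering (which gives an upper bound on the number of clusters needed, not the minimum), and uses a hypothetical database of points lying along a line in the plane to mimic data that occupies a thin subspace.

```python
import math


def greedy_cover(points, cluster_radius):
    """Pick centers greedily so every point lies within cluster_radius of a center.

    The number of centers returned is an upper-bound estimate of the
    covering number at this radius, not a guaranteed minimum.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) > cluster_radius for c in centers):
            centers.append(p)
    return centers


def local_fractal_dimension(points, center, r1, r2):
    """Estimate d = log(n2/n1) / log(r2/r1) around one point of the database."""
    n1 = sum(1 for p in points if math.dist(p, center) <= r1)
    n2 = sum(1 for p in points if math.dist(p, center) <= r2)
    return math.log(n2 / n1) / math.log(r2 / r1)


# Hypothetical database: 100 points along a line embedded in the plane.
database = [(float(i), 0.0) for i in range(100)]

centers = greedy_cover(database, cluster_radius=5.0)
d = local_fractal_dimension(database, center=(50.0, 0.0), r1=5.0, r2=10.0)

print("clusters k:", len(centers))
print("ratio |D|/k:", len(database) / len(centers))
print("local fractal dimension:", round(d, 3))
```

For this line-like dataset the estimated dimension comes out close to 1 even though the points live in a two-dimensional space, and the number of clusters is much smaller than the number of points, which is the situation the text describes for biological data.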
$$\underbrace{k}_{\text{metric entropy}} + \underbrace{|B_D(q, r)|}_{\text{output size}} \cdot \underbrace{\left(\frac{r + 2r_c}{r}\right)^{d}}_{\text{scaling factor}}$$

where $B_D(q, r)$ refers to the set of points in $D$ contained within a ball of radius $r$ about a point $q$.

Given this formalization, the ratio $|D|/k$ provides an estimate of the speed-up factor for the coarse search component compared to a full linear search. The time complexity of the fine search is exponential in the fractal dimension $d$, which can be estimated globally by sampling the local fractal dimension over a dataset. The accompanying table provides the fractal dimension $d$ sampled at typical query radii, as well as the ratio $|D|/k$, for nucleotide sequence, protein sequence, protein structure, and chemical compound databases.

Biological datasets exhibit redundancy, and are constrained to subspaces by physical laws; that is, the vast majority of enumerable sequences and structures do not exist because they are not advantageous (or at least, have not been selected for by evolution). This combination results in low fractal dimension and low metric entropy relative to the size of the dataset, which suggests that “compressive omics” will provide the ability for computation to scale sublinearly with massively growing data.

The Age of Compressive Algorithms
We are entering the age of compressive algorithms, which make use of this completely different paradigm for the structure of biological data. Seeking to take advantage of the redundancy inherent in genomic sequence data, Loh, Baym and Berger [22] introduced compressive genomics, an approach that relies on compressing data in such a way that the desired computation (such as BLAST search) can be performed in the compressed representation. Compressive genomics is based on the concept of compressive acceleration, which relies on a two-stage search, referred to as coarse and fine search. Coarse search is performed only on the coarse,

subset of the original database. This approach provides orders-of-magnitude runtime improvements to BLAST nucleotide [22] and protein [12] search; these runtime improvements increase as databases grow.

The CORA read mapper [37] applies a mid-size l-mer based read-compression approach with a compressive indexing of the reference genome (referred to as a homology table). CORA, like caBLAST (compressively accelerated BLAST) [22] and caBLASTP [12], accelerates existing tools (in this case, read mappers including BWA or Bowtie2) by allowing them to operate in a compressed space, and relies on a coarse and a fine phase. In contrast, short seed-clustering schemes, such as those used in Masai [33] and MrsFAST [3], conceptually differ from CORA in that those schemes aim to accelerate only the seed-to-reference matching step. Thus, there is a subsequent seed-extension step, which is substantially more costly and still needs to be performed for each read and mapping individually, even when seeds are clustered. Through its l-mer based read compression model, CORA is able to accelerate and achieve asymptotically sublinear scaling for both the seed-matching and seed-extension steps within coarse-mapping, which make up the bulk of the read-mapping computation. Traditionally, k-mers refer to short substrings of fixed length (often, but not necessarily, a power of two) used as “seeds” for longer sequence matches. CORA uses much longer k-mers (for example, 33–64 nucleotides long), and links each one to its neighbors within a small Hamming or Levenshtein distance. The term l-mer distinguishes these substrings from typically short k-mers.

In the area of metagenomic search, the recently released MICA [38] demonstrates that the compressive-acceleration approach of caBLAST [22] and caBLASTP [12] is largely orthogonal to alphabet-reduction and indexing approaches. MICA applies the compressive-acceleration framework to the state-of-the-art DIAMOND [7], using it for its “coarse search” phase and a user’s choice of DIAMOND or BLASTX for its “fine search” phase; MICA demonstrates nearly order-of-magnitude run-time gains over the highly optimized DIAMOND, comparable to that of caBLASTP over BLASTP.

Compressive genomics [22] has been generalized and adapted to non-sequence spaces as well, and coined “compressive omics.” One such example is chemogenomics. Applying a compressive acceleration approach, Ammolite [38] accelerates SMSD search by an average of 150x on the PubChem database. Another example is esFragBag [38], which clusters proteins based on the cosine distance or Euclidean distance of their bag-of-words vectors, further accelerating FragBag’s running time by an average of 10x.

The compressive omics approach can, in some cases, come at the cost of accuracy. However, these cases are well defined. Compressive omics never results in false positives (with respect to the naïve search technique being accelerated), because the fine search phase applies the same comparison to the candidates as the naïve approach. Furthermore, when the distance function used for comparisons is a metric (more specifically, when it obeys the triangle inequality), false negatives will also never occur. Yet, in practice, non-metric distance functions are used, such as E-values in BLAST or cosine

Metric-entropy ratio (ratio of clusters to entries in database) and fractal dimension at typical search radii for four datasets. Metric-entropy ratio gives an estimate of the acceleration of coarse search with respect to naïve search, and as long as fractal dimension is low, coarse search should dominate total search time. NCBI’s non-redundant ‘NR’ protein and ‘NT’ nucleotide sequence databases are from June 2015. Protein Data Bank (PDB) is from July 2015. PubChem is from October 2013.

Dataset    Metric-entropy ratio    Fractal dimension
AUGUST 2016 | VOL. 59 | NO. 8 | COMMUNICATIONS OF THE ACM 79
review articles
distance in esFragBag, and thus false negatives can occur. Fortunately, these error rates are low, and recall better than 90% has been demonstrated.12,22,38

Conclusion
The explosion of biological data, largely due to technological advances such as next-generation sequencing, presents us with challenges as well as opportunities. The promise of unlocking the secrets of diseases such as cancer, obesity, Alzheimer’s, autism spectrum disorder, and many others, as well as better understanding the basic science of biology, relies on researchers’ ability to analyze the growing flood of genomic, metagenomic, structural, and interactome data.
The approach of compressive acceleration,22 and its demonstrated ability to scale with the metric entropy of the data,38 while providing orthogonal benefits to many other useful indexing techniques, is an important tool for coping with the deluge of data. The extension of this compressive acceleration approach to metagenomics, NGS read mapping,37 and chemogenomics suggests its flexibility. Likewise, compressive storage for these applications can be shown to scale with the information-theoretic entropy of the dataset.38
The field of computational biology must continue to innovate, but also to incorporate the best ideas from other areas of computer science. For example, the compressive acceleration approach bears similarity to a metric ball tree, first described in the database community over 20 years ago;35 however, the latter does not allow one to analyze performance guarantees in terms of metric entropy and fractal dimension. Other ideas from image processing, computational geometry,18 sublinear-time algorithms,30 and other areas outside of biology are likely to bear fruit. It is also likely that algorithmic ideas developed within computational biology will become useful in other fields experiencing a data deluge, such as astronomy or social networks.34
Biological data science is unique for two primary reasons: biology itself—even molecular biology—predates the information age, and “nothing in biology makes sense except in light of evolution.”13 Not only have biologists developed a diverse array of experimental techniques, but the data derives from astoundingly complex processes that themselves are driven by evolution. It is through the development of algorithms that leverage the structure of biological data that we can make sense of biology in light of evolution.

Acknowledgments
This work is supported by the National Institutes of Health, under grant GM108348. Y.W.Y. is also supported by a Hertz Fellowship.

References
1. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 7422 (2012), 56–65.
2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. Basic local alignment search tool. Journal of Molecular Biology 215, 3 (1990), 403–410.
3. Berger, B., Peng, J. and Singh, M. Computational solutions for omics data. Nature Reviews Genetics 14, 5 (2013), 333–346.
4. Bonfield, J.K. and Mahoney, M.V. Compression of FASTQ and SAM format sequencing data. PLoS ONE 8, 3 (2013), e59190.
5. Bredel, M. and Jacoby, E. Chemogenomics: An emerging strategy for rapid target and drug discovery. Nature Reviews Genetics 5, 4 (2004), 262–275.
6. Bruijn, D.N. A combinatorial problem. In Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, Series A 49, 7 (1946), 758.
7. Buchfink, B., Xie, C., and Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 1 (2015), 59–60.
8. Candes, E.J. and Tao, T. Decoding by linear programming. IEEE Transactions on Information Theory 51, 12 (2005), 4203–4215.
9. Cao, M., Zhang, H., Park, J., Daniels, N.M., Crovella, M.E., Cowen, L.J. and Hescott, B. Going the distance for protein function prediction: A new distance metric for protein interaction networks. PLoS ONE 8, 10 (2013).
10. Chindelevitch, L., Trigg, J., Regev, A. and Berger, B. An exact arithmetic toolbox for a consistent and reproducible structural analysis of metabolic network models. Nature Communications 5, (2014).
11. Cho, H., Berger, B., and Peng, J. Diffusion component analysis: Unraveling functional topology in biological networks. Research in Computational Molecular Biology. Springer, 2015, 62–64.
12. Daniels, N.M., Gallant, A., Peng, J., Cowen, L.J., Baym, M. and Berger, M. Compressive genomics for protein databases. Bioinformatics 29 (2013), i283–i290.
13. Dobzhansky, T. Nothing in biology makes sense except in the light of evolution (1973).
14. Forsberg, K.J., Reyes, A., Wang, B., Selleck, E.M., Sommer, M.O. and Dantas, G. The shared antibiotic resistome of soil bacteria and human pathogens. Science 337, 6098 (2012), 1107–1111.
15. Hach, F., Hormozdiari, F., Alkan, C., Hormozdiari, F., Birol, I., Eichler, E.E. and Sahinalp, S.C. mrsFAST: A cache-oblivious algorithm for short-read mapping. Nature Methods 7, 8 (2010), 576–577.
16. Hach, F., Sarra, I. Hormozdiari, F., Alkan, C., Eichler, E.E. and Sahinalp, S.C. mrsFAST-Ultra: a compact, SNP-aware mapper for high-performance sequencing applications. Nucleic Acids Research (2014), gku370.
17. Hart, Y., Sheftel, H., Hausser, J., Szekely, P., Ben-Moshe, N.B., Korem, Y., Tendler, A., Mayo, A.E. and Alon, U. Inferring biological tasks using Pareto analysis of high-dimensional data. Nature Methods 12, 3 (2015), 233–235.
18. Indyk, P. and Motwani, R. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing. ACM, 1998, 604–613.
19. Janda, J.M. and Abbott, S.L. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J. Clinical Microbiology 45, 9 (2007), 2761–2764.
20. Jardine, N. and van Rijsbergen, C.J. The use of hierarchic clustering in information retrieval. Information Storage and Retrieval 7, 5 (1971) 217–240.
21. Liao, C.-S., Lu, K., Baym, M., Singh, R. and Berger, B. IsoRankN: Spectral methods for global alignment of multiple protein networks. Bioinformatics 12 (2009), i253–i258.
22. Loh, P.-R., Baym, M., and Berger, B. Compressive genomics. Nature Biotechnology 30, 7 (2012), 627–630.
23. MacFabe, D.F. Short-chain fatty acid fermentation products of the gut microbiome: Implications in autism spectrum disorders. Microbial Ecology in Health and Disease 23 (2012).
24. Marco-Sola, S., Sammeth, M., Guigó, R. and Ribeca, P. The gem mapper: Fast, accurate and versatile alignment by filtration. Nature Methods 9, 12 (2012), 1185–1188.
25. Marx, V. Biology: The big challenges of big data. Nature 498, 7453 (2013), 255–260.
26. Ochoa, I., Asnani, H., Bharadia, D., Chowdhury, M., Weissman, T. and Yona, G. QualComp: A new lossy compressor for quality scores based on rate distortion theory. BMC bioinformatics 14, 1 (2013), 187.
27. Patro, R. and Kingsford, C. Data-dependent bucketing improves reference-free compression of sequencing reads. Bioinformatics (2015).
28. Prat, Y., Fromer, M., Linial, N. and Linial, M. Recovering key biological constituents through sparse representation of gene expression. Bioinformatics 5 (2011), 655–661.
29. Rahman, S.A., Bashton, M., Holliday, G.L., Schrader, R. and Thornton, J.M. Small molecule subgraph detector (SMSD) toolkit. J. Cheminformatics 1, 1 (2009), 1–13.
30. Rubinfeld, R. and Shapira, A. Sublinear time algorithms. SIAM J. Discrete Mathematics 25, 4 (2011), 1562–1588.
31. Schatz, M.C., Langmead, B. and Salzberg, S.L. Cloud computing and the DNA data race. Nature Biotechnology 28, 7 (2010), 691–693.
32. Singh, R., Xu, J. and Berger, B. Global alignment of multiple protein interaction networks with application to functional orthology detection. In Proceedings of the National Academy of Sciences 105, 35 (2008), 12763–12768.
33. Siragusa, E., Weese, D. and Reinert, K. Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Research 41, 7 (2013), e78.
34. Stephens, Z.D. et al. Big data: Astronomical or genomical? PLoS Biol. 13, 7 (2015), e1002195.
35. Uhlmann, J.K. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters 40, 4 (1991), 175–179.
36. Weinstein, J.N. et al. The cancer genome atlas pan-cancer analysis project. Nature Genetics 45, 10 (2013), 1113–1120.
37. Yorukoglu, D., Yu, Y.W., Peng, J. and Berger, B. Compressive mapping for next-generation sequencing. Nature Biotechnology 4 (2016), 374–376.
38. Yu, Y.W., Daniels, N., Danko, D.C. and Berger, B. Entropy-scaling search of massive biological data. Cell Systems 1, 2 (2015), 130–140.
39. Yu, Y.W., Yorukoglu, D., Peng, J. and Berger, B. Quality score compression improves genotyping accuracy. Nature Biotechnology 33, 3 (2015), 240–243.
40. Zhao, Y., Tang, H. and Ye, Y. RAPSearch2: A fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28, 1 (2012), 125–126.

Bonnie Berger ([email protected]) is a professor in CSAIL and the Department of Mathematics and EECS at Massachusetts Institute of Technology, Cambridge, MA.
Noah M. Daniels ([email protected]) is a postdoctoral associate in CSAIL and Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA.
Y. William Yu ([email protected]) is a graduate student in CSAIL and Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA.

Copyright held by authors.

Watch the authors discuss their work in this exclusive Communications video. http://cacm.acm.org/videos/computational-biology-in-the-21st-century
P. 92 Technical Perspective: Why Didn’t I Think of That? By Philip Wadler
P. 93 Ur/Web: A Simple Model for Programming the Web. By Adam Chlipala
research highlights
DOI:10.1145/2961890
Technical Perspective
To view the accompanying paper, visit doi.acm.org/10.1145/2958738
“It’s not a bug; it’s a feature!” Though this sentence is often meant as a joke, sometimes a bug really is a feature—when the benefits of tolerating the bug outweigh its negative impact on applications.
Designers of emerging hardware architectures are taking this point of view in order to increase energy efficiency, which is a critical concern across the computing landscape, from tiny embedded devices to enormous datacenters. Techniques such as a low-voltage mode for data-processing components and a low refresh rate for memory components can significantly decrease energy consumption. But they also increase the likelihood of soft errors, which are transient hardware faults that can cause an erroneous value to be computed or retrieved from memory.
Ultimately, whether these techniques should be considered bugs or features rests on the ability of software systems, and their developers, to tolerate the increase in soft errors. Fortunately, a large class of applications known as approximate computations is naturally error-tolerant. A book recommendation system approximates an unknown “ideal” recommendation function, for example, by clustering users with similar tastes. With enough users and data about these users, sporadic errors in the clustering computation are unlikely to cause noticeably worse recommendations. Similarly, an audio encoder can likely tolerate sporadic errors that introduce additional noise without affecting the user experience.
Even so, no application can tolerate an unbounded number of errors. At some point the book recommendations will be random and the music will be unlistenable. How can the implementers of these applications gain assurance that the quality of service will be acceptable despite the potential for soft errors?
The computing industry and research community have developed many tools and techniques for finding bugs and validating properties of programs. However, for the most part those approaches do not help to answer the question here. The issue in this setting is not whether a bug exists, but how likely the bug is to occur and how it will affect the application’s behavior. Further, the bug is not in the application but rather in the underlying hardware platform. Finally, it’s not even clear how to specify a desired quality-of-service level; traditional program logics based on a binary notion of truth and falsehood are not up to the task.
The following paper by Carbin et al. addresses these challenges in the context of an important subproblem. The authors introduce the notion of a quantitative reliability specification for a variable, which specifies a minimal acceptable probability that the variable’s computed value will be correct despite the potential for soft errors. For example, a developer may desire a particular variable’s value to be correct 99% of the time. The authors introduce a language for providing such specifications as well as an automated code analysis to verify them. Separately, the authors and other researchers have tackled complementary problems, such as how to bound the maximum effect that soft errors can have on a variable’s value.
The power of the authors’ approach comes from its generality. Despite my example here, reliability specifications are relative rather than absolute. For example, the reliability specification for a function’s return value is defined in terms of the reliability probabilities of the function’s arguments and so must hold for all possible values of those probabilities. Further, the approach is parameterized by a separate hardware reliability specification that provides specific probabilities of soft errors for different operations (for example, reading from memory, performing an addition). Therefore the approach is oblivious to the particular details of the hardware architecture and the causes of its soft errors.
These choices not only make the approach more general; they also enable the authors to recast the problem in a manner that is surprisingly amenable to traditional program verification techniques. Their analysis validates reliability specifications by determining the probability that each variable’s computation incurs no soft errors, since that is a lower bound on the variable’s probability of being reliable. By abstracting away the specific reliability probabilities of function inputs as well as of individual operations, the problem essentially becomes one of counting the number of operations that can incur soft errors and that can affect a variable’s value, a task that is well suited to automated program analysis.
This work is part of an exciting stream of recent research that adapts and extends traditional program verification techniques to reason about probabilistic properties, which are abundant in modern software systems. I am hopeful this research agenda will lead to general ways of building robust systems out of potentially unreliable parts, where the notion of unreliability is broadly construed—not only soft errors, but also faulty sensor and other environmental inputs, untrusted libraries, and approximate computations themselves. The more tools we have to reason about unreliability, the more bugs we can turn into features.

Todd Millstein is a professor of computer science at UCLA, Los Angeles, CA.

Copyright held by author.
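The counting argument in the perspective above can be illustrated with a short Python sketch. It assumes, purely for illustration, that every unreliable operation fails independently and with the same probability. The function names, the per-operation reliability of 1 - 10^-7, and the operation counts are hypothetical and are not taken from the paper's analysis.

```python
def no_error_probability(op_reliability: float, op_count: int) -> float:
    """Probability that none of `op_count` independent operations incurs a soft error.

    Under the independence assumption this is a lower bound on the probability
    that a value depending on those operations is computed correctly.
    """
    return op_reliability ** op_count


def meets_spec(op_reliability: float, op_count: int, required: float) -> bool:
    """True when the lower bound alone already satisfies the required reliability."""
    return no_error_probability(op_reliability, op_count) >= required


# Hypothetical figures: each operation is correct with probability 1 - 10^-7,
# and the developer asks for the value to be correct 99% of the time.
r = 1 - 1e-7
few_ops_ok = meets_spec(r, 1_000, 0.99)      # a short computation
many_ops_ok = meets_spec(r, 200_000, 0.99)   # a much longer computation
```

Because the bound only shrinks as the operation count grows, a computation that touches few unreliable operations can pass a specification that a longer one fails.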
example, researchers are investigating designs that incorporate aggressive device and voltage scaling techniques to provide low-power ALUs and memories. A key aspect of these components is that they forgo traditional correctness checks and instead expose timing errors and bitflips with some non-negligible probability.9, 11, 13, 15, 21, 22, 27

1.2. Reasoning about approximate programs
Approximate computing violates the traditional contract that the programming system must preserve the standard semantics of the program. It therefore invalidates standard paradigms and motivates new, more general, approaches to reasoning about program behavior, correctness, and acceptability.
One key aspect of approximate applications is that they typically contain critical regions (which must execute without error) and approximate regions (which can execute acceptably even in the presence of occasional errors).7, 25 Existing systems, tools, and type systems have focused on helping developers identify, separate, and reason about the binary distinction between critical and approximate regions.7, 11, 15, 25, 27 However, in practice, no computation can tolerate an unbounded accumulation of errors—to execute acceptably, executions of even approximate regions must satisfy some minimal requirements.
Approximate computing therefore raises a number of fundamental new research questions. For example, what is the probability that an approximate program will produce the same result as a corresponding original exact program? How much do the results differ from those produced by the original program? And is the resulting program safe and secure?
Because traditional correctness properties do not provide an appropriate conceptual framework for addressing these kinds of questions, we instead work with acceptability properties—the minimal requirements that a program must satisfy for acceptable use in its designated context. We identify three kinds of acceptability properties and use the following program (which computes the minimum element min in an N-element array) to illustrate these properties:

    int min = INT_MAX;
    for (int i = 0; i < N; ++i)
        if (a[i] < min) min = a[i];

Integrity Properties: Integrity properties are properties that the computation must satisfy to produce a successful result. Examples include computation-independent properties (no out of bounds accesses, null dereferences, divide by zero errors, or other actions that would crash the computation) and computation-dependent properties (e.g., the computation must return a result within a given range). One integrity property for our example program is that accesses to the array a must always be within bounds.
Reliability Properties: Reliability properties characterize the probability that the produced result is correct. Reliability properties are often appropriate for approximate computations executing on unreliable hardware platforms that exhibit occasional nondeterministic errors. A potential reliability property for our example program is that min must be the minimum element in a[0]–a[N–1] with probability at least 95%.
Accuracy Properties: Accuracy properties characterize how accurate the produced result must be. For example, an accuracy property might state that the transformed program must produce a result that differs by at most a specified percentage from the result that a corresponding original program produces.19, 30 Alternatively, a potential accuracy property for our example program might require the min to be within the smallest N/2 elements a[0]–a[N–1]. Such an accuracy property might be satisfied by, for example, a loop perforation transformation that skips N/2–1 of the loop iterations.
In this article we focus on reliability properties for approximate computations executing on unreliable hardware platforms. In other research, we have developed techniques for reasoning about integrity properties5, 6 and both worst-case and probabilistic accuracy properties.5, 19, 30 We have also extended the research presented in this article to include combinations of reliability and accuracy properties.17

1.3. Verifying reliability (contributions)
To meet the challenge of reasoning about reliability, we present a programming language, Rely, and an associated program analysis that computes the quantitative reliability of the computation—that is, the probability with which the computation produces a correct result when parts of the computation execute on unreliable hardware with soft errors (independent errors that occur nondeterministically with some probability). Specifically, given a hardware specification and a Rely program, the analysis computes, for each value that the computation produces, a conservative probability that the value is computed correctly despite the possibility of soft errors.
Rely supports and is specifically designed to enable partitioning a program into critical regions (which must execute without error) and approximate regions (which can execute acceptably even in the presence of occasional errors).7, 25 In contrast to previous approaches, which support only a binary distinction between critical and approximate regions, quantitative reliability can provide precise static probabilistic acceptability guarantees for computations that execute on unreliable hardware platforms. This article specifically presents the following contributions:
Quantitative Reliability Specifications: We present quantitative reliability specifications, which characterize the probability that a program executed on unreliable hardware produces the correct result, as a constructive method for developing applications. Quantitative reliability specifications enable developers who build applications for unreliable hardware architectures to perform sound and verified reliability engineering.
Language: We present Rely, a language that enables developers to specify reliability requirements for programs that allocate data in unreliable memory regions and use unreliable arithmetic/logical operations.
Quantitative Reliability Analysis: We present a program analysis that verifies that the dynamic semantics of a Rely program satisfies its quantitative reliability specifications. For each function in the program, the analysis computes a symbolic reliability precondition that characterizes the set of valid specifications for the function. The analysis then
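To make the accuracy property for the min example from Section 1.2 concrete, the following Python sketch compares the exact loop with a perforated variant. The perforation strategy (skipping odd-indexed iterations until the skip budget is used up) and all function names are assumptions made for illustration; the article does not prescribe which iterations a loop perforation transformation skips.

```python
def exact_min(a):
    """Reference computation: examines every element of the array."""
    best = a[0]
    for x in a[1:]:
        if x < best:
            best = x
    return best


def perforated_min(a, skip_count):
    """Perforated loop: skips up to `skip_count` odd-indexed iterations.

    Index 0 is never skipped, so `best` is always set for a non-empty array.
    """
    best = None
    skipped = 0
    for i, x in enumerate(a):
        if i % 2 == 1 and skipped < skip_count:
            skipped += 1          # this iteration is dropped by the perforation
            continue
        if best is None or x < best:
            best = x
    return best


def within_smallest(a, value, k):
    """Accuracy property: `value` is no larger than the k-th smallest element."""
    return value <= sorted(a)[k - 1]


data = [9, 1, 8, 2, 7, 3, 6, 4]
n = len(data)
approx = perforated_min(data, n // 2 - 1)     # skip N/2 - 1 iterations
accurate_enough = within_smallest(data, approx, n // 2)
```

Skipping N/2 - 1 iterations leaves at most N/2 - 1 elements unexamined, so at most N/2 - 1 elements can be smaller than the returned value; that is why the result still lies within the smallest N/2 elements.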
energy-efficient unreliable operations (which execute correctly with only some probability). Specifically, Rely supports reasoning about reads and writes of unreliable memory regions and unreliable arithmetic/logical operations.
Memory Region Specification: Each parameter declaration specifies the memory region in which the data of the parameter is allocated. Memory regions correspond to the physical partitioning of memory at the hardware level into regions of varying reliability. Here pblocks and cblock are allocated in an unreliable memory region named urel.
Lines 10–13 declare the local variables of the function. By default, variables in Rely are allocated in a fully reliable memory region. However, a developer can also optionally specify a memory region for each local variable. For example, the variables declared on Lines 10–12 reside in urel.
Unreliable Operations: The operations on Lines 23, 24, and 30 are unreliable arithmetic/logical operations. In Rely, every arithmetic/logical operation has an unreliable counterpart that is denoted by suffixing a period after the operation symbol. For example, “−.” denotes unreliable subtraction and “<.” denotes unreliable comparison.
Using these operations, search_ref’s implementation approximately computes the index (minblock) of the most similar block, that is, the block with the minimum distance from cblock. The repeat statement on line 15, iterates a constant nblock number of times, enumerating over all previously encoded blocks. For each encoded block, the repeat statements on lines 18 and 20 iterate over the height * width pixels of the block and compute the sum of the squared differences (ssd) between each pixel value and the corresponding pixel value in the current block cblock. Finally, the computation on lines 30 through 33 selects the block that is—approximately—the most similar to cblock.

2.3. Hardware semantics
Figure 2 illustrates the conceptual machine model behind Rely’s reliable and unreliable operations; the model consists of a CPU and a memory.
CPU: The CPU consists of (1) a register file, (2) arithmetic logical units that perform operations on data in registers, and (3) a control unit that manages the program’s execution.
The arithmetic-logical unit can execute reliably or unreliably. Figure 2 presents physically separate reliable and unreliable functional units, but this distinction can be achieved through other mechanisms, such as dual-voltage architectures.11 Unreliable functional units may omit additional checking logic, enabling the unit to execute more efficiently but also allowing for soft errors that may occur due to, for example, power variations within the ALU’s combinatorial circuits or particle strikes.
To prevent the execution from taking control flow edges that are not in the program’s static control flow graph, the control unit of the CPU reliably fetches, decodes, and schedules instructions (as is supported by existing unreliable processor architectures11, 27). In addition, given a virtual address in the application, the control unit correctly computes a physical address and operates only on that address.
Memory: Rely supports machines with memories that consist of an arbitrary number of memory partitions (each potentially of different reliability), but for simplicity Figure 2 partitions memory into two regions: reliable and unreliable. Unreliable memories can, for example, use decreased DRAM refresh rates to reduce power consumption at the expense of increased soft error rates.15, 27

2.4. Hardware reliability specification
Rely’s analysis works with a hardware reliability specification that specifies the reliability of arithmetic/logical and memory operations. Figure 3 presents a hardware reliability specification that is inspired by results from the existing computer architecture literature.10, 15 Each entry specifies the reliability—the probability of a correct execution—of arithmetic operations (e.g., +.) and memory read/write operations.
For ALU operations, the presented reliability specification uses the reliability of an unreliable multiplication operation from Ref.10, Figure 9. For memory operations, the specification uses the probability of a bit flip in a memory cell from Ref.15, Figure 4 with extrapolation to the probability of a bit flip within a 32-bit word. Note that a memory region specification includes two reliabilities: the reliability of a read (rd) and the reliability of a write (wr).

Figure 2. Machine model. Orange boxes represent unreliable components. (Diagram labels: Registers, ALU, CPU, Cache, Memory; Exact, Approximate.)

Figure 3. Hardware reliability specification.

    reliability spec {
        operator (+.) = 1 - 10^-7;
        operator (-.) = 1 - 10^-7;
        operator (*.) = 1 - 10^-7;
        operator (<.) = 1 - 10^-7;
        memory rel {rd = 1, wr = 1};
        memory urel {rd = 1 - 10^-7, wr = 1};
    }

Figure 4. Rely analysis overview. (Diagram labels: Rely Program, Hardware Specification, Precondition Generator, Precondition Checker, Verified Yes/No.)
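As a rough illustration of how the hardware reliability specification in Figure 3 might be consumed, the Python sketch below encodes the same entries in a dictionary and multiplies them to bound the reliability of a single expression. The dictionary layout and function names are hypothetical and are not part of Rely. The per-bit independence used in `word_reliability` is likewise an assumption, since the article states only that a per-cell bit-flip probability is extrapolated to a 32-bit word.

```python
# Entries mirror Figure 3; the dictionary layout itself is an assumption.
HARDWARE_SPEC = {
    "operators": {"+.": 1 - 1e-7, "-.": 1 - 1e-7, "*.": 1 - 1e-7, "<.": 1 - 1e-7},
    "memory": {
        "rel": {"rd": 1.0, "wr": 1.0},
        "urel": {"rd": 1 - 1e-7, "wr": 1.0},
    },
}


def word_reliability(bit_flip_probability, bits=32):
    """Probability that no bit of a word flips, assuming independent per-bit flips."""
    return (1.0 - bit_flip_probability) ** bits


def expression_reliability(spec, operators, read_regions):
    """Lower bound for one expression: the product of the reliability of each
    operator use and of each memory read, treating failures as independent."""
    reliability = 1.0
    for op in operators:
        reliability *= spec["operators"][op]
    for region in read_regions:
        reliability *= spec["memory"][region]["rd"]
    return reliability


# One unreliable comparison of two operands that both live in `urel`.
comparison_bound = expression_reliability(HARDWARE_SPEC, ["<."], ["urel", "urel"])
```

With one unreliable operator and two unreliable reads, the bound is the cube of the per-operation reliability, and reads from the reliable region leave it unchanged.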
on the maximum number of its iterations that therefore Step 3: In the final step, the analysis leaves the scope of
bounds the reliability degradation of modified variables. the conditional and conjoins the two preconditions for its
The loop on Line 15 iterates nblocks times and therefore branches after transforming them to include the direct
decreases the reliability of any modified variables nblocks dependence of the control flow variable on the reliability of
times. Because the reliability degradation is bounded, Rely’s the if statement’s condition expression.
analysis uses unrolling to reason about the effects of a The reliability of the if statement’s expression is greater
bounded loop. than or equal to the product of (1) the reliability of the <. oper-
Conditionals: The analysis of the body of the loop on Line ator (r0), (2) the reliability of reading both ssd and minssd
15 encounters the if statement on Line 30.b This if statement from unreliable memory (r 20), and (3) the reliability of the
uses an unreliable comparison operation on ssd and computation that produced ssd and minssd (R(ssd,
minssd, both of which reside in unreliable memory. The minssd) ). The analysis therefore transforms each predicate
reliability of minblock when modified on Line 32 therefore that contains the variable 30, by multiplying the right-hand
also depends on the reliability of this expression because side of the inequality with r 30 and replacing the variable 30
faults may force the execution down a different path. with ssd and minssd.
Figure 5 presents a Hoare logic style presentation of the This produces the precondition Q2:
analysis of the conditional statement. The analysis works in
three steps; the preconditions generated by each step are
numbered with the corresponding step.
Step 1: To capture the implicit dependence of a variable
on an unreliable condition, Rely’s analysis first uses latent control flow variables to make these dependencies explicit. A control flow variable is a unique program variable (one for each statement) that records whether the conditional evaluated to true or false. We denote the control flow variable for the if statement on Line 30 by ℓ30.
To make the control flow dependence explicit, the analysis adds the control flow variable to all joint reliability terms in Q1 that contain variables modified within the body of the if conditional (minssd and minblock).
Step 2: The analysis next recursively analyzes both the “then” and “else” branches of the conditional, producing one precondition for each branch. As in a standard precondition generator (e.g., weakest-preconditions), the assignment of i to minblock in the “then” branch replaces minblock with i in the precondition. Because reads from i and writes to minblock are reliable (according to the specification), the analysis does not introduce any new r0 factors.

b. This happens after encountering the increment of i on Line 35, which does not modify the current precondition because it does not reference i.

Figure 5. if statement analysis in the last loop iteration.

    (3) {Q0 ∧ Aret ≤ r0^4 · R(i, ssd, minssd)
            ∧ Aret ≤ r0^4 · R(minblock, ssd, minssd)}
    if (ssd <. minssd) {
        (2) {Q0 ∧ Aret ≤ r0 · R(i, ℓ30)}
        minssd = ssd;
        {Q0 ∧ Aret ≤ r0 · R(i, ℓ30)}
        minblock = i;
        {Q0 ∧ Aret ≤ r0 · R(minblock, ℓ30)}
    } else {
        (2) {Q0 ∧ Aret ≤ r0 · R(minblock, ℓ30)}
        skip;
        {Q0 ∧ Aret ≤ r0 · R(minblock, ℓ30)}
    }
    (1) {Q0 ∧ Aret ≤ r0 · R(minblock, ℓ30)}

Simplification: After unrolling a single iteration of the loop that begins at Line 15, the analysis produces a predicate of the form Aret ≤ r0^2564 · R(pblocks, cblock, i, ssd, minssd) as the precondition for a single iteration of the loop’s body. The constant 2564 represents the number of unreliable operations within a single loop iteration.
Note that there is one fewer predicate in this precondition than in Q2. As the analysis works backwards through the program, it uses a simplification technique that identifies that a predicate Aret ≤ r1 · R(X1) subsumes another predicate Aret ≤ r2 · R(X2). Specifically, the analysis identifies that r1 ≤ r2 and X2 ⊆ X1, which together mean that the second predicate is a weaker constraint on Aret than the first and can therefore be removed. This follows from the fact that the joint reliability of a set of variables is less than or equal to the joint reliability of any subset of the variables, regardless of the distribution of their values.
This simplification is how Rely’s analysis achieves scalability when there are multiple paths in the program; specifically, a simplified precondition characterizes the least reliable path(s) through the program.
Final Precondition: When the analysis reaches the beginning of the function after fully unrolling the loop on Line 15, it has a precondition that bounds the set of valid specifications as a function of the reliability of the parameters of the function. For search_ref, the analysis generates the precondition

Precondition checker. The final precondition is a conjunction of predicates of the form Aout ≤ r · R(X), where Aout is a placeholder for the reliability specification of an output. Because reliability specifications are all of the form r · R(X), each predicate in the final precondition (where each Aout is replaced with its specification) is of the form r1 · R(X1) ≤ r2 · R(X2), where r1 · R(X1) is a reliability specification and r2 · R(X2) is computed by the analysis. Similar to the analysis’s simplifier (see Precon-
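The subsumption rule described under Simplification (a predicate Aret ≤ r1 · R(X1) makes Aret ≤ r2 · R(X2) redundant when r1 ≤ r2 and X2 ⊆ X1) can be illustrated with a minimal Python sketch. This is not Rely's implementation: the names Pred, subsumes, and simplify are hypothetical, and the value r0 = 0.99 is an assumed reliability chosen only for the example.

```python
from typing import FrozenSet, List, NamedTuple


class Pred(NamedTuple):
    """Stands for one predicate of the form  Aret <= r * R(X)."""
    r: float
    X: FrozenSet[str]


def subsumes(p: Pred, q: Pred) -> bool:
    """True if p is at least as strong a constraint as q.

    p = (r1, X1) subsumes q = (r2, X2) when r1 <= r2 and X2 is a subset
    of X1, because the joint reliability of X1 can never exceed that of
    any of its subsets.
    """
    return p.r <= q.r and q.X <= p.X


def simplify(preds: List[Pred]) -> List[Pred]:
    """Drop every predicate that is subsumed by another one."""
    kept: List[Pred] = []
    for p in preds:
        if any(subsumes(k, p) for k in kept):
            continue  # p adds no constraint beyond what is already kept
        # p may in turn make some previously kept predicates redundant
        kept = [k for k in kept if not subsumes(p, k)]
        kept.append(p)
    return kept


# Example shaped like the predicates in Figure 5 (r0 is an assumed value).
r0 = 0.99
strong_i = Pred(r0 ** 4, frozenset({"i", "ssd", "minssd"}))
strong_mb = Pred(r0 ** 4, frozenset({"minblock", "ssd", "minssd"}))
weak_mb = Pred(r0, frozenset({"minblock"}))
remaining = simplify([weak_mb, strong_i, strong_mb])
```

In this example, weak_mb is removed because strong_mb has a smaller coefficient and a superset of its variables, while strong_i and strong_mb are both kept because neither variable set contains the other.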
AUGUST 2016 | VOL. 59 | NO. 8 | COMMUNICATIONS OF THE ACM
research highlights
3.2. Analysis summary
The table here presents Rely’s analysis results on the benchmark computations. For each benchmark, the table presents the type of the computation (checkable or approximate), its length in lines of code (LOC), the execution time of the analysis, and the number of inequality predicates in the final precondition produced by the precondition generator both without and with our simplification strategy.
Analysis Time: The analysis times for all benchmarks are under one second when executed on an Intel Xeon E5520 machine with 16 GB of main memory.
Number of Predicates: We used Rely with the hardware reliability specification from Figure 3 to generate a reliability precondition for each benchmark. The second to last column (labeled N) presents the number of predicates in the precondition when using a naïve strategy that does not include our simplification procedure. The rightmost column (labeled S) presents the number of predicates in each precondition when Rely employs our simplification procedure.
When Rely uses simplification, the size of each precondition is small (all consisting of fewer than five predicates). The difference in size between the naïvely generated preconditions and those generated via simplification demonstrates that simplification reduces the size of preconditions by multiple orders of magnitude. Simplification achieves these results by identifying that many of the additional predicates introduced by the reasoning required for conditionals can be removed. These additional predicates are often subsumed by another predicate.

4. RELATED WORK
In this section, we present an overview of the other work that intersects with Rely and its contributions to modeling and analysis of approximate computations, and computation fault tolerance.
Integrity: Almost all approximate computations have critical regions that must execute without error for the computation as a whole to execute acceptably. Dynamic criticality analyses automatically change different regions of the computation or internal data structures, and observe how the change affects the program’s output, for example, Refs. 7, 20, 25. In addition, specification-based static criticality analyses let the developer identify and separate critical and approximate program regions, for example, Refs. 15, 27. Carbin et al.5 present a verification system for relaxed approximate programs based on a relational Hoare logic. The system enables rigorous reasoning about the integrity and worst-case accuracy properties of a program’s approximate regions.
In contrast to the prior static analyses that focus on the binary distinction between reliable and approximate computations, Rely allows a developer to specify and verify that even approximate computations produce the correct result most of the time. Overall, this additional information can help developers better understand the effects of deploying their computations on unreliable hardware and exploit the benefits that unreliable hardware offers.
Accuracy: In addition to reasoning about how often a computation may produce a correct result, it may also be desirable to reason about the accuracy of the result that the computation produces. Dynamic techniques observe the accuracy impact of program transformations, for example, Refs. 2, 3, 16, 20, 25, 29, or injected soft errors, for example, Refs. 9, 15, 27. Researchers have developed static techniques that use probabilistic reasoning to characterize the accuracy impact of various sources of uncertainty.8, 19, 30 And of course, the accuracy impact of the floating point approximation to real arithmetic has been extensively studied in numerical analysis.
More recently, we developed the Chisel optimization system to automate the placement of approximate operations and data.17 Chisel extends the Rely reliability specifications (that capture acceptable frequency of errors) with absolute error specifications (that also capture acceptable magnitude of errors). Chisel formulates an integer optimization problem to automatically navigate the tradeoff space and generate an approximate computation that provides maximum energy savings (for the given model of approximate hardware) while satisfying the developer’s reliability and absolute error specifications.
Fault Tolerance and Resilience: Researchers have developed various software, hardware, or mixed approaches for detection and recovery from specific types of soft errors that guarantee a reliable program execution, for example, Refs. 9, 23, 24. For example, Reis et al.24 present a compiler that replicates a computation to detect and recover from single event upsets. These techniques are complementary to Rely in that each can provide implementations of operations that need to be reliable, as either specified by the developer or as required by Rely, to preserve memory safety and control flow integrity.

5. CONCLUSION
Driven by hardware technology trends, future computational platforms are projected to contain unreliable hardware components. To safely exploit the benefits (such as reduced energy consumption) that such unreliable components may provide, developers need to understand the effect that these components may have on the overall reliability of the approximate computations that execute on them.
We present a language, Rely, for exploiting unreliable hardware and an associated analysis that provides probabilistic reliability guarantees for Rely computations executing on unreliable hardware. By enabling developers to better understand the probabilities with which this hardware enables approximate computations to produce correct results, these guarantees can help developers safely exploit the benefits that unreliable hardware platforms offer.

Acknowledgments
We thank Deokhwan Kim, Hank Hoffmann, Vladimir Kiriansky, Stelios Sidiroglou, and Rishabh Singh for their insightful comments.
This research was supported in part by the National Science Foundation (Grants CCF-0905244, CCF-1036241, CCF-1138967, CCF-1138967, and IIS-0835652), the United States Department of Energy (Grant DE-SC0008923), and DARPA (Grants FA8650-11-C-7192, FA8750-12-2-0110).
[...] Sankaralingam, K. Relax: An architectural framework for software recovery of hardware faults. In ISCA (2010).
10. Ernst, D., Kim, N.S., Das, S., Pant, S., Rao, R., Pham, T., Ziesler, C., Blaauw, D., Austin, T., Flautner, K., Mudge, T. Razor: A low-power pipeline based on circuit-level timing speculation. In MICRO (2003).
11. Esmaeilzadeh, H., Sampson, A., Ceze, L., Burger, D. Architecture support for disciplined approximate programming. In ASPLOS (2012).
12. Hoffmann, H., Sidiroglou, S., Carbin, M., Misailovic, S., Agarwal, A., Rinard, M. Dynamic knobs for responsive power-aware computing. In ASPLOS (2011).
13. Leem, L., Cho, H., Bau, J., Jacobson, Q., Mitra, S. Ersa: Error resilient system architecture for probabilistic applications. In DATE (2010).
14. Leveson, N., Cha, S., Knight, J.C., Shimeall, T. The use of self checks and voting in software error detection: An empirical study. In IEEE TSE (1990).
15. Liu, S., Pattabiraman, K., Moscibroda, T., Zorn, B. Flikker: Saving dram refresh-power through critical data [...]
[...] (2005).
25. Rinard, M. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks. In ICS (2006).
26. Rinard, M., Cadar, C., Dumitran, D., Roy, D., Leu, T., Beebee, W. Jr. Enhancing server availability and security through failure-oblivious computing. In OSDI (2004).
27. Sampson, A., Dietl, W., Fortuna, E., Gnanapragasam, D., Ceze, L., Grossman, D. EnerJ: Approximate data types for safe and general low-power computation. In PLDI (2011).
28. Sankaranarayanan, S., Chakarov, A., Gulwani, S. Static analysis for probabilistic programs: Inferring whole program properties from finitely many paths. In PLDI (2013).
29. Sidiroglou, S., Misailovic, S., Hoffmann, H., Rinard, M. Managing performance vs. accuracy trade-offs with loop perforation. In FSE (2011).
30. Zhu, Z., Misailovic, S., Kelner, J., Rinard, M. Randomized accuracy-aware program transformations for efficient approximate computations. In POPL (2012).

Michael Carbin, Sasa Misailovic, and Martin C. Rinard, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA.
DOI:10.1145/2961892
Technical Perspective
To view the accompanying paper, visit doi.acm.org/10.1145/2958736

Once in a while, an idea will strike you with great force, and you say “Why didn’t I think of that?” The history of programming the Web is a sequence of innovations in abstraction, each of which might make you utter the aforementioned phrase.
The Web is supported by an array of interlocked standards, including HTTP, HTML, CSS, CGI, and JavaScript. On top of these has been built a series of abstractions, each increasing the ease with which a Web application can be designed and implemented.
One of the earliest of these abstractions appeared at the end of the last millennium, when Atkins, Ball, Bruns, and Cox1 devised MAWL, the Mother of All Web Languages. MAWL introduced to the Web world the now-common idea of inversion of control, where a sequence of requests from users invoking programs that generate Web pages is instead viewed as a single program generating a sequence of pages to which users respond. My first attempts to come to grips with inversion of control hurt my brain, but within a few years it acquired firm foundations in theory, relating it to well-understood notions of continuations and continuation-passing style, thanks to the efforts of Queinnec7 and the PLT Scheme (now Racket) team of Graunke, Findler, Krishnamurthi, Van Der Hoeven, and Felleisen.4
Web programming was complex because it involved a plethora of programs written in different languages running on different platforms. Typically, a three-tier system consisted of JavaScript on the client, Java (or some other language) on the server, and SQL on the database. The idea of generating all three tiers from a single source was christened “programming without tiers” by Cooper, Lindley, Wadler, and Yallop.2 Many systems independently appeared generating two or three tiers from a single source, including Google’s GWT and Microsoft’s LINQ, and research-oriented systems including Ocsigen, Opa, and Hop.
Advanced as these systems were, even as simple a matter as entering a date on a form could be complex. It might be input as a single string from a form that needed to be parsed, or three drop-down menus for day, month, and year that need to be assembled, or through a calendar widget in JavaScript. It was not until 2006 that I saw the iData system of Plasmeijer and Achten,6 which suggested Web systems should abstract away from such detail, introducing model-view abstraction to Web forms, encapsulating how data was input separately from how it was to be processed. An obvious idea, but only in retrospect. Cooper, Lindley, Wadler, and Yallop3 reworked this aspect of iData into formlets. Just as inversion of control was easier to absorb once it was related to the known notion of continuations, so formlets benefited from fitting into the known notion of applicatives, as introduced by McBride and Paterson.5 The theory aided practice: developers wrote formlet libraries for F#, Haskell, JavaScript, and Racket, and incorporated formlet support into frameworks including Happstack, Tupil, WebSharper, and Yesod.
The following paper presents the next step. Until now, the database in a Web application has been treated as a global variable, accessible to all. Chlipala suggests a better approach: allow each module to declare locally a portion of the database relevant to its needs, and hide that portion from the other modules. He also introduces primitives to support concurrency and transactions, with a more elegant design than found in most other Web languages. Finally, he suggests a novel form of functional reactive programming that incorporates imperative actions. How the latter compares with the more declarative form of functional reactive programming found in languages such as Elm remains to be seen. Chlipala has tried these techniques in practice, and an intriguing list of his early customers may be found in the research version of this paper, which appeared in POPL 2015.
Modularizing database access is a simple idea of enormous power, and I expect it will be coming to a Web programming language near you soon. Why didn’t I think of that?

References
1. Atkins, D.L., Ball, T., Bruns, G. and Cox, K. Mawl: A domain-specific language for form-based services. IEEE Trans. Software Engineering 25, 3 (1999), 334–346.
2. Cooper, E., Lindley, S., Wadler, P. and Yallop, J. Links: Web programming without tiers. Formal Methods for Components and Objects. Springer, 2007, 266–296.
3. Cooper, E., Lindley, S., Wadler, P. and Yallop, J. The essence of form abstraction. In Proceedings of the Asian Symposium on Programming Languages and Systems. Springer, 2008, 205–220.
4. Graunke, P., Krishnamurthi, S., Van Der Hoeven, S. and Felleisen, M. Programming the Web with high-level programming languages. In Proceedings of the European Symposium on Programming. Springer, 2001, 122–136.
5. McBride, C. and Paterson, R. Applicative programming with effects. J. Functional Programming 18, 1 (2008), 1–13.
6. Plasmeijer, R. and Achten, P. idata for the World Wide Web – Programming interconnected Web forms. In Proceedings of the International Symposium on Functional and Logic Programming. Springer, 2006, 242–258.
7. Queinnec, C. The influence of browsers on evaluators or, continuations to program Web servers. ACM SIGPLAN Notices 35 (2000), 23–33.

Phil Wadler is Professor of Theoretical Computer Science in the Laboratory for Foundations of Computer Science in the School of Informatics at the University of Edinburgh, Scotland.

Copyright held by author.
rooms, each of which maintains a log of messages. Any visitor to a chat room may append any line of text to the log, and there should be some way for other users to stay up-to-date on log additions. We start with a simple implementation, in the style of 20th century Web applications, before it became common to do significant client-side scripting. We evolve toward a version with instant updating upon all message additions, where a chat room runs within a single HTML page updated incrementally by client-side code. Along the way, we highlight our running themes of encapsulation and simple concurrency.
The examples from this section are written to be understandable to readers with different levels of familiarity with statically typed functional languages. Though the code should be understandable to all at a high level, some remarks (safe to skip) may require more familiarity.

2.1. HTML and SQL
Mainstream modern Web applications manipulate code in many different languages and protocols. Ur/Web hides most of them within a unified programming model, but we decided to expose two languages explicitly: HTML, for describing the structure of Web pages as trees, and SQL, for accessing a persistent relational database on the server. In contrast to mainstream practice, Ur/Web represents code fragments in these languages as first-class, strongly typed values.
Figure 1 gives our first chat-room implementation, relying on embedding of HTML and SQL code. While, in general, Ur/Web programs contain code that runs on both server and clients, all code from this figure runs on the server, where we are able to enforce that it is run exactly as written in the source code.

Figure 1. A simple chat-room application.

    table room : { Id : int, Title : string }
    table message : { Room : int, When : time,
                      Text : string }

    fun chat id =
      let
        fun say r =
          dml (INSERT INTO message (Room, When, Text)
               VALUES ({[id]}, CURRENT_TIMESTAMP, {[r.Text]}));
          chat id
      in
        title <- oneRowE1 (SELECT (room.Title) FROM room
                           WHERE room.Id = {[id]});
        log <- queryX1 (SELECT message.Text FROM message
                        WHERE message.Room = {[id]}
                        ORDER BY message.When)
                       (fn r => <xml>{[r.Text]}<br/></xml>);
        return <xml><body>
          <h1>Chat Room: {[title]}</h1>

          <form>
            Add message: <textbox{#Text}/>
            <submit value="Add" action={say}/>
          </form>

          <hr/>
          {log}
        </body></xml>
      end

    fun main () =
      rooms <- queryX1 (SELECT * FROM room
                        ORDER BY room.Title)
                       (fn r => <xml><li><a link={chat r.Id}>
                          {[r.Title]}</a></li></xml>);
      return <xml><body>
        <h1>List of Rooms</h1>

        {rooms}
      </body></xml>

The first two lines show declarations of SQL tables, which can be thought of as mutable global variables of type “multiset of records.” Table room’s records contain integer IDs and string titles, while table message’s records contain integer room IDs, timestamps, and string messages. The former table represents the set of available chat rooms, while the latter represents the set of all (timestamped) messages sent to all rooms.
We direct the reader’s attention now to the declaration of the main function, near the end of Figure 1. Here we see Ur/Web’s syntax extensions for embedded SQL and HTML code. Such notation is desugared into calls to constructors of abstract syntax tree types. The main definition demonstrates two notations for “antiquoting,” or inserting Ur code within a quoted code fragment. The notation {e} asks to evaluate expression e to produce a subfragment to be inserted at that point, and notation {[e]} adds a further stage of formatting e as a literal of the embedded language (using type classes17 as in Haskell’s show). Note that we are not exposing syntax trees to the programmer as strings, so neither antiquoting form presents any danger of code injection attacks, where we accidentally interpret user input as code.
What exactly does the main definition do? First, we run an SQL query to list all chat rooms. In our tutorial examples, we will call a variety of functions from Ur/Web’s standard library, especially various higher-order functions for using SQL query results. Such functions are higher-order in the sense that they take other functions as arguments, and we often write those function arguments anonymously using the syntax fn x => e, which defines a function that, when called, returns the value of expression e where parameter variable x is replaced with the actual argument value. We adopt a typographic convention for documenting each library function briefly, starting with queryX1, used in main:

    queryX1  Run an SQL query that returns columns from a single table (leading to the 1 in the identifier), calling an argument function on every result row. Since just a single table is involved, the input to the argument function is a record with one field per column returned by the query. The argument function should return XML fragments (leading to the X in the identifier), and all such fragments are concatenated together, in order, to form the result of queryX1.

Ur/Web follows functional languages like Haskell in enforcing purity, where expressions may not cause side effects. We allow imperative effects on an “opt-in” basis, with types delineating boundaries between effectful and pure code, following Haskell’s technique of monadic IO.14 For instance, the main function here inhabits a distinguished monad for input-output. Thus, we use the <- notation to run an effectful computation and bind its result to a variable, and we call the return function to lift pure values into trivial
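To make the behavior of queryX1 and of the {[e]} antiquote more concrete, here is a rough Python analogue. It is only a sketch under stated assumptions: query_x1 and demo are hypothetical names, strings stand in for Ur/Web's typed syntax trees, and escaping happens at run time with html.escape and SQL parameter binding, whereas Ur/Web rules out injection statically through its types.

```python
import html
import sqlite3


def query_x1(conn, sql, params, render):
    """Run a single-table query, call `render` on every result row, and
    concatenate the returned fragments in order (a loose analogue of
    Ur/Web's queryX1)."""
    rows = conn.execute(sql, params).fetchall()
    return "".join(render(row) for row in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows behave like records with named fields
    conn.execute("CREATE TABLE room (Id INTEGER, Title TEXT)")
    conn.executemany(
        "INSERT INTO room (Id, Title) VALUES (?, ?)",
        [(1, "General"), (2, "<script>")],
    )
    # The "?" placeholder plays the role of {[e]} inside SQL: the value is
    # passed as a literal and is never parsed as SQL code.
    return query_x1(
        conn,
        "SELECT Id, Title FROM room WHERE Id >= ? ORDER BY Title",
        (1,),
        # html.escape plays the role of {[e]} inside HTML: the title is
        # formatted as text, never interpreted as markup.
        lambda r: "<li>" + html.escape(r["Title"]) + "</li>",
    )
```

Calling demo() builds a two-row room table and renders it as a list; the room titled `<script>` is emitted as escaped text rather than as a tag, which is the property the antiquoting discussion above is concerned with.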
          Add message: <ctextbox source={text}/>
          <button value="Add" onclick={fn _ =>
            txt <- get text;
            set text "";
            lastSn <- get lastSeen;
            newMsgs <- rpc (say txt lastSn);
            set lastSeen (maxTimestamp newMsgs);
            List.app (fn r => Log.append log r.Text)
              newMsgs}/>
          <hr/>
          {Log.render log}
        </body> </xml>
      end
    end

2.3. Message passing from server to client
Web browsers make it natural for clients to contact servers via HTTP requests, but the other communication direction may also be useful. One example is our chat application, where only the server knows when a client has posted a new message, and we would like the server to notify all other clients in the same chat room. Ur/Web presents an abstraction where servers are able to send typed messages directly to clients, and the Ur/Web compiler and runtime system implement this abstraction once and for all on top of standard protocols and APIs.
The messaging abstraction is influenced by concurrent programming languages like Erlang1 and Concurrent ML.15 Communication happens over unidirectional channels. Every channel has an associated client and a type. The server may send any value of that type to the channel, which conceptually adds the message to a queue on the client. Clients

We loop over all channels associated with the current room, sending the new message to each one.
There is one last change from Figure 4. The onload attribute of our <body> tag still contains code to run immediately after the page is loaded. This time, before we initialize the Log structure, we also create a new thread with the spawn primitive. That thread loops forever, blocking to receive messages from the freshly created channel and add them to the log.
Threads follow a simple cooperative semantics, where the programming model says that, at any moment in time, at most one thread is running across all clients of the application. Execution only switches to another thread when the current one terminates or executes a blocking operation, among which we have RPCs and channel recv. Of course, the Ur/Web implementation will run many threads at once, with an arbitrary number on the server and one JavaScript thread per client, but the implementation ensures that no behaviors occur that could not also be simulated with the simpler one-thread-at-a-time model.
This simple approach has pleasant consequences for program modularity. The example of Figure 5 only shows a single program module taking advantage of channels. It is possible for channels to be used freely throughout a program, and the Ur/Web implementation takes care of routing messages to clients, while maintaining the simple thread semantics.
Figure 5 contains no explicit deallocation of clients that have stopped participating. The Ur/Web implementation detects client departure using a heartbeating mechanism.
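The one-thread-at-a-time model with channels can be simulated in a few lines of Python using generators. This is only an illustrative sketch, not how the Ur/Web runtime works: Channel, send, receiver, run, and demo are hypothetical names, and a generator's yield stands in for a blocking recv.

```python
from collections import deque


class Channel:
    """A unidirectional channel: conceptually a queue on one client."""

    def __init__(self):
        self.queue = deque()


def send(chan, msg):
    """Server side: add a message to the client's queue."""
    chan.queue.append(msg)


def receiver(chan, log):
    """Client-side thread: loop forever, blocking on recv and adding
    each received message to the log."""
    while True:
        msg = yield chan  # blocking recv: hand control back to the scheduler
        log.append(msg)


def run(threads):
    """Cooperative scheduler: at most one thread runs at a time, and
    control switches only when the running thread blocks."""
    # Run every thread up to its first blocking recv.
    waiting = [(t, next(t)) for t in threads]
    while True:
        for idx, (t, chan) in enumerate(waiting):
            if chan.queue:
                msg = chan.queue.popleft()
                # Resume the thread; it runs until its next recv.
                waiting[idx] = (t, t.send(msg))
                break
        else:
            return  # every thread is blocked on an empty channel


def demo():
    log_a, log_b = [], []
    chan_a, chan_b = Channel(), Channel()
    # The server loops over all channels of the room, sending to each one.
    for chan in (chan_a, chan_b):
        send(chan, "hello")
    send(chan_a, "again")
    run([receiver(chan_a, log_a), receiver(chan_b, log_b)])
    return log_a, log_b
```

Because a thread only gives up control at its recv, no two log updates can interleave partway through, which mirrors the modularity argument made above for the cooperative semantics.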
have chosen the same string name for their elements; mod- Acknowledgments
ule authors must coordinate on how to divide a global This work has been supported in part by National Science
namespace. Meteor supports automatic publishing of server- Foundation grant CCF-1217501. The author thanks Christian
side database changes into client-side caches, and then from J. Bell, Benjamin Delaware, Xavier Leroy, Clément Pit–
those caches into rendered pages. In addition to automatic Claudel, Benjamin Sherman, and Peng Wang for their feed-
updating of pages based on state changes, Meteor provides back on drafts.
a standard DOM-based API for walking document struc-
ture and making changes imperatively, though it is not very References
1. Armstrong, J. Erlang – A survey of the ACM SIGPLAN Conference
idiomatic. Meteor’s machinery for reactive page updating language and its industrial applications. on Programming Language Design
involves a more complex API than Ur/Web’s. Its central con- In Proceedings of the Symposium and Implementation, PLDI ’13
on Industrial Applications of Prolog (New York, NY, USA, 2013).
cept is of imperative functions that need to be rerun when (INAP96) (1996), 16–18. ACM, 411–422.
any of their dependencies change, whereas Ur/Web describes 2. Balat, V. Ocsigen: Typing Web 10. Krishnamurthi, S., Hopkins, P.W.,
interaction with Objective Caml. McCarthy, J., Graunke, P.T., Pettyjohn, G.,
reactive computations in terms of pure code within the sig- In Proceedings of the 2006 Workshop Felleisen, M. Implementation and
on ML, ML ’06 (New York, NY, USA, use of the PLT Scheme Web Server.
nal monad, such that it is easy to rerun only part of a com- 2006). ACM, 84–94. Higher Order Symbol. Comput. 20, 4
putation, when not all of its dependencies have changed. 3. Balat, V., Vouillon, J., Yakobowski, (2007), 431–460.
B. Experience report: Ocsigen, a 11. MacQueen, D. Modules for standard
Forcing purity on these computations helps avoid the con- Web programming framework. ML. In Proceedings of the 1984 ACM
fusing consequences of genuine side effects being repeated In Proceedings of the 14th ACM Symposium on LISP and Functional
SIGPLAN International Conference Programming, LFP ’84 (New York, NY,
on each change to dependencies. The five lines of code near on Functional Programming, ICFP ’09 USA, 1984). ACM, 198–207.
the start of Section 2.2, together with the <dyn> pseudotag, (New York, NY, USA, 2009). ACM, 12. Meijer, E., Beckman, B., Bierman, G.
311–316. LINQ: Reconciling objects, relations
give the complete interface for reactive programming in 4. Cheney, J., Lindley, S., Wadler, P. A and XML in the .NET framework.
Ur/Web, in contrast with tens of pages of documentation practical theory of language-integrated In Proceedings of the 2006 ACM
query. In Proceedings of the 18th ACM SIGMOD International Conference on
(of dynamically typed functions) for Meteor. SIGPLAN International Conference on Management of Data, SIGMOD ’06
Other popular JavaScript frameworks include Angular.js,e Functional Programming, ICFP ’13 (New York, NY, USA, 2006). ACM,
(New York, NY, USA, 2013). ACM, 706–706.
Backbone,f Ractive,g and React.h A commonality among these 403–416. 13. Meyerovich, L.A., Guha, A., Baskin, J.,
libraries seems to be heavyweight approaches to the basic 5. Chlipala, A. Ur: Statically-typed Cooper, G.H., Greenberg, M.,
metaprogramming with type-level Bromfield, A., Krishnamurthi, S.
structure of reactive GUIs, with built-in mandatory concepts record computation. In Proceedings Flapjax: A programming language for
of models, views, controllers, templates, components, etc. of the 31st ACM SIGPLAN Conference Ajax applications. In Proceedings of
on Programming Language Design the 24th ACM SIGPLAN Conference
In contrast, Ur/Web has its 5-line API of sources and signals. and Implementation, PLDI ’10 on Object Oriented Programming
(New York, NY, USA, 2010). ACM, Systems Languages and Applications,
These mainstream JavaScript frameworks tend to force ele- 122–133. OOPSLA ’09 (New York, NY, USA,
ments of reactive state to be enumerated explicitly as fields of 6. Chlipala, A. An optimizing compiler 2009). ACM, 1–20.
for a purely functional web-application 14. Peyton Jones, S.L., Wadler, P.
some distinguished object, instead of allowing data sources language. In Proceedings of the Imperative functional
to be allocated dynamically throughout the modules of a pro- 20th ACM SIGPLAN International programming. In Proceedings of
Conference on Functional the 20th ACM SIGPLAN-SIGACT
gram and kept as private state of those modules. Programming, ICFP 2015 (New York, Symposium on Principles of
NY, USA, 2015). ACM, 10–21.
7. Chlipala, A. Ur/Web: A simple model for programming the Web. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’15 (New York, NY, USA, 2015). ACM, 153–165.
8. Cooper, E., Lindley, S., Wadler, P., Yallop, J. Links: Web programming without tiers. In Proceedings of the 5th International Conference on Formal Methods for Components and Objects, FMCO’06 (Berlin, Heidelberg, 2007). Springer-Verlag, 266–296.
9. Czaplicki, E., Chong, S. Asynchronous functional reactive programming for GUIs. In Proceedings of the 34th
Programming Languages, POPL ’93 (New York, NY, USA, 1993). ACM, 71–84.
15. Reppy, J.H. Concurrent Programming in ML. Cambridge University Press, New York, NY, USA, 1999.
16. Serrano, M., Gallesio, E., Loitsch, F. Hop, a language for programming the Web 2.0. In Proceedings of the First Dynamic Languages Symposium, DLS ’06 (New York, NY, USA, 2006). ACM.
17. Wadler, P., Blott, S. How to make ad-hoc polymorphism less ad hoc. In Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’89 (New York, NY, USA, 1989). ACM, 60–76.

4. CONCLUSION
We have presented the design of Ur/Web, a programming language for Web applications, focusing on a few language-design ideas that apply broadly to a class of distributed applications. Our main mission is to promote two desiderata that programmers should be asking for in their Web frameworks, but which seem almost absent from mainstream discussion: stronger and domain-specific modes of encapsulation and simple concurrency models based on transactions.

Ur/Web is used in production today for a number of applications, including at least one^i with thousands of paying customers. We list more examples elsewhere.7 We have also written elsewhere about Ur/Web’s whole-program optimizing compiler,6 which has placed favorably in a third-party Web-framework benchmarking initiative,^j for instance achieving the second-best throughput (out of about 150 participating frameworks) of about 300,000 requests per second on the test closest to a full Web app.

Adam Chlipala ([email protected]), MIT CSAIL, Cambridge, MA.

Copyright held by author. Publication rights licensed to ACM. $15.00
e https://angularjs.org/.
f http://backbonejs.org/.
g http://www.ractivejs.org/.
h http://facebook.github.io/react/.
i http://www.bazqux.com/.
j http://www.techempower.com/benchmarks/.

Watch the author discuss his work in this exclusive Communications video. http://cacm.acm.org/videos/ur-web
[CONTINUED FROM P. 104] fingers, but it’s just the frame and the forks and the mudguards. The saddles look weird before they’re fully grown, on the trunks like cankers, but that’s all they are. Growths, like the seats we’re sitting on. Don’t get up. It’ll only go out of shape.

There’s a skill to the job, you know. Say you see a growth that’s going wild, or a stray sapling from another plantation, and you have to prune it before it spreads. OK, I’ll put the knife away. I don’t want you to feel uncomfortable. Settle in. Take it easy.

I know, I know. The lab boys—and girls of course—will tell you it’s all predictable, like a chemical factory. But biology’s not like that, even when it’s synthetic. I’m not mystical about it. It’s just mutation and natural selection. We have a joke—that we’re for intelligent design, and against evolution. Intelligent design is what the company does, and evolution is what it pays us to prevent.

But you can’t. Not completely.

Time for another. No, not at all, my treat. You’re our guest here. And have a shot of spirit on the side; yes, we do our own distilling. Careful though, you’re probably not used to it.

Just a sip, that’s it. Oh! There’s a napkin. A gulp of beer afterward, that’s the trick. Gesundheit! Here, have another napkin.

Where were we? Oh yes. This tax business you’re on about. Well, the company takes care of all that. We get our bank statements on our phones. Take a look. Yes, the balance does mount up. Not much we need to spend money on here. Every so often some government department sends in someone young and keen like yourself who thinks they can make a big deal of some little thing and make a name for themselves. But they soon come round, every one of them.

Take Jess over there. Lovely girl. No, she’s not smoking. Good grief, do you think we’re crazy? It’s some kind of steam, you just roll up the leaf real tight like, and it heats up in the center. Some experimental thing that blew in, no doubt. Like I was saying, evolution.

Got him right there, admitting a breach of the GM regulations. My cheerful host waves a hand around the crowded recreation shelter, oblivious to his own carelessness ... but the offense is trivial compared to the tax evasion, and I doubt he knows anything about how the company pulls it off.

Anyway, Jess. Oh, hi. Just telling him about you, but I’m sure you can ... ah, be off with you!

That woman will be the death of me some day. Likes her fun. Loves her work. But you should have seen her when she came here two years ago. Health and Safety inspector. Hard hat and high heels and a clipboard. Wouldn’t touch the beer, insisted on fruit juice. None of ours, either. She took a bottle from her own handbag and sipped from it.

Sit yourself down, I said, and got a beer for myself and started talking to her, just like I’m talking to you. After a while she began to come round to my way of seeing things, just like you are.

So that’s what became of the young lady from Environment ... I find myself enjoying the sound of the forester’s voice, and somehow that lazy enjoyment is more important than the discovery. Must be the vague benign glow from the drink. The drink! Have I been slipped something?

Nah, don’t look at your pint like that, or even your shot. It’s good beer and good grain spirit with a touch of flavoring, that’s all. It’s not what’s bringing you round. And it’s not what’s been making you sneeze.

It’s the spores.

Told you, the seat’s made from the same stuff as the bike saddles. Just another mutant growth. But it’s not just the shape that’s mutated.

The spores waft out whenever you shift your weight. You can’t help but inhale them. The people who designed the bikes wanted people to feel confident riding them, so they did a bit of tinkering with the spores. Put a bit of genetic code in to work on the old nervous system. Give feelings of confidence and rightness and wellbeing.

So that’s why LifeCycles bikes sell so well, why their customers are so loyal. And their employees, too ... and why everyone who finds out about this massive, brazen violation of the GM regulations and all the relevant international conventions keeps quiet about it. And why every tax inspector who goes in finds the accounts add up. Or if they don’t, they like the people here too much to tell on them. Just as I won’t.

The funny thing is, I should be feeling outraged, but I’m not. I’m laughing at the ingenuity and effrontery of it all. I look at the guy, grinning stupidly. There’s something I want to tell him. It’s just on the tip of my tongue. But I can’t.

And our bar seats have the same effect on anyone who sits on them—good feelings.

Gut feelings, you might say. Because that’s where they come from—your intestinal flora tweaked by the spores to pump out mood-modifying molecules to your bloodstream.

Doesn’t mean they’re not real feelings. You’ll like it here.

And I did, for several days. The feelings were real all right. I loved the place. I looked around and considered what job to apply for, and kept phoning in glowing reports to the office. Meanwhile, the polycarbonate strands under my skin kept reporting back what was really going on. Then, the next morning, I woke to the sound of helicopters.

The Revenue knew what the wire had reported, but back at the office they got nothing out of me. Geraldine was very understanding, and put me on sick leave until the artificial loyalty wore off.

Now it has, I can tell the story. But one question still bothers me: What stopped me, for those days of synthetic love for the company, from telling them about the wire?

Only now, when the wire has disintegrated, do I realize one obvious answer—the wire itself. The Revenue has its own biochemical tricks to ensure loyalty and secrecy. And it isn’t telling anyone about them.

When I ask Geraldine if I’m right, she looks vague.

‘It’s just on the tip of my tongue,’ she says. ‘It’ll come back to me.’

Ken MacLeod ([email protected]) is the author of 15 novels, from The Star Fraction (Orbit Books, London, 1995) to The Corporation Wars: Dissidence (Orbit Books, London, 2016). He blogs at The Early Days of a Better Nation (http://kenmacleod.blogspot.com) and tweets as @amendlocke.

© 2016 ACM 0001-0782/16/08 $15.00
Future Tense
Gut Feelings
Even a little genetic engineering can
render us too comfortable for our own good.
I’m a tax inspector. I don’t expect to be liked.
‘They’re up to something in there,’ said Geraldine Myles, her finger-tap casting a black shadow on the map projection. ‘I can feel it.’

‘We can all feel it, chief,’ I said. ‘It’s getting the evidence that’s the problem.’

The LifeCycles forest covered most of Fife. Its export record didn’t square with its revenue. The product’s popularity was as inarguable as it was unaccountable. Every inspector we’d sent in had returned glowing reports. The young lady from Environment was so impressed she’d chucked in her job and joined it. Drones? The forest found a way to swat them. The tiny wreckage was always returned, with a bland apology and a note on commercial secrecy.

‘Evidence? That’s where you come in.’ Myles shot me an evil smile. ‘Or rather, that’s where you’re going in.’
Talk in private? Sure, happy to do that. We’ve got nothing to hide. Everything’s open and above board, even if it’s a bit dark and gloomy looking. But what’s private these days? The trees don’t talk, but the walls listen; that’s what we say around here.

Walls? I’m confused for a moment, until I realize this friendly foreman is referring to the densely interwoven genetically modified hedge that surrounds and covers this place deep in the forest. It’s one of the LifeCycles recreation spots, modeled on an old-fashioned pub. The lighting is bioluminescent tubes. The floor is like pine needles, the furnishings cork. The bar at the center is conventional polished hardwood, but seamless in a way that makes me suspect it’s GM, too. A couple dozen people are here, at the end of the working day. Folks come and go through gaps in the hedge. Talk is careless when loud. I can’t hear what it’s like when it’s quiet.

But the wire can.

It’s not the walls he should be worrying about.

We call the transmitter injected across my collarbones ‘the wire,’ but that’s just a skeuomorph. It’s a new device. It’ll be about a fortnight before it disintegrates. Subcutaneous, organic, undetectable ... and now, with a carefully casual shrug of my shoulders …

On.

Sit yourself down and let me get you a pint. No, no … it’s nothing, it’s a local brew, you can see the kegs right there. Yes, it’s all covered in the forest bylaws; you can look it up.

Cheers! Good stuff, eh?

You were saying? Oh yes! The trees. Well, I can see how you might get that impression. A lot of folks do, the first time. Oppressive and sinister, they tell me. It’s dark beneath the trees—but the leaves are designed to catch as much sunlight as possible.

And the branches … nothing but bicycle frames, thousands of them. It looks sinister only when you let daft ideas into your head, about a crooked arm, maybe, with clawed [CONTINUED ON P. 103]

Image by Andrij Borys Associates/Shutterstock