COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 10/2017 VOL.60 NO.10

Internet Advertising
Metaphors We Compute By
3D-Printing Body Parts
What Can Agile Methods Bring to Software Development?

Association for Computing Machinery
Previous A.M. Turing Award Recipients

1966 A.J. Perlis
1967 Maurice Wilkes
1968 R.W. Hamming
1969 Marvin Minsky
1970 J.H. Wilkinson
1971 John McCarthy
1972 E.W. Dijkstra
1973 Charles Bachman
1974 Donald Knuth
1975 Allen Newell
1975 Herbert Simon
1976 Michael Rabin
1976 Dana Scott
1977 John Backus
1978 Robert Floyd
1979 Kenneth Iverson
1980 C.A.R. Hoare
1981 Edgar Codd
1982 Stephen Cook
1983 Ken Thompson
1983 Dennis Ritchie
1984 Niklaus Wirth
1985 Richard Karp
1986 John Hopcroft
1986 Robert Tarjan
1987 John Cocke
1988 Ivan Sutherland
1989 William Kahan
1990 Fernando Corbató
1991 Robin Milner
1992 Butler Lampson
1993 Juris Hartmanis
1993 Richard Stearns
1994 Edward Feigenbaum
1994 Raj Reddy
1995 Manuel Blum
1996 Amir Pnueli
1997 Douglas Engelbart
1998 James Gray
1999 Frederick Brooks
2000 Andrew Yao
2001 Ole-Johan Dahl
2001 Kristen Nygaard
2002 Leonard Adleman
2002 Ronald Rivest
2002 Adi Shamir
2003 Alan Kay
2004 Vinton Cerf
2004 Robert Kahn
2005 Peter Naur
2006 Frances E. Allen
2007 Edmund Clarke
2007 E. Allen Emerson
2007 Joseph Sifakis
2008 Barbara Liskov
2009 Charles P. Thacker
2010 Leslie G. Valiant
2011 Judea Pearl
2012 Shafi Goldwasser
2012 Silvio Micali
2013 Leslie Lamport
2014 Michael Stonebraker
2015 Whitfield Diffie
2015 Martin Hellman
2016 Sir Tim Berners-Lee

ACM A.M. TURING AWARD
NOMINATIONS SOLICITED

Nominations are invited for the 2017 ACM A.M. Turing Award. This is ACM’s oldest and most prestigious award and is given to recognize contributions of a technical nature which are of lasting and major technical importance to the computing field. The award is accompanied by a prize of $1,000,000. Financial support for the award is provided by Google Inc.

Nomination information and the online submission form are available on:
http://amturing.acm.org/call_for_nominations.cfm

Additional information on the Turing Laureates is available on:
http://amturing.acm.org/byyear.cfm

The deadline for nominations/endorsements is January 15, 2018.

For additional information on ACM’s award program please visit: www.acm.org/awards/
COMMUNICATIONS OF THE ACM

Departments

5 Editor’s Letter
Computing Is a Profession
By Andrew A. Chien

7 Cerf’s Up
Six Education
By Vinton G. Cerf

8 Letters to the Editor
Beyond Brute Force

12 BLOG@CACM
Manipulating Word Representations, and Preparing Students for Coding Jobs?
Robin K. Hill mulls an aspect of natural language processing research, while Mark Guzdial ponders why coding is taught in public schools.

33 Calendar

103 Careers

Last Byte

112 Upstart Puzzles
Partitioned Peace
By Dennis Shasha

News

15 3D-Printing Human Body Parts
Bioprinting has generated bones, cartilage, and some muscles; hearts and livers are still years away.
By Keith Kirkpatrick

18 Digital Hearing
Advances in audio processing help separate the conversation from background noise.
By Don Monroe

21 Portable Device Fears Show Power of Social Development
How do small screens impact young minds?
By Chris Edwards

Viewpoints

24 Technology Strategy and Management
Amazon and Whole Foods: Follow the Strategy (and the Money)
Checking out the recent Amazon acquisition of Whole Foods.
By Michael A. Cusumano

27 Inside Risks
The Real Risks of Artificial Intelligence
Incidents from the early days of AI research are instructive in the current AI environment.
By David Lorge Parnas

32 Economic and Business Dimensions
FinTech Platforms and Strategy
Integrating trust and automation in finance.
By Vasant Dhar and Roger M. Stein

36 Kode Vicious
IoT: The Internet of Terror
If it seems like the sky is falling, that’s because it is.
By George V. Neville-Neil

38 Viewpoint
What Can Agile Methods Bring to High-Integrity Software Development?
Considering the issues and opportunities raised by Agile practices in the development of high-integrity software.
By Roderick Chapman, Neil White, and Jim Woodcock

PHOTO BY EDWARD OLIVE

2 COMMUNICATIONS OF THE ACM | OCTOBER 2017 | VOL. 60 | NO. 10


10/2017 VOL. 60 NO. 10

Practice

42 Metaphors We Compute By
Code is a story that explains how to solve a particular problem.
By Alvaro Videla

46 Research for Practice: Technology for Underserved Communities; Personal Fabrication
Expert-curated guides to the best of CS research.

50 Four Ways to Make CS and IT More Immersive
Why the Bell curve hasn’t transformed into a hockey stick.
By Thomas A. Limoncelli

Articles’ development led by queue.acm.org

Contributed Articles

54 Barriers to Refactoring
Developers know refactoring improves their software, but many find themselves unable to do so when they want to.
By Ewan Tempero, Tony Gorschek, and Lefteris Angelis
Watch the authors discuss their work in this exclusive Communications video.
https://cacm.acm.org/videos/barriers-to-refactoring

62 Millennials’ Attitudes Toward IT Consumerization in the Workplace
Millennials entering the workforce ignore the risks of using privately owned devices on the job.
By Heiko Gewald, Xuequn Wang, Andy Weeger, Mahesh S. Raisinghani, Gerald Grant, Otavio Sanchez, and Siddhi Pittayachawan

Review Articles

70 Internet Advertising: Technology, Ethics, and a Serious Difference of Opinion
Exploring the technical and ethical issues surrounding Internet advertising and ad blocking.
By Stephen B. Wicker and Kolbeinn Karlsson
Watch the authors discuss their work in this exclusive Communications video.
https://cacm.acm.org/videos/internet-advertising

Research Highlights

80 Technical Perspective
Broadening and Deepening Query Optimization Yet Still Making Progress
By Jeffrey F. Naughton

81 Multi-Objective Parametric Query Optimization
By Immanuel Trummer and Christoph Koch

90 Technical Perspective
Shedding New Light on an Old Language Debate
By Jeffrey S. Foster

91 A Large-Scale Study of Programming Languages and Code Quality in GitHub
By Baishakhi Ray, Daryl Posnett, Premkumar Devanbu, and Vladimir Filkov

(L) IMAGE BY ANDRIJ BORYS ASSOCIATES/SHUTTERSTOCK; (R) ILLUSTRATION BY JUSTIN METZ

About the Cover:
Practitioners have turned to refactoring for years to make software code cleaner and sleeker. But studies show developers do not rely on refactoring as often as they could, and maybe should, for a variety of reasons. Our cover story (p. 54) explores the specific barriers that hinder refactoring, and suggests ways to promote its benefits. Cover illustration by Justin Metz.

Association for Computing Machinery
Advancing Computing as a Science & Profession



COMMUNICATIONS OF THE ACM
Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.

ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO: Bobby Schnabel
Deputy Executive Director and COO: Patricia Ryan
Director, Office of Information Systems: Wayne Graves
Director, Office of Financial Services: Darren Ramdin
Director, Office of SIG Services: Donna Cappo
Director, Office of Publications: Scott E. Delman

ACM COUNCIL
President: Vicki L. Hanson
Vice-President: Cherri M. Pancake
Secretary/Treasurer: Elizabeth Churchill
Past President: Alexander L. Wolf
Chair, SGB Board: Jeanna Matthews
Co-Chairs, Publications Board: Jack Davidson and Joseph Konstan
Members-at-Large: Gabriele Anderst-Kotis; Susan Dumais; Elizabeth D. Mynatt; Pamela Samuelson; Eugene H. Spafford
SGB Council Representatives: Paul Beame; Jenna Neefe Matthews; Barbara Boucher Owens

BOARD CHAIRS
Education Board: Mehran Sahami and Jane Chu Prey
Practitioners Board: Terry Coatta and Stephen Ibaraki

REGIONAL COUNCIL CHAIRS
ACM Europe Council: Dame Professor Wendy Hall
ACM India Council: Srinivas Padmanabhuni
ACM China Council: Jiaguang Sun

PUBLICATIONS BOARD
Co-Chairs: Jack Davidson; Joseph Konstan
Board Members: Phoebe Ayers; Karin K. Breitman; Terry J. Coatta; Anne Condon; Nikil Dutt; Roch Guerrin; Chris Hankin; Carol Hutchins; Yannis Ioannidis; Michael L. Nelson; M. Tamer Ozsu; Eugene H. Spafford; Stephen N. Spencer; Alex Wade; Keith Webster; Julie R. Williamson

ACM U.S. Public Policy Office
1701 Pennsylvania Ave NW, Suite 300, Washington, DC 20006 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Deborah Seehorn, Interim Executive Director

STAFF
DIRECTOR OF PUBLICATIONS: Scott E. Delman, [email protected]
Executive Editor: Diane Crawford
Managing Editor: Thomas E. Lambert
Senior Editor: Andrew Rosenbloom
Senior Editor/News: Lawrence M. Fisher
Web Editor: David Roman
Rights and Permissions: Deborah Cotton
Editorial Assistant: Jade Morris
Art Director: Andrij Borys
Associate Art Director: Margaret Gray
Assistant Art Director: Mia Angelica Balaquiot
Production Manager: Bernadette Shade
Advertising Sales Account Manager: Ilia Rodriguez
Columnists: David Anderson; Phillip G. Armour; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission: [email protected]
Calendar items: [email protected]
Change of address: [email protected]
Letters to the Editor: [email protected]

WEBSITE
http://cacm.acm.org

AUTHOR GUIDELINES
http://cacm.acm.org/about-communications/author-center

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 626-0686
F (212) 869-0481
Advertising Sales Account Manager: Ilia Rodriguez, [email protected]
Media Kit: [email protected]

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

EDITORIAL BOARD
EDITOR-IN-CHIEF: Andrew A. Chien, [email protected]
SENIOR EDITOR: Moshe Y. Vardi

NEWS
Co-Chairs: William Pulleyblank and Marc Snir
Board Members: Monica Divitini; Mei Kobayashi; Michael Mitzenmacher; Rajeev Rastogi; François Sillion

VIEWPOINTS
Co-Chairs: Tim Finin; Susanne E. Hambrusch; John Leslie King; Paul Rosenbloom
Board Members: William Aspray; Stefan Bechtold; Michael L. Best; Judith Bishop; Stuart I. Feldman; Peter Freeman; Mark Guzdial; Rachelle Hollander; Richard Ladner; Carl Landwehr; Carlos Jose Pereira de Lucena; Beng Chin Ooi; Loren Terveen; Marshall Van Alstyne; Jeannette Wing

PRACTICE
Co-Chairs: Stephen Bourne and Theo Schlossnagle
Board Members: Eric Allman; Samy Bahra; Peter Bailis; Terry Coatta; Stuart Feldman; Nicole Forsgren; Camille Fournier; Benjamin Fried; Pat Hanrahan; Tom Killalea; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Jim Waldo; Meredith Whittaker

CONTRIBUTED ARTICLES
Co-Chairs: James Larus and Gail Murphy
Board Members: William Aiello; Robert Austin; Elisa Bertino; Gilles Brassard; Kim Bruce; Alan Bundy; Peter Buneman; Carl Gutwin; Yannis Ioannidis; Gal A. Kaminka; Karl Levitt; Igor Markov; Gail C. Murphy; Bernhard Nebel; Lionel M. Ni; Adrian Perrig; Sriram Rajamani; Marie-Christine Rousset; Krishan Sabnani; Ron Shamir; Yoav Shoham; Josep Torrellas; Michael Vitale; Hannes Werthner; Reinhard Wilhelm

RESEARCH HIGHLIGHTS
Co-Chairs: Azer Bestavros and Gregory Morrisett
Board Members: Martin Abadi; Amr El Abbadi; Sanjeev Arora; Michael Backes; Maria-Florina Balcan; Andrei Broder; Doug Burger; Stuart K. Card; Jeff Chase; Jon Crowcroft; Alexei Efros; Alon Halevy; Sven Koenig; Steve Marschner; Tim Roughgarden; Guy Steele, Jr.; Margaret H. Wright; Nicholai Zeldovich; Andreas Zeller

WEB
Chair: James Landay
Board Members: Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2017 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $269.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact [email protected].

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to Communications of the ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA

Printed in the U.S.A.



editor’s letter

DOI:10.1145/3137136 Andrew A. Chien

Computing Is a Profession

THE NOTION OF what constitutes a profession has been studied extensively by exploring the attributes of the activities, roles, and community that lead to a profession’s rise and definition, and how professions achieve importance and influence in society.1 Common among these attributes are deep technical expertise, an essential and valued societal contribution, and the need to adhere to high ethical and technical standards. Professions such as medicine, law, and accounting exemplify these attributes. Computing exhibits all of the attributes of a profession.

Deep Technical Expertise. Every day we witness and drive computing’s rapid technical advance: new technologies, advancing sophistication, and outright new capabilities. The compounding of this continued and accelerating advance gives rise to deep technical expertise. The behavioral and internal complexity of algorithms and systems rivals the greatest complexities humanity has known in biology, society, and the universe.

Societal Recognition. Computing’s evident importance to society is deep and growing: sophisticated collection and processing of information underpins decision-making, logistics, and optimization in industry and commerce. Web, email, and messaging platforms are the information backbone of government and commerce. Social application platforms expand these roles from official to social, insinuating computing into the core of the social fabric. A world without cheap, pervasive computing of extraordinary capability is, if not inconceivable, at least so distant as to be unrecognizable, just as modern existence without the practice of medicine or law would be unimaginable.

Necessity for High Ethical and Technical Standards. The practice of computing demands adherence to the highest technical and ethical standards, and failure has significant consequences at a personal, corporate, national, and international scale. Computing applications reach into every corner of life. Appropriation, misuse, or simply the free flow of the information they handle has demonstrably destroyed individual privacy, relationships, and careers (for example, Ashley Madison), derailed grand corporate plans (for example, Sony), and radically changed international relations (for example, the U.S. Govt OPM penetration and the Snowden releases).

With growing excitement for artificial intelligence, computing is being thrust into new societal roles (recommender, decider) and given autonomy to make decisions with life-changing human impact (personal assistant, sentencing guidelines, self-driving cars). While deep technical challenges abound, the ethical challenges, principles, and standards are even more daunting. Compare these with Hanks’ “rulebook” in Bridge of Spies, the underlying principles of a representative government as embodied in the Constitution of the U.S., and one can only wonder whether the issues are any less thorny.

So yes, computing is a profession, and we should proudly embrace the responsibility. We should welcome, educate, and mentor new generations not just as “coders,” “hackers,” or programmers, but as computing professionals.

Computing professionals have a responsibility to practice at the state of the art and to keep their knowledge at the forefront. In addition, professionals have an obligation to share a clear understanding of the technology and its implications with nonprofessionals, and to operate in accord with professional ethics.2 These are daunting goals for an individual, so professional societies play a critical role in cultivating and supporting professionalism. What roles specifically?

First, a professional society should be a conservator and disseminator of deep technical knowledge and expertise: championing the advance of the field by leading technologists worldwide, documenting the state of the art in technology and application, and accelerating the dissemination and availability of such knowledge to computing professionals.

Second, societies develop and advocate principles for ethical technical conduct that frame the role of computing professionals, and buttress them with the stature and role of the profession in society. Examples include articulating best practices and intellectual challenges for the field,3 as well as addressing societal questions that require deep technical perspective, such as the USACM joint release on the Internet of Things.4

An independent professional society must transcend any individual, organization, government, or cause. This is necessary because technical knowledge and professional ethics must inform professional conduct, and they inevitably come into conflict with personal interest, corporate interest, government or national interest, or even overt coercion.

As the leading computing professional society, the ACM seeks to fill these roles for computing!

Andrew A. Chien, EDITOR-IN-CHIEF

Andrew A. Chien is the William Eckhardt Distinguished Service Professor in the CS Department at the University of Chicago, Director of the CERES Center for Unstoppable Computing, and a Senior Scientist at Argonne National Lab.

References
1. Abbott, A. The System of Professions: An Essay on the Division of Expert Labor. University of Chicago Press, 1988; ISBN-13: 978-0226000695.
2. ACM. Code of Ethics, 1992; https://www.acm.org/about-acm/code-of-ethics
3. Denning, P.J. The Profession of IT. Columns in Commun. ACM, 2008–2017.
4. USACM. Statement on Internet of Things Privacy and Security, 2017; https://www.acm.org/

Copyright held by author.



AWARD NOMINATIONS SOLICITED

As part of its mission, ACM brings broad recognition to outstanding technical and professional achievements in computing and information technology. ACM welcomes nominations for those who deserve recognition for their accomplishments. Please refer to the ACM Awards website at http://awards.acm.org for guidelines on how to nominate, lists of the members of the 2017 Award Committees, and listings of past award recipients and their citations.

Nominations are due January 15, 2018 (formerly November 30, 2017), with the exceptions of the Doctoral Dissertation Award (due October 31, 2017) and the ACM – IEEE CS George Michael Memorial HPC Fellowship (due May 1, 2018).

A.M. Turing Award: ACM’s most prestigious award recognizes contributions of a technical nature which are of lasting and major technical importance to the computing community. The award is accompanied by a prize of $1,000,000 with financial support provided by Google.

ACM Prize in Computing (previously known as the ACM-Infosys Foundation Award in the Computing Sciences): recognizes an early- to mid-career fundamental, innovative contribution in computing that, through its depth, impact and broad implications, exemplifies the greatest achievements in the discipline. The award carries a prize of $250,000. Financial support is provided by Infosys Ltd.

Distinguished Service Award: recognizes outstanding service contributions to the computing community as a whole.

Doctoral Dissertation Award: presented annually to the author(s) of the best doctoral dissertation(s) in computer science and engineering, and is accompanied by a prize of $20,000. The Honorable Mention Award is accompanied by a prize totaling $10,000. Financial support of the award is provided by Google, Inc. Winning dissertations are published in the ACM Digital Library and the ACM Books Series.

ACM – IEEE CS George Michael Memorial HPC Fellowship: honors exceptional PhD students throughout the world whose research focus is on high-performance computing applications, networking, storage, or large-scale data analysis using the most powerful computers that are currently available. The Fellowship includes a $5,000 honorarium.

Grace Murray Hopper Award: presented to the outstanding young computer professional of the year, selected on the basis of a single recent major technical or service contribution. The candidate must have been 35 years of age or less at the time the qualifying contribution was made. A prize of $35,000 accompanies the award. Financial support is provided by Microsoft.

Paris Kanellakis Theory and Practice Award: honors specific theoretical accomplishments that have had a significant and demonstrable effect on the practice of computing. This award is accompanied by a prize of $10,000 and is endowed by contributions from the Kanellakis family, and financial support by ACM’s SIGACT, SIGDA, SIGMOD, SIGPLAN, and the ACM SIG Project Fund, and individual contributions.

Karl V. Karlstrom Outstanding Educator Award: presented to an outstanding educator who is appointed to a recognized educational baccalaureate institution, recognized for advancing new teaching methodologies, effecting new curriculum development or expansion in computer science and engineering, or making a significant contribution to ACM’s educational mission. Teachers with 10 years or less experience are given special consideration. The Karlstrom Award is accompanied by a prize of $10,000. Financial support is provided by Pearson Education.

ACM – AAAI Allen Newell Award: presented to individuals selected for career contributions that have breadth within computer science, or that bridge computer science and other disciplines. The $10,000 prize is provided by ACM and AAAI, and by individual contributions.

Outstanding Contribution to ACM Award: recognizes outstanding service contributions to the Association. Candidates are selected based on the value and degree of service overall.

ACM Policy Award: recognizes an individual or small group that had a significant positive impact on the formation or execution of public policy affecting computing or the computing community. This can be for education, service, or leadership in a technology policy position; for establishing an innovative program in policy education or advice; for building the community or a community resource in technology policy; or for other notable policy activity. The biennial award is accompanied by a $10,000 prize. The first award will be the 2017 award.

Software System Award: presented to an institution or individuals recognized for developing a software system that has had a lasting influence, reflected in contributions to concepts, in commercial acceptance, or both. A prize of $35,000 accompanies the award with financial support provided by IBM.

ACM Athena Lecturer Award: celebrates women researchers who have made fundamental contributions to computer science. The award includes a $25,000 honorarium provided by Google.

For SIG-specific Awards, please visit http://awards.acm.org/sig-awards.

Vinton G. Cerf, ACM Awards Committee Co-Chair
John R. White, ACM Awards Committee Co-Chair
Insup Lee, SIG Governing Board Awards Committee Liaison
Rosemary McGuinness, ACM Awards Committee Liaison
cerf’s up
cerf’s up

DOI:10.1145/3134431 Vinton G. Cerf

Six Education
How many ways can you make numbers add up to six? The obvious pairs are (1,5), (2,4), (3,3), (4,2) and (5,1). OK, I left out (0,6) and (6,0), which are of negligible interest.

Of course, if you permit negative numbers the number of pairs is infinite: (-1,7), (-2,8), and so on. When I was growing up, we learned our addition and times tables, memorized them, and used that information to do simple arithmetic. I recently had the opportunity to join several thousand teachers in Southern California for an annual confab on teaching.[a] I came away with a very different view of elementary and secondary education than I had going in.

For years we have tended to measure how well our students have learned by testing what they know. In the discussions at California State University, Fullerton, a different mindset emerged. Why not test what students can do? The shift in focus takes into account that we have myriad ways in the 21st century to find out what we want to know. Part of knowing how to do things is finding information when we need it. Moreover, it seems evident that we learn better not from rote memorization, but from using knowledge to do things. The Montessori schools approach learning from an experiential and exploratory perspective, for example. We learn from our mistakes sometimes more than from our successes. Certainly science tends to work that way.

The Common Core State education standards[b] initiative gets at this idea by focusing on the reasons why different methods get the same mathematical results. The point is not to use every possible way to do computation but, rather, to understand in a more fundamental way the nature of computation. Students are asked to show their work (their reasoning) as a way of determining how well they have absorbed basic concepts of mathematics, for example. Another way in which this depth of knowledge is assessed is to ask students to show they can make practical estimates of the magnitude of answers they should expect to get. This can be a quick way of ensuring the answers are in the right ballpark. For example:

   197
 + 332
 -----
   529

To estimate the ballpark, we could sum 200+330 to get 530, which is a good approximation to 529, or even 200+300 = 500, which is easier to do in our heads.

In the No Child Left Behind legislation, testing knowledge was the gold standard, but it led to behavior such as “teach to the test.” While this might produce good scores, it might not produce good understanding. What I think we want is to produce graduate students who know how to learn and how to find information they need from a variety of sources. This will also prove to be important as children grow into adults and experience longer working careers spanning decades. Without much doubt these adults will need to learn new things to stay current as technology continues its relentless evolutionary pace. Adapting to change will be a career-enhancing capability.

One of the keynote speakers was Jill Biden, a teacher of teachers who, astonishingly, taught at Northern Virginia Community College while serving the U.S. as Second Lady. She referred to the ineffable pain as her stepson, Beau Biden, lost his battle with cancer and drew to mind Vice President Biden’s earlier loss of his first wife, Neilia, and daughter, Naomi, in a car crash years before. In her moving account of life before, during, and after the Obama/Biden Administration she summarized: “What matters most in life is how well we will walk through fire.” This, too, was a life lesson I took away from this very thought-provoking event.

In the last couple of months I have also encountered new ways to teach mathematics in online settings. Alexander Khachatryan founded the Reasoning Mind organization to create courses in mathematics for pre-K, K, elementary, and secondary school levels.[c] I tried some of the lessons for the youngest children and they plainly laid a foundation for understanding the notion of sets and set membership and equivalence classes, all without using unnecessarily complex and obscure terminology.

[a] Better Together: California Teachers Summit 2017 (http://cateacherssummit.com/)
[b] https://en.wikipedia.org/wiki/Common_Core_State_Standards_Initiative
[c] https://www.reasoningmind.org/

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

Copyright held by author.

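Cerf’s ballpark check can be written as a small procedure. This is a minimal sketch, assuming non-negative integer addends and taking “in the right ballpark” to mean within the worst-case error that rounding introduces; `round_to`, `ballpark_sum`, and `in_ballpark` are hypothetical names. Each addend is rounded to the nearest ten (or hundred) before adding, and a claimed answer is accepted only if it falls within that error of the estimate. The exact sum of 197 and 332 is 529.

```python
def round_to(n: int, unit: int) -> int:
    """Round a non-negative integer to the nearest multiple of `unit` (halves round up)."""
    return (n + unit // 2) // unit * unit


def ballpark_sum(a: int, b: int, unit: int) -> int:
    """Estimate a + b by rounding each addend to the nearest `unit` before adding."""
    return round_to(a, unit) + round_to(b, unit)


def in_ballpark(answer: int, a: int, b: int, unit: int) -> bool:
    """Accept `answer` only if it is within the worst-case rounding error of the estimate.

    Rounding each addend moves it by at most unit/2, so the estimate
    differs from the true sum by at most `unit`.
    """
    return abs(answer - ballpark_sum(a, b, unit)) <= unit


print(197 + 332)                    # 529, the exact sum
print(ballpark_sum(197, 332, 10))   # 530, i.e. 200 + 330
print(ballpark_sum(197, 332, 100))  # 500, i.e. 200 + 300
print(in_ballpark(529, 197, 332, 10), in_ballpark(627, 197, 332, 10))  # True False
```

Rounding to hundreds is easier to do in one’s head, as the column notes, but it widens the tolerance from 10 to 100, so it catches fewer slips.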
O C TO B E R 2 0 1 7 | VO L. 6 0 | N O. 1 0 | C OM M U N IC AT ION S OF THE ACM 7


letters to the editor

DOI:10.1145/3135241

Beyond Brute Force

In their article “The Science of Brute Force” (Aug. 2017), Marijn J.H. Heule and Oliver Kullmann humorously asked whether a mathematician using brute force is really “a kind of barbaric monster.” While applying simplistic approaches to complex domains (such as image and speech processing) is inefficient, certain specific computational problems do indeed benefit from brute force. In this regard, the mathematician who uses brute force is simply functioning as a good engineer intent on solving problems efficiently.

My primary focus during my Massachusetts General Hospital fellowship (2013–2016) was analyzing the electronic health records for 314,292 patients.[2] To identify biomarkers associated with outcomes, my colleagues and I were initially interested in knowing the smoking status of all of them (current, past, or never) for our prediction models. Smoking status is typically documented in clinical narrative notes as free text, and, as reported throughout the literature, classification accuracy of current methods is poor.

I hypothesized that following a simple human-in-the-loop brute-force approach designed to semi-manually extract non-negated expressions could achieve better accuracy than many widely used algorithms. It was, I thought, different from common machine learning approaches that typically attempt to classify sentences or find associations between words in a series of words. Moreover, machine learning algorithms, when applied to text, rely on the assumption that any language includes an infinite number of possible expressions. In contrast, across a variety of medical conditions, we observed that clinicians tend to use the same expressions to describe patients’ conditions. For instance, the most commonly reported expressions for sleep disorders, including “poor sleep,” “sleep disruption,” and “sleeps poorly,” covered over 99% of all possible ways to describe physician-documented insomnia.[1] We tested this approach extensively, and it played an important role in all our subsequent publications. We even extended it for additional uses (such as to extract family histories of coronary artery disease, classify patients with cancer, and improve accuracy of the Framingham risk score for patients with liver disease).

Metaphorically speaking, a rocket ship would be a potential solution if one wanted to cross the Grand Canyon, but if such a trek has no strict time-completion restriction, riding a donkey or simply walking would be more efficient, as such solutions require little maintenance or purchase of expensive fuel. Scientists confronting a computational challenge should not automatically apply readily available algorithms just because they are commonly known to be efficient. Alternatives should be considered as potentially the most efficient solution, including those that may seem to be brute force at first glance, but, as in Hans Christian Andersen’s “Ugly Duckling” story, might instead mature into the most graceful of swans.

References
1. Beam, A., Kartoun, U., Pai, J., Chatterjee, A., Fitzgerald, T., Shaw, S., and Kohane, I. Predictive modeling of physician-patient dynamics that influence sleep medication prescriptions and clinical decision making. Scientific Reports 9, 7 (Feb. 2017), 1–7.
2. Kartoun, U. The man who had them all. Interactions 24, 4 (July-Aug. 2017), 22–23.

Uri Kartoun, Cambridge, MA

Call for Nominations for ACM General Election

The ACM Nominating Committee is preparing to nominate candidates for the officers of ACM: President, Vice-President, Secretary/Treasurer; and two Members at Large.

Suggestions for candidates are solicited. Names should be sent by November 5, 2017 to the Nominating Committee Chair, c/o Pat Ryan, Chief Operating Officer, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, USA.

With each recommendation, please include background information and names of individuals the Nominating Committee can contact for additional information if necessary.

Alexander L. Wolf is the Chair of the Nominating Committee, and the members are Karin Breitman, Judith Gal-Ezer, Rashmi Mohan, and Satoshi Matsuoka.

For More CS Teachers, Try Higher Salaries
One aspect of why more K–12 schools do not offer computer science (CS) classes not addressed by Jennifer Wang in her Viewpoint “Is the U.S. Education System Ready for CS for All?” (Aug. 2017) is how salary might lead to a lack of qualified and motivated CS teachers. People graduating from universities with CS degrees would most likely prefer to go




into industry where they could earn higher salaries. They do not usually prefer to go into teaching where they might start at $40,000 per year in public schools (depending on location) or private schools where they might earn even less. For public schools, at least in some U.S. states, they also must obtain a teacher certification in addition to a CS degree.
Colin Carlile, Devine, TX

Author Responds:
Salary and certification were beyond the scope of the study. However, many factors, including salary, influence the choice of teaching as a career and profession. Also, many potential sources of computer science teachers are constrained by certification requirements. These are excellent considerations for future research as schools seek to address the demand for computer science.
Jennifer Wang, Mountain View, CA

Helping Our Old Man Code
Philip Guo’s blog “How Adults Ages 60+ Are Learning to Code” (Aug. 2017) reminded me how, in 1964, when I had been a member of ACM for just four years and my brother Michael was studying computer science at the University of Wisconsin, our father, age 60, decided he ought to learn to code if he hoped to communicate with his sons. He enrolled at Fordham University in the Bronx, NY, and took a course in Fortran, earning a B+. My brother and I discussed coding with him for the next 25 years while he lent his business insight to helping us advance our fantastic careers.
Donald F. Costello, Lincoln, NE
Distinguished Lecturer ACM

Infection Paths Also Needed for Internet Growth
As a developer of security in operating systems, I definitely understand that underspecifying URLs, as Vinton G. Cerf described it in his Cerf’s Up column “In Praise of Under-Specification?” (Aug. 2017), allows many creative infection paths into a user’s personal computer. Eric Schmidt, executive chairman of Alphabet, Inc., has declared loose specifications a necessary feature of the Internet, one that has given it the freedom to grow to the network it is today. That is somewhat like saying a swimming pool or water supply without chlorine is a feature that allows toxic algae to bloom. It is, in fact, illegal in the U.S. for a water supply to allow algae blooms. Other countries have loose water-supply specifications, thus allowing all sorts of life forms the freedom to innovate and thrive in the water supply. Some may be good for human health and well-being, while others are deadly. So, too, with the loose specifications on the Internet. Douglas Hofstadter of Indiana University included a nice analogy in the prologue of his 1995 book Fluid Concepts and Creative Analogies describing life as living in the liquid range, between both solid and gaseous phases, neither able to support life. The most important task for any life form is to find that sweet spot where there is just the right amount of innovation; it is only in this Goldilocks zone where life is even possible. I hope humans are, in spite of all evidence to the contrary, capable of creating a society where a pleasant life is a possibility.
Tom Jones, Seattle, WA

Communications welcomes your opinion. To submit a Letter to the Editor, please limit yourself to 500 words or less, and send to [email protected].

Coming Next Month in COMMUNICATIONS
Cambits: A Reconfigurable Camera System
User Reviews for Top Mobile Apps in the Google Play Store
Healthcare Robotics
Hootsuite: In Pursuit of Reactive Systems
Is There a Single Method for the Internet of Things?
Heads-Up Limit Hold’em Poker Is Solved
The Heat Method for Distance Computation
Social Agents: Bridging Simulation and Engineering
Pay What You Want as a Pricing Model for Open Access Publishing?
Plus the latest news about brain-computer interfaces, copolymers and self-assembly, and policing online advertising.

©2017 ACM 0001-0782/17/10

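Uri Kartoun’s letter “Beyond Brute Force” describes matching a short, human-curated list of non-negated expressions rather than training a classifier. The sketch below illustrates that idea under stated assumptions and is not the implementation used in his studies: the three expressions are the insomnia phrases quoted in the letter, while the negation cues, the four-word look-back window, and the function name `non_negated_mentions` are hypothetical choices made for illustration.

```python
import re

# Hand-curated expressions: the three insomnia phrases quoted in the letter.
EXPRESSIONS = ["poor sleep", "sleep disruption", "sleeps poorly"]
# Hypothetical negation cues and look-back window, chosen only for illustration.
NEGATION_CUES = ["no", "not", "denies", "without", "negative for"]
WINDOW = 4  # number of words before a match that are scanned for a negation cue


def non_negated_mentions(note: str) -> list[str]:
    """Return curated expressions found in `note` that are not preceded by a negation cue."""
    found = []
    # Split into rough sentences so a negation in one sentence cannot affect another.
    for sentence in re.split(r"[.;\n]", note.lower()):
        for expr in EXPRESSIONS:
            for match in re.finditer(r"\b" + re.escape(expr) + r"\b", sentence):
                # Look only at the few words immediately before the match.
                context = " ".join(sentence[: match.start()].split()[-WINDOW:])
                negated = any(
                    re.search(r"\b" + re.escape(cue) + r"\b", context)
                    for cue in NEGATION_CUES
                )
                if not negated:
                    found.append(expr)
    return found


print(non_negated_mentions("Patient reports poor sleep. Denies sleep disruption."))
# ['poor sleep']
```

In a human-in-the-loop workflow, a reviewer would presumably read the notes the matcher misses or mislabels and extend both lists; the code stays trivial, and the curation is what carries the accuracy.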


ACM
ON A MISSION TO SOLVE TOMORROW.

Dear Colleague,

Without computing professionals like you, the world might not know the modern
operating system, digital cryptography, or smartphone technology to name an obvious few.

For over 60 years, ACM has helped computing professionals be their most creative, connect
to peers, and see what’s next, and inspired them to advance the profession and make a
positive impact.

We believe in constantly redefining what computing can and should do.

ACM offers the resources, access and tools to invent the future. No one has a larger
global network of professional peers. No one has more exclusive content. No one
presents more forward-looking events. Or confers more prestigious awards. Or provides
a more comprehensive learning center.

Here are just some of the ways ACM Membership will support your professional growth
and keep you informed of emerging trends and technologies:

• Subscription to ACM’s flagship publication Communications of the ACM


• Online books, courses, and videos through the ACM Learning Center
• Discounts on registration fees to ACM Special Interest Group conferences
• Subscription savings on specialty magazines and research journals
• The opportunity to subscribe to the ACM Digital Library, the world’s
largest and most respected computing resource

Joining ACM means you dare to be the best computing professional you can be. It means
you believe in advancing the computing profession as a force for good. And it means
joining your peers in your commitment to solving tomorrow’s challenges.

Sincerely,

Vicki L. Hanson
President
Association for Computing Machinery

Advancing Computing as a Science & Profession


SHAPE THE FUTURE OF COMPUTING.
JOIN ACM TODAY.
ACM is the world’s largest computing society, offering benefits and resources that can advance your career and
enrich your knowledge. We dare to be the best we can be, believing what we do is a force for good, and in joining
together to shape the future of computing.

SELECT ONE MEMBERSHIP OPTION


ACM PROFESSIONAL MEMBERSHIP:
☐ Professional Membership: $99 USD
☐ Professional Membership plus ACM Digital Library: $198 USD ($99 dues + $99 DL)
☐ ACM Digital Library: $99 USD (must be an ACM member)

ACM STUDENT MEMBERSHIP:
☐ Student Membership: $19 USD
☐ Student Membership plus ACM Digital Library: $42 USD
☐ Student Membership plus Print CACM Magazine: $42 USD
☐ Student Membership with ACM Digital Library plus Print CACM Magazine: $62 USD

☐ Join ACM-W: ACM-W supports, celebrates, and advocates internationally for the full engagement of women in computing. Membership in ACM-W is open to all ACM members and is free of charge.

Priority Code: CAPP

Name
ACM Member #
Mailing Address
City/State/Province
ZIP/Postal Code/Country
Email

Payment Information
Payment must accompany application. If paying by check or money order, make payable to ACM, Inc., in U.S. dollars or equivalent in foreign currency.
☐ AMEX ☐ VISA/MasterCard ☐ Check/money order
Total Amount Due
Credit Card #
Exp. Date
Signature

Return completed application to:
ACM General Post Office
P.O. Box 30777
New York, NY 10087-0777

Prices include surface delivery charge. Expedited Air Service, which is a partial air freight delivery service, is available outside North America. Contact ACM for more information.

Satisfaction Guaranteed!

Purposes of ACM
ACM is dedicated to:
1) Advancing the art, science, engineering, and application of information technology
2) Fostering the open interchange of information to serve both professionals and the public
3) Promoting the highest professional and ethics standards

BE CREATIVE. STAY CONNECTED. KEEP INVENTING.

1-800-342-6626 (US & Canada) Hours: 8:30AM - 4:30PM (US EST) [email protected]
1-212-626-0500 (Global) Fax: 212-944-1318 acm.org/join/CAPP
The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

DOI:10.1145/3131066 http://cacm.acm.org/blogs/blog-cacm

Manipulating Word Representations, and Preparing Students for Coding Jobs?

Robin K. Hill mulls an aspect of natural language processing research, while Mark Guzdial ponders why coding is taught in public schools.

Robin K. Hill
Deep Dictionary
http://bit.ly/2tRnVZN
June 20, 2017

Recent research in natural language processing using the program word2vec gives manipulations of word representations that look a lot like semantics produced by vector math. For vector calculations to produce semantics would be remarkable, indeed. The word vectors are drawn from context, big, huge context. And, at least roughly, the meaning of a word is its use (in context). Is it possible some question is begged here?

We represent words by vectors (one-dimensional arrays) of a large number of elements, the numeric values of which are determined by reading and processing a vast number of examples of context in which those words appear, and which are functions of the distances between occurrences of words in sequences. A neural network refines the values of each element of each word vector in the training phase. The network, through iterated adjustment of the elements of the vector based on errors detected on comparison with the text corpora, produces the values in continuous space that best reflect the contextual data given. The end result is the word vector, a lengthy list of real numbers that do not seem to have any particular interpretation that would reflect properties of the thing itself, nor properties of words.

In the example provided by Mikolov et al.,[1] they “... assume relationships are present as vector offsets, so that in the embedding space, all pairs of words sharing a particular relation are related by the same constant offset.” Let’s use ‘d’ to stand for “distributed representation by vector” of a word.

d(“king”) = v1
d(“male”) = v2
d(“female”) = v3
d(“queen”) = v4

Mikolov and his colleagues find that v1 - v2 + v3 ≈ v4 (in vector offset mathematics in the n-dimensional space). This is certainly an intriguing result, in accord with our understanding of the meanings of the four words, in which taking “king,” removing the “male” aspect, and replacing it with a “female” aspect, gives “queen.”

The word vectors can also capture plurality and other shades of meaning the researchers regard as syntactic. For example, the offset between singular and plural is a (learned) constant:

d(“apple”) - d(“apples”) ≈ d(“family”) - d(“families”) ≈ d(“car”) - d(“cars”)

More details of word2vec can be found in the explanation by Xin Rong.[3] It looks like, not by direct coding but by some fortuitous discovery, the system has figured out some mathematical analog for semantics. (There is no claim that individual elements of the vector are capturing features such as gender, status, or grammatical number.)

We already have a compendium of data on relationships of words to other words through contexts of use: the dictionary. The use of a word is largely given by its context. Its context can be inferred also from its dictionary definition. Most dictionaries will offer a direct or indirect connection through “king” to “ruler” or “sovereign” and “male” and through “queen” to “ruler” or “sovereign” and “female,” as:



blog@cacm

queen
The female ruler of an independent state, especially one who inherits the position by right of birth
king
The male ruler of an independent state, especially one who inherits the position by right of birth
ruler
A person exercising government or dominion

These definitions[2] show gender can be “factored out,” and in common usage the gender aspect of sovereigns is notable. We would expect those phenomena to show up in vast text corpora. In fact, we would expect that to show up in text corpora because of the dictionary entries. Since we base word use on definitions captured by the dictionary, it is natural for any graph-theoretic distance metric based on node placement to (somehow) reflect that cross-semantic structure.

Suppose that, employing the English slang terms “guy” and “gal” for male and female, the word for queen were “rulergal,” and for king “rulerguy,” (perhaps the word for mother were “parentgal,” and for father, “parentguy”). Then the word vector offsets calculated would not appear as remarkable, the relationships exposed in the words themselves.

The system word2vec constructs and operates through the implicit framework of a dictionary, which gave rise to the input data to word2vec. How could it be otherwise? As we understand the high degree of contextual dependency of word meanings in a language, any representation of word meaning to a significant degree will reflect context, where context is its interassociation with other words.

The result is still intriguing. We have to ask how co-occurrence of words can reliably lay out semantic relationships. We might explore the aspects of semantics missing from context analysis, if any. We might (and should) ask what sort of processing of a dictionary would deliver the same sort of representations, if any.

The word vectors produced by the method of training on a huge natural text dataset, in which words are given distributed vector representations refined through associations present in the input context, reflect the cross-referential semantic compositionality of a dictionary. That close reflection is derived from the fact that words in natural text will be arranged in accordance with dictionary definitions. The word2vec result is a revelation of an embedded regularity.

References
1. Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient Estimation of Word Representations in Vector Space, https://arxiv.org/abs/1301.3781.
2. Oxford Living Dictionary, 2017, Oxford University Press, https://en.oxforddictionaries.com/.
3. Rong, X. word2vec Parameter Learning Explained, https://arxiv.org/abs/1411.2738.

Mark Guzdial
Coding in Schools as New Vocationalism: Larry Cuban on What Schools Are For
http://bit.ly/2tpSgip
July 18, 2017

Larry Cuban is an educational historian who has written before on why requiring coding in schools is a bad idea (http://bit.ly/2uocMLQ). Jane Margolis and Yasmin Kafai wrote an excellent response about the importance of coding in schools (http://bit.ly/2vmi52U). Cuban penned a three-part series about “Coding: The New Vocationalism,” likely inspired by a recent New York Times article about the role tech firms are having on school policy (http://nyti.ms/2uodaKi).

• In Part 1 (http://bit.ly/2v2ULoo), he describes the ‘dance’ schools have had with industry over more than 100 years, between preparing future citizens and preparing future workers.

Preparation for the workplace is not the only goal for public schooling, yet that has been the primary purpose for most reformers over the past three decades. A century ago, reformers also elevated workplace preparation as the overarching purpose for tax-supported public schools.

In the new vocationalism, Cuban sees that schools have been tied to economic growth and the needs of information-age society. He sees coding advocates blending the roles of school in preparing citizens and school as preparing workers by arguing that computing is necessary for modern society.

• In Part 2 (http://bit.ly/2vmjCG8), he points out any education reform faces the reality of what teachers know and what they will actually do in the classroom. He draws on efforts in the 1950s and 1960s, and uses the story of Logo to explain how reformers get it wrong.

The lessons to be learned from earlier school reformers are straightforward.
  • Build teacher capabilities in content and skills since both determine to what degree, if any, a policy gets past the classroom door.
  • With or without enhanced capabilities and expertise, teachers will adapt policies aimed at altering how and what they teach to the contours of the classrooms in which they teach. If policymakers hate teacher fingerprints over innovations, if they seek fidelity in putting desired reforms into practice, they wish for the impossible.
  • Ignoring both of the above lessons ends up with incomplete implementation of desired policies and sorely disappointed school reformers.

• In Part 3 (http://bit.ly/2wpo9o8), he returns to the question of what school is for. He describes successful reform as a collaboration between top-down designers and policy-makers and bottom-up teachers. He describes a successful model for reform that created “work circles” of researchers and teachers (at Northwestern University) to achieve the goals of the researchers’ curriculum by adapting it with the help of the teachers.

Cuban is not necessarily against teaching computing in schools; he says it doesn’t make sense to impose it as a mandate from industry. More importantly, he offers a path forward: mutual adaptation of curricular goals, between designers and teachers.

Mutual adaptation can benefit teachers and students. While this is only one study of four teachers wrestling with teaching a science unit, it is suggestive of what can occur.

Will similar efforts involve teachers and make the process of mutual adaptation work for both teachers and students? I have yet to read of such initiatives as districts and states mandating computer science courses and requiring young children to learn to code. Repeating the errors of the past and letting mutual adaptation roll out thoughtlessly has been the pattern thus far. The “New Vocationalism,” displaying a narrowed purpose for tax-supported public schools, marches on unimpeded.

Robin K. Hill is an adjunct professor in the Department of Philosophy at the University of Wyoming. Mark Guzdial is a professor in the College of Computing at Georgia Institute of Technology.

Copyright held by authors.

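Hill’s example, v1 - v2 + v3 ≈ v4, can be reproduced at toy scale. The sketch below is a minimal illustration, not word2vec itself: `TOY_VECTORS` holds hypothetical three-dimensional vectors invented so that the offsets line up, and `cosine` and `analogy` are illustrative helper names. It computes d(“king”) - d(“male”) + d(“female”) and returns the vocabulary word with the highest cosine similarity to the result, skipping the three input words, which is a common convention when evaluating such analogies.

```python
import math

# Hypothetical toy embeddings, hand-picked so the king/queen offset works out.
# Real word2vec vectors are learned from a corpus, have hundreds of dimensions,
# and their individual coordinates carry no particular meaning.
TOY_VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, -0.7, 0.1],
    "male":   [0.1, 0.9, 0.0],
    "female": [0.1, -0.8, 0.0],
    "apple":  [-0.5, 0.0, 0.8],
    "car":    [-0.6, 0.1, -0.7],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest (by cosine) to d(a) - d(b) + d(c), excluding a, b, and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "male", "female", TOY_VECTORS))   # queen
print(analogy("queen", "female", "male", TOY_VECTORS))  # king
```

Because these vectors were chosen by hand, the answer here holds by construction; what makes the published result intriguing, as Hill argues, is that vectors learned only from co-occurrence in text exhibit the same regularity.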
SEC 2017
The 2nd ACM/IEEE Symposium on Edge Computing
San Jose, CA, October 12-14, 2017
http://acm-ieee-sec.org/2017/
Register Now

General Chair
Junshan Zhang, Arizona State Univ.

Program Chairs
Mung Chiang, Purdue Univ.
Bruce Maggs, Akamai/Duke Univ.

Keynote Speakers
Prof. Mahadev Satyanarayanan, Carnegie Mellon Univ.
Dr. Pablo Rodriguez, CEO of Telefonica Alpha

Steering Committee
Victor Bahl, Microsoft Research
Flavio Bonomi, Nebbiolo Technologies, Inc
Rong N. Chang, IBM Research
Dejan Milojicic, HP Labs
Michael Rabinovich, Case Western Reserve Univ.
Weisong Shi (Chair), Wayne State Univ.
Tao Zhang, Cisco

news

Science | DOI:10.1145/3131068 Keith Kirkpatrick

3D-Printing
Human Body Parts
Bioprinting has generated bones, cartilage, and some muscles;
hearts and livers are still years away.

THE ADVENT OF three-dimensional (3D) printing is already yielding benefits in many fields by improving the speed and efficiency of product development, prototyping, and manufacturing, while also enabling true one-off customization to suit individual needs. While this is certainly a boon for manufacturers, 3D printing holds even greater promise when considering the possibilities for creating body parts or tissues to replace or repair organs or limbs that have worn out, become damaged, or have been lost due to injury or disease.

Indeed, significant ongoing research is being conducted at the university level into the use of 3D printing to create a variety of replacement parts for aspects of the human anatomy.

Unlike standard two-dimensional printers that affix toner to sheets of paper, 3D printers create objects by putting down a layer of a powdered substrate, heating it to harden it, and then repeating the process, layer by layer, building the item up to create a completed object.

The notion of using 3D printing (also known as additive manufacturing) to replace parts of the human body, a process known as bioprinting, was born out of a process initially developed 20 years ago, when surgeon Anthony Atala and his team at Boston Children’s Hospital started to build novel tissues for regenerative medicine by hand. In this process, cells were harvested from patients, then painstakingly layered on top of one another to create handmade “scaffolds”; then the cells would grow and multiply, eventually producing functional tissues, including skin, bladder, and cartilage.

Three-dimensionally printed prosthetic replacements for a human nose and ear.
PHOTO BY CHRIS RATCLIFFE/BLOOMBERG VIA GETTY IMAGES


The process was far too slow and complex tissue and more control; 3D that support blood vessels.
labor-intensive to be practical, so Atala’s printing can provide that.” “The biggest issue on the whole
team, which had relocated to the The Wake Forest University team right now is blood supply,” Goldstein
Wake Forest Institute for Regenera- has successfully fabricated mandible says. “You can’t make the constructs
tive Medicine in Winston-Salem, NC, and calvarial (skullcap) bone, car- too large without running into the is-
reconfigured standard desktop inkjet tilage, and skeletal muscle, and ex- sue of having a necrotic core (an area
printers so they could use the inkjets pects to conduct additional clinical of dead cells) in the middle. Just like
to shoot out cells. The idea behind trials over the next three to five years our body needs blood circulating all
bioprinting is to use a small nozzle to demonstrate the effectiveness of its the time, so do these living tissues.”
to precisely control the placement of process. Studies have shown the printed
cells, and the biomaterials around the Researchers on Long Island, NY, tracheal tissue was just as strong
cells, to form a structural framework, have addressed 3D bioprinting in a as the tracheal tissue found natu-
while also creating a construct that different manner, using relatively in- rally in mammals, and since the suc-
can deliver nutrients to ensure the expensive consumer-grade 3D print- cess of printed tracheas, the team
cells’ growth. ers to print replacement body parts, has gone on to print other types of
One of the key considerations with including bone, cartilage, and a tra- body parts, including bone and car-
bioprinting is the need to maintain chea. Indeed, Todd Goldstein, a 3D tilage. These parts are particularly
a 3D structure. The liquid “inks,” bioprinting researcher and director of strong candidates for 3D printing
which contain cells and structural Northwell Ventures 3D Printing Labo- because each individual will re-
biomaterial such as collagen, must ratory, Manhasset, NY, has printed quire uniquely sized parts. Further-
gel as quickly as they are deposited, tracheas made of living cells, using a more, because bone, cartilage, and
to maintain their form. The printers, Makerbot Replicator 2, a 3D printer tendons don’t require a blood sup-
with multiple nozzles, allow for the that has been modified to print bio- ply, they can be implanted without the
simultaneous printing of multiple material, at a cost of under $2,000. possibility of rejection.
cell types, structural biomaterials, Goldstein modified the Makerbot Indeed, that is why the printing of
and other chemicals. This, of course, printer with a syringe that dispenses more-complex body parts is still likely
requires significant pre-planning bio-ink, a material made of living cells several years away. Creating pieces of
calculations to be made focused on with a viscous consistency. The nozzle soft tissue to mend organs such as the
cell density, type, and proportions of that comes standard on the machine liver, kidney, and heart, can be quite
other supporting biomaterial prior to is used to emit an organic plastic challenging to bioprint, given the
the printing, although according to called PLA (polylactic acid), which cre- need to support constant, consistent
researchers, much of this can be mod- ates the scaffold of the trachea. The blood flow.
eled by observing existing tissue taken material costs just a few dollars, and “The biggest challenge is creating
from patients, and then replicating is biodegradable in the body so only blood vessels that will work immedi-
that structure with biomaterial. the body part is left after implantation ately,” Goldstein says. “When you bio-
More recently, Wake Forest Univer- is completed. print cells, if they don’t have a blood
sity’s Sang Jin Lee and his team have Goldstein says his team’s work is supply, they’ll be dead in five hours or
been working on the development mostly focused on bone, cartilage, and less.”
of an integrated tissue-organ printer trachea printing, due to the current Goldstein acknowledges that at
that can fabricate stable, human-scale limitations surrounding the efficient present, “there isn’t a good solution
tissue constructs of any shape. To en- and effective printing of constructs for that right now. That’s why a lot
sure the shape of the tissue construct of research is going into 3D-printing
is correct, clinical imaging data is fed small capillaries and a vascular net-
into a computer model of the dam- The printing of work that can support larger 3D-print-
aged or missing tissue, and that mod- ed constructs, like the liver, kidney, or
el data is then translated into a pro- complex body parts heart.”
gram that controls the motions of the is likely still several Work is progressing on creating
printer nozzles, which dispense cells structures that can support blood
to specific locations. The printer also years away. The soft flow. Adam Feinberg, associate profes-
incorporates micro-channels into the tissue required to sor of materials science and biomedi-
constructs, which allow the precise cal engineering at Carnegie Mellon’s
diffusion of nutrients to the printed mend organs can College of Engineering, has developed
cells, ensuring their survival. be challenging to a new, less-expensive way of 3D print-
The key benefit to using 3D printers ing biological soft materials, includ-
is the ability to create complex parts bioprint. ing coronary arteries and embryonic
relatively quickly and efficiently. “Us- hearts. Feinberg also uses modified
ing 3D-printing techniques, we can consumer-grade 3D printers that use
easily build complex structures,” Lee open-source software to control and
says, noting that in a clinical situation fine-tune their print parameters.
with real patients, “you need more What makes Feinberg’s work


unique is his approach to printing soft materials, such as arteries, which need to be supported while being printed so they don't fall apart. Feinberg and his team developed a method of printing soft materials inside a support bath material, which creates one gel inside of another, thereby allowing the structure to be printed while also being supported in the correct position.

"My lab has developed a number of novel 3D-printing approaches, where we put these gels inside of a support gel," Feinberg says. "We fill up a container with one gel, then drop a needle in the middle, and then print another gel with the [structure] material."

Another key challenge that Feinberg highlights is, "What you actually print is not what you're going to want in the end." He explains that the initial living tissue that is printed will need to be cultured and grown into the final structure that will be implanted into a human. "That is a total unknown right now, because we don't have good computer models for that process yet."

Feinberg contrasts bioprinting body parts with printing static parts used in manufacturing, where it is possible to model how a part will wear or function over time, based on previously known data from materials science and engineering work. "In bioprinting, the cells are highly variable, the biomaterial is also highly variable, and the interaction of the cells and material over time is just not understood at all," Feinberg says. "That's a huge area that needs improvement."

As a result, it is unlikely that 3D printing of biomaterial will become commonplace within the next few years, due to the significant amount of research and medical trials that need to be conducted.

There also are challenges around maintaining sufficiently high quality control, for both safety and health issues, as well as in keeping regulatory bodies, like the U.S. Food and Drug Administration, happy. Researchers agree additional tools are needed for assessing whether printed constructs are developing properly, and in the case of living tissue, sourcing and acquiring cells on which to model 3D-printed cells can be time consuming, expensive, or both.

"There's broader challenges to bioprinting," Feinberg says. "We need a lot of cells, and some cells are just hard to come by, or very costly to get. For example, heart cells don't actually divide normally, and there's no real stem cell population to generate them in the adult heart." While Feinberg says it is now possible to use embryonic cells to grow new ones, it remains very costly to do so.

Wake Forest's Lee agrees, noting that the real challenges lie in printing complex organs. "We are trying to mimic original tissue as much as possible," Lee says. However, "for complex organs like the kidney and heart, we need better imaging scans and more cell types," he adds. "Unlike a simple tissue like bone or cartilage, the kidney requires more than 30 cell types."

Further Reading

3D Bioprinting System for Human-Scale Tissue Constructs: http://www.nature.com/nbt/journal/v34/n3/full/nbt.3413.html

Northwell Health Wins Award for 3D Bioprinting: https://www.northwell.edu/about/news/northwell-health-announces-3d-bioprinting-winner-medical-innovation-contest

Video: Adam Feinberg of Carnegie Mellon University discusses 3D bioprinting: https://www.youtube.com/watch?v=Zfl_tFdt2D4&feature=youtu.be

Keith Kirkpatrick is principal of 4K Research & Consulting, LLC, based in Lynbrook, NY.

© 2017 ACM 0001-0782/17/10 $15.00

ACM Member News

DREAMING OF A SECURE INTERNET

"I was always interested in many things, but somehow my mind was like a Petri dish for computer science ideas," says Adrian Perrig, professor of computer science at ETH Zurich in Switzerland. "I happen to be good at it."

Perrig displayed an early affinity for programming. "On a borrowed Commodore 64, I wrote my first program, which was a health indicator I saw in a book I was reading."

Realizing he liked programming and was good at it, he decided to study computer science. Perrig earned his undergraduate degree in computer engineering from École polytechnique fédérale de Lausanne (EPFL) in Switzerland, and a Ph.D. in computer science from Carnegie Mellon University.

He launched his career as a professor at Carnegie Mellon, where he remained for 11 years. Perrig then returned to Switzerland as a professor at ETH Zurich, where he now leads the network security group.

Perrig focuses on networking and systems security, specifically the design of a secure next-generation Internet architecture. "My dream has been to develop a secure Internet," he says. To that end, he started SCION, an acronym for Scalability, Control, and Isolation On Next-Generation Networks, in 2009.

The hard work has paid off, as the SCION network is now functional. "Our routers are in ISPs, and a global test bed is being set up to make use of this secure infrastructure," Perrig adds.

SCION has been Perrig's sole focus over the past few years, and will remain so, he says. "If you spread yourself too thin, it is hard to have an impact."

—John Delaney


Technology | DOI:10.1145/3131079 Don Monroe

Digital Hearing
Advances in audio processing help separate the conversation from background noise.

LIKE MANY OTHER technologies, hearing aids have improved rapidly over the last two decades. Digital audio processing has enabled more sophisticated and personalized features, and users report higher satisfaction. Still, in noisy social situations such as cocktail parties, hearing-impaired users must work hard to track the conversation of companions, just as someone with normal hearing may struggle to follow a lively conference call.

Researchers and companies continue to advance the technology by understanding how listeners process a complex "auditory scene," which requires more than just amplifying sounds. "Restoring audibility is simply not enough," said Christine Jones, director of the Phonak Audiology Research Center in Warrenville, IL. "There's a loss of fidelity that prevents individuals from having full restoration of function even when the sound becomes audible again."

Research also has shown that people process sound in different ways, so the "best" hearing aid technology will differ, too. Even as researchers struggle to measure these differences, companies are providing new ways to address them and to develop fitting procedures and user controls that work best for each individual.

The Earlens Light-Driven Hearing Aid converts sounds into pulses of light, which activate a lens on the eardrum. (Photo by Edward Olive.)

Digital Revolution
Digital signal processors (DSPs) were first used in hearing aids in 1996. In the ensuing years, almost all hearing devices adopted them because of their flexibility in providing programmable features.

For example, simple amplification makes soft sounds perceptible, but also makes loud sounds uncomfortable. DSPs make it easy to adjust both the gain (sound boost) and the dynamic-range compression to match an individual's hearing loss at different frequencies.

DSPs also suppress the feedback that arises from the close proximity of microphone and speaker, and can reduce distracting background noise, such as wind.

In recent years, hearing aid companies also have introduced frequency-lowering technology. Patients with extreme hearing loss at high frequencies can benefit when sounds are shifted to frequencies where they still have some function, said Brent Edwards, CTO of Earlens in Menlo Park, CA. "It's a distortion, but if you do it effectively, you get some phonemes still through the damaged auditory system."

Hearing researchers have explored other processing strategies over the years, but many of them remain in the labs because of the severe design constraints posed by small device size and the need for low power consumption to preserve battery life. In addition, processing must be done with very limited delay, especially because the processed signal must coexist with sound passing through bypass channels that avoid uncomfortable ear-canal blockage.

In special situations, engineers can dramatically improve listeners' experiences by incorporating external devices. For example, asking conversation partners to clip a microphone on their collar greatly reduces background noise, and combining the output of several microphones on a table can enhance a particular speaker. External loudspeakers, as at a business, may also provide a more compelling spatial perception of sound than in-ear microphones. Yet everyday use still requires small, low-power devices.

One useful type of external device that many people already carry is the smartphone. A Bluetooth connection to a phone app can give users a richer and more intuitive interface, for example, even at a low data rate. A more bandwidth-intensive approach offloads some signal processing to the phone, with its larger battery and higher processing power, but long delays are unacceptable.

Auditory Scenes
Although increased processing power has clearly benefited hearing aid technology, designs must extend beyond electrical engineering to encompass the complex and idiosyncratic ways that people process and interpret sounds. A particularly influential concept is auditory scene analysis, which posits that people perceive sound not as frequencies but as individual sources, each emitting waveforms with similar attributes. In this view, hearing aids should preserve or accentuate the cues that let a listener identify and attend to sources of interest while disregarding others.

As a step in that direction, hearing aid companies often categorize the sound environment and vary the processing algorithms accordingly. "We have at least 12 different attributes we look at," including the frequency spectrum, the modulation characteristics, and the angle of the sound deduced by comparing two microphones, said Francis Kuk of the Office of Research in Clinical Amplification (ORCA-USA) of Widex, in Lisle, IL.

Based on these parameters, Widex hearing aids predict whether the environment contains music, speech in a quiet environment, speech in a train station, or some other setting. The inferred scene then determines the default parameters of the hearing aid, such as the noise reduction for a noisy environment.

The auditory scene can also be used to modify beamforming, which wirelessly combines audio signals from both earpieces. The technique can enhance sounds coming from straight ahead, for example, because they arrive at the ears in sync. "It's one of our most effective features," said Jones, but "we have different levels of beamforming depending on the scene." As background noise increases, for example, the effective beam is made increasingly narrow. But if a car scene is detected, the beamforming is turned off, because speakers may well be to the side of or behind the listener.

Individual Strategies
There are limits to such automated selection of algorithms, however, due not only to varying preferences of individuals, but to differences in how their brains process sound.

One feature that helps distinguish sources in an auditory scene is the overall modulation of the auditory signal, Kuk said. Some people, particularly those with both hearing impairment and cognitive limitations, "are more reliant on the temporal waveform" to identify a source, for example when different frequencies get louder at the same time. Hearing aids that respond rapidly to protect users from sudden loud sounds distort the temporal waveform, making it difficult for these listeners to use this important cue.

The effort and attention required to decipher speech can be exhausting for hearing-impaired people, but that is not easy to assess. "We used to say speech understanding and sound-quality assessment were the two measures of what a technology was doing. Now we can add a third dimension, and that's cognitive impact," said Edwards. "Some technologies may not improve speech understanding that much, but reduce cognitive load."

Edwards said the "hottest topic in hearing science right now" is called "hidden hearing loss," in which patients lose some of the normal nerve connections. "It's like losing half the pixels on your high-def TV," he said, but it does not affect the audibility of pure tones.

The standard test of detection threshold "is not telling you, say in a restaurant, how salient the representation of the auditory scene is," Edwards warned. "As you develop new technologies, you have to develop new ways of measuring its benefit," he said. "That's part of what engineers don't understand about hearing loss. There's a lot about the individual patient that we can't tell from the diagnostics."

Fitting the Listener
Fine-tuning settings is already a key aspect of the "fitting" of a new hearing aid by a trained audiologist, which contributes to the devices' price. Edwards compares this process to improving a TV picture by yelling instructions to a technician who can't see the picture. "What people are now doing is developing systems where the patient can make adjustments themselves," he said. "There has to be a balance between diagnostic-driven fitting by a professional and this sort of self-adjustment."

"We've done a really good job now of creating algorithms that work on average, based on the degree and configuration of the hearing loss," said Jones, who has practiced as an audiologist, including fitting hearing aids to pediatric patients with limited communication skills. "But two patients who present with the same hearing loss could have very different levels of functional auditory behavior. There's still room to do some fine-tuning based on their own personal capabilities."

In light of the high price of current hearing aids (which can cost as much as $4,000 per unit), there is increasing pressure to open the market to cheaper over-the-counter (OTC) devices. Lowered regulatory requirements, advocates suggest, might encourage companies like Dolby and Bose to leverage their expertise in entertainment audio for personal devices.

"I support creating an OTC category, as long as it doesn't affect the traditional distribution and provision of hearing aids," Edwards said. Kuk cautioned that such over-the-counter products are unlikely to have the advanced capabilities of medically approved hearing aids like those his company offers, and may instead resemble the generic reading glasses sold in drug stores.

"As an audiologist, I believe that for many people with hearing loss, taking something off the shelf will not do the best job," Jones agreed, although patients with limited impairment may find such devices helpful. "There are still people that benefit from the ability of an audiologist to more closely fine-tune the amplification and the behavior of the hearing instruments to both their hearing loss and also their lifestyle needs."

Distinguishing Sources
In spite of steady advances in hearing aids, users still struggle to distinguish voices in a hubbub of other voices, said DeLiang Wang of Ohio State University. "Every hearing-aid company recognizes that the cocktail-party problem is not solved—and needs to be solved."

In the laboratory, Wang and his colleagues divide the audio stream into 20-millisecond segments in various frequency bands, and ask whether each segment contributes to the speech of interest or can be disregarded as noise. A similar binary classification and masking is used to make MP3 audio encoding more efficient. "Once you have formulated it as a classification problem, then modern machine learning techniques can be utilized; in our case, deep neural networks," Wang said. Lab testing shows that masking the "noisy" time-frequency segments significantly improves listeners' speech comprehension.

If research like this can be exploited in practical devices, it could work with listeners' brains to help extract specific sources from a complex audio scene.

But when speech is also the noise, Jones said, "there's no acoustic separation, other than direction and maybe level, between your target and your interference. That is the challenge."

In spite of these aspirations for the future, "the satisfaction rate for hearing aids today is much, much higher than several years ago, and especially before digital hearing aids," Kuk said. Industry-sponsored surveys show that "almost 90% of patients who wear today's hearing aids are satisfied," he explained. "Having said that, satisfaction is a moving target. As technology improves, people's expectations for what a satisfying hearing aid is also increase."

Further Reading

Abrams, H., and Kihm, J.
An Introduction to MarkeTrak IX: A New Baseline for the Hearing Aid Market, Hearing Review, June 2015, http://bit.ly/2sKl3cL

Edwards, B.
A Model of Auditory-Cognitive Processing and Relevance to Clinical Applicability, Ear & Hearing, July/August 2016, http://bit.ly/2rX9DpQ

Rönnberg, J., et al.
The Ease of Language Understanding (ELU) model: theoretical, empirical, and clinical advances, Front. Syst. Neurosci., 13 July 2013, http://bit.ly/2rtSD93

Wang, D.
Deep Learning Reinvents the Hearing Aid, IEEE Spectrum, December 6, 2016, http://bit.ly/2qYmhQm

Stix, G.
How Hearing Works [Video], Scientific American, August 1, 2016, http://bit.ly/2qQsbE0

Don Monroe is a science and technology writer based in Boston, MA.

© 2017 ACM 0001-0782/17/10 $15.00
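The time-frequency masking that DeLiang Wang describes in the hearing-aid article can be illustrated with a minimal sketch. It assumes the audio has already been cut into 20-millisecond frames and split into frequency bands (the filterbank itself is omitted), and it stands in a fixed local signal-to-noise threshold, often called the "ideal binary mask," for the deep neural network Wang's group uses. The function names, the 0 dB threshold, and the example energies are illustrative assumptions, not Wang's implementation.

```python
import math
from typing import List

# Rows are successive 20 ms frames; columns are frequency bands.
Grid = List[List[float]]
Mask = List[List[int]]


def frame_length(sample_rate_hz: int, frame_ms: float = 20.0) -> int:
    """Number of audio samples in one analysis frame (320 at 16 kHz for 20 ms)."""
    return int(round(sample_rate_hz * frame_ms / 1000.0))


def local_snr_db(speech_energy: float, noise_energy: float, eps: float = 1e-12) -> float:
    """Speech-to-noise energy ratio of a single time-frequency unit, in decibels."""
    return 10.0 * math.log10((speech_energy + eps) / (noise_energy + eps))


def ideal_binary_mask(speech: Grid, noise: Grid, threshold_db: float = 0.0) -> Mask:
    """Label each unit 1 (speech dominates: keep) or 0 (noise dominates: discard).

    This target needs the clean speech and the noise separately, so it exists
    only in the lab; a trained classifier must estimate it from the mixture.
    """
    return [
        [1 if local_snr_db(s, n) > threshold_db else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def apply_mask(mixture: Grid, mask: Mask) -> Grid:
    """Zero out the units labeled as noise; keep the rest of the mixture unchanged."""
    return [[x * m for x, m in zip(x_row, m_row)] for x_row, m_row in zip(mixture, mask)]


if __name__ == "__main__":
    # Two frames x two bands of made-up energies.
    speech = [[4.0, 0.1], [0.2, 3.0]]
    noise = [[1.0, 1.0], [1.0, 1.0]]
    # Treat speech and noise as uncorrelated, so their energies simply add.
    mixture = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]
    mask = ideal_binary_mask(speech, noise)
    print(mask)                       # [[1, 0], [0, 1]]
    print(apply_mask(mixture, mask))  # [[5.0, 0.0], [0.0, 4.0]]
```

The thresholding rule is only a stand-in: the point of the classification framing Wang describes is that a trained model can learn to produce the same keep-or-discard labels when only the noisy mixture is available.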

Milestones

ACM Honors Researchers For Innovations

ACM recently named the recipients of four prestigious technical awards, selected by their peers for making significant contributions. They were among those honored at the ACM Awards Banquet in June.

Mahadev Satyanarayanan, Michael L. Kazar, Robert N. Sidebotham, David A. Nichols, Michael J. West, John H. Howard, Alfred Z. Spector, and Sherri M. Nichols were named recipients of the ACM Software System Award for developing the Andrew File System, the first distributed file system designed for tens of thousands of machines, and pioneering the use of scalable, secure, and ubiquitous access to shared file data. The award is presented to an institution or individual(s) recognized for developing a software system that has had a lasting influence, reflected in contributions to concepts, in commercial acceptance, or both.

Jeffrey Heer was named recipient of the ACM Grace Murray Hopper Award, for developing visualization languages that have changed the way people build and interact with charts and graphs across the Web. The award is given to the outstanding young computer professional of the year, selected on the basis of a single recent major technical or service contribution.

Amos Fiat and Moni Naor received the ACM Paris Kanellakis Theory and Practice Award for the development of broadcast encryption and traitor tracing systems. The award honors theoretical accomplishments that have had a significant, demonstrable effect on the practice of computing.

Jitendra Malik was selected to receive the ACM-AAAI Allen Newell Award for seminal contributions to computer vision that have led the field in image segmentation and object category recognition. The award is presented to an individual selected for career contributions that have breadth within computer science, or that bridge computer science and other disciplines.

MIT GRADUATE EARNS DOCTORAL DISSERTATION AWARD
Haitham Hassanieh, assistant professor in the Department of Electrical and Computer Engineering and the Department of Computer Science at the University of Illinois at Urbana-Champaign, recently was awarded the ACM 2016 Doctoral Dissertation Award, presented annually to the author(s) of the best doctoral dissertation(s) in computer science and engineering.

Hassanieh developed highly efficient algorithms for computing the Sparse Fourier Transform, and demonstrated their applicability in many domains including networks, graphics, medical imaging, and biochemistry. Hassanieh's Sparse Fourier Transform algorithm was chosen by MIT Technology Review as one of the top 10 breakthrough technologies of 2012. He has also been recognized with the Sprowls Award for Best Dissertation in Computer Science, and with the Best Paper Award from SIGCOMM, ACM's professional forum for the discussion of topics in the field of communications and computer networks, including technical design and engineering, regulation and operations, and the social implications of computer networking.


Society | DOI:10.1145/3131271 Chris Edwards

Portable Device Fears Show Power of Social Development
How do small screens impact young minds?

Photo by Ty Lim.

LAST YEAR, THE American Academy of Pediatrics (AAP) updated its guidelines on how much access children should have to electronic devices amid growing parental concern about the effects of electronic media. Yet extrapolation of the evidence linking television and behavior may obscure potentially more subtle and diverse effects. Recent developments in work with interactive devices represent an increased understanding of how children learn and the importance of social interaction.

Concerns over the mental effects of electronic devices have been largely driven by fears that the prevalence of conditions such as attention-deficit/hyperactivity disorder (ADHD) seems to follow their adoption. In 2011, the U.S. Centers for Disease Control and Prevention (CDC) reported an increase of 33% in ADHD prevalence among children from 1997 to 2008. A 2016 follow-up study by the CDC found the increase continued to 2012, but then began to fall through 2015 among children of poorer families, although that reduction was not reflected in wealthier homes.

From 1999 onward, a number of studies looked at heavy television use by young children and identified possible effects on their development. Dr. Dimitri Christakis, director of the Center for Child Health, Behavior, and Development at Seattle Children's Hospital, says: "We found that the more television children watched at age three, the more likely they were to have attention problems." The fast-changing images and sounds in many television programs that are made to capture the attention of young children "condition the mind to a reality that doesn't exist," he notes.

The question is whether such a link extends to portable devices, says Dr. Danielle Erkoboni-Wilbur, a pediatrician at St. Christopher's Hospital for Children in Philadelphia. "When those studies on behavior were done, technology just meant television. There is nothing that's formally in the literature linking those outcomes to portable digital media."

A 2015 study by pediatrician Dr. Hilda Kabali and colleagues at the Einstein Medical Center in Philadelphia found much of the average child's time with portable devices is spent watching online TV, rather than using interactive applications. However, as less passive experiences become more common over time, they may have different effects on children's behavior.

Says Christakis, "I think the touchscreen interactive experience that began with the iPad is, for the lack of a better word, a transformative technology. It's the interactivity that makes it different. With traditional media, a child never thinks or says 'I did it': it's a completely passive experience. But it's so gratifying to make something happen by touching an object on the screen."

"What we are finding in our lab is that these devices command attention much better than other things. It can make it more difficult for parents to interact with their children," Christakis adds. "I tend to think of the effects being mediated through two different pathways. One is the direct pathway, which is the actual content. Interactive media could lead to the same kind of overstimulation as fast-paced TV, although being interactive, it means the child can control the pacing in a way that isn't possible with television.

"There is also the indirect pathway, which works through displacement. This is about what could they be doing that they aren't, whether it's singing, reading, or going outside to play. Even if someone developed the perfect app that was perfectly paced and shown to be beneficial, if children used that app eight hours a day, we would recognize that behavior as being a problem," Christakis adds.

Christakis is concerned about the addictiveness of applications on tablets and smartphones, and the potential for them to eat too far into the child's daytime activities. "I tell parents to limit time on interactive devices to no more than 30 minutes. People ask, 'How do I know?' I came up with that limit when we looked at what people have done in the past, using techniques such as time diaries. Children in the pre-iPad days typically spent a maximum of 20–30 minutes a day with a particular toy, but they will often spend much more time than that with an iPad game; there is something very different about the experience.

"The makers of many of these games design them to be addictive. We live in an attentional economy, so apps writers are often trying to get people drawn in and get people addicted," Christakis adds.


Alexis Hiniker, a Ph.D. candidate in Human Centered Design and Engineering at the University of Washington, says parents quickly detect over-addictive apps. “A lot of the families we talked to, they said ‘we tried X but it was so hard to get them to put it away that we had to take it away’. They had to go cold turkey. So we find apps that are super-grabby are not successful, at least when it comes to younger children.”

Taking the device away inevitably causes tension. Hiniker’s own work involves finding ways to design apps for children that make it easier to set limits on usage time without causing tantrums when device time has to stop. “Our approach was to look at what is helpful to children in acquiring self-regulation. We found if children have explicit opportunities to make plans, they will take more ownership over their behavior,” she explains.

The Plan & Play system trialed by Hiniker lets children work out how long they want to spend on an activity before starting. “As they are playing, we surface that information back to them and remind them of what their plan was.

“In terms of practical considerations for industry, I would love to see design for things like parental controls shift to include these ideas, [and] move away from lockout mechanisms that simply force children to do other things.”

Erkoboni-Wilbur says a focus on designing devices and apps for children can avoid the potential for harm and make them support better development. “There are a variety of futures in front of us. We can see how easy it is for technology to isolate folks, but we do see pointers in research as to what causes great outcomes; what we can do to optimize technology to improve developmental outcomes.”

Allison Druin, a professor in the College of Information Studies at the University of Maryland, College Park, agrees: “Technology, as with any tool in a person’s life, is either going to amplify the challenges or support the strengths. We have the situation where autistic kids finally feel comfortable socially because they have this technology bridge, but there is also the potential to amplify inattention.”

One underappreciated limitation of screen-based devices that has emerged in studies is their two-dimensional (2D) nature. That realization is helping to bolster research on how children learn. Research by developmental psychologist Rachel Barr and colleagues at Georgetown University revealed infants find it more difficult than older children and adults to translate learning from the 2D space to three dimensions.

“We call it the transfer deficit. A very common problem in learning in general is when we have to take something from one context to another. What you are faced with on the screen is different to what you are faced with in the real world,” Barr says. “We need to understand that this transfer deficit is important. Children can seem to be digital natives when they use these devices so easily, but it’s actually more cognitively demanding to learn from them.”

What makes it easier for children to learn is what Barr calls the “social scaffold.” The Georgetown researchers showed this effect in an experiment based on a jigsaw-like puzzle. Researchers demonstrated the game to half the children. The other half only viewed a “ghost version”: a puzzle that assembles itself onscreen without human intervention. When given the tablet to solve the puzzle, the young children who only saw the ghost version were often unable to complete the exercise. “We gave them the touchscreen and they were baffled, but if someone showed them first, they were really good at it.”

Barr points to a 2014 study performed by Northwestern University doctoral candidate Courtney Blackwell (now a research assistant professor of medical social sciences at Northwestern’s Feinberg School of Medicine) with kindergarten-age children at three schools in Chicago as another example of the social scaffold in action, this time with children working together. At the beginning of the school year, children at one location received an iPad, another school did not supply them at all, and a third gave iPads to pairs of students. Those who shared devices scored higher on literacy tests than the other two groups.

“This human ability is really quite an amazing one that we have. We can take information from someone and link information really quite rapidly,” Barr says. “There is something about a person helping you that is different, but the device itself can often seem so compelling. I wonder if maybe parents sometimes think maybe the device can teach them better,” Barr adds.

Researchers such as Christakis insist parents should resist the temptation to believe the device will be better at teaching and to be confident in what they can bring to the situation. “One has to keep in mind that our brains have evolved over millennia; they are contingent on social interaction. Every time we are interacting with a child, we are laying down new synapses and making new connections in the brain.”

Erkoboni-Wilbur concludes: “The message that we are focusing on is that technology can be a social experience. We need to guide it away from being an isolated type of interaction. We are really encouraging parents to sit down together and learn with technology. Parents and their children, teachers and children, children and their peers; experience it as a social medium, and enhance that social interaction.”

Further Reading

Kabali, H.K., Irigoyen, M.M., Nunez-Davis, R., Budacki, J.G., Mohanty, S.H., Leister, K.P., and Bonner Jr., R.L.
Exposure and use of mobile media devices by young children. Pediatrics 136 (2015), 1044–1050.

Chassiakos, Y.R., Radesky, J., Christakis, D., Moreno, M.A., and Cross, C.
Children and Adolescents and Digital Media. American Academy of Pediatrics Technical Report.

Barr, R.
Memory Constraints on Infant Learning From Picture Books, Television, and Touchscreens. Child Development Perspectives (2013).

Hiniker, A., Lee, B., Sobel, K., and Choe, E.K.
Plan and Play: Supporting Intentional Media Use in Early Childhood. Proceedings of the 16th Annual ACM Conference on Interaction Design and Children (IDC ’17).

Blackwell, C.K.
iPads in Kindergarten? Investigating the effects of tablet computers on student achievement. Proceedings of the International Communication Association (2015).

Chris Edwards is a Surrey, U.K.-based writer who reports on electronics, IT, and synthetic biology.

© 2017 ACM 0001-0782/17/10 $15.00



Introducing ACM Transactions on Human-Robot Interaction

Now accepting submissions to ACM THRI

In January 2018, the Journal of Human-Robot Interaction (JHRI) will become an ACM publication and be rebranded as the ACM Transactions on Human-Robot Interaction (THRI).

Founded in 2012, the Journal of HRI has been serving as the premier peer-reviewed interdisciplinary journal in the field.

Since that time, the human-robot interaction field has experienced substantial growth. Research findings at the intersection of robotics, human-computer interaction, artificial intelligence, haptics, and natural language processing have been responsible for important discoveries and breakthrough technologies across many industries.

THRI now joins the ACM portfolio of highly respected journals. It will continue to be open access, fostering the widest possible readership of HRI research and information. All issues will be available on the ACM Digital Library.

Editors-in-Chief Odest Chadwicke Jenkins of the University of Michigan and Selma Šabanović of Indiana University plan to expand the scope of the publication, adding a new section on mechanical HRI to the existing sections on computational, social/behavioral, and design-related scholarship in HRI.

The inaugural issue of the rebranded ACM Transactions on Human-Robot Interaction is planned for March 2018.

To submit, go to https://mc.manuscriptcentral.com/thri
viewpoints

DOI:10.1145/3132722 Michael A. Cusumano

Technology Strategy and Management
Amazon and Whole Foods: Follow the Strategy (and the Money)
Checking out the recent Amazon acquisition of Whole Foods.

IN JUNE 2017, Amazon announced it would acquire Whole Foods, the national grocery chain with nearly 500 stores, for $13.7 billion in cash. This is less than Whole Foods’ 2016 revenues of $15.7 billion, which included $507 million in net profits (3.2% of revenues). Whole Foods became a takeover target because of declining sales and profits as well as increasing competition in the cutthroat supermarket business.1 By comparison, Amazon’s 2016 revenues were $136 billion, up 27% over 2015, with operating profits of $4.1 billion and net profits of $2.4 billion (1.8% of revenues). Amazon’s market value of around $480 billion is 3.5 times last year’s revenues. It is buying Whole Foods for less than one times its last year’s revenues (roughly 0.87 times). Given that Whole Foods is actually more profitable than Amazon, which plows most of its potential profits into new business development and expansion, and since Amazon does not have to dilute its stock, this seems like a great deal for CEO Jeff Bezos. But is it? And what does this acquisition say about Amazon’s strategy?

Launched by Bezos in 1994 as a pioneering online bookstore, Amazon has always made deft use of physical assets as well as the Internet. It soon offered two million or more titles, far more than actual bookstores were able to stock. Later in the 1990s, Amazon added a platform service that linked buyers who wanted less-popular books with third-party sellers that held the inventory, a business now called Amazon Marketplace. Unlike Google and other Internet platforms, Amazon was never a completely virtual business. Bezos operated out of a warehouse that intercepted shipments from distributors and then re-sent the books to customers. The company also began to buy large numbers of best-seller books and stock them so Amazon could benefit from scale economies in purchasing and earn premiums from delivering the books quickly.

Eventually, Amazon expanded to other popular items suitable for sale over the Internet, ranging from electronic goods to digital content (music and videos) to clothing. Amazon now sells some two million items.4 A decade ago it also launched Amazon Web Services (AWS) to sell excess computing and storage capacity from its massive data centers. In addition, Bezos experimented with his own products such as the successful Kindle tablet for e-books, the failed Fire smartphone, and the intriguing Echo/Alexa digital assistant and speaker device. More recently, Amazon opened a bookstore in Seattle to complement the online store, and it is experimenting with new technology (Amazon Go) that eliminates cashiers.

Until now, Amazon’s pieces fit together pretty well. Now the company is moving deeper into the low-margin grocery business. We have to wonder why.

Amazon’s financials are not particularly strong compared to other leading technology companies and Internet platforms, and the company mixes products and services with varying profit rates. In 2016, sales of products accounted for 70% of Amazon’s revenues.2 Sales of services (mostly from selling third-party goods on Amazon Marketplace, revenues from AWS, some allocations from Amazon Prime membership fees, and some advertising) accounted for the other 30%. AWS produced only $12.2 billion or 9% of revenues (compared to 5% in 2014) but a whopping $3.1 billion (74%) of operating profits. By contrast, Amazon lost $1.3 billion on overseas sales of $44 billion, nearly one-third of revenues. It is also estimated that Amazon loses up to $2 billion per year on Prime memberships.6 With 70- to 80-million members, Prime generated $6.8 billion in 2016 revenue.5 Prime offers free shipping, a lot of free digital content, and other benefits for a $99 annual fee or less in some cases. The free services are costly but encourage customer loyalty and seem to drive long-term sales growth.

Bezos’ interest in groceries goes back a decade, when he launched Amazon Fresh. Why? Volume. The annual grocery business in the U.S. alone was worth some $1.3 trillion in 2016, with Walmart (which has 4,500 stores) having the largest market share at 18%.7 Amazon and Whole Foods together would have a market share of about 3.5%.11 Amazon has figured out how to sell perishable and nonperishable groceries via the Web. It is not clear the company has figured out how to do this at a profit. Amazon can learn from Whole Foods how to sell groceries to upscale customers, but this market is shrinking.

Many customers are the same, however, so Amazon can try cross-marketing. About 62% of Whole Foods customers are Amazon Prime members.9 Perhaps Amazon can convert the 38% of Whole Foods customers who are not yet Prime members. Perhaps Amazon can expand the reach of Whole Foods through its Web presence and online delivery services. Perhaps the Whole Foods network of stores and warehouses can help Amazon in storing products or handling returns from Web purchases, or showcasing its products for sale.

Are there significant synergies between selling books and selling lettuce, or between warehousing groceries and other items? Maybe, but maybe not. Synergies on paper are always difficult to realize in acquisitions. That is why, in many studies, at least two-thirds of acquisitions fail. In this case, Amazon will find that economies of scale and scope in the grocery business are not like the digital world. There may even be some economic penalties with expansion or automation if Amazon cannot maintain the quality and service that are hallmarks of Whole Foods.

Another argument against the acquisition is that the more sectors into which Amazon diversifies or integrates vertically, the more it resembles a conglomerate with no particular specialization or competitive advantage.8 Perhaps Amazon should use the $13.7 billion to expand AWS. This is Amazon’s most profitable business, but it is subject to intense competition from Microsoft, Google, IBM, and other companies.

Although the Federal Trade Commission approved the takeover in August, lawyers and analysts have raised long-term antitrust concerns.6 Amazon disrupted the competition by pricing physical books, e-books, electronics, diapers, digital media, and other goods low (and sometimes below cost) to gain market share. Then it benefitted from increasing scale economies and positive feedback loops associated with digital platforms and network effects (see “The Evolution of Platform Thinking,” Communications, January 2010). That is, the more goods and services Amazon sells, the more customers it has, and the more likely it becomes that more buyers and sellers will use Amazon, especially if competitors vanish or falter. With weak or no competitors, Amazon is also free to raise prices.

Bezos’ goal does not seem to be profit, at least not in the short term. Rather, he seems intent on expanding Amazon’s reach. As noted in the title of a 2013 book on the company, Bezos wants Amazon to be “The Everything Store.”10 Whole Foods fits this strategy of retail expansion. It also fits Bezos’ apparent belief that, to expand its reach, Amazon needs to increase its physical presence. Why physical? Follow the money. Nearly 44% of U.S. consumers go first to Amazon when they want to make a purchase.6 However, the vast majority of purchases we make (about 92%) still occur in brick-and-mortar stores, not over the Internet.3 The value of this acquisition will be in how effectively and broadly Amazon is able to utilize the new physical platform it is buying.

References
1. Abrams, R. and Creswell, J. Amazon deal for Whole Foods starts a supermarket war. The Wall Street Journal (June 16, 2017).
2. Amazon.com, Inc. Form 10-K Annual Report.
3. Bray, H. Whole Foods deal fits into Amazon’s plan to offer one-stop shopping. Boston Globe (June 16, 2017).
4. Elgan, M. This is why Amazon will open physical bookstores. Computerworld (Feb. 8, 2016).
5. Gustafson, K. Amazon hints at one of its best kept secrets: How many prime members it has. CNBC.com (Feb. 17, 2017).
6. Khan, L. Amazon’s antitrust paradox. Yale Law Journal (Jan. 2017), 751–752.
7. Russell, K. and Seshagiri, A. Amazon is trying to do (and sell) everything. The New York Times (June 16, 2017).
8. Sorkin, A. Conglomerates didn’t die. They look like Amazon. The New York Times (June 19, 2017).
9. Stevens, L. and Haddon, H. Big prize in Amazon-Whole Foods deal: Data. The Wall Street Journal (June 20, 2017).
10. Stone, B. The Everything Store: Jeff Bezos and the Age of Amazon. Little, Brown, NY, 2013.
11. Wingfield, N. and de la Merced, M. Amazon to buy Whole Foods for $13.4 billion. The New York Times (June 16, 2017).

Michael A. Cusumano ([email protected]) is a professor at the MIT Sloan School of Management and founding director of the Tokyo Entrepreneurship and Innovation Center at Tokyo University of Science.

The author thanks Annabelle Gawer, Xiaohua Yang, and David Yoffie for their very helpful comments on earlier versions of this column.

Copyright held by author.

COMMUNICATIONS APPS
Access the latest issue, past issues, BLOG@CACM, News, and more.
Available for iPad, iPhone, and Android
Available for iOS, Android, and Windows
http://cacm.acm.org/about-communications/mobile-apps




DOI:10.1145/3132724 David Lorge Parnas

Inside Risks
The Real Risks of Artificial Intelligence
Incidents from the early days of AI research are instructive in the current AI environment.

THE VAST INCREASE in speed, memory capacity, and communications ability allows today’s computers to do things that were unthinkable when I started programming six decades ago. Then, computers were primarily used for numerical calculations; today, they process text, images, and sound recordings. Then, it was an accomplishment to write a program that played chess badly but correctly. Today’s computers have the power to compete with the best human players.

The incredible capacity of today’s computing systems allows some purveyors to describe them as having “artificial intelligence” (AI). They claim that AI is used in washing machines, the “personal assistants” in our mobile devices, self-driving cars, and the giant computers that beat human champions at complex games.

Remarkably, those who use the term “artificial intelligence” have not defined that term. I first heard the term more than 50 years ago and have yet to hear a scientific definition. Even now, some AI experts say that defining AI is a difficult (and important) question, one that they are working on. “Artificial intelligence” remains a buzzword, a word that many think they understand but nobody can define.

Recently, there has been growing alarm about the potential dangers of artificial intelligence. Famous giants of the commercial and scientific world have expressed concern that AI will eventually make people superfluous. Experts have predicted AI will even replace specialized professionals such as lawyers. A Microsoft researcher recently made headlines saying, “As artificial intelligence becomes more powerful, people need to make sure it’s not used by authoritarian regimes to centralize power and target certain populations.”a

a Interview with Kate Crawford in The Guardian, March 13, 2017.

Automation has radically transformed our society, and will continue to do so, but my concerns about “artificial intelligence” are different. Application of AI methods can lead to devices and systems that are untrustworthy and sometimes dangerous.

An Early Introduction to AI

As a student at Carnegie Mellon University (CMU),b I learned about “artificial intelligence” from some of the field’s founders. My teachers were clever but took a cavalier, “Try it and fix it,” attitude toward programming. I missed the disciplined approach to problem solving that I had learned as a student of physics, electrical engineering, and mathematics. Science and engineering classes stressed careful (measurement-based) definitions; the AI lectures used vague concepts with unmeasurable attributes. My engineering teachers showed me how to use physics and mathematics to thoroughly analyze problems and products; my AI teachers relied almost entirely on intuition.

b CMU was then known as Carnegie Institute of Technology.

I distinguished three types of AI research:
• building programs that imitate human behavior in order to understand human thinking;
• building programs that play games well; and
• showing that practical computerized products can use the methods that humans use.

Computerized models can help researchers understand brain function. However, as illustrated by Joseph Weizenbaum,2 a model may duplicate the “black-box” behavior of some mechanism without describing that mechanism.
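
Weizenbaum’s point, that a model can reproduce a mechanism’s black-box behavior without describing the mechanism, is easy to demonstrate. The Python sketch below is hypothetical and far simpler than Weizenbaum’s actual Eliza script: the pronoun table, the single “I feel” pattern, and the function names reflect and respond are illustrative assumptions. It echoes the user’s own words back in the Rogerian style, yet it stores no knowledge of any topic and deduces nothing.

```python
import re

# Hypothetical pronoun-swap table (an assumption for illustration);
# Weizenbaum's real Eliza script was much larger.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "mine": "yours",
}


def reflect(text: str) -> str:
    """Swap first- and second-person words; no meaning is extracted."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return " ".join(REFLECTIONS.get(w, w) for w in words)


def respond(user_input: str) -> str:
    """Echo the user's words back, Rogerian style, using one fixed pattern."""
    match = re.match(r"\s*i feel (.*)", user_input, re.IGNORECASE)
    if match:
        return f"Why do you feel {reflect(match.group(1))}?"
    # Fallback: repeat the (pronoun-swapped) input and invite more talk.
    return f"You say {reflect(user_input)}. Tell me more."
```

For example, respond("I feel sad about my job") returns "Why do you feel sad about your job?". The reply looks attentive, but nothing in the code represents jobs or sadness; whatever understanding the exchange seems to show is supplied by the person reading it.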

Writing game-playing programs is harmless and builds capabilities. However, I am very concerned by the proposal that practical products should apply human methods. Imitating humans is rarely the best way for a computer to perform a task. Imitating humans may result in programs that are untrustworthy and dangerous.

To explain my reservations about AI, this column discusses incidents from the early days of AI research. Though the stories are old, the lessons they teach us remain relevant today.

Heuristic Programming

AI researchers sometimes describe their approach as “heuristic programming.” An early CMU Ph.D. thesis defined a heuristic program as one that “does not always get the right answer.” Heuristic programs are based on “rules of thumb,” that is, rules based on experience but not supported by theory.c

“Heuristic” is not a desirable attribute of software. People can use rules of thumb safely because, when rules suggest doing something stupid, most people won’t do it. Computers execute their programs unquestioningly; they should be controlled by programs that can be demonstrated to behave correctly in any situation that might arise. The domain of applicability of a program should be clearly documented. Truly trustworthy programs warn their users whenever they are applied outside that domain.

Heuristics can be safely used in a program if:
• The specification allows several acceptable solutions and the heuristic is used either to select one of them or to determine the presentation order.
• The heuristic is intended to speed up a program that conducts a search that will either find a solution or establish that there is no solution.

In other situations, heuristic programming is untrustworthy programming.

c Those who write heuristic programs rarely characterize the set of conditions under which the program would produce an incorrect result. See the section “An AI System for Constructing Parsers.”

What Alan Turing Really Said

Alan Turing is sometimes called the “Father of AI” because of a 1950 paper, “Computing Machinery and Intelligence.”1 It is frequently claimed that, in that paper, Turing proposed a test for machine intelligence.

Those who believe that Turing proposed a test for machine intelligence should read that paper. Turing understood that science requires agreement on how to measure the properties being discussed. Turing rejected “Can machines think?” as an unscientific question because there was no measurement-based definition of “think.” That question is not one that a scientist should try to answer.

Turing wrote: “If the meaning of the words ‘machine’ and ‘think’ are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, ‘Can machines think?’ is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.”

Turing’s proposed replacement question was defined by an experiment. He described a game (the imitation game) in which a human and a machine would answer questions and observers would attempt to use those answers to identify the machine. If questioners could not reliably identify the machine, that machine passed the test.

Turing never represented his replacement question as equivalent to “Can machines think?” He wrote, “The original question, ‘Can machines think?’ I believe to be too meaningless to deserve discussion.” A meaningless question cannot be equivalent to a scientific one.

Most of Turing’s paper was not about either machine intelligence or thinking; it discussed how to test whether or not a machine had some well-specified property. He also speculated about when we might have a machine that would pass his test and demolished many arguments that might be used to assert that no machine could ever pass his test. He did not try to design a machine that would pass his test; there is no indication that he thought that would be useful.

INTERACTIONS
ACM’s Interactions magazine explores critical relationships between people and technology, showcasing emerging innovations and industry leaders from around the world across important applications of design thinking and the broadening field of interaction design.
Our readers represent a growing community of practice that is of increasing and vital global importance.
To learn more about us, visit our award-winning website http://interactions.acm.org
Follow us on Facebook and Twitter
To subscribe: http://www.acm.org/subscribe
Association for Computing Machinery

Joseph Weizenbaum’s Eliza

Anyone interested in the Turing Test should study the work of the late MIT
professor Joseph Weizenbaum.3 In the mid-1960s, he created Eliza, a program that imitated a practitioner of Rogerian psychotherapy.d Eliza had interesting conversations with users. Some “patients” believed they were dealing with a person. Others knew that Eliza was a machine but still wanted to consult it. Nobody who examined Eliza’s code would consider the program to be intelligent. It had no information about the topics it discussed and did not deduce anything from facts that it was given. Some believed Weizenbaum was seriously attempting to create intelligence by creating a program that could pass Turing’s test. However, in talks and conversations, Weizenbaum emphasized that this was never his goal. On the contrary, by creating a program that clearly was not “intelligent” but could pass as human, he showed that Turing’s test was not an intelligence test.

d Practitioners of Rogerian psychotherapy echo the patient’s words in their responses.

Robert Dupchak’s Penny-Matcher

Around 1964,e the late Robert Dupchak, a CMU graduate student, built a small box that played the game of “penny matching.”f His box beat us consistently. Consequently, we thought it must be very intelligent.

It was Dupchak who was intelligent, not his machine. The machine only remembered past moves by its opponent and assumed that patterns would repeat. Like Weizenbaum, Dupchak demonstrated that a computer could appear smart without actually being intelligent. He also demonstrated that anyone who knew what was inside his box would defeat it. In a serious application, it would be dangerous to depend on such software.

e Dupchak’s accidental death prevented publication of his work. I cannot give a precise date.
f Penny-matching is a two-player game. Each player uses a coin to make a head or tail choice. One player wins if both pick the same face; the other wins if the choices are different.

Character Recognition

A popular topic in early AI research and courses was the character recognition problem. The goal was to write programs that could identify hand-drawn or printed characters. This task, which most of us perform effortlessly, is difficult for computers. The optical character recognition software that I use to recognize characters on a scanned printed page frequently errs. The fact that character recognition is easy for humans but still difficult for computers is used to try to keep programs from logging on to Internet sites. For example, the website may displayg a CAPTCHA as shown here and require the user to type “s m w m.” This technique works well because the character recognition problem has not been solved.

Early AI experts taught us to design character recognition programs by interviewing human readers. For example, readers might be asked how they distinguished an “8” from a “B.” Consistently, the rules they proposed failed when implemented and tested. People could do the job but could not explain how.

Modern software for character recognition is based on restricting the fonts that will be used and analyzing the properties of the characters in those fonts. Most humans can read a text in a new font without studying its characteristics, but machines often cannot. The best solution to this problem is to avoid it. For texts created on a computer, both a human-readable image and a machine-readable string are available. Character recognition is not needed.

g This example was found in Wikipedia.

An AI System for Constructing Parsers

As a new professor, I made appointments with three famous colleagues to ask how to recognize a good topic for my students’ Ph.D. theses. The late Alan Perlis, the first recipient of ACM’s prestigious Turing Award, gave the best answer. Without looking up from his work, he said, “Dave, you’ll know one when you see it. I’m busy; get out of here!” Two other Turing Award winners, the late Allen Newell and the late Robert Floyd, met with me. Separately, both said that while they could not answer my question directly, they would discuss both a good thesis and a bad one. Interestingly, Newell’s example of a good thesis was Floyd’s example of a bad one.

The disputed thesis presented an AI program that would generate parsersh from grammars. Newell considered it good because it demonstrated that AI could solve practical problems. Floyd, a pioneer in the field of parsing, explained that nobody could tell him what class of grammars the AI parser generator could handle, and he could prove that that class was smaller than the class of languages that could be handled by previously known mathematical techniques. In short, while the AI system appeared to be useful, it was inferior to systems that did not use heuristic methods. Bob Floyd taught me that an AI program may seem impressive but come out poorly when compared to math-based approaches.

h Parsers, an essential component of compilers, divide a program into its constituent parts. Before Floyd’s work, parsers were created by humans. Floyd’s algorithm automatically generated parsers for a large class of languages.

An AI System that “Understood” Drawings and Text

A 1967 AI Ph.D. thesis described a program that purportedly “understood” both natural language text and pictures. Using a light pen and a graphics display,i a user could draw geometric figures. Using the keyboard, users could ask questions about the drawing. For example, one could ask “Is there a triangle inside a rectangle?” When the author demonstrated it, the program appeared to “understand” both the pictures and the questions. As a member of the examining committee, I read the thesis and asked to try it myself. The system used heuristics that did not always work. I repeatedly input examples that caused the system to fail. In production use, the system would have been completely untrustworthy.

i Advanced hardware for the time.

The work had been supervised by another Turing Award recipient, Herbert Simon, whose reaction to my observing that the system did not work was, “The system was not designed for antagonistic users.” Experience has shown that computer systems must be prepared for users to be careless and, sometimes, antagonistic. The techniques used in that thesis would not be acceptable in any commercial
product. If heuristics are used in critical applications, legal liability will be a serious problem.

An AI Assembly-Line Assistant
An assembly line could run faster after tool-handling assistants were hired: whenever workers finished using a tool, they tossed it in a box; when a tool was needed, the assistant retrieved it for the workers.

A top research lab was contracted to replace the human assistants with robots. This proved unexpectedly difficult. The best computer vision algorithms could not find the desired tool in the heap. Eventually, the problem was changed. Instead of tossing the tool into the box, assemblers handed it to the robot, which put it in the box. The robot remembered where the tool was and could retrieve it easily. The AI-controlled assistant could not imitate the human but could do more. It is wiser to modify the problem than to accept a heuristic solution.

“Artificial Intelligence” in Germanʲ
When AI was young, a German psychology researcher visited pioneer AI researchers Seymour Papert and Marvin Minsky (both now deceased) at MIT. He asked how to say “artificial intelligence” in German because he found the literal translation (Künstliche Intelligenzᵏ) meaningless.

Neither researcher spoke German. However, they invited him to an AI conference, predicting that he would know the answer after hearing the talks. Afterward, he announced that the translation was “natürliche Dummheit” (natural stupidity) because AI researchers violated basic rules of psychology research. He said that psychology researchers do not generally ask subjects how they solve a problem because the answers might not be accurate; if they do ask, they do not trust the answers. In contrast, AI researchers were asking chess players how they decide on their next move and then writing programs based on the players’ answers.

j I cannot warrant the truth of this story; it was related to me as true, but I was not present for the events. I include it because it contains an important lesson.
k Current terminology in German.
l Hill-climbing algorithms are analogous to hikers who always walk uphill. They may end up at the top of a foothill far below the mountain peak.

Artificial Neural Networks
Another approach to AI is based on modeling the brain. Brains are a network of units called neurons. Some researchers try to produce AI by imitating the structure of a brain. They create models of neurons and use them to simulate neural networks. Artificial neural networks can perform simple tasks but cannot do anything that cannot be done by conventional computers. Generally, conventional programs are more efficient. Several experiments have shown that conventional mathematical algorithms outperform neural networks. There is intuitive appeal to constructing an artificial brain based on a model of a biological brain, but no reason to believe this is a practical way to solve problems.

The Usefulness of Physics and Mathematics
A researcher presented a paper on using AI for image processing to an audience that included experts in radar signal processing. They observed that the program used special cases of widely used signal-processing algorithms and asked “What is new in your work?” The speaker, unaware of techniques used in signal processing, replied, “My methods are new in AI.” AI researchers are often so obsessed with imitating human beings that they ignore practical approaches to a problem.

A study of building temperature-control systems compared an AI approach with one developed by experienced engineers. The AI program monitored individual rooms and turned on the cooling/heating as needed. The engineers used a heat-flow model that included the building’s orientation, the amount of sunlight hitting sections of the building, the heat absorption/loss characteristics of the building, and so on. Using this model, which allowed their system to anticipate needs, and the ability to pump heat from one part of the building to another, they designed a system that reduced temperature fluctuations and was more energy efficient.

Humans do not have the measurement and calculation ability that is available to a modern computer system; a system that imitates people won’t do as well as one based on physical models and modern sensors.

Humans solve complex physics problems all the time. For example, running is complex. Runners maintain balance intuitively but have no idea how they do it. A solution to a control problem should be based on physical laws and mathematics, not on mimicking people.

Computers can rapidly search complex spaces completely; people cannot. For example, a human who wants to drive to a previously unvisited location is likely to modify a route to a previously visited nearby place. Today’s navigation devices can obtain the latest data and calculate a route from scratch and often find better routes than a human would.

Machine Learning
Another approach to creating artificial intelligence is to construct programs that have minimal initial capability but improve their performance during use. This is called machine learning. This approach is not new. Alan Turing speculated about building a program with the capabilities of a child that would be taught as a child is taught.1 Learning is not magic; it is the use of data collected during use to improve future performance. That requires no “intelligence.” Robert Dupchak’s simple penny-matching machine used data about an opponent’s behavior and appeared to “learn.” Use of anthropomorphic terms obscures the actual mechanism.

Building programs that “learn” seems easier than analyzing the actual problem, but the programs may be untrustworthy. Programs that “learn” often exhibit the weaknesses of “hill-climbing”ˡ algorithms; they can miss the best solution. They may also err because of incomplete or biased experience. Learning can be viewed as a restricted form of statistical classification, mathematics that is well developed. Machine-learning algorithms are heuristic and may fail in unusual situations.

Robot Ethics
When people view computers as thinking or sentient beings, ethical issues arise. Ethicists traditionally asked if the use of some device would be ethical; now, many people discuss our ethical obligations to AIs and whether AIs will treat us ethically. Sometimes ethicists posit situations in which AI must choose between two actions with unpleasant consequences, and ask what the device should do. Because people in the same situation would have the same issues, these dilemmas were discussed long before computers existed. Others discuss whether we are allowed to damage an AI. These questions distract us from the real question, “Is the machine trustworthy enough to be used?”

Wordplay
The AI research community exploits the way that words change meaning: the community’s use of the word “robot” is an example. “Robot” began as a Czech word in Karel Čapek’s play, R. U. R. (Rossum’s Universal Robots). Čapek’s robots were humanoids, almost indistinguishable from human beings, and acted like humans. If “robot” is used with this meaning, building robots is challenging. However, the word “robot” is now used in connection with vacuum cleaners, bomb-disposal devices, flying drones, and basic factory automation. Many claim to be building robots even though devices remotely like Karel Čapek’s are nowhere in sight.

This wordplay adds an aura of wizardry and distracts us from examining the actual mechanism to see if it is trustworthy. Today’s “robots” are machines that can, and should, be evaluated as such. When discussing AI, it is important to demand precise definitions.

AI: Creating Illusions
Alan Perlis referred to AI researchers as “illusionists” because they try to create the illusion of intelligence. He argued they should be considered stage magicians rather than scientists. Dupchak and Weizenbaum demonstrated it is easy to create the illusion of intelligence. We do not want computer systems that perform tricks; we need trustworthy tools. Trustworthy systems must be based on sound mathematics and science, not heuristics or illusionist’s tricks.

Conclusion
Whenever developers talk about AI, ask questions. Although “AI” has no generally accepted definition, it may mean something specific to them. The term “AI” obscures the actual mechanism but, while it often hides sloppy and untrustworthy methods, it might be concealing a sound mechanism. An AI might be using sound logic with accurate information, or it could be applying statistical inference using data of doubtful provenance. It might be a well-structured algorithm that can be shown to work correctly, or it could be a set of heuristics with unknown limitations. We cannot trust a device unless we know how it works.

AI methods are least risky when it is acceptable to get an incorrect result or no result at all. If you are prepared to accept “I don’t understand” or an irrelevant answer from a “personal assistant,” AI is harmless. If the response is important, be hesitant about using AI.

Some AI programs almost always work and are dangerous because we learn to depend on them. A failure may go undetected; even if failures are detected, users may not be prepared to proceed without the device.

Do not be misled by demonstrations: they are often misleading because the demonstrator avoids any situations where the “AI” fails. Computers can do many things better than people. Humans have evolved through a sequence of slight improvements that need not lead to an optimal design. “Natural” methods have evolved to use our limited sensors and actuators. Modern computer systems use powerful sensors and remote actuators, and can apply mathematical methods that are not practical for humans. It seems very unlikely that human methods are the best methods for computers.

When Alan Turing rejected “Can machines think?” as unscientific, and described a different question to illustrate what he meant by “scientific,” he was right but misled us. Researchers working on his “replacement question” are wasting their time and, very often, public resources. We don’t need machines that simulate people. We need machines that do things that people can’t do, won’t do, or don’t do well.

Instead of asking “Can a computer win Turing’s imitation game?” we should be studying more specific questions such as “Can a computer system safely control the speed of a car when following another car?” There are many interesting, useful, and scientific questions about computer capabilities. “Can machines think?” and “Is this program intelligent?” are not among them.

Verifiable algorithms are preferable to heuristics. Devices that use heuristics to create the illusion of intelligence present a risk we should not accept.

References
1. Turing, A.M. Computing machinery and intelligence. Mind 59 (1950), 433–460.
2. Weizenbaum, J. Automating psychotherapy. ACM Forum Letter to the Editor. Commun. ACM 17, 7 (July 1974), 425; doi: 10.1145/361011.361081.
3. Weizenbaum, J. ELIZA—A computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (Jan. 1966), 36–45; doi: 10.1145/365153.365168.

David Lorge Parnas works for Middle Road Software, Inc., in Ottawa, Canada. He is Professor Emeritus at McMaster University in Canada and the University of Limerick in Ireland.

Lillian Chik-Parnas, Nancy Leveson, Peter Denning, and Peter Neumann offered helpful suggestions about earlier drafts of this column.

Copyright held by author.
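Parnas’s claim that “learning” is only “the use of data collected during use to improve future performance” can be made concrete. The column does not describe how Dupchak’s penny-matching machine worked, so the Python below is a hypothetical sketch rather than a reconstruction: a player who wins when both coins match keeps a table of which face the opponent chose after each previous choice, and plays the most frequent follow-up. `PennyMatcher` and `play` are illustrative names.

```python
from collections import Counter, defaultdict
from typing import Optional


class PennyMatcher:
    """Hypothetical penny-matching player that wins when both coins show the same face.

    Its only "learning" is a table of counts: for each face the opponent played,
    how often each face came next.
    """

    def __init__(self) -> None:
        self.follow_ups = defaultdict(Counter)  # previous face -> Counter of next faces
        self.last: Optional[str] = None         # opponent's most recent face

    def predict(self) -> str:
        """Guess the opponent's next face; with no relevant history, guess heads."""
        if self.last is None or not self.follow_ups[self.last]:
            return "H"
        return self.follow_ups[self.last].most_common(1)[0][0]

    def observe(self, opponent_face: str) -> None:
        """Record the opponent's actual face after a round."""
        if self.last is not None:
            self.follow_ups[self.last][opponent_face] += 1
        self.last = opponent_face


def play(opponent_faces: list) -> int:
    """Play one round per opponent face; return how many rounds the matcher wins."""
    matcher = PennyMatcher()
    wins = 0
    for face in opponent_faces:
        if matcher.predict() == face:  # same face: the matcher wins the round
            wins += 1
        matcher.observe(face)
    return wins


print(play(["H", "T"] * 10))  # alternating opponent, 20 rounds: prints 19
```

Against an opponent who simply alternates, the count table captures the pattern after two rounds; against a fair coin it can do no better than chance. Nothing in the mechanism is more than counting, which is the column’s point about anthropomorphic terms.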

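Footnote l of the AI column compares hill-climbing algorithms to hikers who always walk uphill. A minimal sketch of greedy hill climbing on an invented one-dimensional terrain (the heights are made up for illustration): the climber stops at the first position with no higher neighbor, so its answer depends on where it starts, whereas checking every position, the kind of complete search the column says computers can perform, always finds the peak.

```python
def hill_climb(heights: list, start: int) -> int:
    """Greedy hill climbing on a 1-D terrain: step to the higher neighbor until none is higher."""
    pos = start
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        if not neighbors:
            return pos
        best = max(neighbors, key=lambda i: heights[i])
        if heights[best] <= heights[pos]:
            return pos  # a local maximum, possibly only a foothill
        pos = best


def exhaustive_peak(heights: list) -> int:
    """Examine every position and return the index of the highest one."""
    return max(range(len(heights)), key=lambda i: heights[i])


# Hypothetical terrain: a foothill of height 5 at index 2, the true peak of height 9 at index 7.
terrain = [1, 3, 5, 4, 2, 4, 7, 9, 6]

print(hill_climb(terrain, 0))    # stuck on the foothill: prints 2
print(hill_climb(terrain, 5))    # starts on the peak's slope: prints 7
print(exhaustive_peak(terrain))  # prints 7
```

Practical learning algorithms search spaces far too large for exhaustive checking; the toy terrain only shows why a greedy method can stop short of the best solution.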
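The AI column contrasts a driver who routes through a familiar nearby place with a navigation device that calculates a route from scratch. The column does not say which algorithm such devices use; the sketch below uses Dijkstra’s shortest-path algorithm as one standard way to search a road graph completely. The road network, place names, and travel times are hypothetical.

```python
import heapq


def shortest_time(graph: dict, source: str, target: str) -> int:
    """Dijkstra's algorithm: least total travel time from source to target (non-negative weights)."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best[node]:
            continue  # stale entry: a shorter path to node was already found
        for nxt, minutes in graph.get(node, {}).items():
            candidate = dist + minutes
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(queue, (candidate, nxt))
    raise ValueError(f"{target!r} is unreachable from {source!r}")


def via_familiar(graph: dict, source: str, familiar: str, target: str) -> int:
    """The human habit: drive to a familiar nearby place first, then on to the target."""
    return shortest_time(graph, source, familiar) + shortest_time(graph, familiar, target)


# Hypothetical road network: undirected edges with travel times in minutes.
edges = [("home", "mall", 10), ("mall", "new_office", 12),
         ("home", "bridge", 8), ("bridge", "new_office", 9)]
roads: dict = {}
for a, b, minutes in edges:
    roads.setdefault(a, {})[b] = minutes
    roads.setdefault(b, {})[a] = minutes

print(shortest_time(roads, "home", "new_office"))         # prints 17
print(via_familiar(roads, "home", "mall", "new_office"))  # prints 22
```

The detour through the familiar place is a reasonable heuristic that still costs five extra minutes here; the complete search has no such blind spot, provided its map data are accurate.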

DOI:10.1145/3132726 Vasant Dhar and Roger M. Stein

Economic and Business Dimensions
FinTech Platforms and Strategy
Integrating trust and automation in finance.

TEN YEARS FROM now, will Google and Amazon play more of a role in managing your investment portfolio than Fidelity? Finance has traditionally been about trust. We trust banks to hold our money and give it back to us when we want it; and we trust brokerage firms to buy the securities we want at market prices and to debit and credit our accounts accordingly. Because trust is so important, we have historically held banks and financial firms to much higher standards of compliance and control than other businesses. Financial institutions are required to follow well-defined processes with oversight and failsafe plans aimed at minimizing risk and maximizing public trust. These processes have traditionally involved humans, even as they have been increasingly augmented with technology over time.

But the last 20 years have seen the emergence of a fundamentally new and different phenomenon enabled by Internet connectivity and access: the “platform” business. In platform businesses, many complex processes, including compliance and checks-and-balances procedures, are performed securely by machines. Along with this platform phenomenon, newer generations of humans have co-evolved to be more comfortable trusting machines to make or support decisions. It would have seemed unthinkable 10 years ago to imagine trusting autonomous driving vehicles to take over the wheel, or trusting robots to perform surgery on us. But we are increasingly doing just that in a growing number of real-world areas. Broadly, we seem comfortable trusting machine-made decisions in domains in which the risks are acceptable: mistakes are relatively infrequent and their consequences do not exceed a reasonable tolerance threshold.1

The question is, will future investors trust FinTech platforms to the degree that previous generations have trusted traditional banks? Conversely, what will it take for FinTech platforms to be trusted sufficiently by future generations?
Calendar of Events

September 29–October 2
MEMOCODE ’17: 15th ACM-IEEE International Conference on Formal Methods and Models for System Design,
Vienna, Austria,
Contact: Jean-Pierre Talpin

October 1–4
ICTIR ’17: ACM SIGIR International Conference on the Theory of Information Retrieval,
Amsterdam, Netherlands,
Sponsored: ACM/SIG,
Contact: Evangelos Kanoulas

October 1–4
SIGUCCS ’17: ACM SIGUCCS Annual Conference,
Seattle, WA,
Sponsored: ACM/SIG,
Contact: Cate Lyon

October 4–7
SIGITE/RIIT 2017: The 18th Annual Conference on Information Technology Education and the 6th Annual Conference on Research in Information Technology (RIIT),
Rochester, NY,
Sponsored: ACM/SIG,
Contact: Stephen Zilora

October 9–13
MobiCom ’17: The 23rd Annual International Conference on Mobile Computing and Networking,
Snowbird, UT,
Sponsored: ACM/SIG,
Contact: Jacobus Van Der Merwe

October 12–14
SEC ’17: IEEE/ACM Symposium on Edge Computing,
San Jose/Silicon Valley, CA,
Contact: Junshan Zhang

October 14–18
MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture,
Cambridge, MA,
Contact: Hillery C. Hunter,
Email: hhunterjaeger@hotmail.com

FinTech and Platforms
To answer this type of question, we first need to define what we mean by “FinTech” and “platform,” and identify the “platform opportunities” in finance.

We define FinTech as: “Financial sector innovations involving technology-enabled business models that can facilitate disintermediation; revolutionize how existing firms create and deliver products and services; address privacy, regulatory and law-enforcement challenges; provide new gateways for entrepreneurship; and seed opportunities for inclusive growth.”ᵃ

We define a platform as an entity that provides “A nexus of operational and business rules; integrated technology architecture and engines; and channel access that facilitates exchange between two or more interdependent groups, usually consumers and producers.”3,5

Platforms have a number of important properties, but today’s “complete” Internet platforms always have three essential components:
1. They are “open,” allowing easy participation;
2. They implement key business and operational processes, some of which typically exhibit network effects that increase in value as participation increases; and
3. They implement these business processes automatically using enabling technology (which may also capture and generate vast amounts of data that enhance the value of the platform).6

Internet businesses are increasingly structured as platforms, and in most markets, a few such platforms tend to dominate. Facebook has dominated social networking with 1.8 billion users;ᵇ Amazon has a 65% market share in online books; Google’s global search market share is 77%;ᶜ Uber claims more than 80% of U.S. market share in its market.ᵈ And so on. Complete platforms such as these become increasingly difficult to dislodge once they are established. As a result, the allure of market dominance provides a significant incentive to establish a complete platform.

a See http://bit.ly/2dZnJxN
b See http://bit.ly/2daz7Yr
c See http://bit.ly/1MYzonY
d See http://bit.ly/2wMvqiQ
e See http://bit.ly/2wMvqiQ

FinTech Platforms
Because of historical factors and regulation, platforms in finance have tended to be “incomplete” in that they lack one or more of the three essential components.2 Exchanges are among the earliest “members only” incomplete platforms: the Royal Exchange of London (opened in 1571)ᵉ and the Osaka Rice Exchange (Dōjima kome ichiba, established in 1697) are early instances.4 Modern exchanges have replaced manual processes with machines and added process sophistication, but are still generally accessible only to members in order to minimize risk to the platform. Similarly, clearinghouses and payment platforms with restricted access are incomplete.

Competition among platforms often arises from differentiation of some component. For example, electronic exchanges, the evolutionary successors to the Royal Exchange of London and the Osaka Rice Exchange, are complete platforms, allowing a much broader set of investors to trade assets such as foreign exchange, equities, or bonds directly with each other and without human brokers. Dozens of e-exchanges have arisen in the last two decades, each designed to meet some kind of specialized need, such as the ability to transact large sizes or to provide incentives to specific types of liquidity takers or makers, and so on. Liquidnet is an example of a complete platform, well suited for institutional trading involving execution of large blocks of assets that are often illiquid. Historically these were handled through brokers to
minimize market impact. Now, robots match market participants wishing to transact, without third-party intermediation. In so doing they increase liquidity at a much larger scale and with more efficiency.

Lack of open access is not the only way in which a finance platform may be incomplete. Yahoo Finance, for example, is incomplete because it lacks key business processes to complete transactions. Ratings agencies are incomplete because most of their key value-adding business processes require human expertise and are not amenable to codification in IT systems.

The Venn diagram shown in the accompanying figure provides a graphical view of our framework, which combines the three platform components and also provides some examples. The intersection at the center defines complete platforms; all of the other regions denote incomplete ones. The Venn diagram representation is intuitive. The businesses on the periphery of the diagram tend to be those that play supporting roles, rather than central ones, in the life cycle of financial transactions. The platform framework can be useful for analyzing incumbent and new entrant business models in a number of ways. To start, the framework identifies which capabilities are required for each type of platform to attain completeness.

Conversely, the framework suggests which business functions are vulnerable to different types of disruptions due to their incompleteness. It also provides a way to think about transitions from partial to complete platforms, the implications of these, and the opportunities that would motivate such transitions. Finally, the framework permits us to examine new technologies in terms of the businesses they are likely to impact. Technologies that facilitate platform completion along a specific dimension will likely affect businesses that are incomplete along that dimension.

Figure. The three core components of a complete platform with examples of platform businesses exhibiting various levels of completeness.

Complete platforms may also be created, by design, de novo. Amazon and PayPal are early examples in retail and payments. Peer-to-peer lending and robo-advisor platforms are recent instances of complete platforms in finance. Lending Tree, for example, is “open,” and provides the technology infrastructure and processing required to connect lenders and borrowers directly. Robo-advisors are openly accessible to retail investors and aim
to do much of what human advisors have traditionally done—screening investments and providing standard analytics like portfolio optimization—through technology.

For businesses that are currently based on incomplete platforms, the diagram also suggests approaches to platform completion. One strategy for transitioning from a legacy incomplete platform to a complete platform is to add a missing component. This platform completion strategy can be pictured schematically by considering arrows originating at peripheral points on the diagram and terminating in the center. An example of this form of platform completion from a non-finance domain is the expansion of Angie’s List from a home services rating system to a home services booking system, through the addition of a process for conducting transactions. In finance, an equivalent example might be if Google or Yahoo Finance were to introduce capabilities like trading, investment advice, asset management (or even investment services) on their platforms.

An alternative strategy for FinTech platform completion is component replacement, aimed at introducing functionality that is cheaper, faster, or safer, relative to legacy platform components. Such replacement strategies most often seek to leverage the increasing speed, connectivity, and safety of technology components. A compelling FinTech example is the potential for Blockchain technology to replace legacy post-trade processes that currently require trusted third parties such as clearinghouses and depositories to manage and administer the clearing, settlement, and custody associated with trades and payments. Properly implemented, Blockchain-based systems could significantly reduce the need for specialized custodial institutions since, in principle, market participants would be able to exchange and record transactions securely in the cloud. This new technology, which essentially uses crowdsourcing to implement distributed ledgers, is currently too slow and expensive to replace existing practices. However, the expectation is that speed and cost improvements will displace platform components that address post-transaction processes. Indeed, Blockchain could serve as the core technology for other future complete platforms that require verification and trust.

Emerging AI technologies could be similarly disruptive in replacing existing business processes that involve basic human perception and reasoning about both structured and unstructured data from text, images, and sound. As the volume of unstructured data continues to increase, the artificial intelligence functionality for interpreting and acting on it automatically will likely replace a growing number of human-intensive processes.

Dominant AI platforms are already beginning to emerge. They are fueled by their access to “big data” (searches, purchases, and posts from participants using their technology) and are well positioned to become critical components of complete FinTech platforms of the future, where the platform owners can exploit their expertise in cloud computing, secure transaction processing, search and optimization, without incurring the up-front costs of formally entering the finance industry as full-blown competitors.

In the current regulatory landscape, platform completion and component replacement in FinTech seem likely to occur mostly through platform partnerships, driven by the forces of regulation and economics. Customer acquisition and regulatory compliance activities in finance can be very expensive for newcomers, making it difficult to dislodge incumbents. At the same time, incumbents tend to resist ceding access to their customers, and this aversion will likely induce them to pursue strategies centered on acquiring innovators for platform completion or component replacement while exploiting the existing levels of trust embodied in their brands. Finally—and perhaps most significantly—even though they are not natural participants today, once market participants begin to trust the data-handling and AI capabilities of the large Internet platforms, they could become core components of FinTech platforms for the same reasons they are trusted in the social, retail, and device domains.

Indeed, partnerships between technology platforms and financial services franchises are already emerging. H&R Block’s partnership with IBM’s Watson in the tax arena is a recent case in retail finance. Just as H&R Block chose not to build its own version of Watson from scratch, other financial players are unlikely to choose to build Google’s or Amazon’s artificial intelligence and machine-learning capabilities from scratch.

The broad implications for FinTech platforms are clear. If future investors and consumers of financial services begin to trust FinTech platforms in the ways they have for retail and travel, then financial advisory and intermediation activity may well go the way of manufacturing and brick-and-mortar retail stores. In such a connected world of data-intensive platforms, in which products and services can be micro-tailored to specific clients, and in which trusted platforms provide a broad spectrum of these products and services to these clients, will future generations really care if they buy stock through Fidelity, Google, or Amazon … or through some combination?

References
1. Dhar, V. When to trust robots with decisions and when not to. Harvard Business Review (May 2016); http://bit.ly/23W1xFi
2. Dhar, V. and Stein, R.M. FinTech platforms and strategy. MIT Sloan Research Paper No. 5183-16, 2017; SSRN link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2892098
3. Dhar, V. and Sundararajan, A. Information technologies in business: A blueprint for education and research. Information Systems Research 18, 2 (June 2007); http://bit.ly/2waHo8o
4. Moss, D. and Kintgen, E. The Dojima Rice Market and the Origins of Futures Trading. HBR Case 9-709-044. Harvard Business School, 2010.
5. Parker, G. and Van Alstyne, M. Platform strategy. In The Palgrave Encyclopedia of Strategic Management, Macmillan, Hampshire, U.K., 2015.
6. Van Alstyne, M., Parker, G., and Choudary, S. Pipelines, platforms, and the new rules of strategy. Harvard Business Review (Apr. 2016); http://bit.ly/1RfZAzM

Vasant Dhar is a professor at the Stern School of Business and the Center for Data Science at New York University.

Roger M. Stein is an adjunct professor of Finance at NYU’s Stern School of Business and a research affiliate at the MIT Laboratory for Financial Engineering.

Copyright held by author.

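The FinTech column describes Blockchain as technology that “uses crowdsourcing to implement distributed ledgers.” A minimal sketch of only the ledger half of that idea follows: each entry records the SHA-256 hash of the entry before it, so quietly editing any recorded trade becomes detectable. It omits the distributed consensus among participants that real blockchains add, and the field names and trades are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash and the payload, serialized as canonical JSON."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Add an entry that commits to the hash of the entry before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(ledger: list) -> bool:
    """Recompute every link; an edited entry no longer matches its stored hash."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical trades between participants A, B, and C.
ledger: list = []
append(ledger, {"buyer": "A", "seller": "B", "asset": "XYZ", "qty": 100})
append(ledger, {"buyer": "C", "seller": "A", "asset": "XYZ", "qty": 40})
print(verify(ledger))  # prints True

ledger[0]["payload"]["qty"] = 1000  # tamper with a recorded trade
print(verify(ledger))  # prints False
```

Detection is all the hash chain provides; agreeing on which copy of the ledger is authoritative, the job clearinghouses and depositories do today, is what the omitted consensus layer would have to supply.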
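Dhar and Stein describe electronic venues such as Liquidnet in which “robots match market participants wishing to transact, without third-party intermediation.” The column gives no matching rules, so the following is a hypothetical sketch of the simplest possible crossing: buy and sell interest in the same asset is paired, largest blocks first, until one side runs out. Pricing, priority rules, and minimum block sizes, which a real venue would need, are deliberately left out; `Indication` and `cross` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Indication:
    """A participant's interest in trading a block (hypothetical structure)."""
    trader: str
    asset: str
    side: str  # "buy" or "sell"
    qty: int


def cross(indications: list) -> list:
    """Pair buy and sell interest per asset, largest first; return (buyer, seller, asset, qty) fills."""
    fills = []
    for asset in sorted({ind.asset for ind in indications}):
        # Mutable [trader, remaining quantity] pairs, largest first (stable sort keeps ties in order).
        buys = sorted(([ind.trader, ind.qty] for ind in indications
                       if ind.asset == asset and ind.side == "buy"), key=lambda e: -e[1])
        sells = sorted(([ind.trader, ind.qty] for ind in indications
                        if ind.asset == asset and ind.side == "sell"), key=lambda e: -e[1])
        b = s = 0
        while b < len(buys) and s < len(sells):
            qty = min(buys[b][1], sells[s][1])
            fills.append((buys[b][0], sells[s][0], asset, qty))
            buys[b][1] -= qty
            sells[s][1] -= qty
            if buys[b][1] == 0:
                b += 1
            if sells[s][1] == 0:
                s += 1
    return fills


orders = [
    Indication("Fund A", "XYZ", "buy", 50_000),
    Indication("Fund B", "XYZ", "sell", 30_000),
    Indication("Fund C", "XYZ", "sell", 30_000),
]
for fill in cross(orders):
    print(fill)
```

Here 10,000 units of Fund C’s selling interest go unmatched; the sketch simply drops them, where a real venue would keep them available for later matching.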

DOI:10.1145/3132728 George V. Neville-Neil

Kode Vicious
IoT: The Internet of Terror
If it seems like the sky is falling, that’s because it is.

Dear KV,
We are deploying a consumer IoT (Internet of Things) device, with each device connected to a cloud service that acts as the platform from which it will be controlled. The device itself is not dangerous: it is a simple, slimmed-down tablet to be used in hotel chains to replace an alarm clock and TV remote, and to provide access to room service. The device is battery operated, rechargeable, and cheap enough that hopefully no one will want to steal it. Guests cannot load any information into it, and—unlike a typical tablet—it does not serve as a Web browser.

We have an engineer who seems overly worried about the security of the communication between the device and the back end. We are working on the alpha version of the system now. All communication between the device and the cloud is unencrypted, and no user data crosses the network. Who cares what TV channel you are watching or what you ate for breakfast? On such a lightweight embedded device, the battery-life cost of encryption is significant and reduces battery life by 25% in our tests. We expect many users will not replace them in their charging cradles, since most people cannot remember to charge their own phones every night; thus, battery life will be very important to the project.

Even if we do turn on some encryption, we would like it to be as little as possible, again, to preserve battery life. I know a longer key is harder to break, but it also means that we will be using a lot more of the battery protecting what is, in reality, data that is hardly a state secret. What do you think is the right level of encryption, if any, to use in such a system, and how can we get the annoying engineer to shut up?

Not So Secret

Dear Not So,
It is true that many security-focused engineers can sound like Chicken Little, running around announcing that the sky is falling, but, unless you have been living under a rock, you will notice that, indeed, the sky is falling. Not a day goes by without a significant attack against networked systems making the news, and the Internet of Terror is leading the charge in taking distributed systems down the road to hell—a road that you wish to pave with your good intentions.

Before I get to the question of encryption and key length, I would like
to point out two things. An IoT device suggests AES-128 as a minimum from
is nothing more than an embedded 2016 through 2030, and as your system
system with a TCP/IP stack. It is not a Encryption, (likely) does not contain state secrets, it
magical object that is somehow pro- in and of itself, ought to be sufficient for your use.
tected from attackers because of how Many embedded microprocessors
cool, interesting, or colorful it is. is not a solution now come with dedicated circuits
Second, and I cannot believe I have to securing to offload cryptographic operations.
to point this out to people who would The offloading of crypto algorithms is
read or write to KV, but the Internet has a system. nothing new and is subject to the same
a lot of stuff on it, and it is getting bigger “cycle of life” that we see in other areas
every day. Once upon a time, there were of software and systems engineering,
fewer than 100 nodes on the Internet, where specialized services first appear
and that time is long past. KV has three as add-on peripherals, then show up
Internet-enabled devices within arm’s in the CPU itself. In the 1990s, micro-
reach, and, if you think a billion users with their room number and the ap- processors were unable to keep up with
of the Internet aren’t scary, try mul- propriate settings. Perhaps you would the then-new algorithms being used to
tiplying that by 10 once every fridge, like to put a hotel out of business? set up virtual private networks (VPNs),
microwave, and hotel alarm clock can Easy enough, just order expensive so companies produced cryptographic
spew packets into the ether(net). champagne from room service for offload chips.
Let’s get this straight: if you attach everyone staying in the hotel. Hotel Processors got faster and larger,
something to a network—any net- guests will gladly accept it and then and what could not be handled by fre-
work—it had better be secured as well sue to have it removed from their quency scaling was put into special in-
as possible, because I am quite tired bills. Attacks do not have to occur at structions to handle the harder parts
of being awakened by the sound of the the NSA/FSB/GCHQ or similar levels of crypto. Embedded processors, like
gnashing of teeth caused by each and to run you and your customers out of all processors, continue to gain more
every new network security breach. The business. I think you can see why your circuits, including those for cryptog-
Internet reaches everywhere, and if engineer is talking about encryption. raphy. Any device, such as yours, that
even one-tenth of 1% of the people on it You did not mention authentication drives a display is bound to have some
are bad actors, then that is one million at all, so let’s talk a bit about that as sort of cryptographic engine, and the
potential attackers across the earth, well, because authentication protocols only reason to ignore it is sheer lazi-
and each one may be in control of far also require the use of cryptographic ness. It is very likely that your embed-
more than three devices! algorithms—the ones, you complain, ded processor already has accelera-
Encryption, in and of itself, is not that drain your batteries. The system tion for AES, and if it does not, spend
a solution to securing a system. There you describe needs the ability to dis- a quarter and buy a better one.
are many components that go into cern who made a request as a way of KV
building a secure system and service, preventing the attack I just described,
most of which I will not go into here where I ordered champagne for every-
due to limitation of space and my one in the hotel. How are you going to
Related articles
blood pressure medicine, which my ensure that room 243 really ordered on queue.acm.org
doctor tells me not to grind between those drinks?
Kode Vicious Battles On
my teeth. Encrypted communication Now that I have convinced you that
George Neville-Neil
is one tool to be used when securing connections to the back-end system A koder with attitude, KV answers your
a networked service. You say there are need to be both encrypted and authen- questions. Miss Manners he ain’t.
no “state secrets” in your system, and, ticated, we can split hairs on how big http://queue.acm.org/detail.cfm?id=1059801
I guess, so long as the president is not a key to use, and, you know what they Pervasive, Dynamic Authentication of
staying in one of your IoT-equipped say, the bigger the key, the deeper the Physical Items
hotels, perhaps that is true. Even if lock! Choosing algorithms and key Mandel Yu and Srinivas Devadas
The use of silicon PUF circuits
the president no longer eats children sizes is where we can start to talk about
http://queue.acm.org/detail.cfm?id=3047967
for breakfast, there are plenty of other device features and power usage.
risks involved in building a system as At the moment, the AES (Advanced Security Collapse in the HTTPS Market
Axel Arnbak, et al.
you have described. Encryption Standard) set of algorithms Assessing legal and technical solutions to
If attackers gain access to the pri- is what is used most often for encryp- secure HTTPS
vate Wi-Fi network the hotel uses to tion, and this protocol has at least http://queue.acm.org/detail.cfm?id=2673311
deploy your system—which is likely, three standard key lengths: 128, 192,
because the hotel will probably pick and 256 bits. There are various rec- George V. Neville-Neil ([email protected]) is the proprietor of
Neville-Neil Consulting and co-chair of the ACM Queue
a password such as “1234567890”— ommendations on key lengths, but editorial board. He works on networking and operating
then they can see all the traffic used I rather like the comparison of sev- systems code for fun and profit, teaches courses on
various programming-related subjects, and encourages
to perform operations. Want to wake eral sets of recommendations on this your comments, quips, and code snips pertaining to his
Communications column.
your neighbors at 01:00, 03:00, and website: https://www.keylength.com/
05:00? Simple, just create a request en/4/. NSA—whoops, I mean NIST— Copyright held by author.
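KV’s question, how the back end can ensure that room 243 really ordered those drinks, is a question of message authentication. The sketch below illustrates one way to answer it, assuming a hypothetical scheme in which each room’s device shares a secret key with the back end and stamps every request with an increasing counter. It uses only Python’s standard `hmac` module and covers authenticity and replay protection only. It does not encrypt anything: confidentiality would still need a cipher such as AES (for example, AES-GCM through a vetted crypto library or the device’s hardware engine). The names `sign_request` and `BackEnd`, the key table, the placeholder keys, and the counter scheme are all illustrative assumptions, not something the column prescribes.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    # Serialize deterministically so the device and the back end
    # authenticate exactly the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_request(key: bytes, room: str, counter: int, action: dict) -> dict:
    """Device side (hypothetical): attach an HMAC-SHA256 tag to a request."""
    body = {"room": room, "counter": counter, "action": action}
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


class BackEnd:
    """Back-end side (hypothetical): check tags and reject replays."""

    def __init__(self, keys: dict):
        self.keys = keys          # room number -> shared secret key
        self.last_counter = {}    # room number -> highest counter accepted

    def accept(self, request: dict) -> bool:
        body = request["body"]
        room = body["room"]
        key = self.keys.get(room)
        if key is None:
            return False          # unknown room: no key to check against
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        # compare_digest runs in constant time, so timing does not leak the tag.
        if not hmac.compare_digest(expected, request["tag"]):
            return False          # forged or altered request
        if body["counter"] <= self.last_counter.get(room, -1):
            return False          # replay of an earlier request
        self.last_counter[room] = body["counter"]
        return True


if __name__ == "__main__":
    # Placeholder keys for illustration; real keys would be random bytes.
    keys = {"243": b"room-243-secret", "244": b"room-244-secret"}
    back_end = BackEnd(keys)
    order = sign_request(keys["243"], "243", 1, {"room_service": "champagne"})
    print(back_end.accept(order))   # True: genuine request from room 243
    print(back_end.accept(order))   # False: the same packet replayed
```

Because the room number sits inside the authenticated body, an attacker who edits it after signing, guesses a key, or simply replays a captured packet is rejected. A real deployment would also need per-device key provisioning and a counter that survives device reboots.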


DOI:10.1145/3133233 Roderick Chapman, Neil White, and Jim Woodcock

Viewpoints
What Can Agile Methods Bring to High-Integrity Software Development?
Considering the issues and opportunities raised by Agile practices in the development of high-integrity software.

There is much interest in Agile engineering, especially for software development. Agile’s proponents promote its flexibility, leanness, and ability to manage changing requirements, and deride the plan-driven or waterfall approach. Detractors criticize Agile’s free-for-all.
At Altran U.K., we use disciplined and planned engineering, particularly when it comes to high-integrity systems that involve safety, security, or other critical properties. A shallow analysis is that Agile is anathema to high-integrity systems development, but this is a naïve reaction. Pertinent questions include:
• Is Agile compatible with high-integrity systems development?
• Where is Agile inappropriate?
• Do Agile’s assumptions hold for high-integrity or embedded systems?
• Could high-integrity best practice improve Agile?
We don’t have all the answers, but we hope this Viewpoint continues to provoke debate on this important topic.
Why bother with Agile at all? We often encounter two myths regarding the “traditional” approach to high-integrity software development: that we somehow manage to perform a single-iteration waterfall style process, and that “formal” notations are not amenable to change. Neither myth rings true with our experience. As our projects develop, they must absorb change and respond to defects just like any other project. This led to an observation: your project is going to become iterative whatever you do, so you might as well plan it that way from the beginning. This lesson was put to good effect in the MULTOS CA project,6 which initially planned for seven iterations, but delivered after 13, owing to a barrage of change requests. Nevertheless, it worked, and we were able to keep a substantial formal specification and all the other design documentation up to date as the project evolved. Knowing that “change” and “iteration” were at the heart of the Agile manifesto, we decided to see what we could learn and bring to future projects.


Background and Sources
Many consider Agile as beginning with XP,1 but its roots are much older. Many of XP’s core practices were well established long ago—their combination and rigorous practice was novel. A survey9 notes that both incremental and iterative styles of engineering were used in the 1950s. Redmill’s work on evolutionary delivery12 predicted many of the problems faced by Agile projects. Boehm2 provides some useful insight, while the development of MULTOS CA6 compared Correctness-by-Construction with XP,3 showing that the two were not such strange bedfellows after all.
Lockheed Martin developed the Mission Computers for the C130J by combining semi-formal specification, strong static verification, iterative development, and a strongly Lean mindset.11 Use of Agile has been reported by Thales Avionics,5 while SINTEF have reported success with SafeScrum.13 A recent and plain-speaking evaluation of Agile comes from Meyer,10 although he does not specifically deal with high-integrity issues.

Agile Assumptions and Issues
How do Agile’s practices and assumptions match real high-integrity projects? Here are some of the most obvious clashes. For each issue, we start with a brief recap of the practice in question, then go on to describe the issue or perceived clash, followed by our ideas and experiences in overcoming it. Where possible, we close each section with an example of our experience from the C130J, MULTOS, or iFACTS projects.

Dependence on “Test”
Agile calls for continuous integration, with a regression test suite, and a test-first development style, with each function associated with specific tests. Meyer calls these practices “brilliant” in his summary analysis,10 but Agile assumes that dynamic test is the principal (possibly only) verification activity: the one that says when refactoring is complete, or when the product is good enough to ship.
The safety-critical community hit the limits of testing long ago. Ultra-reliability cannot be claimed from “lots of testing.” Security is even more difficult: corner-case vulnerabilities, such as Heartbleed, defy an arbitrarily large amount of testing and use. In high-integrity development, we use diverse forms of verification, including checklist-driven reviews, automated static verification, traceability analysis, and structural coverage analysis. There is no barrier between these verification techniques and Agile, especially with an automated integration pipeline. We try to use verification techniques that complement, rather than repeat, each other. If possible, we advocate for sound static analyses (tools that find all the bugs of the kinds they check for, not just some of them), since this gives greater assurance and reduces pre-test defect density. With careful consideration of the assumptions that underpin the static analysis,8 we can reduce or entirely remove later testing activities.
The NATS iFACTS system4 augments the software tools available to air-traffic controllers in the U.K. It supplies electronic flight-strip management, trajectory prediction, and medium-term conflict detection for the U.K.’s en-route airspace, giving controllers substantially improved ability to plan ahead and predict potential loss-of-separation in a sector. The developers precede commit, build, and testing activities with static analysis using the SPARK toolset. Overnight, the integration server rebuilds an entire proof of the software, populating a persistent cache, accessible to all developers the next morning. Working on an isolated change, the developers can reproduce the proof of the entire system in about 15 minutes on their desktop machines, or in a matter of seconds for a change to a single module. While Agile projects might have a “don’t break the tests” mantra, on iFACTS it’s “don’t break the proof (or the tests) … ”
Upfront Activities and Architecture. Agile advocates building what is needed now, using refactoring to defer decisions. For this to work, refactoring must be cheap and fast, and must limit rework to source code. Our principal weapon in meeting non-functional requirements is system (not just software) architecture, including redundancy and separation of critical from non-critical. Such things can be prohibitively expensive to refactor late in the day. We need just enough upfront architecture work to argue satisfaction of key properties. We also do a “What If?” exercise to ensure the proposed architecture can accommodate foreseeable changes.
The MULTOS CA project had some extraordinary security requirements, which were met by a carefully considered combination of physical, operational, and computer-based mechanisms. The software design was much simplified as a result of this whole-system view. The physical measures included the provision of a bank vault and enclosing Faraday cage—hardly items that we could have ignored and then “refactored in” later.
User Stories and Non-Functional Requirements. For security and safety, we must ensure our specification covers all possible inputs and states. Agile uses stories to document requirements, but these sample behavior, with no completeness guarantee. The gaps between stories may contain vulnerabilities, bugs, unexpected termination, and undefined behavior. Meyer files user stories under “Bad and Ugly,” and we agree.
For critical systems, we prefer a (semi-)formal specification that offers some hope of completeness. The C130J used Parnas tables to specify critical functions. They seemed to work well—they were simple enough to be understood by system engineers, yet sufficiently formal to be implemented and analyzed for correctness.
Sprint Pipeline. Agile usually requires a single active “Sprint,” delivered immediately to the customer, so only two builds are ever of interest:
• Build N: in operation with the customer; used to report defects.
• Build N+1: the current development sprint.
This assumes the customer is always able to accept delivery of the product and use it immediately. This is not realistic for high-integrity projects. Some customers have their own acceptance process, and regulators may have to assess the system before deployment. These processes can be orders of magnitude slower than a typical Agile tempo.
iFACTS uses a deeper pipeline and multiple iteration rates, with at least four builds in the pipeline:
• Build N: in operation with the customer.
• Build N+1: undergoing customer acceptance. This process is subject to regulatory requirements, and so can take months.
• Build N+2: in development and test.
• Build N+3: undergoing requirements and formal specification.
All four pipeline stages run concurrently with multiple internal iteration rates and delivery standards. The development team can deliver to our test team several times a day. A rapid build can be delivered to the customer (in, say, 24 hours), but comes with limitations on its assurance package and allowed use: it is not intended for operational use, but for feedback from the customer on a new feature. A full build (perhaps once every six months) has a complete assurance package, including a safety case, and is designed for eventual operation. The trick is to make the iteration rates harmonic, both with each other and with the customer’s and regulator’s ability to accept and deploy releases.
Embedded Systems Issues. Agile presumes plentiful availability of fast testing resources to drive the development pipeline. For embedded systems, if the hardware exists, there may be just one or two target rigs that are slow, hostile to automation, and difficult to access. We have seen projects revert to 24-hour-a-day shift-working to allow access to the target hardware.
In mitigation, we first reduce on-target testing with more static verification. Second, if we know that code is completely unambiguous, then we can justify testing on host development machines and reduce the need to repeat the test runs on target. Hardware simulation can give each developer a desktop virtual target or a fast cloud for the deployment pipeline. While virtualization of popular microprocessors is common, high-fidelity simulation of a target’s operating environment remains a significant challenge.
On one embedded project, all development of code, static analysis, and testing is done on developers’ host machines, which are plentiful, fast, and offer a friendly environment. A final re-run of the test cases is performed on the target hardware with the expectation that they pass first time; this run also allows the collection of structural coverage data at the object-code level.

Opportunities
High-integrity practices can complement Agile. We previously mentioned the use of static verification tools. While we have a preference for developer-led, sound analysis, we recognize that some projects might find more benefit in unsound, but easier to adopt, technologies, such as bounded model checking. Computing power is readily available to make these analyses tractable at an Agile tempo.
A second opportunity comes with the realization that, if we can automate analysis, building and testing of code, why not automate the production of other artifacts, such as synthesis of code from formal models, traceability analysis, and all the other documentation that might be required by a particular customer, regulator, or standard? An example of such an “Evidence Engine” is shown in the accompanying figure.
Figure: High-integrity Agile evidence engine.

Commercial Issues
A crucial issue: How can we adopt an Agile approach, and still estimate, bid, win, and deliver projects at a reasonable profit? Our customers’ default approach is often to require a competitive bid at a fixed price, but how can this be possible in an Agile fashion if we are brave enough to admit that we don’t know everything at the start of a project? In most of our projects, the users, procurers, and regulators are distinct groups, all of whom may have wildly different views of what “Agile” means anyway.
We have had good experience with a two-phase approach to contracting, akin to the “architect/builder” model for building a house. Phase 1 consists of the “Upfront” work—requirements, architectural design, and construction of a skeletal satisfaction argument. The “just enough” termination criteria for phase 1 are:
• Convincing evidence that the architecture will work, meet non-functional requirements, and can accommodate foreseeable change.
• An estimate of the size (and therefore cost) of the remaining work, given the currently understood scope.
• Established ground rules for agreeing the scope, size, and additional cost of change requests, and commitment to the tempo of decision making for triage of changes and defects.
Phase 2 (possibly a competitive bid) could be planned as an iterative development, using the ideas set out here. The MULTOS CA was delivered in this fashion, with phase 1 on a time-and-materials basis, and phase 2 on a capped-price risk-sharing contract.

High-Integrity Deployment Pipeline
We have used the ideas described in this Viewpoint at Altran, but we have yet to deploy all of them at once. An idealized Agile development process would use:
• Principled requirements engineering,7 concentrating initially on non-functional requirements and development of architecture, specification, and associated satisfaction arguments.
• A rolling formal specification, with just enough formality to estimate the remaining work and to open development iterations.
• An evidence engine, combining static verification, continuous regression testing, automated generation of documents and assurance evidence, and a cloud of virtualized target platforms for integration and deployment testing.
• A planned, iterative development style, starting with a partial order over system infrastructure and features that exposes potential for parallel development. Early iterations are planned in detail, while the plans for later iterations are left open to accommodate change.

Conclusion
Returning to the questions posed at the beginning of this Viewpoint, we could summarize our findings as follows:
• No clash: continuous integration, verification-driven development style, continuous regression testing, and an explicitly planned iterative approach.
• Potential clash or inappropriate: overdependence on test as the sole verification activity, minimizing upfront activities in the face of non-functional requirements, the incompleteness of user stories (especially for secure systems), the need to align sprints and iteration rates with customers’ and regulators’ ability to accept deliveries, and the (non-)availability of test hardware for embedded systems.
• Agile assumptions: customer decision-making power and tempo, availability of plentiful test hardware, and commercial and contractual models needed to “procure Agile.”
• Opportunities: Adoption of formal languages, automated synthesis, and static verification as part of the deployment pipeline. Generalization of continuous integration into an “Evidence Engine.”
We are deploying these ideas on further projects, and look forward to being able to report the results. We hope others will do the same.

References
1. Beck, K. Extreme Programming Explained: Embrace Change. Addison-Wesley, 1999.
2. Boehm, B. and Turner, R. Balancing Agility and Discipline: A Guide for the Perplexed. Addison-Wesley, 2003.
3. Chapman, R. and Amey, P. Static verification and extreme programming. In Proceedings of the ACM SIGAda Conference (2003).
4. Chapman, R. and Schanda, F. Are we there yet? 20 years of industrial theorem proving with SPARK. In Proceedings of Interactive Theorem Proving 2014. Springer LNCS 8558 (2014), 17–26.
5. Chenu, E. Agility and Lean for Avionics. Thales Avionics, 2009; http://www.open-do.org/2009/05/07/avionics-agility-and-lean/
6. Hall, A. and Chapman, R. Correctness by construction: Building a commercial secure system. IEEE Software 19, 1 (2002), 18–25.
7. Jackson, M. Problem Frames. Pearson, 2000.
8. Kanig, J. et al. Explicit assumptions—A prenup for marrying static and dynamic program verification. In Proceedings of Tests and Proofs 2014. Springer LNCS 8570 (2014), 142–157; DOI: 10.1007/978-3-319-09099-3_11
9. Larman, C. and Basili, V. Iterative and incremental development: A brief history. IEEE Computer, 2003.
10. Meyer, B. Agile! The Good, the Hype, and the Ugly. Springer, 2014.
11. Middleton, P. and Sutton, J. Lean Software Strategies. Productivity Press, 2005.
12. Redmill, F. Software Projects: Evolutionary vs. Big-Bang Delivery. Wiley, 1997; http://www.safetycritical.info/library/NFR/
13. SINTEF. SafeScrum website, 2015; http://www.sintef.no/safescrum

Roderick Chapman ([email protected]) is an independent consultant software engineer, and an honorary visiting professor at the University of York, U.K.
Neil White ([email protected]) is Director of the Intelligent Systems Expertise Centre of Altran U.K.
Jim Woodcock ([email protected]) is Professor of Software Engineering in the Department of Computer Science at the University of York, U.K.
Thanks to Felix Redmill, Jon Davies, Mike Parsons, Harold Thimbleby, and Communications’ reviewers for their comments on earlier versions of this Viewpoint.
Copyright held by authors.
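The Viewpoint contrasts user stories, which only sample behavior, with the Parnas tables used on the C130J, which offer some hope of completeness. The sketch below is only an illustration of that idea under our own assumptions: a tabular specification is written as a list of (condition, result) rows, and over a small, finite input domain a brute-force enumeration checks that every input matches exactly one row, so the table has no gaps and no overlapping rows. The `alarm`-style table, its thresholds, and the helper names `check_table` and `evaluate` are hypothetical and are not taken from the C130J; a real project would more likely discharge these properties symbolically with analysis tools than by enumeration.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

# One row of a (hypothetical) tabular specification: a condition over the
# inputs and the result the function must return when the condition holds.
Row = Tuple[Callable[[int, bool], bool], str]

# Hypothetical table for an alarm function of (temperature, sensor_ok).
ALARM_TABLE: List[Row] = [
    (lambda t, ok: not ok, "FAULT"),
    (lambda t, ok: ok and t < 0, "LOW"),
    (lambda t, ok: ok and 0 <= t <= 80, "NORMAL"),
    (lambda t, ok: ok and t > 80, "HIGH"),
]

# Small, finite input domain: every whole-degree temperature in range,
# with the sensor either failed or healthy.
DOMAIN = list(product(range(-40, 151), (False, True)))


def check_table(table: List[Row], domain: Iterable[Tuple[int, bool]]):
    """Return (gaps, overlaps): inputs matched by no row or by several rows."""
    gaps, overlaps = [], []
    for inputs in domain:
        matches = [result for cond, result in table if cond(*inputs)]
        if not matches:
            gaps.append(inputs)
        elif len(matches) > 1:
            overlaps.append(inputs)
    return gaps, overlaps


def evaluate(table: List[Row], *inputs) -> str:
    """Implement the function straight from a complete, disjoint table:
    exactly one row applies (otherwise this raises ValueError)."""
    (result,) = [result for cond, result in table if cond(*inputs)]
    return result


if __name__ == "__main__":
    print(check_table(ALARM_TABLE, DOMAIN))  # ([], []): complete and disjoint
    print(evaluate(ALARM_TABLE, 95, True))   # HIGH
```

Changing the HIGH row’s condition to `t > 90`, for instance, makes `check_table` report the ten uncovered inputs (81, True) through (90, True) as gaps: exactly the kind of hole that can hide between user stories.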
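The idealized process in the Viewpoint starts from a partial order over system infrastructure and features that exposes potential for parallel development. One way to see what such an order buys is sketched below, under our own assumptions: dependencies are recorded as a mapping from each item to the items it needs first, and Python’s standard `graphlib.TopologicalSorter` (Python 3.9 or later) peels off successive “waves” of items whose prerequisites are all done, so every item within one wave could be developed in parallel. The item names and the helper `plan_waves` are hypothetical and are not taken from iFACTS or any other project described in the Viewpoint.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical partial order: each item maps to the items it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "message_bus": {"build_infra"},
    "data_model": {"build_infra"},
    "trajectory_prediction": {"data_model"},
    "display": {"message_bus", "data_model"},
    "conflict_detection": {"trajectory_prediction", "message_bus"},
}


def plan_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group items into waves. Items in one wave do not depend on each other,
    so they can be built in parallel once all earlier waves are done.
    Raises graphlib.CycleError if the dependencies are not a partial order."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(plan_waves(DEPENDS_ON), start=1):
        print(number, wave)
```

Here the first wave is the infrastructure alone, the data model and message bus can proceed side by side in the second, and so on. Early waves are the ones worth planning in detail; later waves stay open to change, as the Viewpoint recommends.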

practice
DOI:10.1145/ 3106625
˲˲ Your claims are indefensible.

Article development led by ˲˲ He attacked every weak point in my
queue.acm.org
argument.
˲˲ I demolished his argument.
Code is a story that explains ˲˲ I never won an argument with him.

how to solve a particular problem. These sentences may seem innoc-


uous, but the problem is how we act
BY ALVARO VIDELA and feel based on them. We end up
seeing the person we are arguing with
as our opponent, someone who is at-

Metaphors
tacking our positions, so we structure
our arguments as if we were at war
with the other person. This means

We Compute By
the metaphor is not just language
flourish; we live it. Lakoff and John-
son propose the exercise of imagin-
ing a culture where arguments are
not viewed in terms of war—of win-
ners and losers—but in which lan-
guage is a dance, where you have to
cooperate with a partner in order to
achieve a desired goal, reaching con-
clusions as a team.
The book goes on to analyze the
different aspects of language and
metaphors and how they affect our
concepts and view of the world. The
authors give many examples to defend
the thesis that our understanding of
I N T H EIR NOW classic book Metaphors We Live By,6 the world is based on metaphors and
that those metaphors are the founda-
George Lakoff and Mark Johnson set out to show the tion of behavior.
linguistic and philosophical worlds that metaphor One of the book’s biggest takeaways
is that metaphors enable certain ways
isn’t just a matter of poetry and rhetorical flourish. of thinking, while restricting others,
They presented how metaphor permeates all areas of as the argument-as-war example illus-
our lives, and in particular that metaphor dictates how trates. This article applies this idea to
computer science. How do metaphors
we understand the world, how we act in it, how we live shape the way we understand comput-
in it. They showed that our conceptual system is based ing and its related problems? What
kinds of problems are enabled by the
on metaphors, too, but since we are not normally metaphors in use? And, no, monads are
aware of our own conceptual system, they had to study not like burritos!8
it via a proxy: language. First, the article looks at how meta-
phors help us understand the relative-
By studying language, Lakoff and Johnson tried ly young world of computers and how
to understand how metaphors work by imposing they affect the way we structure code
or design algorithms and data struc-
meaning in our lives. The basic example they present tures. We even solve problems based
is the conceptual metaphor “argument is war.” We on which metaphors are part of our
understand the act of arguing with another person arsenal, or toolbox. “Sometimes our
tools do what we tell them to. Other
in the same way we understand war. This leads to the times, we adapt ourselves to our tools’
following expressions in our daily language: requirements,” states author Nicholas

42 COMM UNICATIO NS O F THE ACM | O C TO BER 201 7 | VO L . 60 | NO. 1 0


Carr in his book The Shallows.3 Meta- the answer. Now came these new ma- tried to write at a distance, using one-
phors are the tools of comprehension. chines that did the work automatically; to-one mapping of letters of the alpha-
they called them electronic computers, bet into wires, which sounds terribly
A Metaphorical Understanding eventually dropping the electronic part impractical. Around that time, thanks
People understand new concepts by re- of the name. So, our very own disci- to Louis Braille, people began to under-
lating them to what they already know. pline was named after a metaphor. stand that language could be coded in
Back in the late 1940s and early 1950s But metaphors also obscure possi- a form different from the way it sounds
when today’s computers came to life, bilities if you do not understand their (or is written). Morse code was the next
no word for such an invention existed, limitations. A common problem with step in improving the telegraph and in
IMAGE BY AND RIJ BORYS ASSOCIAT ES/SHUT TERSTOCK

but people understood them as auto- new metaphors is the original mean- understanding that people don’t have
matic brains. Actually, the word com- ing of the word is used at face value. to “write at a distance” to have actual
puter existed at that time, but it was The word being used to explain a new long-distance communications.
used to refer to the person who did cal- concept may actually limit understand- Thanks to mathematician Claude
culations for engineers. Think of engi- ing of that very concept. In his book Shannon and others like him, we
neers needing to know the trajectory The Information,5 James Gleick gives a managed to escape from the problems
of a projectile or how the wind would fascinating account of the invention of the telegraph metaphor and build
affect an airplane’s wing shape; they of the telegraph. The word tele-graph the whole discipline of information
would throw a couple of formulas and means far-writing. Lo and behold, early theory beginning in the late 1940s.
numbers to their human computers to get telegraphs were strange machines that Shannon’s seminal book, The Math-

O C TO B E R 2 0 1 7 | VO L. 6 0 | N O. 1 0 | C OM M U N IC AT ION S OF T HE ACM 43
practice

ematical Theory of Communication, put forth the basic elements of communication: information must be encoded first, then sent into a channel to be decoded at the other end, so the destination receives it.7

Metaphors and Code
A well-known unattributed quote (often misattributed to Charles Baker) is, "To program is to write to another programmer about our solution to a problem."2 A program is an explanation of how a problem might be solved; it's a metaphor that stands in place of a person's understanding. For metaphors to be effective, however, they need to convey meaning using concepts already known to us. The explanatory power of a particular program can be used as a measure of its own elegance.

Consider the following example. Say you could program a computer to command other computers to perform tasks, respecting their arrival order. This description is already difficult to understand. On the other hand, you could describe the solution as a queue server that assigns jobs to workers following a first-come, first-served queue discipline.

A queue is a familiar concept from daily life, seen at the supermarket, the bank, airports, and train stations. People know how queues work, so for someone reading your code, it might be easier to talk about queues, workers, and jobs than to explain this setting without the queue metaphor.

By choosing the right metaphor, your program reaches a level of abstraction that requires the least amount of effort for someone foreign to the problem to understand the solution. Also, solving the problem with queues provides a whole mathematical theory for free. Mathematics is itself a field where problems are tackled only when an appropriately expressive language is available to approach them. With the queue metaphor, you are no longer punching blindly in the dark. Now you can analyze and understand the problem with all the tools provided by queueing theory.

Data Structures as Metaphors
Analyzing data structures helps us see which one would be a better fit for the performance characteristics of a particular problem, but we often forget that a data structure also has explanatory power.

The choice of data structures helps convey meaning. You could store a bunch of elements in an array, but what if you need to ensure the elements are unique? Isn't that what sets are for? The underlying representation of your set can still be backed by a plain array, but now your program expresses its intentions more clearly. Whenever other programmers read it, they will understand that the elements in your collection must be unique. It is important to realize that a program is just another succession of bits that a computer needs to process. It is the programmer who gives meaning to those bits, so you have to use the right metaphor on top of them to make your program clearer. As has been said, "no one has seen a program which the machine could not comprehend but which humans did."2

You must strive to make your program as easy as possible for other programmers to understand. Ease of comprehension should be the standard by which programs are measured. Keep in mind that you can arrange code in many different ways to solve a computing problem, but not all those arrangements will favor human communication and understanding. You must ask yourself: By reading my code, will others understand how I solved this particular problem?

Just as metaphors enable certain ways of understanding and limit others, so do data structures. Earlier we saw the problem with using direct metaphors such as the original telegraph. When it comes to data structures, you can see that a set will reveal that its elements are unique, and it will allow you to test if an element is a member of the set. With a linked list, however, you get the idea of traversing its elements one after the other, without being able to skip them. With an array you get the idea that you can address its elements by index. The same can be seen with queues and stacks, two of the most fundamental data structures taught in any algorithms course. Each can be implemented using an array; the difference is that one returns the elements in FIFO (first in, first out) order, while the other does so in LIFO (last in, first out) order.

Even if this looks like an everyday thing for most programmers, the moment you choose to use a stack instead of a queue, you are deciding how to explain your program. The stack is a very good metaphor for the collection of items that a program works with, because it tells a future reader of the program in which order to expect the items to be processed. You don't even need to read how the stack is implemented, since you can assume you will get the items in LIFO order. This is why types are so important in computer science: not types as in static type checking of programs, but types as the concepts used to describe programs: persons, users, stacks, trees, nodes, you name it. Types are the characters that tell the story of your program; without types, you just have operations on streams of bytes.

Cognitive Leaps
The goal is to find the right metaphor that describes and explains a problem. As explained earlier with the queueing theory example, a cognitive leap was needed to go from tasks that have to be processed in a certain order to understanding that this is a queueing problem. Once you manage to make the cognitive leap, all the mathematical tools from queueing theory are at your disposal. Graph theory is filled with examples of mundane tasks that, once converted to a graph problem, have well-known solutions. Whenever you ask Google Maps to get you to your destination, Google translates your problem to a graph representation and suggests one or more paths in the graph. Graphs are the right metaphor, understood by mathematicians and computers alike. Are there other instances of problems that seem difficult but that can be solved by finding the right metaphor? The distributed-systems literature has a very interesting one.

In the late 1980s Alan Demers and his colleagues from Xerox tried to find a solution to database replication in unreliable networks. They classified their algorithms as "randomized," explaining them by using the rumor-mongering metaphor: a computer would tell two other computers about an update, then each in turn would tell two other computers about the update, and so on until the information was replicated across the system. This metaphor gave way to a new area of study called gossip algorithms. The gossip metaphor makes the idea easy to explain, but the Xerox team still lacked the mathematical tools that would help analyze the effectiveness of the algorithms.

During their research, they discovered another metaphor related to their problem: epidemics. They understood that their algorithms replicated data the same way in which an epidemic disseminates across a population. By using this new metaphor, they got immediate access to all the knowledge in The Mathematical Theory of Epidemics,1 which fit their work like a glove. Not only did they name their paper "Epidemic Algorithms for Replicated Database Maintenance,"4 they also took the nomenclature of that discipline to explain their algorithms. It was a matter of finding the right metaphor to get access to a new world of explanatory power.

Metaphors Everywhere
Do we really use that many metaphors in programming? Let's take a look at an example from the distributed-systems literature (metaphors are in italics):

Whenever nodes need to agree on a common value, a consensus algorithm is used to decide on a value. There's usually a leader process that takes care of making the final decision based on the votes it has received from its peers. Nodes communicate by sending messages over a channel, which might become congested because of too much traffic. This could create an information bottleneck, with queues at each end of the channels backing up. These bottlenecks might render one or more nodes unresponsive, causing network partitions. Is the process that's taking too long to respond dead? Why didn't it acknowledge the heartbeat and trigger a timeout? This could go on, but you get the point.

A Story in Code
Programmers must be able to tell a story with their code, explaining how they solved a particular problem. Like writers, programmers must know their metaphors. Many metaphors will be able to explain a concept, but you must have enough skill to choose the right one that's able to convey your ideas to future programmers who will read the code.

Thus, you cannot use every metaphor you know. You must master the art of metaphor selection, of meaning amplification. You must know when to add and when to subtract. You will learn to revise and rewrite code as a writer does. Once there is nothing else to add or remove, you have finished your work. The problem you started with is now the solution. Is that the meaning you intended to convey in the first place?

Acknowledgments
Thanks to Jordan West and Carlos Baquero for our discussions about how metaphors permeate computing, and for their feedback on this article.

Related articles on queue.acm.org

First, Do No Harm: A Hippocratic Oath for Software Developers
Phillip A. Laplante
http://queue.acm.org/detail.cfm?id=1016991

Coding for the Code
Friedrich Steimann and Thomas Kühne
http://queue.acm.org/detail.cfm?id=1113336

A Nice Piece of Code
George V. Neville-Neil
http://queue.acm.org/detail.cfm?id=2246038

References
1. Bailey, N.T.J. The Mathematical Theory of Epidemics. C. Griffin and Co., 1957.
2. Baker, C. (Ed.). What a programmer does. Datamation (Apr. 1967); http://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-9-pdf/k-9-u2769-1-Baker-What-Programmer-Does.pdf.
3. Carr, N. The Shallows. W.W. Norton, 2011.
4. Demers, A., Greene, D., Hauser, C., Irish, W. and Larson, J. Epidemic algorithms for replicated database maintenance. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (1987), 1–12.
5. Gleick, J. The Information: A History, a Theory, a Flood. Pantheon, 2011.
6. Lakoff, G. and Johnson, M. Metaphors We Live By. University of Chicago Press, 1980.
7. Shannon, C.E. and Weaver, W. The Mathematical Theory of Communication. University of Illinois Press, 1949.
8. Yorgey, B. Abstraction, intuition, and the 'monad tutorial fallacy.' https://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/

Alvaro Videla (alvaro-videla.com; @old_sound) works as Lead Architect for a major Swiss company. Previously, he was a senior software engineer at Apple and a core developer of RabbitMQ. He is the author of RabbitMQ in Action.

Copyright held by owner/author. Publication rights licensed to ACM. $15.00.

DOI:10.1145/3080188

Article development led by queue.acm.org

Expert-curated guides to the best of CS research.

Research for Practice: Technology for Underserved Communities; Personal Fabrication

THIS INSTALLMENT OF Research for Practice provides curated reading guides to technology for underserved communities and to new developments in personal fabrication. First, Tawanna Dillahunt describes design considerations and technology for underserved and impoverished communities. Designing for the more than 1.6 billion impoverished individuals worldwide requires special consideration of community needs, constraints, and context. Her selections span protocols for poor-quality communication networks, community-driven content generation, and resource and public service discovery. Second, Stefanie Mueller and Patrick Baudisch provide an overview of recent advances in personal fabrication (for example, 3D printers). Their selection covers new techniques for fabricating (and emulating) complex materials (for example, by manipulating the internal structure of an object), for more easily specifying object shape and behavior, and for human-in-the-loop rapid prototyping.

Combined, these two guides provide a fascinating deep dive into some of the latest human-centric computer science research results.

As always, our goal in this article is to allow our readers to become experts in the latest topics in computer science research in a weekend afternoon's worth of reading. To facilitate this process, we have provided open access to the ACM Digital Library for the relevant citations from these selections so you can enjoy these research results in full. Please enjoy!
—Peter Bailis

Peter Bailis is an assistant professor of computer science at Stanford University. His research in the Future Data Systems group (futuredata.stanford.edu) focuses on the design and implementation of next-generation data-intensive systems.

Technology for Underserved Communities
By Tawanna Dillahunt

According to the Global Multidimensional Poverty Index 2013, 1.6 billion people (more than 30% of the combined populations of the 104 countries analyzed) were impoverished in terms of health, education, and living conditions.1 Here, these individuals and those facing similar conditions are referred to as underserved.

Designing and building technology to support people in these underserved communities involves several complexities. Overcoming them requires the following:
• Understanding the needs of a specific underserved population and empowering or enabling individuals from that population to produce information and develop their own solutions.
• Understanding the context and constraints that underserved individuals often face, such as limited or nonexistent reading skills, digital literacy, Internet and technology access, and, in many cases, infrastructure (for example, no electricity or water). Individuals from underserved populations often face social barriers such as limited social networks, social isolation, and systemic issues that exist beyond our control such as social or income inequality. In addition, these issues and limitations vary from region to region, and platforms that appear successful in one situation may not apply in another.
• Developing technologies that are sustainable or have the ability to maintain themselves in limited-resource environments.
• Considering that technology alone may not meet the needs of certain underserved communities, given the unique factors these communities face. The benefits of technology are often limited to populations that are relatively affluent, educated, and socially connected.

The recently published papers presented here are broadly centered on technology and aim to support the needs of underserved communities. The selection represents a diversity of ACM venues, including the Symposium on Computing for Development, the Conference on Human Factors in Computing Systems, and the Conference on Designing Interactive Systems. The selection also represents a diversity of regions (including research from the Global South and Global North) and a diversity of technological artifacts, including one designed to optimize mobile application communication for challenged network environments, as well as both long- and short-term application deployments. The selection also includes efforts and approaches to reach the needs of traditionally underserved groups (for example, low-resource, blind, and unemployed people).

Finally, the selected works exemplify technological solutions that: (1) address issues resulting from poor infrastructures such as unstable network connectivity and electricity (Brunette et al.); (2) enable individuals to produce information, not simply consume it, as suggested by Dell and Kumar2 (Vashistha et al.); and (3) support the connection among individuals within a community, or the connection between low-resource and high-resource communities, for information and resource sharing, as suggested by Heeks and Bhatnagar3 (Vashistha et al. and Dillahunt et al.).

Technology Designed to Address Poor Network Environments
Brunette, W., Vigil, M., Pervaiz, F., Levari, S., Borriello, G. and Anderson, R.
Optimizing mobile application communication for challenged network environments. In Proceedings of the 6th Annual ACM Symposium on Computing for Development, 2015, 167–175; http://dl.acm.org/citation.cfm?id=2830644

This paper describes a configurable data-transmission framework designed to minimize network connectivity issues for mobile applications in challenged network contexts. For example, the Open Data Kit (ODK) is a free and open source set of tools that helps organizations in low-resource areas write, field, and manage mobile data collection solutions. ODK is used in mobile-application deployments to collect health and socioeconomic surveys with global positioning system locations and images. Quickly notifying others of important health issues such as an Ebola or polio outbreak in such contexts requires reliable data transfer despite challenging network environments.

The authors of this paper contribute ODK Submit, an Android service that coordinates data communication by providing channel-monitoring and transmission-scheduling mechanisms to Android apps. ODK Submit improves deployments by enabling application-level optimization of sparse heterogeneous networks: it sends appropriate data over available channels and identifies available connectivity in challenging network environments. It also enables deployment architects (that is, nondevelopers who customize off-the-shelf software packages) to customize mobile-application software to improve data transfer reliability in such environments. This paper contributes a framework for optimizing mobile-application performance for limited-resource settings.

Technology Supporting the Connection Among Individuals within Communities by Enabling the Production of Information
Vashistha, A., Cutrell, E., Borriello, G. and Thies, W.
Sangeet Swara: A community-moderated voice forum in rural India. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, 417–426; http://dl.acm.org/citation.cfm?id=2702191

Sangeet Swara is a Hindi-language social media voice forum for songs and cultural content, built using an IVR (interactive voice response) system. Vashistha et al. wanted to understand how such a system might become financially sustainable. To that end, the authors worked to eliminate the cost of a moderator by having the community moderate the content.

Those calling into Sangeet Swara recorded voice messages, listened to recorded messages, voted on them, and shared them without access to the Internet. Callers helped curate the content by voting on posts in the voice forum, which prioritized playback for other callers and at the same time eliminated the need for a moderator.

The system was successful, with significant adoption by low-income communities as well as by low-income individuals who were visually impaired; it received more than 25,000 calls and 5,000 voice messages from more than 1,500 people in an 11-week deployment. It did not turn out to be financially sustainable after the toll-free number was replaced with a paid number; however, the opportunity remains to address this issue

going forward.

In summary, the authors contributed Sangeet Swara, a system that empowered low-income community members to produce information and build social capital, which is vital for connecting and strengthening communities. The community moderated the forums by categorizing and rating posts, thus eliminating the need for a single moderator: a step toward creating a self-sustaining system.

Technology Connecting Low-Resource Communities to High-Resource Communities for Information Sharing
Dillahunt, T.R., Bose, N., Diwan, S., and Chen-Phang, A.
Designing for disadvantaged job seekers: Insights from early investigations. In Proceedings of the ACM Conference on Designing Interactive Systems, 2016, 905–910; http://dl.acm.org/citation.cfm?id=2901865

This article focuses on unemployed job seekers from low-income communities in the U.S. The authors conducted a user-centered design process to create and deploy Review-Me, a Web-based application that sourced résumé feedback for unemployed individuals from employed individuals in their local area. While the deployment successfully connected job seekers who were currently students, it uncovered limitations and constraints among underserved job seekers. For example, underserved job seekers did not always have access to digital résumés or understand how to recreate physical résumés in digital form. Job seekers rarely had access to an email address, and those with email access often forgot their passwords. Finally, low literacy levels made it very difficult for these job seekers to sign up to use the application.

The results of this work suggested three fundamental design principles to address these shortcomings: compatibility (for example, systems should accept photographed images of résumés); practicality (for example, systems should provide ways to submit résumés offline, such as through kiosks or offline networking devices); and familiarity (for example, systems should allow registration via SMS or familiar mobile accounts such as Instagram or Facebook).

The research findings raise concerns, because employers increasingly require job seekers to apply online; government services, such as unemployment benefits, must be applied for online (in some cases, this can be done via phone); and other services, such as health care and even housing, are researched and obtained online. Connecting underserved job seekers with employed individuals is commendable. Without making these technologies more inclusive, however, many individuals will continue to be left behind.

Conclusion
For a computer scientist who cares about designing, implementing, and deploying inclusive tools, it is vital to build software and software techniques that aim to:
• Improve poor infrastructures such as unstable network connectivity and electricity, or provide solutions that do not require these infrastructures to be stable.
• Empower individuals to produce information, not only to consume it.2
• Connect individuals to others with more resources and help support development within and across communities.

When creating software, computer scientists and practitioners should consider issues that persist in limited-resource areas, such as unstable access to electricity and water and overall poor infrastructure. Designing for other constraints, such as limited reading and digital literacy and limited Internet access, often benefits populations beyond underserved groups. Empowering individuals to produce information and strengthen bonds within and across communities is critical to reducing systemic issues that exist beyond our control, such as social or income inequality.

References
1. Alkire, S., Roche, J.M. and Seth, S. Multidimensional Poverty Index 2013. Oxford Poverty and Human Development Initiative; http://www.ophi.org.uk/wp-content/uploads/Global-Multidimensional-Poverty-Index-2013-8-pager.pdf?0a8fd7.
2. Dell, N. and Kumar, N. The ins and outs of HCI for development. In Proceedings of the CHI Conference on Human Factors in Computing Systems, 2016, 2220–2232.
3. Heeks, R. and Bhatnagar, S.C. Understanding success and failure in Information Age reform. In Reinventing Government in the Information Age: International Practice in IT-enabled Public-sector Reform. R. Heeks (Ed.). Routledge, London, 1999, 49–74.

Tawanna Dillahunt is an assistant professor at the University of Michigan's School of Information and holds a courtesy appointment with the electrical engineering and computer science department. She leads the Social Innovations Group (www.socialinnovations.us), an interdisciplinary group specializing in the R&D of ubiquitous and social computing technologies.

Personal Fabrication
By Stefanie Mueller and Patrick Baudisch

Personal fabrication tools such as 3D printers are paving the way to a future in which nontechnical users will be able to create custom objects. With the recent decrease in price for 3D-printing hardware, these tools are about to enter the mass market. The cost of the average consumer 3D printer has dropped from about $14,000 in 2007 to $500 today. Given the decreasing price, it is not surprising that the number of consumer 3D printers sold has doubled every year, from 6,000 in 2010 to more than 270,000 in 2015.

While the hardware is now more affordable and the number of people who own a 3D printer is increasing, only a few people use the printers to create their own 3D models. Most users download models from a platform such as Thingiverse and then fabricate them on their 3D printers. At most, users adjust a few parameters of the model, such as changing its color or browsing among predetermined shape options.

Personal fabrication has the potential for more: instead of only consuming existing content, nontechnical users could in the future use 3D printers to create objects that only trained experts can create today. In the past few years, human-computer interaction (HCI) and graphics researchers have worked on a number of challenges to move us toward such a future.

In our paper, "Personal Fabrication" (Foundations and Trends in Human–Computer Interaction 10, 3–4, 2017), we provide a comprehensive overview of these challenges and include a survey of more than 200 recent papers. Here, we summarize three of those papers, which are representative of the larger field.

Multimaterial Fabrication
Vidimce, K., Kaspar, A., Wang, Y., Matusik, W.
Foundry: Hierarchical material design for multi-material fabrication. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, 2016, 563–574; http://dl.acm.org/citation.cfm?id=2984516.

With 3D printing, users can design every aspect of an object: they can create a specific appearance (for example, a desired shape, color, and reflectance), a specific feel (for example, by printing tactile textures or by using soft materials), and they can make an object perform a desired function (for example, using conductive materials for printed electronics or optically clear materials for printing light pipes).

Functional properties can also be achieved by designing the internal structure of an object, for example, making an object stand by redistributing its infill to shift its center of mass. Finally, by using microstructures with structurally varying cells, researchers have shown how to emulate different material behaviors using a single material (so-called metamaterials).

This paper provides an editing environment for designing such multimaterial composite objects. Material definitions are created by composing a set of operators into an operator graph. The operators are implemented using a domain-specific language for multimaterial fabrication, and users can easily extend the library by writing their own operators.

Domain Knowledge
Umetani, N., Koyama, Y., Schmidt, R. and Igarashi, T.
Pteromys: Interactive design and optimization of free-formed free-flight model airplanes. ACM Transactions on Graphics 33, 4 (2014), Article 65; http://dl.acm.org/citation.cfm?id=2601129.

Professional CAD tools require years of engineering training to gain the necessary expertise, as they provide fine control over every parameter in the design. HCI and graphics researchers have looked at how to create design tools that abstract away the necessary domain knowledge by letting users specify the shape and motion of the desired object; the system then simulates the mechanical behavior and either critiques the user's design or automatically adjusts it so that it performs as intended under the forces acting on it.

Consider even the simple example of designing a paper glider, as presented in this paper. Depending on its orientation, the glider will be subject to drag forces that make it resist the airflow and lift forces that move it upward. All these forces depend not only on the shape of the glider, but also on its velocity and orientation at a given moment; thus, they are constantly changing as the glider moves through the air. This creates a large parameter space that is infeasible for the user to explore manually. The design tool described in this paper lets users design the shape of the glider and, as they design, provides real-time feedback on its flight performance.

From Batch-Processing, To Turn-Taking, To Direct Manipulation For Physical 'Data'
Willis, K.D.D., Xu, C., Wu, K.-J., Levin, G., Gross, M.D.
Interactive fabrication: New interfaces for digital fabrication. In Proceedings of the 5th International Conference on Tangible, Embedded, and Embodied Interaction, 2010, 69–72; http://dl.acm.org/citation.cfm?id=1935716.

When simulation is not possible, such as when testing the aesthetics and ergonomics of a design, users have to fabricate the object to evaluate it. Since 3D printing is so slow, users have to think carefully before printing, as every mistake may mean another hour-long or even overnight print.

Planning every step, however, is not feasible for nontechnical users, as they lack the experience to reason about the consequences of their design decisions. To solve the problem, researchers proposed that personal fabrication repeat the evolution the user interface went through in personal computing. Computing also started with machines that ran a program in one go overnight. Then turn-taking systems such as the command line evolved, which provided users with feedback after every input; finally, direct-manipulation interfaces (such as today's multitouch devices) provided users with continuous feedback during editing. Applying the same interaction concept to fabrication results in a workflow that looks quite different from what is seen today.

In the new "interactive fabrication" workflow presented in this paper, users work hands-on on the physical workpiece using physical tools (much like in crafting), and see the physical workpiece change immediately as they edit. The fast feedback loop allows users to evaluate every intermediate step and adjust their decisions along the way. This potentially makes editing physical data as easy as editing digital data on a multitouch device today.

Conclusion
When will personal fabrication reach consumers? The journey has only just begun. If the starting point is deemed to be 2009 (that is, when the first patents ran out and the first low-cost 3D printer, the MakerBot Cupcake CNC, appeared on the market), then we are clearly still at the very beginning of putting personal fabrication into the hands of consumers.

If personal fabrication today feels like a niche technology for hobbyists, it is most likely because there are still decades to go. We should view the progress in between with patience. The success of other technologies such as personal computing could certainly not have been predicted until decades after their inception (consider the words of Thomas Watson, president of IBM, in 1943: "I think there is a world market for maybe five computers").

If personal fabrication turns out to be anything like personal computing in its adoption, we still have an amazing journey ahead of us.

Stefanie Mueller is an assistant professor in MIT's electrical engineering and computer science department, joint with mechanical engineering, and is a member of MIT CSAIL. She develops novel hardware and software systems that advance personal fabrication technologies.

Patrick Baudisch is a professor of computer science at Hasso Plattner Institute at Potsdam University and chair of the Human Computer Interaction Lab. Previously, he worked as a research scientist in the Adaptive Systems and Interaction Research Group at Microsoft Research and at Xerox PARC.

Copyright held by owners/authors. Publication rights licensed to ACM.

O C TO B E R 2 0 1 7 | VO L. 6 0 | N O. 1 0 | C OM M U N IC AT ION S OF T HE ACM 49
practice
DOI:10.1145/3106633
Article development led by queue.acm.org

Four Ways to Make CS and IT More Immersive

Why the bell curve hasn’t transformed into a hockey stick.

BY THOMAS A. LIMONCELLI

I was lucky. I learned IT in an incredibly immersive way. My first two jobs were in organizations that followed the very best practices for their day. Because it was all I knew, I considered that to be normal. I had no idea how unique those organizations were. I didn’t know at the time that the rest of the industry would not adopt these techniques for a decade or more.

My next career moves brought me in contact with organizations that did not adhere to the same best practices, nor any others. In fact, they were unaware that such best practices existed at all. I considered this to be a bug and went about fixing it, dismayed that anyone would settle for anything else. I was re-creating what I considered “normal.”

The environment I was trying to reproduce, however, was not normal, or more accurately, it was not typical. A typical IT organization is, in comparison, in utter disarray. The quality of IT organizations follows a bell curve: a few percent run like fine-tuned machines, a few percent look like toxic waste dumps on fire, and the vast majority are somewhere in the middle.

Fortunately for me, I won the IT career lottery. Early in my career I saw what the best in class looked like and considered it normal. Later, this high standard made me look like a visionary. The truth is I just didn’t know any other way.

Most IT practitioners are not so fortunate. They are not blessed with the same experience I was afforded, and they literally do not know any better. This, I believe, is why the bell curve has not transformed into a hockey stick, or even into a lopsided blob. This is why we cannot have nice things.

How Did We Get Into This Situation?
Students certainly are not learning best practices in the classroom. In fact, students are more likely to learn the best-of-breed DevOps practices through extracurricular involvement in open source projects than from their university professors.

Most large open source projects use Git for source-code control, use Jenkins for CI/CD (continuous integration/continuous deployment), and have a fully automated testing procedure because it enables them to scale to large numbers of participants with minimal overhead. Smaller open source projects tend to use these tools, too, because they lack resources and using these tools makes managing the project significantly easier.

Yet, how many universities require CS homework to be turned in via Git commit? How many universities have an IT department that is a showcase for best DevOps practices? How many universities have CS departments and IT departments that collaborate to push the boundaries of best practices? I assure you the number is very low. It is no
surprise that the innovations that led to the DevOps transformation did not come from academia. This is a disgrace that should make every CS department bow their heads in shame.

PHOTO BY JESSE THE TRAVELER/FLICKR (CC BY-NC 2.0)

How Can We Turn This Around?
How can we ensure students are exposed to the best of the best practices from the start so that they consider anything else a bug?

How can we make curricula more immersive?

Here are a few small and big things that universities could do.

1. Use DevOps tools from the start. Students should use source-code repositories such as Git, and CI/CD tools such as Jenkins, as they do their CS homework. These processes should be established as the normal way to work. Professors should expect homework assignments to be turned in by linking to a Git commit and a Jenkins output log.

Some instructors will undoubtedly feel it is difficult enough to teach first-year computer science without adding the complexities of Git. Most IDEs, however, make simple check-in/check-out operations a breeze, especially for single-person projects with no branches. By the time projects get more collaborative, the students will be ready for the more advanced Git features.

2. Homework should generate a Web page, not text to the console. I recently spoke to a roomful of third-year CS majors and was shocked to learn that most didn’t know HTML. The curriculum was fairly standard: undergraduate algorithms and such. HTML was something you learned in the art department; the computer science department was for serious students.

I think there is a middle ground between serious computer science theory and accidentally turning into a Web application boot camp.

It isn’t a radical statement to say that most software engineers write code that is somehow part of a Web-based application. At companies like Squarespace and Google, software engineers’ IDEs supply default templates for new programs. Such a template is for a self-contained Web server that directs output to a Web page. Even a simple “Hello World!” program is a Web server that outputs the greeting as a result of an HTTP request and, by default, generates logging information, monitoring metrics, and so on.

Yes, that is a bit much for an introductory student’s “Print your name 10 times” program. But after that, generate a Web page!

3. IT curricula should be immersive. How could formal education better emulate the immersive experience that
I was lucky enough to benefit from?

Most IT curricula are bottom up. Students are taught individual subsystems, followed by higher levels of abstractions. At the end, they learn how it all fits together. Toward the end of their college careers, they learn the best practices that make all of it sustainable. Or, more typically, those sustainability practices are not learned until later, when the new graduate has a job and is assigned to a coworker who explains “how things work in the real world.”

Instead, an IT curriculum should start with a working system that follows all the best practices. Students should see this as the norm. They can dissect the individual subsystems and put them back together, rather than building them from scratch.

The Masters of System Administration curriculum at the University of Oslo includes a multiweek immersive experience called the Uptime Challenge.1 Students are divided into two teams, and each team is given a Web-based application, including multiple Web servers, a load balancer, a database, and so on. The application is a simple social network application called BookFace.

Once the system is running, the instructor enables a system that sends an ever-increasing amount of simulated traffic to the application. Each team’s system is checked for uptime every five minutes. The team receives a certain amount of fake money (points) if the site is up, and a small bonus if the page loads within 0.5 seconds. If the site is down, money is deducted from the team. This simulates a typical website business model: you make money only if the site is up. Faster sites are more appealing and profitable. Customers react to down or slow sites by switching to competitors; thus, those lower-performing sites lose money.

The challenge lasts multiple weeks, during which the students learn to perform common Web-operation tasks such as software upgrades, bug fixing, task automation, performance tuning, and so on. Inspired by Netflix’s Chaos Monkey,2 individual hosts are randomly rebooted to test the resiliency of the overall system.

The Uptime Challenge enables students to understand IT’s value to the organization and to identify the IT processes that impact this value and permit continuous improvement. As a result, students are more motivated and better able to assess their own work. This leads to improved engagement and fosters more practical class discussions. It creates a direct feedback loop between a student’s actions and the value they create. Most importantly, it better prepares students for the real world.

4. Be immersive from the start. IT projects usually involve some kind of legacy system. The most apt analogy is being asked to change the tires on a truck while it is being driven down the highway.

Software engineers spend more time reading other people’s code than writing their own. We evolve existing systems. Green-field or “fresh start” opportunities are rare. Many people I have met have never been in a situation where they designed a new network, application, or infrastructure from scratch. Why can’t education better prepare students for this?

Could something like the Uptime Challenge be introduced even earlier in the educational process?

Perhaps on the first day of class students should be handed not only copies of the syllabus, but also the usernames and passwords to the administrative control panel of a working system. Instruction and labs could be oriented around maintaining this system. Students would have their own wikis to maintain documentation and operational runbooks.

Each student would have their own working system, but I suggest that every few weeks students be randomly reassigned to administer a different system. Seeing how their fellow students had done things differently would be educational. Also, the best way to learn the value of a well-written runbook is to inherit someone else’s badly maintained runbook.

Institutions are developing more immersive educational strategies. In cooperation with industry, Bossier Parish Community College(a) has created an Associate of Applied Science in Systems Administrator degree, which is highly immersive and covers core DevOps principles, including automation technologies (infrastructure as code and software-defined networking), Lean/Kanban ideas, cloud fundamentals, and even DevSecOps.

Conclusion
Education should seek to normalize best practices from the start. Working outside these best practices should be considered a bug. Students should not struggle to learn best practices after graduation, and they should be shocked if potential new employers do not already have these practices in place.

Both IT and CS curricula could be structured to be more immersive, as immersive education more reliably reflects the real world. It prepares students for industry and better informs the research of those who choose that path. Seeing the forest, and then understanding the trees, helps students understand why they are learning something before they learn it. It is more hands-on and therefore more engaging, and lends itself to gamification.

Our first experiences cement what becomes normal for us. Students should start off seeing a well-run system, dissect it, learn its parts, and progressively dig down into the details. Don’t let them see what a badly run system looks like until they have experienced one that is well run. A badly run system should then disgust them.

Related articles on queue.acm.org
Undergraduate Software Engineering
Michael J. Lutz, et al.
http://queue.acm.org/detail.cfm?id=2653382
A Conversation with Alan Kay
http://queue.acm.org/detail.cfm?id=1039523
Evolution of the Product Manager
Ellen Chisa
http://queue.acm.org/detail.cfm?id=2683579

References
1. Begnum, K. and Anderssen, S.S. The Uptime challenge: A learning environment for value-driven operations through gamification. Usenix J. Education in System Administration 2, 1 (2016); https://www.usenix.org/jesa/0201/begnum.
2. Tseitlin, A. The antifragile organization. Commun. ACM 56, 8 (Aug. 2013), 40–44.

Thomas A. Limoncelli is a site reliability engineer at Stack Overflow Inc. in NYC. He blogs at EverythingSysadmin.com and tweets at @YesThatTom.

(a) https://www.bpcc.edu/catalog/current/technologyengineeringmathematics/aas-system-administration.html

Copyright held by owner/author. Publication rights licensed to ACM. $15.00
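Item 2 of the “Four Ways” article above argues that even a first “Hello World!” should be a small Web server that answers an HTTP request and emits logs and metrics. The following is a minimal sketch of such homework, assuming Python and its standard-library `http.server` module. It is not the Squarespace or Google template the article mentions; `render_page`, `HelloHandler`, and the `REQUEST_COUNT` counter are hypothetical names, and the counter only stands in for a real monitoring metric.

```python
"""'Print your name 10 times' as a Web page (illustrative homework sketch)."""
import html
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

# Hypothetical stand-in for a monitoring metric; a real template would export it.
REQUEST_COUNT = 0


def render_page(name: str, times: int = 10) -> str:
    """Return an HTML page that lists `name` `times` times."""
    safe = html.escape(name)  # never put untrusted text into HTML unescaped
    items = "".join(f"<li>{safe}</li>" for _ in range(times))
    return (
        "<!DOCTYPE html><html><head><title>Hello</title></head>"
        f"<body><h1>Hello World!</h1><ol>{items}</ol></body></html>"
    )


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        global REQUEST_COUNT
        REQUEST_COUNT += 1
        body = render_page("Ada Lovelace").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Route the default access-log line through `logging`, with the counter.
        logging.info("%s requests_total=%d", format % args, REQUEST_COUNT)


if __name__ == "__main__":
    # Visit http://127.0.0.1:8000/ in a browser; stop with Ctrl-C.
    HTTPServer(("127.0.0.1", 8000), HelloHandler).serve_forever()
```

Overriding `log_message` keeps the standard access-log format while sending it through `logging`, which is roughly the “logging by default” behavior the article describes.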

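The Uptime Challenge rules described in the article (a check every five minutes, points when the site is up, a small bonus for loading within 0.5 seconds, a deduction when it is down, plus Chaos Monkey-style random reboots) can be sketched as a scoring function. This is an illustration under stated assumptions, not the Oslo course’s actual grader: the point values `UP_REWARD`, `FAST_BONUS`, and `DOWN_PENALTY` are hypothetical, because the article gives no amounts.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical amounts; the article says only "a certain amount of fake money",
# "a small bonus", and that money "is deducted".
UP_REWARD = 10
FAST_BONUS = 2
DOWN_PENALTY = 20
FAST_THRESHOLD_S = 0.5   # bonus if the page loads within 0.5 seconds
CHECK_INTERVAL_MIN = 5   # each team's site is checked every five minutes


@dataclass
class Check:
    """Result of one uptime probe against a team's site."""
    up: bool
    load_time_s: Optional[float] = None  # None when the site is down


def score_check(check: Check) -> int:
    """Fake money earned (positive) or lost (negative) for one probe."""
    if not check.up:
        return -DOWN_PENALTY
    points = UP_REWARD
    if check.load_time_s is not None and check.load_time_s <= FAST_THRESHOLD_S:
        points += FAST_BONUS
    return points


def score_team(checks: list[Check]) -> int:
    """Running balance for a team over a sequence of probes."""
    return sum(score_check(c) for c in checks)


def pick_host_to_reboot(hosts: list[str], rng: random.Random) -> str:
    """Chaos Monkey-style fault injection: choose one host uniformly at random."""
    return rng.choice(hosts)


if __name__ == "__main__":
    checks_per_day = 24 * 60 // CHECK_INTERVAL_MIN  # 288 probes per day
    # A fast site that is down for one hour (12 probes) during the day.
    day = [Check(True, 0.3)] * (checks_per_day - 12) + [Check(False)] * 12
    print(score_team(day))  # 3072 with the hypothetical point values above
    print(pick_host_to_reboot(["web1", "web2", "lb", "db"], random.Random(7)))
```

Making the penalty larger than the reward mirrors the article’s point that down sites lose customers to competitors; the exact ratio is an assumption.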
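Item 4 of the article suggests that every few weeks students be randomly reassigned to administer a different system. One way to guarantee nobody gets their own system back is to draw a random derangement (a permutation with no fixed points). The sketch below assumes each student owns exactly one system, identified by the student’s name; `reassign` is a hypothetical helper, and rejection sampling is just one simple choice of technique.

```python
from __future__ import annotations

import random


def reassign(students: list[str], rng: random.Random) -> dict[str, str]:
    """Map each student to the classmate whose system they administer next.

    Reshuffles until no student is left with their own system, which yields a
    uniformly random derangement; for all but tiny classes this takes about
    e (roughly 2.7) shuffles on average.
    """
    if len(set(students)) != len(students):
        raise ValueError("student names must be unique")
    if len(students) < 2:
        raise ValueError("need at least two students to reassign")
    owners = list(students)
    while True:
        rng.shuffle(owners)
        if all(s != o for s, o in zip(students, owners)):
            return dict(zip(students, owners))


if __name__ == "__main__":
    class_list = ["ana", "bo", "cy", "di", "eve"]
    for student, owner in reassign(class_list, random.Random(2017)).items():
        print(f"{student} now runs {owner}'s system")
```

Rejection sampling keeps the code short and unbiased; Sattolo’s algorithm would avoid retries but only produces single-cycle permutations, a narrower set of reassignments.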


Call for Nominations
The ACM Doctoral Dissertation Competition

Background of the Competition
ACM established the Doctoral Dissertation Award program to recognize and encourage superior research and writing by doctoral candidates in computer science and engineering. These awards are presented annually at the ACM Awards Banquet.

Submissions
Nominations are limited to one per university or college, from any country, unless more than 10 Ph.D.’s are granted in one year, in which case two may be nominated.

Sponsorship
Each nomination shall be forwarded by the thesis advisor and must include the endorsement of the department head. A one-page summary of the significance of the dissertation written by the advisor must accompany the transmittal.

Deadline
Submissions must be received by October 31, 2017 to qualify for consideration.

For Eligibility and Submission Procedure see
http://awards.acm.org/doctoral_dissertation/

Publication Rights
Each nomination must be accompanied by an assignment to ACM by the author of exclusive publication rights. (Copyright reverts to author if not selected for publication.)

Publication
Winning dissertations will be published by ACM in the ACM Books Program and appear in the ACM Digital Library. Honorable mention dissertations will appear in the ACM Digital Library.

Selection Procedure
Dissertations will be reviewed for technical depth and significance of the research contribution, potential impact on theory and practice, and quality of presentation. A committee of individuals serving staggered three-year terms performs an initial screening to generate a short list, followed by an in-depth evaluation to determine the winning dissertation.

The selection committee will select the winning dissertation in early 2018.

Award
The Doctoral Dissertation Award is accompanied by a prize of $20,000 and the Honorable Mention Award is accompanied by a prize of $10,000. Financial sponsorship of the award is provided by Google.
contributed articles
DOI:10.1145/3131873

Barriers to Refactoring

Developers know refactoring improves their software, but many find themselves unable to do so when they want to.

BY EWAN TEMPERO, TONY GORSCHEK, AND LEFTERIS ANGELIS

Refactoring6 is something software developers like to do. They refactor a lot. But do they refactor as much as they would like? Are there barriers that prevent them from doing so? Refactoring is an important tool for improving quality. Many development methodologies rely on refactoring, especially agile methodologies, but so do more plan-driven organizations. If barriers exist, they would undermine the effectiveness of many product-development organizations. We conducted a large-scale survey in 2009 of 3,785 practitioners’ use of object-oriented concepts,7 including questions as to whether they would refactor to deal with certain design problems. We expected either that practitioners would tell us our choice of design principles was an inappropriate basis for a refactoring decision or that refactoring was the right decision to take when designs were believed to have quality problems. However, we were told the decision of whether or not to refactor was due to non-design considerations.

It is now eight years since the survey, but little has changed in integrated development environment (IDE) support for refactoring, and what has changed has done little to address the barriers we identified.

We hope that presenting what we have learned will encourage improvement in refactoring support.

key insights
• Developers understand the value of refactoring but are often prevented from doing it due to factors beyond their control.
• Refactoring has benefits, but also costs and risks, and developers’ inability to quantify them inhibits their use of refactoring.
• The decision to refactor is ultimately a business decision, and support is needed to allow developers to make the business case regarding whether or not to refactor.

ILLUSTRATION BY JUSTIN METZ

Reasons to Not Refactor
The reasons practitioners gave us for not refactoring a design believed to have quality problems can be categorized as follows:

Resources. Concern over the resources required was a frequently cited reason for not refactoring. The resource mentioned most often was time, as in “Deadlines often don’t allow refactorings”; “Sometimes there is just no time”; and “No time no time.”

Risk. Also frequently cited was the risk involved in making a change, in particular introducing new faults or other problems, as in “That kind of refactoring is time consuming and there is a large risk of introducing bugs” and “If it’s working I leave it alone.”

Difficulty. Another concern was the difficulty in making the change, as in “Inheritance is tricky to refactor correctly” and “This kind of refactoring is usually difficult.”

ROI. Participants acknowledged that while there may be benefits from refactoring, there are also costs, and the return on investment, or ROI, has to be considered, as in “Again, I have to weigh the costs and benefits. The benefits have to be clear before taking on the costs of refactoring, retesting, etc.”

Technical. Participants reported a variety of constraints due to characteristics of the project that restricted
their ability to refactor. Examples included having to implement a third-party interface that exceeds a limit, concern about the impact of any potential required changes on other parts of the system, the degree of familiarity with the code, dealing with legacy code, and lack of other support (such as test suites), as in “A large legacy codebase makes such refactoring tough”; “And with no tests, if I try to change things, they will break”; and “I will always refactor code I am creating; however, if that is not the case which is the most usual case then I don’t have time budgeted for refactoring.”

Management. Participants observed they did not always control what they spend their time on. Their managers or clients also had a say, as in “I want to, but the boss doesn’t like it”; “deadlines and pointy-haired bosses”; and “I’d like to more, but the client isn’t paying for it [sic].”

Tools. Inadequate tool support was mentioned as a reason for not refactoring; however, the tools referred to were not those that do the refactoring: “Source control makes this particular refactoring quite painful, that is, a lack of tool support.”

Perhaps the most surprising theme is the complete lack of concern about the existence and quality of tool support for refactoring.

What Researchers Know
The results of our survey are still significant because they indicate missing support for refactoring. Although there has been much research on refactoring, including into developers’ use and factors that may affect their use, such studies have focused mainly on what developers do when they have decided to refactor, rather than whether or not they will refactor. For example, a 2006 study by Xing and Stroulia18 suggested refactoring was a common activity and raised questions about the quality of existing tool support. Murphy-Hill and Black11 proposed five principles for supporting what they called “floss” refactoring: frequent refactoring intermingled with other kinds of program changes.11 They also observed that existing tools rarely meet all principles and suggested this might explain the underutilization of tools they had observed. Consequently, tool-usability issues and tool underuse are a recognized research focus.12,17

We know that developers refactor frequently but also that different developer groups have different refactoring behavior. Simple refactoring (such as the simple “rename” operation) is the most frequently applied. Generally, the more complex the refactoring operation is, the less frequently it is used.12,17

Kim et al.10 studied the use of refactoring on Microsoft Windows, surveying more than 300 engineers who had carried out some form of refactoring of Windows and interviewing a subset. They identified challenges in performing refactoring, such as managing inter-component and inter-branch dependencies and maintaining backward compatibility.

More recently, Chen et al.3 surveyed 105 practitioners using agile software-development processes to understand how they planned and practiced refactoring, finding that while 98.1% of those surveyed did careful planning when creating new features, only 40% addressed refactoring.

Existing research has studied the behavior and issues faced by developers who have made the decision to refactor. Here, we are interested in those developers who have made the decision not to refactor.

Survey Design
We used a survey to increase the number of developers we could reach, looking to determine when and if refactoring is used. But instead of asking about this outright, we used realistic scenarios as a basis for the investigation. The survey participants were doing development in some object-oriented language. We asked what they considered to be good design principles and, later in the survey, whether they would use refactoring to fix the problem (or not) if these principles were broken. We asked them to indicate what they thought was a good design, focusing on a reasonable level of class depth and size.

The use of a survey limits the kinds of questions we could ask, so we chose two simple design heuristics as our focus. We asked about the basis for two of the metrics from the well-known set of object-oriented metrics proposed by Chidamber and Kemerer,5 namely Weighted Methods per Class, or WMC, and Depth of Inheritance Tree, or DIT. WMC is a form of size measure for classes, and we used its simplest form, a count of the number of methods in the class. DIT, which captures a characteristic of the inheritance hierarchy, is the length of the longest path from a class to a root of the hierarchy.

In the software engineering research community, these metrics are regularly presented as indicating possible design problems; the higher the values of WMC or DIT, the more likely there are problems. Researchers have sought to establish thresholds for these and other metrics, including Chidamber et al.4 and Shatnawi.15 However, measurements of software systems indicate there are classes that have many methods or are very deep in the hierarchy.2 The disconnect between theory and practice could be due to practitioners not being aware of the theory, or knowing the theory but believing it to be wrong, perhaps because the theory really is wrong. We previously reported results for part of our survey indicating that 12.0% (452) of participants had a preference for a limit on the number of methods, whereas 25.2% (952) indicated a preference for a limit on the depth of a class; for details see Gorschek et al.7 These results suggest a significant proportion of developers believe the theory.

We asked survey participants whether they thought there should be a limit to the number of methods or depth of classes and, if so, what the threshold should be. It is important to note we let participants give their own views of what is good or bad, then, and only then, asked whether or not they would refactor designs in which the number of methods or depth exceeded the threshold they themselves found to be relevant. We first asked them to indicate what they believed was a good threshold for size (WMC) and depth (DIT). We then asked if they would use refactoring to fix classes in light of their chosen limits. We also gave them the opportunity to provide free-text commentary on their chosen response. This commentary was revealing as to what factors affected participants’ decision to refactor (or not).

We analyzed the data using content analysis.13 Specifically, we coded the free-text answers using provisional coding14 to map statements to two main categories of general motivation for “why” or “why not” refactoring was done. Subsequently, we coded the overall categories using simultaneous coding14 to further refine types of motivation, as with, say, more detailed reasoning in terms of time, risk, or economic factors.

Survey Summary
We had 3,785 participants complete the compulsory questions of the survey (see the sidebar “Survey Questions”). The responses of greatest interest to us were from participants who indicated strong agreement with design principles that limit the number of methods or depth of a class. This is the category of developers who should be most interested in making sure that size and depth are not too large and deep, respectively, and so would presumably consider refactoring. We used the responses “Never” or “Try” to Q14 (for methods) or Q18 (for depth) as indicating such strong agreement. We know (as reported in Gorschek et al.7) that 452 (12.0%) participants indicated a strong preference for establishing a limit on the number of methods, whereas 952 (25.2%) indicated a strong preference for there being a limit on the depth of a class. The median choice for the limit on number of methods was 10 and for depth 3.

Table 1 reports the distribution of responses from all participants to the questions on the likelihood of refactoring a class that exceeds a specified number of methods (Q16) or depth (Q20). And Table 2 reports the distribution of responses to Q16 and Q20 for just the “believers” who indicated strong agreement with the design principles. The columns “Comments for Q16” and “Comments for Q20” report, for those who would impose a limit, how many provided a free-text comment.

Survey Questions
We asked our survey participants about practices relating to commonly taught design principles for object-oriented programming; the full survey and raw data are available on our website http://sefolklore.com. We designed the questions relevant to this article as two pairs of questions for two design metrics, one pair (Q14, Q16) relating to class size as measured by number of methods and the second pair (Q18, Q20) relating to depth of a class in an inheritance hierarchy. We also invited participants to provide a free-text comment to Q16 and Q20 relating to their response. The labels for the possible responses to the questions were used to report the results.

Q14. What do you think is the largest number of methods a class should have? (Choose one of the alternatives, and add a number in the textbox that replaces “N” in the text of the alternative you chose.)
Never. There should never be more than about N methods.
Try. I try to avoid having more than about N methods in a class but allow exceptions in extreme circumstances.
Prefer. I prefer to avoid having more than N methods in a class but am not fanatical about it.
Don’t. I don’t really think about how many methods there are in a class but prefer to avoid having classes with more than N methods.
None. I don’t think there should be any limit on the number of methods in a class.

Q16. What is the likelihood you will refactor a class (to create classes with fewer methods) if it has more methods than the limit you chose in question Q14?
Always. Always.
Most. Most of the time.
Obvious. Only if it is obvious how to refactor the class.
Nothing. Only if I have nothing else to do.
Forces. Only if someone forces me to.
Hardly. Hardly ever.

Q18. What do you think the maximum depth of a class should be in the inheritance hierarchy? (Choose one of the alternatives and add a number in the textbox that replaces “N” in the text of the alternative you chose.)
Similar responses as with Q14, but replacing “number of methods” with “depth of class.”

Q20. What is the likelihood you will refactor the class hierarchy (to reduce the maximum depth of classes) if there are classes deeper than the limit you chose in question Q18?
Same responses as with Q16.
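The two Chidamber and Kemerer metrics used in the survey can be made concrete with a short sketch. It assumes Python classes as the object-oriented language (the survey itself was language-neutral), counts WMC in the simplest form described in the Survey Design section (every method defined in the class itself weighs 1), and computes DIT as the longest path to a root, treating Python’s implicit `object` base as outside the hierarchy. The `Shape` hierarchy and the `exceeds_limits` helper are hypothetical; the helper’s defaults are the median limits participants chose (10 methods, depth 3).

```python
from __future__ import annotations

import inspect


def wmc(cls: type) -> int:
    """Weighted Methods per Class, simplest form: count methods defined in cls.

    Inherited methods are not counted; plain functions, staticmethods, and
    classmethods each add 1. Properties and other attributes are ignored.
    """
    return sum(
        1
        for attr in vars(cls).values()
        if inspect.isfunction(attr) or isinstance(attr, (staticmethod, classmethod))
    )


def dit(cls: type) -> int:
    """Depth of Inheritance Tree: longest path from cls up to a root class.

    Python's implicit `object` base is ignored, so a class with no explicit
    bases has DIT 0. With multiple inheritance, the deepest parent wins.
    """
    parents = [base for base in cls.__bases__ if base is not object]
    if not parents:
        return 0
    return 1 + max(dit(base) for base in parents)


def exceeds_limits(cls: type, max_methods: int = 10, max_depth: int = 3) -> tuple[bool, bool]:
    """Return (too_many_methods, too_deep) against a participant's chosen limits."""
    return wmc(cls) > max_methods, dit(cls) > max_depth


# Hypothetical example hierarchy.
class Shape:
    def area(self): ...
    def perimeter(self): ...


class Polygon(Shape):
    def sides(self): ...


class Rectangle(Polygon):
    def __init__(self, w, h): ...
    def area(self): ...

    @staticmethod
    def unit(): ...


class Labeled:
    def label(self): ...


class LabeledRectangle(Labeled, Rectangle):
    pass


if __name__ == "__main__":
    for c in (Shape, Polygon, Rectangle, LabeledRectangle):
        print(c.__name__, "WMC =", wmc(c), "DIT =", dit(c), exceeds_limits(c))
```

Whether a language’s universal base class counts as the root is a convention choice; counting `object` would simply add 1 to every DIT value.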

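The classification used in the Survey Summary (“Never” or “Try” on Q14/Q18 marks a participant as strongly agreeing with a limit; “Always” or “Most” on Q16/Q20 counts as willing to refactor) reduces to simple arithmetic over the believer counts in Table 2. The sketch below reproduces the 265/452 and 364/952 figures from those counts; the function names are illustrative, not the authors’ analysis code.

```python
from __future__ import annotations

STRONG_AGREEMENT = {"Never", "Try"}  # Q14/Q18 answers that mark a "believer"
WOULD_REFACTOR = {"Always", "Most"}  # Q16/Q20 answers that count as "will refactor"

# Believer counts from Table 2 (Q16 for methods, Q20 for depth).
TABLE2 = {
    "methods": {"Always": 38, "Most": 227, "Obvious": 145,
                "Nothing": 28, "Forces": 4, "Hardly": 10},
    "depth": {"Always": 76, "Most": 288, "Obvious": 415,
              "Nothing": 77, "Forces": 26, "Hardly": 70},
}


def is_believer(limit_answer: str) -> bool:
    """True if a Q14/Q18 answer signals strong agreement with a limit."""
    return limit_answer in STRONG_AGREEMENT


def refactor_share(counts: dict[str, int]) -> tuple[int, int, float]:
    """Return (would refactor, believers, percentage) for one metric."""
    believers = sum(counts.values())
    refactorers = sum(n for answer, n in counts.items() if answer in WOULD_REFACTOR)
    return refactorers, believers, round(100 * refactorers / believers, 1)


if __name__ == "__main__":
    for metric, counts in TABLE2.items():
        print(metric, refactor_share(counts))
    # methods (265, 452, 58.6)
    # depth (364, 952, 38.2)
```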
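Because the free-text comments were coded with simultaneous coding, one comment can carry several barrier codes (the “ridiculously huge” comment counts as both resources and risk). Percentages are therefore taken over participants, not over codes, so a column can sum to more than 100%. A minimal sketch of that tally, using the “No” counts from Table 3; the `tally` and `shares` helpers are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def tally(coded_comments: list[set[str]]) -> tuple[Counter, int]:
    """Count each code once per participant; return (counts, participants)."""
    counts: Counter = Counter()
    for codes in coded_comments:
        counts.update(codes)  # a set, so one participant adds at most 1 per code
    return counts, len(coded_comments)


def shares(counts: dict[str, int], participants: int) -> dict[str, float]:
    """Percentage of participants mentioning each category."""
    return {cat: round(100 * n / participants, 1) for cat, n in counts.items()}


# "No" columns of Table 3: participants who would set a limit but not refactor.
NO_METHODS = {"Resources": 23, "Risk": 19, "Technical": 11, "Difficulty": 3,
              "ROI": 0, "Management": 1, "Tools": 1}
NO_DEPTH = {"Resources": 49, "Risk": 43, "Technical": 25, "Difficulty": 13,
            "ROI": 13, "Management": 11, "Tools": 0}

if __name__ == "__main__":
    # One comment coded into two categories, as in the article's example.
    counts, n = tally([{"Resources", "Risk"}, {"Resources"}, {"Technical"}])
    print(counts["Resources"], n)  # 2 3
    print(shares(NO_METHODS, 48)["Resources"], shares(NO_DEPTH, 136)["Resources"])  # 47.9 36.0
    print(sum(NO_METHODS.values()), sum(NO_DEPTH.values()))  # 58 154: more codes than participants
```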

Table 1. Likelihood of refactoring a class if limit of number of methods or class depth is exceeded (all participants).

Response     Q16 Refactor Methods    Q20 Refactor Depth
Always          125    3.3%             120    3.2%
Most            845   22.3%             536   14.2%
Obvious       1,725   45.6%           1,660   43.9%
Nothing         415   11.0%             366    9.7%
Forces          128    3.4%             224    5.9%
Hardly          547   14.5%             879   23.2%
Total         3,785                   3,785

Table 2. Likelihood of refactoring by those who limit number of methods or class depth.

Response to   Q14 Limit      Comments       Q18 Limit      Comments
Q16/Q20       Methods        for Q16        Depth          for Q20
Always         38   8.4%      14   8.6%      76   8.0%      29   8.4%
Most          227  50.2%      84  51.9%     288  30.2%     102  29.7%
Obvious       145  32.1%      50  30.9%     415  43.6%     138  40.1%
Nothing        28   6.2%       7   4.3%      77   8.1%      21   6.1%
Forces          4   0.9%       2   1.2%      26   2.7%      12   3.5%
Hardly         10   2.2%       5   3.1%      70   7.4%      42  12.2%
Total         452            162            952            344

Table 3. Comment category for those who would not refactor (“No”) and those who would refactor (“Yes”); participants may be in more than one category.

              Limit Methods        Limit Depth
Category      No          Yes      No          Yes
Resources     23  47.9%    27      49  36.0%     8
Risk          19  39.6%     6      43  31.6%     1
Technical     11  22.9%    11      25  18.4%    17
Difficulty     3   6.2%     5      13   9.6%     6
ROI            0   0%       0      13   9.6%     3
Management     1   2.1%     0      11   8.1%     3
Tools          1   2.1%     1       0   0%       0
Participants  48           46     136           35

We classified participants responding “Always” or “Most” to questions Q16 and Q20 as indicating that they would refactor classes they perceived as being “bad.” Considerably more participants (952) agreed with having a limit on the depth of a class than on the number of methods (452). For those participants inclined to limit the number of methods, 265/452 (58.6%) would refactor classes that exceed their choice of limit (the “Always” and “Most” rows of Table 2). For those inclined to limit the depth of a class, 364/952 (38.2%) would refactor if the choice of depth was exceeded.

So developers are more inclined to limit the depth of a class than the number of methods but less inclined to do anything when that limit is exceeded. In both cases a significant proportion would not refactor their designs even when, by their own assessment, a particular design had problems. We wanted to know why this was so.

One possibility is that there is something about the participants’ backgrounds that influences their decision to refactor or not. For example, project managers may be more concerned about resources and risk, and inexperienced developers may view refactoring as too difficult. We collected demographic information for all participants, including amount of experience, programming language, operating system, role, type of development, and highest qualification. This information indicated our sample provided a broad representation of developers. Half of the 3,785 participants reported more than four years of experience, half had a bachelor’s degree, and another 30% had master’s or higher academic credentials. Within the role category, almost all reported having a programmer role (95.6%), but all other roles also had good representation; for example, architects made up 64.1% of the total. Participants could report more than one role. The most frequently reported language was C# (55.7%), though Java (49.4%) and C++ (45%) were also well represented; full details are available in Gorschek et al.7

We examined just those participants who agreed with having a limit, as they were responding to whether they would refactor to deal with a design issue. Our statistical analysis, which was based on logistic regression, showed two groups (architects and experienced C# programmers) were more inclined to refactor if the number of methods exceeded the participants’ chosen limit. If the class depth exceeded the limit, C# developers and architects were again more inclined to refactor. Those in the programmer role were less inclined to refactor. Given that almost all participants indicated having a programmer role, this result may simply reflect the overall response.

These results are interesting, although none of the variables are good predictors, as they only weakly indicate whether or not developers will refactor. Neither participant role nor background gave a clear answer as to why they chose not to refactor even if their class design contradicted their view of good design. We had to dig deeper into the motivation offered by the respondents.

Participant Comments
To better understand what influences a decision to refactor, we examined the comments provided by those participants who indicated there should be a limit but would not refactor if the limit was exceeded. As outlined in Table 2, 162 participants who would limit the number
of methods provided comments (“Comments for Q16”), as did 344 who would limit the depth of classes (“Comments for Q20”).

We identified the barriers (described earlier under the subhead “Reasons to Not Refactor”) through analysis of free-text comments. Not all comments explained participants’ reasons for not refactoring. The distribution of those who did is outlined in Table 3 (“No” columns). Some participants’ comments fall into more than one category; for example, we classified the participant comment “Unless a class is ridiculously huge or easy to refactor, it’s probably not worth the time or risk” as both “resources” and “risk.”

While we were most interested in participants who would not refactor, we note that those who would refactor made similar comments. The numbers in each category are also reported in Table 3 (“Yes” columns).

Interpreting the Results
As with any survey data, especially qualitative data based on free text, we urge caution in its interpretation. Our categories are not necessarily orthogonal; a management decision to not refactor might be due to perceived cost, risk, or ROI calculation. Some technical concerns may really be about risk; when the reason given for not refactoring is that there are insufficient tests, the real concern may be the associated risk. Some management restrictions could be due to managers’ concern over ROI, while concern regarding difficulty could be about the risk of introducing faults. Where we could, we chose the most specific category. Nevertheless, while the exact numbers may be uncertain, the themes are quite clear.

Perhaps the most surprising theme is the complete lack of concern about the existence and quality of tool support for refactoring. As other studies have shown,12,17 such concerns are indeed real. We speculate they come into play only when the decision to refactor has been made, though in our survey the quality of tools did not appear to be a factor in making that decision. This suggests the emphasis within the research community on the quality of tool support is perhaps missing a significant issue.

Another theme we noted was that participants try to avoid the problem in the first place, as in “Would hope to never get to the case where a refactor of this type is necessary” and “This should be caught at design time. I am cautious to refactor this problem.” One participant made this comment regarding “number of methods,” and 16 participants made it for “depth of class.” So there is awareness of the benefits of good design. However, even with the best of intentions, design quality can degrade, so avoidance is no solution and poor designs still need refactoring.

While our results suggest the removal of perceived design problems is not a priority, many responses indicated awareness of the benefit of refactoring, as in “Time taken to refactor the code is probably a good investment, since it will probably save developer time later.” However, a common theme was that external factors (such as “deadlines” and “perceived risk”) led to the decision to not refactor, as in “Unless a way to refactor is clear, time pressure generally prevents refactoring” and “Refactoring class hierarchies is usually really time-consuming and can have all kinds of unintended effects.” Implicit in such comments is the suggestion that the benefits of refactoring do not always outweigh the cost; that is, refactoring is not always a good investment, as in “It sometimes takes more effort than it is worth to reduce the depth.”

Whether such investment is worth it was more explicitly mentioned in other responses, as in “Again this is the time trade-off of do I see a clear opportunity to improve things and how long will it take?” In particular, the inability to determine ROI is a barrier, as in “If I can’t make a reasonable estimation on how long it will take, I will leave it alone.” That is, the decision to not refactor was not due to it being considered a bad idea but to uncertainty as to the benefit. The results indicate there is often little or no knowledge about what the actual ROI is when refactoring in a specific case.

As mentioned earlier, studies indicate developers prefer simple refactoring. For example, Murphy-Hill et al.12 asked “Why is the ‘rename’ refactoring tool so much more popular than other refactoring tools?” The answer could be simply that the “rename” operation and less-complex operations are, in general, what developers need most often. However, our results suggest this is not the case. We see that developers do want to refactor complex cases and suggest that the preference for a simpler operation can be explained in ROI terms; that is, being simple, the cost is low, meaning it is almost always worth doing. More-complex operations take longer, are higher risk, and deliver benefit that is not so clear, as when one respondent said, “I will refactor the class only if it has a clear benefit.”

What We Learned
We asked whether developers would refactor classes that did not meet characteristics they themselves thought were important, finding many (at least 40%, as in Table 2) would not. The reasons can be summarized as lack of resources, of information identifying consequences, of certainty regarding risk, and of support from management. This is consistent with studies on tool use. In particular, Vakilian et al.17 reported that reasons for not using automation for refactoring involve trust, predictability, and complexity.

What we did not see was concern over lack of tool support. This was surprising, since, as Murphy-Hill and Black11 noted, two advantages from using refactoring tools are lower error rates and less time required, so good tool support should go some way toward addressing developers’ concerns.

Recent tool research has focused on primitive refactoring operations (“rename,” “move,” “method,” and so on), whereas our question was goal-directed, asking, “Will you refactor to achieve this effect?” rather than simply, “Would you use this refactoring operation?” We speculate that an unstated barrier is the difficulty translating a refactoring goal into refactoring operations. While research has proposed refactoring operations to remove code smells (such as described by Bavota et al.1), Vakilian et al.’s survey17 identified a preference for lightweight methods for invoking refactorings. We suggest that providing lightweight goal-directed refactoring recommendations would be a fruitful area for future research.

We also did not see lack of interest in refactoring. Our participants were by and large willing to consider refactoring, but barriers prevented them from doing so. Tool support for performing refactoring can address some of them but not all. For example, lack of certainty about ROI is a challenging but also possibly very rewarding area of research. We are not the first to highlight the importance of using ROI (see, for example, Harun and Lichter,8 Kazman et al.,9 and Szöke et al.16), but our results suggest it is a notable barrier to improving design quality. What is needed is a decision-support system that allows practitioners to quantify benefit in the long and short term. This would help inform the decision as to whether the required resources are justified or the potential risk is acceptable. It would also allow developers and managers to make informed choices as to whether or not to refactor.

The reasons participants gave for not refactoring will mostly come as no surprise, especially to practitioners. What we provide is solid data to back up everyone’s suspicions. Practitioners who have been prevented from refactoring for similar reasons can at least take heart that they are not alone, or even, it may seem, in the minority. While existing research goals regarding refactoring are reasonable, we feel they do not address problems actually faced by practitioners. As researchers, we need to better understand what barriers they face and better target our research to support them.

Validity of Results
Our data is representative because of the way we elicited it. We avoided leading questions, especially for free-text responses, asking only, “Please explain your answer.” This led to comments that did not provide specific reasons for refactoring (or not). A more targeted question might have yielded more and clearer results but might also have biased the responses.

None of the material we used to recruit participants mentioned refactoring or gave specifics of the concepts we would be asking about. Consequently, and unlike other studies that included only developers known to refactor, our sample should have had no obvious bias for or against the use of refactoring. We recruited through personal contacts, word of mouth, and social media, supported by our website http://se-folklore.com and a YouTube video. The demographic data we collected reflected a variety of experience, roles, languages, level of qualification, and type of development.

We set a high bar in assessing whether a participant would agree with placing a limit on number of methods or class depth or would still refactor when that limit is exceeded. Different choices would change the distribution of the frequencies in the different categories somewhat, but the same trends would be evident.

Our wording of the possible responses might have affected what responses participants chose, but the barriers we identified came from the free-text commentary, which was unlikely to be affected by such concerns. Even those who would refactor mentioned some of the same concerns (as reported in Table 3 “Yes” columns). Participants from all demographics provided comments, so there is no obvious bias due to background, as was also the case with our target group: those who would limit number of methods or depth of inheritance.

We conducted the survey more than eight years ago and IDEs have since improved, so perhaps we would get different results today. However, little has changed in refactoring support, and we are not aware of any research directly addressing the issues our survey identified. More recent surveys have identified similar issues. For example, Kim et al.10 reported concerns regarding lack of adequate test suites. Chen et al.3 noted 21% of participants reported lack of support from management as a reason for not refactoring and 12.4% did no planning regarding refactoring. However, these studies surveyed developers who were already committed to refactor, whereas ours included those who chose not to refactor. That these more recent surveys report findings similar to ours suggests there has been little change in barriers to refactoring since our survey.

We used two metrics to identify potential design-quality issues based on published theories. It is possible the theories are wrong. However, our conclusions are based on comments by participants who (rightly or wrongly) believed the theories.

Conclusion
There are significant barriers preventing developers from refactoring to remove software design-quality issues, no matter how they are identified. Reducing or even eliminating the barriers has the potential to significantly improve software quality. One means is to provide refactoring support that is goal-directed rather than operations-directed. Another is to provide better quantification of the benefits, thus better informing the decision as to whether or not to refactor.

Acknowledgments
We thank our survey participants, many of whom made the extra effort to provide the comments we included and discussed here. We also thank the anonymous reviewers for their valuable comments and suggestions.

References
1. Bavota, G., Oliveto, R., Gethers, M., Poshyvanyk, D., and De Lucia, A. Methodbook: Recommending move method refactorings via relational topic models. IEEE Transactions on Software Engineering 40, 7 (July 2014), 671–694.
2. Baxter, G., Frean, M., Noble, J., Rickerby, M., Smith, H., Visser, M., Melton, H., and Tempero, E. Understanding the shape of Java software. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Oct. 2006), 397–412.
3. Chen, J., Xiao, J., Wang, Q., Osterweil, L.J., and Li, M. Perspectives on refactoring planning and practice: An empirical study. Empirical Software Engineering 21, 3 (June 2016), 1397–1436.
4. Chidamber, S.R., Darcy, D.P., and Kemerer, C.F. Managerial use of metrics for object-oriented software: An exploratory analysis. IEEE Transactions on Software Engineering 24, 8 (Aug. 1998), 629–639.
5. Chidamber, S.R. and Kemerer, C.F. A metrics suite for object-oriented design. IEEE Transactions on Software Engineering 20, 6 (June 1994), 476–493.
6. Fowler, M. Refactoring: Improving the Design of Existing Code. Addison-Wesley, Boston, MA, 1999.
7. Gorschek, T., Tempero, E., and Angelis, L. A large-scale empirical study of practitioners’ use of object-oriented concepts. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (Cape Town, South Africa, 2010), 115–124.
8. Harun, M.F. and Lichter, H. Towards a technical debt-management framework based on cost-benefit analysis. In Proceedings of the 10th International Conference on Software Engineering Advances (Barcelona, Spain). International Academy, Research, and Industry Association, 2015.
9. Kazman, R., Cai, Y., Mo, R., Xiao, L., Feng, Q., Haziyev, S., Fedak, V., and Shapochka, A. A case study in locating the architectural roots of technical debt. In Proceedings of the 37th International Conference on Software Engineering (Firenze, Italy). IEEE Press, 2015.
10. Kim, M., Zimmermann, T., and Nagappan, N. A field study of refactoring challenges and benefits. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (Cary, NC). ACM Press, New York, 2012, article 50.
11. Murphy-Hill, E. and Black, A.P. Refactoring tools: Fitness for purpose. IEEE Software 25, 5 (Sept.-Oct. 2008), 38–44.
12. Murphy-Hill, E., Parnin, C., and Black, A.P. How we refactor, and how we know it. IEEE Transactions on Software Engineering 38, 1 (Jan. 2012), 5–18.
13. Robson, C. and McCartan, K. Real World Research, Fourth Edition. John Wiley & Sons, Inc., 2015.
14. Saldana, J. The Coding Manual for Qualitative Researchers, Third Edition. SAGE Publications Ltd., 2016.
15. Shatnawi, R., Li, W., Swain, J., and Newman, T. Finding software metrics threshold values using ROC curves. Journal of Software Maintenance and Evolution: Research and Practice 22, 1 (Jan. 2010), 1–16.
16. Szöke, G., Nagy, C., Ferenc, R., and Gyimóthy, T. A case study of refactoring large-scale industrial systems to efficiently improve source code. In Proceedings of the 14th International Conference on Computational Science and Its Applications (Guimarães, Portugal, June 30–July 3). Springer International Publishing, Cham, Switzerland, 2014, 524–540.
17. Vakilian, M., Chen, N., Negara, S., Rajkumar, B.A., Bailey, B.P., and Johnson, R.E. Use, disuse, and misuse of automated refactorings. In Proceedings of the 34th International Conference on Software Engineering (Zurich, Switzerland). IEEE Press, 2012, 233–243.
18. Xing, Z. and Stroulia, E. Refactoring practice: How it is and how it should be supported (an Eclipse case study). In Proceedings of the 22nd International Conference on Software Maintenance (Philadelphia, PA). IEEE Press, 2006, 458–468.

Ewan Tempero ([email protected]) is an associate professor of computer science at the University of Auckland, Auckland, New Zealand.

Tony Gorschek ([email protected]) is a professor of software engineering at the Blekinge Institute of Technology, Karlskrona, Sweden.

Lefteris Angelis ([email protected]) is a professor of statistics and information systems in the Department of Informatics at Aristotle University of Thessaloniki, Thessaloniki, Greece.

©2017 ACM 0001-0782/17/10

Watch the authors discuss their work in this exclusive Communications video. https://cacm.acm.org/videos/barriers-to-refactoring

contributed articles
DOI:10.1145/3132745

Millennials’ Attitudes Toward IT Consumerization in the Workplace

Millennials entering the workforce ignore the risks of using privately owned devices on the job.

BY HEIKO GEWALD, XUEQUN WANG, ANDY WEEGER, MAHESH S. RAISINGHANI, GERALD GRANT, OTAVIO SANCHEZ, AND SIDDHI PITTAYACHAWAN

key insights
- Members of Generation Y (so-called millennials) see the use of their privately owned devices for work as a necessity, not an option.
- Millennials focus on their personal benefit, generally ignoring the risks they may introduce into corporate networks when using their own devices and accounting for risk only if it threatens them directly.
- When it comes to weighing risks against benefits, such behavior is seen across developed economies, with no significant cultural influence across an international sample.

PEOPLE BORN AFTER 1980, often called “millennials” by demographic researchers, behave differently from older generations in significant ways. They are the first “digital natives,” the “always on generation” that expects to have information instantly and always available at its fingertips. Their attitudes have been described by previous research in often unfavorable terms. And when they enter the workplace, they pose a major challenge to managers from older generations, who, it has been shown, typically follow a different set of values.

Our research investigates the attitudes of millennials who have not yet entered the workforce toward the use of information technology (IT) in terms of “IT consumerization.” Specifically, we want to know how this significant part of the population weighs benefits against risks when it comes to intention to use technology in a business environment.

In 2013, we conducted an international study involving 402 students in their final year of undergraduate study just before entering the workplace. We received feedback from students at Neu-Ulm University of Applied Sciences (Germany), Dongbei University of Finance & Economics (China), Texas Woman’s University (U.S.), Carleton University (Canada), Fundação Getulio Vargas (Brazil), and RMIT University (Australia). We found they share a common set of values regardless of nationality, including motivational drivers that would alarm corporate IT managers, if known. The individuals in our sample value their own benefit highly and dramatically neglect the risks their actions might pose.

The way we work, think, and behave is heavily influenced by the Internet, email, smartphones, and other technological innovations that have proliferated over the past 20 to 30 years. The generation of people born after 1980 is the first to grow up with information everywhere, anytime,24 and is referred to as digital natives, or, more commonly, millennials.15

Many studies have sought to analyze them.20 For example, in their 2010 literature review, Ng et al.24 characterized them as “want it all” and “want it now.” It seems generally accepted in research and practice alike that millennials are difficult to cope with
when entering the workplace from the perspective of managers born perhaps decades earlier.32 However, given the need for talent in today’s technology-driven society, especially in computer science, management needs to adapt and offer millennials working conditions that attract them.37

Previous research focused mainly on millennials’ attitudes toward work.3 Their attitude toward using technology for work purposes has not yet, however, received sufficient academic attention.35 There is a striking paucity of research looking at how millennials use technology for professional and personal purposes.35

To partially close this gap, we conducted our study on the motivational factors that shape millennials’ intention to use technology in the context of IT consumerization. Here come the cherry pickers.

PHOTO BY DEAN DROBOT

Research Background
Here, we introduce the concept of IT consumerization, then discuss how national culture was relevant in our study:

IT consumerization. IT consumerization describes the ongoing process of blending private and business life when it comes to the use of technology, as driven by employees who push IT solutions they use privately into the workplace.38 This applies to hardware (as in “bring your own device,” or BYOD) where privately owned laptops, tablets, and smartphones are used for business tasks but also to software when online email services or cloud-storage solutions are used for business purposes.31 “Ownership” of the device or service is usually regarded as a key characteristic of IT consumerization,14,25 as possession shifts from employer to user/employee.

We expect IT consumerization to have a positive influence on employees’ work performance by increasing satisfaction, flexibility, and mobility.14 It is also demanded by more and more employees who want to use their own smartphone to, say, access corporate email messages.36 However, using privately owned devices at work involves risks like blurred boundaries between professional and private lives,5,13,25 creating additional stress for the employee (such as when responding to email messages on weekends when technically not at work). Not only users, but their employers as well, face notable
challenges from this trend. Anecdotal evidence indicates corporate IT departments are under pressure to give in to user demands to be allowed to use privately owned IT for work purposes. However, granting myriad different consumer devices access to the corporate network is a nightmare for anyone concerned about IT security.

Consumerization of IT seems to be a key characteristic of millennials, as their desire to be always on is not limited only to the workplace. In the same vein, they are accustomed to always using state-of-the-art technology, something not every work environment is able to provide, especially because the definition of “state of the art” is subjective.

Our study focused on mobile devices as an exemplary technology to explain millennials’ behavioral intentions when it comes to the use of technology. This area is of great concern to practitioners, including CIOs and senior IT managers.31

To understand the role of such contradictory factors in individual decision making, social psychology provides net-valence models (NVMs) that assume individuals intend to perform an action only if the perceived benefits outweigh the associated costs.7,26 Prior research found NVMs help explain the adoption of technology-related services.17 Other prominent theories on technology adoption (such as the Unified Theory of Acceptance and Use of Technology33) do not capture the risks associated with technology use, which is why we chose NVMs as our theoretical lens (see Figure 1).

Cultural values and IT use behavior. Millennials’ use of corporate IT involves multiple challenges for IT executives worldwide. However, the literature suggests behavioral models do not apply universally across all cultures.30 Research by Srite and Karahanna30 showed that the significance of factors determining technology use is notably dependent on espoused cultural values, particularly those reflecting national culture. National culture refers to “the collective programming of the mind that distinguishes the members of one group or category of people from another.”11 Hofstede and Bond10 proposed five dimensions of national culture: power distance; individualism/collectivism; masculinity/femininity; uncertainty avoidance; and long-term orientation. Other research found that national cultural values strongly affect an individual’s IT-adoption behavior.12,32

In order to understand how millennials perceive benefits and risks associated with IT consumerization across different cultures, we identified uncertainty avoidance (UA) and individualism/collectivism (IC) as the most relevant dimensions. Power distance, masculinity/femininity, and long-term orientation are important in the more general context of technology adoption but less relevant in understanding the effect of perceived risks and benefits in the context of our study.

“Uncertainty avoidance” refers to “the extent to which individuals feel vulnerable to unpredictable and unknown situations.”9 People with strong UA values fear uncertainty. In the context of work-related technology, they need the predictability often provided by rules, policies, and structure in organizations that IT consumerization contradicts or dilutes. UA can thus help understand how millennials perceive the risks associated with IT consumerization.

“Individualism/collectivism” is one of the most widely studied cultural values in cross-cultural research,30 referring to “an individual’s preference for a social framework where individuals take care of themselves (individualism), as opposed to how they expect the group to take care of them in exchange for their loyalty (collectivism).”9 Individuals with individualistic values have a more complex and more frequently sampled private self. Consequently, their own goals, beliefs, and values are more salient. Considering technology use at work, they are more concerned with the benefits they might achieve than the disadvantages that could arise for others. IC is thus useful in understanding how millennials perceive benefits associated with IT consumerization, especially when there could be conflict between themselves and their employers.

Research Model
Based on NVMs, it is assumed an individual’s behavioral intention to use privately owned devices on the job is determined by the outcome of weighing perceived benefits vs. perceived risks/associated costs, as in Figure 1.

Perceived benefits. Perceived benefits “include all benefits which the customer perceives as having been received.”18 In the context of IT use, they reflect the overall positive utility individuals expect when using a particular technology.12 Prior research demonstrated that perceived benefits significantly affect behavioral intention regarding the use of IT.16,19

We define perceived benefits as individuals’ assessment of the functional benefits they associate with using a privately owned device for work purposes. Building on the premises of technology acceptance and use models,22,29,33 we propose that the benefits of using a privately owned device for work purposes are related to the characteristics of the technology and the functional advances it provides.29 We thus treat perceived benefits as a multidimensional construct comprising three facets of employment behavior: performance expectancy; effort expectancy; and compatibility.

Employees may realize productivity gains when allowed to select devices on their own.14 Consequently, performance expectancy reflects the extent to which individuals perceive that using privately owned devices supports their ability to perform better at work.33 Moreover, devices selected by individual employees are usually perceived as easier to use and more intuitive than those provided by an IT department.25 We thus define effort expectancy as the degree of ease an individual associates with using a privately owned device as compared to using a device provided by an IT department. Overall benefit perceptions are also formed by an individual’s work style and associated needs and values. To capture these influential factors, Moore and Benbasat22 proposed the construct “compatibility” as the degree to which using a privately owned device for work purposes fits the individual’s work style. Employees who agree to be available for work responsibilities (such as to respond to email messages) after work hours are more likely to see the use of their devices for business purposes as beneficial.
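
The net-valence reasoning above can be illustrated with a deliberately simplified sketch. It assumes hypothetical Likert-style scores (for example, 1 to 7) for the three benefit facets just described, combines them with an unweighted mean, and treats perceived risk as a single score on the same scale; the names `BenefitFacets`, `net_valence`, and `intends_to_participate` are illustrative and are not taken from the study. The study itself estimates these relationships statistically from survey responses rather than applying a fixed threshold rule, so the code shows only the direction of the reasoning.

```python
from dataclasses import dataclass


@dataclass
class BenefitFacets:
    """Hypothetical facet scores on a shared Likert-style scale (e.g., 1-7)."""
    performance_expectancy: float
    effort_expectancy: float
    compatibility: float

    def score(self) -> float:
        # Assumption: the three facets are weighted equally; the study does not prescribe this.
        total = self.performance_expectancy + self.effort_expectancy + self.compatibility
        return total / 3


def net_valence(benefit: float, risk: float) -> float:
    """Net valence: perceived benefit minus perceived risk/cost on the same scale."""
    return benefit - risk


def intends_to_participate(benefits: BenefitFacets, risk: float) -> bool:
    # Simplified NVM rule: act only if perceived benefits outweigh perceived costs.
    return net_valence(benefits.score(), risk) > 0


if __name__ == "__main__":
    b = BenefitFacets(performance_expectancy=6, effort_expectancy=5, compatibility=4)
    print(b.score())                       # 5.0
    print(intends_to_participate(b, 3.5))  # True
    print(intends_to_participate(b, 5.5))  # False
```

Under these assumptions, equal benefit and risk scores give a net valence of zero and therefore no intention to participate; an empirical model would instead estimate the strength of each relationship from data.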

Following these insights, we hypothesize that perceived benefits influence individuals’ consumerization behavior:

Hypothesis 1. The greater the perceived benefits of using privately owned devices for work purposes, the greater an individual’s intention to participate in a BYOD program.

“Perceived risk” reflects negative utility from a subjective perspective, a concept introduced by Bauer2 as part of his “Perceived Risk Theory,” which assumes subjective risk perceptions directly influence an individual’s intention to perform a certain action.4 Perceived risk is defined by Cunningham4 as “the amount that would be lost, or that which is at stake, if the consequences of an act were not favorable, and the individual’s subjective feeling of certainty that the consequences will be unfavorable.” Featherman and Pavlou6 and Hoehle et al.9 found perceived risk plays a significant role in individuals’ IT-use behavior.

To reflect the perceived cost associated with using privately owned devices, we define perceived risk as the belief of individuals about the potential negative outcomes caused by using privately owned devices on the job. The negative consequences of such behaviors can be classified into multiple types of loss, indicating that, as with perceived benefit, perceived risk is a multidimensional construct.6,16 Based on the arguments discussed earlier regarding consumerization and its effects on corporate IT, we hypothesize that using privately owned devices for business purposes encompasses three facets of risk: performance; privacy; and security.

Using privately owned devices for work purposes generally shifts responsibility from the IT department to the individual. For instance, the individual is, at least psychologically, accountable for “how well the [device] will perform relative to expectations.”1 The risk associated with using one’s own devices on the job includes the potential that the device the individual is responsible for is not sufficient for its intended business purpose. Performance risk thus reflects the potential for not being able to perform business activities as expected.

Using a device for both private and business purposes entails the risk that personal information is disclosed to the employer without the employee’s consent and knowledge.21 Privacy risk, defined by Featherman and Pavlou6 as the “potential loss of control over personal information,” encompasses this facet of risky behavior. Business data, as well as personal data, is at risk. The potential for corporate data to be exposed to unauthorized third parties also increases when individuals use their private devices for work purposes.25 Information security is one of the most important topics related to IT consumerization, as 90% of all corporate data breaches fall into four patterns:34 lost and stolen devices, user-initiated crimeware, insider misuse, and miscellaneous human errors. To capture this facet of risky behavior, we assume security risk, or potential loss due to fraud or a hacker compromising corporate information security,16 contributes to overall perceived risk.

We thus hypothesize that perceived risk negatively affects individuals’ decisions regarding use of privately owned devices at work:

Hypothesis 2. The greater the perceived risk of using privately owned devices for work purposes, the lower an individual’s intention to participate in a BYOD program.

We also assume the perceived risk associated with IT consumerization influences behavioral intention indirectly by negatively affecting perceived benefits. For instance, as a measure of safeguarding IT security, firms usually adopt policies that allow them to erase data when an employee’s device is lost or stolen. Such “loss of full ownership” significantly affects the perceived benefits of BYOD.28 We thus propose:

Hypothesis 3. The perceived risk of using privately owned devices for work purposes negatively affects an individual’s perception of benefit.

Cultural values. Research provides evidence that millennials’ cultural values influence their technology-use behavior.30 However, it remains to be demonstrated whether the proposed NVM holds across the general population of millennials who reflect a variety of cultural values.30 We propose


that the explanatory power of the theoretical model depends on how distinctively millennials espouse characteristic cultural values.

Based on these arguments, we expect to see distinctions in individualistic values and perceptions toward uncertainty. Using their own devices for work purposes enables millennials to express their sense of self, better achieve their own goals, and follow their own beliefs and values. Also, using a privately owned device can indicate that individuals are more strongly concerned with their own needs than with those of the collective, including their employers. We thus expect individuals who espouse individualistic cultural values to more strongly value the benefits of using privately owned devices. Likewise, we assume individuals who experience less difficulty dealing with uncertainty to put less emphasis on the potential risks of using privately owned devices for work purposes and hence to be more likely to use their own devices at work. We also expect perceived risk to be less important for millennials who espouse lower uncertainty-avoidance cultural values.

We thus propose that there are distinctive subcultures within the overall group of millennials that are determined by differences regarding their UA and IC values. We hypothesize that the subcultures will take different approaches toward risk/benefit assessment of their intention to participate in a BYOD program:

Hypothesis 4. The effect of the perceived risks of using privately owned devices for work purposes differs between millennials with lower UA scores and millennials with higher UA scores. The effect of perceived benefits of using privately owned devices for work purposes also differs between millennials with higher IC scores and millennials with lower IC scores.

Figure 1. Research model. (Performance, privacy, and security risk form Perceived Risks; performance expectancy, effort expectancy, and compatibility form Perceived Benefits. Perceived Benefits → Behavioral Intention (H1, +); Perceived Risks → Behavioral Intention (H2, –); Perceived Risks → Perceived Benefits (H3, –). Uncertainty avoidance (H4a) and individualism/collectivism (H4b) moderate the risk and benefit paths, respectively.)

Methodology
Here, we discuss our data collection, sample clustering, data analysis, and results.

Data collection. To test our research model, we developed a questionnaire with a set of measurement items for each construct, and to safeguard measurement validity, we adapted items from prior research, as outlined in the online appendix (dl.acm.org/citation.cfm?doid=3132745&picked=formats).

Using an online survey tool, we distributed the questionnaire among students in their final year of undergraduate studies who had relevant work experience (most respondents worked full time for at least six months during their studies). Our approach is consistent with Vodanovich et al.,35 who suggested conducting surveys with students to understand how millennials ("digital natives" in their terminology) use technology. We collected data from students with "technology-affine" majors ("information systems," "industrial engineering," and "business administration") at a number of universities worldwide. We chose countries with different values of UA and IC, according to Hofstede.11 After stripping out incomplete questionnaires, we received a total of 402 valid responses.

Clustering for espoused cultural values. We conducted exploratory factor analysis, a statistical method used to uncover the underlying structure in a large set of variables, to test the "unidimensionality" of the measurement items for IC and UA. It revealed that three items measuring IC and two items measuring UA load with a high coefficient on the factors they are intended to measure (loadings > 0.79). Using the factor scores of these items, we then conducted a K-means clustering. Cluster analysis revealed two clusters (see Table 1): the first cluster (referred to as A) encompassed respondents with high IC scores (cluster center 0.13) and low UA scores (cluster center −0.61), and the second cluster (referred to as B) encompassed respondents with rather low IC scores (cluster center −0.20) and rather high UA scores (cluster center 0.93).

We characterized the respondents in group A as more individualistic and less risk-averse than those in group B. Although both groups encompass only millennials, they showed different characteristics when it comes to the formation of behavioral intention. One could argue A is the more forthcoming, self-centered, and aggressive group, while B represents the more group-oriented and considerate one.

Table 1. Dataset demographics.

                    Cluster A      Cluster B      Complete Set C
                    (IC+ / UA–)    (IC– / UA+)
n                   242            160            402
Gender
  Male              44.2%          37.5%          41.5%
  Female            55.4%          62.5%          58.2%
Age
  <= 21 years       53.7%          55.6%          54.5%
  22 to 25 years    46.3%          44.4%          45.5%
Country
  Australia         9.9%           6.3%           8.5%
  Brazil            7.0%           9.4%           8.0%
  Canada            6.2%           10.6%          8.0%
  China             39.7%          27.5%          34.8%
  Germany           13.6%          13.8%          13.7%
  U.S.              14.0%          23.8%          17.9%
  Other             9.6%           8.6%           9.1%

Measurement model assessment. We tested our model with partial least squares (PLS) using SmartPLS 3.0 with bootstrapping (1,000 samples), assessing the measurement model with the complete dataset C, as well as with clusters A and B.

We measured behavioral intention reflectively (loadings of the indicators were above 0.95 and significant at the .001 level) and confirmed internal consistency by assessing Cronbach's Alpha (CA) and Composite Reliability (CR) measures. Both met or exceeded the threshold of 0.90 for all datasets: A CA=0.90, CR=0.95; B CA=0.92, CR=0.96; and C CA=0.91, CR=0.96. The average variance extracted was greater than 0.50 (A 0.91; B 0.93; and C 0.92), demonstrating sufficient convergent validity. Finally, we used cross-loading analysis to confirm that all items load highest on their respective constructs.

The formative measures of "perceived benefits" were significant, at least at the .05 level, and path coefficients were greater than .1, suggesting the chosen characteristics of each category were relevant for the formation of the construct (see Table 2). Moreover, the variance inflation factor (VIF) was less than 2, supporting our assumption of indicator validity.

The formative measures of "perceived risks" revealed mixed results regarding the risk facets' contribution to the formation of the formative index. We found privacy risk was not relevant regardless of dataset used; performance risk was significant only in subset B and the complete dataset; and security risk contributed significantly only to the formative index of the complete dataset. Although not all facets of risky behavior contributed significantly, VIF was less than 2 within all three datasets, confirming indicator validity. Consequently, low redundancy of the indicators' information was confirmed.

The results show performance risk contributed significantly to overall risk perception in datasets B and C, while security risk contributed to overall risk perception only in the complete dataset C. Privacy risk did not significantly contribute to perceived risk, regardless of dataset. And performance expectancy, effort expectancy, and compatibility contributed significantly to perceived benefits in all datasets.

Table 2. Formative constructs measurements.

Construct            Facet                    Cluster A   Cluster B   Complete Set C
Perceived Risks      Performance              0.740       0.631*      0.713**
                     Privacy                  -0.315      0.133       -0.124
                     Security                 0.668       0.530       0.613*
Perceived Benefits   Performance Expectancy   0.522***    0.430**     0.491***
                     Effort Expectancy        0.325*      0.230*      0.292***
                     Compatibility            0.349**     0.538***    0.420***
*** p < 0.001; ** p < 0.01; * p < 0.05

Structural model assessment. The results (see Figure 2) demonstrate that the millennials responding to the survey primarily consider the benefits and neglect the risks of using technology, which in our research context means taking part in a corporate BYOD program.

Detailed analysis confirmed the existence of two distinct groups of millennials in our sample, differentiated by their espoused cultural values. Based on their salient characteristics, we named them the "narcissists" (group A) and the "prudent" (group B). Both groups showed the intentions expected of millennials: take the benefits and ignore the risks. However, the groups also showed different approaches to acting on their risk-and-benefit perceptions, depending on their respective IC/UA scores.

For millennials with high IC scores and low UA scores (narcissists), risk perceptions have no effect, while benefits are weighted comparatively high when it comes to technology-adoption decisions. Additionally, perceived risks do not significantly affect perceived benefits. For the group with low IC scores and high UA scores (prudent), risk and benefit perceptions are, relative to the narcissists, more important for the intention to participate in a BYOD program. For these millennials, we found perceived risk was significantly related to perceived benefits (medium effect, Cohen's d=.163), indicating they are more aware of risk and react more cautiously.

Overall, the results show our research model is capable of explaining a good portion of the variance in millennials' behavioral intention to participate in a BYOD program (R²=.37). The explanatory power of the model is slightly stronger (R²=.42) for individuals categorized as the prudent.

Finally, removing perceived benefits from the model leads to a significantly stronger path between perceived risk and behavioral intention for the prudent (β=−.244**) and for the combined dataset (β=−.166**). This result indicates the effect of perceived risk on behavioral intention is most likely mediated by perceived benefits.

Figure 2. Structural model assessment. (Paths among Perceived Risks, Perceived Benefits, and Behavioral Intention, with coefficients for clusters A and B and the complete set C; R² of Behavioral Intention: A 0.335, B 0.419, C 0.368; *p < 0.05, **p < 0.01, ***p < 0.001.)

Discussion
These results confirm our expectations but also point to unanticipated conclusions. Millennials are egocentric, especially when it comes to their expectations regarding the workplace.24,27 The respondents in our survey expect great benefit from using their own devices, but the related risks are not reflected in their behavioral decision patterns unless they are personally affected. In our sample, we also found that only those millennials who are more strongly oriented toward the collective and feel somewhat vulnerable in unpredictable situations (prudent) are influenced, at least to some degree, by risk perceptions regarding the performance of their privately owned devices. These findings confirm what research expects23,24,27 and what anecdotal evidence underlines.

What we did not expect was the cross-cultural homogeneity of the sample. Millennials' values seem uniform regardless of cultural background. Throughout our sample we did not find statistically significant differences in responses based on interviewee nationality. Investigating further, we used two distinct cultural values, IC and UA, to cluster the responses. We were able to distinguish between two groups, narcissist and prudent, that react as typical millennials but also differ in their tendencies toward participating in BYOD programs. In forming their intention, the prudent (with lower IC and higher UA) seem slightly more risk-sensitive than the narcissists (with higher IC and lower UA). Contrary to our expectations, the two groups were well distributed among all participants in our sample; that is, we found no significant correlation with nationality or any other control factor to define the two clusters.

Even if the intention to use privately owned devices formed differently depending on IC and UA scores, millennials remain millennials first and foremost. The proposed facets of risk do not significantly contribute to their risk perceptions, which, in turn, do not show considerable direct effect on behavioral intention to participate in corporate BYOD programs. Our data implies that risk perceptions influence their decision making, which is mediated through perceived benefits. However, this effect applies only to group B.

That is, the results show that the espoused cultural values individualism and uncertainty avoidance only slightly influenced survey-responding millennials' decision making regarding their IT consumerization behavior. Given these findings, our study contributes to theory and practice alike. First, it provides an NVM that accounts for the particularities of IT consumerization, demonstrating that the risks/costs associated with the use of privately owned devices at work do not significantly affect decision making for most millennials. Further evidence is thus given that millennials are concerned mainly with their own benefit and happy to neglect the risks that do not jeopardize them directly. Privacy risk seems to be of no concern, as in Vodanovich et al.35

Second, the study also enhances our knowledge of the role of millennials' espoused cultural values, showing for the first time in the academic literature that the formation of technology-use decisions is apparently universal for millennials, independent of cultural background.

For corporate IT managers, our findings are a warning: Millennials will quite likely use their private devices for work, if not as part of an official program, then through shadow IT. To avoid this, structured offerings should be made available. IT security managers should thus consider the implications of millennials' egocentric approach to risk. If users are not concerned about the risks their behavior poses to the company, then the company itself must be doubly cautious. To counter this potential threat, companies need to create robust BYOD programs that provide employees the benefits of using their own devices and simultaneously safeguard corporate data and networks.

Our findings share the typical limitations of empirical research: We had only a limited number of participants from a limited number of countries, who cannot be representative of the global millennial population. As such, our findings represent an indication but are not generalizable to all millennials. Additionally, our


research subjects were students. Although all participants belonged to the millennial generation and had working experience, not all of them were actually working at the time. The part of the millennial population already in the active workforce was not included in our dataset. Even though we took care, it cannot be ruled out that the occasional participant lacked the required work experience. It is thus possible that some participants did not fully understand or misinterpreted the BYOD concept and its implication for their future working lives.

Conclusion
We conducted an international study to obtain information about how millennials about to enter the workforce weigh risks and benefits when it comes to using privately owned technology in the workplace. It included 402 students from six countries who were in their final year of study and had relevant work experience. The results indicate they pay a lot of attention to their own benefit and significantly neglect the risks associated with using privately owned technology on the job. We tested our hypotheses by asking them about their intention to use privately owned devices at work by enrolling in a corporate BYOD program. The responses showed they expect to use their private devices for work purposes. They perceive major benefits from using their own devices but pay little attention to the risk such use may impose on their employers. These findings have important implications for corporate IT managers who must deal with this new workforce (of cherry pickers).

However, a new generation, born after 1998, or "Gen-Z," will itself soon enter the workforce23 and is expected to behave differently, being more concerned with risk and less with their own personal benefit.27 This change in employees' mindset will provide fruitful ground for further research. Further research should also look toward longitudinal datasets, as it would be interesting to find out how perceptions change before entering the workplace and then again after several months and years. Maybe IT managers could see cherry pickers evolve into socially minded corporate citizens over time.

References
1. Aldás-Manzano, J., Lassala-Navarre, C., Ruiz-Mafe, C., and Sanz-Blas, S. The role of consumer innovativeness and perceived risk in online banking usage. International Journal of Bank Marketing 27, 1 (2009), 53–75.
2. Bauer, R.A. Consumer behavior as risk taking. In Risk Taking and Information Handling in Consumer Behavior, D.F. Cox, Ed. Harvard University Press, Cambridge, MA, 1967, 23–33.
3. Breitsohl, H. and Ruhle, S. Differences in work-related attitudes between Millennials and Generation X: Evidence from Germany. In Managing the New Workforce: International Perspective on the Millennial Generation, E.S.W. Ng, S. Lyons, and L. Schweitzer, Eds. Edward Elgar, Cheltenham, U.K., 2014, 107–129.
4. Cunningham, S.M. The major dimensions of perceived risk. In Risk Taking and Information Handling in Consumer Behavior, D.F. Cox, Ed. Harvard University Press, Cambridge, MA, 1967, 21–33.
5. Derks, D., Duin, D., Tims, M., and Bakker, A.B. Smartphone use and work-home interference: The moderating role of social norms and employee work engagement. Journal of Occupational and Organizational Psychology 88, 1 (Mar. 2015), 155–177.
6. Featherman, M.S. and Pavlou, P.A. Predicting e-services adoption: A perceived risk facets perspective. International Journal of Human-Computer Studies 59, 4 (2003), 451–474.
7. Fishbein, M., Ed. Readings in Attitude Theory and Measurement. John Wiley & Sons, Inc., New York, 1967.
8. Gewald, H. and Franke, J. The risks of business process outsourcing: A two-fold assessment in the German banking industry. International Journal of Electronic Finance 1, 4 (2007), 420–441.
9. Hoehle, H., Zhang, X., and Venkatesh, V. An espoused cultural perspective to understand continued intention to use mobile applications: A four-country study of mobile social media application usability. European Journal of Information Systems 24, 3 (2015), 337–359.
10. Hofstede, G. and Bond, M.H. The Confucius connection: From cultural roots to economic growth. Organizational Dynamics 16, 4 (Spring 1988), 5–21.
11. Hofstede, G., Hofstede, G.J., and Minkov, M. Cultures and Organizations: Software of the Mind. McGraw Hill, New York, 2010.
12. Kim, D. and Olfman, L. Determinants of corporate Web services adoption: A survey of companies in Korea. Communications of the AIS 29, 1 (2011), 1–24.
13. Köffer, S., Anlauf, L., Ortbach, K., and Niehaves, B. The intensified blurring of boundaries between work and private life through IT consumerization. In Proceedings of the European Conference on Information Systems (Münster, Germany, 2015).
14. Köffer, S., Ortbach, K., Junglas, I., Niehaves, B., and Harris, J. Innovation through BYOD? Business & Information Systems Engineering 57, 6 (Dec. 2015), 1–13.
15. Kraut, R. and Burke, M. Internet use and psychological well-being: Effects of activity and audience. Commun. ACM 58, 12 (Dec. 2015), 94–100.
16. Lee, M.-C. Factors influencing the adoption of Internet banking: An integration of TAM and TPB with perceived risk and perceived benefit. Electronic Commerce Research and Applications 8, 3 (May-June 2009), 130–141.
17. Li, Y., Wang, X., Lin, X., and Hajli, M. Seeking and sharing health information on social media: A net-valence model and cross-cultural comparison. Technological Forecasting and Social Change (July 20, 2016); https://doi.org/10.1016/j.techfore.2016.07.021
18. Liljander, V. and Strandvik, T. Estimating zones of tolerance in perceived service quality and perceived service value. International Journal of Service Industry Management 4, 2 (1993), 6–28.
19. Liu, Y., Yang, Y., and Li, H. A unified risk-benefit analysis framework for investigating mobile payment adoption. In Proceedings of the 2012 International Conference on Mobile Business. AIS Electronic Library, Atlanta, GA, 2012; http://aisel.aisnet.org/icmb2012/20
20. Lyons, S. and Kuron, L. Generational differences in the workplace: A review of the evidence and directions for future research. Journal of Organizational Behavior 35, S1 (Feb. 2014), 139–157.
21. Miller, K.W., Voas, J., and Hurlburt, G.F. BYOD: Security and privacy considerations. IT Professional 14, 5 (Sept.-Oct. 2012), 53–55.
22. Moore, G.C. and Benbasat, I. Development of an instrument to measure the perceptions of adopting an information technology innovation. Information Systems Research 2, 3 (1991), 192–222.
23. Nadler, J.T., Morr, R., and Naumann, S. Millennials, Media, and Research: Ageism and the Younger Worker. Palgrave Macmillan, London, U.K., 2017.
24. Ng, E.S.W., Schweitzer, L., and Lyons, S.T. New generation, great expectations: A field study of the millennial generation. Journal of Business and Psychology 25, 2 (June 2010), 281–292.
25. Niehaves, B., Köffer, S., and Ortbach, K. IT consumerization: A theory and practice review. In Proceedings of the 18th Americas Conference on Information Systems (Seattle, WA, Aug. 9–11). Association for Information Systems, Atlanta, GA, 2012.
26. Peter, J.P. and Tarpey Sr., L.X. A comparative analysis of three consumer decision strategies. Journal of Consumer Research 2, 1 (June 1975), 29–37.
27. Pew Research. Teens, Social Media, and Privacy. Pew Research, Washington, D.C., 2013; http://www.pewinternet.org/files/2013/05/PIP_TeensSocialMediaandPrivacy_PDF.pdf
28. Rapoza, J. The increasing costs of BYOD. Aberdeen Group Market Alert, Boston, MA, 2014.
29. Rogers, E.M. Diffusion of Innovations. Free Press, New York, 1995.
30. Srite, M. and Karahanna, E. The role of espoused national cultural values in technology acceptance. MIS Quarterly 30, 3 (Sept. 2006), 679–704.
31. Steelman, Z.R., Lacity, M., and Sabherwal, R. Charting your organization's bring-your-own-device voyage. MIS Quarterly Executive 15, 2 (2016), 85–104.
32. Twenge, J.M. The evidence for Generation Me and against Generation We. Emerging Adulthood 1, 1 (2013), 11–16.
33. Venkatesh, V., Morris, M., Davis, G., and Davis, F. User acceptance of information technology: Toward a unified view. MIS Quarterly 27, 3 (Sept. 2003), 425–478.
34. Verizon. Data Breach Investigations Report, 2015; https://iapp.org/media/pdf/resource_center/Verizon_data-breach-investigation-report-2015.pdf
35. Vodanovich, S., Sundaram, D., and Myers, M. Digital natives and ubiquitous information systems. Information Systems Research 21, 4 (2010), 711–723.
36. Weeger, A. and Gewald, H. Factors influencing future employees' decision making to participate in a BYOD program: Does risk matter? In Proceedings of the 22nd European Conference on Information Systems (Tel Aviv, Israel), 2014.
37. Weeger, A., Wang, X., and Gewald, H. IT consumerization: BYOD-program acceptance and its impact on employer attractiveness. Journal of Computer Information Systems 56, 1 (Fall 2016), 1–10.
38. Weiß, F. and Leimeister, J.M. Consumerization: IT innovations from the consumer market as a challenge for corporate IT. Business & Information Systems Engineering 54, 6 (Dec. 2012), 363–366.

Heiko Gewald is a research professor of information management and director of the Center for Research on Service Sciences at Neu-Ulm University of Applied Sciences, Neu-Ulm, Germany.

Xuequn Wang is a lecturer at Murdoch University School of Engineering and Information Technology, Perth, Australia.

Andy Weeger is a research assistant at Neu-Ulm University of Applied Sciences, Neu-Ulm, Germany, and a Ph.D. candidate at the University of Bamberg, Bamberg, Germany.

Mahesh S. Raisinghani is a professor of management information systems in the MBA (Executive Track) in the College of Business Administration at Texas Woman's University, Denton, TX.

Gerald Grant is a director of the Centre for Information Technology, Organizations, and People and associate professor of information systems in the Sprott School of Business, Carleton University, Ottawa, Canada.

Otavio P. Sanchez is a professor in the Business Administration Ph.D. Program at Fundação Getulio Vargas, Sao Paulo, Brazil.

Siddhi Pittayachawan is a senior lecturer of information systems and supply chain management in the School of Business IT and Logistics at RMIT University, Melbourne, Australia.

Copyright held by the authors.
Publication rights licensed to ACM. $15.00

review articles
DOI:10.1145/3048384
Internet Advertising: Technology, Ethics, and a Serious Difference of Opinion

Exploring the technical and ethical issues surrounding Internet advertising and ad blocking.

BY STEPHEN B. WICKER AND KOLBEINN KARLSSON

"Every time you block an ad, what you're really blocking is food from entering a child's mouth."25

"In reality, ad blockers are one of the few tools that we as users have if we want to push back against the perverse design logic that has cannibalized the soul of the Web."26

key insights
• Internet advertisers use networks of supply- and demand-side platforms and automated auctions to deliver targeted advertising to readers in a matter of milliseconds.
• Ad blocking is an existential threat to the Internet advertising industry, with costs to advertisers ranging in the tens of billions of dollars.
• The argument that ad blockers violate an implicit contract between the reader and content provider fails on legal grounds.
• A virtue ethics-based analysis clearly supports ad blockers, while also pointing to solutions that may benefit all stakeholders.

IN FALL 2015, Apple introduced a "content blocking" extension point into its Safari mobile browser, providing a hook for software that prevents advertisements from being loaded when Web pages are rendered.27 As it turned out, large numbers of people wanted to do just that.28 Ad blockers had been available for some time, but their potential use in the world's most popular mobile browser heightened their saliency and brought the debate over their use (a debate sometimes serious and nuanced, but often frivolous) into the mainstream media.29

To put the issue into perspective, consider the following figures provided by PageFair, "a leading provider of counter ad block solutions to Web publishers," in its 2015 report30 and 2016 report31 on ad blocking:
• Ad blocking was estimated to have cost publishers nearly $22 billion during 2015.
• As of November 2016, at least 309 million people are blocking advertising on their smartphones.
• 298 million of these people use an ad blocking browser, more than twice the number using blocking browsers in 2015.
• Ad blocking is particularly popular in emerging markets, with the largest number of active monthly users in China, India, and Indonesia. The U.S. is in ninth place.

In its 2016 report, PageFair made the following prediction: Mobile ad blocking is a serious threat to the future of media and journalism in emerging markets, where people are coming online for the first time via relatively expensive


or slow mobile connections. Usage in Western economies is likely to grow as more manufacturers and browsers start to include ad blocking as a feature.31

Given the amount of money involved in advertising, one might expect a certain amount of invective on the subject of ad blocking. One would be correct. Ad blocking has been referred to as "evil" and as a form of "theft."32 Ad Age, an advertising industry trade magazine, accused ad blockers of being exploitative, extortionate, and antidemocratic, all within the space of a single sentence:

As abetted by for-profit technology companies, ad blocking is robbery, plain and simple—an extortionist scheme that exploits consumer disaffection and risks distorting the economics of democratic capitalism.33

Randall Rothenberg, president and chief executive officer of the Interactive Advertising Bureau, accuses ad blocking "profiteers" of "stealing from publishers, subverting freedom of the press, operating a business model predicated on censorship of content, and ultimately forcing consumers to pay more money for less—and less diverse—information."34

On the other side of the debate, many have pointed to the ads themselves as fostering needless consumption while being tasteless, intrusive, and evil (this word occurs a lot in these discussions), while suggesting that the advertising industry brought ad blocking upon itself.1

There are purely technical issues as

on wireless cellular links, a burden that is usually funded by unwilling users. The ad-blocking software provider Shine, an Israeli startup that began life in 2011 as an anti-virus software developer, estimates that advertising consumes between 10% and 50% of user data plans, depending on user location. A typical mobile gaming app with advertising was found to consume 5Mb over a five-minute session, but only 50Kb with ad blocking in place.17

Shine produces ad-blocking software that can be incorporated into cellular datacenters. In June 2016, the U.K. cellular service provider Three became the first to conduct trials using this software to block ads on cellular data connections.12 Given that marketers are expected to have spent over $100 billion on mobile ads in 2016,10 the response is expected to be extreme.

In this article, we explore how advertising networks and ad blockers work. We further consider how ad blockers are subverted, and whether they are ethical. The ethical analysis yields mixed results, but it does suggest a solution that empowers users, allowing them to select the types of ads that they see and how often they see them.

The Technology of Ad Networks and Ad Blockers
Web browsers request a Web page from a server by sending an HTTP GET command to the appropriate Internet host. The host responds with HTML code that the Web browser uses to render the desired page and present it to

Suppose the Web browser requests a page from a content publisher that supports his or her work through advertising (this is represented in the accompanying figure by link 1). Most publishers do not generate their own advertising content, so they will embed requests for advertising into the HTML files they send to requesting users (link 2). When the requesting host attempts to render the HTML file, it will generate requests for advertisements from an ad exchange. The ad exchange, as shown in the figure, sits at the center of a network consisting of supply-side and demand-side entities. The supply-side entities provide information about the user, while the demand-side entities provide advertising in response to requests from publishers.

The HTML code provided by the publisher directs the host to a supply-side platform (SSP, link 3). The request sent to the SSP includes a cookie, a small string of information that was previously stored by the SSP on the user's computer. The cookie enables the SSP to craft a response that is specifically tailored to the requesting user. In this case, the cookie will include a user ID that the ad exchange can use to coordinate bidding for an advertisement.

The ad exchange forwards the user ID and any other information that it may have about the requesting user to one or more demand-side platforms (DSPs) that place bids on behalf of advertisers for the opportunity to display their ads. Through a process known as cookie syncing, the DSPs are able to
well. The technology that allows Inter- the user. This much is both simple and match the SSP cookie ID to a user pro-
net advertisers to better target potential ubiquitous, but the details, particularly file, which is often stored and managed
consumers slows the loading of Web when advertising is involved, are much by a separate entity called a data man-
pages and places a significant burden more complicated. agement platform (DMP).
As multiple SSPs and DSPs can use
Internet ad delivery is a complex process involving multiple redirections, synchronization of the same DMP, the DMP may link a wide
user information, and an auction, all in a few tens of milliseconds.
range of user IDs to the same person.
Normal HTTP requests/redirects
This enables all interested parties (other
Cookie ID included in HTTP request SSP than the user) to exchange information
Actual ad served 4
on the user and form a more complete
3
picture of that user’s browsing history.
First-party websites may also partici-
1 pate in the process, providing yet more
Publisher Ad Data Economy
DMP
Website Exchange user information. For example, if a user
2
5 supplies an email address to a website
7 to sign up for its newsletter, the email
8 address can be linked at the DMP to the
DSP
cookie IDs associated with that user. If
Ad Server 6
the user provides a name and address
to a website, that information may also

72 COMM UNICATIO NS O F THE ACM | O C TO BER 201 7 | VO L . 60 | NO. 1 0


review articles

be linked to the cookie IDs. The DMP Ad blockers can use several methods
may take this a step further by including to disrupt the process described ear-
information inferred from the user’s lier, thus prevent ads from being dis-
social media activity, purchase history played. Many prominent ad blockers,
on various sites, search history, and
email messages. Finally, the DMP may Given the amount such as Adblock Plus and its variants,
block ads by preventing the browser
have access to data gathered offline.
Data aggregators are known to collect
of money involved from sending HTTP requests to certain
URLs. The URL blacklist for a given
data from publicly available records, in- in advertising, blocker is often a crowd sourced effort,
cluding licensing records (for example,
licenses for doctors, lawyers, pilots, or
one might expect such as EasyList, the default blacklist
used in Adblock Plus. EasyList is prob-
hunters or fishermen licenses), voter a certain amount ably the most widely used blacklist;
registration databases, court records,
and DMV records, as well as buying data
of invective on the number of EasyList downloads was
used by PageFair and Adobe to esti-
from commercial sources including the subject mate the prevalence of ad blocking in
brick-and-mortar store purchase histo-
ries and transaction information from of ad blocking. their 2015 joint report.30
While URL blacklisting appears to be
financial services companies.9 Data ag- One would the most common method of ad block-
gregators also buy and sell information
from each other. This whole system of be correct. ing, the Electronic Frontier Founda-
tion’s Privacy Badger takes a different
transactions is often referred to as “the approach,36 attempting to learn which
data economy.” Through this data econ- domains and sites are tracking a user
omy, the DMP is able to build a strik- and blocking the ones that do. It detects
ingly detailed simulacrum of an indi- behavior such as the use of uniquely
vidual consumer, a simulacrum whose identifying cookies, canvas fingerprint-
accuracy drives the advertisers’ return ing, and the appearance of the same
on investment, and whose inaccuracy third-party site at multiple domains. As
may drive the consumer to distraction. such, it blocks very few domains at first,
If a DSP determines the user profile but the more it is used, the more it learns
fits its target audience, it places a bid for to block. It should be noted that Privacy
advertising space on the web page being Badger aims to prevent tracking, not
rendered by the host computer. The ad ads; but since the two are intimately con-
exchange selects an ad from among the nected, it often serves both purposes.
bidding DSPs; a Vickrey auction is gen- A third ad blocking method blocks
erally used, where the highest offer is website elements fitting certain pat-
selected and the amount paid is that of- terns; for example, it could look for the
fered by the second highest bidder. The “iframe” HTML tag and check to see
winning DSP provides a URL for retriev- if it contains text strings like “Spon-
ing the ad. In an “impression”-based sored” or links to a URL with the word
system, an agency ad server determines “ad” in it. This method can block ad-
whether the ad is actually downloaded, vertising served by the Web site itself
and pays the publisher accordingly, as opposed to just third parties, adver-
with the cost per thousand impressions tising that includes ads embedded in
(CPM) being the most widely used sta- search results and social media feeds.
tistic in Internet advertising.35 All of this This content filtering can happen
happens within tens of milliseconds, at the client or at an intermediate
though the actual loading of the win- proxy. Some ad blockers use a root cer-
ning ad into the user’s browser may take tificate to redirect browser requests to
far longer depending on the bandwidth a VPN or proxy that removes ad con-
of the user connection and the size of tent using the methods previously
the ad. mentioned before forwarding the
As a result of this process, the pub- HTML code to the browser. This ap-
lisher of the content often has limited proach can block ads for mobile apps
control over the safety, quality, or taste- as well as browsers, but it comes with
fulness of the ad seen by the content the risks associated with having third
consumer. A publisher may, for exam- parties interfere with browser traffic,
ple, be able to prevent advertisements risks that include the classic man-in-
from a particular advertiser or class of the-middle attacks. Apple recently re-
advertisers, but she may not be able to moved several ad-blocking apps from
exercise finer control. its app store on this basis.37
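The article does not say how a DMP stores the identifier links it accumulates through cookie syncing and first-party sign-ups. As a minimal sketch, assuming a simple in-memory model, “linking a wide range of user IDs to the same person” can be represented as a union-find (disjoint-set) structure: each identifier starts in its own set, and every sync or sign-up event merges two sets. The class name `IdentityGraph` and all identifier strings are hypothetical.

```python
from typing import Dict, Set


class IdentityGraph:
    """Hypothetical DMP-style identity store built on union-find (disjoint sets)."""

    def __init__(self) -> None:
        # Maps each identifier to its parent; a root maps to itself.
        self._parent: Dict[str, str] = {}

    def _find(self, ident: str) -> str:
        """Return the root of ident's set, adding ident as a singleton if unseen."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk directly at the root.
        while self._parent[ident] != root:
            nxt = self._parent[ident]
            self._parent[ident] = root
            ident = nxt
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers belong to the same person (e.g., a cookie sync)."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def profile_ids(self, ident: str) -> Set[str]:
        """All identifiers currently linked to ident."""
        root = self._find(ident)
        return {i for i in list(self._parent) if self._find(i) == root}


if __name__ == "__main__":
    dmp = IdentityGraph()
    dmp.link("ssp1:cookie-123", "dsp7:cookie-abc")  # cookie syncing between an SSP and a DSP
    dmp.link("dsp7:cookie-abc", "email-hash:9f2c")  # newsletter sign-up on a first-party site
    dmp.link("ssp2:cookie-456", "email-hash:9f2c")  # a second SSP sees the same email
    print(dmp.same_person("ssp1:cookie-123", "ssp2:cookie-456"))  # True
```

The point of the structure is transitivity: the two SSP cookies were never synced with each other directly, yet once both are tied to the same email-derived identifier, every party sharing the DMP can treat them as one person.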
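The Vickrey rule and CPM pricing described earlier in this section reduce to a few lines of arithmetic. The sketch below is a simplified model, not any exchange’s actual implementation. The DSP names and bid values are hypothetical. Bids are expressed as CPM (dollars per thousand impressions). A lone bidder is assumed to pay its own bid; a real exchange would typically also apply a reserve (floor) price, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Bid:
    dsp: str    # hypothetical demand-side platform name
    cpm: float  # offered price in dollars per thousand impressions


def second_price_auction(bids: List[Bid]) -> Optional[Tuple[str, float]]:
    """Vickrey (second-price) auction: the highest bid wins, but the winner
    pays the second-highest offer. Returns (winning DSP, clearing CPM)."""
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b.cpm, reverse=True)
    winner = ranked[0]
    # Simplification: with a single bid, the winner pays its own offer.
    clearing_cpm = ranked[1].cpm if len(ranked) > 1 else winner.cpm
    return winner.dsp, clearing_cpm


def price_per_impression(cpm: float) -> float:
    """CPM is quoted per 1,000 impressions, so one impression costs CPM / 1000."""
    return cpm / 1000.0


if __name__ == "__main__":
    bids = [Bid("dsp-a", 4.00), Bid("dsp-b", 2.50), Bid("dsp-c", 3.10)]
    winner, cpm = second_price_auction(bids)
    print(winner, cpm)  # dsp-a 3.1
    print(price_per_impression(cpm))
```

Because the winner pays the runner-up’s price rather than its own, bidding one’s true valuation is a dominant strategy in this idealized model, which is the usual motivation for second-price designs.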
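The three blocking techniques described above can each be sketched in a few lines. This illustration rests on stated assumptions and is not the logic of Adblock Plus, EasyList, or Privacy Badger. The blacklist uses a hypothetical, much-simplified domain-only rule format rather than real EasyList filter syntax. The tracker learner counts the distinct first-party sites on which a third-party domain sets a uniquely identifying cookie and blocks it once an assumed threshold of three is reached. The element check flags iframes whose text contains “Sponsored” or whose source URL contains “ad” as a separate word. All domain names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set
from urllib.parse import urlsplit

# 1. URL blacklisting (hypothetical, domain-only rules; real EasyList syntax is richer).
BLACKLIST: Set[str] = {"ads.example.net", "tracker.example.org"}


def is_blacklisted(url: str, blacklist: Iterable[str] = BLACKLIST) -> bool:
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blacklist)


# 2. Learning which third parties track the user (loosely inspired by Privacy Badger).
class TrackerLearner:
    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # assumed number of distinct sites before blocking
        self._seen_on: Dict[str, Set[str]] = defaultdict(set)

    def observe(self, first_party: str, third_party: str, sets_unique_cookie: bool) -> None:
        """Record a third-party request made while the user visits first_party."""
        if sets_unique_cookie and third_party != first_party:
            self._seen_on[third_party].add(first_party)

    def should_block(self, third_party: str) -> bool:
        # .get() avoids creating empty entries in the defaultdict.
        return len(self._seen_on.get(third_party, ())) >= self.threshold


# 3. Pattern-based element blocking: iframes marked "Sponsored" or pointing at an "ad" URL.
AD_WORD = re.compile(r"(?<![a-z])ad(?![a-z])", re.IGNORECASE)


def looks_like_ad(tag: str, text: str, src: str) -> bool:
    if tag.lower() != "iframe":
        return False
    return "sponsored" in text.lower() or AD_WORD.search(src) is not None
```

For example, `is_blacklisted("https://cdn.ads.example.net/banner.js")` is true because the host is a subdomain of a listed domain, while `looks_like_ad("iframe", "", "https://news.example.com/reading")` is false because “ad” appears only inside a longer word. The learner starts out blocking nothing, which mirrors the observation that Privacy Badger blocks very few domains at first.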


Publishers will sometimes try to circumvent attempts at ad blocking. Anti-ad blocking usually works by serving a fake ad in some way and verifying that it has been loaded or displayed. If it fails to load, the site stops displaying the primary content or refuses to load it in the first place. For example, a site can contain an iframe ostentatiously marked as “Advertising” and then use JavaScript to see if it was displayed.a If the iframe is not displayed, the site does not provide the primary content. Similarly, the browser can be directed to load a JavaScript file with a name, such as “ads.js,” that can be found in common ad blocker filter rules, and the site can then check whether it runs. Aside from trying to explicitly detect ad blockers, ad networks can obfuscate the URLs of their ads, such as by using IP addresses instead of domain names.

Ad blockers can often adapt, circumventing new anti-ad blocking mechanisms. Facebook recently announced it would prevent ad blocking,38 only to have Adblock Plus announce a few days later that it had found a way to defeat Facebook’s prevention technique.39 This is but one example of the evolving arms race between publishers and ad blockers.16

Though the initial motivation for ad blocking may be annoying ads or tracking, increased computer security is a major side benefit. Online ads are usually pieces of code as opposed to static images or text. The end result of the ad auction process described earlier is that the user’s browser is redirected to a URL of the advertiser’s choosing. The retrieved object may take the form of JavaScript, Flash, or even Java code. Vulnerabilities in these frameworks can be used to execute malicious code on the client machine without the user noticing anything out of the ordinary. Even though browser support for Java and Adobe Flash is being phased out,40 vulnerabilities in these frameworks are still being exploited. Java exploits are on the decline, but Flash vulnerabilities are still some of the most common vehicles for malvertising.4 While ad networks have measures in place to detect malvertising, there are ways to circumvent and avoid detection, at least temporarily, such as serving a legitimate ad until the ad network has approved the ad, only serving malvertising every 10th or 20th time, and not serving malvertising to certain IP addresses.5 Even large and reputable websites have been known to accidentally serve malvertising, making malvertising a potential problem for every Internet user.

Is Ad Blocking a Breach of Contract?

It would take a long law review article, and one written by another set of authors, to properly address the legality of ad blocking. We do, however, wish to address the oft-cited argument that the provision of free content that contains ads is done under an “implicit contract.”41 Under this contract, the consumer is provided with free content in return for the user’s agreement to view advertisements. This is not a new argument, as it has been applied by network executives to broadcast television for many years, sometimes in a very extreme form. In 2002, Jamie Kellner, then CEO of Turner Broadcasting, suggested that any systematic practice of using the bathroom during commercials was stealing.7

In the U.S., the legal concept underlying this argument is the “implied-in-fact” contract.b The law is summarized as follows:

To establish the existence of an implied in fact contract, it is necessary to show: an unambiguous offer, unambiguous acceptance, mutual intent to be bound, and consideration. However, these elements may be established by the conduct of the parties rather than through express written or oral agreements.42

As an example, suppose you agree to wash your neighbor’s car once a week. You receive payment for each of the first six weeks, but upon washing his car the seventh time, your neighbor refuses to pay because there was no written agreement. Most courts would agree that there was an implied-in-fact contract as evidenced by the conduct of the parties for the first six weeks. Your neighbor has to pay.

Now consider a real example in which it was found that there was no implied-in-fact contract. In 1917, the U.S. leased a pier from the Baltimore and Ohio railroad for the purpose of handling supplies destined for the war in Europe. An earlier fire was believed to have been an act of sabotage, so soldiers were deployed to guard the pier and surrounding equipment. The weather was cold, and the troop commander often complained about the tents in which his men were forced to live. A railroad official offered to build temporary barracks. Though there was never any discussion of compensation, the barracks were built. The railroad later sued to recover the cost of the construction, arguing there had been an implied-in-fact contract. In what became the 1923 case of Baltimore & Ohio R. Co. v. United States,2 the Supreme Court disagreed. The Court stated that an “implied agreement” required “a meeting of minds inferred, as a fact, from conduct of the parties in the light of surrounding circumstances.” The Court found there had been no such meeting of minds, as the railroad company never intimated that it would expect payment from the government.

It follows that there are several reasons the alleged quid pro quo of viewing ads in return for free Internet content fails to rise to the level of an implied-in-fact contract. First, as with the Baltimore & Ohio case, there is no unambiguous offer. The Internet content consumer is rarely told precisely what is going to be loaded into his or her Web browser, and what is expected in return. Content consumers suffer the embedding of ads and, on occasion, trackers and other forms of spyware into their Web browsers without receiving any notice from the content provider whatsoever. In fact, as we have seen, the content provider may not know what is being injected into the consumer’s browser.

Second, the alleged agreement fails to satisfy the unambiguous acceptance element. Unlike the car-washing example, there is no prior conduct that indicates a general understanding that an agreement is in place. The popularity of ad blockers20,30,31 indicates that most consumers do not want to see the ads, and clearly have not agreed to do so. A Reuters survey provides further evidence, indicating that even those who do not employ ad blockers are ignoring or avoiding the ads:

More generally, a third or more (39% in the U.K. and 30% in the U.S.) say they ignore ads. Around three in 10 (31%/29%) say they actively avoid sites where ads interfere with the content.20

a See, for example, http://adblockingdetector.johnmorris.me/how-thisplugin-works/.

b U.S. contract law allows for two other types of contracts: express contracts (written) and implied-in-law contracts (also called “quasi contracts,” which are more legal obligations than true contracts). We only address U.S. contract law here; other jurisdictions may be substantially different, and are well beyond our expertise.

Are Ad Blockers Unethical?

In After Virtue, Alasdair MacIntyre describes the breakdown in ethical argument that occurs when the foundations for ethical systems are cut away, leaving proponents of differing perspectives to argue past each other without any basis for decisive engagement.14 Ad blocking provides a canonical example, as we have one group arguing for individual rights (the right to receive payment for one’s effort in providing content), while the other group argues for the general welfare (an Internet devoid of continual distraction caused by tasteless advertising). It is not clear how the two arguments can be reconciled, or how one can clearly overcome the other. We suggest a solution lies in a technologically mediated meeting of minds, but before we consider the solution, we offer a more detailed account of the ethical arguments.

The utilitarian approach, first propounded by Jeremy Bentham and John Stuart Mill in the late 18th and 19th centuries, is based on the familiar precept that “it is the greatest happiness of the greatest number that is the measure of right and wrong.”3 In what follows, we consider act utilitarianism, which focuses on the consequences of individual actions. We also substitute “well-being” for “happiness” to counter some of the more obvious criticisms of utilitarianism.

Does the use of ad blockers create the greatest well-being for the greatest number? Those affected by the decision to block ads include the following:

• Ad-blocking users,
• Ad-viewing users,
• Content generators,
• Content publishers, and
• Advertisers.

Should users choose to employ ad blockers, the following will arguably result:

• The ad-blocking users will see fewer advertisements.
• The content generators will receive less revenue per reading user.
• The content publishers will receive less revenue per reading user.
• Advertisers will seek other venues for their advertising dollars.
• Some content generators will stop generating content.
• Some content publishers will stop publishing content.
• Some content publishers will publish content of lower quality.
• There will be less free content available to all users on the Internet, and the content that remains freely available will, in some cases, be of reduced quality.

It is important to provide some context for the suggestion that the quality of online content will be diminished by a general acceptance of ad blocking. Newspaper journalism was in decline well before the advent of ad blocking, or even the advent of the Internet, primarily because of the failure of its core business model.15 The business model was that of a quasi-monopoly: competition was limited, so that a local paper could charge higher prices for advertising, and then use the revenue to maintain reporters across the world. In essence, the local Wal-Mart paid for the Baghdad bureau through its advertising dollars. The limit on competition was due to a fact of technology: printing presses were very expensive to operate and maintain, so all but the largest municipalities could sustain only one or two (print) newspapers at any given point in time.22 In a pre-Internet world, the papers acted as an intermediary between advertisers and consumers, charging both for the opportunity to communicate. In a multi-newspaper market, the equilibrium was often unstable; a notable scoop could send more advertising dollars to the scooping paper, allowing the scooper to grow (literally) fatter and more attractive to the buyers.

The unraveling of this relationship began with the television era and the movement of affluent readers from the inner city to the suburbs. National and retail advertisers moved their dollars to television, and newspapers came to depend more on classified ads.6 With the advent of the Internet in general and Craig’s List in particular (founded in 1995), classified advertising revenue also began to leave the newspapers’ balance sheets. By 2010 the newspaper industry was in deep decline, with many major players facing bankruptcy (for example, the Tribune Company in 2008), and others left to cope with dramatically reduced staffs.

The consequences of a general use of ad blockers may thus be characterized as a further reduction in the quality of free online content through the departure of some Internet content generators and publishers to other ways of making a living. For large numbers of consumers these are apparently acceptable outcomes given what they avoid: the problems associated with spyware and the relentless distraction of advertising. There is also evidence that Internet readers do not greatly value what they are reading; given the choice between paying for the content and losing it, most prefer the latter. The aforementioned Reuters survey found that only 10% of online users appeared to be willing to pay for once-free news content:

After a sharp upturn in 2012–2013—when a large number of paywalls were introduced—our data shows very little change in the absolute number of people paying for digital news over the past year. In most countries the number paying for any news is hovering around 10% of online users and in some cases less than that.20

If Internet readers and users of ad blockers are rational actors who are making decisions based on their individual well-being, then, as the readers outnumber the writers and advertisers, one may conclude that the use of ad blockers provides the greatest well-being for the greatest number. From a utilitarian perspective, ad blocking is ethical; the content providers should look for a better business model.

The counterargument is ready at hand: this analysis clearly does not take into account all stakeholders; the content generators and publishers, for example, would almost certainly not be pleased with the consequences of this utilitarian calculus. This is an example of a key criticism of utilitarianism; namely, that in emphasizing aggregate well-being, some individuals may be left in far worse condition than before.

A deontic analysis avoids this particular problem. Immanuel Kant suggested in his Groundwork of the Metaphysic of Morals that there is a single primary moral obligation, which he referred to as the “categorical imperative” (CI). Kant offered several formulations of CI, including one that sounds very much like the golden rule: “act so that you use humanity, as much in your own person as in the person of every other, always at the same time as end and never merely as means.”13 Ad-blocking readers arguably do not satisfy this formulation: they treat the content generators as means rather than as ends in themselves, taking their work product without respecting their efforts to make a living. It appears that Kant is on the side of the advertisers, while Bentham favors the general reader.

Contractualism, an ethical theory related to Kant’s deontological approach,18 more clearly takes into account all interested parties, while pointing to a potential solution. In What We Owe Each Other, T.M. Scanlon21 offers the following ethical rule for action (emphasis added):

An act is wrong if its performance under the circumstances would be disallowed by any set of principles for the general regulation of behavior that no one could reasonably reject as a basis for informed, unforced, general agreement.

In establishing rules for behavior, Scanlon suggests that we must consider the perspectives of all stakeholders, and define a basis for informed general agreement. This would require communication between all stakeholders, something that is sorely lacking in the context of online advertising. We will return to this point when we consider possible solutions.

The third and final approach to be considered shifts the balance of the argument in favor of the general reader, but on a far firmer basis than the arguments of Bentham et al. Aretaic, or virtue, ethics emphasizes virtues of mind and character.8,11 Virtue ethics originated with Aristotle’s Nicomachean Ethics and his notion that the ultimate aim (telos) of an individual is to live a virtuous life. A virtuous life is a life lived according to reason, where decisions are based on a set of values held dear by the individual. Virtue ethics thus involves the questions “What is desirable, good or morally worthwhile in life?” and “What values should we pursue for ourselves and others?”8

Virtue ethics has enjoyed a recent resurgence, both in philosophy departments and in schools of technology. With regard to the latter, value-based design practices have been developed based on various lists of fundamental human values. For example, in Ethical IT Innovation, Sarah Spiekermann points to both Aristotle and Maslow while concluding that technical design must be based on an understanding that knowledge, freedom, and autonomy are preconditions for human growth, self-esteem, friendship, and self-actualization.24

At best, the design of advertising technology shows little concern for the knowledge, freedom, and autonomy of consumers. At worst, advertising technology actively works to subvert these values. This subversion can be seen through the lens of the “attention economy,” a term coined by Herbert Simon to capture the finite nature of individuals’ attention in the face of a seemingly infinite amount of information.23 The attention economy is reflected in advertisers’ insertion of themselves into virtually all personal interactions in everyday life, ranging from highway billboards to doctors’ offices to the bottoms of the trays at airport security. Writing for the “Practical Ethics” blog of Oxford University, James Williams argues the resulting distractions are more than an annoyance; they “keep us from living the lives we want to live”:

In the short term, distractions can keep us from doing the things we want to do. In the longer term, however, they can accumulate and keep us from living the lives we want to live, or, even worse, undermine our capacities for reflection and self-regulation, making it harder, in the words of Harry Frankfurt, to “want what we want to want.” Thus there are deep ethical implications lurking here for freedom, well-being, and even the integrity of the self.19

From a virtue ethics standpoint, it follows that the design of Internet advertising technology is itself unethical in that it works against the human project of self-creation. Ad blockers are thus not only ethical, but are literally a matter of self-defense. Quoting the Practical Ethics blog once again:

In reality, ad blockers are one of the few tools that we as users have if we want to push back against the perverse design logic that has cannibalized the soul of the Web.6

Solutions and Conclusion

The advertising delivery systems described in this article are the antithesis of value-based design. The values that Spiekermann and others point to as a foundation for virtue-based design (knowledge, freedom, and autonomy) are precisely the values that online advertising systems most systematically undermine.

Internet advertisers exchange information about users without their knowledge or control, using that information to manipulate users into behavior they might not otherwise have exhibited. This summary may seem harsh, and some may argue that advertisers would happily engage in more ethical behavior if better channels of communication were provided to interested consumers. A solution beneficial to all may lie in a virtue-based redesign. Such a redesign would embed T.M. Scanlon’s suggestion that there be an “informed, unforced, general agreement” among all parties. The agreement would be based on a system that provides revenue for content generators and connects advertisers to interested consumers while reducing the deleterious impact of the current system of advertising on the reading public. The key step lies in empowering the reading/consuming public, letting them choose whether they will download ads, and if so, what type of ads. Should a reader choose not to download ads, he or she should be given the opportunity to pay for ad-free content.

The supporting technology for such a system already exists in the current ad networks. Recall that the current scheme of directed Internet advertising relies on the use of cookies stored on user machines. These cookies are sent to supply-side and demand-side platforms to obtain directed advertising for insertion into content initially requested by a user. Suppose the cookies are replaced by information explic-

Such a solution will require careful design and far more communication between stakeholders than currently takes place, but it offers the potential for clearly informing readers of their options, options upon which they can exercise rational choice in pursuit of their own individual goals. We hope that advertisers see this as an opportunity.

We have argued that ad blocking is not a violation of an existing contract (at least in U.S. law). This does not mean that ad blocking is beyond the reach of earnest lobbyists and subsequent legislation. One might expect, however, that such legislation would not be very popular with the general public. We hope the agreement suggested here takes form before the battle between advertisers and ad blockers escalates any further.

Acknowledgments

The authors gratefully acknowledge the assistance of Sarah Wicker and Adam Engst with several technical, legal, and ethical issues. We also thank the editor and reviewers for their time, effort, and expertise.

References

1. Alexander, J., Crompton, T. and Shrubsole, G. Think Of Me As Evil? Opening The Ethical Debates In Advertising. Public Interest Research Centre (PIRC) and WWF-UK, Oct. 2011.
2. Baltimore & Ohio R. Co. v. United States, 261 U.S. 592 (1923).
3. Bentham, J. A fragment on government (1776). Reprinted in The Collected Works of Jeremy Bentham. J.H. Burns and H.L.A. Hart, eds. The Athlone Press, London, U.K., 1977, 393.
4. Cisco. Cisco 2016 Midyear Cybersecurity Report; http://www.cisco.com/c/m/en_us/offers/sc04/2016-midyearcybersecurity-report/index.html.
5. Cyphort Labs. The Rise of Malvertising, 2015; http://go.cyphort.com/Malvertising-Report-15-Page.html.
6. Downie, Jr., L. and Schudson, M. The reconstruction of American journalism. Will the Last Reporter Please Turn out the Lights: The Collapse of Journalism and What Can Be Done To Fix It. R.W. McChesney and V. Pickard, eds. The New Press, NY, 2011.
7. EFFector 15, 15 (May 24, 2002); https://www.eff.org/effector/15/15.
[…] Reporter Please Turn out the Lights: The Collapse of Journalism and What Can Be Done To Fix It. The New Press, NY, 2011.
16. Nithyanand, R. et al. Adblocking and Counter-Blocking: A Slice of the Arms Race (2016); arXiv:1605.05077.
17. O’Reilly, L. This ad blocking company has the potential to tear a hole right through the mobile Web—and it has the support of carriers. BusinessInsider.com, May 13, 2015.
18. Parfit, D. On What Matters, Vol. 1. Oxford University Press, 2011, 412–413.
19. Practical Ethics Blog. University of Oxford; http://blog.practicalethics.ox.ac.uk/2015/10/why-its-ok-to-block-ads/.
20. Reuters Institute Digital News Report 2015. Reuters Institute, University of Oxford; https://reutersinstitute.politics.ox.ac.uk/sites/default/files/Reuters%20Institute%20Digital%20News%20Report%202015_Full%20Report.pdf.
21. Scanlon, T.M. What We Owe Each Other. Belknap Press, Cambridge, MA, 2000.
22. Shirky, C. Newspapers and thinking the unthinkable. Will the Last Reporter Please Turn out the Lights: The Collapse of Journalism and What Can Be Done To Fix It. R.W. McChesney and V. Pickard, eds. The New Press, NY, 2011.
23. Simon, H.A. Designing organizations for an information-rich world. Computers, Communications, and the Public Interest. Johns Hopkins University Press, Baltimore, MD, 1971.
24. Spiekermann, S. Ethical IT Innovation: A Value-Based System Design Approach. CRC Press, Boca Raton, FL, 2015.
25. http://www.tomsguide.com/us/ad-blocking-is-stealing,news-20962.html
26. http://blog.practicalethics.ox.ac.uk/2015/10/why-its-ok-to-block-ads/
27. https://developer.apple.com/library/ios/releasenotes/General/WhatsNewIniOS/Articles/iOS9.html
28. https://adblockplus.org/blog/adblock-plus-for-ios-9-finallyhere-and-pssst-it-s-free
29. http://time.com/4052033/apple-iphone-ios-9-ad-blockers/
30. https://blog.pagefair.com/2015/ad-blocking-report/
31. https://pagefair.com/downloads/2016/05/Adblocking-Goes-Mobile.pdf
32. http://fortune.com/2015/09/18/dear-apple-i-may-rob-yourstore/; http://www.tomsguide.com/us/ad-blocking-is-stealing,news-20962.html
33. http://adage.com/article/digitalnext/ad-blocking-unnecessaryinternet-apocalypse/300470/
34. http://www.iab.com/news/rothenberg-says-ad-blocking-is-awar-against-diversity-and-freedom-of-expression/
35. http://www.allbusiness.com/web-advertising-and-cpm-aquick-guide-for-small-businesses-2646-1.html
36. https://www.eff.org/privacybadger
37. http://www.theguardian.com/technology/2015/oct/09/appleremoves-iphone-adblockers-facebook-third-party-apps
38. http://www.wsj.com/articles/facebook-will-force-advertisingon-ad-blocking-users-1470751204
39. http://www.cnet.com/news/facebook-adblock-plusworkaround-august-11
40. http://www.extremetech.com/computing/209888-mozillafirefox-kills-flash-by-default-security-chief-calls-for-adobe-toissue-an-end-of-life-date; http://www.bbc.com/news/technology-36301904
41. http://www.computerworld.com/article/2487367/ecommerce/ad-blockers--a-solution-or-a-problem-.html
8. Frankena, W. Ethics (2nd Edition). Prentice Hall, Upper
itly provided by the user that indicates Saddle River, NJ, 1973.
42. https://www.law.cornell.edu/wex/contract_implied_in_fact
buying habits and interest in specific 9. FTC. Data Brokers: A Call for Transparency and
Accountability; https://www.ftc.gov/system/files/
consumer goods. The data manage- documents/reports/data-brokerscall-transparency- Stephen B. Wicker ([email protected]) is a
professor in the School of Electrical and Computer
ment platform (DMP) would request, accountability-report-federal-trade-commissionmay-
Engineering at Cornell University, Ithaca, NY.
2014/140527databrokerreport.pdf
coordinate, and update this user-sup- 10. Hof, R. Mobile ads will smash $100 billion mark Kolbeinn Karlsson ([email protected]) is a graduate
plied information as necessary. Rather worldwide in 2016. Forbes (Apr. 2, 2015); http://www. student in the School of Electrical and Computer
forbes.com/sites/roberthof/2015/04/02/mobile- Engineering at Cornell University, Ithaca, NY.
than inferring potential sales from adswill-smash-100-billion-mark-worldwide-in-2016
browsing habits, advertising networks 11. Hursthouse, R. On Virtue Ethics. Oxford University
© 2017 ACM 0001-0782/17/10 $15.00
Press, 2002.
could make advertising bidding deci- 12. Jackson, J. Three network to run 24-hour ad
sions based on the clearly expressed blocking trial. The Guardian (May 25, 2016); https://
www.theguardian.com/media/2016/may/26/three-
desires of potential consumers. Such networkto-run-24-hour-adblocking-trial
a system would increase the agency of 13. Kant, I. Groundwork for the Metaphysics of Morals.
A. Wood, ed. Yale University Press, 2002, 37. Watch the authors discuss
the browsing user, while potentially 14. MacIntyre, A. After Virtue: A Study in Moral Theory their work in this exclusive
(3rd Edition). University of Notre Dame Press, South Communications video.
increasing return on investment for Bend, IN, 2007. https://cacm.acm.org/videos/
advertisers. 15. McChesney, R.W. and Pickard, V. Eds. Will the Last internet-advertising

This book, a revised version of the 2014 ACM Dissertation Award-winning dissertation, proposes an architecture for cluster computing systems that can tackle emerging data processing workloads at scale. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping MapReduce's scalability and fault tolerance. And whereas most deployed systems only support simple one-pass computations (e.g., SQL queries), ours also extends to the multi-pass algorithms required for complex analytics like machine learning. Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing.
We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using synthetic and real workloads. Spark matches or exceeds the performance of specialized systems in many domains, while offering stronger fault tolerance properties and allowing these workloads to be combined. Finally, we examine the generality of RDDs from both a theoretical modeling perspective and a systems perspective.
This version of the dissertation makes corrections throughout the text and adds a new section on the evolution of Apache Spark in industry since 2014. In addition, editing, formatting, and links for the references have been added.
research highlights

P. 80 Technical Perspective
Broadening and Deepening Query Optimization Yet Still Making Progress
By Jeffrey F. Naughton

P. 81 Multi-Objective Parametric Query Optimization
By Immanuel Trummer and Christoph Koch

P. 90 Technical Perspective
Shedding New Light on an Old Language Debate
By Jeffrey S. Foster

P. 91 A Large-Scale Study of Programming Languages and Code Quality in GitHub
By Baishakhi Ray, Daryl Posnett, Premkumar Devanbu, and Vladimir Filkov

DOI:10.1145/3068610

ACM Conference Proceedings Now Available via Print-on-Demand!
Did you know that you can now order many popular ACM conference proceedings via print-on-demand? Institutions, libraries and individuals can choose from more than 100 titles on a continually updated list through Amazon, Barnes & Noble, Baker & Taylor, Ingram and NACSCORP: CHI, KDD, Multimedia, SIGIR, SIGCOMM, SIGCSE, SIGMOD/PODS, and many more.
For available titles and ordering info, visit: librarians.acm.org/pod

Technical Perspective
Broadening and Deepening Query Optimization Yet Still Making Progress
By Jeffrey F. Naughton

To view the accompanying paper, visit doi.acm.org/10.1145/3068612

QUERY OPTIMIZATION IS a fundamental problem in data management. Simply put, most database query languages are declarative rather than imperative; that is, they specify properties the answer should satisfy, rather than give an algorithm to compute the answer. The best known and most widely used database query language, SQL, is a prime example of a language for which optimization is essential.
By "essential," I mean that database optimization is not a matter of shaving 10% or even a factor of two from a query's execution time. In database query evaluation, the difference between a good plan and a bad or even average plan can be multiple orders of magnitude, so successful query optimization makes the difference between a plan that runs quickly and one that never finishes at all. Accordingly, since the seminal papers in the 1970s, query optimization has received and continues to receive a great deal of attention from both the industrial and research database communities.
Early work on optimization focused on a scenario in which the query was fully specified, and the optimization goal was query evaluation time. That is, the problem was: What is the fastest way to evaluate this query? While this problem was (and is!) challenging, it is not broad enough to capture the optimization problem faced by modern systems. As an important example, many times the query is not fully specified in advance (for example, it may contain variables, or "parameters" that are only discovered at runtime). This generalization gives rise to parametric query optimization, where the problem is as follows: Given a partially specified query, find a set of good evaluation plans, one of which will be chosen at runtime when the parameter is instantiated.
Yet another necessary generalization involves the optimization goal. Sometimes execution time is not the only criterion by which plans should be selected. As a prominent and current example, if the query is being run in the cloud, the system may obviously want to find fast evaluation plans, but may also desire inexpensive ones. That is, now we have two objectives: running time and cost. This gives rise to multi-objective query optimization, where the problem is: Given a query and a set of objectives, find a set of plans that are Pareto-optimal for these objectives (a plan is "Pareto-optimal" if it is not dominated in all objectives by other plans).
Both parametric and multi-objective query optimization have been studied in the past, but the following paper by Trummer and Koch is a remarkable tour de force exploration of the combination of the two. Here, the problem is the following: Given a partially specified query, and multiple objectives for the resulting plan, find a set of Pareto-optimal plans that can be chosen at runtime by filling in all parameters.
Since the original query optimization problem and its variants are already very difficult, one might despair that simultaneously treating two substantial extensions would yield a hopelessly intractable problem. This paper is surprising in its elegance and effectiveness. It embeds the problem in an insightful and expressive formal framework, and specifies a solution that combines aspects of piecewise linear functions, dynamic programming with pruning based upon Pareto polytope analyses, and linear programming. A thorough set of experiments with an implementation of their algorithm completes the paper, indicating all of this actually works.

Jeffrey F. Naughton ([email protected]) is a principal scientist at Google and previously a professor of computer science at the University of Wisconsin, Madison.

Copyright held by author.
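The combination the perspective describes, a plan set computed before run time from which one plan is chosen once the parameters are filled in, can be illustrated with a small Python sketch. It assumes, purely for illustration, a single parameter x (say, a spot price), two metrics (execution time and fee), and one linear cost function b + w·x per metric for each plan; the plan names, coefficients, and the `pick` preference rule are hypothetical. Unlike the Trummer and Koch algorithm, which computes for each plan the whole parameter-space region where it is Pareto-optimal using linear programming, this sketch only checks dominance pointwise at one instantiated parameter value.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

TIME, FEE = 0, 1  # indices of the two hypothetical cost metrics


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan with one linear cost function per metric.

    coeffs[k] = (b, w) means the cost on metric k is b + w * x,
    where x is a single scalar parameter (e.g., a spot price).
    """
    name: str
    coeffs: Tuple[Tuple[float, float], ...]

    def cost(self, x: float) -> Tuple[float, ...]:
        return tuple(b + w * x for b, w in self.coeffs)


def dominates(c1: Sequence[float], c2: Sequence[float]) -> bool:
    """Strict Pareto dominance: c1 is no worse on every metric and better on one."""
    pairs = list(zip(c1, c2))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


def pareto_plans(plans: Sequence[Plan], x: float) -> List[Plan]:
    """Plans whose cost vector at x is not dominated by any other plan."""
    costed = [(p, p.cost(x)) for p in plans]
    # A cost vector never strictly dominates itself, so no self-check is needed.
    return [p for p, c in costed if not any(dominates(other, c) for _, other in costed)]


def pick(plans: Sequence[Plan], x: float, max_time: float) -> Optional[Plan]:
    """Runtime choice: the cheapest Pareto-optimal plan within a time bound."""
    feasible = [p for p in pareto_plans(plans, x) if p.cost(x)[TIME] <= max_time]
    return min(feasible, key=lambda p: p.cost(x)[FEE], default=None)


# Hypothetical precomputed plan set: (time, fee) as functions of spot price x.
PLANS = [
    Plan("small_spot", ((100.0, 0.0), (0.0, 1.0))),
    Plan("medium_spot", ((60.0, 0.0), (0.0, 2.5))),
    Plan("large_spot", ((30.0, 0.0), (0.0, 3.0))),
    Plan("on_demand", ((45.0, 0.0), (1.0, 0.0))),   # fee does not depend on x
    Plan("slow_spot", ((120.0, 0.0), (0.0, 1.5))),  # dominated by small_spot for x >= 0
]

if __name__ == "__main__":
    for spot_price in (0.2, 0.5):
        names = [p.name for p in pareto_plans(PLANS, spot_price)]
        choice = pick(PLANS, spot_price, max_time=60.0)
        print(spot_price, names, choice.name if choice else None)
    # 0.2 ['small_spot', 'medium_spot', 'large_spot'] medium_spot
    # 0.5 ['small_spot', 'large_spot', 'on_demand'] on_demand
```

At x = 0.2 the medium spot plan is Pareto-optimal and wins under a 60-second bound; at x = 0.5 it is dominated by the fixed-price plan. The relevant plan set therefore depends on the parameter value, which is why a parametric approach keeps every plan that is Pareto-optimal somewhere in the parameter space.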



DOI:10.1145/3068612

Multi-Objective Parametric Query Optimization
By Immanuel Trummer and Christoph Koch

The original version of this article was published in the Proceedings of the VLDB Endowment, Volume 8.

Abstract
We propose a generalization of the classical database query optimization problem: multi-objective parametric query (MPQ) optimization. MPQ compares alternative processing plans according to multiple execution cost metrics. It also models, as parameters, missing pieces of information on which plan costs depend. Both features are crucial to model query processing on modern data processing platforms.
MPQ generalizes previously proposed query optimization variants, such as multi-objective query optimization, parametric query optimization, and traditional query optimization. We show, however, that the MPQ problem has different properties from prior variants and that solving it requires novel methods. We present an algorithm that solves the MPQ problem and finds, for a given query, the set of all relevant query plans. This set contains all plans that realize optimal execution cost tradeoffs for any combination of parameter values. Our algorithm is based on dynamic programming and recursively constructs relevant query plans by combining relevant plans for query parts. We assume that all plan execution cost functions are piecewise-linear in the parameters. We use linear programming to compare alternative plans and to identify plans that are not relevant. We present a complexity analysis of our algorithm and experimentally evaluate its performance.

1. INTRODUCTION
1.1. Context
The goal of database query optimization is to map a query (describing the data to generate) to the optimal query plan (describing how to generate the data). Query optimization is a long-standing research area in the database field, dating back to the 1970s.14 The original query optimization problem model was motivated by the capabilities of data processing systems at that time. However, there have been fundamental advances in data processing techniques and systems in the meantime. Hence the original problem model is not sufficiently expressive to capture all relevant aspects of modern data processing systems. In this paper, we propose an extension of the classical query optimization problem model and a corresponding optimization algorithm.
In query optimization, alternative query plans are compared according to their execution cost (e.g., execution time). Query optimization variants can be classified according to how they model the execution cost of a single query plan. Traditional query optimization14 models the cost of a query plan as a scalar cost value $c \in \mathbb{R}$. This implies that query plans are compared according to a single cost metric. It also implies that all information required to produce cost estimates is available to the query optimizer. The goal in classical query optimization is to find a query plan with minimal execution cost.
Multi-objective query optimization1, 7, 11, 16, 17 generalizes the classical model and associates each query plan with a cost vector $c \in \mathbb{R}^m$ instead of a scalar value. This makes it possible to model scenarios where multiple execution cost metrics are of interest. If data processing takes place in the cloud, then we are interested not only in execution time but also in monetary execution fees. Different components of the plan cost vector represent cost according to different cost metrics. The goal is to find the set of Pareto-optimal query plans, that is, plans for which no alternative plan offers better cost according to all metrics.
Parametric query optimization3, 4, 6, 8, 10, 13 generalizes the standard model in a different way. It associates each query plan with a cost function $c: \mathbb{R}^n \to \mathbb{R}$, mapping from a multidimensional parameter space to a one-dimensional cost space. Parameters represent pieces of information that are not yet available at optimization time but are required to estimate plan execution cost. For instance, parametric query optimization makes it possible to optimize query classes that are defined via query templates with unspecified predicates. One parameter could then represent the selectivity of one unspecified predicate. The concrete predicate, and with it the concrete parameter value, becomes known only at run time. The goal in parametric query optimization is typically to find a set of plans containing, for each possible parameter value combination, the plan with minimal execution cost.

1.2. Problem
We propose multi-objective parametric query (MPQ) optimization, a query optimization variant that generalizes multi-objective query optimization, parametric query optimization, and classical query optimization at the same time. MPQ models the cost of a single query plan as a cost function $c: \mathbb{R}^n \to \mathbb{R}^m$ that maps a multidimensional parameter space to a multidimensional cost space. MPQ assumes that query plans are compared according to multiple cost metrics and that cost estimates depend on parameters whose values are unknown at optimization time.
The goal in MPQ is to find the set of Pareto-optimal plans for each possible parameter value combination. This problem model is required wherever the application scenarios of multi-objective query optimization intersect with the ones


of parametric query optimization. The following example describes a scenario in which MPQ is necessary.
Example 1. Assume that we need to process the same query at regular time intervals. Query processing takes place in the cloud, and we would like to use Amazon EC2 Spot Instances. Here, we care about two execution cost metrics, namely execution time and monetary execution fees. We can trade between them by adapting the type and number of the computational resources that we rent from Amazon. The query processing cost also depends on parameters: the pricing of Amazon Spot Instances. Due to unpredictable fluctuations, we do not know their values in advance. As we process the same query repeatedly, we can determine the set of all potentially relevant query plans in a preprocessing step. At run time, given concrete Spot prices and execution cost bounds, we can efficiently select the best query plan out of the precomputed set. This avoids expensive optimization at run time. The preprocessing step requires MPQ since multiple plan cost metrics and parameters need to be considered.
There are many other scenarios in which multiple processing cost metrics are of interest. Techniques for approximate query processing allow us to trade between execution time and result precision.1 Different query plans can realize different tradeoffs between energy consumption and execution time for the same query.18 If data is processed by crowd workers, then latency, execution fees, and result precision are all relevant cost metrics.12 If the queries we want to process at run time correspond to query templates that are known before run time, then we can make query optimization a preprocessing step. At preprocessing time, plan cost estimates depend on parameters with unknown values. Those parameters can represent query properties that are not fully specified in the template, or properties of the query execution platform (e.g., the Spot Instance prices) that will become known only at run time. MPQ is applicable in such scenarios and avoids query optimization at run time.
The result of MPQ is the set of all potentially relevant query plans for a given query or query template. It contains all Pareto-optimal plans for each possible parameter value combination. At run time, we can select the best query plan out of that set based on the concrete parameter values and on user preferences. Users can specify their preferences in advance (e.g., by specifying cost bounds and priorities between different cost metrics1, 16) such that the optimal plan according to those preferences can be selected automatically. Alternatively, we can use the precomputed plan set to visualize all Pareto-optimal cost tradeoffs for given parameter values. This allows users to select the preferred cost tradeoff directly.17 Figure 1 illustrates the context of MPQ.

Figure 1. Multi-objective parametric query optimization precomputes a set of relevant query plans. The optimal plan is selected from that set according to parameter values and user preferences.
[Diagram: Query Template → Optimization (before runtime) → Plan Set → Plan Selection (at runtime, given parameters and preferences) → Query Plan]

So far we have introduced a very generic problem model for MPQ. In order to make the problem tractable, we restrict ourselves in this paper to a specific class of cost functions: piecewise-linear plan cost functions. Many approaches for parametric query optimization6, 8 also consider only piecewise-linear plan cost functions, since such functions can approximate arbitrary functions.8 The difference between our model and the one used in parametric query optimization is that we associate each plan with multiple piecewise-linear cost functions representing cost according to different metrics.

1.3. Algorithm
We present an algorithm that solves MPQ for piecewise-linear plan cost functions. Our algorithm is based on dynamic programming. It recursively decomposes the input query, for which we need to determine the set of relevant query plans, into subqueries. In a bottom-up approach, it recursively calculates sets of relevant plans for a query out of optimal plan sets for its subqueries: it combines plans that are relevant for the subqueries to form new plans that are potentially relevant for the decomposed query. Dynamic programming is a classical approach for query optimization. The crucial difference between our algorithm and prior algorithms lies in the implementation of the pruning function, that is, in how we compare alternative query plans and prune out suboptimal plans.
Conceptually, we associate each plan for a query or subquery with a region in the parameter space for which the plan is Pareto-optimal. We call this region the Pareto region. The goal during pruning is to compare alternative plans generating the same result in order to discard suboptimal plans. We compare plans pairwise and determine for each plan the parameter space region in which it is dominated by another plan, that is, in which the other plan has comparable or better cost according to each plan cost metric. Then we reduce the Pareto region of the first plan by the region in which it is dominated. If the Pareto region of a plan becomes empty, then it is not Pareto-optimal for any parameter value combination and can be discarded.
All Pareto regions that could ever occur during the execution of that algorithm can be represented using the following formalism. We represent Pareto regions as a union of convex polytopes in the parameter space from which other convex polytopes have been subtracted. We prove that this representation is closed under all operations that the algorithm needs to perform on Pareto regions. Note that this



region shape is a consequence of the class of cost functions (piecewise-linear functions) that we consider.
The algorithm needs to perform several elementary operations on Pareto regions and cost functions. For instance, it must verify whether a Pareto region is empty or calculate a parameter space region in which one plan is preferable to a second one. We show how all those operations can be implemented based on the aforementioned representation of Pareto regions. We implement those operations using linear programming.

1.4. Outline
The remainder of this paper is organized as follows. We define the MPQ problem and related concepts more formally in Section 2. In Section 3, we describe our algorithm for MPQ with piecewise-linear plan cost functions. In Section 4, we analyze the MPQ problem and the asymptotic complexity of our algorithm. We present experimental results for an implementation of our algorithm in Section 5 and discuss related work in Section 6.

2. FORMAL PROBLEM STATEMENT
We now define the MPQ problem. A query describes data to generate. The description of our algorithm for solving MPQ problems, given in the next section, focuses on simple SQL join queries. An SQL join query is defined by a set of tables to join. A subquery joins a subset of tables. Standard methods exist by which a query optimization algorithm for this simple query language can be extended into an algorithm supporting full SQL queries.14
A query plan describes how to generate data. We say that a query plan answers a query if it generates the data described by the query. We assume in the following that query plans consist of a sequence of scan operations and binary join operations. For a query q, we denote by P(q) the set of alternative plans that answer the query.
We compare query plans according to their execution cost. The cost of a given plan depends on a set of real-valued parameters. The set of parameters is a property of the query; all alternative plans in P(q) therefore depend on the same parameters. A parameter value vector contains a corresponding value for each parameter. We do not know the parameter values at optimization time. The parameter space is the set of all possible parameter value vectors. We assume in the following that there are n parameters and denote by $X \subseteq \mathbb{R}^n$ the n-dimensional parameter space. A parameter space region is a subset of the parameter space.
We compare query plans according to multiple execution cost metrics. A cost vector contains a nonnegative cost value for each cost metric. We assume in the following that there are m execution cost metrics and denote by $C = \mathbb{R}^m$ the space of cost vectors. We associate each query plan p with a cost function $c_p: X \to C$ that maps the n-dimensional parameter space to the m-dimensional cost space. We can compare the cost of query plans for specific parameter values. Denote by $x \in X$ a parameter value vector and by $p_1$ and $p_2$ two plans answering the same query. We say that $p_1$ dominates $p_2$ for x, written $p_1 \preceq_x p_2$, if $p_1$ has lower or equivalent cost than $p_2$ according to each metric for parameter values x. In other words, $p_1$ dominates $p_2$ if no component of $c_{p_1}(x)$ has a higher value than the corresponding component of $c_{p_2}(x)$. Now we are ready to introduce the MPQ problem.
Definition 1. An MPQ problem is defined by a query q, a parameter space X, and a cost space C. A solution is a subset $S \subseteq P(q)$ of query plans such that for each possible plan $p \in P(q)$ and for each possible parameter value vector $x \in X$ there is a solution plan $s \in S$ such that s dominates p for x, that is, $s \preceq_x p$.
We focus on a subclass of MPQ problems that restricts the class of cost functions. In order to define the class of cost functions that we consider, we must first introduce convex polytopes. A convex polytope is defined by a set of linear inequalities: it is the set of points in the parameter space that satisfy all its inequalities. We use the terms convex polytope and polytope as synonyms. A linear cost function is defined by a constant b and an n-dimensional weight vector $w \in \mathbb{R}^n$ such that $b + w^T x$ is the associated cost value for each parameter vector $x \in X$. A scalar piecewise-linear cost function is a cost function for which the parameter space can be partitioned into convex polytopes such that the function is linear in each polytope. A vector-valued piecewise-linear cost function consists of one piecewise-linear cost function for each cost metric. We use the terms vector-valued piecewise-linear cost function and piecewise-linear cost function as synonyms. We restrict our scope to MPQ with piecewise-linear cost functions.

3. ALGORITHM
Our algorithm produces a set of relevant plans for a given query. A plan is relevant if its execution cost is Pareto-optimal for some parameter value combination.

3.1. Overview
Our algorithm splits the input query recursively into smaller and smaller parts until we obtain atomic subqueries. We start with atomic subqueries and calculate the set of relevant plans for each of them. After that, larger subqueries are treated. We treat subqueries in an order which ensures that, before treating a query, we have calculated relevant plan sets for each of its subqueries. The reason for restricting the order is that we want to calculate the set of relevant plans for a query out of the sets of relevant plans for its subqueries. More precisely, we can guarantee that each relevant plan for a query can be obtained by splitting the query into two subqueries and combining a relevant plan for the first subquery with a relevant plan for the second subquery, thereby generating a new query plan. Having calculated the set of relevant plans for each subquery, we can therefore obtain a superset of relevant query plans by iterating over all possible splits into subqueries and over all possible combinations of relevant subplans. In order to reduce the superset to the actual set of relevant query plans, we must prune plans answering the same query. Pruning means identifying and discarding plans that are irrelevant. The input query is treated last. The set of relevant plans for the input query is


the desired algorithm output. In summary, our algorithm can be written as follows:

• Iterate over all subqueries s of the input query in ascending order of query size:
  – If subquery s is an atomic subquery, then consider all possible plans for s.
  – Otherwise, if s is not an atomic subquery, then iterate over all possibilities to decompose s into two subqueries s1 and s2:
    ▪ For each split into two subqueries s1 and s2, consider all plans that are combinations of a relevant plan for s1 and a relevant plan for s2.
  – Prune all considered plans to obtain the set of relevant plans for s.

Like many query optimization algorithms,8, 14, 16 our algorithm is based on dynamic programming. We can use dynamic programming because the principle of optimality holds for query optimization.14 In general terms, the principle of optimality is the problem property that optimal solutions can be obtained by combining optimal solutions to subproblems. In the context of query optimization, it means more specifically that optimal query plans can be obtained by combining optimal plans for subqueries. The principle of optimality has been shown to hold for all common execution cost metrics in multi-objective query optimization.16 This means that a Pareto-optimal query plan can be combined from Pareto-optimal plans for subqueries. A relevant plan is Pareto-optimal for some points in the parameter space. It is therefore intuitive that a relevant query plan can be combined from relevant plans for the subqueries (we omit the formal proof). In other words, the principle of optimality holds for MPQ as well. It is the foundation of our MPQ algorithm.

3.2. Pruning
Many algorithms for classical query optimization,14 multi-objective query optimization,16 and parametric query optimization8 are based on dynamic programming. The primary difference between these algorithms is the realization of the pruning function. As we treat a novel problem variant, we must design a novel pruning function. In the following, we describe how our algorithm prunes query plans, that is, how it compares plans for the same query and identifies irrelevant plans.
Our pruning function is based on the key concept of the Pareto region. Each query plan is associated with a Pareto region: the parameter space region in which the plan realizes Pareto-optimal cost tradeoffs. A plan is irrelevant if its Pareto region is empty. The goal of the pruning function is to compare a set of plans answering the same query in order to calculate their Pareto regions. The pruning function works as follows. At the start of pruning, we assume by default that each plan is Pareto-optimal in the entire parameter space; that is, we assign the entire parameter space as Pareto region to each query plan. During pruning, we compare all query plans answering the same query pair-wise in order to calculate their true Pareto regions. If we compare two plans p1 and p2 and find that p1 has lower or equivalent cost than p2 according to all cost metrics in the parameter space region X, then we reduce the Pareto region of p2 by subtracting X. Pareto regions can only shrink during a pruning operation. Once the region of a plan becomes empty, the plan is irrelevant and can be safely discarded. We discard plans as soon as possible in order to avoid unnecessary comparisons.
More precisely, the pruning function iterates over all plan pairs and executes the following steps for each pair. First, it identifies the region in which one plan dominates the other plan. Second, it updates the Pareto region of the dominated plan by subtracting the region in which it is dominated. Third, it checks whether the Pareto region of the dominated plan becomes empty after the update. In that case, the plan is discarded and does not participate in further comparisons. Figure 2 illustrates how the Pareto region of a plan is reduced after comparing it to another plan. The example refers to a scenario where two parameters and two cost metrics (execution time and fees) are considered.
Note that two plans can mutually dominate each other in different parameter space regions. Having determined that a first plan dominates a second plan in some parameter space region, we must therefore still verify whether the second plan dominates the first plan as well.

Figure 2. We subtract the area in which a plan is dominated from its Pareto region. (Panels: Before Comparison, Comparison, After Comparison; axes: Parameter 1 and Parameter 2; legend: Plan Pareto Region, Other Plan Faster, Other Plan Cheaper, Other Plan Dominates.)

3.3. Data structures
We describe the data structures by which we represent plan cost functions and Pareto regions. Our plan cost model is based on piecewise-linear functions. A piecewise-linear function is linear in parameter space regions that form convex polytopes. A linear function can be represented by a constant and by weights capturing the slope of the function for each parameter. Hence, a piecewise-linear function can be represented by a set of convex polytopes, each associated with a constant and weights. We consider multiple plan cost metrics, so each query plan is associated with one piecewise-linear cost function per plan cost metric.
We consider the class of piecewise-linear cost functions to represent plan cost. We decided to use that class of functions since it allows us to approximate arbitrary functions



up to an arbitrary degree of precision (using more pieces increases precision). In contrast, we cannot freely decide which class of shapes we consider for representing Pareto regions: the algorithm must be able to represent each shape that could potentially occur during pruning. Our decision to use piecewise-linear cost functions determines the class of shapes that we need to consider as Pareto regions.
We describe our representation of Pareto regions. We motivate this representation in an informal way. It is, however, relatively easy to prove that the proposed representation covers all possible cases.
We start by considering the special case of linear cost functions. Parametric query optimization is a special case of MPQ. It has been shown in the domain of parametric query optimization that the parameter space region in which one plan is better than another plan according to one cost metric is a convex polytope if both plans have linear cost functions.6 In a setting with multiple cost metrics, a plan is strictly better than another plan if it is better according to each cost metric. The region in which a plan is better than another one is therefore an intersection of multiple convex polytopes. An intersection of convex polytopes is again a convex polytope. The region in which some other plan is better than a given plan is, hence, a union of convex polytopes.
Now, let us generalize that reasoning from linear cost functions to piecewise-linear cost functions. The generalization is straightforward. Given two piecewise-linear cost functions, we can always partition the parameter space into convex polytopes such that both cost functions are linear in each polytope. Thereby we reduce the case of piecewise-linear cost functions to the case of linear cost functions. In summary, we can represent the Pareto region of a plan as a union of convex polytopes from which we subtract another union of convex polytopes.

3.4. Elementary operations
Having described the data structures used to represent cost functions and Pareto regions, we now outline how to implement elementary operations on those data structures. We require the following elementary operations to realize the pruning function as described before. First, given the cost functions of two plans, we must determine the parameter space region in which one plan dominates the other one. Second, given the Pareto region of a plan and a region in which it is dominated, we must reduce the Pareto region by that region. Third, given a Pareto region, we must determine whether it is empty.
Convex polytopes are described by a set of linear inequalities, and we consider linear cost functions. All elementary operations that we describe in the following can, hence, be realized by solving systems of linear inequalities. Executing the elementary operations therefore requires a linear solver.
We describe how to determine the parameter space region in which one plan dominates another one. Assume first that we have only one cost metric and that cost functions are linear. Then, we can directly use the linear solver to determine the parameter space region in which one function has lower values than the other one. Now, we generalize from linear cost functions to piecewise-linear functions. Each piecewise-linear function partitions the parameter space into convex polytopes in which the function is linear. If we compare two piecewise-linear functions, then we can partition the parameter space such that both functions are linear in each partition. More precisely, we obtain this partitioning by intersecting the partitions associated with the first cost function with the partitions associated with the second function. Figure 3 illustrates how we intersect two parameter space partitionings in a two-dimensional parameter space. Having this partitioning, we apply the method for linear cost functions separately in each partition. Once we have, for each partition, the subregion in which a first plan dominates a second one, the union of those subregions is the total area in which the first plan dominates. If we have multiple cost metrics instead of only one, then we can apply the method described before for each cost metric separately. Given, for each cost metric, the parameter space region in which the first plan dominates the second one, the intersection of those areas (over all cost metrics) yields the area in which the first plan is better according to all cost metrics.

Figure 3. To compare two piecewise-linear cost functions, we intersect the parameter space partitions in which each function is linear. We compare the functions separately in each of the resulting partitions. (Panels: Partitioning 1, Partitioning 2, Intersection; axes: Parameter 1 and Parameter 2; legend: Partition Boundaries.)

Given the area in which a plan is dominated, we must subtract it from that plan's Pareto region. The implementation of this operation is straightforward: as discussed before, we represent Pareto regions as a union of convex polytopes from which other convex polytopes have been subtracted. The region in which one plan dominates another one must consist of convex polytopes. In order to subtract such a region from the Pareto region, we simply add the corresponding polytopes to the list of subtracted polytopes.
We must determine whether a given Pareto region is empty. A Pareto region is a set of polytopes from which other polytopes have been subtracted. We first consider the special case of one polytope $P^+$ from which a set of polytopes $P^-_1, \ldots, P^-_k$ has been subtracted. We want to verify whether the given polytope becomes empty after the subtractions. We can verify that as follows. Assume that all subtracted polytopes are contained within $P^+$. Then the region remaining after subtraction is empty if and only if $\bigcup_{i=1}^{k} P^-_i = P^+$. We can use the algorithm by Bemporad et al.2 to check the latter


condition. The algorithm by Bemporad verifies whether the union of a given set of convex polytopes forms a convex polytope again. If this is the case, then the algorithm constructs that polytope. The condition $\bigcup_{i=1}^{k} P^-_i = P^+$ can only be verified if $\bigcup_{i=1}^{k} P^-_i$ forms a convex polytope. In that case, the algorithm by Bemporad constructs the polytope $P^- = \bigcup_{i=1}^{k} P^-_i$, and a linear solver can verify whether $P^-$ and $P^+$ are equivalent.

4. ANALYSIS
We analyze the formal properties of the newly introduced MPQ problem in this section. We also analyze the complexity of the algorithm described in the last section.

4.1. Problem analysis
MPQ generalizes parametric query optimization since it allows considering multiple plan cost metrics instead of only one. In the following, we compare the formal properties of MPQ to the properties of parametric query optimization.
The parametric query optimization problem with linear cost functions has the following property: if the same query plan is optimal at all vertices of a convex polytope in the parameter space, then that plan must be optimal inside the polytope as well.6 This property is commonly known as one of the "guiding principles of parametric query optimization."5 Many algorithms for parametric query optimization exploit this property as follows6, 9: they recursively decompose the parameter space into convex polytopes and calculate optimal query plans at the vertices. Due to the guiding principle, the decomposition of the parameter space can be stopped once the same plan is optimal at all vertices of a polytope. Such algorithms transform the parametric query optimization problem into a series of traditional query optimization problems (calculating the optimal plan at a polytope vertex is a traditional query optimization problem). This has the advantage that traditional query optimizers can be used for parametric query optimization with minimal changes. It is therefore interesting to verify whether an analogous property holds for MPQ.
Unfortunately, this is not the case, as we show next. The following property for MPQ would be analogous to the guiding principle of parametric query optimization: if the same set of plans is Pareto-optimal at all vertices of a polytope in the parameter space, then that set of plans must be Pareto-optimal inside the polytope as well. Figure 4 illustrates a counterexample showing that this property does not hold. The figure refers to a scenario in which two cost metrics, namely execution time and execution fees, are of interest. Cost functions depend on a single parameter, called "Parameter 1" in the figure, which could refer to unspecified predicates in the input query template. We see the cost functions of three plans. For parameter value 0, plan 1 is Pareto-optimal since it has the lowest execution fees. Plan 3 is Pareto-optimal since it has lower execution time than all other plans. Plan 2, however, is dominated by plan 1 since plan 1 has equivalent execution time and lower execution fees. This means that plan 2 is not Pareto-optimal for parameter value 0. For parameter value 2, the situation is similar: plans 1 and 3 are Pareto-optimal while plan 2 is not. For parameter values between 0.5 and 1.5, however, plan 2 is Pareto-optimal. Even though the same set of plans is Pareto-optimal at the borders of the parameter value interval [0, 2], additional plans can be Pareto-optimal for values in the interior of that interval. All plan cost functions are linear in the example, and an interval is a special case of a convex polytope. The example is minimal for MPQ: having fewer than two cost metrics would lead to parametric query optimization, and having fewer than one parameter would lead to multi-objective query optimization. Hence, we can conclude from this example that the guiding principles do not apply to MPQ in general.

Figure 4. The guiding principle of parametric query optimization does not hold for multi-objective parametric query optimization. (Two panels plot Time and Fees, each from 0 to 2, against Parameter 1 from 0 to 2; legend: Cost of Plan 1, Cost of Plan 2, Cost of Plan 3.)

4.2. Algorithm analysis
The space and time complexity of dynamic programming-based query optimization algorithms depends on the number of plans stored per subquery. In traditional query optimization, plans are compared according to one cost metric and cost functions do not depend on parameters. If we assume that alternative query plans are compared based on their cost values alone, then exactly one plan, a plan with minimal cost, remains after pruning an arbitrary set of plans. In parametric query optimization, plans are compared according to one cost metric but cost functions depend on parameters. This means that different plans can be optimal for different parameter values. In multi-objective query optimization, we compare plans according to different cost metrics. Hence, multiple plans can be Pareto-optimal for each subquery. As a result, we generally need to store multiple plans per subquery in both parametric and multi-objective query optimization. The number of plans to store depends on many factors. Research in parametric query optimization has focused on analyzing how the number of plans per subquery depends on the number of parameters. Research in multi-objective query optimization has focused on the dependency between the number of plans and the number of cost metrics. Such analysis is necessarily based on simplifying assumptions. Traditionally, the weights that define the cost functions of different query plans are assumed to follow independent random distributions.6, 7 Based on that assumption, the number of remaining plans after pruning can be considered a random variable as well, and we can calculate its expected value. This reasoning led to an asymptotic upper bound of $2^m$, where m designates the number of cost metrics, on the expected number of plans per subquery in multi-objective query optimization.7
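The following minimal Python sketch makes the contrast between one and several cost metrics concrete for the parameter-free, multi-objective case. It prunes plans that have fixed cost vectors. It is not the MPQ pruning function of Section 3, which compares piecewise-linear cost functions over parameter space regions using a linear solver.

The sketch makes two assumptions:
- It uses the conventional Pareto-dominance test: no worse in every metric and strictly better in at least one. Plans with identical costs therefore do not eliminate each other.
- The plan names and cost values are hypothetical.

```python
from typing import Dict, Sequence, Set


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if cost vector p Pareto-dominates q: p is no worse in every
    metric and strictly better in at least one (lower cost is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def prune(plans: Dict[str, Sequence[float]]) -> Set[str]:
    """Keep the plans whose cost vectors are not dominated by any other plan."""
    return {
        name
        for name, cost in plans.items()
        if not any(
            dominates(other_cost, cost)
            for other_name, other_cost in plans.items()
            if other_name != name
        )
    }


# Hypothetical plans for one subquery; each cost vector is (time, fees).
plans = {
    "hash_join": (10.0, 2.0),
    "distributed_join": (4.0, 6.0),
    "slow_and_costly": (12.0, 7.0),
}

# With a single metric (time only), one plan remains, as in traditional optimization.
time_only = {name: (cost[0],) for name, cost in plans.items()}
print(sorted(prune(time_only)))  # ['distributed_join']

# With two metrics, several plans can be Pareto-optimal and must all be kept.
print(sorted(prune(plans)))  # ['distributed_join', 'hash_join']
```

Ignoring ties, the single-metric case always leaves one plan. With several metrics, the number of surviving plans is the quantity whose expected value the analysis in this section bounds.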



We perform a similar analysis to determine the expected number of plans per subquery in MPQ. We consider linear cost functions. We denote the number of parameters by n. A linear function is therefore defined by a vector consisting of n + 1 components, specifying the function slope for each parameter and a constant. We still denote the number of considered plan cost metrics by m. Each query plan is therefore associated with m linear functions. The multi-dimensional cost function of each query plan can therefore be described by a matrix containing m × (n + 1) components, specifying for each cost metric the cost slopes and a constant. Assume that we have two cost functions and that all constants and slopes describing the first function are lower than the corresponding entries for the second cost function. Then the first cost function has, for each cost metric, a lower constant cost component and a lower slope in each parameter. In other words, since parameters such as selectivities are non-negative, the first cost function has lower values than the second one for arbitrary parameter values and cost metrics. If both cost functions are associated with query plans, then the plan associated with the second function is clearly irrelevant.
We can exploit this fact as follows. Assume that we choose an arbitrary number of D-dimensional vectors randomly with independent identical distribution. Then the expected number of vectors such that no other vector has a lower or equivalent value in each component is bounded by $2^D$.7 We assume that the vectors describing the cost functions of different query plans are chosen randomly with independent and identical distribution. Setting D = m × (n + 1), we infer that the expected number of vectors such that no other vector has lower or equal values in all components is bounded by $2^{m \times (n+1)}$. As outlined before, this is at the same time an upper bound on the expected number of relevant query plans per subquery.
In order to obtain an upper bound on the asymptotic space complexity, we multiply the aforementioned bound by the number of subqueries. We generate new plans by combining two relevant plans. The number of generated plans therefore grows as the square of the number of relevant plans. All generated plans for the same subquery are compared pair-wise during pruning. The number of plan comparisons therefore grows as the fourth power of the number of relevant plans. Multiplying by the number of subquery splits yields the time complexity, measured by the number of plan comparisons.

5. EXPERIMENTS
5.1. Experimental setup
We evaluate our MPQ algorithm experimentally. More precisely, we study how optimization time depends on the input query size and on the number of considered parameters. Our experiments are based on an example scenario in which SQL queries are processed in the cloud. Hence, we compare alternative query plans according to two cost metrics: execution time and monetary execution fees. We consider a restricted class of SQL queries: each query is described by a set of tables to join, by predicates defined on single tables, and by binary join predicates defined on table pairs. We assume that our MPQ algorithm is applied to query templates which are not fully specified: the predicates defined on single tables are placeholders. The selectivity of such a predicate, meaning the average fraction of tuples satisfying the predicate, is unknown to our MPQ algorithm. Hence, the selectivity of each predicate placeholder must be represented by a parameter. Our algorithm finds all plans realizing optimal cost tradeoffs for each possible parameter value combination.
We generate the queries for our benchmark randomly. We use the method described by Steinbrunn et al.15 to produce random queries that join a given number of tables. The number of rows in each table and the selectivity of each predicate are chosen randomly according to that method. We distinguish two classes of queries: chain queries and star queries. For chain queries, the binary join predicates connect query tables in a chain. For star queries, the binary join predicates connect one table (the middle of the "star") to all other query tables. For both query classes, the number of binary join predicates is one less than the number of tables.
We describe the plan search space that our algorithm considers. Our algorithm considers all possible orders in which tables can be joined, with only one restriction: whenever we have the choice between joining two relations that are connected via a binary join predicate and joining two relations where this is not the case, only joins of the first category are considered. This restriction on the join order is often used in query optimization.14, 15 In addition to the join orders, our algorithm considers different scan and join operators. For scanning single tables on which a predicate is defined, we consider a full scan and an index-based scan. Which of the two operators is preferable depends on the selectivity of the predicate. If the selectivity is low (few tuples satisfy the predicate), then the index scan is often preferable. If the predicate is satisfied for most tuples, then the full scan is more efficient. We model the selectivity of a predicate defined on a single table by a parameter. The optimal choice for the scan operator therefore depends on the value of that parameter. We consider two join operators: a distributed join and a single-node hash join. For sufficiently large amounts of input data, the distributed join saves execution time. On the other hand, the distributed join requires renting more computational resources from the cloud provider and is therefore more expensive. Hence, we can realize different tradeoffs between execution time and execution fees by selecting between alternative join operators. We implemented our MPQ algorithm in Java 1.7. We used Gurobi 5.6 (http://www.gurobi.com/) as the linear solver. All experiments were executed on an iMac equipped with an i5-3470S processor with 2.9 GHz and 16 GB of RAM.

5.2. Experimental results
Figure 5 shows our experimental results. Each data point in that figure corresponds to the median value of 25 randomly generated test cases. We report optimization time, the number of generated query plans (counting plans for the input


query and plans for subqueries), and the number of solved linear programs. We generated query templates joining between 2 and 12 tables and having one or two parameters.
Optimization time increases with the number of tables. As predicted by our formal analysis in the previous section, optimization time also increases with the number of parameters. Optimization time grows faster in the number of query tables for star queries than for chain queries. The reason is that the number of admissible join orders grows faster in the number of query tables for star queries. By admissible join orders, we mean join orders that comply with the restriction mentioned before. Optimization time, the number of generated plans, and the number of solved linear programs are all correlated. This is intuitive, as the number of generated plans relates to the number of plan comparisons that are required during pruning. The number of linear programs is related to the number of plan comparisons since plan comparisons are realized by solving linear programs. The time required for generating plans and for solving linear programs adds to optimization time.
The query sizes that we consider in our benchmark are typical of the query sizes that appear in standard benchmarks: the queries in the popular TPC-H benchmark (http://www.tpc.org/tpch/) join at most eight tables. MPQ takes longer than traditional query optimization. However, in contrast to traditional query optimization, MPQ takes place before run time. This makes higher optimization times acceptable.

Figure 5. Optimization time, number of generated plans, and number of solved linear programs. (Panels for chain queries and star queries plot Time (ms), # Created plans, and # Linear programs on logarithmic axes against No. of tables, from 2 to 12; series: 1 Par., 2 Par.)

6. RELATED WORK
Figure 6 shows how MPQ relates to prior query optimization variants. For each variant, the figure shows the type of the cost function c that is associated with each query plan. Arrows point from a more restricted to a more general query optimization variant. Multi-objective query optimization1, 7, 11, 16, 17 and parametric query optimization3, 4, 6, 8, 10, 13 both generalize the traditional query optimization model.14 Multi-objective parametric query optimization generalizes both of the aforementioned variants.

Figure 6. Multi-objective parametric query optimization generalizes the cost models of multi-objective and of parametric query optimization. (Diagram: Traditional, $c \in \mathbb{R}$; Multi-Objective, $c \in \mathbb{R}^m$; Parametric, $c \in \mathbb{R}^n \to \mathbb{R}$; Multi-Objective Parametric, $c \in \mathbb{R}^n \to \mathbb{R}^m$.)

The algorithm that we propose in this paper can solve query optimization problems that prior algorithms cannot solve. Algorithms for parametric query optimization are not applicable to MPQ since they cannot handle multiple cost metrics. Algorithms for multi-objective query optimization are not applicable to MPQ since they cannot handle parameters. Note that parameters and cost metrics have different semantics, so it is not possible to model parameters as cost metrics or vice versa. Intuitively, we want to "cover" the entire parameter space (by finding plans for each possible parameter value combination) while we do not want to cover the entire cost space (plans with higher cost values than necessary are not part of the result plan set).
The algorithm that we describe in this article is based on dynamic programming. It calculates optimal plans for a query by combining optimal plans for its subqueries. Many algorithms for traditional query optimization,14 multi-objective query optimization,16, 17 and parametric query optimization8 use the same dynamic programming scheme. The difference between our algorithm and all prior algorithms lies in the implementation of the pruning function. We use linear programming in the pruning function. Our algorithm shares this property with prior algorithms for parametric query optimization.8 However, we support multiple cost metrics, and hence the definition of the pruning function, the type of the data structures used, and the implementation of elementary operations on those data structures differ.
Many algorithms for parametric query optimization are based on the guiding principles of parametric query optimization.5 They partition the parameter space in an increasingly fine-grained manner until a single query plan is optimal in each partition.6, 9 The condition used to verify whether a single query plan is optimal in a given partition is



based on the guiding principles. We have shown in Section 4 that the multi-objective analog of the guiding principles does not hold for MPQ. Hence, we cannot use generalizations of the aforementioned decomposition methods for MPQ.

7. CONCLUSION
The traditional query optimization model is outdated. We proposed a generalized problem model that can represent multiple plan cost metrics and multiple parameters. We described and analyzed a first algorithm that solves this novel optimization problem.

References
1. Agarwal, S., Iyer, A., Panda, A. Blink and it's done: Interactive queries on very large data. In VLDB 5, 12 (2012), 1902–1905.
2. Bemporad, A., Fukuda, K., Torrisi, F. Convexity recognition of the union of polyhedra. Comput. Geom. 18, 3 (2001), 141–154.
3. Bizarro, P., Bruno, N., DeWitt, D. Progressive parametric query optimization. KDE 21, 4 (2009), 582–594.
4. Darera, P., Haritsa, J. On the production of anorexic plan diagrams. PVLDB (2007), 1081–1092.
5. Dey, A., Bhaumik, S., Haritsa, J. Efficiently approximating query optimizer plan diagrams. In VLDB 1, 2 (2008), 1325–1336.
6. Ganguly, S. Design and analysis of parametric query optimization algorithms. In VLDB (1998), 228–238.
7. Ganguly, S., Hasan, W., Krishnamurthy, R. Query optimization for parallel execution. In SIGMOD (1992), 9–18.
8. Hulgeri, A., Sudarshan, S. Parametric query optimization for linear and piecewise linear cost functions. In VLDB (2002), 167–178.
9. Hulgeri, A., Sudarshan, S. AniPQO: Almost non-intrusive parametric query optimization for nonlinear cost functions. In VLDB (2003), 766–777.
10. Ioannidis, Y.E., Ng, R.T., Shim, K., Sellis, T.K. Parametric query optimization. VLDBJ 6, 2 (May 1997), 132–151.
11. Papadimitriou, C., Yannakakis, M. Multiobjective query optimization. In PODS (2001), 52–59.
12. Park, H., Widom, J. Query optimization over crowdsourced data. In VLDB (2013), 781–792.
13. Reddy, N., Haritsa, J. Analyzing plan diagrams of database query optimizers. VLDB (2005), 1228–1239.
14. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G. Access path selection in a relational database management system. In SIGMOD (1979), 23–34.
15. Steinbrunn, M., Moerkotte, G., Kemper, A. Heuristic and randomized optimization for the join ordering problem. VLDBJ 6, 3 (1997), 191–208.
16. Trummer, I., Koch, C. Approximation schemes for many-objective query optimization. In SIGMOD (2014), 1299–1310.
17. Trummer, I., Koch, C. An incremental anytime algorithm for multi-objective query optimization. In SIGMOD (2015), 1941–1953.
18. Xu, Z., Tu, Y.C., Wang, X. PET: Reducing database energy cost via query optimization. VLDB 5, 12 (2012), 1954–1957.

Immanuel Trummer (itrummer@cornell.edu), Cornell University, Computer Science Department, Ithaca, NY.
Christoph Koch ([email protected]), Ecole Polytechnique Fédérale de Lausanne, EPFL DATA Lab, Lausanne, Switzerland.

© 2017 ACM 0001-0782/17/10 $15.00

DOI:10.1145/3126907

Technical Perspective
To view the accompanying paper, visit doi.acm.org/10.1145/3126905

Shedding New Light on an Old Language Debate
By Jeffrey S. Foster

AS COMPUTER SCIENTISTS, we use programming languages to turn our ideas into reality. It is no surprise, then, that programming language design has been a major concern since at least the 1950s, when John Backus introduced FORTRAN, usually considered the first high-level programming language. The revolutionary innovation of FORTRAN, the thing that made it high-level, was that it included concepts, such as loops and complex expressions, that made the programmer's job easier. To put it another way, FORTRAN showed that a programming language could introduce new abstractions that were encoded via a compiler, rather than directly implemented in the hardware.

Not long after the introduction of FORTRAN, other programming languages appeared with somewhat different sets of abstractions: John McCarthy's LISP, which introduced functional programming, and Grace Murray Hopper's COBOL, which aimed to support business, rather than scientific or mathematical, applications. Thus, for at least the last 60 years, programmers have been faced with the question: What programming language should I use?

Today, answering this question has only gotten more difficult. Myriad languages have been developed in the last six decades, with at least a few dozen in common usage today. Moreover, even if in a theoretical sense any general-purpose language can implement any algorithm, in practice the different abstractions provided by different languages seem to have a strong influence on programming tasks. The Internet is filled with heated debates about the merits of functional versus imperative programming, the costs versus the benefits of object orientation, and the trade-offs between dynamic and static typing, among many others.

The following paper aims to bring empiricism to this debate by studying whether programming language choice and code quality are related. To do so, the authors perform an observational study on a corpus of 728 popular GitHub projects, totaling 63 million lines of code. The largest single project considered is Linux, comprising 15+ million lines of code. The authors apply a variety of techniques to extract data from the subject programs to try to answer the question at hand. They rely on GitHub Linguist to identify the primary language of a project. As a proxy for code quality, they count the total number of defect-fixing commits per project, determined by textually searching for one of a fixed set of keywords in the commit messages. And they use supervised machine learning to approximately classify bug reports into categories.

The authors then apply a range of statistical methods and reach four main conclusions. First, they report that projects in Clojure, Haskell, Ruby, and Scala are slightly less likely to have defect-fixing commits, and those in C, C++, Objective-C, and Python are slightly more likely. Second, they report that languages that are functional, disallow implicit type conversion, have static typing, and/or use managed memory have slightly fewer defects than languages without these characteristics. Third, they report that defect-proneness does not depend on domain, for example, user application versus library versus middleware. Last, they report that some defect types, such as memory errors, are strongly associated with certain languages.

These are interesting results. While they suggest that the language does indeed matter, almost all of the observed effects are small … except that some particular language features, such as lack of memory safety, do have profound effects.

Like any empirical study, the results here have threats to validity: noise in the data, such as the classification of a commit as defect-fixing, is difficult to account for; defects may have been made and fixed without an intervening commit, for example, defects prevented by a static type checker are likely not included; projects vary significantly in software engineering practices, for example, Linux is an outlier, with an extremely large user base with many developers and testers; tool support for different languages varies significantly; there may be a strong relationship between programmer skill and language choice; language design can obviate classes of errors, for example, buffer overflows can occur in C and C++ but not Java; and in practice the choice of programming language is often constrained both by external factors (for example, the language of existing codebases) and the problem domain (for example, device drivers are likely to be written in C or C++).

Finally, while the use of regression analysis in such observational studies can control for confounds and strongly suggest relationships, it cannot definitively establish causation. Even so, this paper raises intriguing questions in an effort to shed light on one of the oldest debates in computer science.

Jeffrey S. Foster (jfoster at cs.umd.edu) is a professor in the computer science department at the University of Maryland, College Park.

Copyright held by author.



DOI:10.1145/3126905

A Large-Scale Study of
Programming Languages and
Code Quality in GitHub
By Baishakhi Ray, Daryl Posnett, Premkumar Devanbu, and Vladimir Filkov

Abstract
What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (728 projects, 63 million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static versus dynamic typing and allowing versus disallowing type confusion on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by process factors such as project size, team size, and commit size. However, we caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion.

1. INTRODUCTION
A variety of debates ensue during discussions of whether a given programming language is "the right tool for the job." While some of these debates may appear to be tinged with an almost religious fervor, most agree that programming language choice can impact both the coding process and the resulting artifact.

Advocates of strong, static typing tend to believe that the static approach catches defects early; for them, an ounce of prevention is worth a pound of cure. Dynamic typing advocates argue, however, that conservative static type checking is wasteful of developer resources, and that it is better to rely on strong dynamic type checking to catch type errors as they arise. These debates, however, have largely been of the armchair variety, supported only by anecdotal evidence.

This is perhaps not unreasonable; obtaining empirical evidence to support such claims is a challenging task given the number of other factors that influence software engineering outcomes, such as code quality, language properties, and usage domains. Considering, for example, software quality, there are a number of well-known influential factors, such as code size,6 team size,2 and age/maturity.9

Controlled experiments are one approach to examining the impact of language choice in the face of such daunting confounds; however, owing to cost, such studies typically introduce a confound of their own, that is, limited scope. The tasks completed in such studies are necessarily limited and do not emulate real-world development. There have been several such studies recently that use students, or compare languages with static or dynamic typing through an experimental factor.7, 12, 15

Fortunately, we can now study these questions over a large body of real-world software projects. GitHub contains many projects in multiple languages that substantially vary across size, age, and number of developers. Each project repository provides a detailed record, including contribution history, project size, authorship, and defect repair. We then use a variety of tools to study the effects of language features on defect occurrence. Our approach is best described as a mixed-methods, or triangulation,5 approach; we use text analysis, clustering, and visualization to confirm and support the findings of a quantitative regression study. This empirical approach helps us to understand the practical impact of programming languages, as they are used colloquially by developers, on software quality.

2. METHODOLOGY
Our methods are typical of large-scale observational studies in software engineering. We first gather our data from several sources using largely automated methods. We then filter and clean the data in preparation for building a statistical model. We further validate the model using qualitative methods. Filtering choices are driven by a combination of factors including the nature of our research questions, the quality of the data, and beliefs about which data is most suitable for statistical study. In particular, GitHub contains many projects written in a large number of programming languages. For this study, we focused our data collection efforts on the most popular projects written in the most popular languages. We choose statistical methods appropriate for evaluating the impact of factors on count data.

The original version of this paper was published in the Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 155–165.


2.1. Data collection
We choose the top 19 programming languages from GitHub. We disregard CSS, Shell script, and Vim script as they are not considered to be general-purpose languages. We further include TypeScript, a typed superset of JavaScript. Then, for each of the studied languages we retrieve the top 50 projects that are primarily written in that language. In total, we analyze 850 projects spanning 17 different languages.

Our language and project data was extracted from the GitHub Archive, a database that records all public GitHub activities. The archive logs 18 different GitHub events, including new commits, fork events, pull requests, developers' information, and issue tracking of all the open source GitHub projects on an hourly basis. The archive data is uploaded to Google BigQuery to provide an interface for interactive data analysis.

Identifying top languages. We aggregate projects based on their primary language. Then we select the languages with the most projects for further analysis, as shown in Table 1. A given project can use many languages; assigning a single language to it is difficult. GitHub Archive stores information gathered from GitHub Linguist, which measures the language distribution of a project repository using the source file extensions. The language with the maximum number of source files is assigned as the primary language of the project.

Retrieving popular projects. For each selected language, we rank the project repositories written primarily in that language by popularity, based on the associated number of stars. This number indicates how many people have actively expressed interest in the project, and is a reasonable proxy for its popularity. For example, the top 3 projects in C are linux, git, and php-src; for C++ they are node-webkit, phantomjs, and mongo; and for Java they are storm, elasticsearch, and ActionBarSherlock. In total, we select the top 50 projects in each language.

To ensure that these projects have a sufficient development history, we drop the projects with fewer than 28 commits (28 is the first quartile commit count of the considered projects). This leaves us with 728 projects. Table 1 shows the top 3 projects in each language.

Retrieving project evolution history. For each of the 728 projects, we downloaded the non-merged commits, commit logs, author date, and author name using git. We compute code churn and the number of files modified per commit from the number of added and deleted lines per file. We retrieve the languages associated with each commit from the extensions of the modified files (a commit can have multiple language tags). For each commit, we calculate its commit age by subtracting the date of the project's first commit from the commit's own date. We also calculate other project-related statistics, including the maximum commit age of a project and the total number of developers, which are used as control variables in our regression model and discussed in Section 3. We identify bug fix commits made to individual projects by searching the commit log for error-related keywords: "error," "bug," "fix," "issue," "mistake," "incorrect," "fault," "defect," and "flaw," similar to a prior study.18

Table 2 summarizes our data set. Since a project may use multiple languages, the second column of the table shows the total number of projects that use a certain language in some capacity. We further exclude a language from a project if the project has fewer than 20 commits in that language, where 20 is the first quartile value of the total number of commits per project per language. For example, we find 220 projects that have more than 20 commits in C. This ensures sufficient activity for each language–project pair.

In summary, we study 728 projects developed in 17 languages with 18 years of history. This includes 29,000 different developers, 1.57 million commits, and 564,625 bug fix commits.

Table 1. Top 3 projects in each language.

Language     | Top 3 projects
------------ | --------------------------------------------------
C            | Linux, git, php-src
C++          | Node-webkit, phantomjs, mongo
C#           | SignalR, SparkleShare, ServiceStack
Objective-C  | AFNetworking, GPUImage, RestKit
Go           | Docker, lime, websocketd
Java         | Storm, elasticsearch, ActionBarSherlock
CoffeeScript | Coffee-script, hubot, brunch
JavaScript   | Bootstrap, jquery, node
TypeScript   | Typescript-node-definitions, StateTree, typescript.api
Ruby         | Rails, gitlabhq, homebrew
Php          | Laravel, CodeIgniter, symfony
Python       | Flask, django, reddit
Perl         | Gitolite, showdown, rails-dev-box
Clojure      | LightTable, leiningen, clojurescript
Erlang       | ChicagoBoss, cowboy, couchdb
Haskell      | Pandoc, yesod, git-annex
Scala        | Play20, spark, scala

Table 2. Study subjects.

Language     | #Projects | #Devs (K) | #Commits (K) | #Insertions (MLOC) | #Bug fixes (K)
------------ | --------- | --------- | ------------ | ------------------ | --------------
C            | 220       | 13.8      | 447.8        | 75.3               | 182.6
C++          | 149       | 3.8       | 196.5        | 46.0               | 79.3
C#           | 77        | 2.3       | 135.8        | 27.7               | 50.7
Objective-C  | 93        | 1.6       | 21.6         | 2.4                | 7.1
Go           | 54        | 6.6       | 19.7         | 1.6                | 4.4
Java         | 141       | 3.3       | 87.1         | 19.1               | 35.1
CoffeeScript | 92        | 1.7       | 22.5         | 1.1                | 6.3
JavaScript   | 432       | 6.8       | 118.3        | 33.1               | 39.3
TypeScript   | 14        | 2.4       | 3.3          | 2.0                | 0.9
Ruby         | 188       | 9.6       | 122.1        | 5.8                | 30.5
Php          | 109       | 4.9       | 118.7        | 16.2               | 47.2
Python       | 286       | 5.0       | 114.2        | 9.0                | 41.9
Perl         | 106       | 0.8       | 5.5          | 0.5                | 1.9
Clojure      | 60        | 0.8       | 28.4         | 1.5                | 6.0
Erlang       | 51        | 0.8       | 31.4         | 5.0                | 8.1
Haskell      | 55        | 0.9       | 46.1         | 2.9                | 10.4
Scala        | 55        | 1.3       | 55.7         | 5.3                | 12.9
Summary      | 728       | 28        | 1574         | 254                | 564

2.2. Categorizing languages
We define language classes based on several properties of the language thought to influence language quality,7, 8, 12 as shown in Table 3. The Programming Paradigm indicates whether the
project is written in an imperative procedural, imperative scripting, or functional language. In the rest of the paper, we use the terms procedural and scripting to indicate imperative procedural and imperative scripting, respectively.

Type Checking indicates static or dynamic typing. In statically typed languages, type checking occurs at compile time, and variable names are bound to a value and to a type. In addition, expressions (including variables) are classified by types that correspond to the values they might take on at run-time. In dynamically typed languages, type checking occurs at run-time. Hence, in the latter, it is possible to bind a variable name to objects of different types in the same program.

Implicit Type Conversion allows access of an operand of type T1 as a different type T2, without an explicit conversion. Such implicit conversion may introduce type confusion in some cases, especially when it presents an operand of a specific type T1 as an instance of a different type T2. Since not all implicit type conversions are immediately a problem, we operationalize our definition by showing examples of the implicit type confusion that can happen in all the languages we identified as allowing it. For example, in languages like JavaScript and CoffeeScript, adding a string to a number is permissible (e.g., "5" + 2 yields "52"); the same operation yields 7 in Perl and Php, which silently convert the string to a number. Python, by contrast, rejects the operation with a type error because it does not allow this implicit conversion. In C and C++, coercion of data types can produce unintended results; for example, int x; float y; y=3.5; x=y; is legal C code, and results in different values for x and y (x becomes 3 while y stays 3.5), which, depending on intent, may be a problem downstream.a In Objective-C the data type id is a generic object pointer, which can be used with an object of any data type, regardless of the class.b The flexibility that such a generic data type provides can lead to implicit type conversion and also have unintended consequences.c Hence, we classify a language based on whether its compiler allows or disallows the implicit type conversion as above; the latter explicitly detects type confusion and reports it.

Disallowing implicit type conversion could result from static type checking within a compiler (e.g., with Java), possibly aided by a type-inference algorithm such as that of Hindley10 and Milner,17 or from a dynamic type checker at run-time. In contrast, type confusion can occur silently because it is either undetected or unreported. Either way, implicitly allowing type conversion provides flexibility but may eventually cause errors that are difficult to localize. To abbreviate, we refer to languages allowing implicit type conversion as implicit and those that disallow it as explicit.

Memory Class indicates whether the language requires developers to manage memory. We treat Objective-C as unmanaged, in spite of it following a hybrid model, because we observe many memory errors in its codebase, as discussed in RQ4 in Section 3.

Note that we classify and study the languages as they are colloquially used by developers in real-world software. For example, TypeScript is intended to be used as a static language, which disallows implicit type conversion. However, in practice, we notice that developers often (for 50% of the variables, and across TypeScript-using projects in our dataset) use the any type, a catch-all union type, and thus, in practice, TypeScript allows dynamic, implicit type conversion. To minimize the confusion, we exclude TypeScript from our language classifications and the corresponding model (see Tables 3 and 7).

Table 3. Different types of language classes.

Language classes         | Categories            | Languages
------------------------ | --------------------- | ---------------------------------------------------------------
Programming paradigm     | Imperative procedural | C, C++, C#, Objective-C, Java, Go
                         | Imperative scripting  | CoffeeScript, JavaScript, Python, Perl, Php, Ruby
                         | Functional            | Clojure, Erlang, Haskell, Scala
Type checking            | Static                | C, C++, C#, Objective-C, Java, Go, Haskell, Scala
                         | Dynamic               | CoffeeScript, JavaScript, Python, Perl, Php, Ruby, Clojure, Erlang
Implicit type conversion | Disallow              | C#, Java, Go, Python, Ruby, Clojure, Erlang, Haskell, Scala
                         | Allow                 | C, C++, Objective-C, CoffeeScript, JavaScript, Perl, Php
Memory class             | Managed               | Others
                         | Unmanaged             | C, C++, Objective-C

We omit TypeScript from language classification as it allows both explicit and implicit type conversion.

a Wikipedia's article on type conversion, https://en.wikipedia.org/wiki/Type_conversion, has more examples of unintended behavior in C.
b This Apple developer article describes the usage of "id": http://tinyurl.com/jkl7cby.
c Some examples can be found here http://dobegin.com/objc-id-type/ and here http://tinyurl.com/hxv8kvg.

2.3. Identifying project domain
We classify the studied projects into different domains based on their features and function using a mix of automated and manual techniques. The projects in GitHub come with project descriptions and README files that describe their features. We used Latent Dirichlet Allocation (LDA)3 to analyze this text. Given a set of documents, LDA identifies a set of topics where each topic is represented as a probability distribution over words. For each document, LDA also estimates the probability of assigning that document to each topic.

We detect 30 distinct domains, that is, topics, and estimate the probability that each project belongs to each domain. Since these auto-detected domains include several project-specific keywords, for example, facebook, it is difficult to identify the underlying common functions. In order to assign a meaningful name to each domain, we manually inspect each of the 30 domains to identify project-name-independent, domain-identifying keywords. We manually rename all of the 30 auto-detected domains and find that the majority of the projects fall under six domains: Application, Database, CodeAnalyzer, Middleware, Library, and Framework. We also find that some projects do not fall under any of the above
domains and so we assign them to a catchall domain labeled as Other. This classification of projects into domains was subsequently checked and confirmed by another member of our research group. Table 4 summarizes the identified domains resulting from this process.

Table 4. Characteristics of domains.

Domain name       | Domain characteristics | Example projects    | Total projects
----------------- | ---------------------- | ------------------- | --------------
(APP) Application | End user programs      | bitcoin, macvim     | 120
(DB) Database     | SQL and NoSQL          | mysql, mongodb      | 43
(CA) CodeAnalyzer | Compiler, parser, etc. | ruby, php-src       | 88
(MW) Middleware   | OS, VMs, etc.          | linux, memcached    | 48
(LIB) Library     | APIs, libraries, etc.  | androidApis, opencv | 175
(FW) Framework    | SDKs, plugins          | ios sdk, coffeekup  | 206
(OTH) Other       | –                      | Arduino, autoenv    | 49

2.4. Categorizing bugs
While fixing software bugs, developers often leave important information in the commit logs about the nature of the bugs, for example, why the bugs arise and how to fix them. We exploit such information to categorize the bugs, similar to Tan et al.13, 24

First, we categorize the bugs based on their Cause and Impact. Causes are further classified into disjoint subcategories of errors: Algorithmic, Concurrency, Memory, generic Programming, and Unknown. The bug Impact is also classified into four disjoint subcategories: Security, Performance, Failure, and Other unknown categories. Thus, each bug-fix commit also has an induced Cause and an Impact type. Table 5 shows the description of each bug category. This classification is performed in two phases:

(1) Keyword search. We randomly choose 10% of the bug-fix messages and use a keyword-based search technique to automatically categorize them as potential bug types. We use this annotation, separately, for both Cause and Impact types. We chose a restrictive set of keywords and phrases, as shown in Table 5. Such a restrictive set of keywords and phrases helps reduce false positives.

(2) Supervised classification. We use the annotated bug fix logs from the previous step as training data for supervised learning techniques to classify the remainder of the bug fix messages by treating them as test data. We first convert each bug fix message to a bag-of-words. We then remove words that appear only once among all of the bug fix messages. This reduces project-specific keywords. We also stem the bag-of-words using standard natural language processing techniques. Finally, we use a Support Vector Machine to classify the test data.

To evaluate the accuracy of the bug classifier, we manually annotated 180 randomly chosen bug fixes, equally distributed across all of the categories. We then compare the result of the automatic classifier with the manually annotated data set. The performance of this process was acceptable, with precision ranging from a low of 70% for performance bugs to a high of 100% for concurrency bugs, with an average of 84%. Recall ranged from 69% to 91%, with an average of 84%.

The result of our bug classification is shown in Table 5. Most of the defect causes are related to generic programming errors. This is not surprising as this category involves a wide variety of programming errors such as type errors, typos, compilation errors, etc. Our technique could not classify 1.04% of the bug fix messages in any Cause or Impact category; we classify these as Unknown.

2.5. Statistical methods
We model the number of defective commits against other factors related to software projects using regression. All models use negative binomial regression (NBR) to model the counts of project attributes such as the number of commits. NBR is a type of generalized linear model used to model non-negative integer responses.4

In our models we control for several language-dependent, per-project factors that are likely to influence the outcome. Consequently, each (language, project) pair is a row in our regression and is viewed as a sample from the population of open source projects. We log-transform the independent count variables, as this stabilizes the variance and usually improves the model fit.4 We verify this by comparing transformed with non-transformed data using the AIC and Vuong's test for non-nested models.
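To make the modeling choice in Section 2.5 concrete, here is a minimal Python sketch (standard library only) of the two ingredients of a negative binomial regression with a log link: the expected count $\mu = \exp(\beta_0 + \sum_j \beta_j x_j)$, where the $x_j$ would be the log-transformed controls and the language codes, and the NB2 probability mass function, whose variance $\mu + \mu^2/\theta$ exceeds its mean. The NB2 parameterization, the dispersion parameter `theta`, the function names, and the example numbers are our assumptions rather than details reported by the study. The sketch evaluates the log-likelihood that a fitting routine would maximize; it does not fit a model and is not the authors' code.

```python
import math
from typing import Sequence


def expected_count(intercept: float, coefs: Sequence[float], xs: Sequence[float]) -> float:
    """Log link: log(mu) = intercept + sum_j coefs[j] * xs[j]."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, xs)))


def nb2_logpmf(y: int, mu: float, theta: float) -> float:
    """log P(Y = y) for a negative binomial with mean mu > 0 and dispersion theta > 0 (NB2)."""
    return (
        math.lgamma(y + theta)
        - math.lgamma(theta)
        - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def nb2_pmf(y: int, mu: float, theta: float) -> float:
    """P(Y = y); the variance is mu + mu**2 / theta, i.e., larger than the mean."""
    return math.exp(nb2_logpmf(y, mu, theta))


def nb2_log_likelihood(ys: Sequence[int], mus: Sequence[float], theta: float) -> float:
    """The quantity a fitting routine would maximize over the coefficients and theta."""
    return sum(nb2_logpmf(y, mu, theta) for y, mu in zip(ys, mus))


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the study.
    mu, theta = 4.0, 2.0
    support = range(500)  # probability mass beyond y = 500 is negligible here
    mean = sum(y * nb2_pmf(y, mu, theta) for y in support)
    var = sum((y - mean) ** 2 * nb2_pmf(y, mu, theta) for y in support)
    print(f"mean={mean:.4f} variance={var:.4f}")  # should match mu and mu + mu**2 / theta
```

A Poisson model forces the variance to equal the mean; the extra $\mu^2/\theta$ term is what lets a negative binomial model absorb the over-dispersion that count data such as commits often show. As `theta` grows, the distribution approaches a Poisson with the same mean.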

Table 5. Categories of bugs and their distribution in the whole dataset.

Group  | Bug type           | Bug description                  | Search keywords/phrases | Count   | % count
------ | ------------------ | -------------------------------- | ----------------------- | ------- | -------
Cause  | Algorithm (Algo)   | Algorithmic or logical errors    | Algorithm | 606 | 0.11
Cause  | Concurrency (Conc) | Multithreading/processing issues | Deadlock, race condition, synchronization error | 11,111 | 1.99
Cause  | Memory (Mem)       | Incorrect memory handling        | Memory leak, null pointer, buffer overflow, heap overflow, dangling pointer, double free, segmentation fault | 30,437 | 5.44
Cause  | Programming (Prog) | Generic programming errors       | Exception handling, error handling, type error, typo, compilation error, copy-paste error, refactoring, missing switch case, faulty initialization, default value | 495,013 | 88.53
Impact | Security (Sec)     | Runs, but can be exploited       | Buffer overflow, security, password, oauth, ssl | 11,235 | 2.01
Impact | Performance (Perf) | Runs, but with delayed response  | Optimization problem, performance | 8,651 | 1.55
Impact | Failure (Fail)     | Crash or hang                    | Reboot, crash, hang, restart | 21,079 | 3.77
       | Unknown (Unkn)     | Not part of the above categories | – | 5,792 | 1.04
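Phase (1) of the bug categorization in Section 2.4 amounts to keyword matching, which the Python sketch below illustrates. It transcribes the bug-fix keywords from Section 2.1 and the search keywords/phrases from Table 5, flags a commit message as a bug fix, returns every Cause and Impact category whose keywords it contains, and draws the random 10% subset to be labeled. The matching rule (case-insensitive, anchored at the start of a word), the fixed random seed, and all names are our assumptions; the study does not specify how matches were made, and this is not its implementation.

```python
import random
import re
from typing import Dict, List, Sequence

# Bug-fix keywords listed in Section 2.1.
BUG_FIX_KEYWORDS = ["error", "bug", "fix", "issue", "mistake",
                    "incorrect", "fault", "defect", "flaw"]

# Search keywords/phrases transcribed from Table 5, keyed by category abbreviation.
CAUSE_KEYWORDS: Dict[str, List[str]] = {
    "Algo": ["algorithm"],
    "Conc": ["deadlock", "race condition", "synchronization error"],
    "Mem": ["memory leak", "null pointer", "buffer overflow", "heap overflow",
            "dangling pointer", "double free", "segmentation fault"],
    "Prog": ["exception handling", "error handling", "type error", "typo",
             "compilation error", "copy-paste error", "refactoring",
             "missing switch case", "faulty initialization", "default value"],
}
IMPACT_KEYWORDS: Dict[str, List[str]] = {
    "Sec": ["buffer overflow", "security", "password", "oauth", "ssl"],
    "Perf": ["optimization problem", "performance"],
    "Fail": ["reboot", "crash", "hang", "restart"],
}


def _word_prefix_pattern(terms: Sequence[str]):
    """Case-insensitive regex matching any term at the start of a word.

    Assumption: "fix" should match "fixes" and "hang" should match "hangs",
    while "bug" should not match "debug" and "hang" should not match "change".
    """
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternatives + ")", re.IGNORECASE)


_BUG_FIX_RE = _word_prefix_pattern(BUG_FIX_KEYWORDS)
_CAUSE_RES = {cat: _word_prefix_pattern(t) for cat, t in CAUSE_KEYWORDS.items()}
_IMPACT_RES = {cat: _word_prefix_pattern(t) for cat, t in IMPACT_KEYWORDS.items()}


def is_bug_fix(message: str) -> bool:
    """Flag a commit message that contains any bug-fix keyword."""
    return _BUG_FIX_RE.search(message) is not None


def keyword_categories(message: str) -> Dict[str, List[str]]:
    """Every Cause and Impact category whose keywords occur in the message.

    An empty list means the keyword phase leaves that dimension unlabeled.
    """
    return {
        "cause": [cat for cat, rx in _CAUSE_RES.items() if rx.search(message)],
        "impact": [cat for cat, rx in _IMPACT_RES.items() if rx.search(message)],
    }


def sample_for_annotation(messages: Sequence[str], fraction: float = 0.10,
                          seed: int = 0) -> List[str]:
    """Draw the random subset (10% by default) that receives keyword labels."""
    if not messages:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(messages) * fraction))
    return rng.sample(list(messages), k)
```

For instance, "Fix buffer overflow in SSL handshake" is labeled Mem for Cause and Sec for Impact. Messages labeled this way would form the training data for the supervised phase (2); the bag-of-words, stemming, and SVM steps are not sketched here.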


To check that excessive multicollinearity is not an issue, we compute the variance inflation factor4 of each independent variable in all of the models, with a conservative maximum value of 5. We check for and remove high-leverage points through visual examination of the residuals-versus-leverage plot for each model, looking for both separation and large values of Cook's distance.

We employ effects, or contrast, coding in our study to facilitate interpretation of the language coefficients.4 Weighted effects codes allow us to compare each language to the average effect across all languages while compensating for the unevenness of language usage across projects.23 To test for the relationship between two factor variables we use a Chi-square test of independence.14 After confirming a dependence, we use Cramer's V, an r × c equivalent of the phi coefficient for nominal data, to establish an effect size.

3. RESULTS
We begin with a straightforward question that directly addresses the core of what some fervently believe must be true, namely:

RQ1. Are some languages more defect-prone than others?
We use a regression model to compare the impact of each language on the number of defects with the average impact of all languages, against defect fixing commits (see Table 6).

We include some variables as controls for factors that will clearly influence the response. Project age is included as older projects will generally have a greater number of defect fixes. Trivially, the number of commits to a project will also impact the response. Additionally, the number of developers who touch a project and the raw size of the project are both expected to grow with project activity.

The sign and magnitude of the estimated coefficients in the model of Table 6 relate the predictors to the outcome. The first four variables are control variables and we are not interested in their impact on the outcome other than to say that they are all positive and significant. The language variables are indicator variables, viz. factor variables, for each project. The coefficient compares each language to the grand weighted mean of all languages in all projects. The language coefficients can be broadly grouped into three general categories. The first category is those for which the coefficient is statistically insignificant and the modeling procedure could not distinguish the coefficient from zero. These languages may behave similarly to the average or they may have wide variance. The remaining coefficients are significant and either positive or negative. For those with positive coefficients we can expect that the language is associated with a greater number of defect fixes. These languages include C, C++, Objective-C, Php, and Python. The languages Clojure, Haskell, Ruby, and Scala all have negative coefficients, implying that these languages are less likely than average to result in defect fixing commits.

One should take care not to overestimate the impact of language on defects. While the observed relationships are statistically significant, the effects are quite small. Analysis of deviance reveals that language accounts for less than 1% of the total explained deviance.

Term        | Df | Deviance  | Resid. Df | Resid. dev | Pr(>Chi)
----------- | -- | --------- | --------- | ---------- | --------
NULL        |    |           | 1075      | 25,176.25  |
Log age     | 1  | 8011.52   | 1074      | 17,164.73  | 0
Log size    | 1  | 10,082.78 | 1073      | 7081.95    | 0
Log devs    | 1  | 1538.32   | 1072      | 5543.63    | 0
Log commits | 1  | 4256.89   | 1071      | 1286.74    | 0
Language    | 16 | 130.78    | 1055      | 1155.96    | 0
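The "less than 1%" figure can be checked directly from the analysis-of-deviance table. In a sequential analysis of deviance, each term's deviance is the drop in residual deviance when that term is added, so the total explained deviance is the null deviance minus the final residual deviance. The short Python sketch below verifies that arithmetic for the transcribed rows, listed in the order implied by the residual degrees of freedom (1075, 1074, ..., 1055), and computes each term's share. The names are illustrative only.

```python
from typing import List, Optional, Tuple

# (term, df, deviance, residual df, residual deviance), transcribed from the
# analysis-of-deviance table, in the order in which the terms were added.
Row = Tuple[str, Optional[int], Optional[float], int, float]
ANOVA: List[Row] = [
    ("NULL", None, None, 1075, 25176.25),
    ("Log age", 1, 8011.52, 1074, 17164.73),
    ("Log size", 1, 10082.78, 1073, 7081.95),
    ("Log devs", 1, 1538.32, 1072, 5543.63),
    ("Log commits", 1, 4256.89, 1071, 1286.74),
    ("Language", 16, 130.78, 1055, 1155.96),
]


def is_consistent(rows: List[Row], tol: float = 0.01) -> bool:
    """Each term's deviance (and df) must equal the drop in residual deviance (and df)."""
    for prev, cur in zip(rows, rows[1:]):
        if abs((prev[4] - cur[4]) - cur[2]) > tol or prev[3] - cur[3] != cur[1]:
            return False
    return True


def share_of_explained(rows: List[Row], term: str) -> float:
    """Fraction of the explained deviance (null minus final residual) due to one term."""
    explained = rows[0][4] - rows[-1][4]
    deviance = next(r[2] for r in rows if r[0] == term)
    return deviance / explained


if __name__ == "__main__":
    print("consistent:", is_consistent(ANOVA))
    for term, *_ in ANOVA[1:]:
        print(f"{term}: {share_of_explained(ANOVA, term):.2%}")
```

Running it reports that Language accounts for about 0.54% of the explained deviance, with the four log-transformed controls accounting for the rest.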
Table 6. Some languages induce fewer defects than other languages.

Defective commits model | Coef. (Std. Err.)
----------------------- | -----------------
(Intercept)             | −2.04 (0.11)***
Log age                 | 0.06 (0.02)***
Log size                | 0.04 (0.01)***
Log devs                | 0.06 (0.01)***
Log commits             | 0.96 (0.01)***
C                       | 0.11 (0.04)**
C++                     | 0.18 (0.04)***
C#                      | −0.02 (0.05)
Objective-C             | 0.15 (0.05)**
Go                      | −0.11 (0.06)
Java                    | −0.06 (0.04)
CoffeeScript            | 0.06 (0.05)
JavaScript              | 0.03 (0.03)
TypeScript              | 0.15 (0.10)
Ruby                    | −0.13 (0.05)**
Php                     | 0.10 (0.05)*
Python                  | 0.08 (0.04)*
Perl                    | −0.12 (0.08)
Clojure                 | −0.30 (0.05)***
Erlang                  | −0.03 (0.05)
Haskell                 | −0.26 (0.06)***
Scala                   | −0.24 (0.05)***

Response is the number of defective commits. Languages are coded with weighted effects coding. AIC = 10432, Deviance = 1156, Num. obs. = 1076.
***p < 0.001, **p < 0.01, *p < 0.05

We can read the model coefficients as the expected change in the log of the response for a one-unit change in the predictor with all other predictors held constant; that is, for a coefficient $\beta_i$, a one-unit change in the corresponding predictor $x_i$ multiplies the expected response by $e^{\beta_i}$. For the factor variables, this expected change is compared to the average across all languages. Thus, if, for some number of commits, a particular project developed in an average language had four defective commits, then the choice to use C++ would mean that we should expect one additional defective commit since $e^{0.18} \times 4 = 4.79$. For the same project, choosing Haskell would mean that we should expect about one fewer defective commit as $e^{-0.26} \times 4 = 3.08$. The accuracy of this prediction depends on all other factors remaining the same, a challenging proposition for all but the most trivial of projects. All observational studies face similar limitations; we address this concern in more detail in Section 5.

Result 1: Some languages have a greater association with defects than other languages, although the effect is small.

In the remainder of this paper we expand on this basic result by considering how different categories of application, defect, and language lead to further insight into the relationship between languages and defect proneness.
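The coefficient interpretation in RQ1 can be reproduced with a few lines of arithmetic. This Python sketch copies a handful of language coefficients from Table 6 and applies the log-link reading: relative to the weighted average language, and holding the other predictors fixed, a coefficient β multiplies the expected count of defective commits by e^β. The dictionary and function names are illustrative, not from the study, and the standard errors are ignored, so the output inherits every caveat stated in the text.

```python
import math

# A few language coefficients copied from Table 6 (weighted effects coding, log link).
TABLE6_COEF = {"C": 0.11, "C++": 0.18, "Clojure": -0.30, "Haskell": -0.26, "Scala": -0.24}


def rate_ratio(coef: float) -> float:
    """Multiplicative change in the expected count relative to the weighted average language."""
    return math.exp(coef)


def expected_defective_commits(baseline: float, language: str) -> float:
    """Scale the count expected for an 'average' language, all other predictors held fixed."""
    return baseline * rate_ratio(TABLE6_COEF[language])


if __name__ == "__main__":
    # The worked example: an average-language project with four defective commits.
    for lang in ("C++", "Haskell"):
        print(lang, round(expected_defective_commits(4, lang), 2))
    # Each copied coefficient expressed as a percentage change versus the average language.
    for lang, coef in TABLE6_COEF.items():
        print(f"{lang}: {rate_ratio(coef) - 1:+.1%}")
```

Running it prints 4.79 for C++ and 3.08 for Haskell, the values in the worked example, followed by each copied coefficient expressed as a percentage change (for example, +19.7% for C++ and −22.9% for Haskell).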
Software bugs usually fall under two broad categories: (1) domain-specific bugs, which are specific to project function and do not
depend on the underlying programming language. (2) Generic Table 7. Functional languages have a smaller relationship to defects
bug: more generic in nature and has less to do with project than other language classes whereas procedural languages are
function, for example, typeerrors, concurrency errors, etc. greater than or similar to the average.
Consequently, it is reasonable to think that the interaction of
application domain and language might impact the number Defective commits

of defects within a project. Since some languages are believed (Intercept) −2.13 (0.10)***
to excel at some tasks more so than others, for example, C for Log commits 0.96 (0.01)***
Log age 0.07 (0.01)***
low level work, or Java for user applications, making an inap- Log size 0.05 (0.01)***
propriate choice might lead to a greater number of defects. To Log devs 0.07 (0.01)***
study this we should ideally ignore the domain specific bugs, Functional-Static-Explicit-Managed −0.25 (0.04)***
as generic bugs are more likely to depend on the programming Functional-Dynamic-Explicit-Managed −0.17 (0.04)***
Proc-Static-Explicit-Managed −0.06 (0.03)*
language featured. However, since a domain-specific bugs may
Script-Dynamic-Explicit-Managed 0.001 (0.03)
also arise due to a generic programming error, it is difficult to Script-Dynamic-Implicit-Managed 0.04 (0.02)*
separate the two. A possible workaround is to study languages Proc-Static-Implicit-Unmanaged 0.14 (0.02)***
while controlling the domain. Statistically, however, with 17
Language classes coded with weighted effects codes (AIC = 10,419, Deviance = 1132,
languages across 7 domains, the large number of terms would Num. obs. = 1067).
be challenging to interpret given the sample size. ***p < 0.001, **p < 0.01, *p < 0.05.
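Tables 6 and 7 use weighted effects coding, so each coefficient compares a language (or language class) with the sample-weighted average rather than with a single base level. The sketch below shows how such codes are commonly constructed. It then fits ordinary least squares to made-up data, where the defining property holds exactly: the intercept equals the grand mean, and each non-reference coefficient equals that group's mean minus the grand mean. This is an illustration of the coding scheme only, not a reproduction of Table 7, whose models are negative binomial regressions with additional covariates. The group labels, data, and helper names are hypothetical, and a small normal-equations solver keeps the example dependency-free.

```python
from collections import Counter


def weighted_effects_codes(groups, reference):
    """Build weighted effects codes for a categorical variable.

    Returns (columns, rows): one code column per non-reference level. A member
    of level j gets 1 in column j and 0 elsewhere; a member of the reference
    level gets -n_j / n_ref in every column j. Each column sums to zero over
    the sample, which is what ties the intercept to the grand mean.
    """
    counts = Counter(groups)
    columns = [level for level in counts if level != reference]
    rows = []
    for g in groups:
        if g == reference:
            rows.append([-counts[c] / counts[reference] for c in columns])
        else:
            rows.append([1.0 if g == c else 0.0 for c in columns])
    return columns, rows


def least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))  # partial pivoting
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                factor = A[r][p] / A[p][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical outcomes for three unevenly sized classes ("script" is the reference).
groups = ["func", "func", "proc", "proc", "proc", "script"]
y = [1.0, 3.0, 4.0, 5.0, 6.0, 8.0]

columns, codes = weighted_effects_codes(groups, reference="script")
X = [[1.0] + row for row in codes]  # prepend the intercept column
coef = least_squares(X, y)

grand_mean = sum(y) / len(y)
print("intercept", round(coef[0], 4), "grand mean", round(grand_mean, 4))
for name, b in zip(columns, coef[1:]):
    members = [v for g, v in zip(groups, y) if g == name]
    print(name, round(b, 4), "group mean - grand mean",
          round(sum(members) / len(members) - grand_mean, 4))
# intercept 4.5 grand mean 4.5
# func -2.5 group mean - grand mean -2.5
# proc 0.5 group mean - grand mean 0.5
```

Switching to treatment (dummy) coding would instead make the intercept the reference group's mean (8.0 here). The model's fit is unchanged; only the comparison each coefficient expresses differs.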
Given this, we first consider testing for the dependence between domain and language usage within a project, using a Chi-square test of independence. Of 119 cells, 46, that is, 39%, are below the value of 5, which is too high: no more than 20% of the counts should be below 5.[14] We include the value here for completeness[d]; however, the low strength of association of 0.191, as measured by Cramér's V, suggests that any relationship between domain and language is small and that inclusion of domain in regression models would not produce meaningful results.

One option to address this concern would be to remove languages or combine domains; however, our data here present no clear choices. Alternatively, we could combine languages; this choice leads to a related but slightly different question.

RQ2. Which language properties relate to defects?

Rather than considering languages individually, we aggregate them by language class, as described in Section 2.2, and analyze the relationship to defects. Broadly, each of these properties divides languages along lines that are often discussed in the context of errors, drives user debate, or has been the subject of prior work. Since the individual properties are highly correlated, we create six model factors that combine all of the individual factors across all of the languages in our study. We then model the impact of the six different factors on the number of defects while controlling for the same basic covariates that we used in the model in RQ1.

As with language (earlier in Table 6), we are comparing language classes with the average behavior across all language classes. The model is presented in Table 7. It is clear that the Script-Dynamic-Explicit-Managed class has the smallest magnitude coefficient. The coefficient is insignificant, that is, the z-test for the coefficient cannot distinguish the coefficient from zero. Given the magnitude of the standard error, however, we can assume that the behavior of languages in this class is very close to the average across all languages. We confirm this by recoding the coefficient using Proc-Static-Implicit-Unmanaged as the base level and employing treatment, or dummy, coding that compares each language class with the base level. In this case, Script-Dynamic-Explicit-Managed is significantly different with p = 0.00044. We note here that while choosing different coding methods affects the coefficients and z-scores, the models are identical in all other respects. When we change the coding we are rescaling the coefficients to reflect the comparison that we wish to make.[4] Comparing the other language classes to the grand mean, Proc-Static-Implicit-Unmanaged languages are more likely to induce defects. This implies that either implicit type conversion or memory management issues contribute to greater defect proneness as compared with other procedural languages.

Among scripting languages we observe a similar relationship between languages that allow versus those that do not allow implicit type conversion, providing some evidence that implicit type conversion (vs. explicit) is responsible for this difference as opposed to memory management. We cannot state this conclusively given the correlation between factors. However, when compared to the average, as a group, languages that do not allow implicit type conversion are less error-prone while those that do are more error-prone. The contrast between static and dynamic typing is also visible in functional languages.

The functional languages as a group show a strong difference from the average. Statically typed languages have a substantially smaller coefficient, yet both functional language classes have the same standard error. This is strong evidence that functional static languages are less error-prone than functional dynamic languages; however, the z-tests only test whether the coefficients are different from zero. In order to strengthen this assertion, we recode the model as above using treatment coding and observe that the Functional-Static-Explicit-Managed language class is significantly less defect-prone than the Functional-Dynamic-Explicit-Managed language class with p = 0.034.

                 Df  Deviance   Resid. Df  Resid. Dev  Pr(>Chi)
NULL                            1066       32,995.23
Log commits      1   31,634.32  1065       1360.91     0
Log age          1   51.04      1064       1309.87     0
Log size         1   50.82      1063       1259.05     0
Log devs         1   31.11      1062       1227.94     0
Lang. class      5   95.54      1057       1132.40     0

[d] Chi-squared value of 243.6 with 96 df and p = 8.394e−15.
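The quantities used in this subsection can be recomputed from their definitions. The minimal, dependency-free sketch below uses hypothetical helper names. It computes the Pearson chi-square statistic, its degrees of freedom, the share of cells whose expected count falls below 5 (the usual rule of thumb is stated in terms of expected counts), and Cramér's V = sqrt(χ² / (n · (min(r, c) − 1))) for a made-up table. It then checks the degrees of freedom for the two chi-square tests in this section: 96 df for languages by domains here, and 30 df for the language-class test that follows. Finally, it re-derives the residual-deviance column of the analysis-of-deviance table from the sequential deviance reductions.

```python
import math


def chi_square_summary(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, fraction of cells whose *expected*
    count is below 5, Cramér's V). Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2, low_cells = 0.0, 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if expected < 5:
                low_cells += 1
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    cramers_v = math.sqrt(chi2 / (n * (min(r, c) - 1)))
    return chi2, df, low_cells / (r * c), cramers_v


# A made-up 2 x 2 table, just to exercise the function.
chi2, df, low_share, v = chi_square_summary([[10, 20], [30, 40]])
print(round(chi2, 4), df, low_share, round(v, 4))  # 0.7937 1 0.0 0.0891

# Degrees of freedom behind the two tests in the text:
# 17 languages x 7 domains, and 6 language classes x 7 domains.
print((17 - 1) * (7 - 1), (6 - 1) * (7 - 1))  # 96 30
print(round(46 / 119, 2))  # 0.39 (share of the 119 cells below 5)

# Sequential analysis of deviance for the language-class model (values from the table).
null_deviance = 32995.23
reductions = [("Log commits", 31634.32), ("Log age", 51.04), ("Log size", 50.82),
              ("Log devs", 31.11), ("Lang. class", 95.54)]
residual = null_deviance
for term, drop in reductions:
    residual -= drop
    print(f"{term:<12} residual deviance {residual:.2f}")
print(f"language class explains {95.54 / null_deviance:.2%} of the null deviance")
```

The last line prints 0.29%, consistent with the statement that language class explains much less than 1% of the deviance. Cramér's V also depends on n, which is not restated here, so the sketch does not attempt to reproduce the 0.191 and 0.133 values.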



As with language and defects, the relationship between language class and defects is based on a small effect. The deviance explained is similar, albeit smaller, with language class explaining much less than 1% of the deviance.

We now revisit the question of application domain. Does domain have an interaction with language class? Does the choice of, for example, a functional language have an advantage for a particular domain? As above, a Chi-square test for the relationship between these factors and the project domain yields a value of 99.05 and df = 30 with p = 2.622e−09, allowing us to reject the null hypothesis that the factors are independent. Cramér's V yields a value of 0.133, a weak level of association. Consequently, although there is some relation between domain and language, there is only a weak relationship between domain and language class.

Result 2: There is a small but significant relationship between language class and defects. Functional languages are associated with fewer defects than either procedural or scripting languages.

It is somewhat unsatisfying that we do not observe a strong association between language, or language class, and domain within a project. An alternative way to view this same data is to disregard projects and aggregate defects over all languages and domains. Since this does not yield independent samples, we do not attempt to analyze it statistically; rather, we take a descriptive, visualization-based approach.

We define Defect Proneness as the ratio of bug fix commits over total commits per language per domain. Figure 1 illustrates the interaction between domain and language using a heat map, where the defect proneness increases from lighter to darker zones. We investigate which language factors influence defect fixing commits across a collection of projects written in a variety of languages. This leads to the following research question:

RQ3. Does language defect proneness depend on domain?

In order to answer this question we first filtered out projects that would have been viewed as outliers, filtered as high leverage points, in our regression models. This was necessary here as, even though this is a nonstatistical method, some relationships could impact visualization. For example, we found that a single project, Google's v8, a JavaScript project, was responsible for all of the errors in Middleware. This was surprising to us since JavaScript is typically not used for Middleware. This pattern repeats in other domains; consequently, we filter out the projects that have defect density below the 10th and above the 90th percentile. The result is in Figure 1.

Figure 1. Interaction of language's defect proneness with domain. Each cell in the heat map represents defect proneness of a language (row header) for a given domain (column header). The "Overall" column represents defect proneness of a language over all the domains. The cells with white cross mark indicate null value, that is, no commits were made corresponding to that cell.
[Heat map. Rows (Language): C, C++, C#, Objective-C, Go, Java, CoffeeScript, JavaScript, TypeScript, Ruby, Php, Python, Perl, Clojure, Erlang, Haskell, Scala. Columns (Domain): Application, Code Analyzer, Database, Framework, Library, Middleware, Overall.]

We see only a subdued variation in this heat map, which is a result of the inherent defect proneness of the languages as seen in RQ1. To validate this, we measure the pairwise rank correlation between the language defect proneness for each domain and the overall defect proneness. The correlation is positive in every domain; it is significant for APP, FW, and LIB (p < 0.01) and for CA (p = 0.02), but not for DB (p = 0.28) or MW (p = 0.09). Thus, w.r.t. defect proneness, the language ordering in most domains is strongly correlated with the overall language ordering.

                 APP    CA     DB     FW     LIB    MW
Spearman corr.   0.71   0.56   0.30   0.76   0.90   0.46
p-Value          0.00   0.02   0.28   0.00   0.00   0.09

Result 3: There is no general relationship between application domain and language defect proneness.

We have shown that different languages induce a larger number of defects and that this relationship is not only related to particular languages but holds for general classes of languages; however, we find that the type of project does not mediate this relationship to a large degree. We now turn our attention to categorization of the response. We want to understand how language relates to specific kinds of defects and how this relationship compares to the more general relationship that we observe. We divide the defects into categories as described in Table 5 and ask the following question:

RQ4. What is the relation between language and bug category?

Figure 2. Relation between bug categories and language class. Each cell represents the percentage of bug fix commits out of all bug fix commits per language class (row header) per bug category (column header). The values are normalized column-wise.
[Heat map. Rows: the six language classes; columns: bug types; color scale: Bug (%), 10 to 40.]

We use an approach similar to RQ3 to understand the relation between languages and bug categories. First, we study the relation between bug categories and language class. A heat map (Figure 2) shows aggregated defects over language classes and bug types. To understand the interaction between
bug categories and languages, we use an NBR model for each category. For each model we use the same control factors as in RQ1, as well as languages encoded with weighted effects, to predict defect fixing commits.

The results, along with the ANOVA value for language, are shown in Table 8. The overall deviance for each model is substantially smaller, and the proportion explained by language for a specific defect type is similar in magnitude for most of the categories. We interpret this relationship to mean that language has a greater impact on specific categories of bugs than it does on bugs overall. In the next section we expand on these results for the bug categories with significant bug counts as reported in Table 5. However, our conclusion generalizes to all categories.

Table 8. While the impact of language on defects varies across defect category, language has a greater impact on specific categories than it does on defects in general.

                          Memory           Concurrency      Security         Failure
(Intercept)              −7.49 (0.46)***  −8.13 (0.74)***  −7.29 (0.58)***  −6.21 (0.41)***
Log commits               0.99 (0.05)***   1.09 (0.09)***   0.89 (0.07)***   0.88 (0.05)***
Log age                   0.15 (0.06)*     0.19 (0.10)      0.30 (0.08)***   0.07 (0.06)
Log size                  0.01 (0.04)     −0.08 (0.07)     −0.01 (0.05)      0.14 (0.04)***
Log devs                  0.07 (0.04)      0.09 (0.07)      0.07 (0.06)     −0.11 (0.04)*
C                         1.71 (0.12)***   0.39 (0.22)      0.28 (0.18)      0.43 (0.13)**
C#                       −0.12 (0.17)      0.81 (0.24)***  −0.42 (0.23)     −0.07 (0.16)
C++                       1.08 (0.10)***   1.07 (0.18)***   0.40 (0.16)*     1.05 (0.11)***
Objective-C               1.40 (0.15)***   0.41 (0.28)     −0.14 (0.24)      1.10 (0.15)***
Go                       −0.05 (0.25)      1.62 (0.30)***   0.35 (0.28)     −0.49 (0.24)*
Java                      0.53 (0.14)***   0.80 (0.22)***  −0.07 (0.19)      0.15 (0.14)
CoffeeScript             −0.41 (0.23)     −1.73 (0.54)**   −0.36 (0.27)     −0.05 (0.19)
JavaScript               −0.16 (0.10)     −0.21 (0.16)      0.02 (0.12)     −0.15 (0.09)
TypeScript               −0.58 (0.62)     −0.63 (1.02)      0.37 (0.51)     −0.42 (0.41)
Ruby                     −1.16 (0.19)***  −0.89 (0.29)**   −0.18 (0.21)     −0.32 (0.16)*
Php                      −0.69 (0.17)***  −1.70 (0.34)***   0.11 (0.21)     −0.62 (0.17)***
Python                   −0.48 (0.14)***  −0.25 (0.22)      0.36 (0.16)*     0.04 (0.12)
Perl                      0.15 (0.35)     −1.23 (0.83)     −0.62 (0.45)     −0.64 (0.38)
Scala                    −0.47 (0.18)**    0.63 (0.24)**   −0.22 (0.22)     −0.93 (0.18)***
Clojure                  −1.21 (0.27)***  −0.01 (0.30)     −0.82 (0.27)**   −0.62 (0.19)**
Erlang                   −0.60 (0.23)**    0.63 (0.28)*     0.62 (0.22)**    0.59 (0.17)***
Haskell                  −0.28 (0.20)     −0.27 (0.32)     −0.45 (0.26)     −0.49 (0.20)*
AIC                       2991.47          2210.01          3328.39          4086.42
Deviance                  895.02           665.17           896.58           1043.02
Num. obs.                 1081             1081             1073             1077
Residual deviance (NULL)  5065.30          2124.93          2170.23          3769.70
Language deviance         522.86           139.67           42.72            240.51
For all models the deviance explained by language type has p < 0.0003076.
***p < 0.001, **p < 0.01, *p < 0.05.

Programming errors. Generic programming errors account for around 88.53% of all bug fix commits and occur in all the language classes. Consequently, the regression analysis draws a conclusion similar to that of RQ1 (see Table 6). All languages incur programming errors such as faulty error-handling, faulty definitions, typos, etc.

Memory errors. Memory errors account for 5.44% of all the bug fix commits. The heat map in Figure 2 shows a strong relationship between the Proc-Static-Implicit-Unmanaged class and memory errors. This is expected, as languages with unmanaged memory are known for memory bugs. Table 8 confirms that such languages, for example, C, C++, and Objective-C, introduce more memory errors. Among the managed languages, Java induces more memory errors, although fewer than the unmanaged languages. Although Java has its own garbage collector, memory leaks are not surprising since unused object references often prevent the garbage collector from reclaiming memory.[11] In our data, 28.89% of all the memory errors in Java are the result of a memory leak. In terms of effect size, language has a larger impact on memory defects than on all other cause categories.

Concurrency errors. 1.99% of the total bug fix commits are related to concurrency errors. The heat map shows that Proc-Static-Implicit-Unmanaged dominates this error type. C and C++ introduce 19.15% and 7.89% of the errors, and they are distributed across the projects.

(%)       C      C++    C#     Java   Scala  Go     Erlang
Race      63.11  41.46  77.7   65.35  74.07  92.08  78.26
Deadlock  26.55  43.36  14.39  17.08  18.52  10.89  15.94
SHM       28.78  18.24  9.36   9.16   8.02   0      0
MPI       0      2.21   2.16   3.71   4.94   1.98   10.14

Both of the Static-Strong-Managed language classes are in the darker zone in the heat map, confirming that, in general, static languages produce more concurrency errors than others. Among the dynamic languages, only Erlang is more prone to concurrency errors, perhaps relating to the greater use of this language for concurrent applications. Likewise, the negative coefficients in Table 8 show that projects written
in dynamic languages like Ruby and Php have fewer concurrency errors. Note that certain languages like JavaScript, CoffeeScript, and TypeScript do not support concurrency in its traditional form, while Php has limited support depending on its implementation. These languages introduce artificial zeros in the data, and thus the concurrency model coefficients in Table 8 for those languages cannot be interpreted like the other coefficients. Due to these artificial zeros, the average over all languages in this model is smaller, which may affect the sizes of the coefficients, since they are given w.r.t. the average, but it will not affect their relative relationships, which is what we are after.

A textual analysis based on word frequency of the bug fix messages suggests that most of the concurrency errors occur due to a race condition, deadlock, or incorrect synchronization, as shown in the table above. In every language except C++, where deadlocks slightly outnumber races (43.36% vs. 41.46%), race conditions are the most frequent cause of such errors, for example, 92% in Go. The enrichment of race condition errors in Go is probably due to an accompanying race-detection tool that may help developers locate races. The synchronization errors are primarily related to message passing interface (MPI) or shared memory operation (SHM). Erlang and Go use MPI[e] for inter-thread communication, which explains why these two languages do not have any SHM-related errors such as locking, mutex, etc. In contrast, projects in the other languages use SHM primitives for communication and thus may have locking-related errors.

[e] MPI does not require locking of shared resources.

Security and other impact errors. Around 7.33% of all the bug fix commits are related to Impact errors. Among them, Erlang, C++, and Python associate with more security errors than average (Table 8). Clojure projects associate with fewer security errors (Figure 2). From the heat map we also see that Static languages are in general more prone to failure and performance errors; these are followed by Functional-Dynamic-Explicit-Managed languages such as Erlang. The analysis of deviance results confirms that language is strongly associated with failure impacts. While security errors are the weakest among the categories, the deviance explained by language is still quite strong when compared with the residual deviance.

Result 4: Defect types are strongly associated with languages; some defect types, like memory errors and concurrency errors, also depend on language primitives. Language matters more for specific categories than it does for defects overall.

4. RELATED WORK
Prior work on programming language comparison falls in three categories:

(1) Controlled experiment. For a given task, developers are monitored while programming in different languages. Researchers then compare outcomes such as development effort and program quality. Hanenberg[7] compared static versus dynamic typing by monitoring 48 programmers for 27 h while developing a parser program. He found no significant difference in code quality between the two; however, dynamically typed languages were found to have shorter development time. His study was conducted with undergraduate students in a lab setting with a custom-designed language and IDE. Our study, by contrast, is a field study of popular software applications. While we can only indirectly (and post facto) control for confounding factors using regression, we benefit from much larger sample sizes, and more realistic, widely used software. We find that statically typed languages in general are less defect-prone than dynamically typed ones, and that disallowing implicit type conversion is better than allowing it, in the same regard. The effect sizes are modest; it could be reasonably argued that they are visible here precisely because of the large sample sizes.

Harrison et al.[8] compared C++, a procedural language, with SML, a functional language, finding no significant difference in total number of errors, although SML has higher defect density than C++. SML was not represented in our data, which, however, suggest that functional languages are generally less defect-prone than procedural languages. Another line of work primarily focuses on comparing development effort across different languages.[12, 20] However, these studies do not analyze language defect proneness.

(2) Surveys. Meyerovich and Rabkin[16] survey developers' views of programming languages, to study why some languages are more popular than others. They report strong influence from non-linguistic factors: prior language skills, availability of open source tools, and existing legacy systems. Our study also confirms that the availability of external tools impacts software quality, as suggested, for example, by the concurrency bugs in Go (see RQ4 in Section 3).

(3) Repository mining. Bhattacharya and Neamtiu[1] study four projects developed in both C and C++ and find that the software components developed in C++ are in general more reliable than those developed in C. We find that both C and C++ are more defect-prone than average. However, for certain bug types like concurrency errors, C is more defect-prone than C++ (see RQ4 in Section 3).

5. THREATS TO VALIDITY
We recognize a few threats to our reported results. First, to identify bug fix commits we rely on the keywords that developers often use to indicate a bug fix. Our choice was deliberate. We wanted to capture the issues that developers continuously face in an ongoing development process, rather than reported bugs. However, this choice poses a threat of overestimation. Our categorization of domains is subject to interpreter bias, although another member of our group verified the categories. Also, our effort to categorize bug fix commits could potentially be tainted by the initial choice of keywords. The descriptiveness of commit logs varies across projects. To mitigate these threats, we evaluate our classification against manual annotation as discussed in Section 2.4.

We determine the language of a file based on its extension. This can be error-prone if a file written in a different language takes a common language extension that we have studied. To reduce such error, we manually verified language categorization against a randomly sampled file set.

To interpret language class in Section 2.2, we make certain assumptions based on how a language property is most commonly used, as reflected in our data set; for example, we classify Objective-C as unmanaged memory type rather
than hybrid. Similarly, we annotate Scala as functional and C# as procedural, although both support either design choice.[19, 21] We do not distinguish object-oriented languages (OOP) from procedural languages in this work as there is no clear distinction; the difference largely depends on programming style. We categorize C++ as allowing implicit type conversion because a memory region of a certain type can be treated differently using pointer manipulation.[22] We note that most C++ compilers can detect type errors at compile time.

Finally, we associate defect fixing commits with language properties, although they could reflect reporting style or other developer properties. Availability of external tools or libraries may also impact the extent of bugs associated with a language.

6. CONCLUSION
We have presented a large-scale study of language type and use as it relates to software quality. The GitHub data we used is characterized by its complexity and variance along multiple dimensions. Our sample size allows a mixed-methods study of the effects of language, and of the interactions of language, domain, and defect type, while controlling for a number of confounds. The data indicates that functional languages are better than procedural languages; it suggests that disallowing implicit type conversion is better than allowing it; that static typing is better than dynamic; and that managed memory usage is better than unmanaged. Further, it suggests that the defect proneness of languages in general is not associated with software domains. Additionally, languages are more related to individual bug categories than to bugs overall.

On the other hand, even large datasets become small and insufficient when they are sliced and diced many ways simultaneously. Consequently, with an increasing number of dependent variables it is difficult to answer questions about a specific variable's effect, especially where variable interactions exist. Hence, we are unable to quantify the specific effects of language type on usage. Additional methods such as surveys could be helpful here. Addressing these challenges remains for future work.

Acknowledgments
This material is based upon work supported by the National Science Foundation under grant nos. 1445079, 1247280, 1414172, 1446683 and from AFOSR award FA955–11-1-0246.

References
1. Bhattacharya, P., Neamtiu, I. Assessing programming language impact on development and maintenance: A study on C and C++. In Proceedings of the 33rd International Conference on Software Engineering, ICSE'11 (New York, NY, USA, 2011). ACM, 171–180.
2. Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P. Don't touch my code! Examining the effects of ownership on software quality. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (2011). ACM, 4–14.
3. Blei, D.M. Probabilistic topic models. Commun. ACM 55, 4 (2012), 77–84.
4. Cohen, J. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Lawrence Erlbaum, 2003.
5. Easterbrook, S., Singer, J., Storey, M.-A., Damian, D. Selecting empirical methods for software engineering research. In Guide to Advanced Empirical Software Engineering (2008). Springer, 285–311.
6. El Emam, K., Benlarbi, S., Goel, N., Rai, S.N. The confounding effect of class size on the validity of object-oriented metrics. IEEE Trans. Softw. Eng. 27, 7 (2001), 630–650.
7. Hanenberg, S. An experiment about static and dynamic type systems: Doubts about the positive impact of static type systems on development time. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'10 (New York, NY, USA, 2010). ACM, 22–35.
8. Harrison, R., Samaraweera, L., Dobie, M., Lewis, P. Comparing programming paradigms: An evaluation of functional and object-oriented programs. Softw. Eng. J. 11, 4 (1996), 247–254.
9. Harter, D.E., Krishnan, M.S., Slaughter, S.A. Effects of process maturity on quality, cycle time, and effort in software product development. Manage. Sci. 46, 4 (2000), 451–466.
10. Hindley, R. The principal type-scheme of an object in combinatory logic. Trans. Am. Math. Soc. (1969), 29–60.
11. Jump, M., McKinley, K.S. Cork: Dynamic memory leak detection for garbage-collected languages. In ACM SIGPLAN Notices, Volume 42 (2007). ACM, 31–38.
12. Kleinschmager, S., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A. Do static type systems improve the maintainability of software systems? An empirical study. In 2012 IEEE 20th International Conference on Program Comprehension (ICPC) (2012). IEEE, 153–162.
13. Li, Z., Tan, L., Wang, X., Lu, S., Zhou, Y., Zhai, C. Have things changed now? An empirical study of bug characteristics in modern open source software. In ASID'06: Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability (October 2006).
14. Marques De Sá, J.P. Applied Statistics Using SPSS, Statistica and Matlab, 2003.
15. Mayer, C., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A. An empirical study of the influence of static type systems on the usability of undocumented software. In ACM SIGPLAN Notices, Volume 47 (2012). ACM, 683–702.
16. Meyerovich, L.A., Rabkin, A.S. Empirical analysis of programming language adoption. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (2013). ACM, 1–18.
17. Milner, R. A theory of type polymorphism in programming. J. Comput. Syst. Sci. 17, 3 (1978), 348–375.
18. Mockus, A., Votta, L.G. Identifying reasons for software changes using historic databases. In ICSM'00. Proceedings of the International Conference on Software Maintenance (2000). IEEE Computer Society, 120.
19. Odersky, M., Spoon, L., Venners, B. Programming in Scala. Artima Inc, 2008.
20. Pankratius, V., Schmidt, F., Garretón, G. Combining functional and imperative programming for multicore software: An empirical study evaluating Scala and Java. In Proceedings of the 2012 International Conference on Software Engineering (2012). IEEE Press, 123–133.
21. Petricek, T., Skeet, J. Real World Functional Programming: With Examples in F# and C#. Manning Publications Co., 2009.
22. Pierce, B.C. Types and Programming Languages. MIT Press, 2002.
23. Posnett, D., Bird, C., Dévanbu, P. An empirical study on the influence of pattern roles on change-proneness. Emp. Softw. Eng. 16, 3 (2011), 396–423.
24. Tan, L., Liu, C., Li, Z., Wang, X., Zhou, Y., Zhai, C. Bug characteristics in open source software. Emp. Softw. Eng. (2013).

Baishakhi Ray ([email protected]), Department of Computer Science, University of Virginia, Charlottesville, VA.

Daryl Posnett, Premkumar Devanbu, and Vladimir Filkov ({dpposnett@, devanbu@cs., filkov@cs.}ucdavis.edu), Department of Computer Science, University of California, Davis, CA.

Copyright held by owners/authors. Publication rights licensed to ACM.


CAREERS

California Institute of Technology
Lecturer in Computing and Mathematical Sciences

The Department of Computing and Mathematical Sciences (CMS) at the California Institute of Technology invites applications for the position of Lecturer in Computing and Mathematical Sciences. This is a (non-tenure-track) career teaching position, with full-time teaching responsibilities. We seek to fill this position this coming academic year and the initial term of appointment can be up to three years.
The lecturer will teach introductory computer science courses including data structures, algorithms and software engineering, and will work closely with the CMS faculty on instructional matters. The ability to teach intermediate-level undergraduate courses in areas such as software engineering, computing systems and/or compilers is desired. The lecturer may also assist in other aspects of the undergraduate program, including curriculum development, academic advising, and monitoring research projects. The lecturer must have a track record of excellence in teaching computer science to undergraduates. In addition, the lecturer will have opportunities to participate in research projects in the department. An advanced degree in Computer Science or related field is desired but not required.
Applications will be accepted on an ongoing basis until the position is filled. Please view the application instructions and apply on-line at https://applications.caltech.edu/job/cmslect
The California Institute of Technology is an Equal Opportunity/Affirmative Action Employer. Women, minorities, veterans, and disabled persons are encouraged to apply.

Calvin College
Tenure Track CS Faculty Position

The Department of Computer Science at Calvin College invites applications for a tenure track faculty position to begin August 2018, pending administrative approval. Our department features supportive colleagues, excellent facilities, (mostly) hardworking students, a dynamic colloquium series, and strong undergraduate programs in computer science, data science, digital communication, and information systems, including a BCS major accredited by ABET (abet.org). Applicants should have a PhD (or be near completion) in computer science or a related area, or have a master’s degree and 5 years of related experience. We are especially interested in expanding our expertise in the areas of 3D modeling & animation, cybersecurity, data analytics & visualization, or machine learning, but individuals from all computing-related areas are encouraged to apply.
Calvin is a Christian comprehensive liberal arts college located in Grand Rapids, Michigan; it is one of the largest Christian colleges in North America, and was named the #1 regional college in the Midwest for 2017 by U.S. News & World Report.
For more information and application information, see: https://cs.calvin.edu/documents/Tenure_Track_Faculty_Position.

Furman University
Open-Rank Tenure Track Professor in Computer Science

The Department of Computer Science at Furman University invites applications for an open-rank tenure track position to begin in the fall of 2018. Candidates must have a Ph.D. in Computer Science or a closely related field, and all areas of specialty will be considered. The position requires teaching excellence, scholarly and professional activity involving undergraduates, effective institutional service, and a willingness to work with colleagues across disciplines.
The Department of Computer Science confers degrees with majors in Computer Science (B.S.) and Information Technology (B.S. and B.A.), an innovative, interdisciplinary program of study. The Department values teaching and research projects that bridge Computer Science with other disciplines, providing students with learning opportunities outside the classroom and in the community, and contributing to Furman’s university-wide First Year Writing Seminar program. Furman Computer Science professors mentor undergraduates both formally and informally, and work to build a welcoming student-faculty community.
Furman University is a selective private liberal arts and sciences college committed to helping students develop intellectually, personally, and interpersonally and providing the practical skills necessary to succeed in a rapidly-changing world. Our recently-launched strategic vision, The Furman Advantage, promises students an individualized four-year pathway facilitated by a team of mentors and infused with a rich and varied set of high-impact experiences outside the classroom that include undergraduate research, study away, internships, community-focused learning, and opportunities to engage across disciplines.
Furman is an Equal Opportunity Employer committed to increasing the diversity of its faculty and staff. The University aspires to create a community of people representing a multiplicity of identities including gender, race, religion, spiritual belief, sexual orientation, geographic origin, socioeconomic background, ideology, world view, and varied abilities. Domestic partners of employees are eligible for comprehensive benefits.
Applicants should submit a curriculum vitae, cover letter, statement of teaching philosophy and experience, statement of research interests, an official copy of most recent transcripts, and a diversity statement that describes how your teaching, scholarship, mentoring and/or service might contribute to a liberal arts college community that includes a commitment to diversity as one of its core values. Three letters of recommendation should be sent separately. Review of applications will continue until the position is filled. To submit an application and letters of recommendation, please visit http://jobs.furman.edu.

Johns Hopkins University
Full-Time Teaching Position

The Department of Computer Science at Johns Hopkins University seeks applicants for a full-time teaching position. This is a career-oriented, renewable appointment that is responsible for the development and delivery of courses primarily to undergraduate students both within and outside the major. Teaching faculty are also encouraged to engage in departmental and university service and may have advising responsibilities. Opportunities to teach graduate level courses may also be available, depending on the candidate’s background. Extensive grading support is given to all instructors. The university has instituted a non-tenure track career path for full-time teaching faculty culminating in the rank of Teaching Professor.
Johns Hopkins is a private university known for its commitment to academic excellence and research. The Computer Science department is one of nine academic departments in the Whiting School of Engineering. We are located in Baltimore, MD in close proximity to Washington, DC and Philadelphia, PA. See the department webpages at http://www.cs.jhu.edu for additional information about the department, including undergraduate programs and current course descriptions.
Applicants for the position must have a Ph.D. in Computer Science or a closely related field, demonstrated excellence in and commitment to teaching, and excellent communication skills. Applicants should apply online at https://academicjobsonline.org/ajo/jobs/9571. Applications will be evaluated on a rolling basis. Questions should be directed to [email protected].
The Johns Hopkins University is committed to active recruitment of a diverse faculty and student body. The University is an Affirmative Action/Equal Opportunity Employer of women, minorities, protected veterans and individuals with disabilities and encourages applications from these and other protected group members. Consistent with the University’s goals of achieving excellence in all areas, we will assess the comprehensive qualifications of each applicant.
The Whiting School of Engineering and the Department of Computer Science are committed to building a diverse educational environment.


Johns Hopkins University
Tenure-Track Faculty Positions

The Johns Hopkins University’s Department of Computer Science seeks applicants for tenure-track faculty positions at all levels and across all areas of computer science. Particular emphasis is at the junior level and in the areas of systems, distributed systems, networks, system security, data center scale computing, and big data infrastructure. However, qualified applicants in all areas of computer science will be considered.
The Department of Computer Science has 30 full-time tenured and tenure-track faculty members, 8 research and 3 teaching faculty members, 150 PhD students, 225 MSE/MSSI students, and over 300 undergraduate students. There are several affiliated research centers and institutes including the Laboratory for Computational Sensing and Robotics (LCSR), the Center for Language and Speech Processing (CLSP), the JHU Information Security Institute (JHUISI), the Institute for Data Intensive Engineering and Science (IDIES), the Malone Center for Engineering in Healthcare (MCEH), and other labs and research groups. More information about the Department of Computer Science can be found at www.cs.jhu.edu and about the Whiting School of Engineering at https://engineering.jhu.edu.
Qualifications and required materials can be found at http://www.cs.jhu.edu/about/employment-opportunities/.
Applicants should submit a curriculum vitae, a research statement, a teaching statement, three recent publications, and complete contact information for at least three references. Applications must be made on-line at https://academicjobsonline.org/ajo/jobs/9559. Review of applications will begin in December 2017. While candidates who complete their applications by December 15, 2017 will receive full consideration, the department will consider exceptional applicants at any time. Questions should be directed to [email protected].
The Johns Hopkins University is committed to active recruitment of a diverse faculty and student body. The University is an Affirmative Action/Equal Opportunity Employer of women, minorities, protected veterans and individuals with disabilities and encourages applications from these and other protected group members. Consistent with the University’s goals of achieving excellence in all areas, we will assess the comprehensive qualifications of each applicant.
The Whiting School of Engineering and the Department of Computer Science are committed to building a diverse educational environment.

Louisiana State University in Shreveport
Assistant or Associate Professor of Computer Science

The Department of Computer Science at Louisiana State University in Shreveport invites applications for a tenure-track Assistant/Associate Professor position. The successful candidate will teach 3 courses per semester (9 credit hours) beginning in the fall of 2018. Candidates must have a Ph.D. in computer science or a closely related discipline, teaching experience, and strong research record or research potential.
Strong candidates in all areas of Computer Science are encouraged to apply but we are especially interested in cyber security and networking, mobile development, database, or big data analytics.
Applications should include a cover letter, curriculum vitae, research statement, a teaching statement, and a list of 3 references. Please send electronic copies of these documents to Alfred McKinney at [email protected]. Review of applications will begin on September 30th, 2017, and continue until the position is filled.
Please visit http://www.lsus.edu for more details about LSUS and http://www.lsus.edu/hr for more information on this position and other employment opportunities.
LSUS is committed to achieving excellence through cultural diversity. The University actively encourages applications and/or nominations of women, persons of color, veterans and persons with disabilities. LSUS is an Affirmative Action, Equal Opportunity Employer.

Louisiana State University in Shreveport
Chair of Computer Science

The Department of Computer Science at Louisiana State University in Shreveport invites applications for the position of Chair of the Department of Computer Science in the College of Arts and Sciences to begin in the fall of 2018.

Department of Electrical and Computer Engineering
Graduate School of Engineering and Management
Air Force Institute of Technology (AFIT)
Dayton, Ohio
Department Head and Faculty Position

The Air Force Institute of Technology is seeking applications for both the position of Head, Department of Electrical and Computer Engineering (non-tenure-track), as well as a faculty position in Computer Science or Computer Engineering (tenure-track).
Applicants for the Department Head position must have an earned doctorate in electrical engineering, computer engineering, or computer science with credentials commensurate with the appointment as a full professor. A strong record of academic leadership experience is highly desirable. This is a non-tenure-track, 3 year renewable term position.
Applicants for the tenure-track faculty position must have an earned doctorate in computer science, computer engineering, electrical engineering, or closely related field. All areas and ranks will be considered. We are particularly interested in a cyber-security focus involving embedded and/or cyber physical systems security. The position requires teaching at the graduate level as well as establishing and sustaining a strong research program.
The Air Force Institute of Technology (AFIT) is the premier Department of Defense (DoD) institution for graduate education in science, technology, engineering, and management. The Department of Electrical and Computer Engineering offers accredited MS and PhD programs in Electrical Engineering, Computer Engineering, and Computer Science as well as an MS program in Cyber Operations.
Applicants for either position must be U.S. citizens. Full details on these positions, the department, applicant qualifications, and application procedures can be found at https://www.afit.edu/ENG/. Review of applications will begin on December 1, 2017. The United States Air Force is an equal opportunity, affirmative action employer.

TENURE-TRACK AND TENURED POSITIONS

ShanghaiTech University invites highly qualified candidates to fill multiple tenure-track/tenured faculty positions as its core founding team in the School of Information Science and Technology (SIST). We seek candidates with exceptional academic records or demonstrated strong potentials in all cutting-edge research areas of information science and technology. They must be fluent in English. English-based overseas academic training or background is highly desired.
ShanghaiTech is founded as a world-class research university for training future generations of scientists, entrepreneurs, and technical leaders. Boasting a new modern campus in Zhangjiang Hightech Park of cosmopolitan Shanghai, ShanghaiTech shall trail-blaze a new education system in China. Besides establishing and maintaining a world-class research profile, faculty candidates are also expected to contribute substantially to both graduate and undergraduate educations.
Academic Disciplines: Candidates in all areas of information science and technology shall be considered. Our recruitment focus includes, but is not limited to: computer architecture, software engineering, database, computer security, VLSI, solid state and nano electronics, RF electronics, information and signal processing, networking, security, computational foundations, big data analytics, data mining, visualization, computer vision, bio-inspired computing systems, power electronics, power systems, machine and motor drive, power management IC as well as inter-disciplinary areas involving information science and technology.
Compensation and Benefits: Salary and startup funds are highly competitive, commensurate with experience and academic accomplishment. We also offer a comprehensive benefit package to employees and eligible dependents, including on-campus housing. All regular ShanghaiTech faculty members will join its new tenure-track system in accordance with international practice for progress evaluation and promotion.
Qualifications:
• Strong research productivity and demonstrated potentials;
• Ph.D. (Electrical Engineering, Computer Engineering, Computer Science, Statistics, Applied Math, or related field);
• A minimum relevant (including PhD) research experience of 4 years.
Applications: Submit (in English, PDF version) a cover letter, a 2-page research plan, a CV plus copies of 3 most significant publications, and names of three referees to: [email protected]. For more information, visit http://sist.shanghaitech.edu.cn/NewsDetail.asp?id=373
Deadline: The positions will be open until they are filled by appropriate candidates.



CALIFORNIA STATE UNIVERSITY-EAST BAY
COLLEGE OF SCIENCE: COMPUTER SCIENCE

FACULTY EMPLOYMENT OPPORTUNITY


ASSISTANT PROFESSOR OF COMPUTER SCIENCE (GENERAL/CORE), TENURE-TRACK FACULTY
18-19 CS-GENERAL/CORE-TT [POSITION PS #00001011]
(2 positions)

THE UNIVERSITY: California State University, East Bay is known for award-winning programs, expert instruction, a diverse student body, and a choice of more than 100 career-focused fields of study. The ten major buildings of the Hayward Hills campus, on 342 acres, contain over 150 classrooms and teaching laboratories, over 177 specialized instructional rooms, numerous computer labs and a library, which contains a collection of over one million items. The University also has campuses in Concord and Oakland, as well as Online. With an enrollment of over 15,000 students and approximately 900 faculty, CSUEB is organized into four colleges: Letters, Arts, and Social Sciences; Business and Economics; Education and Allied Studies; and Science. The University offers bachelor’s degrees in 50 fields, minors in 61 fields, master’s degrees in 37 fields, 9 credential programs, 12 certificate options, and 1 doctoral degree program. http://www20.csueastbay.edu/
All California State University campuses, including CSUEB, will become smoke and tobacco-free effective September 1, 2017.

THE DEPARTMENT: The Department of Computer Science has 12 full-time faculty members, with a wide range of backgrounds and interests. The faculty is committed to teaching its undergraduate and Master’s level students. In a typical quarter, the Department will offer over 30 undergraduate and about 20 graduate classes. Classes are offered both in day and evening. Classes are generally small, with many opportunities for faculty-student contact. The Department offers a variety of degrees: B.S. in Computer Science (with possible options in Networking and Data Communications, Software Engineering, or Computer Engineering), and M.S. in both Computer Science and Computer Networks. Currently, there are more than 500 undergraduate majors and over 250 students in the M.S. programs.

DUTIES OF THE POSITION (2 positions currently available): Teaching courses at B.S. and M.S. levels, curriculum development at both levels, and sustaining a research program. Please note that teaching assignments at California State University, East Bay include courses at the Hayward, Concord and Online campuses. In addition to teaching, all faculty have advising responsibilities, assist the department with administrative and/or committee work, and are expected to assume campus-wide committee responsibilities. The ideal candidate for this position is able to:
1. Teach a wide range of computer science courses including most or all of the core subject matter at both the undergraduate and graduate level.
2. Support offerings for undergraduate C.S. students including teaching courses, developing the undergraduate curriculum, and engaging undergraduate students in research.
3. Support offerings for graduate C.S. students including teaching courses, guiding M.S. theses, developing the graduate comprehensive examination, etc.
4. Advise Computer Science students.
5. Participate in departmental activities such as curriculum development, assessment, outreach, etc.
6. Develop and continue ongoing research activities, service and leadership.

RANK AND SALARY: Assistant Professor. Salary is dependent upon educational preparation and experience. Subject to budgetary authorization.

DATE OF APPOINTMENT: Fall Semester, 2018.

QUALIFICATIONS: Applicants must have a Ph.D. in Computer Science by Fall Semester 2018. Applicants must be able to teach undergraduate and master’s level courses in most of the standard computer science core subjects. Candidates should demonstrate experience in teaching, mentoring, research, or community service that has prepared them to contribute to our commitment to diversity and excellence. Additionally, applicants must demonstrate a record of scholarly activity. Candidates’ accomplishments should be commensurate with their professional level.

This University is fully committed to the rights of students, staff and faculty with disabilities in accordance with applicable state and federal laws. For more information about the University’s program supporting the rights of our students with disabilities see: http://www20.csueastbay.edu/af/departments/as/

APPLICATION DEADLINE: Review of applications will begin November 1, 2017. The position, however, will be considered open until filled. Please submit a letter of application, which addresses the qualifications noted in the position announcement; a complete and current vita; names and email addresses of three references, three letters of recommendation, a statement of teaching philosophy, and evidence of teaching and research abilities via http://apply.interfolio.com/42009. See instructions on how to apply below. Applicants are strongly encouraged to also submit a one-page diversity statement that addresses how you engage a diverse student population in your teaching, research, mentoring, and advising.

Note: California State University, East Bay hires only individuals lawfully authorized to work in the United States. All offers of employment are contingent upon presentation of documents demonstrating the appointee’s identity and eligibility to work in accordance with provisions of the Immigration Reform and Control Act. A background check (including a criminal records check and prior employment verification) must be completed and cleared prior to the start of employment.

For instructions on how to apply, please visit:
https://help.interfolio.com/hc/en-us/articles/203701176-Job-Applicant-s-Guide-to-ByCommittee-Faculty-Search

As an Equal Opportunity Employer, CSUEB does not discriminate on the basis of any protected categories: age, ancestry, citizenship, color, disability, gender, immigration status, marital status, national origin, race, religion, sexual orientation, or veteran’s status. The University is committed to the principles of diversity in employment and to creating a stimulating learning environment for its diverse student body.

The Chair will teach 2 classes per semester (6 credit hours) and oversee Departmental operations, personnel, and resources. Additionally, the Chair will advance the Departmental vision of academic leadership and excellence and will represent the Department to academia, industry, and government for its continued vitality. The Chair will actively work with faculty in the Department and across the University to identify and pursue innovations in research, education, and service. Moreover, the Chair will be actively involved in strategic planning that would expand the influence of the Department in the broader academic community both within and outside the university, and will promote diversity within the Department and the institution.
The successful candidate must have earned a Ph.D. in Computer Science or a closely related field. The candidate should have a track record of research, classroom teaching, student mentorship, and funding from diverse sources. The candidate must provide evidence of scientific and organizational leadership and educational innovation, and must have outstanding communication, interpersonal, and administrative skills.
Interested individuals should submit an application, including a cover letter summarizing qualifications and leadership approach, a vision statement for the Department, a research and student mentoring statement, a teaching statement, detailed curriculum vitae, and three letters of reference. Please send electronic copies of these documents to Dr. Alfred McKinney at [email protected].
Applications will be reviewed on a continuing basis starting on October 30th, 2017, until the position is filled and the posting removed from the LSUS HR Web site. We welcome questions related to this posting.
Please visit http://www.lsus.edu for more details about LSUS and http://www.lsus.edu/hr for more information on this position and other employment opportunities.
LSUS is committed to achieving excellence through cultural diversity. The University actively encourages applications and/or nominations of women, persons of color, veterans and persons with disabilities. LSUS is an Affirmative Action, Equal Opportunity Employer.

Macalester College
Two Tenure-Track Assistant Professors of Computer Science

Macalester invites applications for two tenure-track positions at the assistant professor level to begin Fall 2018. Candidates who have, or are completing, a Ph.D. in Computer Science are preferred, but closely related fields may also be considered. We are especially interested in candidates who have a strong commitment to both teaching and research in an undergraduate liberal arts environment. This person will contribute to the teaching of our introductory, core and advanced courses, and mentor undergraduate research.
Macalester offers majors in Computer Science, Mathematics, and Applied Mathematics and Statistics, and minors in Computer Science, Mathematics, and Statistics, as well as a new minor in Data Science. Typical class sizes range from 15 to 32 students. We encourage innovative pedagogy and curriculum and emphasize computer science’s interdisciplinary connections. We have close relationships with several disciplines both within and beyond the sciences, and we are interested in candidates whose work spans disciplinary boundaries. Areas of highest priority include computer and data security and privacy, mobile and ubiquitous computing, computer networks and systems.
For more information about our programs, see: http://macalester.edu/mscs

About Macalester
Macalester College is a highly selective, private liberal arts college in the vibrant Minneapolis-Saint Paul metropolitan area. The Twin Cities have a population of approximately three million, a rich arts community, strong local industries, an award-winning parks system, and are home to many colleges and universities, including the University of Minnesota. Macalester’s diverse student body comprises over 2000 undergraduates from 40 states and the District of Columbia and over 90 nations. The College maintains a longstanding commitment to academic excellence with a special emphasis on internationalism, multiculturalism, and service to society. We are especially interested in applicants dedicated to excellence in teaching and research/creative activity within a liberal arts college community. As an Equal Opportunity employer supportive of affirmative efforts to achieve diversity among its faculty, Macalester College strongly encourages applications from women and members of underrepresented minority groups.

Applying
To apply via Academic Jobs Online submit (1) curriculum vitae, (2) graduate transcripts, (3) three letters of recommendation (at least one of which discusses your potential as a teacher), (4) a cover letter that addresses why you are interested in Macalester, (5) a statement of teaching philosophy, and (6) a research statement. Please contact Shilad Sen at [email protected] with any questions about the position. Evaluation of applications will begin October 15, 2017 and continue until the position is filled.
Apply now: https://www.macalester.edu/academics/mscs/compscitenure-trackjob.html

Urbana-Champaign, IL
Head and Professor
Department of Computer Science

The University of Illinois at Urbana-Champaign seeks a highly accomplished scholar and strategic leader as Head of the Department of Computer Science (CS). The new Head will lead a department that has continuously advanced the forefront of computing research and innovation since its inception, and whose faculty and alumni pioneered the modern computing era. The new Head will be uniquely positioned to build on the considerable strength and illustrious history of the department to lead the next chapter of the digital revolution. More details about the department can be found at http://cs.illinois.edu.
For a complete overview of this position, please visit http://cs.illinois.edu/about-us/faculty-positions. To ensure full consideration, applications should be received by November 10, 2017, but applications will be accepted until the position is filled.
The University has retained Witt/Kieffer, a national executive search firm, to assist in this recruitment. Nominations and applications, including cover letter, curriculum vita, and the names/contact information for three references, should be submitted electronically to Witt/Kieffer consultants John K. Thornburgh and Brian Bloomfield at the email address [email protected]. The consultants can be reached by telephone, care of Donna Janulis, at 630-575-6131.
The University of Illinois conducts criminal background checks on all job candidates upon acceptance of a contingent offer.
Illinois is an EEO Employer/Vet/Disabled - www.inclusiveillinois.illinois.edu

Massachusetts Institute of Technology
Faculty Positions

The Massachusetts Institute of Technology (MIT) Department of Electrical Engineering and Computer Science (EECS) seeks candidates for faculty positions starting September 1, 2018, or on a mutually agreed date thereafter. Appointment will be at the assistant or untenured associate professor level. In special cases, a senior faculty appointment may be possible. Faculty duties include teaching at the undergraduate and graduate levels, research, and supervision of student research. Candidates should hold a Ph.D. in electrical engineering and computer science or a related field by the start of employment. We will consider candidates with research and teaching interests in any area of electrical engineering and computer science.
Candidates must register with the EECS search website at https://school-of-engineering-faculty-search.mit.edu/eecs/, and must submit application materials electronically to this website. Candidate applications should include a description of professional interests and goals in both teaching and research. Each application should include a curriculum vitae and the names and addresses of three or more individuals who will provide letters of recommendation. Letter writers should submit their letters directly to MIT, preferably on the website or by mailing to the address below. Complete applications should be received by December 1, 2017. Applications will be considered complete only when both the applicant materials and at least three letters of recommendation are received.
It is the responsibility of the candidate to arrange reference letters to be uploaded at https://school-of-engineering-faculty-search.mit.edu/

National University of Singapore
Tenure-Track Faculty Positions in All Areas of Computer Science

The Department of Computer Science at the National University of Singapore (NUS) invites applications for tenure-track faculty positions in all areas of computer science. The Department enjoys ample research funding, moderate teaching loads, excellent facilities, and extensive international collaborations. We have a full range of faculty covering all major research areas in computer science and a thriving PhD program that attracts the brightest students from the region and beyond. More information is available at www.comp.nus.edu.sg/careers
NUS is an equal opportunity employer that offers highly competitive salaries, and is situated in Singapore, an English-speaking cosmopolitan city that is a melting pot of many cultures, both the east and the west. Singapore offers high-quality education and healthcare at all levels, as well

Assistant Professorship as well as the President Assistant Professor.

Application Details:
• Submit the following documents (in a single PDF) online via: https://faces.comp.nus.edu.sg
  - A cover letter that indicates the position applied for and the main research interests
  - Curriculum Vitae
  - A teaching statement
  - A research statement
• Provide the contact information of 3 referees when submitting your online application, or, arrange for at least 3 references to be sent directly to [email protected]
• Application reviews will commence immediately and continue until positions are filled
• Please submit your application by 31 January 2018.

If you have further enquiries, please contact the Search Committee Chair, Weng-Fai Wong, at
eecs/ by December 1, 2017. as very low tax rates. [email protected]
Send all materials not submitted on the web- The Department is looking for candidates for
site to: all levels of tenured and tenure-track positions in
Professor Asu Ozdaglar any area of computer science. Candidates for se- Santa Clara University
Interim Department Head, Electrical Engi- nior positions should have an established record Two Tenure-Track Assistant Professors in
neering and Computer Science of outstanding, recognized research achieve- Computer Science
Massachusetts Institute of Technology ments, and thought leadership in his/her chosen
Room 38-435 area of computer science. The Department of Mathematics and Computer
77 Massachusetts Avenue Candidates for Assistant Professor positions Science at Santa Clara University invites applica-
Cambridge, MA 02139 should demonstrate excellent research potential, tions for two tenure-track assistant professor posi-
and a strong commitment to teaching. Truly out- tions in computer science. Our highest preference
M.I.T. is an equal opportunity/affirmative ac- standing Assistant Professor applicants will also is in candidates with research interests in an area
tion employer. be considered for the prestigious Sung Kah Kay related to cybersecurity for the first position and

Announcement of an open position at the Faculty of Informatics,


ADVERTISING TU Wien, Austria

IN CAREER FULL PROFESSORSHIP


OPPORTUNITIES of
How to Submit a Classified Line Ad: Send an e-mail to COMPUTER AIDED VERIFICATION
[email protected]. Please include text, and indicate (Successor of Helmut Veith)
the issue/or issues where the ad will appear, and a contact
The TU Wien (Vienna University of Technology) invites applications for a
name and number.
full professorship at the Faculty of Informatics.
Estimates: An insertion order will then be e-mailed back to
The applicant is required to have an outstanding academic record in the field
you. The ad will by typeset according to CACM guidelines.
of Computer Aided Verification (CAV). Correctness, safety, and reliability
NO PROOFS can be sent. Classified line ads are NOT
of electronic systems are paramount in today’s software-controlled world.
commissionable.
The focus of the professorship on CAV will be on automated techniques to
Deadlines: 20th of the month/2 months prior to issue date. verify soft- and hardware. Besides a proven ability in CAV core methods
For latest deadline info, please contact: (Computational Logic, Theoretical Computer Science), the candidate will
[email protected] also have a strong interdisciplinary background, especially in relation
to Embedded Information Systems, Software Verification, Synthesis or
Career Opportunities Online: Classified and recruitment
Distributed Algorithms. This position will strengthen the area of Logic and
display ads receive a free duplicate listing on our website at:
Computation as well as form a link to other research foci of the faculty.
http://jobs.acm.org
We offer excellent working conditions in an attractive research environment
Ads are listed for a period of 30 days. in a city with an exceptional quality of life.
For More Information Contact:
For a more detailed announcement and information on how to apply,
ACM Media Sales
please go to: http://www.informatik.tuwien.ac.at/vacancies
at 212-626-0686 or
[email protected] Application Deadline: October 16, 2017

O C TO B E R 2 0 1 7 | VO L. 6 0 | N O. 1 0 | C OM M U N IC AT ION S OF T H E ACM 107


CAREERS

an area related to algorithms for the second posi- include the analytical and empirical study of tech- Tufts University
tion. Strong candidates with research interests in nological systems, in which technology, people, Dept. of Computer Science
artificial intelligence and software aspect of data and markets interact. It thus includes operations, Tenure-Track Assistant Professor in Computer
science will be considered as well. The successful information systems/technology, and manage- Science
candidates will demonstrate not only potential for ment of technology. Applicants are expected to
excellent undergraduate teaching, but also prom- have rigorous training in management science, The Department of Computer Science at Tufts
ise in sustained research with opportunities to engineering, computer science, economics, and/ University invites applications for multiple
involve undergraduates, mentoring or recruiting or statistical modeling methodologies. Candi- tenure-track faculty positions to begin in
underrepresented groups in computer science, dates with strong empirical training in econom- September 2018. We are looking for engaged
and service to the department, College or Univer- ics, behavioral science or computer science are and engaging researchers and teachers with
sity. Positions available starting in September 2018. encouraged to apply. The appointed will be ex- a strong vision who can build and maintain
Ph.D. or equivalent required by September 2018. pected to do innovative research in the OIT field, a high-quality research program at Tufts and
The closing date for applications is Decem- to participate in the school’s PhD program, and whose research will both connect with some
ber 1, 2017 at 3 pm Pacific time. Undergraduate to teach both required and elective courses in the of our current faculty and extend into new
teaching only. MBA program. Junior applicants should have or areas. We are seeking candidates at the rank of
Santa Clara University, located in California’s expect to complete a PhD by September 1, 2018. Assistant Professor but exceptional candidates
Silicon Valley, is a comprehensive, Jesuit, Catho- Applicants should submit their applications at the rank of Associate or full Professor will also
lic university, and an AA/EEO employer. electronically by visiting the web site http://www. be considered.
For more information or to apply, visit https:// gsb.stanford.edu/recruiting and uploading their We are especially interested in candidates
jobs.scu.edu/postings/6211. curriculum vitae, research papers and publica- with research in Artificial Intelligence/Machine
tions, and teaching evaluations, if applicable, Learning/Robotics, Security, and Systems for
on that site. For an application to be considered Data Science, where these areas are broadly con-
Stanford University, Graduate School of complete, all applicants must submit a CV, a job strued. Exceptional candidates in other areas will
Business market paper and arrange for three letters of rec- be considered as well.
Faculty Positions in Operations, Information ommendation to be submitted by November 15, Please submit your application online
and Technology 2017. For questions regarding the application through Interfolio at https://apply.interfolio.
process, please send an email to Faculty_Recruit- com/43666. You can contact help@interfolio.
The Operations, Information and Technology [email protected]. com with questions.
(OIT) area at the Graduate School of Business, Stanford is an equal employment opportu- Review of applications will begin December
Stanford University, is seeking qualified appli- nity and affirmative action employer. All quali- 15, 2017 and will continue until the position is
cants for full-time, tenure-track positions, start- fied applicants will receive consideration for em- filled. For more information about the depart-
ing September 1, 2018. All ranks and relevant ployment without regard to race, color, religion, ment or the position, please visit our web page at
disciplines will be considered. Applicants are con- sex, sexual orientation, gender identity, national http://engineering.tufts.edu/computer-science-
sidered in all areas of Operations, Information origin, disability, protected veteran status, or any positions. Inquiries should be emailed to
and Technology (OIT) that are broadly defined to other characteristic protected by law. [email protected].

A personal walk down the computer industry road. BY AN EYEWITNESS.

Smarter Than Their Machines: Oral Histories of the Pioneers of Interactive Computing is based on oral histories archived at the Charles Babbage Institute, University of Minnesota. These oral histories contain important messages for our leaders of today, at all levels, including that government, industry, and academia can accomplish great things when working together in an effective way.


University of Central Florida
Assistant or Associate Professor in Faculty Cluster for Cyber Security and Privacy

The University of Central Florida (UCF) is recruiting a tenure-track assistant or associate professor for its cyber security and privacy cluster. This position has a start date of August 8, 2018.

This will be an interdisciplinary position that will be expected to strengthen both the cluster and a chosen tenure home department, as well as a possible combination of joint appointments. The candidate can choose a combination of units from the cluster for their appointment (see http://www.ucf.edu/faculty/cluster/cyber-security-and-privacy/).

The ideal junior candidates will have a strong background in cyber security and privacy, and be on an upward leadership trajectory in these areas. They will have research impact, as reflected in high-quality publications and the ability to build a well-funded research program. All relevant technical areas will be considered. We are looking for a team player who can help bring together current campus efforts in cyber security or privacy. In particular, we are looking for someone who will work at the intersection of several areas, such as: (a) hardware and IoT security, (b) explaining and predicting human behavior, creating policies, studying ethics, and ensuring privacy, (c) cryptography and theory of security or privacy, or (d) tools, methods, training, and evaluation of human behavior.

Minimum qualifications include a Ph.D., terminal degree, or foreign degree equivalent from an accredited institution in an area appropriate to the cluster, and a record of high impact research related to cyber security and privacy, demonstrated by a strong scholarly and/or funding record. A history of working with teams, especially teams that span multiple disciplines, is a strongly preferred qualification. The position will carry a rank commensurate with the candidate's prior experience and record.

Candidates must apply online at https://www.jobswithucf.com/postings/50404 and attach the following materials: a cover letter, curriculum vitae, teaching statement, research statement, and contact information for three professional references. In the cover letter candidates must address their background in cyber security and privacy, and identify the department or departments for their potential tenure home and the joint appointments they would desire. When applying, have all documents ready so they can be attached at that time, as the system does not allow resubmittal to update applications.

As an equal opportunity/affirmative action employer, UCF encourages all qualified applicants to apply, including women, veterans, individuals with disabilities, and members of traditionally underrepresented populations.

For questions, please contact the Cluster's Search Committee Chair, Gary T. Leavens, at Leav[email protected].

University of Central Florida
Cluster Lead, Cyber Security and Privacy Cluster

The University of Central Florida (UCF) is recruiting a lead for its cluster on cyber security and privacy. This position has a start date of August 8, 2018. The position will carry a rank of associate or full professor, commensurate with the candidate's prior experience and record. The lead is expected to have credentials and qualifications like those expected of a tenured associate or full professor. To obtain tenure, the selected candidate must have a demonstrated record of teaching, research and service commensurate with rank.

This will be an interdisciplinary position that will be expected to strengthen both the cluster and a chosen tenure home department, as well as a possible combination of joint appointments. The candidate can choose a combination of units from the cluster for their appointment. (See http://www.ucf.edu/faculty/cluster/cyber-security-and-privacy/.) Both individual and interdisciplinary infrastructure and startup support will be provided.

The ideal candidate will have a strong background in cyber security and privacy and outstanding research credentials and research impact, as reflected in a sustained record of high quality publications and external funding. All relevant technical areas will be considered including: network security, cryptography, blockchains, hardware security, trusted computing bases, cloud computing, human factors, anomaly detection, forensics, privacy, and software security, as well as applications of security and privacy to areas such as IoT, cyber-physical systems, finance, and insider threats. A history of working with teams, especially teams that span multiple disciplines, is a strongly preferred qualification. A record of demonstrated leadership is highly desired, as we are looking for a leader to bring together all the current campus efforts in cyber security and privacy. This includes three cluster members already hired, as well as a pending hire for the 2017-18 academic year.

Minimum qualifications include a Ph.D. from an accredited institution in an appropriate area, and a record of high impact research related to cyber security and privacy demonstrated by a strong scholarly publication record and a significant amount of sustained funding.

Candidates must apply online at http://www.jobswithucf.com/postings/50044 and upload the following materials: cover letter, CV, teaching and research statements, and contact information for 3 professional references. In the cover letter, candidates should address their background, and identify the department for their potential tenure home and any desired joint appointments.

An equal opportunity/affirmative action employer, UCF encourages all qualified applicants to apply, including women, veterans, individuals with disabilities, and members of traditionally underrepresented populations.

Questions can be directed to the search committee chair, Gary T. Leavens, at [email protected].

University of Illinois at Urbana-Champaign
Head and Professor of the Department of Computer Science

The University of Illinois at Urbana-Champaign seeks a highly accomplished scholar and strategic leader as Head of the Department of Computer Science (CS). This individual will succeed Professor Rob A. Rutenbar, who was recently appointed Senior Vice Chancellor for Research at the University of Pittsburgh.

The new Head will lead a department that has continuously advanced the forefront of computing research and innovation since its inception, and whose faculty and alumni pioneered the modern computing era. The new Head will be uniquely positioned to build on the considerable strength and illustrious history of the department to lead the next chapter of the digital revolution. More details about the department can be found at http://cs.illinois.edu.

The Head is responsible for visioning, strategic planning, operations, finance, academic affairs, external relations, and advancement, and is a tenured Professor in the department. The successful candidate will be committed to enhancing the University's education, research, and service missions and will possess the scholarly record, leadership skills, and strategic capacity to advance the department. Additional essential qualifications include successful administrative experience in a university, industry, or government environment, the ability to effectively engage a broad range of internal and external constituencies, and a commitment to diversity.

The Head position is a full-time, twelve-month administrative appointment accompanied by a full-time, tenured Professor appointment with full University benefits. The desired start date for this position is as soon as possible. Applicants may be interviewed before the closing date; however, no hiring decision will be made until after that date. To ensure full consideration, applications should be received by November 10, 2017, but applications will be accepted until the position is filled. Salary is negotiable and commensurate with skills and experience.

The University has retained Witt/Kieffer, a national executive search firm, to assist in this recruitment. Nominations and applications, including cover letter, curriculum vita, and the names/contact information for three references, should be submitted electronically to Witt/Kieffer consultants John K. Thornburgh and Brian Bloomfield at the email address [email protected]. The consultants can be reached by telephone, care of Donna Janulis, at 630-575-6131.

The University of Illinois conducts criminal background checks on all job candidates upon acceptance of a contingent offer.

The University of Illinois is an Equal Opportunity, Affirmative Action employer. Minorities, women, veterans and individuals with disabilities are encouraged to apply. For more information, visit http://go.illinois.edu/EEO. To learn more about the University's commitment to diversity, please visit http://www.inclusiveillinois.illinois.edu.

University of Minnesota
Two Tenure-Track Positions

The Department of Industrial and Systems Engineering at the University of Minnesota invites applications for two tenure-track faculty positions starting in Fall 2018.

Applicants at all ranks will be considered. We seek candidates with a strong methodological foundation in Operations Research or Industrial Engineering, and a demonstrated interest in applications including, but not limited to: business analytics, energy and the environment, healthcare and medical applications, transportation and logistics, supply chain management, and service systems. Applicants must hold a Ph.D. in Industrial Engineering, Operations Research, Operations Management or a closely related discipline and have demonstrated the potential to conduct a vigorous and significant research program as evidenced by their publication record and supporting letters from recognized leaders in the field. Senior applicants should have a particularly strong record of research and teaching accomplishments, scientific leadership and creativity.

The University of Minnesota is located in the heart of the vibrant Minneapolis-St. Paul metropolitan area, which is consistently rated as one of America's best places to live and is home to many leading companies. The Department of Industrial and Systems Engineering is within the College of Science and Engineering at the University of Minnesota.

Applicants are encouraged to apply by November 1, 2017. Review of applications will begin immediately and will continue until the position is filled. Additional information and application instructions can be found at http://www.isye.umn.edu/news/Search_f2017.shtml. The University of Minnesota is an equal opportunity educator and employer.

University of Nevada, Reno
Assistant or Associate Professor

The Department of Computer Science and Engineering (CSE) at the University of Nevada, Reno (UNR) invites applications for a Tenure-track Faculty position starting July 1, 2018. The Department seeks highly qualified candidates in games, hardware, or in areas that extend or complement the Department's existing strengths or fulfill Department needs. The position is at the Assistant or Associate Professor level. The new hire will work with existing faculty to strengthen research, attract research funding, teach courses, and enhance our graduate and undergraduate programs.

The rapidly expanding and dynamic Department of Computer Science and Engineering has added nine positions over the last five years and expects to add more. Several faculty have NSF CAREER awards and play lead roles in multiple statewide and national multi-million dollar NSF awards. In addition to federal support from DoD, DHS, DoE, and NASA, companies like Google, Microsoft, Ford, AT&T, Nokia, and Honda support our research.

In the last five years, the College of Engineering has witnessed unprecedented growth in student enrollment and number of faculty positions. The College is positioned to further enhance the growth of its students, faculty, staff, and facilities, as well as its research productivity and its graduate and undergraduate programs.

Thanks to this substantial growth in both student enrollment and tenure-track faculty positions, the College of Engineering has received funding to build a new engineering building, scheduled to be completed in 2020. The new engineering building provides both additional space critically needed by the College and the modern facilities capable of supporting advanced research and laboratory space. This building will allow the College to pursue its strategic vision, serve Nevada and the nation, and educate future generations of engineering professionals.

The University of Nevada, Reno recognizes that diversity promotes excellence in education and research. We are an inclusive and engaged community and recognize the added value that students, faculty, and staff from different backgrounds bring to the educational experience.

Interested candidates must apply online to www.unrsearch.com/postings/25237. Application process includes: a detailed letter of application, curriculum vitae, statement of teaching philosophy, state of research and plans, and contact information for three professional references. Review of applications will begin on November 1, 2017 and continue until the search closes on December 31, 2017. Inquiries should be directed to Ms. Lisa Cody, [email protected].

EEO/AA. Women and under-represented groups, individuals with disabilities, and veterans are encouraged to apply.

University of Oregon
Dept. of Computer and Information Science
Faculty Positions

The University of Oregon's Computer and Information Science (CIS) Department invites applications for two tenure-track faculty positions at the rank of Assistant Professor, to begin in September 2018. We seek candidates specializing in high-performance computing or data science. We are especially interested in scholars who will enhance/complement the department's existing strengths in these areas. Applicants whose research addresses security and/or privacy issues in these sub-disciplines are of particular interest.

CIS is a diverse and growing department with strengths in networking and distributed systems, data science, and high-performance computing. We offer a stimulating, friendly environment for collaborative research both within the department, which expects to grow substantially in the next few years, and with other units on campus; for example, the department plays a key role in the Knight Campus for Accelerating Scientific Impact. The department hosts two interdisciplinary research centers, the Center for Cyber Security and Privacy and the NeuroInformatics Center. Successful candidates have access to a new, state-of-the-art high-performance computing facility. CIS is part of the College of Arts and Sciences and is housed within the Lorry Lokey Science Complex. The department offers B.S., M.S. and Ph.D. degrees. More information about the department, its programs and faculty can be found at https://cs.uoregon.edu/.

Applicants must have a Ph.D. in computer science or closely related field, a demonstrated record of excellence in research, and a strong commitment to teaching. A successful candidate will be expected to conduct a vigorous research program and to teach at both the undergraduate and graduate levels. Additionally, successful candidates will support and enhance a diverse learning and working environment. Salary is competitive.

Candidates are asked to apply online at https://academicjobsonline.org/ajo/jobs/9499 by submitting a cover letter, a curriculum vitae, a research statement, a teaching statement, and the contact details for a minimum of three referees, by 15 December 2017, or until the post has been filled. If you are unable to use this online resource, please contact [email protected] to arrange alternate means of application submission.

The University of Oregon is dedicated to the goal of building a culturally diverse and pluralistic faculty committed to teaching and working in a multicultural environment and strongly encourages applications from minorities, women, and people with disabilities. Applicants are requested to include in their cover letter information about how they will further this goal. In particular, candidates should describe previous activities mentoring minorities, women, or members of other underrepresented groups.


last byte

ACM LEARNING CENTER
RESOURCES FOR LIFELONG LEARNING
learning.acm.org
Online Courses from Skillsoft
Online Books from Safari, Books24x7, Morgan Kaufmann and Syngress
Webinars on today's hottest topics in computing

[CONTINUED FROM P. 112] goal of any solution is to minimize swaps.

Solution to challenge. Numbering the rows from top to bottom and the columns from left to right, we can build roads between red towns from (1,1) to (2,2), (2,4) to (3,3), (2,2) to (3,3), and (3,1) to (2,2). This leaves the red towns at (1,3), (4,2), and (4,4) surrounded by blue towns, marked with * in the following grid (R = red, B = blue):

    R  B  R* B
    B  R  B  R
    R  B  R  B
    B  R* B  R*

Now build roads between the blue towns from (1,2) to (2,1), from (2,1) to (3,2), from (2,3) to (3,2), and from (2,3) to (3,4). This leaves certain towns isolated from their own colors, like those marked with * in this grid:

    R  B  R* B*
    B  R  B  R
    R  B  R  B
    B* R* B* R*

The two sides then need just three swaps: red and blue on top and the pairs on the bottom.

Upstart 1. Suppose you are given an arbitrary planar graph of connections and an arbitrary red/blue coloring of nodes. You are also given a budget of r planar edges you can add. Can you create an algorithm (and implementation) that will perform a minimum number of swaps to achieve partitioned peace? If so, please explain it in pseudo-code and send links to platform-independent software.

Upstart 2. Now consider the same question as in Upstart 1, but allow non-planar edges.

All are invited to submit their solutions to [email protected]; solutions and discussion will be posted at http://cs.nyu.edu/cs/faculty/shasha/papers/cacmpuzzles.html
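The challenge solution above can be checked mechanically. The Python sketch below is an illustration of our own, not part of the column: it assumes towns are indexed (row, column) from the top left and that (1,1) is red, as the solution's first red road (1,1) to (2,2) implies. It adds the eight diagonal roads, performs the three swaps between road-connected towns, and uses a breadth-first search to confirm that each color's towns then form a single connected group. All helper names (`grid_roads`, `color_connected`, `apply_swaps`, and so on) are hypothetical.

```python
from collections import deque

N = 4  # 4-by-4 board; a town is (row, column), counted from the top left


def initial_colors():
    # Assumption: (1,1) is red, so red squares are those whose row + column is even
    return {(r, c): "red" if (r + c) % 2 == 0 else "blue"
            for r in range(1, N + 1) for c in range(1, N + 1)}


def grid_roads():
    # Existing roads join each town to its horizontal and vertical neighbors
    roads = set()
    for r in range(1, N + 1):
        for c in range(1, N + 1):
            if c < N:
                roads.add(frozenset({(r, c), (r, c + 1)}))
            if r < N:
                roads.add(frozenset({(r, c), (r + 1, c)}))
    return roads


# The eight diagonal roads from the solution: four red, then four blue
NEW_ROADS = [((1, 1), (2, 2)), ((2, 4), (3, 3)), ((2, 2), (3, 3)), ((3, 1), (2, 2)),
             ((1, 2), (2, 1)), ((2, 1), (3, 2)), ((2, 3), (3, 2)), ((2, 3), (3, 4))]

# The three swaps: the red/blue pair on top and the two pairs on the bottom
SWAPS = [((1, 3), (1, 4)), ((4, 1), (4, 2)), ((4, 3), (4, 4))]


def color_connected(colors, roads, color):
    """Breadth-first search over towns of one color, using only roads
    whose two endpoints both have that color."""
    towns = [t for t, col in colors.items() if col == color]
    neighbors = {t: [] for t in towns}
    for road in roads:
        a, b = tuple(road)
        if colors[a] == color and colors[b] == color:
            neighbors[a].append(b)
            neighbors[b].append(a)
    seen, queue = {towns[0]}, deque([towns[0]])
    while queue:
        for nxt in neighbors[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(towns)


def partitioned_peace(colors, roads):
    return color_connected(colors, roads, "red") and color_connected(colors, roads, "blue")


def apply_swaps(colors, roads, swaps):
    result = dict(colors)
    for a, b in swaps:
        # The challenge only allows swaps between towns joined by a road
        assert frozenset({a, b}) in roads, f"{a} and {b} are not joined by a road"
        assert result[a] != result[b], "each swap pairs a red town with a blue town"
        result[a], result[b] = result[b], result[a]
    return result


# Each new road must join diagonal neighbors
assert all(abs(r1 - r2) == 1 and abs(c1 - c2) == 1 for (r1, c1), (r2, c2) in NEW_ROADS)

roads = grid_roads() | {frozenset(pair) for pair in NEW_ROADS}
before = initial_colors()
after = apply_swaps(before, roads, SWAPS)
print(partitioned_peace(before, roads))  # False: red (1,3), (4,2), (4,4) are cut off
print(partitioned_peace(after, roads))   # True
```

Representing each road as a `frozenset` makes it undirected, so a swap check does not depend on the order in which the two towns are listed.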

Dennis Shasha ([email protected]) is a professor of computer science in the Computer Science Department of the Courant Institute at New York University, New York, as well as the chronicler of his good friend the omniheurist Dr. Ecco.

Copyright held by the author.


DOI:10.1145/3133244 Dennis Shasha

Upstart Puzzles
Partitioned Peace

IN A MYTHICAL land of rhetorically encouraged antagonism, different factions manage to co-exist, though poorly. Imagine a set of red and blue hill towns connected by a network of roads. People in the red towns deal well with one another. People in the blue towns deal well with one another. But when a person from a red town travels through a blue town or vice versa, things can get unpleasant. The leaders of the red and the blue towns get together and decide the best way to resolve their differences is to perform a series of swaps in which the inhabitants of k red towns swap towns with the inhabitants of k blue towns, with the end result that a person from a blue town can visit any other blue town without passing through a red town, and likewise for a person from a red town. We call such a desirable state "partitioned peace." The goal is to make k as small as possible.
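To make the objective concrete, here is a brute-force sketch of our own (not from the column) that finds the smallest k for a small network when no new roads are built. It assumes towns are hashable labels, roads are undirected pairs, and a swap simply exchanges the colors of one red and one blue town. Because only which towns change color matters, not how reds are paired with blues, it suffices to try every choice of k red and k blue towns. The search is exponential, so it is only a baseline for checking answers on toy inputs, not the efficient algorithm the Upstarts ask for; the function names are hypothetical.

```python
from itertools import combinations


def connected(towns, roads):
    """True if every town in `towns` can reach every other one using only
    roads whose two endpoints are both in `towns`."""
    towns = set(towns)
    if not towns:
        return True
    start = next(iter(towns))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to `towns`
        here = stack.pop()
        for a, b in roads:
            for x, y in ((a, b), (b, a)):
                if x == here and y in towns and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == towns


def min_swaps(colors, roads):
    """Smallest k such that swapping k red towns with k blue towns gives
    partitioned peace, or None if no choice of swaps works.

    Exhaustive search over subsets, so exponential: toy inputs only.
    """
    reds = [t for t, c in colors.items() if c == "red"]
    blues = [t for t, c in colors.items() if c == "blue"]
    for k in range(min(len(reds), len(blues)) + 1):
        # Pick which k red and k blue towns change color
        for out_red in combinations(reds, k):
            for out_blue in combinations(blues, k):
                new_red = (set(reds) - set(out_red)) | set(out_blue)
                new_blue = (set(blues) - set(out_blue)) | set(out_red)
                if connected(new_red, roads) and connected(new_blue, roads):
                    return k
    return None


# A path of four towns colored red, blue, red, blue needs one swap.
path_colors = {1: "red", 2: "blue", 3: "red", 4: "blue"}
path_roads = [(1, 2), (2, 3), (3, 4)]
print(min_swaps(path_colors, path_roads))  # 1
```

Here swapping town 1 with town 4 yields red towns {3, 4} and blue towns {1, 2}, each joined by a single road.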
Warm-Up 1. Given the configuration in the figure here, what is the minimum number of swaps needed to achieve partitioned peace?

[Figure: hill towns numbered 1 to 7. Caption: What is the minimum number of towns you can exchange so red and blue travelers never need to cross the other color's towns? Image by Andrij Borys Associates.]

Solution to Warm-Up 1. Two swaps are sufficient: Red_1 with Blue_7 and Red_3 with Blue_4.

Because exchanging town populations is painful for the people who must move, the leaders seek other arrangements; they are willing to build, for example, a certain number of roads to reduce the number of swaps.

Warm-Up 2. Given the configuration in the figure, what is the minimum number of swaps needed to achieve partitioned peace if you were able to build a single new road?

Solution to Warm-Up 2. Build a road between Blue_7 and Red_1 and then swap Red_1 with Blue_4. The red town dwellers can travel to other red towns without crossing through blue towns. Likewise, the blue town dwellers can travel to other blue towns without having to cross through red towns.

Challenge: So far, we have considered only a very simple configuration of towns; now consider that the red and blue towns alternate like the squares of a four-by-four checkerboard. Every town is connected to all its vertical and horizontal neighbors. You may build eight new roads between diagonally neighboring towns.

Where should the roads go? And after the roads are built, which towns should swap populations to minimize the number of swaps needed to achieve partitioned peace, where the swaps are between towns directly connected by roads? The [CONTINUED ON P. 111]


CONFERENCE 27 – 30 November 2017
EXHIBITION 28 – 30 November 2017
BITEC, Bangkok, Thailand

THE CELEBRATION OF LIFE & TECHNOLOGY

The 10th ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia

Register online by 15 October 2017, & enjoy early bird discounts of up to 20%
SA2017.SIGGRAPH.ORG/REGISTRATION
