SysML: The New Frontier of Machine Learning Systems: March 29, 2019
Alexander Ratner1 Dan Alistarh2 Gustavo Alonso3 Peter Bailis1 Sarah Bird4 Nicholas Carlini5 Bryan
Catanzaro6 Eric Chung4 Bill Dally1,6 Jeff Dean5 Inderjit S. Dhillon7,8 Alexandros Dimakis7 Pradeep
Dubey9 Charles Elkan10 Grigori Fursin11,12 Gregory R. Ganger13 Lise Getoor14 Phillip B. Gibbons13
Garth A. Gibson15,16,13 Joseph E. Gonzalez17 Justin Gottschlich9 Song Han18 Kim Hazelwood19 Furong
Huang20 Martin Jaggi21 Kevin Jamieson22 Michael I. Jordan17 Gauri Joshi13 Rania Khalaf23 Jason
Knight9 Jakub Konečný5 Tim Kraska18 Arun Kumar10 Anastasios Kyrillidis24 Jing Li25 Samuel
Madden18 H. Brendan McMahan5 Erik Meijer19 Ioannis Mitliagkas26,27 Rajat Monga5 Derek Murray5
Dimitris Papailiopoulos25 Gennady Pekhimenko28 Theodoros Rekatsinas25 Afshin Rostamizadeh5 Christopher
Ré1 Christopher De Sa29 Hanie Sedghi5 Siddhartha Sen4 Virginia Smith13 Alex Smola8,13 Dawn
Song17 Evan Sparks30 Ion Stoica17 Vivienne Sze18 Madeleine Udell29 Joaquin Vanschoren31
Shivaram Venkataraman25 Rashmi Vinayak13 Markus Weimer4 Andrew Gordon Wilson29 Eric Xing13,32
Matei Zaharia1,33 Ce Zhang3 Ameet Talwalkar∗13,30
1 Stanford, 2 IST Austria, 3 ETH Zurich, 4 Microsoft, 5 Google, 6 NVIDIA, 7 University of Texas at Austin, 8 Amazon, 9 Intel,
10 University of California San Diego, 11 cTuning Foundation, 12 Dividiti, 13 Carnegie Mellon University, 14 UC Santa Cruz,
15 Vector Institute, 16 University of Toronto, 17 UC Berkeley, 18 MIT, 19 Facebook, 20 University of Maryland, 21 EPFL, 22 University
of Washington, 23 IBM Research, 24 Rice University, 25 University of Wisconsin-Madison, 26 Mila, 27 University of Montreal,
28 University of Toronto, 29 Cornell University, 30 Determined AI, 31 Eindhoven University of Technology, 32 Petuum, 33 Databricks
Abstract
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and imple-
menting the systems that support ML models in real-world deployments remains a significant obstacle, in large
part due to the radically different development and deployment profile of modern ML methods, and the range
of practical concerns that come with broader adoption. We propose to foster a new systems machine learning
research community at the intersection of the traditional systems and ML communities, focused on topics such as
hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy.
To do this, we describe a new conference, SysML, that explicitly targets research at the intersection of systems
and machine learning with a program committee split evenly between experts in systems and ML, and an explicit
focus on topics at the intersection of the two.
1 Introduction
Over the last few years, machine learning (ML) has hit an inflection point in terms of adoption and results. Large
corporations have invested billions of dollars in reinventing themselves as “AI-centric”; swaths of academic
disciplines have flocked to incorporate machine learning into their research; and a wave of excitement about AI and
ML has proliferated through the broader public sphere. This has been due to several factors, central amongst them
new deep learning approaches, increasing amounts of data and compute resources, and collective investment in open-
source frameworks like Caffe, Theano, MXNet, TensorFlow, and PyTorch, which have effectively decoupled model
design and specification from the systems to implement these models. The resulting wave of technical advances
and practical results seems poised to transform ML from a bespoke solution used on certain narrowly-defined tasks,
to a commodity technology deployed nearly everywhere.
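The decoupling these frameworks provide can be illustrated with a toy sketch in plain Python (a hypothetical mini-API for illustration, not any real framework's interface): the model is specified as a symbolic graph, and a separate executor decides how to evaluate it.

```python
# Toy sketch of specification/execution decoupling (hypothetical API,
# not any real framework): the user builds a symbolic graph, and the
# "system" below decides how to evaluate it.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def const(v):  return Node("const", value=v)
def add(a, b): return Node("add", (a, b))
def mul(a, b): return Node("mul", (a, b))

def execute(node):
    """One possible executor: naive recursive evaluation. A real system
    could instead fuse operations or run on a GPU without changing the
    model specification built above."""
    if node.op == "const":
        return node.value
    args = [execute(i) for i in node.inputs]
    return {"add": lambda x, y: x + y,
            "mul": lambda x, y: x * y}[node.op](*args)

# Model specification: y = w * x + b, written without committing
# to how or where it runs.
w, x, b = const(2.0), const(3.0), const(1.0)
y = add(mul(w, x), b)
result = execute(y)  # 7.0
```

Because the specification never commits to an execution strategy, the executor can be swapped for a more sophisticated one without touching the model code, which is exactly the separation these frameworks exploit.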
Unfortunately, while it is easier than ever to run state-of-the-art ML models on pre-packaged datasets, designing
and implementing the systems that support ML in real-world applications is increasingly a major bottleneck. In
large part this is because ML-based applications require distinctly new types of software, hardware, and engineering
systems to support them. Indeed, modern ML applications have been referred to by some as a new “Software
2.0” [5] to emphasize the radical shift they represent as compared to traditional computing applications. They
are increasingly developed in different ways than traditional software—for example, by collecting, preprocessing,
labeling, and reshaping training datasets rather than writing code—and also deployed in different ways, for example
∗ Corresponding author, [email protected].
utilizing specialized hardware, new types of quality assurance methods, and new end-to-end workflows. This shift
opens up exciting research challenges and opportunities around high-level interfaces for ML development, low-level
systems for executing ML models, and interfaces for embedding learned components in the middle of traditional
computer systems code.
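One concrete instance of this shift toward "programming with data" is programmatic labeling, sketched below in plain Python under toy assumptions (the labeling functions and the simple majority-vote combiner are illustrative, not any particular system's API): developers write heuristics that vote on labels instead of annotating examples by hand.

```python
# Minimal sketch of programmatic labeling: heuristic labeling functions
# vote on each example, and votes are combined by majority. Illustrative
# only; real systems also model each function's accuracy and coverage.

SPAM, HAM, ABSTAIN = 1, 0, None

def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return HAM if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return SPAM if text.count("!") >= 3 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_meeting,
                      lf_many_exclamations]

def majority_label(text):
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

labels = [majority_label(t) for t in
          ["FREE prize!!! Click now!!!", "Team meeting at 3pm"]]
```

Here the "program" is the set of heuristics plus the data they run over; improving the application means editing or adding labeling functions rather than hand-relabeling a dataset.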
Modern ML approaches also require new solutions for the set of concerns that naturally arise as these techniques
gain broader usage in diverse real-world settings. These include cost and other efficiency metrics for small and
large organizations alike, including, e.g., computational cost at training and prediction time, engineering cost, and
cost of errors in real-world settings; accessibility and automation, for the expanding set of ML users that do not
have PhDs in machine learning, or PhD time scales to invest; latency and other run-time constraints, for a widening
range of computational deployment environments; and concerns like fairness, bias, robustness, security, privacy,
interpretability, and causality, which arise as ML starts to be applied to critical settings where impactful human
interactions are involved, like driving, medicine, finance, and law enforcement.
This combination of radically different application requirements, increasingly prevalent systems-level concerns,
and a rising tide of interest and adoption points to the need for a concerted research focus on the
systems aspects of machine learning. To accelerate these research efforts, our goal is to help foster a new systems
machine learning community dedicated to these issues. We envision focusing on broad, full-stack questions that are
complementary to those traditionally tackled independently by the ML and Systems communities, including:
1. How should software systems be designed to support the full machine learning lifecycle, from program-
ming interfaces and data preprocessing to output interpretation, debugging and monitoring? Example
questions include:
• How can we enable users to quickly “program” the modern machine learning stack through emerging
interfaces such as manipulating or labeling training data, imposing simple priors or constraints, or defining
loss functions?
• How can we enable developers to define and measure ML models, architectures, and systems in higher-level
ways?
• How can we support efficient development, monitoring, interpretation, debugging, adaptation, tuning, and
overall maintenance of production ML applications, including not just models, but the data, features, labels,
and other inputs that define them?
2. How should hardware systems be designed for machine learning? Example questions include:
• How can we develop specialized, heterogeneous hardware for training and deploying machine learning
models, fit to their new operation sets and data access patterns?
• How can we take advantage of the stochastic nature of ML workloads to discover new trade-offs with respect
to precision, stability, fidelity, and more?
• How should distributed systems be designed to support ML training and serving?
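As a toy illustration of such precision trade-offs (a software sketch, not a hardware design): uniform 8-bit quantization trades a small, bounded reconstruction error for a 4x storage reduction relative to 32-bit floats, an error that stochastic training is often robust to.

```python
# Toy uniform 8-bit quantization of model weights: each float is mapped
# to an integer in [-127, 127] plus a single shared scale factor.

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.53, -1.27, 0.004, 0.91]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# The per-weight reconstruction error is bounded by half a
# quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The interesting systems question is where on this spectrum (32-, 16-, 8-bit, or below; uniform or stochastic rounding) a given training or inference workload can safely sit.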
3. How should machine learning systems be designed to satisfy metrics beyond predictive accuracy, such as
power and memory efficiency, accessibility, cost, latency, privacy, security, fairness, and interpretability?
Example questions include:
• How can machine learning algorithms and systems be designed for device constraints such as power, latency,
and memory limits?
• How can ML systems be designed to support full-stack privacy and security guarantees, including, e.g.,
federated learning and other similar settings?
• How can we increase the accessibility of ML, to empower an increasingly broad range of users who may be
neither ML nor systems experts?
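The federated setting mentioned above can be sketched as a small simulation (assumed toy data and a one-parameter model, not a production protocol): each client takes a gradient step on its private data, and only model parameters, never raw data, reach the server, which averages them.

```python
# Minimal federated-averaging simulation for a one-parameter model
# y = w * x, trained with one local gradient step per client per round.
# Raw data stays on each client; only updated weights are shared.

clients = [
    [(1.0, 2.0), (2.0, 4.0)],   # client 1's private (x, y) pairs
    [(1.0, 2.2), (3.0, 6.6)],   # client 2's private (x, y) pairs
]

def local_update(w, data, lr=0.05):
    # One gradient step on mean squared error for y = w * x.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

w = 0.0
for _ in range(50):                    # communication rounds
    local_ws = [local_update(w, d) for d in clients]
    w = sum(local_ws) / len(local_ws)  # server averages client updates
```

Averaging locally updated weights is the basic shape of federated averaging; real deployments layer client sampling, secure aggregation, and differential privacy on top, which is where the full-stack systems questions arise.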
Another way of partitioning these research topics is into high-level systems for ML that support interfaces and
workflows for ML development—the analogue of traditional work on programming languages and software
engineering—and low-level systems for ML that involve hardware or software—and that often blur the lines
between the two—to support training and execution of models, the analogue of traditional work on compilers and
architecture. Regardless of the ontology, we envision these questions being addressed by a strong mix of theoretical,
empirical, and applications-driven perspectives. And given their full-stack nature, we see them being best answered
by a research community that mixes perspectives from the traditional machine learning and systems communities.
A separate but closely related and increasingly exciting area of focus is machine learning for systems: the idea of
applying machine learning techniques to improve traditional computing systems. Examples include replacing the
data structures, heuristics, or hand-tuned parameters used in low-level systems like operating systems, compilers,
and storage systems with learned models. While this is clearly a distinct research direction, we also see the systems
machine learning community as an ideal one to examine and support this line of work, given the required confluence
of ML and systems expertise.
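To make this concrete, the following toy sketch (illustrative only, over assumed data) replaces a data structure's search logic with a learned model: a least-squares line predicts a key's position in a sorted array, and a small corrective search around the prediction preserves exactness.

```python
# Toy "learned index": fit position ~ a * key + b over a sorted array,
# then look keys up by predicting a position and searching nearby.

import bisect

keys = sorted(k * k for k in range(1, 101))  # a skewed key distribution
n = len(keys)

# Closed-form least-squares fit of position against key.
mean_k = sum(keys) / n
mean_p = (n - 1) / 2
cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
var = sum((k - mean_k) ** 2 for k in keys)
a = cov / var
b = mean_p - a * mean_k

def lookup(key):
    guess = min(max(int(a * key + b), 0), n - 1)
    radius = 1
    while True:
        # Expand a window around the guess until it must contain the key.
        lo, hi = max(0, guess - radius), min(n, guess + radius + 1)
        if (lo == 0 or keys[lo] <= key) and (hi == n or keys[hi - 1] >= key):
            i = bisect.bisect_left(keys, key, lo, hi)
            return i if i < n and keys[i] == key else -1
        radius *= 2
```

The model here stands in for the hand-built interpolation logic of a classic index; the systems questions are when the learned predictor's accuracy and cost beat the tuned heuristic it replaces.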
Finally, we see the systems machine learning community as an ideal jumping-off point for even larger-scale and
broader questions, beyond how to interface with, train, execute, or evaluate single models [4]. For instance, how do
we manage entire ecosystems of models that interact in complex ways? How do we maintain and evaluate systems
that pursue long term goals? How do we measure the effect of ML systems on societies, markets, and more? How
do we share and reuse data and models at societal scale, while respecting privacy and other economic, social, and
legal constraints? All of these questions and many more will likely need to be approached by research at the intersection
of traditional machine learning and systems viewpoints.
As with top conferences in the ML and Systems fields, the SysML review process is rigorous and highly selective. However, unlike traditional ML or Systems
conferences, the SysML Program Committee consists of experts from both ML and Systems who review all papers
together, and has an explicit focus on topics that fall in the interdisciplinary SysML space (as opposed to the broad
ML or broad Systems space). Finally, to spur reproducibility and rapid progress in this research area, SysML
embraces modern artifact evaluation processes that have been successful at other conferences [1].
SysML was established in 2018 by the inaugural Organizing Committee (Peter Bailis, Sarah Bird, Dimitris
Papailiopoulos, Chris Ré, Ben Recht, Virginia Smith, Ameet Talwalkar, Matei Zaharia), led by Program Chair
Ameet Talwalkar and Co-Chair Dimitris Papailiopoulos, and with guidance from the Steering and Program
Committees.
• Steering Committee: Jennifer Chayes, Bill Dally, Jeff Dean, Michael I. Jordan, Yann LeCun, Fei-Fei Li, Alex
Smola, Dawn Song, Eric Xing.
• Program Committee: David Andersen, Bryan Catanzaro, Eric Chung, Christopher De Sa, Inderjit Dhillon, Alex
Dimakis, Charles Elkan, Greg Ganger, Lise Getoor, Phillip Gibbons, Garth Gibson, Joseph Gonzalez, Furong
Huang, Kevin Jamieson, Yangqing Jia, Rania Khalaf, Jason Knight, Tim Kraska, Aparna Lakshmiratan, Samuel
Madden, Brendan McMahan, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Theodoros
Rekatsinas, Afshin Rostamizadeh, Siddhartha Sen, Evan Sparks, Ion Stoica, Shivaram Venkataraman, Rashmi
Vinayak, Markus Weimer, Ce Zhang.
4 Conclusion
There is an incredibly exciting set of research challenges that can be uniquely tackled at the intersection of
traditional machine learning and systems communities, both today and moving forward. Solving these challenges
will require advances in theory, algorithms, software, and hardware, and will lead to exciting new low-level systems
for executing ML algorithms, high-level systems for specifying, monitoring, and interacting with them, and beyond
that, new paradigms and frameworks that shape how machine learning interacts with society in general. We envision
the new SysML conference as a center of research in these increasingly important areas.
References
[1] ACM: Artifact reviewing and badging. https://www.acm.org/publications/policies/artifact-review-badging, 2018.
[2] C. Coleman, D. Narayanan, D. Kang, T. Zhao, J. Zhang, L. Nardi, P. Bailis, K. Olukotun, C. Ré, and M. Zaharia. DAWNBench: An end-to-end deep learning benchmark and competition. NeurIPS ML Systems Workshop, 2017.
[3] P. Eckersley, Y. Nasser, et al. EFF AI progress measurement project. https://eff.org/ai/metrics, 2017.
[4] M. Jordan. Artificial intelligence—the revolution hasn’t happened yet. https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7, 2018.
[5] A. Karpathy. Software 2.0. https://medium.com/@karpathy/software-2-0-a64152b37c35, 2017.
[6] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young. Machine learning: The high interest credit card of technical debt. 2014.
[7] A. Talwalkar. Toward the jet age of machine learning. O’Reilly, 2018. https://www.oreilly.com/ideas/toward-the-jet-age-of-machine-learning.