
A Berkeley View of Systems Challenges for AI

Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz,
Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph Gonzalez, Ken Goldberg,
Ali Ghodsi, David Culler, Pieter Abbeel∗
arXiv:1712.05855v1 [cs.AI] 15 Dec 2017

ABSTRACT

With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies.

The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of Moore's Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI's potential to improve lives and society.

KEYWORDS

AI, Machine Learning, Systems, Security

1 INTRODUCTION

Conceived in the early 1960s with the vision of emulating human intelligence, AI has evolved towards a broadly applicable engineering discipline in which algorithms and data are brought together to solve a variety of pattern recognition, learning, and decision-making problems. Increasingly, AI intersects with other engineering and scientific fields and cuts across many disciplines in computing.

In particular, computer systems have already proved essential in catalyzing recent progress in AI. Advances in parallel hardware [31, 58, 90] and scalable software systems [32, 46, 114] have sparked the development of new machine learning frameworks [14, 31, 98] and algorithms [18, 56, 62, 91] to allow AI to address large-scale, real-world problems. Rapidly decreasing storage costs [1, 80], crowdsourcing, mobile applications, internet of things (IoT), and the competitive advantage of data [40] have driven further investment in data-processing systems and AI technologies [87]. The overall effect is that AI-based solutions are beginning to approach or even surpass human-level capabilities in a range of real-world tasks. Maturing AI technologies are not only powering existing industries—including web search, high-speed trading and commerce—but are helping to foster new industries around IoT, augmented reality, biotechnology and autonomous vehicles.

These applications will require AI systems to interact with the real world by making automatic decisions. Examples include autonomous drones, robotic surgery, medical diagnosis and treatment, virtual assistants, and many more. As the real world is continually changing, sometimes unexpectedly, these applications need to support continual or life-long learning [96, 109] and never-ending learning [76]. Life-long learning systems aim at solving multiple tasks sequentially by efficiently transferring and utilizing knowledge from already learned tasks to new tasks while minimizing the effect of catastrophic forgetting [71]. Never-ending learning is concerned with mastering a set of tasks in each iteration, where the set keeps growing and the performance on all the tasks in the set keeps improving from iteration to iteration.

Meeting these requirements raises daunting challenges, such as active exploration in dynamic environments, secure and robust decision-making in the presence of adversaries or noisy and unforeseen inputs, the ability to explain decisions, and new modular architectures that simplify building such applications. Furthermore, as Moore's Law is ending, one can no longer count on the rapid increase of computation and storage to solve the problems of next-generation AI systems.

Solving these challenges will require synergistic innovations in architecture, software, and algorithms. Rather than addressing specific AI algorithms and techniques, this paper examines the essential role that systems will play in addressing challenges in AI and proposes several promising research directions on that frontier.

2 WHAT IS BEHIND AI'S RECENT SUCCESS

The remarkable progress in AI has been made possible by a "perfect storm" emerging over the past two decades, bringing together: (1) massive amounts of data, (2) scalable computer and software systems, and (3) the broad accessibility of these technologies. These trends have allowed core AI algorithms and architectures, such as deep learning, reinforcement learning, and Bayesian inference, to be explored in problem domains of unprecedented scale and scope.

2.1 Big data

With the widespread adoption of online global services, mobile smartphones, and GPS by the end of the 1990s, internet companies such as Google, Amazon, Microsoft, and Yahoo! began to amass huge amounts of data in the form of audio, video, text, and user logs. When combined with machine learning algorithms, these massive data sets led to qualitatively better results in a wide range of core services, including classical problems in information retrieval, information extraction, and advertising [49].
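The life-long learning loop described in the introduction — learning tasks sequentially while limiting catastrophic forgetting [71] — can be illustrated with a toy sketch. All names here are invented for illustration, and the experience-replay buffer is just one common, simple mitigation, not a mechanism proposed by the authors:

```python
import random

class ContinualLearner:
    """Toy linear model trained on a stream of examples.

    A small replay buffer of past examples is mixed into each
    update so that learning a new task does not completely
    overwrite what was learned on earlier ones.
    """

    def __init__(self, n_features, lr=0.05, buffer_size=200):
        self.w = [0.0] * n_features
        self.lr = lr
        self.buffer = []            # (x, y) pairs seen earlier
        self.buffer_size = buffer_size

    def _sgd_step(self, x, y):
        pred = sum(wi * xi for wi, xi in zip(self.w, x))
        err = pred - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]

    def observe(self, x, y, replay=4):
        # Learn from the new example...
        self._sgd_step(x, y)
        # ...and rehearse a few examples from the past.
        for xr, yr in random.sample(self.buffer, min(replay, len(self.buffer))):
            self._sgd_step(xr, yr)
        # Reservoir-style insertion keeps the buffer bounded.
        if len(self.buffer) < self.buffer_size:
            self.buffer.append((x, y))
        else:
            self.buffer[random.randrange(self.buffer_size)] = (x, y)
```

The point of the sketch is the shape of the loop — updates interleaved with the data stream, at the data's time scale — rather than the particular model or mitigation, which real continual-learning systems choose very differently.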
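Section 2.2 below credits MapReduce [32] with making this deluge of data processable on clusters of commodity servers. The core of the programming model fits in a few lines of single-process Python (a toy sketch of the model only; a real deployment distributes these phases across machines):

```python
from collections import defaultdict
from itertools import chain

def map_reduce(inputs, mapper, reducer):
    """Single-process sketch of the MapReduce model: apply `mapper`
    to every input record, group the emitted (key, value) pairs by
    key (the 'shuffle' phase), then apply `reducer` per key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(x) for x in inputs):
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical example: counting words across many documents.
docs = ["big data big systems", "big data"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda word, ones: sum(ones),
)
```

Because the mapper and reducer are pure per-record and per-key functions, each phase parallelizes trivially, which is what let the model scale across clusters of cheap servers.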

2.2 Big systems

Processing this deluge of data spurred rapid innovations in computer and software systems. To store massive amounts of data, internet service companies began to build massive-scale datacenters, some of which host nearly 100,000 servers and provide exabytes [65] of storage. To process this data, companies built new large-scale software systems able to run on clusters of cheap commodity servers. Google developed MapReduce [32] and Google File System [43], followed shortly by the open-source counterpart, Apache Hadoop [7]. Then came a plethora of systems [46, 55, 60, 67, 114] that aimed to improve speed, scale, and ease of use. These hardware and software innovations led to the datacenter becoming the new computer [11].

With the growing demand for machine learning (ML), researchers and practitioners built libraries on top of these systems to satisfy this demand [8, 52, 75].

The recent successes of deep learning (DL) have spurred a new wave of specialized software systems that scale out these workloads on CPU clusters and take advantage of specialized hardware, such as GPUs and TPUs. Examples include TensorFlow [2], Caffe [57], Chainer [20], PyTorch [89], and MXNet [22].

2.3 Accessibility to state-of-the-art technology

The vast majority of systems that process data and support AI workloads are built as open-source software, including Spark [114], TensorFlow [2], MXNet [22], Caffe [57], PyTorch [89], and BigDL [15]. Open source allows organizations and individuals alike to leverage state-of-the-art software technology without incurring the prohibitive costs of development from scratch or licensing fees.

The wide availability of public cloud services (e.g., AWS, Google Cloud, and MS Azure) allows everyone to access virtually unlimited amounts of processing and storage without needing to build large datacenters. Now, researchers can test their algorithms at a moment's notice on numerous GPUs or FPGAs by spending just a few thousand dollars, which was unthinkable a decade ago.

3 TRENDS AND CHALLENGES

While AI has already begun to transform many application domains, looking forward, we expect that AI will power a much wider range of services, from health care to transportation, manufacturing to defense, entertainment to energy, and agriculture to retail. Moreover, while large-scale systems and ML frameworks have already played a pivotal role in the recent success of AI, looking forward, we expect that, together with security and hardware architectures, systems will play an even more important role in enabling the broad adoption of AI. To realize this promise, however, we need to address significant challenges that are driven by the following trends.

3.1 Mission-critical AI

With ongoing advances in AI in applications, from banking to autonomous driving to robot-assisted surgery and to home automation, AI is poised to drive more and more mission-critical applications where human well-being and lives are at stake.

As AI will increasingly be deployed in dynamic environments, AI systems will need to continually adapt and learn new "skills" as the environment changes. For example, a self-driving car could quickly adapt to unexpected and dangerous road conditions (e.g., an accident or oil on the road) by learning in real time from other cars that have successfully dealt with these conditions. Similarly, an AI-powered intrusion-detection system must quickly identify and learn new attack patterns as they happen. In addition, such mission-critical applications must handle noisy inputs and defend against malicious actors.

Challenges: Design AI systems that learn continually by interacting with a dynamic environment, while making decisions that are timely, robust, and secure.

3.2 Personalized AI

From virtual assistants to self-driving cars and political campaigns, user-specific decisions that take into account user behavior (e.g., a virtual assistant learning a user's accent) and preferences (e.g., a self-driving system learning the level of "aggressiveness" a user is comfortable with) are increasingly the focus. While such personalized systems and services provide new functionality and significant economic benefits, they require collecting vast quantities of sensitive personal information, and their misuse could affect users' economic and psychological wellbeing.

Challenges: Design AI systems that enable personalized applications and services yet do not compromise users' privacy and security.

3.3 AI across organizations

Companies are increasingly leveraging third-party data to augment their AI-powered services [27]. Examples include hospitals sharing data to prevent epidemic outbreaks and financial institutions sharing data to improve their fraud-detection capabilities. The proliferation of such applications will lead to a transition from data silos—where one company collects data, processes it, and provides the service—to data ecosystems, where applications learn and make decisions using data owned by different organizations.

Challenges: Design AI systems that can train on datasets owned by different organizations without compromising their confidentiality, and in the process provide AI capabilities that span the boundaries of potentially competing organizations.

3.4 AI demands outpacing Moore's Law

The ability to process and store huge amounts of data has been one of the key enablers of AI's recent successes (see Section 2.1). However, keeping up with the data being generated will become increasingly difficult due to the following two trends.

First, data continues to grow exponentially. A 2015 Cisco white paper [25] estimates that the amount of data generated by Internet of Everything (IoE) devices will reach 400 ZB by 2018, almost 50x the estimated traffic in 2015. According to a recent study [100], by 2025, we will need a three-to-four orders of magnitude improvement in compute throughput to process the aggregate output of all genome sequencers in the world. This would require computation resources to at least double every year.

Second, this explosion of data is coming at a time when our historically rapidly improving hardware technology is coming to a grinding halt [53]. The capacity of DRAMs and disks is expected to double just once in the next decade, and it will take two decades before the performance of CPUs doubles. This slowdown means that storing and processing all generated data will become impracticable.
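The "at least double every year" figure follows directly from the numbers quoted above; the ten-year horizon used below is our assumption for the arithmetic, the targets are the three-to-four orders of magnitude from the cited study:

```python
# Required annual growth factor to reach a 10^3 - 10^4 improvement
# in compute throughput over roughly a decade (e.g., by 2025).
years = 10
for target in (1e3, 1e4):
    rate = target ** (1 / years)      # annual multiplier needed
    print(f"{target:.0e}x over {years} years needs {rate:.2f}x per year")

# Doubling every year for a decade yields 2**10 = 1024x, i.e. about
# three orders of magnitude -- hence "at least double every year".
print(2 ** 10)
```

Against DRAM and disk capacity that doubles only once per decade, a required 2-2.5x per year makes the mismatch stark.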

Challenges: Develop domain-specific architectures and software systems to address the performance needs of future AI applications in the post-Moore's Law era, including custom chips for AI workloads, edge-cloud systems to efficiently process data at the edge, and techniques for abstracting and sampling data.

4 RESEARCH OPPORTUNITIES

This section discusses the previous challenges from the systems perspective. In particular, we discuss how innovations in systems, security, and architectures can help address these challenges. We present nine research opportunities (R1 to R9), organized into three topics: acting in dynamic environments, secure AI, and AI-specific architectures. Figure 1 shows the most common relationships between trends, on one hand, and challenges and research topics, on the other hand.

[Figure 1 (diagram): arrows map the four trends to three groups of research topics. Mission-critical AI → R1, R2, R3; Personalized AI → R4, R5, R6; AI across organizations → R6; AI demands outpacing Moore's Law → R7, R8, R9; the original figure also contains additional cross-cutting arrows. The topic groups are: Acting in dynamic environments (R1: Continual learning; R2: Robust decisions; R3: Explainable decisions); Secure AI (R4: Secure enclaves; R5: Adversarial learning; R6: Shared learning on confidential data); AI-specific architectures (R7: Domain specific hardware; R8: Composable AI systems; R9: Cloud-edge systems).]

Figure 1: A mapping from trends to challenges and research topics.

4.1 Acting in dynamic environments

Many future AI applications will operate in dynamic environments, i.e., environments that may change, often rapidly and unexpectedly, and often in non-reproducible ways. For example, consider a group of robots providing security for an office building. When one robot breaks or a new one is added, the other robots must update their strategies for navigation, planning, and control in a coordinated manner. Similarly, when the environment changes, either due to the robots' own actions or to external conditions (e.g., an elevator going out of service, or a malicious intruder), all robots must re-calibrate their actions in light of the change. Handling such environments will require AI systems that can react quickly and safely even to scenarios that have not been encountered before.

R1: Continual learning. Most of today's AI systems, including movie recommendation, image recognition, and language translation, perform training offline and then make predictions online. That is, the learning performed by the system does not happen continually with the generation of the data, but instead it happens sporadically, on very different and much slower time scales. Typically, models are updated daily, or in the best case hourly, while predictions/decisions happen at second or sub-second granularity. This makes them a poor fit for environments that change continually and unexpectedly, especially in mission-critical applications. These more challenging environments require agents that continually learn and adapt to asynchronous changes.

Some aspects of learning in dynamic environments are addressed by online learning [17], in which data arrive temporally and updates to the model can occur as new data arrive. However, traditional online learning does not aim to handle control problems, in which an agent's actions change the environment (e.g., as arise naturally in robotics), nor does it aim to handle cases in which the outcomes of decisions are delayed (e.g., a move in a game of chess whose outcome is only evaluated at the end, when the game is lost or won).

These more general situations can be addressed in the framework of Reinforcement Learning (RL). The central task of RL is to learn a function—a "policy"—that maps observations (e.g., a car's camera inputs or a user's requested content) to actions (e.g., slowing down the car or presenting an ad) in a sequence that maximizes long-term reward (e.g., avoiding collisions or increasing sales). RL algorithms update the policy by taking into account the impact of the agent's actions on the environment, even when delayed. If environmental changes lead to reward changes, RL updates the policy accordingly. RL has a long-standing tradition, with classical success stories including learning to play backgammon at the level of the best human players [108], learning to walk [105], and learning basic motor skills [86]. However, these early efforts require significant tuning for each application. Recent efforts are combining deep neural networks with RL (Deep RL) to develop more robust training algorithms that can work for a variety of environments (e.g., many Atari games [77]), or even across different application domains, as in the control of (simulated) robots [92] and the learning of robotic manipulation skills [66]. Noteworthy recent results also include Google's AlphaGo beating the Go world champion [95], and new applications in medical diagnosis [104] and resource management [33].

However, despite these successes, RL has yet to see widescale real-world application. There are many reasons for this, one of which is that large-scale systems have not been built with these use cases in mind. We believe that the combination of ongoing advances in RL algorithms, when coupled with innovations in systems design, will catalyze rapid progress in RL and drive new RL applications.

Systems for RL. Many existing RL applications, such as game-playing, rely heavily on simulations, often requiring millions or even billions of simulations to explore the solution space and "solve" the complex tasks. Examples include playing different variants of a game or experimenting with different control strategies in a robot simulator. These simulations can take as little as a few milliseconds, and their durations can be highly variable (e.g., it might take a few moves to lose a game vs. hundreds of moves to win one). Finally, real-world deployments of RL systems need to process inputs from a variety of sensors that observe the environment's state, and this must be accomplished under stringent time constraints. Thus, we need systems that can handle arbitrary dynamic task graphs, where tasks are heterogeneous in time, computation, and resource demands. Given the short duration of the simulations, to fully utilize a large cluster, we need to execute millions of simulations per second. None of the existing systems satisfies these requirements. Data parallel systems [55, 79, 114] handle orders of magnitude fewer tasks per second, while HPC and distributed DL systems [2, 23, 82]

have limited support for heterogeneous and dynamic task graphs. Hence, we need new systems to effectively support RL applications.

Simulated reality (SR). The ability to interact with the environment is fundamental to RL's success. Unfortunately, in real-world applications, direct interaction can be slow (e.g., on the order of seconds) and/or hazardous (e.g., risking irreversible physical damage), both of which conflict with the need for having millions of interactions before a reasonable policy is learned. While algorithmic approaches have been proposed to reduce the number of real-world interactions needed to learn policies [99, 111, 112], more generally there is a need for Simulated Reality (SR) architectures, in which an agent can continually simulate and predict the outcome of the next action before actually taking it [101].

SR enables an agent to learn not only much faster but also much more safely. Consider a robot cleaning an environment that encounters an object it has not seen before, e.g., a new cellphone. The robot could physically experiment with the cellphone to determine how to grasp it, but this may require a long time and might damage the phone. In contrast, the robot could scan the 3D shape of the phone into a simulator, perform a few physical experiments to determine rigidity, texture, and weight distribution, and then use SR to learn how to successfully grasp it without damage.

Importantly, SR is quite different from virtual reality (VR); while VR focuses on simulating a hypothetical environment (e.g., Minecraft), sometimes incorporating past snapshots of the real world (e.g., Flight Simulator), SR focuses on continually simulating the physical world with which the agent is interacting. SR is also different from augmented reality (AR), which is primarily concerned with overlaying virtual objects onto real world images.

Arguably the biggest systems challenges associated with SR are to infer continually the simulator parameters in a changing real-world environment and at the same time to run many simulations before taking a single real-time action. As the learning algorithm interacts with the world, it gains more knowledge which can be used to improve the simulation. Meanwhile, many potential simulations would need to be run between the agent's actions, using both different potential plans and making different "what-if" assumptions about the world. Thus, the simulation is required to run much faster than real time.

Research: (1) Build systems for RL that fully exploit parallelism, while allowing dynamic task graphs, providing millisecond-level latencies, and running on heterogeneous hardware under stringent deadlines. (2) Build systems that can faithfully simulate the real-world environment, as the environment changes continually and unexpectedly, and run faster than real time.

R2: Robust decisions. As AI applications are increasingly making decisions on behalf of humans, notably in mission-critical applications, an important criterion is that they need to be robust to uncertainty and errors in inputs and feedback. While noise-resilient and robust learning is a core topic in statistics and machine learning, adding system support can significantly improve classical methods. In particular, by building systems that track data provenance, we can diminish uncertainty regarding the mapping of data sources to observations, as well as their impact on states and rewards. We can also track and leverage contextual information that informs the design of source-specific noise models (e.g., occluded cameras). These capabilities require support for provenance and noise modeling in data storage systems. While some of these challenges apply more generally, two notions of robustness that are particularly important in the context of AI systems and that present particular systems challenges are: (1) robust learning in the presence of noisy and adversarial feedback, and (2) robust decision-making in the presence of unforeseen and adversarial inputs.

Increasingly, learning systems leverage data collected from unreliable sources, possibly with inaccurate labels, and in some cases with deliberately inaccurate labels. For example, the Microsoft Tay chatbot relied heavily on human interaction to develop rich natural dialogue capabilities. However, when exposed to Twitter messages, Tay quickly took on a dark personality [16].

In addition to dealing with noisy feedback, another research challenge is handling inputs for which the system was never trained. In particular, one often wishes to detect whether a query input is drawn from a substantially different distribution than the training data, and then take safe actions in those cases. An example of a safe action in a self-driving car may be to slow down and stop. More generally, if there is a human in the loop, a decision system could relinquish control to a human operator. Explicitly training models to decline to make predictions for which they are not confident, or to adopt a default safe course of actions, and building systems that chain such models together can both reduce computational overhead and deliver more accurate and reliable predictions.

Research: (1) Build fine grained provenance support into AI systems to connect outcome changes (e.g., reward or state) to the data sources that caused these changes, and automatically learn causal, source-specific noise models. (2) Design API and language support for developing systems that maintain confidence intervals for decision-making, and in particular can flag unforeseen inputs.

R3: Explainable decisions. In addition to making black-box predictions and decisions, AI systems will often need to provide explanations for their decisions that are meaningful to humans. This is especially important for applications in which there are substantial regulatory requirements as well as in applications such as security and healthcare where legal issues arise [24]. Here, explainable should be distinguished from interpretable, which is often also of interest. Typically, the latter means that the output of the AI algorithm is understandable to a subject matter expert in terms of concepts from the domain from which the data are drawn [69], while the former means that one can identify the properties of the input to the AI algorithm that are responsible for the particular output, and can answer counterfactual or "what-if" questions. For example, one may wish to know what features of a particular organ in an X-ray (e.g., size, color, position, form) led to a particular diagnosis and how the diagnosis would change under minor perturbations of those features. Relatedly, one may wish to explore what other mechanisms could have led to the same outcomes, and the relative plausibility of those outcomes. Often this will require not merely providing an explanation for a decision, but also considering other data that could be brought to bear. Here we are in the domain of causal inference, a field which will be essential in many future AI applications, and one which has natural connections to diagnostics and provenance ideas in databases.

Indeed, one ingredient for supporting explainable decisions is the ability to record and faithfully replay the computations that led to a particular decision. Such systems hold the potential to help improve

decision explainability by replaying a prediction task against past inputs—or randomly or adversarially perturbed versions of past inputs, or more general counterfactual scenarios—to identify what features of the input have caused a particular decision. For example, to identify the cause of a false alarm in a video-based security system, one might introduce perturbations in the input video that attenuate the alarm signal (e.g., by masking regions of the image) or search for closely related historical data (e.g., by identifying related inputs) that led to similar decisions. Such systems could also lead to improved statistical diagnostics and improved training/testing for new models; e.g., by designing models that are (or are not) amenable to explainability.

Research: Build AI systems that can support interactive diagnostic analysis, that faithfully replay past executions, and that can help to determine the features of the input that are responsible for a particular decision, possibly by replaying the decision task against past perturbed inputs. More generally, provide systems support for causal inference.

4.2 Secure AI

Security is a large topic, many aspects of which will be central to AI applications going forward. For example, mission-critical AI applications, personalized learning, and learning across multiple organizations all require systems with strong security properties. While there is a wide range of security issues, here we focus on two broad categories. The first category is an attacker compromising the integrity of the decision process. The attacker can do so either by compromising and taking control of the AI system itself, or by altering the inputs so that the system will unknowingly render decisions that the attacker wants. The second category is an attacker learning the confidential data on which an AI system was trained, or learning the secret model. Next, we discuss three promising research topics to defend against such attacks.

R4: Secure enclaves. The rapid rise of public cloud and the increased complexity of the software stack considerably widen the exposure of AI applications to attacks. Two decades ago most applications ran on top of a commercial OS, such as Windows or SunOS, on a single server deployed behind an organization's firewalls. Today, organizations run AI applications in the public cloud on a distributed set of servers they do not control, possibly shared with competitors, on a considerably more complex software stack, where the OS itself runs on top of a hypervisor or within a container. Furthermore, the applications leverage directly or indirectly a plethora of other systems, such as log ingestion, storage, and data processing frameworks. If any of these software components is [...]

[...] the other end of the spectrum, cloud providers are starting to offer special bare-bone instances that are physically protected, e.g., they are deployed in secure "vaults" to which only authorized personnel, authenticated via fingerprint or iris scanning, have access.

In general, with any enclave technology, the application developer must trust all the software running within the enclave. Indeed, even in the case of hardware enclaves, if the code running inside the enclave is compromised, it can leak decrypted data or compromise decisions. Since a small code base is typically easier to secure, one research challenge is to split the AI system's code into code running inside the enclave, hopefully as little as possible, and code running outside of the enclave, in untrusted mode, by leveraging cryptographic techniques. Another approach to ensure that code inside the enclave does not leak sensitive information is to develop static and dynamic verification tools as well as sandboxing [9, 12, 93].

Note that besides minimizing the trusted computing base, there are two additional reasons for splitting the application code: increased functionality and reduced cost. First, some of the functionality might not be available within the enclave, e.g., GPU processing for running Deep Learning (DL) algorithms, or services and applications which are not yet vetted/ported to run within the secure enclave. Second, the secure instances offered by a cloud provider can be significantly more expensive than regular instances.

Research: Build AI systems that leverage secure enclaves to ensure data confidentiality, user privacy and decision integrity, possibly by splitting the AI system's code between a minimal code base running within the enclave, and code running outside the enclave. Ensure the code inside the enclave does not leak information, or compromise decision integrity.

R5: Adversarial learning. The adaptive nature of ML algorithms opens the learning systems to new categories of attacks that aim to compromise the integrity of their decisions by maliciously altering training data or decision input. There are two broad types of attacks: evasion attacks and data poisoning attacks.

Evasion attacks happen at the inference stage, where an adversary attempts to craft data that is incorrectly classified by the learning system [47, 103]. An example is altering the image of a stop sign slightly such that it still appears to a human to be a stop sign but is seen by an autonomous vehicle as a yield sign.

Data poisoning attacks happen at the training stage, where an adversary injects poisoned data (e.g., data with wrong labels) into the training data set that causes the learning system to learn the wrong model, such that the adversary thereby has input data incorrectly classified by the learner [73, 74, 113]. Learning systems that are periodically retrained to handle non-stationary input data
compromised, the AI applications itself might be compromised. are particularly vulnerable to this attack, if the weakly labeled data
A general approach to deal with these attacks is providing a “se- being used for retraining is collected from unreliable or untrust-
cure enclave” abstraction—a secure execution environment—which worthy sources. With new AI systems continually learning by
protects the application running within the enclave from malicious interacting with dynamic environments, handling data poisoning
code running outside the enclave. One recent example is Intel’s attacks becomes increasingly important.
Software Guard Extensions (SGX) [5], which provides a hardware- Today, there are no effective solutions to protect against evasion
enforced isolated execution environment. Code inside SGX can attacks. As such, there are a number of open research challenges:
compute on data, while even a compromised operating system or provide better understanding of why adversarial examples are often
hypervisor (running outside the enclave) cannot see this code or easy to find, investigate what method or combination of different
data. SGX also provides remote attestation [6], a protocol enabling a methods may be effective at defending against adversarial examples,
remote client to verify that the enclave is running the expected code. and design and develop systematic methods to evaluate potential
ARM’s TrustZone is another example of a hardware enclave. At defenses. For data poisoning attacks, open challenges include how
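To make evasion attacks concrete, the following toy sketch crafts an adversarial input with the fast gradient sign method of [47] against a small logistic-regression classifier. The data, model, and perturbation budget are all illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: two Gaussian blobs, labels 0 and 1.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Fit logistic regression with plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

def predict(x):
    return int(x @ w + b > 0)

# Fast gradient sign method: move the input a small step in the
# direction that increases its loss; for this model, the input
# gradient of the loss on a class-1 point is (p - 1) * w.
x = np.array([1.0, 1.0])                  # correctly classified as class 1
p = 1 / (1 + np.exp(-(x @ w + b)))
x_adv = x + 1.5 * np.sign((p - 1.0) * w)  # epsilon = 1.5

print(predict(x), predict(x_adv))  # the structured perturbation flips the decision
```

The perturbation is small and axis-aligned, yet it reliably crosses the decision boundary, which is exactly why defending against such inputs is hard.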
For data poisoning attacks, open challenges include how to detect poisoned input data and how to build learning systems that are resilient to different types of data poisoning attacks. In addition, as data sources are identified to be fraudulent or explicitly retracted for regulatory reasons, we can leverage replay (see R3: Explainable decisions) and incremental computation to efficiently eliminate the impact of those sources on learned models. As pointed out previously, this ability is enabled by combining modeling with provenance and efficient computation in data storage systems.

Research: Build AI systems that are robust against adversarial inputs both during training and prediction (e.g., decision making), possibly by designing new machine learning models and network architectures, leveraging provenance to track down fraudulent data sources, and replaying to redo decisions after eliminating the fraudulent sources.

R6: Shared learning on confidential data. Today, each company typically collects data individually, analyzes it, and uses this data to implement new features and products. However, not all organizations possess the same wealth of data as found in the few large AI-focused corporations, such as Google, Facebook, Microsoft, and Amazon. Going forward, we expect more and more organizations to collect valuable data, more third-party data services to be available, and more benefit to be gained from learning over data from multiple organizations (see Section 3).

Indeed, from our own interaction with industry, we are learning about an increasing number of such scenarios. A large bank provided us with a scenario in which they and other banks would like to pool together their data and use shared learning to improve their collective fraud detection algorithms. While these banks are natural competitors in providing financial services, such "cooperation" is critical to minimize their losses due to fraudulent activities. A very large health provider described a similar scenario in which competing hospitals would like to share data to train a shared model predicting flu outbreaks without sharing the data for other purposes. This would allow them to improve the response to epidemics and contain the outbreaks, e.g., by rapidly deploying mobile vaccination vans at critical locations. At the same time, every hospital must protect the confidentiality of its own patients.

The key challenge of shared learning is how to learn a model on data belonging to different (possibly competing) organizations without leaking relevant information about this data during the training process. One possible solution would be to pool all the data in a hardware enclave and then learn the model. However, this solution is not always feasible, as hardware enclaves are not yet deployed widely and, in some cases, the data cannot be moved due to regulatory constraints or its large volume.

Another promising approach is using secure multi-party computation (MPC) [13, 45, 70]. MPC enables n parties, each having a private input, to compute a joint function over the input without any party learning the inputs of the other parties. Unfortunately, while MPC is effective for simple computations, it has a nontrivial overhead for complex computations, such as model training. An interesting research direction is how to partition model training into (1) local computation and (2) computation using MPC, so that we minimize the complexity of the MPC computation.

While training a model without compromising data confidentiality is a big step towards enabling shared learning, unfortunately, it is not always enough. Model serving—the inferences (decisions) rendered based on the model—can still leak information about the data [42, 94]. One approach to address this challenge is differential privacy [36, 37, 39], a popular technique proposed in the context of statistical databases. Differential privacy adds noise to each query to protect data privacy, hence effectively trading accuracy for privacy [35]. A central concept of differential privacy is the privacy budget, which caps the number of queries given a privacy guarantee.

There are three interesting research directions when applying differential privacy to model serving. First, a promising approach is to leverage differential privacy for complex models and inferences, by taking advantage of the inherent statistical nature of the models and predictions. Second, despite the large volume of theoretical research, there are few practical differential privacy systems in use today. An important research direction is to build tools and systems to make it easy to enable differential privacy for real-world applications, including intelligently selecting which privacy mechanisms to use for a given application and automatically converting non-differentially-private computations to differentially-private computations. Finally, one particular aspect in the context of continual learning is that data privacy can be time dependent, that is, the privacy of fresh data is far more important than the privacy of historical data. Examples are stock market and online bidding, where the privacy of the fresh data is paramount, while the historical data is sometimes publicly released. This aspect could enable the development of new differential privacy systems with adaptive privacy budgets that apply only to decisions on the most recent data. Another research direction is to further develop the notion of differential privacy under continuous observation and data release [21, 38].

Even if we are able to protect data confidentiality during training and decision making, this might still not be enough. Indeed, even if confidentiality is guaranteed, an organization might refuse to share its data for improving a model from which its competitors might benefit. Thus, we need to go beyond guaranteeing confidentiality and provide incentives to organizations to share their data or byproducts of their data. Specifically, we need to develop approaches that ensure that by sharing data, an organization gets strictly better service (i.e., better decisions) than by not sharing data. This requires ascertaining the quality of the data provided by a given organization—a problem which can be tackled via a "leave-one-out" approach in which performance is compared both with and without that organization's data included in the training set. We then provide decisions that are corrupted by noise at a level that is inversely proportional to the quality of the data provided by an organization. This incentivizes an organization to provide higher-quality data. Overall, such incentives will need to be placed within a framework of mechanism design to allow organizations to forge their individual data-sharing strategies.

Research: Build AI systems that (1) can learn across multiple data sources without leaking information from a data source during training or serving, and (2) provide incentives to potentially competing organizations to share their data or models.
4.3 AI-specific architectures

AI demands will drive innovations both in systems and hardware architectures. These new architectures will aim not only to improve the performance, but to simplify the development of the next generation of AI applications by providing rich libraries of modules that are easily composable.

R7: Domain-specific hardware. The ability to process and store huge amounts of data has been one of the key enablers of AI's recent successes (see Section 2.1). However, continuing to keep up with the data being generated will be increasingly challenging. As discussed in Section 3, while data continues to grow exponentially, the corresponding performance-cost-energy improvements that have fueled the computer industry for more than 40 years are reaching the end-of-line:

• Transistors are not getting much smaller due to the ending of Moore's Law,
• Power is limiting what can be put on a chip due to the end of Dennard scaling,
• We've already switched from one inefficient processor/chip to about a dozen efficient processors per chip, but there are limits to parallelism due to Amdahl's Law.

The one path left to continue the improvements in performance-energy-cost of processors is developing domain-specific processors. These processors do only a few tasks, but they do them extremely well. Thus, the rapid improvements in processing that we have expected in the Moore's law era must now come through innovations in computer architecture instead of semiconductor process improvements. Future servers will have much more heterogeneous processors than in the past. One trailblazing example that spotlights domain-specific processors is Google's Tensor Processing Unit, which has been deployed in its datacenters since 2015 and is regularly used by billions of people. It performs the inference phase of deep neural networks 15× to 30× faster than its contemporary CPUs and GPUs, and its performance per watt is 30× to 80× better. In addition, Microsoft has announced the availability of FPGA-powered instances on its Azure cloud [88], and a host of companies, ranging from Intel to IBM, and to startups like Cerebras and Graphcore, are developing specialized hardware for AI that promises orders of magnitude performance improvements over today's state-of-the-art processors [19, 48, 54, 78].

With DRAM subject to the same limitations, there are several novel technologies being developed that hope to be its successor. 3D XPoint from Intel and Micron aims to provide 10× storage capacity with DRAM-like performance. STT MRAM aims to succeed Flash, which may hit similar scaling limits as DRAM. Hence, the memory and storage of the cloud will likely have more levels in the hierarchy and contain a wider variety of technologies. Given the increasing diversity of processors, memories, and storage devices, mapping services to hardware resources will become an even more challenging problem. These dramatic changes suggest building cloud computing from a much more flexible building block than the classic standard rack containing a top-of-rack switch and tens of servers, each with 2 CPU chips, 1 TB of DRAM, and 4 TBs of flash.

For example, the UC Berkeley Firebox project [41] proposes a multi-rack supercomputer that connects thousands of processor chips with thousands of DRAM chips and nonvolatile storage chips using fiber optics to provide low-latency, high-bandwidth, and long physical distance. Such a hardware system would allow system software to provision computation services with the right ratio and type of domain-specific processors, DRAM, and NVRAM. Such resource disaggregation at scale would significantly improve the allocation of increasingly diverse tasks to correspondingly heterogeneous resources. It is particularly valuable for AI workloads, which are known to gain significant performance benefits from large memory and have diverse resource requirements that don't all conform to a common pattern.

Besides performance improvements, new hardware architectures will also provide additional functionality, such as security support. While Intel's SGX and ARM's TrustZone are paving the way towards hardware enclaves, much more needs to be done before they can be fully embraced by AI applications. In particular, existing enclaves exhibit various resource limitations, such as addressable memory, and they are only available for a few general purpose CPUs. Removing these limitations, and providing a uniform hardware enclave abstraction across a diverse set of specialized processors, including GPUs and TPUs, are promising directions of research. In addition, open instruction set processors, such as RISC-V, represent an exciting "playground" to develop new security features.

Research: (1) Design domain-specific hardware architectures to improve the performance and reduce power consumption of AI applications by orders of magnitude, or enhance the security of these applications. (2) Design AI software systems to take advantage of these domain-specific architectures, resource disaggregation architectures, and future non-volatile storage technologies.

R8: Composable AI systems. Modularity and composition have played a fundamental role in the rapid progress of software systems, as they allowed developers to rapidly build and evolve new systems from existing components. Examples include microkernel OSes [3, 68], the LAMP stack [64], microservice architectures [85], and the internet [26]. In contrast, today's AI systems are monolithic, which makes them hard to develop, test, and evolve.

Similarly, modularity and composition will be key to increasing development speed and adoption of AI, by making it easier to integrate AI in complex systems. Next, we discuss several research problems in the context of model and action composition.

Model composition is critical to the development of more flexible and powerful AI systems. Composing multiple models and querying them in different patterns enables a tradeoff between decision accuracy, latency, and throughput in a model serving system [29, 106]. In one example, we can query models serially, where each model either renders the decision with sufficiently high accuracy or says "I'm not sure". In the latter case, the decision is passed to the next model in the series. By ordering the models from the highest to the lowest "I'm not sure" rate, and from the lowest to the highest latency, we can optimize both latency and accuracy.

To fully enable model composition, many challenges remain to be addressed. Examples are (1) designing a declarative language to capture the topology of these components and specifying performance targets of the applications, (2) providing accurate performance models for each component, including resource demands, latency and throughput, and (3) scheduling and optimization algorithms to compute the execution plan across components, and map components to the available resources to satisfy latency and throughput requirements while minimizing costs.

Action composition consists of aggregating sequences of basic decisions/actions into coarse-grained primitives, also called options.
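A minimal sketch of the option abstraction follows; the API and the lane-change example are purely illustrative, not an interface from the options literature:

```python
# A toy sketch of "options": an option exposes one coarse-grained
# primitive that internally sequences low-level actions.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Option:
    name: str
    steps: List[Callable[[dict], None]] = field(default_factory=list)

    def run(self, state: dict) -> dict:
        for step in self.steps:  # low-level policy: a fixed action sequence
            step(state)
        return state

# Low-level actions mutate the (toy) vehicle state.
def signal_left(s): s["signal"] = "left"
def steer_left(s): s["lane"] -= 1
def cancel_signal(s): s["signal"] = None

# The coarse-grained primitive an agent selects from, instead of
# planning over individual actuator commands.
change_lane_left = Option("change_lane_left",
                          [signal_left, steer_left, cancel_signal])

state = change_lane_left.run({"lane": 1, "signal": None})
print(state)  # → {'lane': 0, 'signal': None}
```

An agent can then plan over a handful of such options rather than the much larger space of raw actuator commands.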
In the case of a self-driving car, an example of an option is changing the lane while driving on a highway, while the actions are speeding up, slowing down, turning left or right, signaling the change of direction, etc. In the case of a robot, an example of a primitive could be grasping an object, while actions include actuating the robot's joints. Options have been extensively studied in the context of hierarchical learning [30, 34, 84, 97, 102, 110]. Options can dramatically speed up learning or adaptation to a new scenario by allowing the agent to select from a list of existing options to accomplish a given task, rather than from a much longer list of low-level actions.

A rich library of options would enable the development of new AI applications by simply composing the appropriate options, the same way web programmers develop applications today in just a few lines of code by invoking powerful web APIs. In addition, options can improve responsiveness, as selecting the next action within an option is a much simpler task than selecting an action in the original action space.

Research: Design AI systems and APIs that allow the composition of models and actions in a modular and flexible manner, and develop rich libraries of models and options using these APIs to dramatically simplify the development of AI applications.

R9: Cloud-edge systems. Today, many AI applications, such as speech recognition and language translation, are deployed in the cloud. Going forward, we expect a rapid increase in AI systems that span edge devices and the cloud. On one hand, AI systems which are currently cloud only, such as user recommendation systems [72], are moving some of their functionality to edge devices to improve security, privacy, latency and safety (including the ability to cope with being disconnected from the internet). On the other hand, AI systems currently deployed at the edge, such as self-driving cars, drones, and home robots, are increasingly sharing data and leveraging the computational resources available in the cloud to update models and policies [61].

However, developing cloud and cloud-edge systems is challenging for several reasons. First, there is a large discrepancy between the capabilities of edge devices and datacenter servers. We expect this discrepancy to increase in the future, as edge devices, such as cellphones and tablets, have much more stringent power and size constraints than servers in datacenters. Second, edge devices are extremely heterogeneous both in terms of resource capabilities, ranging from very low power ARM or RISC-V CPUs that power IoT devices to powerful GPUs in self-driving cars, and software platforms. This heterogeneity makes application development much harder. Third, the hardware and software update cycles of edge devices are significantly slower than in a datacenter. Fourth, as the increase in storage capacity slows down while the growth in the data being generated continues unabated, it may no longer be feasible or cost effective to store this deluge of data.

There are two general approaches to addressing the mix of cloud and edge devices. The first is to repurpose code to multiple heterogeneous platforms via retargetable software design and compiler technology. To address the wide heterogeneity of edge devices and the relative difficulty of upgrading the applications running on these devices, we need new software stacks that abstract away the heterogeneity of devices by exposing the hardware capabilities to the application through common APIs. Another promising direction is developing compilers and just-in-time (JIT) technologies to efficiently compile complex algorithms on the fly and run them on edge devices. This approach can leverage recent code generation tools, such as TensorFlow's XLA [107], Halide [50], and Weld [83].

The second general approach is to design AI systems that are well suited to partitioned execution across the cloud and the edge. As one example, model composition (see Section 4.3) could allow one to run the lighter but less accurate models at the edge, and the computation-intensive but higher-accuracy models in the cloud. This architecture would improve decision latency without compromising accuracy, and it has already been employed in recent video recognition systems [59, 115]. In another example, action composition would allow building systems where learning of hierarchical options [63] takes place on powerful clusters in the cloud, and then execution of these options happens at the edge.

Robotics is one application domain that can take advantage of a modular cloud-edge architecture. Today, there is a scarcity of open source platforms to develop robotic applications. ROS, arguably the most popular such platform in use today, is confined to running locally and lacks many performance optimizations required by real-time applications. To take advantage of the new developments in AI research, such as shared and continual learning, we need systems that can span both edge devices (e.g., robots) and the cloud. Such systems would allow developers to seamlessly migrate functionality between a robot and the cloud to optimize decision latency and learning convergence. While the cloud can run sophisticated algorithms to continually update the models by leveraging the information gathered by distributed robots in real time, the robots can continue to execute the actions locally based on previously downloaded policies.

To address the challenge of the data deluge collected by the edge devices, learning-friendly compression methods can be used to reduce processing overhead. Examples of such methods include sampling and sketching, which have already been successfully employed for analytics workloads [4, 10, 28, 51, 81]. One research direction is to aggressively leverage sampling and sketching in a systematic way to support a variety of learning algorithms and prediction scenarios. An arguably more difficult challenge is to reduce the storage overhead, which might require deleting data. The key challenge here is that we do not always know how the data will be used in the future. This is essentially a compression problem, but compression for the purposes of ML algorithms. Again, distributed approaches based on materialized samples and sketches can help provide solutions to this problem, as can ML-based approaches in the form of feature selection or model selection protocols.

Research: Design cloud-edge AI systems that (1) leverage the edge to reduce latency, improve safety and security, and implement intelligent data retention techniques, and (2) leverage the cloud to share data and models across edge devices, train sophisticated computation-intensive models, and take high quality decisions.
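The partitioned-execution pattern above can be sketched with a confidence-gated fallback: a cheap edge model answers when it is confident and defers to a slower, more accurate cloud model otherwise. Both models below are toy stand-ins, not real services:

```python
# Sketch of cloud-edge partitioned inference with a confidence threshold.
def edge_model(x):
    # Cheap local model: returns (label, confidence) via a crude rule.
    score = x["brightness"]
    if score > 0.8:
        return ("day", 0.95)
    if score < 0.2:
        return ("night", 0.95)
    return ("unknown", 0.5)

def cloud_model(x):
    # Pretend this is a large model behind an RPC; always confident.
    return ("day" if x["brightness"] >= 0.5 else "night", 0.99)

def classify(x, threshold=0.9):
    label, conf = edge_model(x)        # low latency, runs locally
    if conf >= threshold:
        return label, "edge"
    return cloud_model(x)[0], "cloud"  # fall back only on hard inputs

print(classify({"brightness": 0.9}))  # → ('day', 'edge')
print(classify({"brightness": 0.4}))  # → ('night', 'cloud')
```

Easy inputs never leave the device, which cuts both latency and the volume of data shipped to the cloud; only ambiguous inputs pay the round-trip cost.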
5 CONCLUSION

The striking progress of AI during just the last decade is leading to the successful transition from the research lab into commercial services that have previously required human input and oversight. Rather than replacing human workers, AI systems and robots have the potential to enhance human performance and facilitate new forms of collaboration [44].

To realize the full promise of AI as a positive force in our lives, there are daunting challenges to overcome, and many of these challenges are related to systems and infrastructure. These challenges are driven by the realization that AI systems will need to make decisions that are faster, safer, and more explainable, securing these decisions as well as the learning processes against ever more sophisticated types of attacks, continuously increasing the computation capabilities in the face of the end of Moore's Law, and building composable systems that are easy to integrate in existing applications and can span the cloud and the edge.

This paper proposes several open research directions in systems, architectures, and security that have the potential to address these challenges. We hope these questions will inspire new research that can advance AI and make it more capable, understandable, secure and reliable.

REFERENCES
[1] A History of Storage Cost. 2017. http://www.mkomo.com/cost-per-gigabyte-update. (2017).
[2] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, and Matthieu Devin. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. (2015).
[3] Mike Accetta, Robert Baron, William Bolosky, David Golub, Richard Rashid, Avadis Tevanian, and Michael Young. 1986. Mach: A New Kernel Foundation for UNIX Development. 93–112.
[4] Sameer Agarwal et al. 2013. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys.
[5] Ittai Anati, Shay Gueron, Simon Johnson, and Vincent Scarlata. 2013. Innovative technology for CPU based attestation and sealing. In Proceedings of the 2nd international workshop on hardware and architectural support for security and privacy, Vol. 13.
[6] Ittai Anati, Shay Gueron, Simon P Johnson, and Vincent R Scarlata. 2013. Innovative Technology for CPU Based Attestation and Sealing. (2013).
[7] Apache Hadoop. 2017. http://hadoop.apache.org/. (2017).
[8] Apache Mahout. 2017. http://mahout.apache.org/. (2017).
[9] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Daniel O'Keeffe, Mark L Stillwell, et al. 2016. SCONE: Secure linux containers with Intel SGX. In 12th USENIX Symp. Operating Systems Design and Implementation.
[10] Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, and Sahaana Suri. 2017. MacroBase: Prioritizing Attention in Fast Data. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). ACM, New York, NY, USA, 541–556.
[11] Luiz Andre Barroso and Urs Hoelzle. 2009. The Datacenter As a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan and Claypool.
[12] Andrew Baumann, Marcus Peinado, and Galen Hunt. 2015. Shielding applications from an untrusted cloud with Haven. ACM Transactions on Computer Systems (TOCS) 33, 3 (2015), 8.
[13] Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. 1988. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the 20th ACM symposium on Theory of Computing.
[14] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for scientific computing conference (SciPy), Vol. 4. Austin, TX, 3.
[15] BigDL: Distributed Deep Learning on Apache Spark. https://software.intel.com/en-us/articles/bigdl-distributed-deep-learning-on-apache-spark. (????).
[16] Tay (bot). 2017. https://en.wikipedia.org/wiki/Tay_(bot). (2017).
[17] Léon Bottou. 1998. On-line Learning in Neural Networks. (1998), 9–42.
[18] Léon Bottou. 2010. Large-scale machine learning with stochastic gradient descent.
arXiv preprint arXiv:1512.01274 (2015).
[23] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs/1512.01274 (2015).
[24] Travers Ching, Daniel S Himmelstein, Brett K Beaulieu-Jones, Alexandr A Kalinin, Brian T Do, Gregory P Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L Rosen, et al. 2017. Opportunities And Obstacles For Deep Learning In Biology And Medicine. bioRxiv (2017), 142760.
[25] Cisco. 2015. Cisco Global Cloud Index: Forecast and Methodology, 2015-2020. http://www.cisco.com/c/dam/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.pdf. (2015).
[26] D. Clark. 1988. The Design Philosophy of the DARPA Internet Protocols. SIGCOMM Comput. Commun. Rev. 18, 4 (Aug. 1988), 106–114.
[27] CMS updates rule allowing claims data to be sold. 2016. http://www.modernhealthcare.com/article/20160701/NEWS/160709998. (2016).
[28] Graham Cormode, Minos Garofalakis, Peter J Haas, and Chris Jermaine. 2012. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases 4, 1–3 (2012), 1–294.
[29] Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. 2017. Clipper: A Low-Latency Online Prediction Serving System. NSDI '17 (2017).
[30] Peter Dayan and Geoffrey E. Hinton. 1992. Feudal Reinforcement Learning. In Advances in Neural Information Processing Systems 5, [NIPS Conference, Denver, Colorado, USA, November 30 - December 3, 1992]. 271–278. http://papers.nips.cc/paper/714-feudal-reinforcement-learning
[31] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc Le, and Andrew Y. Ng. 2012. Large Scale Distributed Deep Networks. In NIPS '12. http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf
[32] Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (OSDI'04).
[33] DeepMind AI Reduces Google Data Centre Cooling Bill by 40%. 2017. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/. (2017).
[34] Thomas G. Dietterich. 1998. The MAXQ Method for Hierarchical Reinforcement Learning. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27, 1998. 118–126.
[35] John Duchi, Michael Jordan, and Martin Wainwright. to appear. Minimax optimal procedures for locally private estimation. J. Amer. Statist. Assoc. (to appear).
[36] Cynthia Dwork. 2006. Differential Privacy. In ICALP (2), Vol. 4052. Springer.
[37] Cynthia Dwork. 2008. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation.
[38] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N Rothblum. 2010. Differential privacy under continual observation. In Proceedings of the 42nd ACM symposium on Theory of computing.
[39] Cynthia Dwork and Aaron Roth. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (2014).
[40] The Economist. 2017. The world's most valuable resource is no longer oil, but data. (May 2017).
[41] FireBox. 2017. https://bar.eecs.berkeley.edu/projects/2015-firebox.html. (2017).
[42] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1322–1333.
[43] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP '03). 29–43.
[44] Ken Goldberg. 2017. Op-Ed: Call it Multiplicity: Diverse Groups of People and Machines Working Together. Wall Street Journal (2017).
[45] Oded Goldreich, Silvio Micali, and Avi Wigderson. 1987. How to play any mental game. In Proceedings of the 19th ACM symposium on Theory of computing.
[46] Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs (OSDI'12). 17–30.
[47] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
[48] Graphcore. 2017. https://www.graphcore.ai/. (2017).
[49] Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable
In Proceedings of COMPSTAT’2010. Springer, 177–186. Effectiveness of Data. IEEE Intelligent Systems 24, 2 (2009), 8–12.
[19] Cerebras. 2017. https://www.cerebras.net/. (2017). [50] Halide: A Language for Image Processing and Computational Photography. 2017.
[20] Chainer. 2017. https://chainer.org/. (2017). http://halide-lang.org/. (2017).
[21] T.-H. Hubert Chan, Elaine Shi, and Dawn Song. 2010. Private and Continual [51] Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. 1997. Online Aggregation.
Release of Statistics. In ICALP (2), Vol. 6199. Springer. In Proceedings of the 1997 ACM SIGMOD International Conference on Management
[22] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun of Data (SIGMOD ’97). ACM, New York, NY, USA, 171–182. https://doi.org/10.
Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible 1145/253260.253291
and Efficient Machine Learning Library for Heterogeneous Distributed Systems.
[52] Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar. 2012. The MADlib Analytics Library: Or MAD Skills, the SQL. Proc. VLDB Endow. 5, 12 (Aug. 2012), 1700–1711.
[53] John L. Hennessy and David A. Patterson. Computer Architecture, Sixth Edition: A Quantitative Approach. (to appear).
[54] Intel Nervana. 2017. https://www.intelnervana.com/intel-nervana-hardware/. (2017).
[55] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (EuroSys '07). 59–72.
[56] Martin Jaggi, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hoffmann, and Michael I. Jordan. 2015. Communication-Efficient Distributed Dual Coordinate Ascent. In NIPS, 27.
[57] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia. ACM, 675–678.
[58] Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 1–12. https://doi.org/10.1145/3079856.3080246
[59] Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. 2017. Optimizing Deep CNN-Based Queries over Video Streams at Scale. CoRR abs/1703.02529 (2017).
[60] Asterios Katsifodimos and Sebastian Schelter. 2016. Apache Flink: Stream Analytics at Scale.
[61] Ben Kehoe, Sachin Patil, Pieter Abbeel, and Ken Goldberg. 2015. A Survey of Research on Cloud Robotics and Automation. IEEE Trans. Automation Science and Eng. 12, 2 (2015).
[62] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). http://arxiv.org/abs/1412.6980
[63] Sanjay Krishnan, Roy Fox, Ion Stoica, and Ken Goldberg. 2017. DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations. In 1st Conference on Robot Learning (CoRL).
[64] LAMP (software bundle). 2017. https://en.wikipedia.org/wiki/LAMP_(software_bundle). (2017).
[65] Leo Leung. 2015. How much data does x store? (March 2015). https://techexpectations.org/tag/how-much-data-does-youtube-store/
[66] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. 2016. End-to-end Training of Deep Visuomotor Policies. J. Mach. Learn. Res. 17, 1 (Jan. 2016), 1334–1373. http://dl.acm.org/citation.cfm?id=2946645.2946684
[67] Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. In OSDI '14. 583–598.
[68] J. Liedtke. 1995. On Micro-kernel Construction. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP '95). ACM, New York, NY, USA, 237–250. https://doi.org/10.1145/224056.224075
[69] M. W. Mahoney and P. Drineas. 2009. CUR Matrix Decompositions for Improved Data Analysis. Proc. Natl. Acad. Sci. USA 106 (2009), 697–702.
[70] Dahlia Malkhi, Noam Nisan, Benny Pinkas, Yaron Sella, et al. 2004. Fairplay–Secure Two-Party Computation System. In USENIX Security Symposium, Vol. 4. San Diego, CA, USA.
[71] Michael McCloskey and Neil J. Cohen. 1989. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. The Psychology of Learning and Motivation 24 (1989), 104–169.
[72] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2016. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). http://arxiv.org/abs/1602.05629
[73] Shike Mei and Xiaojin Zhu. 2015. The Security of Latent Dirichlet Allocation. In AISTATS.
[74] Shike Mei and Xiaojin Zhu. 2015. Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners. In AAAI. 2871–2877.
[75] Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, and Ameet Talwalkar. 2016. MLlib: Machine Learning in Apache Spark. Journal of Machine Learning Research 17, 34 (2016), 1–7. http://jmlr.org/papers/v17/15-237.html
[76] Tom M Mitchell, William W Cohen, Estevam R Hruschka Jr, Partha Pratim Talukdar, Justin Betteridge, Andrew Carlson, Bhavana Dalvi Mishra, Matthew Gardner, Bryan Kisiel, Jayant Krishnamurthy, et al. 2015. Never Ending Learning. In AAAI. 2302–2310.
[77] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (26 02 2015), 529–533. http://dx.doi.org/10.1038/nature14236
[78] Dharmendra Modha. 2016. The brain's architecture, efficiency on a chip. (Dec. 2016). https://www.ibm.com/blogs/research/2016/12/the-brains-architecture-efficiency-on-a-chip/
[79] Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand. 2011. CIEL: A Universal Execution Engine for Distributed Data-flow Computing. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI'11). USENIX Association, Berkeley, CA, USA, 113–126. http://dl.acm.org/citation.cfm?id=1972457.1972470
[80] Average Historic Price of RAM. 2017. http://www.statisticbrain.com/average-historic-price-of-ram/. (2017).
[81] Frank Olken and Doron Rotem. 1990. Random sampling from database files: A survey. Statistical and Scientific Database Management (1990), 92–111.
[82] Open MPI: Open Source High Performance Computing. 2017. https://www.open-mpi.org/. (2017).
[83] Shoumik Palkar, James J. Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, and Matei Zaharia. 2017. Weld: A Common Runtime for High Performance Data Analytics. In CIDR.
[84] Ronald Parr and Stuart J. Russell. 1997. Reinforcement Learning with Hierarchies of Machines. In Advances in Neural Information Processing Systems 10, [NIPS Conference, Denver, Colorado, USA, 1997]. 1043–1049. http://papers.nips.cc/paper/1384-reinforcement-learning-with-hierarchies-of-machines
[85] Pattern: Microservice Architecture. 2017. http://microservices.io/patterns/microservices.html. (2017).
[86] Jan Peters and Stefan Schaal. 2008. Reinforcement learning of motor skills with policy gradients. Neural networks 21, 4 (2008), 682–697.
[87] Gil Press. 2016. Forrester Predicts Investment In Artificial Intelligence Will Grow 300% in 2017. Forbes (November 2016).
[88] Project Catapult. 2017. https://www.microsoft.com/en-us/research/project/project-catapult/. (2017).
[89] PyTorch. 2017. http://pytorch.org/. (2017).
[90] Rajat Raina, Anand Madhavan, and Andrew Y. Ng. 2009. Large-scale Deep Unsupervised Learning Using Graphics Processors. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09). ACM, New York, NY, USA, 873–880. https://doi.org/10.1145/1553374.1553486
[91] Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. 2011. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In NIPS 24.
[92] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. 2015. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML).
[93] Felix Schuster, Manuel Costa, Cédric Fournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and Mark Russinovich. 2015. VC3: Trustworthy data analytics in the cloud using SGX. In Security and Privacy (SP), 2015 IEEE Symposium on. IEEE, 38–54.
[94] Reza Shokri, Marco Stronati, and Vitaly Shmatikov. 2016. Membership inference attacks against machine learning models. arXiv preprint arXiv:1610.05820 (2016).
[95] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
[96] Daniel L Silver, Qiang Yang, and Lianghao Li. 2013. Lifelong Machine Learning Systems: Beyond Learning Algorithms. In AAAI Spring Symposium: Lifelong Machine Learning, Vol. 13. 05.
[97] Satinder P. Singh. 1992. Reinforcement Learning with a Hierarchy of Abstract Models. In Proceedings of the 10th National Conference on Artificial Intelligence, San Jose, CA, July 12-16, 1992. 202–207. http://www.aaai.org/Library/AAAI/1992/aaai92-032.php
[98] Evan R Sparks et al. 2013. MLI: An API for distributed machine learning. In ICDM.
[99] Stephane Ross. 2013. Interactive Learning for Sequential Decisions and Predictions. (2013).
[100] Zachary D Stephens, Skylar Y Lee, Faraz Faghri, Roy H Campbell, Chengxiang
Zhai, Miles J Efron, Ravishankar Iyer, Michael C Schatz, Saurabh Sinha, and
Gene E Robinson. 2015. Big data: Astronomical or genomical? PLoS Biology 13,
7 (2015), e1002195.
[101] Richard S. Sutton. 1990. Integrated architectures for learning, planning, and reacting
based on approximating dynamic programming. In Proceedings of the Seventh
International Conference on Machine Learning. Morgan Kaufmann.
[102] Richard S. Sutton, Doina Precup, and Satinder P. Singh. 1999. Between MDPs and
Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
Artif. Intell. 112, 1-2 (1999), 181–211. https://doi.org/10.1016/S0004-3702(99)
00052-1
[103] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru
Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural
networks. arXiv preprint arXiv:1312.6199 (2013).
[104] Kai-Fu Tang, Hao-Cheng Kao, Chun-Nan Chou, and Edward Y. Chang. 2016.
Inquire and Diagnose: Neural Symptom Checking Ensemble using Deep Rein-
forcement Learning. http://infolab.stanford.edu/~echang/NIPS_DeepRL_2016_
Symptom_Checker.pdf. (2016).
[105] Russ Tedrake, Teresa Weirui Zhang, and H Sebastian Seung. 2005. Learning to
walk in 20 minutes. In Proceedings of the Fourteenth Yale Workshop on Adaptive
and Learning Systems, Vol. 95585. Yale University New Haven (CT), 1939–1412.
[106] TensorFlow Serving. 2017. https://tensorflow.github.io/serving/. (2017).
[107] TensorFlow XLA. 2017. https://www.tensorflow.org/performance/xla/. (2017).
[108] Gerald Tesauro. 1995. Temporal difference learning and TD-Gammon. Commun.
ACM 38, 3 (1995), 58–68.
[109] Sebastian Thrun. 1998. Lifelong learning algorithms. Learning to learn 8 (1998),
181–209.
[110] Sebastian Thrun and Anton Schwartz. 1994. Finding Structure in Reinforce-
ment Learning. In Advances in Neural Information Processing Systems 7, [NIPS
Conference, Denver, Colorado, USA, 1994]. 385–392. http://papers.nips.cc/paper/
887-finding-structure-in-reinforcement-learning
[111] Joshua Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba,
and Pieter Abbeel. 2017. Domain Randomization for Transferring Deep Neural
Networks from Simulation to the Real World. CoRR abs/1703.06907 (2017).
http://arxiv.org/abs/1703.06907
[112] Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell.
2014. Deep Domain Confusion: Maximizing for Domain Invariance. CoRR
abs/1412.3474 (2014). http://arxiv.org/abs/1412.3474
[113] Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert,
and Fabio Roli. 2015. Is feature selection secure against training data poisoning?
In ICML. 1689–1698.
[114] Matei Zaharia et al. 2012. Resilient distributed datasets: A fault-tolerant ab-
straction for in-memory cluster computing. In NSDI ’12.
[115] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodı́k, Matthai Philipose,
Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale
with Approximation and Delay-Tolerance. In NSDI.