Real-Time Machine Learning: The Missing Pieces
Robert Nishihara∗ , Philipp Moritz∗ , Stephanie Wang, Alexey Tumanov, William Paul,
Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael I. Jordan, Ion Stoica
arXiv:1703.03924v2 [cs.DC] 19 May 2017
UC Berkeley
Abstract
Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.

Figure 1: (a) Traditional ML pipeline (off-line training of a model on data sets, followed by model serving to answer prediction queries). (b) Example reinforcement learning pipeline: the system continuously interacts with an environment to learn a policy, i.e., a mapping between observations and actions.
1 Introduction

The landscape of machine learning (ML) applications is undergoing a significant change. While ML has predominantly focused on training and serving predictions based on static models (Figure 1a), there is now a strong shift toward the tight integration of ML models in feedback loops. Indeed, ML applications are expanding from the supervised learning paradigm, in which static models are trained on offline data, to a broader paradigm, exemplified by reinforcement learning (RL), in which applications may operate in real environments, fuse and react to sensory data from numerous input streams, perform continuous micro-simulations, and close the loop by taking actions that affect the sensed environment (Figure 1b).

Since learning by interacting with the real world can be unsafe, impractical, or bandwidth-limited, many reinforcement learning systems rely heavily on simulating physical or virtual environments. Simulations may be used during training (e.g., to learn a neural network policy), and during deployment. In the latter case, we may constantly update the simulated environment as we interact with the real world and perform many simulations to figure out the next action (e.g., using online planning algorithms like Monte Carlo tree search). This requires the ability to perform simulations faster than real time.

Such emerging applications require new levels of programming flexibility and performance. Meeting these requirements without losing the benefits of modern distributed execution frameworks (e.g., application-level fault tolerance) poses a significant challenge. Our own experience implementing ML and RL applications in Spark, MPI, and TensorFlow highlights some of these challenges

∗ equal contribution
Figure 2: Example components of a real-time ML application: (a) online processing of streaming sensory data to model the environment, (b) dynamic graph construction for Monte Carlo tree search (here tasks are simulations exploring sequences of actions), and (c) heterogeneous tasks in recurrent neural networks (RNNs). Different shades represent different types of tasks, and the task lengths represent their durations.
and gives rise to three groups of requirements for supporting these applications. Though these requirements are critical for ML and RL applications, we believe they are broadly useful.

Performance Requirements. Emerging ML applications have stringent latency and throughput requirements.

• R1: Low latency. The real-time, reactive, and interactive nature of emerging ML applications calls for fine-granularity task execution with millisecond end-to-end latency [8].

• R2: High throughput. The volume of micro-simulations required both for training [16] as well as for inference during deployment [19] necessitates support for high-throughput task execution on the order of millions of tasks per second.

Execution Model Requirements. Though many existing parallel execution systems [9, 21] have gotten great mileage out of identifying and optimizing for common computational patterns, emerging ML applications require far greater flexibility [10].

• R3: Dynamic task creation. RL primitives such as Monte Carlo tree search may generate new tasks during execution based on the results or the durations of other tasks.

• R4: Heterogeneous tasks. Deep learning primitives and RL simulations produce tasks with widely different execution times and resource requirements. Explicit system support for heterogeneity of tasks and resources is essential for RL applications.

• R5: Arbitrary dataflow dependencies. Similarly, deep learning primitives and RL simulations produce arbitrary and often fine-grained task dependencies (not restricted to bulk synchronous parallel).

Practical Requirements.

• R6: Transparent fault tolerance. Fault tolerance remains a key requirement for many deployment scenarios, and supporting it alongside high-throughput and non-deterministic tasks poses a challenge.

• R7: Debuggability and Profiling. Debugging and performance profiling are the most time-consuming aspects of writing any distributed application. This is especially true for ML and RL applications, which are often compute-intensive and stochastic.

Existing frameworks fall short of achieving one or more of these requirements (Section 5). We propose a flexible distributed programming model (Section 3.1) to enable R3-R5. In addition, we propose a system architecture to support this programming model and meet our performance requirements (R1-R2) without giving up key practical requirements (R6-R7). The proposed system architecture (Section 3.2) builds on two principal components: a logically-centralized control plane and a hybrid scheduler. The former enables stateless distributed components and lineage replay. The latter allocates resources in a bottom-up fashion, splitting locally-born work between node-level and cluster-level schedulers.

The result is millisecond-level performance on microbenchmarks and a 63x end-to-end speedup on a representative RL application over a bulk synchronous parallel (BSP) implementation.

2 Motivating Example

To motivate requirements R1-R7, consider a hypothetical application in which a physical robot attempts to achieve a goal in an unfamiliar real-world environment. Various sensors may fuse video and LIDAR input to build multiple candidate models of the robot’s environment (Fig. 2a). The robot is then controlled in real time using actions informed by a recurrent neural network (RNN) policy (Fig. 2c), as well as by Monte Carlo tree search (MCTS) and other online planning algorithms (Fig. 2b). Using a physics simulator along with the most recent environment models, MCTS tries millions of action sequences in parallel, adaptively exploring the most promising ones.

The Application Requirements. Enabling these kinds of applications involves simultaneously solving a number of challenges. In this example, the latency requirements (R1) are stringent, as the robot must be controlled in real time. High task throughput (R2) is needed to support the online simulations for MCTS as well as the streaming sensory input.

Task heterogeneity (R4) is present on many scales: some tasks run physics simulators, others process diverse data streams, and some compute actions using RNN-based policies. Even similar tasks may exhibit substantial variability in duration. For example, the RNN consists of different functions for each “layer”, each of which may require different amounts of computation. Or, in a task simulating the robot’s actions, the simulation length may depend on whether the robot achieves its goal or not.

In addition to the heterogeneity of tasks, the dependencies between tasks can be complex (R5, Figs. 2a and 2c) and difficult to express as batched BSP stages.

Dynamic construction of tasks and their dependencies (R3) is critical. Simulations will adaptively use the most recent environment models as they become available, and MCTS may choose to launch more tasks exploring particular subtrees, depending on how promising they are or how fast the computation is. Thus, the dataflow graph must be constructed dynamically in order to allow the algorithm to adapt to real-time constraints and opportunities.

3 Proposed Solution

In this section, we outline a proposal for a distributed execution framework and a programming model satisfying requirements R1-R7 for real-time ML applications.

3.1 API and Execution Model

In order to support the execution model requirements (R3-R5), we outline an API that allows arbitrary functions to be specified as remotely executable tasks, with dataflow dependencies between them.

1. Task creation is non-blocking. When a task is created, a future [4] representing the eventual return value of the task is returned immediately, and the task is executed asynchronously.

2. Any function invocation can be designated as a remote task, making it possible to support arbitrary execution kernels (R4). Task arguments can be either regular values or futures. When an argument is a future, the newly created task becomes dependent on the task that produces that future, enabling arbitrary DAG dependencies (R5).

3. Any task execution can create new tasks without blocking on their completion. Task throughput is therefore not limited by the bandwidth of any one worker (R2), and the computation graph is dynamically built (R3).

4. The actual return value of a task can be obtained by calling the get method on the corresponding future. This blocks until the task finishes executing.

5. The wait method takes a list of futures, a timeout, and a number of values. It returns the subset of futures whose tasks have completed when the timeout occurs or the requested number have completed.

The wait primitive allows developers to specify latency requirements (R1) with a timeout, accounting for arbitrarily sized tasks (R4). This is important for ML applications, in which a straggler task may produce negligible algorithmic improvement but block the entire computation. This primitive enhances our ability to dynamically
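The API above lends itself to a compact futures-based sketch. The following single-process Python sketch is illustrative only: the names remote, get, and wait, and the thread-pool backing, are our assumptions rather than the paper's concrete interface, and a real system would schedule tasks across a cluster rather than across local threads.

```python
# Single-process sketch of the proposed futures API (R3-R5).
# Futures are backed by a thread pool here; the proposed system
# would instead schedule tasks over distributed workers.
import time
from concurrent.futures import ThreadPoolExecutor, Future

_pool = ThreadPoolExecutor(max_workers=8)

def remote(fn, *args):
    """Launch fn as a task without blocking (item 1). Future arguments
    are resolved first, so the new task depends on the tasks that
    produce them, forming an arbitrary DAG (items 2-3, R5)."""
    def run():
        resolved = [a.result() if isinstance(a, Future) else a for a in args]
        return fn(*resolved)
    return _pool.submit(run)  # returns a future immediately

def get(future):
    """Block until the task finishes and return its value (item 4)."""
    return future.result()

def wait(futures, timeout, num_returns):
    """Return the completed subset once num_returns tasks are done or
    the timeout elapses, whichever comes first (item 5, R1)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if sum(f.done() for f in futures) >= num_returns:
            break
        time.sleep(0.001)
    return [f for f in futures if f.done()]

def square(x):
    return x * x

a = remote(square, 3)   # non-blocking task creation
b = remote(square, a)   # depends on a's future: 9 -> 81
print(get(b))           # blocks until b completes: 81
```

Note how the wait primitive lets a training loop proceed with whichever simulation results arrive within its latency budget, instead of blocking on a straggler.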
[Figure: proposed architecture — a logically-centralized control plane (task table, function table, event logs) shared by a global scheduler, per-node local schedulers, workers with shared memory, and tools such as a web UI and error diagnosis.]

the database is fault-tolerant, we can recover from component failures by simply restarting the failed components (R6). The database also makes it easy to write tools to profile and inspect the state of the system (R7).

To achieve the throughput requirement (R2), we shard the database. Since we require only exact matching operations and since the keys are computed as hashes, sharding is straightforward. Our early experiments show that this design enables sub-millisecond scheduling latencies (R1).
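The hash-based sharding described above can be sketched as follows. This is a minimal illustration under our own assumptions: the shard count, record contents, and function names are hypothetical, and a real deployment would place each shard on a separate server rather than in a local dict.

```python
# Sketch of exact-match, hash-sharded lookup for the control-plane
# database. Keys are hashes of identifiers, so a key's shard is simply
# its value modulo the number of shards; no range queries are needed.
import hashlib

NUM_SHARDS = 4  # hypothetical; the paper does not fix a shard count

# Stand-ins for independent database shards (e.g., separate servers).
shards = [dict() for _ in range(NUM_SHARDS)]

def key_hash(task_id: str) -> int:
    # Keys are computed as hashes of task/object identifiers.
    return int(hashlib.sha1(task_id.encode()).hexdigest(), 16)

def shard_for(key: int) -> dict:
    return shards[key % NUM_SHARDS]

def put(task_id: str, record: dict) -> None:
    key = key_hash(task_id)
    shard_for(key)[key] = record

def get(task_id: str) -> dict:
    key = key_hash(task_id)
    return shard_for(key)[key]

put("task-42", {"state": "RUNNING", "node": "n3"})
print(get("task-42")["state"])  # RUNNING
```

Because every operation touches exactly one shard, throughput scales with the number of shards, which is what makes the millions-of-tasks-per-second target (R2) plausible.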
[6] Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. In NIPS Workshop on Machine Learning Systems (LearningSys'16) (2016).

[7] Coates, A., Huval, B., Wang, T., Wu, D., Catanzaro, B., and Andrew, N. Deep learning with COTS HPC systems. In Proceedings of The 30th International Conference on Machine Learning (2013), pp. 1337–1345.

[8] Crankshaw, D., Bailis, P., Gonzalez, J. E., Li, H., Zhang, Z., Franklin, M. J., Ghodsi, A., and Jordan, M. I. The missing piece in complex analytics: Low latency, scalable model management and serving with Velox. arXiv preprint arXiv:1409.3809 (2014).

[9] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107–113.

[10] Duan, Y., Chen, X., Houthooft, R., Schulman, J., and Abbeel, P. Benchmarking deep reinforcement learning for continuous control. In Proceedings of the 33rd International Conference on Machine Learning (ICML) (2016).

[11] Gabriel, E., Fagg, G. E., Bosilca, G., Angskun, T., Dongarra, J. J., Squyres, J. M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R. H., Daniel, D. J., Graham, R. L., and Woodall, T. S. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings, 11th European PVM/MPI Users' Group Meeting (Budapest, Hungary, September 2004), pp. 97–104.

[17] Rocklin, M. Dask: Parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference (2015), K. Huff and J. Bergstra, Eds., pp. 130–136.

[18] Sanfilippo, S. Redis: An open source, in-memory data structure store. https://redis.io/, 2009.

[19] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.

[20] Wentzlaff, D., and Agarwal, A. Factored operating systems (fos): The case for a scalable operating system for multicores. SIGOPS Oper. Syst. Rev. 43, 2 (Apr. 2009), 76–85.

[21] Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., Ghodsi, A., Gonzalez, J., Shenker, S., and Stoica, I. Apache Spark: A unified engine for big data processing. Commun. ACM 59, 11 (Oct. 2016), 56–65.