
CSE 234

Data Systems for Machine Learning

Arun Kumar

Topic 3: ML Deployment, MLOps, and LLMOps

Chapter 8.5 of MLSys book

1
ML Deployment in the Lifecycle

[Figure: ML lifecycle stages: Data acquisition → Data preparation → Feature Engineering → Model Selection → Training & Inference → Serving → Monitoring]

2
ML Deployment in the Lifecycle

3
Outline

❖ Offline ML Deployment
❖ MLOps:
❖ Online Prediction Serving
❖ Monitoring and Versioning
❖ Federated ML
❖ LLMOps

4
Offline ML Deployment

❖ Given: A trained prediction function f(); a set of (unlabeled) data examples
❖ Goal: Apply inference with f() to all examples efficiently
❖ Key metrics: Throughput, cost, latency
❖ Historically, offline was the most common scenario
❖ Still is at most enterprises, healthcare, academia
❖ Typically once a day / week / month / quarter!
❖ Aka model scoring in some settings

5
Offline ML Deployment: Systems

❖ Not particularly challenging in most applications


❖ All ML systems support offline batch inference by default

[Figure: tool logos by category: Disk-based files; Layered on RDBMS/Spark; General ML Libraries; Cloud-native; “AutoML” platforms; GBDT Systems; DL Systems; LLM Systems]
6
Offline ML Deployment: Optimizations

Q: What systems-level optimizations are possible here?

❖ Data Parallelism:
❖ Inference is embarrassingly parallel across examples
❖ Factorized ML (e.g., in Morpheus):
❖ Push ML computations down through joins
❖ Pre-computes some FLOPS and reuses across examples
x_i = [x_{i,R} ; x_{i,U} ; x_{i,M}]

Example: GLM inference:
w^T x_i = w_R^T x_{i,R} + w_U^T x_{i,U} + w_M^T x_{i,M}
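To make the pre-computation concrete, here is a minimal NumPy sketch of the idea (illustrative only, not Morpheus's actual API or data layout; all table names and sizes are hypothetical): the dot product against the dimension-table features is computed once per dimension tuple and reused across every fact-table example that joins to it.

```python
import numpy as np

# Illustrative sketch of factorized GLM scoring over a join (not the Morpheus API).
# Fact table S holds per-example features plus a foreign key into dimension table R.
n, d_s, d_r, n_r = 100_000, 5, 20, 100          # hypothetical table sizes
X_s = np.random.rand(n, d_s)                    # fact-table features
X_r = np.random.rand(n_r, d_r)                  # dimension-table features
fk = np.random.randint(0, n_r, size=n)          # each example's foreign key into R
w_s, w_r = np.random.rand(d_s), np.random.rand(d_r)

# Naive: materialize the join, then score (recomputes w_r . x_r for every example).
scores_naive = X_s @ w_s + X_r[fk] @ w_r

# Factorized: pre-compute w_r . x_r once per dimension tuple; reuse it via the key.
partial_r = X_r @ w_r                           # n_r dot products instead of n
scores_factorized = X_s @ w_s + partial_r[fk]

assert np.allclose(scores_naive, scores_factorized)
```

The saving grows with the fan-out of the join (here n / n_r), since the redundant FLOPS over repeated dimension tuples are skipped.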


7
Offline ML Deployment: Optimizations

Q: What systems-level optimizations are possible here?


❖ More general pre-computation / caching / batching:
❖ Factorized ML is a specific form of sharing/caching
❖ Other forms of “multi-query optimization” possible

Example: Batched inference for separate GLMs:


Three separate products X_{n×d} w_1, X w_2, X w_3 (each w_j is d×1)
vs. one product X [w_1 ; w_2 ; w_3]_{d×3}
Reduces memory stalls for X; raises hardware efficiency
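A minimal NumPy sketch of this multi-query batching (sizes hypothetical): stacking the three weight vectors turns three matrix-vector passes over X into a single matrix-matrix product, so X is read from memory once.

```python
import numpy as np

# Sketch of multi-query batching: score the same X with 3 separately trained GLMs.
n, d = 100_000, 50                              # hypothetical sizes
X = np.random.rand(n, d)
w1, w2, w3 = np.random.rand(d), np.random.rand(d), np.random.rand(d)

# Separate passes: X is streamed through memory three times (three GEMVs).
y_separate = np.stack([X @ w1, X @ w2, X @ w3], axis=1)

# Batched: stack the weights into a d x 3 matrix and read X once (one GEMM).
W = np.column_stack([w1, w2, w3])
y_batched = X @ W

assert np.allclose(y_separate, y_batched)
```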


8
Peer Instruction Activity

(Switch slides)

9
Outline

❖ Offline ML Deployment
❖ MLOps:
❖ Online Prediction Serving
❖ Monitoring and Versioning
❖ Federated ML
❖ LLMOps

10
Background: DevOps

❖ Software Development + IT Operations (DevOps) is a long-standing subarea of software engineering
❖ No uniform definition but loosely, the science + engineering of administering software in production
❖ Fuses many historically separate job roles
❖ Cloud and “Agile” s/w eng. have revolutionized DevOps

11
Background: DevOps

12
https://medium.com/swlh/how-to-become-an-devops-engineer-in-2020-80b8740d5a52
Key Parts of DevOps Stack/Practice

❖ Logging & Monitoring
❖ Continuous Integration (CI) & Continuous Delivery (CD)
❖ Building & Testing
❖ Version Control
❖ Infrastructure-as-Code (IaC), including Config. & Policy
❖ Microservices / Containerization & Orchestration

Content Credit: Manasi Vartak, Verta.AI


https://aws.amazon.com/devops/what-is-devops/ 13
The Rise of MLOps

❖ MLOps = DevOps for ML-infused software


❖ Much harder than for deterministic software!
❖ Things that matter beyond just ML model codes:
❖ Training and validation datasets
❖ Data cleaning/prep/featurization codes/scripts
❖ Hyperparameters, other training configs
❖ Post-inference rules/configs/ensembling
❖ Software versions/configs?
❖ Training hardware/configs?

Content Credit: Manasi Vartak, Verta.AI 14


The Rise of MLOps

❖ Need to change DevOps for ML program semantics


❖ Online Prediction Serving
❖ Logging & Monitoring:
❖ Prediction failures; concept drift; feature inflow changes
❖ Version Control:
❖ Anything can change: ML code, data, configs, etc.
❖ Build & Test; CI & CD:
❖ Rigorous train-val-test splits; beware insidious overfitting
❖ New space with a lot of R&D; no consensus on standards

Content Credit: Manasi Vartak, Verta.AI 15


The “3 Vs of MLOps”

❖ Velocity:
❖ Need for rapid experimentation, prototyping, and deployment
with minimal friction
❖ Validation:
❖ Need for checks on quality and integrity of data, features,
models, predictions
❖ Versioning:
❖ Need to keep track of deployed models and features to
ensure provenance and fallback options

16
https://arxiv.org/pdf/2209.09125.pdf
The “3 Vs of MLOps”

❖ Interplay/tussles between the 3 Vs shape decisions on tools, processes, and people management in MLOps
❖ Examples:
❖ Should Jupyter notebooks be deployed to production?
Velocity vs. Validation
❖ Are feature stores needed? Velocity vs. Versioning
❖ Relabel/augment val. data? Validation vs. Versioning

17
https://arxiv.org/pdf/2209.09125.pdf
Birds-eye View of MLOps

18
https://arxiv.org/pdf/2205.02302.pdf
Birds-eye View of MLOps

19
https://arxiv.org/pdf/2205.02302.pdf
Outline

❖ Offline ML Deployment
❖ MLOps:
❖ Online Prediction Serving
❖ Monitoring and Versioning
❖ Federated ML
❖ LLMOps

20
Online Prediction Serving

❖ Standard setting for Web and IoT deployments of ML


❖ Typically needs to be real-time; < 100s of milliseconds!
❖ AKA model serving or ML serving
❖ Given: A trained prediction function f() + a stream of unlabeled
data example(s)
❖ Goal: Apply f() to all/each example efficiently
❖ Key metrics: Latency, memory footprint, cost, throughput

21
Online Prediction Serving

❖ Surprisingly challenging to do well in ML systems practice!


❖ Active area of R&D; many startups
❖ Key Challenges:
❖ Heterogeneity of environments: webpages, cloud-based
apps, mobile apps, vehicles, IoT, etc.
❖ Unpredictability of load: need to elastically upscale or
downscale resources
❖ Function’s complexity: model, featurization and data prep
code, output thresholds, etc.
❖ May straddle libraries and even PLs!
❖ Hard to optimize end to end in general
22
The Rise of Serverless Infra.

❖ Prediction serving is a “killer app” for Function-as-a-Service (FaaS), AKA serverless cloud infra.
❖ Extreme pay-as-you-go; can rent at millisecond level!

❖ Still, many open efficiency issues for ML deployment:


❖ Reduce memory footprints, input access restrictions,
logging / output persistence restrictions, latency
23
Online Prediction Serving: Systems

❖ Numerous serving systems have sprung up

General-purpose (supports multiple ML tools):

ML System-specific: TF Serving, TorchServe
24
Clipper

❖ A pioneering general-purpose ML serving system

25
Clipper: Principles and Techniques

❖ Generality and modularity:


❖ One of the first to use containers for prediction serving
❖ Supports multiple ML tools in unified layered API
❖ Efficiency:
❖ Some basic optimizations: batching to raise throughput (see the sketch below); caching of frequently accessed models/vectors
❖ Multi-model deployment and flexibility:
❖ A heuristic “model selection” layer to dynamically pick among
multiple deployed models; ensembling
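As a rough illustration of the batching idea (a sketch only, not Clipper's actual code; the request queue, predict_batch, and the reply callback are hypothetical names): requests are buffered briefly and flushed to the model as one batch, trading a small amount of latency for higher throughput.

```python
import time
from queue import Queue, Empty

def batching_loop(requests: Queue, predict_batch, max_batch=32, max_wait_s=0.005):
    """Sketch of dynamic batching for online serving. `requests` holds dicts like
    {"input": ..., "reply": callback}; both names are hypothetical."""
    while True:
        batch = [requests.get()]                         # block until a request arrives
        deadline = time.time() + max_wait_s
        while len(batch) < max_batch:
            remaining = deadline - time.time()
            if remaining <= 0:
                break                                    # deadline hit: flush what we have
            try:
                batch.append(requests.get(timeout=remaining))
            except Empty:
                break
        outputs = predict_batch([r["input"] for r in batch])  # one batched inference call
        for req, out in zip(batch, outputs):
            req["reply"](out)                            # return each result to its caller
```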

26
TensorFlow Serving

❖ TF Serving is a mature ML serving system, also pioneering


❖ Optimized for TF model formats; also supports batching
❖ Dynamic reloading of weights; multiple data sources

❖ TF Lite and TF.JS optimized for more niche backends / runtime environments

27
vLLM: Overview
❖ Goal: Improve throughput of serving LLMs on GPUs
❖ Observation: Memory fragmentation due to dynamic memory
footprint of attention ops’ KV tensors wastes GPU memory
❖ Key Idea:
❖ Level of indirection akin to paging and virtual memory in OS
❖ Group attention ops’ KV tensors into a block per set of tokens
❖ Blocks need not be laid out contiguously in GPU memory
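A toy Python sketch of the block-table bookkeeping behind this key idea (illustrative only, not vLLM's implementation; all sizes are made up): each sequence maps its logical KV blocks to arbitrary physical blocks in a shared pool, so KV memory need not be reserved contiguously up front.

```python
import numpy as np

# Toy sketch of paged KV-cache bookkeeping (illustrative; not vLLM's implementation).
BLOCK_SIZE = 16                                  # tokens per KV block
NUM_BLOCKS = 1024                                # physical blocks in the shared GPU pool
HEAD_DIM = 64

kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, 2, HEAD_DIM), dtype=np.float16)
free_blocks = list(range(NUM_BLOCKS))            # simple allocator over physical blocks
block_tables = {}                                # seq_id -> list of physical block ids

def append_kv(seq_id, token_pos, k, v):
    """Write one token's K/V, allocating a new block on demand. Consecutive logical
    blocks of a sequence may land on scattered physical blocks."""
    table = block_tables.setdefault(seq_id, [])
    logical_block, offset = divmod(token_pos, BLOCK_SIZE)
    if logical_block == len(table):              # sequence grew past its last block
        table.append(free_blocks.pop())          # grab any free physical block
    phys = table[logical_block]
    kv_pool[phys, offset, 0] = k
    kv_pool[phys, offset, 1] = v

def gather_kv(seq_id, seq_len):
    """Gather a sequence's K/V for attention by following its block table."""
    table = block_tables[seq_id]
    flat = kv_pool[table].reshape(-1, 2, HEAD_DIM)
    return flat[:seq_len]
```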

28
vLLM: Techniques and Impact

❖ (Switch to Hao’s slide deck)

29
Your Reviews on vLLM Paper

❖ (Walked through in class)

30
Comparing ML Serving Systems

❖ Benefits of general-purpose vs. ML system-specific:


❖ Tool heterogeneity is a reality for many orgs
❖ More nimble to customize accuracy post-deployment with
different kinds of models/tools
❖ Flexibility to swap ML tools; no “tool lock-in”
❖ Benefits of ML system-specific vs. general-purpose:
❖ Generality may not be needed inside org. (e.g., Google);
lower complexity of MLOps
❖ Likely more amenable to code/pipeline optimizations
❖ Likely better hardware utilization, lower cloud costs

31
Outline

❖ Offline ML Deployment
❖ MLOps:
❖ Online Prediction Serving
❖ Monitoring and Versioning
❖ Federated ML
❖ LLMOps

32
Example for ML Monitoring: TFX

❖ TFX’s “Model Analysis” lets users specify metrics, track them over time automatically, and alert on-call staff
❖ Can specify metrics for feature-based data “slices” too
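A generic sketch of slice-wise metric tracking (not the TFX/TFMA API; column names are hypothetical): compute the metric per feature-based slice so a regression on one subgroup is not hidden by the overall average.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def auc_by_slice(df: pd.DataFrame, slice_col: str, label_col="label", score_col="score"):
    """Generic sketch of slice-wise monitoring: one metric row per data slice."""
    rows = []
    for slice_value, group in df.groupby(slice_col):
        if group[label_col].nunique() < 2:       # AUC is undefined on one-class slices
            continue
        rows.append({slice_col: slice_value,
                     "n": len(group),
                     "auc": roc_auc_score(group[label_col], group[score_col])})
    return pd.DataFrame(rows)

# Hypothetical usage: flag slices whose AUC falls below an alerting threshold.
# report = auc_by_slice(predictions_df, slice_col="country")
# bad_slices = report[report["auc"] < 0.7]
```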

33
https://www.tensorflow.org/tfx/guide/tfma
Example for ML Versioning: Verta

❖ Started with ModelDB for storing and tracking ML artifacts


❖ ML code; data; configuration; environment
❖ APIs as hooks into ML dev code; SDK and web app./GUI
❖ Registry for versions and workflows

34
https://blog.verta.ai/blog/the-third-wave-of-operationalization-is-here-mlops
Open Research Questions in MLOps

❖ Efficient and consistent version control for ML datasets and featurization codes
❖ Automate prediction failure detection and recovery
❖ Detect concept drift in an actionable manner; prescribe fixes
❖ Velocity and complexity of streaming ML applications
❖ CI & CD for model ensembles without insidious overfitting
❖ Automated end-to-end optimizations
❖ …

35
Outline

❖ Offline ML Deployment
❖ MLOps:
❖ Online Prediction Serving
❖ Monitoring and Versioning
❖ Federated ML
❖ LLMOps

36
Federated ML

❖ Pioneered by Google for ML/AI applications on smartphones


❖ Key benefit is more user privacy:
❖ User’s (labeled) data does not leave their device
❖ Decentralizes ML model training/finetuning to user data

https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
37
https://mlsys.org/Conferences/2019/doc/2019/193.pdf
Federated ML

❖ Key challenge: Decentralize SGD to intermittent updates


❖ They proposed a simple “federated averaging” algorithm

❖ User-partitioned updates break the IID assumption; skews arise


❖ Turns out SGD is still pretty robust (recall async. PS); open
theoretical questions still being studied
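A minimal sketch of the federated averaging idea for a toy least-squares model (illustrative only, not Google's implementation): each client takes local gradient steps on its own data, and the server averages the returned models weighted by each client's number of examples.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=1):
    """One client's local update for a toy least-squares model (runs on-device;
    the labeled data never leaves the client)."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_averaging(w_global, client_data, rounds=10):
    """Sketch of federated averaging: clients train locally, the server averages
    the returned models weighted by local data size."""
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in client_data:                 # in practice, a sampled subset of devices
            updates.append(local_sgd(w_global.copy(), X, y))
            sizes.append(len(y))
        weights = np.array(sizes) / sum(sizes)
        w_global = sum(wt * upd for wt, upd in zip(weights, updates))
    return w_global
```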

38
https://arxiv.org/abs/1602.05629
Federated ML

❖ Privacy/security-focused improvements:
❖ New SGD variants; integration with differential privacy
❖ Cryptography to anonymize update aggregations
❖ Apart from strong user privacy, communication and energy
efficiency also major concerns on battery-powered devices
❖ Systems+ML heuristic optimizations:
❖ Compression and quantization to save upload bandwidth
❖ Communicate only high quality model updates
❖ Novel federation-aware ML algorithmics

https://arxiv.org/abs/1602.05629
https://arxiv.org/pdf/1610.02527.pdf
39
https://eprint.iacr.org/2017/281.pdf
Federated ML

❖ The federated ML protocol has become quite sophisticated to ensure better stability/reliability, accuracy, and manageability

40
https://mlsys.org/Conferences/2019/doc/2019/193.pdf
Federated ML

❖ Google has neatly abstracted the client-side (embedded in the mobile app.) and server-side functionality with an actor design

41
https://mlsys.org/Conferences/2019/doc/2019/193.pdf
Federated ML

❖ Notion of “FL Plan” and simulation-based tooling for data scientists to tailor ML for this deployment regime
❖ (Users’) Training data is out of reach!
❖ Model is updated asynchronously automatically
❖ Debugging and versioning become even more difficult

42
https://mlsys.org/Conferences/2019/doc/2019/193.pdf
Outline

❖ Offline ML Deployment
❖ MLOps:
❖ Online Prediction Serving
❖ Monitoring and Versioning
❖ Federated ML
❖ LLMOps

43
LLMOps: Birds-Eye View

44
https://www.databricks.com/glossary/llmops
LLMOps: Emerging Stack

45
https://a16z.com/emerging-architectures-for-llm-applications/
LLMOps: Principles and Practices

❖ Three main groups of new technical concerns in LLMOps:


❖ Managing Data Ingestion
❖ Chunking of text, embedding creation/indexing/maintenance,
Retrieval-Augmented Generation (RAG)
❖ Managing LLM API Usage
❖ Prompt engineering / management, application abstractions,
caching, logging, validation/“guardrails”
❖ Customizing the LLM
❖ Finetuning, transfer learning, routing layers

46
LLMOps: Managing Data Ingestion

❖ LLM applications need to handle large multimodal corpora: text, PDFs, JSON, images, etc.
❖ 3 key sub-parts of handling such data:
❖ Chunking: Partition docs, pages, etc. into bite-sized pieces
❖ Embedding: Generate embeddings for chunks (using LLMs)
❖ Vector DBMS: Store; index; retrieve for queries; maintain
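A minimal end-to-end sketch of these three sub-parts (illustrative only; embed() is a stand-in for a real embedding-model call, and the brute-force store stands in for a real vector DBMS):

```python
import numpy as np

def chunk(text, size=500, overlap=50):
    """Chunking: split a document into overlapping, bite-sized pieces."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts):
    """Embedding: stand-in for an embedding-model/LLM call (random vectors here)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384)).astype(np.float32)

class TinyVectorStore:
    """Vector DBMS stand-in: store chunk vectors; retrieve nearest chunks by cosine."""
    def __init__(self):
        self.vecs, self.chunks = [], []
    def add(self, vectors, chunks):
        self.vecs.extend(vectors)
        self.chunks.extend(chunks)
    def query(self, qvec, k=3):
        M = np.stack(self.vecs)
        sims = (M @ qvec) / (np.linalg.norm(M, axis=1) * np.linalg.norm(qvec) + 1e-9)
        return [self.chunks[i] for i in np.argsort(-sims)[:k]]

# Ingest: chunk -> embed -> store; at query time, retrieve top-k chunks (e.g., for RAG).
store = TinyVectorStore()
pieces = chunk("some long document text " * 200)
store.add(embed(pieces), pieces)
top_chunks = store.query(embed(["what does the document say about X?"])[0], k=3)
```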

47
LLMOps: Managing Data Ingestion

48
https://www.linkedin.com/pulse/3-ways-vector-databases-take-your-llm-use-cases-next-level-mishra
LLMOps: Prompt Engineering

Q: What is a “prompt”?

❖ A prompt is an input to an LLM API with some of these elements:
❖ Instruction: A specific task or
instruction for the model to do
❖ Context: External information or
additional context that can steer the
model to better responses
❖ Input Data: The input or question we
need a response for
❖ Output Indicator: The type or format
of the output
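A small, hypothetical example that puts the four elements together in one prompt string:

```python
# Hypothetical prompt illustrating the four elements above.
prompt = """Instruction: Classify the sentiment of the review as positive, negative, or neutral.

Context: Reviews come from a consumer electronics store; sarcasm is common.

Input: "The battery died after two days. Fantastic."

Output format: Reply with exactly one word: positive, negative, or neutral."""
```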
49
https://www.promptingguide.ai/introduction/elements
LLMOps: Prompt Engineering

Q: What is “prompt engineering”?

50
LLMOps: Prompt Engineering

Q: What is “prompt engineering”?

❖ A set of “best practices” (witchcraft?) and “guidelines” (spellbook?) to craft prompts (spells?) for more effective use of LLMs
❖ Tricks and techniques abound, e.g., “chain of thought”, “self-consistency”, “few-shot”, etc.; take a generative NLP course
❖ Some useful practical references:

https://www.promptingguide.ai/techniques
https://platform.openai.com/docs/guides/prompt-engineering/six-strategies-for-getting-better-results
https://aws.amazon.com/what-is/prompt-engineering/

51
Emerging LLMOps-Specific Tools

❖ Prompt management and data embedding management are increasingly critical for LLM applications
❖ Tools for LLM appl. dev are in flux; 2 recent popular examples:
❖ LangChain:
❖ Enables more structured compositions of LLM API invocations
called “chains” (akin to “transactions” in RDBMS applications)
❖ Stitch together more holistic “agents” for an application
❖ LlamaIndex:
❖ Mainly an aid for Retrieval-Augmented Generation and
similarity search
❖ APIs to parse, chunk, store, index, and query pieces
52
53
RAG and LlamaIndex

54
Regular MLOps vs. LLMOps

❖ Center of the universe? In-house trained model artifact vs. API-based access to a pre-trained LLM
❖ Query input? Feature vector vs. rich customizable prompts
❖ Typical prediction targets? Discriminative/inferential targets vs.
generative content
❖ Tuning? Model selection aspects (hyper-par. tuning, arch.
tuning, feature eng., etc.) vs. in-context learning / prompt tuning
❖ Auxiliary data stores? Feature stores/ML platforms vs. Vector
DBMSs
❖ Cost/latency optimizations? In-house model/pipeline/resource
optimizations vs. Fixed LLM operating points or re-routing

55
Review Questions
1. Briefly explain 2 reasons why online prediction serving is typically
more challenging in practice than offline deployment.
2. Briefly describe 1 systems optimization performed by both Clipper
and vLLM for prediction serving.
3. Briefly discuss 1 systems-level optimization amenable to both
offline ML deployment and online prediction serving.
4. Name 3 things that must be versioned for rigorous version control
in MLOps.
5. Briefly explain 2 reasons why ML monitoring is needed.
6. Briefly explain 2 reasons why federated ML is more challenging for
data scientists to reason about.
7. Briefly explain 1 way LLMOps deviates from regular MLOps.
8. Briefly explain 1 new technical concern in LLMOps that required
new tool development. 56
