
22 February 2023

© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Deep learning on AWS with NVIDIA:
From training to deployment

Michael Lang
Solutions Architect Manager – APAC South
NVIDIA

Agenda
• NVIDIA and AWS relationship
• NVIDIA AI on AWS
• ML model training (at scale)
• ML model deployment and inference
• Conclusion
• Next steps

NVIDIA AI
End-to-end open platform for production AI

Application workflows

• CLARA – Medical imaging
• RIVA – Speech AI
• TOKKIO – Customer service
• MERLIN – Recommenders
• MODULUS – Physics ML
• MAXINE – Video
• METROPOLIS – Video analytics
• CUOPT – Logistics
• NEMO – Conversational AI
• ISAAC – Robotics
• DRIVE – Autonomous vehicles
• MORPHEUS – Cybersecurity

NVIDIA AI Enterprise
• AI and data science development and deployment tools
• Cloud-native management and orchestration
• Infrastructure optimization

NVIDIA LaunchPad – Hands-on labs

Accelerated infrastructure: Cloud | Data center | Edge | Embedded

NVIDIA and AWS relationship

GPU power from the cloud to the edge

• Machine learning – ML training and cost-effective inference
• Virtual workstations – Work from anywhere
• High-performance compute – Solve large computational problems
• Internet of things – Extend AI/ML to edge devices that act locally

Powerful | Cost-Effective | Flexible

https://aws.amazon.com/nvidia/

GPU power from the cloud to the edge
• The highest-performance instance for ML training and HPC applications, powered by NVIDIA A100 GPUs
• High-performance instances for graphics-intensive applications and ML inference, powered by NVIDIA A10G GPUs
• The best price performance in Amazon EC2 for graphics workloads, powered by NVIDIA T4G GPUs

• Improve your operations with computer vision at the edge, powered by NVIDIA Jetson
• Spot defects with automated quality inspection, powered by NVIDIA Jetson
• NVIDIA GPU-optimized software, available for free on the NVIDIA NGC portal
• Deploy fast and scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker

NVIDIA AI on AWS

NVIDIA A100
High-performing AI supercomputing GPU

• 80 GB HBM2e – for the largest datasets and models
• 2 TB/s+ memory bandwidth – to feed an extremely fast GPU
• 3rd-gen Tensor Cores
• Multi-instance GPU (MIG)
• 3rd-gen NVLink

Powering Amazon EC2 P4d/P4de instances

NVIDIA H100 – Coming soon to AWS
The new engine of the world’s AI infrastructure

• Advanced chip
• Transformer engine
• 2nd-gen MIG
• Confidential computing
• 4th-gen NVLink
• DPX instructions

Powering the next generation of GPU systems on AWS

NVIDIA H100 supercharges large language models
Hopper architecture addresses LLM needs at scale

Supercharged LLM training | High-performance prompt learning | 30x real-time inference throughput

• Time-to-train by LLM size (70B–1,000B parameters): about 1 month on 4K A100s vs. about 1 week on 4K H100s for the largest models
• 530B P-tuning time-to-train: from days (DGX A100) to hours (DGX H100), a 5x speedup
• 530B inference on 10 DGX systems: 300 concurrent users on H100 vs. 10 on A100 (30x)

LLM training | 4,096 GPUs | H100 NDR IB | A100 HDR IB | 300 billion tokens
P-tuning | DGX H100 | DGX A100 | 530B Q&A tuning using SQuAD dataset
Inference | Chatbot | 10 DGX H100 NDR IB | 10 DGX A100 HDR IB | <1 second latency | 1 inference/second/user
H100 data center projected workload performance, subject to change

NGC
Portal to AI services, software, support
NGC catalog

• Cloud services – End-to-end AI development; AI services for NLP, biology, and speech; AI workflow management and support; multiple cloud providers
• Performance optimized – Tested across GPU-accelerated platforms; monthly SW container updates; SOTA models; up to 1.9x faster training on the same stack (May '21 to May '22)
• Fully transparent – Quickly find and deploy the right SW; detailed security scan reports; model resumes
• Accelerates development – Focus on building, not setup; one-click deploy from NGC; develop once, deploy anywhere with NVIDIA VMI

ngc.nvidia.com

Amazon EC2 instances powered by NVIDIA GPUs
Accessible via AWS, AWS Marketplace, and AWS services

NVIDIA GPU | AWS instance | GA | Use case recommendations | Regions | GPU memory | GPUs | On-demand price/hour
T4G | G5g | 11/2021 | Graphics workloads such as Android game streaming, ML inference, graphics rendering, and AV simulation | 5 | 16 GB | 1, 2 | $0.42
A10G | G5 | 11/2021 | Best performance for graphics, HPC, and cost-effective ML inference | 3 | 24 GB | 1, 4, 8 | $1.00
A100 | P4d, P4de | 11/2020 | Best performance; ML training, HPC across industries | 8 | 40, 80 GB | 8 | $32.77
V100 | P3, P3dn | 10/2017 | ML training, HPC across industries | 14+ | 16, 32 GB | 1, 4, 8 | $3.06–$31.21
T4 | G4 | 9/2019 | The universal GPU: ML inference, training, remote visualization workstations, rendering, video transcoding; includes Quadro Virtual Workstation | 20+ | 16 GB | 1, 4, 8 | $0.52–$7.82

EC2 G5g is now available in US East (N. Virginia), US West (Oregon), and Asia Pacific (Tokyo, Seoul, and Singapore) Regions; On-Demand, Reserved, and Spot pricing available
EC2 G5 is now available in US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions; On-Demand, Reserved, Spot, or as part of Savings Plans
EC2 P4d is now available in US East (N. Virginia and Ohio), US West (Oregon), Europe (Ireland and Frankfurt), and Asia Pacific (Tokyo and Seoul) Regions; On-Demand, Reserved, Spot, Dedicated Hosts, or Savings Plans availability
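As a rough illustration of the on-demand prices in the table above, the sketch below compares total job cost across two scenarios. The instance picks and job durations are invented examples, not benchmarks or sizing recommendations.

```python
# Back-of-envelope on-demand cost math using per-hour prices from the table
# above. Durations are made-up illustrative numbers.
def job_cost(price_per_hour: float, hours: float) -> float:
    """Cost of a single-instance on-demand job (ignores storage/data transfer)."""
    return price_per_hour * hours

scenarios = {
    "g5 (A10G), 40 h of batch inference": job_cost(1.00, 40),
    "p4d (8x A100), 12 h of training": job_cost(32.77, 12),
}
for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.2f}")
```

Real bills also depend on Region, Savings Plans, and Spot pricing, as the availability notes above indicate.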

Training computer vision and
conversational AI

Proliferation of use cases
• Healthcare – Patient monitoring, smart hospitals, robot-assisted surgery
• Industrial manufacturing – Automated optical inspection, worker safety, process automation
• Retail – Detecting people movement, analyzing action, warehouse logistics
• Smart infrastructure – Pedestrian safety, traffic management, waste management

Creating an AI application is hard and complex

INGEST DATA → PREP DATA → SELECT MODEL → TRAIN → VALIDATE → OPTIMIZE & EXPORT → DEPLOY → INTEGRATE IN APP → MONITOR

DATA PREPARATION – Labeling, annotating, and augmenting
TRAINING – Model training, pruning, and optimizing
DEPLOYMENT – Deploying and monitoring

Get started today with the TAO Toolkit: https://developer.nvidia.com/tao-toolkit-get-started
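The stage sequence above can be sketched as a simple pipeline over a shared state. Everything here (the state dict, the toy "model", the "endpoint" string) is a placeholder to show the shape of the workflow, not a TAO Toolkit API.

```python
# The workflow above as a chain of stages, each taking and returning state.
def ingest(data):
    return {"raw": data}

def prep(state):
    state["examples"] = [x.lower() for x in state["raw"]]  # toy "annotation"
    return state

def train(state):
    state["model"] = {"vocab": sorted(set(state["examples"]))}  # toy "model"
    return state

def validate(state):
    state["ok"] = len(state["model"]["vocab"]) > 0
    return state

def deploy(state):
    state["endpoint"] = "local://demo"  # placeholder deployment target
    return state

PIPELINE = [ingest, prep, train, validate, deploy]

def run(data):
    state = data
    for stage in PIPELINE:
        state = stage(state)
    return state

result = run(["Cat", "dog", "cat"])
print(result["model"]["vocab"])  # ['cat', 'dog']
```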

NVIDIA TAO Toolkit
Train, adapt, optimize
Create custom, production-ready AI models in hours rather than months

How can I run this?
• Containerized on Amazon EC2
• Bring-your-own-container on Amazon SageMaker

All available from the NGC catalog

• TRAIN EASILY – Fine-tune NVIDIA pretrained models with a fraction of the data
• CUSTOMIZE FASTER – Built on TensorFlow and PyTorch, abstracting away the AI framework complexity
• OPTIMIZE FOR DEPLOYMENT – Optimize for inference and integrate with Riva or DeepStream
• SUPPORTED BY EXPERTS – Supported by NVIDIA experts to help resolve issues from development to deployment

The NVIDIA TAO stack

High-performance pretrained vision AI models
Inference performance (FPS):

Model | Nano | Xavier NX | AGX Xavier | A30 | A100
PeopleNet | 11 | 296 | 462 | 4163 | 6001
PeopleSemSegNet | 1.4 | 17 | 28 | 330 | 519
TrafficCamNet | 18 | 340 | 656 | 4991 | 9520
FaceDetectIR | 104 | 2000 | 3915 | 26635 | 50541
LPD | 66 | 1158 | 1880 | 12207 | 21931
LPR | 94 | 564 | 1045 | 15960 | 26600
Facial Landmark | 5 | 48 | 84 | 1515 | 2686
Gaze Estimation | 125 | 747 | 1451 | 10078 | 23117
2D Pose Estimation | 98 | 923 | 1627 | 15172 | 26534

Accuracy: detection and segmentation models range from 56% to 98% (bar chart); Facial Landmark: 6.1-pixel landmark error; Gaze Estimation: 6.5 RMSE

15+ pretrained models – download for free from NGC

Pretrained conversational AI models

Jasper | QuartzNet | CitriNet | N-Gram | BERT Punctuation | BERT NER | BERT Text Classification | BERT Intent & Slot | Domain-Specific BERT & Megatron NER/QA | FastPitch | HiFi-GAN

• Support for models that are used in the conversational AI pipeline
• Adapt with your dataset using the NVIDIA TAO Toolkit
• Deploy with turnkey inference applications in NVIDIA Riva

https://developer.nvidia.com/blog/building-and-deploying-conversational-ai-models-using-nvidia-tao-toolkit/

Resources
Getting Started with the TAO Toolkit

• TAO Toolkit product page – All information related to product features and developer blogs
• TAO Toolkit getting started page – Detailed information on how to get started with the TAO Toolkit
• TAO Toolkit whitepaper – Includes examples on data augmentation and adding new classes

Developer resources
Computer vision
• TAO Toolkit computer vision models and container collection: download from NGC
• To deploy TAO Toolkit models using DeepStream, go to download resources
• Collection of Jupyter Notebooks and training specs for vision AI models
• 2D Pose Estimation Model with NVIDIA TAO Toolkit: Part 1 | Part 2
• Supercharge your AI workflow with TAO Toolkit whitepaper
• Train and deploy action recognition model

Conversational AI
• TAO Toolkit conversational AI models and container collection: download from NGC
• To deploy with Riva, go to download resources
• Building conversational AI models using the NVIDIA TAO Toolkit
• Get started with Jupyter Notebooks: Speech Recognition | Question Answering | Text Classification | Named Entity Recognition | Punctuation & Capitalization | Intent Detection & Slot Tagging

TAO TOOLKIT GETTING STARTED PAGE

Training large language models (at scale)

LLMs unlock new opportunities
LLMs transcend language and pattern matching

• TEXT GENERATION (GPT-3) – Summarization, marketing copy
• TRANSLATION (NLLB-200) – Translating Wikipedia, real-time metaverse translation
• IMAGE GENERATION (DALL-E-2) – Brand creation, gaming characters
• CODING (CODEX) – Dynamic code commenting, function generation
• LIFE SCIENCE (MegaMolBART) – Molecular representations, drug discovery

When large language models make sense

| | Traditional NLP approach | Large language models |
| Requires corpus of labeled data | Yes | No |
| Parameters | 100s of millions | Billions to trillions |
| Desired model capability | Specific (one model per task) | General (model can do many tasks) |
| Training frequency | Retrain frequently with task-specific training data | Never retrain, or retrain minimally |

• Painful and impractical to get large labeled datasets; LLMs can learn new tasks zero-shot (or few-shot)
• If you want models with "common sense" that can generalize well to new tasks
• A single model can serve all use cases
• At scale, you avoid the costs and complexity of many models, saving cost in data curation, training, and managing deployment

Training and deploying LLMs is not for the faint of heart
LLMs are challenging to build & Deploy

Unmet needs
• Large-scale data processing
• Multilingual data processing and training
• Finding optimal hyperparameters
• Convergence of models
• Scaling on clouds
• Deploying for inference
• Deployment at scale
• Evaluating models in industry-standard benchmarks
• Differing infrastructure setups
• Lack of knowledge

• Training and deploying models takes months to years
• Requires deep technical expertise
• Extensive compute resources on the scale of 1,000s of GPUs for training a 530B model over several months
• Tools to scale to 1,000s of GPUs are limited
• All leading to high financial investments, on the order of tens of millions of dollars for 175B+ models

NeMo Megatron
End-to-end framework for training and deploying large-scale language models with trillions of parameters

Model availability

NVIDIA-verified training recipes
• GPT-3: 126M, 5B, 20B, 40B, 175B
• T5: 220M, 3B, 11B, 23B, 41B
• mT5: 170M, 390M, 3B, 11B, 23B

NVIDIA publicly available model checkpoints
• T5: 3B
• GPT-3: 5B, 20B

Training and inference support for popular community pretrained models (coming in Q4 2022)

• Rapidly create and tune state-of-the-art custom language models
• Linear scaling to 1,000s of GPUs for up to trillion-parameter language models
• 30% speed-up in training using new sequence parallelism and selective activation recomputation techniques
• Distributed inference using Triton Inference Server
• Prompt learning capabilities with p-tuning and prompt tuning

Now in open beta – find out more: NVIDIA NeMo Megatron
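Why trillion-parameter models need scaling across thousands of GPUs follows from simple arithmetic: the weights alone exceed any single GPU's memory. The back-of-envelope sketch below counts FP16 weight storage only; real training needs several times more for optimizer state, gradients, and activations.

```python
# Minimum GPUs needed just to hold a model's weights (no optimizer state,
# gradients, or activations -- real training footprints are several times larger).
import math

def min_gpus_for_weights(n_params: float, bytes_per_param: int, gpu_mem_gb: int) -> int:
    total_gb = n_params * bytes_per_param / 1e9
    return math.ceil(total_gb / gpu_mem_gb)

# 1T parameters in FP16 (2 bytes each) on 80 GB A100s:
print(min_gpus_for_weights(1e12, 2, 80))  # 25
```

Even before activations and optimizer state, a 1T-parameter model's weights span dozens of 80 GB GPUs, which is what forces the tensor/pipeline parallelism NeMo Megatron provides.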

Solving pain points across the stack
NeMo Megatron simplifies the path to an LLM
Unmet needs → How we are helping

• Large-scale data processing → Data curation and preprocessing tools
• Multilingual data processing and training → Relative positional embedding (RPE) – multilingual support
• Finding optimal hyperparameters → Hyperparameter tool
• Convergence of models → Verified recipes for large GPT and T5-style models
• Scaling on clouds → Scripts/configs to run on AWS
• Deploying for inference → Model Navigator + export-to-FT functionality
• Deployment at scale → Quantization to accelerate inferencing
• Evaluating models in industry-standard benchmarks → Production evaluation harness
• Differing infrastructure setups → Full-stack support with FP8 and Hopper support
• Lack of knowledge → Documentation

NeMo Megatron
Value Proposition
• End-to-end – Bring your own data; train and deploy LLMs
• Performance at scale – SOTA training techniques
• Easy to use – Containerized framework and tools
• Fastest time to solution – SOTA performance
• Customization – Source-open approach
• Availability – Train on your choice of infrastructure
• Battle-hardened – Enterprise-grade framework with verified recipes that work OOTB (training and inference containers)

• NeMo Megatron is an end-to-end application framework for training and deploying LLMs with billions to trillions of parameters
• Turnkey containerized framework with recipes for training and deploying GPT-3 (up to 1T parameters), T5, and mT5 (up to 50B parameters) style models

Resources
GETTING STARTED
Register here for open beta
NVIDIA NeMo Megatron
NVIDIA brings large language AI models to enterprises worldwide | NVIDIA newsroom

DEV BLOGS
Adapting P-Tuning to solve non-English downstream tasks
NVIDIA AI platform delivers big gains for large language models
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, the world’s largest and most powerful generative language
model | NVIDIA developer blog

CUSTOMER STORIES
The King’s Swedish: AI rewrites the book in Scandinavia

Deployment and inference

AI inference workflow
Two-part process implemented by multiple personas

Personas: data scientist / ML engineer → MLOps, DevOps → app developer

Trained models → model optimization → model repo → inference serving → query/result → AI application

• Model optimization – Optimize trained models against multiple constraints for high-performance inference on GPU/CPU
• Inference serving – Scaled, multi-framework inference serving for high performance and utilization

Inference is complex
REAL TIME | COMPETING CONSTRAINTS | RAPID UPDATES

Large trained models → TensorRT (inference optimization) → Triton (inference serving) → low-latency inference for every framework

• FRAMEWORKS – Model architectures across frameworks
• CONSTRAINTS – Accuracy, memory, response time, throughput
• HARDWARE – Data center, Jetson, DRIVE

World-leading inference performance
TensorRT accelerates every workload

Best-in-class response time and throughput vs. CPUs:
• Computer vision – 36x, <7 ms
• Speech recognition – 583x, <100 ms
• NLP – 21x, <50 ms
• Reinforcement learning – 10x
• Text-to-speech – 178x, <100 ms
• Recommenders – 12x, <1 sec

NVIDIA TensorRT
SDK for High-Performance Deep Learning Inference​

Optimize and deploy neural networks in production

Maximize throughput for latency-critical applications with the TensorRT compiler and runtime; optimize every network, including CNNs, RNNs, and transformers:

1. Reduced/mixed precision: FP32, TF32, FP16, and INT8
2. Layer and tensor fusion: optimizes use of GPU memory bandwidth
3. Kernel auto-tuning: selects the best algorithm on the target GPU
4. Dynamic tensor memory: deploys memory-efficient applications
5. Multi-stream execution: scalable design to process multiple streams
6. Time fusion: optimizes RNNs over time steps

Trained DNN → TensorRT Optimizer → TensorRT Runtime
Targets: embedded (Jetson) | automotive (DRIVE) | data center GPUs
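Layer and tensor fusion (item 2 above) can be illustrated with toy arithmetic: folding a per-output scale (such as an inference-time batch norm) into the preceding linear layer, so one fused kernel does the work of two. This is conceptual 1-D math to show why fusion preserves results, not the TensorRT API.

```python
# Toy illustration of layer fusion: a linear layer followed by a per-output
# scale collapses into a single linear layer with rescaled weights and bias.
def linear(w, b, x):
    """y_i = sum_j w[i][j] * x[j] + b[i]"""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def fuse_scale(w, b, scale):
    """Return (fw, fb) such that linear(fw, fb, x) == scale * linear(w, b, x)."""
    fw = [[s * wij for wij in row] for row, s in zip(w, scale)]
    fb = [s * bi for bi, s in zip(b, scale)]
    return fw, fb

w, b = [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5]
s, x = [2.0, 10.0], [1.0, 1.0]
fw, fb = fuse_scale(w, b, s)
unfused = [si * yi for si, yi in zip(s, linear(w, b, x))]
print(linear(fw, fb, x) == unfused)  # True
```

The fused form reads and writes memory once instead of twice, which is the bandwidth saving the optimization targets.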

Download TensorRT today
TensorRT | Torch-TensorRT | TensorFlow-TensorRT

TensorRT 8.4 GA is available for free to members of the NVIDIA Developer Program: developer.nvidia.com/tensorrt

NVIDIA Triton Inference Server
Open-source software for fast, scalable, simplified inference serving

• Any framework – Natively supports multiple framework backends, e.g., TensorFlow, PyTorch, TensorRT, XGBoost, ONNX, Python, and more
• Any query type – Optimized for real-time, batch, streaming, and ensemble inferencing
• Any platform – x86 CPU | Arm CPU | NVIDIA GPUs | MIG; Linux | Windows | virtualization; public cloud, data center, and edge/embedded (Jetson); available across all major cloud AI platforms
• DevOps & MLOps – Integration with Kubernetes, KServe, Prometheus, and Grafana
• Performance & utilization – Model Analyzer for optimal configuration; optimized for high GPU/CPU utilization, high throughput, and low latency
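Triton serves models from an on-disk model repository: one directory per model, holding a config.pbtxt and numbered version subdirectories. A minimal sketch of that layout follows; the config fields shown are an illustrative subset, and the model/backend names are placeholders.

```python
# Build a Triton-style model repository layout on disk:
#   <repo>/<model-name>/config.pbtxt
#   <repo>/<model-name>/<version>/   (holds the model file, omitted here)
import tempfile
from pathlib import Path

def make_repo(root: Path, name: str, version: int = 1) -> Path:
    model_dir = root / name
    (model_dir / str(version)).mkdir(parents=True)  # version subdirectory
    (model_dir / "config.pbtxt").write_text(
        f'name: "{name}"\n'
        'platform: "onnxruntime_onnx"\n'
        "max_batch_size: 8\n"
    )
    return model_dir

root = Path(tempfile.mkdtemp())
repo = make_repo(root, "spellcheck")
print(sorted(p.name for p in repo.iterdir()))  # ['1', 'config.pbtxt']
```

Pointing the server at the repository root (`tritonserver --model-repository=<repo>`) is then enough for it to discover and load the models.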

Triton's architecture
Delivering high performance across frameworks

• Multiple client applications send queries via the standard Python/C++ client library over HTTP/gRPC, via an in-process API (directly integrated into the client app through the C or Java API), or via a custom client
• Per-model scheduler queues with dynamic batching (real time, batch, stream)
• Many active models; flexible model loading (all, selective) from the model repository
• Multiple GPU and CPU backends
• Metrics (utilization, throughput, latency) exported to Kubernetes/Prometheus
• Model analyzer and model orchestration

Runs on GPU and CPU

• Concurrent model execution – Increase throughput and utilization
• Dynamic batching scheduler – Group requests to form larger batches and increase GPU utilization
• Optimal model configuration – Using the Model Analyzer capability
• Large language model inference – Using Triton's FasterTransformer backend
• Model pipelines with business logic scripting – Control flow and loops in model ensembles
• Decoupled models – Allows 0, 1, or more responses per request

Triton Inference Server | NVIDIA Developer
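The dynamic batching idea can be sketched in a few lines: queued requests are grouped up to a size cap, with any remainder forming a final partial batch. Real Triton batching also honors a configurable queue delay and preferred batch sizes; this toy version only caps batch size.

```python
# Toy dynamic batching: drain a request queue into batches of at most
# max_batch_size, so the GPU sees fewer, larger invocations.
from collections import deque

def form_batches(queue: deque, max_batch_size: int):
    batches = []
    while queue:
        take = min(max_batch_size, len(queue))
        batches.append([queue.popleft() for _ in range(take)])
    return batches

requests = deque(f"req{i}" for i in range(7))
print([len(b) for b in form_batches(requests, max_batch_size=4)])  # [4, 3]
```

Batching seven single requests into two GPU calls instead of seven is where the throughput and utilization gains come from.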


Real-time spell check for product search
Amazon Search

• One of the most visited ecommerce websites
• Deep learning (DL) AI model for automatic spell correction to search effortlessly
• Triton + TensorRT meet the sub-50 ms latency target and deliver 5x throughput for the DL model on GPUs on AWS
• Triton Model Analyzer reduced the time to find the optimal configuration from weeks to hours

Workflow: optimize model with TensorRT → choose best config with Triton Model Analyzer → deploy with Triton Inference Server

https://aws.amazon.com/blogs/machine-learning/how-amazon-search-achieves-low-latency-high-throughput-t5-inference-with-nvidia-triton-on-aws/

Learn more and download
For more information
https://developer.nvidia.com/nvidia-triton-inference-server

Get the ready-to-deploy container with monthly updates from the NGC catalog
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver

Open-source GitHub repository


https://github.com/NVIDIA/triton-inference-server

Latest release information


https://github.com/triton-inference-server/server/releases

Quick start guide


https://github.com/triton-inference-server/server/blob/main/docs/getting_started/quickstart.md

Triton Inference Server on Amazon SageMaker

A Triton Inference Server container developed with NVIDIA – includes NVIDIA Triton Inference Server along with useful environment variables to tune performance (e.g., set thread count) on SageMaker

Use with SageMaker Python SDK to deploy your models on scalable, cost-effective
SageMaker endpoints without worrying about Docker

Code examples to find readily usable code samples using Triton Inference Server
with popular machine learning frameworks on Amazon SageMaker
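A sketch of the pieces such a deployment assembles: the Triton container image, a model artifact on S3, and the tuning environment variables mentioned above. The dict shape loosely follows boto3's create_model PrimaryContainer, the URIs are placeholders, and the environment variable names follow AWS's Triton-on-SageMaker examples (verify against current docs before use).

```python
# Assemble the container definition a SageMaker Triton deployment would use.
# No AWS calls are made here -- this only builds the spec dict.
def triton_model_spec(image_uri: str, model_data_url: str, default_model: str,
                      thread_count: int = 8) -> dict:
    return {
        "Image": image_uri,                # Triton serving container in ECR
        "ModelDataUrl": model_data_url,    # model repository tarball on S3
        "Environment": {
            # Env var names per AWS's Triton-on-SageMaker examples:
            "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": default_model,
            "SAGEMAKER_TRITON_THREAD_COUNT": str(thread_count),
        },
    }

spec = triton_model_spec(
    "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
    "s3://<bucket>/models/model.tar.gz",
    "spellcheck",
)
print(spec["Environment"]["SAGEMAKER_TRITON_DEFAULT_MODEL_NAME"])  # spellcheck
```

The SageMaker Python SDK wraps this same information, so you never hand-build the dict in practice; the sketch just shows what travels to the endpoint.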

Amazon SageMaker & Triton technical resources
Triton on Amazon SageMaker
Achieve hyperscale performance for model serving using NVIDIA Triton Inference Server on Amazon SageMaker
Amazon announces new NVIDIA Triton Inference Server on Amazon SageMaker
Deploy fast and scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker
Use Triton Inference Server with Amazon SageMaker
How Amazon Search achieves low-latency, high-throughput T5 inference with NVIDIA Triton on AWS
Getting the most out of NVIDIA T4 on AWS G4 Instances
Deploying the NVIDIA Triton Inference Server on Amazon ECS

AWS AI/ML Heroes collaboration


NVIDIA Triton spam detection engine of C-suite labs
Blurry faces: Training, optimizing and deploying a segmentation model on Amazon SageMaker with NVIDIA TensorRT
and NVIDIA Triton

Sign up for the free NVIDIA and AWS ML course
In this course, you will gain hands-on experience building, training, and deploying scalable machine learning models with Amazon SageMaker and Amazon EC2 instances powered by NVIDIA GPUs

Hands-on Machine Learning with AWS/NVIDIA | Coursera
https://www.coursera.org/learn/machine-learning-aws-nvidia

Free e-book: Dive into deep learning


https://d2l.ai

Recap and next steps

Recap and key takeaways
What did we learn today?

• NVIDIA GPUs power the most compute-intensive workloads, from computer vision to speech to language and many more
• NVIDIA TAO is a toolkit for training CV and speech models efficiently
• NVIDIA NeMo Megatron is an open-source toolkit for large language model training and deployment
• NVIDIA TensorRT is an SDK for optimizing deep learning models
• NVIDIA Triton is an inference server for deploying your models

Join the NVIDIA Inception program for startups
Accelerate your startup’s growth and build your solutions faster with engineering guidance, free
technical training, preferred pricing on NVIDIA products, opportunities for customer introductions
and co-marketing, and exposure to the VC community

APPLY TO INCEPTION TODAY


https://www.nvidia.com/en-us/startups

GET THE LATEST NEWS, UPDATES, AND MORE


https://www.nvidia.com/en-us/preferences/email-signup/

Thank you!
Michael Lang
[email protected]

