© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Deep learning on AWS with NVIDIA:
From training to deployment
Michael Lang
Solutions Architect Manager – APAC South
NVIDIA
Agenda
• NVIDIA and AWS relationship
• NVIDIA AI on AWS
• ML model training (at scale)
• ML model deployment and inference
• Conclusion
• Next steps
NVIDIA AI
End-to-end open platform for production AI
Application workflows
• CLARA – medical imaging
• RIVA – speech AI
• TOKKIO – customer service
• MERLIN – recommenders
• MODULUS – physics ML
• MAXINE – video AI
• METROPOLIS – video analytics
• CUOPT – logistics
• NEMO – conversational AI
• ISAAC – robotics
• DRIVE – autonomous vehicles
• MORPHEUS – cybersecurity
Infrastructure optimization • Accelerated infrastructure • Hands-on labs
NVIDIA and AWS relationship
GPU power from the cloud to the edge
https://aws.amazon.com/nvidia/
GPU power from the cloud to the edge
• The highest-performance instances for ML training and HPC applications, powered by NVIDIA A100 GPUs
• Improve your operations with computer vision at the edge, powered by NVIDIA Jetson
NVIDIA AI on AWS
NVIDIA A100
The high-performing AI supercomputing GPU
• 80 GB HBM2e – for the largest datasets and models
• 2+ TB/s memory bandwidth – to feed the extremely fast GPU
NVIDIA H100 – Coming soon to AWS
The new engine of the world’s AI infrastructure
Powering the next generation of GPU systems on AWS: confidential computing, 4th-gen NVLink, DPX instructions
NVIDIA H100 supercharges large language models
Hopper architecture addresses LLM needs at scale
[Benchmark charts, H100 vs. A100 – lower is better for training time:]
• Supercharged LLM training – time-to-train by LLM size (70B, 175B, 530B, and 1,000B parameters): 4,096 H100s vs. 4,096 A100s, roughly 5x faster
• High-performance prompt learning – 530B P-tuning time-to-train: days on DGX A100 vs. hours on DGX H100
• 30x real-time inference throughput – 530B inference on 10 DGX systems: ~300 concurrent users on H100 vs. ~10 on A100
LLM Training | 4,096 GPUs | H100 NDR IB | A100 HDR IB | 300 billion tokens
P-tuning | DGX H100 | DGX A100 | 530B Q&A tuning using SQuAD dataset
Inference | Chatbot | 10 DGX H100 NDR IB | 10 DGX A100 HDR IB | <1 second latency | 1 inference/second/user
H100 data center projected workload performance, subject to change
NGC
Portal to AI services, software, support
NGC catalog
[Chart: training speedup from monthly container updates, roughly 0.9x to 1.4x, May '21 – May '22]
• AI services for NLP, biology, speech
• Monthly SW container updates
• Detailed security scan reports
• One-click deploy from NGC
• Multiple cloud providers
• AI workflow management & support
• SOTA models
• Model resumes
• Develop once; deploy anywhere with NVIDIA VMI
ngc.nvidia.com
Amazon EC2 instances powered by NVIDIA GPUs
Accessible via AWS, AWS Marketplace, and AWS services
GPU  | Instances | Launched | Use case                                             | Regions | GPU memory | GPUs/instance | On-Demand price/hr
A100 | P4d, P4de | 11/2020  | Best performance; ML training, HPC across industries | 8       | 40, 80 GB  | 8             | $32.77
V100 | P3, P3dn  | 10/2017  | ML training, HPC across industries                   | 14+     | 16, 32 GB  | 1, 4, 8       | $3.06–$31.21
EC2 G5g is now available in US East (N. Virginia), US West (Oregon), and Asia Pacific (Tokyo, Seoul, and Singapore) Regions; On-Demand, Reserved, and Spot pricing available
EC2 G5 is now available in US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions; On-Demand, Reserved, Spot, or as part of Savings Plans
EC2 P4d is now available in US East (N. Virginia and Ohio), US West (Oregon), Europe (Ireland and Frankfurt), and Asia Pacific (Tokyo and Seoul) Regions; On-Demand, Reserved, Spot, Dedicated Hosts, or Savings Plans availability
Training computer vision and
conversational AI
Proliferation of use cases
Healthcare: patient monitoring, smart hospitals, robot-assisted surgery
Industrial manufacturing: automated optical inspection, worker safety, process automation
Creating an AI application is hard and complex
Ingest data → prep data → select model → train → validate → optimize & export → deploy → integrate in app → monitor
Labeling, annotating, and augmenting → model training, pruning, and optimizing → deploying and monitoring
NVIDIA TAO Toolkit
Train, adapt, optimize
Create custom, production-ready AI models in hours rather than months
The NVIDIA TAO stack
High-performance pretrained vision AI models
[Chart: accuracy (%) for the pretrained models, ranging roughly from 56% to 98%; Facial Landmark: 6.1-pixel landmark error; Gaze Estimation: 6.5 RMSE]

Inference performance (FPS):
Model: PeopleNet | PeopleSemSegNet | TrafficCamNet | FaceDetectIR | LPD    | LPR    | Facial Landmark | Gaze Estimation | 2D Pose Estimation
A30:   4,163     | 330             | 4,991         | 26,635       | 12,207 | 15,960 | 1,515           | 10,078          | 15,172
A100:  6,001     | 519             | 9,520         | 50,541       | 21,931 | 26,600 | 2,686           | 23,117          | 26,534
Pretrained conversational AI models
Models: Jasper, QuartzNet, BERT, NER, BERT Punctuation, FastPitch, HiFi-GAN
• Support for models that are used in the conversational AI pipeline
• Adapt with your dataset using NVIDIA TAO Toolkit
• Deploy with turnkey inference applications in NVIDIA Riva
https://developer.nvidia.com/blog/building-and-deploying-conversational-ai-models-using-nvidia-tao-toolkit/
Resources
Getting Started with the TAO Toolkit
• TAO Toolkit product page – all information related to product features and developer blogs
• TAO Toolkit getting started page – detailed information on how to get started with the TAO Toolkit
• TAO Toolkit whitepaper – includes examples on data augmentation and adding new classes
Developer resources
Computer vision
• TAO Toolkit computer vision models and container collection: download from NGC
• Collection of Jupyter notebooks and training specs for vision AI models
• 2D Pose Estimation Model with NVIDIA TAO Toolkit: Part 1 | Part 2
• Supercharge your AI workflow with TAO Toolkit whitepaper
Conversational AI
• TAO Toolkit conversational AI models and container collection: download from NGC
Training (at scale)
large language models
LLMs unlock new opportunities
LLMs transcend language and pattern matching
• GPT-3 – marketing copy
• NLLB-200 – real-time translation
• DALL-E-2 – metaverse and gaming characters
• CODEX – function generation
• MegaMolBART – drug discovery
When large language models make sense
Training and deploying LLMs is not for the faint of heart
LLMs are challenging to build and deploy
Unmet needs: lack of knowledge
NeMo Megatron
End-to-end framework for training and deploying large-scale language models with trillions of parameters
Model availability
Solving pain points across the stack
NeMo Megatron simplifies the path to an LLM
Unmet needs How we are helping
Multilingual data processing and training Relative positional embedding (RPE) – multilingual support
Convergence of models Verified recipes for large GPT and T5-style models
Differing infrastructure setups Full-stack support with FP8 and Hopper support
NeMo Megatron
Value Proposition
• End-to-end: bring your own data, train and deploy LLMs
• Performance at scale: SOTA training techniques
• Easy to use: containerized framework
• Fastest time to solution: tools and SOTA performance
• NeMo Megatron is an end-to-end application framework for training and deploying LLMs with billions to trillions of parameters
Resources
GETTING STARTED
Register here for open beta
NVIDIA NeMo Megatron
NVIDIA brings large language AI Models to enterprises worldwide | NVIDIA newsroom
DEV BLOGS
Adapting P-Tuning to solve non-English downstream tasks
NVIDIA AI platform delivers big gains for large language models
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, the world’s largest and most powerful generative language model | NVIDIA developer blog
CUSTOMER STORIES
The King’s Swedish: AI rewrites the book in Scandinavia
Deployment and inference
AI inference workflow
Two-part process implemented by multiple personas
• Model optimization – data scientist, ML engineer
• Inference serving – ML engineer, MLOps, DevOps
• AI application – app developer (sends the query, receives the result)
Inference is complex
REAL TIME | COMPETING CONSTRAINTS | RAPID UPDATES
[Diagram: query → model optimization with TensorRT → inference serving with Triton → result; competing constraints include model architectures and throughput]
World-leading inference performance
TensorRT accelerates every workload
NVIDIA TensorRT
SDK for High-Performance Deep Learning Inference
Download TensorRT today
TensorFlow with TensorRT (TF-TRT)
TensorRT 8.4 GA is available for free to the members of the NVIDIA Developer Program: developer.nvidia.com/tensorrt
NVIDIA Triton Inference Server
Open-source software for fast, scalable, simplified inference serving
• Supports multiple framework backends natively, e.g., TensorFlow, PyTorch, TensorRT, XGBoost, ONNX, Python & more
• Optimized for real-time, batch, streaming, and ensemble inferencing
• Runs on x86 CPU, Arm CPU, NVIDIA GPUs, and MIG; Linux, Windows, and virtualized environments; public cloud, data center, and edge/embedded (Jetson); available across all major cloud AI platforms
• Integrates with Kubernetes, KServe, Prometheus & Grafana
• Model Analyzer for optimal configuration; optimized for high GPU/CPU utilization, high throughput & low latency
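Triton serves models from a file-system model repository. A minimal sketch of one, assuming an ONNX classifier (the model name, backend, and batch settings below are illustrative, not from this deck):

```
# Layout of a minimal Triton model repository (names illustrative):
#
#   model_repository/
#   └── resnet50/
#       ├── config.pbtxt      <- model configuration (below)
#       └── 1/                <- version directory
#           └── model.onnx
#
# config.pbtxt for the hypothetical resnet50 model:
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
dynamic_batching { }
```

The server is then pointed at the repository, e.g. `tritonserver --model-repository=/path/to/model_repository`.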
Triton’s architecture
Delivering high performance across frameworks
Concurrent model execution
INCREASE THROUGHPUT AND UTILIZATION
Decoupled models
ALLOWS ZERO, ONE, OR MANY RESPONSES PER REQUEST
https://aws.amazon.com/blogs/machine-learning/how-amazon-search-achieves-low-latency-high-throughput-t5-inference-with-nvidia-triton-on-aws/
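Both features are switched on in a model's config.pbtxt. A minimal sketch, where the instance count and GPU index are illustrative:

```
# Concurrent model execution: run two instances of this model on GPU 0,
# so multiple requests can be served in parallel
instance_group [
  { count: 2, kind: KIND_GPU, gpus: [ 0 ] }
]

# Decoupled mode: lets the backend return zero, one, or many
# responses per request (requires a backend that supports it)
model_transaction_policy { decoupled: true }
```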
Learn more and download
For more information
https://developer.nvidia.com/nvidia-triton-inference-server
Get the ready-to-deploy container with monthly updates from the NGC catalog
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
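Triton's HTTP endpoint implements the KServe v2 predict protocol. A minimal sketch of building a request body in Python — the model input name, shape, and data here are hypothetical, and no server is contacted:

```python
import json

def build_infer_request(model_inputs):
    """Build a KServe v2 inference request body for Triton's HTTP endpoint.

    model_inputs: list of (name, shape, datatype, flat_data) tuples.
    """
    return {
        "inputs": [
            {"name": n, "shape": list(s), "datatype": dt, "data": list(d)}
            for n, s, dt, d in model_inputs
        ]
    }

# Hypothetical model with one FP32 input of shape [1, 4]
body = build_infer_request([("INPUT0", (1, 4), "FP32", [0.1, 0.2, 0.3, 0.4])])
payload = json.dumps(body)
# POST payload to http://<host>:8000/v2/models/<model_name>/infer
```

In practice the `tritonclient` Python package wraps this protocol, but the raw JSON shape is useful to know when debugging.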
Triton Inference Server on Amazon SageMaker
A Triton Inference Server container developed with NVIDIA, which includes NVIDIA Triton Inference Server along with useful environment variables (e.g., thread count) for tuning performance on SageMaker
Use with SageMaker Python SDK to deploy your models on scalable, cost-effective
SageMaker endpoints without worrying about Docker
Code examples provide readily usable samples of Triton Inference Server with popular machine learning frameworks on Amazon SageMaker
Amazon SageMaker & Triton technical resources
Triton on Amazon SageMaker
Achieve hyperscale performance for model serving using NVIDIA Triton Inference Server on Amazon SageMaker
Amazon announces new NVIDIA Triton Inference Server on Amazon SageMaker
Deploy fast and scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker
Use Triton Inference Server with Amazon SageMaker
How Amazon Search achieves low-latency, high-throughput T5 inference with NVIDIA Triton on AWS
Getting the most out of NVIDIA T4 on AWS G4 Instances
Deploying the NVIDIA Triton Inference Server on Amazon ECS
Sign up for the free NVIDIA and AWS ML course
In this course, you will gain hands-on experience building, training, and deploying scalable machine learning models with Amazon SageMaker and Amazon EC2 instances powered by NVIDIA GPUs.
Recap and next steps
Recap and key takeaways
What did we learn today?
Join the NVIDIA Inception program for startups
Accelerate your startup’s growth and build your solutions faster with engineering guidance, free
technical training, preferred pricing on NVIDIA products, opportunities for customer introductions
and co-marketing, and exposure to the VC community
Thank you!
Michael Lang
[email protected]