Module 3
Henry Axelrod
Principal WW Data & AI PSA
AWS Inferentia
AWS Inferentia2
Inf2, powered by Inferentia2
[Diagram: Inferentia2 device architecture — the chip attaches to the host over PCIe and pairs DMA engines and collective-communication hardware with HBM and two NeuronCore-v2 cores; each NeuronCore-v2 combines on-chip SRAM memory with a Tensor Engine and a GPSIMD Engine, and chips interconnect over NeuronLink-v2.]
NeuronCore-v2: 2.3 PFLOPS BF16/FP16, 4.6 petaOPS INT8
NeuronLink-v2
Supports PyTorch & TensorFlow
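Since the Neuron stack supports PyTorch, here is a minimal sketch of compiling a model for Inferentia2 with the torch-neuronx tracer (the BERT model choice, sequence length, and output path are illustrative assumptions, not from the slide):

```python
import torch
import torch_neuronx
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A small Hugging Face model as an illustrative workload (placeholder choice).
# torchscript=True makes the model return plain tuples, which tracing needs.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", torchscript=True
)
model.eval()

# Example inputs fix the static shapes the Neuron compiler specializes for.
enc = tokenizer("Compile me for Inferentia2", return_tensors="pt",
                padding="max_length", max_length=128)
example = (enc["input_ids"], enc["attention_mask"])

# Ahead-of-time compilation targeting NeuronCore-v2.
neuron_model = torch_neuronx.trace(model, example)

# The compiled artifact can be saved and reloaded on an Inf2 instance.
torch.jit.save(neuron_model, "bert_neuron.pt")
```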
Amazon EC2 Inf2 instances
IN PREVIEW: THE MOST COST-EFFICIENT DL INFERENCE INSTANCE
[Table: Inf2 instance sizes — columns: Instance size | vCPUs | Inferentia2 chips | Accelerator memory | NeuronLink | Instance memory | Instance networking]
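These specs can also be pulled live from the EC2 DescribeInstanceTypes API; a minimal boto3 sketch (the Region is an illustrative choice, and the InferenceAcceleratorInfo response layout is an assumption worth verifying against your SDK version):

```python
import boto3

# List Inf2 instance sizes with vCPU, memory, and accelerator counts.
# Region is an illustrative choice; Inf2 availability varies by Region.
ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_instance_types(
    Filters=[{"Name": "instance-type", "Values": ["inf2.*"]}]
)
for it in sorted(resp["InstanceTypes"],
                 key=lambda t: t["VCpuInfo"]["DefaultVCpus"]):
    # InferenceAcceleratorInfo describes the Inferentia2 devices (assumed layout).
    accel = it.get("InferenceAcceleratorInfo", {}).get("Accelerators", [{}])[0]
    print(f'{it["InstanceType"]}: '
          f'{it["VCpuInfo"]["DefaultVCpus"]} vCPUs, '
          f'{it["MemoryInfo"]["SizeInMiB"] // 1024} GiB memory, '
          f'{accel.get("Count", "?")} x {accel.get("Name", "Inferentia2")}')
```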
Deploy Llama 2 with up to 4.7x lower cost with AWS Inferentia2
[Chart: Average Higher Throughput (Tokens/Sec) — Llama 2 7B: Amazon Inf2 395.1 vs. comparable instance* 131.3 (3x); Llama 2 13B: Amazon Inf2 331.9 vs. comparable instance* 89.5 (3.7x); Llama 2 70B: Amazon Inf2 139.7 vs. comparable instance* OOM]
Achieve up to 3.7x higher inference throughput on Llama 2 models
[Chart: Average Lower Per Token Latency (ms) — Amazon Inf2 vs. comparable instance* across Llama 2 7B, 13B, and 70B (e.g., 33.7 ms vs. 66.2 ms); the comparable instance runs out of memory (OOM) on 70B]
Up to 4.7x lower cost with Inf2 for Llama 2 models
* Comparable inference-optimized Amazon EC2 instance
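To make the Llama 2 deployment concrete, here is a minimal sketch of serving it on Inf2 with the transformers-neuronx library (the checkpoint path, tp_degree, and sampling parameters are illustrative assumptions; tp_degree should match the NeuronCores on the target instance size, e.g. 2 on inf2.xlarge):

```python
import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

# Shard Llama 2 7B across NeuronCores with tensor parallelism.
# The checkpoint path is a placeholder for a locally saved model.
neuron_model = LlamaForSampling.from_pretrained(
    "./llama-2-7b", batch_size=1, tp_degree=2, amp="f16"
)
neuron_model.to_neuron()  # compile and load weights onto the accelerators

tokenizer = AutoTokenizer.from_pretrained("./llama-2-7b")
input_ids = tokenizer("Inferentia2 delivers", return_tensors="pt").input_ids

with torch.inference_mode():
    generated = neuron_model.sample(input_ids, sequence_length=256, top_k=50)
print(tokenizer.decode(generated[0]))
```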
Additional model results versus a comparable inference-optimized Amazon EC2 instance:

Model        | Higher Throughput | Lower Latency | Lower Cost
BERT-base    | 2.6x              | 6.7x          | 3.4x
RoBERTa-base | 3x                | 8.1x          | 4x

Up to 3x higher throughput
Up to 8.1x lower latency
Up to 4x lower cost
Speculative Decoding
MAINTAIN LARGER MODEL ACCURACY AND BENEFIT FROM SMALLER MODEL SPEED
[Chart: Latency (ms) — Time to First Token (TTFT): Traditional Sampling with Llama 3 70B vs. Speculative Decoding with Llama 3 8B/70B, 4x lower latency; Per Token Latency (PTL): Traditional Sampling with Llama 3 70B vs. Speculative Decoding with Llama 3 8B/70B, 4x lower latency]
4x lower Time to First Token and Per Token Latency
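For intuition, here is a framework-agnostic sketch of the speculative decoding loop (illustrative only, not the Neuron implementation; it uses simple greedy verification rather than the full acceptance-rejection sampling rule, and skips KV caching): a small draft model proposes k tokens cheaply, and the large target model verifies the whole proposal in a single forward pass.

```python
import torch

def speculative_decode(target, draft, input_ids, k=4, max_new_tokens=64):
    """Greedy speculative decoding sketch.

    `target` and `draft` are assumed to be Hugging Face-style causal LMs
    returning an object with a `.logits` tensor of shape (batch, seq, vocab).
    """
    ids = input_ids
    prompt_len = input_ids.shape[1]
    while ids.shape[1] - prompt_len < max_new_tokens:
        # 1. The small draft model proposes k tokens autoregressively (cheap).
        draft_ids = ids
        for _ in range(k):
            next_tok = draft(draft_ids).logits[:, -1, :].argmax(-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=1)
        proposed = draft_ids[:, ids.shape[1]:]                       # (1, k)

        # 2. The large target model scores prefix + proposal in ONE pass.
        tgt_logits = target(draft_ids).logits
        # Logits at position i predict token i+1, so these are the target's
        # greedy choices at each of the k proposed positions.
        tgt_pred = tgt_logits[:, ids.shape[1] - 1:-1, :].argmax(-1)  # (1, k)

        # 3. Accept the longest prefix where draft and target agree, then
        #    take the target's own token at the first disagreement.
        agree = (proposed == tgt_pred)[0]
        n_ok = int(agree.cumprod(0).sum())
        ids = torch.cat([ids, proposed[:, :n_ok],
                         tgt_pred[:, n_ok:n_ok + 1]], dim=1)
    return ids
```

Because the target model checks k draft tokens per forward pass instead of producing one token per pass, accepted runs of draft tokens cut per-token latency without changing which tokens the target would have produced under greedy decoding.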
AWS Inferentia2 delivers lower cost for SD models
Up to 80% higher throughput per dollar to deploy diffusion models with Amazon Inf2
[Chart: Throughput / $ — Amazon Inf2 running SDXL 1.1: 887.2 vs. comparable Amazon EC2 instances: 491.5]
SDXL 1.1, 1024x1024, 30 steps, AWS 3 Yr RI Instance Pricing
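A minimal sketch of compiling and running SDXL on Inf2 with Hugging Face's optimum-neuron pipeline, assuming the NeuronStableDiffusionXLPipeline API (the prompt and output paths are placeholders; input shapes are fixed at compile time to match the 1024x1024 benchmark configuration above):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# export=True triggers Neuron compilation with static shapes.
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    batch_size=1,
    height=1024,
    width=1024,
)
pipe.save_pretrained("./sdxl_neuron")  # reuse the compiled artifacts later

# 30-step generation, matching the slide's benchmark configuration.
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=30).images[0]
image.save("astronaut.png")
```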
AWS Inferentia2 Performance: Vision models
UP TO 5X LOWER INFERENCE COST FOR DIFFUSION MODELS
Model                | Image Size | Time (sec) | Cost per 1000 images
Stable Diffusion 1.5 | 512        | 2.4        | $0.51
Stable Diffusion 1.5 | 768        | 7.9        | $1.60
Amazon EC2 Inf1, Inf2, and Trn1 instances are available in over 23 Regions
AVAILABLE WORLDWIDE
Thank you!