Furiosa Introduction Confidential

FuriosaAI Inc. aims to create a next-generation AI accelerator, the RNGD chip, which is designed for efficient and sustainable AI computing, targeting large language models and multimodal applications. The chip architecture, known as TCP (Tensor Contraction Processor), is recognized for its programmability, scalability, and superior performance per watt, making it suitable for future AI workloads. The RNGD chip is set to be commercially available in Q3 2024, following the successful deployment of its first-generation WARBOY chip.

Uploaded by

satasi.satasi

The next-gen AI accelerator for more powerful and sustainable AI computing

Confidential (c) 2024 FuriosaAI Inc.


Our Mission

Make AI computing sustainable, enabling access to powerful AI for everyone on Earth


“Compute is going to be the
new currency of the future.”

Sam Altman, from Lex Fridman Podcast



OpenAI’s ChatGPT Compute & Cost

350B computations for every new token generated

$700K per day to run ChatGPT
$250M per year to run ChatGPT (as of April 2023)

Example: given the context (sequence of tokens) “So, the boy went to the”, the model generates the new token “Park”.

96 x (12,288 × 49,152 dimensional matrix multiplication)
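
The figures on this slide can be cross-checked with a little arithmetic. The sketch below takes 96, 12,288 and 49,152 from the slide and reads them as the layer count, hidden size and feed-forward size of a GPT-3-style decoder. As assumptions not stated on the slide, it counts four hidden-by-hidden attention projections plus two feed-forward matrices per layer, and two operations (one multiply, one add) per weight per token. Embeddings and attention over the context are ignored, so this is only an order-of-magnitude check.

```python
# Order-of-magnitude check of the slide's compute and cost figures.
# LAYERS, D_MODEL and D_FF come from the slide; the per-layer weight
# breakdown is an assumption based on a standard transformer decoder.

LAYERS = 96
D_MODEL = 12_288
D_FF = 49_152


def weights_per_layer(d_model: int, d_ff: int) -> int:
    """Weights in one decoder layer (assumed: 4 attention + 2 FFN matrices)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # up- and down-projection
    return attention + feed_forward


def ops_per_token(layers: int, d_model: int, d_ff: int) -> int:
    """Multiply and add operations per generated token (2 per weight)."""
    return 2 * layers * weights_per_layer(d_model, d_ff)


def annual_cost(daily_cost_usd: int, days: int = 365) -> int:
    """Yearly cost implied by a flat daily cost."""
    return daily_cost_usd * days


if __name__ == "__main__":
    print(f"ops per token: {ops_per_token(LAYERS, D_MODEL, D_FF) / 1e9:.0f}B")
    print(f"annual cost:   ${annual_cost(700_000) / 1e6:.1f}M")
```

Under these assumptions the per-token count comes to roughly 348B operations and the yearly cost to about $255M, both consistent with the rounded 350B and $250M on the slide.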



“One of the barriers there is actually building the
great software it’s very usable, so there is a real
possibility to build better chips that are optimized
for not just today’s models, but to be really able
to see where the models are going and making it
so people can experiment very flexibly”

Greg Brockman, OpenAI

$1 Trillion Long-Term Opportunity with Fast-evolving AI

$30B today (Year 2023)
$400B by 2027

Now: early stage of the dynamic and fast-changing AI landscape

Source: AMD & Nvidia announcements
Key to winning AI chip architecture

01 Programmability and scalability

02 Superior performance per watt

03 Easy deployment at mass scale, cloud and on-premise

2nd-gen chip, RNGD (Renegade)


01
Programmable and Scalable
Chip Architecture (TCP)

“TCP is a tensor processor that hits a good sweet-spot of being generalized for many tensor-based workloads in modern neural architectures, yet specialized enough to exploit many of the commonly known tricks of structured parallelism in AI acceleration.

This makes TCP a great arch, not only for one use case but reasonably future-proof to host new and upcoming DNN architectures (based on current trends observed).”

– ISCA Paper Reviewer

TCP (Tensor Contraction Processor)



TCP Recognized by ISCA
“I found all architectural decisions to be well-motivated and well-thought for many of today's challenges in AI acceleration.

The arch tackles memory access challenges, compute-vs-memory boundness, full-pipelined logic, diversity in OPs, scalability, multi-tenancy, dynamic reconfigurability through extensive control logic, well-thought cost-optimized compilation flow together with a low-level API for full control, bitwidth flexibility (FP8, BF16, INT8, INT4), and more.”

– ISCA Reviewer

Context: The only other NPU architectures accepted by ISCA are Google's TPU and Groq



01
1st-gen WARBOY’s superior
programmability – high
performance across many models



02
RNGD can run the most advanced models with around 1/2 power consumption

                     RNGD Server       H100 DGX
Memory capacity      HBM3 960 GB       HBM3 640 GB
Memory performance   30 TB/s           26.8 TB/s
Power (TDP)          4.0 kW max        10.2 kW max
FP8                  10.2 petaFLOPS    16 petaFLOPS
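
To make the power comparison concrete, the server-level figures above can be turned into ratios. The sketch below uses only the numbers on this slide; the `ServerSpec` class and its method names are hypothetical helpers introduced for illustration. These are peak datasheet values, so the results say nothing about measured throughput on a real model.

```python
# Ratios derived from the slide's server-level peak figures.
# ServerSpec is a hypothetical helper, not part of any Furiosa or NVIDIA tool.
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    name: str
    memory_gb: float       # total HBM3 capacity
    bandwidth_tbs: float   # aggregate memory bandwidth, TB/s
    tdp_kw: float          # maximum power (TDP), kW
    fp8_pflops: float      # peak FP8 compute, petaFLOPS

    def pflops_per_kw(self) -> float:
        """Peak FP8 petaFLOPS per kW of TDP."""
        return self.fp8_pflops / self.tdp_kw

    def tbs_per_kw(self) -> float:
        """Memory bandwidth (TB/s) per kW of TDP."""
        return self.bandwidth_tbs / self.tdp_kw


RNGD_SERVER = ServerSpec("RNGD Server", 960, 30.0, 4.0, 10.2)
H100_DGX = ServerSpec("H100 DGX", 640, 26.8, 10.2, 16.0)

if __name__ == "__main__":
    print(f"TDP ratio (RNGD / DGX): {RNGD_SERVER.tdp_kw / H100_DGX.tdp_kw:.2f}")
    for spec in (RNGD_SERVER, H100_DGX):
        print(f"{spec.name}: {spec.pflops_per_kw():.2f} PFLOPS/kW, "
              f"{spec.tbs_per_kw():.2f} TB/s per kW")
```

On these figures the RNGD server's TDP is about 0.39 of the DGX's, and its peak FP8 compute per kW is about 1.6 times higher, even though its absolute peak FP8 figure is lower.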


02
RNGD will be the world’s most efficient AI chip for accelerating LLMs and multimodal models in data centers

[Chart: energy efficiency on LLaMa 7B, comparing RNGD, H100 and L40S; bar labels: 4x, 3x]


03
PyTorch 2.x Support Overview

Furiosa SW stack streamlines model execution with native PyTorch 2.x support. All done with zero code change from users.

PyTorch 2.x pipeline:
Python -> (Dynamo tracing) -> fx.GraphModule -> (Furiosa compiler) -> LLTC IR -> (codegen) -> RNGD ISA

Furiosa SW stack: LLM engine, quantizer, calibrator, runtime, device runtime, debugger, profiler

OpenAI Triton Support Plan

Triton pipeline:
Python -> (AST Visitor) -> Triton-IR -> (Triton-Furiosa compiler) -> LLVM IR -> (codegen) -> RNGD ISA

LLTC: Low-Level Tensor Contraction
AST: Abstract Syntax Tree
IR: Intermediate Representation
ISA: Instruction Set Architecture
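
As a rough illustration of what an "AST Visitor" stage does at the front of such a pipeline, the sketch below uses Python's standard `ast` module to parse a toy kernel and record the arithmetic operations it contains. The kernel `axpy_kernel`, the `OpCollector` class and `collect_ops` are hypothetical names introduced here. This is not Triton's or Furiosa's front end, which would emit a full IR rather than a flat list of operation names.

```python
# Minimal AST-visitor sketch using only the standard library.
# All names here are illustrative assumptions, not Triton or Furiosa APIs.
import ast

KERNEL_SOURCE = """
def axpy_kernel(a, x, y):
    return a * x + y
"""


class OpCollector(ast.NodeVisitor):
    """Collects binary operations in the order they would be evaluated."""

    def __init__(self) -> None:
        self.ops: list[str] = []

    def visit_BinOp(self, node: ast.BinOp) -> None:
        # Visit the operands first so inner operations are recorded
        # before the operation that consumes their results.
        self.generic_visit(node)
        self.ops.append(type(node.op).__name__)


def collect_ops(source: str) -> list[str]:
    """Parse Python source and return the names of its binary operations."""
    collector = OpCollector()
    collector.visit(ast.parse(source))
    return collector.ops


if __name__ == "__main__":
    print(collect_ops(KERNEL_SOURCE))
```

A real compiler front end would attach types and shapes to each visited node and lower the result to its IR; the visitor pattern for walking the source tree is the part shown here.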
03
Easy deployment at mass scale, cloud and on-premise


Semiconductor Productization & Commercialization

Tape-out -> GDS file -> Photomask -> ASML EUV Lithography Machine


1st-gen WARBOY was successfully commercialized with Samsung and ASUS and deployed with data centers and enterprise clients.

1st gen, WARBOY


2nd-gen chip, RNGD (Renegade), targeting LLMs and multimodality, is available in Q3 2024

08/11 – Renegade bring-up getting started
05/20 – 20x Renegade PCIe PCB boards ready for SW development
6/20 – 100-120x Renegade PCB boards ready
7/26 – MLPerf submission (GPT-J, Llama 70B, BERT)
8/15 – Initial Renegade SDK release to early customers and more comprehensive benchmark results for LLMs.



Furiosa Future Roadmap

             WARBOY           RNGD               RNGD-MAX            RNGD-S             RNGD-TURBO
Quarter      2022 4Q          2024 3Q            2024 4Q             2025 1Q            2025 4Q
Memory       LPDDR4X 16GB     HBM3 48GB          HBM3 96GB           LPDDR5X 64GB       HBM3E 288GB
Bandwidth    66 GB/s          1.5 TB/s           3.0 TB/s            256 GB/s           8.0 TB/s
Power        60 W             150 W              350 W               60 W               600 W
Compute      64 TOPS (INT8)   512 TFLOPS (FP8)   1024 TFLOPS (FP8)   230 TFLOPS (FP8)   2 PFLOPS (FP8)
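
One way to read the roadmap figures together is the ratio of peak compute to memory bandwidth, which a roofline-style analysis calls the ridge point: a workload that performs fewer operations per byte moved than this ratio is limited by memory bandwidth, not by compute. The roofline framing is an assumption added here, not something the slide states. The sketch uses only the roadmap's peak numbers, assumes decimal units (1 GB/s = 1e9 bytes/s), and uses hypothetical names `ROADMAP` and `ridge_point`.

```python
# Ridge point (peak operations per byte of memory traffic) per roadmap chip.
# Values are the slide's peak figures; WARBOY is INT8 TOPS, the rest FP8 FLOPS.

ROADMAP = {
    # name: (peak operations per second, memory bytes per second)
    "WARBOY": (64e12, 66e9),
    "RNGD": (512e12, 1.5e12),
    "RNGD-MAX": (1024e12, 3.0e12),
    "RNGD-S": (230e12, 256e9),
    "RNGD-TURBO": (2e15, 8.0e12),
}


def ridge_point(peak_ops_per_s: float, bytes_per_s: float) -> float:
    """Operations per byte needed before peak compute becomes the limit."""
    return peak_ops_per_s / bytes_per_s


if __name__ == "__main__":
    for name, (ops, bandwidth) in ROADMAP.items():
        print(f"{name:<11} {ridge_point(ops, bandwidth):7.1f} ops/byte")
```

On these figures the HBM-based parts (RNGD, RNGD-MAX, RNGD-TURBO) have ridge points of roughly 250 to 340 operations per byte, while the LPDDR-based parts (WARBOY, RNGD-S) sit near 900 to 970, so the latter need far more data reuse to reach their peak compute.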


World-class R&D & Business Team in HW, SW and AI

HW/Architecture: “TCP: A Tensor Contraction Processor for AI Workloads” (accepted by ISCA 2024)
AI Models & Algorithms: “Can MLLMs Perform Text-to-Image In-Context Learning?” (co-authored with UW Madison)
SW Stack: “Integrating an NPU with PyTorch 2.0 Compile” (presented at PyTorch Conference 2023)
HW Verification: “Functional Coverage Closure with Python” (presented at DVCON 2024)


Movement coming soon
2024



Thank you
