Chip Builder GPT

DEVELOPING CHIPS FOR TOMORROW: YOU, ME, AND CHATGPT®
Dr Jawad Nasrullah | Palo Alto Electron
February 07, 2024
Agenda
1. ChatGPT® and chip design
2. Heterogeneous chips for generative AI: an example
3. AI memory demand beyond HBM
4. Challenges and opportunities of AI HW

Vision
• Chips to enhance AI
• AI to improve chip design
ChatGPT® and Chip Design
Chip design uses coding
Chip design life-cycle:
• Manufacturing design kits and collaterals
• Architecture and early planning
• Design implementation and verification
• Verification and sign off
• Manufacture, test, qualify and ship



Prompt: write a spice netlist of an inverter in a subcircuit named chiplet1

.subckt chiplet1 Vin Vout Vdd Gnd
* Definition of PMOS transistor
M1 Vout Vin Vdd Vdd PMOS L=1u W=2u
* Definition of NMOS transistor
M2 Vout Vin Gnd Gnd NMOS L=1u W=1u
.ends chiplet1
Prompt: write behavioral model of chiplet1 in Verilog

module chiplet1(Vin, Vout);
  input Vin;   // Input port
  output Vout; // Output port
  // Logical behavior of inverter
  assign Vout = ~Vin;
endmodule
Prompt: write a SPICE testbench to test transient output of chiplet1 with a pulse of 100ns width

.include "Chiplet1.spice" ; Include the Chiplet1 subcircuit definition
Vdd Vdd 0 DC 5V
Vss Vss 0 DC 0V
Vin in 0 PULSE(0V 5V 0 1ns 1ns 100ns 200ns)
X1 in out Vdd Vss Chiplet1
.tran 1ns 500ns ; Transient analysis: step time, total time
.end
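To make the stimulus in this testbench concrete, the sketch below evaluates a PULSE source with the same arguments (0 V to 5 V, no delay, 1 ns edges, 100 ns width, 200 ns period) and the response of an idealized inverter. It is a minimal illustration, not a circuit simulation: the function names `pulse` and `ideal_inverter` are hypothetical, and the inverter is assumed to switch instantly at Vdd/2, which a real transistor-level run would not do.

```python
def pulse(t, v1=0.0, v2=5.0, td=0.0, tr=1e-9, tf=1e-9, pw=100e-9, per=200e-9):
    """Piecewise-linear value of a SPICE-style PULSE source at time t (seconds)."""
    if t < td:
        return v1
    tau = (t - td) % per          # position within the current period
    if tau < tr:                  # rising edge
        return v1 + (v2 - v1) * tau / tr
    if tau < tr + pw:             # flat top
        return v2
    if tau < tr + pw + tf:        # falling edge
        return v2 + (v1 - v2) * (tau - tr - pw) / tf
    return v1                     # low until the period ends


def ideal_inverter(vin, vdd=5.0):
    """Idealized inverter (assumption): output is high when the input is below Vdd/2."""
    return vdd if vin < vdd / 2 else 0.0


if __name__ == "__main__":
    # Sample points inside the 500 ns window used by the .tran statement.
    for t_ns in (0, 50, 150, 250, 450):
        vin = pulse(t_ns * 1e-9)
        print(f"t={t_ns:>3} ns  Vin={vin:.1f} V  Vout(ideal)={ideal_inverter(vin):.1f} V")
```

The 500 ns transient window covers two and a half periods of the 200 ns pulse, so the output should show the inverted pulse more than twice.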
Prompt: write an essay on Chiplet1

In the realm of semiconductor design and electronics, the concept of a "chiplet," such as the imaginary "Chiplet1," represents a significant stride in the modular approach to integrated circuit (IC) design. While "Chiplet1" is a hypothetical …. serves as an excellent example to understand modern trends in IC design, such as chiplet-based architecture…
• Not hallucinating and shows some ‘understanding’

• Can be useful in writing verification scripts
A simple view of the generative pipeline
Prompt -> Foundation Large Language Model (LLM) -> Output
Training corpora:
• Language
• SPICE and Verilog manuals, app notes
Model scale:
• ChatGPT3: 175 billion parameters
• ChatGPT4: ~1.7 trillion parameters
  - ~4TB (Inference)
  - ~27TB (Training)
• ChatGPT5: 20x increase?
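One way to see where memory figures of this size could come from is to multiply the parameter count by an assumed number of bytes per parameter. The sketch below does this under two assumptions that are not stated on the slide: 2 bytes per parameter for inference (16-bit weights) and 16 bytes per parameter for training (mixed-precision Adam: 16-bit weights and gradients plus 32-bit master weights and two optimizer moments). The names are illustrative only.

```python
GPT4_PARAMS = 1.7e12              # ~1.7 trillion parameters (slide estimate)
INFERENCE_BYTES_PER_PARAM = 2     # assumption: 16-bit weights only
TRAINING_BYTES_PER_PARAM = 16     # assumption: 2 (weights) + 2 (grads) + 4 + 4 + 4 (fp32 master, Adam moments)


def footprint_tb(params, bytes_per_param):
    """Model-state memory in terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12


inference_tb = footprint_tb(GPT4_PARAMS, INFERENCE_BYTES_PER_PARAM)
training_tb = footprint_tb(GPT4_PARAMS, TRAINING_BYTES_PER_PARAM)

if __name__ == "__main__":
    print(f"Inference: ~{inference_tb:.1f} TB")
    print(f"Training:  ~{training_tb:.1f} TB")
```

Under these assumptions the result is about 3.4 TB for inference and about 27 TB for training, which is in line with the slide's rounded figures. Activations and caches are ignored here.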
A chip design generative pipeline
Pipeline elements (figure):
• Engineering Prompt
• Chip Design Training Corpora (Style, IP, Kits)
• Design Database Retrieval
• Chip Design Fine-Tuned LLM
• EDA tools
• Architecture, Implementation, Verify, Sign off
Chiplet modularity helps simplify this pipeline
GPT4 Training Cost Estimates
[pictured system] x ~2000-3000 needed for a month to train GPT
• Rental cost/training ~$80 Million
• Electricity cost/training ~$4 Million
  (Greenhouse gas emissions equivalent to ~1600 gas-powered cars for a year.)
• Equipment CapEx ~$450 Million
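The estimates above can be turned into rough per-system rates. The sketch below assumes the midpoint of the 2000-3000 range (2500 systems) and a 30-day month; both are assumptions made for illustration, and the variable names are hypothetical.

```python
SYSTEMS = 2500                 # assumption: midpoint of the ~2000-3000 range
HOURS = 30 * 24                # assumption: a 30-day training run
RENTAL_USD = 80e6              # slide estimate
CAPEX_USD = 450e6              # slide estimate

rental_per_system_hour = RENTAL_USD / (SYSTEMS * HOURS)
capex_per_system = CAPEX_USD / SYSTEMS

if __name__ == "__main__":
    print(f"Implied rental rate: ~${rental_per_system_hour:.0f} per system-hour")
    print(f"Implied purchase price: ~${capex_per_system:,.0f} per system")
```

With these inputs the implied figures are roughly $44 per system-hour to rent and $180,000 per system to buy.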

Data Center GPU Estimates
[pictured system] x 75k units @10.2kW
(“600k H100 equivalent” - Meta/Zuckerberg)
• Power Demand = 750 MW
• Needs own power generation
• Equipment CapEx >$10 B
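The power figure follows from the unit count and per-unit power. The sketch below repeats that arithmetic and also derives two implied numbers, GPUs per unit and CapEx per unit, which are inferences from the slide's figures rather than values it states.

```python
UNITS = 75_000                 # slide estimate
KW_PER_UNIT = 10.2             # slide estimate
H100_EQUIVALENTS = 600_000     # figure quoted on the slide
CAPEX_USD = 10e9               # slide gives this as a lower bound (>$10 B)

total_mw = UNITS * KW_PER_UNIT / 1000        # kW -> MW
gpus_per_unit = H100_EQUIVALENTS / UNITS     # implied, not stated on the slide
capex_per_unit = CAPEX_USD / UNITS           # implied lower bound

if __name__ == "__main__":
    print(f"Total power: ~{total_mw:.0f} MW")
    print(f"Implied GPUs per unit: {gpus_per_unit:.0f}")
    print(f"Implied CapEx per unit: >${capex_per_unit:,.0f}")
```

The product comes to about 765 MW, consistent with the rounded 750 MW on the slide.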

Compute = 1 x 10^18 math operations/s
Power Dissipation = 20 W
Memory = 2.5 x 10^15 bytes = 2.5 petabytes
Chips for AI
A heterogeneous example
AMD MI300A              AMD MI300X
228 GPU, 24 CPU         304 GPU
128 GB HBM DRAM         192 GB HBM DRAM
5.3 TB/s Memory BW      5.3 TB/s Memory BW
750W
750A @ 1V
16A @ 48V
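The two current figures are the same 750 W delivered at different voltages (I = P / V). The sketch below repeats that calculation and adds the ratio of resistive loss for a fixed path resistance, which scales with the square of the current. The function names are illustrative, and the equal-resistance comparison is an assumption made to show why higher-voltage distribution is attractive.

```python
def supply_current(power_w, voltage_v):
    """Current in amperes needed to deliver power_w at voltage_v."""
    return power_w / voltage_v


def relative_conduction_loss(i_a, i_b):
    """Ratio of I^2 * R losses for two currents through the same resistance."""
    return (i_a / i_b) ** 2


i_core = supply_current(750, 1)     # delivery at the 1 V level
i_bus = supply_current(750, 48)     # delivery at the 48 V level

if __name__ == "__main__":
    print(f"750 W @ 1 V  -> {i_core:.0f} A")
    print(f"750 W @ 48 V -> {i_bus:.1f} A")
    print(f"Loss ratio for equal resistance: {relative_conduction_loss(i_core, i_bus):.0f}x")
```

At 48 V the current is 15.6 A, which the slide rounds to 16 A.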
Gen AI justifying super expensive chips

Rack ~50kW
MI300 systems leverage OCP open accelerator infrastructure
AMD MI300 Chip           MI300 OCP OAM Module     MI300 OCP UBB
304 GPU (8 XCD)          OAM Heatsink             8x OAM
192 GB HBM (DRAM)                                 1.5 TB HBM (DRAM)
5.3 TB/s Memory BW                                42 TB/s Memory BW
750 W TBP                                         10 kW UBB
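The UBB figures are close to eight times the per-chip figures. The sketch below multiplies them out; the remaining board power is an inference that assumes "10 kW UBB" refers to the whole board, which the slide does not spell out.

```python
OAMS_PER_UBB = 8
HBM_GB_PER_OAM = 192
BW_TBPS_PER_OAM = 5.3
TBP_W_PER_OAM = 750
UBB_POWER_W = 10_000           # assumption: "10 kW UBB" is total board power

ubb_hbm_gb = OAMS_PER_UBB * HBM_GB_PER_OAM
ubb_bw_tbps = OAMS_PER_UBB * BW_TBPS_PER_OAM
oam_power_w = OAMS_PER_UBB * TBP_W_PER_OAM
other_power_w = UBB_POWER_W - oam_power_w   # everything that is not accelerator TBP

if __name__ == "__main__":
    print(f"HBM per UBB: {ubb_hbm_gb} GB (~{ubb_hbm_gb / 1024:.1f} TB)")
    print(f"Memory bandwidth per UBB: ~{ubb_bw_tbps:.1f} TB/s")
    print(f"Accelerator power: {oam_power_w} W, remainder: {other_power_w} W")
```

Eight modules give 1536 GB of HBM and about 42 TB/s of aggregate memory bandwidth, matching the UBB column.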
MI300 OAM (dimension labels: 102mm, ~78mm, 170mm)
• 304 GPUs
• 192 GB
• 750W
Package cross-section (figure labels): GPU, CPU, IOD, HBM Stack, Passive Si Interposer, Substrate
+3 Years (dimension labels: 102mm, ~110mm, 170mm)
• 456 GPUs
• 1TB
• 1.5kW
+10 Years (dimension labels: 150mm, 170mm)
• >1000 GPUs
• 4 TB
• 3kW
HBM and beyond
Still “there is plenty of room at the bottom”
HBM System Trends
[Figure: trend plot, log-scale axis from 1 to 100000, years 2020 to 2040]
• DRAM capacity (#stacks x stack capacity)
• Bandwidth/stack (#wires x symbol rate)
• DRAM power budget (system design)
[Figure labels: HBM Stack, 1024 data bus, ~50um, 745um, GPU, Silicon Interposer]
                        Now             +10 years
Die Stack               8-Hi, 12-Hi     24-Hi
HBM Capacity/Package    288GB           4TB
Data Bus Width          1024            2048
Symbol rate/wire        8Gbps           32Gbps
Core Vdd                1.1V            0.8V

• Cu-Cu bonding, New DRAM devices, large interposers
• Substrate/interposer technology improvement
• New physical/logical layer circuitry
• DTCO, circuit design
[Figure labels: HBM Stack, 1024 data bus, ~50um, 745um, GPU, Silicon Interposer]
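Per-stack bandwidth is the number of data wires multiplied by the symbol rate per wire. The sketch below applies that to the "Now" and "+10 years" columns, converting bits to bytes; it assumes one bit per symbol per wire, and the function name is illustrative.

```python
def stack_bandwidth_gb_per_s(bus_width_bits, gbps_per_wire):
    """Peak bandwidth of one HBM stack in GB/s (8 bits per byte)."""
    return bus_width_bits * gbps_per_wire / 8


now = stack_bandwidth_gb_per_s(1024, 8)        # "Now" column
future = stack_bandwidth_gb_per_s(2048, 32)    # "+10 years" column

if __name__ == "__main__":
    print(f"Now:       {now:.0f} GB/s per stack")
    print(f"+10 years: {future:.0f} GB/s per stack")
    print(f"Increase:  {future / now:.0f}x")
```

Doubling the bus width and quadrupling the symbol rate gives an 8x increase in per-stack bandwidth, from about 1 TB/s to about 8 TB/s.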
Challenges and opportunities of AI HW
Power efficiency and scale out
Manufacturing
• Beyond CMOS (multi-gate)
• Vdd scaling (target 200mV)
• True 3D transistor stacking
• Fine pitch backend
Chip Design
• Modularity (Chiplets)
• Power reduction
• Circuit Density
System Design
• Scale-out networking
• Multi chiplet integration
• HBM/3D integration
Additional figure labels: Chip Design Kits, System Design Kits, Design Automation, Power Supply, Cooling
Thank you.
