ECE 5745 Complex Digital ASIC Design

Course Overview
Christopher Batten

School of Electrical and Computer Engineering
Cornell University

http://www.csl.cornell.edu/courses/ece5745
• Course Goal, Structure, Motivation • Activity ASIC Design Case Studies

Complex Digital ASIC Design

▶ Course goal, structure, motivation
▷ What is the goal of the course?
▷ Why should students want to take this course?
▷ How is the course structured?
▶ Activity: Evaluation of Integer Multiplier
▶ ASIC Design Case Studies
▷ Example design-space exploration
▷ Example real ASIC chips

ECE 5745 Course Overview 2 / 58


The Computer Systems Stack

Application (top) and Technology (bottom): the gap between them is too large to bridge in one step (but there are exceptions, e.g., the magnetic compass).


Application
Algorithm
Programming Language
Operating System
Compiler
Instruction Set Architecture
Microarchitecture
Register-Transfer Level
Gate Level
Circuits
Devices
Technology

In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using available manufacturing technologies.


What is Computer Architecture?

[Sidebar: the computer systems stack, Application through Technology, with a bracket labeled Computer Architecture]

Application Requirements
• Provide motivation for building system
• SW/HW interface expressive yet productive

Computer architects provide feedback to guide application and technology research directions.

Technology Constraints
• Restrict what can be done efficiently
• New technologies make new architectures possible


[Figure: microprocessor trend data, 1975 to 2025, on a log scale from 10^0 to 10^8, divided into single-core, multi-core, and accelerator eras. Series: transistors (thousands), SPECint (single-core), SPECrate (4-7 cores), frequency (MHz), power (W), number of cores, and number of accelerators. Annotated techniques: pipelining & caches; superscalar execution; superscalar out-of-order execution; aggressive superscalar out-of-order execution; parallelization & specialization.]

C. Batten, M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, K. Rupp & [Y. Shao, IEEE Micro'15] & [C. Leiserson, Science'20]


Key Metrics in Computer Architecture

[Figure: multicore system with processors (P), instruction caches (I$), data caches (D$), and networks]

▶ Primary Metrics
▷ Execution time (cycles/task)
▷ Energy (Joules/task)
▷ Cycle time (ns/cycle)
▷ Area (µm²)
▶ Secondary Metrics
▷ Performance (ns/task = cycles/task × ns/cycle)
▷ Average power (Watts)
▷ Peak power (Watts)
▷ Cost ($)
▷ Design complexity
▷ Reliability
▷ Flexibility

Discuss qualitative first-order analysis from ECE 4750 on board
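The secondary metrics follow from the primary ones through first-order arithmetic: time per task is execution time (cycles/task) times cycle time (ns/cycle), performance is its inverse, and average power is energy per task times tasks per second. The Python sketch below illustrates that arithmetic only; the `DesignPoint` class and the example numbers are hypothetical and not part of the course material.

```python
from dataclasses import dataclass


@dataclass
class DesignPoint:
    """First-order model of one design point (hypothetical helper)."""
    cycles_per_task: float   # execution time (cycles/task)
    ns_per_cycle: float      # cycle time (ns/cycle)
    joules_per_task: float   # energy (J/task)
    area_um2: float          # area (um^2), carried along for comparisons

    def ns_per_task(self) -> float:
        # Time per task = (cycles/task) * (ns/cycle)
        return self.cycles_per_task * self.ns_per_cycle

    def tasks_per_second(self) -> float:
        # Performance as throughput: inverse of time per task (1 s = 1e9 ns)
        return 1e9 / self.ns_per_task()

    def avg_power_watts(self) -> float:
        # Average power = (J/task) * (tasks/s)
        return self.joules_per_task * self.tasks_per_second()

    def tasks_per_joule(self) -> float:
        # Energy efficiency is the inverse of energy per task
        return 1.0 / self.joules_per_task


# Hypothetical design point: 50 cycles/task at 2 ns/cycle, 5 nJ/task
dp = DesignPoint(cycles_per_task=50, ns_per_cycle=2.0,
                 joules_per_task=5e-9, area_um2=10_000)
print(dp.ns_per_task())       # 100.0
print(dp.tasks_per_second())  # 10000000.0
```

Note that halving cycles/task while doubling ns/cycle leaves time per task unchanged, which is why execution time and cycle time must be evaluated together rather than in isolation.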


Unanswered Questions from ECE 4750

[Figure: multicore system with processors (P), accelerators (Xcel), instruction caches (I$), data caches (D$), networks, and accelerated instructions]

▶ How can we quantitatively evaluate area, cycle time, and energy?
▶ How do we actually implement processors, memories, and networks in a real chip?
▶ How should we implement/analyze application-specific accelerators?
▷ Very loosely coupled memory-mapped accelerators
▷ More tightly coupled co-processor accelerators
▷ Specialized instructions and functional units


ASIC: Application-Specific Integrated Circuit

[Figure: energy (Joules per task) vs. performance (tasks per second) for design points A through F, covering a simple processor, superscalar with deeper pipelines, out-of-order superscalar superpipelined, multicore, and specialized accelerators, with a processor power constraint; shown alongside the multicore system with accelerators (Xcel) and accelerated instructions]


ASIC: Application-Specific Integrated Circuit

[Figure: energy efficiency (tasks per joule) vs. performance (tasks per second). Regions: simple processor, embedded architectures, high-performance architectures, more flexible accelerators, less flexible accelerators, and custom ASIC, bounded by a design performance constraint and a design power constraint; an arrow indicates the flexibility vs. specialization trade-off]


Goal for ECE 5745 is to answer these questions!

▶ How can we quantitatively evaluate area, cycle time, and energy?
▶ How do we actually implement processors, memories, and networks in a real chip?
▶ How should we implement/analyze application-specific accelerators?
▷ Very loosely coupled memory-mapped accelerators
▷ More tightly coupled co-processor accelerators
▷ Specialized instructions and functional units


Full Custom Design vs. Standard-Cell Design

[Figure: excerpt from an MIT 6.884 lecture slide on full-custom design, which requires complete customization of all layers of the wafer, showing a piece of a full-custom multiplier array laid out in 1.0 µm with 2 metal layers]

▶ Full-Custom Design (ECE 4740)
▷ Designer is free to do anything, anywhere, though the team usually imposes some design discipline
▷ Most time-consuming design style; reserved for very high-performance or very high-volume chips (Intel microprocessors, RF power amps for cellphones)
▶ Standard-Cell Design (ECE 5745)
▷ Fixed library of "standard cells" and SRAM memory generators
▷ Register-transfer-level description is automatically mapped to this library of standard cells, then these cells are placed and routed automatically
▷ Enables agile hardware design methodology


Standard-Cell Design Methodology

[Figure: excerpts from MIT 6.884 lecture slides (Spring 2005, L01 Introduction). Standard-cell design, also called Cell-Based ICs (CBICs), uses a fixed library of cells plus memory generators; cells can be synthesized from HDL or entered in schematics, then placed and routed automatically; each design requires a complete set of custom masks; at the time, it was the most popular hard-wired ASIC type. The layout shows cells arranged in rows next to generated memory arrays (Mem 1, Mem 2). Cells have standard height but vary in width, and are designed to connect power, ground, and wells by abutment. Labeled cell features: VDD and GND power rails in M1, well contact under the power rail, clock rail (not typical), and cell I/O on M2; example cells include a NAND2 and a flip-flop, plus a ripple-carry adder with its carry chain highlighted.]


Standard-Cell Design Methodology

[Figure: standard-cell CAD tool flow]
▷ Design in HDL → HDL simulator → execution time (cycles/task) and switching activity
▷ Design in HDL + standard cells → synthesis → gate-level model, with area (μm²), cycle time (ns), and energy (J/task) estimates
▷ Gate-level model → place & route → layout, with area (μm²), cycle time (ns), and energy (J/task) estimates
▷ Layout + switching activity → power analysis → energy (J/task)
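Power analysis combines switching activity from simulation with net capacitances from the layout. A common first-order model: each 0→1 transition on a net with capacitance C draws C·V² from the supply, so averaged over rising and falling transitions each toggle costs about ½·C·V². The Python sketch below applies only that model; the net names, capacitances, and toggle counts are hypothetical, and it ignores leakage and short-circuit energy, which real power-analysis tools also account for.

```python
from typing import Dict


def dynamic_energy_joules(toggles: Dict[str, int],
                          cap_farads: Dict[str, float],
                          vdd: float) -> float:
    """First-order switching energy: sum over nets of toggles * 1/2 * C * V^2.

    toggles: 0->1 plus 1->0 transitions per net over one task (from simulation).
    cap_farads: total load capacitance per net (from the layout).
    Leakage and short-circuit energy are deliberately ignored.
    """
    return sum(n * 0.5 * cap_farads[net] * vdd ** 2
               for net, n in toggles.items())


# Hypothetical per-task activity for three nets at 1.8 V
net_toggles = {"result_reg[0]": 40, "a_reg[0]": 64, "clk": 128}
net_caps = {"result_reg[0]": 5e-15, "a_reg[0]": 5e-15, "clk": 20e-15}
energy = dynamic_energy_joules(net_toggles, net_caps, vdd=1.8)
print(f"{energy * 1e12:.2f} pJ/task")
```

The clock net dominates here because it toggles twice every cycle and carries a large load, which is one reason clock gating is an effective energy optimization.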


Example Standard-Cell Chip Plot

[Figure: chip plot of a single-lane vector-thread unit with 256 registers (C. Batten, MIT CSAIL)]

What is Complex Digital ASIC Design?

Complex digital ASIC design is the process of quantitatively exploring the area, cycle time, execution time, and energy trade-offs of various application-specific accelerators (and general-purpose proc+mem+net) using automated standard-cell CAD tools, and then transforming the most promising design into layout ready for fabrication.




Technology Scaling is Slowing

[Figure: system performance vs. year, 1940 to 2030, across successive technologies: vacuum tube, discrete transistor, integrated bipolar, and integrated CMOS (reaching 7nm, ~50B transistors). Open questions for what comes next: Fallow period? OR Golden age of chip design? New technology?]

Adapted from D. Brooks Keynote at NSF XPS Workshop, May 2015.


Example Application Domain: Image Recognition

[Figure: example images labeled "Starfish" and "Dog"]


Machine Learning: Training vs. Inference

[Figure: In training, many labeled images pass forward through the model; the model's output (e.g., "starfish") is compared against the label (e.g., "dog"), and the error is propagated backward to update the model. In inference, few images pass forward through the trained model to produce a label (e.g., "dog").]


ImageNet Large-Scale Visual Recognition Challenge

[Figure: results from 2010 to 2017. The top-5 error rate fell from 28% and 26% to 16%, 12%, 7%, 3.6%, 3%, and 2.3%, with the human error rate marked for comparison; over the same period, the share of entries using GPUs rose from 0 (2010 and 2011) to ~100%.]

Hardware: Graphics Processing Units
Software: Deep Neural Network


Accelerators for Machine Learning in the Cloud

NVIDIA DGX Hopper
▶ Graphics processor specialized just for accelerating machine learning
▶ Available as part of a complete system with both the software and hardware designed by NVIDIA

Google TPU v4
▶ Custom chip specifically designed to accelerate Google's TensorFlow C++ library
▶ Tightly integrated into Google's data centers

Microsoft Catapult
▶ Custom FPGA board for accelerating Bing search and machine learning
▶ Accelerators developed with/by app developers
▶ Tightly integrated into Microsoft data centers and cloud computing platforms


Accelerators for Machine Learning at the Edge

Amazon Echo
▶ Developing AI chips so Echo line can do more on-board processing
▶ Reduces need for round-trip to cloud
▶ Co-design the algorithms and the underlying hardware

Facebook Oculus
▶ Starting to design custom chips for Oculus VR headsets
▶ Significant performance demands under strict power requirements

[Figure: Movidius Myriad 2]


Top-five software companies are all building custom accelerators
▶ Facebook: w/ Intel, in-house AI chips
▶ Amazon: Echo, Oculus, networking chips
▶ Microsoft: Hiring for AI chips
▶ Google: TPU, Pixel, convergence
▶ Apple: SoCs for phones and laptops

Chip startup ecosystem for machine learning accelerators is thriving!
▶ Graphcore
▶ Nervana
▶ Cerebras
▶ Wave Computing
▶ Horizon Robotics
▶ Cambricon
▶ DeePhi
▶ Esperanto
▶ SambaNova
▶ Eyeriss
▶ Tenstorrent
▶ Mythic
▶ ThinkForce
▶ Groq
▶ Lightmatter


The field of complex digital ASIC design is experiencing a disruptive sea change and has a critical choice:

1. A technological fallow period
2. A golden age of ASIC design

This course will help you appreciate and possibly contribute to this golden age!


Course Motivation: Comp Arch Research Perspective

[Figure: the same microprocessor trend data (1975 to 2025) shown earlier, spanning the single-core, multi-core, and accelerator eras]

C. Batten, M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, K. Rupp & [Y. Shao, IEEE Micro'15] & [C. Leiserson, Science'20]


Cross-Layer Interaction is Critical

[Sidebar: the computer systems stack, Application through Technology]

Architecture-level researchers need to quantitatively understand area, cycle time, and energy trade-offs to create new architectures for the accelerator era.

Cross-layer interaction can generate some of the most exciting research ideas!


Course Motivation: Circuits Research Perspective

[Figure: Apple M2 System-on-Chip (2022), 20 billion transistors, annotated "Your Digital Circuit Here" and "Your Analog Circuit Here"]


Cross-Layer Interaction is Critical

[Sidebar: the computer systems stack, Application through Technology]

Circuit-level researchers need to appreciate the system-level context for their circuits.

Cross-layer interaction can generate some of the most exciting research ideas!




Course Structure

▶ Prereq: Computer Architecture
▶ Part 1: ASIC Design Overview
▶ Part 2: Digital CMOS Circuits
▶ Part 3: CAD Algorithms


Part 1: ASIC Design Overview

▶ Topic 1: Hardware Description Languages
▶ Topic 2: CMOS Devices
▶ Topic 3: CMOS Circuits
▶ Topic 4: Full-Custom Design Methodology
▶ Topic 5: Automated Design Methodologies
▶ Topic 6: Closing the Design Gap
▶ Topic 7: Clocking, Power Distribution, Packaging, and I/O
▶ Topic 8: Testing and Verification


Part 2: Digital CMOS Circuits

▶ Topic 9: Combinational Logic
▶ Topic 10: Sequential State
▶ Topic 11: Interconnect


Part 3: CAD Algorithms

▶ Topic 12: Synthesis Algorithms
▷ RTL to logic synthesis: x = a'bc + a'bc', y = b'c' + ab' + ac
▷ Technology-independent synthesis: x = a'b, y = b'c' + ac
▷ Technology-dependent synthesis
▶ Topic 13: Physical Design Automation
▷ Placement
▷ Global routing
▷ Detailed routing
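Technology-independent synthesis must preserve the Boolean function. In the example above, x = a'bc + a'bc' = a'b(c + c') = a'b, and ab' is the consensus of b'c' and ac (resolving on c), so it is redundant in y. For three inputs this can be checked by brute force over all eight input combinations. The Python sketch below illustrates exhaustive equivalence checking; the function names are hypothetical, and the approach only scales to small input counts (production tools use techniques such as BDD- or SAT-based equivalence checking).

```python
from itertools import product


# Functions from the synthesis example; inputs are booleans a, b, c.
def x_orig(a, b, c): return (not a and b and c) or (not a and b and not c)
def x_opt(a, b, c):  return not a and b
def y_orig(a, b, c): return (not b and not c) or (a and not b) or (a and c)
def y_opt(a, b, c):  return (not b and not c) or (a and c)


def equivalent(f, g, nvars=3):
    """Exhaustively compare two Boolean functions on all 2**nvars inputs."""
    return all(bool(f(*v)) == bool(g(*v))
               for v in product([False, True], repeat=nvars))


print(equivalent(x_orig, x_opt))  # True
print(equivalent(y_orig, y_opt))  # True
```

The `bool(...)` wrapping matters because Python's `and`/`or` return one of their operands rather than a strict boolean.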


Five-Week Design Project

[Figure: the energy efficiency (tasks per joule) vs. performance (tasks per second) design space, from simple processor through embedded and high-performance architectures, more and less flexible accelerators, to custom ASIC, bounded by design performance and power constraints; shown alongside the multicore system with accelerators (Xcel)]




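Fixed-Latency Iterative Multiplier Datapath

[Figure: datapath for a fixed-latency iterative multiplier. The 32-bit operands req_msg.a and req_msg.b (from req_msg) are loaded into a_reg and b_reg through muxes controlled by a_mux_sel and b_mux_sel; on each iteration a_reg is shifted left (<<) and b_reg is shifted right (>>). The least-significant bit of b_reg (b_lsb) is a status signal for control. An adder computes result_reg + a_reg, and add_mux_sel chooses between the sum and the old result; result_mux_sel chooses between this value and 0 (for initialization), and result_en enables writes to the 32-bit result_reg, which drives resp_msg.]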
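The datapath implements shift-and-add multiplication: on each iteration, if b_lsb is one, the adder accumulates a_reg into result_reg; then a_reg shifts left and b_reg shifts right. A fixed-latency design always runs all 32 iterations regardless of operand values. The Python function below is a hypothetical behavioral sketch of that datapath (register and signal names follow the figure); it is not the course's reference model and ignores the request/response handshake and the control FSM.

```python
from typing import Tuple


def fixed_latency_mul(a: int, b: int, nbits: int = 32) -> Tuple[int, int]:
    """Behavioral sketch of the fixed-latency iterative multiplier.

    Returns (low nbits of a * b, number of calculate iterations).
    """
    mask = (1 << nbits) - 1
    # Load: a_mux_sel/b_mux_sel select req_msg.a/req_msg.b,
    # result_mux_sel selects 0 to clear the accumulator.
    a_reg, b_reg, result_reg = a & mask, b & mask, 0
    iterations = 0
    # Calculate: always nbits iterations, independent of operand values.
    for _ in range(nbits):
        b_lsb = b_reg & 1
        if b_lsb:
            # add_mux_sel selects the adder output (result_reg + a_reg)
            result_reg = (result_reg + a_reg) & mask
        a_reg = (a_reg << 1) & mask   # << shifter
        b_reg = b_reg >> 1            # >> shifter
        iterations += 1
    return result_reg, iterations


print(fixed_latency_mul(6, 7))   # (42, 32)
print(fixed_latency_mul(-3, 5)[0] == (-15 & 0xFFFFFFFF))  # True
```

Because only the low 32 bits are kept, the same datapath produces correct two's complement results for signed operands. A variable-latency variant (not shown) could stop once b_reg becomes zero, trading extra control logic for lower average execution time.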




Scalar Processors with Multithreading

[Figure: Programmer's logical view: microthreads μT0, μT1, ..., μTi sharing memory. Typical core microarchitecture: multithreaded cores executing microthreads (μT0 to μT3), with instruction and data memories. An energy vs. performance sketch compares multithreaded and MIMD designs.]


Vector-SIMD Processors

[Figure: Programmer's logical view: control threads CT0 to CTj, each operating on vector elements elm 0 to elm i, with memory. Typical core microarchitecture: control processor (CP), vector issue unit (VIU), vector lanes 0 to 3, and vector memory unit (VMU), with instruction and data memories. An energy vs. performance sketch compares vector-SIMD with MIMD designs.]


Quantitative Area Evaluation

[Figure: chip plot of a single-lane vector-thread unit with 256 registers (C. Batten, MIT CSAIL)]
Quantitative Area Evaluation

[Figure: normalized area (0.00 to 1.75), broken down into control logic, register file, memory units, floating-point functional units, integer functional units, control processor, instruction cache, and data cache. Configurations: quad-core with vertical multithreading (1, 2, 4, and 8 threads) and vector-SIMD with 8 elements per lane (1 core with 4 lanes; 4 cores with 1 lane each).]

Single-core multi-lane design reduces area by 15%.
Multi-core single-lane design increases area by 20%.


Quantitative Performance and Energy Evaluation

[Figure, left: normalized energy/task vs. normalized tasks/second for a multithreaded multicore (increasing number of threads per core) and a single-lane vector design (increasing vlen). Right: energy/task (μJ, 0 to 25) broken down into control logic, register file, memory units, integer functional units, control processor, instruction cache, data cache, and leakage, for quad-core with vertical multithreading (1, 2, 4, and 8 threads) and vector-SIMD with 4 cores of 1 lane each (1, 2, 4, and 8 elements per lane).]

Performance decreases as the number of threads increases, due to increased cycle time and thread-management overhead on fine-grain loops.




Simple RISC Processor ASIC

[Figure: chip plot and die photo of the STC1 chip, with labeled blocks SP control, SP datapath, SP register file, RAM interface, VCO, controller, AHIP, and four 2 KB RAM subbanks. Original caption (Figure 2): "STC1 chip plot and die photo. On the chip plot, the Exec Unit is colored blue, and the Fetch Unit is colored green and purple."]
Simple RISC Processor ASIC

▶ RISC processor w/ 8 KB SRAM
▶ TSMC 0.18 µm process
▶ 1.7 × 2.1 mm
▶ 610K transistors
▶ 450 MHz at 1.8 V

[Figure 5: Shmoo plot for a wide voltage range. The horizontal axis plots supply voltage in mV, and the vertical axis plots frequency in MHz (roughly 180 to 750 MHz). A * indicates correct operation at that operating point and a dot indicates failure. The shmoo plot shows combined results from two different chips.]
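A shmoo plot is produced by sweeping supply voltage and clock frequency and recording pass/fail at each operating point. The Python sketch below renders a text shmoo from any pass/fail predicate and extracts the highest passing frequency at each voltage. The function names and the linear pass/fail model are made up for illustration; they are not measured data, and a real setup would run test programs on the chip at each point.

```python
from typing import Callable, Dict, List, Optional

PassFn = Callable[[int, int], bool]  # (voltage_mv, freq_mhz) -> passed?


def shmoo_rows(voltages_mv: List[int], freqs_mhz: List[int],
               passes: PassFn) -> List[str]:
    """One row per frequency (highest first), one column per voltage;
    '*' marks correct operation and '.' marks failure."""
    rows = []
    for f in sorted(freqs_mhz, reverse=True):
        marks = " ".join("*" if passes(v, f) else "." for v in voltages_mv)
        rows.append(f"{f:4d} {marks}")
    return rows


def fmax_per_voltage(voltages_mv: List[int], freqs_mhz: List[int],
                     passes: PassFn) -> Dict[int, Optional[int]]:
    """Highest passing frequency at each voltage (None if nothing passes)."""
    return {v: max((f for f in freqs_mhz if passes(v, f)), default=None)
            for v in voltages_mv}


# Toy model (hypothetical): maximum frequency grows linearly with voltage.
def toy_passes(v_mv: int, f_mhz: int) -> bool:
    return f_mhz <= (v_mv - 1000) // 2


volts = [1200, 1400, 1600, 1800]
freqs = [100, 200, 300, 400]
for row in shmoo_rows(volts, freqs, toy_passes):
    print(row)
print(fmax_per_voltage(volts, freqs, toy_passes))
```

Measured shmoo plots can contain isolated failing points inside the passing region, so a conservative fmax would also require every lower frequency at that voltage to pass; this sketch simply takes the highest passing point.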



Scale Vector-Thread Processor ASIC



Scale Vector-Thread Processor ASIC

TSMC 0.18µm • 7.14 Million Transistors • 16.6 mm² Core Area

[Figure: chip plot showing cache tag CAMs, cache control, cache data RAM banks, control processor (CP), crossbar (XBAR), vector memory unit (VMU), vector issue unit (VIU), and vector lanes 0 to 3]


Energy-Efficiency of the Scale VT Processor

[Figure: Scale energy vs. performance results (C. Batten, MIT CSAIL)]
Batten Research Group Test Chips

[Figure: timeline of chip tapeouts, 2015 to 2023. Taped-out chips shown: DCS, BRGTC1, Celerity, BRGTC2, and BRGTC4, with processes and die sizes including TSMC 65nm (1x2.2mm), IBM 130nm (2x2mm), TSMC 16nm (5x5mm), TSMC 28nm (1x1.25mm), and TSMC 180nm (2x2.5mm). Chip tapeouts in prep or being tested: CIFER, HammerBlade, BRGTC5, and OC-FPGA. Processes: TSMC 180nm, 28nm, 16nm; SkyWater 130nm; GF 130nm, 12nm; Intel 22FFL.]

▶ Simple RISC-V cores
▶ Coarse-grain reconfigurable arrays
▶ Clustered manycore architectures
▶ Mesh on-chip networks
▶ Crossbar interconnects


BRG Test Chip 1 (2016)

[Figure: block diagram with differential (LVDS) and single-ended clock inputs, an LVDS clock receiver, clock divider and clock trees, divided clock outputs, debug and reset signals, a host interface (host2chip, chip2host), control registers, a RISC core, a sorting accelerator, a memory arbitration unit, and SRAM banks (2KB each)]

Post-Silicon Evaluation Strategy: The testing platform enables running small test programs on BRGTC1 to compare the performance and energy of pure-software kernels versus the HLS-generated sorting accelerator.

Taped-out Layout for BRGTC1: 2x2mm, 1.3M transistors in IBM 130nm; RISC processor, 16KB SRAM, HLS-generated accelerators; static timing analysis frequency @ 246 MHz.


Celerity System-on-Chip Overview (2017)

Target Workload: High-Performance Embedded Computing
▶ 5 × 5mm in TSMC 16 nm FFC
▶ 385 million transistors
▶ 511 RISC-V cores
▷ 5 Linux-capable Rocket cores
▷ 496-core tiled manycore
▷ 10-core low-voltage array
▶ 1 BNN accelerator
▶ 1 synthesizable PLL
▶ 1 synthesizable LDO Vreg
▶ 3 clock domains
▶ 672-pin flip chip BGA package
▶ 9 months from PDK access to tape-out


BRG Test Chip 2 (2018)

[Figure: block diagram with host interface, 32KB L1 instruction cache, instruction memory arbiter, data memory arbiter, 32KB L1 data cache, LLFU arbiter, integer multiply/divide unit, FPU, and synthesizable PLL]

Block Diagram: 4xRV32IMAF cores with "smart" sharing of L1$/LLFU, synthesizable PLL
Taped-out Layout for BRGTC2: 2x2mm, 1.2M transistors, IBM 130nm; static timing analysis frequency @ 500MHz


BRG Test Chip 3/4 (2020/2021)

ECE 5745 Alumni Tape-Out!
• 2x2.5mm, TSMC 180nm
• SPI minion interface
• Open-source FPU
• Synthesizable digital clock generator
• BRGTC3 had a hold-time issue in the SPI minion
• BRGTC4 fully functional


ECE 5745 Teaching Tapeout (2022)

▶ First teaching tapeout in 10 years
▷ SkyWater 130nm through efabless
▷ Taped out using completely open-source EDA tools!
▶ Four student projects
▷ CRC32 checksum unit implemented using C++ HLS
▷ Latency-insensitive synthesizable memory implemented in PyMTL3
▷ 2x2 systolic array multiplier implemented in SystemVerilog
▷ Greatest common divisor unit implemented in SystemVerilog
▷ Each unit included a dedicated SPI interface


BRG Test Chip #5 (2022)

RISC-V RV32IM core with 32 KB of SRAM
SPI minion for config; SPI master and GP I/O for peripherals
2x2.5mm, TSMC 180nm
100% done using PyMTL3 by ECE 5745 alumni


BRGTC5 Fall 2022

[Excerpt from a BRGTC5 report]

… seen in Figure 68. This comparison is done at 66 MHz with a 3.3 V core voltage, since these are the conditions under which the processor is simulated.

Figure 68. Measured and simulated energy per instruction breakdown at 66 MHz clock and 3.3 V core voltage. Both measurements include static power.

As can be seen in the figure, the simulated energy per instruction is quite close to the post-silicon measured energy use. This gives us confidence in the tools' ability to provide accurate energy and power estimations based on the post-place-and-route design. This figure also confirms the theory that load and store instructions use significantly less energy in the ALU than the add and addi instructions.

6.6: Performance
Evaluating the performance of the chip involves exploring the trade-offs between energy and latency for programs run on the processor. The shmoo plot in Figure 69 illustrates the range of frequencies at which the chip can operate for different core voltage levels.
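Energy per instruction relates to measured power through a first-order identity: at clock frequency f and IPC instructions per cycle, the processor completes f × IPC instructions per second, so E = P / (f × IPC). The Python sketch below applies this; only the 66 MHz clock comes from the excerpt, while the 10 mW power figure and IPC of 1 are hypothetical values chosen for illustration.

```python
def energy_per_instruction(avg_power_w: float, freq_hz: float,
                           ipc: float = 1.0) -> float:
    """First-order energy per instruction: E = P / (f * IPC).

    avg_power_w should include static (leakage) power, matching how the
    measured and simulated numbers in the excerpt were reported.
    """
    if freq_hz <= 0 or ipc <= 0:
        raise ValueError("frequency and IPC must be positive")
    return avg_power_w / (freq_hz * ipc)


# Hypothetical operating point: 10 mW total power at 66 MHz, IPC = 1
epi = energy_per_instruction(10e-3, 66e6)
print(f"{epi * 1e12:.1f} pJ/instruction")  # 151.5 pJ/instruction
```

Because static power is included, its share of each instruction's energy grows as frequency drops (P_static / f), so an energy-per-instruction number is only meaningful together with its operating point, here 66 MHz and 3.3 V.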
Take-Away Points

▶ Complex digital ASIC design is the process of quantitatively exploring the area, cycle time, execution time, and energy trade-offs of general-purpose and application-specific designs using automated standard-cell CAD tools, and then transforming the most promising design into layout ready for fabrication
▶ The course provides an excellent foundation for students interested in pursuing a career in industry developing ASICs, and can provide useful experience with cross-layer interaction for students interested in pursuing research in computer architecture or circuits