ASPLOS 2021 - Golden Age of Compilers
International Conference on
Architectural Support for Programming Languages and
Operating Systems (ASPLOS 2021)
Chris Lattner
SiFive Inc
April 19, 2021
Video Recording
Let’s talk compilers + accelerators
● Classical Compiler Design
● Modular Compiler Infrastructure
● Domain Specific Architectures
● Accelerator Compilers
● Silicon Compilers
● A Golden Age of Compilers
A New Golden Age for Computer Architecture
John L. Hennessy, David A. Patterson; June 2018
[Chart: processor performance growth over time; in the RISC era, performance grew 2x every 1.5 years (52%/yr)]
[cite] Turing Lecture, Hennessy, Patterson; June 2018 / CACM Feb 2019
HW / SW co-design is the best way to expose parallelism of silicon... and utilize it
Let’s learn from the past, then project into the future 🚀
Classical Compiler Design
C Compilers leading into the early 90s
One frontend for many backends, one backend for many frontends
Lessons Learned
Achieved “O(frontend+backends)” scalability of compiler ecosystem
[Diagram: the Clang/LLVM pipeline: a frontend (Parser, Sema, IRGen, ...) feeding a target-independent Optimizer (CSE, DCE, LICM, IPCP, dataflow passes, ...) and CodeGen/Emitter backends (X86, RISC-V, SPARC, ...) producing .s / .o output, with JIT and tooling built from the same libraries]
Key insight: Compilers as libraries, not an app!
● Enable embedding in other applications
● Mix and match components
● No hard-coded lowering pipeline (see the sketch below)
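As a small, concrete illustration of the library design: with LLVM's C++ API and the new pass manager, the embedding application assembles its own optimization pipeline rather than invoking a fixed one. This is a sketch of ordinary upstream API usage, not code from the talk:

#include "llvm/IR/Module.h"
#include "llvm/Passes/PassBuilder.h"

using namespace llvm;

// Optimize a module with a pipeline the embedding application chooses itself.
void optimizeModule(Module &M) {
  PassBuilder PB;

  // Analysis managers cache results shared between passes.
  LoopAnalysisManager LAM;
  FunctionAnalysisManager FAM;
  CGSCCAnalysisManager CGAM;
  ModuleAnalysisManager MAM;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  // Mix and match: start from the stock -O2 pipeline, or assemble a custom
  // one pass-by-pass (or textually via PB.parsePassPipeline).
  ModulePassManager MPM =
      PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
  MPM.run(M, MAM);
}

The same libraries back clang, JITs, and out-of-tree tools, which is what makes this kind of mix-and-match reuse possible.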
Components and interfaces!
Better than monolithic approaches for large scale designs:
- Easier to understand and document components
- Easier to test
- Easier to iterate and replace
- Easier to subset
- Easier to scale the community
Lessons Learned
Larger center of gravity concentrated scarce compiler engineering effort
● Enables innovations in languages, frontends and backends
Limitations of LLVM
20 years in perspective on LLVM:
● “One size fits all” quickly turns into “one size fits none”
● LLVM is: 👍 CPUs, “just ok” 👈 for SIMT, but 👎 for many accelerators
● … is not great for parallel programming models 💩
Domain Specific Architectures
[cite] Hennessy, Patterson; June 2018 / CACM Feb 2019
It’s happening!
[Diagram: the specialization spectrum: CPU, GPGPU, TPU/NPU, FPGA/CPLD, ASIC]
We need some unifying theories!
We need:
● “O(frontend+backends)” scalability of compiler ecosystem
● A larger center of gravity to concentrate scarce compiler engineering effort
● Reduced 💢 fragmentation:
    ● Ability to innovate in the programming model
    ● ... without reinventing the whole stack
Accelerator Compilers
How do accelerators work?
[Diagram: a typical accelerator: a Control Processor / “Sequencer” driving an array of Parallel Compute Units behind a memory / bus interface (DDR, HBM, … AMBA, PCI, CXL, etc., depending on integration level)]
The Control Processor / “Sequencer”:
● Executes commands issued by the host driver app
● Handles booting and other housekeeping
● Diagnostics, security, debug, other functions
Some accelerators may do significantly more! (A hypothetical sketch of the host/sequencer command interface follows.)
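To make the first bullet concrete, here is a purely illustrative sketch of a host-driver/sequencer command interface: the host enqueues commands into a shared ring buffer, and the control processor pops and executes them. No real device API is assumed; every type and name below is hypothetical.

#include <array>
#include <atomic>
#include <cstdint>

// Hypothetical command format understood by the sequencer firmware.
enum class Opcode : uint32_t { LaunchKernel, CopyToDevice, CopyFromDevice, Barrier };

struct Command {
  Opcode op;
  uint64_t args[4];  // e.g. kernel entry point, buffer addresses, sizes
};

// Single-producer / single-consumer ring shared by the host driver
// (producer) and the control processor (consumer).
struct CommandQueue {
  std::array<Command, 256> ring;
  std::atomic<uint32_t> head{0};  // advanced by the host driver
  std::atomic<uint32_t> tail{0};  // advanced by the sequencer firmware

  // Host side: submit one command; returns false if the queue is full.
  bool submit(const Command &c) {
    uint32_t h = head.load(std::memory_order_relaxed);
    if (h - tail.load(std::memory_order_acquire) == ring.size())
      return false;
    ring[h % ring.size()] = c;
    head.store(h + 1, std::memory_order_release);
    return true;
  }
};

On the device side, the sequencer firmware would spin on tail != head, decode each Command, and dispatch work to the parallel compute units; the software stack discussed on the next slides exists largely to generate and manage this kind of traffic.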
“Oops we need some software”
[Diagram: Hardware (the accelerator with its Memory / Bus Interface) on one side, Software (Device Kernels) on the other]
The SW people are called in after the accelerator is defined to “make it work”
Larger accelerators go multicore/SIMT...
[Diagram: Hardware: many Control Processors / “Sequencers”, each with arrays of Parallel Compute Units, behind a Memory / Bus Interface. Software: Programming Model + Userspace API; Multistream Mgmt / Interop Parallelism; Memory + Communication Optimization; Heterogeneous Device + Host fallback; Device Kernels; Control Proc Assembler + Kernel Driver]
⇒ Also, hierarchical compute at the board, rack, and datacenter level
Pros & Cons of hand-written kernels
Benefits:
● Easy to get started, ability to get peak performance, hackability
[Diagram: the same hardware, with the software stack now including an Accelerator Kernel Compiler: Multistream Mgmt / Interop Parallelism; Memory + Communication Optimization; Heterogeneous Device + Host fallback; Kernel Code Generation; plus the Control Proc Assembler + Kernel Driver]
This is hard!
… and we keep reinventing it over and over again
… at the expense of usability and quality
Mostly needless reinvention, not co-design!
[Diagram: the full hardware/software stack again: Programming Model + Userspace API; Accelerator Kernel Compiler (Multistream Mgmt / Interop Parallelism, Memory + Communication Optimization, Heterogeneous Device + Host fallback, Kernel Code Generation); Control Proc Assembler + Kernel Driver]
Most complexity is in non-differentiated (table stakes) components
Innovate where it matters
… use open standards to accelerate the rest
Industry already standardized the buses
[Diagram: the accelerator’s Memory / Bus Interface and Parallel Compute Units, with standard RISC-V cores (2-Series, 5-Series, 7-Series, 8+ Series) serving as the control processors]
Write your kernels in C or LLVM IR!
● Use existing code generators
● Use existing simulators
● Step through them in a debugger (a minimal sketch follows)
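A minimal sketch of what that workflow looks like, assuming a RISC-V control processor and an upstream clang cross-compiler (the file name and flags are illustrative):

// saxpy.cpp: an ordinary C/C++ kernel targeting the control processor.
// Cross-compile with the stock RISC-V backend, e.g.:
//   clang++ --target=riscv64-unknown-elf -march=rv64gc -O2 -c saxpy.cpp
// The resulting code runs on existing RISC-V simulators (e.g. Spike, QEMU)
// and can be stepped through with a standard debugger.
extern "C" void saxpy(float a, const float *x, float *y, int n) {
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}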
https://mlir.llvm.org
See more (e.g.):
2020 CGO Keynote Talk Slides
2021 CGO Paper
RISC-V+MLIR: Uniting an Industry
[Diagram: the specialization spectrum again: CPU, GPGPU, TPU/NPU labeled “Programmable xPUs”; FPGA, CPLD, ASIC labeled “Custom Hardware”; with a “??” gap between the two]
Building Parallel Compute Units?
[Diagram: the accelerator (RISC-V processor, Parallel Compute Unit, Memory / Bus Interface) alongside open hardware design tools such as nextpnr]
Innovation Explosion Underway!
Research is producing new HW design models and abstraction approaches
[Slide shows example research projects, including Magma and Dahlia]
Goals:
● Unite HW design tools community
● “Accelerate” design of the accelerators!
https://circt.llvm.org
CIRCT Ambition / Path Ahead
Support multiple different “hardware design models” in one framework:
● Generators, HLS, atomic transactions, ...
Increase abstraction level in the hardware design IR:
● Integrate modern type system features from the SW world
● Capture more design intent, higher level verification and tools
● Better integrate formal methods into the design flow
Increase quality of the tools themselves:
● Compile time: shrink development cycle time
● Usability: robust location tracking for good error messages
We’re hiring!
Get involved!
https://mlir.llvm.org/
https://circt.llvm.org/
Too much content, skip this section
https://openai.com/blog/ai-and-compute