ASPLOS 2021 - Golden Age of Compilers

The Golden Age of Compilers

in an era of Hardware/Software co-design

International Conference on
Architectural Support for Programming Languages and
Operating Systems (ASPLOS 2021)

Chris Lattner
SiFive Inc
April 19, 2021
Video Recording
Let’s talk compilers + accelerators
● Classical Compiler Design
● Modular Compiler Infrastructure
● Domain Specific Architectures
● Accelerator Compilers
● Silicon Compilers
● A Golden Age of Compilers
A New Golden Age for Computer Architecture
John L. Hennessy, David A. Patterson; June 2018

[Figure: processor performance over time; RISC era: 2X / 1.5 yrs (52%/yr)]

[cite] Turing Lecture, Hennessy, Patterson; June 2018 / CACM Feb 2019
HW / SW co-design is the best way to expose parallelism of silicon... and utilize it

[cite] Hennessy, Patterson; June 2018 / CACM Feb 2019

🧐 How do we program these things?
[cite] Hennessy, Patterson; June 2018 / CACM Feb 2019
Hardware is getting harder
Modern compute acceleration platforms are multi-level and explicit:
● Scalar, SIMD/Vector, Multi-core, Multi-package, Multi-rack
● Non-coherent memory subsystems increase efficiency

Heterogeneous compute incorporating domain-specific accelerators

● Standard in high-end SoCs, domain-specific hard blocks in FPGAs

Many accelerator IPs are configurable:

● Optional extensions, tile / core count, memory hierarchy, etc

🤯 How can “normal people” write software for this in the first place?

… and how can you afford to build generation-specific SW?
Next-Gen compilers and PL are needed!
We need:
● hardware abstraction spanning diverse accelerators
● support for heterogeneous compute platforms
● domain specific languages and programming models
● quality, reliability, and scalability of infrastructure

This opportunity is beckoning a golden age in compiler and PL technology!

Let’s learn from the past, then project into the future 🚀
Classical Compiler Design
C Compilers leading into the early 90s

IBM

⇒ Expensive, not very compatible, inconsistencies abound

● … and didn’t share any code
Also: Came in boxes, with printed manuals, often on ﬂoppy disks!
Three Phase Compiler Design

Figure 11.1: Three Major Components of a Three-Phase Compiler

[cite] LLVM @ The Architecture of Open Source Applications

FOSS Enables Collaboration & Reuse

Figure 11.2: Retargetability

One frontend for many backends, one backend for many frontends
Lessons Learned
Achieved “O(frontend+backends)” scalability of compiler ecosystem

Larger center of gravity concentrated scarce compiler engineering effort

● Enables innovations in languages, frontends and backends

Reduced 💢fragmentation, standardized “C in practice”

● Enabled new business models 💸
● Untied the CPU ISA war from inconsequential impl details
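
The scalability claim comes down to simple counting. The sketch below is only an illustration: it assumes a hypothetical ecosystem with M source languages and N targets, and the function names are invented here rather than taken from the talk.

```python
def monolithic_compilers(num_frontends: int, num_backends: int) -> int:
    """Without a shared IR, every (language, target) pair needs its own compiler."""
    return num_frontends * num_backends


def shared_ir_components(num_frontends: int, num_backends: int) -> int:
    """With a shared IR, each frontend and each backend is written once."""
    return num_frontends + num_backends


if __name__ == "__main__":
    for m, n in [(3, 3), (10, 20)]:
        print(f"{m} frontends, {n} backends: "
              f"{monolithic_compilers(m, n)} compilers vs "
              f"{shared_ir_components(m, n)} components")
```

The gap between the product and the sum widens as either count grows, which is why a shared middle layer concentrates engineering effort.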
Modular Compiler Infrastructure
Library Based Design

[Figure: library-based compiler pipeline. Clang Frontend (Parser, Sema, IRGen, Tooling, ...) → Optimizer (Dataflow, LICM, IPCP, DCE, CSE, ...) → CodeGen (X86, RISC-V, SPARC, ...) → Emitter (.o, .s, JIT)]
Key insight: Compilers as libraries, not an app!
● Enable embedding in other applications
● Mix and match components
● No hard coded lowering pipeline
LLVM
Components and interfaces!
Better than monolithic approaches for large scale designs:
- Easier to understand and document components
- Easier to test
- Easier to iterate and replace
- Easier to subset
- Easier to scale the community
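
To make "compilers as libraries" concrete, here is a minimal sketch of passes written as ordinary functions that a client composes in whatever order it likes. The three-field toy IR and every name in it are assumptions made up for this illustration; none of it is LLVM's actual API.

```python
from typing import Callable, List, Optional, Tuple

# Toy SSA-style IR, invented for this sketch: (dest, op, args).
# "const" carries a literal in args; "add" and "ret" carry value names.
Instr = Tuple[Optional[str], str, tuple]
Pass = Callable[[List[Instr]], List[Instr]]


def constant_fold(code: List[Instr]) -> List[Instr]:
    """Rewrite an 'add' of two known constants into a 'const'."""
    known = {}
    out = []
    for dest, op, args in code:
        if op == "const":
            known[dest] = args[0]
        elif op == "add" and all(a in known for a in args):
            known[dest] = known[args[0]] + known[args[1]]
            op, args = "const", (known[dest],)
        out.append((dest, op, args))
    return out


def dead_code_elim(code: List[Instr]) -> List[Instr]:
    """Backward scan that keeps only instructions reachable from 'ret'."""
    live = set()
    kept = []
    for dest, op, args in reversed(code):
        if op == "ret" or dest in live:
            kept.append((dest, op, args))
            if op != "const":  # const args are literals, not value names
                live.update(args)
    kept.reverse()
    return kept


def run_pipeline(code: List[Instr], passes: List[Pass]) -> List[Instr]:
    """The client, not the library, decides which passes run and in what order."""
    for p in passes:
        code = p(code)
    return code


PROGRAM: List[Instr] = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),
    ("d", "add", ("a", "a")),  # result is never used
    (None, "ret", ("c",)),
]

if __name__ == "__main__":
    print(run_pipeline(PROGRAM, [constant_fold, dead_code_elim]))
    print(run_pipeline(PROGRAM, [dead_code_elim]))
```

Because each pass is a plain function over the same data structure, a client can subset, reorder, or replace passes without touching the others, which is the property the bullets above describe.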

Lessons Learned
Larger center of gravity concentrated scarce compiler engineering effort
● Enables innovations in languages, frontends and backends

Reduced 💢fragmentation of JIT compilers, standardized CPU codegen

● Enabled new business models
● Databases, graphics shader compilers, GPGPU, EDA HLS tools, …
Scalable community architecture:
● Design methodology / developer policies
● Community policies: inclusion, licensing, extensions etc

Limitations of LLVM
20 years in perspective on LLVM:
● “One size fits all” quickly turns into “one size fits none”
● LLVM is: 👍 CPUs, “just ok” 👈 for SIMT, but 👎 for many accelerators
● … is not great for parallel programming models 💩

Engineering is “pretty good” but could be better:

● Lots of redundancy/reimplementation @ different levels of abstraction
● Deeper discussion @ CGO 2020 talk

Going beyond basic CPUs means going beyond LLVM IR!

Domain Specific Architectures
[cite] Hennessy, Patterson; June 2018 / CACM Feb 2019
It’s happening!
CPU, etc. GPGPU, etc. TPU, NPU, etc. FPGA, CPLD, etc. ASIC

Programmable xPUs Custom Hardware

Specialization

[cite] Applying Circuit IR Compilers and Tools (CIRCT) to ML Applications, Mike Urbach, MLSys Chips And Compilers Symposium 2021
Lots of players! (an incomplete list!)

CPU, etc. GPGPU, etc. TPU, NPU, etc. FPGA, CPLD, etc. ASIC

Programmable xPUs Custom Hardware

How do we compile for this?

⇒ Not very compatible, inconsistent quality and scope

● … and don’t share much code
We’ve seen this before!

IBM
We need some unifying theories!
We need:
● “O(frontend+backends)” scalability of compiler ecosystem
● Larger center of gravity concentrated scarce compiler engineering effort
● Reduced 💢fragmentation:
● Ability to innovate in the programming model
● ... without reinventing the whole stack
Accelerator Compilers
How do accelerators work?
Control Processor / Sequencer
[Diagram: Control Processor / “Sequencer” driving a Parallel Compute Unit]
● Executes commands issued by the host driver app
● Handles booting and other housekeeping
● Diagnostics, security, debug, other functions
Some accelerators may do significantly more!

Ratio of control to parallel compute varies, as do the internal arch’s of both

Add a system interface
[Diagram: Memory / Bus Interface; Control Processor / “Sequencer”; Parallel Compute Unit]

Communicate with other parts of the SoC, or with off-chip resources

Including DDR, HBM, … AMBA, PCI, CXL, etc depending on integration level
“Oops we need some software”
[Diagram, two columns]
Hardware: Memory / Bus Interface; Control Processor / “Sequencer”; Parallel Compute Unit
Software: Programming Model + Userspace API; Device Kernels; Control Proc Assembler + Kernel Driver

The SW people are called in after the accelerator is defined to “make it work”
Larger accelerators go multicore/SIMT...
[Diagram, two columns]
Hardware: Memory / Bus Interface; multiple cores, each a Control Processor / “Sequencer” with a Parallel Compute Unit
Software: Programming Model + Userspace API; Parallel Device Kernels; Control Proc Assembler + Kernel Driver

Use of more HW area is desired, requiring parallel control logic

Tiling and heterogeneity for generality
[Diagram, two columns]
Hardware: Memory / Bus Interface; several tiles, each containing multiple Control Processor / “Sequencer” + Parallel Compute Unit cores
Software: Programming Model + Userspace API; Multistream Mgmt / Interop Parallelism; Memory + Communication Optimization; Heterogeneous Device + Host fallback; Parallel Device Kernels; Control Proc Assembler + Kernel Driver
⇒ Also, hierarchical compute at the board, rack, and datacenter level
Pros & Cons of hand-written kernels
Benefits:
● Easy to get started, ability to get peak performance, hackability

Problem: hand-written kernels don’t scale

● Expensive to maintain a library of 100’s to 1000’s of kernels
● Don’t scale to configurable IPs, not even memory hierarchy dimensions
● Don’t scale to device families, or evolving µarch’s over time
● Eventually end up limiting HW design space exploration / evolution

Often addressed with metaprogramming (aka “mini compilers”)
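
A minimal sketch of what such a “mini compiler” can look like: a Python function that emits C source for an axpy kernel specialized to an unroll factor. The kernel, its naming scheme, and the choice of unrolling as the tuning knob are all assumptions picked for illustration; real generators parameterize far more (tiling, memory hierarchy, data layout).

```python
def emit_axpy_kernel(unroll: int, dtype: str = "float") -> str:
    """Emit C source for y = a*x + y, with the loop body unrolled `unroll` times.

    Assumes the caller pads n to a multiple of `unroll`, so no remainder
    loop is generated (a simplification for this sketch).
    """
    if unroll < 1:
        raise ValueError("unroll must be >= 1")
    body = "\n".join(
        f"        y[i + {k}] = a * x[i + {k}] + y[i + {k}];"
        for k in range(unroll)
    )
    return (
        f"void axpy_u{unroll}(int n, {dtype} a, const {dtype} *x, {dtype} *y) {{\n"
        f"    for (int i = 0; i < n; i += {unroll}) {{\n"
        f"{body}\n"
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # One generator stands in for a family of hand-written variants.
    for factor in (1, 4):
        print(emit_axpy_kernel(factor))
```

Even this small generator hints at the trade-off on the slide: it replaces a library of near-duplicate kernels, but it is already a compiler in miniature with none of a compiler's infrastructure.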

“DSA Compilers” to the rescue
[Diagram, two columns]
Hardware: Memory / Bus Interface; several tiles, each containing multiple Control Processor / “Sequencer” + Parallel Compute Unit cores
Software: Programming Model + Userspace API; Accelerator Kernel Compiler (Multistream Mgmt / Interop Parallelism; Memory + Communication Optimization; Heterogeneous Device + Host fallback; Kernel Code Generation); Control Proc Assembler + Kernel Driver
This is hard!
… and we keep reinventing it over and over again
… at the expense of usability and quality
Mostly needless reinvention, not co-design!
[Diagram: tiled accelerator hardware (Memory / Bus Interface; Control Processor / “Sequencer” + Parallel Compute Unit cores) beside the software stack (Programming Model + Userspace API)]

PCI, HBM, DDR, CXL, AMBA, etc are all standardized

Standardize the Control Processor?
[Diagram: Control Processor / “Sequencer” connected through a Standard Interface to a Parallel Compute Unit]

SW is a bigger problem than HW for accelerators:
● Control processor is bottom of the SW stack

We fool ourselves into building trivial CPUs:
● it can seem fun to design a new solution here
● … except reset, debug, power management, security, etc (the hard parts!)

“Saving a few gates” slows down what matters

● … and hobbles the critical path: software
[cite] Hennessy, Patterson; June 2018 / CACM Feb 2019
Open Industry Standard
● Many implementations available

🧩 Modular and subset-able ISA design:

● Extensibility allows easy addition of heterogeneous units

Scalability allows full spectrum of design points!

[Figure: spectrum of design points built from RISC-V 2-Series, 5 Series, 7 Series, and 8+ Series cores paired with Parallel Compute Units. Labels: Hard Coded Accelerator, Programmable Accelerator, Heterogeneous Workload Accelerator, General Purpose CPU]
“15,000 gates? I can’t count that low!”
Cliff Young, TPU architect, Google Brain
MLSys 2021 Chips and Compilers Symposium Panel
Standardize your base Software
[Diagram: RISC-V Control Processor connected through a Standard Interface to a Parallel Compute Unit; software layer: RISC-V Compiler + Kernel Drivers]

Write your kernels in C or LLVM IR!
● Use existing code generators
● Use existing simulators
● Step through them in a debugger

The next frontier: DSA Compilers?

“No one size fits all” compiler!

Shape of the problem is the same...
… but the accel details always vary

How do we get reuse?

[Diagram: Accelerator Kernel Compiler (Multistream Mgmt / Interop Parallelism; Memory + Communication Optimization; Heterogeneous Device + Host fallback; Kernel Code Generation) layered on the RISC-V Software Ecosystem]
MLIR: Compiler Infra at the End of Moore’s Law
● Multi-Level Intermediate Representation
● Joined LLVM, follows open library-based philosophy
● 🧩 Modular, extensible, general to many domains
○ Being used for CPU, GPU, TPU, FPGA, HW, quantum, ....
● Easy to learn, great for research
● MLIR + LLVM IR + RISC-V CodeGen = 💝💝

https://mlir.llvm.org
See more (e.g.):
2020 CGO Keynote Talk Slides
2021 CGO Paper
RISC-V+MLIR: Uniting an Industry
CPU, etc. GPGPU, etc. TPU, NPU, etc. FPGA, CPLD, etc. ASIC

Programmable xPUs Custom Hardware

What is the benefit of this?
Larger center of gravity concentrated scarce compiler engineering effort
● Enables innovations in programming models and hardware

Achieved “O(frontend+backends)” scalability of compiler ecosystem

Reduced 💢fragmentation, improved 🧩 modularity

● Focus on the differentiated parts of the stack
But… what about hardware?
HW design is fragmented too
CPU, etc. GPGPU, etc. TPU, NPU, etc. FPGA, CPLD, etc. ASIC

??
Programmable xPUs Custom Hardware
Building Parallel Compute Units?
[Diagram: Memory / Bus Interface; RISC-V Processor; Parallel Compute Unit]

Notice how I conveniently omitted how to build the “interesting” part!
Silicon Compilers
Hardware Design is ripe with opportunity
SystemVerilog is industry standard, but:
● Huge, complicated, incompletely implemented
● Is it an IR? or programming language for humans? neither? both?

EDA tools are mature, but not always:

● … innovating rapidly, now that process technology has slowed
● … designed for usability
● … using best practices in SW architecture
● … cost eﬃcient
Open Source tools to the rescue?
Wonderful ecosystem of Open Source tools, but:
● Generally aspiring to be “as good” as proprietary tools
● Fragmented communities, not sharing much code
● Monolithic designs connected by unfortunate standards

nextpnr
Innovation Explosion Underway!
Research is producing new HW design models and abstraction approaches

Magma

Dahlia

A great opportunity to pull PL + type system + compiler tech from SW world...

… held back by poor interop standards and ecosystem 💢fragmentation
See also: ASPLOS LATTE’21 Workshop
CIRCT: Circuit IR for Compilers and Tools
Compiler infrastructure for design and verification

● LLVM incubator project built on MLIR & LLVM

● Composable toolchain for different aspects of hardware design / EDA processes
● 🧩 Modularity, library based design, ecosystem
● High quality, usability, performance

Goals:
● Unite HW design tools community
● “Accelerate” design of the accelerators!
https://circt.llvm.org
CIRCT Ambition / Path Ahead
Support multiple different “hardware design models” in one framework:
● Generators, HLS, atomic transactions, ...
Increase abstraction level in the hardware design IR:
● Integrate modern type system features from the SW world
● Capture more design intent, higher level verification and tools
● Better integrate formal methods into the design flow
Increase quality of the tools themselves:
● Compile time: shrink development cycle time
● Usability: robust location tracking for good error messages

“10x” design and verification, change economics of hardware design

Co-design of HW and SW design
CPU, etc. GPGPU, etc. TPU, NPU, etc. FPGA, CPLD, etc. ASIC

Programmable xPUs Custom Hardware

A Golden Age of Compilers
in an era of Hardware/Software co-design
Compiler/PL tech more important than ever!
The world is evolving fast at the “End of Moore’s Law”
● Changing assumptions, expanding possibilities
HW changes require new programming models and approaches:
● … and is validating well known but sparsely adopted techniques
We need compiler and PL experts to step up!

We’re hiring!
Get involved!
https://mlir.llvm.org/
https://circt.llvm.org/
Too much content,
skip this section

Frontiers in Compiler Architecture

Concurrency within the compiler
Parallel for each is not enough
Caching
Why are we rerunning N^2 and NP-complete algorithms from scratch when their inputs aren’t always changing??

Need to design the compiler for this from the beginning
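
One way to picture the caching idea is content-addressed memoization: key each pass result by a hash of the pass input, and skip the work when the input has not changed. The class below is a hedged sketch under strong assumptions (the pass is a pure function of a single string, and the cache lives in memory); `PassCache` and its fields are names invented for this example.

```python
import hashlib
from typing import Callable, Dict


class PassCache:
    """Memoize a pure compiler pass by the SHA-256 of its input text."""

    def __init__(self, pass_fn: Callable[[str], str]) -> None:
        self._pass_fn = pass_fn
        self._results: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def run(self, unit_source: str) -> str:
        key = hashlib.sha256(unit_source.encode("utf-8")).hexdigest()
        if key in self._results:
            self.hits += 1  # unchanged input: reuse the earlier result
            return self._results[key]
        self.misses += 1  # new or changed input: run the pass for real
        result = self._pass_fn(unit_source)
        self._results[key] = result
        return result


if __name__ == "__main__":
    # str.upper stands in for an expensive analysis or transformation.
    cache = PassCache(str.upper)
    for text in ["int x;", "int x;", "int y;"]:
        cache.run(text)
    print(cache.hits, cache.misses)
```

The hard part in a real compiler is the assumption of purity: passes that read global state or other units need their full set of inputs captured in the key, which is why the design has to account for caching from the start.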

Distribution
Distributed systems vs compilers. One of the heaviest workloads, why are we doing this??

Build system + compilers dichotomy is terrible.

No distributed system person would ever build things this way.

Many problems are embarrassingly parallel here.
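
As a small illustration of the embarrassingly parallel case, independent translation units can be handed to a worker pool with no coordination between them. This is a sketch, not a build system: `compile_unit` is a placeholder for a real compiler invocation, and a thread pool is used only to keep the example self-contained (CPU-bound work in Python would normally use processes or remote workers).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def compile_unit(source: str) -> str:
    """Stand-in for compiling one translation unit; returns a fake object label."""
    return f"obj({len(source)} bytes)"


def compile_all(units: Dict[str, str], max_workers: int = 4) -> Dict[str, str]:
    """Compile independent units concurrently and map each name to its result."""
    names = list(units)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so zip stays aligned.
        objects = list(pool.map(compile_unit, (units[n] for n in names)))
    return dict(zip(names, objects))


if __name__ == "__main__":
    print(compile_all({"a.c": "int x;", "b.c": "int main(void) { return 0; }"}))
```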

Extensible Compilers
Call back into the generator as part of lowering.
Computational Demands of Machine Learning

https://openai.com/blog/ai-and-compute
