
RL4ReAl: Reinforcement Learning for Register Allocation

S. VenkataKeerthy¹, Siddharth Jain¹, Anilava Kundu¹, Rohit Aggarwal¹, Albert Cohen², Ramakrishna Upadrasta¹

¹IIT Hyderabad, ²Google

LLVM Performance Workshop


25th February 2023
Register allocation

● Registers are scarce!
○ Unbounded set of variables → finite set of registers

● One of the classic NP-hard problems
○ Reducible to graph coloring

● Solutions
○ Constraint-based: ILP and PBQP formulations
○ Heuristic approaches

● LLVM - 4 register allocators


○ Constraint-based: PBQP
○ Heuristic: Greedy, Basic, Fast

LLVM’s Register Allocation Strategies and Heuristics

[Figure: live-range diagrams illustrating the four heuristics: Splitting (x split into x1, x2), Coalescing, Spilling (to memory M), and Eviction (from register R1)]
● No single best allocator
○ Greedy performs better in general

● Greedy Allocator Heuristics - Splitting, Coalescing, Eviction and Spilling


● PBQP Allocator Heuristics - Coalescing and Spilling
What makes ML-based Register Allocation difficult?

● Complex problem with multiple sub-tasks


○ Splitting, Spilling, Coalescing, etc.
● ML schemes should ensure correctness
○ Register type constraints
○ Live range constraints
● Integration of ML solutions with compiler frameworks
○ Python ↔ C++

Proposal - RL4ReAl: Reinforcement Learning for Register Allocation

RL4ReAl: Objectives
Objectives: Machine Learning Framework for Register Allocation
● End-to-end application of Reinforcement Learning for register allocation

● Semantically correct code generation


○ Without resorting to a correction phase
○ Correctness constraints imposed on action space

● Multi-architecture support

Can an ML model match or outperform half-a-century-old heuristics?

Constraints in Register Allocation

Register Allocation: Correctness constraints
Registers are complicated!

1. Register constraints

2. Type constraints

3. Congruence constraints

4. Interference constraints

Register Constraints
● Architectural constraints
○ E.g., IDIV32 → divides the contents of $edx:$eax; the quotient goes to $eax, the remainder to $edx

● Register allocation ⇒ allocating the virtual registers left over after such constraints

Type constraints
● Different types of registers in a register file
○ General purpose registers
○ Floating point registers
○ Vector registers, …

● Variable type compatibility with the register type

Congruence constraints
● Real-world ISAs have a hierarchy of register classes
○ Congruent classes

Figure source: Wikipedia


Interference constraints
Register allocation ⇒ Graph coloring problem

x = 10
y = 20
print x
z = 20 + y
print y
z = z + 10
print z

[Figure: live intervals and the resulting interference graph: y interferes with both x and z, so coloring with two registers yields x → R1, y → R2, z → R1. Available registers: R1 (green), R2 (blue)]
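To make the mapping to graph coloring concrete, here is a minimal Python sketch (not the paper's code) that derives the interference graph above from hand-written live intervals and greedy-colors it with two registers:

# Minimal sketch: interference from overlapping live intervals,
# then greedy coloring. Intervals are hand-derived from the
# example above (instruction indices 0..6).
live = {"x": (0, 2), "y": (1, 4), "z": (3, 6)}
registers = ["R1", "R2"]

def interferes(a, b):
    # Two variables interfere if their live intervals overlap.
    (s1, e1), (s2, e2) = live[a], live[b]
    return s1 < e2 and s2 < e1

edges = {v: {u for u in live if u != v and interferes(u, v)} for v in live}

coloring = {}
for v in live:  # a fixed visit order stands in for real heuristics
    taken = {coloring[u] for u in edges[v] if u in coloring}
    free = [r for r in registers if r not in taken]
    coloring[v] = free[0] if free else "SPILL"

print(coloring)  # {'x': 'R1', 'y': 'R2', 'z': 'R1'}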


RL4ReAl: Reinforcement Learning for Register Allocation
[Figure: RL4ReAl architecture: the LLVM environment (MLRegAlloc pass) and the RL framework communicate through gRPC stubs, exchanging interference graphs, split info, and model updates]
Interference graphs
Edges: {physical reg - virtual reg, virtual reg - virtual reg}

Vertices
● MIR instruction representations in the live range of a variable

● Instruction → R^n MIR2Vec embedding

● Final representation: R^(m×n)

MIR2Vec representations
● n-dimensional vector representations

● Opcode and operand information form the entities in MIR
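As a rough sketch of how such vertex features could be assembled (the mir2vec table and the summation of entity embeddings are illustrative assumptions, not the actual API): each of the m instructions in a variable's live range is mapped to its n-dimensional MIR2Vec embedding, and the vectors are stacked into an m × n matrix.

import numpy as np

n = 100        # embedding width (cf. the experimental setup later)
mir2vec = {}   # entity (opcode/operand) -> R^n vector, assumed pre-trained

def embed(inst):
    # Compose an instruction embedding from its entity embeddings;
    # summation is an illustrative choice here.
    return sum((mir2vec.get(e, np.zeros(n)) for e in inst), np.zeros(n))

def vertex_features(live_range_insts):
    # live_range_insts: list of instructions, each a list of entity names.
    return np.stack([embed(i) for i in live_range_insts])  # shape (m, n)

print(vertex_features([["MOV", "reg", "imm"], ["ADD", "reg", "reg"]]).shape)  # (2, 100)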


Grouping opcodes
● MIR has specialized opcodes
● Based on width, source and destination types
○ 200 different MOV instructions
○ MOV32rm, MOVZX64rr16, MOVAPDrr, etc.

● 15.3K opcodes in x86; 5.4K opcodes in AArch64


○ {build dir}/lib/Target/X86/X86GenInstrInfo.inc
○ {build dir}/lib/Target/AArch64/AArch64GenInstrInfo.inc

● Generic opcodes
○ Specialized opcodes are grouped together
○ {MOV32rm, MOVZX64rr16, MOVAPDrr, …} → MOV
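A minimal sketch of such grouping (the mnemonic table and the fallback rule are illustrative, not the actual implementation): strip the width/type/addressing-mode suffixes so that specialized opcodes collapse onto one generic mnemonic.

import re

# Illustrative: collapse specialized x86 opcodes onto generic mnemonics.
GENERIC = ["MOVAPD", "MOVZX", "MOV", "ADD", "SUB", "IDIV"]

def generic_opcode(opcode):
    for g in sorted(GENERIC, key=len, reverse=True):  # longest match first
        if opcode.startswith(g):
            return "MOV" if g.startswith("MOV") else g  # MOV* family -> MOV
    return re.sub(r"\d+\w*$", "", opcode)  # fallback: drop trailing suffix

for op in ["MOV32rm", "MOVZX64rr16", "MOVAPDrr"]:
    print(op, "->", generic_opcode(op))  # all print MOV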

Representing Interference graphs
● GGNNs - Gated Graph Neural Networks
○ Processing graph structured inputs

● Message passing
○ Information propagated multiple times across nodes

● Annotations on nodes → Current state


○ Visited
○ Colored
○ Spilled

● R^(m×n) → R^k

[Figure: example interference graph over x, y, z with a 3-bit (visited, colored, spilled) annotation vector on each node]
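For intuition, a compact NumPy sketch of one gated message-passing step in the spirit of GGNNs (weights are random placeholders; a real GGNN uses a full GRU-style update as in Li et al.):

import numpy as np

rng = np.random.default_rng(0)
k = 8                                    # node state width
A = np.array([[0, 1, 0],                 # adjacency: x-y and y-z interfere
              [1, 0, 1],
              [0, 1, 0]])
H = rng.normal(size=(3, k))              # initial node states (x, y, z)
W_msg = rng.normal(size=(k, k)) * 0.1
W_z = rng.normal(size=(k, k)) * 0.1
U_z = rng.normal(size=(k, k)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

M = A @ (H @ W_msg)                      # sum messages over neighbors
Z = sigmoid(H @ W_z + M @ U_z)           # update gate
H = (1 - Z) * H + Z * np.tanh(M)         # gated state update
print(H.shape)                           # (3, 8)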

Hierarchical Reinforcement Learning
● Environment - MLRegAlloc pass in LLVM
○ Generates interference graphs + representations
○ Performs register allocation, splitting, and spilling as per the predictions

● Multi-agent hierarchical reinforcement learning


○ Sub-tasks of register allocation → low-level agents

● Agents
○ Node selection
○ Task selection
○ Splitting
○ Coloring

Agents

Node Selection Agent
● Selects the vertex to process next
● Action space: vertices that are not yet colored
● Reward: based on the low-level agents

Task Selection Agent
● Selects between split and color for the chosen vertex
● Action space: Split or Color; Split is allowed only if #Uses > k (k = 2)
● Reward: based on the low-level agents

Splitting Agent
● Predicts the split point in the live range of a variable
● Action space: set of valid use points to split
● Reward: difference in spill weights before and after splitting

Coloring Agent
● Picks an appropriate color (register) for a given vertex
● Action space: set of legal registers, if available; otherwise, spill
● Reward: +spill weight if colored; -spill weight if spilled
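Putting the hierarchy together, a schematic control loop might look like the sketch below (all object and method names are hypothetical):

# Schematic sketch of the agent hierarchy: node selection ->
# task selection -> split/color, until every vertex is handled.
def allocate(graph, node_agent, task_agent, split_agent, color_agent, k=2):
    while graph.has_uncolored_nodes():
        v = node_agent.pick(graph.uncolored_nodes())
        task = task_agent.pick(v)          # "split" allowed only if #uses > k
        if task == "split" and v.num_uses > k:
            point = split_agent.pick(v.use_points())
            graph.split(v, point)          # reward: spill-weight difference
        else:
            regs = graph.legal_registers(v)
            choice = color_agent.pick(regs) if regs else "spill"
            graph.assign(v, choice)        # reward: +/- spill weight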
Materialization of splitting
● Involves inserting move instructions
● Dataflow problem
○ Similar to phi or copy placement

● Use dominance frontier
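As a sketch of the underlying machinery (the standard Cooper-Harvey-Kennedy dominance-frontier walk, not the paper's code): the copies that materialize a split are placed at the frontier blocks, exactly where phi-nodes would go.

# Dominance frontiers from immediate dominators; copy/phi insertion
# for a split live range happens at these frontier blocks.
def dominance_frontiers(preds, idom):
    # preds: block -> predecessor list; idom: block -> immediate dominator
    df = {b: set() for b in preds}
    for b, ps in preds.items():
        if len(ps) >= 2:                  # only join points contribute
            for p in ps:
                runner = p
                while runner != idom[b]:  # walk up the dominator tree
                    df[runner].add(b)
                    runner = idom[runner]
    return df

# Diamond CFG: entry -> {a, b} -> join
preds = {"entry": [], "a": ["entry"], "b": ["entry"], "join": ["a", "b"]}
idom = {"entry": "entry", "a": "entry", "b": "entry", "join": "entry"}
print(dominance_frontiers(preds, idom))  # a and b have {'join'} as frontier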

Global Rewards
● Based on the throughput (Th) of the generated function
● Use LLVM MCA
○ LLVM's Machine Code Analyzer (llvm-mca)
○ Static model to estimate throughput
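A small sketch of how such a signal could be extracted (llvm-mca and its "Block RThroughput" summary line are real; wiring this into the reward is the assumption here):

import re
import subprocess

def estimated_rthroughput(asm_path, triple="x86_64-unknown-linux-gnu"):
    # Run llvm-mca on the generated assembly and parse its summary.
    out = subprocess.run(
        ["llvm-mca", f"-mtriple={triple}", asm_path],
        capture_output=True, text=True, check=True,
    ).stdout
    m = re.search(r"Block RThroughput:\s*([\d.]+)", out)
    return float(m.group(1)) if m else None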

Integration with LLVM
● RL4ReAl - to-and-fro communication
○ Decisions/Actions by Python model
○ Materialization of decisions in C++ compiler

● LLVM-gRPC - gRPC-based framework


○ Seamless connection between LLVM and Python ML workloads
■ Works as an LLVM library
■ Easy integration
● As simple as implementing a few API calls

○ Support for any ML workload


■ Not just limited to RL
■ With both training and inference flow
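For flavor, here is what the Python side of such a bridge can look like. This is a hypothetical sketch: regalloc_pb2 / regalloc_pb2_grpc are assumed to be generated from a hypothetical regalloc.proto, and policy_predict is a placeholder; it is not the actual LLVM-gRPC interface.

import grpc
from concurrent import futures
import regalloc_pb2            # hypothetical generated messages
import regalloc_pb2_grpc       # hypothetical generated service stubs

class RegAllocServicer(regalloc_pb2_grpc.RegAllocServicer):
    def GetAction(self, request, context):
        # request carries the serialized interference graph; the trained
        # policy would pick split/color actions here.
        action = policy_predict(request.graph)   # placeholder helper
        return regalloc_pb2.ActionReply(action=action)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    regalloc_pb2_grpc.add_RegAllocServicer_to_server(RegAllocServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()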

Training

[Figure: training-phase exchange between the RL model (Python) and LLVM (C++)]
1. Request: interference graph
2. Reply: interference graph (split)
3. Request: action + reward for the decision
4. Reply: reward

Training phase

○ The RL model (Python) sends requests to LLVM (C++)
○ The model takes decisions on splitting and coloring
○ LLVM (C++) generates code for each decision and returns the corresponding reward
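Read as a standard RL loop, the four-step exchange looks like this schematic sketch (env stands in for LLVM on the far side of gRPC, agent for the Python model; all names hypothetical):

def train_on_function(env, agent):
    graph = env.request_interference_graph()     # steps 1-2: (split) graph
    done = False
    while not done:
        action = agent.act(graph)                # decide split/color
        reward, graph, done = env.apply(action)  # steps 3-4: LLVM materializes
        agent.observe(reward)                    # the decision, returns reward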

Inference

[Figure: inference-phase exchange between the RL model (Python) and LLVM (C++)]
1. Request: interference graph
2. Reply: decisions

Inference phase
○ For any input code, LLVM (C++) sends a request to the trained model for its splitting and coloring decisions
○ As a reply, the trained model returns the decisions it took, and code is generated accordingly

Experiments
● MIR2Vec representations
○ 2000 source files from SPEC CPU 2017 and C++ Boost libraries
○ 100-dimensional embeddings; trained over 1000 epochs

● Evaluation
○ x86 - Intel Xeon W-2133, 6 cores, 32GB RAM
○ AArch64 - ARM Cortex-A72, 2 cores, 4GB RAM

● RL models - PPO policy with a standard set of hyperparameters (see the sketch after this list)

● Register allocations
○ General purpose, floating point and vector registers
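For reference, a "standard" PPO hyperparameter set typically looks like the sketch below; the specific values are common defaults, not the paper's reported configuration:

ppo_config = {
    "gamma": 0.99,          # discount factor
    "gae_lambda": 0.95,     # advantage smoothing
    "clip_param": 0.2,      # PPO clipping epsilon
    "lr": 3e-4,             # learning rate
    "train_batch_size": 4000,
    "num_sgd_iter": 10,     # optimization epochs per batch
    "entropy_coeff": 0.01,  # exploration bonus
}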

Runtime improvements on x86

● RL4ReAl shows speedups over Basic in 14/18 benchmarks


● Runtimes very close to Greedy
● Only one benchmark shows more than a 4% slowdown
Analysis of Hot functions

[Figure: % difference in runtime on hot functions, with Basic as the baseline]
Analysis of Hot functions
[Figure: % speedups obtained by Greedy and RL4ReAl over Basic]
Analysis of Hot functions
[Figure: % speedups obtained by Greedy and RL4ReAl over Basic]
Runtimes on AArch64

Policy Improvement on Regression cases
● Regression in performance
○ Identify → Refine heuristics → Evaluate

● MLGO’s policy improvement cycle


○ Fine-tuning of learned RL policy on regression cases

● Identify and Refine


○ Poorly performing benchmarks from each configuration
○ RL4Real-L
■ milc (-13.8s → -0.8s)
○ RL4Real-G
■ Hmmer (-37.6s → -26s), xz (-8.5s → -2.5s)

● Strong case for online learning and domain specialization

Trofin et al., MLGO: A Machine Learning Guided Compiler Optimizations Framework, arXiv, 2021
Summary

● RL4ReAl: architecture-independent Reinforcement Learning for Register Allocation


● Multi-agent hierarchical approach

● Generates semantically correct code: constraints imposed on the action space

● Allocations on par with or better than LLVM's best allocators

● New opportunities for compiler/ML research

● Framework will be open-sourced

● https://compilers.cse.iith.ac.in/publications/rl4real

Thank You!
https://compilers.cse.iith.ac.in/publications/rl4real/

