
RL4ReAl: Reinforcement Learning for Register Allocation

S. VenkataKeerthy¹, Siddharth Jain¹, Anilava Kundu¹, Rohit Aggarwal¹, Albert Cohen², Ramakrishna Upadrasta¹

¹IIT Hyderabad, ²Google

LLVM Performance Workshop


25th February 2023
Register allocation

● Registers are scarce!
○ Unbounded set of variables → finite set of registers

● One of the classic NP-hard problems
○ Reducible to graph coloring

● Solutions
○ Constraint-based: ILP and PBQP formulations
○ Heuristic approaches

● LLVM - 4 register allocators


○ Constraint-based: PBQP
○ Heuristic: Greedy, Basic, Fast

LLVM’s Register Allocation Strategies and Heuristics

[Figure: live-range diagrams illustrating the four heuristics: Splitting (x split into x1, x2), Coalescing, Spilling (to memory M), and Eviction (from register R1)]
● No single best allocator
○ Greedy performs better in general

● Greedy Allocator Heuristics - Splitting, Coalescing, Eviction and Spilling


● PBQP Allocator Heuristics - Coalescing and Spilling
What makes ML-based Register Allocation difficult?

● Complex problem with multiple sub-tasks


○ Splitting, Spilling, Coalescing, etc.
● ML schemes should ensure correctness
○ Register type constraints
○ Live range constraints
● Integration of ML solutions with compiler frameworks
○ Python ↔ C++

Proposal - RL4ReAl: Reinforcement Learning for Register Allocation

RL4ReAl: Objectives
Objectives: Machine Learning Framework for Register Allocation
● End-to-end application of Reinforcement Learning for register allocation

● Semantically correct code generation


○ Without resorting to a correction phase
○ Correctness constraints imposed on action space

● Multi-architecture support

Can an ML model match or outperform half-a-century-old heuristics?

Constraints in Register Allocation

Register Allocation: Correctness constraints
Registers are complicated!

1. Register constraints

2. Type constraints

3. Congruence constraints

4. Interference constraints

Register Constraints
● Architectural constraints
○ E.g., IDIV32 → divides the contents of $edx:$eax; the quotient goes to $eax, the remainder to $edx

● Register allocation ⇒ allocating the virtual registers left over after such constraints

Type constraints
● Different types of registers in a register file
○ General purpose registers
○ Floating point registers
○ Vector registers, …

● Variable type compatibility with the register type

Congruence constraints
● Real-world ISAs have a hierarchy of register classes
○ Congruent classes

Figure source: Wikipedia


Interference constraints
Register allocation ⇒ Graph coloring problem

x = 10
y = 20
print x
z = 20 + y
print y
z = z + 10
print z

[Figure: live intervals and the resulting interference graph: y interferes with both x and z, so coloring with two registers yields x → R1, y → R2, z → R1. Available registers: R1 (green), R2 (blue)]
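To make the mapping to graph coloring concrete, here is a minimal Python sketch (not the paper's code) that derives the interference graph above from hand-written live intervals and greedy-colors it with two registers:

# Minimal sketch: interference from overlapping live intervals,
# then greedy coloring. Intervals are hand-derived from the
# example above (instruction indices 0..6).
live = {"x": (0, 2), "y": (1, 4), "z": (3, 6)}
registers = ["R1", "R2"]

def interferes(a, b):
    # Two variables interfere if their live intervals overlap.
    (s1, e1), (s2, e2) = live[a], live[b]
    return s1 < e2 and s2 < e1

edges = {v: {u for u in live if u != v and interferes(u, v)} for v in live}

coloring = {}
for v in live:  # a fixed visit order stands in for real heuristics
    taken = {coloring[u] for u in edges[v] if u in coloring}
    free = [r for r in registers if r not in taken]
    coloring[v] = free[0] if free else "SPILL"

print(coloring)  # {'x': 'R1', 'y': 'R2', 'z': 'R1'}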


RL4ReAl: Reinforcement Learning for Register Allocation
[Figure: RL4ReAl architecture: the LLVM environment (MLRegAlloc pass) and the RL framework communicate through gRPC stubs, exchanging interference graphs, split info, and model updates]
Interference graphs
Edges: {physical reg - virtual reg, virtual reg - virtual reg}

Vertices
● MIR instruction representations in the live range of a variable

● Instruction → R^n MIR2Vec embedding

● Final representation: R^(m×n)

MIR2Vec representations
● n-dimensional vector representations

● Opcode and operand information form the entities in MIR
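As a rough sketch of how such vertex features could be assembled (the mir2vec table and the summation of entity embeddings are illustrative assumptions, not the actual API): each of the m instructions in a variable's live range is mapped to its n-dimensional MIR2Vec embedding, and the vectors are stacked into an m × n matrix.

import numpy as np

n = 100        # embedding width (cf. the experimental setup later)
mir2vec = {}   # entity (opcode/operand) -> R^n vector, assumed pre-trained

def embed(inst):
    # Compose an instruction embedding from its entity embeddings;
    # summation is an illustrative choice here.
    return sum((mir2vec.get(e, np.zeros(n)) for e in inst), np.zeros(n))

def vertex_features(live_range_insts):
    # live_range_insts: list of instructions, each a list of entity names.
    return np.stack([embed(i) for i in live_range_insts])  # shape (m, n)

print(vertex_features([["MOV", "reg", "imm"], ["ADD", "reg", "reg"]]).shape)  # (2, 100)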


Grouping opcodes
● MIR has specialized opcodes
● Based on width, source and destination types
○ 200 different MOV instructions
○ MOV32rm, MOVZX64rr16, MOVAPDrr, etc.

● 15.3K opcodes in x86; 5.4K opcodes in AArch64


○ {build dir}/lib/Target/X86/X86GenInstrInfo.inc
○ {build dir}/lib/Target/AArch64/AArch64GenInstrInfo.inc

● Generic opcodes
○ Specialized opcodes are grouped together
○ {MOV32rm, MOVZX64rr16, MOVAPDrr, …} → MOV
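A minimal sketch of such grouping (the mnemonic table and the fallback rule are illustrative, not the actual implementation): strip the width/type/addressing-mode suffixes so that specialized opcodes collapse onto one generic mnemonic.

import re

# Illustrative: collapse specialized x86 opcodes onto generic mnemonics.
GENERIC = ["MOVAPD", "MOVZX", "MOV", "ADD", "SUB", "IDIV"]

def generic_opcode(opcode):
    for g in sorted(GENERIC, key=len, reverse=True):  # longest match first
        if opcode.startswith(g):
            return "MOV" if g.startswith("MOV") else g  # MOV* family -> MOV
    return re.sub(r"\d+\w*$", "", opcode)  # fallback: drop trailing suffix

for op in ["MOV32rm", "MOVZX64rr16", "MOVAPDrr"]:
    print(op, "->", generic_opcode(op))  # all print MOV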

Representing Interference graphs
● GGNNs - Gated Graph Neural Networks
○ Processing graph structured inputs

● Message passing
○ Information propagated multiple times across nodes

● Annotations on nodes → Current state


○ Visited
○ Colored
○ Spilled

● R^(m×n) → R^k

[Figure: example interference graph over x, y, z with a 3-bit (visited, colored, spilled) annotation vector on each node]
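For intuition, a compact NumPy sketch of one gated message-passing step in the spirit of GGNNs (weights are random placeholders; a real GGNN uses a full GRU-style update as in Li et al.):

import numpy as np

rng = np.random.default_rng(0)
k = 8                                    # node state width
A = np.array([[0, 1, 0],                 # adjacency: x-y and y-z interfere
              [1, 0, 1],
              [0, 1, 0]])
H = rng.normal(size=(3, k))              # initial node states (x, y, z)
W_msg = rng.normal(size=(k, k)) * 0.1
W_z = rng.normal(size=(k, k)) * 0.1
U_z = rng.normal(size=(k, k)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

M = A @ (H @ W_msg)                      # sum messages over neighbors
Z = sigmoid(H @ W_z + M @ U_z)           # update gate
H = (1 - Z) * H + Z * np.tanh(M)         # gated state update
print(H.shape)                           # (3, 8)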

Hierarchical Reinforcement Learning
● Environment - MLRegAlloc pass in LLVM
○ Generates interference graphs + representations
○ Performs register allocation, splitting, and spilling as per the predictions

● Multi-agent hierarchical reinforcement learning


○ Sub-tasks of register allocation → low-level agents

● Agents
○ Node selection
○ Task selection
○ Splitting
○ Coloring

Agents

Node Selection Agent
● Selects the vertex to process next
● Action space: vertices that are not yet colored
● Reward: based on the low-level agents

Task Selection Agent
● Selects between split and color for the chosen vertex
● Action space: Split or Color; Split is allowed only if #Uses > k (k = 2)
● Reward: based on the low-level agents

Splitting Agent
● Predicts the split point in the live range of a variable
● Action space: set of valid use points to split
● Reward: difference in spill weights before and after splitting

Coloring Agent
● Picks an appropriate color (register) for a given vertex
● Action space: set of legal registers, if available; otherwise, spill
● Reward: +spill weight if colored; -spill weight if spilled
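Putting the hierarchy together, a schematic control loop might look like the sketch below (all object and method names are hypothetical):

# Schematic sketch of the agent hierarchy: node selection ->
# task selection -> split/color, until every vertex is handled.
def allocate(graph, node_agent, task_agent, split_agent, color_agent, k=2):
    while graph.has_uncolored_nodes():
        v = node_agent.pick(graph.uncolored_nodes())
        task = task_agent.pick(v)          # "split" allowed only if #uses > k
        if task == "split" and v.num_uses > k:
            point = split_agent.pick(v.use_points())
            graph.split(v, point)          # reward: spill-weight difference
        else:
            regs = graph.legal_registers(v)
            choice = color_agent.pick(regs) if regs else "spill"
            graph.assign(v, choice)        # reward: +/- spill weight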
Materialization of splitting
● Involves inserting move instructions
● Dataflow problem
○ Similar to phi or copy placement

● Use dominance frontier
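As a sketch of the underlying machinery (the standard Cooper-Harvey-Kennedy dominance-frontier walk, not the paper's code): the copies that materialize a split are placed at the frontier blocks, exactly where phi-nodes would go.

# Dominance frontiers from immediate dominators; copy/phi insertion
# for a split live range happens at these frontier blocks.
def dominance_frontiers(preds, idom):
    # preds: block -> predecessor list; idom: block -> immediate dominator
    df = {b: set() for b in preds}
    for b, ps in preds.items():
        if len(ps) >= 2:                  # only join points contribute
            for p in ps:
                runner = p
                while runner != idom[b]:  # walk up the dominator tree
                    df[runner].add(b)
                    runner = idom[runner]
    return df

# Diamond CFG: entry -> {a, b} -> join
preds = {"entry": [], "a": ["entry"], "b": ["entry"], "join": ["a", "b"]}
idom = {"entry": "entry", "a": "entry", "b": "entry", "join": "entry"}
print(dominance_frontiers(preds, idom))  # a and b have {'join'} as frontier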

Global Rewards
● Based on the throughput (Th) of the generated function
● Use LLVM MCA
○ LLVM's Machine Code Analyzer (llvm-mca)
○ Static model to estimate throughput
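A small sketch of how such a signal could be extracted (llvm-mca and its "Block RThroughput" summary line are real; wiring this into the reward is the assumption here):

import re
import subprocess

def estimated_rthroughput(asm_path, triple="x86_64-unknown-linux-gnu"):
    # Run llvm-mca on the generated assembly and parse its summary.
    out = subprocess.run(
        ["llvm-mca", f"-mtriple={triple}", asm_path],
        capture_output=True, text=True, check=True,
    ).stdout
    m = re.search(r"Block RThroughput:\s*([\d.]+)", out)
    return float(m.group(1)) if m else None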

Integration with LLVM
● RL4ReAl - to-and-fro communication
○ Decisions/Actions by Python model
○ Materialization of decisions in C++ compiler

● LLVM-gRPC - gRPC-based framework


○ Seamless connection between LLVM and Python ML workloads
■ Works as an LLVM library
■ Easy integration
● As simple as implementing a few API calls

○ Support for any ML workload


■ Not just limited to RL
■ With both training and inference flow
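For flavor, here is what the Python side of such a bridge can look like. This is a hypothetical sketch: regalloc_pb2 / regalloc_pb2_grpc are assumed to be generated from a hypothetical regalloc.proto, and policy_predict is a placeholder; it is not the actual LLVM-gRPC interface.

import grpc
from concurrent import futures
import regalloc_pb2            # hypothetical generated messages
import regalloc_pb2_grpc       # hypothetical generated service stubs

class RegAllocServicer(regalloc_pb2_grpc.RegAllocServicer):
    def GetAction(self, request, context):
        # request carries the serialized interference graph; the trained
        # policy would pick split/color actions here.
        action = policy_predict(request.graph)   # placeholder helper
        return regalloc_pb2.ActionReply(action=action)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    regalloc_pb2_grpc.add_RegAllocServicer_to_server(RegAllocServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()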

Training

[Figure: training-phase exchange between the RL model (Python) and LLVM (C++)]
1. Request: interference graph
2. Reply: interference graph (split)
3. Request: action + reward for the decision
4. Reply: reward

Training phase

○ The RL model (Python) sends requests to LLVM (C++)
○ The model takes decisions on splitting and coloring
○ LLVM (C++) generates code for each decision and returns the corresponding reward
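Read as a standard RL loop, the four-step exchange looks like this schematic sketch (env stands in for LLVM on the far side of gRPC, agent for the Python model; all names hypothetical):

def train_on_function(env, agent):
    graph = env.request_interference_graph()     # steps 1-2: (split) graph
    done = False
    while not done:
        action = agent.act(graph)                # decide split/color
        reward, graph, done = env.apply(action)  # steps 3-4: LLVM materializes
        agent.observe(reward)                    # the decision, returns reward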

Inference

[Figure: inference-phase exchange between the RL model (Python) and LLVM (C++)]
1. Request: interference graph
2. Reply: decisions

Inference phase
○ For any input code, LLVM (C++) sends a request to the trained model for its splitting and coloring decisions
○ As a reply, the trained model returns the decisions it took, and code is generated accordingly

Experiments
● MIR2Vec representations
○ 2000 source files from SPEC CPU 2017 and C++ Boost libraries
○ 100-dimensional embeddings; trained over 1000 epochs

● Evaluation
○ x86 - Intel Xeon W-2133, 6 cores, 32GB RAM
○ AArch64 - ARM Cortex-A72, 2 cores, 4GB RAM

● RL models - PPO policy with a standard set of hyperparameters (see the sketch after this list)

● Register allocations
○ General purpose, floating point and vector registers
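For reference, a "standard" PPO hyperparameter set typically looks like the sketch below; the specific values are common defaults, not the paper's reported configuration:

ppo_config = {
    "gamma": 0.99,          # discount factor
    "gae_lambda": 0.95,     # advantage smoothing
    "clip_param": 0.2,      # PPO clipping epsilon
    "lr": 3e-4,             # learning rate
    "train_batch_size": 4000,
    "num_sgd_iter": 10,     # optimization epochs per batch
    "entropy_coeff": 0.01,  # exploration bonus
}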

Runtime improvements on x86

● RL4ReAl shows speedups over Basic in 14/18 benchmarks


● Runtimes very close to Greedy
● Only one benchmark shows more than a 4% slowdown
Analysis of Hot functions

[Figure: % difference in runtime on hot functions, with Basic as the baseline]
Analysis of Hot functions
[Figure: % speedups obtained by Greedy and RL4ReAl over Basic]
Analysis of Hot functions
[Figure: % speedups obtained by Greedy and RL4ReAl over Basic]
Runtimes on AArch64

Policy Improvement on Regression cases
● Regression in performance
○ Identify → Refine heuristics → Evaluate

● MLGO’s policy improvement cycle


○ Fine-tuning of learned RL policy on regression cases

● Identify and Refine


○ Poorly performing benchmarks from each configuration
○ RL4Real-L
■ milc (-13.8s → -0.8s)
○ RL4Real-G
■ Hmmer (-37.6s → -26s), xz (-8.5s → -2.5s)

● Strong case for online learning and domain specialization

Trofin et al., MLGO: A Machine Learning Guided Compiler Optimizations Framework, arXiv, 2021
Summary

● RL4ReAl: architecture-independent Reinforcement Learning for Register Allocation


● Multi-agent hierarchical approach

● Generates semantically correct code: constraints imposed on the action space

● Allocations on par with or better than LLVM's best allocators

● New opportunities for compiler/ML research

● Framework will be open-sourced

● https://compilers.cse.iith.ac.in/publications/rl4real

Thank You!
https://compilers.cse.iith.ac.in/publications/rl4real/

