Designing A Compiler For An AI/ML-oriented DRAM-based Processing in Memory (PIM) Architecture
Goals
• To design a compiler for a DRAM-based Processing in Memory (PIM) architecture
with a custom-built Instruction Set Architecture (ISA)
• The compiler will translate High-Level Language (HLL) programs into PIM ISA-compatible
instructions
• The compiler must accommodate the unique programmability of the PIM architecture
and generate instructions accordingly
• The compiler will integrate DRAM physical memory mapping, account for the PIM floor map, and
dispatch DRAM ISA commands to perform the necessary memory accesses
• The winning team(s) will be invited to collaborate on research projects on the full-scale
development of a compiler framework for multiple applications such as AI and ML.
Overview
1. The Processing in Memory Architecture (pPIM)
a. Processing Layout and Interface
b. Processing Architecture
pPIM: A Programmable Processing in Memory Architecture [1, 2]
[Figure: DRAM chip showing DRAM Bank 0 and its subarrays]
Near-Subarray Computing:
• Performs computing within the DRAM banks, next to the DRAM subarrays
• The Clusters can read data from the DRAM subarray’s Local Row Buffer
• A Local Row Buffer holds one DRAM Page of data at a time
• DRAM ACTIVATE (ACT) Command buffers a page
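This ACT/row-buffer behavior can be illustrated with a minimal Python model (a sketch only; the class and method names such as `Subarray` and `activate`, and the page size, are illustrative assumptions, not the pPIM design):

```python
PAGE_SIZE = 1024  # bytes per DRAM page (illustrative value)

class Subarray:
    """Toy model of a DRAM subarray with a single Local Row Buffer (LRB)."""
    def __init__(self, num_rows):
        self.rows = [bytearray(PAGE_SIZE) for _ in range(num_rows)]
        self.lrb = None        # Local Row Buffer: holds one page at a time
        self.open_row = None   # index of the currently buffered row

    def activate(self, row_addr):
        """ACT command: buffer one DRAM page into the Local Row Buffer."""
        self.lrb = bytearray(self.rows[row_addr])  # copy the page into the LRB
        self.open_row = row_addr

    def cluster_read(self, offset, length):
        """A Cluster reads operands from the Local Row Buffer, not the array itself."""
        assert self.lrb is not None, "ACT must open a row before Clusters can read"
        return bytes(self.lrb[offset:offset + length])

# Usage: open row 5, then let a Cluster read 16 bytes from the buffered page.
sa = Subarray(num_rows=512)
sa.activate(5)
operands = sa.cluster_read(0, 16)
```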
[1] P. R. Sutradhar, M. Connolly, S. Bavikadi, S. M. Pudukotai Dinakarrao, M. A. Indovina and A. Ganguly, "pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning," IEEE Computer Architecture Letters, vol. 19, no. 2, pp. 118-121, July-Dec. 2020, doi: 10.1109/LCA.2020.3011643.
[2] M. Connolly, P. R. Sutradhar, M. Indovina and A. Ganguly, "Flexible Instruction Set Architecture for Programmable Look-up Table based Processing-in-Memory," 2021 IEEE 39th International Conference on Computer Design (ICCD), Storrs, CT, USA, 2021, pp. 66-73, doi: 10.1109/ICCD53106.2021.00022.
pPIM: LUT-based Computing
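As a rough illustration of the LUT-based computing idea (a hedged sketch only; the operand width and table organization are assumptions, not the pPIM core design): each core's function is defined by a programmable lookup table indexed by the operand bits, so reprogramming the table changes the operation without changing the hardware.

```python
OPERAND_BITS = 4  # assumed operand width for this sketch

def program_lut(op):
    """Program a LUT that maps every (a, b) operand pair to op(a, b)."""
    lut = [0] * (1 << (2 * OPERAND_BITS))
    for a in range(1 << OPERAND_BITS):
        for b in range(1 << OPERAND_BITS):
            lut[(a << OPERAND_BITS) | b] = op(a, b)
    return lut

def lut_compute(lut, a, b):
    """'Compute' with a single table lookup instead of an ALU."""
    return lut[(a << OPERAND_BITS) | b]

mul_lut = program_lut(lambda a, b: a * b)   # program the core as a multiplier
add_lut = program_lut(lambda a, b: a + b)   # reprogram it as an adder
assert lut_compute(mul_lut, 9, 13) == 117
assert lut_compute(add_lut, 9, 13) == 22
```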
Instruction Set Architecture
• Fixed-length Instructions
• Opcodes representing different Logic/Arithmetic Ops
• Core Pointer/ID primarily used for programming
the LUTs
• Rd/Wr bits are used for Memory Operations
• Row Address is used for Memory Operations
• Instruction Types (a short sketch follows this list):
A. Memory Access
B. LUT Programming: typically an atomic instruction
that programs all the cores in the Cluster
C. Compute (Logic/Arithmetic)
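A minimal sketch of these fields and instruction types as plain data structures (the class and field names, and the rule used to classify instructions, are illustrative assumptions rather than the ISA's actual definition):

```python
from dataclasses import dataclass
from enum import Enum, auto

LUT_PROGRAM_OPCODE = 0b01  # placeholder opcode value, not taken from the actual ISA

class InstrType(Enum):
    MEMORY_ACCESS = auto()   # data movement between the subarray and the Cluster
    LUT_PROGRAM = auto()     # atomically programs all cores in the Cluster
    COMPUTE = auto()         # a logic/arithmetic operation

@dataclass
class PIMInstruction:
    opcode: int     # selects the logic/arithmetic operation or instruction class
    core_ptr: int   # core pointer/ID, primarily used when programming the LUTs
    rd: bool        # read bit, used for memory operations
    wr: bool        # write bit, used for memory operations
    row_addr: int   # DRAM row address, used for memory operations

    def instr_type(self) -> InstrType:
        # Classification rule assumed for this sketch: Rd/Wr mark memory accesses,
        # and a dedicated opcode value marks LUT programming.
        if self.rd or self.wr:
            return InstrType.MEMORY_ACCESS
        if self.opcode == LUT_PROGRAM_OPCODE:
            return InstrType.LUT_PROGRAM
        return InstrType.COMPUTE

# Example: a read from row 42 classifies as a memory-access instruction.
assert PIMInstruction(0b00, 0, True, False, 42).instr_type() is InstrType.MEMORY_ACCESS
```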
pPIM Instruction Set Architecture
Instruction format (from the figure):
• Bits 18-17: Opcode
• Bits 16-11: Read / Core Ptr.
• Bit 10: Rd
• Bit 9: Wr
• Bits 8-0: Row Address
• Alternate field interpretations shown in the figure: bits 16-9 as an Extended Core Ptr. or R/W Ptr., bits 8-0 as an Addr. Ptr.
[Figure: block diagram showing the Host CPU, Instruction Register, Instruction Decoder, Program Counter, Control Words 1 through n+3, Rd./Wr. Pointers, Router, R/W Buffer, Accumulator, and Control Bus / Register Controls, together with the DRAM side: Local Row Decoder, Local Row Buffer (LRB), Subarray, and Bitlines]
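Packing and unpacking this 19-bit format can be sketched as follows (the field widths are taken from the instruction format above; the function names and the opcode value in the usage line are illustrative assumptions):

```python
def encode(opcode, core_ptr, rd, wr, row_addr):
    """Pack the fields into a 19-bit pPIM instruction word (bit layout from the figure)."""
    assert 0 <= opcode < (1 << 2)      # bits 18-17
    assert 0 <= core_ptr < (1 << 6)    # bits 16-11
    assert 0 <= row_addr < (1 << 9)    # bits 8-0
    return (opcode << 17) | (core_ptr << 11) | (int(rd) << 10) | (int(wr) << 9) | row_addr

def decode(word):
    """Unpack a 19-bit instruction word back into its fields."""
    return {
        "opcode":   (word >> 17) & 0x3,
        "core_ptr": (word >> 11) & 0x3F,
        "rd":       bool((word >> 10) & 0x1),
        "wr":       bool((word >> 9) & 0x1),
        "row_addr": word & 0x1FF,
    }

# Usage: a memory-read instruction targeting row 42 (opcode value is a placeholder).
word = encode(opcode=0b00, core_ptr=0, rd=True, wr=False, row_addr=42)
assert decode(word)["row_addr"] == 42
```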
SIMD Layout
[Figure: SIMD layout, with the same Instruction X issued to multiple Clusters]
The Compiler
Conversion of matrix-vector multiplication kernels to a stream of SIMD instructions:
Large Matrix-Matrix/Vector Multiplication Workloads → Compiler → SIMD Instructions for the ISA
Compiler tasks (sketched below):
• Loop unrolling
• Insertion of LUT programming instructions
• Insertion of memory access requests
• And other challenges…
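A highly simplified sketch of what such a compilation pass might emit for a tiny matrix-vector product (everything here, including the instruction mnemonics, the single-cluster mapping, and the helper names, is an illustrative assumption rather than the actual pPIM compiler):

```python
def compile_matvec(n_rows, n_cols, matrix_base_row, vector_row, result_row):
    """Unroll y = A @ x into a flat stream of pseudo-SIMD instructions."""
    stream = []
    # 1) Program the Cluster's LUT cores once, before any compute (MAC operation assumed).
    stream.append(("LUT_PROGRAM", "MAC"))
    # 2) Bring the input vector into the Local Row Buffer.
    stream.append(("MEM_READ", vector_row))
    # 3) Fully unrolled multiply-accumulate loop: one row of A per iteration.
    for i in range(n_rows):
        stream.append(("MEM_READ", matrix_base_row + i))  # fetch row i of A
        for j in range(n_cols):
            stream.append(("COMPUTE_MAC", i, j))           # acc[i] += A[i][j] * x[j]
    # 4) Write the accumulated result back to DRAM.
    stream.append(("MEM_WRITE", result_row))
    return stream

# Usage: a 2x4 matrix-vector product becomes a flat instruction stream.
for instr in compile_matvec(2, 4, matrix_base_row=0, vector_row=8, result_row=9):
    print(instr)
```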
Thank you!
Questions?