Documentation Compiler

This document outlines a compiler designed to convert C++ matrix multiplication operations into instructions for a Processor-in-Memory (PIM) architecture. It details the process of generating LLVM Intermediate Representation, extracting Three-Address Code, and creating ISA instructions for parallel execution across multiple cores. Key components include Clang/LLVM for conversion, a custom LLVM pass for TAC extraction, and a Python script for ISA generation, all aimed at optimizing matrix operations in a PIM environment.

Uploaded by

johnsneak63

Compiler Implementation for Matrix Multiplication Using a PIM Architecture

Faculty : Senthil Prakash


Slot : B1+TB1

------------------------------------COMPILATHON--------------------------------------

Team :

Rohit kumar singh 22BRS1258


Punish Midha 22BPS1150
Taher hussain kapadia 22BPS1113

Overview :
This document describes a compiler that transforms C++ matrix operations into custom instruction set
architecture (ISA) commands for a Processor-in-Memory (PIM) system. The process involves generating
LLVM Intermediate Representation, extracting Three-Address Code, and converting this code into machine
instructions compatible with the PIM architecture.

Process Flow :
C++ Code → LLVM IR → TAC Extraction → ISA Generation → Parallel Execution

Key Components :
• Clang / LLVM: Converts C++ to LLVM IR and provides analysis tools
• Custom LLVM Pass: Extracts Three-Address Code from LLVM IR
• Python Converter: Transforms TAC into ISA-compatible instructions
• Target Architecture: Uses a 24-bit instruction format designed for DRAM subarray parallel processing

Implementation Steps :
1. Starting Point

Begin with a C++ program containing predefined matrices and a matrix multiplication function.

2. Generate LLVM IR

Output : matrix_ops.ll
Flag used ( -O1 ) : Compiling at -O1 keeps Clang from attaching the optnone attribute to functions, so the LLVM pass is allowed to analyze the IR.

3. Extract Three-Address Code

• A custom LLVM pass (TACGenPass.cpp) identifies load, store, address-computation (getelementptr), and arithmetic operations.
• The pass prints the operations to stderr, which the run command redirects to tac_output.txt.
• Compilation and run commands:

Compile the pass : clang++ -shared -fPIC TACGenPass.cpp -o tacgen.so $(llvm-config --cxxflags --ldflags --system-libs --libs core)

Run the pass : opt -load-pass-plugin ./tacgen.so -passes="tacgen" matrix_ops.ll -o /dev/null 2> tac_output.txt

Output : tac_output.txt
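
To illustrate how a downstream script might consume tac_output.txt, here is a minimal Python sketch. It assumes each line has the shape "KIND: LLVM instruction", with KIND being OP, GEP, LOAD, or STORE, as in the trace shown under TAC Output Analysis. The names TAC_LINE and parse_tac are hypothetical and are not taken from the team's converter.

```python
import re

# Assumed line shape, based on the trace in this document:
# "<KIND>: <LLVM instruction>", optionally preceded by a list number ("1. ").
TAC_LINE = re.compile(r"^\s*(?:\d+\.\s*)?(OP|GEP|LOAD|STORE):\s*(.+?)\s*$")

def parse_tac(text):
    """Return a list of (kind, instruction) tuples, skipping unrecognized lines."""
    records = []
    for line in text.splitlines():
        match = TAC_LINE.match(line)
        if match:
            records.append((match.group(1), match.group(2)))
    return records

sample = """\
GEP: %19 = getelementptr inbounds [2 x i32], ptr %0, i64 %5, i64 %17
LOAD: %20 = load i32, ptr %19, align 4, !tbaa !8
OP: %23 = mul nsw i32 %22, %20
STORE: store i32 %24, ptr %12, align 4, !tbaa !8
"""

for kind, instruction in parse_tac(sample):
    print(f"{kind:<5} {instruction}")
```

Keeping the parser separate from the ISA mapping makes it easy to check the extracted operations before any encoding happens.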

4. Generate ISA Instructions

• A Python script maps TAC operations to the 24-bit ISA format.
• It distributes the instructions across multiple processing elements.
• Execution command:

Run the script : python3 modified_tac_to_isa.py

Output : parallal_output.isa

ISA Instruction Format
24-bit instruction with the following fields:

• OPCODE (2 bits): 00=LOAD, 01=MULT, 10=STORE
• CODE_ID (6 bits): Processing element ID (0-3)
• Rd/Wr (2 bits): Read/Write flags
• Row Address (9 bits): DRAM row address
• Reserved (5 bits): For future expansion
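
The field layout above can be sketched as a small Python encoder. This is an illustration under one stated assumption: the fields are packed most significant first, in the order they are listed. The names FIELDS, OPCODES, and encode are hypothetical and are not taken from modified_tac_to_isa.py.

```python
# Field widths in bits, assumed to be packed most significant field first,
# in the order the document lists them.
FIELDS = (("opcode", 2), ("code_id", 6), ("rd_wr", 2), ("row", 9), ("reserved", 5))
OPCODES = {"LOAD": 0b00, "MULT": 0b01, "STORE": 0b10}

def encode(op, code_id, rd_wr, row, reserved=0):
    """Pack the five fields into one 24-bit integer."""
    values = {"opcode": OPCODES[op], "code_id": code_id, "rd_wr": rd_wr,
              "row": row, "reserved": reserved}
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

# The widths must add up to the 24-bit instruction size.
assert sum(width for _, width in FIELDS) == 24

word = encode("LOAD", code_id=1, rd_wr=0b11, row=0b000010000)
print(f"{word:024b}")
```

The range check rejects values that would spill into a neighbouring field, for example a row address above 511 in the 9-bit row field.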

Example Instruction

00 000001 11 000010000 00000 = LOAD from row address 0x010 (decimal 16) on Core 1
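
As a cross-check of the example instruction, the following Python sketch splits the 24-bit string back into its fields. It assumes the fields appear left to right in the order given under ISA Instruction Format. The helper decode and the table MNEMONICS are hypothetical names used only for illustration.

```python
# Field widths in bits, assumed to appear left to right in the listed order.
FIELDS = (("opcode", 2), ("code_id", 6), ("rd_wr", 2), ("row", 9), ("reserved", 5))
MNEMONICS = {0b00: "LOAD", 0b01: "MULT", 0b10: "STORE"}

def decode(bits):
    """Split a 24-digit bit string (spaces allowed) into named integer fields."""
    bits = bits.replace(" ", "")
    if len(bits) != 24 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 24 binary digits")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width
    return fields

fields = decode("00 000001 11 000010000 00000")
print(MNEMONICS[fields["opcode"]], fields)
```

Under this layout the opcode decodes to LOAD, the processing element ID to 1, and the 9-bit row field to 16.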

Parallel Execution Strategy


• Instructions are distributed across 4 cores using round-robin assignment
• Each DRAM subarray processes independent iterations (row 0 on Core 0, row 1 on Core 1, etc.)
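
The round-robin assignment described above can be sketched in Python as follows. This is a minimal illustration, not the team's scheduler: the function name assign_round_robin and the six-row workload are assumptions chosen to show how work wraps around once all 4 cores have an item.

```python
NUM_CORES = 4  # the document distributes work across 4 cores

def assign_round_robin(items, num_cores=NUM_CORES):
    """Map each item to a core by cycling 0, 1, ..., num_cores - 1."""
    schedule = {core: [] for core in range(num_cores)}
    for index, item in enumerate(items):
        schedule[index % num_cores].append(item)
    return schedule

rows = [f"row {r}" for r in range(6)]  # hypothetical workload of six output rows
for core, work in assign_round_robin(rows).items():
    print(f"Core {core}: {work}")
```

With six rows, rows 0 to 3 go to Cores 0 to 3 and rows 4 and 5 wrap around to Cores 0 and 1. Because each output row depends only on the inputs, the per-core lists can run without synchronizing with each other.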

TAC Output Analysis

1. OP: %8 = add nuw nsw i64 %5, 1

o This is an addition operation (likely the loop-counter increment).

o nuw (No Unsigned Wrap) and nsw (No Signed Wrap) are LLVM flags asserting that the addition does not overflow; if it did, the result would be a poison value.

2. GEP: %12 = getelementptr inbounds [2 x i32], ptr %2, i64 %5, i64 %11

o This is a GetElementPtr (GEP) instruction used to calculate the address of an element in a 2D array.

o %5 and %11 are loop indices for accessing elements.

3. STORE: store i32 0, ptr %12, align 4, !tbaa !8

o This stores the value 0 into memory at %12. This likely corresponds to initializing C[i][j] = 0.

4. OP: %14 = add nuw nsw i64 %11, 1


o Another addition operation for loop iteration.

5. GEP: %19 = getelementptr inbounds [2 x i32], ptr %0, i64 %5, i64 %17

o GEP instruction to calculate the address of an element in matrix A.

6. LOAD: %20 = load i32, ptr %19, align 4, !tbaa !8

o Load the value from matrix A.

7. GEP: %21 = getelementptr inbounds [2 x i32], ptr %1, i64 %17, i64 %11

o GEP instruction to calculate the address of an element in matrix B.

8. LOAD: %22 = load i32, ptr %21, align 4, !tbaa !8

o Load the value from matrix B.

9. OP: %23 = mul nsw i32 %22, %20

o Multiply two loaded values (A[i][k] * B[k][j]) to compute a partial product.

10. OP: %24 = add nsw i32 %18, %23

o Add the partial product to the accumulator (C[i][j] += A[i][k] * B[k][j]).

11. STORE: store i32 %24, ptr %12, align 4, !tbaa !8

o Store the updated value back into matrix C.

12. OP: %25 = add nuw nsw i64 %17, 1

o Increment loop variable for the innermost loop.
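
Read together, the twelve operations above match a standard triple-loop matrix multiplication. The Python sketch below mirrors that loop nest so the steps can be followed in one place. The pairing of %5, %11, and %17 with i, j, and k is inferred from how they index the GEP instructions, and the 2x2 input values are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices using the same loop order as the trace."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):              # outer loop: %5, incremented in step 1
        for j in range(n):          # middle loop: %11, incremented in step 4
            c[i][j] = 0             # steps 2-3: GEP for C[i][j], then STORE 0
            for k in range(n):      # inner loop: %17, incremented in step 12
                # steps 5-11: load A[i][k] and B[k][j], multiply, add, store
                c[i][j] += a[i][k] * b[k][j]
    return c

a = [[1, 2], [3, 4]]  # hypothetical 2x2 inputs, not the team's matrices
b = [[5, 6], [7, 8]]
print(matmul(a, b))
```

Only the loads, the multiply, and the stores have opcodes in the ISA format described earlier; how the additions and address computations are lowered is not stated in this document.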

