
Lab 09: GPU Programming - Matrix Multiplication

Moreh Vietnam

GPU Programming
Matrix Multiplication
Given:
● Matrix A with dimension MxK (M rows, K columns)
● Matrix B with dimension KxN (K rows, N columns)
● Result Matrix C with dimension MxN (M rows, N columns)
Each element C[i][j] (where 0 <= i < M, 0 <= j < N) is computed as:

$$C[i][j] = \sum_{k=0}^{K-1} A[i][k] \, B[k][j]$$

This is the dot product of row i of A and column j of B.
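The definition of C[i][j] maps directly onto a triple loop. The sketch below is a CPU reference, assuming row-major storage (element (i, j) of a matrix with `cols` columns is stored at index i * cols + j); the function name `gemmReference` is hypothetical. Because it is simple enough to trust, a reference like this is one way to check the output of the GPU versions.

```cpp
#include <cstddef>
#include <vector>

// Reference (CPU) GEMM: C = A * B for row-major float matrices.
// A is M x K, B is K x N, C is M x N.
std::vector<float> gemmReference(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;  // dot product of row i of A and column j of B
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
    return C;
}
```

For example, a 2x3 matrix times a 3x2 matrix is called as `gemmReference(A, B, 2, 2, 3)`.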

Practice
CUDA examples of matrix multiplication are provided at:
https://colab.research.google.com/drive/1MqmttKRsckSpnw5cJmLl--R6MidbFywM
Your tasks are:
1. Port some implementations from CUDA to HIP
2. Make an optimized version
3. Benchmark all implementations (from 1. and 2.) on the practice server
4. Write a report

Practice (1)
Port the following six implementations from CUDA to HIP:
1. Naive
2. Tiled (Block level)
3. Tiled (1D - ILP)
4. Tiled (2D - ILP)
5. Vectorized
6. Warp Tiled
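The tiled variants reorganize the same loops so that a small block of A and B is reused many times. As a rough illustration only, the sketch below shows loop tiling on the CPU with a hypothetical `gemmTiled` function and an assumed default tile size of 32: each TILE x TILE output tile corresponds to what one thread block would compute on the GPU, and each K slab corresponds to the data a block would stage in shared memory (LDS on AMD GPUs). It is not the course's HIP kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU analogue of block-level tiling (row-major storage assumed).
// C (M x N) is split into TILE x TILE output tiles, and K is walked in
// TILE-wide slabs. On a GPU, one thread block would own one output tile and
// load each A/B slab into shared memory; here the slab simply stays in cache.
template <std::size_t TILE = 32>
std::vector<float> gemmTiled(const std::vector<float>& A,
                             const std::vector<float>& B,
                             std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += TILE)            // tile row
        for (std::size_t j0 = 0; j0 < N; j0 += TILE)        // tile column
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {  // K slab
                const std::size_t iEnd = std::min(i0 + TILE, M);
                const std::size_t jEnd = std::min(j0 + TILE, N);
                const std::size_t kEnd = std::min(k0 + TILE, K);
                for (std::size_t i = i0; i < iEnd; ++i)
                    for (std::size_t k = k0; k < kEnd; ++k) {
                        const float a = A[i * K + k];  // reused across the whole tile row
                        for (std::size_t j = j0; j < jEnd; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
    return C;
}
```

Because the k loop is split into slabs, the floating-point summation order differs from a plain triple loop, so for large K the results should be compared with a tolerance rather than for exact equality.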

Practice (2)
Make an optimized version:
● This is an implementation of your own
○ Written in “best_gemm.cpp”
○ Make sure that your implementation can:
■ Verify the correctness of the output data
■ Calculate the throughput (GFLOPS) of the kernel
● You may use any technique from the course
○ Advanced hardware features such as Matrix Cores and Tensor Cores are not allowed
○ Only “float” is allowed as the floating-point data type
● Your implementation will be tested with the following MxNxK sizes:
○ 1024x1024x128
○ 1024x1024x1024
○ 512x2048x4096
○ 8192x8192x8192
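`best_gemm.cpp` must verify its output and report GFLOPS. A common convention, assumed here, counts 2 x M x N x K floating-point operations per GEMM (one multiply and one add per term of each dot product). The helper names below are hypothetical, and the tolerance values in `verifyResult` are assumptions to tune: a GPU kernel sums in a different order than a CPU reference, so results will not match bit-for-bit.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Floating-point operations in one M x N x K GEMM: each of the M*N outputs
// needs K multiplies and K adds.
double gemmFlops(std::size_t M, std::size_t N, std::size_t K) {
    return 2.0 * static_cast<double>(M) * static_cast<double>(N) * static_cast<double>(K);
}

// Throughput in GFLOPS, given the kernel time in milliseconds.
double gemmGflops(std::size_t M, std::size_t N, std::size_t K, double elapsedMs) {
    return gemmFlops(M, N, K) / (elapsedMs * 1.0e-3) / 1.0e9;
}

// Element-wise check against a reference with a mixed absolute/relative
// tolerance (assumed defaults). The negated comparison also rejects NaN.
bool verifyResult(const std::vector<float>& result,
                  const std::vector<float>& reference,
                  float absTol = 1e-3f, float relTol = 1e-3f) {
    if (result.size() != reference.size()) return false;
    for (std::size_t idx = 0; idx < result.size(); ++idx) {
        const float diff = std::fabs(result[idx] - reference[idx]);
        if (!(diff <= absTol + relTol * std::fabs(reference[idx]))) return false;
    }
    return true;
}
```

For the largest test shape, 8192x8192x8192, the count is 2 x 8192^3 ≈ 1.1 x 10^12 operations, so a kernel taking 1 s would reach about 1100 GFLOPS.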

Practice (3)
Benchmark all implementations
● On the practice server (of course)
● Use one MI250 GPU only
○ No extra setup is needed: just use `srun` as in the example from the practice server introduction slides
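Kernel launches are asynchronous, so a timing harness has to wait for the GPU before stopping the clock. Below is a minimal host-side sketch, assuming `std::chrono` wall-clock timing and arbitrary defaults of 3 warm-up runs and 10 timed runs; `benchmarkMs` is a hypothetical name. HIP events (`hipEventRecord` and `hipEventElapsedTime`) are an alternative that measures on the device timeline.

```cpp
#include <chrono>
#include <functional>

// Average wall-clock time (ms) of `launch` over `iters` runs, after `warmup`
// untimed runs. For a GPU kernel, `launch` must block until the kernel has
// finished (launch, then hipDeviceSynchronize()); otherwise only the
// asynchronous launch overhead is measured.
double benchmarkMs(const std::function<void()>& launch, int warmup = 3, int iters = 10) {
    // Untimed runs absorb one-time costs such as runtime initialization.
    for (int w = 0; w < warmup; ++w) launch();
    const auto start = std::chrono::steady_clock::now();
    for (int it = 0; it < iters; ++it) launch();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}
```

For a kernel, the callable would launch it and then call `hipDeviceSynchronize()`; the returned average can then be fed into the GFLOPS calculation.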

Practice (4)
Write a report. It should contain:
● The optimization techniques used in each implementation
● How each optimization technique improves performance compared to the previous implementation
○ This is optional
○ However, it affects your score, so try your best
● The performance (GFLOPS) of each implementation

Scoring
Remember, this lab is part of your midterm exam.
10 points in total:
● 8 for your report:
○ You gain 5 points if:
■ All the CUDA examples are ported to HIP and benchmarked on the practice server
■ All optimization techniques used in those CUDA-to-HIP implementations are mentioned
○ 3 more bonus points for:
■ Explaining how the optimization techniques work
■ Describing what has been done in your “best_gemm” implementation
● 2 points for the best-performing implementation in the class
○ Other students get points adjusted according to their implementation's performance relative to the best
■ The worst-performing implementation gets no points
○ Only the result of the implementation in “best_gemm.cpp” counts

Submission
A zip file containing:
● The 6 .cpp files of your CUDA-to-HIP ports
● “best_gemm.cpp”
● Your report in PDF
Due date:
● 23:59 09/04/2025
● Late submissions are not accepted
● This is not an endless assignment

HIP
HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.
● https://github.com/ROCm/hip
● In short:
○ Replace the “cuda” prefix with “hip” in API calls and types (e.g. cudaMalloc → hipMalloc)
○ Compile with “hipcc” instead of “nvcc”
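The "replace cuda with hip" rule is mechanical enough to script. ROCm's HIPIFY tools (`hipify-perl` and `hipify-clang`) do this properly; the sketch below, with a hypothetical `renameCudaToHip` function, only illustrates the core idea of renaming identifiers that start with `cuda`. Include lines still need a manual fix: it turns `<cuda_runtime.h>` into `<hip_runtime.h>`, whereas the HIP header is `<hip/hip_runtime.h>`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Naive "hipify" pass (illustration only): rewrite every identifier that
// starts with "cuda" (cudaMalloc, cudaMemcpy, cudaError_t, cudaSuccess, ...)
// so that it starts with "hip". Identifiers that merely contain "cuda"
// mid-word, uppercase macros, and CUDA libraries (cuBLAS, ...) are untouched.
std::string renameCudaToHip(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        const bool atIdentStart =
            i == 0 || !(std::isalnum(static_cast<unsigned char>(src[i - 1])) || src[i - 1] == '_');
        if (atIdentStart && src.compare(i, 4, "cuda") == 0) {
            out += "hip";  // replace the 4-char prefix
            i += 4;
        } else {
            out += src[i];
            ++i;
        }
    }
    return out;
}
```

Kernel-side code such as `__global__`, `threadIdx`, `__syncthreads()`, and the `<<<...>>>` launch syntax can usually stay unchanged, since hipcc accepts them.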

Example code
#include <cstdlib>
#include <iostream>
#include <hip/hip_runtime.h>

// Error checking macro for HIP calls
#define CHECK_HIP(cmd)                                                 \
    do {                                                               \
        hipError_t error = (cmd);                                      \
        if (error != hipSuccess) {                                     \
            std::cerr << "HIP error: " << hipGetErrorString(error)     \
                      << " at line " << __LINE__ << std::endl;         \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Simple HIP kernel that prints from GPU
__global__ void helloFromGPU() {
    printf("Hello World from GPU thread %d!\n", static_cast<int>(threadIdx.x));
}

int main() {
    // Print from CPU first
    std::cout << "Hello World from CPU!" << std::endl;

    // Launch kernel with 1 block containing 8 threads
    helloFromGPU<<<dim3(1), dim3(8)>>>();
    CHECK_HIP(hipGetLastError());

    // Wait for GPU to finish
    CHECK_HIP(hipDeviceSynchronize());

    std::cout << "Done!" << std::endl;
    return 0;
}
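The hello-world example launches a single block of 8 threads. For the GEMM kernels, the grid must instead cover the whole M x N output. A common pattern, assumed here for a naive kernel where each thread computes one C[row][col], is ceiling division on the host plus a bounds check in the kernel. To keep the sketch compilable without ROCm, it uses a hypothetical `Dim3` struct in place of HIP's `dim3` and emulates the thread-index arithmetic on the host.

```cpp
#include <cstddef>

// Hypothetical stand-in for HIP's dim3 so this sketch compiles without ROCm;
// in real HIP code use dim3 from <hip/hip_runtime.h>.
struct Dim3 { unsigned x = 1, y = 1, z = 1; };

// Ceiling division: number of blocks of size `block` needed to cover `n` items.
constexpr unsigned ceilDiv(unsigned n, unsigned block) { return (n + block - 1) / block; }

// Grid for a naive GEMM: x covers the N columns, y covers the M rows.
Dim3 gemmGrid(unsigned M, unsigned N, Dim3 block) {
    return Dim3{ceilDiv(N, block.x), ceilDiv(M, block.y), 1};
}

// Host emulation of the index math a naive kernel would use:
//   row = blockIdx.y * blockDim.y + threadIdx.y
//   col = blockIdx.x * blockDim.x + threadIdx.x
// Returns how many (row, col) pairs pass the bounds guard; should be M * N.
std::size_t countCoveredElements(unsigned M, unsigned N, Dim3 grid, Dim3 block) {
    std::size_t covered = 0;
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx) {
                    const unsigned row = by * block.y + ty;
                    const unsigned col = bx * block.x + tx;
                    if (row < M && col < N) ++covered;  // threads past the edge do nothing
                }
    return covered;
}
```

With 16x16 blocks, a 1000x1030 output needs a 65x63 grid; the extra threads in the last block row and column are filtered out by the bounds check.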

Workflow
1. Compile the code with `hipcc` on the login node (gfx90a is the MI250 architecture)
Example: hipcc -O3 hip_hello.cpp -o hip_hello --offload-arch=gfx90a
2. Run the program with `srun` on a working node
Example: srun --time=01:00 ./hip_hello
