
Lab7 TPU

The document provides instructions and code for benchmarking matrix multiplication on CPU, TPU, and GPU using PyTorch. It includes the setup process for each hardware accelerator and the execution time results for various matrix sizes. Additionally, it features code to visualize the execution times using matplotlib.


CPU version

Set up the runtime as follows: click the Runtime menu > click Change runtime type > choose CPU under Hardware accelerator > click Save.
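Optionally, before running the benchmark, you can verify that no accelerator is attached (a quick sanity check, not part of the original lab code):

import torch

# On a CPU-only runtime this should print False.
print(torch.cuda.is_available())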

import time
import torch

def benchmark(func, A, B, label, runs=3):
    times = []
    for _ in range(runs):
        torch.cuda.empty_cache()          # release cached GPU memory (a no-op on a CPU-only runtime)
        start = time.time()
        _ = func(A, B)
        if torch.cuda.is_available():     # CUDA kernels launch asynchronously;
            torch.cuda.synchronize()      # wait for them before stopping the clock
        times.append(time.time() - start)
    avg_time = sum(times) / len(times)
    print(f"{label}: {avg_time:.4f} seconds")
    return avg_time

import torch

cpu = torch.device("cpu")

# N x N random matrix on the CPU (torch.randn defaults to the CPU device)
MAT = lambda N : torch.randn(N, N, device=cpu)

results = []
for N in range(1024, 8192+1024, 1024):
    results.append(benchmark(torch.matmul, MAT(N), MAT(N), f"N = {N}"))

N = 1024: 0.2094 seconds
N = 2048: 0.6324 seconds
N = 3072: 1.3461 seconds
N = 4096: 1.8144 seconds
N = 5120: 3.9984 seconds
N = 6144: 6.7071 seconds
N = 7168: 10.2410 seconds
N = 8192: 15.9684 seconds
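For context, an N x N matrix multiplication performs roughly 2*N^3 floating-point operations, so the averaged times can be turned into an effective throughput. A minimal sketch, reusing the results list produced by the loop above:

# Effective throughput: an N x N matmul costs about 2*N^3 FLOPs.
sizes = list(range(1024, 8192+1024, 1024))
for N, t in zip(sizes, results):
    print(f"N = {N}: {2 * N**3 / t / 1e9:.1f} GFLOP/s")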

import matplotlib.pyplot as plt

plt.plot(range(1024, 8192+1024, 1024), results, 'o-')
plt.xlabel("Matrix size")
plt.ylabel("Execution time (seconds)")
plt.title("Matrix multiplication execution time")
plt.grid(True)
plt.show()

[Plot: CPU matrix multiplication execution time vs. matrix size]
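As an aside, PyTorch also ships torch.utils.benchmark.Timer, which handles warm-up, synchronization, and statistics automatically. A minimal sketch of the same measurement with it (an alternative, not part of the original lab):

import torch
import torch.utils.benchmark as benchmark_utils

A, B = torch.randn(1024, 1024), torch.randn(1024, 1024)
timer = benchmark_utils.Timer(
    stmt="torch.matmul(A, B)",            # the statement being timed
    globals={"A": A, "B": B, "torch": torch},
)
print(timer.timeit(10))                   # runs the statement 10 times and reports stats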

TPU version

Set up the runtime as follows: click the Runtime menu > click Change runtime type > choose v2-8 TPU under Hardware accelerator > click Save.
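As an optional sanity check, confirm the TPU is visible before benchmarking (not part of the original lab code):

import torch_xla.core.xla_model as xm

# On a TPU runtime this should print an XLA device such as xla:0.
print(xm.xla_device())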

import time
import torch

def benchmark(func, A, B, label, runs=3):
    times = []
    size = A.shape[0] - 1
    for _ in range(runs):
        torch.cuda.empty_cache()          # retained from the CPU version; a no-op on a TPU runtime
        start = time.time()
        _ = func(A, B)
        # XLA executes lazily: reading elements of the result forces the matmul
        # to actually run before the clock stops. The trailing *0 keeps the
        # printed value at 0.
        print(f"{_[size][size]*_[int(size/2)][int(size/2)]*_[size][int(size/2)]*_[int(size/2)][size]*0}")
        if torch.cuda.is_available():     # also a no-op here (no CUDA device on a TPU runtime)
            torch.cuda.synchronize()
        times.append(time.time() - start)
    avg_time = sum(times) / len(times)
    print(f"{label}: {avg_time:.4f} seconds")
    return avg_time

import torch_xla.core.xla_model as xm

tpu = xm.xla_device()

import torch

# N x N random matrix allocated directly on the TPU
MAT = lambda N : torch.randn(N, N, device=tpu)

results = []
for N in range(1024, 8192+1024, 1024):
    results.append(benchmark(torch.matmul, MAT(N), MAT(N), f"N = {N}"))

-0.0
-0.0
-0.0
N = 1024: 0.2381 seconds
-0.0
-0.0
-0.0
N = 2048: 0.2768 seconds
0.0
0.0
0.0
N = 3072: 0.3290 seconds
0.0
0.0
0.0
N = 4096: 0.3554 seconds
-0.0
-0.0
-0.0
N = 5120: 0.4916 seconds
-0.0
-0.0
-0.0
N = 6144: 0.5138 seconds
0.0
0.0
0.0
N = 7168: 0.5576 seconds
-0.0
-0.0
-0.0
N = 8192: 0.5849 seconds

import matplotlib.pyplot as plt

plt.plot(range(1024, 8192+1024, 1024), results, 'o-')
plt.xlabel("Matrix size")
plt.ylabel("Execution time (seconds)")
plt.title("Matrix multiplication execution time")
plt.grid(True)
plt.show()

[Plot: TPU matrix multiplication execution time vs. matrix size]
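Because XLA executes lazily, the element-access trick in the benchmark above is one way to force the matmul to run inside the timed region. A cleaner alternative is to flush the pending graph and wait for the device explicitly; a minimal sketch (an alternative, not the lab's original method):

import time
import torch
import torch_xla.core.xla_model as xm

def benchmark_xla(func, A, B, label, runs=3):
    times = []
    for _ in range(runs):
        start = time.time()
        out = func(A, B)          # keep a reference so the result is part of the flushed graph
        xm.mark_step()            # flush the lazily recorded graph to the TPU
        xm.wait_device_ops()      # block until the device finishes executing
        times.append(time.time() - start)
    avg_time = sum(times) / len(times)
    print(f"{label}: {avg_time:.4f} seconds")
    return avg_time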

GPU version

Set up the runtime as follows: click the Runtime menu > click Change runtime type > choose T4 GPU under Hardware accelerator > click Save.
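As an optional sanity check, confirm the GPU is visible before benchmarking (not part of the original lab code):

import torch

# On a GPU runtime this should print True and the device name (e.g. Tesla T4).
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))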

import time
import torch

def benchmark(func, A, B, label, runs=3):
    times = []
    for _ in range(runs):
        torch.cuda.empty_cache()          # release cached GPU memory between runs
        start = time.time()
        _ = func(A, B)
        if torch.cuda.is_available():     # CUDA kernels launch asynchronously;
            torch.cuda.synchronize()      # wait for them before stopping the clock
        times.append(time.time() - start)
    avg_time = sum(times) / len(times)
    print(f"{label}: {avg_time:.4f} seconds")
    return avg_time

import torch

gpu = torch.device("cuda")

# N x N random matrix allocated directly on the GPU
MAT = lambda N : torch.randn(N, N, device=gpu)

results = []
for N in range(1024, 8192+1024, 1024):
    results.append(benchmark(torch.matmul, MAT(N), MAT(N), f"N = {N}"))

N = 1024: 0.0456 seconds
N = 2048: 0.0062 seconds
N = 3072: 0.0202 seconds
N = 4096: 0.0401 seconds
N = 5120: 0.0679 seconds
N = 6144: 0.1210 seconds
N = 7168: 0.1905 seconds
N = 8192: 0.2765 seconds

Note that N = 1024 appears slower than N = 2048: the first call absorbs one-time CUDA initialization and cuBLAS warm-up costs, so a warm-up run before timing would give a more representative figure.
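For finer-grained GPU measurements, CUDA events time work on the device itself rather than on the host clock. A minimal sketch (an alternative to the lab's wall-clock approach, not part of the original code):

import torch

A = torch.randn(4096, 4096, device="cuda")
B = torch.randn(4096, 4096, device="cuda")

_ = torch.matmul(A, B)            # warm-up run (absorbs CUDA/cuBLAS init)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()                    # timestamp recorded on the GPU stream
_ = torch.matmul(A, B)
end.record()
torch.cuda.synchronize()          # wait until both events are complete
print(f"{start.elapsed_time(end):.2f} ms")   # elapsed_time returns milliseconds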

import matplotlib.pyplot as plt

plt.plot(range(1024, 8192+1024, 1024), results, 'o-')
plt.xlabel("Matrix size")
plt.ylabel("Execution time (seconds)")
plt.title("Matrix multiplication execution time")
plt.grid(True)
plt.show()

[Plot: GPU matrix multiplication execution time vs. matrix size]
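To compare the three backends directly, the measured times above can be overlaid on a single plot (values copied from the runs in this document; a log scale is used because the times span roughly three orders of magnitude):

import matplotlib.pyplot as plt

sizes = list(range(1024, 8192+1024, 1024))
cpu_t = [0.2094, 0.6324, 1.3461, 1.8144, 3.9984, 6.7071, 10.2410, 15.9684]
tpu_t = [0.2381, 0.2768, 0.3290, 0.3554, 0.4916, 0.5138, 0.5576, 0.5849]
gpu_t = [0.0456, 0.0062, 0.0202, 0.0401, 0.0679, 0.1210, 0.1905, 0.2765]

plt.plot(sizes, cpu_t, 'o-', label="CPU")
plt.plot(sizes, tpu_t, 's-', label="TPU (v2-8)")
plt.plot(sizes, gpu_t, '^-', label="GPU (T4)")
plt.yscale("log")                 # times span ~3 orders of magnitude
plt.xlabel("Matrix size")
plt.ylabel("Execution time (seconds)")
plt.title("CPU vs. TPU vs. GPU matrix multiplication")
plt.legend()
plt.grid(True)
plt.show()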
