Image Blurring Report
Objective
The purpose of this lab is to implement and optimize an image blur filter using CUDA. The
optimization uses tiling in shared memory to reduce global memory accesses. The tiled and
non-tiled versions are compared in terms of performance, memory bandwidth utilization, and
global memory access patterns.
Implementation
The non-tiled version accesses global memory directly for each pixel in the
BLUR_SIZE x BLUR_SIZE region. The tiled version uses shared memory to load a tile of the image,
reducing the number of global memory accesses.
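The two approaches can be sketched as follows. This is a minimal illustration, not the kernels used in the lab: it assumes BLUR_SIZE is an odd window width (5 here), a 16 x 16 output tile (the hypothetical TILE constant), a halo of BLUR_SIZE / 2 pixels around each tile, and border handling that averages only the in-bounds pixels. The kernel names blurNaive and blurTiled are made up for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLUR_SIZE 5                    // assumed odd window width (BLUR_SIZE x BLUR_SIZE)
#define RADIUS (BLUR_SIZE / 2)
#define TILE 16                        // assumed output tile width per block
#define SHARED_W (TILE + 2 * RADIUS)   // tile width plus halo on both sides

// Non-tiled: every thread reads its whole window straight from global memory.
__global__ void blurNaive(const unsigned char* in, unsigned char* out, int w, int h) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= w || row >= h) return;
    int sum = 0, count = 0;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
        for (int dx = -RADIUS; dx <= RADIUS; ++dx) {
            int r = row + dy, c = col + dx;
            if (r >= 0 && r < h && c >= 0 && c < w) { sum += in[r * w + c]; ++count; }
        }
    out[row * w + col] = (unsigned char)(sum / count);
}

// Tiled: the block first stages its tile plus halo in shared memory,
// then every thread reads its window from shared memory only.
// Must be launched with blockDim = (TILE, TILE).
__global__ void blurTiled(const unsigned char* in, unsigned char* out, int w, int h) {
    __shared__ unsigned char tile[SHARED_W][SHARED_W];
    int baseRow = blockIdx.y * TILE - RADIUS;
    int baseCol = blockIdx.x * TILE - RADIUS;
    // Cooperative load: TILE*TILE threads stride over SHARED_W*SHARED_W cells.
    for (int i = threadIdx.y * TILE + threadIdx.x; i < SHARED_W * SHARED_W; i += TILE * TILE) {
        int sy = i / SHARED_W, sx = i % SHARED_W;
        int r = baseRow + sy, c = baseCol + sx;
        tile[sy][sx] = (r >= 0 && r < h && c >= 0 && c < w) ? in[r * w + c] : 0;
    }
    __syncthreads();
    int col = blockIdx.x * TILE + threadIdx.x;
    int row = blockIdx.y * TILE + threadIdx.y;
    if (col >= w || row >= h) return;
    int sum = 0, count = 0;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
        for (int dx = -RADIUS; dx <= RADIUS; ++dx) {
            int r = row + dy, c = col + dx;
            if (r >= 0 && r < h && c >= 0 && c < w) {  // skip out-of-image pixels
                sum += tile[threadIdx.y + RADIUS + dy][threadIdx.x + RADIUS + dx];
                ++count;
            }
        }
    out[row * w + col] = (unsigned char)(sum / count);
}

int main() {
    const int w = 100, h = 70;  // small test image, deliberately not a multiple of TILE
    size_t n = (size_t)w * h;
    unsigned char *in, *a, *b;
    cudaMallocManaged(&in, n);
    cudaMallocManaged(&a, n);
    cudaMallocManaged(&b, n);
    for (size_t i = 0; i < n; ++i) in[i] = (unsigned char)(rand() % 256);

    dim3 block(TILE, TILE);
    dim3 grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    blurNaive<<<grid, block>>>(in, a, w, h);
    blurTiled<<<grid, block>>>(in, b, w, h);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) { printf("CUDA error: %s\n", cudaGetErrorString(err)); return 1; }

    // Both kernels compute the same average, so their outputs should match exactly.
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) if (a[i] != b[i]) ++mismatches;
    printf("mismatches: %zu\n", mismatches);
    cudaFree(in); cudaFree(a); cudaFree(b);
    return mismatches != 0;
}
```

Comparing the two outputs pixel by pixel is a cheap way to confirm that the tiled kernel changes only where the data is read from, not the result.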
Experimentation
Experiments were conducted using a 1024x1024 grayscale image. Execution times were measured
for different tile sizes and blur sizes. Results showed significant performance improvements in the
tiled version.
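One common way to take such measurements is with CUDA events. The sketch below is an assumption about how the timing could be done, not a record of the lab's harness: copyKernel is a hypothetical stand-in so the example compiles on its own, the repetition count of 20 is arbitrary, and the bandwidth figure counts one read and one write per pixel, which fits the stand-in rather than a blur kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in kernel; a blur kernel would be launched in its place.
__global__ void copyKernel(const unsigned char* in, unsigned char* out, int w, int h) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < w && row < h) out[row * w + col] = in[row * w + col];
}

int main() {
    const int w = 1024, h = 1024;  // image size used in the report
    const int reps = 20;           // assumed number of timed repetitions
    size_t n = (size_t)w * h;

    unsigned char *in, *out;
    cudaMalloc(&in, n);
    cudaMalloc(&out, n);
    cudaMemset(in, 1, n);

    dim3 block(16, 16);            // assumed block size
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    copyKernel<<<grid, block>>>(in, out, w, h);  // warm-up launch, not timed

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        copyKernel<<<grid, block>>>(in, out, w, h);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);    // wait until all timed launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    float avgMs = ms / reps;
    // Effective bandwidth for the stand-in: n bytes read plus n bytes written.
    double gbPerSec = (2.0 * n) / (avgMs * 1e-3) / 1e9;
    printf("avg kernel time: %.4f ms, effective bandwidth: %.2f GB/s\n", avgMs, gbPerSec);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Averaging over several launches after a warm-up run smooths out one-time launch overhead, which matters when a single kernel run on a 1024x1024 image takes well under a millisecond.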
Results
The tiled implementation significantly reduced execution times compared to the non-tiled version.
Shared memory usage reduced global memory accesses, leading to better performance. The
experiments demonstrated that tiling is a powerful optimization technique for image processing tasks
on the GPU.
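The reduction in global memory accesses can be estimated with an idealized count per thread block. The values below are illustrative assumptions, not measurements from the lab: a tile width of T = 16, a blur window width of B = BLUR_SIZE = 5, and a tile loaded together with its halo. The count ignores hardware caching and image borders.

```latex
\begin{align*}
N_{\text{non-tiled}} &= T^2 B^2 = 16^2 \cdot 5^2 = 6400 \\
N_{\text{tiled}} &= (T + B - 1)^2 = (16 + 5 - 1)^2 = 400 \\
\frac{N_{\text{non-tiled}}}{N_{\text{tiled}}} &= \frac{6400}{400} = 16
\end{align*}
```

Under these assumptions each block issues 16 times fewer global reads when tiled. The speedup seen in practice is usually smaller, because the GPU caches already absorb part of the redundant traffic in the non-tiled kernel.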