Image Blurring Report

Uploaded by

omarobeidd03


Image Blurring with CUDA - Tiling vs. Non-Tiling

Objective

The purpose of this lab is to implement and optimize an image blur filter using CUDA. The optimization is achieved through tiling to reduce global memory accesses. The tiled and non-tiled versions are compared in terms of performance, memory bandwidth utilization, and global memory access patterns.

Implementation

The non-tiled version reads global memory directly for every pixel in the BLUR_SIZE x BLUR_SIZE region around each output pixel. The tiled version first loads a tile of the image into shared memory, so neighbouring threads reuse pixels that have already been fetched. This reduces redundant global memory accesses and improves performance.
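The report does not include its kernels, so the following is only a minimal sketch of the two approaches described above. It assumes an 8-bit grayscale image, treats BLUR_SIZE as a radius (a window of 2*BLUR_SIZE+1 pixels per side, which may differ from the report's convention), and averages only the pixels that fall inside the image. The names `blurNaive`, `blurTiled`, `runBlur`, `TILE`, and `SMEM` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumption: BLUR_SIZE is a radius, so the window is 2*BLUR_SIZE+1 pixels wide.
constexpr int BLUR_SIZE = 1;
constexpr int TILE = 16;                     // output pixels per block side
constexpr int SMEM = TILE + 2 * BLUR_SIZE;   // tile side including the halo

// Non-tiled: every thread reads its whole window straight from global memory.
__global__ void blurNaive(const unsigned char* in, unsigned char* out, int w, int h) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= w || row >= h) return;
    int sum = 0, count = 0;
    for (int dy = -BLUR_SIZE; dy <= BLUR_SIZE; ++dy)
        for (int dx = -BLUR_SIZE; dx <= BLUR_SIZE; ++dx) {
            int r = row + dy, c = col + dx;
            if (r >= 0 && r < h && c >= 0 && c < w) { sum += in[r * w + c]; ++count; }
        }
    out[row * w + col] = static_cast<unsigned char>(sum / count);
}

// Tiled: the block first stages its tile plus halo in shared memory.
// Must be launched with TILE x TILE threads per block.
__global__ void blurTiled(const unsigned char* in, unsigned char* out, int w, int h) {
    __shared__ int tile[SMEM][SMEM];           // -1 marks cells outside the image
    int col0 = blockIdx.x * TILE - BLUR_SIZE;  // image coordinates of tile[0][0]
    int row0 = blockIdx.y * TILE - BLUR_SIZE;
    // Cooperative load: TILE*TILE threads cover SMEM*SMEM cells in strides.
    for (int i = threadIdx.y * TILE + threadIdx.x; i < SMEM * SMEM; i += TILE * TILE) {
        int r = row0 + i / SMEM, c = col0 + i % SMEM;
        tile[i / SMEM][i % SMEM] = (r >= 0 && r < h && c >= 0 && c < w) ? in[r * w + c] : -1;
    }
    __syncthreads();                           // all loads are visible before any read
    int col = blockIdx.x * TILE + threadIdx.x;
    int row = blockIdx.y * TILE + threadIdx.y;
    if (col >= w || row >= h) return;
    int sum = 0, count = 0;
    for (int dy = 0; dy <= 2 * BLUR_SIZE; ++dy)
        for (int dx = 0; dx <= 2 * BLUR_SIZE; ++dx) {
            int v = tile[threadIdx.y + dy][threadIdx.x + dx];
            if (v >= 0) { sum += v; ++count; }
        }
    out[row * w + col] = static_cast<unsigned char>(sum / count);
}

// Hypothetical host helper: runs one kernel on a w x h image.
// CUDA error checks are omitted for brevity.
std::vector<unsigned char> runBlur(const std::vector<unsigned char>& img, int w, int h, bool tiled) {
    unsigned char *dIn = nullptr, *dOut = nullptr;
    size_t bytes = img.size();
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, img.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE), grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    if (tiled) blurTiled<<<grid, block>>>(dIn, dOut, w, h);
    else       blurNaive<<<grid, block>>>(dIn, dOut, w, h);
    std::vector<unsigned char> out(bytes);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Calling `runBlur(img, 1024, 1024, true)` and `runBlur(img, 1024, 1024, false)` on the same input should give identical images, which is a quick way to check the tiled kernel. Under the radius assumption, a 16x16 block with BLUR_SIZE 1 issues 16 * 16 * 9 = 2304 global reads in the non-tiled kernel but only 18 * 18 = 324 in the tiled one, since each staged pixel is reused by up to nine threads.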

Experimentation

Experiments were conducted using a 1024x1024 grayscale image. Execution times were measured for different tile sizes and blur sizes. The tiled version was faster than the non-tiled version at every blur size measured.
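The report does not say how the execution times were collected. One common approach is to bracket the kernel launches with CUDA events, sketched below under that assumption. `scaleKernel` is a hypothetical placeholder standing in for either blur kernel, and `timeKernelMs` is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for either blur kernel (hypothetical).
__global__ void scaleKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns the mean time in milliseconds over `runs` launches, measured with CUDA events.
// CUDA error checks are omitted for brevity.
float timeKernelMs(int n, int runs) {
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    int threads = 256, blocks = (n + threads - 1) / threads;

    scaleKernel<<<blocks, threads>>>(d, n);   // warm-up launch, not timed

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int r = 0; r < runs; ++r) scaleKernel<<<blocks, threads>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);               // wait until the last launch has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms / runs;
}
```

Averaging over several launches after a warm-up run tends to give steadier numbers than timing a single launch, because the first launch often includes one-time setup cost.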

Results

Execution Times (in milliseconds):

| Tile Size | Blur Size | Non-Tiled Version | Tiled Version |
|-----------|-----------|-------------------|---------------|
| N/A       | 1         | 23.5              | 15.2          |
| N/A       | 2         | 45.1              | 28.6          |
| N/A       | 3         | 71.3              | 45.9          |
| 16x16     | 1         | N/A               | 12.8          |
| 16x16     | 2         | N/A               | 24.5          |
| 16x16     | 3         | N/A               | 39.7          |
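
The table can be summarized as a speedup per blur size. The worked figures below assume that each 16x16 tiled time is compared against the non-tiled time measured at the same blur size, since the 16x16 rows list no non-tiled time of their own. The symbols $S$, $T_{\text{non-tiled}}$, and $T_{\text{tiled}}$ are introduced here for illustration.

```latex
S = \frac{T_{\text{non-tiled}}}{T_{\text{tiled}}}

% Tiled rows with no tile size stated
S_{1} = \frac{23.5}{15.2} \approx 1.55, \quad
S_{2} = \frac{45.1}{28.6} \approx 1.58, \quad
S_{3} = \frac{71.3}{45.9} \approx 1.55

% Tiled rows with 16x16 tiles
S_{1} = \frac{23.5}{12.8} \approx 1.84, \quad
S_{2} = \frac{45.1}{24.5} \approx 1.84, \quad
S_{3} = \frac{71.3}{39.7} \approx 1.80
```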

Conclusion

The tiled implementation reduced execution time at every blur size measured; with 16x16 tiles and a blur size of 1, for example, the time fell from 23.5 ms to 12.8 ms. Staging pixels in shared memory reduced global memory accesses, which led to better performance. The experiments demonstrate that tiling is an effective optimization technique for image processing tasks on the GPU.

