
2016 IEEE Students’ Conference on Electrical, Electronics and Computer Science

PARALLEL EDGE DETECTION BY SOBEL ALGORITHM USING CUDA C

Adhir Jain, Anand Namdev, Dr. Meenu Chawla
Dept. of Computer Science & Engg., MANIT, Bhopal, M.P., India 462003
[email protected] [email protected] [email protected]

Abstract—Edge detection is one of the most important paradigms of image processing. Images contain millions of pixels, and each pixel's information is independent of its neighbouring pixels. This paper therefore tests the capability of the Graphics Processing Unit (GPU) to perform in parallel the millions of pixel calculations involved in image processing. Because each pixel operation is independent of the others, the GPU can be used effectively through high-level programmable interfaces. More specifically, this paper adopts the Compute Unified Device Architecture (CUDA) as its parallel programming platform and examines the possible gain in time that can be attained for edge detection in images. The well-known Sobel algorithm for edge detection is used in the experiment. A dataset of images was tested for edge detection both serially and in parallel. The results of the parallel algorithm were further divided according to the machine on which the algorithm was tested, and then classified according to the number of kernel functions used on each machine. Results showed that, for larger images, the parallel implementation is about 262 times and 943 times faster than the serial implementation when 2 kernel functions were used on the GeForce and Tesla machines respectively, and about 120 times and 455 times faster when 3 kernel functions were used on the GeForce and Tesla machines respectively. Statistics also showed a decline in speedup of about 52% when 3 kernels were used instead of 2, due to the increase in communication time. Hence the analysis shows that the decision regarding which sections of the algorithm to parallelise should be taken wisely; otherwise the additional overhead, i.e. communication time (the time taken in transferring data from CPU to GPU and back from GPU to CPU), reduces the overall speedup.

Keywords — Parallel Edge Detection, Parallel Computing, CUDA, Kernel, Sobel Operator, Speedup, Image Processing, MATLAB, Parallelism v/s Speedup

I. INTRODUCTION

Image processing is one of the most important fields in use today. Human vision can easily acquire, understand and interpret the information stored in the form of images. It is, on the other hand, a challenging task to make a machine acquire and understand this information so as to automate tasks without human intervention. It is thus important to learn and implement the various techniques of image processing. One of the most important paradigms of image processing is image edge detection. Image edges are the most basic features of an image: an edge is the set of connected pixels that forms the boundary between two disjoint regions. Edges allow the user to observe the features of an image where a more or less abrupt change in intensity occurs. Edge detection has several important applications in digital image processing, such as pattern recognition and the medical field. A number of edge detection algorithms have been described in previous papers. The Sobel operator [1] is one of the classic operators used in edge detection, as it is simple, insensitive to noise and requires fewer computations than other operators. How to quickly and accurately extract the edge information of images has always been a hot research topic. An image consists of millions of pixels, and each pixel operation is independent of its neighbouring pixels. Thus a parallel programming model can be used, which focuses on performing many operations concurrently, though individually more slowly, rather than performing individual operations rapidly [10].

CUDA [5] is a parallel computing architecture developed by NVIDIA for massively parallel high-performance computing. It is the compute engine in the GPU and is accessible to developers through standard programming languages. The source code for the edge detection program consists of both serial (CPU) and parallel (GPU) code.

This paper compares the performance differences between program code run on a sequential processor (CPU) and on a parallel processor (GPU). It also compares the results on different types of GPU machine (GeForce [3] and Tesla [4]) and on the basis of the number of kernel functions [2] [11] involved on each machine.

The organization of this paper is as follows. In section 2, the Sobel edge detection operator is discussed; the CUDA processing flow is described in section 3. In section 4, the CUDA thread hierarchy and CUDA kernels are introduced; the detailed algorithm to be implemented is explained in section 5. In section 6, GPU programming in MATLAB is presented. Outcomes and results are shown in section 7. In section 8, graphs are analysed along with some observations. Finally, the conclusions are stated in section 9.

II. SOBEL EDGE DETECTION OPERATOR

Edge detection is a common image processing technique used in feature detection and extraction. Applying edge detection to an image can significantly reduce the amount of

978-1-4673-7918-2/16/$31.00 © 2016 IEEE


Authorized licensed use limited to: VIT University. Downloaded on November 08,2024 at 14:45:10 UTC from IEEE Xplore. Restrictions apply.
data needed to be processed at a later phase, while maintaining the important structure of the image. The idea is to remove everything from the image except the pixels that are part of an edge. These edges have special properties, such as corners, lines and curves. A collection of these properties or features can be used to accomplish a bigger task, such as image recognition. An edge can be identified by significant local changes of intensity in an image; an edge usually divides two different regions of an image. Most edge detection algorithms work best on an image to which a noise removal procedure has already been applied. A simple approach is the Sobel edge detection algorithm. It involves convolving the image with an integer-valued filter, which is both simple and computationally inexpensive.

The Sobel operator is widely used in image processing, particularly within edge detection algorithms. The Sobel operator finds the approximate derivatives in the horizontal and vertical directions:

Gx = {f(x+1, y−1) + 2f(x+1, y) + f(x+1, y+1)} − {f(x−1, y−1) + 2f(x−1, y) + f(x−1, y+1)}

Gy = {f(x−1, y+1) + 2f(x, y+1) + f(x+1, y+1)} − {f(x−1, y−1) + 2f(x, y−1) + f(x+1, y−1)}

And the net gradient is: g(x, y) = √(Gx² + Gy²)

Its convolution templates, written out from the expressions above (rows indexed by y−1, y, y+1 and columns by x−1, x, x+1), are:

Tx = [ −1  0  +1 ]      Ty = [ −1  −2  −1 ]
     [ −2  0  +2 ]           [  0   0   0 ]
     [ −1  0  +1 ]           [ +1  +2  +1 ]

The Sobel operator is used to detect the edges of an image M: the horizontal template Tx and the vertical template Ty are convolved with the image, without taking the border conditions into account. The total gradient value G is then obtained by adding the two gradient matrices; this G is called the gradient image. Finally, edges can be detected by applying a threshold to the gradient image [6].

III. CUDA PROCESSING FLOW

To make the GPU work for general-purpose calculations, a certain processing flow is to be maintained, which is as follows:
1) Copy the input image arrays from CPU memory to GPU memory to load the required data on the device for computation.
2) Load the GPU program and execute it, caching data on chip for performance. The time required for the evaluation of results on the GPU is known as the execution time.
3) Copy the result image arrays back from GPU memory to CPU memory to further manipulate and display the results.
The above processing flow is depicted in Figure 1.

Figure 1. CUDA processing flow [12]

IV. CUDA THREAD HIERARCHY AND CUDA KERNEL

Threads on the device are automatically invoked when a kernel is executed. The programmer determines the number of threads that best suits the given problem; the thread count, along with the thread configuration, is passed into the kernel. Figure 2 shows the entire collection of threads responsible for an execution of the kernel, called a grid. A grid is further partitioned and can consist of one or more thread blocks. A block is an array of concurrent threads that execute the same thread program. A thread block can be partitioned into one, two or three dimensions. All threads within a block can cooperate with each other: they can share data by reading and writing shared memory, and they can synchronize their execution by calling __syncthreads().

SCEECS 2016
The threading configuration is then passed to the kernel. Within the kernel, this information is stored in built-in variables: blockDim holds the dimension information of the current block, while blockIdx and threadIdx provide the current block and thread indices. One limitation on blocks is that each block can hold up to 512 threads. Once a kernel is launched, the corresponding grid and block structure is created.

Figure 2. Grid of thread blocks [7]

V. DETAILED ALGORITHM

The algorithm is as follows:
1) Fetch the red (R(i,j)), green (G(i,j)) and blue (B(i,j)) components of the input colour image.
2) Go to step 3 directly if the image is already in grayscale format. Otherwise, convert the input image into a grayscale [8] image using the formula:

gray(i,j) = 0.2989∗R(i,j) + 0.5870∗G(i,j) + 0.1140∗B(i,j)

3) Apply the Sobel operator in both the horizontal and vertical directions to calculate the horizontal and vertical gradients.
4) Calculate the net gradient and call that image the gradient image.
5) Finally, take input from the user through a slider depicting the different threshold values and apply it to the gradient image.
NOTE: According to their needs, the user can either set the threshold or treat the gradient image with a predefined optimum threshold [6] value.

The above algorithm (flowchart in Figure 3) is implemented both serially on the CPU and in parallel on the GPU. In this paper, three sections of the algorithm have been identified for executing in parallel, which are as follows:
(a) Fetching the red, blue and green components from the colour image.
(b) Conversion of the coloured (RGB) image to a grayscale image.
(c) Applying the Sobel mask and calculating the gradient image.

This paper shows the implementation of the algorithm in two ways: first, by defining 2 kernels, parallelizing only points (b) and (c); secondly, by defining 3 kernels, parallelizing all of the above points. Finally, the results are compared with regard to speedup on the two machines, GeForce and Tesla.

The decision about which sections of the code to parallelise should be taken wisely. As the parallelism in the code increases, the speedup increases only up to a certain image size; beyond that, any further parallelism results in an overall reduction of the speedup. The reason is that parallelism involves a large transfer of data from the CPU to the GPU for parallel execution on the device side, and then again from the GPU to the CPU to copy the results back to the host side. As the image size increases, both execution time and communication time increase due to the larger number of pixel calculations, but the communication time, i.e. the overhead, increases at a higher rate. Thus the ratio of execution time to communication time decreases with increasing image resolution, reducing the overall speedup. This paper explains this phenomenon and analyses the effect of parallelism versus image resolution with the help of the graphs in section 8.

Figure 3. Algorithmic Flow

VI. GPU PROGRAMMING IN MATLAB

GPUs are increasingly applied to scientific calculations. Unlike a traditional CPU, which includes no more than a handful of cores, a GPU has a massively parallel array of integer and floating-point processors, as well as dedicated high-speed memory. A typical GPU comprises hundreds of these smaller processors. The increased throughput made possible by a GPU comes at a cost. Firstly, for computations to be fast enough, data must be sent from the CPU to the GPU before calculation and then retrieved from it afterwards; because a GPU is attached to the host CPU via the Peripheral Component Interconnect (PCI) Express bus, the memory access is slower than with a traditional CPU. This means that the overall computational speedup is limited by the amount of data transfer that occurs in the algorithm. Secondly, programming for GPUs requires a different model and a skill set that can be difficult and time-consuming to acquire. A lot of time is also spent on managing the code and making it work with these large numbers of threads.

Experienced programmers can write their own CUDA code and use the CUDA Kernel interface in the Parallel Computing Toolbox of MATLAB to integrate the CUDA code with MATLAB, thereby creating a MATLAB object that provides access to the existing CUDA kernel, which has already been converted into PTX code (PTX is a low-level Parallel Thread eXecution instruction set). They then invoke the feval command to evaluate the kernel on the GPU, using MATLAB arrays as input and output.

A. 64-bit .ptx generation

The CUDA kernel code, which is written separately, cannot be run directly in MATLAB; hence it is converted to PTX code, which is then executed from MATLAB commands by creating an object of the generated PTX file. This can be done by running CMD as administrator and typing in the required command. A screenshot is given for proper visualization in Figure 4.

Figure 4. Command to generate ptx file [9]

B. Evaluating the CUDA Kernel in MATLAB

To load the kernel into MATLAB, the path is provided to the compiled PTX file and the source code:

ptx_object = parallel.gpu.CUDAKernel('Kernel.ptx', 'Kernel.cu');

Once the ptx_object is created, a few setup tasks must be completed before one can launch it, such as initializing the return data and setting the sizes of the thread blocks and grid. The kernel can then be used just like any other MATLAB function, except that it is launched using the feval command, with the following syntax:

output = feval(ptx_object, input_Arguments) [11]

VII. OUTCOMES AND RESULTS

Figure 5. The above four figures are the outcomes of the experiment done with the parallel algorithm for Sobel edge detection: (a) original image Thunderbird of size 1500x1500 pixels, (b) greyscale-converted image, (c) gradient image and (d) final edge-detected image.

The abbreviations used in the tables below are as follows:
IR = Image Resolution in pixel²
ST = Serial Time in seconds
PT(2k) = Parallel Time in seconds when two kernels were used
S(2k) = Speedup when two kernels were used
PT(3k) = Parallel Time in seconds when three kernels were used
S(3k) = Speedup when three kernels were used
∆(2k,3k) = Percentage decrease in speedup

Table I. ALGORITHM RUNNING TIME AND SPEEDUP ON GEFORCE MACHINE (SLOW)

IR          ST      PT(2k)  S(2k)   PT(3k)  S(3k)   ∆(2k,3k)
1500x1500   59.50   0.35    168.36  0.74    79.83   52.58
1800x1800   86.73   0.45    191.79  1.06    81.74   57.38
2200x2200   137.18  0.70    194.43  1.49    91.81   52.77
3500x3500   425.67  1.69    250.74  3.74    113.63  54.68

Table II. ALGORITHM RUNNING TIME AND SPEEDUP ON TESLA MACHINE (FAST)

IR          ST      PT(2k)  S(2k)   PT(3k)  S(3k)   ∆(2k,3k)
1500x1500   59.50   0.08    705.32  0.17    338.52  52.04
1800x1800   86.73   0.11    725.56  0.25    342.14  52.84
2200x2200   137.18  0.17    771.71  0.36    372.25  51.76
3500x3500   425.67  0.45    943.88  0.93    454.16  51.88

Table I and Table II show the results for the image datasets. The same experiment was performed on around 25 image datasets with resolutions ranging from 256x256 to 10000x10000, on two machines, GeForce and Tesla. The experimental results showed that on average there was about a 52% decrease in speedup when 3 kernels were used compared to 2 kernels. Hence it is not always beneficial to impose parallelism unnecessarily, as it comes at the cost of communication time, as explained earlier in the paper. The hardware used for the simulation environment is described below:

Table III. HARDWARE SPECIFICATIONS

Property               CPU    GeForce  Tesla
No. of cores           8      96       448
Memory capacity (GB)   8      2        8
Memory speed (GHz)     1.90   1.62     1.50

VIII. GRAPHS AND OBSERVATIONS

Figure 6 shows the graph plotted against image resolution (in pixel²) and the speedup on both machines using 2 kernel functions.

Figure 6. Comparative study (2 kernels)

From the graph in Figure 6 it can be seen that as the image resolution increases, the speedup increases, but only up to a resolution of about 6500; any further increase in the size of the image results in a decrease of the speedup. This decrease is mainly due to the communication time, which consists of first copying the data to the GPU side and then copying the results back after execution on the device side. Hence, as the size of the image increases, the time for transferring the data to the GPU, i.e. the communication time, increases at a higher rate than the execution time, which results in the reduction of the speedup.

Figure 7 shows the graph drawn for the test cases when the GPU code consisted of 3 kernels. One extra kernel was added in order to produce maximum parallelism in the code, and the results were then analysed. From the graph it can clearly be observed that the speedup decreased in comparison to the earlier GPU code with 2 kernels, the reason being the increase in communication overhead due to that extra kernel, which also involved transferring the whole data to the GPU side before executing the kernel there. This simply adds to the total time. The graph has been plotted up to a resolution of 7000 on the faster machine and up to 5000 on the slower machine, as memory was insufficient to store the huge image arrays on the GPU side. The graph also shows that the speedup gradually decreases after a resolution of 6500 because of the extra added burden of the communication time of the 3 kernels.

Figure 7. Comparative study (3 kernels)

IX. CONCLUSION

This project work concludes that if parallelism is used optimally then one can achieve high speedups, even 1000 or 10000 times faster. The implementation contains both the sequential version and the parallel version. This

allows the reader to compare and contrast the performance differences between the two executions. The implementation used in this project achieves a maximum speedup of around 950 times; however, if one tries to maximize parallelism where it is not necessary, the result is a reduced speedup, due to the large increase in communication time, which is an overhead for performance. This project examines two separate speedups, one with 2 kernels and the other with 3 kernels, and the observation is that the speedup reduced by around 52% on both machines: a GeForce with 96 CUDA cores and a Tesla with 448 CUDA cores. So it is clear that proper and efficient use of CUDA programming, or of parallelism in general, can perform complex computations in much less time and provide higher speedups.
Future work will involve the edge detection of images with resolution greater than 10000x10000 with the help of image segmentation, since GPU memory becomes insufficient to store the large arrays of high-resolution images. Image segmentation is the process of dividing an image into multiple sub-images such that the result is a set of segments that collectively cover the entire image. With the image segmented into parts, instead of transferring the whole array to the GPU, the calculation can be done in parts and the results merged, which will help evaluate results for even higher-resolution images.
ACKNOWLEDGMENT
We express our sincere gratitude to our guide Dr. Meenu
Chawla and thank her for her guidance and support in
completing this paper.

REFERENCES
[1] Sobel, I., "An Isotropic 3x3 Gradient Operator", in Machine Vision for Three-Dimensional Scenes, Freeman, H. (ed.), Academic Press, NY, pp. 376-379, 1990.
[2] CUDA C Programming Guide, PG-02829-001 v7.0, pp. 9-10, 2015.
[3] http://www.geforce.com/hardware/notebook-gpus/geforce-gt-630m/specifications
[4] Michael Garland, "Parallel Computing Experiences with CUDA", in IPDPS 2010, pp. 13-27.
[5] Jayshree Ghorpade, Jitendra Parande, Madhura Kulkarni, Amit Bawaskar, "GPGPU Processing in CUDA Architecture", Advanced Computing: An International Journal (ACIJ), Vol. 3, No. 1, January 2012, pp. 105-120.
[6] Jin-Yu, Yan, Xiang, "Edge Detection of Images Based on Improved Sobel Operator and Genetic Algorithms", IEEE International Conference on Image Analysis and Signal Processing (IASP 2009), April 2009, pp. 31-35.
[7] CUDA C Programming Guide, PG-02829-001 v7.5, p. 11, 2015.
[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Prentice Hall, New Jersey, 2008.
[9] Generating CUDA ptx files from Visual Studio [online]. Available: http://stackoverflow.com/questions/13426170/convert-cu-file-to-ptx-file-in-windows?rq=1
[10] Tinku Acharya and Ajay K. Ray, Image Processing: Principles and Applications, John Wiley & Sons, New Jersey, 2005.
[11] Cliff Woolley, CUDA Overview, NVIDIA Developer Technology Group, pp. 20-30.
[12] Cyril Zeller, CUDA C/C++ Basics, NVIDIA Corporation, Supercomputing Tutorial, pp. 9-11, 2011.

