
Embedded Linux – GSE5

LAB5 – Introduction to OpenCL

BARRIGA Ponce de Leon Ricardo

GUO Ran

Objectives
Introduction to OpenCL and GPU hardware acceleration.

Introduction
OpenCL (Open Computing Language) is an open standard for parallel programming of
heterogeneous computational resources at processor level.
In this lab, we implement the same vector addition in two ways and measure the execution time of each: a plain C++ version running on the CPU and a version accelerated on the GPU with OpenCL. We then compare the performance improvement brought by GPU acceleration.

1. C++ vector addition


First, we use the C++ code in the file vector_add.cpp.
We add code to measure the execution time of the vector addition using gettimeofday.

As shown in the code above, the variables start and end are declared as struct timeval. To do this, we include the header <sys/time.h>.

A struct timeval contains the field tv_sec, which holds seconds, and tv_usec, which holds microseconds.

Then we calculate and print execution time with the code below:
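
A minimal, self-contained sketch of this measurement is shown below; the vector size and variable names are illustrative and not necessarily those used in vector_add.cpp:

    #include <sys/time.h>   // struct timeval, gettimeofday
    #include <cstdio>

    static const int SIZE = 1024 * 1024;   // illustrative vector size
    static float a[SIZE], b[SIZE], c[SIZE];

    int main() {
        for (int i = 0; i < SIZE; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        struct timeval start, end;
        gettimeofday(&start, NULL);          // timestamp before the addition
        for (int i = 0; i < SIZE; i++)
            c[i] = a[i] + b[i];
        gettimeofday(&end, NULL);            // timestamp after the addition

        // tv_sec holds seconds and tv_usec microseconds; convert the difference to ms.
        double elapsedMs = (end.tv_sec - start.tv_sec) * 1000.0
                         + (end.tv_usec - start.tv_usec) / 1000.0;
        printf("CPU vector addition: %.3f ms (c[1] = %f)\n", elapsedMs, c[1]);
        return 0;
    }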

We compile and run it; as shown in the figure below, the execution time is 16.83 ms.


2. OpenCL vector addition


To let OpenCL process this operation in parallel on the compute device(s), we need to define a
kernel. The kernel is the OpenCL function which will run on the compute device(s). This kernel is
defined in a separate file named vector_add_opencl.cl.
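
A minimal sketch of what such a kernel can look like (the kernel and argument names are illustrative); each work-item adds one pair of elements:

    // vector_add_opencl.cl (sketch): one work-item per vector element.
    __kernel void vector_add(__global const int* inputA,
                             __global const int* inputB,
                             __global int* outputC)
    {
        int i = get_global_id(0);          // index of this work-item
        outputC[i] = inputA[i] + inputB[i];
    }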

Then we created a new C++ program vector_add_opencl.cpp (the host CPU program) to set up
and control the execution of the previous OpenCL kernel on the compute device (GPU).

• Set up OpenCL environment


A. Create a context on a GPU on the first available platform. An OpenCL context is created with
one or more devices. Contexts are used to manage objects such as command queues, memory,
program, and kernel objects, and to run kernels on the devices specified in the context.
With the help of common.h, we call the function bool createContext(cl_context* context).

B. Create an OpenCL command queue for a given context. A command queue is an object that
holds commands to be executed on a specific device. The command queue is created on a
specific device in a context. Commands submitted to a command queue are enqueued in-order but may be
executed in-order or out-of-order.
With the help of common.h, we call the function bool createCommandQueue(cl_context context,
cl_command_queue* commandQueue, cl_device_id* device).

C. Create an OpenCL program from vector_add_opencl.cl. An OpenCL program consists of a set
of kernels. Programs may also contain auxiliary functions called by the __kernel functions and
constant data.

D. Create our OpenCL kernel object for the kernel function defined in vector_add_opencl.cl. A sketch of these four setup steps is shown below.
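
The following hedged sketch shows how these steps can look in the host program. It assumes the helper functions from common.h behave as described above, and that kernelSource is a C string already filled with the contents of vector_add_opencl.cl; error checking is omitted for brevity.

    // Excerpt from the host program vector_add_opencl.cpp; assumes
    // #include <CL/cl.h> and "common.h" (error checks omitted).
    cl_context context = 0;
    cl_command_queue commandQueue = 0;
    cl_device_id device = 0;
    cl_program program = 0;
    cl_kernel kernel = 0;
    cl_int errorNumber = 0;

    // A. Context on a GPU of the first available platform.
    createContext(&context);

    // B. Command queue on a device of that context.
    createCommandQueue(context, &commandQueue, &device);

    // C. Program built from the kernel source string.
    const char* source = kernelSource;
    program = clCreateProgramWithSource(context, 1, &source, NULL, &errorNumber);
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

    // D. Kernel object for the __kernel function (named vector_add here).
    kernel = clCreateKernel(program, "vector_add", &errorNumber);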

• Set up memory/data

A. Create three memory buffers for the input/output data. Using the function clCreateBuffer, we created
three memory objects for the kernel: two input buffers and one output buffer (inputA, inputB, outputC).

B. Initialize the input data. Using the function clEnqueueMapBuffer, we mapped the input buffers to
host pointers. This function enqueues a command to map a region of the given buffer object into the
host address space and returns a pointer to this mapped region.

C. Set the kernel arguments. We passed the three memory buffers (inputA, inputB, outputC) to the kernel as arguments using
the function clSetKernelArg. A sketch of this memory/data setup is shown below.
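
Continuing the previous sketch, the memory setup can look as follows; the vector size is illustrative and the buffers hold cl_int elements here:

    // Illustrative sizes; the three buffers match the kernel arguments.
    const size_t vectorSize = 1024 * 1024;
    const size_t bufferSize = vectorSize * sizeof(cl_int);

    // A. Three memory objects: two inputs and one output.
    cl_mem inputA  = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR,
                                    bufferSize, NULL, &errorNumber);
    cl_mem inputB  = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR,
                                    bufferSize, NULL, &errorNumber);
    cl_mem outputC = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR,
                                    bufferSize, NULL, &errorNumber);

    // B. Map the input buffers into host memory, fill them, then unmap them.
    cl_int* a = (cl_int*)clEnqueueMapBuffer(commandQueue, inputA, CL_TRUE, CL_MAP_WRITE,
                                            0, bufferSize, 0, NULL, NULL, &errorNumber);
    cl_int* b = (cl_int*)clEnqueueMapBuffer(commandQueue, inputB, CL_TRUE, CL_MAP_WRITE,
                                            0, bufferSize, 0, NULL, NULL, &errorNumber);
    for (size_t i = 0; i < vectorSize; i++) { a[i] = (cl_int)i; b[i] = (cl_int)i; }
    clEnqueueUnmapMemObject(commandQueue, inputA, a, 0, NULL, NULL);
    clEnqueueUnmapMemObject(commandQueue, inputB, b, 0, NULL, NULL);

    // C. Pass the three buffers to the kernel as arguments 0, 1 and 2.
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputA);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &inputB);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &outputC);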

• Execute the kernel instances


A. First, we define the global work size and enqueue the kernel using the function
clEnqueueNDRangeKernel.

B. Then we wait for kernel execution to complete using the function clFinish. It blocks until all
previously queued OpenCL commands in the command queue have been issued to the associated device
and have completed. A sketch of the kernel launch is shown below.
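
Continuing the sketch, launching the kernel instances can look like this; the event handle is kept for the timing measurement described later:

    // A. One work-item per vector element; the local work size is left to
    //    the OpenCL runtime (NULL).
    cl_event event = 0;
    size_t globalWorkSize[1] = { vectorSize };
    clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,
                           globalWorkSize, NULL, 0, NULL, &event);

    // B. Block until all queued commands have been issued and completed.
    clFinish(commandQueue);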

• After execution
A. Retrieve results. We mapped the output buffer to a local pointer, then read the results
through the mapped pointer (with the help of clEnqueueReadBuffer). For convenience, we print
the results in the terminal.

Then we unmapped the output data with the function clEnqueueUnmapMemObject.

B. We release the OpenCL objects with the function cleanUpOpenCL from common.h. Note that this
function can only be called once, so we gather the memory objects in an array before releasing them.

We measured the kernel execution time using clGetEventProfilingInfo. A sketch of these post-execution steps is shown below.
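
Continuing the sketch above, the post-execution steps can look as follows. The cleanUpOpenCL signature is assumed from common.h, and the profiling queries only work if the command queue was created with profiling enabled:

    // A. Map the output buffer, print the results, then unmap it.
    cl_int* c = (cl_int*)clEnqueueMapBuffer(commandQueue, outputC, CL_TRUE, CL_MAP_READ,
                                            0, bufferSize, 0, NULL, NULL, &errorNumber);
    for (size_t i = 0; i < vectorSize; i++)
        printf("outputC[%zu] = %d\n", i, c[i]);
    clEnqueueUnmapMemObject(commandQueue, outputC, c, 0, NULL, NULL);

    // Kernel execution time from the profiling event, converted to milliseconds.
    cl_ulong startNs = 0, endNs = 0;
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startNs, NULL);
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,   sizeof(cl_ulong), &endNs,   NULL);
    printf("GPU vector addition: %.3f ms\n", (endNs - startNs) / 1000000.0);

    // B. Release everything in one call; the memory objects are grouped in an
    //    array because cleanUpOpenCL is only called once (signature assumed).
    cl_mem memoryObjects[3] = { inputA, inputB, outputC };
    cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, 3);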

After compiling and running the host program, we get the result shown in the figure below.

Clearly, the execution time is much shorter than without OpenCL: about 14 ms faster. The speedup is 16.83 ms / 2.150 ms ≈ 7.83.

Then we change the vector size:


With size 512*512, the speedup is 4.968 ms / 0.649 ms ≈ 7.655.
With size 128*128, the speedup is 0.428 ms / 0.0569 ms ≈ 7.522.
With size 16*16, the speedup is 0.065 ms / 0.058 ms ≈ 1.121.

We can see that the speedup stays roughly constant, around 7.5 to 7.8, as long as the vector
size is large enough. But when the vector size is very small, such as 16*16, the CPU execution
time is almost the same as the OpenCL execution time, so the GPU brings almost no benefit.

Conclusion
This lab gave us an introduction to hardware acceleration and OpenCL programming. During this
lab, we learned how to write an OpenCL program, and we managed to improve the performance of the
vector addition by using the GPU through OpenCL.


Annex
• vector_add.cpp

• vector_add_opencl.cpp


