Embedded Linux - GSE5 Lab5 - Introduction to OpenCL
Ricardo Barriga Ponce de Leon, Ran Guo
GSE5 LAB5 Linux BARRIGA_GUO
Objectives
Introduction to OpenCL and GPU hardware acceleration.
Introduction
OpenCL (Open Computing Language) is an open standard for parallel programming of heterogeneous processors such as CPUs and GPUs.
In this lab, we measure the execution time of the same code segment, a vector addition, implemented in two ways, and compare the performance of the CPU version (C++ code) with the GPU-accelerated version (OpenCL).
As shown in the code above, the variables start and end are declared as struct timeval, which requires the header <sys/time.h>.
The struct timeval contains a field tv_sec, which holds seconds, and a field tv_usec, which holds microseconds.
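The timing pattern described here can be sketched as follows. This is a minimal sketch, not the code from vector_add.cpp: the helper names elapsedMs and timeMs are hypothetical, and it assumes a POSIX system where gettimeofday is available.

```cpp
#include <sys/time.h>  // struct timeval, gettimeofday (POSIX)
#include <cassert>

// Elapsed wall-clock time in milliseconds between two timeval samples.
// tv_sec holds whole seconds and tv_usec the remaining microseconds; a
// negative microsecond difference is compensated by the seconds term.
double elapsedMs(const struct timeval& start, const struct timeval& end) {
    assert(start.tv_usec >= 0 && start.tv_usec < 1000000);
    assert(end.tv_usec >= 0 && end.tv_usec < 1000000);
    double seconds = static_cast<double>(end.tv_sec - start.tv_sec);
    double micros = static_cast<double>(end.tv_usec - start.tv_usec);
    return seconds * 1000.0 + micros / 1000.0;
}

// Hypothetical helper: runs `work` once and returns how long it took,
// measured with gettimeofday before and after the call.
template <typename F>
double timeMs(F&& work) {
    struct timeval start, end;
    gettimeofday(&start, nullptr);
    work();
    gettimeofday(&end, nullptr);
    return elapsedMs(start, end);
}
```

For example, `double ms = timeMs([&] { /* code segment to time */ });` gives the elapsed time in milliseconds, which can then be printed with printf. Since gettimeofday measures wall-clock time, other processes running at the same time can inflate the result.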
We then calculate and print the execution time with the code below:
We compile and run the program and get the result shown in the figure below: the execution time is 16.83 ms.
Then we created a new C++ program, vector_add_opencl.cpp (the host program running on the CPU), to set up and control the execution of the previously written OpenCL kernel on the compute device (the GPU).
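Since the kernel itself is not reproduced here, the following sketch only illustrates the OpenCL execution model on the CPU. It assumes a typical vector-addition kernel (shown as a comment, hypothetical) in which each work-item reads its index with get_global_id(0) and computes one element; the names vectorAddWorkItem and emulateNDRange are hypothetical and are not part of OpenCL.

```cpp
#include <cassert>
#include <cstddef>

// Assumed kernel (OpenCL C, hypothetical, not the lab's exact source):
//   __kernel void vector_add(__global const float* a,
//                            __global const float* b,
//                            __global float* c) {
//       int i = get_global_id(0);
//       c[i] = a[i] + b[i];
//   }

// One "work-item": globalId plays the role of get_global_id(0),
// and the work-item computes exactly one output element.
void vectorAddWorkItem(std::size_t globalId, const float* a, const float* b, float* c) {
    c[globalId] = a[globalId] + b[globalId];
}

// Sequential stand-in for an NDRange launch of globalSize work-items.
// On a GPU these work-items run in parallel; here they run one after another.
void emulateNDRange(std::size_t globalSize, const float* a, const float* b, float* c) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    for (std::size_t gid = 0; gid < globalSize; ++gid) {
        vectorAddWorkItem(gid, a, b, c);
    }
}
```

The point of the sketch is that the kernel contains no loop: the loop over indices is replaced by launching one work-item per element, which is what lets the GPU process many elements at once.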
B. Create an OpenCL command queue for a given context. A command queue is an object that holds commands to be executed on a specific device; it is created for one device within a context. Commands are queued in order but may be executed either in order or out of order.
Using the helper functions in common.h, we call bool createCommandQueue(cl_context context, cl_command_queue* commandQueue, cl_device_id* device).
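The in-order behaviour described above can be illustrated with a toy model. This is not the OpenCL API: InOrderQueue is a hypothetical class in which enqueue() records a command and finish() runs all pending commands in submission order, loosely mirroring what clFinish guarantees. A real command queue submits work to the device asynchronously; the model defers execution to finish() only to stay single-threaded.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Toy model of an in-order command queue (hypothetical, not OpenCL).
class InOrderQueue {
public:
    // Records a command; nothing executes yet.
    void enqueue(std::function<void()> command) {
        assert(static_cast<bool>(command));
        pending_.push(std::move(command));
    }

    // Like clFinish: returns only once every queued command has completed.
    // Commands complete in the order they were enqueued.
    void finish() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

    bool empty() const { return pending_.empty(); }

private:
    std::queue<std::function<void()>> pending_;
};
```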
• Set up memory/data
A. Create 3 memory buffers for the input/output data. Using the function clCreateBuffer, we created the kernel's three memory objects: two input buffers (inputA, inputB) and one output buffer (outputC).
B. Initialize the input data. Using the function clEnqueueMapBuffer, we mapped the input buffers to host pointers. This function enqueues a command to map a region of the given buffer object into the host address space and returns a pointer to the mapped region.
C. Set the kernel arguments. We passed the 3 memory buffers to the kernel as arguments using the function clSetKernelArg.
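Step B can be sketched as a fill loop that writes through host pointers, which is how the pointers returned by clEnqueueMapBuffer are used. In this sketch plain arrays stand in for the mapped regions, and the function name initInputs and the fill values are assumptions. In the real host program, each mapped buffer should then be unmapped with clEnqueueUnmapMemObject before the kernel runs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical initializer: a and b point to the (mapped) input regions.
// The fill pattern a[i] = i, b[i] = 2*i is an assumption for illustration.
void initInputs(float* a, float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = static_cast<float>(2 * i);
    }
}
```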
B. Then, we wait for the kernel execution to complete using the function clFinish, which blocks until all previously queued OpenCL commands in a command queue have been issued to the associated device and have completed.
• After execution
A. Retrieve results. We mapped the output buffer to a local pointer, then read the results through this mapped pointer with the help of clEnqueueReadBuffer. For convenience, we print the results in the terminal.
B. We release the OpenCL objects with the function cleanUpOpenCL from common.h. Since this function is meant to be called only once, we put the memory objects in an array so that they are all released in that single call.
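A common practice when retrieving results is to check the output read back from the device against a CPU recomputation. The following is a minimal sketch, assuming float vectors of length n; verifyVectorAdd and its default tolerance are hypothetical choices, not part of the lab code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Returns true if c[i] == a[i] + b[i] (within tolerance) for every i.
// The small tolerance guards against minor floating-point differences,
// e.g. devices that flush denormals to zero.
bool verifyVectorAdd(const float* a, const float* b, const float* c,
                     std::size_t n, float tolerance = 1e-5f) {
    assert(tolerance >= 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(c[i] - (a[i] + b[i])) > tolerance) {
            return false;  // stop at the first mismatching element
        }
    }
    return true;
}
```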
The execution time with OpenCL is much shorter than without it: 2.150 ms instead of 16.83 ms, i.e. about 14.7 ms faster. The speedup is 16.83 ms / 2.150 ms ≈ 7.828.
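Written out with the two measured times, the time saved and the speedup are:

```latex
\Delta t = t_{\text{CPU}} - t_{\text{GPU}} = 16.83\ \text{ms} - 2.150\ \text{ms} = 14.68\ \text{ms},
\qquad
S = \frac{t_{\text{CPU}}}{t_{\text{GPU}}} = \frac{16.83\ \text{ms}}{2.150\ \text{ms}} \approx 7.828
```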
Conclusion
This lab gave us an introduction to hardware acceleration and OpenCL programming. During this lab, we learned how to write an OpenCL program, and we improved the performance of the vector addition by running it on the GPU through OpenCL.
Annex
• vector_add.cpp
• vector_add_opencl.cpp