
Embedded Linux – GSE5

LAB5 – Introduction to OpenCL

BARRIGA Ponce de Leon Ricardo

GUO Ran

Objectives
Introduction to OpenCL and GPU hardware acceleration.

Introduction
OpenCL (Open Computing Language) is an open standard for parallel programming of
heterogeneous computational resources at processor level.
In this lab, we implement the same vector addition in two ways and measure the execution time of each: a plain C++ version running on the CPU and a version accelerated on the GPU with OpenCL. We then compare the performance improvement brought by GPU acceleration.

1. C++ vector addition


First, we use the C++ code in the file vector_add.cpp.
We add code to measure the execution time of the vector addition using gettimeofday.

As shown in the code above, the variables start and end are declared as struct timeval. To do this, we include the header <sys/time.h>.

A struct timeval contains the field tv_sec, which holds seconds, and tv_usec, which holds microseconds.

Then we calculate and print execution time with the code below:
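
A minimal, self-contained sketch of this measurement is shown below; the vector size and variable names are illustrative and not necessarily those used in vector_add.cpp:

    #include <sys/time.h>   // struct timeval, gettimeofday
    #include <cstdio>

    static const int SIZE = 1024 * 1024;   // illustrative vector size
    static float a[SIZE], b[SIZE], c[SIZE];

    int main() {
        for (int i = 0; i < SIZE; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        struct timeval start, end;
        gettimeofday(&start, NULL);          // timestamp before the addition
        for (int i = 0; i < SIZE; i++)
            c[i] = a[i] + b[i];
        gettimeofday(&end, NULL);            // timestamp after the addition

        // tv_sec holds seconds and tv_usec microseconds; convert the difference to ms.
        double elapsedMs = (end.tv_sec - start.tv_sec) * 1000.0
                         + (end.tv_usec - start.tv_usec) / 1000.0;
        printf("CPU vector addition: %.3f ms (c[1] = %f)\n", elapsedMs, c[1]);
        return 0;
    }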

We compile and run it; as shown in the figure below, the execution time is 16.83 ms.


2. OpenCL vector addition


To let OpenCL process this operation in parallel on the compute device(s), we need to define a
kernel. The kernel is the OpenCL function which will run on the compute device(s). This kernel is
defined in a separate file named vector_add_opencl.cl.
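
A minimal sketch of what such a kernel can look like (the kernel and argument names are illustrative); each work-item adds one pair of elements:

    // vector_add_opencl.cl (sketch): one work-item per vector element.
    __kernel void vector_add(__global const int* inputA,
                             __global const int* inputB,
                             __global int* outputC)
    {
        int i = get_global_id(0);          // index of this work-item
        outputC[i] = inputA[i] + inputB[i];
    }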

Then we created a new C++ program vector_add_opencl.cpp (the host CPU program) to set up
and control the execution of the previous OpenCL kernel on the compute device (GPU).

• Set up OpenCL environment


A. Create a context on a GPU on the first available platform. An OpenCL context is created with
one or more devices. Contexts are used to manage objects such as command queues, memory,
program, and kernel objects, and to run kernels on the devices specified in the context.
With the help of common.h, we call the function bool createContext(cl_context* context).

B. Create an OpenCL command queue for a given context. A command queue is an object that
holds commands to be executed on a specific device. The command queue is created on a
specific device in a context. Commands submitted to a command queue are enqueued in-order but may be
executed in-order or out-of-order.
With the help of common.h, we call the function bool createCommandQueue(cl_context context,
cl_command_queue* commandQueue, cl_device_id* device).

C. Create an OpenCL program from vector_add_opencl.cl. An OpenCL program consists of a set
of kernels. Programs may also contain auxiliary functions called by the __kernel functions and
constant data.

D. Create our OpenCL kernel object for the kernel function defined in vector_add_opencl.cl. A sketch of these four setup steps is shown below.
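
The following hedged sketch shows how these steps can look in the host program. It assumes the helper functions from common.h behave as described above, and that kernelSource is a C string already filled with the contents of vector_add_opencl.cl; error checking is omitted for brevity.

    // Excerpt from the host program vector_add_opencl.cpp; assumes
    // #include <CL/cl.h> and "common.h" (error checks omitted).
    cl_context context = 0;
    cl_command_queue commandQueue = 0;
    cl_device_id device = 0;
    cl_program program = 0;
    cl_kernel kernel = 0;
    cl_int errorNumber = 0;

    // A. Context on a GPU of the first available platform.
    createContext(&context);

    // B. Command queue on a device of that context.
    createCommandQueue(context, &commandQueue, &device);

    // C. Program built from the kernel source string.
    const char* source = kernelSource;
    program = clCreateProgramWithSource(context, 1, &source, NULL, &errorNumber);
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

    // D. Kernel object for the __kernel function (named vector_add here).
    kernel = clCreateKernel(program, "vector_add", &errorNumber);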

• Set up memory/data

A. Create three memory buffers for the input/output data. Using the function clCreateBuffer, we created
three memory objects for the kernel: two input buffers and one output buffer (inputA, inputB, outputC).

B. Initialize the input data. Using the function clEnqueueMapBuffer, we mapped the input buffers to
host pointers. This function enqueues a command to map a region of the given buffer object into the
host address space and returns a pointer to this mapped region.

C. Set the kernel arguments. We passed the three memory buffers (inputA, inputB, outputC) to the kernel as arguments using
the function clSetKernelArg. A sketch of this memory/data setup is shown below.
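
Continuing the previous sketch, the memory setup can look as follows; the vector size is illustrative and the buffers hold cl_int elements here:

    // Illustrative sizes; the three buffers match the kernel arguments.
    const size_t vectorSize = 1024 * 1024;
    const size_t bufferSize = vectorSize * sizeof(cl_int);

    // A. Three memory objects: two inputs and one output.
    cl_mem inputA  = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR,
                                    bufferSize, NULL, &errorNumber);
    cl_mem inputB  = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR,
                                    bufferSize, NULL, &errorNumber);
    cl_mem outputC = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR,
                                    bufferSize, NULL, &errorNumber);

    // B. Map the input buffers into host memory, fill them, then unmap them.
    cl_int* a = (cl_int*)clEnqueueMapBuffer(commandQueue, inputA, CL_TRUE, CL_MAP_WRITE,
                                            0, bufferSize, 0, NULL, NULL, &errorNumber);
    cl_int* b = (cl_int*)clEnqueueMapBuffer(commandQueue, inputB, CL_TRUE, CL_MAP_WRITE,
                                            0, bufferSize, 0, NULL, NULL, &errorNumber);
    for (size_t i = 0; i < vectorSize; i++) { a[i] = (cl_int)i; b[i] = (cl_int)i; }
    clEnqueueUnmapMemObject(commandQueue, inputA, a, 0, NULL, NULL);
    clEnqueueUnmapMemObject(commandQueue, inputB, b, 0, NULL, NULL);

    // C. Pass the three buffers to the kernel as arguments 0, 1 and 2.
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputA);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &inputB);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &outputC);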

• Execute the kernel instances


A. First, we define the global work size and enqueue the kernel using the function
clEnqueueNDRangeKernel.

B. Then we wait for kernel execution to complete using the function clFinish. It blocks until all
previously queued OpenCL commands in the command queue have been issued to the associated device
and have completed. A sketch of the kernel launch is shown below.
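
Continuing the sketch, launching the kernel instances can look like this; the event handle is kept for the timing measurement described later:

    // A. One work-item per vector element; the local work size is left to
    //    the OpenCL runtime (NULL).
    cl_event event = 0;
    size_t globalWorkSize[1] = { vectorSize };
    clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,
                           globalWorkSize, NULL, 0, NULL, &event);

    // B. Block until all queued commands have been issued and completed.
    clFinish(commandQueue);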

• After execution
A. Retrieve results. We mapped the output buffer to a local pointer, then read the results
through the mapped pointer (with the help of clEnqueueReadBuffer). For convenience, we print
the results in the terminal.

Then we unmapped the output data with the function clEnqueueUnmapMemObject.

B. We release the OpenCL objects with the function cleanUpOpenCL from common.h. Note that this
function can only be called once, so we gather the memory objects in an array before releasing them.

We measured the kernel execution time using clGetEventProfilingInfo. A sketch of these post-execution steps is shown below.
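
Continuing the sketch above, the post-execution steps can look as follows. The cleanUpOpenCL signature is assumed from common.h, and the profiling queries only work if the command queue was created with profiling enabled:

    // A. Map the output buffer, print the results, then unmap it.
    cl_int* c = (cl_int*)clEnqueueMapBuffer(commandQueue, outputC, CL_TRUE, CL_MAP_READ,
                                            0, bufferSize, 0, NULL, NULL, &errorNumber);
    for (size_t i = 0; i < vectorSize; i++)
        printf("outputC[%zu] = %d\n", i, c[i]);
    clEnqueueUnmapMemObject(commandQueue, outputC, c, 0, NULL, NULL);

    // Kernel execution time from the profiling event, converted to milliseconds.
    cl_ulong startNs = 0, endNs = 0;
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startNs, NULL);
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,   sizeof(cl_ulong), &endNs,   NULL);
    printf("GPU vector addition: %.3f ms\n", (endNs - startNs) / 1000000.0);

    // B. Release everything in one call; the memory objects are grouped in an
    //    array because cleanUpOpenCL is only called once (signature assumed).
    cl_mem memoryObjects[3] = { inputA, inputB, outputC };
    cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, 3);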

After compiling and running the host program, we get the result shown in the figure below.

Clearly, the execution time is much shorter than without OpenCL: about 14 ms faster. The speedup is 16.83 ms / 2.150 ms ≈ 7.83.

Then we change the vector size:


With size 512*512, the speedup is 4.968 ms / 0.649 ms ≈ 7.655.
With size 128*128, the speedup is 0.428 ms / 0.0569 ms ≈ 7.522.
With size 16*16, the speedup is 0.065 ms / 0.058 ms ≈ 1.121.

We can see that the speedup stays roughly constant, around 7.5 to 7.8, as long as the vector
size is large enough. But when the vector size is very small, such as 16*16, the CPU execution
time is almost the same as the OpenCL execution time, so the GPU brings almost no benefit.

Conclusion
This lab gave us an introduction to hardware acceleration and OpenCL programming. During this
lab, we learned how to write an OpenCL program, and we managed to improve the performance of the
vector addition by using the GPU through OpenCL.


Annex
• vector_add.cpp

• vector_add_opencl.cpp


