
A Jump Start to OpenCL

Another Language to Program Parallel Computing Devices

March 15, 2009


CIS 565/665 – GPU Computing and Architecture
Sources
• OpenCL Tutorial – Introduction to OpenCL
• OpenCL for NVIDIA GPUs – Chris Lamb
• OpenCL – Parallel Computing for Heterogeneous Devices (SIGGRAPH Asia) – Khronos Group
• NVIDIA OpenCL Jump Start Guide
• OpenCL – Making Use of What You’ve Got
• OpenCL Basics and Advanced (PPAM 2009) – Dominik Behr
CUDA Working Group
• Because of Nexus and Visual Studio integration…
Anatomy of OpenCL
• Language Specification
  • C-based cross-platform programming interface
  • Subset of ISO C99 with language extensions – familiar to developers
  • Well-defined numerical accuracy (IEEE 754 rounding with specified maximum error)
  • Online or offline compilation and build of compute kernel executables
  • Includes a rich set of built-in functions
• Platform Layer API
  • A hardware abstraction layer over diverse computational resources
  • Query, select, and initialize compute devices (see the sketch after this list)
  • Create compute contexts and work-queues
• Runtime API
  • Execute compute kernels
  • Manage scheduling, compute, and memory resources
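As a quick illustration of the platform layer (a minimal sketch, not from the original slides; all names are local and error handling is omitted), the following selects the first platform and the first GPU device on it, then creates a context and a command queue:

// Pick the first platform and the first GPU device on it.
cl_platform_id hPlatform;
clGetPlatformIDs(1, &hPlatform, 0);
cl_device_id hDevice;
clGetDeviceIDs(hPlatform, CL_DEVICE_TYPE_GPU, 1, &hDevice, 0);
// Create a context and an in-order command queue for that device.
cl_context hContext = clCreateContext(0, 1, &hDevice, 0, 0, 0);
cl_command_queue hQueue = clCreateCommandQueue(hContext, hDevice, 0, 0);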
Memory Model Comparison
[Figure: side-by-side diagrams of the OpenCL and CUDA memory models]
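The original figure is not preserved here. As a hedged substitute, this small OpenCL kernel labels each address space with its CUDA counterpart (the kernel itself is illustrative and assumes a work-group size of at most 64):

__kernel void memSpaces(__global   float * g,  // CUDA: global memory
                        __constant float * c)  // CUDA: constant memory
{
    __local float tile[64];                    // CUDA: shared memory (__shared__)
    float priv = g[get_global_id(0)];          // CUDA: per-thread local/registers
    tile[get_local_id(0)] = priv + c[0];
    barrier(CLK_LOCAL_MEM_FENCE);              // CUDA: __syncthreads()
    g[get_global_id(0)] = tile[get_local_id(0)];
}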
CUDA vs OpenCL
Architecture – Execution Model
• Kernel – the smallest unit of execution, like a C function
• Host program – a collection of kernels
• Work-item – an instance of a kernel at run time (see the indexing sketch below)
• Work-group – a collection of work-items
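To make the hierarchy concrete, here is the index arithmetic that ties the two models together (a sketch, assuming a zero global work offset, as OpenCL 1.0 requires):

// Inside an OpenCL kernel, for dimension 0:
size_t gid = get_global_id(0);
// ...is the same index as composing group and local IDs by hand:
size_t same = get_group_id(0) * get_local_size(0) + get_local_id(0);
// CUDA spells the identical computation: blockIdx.x * blockDim.x + threadIdx.x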
Command Queues
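The original slide's figure is not preserved. As a hedged sketch of the idea, commands submitted to a default (in-order) queue execute in submission order; hQueue, hBuf, pInput, and pResult are hypothetical names:

// Stage input, launch, and read back through one in-order queue.
size_t nGlobal = cnDimension, nLocal = cnBlockSize;   // work sizes as size_t
clEnqueueWriteBuffer(hQueue, hBuf, CL_FALSE, 0, nBytes, pInput, 0, 0, 0);
clEnqueueNDRangeKernel(hQueue, hKernel, 1, 0, &nGlobal, &nLocal, 0, 0, 0);
clEnqueueReadBuffer(hQueue, hBuf, CL_TRUE, 0, nBytes, pResult, 0, 0, 0); // blocking
clFinish(hQueue); // wait for everything enqueued above to complete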
CUDA vs OpenCL API Differences
• Naming schemes
• How data gets passed to the API
• C for CUDA programs are compiled with an external tool (the NVCC compiler)
• The OpenCL compiler is typically invoked at runtime (offline compilation is also possible; see the build-log sketch below)
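Because the OpenCL compiler normally runs at runtime, compile errors surface through the program's build log rather than a build step. A minimal sketch of retrieving it (hProgram and aDevices are the names used in the host code later; the fixed-size log buffer is a simplification):

if (clBuildProgram(hProgram, 0, 0, 0, 0, 0) != CL_SUCCESS) {
    // Fetch the compiler output for the first device in the context.
    char aLog[4096];
    clGetProgramBuildInfo(hProgram, aDevices[0], CL_PROGRAM_BUILD_LOG,
                          sizeof(aLog), aLog, 0);
    printf("OpenCL build log:\n%s\n", aLog);
}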
CUDA:
cuInit(0);
cuDeviceGet(&hDevice, 0);
cuCtxCreate(&hContext, 0, hDevice);

CUdeviceptr pDeviceMemA, pDeviceMemB, pDeviceMemC;
cuMemAlloc(&pDeviceMemA, cnDimension * sizeof(float));
cuMemAlloc(&pDeviceMemB, cnDimension * sizeof(float));
cuMemAlloc(&pDeviceMemC, cnDimension * sizeof(float));
// copy host vectors to device
cuMemcpyHtoD(pDeviceMemA, pA, cnDimension * sizeof(float));
cuMemcpyHtoD(pDeviceMemB, pB, cnDimension * sizeof(float));

cuFuncSetBlockShape(hFunction, cnBlockSize, 1, 1);
cuLaunchGrid(hFunction, cnBlocks, 1);

OpenCL:
cl_context hContext;
hContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, 0, 0, 0);

cl_mem hDeviceMemA, hDeviceMemB, hDeviceMemC;
hDeviceMemA = clCreateBuffer(hContext,
    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
    cnDimension * sizeof(cl_float), pA, 0);
hDeviceMemB = clCreateBuffer(hContext,
    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
    cnDimension * sizeof(cl_float), pB, 0);
hDeviceMemC = clCreateBuffer(hContext,
    CL_MEM_WRITE_ONLY,
    cnDimension * sizeof(cl_float), 0, 0);

clEnqueueNDRangeKernel(hCmdQueue, hKernel, 1, 0,
    &cnDimension, &cnBlockSize, 0, 0, 0);
CUDA Pointer Traversal
struct Node { Node* next; };
n = n->next; // undefined operation in OpenCL,
             // since 'n' here is a kernel input

OpenCL Pointer Traversal
struct Node { unsigned int next; }; // 'next' stores an offset, not a pointer
n = bufBase + n->next; // pointer arithmetic is fine; bufBase is
                       // a kernel input param to the buffer's beginning
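A fuller sketch of the offset idiom inside a kernel (hypothetical Node layout and names; the slides only show the one-line traversal):

struct Node {
    unsigned int next;  // index of the next node, not a pointer
    float value;
};

__kernel void sumList(__global const struct Node * bufBase,
                      unsigned int head,
                      __global float * out)
{
    float sum = 0.0f;
    // Walk the list by indexing from the buffer base; 0xFFFFFFFF ends the list.
    for (unsigned int i = head; i != 0xFFFFFFFFu; i = bufBase[i].next)
        sum += bufBase[i].value;
    *out = sum;
}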
CUDA Kernel code:
__global__ void
vectorAdd(const float * a, const float * b, float * c)
{
// Vector element index
int nIndex = blockIdx.x * blockDim.x + threadIdx.x;
c[nIndex] = a[nIndex] + b[nIndex];
}
OpenCL Kernel code:
__kernel void
vectorAdd(__global const float * a,
__global const float * b,
__global float * c)
{
// Vector element index
int nIndex = get_global_id(0);
c[nIndex] = a[nIndex] + b[nIndex];
}

CUDA kernel functions are declared using the “__global__”


function modifier

OpenCL kernel functions are declared using “__kernel”.


CUDA Driver API Host code:
const unsigned int cnBlockSize = 512;
const unsigned int cnBlocks = 3;
const unsigned int cnDimension = cnBlocks * cnBlockSize;
CUdevice hDevice;
CUcontext hContext;
CUmodule hModule;
CUfunction hFunction;
// create CUDA device & context
cuInit(0);
cuDeviceGet(&hDevice, 0); // pick first device
cuCtxCreate(&hContext, 0, hDevice);
cuModuleLoad(&hModule, "vectorAdd.cubin");
cuModuleGetFunction(&hFunction, hModule, "vectorAdd");
// allocate host vectors
float * pA = new float[cnDimension];
float * pB = new float[cnDimension];
float * pC = new float[cnDimension];
// initialize host memory
randomInit(pA, cnDimension);
randomInit(pB, cnDimension);
// allocate memory on the device
CUdeviceptr pDeviceMemA, pDeviceMemB, pDeviceMemC;
cuMemAlloc(&pDeviceMemA, cnDimension * sizeof(float));
cuMemAlloc(&pDeviceMemB, cnDimension * sizeof(float));
cuMemAlloc(&pDeviceMemC, cnDimension * sizeof(float));
// copy host vectors to device
cuMemcpyHtoD(pDeviceMemA, pA, cnDimension * sizeof(float));
cuMemcpyHtoD(pDeviceMemB, pB, cnDimension * sizeof(float));
// setup parameter values (offsets assume 32-bit device pointers)
cuFuncSetBlockShape(hFunction, cnBlockSize, 1, 1);
cuParamSeti(hFunction, 0, pDeviceMemA);
cuParamSeti(hFunction, 4, pDeviceMemB);
cuParamSeti(hFunction, 8, pDeviceMemC);
cuParamSetSize(hFunction, 12);
// execute kernel
cuLaunchGrid(hFunction, cnBlocks, 1);
// copy the result from device back to host
cuMemcpyDtoH((void *) pC, pDeviceMemC, cnDimension * sizeof(float));
delete[] pA; delete[] pB; delete[] pC;
cuMemFree(pDeviceMemA); cuMemFree(pDeviceMemB); cuMemFree(pDeviceMemC);
OpenCL Host Code:
const unsigned int cnBlockSize = 512;
const unsigned int cnBlocks = 3;
const unsigned int cnDimension = cnBlocks * cnBlockSize;
// create OpenCL device & context
cl_context hContext;
hContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU,
0, 0, 0);
// query all devices available to the context
size_t nContextDescriptorSize;
clGetContextInfo(hContext, CL_CONTEXT_DEVICES,
0, 0, &nContextDescriptorSize);
cl_device_id * aDevices = (cl_device_id *) malloc(nContextDescriptorSize);
clGetContextInfo(hContext, CL_CONTEXT_DEVICES,
nContextDescriptorSize, aDevices, 0);
// create a command queue for first device the context reported
cl_command_queue hCmdQueue;
hCmdQueue = clCreateCommandQueue(hContext, aDevices[0], 0, 0);
// create & compile program
cl_program hProgram;
hProgram = clCreateProgramWithSource(hContext, 1,
    &sProgramSource, 0, 0);
clBuildProgram(hProgram, 0, 0, 0, 0, 0);
// create kernel
cl_kernel hKernel;
hKernel = clCreateKernel(hProgram, "vectorAdd", 0);
// allocate host vectors
float * pA = new float[cnDimension];
float * pB = new float[cnDimension];
float * pC = new float[cnDimension];
// initialize host memory
randomInit(pA, cnDimension);
randomInit(pB, cnDimension);
// allocate device memory
cl_mem hDeviceMemA, hDeviceMemB, hDeviceMemC;
hDeviceMemA = clCreateBuffer(hContext,
    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
    cnDimension * sizeof(cl_float), pA, 0);
hDeviceMemB = clCreateBuffer(hContext,
    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
    cnDimension * sizeof(cl_float), pB, 0);
hDeviceMemC = clCreateBuffer(hContext,
    CL_MEM_WRITE_ONLY, cnDimension * sizeof(cl_float), 0, 0);
// setup parameter values
clSetKernelArg(hKernel, 0, sizeof(cl_mem), (void *)&hDeviceMemA);
clSetKernelArg(hKernel, 1, sizeof(cl_mem), (void *)&hDeviceMemB);
clSetKernelArg(hKernel, 2, sizeof(cl_mem), (void *)&hDeviceMemC);
// execute kernel (global/local sizes must be size_t)
size_t nGlobalSize = cnDimension;
size_t nLocalSize = cnBlockSize;
clEnqueueNDRangeKernel(hCmdQueue, hKernel, 1, 0,
    &nGlobalSize, &nLocalSize, 0, 0, 0);
// copy the result from device back to host
clEnqueueReadBuffer(hCmdQueue, hDeviceMemC, CL_TRUE, 0,
    cnDimension * sizeof(cl_float), pC, 0, 0, 0);
delete[] pA; delete[] pB; delete[] pC;
clReleaseMemObject(hDeviceMemA); clReleaseMemObject(hDeviceMemB); clReleaseMemObject(hDeviceMemC);
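The listings above drop every status code for brevity. Real host code should check them; a minimal sketch (hedged; it uses the error-out parameter of clCreateBuffer and the cl_int return value that most other OpenCL calls share):

cl_int nErr;
cl_mem hBuf = clCreateBuffer(hContext, CL_MEM_READ_ONLY,
                             cnDimension * sizeof(cl_float), 0, &nErr);
if (nErr != CL_SUCCESS) { /* report nErr and bail out */ }
nErr = clSetKernelArg(hKernel, 0, sizeof(cl_mem), (void *)&hBuf);
if (nErr != CL_SUCCESS) { /* likewise */ }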
Personal Aside…
• I’m a bit skeptical…
• 1) Slower (source: Matt Harvey, Porting CUDA to OpenCL)
• 2) NVIDIA has to fully commit…
More Performance Notes…
[Figure: performance comparison chart; source: Matt Harvey, Porting CUDA to OpenCL]