Introduction to OpenCL

Uploaded by Yuvan Nadarajan

OpenCL

• Why OpenCL?
  – It's a heterogeneous world
    • Modern computing platforms include:
      – One or more CPUs
      – One or more GPUs
      – Etc.
  – OpenCL lets programmers write a single portable program that uses ALL resources in the heterogeneous platform
  – Other APIs
    • OpenMP, MPI, POSIX threads, CUDA, C++ AMP, OpenACC, RenderScript, etc.
Microprocessor Trends
Individual processors have many (possibly heterogeneous) cores.

[Figure: example many-core processors]
– ATI™ RV770: 10 cores, 16-wide SIMD
– Intel® Xeon Phi™ coprocessor: 61 cores, 16-wide SIMD
– NVIDIA® Tesla® C2090: 16 cores, 32-wide SIMD

The heterogeneous many-core challenge:
How are we to build a software ecosystem for the heterogeneous many-core platform?
Third party names are the property of their owners.
Industry Standards for Programming Heterogeneous Platforms

[Figure: CPUs and GPUs converging on heterogeneous computing]
– CPUs: multiple cores driving performance increases; multi-processor programming, e.g. OpenMP
– GPUs: increasingly general-purpose data-parallel computing; graphics APIs and shading languages
– Emerging intersection: heterogeneous computing

OpenCL – Open Computing Language

Open, royalty-free standard for portable, parallel programming of heterogeneous platforms containing CPUs, GPUs, and other processors
The Origins of OpenCL

[Figure: the companies behind OpenCL]
– AMD and ATI: merged, needed commonality across products
– NVIDIA: GPU vendor, wants to steal market share from CPU
– Intel: CPU vendor, wants to steal market share from GPU
– Apple: was tired of recoding for many-core CPUs and GPUs; pushed vendors to standardize; wrote a rough draft "straw man" API
– Khronos Compute group formed, with ARM, Nokia, IBM, Sony, Qualcomm, Imagination, TI and many more
OpenCL Working Group within Khronos
• Diverse industry participation
  – Processor vendors, system OEMs, middleware vendors, application developers
• OpenCL became an important standard upon release by virtue of the market coverage of the companies behind it

OpenCL Timeline
• Launched Jun’08 … 6 months from “strawman” to OpenCL 1.0
• Rapid innovation to match the pace of hardware innovation
  – 18 months from 1.0 to 1.1 and from 1.1 to 1.2
  – Goal: a new OpenCL every 18-24 months
  – Committed to backwards compatibility to protect software investments

[Figure: OpenCL release timeline]
– Dec 2008: OpenCL 1.0 released; conformance tests released
– Jun 2010: OpenCL 1.1 specification and conformance tests released
– Nov 2011: OpenCL 1.2 specification and conformance tests released
– Jul 2013: OpenCL 2.0 provisional specification released for public review
– Nov 2013: OpenCL 2.0 specification finalised and conformance tests released
OpenCL: From cell phone to supercomputer

• OpenCL Embedded profile for mobile and embedded silicon
  – Relaxes some data type and precision requirements
  – Avoids the need for a separate “ES” specification
• Khronos APIs provide computing support for imaging & graphics
  – Enabling advanced applications in, e.g., Augmented Reality
• OpenCL will enable parallel computing in new markets
  – Mobile phones, cars, avionics

[Figure: a camera phone with GPS processes images to recognize buildings and landmarks and provides relevant data from the internet]
OpenCL SDK

• OpenCL SDKs:
  – AMD APP (Accelerated Parallel Processing)
  – NVIDIA CUDA (Compute Unified Device Architecture) Toolkit, which includes NVIDIA's OpenCL implementation
  – Intel SDK for OpenCL Applications
• OpenCL uses an “Installable Client Driver” (ICD) model
  – To allow platforms from different vendors to co-exist
  – Applications can choose a platform at runtime
The Big Picture
OpenCL Architecture

• The OpenCL specification is defined in four parts, called models:
  – Platform model
  – Execution model
  – Memory model
  – Programming model
OpenCL Architecture

• Platform model
  – The host: one processor that coordinates execution
  – The devices: one or more processors capable of executing OpenCL C code
  – An abstract hardware model used by programmers when writing OpenCL C functions (kernels) that execute on devices
• Execution model
  – How the OpenCL environment is configured on the host and how kernels are executed on the device(s)
  – Includes setting up the context
OpenCL Architecture

• Memory model
  – An abstract memory hierarchy that kernels use, regardless of the actual memory architecture
  – Closely resembles current GPU memory hierarchies
  – Four distinct memory regions
    • Global memory, constant memory, local memory, private memory
• Programming model
  – Defines how the concurrency model is mapped to physical hardware
  – OpenCL supports data-parallel and task-parallel programming models
OpenCL Processing
Analogy: OpenCL Processing and a Card Game

• Important differences between the analogy and OpenCL:
  – The analogy does not mention platforms
    • A platform is a data structure that identifies a vendor's implementation of OpenCL
    • It provides a way to access devices
      – E.g. access an NVIDIA device through the NVIDIA platform
  – The card dealer does not choose which players sit at the table
    • An OpenCL host selects which devices should be placed in a context
  – The card dealer cannot deal the same card to multiple players
    • An OpenCL host can dispatch the same kernel to multiple devices through their command queues
  – The analogy does not mention data or how it is partitioned
  – In a card game, a player can arrange the cards themselves
    • In OpenCL, the host places kernel-execution commands into a command queue; by default, each device executes the kernels in the order they were enqueued
  – In card games, dealers commonly deal cards in a round-robin fashion
    • OpenCL sets no constraints on how kernels are distributed to multiple devices
Multiple Devices
OpenCL

• The OpenCL framework is divided into the following components:
  – OpenCL platform API
    • Defines the functions used by the host to discover devices and their capabilities, and to create a context for the application
    • Platform, devices, context
  – OpenCL runtime API
    • Manipulates the context to create command queues and to perform other operations that occur at runtime
    • Command queues, buffers, programs, kernels and execution, etc.
  – OpenCL programming language
    • The programming language used to write code for the kernels
OpenCL

• The first step in programming any OpenCL application
  – Coding the host application
• When writing an OpenCL application
  – You may not know the underlying hardware it will run on
    • Vendor-specific OpenCL implementations and/or devices must therefore be identified in code
• Fundamental data structures
  – Platforms, devices, contexts, programs, kernels, and command queues
    • Example functions for each structure:
      – Create the structure
      – Provide/obtain information after it is created
      – Etc.
OpenCL

• Primitive data types

Platform Model

• The platform model
  – Describes the computational resources utilised by OpenCL and their relationship with one another
• Each OpenCL implementation
  – Can create platforms consisting of the resources in the system with which it is capable of interacting
    • E.g., an AMD platform can consist of x86 CPUs and Radeon GPUs
• Installable Client Driver (ICD) model
  – Allows platforms from different vendors to co-exist
    • Applications can choose a platform at runtime
Platform Model

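• The platform model
  – Defines a host connected to one or more compute devices
  – A device is divided into one or more compute units
  – A compute unit is divided into one or more processing elements

[Figure: a host connected to compute devices; each compute device contains compute units, each made up of processing elements]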

Selecting a Platform

• Platforms are discovered with an API call, clGetPlatformIDs() (a system may have a number of platforms)
  – There is a pattern to many OpenCL calls which allows the API to be vendor neutral
    • A first call can be made to return the storage requirements for the data
    • A second call can be made to return the data
  – Robust applications therefore usually make this call twice
    • First call: get the number of platforms available
    • Memory space is then allocated for the platform objects
    • Second call: retrieve the platform objects
Selecting a Platform

• Usually called twice
  – E.g.
Obtaining Platform Information
Selecting Devices

• Once a platform is selected
  – We can then query for the devices that it knows how to interact with
  – We can specify which types of devices we are interested in (e.g. all devices, CPUs only, GPUs only)
  – This call, clGetDeviceIDs(), is performed twice, as with clGetPlatformIDs()
    • The first call determines the number of devices; the second retrieves the device objects
Obtaining Device Information
Contexts

• A context refers to the environment for managing OpenCL objects and resources
  – To manage OpenCL programs, the following are associated with a context:
    • Devices: the things doing the execution
    • Program objects: the program source that implements the kernels
    • Kernels: functions that run on OpenCL devices
    • Memory objects: data that are operated on by the device
    • Command queues: mechanisms for interaction with the devices
      – Memory commands (data transfers)
      – Kernel execution
      – Synchronisation
Contexts

• When you create a context, you provide a list of devices to associate with it
  – The rest of the OpenCL resources are associated with the context as they are created

[Figure: an empty context]
Contexts

• clCreateContext() creates a context given a list of devices
  – The properties argument specifies which platform to use (if NULL, the platform selected is implementation-defined)
  – The function also provides a callback mechanism for reporting errors to the user
Command Queues

• Command queue
  – Mechanism for the host to request that an action be performed by the device
    • Perform a memory transfer, begin executing, etc.
  – Each command queue targets a single device, so each device needs its own command queue
  – Commands within the queue
    • Can be synchronous or asynchronous
    • Can execute in-order or out-of-order
Command Queues

• A command queue associates a context with a device

[Figure: a context with a command queue for each device]
Memory Objects

• Memory objects are associated with a context
  – They must be explicitly transferred to devices prior to execution

[Figure: uninitialised OpenCL memory objects in a context; the original input/output data (not OpenCL memory objects) will be transferred to/from these objects later]
Programs and Kernels

• Programs
  – A program object is a collection of OpenCL kernels (plus any auxiliary functions they call)
• Kernel
  – A kernel is a function declared in a program that is executed on an OpenCL device
    • A kernel object is a kernel function along with its associated arguments
  – A kernel object is created from a compiled program
  – Arguments (memory objects, primitives, etc.) must be explicitly associated with the kernel object
Programs

• A program object
  – Created and compiled by providing source code or a binary file and selecting which devices to target

[Figure: a program object within a context]
Creating Programs

• clCreateProgramWithSource() creates a program object from strings of source code
  – count specifies the number of strings
  – The user must write a function to read the source code into a string
• If the strings are not null-terminated, the lengths argument is used to specify the string lengths
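The user-written reader mentioned above could look like the next sketch. It is plain C with no OpenCL calls, and the name read_source and the file name kernel.cl are hypothetical. Because the result is null-terminated, the lengths argument of clCreateProgramWithSource() can be NULL.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole text file (e.g. "kernel.cl") into a malloc'd, NUL-terminated
   string. *length (if non-NULL) receives the number of bytes read.
   Returns NULL if the file cannot be opened or read. */
char *read_source(const char *path, size_t *length) {
    FILE *f = fopen(path, "rb");   /* binary mode: byte count matches ftell */
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);
    if (size < 0) {
        fclose(f);
        return NULL;
    }
    rewind(f);
    char *src = malloc((size_t)size + 1);   /* +1 for the terminating NUL */
    if (src == NULL) {
        fclose(f);
        return NULL;
    }
    size_t got = fread(src, 1, (size_t)size, f);
    fclose(f);
    src[got] = '\0';
    if (length)
        *length = got;
    return src;
}
```

With the string in hand, the call would be along the lines of `const char *strings[] = {src}; cl_program program = clCreateProgramWithSource(context, 1, strings, NULL, &err);`.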
Compiling Programs

• clBuildProgram() compiles and links an executable from the program object for each device in the context
  – If device_list is supplied, only those devices are targeted
• Optional preprocessor, optimisation, and other options can be supplied through the options argument
Reporting Compile Errors

• If a program fails to compile, OpenCL requires the programmer to explicitly ask for the compiler output
  – A compilation failure is indicated by the error value returned from clBuildProgram() (CL_BUILD_PROGRAM_FAILURE)
  – Calling clGetProgramBuildInfo() with the program object, the device, and the parameter CL_PROGRAM_BUILD_LOG returns a string with the compiler output
Kernels
• Kernel objects
  – Created from a program object by specifying the name of the kernel function

[Figure: kernel objects within a context]

Kernels

• clCreateKernel() creates a kernel from the given program
  – The kernel that is created is specified by a string that matches the name of the function within the program
Kernel Arguments

• Memory objects and individual data values can be set as kernel arguments with clSetKernelArg()

[Figure: data (e.g. images) in a context are set as kernel arguments]
References

• Among others, material sourced from
  – “Heterogeneous Computing with OpenCL”, B.R. Gaster, L. Howes, D.R. Kaeli, P. Mistry and D. Schaa
  – “OpenCL in Action: How to Accelerate Graphics and Computation”, M. Scarpino
  – “OpenCL Programming Guide”, A. Munshi, B.R. Gaster, T.G. Mattson, J. Fung and D. Ginsburg
  – “The OpenCL Specification”, version 1.1, Khronos OpenCL Working Group
  – OpenCL University Kits, Zhongliang Chen, Yash Ukidave, Perhaad Mistry, Dana Schaa, Ben Gaster, http://developer.amd.com/partners/university-programs/
  – Hands On OpenCL, Simon McIntosh-Smith and Tom Deakin, https://handsonopencl.github.io/
  – https://en.wikipedia.org/wiki/
