Introduction To OpenCL
OpenCL
• Why OpenCL?
It’s a heterogeneous world
• Modern computing platforms include:
– One or more CPUs
– One or more GPUs
– Etc.
OpenCL lets programmers write a single portable program
that uses ALL resources in the heterogeneous platform
Other APIs
• OpenMP, MPI, POSIX threads, CUDA, C++ AMP, OpenACC,
Renderscript, etc.
Microprocessor Trends
Individual processors have many (possibly heterogeneous) cores.
• ATI™ RV770: 10 cores, 16 wide SIMD
• Intel® Xeon Phi™ coprocessor: 61 cores, 16 wide SIMD
• NVIDIA® Tesla® C2090: 16 cores, 32 wide SIMD
The heterogeneous many-core challenge:
How are we to build a software ecosystem for the
heterogeneous many-core platform?
Third party names are the property of their owners.
Industry Standards for Programming
Heterogeneous Platforms
• CPUs: multiple cores driving performance increases
– Multi-processor programming, e.g. OpenMP
• GPUs: increasingly general purpose data-parallel computing
– Graphics APIs and Shading Languages
• Emerging Intersection: Heterogeneous Computing
• OpenCL SDKs:
– AMD APP (Accelerated Parallel Processing)
• Platform model
The host – one processor coordinates execution
The devices – one or more processors capable of executing
OpenCL code
Defines an abstract hardware model used by
programmers when writing OpenCL functions (kernels) that
execute on devices
• Execution model
How the OpenCL environment is configured on the host
and how kernels are executed on the device(s)
Includes setting up the context
OpenCL Architecture
• Memory model
Abstract memory hierarchy that kernels use, regardless of
actual memory architecture
Closely resembles current GPU memory hierarchies
Four distinct memory regions
• Global memory, constant memory, local memory, private memory
• Programming model
Defines how the concurrency model is mapped to physical
hardware
OpenCL supports data parallel and task parallel
programming models
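The memory regions and the data-parallel model above can be sketched in plain C. This is a host-only emulation, not OpenCL code: the function names, `GROUP_SIZE`, and `scale` are hypothetical, and the comments only indicate which OpenCL C address space each variable would correspond to. On a real device the loop iterations would run as concurrent work-items.

```c
#include <stddef.h>

/* For reference only (not compiled here), the OpenCL C form of vec_add:
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *c) {
 *       size_t i = get_global_id(0);
 *       c[i] = a[i] + b[i];
 *   }
 */

/* Kernel body for ONE work-item; gid stands in for get_global_id(0). */
static void vec_add_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Data-parallel launch, emulated: the same body runs once per index. */
void vec_add_launch(size_t global_size, const float *a, const float *b, float *c)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_item(gid, a, b, c);
}

#define GROUP_SIZE 4                 /* hypothetical work-group size */

static const float scale = 2.0f;     /* like __constant: read-only for all work-items */

/* Writes one scaled partial sum per work-group into out.
 * n is assumed to be a multiple of GROUP_SIZE. */
void group_sums(const float *in,     /* like __global: visible to every work-item */
                float *out,          /* like __global */
                size_t n)
{
    for (size_t group = 0; group < n / GROUP_SIZE; ++group) {
        float scratch[GROUP_SIZE];   /* like __local: shared within one work-group */
        for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
            float x = in[group * GROUP_SIZE + lid];  /* like __private: per work-item */
            scratch[lid] = scale * x;
        }
        /* A real kernel would need barrier(CLK_LOCAL_MEM_FENCE) at this point. */
        float sum = 0.0f;
        for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
            sum += scratch[lid];
        out[group] = sum;
    }
}
```

Calling `vec_add_launch(n, a, b, c)` plays the role of launching `n` work-items; `group_sums` shows why local memory is useful for data shared inside a work-group.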
OpenCL Processing
Analogy: OpenCL Processing and a Card Game
[Figure: card game analogy, with labels "Processing Element" and "Host"]
• Command queue
Mechanism for the host to request that an action be
performed by the device
• Perform a memory transfer, begin executing, etc.
Each device requires a separate command queue
Commands within the queue
• Can be synchronous or asynchronous
• Can execute in-order or out-of-order
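The behaviour of an in-order queue can be illustrated with a small plain-C model. This is only a sketch under stated assumptions: `command_queue`, `queue_enqueue`, and `queue_finish` are hypothetical names, the "kernel" is hard-wired to double its input, and nothing here calls the OpenCL API. In real OpenCL the corresponding host calls would be `clEnqueueWriteBuffer`, `clEnqueueNDRangeKernel`, and `clEnqueueReadBuffer`.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 8

typedef enum { CMD_WRITE, CMD_KERNEL, CMD_READ } cmd_type;

typedef struct {
    cmd_type type;
    float *dst;          /* destination buffer */
    const float *src;    /* source buffer */
    size_t count;        /* number of floats */
} command;

typedef struct {
    command cmds[QUEUE_CAP];
    size_t len;
} command_queue;

/* Enqueue only records the request; nothing executes yet.
 * Returns 0 on success, -1 if the queue is full. */
int queue_enqueue(command_queue *q, command c)
{
    if (q->len == QUEUE_CAP)
        return -1;
    q->cmds[q->len++] = c;
    return 0;
}

/* Executes all pending commands in submission order, then empties the queue. */
void queue_finish(command_queue *q)
{
    for (size_t i = 0; i < q->len; ++i) {
        const command *c = &q->cmds[i];
        switch (c->type) {
        case CMD_WRITE:   /* host -> device transfer */
        case CMD_READ:    /* device -> host transfer */
            memcpy(c->dst, c->src, c->count * sizeof(float));
            break;
        case CMD_KERNEL:  /* stand-in kernel: double every element */
            for (size_t j = 0; j < c->count; ++j)
                c->dst[j] = 2.0f * c->src[j];
            break;
        }
    }
    q->len = 0;
}
```

Enqueueing a write, a kernel, and a read, then calling `queue_finish`, mirrors the asynchronous pattern: the host output buffer is untouched until the queue is drained.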
Command Queues
[Figure: context, with original input/output data (not OpenCL memory objects)]
Programs and Kernels
• Programs
A program object is basically a collection of OpenCL
kernels
• Kernel
A kernel is a function declared in a program that is
executed on an OpenCL device
• A kernel object is a kernel function along with its associated
arguments
A kernel object is created from a compiled program
Must explicitly associate arguments (memory objects,
primitives, etc.) with the kernel object
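The idea that a kernel object is a function plus explicitly bound arguments can be modelled in plain C. This is a rough sketch, not the OpenCL API: `kernel_object`, `kernel_set_arg`, `kernel_run`, and `scale_kernel` are hypothetical names. `kernel_set_arg` is only loosely analogous to the real `clSetKernelArg`, which also takes an argument size.

```c
#include <stddef.h>

#define MAX_ARGS 4

/* A kernel function receives its work-item index and the bound arguments. */
typedef void (*kernel_fn)(size_t gid, void *const *args);

typedef struct {
    kernel_fn fn;            /* the kernel function */
    void *args[MAX_ARGS];    /* bound arguments, NULL while unset */
    size_t num_args;         /* how many arguments fn expects */
} kernel_object;

/* Binds argument `index`. Returns 0 on success, -1 if index is out of range. */
int kernel_set_arg(kernel_object *k, size_t index, void *value)
{
    if (index >= k->num_args)
        return -1;
    k->args[index] = value;
    return 0;
}

/* Runs fn once per work-item; refuses to run while any argument is unbound. */
int kernel_run(const kernel_object *k, size_t global_size)
{
    for (size_t i = 0; i < k->num_args; ++i)
        if (k->args[i] == NULL)
            return -1;
    for (size_t gid = 0; gid < global_size; ++gid)
        k->fn(gid, k->args);
    return 0;
}

/* Example kernel: out[gid] = in[gid] * factor.
 * args[0] = input buffer, args[1] = output buffer, args[2] = pointer to factor. */
void scale_kernel(size_t gid, void *const *args)
{
    const float *in = args[0];
    float *out = args[1];
    const float *factor = args[2];
    out[gid] = in[gid] * *factor;
}
```

A kernel object built from `scale_kernel` with `num_args = 3` fails to run until all three arguments are set, which is the point of the explicit association step.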
Programs
• A program object
Created and compiled by providing source code or a binary
file and selecting which devices to target
Creating Programs
Kernels