Introduction To OpenCL

Uploaded by Yuvan Nadarajan
© All Rights Reserved

OpenCL

• Why OpenCL?
  ▪ It’s a heterogeneous world
    • Modern computing platforms include:
      – One or more CPUs
      – One or more GPUs
      – Other processors
  ▪ OpenCL lets programmers write a single portable program that uses ALL resources in the heterogeneous platform
  ▪ Other APIs
    • OpenMP, MPI, POSIX threads, CUDA, C++ AMP, OpenACC, RenderScript, etc.
Microprocessor Trends

Individual processors have many (possibly heterogeneous) cores:
  – ATI™ RV770: 10 cores, 16-wide SIMD
  – Intel® Xeon Phi™ coprocessor: 61 cores, 16-wide SIMD
  – NVIDIA® Tesla® C2090: 16 cores, 32-wide SIMD

The heterogeneous many-core challenge:
How are we to build a software ecosystem for the heterogeneous many-core platform?

Third party names are the property of their owners.
Industry Standards for Programming Heterogeneous Platforms

• CPUs: multiple cores driving performance increases
  ▪ Multi-processor programming, e.g. OpenMP
• GPUs: increasingly general-purpose data-parallel computing
  ▪ Graphics APIs and shading languages
• Emerging intersection: heterogeneous computing
  ▪ OpenCL – Open Computing Language

Open, royalty-free standard for portable, parallel programming of heterogeneous systems built from CPUs, GPUs, and other processors
The Origins of OpenCL

• Apple: was tired of recoding for many-core CPUs and GPUs, so pushed vendors to standardize
  ▪ Wrote a rough draft "straw man" API
• Khronos Compute group formed
• AMD and ATI: merged, needed commonality across products
• NVIDIA: GPU vendor, wants to steal market share from the CPU
• Intel: CPU vendor, wants to steal market share from the GPU
• ARM, Nokia, IBM, Sony, Qualcomm, Imagination, TI + many more
OpenCL Working Group within Khronos

• Diverse industry participation
  ▪ Processor vendors, system OEMs, middleware vendors, application developers
• OpenCL became an important standard upon release by virtue of the market coverage of the companies behind it


OpenCL Timeline

• Launched Jun ’08 … 6 months from "strawman" to OpenCL 1.0
• Rapid innovation to match the pace of hardware innovation
  ▪ 18 months from 1.0 to 1.1 and from 1.1 to 1.2
  ▪ Goal: a new OpenCL every 18-24 months
  ▪ Committed to backwards compatibility to protect software investments
• Milestones
  – Dec ’08: OpenCL 1.0 released; conformance tests released
  – Jun ’10: OpenCL 1.1 specification and conformance tests released
  – Nov ’11: OpenCL 1.2 specification and conformance tests released
  – Jul ’13: OpenCL 2.0 provisional specification released for public review
  – Nov ’13: OpenCL 2.0 specification finalised and conformance tests released
OpenCL: From cell phone to supercomputer

• OpenCL Embedded profile for mobile and embedded silicon
  ▪ Relaxes some data type and precision requirements
  ▪ Avoids the need for a separate "ES" specification
• Khronos APIs provide computing support for imaging & graphics
  ▪ Enabling advanced applications in, e.g., Augmented Reality
• OpenCL will enable parallel computing in new markets
  ▪ Mobile phones, cars, avionics

Figure: A camera phone with GPS processes images to recognize buildings and landmarks and provides relevant data from the internet
OpenCL SDK

• OpenCL SDKs:
  – AMD APP (Accelerated Parallel Processing)
  – NVIDIA CUDA (Compute Unified Device Architecture) Toolkit, which includes NVIDIA’s OpenCL implementation
  – Intel SDK for OpenCL Applications
• OpenCL uses an "Installable Client Driver" (ICD) model
  ▪ To allow platforms from different vendors to co-exist
  ▪ Applications can choose a platform at runtime
The Big Picture
OpenCL Architecture

• The OpenCL specification is defined in four parts, called models:
  ▪ Platform model
  ▪ Execution model
  ▪ Memory model
  ▪ Programming model
OpenCL Architecture

• Platform model
  ▪ The host – one processor that coordinates execution
  ▪ The devices – one or more processors capable of executing OpenCL code
  ▪ An abstract hardware model used by programmers when writing OpenCL C functions (kernels) that execute on the devices
• Execution model
  ▪ How the OpenCL environment is configured on the host and how kernels are executed on the device(s)
  ▪ Includes setting up the context
OpenCL Architecture

• Memory model
  ▪ Abstract memory hierarchy that kernels use, regardless of the actual memory architecture
  ▪ Closely resembles current GPU memory hierarchies
  ▪ Four distinct memory regions
    • Global memory, constant memory, local memory, private memory
• Programming model
  ▪ Defines how the concurrency model is mapped to physical hardware
  ▪ OpenCL supports data-parallel and task-parallel programming models
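A small illustration of the data-parallel programming model, written as a sketch in plain C that makes no OpenCL calls: it emulates on the CPU what a device does when it runs a vector-addition kernel over a 1-D index space of work-items. The names vec_add_work_item and run_vec_add are hypothetical. The kernel being mirrored is shown in a comment; on a real device get_global_id(0) supplies each work-item's index and the work-items run concurrently, whereas this loop runs them one after another.

```c
#include <stddef.h>

/* The OpenCL C kernel this sketch mirrors (for reference only):
 *
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *c) {
 *       size_t gid = get_global_id(0);   // this work-item's index
 *       c[gid] = a[gid] + b[gid];
 *   }
 */

/* Body of one work-item: it handles exactly one element, chosen by its global ID. */
static void vec_add_work_item(size_t gid, const float *a, const float *b, float *c) {
    c[gid] = a[gid] + b[gid];
}

/* Host-side stand-in for launching global_size work-items.
 * A device would run them concurrently; here they run sequentially. */
void run_vec_add(size_t global_size, const float *a, const float *b, float *c) {
    for (size_t gid = 0; gid < global_size; ++gid) {
        vec_add_work_item(gid, a, b, c);
    }
}
```

Because no work-item depends on another's result, the order in which the loop visits indices does not matter, which is exactly what lets a device execute them in parallel.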
OpenCL Processing
Analogy: OpenCL Processing and a Card Game

• Important differences between the analogy and OpenCL:
  ▪ The analogy does not mention platforms
    • A platform is a data structure that identifies a vendor’s implementation of OpenCL
    • Provides a way to access devices
      – E.g. access an NVIDIA device through the NVIDIA platform
  ▪ A card dealer does not choose which players sit at the table
    • An OpenCL host selects which devices should be placed in a context
  ▪ A card dealer cannot deal the same card to multiple players
    • An OpenCL host can dispatch the same kernel to multiple devices through their command queues
Analogy: OpenCL Processing and a Card Game

• Important differences between the analogy and OpenCL:
  ▪ The analogy does not mention data or how it is partitioned
  ▪ In a card game, players can arrange their cards themselves
    • In OpenCL, the host places kernel-execution commands into a command queue; by default, each device executes the kernels in order
  ▪ In card games, dealers commonly deal cards in a round-robin fashion
    • OpenCL sets no constraints on how kernels are distributed to multiple devices
Multiple Devices
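Because OpenCL places no constraints on how work is distributed across devices, the host has to decide. One simple scheme, assumed here and by no means the only one, gives each device a contiguous, near-equal slice of a 1-D index space. The type work_range and the function split_range are hypothetical names, written in plain C:

```c
#include <stddef.h>

/* One device's share of a 1-D index space: work-items [offset, offset + size). */
typedef struct {
    size_t offset;
    size_t size;
} work_range;

/* Splits global_size work-items into num_devices contiguous slices whose sizes
 * differ by at most one; the first (global_size % num_devices) slices get one
 * extra item. Precondition: num_devices > 0 and out has num_devices entries. */
void split_range(size_t global_size, size_t num_devices, work_range *out) {
    size_t base = global_size / num_devices;
    size_t extra = global_size % num_devices;
    size_t offset = 0;
    for (size_t d = 0; d < num_devices; ++d) {
        out[d].offset = offset;
        out[d].size = base + (d < extra ? 1 : 0);
        offset += out[d].size;
    }
}
```

Each slice could then be enqueued on that device's own command queue, passing offset as the global_work_offset and size as the global_work_size of clEnqueueNDRangeKernel() (a non-NULL global_work_offset requires OpenCL 1.1 or later). An even split presumes similarly capable devices; a CPU paired with a much faster GPU would call for weighting the slices instead.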
OpenCL

• The OpenCL framework is divided into the following components:
  ▪ OpenCL platform API
    • Defines functions used by the host to discover devices and their capabilities, and to create a context for the application
    • Platform, devices, context
  ▪ OpenCL runtime API
    • Manipulates the context to create command queues and to perform other operations that occur at runtime
    • Command queues, buffers, programs, kernels and execution, etc.
  ▪ OpenCL programming language
    • The language used to write code for the kernels (OpenCL C)
OpenCL

• The first step in programming any OpenCL application
  ▪ Coding the host application
• When writing an OpenCL application
  ▪ We may not know the underlying hardware it will run on
    • Need to identify vendor-specific OpenCL implementations and/or devices in code
• Fundamental data structures
  ▪ Platforms, devices, contexts, programs, kernels, and command queues
• Functions exist for each structure, e.g. to
  – Create the structure
  – Provide/obtain information about it after it is created
  – Etc.
OpenCL

• Primitive data types


Platform Model

• The platform model
  ▪ Describes the computational resources utilised by OpenCL and their relationship with one another
• Each OpenCL implementation
  ▪ Can create platforms consisting of the resources in the system with which it is capable of interacting
    • E.g., an AMD platform can consist of x86 CPUs and Radeon GPUs
• Installable Client Driver (ICD) model
  ▪ Allows platforms from different vendors to co-exist
    • Applications can choose a platform at runtime
Platform Model

• The platform model
  ▪ Defines a host connected to one or more compute devices
  ▪ A device is divided into one or more compute units

Figure: A host connected to compute devices; each compute device contains compute units, and each compute unit contains processing elements

Selecting a Platform

• The platform is selected using an API call (there may be a number of platforms)
  ▪ Many OpenCL calls follow a pattern that allows the API to be vendor neutral
    • A first call can be made to return the storage requirements for the data
    • A second call can be made to return the data
  ▪ Robust applications therefore usually make this call twice
    • First call, to get the number of platforms available
    • Memory space is then allocated for the platform objects
    • Second call, to retrieve the platform objects
Selecting a Platform

• clGetPlatformIDs() is usually called twice: first for the number of platforms, then for the platform objects
Obtaining Platform Information
Selecting Devices

• Once a platform is selected
  ▪ We can then query for the devices that it knows how to interact with
  ▪ We can specify which types of devices we are interested in (e.g. all devices, CPUs only, GPUs only)
  ▪ This call, clGetDeviceIDs(), is performed twice, as with clGetPlatformIDs()
    • The first call determines the number of devices; the second retrieves the device objects
Obtaining Device Information
Contexts

• A context refers to the environment for managing OpenCL objects and resources
  ▪ To manage OpenCL programs, the following are associated with a context:
    • Devices: the things doing the execution
    • Program objects: the program source that implements the kernels
    • Kernels: functions that run on OpenCL devices
    • Memory objects: data that are operated on by the device
    • Command queues: mechanisms for interaction with the devices
      – Memory commands (data transfers)
      – Kernel execution
      – Synchronisation
Contexts

• When you create a context, you provide a list of devices to associate with it
  ▪ The rest of the OpenCL resources are associated with the context as they are created

Figure: A newly created (empty) context
Contexts

• clCreateContext() creates a context given a list of devices
  ▪ The properties argument specifies which platform to use (if NULL, the platform selected is implementation-defined, typically the vendor’s default)
  ▪ The function also provides a callback mechanism for reporting errors to the user
Command Queues

• Command queue
  ▪ Mechanism for the host to request that an action be performed by the device
    • Perform a memory transfer, begin executing a kernel, etc.
  ▪ Each device needs its own command queue; a queue targets exactly one device
  ▪ Commands within the queue
    • Can be synchronous or asynchronous
    • Can execute in-order or out-of-order
Command Queues

• A command queue associates a context with a device

Figure: A context containing its command queues, one per device
Memory Objects

• Memory objects are associated with a context
  ▪ They must be explicitly transferred to devices prior to execution

Figure: Uninitialised OpenCL memory objects inside the context (the original data will be transferred later to/from these objects), next to the original input/output data on the host (not OpenCL memory objects)
Programs and Kernels

• Programs
  ▪ A program object is basically a collection of OpenCL kernels
• Kernel
  ▪ A kernel is a function declared in a program that is executed on an OpenCL device
    • A kernel object is a kernel function along with its associated arguments
  ▪ A kernel object is created from a compiled program
  ▪ Arguments (memory objects, primitives, etc.) must be explicitly associated with the kernel object
Programs

• A program object
  ▪ Is created and compiled by providing source code or a binary file and selecting which devices to target

Figure: A program object inside the context
Creating Programs

• clCreateProgramWithSource() creates a program object from strings of source code
  ▪ count specifies the number of strings
  ▪ The user must write a function to read the source code into a string
• If the strings are not null-terminated, the lengths field is used to specify the string lengths
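Reading the source into a string needs nothing from OpenCL, so this sketch is plain C; read_source is a hypothetical helper. The file is opened in binary mode so that the byte count read matches the length reported:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole text file (e.g. a .cl kernel file) into a malloc'd,
 * null-terminated string and stores its length (excluding the terminator)
 * in *length. Returns NULL on failure; the caller frees the result. */
char *read_source(const char *path, size_t *length) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);  /* file size in bytes */
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    char *src = malloc((size_t)size + 1);  /* +1 for the terminator */
    if (src == NULL) {
        fclose(f);
        return NULL;
    }
    size_t got = fread(src, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) {
        free(src);
        return NULL;
    }
    src[size] = '\0';
    *length = (size_t)size;
    return src;
}
```

The returned string can then be passed to clCreateProgramWithSource() with count set to 1, strings pointing to a one-element array holding the string, and lengths either pointing to the returned length or NULL, since the string is null-terminated.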
Compiling Programs

• clBuildProgram() compiles and links an executable from the program object for each device in the context
  ▪ If device_list is supplied, only those devices are targeted
• Optional preprocessor, optimisation, and other options can be supplied by the options argument
Reporting Compile Errors

• If a program fails to compile, OpenCL requires the programmer to explicitly ask for the compiler output
  ▪ A compilation failure is indicated by an error value (CL_BUILD_PROGRAM_FAILURE) returned from clBuildProgram()
  ▪ Calling clGetProgramBuildInfo() with the program object, a device, and the parameter CL_PROGRAM_BUILD_LOG returns a string with the compiler output for that device
Kernels

• Kernel objects
  ▪ Created from a program object by specifying the name of the kernel function

Kernels

Figure: Kernel objects inside the context
Kernels

• clCreateKernel() creates a kernel from the given program
  ▪ The kernel that is created is specified by a string that matches the name of the function within the program
Kernel Arguments

• Memory objects and individual data values can be set as kernel arguments

Figure: Data (e.g. images) inside the context set as kernel arguments
References

• Among others, material sourced from:
  ▪ B.R. Gaster, L. Howes, D.R. Kaeli, P. Mistry and D. Schaa, "Heterogeneous Computing with OpenCL"
  ▪ M. Scarpino, "OpenCL in Action: How to Accelerate Graphics and Computation"
  ▪ A. Munshi, B.R. Gaster, T.G. Mattson, J. Fung and D. Ginsburg, "OpenCL Programming Guide"
  ▪ Khronos OpenCL Working Group, "The OpenCL Specification", version 1.1
  ▪ Zhongliang Chen, Yash Ukidave, Perhaad Mistry, Dana Schaa, Ben Gaster, "OpenCL University Kits", http://developer.amd.com/partners/university-programs/
  ▪ Simon McIntosh-Smith and Tom Deakin, "Hands on OpenCL", https://handsonopencl.github.io/
  ▪ https://en.wikipedia.org/wiki/