PostgreSQL OpenCL Procedural Language
A New PostgreSQL Procedural Language
Unlocking the Power of the GPU!
By Tim Child
Bio
Tim Child
• 35 years of experience in software development
• Formerly
– VP, Oracle Corporation
– VP, BEA Systems Inc.
– VP, Informix
– Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years of experience in 3D, CAD, GIS and DBMS
Terminology
Term | Description
Procedural Language | Language for writing SQL procedures (e.g. PL/pgSQL, Perl, Tcl, Java, …)
GPU | Graphics Processing Unit (highly specialized processor for graphics)
GPGPU | General-Purpose GPU (non-graphics programming on a GPU)
CUDA | NVIDIA's GPU programming environment
APU | Accelerated Processing Unit (AMD's hybrid CPU + GPU chip)
ISO C99 | Modern standard version of the C language
OpenCL | Open Computing Language
OpenMP | Open Multi-Processing (parallelizing compilers)
SIMD | Single Instruction, Multiple Data (vector instructions)
SSE | Streaming SIMD Extensions (x86 and x64, Intel and AMD)
xPU | Any processing unit device (CPU, GPU, APU)
Kernel | Function that executes on an OpenCL device
Work Item | One instance of a kernel
Workgroup | A group of work items
FLOP | Floating-point operation (single precision = SQL real type)
MIC | Many Integrated Core (Intel's 50+ x86 core chip architecture)
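To make the Kernel, Work Item and Workgroup terms concrete, here is a minimal plain C sketch (no OpenCL runtime needed) that emulates a one-dimensional index space on the CPU. It mirrors the OpenCL rule that, with no global offset, a work item's global ID equals its workgroup ID times the workgroup size plus its local ID. The names `emulate_ndrange`, `global_id_of`, `kernel_fn` and `record_id` are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a "kernel" here is just a C function called once per
   work item. Real OpenCL kernels are written in OpenCL C. */
typedef void (*kernel_fn)(size_t global_id, void *args);

/* Same relation as get_global_id(0) ==
   get_group_id(0) * get_local_size(0) + get_local_id(0)
   for a 1-D index space with zero global offset. */
static size_t global_id_of(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Runs every work item serially, one workgroup after another.
   A real device would run many work items concurrently. */
static void emulate_ndrange(size_t global_size, size_t local_size,
                            kernel_fn kernel, void *args)
{
    /* OpenCL 1.x requires the global size to be a multiple of the
       workgroup (local) size. */
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t g = 0; g < num_groups; ++g)
        for (size_t l = 0; l < local_size; ++l)
            kernel(global_id_of(g, local_size, l), args);
}

/* Example kernel: each work item writes its own global ID. */
static void record_id(size_t global_id, void *args)
{
    size_t *out = args;
    out[global_id] = global_id;
}
```

For example, `emulate_ndrange(8, 4, record_id, ids)` runs two workgroups of four work items and fills `ids[0..7]` with 0 to 7.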
Some Technology Trends Impacting DBMS
• Solid State Storage
– Reduced Access Time, Lower Power, Increasing Capacity
• Virtualization
– Server Consolidation, Specialized VMs, Lower Direct Costs
• Cloud Computing
– EC2, Azure, …: Lower Capital Requirements
• Multi-Core
– 2, 4, 6, 8, 12, … cores: large benefits for multi-threaded applications
• xPU (GPU/APU)
– GPUs with > 1000 cores
– > 1 TFLOP/s for €2,500
– APUs (CPU + GPU hybrid chips) due in mid-2011
– 2 TFLOP/s for $2.10 per hour (AWS EC2)
– Intel MIC “Knights Corner”: > 50 x86 cores
Compute-Intensive xPU Database Applications
• Bioinformatics
• Signal, Audio, Image and Video Processing
• Searching
• Sorting
• Map/Reduce
• Scientific Computing
• Many Others …
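Several of these workloads (map/reduce, searching, sorting) are built from data-parallel steps. A minimal plain C sketch, assuming a power-of-two input size: a map step in which every element is processed independently, followed by the pairwise tree reduction commonly used in GPU kernels. Each inner loop stands in for work items that a device would run in parallel; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Map step: each "work item" i squares one element independently. */
static void map_square(float *data, size_t n)
{
    for (size_t i = 0; i < n; ++i)   /* on a GPU: one work item per i */
        data[i] = data[i] * data[i];
}

/* Reduce step: pairwise tree reduction in log2(n) passes.
   In each pass the first `stride` work items add the element
   `stride` positions away; on a device a barrier would separate
   the passes. Assumes n is a power of two (hypothetical restriction
   to keep the sketch short). */
static float reduce_sum(float *data, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t stride = n / 2; stride > 0; stride /= 2)
        for (size_t i = 0; i < stride; ++i)
            data[i] += data[i + stride];
    return data[0];
}
```

Squaring {1, 2, …, 8} and then reducing gives 204, the sum of squares, after three passes instead of seven sequential additions on the critical path.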
GPU vs CPU
Vendor | NVIDIA | ATI Radeon | Intel
Architecture | Fermi | Evergreen | Nehalem
Cores | 448 (simple) | 1600 (simple) | 4 (complex)
Transistors | 3.1 B | 2.15 B | 731 M
Clock | 1.5 GHz | 851 MHz | 3 GHz
Peak Float (single) Performance | 1500 GFLOP/s | 2720 GFLOP/s | 96 GFLOP/s
Peak Double Performance | 750 GFLOP/s | 544 GFLOP/s | 48 GFLOP/s
Memory Bandwidth | ~190 GB/s | ~153 GB/s | ~30 GB/s
Power Consumption | 250 W | > 250 W | 80 W
SIMD / Vector Instructions | Many | Many | SSE4+
[Figure: Multi-Core Performance; source: NVIDIA]
Future (Mid 2011): APU-Based PC
APU (Accelerated Processing Unit): adds an embedded GPU to the CPU chip
[Diagram, source: AMD: an APU chip containing CPU cores and an embedded GPU, connected through the North Bridge to system RAM (~20 GB/s links); a discrete GPU with its own graphics RAM (150 GB/s) attached over PCIe (~12 GB/s)]
Scalar vs. SIMD
Scalar instruction: C = A + B, e.g. 1 + 2 = 3
SIMD instruction: Vector C = Vector A + Vector B, e.g. (1, 3, 5, 7) + (2, 4, 6, 8) = (3, 7, 11, 15)
OpenCL vector lengths: 2, 4, 8, 16 for char, short, int, float, double
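The scalar vs. SIMD example can be reproduced on the host CPU with the GCC/Clang vector extension (a compiler extension, not ISO C99). In OpenCL C the same operation is written directly as `int4 c = a + b;`. The sketch assumes a 32-bit `int`; the type name `vint4` and the function names are hypothetical.

```c
#include <assert.h>

/* Four 32-bit ints packed in one 16-byte vector, roughly what OpenCL C
   calls int4. GCC/Clang vector extension; assumes sizeof(int) == 4.
   Where the target has SIMD (e.g. SSE2), the compiler can lower the
   vector + to a single vector add instruction. */
typedef int vint4 __attribute__((vector_size(16)));

/* Scalar: one addition per operation. */
static int scalar_add(int a, int b)
{
    return a + b;
}

/* SIMD: four lane-wise additions expressed as one vector operation. */
static vint4 vector_add(vint4 a, vint4 b)
{
    return a + b;
}
```

`vector_add((vint4){1, 3, 5, 7}, (vint4){2, 4, 6, 8})` yields the lanes 3, 7, 11, 15, matching the slide.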
Summarizing xPU Trends
• Many more xPU Cores in our Future
• Compute Environment becoming Hybrid
– CPUs and GPUs
– A CPU is needed to give access to GPU power
• GPU Capabilities
– Lots of cores
– Vector/SIMD Instructions
– Fast Memory
• GPU Futures
– Virtual Memory
– Multi-tasking / Pre-emption
Scaling PostgreSQL Queries on xPUs
[Diagram: multi-core CPU vs. many-core GPU, using more transistors]
Parallel Programming Systems
Category | CUDA | OpenMP | OpenCL
Language | C | C, Fortran | C
Cross Platform | No | Yes | Yes
Standard | Vendor | OpenMP | Khronos
CPU | No | Yes | Yes
GPU | Yes | No | Yes
Clusters | No | Yes | No
PgOpenCL
[Diagram: Web Browser -> HTTP -> Web Server / App -> SQL over TCP/IP -> PostgreSQL Server -> SQL Statement -> SQL Procedure -> PCIe X2 Bus -> GPGPU]
SQL procedure body: … $BODY$ Language PgOpenCL;
PgOpenCL Execution Model
[Diagram: columns A and B of a table are passed to VectorAdd(A, B) on the xPU, which returns C = A + B, with the elements of C computed in parallel]
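The VectorAdd example can be sketched as follows. This is an illustration, not PgOpenCL's actual code: it assumes single-precision (SQL real) inputs, shows the standard OpenCL C form of an element-wise add kernel, and pairs it with a plain C loop that computes the same result serially on the CPU. The names `vector_add`, `vector_add_src` and `vector_add_host` are hypothetical.

```c
#include <stddef.h>

/* OpenCL C source for an element-wise add kernel. OpenCL programs are
   compiled at run time (clCreateProgramWithSource, clBuildProgram), so
   the source is kept as a string; it is not compiled by this file. */
static const char *vector_add_src =
    "__kernel void vector_add(__global const float *a,\n"
    "                         __global const float *b,\n"
    "                         __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);  /* one work item per element */\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* CPU stand-in for the same kernel: the loop index plays the role of
   get_global_id(0), and each iteration is one independent work item. */
static void vector_add_host(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

Because no iteration depends on another, the device is free to run all of them at once, which is what makes this shape of query a good fit for an xPU.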
Using Re-Shaped Tables
[Diagram: a table of arrays (A, B) is copied to the xPU, where 100s to 1000s of threads (kernels) run VectorAdd(A, B) so that A + B = C; the result C is copied back into a table of arrays]
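The reshaping idea can be sketched in plain C under a hypothetical row layout: rows are gathered into one contiguous array per column (the table of arrays), the arrays are processed as whole vectors, and the result is scattered back into the rows. The two Copy steps in the diagram would be host-to-device and device-to-host transfers; here they are ordinary memory copies. `struct row`, `gather_columns`, `add_arrays` and `scatter_column` are illustrative names, not PgOpenCL functions.

```c
#include <stddef.h>

/* Hypothetical row layout, as rows might arrive from an SQL query. */
struct row {
    int   id;
    float a;
    float b;
    float c;   /* result column, filled in after the kernel step */
};

/* Gather: copy the a and b columns of n rows into contiguous arrays,
   the layout a GPU kernel reads most efficiently. */
static void gather_columns(const struct row *rows, size_t n, float *a, float *b)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = rows[i].a;
        b[i] = rows[i].b;
    }
}

/* Stand-in for the device step: C = A + B over whole arrays. */
static void add_arrays(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Scatter: write the result array back into the c column of each row. */
static void scatter_column(struct row *rows, size_t n, const float *c)
{
    for (size_t i = 0; i < n; ++i)
        rows[i].c = c[i];
}
```

Keeping each column contiguous lets neighbouring work items read neighbouring addresses, which is the access pattern GPU memory systems reward.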
Today’s GPGPU Challenges
• No Pre-emptive Multi-Tasking
• No Virtual Memory
• Limited Bandwidth to a Discrete GPGPU
– 1–8 GB/s over the PCIe bus
• Hard to Program
– New parallel algorithms and constructs
– “New” C language dialect
• Immature Tools
– Compilers, IDEs, debuggers and profilers are in their early years
• Data Organization Really Matters
– Types, structure, and alignment
– SQL needs to shape the data
• Profiling and Debugging Are Not Easy
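The PCIe bandwidth limit can be turned into a rough break-even test. This is a simplified model, not a measured result: it assumes transfers and compute do not overlap, uses peak rates, and the function names are hypothetical. Offloading pays off only when transfer time plus GPU compute time is less than CPU compute time.

```c
#include <stdbool.h>

/* Time to process `amount` units at `rate` units per second. */
static double seconds(double amount, double rate)
{
    return amount / rate;
}

/* Simplified model (assumption): GPU time = PCIe transfer + GPU compute,
   with no overlap; CPU time = CPU compute only. All rates are peak. */
static bool offload_pays_off(double bytes_moved, double flops,
                             double pcie_bytes_per_s,
                             double gpu_flops_per_s, double cpu_flops_per_s)
{
    double gpu_time = seconds(bytes_moved, pcie_bytes_per_s)
                    + seconds(flops, gpu_flops_per_s);
    double cpu_time = seconds(flops, cpu_flops_per_s);
    return gpu_time < cpu_time;
}
```

Using the peak single-precision figures from the GPU vs CPU table (1500 vs. 96 GFLOP/s) and 8 GB/s for PCIe: adding two arrays of one million floats moves 12 MB (two inputs, one output) for only 10^6 FLOPs, so the 1.5 ms transfer dwarfs the ~10 µs the CPU needs, and offloading loses. At 10,000 FLOPs per element (10^10 FLOPs) the GPU finishes in about 8 ms against roughly 100 ms on the CPU, so offloading wins. The deciding factor is arithmetic intensity: work done per byte moved.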
Wish List
• Beta Testers
– Existing OpenCL App?
– Have a GPU App?
• Contributors
– Code server-side functions?
• Sponsors & Supporters
– AMD Fusion Fund?
– Khronos?
PgOpenCL Future Plans
• Increase Platform Support
• Scatter/Gather Functions
• Additional Type Support
– Image Types
– Sparse Matrices
• Run-Time
– Asynchronous
– Events
– Profiling
– Debugging
Using the Whole Brain
[Diagram: PgOpenCL running on the CPU cores and the embedded GPU of an APU chip (connected through the North Bridge, ~20 GB/s), on a discrete GPU, and alongside Postgres]
“You can’t be in a parallel universe with a single brain!”
• Heterogeneous Compute Environments
– CPUs, GPUs, APUs
– Expect 100s to 1000s of cores
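One simple way to use every processor at once is static load balancing: split the work items between devices in proportion to their throughput so they finish at about the same time. This is a hypothetical sketch, not PgOpenCL's scheduler; `split_work` is an illustrative name, and real schedulers usually measure throughput at run time rather than trusting peak figures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical static split of n work items between a GPU and a CPU,
   proportional to their assumed throughput (any consistent unit). */
static void split_work(size_t n, double gpu_rate, double cpu_rate,
                       size_t *gpu_items, size_t *cpu_items)
{
    assert(gpu_rate > 0.0 && cpu_rate > 0.0);
    *gpu_items = (size_t)((double)n * gpu_rate / (gpu_rate + cpu_rate));
    *cpu_items = n - *gpu_items;  /* the remainder goes to the CPU */
}
```

With the peak single-precision rates from the GPU vs CPU table (1500 and 96 GFLOP/s), `split_work(1000000, 1500.0, 96.0, &g, &c)` gives the GPU 939,849 items and the CPU 60,151, so the CPU cores contribute instead of sitting idle while the GPU works.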
OpenCL
• Portable and high-performance framework
– Ideal for computationally intensive algorithms
– Access to all compute resources (CPU, APU, GPU)
– Well-defined computation/memory model
• Efficient parallel programming language
– C99 with extensions for task and data parallelism
– Rich set of built-in functions
• Open standard for heterogeneous parallel computing
PgOpenCL
• Integrates PostgreSQL with OpenCL
• Provides easy SQL access to xPUs
– APU, CPU, GPGPU
• Integrates OpenCL with SQL and web apps (PHP, Ruby, …)
More Information
• PgOpenCL
– Twitter @3DMashUp
• OpenCL
– www.khronos.org/opencl/
– www.amd.com/us/products/technologies/stream-technology/opencl/
– http://software.intel.com/en-us/articles/intel-opencl-sdk
– http://www.nvidia.com/object/cuda_opencl_new.html
– http://developer.apple.com/technologies/mac/snowleopard/opencl.html
Q&A