GPU Accelerated Databases
Database Driven OpenCL Programming
Tim Child 3DMashUp CEO
Outline
Speaker's Biography
Solution Goals
OpenCL Programming Challenge
Review of GPU Accelerated Databases
Swiss Army Knife of Data
OpenCL Bindings to PostgreSQL
Challenges
Example Use Cases
Benefits of the Approach
Q&A
Speaker's Bio
Tim Child
35 years of software development experience
Formerly:
VP Engineering, Oracle Corporation
VP Engineering, BEA Systems Inc.
VP Engineering, Informix
Leader at Illustra, Autodesk, Navteq, Intuit
30+ years of experience in 3D, CAD, GIS and DBMS
Goals
Develop New Applications
Develop new GPU Accelerated Database Applications that are computationally intensive.
Ease of Use
Make GPU-accelerated code easier to use
Make GPU-accelerated code more mainstream in Information Technology
Data Scalability
Scale GPU applications to larger data sizes
Enhance existing database internal operations
OpenCL Programming Challenge
Write an OpenCL application that:
Reads data from a DBMS or file
Publishes results as web pages
Handles frequent data updates
Copes with data size >> system RAM >> GPU RAM
Possible Solutions
C/C++ bindings using Web CGI
Java/Perl/Python bindings in an app server
Database-driven GPU programming
Other choices?
REVIEW OF GPU ACCELERATED DATABASE ARCHITECTURES
GPU Co-Process
DBMS Client -> (TCP/IP) -> DBMS Server (Data Tables) -> (IPC / RPC) -> GPU Language Co-Process -> (PCI Bus) -> GPGPU (GPGPU DRAM)
Examples: 2004 Bandi, Sun, et al.; many others
GPU Hosted Data Architecture
DBMS Client -> (TCP/IP) -> DBMS Server + GPU Host (Data Tables) -> (PCI Bus) -> GPGPU, holding copies of the data tables and indices in GPGPU DRAM
Examples: 2008 Bakkum, Skardon; 2010 Palo OLAP; 2010 ParStream; 2011 Kaczmarski
Procedural Language Architecture
DBMS Client -> (TCP/IP: queries in, results out) -> DBMS Server (GPGPU host) -> (PCI Bus) -> GPGPU (GPGPU DRAM)
Server storage: 10 GB RAM cache; 10 TB data tables
Examples: 1995 Illustra/Intel; 2010 3DMashUp
PostgreSQL: Swiss Army Knife of Data
SQL (Declarative Language, Set Operations)
Extensible Types
Extensible Procedural Languages
(Java, Perl, )
Rules System
Extensible Indices
Open Source
Vibrant Community
Native APIs
Remote Data Access
PostGIS
(Vector, Raster)
OpenCL
SQL OpenCL Types
Vector Types
cl_charX cl_ucharX cl_shortX cl_ushortX cl_floatX cl_doubleX
SQL Syntax:
CREATE TABLE opencltypes ( id serial, matrix cl_double4[4], image image2d );
INSERT INTO opencltypes ( matrix ) VALUES ( '{ 1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1 }' );
Images Types
image2d_t, image3d_t
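On the host side, the `matrix cl_double4[4]` column in the SQL example is naturally an array of four 4-component double vectors. The sketch below is a minimal illustration under assumptions: it uses a hypothetical `double4` struct as a stand-in for OpenCL's `cl_double4` (declared in the OpenCL host headers, where it is a union whose components are reachable as `.s[0..3]`) so it compiles without an OpenCL SDK, and it shows how the 16 values of the array literal map onto `matrix[4]`, four values per vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's cl_double4: only the .s[] component
   array of the real union is modelled here. */
typedef struct { double s[4]; } double4;

/* Fill a cl_double4[4]-style matrix from 16 values listed row by row,
   in the same order as the SQL literal { 1,0,0,0, 0,1,0,0, ... }. */
static void matrix_from_flat(double4 m[4], const double flat[16])
{
    assert(m && flat);
    for (size_t row = 0; row < 4; ++row)
        for (size_t col = 0; col < 4; ++col)
            m[row].s[col] = flat[row * 4 + col];
}

/* Returns 1 if m is the 4x4 identity matrix, else 0. */
static int is_identity(const double4 m[4])
{
    for (size_t row = 0; row < 4; ++row)
        for (size_t col = 0; col < 4; ++col)
            if (m[row].s[col] != (row == col ? 1.0 : 0.0))
                return 0;
    return 1;
}
```

Filling the matrix from the literal's 16 values and calling `is_identity` on the result returns 1.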
Database Driven OpenCL
PostgreSQL Server running PgOpenCL SQL procedures, reading data tables via disk I/O
Web Browser -> (HTTP) -> Web Server / App Server -> (SQL statement over TCP/IP) -> PostgreSQL Server
PostgreSQL Client -> (TCP/IP) -> PostgreSQL Server
PostgreSQL Server -> (PCIe x2 bus) -> GPGPU
OpenCL SQL Language Bindings
CREATE OR REPLACE FUNCTION VectorAdd(IN id int[], IN a real[], IN b real[], OUT c real[]) AS $BODY$
__kernel void VectorAdd(__global int *id, __global float *a, __global float *b, __global float *c) {
    int i = get_global_id(0); /* Query OpenCL for the array subscript */
    c[i] = a[i] + b[i];
}
$BODY$
LANGUAGE PgOpenCL;
SELECT VectorAdd(id, a, b) FROM Vectors;
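To make the kernel's execution model concrete, the sketch below emulates a 1-D NDRange on the CPU. A plain loop plays the role of the work-items, and the loop index stands in for `get_global_id(0)`. This is a hypothetical illustration, not PgOpenCL code: `vector_add_work_item` and `run_ndrange_1d` are made-up names, and on a GPU the iterations would run in parallel rather than in sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Body of the VectorAdd kernel for one work-item; `gid` plays the role of
   get_global_id(0). id[] is unused, mirroring the kernel signature. */
static void vector_add_work_item(size_t gid, const int *id,
                                 const float *a, const float *b, float *c)
{
    (void)id;
    c[gid] = a[gid] + b[gid];
}

/* Sequential stand-in for enqueueing `global_size` work-items. */
static void run_ndrange_1d(size_t global_size, const int *id,
                           const float *a, const float *b, float *c)
{
    assert(a && b && c);
    for (size_t gid = 0; gid < global_size; ++gid)
        vector_add_work_item(gid, id, a, b, c);
}
```

Each work-item touches exactly one array element, so the work-items are independent. This independence is what lets the real kernel scale to thousands of threads.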
Comparison Table
Database Driven OpenCL
Table (A, B) -> SELECT table to array -> copy -> xPU: VectorAdd(A, B) returns C, run as 100s to 1000s of threads (kernels) -> copy -> unnest array to table -> Table
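The "table to array" and "unnest array to table" steps amount to a row-to-column transposition on either side of the kernel. The sketch below is a minimal, hypothetical illustration. It assumes a `Vectors` row layout of `(id, a, b)` with scalar columns (the `vec_row` and `result_row` structs are made up), it runs the kernel step as a plain loop, and it omits the host-to-device copies, which in the architecture above would go over PCIe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical input row of the Vectors table. */
typedef struct { int id; float a; float b; } vec_row;
/* Hypothetical output row after unnesting the result array. */
typedef struct { int id; float c; } result_row;

/* Table -> arrays -> VectorAdd -> unnest -> table.
   Returns 0 on success, -1 on allocation failure. */
static int vector_add_table(const vec_row *rows, size_t n, result_row *out)
{
    if (n == 0) return 0;                 /* empty table: nothing to do */
    assert(rows && out);
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *c = malloc(n * sizeof *c);
    if (!a || !b || !c) { free(a); free(b); free(c); return -1; }

    /* "Select table to array": gather each column into its own array. */
    for (size_t i = 0; i < n; ++i) { a[i] = rows[i].a; b[i] = rows[i].b; }

    /* Kernel step: one work-item per element (run sequentially here). */
    for (size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];

    /* "Unnest array to table": pair each result with its row id. */
    for (size_t i = 0; i < n; ++i) { out[i].id = rows[i].id; out[i].c = c[i]; }

    free(a); free(b); free(c);
    return 0;
}
```

A columnar layout on the device side gives contiguous, coalesced reads for each array. This is one plausible reason for converting rows to arrays before the copy.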
MORE DATA TYPES
PgOpenCL Time Series Type
CL_UNSIGNED_INT, CL_INTENSITY
CL_FLOAT, CL_INTENSITY
Time Series Data
34 years of IBM data in 3NF = 8734 records:
Date        Open    High    Low     Close   Volume
3/11/2003   75.82   76.33   75.2    75.35   8119200
3/10/2003   77.45   77.45   75.5    75.7    6641300
3/7/2003    75.71   77.99   75.71   77.9    8129200
3/6/2003    77      77.78   76.7    77.07   5876300
3/5/2003    76.7    77.73   76.25   77.73   6658000
3/4/2003    77.6    77.75   76.53   76.7    5672200
3/3/2003    78.9    79      77.12   77.3    661830
As a time series: 34 records (one per year), 6 series columns (~256 values per series)
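Going from the 3NF daily rows to one record per year is essentially a group-by on the year. The sketch below is a hypothetical illustration, not the PgOpenCL time series type. The `quote` and `year_series` structs and `group_close_by_year` are made-up names. The sketch assumes the rows are ordered by date (either direction) and builds only the Close series. The other series columns would be built the same way.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DAYS_PER_YEAR 366   /* upper bound on daily rows in one year */

/* Hypothetical 3NF daily row (a subset of the columns above). */
typedef struct { int year; int month; int day; double close; } quote;

/* Hypothetical time-series record: one year, one series column. */
typedef struct {
    int year;
    size_t len;
    double close[MAX_DAYS_PER_YEAR];
} year_series;

/* Groups date-ordered rows into per-year series of closing prices.
   Returns the number of series written, or 0 if n is 0 or max_series
   is too small. */
static size_t group_close_by_year(const quote *rows, size_t n,
                                  year_series *out, size_t max_series)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (count == 0 || out[count - 1].year != rows[i].year) {
            if (count == max_series) return 0;   /* out of space */
            out[count].year = rows[i].year;
            out[count].len = 0;
            ++count;
        }
        year_series *s = &out[count - 1];
        assert(s->len < MAX_DAYS_PER_YEAR);
        s->close[s->len++] = rows[i].close;
    }
    return count;
}
```

With roughly 257 trading days per year (8734 / 34), each series holds about 256 values. That is the "~256 values per series" figure above.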
Time Series Properties
Hurst Exponent Based on Fractal Dimension
H = 0.5: random
H < 0.5: seasonal variations
H > 0.5: trending
Pearson Match: correlation coefficient between two time series
1: linear relation between samples
-1: inverse linear relation between samples
0.0: no linear relationship between samples
FURTHER USES OF GPU ACCELERATED DBMS
Example Use Cases
GPU Accelerated Time Series
3D Content Management / GIS: spatial selections, coordinate transformations, image processing
Bioinformatics
DNA & Protein Sequence Matching
Database Internal Operations
Joins, Sorting, Query Planning
Example Screen 1
Example Screen 2
Example Screen 3
Example Screen 4
Example Screen 5
Type Mapping
Challenges
Problem Size
DBMS Table Size >> GPU RAM
Setup > Runtime
Extended SQL Types: OpenCL Vector Types, OpenCL Image Types, Time Series
Caching kernel info CPU GPU Still present SQL Queries
# Work Groups / # Work Items
Dynamic Parallelism
Runtime Partitioning
Dynamic Simplified Return Types
Data Transfer
Device Management
CPU vs. GPU
Runtime Selection
Concurrency
No Pre-emptive Multi-Tasking
Time-out Long Queries Partitioning / Scheduling
+ Overhead ( < 4s )
Map Array
Bulk Data Loaders
New Task
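The Problem Size / Runtime Partitioning and # Work Groups / # Work Items entries can be read together, as in the sketch below. A table larger than GPU RAM is split into partitions that fit a per-launch memory budget. Each partition's global work size is then rounded up to a multiple of the work-group size, which OpenCL 1.x requires when a local size is given. The budget, `launch_plan`, and `plan_partitions` are hypothetical. A kernel launched with padded sizes would normally guard its body with a check like `if (i < n)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical launch plan for one partition of a large table. */
typedef struct {
    size_t first_row;    /* offset of the partition in the table */
    size_t rows;         /* rows actually processed */
    size_t global_size;  /* rows rounded up to a multiple of local_size */
    size_t work_groups;  /* global_size / local_size */
} launch_plan;

static size_t round_up(size_t n, size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Splits total_rows into partitions of at most budget_bytes / row_bytes rows.
   Fills up to max_plans entries and returns the number of partitions needed,
   which may exceed max_plans so a caller can resize and retry. */
static size_t plan_partitions(size_t total_rows, size_t row_bytes,
                              size_t budget_bytes, size_t local_size,
                              launch_plan *plans, size_t max_plans)
{
    assert(row_bytes > 0 && local_size > 0 && budget_bytes >= row_bytes);
    size_t rows_per_chunk = budget_bytes / row_bytes;
    size_t count = 0;
    for (size_t first = 0; first < total_rows; first += rows_per_chunk) {
        size_t left = total_rows - first;
        size_t rows = left < rows_per_chunk ? left : rows_per_chunk;
        if (count < max_plans) {
            launch_plan *p = &plans[count];
            p->first_row = first;
            p->rows = rows;
            p->global_size = round_up(rows, local_size);
            p->work_groups = p->global_size / local_size;
        }
        ++count;
    }
    return count;
}
```

As an example, take 10,000 rows of 12 bytes (an int id plus two floats), a 48,000-byte budget, and work groups of 64. This gives three partitions of 4000, 4000 and 2000 rows. Their global sizes are 4032, 4032 and 2048.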
Summary
OpenCL
PostgreSQL
Open Source Release
Database Internal Operations
Q&A
PgOpenCL: Twitter @3DMashUp, Blog www.scribd.com/3dmashup
OpenCL:
www.khronos.org/opencl/
www.amd.com/us/products/technologies/stream-technology/opencl/
http://software.intel.com/en-us/articles/intel-opencl-sdk
http://www.nvidia.com/object/cuda_opencl_new.html