Data Science Stuff
Scientific Data
David Pugmire
Scientific Data Group
Oak Ridge National Laboratory
6 March 2014
Data-Driven Science and Scientific Visualization
Volume: increasing mesh resolution, increasing temporal resolution
Velocity: increasing temporal resolution, frequency of data
Variety: multivariate data, ensembles, increasing complexity
Veracity: uncertainty, errors, approximations
Value: visualization and analysis, feature detection, scientific insight
HPC Visualization Tools of Today
Figure: client/server architecture. Client: GUI, viewer, API. Server (on an analysis cluster or supercomputer): I/O, analysis, visualization.
Scalability of Visualization Tools
Research questions:
Can current visualization tools survive at the exascale?
What are the bottlenecks at the largest scales?
What difference does architecture make?
Figure: core-collapse supernova simulation (data courtesy of T. Mezzacappa, GenASiS).
Methodology:
Create exascale-sized data: trillions of zones.
Run a simple workflow:
Read data
Volume render or contour the data
Render and composite
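The final "render and composite" step is commonly done sort-last: each process renders its own piece of the data into a full-size image with a depth buffer, and the partial images are merged by keeping, per pixel, the fragment nearest the viewer. A minimal sketch of that depth compositing for opaque geometry such as contours (volume rendering instead needs ordered alpha blending), assuming images are flat lists of hypothetical (depth, color) pixels; the names are illustrative and not taken from any particular tool.

```python
from typing import List, Tuple

# One pixel = (depth, color); a smaller depth is closer to the viewer.
Pixel = Tuple[float, str]
BACKGROUND: Pixel = (float("inf"), "bg")


def z_composite(a: List[Pixel], b: List[Pixel]) -> List[Pixel]:
    """Merge two same-sized partial images, keeping the nearer fragment per pixel."""
    if len(a) != len(b):
        raise ValueError("partial images must have the same resolution")
    return [pa if pa[0] <= pb[0] else pb for pa, pb in zip(a, b)]


def composite_all(images: List[List[Pixel]]) -> List[Pixel]:
    """Combine all per-process images pairwise, in the shape of a tree reduction."""
    while len(images) > 1:
        merged = []
        for i in range(0, len(images) - 1, 2):
            merged.append(z_composite(images[i], images[i + 1]))
        if len(images) % 2 == 1:
            merged.append(images[-1])  # the odd image out passes through unchanged
        images = merged
    return images[0]


if __name__ == "__main__":
    # Three hypothetical processes, each rendering part of a 4-pixel image.
    r0 = [(1.0, "red"), BACKGROUND, (5.0, "red"), BACKGROUND]
    r1 = [(2.0, "green"), (3.0, "green"), BACKGROUND, BACKGROUND]
    r2 = [BACKGROUND, (0.5, "blue"), (4.0, "blue"), BACKGROUND]
    print([color for _, color in composite_all([r0, r1, r2])])
```

Because depth comparison is associative, the merge order does not change the result, which is what lets large systems schedule compositing as a tree or with more elaborate exchange patterns.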
Scalability of Visualization Tools
Challenges at Exascale:
100-200
I/O caveats:
System     System peak   I/O peak      I/O reality   I/O hero
JaguarPF   2 PF          200 GB/s      1 GB/s        60 GB/s
Titan      20 PF         1.2 TB/s      1 GB/s        120 GB/s
Future     1000 PF       10 TB/s (?)   ??            ??
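Converting bandwidth into read time shows why the gap between peak and realized I/O matters for the "read data" step. A back-of-the-envelope sketch, assuming a hypothetical dataset of one trillion zones with a single 4-byte (float32) variable per zone, i.e. 4 TB, read at the Titan rates from the table.

```python
# Back-of-the-envelope read times for a hypothetical dataset.
# Assumption: 1e12 zones x 1 float32 variable (4 bytes) = 4e12 bytes.
ZONES = 10**12
BYTES_PER_ZONE = 4
DATASET_BYTES = ZONES * BYTES_PER_ZONE

GB = 10**9
TB = 10**12

# Titan rates from the I/O caveats table (decimal units).
titan_rates = {
    "I/O peak": 1.2 * TB,
    "I/O reality": 1 * GB,
    "I/O hero": 120 * GB,
}


def read_seconds(num_bytes: float, bytes_per_second: float) -> float:
    """Time to read num_bytes at a sustained bandwidth."""
    return num_bytes / bytes_per_second


for label, rate in titan_rates.items():
    t = read_seconds(DATASET_BYTES, rate)
    print(f"{label:12s} {t:10.1f} s  ({t / 3600:.2f} h)")
```

Under these assumptions the same 4 TB takes about 3 s at peak, about 33 s at the hero rate, and over an hour at 1 GB/s, before any rendering has started.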
Visualization at the Exascale
Target emerging hardware/software ecosystems
EAVL: Extreme-scale Analysis and Visualization Library
Research areas, each marked against the challenges it addresses (Volume, Velocity, Variety, Veracity, Value):
Data model: X X X X
Heterogeneous computing: X X X
In situ / in transit: X X X
And make it all accessible to developers
EAVL Research Goals:
Data Model
De facto standards such as VTK have a limited data model
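VTK data set types by cell type, coordinate storage, and point arrangement ("-" marks a combination VTK cannot represent):
Cells          Coordinates   Explicit points     Logical points     Implicit points
Structured     Strided       Structured Grid     -                  -
Structured     Separated     -                   Rectilinear Grid   Image Data
Unstructured   Strided       Unstructured Grid   -                  -
Unstructured   Separated     -                   -                  -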
Arbitrary Composition for Both Efficiency and Flexibility
EAVL allows full flexibility in representation: any cell type (structured, unstructured) can be combined with any coordinate storage (strided, separated) and any point arrangement (explicit, logical, implicit).
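One way to picture composable representations is to give every coordinate layout the same small interface (look up point i) and let a mesh hold whichever layout fits the data. A minimal sketch under that assumption; the class names are hypothetical and are not EAVL's API. It covers explicit strided coordinates (interleaved x, y, z), separated per-axis arrays combined logically (rectilinear), and fully implicit coordinates (origin plus spacing, as for image data).

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class StridedExplicit:
    """Explicit points stored interleaved: [x0, y0, z0, x1, y1, z1, ...]."""
    xyz: List[float]

    def point(self, i: int) -> Point:
        return (self.xyz[3 * i], self.xyz[3 * i + 1], self.xyz[3 * i + 2])

    def stored_values(self) -> int:
        return len(self.xyz)


@dataclass
class SeparatedLogical:
    """Separate x, y, z axis arrays; point i is a logical (i, j, k) combination."""
    xs: List[float]
    ys: List[float]
    zs: List[float]

    def point(self, i: int) -> Point:
        nx, ny = len(self.xs), len(self.ys)
        return (self.xs[i % nx], self.ys[(i // nx) % ny], self.zs[i // (nx * ny)])

    def stored_values(self) -> int:
        return len(self.xs) + len(self.ys) + len(self.zs)


@dataclass
class Implicit:
    """Uniform grid: coordinates computed from dims, origin, and spacing."""
    dims: Tuple[int, int, int]
    origin: Point
    spacing: Point

    def point(self, i: int) -> Point:
        nx, ny, _ = self.dims
        ijk = (i % nx, (i // nx) % ny, i // (nx * ny))
        return tuple(o + s * c for o, s, c in zip(self.origin, self.spacing, ijk))

    def stored_values(self) -> int:
        return 9  # 3 dims + 3 origin + 3 spacing, independent of grid size
```

Because all three expose `point(i)`, an algorithm written against that interface runs unchanged on any of them, while the stored coordinate data ranges from 3N values down to a constant.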
Data Model Gaps Addressed in EAVL
Hybrid mesh types
1D/2D/3D/... coordinate systems
Higher-dimensional data
Non-physical data, e.g., graphs
Face and edge data
Multiple groups of cells in one mesh, e.g., subsets, external faces
Mixed-topology meshes, e.g., molecules, embedded surfaces
Figures: 9D mesh used by GenASiS; 2nd-order quadtree from MADNESS; mixed-topology molecule mesh; graph mesh.
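A mesh holding several groups of cells over one shared set of points (for example volume cells plus their external faces, or atoms plus bonds in a molecule) can be sketched as a point list and a dictionary of named cell sets, where each cell carries its own shape so topologies can mix. The names below are hypothetical and only illustrate the idea; they are not EAVL's data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Cell = Tuple[str, Tuple[int, ...]]  # (shape name, point indices)


@dataclass
class CellSet:
    """A group of cells; shapes may differ from cell to cell."""
    cells: List[Cell]


@dataclass
class Mesh:
    points: List[Tuple[float, float, float]]
    cell_sets: Dict[str, CellSet] = field(default_factory=dict)

    def add_cell_set(self, name: str, cells: List[Cell]) -> None:
        """Attach a named cell set after checking it only uses existing points."""
        for _, ids in cells:
            if any(i < 0 or i >= len(self.points) for i in ids):
                raise IndexError(f"cell set {name!r} references a missing point")
        self.cell_sets[name] = CellSet(cells)


# Water-like molecule: atoms as vertex cells and bonds as line cells,
# stored as two cell sets in one mesh that share the same points.
mol = Mesh(points=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)])
mol.add_cell_set("atoms", [("vertex", (0,)), ("vertex", (1,)), ("vertex", (2,))])
mol.add_cell_set("bonds", [("line", (0, 1)), ("line", (0, 2))])
```

Keeping the points outside the cell sets means a subset or an external-face set adds only connectivity, never a second copy of the coordinates.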
Example: Memory and Algorithmic Efficiency
Threshold a regular grid: 35 < pressure < 45
Traditional data model: fully unstructured grid (explicit points, explicit cells)
EAVL data model: hybrid implicit/explicit grid (implicit points, explicit cells)
EAVL: 7x reduction in memory usage
EAVL: 4-5x performance improvement
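The memory difference in the threshold example comes from what the output has to store. A traditional data model turns the result into a fully unstructured grid, copying explicit coordinates for every point the surviving cells use; the hybrid layout keeps the input grid's implicit points and stores only the explicit list of surviving cells. A minimal 2D sketch that counts stored numbers for both layouts, assuming a hypothetical uniform grid with one pressure value per cell; it shows where the savings come from and is not meant to reproduce the measured numbers above.

```python
from typing import List, Set, Tuple


def threshold_cells(pressure: List[float], lo: float, hi: float) -> List[int]:
    """Indices of cells with lo < pressure < hi (one value per cell)."""
    return [c for c, p in enumerate(pressure) if lo < p < hi]


def quad_point_ids(c: int, nx_cells: int) -> Tuple[int, int, int, int]:
    """Point indices of 2D cell c on a uniform grid with nx_cells cells per row."""
    i, j = c % nx_cells, c // nx_cells
    npx = nx_cells + 1  # points per row
    p0 = j * npx + i
    return (p0, p0 + 1, p0 + npx + 1, p0 + npx)


def stored_values(kept: List[int], nx_cells: int) -> Tuple[int, int]:
    """Numbers stored for the threshold output: (fully unstructured, hybrid)."""
    connectivity = 4 * len(kept)  # 4 point ids per quad in both layouts
    used: Set[int] = {p for c in kept for p in quad_point_ids(c, nx_cells)}
    # Fully unstructured: explicit (x, y) copied for every point the cells use.
    unstructured = 2 * len(used) + connectivity
    # Hybrid: points stay implicit (2D origin + spacing + dims = 6 numbers) and
    # cells index the original grid's points directly.
    hybrid = 6 + connectivity
    return unstructured, hybrid


# Hypothetical 4 x 3-cell grid with a cell-centered pressure field.
NX_CELLS = 4
pressure = [30.0, 40.0, 41.0, 50.0,
            36.0, 44.0, 20.0, 39.0,
            10.0, 38.0, 42.0, 46.0]
kept = threshold_cells(pressure, 35.0, 45.0)
print(kept, stored_values(kept, NX_CELLS))
```

This count ignores the per-cell shape and offset arrays that real unstructured formats also carry, as well as index widths, so it shows the mechanism rather than the size of the saving.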
EAVL Research Goals:
Heterogeneous Computing
Implementations for CPU, GPU, and Intel Xeon Phi
Figure: surface normal calculation
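Surface normal calculation suits portable data-parallel implementations because each triangle's normal depends only on its own three vertices, so one per-element kernel can be mapped over all triangles by whatever backend is available. A minimal sketch in that style, with a hypothetical `face_normal` kernel and Python's built-in (serial) `map` standing in for a parallel map primitive; it is not EAVL's code.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def face_normal(tri: Tuple[Vec, Vec, Vec]) -> Vec:
    """Per-element kernel: unit normal of one triangle (counter-clockwise winding)."""
    a, b, c = tri
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate triangle
    return (n[0] / length, n[1] / length, n[2] / length)


def compute_normals(points: List[Vec],
                    triangles: List[Tuple[int, int, int]]) -> List[Vec]:
    """Gather each triangle's vertices, then map the kernel over all triangles.

    Every output depends only on its own input, so this map could run on any
    data-parallel backend without changing face_normal.
    """
    gathered = [(points[i], points[j], points[k]) for i, j, k in triangles]
    return list(map(face_normal, gathered))
```

Separating the gather from the kernel keeps the kernel free of global indexing, which is the part that typically has to be written once per backend.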
EAVL Research Goals:
Usability
Minimal footprint
No dependencies
Header-file-only implementation
1D, 2D, and 3D rendering with annotations
Optional MPI, CUDA, OpenMP support
Optional file readers
EAVLab: lightweight tool for rapid prototyping and experimentation
EAVL Research Goals:
Tightly Coupled In Situ
Zero-copy data access on host and device
Parallel rendering infrastructure
Examples:
LULESH (hydrodynamics)
Xolotl (fusion)
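In tightly coupled in situ, visualization runs inside the simulation's address space and reads the simulation's arrays where they already live, rather than copying them or writing them to disk. A host-side sketch of the zero-copy idea, assuming a hypothetical toy simulation whose field is a Python `array`: the analysis hook receives a `memoryview`, which shares the underlying buffer instead of copying it. Host/device zero copy on a GPU rests on the same principle but needs device-specific APIs not shown here.

```python
from array import array
from typing import Callable, List, Tuple


def simulate(steps: int, n: int, every: int,
             hook: Callable[[int, memoryview], None]) -> None:
    """Toy time loop that hands its live field to an in situ hook every `every` steps."""
    field = array("d", [0.0] * n)
    for step in range(1, steps + 1):
        for i in range(n):
            field[i] += i * 0.1  # stand-in for the real physics update
        if step % every == 0:
            hook(step, memoryview(field))  # no copy: the view aliases `field`


results: List[Tuple[int, float]] = []


def max_value_hook(step: int, view: memoryview) -> None:
    """Hypothetical analysis: record the field maximum by reading the shared buffer."""
    results.append((step, max(view)))


simulate(steps=4, n=5, every=2, hook=max_value_hook)
```

The hook should finish, or copy what it needs, before returning, since the simulation keeps overwriting the same buffer on the next step.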
EAVL Research Goals:
Loosely Coupled In Situ
ADIOS staging with the XGC fusion code
Exploits network hardware support for fast data transfer to remote memory
The application writes using the ADIOS API
The visualization application reads using the ADIOS API
Figure: data flows from the XGC application through staging to visualization
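In loosely coupled in situ, the simulation hands each output step to a separate staging area and keeps computing, while a visualization process pulls data from staging on its own schedule. The sketch below mimics only that decoupling, using a bounded in-memory queue and two threads; it does not use ADIOS, and none of its names correspond to the ADIOS API or to XGC.

```python
import queue
import threading
from typing import List, Optional, Tuple

Step = Tuple[int, List[float]]
# Bounded stand-in for the staging area: the producer blocks only when it is full.
staging: "queue.Queue[Optional[Step]]" = queue.Queue(maxsize=2)
rendered: List[Tuple[int, float]] = []


def application(num_steps: int) -> None:
    """Producer: writes each timestep to staging, then keeps computing."""
    for step in range(num_steps):
        data = [step * 1.0 + i for i in range(4)]  # stand-in for simulation output
        staging.put((step, data))
    staging.put(None)  # end-of-stream marker


def visualization() -> None:
    """Consumer: reads timesteps from staging and 'visualizes' them."""
    while True:
        item = staging.get()
        if item is None:
            break
        step, data = item
        rendered.append((step, sum(data) / len(data)))  # stand-in analysis


producer = threading.Thread(target=application, args=(5,))
consumer = threading.Thread(target=visualization)
producer.start()
consumer.start()
producer.join()
consumer.join()
```

The queue bound plays the role of limited staging memory: if visualization falls behind, the application eventually waits, which is the trade-off a real staging design has to size for.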
EAVL Roadmap
Continued algorithm research and development
Data-parallel algorithms are very different
Autonomic algorithms
Techniques for handling uncertainty
Continued efforts in loosely and tightly coupled in situ
Deployment as services into data streaming frameworks
Deployment path into HPC visualization tools (e.g., VisIt and ParaView)
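One reason data-parallel algorithms look different is that a loop with an "append" (such as collecting the cells that pass a threshold) must be restated with parallel primitives. A common formulation is stream compaction: map each element to a 0/1 flag, take an exclusive prefix sum of the flags to get each survivor's output slot, then scatter. The sketch writes each phase as a separate step; the scan runs serially here for clarity, where a data-parallel library would use a parallel scan. It is a generic illustration, not EAVL's implementation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def exclusive_scan(flags: List[int]) -> List[int]:
    """out[i] = sum(flags[:i]); serial here, O(log n) depth as a parallel scan."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out


def compact(values: List[T], keep: Callable[[T], bool]) -> List[T]:
    """Stream compaction expressed as map, scan, and scatter."""
    # 1. Map: independent per-element predicate -> 0/1 flag.
    flags = [1 if keep(v) else 0 for v in values]
    # 2. Scan: each kept element's destination index in the output.
    dest = exclusive_scan(flags)
    total = dest[-1] + flags[-1] if values else 0
    # 3. Scatter: every kept element writes to its own slot, so there are no conflicts.
    out: List[T] = [None] * total  # type: ignore[list-item]
    for i, v in enumerate(values):
        if flags[i]:
            out[dest[i]] = v
    return out
```

For example, `compact(pressure_values, lambda p: 35.0 < p < 45.0)` would keep only the values in the threshold range, with each output position determined before any writes happen.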
Thank you for your attention