Parallel Computing

This document introduces parallel and high-performance computing concepts with a focus on machine learning applications. It covers parallel computing basics including why parallelism is needed, computing platforms, algorithm design principles like decomposition, and programming models like distributed memory, shared memory and accelerators. The document is split into two parts, with part one focusing on parallel computing fundamentals and part two covering parallel algorithms for machine learning tasks like classification.


Introduction to Parallel and

High-Performance Computing
(with Machine-Learning applications)

Anshul Gupta, IBM Research


Prabhanjan (Anju) Kambadur, Bloomberg L.P.
Introduction to Parallel and High-Performance
Computing (with Machine-Learning applications)

Part 1: Parallel computing basics and parallel algorithm analysis
Part 2: Parallel algorithms for building a classifier
Part 1: Parallel and High-Performance Computing

• Why parallel computing?
• Parallel computing platforms
• Parallel algorithm basics
• Decomposition for parallelism
• Parallel programming models
• Parallel algorithm analysis
Why parallel computing?

[Figure: Microprocessor trends, 1972–2015]

Kirk M. Bresniker, Sharad Singhal, R. Stanley Williams, "Adapting to Thrive in a New Economy of Memory Abundance", Computer, vol. 48, no. 12, pp. 44-53, Dec. 2015, doi:10.1109/MC.2015.368
Parallel computing platforms

Highest level of parallelism:

• Compute nodes on an interconnection network
• Possibly, thousands of nodes
• Distributed memory
• Distributed or shared address space
• Scalability analysis crucial
Parallel computing platforms (cont.)

A generalized compute node

• (Possibly) multiple CPUs
• (Possibly) multiple GPUs
Parallel computing platforms (cont.)

Shared memory on a node, but Non-Uniform Memory Access (NUMA).
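
On a NUMA node, a core reaches its locally attached memory faster than memory on another socket, so data placement matters even though the address space is shared. A minimal sketch in C with OpenMP, assuming a first-touch page-placement policy (common on Linux); the array size is illustrative:

```c
#include <stdlib.h>

/* First-touch initialization: each thread writes the chunk it will
 * later compute on, so the OS places those pages in that thread's
 * local memory (assumes a first-touch placement policy). */
void numa_friendly_init(double *a, long n) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        a[i] = 0.0;                 /* first write places the page */
}

int main(void) {
    long n = 1L << 24;
    double *a = malloc(n * sizeof *a);  /* pages not yet placed */
    numa_friendly_init(a, n);
    /* Later parallel loops using the same static schedule now
     * touch mostly node-local memory. */
    free(a);
    return 0;
}
```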
Parallel computing platforms (cont.)

Nvidia GeForce GTX 680 (Kepler)

• 4 GPCs (graphics processing clusters)
• 8 SMXs (streaming multiprocessors)
• 192 × 8 = 1536 CUDA cores
Parallel hardware hierarchy

Node ensemble → Nodes
  Nodes → CPUs and GPUs
  CPUs → Cores
  GPUs → GPCs → SMXs → CUDA cores
Parallel program hierarchy

Process groups → Processes
  Processes → Thread pools → Threads
  On GPUs: thread grids → thread blocks → threads
Parallel programming paradigms

Distributed Memory
• Multiple processes
• Distributed address space
• Explicit data movement
• Locality!

Shared Memory
• Multiple threads
• Shared address space
• Implicit data movement
• Locality!

Accelerator
• Host memory ↔ PCIe ↔ Device memory
• Explicit data movement
• Locality!
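
To make the paradigms concrete, here is a minimal hedged sketch in C combining MPI (distributed memory: explicit messages between processes) and OpenMP (shared memory: implicit sharing among threads). The accelerator model follows the distributed pattern in spirit: data is explicitly staged from host to device memory over PCIe. Array sizes and values are illustrative only.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Distributed memory: each process has its own copy of 'scale';
     * the value crosses address spaces only via explicit
     * communication (here, a broadcast from rank 0). */
    double scale = (rank == 0) ? 2.0 : 0.0;
    MPI_Bcast(&scale, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Shared memory: the threads of this process all see 'data';
     * no explicit movement, but locality still matters. */
    double data[1000], sum = 0.0;
    for (int i = 0; i < 1000; i++) data[i] = 1.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000; i++)
        sum += scale * data[i];

    printf("rank %d: sum = %.1f\n", rank, sum);
    MPI_Finalize();
    return 0;
}
```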
Algorithm design

• Algorithm design is critical to devising a computational solution
• A serial algorithm is a recipe: a sequence of basic steps or operations
• A parallel algorithm is a recipe for solving the given problem using an ensemble of hardware resources
• Specifying a parallel algorithm involves a lot more than specifying a sequence of basic steps
Parallel algorithm design steps

• Identifying portions of work that can be performed concurrently – Decomposition
• Mapping concurrent pieces of work onto computing agents running in parallel
• Making the input, output, and intermediate data available to the right computing agent at the right time – Data Dependencies
• Managing simultaneous requests for shared data
• Synchronizing computing agents for correct program execution – Task Dependencies
Decomposition for concurrency

Task Decomposition
• Concurrent tasks are identified and mapped onto threads or processes
• Tasks share or exchange data as needed
• May be static or dynamic

Data Decomposition
• Data is partitioned (input, output, or intermediate)
• Partitions are assigned to computing agents
• "Owner computes" rule
• Usually static
Decomposition for concurrency

Data Decomposition + Task Decomposition = Hybrid Decomposition
(Example: sparse matrix factorization)
Task decomposition example

Chess program: one task per piece (R1, K1, B1, Q, P1, P2, ...)

• Each task evaluates all moves of a single piece (branch-and-bound)
• Small data (the board position) can be replicated
• Dynamic load balancing required (see the sketch below)
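
A hedged sketch of this pattern in C with OpenMP: one task per piece, with dynamic scheduling doing the load balancing, since subtree sizes are unpredictable. evaluate_piece and NUM_PIECES are hypothetical placeholders, not part of the original program.

```c
#define NUM_PIECES 16  /* hypothetical: movable pieces of one side */

/* Hypothetical stand-in for the branch-and-bound evaluation of all
 * moves of one piece; real costs vary widely from piece to piece. */
static double evaluate_piece(int piece, const int *board) {
    return (double)(piece + board[0]);   /* dummy score */
}

double best_move_score(const int *board) {
    double best = -1.0e300;
    /* Task decomposition: one concurrent task per piece. The board
     * (small data) is read-only, so it is effectively replicated.
     * schedule(dynamic) rebalances because task costs differ. */
    #pragma omp parallel for schedule(dynamic) reduction(max:best)
    for (int piece = 0; piece < NUM_PIECES; piece++) {
        double s = evaluate_piece(piece, board);
        if (s > best) best = s;
    }
    return best;
}
```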
Data decomposition example

Dense Matrix-Vector Multiplication (y = A·x)

[Figure: the rows of A are partitioned into blocks owned by processes P1 through P9; each process computes the corresponding block of y]
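
Concretely, the owner-computes rule here says that the agent owning row block i produces the corresponding entries of y. A minimal shared-memory sketch in C (row-major A; names and sizes illustrative):

```c
/* Data decomposition of y = A*x by rows ("owner computes"):
 * each thread owns a static block of rows and produces exactly
 * the entries of y for the rows it owns. A is n x n, row-major. */
void matvec_rows(const double *A, const double *x, double *y, int n) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double dot = 0.0;
        for (int j = 0; j < n; j++)
            dot += A[(long)i * n + j] * x[j];
        y[i] = dot;   /* one owner per entry: no write conflicts */
    }
}
```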
Parallel application design guidelines

• Focus on one level of the hierarchy at a time, from top to bottom
• Devise the best decomposition strategy at the given level
• Computing agents are likely to be parallel themselves
• Minimize interactions, synchronization, and data movement among computing agents
• Minimize load imbalance and idling among computing agents
Parallel algorithm analysis

• Serial run time T_S: time required by the best known method on a single computing agent
• Problem size W, the total amount of work: T_S = k·W
• Parallel run time T_P: time elapsed from the start of the computation until the last of the p computing agents finishes
• Overhead, the sum of all wasted compute resources: T_O = p·T_P − T_S
• Speedup, the ratio of serial to parallel time: S = T_S/T_P = p·T_S/(T_S + T_O)
• Efficiency, the fraction of overall time spent doing useful work: E = S/p = T_S/(p·T_P) = T_S/(T_S + T_O)
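
A quick worked example with illustrative numbers (not from the original slides): suppose T_S = 100 s and p = 8 agents give T_P = 16 s. Then

T_O = p·T_P − T_S = 8·16 − 100 = 28 s
S = T_S/T_P = 100/16 = 6.25
E = S/p = 6.25/8 = T_S/(T_S + T_O) = 100/128 ≈ 0.78

so roughly 22% of the aggregate compute time is overhead.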
Isoefficiency function

• The function f_E(p) of the number of computing agents p by which the problem size W must grow in order to maintain a given efficiency E
• Captures the effect of communication, load imbalance, contention, serial bottlenecks, etc.
Scalability analysis

E = S/p = T_S/(p·T_P)

Since T_O = p·T_P − T_S, we have p·T_P = T_S + T_O.

Therefore E = T_S/(T_S + T_O) = k·W/(k·W + T_O), because T_S = k·W.

Solving for W:  W = (E/(k·(1 − E)))·T_O

For a fixed efficiency E, this means W ~ T_O: the problem size must grow in proportion to the overhead.
Scalability analysis: W = O(n^3)

Algorithm A:
  T_P = O(n^3/p) + O(n^2/√p)
  W = O(n^3) ⇒ n^3 ~ n^2·√p ⇒ n ~ √p
  W = O(n^3) = O(p^1.5)

Algorithm B:
  T_P = O(n^3/p) + O(n·√n)
  W = O(n^3) ⇒ n^3 ~ n^1.5·p ⇒ n^1.5 ~ p
  W = O(n^3) = O(p^2)
Parallel algorithm design and analysis

Dense Matrix-Vector Multiplication (1-D decomposition)

[Figure: A is partitioned into row blocks owned by processes P1 through P9; y = A·x is computed block by block]
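
A hedged MPI sketch of the 1-D scheme in C, matching the scalability analysis below: each of the p ranks owns n/p rows of A and n/p entries of x, all-gathers the full x (the t_s·log(p) + t_w·n communication term), then computes its block of y. Assumes n is divisible by p; buffer names are illustrative.

```c
#include <mpi.h>
#include <stdlib.h>

/* 1-D (row-block) parallel y = A*x.
 * A_loc: the (n/p) x n row block owned by this rank (row-major).
 * x_loc: the n/p locally owned entries of x.
 * y_loc: the n/p entries of y this rank computes (owner computes).
 * Assumes n % p == 0. */
void matvec_1d(const double *A_loc, const double *x_loc,
               double *y_loc, int n, MPI_Comm comm) {
    int p;
    MPI_Comm_size(comm, &p);
    int nloc = n / p;

    /* Gather the distributed vector x on every rank: this is the
     * t_s*log(p) + t_w*n communication cost in the analysis. */
    double *x = malloc(n * sizeof *x);
    MPI_Allgather(x_loc, nloc, MPI_DOUBLE,
                  x, nloc, MPI_DOUBLE, comm);

    /* Local computation: n^2/p multiply-adds. */
    for (int i = 0; i < nloc; i++) {
        double dot = 0.0;
        for (int j = 0; j < n; j++)
            dot += A_loc[(long)i * n + j] * x[j];
        y_loc[i] = dot;
    }
    free(x);
}
```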
Parallel algorithm design and analysis

Dense Matrix-Vector Multiplication (2-D decomposition)

[Figure: A is partitioned into a √p × √p grid of blocks among processes P1 through P9; x and y are distributed accordingly]
Scalability analysis of matrix-vector multiplication (1-D decomposition)

T_P = n^2/p + t_s·log(p) + t_w·n
T_O = t_s·p·log(p) + t_w·p·n
W = O(n^2)

Term 1: n^2 ~ p·log(p)
Term 2: n^2 ~ p·n, or n ~ p, so W = O(n^2) = O(p^2)
Scalability analysis of matrix-vector multiplication (2-D decomposition)

T_P = n^2/p + t_s·log(p) + t_w·(n/√p)·log(p)
T_O = t_s·p·log(p) + t_w·n·√p·log(p)
W = O(n^2)

Term 1: n^2 ~ p·log(p)
Term 2: n^2 ~ n·√p·log(p), or n ~ √p·log(p), so W = O(n^2) = O(p·log^2(p))
Isoefficiency function of dense matrix-vector multiplication

1-D decomposition: W ~ p^2
2-D decomposition: W ~ p·log^2(p)

The 2-D decomposition is likely to yield higher speedups, require smaller problems to deliver those speedups, and scale more readily to a larger number of computing agents.
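
For illustration with made-up numbers (log base 2, constants ignored): at p = 64, the 1-D scheme needs W to grow like p^2 = 4096, while the 2-D scheme needs only p·log^2(p) = 64·36 = 2304; at p = 1024 the gap is 1,048,576 versus 102,400, and it keeps widening with p.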
Concluding remarks

• Parallelism is necessary for continued performance improvement
• Complex hierarchy of parallel computing hardware and programming paradigms
• Systematic top-down parallel application design
• Decomposition strategy is critical
• Analysis is important to understand scalability
Thank you!
