PyCUDA Tutorial
Introduction
As we may have already seen, NVIDIA GPUs use CUDA cores to perform calculations. In this segment, we
will be using Python to run CUDA programs.
The difference between writing CUDA programs in C and in Python is that Python offers many quality-of-life advantages, especially with regard to data visualization and scientific computing.
Aim
We will demonstrate one of the simplest applications of parallel programming: adding two matrices.
Note that each element in the matrix will be added in parallel, i.e. in a single step, so we can forgo the
classical for loop in this case. We will start with two 2D matrices of arbitrary size in NumPy and then
proceed to add them.
Getting started
To start our virtual workspace, we will use Google Colaboratory, available at:
https://colab.research.google.com/
Google's cloud service provides this to us for free (note that you will require a Gmail account).
Setup
Since pycuda is not a native library in Colab, we need an additional line to install it before importing the libraries.
Run the code segment below first before proceeding (click the play button at its left).
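A minimal sketch of such a setup cell, assuming a standard Colab runtime:

    # Install PyCUDA into the Colab runtime (the ! runs a shell command)
    !pip install pycuda

    # Imports used throughout this tutorial
    import numpy as np
    import pycuda.autoinit          # initializes the CUDA driver and creates a context
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule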
Building
Wait a bit while pycuda is being installed. After the build finishes, we are ready to proceed.
Of course, we are free to declare the matrices however we wish, but the snippet below creates a standard
random-number matrix; we repeat the same for a "b" matrix. Note that the GPU the cloud assigns probably
does not support double-precision arithmetic, so to avoid difficulties we pre-convert to float32, i.e. single
precision.
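A minimal sketch of the declaration, assuming a 5x5 size to match the indexing example later on:

    # Create two 5x5 matrices of random numbers and convert them to
    # single precision, since the GPU may not handle doubles well
    a = np.random.randn(5, 5).astype(np.float32)
    b = np.random.randn(5, 5).astype(np.float32)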
GPU details
As we may have seen earlier, GPUs do not let us assign data to them directly; that is to say, we can't write something like
> accessing GPU
> GPU variable a
> a = 3
We will need to allocate memory on the device, copy our data to it from the host, access it via pointers,
and then run calculations on that.
We do this as follows. Here, htod stands for "host to device", that is, from your system to the GPU; later
we will use the reverse of this command to extract data from the GPU.
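A minimal sketch of the allocation and copy, using PyCUDA's driver API:

    # Allocate device memory the same size as each matrix
    a_gpu = cuda.mem_alloc(a.nbytes)
    b_gpu = cuda.mem_alloc(b.nbytes)

    # Copy the matrices from host (CPU) to device (GPU)
    cuda.memcpy_htod(a_gpu, a)
    cuda.memcpy_htod(b_gpu, b)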
Writing the function
Interestingly enough, the function body has to be written in C. The equivalence here is that all the a[idx]
operations are executed simultaneously.
[Figure: a 5x5 matrix laid out as a 1D array. threadIdx.x refers to x, threadIdx.y refers to y, and
blockDim.x refers to the block dimension in x, which in our case is 5. The element in row y, column x sits
at offset a+(5*y+x), so the last element is a+(5*4+4) = a+24. Hence, we can create an equivalent 1D array
of size 25x1 with the index notation idx = threadIdx.x + blockDim.x * threadIdx.y.]
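A minimal sketch of such a kernel cell (the function name add_them and the separate dest output array
are assumptions for illustration):

    mod = SourceModule("""
    __global__ void add_them(float *dest, float *a, float *b)
    {
        // Flatten the 2D thread index into a 1D array offset
        int idx = threadIdx.x + blockDim.x * threadIdx.y;
        dest[idx] = a[idx] + b[idx];
    }
    """)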
Calling the function
We have to create a separate variable to hold the function, as follows. This will extract the compiled
function from the SourceModule and execute it on the GPU copies of a and b.
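A minimal sketch of the call, assuming the add_them kernel sketched above and a single 5x5 block of
threads (one thread per matrix element):

    # Allocate device memory for the result
    dest_gpu = cuda.mem_alloc(a.nbytes)

    # Extract the compiled kernel from the SourceModule
    add_them = mod.get_function("add_them")

    # Launch with one 5x5 block of threads
    add_them(dest_gpu, a_gpu, b_gpu, block=(5, 5, 1))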
Extracting the result
Here, dtoh stands for "device to host": we copy the result back from the GPU into a NumPy array on our
system.
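A minimal sketch of the extraction:

    # Copy the result back from device (GPU) to host (CPU)
    result = np.empty_like(a)
    cuda.memcpy_dtoh(result, dest_gpu)

    print(result)
    print(a + b)  # should match the GPU result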
Note: As we may already have seen, parallel programming should be applied in very specific
circumstances. More specifically, it is best suited for work where one step of a computation is not
affected by the next.