PyTorch Fundamentals
What is PyTorch?
PyTorch is an open source machine learning and deep learning framework.
Topic Contents
Introduction to tensors: Tensors are the basic building block of all of machine learning and deep learning.
Creating tensors: Tensors can represent almost any kind of data (images, words, tables of numbers).
Getting information from tensors: If you can put information into a tensor, you'll want to get it out too.
Manipulating tensors: Machine learning algorithms (like neural networks) involve manipulating tensors in many different ways such as adding, multiplying, combining.
Dealing with tensor shapes: One of the most common issues in machine learning is dealing with shape mismatches (trying to mix wrong shaped tensors with other tensors).
Indexing on tensors: If you've indexed on a Python list or NumPy array, it's very similar with tensors, except they can have far more dimensions.
Mixing PyTorch tensors and NumPy: PyTorch plays with tensors (torch.Tensor), NumPy likes arrays (np.ndarray), and sometimes you'll want to mix and match these.
Reproducibility: Machine learning is very experimental and since it uses a lot of randomness to work, sometimes you'll want that randomness to not be so random.
Running tensors on GPU: GPUs (Graphics Processing Units) make your code faster, and PyTorch makes it easy to run your code on GPUs.
Where can you get help?
The notebooks this section is adapted from live on GitHub.
Importing PyTorch
Note: Before running any of the code in this notebook, you should have gone through the PyTorch setup
steps.
However, if you're running on Google Colab, everything should work (Google Colab comes with PyTorch
and other libraries installed).
Let's start by importing PyTorch and checking the version we're using.
In [1]:
import torch
torch.__version__
Introduction to tensors
Now we've got PyTorch imported, it's time to learn about tensors.
Tensors are the fundamental building block of machine learning.
Their job is to represent data in a numerical way.
For example, you could represent an image as a tensor with shape [3, 224, 224] which would mean
[colour_channels, height, width], as in the image has 3 colour channels (red, green, blue), a height of 224 pixels
and a width of 224 pixels.
[Image: example of going from an input image to a tensor representation of the image, the image gets broken down into 3 colour channels as well as numbers to represent the height and width]
In tensor-speak (the language used to describe tensors), the tensor would have three dimensions, one for
colour_channels, height and width.
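For example, a quick sketch of such a tensor (the values here are just random placeholders):
import torch
# A random "image" tensor in [colour_channels, height, width] format
image = torch.rand(size=(3, 224, 224))
image.ndim, image.shape # -> (3, torch.Size([3, 224, 224]))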
Creating tensors
PyTorch loves tensors. So much so there's a whole documentation page dedicated to the torch.Tensor class.
Your first piece of homework is to read through the documentation on torch.Tensor for 10-minutes. But you can get
to that later.
Let's code.
The first thing we're going to create is a scalar.
A scalar is a single number and in tensor-speak it's a zero dimension tensor.
Note: That's a trend for this course. We'll focus on writing specific code. But often I'll set exercises which
involve reading and getting familiar with the PyTorch documentation. Because after all, once you're
finished this course, you'll no doubt want to learn more. And the documentation is somewhere you'll be
finding yourself quite often.
In [2]:
# Scalar
scalar = torch.tensor(7)
scalar
Out [2]: tensor(7)
In [3]:
# Check the number of dimensions of scalar
scalar.ndim
Out [3]: 0
In [4]: # Get the Python number within a tensor (only works with one-element tensors)
scalar.item()
Out [4]: 7
In [5]:
# Vector
vector = torch.tensor([7, 7])
vector
In [6]:
# Check the number of dimensions of vector
vector.ndim
Out [6]: 1
Hmm, that's strange, vector contains two numbers but only has a single dimension.
I'll let you in on a trick.
You can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside ([)
and you only need to count one side.
How many square brackets does vector have?
Another important concept for tensors is their shape attribute. The shape tells you how the elements inside them are
arranged.
Let's check out the shape of vector.
In [7]:
# Check shape of vector
vector.shape
The above returns torch.Size([2]) which means our vector has a shape of [2]. This is because of the two
elements we placed inside the square brackets ([7, 7]).
Let's now see a matrix.
In [8]:
# Matrix
MATRIX = torch.tensor([[7, 8],
[9, 10]])
MATRIX
Wow! More numbers! Matrices are as flexible as vectors, except they've got an extra dimension.
In [9]:
# Check number of dimensions
MATRIX.ndim
Out [9]: 2
MATRIX has two dimensions (did you count the number of square brackets on the outside of one side?).
What shape do you think it will have?
In [10]:
MATRIX.shape
We get the output torch.Size([2, 2]) because MATRIX is two elements deep and two elements wide.
How about we create a tensor?
In [11]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
[3, 6, 9],
[2, 4, 5]]])
TENSOR
In [12]:
# Check number of dimensions for TENSOR
TENSOR.ndim
Out [12]: 3
scalar: a single number; 0 dimensions; usually written in lowercase (a)
vector: a number with direction (e.g. wind speed with direction) but can also have many other numbers; 1 dimension; usually written in lowercase (y)
matrix: a 2-dimensional array of numbers; 2 dimensions; usually written in uppercase (Q)
tensor: an n-dimensional array of numbers; can have any number of dimensions (a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector); usually written in uppercase (X)
Random tensors
We've established tensors represent some form of data.
And machine learning models such as neural networks manipulate and seek patterns within tensors.
But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've
been doing).
Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random
numbers as it works through data to better represent it.
In essence:
Start with random numbers -> look at data -> update random numbers -> look at data -> update random
numbers...
As a data scientist, you can define how the machine learning model starts (initialization), looks at data
(representation) and updates (optimization) its random numbers.
We'll get hands on with these steps later on.
For now, let's see how to create a tensor of random numbers.
We can do so using torch.rand() and passing in the size parameter.
In [14]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype
The flexibility of torch.rand() is that we can adjust the size to be whatever we want.
For example, say you wanted a random tensor in the common image shape of [224, 224, 3] ([height, width,
color_channels]).
In [15]:
# Create a random tensor of size (224, 224, 3)
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim
Sometimes you'll just want a tensor of all zeros or all ones, which you can create with torch.zeros() and torch.ones(), as sketched below.
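A quick sketch of both (assuming the same (3, 4) shape used elsewhere in this section):
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
zeros.dtype, ones.dtype # both default to torch.float32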
In [18]:
# Use torch.arange() instead, torch.range() is deprecated
zero_to_ten_deprecated = torch.range(0, 10) # Note: this may return an error in the future
zero_to_ten = torch.arange(start=0, end=10, step=1) # the preferred way to create a range of values
zero_to_ten
Sometimes you might want one tensor of a certain type with the same shape as another tensor.
For example, a tensor of all zeros with the same shape as a previous tensor.
To do so you can use torch.zeros_like(input) or torch.ones_like(input) which return a tensor filled with zeros or
ones in the same shape as the input respectively.
In [19]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros
Tensor datatypes
There are many different tensor datatypes available in PyTorch.
Some are specific for CPU and some are better for GPU.
Getting to know which is which can take some time.
Generally if you see torch.cuda anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing
toolkit called CUDA).
The most common type (and generally the default) is torch.float32 or torch.float.
This is referred to as "32-bit floating point".
But there's also 16-bit floating point (torch.float16 or torch.half) and 64-bit floating point (torch.float64 or
torch.double).
And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.
Plus more!
Note: An integer is a whole number like 7, whereas a float has a decimal, like 7.0.
The reason for all of these is to do with precision in computing.
Precision is the amount of detail used to describe a number.
The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.
This matters in deep learning and numerical computing because you're making so many operations, the more detail
you have to calculate on, the more compute you have to use.
So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation
metrics like accuracy (faster to compute but less accurate).
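As a rough sketch of the storage side of that tradeoff (element_size() gives bytes per element, nelement() gives the number of elements):
import torch
float_32 = torch.rand(1000, dtype=torch.float32)
float_16 = torch.rand(1000, dtype=torch.float16)
print(float_32.element_size() * float_32.nelement()) # 4000 bytes
print(float_16.element_size() * float_16.nelement()) # 2000 bytes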
Resources:
See the PyTorch documentation for a list of all available tensor datatypes.
Read the Wikipedia page on precision in computing for an overview.
Let's see how to create some tensors with specific datatypes. We can do so using the dtype parameter.
In [20]:
# Default datatype for tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which is torch.float32
                               device=None, # defaults to None, which uses the default tensor device
                               requires_grad=False) # if True, operations performed on the tensor are recorded
Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across
in PyTorch are datatype and device issues.
For example, one of your tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be the
same format).
Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be
on the same device).
We'll see more of this device talk later on.
For now let's create a tensor with dtype=torch.float16.
In [21]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work
float_16_tensor.dtype
Out [21]: torch.float16
Note: When you run into issues in PyTorch, it's very often one to do with one of the three attributes above.
So when the error messages show up, sing yourself a little song called "what, what, where":
"what shape are my tensors? what datatype are they and where are they stored? what shape, what
datatype, where where where"
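A small sketch of checking all three attributes at once (using a random tensor as a stand-in):
some_tensor = torch.rand(3, 4)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}") # torch.float32 by default
print(f"Device tensor is stored on: {some_tensor.device}") # CPU by default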
Basic operations
Let's start with a few of the fundamental operations: addition (+), subtraction (-), multiplication (*).
They work just as you think they would.
In [23]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10
Notice how the output above is tensor([11, 12, 13]) but the values inside tensor itself didn't change, because
operations like this return a new tensor; the original only changes if it's reassigned.
In [25]:
# Tensors don't change unless reassigned
tensor
PyTorch also has a bunch of built-in functions like torch.mul() (short for multiplication) and torch.add() to perform
basic operations.
However, it's more common to use the operator symbols like * instead of torch.mul()
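For example, a quick sketch showing both give the same result:
tensor = torch.tensor([1, 2, 3])
print(torch.multiply(tensor, 10)) # tensor([10, 20, 30])
print(tensor * 10) # tensor([10, 20, 30]), same result with the operator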
In [30]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)
In [31]:
import torch
tensor = torch.tensor([1, 2, 3])
tensor.shape
Out [31]: torch.Size([3])
The difference between element-wise multiplication and matrix multiplication is that matrix multiplication adds up the products of the values.
For our tensor variable with values [1, 2, 3]:
Element-wise multiplication: [1*1, 2*2, 3*3] = [1, 4, 9]
Matrix multiplication: 1*1 + 2*2 + 3*3 = 14
In [33]:
# Matrix multiplication
torch.matmul(tensor, tensor)
Out [33]: tensor(14)
In [34]:
# Can also use the "@" symbol for matrix multiplication, though not recommended
tensor @ tensor
Out [34]: tensor(14)
In [35]:
%%time
# Matrix multiplication by hand
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
    value += tensor[i] * tensor[i]
value

In [36]:
%%time
# The torch version is far faster
torch.matmul(tensor, tensor)
In [37]:
# Shapes need to be in the right way
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)
tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)
torch.matmul(tensor_A, tensor_B) # (this will error)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)
We can make matrix multiplication work between tensor_A and tensor_B by making their inner dimensions match.
One of the ways to do this is with a transpose (switch the dimensions of a given tensor).
You can perform transposes in PyTorch using either:
torch.transpose(input, dim0, dim1) - where input is the desired tensor to transpose and dim0 and dim1 are
the dimensions to be swapped.
tensor.T - where tensor is the desired tensor to transpose.
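A quick sketch showing both approaches give the same result (reusing the tensor_B values from above):
tensor_B = torch.tensor([[7., 10.],
                         [8., 11.],
                         [9., 12.]])
print(torch.transpose(tensor_B, 0, 1).shape) # torch.Size([2, 3])
print(tensor_B.T.shape) # torch.Size([2, 3]), same as above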
tensor_A (unchanged):
tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

tensor_B:
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])

tensor_B.T (transposed):
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])
In [41]:
# torch.mm is a shortcut for matmul
torch.mm(tensor_A, tensor_B.T)
Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like the one above.
How about a visual?
You can create your own matrix multiplication visuals like this at http://matrixmultiplication.xyz/.
Note: A matrix multiplication like this is also referred to as the dot product of two matrices.
The torch.nn.Linear() layer (used below) implements a matrix multiplication between an input x and a weights matrix A:
y = x · Aᵀ + b
Where:
x is the input to the layer (deep learning is a stack of layers like torch.nn.Linear() and others on top of each
other).
A is the weights matrix created by the layer, this starts out as random numbers that get adjusted as a neural
network learns to better represent patterns in the data (notice the "T", that's because the weights matrix gets
transposed).
Note: You might also often see W or another letter like X used to showcase the weights matrix.
b is the bias term used to slightly offset the weights and inputs.
y is the output (a manipulation of the input in the hopes to discover patterns in it).
This is a linear function (you may have seen something like y = mx + b in high school or elsewhere), and can be
used to draw a straight line!
Let's play around with a linear layer.
Try changing the values of in_features and out_features below and see what happens.
Do you notice anything to do with the shapes?
In [42]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input
                         out_features=6) # out_features = describes outer value
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")
Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
Question: What happens if you change in_features from 2 to 3 above? Does it error? How could you
change the shape of the input (x) to accommodate the error? Hint: what did we have to do to tensor_B
above?
If you've never done it before, matrix multiplication can be a confusing topic at first.
But after you've played around with it a few times and even cracked open a few neural networks, you'll notice it's
everywhere.
Remember, matrix multiplication is all you need.
[Image: "matrix multiplication is all you need". When you start digging into neural network layers and building your own, you'll find matrix multiplications everywhere. Source: https://marksaroufim.substack.com/p/working-class-deep-learner]
Finding the min, max, mean and sum (aggregation)
In [43]:
# Create a tensor to aggregate
x = torch.arange(0, 100, 10)
x
Out [43]: tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
In [44]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")
Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450
Note: You may find some methods such as torch.mean() require tensors to be in torch.float32 (the most
common) or another specific datatype, otherwise the operation will fail.
You can also do the same as above with torch methods.
Positional min/max
You can also find the index of a tensor where the max or minimum occurs with torch.argmax() and torch.argmin()
respectively.
This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself
(we'll see this in a later section when using the softmax activation function).
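For example, a small sketch (using a tensor of values 10 through 90 to match the output below):
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")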
Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
Change tensor datatype
In [47]:
# Create a tensor (float32 by default)
tensor = torch.arange(10., 100., 10.)
tensor.dtype
Out [47]: torch.float32
Now we'll create another tensor the same as before but change its datatype to torch.float16.
In [48]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16
Out [48]: tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)
In [49]:
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8
Out [49]: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)
Note: Different datatypes can be confusing to begin with. But think of it like this: the lower the number
(e.g. 32, 16, 8), the less precisely a computer stores the value. And with a lower amount of storage, this
generally results in faster computation and a smaller overall model. Mobile-based neural networks often
operate with 8-bit integers, smaller and faster to run but less accurate than their float32 counterparts. For
more on this, I'd read up about precision in computing.
Exercise: So far we've covered a fair few tensor methods but there's a bunch more in the torch.Tensor
documentation, I'd recommend spending 10-minutes scrolling through and looking into any that catch
your eye. Click on them and then write them out in code yourself to see what happens.
In [50]:
# Create a tensor
import torch
x = torch.arange(1., 8.)
x, x.shape
Out [50]: (tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))
In [51]:
# Add an extra dimension with torch.reshape()
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape
Out [51]: (tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))
In [52]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(1, 7)
z, z.shape
Out [52]: (tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))
Remember though, changing the view of a tensor with torch.view() really only creates a new view of the same
tensor.
So changing the view changes the original tensor too.
In [53]:
# Changing z changes x
z[:, 0] = 5
z, x
Out [53]: (tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))
If we wanted to stack our new tensor on top of itself four times, we could do so with torch.stack().
In [54]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # try changing dim to dim=1 and see what happens
x_stacked
You can remove all single dimensions from a tensor with torch.squeeze().
In [55]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")
# Remove the extra dimension from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"New tensor: {x_squeezed}, new shape: {x_squeezed.shape}")
And to do the reverse of torch.squeeze() you can use torch.unsqueeze() to add a dimension value of 1 at a specific
index.
You can also rearrange the order of axes values with torch.permute(input, dims), where the input gets turned into
a view with new dims.
Note: Because permuting returns a view (shares the same data as the original), the values in the permuted
tensor will be the same as the original tensor and if you change the values in the view, it will change the
values of the original.
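A small sketch of both (the variable names here are just for illustration):
x_squeezed = torch.arange(1., 8.) # shape [7]
x_unsqueezed = x_squeezed.unsqueeze(dim=0) # adds a dimension at index 0
print(x_unsqueezed.shape) # torch.Size([1, 7])
x_original = torch.rand(size=(224, 224, 3)) # [height, width, colour_channels]
x_permuted = x_original.permute(2, 0, 1) # shifts axis order to [colour_channels, height, width]
print(x_permuted.shape) # torch.Size([3, 224, 224])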
In [58]:
# Create a tensor
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape
Indexing values goes outer dimension -> inner dimension (check out the square brackets).
You can also use : to specify "all values in this dimension" and then use a comma (,) to add another dimension.
In [60]: # Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]
Out [60]: tensor([[1, 2, 3]])
In [61]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]
Out [61]: tensor([[2, 5, 8]])
In [62]:
# Get all values of the 0th dimension but only the index 1 value of the 1st and 2nd dimensions
x[:, 1, 1]
Out [62]: tensor([5])
In [63]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]
Indexing can be quite confusing to begin with, especially with larger tensors (I still have to try indexing multiple
times to get it right). But with a bit of practice and following the data explorer's motto (visualize, visualize, visualize),
you'll start to get the hang of it.
In [64]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor
Note: By default, NumPy arrays are created with the datatype float64 and if you convert it to a PyTorch
tensor, it'll keep the same datatype (as above).
However, many PyTorch calculations default to using float32.
So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor
(float32), you can use tensor = torch.from_numpy(array).type(torch.float32).
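A quick sketch of that conversion chain:
import numpy as np
import torch
array = np.arange(1.0, 8.0) # dtype float64 by default
tensor = torch.from_numpy(array).type(torch.float32) # float64 -> float32
print(array.dtype, tensor.dtype) # float64 torch.float32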
Because array = array + 1 below creates a new array and reassigns the name (rather than changing the original in place), the tensor made from the original array stays the same.
In [65]:
# Change the array, keep the tensor
array = array + 1
array, tensor
And if you want to go from PyTorch tensor to NumPy array, you can call tensor.numpy().
In [66]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor
Out [66]: (tensor([1., 1., 1., 1., 1., 1., 1.]), array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
And the same rule applies as above, if you change the original tensor, the new numpy_tensor stays the same.
In [67]:
# Change the tensor, keep the array the same
tensor = tensor + 1
tensor, numpy_tensor
Reproducibility
Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.
Why?
So you can perform repeatable experiments.
For example, you create an algorithm capable of achieving X performance.
And then your friend tries it out to verify you're not crazy.
How could they do such a thing?
That's where reproducibility comes in.
In other words, can you get the same (or very similar) results on your computer running the same code as I get on
mine?
Let's see a brief example of reproducibility in PyTorch.
We'll start by creating two random tensors, since they're random, you'd expect them to be different right?
In [68]:
import torch
# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B
Tensor A:
tensor([[0.8016, 0.3649, 0.6286, 0.9663],
[0.7687, 0.4566, 0.5745, 0.9200],
[0.3230, 0.8613, 0.0919, 0.3102]])
Tensor B:
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
[0.3743, 0.5220, 0.1336, 0.9666],
[0.9754, 0.8474, 0.8988, 0.1105]])
Just as you might've expected, the tensors come out with different values.
But what if you wanted to create two random tensors with the same values?
As in, the tensors would still contain random values but they would be of the same flavour.
That's where torch.manual_seed(seed) comes in, where seed is an integer (like 42 but it could be anything) that
flavours the randomness.
Let's try it out by creating some more flavoured random tensors.
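Here's a sketch of the setup (assuming a seed of 42 to match the values below, with the seed reset before the second torch.rand() call):
import torch
torch.manual_seed(42)
random_tensor_C = torch.rand(3, 4)
torch.manual_seed(42) # without resetting the seed, random_tensor_D would come out different
random_tensor_D = torch.rand(3, 4)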
print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D
Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
[0.3904, 0.6009, 0.2566, 0.7936],
[0.9408, 0.1332, 0.9346, 0.5936]])
Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
[0.3904, 0.6009, 0.2566, 0.7936],
[0.9408, 0.1332, 0.9346, 0.5936]])
Nice!
It looks like setting the seed worked.
Resource: What we've just covered only scratches the surface of reproducibility in PyTorch. For more on reproducibility in general and random seeds, I'd check out:
The PyTorch reproducibility documentation (a good exercise would be to read through this for 10-minutes and even if you don't understand it now, being aware of it is important).
The Wikipedia random seed page (this'll give a good overview of random seeds and
pseudorandomness in general).
Running tensors on GPU
1. Getting a GPU
You may already know what's going on when I say GPU. But if not, there are a few ways to get access to one.
Google Colab (difficulty to setup: easy). Pros: free to use, almost zero setup required, can share work with others as easy as a link. Cons: doesn't save your data outputs, limited compute, subject to timeouts. How to setup: follow the Google Colab Guide.
Use your own (difficulty to setup: medium). Pros: run everything locally on your own machine. Cons: GPUs aren't free, require upfront cost. How to setup: follow the PyTorch installation guidelines.
Cloud computing such as AWS, GCP or Azure (difficulty to setup: medium-hard). Pros: small upfront cost, access to almost infinite compute. Cons: can get expensive if running continually, takes some time to setup right. How to setup: follow the PyTorch installation guidelines.
There are more options for using GPUs but the above three will suffice for now.
Personally, I use a combination of Google Colab and my own personal computer for small scale experiments (and
creating this course) and go to cloud resources when I need more compute power.
Resource: If you're looking to purchase a GPU of your own but not sure what to get, Tim Dettmers has an
excellent guide.
To check if you've got access to a Nvidia GPU, you can run !nvidia-smi where the ! (also called bang) means "run
this on the command line".
In [70]:
!nvidia-smi
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1061 G /usr/lib/xorg/Xorg 53MiB |
| 0 N/A N/A 2671131 G /usr/lib/xorg/Xorg 97MiB |
| 0 N/A N/A 2671256 G /usr/bin/gnome-shell 9MiB |
+-----------------------------------------------------------------------------+
If you don't have a Nvidia GPU accessible, the above will output something like:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
You can check whether PyTorch has access to a GPU using torch.cuda.is_available().
In [71]:
# Check for GPU access with PyTorch
torch.cuda.is_available()
Out [71]: True
If the above outputs True, PyTorch can see and use the GPU, if it outputs False, it can't see the GPU and in that case,
you'll have to go back through the installation steps.
Now, let's say you wanted to setup your code so it ran on CPU or the GPU if it was available.
That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using.
Let's create a device variable to store what kind of device is available.
In [72]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device
Out [72]: 'cuda'
If the above output "cuda" it means we can set all of our PyTorch code to use the available CUDA device (a GPU)
and if it output "cpu", our PyTorch code will stick with the CPU.
Note: In PyTorch, it's best practice to write device agnostic code. This means code that'll run on CPU
(always available) or GPU (if available).
If you want to do faster computing you can use a GPU but if you want to do much faster computing, you can use
multiple GPUs.
You can count the number of GPUs PyTorch has access to using torch.cuda.device_count().
In [73]:
# Count number of devices
torch.cuda.device_count()
Out [73]: 1
Knowing the number of GPUs PyTorch has access to is helpful in case you wanted to run a specific process on one
GPU and another process on another (PyTorch also has features to let you run a process across all GPUs).
In [4]:
# Check for Apple Silicon GPU
import torch
torch.backends.mps.is_available() # Note: this will return False if you're not running on a Mac with an Apple Silicon chip
Out [4]: True
In [7]:
# Set device type
device = "mps" if torch.backends.mps.is_available() else "cpu"
device
As before, if the above output "mps" it means we can set all of our PyTorch code to use the available Apple Silicon
GPU.
In [8]:
if torch.cuda.is_available():
    device = "cuda" # Use NVIDIA GPU (if available)
elif torch.backends.mps.is_available():
    device = "mps" # Use Apple Silicon GPU (if available)
else:
    device = "cpu" # Default to CPU if no GPU is available
Why do this?
GPUs offer far faster numerical computing than CPUs do and if a GPU isn't available, because of our device
agnostic code (see above), it'll run on the CPU.
Note: Putting a tensor on GPU using to(device) (e.g. some_tensor.to(device)) returns a copy of that
tensor, e.g. the same tensor will be on CPU and GPU. To overwrite tensors, reassign them:
some_tensor = some_tensor.to(device)
Let's try creating a tensor and putting it on the GPU (if it's available).
In [9]:
# Create tensor (default on CPU) and move it to the GPU (if available)
tensor = torch.tensor([1, 2, 3])
print(tensor, tensor.device)
tensor_on_gpu = tensor.to(device)
print(tensor_on_gpu)
If you have a GPU available, the above code will output something like:
tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0')
Notice the second tensor has device='cuda:0', this means it's stored on the 0th GPU available (GPUs are 0 indexed,
if two GPUs were available, they'd be 'cuda:0' and 'cuda:1' respectively, up to 'cuda:n').
In [75]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Instead, to get a tensor back to CPU and usable with NumPy we can use Tensor.cpu().
This copies the tensor to CPU memory so it's usable with CPUs.
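For example (a sketch using the tensor_on_gpu from above, the variable name is just for illustration):
# Copy the tensor back to the CPU so NumPy can work with it
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu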
The above returns a copy of the GPU tensor in CPU memory so the original tensor is still on GPU.
In [77]:
tensor_on_gpu
Exercises
All of the exercises are focused on practicing the code above.
You should be able to complete them by referencing each section or by following the resource(s) linked.
Resources:
1. Documentation reading - A big part of deep learning (and learning to code in general) is getting familiar with
the documentation of a certain framework you're using. We'll be using the PyTorch documentation a lot
throughout the rest of this course. So I'd recommend spending 10-minutes reading the following (it's okay if
you don't get some things for now, the focus is not yet full understanding, it's awareness). See the
documentation on torch.Tensor and torch.cuda.
2. Create a random tensor with shape (7, 7).
3. Perform a matrix multiplication on the tensor from 2 with another random tensor with shape (1, 7) (hint: you
may have to transpose the second tensor).
4. Set the random seed to 0 and do exercises 2 & 3 over again.
5. Speaking of random seeds, we saw how to set it with torch.manual_seed() but is there a GPU equivalent? (hint:
you'll need to look into the documentation for torch.cuda for this one). If there is, set the GPU random seed to
1234.
6. Create two random tensors of shape (6, 4) and send them both to the GPU (you'll need access to a GPU for
this). Set torch.manual_seed(1234) when creating the tensors (this doesn't have to be the GPU random seed).
7. Perform a matrix multiplication on the tensors you created in 6 (again, you may have to adjust the shapes of
one of the tensors).
8. Find the maximum and minimum values of the output of 7.
9. Find the maximum and minimum index values of the output of 7.
10. Make a random tensor with shape (1, 1, 1, 10) and then create a new tensor with all the 1 dimensions
removed to be left with a tensor of shape (10). Set the seed to 7 when you create it and print out the first tensor
and its shape as well as the second tensor and its shape.