00. PyTorch Fundamentals
Importing PyTorch
Note: Before running any of the code in this notebook, you should have gone through the PyTorch setup steps.
However, if you're running on Google Colab, everything should work (Google Colab comes with PyTorch and other libraries
installed).
Let's start by importing PyTorch and checking the version we're using.
For example, Andrej Karpathy (head of AI at Tesla) has given several talks (PyTorch DevCon 2019, Tesla AI Day 2021) about how Tesla uses
PyTorch to power their self-driving computer vision models.
PyTorch is also used in other industries such as agriculture to power computer vision on tractors.
PyTorch also helps take care of many things such as GPU acceleration (making your code run faster) behind the scenes.
So you can focus on manipulating data and writing algorithms and PyTorch will make sure it runs fast.
And if companies such as Tesla and Meta (Facebook) use it to build models they deploy to power hundreds of applications, drive thousands of
cars and deliver content to billions of people, it's clearly capable on the development front too.
This course is broken down into different sections (notebooks).
Subsequent notebooks build upon knowledge from the previous one (numbering starts at 00, 01, 02 and goes to whatever it ends up going to).
This notebook deals with the basic building block of machine learning and deep learning, the tensor.
| Topic | Contents |
| --- | --- |
| Introduction to tensors | Tensors are the basic building block of all of machine learning and deep learning. |
| Creating tensors | Tensors can represent almost any kind of data (images, words, tables of numbers). |
| Getting information from tensors | If you can put information into a tensor, you'll want to get it out too. |
| Manipulating tensors | Machine learning algorithms (like neural networks) involve manipulating tensors in many different ways such as adding, multiplying, combining. |
| Dealing with tensor shapes | One of the most common issues in machine learning is dealing with shape mismatches (trying to mix wrong shaped tensors with other tensors). |
| Indexing on tensors | If you've indexed on a Python list or NumPy array, it's very similar with tensors, except they can have far more dimensions. |
| Mixing PyTorch tensors and NumPy | PyTorch plays with tensors ( torch.Tensor ), NumPy likes arrays ( np.ndarray ), sometimes you'll want to mix and match these. |
| Reproducibility | Machine learning is very experimental and since it uses a lot of randomness to work, sometimes you'll want that randomness to not be so random. |
| Running tensors on GPU | GPUs (Graphics Processing Units) make your code faster, PyTorch makes it easy to run your code on GPUs. |
And if you run into trouble, you can ask a question on the Discussions page there too.
There's also the PyTorch developer forums, a very helpful place for all things PyTorch.
import torch
torch.__version__
'2.4.1+cu121'
This means if you're going through these materials, you'll see most compatibility with PyTorch 1.10.0+, however if your version number is far
higher than that, you might notice some inconsistencies.
And if you do have any issues, please post on the course GitHub Discussions page.
For example, you could represent an image as a tensor with shape [3, 224, 224] which would mean [colour_channels, height, width] , as
in the image has 3 colour channels (red, green, blue), a height of 224 pixels and a width of 224 pixels.
In tensor-speak (the language used to describe tensors), the tensor would have three dimensions, one for colour_channels , height and
width .
Your first piece of homework is to read through the documentation on torch.Tensor for 10-minutes. But you can get to that later.
Let's code.
Note: That's a trend for this course. We'll focus on writing specific code. But often I'll set exercises which involve reading and
getting familiar with the PyTorch documentation. Because after all, once you're finished this course, you'll no doubt want to learn
more. And the documentation is somewhere you'll be finding yourself quite often.
# Scalar
scalar = torch.tensor(7)
scalar
tensor(7)
scalar.ndim
# Get the Python number within a tensor (only works with one-element tensors)
scalar.item()
As in, you could have a vector [3, 2] to describe [bedrooms, bathrooms] in your house. Or you could have [3, 2, 2] to describe [bedrooms,
bathrooms, car_parks] in your house.
The important trend here is that a vector is flexible in what it can represent (the same with tensors).
# Vector
vector = torch.tensor([7, 7])
vector
tensor([7, 7])
Hmm, that's strange, vector contains two numbers but only has a single dimension.
You can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside ( [ ) and you only need to count
one side.
Another important concept for tensors is their shape attribute. The shape tells you how the elements inside them are arranged.
# Check the shape of vector
vector.shape
torch.Size([2])
The above returns torch.Size([2]) which means our vector has a shape of [2] . This is because of the two elements we placed inside the
square brackets ( [7, 7] ).
# Matrix
MATRIX = torch.tensor([[7, 8],
[9, 10]])
MATRIX
tensor([[ 7, 8],
[ 9, 10]])
Wow! More numbers! Matrices are as flexible as vectors, except they've got an extra dimension.
MATRIX has two dimensions (did you count the number of square brackets on the outside of one side?).
MATRIX.shape
torch.Size([2, 2])
We get the output torch.Size([2, 2]) because MATRIX is two elements deep and two elements wide.
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
[3, 6, 9],
[2, 4, 5]]])
TENSOR
tensor([[[1, 2, 3],
[3, 6, 9],
[2, 4, 5]]])
The one we just created could be the sales numbers for a steak and almond butter store (two of my favourite foods).
How many dimensions do you think it has? (hint: use the square bracket counting trick)
# Check the number of dimensions for TENSOR
TENSOR.ndim
3
# Check the shape of TENSOR
TENSOR.shape
torch.Size([1, 3, 3])
Note: You might've noticed me using lowercase letters for scalar and vector and uppercase letters for MATRIX and TENSOR .
This was on purpose. In practice, you'll often see scalars and vectors denoted as lowercase letters such as y or a . And matrices
and tensors denoted as uppercase letters such as X or W .
You also might notice the names matrix and tensor used interchangeably. This is common, since in PyTorch you're often dealing
with torch.Tensor s (hence the tensor name); however, the shape and dimensions of what's inside will dictate what it actually is.
Let's summarise.
| Name | What is it? | Number of dimensions | Lower or upper (usually/example) |
| --- | --- | --- | --- |
| scalar | a single number | 0 | Lower (a) |
| vector | a number with direction (e.g. wind speed with direction) but can also have many other numbers | 1 | Lower (y) |
| matrix | a 2-dimensional array of numbers | 2 | Upper (Q) |
| tensor | an n-dimensional array of numbers | can be any number, a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector | Upper (X) |
And machine learning models such as neural networks manipulate and seek patterns within tensors.
But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've been doing).
Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works
through data to better represent it.
In essence:
Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...
As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates
(optimization) its random numbers.
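For example, creating a random tensor looks something like this (the size here is my own choice):
# Create a random tensor of size (3, 4) with values sampled uniformly from [0, 1)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype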
The flexibility of torch.rand() is that we can adjust the size to be whatever we want.
For example, say you wanted a random tensor in the common image shape of [224, 224, 3] ( [height, width, color_channels ]).
# Create a random tensor of size (224, 224, 3)
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim
Sometimes you'll just want to fill tensors with zeros or ones. This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).
We can do the same to create a tensor of all ones except using torch.ones() instead.
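As a quick sketch (the shape here is arbitrary):
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
zeros, ones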
Sometimes you might want a range of numbers, such as 0 to 10 or 0 to 100. You can use torch.arange(start, end, step) to do so, where: start = the start of the range (e.g. 0), end = the end of the range (e.g. 10, exclusive), step = the gap between each value (e.g. 1).
Note: In Python, you can use range() to create a range. However in PyTorch, torch.range() is deprecated and may show an
error in the future.
/tmp/ipykernel_3695928/193451495.py:2: UserWarning: torch.range is deprecated and will be removed in a future release because its be
zero_to_ten_deprecated = torch.range(0, 10) # Note: this may return an error in the future
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
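The replacement is torch.arange(); a minimal sketch (the variable name is my own):
# Use torch.arange() instead, the end value is exclusive
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten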
Sometimes you might want one tensor of a certain type with the same shape as another tensor.
For example, a tensor of all zeros with the same shape as a previous tensor.
To do so you can use torch.zeros_like(input) or torch.ones_like(input) which return a tensor filled with zeros or ones in the same shape
as the input respectively.
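For example, a sketch using the zero_to_ten tensor from the arange sketch above:
# Create a tensor of all zeros with the same shape as zero_to_ten
torch.zeros_like(input=zero_to_ten)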
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
There are many different tensor datatypes available in PyTorch. Some are specific for CPU and some are better for GPU.
Generally if you see torch.cuda anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing toolkit called CUDA).
The most common type (and generally the default) is torch.float32 or torch.float .
But there's also 16-bit floating point ( torch.float16 or torch.half ) and 64-bit floating point ( torch.float64 or torch.double ).
And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.
Plus more!
Note: An integer is a whole number like 7, whereas a float has a decimal, like 7.0.
The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.
This matters in deep learning and numerical computing because you're performing so many operations: the more detail you have to calculate with,
the more compute you have to use.
So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to
compute but less accurate).
Resources:
See the PyTorch documentation for a list of all available tensor datatypes.
Read the Wikipedia page for an overview of what precision in computing is.
Let's see how to create some tensors with specific datatypes. We can do so using the dtype parameter.
Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and
device issues.
For example, one of your tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be the same format).
Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).
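Here's a sketch of the dtype parameter in action (the example values are my own):
# Default datatype for floating point tensors is torch.float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None,          # defaults to None, which is torch.float32
                               device=None,         # defaults to None, which uses the default device (CPU)
                               requires_grad=False) # if True, operations performed on the tensor are recorded
# Create a tensor with an explicit 16-bit floating point datatype
float_16_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)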
float_16_tensor.dtype
torch.float16
We've seen these before but three of the most common attributes you'll want to find out about tensors are:
shape - what shape is the tensor? (some operations require specific shape rules)
dtype - what datatype are the elements within the tensor stored in?
device - what device is the tensor stored on? (usually GPU or CPU)
Let's create a random tensor and find out details about it.
# Create a tensor
some_tensor = torch.rand(3, 4)
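A minimal sketch printing those three attributes:
# Find out details about the tensor
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}")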
Note: When you run into issues in PyTorch, it's very often one to do with one of the three attributes above. So when the error
messages show up, sing yourself a little song called "what, what, where":
"what shape are my tensors? what datatype are they and where are they stored? what shape, what datatype, where where
where"
A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a
representation of the patterns in the input data.
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication
And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.
Stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like lego!).
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10
tensor([11, 12, 13])
# Multiply it by 10
tensor * 10
tensor([10, 20, 30])
Notice how the tensor values above didn't end up being tensor([110, 120, 130]), this is because the values inside the tensor don't change unless they're reassigned.
# Tensors don't change unless reassigned
tensor
tensor([1, 2, 3])
Let's subtract a number and this time we'll reassign the tensor variable.
# Subtract and reassign
tensor = tensor - 10
tensor
tensor([-9, -8, -7])
# Add and reassign
tensor = tensor + 10
tensor
tensor([1, 2, 3])
PyTorch also has a bunch of built-in functions like torch.mul() (short for multiplication) and torch.add() to perform basic operations.
However, it's more common to use the operator symbols like * instead of torch.mul()
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)
Resource: You can see all of the rules for matrix multiplication using torch.matmul() in the PyTorch documentation.
Let's create a tensor and perform element-wise multiplication and matrix multiplication on it.
import torch
tensor = torch.tensor([1, 2, 3])
tensor.shape
torch.Size([3])
The difference between element-wise multiplication and matrix multiplication is the addition of values.
# Element-wise matrix multiplication
tensor * tensor
tensor([1, 4, 9])
# Matrix multiplication
torch.matmul(tensor, tensor)
tensor(14)
# Can also use the "@" symbol for matrix multiplication, though not recommended
tensor @ tensor
tensor(14)
%%time
# Matrix multiplication by hand
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
value += tensor[i] * tensor[i]
value
%%time
torch.matmul(tensor, tensor)
One of the most common errors in deep learning (shape errors)
Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and
sizes can be combined, one of the most common errors you'll run into in deep learning is shape mismatches.
# Shapes need to be in the right way
tensor_A = torch.tensor([[1, 2],
[3, 4],
[5, 6]], dtype=torch.float32)
tensor_B = torch.tensor([[7, 10],
[8, 11],
[9, 12]], dtype=torch.float32)
torch.matmul(tensor_A, tensor_B) # (this will error)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)
We can make matrix multiplication work between tensor_A and tensor_B by making their inner dimensions match.
One of the ways to do this is with a transpose (switch the dimensions of a given tensor).
You can perform transposes in PyTorch using either:
torch.transpose(input, dim0, dim1) - where input is the desired tensor to transpose and dim0 and dim1 are the dimensions to be swapped.
tensor.T - where tensor is the desired tensor to transpose.
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)
tensor([[1., 2.],
[3., 4.],
[5., 6.]])
tensor([[ 7., 10.],
[ 8., 11.],
[ 9., 12.]])
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)
tensor([[1., 2.],
[3., 4.],
[5., 6.]])
tensor([[ 7., 8., 9.],
[10., 11., 12.]])
New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])
Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match
Output:
tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])
Output shape: torch.Size([3, 3])
Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like above.
You can create your own matrix multiplication visuals like this at http://matrixmultiplication.xyz/.
Note: A matrix multiplication like this is also referred to as the dot product of two matrices.
The torch.nn.Linear() module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a
matrix multiplication between an input x and a weights matrix A .
$y = x \cdot A^T + b$
Where:
x is the input to the layer (deep learning is a stack of layers like torch.nn.Linear() and others on top of each other).
A is the weights matrix created by the layer, this starts out as random numbers that get adjusted as a neural network learns to better
represent patterns in the data (notice the " T ", that's because the weights matrix gets transposed).
Note: You might also often see W or another letter like X used to showcase the weights matrix.
b is the bias term used to slightly offset the weights and inputs.
y is the output (a manipulation of the input in the hopes to discover patterns in it).
This is a linear function (you may have seen something like y = mx + b in high school or elsewhere), and can be used to draw a straight line!
Let's play around with a linear layer.
Try changing the values of in_features and out_features below and see what happens.
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input
out_features=6) # out_features = describes outer value
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")
Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
[4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
[6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
grad_fn=<AddmmBackward0>)
Question: What happens if you change in_features from 2 to 3 above? Does it error? How could you change the shape of the
input ( x ) to accommodate to the error? Hint: what did we have to do to tensor_B above?
If you've never done it before, matrix multiplication can be a confusing topic at first.
But after you've played around with it a few times and even cracked open a few neural networks, you'll notice it's everywhere.
When you start digging into neural network layers and building your own, you'll find matrix multiplications everywhere. Source:
https://marksaroufim.substack.com/p/working-class-deep-learner
First we'll create a tensor and then find the max, min, mean and sum of it.
# Create a tensor
x = torch.arange(0, 100, 10)
x
tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")
Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450
Note: You may find some methods such as torch.mean() require tensors to be in torch.float32 (the most common) or another
specific datatype, otherwise the operation will fail.
This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later
section when using the softmax activation function).
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")
Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
If one tensor is in torch.float64 and another is in torch.float32 , you might run into some errors.
You can change the datatypes of tensors using torch.Tensor.type(dtype=None) where the dtype parameter is the datatype you'd like to use.
First we'll create a tensor and check its datatype (the default for floating point values is torch.float32 ).
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype
torch.float32
Now we'll create another tensor the same as before but change its datatype to torch.float16 .
# Create a float16 tensor from the float32 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16
tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)
And we can do something similar for an 8-bit integer tensor.
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8
tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)
Note: Different datatypes can be confusing to begin with. But think of it like this, the lower the number (e.g. 32, 16, 8), the less
precise a computer stores the value. And with a lower amount of storage, this generally results in faster computation and a
smaller overall model. Mobile-based neural networks often operate with 8-bit integers, smaller and faster to run but less accurate
than their float32 counterparts. For more on this, I'd read up about precision in computing.
Exercise: So far we've covered a fair few tensor methods but there's a bunch more in the torch.Tensor documentation, I'd
recommend spending 10-minutes scrolling through and looking into any that catch your eye. Click on them and then write them
out in code yourself to see what happens.
| Method | One-line description |
| --- | --- |
| torch.reshape(input, shape) | Reshapes input to shape (if compatible), can also use torch.Tensor.reshape(). |
| Tensor.view(shape) | Returns a view of the original tensor in a different shape but shares the same data as the original tensor. |
| torch.stack(tensors, dim=0) | Concatenates a sequence of tensors along a new dimension ( dim ), all tensors must be the same size. |
| torch.permute(input, dims) | Returns a view of the original input with its dimensions permuted (rearranged) to dims. |
Why use any of these? Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix
multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are
mixing with the right elements of other tensors.
Let's try them out.
# Create a tensor
import torch
x = torch.arange(1., 8.)
x, x.shape
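For example, a sketch adding an extra dimension with reshape (the variable name x_reshaped is my own):
# Add an extra dimension (the target shape must be compatible with the original number of elements)
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape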
Remember though, changing the view of a tensor with torch.Tensor.view() really only creates a new view of the same tensor: the view shares the same underlying data as the original, so changing the view changes the original too.
# Create a view of x with shape (1, 7)
z = x.view(1, 7)
z, z.shape
# Changing z changes x (they share the same memory)
z[:, 0] = 5
z, x
(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))
If we wanted to stack our new tensor on top of itself five times, we could do so with torch.stack() .
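A sketch of what that could look like (dim=0 stacks along a new first dimension; try dim=1 to see the difference):
# Stack x on top of itself five times
x_stacked = torch.stack([x, x, x, x, x], dim=0)
x_stacked, x_stacked.shape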
To do so you can use torch.squeeze() (I remember this as squeezing the tensor to only have dimensions over 1).
And to do the reverse of torch.squeeze() you can use torch.unsqueeze() to add a dimension value of 1 at a specific index.
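Here's a sketch of both, assuming x_reshaped (shape [1, 7]) from the reshape sketch above:
# Remove all dimensions of size 1 from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")
print(f"\nSqueezed tensor: {x_squeezed}")
print(f"Squeezed shape: {x_squeezed.shape}")
# Add a dimension of size 1 back at index 0
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nPrevious tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")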
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")
You can also rearrange the order of axes values with torch.permute(input, dims) , where the input gets turned into a view with new dims .
Note: Because permuting returns a view (shares the same data as the original), the values in the permuted tensor will be the
same as the original tensor and if you change the values in the view, it will change the values of the original.
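A common sketch is rearranging an image-style tensor from [height, width, colour_channels] to [colour_channels, height, width]:
# Create a tensor with a specific shape and permute (rearrange) its axis order
x_original = torch.rand(size=(224, 224, 3))  # [height, width, colour_channels]
x_permuted = x_original.permute(2, 0, 1)     # shifts axis 2 -> 0, 0 -> 1, 1 -> 2
print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")      # [colour_channels, height, width]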
If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.
# Create a tensor
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape
(tensor([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]]),
torch.Size([1, 3, 3]))
Indexing values goes outer dimension -> inner dimension (check out the square brackets).
You can also use : to specify "all values in this dimension" and then use a comma ( , ) to add another dimension.
# Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]
tensor([[1, 2, 3]])
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]
tensor([[2, 5, 8]])
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]
tensor([5])
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]
tensor([1, 2, 3])
Indexing can be quite confusing to begin with, especially with larger tensors (I still have to try indexing multiple times to get it right). But with a
bit of practice and following the data explorer's motto (visualize, visualize, visualize), you'll start to get the hang of it.
The two main methods you'll want to use for NumPy to PyTorch (and back again) are:
torch.from_numpy(ndarray) - NumPy array -> PyTorch tensor.
torch.Tensor.numpy() - PyTorch tensor -> NumPy array.
Note: By default, NumPy arrays are created with the datatype float64 and if you convert it to a PyTorch tensor, it'll keep the
same datatype (as above).
So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use tensor
= torch.from_numpy(array).type(torch.float32) .
Because we reassigned tensor above, if you change the tensor, the array stays the same.
And if you want to go from PyTorch tensor to NumPy array, you can call tensor.numpy() .
And the same rule applies as above, if you change the original tensor , the new numpy_tensor stays the same.
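A sketch of the round trip (the array values here are my own):
import numpy as np
import torch
# NumPy array -> PyTorch tensor (keeps the float64 datatype unless you change it)
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array).type(torch.float32)
# PyTorch tensor -> NumPy array
numpy_tensor = tensor.numpy()
array.dtype, tensor.dtype, numpy_tensor.dtype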
Well, pseudorandomness that is. Because after all, as they're designed, a computer is fundamentally deterministic (each step is predictable), so
the randomness they create is simulated randomness (though there is debate on this too, but since I'm not a computer scientist, I'll let you find
out more yourself).
How does this relate to neural networks and deep learning then?
We've discussed neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to
improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.
In short:
start with random numbers -> tensor operations -> try to make better (again and again and again)
Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.
Why? So you can perform repeatable experiments. For example, you create an algorithm capable of achieving some level of performance.
And then your friend tries it out to verify you're not crazy.
In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?
We'll start by creating two random tensors, since they're random, you'd expect them to be different right?
import torch
# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)
print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B
Tensor A:
tensor([[0.8016, 0.3649, 0.6286, 0.9663],
[0.7687, 0.4566, 0.5745, 0.9200],
[0.3230, 0.8613, 0.0919, 0.3102]])
Tensor B:
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
[0.3743, 0.5220, 0.1336, 0.9666],
[0.9754, 0.8474, 0.8988, 0.1105]])
Just as you might've expected, the tensors come out with different values.
But what if you wanted to create two random tensors with the same values.
As in, the tensors would still contain random values but they would be of the same flavour.
That's where torch.manual_seed(seed) comes in, where seed is an integer (like 42 but it could be anything) that flavours the randomness.
import torch
import random
# Set the random seed
RANDOM_SEED = 42
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)
# Have to reset the seed every time a new rand() is called, otherwise tensor_D would be different to tensor_C
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_D = torch.rand(3, 4)
print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D
Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
[0.3904, 0.6009, 0.2566, 0.7936],
[0.9408, 0.1332, 0.9346, 0.5936]])
Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
[0.3904, 0.6009, 0.2566, 0.7936],
[0.9408, 0.1332, 0.9346, 0.5936]])
Nice!
Resource: What we've just covered only scratches the surface of reproducibility in PyTorch. For more on reproducibility in general
and random seeds, I'd check out:
The PyTorch reproducibility documentation (a good exercise would be to read through this for 10-minutes and even if you
don't understand it now, being aware of it is important).
The Wikipedia random seed page (this'll give a good overview of random seeds and pseudorandomness in general).
And by default these operations are often done on a CPU (central processing unit).
However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster at performing the
specific types of operations neural networks need (matrix multiplications) than CPUs.
If you have access to one, you should look to use it whenever you can to train neural networks because chances are it'll speed up the training time dramatically.
There are a few ways to first get access to a GPU and secondly get PyTorch to use the GPU.
Note: When I reference "GPU" throughout this course, I'm referencing a Nvidia GPU with CUDA enabled (CUDA is a computing
platform and API that allows GPUs to be used for general purpose computing and not just graphics) unless otherwise specified.
| Method | Difficulty to setup | Pros | Cons |
| --- | --- | --- | --- |
| Google Colab | Easy | Free to use, almost zero setup required, can share work with others as easy as a link | Doesn't save your data outputs, limited compute, subject to timeouts |
| Use your own | Medium | Run everything locally on your own machine | GPUs aren't free, require upfront cost |
| Cloud computing (AWS, GCP, Azure) | Medium-Hard | Small upfront cost, access to almost infinite compute | Can get expensive if running continually, takes some time to set up right |
There are more options for using GPUs but the above three will suffice for now.
Personally, I use a combination of Google Colab and my own personal computer for small scale experiments (and creating this course) and go
to cloud resources when I need more compute power.
Resource: If you're looking to purchase a GPU of your own but not sure what to get, Tim Dettmers has an excellent guide.
To check if you've got access to a Nvidia GPU, you can run !nvidia-smi where the ! (also called bang) means "run this on the command line".
!nvidia-smi
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1061 G /usr/lib/xorg/Xorg 53MiB |
| 0 N/A N/A 2671131 G /usr/lib/xorg/Xorg 97MiB |
| 0 N/A N/A 2671256 G /usr/bin/gnome-shell 9MiB |
+-----------------------------------------------------------------------------+
If you don't have a Nvidia GPU accessible, the above will output something like:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
If you do have a GPU, the line above will output something like:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
You can test if PyTorch has access to a GPU using torch.cuda.is_available() .
# Check for GPU
import torch
torch.cuda.is_available()
True
If the above outputs True , PyTorch can see and use the GPU, if it outputs False , it can't see the GPU and in that case, you'll have to go back
through the installation steps.
Now, let's say you wanted to setup your code so it ran on CPU or the GPU if it was available.
That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using.
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device
'cuda'
If the above output "cuda" it means we can set all of our PyTorch code to use the available CUDA device (a GPU) and if it output "cpu" , our
PyTorch code will stick with the CPU.
Note: In PyTorch, it's best practice to write device agnostic code. This means code that'll run on CPU (always available) or GPU (if
available).
If you want to do faster computing you can use a GPU but if you want to do much faster computing, you can use multiple GPUs.
You can count the number of GPUs PyTorch has access to using torch.cuda.device_count() .
Knowing the number of GPUs PyTorch has access to is helpful in case you wanted to run a specific process on one GPU and another process on
another (PyTorch also has features to let you run a process across all GPUs).
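For example:
# Count the number of GPUs PyTorch can access
import torch
torch.cuda.device_count()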
If you're using an Apple Silicon Mac, PyTorch can instead use the MPS (Metal Performance Shaders) backend as the GPU device. Be sure that the versions of macOS and PyTorch are up to date.
# Check for Apple Silicon GPU support
torch.backends.mps.is_available()
True
# Set device type to MPS if available
device = "mps" if torch.backends.mps.is_available() else "cpu"
device
'mps'
As before, if the above output "mps" it means we can set all of our PyTorch code to use the available Apple Silicon GPU.
if torch.cuda.is_available():
device = "cuda" # Use NVIDIA GPU (if available)
elif torch.backends.mps.is_available():
device = "mps" # Use Apple Silicon GPU (if available)
else:
device = "cpu" # Default to CPU if no GPU is available
Why do this?
GPUs offer far faster numerical computing than CPUs do and if a GPU isn't available, because of our device agnostic code (see above), it'll run
on the CPU.
Note: Putting a tensor on GPU using to(device) (e.g. some_tensor.to(device) ) returns a copy of that tensor, e.g. the same
tensor will be on CPU and GPU. To overwrite tensors, reassign them:
some_tensor = some_tensor.to(device)
Let's try creating a tensor and putting it on the GPU (if it's available).
# Create a tensor (defaults to being on the CPU)
tensor = torch.tensor([1, 2, 3])
print(tensor, tensor.device)
# Move tensor to the target device (if available), this returns a copy
tensor_on_gpu = tensor.to(device)
tensor_on_gpu
tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='mps:0')
If you have a GPU available, the above code will output something like: