PyTorch: Tensors and Datasets
Tensors:
In PyTorch, neural networks are built from tensors, which generalize numbers and arrays to
any number of dimensions. The weights of a network are stored in tensors, the inputs are also
tensors, and the operations the network performs to obtain its output (also a tensor) are
tensor operations.
Tensors 1D:
A 0D tensor is simply a number; a 1D tensor is an array of numbers.
A tensor contains elements of a single data type. The elements can be real numbers, integers,
etc., and each kind comes in several types: torch.float32, torch.double, torch.half, torch.int…
We can access the elements of a tensor as we would those of a numpy array: a[0], a[1]
a.dtype tells us the type of the data stored in the tensor, while a.type() tells us the type of
the tensor itself (for example, torch.LongTensor).
We can specify the data type in the constructor: a = torch.tensor([1, 3, 5, 4], dtype=torch.int).
We can also use the specific constructor of each tensor type:
a = torch.FloatTensor([1.2, 3.7, 5, 4]).
We can also change the type of a tensor by calling the type() method with the desired type
as its argument: a.type(torch.FloatTensor). The call returns a new tensor and leaves a unchanged.
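The three ways of setting the type described above can be compared in a short sketch. The variable names b, c and d are only illustrative; the values are the ones used in the text.

```python
import torch

# Let torch infer the dtype: Python integers become torch.int64.
a = torch.tensor([1, 3, 5, 4])
print(a.dtype)   # torch.int64
print(a.type())  # torch.LongTensor

# Set the dtype explicitly in the constructor (torch.int is an alias of torch.int32).
b = torch.tensor([1, 3, 5, 4], dtype=torch.int)

# Use the constructor of a specific tensor type.
c = torch.FloatTensor([1.2, 3.7, 5, 4])

# Convert an existing tensor; a itself keeps its original type.
d = a.type(torch.FloatTensor)
print(b.dtype, c.dtype, d.dtype)
```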
The method size() gives us the shape of the tensor: a.size(). For a 1D tensor this is the
number of elements; for a tensor with more dimensions, the first entry is the number of rows.
The method ndimension() returns the rank (number of dimensions) of the tensor:
a.ndimension()
To change the shape of a tensor, e.g., to use it as an input to a network, we can use the
method view(): for a tensor a with 5 elements, a_col = a.view(5, 1) returns a tensor with 5 rows
and 1 column. The total number of elements must stay the same.
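As a minimal sketch of size(), ndimension() and view(), assuming a 1D tensor with five elements (the values are arbitrary). The use of -1 is an addition to the notes: it asks view() to infer that dimension from the number of elements.

```python
import torch

a = torch.tensor([0, 1, 2, 3, 4])
print(a.size())        # torch.Size([5])
print(a.ndimension())  # 1

# Reshape to 5 rows and 1 column; the number of elements must not change.
a_col = a.view(5, 1)
print(a_col.size())        # torch.Size([5, 1])
print(a_col.ndimension())  # 2

# -1 lets view() work out the number of rows by itself.
a_col_inferred = a.view(-1, 1)
```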
To convert a tensor to a numpy array, we can use the method numpy(): back_to_numpy =
torch_tensor.numpy()
Note that the array and the tensor share the same memory, so changing the numpy array also
changes the associated tensor, and vice versa.
We can also create a tensor from a Pandas series through its values attribute: pandas_to_torch =
torch.from_numpy(pandas_series.values)
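The memory sharing between numpy arrays and tensors can be checked with a small sketch. The array values are arbitrary, and numpy_array is an illustrative name.

```python
import numpy as np
import torch

numpy_array = np.array([0.0, 1.0, 2.0])
torch_tensor = torch.from_numpy(numpy_array)  # shares memory with numpy_array
back_to_numpy = torch_tensor.numpy()          # shares memory with torch_tensor

# Modifying the original array is visible through all three names.
numpy_array[0] = 10.0
print(torch_tensor)
print(back_to_numpy)
```

If an independent copy is needed, torch_tensor.clone() can be called before converting.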
We can use the method tolist() to get a Python list from a tensor: torch_to_list =
this_tensor.tolist()
IMPORTANT: The individual elements of a tensor are also tensors. To access the value itself
as a Python number we must use the method item(): new_tensor[0].item()
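A short sketch of the difference between indexing, item() and tolist(), with arbitrary values:

```python
import torch

this_tensor = torch.tensor([5, 2, 6, 1])

first = this_tensor[0]
print(type(first))    # <class 'torch.Tensor'>: indexing returns a 0D tensor

value = first.item()  # plain Python int
print(type(value))    # <class 'int'>

torch_to_list = this_tensor.tolist()
print(torch_to_list)  # [5, 2, 6, 1]
```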
Basic Operations:
Vector addition and subtraction (element-wise): z = u + v, where u and v are two tensors. We
can also add a number to every element of a tensor: z = u + 1
Multiplication of a vector by a scalar: z = 2*u. It returns a tensor with each element
multiplied by the scalar.
Product of two tensors (element-wise): z = u*v
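These operations can be tried on two small vectors. The values of u and v are arbitrary choices for illustration.

```python
import torch

u = torch.tensor([1.0, 0.0])
v = torch.tensor([0.0, 1.0])

z_add = u + v       # tensor([1., 1.])
z_sub = u - v       # tensor([ 1., -1.])
z_plus_one = u + 1  # tensor([2., 1.]): 1 is added to every element
z_scaled = 2 * u    # tensor([2., 0.])
z_prod = u * v      # tensor([0., 0.]): element-wise product, not a dot product
```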
Functions:
Universal functions such as mean, max, and min are available as tensor methods: max_b = b.max()
Note that we must convert the tensor to a Numpy array before plotting it with matplotlib.
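A minimal sketch of these functions on an arbitrary tensor b. The last line only shows the conversion to a numpy array mentioned above; the plotting call itself is left out.

```python
import torch

b = torch.tensor([1.0, -2.0, 3.0, 4.0, 5.0])

mean_b = b.mean()  # 0D tensor holding the average of the elements
max_b = b.max()    # 0D tensor holding the largest element
min_b = b.min()    # 0D tensor holding the smallest element
print(mean_b.item(), max_b.item(), min_b.item())

# Array that can be handed to a plotting library such as matplotlib.
b_np = b.numpy()
```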
Tensors in 2D:
2D tensors are essentially matrices. Tensors can be extended to any number of dimensions
(3D, 4D, etc.); for example, a three-channel RGB image is a 3D tensor in which each channel is
a matrix.
For a 3x3 tensor A, A.ndimension() returns 2, A.shape and A.size() both return
torch.Size([3, 3]), and A.numel() returns 9 (the number of elements).
To index an element, instead of chaining brackets as in A[0][1], we can also use a single pair
of brackets with the indices separated by a comma: A[0, 1].
Tensor addition and subtraction (element-wise): z = u + v, where u and v are two tensors. We
can also add a number to every element of a tensor: z = u + 1
Multiplication of a tensor by a scalar: z = 2*u. It returns a tensor with each element
multiplied by the scalar.
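The 2D case can be sketched with a 3x3 tensor. The entries of A are arbitrary and chosen so that each value shows its row and column.

```python
import torch

A = torch.tensor([[11, 12, 13],
                  [21, 22, 23],
                  [31, 32, 33]])

print(A.ndimension())  # 2
print(A.shape)         # torch.Size([3, 3])
print(A.size())        # torch.Size([3, 3])
print(A.numel())       # 9

# Both indexing styles select row 1, column 2.
print(A[1][2].item(), A[1, 2].item())  # 23 23

B = A + 1  # adds 1 to every element
C = 2 * A  # multiplies every element by 2
```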
Differentiation in PyTorch:
Differentiation is used to obtain the parameters of a neural network during training.
Derivatives:
In this example, we first define a tensor x with a value of 2 in the usual way, but we add
the argument requires_grad=True because we are going to use x to evaluate a function and its
derivative.
Then we define the function y = x**2 using the tensor x. This way we can evaluate y (and its
derivative) at the value of x.
Calling y.backward() computes the derivative of y with respect to x, evaluated at x = 2, and
stores the result in x.grad. The tensor y keeps a reference to its backward function in the
attribute y.grad_fn, which is a node in a graph.
It works this way because PyTorch computes derivatives by building a backward graph in which
the tensors and the backward functions are the nodes.
In this graph, the tensor we created is a leaf (x.is_leaf is True) and it is connected to the
function y, which is not a leaf.
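The steps described above can be sketched as follows, using the same function y = x**2 evaluated at x = 2:

```python
import torch

# requires_grad=True tells PyTorch to track the operations applied to x.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2

# Compute dy/dx at x = 2; the result is stored in x.grad.
y.backward()
print(x.grad)                # tensor(4.), since dy/dx = 2x

print(x.is_leaf, y.is_leaf)  # True False
print(y.grad_fn)             # the backward function node attached to y
```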
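Partial Derivatives:
The process is the same, but for a function f of two variables, each created with
requires_grad=True, a single call to f.backward() gives the partial derivative of f with
respect to each variable separately, stored in that variable's grad attribute.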
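A minimal sketch of partial derivatives. The function f(u, v) = u*v + u**2 and the evaluation point (u = 1, v = 2) are assumptions chosen for illustration; they do not come from the notes.

```python
import torch

u = torch.tensor(1.0, requires_grad=True)
v = torch.tensor(2.0, requires_grad=True)

# Hypothetical example function of two variables.
f = u * v + u ** 2
f.backward()

print(u.grad)  # tensor(4.): df/du = v + 2u
print(v.grad)  # tensor(1.): df/dv = u
```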
NOTES: The method detach() returns a tensor that is excluded from further tracking, so
operations on it are not recorded in the graph. Once the graph is created, every operation
performed on its tensors is tracked. If all we want is to read the values of a tensor without
changing anything else, we can use x.detach().numpy()
We can also evaluate a function on a tensor that holds an array of values instead of a single
element, created for example with torch.linspace.
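As a sketch of both notes, the function x**2 can be evaluated on several points at once. The range and number of points are arbitrary. Summing the outputs before calling backward() is one common way to obtain the derivative at every point, because backward() without arguments expects a scalar.

```python
import torch

# Seven evenly spaced values from -3 to 3, tracked for differentiation.
x = torch.linspace(-3, 3, 7, requires_grad=True)
Y = x ** 2

# Reduce to a scalar so that backward() can be called without arguments.
y = Y.sum()
y.backward()

# detach() gives access to the values without the graph; calling
# x.numpy() directly would raise an error because x requires grad.
x_values = x.detach().numpy()
Y_values = Y.detach().numpy()
grad_values = x.grad.numpy()  # derivative 2x at each point
```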
Simple Dataset:
Dataset is a PyTorch class, and transforms can be applied to its samples. To use it, we import
it: from torch.utils.data import Dataset
We can create our own dataset as a subclass that inherits from Dataset and defines the methods
__getitem__ and __len__.
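A minimal sketch of such a subclass. The class name ToySet, its constant data, and the optional transform argument are hypothetical choices for illustration.

```python
import torch
from torch.utils.data import Dataset


class ToySet(Dataset):
    """Hypothetical dataset: every sample is x = [2, 2] with label y = [1]."""

    def __init__(self, length=100, transform=None):
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.len = length
        self.transform = transform

    def __getitem__(self, index):
        # Return one (x, y) pair, transformed if a transform was given.
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


dataset = ToySet()
x0, y0 = dataset[0]
print(len(dataset), x0, y0)
```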
Transforms Compose:
We use this class when we want to apply several transforms one after another. We create a
transforms.Compose object whose argument is a list with an instance of each transform, in the
order in which they are applied.
We can also pass it to the dataset class constructor in the same way as a single transform.
The examples in this section use the following libraries: PIL.Image for reading images,
pandas for reading dataframes, os for file system management, pyplot for showing the images,
and finally Dataset and DataLoader.
Here we read the information about the dataset from a folder called data. The file index.csv
contains the paths to the images and their categories.
Instead of applying these transforms independently, we can compose them and/or integrate
them into the Dataset class via its transform attribute.
In this example, CenterCrop(20) crops the image to 20x20 pixels around the center of the
original image. ToTensor() converts the PIL image to a tensor (useful for feeding CNNs).
torchvision.datasets provides prebuilt datasets such as MNIST.
This example loads the MNIST dataset. root is the path to the dataset; if the dataset is not
there and download=True, it is downloaded into that path. train=False indicates that we want
the test partition of the dataset.