Pytorch: Tensors and Datasets

The document discusses tensors in PyTorch, including how to construct and manipulate tensors of different dimensions, perform basic operations on tensors, and use tensors for neural network differentiation and simple datasets. Tensors are the fundamental data structure in PyTorch and allow representing numbers, arrays, and neural network parameters and inputs.

Uploaded by

Javier Hernandez
Copyright
© © All Rights Reserved

TENSORS AND DATASETS


Tensors:
In PyTorch, neural networks are built from tensors, which generalize numbers and arrays. A
network's weights are stored in tensors, its inputs are tensors, and the operations the
network performs to produce its output (also a tensor) are tensor operations.

It is very easy to convert NumPy arrays to PyTorch tensors and vice versa.

Tensors 1D:
A 0D tensor is simply a number.

A 1D tensor is an array of numbers, e.g., a row in a database.

A tensor contains elements of a single data type, such as floating-point numbers or integers,
in one of several precisions: torch.float32, torch.double, torch.half, torch.int…

To construct a tensor in Pytorch:

We can access the elements in the tensor as we would access a numpy array: a[0], a[1]

a.dtype tells us the data type of the elements in the tensor, while a.type() tells us the type
of the tensor itself.

We can specify the element type in the constructor: a = torch.tensor([1, 3, 5, 4],
dtype=torch.int). We can also use a specific constructor for each tensor type: a =
torch.FloatTensor([1.2, 3.7, 5, 4]).
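
The construction and inspection calls described above can be sketched as follows. This is a minimal illustration; the values are arbitrary and the variable names b and c are introduced here only to keep the three tensors apart.

```python
import torch

# Build a 1D tensor from a Python list; integer literals give torch.int64 by default.
a = torch.tensor([1, 3, 5, 4])

# Elements are accessed like in a NumPy array (each result is itself a tensor).
first, second = a[0], a[1]

print(a.dtype)   # torch.int64
print(a.type())  # torch.LongTensor

# Choose the element type explicitly, or use a type-specific constructor.
b = torch.tensor([1, 3, 5, 4], dtype=torch.int)  # torch.int is an alias of torch.int32
c = torch.FloatTensor([1.2, 3.7, 5, 4])          # elements stored as torch.float32
```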

We can also change the type of a tensor using the type() method, passing the desired type as
its argument: a.type(torch.FloatTensor). This returns a new tensor and leaves a unchanged.

The method size() gives us the shape of the tensor: a.size(). For a 1D tensor this is the
number of elements; in general the first entry is the number of rows.

The method ndimension() returns the rank (number of dimensions) of the tensor:
a.ndimension()

To change the shape of a tensor, e.g., to use it as an input to a network, we can use the
method view(): for a tensor a with 5 elements, a_col = a.view(5, 1) reshapes it into a tensor
with 5 rows and 1 column.
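
A short sketch of type conversion, size(), ndimension() and view(), assuming a five-element tensor so that view(5, 1) is valid (the values are illustrative):

```python
import torch

a = torch.tensor([0, 1, 2, 3, 4])      # five elements, dtype torch.int64

# type() returns a converted copy; the original tensor keeps its dtype.
a_float = a.type(torch.FloatTensor)

print(a.size())        # torch.Size([5])
print(a.ndimension())  # 1

# Reshape into 5 rows and 1 column; the total number of elements must match.
a_col = a.view(5, 1)

# -1 asks PyTorch to infer that dimension from the number of elements.
a_col_inferred = a.view(-1, 1)
```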

To convert a tensor to a NumPy array, we can use the method numpy(): back_to_numpy =
torch_tensor.numpy()

To convert a numpy array to a tensor: torch_tensor = torch.from_numpy(numpy_array)

Note that the tensor and the NumPy array share the same memory, so changing the NumPy array
also changes the associated tensor, and vice versa.

We can do the same with a Pandas series using its attribute values: pandas_to_torch =
torch.from_numpy(pandas_series.values)

We can use the method tolist() to return a list from a tensor: torch_to_list =
this_tensor.tolist()

IMPORTANT: The individual members of a tensor are also tensors. To access the value itself
we must use the method item(): new_tensor[0].item()
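
The conversions above, including the shared-memory behaviour, can be illustrated with a small sketch (the array values are arbitrary):

```python
import numpy as np
import torch

numpy_array = np.array([0.0, 1.0, 2.0, 3.0])

torch_tensor = torch.from_numpy(numpy_array)  # shares memory with numpy_array
back_to_numpy = torch_tensor.numpy()          # shares the same memory as well

# Changing the array is visible through the tensor and the second array.
numpy_array[0] = 10.0

torch_to_list = torch_tensor.tolist()  # plain Python list of floats
first_value = torch_tensor[0].item()   # plain Python float, not a tensor
```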

Indexing and Slicing:


Elements of a tensor are changed in the same way as in lists and arrays. Accessing a range of
elements also works the same way (with : ). These operations return tensors.

We can also assign values to a range of elements by assigning a tensor of matching length to a
slice.
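
A minimal sketch of element assignment, slicing, and slice assignment on an illustrative tensor c:

```python
import torch

c = torch.tensor([20, 1, 2, 3, 4])

# Change a single element in place.
c[0] = 100

# Assign new values to a range of elements; lengths must match.
c[3:5] = torch.tensor([300, 400])

# Slicing returns a tensor (a view on the same data, not a copy).
d = c[1:4]
```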

Basic Operations:
Vector addition and subtraction (element-wise): z = u + v, where u and v are two tensors. We
can also add a scalar directly to each element of a tensor: z = u + 1

Vector multiplication by a scalar: z = 2*u. It returns a tensor with each element multiplied.

Product of two tensors (element-wise): z = u*v

Dot product of two tensors: z = torch.dot(u,v)
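
These operations can be tried on two small vectors. The values of u and v below are illustrative:

```python
import torch

u = torch.tensor([1.0, 0.0])
v = torch.tensor([0.0, 1.0])

z_add = u + v            # element-wise addition
z_sub = u - v            # element-wise subtraction
z_plus_one = u + 1       # the scalar is added to every element
z_scaled = 2 * u         # every element multiplied by 2
z_prod = u * v           # element-wise product
z_dot = torch.dot(u, v)  # dot product, a 0D tensor
```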

Functions:
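Universal functions: mean, max, min, etc. are available as tensor methods, e.g. max_b = b.max()

Linspace: torch.linspace returns a tensor with evenly spaced numbers over an interval.

Plotting mathematical functions: we can apply a function to a tensor created with linspace and
plot the result.

Note that we must convert the tensors to NumPy arrays before plotting them with matplotlib.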
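
As a sketch of these functions, the snippet below computes a mean and a maximum, then evaluates a sine over a linspace grid. The choice of sine and of the interval is an assumption made for illustration; the plotting call is left as a comment so the snippet does not depend on matplotlib.

```python
import math
import torch

b = torch.tensor([1.0, -2.0, 3.0, 4.0, 5.0])
mean_b = b.mean()  # 0D tensor holding the mean
max_b = b.max()    # 0D tensor holding the maximum

# 100 evenly spaced points from 0 to 2*pi (both ends included).
x = torch.linspace(0, 2 * math.pi, steps=100)
y = torch.sin(x)

# Convert to NumPy arrays before handing the data to matplotlib, e.g.:
#   import matplotlib.pyplot as plt
#   plt.plot(x_np, y_np)
x_np, y_np = x.numpy(), y.numpy()
```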

Tensors in 2D:
2D tensors are essentially matrices. Tensors can be extended to any number of dimensions
(3D, 4D, etc.); for example, an RGB colour image is a 3D tensor in which each of the three
channels is a matrix.

Tensor creation in 2D:

For a 3x3 tensor A, A.ndimension() returns 2, A.shape and A.size() both return
torch.Size([3, 3]), and A.numel() returns 9 (the number of elements).
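
A 2D tensor can be created from a nested list. The 3x3 values below are illustrative:

```python
import torch

# Each inner list becomes one row of the matrix.
A = torch.tensor([[11, 12, 13],
                  [21, 22, 23],
                  [31, 32, 33]])

print(A.ndimension())  # 2
print(A.shape)         # torch.Size([3, 3])
print(A.size())        # torch.Size([3, 3])
print(A.numel())       # 9
```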

Indexing and slicing in 2D:

An element can be accessed with two pairs of brackets, A[i][j], or with a single pair of
brackets and the indices separated by a comma, A[i, j].

Slicing works in the same way as for 1D tensors.

However, we cannot slice rows and then columns by chaining brackets, as in
tensor_obj[begin_row_number:end_row_number][begin_column_number:end_column_number].
The first slice is applied to the tensor and returns another two-dimensional tensor, so the
second slice selects rows again rather than columns. To slice rows and columns together, put
both ranges inside the same brackets, separated by a comma:
tensor_obj[begin_row_number:end_row_number, begin_column_number:end_column_number].
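
The difference between chained brackets and a single bracket with a comma can be seen on a small illustrative matrix:

```python
import torch

A = torch.tensor([[11, 12, 13],
                  [21, 22, 23],
                  [31, 32, 33]])

# Both forms select row 1, column 2.
elem_chained = A[1][2]
elem_comma = A[1, 2]

# Rows 0-1 and columns 1-2 in one indexing step.
block = A[0:2, 1:3]

# Chained slices: A[0:2] keeps rows 0-1, then [1:3] slices ROWS of that result.
chained = A[0:2][1:3]

# Slice the rows and pick a single column.
column = A[0:2, 2]
```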

Basic operations in 2D:

Tensors must be of the same type.

Tensor addition and subtraction (element-wise): z = u + v, where u and v are two tensors. We
can also add a scalar directly to each element of a tensor: z = u + 1

Tensor multiplication by a scalar: z = 2*u. It returns a tensor with each element multiplied.

Product of two tensors (element-wise): z = u*v

Matrix product of two tensors: z = torch.mm(u, v). This is ordinary matrix multiplication, so
the number of columns of the first matrix must equal the number of rows of the second
matrix.
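
A sketch of the 2D operations, using illustrative matrices. P (2x3) and Q (3x2) are chosen so that the column count of P matches the row count of Q:

```python
import torch

X = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
Y = torch.tensor([[2.0, 1.0], [1.0, 2.0]])

z_add = X + Y     # element-wise addition
z_scaled = 2 * Y  # every element multiplied by 2
z_prod = X * Y    # element-wise (Hadamard) product

P = torch.tensor([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])   # shape (2, 3)
Q = torch.tensor([[1.0, 1.0],
                  [1.0, 1.0],
                  [-1.0, 1.0]])       # shape (3, 2)

z_mm = torch.mm(P, Q)  # matrix product, shape (2, 2)
```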

Differentiation in PyTorch:
Differentiation is used to compute the gradients needed to fit the parameters of a neural
network.

Derivatives:

In this example, we first define a tensor with a value of 2 in the usual way. However, we add
the argument requires_grad=True because we are going to differentiate a function of x.

Then we define the function y = x**2 using the tensor x. This way we will be able to evaluate
y (and its derivative) at the value of x.

Calling y.backward() computes the derivative of y with respect to x. (y stores a backward
function and is a node in a graph.)

Finally, x.grad holds the derivative evaluated at the value of x; here dy/dx = 2x = 4.
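
The four steps above correspond to the following minimal sketch:

```python
import torch

# A scalar tensor whose operations will be tracked for differentiation.
x = torch.tensor(2.0, requires_grad=True)

# y is built from x, so PyTorch records how to differentiate it.
y = x ** 2

# Compute dy/dx and store the result in x.grad.
y.backward()

print(x.grad)  # tensor(4.)
```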

This works because PyTorch calculates the derivative by building a backward graph in which
the tensors and the backward functions are the nodes. The graph has the tensor we created as
a leaf (is_leaf = True), and it is connected to the function y (which is not a leaf).

data contains the value of the tensor, or the value of the function when evaluated on the
tensor.

grad_fn is None for tensors we create directly and references the backward function for the
results of operations.

grad contains the value of the gradient once it has been calculated (4 in this example).

requires_grad indicates whether the tensor will be used for calculating the derivative.

You can access these attributes with x.data, x.grad_fn, y.is_leaf, etc.
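
A short sketch that inspects these attributes on the same x and y = x**2 example:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()

print(x.data)                      # tensor(2.)
print(x.is_leaf, y.is_leaf)        # True False
print(x.grad_fn)                   # None, because x was created directly
print(y.grad_fn is not None)       # True, y was produced by an operation
print(x.grad)                      # tensor(4.)
print(x.requires_grad, y.requires_grad)  # True True
```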

Partial Derivatives:

The process is the same for a function f of several variables: after calling backward(), the
grad attribute of each variable holds the partial derivative of f with respect to that variable.
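
As an illustration, assume the function f(u, v) = u*v + u**2 (chosen here only as an example; the notes do not state which function was used). Its partial derivatives are df/du = v + 2u and df/dv = u.

```python
import torch

u = torch.tensor(1.0, requires_grad=True)
v = torch.tensor(2.0, requires_grad=True)

# Hypothetical example function of two variables.
f = u * v + u ** 2

f.backward()

print(u.grad)  # tensor(4.)  since df/du = v + 2u = 2 + 2
print(v.grad)  # tensor(1.)  since df/dv = u = 1
```
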
NOTES: The method detach() returns a tensor that is excluded from further tracking, so
operations on it are not recorded in the graph. Once a tensor requires gradients, every
operation performed on it is tracked. If all we want is to access the values of the tensor
without changing anything else, we can use x.detach().numpy()

We can also evaluate a function on a tensor that holds an array of values rather than a single
element, created for example with linspace.

y = Y.sum() returns the sum of the function evaluated at every value of x. This is necessary
because backward() without arguments can only be called on scalar outputs, not on vectors.
Y is a tensor with the values of the ReLU function for each value of x, while y.backward()
triggers the computation of the derivative, which is then available in x.grad.
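
A minimal sketch of this pattern, assuming a small grid of six points (the interval and the number of points are illustrative):

```python
import torch

# Six points: -3.0, -1.8, -0.6, 0.6, 1.8, 3.0
x = torch.linspace(-3, 3, steps=6, requires_grad=True)

Y = torch.relu(x)  # vector of ReLU values, one per element of x
y = Y.sum()        # scalar, so backward() can be called without arguments
y.backward()

# The derivative of ReLU is 0 for negative inputs and 1 for positive ones.
print(x.grad)

# detach() is needed before numpy() because x requires gradients.
x_values = x.detach().numpy()
```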

Simple Dataset:
Dataset is a PyTorch class for holding data, optionally with transforms applied to it. It can be
used after importing it: from torch.utils.data import Dataset

We can create our own implementation of a dataset, here called toy_set, as a class that
inherits from Dataset.

dataset = toy_set() will call the method __init__

len(dataset) will call the method __len__

dataset[0] will call the method __getitem__, which returns a tuple with the values of the
dataset at that index. The dataset can also be traversed by indexing in a loop.

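
The notes do not preserve the body of toy_set, so the following is only a sketch of what such a class could look like. The constant x and y values and the length parameter are assumptions made for illustration.

```python
import torch
from torch.utils.data import Dataset


class toy_set(Dataset):
    """Toy dataset whose samples are constant (x, y) pairs (illustrative values)."""

    def __init__(self, length=100):
        self.x = 2 * torch.ones(length, 2)  # features: every row is [2., 2.]
        self.y = torch.ones(length, 1)      # targets: every row is [1.]
        self.len = length

    def __getitem__(self, index):
        # Return the sample at the given index as a tuple.
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len


dataset = toy_set()     # calls __init__
n = len(dataset)        # calls __len__
x0, y0 = dataset[0]     # calls __getitem__

# Traverse the first three samples by indexing.
first_three = [dataset[i] for i in range(3)]
```
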
Transforms: we may want to transform the data, for example to normalize it. Instead of
writing functions outside the class, we can define a separate callable class and use it as a
transform inside the dataset.

This example defines a class that stores two parameters, one used for an addition and one
for a multiplication.

It is instantiated with z = addmult(), and calling z on a sample (x, y) performs the
operations.

We can use it with our toy_set dataset in two ways: directly, with x_, y_ = z(dataset[0]), or
by passing it as the transform argument of the toy_set constructor: dataset =
toy_set(transform=z). In that case dataset[0] checks whether self.transform has been set and,
if so, applies it before returning the sample.
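
A sketch of both usages follows. The body of addmult is not preserved in the notes, so the parameter names addx and muly, their default values, and the choice to add to x and multiply y are assumptions.

```python
import torch
from torch.utils.data import Dataset


class addmult:
    """Hypothetical transform: adds addx to x and multiplies y by muly."""

    def __init__(self, addx=1, muly=2):
        self.addx = addx
        self.muly = muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly


class toy_set(Dataset):
    """Toy dataset with an optional transform (illustrative values)."""

    def __init__(self, length=100, transform=None):
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.len = length
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        # Apply the transform only if one was supplied.
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


z = addmult()

# Option 1: apply the transform to a sample by hand.
dataset = toy_set()
x_, y_ = z(dataset[0])

# Option 2: let the dataset apply it inside __getitem__.
transformed = toy_set(transform=z)
x_t, y_t = transformed[0]
```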

Transforms Compose:

We use this class if we want to apply several transforms one after another. We create a
transforms.Compose object whose argument is a list of instances of the different transforms.
When called, it applies them consecutively, in order.

We can also pass it to the dataset constructor in the same way as a single transform.

Datasets for images:

Throughout the examples in this section we use the following libraries: PIL.Image for reading
images, pandas for reading dataframes, os for file system management, matplotlib's pyplot for
showing the images, and finally Dataset and DataLoader.

Here we read the information about the dataset from a folder called data. The file index.csv
contains the paths to the images and their categories.

We will create a Dataset for images that does not hold all the images in memory, only the
path to the data directory and the names of the images (obtained from the csv file). It must
have the __init__, __len__, and __getitem__ methods.

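
A sketch of such a dataset is shown below. The class name ImageCSVDataset and the csv column names "category" and "image" are assumptions; the actual layout of index.csv is not given in these notes.

```python
import os

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class ImageCSVDataset(Dataset):
    """Loads images lazily from data_dir using the file names listed in a csv file."""

    def __init__(self, csv_file, data_dir, transform=None):
        self.data_dir = data_dir
        self.transform = transform
        # Only the table of names and categories is kept in memory.
        self.data_name = pd.read_csv(os.path.join(data_dir, csv_file))
        self.len = self.data_name.shape[0]

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        # Assumed columns: "image" (file name) and "category" (label).
        img_path = os.path.join(self.data_dir, self.data_name["image"].iloc[idx])
        image = Image.open(img_path)  # the image is read only when requested
        label = self.data_name["category"].iloc[idx]
        if self.transform:
            image = self.transform(image)
        return image, label
```

With a folder laid out this way, it would be used as dataset = ImageCSVDataset("index.csv", "data"), after which dataset[0] returns a PIL image and its category.
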
Torch Vision Transforms:
In torchvision.transforms there are prebuilt transforms that can be used on images.

Instead of applying these transforms independently, we can compose them and/or integrate
them into the Dataset class via the transform argument.

In this example CenterCrop(20) crops the image to 20x20 pixels around the center of the
original image. ToTensor() converts the PIL image to a tensor (useful for feeding CNNs).

Torch Vision Datasets:

In torchvision.datasets there are prebuilt datasets such as MNIST.

This example loads the MNIST dataset. root indicates the path to the dataset; if the dataset is
not already there and download=True, it is downloaded into that path. train=False indicates
that you want to use the test partition of the dataset.
