Ilovepdf Merged
In [ ]: import torch
import numpy as np
import sys
In [ ]: torch.__version__
Out[ ]: '2.1.0'
Device: cpu
Let's use linear regression as a case study to explore the different components of PyTorch.
We will cover the following components:
In a linear regression model, each target variable is estimated to be a weighted sum of the
input variables, offset by some constant, known as a bias.
Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall
and humidity.
The learning part of linear regression is to figure out a set of weights w11, w12, ..., w23 and
biases b1 and b2 using gradient descent.
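Written out for three inputs and two targets, the weighted sum described above takes the following form. The symbols x_1, x_2, x_3 (standing for temperature, rainfall and humidity) and the predicted targets \hat{y}_1, \hat{y}_2 are notation assumed here for illustration; the weights and biases follow the names used in the text.

```latex
\begin{aligned}
\hat{y}_1 &= w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + b_1 \\
\hat{y}_2 &= w_{21} x_1 + w_{22} x_2 + w_{23} x_3 + b_2
\end{aligned}
```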
inputs = torch.from_numpy(x_train)
targets = torch.from_numpy(y_train)
print(inputs.size())
print(targets.size())
torch.Size([15, 3])
torch.Size([15, 2])
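The arrays x_train and y_train are not shown in this excerpt. As a minimal sketch, assuming they are float32 NumPy arrays with 15 rows, the conversion can be reproduced with placeholder values (the random numbers below are hypothetical, not the notebook's data). It also shows that torch.from_numpy shares memory with the source array.

```python
import numpy as np
import torch

# Placeholder data (an assumption): 15 observations, 3 inputs, 2 targets.
rng = np.random.default_rng(0)
x_train = rng.uniform(40, 110, size=(15, 3)).astype(np.float32)
y_train = rng.uniform(40, 110, size=(15, 2)).astype(np.float32)

inputs = torch.from_numpy(x_train)
targets = torch.from_numpy(y_train)
print(inputs.size(), targets.size())

# from_numpy shares memory with the source array, so an in-place
# change to x_train is visible through the tensor as well.
x_train[0, 0] = -1.0
print(inputs[0, 0].item())
```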
In [ ]: from torch.utils.data import TensorDataset
# Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]
We'll now create a DataLoader, which can split the data into batches of a predefined size
while training. It also provides other utilities like shuffling and random sampling of the data.
In each iteration, the data loader returns one batch of data, with the given batch size. If
shuffle is set to True, it shuffles the training data before creating batches. Shuffling helps
randomize the input to the optimization algorithm, which can lead to faster reduction in the
loss.
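The batching behaviour described above can be sketched as follows. The tensors, the batch size of 5 and the name train_dl are assumptions chosen for illustration; only TensorDataset and DataLoader come from the notebook.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder tensors (an assumption) with the shapes used above.
inputs = torch.arange(45, dtype=torch.float32).reshape(15, 3)
targets = torch.arange(30, dtype=torch.float32).reshape(15, 2)
train_ds = TensorDataset(inputs, targets)

batch_size = 5  # hypothetical value
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)

batch_shapes = []
for xb, yb in train_dl:
    # Each iteration yields one batch of inputs and the matching targets.
    batch_shapes.append((tuple(xb.shape), tuple(yb.shape)))
print(batch_shapes)
```

With 15 rows and a batch size of 5, one pass over the loader yields three batches; shuffle=True only changes which rows land in which batch.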
In [ ]: import torch.nn as nn
# Define model
model = nn.Linear(3, 2)  # nn.Linear takes (in_features, out_features)
print(model.weight)
print(model.weight.size()) # (out_features, in_features)
print(model.bias)
print(model.bias.size()) #(out_features)
Parameter containing:
tensor([[-0.3365, 0.3363, -0.3117],
[ 0.1094, -0.0304, 0.2932]], requires_grad=True)
torch.Size([2, 3])
Parameter containing:
tensor([-0.4731, -0.5072], requires_grad=True)
torch.Size([2])
In fact, our model is simply a function that multiplies the inputs by the transposed weight
matrix w and adds the bias b to each observation.
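As a quick check of this claim, the sketch below compares the output of nn.Linear with the same computation written by hand. The input tensor x is a random placeholder, not the notebook's data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 2)
x = torch.randn(15, 3)  # placeholder inputs (an assumption)

# weight has shape (out_features, in_features), so it is transposed
# before the matrix multiplication; the bias is broadcast over rows.
manual = x @ model.weight.t() + model.bias
same = torch.allclose(model(x), manual, atol=1e-6)
print(same)
```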
PyTorch models also have a helpful .parameters() method, which returns an iterator over
all the weight and bias tensors present in the model. For our linear regression model, we
have one weight matrix and one bias vector.
In [ ]: # Parameters
list(model.parameters())  # model.parameters() returns a generator, so wrap it in list()
We can use the model(tensor) API to perform a forward pass that generates predictions.
In [ ]: # Generate predictions
preds = model(inputs)
preds
In [ ]: criterion_mse = nn.MSELoss()
criterion_softmax_cross_entropy_loss = nn.CrossEntropyLoss()
tensor(7681.4390, grad_fn=<MseLossBackward0>)
7681.43896484375
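To make the MSE criterion concrete, here is a minimal sketch on small hand-made tensors (an assumption, unrelated to the loss values printed above). It compares nn.MSELoss with the mean of the element-wise squared errors.

```python
import torch
import torch.nn as nn

# Small hand-made tensors (an assumption, not the notebook's data).
preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0, 0.0], [0.0, 4.0]])

criterion_mse = nn.MSELoss()
loss = criterion_mse(preds, targets)

# Same quantity written out: mean of the element-wise squared errors.
manual = ((preds - targets) ** 2).mean()
print(loss.item(), manual.item())
```

The squared errors here are 0, 4, 9 and 0, so both values come out as 13 / 4 = 3.25.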
In [ ]: # Define optimizer
#momentum update the weight based on past gradients also, which will be useful for
#If our momentum parameter was $0.9$, we would get our current grad + the multiplic
#from one time step ago by $0.9$, the one from two time steps ago by $0.9^2 = 0.81$
opt = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
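The weighting of past gradients mentioned in the comments can be worked through in plain Python. This is a sketch of the momentum rule as we understand it for torch.optim.SGD with its default settings (no dampening, no Nesterov): the velocity is updated as v = momentum * v + grad, then the parameter moves by lr * v. The function names, the learning rate of 0.1 and the constant gradients are hypothetical.

```python
def momentum_weights(momentum, steps):
    """Weight each past gradient carries in the current velocity."""
    return [momentum ** age for age in range(steps)]

def sgd_momentum(grads, lr, momentum, param=0.0):
    """Apply v = momentum * v + g, then param -= lr * v, for each gradient."""
    velocity = 0.0
    for g in grads:
        velocity = momentum * velocity + g
        param -= lr * velocity
    return param, velocity

weights = momentum_weights(0.9, 3)
param, velocity = sgd_momentum([1.0, 1.0, 1.0], lr=0.1, momentum=0.9, param=1.0)
print(weights, velocity, param)
```

With three gradients of 1.0, the velocity is 1 + 0.9 + 0.81 = 2.71, which is the current gradient plus the two earlier ones scaled by 0.9 and 0.9^2.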
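# xb, yb are one batch of inputs and targets drawn from the DataLoader
# 1. Predict
pred = model(xb)
# 2. Calculate loss
loss = criterion_mse(pred, yb)
# 3. Calculate gradient
opt.zero_grad()  # otherwise the gradients accumulate across iterations
loss.backward()
# 4. Update the weights and bias
opt.step()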
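Putting the steps together, a complete training loop might look like the sketch below. The synthetic data, the learning rate of 0.01, the batch size, the number of epochs and the name train_dl are all assumptions, so the printed losses will not match the notebook's.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
# Synthetic data (an assumption): targets come from a known linear map.
inputs = torch.randn(15, 3)
targets = inputs @ torch.tensor([[1.0, -2.0], [0.5, 0.3], [2.0, 1.0]]) + 0.1

train_dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)
model = nn.Linear(3, 2)
criterion_mse = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

initial_loss = criterion_mse(model(inputs), targets).item()
for epoch in range(100):
    for xb, yb in train_dl:
        pred = model(xb)                # 1. predict
        loss = criterion_mse(pred, yb)  # 2. calculate loss
        opt.zero_grad()                 # 3. clear old gradients
        loss.backward()                 #    and compute new ones
        opt.step()                      # 4. update weights and bias
final_loss = criterion_mse(model(inputs), targets).item()
print(initial_loss, final_loss)
```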
In [ ]: # Generate predictions
preds = model(inputs)
loss = criterion_mse(preds, targets)
print(loss.item())
28.631633758544922
PyTorch - Logistic Regression
In [ ]: import torch, torchvision
from torchvision import transforms
from torch import nn
import numpy as np
import sys
cpu
In [ ]: # Hyper-parameters
input_size = 784
hidden_size = 89
num_classes = 10
num_epochs = 1
batch_size = 100
learning_rate = 0.001
1. Defining dataset
In [ ]: # MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='data',
                                           train=True,
                                           transform=transforms.ToTensor(), #conve
                                           download=True)
test_dataset = torchvision.datasets.MNIST(root='data',
                                          train=False,
                                          transform=transforms.ToTensor())
# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
Let's now define the model as a class. Any nn.Module can also be moved to a device with
.to(device), which lets it use the GPU when one is available.
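The model definition itself is not shown in this excerpt. As a minimal sketch, assuming a plain logistic regression with a single linear layer, it could look like this. The class name LogisticRegression, the flattening step and the dummy batch are hypothetical, and hidden_size from the hyper-parameter cell is not used here.

```python
import torch
from torch import nn

input_size, num_classes = 784, 10  # values from the hyper-parameter cell

class LogisticRegression(nn.Module):
    """Hypothetical model: one linear layer mapping pixels to class scores."""

    def __init__(self, input_size, num_classes):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, x):
        # Flatten [batch, 1, 28, 28] images into [batch, 784] vectors.
        x = x.reshape(x.size(0), -1)
        return self.linear(x)  # raw scores (logits), no softmax here

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LogisticRegression(input_size, num_classes).to(device)

dummy = torch.zeros(4, 1, 28, 28, device=device)  # placeholder batch
print(model(dummy).shape)
```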
Here we will be using Adam, an optimizer with adaptive learning rates. Compared with
SGD, Adam is more adaptive in how it uses momentum and the learning rate. Namely,
Adam scales the learning rate using a moving average of the squared gradients and,
like SGD with momentum, it steps along a moving average of the gradient rather than
the raw gradient.
Whether to use Adam or SGD is still debated. Adam was proposed in 2015 to great success,
but many later papers found that SGD can generalize better than Adam, so there is no
clear winner. It's best to try both.
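To see what "scaling by the squared gradients" does, here is one Adam update for a single scalar parameter in plain Python. This is a sketch of the standard update rule; the function name adam_step is hypothetical, and the default values are chosen to mirror what we understand to be the defaults of torch.optim.Adam.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# The first step has size close to lr whatever the gradient's magnitude.
p_small, _, _ = adam_step(1.0, grad=0.01, m=0.0, v=0.0, t=1)
p_large, _, _ = adam_step(1.0, grad=100.0, m=0.0, v=0.0, t=1)
print(p_small, p_large)
```

Both calls move the parameter by roughly lr = 0.001, even though the gradients differ by a factor of 10,000. Plain SGD would instead take steps proportional to the gradient.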
In [ ]: # Loss and optimizer
#this is softmax + cross-entropy loss together, thus the output layer does not need
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
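The comment above notes that nn.CrossEntropyLoss combines softmax and cross-entropy. A small sketch on placeholder logits and labels (both assumptions) shows the equivalence with an explicit log-softmax followed by the negative log-likelihood loss.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # placeholder raw scores [batch, num_classes]
labels = torch.tensor([3, 0, 9, 1])  # placeholder class indices [batch]

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Equivalent two-step form: log-softmax, then negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(loss.item(), manual.item())
```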
3. Training
Let's train the model
# Forward pass
outputs = model(images)
loss = criterion(outputs, labels) #note that outputs shape [batch, num_cla
if (i+1) % 100 == 0:
    sys.stdout.write('\rEpoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                     .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
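The surrounding loop is not fully visible in this excerpt, so here is a self-contained sketch of how the pieces might fit together. To avoid downloading MNIST, it uses random stand-in "images" labelled by a fixed linear rule. The stand-in data, the single nn.Linear model, the five epochs, the logging interval and the losses list are all assumptions for illustration.

```python
import sys
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
input_size, num_classes, batch_size = 784, 10, 100
num_epochs, learning_rate = 5, 0.001  # hypothetical values for this sketch

# Stand-in data (an assumption): random "images" labelled by a fixed linear rule.
images_all = torch.randn(1000, 1, 28, 28)
teacher = torch.randn(input_size, num_classes)
labels_all = (images_all.reshape(-1, input_size) @ teacher).argmax(dim=1)
train_loader = DataLoader(TensorDataset(images_all, labels_all),
                          batch_size=batch_size, shuffle=True)

model = nn.Linear(input_size, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

total_step = len(train_loader)
losses = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, input_size)  # flatten to [batch, 784]
        # Forward pass
        outputs = model(images)            # [batch, num_classes]
        loss = criterion(outputs, labels)  # labels: [batch] class indices
        # Backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        if (i + 1) % 5 == 0:
            sys.stdout.write('\rEpoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                             .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))
```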
4. Testing
Let's test the model
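The test code is not included in this excerpt. A common way to evaluate a classifier is sketched below: disable gradient tracking, take the class with the highest score and count correct predictions. The untrained model and the random test data are stand-ins (assumptions), so the accuracy printed here is only chance level.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
input_size, num_classes = 784, 10

# Stand-ins (assumptions): an untrained model and random test data.
model = nn.Linear(input_size, num_classes)
test_loader = DataLoader(
    TensorDataset(torch.randn(200, 1, 28, 28),
                  torch.randint(0, num_classes, (200,))),
    batch_size=100, shuffle=False)

model.eval()               # switch layers such as dropout to inference mode
correct, total = 0, 0
with torch.no_grad():      # gradients are not needed for evaluation
    for images, labels in test_loader:
        outputs = model(images.reshape(-1, input_size))
        predicted = outputs.argmax(dim=1)  # class with the highest score
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print('Accuracy: {:.2%}'.format(accuracy))
```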