
PyTorch - Linear Regression

In [ ]: import torch
import numpy as np
import sys

In [ ]: torch.__version__

Out[ ]: '2.1.0'

In [ ]: # We can check whether we have a gpu
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Device: ", device)

Device: cpu

Let's use linear regression as a case study to explore the different components of PyTorch.
We will cover the following components:

1. Specifying input and target
2. Dataset and DataLoader
3. nn.Linear (Dense)
4. Define loss function
5. Define optimizer function
6. Train the model

Consider data where we want to predict the yields of apples and oranges from the temperature, rainfall and humidity of a region.

In a linear regression model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant known as a bias:

yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2

Visually, this means that the yield of each fruit is a linear (planar) function of temperature, rainfall and humidity.

The learning part of linear regression is to figure out the set of weights w11, w12, ..., w23 and biases b1 & b2 using gradient descent.
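To make the idea concrete, here is a minimal sketch (not part of the original notebook) of one manual gradient-descent step, assuming the inputs and targets tensors created in section 1 below; w and b are illustrative names:

# Minimal sketch of one gradient-descent step
w = torch.randn(2, 3, requires_grad=True)   # 2 targets x 3 input features
b = torch.randn(2, requires_grad=True)

preds = inputs @ w.t() + b                  # forward pass
loss = ((preds - targets) ** 2).mean()      # mean squared error
loss.backward()                             # computes dL/dw and dL/db

with torch.no_grad():
    w -= 1e-5 * w.grad                      # step opposite the gradient
    b -= 1e-5 * b.grad
    w.grad.zero_(); b.grad.zero_()          # reset gradients for the next step

PyTorch's nn.Linear and optim.SGD, introduced below, automate exactly these steps.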

1. Specifying input and target


In [ ]: # Input (temp, rainfall, humidity)
x_train = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58],
[102, 43, 37], [69, 96, 70], [73, 67, 43],
[91, 88, 64], [87, 134, 58], [102, 43, 37],
[69, 96, 70], [73, 67, 43], [91, 88, 64],
[87, 134, 58], [102, 43, 37], [69, 96, 70]],
dtype='float32')

# Targets (apples, oranges)
y_train = np.array([[56, 70], [81, 101], [119, 133],
[22, 37], [103, 119], [56, 70],
[81, 101], [119, 133], [22, 37],
[103, 119], [56, 70], [81, 101],
[119, 133], [22, 37], [103, 119]],
dtype='float32')

inputs = torch.from_numpy(x_train)
targets = torch.from_numpy(y_train)
print(inputs.size())
print(targets.size())

torch.Size([15, 3])
torch.Size([15, 2])

2. Dataset and DataLoader


We'll create a TensorDataset, which allows access to rows from inputs and targets as tuples. To use a DataLoader (discussed shortly) on data coming from NumPy arrays, we first have to wrap the tensors in a TensorDataset.

In [ ]: from torch.utils.data import TensorDataset

In [ ]: # Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

Out[ ]: (tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

We'll now create a DataLoader , which can split the data into batches of a predefined size
while training. It also provides other utilities like shuffling and random sampling of the data.

In [ ]: from torch.utils.data import DataLoader

In [ ]: # Define data loader
batch_size = 3
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

The data loader is typically used in a for-in loop. Let's look at an example.

In [ ]: for xb, yb in train_dl:
    print(xb)
    print(yb)
    break

tensor([[91., 88., 64.],
        [73., 67., 43.],
        [69., 96., 70.]])
tensor([[ 81., 101.],
        [ 56.,  70.],
        [103., 119.]])

In each iteration, the data loader returns one batch of data, with the given batch size. If
shuffle is set to True, it shuffles the training data before creating batches. Shuffling helps
randomize the input to the optimization algorithm, which can lead to faster reduction in the
loss.
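To see the effect of shuffling, here is a quick check (a sketch, not in the original notebook) comparing the batch order of two passes over the loader:

# Sketch: with shuffle=True, each pass over the loader sees a new batch order.
first_pass  = [xb[0, 0].item() for xb, _ in train_dl]
second_pass = [xb[0, 0].item() for xb, _ in train_dl]
print(first_pass == second_pass)  # almost certainly False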

3. Define the model - nn.Linear (Dense)


Instead of initializing the weights & biases manually, we can define the model using the
nn.Linear class from PyTorch, which does it automatically.

In [ ]: import torch.nn as nn

# Define model
model = nn.Linear(3, 2) # nn.Linear assumes the shape (in_features, out_features)
print(model.weight)
print(model.weight.size()) # (out_features, in_features)
print(model.bias)
print(model.bias.size()) #(out_features)

Parameter containing:
tensor([[-0.3365, 0.3363, -0.3117],
[ 0.1094, -0.0304, 0.2932]], requires_grad=True)
torch.Size([2, 3])
Parameter containing:
tensor([-0.4731, -0.5072], requires_grad=True)
torch.Size([2])

In fact, our model is simply a function that performs a matrix multiplication of the inputs
with the (transposed) weights w and adds the bias b (for each observation).
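As a quick sketch (assuming the model defined above), we can reproduce the forward pass explicitly:

# Sketch: manual forward pass, equivalent to model(inputs)
manual_preds = inputs @ model.weight.t() + model.bias
print(torch.allclose(manual_preds, model(inputs)))  # True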

PyTorch models also have a helpful .parameters() method, which returns a generator over all the weight and bias tensors present in the model. For our linear regression model, we have one weight matrix and one bias vector.
In [ ]: # Parameters
list(model.parameters()) # model.parameters() returns a generator, so wrap it in list() to inspect

Out[ ]: [Parameter containing:
tensor([[-0.3365,  0.3363, -0.3117],
        [ 0.1094, -0.0304,  0.2932]], requires_grad=True),
Parameter containing:
tensor([-0.4731, -0.5072], requires_grad=True)]

In [ ]: # We can gauge the model's complexity by counting its trainable parameters
# (here: 3*2 weights + 2 bias terms = 8)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))

We can use the model(tensor) API to perform a forward pass that generates predictions.

In [ ]: # Generate predictions
preds = model(inputs)
preds

Out[ ]: tensor([[-15.9079,  18.0490],
        [-21.4476,  25.5367],
        [ -2.7596,  21.9406],
        [-31.8700,  20.1926],
        [-13.2228,  24.6457],
        [-15.9079,  18.0490],
        [-21.4476,  25.5367],
        [ -2.7596,  21.9406],
        [-31.8700,  20.1926],
        [-13.2228,  24.6457],
        [-15.9079,  18.0490],
        [-21.4476,  25.5367],
        [ -2.7596,  21.9406],
        [-31.8700,  20.1926],
        [-13.2228,  24.6457]], grad_fn=<AddmmBackward0>)

4. Define loss function


The nn module contains many useful loss functions, such as:

In [ ]: criterion_mse = nn.MSELoss()
criterion_softmax_cross_entropy_loss = nn.CrossEntropyLoss()

In [ ]: mse = criterion_mse(preds, targets)
print(mse)
print(mse.item()) # print out the raw loss number

tensor(7681.4390, grad_fn=<MseLossBackward0>)
7681.43896484375
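As a sanity check (a sketch using the preds and targets from above), nn.MSELoss with its default reduction is just the mean of the squared differences:

# Sketch: MSELoss computes the mean of the squared element-wise differences.
manual_mse = ((preds - targets) ** 2).mean()
print(torch.allclose(manual_mse, mse))  # True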

5. Define the optimizer


We use optim.SGD to perform stochastic gradient descent where samples are selected in
batches (often with random shuffling) instead of as a single group. Note that
model.parameters() is passed as an argument to optim.SGD .

In [ ]: # Define optimizer
# momentum updates the weight based on past gradients as well, which helps smooth the descent.
# If our momentum parameter were 0.9, we would take our current gradient plus the gradient
# from one time step ago multiplied by 0.9, the one from two time steps ago by 0.9^2 = 0.81,
# and so on.
opt = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
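A minimal sketch of this update rule (variable names here are illustrative, not part of the notebook):

# Sketch of the SGD-with-momentum update applied per parameter:
#   v <- momentum * v + grad   (velocity, a running mix of past gradients)
#   w <- w - lr * v
w, v, lr, momentum = 1.0, 0.0, 0.0001, 0.9
for grad in [2.0, 2.0, 2.0]:          # dummy gradients for illustration
    v = momentum * v + grad           # after 3 steps: 2 + 0.9*2 + 0.81*2 = 5.42
    w = w - lr * v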

6. Training - putting everything together


In [ ]: # Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):

    # Repeat for given number of epochs
    for epoch in range(num_epochs):

        # Train with batches of data
        for xb, yb in train_dl:

            # .to(device) is not in-place for tensors, so reassign;
            # this moves them to the gpu if available, otherwise keeps them on the cpu
            xb = xb.to(device)
            yb = yb.to(device)

            # 1. Predict
            pred = model(xb)

            # 2. Calculate loss
            loss = loss_fn(pred, yb)

            # 3. Calculate gradient
            opt.zero_grad() # if not, the gradients will accumulate
            loss.backward()

            # Print out the gradients.
            #print('dL/dw: ', model.weight.grad)
            #print('dL/db: ', model.bias.grad)

            # 4. Update parameters using gradients
            opt.step()

        # Print the progress
        if (epoch+1) % 10 == 0:
            sys.stdout.write("\rEpoch [{}/{}], Loss: {:.4f}".format(epoch+1, num_epochs, loss.item()))

In [ ]: # train for 100 epochs
fit(100, model, criterion_mse, opt, train_dl)

Epoch [100/100], Loss: 18.83326

In [ ]: # Generate predictions
preds = model(inputs)
loss = criterion_mse(preds, targets)
print(loss.item())

28.631633758544922

PyTorch - Logistic Regression

In [ ]: import torch, torchvision
from torchvision import transforms
from torch import nn
import numpy as np
import sys

Fully-Connected Neural Network


Let's load the MNIST dataset. Our architecture is simple:

1. Input layer receiving 784 features
2. Hidden layer with 89 neurons
3. Output layer with 10 neurons

We will be using the ReLU activation (see the model definition below).
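For orientation, here is the same architecture sketched with nn.Sequential (the notebook builds it as an nn.Module subclass in section 2; sketch_model is an illustrative name):

# Sketch: the architecture above expressed with nn.Sequential
sketch_model = nn.Sequential(
    nn.Linear(784, 89),   # input -> hidden
    nn.ReLU(),            # non-linearity
    nn.Linear(89, 10),    # hidden -> output logits
)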

In [ ]: # set gpu if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cpu

In [ ]: # Hyper-parameters
input_size = 784
hidden_size = 89
num_classes = 10
num_epochs = 1
batch_size = 100
learning_rate = 0.001

1. Defining dataset
In [ ]: # MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='data',
                                           train=True,
                                           transform=transforms.ToTensor(), # converts images to tensors scaled to [0, 1]
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='data',
train=False,
transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
batch_size=batch_size,
shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
batch_size=batch_size,
shuffle=False)

x_sample, y_sample = next(iter(train_loader))

print("X: ", x_sample.shape)
print("X min: ", x_sample.min())
print("X max: ", x_sample.max())
print("y: ", y_sample.shape)
print("y unique: ", y_sample.unique())

X: torch.Size([100, 1, 28, 28])
X min: tensor(0.)
X max: tensor(1.)
y: torch.Size([100])
y unique: tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

2. Defining the model


In [ ]: # Fully connected neural network with one hidden layer
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()  # same as super().__init__() in Python 3
        self.fc1 = nn.Linear(input_size, hidden_size)
        # add non-linearity; recall ReLU is max(input, 0)
        # -> go study LeakyReLU (max(input, a * input)) and Swish (x * sigmoid(x)) as well
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

Let's now define the model using this class. Every nn.Module can also be moved with .to(device) to take full advantage of the GPU.

In [ ]: model = NeuralNet(input_size, hidden_size, num_classes).to(device)

Let's define the Loss and optimizer.

Here we will be using Adam, an adaptive learning-rate optimization algorithm. Compared with SGD, Adam is more adaptive in how it uses momentum and the learning rate: it uses the squared gradients to scale the learning rate, and it takes advantage of momentum by using a moving average of the gradient instead of the raw gradient, as SGD with momentum does.

Whether to prefer Adam or SGD is still very debatable. Adam was proposed in 2015 to great success, but many recent papers have found that SGD can generalize better than Adam... so I really don't know. It's best to try both, I guess.
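A minimal sketch of Adam's per-parameter update (illustrative names; b1, b2 and eps are the defaults from the Adam paper):

# Sketch of Adam's update for a single scalar parameter w:
#   m <- b1*m + (1-b1)*grad       (moving average of the gradient)
#   v <- b2*v + (1-b2)*grad**2    (moving average of the squared gradient)
#   w <- w - lr * m_hat / (sqrt(v_hat) + eps)   with bias-corrected m_hat, v_hat
b1, b2, lr, eps = 0.9, 0.999, 0.001, 1e-8
m, v, w, t = 0.0, 0.0, 0.5, 0
for grad in [0.1, 0.3, -0.2]:                  # dummy gradients for illustration
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                  # bias correction
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (v_hat ** 0.5 + eps)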
In [ ]: # Loss and optimizer

# this is softmax + cross-entropy loss together, thus the output layer does not need
# its own softmax activation
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
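As a sketch of why no softmax is needed at the output, nn.CrossEntropyLoss on raw logits is equivalent to log-softmax followed by negative log-likelihood (logits and labels here are dummy values):

import torch.nn.functional as F

# Sketch: CrossEntropyLoss == log_softmax + NLLLoss on raw logits
logits = torch.randn(4, 10)                  # dummy batch: 4 samples, 10 classes
labels = torch.tensor([3, 1, 0, 9])
a = F.cross_entropy(logits, labels)
b = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(a, b))                  # True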

3. Training
Let's train the model

In [ ]: # Train the model
total_step = len(train_loader) # for printing purposes
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        # images shape is [100, 1, 28, 28], i.e. [batch_size, channel, height, width]

        # Move tensors to the configured device;
        # also reshape to [100, 784] so they can be fed into the Dense layer
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels) # note that outputs shape is [batch, num_classes]

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            sys.stdout.write('\rEpoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                             .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

Epoch [1/1], Step [600/600], Loss: 0.1646

4. Testing
Let's test the model

In [ ]: # Test the model

# In the test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1) # returns (max values, indices) along dim 1
        total += labels.size(0) # keep track of the total count
        correct += (predicted == labels).sum().item() # .item() gives the raw number

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

# Save the model checkpoint (assumes the models/ directory already exists)
torch.save(model.state_dict(), 'models/dense-mnist.ckpt')

Accuracy of the network on the 10000 test images: 93.56 %
