Chapter 3

Discovering activation functions between layers

Maham Faisal Khan


Senior Data Science Content Developer
Limitations of the sigmoid and softmax function
Sigmoid functions:

Bounded between 0 and 1

Can be used anywhere in the network


Gradients:

Approach zero for low and high values of x

Cause the function to saturate

Sigmoid function saturation can lead to vanishing gradients during backpropagation.

This is also a problem for softmax.
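
A minimal sketch illustrating the saturation problem numerically (the input values are illustrative):

import torch
import torch.nn as nn

# Sigmoid saturation: for large |x| the gradient of the sigmoid is close to zero
sigmoid = nn.Sigmoid()
x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
sigmoid(x).sum().backward()
print(x.grad)  # ~[4.5e-05, 0.25, 4.5e-05]: near-zero gradients at the extremes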

Introducing ReLU
Rectified Linear Unit (ReLU):

f(x) = max(x, 0)

For positive inputs, the output is equal to the input.

For strictly negative inputs, the output is equal to zero.

Overcomes the vanishing gradients problem.

In PyTorch:

relu = nn.ReLU()
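
A short usage sketch (the input tensor is illustrative):

import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, 0.0, 3.0])
print(relu(x))  # tensor([0., 0., 3.]): negative inputs are zeroed, positive inputs pass through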



Introducing Leaky ReLU
Leaky ReLU:

For positive inputs, it behaves similarly to ReLU.

For negative inputs, it multiplies the input by a small coefficient (0.01 by default).

The gradients for negative inputs are never zero.

In PyTorch:

leaky_relu = nn.LeakyReLU(negative_slope=0.05)
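
A short usage sketch (the input tensor is illustrative):

import torch
import torch.nn as nn

leaky_relu = nn.LeakyReLU(negative_slope=0.05)
x = torch.tensor([-2.0, 0.0, 3.0])
print(leaky_relu(x))  # tensor([-0.1000, 0.0000, 3.0000]): negative inputs are scaled by 0.05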



Let's practice!
A deeper dive into neural network architecture

Layers are made of neurons
Linear layers are fully connected: each neuron of a layer is connected to each neuron of the previous layer.

A neuron of a linear layer:

computes a linear operation using all neurons of the previous layer

contains N + 1 learnable parameters, where N is the dimension of the previous layer's outputs (see the sketch below)
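
A minimal sketch checking the N + 1 parameters per neuron (the layer sizes are illustrative):

import torch.nn as nn

layer = nn.Linear(4, 2)      # 2 neurons, each fed by N = 4 inputs
print(layer.weight.shape)    # torch.Size([2, 4]): N weights per neuron
print(layer.bias.shape)      # torch.Size([2]): plus one bias per neuron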



Layer naming convention



Tweaking the number of hidden layers
Input and output layer dimensions are fixed:

the input layer depends on the number of features, n_features

the output layer depends on the number of classes, n_classes

model = nn.Sequential(nn.Linear(n_features, 8),
                      nn.Linear(8, 4),
                      nn.Linear(4, n_classes))

We can use as many hidden layers as we want.

Increasing the number of hidden layers = increasing the number of parameters = increasing the model capacity.
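
A minimal sketch of the same architecture with the ReLU activations from earlier inserted between the linear layers (n_features and n_classes are illustrative values):

import torch.nn as nn

n_features, n_classes = 10, 3   # illustrative dimensions

model = nn.Sequential(nn.Linear(n_features, 8),
                      nn.ReLU(),
                      nn.Linear(8, 4),
                      nn.ReLU(),
                      nn.Linear(4, n_classes))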



Counting the number of parameters
Given the following model:

model = nn.Sequential(nn.Linear(8, 4),
                      nn.Linear(4, 2))

Manually calculating the number of parameters:

the first layer has 4 neurons, each with 8 + 1 parameters = 36 parameters

the second layer has 2 neurons, each with 4 + 1 parameters = 10 parameters

total = 46 learnable parameters

Using PyTorch, .numel() returns the number of elements in a tensor:

total = 0
for parameter in model.parameters():
    total += parameter.numel()
print(total)

46
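
The same count can be written as a one-liner (a sketch, continuing the model above):

total = sum(parameter.numel() for parameter in model.parameters())
print(total)  # 46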



Let's practice!
Learning rate and momentum

Updating weights with SGD
Training a neural network = solving an optimization problem.
Stochastic Gradient Descent (SGD) optimizer:

import torch.optim as optim

sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

Two parameters:

learning rate: controls the step size

momentum: controls the inertia of the optimizer

Bad values can lead to:

long training times

bad overall performance (poor accuracy)
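
Putting the optimizer above into a full update step, a minimal sketch (the model, loss criterion, and data tensors are illustrative assumptions):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 2))                         # illustrative model
criterion = nn.CrossEntropyLoss()
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

features = torch.randn(16, 4)                                  # illustrative batch
target = torch.randint(0, 2, (16,))

sgd.zero_grad()                                 # reset gradients from the previous step
loss = criterion(model(features), target)       # forward pass and loss
loss.backward()                                 # backpropagation
sgd.step()                                      # update the weights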



Impact of the learning rate: optimal learning rate (figure)

Impact of the learning rate: small learning rate (figure)

Impact of the learning rate: high learning rate (figure)


Without momentum
lr = 0.01, momentum = 0: after 100 steps, the minimum found is at x = -1.23 and y = -0.14 (figure)

With momentum
lr = 0.01, momentum = 0.9: after 100 steps, the minimum found is at x = 0.92 and y = -2.04 (figure)


Summary

Learning rate:

controls the step size

too small leads to long training times

too high leads to poor performance

typical values between 10⁻⁴ and 10⁻²

Momentum:

controls the inertia of the optimizer

zero momentum can lead to the optimizer getting stuck in a local minimum

non-zero momentum can help find the function minimum

typical values between 0.85 and 0.99



Let's practice!
Layer initialization and transfer learning

Layer initialization
import torch.nn as nn
layer = nn.Linear(64, 128)
print(layer.weight.min(), layer.weight.max())

(tensor(-0.1250, grad_fn=<MinBackward1>), tensor(0.1250, grad_fn=<MaxBackward1>))

Layer weights are initialized to small values

Layer outputs can explode if inputs and weights are not normalized

Weights can be initialized using different methods (e.g., with a uniform distribution)



Layer initialization
import torch.nn as nn

layer = nn.Linear(64, 128)
nn.init.uniform_(layer.weight)
print(layer.weight.min(), layer.weight.max())

(tensor(0.0002, grad_fn=<MinBackward1>), tensor(1.0000, grad_fn=<MaxBackward1>))
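
Other initialization functions follow the same in-place pattern; a minimal sketch using a normal distribution (the chosen method and its parameters are illustrative):

import torch.nn as nn

layer = nn.Linear(64, 128)
nn.init.normal_(layer.weight, mean=0.0, std=0.01)  # in-place, like nn.init.uniform_
nn.init.zeros_(layer.bias)                          # biases can be initialized too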



Transfer learning and fine-tuning
Transfer learning: reusing a model trained on a first task for a second, similar task, to accelerate the training process.

import torch
import torch.nn as nn

layer = nn.Linear(64, 128)

torch.save(layer, 'layer.pth')

new_layer = torch.load('layer.pth')
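
A minimal sketch of reusing the loaded layer for a second task (the new output layer and its size are illustrative assumptions):

import torch
import torch.nn as nn

new_layer = torch.load('layer.pth')        # layer trained on the first task
model = nn.Sequential(new_layer,           # reused, pretrained weights
                      nn.Linear(128, 2))   # new output layer for the second task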



Transfer learning and fine-tuning
Fine-tuning = a type of transfer learning:

smaller learning rate

not every layer is trained (we freeze some of them)

Rule of thumb: freeze the early layers of the network and fine-tune the layers closer to the output layer.

import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128),
                      nn.Linear(128, 256))

# Freeze the first layer's weights: frozen parameters are not updated during training
for name, param in model.named_parameters():
    if name == '0.weight':
        param.requires_grad = False
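
A quick check of which parameters remain trainable after freezing (a sketch continuing the loop above):

for name, param in model.named_parameters():
    print(name, param.requires_grad)

# 0.weight False   <- frozen
# 0.bias True
# 1.weight True
# 1.bias True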



Let's practice!