Chapter 3

Discovering activation functions between layers

Maham Faisal Khan


Senior Data Science Content Developer
Limitations of the sigmoid and softmax function
Sigmoid functions:

Bounded between 0 and 1

Can be used anywhere in the network


Gradients:

Approach zero for low and high values of x

Cause the function to saturate

Sigmoid function saturation can lead to vanishing gradients during backpropagation.

This is also a problem for softmax.
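
A minimal sketch illustrating the saturation problem numerically (the input values are illustrative):

import torch
import torch.nn as nn

# Sigmoid saturation: for large |x| the gradient of the sigmoid is close to zero
sigmoid = nn.Sigmoid()
x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
sigmoid(x).sum().backward()
print(x.grad)  # ~[4.5e-05, 0.25, 4.5e-05]: near-zero gradients at the extremes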

Introducing ReLU
Rectified Linear Unit (ReLU):

f(x) = max(x, 0)

For positive inputs, the output is equal to the input.

For strictly negative inputs, the output is equal to zero.

Overcomes the vanishing gradients problem.

In PyTorch:

relu = nn.ReLU()
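
A short usage sketch (the input tensor is illustrative):

import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, 0.0, 3.0])
print(relu(x))  # tensor([0., 0., 3.]): negative inputs are zeroed, positive inputs pass through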



Introducing Leaky ReLU
Leaky ReLU:

For positive inputs, it behaves similarly to ReLU.

For negative inputs, it multiplies the input by a small coefficient (0.01 by default).

The gradients for negative inputs are never zero.

In PyTorch:

leaky_relu = nn.LeakyReLU(negative_slope=0.05)
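
A short usage sketch (the input tensor is illustrative):

import torch
import torch.nn as nn

leaky_relu = nn.LeakyReLU(negative_slope=0.05)
x = torch.tensor([-2.0, 0.0, 3.0])
print(leaky_relu(x))  # tensor([-0.1000, 0.0000, 3.0000]): negative inputs are scaled by 0.05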



Let's practice!
A deeper dive into neural network architecture

Layers are made of neurons
Linear layers are fully connected: each neuron of a layer is connected to each neuron of the previous layer.

A neuron of a linear layer:

computes a linear operation using all neurons of the previous layer

contains N + 1 learnable parameters, where N is the dimension of the previous layer's outputs (see the sketch below)
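
A minimal sketch checking the N + 1 parameters per neuron (the layer sizes are illustrative):

import torch.nn as nn

layer = nn.Linear(4, 2)      # 2 neurons, each fed by N = 4 inputs
print(layer.weight.shape)    # torch.Size([2, 4]): N weights per neuron
print(layer.bias.shape)      # torch.Size([2]): plus one bias per neuron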



Layer naming convention



Tweaking the number of hidden layers
Input and output layer dimensions are fixed:

the input layer depends on the number of features, n_features

the output layer depends on the number of classes, n_classes

model = nn.Sequential(nn.Linear(n_features, 8),
                      nn.Linear(8, 4),
                      nn.Linear(4, n_classes))

We can use as many hidden layers as we want.

Increasing the number of hidden layers = increasing the number of parameters = increasing the model capacity.
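
A minimal sketch of the same architecture with the ReLU activations from earlier inserted between the linear layers (n_features and n_classes are illustrative values):

import torch.nn as nn

n_features, n_classes = 10, 3   # illustrative dimensions

model = nn.Sequential(nn.Linear(n_features, 8),
                      nn.ReLU(),
                      nn.Linear(8, 4),
                      nn.ReLU(),
                      nn.Linear(4, n_classes))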



Counting the number of parameters
Given the following model:

model = nn.Sequential(nn.Linear(8, 4),
                      nn.Linear(4, 2))

Manually calculating the number of parameters:

the first layer has 4 neurons, each with 8 + 1 parameters = 36 parameters

the second layer has 2 neurons, each with 4 + 1 parameters = 10 parameters

total = 46 learnable parameters

Using PyTorch, .numel() returns the number of elements in a tensor:

total = 0
for parameter in model.parameters():
    total += parameter.numel()
print(total)

46
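
The same count can be written as a one-liner (a sketch, continuing the model above):

total = sum(parameter.numel() for parameter in model.parameters())
print(total)  # 46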



Let's practice!
Learning rate and momentum

Updating weights with SGD
Training a neural network = solving an optimization problem.
Stochastic Gradient Descent (SGD) optimizer:

import torch.optim as optim

sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

Two parameters:

learning rate: controls the step size

momentum: controls the inertia of the optimizer

Bad values can lead to:

long training times

bad overall performance (poor accuracy)
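
Putting the optimizer above into a full update step, a minimal sketch (the model, loss criterion, and data tensors are illustrative assumptions):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 2))                         # illustrative model
criterion = nn.CrossEntropyLoss()
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

features = torch.randn(16, 4)                                  # illustrative batch
target = torch.randint(0, 2, (16,))

sgd.zero_grad()                                 # reset gradients from the previous step
loss = criterion(model(features), target)       # forward pass and loss
loss.backward()                                 # backpropagation
sgd.step()                                      # update the weights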



Impact of the learning rate: optimal learning rate (figure)

Impact of the learning rate: small learning rate (figure)

Impact of the learning rate: high learning rate (figure)


Without momentum
lr = 0.01, momentum = 0: after 100 steps, the minimum found is at x = -1.23 and y = -0.14 (figure)

With momentum
lr = 0.01, momentum = 0.9: after 100 steps, the minimum found is at x = 0.92 and y = -2.04 (figure)


Summary

Learning rate:

controls the step size

too small leads to long training times

too high leads to poor performance

typical values between 10⁻⁴ and 10⁻²

Momentum:

controls the inertia of the optimizer

zero momentum can lead to the optimizer getting stuck in a local minimum

non-zero momentum can help find the function minimum

typical values between 0.85 and 0.99



Let's practice!
Layer initialization and transfer learning

Layer initialization
import torch.nn as nn
layer = nn.Linear(64, 128)
print(layer.weight.min(), layer.weight.max())

(tensor(-0.1250, grad_fn=<MinBackward1>), tensor(0.1250, grad_fn=<MaxBackward1>))

Layer weights are initialized to small values

Layer outputs can explode if inputs and weights are not normalized

Weights can be initialized using different methods (e.g., with a uniform distribution)



Layer initialization
import torch.nn as nn

layer = nn.Linear(64, 128)
nn.init.uniform_(layer.weight)
print(layer.weight.min(), layer.weight.max())

(tensor(0.0002, grad_fn=<MinBackward1>), tensor(1.0000, grad_fn=<MaxBackward1>))
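
Other initialization functions follow the same in-place pattern; a minimal sketch using a normal distribution (the chosen method and its parameters are illustrative):

import torch.nn as nn

layer = nn.Linear(64, 128)
nn.init.normal_(layer.weight, mean=0.0, std=0.01)  # in-place, like nn.init.uniform_
nn.init.zeros_(layer.bias)                          # biases can be initialized too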



Transfer learning and fine-tuning
Transfer learning: reusing a model trained on a first task for a second, similar task, to accelerate the training process.

import torch
import torch.nn as nn

layer = nn.Linear(64, 128)

torch.save(layer, 'layer.pth')

new_layer = torch.load('layer.pth')
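
A minimal sketch of reusing the loaded layer for a second task (the new output layer and its size are illustrative assumptions):

import torch
import torch.nn as nn

new_layer = torch.load('layer.pth')        # layer trained on the first task
model = nn.Sequential(new_layer,           # reused, pretrained weights
                      nn.Linear(128, 2))   # new output layer for the second task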



Transfer learning and fine-tuning
Fine-tuning = a type of transfer learning:

smaller learning rate

not every layer is trained (we freeze some of them)

Rule of thumb: freeze the early layers of the network and fine-tune the layers closer to the output layer.

import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128),
                      nn.Linear(128, 256))

# Freeze the first layer's weights: frozen parameters are not updated during training
for name, param in model.named_parameters():
    if name == '0.weight':
        param.requires_grad = False
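
A quick check of which parameters remain trainable after freezing (a sketch continuing the loop above):

for name, param in model.named_parameters():
    print(name, param.requires_grad)

# 0.weight False   <- frozen
# 0.bias True
# 1.weight True
# 1.bias True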



Let's practice!