Chapter 3
Activation functions
INTRODUCTION TO DEEP LEARNING WITH PYTORCH
The rectified linear unit (ReLU): f(x) = max(x, 0)
In PyTorch:

import torch.nn as nn

relu = nn.ReLU()
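For instance, applying relu from the snippet above to a small tensor keeps positive values and zeroes out negatives (the example values are illustrative):

import torch

x = torch.tensor([-2.0, 0.0, 3.0])
print(relu(x))  # tensor([0., 0., 3.])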
Increasing the number of hidden layers = increasing the number of parameters = increasing
the model capacity
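As a running example, consider a network with 8 input features, a first hidden layer of 4 neurons, and an output layer of 2 neurons (a sketch inferred from the parameter counts below; the variable name model is assumed by the counting code):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 4),  # 4 neurons, each with 8 weights + 1 bias = 36 parameters
    nn.Linear(4, 2),  # 2 neurons, each with 4 weights + 1 bias = 10 parameters
)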
total = 0
for parameter in model.parameters():
    total += parameter.numel()
print(total)

46

Manually calculating the number of parameters:
first layer has 4 neurons, each neuron has 8 + 1 parameters = 36 parameters
second layer has 2 neurons, each neuron has 4 + 1 parameters = 10 parameters
total = 46 learnable parameters
Two parameters:
learning rate: controls the step size
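As a sketch, the learning rate is set when the optimizer is created; torch.optim.SGD, the value 0.001, and reusing model from the example above are assumptions for illustration:

import torch.optim as optim

# lr controls the step size taken at each parameter update
optimizer = optim.SGD(model.parameters(), lr=0.001)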
The outputs of a layer can explode if the inputs and the weights are not normalized.
The weights can be initialized using different methods (for example, using a uniform
distribution)
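A minimal sketch of uniform initialization, assuming a small custom module with a single linear sublayer named fc (the names and sizes are illustrative, chosen to match the print statement below):

import torch.nn as nn

class CustomLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(64, 128)

custom_layer = CustomLayer()
# Re-initialize the weights in place with a uniform distribution
nn.init.uniform_(custom_layer.fc.weight)

Checking the range of the initialized weights: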
print(custom_layer.fc.weight.min(), custom_layer.fc.weight.max())
For example, suppose we have trained a first model on a large dataset of data scientist salaries across the US, and we now want to train a new model on a smaller dataset of salaries in Europe. Rather than starting from scratch, we can transfer the layers learned on the first task and reuse them in the second model.
import torch

# Load a layer saved earlier with torch.save(layer, 'layer.pth')
new_layer = torch.load('layer.pth')
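A sketch of how the loaded layer could be reused in a new model, assuming the saved object is an nn.Linear layer (all names and sizes here are illustrative):

import torch.nn as nn

# Reuse the pre-trained layer as the first layer of a new network
model = nn.Sequential(
    new_layer,
    nn.Linear(new_layer.out_features, 1),
)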
Rule of thumb: freeze the early layers of the network and fine-tune the layers closer to the output layer, as in the sketch below.
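A minimal sketch of freezing the first layer of a small network while leaving later layers trainable (the architecture and sizes are illustrative):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
)

for name, param in model.named_parameters():
    # Parameters of the first Linear layer are named '0.weight' and '0.bias'
    if name.startswith('0.'):
        param.requires_grad = False  # frozen: excluded from gradient updates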