
Omar Arif

[email protected]
National University of Sciences and Technology
 The Perceptron – Basic building block of neural networks
 Neural Networks – Stacking perceptrons to build neural networks
 Loss Minimization – Gradient descent
 Implementing ANNs – How to use libraries to implement neural networks
 Training ANNs
Building Block of Neural Network
 𝒙 = [𝑥0, 𝑥1, 𝑥2]ᵀ, 𝐰 = [𝑤0, 𝑤1, 𝑤2]ᵀ, where 𝑥0 = 1 so that 𝑤0 acts as the bias term

 ℎ𝒘(𝒙) = g(𝐰ᵀ𝒙), where g is a non-linearity (the activation function)


 The activation function introduces non-linearity into the network

 Sigmoid: 𝜎(𝑧) = 1 / (1 + 𝑒⁻ᶻ)

 Rectified Linear Unit: relu(𝑧) = max(𝑧, 0)

 Softplus: softplus(𝑧) = log(1 + 𝑒ᶻ)
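A minimal sketch of these pieces in Python (NumPy assumed; the example input and weights below are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0)

def softplus(z):
    return np.log(1.0 + np.exp(z))

# Perceptron: h_w(x) = g(w^T x), with x[0] = 1 acting as the bias input
def perceptron(x, w, g=sigmoid):
    return g(np.dot(w, x))

x = np.array([1.0, 0.5, -0.5])   # x0 = 1 (bias), x1, x2
w = np.array([0.1, 0.8, -0.3])   # w0 (bias weight), w1, w2
print(perceptron(x, w))          # output of a single unit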


𝒙𝟏  𝒙𝟐  𝒉 (OR)
0   0   0
0   1   1
1   0   1
1   1   1

𝒙𝟏  𝒙𝟐  𝒉 (AND)
0   0   0
0   1   0
1   0   0
1   1   1
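These two tables are exactly what a single perceptron with a step activation can compute; the weights below are one possible choice (assumed for illustration, not taken from the slides):

import numpy as np

def step(z):
    return (z > 0).astype(float)

def perceptron(x1, x2, w):
    # input vector with x0 = 1 for the bias weight w0
    x = np.array([1.0, x1, x2])
    return step(np.dot(w, x))

w_or  = np.array([-0.5, 1.0, 1.0])   # fires when x1 + x2 > 0.5
w_and = np.array([-1.5, 1.0, 1.0])   # fires when x1 + x2 > 1.5

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, perceptron(x1, x2, w_or), perceptron(x1, x2, w_and))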
 Non-Linear Decision Boundary

𝒙𝟏  𝒙𝟐  𝒉 (XOR)
0   0   0
0   1   1
1   0   1
1   1   0

 Using the basic perceptron, we cannot approximate a non-linear function such as XOR
 Feature Engineering: use higher-order features such as 𝑥² or 𝑥³ to obtain a non-linear function
 Problem: we don't know which features to choose

 We would like to automate things and let the algorithm choose the features

 Neural networks allow us to automatically learn the representation/features of a linear classifier, geared towards the desired task, rather than specifying them all by hand
Building neural networks by stacking perceptrons

x1  x2  h₁ = AND(x1, x2)  h₂ = NOR(x1, x2)  h = OR(h₁, h₂)
0   0   0                 1                 1
0   1   0                 0                 0
1   0   0                 0                 0
1   1   1                 0                 1
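A minimal sketch of this stacking in Python (weights hand-picked for illustration): the hidden layer computes AND and NOR of the inputs and the output unit ORs them, reproducing the table above (the XNOR function, which no single perceptron can represent):

import numpy as np

def step(z):
    return (z > 0).astype(float)

def layer(x, W, b):
    # one layer of perceptrons: step(W x + b)
    return step(W @ x + b)

# Hidden layer: unit 1 = AND(x1, x2), unit 2 = NOR(x1, x2)
W1 = np.array([[ 1.0,  1.0],
               [-1.0, -1.0]])
b1 = np.array([-1.5, 0.5])

# Output layer: OR of the two hidden units
W2 = np.array([[1.0, 1.0]])
b2 = np.array([-0.5])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = layer(np.array([x1, x2], dtype=float), W1, b1)
    y = layer(h, W2, b2)
    print(x1, x2, int(y[0]))   # prints the XNOR truth table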
The loss function tells us how good our neural network is
 Optimization problem

Training data: 𝐷train = {(𝑥⁽ⁱ⁾, 𝑦⁽ⁱ⁾) : 𝑖 = 1, …, 𝑚}

min_𝒘 𝐽(𝒘, 𝐷train), where 𝐽(𝒘, 𝐷train) = (1/𝑚) Σ_{(𝑥,𝑦) ∈ 𝐷train} 𝐿𝑜𝑠𝑠(𝑥, 𝑦, 𝒘)

 Goal: compute the gradient 𝛻𝒘 𝐽(𝒘, 𝐷train)
 Mean squared error loss

𝐿𝑜𝑠𝑠(𝑥, 𝑦, 𝒘) = (ℎ𝒘(𝑥) − 𝑦)²

 Binary Cross Entropy Loss (Logistic Loss)

𝐿𝑜𝑠𝑠(𝑥, 𝑦, 𝒘) = −𝑦 log(ℎ𝒘(𝑥)) − (1 − 𝑦) log(1 − ℎ𝒘(𝑥))
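A minimal sketch of the two losses in PyTorch (the tensors below are made-up examples; torch.nn.MSELoss and torch.nn.BCELoss implement the same formulas):

import torch

y_hat = torch.tensor([0.9, 0.2, 0.7])   # network outputs h_w(x), assumed in (0, 1)
y     = torch.tensor([1.0, 0.0, 1.0])   # targets

# Mean squared error: (h_w(x) - y)^2, averaged over the batch
mse = ((y_hat - y) ** 2).mean()

# Binary cross entropy: -y log(h) - (1 - y) log(1 - h), averaged over the batch
bce = (-(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))).mean()

print(mse.item(), bce.item())
print(torch.nn.MSELoss()(y_hat, y).item(), torch.nn.BCELoss()(y_hat, y).item())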


 Forward Pass: compute the output of the network

 Backward Pass: compute gradients

See Backpropagation_examples.pdf
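A small sketch of the two passes with PyTorch autograd (the single-example data and weights are made up; loss.backward() computes the gradients via backpropagation):

import torch

w = torch.randn(3, requires_grad=True)   # weights, including the bias weight w0
x = torch.tensor([1.0, 0.5, -0.5])       # input with x0 = 1
y = torch.tensor(1.0)                    # target

# Forward pass: compute the output of the network and the loss
h = torch.sigmoid(torch.dot(w, x))
loss = -(y * torch.log(h) + (1 - y) * torch.log(1 - h))

# Backward pass: compute gradients of the loss w.r.t. the weights
loss.backward()
print(w.grad)   # dLoss/dw, used in the gradient descent update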
Batch Gradient Descent
 Initialize weights randomly
 Loop
 Compute gradient 𝜕𝐽(𝒘)/𝜕𝒘
 Update 𝒘: 𝒘 ≔ 𝒘 − 𝜶 𝜕𝐽(𝒘)/𝜕𝒘

Stochastic Gradient Descent
 Initialize weights randomly
 Loop
 For each data point (𝑥, 𝑦) in 𝐷train
 Compute gradient 𝜕𝐿𝑜𝑠𝑠(𝑥, 𝑦, 𝒘)/𝜕𝒘
 Update 𝒘: 𝒘 ≔ 𝒘 − 𝜶 𝜕𝐿𝑜𝑠𝑠(𝑥, 𝑦, 𝒘)/𝜕𝒘
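A sketch of both loops in PyTorch for a simple logistic-regression model (the data, learning rate, and epoch counts are assumed placeholders):

import torch
import torch.nn.functional as F

X = torch.randn(100, 3)                   # 100 training examples, 3 features
y = torch.randint(0, 2, (100,)).float()   # binary labels
w = torch.zeros(3, requires_grad=True)
alpha = 0.1

def loss_fn(xb, yb):
    h = torch.sigmoid(xb @ w)
    return F.binary_cross_entropy(h, yb)

# Batch gradient descent: one update per pass over the whole training set
for epoch in range(10):
    loss = loss_fn(X, y)
    loss.backward()
    with torch.no_grad():
        w -= alpha * w.grad
    w.grad.zero_()

# Stochastic gradient descent: one update per training example
for epoch in range(10):
    for i in range(len(X)):
        loss = loss_fn(X[i:i+1], y[i:i+1])
        loss.backward()
        with torch.no_grad():
            w -= alpha * w.grad
        w.grad.zero_()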
CIFAR10
MNIST
Fashion-MNIST
 The softmax function takes as input a vector of k real numbers and normalizes it into a probability distribution

 softmax(𝑦ᵢ) = 𝑒^(𝑦ᵢ) / Σⱼ₌₁ᵏ 𝑒^(𝑦ⱼ)

 softmax([𝑦1, …, 𝑦k]) = (1 / Σⱼ₌₁ᵏ 𝑒^(𝑦ⱼ)) [𝑒^(𝑦1), …, 𝑒^(𝑦k)] = [𝑝(𝑦 = 1|𝑥, 𝑤), …, 𝑝(𝑦 = 𝑘|𝑥, 𝑤)]

 Negative Log-Likelihood Loss = −log(softmax(𝑦)ᶜ), the negative log of the probability assigned to the true class c
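A short PyTorch sketch of softmax and the negative log-likelihood loss (the scores and label are made up; torch's nll_loss expects log-probabilities):

import torch
import torch.nn.functional as F

scores = torch.tensor([[2.0, 0.5, -1.0]])   # raw outputs y_1..y_k for one example
target = torch.tensor([0])                  # index of the true class

probs = F.softmax(scores, dim=1)            # e^{y_i} / sum_j e^{y_j}
print(probs, probs.sum())                   # normalized, sums to 1

# NLL loss = -log(probability assigned to the true class)
log_probs = F.log_softmax(scores, dim=1)
print(F.nll_loss(log_probs, target))
print(F.cross_entropy(scores, target))      # same value: cross_entropy = log_softmax + nll_loss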
The activation function of all neurons in the hidden layer is ReLU
The output neurons implement LogSoftmax
For complete code see cifar10linear.ipynb
See mnist_classification.ipynb
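The referenced notebooks are not reproduced here; a minimal sketch of such a model in PyTorch (the layer sizes are assumed, e.g. flattened 28×28 images and 10 classes) might look like:

import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),                  # flatten 1x28x28 images to vectors of 784 values
    nn.Linear(784, 128),           # hidden layer
    nn.ReLU(),                     # ReLU activation on all hidden neurons
    nn.Linear(128, 10),            # one output per class
    nn.LogSoftmax(dim=1),          # log-probabilities, paired with NLLLoss
)
criterion = nn.NLLLoss()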
Labels
● 0 T-shirt/top
● 1 Trouser
● 2 Pullover
● 3 Dress
● 4 Coat
● 5 Sandal
● 6 Shirt
● 7 Sneaker
● 8 Bag
● 9 Ankle boot

 Deadline: Submit .ipynb file by 16th Feb., midnight


Mini-batch gradient descent
Learning Rate
Avoiding overfitting
 Batch gradient descent
Batch gradient descent computes the gradient of the cost function w.r.t. the parameters 𝑤 on the entire training dataset
 Stochastic gradient descent
Stochastic gradient descent (SGD), in contrast, performs a parameter update for each training example (𝑥⁽ⁱ⁾, 𝑦⁽ⁱ⁾)
 Mini-batch gradient descent
Mini-batch gradient descent performs an update for every mini-batch of 𝑛_batchsize training examples (see the sketch below)
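A minimal mini-batch training loop sketch in PyTorch (the dataset, model, and batch size are assumed placeholders; torch.utils.data.DataLoader handles the batching):

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 1000 examples with 20 features, 10 classes
X = torch.randn(1000, 20)
y = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:              # one parameter update per mini-batch
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()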
 How to choose the learning rate 𝜶?

𝒘 ≔ 𝒘 − 𝜶 𝜕𝐽/𝜕𝒘

 A small learning rate converges slowly, while a large learning rate overshoots

 Loss Landscape of Neural Nets is non-convex
 Momentum is a method that helps accelerate SGD in the relevant
direction and dampens oscillations

𝑣𝑡 = 𝛾𝑣𝑡−1 + 𝛼𝛻𝑤 𝐽
𝑤 ≔ 𝑤 − 𝑣𝑡
from torch import optim
optimizer = optim.SGD(h.parameters(), lr=0.001, momentum=0.9)   # h is the network being trained

http://ruder.io/optimizing-gradient-descent/
Learning rate is not fixed
 Adam (Adaptive Moment Estimation)
 torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999))

 Adagrad – adapts the learning rate for each weight
torch.optim.Adagrad(params, lr=0.01)

https://pytorch.org/docs/stable/optim.html
1. L2 Weight Regularization

𝐽(𝑤) = 𝐿𝑜𝑠𝑠(𝑥, 𝑦, 𝑤) + 𝜆 Σ 𝑤²

torch.optim.SGD(params, lr=<>, momentum=0, weight_decay=0)

Set weight_decay to 𝜆
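Equivalently, the penalty can be added to the loss by hand, as a sketch (the model and λ below are assumed placeholders; weight_decay in the optimizer has the same effect up to a constant factor):

import torch
from torch import nn

model = nn.Linear(20, 10)        # placeholder model
lam = 1e-4                       # regularization strength (lambda)

def l2_penalty(model):
    # lambda * sum of squared weights over all parameters
    return lam * sum((p ** 2).sum() for p in model.parameters())

x = torch.randn(8, 20)
y = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y) + l2_penalty(model)
loss.backward()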
2. Dropout:
 During training, randomly select neurons and remove them, along with their incoming and outgoing connections
 Forces the network to spread what it learns across all neurons rather than relying on a few

torch.nn.functional.dropout(input, p=0.5)
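In model code, the module form is a common equivalent sketch (layer sizes and dropout rate are assumed; dropout is only active in training mode):

from torch import nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),       # randomly zeroes 50% of the activations during training
    nn.Linear(256, 10),
)

model.train()   # dropout active
model.eval()    # dropout disabled for evaluation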
3. Early Stopping:
 Stop training before the network starts to overfit, e.g. when the validation loss stops improving (see the sketch below)
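A minimal early-stopping sketch (train_one_epoch and validation_loss are assumed placeholder functions; the patience value is illustrative):

best_val = float("inf")
patience, bad_epochs = 5, 0

for epoch in range(100):
    train_one_epoch(model, optimizer)        # placeholder: one pass over the training set
    val_loss = validation_loss(model)        # placeholder: loss on held-out validation data

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0   # validation still improving, keep going
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # no improvement for `patience` epochs
            print("Early stopping at epoch", epoch)
            break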
 The Perceptron – Basic building block of neural networks
 Neural Networks – Stacking perceptrons to build neural networks
 Loss Minimization – Gradient descent
 Implementing ANNs – How to use libraries to implement neural networks
 Training ANNs
