
Introduction to Neural Network - Week 3

Post Graduate Program in Artificial Intelligence & Machine Learning (AIML)

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Brain Stormer
Q1. Backpropagation is a learning technique that adjusts the weights in a neural network by propagating weight changes.

a. Forward from input to output
b. Backward from output to input
c. Forward from input to hidden layers
d. Backward from hidden layers to input

Q2. What is the sigmoid activation function in a neural network?


Answer:
A weighted sum of inputs is passed through an activation function, and this output serves as the input to the next layer. When the activation function for a neuron is the sigmoid function, the output of that unit is guaranteed to always lie between 0 and 1.
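A minimal NumPy sketch of this idea (the weights, inputs, and bias are illustrative values, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs to the neuron
w = np.array([0.4, 0.7, -0.2])   # weights (illustrative)
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum of inputs
a = sigmoid(z)                   # activation passed to the next layer
print(a)                         # ~0.24, always between 0 and 1
```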

Week 3: Introduction to neural networks and deep learning

Learning Objectives
❖ Types of Optimizers
❖ Weight initialization
❖ Regularization
❖ Dropout
❖ Batch Normalization
❖ Types of neural networks
❖ Case study
❖ Questions

Different types of optimizers
1. SGD with Momentum
This method computes an exponentially weighted average of past gradients, so it takes less time to converge than plain stochastic gradient descent.

2. Adagrad (Adaptive Gradient Algorithm)
Adagrad does not use the momentum concept; instead it maintains a separate, adaptive learning rate for each parameter, which makes it simpler than SGD with momentum.

3. RMSProp (Root Mean Square Propagation)
RMSProp also automatically adjusts the learning rate for each parameter, using a moving average of squared gradients.

4. Adam
Adam combines the characteristics of both SGD with momentum and RMSProp.

References: Paperspace, Medium
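A minimal sketch of how these four optimizers are instantiated, assuming PyTorch; the model, dummy data, and learning rates are illustrative, not prescribed by the slides:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

optimizers = {
    "SGD with momentum": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
    "Adagrad":           torch.optim.Adagrad(model.parameters(), lr=0.01),
    "RMSProp":           torch.optim.RMSprop(model.parameters(), lr=0.001),
    "Adam":              torch.optim.Adam(model.parameters(), lr=0.001),
}

x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch
loss_fn = nn.MSELoss()

opt = optimizers["Adam"]        # pick any one of the four
opt.zero_grad()                 # clear old gradients
loss = loss_fn(model(x), y)     # forward pass
loss.backward()                 # backpropagate
opt.step()                      # optimizer-specific weight update
```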

Weight Initialization

Why Initialize Weights
The aim of weight initialization is to prevent layer activation outputs from exploding or vanishing during the course of a
forward pass through a deep neural network. If either occurs, loss gradients will either be too large or too small to flow
backwards beneficially, and the network will take longer to converge, if it is even able to do so at all.
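A small NumPy sketch of why the scale of the initial weights matters (the depth, width, and scales are illustrative): too small and activations shrink toward zero layer by layer; too large and the tanh units saturate.

```python
import numpy as np

def forward_stats(scale, depth=20, width=256):
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1000, width))
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * scale
        x = np.tanh(x @ W)          # activations after each layer
    return x.std()

print("small init :", forward_stats(0.01))                 # activations vanish toward 0
print("large init :", forward_stats(1.0))                  # tanh saturates at ±1, gradients vanish
print("scaled init:", forward_stats(1.0 / np.sqrt(256)))   # stays in a healthy range
```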

What Happens When W = 0 Initialization Is Used

[Figure: a small network diagram with an input layer, a hidden layer, and an output layer.]

Setting W = 0 serves almost no purpose: every neuron performs the same calculation in each iteration and produces the same output, so all neurons learn the same features. This problem is known as the network failing to break symmetry.

Text source: Medium
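A small NumPy sketch of the symmetry problem (one hidden layer, illustrative sizes): with W = 0, every hidden unit computes the same output and receives the same gradient, so the units can never become different from one another.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4))        # one input example
t = np.array([[1.0]])                  # target

W1 = np.zeros((4, 8))                  # zero-initialized hidden weights
W2 = np.zeros((8, 1))                  # zero-initialized output weights

h = 1 / (1 + np.exp(-(x @ W1)))        # hidden activations: all exactly 0.5
y = h @ W2                             # output: 0
print(h)                               # every hidden unit is identical

dL_dy = y - t                          # gradient of 0.5 * (y - t)^2
dL_dh = dL_dy @ W2.T                   # all zeros, because W2 is zero
dL_dW1 = x.T @ (dL_dh * h * (1 - h))   # identical (here zero) for every unit
print(dL_dW1)                          # W1 never moves away from symmetry
```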

Initialization Techniques

● Zero initialization
● Random initialization
● Xavier (Glorot) initialization
● He (Kaiming) initialization
● And many more
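A minimal sketch of a few of these schemes, assuming PyTorch; the layer size and the choice of distributions are illustrative defaults:

```python
import torch.nn as nn

layer = nn.Linear(256, 128)

nn.init.zeros_(layer.weight)                       # zero initialization (fails to break symmetry)
nn.init.normal_(layer.weight, mean=0.0, std=0.01)  # plain random initialization
nn.init.xavier_uniform_(layer.weight)              # Xavier/Glorot, suited to tanh/sigmoid
nn.init.kaiming_normal_(layer.weight,
                        nonlinearity="relu")       # He/Kaiming, suited to ReLU
nn.init.zeros_(layer.bias)                         # biases are usually set to zero
```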

Regularization

Data Augmentation

Data augmentation is a technique to artificially create new training data from existing training data.

Image data augmentation is perhaps the most well-known type of data augmentation. It involves creating transformed versions of the images in the training dataset that belong to the same class as the original image.

Why We Should Do Data Augmentation

● We may not have a big dataset, so augmentation creates more training data.
● It helps in regularizing the network.

Data Augmentation Pipeline
[Figure: the augmentation pipeline — load image and label ("Dog") → transform image → CNN → compute loss.]
Data Augmentation Techniques
● Horizontal flips
● Rotation
● Crop/scale
● Color jitter
● Other creative techniques
○ Random mixes/combinations of:
■ Translation (what about a pure ConvNet?)
■ Rotation
■ Stretching
■ Shearing
■ Lens distortions
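A minimal sketch of the pipeline and several of the transforms above, assuming torchvision is available; the specific transform parameters and the random stand-in image are illustrative:

```python
import numpy as np
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),        # horizontal flips
    transforms.RandomRotation(degrees=15),         # rotation
    transforms.RandomResizedCrop(size=224),        # crop/scale
    transforms.ColorJitter(brightness=0.2,
                           contrast=0.2),          # color jitter
    transforms.ToTensor(),
])

# Load image and label -> transform image -> CNN -> compute loss
img = Image.fromarray(np.uint8(np.random.rand(256, 256, 3) * 255))  # stand-in for a "dog" photo
x = augment(img).unsqueeze(0)    # a fresh random variant on every pass
# logits = cnn(x); loss = criterion(logits, label)   # as in the pipeline slide
```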

Dropout
Dropout is a regularization method that approximates training a large number of neural networks with different
architectures in parallel.

During training, some number of layer outputs are randomly ignored or "dropped out." This has the effect of making the layer look like, and be treated like, a layer with a different number of nodes and connectivity to the prior layer. In effect, each update to a layer during training is performed with a different "view" of the configured layer.
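A minimal sketch assuming PyTorch (the architecture is illustrative): nn.Dropout randomly zeroes hidden activations during training and is disabled at evaluation time.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),    # each forward pass drops a different 50% of units
    nn.Linear(64, 1),
)

x = torch.randn(8, 20)
model.train()             # dropout active: two passes give different outputs
print(model(x).flatten()[:3], model(x).flatten()[:3])
model.eval()              # dropout off: outputs are deterministic
print(model(x).flatten()[:3])
```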

Dropout – How It Works

Forces the network to have a redundant representation.

[Figure: a "cat score" unit fed by features such as "has an ear", "has a tail", "is furry", "has claws", and "mischievous look"; the features marked X are dropped, so the score cannot rely on any single feature.]

Dropout
Another interpretation:

Dropout is training a large ensemble of models (that share parameters).

Each binary mask is one model, and it gets trained on only ~one batch.
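A small NumPy sketch of this ensemble view (illustrative sizes): each mini-batch samples a fresh binary mask, i.e. a different sub-network over the same shared weights.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                   # drop probability
h = rng.standard_normal((4, 6))           # hidden activations for one batch

for batch in range(3):
    mask = (rng.random(h.shape) >= p)     # one binary mask = one "model"
    h_dropped = h * mask / (1 - p)        # inverted dropout keeps the expected scale
    print(f"batch {batch}: kept units per example =", mask.sum(axis=1))
```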

Batch Normalization
• Because of the normalization "layers" inserted between the fully connected layers, the range of the input distribution of each layer stays the same, no matter how the previous layer changes.

• Given inputs x^(k) from the k-th neuron, normalization computes x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)]).

• Normalization centers all the inputs around 0, so there is not much change in each layer's input.

Text and image source: Medium
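A small NumPy sketch of the normalization formula above (the batch and feature sizes are illustrative): each feature of a batch is normalized to roughly zero mean and unit variance, then rescaled by learnable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))   # a batch with shifted, scaled inputs

mean = x.mean(axis=0)                               # E[x^(k)] per neuron k
var = x.var(axis=0)                                 # Var[x^(k)] per neuron k
x_hat = (x - mean) / np.sqrt(var + 1e-5)            # normalized activations

gamma, beta = 1.0, 0.0                              # learnable scale and shift
y = gamma * x_hat + beta

print(x_hat.mean(axis=0).round(3))                  # ~0 for every neuron
print(x_hat.std(axis=0).round(3))                   # ~1 for every neuron
```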

Types of Neural Network

● Feed Forward Neural Network
● Convolutional Neural Network
● Recurrent Neural Network
● LSTM (Long Short-Term Memory)
● Sequence to Sequence Models
Thank you! :)
Questions are always welcome
