Training Deep Neural Networks

Deep Learning Accessories

Ravindra Parmar · Towards Data Science · Sep 11, 2018 · 7 min read

Figure: a deep neural network with an input layer, three hidden layers and an output layer.

Deep neural networks are capable of solving problems that once seemed impossible for humans. To achieve a high level of accuracy, however, a huge amount of data, and consequently computing power, is needed to train these networks. Despite the computational complexity involved, we can follow certain guidelines to reduce training time and improve model accuracy. In this article we will look through a few of these techniques.

Data Pre-processing

The importance of data pre-processing can hardly be overstated: your neural network is only as good as the input data used to train it. If important data inputs are missing, the network may not be able to achieve the desired level of accuracy. On the other hand, if the data is not processed beforehand, it can affect both the accuracy and the performance of the network down the line.

Mean subtraction (zero centering)

This is the process of subtracting the mean from every data point to make the data zero-centered. Consider a case where the inputs to a neuron (unit) are all positive or all negative. In that case the gradients calculated during back-propagation all carry the same sign, so the weight updates are restricted to particular directions and optimization proceeds in inefficient zig-zag steps. Zero-centering the data avoids this.

Figure: original data vs. zero-centered data (mean subtraction).

Data Normalization

Normalization refers to rescaling the data so that it has the same scale across all dimensions. A common way to do that is to divide the data in each dimension by that dimension's standard deviation. However, this only makes sense if you have reason to believe that different input features have different scales but equal importance to the learning algorithm.

Figure: original data vs. data normalized across both dimensions.

Parameter Initialization

Deep neural networks are no strangers to millions or billions of parameters. The way these parameters are initialized can determine how fast the learning algorithm converges and how accurate it ends up being. The straightforward way is to initialize them all to zero. However, if we initialize the weights of a layer to all zeros, the gradients calculated will be the same for every unit in that layer, and hence the weight updates will be identical for all units. Consequently, that layer is as good as a single logistic regression unit.

Consider a 10-layer deep neural network, each layer consisting of 500 units and using the tanh activation function. (Just a note on tanh activation before proceeding further.)

Figure: the tanh activation function.

There are a few important points to remember about this activation as we move along:

- It is zero-centered.
- It saturates for large positive or negative inputs, squashing its output to the range -1 to +1, where its derivative is close to zero.

To start with, we initialize all weights from a standard Gaussian with zero mean and a standard deviation of 1e-2:

W = 0.01 * np.random.randn(fan_in, fan_out)

Unfortunately, this works well only for small networks. To see what issues it creates for deeper networks, plots are generated that depict the mean, standard deviation and activation distribution at each layer as we go deeper into the network.
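The statistics behind such plots are straightforward to reproduce. Below is a minimal NumPy sketch (not the author's original code; the variable names such as hidden_size and acts and the batch of 1000 random inputs are illustrative choices) that builds the 10-layer, 500-unit tanh network described above, initializes every weight matrix with 0.01 * randn, runs one forward pass and prints the mean and standard deviation of the activations at each layer.

import numpy as np

np.random.seed(0)

num_layers = 10      # depth used in the text
hidden_size = 500    # units per hidden layer
X = np.random.randn(1000, hidden_size)   # a batch of unit-Gaussian inputs

acts = {}
h = X
for layer in range(num_layers):
    fan_in, fan_out = h.shape[1], hidden_size
    W = 0.01 * np.random.randn(fan_in, fan_out)   # small-Gaussian initialization
    h = np.tanh(h.dot(W))                         # linear transform followed by tanh
    acts[layer] = h

# With such small weights the activations shrink layer by layer towards zero.
for layer, a in acts.items():
    print(f"layer {layer + 1:2d}: mean {a.mean():+.6f}   std {a.std():.6f}")

Swapping the 0.01 factor for 1.0, or for the Xavier scaling discussed below, reproduces the other regimes examined in the rest of this section.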
Figure: mean, standard deviation and activation distribution across layers (hidden layers 1 to 10) for the small-Gaussian initialization.

The standard deviation of the activations shrinks gradually as we go deeper into the network, until it collapses to zero. This, as well, is expected, since we are multiplying the inputs by very small weights at each layer. As a result, the gradients calculated would also be very small, and the updates to the weights would be negligible. Well, not so good!

Next, let's try initializing the weights with very large numbers. To do so, let's sample the weights from a standard Gaussian with zero mean and a standard deviation of 1.0 (instead of 0.01):

W = 1.0 * np.random.randn(fan_in, fan_out)

Below are the plots showing the mean, standard deviation and activation distribution for all layers.

Figure: mean, standard deviation and activation distribution across layers for the large-Gaussian initialization.

Nearly every unit saturates, because the large weights push the inputs deep into the tanh non-linearity (which squashes its output to the range -1 to +1). Consequently, the gradients calculated would also be very close to zero, as tanh saturates in these regimes (its derivative is zero). Once again, the updates to the weights would be almost negligible.

In practice, Xavier initialization is used for initializing the weights of all layers. The motivation behind Xavier initialization is to set the weights in such a way that they do not end up in the saturated regimes of the tanh activation, i.e. initialize with values that are neither too small nor too large. To achieve that, we scale by the number of inputs while randomly sampling from a standard Gaussian:

W = 1.0 * np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

However, this works well under the assumption that tanh is used for activation; it breaks down for other activation functions, e.g. ReLU. Proper initialization is still an active area of research.

Batch normalization

This is an idea closely related to what we have discussed so far. Remember, we normalized the input data before feeding it into the network. The underlying concern, commonly described as covariate shift, explains why, even after learning the mapping from some input to an output, we need to re-train the learning algorithm if the data distribution of that input changes. The issue is not confined to the input layer, however, as the data distribution can vary in the deeper layers as well: the activations at each layer can follow a different distribution. Hence, to increase the stability of deep neural networks, we normalize the data fed to each layer by subtracting the mean and dividing by the standard deviation. There's an article that explains this in depth.
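To make the idea concrete, here is a minimal NumPy sketch of the training-time forward pass of batch normalization (an illustrative simplification, not the author's code and not a substitute for framework layers such as tf.keras.layers.BatchNormalization or torch.nn.BatchNorm1d; the function name batch_norm_forward and the toy shapes are assumptions). Each feature is standardized over the current mini-batch and then re-scaled and re-shifted by two learned parameters, gamma and beta.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x     : (batch_size, num_features) pre-activations of one layer
    # gamma : (num_features,) learned scale
    # beta  : (num_features,) learned shift
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta             # let the network re-scale if that helps

# Usage: insert between the linear transform and the non-linearity.
x = np.random.randn(128, 500) * 3.0 + 2.0           # badly scaled pre-activations
gamma, beta = np.ones(500), np.zeros(500)           # typical initial values
out = np.tanh(batch_norm_forward(x, gamma, beta))   # normalized, then squashed

At test time, framework implementations replace the batch statistics with running averages accumulated during training; the sketch above covers only the training-time computation.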
Regularization

One of the most common problems in training deep neural networks is over-fitting. You will recognize over-fitting when your network performs exceptionally well on the training data but poorly on the test data. This happens because the learning algorithm tries to fit every data point in the input, even points that merely represent randomly sampled noise, as demonstrated in the figure below.

Figure: under-fitting, over-fitting and a balanced fit (source).

Regularization helps avoid over-fitting by penalizing the weights of the network. To explain it further, consider a loss function defined for a classification task over a neural network as below:

J(w) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \lVert w \rVert_2^2

where
- J(w): overall objective function to minimize,
- m: number of training samples,
- y^{(i)}: actual label for the i-th training sample,
- \hat{y}^{(i)}: predicted label for the i-th training sample,
- L: cross-entropy loss,
- w: weights of the neural network,
- \lambda: regularization parameter.

Notice how the regularization parameter (lambda) is used to control the effect of the weights on the final objective function. If lambda takes a very large value, the weights of the network must stay close to zero to minimize the objective. But as we let the weights collapse to zero, we nullify the effect of many units in each layer, and the network is no better than a single linear classifier with a few logistic regression units. Unexpectedly, this throws us into the regime known as under-fitting, which is not much better than over-fitting. Clearly, we have to choose the value of lambda carefully, so that in the end our model falls into the balanced category (the third plot in the figure).

Dropout Regularization

In addition to what we have discussed, there is one more powerful technique for reducing over-fitting in deep neural networks, known as dropout regularization. The key idea is to randomly drop units from the network during training, so as to ignore those units during forward propagation or backward propagation. In a sense, this prevents the network from adapting to some specific set of features.

Figure: (a) a standard neural net; (b) the same net after applying dropout (source).

At each iteration we randomly drop some units from the network. Consequently, we force each unit not to rely on (not to give high weights to) any specific set of other units, since any of them may be dropped at the next iteration.
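As a rough illustration (a minimal "inverted dropout" forward pass in NumPy, not the author's implementation; dropout_forward and keep_prob are assumed names), units are zeroed out at random during training and the survivors are re-scaled so that the expected activation is unchanged, while at test time the layer is left untouched:

import numpy as np

def dropout_forward(a, keep_prob=0.8, train=True):
    # a: activations of a hidden layer, shape (batch_size, num_units)
    if not train:
        return a                                   # no dropout at test time
    mask = (np.random.rand(*a.shape) < keep_prob)  # keep each unit with prob keep_prob
    return a * mask / keep_prob                    # re-scale survivors (inverted dropout)

# Usage: apply to the activations of each hidden layer during training.
np.random.seed(1)
h = np.tanh(np.random.randn(4, 6))        # toy hidden-layer activations
print(dropout_forward(h, keep_prob=0.8))  # roughly 20% of the entries forced to zero

Because the re-scaling is folded into training, nothing special has to be done at prediction time, which is why this inverted variant is the one most frameworks implement.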

Much of the content can be attributed to Stanford University's CS231n: Convolutional Neural Networks for Visual Recognition (cs231n.stanford.edu).

Please let me know through your comments any modifications or improvements needed in the article.