Lecture 4
2023-2024
CHAPTER 4
PRACTICAL ASPECTS OF DEEP LEARNING
Applied ML is a highly iterative process
(Figure: the loop Idea → Code → Experiment, repeated until the results are good.)
You keep iterating over choices such as the number of layers, the number of hidden units, learning rates, activation functions, …
Train/dev/test sets
Classical ML (100 – 10,000 samples): traditional splits such as 60% / 20% / 20%.
With much larger datasets, the dev and test sets can be tiny fractions:
Training    Dev      Test
98%         1%       1%
99.5%       0.25%    0.25%
Make sure the dev set and test set come from the same distribution.
For example, if the cat training images are scraped from the web while the dev/test
images come from users' cell phones, the distributions will mismatch. It is better to
make sure that the dev and test sets come from the same distribution.
The role of the dev set is to evaluate the promising models you've created and pick between them.
It's OK to have only a dev set without a test set, but many people in this case
call the dev set the "test set". A better terminology is "dev set", since it is used
during development.
Bias and Variance
(Figure: cat classification with labels y = 1 (cat) and y = 0 (non-cat), showing decision boundaries from underfitting (high bias) through just right to overfitting (high variance).)
Keep trying until you have low bias and low variance; then you are done.
Deep learning is very helpful for the classic "bias/variance tradeoff" problem,
because with deep learning you have more options/tools to reduce one without
hurting the other.
Training a bigger neural network almost never hurts (beyond the extra computation).
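As a rough illustration, here is a minimal Python sketch of the diagnosis step; the diagnose helper and the 5% thresholds are illustrative assumptions, not from the course:

# Minimal sketch of bias/variance diagnosis from train/dev error rates.
def diagnose(train_error, dev_error, bayes_error=0.0):
    high_bias = (train_error - bayes_error) > 0.05      # poor fit even on the training set
    high_variance = (dev_error - train_error) > 0.05    # much worse on the dev set
    return high_bias, high_variance

print(diagnose(0.01, 0.11))   # (False, True): low bias, high variance
print(diagnose(0.15, 0.16))   # (True, False): high bias, low variance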
Regularization (Logistic regression)
min over w, b of J(w, b), with w ∈ ℝ^{n_x}, b ∈ ℝ.
L2 regularization (Euclidean norm):
J(w, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i)) + (λ/2m) ‖w‖₂²,  where ‖w‖₂² = Σ_{j=1}^{n_x} w_j² = wᵀw
L1 regularization:
J(w, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i)) + (λ/m) ‖w‖₁,  where ‖w‖₁ = Σ_{j=1}^{n_x} |w_j|
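A minimal numpy sketch of the L2-regularized cost; the function name and arguments are mine, and y_hat is assumed to hold the sigmoid outputs for all m examples:

import numpy as np

# Sketch (assumed shapes): y_hat, y are (1, m); w is (n_x, 1); lambd is λ.
def l2_regularized_cost(y_hat, y, w, lambd, m):
    cross_entropy = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
    l2_penalty = (lambd / (2 * m)) * np.sum(np.square(w))   # (λ/2m) * w^T w
    return cross_entropy + l2_penalty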
Regularization (Neural network)
dW^[l] = (from backprop) + (λ/m) W^[l]
W^[l] := W^[l] − α dW^[l]
Weight decay:
W^[l] := W^[l] − α [ (from backprop) + (λ/m) W^[l] ]
W^[l] := W^[l] − (αλ/m) W^[l] − α (from backprop)
W^[l] := (1 − αλ/m) W^[l] − α (from backprop)
In practice this penalizes large weights and effectively limits the freedom of your model: the new factor (1 − αλ/m) causes each weight to decay in proportion to its size.
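The same update written as a small numpy-style sketch for one layer; the argument names, such as dW_backprop for the gradient that comes out of backpropagation, are mine:

def update_with_weight_decay(W, dW_backprop, alpha, lambd, m):
    dW = dW_backprop + (lambd / m) * W   # add the regularization gradient
    # Equivalent to W = (1 - alpha*lambd/m) * W - alpha * dW_backprop
    return W - alpha * dW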
How does regularization prevent overfitting?
(Figure: a small network with inputs x1, x2, x3 and output ŷ.)
J(W^[1], b^[1], …, W^[L], b^[L]) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i)) + (λ/2m) Σ_{l=1}^{L} ‖W^[l]‖_F²
If λ is very big ⇒ W^[l] ≈ 0 ⇒ a much simpler neural network.
Intuition 1:
If λ is too large: many weights will be close to zero, which makes the NN
simpler (you can think of it as behaving closer to logistic regression).
If λ is chosen well: it will just shrink the weights that make the neural
network overfit.
How does regularization prevent overfitting?
Intuition 2 (with tanh activations):
a = tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))
tanh(z) is approximately linear for small |z| and nonlinear for large |z|.
If λ is large, the weights W^[l] are small, so z^[l] = W^[l] a^[l−1] + b^[l] stays small and lands in the nearly linear region of tanh. Every layer then computes a roughly linear function, so the whole network behaves almost like a linear model and cannot fit very complicated decision boundaries.
Dropout regularization
(Figure: a network with inputs x1 … x4 and output ŷ; each layer keeps nodes with probability 0.5.)
Go through each layer of the network and set some probability of eliminating
each node of that layer.
For each of these layers we toss a coin per node: a 0.5 chance of keeping the node
and a 0.5 chance of removing it.
Then remove all the incoming and outgoing links of the removed nodes as well.
Dropout regularization
(Figure: the same network after dropping nodes with probability 0.5 in each layer.)
We end up with a much smaller, "thinned" network, and then do back-propagation
training on it.
Implementing dropout ("Inverted dropout")
import numpy as np

keep_prob = 0.8   # 0 <= keep_prob <= 1; here 80% of nodes stay, 20% are dropped
# Illustration for layer l = 3 only; a3 holds the activations of layer 3.
# Entries of d3 are True (keep) where the random number is below keep_prob:
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
a3 = np.multiply(a3, d3)   # zero out (drop) the units where d3 is False
# Scale a3 up so the expected value of the activations is unchanged
# (the "inverted" part, which solves the scaling problem):
a3 = a3 / keep_prob
The vector d[l] is used in both forward and back propagation and is the same for
both, but it is resampled for each iteration (pass) and each training example.
At test time we don't use dropout: applying dropout at test time would just
add noise to the predictions.
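In the backward pass the same mask is applied with the same scaling; a minimal sketch continuing the example above, where dA3 is assumed to hold the gradient flowing back into layer 3:

dA3 = dA3 * d3            # kill the gradients of the dropped units (same mask d3)
dA3 = dA3 / keep_prob     # match the inverted-dropout scaling of the forward pass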
Why does drop-out work?
Intuition: Can’t rely on any one feature, so have to spread out weights.
(Figure: keep_prob can differ per layer, e.g. 1.0 for the input layer and small layers, and lower values such as 0.7 or 0.5 for the big layers that are most likely to overfit.)
Data augmentation
(Figure: a digit "4" with random rotations and distortions applied.)
In OCR, you can impose random rotations and distortions on digits/letters.
New data obtained using this technique isn't as good as real independent
data, but it can still be used as a regularization technique.
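A minimal sketch of such an augmentation with scipy, assuming grayscale digit images stored as 2-D numpy arrays; the ±15° range is an arbitrary illustrative choice:

import numpy as np
from scipy.ndimage import rotate

def augment_digit(image, max_angle=15.0):
    # Apply a random small rotation; reshape=False keeps the image size unchanged.
    angle = np.random.uniform(-max_angle, max_angle)
    return rotate(image, angle, reshape=False, mode="nearest")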
Early stopping
(Plot: training set cost and dev set cost J versus # iterations.)
In this technique we plot the training set cost and the dev set cost together at each iteration.
At some iteration the dev set cost will stop decreasing and will start increasing.
We pick the point at which the training set error and dev set error are both best (lowest
training cost with lowest dev cost), and take the parameters from that point as the final parameters.
The advantage of this method is that you don't need to search over an extra hyperparameter,
unlike other regularization approaches (such as λ in L2 regularization).
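A hedged sketch of the bookkeeping, where train_step and dev_cost are hypothetical callables supplied by the caller (one optimization step, and the cost on the dev set); the patience value is an illustrative choice:

import copy

def train_with_early_stopping(params, train_step, dev_cost, max_iters, patience=10):
    best_cost = float("inf")
    best_params = copy.deepcopy(params)
    since_best = 0
    for _ in range(max_iters):
        params = train_step(params)          # one training iteration
        cost = dev_cost(params)              # evaluate on the dev set
        if cost < best_cost:                 # dev cost still decreasing: remember params
            best_cost, best_params, since_best = cost, copy.deepcopy(params), 0
        else:                                # dev cost has stopped improving
            since_best += 1
            if since_best >= patience:
                break
    return best_params                       # parameters from the best dev-cost point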
Normalizing training sets
(Figures: the features x1, x2 before and after normalization, and the cost surface over w and b: elongated contours for unnormalized inputs vs. round contours after normalization.)
Normalize the inputs by subtracting the training-set mean and dividing by the training-set
standard deviation (and use the same μ and σ for the dev/test sets). With normalized inputs
the cost surface is more symmetric, so gradient descent can take larger steps and converges faster.
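As a minimal numpy sketch, assuming X_train and X_test have shape (n_x, m) with one column per example:

import numpy as np

mu = X_train.mean(axis=1, keepdims=True)      # per-feature mean of the training set
sigma = X_train.std(axis=1, keepdims=True)    # per-feature standard deviation
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma                # reuse the training-set mu and sigma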
Vanishing/exploding gradients
(Figure: a deep network with inputs x1, x2, output ŷ, and weight matrices W^[1], W^[2], W^[3], …, W^[L].)
Vanishing/exploding gradients occur when your derivatives become very small
or very big.
To understand the problem, suppose we have a deep neural network with L layers,
all activation functions linear and every b = 0:
g(z) = z,  b^[l] = 0
ŷ = W^[L] W^[L−1] ⋯ W^[2] W^[1] x
Example (deep neural network with L layers):
If every W^[l] = [[0.5, 0], [0, 0.5]] ⟹ ŷ scales like 0.5^(L−1) ⇒ Vanishing
If every W^[l] = [[1.5, 0], [0, 1.5]] ⟹ ŷ scales like 1.5^(L−1) ⇒ Exploding
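A quick numpy check of both cases (the depth L = 50 is arbitrary):

import numpy as np

L = 50
x = np.ones((2, 1))
for w_diag in (0.5, 1.5):
    W = np.diag([w_diag, w_diag])
    y = x.copy()
    for _ in range(L - 1):
        y = W @ y                 # apply L-1 identical linear layers
    print(w_diag, y[0, 0])        # 0.5 -> ~1.8e-15 (vanishing), 1.5 -> ~4.3e+08 (exploding)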
Vanishing/exploding gradients
A partial solution to vanishing/exploding gradients in a NN is a better, more careful
choice of the random initialization of the weights.
He/Xavier initialization:
o For ReLU (He): W^[l] = np.random.randn(n^[l], n^[l−1]) * sqrt(2 / n^[l−1])
o For tanh (Xavier): W^[l] = np.random.randn(n^[l], n^[l−1]) * sqrt(1 / n^[l−1])
o For tanh (Glorot & Bengio): W^[l] = np.random.randn(n^[l], n^[l−1]) * sqrt(2 / (n^[l−1] + n^[l]))
The 1 or 2 in the numerator can also be a hyperparameter to tune (but not the first one to
start with).
This is one of the best partial solutions to vanishing/exploding gradients (ReLU
+ weight initialization with this variance), and it helps the gradients not to vanish/explode too
quickly.
This initialization is called "He initialization" / "Xavier initialization"; He initialization
was published in a 2015 paper (He et al.).
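A minimal sketch of He initialization for a whole network; the function name and the layer_dims convention (a list like [n_x, n_h1, …, n_y]) are mine:

import numpy as np

def initialize_parameters_he(layer_dims):
    params = {}
    for l in range(1, len(layer_dims)):
        # Gaussian weights scaled to variance 2 / n^[l-1]; biases start at zero.
        params["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                * np.sqrt(2.0 / layer_dims[l - 1]))
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params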
Gradient Checking
If your cost does not decrease on each iteration you may have a back-
propagation bug.
Gradient Checking
Take W^[1], b^[1], ⋯, W^[L], b^[L] and reshape them into one big vector θ; likewise
reshape dW^[1], db^[1], ⋯, dW^[L], db^[L] into a big vector dθ.
The question: is dθ really the gradient of J(θ)?
Gradient Checking
Algorithm:
For each component i, compute the two-sided difference approximation
dθ_approx[i] = (J(θ₁, …, θᵢ + ε, …) − J(θ₁, …, θᵢ − ε, …)) / (2ε)
Finally evaluate the relative difference (with ε = 10⁻⁷):
‖dθ_approx − dθ‖₂ / (‖dθ_approx‖₂ + ‖dθ‖₂)
o if it is < 10⁻⁷: great, the backpropagation implementation is very likely correct.
o if it is around 10⁻⁵: can be OK, but inspect whether there are particularly big values in
dθ_approx − dθ.
o if it is ≥ 10⁻³: bad, there is probably a bug in the backpropagation implementation.
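A minimal sketch of the whole check, assuming J is a callable mapping the 1-D parameter vector theta to the scalar cost, and dtheta is the backprop gradient reshaped into the same vector form:

import numpy as np

def gradient_check(J, theta, dtheta, eps=1e-7):
    dtheta_approx = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps                  # J(..., theta_i + eps, ...)
        theta_minus[i] -= eps                 # J(..., theta_i - eps, ...)
        dtheta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    # Relative difference from the formula above
    return (np.linalg.norm(dtheta_approx - dtheta)
            / (np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)))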
Gradient checking implementation notes
Don't use the gradient checking algorithm at training time because it's very slow.
If the algorithm fails grad check, look at components to try to identify the bug.
Don't forget to add the regularization term (e.g. (λ/2m) Σ_l ‖W^[l]‖_F² for L2, or
(λ/m) ‖w‖₁ for L1) to J if you are using regularization, so the gradients being checked include it.
Run gradient checking both at random initialization and again after training for a while:
some bugs only show up once the weights w and biases b become larger (further from
0), and can't be seen in the first iterations (when the weights w and biases b are still very small).
Initialization summary
The weights 𝑾 should be initialized randomly to break symmetry.
However, you can initialize the biases 𝑏 to zeros. Symmetry is still broken so long
as 𝑾 is initialized randomly.
Different initializations lead to different results.
Random initialization is used to break symmetry and make sure different hidden
units can learn different things.
Don't initialize to values that are too large.
He initialization works well for networks with ReLU activations.
L2 Regularization summary
Observations:
o λ is a hyperparameter that you can tune using a dev set.
o L2 regularization makes your decision boundary smoother. If λ is too large, it is also possible to
"oversmooth", resulting in a model with high bias.
Dropout summary
What you should remember about dropout:
Dropout is a regularization technique.
Only use dropout during training. Don't use dropout (randomly eliminate nodes) during test time.
Apply dropout both during forward and backward propagation.
During training time, divide each dropout layer by keep_prob to keep the same expected value for
the activations.
For example:
If keep_prob is 0.5, then we will on average shut down half the nodes, so the output will be scaled
by 0.5 since only the remaining half are contributing to the solution.
Dividing by 0.5 is equivalent to multiplying by 2. Hence, the output now has the same expected
value.
You can check that this works even when keep_prob is other values than 0.5.
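A quick numerical check of that claim (the array shape and the keep_prob values are arbitrary):

import numpy as np

a = np.random.rand(1000, 100)
for keep_prob in (0.5, 0.8):
    mask = np.random.rand(*a.shape) < keep_prob
    dropped = (a * mask) / keep_prob            # inverted dropout
    print(keep_prob, a.mean(), dropped.mean())  # the two means are approximately equal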
References
Andrew Ng. Deep learning. Coursera.
Geoffrey Hinton. Neural Networks for Machine Learning.
Kevin P. Murphy. Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
MIT Deep Learning 6.S191 (http://introtodeeplearning.com/)