Deep Learning

GD: batch size is a parameter

Normalizing inputs: faster convergence => features move at a similar scale

Mini-batch GD:
- Batch size > 1
Stochastic GD:
- Batch size = 1, one training example at a time
- Extremely noisy
- No convergence (oscillates around the minimum)

Exponentially weighted moving averages
Moving average: V_t = beta*V_{t-1} + (1-beta)*theta_t
Averages over roughly the last 1/(1-beta) data points
Time-series example: to average over quarters, take (1/4)*the sum of the first year's quarters, then (1/2)*the sum of the next two, and so on
Average of the last 1/(1-beta) days plus some weight for the current day => number of days averaged grows with beta
Expanding the recursion gives a weighted sum of past points: (1-beta)*beta^n*theta_{t-n}
What happens if we raise beta? The curve gets smoother and lags behind the data; lowering beta keeps it closer to the points

RMSProp parameter update:
w = w - alpha*(dw / sqrt(S_dw))
b = b - alpha*(db / sqrt(S_db))
What happens if we raise beta?

Adam:
V_dw = beta_1*V_dw + (1-beta_1)*dw
S_dw = beta_2*S_dw + (1-beta_2)*dw^2
V_dw^corrected = V_dw / (1-beta_1^t)
S_dw^corrected = S_dw / (1-beta_2^t)
w = w - alpha*V_dw^corrected / (sqrt(S_dw^corrected) + epsilon)
Combines RMSProp and gradient descent with momentum

Hyperparameter choice:
Use default values for the hyperparameters, but alpha, the learning rate, needs to be tuned

Learning rate decay:
Reduce the learning rate over time => oscillate over a tighter region
alpha = (1 / (1 + decayRate*epoch))*alpha_0

Local optima:
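The Adam formulas and the 1/(1+decayRate*epoch) decay above can be run end to end. This is a minimal NumPy sketch; the toy objective f(w) = w^2 and all hyperparameter values are my own illustrative choices, not from the notes:

```python
import numpy as np

def adam_step(w, dw, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array."""
    v = beta1 * v + (1 - beta1) * dw        # momentum term (EWMA of gradients)
    s = beta2 * s + (1 - beta2) * dw ** 2   # RMSProp term (EWMA of squared gradients)
    v_hat = v / (1 - beta1 ** t)            # bias correction
    s_hat = s / (1 - beta2 ** t)
    w = w - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s

# Minimize f(w) = w^2, whose gradient is dw = 2w
w = np.array([5.0])
v = np.zeros_like(w)
s = np.zeros_like(w)
alpha0 = 0.1
for epoch in range(200):
    alpha = alpha0 / (1 + 0.01 * epoch)     # 1/(1+decayRate*epoch) decay
    w, v, s = adam_step(w, 2 * w, v, s, t=epoch + 1, alpha=alpha)
# w ends up close to the minimum at 0
```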

Bias correction:
The moving average starts off too low (V_0 = 0) => divide by (1 - beta^t) to correct the bias
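The low start and its correction are easy to see on a constant signal; a minimal sketch (signal and length are my own example):

```python
import numpy as np

beta = 0.9
data = np.ones(10)   # constant signal of 1.0
v = 0.0
for t, theta in enumerate(data, start=1):
    v = beta * v + (1 - beta) * theta
    v_corrected = v / (1 - beta ** t)   # bias correction
# Uncorrected v is still well below 1.0 after 10 steps (it started at 0);
# the corrected estimate recovers the true average of 1.0 exactly here.
```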

Gradient descent with momentum:

Like a moving average of the derivatives, instead of a time series:
V_dw = beta*V_dw + (1-beta)*dw
V_db = beta*V_db + (1-beta)*db
w = w - alpha*V_dw
b = b - alpha*V_db

What happens if we raise beta? It acts a bit like a learning-rate control:
High beta => more horizontal movement along the shorter axis; smoother => depends more on the general trend, less sensitive to noise
Low beta => more vertical movement; greater, noisier descent => depends more on the current gradient

Learning rate decay schedules: exponential decay, step decay
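The trend-vs-noise trade-off can be checked directly by smoothing the same noisy signal with a high and a low beta; a minimal sketch (signal, seed, and beta values are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
noisy = 1.0 + rng.normal(0, 1, 1000)   # noisy readings around a flat trend of 1.0

def ewma(xs, beta):
    """Exponentially weighted moving average of a sequence."""
    v, out = 0.0, []
    for x in xs:
        v = beta * v + (1 - beta) * x
        out.append(v)
    return np.array(out)

smooth_hi = ewma(noisy, beta=0.98)   # follows the general trend, suppresses noise
smooth_lo = ewma(noisy, beta=0.5)    # follows the current values, much noisier
```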

RMSProp:
S_dw = beta*S_dw + (1-beta)*dw^2
S_db = beta*S_db + (1-beta)*db^2

Computational resources determine:
- what to tune: learning rate, mini-batch size
- whether to try Panda (babysit one model) or Caviar (train many models in parallel)

SEQUENCE MODELS

"Re-tuning hyperparameters should only be done if new hardware or computational power is acquired" => false (intuitions go stale; re-test periodically)

Batch normalization:
beta and gamma are learned parameters, updated during training just like the weights

Deep learning programming frameworks don't require cloud-based machines to run
A framework allows you to write fewer lines of code


Tasks that could be addressed by a many-to-one RNN model architecture:
What's a many-to-one RNN model architecture? It reads a whole sequence and outputs a single result (e.g., sentiment classification)

If searching among a large number of hyperparameters, you SHOULD NOT try values in a grid; pick random values instead, so that each hyperparameter gets many distinct values tried => use random search
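A minimal sketch of random search for the learning rate, sampled on a log scale (the range, sample count, and seed are my own assumptions): a 5x5 grid over two hyperparameters tries only 5 distinct values of each, while 25 random samples try 25 distinct values of each.

```python
import numpy as np

rng = np.random.default_rng(1)
# Sample alpha uniformly on a log scale between 1e-4 and 1
r = rng.uniform(-4, 0, size=25)
alphas = 10.0 ** r
# Each of the 25 trials uses a distinct learning rate
```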

At test time, don't use the most recent mini-batch's mean and sigma for normalization.
After training a neural network with batch norm, at test time, to evaluate the network on a new example, perform the needed normalizations using a mean and sigma estimated with an exponentially weighted average across the mini-batches seen during training.
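A minimal sketch of that test-time procedure for a single feature (the data distribution, decay rate, and batch size are my own illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
beta_avg = 0.9
running_mu, running_var = 0.0, 1.0

# During training: keep EWMAs of each mini-batch's mean and variance
for _ in range(500):
    batch = rng.normal(loc=3.0, scale=2.0, size=64)
    running_mu = beta_avg * running_mu + (1 - beta_avg) * batch.mean()
    running_var = beta_avg * running_var + (1 - beta_avg) * batch.var()

# At test time: normalize a new example with the running estimates,
# NOT with statistics computed from the test example itself
x_new = 4.0
x_norm = (x_new - running_mu) / np.sqrt(running_var + 1e-8)
```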
If you are training an RNN model and find that your weights and activations are all taking on the value NaN ("Not a Number") => exploding gradients

Gu has the same dimension as the number of hidden units

Choose the r-th training sample first, then the s-th word

If we want c<t> to be highly dependent on c<t-1>, we want Gu to be very low (the update gate keeps the old memory)
Gr => about remembering previous states: how relevant c<t-1> is for computing the candidate
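The memory-keeping effect of a low Gu can be seen in the GRU memory update c<t> = Gu*c̃<t> + (1-Gu)*c<t-1>; a minimal sketch ignoring the relevance gate, with my own example values:

```python
def gru_memory(c_prev, c_tilde, gamma_u):
    """Simplified GRU memory update: c<t> = Gu*c_tilde + (1-Gu)*c<t-1>."""
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev

c_prev, c_tilde = 1.0, -5.0
kept = gru_memory(c_prev, c_tilde, gamma_u=0.01)      # low Gu => keeps the old state
replaced = gru_memory(c_prev, c_tilde, gamma_u=0.99)  # high Gu => takes the candidate
# kept stays close to c_prev (1.0); replaced is close to c_tilde (-5.0)
```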

Dimensionality in word embedding:

Question 10
The sparsity of connections and weight sharing are mechanisms that allow us to use fewer parameters in a convolutional layer, making it possible to train a network with smaller training sets. True/False? => True

Number of weights per filter: f * f * n_c_prev
Total number of weights for all filters: f * f * n_c_prev * n_filters
Bias parameters: one per filter
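Those counts can be checked with a tiny helper; the worked numbers (3x3 filters over an RGB input, 10 filters) are my own example:

```python
def conv_layer_params(f, n_c_prev, n_filters):
    """Parameter counts for one conv layer: per-filter weights, total weights, biases."""
    weights_per_filter = f * f * n_c_prev
    total_weights = weights_per_filter * n_filters
    biases = n_filters                      # one bias per filter
    return weights_per_filter, total_weights, biases

# 3x3 filters over 3 input channels, 10 filters:
# 27 weights per filter, 270 total weights, 10 biases => 280 parameters
counts = conv_layer_params(3, 3, 10)
```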

LSTM

Gu => update gate
Gf => forget gate
Go => output gate
Gu has dimension = # hidden units in the LSTM
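The gate dimension can be verified by building one gate explicitly; a minimal sketch where the sizes (4 hidden units, 3 input features) and random weights are my own assumptions:

```python
import numpy as np

n_a, n_x = 4, 3   # hidden units, input features
rng = np.random.default_rng(0)
Wu = rng.normal(size=(n_a, n_a + n_x))   # gate weights act on [a<t-1>, x<t>]
bu = np.zeros(n_a)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a_prev = np.zeros(n_a)
x_t = rng.normal(size=n_x)
gamma_u = sigmoid(Wu @ np.concatenate([a_prev, x_t]) + bu)
# gamma_u has one value in (0, 1) per hidden unit => shape (n_a,)
```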
