
Deep learning, where are you going?
Kyunghyun Cho
New York University
Center for Data Science, and
Courant Institute of Mathematical Sciences
Three axes of advance in machine learning
1. Network architectures
• ConvNet
• Highway networks/ResNet
• LSTM/GRU
2. Learning algorithms
• Supervised learning
• Unsupervised learning
• Reinforcement learning
3. Temporal/Spatial Hierarchy
• Thumbnails => high-res images
• Single frame => multi-frame video clips
• Words => phrases => sentences => …

[Diagram: a triangle whose three axes are labelled Network Architectures, Learning Algorithms, and Temporal/Spatial Hierarchy]
Network architectures
• The history of ML is a series of new/old/old-is-new-again/new-is-old-already models
• Many of them can be cast into a neural net with different net architectures

[Timeline, 1950's to 2000's, captioned "Very inaccurate illustration of the history of ML": Hebbian-rule-based linear models (Perceptron, Hopfield network, …) => multilayer perceptrons, convolutional networks, hidden Markov models, PCA, GMM, … => recurrent networks, state-space models (Kalman filter, …) => kernel machines (kernel SVM), Gaussian processes (GPR, GPC, GPL) => LSTM/GRU-RNN, memory-enhanced RNN (NTM, DCN, MemNet, …), convolutional networks (ReLU, Highway, ResNet, stride-1 conv, …)]
Network architectures
1. Improved representational power
• Linear separability and nonlinear classifiers
• Deeper networks for more complex problems
• Kernel methods for infinite-dimensional projection
• Probabilistic approaches for uncertainty modelling
2. Better inductive bias
• (local) Translation, rotation invariance
• Unbounded memory storage

[The history-of-ML timeline as above]

Making Ameri.. RNNs great again!

[The history-of-ML timeline as above]

Neural networks and language processing in the late-80's and early-90's
• First wave of applying neural networks to natural language
• Text as well as speech
• Limited due to
1. Lack of computational power
2. Lack of large-scale data
3. Inability to train an RNN well

[Photos: Bob Allen, Yoshua Bengio, Risto Miikkulainen, Lonnie Chrisman, Jeffrey Elman, Jay McClelland]


Recurrent networks are difficult to train

[Quoted excerpts on the difficulty of training RNNs: Yoshua Bengio (1994); Juergen Schmidhuber (2013)]


Making Ameri.. RNNs great again!

[The history-of-ML timeline as above]


Re-thinking a recurrent neural network
tanh-RNN as a CPU
• Registers: h
• Execution:
1. Read the whole register h
2. Update the whole register:
h ← tanh(W[x] + Uh + b)
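As a concrete sketch of this "read the whole register, overwrite the whole register" view, a minimal tanh-RNN step in NumPy might look as follows; all names and dimensions are illustrative, not from the slides:

```python
import numpy as np

def tanh_rnn_step(x, h, W, U, b):
    """One tanh-RNN step: read the whole register h, then
    overwrite the whole register in a single update."""
    return np.tanh(W @ x + U @ h + b)

rng = np.random.default_rng(0)
d_in, d_h = 16, 32                     # illustrative dimensions
W = rng.normal(scale=0.1, size=(d_h, d_in))
U = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)                      # the "registers"
for x in rng.normal(size=(5, d_in)):   # a toy sequence of 5 inputs
    h = tanh_rnn_step(x, h, W, U, b)
```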
Re-thinking a recurrent neural network
GRU as a CPU
• Registers: h
• Execution:
1. Select a readable subset r
2. Read the subset r ⊙ h
3. Select a writable subset u
4. Update the subset:
h ← u ⊙ h̃ + (1 − u) ⊙ h

Clearly gated recurrent units* are much more realistic.

* By gated recurrent units, I refer to both LSTM and GRU [Hochreiter & Schmidhuber, 1997; Cho et al., 2014].
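A matching NumPy sketch of the GRU-style update just described, again with illustrative names and dimensions (the gate parameterization follows the standard GRU the slide summarizes):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, p):
    """One GRU step: r gates what is read from h, u selects the
    writable subset, and only that subset is overwritten."""
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])           # read gate
    u = sigmoid(p["Wu"] @ x + p["Uu"] @ h + p["bu"])           # update gate
    h_tilde = np.tanh(p["W"] @ x + p["U"] @ (r * h) + p["b"])  # candidate
    return u * h_tilde + (1.0 - u) * h                         # partial update

rng = np.random.default_rng(0)
d_in, d_h = 16, 32
p = {k: rng.normal(scale=0.1, size=(d_h, d_in)) for k in ("Wr", "Wu", "W")}
p.update({k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in ("Ur", "Uu", "U")})
p.update({k: np.zeros(d_h) for k in ("br", "bu", "b")})

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):
    h = gru_step(x, h, p)
```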
Re-thinking a recurrent neural network
• In practice, GRU or LSTM RNNs are much easier to train and work well with a wider range of hyperparameters.
Making RNNs great again!

[The history-of-ML timeline as above]


Re-thinking sequence-to-sequence learning
Sequence-to-sequence model (or encoder-decoder network)
• Encode an input sequence into a code vector z
• Decode the code vector into a target sequence

[Diagram: the encoder maps the Korean source "고양이가 매트 위에 앉았다." ("The cat sat on the mat.") into a code vector z, which the decoder expands into "The cat sat on the mat."]

(Allen, 1987; Chrisman, 1991; Neco & Forcada, 1997; Castano et al., 1997)
(Kalchbrenner & Blunsom, 2013; Cho et al., 2014; Sutskever et al., 2014)
Re-thinking sequence-to-sequence learning
• Motivated by human translators
1. Summarize what has been translated so far
2. Find a relevant part of the source
3. Write the next target symbol
4. Go to 1
• Machine learning view
1. Search for relevant info from the source, based on the current state: what has been generated so far
2. Generate the next target symbol
3. Go to 1

[Diagram: partial outputs "The cat ?", "The cat sat ?", "The cat sat on ?" each pointing back into the source "고양이가 매트 위에 앉았다."]
Re-thinking sequence-to-sequence learning
• Cooperation among three agents
1. Agent 1 (Encoder): transforms the source sentence into a set of code vectors in a memory
2. Agent 2 (Search): searches for relevant code vectors in the memory based on the command from Agent 3 and returns them to Agent 3
3. Agent 3 (Decoder): observes the current state (previously decoded symbols), commands Agent 2 to find relevant code vectors, and generates the next symbol based on them

[Diagram: the Encoder fills a memory of code vectors from the source "고양이가 매트 위에 앉았다."; the Search agent inspects and selects from that memory on command from the Decoder, which summarizes what has been generated so far ("The cat sat") and generates the next symbol ("on")]
Attention-based neural machine translation
• Model implementation
• Agent 1 (Encoder): bidirectional GRU/LSTM-RNN
• Agent 2 (Search): differentiable attention mechanism
• Agent 3 (Decoder): GRU/LSTM-RNN language model
• Learning algorithm
• Maximum likelihood: maximize the predictability of Agent 3 (Decoder)
• Backpropagation through all the agents
• Now a de facto standard in machine translation
• Google Translate, Facebook, Systran, Naver, …

[The three-agent diagram as above]

(Bahdanau, Cho & Bengio, 2015)
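As a sketch of the Search agent, here is a minimal additive ("Bahdanau-style") attention step in NumPy; the scoring form v⊤tanh(Wa s + Ua h) and all dimensions are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def attend(s, H, Wa, Ua, v):
    """Search agent: score every code vector H[j] against the decoder
    state s (the 'command'), then return their weighted average."""
    scores = np.array([v @ np.tanh(Wa @ s + Ua @ h) for h in H])
    alpha = softmax(scores)   # where in the memory to look
    return alpha @ H, alpha   # context vector and attention weights

rng = np.random.default_rng(0)
d = 32
H = rng.normal(size=(7, d))   # memory: 7 encoder code vectors
s = rng.normal(size=d)        # current decoder state
Wa = rng.normal(scale=0.1, size=(d, d))
Ua = rng.normal(scale=0.1, size=(d, d))
v = rng.normal(scale=0.1, size=d)

context, alpha = attend(s, H, Wa, Ua, v)
```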
Attention-based neural machine translation
• Flexibility in input/output representation
• Multilingual, character-level translation: recurrent decoder, feedforward attention and recurrent-convolutional encoder (Lee et al., 2016; Ha et al., 2016; Johnson et al., 2016)
• Attention-based image captioning: recurrent decoder, feedforward attention and deep convolutional encoder (Xu et al., 2015; Yao et al., 2015)
Memory-augmented recurrent neural networks
• Agent (Decoder) decides what to store in the memory
• Agent (Decoder) may access from and write to the memory multiple times per step
• Memory may grow or shrink
• Closer to the von Neumann architecture
• Is it good? – We'll see…

[Diagram: the Decoder generates "on" after "The cat sat", commanding a Search agent that inspects, selects from, and writes to a slot-based (key, value) memory filled by the Encoder]

Memory nets: (Weston et al., 2014; Sukhbaatar et al., 2015; Kumar et al., 2015; Miller et al., 2016; and many more)
Neural Turing machines: (Graves et al., 2014; Graves et al., 2016; Gulcehre et al., 2016 & 2017; and many more)
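One illustrative way such a (key, value) memory might be read and written, using content-based addressing; this is a simplified stand-in, not the exact NTM/MemNet mechanism:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def memory_read(keys, values, query):
    """Content-based read: soft-match the query against all keys and
    return the correspondingly weighted mixture of values."""
    w = softmax(keys @ query)
    return w @ values

def memory_write(keys, values, query, new_value):
    """Hard write to the best-matching slot (a simplification of the
    soft, differentiable writes used by NTM-style models)."""
    slot = int(np.argmax(keys @ query))
    values = values.copy()
    values[slot] = new_value
    return values

rng = np.random.default_rng(0)
n_slots, d = 5, 16
keys = rng.normal(size=(n_slots, d))
values = rng.normal(size=(n_slots, d))
q = rng.normal(size=d)

r = memory_read(keys, values, q)                 # inspect/select
values = memory_write(keys, values, q, r + 1.0)  # write back, possibly several times per step
```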
Three axes of advance in machine learning
1. Network architectures
• ConvNet
• Highway networks/ResNet
• LSTM/GRU
2. Learning algorithms
• Supervised learning
• Unsupervised learning
• Reinforcement learning
3. Temporal/Spatial Hierarchy
• Thumbnails => high-res images
• Single frame => multi-frame video clips
• Words => phrases => sentences => …

[The three-axes triangle diagram as above]
Supervised learning
• Learner does not interact with the world
• Supervisor annotates data in advance
• Learner learns from the supervisor's feedback (reward, correct answer)
• Advantages
• Strong learning signal
• Offline training
• Disadvantages
• Mismatch between training and test

[Diagram: World => Supervisor (data collection, annotation) => Learner; the learner queries, the supervisor answers with a reward or correct answer, and the learner updates]
Unsupervised learning
• Learner does not interact with the world
• Supervisor collects data
• No feedback from the supervisor
• Advantages
• Potentially infinite amount of data
• Strong learning signal
• Disadvantages
• What's the goal of learning?

[Diagram: World => Supervisor (data collection) => Learner; the learner repeatedly queries the data and updates, with no annotation]
Reinforcement learning
• Learner directly interacts with the world
• There is no supervisor
• Learner learns from the world's weak feedback (reward alone)
• Advantages
• Online learning: perfect match between training and test
• Disadvantages
• Weak learning signal
• Non-trivial balance between exploration and exploitation

[Diagram: World <=> Learner; the learner observes, acts, and updates from the reward]
Mix-and-match: Supervised+Reinforcement learning

[Diagram: the learner observes the World through an unsupervised feature-extraction module, selects actions with a supervised/reinforcement-learned action selector, and acts; the Supervisor provides the supervised/reinforcement signal]

The obligatory LeCake!


Mix-and-match: Imitation Learning
• Learner directly interacts with the world
• Supervisor augments the reward signal from the world
• Advantages
• Match between training and test
• Strong learning signal
• Disadvantages
• Where do we get the supervisor???

[Diagram: the learner observes and acts in the World; it updates from both the world's reward and the supervisor's reward/correct action]

(Ross et al., 2011; Daume III et al., 2007; and more…)


SafeDAgger: Query-Efficient Imitation Learning
• Supervisors are expensive
• As the learner gets better, less intervention from the supervisor
• Learner learns from difficult examples
• Questions:
1. Where do we get the safety net?
2. What is the impact on the learner's performance?

[Diagram: World => SafetyNet => Supervisor => Learner; on easy observations the learner acts on its own, on difficult ones the supervisor observes, acts, and provides the correct action for the update]

(Zhang & Cho, 2017; Laskey et al., 2016)


SafeDAgger: Query-Efficient Imitation Learning
1. Learner observes the world
2. SafetyNet observes the learner
3. SafetyNet predicts whether the learner will fail
4. If no, the learner continues
5. If yes,
1. the supervisor intervenes
2. the learner imitates the supervisor's behaviour

[Diagram: a CNN learner outputs steering/brake; the SafetyNet observes it and hands control to the supervisor when it predicts failure]
SafeDAgger: Learning
1. Initial labelled data sets: D_0 and D_0^S
2. Train the policy π_0 using D_0
3. Train the safety net φ_0 using D_0^S
• Target for the safety net, given x ∈ D^S:
y_*^S = 1 if ‖π_0(x) − y_*‖ > τ, and 0 otherwise
4. Collect additional data D'_0
1. Let π_0 drive, but the expert intervenes when φ_0(x) = 1
2. Collect data: D'_0 ← D'_0 ∪ {(x, y) | φ_0(x) = 1}
5. Data aggregation: D_0 ← D_0 ∪ D'_0
6. Go to 2

[Diagram: the same CNN learner / SafetyNet / supervisor setup as above]
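A minimal sketch of this loop in Python. The helpers train, expert_label, and env.rollout are hypothetical placeholders, and tau is the discrepancy threshold τ from step 3:

```python
# A minimal SafeDAgger-style training loop. `train`, `expert_label`,
# and `env.rollout` are hypothetical placeholders for illustration;
# `tau` is the discrepancy threshold from step 3 above.

def safedagger(policy, safetynet, D, D_S, env, tau, n_iters):
    for _ in range(n_iters):
        policy = train(policy, D)  # step 2: fit the policy on D
        # Step 3: safety-net targets: x is unsafe (label 1) iff the
        # policy's action deviates from the expert's by more than tau.
        targets = [(x, float(abs(policy(x) - y) > tau)) for x, y in D_S]
        safetynet = train(safetynet, targets)
        # Step 4: let the policy drive; query the expert only on
        # states the safety net flags as unsafe.
        D_new = [(x, expert_label(x))
                 for x in env.rollout(policy) if safetynet(x) == 1]
        D = D + D_new  # step 5: aggregate and repeat
    return policy, safetynet
```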
SafeDAgger in Action

[Three video-demo slides]
SafeDAgger
• Many aspects of learning
1. Main objective
2. Constraints imposed by the world
• Cannot go beyond physical constraints
3. Safety of an agent
• Unlike in a game, you can't hit a pedestrian and continue as if nothing happened
• The car will break down if it has crashed many times
• How much can we automate?
1. Automatic determination of safety (SafetyNet)
2. …?

[Diagram: the same CNN learner / SafetyNet / supervisor setup as above]
Three axes of advance in machine learning
1. Network architectures
• ConvNet
• Highway networks/ResNet
• LSTM/GRU
2. Learning algorithms
• Supervised learning
• Unsupervised learning
• Reinforcement learning
3. Temporal/Spatial Hierarchy
• Thumbnails => high-res images
• Single frame => multi-frame video clips
• Words => phrases => sentences => …

[The three-axes triangle diagram as above]
Awesomeness everywhere!
• Awesome Auto-Driver
• Awesome Q&A
• Awesome RoboArm Controller
• Awesome ASR
• Awesome ConvNet
• Awesome LM
• Awesome Atari Meta-Player
• Awesome Learner
• Awesome Program Interpreter
I want something like…

[Diagram: Awesome Q&A, Awesome Memory, Awesome ConvNet, Awesome ASR, Awesome LM, Awesome RoboArm Controller and Awesome Auto-Driver wired together]

But, do we want to train this end-to-end?
Neural networks are modules
• Each module is used for multiple tasks
• Interactions among modules are not trivial
• Shared representation of information among different modules
• Time to go beyond end-to-end learning?

[Diagram: Awesome ConvNet, Awesome LM and Awesome ASR as shared lower-level modules, with Awesome Memory, under Awesome Auto-Driver, Awesome RoboArm Controller and Awesome Q&A]
Learning to use an NN module
• Q&A system
1. Receives a question via awesome LM+ASR
2. Retrieves relevant info from awesome memory
3. Generates a response via awesome LM
• Autonomous driving
1. Senses the environment with awesome ConvNet+ASR
2. Plans the route with awesome memory
3. Controls a car via awesome robot-arm controller

But, simple composition of neural networks may not work! Why not?

[Diagram: a higher-level module orchestrating a chain of neural networks (input => NN => output at steps t−1, t, t+1)]
Learning to use an NN module
• Why not?
• Target tasks are often unknown at training time
• Input/output with a large available training set are too rigid
• Rich information captured by the NN module must be passed along
• Internals of the NN module must allow external manipulation

Reminds us of the memory-augmented recurrent neural networks…

[Diagram: a higher-level module orchestrating a chain of neural networks, as above]
Good: NNs are totally transparent!
• NNs are not black boxes.
• We can observe every single bit inside a neural net.

Bad: NNs are not easy to understand!
• Humans are not good with high-dimensional vectors
• Distributed representation: exponential combinations of hidden units

[Figure: hidden activations of a recurrent language model (Karpathy et al., 2015)]


Learning to use an NN module
• Neural nets are good at interpreting high-dimensional input
• Neural nets are also good at predicting high-dimensional output
• Internal representation learned by a neural network is well structured
• Neural nets can be trained with an arbitrary objective [reinforcement learning]

[Diagram: a trainable NN module orchestrating the chain of neural networks, as above]

(My NSF Proposal, 2016)
(1) Simultaneous Translation
Decoding
1. Start with a pretrained neural machine translation model
2. Build a simultaneous decoder that intercepts and interprets the incoming signal
3. The simultaneous decoder forces the pretrained model to either
1. output a target symbol, or
2. wait for the next source symbol
Learning
1. Trade-off between delay and quality
2. Stochastic policy gradient (REINFORCE)
(Gu, Cho & Li, 2016)
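A sketch of what the emit-or-wait control might look like; the nmt interface (initial_state, read, write) and the trained policy agent are hypothetical placeholders, not an API from the paper:

```python
# A sketch of simultaneous greedy decoding: after each new source
# symbol, a learned policy chooses between emitting target symbols
# and waiting for more input. `nmt` (the pretrained NMT model, with
# hypothetical initial_state/read/write methods) and `agent` (the
# trained emit-or-wait policy) are illustrative placeholders.

EMIT, WAIT = 0, 1

def simultaneous_decode(nmt, agent, source_stream, eos="</s>"):
    state, output = nmt.initial_state(), []
    for src_symbol in source_stream:
        state = nmt.read(state, src_symbol)       # consume one source symbol
        while agent(state) == EMIT:               # policy decision
            tgt_symbol, state = nmt.write(state)  # force out a target symbol
            output.append(tgt_symbol)
            if tgt_symbol == eos:
                return output
    return output
```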
(1) Simultaneous Translation
(2) Trainable Decoding Algorithm
Decoding
1. Start with a pretrained neural machine translation model
2. Build a trainable decoder that intercepts and interprets the incoming signal
3. The trainable decoder sends an altering signal back to the pretrained model
Learning
1. Deterministic policy gradient
2. Maximize any arbitrary objective

[Diagram: the trainable decoder sits between the GRU/LSTM decoder (mapping h_{t−1}, ŷ_{t−1} to y_t | ŷ_{<t}, X), the attention mechanism, and the encoder's memory of code vectors]

(Gu, Cho & Li, 2017)
Learning to use an NN module
• Spatio-temporal abstraction as learning to glue together multiple lower-level, multi-purpose neural network modules + black-box modules
• Enables sequential, asynchronous learning at multiple scales
• Enables higher-level planning
• Potential framework for meta-learning
• Potential security threat?

[Diagram: a trainable NN module orchestrating the chain of neural networks, as above]
Thank you!

[The three-axes triangle diagram: Network Architectures, Learning Algorithms, Temporal/Spatial Hierarchy]
(1) Trainable Decoding Algorithm
Models
1. Actor π : R^{3d} → R^d
• Input: prev. hidden state h_{t−1}, prev. symbol ŷ_{t−1}, and context c_t from the attention model
• Output: additive bias z_t for the hidden state
• Example: z_t = U f(W[h_{t−1}; E(ŷ_{t−1}); c_t] + b) + c, with f an elementwise nonlinearity
2. Critic R^c : R^d × ⋯ × R^d → R
• Input: a sequence of the hidden states from the decoder
• Output: a predicted return
• In our case, the critic estimates the full return rather than Q at each time step

[The trainable-decoder diagram as above]

(Gu, Cho & Li, 2017)
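A NumPy sketch of this actor, with tanh standing in for the unspecified nonlinearity f; all names and dimensions are illustrative:

```python
import numpy as np

def actor(h_prev, y_emb, c_t, W, b, U, c_out):
    """Actor pi: R^{3d} -> R^d. Maps the concatenation of the previous
    hidden state, the embedding of the previous symbol, and the attention
    context to an additive bias z_t for the decoder's hidden state."""
    inp = np.concatenate([h_prev, y_emb, c_t])  # lives in R^{3d}
    return U @ np.tanh(W @ inp + b) + c_out     # z_t in R^d

rng = np.random.default_rng(0)
d = 32
W = rng.normal(scale=0.1, size=(d, 3 * d))
U = rng.normal(scale=0.1, size=(d, d))
b, c_out = np.zeros(d), np.zeros(d)

z_t = actor(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
            W, b, U, c_out)
# The pretrained decoder's next hidden state is then biased by z_t.
```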
(1) Trainable Decoding Algorithm
Learning
1) Generate a translation for a source sentence with noise:
((h_1, z_1), …, (h_T, z_T)) and its return R
2) Train the critic to minimize (R^c(h_1, …, h_T) − R)^2
3) Generate multiple translations with noise:
((h_1^1, z_1^1), …, (h_T^1, z_T^1)), …, ((h_1^M, z_1^M), …, (h_T^M, z_T^M))
4) Critic-aware actor learning (newly proposed):
(1/M) Σ_{m=1}^{M} (exp(−(R^c_m − R_m)^2)/Z) ∂R^c/∂π

Inference: simply throw away the critic and use the actor

[The trainable-decoder diagram as above]

(Gu, Cho & Li, 2017)
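The weighting in step 4 can be computed directly; a minimal sketch (taking the normalizer Z as the sum of the unnormalized weights, an assumption consistent with the formula above):

```python
import numpy as np

def critic_aware_weights(R_c, R):
    """Per-sample weights exp(-(R^c_m - R_m)^2) / Z: translations for
    which the critic's predicted return R^c_m is far from the observed
    return R_m contribute less to the actor's gradient."""
    w = np.exp(-(np.asarray(R_c) - np.asarray(R)) ** 2)
    return w / w.sum()  # Z taken as the sum of unnormalized weights

# M = 4 noisy translations: critic predictions vs. observed returns.
weights = critic_aware_weights([0.8, 0.5, 0.9, 0.2], [0.7, 0.1, 0.9, 0.6])
```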
