5th Unit
Large Scale Deep Learning
[Figure 1.11: number of neurons (logarithmic scale) versus year, 1950-2056. Since the introduction of hidden units, artificial neural networks have doubled in size roughly every 2.4 years; biological reference points range from sponge and roundworm up to frog, octopus, and human. Biological neural network sizes from Wikipedia (2015).]
Fast Implementations
• CPU
  • Exploit fixed-point arithmetic in CPU families where this offers a speedup (see the sketch below)
  • Cache-friendly implementations
• GPU
  • No cache
• TPU
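As a toy illustration of the fixed-point idea above (a sketch only, not how any production CPU kernel or TensorFlow Lite actually works), the snippet below quantizes floating-point weights to 8-bit integers with a single symmetric scale factor and converts them back; the function names and the scaling scheme are assumptions made for this example.

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 using one symmetric scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs rounding error:", np.abs(dequantize(q, scale) - w).max())
```

In practice frameworks choose scales per tensor or per channel and carry out the arithmetic directly in integers; this sketch only shows the change of representation.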
Distributed Implementations
• Distributed
• Multi-GPU
• Multi-machine
• Model parallelism
• Data parallelism
Synchronous SGD
Important for mobile deployment (TensorFlow Lite)
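The following is a minimal NumPy sketch of synchronous, data-parallel SGD, written only to illustrate the idea from the two slides above: each simulated worker computes a gradient on its own shard of the minibatch, the gradients are averaged (as an all-reduce would do), and one shared update keeps every worker's parameters identical. The toy regression problem and all names (num_workers, loss_grad, ...) are invented for this sketch.

```python
import numpy as np

# Toy linear-regression problem so the sketch is self-contained.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.01 * rng.normal(size=256)

def loss_grad(w, X_shard, y_shard):
    """Gradient of mean squared error on one worker's shard of data."""
    err = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ err / len(y_shard)

num_workers = 4      # hypothetical number of data-parallel workers
lr = 0.1
w = np.zeros(10)

for step in range(100):
    # Each worker receives a disjoint shard of the current minibatch.
    idx = rng.choice(len(X), size=64, replace=False)
    shards = np.array_split(idx, num_workers)

    # Every worker computes a local gradient at the same parameter values.
    grads = [loss_grad(w, X[s], y[s]) for s in shards]

    # Synchronous step: average the gradients (the all-reduce) and apply
    # one shared update, so all replicas stay in lockstep.
    w -= lr * np.mean(grads, axis=0)

print("parameter error:", np.linalg.norm(w - true_w))
```

Real implementations replace the Python loop over workers with gradient exchange across GPUs or machines; the averaging step is what makes the update synchronous.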
Dynamic Structure: Cascades
Dynamic Structure
Dataset Augmentation for
Computer Vision
Example transformations (see the sketch below):
• Affine distortion
• Elastic deformation
• Noise
• Horizontal flip
• Random translation
• Hue shift
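As a rough illustration of these transformations (a sketch under invented names, not a production augmentation pipeline), the code below applies a horizontal flip, a random translation, and a crude per-channel color jitter standing in for a hue shift, all on an image stored as an H x W x 3 NumPy array.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """Apply simple label-preserving transformations to an RGB image.

    image: float array of shape (H, W, 3) with values in [0, 1].
    """
    out = image.copy()

    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]

    # Random translation by up to max_shift pixels in each direction.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    # (np.roll wraps around the border; proper padding is omitted for brevity.)

    # Crude color jitter standing in for a hue shift: small per-channel offset.
    offset = rng.uniform(-0.05, 0.05, size=3)
    out = np.clip(out + offset, 0.0, 1.0)

    return out

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))      # stand-in for a real training image
augmented = augment(img, rng)
```

Affine and elastic deformations need interpolation and are usually done with an image library; they are omitted here to keep the sketch short.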
Generative Modeling:
Sample Generation
(Chan et al 2018)
Model-Based Optimization
(Hwang et al 2018)
Attention Mechanisms

… translations (Cho et al., 2014a) and for generating translated sentences (Sutskever et al., 2014). Jean et al. (2014) scaled these models to larger vocabularies.

12.4.5.1 Using an Attention Mechanism and Aligning Pieces of Data
[Figure: an attention mechanism forms a context c as a weighted average of feature vectors h(t), with weights \alpha(t) produced by the model, i.e. c = \sum_t \alpha^{(t)} h^{(t)}.]
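To make the figure concrete, here is a minimal NumPy sketch of that weighted average: scores for each time step come from a dot product with a query vector (one common choice, not the only one), a softmax turns them into attention weights alpha(t), and the context is the weighted sum of the h(t). All variable names are mine, not from the slides.

```python
import numpy as np

def attention_context(H, q):
    """Weighted average of feature vectors H[t] using softmax attention weights.

    H: array of shape (T, d), one feature vector h(t) per time step.
    q: query vector of shape (d,) used to score each time step.
    """
    scores = H @ q                                  # one score per time step
    scores -= scores.max()                          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
    context = alpha @ H                             # c = sum_t alpha(t) * h(t)
    return context, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # five hidden states of dimension 8
q = rng.normal(size=8)
c, alpha = attention_context(H, q)
print(alpha.sum())            # the weights sum to 1
```

Different models compute the scores differently (for example with a small network over the decoder state and each h(t)); the weighted average itself is the common core.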
Generating Training Data
Natural Language Processing

• An important predecessor to deep NLP is the family of models based on n-grams:

… natural language. Depending on how the model is designed, a token may be a word, a character, or even a byte. Tokens are always discrete entities. The earliest successful language models were based on models of fixed-length sequences of tokens called n-grams. An n-gram is a sequence of n tokens. Models based on n-grams define the conditional probability of the n-th token given the preceding n - 1 tokens. The model uses products of these conditional distributions to define the probability distribution over longer sequences:

P(x_1, \ldots, x_\tau) = P(x_1, \ldots, x_{n-1}) \prod_{t=n}^{\tau} P(x_t \mid x_{t-n+1}, \ldots, x_{t-1})  \quad (12.5)

… simply by looking up two stored probabilities. For this to exactly reproduce inference in P_n, we must omit the final character from each sequence when we train P_{n-1}.

As an example, we demonstrate how a trigram model computes the probability of the sentence "THE DOG RAN AWAY." The first words of the sentence cannot be handled by the default formula based on conditional probability because there is no context at the beginning of the sentence. Instead, we must use the marginal probability over words at the start of the sentence. We thus evaluate P_3(THE DOG RAN). Finally, the last word may be predicted using the typical case, of using the conditional distribution P(AWAY | DOG RAN). Putting this together with equation 12.6, we obtain:

P(\text{THE DOG RAN AWAY}) = P_3(\text{THE DOG RAN}) \, P_3(\text{DOG RAN AWAY}) / P_2(\text{DOG RAN})  \quad (12.7)

A fundamental limitation of maximum likelihood for n-gram models is that P_n as estimated from training set counts is very likely to be zero in many cases, even though the tuple (x_{t-n+1}, \ldots, x_t) may appear in the test set. This can cause two different kinds of catastrophic outcomes. When P_{n-1} is zero, the ratio is undefined, so the model does not even produce a sensible output. When P_{n-1} is non-zero but P_n is zero, the test log-likelihood is -\infty. To avoid such catastrophic outcomes, …

• Improve with:
  - Smoothing
  - Backoff
  - Word categories
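A minimal Python sketch of the computation in equation 12.7: trigram and bigram probabilities are estimated from counts on a tiny invented corpus, and a small additive constant stands in for the smoothing listed above. The corpus and names are made up for illustration.

```python
from collections import Counter

# Toy corpus of tokenized sentences (invented for this sketch).
corpus = [
    "THE DOG RAN AWAY".split(),
    "THE DOG RAN HOME".split(),
    "THE CAT RAN AWAY".split(),
    "A DOG RAN AWAY".split(),
]

bigrams, trigrams = Counter(), Counter()
for sent in corpus:
    for i in range(len(sent) - 1):
        bigrams[tuple(sent[i:i + 2])] += 1
    for i in range(len(sent) - 2):
        trigrams[tuple(sent[i:i + 3])] += 1

n_bi, n_tri = sum(bigrams.values()), sum(trigrams.values())
eps = 1e-6   # crude additive smoothing so unseen n-grams are not exactly zero

def P3(w1, w2, w3):
    """Marginal trigram probability estimated from counts."""
    return (trigrams[(w1, w2, w3)] + eps) / (n_tri + eps * len(trigrams))

def P2(w1, w2):
    """Marginal bigram probability estimated from counts."""
    return (bigrams[(w1, w2)] + eps) / (n_bi + eps * len(bigrams))

# Equation 12.7:
# P(THE DOG RAN AWAY) = P3(THE DOG RAN) * P3(DOG RAN AWAY) / P2(DOG RAN)
p = P3("THE", "DOG", "RAN") * P3("DOG", "RAN", "AWAY") / P2("DOG", "RAN")
print(p)
```

With eps = 0 this reduces to the pure maximum-likelihood estimator and exhibits exactly the zero-count problems described above; backoff and word categories are other ways to share statistics across rare contexts.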
Word Embeddings in Neural Language Models

[Figure: two-dimensional visualizations of learned word embeddings, zoomed in on regions where semantically related words lie close together: country and region names (France, China, Germany, Canada, Europe, Africa, Japan, ...) in one panel and years (1995-2009) in the other.]
• Short list
• Hierarchical softmax
• Importance sampling
A Hierarchy of Words and
Word Categories
[Figure 12.4: Illustration of a simple hierarchy of word categories, with 8 words w_0, ..., w_7 at the leaves of a binary tree whose branches are labeled (0) and (1).]
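A minimal sketch (my own illustration, not code from the slides or the book) of how such a hierarchy can be used as a hierarchical softmax: with 8 words at the leaves of a binary tree, each word corresponds to 3 binary decisions, and its probability given a context is the product of the probabilities of those decisions. Each internal node gets a logistic-regression weight vector; all names, dimensions, and the random context are assumptions.

```python
import numpy as np

NUM_WORDS = 8        # leaves w0 ... w7, as in Figure 12.4
DEPTH = 3            # log2(8) binary decisions per word
CONTEXT_DIM = 16

rng = np.random.default_rng(0)
# One logistic-regression weight vector per internal node (NUM_WORDS - 1 of them).
node_weights = rng.normal(size=(NUM_WORDS - 1, CONTEXT_DIM))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def word_probability(word_id, context):
    """P(word | context) as a product of binary decisions down the tree.

    The tree is stored implicitly as a heap: the root is node 0 and the
    children of node i are 2*i + 1 and 2*i + 2. The bits of word_id
    (most significant first) choose the branch at each level.
    """
    prob, node = 1.0, 0
    for level in reversed(range(DEPTH)):
        go_right = (word_id >> level) & 1
        p_right = sigmoid(node_weights[node] @ context)
        prob *= p_right if go_right else (1.0 - p_right)
        node = 2 * node + 1 + go_right
    return prob

context = rng.normal(size=CONTEXT_DIM)
probs = [word_probability(w, context) for w in range(NUM_WORDS)]
print(sum(probs))    # sums to 1: the tree defines a valid distribution over the words
```

Because only DEPTH sigmoid evaluations are needed per word, scoring one word costs O(log V) instead of the O(V) of a flat softmax, which is the point of the hierarchy.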
Neural Machine Translation
[Figure 12.5: The encoder-decoder architecture to map back and forth between a surface representation (such as a sequence of words or an image) and a semantic representation. By using the output of an encoder of data from one modality as the input to a decoder for another modality, we can train systems to translate from one modality to another.]
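As a purely structural sketch of the caption's idea (not the architecture of any real translation system), the code below folds a source token sequence into a single context vector with a tiny recurrent update and then decodes target tokens greedily, conditioned on that context. The weights are random and untrained, so the output is meaningless; vocabularies, dimensions, and all names are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
SRC_VOCAB, TGT_VOCAB, DIM = 20, 20, 32
EOS = 0                                     # hypothetical end-of-sequence token id

src_embed = rng.normal(size=(SRC_VOCAB, DIM))
tgt_embed = rng.normal(size=(TGT_VOCAB, DIM))
W_enc = 0.1 * rng.normal(size=(DIM, DIM))   # recurrent encoder weights
W_dec = 0.1 * rng.normal(size=(DIM, DIM))   # recurrent decoder weights
W_out = rng.normal(size=(TGT_VOCAB, DIM))   # maps decoder state to token logits

def encode(src_ids):
    """Fold the source sentence into one semantic context vector."""
    h = np.zeros(DIM)
    for i in src_ids:
        h = np.tanh(W_enc @ h + src_embed[i])
    return h

def decode(context, max_len=10):
    """Greedily emit target tokens conditioned on the encoder's context."""
    h, prev, out = context, EOS, []
    for _ in range(max_len):
        h = np.tanh(W_dec @ h + tgt_embed[prev])
        prev = int(np.argmax(W_out @ h))    # greedy choice of the next token
        if prev == EOS:
            break
        out.append(prev)
    return out

print(decode(encode([3, 7, 5])))            # meaningless: the weights are untrained
```

In a real system both networks are trained jointly and decoding typically uses beam search and attention; the sketch only shows the encoder-to-decoder division of labor the caption describes.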
Google Neural Machine Translation
Wu et al 2016
Speech Recognition
Current speech recognition is based on seq2seq with attention.
Graphic from "Listen, Attend, and Spell" (Chan et al., 2015).
Speech Synthesis
WaveNet
(van den Oord et al, 2016)
Deep RL for Atari game playing
(Mnih et al 2013)