Neural Language Model, RNNs

Pawan Goyal

CSE, IIT Kharagpur

CS60010


Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 1 / 30


Language Modeling
Language Modeling is the task of predicting what word comes next.

Goal: Compute the probability of a sentence or sequence of words:

P(W) = P(w_1, w_2, w_3, ..., w_n)



Related Task: probability of an upcoming word:

P(w_4 | w_1, w_2, w_3)
A model that computes either of these is called a language model
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 2 / 30
Language Modeling

You can also think of a language model as a system that assigns a probability to a piece of text.
For example, if we have some text x^(1), ..., x^(T), then the probability of this text (according to the Language Model) is:

P(x^(1), ..., x^(T)) = P(x^(1)) × P(x^(2) | x^(1)) × ... × P(x^(T) | x^(T-1), ..., x^(1)) = ∏_{t=1}^{T} P(x^(t) | x^(t-1), ..., x^(1))
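To make the chain-rule decomposition concrete, here is a minimal sketch (mine, not from the slides) that scores a sentence under a toy bigram table; for simplicity it conditions on only the previous word, and the probability values are made up purely for illustration.

```python
import numpy as np

# Toy bigram probabilities P(w_t | w_{t-1}); values are made up for illustration.
bigram_prob = {
    ("<s>", "the"): 0.4, ("the", "students"): 0.1,
    ("students", "opened"): 0.05, ("opened", "their"): 0.3,
    ("their", "books"): 0.2,
}

def sentence_log_prob(words):
    """log P(w_1, ..., w_n) = sum_t log P(w_t | w_{t-1}) under a bigram model."""
    logp = 0.0
    for prev, curr in zip(["<s>"] + words[:-1], words):
        logp += np.log(bigram_prob.get((prev, curr), 1e-8))  # back off to a tiny probability
    return logp

print(sentence_log_prob(["the", "students", "opened", "their", "books"]))
```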

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 3 / 30


You use language models every day!

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 4 / 30


You use language models every day!

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 5 / 30


Why should we care about language modeling?

Language Modeling is a benchmark task that helps us measure our progress on understanding language.
Language Modeling is fundamental to many NLP tasks, especially those
involving generating text or estimating the probability of text:

Predictive typing
Speech recognition
Machine translation
Chatbots
Handwriting recognition
Spelling/grammar correction

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 6 / 30
n-gram language models
[Slide: estimating probabilities such as P(books | students opened their) from n-gram counts]
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 7 / 30
n-gram language models

[Slide: a 4-gram language model example]
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 8 / 30


n-gram language models: Example

[Worked example: estimating a conditional probability from n-gram counts in a corpus, e.g., a context count such as 1000 yielding an estimate such as 0.02]
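Since the worked numbers on this slide did not survive extraction, here is a hedged sketch of how such an estimate is computed: count the n-gram and its prefix in a corpus and take their ratio (maximum-likelihood estimation). The toy corpus below is illustrative only.

```python
from collections import Counter

corpus = "the students opened their books the students opened their minds".split()

def ngram_counts(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

four_grams = ngram_counts(corpus, 4)
tri_grams = ngram_counts(corpus, 3)

# MLE estimate: P(books | students opened their)
#   = count(students opened their books) / count(students opened their)
context = ("students", "opened", "their")
p_books = four_grams[context + ("books",)] / tri_grams[context]
print(p_books)  # 0.5 with this toy corpus
```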
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 9 / 30
Storage Problems with n-gram Language Model


Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 10 / 30


A fixed-window neural language model
[Slide: a fixed-window neural LM over a context such as "students opened their": the one-hot words are mapped to d-dimensional embeddings, concatenated into a 3d-dimensional input, passed through a hidden layer, and fed to a softmax over the vocabulary]
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 11 / 30


A fixed-window neural language model

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 12 / 30


How do we obtain word representations?

In traditional NLP / IR, words are treated as discrete symbols.

One-hot representation
Words are represented as one-hot vectors: one 1, the rest 0s

What is the problem?


Vector dimension = number of words in vocabulary (e.g., 500,000)
The vectors are orthogonal, and there is no natural notion of similarity between one-hot vectors!
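A quick sketch (not from the slides) that illustrates the problem: the one-hot vectors of any two distinct words have zero dot product, so "hotel" looks no more similar to "motel" than to "banana".

```python
import numpy as np

vocab = ["hotel", "motel", "banana"]

def one_hot(word, vocab):
    """One-hot vector: dimension = |V|, a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

print(np.dot(one_hot("hotel", vocab), one_hot("motel", vocab)))   # 0.0
print(np.dot(one_hot("hotel", vocab), one_hot("banana", vocab)))  # 0.0: no notion of similarity
```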



Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 13 / 30


Word2Vec – A distributed representation
Distributional representation (word embedding)
Any word w_i in the corpus is given a distributional representation by an embedding w_i ∈ R^d, i.e., a d-dimensional vector, which is mostly learnt!


Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 14 / 30


Distributional Representation: Illustration
If we label the dimensions in a hypothetical word vector (there are no such
pre-assigned labels in the algorithm of course), it might look a bit like this:

[Figure: a hypothetical word vector with labelled dimensions]
Such a vector represents the ‘meaning’ of a word in some abstract way

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 15 / 30


Learning Word Vectors: Overview

Supervised learning uses training data with human-generated labels; unsupervised learning uses just the data, with no labels; self-supervised learning generates the labels from the data itself.
Basic Idea: Use self-supervision
We have a large corpus of text
Every word in a fixed vocabulary is represented by a vector
Go through each position t in the text, which has a center word c and
context (“outside”) words o
Use the similarity of the word vectors for c and o to calculate the
probability of o given c (or vice versa)
Keep adjusting the word vectors to maximize this probability

(The word vectors themselves are the parameters being learnt.)
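To make the center-word / outside-word loop concrete, here is a small illustrative sketch (window size and corpus are my own choices) that walks over each position t and yields the (center, context) pairs used as self-supervised training examples.

```python
corpus = "we will use the similarity of the word vectors".split()
window = 2  # context words on each side; illustrative choice

def skipgram_pairs(tokens, window):
    """Yield (center, context) pairs for every position t in the text."""
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                yield center, tokens[t + j]

for c, o in list(skipgram_pairs(corpus, window))[:6]:
    print(f"P({o} | {c})")
```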

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 16 / 30


Word2Vec (Skip-gram) Overview

Example windows and process for computing P(w_{t+j} | w_t)

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 17 / 30


Word2Vec Overview

Example windows and process for computing P(w_{t+j} | w_t)

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 18 / 30


Word2Vec: objective function

We want to minimize the loss function:

J(θ) = -(1/T) Σ_{t=1}^{T} Σ_{-m ≤ j ≤ m, j ≠ 0} log P(w_{t+j} | w_t; θ)

How to calculate P(w_{t+j} | w_t; θ)?

We will use two vectors per word w:

v_w when w is a center word
u_w when w is a context word

Then, for a center word c and a context word o:

P(o | c) = exp(u_o^T v_c) / Σ_{w ∈ V} exp(u_w^T v_c)
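A minimal numerical sketch of this formula, assuming a tiny vocabulary and random vectors: the dot products u_w^T v_c are exponentiated and normalised over the vocabulary, i.e., a softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                      # toy vocabulary size and embedding dimension
U = rng.normal(size=(V, d))      # context ("outside") vectors u_w, one row per word
v_c = rng.normal(size=d)         # center word vector v_c

scores = U @ v_c                              # u_w^T v_c for every word w
p = np.exp(scores) / np.exp(scores).sum()     # P(o | c) for every candidate o
print(p, p.sum())                             # a probability distribution over V words
```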

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 19 / 30


Understanding P(o|c) further

P(o | c) = exp(u_o^T v_c) / Σ_{w ∈ V} exp(u_w^T v_c)
The dot product u_o^T v_c measures the similarity of the context word o and the center word c; exponentiation makes every term positive; and dividing by the sum over the whole vocabulary normalises the result into a probability distribution. This is exactly the softmax function.

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 20 / 30


[Board sketch: skip-gram viewed as a network. The one-hot center word selects its d-dimensional vector v_c from the |V| × d matrix of center-word vectors (the hidden layer); v_c is scored against every context vector u_w and passed through a softmax, giving exp(u_o^T v_c) / Σ_{w ∈ V} exp(u_w^T v_c)]
Try this problem

Skip-gram
Suppose you are computing the word vectors using Skip-gram architecture.
You have 5 words in your vocabulary,
{passed, through, relu, activation, function} in that order and suppose you
have the window, ‘through relu activation’ in your corpora. You use this window
with ‘relu’ as the center word and one word before and after the center word as
your context.

Compute the loss


Also, suppose that for each word, you have 2-dim in and out vectors, which
have the same value at this point given by [1,-1],[1,1],[-2,1],[0,1],[1,0] for the 5
words, respectively. As per the Skip-gram architecture, the loss corresponding
to the target word “activation” would be log(x). What is the value of x?
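A hedged sketch for checking the arithmetic numerically; it assumes, as in the slides' objective, that the loss for one target word is -log P(o | c), and since the loss equals log(x), it prints x = exp(loss).

```python
import numpy as np

# Shared in/out vectors for {passed, through, relu, activation, function}, from the exercise.
vecs = np.array([[1, -1], [1, 1], [-2, 1], [0, 1], [1, 0]], dtype=float)
words = ["passed", "through", "relu", "activation", "function"]

v_c = vecs[words.index("relu")]            # center word vector
scores = vecs @ v_c                        # u_w^T v_c for every word (in = out vectors here)
probs = np.exp(scores) / np.exp(scores).sum()

p_target = probs[words.index("activation")]
loss = -np.log(p_target)                   # skip-gram loss for the target "activation"
print(loss, np.exp(loss))                  # loss = log(x), so x = exp(loss)
```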

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 21 / 30


Homework

Compute the partial derivative of the loss with respect to v_c

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 22 / 30


A fixed-window neural language model: Pros and Cons

[Slide: pros and cons of the fixed-window neural language model (the window size is fixed)]

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 23 / 30
Recurrent Neural Networks

Core Idea
Apply the same weights repeatedly!

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 24 / 30


Recurrent Neural Networks

We can process a sequence of vectors x by applying a recurrence formula at each step:

h_t = f_W(h_{t-1}, x_t)

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 25 / 30


RNN as a feed-forward network

[Board sketch: the RNN drawn as a feed-forward network, with weight shapes W: d_h × d_in, U: d_h × d_h, V: d_out × d_h]

h_t = f(U h_{t-1} + W x_t + b)
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 26 / 30
Forward Propagation

h_t = g(U h_{t-1} + W x_t)

y_t = softmax(V h_t)

Let the dimensions of the input, hidden and output be d_in, d_h and d_out, respectively
The three parameter matrices: W: d_h × d_in, U: d_h × d_h, V: d_out × d_h
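A minimal numpy sketch of these two equations, with illustrative dimensions and random parameter values, just to make the shapes concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 3, 5          # illustrative dimensions

W = rng.normal(size=(d_h, d_in))    # input-to-hidden
U = rng.normal(size=(d_h, d_h))     # hidden-to-hidden (recurrent)
V = rng.normal(size=(d_out, d_h))   # hidden-to-output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(h_prev, x_t):
    """h_t = g(U h_{t-1} + W x_t); y_t = softmax(V h_t)."""
    h_t = np.tanh(U @ h_prev + W @ x_t)
    y_t = softmax(V @ h_t)
    return h_t, y_t

h = np.zeros(d_h)
for x in rng.normal(size=(6, d_in)):    # a sequence of 6 input vectors
    h, y = rnn_step(h, x)
print(y.shape, y.sum())                 # (5,) and 1.0
```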

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 27 / 30


RNN Unrolled in Time
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 28 / 30


Training an RNN language model

To train an RNN LM, we use self-supervision (or self-training)


We take a corpus of text as training material
At each time step t, we ask the model to predict the next word

Why is it called self-supervision?


We do not add any gold data; the natural sequence of words is its own supervision!
We simply train the model to minimize the error in predicting the true next
word in the training sequence
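A hedged sketch of this objective (tiny vocabulary and random parameters, purely illustrative): at each time step the model reads the current word and pays a cross-entropy penalty on the true next word.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "the", "students", "opened", "their", "books"]
V, d = len(vocab), 8

E = rng.normal(size=(V, d)) * 0.1   # word embeddings
W = rng.normal(size=(d, d)) * 0.1   # input-to-hidden
U = rng.normal(size=(d, d)) * 0.1   # hidden-to-hidden (recurrent)
O = rng.normal(size=(V, d)) * 0.1   # hidden-to-vocabulary

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

sentence = ["<s>", "the", "students", "opened", "their", "books"]
ids = [vocab.index(w) for w in sentence]

h, loss = np.zeros(d), 0.0
for t in range(len(ids) - 1):
    h = np.tanh(U @ h + W @ E[ids[t]])   # read word t
    p = softmax(O @ h)                   # predict a distribution over the next word
    loss += -np.log(p[ids[t + 1]])       # cross-entropy against the true next word
print(loss / (len(ids) - 1))             # average per-step loss
```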

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 29 / 30


Training an RNN language model

[Diagram: the RNN LM unrolled in time during training, with a loss computed at each time step]
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 30 / 30


Generating text with an RNN Language Model

RNN-based language models can be used for language generation (and
hence, for machine translation, dialog, etc.)
A language model can incrementally generate words by repeatedly
sampling the words conditioned on the previous choices – also known as
autoregressive generation.

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 31 / 41


Autoregressive Generation with RNNs

All your parameters have already been trained.


Start with a special begin-of-sentence token <s> as input
Through forward propagation, obtain the probability distribution at the
output, and sample a word
Feed the word as input at the next time-step (its word vector)
Continue generating until the end-of-sentence token is sampled, or a fixed sentence length has been reached.
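A hedged sketch of this sampling loop with a tiny illustrative vocabulary and random parameters; the names and the stopping condition follow the steps above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "</s>", "the", "students", "opened", "their", "books"]
V, d = len(vocab), 8
E = rng.normal(size=(V, d)) * 0.1   # word embeddings
W = rng.normal(size=(d, d)) * 0.1   # input-to-hidden
U = rng.normal(size=(d, d)) * 0.1   # hidden-to-hidden (recurrent)
O = rng.normal(size=(V, d)) * 0.1   # hidden-to-vocabulary

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def generate(max_len=10):
    """Start from <s>, sample a word, feed it back in, stop at </s> or max_len."""
    h, w, out = np.zeros(d), vocab.index("<s>"), []
    for _ in range(max_len):
        h = np.tanh(U @ h + W @ E[w])
        w = rng.choice(V, p=softmax(O @ h))   # sample from the output distribution
        if vocab[w] == "</s>":
            break
        out.append(vocab[w])
    return " ".join(out)

print(generate())
```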

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 32 / 41


Autoregressive Generation with RNNs

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 33 / 41


RNNs can be used for various other applications

Sequence labeling: Named Entity Recognition, Parts-of-Speech Tagging

Text Classification: Sentiment Analysis, Spam Detection
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 34 / 41


RNNs for Sequence Labeling

Task
Assign a label chosen from a small fixed set of labels to each element of the
sequence
Inputs: Word embeddings
Outputs: Tag probabilities generated by the softmax layer

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 35 / 41


RNNs for Sequence Labeling

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 36 / 41


RNNs for Sequence Classification

Task
Classify the entire sequence rather than the tokens within it
Pass the text to be classified a word at a time, generating new hidden
states at each time step
The hidden state of the last token can be thought of as a compressed
representation of the entire sequence
This last hidden state is passed through a feed-forward network that
chooses a class via softmax
There are other options of combining information from all the hidden
states
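A minimal sketch of the "last hidden state as sequence representation" idea, with illustrative dimensions (e.g., 300-dimensional word vectors, a 50-dimensional hidden state) and a small softmax classifier on top; weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, n_classes = 300, 50, 2              # e.g., 300-d word vectors, 2 sentiment classes

W = rng.normal(size=(d_h, d_in)) * 0.01
U = rng.normal(size=(d_h, d_h)) * 0.01
C = rng.normal(size=(n_classes, d_h)) * 0.01   # feed-forward classifier weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(x_seq):
    """Run the RNN over the sequence and classify from the final hidden state."""
    h = np.zeros(d_h)
    for x_t in x_seq:
        h = np.tanh(U @ h + W @ x_t)
    return softmax(C @ h)                 # class probabilities for the whole sequence

print(classify(rng.normal(size=(7, d_in))))   # a 7-word "sentence" of random embeddings
```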

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 37 / 41


RNNs for Sequence Classification

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 38 / 41


Other Variations: Stacked RNNs

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 39 / 41


Other Variations: Bidirectional RNNs

An RNN makes use of information from the left (prior) context to predict at time t
In many applications, the entire sequence is available; so it makes sense
to also make use of the right context to predict at time t
Bidirectional RNNs combine two independent RNNs, one where the input
is processed from left to right (forward RNN), and another from the end to the start (backward RNN).
h^f_t = RNN_forward(x_1, ..., x_t)
h^b_t = RNN_backward(x_n, ..., x_t)
h_t = [h^f_t; h^b_t]
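A short sketch of the concatenation above (assumed shapes, random weights): run one RNN left to right, another right to left, and stack the two hidden states at each position.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3

def run_rnn(xs, W, U):
    """Return the hidden state at every position for one direction."""
    h, hs = np.zeros(d_h), []
    for x in xs:
        h = np.tanh(U @ h + W @ x)
        hs.append(h)
    return hs

Wf, Uf = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))   # forward RNN weights
Wb, Ub = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))   # backward RNN weights

xs = list(rng.normal(size=(5, d_in)))
h_fwd = run_rnn(xs, Wf, Uf)                    # forward RNN over x_1 ... x_n
h_bwd = run_rnn(xs[::-1], Wb, Ub)[::-1]        # backward RNN, re-aligned to positions 1..n

h = [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]   # h_t = [h^f_t ; h^b_t]
print(h[0].shape)                              # (6,) = 2 * d_h
```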
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 40 / 41


Other Variations: Bidirectional RNNs

Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs CS60010 41 / 41


RNNs: Other Applications, LSTMs

Pawan Goyal

CSE, IIT Kharagpur

CS60010

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 1 / 25


" Froxo 50x
Y >
-

U
50
: :

>
- - -
> he Va he
°
-

ho
~

>
-

#
↑ 300
E
E
:

:
50x50

300450
3
s da
-
-

4.
[in]
=

Ex50
:
,
of classes
5-6 weeks

/Train
-

-Mi-
zi ne
.

-Endsch

& Assignment Project hum

&
I

fe35mini
K
terliest
George moni
*
quig

quigals edo
After a
%

20 -
L
>
-

-
Using Bidirectional RNNs for Sequence Classification


Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 2 / 25


Need for better units: Vanishing Gradient


Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 3 / 25


Effect of vanishing gradient on RNN LM

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 4 / 25


Effect of vanishing gradient on RNN LM
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 5 / 25


How to fix the vanishing gradient problem?

The main problem is that it is too difficult for the RNN to learn to preserve
information over many timesteps.
In a vanilla RNN, the hidden state is constantly being rewritten

h^(t) = tanh(U h^(t-1) + W x^(t))
How about better RNN units?


Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 6 / 25


Using Gates for better RNN units

The gates are also vectors


On each timestep, each element of the gates can be open (1), closed (0), or somewhere in between.
The gates are dynamic: their value is computed based on the current
context.

Two famous architectures


GRUs, LSTMs

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 7 / 25


Long Short Term Memory (LSTM)
[Diagram: an LSTM cell, with hidden state h_t and context (cell) vector c_t carried across time steps]
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 8 / 25
LSTM: More Details

For context management, an explicit context layer is added to the


architecture
It makes use of specialized neural units (gates) to control the flow of
information
The gates share a common design: the sigmoid pushes its output towards 0 or 1, so it acts as a (soft) binary mask.

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 9 / 25


LSTM: In Equations
Forget Gate
Controls what is kept vs forgotten from the context

f_t = σ(U_f h_{t-1} + W_f x_t)

Input Gate
Controls what parts of new cell content are written to the context

i_t = σ(U_i h_{t-1} + W_i x_t)

Output Gate
Controls what part of context are output to hidden state

o_t = σ(U_o h_{t-1} + W_o x_t)

New Cell content: g_t = tanh(U_g h_{t-1} + W_g x_t)

New Context Vector: c_t = i_t ⊙ g_t + f_t ⊙ c_{t-1}
New Hidden State: h_t = o_t ⊙ tanh(c_t)
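A direct numpy transcription of the six equations above, as a sketch with illustrative dimensions and random parameters rather than a tuned implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# One (U, W) pair per gate / cell candidate: forget, input, output, g
params = {k: (rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_in))) for k in "fiog"}

def lstm_step(h_prev, c_prev, x_t):
    f = sigmoid(params["f"][0] @ h_prev + params["f"][1] @ x_t)   # forget gate
    i = sigmoid(params["i"][0] @ h_prev + params["i"][1] @ x_t)   # input gate
    o = sigmoid(params["o"][0] @ h_prev + params["o"][1] @ x_t)   # output gate
    g = np.tanh(params["g"][0] @ h_prev + params["g"][1] @ x_t)   # new cell content
    c = i * g + f * c_prev                                        # new context vector
    h = o * np.tanh(c)                                            # new hidden state
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(6, d_in)):
    h, c = lstm_step(h, c, x)
print(h, c)
```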
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 10 / 25
How does LSTM solve vanishing gradients?


The LSTM architecture makes it easier for the RNN to preserve information
over many timesteps
e.g., if the forget gate is set to remember everything on every timestep,
then the info in the cell is preserved indefinitely
By contrast, it is harder for a vanilla RNN to learn a recurrent weight matrix U that preserves info in the hidden state
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 11 / 25
Common RNN NLP Architectures

[Diagram: common RNN architectures, e.g., an encoder-decoder used for translation]
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 12 / 25


Encoder-decoder networks

Also known as sequence-to-sequence networks; they are capable of generating contextually appropriate, arbitrary-length output sequences given the input sequence.

Three conceptual components

An encoder that accepts an input sequence x_{1:n} and generates a corresponding sequence of contextualized representations h_{1:n}

A context vector, c, which is a function of h_{1:n} and conveys the essence of the input to the decoder

A decoder which accepts c as input and generates an arbitrary-length sequence of hidden states h_{1:m}, from which the corresponding output states y_{1:m} can be obtained
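A compact sketch of the three components with assumed shapes and random weights, just to make the data flow concrete; the simplifications are noted in the comments.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
Ue, We = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # encoder RNN weights
Ud, Wd = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # decoder RNN weights

def encode(xs):
    """Encoder: produce contextualized states h_1 ... h_n for the input sequence."""
    h, hs = np.zeros(d), []
    for x in xs:
        h = np.tanh(Ue @ h + We @ x)
        hs.append(h)
    return hs

def decode(c, m):
    """Decoder: start from the context vector c and unroll m hidden states."""
    h, ys = c, []
    for _ in range(m):
        h = np.tanh(Ud @ h + Wd @ h)   # simplified: previous state fed back as the input
        ys.append(h)
    return ys

h_enc = encode(rng.normal(size=(5, d)))
c = h_enc[-1]                          # simplest choice: context = last encoder state
print(len(decode(c, 3)))
```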

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 13 / 25


Encoder-decoder networks

[Diagram: encoder-decoder network; the input is a text sequence, the output is a text sequence, linked by the context vector c]
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 14 / 25


Encoder-decoder networks for translation
[Diagram: encoder-decoder for translation, with encoder states h_1 ... h_n and decoder states s_i]
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 15 / 25


Training the Encoder-decoder model

End-to-end training
For MT, the training data typically consists of set of sentences and their
translations
The network is given a source sentence and then a separator token, it is
trained auto-regressively to predict the next word
Teacher forcing is used during training, i.e., the system is forced to use the gold target token from training as the next input x_{t+1}, rather than relying on the last decoder output ŷ_t


(Teacher forcing is used only during training; at inference time the decoder must rely on its own previous output.)
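A hedged, toy-sized sketch of teacher forcing in the decoder loop (the vocabulary, shapes and helper names are my own illustrative choices): at training time the gold token is fed in as the next input regardless of what the model predicted.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<sep>", "</s>", "ich", "bin", "hier"]   # illustrative target vocabulary
V, d = len(vocab), 6
E = rng.normal(size=(V, d)) * 0.1   # target-word embeddings
U = rng.normal(size=(d, d)) * 0.1   # decoder recurrent weights
W = rng.normal(size=(d, d)) * 0.1   # input-to-hidden weights
O = rng.normal(size=(V, d)) * 0.1   # hidden-to-vocabulary weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decoder_training_loss(c, gold):
    """Teacher forcing: the input at step t+1 is the gold token, not the model's own sample."""
    h, loss, prev = c, 0.0, vocab.index("<sep>")
    for target in gold:
        h = np.tanh(U @ h + W @ E[prev])
        p = softmax(O @ h)
        loss += -np.log(p[vocab.index(target)])   # penalise the gold next token
        prev = vocab.index(target)                # feed the gold token back in (teacher forcing)
    return loss

context = rng.normal(size=d)                      # stands in for the encoder's context vector c
print(decoder_training_loss(context, ["ich", "bin", "hier", "</s>"]))
```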
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 16 / 25
Training the Encoder-decoder model


Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 17 / 25


Encoder-decoder: Bottleneck

The context vector, h_n, is the hidden state of the last time step of the source text
It acts as a bottleneck, as it has to represent absolutely everything about the meaning of the source text; this is the only thing the decoder knows about the source text
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 18 / 25
Encoder-decoder with attention

[Diagram: encoder-decoder with attention; at each decoding step an attention distribution is computed over the encoder states]
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 19 / 25


Attention: In Equations

The context vector ci is generated anew with each decoding step i


h^d_i = g(ŷ_{i-1}, h^d_{i-1}, c_i)
Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 20 / 25
Attention: In Equations

Computing c_i
Compute how much to focus on each encoder state, by seeing how relevant it is to the decoder state captured in h^d_{i-1} – give it a score

Simplest scoring mechanism is dot-product attention

score(h^d_{i-1}, h^e_j) = h^d_{i-1} · h^e_j

Normalize these scores using softmax to create a vector of weights

α_ij = softmax(score(h^d_{i-1}, h^e_j))

A fixed-length context vector is created for the current decoder state

c_i = Σ_j α_ij h^e_j
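A numpy sketch of these three equations with illustrative shapes: score each encoder state against the previous decoder state, softmax the scores into weights, and take the weighted sum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4
h_enc = rng.normal(size=(n, d))        # encoder states h^e_1 ... h^e_n
h_dec_prev = rng.normal(size=d)        # previous decoder state h^d_{i-1}

scores = h_enc @ h_dec_prev                          # dot-product attention scores
alpha = np.exp(scores) / np.exp(scores).sum()        # softmax -> attention weights alpha_ij
c_i = alpha @ h_enc                                  # fixed-length context vector c_i
print(alpha, c_i.shape)
```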

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 21 / 25


Attention is quite helpful

Attention improves NMT performance


It is useful to allow decoder to focus on certain parts of the source

Attention helps with the long-term dependency problem


Provides shortcut to faraway states

Attention provides some interpretability


By inspecting attention distribution, we can see what the decoder was
focusing on
We get alignment for free even if we never explicitly trained an alignment
system

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 22 / 25


Example: Machine Translation

Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 23 / 25


Example: Text Summarization

A Neural Attention Model for Sentence Summarization, EMNLP 2015


Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 24 / 25
Summary

Attention has proved to be a very impactful idea in NLP


Lots of new models are based on self-attention, e.g., the Transformer and BERT

Pawan Goyal (IIT Kharagpur) RNNs: Other Applications, LSTMs CS60010 25 / 25

