
Deep Learning for NLP:

Neural Models

Ashish Anand
Professor, Dept. of CSE, IIT Guwahati
Associated Faculty, Mehta Family School of Data Science and AI, IIT Guwahati
Outline: Introduction to NLP

• Neural Language Models


• Vector Semantics
• CNN Models for Classification
• RNN Models for NLP Tasks
NEURAL LANGUAGE MODEL
Pre-Transformer Era
Feed-Forward Neural Language Model

Bengio et al, JMLR 03
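Added note (not part of the original slides): a minimal NumPy sketch of the forward pass of a Bengio-style feed-forward language model, assuming illustrative sizes and randomly initialized weights. The context embeddings are concatenated, passed through a tanh hidden layer, and mapped to a softmax over the vocabulary.

import numpy as np

rng = np.random.default_rng(0)
V, d, context, hidden = 10_000, 64, 3, 128       # illustrative sizes (assumptions)

C = rng.normal(size=(V, d))                      # word embedding table
H = rng.normal(size=(hidden, context * d))       # input-to-hidden weights
U = rng.normal(size=(V, hidden))                 # hidden-to-output weights
b1, b2 = np.zeros(hidden), np.zeros(V)

def next_word_probs(context_ids):
    """P(w_t | previous `context` words) for an n-gram-style neural LM."""
    x = C[context_ids].reshape(-1)               # concatenated context embeddings
    h = np.tanh(H @ x + b1)                      # tanh hidden representation
    logits = U @ h + b2
    e = np.exp(logits - logits.max())            # numerically stable softmax
    return e / e.sum()

probs = next_word_probs([12, 45, 7])             # hypothetical ids for a 3-word context
print(probs.shape, probs.sum())                  # (10000,) 1.0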


Advantages over statistical n-gram models
• Better flexibility in considering a larger context
Advantages over statistical n-gram models
• Better generalizability
• Can generalize to contexts not seen during training
• Example: need to estimate P(reading | Ram is)
• Suppose "Ram is reading" does not occur in the training data, but "John is reading" does, along with sentences such as "Ram is writing" and "John is writing"
• The word representations learned by the model for "Ram" and "John" are then likely to be similar, so the model will assign a probability to P(reading | Ram is) similar to P(reading | John is)
Major drawbacks

• Inefficient
• Unable to exploit sequential nature of text
• Limited Context
• Unidirectional
HANDLING THE DRAWBACKS
Inefficiency: Hierarchical Softmax

Neural Network Lectures by Hugo Larochelle
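Added illustration (not from the lecture; a rough sketch of the general idea): with hierarchical softmax, each word is a leaf of a binary tree and its probability is a product of sigmoid decisions along the root-to-leaf path, so scoring one word costs O(log |V|) instead of the O(|V|) of a flat softmax. The node vectors, paths, and sizes below are made up.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hierarchical_word_prob(h, path_nodes, path_signs, node_vecs):
    """P(word | context) as a product of binary decisions along the word's tree path.

    h          : context/hidden vector, shape (d,)
    path_nodes : indices of the internal nodes on the root-to-leaf path
    path_signs : +1 for "go left", -1 for "go right" at each node (a common convention)
    node_vecs  : vectors of all internal nodes, shape (num_internal_nodes, d)
    """
    p = 1.0
    for node, sign in zip(path_nodes, path_signs):
        p *= sigmoid(sign * node_vecs[node] @ h)
    return p

rng = np.random.default_rng(0)
h = rng.normal(size=8)                      # toy context vector
node_vecs = rng.normal(size=(15, 8))        # toy tree with 15 internal nodes
print(hierarchical_word_prob(h, path_nodes=[0, 1, 4], path_signs=[+1, -1, +1]))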


Vector Semantics

• Study of vector representation of words


• A model to represent the meaning of words

• Question: what are the different aspects of meaning?

• Answer: lexical semantics, the linguistic study of word meaning
DESIDERATA OF WORD MEANING
Polysemy: Multiple senses of words

• Examples
• Class : Teaching group / Economic Group / Rank
• Right: Correct / A direction
• Mouse: Animal / a specific computer peripheral
• Each of these multiple meanings is called a word sense
Synonymy: Similar Sense

• Relation: Synonyms/Antonyms
• Synonym: one word has a sense whose meaning is identical to a
sense of another word, or nearly identical
• Examples: couch/sofa; vomit/throw up
• Antonym: meanings are opposite
• Examples: long/short; big/little; fast/slow; rise/fall
Word Similarity: Relations beyond synonyms/antonyms

• Word Similarity
• Words with similar meanings but not synonyms
• Cat / Dog

SimLex-999 dataset (Hill et al., 2015):

word1     word2        similarity
vanish    disappear    9.8
behave    obey         7.3
belief    impression   5.95
muscle    bone         3.65
modest    flexible     0.98
hole      agreement    0.3
Adapted from Jurafsky and Martin’s slide
Word Relatedness / Word Association

• Words are related by semantic frame or field


• Cat, Dog : similar
• Student, Teacher: related but not similar
• Relatedness: co-participation in a shared event

Adapted from Jurafsky and Martin’s slide


Semantic Field

• Set of words covering a particular semantic domain, and


• Have structured relations among them
• Example
• University
• Teacher, student, study, class, lecture, assignment, project
• House
• Room, door, furniture, bedroom
• Topic Modeling: example of semantic field
Semantic Frame

• Semantic Frames and Roles


• Set of words denoting perspectives or participants in a particular
event type
• Different agents playing distinct roles in a single event
• Teaching/Learning
• Doctor/Patient
• Buyer/Seller
Taxonomic Relations
One sense is a subordinate/hyponym of another if the first sense is
more specific, denoting a subclass of the other
• car is a subordinate of vehicle
• mango is a subordinate of fruit
Conversely, superordinate / hypernym
• vehicle is a superordinate of car
• fruit is a superordinate of mango

Superordinate:   vehicle   fruit   furniture
Subordinate:     car       mango   chair
Adapted from Jurafsky and Martin’s slide
Connotation: Affective Meaning

• Aspects of a word's meaning that are related to a writer's or reader's emotions, opinions, or evaluations
• Happy: Positive connotations vs Sad: Negative connotations
• Great: Positive evaluation vs Terrible: negative evaluation
Connotation: Three-dimensional vector representation
• Three important dimensions of affective meaning (Osgood et al., 1957)
• Valence: pleasantness of the stimulus (happy vs annoyed)
• Arousal: intensity of emotion provoked by the stimulus (excited vs calm)
• Dominance: degree of control exerted by the stimulus (controlling vs awed)
In Summary

• Words
• Have multiple senses, leading to complex relations between words
• Synonymy / Antonymy
• Similarity
• Relatedness
• Taxonomic Relations: Hypernym/Hyponym
• Connotation

• The challenge is how to obtain an appropriate representation


Distributional hypothesis: radically
different approach
• Ludwig Wittgenstein
• Linguist / Philosopher of language
• Meaning of a word is its use in language
• Joos, Harris and Firth
• Define a word by the distribution it occurs in language use
Context determines meaning of
words
• Harris (1954)
• "Oculist and eye-doctor … occur in almost the same environments"
• Generalize it: "If A and B have almost identical environments … we
say that they are synonyms"

• Firth (1957)
• "You shall know a word by the company it keeps!"
Context determines meaning of words

A bottle of tesguino is on the table.
Everybody likes tesguino.
Tesguino makes you drunk.
We make tesguino out of corn.

From these contexts alone, a reader can infer that tesguino is an alcoholic beverage made from corn: words that occur in similar contexts tend to have similar meanings.
Broad categories of vector space
models
• Long and Sparse vector representation
• Co-occurrence matrix based methods (term-doc, term-term matrices based on
MI, tf-idf etc.)

• Short and Dense vector representation


• Dimensionality reduction techniques such as Singular value decomposition
(Latent Semantic Analysis) on co-occurrence matrix
• Neural language inspired models (skip-grams, CBOW)
• GloVe

• Other Methods
• Clustering methods: Brown Clusters [Collins lecture]
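Added illustration (not from the slides): a toy contrast between the two families above. A term-term co-occurrence matrix gives long, sparse vectors; a truncated SVD of that matrix gives short, dense vectors in the spirit of Latent Semantic Analysis. The corpus and window size are invented for the example.

import numpy as np

corpus = [
    "we make tesguino out of corn",
    "everybody likes tesguino",
    "a bottle of tesguino is on the table",
]
window = 2                                   # symmetric context window (assumption)

# Long and sparse: term-term co-occurrence counts within +/- `window` words.
vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                X[idx[w], idx[words[j]]] += 1

# Short and dense: keep the top-k singular directions (LSA-style embedding).
U, S, Vt = np.linalg.svd(X)
k = 3
dense = U[:, :k] * S[:k]                     # one k-dimensional vector per word
print(dense[idx["tesguino"]])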
GLOBAL VECTOR
GloVe

• Main Idea: Use global co-occurrence statistics and linear relationships

• Co-occurrence matrix from Term-Context matrix


GloVe: Notation
GloVe: Intuition

• The distributional relationship between two words is examined with the help of a probe/context word

Source: Pennington et al. (GloVe: Global Vectors for Word Representation)


GloVe
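Added note (the figure for this slide is not reproduced): for reference, the weighted least-squares objective from Pennington et al. (2014) that GloVe minimizes is

J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

where X_{ij} is the co-occurrence count of words i and j, w_i and \tilde{w}_j are word and context vectors with biases b_i and \tilde{b}_j, and f is a weighting function that down-weights rare and very frequent co-occurrences (in the paper, f(x) = (x / x_{max})^{\alpha} for x < x_{max} and 1 otherwise).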
Outline

• Convolutional Neural Network (CNN/ConvNet)


• Motivation
• 1d and 2d Convolution
• CNN for Text (Sequence Data)
• More Terminologies
• An Example

• RNN
• Vanilla RNN
Convolution: Motivation

• Two Questions –

• What filters or feature extractors were used to extract image features,


specifically for image classification task?

• What were the common set of handcrafted features in NLP domain?


Convolution: Motivation in Image
Classification Task

• LeCun and Bengio, 1995

• Object Detectors

Source:
1. https://www.cs.columbia.edu/education/courses/course/COMSW4995-7/26050/
2. https://towardsdatascience.com/a-beginners-guide-to-convolutional-neural-networks-cnns-14649dbddce8
Convolution: Motivation

• NLP: Text Classification


• n-grams as features
• Example: “The movie was based on a true story” -> “the movie was”, “movie
was based” and so on
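Added illustration (not from the slides): a tiny Python sketch of extracting the word n-grams that such filters operate over.

def ngrams(sentence, n=3):
    """Return all contiguous word n-grams of a sentence."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("The movie was based on a true story", n=3))
# ['The movie was', 'movie was based', 'was based on', 'based on a',
#  'on a true', 'a true story']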
Convolution: Motivation

• Capture local predictor/features

• Avoid dense or fully connected networks

• Translation-invariant (important specifically in image)


Convolution: 2d convolution in image

• Grid data: 2d convolution

Input Image (4 x 4):
1 0 1 1
0 1 1 0
1 0 1 0
0 1 0 1

Filter (Kernel) (3 x 3):
1 0 1
0 0 0
0 1 0

Output Image (2 x 2):
2 2
2 1
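Added illustration (not part of the original slides): a minimal NumPy sketch that reproduces the output above by sliding the 3 x 3 kernel over the 4 x 4 image (convolution as used in CNNs, i.e. cross-correlation without flipping the kernel).

import numpy as np

image = np.array([[1, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]])
kernel = np.array([[1, 0, 1],
                   [0, 0, 0],
                   [0, 1, 0]])

out_h = image.shape[0] - kernel.shape[0] + 1   # 2
out_w = image.shape[1] - kernel.shape[1] + 1   # 2
output = np.zeros((out_h, out_w), dtype=int)
for i in range(out_h):
    for j in range(out_w):
        # element-wise product of the kernel with the 3x3 window, then sum
        output[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(output)   # [[2 2]
                #  [2 1]]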


1d Convolution: Convolution for Text (Sequence)

the     0.3   0.2   0.1  -0.4
movie   0.5   0.4  -0.7  -0.1
is      0.4   0.2  -0.3  -0.2
based  -0.2  -0.1  -0.1   0.4
on     -0.6   0.5   0.1   0.1
a       0.7   0.3  -0.1   0.1
true    0.8   0.4   0.5   0.6
story   0.01  0.02  0.01 -0.4

Input sentence and embedded representation

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


1d Convolution: Convolution for Text (Sequence)

The embedded input above is convolved with one filter of width/size 3:

Filter (3 x 4):
2  1  -1   3
1  1   2  -1
1  2   2   3

Convolved feature:
the movie is      -1.3
movie is based     2.6
is based on        0.7
based on a         2.2
on a true          4.6
a true story       2.57

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


1d Convolution: Convolution for Text (Sequence)

Same input, filter, and convolved feature as on the previous slide, with the dimensions made explicit: sentence length n = 8, embedding dimension d = 4, filter width/size k = 3, so the convolved feature is a vector of size n - k + 1 = 6.

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)
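Added illustration (not from the slides): a NumPy sketch of this 1d convolution, dotting the width-3 filter with every window of 3 consecutive word embeddings. It reproduces the numbers on the slides, e.g. -1.3 for "the movie is".

import numpy as np

# Word embeddings from the slides: one row per word, d = 4.
E = np.array([
    [ 0.3,  0.2,  0.1, -0.4],   # the
    [ 0.5,  0.4, -0.7, -0.1],   # movie
    [ 0.4,  0.2, -0.3, -0.2],   # is
    [-0.2, -0.1, -0.1,  0.4],   # based
    [-0.6,  0.5,  0.1,  0.1],   # on
    [ 0.7,  0.3, -0.1,  0.1],   # a
    [ 0.8,  0.4,  0.5,  0.6],   # true
    [ 0.01, 0.02, 0.01, -0.4],  # story
])
W = np.array([[2, 1, -1,  3],
              [1, 1,  2, -1],
              [1, 2,  2,  3]])  # one filter of width k = 3

k = W.shape[0]
n = E.shape[0]
feature = np.array([np.sum(E[i:i + k] * W) for i in range(n - k + 1)])
print(np.round(feature, 2))     # [-1.3  2.6  0.7  2.2  4.6  2.57]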


1d Convolution for Text with padding

The input is padded with a zero-embedding token ◊ at each end; the same width-3 filter (stride 1) now produces n - k + 1 + 2 = 8 values:

Convolved feature:
◊ the movie       0.7
the movie is     -1.3
movie is based    2.6
is based on       0.7
based on a        2.2
on a true         4.6
a true story      2.57
true story ◊      3.75

Input: the same sentence and embedded representation as above.

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


1d Convolution for Text with padding, stride = 2

Same padded input and width-3 filter, but the filter now moves two positions at a time:

Convolved feature:
◊ the movie       0.7
movie is based    2.6
based on a        2.2
a true story      2.57
story ◊ ◊        -1.17

Input: the same sentence and embedded representation as above.

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


Multi-channel 1d Convolution with padding, stride = 1

Three filters of width/size 3 are applied to the padded input, giving a convolved feature with one column (channel) per filter:

Filter 1:   2  1 -1  3   Filter 2:   3  1 -1  3   Filter 3:   2 -1  1  4
            1  1  2 -1               2 -1 -2 -1               1  1  1 -1
            1  2  2  3               1  2  2  1               1  2  1  3

Convolved feature:
◊ the movie       0.7   0.4   1.3
the movie is     -1.3   1.9  -0.9
movie is based    2.6   3.5   0.7
is based on       0.7   1.3  -0.5
based on a        2.2  -0.2   2.6
on a true         4.6   3.3   3.5
a true story      2.57  2.07  1.36
true story ◊      3.75  4.48  4.54

Input: the same sentence and embedded representation as above.

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)
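Added illustration (not from the slides): the same computation with zero-padding rows for ◊ and the three filters stacked, producing the 8 x 3 convolved feature matrix shown above.

import numpy as np

# Embeddings from the slides (8 words x d = 4), padded with a zero row (◊) at each end.
E = np.array([[0.3, 0.2, 0.1, -0.4], [0.5, 0.4, -0.7, -0.1],
              [0.4, 0.2, -0.3, -0.2], [-0.2, -0.1, -0.1, 0.4],
              [-0.6, 0.5, 0.1, 0.1], [0.7, 0.3, -0.1, 0.1],
              [0.8, 0.4, 0.5, 0.6], [0.01, 0.02, 0.01, -0.4]])
E_pad = np.vstack([np.zeros((1, 4)), E, np.zeros((1, 4))])

# The three width-3 filters shown above, stacked into shape (3 filters, k=3, d=4).
filters = np.array([
    [[2, 1, -1, 3], [1, 1, 2, -1], [1, 2, 2, 3]],
    [[3, 1, -1, 3], [2, -1, -2, -1], [1, 2, 2, 1]],
    [[2, -1, 1, 4], [1, 1, 1, -1], [1, 2, 1, 3]],
])

k = 3
windows = np.stack([E_pad[i:i + k] for i in range(len(E_pad) - k + 1)])  # (8, 3, 4)
features = np.einsum('wkd,fkd->wf', windows, filters)                    # (8, 3)
print(np.round(features, 2))   # first row: [0.7, 0.4, 1.3], last: [3.75, 4.48, 4.54]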


Pooling over time: Obtaining a fixed-size vector

Global max pooling takes the maximum of each channel (column) of the 8 x 3 convolved feature from the previous slide, over all positions:

Max pooling:  s = [4.6, 4.48, 4.54]

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


Pooling over time: Obtaining a fixed-size vector

Global pooling over the same 8 x 3 convolved feature:

Max pooling:  s = [4.6,  4.48, 4.54]
Avg pooling:  s = [1.98, 2.09, 1.58]

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


Pooling over time: Obtaining a fixed-size vector

Global pooling over the same 8 x 3 convolved feature:

Max pooling:           s = [4.6,  4.48, 4.54]
Avg pooling:           s = [1.98, 2.09, 1.58]
k-max pooling (k = 2): s = [4.6,  3.5,  3.5 ]
                           [3.75, 4.48, 4.54]
(the k largest values per channel, kept in their original order)

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)
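Added illustration (not from the slides): NumPy versions of the three global pooling operations applied to the 8 x 3 convolved feature matrix above.

import numpy as np

# The 8 x 3 convolved feature matrix from the slides (rows: windows, columns: filters).
F = np.array([[0.7, 0.4, 1.3], [-1.3, 1.9, -0.9], [2.6, 3.5, 0.7],
              [0.7, 1.3, -0.5], [2.2, -0.2, 2.6], [4.6, 3.3, 3.5],
              [2.57, 2.07, 1.36], [3.75, 4.48, 4.54]])

max_pool = F.max(axis=0)                 # [4.6, 4.48, 4.54]
avg_pool = F.mean(axis=0)                # ~[1.98, 2.09, 1.58]

# k-max pooling: the k largest values per channel, kept in their original order.
k = 2
keep = np.sort(np.argsort(F, axis=0)[-k:], axis=0)    # row indices of top-k, reordered
kmax_pool = np.take_along_axis(F, keep, axis=0)       # [[4.6, 3.5, 3.5], [3.75, 4.48, 4.54]]

print(max_pool, avg_pool.round(2), kmax_pool, sep="\n")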


Local Pooling, stride = 2

Max pooling is applied locally over windows of two consecutive rows of the convolved feature (stride 2), giving one pooled row per pair:

◊ the movie / the movie is         0.7   1.9   1.3
movie is based / is based on       2.6   3.5   0.7
based on a / on a true             4.6   3.3   3.5
a true story / true story ◊        3.75  4.48  4.54

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


1d convolution with dilation: an efficient way to get wider context

The filters are now 3 x 3 (the input has 3 channels, one per filter of the previous layer) and are applied with dilation 2, i.e. to every other row of the convolved feature matrix, so each output covers a much wider span of the original sentence.

Filters (each 3 x 3):
Filter 1:  2 1 -1 / 1 1 2 / 1 2 2
Filter 2:  2 -1 1 / 1 1 1 / 1 2 1

Output windows (for filter 1, the first value shown is 15):
◊ the movie is based
the movie is based on a true
movie is based on a true story
is based on a true story ◊

Adapted from Stanford CS224n-2019 Lecture slides (Lecture 11)


An example of 1-layer CNN for
sentence classification

Source: Zhang and Wallace:


https://arxiv.org/pdf/1510.03820.pdf
CONVOLUTIONAL TO RECURRENT NEURAL NETWORK
Natural language has sequence and order
• Natural Language
• Sequence of characters: word
• Sequence of words: sentence
• Sequence of sentences: document
Natural language has sequence and order
• Representation
• Feed-forward network: concatenation of vectors or vector addition
• Concatenation: size varies as input size varies
• Vector addition: fixed size, at the expense of ignoring order in the sequence
• CNN
• Respects order, but captures mostly local patterns
RNN naturally handles sequence and order

[Figure: a recurrent cell with parameters Θ, input wt, and hidden state h(t) feeding back into itself.]

• The loop allows information to be passed from one step of the network to the next.
• A recursive function is applied to the input at time step t and the previous hidden state h(t-1).
Unrolling a RNN

[Figure: the RNN unrolled over an input sequence w1 ... w5. Each word wt passes through the embedding layer (weights Ww) into the hidden layer; the hidden state h(t) is computed from h(t-1) via the recurrent weights Wh, and the output layer produces a prediction ŷ(t) at every step.]
RNN is about sequence and order

At each time step t of the unrolled network:

\boldsymbol{h}^{(t)} = \sigma\left( \boldsymbol{W}_h \boldsymbol{h}^{(t-1)} + \boldsymbol{W}_w \boldsymbol{w}^{(t)} + \boldsymbol{b}_1 \right)

\hat{\boldsymbol{y}}^{(t)} = \mathrm{softmax}\left( \boldsymbol{U} \boldsymbol{h}^{(t)} + \boldsymbol{b}_2 \right)

where w(t) is the embedding of the t-th input word, Ww (embedding-to-hidden) and Wh (hidden-to-hidden) are shared across all time steps, and U maps the hidden state to the output layer.
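Added illustration (not from the slides): a compact NumPy sketch of this forward recursion over a 5-step input; all sizes and the random initialization are assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, hdim, vocab, T = 4, 6, 10, 5        # embedding, hidden, output sizes; 5 time steps

Ww = rng.normal(scale=0.1, size=(hdim, d))      # embedding-to-hidden weights
Wh = rng.normal(scale=0.1, size=(hdim, hdim))   # hidden-to-hidden (recurrent) weights
U  = rng.normal(scale=0.1, size=(vocab, hdim))  # hidden-to-output weights
b1, b2 = np.zeros(hdim), np.zeros(vocab)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

w = rng.normal(size=(T, d))            # embeddings of w1 ... w5 (made up)
h = np.zeros(hdim)                     # h(0)
for t in range(T):
    h = np.tanh(Wh @ h + Ww @ w[t] + b1)    # h(t) = sigma(Wh h(t-1) + Ww w(t) + b1)
    y_hat = softmax(U @ h + b2)             # y_hat(t) = softmax(U h(t) + b2)
    print(t + 1, y_hat.argmax(), round(float(y_hat.max()), 3))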
RNN in different forms: Acceptor

[Figure: the unrolled RNN reads w1 ... w5; only the final output ŷ(5) is used to predict and calculate the loss.]
RNN in different forms: Encoder

[Figure: the unrolled RNN encodes the input sequence; the final hidden state serves as the encoded representation, and the loss depends on other features or another network that consumes it.]
RNN in different forms: Transducer

[Figure: the unrolled RNN predicts an output ŷ(t) and calculates a loss at every time step; the per-step losses are combined into a global loss.]
Advantages

• Respects order
• Can process inputs of any length
• Model complexity does not grow with input length
• In theory, information from previous time steps remains
Disadvantages

• Slow computation
• Not parallelizable
• In practice, forgets information from many steps back
• Primarily due to the vanishing gradient problem
• Training may also suffer from the exploding gradient problem
NEURAL LANGUAGE MODEL
Pre-Transformer Era
Major drawbacks

• Inefficient
• Unable to exploit sequential nature of text
• Limited Context
• Unidirectional
Limited Context and Sequential Nature: RNN-LM

Jurafsky and Martin, Speech and Language Processing, 3rd ed. draft, Jan 2022
Unidirectional: ELMo

Source: Devlin et al. NAACL 2019



Issues with RNN-based LMs

• Limited bi-directionality
• Difficult to parallelize
References

• Jurafsky and Martin, Speech and Language Processing, 3rd Ed. Draft
[Available at https://web.stanford.edu/~jurafsky/slp3/]
Thanks!
Question and Comments!

[email protected]  https://www.iitg.ac.in/
anand.ashish
