Neural Models For NLP
Neural Models
Ashish Anand
Professor, Dept. of CSE, IIT Guwahati
Associated Faculty, Mehta Family School of Data Science and AI, IIT Guwahati
Outline: Introduction to NLP
• Inefficient
• Unable to exploit sequential nature of text
• Limited Context
• Unidirectional
HANDLING THE DRAWBACKS
Inefficiency: Hierarchical Softmax
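Hierarchical softmax replaces the O(|V|)-cost softmax over the vocabulary with a sequence of binary decisions along a path in a tree built over the words, cutting the cost to O(log |V|). A minimal sketch, assuming a word2vec-style setting; the names path_nodes, path_signs, and node_vecs are hypothetical, and building the tree (e.g. a Huffman tree) is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_softmax_prob(h, path_nodes, path_signs, node_vecs):
    """P(word | context) as a product of binary decisions along the
    word's root-to-leaf path in a binary tree over the vocabulary.

    h          : context/hidden vector, shape (d,)
    path_nodes : indices of internal nodes on the word's path
    path_signs : +1 / -1 encoding the left / right turn at each node
    node_vecs  : (num_internal_nodes, d) parameter matrix
    """
    p = 1.0
    for node, sign in zip(path_nodes, path_signs):
        p *= sigmoid(sign * (node_vecs[node] @ h))  # one binary decision per node
    return p
```

With a balanced tree over a 100k-word vocabulary, each word probability needs only about 17 sigmoid evaluations instead of a 100,000-way normalization.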
Multiple Meanings: Word Sense
• Examples
• Class: teaching group / economic group / rank
• Right: correct / a direction
• Mouse: animal / a specific computer peripheral
Synonymy: Similar Sense
• Relation: Synonyms/Antonyms
• Synonym: one word has a sense whose meaning is identical to a
sense of another word, or nearly identical
• Examples: couch/sofa; vomit/throw up
• Antonym: meanings are opposite
• Examples: long/short; big/little; fast/slow; rise/fall
Word Similarity: Relations beyond synonyms/antonyms
• Word Similarity
• Words with similar meanings but not synonyms
• Cat / Dog

SimLex-999 dataset (Hill et al., 2015):

word1    word2        similarity
vanish   disappear    9.8
behave   obey         7.3
belief   impression   5.95
muscle   bone         3.65
modest   flexible     0.98
hole     agreement    0.3
Adapted from Jurafsky and Martin’s slide
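Such graded similarity scores are commonly approximated by the cosine between learned word vectors. A minimal sketch; the 4-d embeddings below are made-up illustrations, not vectors from a trained model:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# hypothetical embeddings; real ones come from a trained model
vanish    = np.array([0.9, 0.1, 0.4, 0.0])
disappear = np.array([0.8, 0.2, 0.5, 0.1])
hole      = np.array([0.1, 0.9, 0.0, 0.3])
agreement = np.array([0.0, 0.2, 0.9, 0.6])

print(cosine(vanish, disappear))  # high: near-synonyms
print(cosine(hole, agreement))    # low: unrelated pair
```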
Word Relatedness / Word Association
• Words
• Have multiple senses, leading to complex relations between words
• Synonymy / Antonymy
• Similarity
• Relatedness
• Taxonomic Relations: Hypernym/Hyponym
• Connotation
• Firth (1957)
• "You shall know a word by the company it keeps!"
Context determines the meaning of a word
• Other Methods
• Clustering methods: Brown Clusters [Collins lecture]
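Firth's dictum is operationalized by counting the company each word keeps. A minimal sketch, with a toy corpus and a context window of 1, of building the word-context co-occurrence counts that distributional methods (e.g. GloVe, below) start from:

```python
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()
window = 1  # how many neighbors on each side count as context

cooc = Counter()
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            cooc[(w, corpus[j])] += 1  # count each (word, context-word) pair

print(cooc[("sat", "on")])  # words that share contexts get similar count rows
```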
GLOBAL VECTORS
GloVe
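For reference, the GloVe objective (Pennington et al., 2014) fits word vectors so that their dot products match the log of global co-occurrence counts X_ij:

```latex
J = \sum_{i,j=1}^{|V|} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

where f(x) down-weights rare and very frequent pairs; the paper uses f(x) = (x / x_max)^{3/4} for x < x_max and 1 otherwise, with x_max = 100.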
• RNN
• Vanilla RNN
Convolution: Motivation
• Two Questions –
Source:
1. https://www.cs.columbia.edu/education/courses/course/COMSW4995-7/26050/
2. https://towardsdatascience.com/a-beginners-guide-to-convolutional-neural-networks-cnns-14649dbddce8
Convolution: Motivation
[Figure: a 3×3 kernel sliding over a binary input matrix, position by position, to produce the convolved feature map]
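A minimal NumPy sketch of the operation the figure illustrates, assuming a single channel, stride 1, and no padding; the example matrices are illustrative, since the figure's exact values are not recoverable:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), summing
    element-wise products at each position. Strictly this is
    cross-correlation; since CNN kernels are learned, the flip is immaterial."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d(image, kernel))  # 3x3 convolved feature map
```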
Recurrent Neural Network (RNN)
• A loop allows information to be passed from one step of the network to the next.
• A recursive function with parameters Θ is applied to the input w_t at time step t and the previous hidden state h(t-1).
Unrolling an RNN
[Figure: the RNN unrolled over five time steps — the input sequence w1 … w5 passes through an embedding layer with shared weights Ww, producing outputs ŷ(1) … ŷ(5)]
RNN is about sequences and order
[Figure: unrolled RNN — input sequence w1 … w5 → embedding layer (shared weights Ww) → hidden layer (shared weights Wh) → output layer, producing ŷ(1) … ŷ(5)]

Hidden layer: h(t) = σ(Wh h(t-1) + Ww w(t) + b1)
Output layer: ŷ(t) = softmax(U h(t) + b2)
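A minimal NumPy sketch of these two equations; the dimensions and the choice of tanh for σ are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(inputs, Wh, Ww, U, b1, b2):
    """h(t) = tanh(Wh h(t-1) + Ww w(t) + b1)   (hidden layer)
       y(t) = softmax(U h(t) + b2)             (output layer)"""
    h = np.zeros(Wh.shape[0])        # h(0): initial hidden state
    outputs = []
    for w in inputs:                 # inputs: sequence of embedding vectors
        h = np.tanh(Wh @ h + Ww @ w + b1)
        outputs.append(softmax(U @ h + b2))
    return outputs, h

# illustrative sizes: embedding dim 4, hidden dim 3, vocabulary size 5
rng = np.random.default_rng(0)
Wh, Ww = rng.normal(size=(3, 3)), rng.normal(size=(3, 4))
U, b1, b2 = rng.normal(size=(5, 3)), np.zeros(3), np.zeros(5)
ys, h_final = rnn_forward([rng.normal(size=4) for _ in range(5)],
                          Wh, Ww, U, b1, b2)
```

Note that the same weights Wh, Ww, U are reused at every time step, which is why model size does not grow with sequence length.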
RNN in a different form: Acceptor
• Predict and calculate the loss only on the final output ŷ(5).
[Figure: unrolled RNN as before; only the final hidden state feeds the output layer]
RNN in a different form: Encoder
• The final hidden state encodes a representation of the whole input sequence.
• The loss depends on other features or on another network that consumes this representation.
[Figure: unrolled RNN as before; the final hidden state is passed on as the encoded representation]
RNN in a different form: Transducer
• Predict and calculate a loss at every time step; the global loss aggregates the per-step losses.
[Figure: unrolled RNN producing ŷ(1) … ŷ(5), one prediction and one loss per input]
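A minimal sketch contrasting how the acceptor and transducer compute their losses from the outputs of the rnn_forward sketch above; the integer labels are hypothetical word indices:

```python
import numpy as np

def cross_entropy(y_hat, target):
    """Negative log-probability of the correct class under softmax output y_hat."""
    return -np.log(y_hat[target])

def acceptor_loss(outputs, label):
    return cross_entropy(outputs[-1], label)  # loss on the final prediction only

def transducer_loss(outputs, labels):
    # global loss aggregates one loss per time step
    return sum(cross_entropy(y, t) for y, t in zip(outputs, labels))
```

The encoder variant computes no loss of its own: its final hidden state is handed to another network, and gradients flow back from that network's loss.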
Advantages
• Respects word order
• Can process input of any length
• Model size stays the same regardless of input length
• In theory, information from all previous time steps can be retained
Disadvantages
• Slow computation: time steps must be processed one after another
• Not parallelizable across time steps
• In practice, forgets information from many steps back
• Primarily due to the vanishing gradient problem (illustrated below)
• May also suffer from the exploding gradient problem
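A small numeric illustration of the vanishing gradient, assuming a tanh RNN: the Jacobian dh(t)/dh(0) is a product of one factor diag(1 − h(t)²) · Wh per time step, so its norm typically shrinks (or, with large weights, explodes) geometrically:

```python
import numpy as np

rng = np.random.default_rng(0)
Wh = rng.normal(scale=0.3, size=(4, 4))  # illustrative recurrent weights
h = np.zeros(4)
J = np.eye(4)                            # running product of per-step Jacobians

for t in range(40):
    h = np.tanh(Wh @ h + rng.normal(size=4))  # random inputs stand in for embeddings
    J = np.diag(1 - h**2) @ Wh @ J            # dh(t)/dh(0) gains one factor per step
    if (t + 1) % 10 == 0:
        print(t + 1, np.linalg.norm(J))       # norm decays geometrically toward 0
```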
NEURAL LANGUAGE MODEL
Pre-Transformer Era
Major drawbacks
• Inefficient
• Unable to exploit sequential nature of text
• Limited Context
• Unidirectional
Limited Context and Sequential Nature: RNN-LM
Jurafsky and Martin, Speech and Language Processing, 3rd Ed. Draft (Jan 2022)
Unidirectional: ELMo
• Limited bi-directionality: the forward and backward LMs are trained independently and only their outputs are combined
• Difficult to parallelize, since the underlying LSTMs process tokens sequentially
References
• Jurafsky and Martin, Speech and Language Processing, 3rd Ed. Draft
[Available at https://web.stanford.edu/~jurafsky/slp3/ ]
Thanks!
Question and Comments!
[email protected]
https://www.iitg.ac.in/anand.ashish