Lecture 2
Heng Ji
[email protected]
[Diagram: layered levels of linguistic analysis: Discourse, Pragmatics, Semantics, Syntax]
1. Words
2. Syntax
3. Meaning
4. Discourse
5. Applications exploiting each
Simple Applications
▪ Word counters (wc in UNIX)
▪ Spell checkers, grammar checkers
▪ Predictive text on mobile handsets
Bigger Applications
▪ Intelligent computer systems
▪ NLU interfaces to databases
▪ Computer aided instruction
▪ Information retrieval
▪ Intelligent Web searching
▪ Data mining
▪ Machine translation
▪ Speech recognition
▪ Natural language generation
▪ Question answering
▪ Image Caption Generation
Part-of-Speech Tagging and Syntactic Parsing
[Example constituency parse: "School of Theatre and Dance presents …", with part-of-speech tags (NN, IN, CC, VBZ, JJ) grouped into NP and VP constituents under S]
Semantic Role Labeling: Adding Semantics into Trees
[Parse tree annotated with semantic roles: the subject NP is labeled ARG0 and the verb inside the VP is marked as the predicate]
Core Arguments
▪ Arg0 = agent
▪ Arg1 = direct object / theme / patient
▪ Arg2 = indirect object / benefactive / instrument / attribute / end state
▪ Arg3 = start point / benefactive / instrument / attribute
▪ Arg4 = end point
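▪ Example (a standard PropBank-style illustration, not from the slide): in "The teacher gave the students an exam", the predicate is gave, Arg0 = the teacher (agent), Arg1 = an exam (thing given), Arg2 = the students (recipient)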
Dependency Parsing
▪ The little prince could guess easily enough that she was not any too modest--but how moving--and exciting--she was!
▪ In 1975, after being fired from Columbia amid allegations that he used company funds to pay for his son's bar mitzvah, Davis founded Arista
▪ Is ‘1975’ related to the employee_of relation between Davis and Arista?
▪ If so, does it indicate START, END, HOLDS… ?
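A minimal sketch of dependency parsing in code, assuming spaCy and its small English model en_core_web_sm are installed (illustrative only, not necessarily the lecture's toolkit):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The little prince could guess easily enough that she was not any too modest.")

# Each token points to its syntactic head via a labeled dependency arc.
for token in doc:
    print(f"{token.text:10s} --{token.dep_}--> {token.head.text}")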
Event Extraction
▪ An event is a specific occurrence that implies a change of state
▪ event trigger: the main word that most clearly expresses the event occurrence
▪ event arguments: the mentions that are involved in the event (its participants)
▪ event mention: a phrase or sentence within which an event is described, including the trigger and arguments
▪ ACE defined 8 types of events, with 33 subtypes
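▪ Example (an illustrative labeling, reusing the earlier sentence): in "In 1975, Davis founded Arista", the trigger is founded (an ACE Business / Start-Org event), and the arguments are Davis (the agent), Arista (the organization started), and 1975 (the time)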
Recap: Language Modeling
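As a one-sentence refresher: a language model assigns a probability to a word sequence, which the chain rule factorizes as P(w1, …, wn) = P(w1) P(w2 | w1) … P(wn | w1, …, wn-1); neural language models estimate each conditional distribution with a neural network.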
We will study basic neural NLP models
Building Blocks: How to Represent Language?
▪ Neural Networks
How to Represent a Word?
Word embeddings
▪ Idea: learn an embedding from words into vectors
The dot product of two word vectors a and b is a · b = a1b1 + a2b2 + … + anbn
From it one can derive the cosine similarity: cos θab = (a · b) / (||a|| ||b||)
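A minimal NumPy sketch of the dot product and cosine similarity above (the vectors are made-up examples):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

dot = np.dot(a, b)                                    # a1*b1 + a2*b2 + ... + an*bn
cos = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # cos θab
print(dot, cos)                                       # 32.0  ~0.9746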
Softmax Function:
If we take an input of [1,2,3,4,1,2,3], the softmax of that is
[0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175].
The softmax function highlights the largest values and suppresses the others, so that the outputs are all positive and sum to 1.
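A minimal NumPy sketch that reproduces the softmax example above:

import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max())   # subtracting the max improves numerical stability
    return z / z.sum()

print(np.round(softmax([1, 2, 3, 4, 1, 2, 3]), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]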
Word2Vec
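A minimal sketch of training word2vec with gensim on a toy corpus (the corpus and hyperparameters are placeholders, not the lecture's setup):

from gensim.models import Word2Vec

sentences = [
    ["the", "school", "of", "theatre", "presents", "a", "dance"],
    ["the", "little", "prince", "could", "guess", "easily"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram

vec = model.wv["dance"]                         # the learned embedding for "dance"
print(model.wv.similarity("dance", "theatre"))  # cosine similarity of two embeddings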
▪ Quiz 6: Come up with some words that cannot be represented by such vectors
▪ Action knowledge “open”, “close”, “sit”… (need vision and simulation)
▪ Top-employee (need embedding composition)
▪ Numbers, Time
▪ Chemical entities
Recap: Neuron
Activation functions are applied element-wise (e.g., f(x) = [f(x1), …, f(xn)])
https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba092
Neural Network
▪ Multiple sets of weights in sequence
▪ Multiple hidden layers
▪ Activation functions
▪ Adds nonlinearity
xn = Φ(W xn-1 + b)
• W is the weight matrix
• b is the bias vector
• xn-1 is the response from the previous layer
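A minimal NumPy sketch of one such layer, using tanh as the activation Φ (the shapes are illustrative):

import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 3))   # weight matrix: 4 outputs, 3 inputs
b = np.zeros(4)               # bias vector
x_prev = rng.normal(size=3)   # response from the previous layer

x_next = np.tanh(W @ x_prev + b)   # xn = Φ(W xn-1 + b), Φ applied element-wise
print(x_next.shape)                # (4,)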
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
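A minimal NumPy sketch of the vanilla recurrent step behind the models discussed at the links above, ht = tanh(Wxh xt + Whh ht-1 + b) (dimensions and inputs are illustrative):

import numpy as np

rng = np.random.default_rng(0)
hidden, vocab = 8, 5

W_xh = rng.normal(size=(hidden, vocab))    # input-to-hidden weights
W_hh = rng.normal(size=(hidden, hidden))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden)

h = np.zeros(hidden)                       # initial hidden state
for x_t in np.eye(vocab)[[0, 3, 1]]:       # a toy sequence of one-hot inputs
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
print(h.shape)                             # (8,)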
Training
Generation
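A generic sketch of generation with a trained language model (the step function is a placeholder for whatever recurrent model is used; greedy decoding is only one option):

import numpy as np

def generate(step, start_id, h0, max_len=20, end_id=None):
    # step(token_id, h) is assumed to return (probabilities over the vocabulary, new hidden state).
    tokens, h, tok = [], h0, start_id
    for _ in range(max_len):
        probs, h = step(tok, h)
        tok = int(np.argmax(probs))   # greedy choice; sampling from probs is also common
        if tok == end_id:
            break
        tokens.append(tok)
    return tokens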
Conditioned Generation
Seq2Seq
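A schematic sketch of the encoder-decoder (Seq2Seq) idea: the encoder reads the whole source sequence into a final hidden state, which then initializes the decoder (encoder_step and decoder_step are placeholders, not a specific library API):

def seq2seq(source_ids, encoder_step, decoder_step, h0, start_id, max_len=20):
    # Encoder: run over the source sequence, keeping only the final hidden state.
    h = h0
    for x in source_ids:
        h = encoder_step(x, h)

    # Decoder: start from the encoder's final state and generate token by token.
    outputs, tok = [], start_id
    for _ in range(max_len):
        probs, h = decoder_step(tok, h)
        tok = max(range(len(probs)), key=lambda i: probs[i])  # greedy pick
        outputs.append(tok)
    return outputs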
Seq2Seq with Attention
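A minimal NumPy sketch of the attention step added on top of Seq2Seq: at each decoder step, scores between the decoder state and every encoder state are softmax-normalized and used to form a weighted context vector (dot-product scoring is one common choice; other scoring functions work the same way):

import numpy as np

def attention(decoder_state, encoder_states):
    # decoder_state: (d,)  encoder_states: (T, d)  ->  context: (d,), weights: (T,)
    scores = encoder_states @ decoder_state   # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    context = weights @ encoder_states        # weighted sum of encoder states
    return context, weights

# Toy example with T=4 encoder states of dimension d=3.
rng = np.random.default_rng(0)
ctx, w = attention(rng.normal(size=3), rng.normal(size=(4, 3)))
print(w.sum())   # 1.0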