
Sequence Labeling

Neural CRFs

Mausam

Types of Prediction Tasks

Sequence problems

• Many problems in NLP have data which is a sequence of characters, words, phrases, lines, or sentences …
• We can think of our task as one of labeling each item

POS tagging:
  VBG     NN          IN DT NN  IN NN
  Chasing opportunity in an age of upheaval

Word segmentation:
  B B I I B I B I B B
  而 相对 于这 些 品牌 的价

Named entity recognition:
  PERS    O         O      O  ORG  ORG
  Murdoch discusses future of News Corp.

Text segmentation: each line of a post labeled Q (question) or A (answer).
POS Tagging

DT  NNP     NN     VBD VBN   RP NN   NNS
The Georgia branch had taken on loan commitments …

DT  NN      IN NN        VBD     NNS   VBD
The average of interbank offered rates plummeted …
POS Tagging Ambiguity

• Words often have more than one POS: back
  – The back door = JJ
  – On my back = NN
  – Win the voters back = RB
  – Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a particular instance of a word.
Named Entity Recognition (NER)
• A very important sub-task: find and classify names in text, for example:
  – The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply.
Named Entity Recognition (NER)
• Entity classes: Person, Date, Location, Organization
• In the passage above: Andrew Wilkie, Rob Oakeshott and Tony Windsor are Person; 2010 is a Date; Labor and the Greens are Organization.
The Named Entity Recognition Task
Task: Predict entities in a text

  Foreign    ORG
  Ministry   ORG
  spokesman  O
  Shen       PER
  Guofang    PER
  told       O
  Reuters    ORG
  :          O

Standard evaluation is per entity, not per token.
Precision/Recall/F1 for IE/NER
• Recall and precision are straightforward for tasks like IR and text categorization, where there is only one grain size (documents)
• The measure behaves a bit funnily for IE/NER when there are boundary errors (which are common):
  – First Bank of Chicago announced earnings …
    (e.g., a system that extracts only "Bank of Chicago" makes a boundary error)
• This counts as both a false positive (fp) and a false negative (fn)
• Selecting nothing would have been better
• Some other metrics (e.g., the MUC scorer) give partial credit (according to complex rules)
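
With entities represented as (label, start, end) spans, entity-level precision, recall, and F1 take only a few lines, and the boundary-error example above then shows up as exactly one fp plus one fn. A sketch (the span format and names are illustrative):

def entity_prf1(gold_spans, pred_spans):
    """Entity-level P/R/F1 over (label, start, end) spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)                       # exact-match spans only
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Boundary error: gold "First Bank of Chicago", predicted "Bank of Chicago":
# entity_prf1([('ORG', 0, 4)], [('ORG', 1, 4)])  ->  (0.0, 0.0, 0.0)
# i.e., one fp ('ORG', 1, 4) and one fn ('ORG', 0, 4); predicting nothing
# would have incurred the fn only.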
Encoding classes for NER

  Token     IO encoding   IOB encoding
  Fred      PER           B-PER
  showed    O             O
  Sue       PER           B-PER
  Mengqiu   PER           B-PER
  Huang     PER           I-PER
  's        O             O
  new       O             O
  painting  O             O

(Note that under IO encoding, the adjacent entities "Sue" and "Mengqiu Huang" are indistinguishable from a single three-token entity; IOB's B- tag marks the boundary.)

Practically negligible differences in performance; BIO is more standard.
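
Since evaluation is per entity, predicted BIO tags have to be grouped back into spans before scoring. A small helper, as a sketch (the function name and span format are illustrative):

def bio_to_spans(tags):
    """Group BIO tags into (label, start, end) spans, end exclusive,
    e.g. ['B-PER', 'I-PER', 'O', 'B-PER'] -> [('PER', 0, 2), ('PER', 3, 4)]."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith('B-') or (tag.startswith('I-') and label != tag[2:]):
            if label is not None:               # close the open span
                spans.append((label, start, i))
            start, label = i, tag[2:]
        elif tag == 'O':
            if label is not None:
                spans.append((label, start, i))
            start, label = None, None
    if label is not None:                       # span running to the end
        spans.append((label, start, len(tags)))
    return spans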


Sequence Labeling as Independent Classification

A structured prediction task
But not a structured prediction model
Instead: independent multi-class classification per token
Sequence Labeling with BiLSTM / Transformer

What is missing? Still not modeling output structure!
Outputs are independent (of each other); see the sketch below.
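
A minimal sketch of this independent-classification tagger, assuming PyTorch (the class name and dimensions are illustrative):

import torch
import torch.nn as nn

class IndependentTagger(nn.Module):
    """BiLSTM encoder with a per-token softmax head: every tag is
    predicted independently, with no interaction between output labels."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        h, _ = self.lstm(self.emb(tokens))      # (batch, seq_len, 2*hidden)
        return self.out(h)                      # per-token tag scores

# Decoding is a per-position argmax; nothing stops inconsistent outputs
# such as I-PER immediately after O:
#   tags = model(tokens).argmax(dim=-1)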
Why Model Interactions in Output?
• Consistency is important!
• Example: Paris Hilton: "Paris" alone suggests a location and "Hilton" an organization, but the two tags should be decided jointly (here, as one Person name).
Conditional Random Fields
• Models with local dependencies
• Some independence assumptions on the output space, but not entirely independent (local dependencies)
• Exact and optimal decoding/training via dynamic programs
Local vs Global Normalization

Locally normalized models (e.g., MEMMs) normalize the distribution over tags separately at each position; a CRF normalizes once, globally, over complete label sequences.
CRFs

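Written out with the transition scores W and emission scores e that the later slides use, the linear-chain CRF defines a globally normalized distribution (a standard formulation, consistent with the decoding slides below):

$$P(Y \mid X) \;=\; \frac{1}{Z(X)} \exp\Big( \sum_{t=1}^{T+1} W(y_{t-1}, y_t) + \sum_{t=1}^{T} e(X, y_t) \Big), \qquad Z(X) = \sum_{Y'} \exp\big(\mathrm{score}(X, Y')\big)$$

where y_0 = <s>, y_{T+1} = </s>, and the sum in Z(X) ranges over all possible tag sequences Y'.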
Potential Functions

In a linear-chain CRF, the potentials factor into transition scores W(y_{t-1}, y_t) between adjacent tags and emission scores e(X, y_t) tying each tag to the input; these are exactly the two terms in the score above.
Linear Chain CRF (in practice)
BiLSTM-CRF

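A sketch of the combination in the same style (names illustrative): the BiLSTM supplies the emission scores e(X, y_t), and a learned |Y| x |Y| parameter matrix supplies the transition scores W(y, y'); decoding and the partition function use the dynamic programs on the following slides.

import torch
import torch.nn as nn

class BiLSTM_CRF(nn.Module):
    """BiLSTM emissions plus a CRF transition matrix (scoring only;
    Viterbi decoding and the partition function are sketched later).
    Start/stop transitions are omitted for brevity."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_tags)
        # trans[y, y'] = W(y, y'), the score of moving from tag y to tag y'
        self.trans = nn.Parameter(0.01 * torch.randn(num_tags, num_tags))

    def emissions(self, tokens):                # e(X, y_t) for all t, y_t
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h)                      # (batch, seq_len, num_tags)

    def sequence_score(self, emissions, tags):
        """score(X, Y) = sum_t W(y_{t-1}, y_t) + sum_t e(X, y_t) for one
        sequence; emissions: (seq_len, num_tags), tags: list of tag ids."""
        score = emissions[0, tags[0]]
        for t in range(1, len(tags)):
            score = score + self.trans[tags[t - 1], tags[t]] + emissions[t, tags[t]]
        return score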
Properties

Decoding Problem
Given X = x1 … xT, what is the "best" tagging y1 … yT?

Several possible meanings of 'solution':
1. States which are individually most likely
2. Single best state sequence

We want the sequence y1 … yT such that P(Y|X) is maximized:

$$Y^* = \arg\max_Y P(Y \mid X)$$

[Trellis figure: K candidate tags at each of the positions x1, x2, x3, …, xT]
Most Likely Sequence
• Problem: find the most likely (Viterbi) sequence under the model
• Given model parameters, we can score any sequence pair:

  NNP VBZ    NN       NNS   CD  NN      .
  Fed raises interest rates 0.5 percent .

• In principle, we're done: list all possible tag sequences, score each one (2T+1 operations per sequence), and pick the best one (the Viterbi state sequence):

  NNP VBZ NN NNS CD NN    logP = -23
  NNP NNS NN NNS CD NN    logP = -29
  NNP VBZ VB NNS CD NN    logP = -27

• But there are |Y|^T tag sequences!
Finding the Best Trajectory
• Brute force: too many trajectories (state sequences) to list
• Option 1: Beam Search
  – A beam is a set of partial hypotheses
  – Start with just the single empty trajectory
  – At each derivation step:
    • Consider all continuations of previous hypotheses
    • Discard most, keep top k

  [Figure: beam expansion for "Fed raises …": <s>,<s> expands to <s>,Fed:N / <s>,Fed:V / <s>,Fed:J; these expand to Fed:N,raises:N / Fed:N,raises:V / Fed:V,raises:N / Fed:V,raises:V]

• Beam search works OK in practice
• … but sometimes you want the optimal answer
• … and there's often a better option than naïve beams (see the sketch below)
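
A sketch of this beam procedure over tag sequences (k and the flat score accumulation are illustrative; emissions and trans are as in the model sketch above):

import heapq

def beam_search(emissions, trans, k=4):
    """Keep the k best partial tag sequences at each position.
    emissions: (T, K) array of e scores; trans: (K, K) array of W scores.
    Approximate, unlike Viterbi."""
    T, K = emissions.shape
    beam = [(0.0, [])]                          # start: the single empty trajectory
    for t in range(T):
        candidates = []
        for score, tags in beam:
            for y in range(K):                  # consider all continuations
                s = score + emissions[t, y]
                if tags:
                    s += trans[tags[-1], y]
                candidates.append((s, tags + [y]))
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])  # keep top k
    return beam[0]                              # best (score, tags) found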
State Lattice / Trellis

[Trellis figure: one column per token of "<s> Fed raises interest rates"; each column holds nodes for the tags N, V, J, D plus <s> and </s>, with edges between adjacent columns. A second copy of the figure annotates a node with its emission score e(N).]
Dynamic Programming
• Decoding:

$$Y^* = \arg\max_Y P(Y \mid X) = \arg\max_Y \mathrm{score}(X, Y) = \arg\max_Y \; \sum_{t=1}^{T+1} W(y_{t-1}, y_t) + \sum_{t=1}^{T} e(X, y_t)$$

• First consider how to compute the max.
• Define

$$\delta_i(y_i) = \max_{y_{[1:i-1]}} \mathrm{score}(X, y_{[1..i]})$$

  – the score of the most likely label sequence ending with tag y_i at position i, given words x1, …, xT. Then:

$$\begin{aligned} \delta_i(y_i) &= \max_{y_{[1:i-1]}} \Big[ e(X, y_i) + W(y_{i-1}, y_i) + \mathrm{score}(X, y_{[1..i-1]}) \Big] \\ &= e(X, y_i) + \max_{y_{i-1}} \Big[ W(y_{i-1}, y_i) + \max_{y_{[1:i-2]}} \mathrm{score}(X, y_{[1..i-1]}) \Big] \\ &= e(X, y_i) + \max_{y_{i-1}} \big[ W(y_{i-1}, y_i) + \delta_{i-1}(y_{i-1}) \big] \end{aligned}$$
Viterbi Algorithm
• Input: x1, …, xT, W(·,·) and e(·,·)
• Initialize: δ0(<s>) = 0, and −∞ for other labels
• For i = 1 to T:
  – For each y' in the tagset:

$$\delta_i(y') = e(X, y') + \max_{y} \big[ W(y, y') + \delta_{i-1}(y) \big]$$

• Return

$$\max_{y'} \big[ W(y', \texttt{</s>}) + \delta_T(y') \big]$$

• This returns only the optimal value; keep backpointers to recover the sequence.
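
The same dynamic program as runnable NumPy, as a sketch (start/stop transitions are folded out for brevity; names are illustrative):

import numpy as np

def viterbi(emissions, trans):
    """emissions: (T, K) array of e(X, y); trans: (K, K) array of W(y, y').
    Returns the best tag sequence and its score; O(K^2 T) time, O(KT) space."""
    T, K = emissions.shape
    delta = np.full((T, K), -np.inf)
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = emissions[0]                     # delta_1(y') = e(X, y')
    for i in range(1, T):
        # scores[y, y'] = delta_{i-1}(y) + W(y, y')
        scores = delta[i - 1][:, None] + trans
        backptr[i] = scores.argmax(axis=0)      # keep backpointers
        delta[i] = emissions[i] + scores.max(axis=0)
    best = [int(delta[T - 1].argmax())]         # best final tag
    for i in range(T - 1, 0, -1):               # backchain
        best.append(int(backptr[i][best[-1]]))
    return best[::-1], float(delta[T - 1].max())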
Viterbi Algorithm

[Trellis figure: positions x1 … xT by tags 1 … K; each cell δi(y) is filled in as max_{y'} δ_{i-1}(y') + W_trans + e_obs.]

Remember: δi(y) = score of the most likely tag sequence ending with y at time i.
Terminating Viterbi

[Trellis figure: at the final position, choose max_y W(y, </s>) + δT(y); the best final score δ* is obtained as max_{y'} δ_{T-1}(y') + (transition score) + (observation score).]

Now backchain to find the final sequence.

Time: O(|Y|² T), linear in the length of the sequence
Space: O(|Y| T)
Training
• Find weights θ such that

$$\mathrm{Loss}(\theta) = -\sum \log P_{\mathrm{CRF}}(Y \mid X; \theta)$$

is minimized.
• How to compute the partition function? It is a log-sum-exp over additive terms, computed with the same dynamic program as Viterbi (the backward step is handled by autograd).
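
Concretely, the partition function uses the identical recursion with max replaced by log-sum-exp, which keeps everything differentiable. A sketch matching the Viterbi code above (scipy's logsumexp is assumed; in PyTorch, torch.logsumexp plays the same role and autograd supplies the backward step):

import numpy as np
from scipy.special import logsumexp

def log_partition(emissions, trans):
    """log Z(X) via the forward algorithm; same shapes as viterbi()."""
    T, K = emissions.shape
    alpha = emissions[0]
    for i in range(1, T):
        alpha = emissions[i] + logsumexp(alpha[:, None] + trans, axis=0)
    return float(logsumexp(alpha))

# Per-example training loss:
#   loss = log_partition(emissions, trans) - sequence_score(emissions, tags)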
BiLSTM-CRF w/ Features

MSQU: Multi-Sentence Question Understanding
• "I am taking 15 Scouts to New Zealand over Christmas and New Year. We are spending NYE in Auckland and are looking for suggestions of restaurants (maybe buffet style) which will be suitable for a large group? Ideally close to somewhere where we can watch the fireworks from. Any ideas would be welcome"

~Open Question Understanding

select x where x.type = “restaurant” and
  x.location IN “Auckland” and x.attribute = “buffet style” and
  x.attribute = “suitable for large group” and
  x.attribute PREF “somewhere we can watch fireworks from”

Key Issue: Only 150 labeled questions!
Human Insight: Features!
• Token-level features
  – Raw token, lexicalized features, POS tags, NER tags
• Hand-designed features (see the sketch below)
  – Indicator features for candidates that are likely to be types, based on targets of WH- POS words such as Which, Where, etc.
  – Indicator features for candidates that are likely to be attributes, by checking if there is an edge in the dependency graph leading up to a candidate type
  – Indicator features for adjective-noun phrases
• Cluster ids of word2vec-clustered words
• Global word counts in the post
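
A sketch of how such token-level and hand-designed features are typically assembled per token (the feature names and the WH-word list are illustrative, not the paper's exact feature set):

def token_features(tokens, pos_tags, ner_tags, i):
    """Feature dict for token i, combining raw-token, POS, and NER cues."""
    feats = {
        'token': tokens[i].lower(),
        'pos': pos_tags[i],
        'ner': ner_tags[i],
        'is_wh_word': tokens[i].lower() in {'which', 'where', 'what', 'when'},
    }
    if i > 0:                                   # a little left context
        feats['prev_token'] = tokens[i - 1].lower()
        feats['prev_pos'] = pos_tags[i - 1]
    return feats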


Question Parsing Accuracy
[Contractor, Patra, Mausam, Singla JNLE'21]

  Model                   F1 (type)   F1 (attribute)   F1 (location)   F1 (macro-avg)
  CRF (with Features)     51.4        45.3             55.7            50.8
  BiLSTM CRF              53.3        47.6             52.1            51.0
  BiLSTM CRF + Features   58.4        48.1             62.0            56.2

Neural + Features > Neural > Symbolic + Features

Question Parsing Accuracy
[Contractor, Patra, Mausam, Singla JNLE'21]

  Model                 F1 (type)   F1 (attribute)   F1 (location)   F1 (macro-avg)
  CRF                   51.4        45.3             55.7            50.8
  BiLSTM CRF            53.3        47.6             52.1            51.0
  BERT                  59.6        50.6             59.5            56.6
  BERT + BiLSTM + CRF   63.4        56.5             72.4            64.4

BERT + CRF > BERT

Summary
• BiLSTM+CRF (or more generally, Neural CRFs)
  – combines the automatic feature learning of neural models
  – with the global reasoning of CRFs
• When are CRFs helpful?
  – Joint inference
  – Low-data settings