Conditional Random Field Model (CRF)
Prepared by:
Hiba Mansour
Submitted to:
Dr. Hamed Abdelhaq
Outline
CRF
Bi-LSTM & CRF
2
Outline
Hidden Markov Model (HMM)
&
Maximum Entropy Markov Model (MEMM)
3
Generative vs. Discriminative
4
HMM
Definition of HMM:
An HMM is a class of probabilistic graphical models that allows us to predict a sequence of unknown (hidden) variables from a set of observed variables. It is a type of Markov chain where the states are not directly observable, but the observations are assumed to depend on the states. For example, predicting the weather (hidden variable) based on the type of clothes that someone wears (observed).
5
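As a toy illustration of the slide's weather/clothes example, here is a minimal sketch in NumPy; all probabilities, state names, and observation symbols are made-up assumptions for illustration:

```python
import numpy as np

# Toy HMM: weather (hidden states) vs. clothes (observations).
states = ["Sunny", "Rainy"]
obs_symbols = ["T-shirt", "Coat"]

pi = np.array([0.6, 0.4])        # initial state distribution P(s_1)
A = np.array([[0.7, 0.3],        # transition matrix P(s_t | s_{t-1})
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # emission matrix P(o_t | s_t)
              [0.2, 0.8]])

def forward(obs):
    """Forward algorithm: probability of an observation sequence under the HMM."""
    alpha = pi * B[:, obs[0]]            # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate and weight by emission
    return alpha.sum()

# Probability of observing T-shirt, Coat, Coat:
print(forward([0, 1, 1]))
```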
Limitations of HMM
▷ Limited dependencies
An HMM models direct dependencies only between each state and its corresponding observation.
In a sentence segmentation task, segmentation may depend not just on a single word, but also on features of the whole line, such as line length, indentation, amount of white space, etc.
6
Solution: MEMM
A Maximum Entropy Markov Model (MEMM) is a probabilistic model used for sequence labeling tasks, particularly in natural language processing (NLP). MEMMs extend the concept of Hidden Markov Models (HMMs) by incorporating maximum entropy principles to model the conditional probability distribution of labels given a sequence of observations.
▷ Discriminative model
• Completely ignores modeling P(X), which saves modeling effort.
• The learning objective function is consistent with the predictive function: P(Y|X).
7
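Concretely, the MEMM factorizes the sequence probability into locally normalized steps (standard MEMM notation; $f$ is a feature function with learned weights $w$):

$$P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}, x), \qquad P(y_t \mid y_{t-1}, x) = \frac{\exp\big(w \cdot f(y_t, y_{t-1}, x_t)\big)}{\sum_{y'} \exp\big(w \cdot f(y', y_{t-1}, x_t)\big)}$$

Note that the normalization happens per state at each step; this local normalization is what causes the label bias problem discussed next.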
MEMM: Label bias problem
Because an MEMM normalizes transition probabilities locally at each state, a state with few outgoing transitions passes on almost all of its probability mass regardless of the observation; in the extreme case, a state with a single outgoing transition assigns it probability 1 and ignores the input entirely. Decoding is therefore biased toward states with fewer outgoing transitions: this is the label bias problem.
8
Conditional Random Field (CRF)
13
Conditional Random Field (CRF)
▷ CRF Definition
A CRF is an undirected probabilistic graphical model that models the conditional probability of a sequence of labels given an observation sequence.
▷ CRF Flexibility
CRF allows for the incorporation of a wide range of features, making it more expressive and
powerful than HMM or MEMM.
▷ CRF Advantages
CRF overcomes the label bias problem of MEMM and can capture long-range dependencies,
making it a robust and effective structured prediction model.
14
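As an illustration of this flexibility, here is a minimal per-word feature function in the style commonly used with CRF toolkits such as sklearn-crfsuite; the specific features and names are illustrative assumptions, not a fixed API:

```python
def word2features(sent, i):
    """Features for the i-th word of a tokenized sentence."""
    word = sent[i]
    features = {
        "word.lower": word.lower(),
        "word.isupper": word.isupper(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        features["prev.word.lower"] = sent[i - 1].lower()  # context feature
    else:
        features["BOS"] = True   # beginning of sentence
    if i == len(sent) - 1:
        features["EOS"] = True   # end of sentence
    return features
```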
Examples & Applications
▷ Global feature
A CRF scores the entire label sequence with a single global feature vector and normalizes once per sequence using the partition function $Z(x)$:

$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t=1}^{T} \sum_{k} w_k\, f_k(y_{t-1}, y_t, x, t)\Big), \qquad Z(x) = \sum_{y'} \exp\Big(\sum_{t=1}^{T} \sum_{k} w_k\, f_k(y'_{t-1}, y'_t, x, t)\Big)$$
17
Inference
Using the Viterbi algorithm to find the optimal path:

$$y^* = \arg\max_{y} P(y \mid x)$$

Since $Z(x)$ does not depend on $y$, this is equivalent to maximizing the unnormalized global score.
19
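A minimal NumPy sketch of Viterbi decoding for a linear-chain model, assuming per-position emission scores and a label-to-label transition score matrix are given (e.g., computed from a trained CRF's weights):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Find the highest-scoring label path.

    emissions:   (T, K) per-position label scores
    transitions: (K, K) score of moving from label i to label j
    """
    T, K = emissions.shape
    score = emissions[0].copy()             # best score ending in each label
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = score of ending in j at step t, coming from i
        cand = score[:, None] + transitions + emissions[t]
        backptr[t] = cand.argmax(axis=0)    # best predecessor for each label
        score = cand.max(axis=0)
    path = [int(score.argmax())]            # best final label
    for t in range(T - 1, 0, -1):           # walk the back-pointers
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]
```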
Training CRF
We also need to find the $w$ parameters that best fit the training data, given a set of labelled sentences:

$$D = \{(x^1, y^1), (x^2, y^2), \ldots, (x^N, y^N)\}$$

where each pair is a sentence with the corresponding word labels annotated. To find the $w$ parameters that best fit the data, we need to maximize the conditional likelihood of the training data:

$$L(w) = \sum_{n=1}^{N} \log P(y^n \mid x^n; w)$$

The parameter estimates are computed with an L2 regularization term:

$$\hat{w} = \arg\max_{w} \sum_{n=1}^{N} \log P(y^n \mid x^n; w) - \frac{\lambda}{2}\, \lVert w \rVert^2$$

The standard approach to finding $\hat{w}$ is to compute the gradient of the objective function and use the gradient in an optimization algorithm like L-BFGS.
20
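As a sketch of this training setup in practice, the third-party sklearn-crfsuite package exposes L-BFGS training with an L2 term; the tiny dataset below is made up for illustration:

```python
import sklearn_crfsuite  # third-party package: pip install sklearn-crfsuite

# Toy training data: one sentence as a list of per-word feature dicts
# (e.g., produced by a word2features function) plus its label sequence.
X_train = [[{"word.lower": "john", "word.istitle": True},
            {"word.lower": "smiled", "word.istitle": False}]]
y_train = [["B-Person", "O"]]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",   # gradient-based optimization, as described above
    c2=0.1,              # weight of the L2 regularization term
    max_iterations=100,
)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # Viterbi-decoded label sequences
```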
HMM, MEMM and CRF Comparison

|             | HMM                             | MEMM                                                          | CRF                                                  |
| Type        | Generative model                | Discriminative model                                          | Discriminative model                                 |
| Dependency  | Strong independence assumptions | Relaxes independence assumptions, but suffers from label bias | Flexible modeling of dependencies, avoids label bias |
| Limitation  | Sensitive to noisy observations | Sensitive to the label bias issue                             | Robust to noise; captures long-range dependencies    |
| Training    | MLE                             | Gradient based                                                | Gradient based                                       |
| Inference   | Forward-backward / Viterbi      | Forward-backward / Viterbi                                    | Forward-backward / Viterbi                           |
| Computation | Easier                          | Moderate                                                      | Harder                                               |
| Performance | Weaker                          | Moderate                                                      | Stronger                                             |
| Graph       | Directed                        | Directed                                                      | Undirected                                           |
| Context     | Local                           | Local                                                         | Global, local                                        |
21
Outline
CRF and Bi-LSTM
23
Bi-LSTM
▷ Bi-LSTM
Bi-directional Long Short-Term Memory (Bi-LSTM) is a type of recurrent neural network that processes a sequence in both the forward and backward directions, allowing it to capture long-range dependencies in sequential data.
24
BiLSTM-CRF
▷ BiLSTM-CRF
• First, every word in sentence x is represented as a vector that includes the word's character embedding and word embedding. The character embedding is initialized randomly, while the word embedding usually comes from a pre-trained word embedding file. All the embeddings are fine-tuned during the training process.
• Second, the inputs of the BiLSTM-CRF model are these embeddings, and the outputs are the predicted labels for the words in sentence x (see the sketch below).
25
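A minimal PyTorch sketch of this architecture, using the third-party pytorch-crf package for the CRF layer; the class name, dimensions, and layer choices are illustrative assumptions, and character embeddings are omitted for brevity:

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    """Sketch of the architecture described above (character embeddings omitted)."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word embeddings (fine-tuned)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)        # per-word label scores
        self.crf = CRF(num_tags, batch_first=True)         # transition scores + Viterbi

    def _emissions(self, words):
        out, _ = self.lstm(self.embed(words))
        return self.emit(out)

    def loss(self, words, tags):
        return -self.crf(self._emissions(words), tags)     # negative log-likelihood

    def predict(self, words):
        return self.crf.decode(self._emissions(words))     # best label sequence
```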
BiLSTM-CRF
▷ BiLSTM-CRF
• The outputs of the BiLSTM layer are the scores for each label. For example, for w0, the outputs of the BiLSTM node are 1.5 (B-Person), 0.9 (I-Person), 0.1 (B-Organization), 0.08 (I-Organization), and 0.05 (O). These scores become the inputs of the CRF layer.
• Then, all the scores predicted by the BiLSTM blocks are fed into the CRF layer. In the CRF layer, the label sequence with the highest prediction score is selected as the best answer.
26
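Continuing the BiLSTMCRF sketch from the previous slide, a hypothetical usage with the five labels of this example (all tensor values are made up):

```python
import torch

tags = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]
model = BiLSTMCRF(vocab_size=1000, embed_dim=50, hidden_dim=64, num_tags=len(tags))

words = torch.randint(0, 1000, (1, 5))  # one sentence of 5 word ids (w0..w4)
gold = torch.tensor([[0, 1, 4, 2, 3]])  # gold label ids, used for training
print(model.loss(words, gold))          # CRF negative log-likelihood to minimize
print(model.predict(words))             # highest-scoring label sequence (Viterbi)
```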
What if we don't have a CRF layer?
Because the BiLSTM outputs per-word label scores, we could simply select the label with the highest score for each word.
Although this yields the correct labels for sentence x in this example, it is not always the case: labels chosen independently can form an invalid sequence.
27
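A tiny numeric illustration of how independent per-word argmax can go wrong (the scores are made up):

```python
import numpy as np

tags = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]
# Made-up BiLSTM label scores for two consecutive words:
scores = np.array([[0.2, 0.1, 1.3, 0.1, 0.5],   # w0: highest is B-Organization
                   [0.1, 1.8, 0.2, 0.9, 0.4]])  # w1: highest is I-Person
pred = [tags[i] for i in scores.argmax(axis=1)]
print(pred)  # ['B-Organization', 'I-Person'] -- an invalid transition
```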
What if we don't have a CRF layer?
The CRF layer can learn constraints on valid label transitions from the training data (for example, a label sequence should start with "B-" or "O", not "I-"). With these useful constraints, the number of invalid predicted label sequences decreases dramatically.
28
Case Study
▷ Study objectives
The research paper focuses on the challenge of extracting lung cancer diagnoses and relating them to the diagnosis date in clinical notes written in Spanish. The proposed approach combines deep learning-based methods with rule-based techniques to address the complexities of clinical narratives.
▷ Approach
▷ Results
Average F1-score = 0.89
29
References
• “Conditional Random Fields for Sequence Prediction.” Accessed: May 28, 2024. [Online]. Available: https://www.davidsbatista.net/blog/2017/11/13/Conditional_Random_Fields/
• “How do Conditional Random Fields (CRF) compare to Maximum Entropy Models and Hidden Markov Models?,” Quora. Accessed: May 28, 2024. [Online]. Available:
https://www.quora.com/How-do-Conditional-Random-Fields-CRF-compare-to-Maximum-Entropy-Models-and-Hidden-Markov-Models
• “CRF Layer on the Top of BiLSTM - 1,” CreateMoMo. Accessed: May 29, 2024. [Online]. Available: http://createmomo.github.io/2017/09/12/CRF_Layer_on_the_Top_of_BiLSTM_1/index.html
• E. Schubert, “Maximum Entropy Markov Models (MEMM),” Machine Learning Bits, 2024, Accessed: Jun. 01, 2024. [Online]. Available:
https://dm.cs.tu-dortmund.de/mlbits/sequential-models-maximum-entropy-models/
• “Named Entity Recognition using a Bi-LSTM with the Conditional Random Field Algorithm - Data Science <3 Machine Learning.” Accessed: May 28, 2024. [Online]. Available:
https://michhar.github.io/bilstm-crf-this-is-mind-bending/
• P. Tum, “NLP: Text Segmentation Using Conditional Random Fields,” Medium. Accessed: May 28, 2024. [Online]. Available:
https://medium.com/@phylypo/nlp-text-segmentation-using-conditional-random-fields-e8ff1d2b6060
• PGM 18Spring Lecture 11: CRF (cont’d) + Intro to Topic Models, (Feb. 21, 2018). Accessed: Jun. 01, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=d4r6o-2G-bA
• PGM 18Spring Lecture 10 updated: Discrete sequential Models + General CRF, (Mar. 07, 2018). Accessed: Jun. 01, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=hoSrWNvWjcI
• A. Hannun, “The Label Bias Problem.” Accessed: Jun. 01, 2024. [Online]. Available: https://awni.github.io/label-bias/
• C. Sutton and A. McCallum, “An Introduction to Conditional Random Fields for Relational Learning,” in Introduction to Statistical Relational Learning, L. Getoor and B. Taskar, Eds., The MIT Press,
2007, pp. 93–128. doi: 10.7551/mitpress/7432.003.0006.
• “Conditional Random Fields Explained | by Aditya Prasad | Towards Data Science.” Accessed: Jun. 04, 2024. [Online]. Available:
https://towardsdatascience.com/conditional-random-fields-explained-e5b8256da776
• Conditional Random Fields : Data Science Concepts, (Mar. 01, 2022). Accessed: Jun. 04, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=rI3DQS0P2fk
• D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd ed. draft, Jan. 2022 (ed3book_jan122022-final-version.pdf).
• “NER using Random Forest and CRF.” Accessed: Jun. 04, 2024. [Online]. Available: https://www.kaggle.com/code/shoumikgoswami/ner-using-random-forest-and-crf
• “NER with Bi-LSTM CRF.” Accessed: Jun. 04, 2024. [Online]. Available: https://www.kaggle.com/code/ab971631/ner-with-bi-lstm-crf
• O. Solarte Pabón, M. Torrente, M. Provencio, A. Rodríguez-Gonzalez, and E. Menasalvas, “Integrating Speculation Detection and Deep Learning to Extract Lung Cancer Diagnosis from Clinical Notes,”
Applied Sciences, vol. 11, no. 2, p. 865, Jan. 2021, doi: 10.3390/app11020865.
30
Thanks!
Any questions?