Conditional Random Field Model (CRF)
Prepared by:
Hiba Mansour
Submitted to:
Dr. Hamed Abdelhaq
Outline
CRF
Bi-LSTM & CRF
2
Outline
Hidden Markov Model (HMM)
&
Maximum Entropy Markov Model (MEMM)
3
Generative vs. Discriminative
4
HMM
Definition of HMM:
An HMM is a class of probabilistic graphical models that allows us to predict a sequence of unknown (hidden) variables from a set of observed variables. It is a type of Markov chain where the states are not directly observable, but the observations are assumed to depend on the states. For example, predicting the weather (hidden variable) based on the type of clothes that someone wears (observed).
5
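As a toy illustration of the slide's weather/clothes example, here is a minimal sketch in NumPy; all probabilities, state names, and observation symbols are made-up assumptions for illustration:

```python
import numpy as np

# Toy HMM: weather (hidden states) vs. clothes (observations).
states = ["Sunny", "Rainy"]
obs_symbols = ["T-shirt", "Coat"]

pi = np.array([0.6, 0.4])        # initial state distribution P(s_1)
A = np.array([[0.7, 0.3],        # transition matrix P(s_t | s_{t-1})
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # emission matrix P(o_t | s_t)
              [0.2, 0.8]])

def forward(obs):
    """Forward algorithm: probability of an observation sequence under the HMM."""
    alpha = pi * B[:, obs[0]]            # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate and weight by emission
    return alpha.sum()

# Probability of observing T-shirt, Coat, Coat:
print(forward([0, 1, 1]))
```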
Limitations of HMM
▷ Limited dependencies
An HMM models direct dependencies only between each state and its corresponding observation.
In a sentence segmentation task, segmentation may depend not just on a single word, but also on features of the whole line, such as line length, indentation, amount of white space, etc.
6
Solution: MEMM
A Maximum Entropy Markov Model (MEMM) is a probabilistic model used for sequence labeling tasks, particularly in natural language processing (NLP). MEMMs extend the concept of Hidden Markov Models (HMMs) by incorporating maximum entropy principles to model the conditional probability distribution of labels given a sequence of observations.
▷ Discriminative model
• Completely ignores modeling P(X), which saves modeling effort.
• The learning objective function is consistent with the predictive function: P(Y|X).
7
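Concretely, the MEMM factorizes the sequence probability into locally normalized steps (standard MEMM notation; $f$ is a feature function with learned weights $w$):

$$P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}, x), \qquad P(y_t \mid y_{t-1}, x) = \frac{\exp\big(w \cdot f(y_t, y_{t-1}, x_t)\big)}{\sum_{y'} \exp\big(w \cdot f(y', y_{t-1}, x_t)\big)}$$

Note that the normalization happens per state at each step; this local normalization is what causes the label bias problem discussed next.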
MEMM: Label bias problem
Because an MEMM normalizes transition probabilities locally at each state, a state with few outgoing transitions passes on almost all of its probability mass regardless of the observation; in the extreme case, a state with a single outgoing transition assigns it probability 1 and ignores the input entirely. Decoding is therefore biased toward states with fewer outgoing transitions: this is the label bias problem.
8
Conditional Random Field (CRF)
13
Conditional Random Field (CRF)
▷ CRF Definition
A CRF is an undirected probabilistic graphical model that models the conditional probability of a sequence of labels given an observation sequence.
▷ CRF Flexibility
CRF allows for the incorporation of a wide range of features, making it more expressive and
powerful than HMM or MEMM.
▷ CRF Advantages
CRF overcomes the label bias problem of MEMM and can capture long-range dependencies,
making it a robust and effective structured prediction model.
14
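As an illustration of this flexibility, here is a minimal per-word feature function in the style commonly used with CRF toolkits such as sklearn-crfsuite; the specific features and names are illustrative assumptions, not a fixed API:

```python
def word2features(sent, i):
    """Features for the i-th word of a tokenized sentence."""
    word = sent[i]
    features = {
        "word.lower": word.lower(),
        "word.isupper": word.isupper(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        features["prev.word.lower"] = sent[i - 1].lower()  # context feature
    else:
        features["BOS"] = True   # beginning of sentence
    if i == len(sent) - 1:
        features["EOS"] = True   # end of sentence
    return features
```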
Examples & Applications
▷ Global feature
A CRF scores the entire label sequence with a single global feature vector and normalizes once per sequence using the partition function $Z(x)$:

$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t=1}^{T} \sum_{k} w_k\, f_k(y_{t-1}, y_t, x, t)\Big), \qquad Z(x) = \sum_{y'} \exp\Big(\sum_{t=1}^{T} \sum_{k} w_k\, f_k(y'_{t-1}, y'_t, x, t)\Big)$$
17
Inference
Using the Viterbi algorithm to find the optimal path:

$$y^* = \arg\max_{y} P(y \mid x)$$

Since $Z(x)$ does not depend on $y$, this is equivalent to maximizing the unnormalized global score.
19
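A minimal NumPy sketch of Viterbi decoding for a linear-chain model, assuming per-position emission scores and a label-to-label transition score matrix are given (e.g., computed from a trained CRF's weights):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Find the highest-scoring label path.

    emissions:   (T, K) per-position label scores
    transitions: (K, K) score of moving from label i to label j
    """
    T, K = emissions.shape
    score = emissions[0].copy()             # best score ending in each label
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = score of ending in j at step t, coming from i
        cand = score[:, None] + transitions + emissions[t]
        backptr[t] = cand.argmax(axis=0)    # best predecessor for each label
        score = cand.max(axis=0)
    path = [int(score.argmax())]            # best final label
    for t in range(T - 1, 0, -1):           # walk the back-pointers
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]
```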
Training CRF
We also need to find the $w$ parameters that best fit the training data, given a set of labelled sentences:

$$D = \{(x^1, y^1), (x^2, y^2), \ldots, (x^N, y^N)\}$$

where each pair is a sentence with the corresponding word labels annotated. To find the $w$ parameters that best fit the data, we need to maximize the conditional likelihood of the training data:

$$L(w) = \sum_{n=1}^{N} \log P(y^n \mid x^n; w)$$

The parameter estimates are computed with an L2 regularization term:

$$\hat{w} = \arg\max_{w} \sum_{n=1}^{N} \log P(y^n \mid x^n; w) - \frac{\lambda}{2}\, \lVert w \rVert^2$$

The standard approach to finding $\hat{w}$ is to compute the gradient of the objective function and use the gradient in an optimization algorithm like L-BFGS.
20
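As a sketch of this training setup in practice, the third-party sklearn-crfsuite package exposes L-BFGS training with an L2 term; the tiny dataset below is made up for illustration:

```python
import sklearn_crfsuite  # third-party package: pip install sklearn-crfsuite

# Toy training data: one sentence as a list of per-word feature dicts
# (e.g., produced by a word2features function) plus its label sequence.
X_train = [[{"word.lower": "john", "word.istitle": True},
            {"word.lower": "smiled", "word.istitle": False}]]
y_train = [["B-Person", "O"]]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",   # gradient-based optimization, as described above
    c2=0.1,              # weight of the L2 regularization term
    max_iterations=100,
)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # Viterbi-decoded label sequences
```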
HMM, MEMM and CRF Comparison

|             | HMM                             | MEMM                                                          | CRF                                                  |
| Type        | Generative model                | Discriminative model                                          | Discriminative model                                 |
| Dependency  | Strong independence assumptions | Relaxes independence assumptions, but suffers from label bias | Flexible modeling of dependencies, avoids label bias |
| Limitation  | Sensitive to noisy observations | Sensitive to the label bias issue                             | Robust to noise; captures long-range dependencies    |
| Training    | MLE                             | Gradient based                                                | Gradient based                                       |
| Inference   | Forward-backward / Viterbi      | Forward-backward / Viterbi                                    | Forward-backward / Viterbi                           |
| Computation | Easier                          | Moderate                                                      | Harder                                               |
| Performance | Weaker                          | Moderate                                                      | Stronger                                             |
| Graph       | Directed                        | Directed                                                      | Undirected                                           |
| Context     | Local                           | Local                                                         | Global, local                                        |
21
Outline
CRF and Bi-LSTM
23
Bi-LSTM
▷ Bi-LSTM
Bi-directional Long Short-Term Memory (Bi-LSTM) is a type of recurrent neural network that processes a sequence in both the forward and backward directions, allowing it to capture long-range dependencies in sequential data.
24
BiLSTM-CRF
▷ BiLSTM-CRF
• First, every word in sentence x is represented as a vector that includes the word's character embedding and word embedding. The character embedding is initialized randomly, while the word embedding usually comes from a pre-trained word embedding file. All the embeddings are fine-tuned during the training process.
• Second, the inputs of the BiLSTM-CRF model are these embeddings, and the outputs are the predicted labels for the words in sentence x (see the sketch below).
25
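A minimal PyTorch sketch of this architecture, using the third-party pytorch-crf package for the CRF layer; the class name, dimensions, and layer choices are illustrative assumptions, and character embeddings are omitted for brevity:

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    """Sketch of the architecture described above (character embeddings omitted)."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word embeddings (fine-tuned)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)        # per-word label scores
        self.crf = CRF(num_tags, batch_first=True)         # transition scores + Viterbi

    def _emissions(self, words):
        out, _ = self.lstm(self.embed(words))
        return self.emit(out)

    def loss(self, words, tags):
        return -self.crf(self._emissions(words), tags)     # negative log-likelihood

    def predict(self, words):
        return self.crf.decode(self._emissions(words))     # best label sequence
```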
BiLSTM-CRF
▷ BiLSTM-CRF
• The outputs of the BiLSTM layer are the scores for each label. For example, for w0, the outputs of the BiLSTM node are 1.5 (B-Person), 0.9 (I-Person), 0.1 (B-Organization), 0.08 (I-Organization), and 0.05 (O). These scores become the inputs of the CRF layer.
• Then, all the scores predicted by the BiLSTM blocks are fed into the CRF layer. In the CRF layer, the label sequence with the highest prediction score is selected as the best answer.
26
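Continuing the BiLSTMCRF sketch from the previous slide, a hypothetical usage with the five labels of this example (all tensor values are made up):

```python
import torch

tags = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]
model = BiLSTMCRF(vocab_size=1000, embed_dim=50, hidden_dim=64, num_tags=len(tags))

words = torch.randint(0, 1000, (1, 5))  # one sentence of 5 word ids (w0..w4)
gold = torch.tensor([[0, 1, 4, 2, 3]])  # gold label ids, used for training
print(model.loss(words, gold))          # CRF negative log-likelihood to minimize
print(model.predict(words))             # highest-scoring label sequence (Viterbi)
```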
What if we don't have a CRF layer?
Because the BiLSTM outputs per-word label scores, we could simply select the label with the highest score for each word.
Although this yields the correct labels for sentence x in this example, it is not always the case: labels chosen independently can form an invalid sequence.
27
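A tiny numeric illustration of how independent per-word argmax can go wrong (the scores are made up):

```python
import numpy as np

tags = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]
# Made-up BiLSTM label scores for two consecutive words:
scores = np.array([[0.2, 0.1, 1.3, 0.1, 0.5],   # w0: highest is B-Organization
                   [0.1, 1.8, 0.2, 0.9, 0.4]])  # w1: highest is I-Person
pred = [tags[i] for i in scores.argmax(axis=1)]
print(pred)  # ['B-Organization', 'I-Person'] -- an invalid transition
```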
What if we don't have a CRF layer?
The CRF layer can learn constraints on valid label transitions from the training data (for example, a label sequence should start with "B-" or "O", not "I-"). With these useful constraints, the number of invalid predicted label sequences decreases dramatically.
28
Case Study
▷ Study objectives
The research paper focuses on the challenge of extracting lung cancer diagnoses and relating them to the diagnosis date in clinical notes written in Spanish. The proposed approach combines deep learning-based methods with rule-based techniques to address the complexities of clinical narratives.
▷ Approach
▷ Results
Average F1-score = 0.89
29
References
• “Conditional Random Fields for Sequence Prediction.” Accessed: May 28, 2024. [Online]. Available: https://www.davidsbatista.net/blog/2017/11/13/Conditional_Random_Fields/
• “How do Conditional Random Fields (CRF) compare to Maximum Entropy Models and Hidden Markov Models?,” Quora. Accessed: May 28, 2024. [Online]. Available:
https://www.quora.com/How-do-Conditional-Random-Fields-CRF-compare-to-Maximum-Entropy-Models-and-Hidden-Markov-Models
• “CRF Layer on the Top of BiLSTM - 1,” CreateMoMo. Accessed: May 29, 2024. [Online]. Available: http://createmomo.github.io/2017/09/12/CRF_Layer_on_the_Top_of_BiLSTM_1/index.html
• E. Schubert, “Maximum Entropy Markov Models (MEMM),” Machine Learning Bits, 2024, Accessed: Jun. 01, 2024. [Online]. Available:
https://dm.cs.tu-dortmund.de/mlbits/sequential-models-maximum-entropy-models/
• “Named Entity Recognition using a Bi-LSTM with the Conditional Random Field Algorithm - Data Science <3 Machine Learning.” Accessed: May 28, 2024. [Online]. Available:
https://michhar.github.io/bilstm-crf-this-is-mind-bending/
• P. Tum, “NLP: Text Segmentation Using Conditional Random Fields,” Medium. Accessed: May 28, 2024. [Online]. Available:
https://medium.com/@phylypo/nlp-text-segmentation-using-conditional-random-fields-e8ff1d2b6060
• PGM 18Spring Lecture 11: CRF (cont’d) + Intro to Topic Models, (Feb. 21, 2018). Accessed: Jun. 01, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=d4r6o-2G-bA
• PGM 18Spring Lecture 10 updated: Discrete sequential Models + General CRF, (Mar. 07, 2018). Accessed: Jun. 01, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=hoSrWNvWjcI
• A. Hannun, “The Label Bias Problem.” Accessed: Jun. 01, 2024. [Online]. Available: https://awni.github.io/label-bias/
• C. Sutton and A. McCallum, “An Introduction to Conditional Random Fields for Relational Learning,” in Introduction to Statistical Relational Learning, L. Getoor and B. Taskar, Eds., The MIT Press,
2007, pp. 93–128. doi: 10.7551/mitpress/7432.003.0006.
• “Conditional Random Fields Explained | by Aditya Prasad | Towards Data Science.” Accessed: Jun. 04, 2024. [Online]. Available:
https://towardsdatascience.com/conditional-random-fields-explained-e5b8256da776
• Conditional Random Fields : Data Science Concepts, (Mar. 01, 2022). Accessed: Jun. 04, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=rI3DQS0P2fk
• D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd ed. draft, Jan. 2022 (ed3book_jan122022-final-version.pdf).
• “NER using Random Forest and CRF.” Accessed: Jun. 04, 2024. [Online]. Available: https://www.kaggle.com/code/shoumikgoswami/ner-using-random-forest-and-crf
• “NER with Bi-LSTM CRF.” Accessed: Jun. 04, 2024. [Online]. Available: https://www.kaggle.com/code/ab971631/ner-with-bi-lstm-crf
• O. Solarte Pabón, M. Torrente, M. Provencio, A. Rodríguez-Gonzalez, and E. Menasalvas, “Integrating Speculation Detection and Deep Learning to Extract Lung Cancer Diagnosis from Clinical Notes,”
Applied Sciences, vol. 11, no. 2, p. 865, Jan. 2021, doi: 10.3390/app11020865.
30
Thanks!
Any questions?