
Conditional Random Field Model (CRF)

Prepared by:
Hiba Mansour

Submitted to:
Dr. Hamed Abdelhaq
Outline

 HMM & MEMM

 CRF

 Bi-LSTM & CRF

2
Outline


Hidden Markov Model (HMM)
&
Maximum Entropy Markov Model (MEMM)

3
Generative vs. Discriminative

4
HMM
Definition of HMM:
HMM is a class of probabilistic graphical models that allows us to predict a sequence of unknown
(hidden) variables from a set of observed variables. It is a type of Markov chain where the states are
not directly observable, but the observations are assumed to depend on the states. For
example, predicting the weather (hidden variable) based on the type of clothes that someone wears
(observed).
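The weather-from-clothes example can be sketched with a toy Viterbi decoder; all states, observations, and probabilities below are made-up illustrations, not values from the slides.

```python
# Toy HMM: hidden weather states, observed clothing choices (illustrative numbers).
states = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"coat": 0.8, "tshirt": 0.2},
        "Sunny": {"coat": 0.3, "tshirt": 0.7}}

def viterbi(obs):
    """Most likely hidden state sequence for an observation sequence."""
    # v[s] = probability of the best path ending in state s
    v = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev = v
        back.append({s: max(states, key=lambda p: prev[p] * trans[p][s])
                     for s in states})
        v = {s: max(prev[p] * trans[p][s] for p in states) * emit[s][o]
             for s in states}
    # Backtrace from the best final state.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["coat", "coat", "tshirt"]))  # ['Rainy', 'Rainy', 'Sunny']
```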

5
Limitations of HMM
▷ Limited dependencies
HMM models direct dependencies between each state and only its corresponding observation.
In a sentence segmentation task, segmentation may depend not just on a single word, but also on
features of the whole line, such as line length, indentation, amount of white space, etc.

▷ Static transition and emission probabilities

▷ Learns a joint probability distribution
HMM learns a joint probability distribution of states and observations P(Y,X), but in a prediction task we
need the conditional probability P(Y|X).

▷ HMM needs a number of augmentations to achieve high accuracy
In some tasks, like POS tagging, we often run into unknown words; we need a way to add arbitrary features directly
to the model.

6
Solution: MEMM
A Maximum Entropy Markov Model (MEMM) is a probabilistic model used for sequence labeling tasks,
particularly in natural language processing (NLP). MEMMs extend the concept of Hidden Markov Models
(HMMs) by incorporating maximum entropy principles to model the conditional probability distribution of
labels given a sequence of observations.

 Models the dependence between each state and the full observation sequence explicitly
More expressive than HMMs

 Discriminative model
Completely ignores modeling P(X): saves modeling effort
Learning objective function consistent with the predictive function: P(Y|X)
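The per-step conditional P(y_t | y_prev, x_t) that MEMM models can be sketched as a softmax over candidate labels; the two features and their weights below are hand-picked illustrations, not learned values.

```python
import math

# Sketch of MEMM's per-step local normalization: each position models
# P(y_t | y_prev, x_t) with a softmax over candidate labels.
labels = ["N", "V"]
w = [2.0, 1.0]  # assumed weights

def feature(y_prev, y, x):
    # Two hand-picked binary features; a real MEMM learns many such features.
    return [1.0 if y == "V" and x.endswith("ing") else 0.0,
            1.0 if y_prev == "N" and y == "V" else 0.0]

def p_local(y_prev, x):
    scores = {y: math.exp(sum(wi * fi for wi, fi in zip(w, feature(y_prev, y, x))))
              for y in labels}
    z = sum(scores.values())  # the normalizer is computed per step, not globally
    return {y: s / z for y, s in scores.items()}

print(p_local("N", "running"))  # "V" dominates: both features fire
```

The per-step normalizer z is exactly what separates MEMM from CRF, which replaces it with a single sequence-level Z(X).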

7
MEMM: Label bias problem

8
States with more out-edges have a smaller average outgoing weight, because local normalization
forces each state's outgoing probabilities to sum to one. The most likely path is therefore
always biased toward states with fewer out-edges, regardless of the observations: the label bias
problem.
Solution: global normalization, as in CRF.
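The effect can be shown numerically with made-up scores: under local normalization, a state with fewer out-edges hands each successor more probability mass than a state with many out-edges, however well those edges fit the input.

```python
import math

# Numeric sketch of the label bias problem (made-up, equal raw scores).
def local_probs(scores):
    # Locally normalize a state's outgoing scores so they sum to 1.
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

from_A = local_probs([1.0, 1.0])                  # state A: 2 out-edges
from_B = local_probs([1.0, 1.0, 1.0, 1.0, 1.0])   # state B: 5 out-edges

print(max(from_A), max(from_B))  # 0.5 vs 0.2: paths through A always look "better"
```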
12
Outline


Conditional Random Field (CRF)

13
Conditional Random Field (CRF)
▷ CRF Definition
CRF is an undirected probabilistic graphical model that models the conditional probability of a
sequence of labels given an observation sequence.

▷ CRF Flexibility
CRF allows for the incorporation of a wide range of features, making it more expressive and
powerful than HMM or MEMM.

▷ CRF Advantages
CRF overcomes the label bias problem of MEMM and can capture long-range dependencies,
making it a robust and effective structured prediction model.

14
Examples & Applications

▷ Natural Language Processing
CRF models excel at sequence labeling tasks in NLP, such as part-of-speech tagging, named
entity recognition, and semantic role labeling.

▷ Bioinformatics
In bioinformatics, Conditional Random Fields (CRFs) are widely used. They help compare
RNA molecules for structural alignment and predict protein structures.

▷ Computer Vision
CRF models can incorporate spatial and contextual information to improve object detection
and segmentation in computer vision tasks like image captioning and scene understanding.

15
CRF structures
▷ Linear Conditional Random Fields (CRFs):
1. Linear CRFs are a type of graphical model used for sequence labeling tasks.
2. They model dependencies between input features and output labels in a linear
chain structure.
3. Linear CRFs are widely applied in tasks like named entity recognition and
part-of-speech tagging.

▷ Skip Chain Conditional Random Fields (CRFs):
1. Skip chain CRFs extend the linear CRF model by allowing for skips or
transitions between non-adjacent positions in the sequence.
2. This enables the modeling of long-range dependencies in sequential data.
3. Skip chain CRFs find applications in tasks such as speech recognition.

▷ General Conditional Random Fields (CRFs):
1. General CRFs are a more flexible form of CRF that can handle arbitrary graph
structures.
2. Unlike linear and skip chain CRFs, which assume a linear or skip-chain
topology, general CRFs allow for more complex relationships between
variables.
3. They are suitable for various tasks, including image segmentation and natural
language parsing.
16
From MEMM to CRF
CRF replaces MEMM's per-step normalization with a single global feature sum and one global
normalizer Z(X):

P(Y|X) = exp( Σ_t w · f(y_{t-1}, y_t, X, t) ) / Z(X)

where Z(X) = Σ_{Y'} exp( Σ_t w · f(y'_{t-1}, y'_t, X, t) ) sums over all possible label sequences Y'.
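Although Z(X) sums over exponentially many label sequences, the forward algorithm computes it in O(T·K²); the sketch below checks it against brute-force enumeration, using random scores as stand-ins for the learned w · f(y_prev, y, X, t).

```python
import numpy as np
from itertools import product

# Forward algorithm for the global normalizer Z(X) of a linear-chain CRF,
# in log space, verified against brute force (random stand-in scores).
rng = np.random.default_rng(0)
T, K = 4, 3                       # sequence length, number of labels
emis = rng.normal(size=(T, K))    # per-position label scores
trans = rng.normal(size=(K, K))   # label-to-label transition scores

def log_Z(emis, trans):
    alpha = emis[0].copy()
    for t in range(1, len(emis)):
        # log-sum-exp over the previous label for every current label
        alpha = np.logaddexp.reduce(alpha[:, None] + trans + emis[t][None, :],
                                    axis=0)
    return float(np.logaddexp.reduce(alpha))

def brute_log_Z(emis, trans):
    # Enumerate all K**T label sequences explicitly.
    scores = [emis[0, ys[0]] + sum(trans[ys[t - 1], ys[t]] + emis[t, ys[t]]
                                   for t in range(1, T))
              for ys in product(range(K), repeat=T)]
    return float(np.logaddexp.reduce(np.array(scores)))

print(np.isclose(log_Z(emis, trans), brute_log_Z(emis, trans)))  # True
```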

17
Inference

▷ Ignore Z(X), since it is constant

▷ Ignore exp, since it doesn't change the argmax
18
Inference
Using the Viterbi algorithm to find the optimal path:

• Viterbi path probability for HMM: v_t(j) = max_i v_{t-1}(i) · P(y_j | y_i) · P(x_t | y_j)

• Viterbi path probability for CRF (in log space, scores add instead of multiplying):
v_t(j) = max_i [ v_{t-1}(i) + w · f(y_i, y_j, X, t) ]
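The CRF recurrence above can be sketched directly in NumPy; the emission and transition scores here are made-up illustrations.

```python
import numpy as np

# Viterbi decoding for a linear-chain CRF: the path score is a SUM of
# transition and emission scores, and both Z(X) and exp are dropped
# because they do not change the argmax.
def crf_viterbi(emis, trans):
    T, K = emis.shape
    v = emis[0].copy()                  # best score of a path ending in each label
    back = np.zeros((T, K), dtype=int)  # argmax backpointers
    for t in range(1, T):
        cand = v[:, None] + trans       # score of reaching label j from label i
        back[t] = cand.argmax(axis=0)
        v = cand.max(axis=0) + emis[t]
    path = [int(v.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))

emis = np.array([[2.0, 0.0],
                 [0.0, 1.0],
                 [3.0, 0.0]])
trans = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(crf_viterbi(emis, trans))  # [0, 0, 0]
```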

19
Training CRF
We also need to find the 𝑤 parameters that best fit the training data, given a set of
labelled sentences, where each pair is a sentence with the corresponding word labels annotated.
To find the 𝑤 parameters that best fit the data, we need to maximize the conditional likelihood
of the training data, with an L2 regularization term. The parameter estimates are computed as:

ŵ = argmax_w Σ_i log P(Yⁱ | Xⁱ; w) − λ‖w‖²

The standard approach to finding ŵ is to compute the gradient of the objective function and use the
gradient in an optimization algorithm like L-BFGS.
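The objective can be sketched on a tiny problem: the gradient of the negative conditional log-likelihood is the expected feature count minus the observed one, plus the L2 term. Feature values, λ, and the gold labels below are made up, and transition features are omitted for brevity; a real implementation hands this objective and gradient to L-BFGS.

```python
import numpy as np
from itertools import product

# Negative conditional log-likelihood of a tiny CRF with an L2 term,
# with the analytic gradient checked against finite differences.
K, T = 2, 3                     # labels, sequence length
y_true = (0, 1, 0)              # assumed gold labeling
feats = np.random.default_rng(1).normal(size=(T, K, 3))  # 3 features per (t, y)

def phi(ys):
    """Summed feature vector f(Y, X) for a full label sequence."""
    return sum(feats[t, ys[t]] for t in range(T))

def neg_log_lik(w, lam=0.1):
    scores = np.array([phi(ys) @ w for ys in product(range(K), repeat=T)])
    log_Z = np.logaddexp.reduce(scores)          # global normalizer
    return -(phi(y_true) @ w - log_Z) + lam * w @ w

def grad(w, lam=0.1):
    seqs = list(product(range(K), repeat=T))
    scores = np.array([phi(ys) @ w for ys in seqs])
    p = np.exp(scores - np.logaddexp.reduce(scores))
    expected = sum(pi * phi(ys) for pi, ys in zip(p, seqs))
    return -(phi(y_true) - expected) + 2 * lam * w  # -(observed - expected) + L2

w = np.array([0.3, -0.2, 0.1])
eps = 1e-6
numeric = np.array([(neg_log_lik(w + eps * e) - neg_log_lik(w - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad(w), numeric, atol=1e-5))
```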

20
HMM, MEMM and CRF Comparison

              HMM                        MEMM                         CRF
Type          Generative model           Discriminative model         Discriminative model
Dependency    Strong independence        Relaxes independence         Flexible modeling of dependencies,
              assumptions                assumptions, but suffers     avoids label bias
                                         from label bias
Limitation    Sensitive to noisy         Sensitive to label bias      Robust to noise and can capture
              observations               issue                        long-range dependencies
Training      MLE                        Gradient based               Gradient based
Inference     Forward-backward/Viterbi   Forward-backward/Viterbi     Forward-backward/Viterbi
Computation   Easier                     Moderate                     Harder
Performance   Weaker                     Moderate                     Stronger
Graph         Directed                   Directed                     Undirected
Context       Local                      Local                        Global, local

21
Outline


CRF and Bi-LSTM

23
Bi-LSTM
▷ Bi-LSTM
Bi-directional Long Short-Term Memory (Bi-LSTM) is a type of recurrent neural network that can capture long-range
dependencies in sequential data.

24
BiLSTM-CRF

▷ BiLSTM-CRF
• First, every word in sentence x is represented as a vector that concatenates the word's character embedding and
word embedding. The character embedding is initialized randomly; the word embedding usually comes from a pre-
trained word embedding file. All the embeddings are fine-tuned during the training process.
• Second, the inputs of the BiLSTM-CRF model are these embeddings and the outputs are the predicted labels for the
words in sentence x.

25
BiLSTM-CRF
▷ BiLSTM-CRF
• The outputs of the BiLSTM layer are the scores of each label. For example, for w0, the outputs of the BiLSTM node are
1.5 (B-Person), 0.9 (I-Person), 0.1 (B-Organization), 0.08 (I-Organization) and 0.05 (O). These scores become the
inputs of the CRF layer.
• Then all the scores predicted by the BiLSTM blocks are fed into the CRF layer. In the CRF layer, the label sequence
with the highest prediction score is selected as the best answer.
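How the CRF layer combines scores can be sketched for a two-word sentence: a sequence's total score is the BiLSTM emission scores plus the CRF transition scores. The w0 row below matches the slide's example; the w1 row and the transition values are made-up stand-ins.

```python
import numpy as np
from itertools import product

labels = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]
emis = np.array([
    [1.5, 0.9, 0.1, 0.08, 0.05],   # w0, as in the slide
    [0.2, 1.2, 0.1, 0.10, 0.30],   # w1 (assumed)
])
trans = np.zeros((5, 5))
trans[0, 1] = 0.5                  # assume B-Person -> I-Person is favored

def best_sequence(emis, trans):
    """Brute-force the label sequence with the highest total score."""
    best, best_score = None, -np.inf
    for ys in product(range(len(labels)), repeat=len(emis)):
        s = emis[0, ys[0]] + sum(trans[ys[t - 1], ys[t]] + emis[t, ys[t]]
                                 for t in range(1, len(emis)))
        if s > best_score:
            best, best_score = list(ys), s
    return [labels[y] for y in best]

print(best_sequence(emis, trans))  # ['B-Person', 'I-Person']
```

In practice the CRF layer uses Viterbi rather than brute force, but the selection criterion is the same.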

26
What if we don't have a CRF layer?

Because the outputs of the BiLSTM for each word are label scores, we could simply select the label
with the highest score for each word.

Although this gives correct labels for sentence x in this example, it is not always the case.

27
What if we don't have a CRF layer?

▷ The CRF layer can learn constraints from the training data

The CRF layer can add constraints to the final predicted labels to ensure they are valid. These constraints are
learned by the CRF layer automatically from the training dataset during the training process.

The constraints could be:

 The label of the first word in a sentence should start with “B-” or “O”, not “I-”
 In the pattern “B-label1 I-label2 I-label3 I-…”, label1, label2, label3 … should be the same named entity label. For
example, “B-Person I-Person” is valid, but “B-Person I-Organization” is invalid.
 “O I-label” is invalid. The first label of a named entity should start with “B-”, not “I-”; in other words, the valid
pattern is “O B-label”.

With these useful constraints, the number of invalid predicted label sequences decreases dramatically.
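The constraints above can be sketched as a hard mask on the CRF's start and transition scores: invalid moves get -inf so no decoded path can use them. (In a trained BiLSTM-CRF these scores are learned, and training merely pushes invalid transitions toward very low values.)

```python
import numpy as np

# Build a start vector and transition matrix that forbid the invalid
# patterns listed above by setting their scores to -inf.
labels = ["O", "B-Person", "I-Person", "B-Organization", "I-Organization"]
idx = {l: i for i, l in enumerate(labels)}

start = np.zeros(len(labels))
trans = np.zeros((len(labels), len(labels)))

for l in labels:
    if l.startswith("I-"):
        start[idx[l]] = -np.inf            # a sentence cannot start with I-
        trans[idx["O"], idx[l]] = -np.inf  # "O I-label" is invalid
        for prev in labels:
            # Only B-X or I-X of the SAME entity type X may precede I-X.
            if prev.startswith(("B-", "I-")) and prev[2:] != l[2:]:
                trans[idx[prev], idx[l]] = -np.inf

print(trans[idx["B-Person"], idx["I-Person"]],        # 0.0  : valid
      trans[idx["B-Person"], idx["I-Organization"]])  # -inf : invalid
```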

28
Case Study

▷ Study objectives
The research paper focuses on the challenge of extracting lung cancer diagnoses
and relating them to the diagnosis date from clinical notes written in Spanish. The
proposed approach combines deep learning-based methods with rule-based
techniques to address the complexities of clinical narratives.

▷ Approach

▷ Results
Average F1-score = 0.89
29
References
• “Conditional Random Fields for Sequence Prediction.” Accessed: May 28, 2024. [Online]. Available: https://www.davidsbatista.net/blog/2017/11/13/Conditional_Random_Fields/
• “How do Conditional Random Fields (CRF) compare to Maximum Entropy Models and Hidden Markov Models?,” Quora. Accessed: May 28, 2024. [Online]. Available:
https://www.quora.com/How-do-Conditional-Random-Fields-CRF-compare-to-Maximum-Entropy-Models-and-Hidden-Markov-Models
• “CRF Layer on the Top of BiLSTM - 1,” CreateMoMo. Accessed: May 29, 2024. [Online]. Available: http://createmomo.github.io/2017/09/12/CRF_Layer_on_the_Top_of_BiLSTM_1/index.html
• E. Schubert, “Maximum Entropy Markov Models (MEMM),” Machine Learning Bits, 2024, Accessed: Jun. 01, 2024. [Online]. Available:
https://dm.cs.tu-dortmund.de/mlbits/sequential-models-maximum-entropy-models/
• “Named Entity Recognition using a Bi-LSTM with the Conditional Random Field Algorithm - Data Science <3 Machine Learning.” Accessed: May 28, 2024. [Online]. Available:
https://michhar.github.io/bilstm-crf-this-is-mind-bending/
• P. Tum, “NLP: Text Segmentation Using Conditional Random Fields,” Medium. Accessed: May 28, 2024. [Online]. Available:
https://medium.com/@phylypo/nlp-text-segmentation-using-conditional-random-fields-e8ff1d2b6060
• PGM 18Spring Lecture 11: CRF (cont’d) + Intro to Topic Models, (Feb. 21, 2018). Accessed: Jun. 01, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=d4r6o-2G-bA
• PGM 18Spring Lecture 10 updated: Discrete sequential Models + General CRF, (Mar. 07, 2018). Accessed: Jun. 01, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=hoSrWNvWjcI
• A. Hannun, “The Label Bias Problem.” Accessed: Jun. 01, 2024. [Online]. Available: https://awni.github.io/label-bias/
• C. Sutton and A. McCallum, “An Introduction to Conditional Random Fields for Relational Learning,” in Introduction to Statistical Relational Learning, L. Getoor and B. Taskar, Eds., The MIT Press,
2007, pp. 93–128. doi: 10.7551/mitpress/7432.003.0006.
• “Conditional Random Fields Explained | by Aditya Prasad | Towards Data Science.” Accessed: Jun. 04, 2024. [Online]. Available:
https://towardsdatascience.com/conditional-random-fields-explained-e5b8256da776
• Conditional Random Fields : Data Science Concepts, (Mar. 01, 2022). Accessed: Jun. 04, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=rI3DQS0P2fk
• “ed3book_jan122022-final-version.pdf.”
• “NER using Random Forest and CRF.” Accessed: Jun. 04, 2024. [Online]. Available: https://www.kaggle.com/code/shoumikgoswami/ner-using-random-forest-and-crf
• “NER with Bi-LSTM CRF.” Accessed: Jun. 04, 2024. [Online]. Available: https://www.kaggle.com/code/ab971631/ner-with-bi-lstm-crf
• O. Solarte Pabón, M. Torrente, M. Provencio, A. Rodríguez-Gonzalez, and E. Menasalvas, “Integrating Speculation Detection and Deep Learning to Extract Lung Cancer Diagnosis from Clinical Notes,”
Applied Sciences, vol. 11, no. 2, p. 865, Jan. 2021, doi: 10.3390/app11020865.
30
Thanks!
Any questions?
