This document discusses conditional random fields (CRFs) and how to use the MALLET toolkit to build CRF models for labeling sequence data. It begins with an overview of hidden Markov models and their limitations, then introduces CRFs as undirected graphical models that allow arbitrary overlapping features without independence assumptions. The document explains how MALLET's SimpleTagger implements CRFs and demonstrates training a model to label parts of speech using example sentence features.


Using MALLET for Conditional Random Fields

Matthew Michelson & Craig A. Knoblock, CSCI 548, Lecture 3

The road to CRFs


In the beginning: generative models, which model the joint probability of X and Y, P(X,Y). Markov assumption: the probability of the current state depends only on the previous state. Standard model: the Hidden Markov Model (HMM).

Markov Process
Let's say we're independent of time; then we can define aij = P(qt = Sj | qt-1 = Si) as a STATE TRANSITION from Si to Sj, with aij >= 0.

This conserves all of the probability mass; i.e., all outgoing probabilities from a state sum to 1: Σj aij = 1.

Markov Process

Two more terms to define: πi = P(q1 = Si) = probability that we start in state Si; bj(k) = P(k | qt = Sj) = probability of observation symbol k in state j. So, let's say the symbols are {A, B}; then we could have something like b1(A) = P(A | S1), i.e., what is the probability that we output A in state 1?

Hidden Markov Model


A Hidden Markov Model (HMM)


Set of states, set of ai,j, set of πi, set of bj(k)

Training: learn, from a set of observation sequences, the transition and emission probabilities.
Decoding: when testing, input comes in and is fit to the model's internal observations with some probability; output the best state-transition sequence to produce the input observations.
Hidden: you can observe the sequence of emissions, but you do not know what state the model is in. If 2 states output "yes", all I see is "yes"; I have no idea which state or set of states produced it!
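As a minimal sketch (not part of the slides), the four components above can be written down as plain Python data; the state names, symbols, and values here are just placeholders:

from typing import Dict, NamedTuple, Tuple

class HMM(NamedTuple):
    states: Tuple[str, ...]              # S, the set of states
    start: Dict[str, float]              # pi_i  = P(q1 = S_i)
    trans: Dict[Tuple[str, str], float]  # a_ij  = P(q_t = S_j | q_t-1 = S_i)
    emit: Dict[Tuple[str, str], float]   # b_j(k) = P(symbol k | q_t = S_j)

hmm = HMM(states=("S1", "S2"),
          start={"S1": 0.5, "S2": 0.5},
          trans={("S1", "S1"): 0.25, ("S1", "S2"): 0.75,
                 ("S2", "S1"): 0.30, ("S2", "S2"): 0.70},
          emit={("S1", "A"): 0.9, ("S1", "B"): 0.1,
                ("S2", "A"): 0.4, ("S2", "B"): 0.6})

# Sanity check: all outgoing transition probabilities from a state sum to 1
for i in hmm.states:
    assert abs(sum(hmm.trans[(i, j)] for j in hmm.states) - 1.0) < 1e-9

The assert at the end just re-checks the conservation-of-mass property from the previous slide.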

HMM - Example
Urn and Ball Model: each urn has a large number of balls in M distinct colors. Randomly pick an urn, pick out a colored ball, note its color, and repeat. S = set of states = set of urns. Transition probs = choice of next urn. bi(color) = prob. of drawing that colored ball from urn i.

Urn and Ball Problem

Urn and Ball Example


Let's say we have the following: 2 urns, 2 colors (Red, Blue)
a1,1 = 0.25, a1,2 = 0.75
a2,1 = 0.3, a2,2 = 0.7
b1(Red) = 0.9, b1(Blue) = 0.1
b2(Red) = 0.4, b2(Blue) = 0.6

Urn and Ball Example

Let's say it's perfectly random to start with either urn, i.e. π1 = π2 = 0.5. What is the most probable state sequence that produces {Red, Red, Blue}?

Urn and Ball Example

We will use the Viterbi algorithm to do this, recursively. Define δt(i) = max over q1,...,qt-1 of P[q1, q2, ..., qt = Si, O1, O2, ..., Ot | HMM model] (remember, qt = current state, the O's are observations). So, δt+1(j) = [maxi δt(i) * ai,j] * bj(Ot+1)

Urn and Ball Example

We need a first set of initialized values: δ1(i) = πi * bi(O1 = Red), for i = 1, 2
δ1(1) = π1 * b1(O1 = Red) = 0.5 * 0.9 = 0.45
δ1(2) = π2 * b2(O1 = Red) = 0.5 * 0.4 = 0.2

Urn and Ball Example

Now, recurse:
δ2(1) = max( {δ1(1)*a1,1, δ1(2)*a2,1} ) * b1(O2 = Red) = max( {0.45*0.25, 0.2*0.3} ) * 0.9 = 0.10125
δ2(2) = max( {δ1(1)*a1,2, δ1(2)*a2,2} ) * b2(O2 = Red) = max( {0.45*0.75, 0.2*0.7} ) * 0.4 = 0.135

Urn and Ball Example

Now, recurse:
δ3(1) = max( {δ2(1)*a1,1, δ2(2)*a2,1} ) * b1(O3 = Blue) = max( {0.10125*0.25, 0.135*0.3} ) * 0.1 = 0.00405
δ3(2) = max( {δ2(1)*a1,2, δ2(2)*a2,2} ) * b2(O3 = Blue) = max( {0.10125*0.75, 0.135*0.7} ) * 0.6 = 0.0567

Urn and Ball Example

So, we see that at each step the maximum is: δ3(2) = 0.0567, δ2(2) = 0.135, δ1(1) = 0.45. So, working backwards, we know the state transitions went Urn 2, Urn 2, Urn 1 (forwards: Urn 1 -> Urn 2 -> Urn 2). So, if we are given the observation {Red, Red, Blue}, we say that the most probable state-transition sequence is {Start in Urn 1/Red, go to Urn 2/Red, stay in Urn 2/Blue}.
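A minimal sketch (not from the slides) that re-runs this Viterbi calculation in plain Python, just to confirm the δ values and the recovered urn sequence:

# Viterbi decoding for the urn-and-ball example above (plain Python, no MALLET)
states = [1, 2]                                   # urn 1, urn 2
start = {1: 0.5, 2: 0.5}                          # pi_i
trans = {(1, 1): 0.25, (1, 2): 0.75,              # a_ij
         (2, 1): 0.30, (2, 2): 0.70}
emit = {(1, "Red"): 0.9, (1, "Blue"): 0.1,        # b_i(color)
        (2, "Red"): 0.4, (2, "Blue"): 0.6}
obs = ["Red", "Red", "Blue"]

# delta[t][j] = probability of the best state sequence ending in state j at time t
delta = [{j: start[j] * emit[(j, obs[0])] for j in states}]
back = [{}]
for t in range(1, len(obs)):
    delta.append({})
    back.append({})
    for j in states:
        best_i = max(states, key=lambda i: delta[t - 1][i] * trans[(i, j)])
        back[t][j] = best_i
        delta[t][j] = delta[t - 1][best_i] * trans[(best_i, j)] * emit[(j, obs[t])]

# Backtrack from the most probable final state
path = [max(states, key=lambda j: delta[-1][j])]
for t in range(len(obs) - 1, 0, -1):
    path.insert(0, back[t][path[0]])

print(delta)  # [{1: 0.45, 2: 0.2}, {1: 0.10125, 2: 0.135}, {1: 0.00405, 2: 0.0567}]
print(path)   # [1, 2, 2]  ->  start in Urn 1, go to Urn 2, stay in Urn 2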

HMM Issues
1. Independence Assumption: the current observation only depends on what state you are in right now. Or, to say it differently, the current output has no dependence on previous outputs. For our urn example, we couldn't model the fact that if urn 1 outputs a red ball, then urn 2 should decrease its probability of doing so.

HMM Issues
2. Multiple Features Issue: an HMM generates a set of probabilities given an observation. But what if you want to capture many features from an observation, and these features interact? E.g., the observation is "Doug". This is a noun, capitalized, and masculine. Now, what if the transition is into state = MAN? We know that state MAN probably depends on the observation's "noun" and "capitalized" features. But what if we have a state CITY too? Doesn't that depend on "noun" and "capitalized" as well? Transferring into MAN might require a masculine name; that transition strongly depends on the word having the feature "masculine".

HMM Issues
3. An abundance of training data for one state has no effect on the others.

Hidden Markov Model


[Diagram: a chain of hidden states Yi-1 -> Yi -> Yi+1 (transitions), each state emitting an observation Xi-1, Xi, Xi+1.]

P(X, Y) = Πi P(Xi | Yi) P(Yi | Yi-1)

But how do we model this?


[Diagram: the same chain Yi-1 -> Yi -> Yi+1, but the observation Xi now carries several DEPENDENT FEATURES at once: it is a noun, it is "Doug", and it is capitalized.]

Choice #1: Model all dependencies


[Diagram: each feature of the observation ("is Doug", capitalized, noun) given its own dependency edge to the states.]

Grows infeasible. Need LOTS of training data.

Choice #2: Ignore dependencies


[Diagram: the same features with their dependencies on one another simply dropped.]

Not really a solution.

Conditional Model

We prefer a model that is trained to maximize a conditional probability rather than a joint probability: P(s|o) instead of P(s,o).

Allow arbitrary, non-independent features on the observation sequence X


Examine features, but don't generate them. (There is no directed transition from a state to an output.) We don't have to explicitly model their dependencies.

"Conditionally trained" means: given a set of observations (the input), what is the most likely set of labels (states, nodes in the graph) that the model has been trained to traverse given this input?

Maximum Entropy Markov Models (MEMMs)


Exponential model. Given a training set X with label sequence Y: train a model Λ that maximizes P(Y|X, Λ). For a new data sequence x, the predicted label sequence y maximizes P(y|x, Λ).

[Diagram: MEMM structure: the next state Yi+1 depends on the previous state Yi and the observation Xi+1.]

MEMMs (cont'd)

MEMMs have all the advantages of conditional models

Per-state normalization: all the mass that arrives at a state must be distributed among the possible successor states (conservation of score mass). Subject to the Label Bias Problem.
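As a rough sketch of what "per-state exponential model" means (the notation below is this writeup's own, not copied from the slides), each transition is scored as

P(yi | yi-1, x) = exp( Σk λk fk(yi, yi-1, x) ) / Z(yi-1, x)

where Z(yi-1, x) sums the exponential over every possible next label yi. Because Z is computed separately at each state, each state must hand out exactly probability 1 to its successors; this is the per-state conservation of mass that produces the label bias problem described next.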

Label Bias Problem


Consider this MEMM:

P(1 and 2 | ro) = P(2 | 1 and ro) P(1 | ro) = P(2 | 1 and o) P(1 | r)
P(1 and 2 | ri) = P(2 | 1 and ri) P(1 | ri) = P(2 | 1 and i) P(1 | r)

Since P(2 | 1 and x) = 1 for all x, P(1 and 2 | ro) = P(1 and 2 | ri).
In the training data, label value 2 is the only label value observed after label value 1.
Therefore P(2 | 1) = 1, so P(2 | 1 and x) = 1 for all x.
However, we expect P(1 and 2 | ri) to be greater than P(1 and 2 | ro).
Per-state normalization does not allow the required expectation.

Another view of Label Bias

Conditional Random Fields (CRFs)


CRFs have all the advantages of MEMMs without the label bias problem

An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state; a CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence

Undirected acyclic graph. Allows some transitions to vote more strongly than others, depending on the corresponding observations

Random Field: what it looks like

CRF: what it looks like

CRF: the guts

CRF: defined
We make feature functions to define features; they are not generated by the model (unlike the X's of an HMM)

CRF

Now we have Pr(label|obs.,model)


Find the most probable label sequence (y's), given an observation sequence (x's). No more independence assumption.

Conditionally trained for a whole label sequence given an input sequence (so long-range and multi-feature dependencies are reflected by this)

Example of a feature function (y's are labels, x's are input observations)
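The slide's feature-function figure is not reproduced here, so as an illustration (the feature, label names, and notation are invented for this sketch, not taken from the slides), a binary feature function over (previous label, current label, input sequence, position) might look like:

def f_cap_noun(y_prev, y_curr, x, i):
    # Fires (returns 1) when the current word is capitalized and labeled "noun"
    return 1 if x[i][0].isupper() and y_curr == "noun" else 0

A linear-chain CRF then scores a whole label sequence y for an input x as P(y|x) = (1/Z(x)) * exp( Σi Σk λk fk(yi-1, yi, x, i) ), with a single normalizer Z(x) over all complete label sequences, in contrast to the per-state Z of the MEMM above.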

MALLET

Machine learning toolkit specifically for language tasks. Developed at UMass by Andrew McCallum and his group. For our purposes, we will use the SimpleTagger class, which implements Conditional Random Fields.

Getting MALLET to work


1. Install Cygwin (HW Instructions)
2. Install Ant (HW Inst.)
3. Install MALLET (HW Inst.)
4. Train/Test/Label with SimpleTagger

SimpleTagger

Training
Each line is of the form: <feature1> <feature2> ... <featureN> <label>

Let's start with an example of a sentence: "Los Angeles is a great city!" We want to find all nouns, like the example in http://mallet.cs.umass.edu/index.php/SimpleTagger_example

Training CRFs
Let's say we have some tools that can identify features:
Colors: a list of colors
Regex: apostrophe finder
Regex: capitalized
Stop-Words: common tokens such as a, the, etc. (not "etc." the word)

The red bear's favorite color is green?

(Feature types: STOPWORD, CAPITALIZED, APOS, COLOR)

Training CRFs
GOAL: Find NOUNS
LABELED INPUTS:
The SW CAP not-noun
red COLOR not-noun
bear's APOS noun

Note: In SimpleTagger, the default ignore label is "O" (used in the HW).

The red bear's favorite color is green?

(Feature types: STOPWORD, CAPITALIZED, APOS, COLOR)
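Putting the pieces together, a plausible TrainingData.txt for this sentence might look like the lines below; the feature names and the noun/not-noun labels are this example's own choices (SimpleTagger only requires that each line list features first and the label last):

The SW CAP not-noun
red COLOR not-noun
bear's APOS noun
favorite not-noun
color noun
is SW not-noun
green COLOR not-noun
? not-noun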

Train SimpleTagger

java -cp "class;lib/mallet-deps.jar" edu.umass.cs.mallet.base.fst.SimpleTagger --train true --model-file SAVEDMODEL TrainingData.txt

Labeling with SimpleTagger


Once you have a trained model, you can re-use it to label new data!

java -cp "class;lib/mallet-deps.jar" edu.umass.cs.mallet.base.fst.SimpleTagger --include-input true --model-file SAVEDMODEL NotLabeledText.txt > LabeledOutput.txt
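For illustration (the contents here are made up, following the feature-then-label convention above but with the label column left off, since that is what the model is asked to predict), NotLabeledText.txt might look like:

Seattle CAP
is SW
a SW
rainy
city

SimpleTagger should then write a predicted label for each token (and, with --include-input true, echo the input features) into LabeledOutput.txt.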

CRFs and MALLET


Have fun!
