
Multi-Tagging for Transition-based Dependency Parsing

Alexander Whillas

School of Computer Science and Engineering
University of New South Wales, NSW, Australia
[email protected]

Abstract. This project focuses on a modification of a greedy transition-based
dependency parser. Typically, a Part-Of-Speech (POS) tagger models a probability
distribution over all the possible tags for each word in a given sentence and
chooses one as its best guess. This is then passed on to the parser, which uses the
information to build a parse tree. The current state of the art for POS tagging is
about 97% word accuracy, which seems high but results in only around 56% sentence
accuracy [Manning, 2011]. Small errors at the POS tagging phase can lead to
large errors further down the NLP pipeline, and transition-based parsers are
particularly sensitive to these types of mistakes.
A maximum entropy Markov model was trained as a POS multi-tagger, passing
more than its 1-best guess to the parser, which, it was hypothesised, could then
make a better decision when committing to a parse for the sentence. This has been
shown [Curran et al., 2006] to give improved accuracy in other parsing approaches.
We show that there is a correlation between tagging ambiguity and parser accuracy:
the higher the average number of tags per word, the higher the accuracy.

1 Introduction
In Natural Language Processing (NLP) two fundamental tasks are part-of-speech (POS)
tagging, assigning lexical categories such as Verb or Noun to the words in a
sentence, and sentence parsing, performing a grammatical analysis of the sentence
whose output is typically a parse tree representing the structure of the words.
Typically the POS tagger's output is fed into the parser as input, creating an NLP
pipeline.
The parse trees that the parser produces depend on the grammar formalism that
is being used to model the language. These can be divided into two main categories:
phrase-structure grammars [Chomsky, 1957], which adhere to the constituency relation,
and dependency grammars [Tesnière, 1959], which are based on the dependency relation
(without intermediate constituents in the tree).
Dependency parsing produces a connected, directed acyclic graph (DAG), also known
as an arborescence, whose nodes are all the words in a sentence, typically with the
addition of an artificial ROOT node, and whose arcs are the dependency relations.
There are three types of parser methods for producing these trees: transition-based
methods, similar to finite state automata; graph-based methods (or "arc-factored"
models), which find a maximum spanning tree (MST); and grammar-based methods, which
use a predefined projective dependency grammar, similar to context-free grammars,
so that parsing becomes a constraint satisfaction problem [Kübler et al., 2009].

2 Maximum Entropy multi-tagger

Maximum entropy Markov models (MEMM) are conditional probabilistic sequence
models. Given an observation sequence, a MEMM produces probability distributions
over possible label sequences as output. (Even though a MEMM is much slower to
train and tag with than a perceptron, it has the advantage of producing a
probability distribution over all tags at every point in a context, and this is
what allows us to multi-tag.)
The conditional probabilities take the form p(y|x), where y is a POS tag and x is a
local context in which y occurs. p has the log-linear form:

$$ p(y \mid x; \mathbf{v}) = \frac{e^{\mathbf{v} \cdot \mathbf{f}(x,y)}}{\sum_{y' \in Y} e^{\mathbf{v} \cdot \mathbf{f}(x,y')}} \qquad (1) $$

where f(x, y) is a vector of indicator feature functions and v is a vector of feature
weights. The divisor is a normalisation term that ensures a proper probability
distribution.
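As an illustration, the distribution in equation 1 can be computed directly once the
active features for each candidate tag are known. The following is a minimal Python
sketch; representing features as string keys into a weight dictionary is an
assumption made for illustration, not the implementation used here:

import math

def tag_distribution(context, tags, weights, features):
    """Compute p(y | x; v) of equation 1 for every candidate tag y.
    `features(context, y)` returns the names of the indicator
    features firing for (x, y); `weights` maps names to v_k."""
    scores = {y: sum(weights.get(f, 0.0) for f in features(context, y))
              for y in tags}
    # Normalisation term: sum of exponentiated scores over all tags.
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}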
Feature functions are contextual predicates that identify the presence of selected
features estimated to be significant factors in predicting the tag for the current
word. They are typically boolean-valued functions, returning 1 if the feature is
present and 0 otherwise. A typical feature function might look like:

$$ f_k(x, y) = \begin{cases} 1 & \text{if } word(x) = \text{the} \wedge y = \text{DET} \\ 0 & \text{otherwise} \end{cases} $$

where the predicate word(x) = the ∧ y = DET is true if the current word is
"the" and the predicted POS tag is "DET". These feature functions are generated from
templates at every word position in every sentence and typically number in the
hundreds of thousands, even for small training data sets. Feature templates very
similar to those used by [Ratnaparkhi et al., 1996] were used and are listed in
Table 1, with the addition of up to four suffix features.
The examples in Table 1 use wi = fox, with RB and JJ as the previous two tags and
VBZ and RB as the next two. Usually, when moving from left to right or right to
left, only the tags from the last two steps are available; in that case the features
that require tags ahead of the direction the tagger is moving are simply not present.
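A sketch of how such templates might be instantiated at a word position is given
below; the feature-name scheme is illustrative, and only a subset of the Table 1
templates is shown:

def extract_features(words, tags, i):
    """Instantiate a subset of the Table 1 templates (plus suffix
    features) at position i. Offsets that fall outside the sentence,
    or tags not yet available, simply contribute no feature."""
    def w(j):
        return words[i + j] if 0 <= i + j < len(words) else None
    def t(j):
        return tags[i + j] if 0 <= i + j < len(tags) else None
    candidates = [
        ("w0", w(0)), ("w-1", w(-1)), ("w-2", w(-2)),
        ("w+1", w(1)), ("w+2", w(2)),
        ("t-1", t(-1)), ("t-2|t-1", (t(-2), t(-1))),
    ]
    # Suffixes of the current word, up to four characters long.
    for k in range(1, 5):
        candidates.append(("suf%d" % k, w(0)[-k:] if w(0) else None))
    return ["%s=%s" % (name, val) for name, val in candidates
            if val is not None
            and (not isinstance(val, tuple) or None not in val)]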
For words which have a high enough frequency and little to no POS-tag ambiguity,
e.g. "the", tags are looked up before inference begins, and thus their POS tags are
always available. This was used to speed up the forward-backward algorithm by
considering only the looked-up tags instead of all the possibilities.
Parameter estimation of the large number of feature weights, v above, can be done
using an off-the-shelf gradient ascent method. The L-BFGS algorithm, a quasi-Newton
method, was chosen as it handles large gradient vectors and is fast, since it
approximates the Hessian matrix from previous steps instead of recalculating it at
every iteration.
Feature               Template             Example
Current word          wi & ti              wi=fox & NN
Previous word         wi-1 & ti            wi-1=brown & NN
Word two back         wi-2 & ti            wi-2=quick & NN
Next word             wi+1 & ti            wi+1=jumped & NN
Word two ahead        wi+2 & ti            wi+2=over & NN
Bigram word features  wi-2, wi-1 & ti      wi-2=quick, wi-1=brown & NN
                      wi-1, wi & ti        wi-1=brown, wi=fox & NN
                      wi, wi+1 & ti        wi=fox, wi+1=jumped & NN
                      wi+1, wi+2 & ti      wi+1=jumped, wi+2=over & NN
Bigram tag features   ti-1 & ti            ti-1=JJ & NN
                      ti-2 & ti            ti-2=RB & NN
                      ti+1 & ti            ti+1=VBZ & NN
                      ti+2 & ti            ti+2=RB & NN
Trigram tag features  ti-2, ti-1 & ti      ti-2=RB, ti-1=JJ & NN
                      ti-1, ti+1 & ti      ti-1=JJ, ti+1=VBZ & NN
                      ti+1, ti+2 & ti      ti+1=VBZ, ti+2=RB & NN

Table 1: Feature templates. Examples use the sentence:
"The quick brown fox jumped over the lazy dog"

To avoid over-fitting the training data, a regularisation prior term was added to
the objective function.
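Assuming a Gaussian (L2) prior, a common choice for maximum entropy models (the
specific prior used is not stated above, so this is an assumption), the regularised
objective being maximised takes the form:

$$ L(\mathbf{v}) = \sum_{j} \log p(y_j \mid x_j; \mathbf{v}) - \frac{\lambda}{2} \lVert \mathbf{v} \rVert^2 $$

where the sum runs over all training positions and λ is the regularisation
parameter tuned in section 6.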

The forward-backward algorithm [Rabiner, 1989] is used to generate the probability
distributions over all the tags for each word in a sentence. It finds the most
probable state at any given point in a sequence, but not necessarily the most
probable sequence of states (for that, see the Viterbi algorithm). It does this by
combining the probabilities of all the state sequences leading up to a point in the
sequence, the forward probabilities, with those of all the sequences after that
point starting from the end, the backward probabilities. The brute-force time
complexity of enumerating every state sequence is O(k^n), where n is the length of
the sequence and k is the number of states (or tags in this case, of which there
were 50). The forward-backward algorithm is a dynamic programming algorithm and
manages it in O(nk^2).
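A minimal Python sketch of the two recursions follows; it assumes an HMM-style
factorisation with per-position scores and tag-transition scores, which is a
simplification of the MEMM's locally normalised distributions:

def posterior_marginals(n, tags, trans, emit):
    """Posterior P(tag at position i = t) for a sequence of length n.
    trans(a, b) scores the transition a -> b; emit(i, t) scores tag t
    at position i. Each pass below is O(n * k^2)."""
    # Forward pass: mass of all tag sequences ending in t at position i.
    fwd = [{t: emit(0, t) for t in tags}]
    for i in range(1, n):
        fwd.append({t: emit(i, t) * sum(fwd[i - 1][s] * trans(s, t)
                                        for s in tags) for t in tags})
    # Backward pass: mass of all continuations after position i.
    bwd = [dict.fromkeys(tags, 1.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = {t: sum(trans(t, s) * emit(i + 1, s) * bwd[i + 1][s]
                         for s in tags) for t in tags}
    marginals = []
    for i in range(n):
        scores = {t: fwd[i][t] * bwd[i][t] for t in tags}
        z = sum(scores.values())
        marginals.append({t: v / z for t, v in scores.items()})
    return marginals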

3 Data

The Penn Treebank (PTB) corpus has become the de facto standard data set for
comparing POS tagging and parser performance. The problem with this data is that it
is not freely available to the general public, and the process of obtaining it via
official channels is also not straightforward. This is a problem for new and
underfunded researchers. Fortunately, a free alternative corpus has recently been
released; albeit a third the size, it is what all development was done with. (Not
having access to the PTB means that results are not comparable to previous work,
but this was not too important as the results only had to be compared with
themselves.)
The corpus used was the English part of the Universal Dependencies version 1.1
[Agić et al., 2015], which was in turn built from the English Web Treebank
[Silveira et al., 2014]. "The corpus comprises 254,830 words and 16,622 sentences,
taken from various web media including weblogs, newsgroups, emails, reviews, and
Yahoo! answers". Web corpora usually have looser and more diverse grammar than
commercial publications such as the Wall Street Journal, and so are perhaps more
challenging for NLP tasks.

3.1 Handling unknown words

A dictionary of all the words seen in the training data is built during training.
Those with a high enough frequency and which are not very ambiguous in terms of
POS-tag frequency are kept in a list along with their tag, which is then used for
fast lookup.
Unknown words seen at testing time are mapped to one of the pseudo-words listed in
Table 2, which follow the "word features" of [Bikel et al., 1999]. This has the
effect of closing the vocabulary, which means that at test time every word will
have been seen at least once.

Word Feature            Example Text            Intuition
twoDigitNum             90                      Two-digit year
fourDigitNum            1990                    Four-digit year
containsDigitAndAlpha   A8956-67                Product code
containsDigitAndDash    09-96                   Date
containsDigitAndSlash   11/9/89                 Date
containsDigitAndComma   23,000.00               Monetary amount
containsDigitAndPeriod  1.00                    Monetary amount, percentage
otherNum                456789                  Other number
allCaps                 BBN                     Organisations
capPeriod               M.                      Person name initial
firstWord               first word of sentence  No useful capitalisation information
initCap                 Sally                   Capitalised word
lowerCase               can                     Uncapitalised word
other                   ,                       Punctuation marks, all other words

Table 2: Pseudo-word classes that unseen words are mapped to, in order of
precedence, as per [Bikel et al., 1999]
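A sketch of this mapping, checked in the order of precedence of Table 2, might look
as follows (the exact predicates of [Bikel et al., 1999] differ in detail, so this
is an approximation):

import re

def pseudo_word(word, is_first_word):
    """Map an unseen word to one of the Table 2 pseudo-word classes."""
    if re.fullmatch(r"\d\d", word):        return "twoDigitNum"
    if re.fullmatch(r"\d{4}", word):       return "fourDigitNum"
    if re.search(r"\d", word):
        if re.search(r"[A-Za-z]", word):   return "containsDigitAndAlpha"
        if "-" in word:                    return "containsDigitAndDash"
        if "/" in word:                    return "containsDigitAndSlash"
        if "," in word:                    return "containsDigitAndComma"
        if "." in word:                    return "containsDigitAndPeriod"
        return "otherNum"
    if re.fullmatch(r"[A-Z]+", word):      return "allCaps"
    if re.fullmatch(r"[A-Z]\.", word):     return "capPeriod"
    if is_first_word:                      return "firstWord"
    if word[:1].isupper():                 return "initCap"
    if word.islower():                     return "lowerCase"
    return "other"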

4 Multi-tagging
The multi-tagging approach taken here is heavily influenced by [Curran et al., 2006],
in which a multi-tagging approach was used in a POS tagger and an intermediate
supertagger for a CCG (combinatory categorial grammar) parser, achieving a 1.6%
word-accuracy and almost 20% sentence-accuracy improvement by adding roughly one
extra tag per 20 words.
We adopted their tag selection process, taking all tags for a word within a factor γ
of the most probable tag for that word:

$$ \mathcal{C}_i = \{\, c \mid P(C_i = c \mid S) > \gamma\, P(C_i = c_{\max} \mid S) \,\} \qquad (2) $$


where 𝒞i is the set of all tags assigned to the ith word, Ci is the random variable
over the tag of the ith word, c_max is the most probable tag of the ith word, and
S is the sentence. This ensures that if there is very little ambiguity as to the
tag for a given word, i.e. the probability of its c_max is much higher than that of
all other candidates, then no extra tags will be added.
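A minimal sketch of this selection rule in Python (the 1-best tag is explicitly
retained, since the strict inequality of equation 2 would otherwise discard it at
γ = 1.0, where Table 5 shows exactly one tag per word):

def select_tags(tag_probs, gamma):
    """Tags within a factor gamma of the most probable tag (eq. 2).
    gamma = 1.0 keeps only the 1-best tag; gamma = 0.0 keeps every
    tag with non-zero probability."""
    c_max = max(tag_probs, key=tag_probs.get)
    p_max = tag_probs[c_max]
    return {c for c, p in tag_probs.items()
            if p > gamma * p_max or c == c_max}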
To calculate the probability distribution across all tags for word i, P(Ci = c|S),
the forward-backward algorithm, typically used in Markov models, was used
[Charniak et al., 1996].
Real-valued features were used, as this is what gave [Curran et al., 2006] their
best performance. The perceptron dependency parser's model uses features very
similar to those used by the MEMM, except that instead of defining predicates over
the lists of words and tags, features are defined over the parse configurations,
i.e. the stack, buffer and arc set (see "Dependency parsing" below for definitions
of these).
Thus the standard boolean features of the parser were extended to use real values,
e.g.

$$ f_k(x, y) = \begin{cases} p(POS(x) = \text{NN}) & \text{if } y = \text{NP} \\ 0 & \text{otherwise} \end{cases} $$

5 Dependency parsing
A dependency parser can be thought of as an abstract machine (a finite-state
transducer) which takes as input a sequence of words and maps them to a matching
set of head indices (see Table 3). Typically an artificial ROOT word is added,
which takes the first (zero) index.

indices: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
words: ROOT I have never hated a man enough to give his diamonds back .
heads: - 4 4 4 0 6 4 4 9 4 11 9 9 4

Table 3: Example dependency parse from [Agić et al., 2015] CV set.

More formally, we can define a dependency tree as a labelled directed tree T =
(V, A) where the vertices V = {w1, w2, ..., wn} are the words in the sentence and
the arcs A ⊆ V × L × V, where L is a set of arc labels. The labels are left out in
the unlabelled case, A ⊆ V × V, and the dependency tree can then be uniquely
defined by a set of directed arcs of the form (h, d), h being the head vertex/word
and d its dependant.
Fig. 1: A projective dependency tree visualising the example from Table 3 ("I have
never hated a man enough to give his diamonds back .").

Projectivity is a property of a dependency tree which, put simply, means that the
arcs of the tree do not cross, as in Figure 1. We assume that all the dependency
trees considered have this property; those that did not were filtered out of the
training set, and made up about 4% of the corpus. A non-projective tree is shown in
Figure 2. Non-projectivity is more prevalent in languages like Czech but is not
considered a problem in English.
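Given the head-array representation of Table 3, filtering non-projective trees
reduces to a check for crossing arcs; a small sketch:

def is_projective(heads):
    """True if no two arcs cross. `heads` is a head array as in
    Table 3, with heads[0] unused (the ROOT placeholder)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:  # one endpoint strictly inside, one outside
                return False
    return True

# The Table 3 parse is projective:
assert is_projective([None, 4, 4, 4, 0, 6, 4, 4, 9, 4, 11, 9, 9, 4])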

Fig. 2: A non-projective, labelled dependency tree for "A hearing is scheduled on
the issue today ." (with arc labels SBJ, VC, PC, ATT, TMP and PU).

A transition-based dependency parser treats the parsing task as a search for an
optimal transition sequence for a given sentence. The transition sequence is broken
down into a series of decisions from a small set of options (see Figure 3). At each
step a classifier is trained using an oracle that, given a gold parse tree,
determines optimal transition sequences, and thus a correct choice can be known and
trained against.
A transition system state is defined as a configuration c = (σ, β, A) where σ is a
stack, β is a buffer and A is a set of dependency arcs. Initially the system has an
empty stack and an empty set of arcs, and the buffer is set to the words of the
sentence to be parsed, β = w1, w2, ..., wn, ROOT. The final terminal state is a
configuration in which the stack is empty, the buffer contains only ROOT, and A
contains the arcs of the parse tree.
An Arc-Eager [Goldberg and Nivre, 2013] transition system has four transitions:

SHIFT[(σ, b|β, A)]   = (σ|b, β, A)
RIGHT[(σ|s, b|β, A)] = (σ|s|b, β, A ∪ {(s, b)})
LEFT[(σ|s, b|β, A)]  = (σ, b|β, A ∪ {(b, s)})
REDUCE[(σ|s, β, A)]  = (σ, β, A)

Fig. 3: Transitions in an unlabelled Arc-Eager transition parser model, taken from
[Goldberg and Nivre, 2013]

There is a precondition on the RIGHT and LEFT transitions that b ≠ ROOT and that
the stack σ is not empty. LEFT is also only legal if s does not already have a
parent in the existing arcs A, while REDUCE requires that s have a parent in A.
This system collects all of a word's left dependants first, then its right
dependants, and is "eager" because it adds arcs as early as possible (unlike the
Arc-Standard model, which lacks the REDUCE transition and builds its trees bottom
up, i.e. each word collects its dependants before attaching itself to its head word).
Dependency relations can also include labels to indicate the nature of the relation,
such as "subject" or "object". The Arc-Eager system defined above can be extended to
include the labels, which simply requires a new pair of LEFT and RIGHT transitions
for each label. All the work here can be trivially extended to this case, as shown
in the sketch below.
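For concreteness, the following is a Python sketch of the unlabelled system (word
positions as integers, ROOT as index 0 at the end of the initial buffer, as
described above); the precondition checks mirror those given for Figure 3:

class ArcEager:
    """Unlabelled Arc-Eager transition system of Fig. 3 (a sketch)."""
    def __init__(self, n):
        self.stack = []
        self.buffer = list(range(1, n + 1)) + [0]  # 0 is ROOT, at the end
        self.arcs = set()                          # (head, dependent) pairs

    def has_head(self, w):
        return any(d == w for _, d in self.arcs)

    def shift(self):                   # (sigma, b|beta, A) -> (sigma|b, beta, A)
        self.stack.append(self.buffer.pop(0))

    def right(self):                   # adds (s, b); requires b != ROOT
        assert self.stack and self.buffer[0] != 0
        self.arcs.add((self.stack[-1], self.buffer[0]))
        self.stack.append(self.buffer.pop(0))

    def left(self):                    # adds (b, s); s must not have a head
        assert self.stack and self.buffer[0] != 0
        assert not self.has_head(self.stack[-1])
        self.arcs.add((self.buffer[0], self.stack.pop()))

    def reduce(self):                  # pops s; s must already have a head
        assert self.stack and self.has_head(self.stack[-1])
        self.stack.pop()

    def is_terminal(self):
        return not self.stack and self.buffer == [0]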

The averaged perceptron classifier was introduced to NLP in [Collins, 2002]. It has
become the dominant classifier for transition-based dependency parsing because its
design is simple and yet very effective. It is also linear-time to train and
predict with, while maintaining a competitive accuracy. It works the same way as
the classical perceptron algorithm, updating the weights of features on bad
predictions: incrementing those that lead to the right prediction and decrementing
those that gave the wrong answer. The main difference is that at the end of the
training cycle the average of each weight's history over all the training examples
is taken.
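A sketch of the averaged update follows; the "lazy" accumulation of each weight's
running total (rather than summing every weight after each example) is a standard
efficiency trick, assumed here rather than taken from the implementation above:

from collections import defaultdict

class AveragedPerceptron:
    def __init__(self):
        self.w = defaultdict(float)       # current weights
        self.totals = defaultdict(float)  # accumulated weight history
        self.stamp = defaultdict(int)     # step of each weight's last update
        self.steps = 0

    def score(self, features):
        return sum(self.w[f] for f in features)

    def update(self, gold_feats, pred_feats):
        """On a wrong prediction, increment the gold features and
        decrement the predicted ones, tracking each weight's history."""
        self.steps += 1
        deltas = [(f, +1.0) for f in gold_feats] + \
                 [(f, -1.0) for f in pred_feats]
        for f, delta in deltas:
            self.totals[f] += (self.steps - self.stamp[f]) * self.w[f]
            self.stamp[f] = self.steps
            self.w[f] += delta

    def average(self):
        """Replace each weight with its average over all training steps."""
        for f in self.w:
            self.totals[f] += (self.steps - self.stamp[f]) * self.w[f]
            self.stamp[f] = self.steps
            self.w[f] = self.totals[f] / max(self.steps, 1)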

A dynamic oracle, introduced by [Goldberg and Nivre, 2012], is used at training
time to predict the optimal transition sequence given any previous transition
history, including histories that deviate from the gold-tree parse ("gold" standard,
i.e. the best, most reliable data). The oracle is non-deterministic in that a set of
transitions is returned for a given parse state and gold tree. This allows a greedy
parser to learn how to recover from mistakes, reduces the effects of error
propagation, and results in better parse accuracy.

6 Experiments
The data was split into three sets: a large training set, a cross-validation set
and a testing set from which the final accuracies are drawn.
The tag dictionary, which recorded words and their corresponding tag frequencies,
was used as a baseline, with unknown words assigned the highest-scoring class they
fell into. This gave a baseline tagging accuracy of 84%.
The MEMM model was trained on the training set and then its regularisation
parameter was tuned using the cross-validation set. A value of 0.66 was settled on
(see Table 4).

Regularisation   Word %   Sent. %
0.1              89.71    43.86
0.2              89.72    43.81
0.33             89.70    43.76
0.5              89.79    43.96
0.58             89.80    44.01
0.66             89.80    44.01
0.9              89.78    43.71
1.0              89.52    43.41

Table 4: Regularisation parameters tested on the UDP 1.1 cross-validation set.

The training data was multi-tagged with the MEMM model using 10-fold cross-
validation: each held-out fold was multi-tagged and saved for use in the parser
training. The dynamic oracle of the transition parser requires tagged data that has
the mistakes the tagger is likely to make at test time, so it can learn to recover
from transition histories that incorporate mistakes. This has been shown
[Goldberg and Nivre, 2012] to give a 1.5-3% improvement in accuracy.

Distributed processing was employed, as the MEMM tagger took over 26 hours to run
on 2,000 sentences; doing this ten times for the 10-fold cross-validation was not
feasible without distributed processing power. Training and tagging of the training
set was done on 49 computers, each with 4 cores, thanks to the dispy Python module
(http://dispy.sourceforge.net/).

6.1 Parsing

The final test set was made up of 96.34% projective trees. The parses were scored
using the standard unlabelled attachment score (UAS), which simply counts the
number of correct head dependencies returned by the parser. Sentence accuracy was
also counted, which requires the parser to get the dependencies right for a whole
sentence.
This used a MEMM POS tagger trained on the full training set, which achieved 89.94%
word accuracy and 44.63% sentence accuracy on the testing set. The values for the
parser accuracy here are well under the state of the art and have not been tuned.
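Both scores reduce to simple counts over head arrays; a minimal sketch:

def uas(gold_heads, pred_heads):
    """Unlabelled attachment score: fraction of words (ROOT excluded)
    whose predicted head index matches the gold head."""
    pairs = list(zip(gold_heads[1:], pred_heads[1:]))
    return sum(1 for g, p in pairs if g == p) / len(pairs)

def sentence_accuracy(gold_parses, pred_parses):
    """Fraction of sentences whose head array is entirely correct."""
    exact = sum(1 for g, p in zip(gold_parses, pred_parses)
                if g[1:] == p[1:])
    return exact / len(gold_parses)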

      Ambiguity   Baseline         Multi-tagging    Difference
γ     tags/word   UAS     Sent.    UAS     Sent.    Word    Sent.
0.0   1.92        78.86   46.41    79.34   47.28    0.48    0.87
0.1   1.40        79.08   46.75    79.38   47.21    0.30    0.46
0.2   1.31        79.09   46.88    79.35   46.97    0.26    0.09
0.3   1.27        79.11   46.92    79.38   46.89    0.27   -0.03
0.4   1.24        79.13   46.94    79.31   46.81    0.18   -0.13
0.5   1.22        79.04   46.82    79.28   46.79    0.24   -0.03
0.6   1.21        79.06   46.81    79.28   46.81    0.22    0.00
0.7   1.19        79.07   46.78    79.26   46.74    0.19   -0.04
0.8   1.18        79.07   46.83    79.24   46.69    0.17   -0.14
0.9   1.17        79.07   46.73    79.21   46.66    0.14   -0.07
1.0   1.00        79.08   46.73    79.21   46.70    0.13   -0.03

Table 5: Unlabelled attachment score (UAS) at word and sentence level, for both the
baseline without multi-tags and with multi-tagging. The difference columns give
multi-tagging minus baseline.

7 Conclusion

The results listed in Table 5 indicate that as the number of tags per word
increases (i.e. as γ approaches zero and ambiguity increases), the UAS increases as
well.
The MEMM tagger was also below the published 96% accuracy. Some of this is the
result of the data used here being web data and only about a third the size of the
Penn Wall Street Journal treebank; due to the sparsity of the features in the MEMM
model, such models tend to benefit from more data.
Another problem with the MEMM tagger used here is that its run time is much slower
than that of the parser it generates multiple tags for, i.e. polynomial versus
linear. Regardless, the aim of the experiment was to see if multi-tagging features
have some benefit for a feature-based dependency parser. Other, faster methods of
generating probability distributions could be explored in future work, such as the
SoftMax layer of [Weiss et al., 2015] for example.
The results were not as significant as those reported by [Curran et al., 2006],
which inspired this work. There, a probability-based parser, the CYK algorithm
using a Probabilistic Context-Free Grammar (PCFG), generates a packed chart of all
possible parses over the given tagged sentences [Steedman, 2000], which are then
ranked by a log-linear model similar to a MEMM. Given that that parser was using
probabilities in its calculations, the effect of injecting real-valued features
would have been more prominent.
This small experiment shows that there is potential along this line of enquiry if
multi-tagging can be done in a more efficient manner.
Acknowledgements
I would like to thank Alan Blair for supervising this project, and Matthew Honnibal
for the initial idea behind this line of enquiry and the use of his dependency
parser code.

References
Agić et al., 2015. Agić, Ž., Aranzabe, M. J., Atutxa, A., Bosco, C., Choi, J., de Marneffe, M.-C.,
Dozat, T., Farkas, R., Foster, J., Ginter, F., Goenaga, I., Gojenola, K., Goldberg, Y., Hajič, J.,
Johannsen, A. T., Kanerva, J., Kuokkala, J., Laippala, V., Lenci, A., Lindén, K., Ljubešić, N.,
Lynn, T., Manning, C., Martínez, H. A., McDonald, R., Missilä, A., Montemagni, S., Nivre,
J., Nurmi, H., Osenova, P., Petrov, S., Piitulainen, J., Plank, B., Prokopidis, P., Pyysalo, S.,
Seeker, W., Seraji, M., Silveira, N., Simi, M., Simov, K., Smith, A., Tsarfaty, R., Vincze, V.,
and Zeman, D. (2015). Universal dependencies 1.1.
Bikel et al., 1999. Bikel, D., Schwartz, R., and Weischedel, R. (1999). An algorithm that learns
what’s in a name. Machine Learning, 34(1-3):211–231.
Charniak et al., 1996. Charniak, E., Carroll, G., Adcock, J., Cassandra, A., Gotoh, Y., Katz, J.,
Littman, M., and McCann, J. (1996). Taggers for parsers. Artificial Intelligence, 85(1):45–57.
Chomsky, 1957. Chomsky, N. (1957). Syntactic structures. Walter de Gruyter.
Collins, 2002. Collins, M. (2002). Discriminative training methods for hidden markov models:
Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 conference
on Empirical methods in natural language processing-Volume 10, pages 1–8. Association for
Computational Linguistics.
Curran et al., 2006. Curran, J. R., Clark, S., and Vadas, D. (2006). Multi-tagging for lexicalized-
grammar parsing. In Proceedings of the 21st International Conference on Computational Lin-
guistics and the 44th annual meeting of the Association for Computational Linguistics, pages
697–704. Association for Computational Linguistics.
Goldberg and Nivre, 2012. Goldberg, Y. and Nivre, J. (2012). A dynamic oracle for arc-eager
dependency parsing. In COLING, pages 959–976.
Goldberg and Nivre, 2013. Goldberg, Y. and Nivre, J. (2013). Training deterministic parsers
with non-deterministic oracles. Transactions of the association for Computational Linguistics,
1:403–414.
Kübler et al., 2009. Kübler, S., McDonald, R., and Nivre, J. (2009). Dependency parsing. Syn-
thesis Lectures on Human Language Technologies, 1(1):1–127.
Manning, 2011. Manning, C. D. (2011). Part-of-speech tagging from 97% to 100%: is it time
for some linguistics? In Computational Linguistics and Intelligent Text Processing, pages 171–
189. Springer.
Rabiner, 1989. Rabiner, L. (1989). A tutorial on hidden markov models and selected applications
in speech recognition. Proceedings of the IEEE, 77(2):257–286.
Ratnaparkhi et al., 1996. Ratnaparkhi, A. et al. (1996). A maximum entropy model for part-of-
speech tagging. In Proceedings of the conference on empirical methods in natural language
processing, volume 1, pages 133–142. Philadelphia, USA.
Silveira et al., 2014. Silveira, N., Dozat, T., de Marneffe, M.-C., Bowman, S., Connor, M., Bauer,
J., and Manning, C. D. (2014). A gold standard dependency corpus for English. In Proceedings
of the Ninth International Conference on Language Resources and Evaluation (LREC-2014).
Steedman, 2000. Steedman, M. (2000). The syntactic process, volume 24. MIT Press.
Tesnière, 1959. Tesnière, L. (1959). Eléments de syntaxe structurale. Librairie C. Klincksieck.
Weiss et al., 2015. Weiss, D., Alberti, C., Collins, M., and Petrov, S. (2015). Structured training
for neural network transition-based parsing. arXiv preprint arXiv:1506.06158.
