2. POS TAGGING APPROACHES


POS taggers are broadly classified into three categories: rule-based, empirical, and hybrid. In the rule-based approach, hand-written rules are used to resolve tag ambiguity. Empirical POS taggers are further classified into example-based and stochastic taggers. Stochastic taggers are either HMM-based, choosing the tag sequence that maximizes the product of word likelihood and tag sequence probability, or cue-based, using decision trees or maximum entropy models to combine probabilistic features. Stochastic taggers are further classified into supervised and unsupervised taggers, and each of these is categorized into different groups based on the particular algorithm used. Fig. 2 shows the classification of part-of-speech tagging approaches.
2.1 Rule Based POS tagging
Rule-based POS tagging models apply a set of hand-written rules and use contextual information to assign POS tags to words. These rules are often known as context frame rules. For example, a context frame rule might say something like: "If an ambiguous/unknown word X is preceded by a Determiner and followed by a Noun, tag it as an Adjective." One of the first and most widely used English POS taggers to employ rule-based algorithms is Brill's tagger. The earliest algorithms for automatically assigning parts of speech were based on a two-stage architecture. The first stage used a dictionary to assign each word a list of potential parts of speech. The second stage used large lists of hand-written disambiguation rules to narrow this list down to a single part of speech for each word. The ENGTWOL tagger is based on the same two-stage architecture, although both its lexicon and its disambiguation rules are much more sophisticated than those of the early algorithms.
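The following is a minimal sketch of this two-stage architecture (dictionary lookup followed by a hand-written disambiguation rule), not the actual formalism of Brill's tagger or ENGTWOL; the toy lexicon, the default-tag policy, and the single context frame rule are invented for the example (Python):

    # A minimal two-stage rule-based tagger: stage 1 looks up candidate
    # tags in a toy lexicon; stage 2 applies one hand-written context
    # frame rule. Lexicon and rule are invented for illustration.
    LEXICON = {
        "the":   ["DET"],
        "old":   ["NOUN", "ADJ"],   # ambiguous; default (first) tag is wrong here
        "man":   ["NOUN", "VERB"],
        "walks": ["VERB", "NOUN"],
    }

    def tag(words):
        # Stage 1: dictionary lookup; unknown words default to NOUN.
        candidates = [LEXICON.get(w, ["NOUN"]) for w in words]
        tags = [c[0] for c in candidates]   # default: first listed tag

        # Stage 2: "If an ambiguous word is preceded by a Determiner
        # and followed by a Noun, tag it as an Adjective."
        for i in range(1, len(words) - 1):
            if (len(candidates[i]) > 1 and "ADJ" in candidates[i]
                    and tags[i - 1] == "DET"
                    and "NOUN" in candidates[i + 1]):
                tags[i] = "ADJ"
        return tags

    print(tag(["the", "old", "man", "walks"]))
    # -> ['DET', 'ADJ', 'NOUN', 'VERB']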
2.2 Empirical Based POS tagging
The relative failure of rule-based approaches, the increasing availability of machine-readable text, and the increase in hardware capability (CPU, memory, disk space) at decreasing cost are some of the reasons researchers prefer corpus-based POS tagging. The empirical approach to part-of-speech tagging is further divided into two categories: the example-based approach and the stochastic approach. The literature shows that the majority of developed POS taggers belong to the empirical approach.
2.2.1 Example-Based techniques
Example-based (also called memory-based) techniques store the tagged words and their contexts from a training corpus and tag a new word by finding the most similar stored examples, for instance with a nearest-neighbour search over the context features.
2.2.2 Stochastic based POS tagging
The stochastic approach finds the most frequent tag for a specific word in the annotated training data and uses this information to tag that word in the unannotated text. A stochastic approach requires a sufficiently large corpus and calculates frequencies, probabilities, or other statistics for every word in the corpus. The problem with this approach is that it can produce sequences of tags for sentences that are not acceptable according to the grammar rules of a language.
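A minimal sketch of this frequency-based idea follows, assuming a toy tagged corpus invented for the example (real stochastic taggers work from much larger corpora and add smoothing for unseen words):

    from collections import Counter, defaultdict

    # Count word/tag pairs in a (toy) tagged corpus, then tag each word
    # in new text with its most frequent training tag.
    train = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
             ("the", "DET"), ("old", "ADJ"), ("barks", "VERB"),
             ("barks", "NOUN")]          # "barks" is ambiguous in training

    counts = defaultdict(Counter)
    for word, tag in train:
        counts[word][tag] += 1

    def most_frequent_tag(word, default="NOUN"):
        # Unseen words fall back to a default tag.
        if word in counts:
            return counts[word].most_common(1)[0][0]
        return default

    print([most_frequent_tag(w) for w in ["the", "dog", "barks"]])
    # -> ['DET', 'NOUN', 'VERB']   ("barks": seen twice as VERB, once as NOUN)

Because each word is tagged in isolation, nothing prevents a grammatically impossible tag sequence, which is exactly the weakness noted above.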
The use of probabilities in tags is quite old; probabilities in tagging were first used in 1965, a complete probabilistic tagger with Viterbi decoding was sketched by Bahl and Mercer (1976), and various stochastic taggers were built in the 1980s (Marshall, 1983; Garside, 1987; Church, 1988; DeRose, 1988).
Supervised and unsupervised taggers are the two broad categories of the stochastic approach.
Supervised POS tagging: Supervised POS tagging models require pre-tagged corpora, which are used during training to learn information about the tagset, word-tag frequencies, rule sets, etc. The performance of these models generally increases with the size of this corpus. The following are two familiar examples of supervised POS taggers.
Hidden Markov Model (HMM) based POS tagging: An alternative to the word frequency approach is the n-gram approach, which calculates the probability of a given sequence of tags. It determines the best tag for a word by calculating the probability that it occurs with the n previous tags, where the value of n is set to 1, 2, or 3 for practical purposes; these are known as the unigram, bigram, and trigram models. The most common algorithm for tagging new text with an n-gram approach is the Viterbi algorithm for HMMs. The Viterbi algorithm is a search algorithm that avoids the exponential expansion of a breadth-first search by trimming the search tree at each level, keeping only the best m maximum likelihood estimates (MLEs), where m represents the number of tags of the following word. For a given sentence or word sequence, HMM taggers choose the tag sequence that maximizes the product in formula (1):
P(word | tag) × P(tag | previous n tags)    (1)
A bigram HMM tagger of this kind chooses the tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i, as in formula (2):
t_i = argmax_j P(t_j | t_{i-1}, w_i)    (2)
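A compact sketch of bigram Viterbi decoding under formula (1) follows; the probability tables are invented toy values, and smoothing, unknown-word handling, and log-space arithmetic are omitted for brevity:

    # A minimal bigram-HMM Viterbi decoder for formula (1): it picks the
    # tag sequence maximizing P(word|tag) * P(tag|previous tag). The toy
    # probability tables are invented for illustration.
    TAGS = ["DET", "NOUN", "VERB"]

    # P(tag | previous tag); "<s>" marks the sentence start.
    TRANS = {
        "<s>":  {"DET": 0.8,  "NOUN": 0.15, "VERB": 0.05},
        "DET":  {"DET": 0.01, "NOUN": 0.9,  "VERB": 0.09},
        "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
        "VERB": {"DET": 0.5,  "NOUN": 0.3,  "VERB": 0.2},
    }

    # P(word | tag); unseen words get probability 0 here.
    EMIT = {
        "DET":  {"the": 0.7},
        "NOUN": {"dog": 0.1, "barks": 0.02},
        "VERB": {"barks": 0.1},
    }

    def viterbi(words):
        # best[t] = (probability of best path ending in tag t, that path)
        best = {t: (TRANS["<s>"][t] * EMIT[t].get(words[0], 0.0), [t])
                for t in TAGS}
        for w in words[1:]:
            new = {}
            for t in TAGS:
                # Extend the best-scoring previous path for each tag t.
                prob, path = max(
                    (best[p][0] * TRANS[p][t] * EMIT[t].get(w, 0.0),
                     best[p][1] + [t])
                    for p in TAGS)
                new[t] = (prob, path)
            best = new
        return max(best.values())[1]

    print(viterbi(["the", "dog", "barks"]))
    # -> ['DET', 'NOUN', 'VERB']

At each position the decoder keeps, for every tag, only the best-scoring path ending in that tag; this is the trimming of the search tree described above.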
Support Vector Machines: The SVM is a machine learning algorithm for binary classification which has been successfully applied to a number of practical problems, including NLP. Let {(x_1, y_1), ..., (x_N, y_N)} be a set of N training examples, where each instance x_i is a feature vector in R^n and y_i ∈ {−1, +1} is its class label. In its basic form, an SVM learns a linear hyperplane that separates the set of positive examples from the set of negative examples with maximal margin (the margin is defined as the distance from the hyperplane to the nearest positive or negative example). This learning bias has proved to yield good generalization bounds for the induced classifiers.
The SVMTool is intended to comply with all the requirements of modern NLP technology by combining simplicity, flexibility, robustness, portability, and efficiency with state-of-the-art accuracy. It achieves this by working in the Support Vector Machines (SVM) learning framework and by offering NLP researchers a highly customizable sequential tagger generator.
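As a sketch of SVM-based tagging (not SVMTool itself), the following assumes scikit-learn is available: each word is reduced to a small feature dictionary, and a linear SVM is trained one-vs-rest over the tags. The feature template and toy training data are invented for the example.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def features(words, i):
        # Word identity plus immediate left/right context and a suffix.
        return {
            "word": words[i],
            "prev": words[i - 1] if i > 0 else "<s>",
            "next": words[i + 1] if i < len(words) - 1 else "</s>",
            "suffix2": words[i][-2:],
        }

    train_sents = [
        (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
        (["the", "old", "man"],   ["DET", "ADJ", "NOUN"]),
    ]

    X = [features(ws, i) for ws, ts in train_sents for i in range(len(ws))]
    y = [t for ws, ts in train_sents for t in ts]

    # LinearSVC reduces the multi-class tag set to binary one-vs-rest
    # problems internally, matching the binary formulation above.
    model = make_pipeline(DictVectorizer(), LinearSVC())
    model.fit(X, y)

    test = ["the", "old", "dog"]
    print(list(model.predict([features(test, i) for i in range(len(test))])))

A real sequential tagger would also feed previously predicted tags back in as features, rather than tagging each position independently.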
Unsupervised POS tagging: Unlike the supervised models, unsupervised POS tagging models do not require a pre-tagged corpus. Instead, they use computational methods like the Baum-Welch algorithm to automatically induce tagsets, transformation rules, etc. Based on this information, they either calculate the probabilistic information needed by stochastic taggers or induce the contextual rules needed by rule-based or transformation-based systems.
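A toy-scale sketch of Baum-Welch (EM) re-estimation follows, with the hidden states playing the role of automatically induced tags; the dimensions and observation sequence are invented, and real unsupervised taggers add smoothing, multiple training sequences, and a dictionary of allowed word/tag pairs:

    import numpy as np

    # Re-estimate HMM parameters from an untagged word-id sequence.
    rng = np.random.default_rng(0)
    n_states, n_symbols = 2, 3
    obs = np.array([0, 1, 2, 1, 0, 1, 2, 2, 1, 0])   # toy word ids
    T = len(obs)

    # Random row-stochastic initial parameters.
    pi = rng.dirichlet(np.ones(n_states))             # start probs
    A  = rng.dirichlet(np.ones(n_states), n_states)   # transitions
    B  = rng.dirichlet(np.ones(n_symbols), n_states)  # emissions

    for _ in range(20):                               # EM iterations
        # E-step: scaled forward/backward passes.
        alpha = np.zeros((T, n_states))
        scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
        beta = np.zeros((T, n_states))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)     # state posteriors
        xi = np.zeros((n_states, n_states))           # summed over time
        for t in range(T - 1):
            x = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
            xi += x / x.sum()

        # M-step: normalized expected counts.
        pi = gamma[0]
        A = xi / xi.sum(axis=1, keepdims=True)
        B = np.zeros_like(B)
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= B.sum(axis=1, keepdims=True)

    print("induced 'tags':", gamma.argmax(axis=1))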
2.2.3 Transformation-based POS tagging
In general, the supervised tagging approach requires large pre-annotated corpora for training, which are difficult to obtain in most cases. Recently, however, a good amount of work has been done to induce transformation rules automatically. One approach to automatic rule induction is to run an untagged text through a tagging model and obtain the initial output. A human then goes through the output of this first phase and corrects any erroneously tagged words by hand. This corrected text is then submitted to the tagger, which learns correction rules by comparing the two sets of data. Several iterations of this process are sometimes necessary before the tagging model achieves acceptable performance. The transformation-based approach is similar to the rule-based approach in the sense that it depends on a set of rules for tagging.
Transformation-Based Tagging, sometimes called Brill tagging, is an
instance of the Transformation-Based Learning (TBL) approach to
machine learning (Brill, 1995) and draws inspiration from both the rule-
based and stochastic taggers. Like the rule-based taggers, TBL is based
on rules that specify what tags should be assigned to a particular word.
But like the stochastic taggers, TBL is a machine learning technique, in
which rules are automatically induced from the data.
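A miniature of this induction loop follows, assuming gold tags are available for scoring and using a single invented rule template ("change tag A to B when the previous tag is Z"); Brill's actual tagger uses many templates and repeats the loop until no rule yields an improvement:

    # Greedy induction of one transformation rule: start from a baseline
    # tagging, score every instantiation of the template against the
    # gold tags, and keep the rule that fixes the most errors.
    gold    = ["TO", "VERB", "TO", "VERB", "DET", "NOUN"]
    current = ["TO", "NOUN", "TO", "NOUN", "DET", "NOUN"]  # baseline output

    def apply_rule(tags, frm, to, prev):
        # Contexts are read from the pre-transformation tags.
        return [to if t == frm and i > 0 and tags[i - 1] == prev else t
                for i, t in enumerate(tags)]

    def errors(tags):
        return sum(t != g for t, g in zip(tags, gold))

    best = None
    for i in range(1, len(gold)):
        frm, to, prev = current[i], gold[i], current[i - 1]
        if frm != to:   # propose a rule that would fix this error
            gain = errors(current) - errors(apply_rule(current, frm, to, prev))
            if best is None or gain > best[0]:
                best = (gain, frm, to, prev)

    gain, frm, to, prev = best
    print(f"learned: change {frm} to {to} after {prev}  (fixes {gain} errors)")
    print(apply_rule(current, frm, to, prev))
    # -> learned: change NOUN to VERB after TO  (fixes 2 errors)
    # -> ['TO', 'VERB', 'TO', 'VERB', 'DET', 'NOUN']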
