2. POS TAGGING APPROACHES
POS taggers are broadly classified into three categories: rule-based, empirical, and hybrid. In the rule-based approach, hand-written rules are used to resolve tag ambiguity. Empirical POS taggers are further classified into example-based and stochastic taggers. Stochastic taggers are either HMM-based, choosing the tag sequence that maximizes the product of word likelihood and tag-sequence probability, or cue-based, using decision trees or maximum-entropy models to combine probabilistic features. Stochastic taggers are in turn divided into supervised and unsupervised taggers, each of which is categorized into different groups based on the particular algorithm used. Fig. 2 shows the classification of POS tagging approaches.

2.1 Rule Based POS Tagging
Rule-based POS tagging models apply a set of hand-written rules and use contextual information to assign POS tags to words. These rules are often known as context frame rules. For example, a context frame rule might say: "If an ambiguous/unknown word X is preceded by a determiner and followed by a noun, tag it as an adjective." One of the first and most widely used English POS taggers employing rule-based algorithms is Brill's tagger. The earliest algorithms for automatically assigning parts of speech were based on a two-stage architecture: the first stage used a dictionary to assign each word a list of potential parts of speech, and the second stage used large lists of hand-written disambiguation rules to narrow this list down to a single part of speech for each word. The ENGTWOL tagger is based on the same two-stage architecture, although both its lexicon and its disambiguation rules are much more sophisticated than those of the early algorithms.
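As an illustration only, the following is a minimal Python sketch of how a context frame rule of this kind could be applied after a dictionary stage has proposed candidate tags for each word. It is not taken from the paper; the tag names, the function name, and the single hard-coded rule are assumptions made for the example.

def apply_context_frame_rules(words, candidate_tags):
    """Resolve ambiguous words with a simple context frame rule.
    words: tokens of the sentence.
    candidate_tags: one list of candidate tags per token, as produced
    by a dictionary-lookup first stage."""
    # Unambiguous words keep their single dictionary tag.
    tags = [c[0] if len(c) == 1 else None for c in candidate_tags]
    for i, cands in enumerate(candidate_tags):
        if tags[i] is not None:
            continue
        prev_tag = tags[i - 1] if i > 0 else None
        next_cands = candidate_tags[i + 1] if i + 1 < len(words) else []
        # The rule quoted in the text: an ambiguous word preceded by a
        # determiner and followed by a noun is tagged as an adjective.
        if prev_tag == "DET" and "NOUN" in next_cands and "ADJ" in cands:
            tags[i] = "ADJ"
    return tags

# "round" is ambiguous between adjective, noun, and verb.
words = ["the", "round", "table"]
cands = [["DET"], ["ADJ", "NOUN", "VERB"], ["NOUN"]]
print(apply_context_frame_rules(words, cands))  # ['DET', 'ADJ', 'NOUN']

A full rule-based tagger would apply a large ordered list of such rules rather than a single one, but the two-stage structure (dictionary lookup, then contextual disambiguation) is the same.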
2.2 Empirical Based POS Tagging
The relative failure of rule-based approaches, the increasing availability of machine-readable text, and the growing capability of hardware (CPU, memory, disk space) at ever-lower cost are some of the reasons researchers have come to prefer corpus-based POS tagging. The empirical approach to POS tagging is further divided into two categories: the example-based approach and the stochastic approach. The literature shows that the majority of POS taggers developed so far belong to the empirical approach.

2.2.1 Example-Based Techniques
Example-based (also called memory-based) techniques store the contexts observed in an annotated corpus and tag a new word by finding the most similar stored examples and copying their tag.

2.2.2 Stochastic Based POS Tagging
The stochastic approach finds the most frequently used tag for a specific word in the annotated training data and uses this information to tag that word in unannotated text. A stochastic approach requires a sufficiently large corpus, over which it calculates the frequency, probability, or other statistics of every word. The problem with this approach is that it can produce tag sequences for a sentence that are not acceptable according to the grammar rules of the language. The use of probabilities in tagging is quite old: probabilities were first used in tagging in 1965, a complete probabilistic tagger with Viterbi decoding was sketched by Bahl and Mercer (1976), and various stochastic taggers were built in the 1980s (Marshall, 1983; Garside, 1987; Church, 1988; DeRose, 1988). Supervised and unsupervised taggers are the two broad categories of the stochastic approach.

Supervised POS tagging: Supervised POS tagging models require pre-tagged corpora, which are used during training to learn information about the tagset, word-tag frequencies, rule sets, etc. The performance of these models generally increases with the size of the training corpus. The following are two familiar examples of supervised POS taggers.

Hidden Markov Model (HMM) based POS tagging: An alternative to the word-frequency approach is the n-gram approach, which calculates the probability of a given sequence of tags. It determines the best tag for a word from the probability that the word occurs with the n previous tags, where n is set to 1, 2, or 3 for practical purposes; these are known as the unigram, bigram, and trigram models. The most common algorithm for tagging new text with an n-gram model is the HMM's Viterbi algorithm. The Viterbi algorithm is a search algorithm that avoids the polynomial expansion of a breadth-first search by trimming the search tree at each level, keeping only the best m maximum-likelihood estimates (MLE), where m is the number of tags of the following word. For a given sentence or word sequence, HMM taggers choose the tag sequence that maximizes

P(word | tag) × P(tag | previous n tags)    (1)

A bigram HMM tagger of this kind chooses the tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i:

t_i = argmax_{t_j} P(t_j | t_{i-1}, w_i)    (2)
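To illustrate formulas (1) and (2), here is a minimal Python sketch of a bigram HMM tagger with Viterbi decoding. It is not from the paper; the toy corpus, the smoothing constant, and the function names are assumptions for the example. The transition and emission probabilities are maximum-likelihood estimates, i.e. the relative frequencies counted from a tagged corpus, which is exactly the frequency-counting step described for stochastic taggers above.

from collections import defaultdict

def train_bigram_hmm(tagged_sentences, smoothing=1e-6):
    """Estimate P(tag | prev_tag) and P(word | tag) by relative frequency (MLE)."""
    trans = defaultdict(lambda: defaultdict(float))  # prev_tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(float))   # tag -> word -> count
    for sentence in tagged_sentences:
        prev = "<s>"                                 # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    # Normalize counts into conditional probabilities.
    for table in (trans, emit):
        for counts in table.values():
            total = sum(counts.values())
            for k in counts:
                counts[k] /= total
    return trans, emit, list(emit.keys()), smoothing

def viterbi(words, trans, emit, tags, smoothing):
    """Choose the tag sequence maximizing the product in formula (1)."""
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (trans["<s>"].get(t, smoothing) * emit[t].get(words[0], smoothing), [t])
            for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            p_word = emit[t].get(word, smoothing)
            # Keep only the best previous path for each candidate tag t:
            # this is the per-level trimming described in the text.
            prev, (p_prev, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] * trans[kv[0]].get(t, smoothing))
            new_best[t] = (p_prev * trans[prev].get(t, smoothing) * p_word, path + [t])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

# Toy corpus (illustrative): train, then tag an unseen sentence.
corpus = [[("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
          [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]]
trans, emit, tags, sm = train_bigram_hmm(corpus)
print(viterbi(["the", "cat", "barks"], trans, emit, tags, sm))  # ['DET', 'NOUN', 'VERB']

A production tagger would work with log probabilities to avoid numerical underflow on long sentences; plain products are used here only to mirror formula (1) directly.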
Support Vector Machines: The SVM is a machine learning algorithm for binary classification that has been successfully applied to a number of practical problems, including NLP. Let {(x_1, y_1), ..., (x_N, y_N)} be a set of N training examples, where each instance x_i is a vector in R^N and y_i ∈ {−1, +1} is the class label. In their basic form, SVMs learn a linear hyperplane that separates the set of positive examples from the set of negative examples with maximal margin (the margin is defined as the distance from the hyperplane to the nearest positive or negative example). This learning bias has proved to yield good generalization bounds for the induced classifiers. The SVMTool is intended to comply with all the requirements of modern NLP technology by combining simplicity, flexibility, robustness, portability, and efficiency with state-of-the-art accuracy. This is achieved by working in the support vector machine (SVM) learning framework and by offering NLP researchers a highly customizable sequential tagger generator.

Unsupervised POS tagging: Unlike the supervised models, unsupervised POS tagging models do not require a pre-tagged corpus. Instead, they use advanced computational methods such as the Baum-Welch algorithm to automatically induce tagsets, transformation rules, etc. Based on this information, they either calculate the probabilistic information needed by stochastic taggers or induce the contextual rules needed by rule-based or transformation-based systems.

2.2.3 Transformation-Based POS Tagging
In general, the supervised tagging approach requires a large pre-annotated corpus for training, which is difficult to obtain in most cases. Recently, however, a good amount of work has been done on automatically inducing transformation rules. One approach to automatic rule induction is to run an untagged text through a tagging model and take its initial output. A human then goes through the output of this first phase and corrects any erroneously tagged words by hand. This tagged text is then submitted to the tagger, which learns correction rules by comparing the two sets of data. Several iterations of this process are sometimes necessary before the tagging model achieves considerable performance. The transformation-based approach is similar to the rule-based approach in the sense that it depends on a set of rules for tagging. Transformation-based tagging, sometimes called Brill tagging, is an instance of the transformation-based learning (TBL) approach to machine learning (Brill, 1995) and draws inspiration from both the rule-based and the stochastic taggers. Like the rule-based taggers, TBL is based on rules that specify what tags should be assigned to particular words; but like the stochastic taggers, TBL is a machine learning technique in which rules are automatically induced from the data.
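To illustrate the induce-and-correct loop, the following is a minimal Python sketch of a single transformation-based learning iteration, not the actual Brill tagger; the single rule template ("change tag A to B when the previous tag is C"), the scoring scheme, and the toy data are assumptions made for the example.

from collections import Counter

def tbl_iteration(current_tags, gold_tags, tagset):
    """One TBL step: score every rule of the template
    'change tag A to B when the previous tag is C' against the
    hand-corrected gold standard, and return the best-scoring rule."""
    scores = Counter()
    for cur, gold in zip(current_tags, gold_tags):
        for i in range(1, len(cur)):
            if cur[i] != gold[i]:
                # Applying (cur[i] -> gold[i] / prev = cur[i-1]) fixes this error.
                scores[(cur[i], gold[i], cur[i - 1])] += 1
    # Penalize each candidate rule for the correct tags it would break.
    for cur, gold in zip(current_tags, gold_tags):
        for i in range(1, len(cur)):
            if cur[i] == gold[i]:
                for b in tagset:
                    if (cur[i], b, cur[i - 1]) in scores:
                        scores[(cur[i], b, cur[i - 1])] -= 1
    return scores.most_common(1)[0][0] if scores else None

def apply_rule(rule, tag_sequences):
    """Apply one learned transformation to every sentence."""
    a, b, c = rule
    out = []
    for sent in tag_sequences:
        new = list(sent)
        for i in range(1, len(new)):
            if new[i] == a and new[i - 1] == c:
                new[i] = b
        out.append(new)
    return out

# Toy example: the first tagging phase mislabels "race" in "to race" as a noun.
initial = [["PRON", "VERB", "TO", "NOUN"]]  # output of the initial tagger
gold    = [["PRON", "VERB", "TO", "VERB"]]  # hand-corrected version
rule = tbl_iteration(initial, gold, {"PRON", "VERB", "TO", "NOUN"})
print(rule)                                 # ('NOUN', 'VERB', 'TO')
print(apply_rule(rule, initial))            # [['PRON', 'VERB', 'TO', 'VERB']]

The real Brill tagger searches a family of such templates, appends the best rule to an ordered list, re-tags the corpus, and repeats until no rule improves accuracy, which is the iterative tagging-and-correcting cycle described above.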