
Experiment No. 2
Aim- To implement a morphological parser that accepts or rejects a given string.

Objective- To understand and implement a morphological parser that accepts or rejects a given string.

Outcome- Students will be able to understand morphological parsing and use it to accept or reject a given string.

Theory-

The most sophisticated methods for lemmatization involve complete morphological parsing of the
word. Morphology is the study of the way words are built up from smaller meaning-bearing units
called morphemes. Two broad classes of morphemes can be distinguished: stems—the central
morpheme of the word, supplying the main meaning—and affixes—adding “additional” meanings
of various kinds. So, for example, the word fox consists of one morpheme (the morpheme fox) and
the word cats consists of two: the morpheme cat and the morpheme -s. A morphological parser
takes a word like cats and parses it into the two morphemes cat and s, or parses a Spanish word
like amaren (‘if in the future they would love’) into the morpheme amar ‘to love’, and the
morphological features 3PL and future subjunctive.
The goal of morphological parsing is to find out what morphemes a given word is built from. For
example, a morphological parser should be able to tell us that the word cats is the plural form of
the noun stem cat, and that the word mice is the plural form of the noun stem mouse. So, given the
string cats as input, a morphological parser should produce an output that looks similar to cat N
PL. Here are some more examples:

mouse mouse N SG
mice mouse N PL
foxes fox N PL
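
To make this expected input/output behaviour concrete, the following is a minimal Python sketch (the function name analyze and the hard-coded table are illustrative assumptions, not an actual parser) that simply looks up the example analyses listed above:

# Minimal sketch of the desired behaviour: map a surface form to
# (stem, category, number).  The table just hard-codes the examples
# from the text; a real parser computes this with transducers.
EXAMPLE_ANALYSES = {
    "cats":  ("cat",   "N", "PL"),
    "mouse": ("mouse", "N", "SG"),
    "mice":  ("mouse", "N", "PL"),
    "foxes": ("fox",   "N", "PL"),
}

def analyze(word):
    """Return (stem, category, number) for a known word, or None to reject it."""
    return EXAMPLE_ANALYSES.get(word)

print(analyze("cats"))   # ('cat', 'N', 'PL')
print(analyze("xyz"))    # None -> rejected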

Morphological parsing yields information that is useful in many NLP applications. In
parsing, for example, it helps to know the agreement features of words. Similarly, grammar
checkers need agreement information to detect agreement errors. Morphological
information also helps spell checkers to decide whether something is a possible word or
not, and in information retrieval it is used to search not only for cats, if that is the
user's input, but also for cat.

To get from the surface form of a word to its morphological analysis, we are going to
proceed in two steps. First, we are going to split the words up into its possible
components. So, we will make cat + s out of cats, using + to indicate morpheme
boundaries. In this step, we will also take spelling rules into account, so that there are
two possible ways of splitting up foxes, namely foxe + s and fox + s. The first one
assumes that foxe is a stem and s the suffix, while the second one assumes that the stem
is fox and that the e has been introduced by a spelling rule that inserts an e when the plural s attaches to a stem ending in s, x, or z.

In the second step, we will use a lexicon of stems and affixes to look up the categories
of the stems and the meaning of the affixes. So, cat + s will get mapped to cat N PL,
and fox + s to fox N PL. We will also find out now that foxe is not a legal stem. This
tells us that splitting foxes into foxe + s was actually an incorrect way of splitting foxes,
which should be discarded. But note that for the word houses splitting it into house +
s is correct.

Here is a picture illustrating the two steps of our morphological parser with some
examples.

We will now build two transducers: one to do the mapping from the surface form to
the intermediate form and the other one to do the mapping from the intermediate form
to the underlying form.

1 From the Surface to the Intermediate Form

To do morphological parsing this transducer has to map from the surface form to the
intermediate form. For now, we just want to cover the cases of English singular and
plural nouns that we have seen above. This means that the transducer may or may not
insert a morpheme boundary if the word ends in s. There may be singular words that
end in s (e.g. kiss). That's why we don't want to make the insertion of a morpheme
boundary obligatory. If the word ends in ses, xes or zes, it may furthermore delete
the e when introducing a morpheme boundary. Here is a transducer that does this. The
"other" arc in this transducer stands for a transition that maps all symbols except for s,
z, and x to themselves.
Let's see how this transducer deals with some of our examples. The following graphs
show the possible sequences of states that the transducer can go through given the
surface forms cats and foxes as input.
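
The transducer itself is normally drawn as a state diagram; the following Python sketch (an assumed illustration using plain string operations, not the original transducer) mimics its behaviour by returning every candidate intermediate form for a surface word:

def split_surface(word):
    """Surface form -> set of candidate intermediate forms.

    Mimics the surface-to-intermediate transducer: inserting a morpheme
    boundary before a final 's' is optional, and for words ending in
    'ses', 'xes' or 'zes' the 'e' may additionally be deleted.
    """
    candidates = {word}                       # no boundary inserted at all
    if word.endswith("s") and len(word) > 1:
        candidates.add(word[:-1] + "+s")      # cats -> cat+s, foxes -> foxe+s
        if word.endswith(("ses", "xes", "zes")):
            candidates.add(word[:-2] + "+s")  # foxes -> fox+s (e deleted)
    return candidates

print(split_surface("cats"))    # {'cats', 'cat+s'}
print(split_surface("foxes"))   # {'foxes', 'foxe+s', 'fox+s'}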

2 From the Intermediate Form to the Morphological Structure

Now, we want to take the intermediate form that we produced in the previous section
and map it to the underlying form. The input that this transducer has to accept is of one
of the following forms:

1. regular noun stem, e.g. cat


2. regular noun stem + s, e.g. cat + s
3. singular irregular noun stem, e.g. mouse
4. plural irregular noun stem, e.g. mice

In the first case, the transducer has to map all symbols of the stem to themselves and
then output N and SG. In the second case, it maps all symbols of the stem to themselves,
but then outputs N and replaces the plural morpheme s with PL. In the third case, it does the same as in the
first case. Finally, in the fourth case, the transducer should map the irregular plural noun
stem to the corresponding singular stem (e.g. mice to mouse) and then it should
add N and PL. So, the general structure of this transducer looks like this:
What still needs to be specified is what exactly the parts between state 1 and states 2, 3,
and 4 look like. Here, we need to recognize noun stems and decide whether
they are regular or not. We do this by encoding a lexicon in the following way. The
transducer part that recognizes cat, for instance, looks like this:

And the transducer part mapping mice to mouse can be specified as follows:

Plugging these (partial) transducers into the transducer given above, we get a transducer
that checks that the input has the right form and adds category and number information.
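
The lexicon-encoding part of this transducer can likewise be sketched in Python (the lexicon contents and the function name parse_intermediate are illustrative assumptions); it takes an intermediate form and either returns the underlying form or rejects it:

# Illustrative lexicon; a real system would encode this as transducer arcs.
REGULAR_STEMS = {"cat", "fox", "dog", "house"}
IRREGULAR_SG  = {"mouse"}
IRREGULAR_PL  = {"mice": "mouse"}            # plural form -> singular stem

def parse_intermediate(form):
    """Intermediate form (e.g. 'fox+s') -> underlying form (e.g. 'fox N PL'),
    or None if the form is not licensed by the lexicon."""
    if form in REGULAR_STEMS or form in IRREGULAR_SG:
        return form + " N SG"                # cases 1 and 3: singular stems
    if form in IRREGULAR_PL:
        return IRREGULAR_PL[form] + " N PL"  # case 4: mice -> mouse N PL
    if form.endswith("+s") and form[:-2] in REGULAR_STEMS:
        return form[:-2] + " N PL"           # case 2: regular stem + s
    return None                              # e.g. 'foxe+s' is rejected

print(parse_intermediate("cat+s"))   # cat N PL
print(parse_intermediate("foxe+s"))  # None
print(parse_intermediate("mice"))    # mouse N PL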

3 Combining the two Transducers

If we now let the two transducers for mapping from the surface to the intermediate form
and for mapping from the intermediate to the underlying form run in a cascade (i.e. we
let the second transducer run on the output of the first one), we can do a morphological
parse of (some) English noun phrases. However, we can also use this transducer for
generating a surface form from an underlying form. Remember that we can change the
direction of translation when using a transducer in translation mode.
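
Assuming the split_surface and parse_intermediate sketches given earlier are in scope, cascading them in Python simply means running the second function over every output of the first and keeping the analyses that survive:

def morphological_parse(word):
    """Cascade the two sketched transducers: surface -> intermediate -> underlying.
    Returns the set of accepted analyses; an empty set means the word is rejected."""
    analyses = set()
    for intermediate in split_surface(word):
        underlying = parse_intermediate(intermediate)
        if underlying is not None:
            analyses.add(underlying)
    return analyses

print(morphological_parse("foxes"))   # {'fox N PL'}
print(morphological_parse("foxe"))    # set() -> rejected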

Now, consider the input berries. What will our cascaded transducers make of it?
The first one will return two possible splittings, berries and berrie + s, but the one that
we would want, berry + s, is not one of them. The reason for this is that there is another
spelling rule at work here, which we haven't taken into account at all. This rule says
that "y changes to ie before s". So, in the first step there may be more than one spelling
rule that has to be applied.
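
One quick way to cover this case in the splitting sketch given earlier is to add the "y changes to ie before s" rule as a further candidate split (again an illustrative assumption rather than a real transducer); the more principled, transducer-based solutions are discussed next:

def split_surface_v2(word):
    """Like split_surface, but also applies the 'y -> ie before s' rule,
    so that berries yields the candidate berry+s as well."""
    candidates = {word}
    if word.endswith("s") and len(word) > 1:
        candidates.add(word[:-1] + "+s")       # berries -> berrie+s
        if word.endswith(("ses", "xes", "zes")):
            candidates.add(word[:-2] + "+s")   # e-deletion rule
        if word.endswith("ies"):
            candidates.add(word[:-3] + "y+s")  # ie -> y rule: berries -> berry+s
    return candidates

print(split_surface_v2("berries"))  # {'berries', 'berrie+s', 'berry+s'}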

There are basically two ways of dealing with this. First, we can formulate the
transducers for each of the rules in such a way that they can be run in a cascade. Another
possibility is to specify the transducers in such a way that they can be applied in parallel.

There are algorithms for combining several cascaded transducers, or several transducers
that are supposed to be applied in parallel, into a single transducer. However, these
algorithms only work if the individual transducers obey certain restrictions, so we
have to take some care when specifying them.

Conclusion- Thus, students studied morphological parsing in depth, along with its implementation
in Python/R programming.
PART B
(PART B: TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical. The soft
copy must be uploaded on Blackboard or emailed to the concerned lab in-charge faculty at the end
of the practical in case there is no Blackboard access available.)

Roll No.: BE-C49    Name: Wakhare Amar Sanjay
Class: BE-Comps    Batch: C3
Date of Experiment: 24/07/2023    Date of Submission: 24/07/2023
Grade:

B.1 Software Code written by student:


(Paste your software code completed during the 2 hours of practical in the lab here)
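
Since the original listing is not reproduced here, the following is a small, self-contained Python sketch of one possible implementation that accepts or rejects a given string; the lexicon, the spelling rules, and all function names are illustrative assumptions rather than the actual submitted code.

# Minimal morphological parser sketch: accept a word if it can be analysed
# as a known noun stem plus an optional plural suffix, otherwise reject it.
STEMS = {"cat": "cat", "fox": "fox", "dog": "dog", "berry": "berry",
         "house": "house", "mouse": "mouse", "mice": "mouse"}
IRREGULAR_PLURALS = {"mice"}

def candidate_splits(word):
    """Generate (stem, suffix) candidates using simple spelling rules."""
    yield word, ""                       # bare stem
    if word.endswith("s"):
        yield word[:-1], "s"             # cats -> cat + s
    if word.endswith(("ses", "xes", "zes")):
        yield word[:-2], "s"             # foxes -> fox + s
    if word.endswith("ies"):
        yield word[:-3] + "y", "s"       # berries -> berry + s

def parse(word):
    """Return an analysis string if the word is accepted, else None."""
    for stem, suffix in candidate_splits(word):
        if stem in IRREGULAR_PLURALS and suffix:
            continue                     # block forms like 'mices'
        if stem in STEMS:
            number = "PL" if suffix == "s" or stem in IRREGULAR_PLURALS else "SG"
            return f"{STEMS[stem]} N {number}"
    return None

if __name__ == "__main__":
    for w in ["cats", "foxes", "berries", "mice", "foxe", "catss"]:
        result = parse(w)
        print(w, "->", "ACCEPTED: " + result if result else "REJECTED")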
B.2 Input and Output:
(Not Required)

B.3 Observations and learning:


(Students are expected to comment on the output obtained with clear observations and learning for each task/ sub
part assigned)
B.4 Conclusion:
(Students must write the conclusion as per the attainment of individual outcome listed above and learning/observation
noted in section B.3)

B.5 Question of Curiosity


(To be answered by student based on the practical performed and learning/observations)
Q1: What is morphology? Why do we need to do Morphological Analysis? Discuss various
application domains of Morphological Analysis.

Ans: Morphology deals with parts of words called morphemes. Morphological analysis looks at
how morphemes can be combined or separated to make different words with different meanings.
The most common examples are plural nouns. Usually a noun's root word alone means the singular
version; for example, for the morpheme cat, the root word cat means "one cat." To talk about two
or more cats, we take the morpheme cat and add an -s to the end; this is because spelling plurals
with -s or -es is common in English. Understanding the relationship between cat, cats, and the
suffix -s is all part of morphology.

Morphological analysis can be used to reduce the size of the lexicon, and it also plays an important
role in determining the pronunciation of a homograph. It further helps in handling schwa deletion.
As noted in the theory above, typical application domains include syntactic parsing, grammar
checking, spell checking, and information retrieval.

Q2: Explain derivational and inflectional morphology with suitable example.


Ans: Inflectional morphemes never change the grammatical category (part of speech) of a word; for
example, the plural -s in cats and the past-tense -ed in walked only add grammatical information
such as number or tense. Derivational morphemes, by contrast, often change the part of speech of
a word: the verb read becomes the noun reader when we add the derivational morpheme -er. However,
some derivational morphemes do not change the grammatical category of a word (e.g. un- in unhappy,
which is still an adjective).
Q3: What is the role of FSA in Morphological analysis? Explain FST in details
Ans: A finite-state transducer (FST) is a finite-state machine with two memory tapes, following
the terminology for Turing machines: an input tape and an output tape. This contrasts with an
ordinary finite-state automaton (FSA), which has a single tape. An FST is a type of finite-state
automaton that maps between two sets of symbols, and it is more general than an FSA: an FSA defines
a formal language by defining a set of accepted strings, while an FST defines relations between
sets of strings. An FST reads a set of strings on the input tape and generates a set of relations
on the output tape, so it can be thought of as a translator or relater between strings in a set.
In morphological analysis, an FSA can only accept or reject a word, i.e. recognize whether it is a
well-formed word of the language, whereas an FST can additionally produce an analysis: given a
string of letters on its input tape, it outputs the corresponding string of morphemes.
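
As a concrete illustration of the two-tape idea, here is a toy deterministic FST in Python with an explicit transition table (states, alphabet, and transitions are all made up for illustration): it reads an input word followed by the end marker '#' on the input tape and writes, on the output tape, the same word with a morpheme boundary '+' inserted before a word-final 's'. Unlike the transducer described in the theory above, which treats boundary insertion as optional, this toy version always inserts it.

# (state, input symbol) -> (next state, output string)
def step(state, ch):
    if state == "q0":
        if ch == "s":
            return "hold_s", ""       # possibly a final 's': hold it back
        if ch == "#":
            return "accept", ""
        return "q0", ch               # copy an ordinary character
    if state == "hold_s":
        if ch == "#":
            return "accept", "+s"     # the held 's' was word-final
        if ch == "s":
            return "hold_s", "s"      # emit the previous 's', hold the new one
        return "q0", "s" + ch         # the held 's' was not final after all
    raise ValueError("no transition defined")

def transduce(word):
    state, output = "q0", ""
    for ch in word + "#":             # read the input tape with an end marker
        state, out = step(state, ch)
        output += out                 # write to the output tape
    return output if state == "accept" else None

print(transduce("cats"))    # cat+s
print(transduce("mouse"))   # mouse (no word-final 's', nothing inserted)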
