Module 1 Notes

Natural Language Processing (NLP) focuses on creating computational models to automate language processing and enhance understanding of human communication. It encompasses various levels of analysis, including lexical, syntactic, semantic, discourse, and pragmatic analysis, each requiring different types of knowledge. NLP faces challenges such as ambiguity in natural languages and the complexity of grammar, while its applications include machine translation, speech recognition, and text summarization.

NATURAL LANGUAGE PROCESSING BAI601/BAD613B

MODULE-1
CHAPTER-1
INTRODUCTION
1.1 WHAT IS NATURAL LANGUAGE PROCESSING (NLP)
Natural language processing (NLP) is concerned with the development of computational
models of aspects of human language processing. There are two main reasons for such
development:
1. To develop automated tools for language processing
2. To gain a better understanding of human communication.
Building computational models with human language-processing abilities requires a
knowledge of how humans acquire, store, and process language. It also requires a knowledge
of the world and of language.
Historically, there have been two major approaches to NLP: the rationalist approach and the empiricist approach. Early NLP research took a rationalist approach, which assumes the existence of some language faculty in the human brain. Supporters of this approach argue that it is not possible for children to learn something as complex as natural language from limited sensory inputs. Empiricists do not believe in the existence of a language faculty. Instead, they believe in the existence of some general organizational principles such as pattern recognition, generalization, and association. Learning of detailed structures can, therefore, take place through the application of these principles to the sensory inputs available to the child.

1.2 ORIGINS OF NLP


Natural language processing, sometimes mistakenly termed natural language understanding, originated from machine translation research. While natural language understanding involves only the interpretation of language, natural language processing includes both understanding (interpretation) and generation (production). NLP also includes speech processing. However, in this book, we are concerned with text processing only, covering work in the area of computational linguistics and the tasks in which NLP has found useful application.
Computational linguistics is similar to theoretical and psycho-linguistics, but uses different tools. Theoretical linguists mainly provide a structural description of natural language and its semantics. They are not concerned with the actual processing of sentences or the generation of sentences from structural descriptions. They are in quest of principles that remain common across languages, and identify rules that capture linguistic generalizations. For example, most languages have constructs like noun and verb phrases. Theoretical linguists identify rules that describe and restrict the structure of languages (grammar). Psycholinguists explain how humans produce and comprehend natural language. Unlike theoretical linguists, they are interested in the representation of linguistic structures as well as in the process by which these structures are produced. They rely primarily on empirical investigations to back up their theories. Computational linguistics is concerned with the study of language using computational models of linguistic phenomena. It deals with the application of linguistic theories and computational techniques for NLP. In computational linguistics, representing a language is a major problem; most knowledge representations tackle only a small part of knowledge. Computational models may be broadly classified as knowledge-driven and data-driven. Knowledge-driven models rely on explicitly coded linguistic knowledge, often expressed as a set of grammar rules, whereas data-driven models rely on the existence of a large amount of data and use machine learning techniques to learn from it.

1.3 LANGUAGE AND KNOWLEDGE


Language is the medium of expression through which knowledge is conveyed. It can serve as a knowledge representation tool, one that has historically represented the whole body of human knowledge and that can be extended, for example by generating new words or incorporating new ideas and situations. The language and speech community, on the other hand, considers a language to be a set of sounds that, through combinations, conveys meaning to a listener. However, we are concerned with representing and processing text only. Language (text) processing has different levels, each involving different types of knowledge. We now discuss the various levels of processing and the types of knowledge each involves.
 The simplest level of analysis is lexical analysis, which involves analysis of words.
Words are the most fundamental unit (syntactic as well as semantic) of any natural
language text. Word level processing requires morphological knowledge, i.e.,
knowledge about the structure and formation of words from basic units
(morphemes). The rules for forming words from morphemes are language specific.
 Syntactic analysis considers a sequence of words as a unit. It decomposes a sentence into words and identifies the relations between them. It captures the grammaticality or non-grammaticality of sentences by looking at constraints like word order, number, and case agreement. This level of processing requires syntactic knowledge, i.e., knowledge about how words are combined to form larger units such as phrases and sentences, and what constraints are imposed on them. Syntactic analysis arranges the words in a manner that shows the relationships among them. A sentence such as “The school goes to boy is” is rejected by an English syntactic analyser.
 Semantic analysis is associated with the meaning of the language. It is concerned with creating meaningful representations of linguistic inputs. It draws the exact, or dictionary, meaning from the text, focusing on the literal meaning of words.
Eg: Apple is the best fruit.
The Apple phone is more expensive than others. (The word 'Apple' takes a different sense in each sentence.)
 Discourse analysis is a higher level of analysis. It attempts to interpret the structure and meaning of larger units, e.g., at the paragraph and document level, in terms of words, phrases, and clusters of sentences. It deals with how the meaning of a sentence is determined by the sentences that precede it.
 Pragmatic analysis is the highest level of analysis. It deals with the purposeful use
of sentences in situations.
 It requires knowledge of the world, i.e., knowledge that extends beyond the
contents of the text.
 It focuses on inferred meaning perceived by the speaker and the listener.

1.4 THE CHALLENGES OF NLP

Natural languages are highly ambiguous and vague, which makes achieving a precise representation of meaning difficult. The inability to capture all the required knowledge is another source of difficulty: it is almost impossible to embody all the sources of knowledge that humans use to process language, and even if this were done, it would not be possible to write procedures that imitate language processing as done by humans. A further challenge lies in identifying the semantics of sentences:
• Sentence meaning can be inferred by the syntactic and semantic relation of words.
• The frequency of a word being used in a particular sense also affects its meaning.
• Idioms, metaphor, and ellipses add more complexity to identify the meaning of the
written text.
• The scope of quantifiers (the, each, etc.) is often not clear and poses problems in automatic processing.
Ambiguity of natural languages

• Word-level ambiguity: A word may be ambiguous in its part-of-speech, or it may be ambiguous in its meaning. The word 'can' is ambiguous in its part-of-speech, whereas the word 'bat' is ambiguous in its meaning.

• Eg: bank, can

• Structural ambiguity: A sentence may be ambiguous even if the words in it are not. For example, in the sentence 'Stolen rifle found by tree', none of the words is ambiguous, but the sentence is: 'by tree' can attach to 'found' (found beside a tree) or be read as the agent of the finding. This is an example of structural ambiguity.

• Eg: Stolen ice candy found by tree

• An idiom is a phrase that, when taken as a whole, has a meaning you wouldn't be able to
deduce from the meanings of the individual words.

 The ball is in your court.

 A piece of cake

 Every cloud has a silver lining

 Not one’s cup of tea


• A metaphor is a figure of speech that describes something by saying it's something else.

 Life is a highway.

 Her eyes were diamonds.

 He is a shining star.

• An ellipsis is a punctuation mark of three dots (…) that shows an omission of words, represents a pause, or suggests there's something left unsaid.

 You mean he's...

 The chancellor sweated profusely and stammered, “Your highness, … I wasn’t … I didn’t mean to imply that … I would never question your wisdom.”

1.5 LANGUAGE AND GRAMMAR


• Automatic processing of language requires the rules and exceptions of a language to be explained to the computer.

• Grammar defines language. It consists of a set of rules that allows us to parse and generate sentences in a language.

• The main hurdle in language specification is the constantly changing nature of natural languages and the presence of a large number of hard-to-specify exceptions.

• This has led to the development of a number of grammars. Main among them are transformational grammar, lexical functional grammar, government and binding, dependency grammar, Paninian grammar, and tree-adjoining grammar.

• Formal grammars form a hierarchy based on their level of complexity.
• These grammars use phrase structure rules.
• Generative grammar uses a set of rules to specify or generate all and only grammatical
(well-formed) sentences in a language.
• Transformational grammar is a system of language analysis that recognizes the relationships among the various elements of a sentence and among the possible sentences of a language, and uses rules (some of which are called transformations) to express these relationships.

• Transformational grammar assigns a “deep structure” and a “surface structure” to show the relationship of such sentences.

• Deep structure is what you wish to express, and surface structure is how you express it with the help of words and sentences.
• Surface structures are the versions of sentences that are seen or heard, while deep
structures contain the basic units of meaning of a sentence
• The mapping from deep structure to surface structure is carried out by transformations
• Deep structure can be transformed in a number of ways to yield many different surface-
level representations.


• Sentences with different surface-level representations, having the same meaning, share a common deep-level representation.

Example sentences:

• Pooja plays the veena.

• The veena is played by Pooja.

• The above sentences have the same meaning but different surface structures; they share the same deep structure.

• The deep subject is Pooja.

• The deep object is the veena.

Transformational grammar has three components:

• Phrase structure grammar
• Transformational rules
• Morphophonemic rules (these rules match each sentence representation to a string of phonemes)

• Each of these components consists of a set of rules.
• Phrase structure grammar consists of rules that generate natural language sentences and assign a structural description to them.


•Eg (a typical set of such rules):

S → NP + VP
NP → Det + Noun
VP → Aux + V + NP

• In these rules, S stands for sentence, NP for noun phrase, VP for verb phrase, and Det for determiner. Sentences that can be generated using these rules are termed grammatical. The structure assigned by the grammar is a constituent structure analysis of the sentence. The second component of transformational grammar is a set of transformation rules, which transform one phrase-marker (underlying) into another phrase-marker (derived). These rules are applied to the terminal string generated by the phrase structure rules. Unlike phrase structure rules, transformational rules are heterogeneous and may have more than one symbol on their left-hand side. These rules are used to transform one surface representation into another, e.g., an active sentence into a passive one. The rule relating active and passive sentences (as given by Chomsky) is:

NP1 – Aux – V – NP2 → NP2 – Aux + be + en – V – by + NP1

This rule says that an underlying input having the structure NP1 – Aux – V – NP2 can be transformed into NP2 – Aux + be + en – V – by + NP1. This transformation involves the addition of the strings 'be' and 'en' and certain rearrangements of the constituents of a sentence. Transformational rules can be obligatory or optional. An obligatory transformation is one that ensures agreement in number of subject and verb, etc., whereas an optional transformation is one that modifies the structure of a sentence while preserving its meaning. Morphophonemic rules match each sentence representation to a string of phonemes.
Consider the active sentence:
The police will catch the snatcher. (1.5)
The application of the phrase structure rules will assign the structure shown in Figure 1.2 to this sentence.


The passive transformation rule will convert the sentence into: The + snatcher + will + be + en + catch + by + the + police (Figure 1.3).

Another transformational rule will then reorder 'en + catch' to 'catch + en', and subsequently one of the morphophonemic rules will convert 'catch + en' to 'caught'. In general, the noun phrase is not always as simple as in sentence (1.5).
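
To make the mechanics concrete, here is a minimal Python sketch of the passive transformation and the subsequent reordering and morphophonemic steps, applied to sentence (1.5). The flat token layout and the one-entry irregular-verb table are simplifying assumptions made for illustration, not part of the grammar itself:

# Toy rendering of the passive transformation:
# NP1 - Aux - V - NP2  ->  NP2 - Aux + be + en - V - by + NP1
IRREGULAR_PARTICIPLES = {"catch": "caught"}  # morphophonemic lookup (assumed)

def passivize(np1, aux, verb, np2):
    """Apply the passive transformation to the structure NP1-Aux-V-NP2."""
    return [np2, aux, "be", "en", verb, "by", np1]

def reorder_en(tokens):
    """Transformational rule: reorder 'en + V' to 'V + en'."""
    i = tokens.index("en")
    tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens

def morphophonemic(tokens):
    """Collapse 'V + en' into the past-participle form of the verb."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i + 1] == "en":
            out.append(IRREGULAR_PARTICIPLES.get(tokens[i], tokens[i] + "ed"))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = passivize("the police", "will", "catch", "the snatcher")
print(" ".join(morphophonemic(reorder_en(tokens))))
# -> the snatcher will be caught by the police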

1.6 PROCESSING INDIAN LANGUAGES


Differences between Indian languages and English

• Unlike English, Indic scripts have a non-linear structure.


• Unlike English, Indian languages have SOV (Subject-Object-Verb) as the default sentence
structure.
• Indian languages have a free word order, i.e., words can be moved freely within a
sentence without changing the meaning of the sentence.
• Spelling standardization is more subtle in Hindi than in English
• Indian languages have a relatively rich set of morphological variants.
• Indian languages make extensive and productive use of complex predicates (CPs).
• Indian languages use post-position (Karakas) case markers instead of prepositions.
• Indian languages use verb complexes consisting of sequences of verbs

Eg: gaa rahaa hai – (is) singing; khel rahii hai – (is) playing

Except for the direction in which its script is written, Urdu is closely related to Hindi. Both share a similar phonology, morphology, and syntax. Both are free-word-order languages and use post-positions. They also share a large amount of their vocabulary. Differences in vocabulary arise mainly because a significant portion of Urdu vocabulary comes from Persian and Arabic, while Hindi borrows much of its vocabulary from Sanskrit. Paninian grammar provides a framework for Indian language models, which can be used for the computational analysis of Indian languages. The grammar focuses on the extraction of karaka relations from a sentence.

1.7 NLP APPLICATIONS

Machine Translation
• Automatic translation of text from one human language to another.
• In order to carry out this translation, it is necessary to have an understanding of words and phrases, the grammars of the two languages involved, the semantics of the languages, and world knowledge.
Speech Recognition
• It is the process of mapping acoustic speech signals to a set of words.
• Issues arise due to wide variations in the pronunciation of words, homonyms (e.g., dear and deer), and acoustic ambiguities (e.g., 'in the rest' vs. 'interest').
Speech Synthesis
• Speech synthesis refers to automatic production of speech utterance of natural
language sentences
• The systems can read out the mails on telephone, or read out a storybook.
• In order to generate utterances, text has to be processed.
Natural Language Interfaces to Databases
• Natural language interfaces allow querying a structured database using natural language sentences.
Question Answering
• Attempts to find the precise answer, or at least the precise portion of text in which
the answer appears.
• Requires precise analysis of questions and portions of texts, semantics, and background knowledge to answer certain types of questions.
Text Summarization
Creates summaries of documents, involving syntactic, semantic, and discourse-level processing.
Information Retrieval
• Identifies the documents relevant to a user’s query. NLP techniques such as indexing (stop-word elimination, stemming, phrase extraction, etc.), word sense disambiguation, query modification, and knowledge bases have also been used in IR systems to enhance performance.
• Eg: Google Search
• WordNet, LDOCE (Longman Dictionary of Contemporary English), and Roget’s Thesaurus are some useful lexical resources for IR research.
Information Extraction
• Captures and outputs factual information contained within a document.
• The information need is specified as pre-defined database schemas or templates.


• Applications include extracting factual summaries from vast collections of text like Wikipedia, supporting conversational AI systems like chatbots, and extracting stock market announcements from financial news.

*****CHAPTER ENDS*****


CHAPTER-2
LANGUAGE MODELLING
• A language model is a description of language.

• Language modelling can be viewed either as a problem of grammar inference or a


problem of probability estimation.

 A grammar-based language model attempts to distinguish a grammatical sentence


from a non-grammatical one.

 A probabilistic language model attempts to identify a sentence based on a probability


measure, usually a maximum likelihood estimate.

• Two approaches:

 Grammar-based language model

 Statistical language modelling

• Grammar-based language model

 A grammar-based approach uses the grammar of a language to create its model.

 It attempts to represent the syntactic structure of language.

 Grammar consists of hand-coded rules defining the structure and ordering of various
constituents appearing in a linguistic unit (phrase, sentence, etc.).

 Eg: a sentence usually consists of a noun phrase and a verb phrase.

 The grammar-based approach attempts to utilize this structure and also the
relationships between these structures.

• Statistical language modelling

 The statistical approach creates a language model by training it from a corpus.

 In order to capture regularities of a language, the training corpus needs to be


sufficiently large. It is one of the fundamental tasks in many NLP applications,
including speech recognition, spelling correction, handwriting recognition,
machine translation, information retrieval, text summarization, and question
answering.


Statistical language modelling


• A statistical language model is a probability distribution P(s) over all possible word sequences (or any other linguistic unit, such as words, sentences, paragraphs, documents, or spoken utterances).

• The dominant approach in statistical language modelling is the n-gram model.

n-gram Model

• The goal of a statistical language model is to estimate the probability (likelihood) of a


sentence.

• This is achieved by decomposing the sentence probability into a product of conditional probabilities using the chain rule:

P(s) = P(w1) P(w2/w1) P(w3/w1w2) ... P(wn/w1...wn-1) = ∏i P(wi/hi)

where hi, the history of word wi, is defined as w1, w2, ..., wi-1.

So, in order to calculate the sentence probability, we need to calculate the probability of a word given the sequence of words preceding it.

The n-gram model calculates P(wi/hi) by modelling language as a Markov model of order n-1, i.e., by looking at the previous n-1 words only:

P(wi/hi) ≈ P(wi/wi-n+1 ... wi-1)

A model that limits the history to the previous one word only is termed a bi-gram (n = 2) model; likewise, a model that conditions the probability of a word on the previous two words is called a tri-gram (n = 3) model.


EXAMPLE:

• Training set:

 The Arabian Knights

 These are the fairy tales of the east

 The stories of the Arabian knights are translated in many languages

• Test sentence(s):

 The Arabian knights are the fairy tales of the east.

• Solution (counting case-insensitively):

 P(the/<s>) = 2/3 = 0.67  P(Arabian/the) = 2/5 = 0.4  P(knights/Arabian) = 2/2 = 1.0

 P(are/these) = 1/1 = 1.0  P(the/are) = 1/2 = 0.5  P(fairy/the) = 1/5 = 0.2


 P(tales/fairy) = 1/1 = 1.0  P(of/tales) = 1/1 = 1.0  P(the/of) = 2/2 = 1.0

 P(east/the) = 1/5 = 0.2

 P(stories/the) = 1/5 = 0.2  P(of/stories) = 1/1 = 1.0  P(are/knights) = 1/2 = 0.5

P(translated/are) = 1/2 = 0.5  P(in/translated) = 1/1 = 1.0  P(many/in) = 1/1 = 1.0
P(languages/many) = 1/1 = 1.0

 Test sentence:

• P(The/<s>) × P(Arabian/the) × P(knights/Arabian) × P(are/knights) × P(the/are) × P(fairy/the) × P(tales/fairy) × P(of/tales) × P(the/of) × P(east/the)

• = 0.67 × 0.4 × 1.0 × 0.5 × 0.5 × 0.2 × 1.0 × 1.0 × 1.0 × 0.2 ≈ 0.0027
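
The same calculation can be reproduced with a short script. This is a minimal sketch that assumes naive lowercase whitespace tokenization and a <s> sentence-start marker:

from collections import Counter

# Toy bigram model trained on the three training sentences above.
corpus = [
    "The Arabian Knights",
    "These are the fairy tales of the east",
    "The stories of the Arabian knights are translated in many languages",
]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent.lower().split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    """Maximum likelihood estimate P(word/prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

test = "The Arabian knights are the fairy tales of the east"
tokens = ["<s>"] + test.lower().split()
prob = 1.0
for prev, word in zip(tokens, tokens[1:]):
    prob *= p(word, prev)
print(round(prob, 5))  # -> 0.00267 (≈ 0.0027)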

• The n-gram model suffers from the data sparseness problem.

 An n-gram that does not occur in the training data is assigned zero probability, so even a model trained on a large corpus has many zero entries in its bi-gram matrix.

 There are also several long-distance dependencies in natural language sentences, which this model fails to capture.

 Smoothing techniques have been developed to handle the data sparseness problem.

 Smoothing, in general, refers to the task of re-evaluating zero-probability or low-probability n-grams and assigning them non-zero values.

2.3.2 Add-one smoothing

• It adds a value of one to each n-gram frequency before normalizing the frequencies into probabilities. For a bi-gram, the conditional probability thus becomes:

P(wi/wi-1) = (C(wi-1 wi) + 1) / (C(wi-1) + V)

• where C(·) denotes a count in the training corpus and V is the vocabulary size, i.e., the size of the set of all the words being considered.
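
As a sketch, add-one smoothing can be layered on the bigram counts from the earlier example (the unigrams and bigrams Counter objects are assumed from that sketch):

def p_add_one(word, prev, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram probability:
    (C(prev word) + 1) / (C(prev) + V)."""
    V = len(unigrams)  # vocabulary size (here including <s>, for simplicity)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

# An unseen bigram such as 'east knights' now gets a small non-zero
# probability of 1 / (C('east') + V) instead of zero.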

• Issues:

 It assigns the same probability to all missing n-grams, even though some of them are intuitively more likely than others.

 It shifts too much of the probability mass towards the unseen n-grams (n-grams with zero probability), as their number is usually quite large.

 Good-Turing smoothing considers the frequency-of-frequency counts of n-grams (how many n-grams occur once, twice, and so on) in order to estimate the probability mass that needs to be assigned to missing or low-frequency n-grams.


2.3.3 Good-Turing smoothing

• Good-Turing smoothing adjusts the frequency f of an n-gram using the count of n-grams having a frequency of occurrence f+1.

• It converts the frequency of an n-gram from f to a smoothed count r using the following expression:

r = (f + 1) × n(f+1) / n(f)

• where n(f) is the number of n-grams that occur exactly f times in the training corpus.

• Eg: consider that the number of n-grams that occur 4 times is 25,108 and the number of n-grams that occur 5 times is 20,542.

 Then, the smoothed count for 4 will be 5 × 20542 / 25108 = 4.09
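
The adjustment is easy to verify in code; n_f and n_f_plus_1 below are the frequency-of-frequency counts from the example:

def good_turing_count(f, n_f, n_f_plus_1):
    """Smoothed Good-Turing count: r = (f + 1) * n_{f+1} / n_f."""
    return (f + 1) * n_f_plus_1 / n_f

print(good_turing_count(4, 25108, 20542))  # -> 4.0907... (approx. 4.09)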

2.3.4 Caching technique

• The frequency of an n-gram is not uniform across text segments or corpora.

• Certain words occur more frequently in certain segments (or documents) and rarely in others.

• The basic n-gram model ignores this sort of variation in n-gram frequency.

• The cache model combines the most recent n-gram frequency with the standard n-gram model to improve its performance locally, as sketched below.

• The underlying assumption here is that recently seen words are more likely to be repeated.
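
One common formulation, shown here as an assumed sketch rather than the specific model in the notes, linearly interpolates a probability estimated from a cache of recent tokens with the static n-gram model:

def p_cached(word, prev, cache_tokens, p_static, lam=0.3):
    """Interpolate a cache estimate with a static bigram model.

    cache_tokens: recently processed tokens (the 'cache')
    p_static:     function (word, prev) -> probability from the base model
    lam:          cache interpolation weight (0.3 is an illustrative value)
    """
    p_cache = cache_tokens.count(word) / max(len(cache_tokens), 1)
    return lam * p_cache + (1 - lam) * p_static(word, prev)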

PANINIAN FRAMEWORK
• Paninian grammar (PG) was written by Panini around 500 BCE.

• Indian languages are SOV (Subject-Object-Verb) ordered and inflectionally rich.

 The inflections provide important syntactic and semantic cues for language analysis and understanding.

 The Paninian framework takes advantage of these features.

• Some Important Features of Indian Languages

 Morphologically rich

 Relatively free word order

 Some languages like Sanskrit have the flexibility to allow word groups representing
subject, object, and verb to occur in any order.


• In Hindi, the positions of the subject and object can be interchanged:

 Maan bachche ko khanaa detii hai

 Mother child-to food gives

 (The mother gives food to the child.)

 Bachche ko maan khanaa detii hai

 Child-to mother food gives

 (The mother gives food to the child.)

The auxiliary verbs follow the main verb

In Indian languages, the nouns are followed by post-positions instead of prepositions.

Layered Representation in PG

• The Paninian grammar framework is said to be syntactico-semantic, that is, one can go from the surface layer to deep semantics by passing through intermediate layers.

• It has 4 levels:

 The surface level is the uttered or written sentence.

 The vibhakti level and the karaka level are the intermediate levels (described below).

 The semantic level is what the speaker has in mind.

 PG specifies a mapping between the karaka level and the vibhakti level, and between the vibhakti level and the surface form.


• The vibhakti level is the level at which there are local word groups based on case endings, preposition or postposition markers.

 Vibhakti refers to word (noun, verb, or other) groups based either on case endings, or post-positions, or compound verbs, or main and auxiliary verbs, etc.

 Word groups are formed based on various kinds of markers.

 These markers are language-specific, but all Indian languages can be represented at the vibhakti level.

 Vibhakti for verbs includes the verb form and the auxiliary verbs.

 The information about TAM (tense, aspect, and modality) is given by the vibhakti for a verb.

• The karaka level lies between the topmost (semantic) level and the vibhakti level.

 At the karaka level, we have karaka relations, verb-verb relations, etc.

 Karaka relations are syntactico-semantic (or semantico-syntactic) relations between the verbs and the other related constituents (typically nouns) in a sentence.

• Through these relations, the karakas try to capture the information contained in the semantics of the text.

 Thus, the karaka level processes the semantics of the language but represents it at the syntactic level.

• Hence, it acts as a bridge between the semantic and syntactic analysis of a language.

 Paninian grammar has its own way of defining karaka relations.


• These relations are based on the way the word groups participate in the activity denoted by the verb group.

 This is the level of semantics that is important syntactically and is reflected in the
surface form of the sentence.

KARAKA THEORY

• Karaka theory is the central theme of the PG framework.

• Karaka relations are assigned based on the roles played by the various participants in the main activity.

 These roles are reflected in case markers and post-position markers.

 The relations are similar to case relations in English but are defined in a different manner.

• The richness of the case endings found in Indian languages is used to advantage here.

• Karakas are the direct participants in the action indicated by the verb root:

 Karta (subject) – doer of the activity

 Karma (object) – locus of the result of the activity

 Karana (instrument)

 Sampradana (beneficiary/recipient)

 Apadan (source/separation)

 Adhikaran (location/locus)

• Maan bachchi ko aangan mein haath se rotii khilaatii hai

Mother child-to courtyard-in hand-by bread feeds.

The mother feeds bread to the child by hand in the courtyard.

 maan (mother) is the Karta

• Karta generally takes the 'ne' or null case marker

 rotii (bread) is the Karma

• Karma generally takes the 'ko' or null case marker

 haath (hand) is the Karana

• Karana generally takes the 'dwara' or 'se' case marker

 bachchi (child) is the Sampradana


• Sampradana generally takes the 'ko' or 'ke liye' case marker

• Maan ne thaali se khana uthakar bachche ko diyaa

Mother-(Karta) plate-from-(Apadan) food taking-up child-to gave.

The mother gave food to the child, taking it up from the plate.

 thaali (plate) is the Apadan

 aangan (courtyard), in the earlier sentence, is the Adhikaran
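
A toy sketch of marker-based karaka assignment for the first example sentence. The marker-to-karaka table and the (noun, marker) grouping are simplifications assumed for illustration; as the next section notes, the real mapping is many-to-many and context-dependent:

# Hypothetical, simplified marker -> karaka table.
MARKER_TO_KARAKA = {
    "ne": "Karta",
    "ko": "Karma/Sampradana",  # ambiguous without context
    "se": "Karana/Apadan",     # ambiguous without context
    "mein": "Adhikaran",
}

def assign_karakas(word_groups):
    """Tag (noun, marker) groups with candidate karaka relations.
    A null marker is itself ambiguous between Karta and Karma."""
    return [(noun, MARKER_TO_KARAKA.get(marker, "Karta/Karma"))
            for noun, marker in word_groups]

# Maan bachchi-ko aangan-mein haath-se rotii khilaatii hai
groups = [("maan", None), ("bachchi", "ko"), ("aangan", "mein"),
          ("haath", "se"), ("rotii", None)]
print(assign_karakas(groups))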

Issues in Paninian Grammar

• Computational implementation of PG

• Adaptation of PG to Indian and other similar languages.

• PG has a multilayered implementation.

 The approach is named ‘Utsarga-Apvada’ (default-exception), where rules are arranged in multiple layers in such a way that each layer consists of rules that are exceptions to the rules in the layer above.

• Thus, as we go down the layers, more particular information is derived.

• Rules may be represented in the form of charts (such as Karaka chart and Lakshan chart).

 In cases of shared karaka relations, many issues remain unresolved.

 Another difficulty arises when the mapping between the vibhakti (case markers and post-positions) and the semantic relation (with respect to the verb) is not one-to-one.

• Two different vibhaktis can represent the same relation, or the same vibhakti can represent different relations in different contexts. Strategies for disambiguating the various senses of words, or word groupings, remain challenging issues.

 As the system of rules is different in different languages, the framework requires adaptations
to tackle various applications in various languages.

******CHAPTER ENDS*****

*****END OF MODULE-1*****

AUTHOR: MEGHA RANI R, Assistant Prof. (SENIOR), SMVITM