NLP MODULE 1: Chapters 1 & 2

The document provides an overview of Natural Language Processing (NLP), detailing its definition, components, history, applications, advantages, and challenges. It highlights the importance of language in human communication and how NLP enables machines to understand and process human language. Additionally, it discusses grammar's role in NLP, types of grammar, and transformational grammar rules.

NATURAL LANGUAGE PROCESSING
TEXTBOOK 1: TANVEER SIDDIQUI, U.S. TIWARY, “NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL”, OXFORD UNIVERSITY PRESS, 2008.

TEXTBOOK 2: ANNE KAO AND STEPHEN R. POTEET (EDS.), “NATURAL LANGUAGE PROCESSING AND TEXT MINING”, SPRINGER-VERLAG LONDON LIMITED, 2007.
MODULE 1: CHAPTER1: INTRODUCTION
• Language is a method of communication for humans. By studying language, we come to understand more
about the world. With the help of communication, we can speak, read, and write. E.g., we think, make
decisions, and plan in natural language; precisely, in words.

• Natural Language Processing (NLP) is the sub-field of Computer Science, especially AI, that is concerned
with enabling computers to understand and process human language.

• It is basically concerned with developing computational models of human language processing.

• There are two main reasons for such developments:

1. To develop automated tools for language processing.

2. To gain a better understanding of human communication.


• Natural Language Processing (NLP) is a subfield of artificial intelligence (AI). It helps

machines process and understand human language so that they can automatically perform

repetitive tasks.

• Examples include machine translation, ticket classification, and spell check.

• NLP involves the processing of speech, grammar, and meaning.

• NLP is composed of two parts: NLU (NATURAL LANGUAGE UNDERSTANDING) and

NLG (NATURAL LANGUAGE GENERATION).

• Its goal is to process text data (unstructured data) to perform tasks like translation, grammar

checking, topic classification, document similarity, etc.

• Examples: Google Assistant, Siri, Alexa.


COMPONENTS OF NLP
• NATURAL LANGUAGE UNDERSTANDING (ambiguity is the major problem):

• Lexical (word level): It occurs when a word has more than one meaning, i.e., it is unclear which
sense to choose. E.g., treating the word silver as a noun or as an adjective (a small tagging sketch
follows at the end of this list).

• Syntactic (sentence level/parsing): “The man saw the girl with the telescope.” The sentence has
multiple meanings for a parser; the confusion is about how words are grouped together in a
sentence. It is ambiguous whether the man saw the girl carrying a telescope or saw her through
his telescope.

• Referential ambiguity: It occurs when a phrase can have multiple interpretations because multiple
objects are mentioned and the referencing is not clear.

• Ex: “Meena went to Geetha. She says that she is hungry.” Here “she” can refer to either Meena or
Geetha.
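The lexical ambiguity above can be made concrete with a part-of-speech tagger. Below is a minimal sketch using NLTK, assuming NLTK and its tagger models are installed; the example sentences are our own, not from the textbook:

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# "silver" is lexically ambiguous: noun in one context, adjective in another.
for sentence in ["The silver is shining", "She wore a silver ring"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
# The tag chosen for "silver" (e.g., NN vs. JJ) depends on its context.
```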
Contd..
• NATURAL LANGUAGE GENERATION:

• TEXT PLANNING:
 It includes retrieving the relevant content from the knowledge base.
• SENTENCE PLANNING:
 It includes choosing the required words, forming meaningful phrases, and setting the
tone of the sentence.
 It arranges words in a proper, meaningful way.
• TEXT REALIZATION:
 It maps the sentence plan into sentence structure, which is displayed as output.
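These three stages can be pictured as a tiny pipeline. The following is an illustrative sketch only; the function names, knowledge-base format, and data are invented for the example:

```python
# Illustrative NLG pipeline: text planning -> sentence planning -> realization.
def text_planning(kb, topic):
    """Retrieve the relevant content from the knowledge base."""
    return kb[topic]

def sentence_planning(facts):
    """Choose the required words and form meaningful phrases."""
    return [(subject, "is", attribute) for subject, attribute in facts]

def text_realization(plans):
    """Arrange each sentence plan into a surface sentence for output."""
    return " ".join(" ".join(plan).capitalize() + "." for plan in plans)

kb = {"weather": [("delhi", "hot"), ("shimla", "cold")]}
print(text_realization(sentence_planning(text_planning(kb, "weather"))))
# Delhi is hot. Shimla is cold.
```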
Contd..
ORIGINS/HISTORY OF NLP
• (1940-1960) - Focused on Machine Translation (MT): Work on natural language
processing started in the 1940s.

• 1948 - The first recognizable NLP application was introduced at
Birkbeck College, London.

• (1960-1980)- Flavored with Artificial Intelligence (AI): In the year 1960 to 1980, the
key developments were:

• Augmented Transition Networks (ATN):


 An Augmented Transition Network (ATN) builds on the finite state machine (which by itself
recognizes only regular languages), augmenting it with recursion and registers for handling
natural language sentences.
Contd..
• Case Grammar:
 Case Grammar was developed by the linguist Charles J. Fillmore in the year 1968. Case Grammar
uses languages such as English to express the relationship between nouns and verbs by using
prepositions.

 For example: "Neha broke the mirror with the hammer". In this example, case grammar
identifies Neha as an agent, the mirror as a theme, and the hammer as an instrument.
• SHRDLU: In the year 1960 to 1980, key systems were:
 SHRDLU is a program written by Terry Winograd in 1968-70. It let users communicate
with the computer and move objects in a simulated blocks world.
Contd..
• LUNAR:
 LUNAR is the classic example of a natural language database interface system; it used ATNs
and Woods' Procedural Semantics. It was capable of translating elaborate natural language
expressions into database queries and handled 78% of requests without errors.

• 1980 - Current: Until 1980, natural language processing systems were based on
complex sets of hand-written rules. After 1980, NLP introduced machine learning algorithms for
language processing.

• In the beginning of the 1990s, NLP started growing faster and achieved good
accuracy, especially for English grammar.

• Now, modern NLP consists of various applications, like speech recognition, machine
translation, and machine text reading.
LANGUAGE AND KNOWLEDGE

• Language is a vital part of human connection. Although all species have their
ways of communicating, humans are the only ones that have mastered cognitive
language communication.

• Language allows us to share our ideas, thoughts, and feelings with others.

• An NLP knowledge base is a database that stores information that an AI chatbot
can access in order to provide natural language responses to customers. This type of
knowledge base lets a chatbot understand humans and respond appropriately.
THE CHALLENGES OF NLP
1. Language differences:
 In the United States, most people speak English, but if you’re thinking of reaching an international

and/or multicultural audience, you’ll need to provide support for multiple languages.

2. Training data:
 At its core, NLP is all about analyzing language to better understand it. A human being must be

immersed in a language constantly for a period of years to become fluent in it; even the best AI
must also spend a significant amount of time reading, listening to, and utilizing a language.
Contd..
3. Misspellings:
 Misspellings are a simple problem for human beings. But for a machine, misspellings can be harder to
identify. You’ll need to use an NLP tool with capabilities to recognize common misspellings of words,
and move beyond them.

4. Words with multiple meanings: No language is perfect, and most languages have words that
have multiple meanings. For example, a user who asks, “how are you” has a totally different goal than a
user who asks something like “how do I add a new credit card?” Good NLP tools should be able to
differentiate between these phrases with the help of context.

5. Keeping a conversation moving:


 Many modern NLP applications are built on dialogue between a human and a machine. Accordingly,

your NLP AI needs to be able to keep the conversation moving, asking additional questions to collect
more information.
NLP Applications

Speech Recognition: Speech recognition is used for converting spoken words
into text. Example: it is used in applications such as mobile devices, home automation,
dictation in Microsoft Word, voice biometrics, voice user interfaces, and so on.

Speech Synthesis: It is used for converting text to speech. The voice assistants
Alexa and Google Home are a few well-known examples.

Natural Language Interfaces to Databases: Natural language interfaces allow


querying a structured database using natural language sentences.
Contd..
Information Retrieval: This is concerned with identifying documents relevant to a
user’s query. NLP techniques have found useful applications in information
retrieval, e.g., in indexing (stop word elimination, stemming, phrase extraction, etc.);
a small indexing sketch follows below.
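A minimal sketch of the indexing steps named above (stop word elimination and stemming), assuming NLTK and its stopword list are installed; the function name index_terms is our own:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def index_terms(text):
    """Toy indexing pipeline: tokenize, drop stop words, stem the rest."""
    stemmer = PorterStemmer()
    stops = set(stopwords.words("english"))
    tokens = nltk.word_tokenize(text.lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stops]

print(index_terms("Information retrieval identifies documents relevant to the user's query"))
# e.g. ['inform', 'retriev', 'identifi', 'document', 'relev', 'user', 'queri']
```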

 Information Extraction: An information extraction system captures and outputs the factual
information contained within documents. Like an information retrieval system, it
responds to a user’s information need, but the need is not expressed as
a keyword query.
Contd..
Text Summarisation: It is the creation of a short, accurate, and fluent summary
of a longer text document.

Question Answering: It focuses on building systems that automatically answer


the questions asked by humans in a natural language.
Contd..
Spelling correction: Microsoft Corporation provides word processing software
such as MS Word and PowerPoint that performs spelling correction.
Contd..
Machine Translation: Machine translation is used to translate text or speech
from one natural language to another natural language. Example: Google
Translator.
Contd..
Chatbot: Implementing a chatbot is one of the important applications of NLP.
It is used by many companies to provide chat services to their customers.
Advantages of NLP
• NLP helps users to ask questions about any subject and get a direct response within
seconds.

• NLP offers exact answers to questions; that is, it does not return unnecessary or
unwanted information.

• NLP helps computers to communicate with humans in their languages.

• It is very time efficient.

• Most companies use NLP to improve the efficiency and accuracy of documentation
processes and to identify information in large databases.
Disadvantages of NLP

• NLP is unpredictable: data privacy issues arise mainly due to its reliance on
personal information collected from users via apps and websites.

• NLP systems are often unable to adapt to a new domain and have limited functionality;
that is why an NLP system is typically built for a single, specific task.
Language and Grammar in NLP

• Grammar defines language. It consists of a set of rules that allow us to parse and
generate sentences in a language.

• In other words, grammar in NLP is a set of rules for constructing sentences in a

language, used to understand and analyze the structure of sentences in text data.

• This includes identifying parts of speech such as nouns, verbs, and adjectives,
determining the subject and predicate of a sentence, and identifying the
relationships between words and phrases.
What is Grammar?
• Grammar is defined as the rules for forming well-structured sentences.

• Grammar also plays an essential role in describing the syntactic structure of well-
formed programs, just as it denotes the syntactic rules used for conversation in
natural languages.

• In the theory of formal languages, grammar is also applicable in Computer


Science, mainly in programming languages and data structures.

• In the C programming language, the precise grammar rules state how functions
are made with the help of lists and statements.
Contd..
• Mathematically, a grammar G can be written as a 4-tuple (N, T, S, P), where:

o N or VN = set of non-terminal symbols, or variables.

o T or Σ = set of terminal symbols.

o S = start symbol, where S ∈ N.

o P = production rules for terminals as well as non-terminals.

o Each rule has the form α → β, where α and β are strings over VN ∪ Σ and at least one symbol of α
belongs to VN. A minimal sketch of this 4-tuple in code is given below.
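The 4-tuple can be written down directly as plain data structures. The grammar below is a toy example of ours, not one from the textbook:

```python
# G = (N, T, S, P) for a toy grammar.
N = {"S", "NP", "VP", "Det", "Noun", "V"}           # non-terminals (VN)
T = {"a", "the", "boy", "girl", "hit"}              # terminals (Σ)
S = "S"                                             # start symbol, S ∈ N
P = [                                               # productions α → β
    (("S",),    ("NP", "VP")),
    (("NP",),   ("Det", "Noun")),
    (("VP",),   ("V", "NP")),
    (("Det",),  ("the",)),
    (("Det",),  ("a",)),
    (("Noun",), ("boy",)),
    (("Noun",), ("girl",)),
    (("V",),    ("hit",)),
]

# Check the constraint that every left-hand side α contains
# at least one non-terminal symbol.
assert all(any(symbol in N for symbol in alpha) for alpha, beta in P)
```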
Syntax
• Each natural language has an underlying structure, usually referred to as its
syntax.

• The fundamental idea of syntax is that words group together to form
constituents, i.e., groups of words or phrases that behave as a single unit.

• These constituents can combine to form bigger constituents and, eventually,

sentences.

• Syntax also refers to the way words are arranged together.


Contd..
• Let us see some basic ideas related to syntax:

o Constituency: Groups of words may behave as a single unit or phrase, called a

constituent, for example, a noun phrase.

o Grammatical relations: These are the formalization of ideas from traditional


grammar. Examples include - subjects and objects.

o Subcategorization and dependency relations: These are the relations between


words and phrases, for example, a Verb followed by an infinitive verb.
Contd..
o Regular languages and parts of speech: These refer to the way words are arranged
in sequence; however, regular languages cannot easily capture constituency, grammatical
relations, or subcategorization and dependency relations.

o Syntactic categories and their common denotations in NLP: np - noun


phrase, vp - verb phrase, s - sentence, det - determiner (article), n - noun, tv -
transitive verb (takes an object), iv - intransitive verb, prep - preposition, pp -
prepositional phrase, adj – adjective.
Types of Grammar in NLP
• Let us move on to discuss the types of grammar in NLP. We will cover three types of
grammar: context-free, constituency, and dependency.

• Context-Free Grammar: It consists of a set of rules expressing how symbols of the
language can be grouped and ordered together, and a lexicon of words and symbols.

o One example rule expresses that an NP (noun phrase) can be composed of
either a ProperNoun or a determiner (Det) followed by a Nominal; a Nominal in turn
can consist of one or more Nouns: NP → Det Nominal; NP → ProperNoun; Nominal
→ Noun | Nominal Noun
Contd..
• A Context free grammar consists of a set of rules or productions, each expressing
the ways the symbols of the language can be grouped, and a lexicon of words.

• Context-free grammar (CFG) can also be seen as the list of rules that define the
set of all well-formed sentences in a language. Each rule has a left-hand side that
identifies a syntactic category and a right-hand side that defines its alternative
parts reading from left to right.

• Example: The rule s --> np vp means that "a sentence is defined as a noun
phrase followed by a verb phrase."
Contd..
• Formalism in rules for context-free grammar: A sentence in the
language defined by a CFG is a series of words that can be derived by
systematically applying the rules, beginning with a rule that has s on its left-hand
side.

o Use of parse tree in context-free grammar: A convenient way to describe a


parse is to show its parse tree, simply a graphical display of the parse.

o A parse of the sentence is a series of rule applications in which a syntactic


category is replaced by the right-hand side of a rule that has that category on its
left-hand side, and the final rule application yields the sentence itself.
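The parse-tree idea, and the syntactic ambiguity discussed earlier, can be demonstrated with NLTK's chart parser. The toy grammar below is our own; it deliberately licenses both attachments of the prepositional phrase:

```python
import nltk

# A toy CFG in which "with the telescope" can attach to the VP or the NP.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'girl' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "the man saw the girl with the telescope".split()
for tree in parser.parse(sentence):
    tree.pretty_print()   # prints two parse trees, one per PP attachment
```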
Contd..
• The semantic representation of a sentence, which describes its meaning, is called the deep
structure.

• The surface structure describes the sound; it refers to the sentence as it is pronounced or

written.

• Chomsky’s theory was able to explain why sentences like:

 Pooja plays veena.

 Veena is played by Pooja.

have the same meaning: both sentences are generated from the same deep structure, in
which the deep subject is Pooja and the deep object is veena.
Contd..
• Each sentence in a language has two levels of representation, as shown in the figure below:
deep structure and surface structure.

• The mapping from deep structure to surface structure is carried out by transformations. In
the following paragraphs, we introduce transformational grammar.
Contd..
• Transformational grammar has three components, each consisting of a set of rules:

o Phrase structure grammar: It consists of rules that generate natural language sentences
and assign a structural description to them.

o Transformational rules:

The second component of transformational grammar is a set of transformational rules, which

transform one phrase marker into another. These rules are applied to the terminal string
generated by the phrase structure rules.

The rule relating active and passive sentences can be written as:

NP1 + Aux + V + NP2 → NP2 + Aux + be + en + V + by + NP1
Contd..

• Transformational rules are heterogeneous and may have more than one symbol on
their left-hand side. They transform one phrase marker (the underlying one) into another
(the derived one), e.g., an active sentence into the corresponding passive sentence.
Contd..
Q: Write the transformational grammar for the sentence
S: “The boy hit the girl.”
Phrase structure rules for the sentence “The boy hit the girl”:
S → NP + VP
NP → Det + Noun
VP → V + NP
V → Aux + V (optional auxiliary)
Det → a | the
Noun → boy | girl
V → hit
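Sentences licensed by these phrase structure rules can be enumerated with NLTK's generate utility. This is a sketch under one simplification: the recursive V → Aux + V rule is omitted so that the grammar stays finite:

```python
import nltk
from nltk.parse.generate import generate

# The phrase structure rules above, without the optional Aux rule.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Noun
VP -> V NP
Det -> 'a' | 'the'
Noun -> 'boy' | 'girl'
V -> 'hit'
""")

# Enumerates all 16 sentences the rules license, including
# "the boy hit the girl"; exactly these are termed grammatical.
for words in generate(grammar):
    print(" ".join(words))
```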
Contd..
• Morphophonemic rules: These rules match each sentence representation to a string of
phonemes.

• Morphophonemics involves an investigation of the phonological variations within

morphemes, usually marking different grammatical functions; e.g., the vowel changes in
“sleep” and “slept”, “bind” and “bound”, “vain” and “vanity”, and the consonant
alternations in “knife” and “knives”, “loaf” and “loaves”.

• The Transformational rule will reorder ‘en + catch’ to ‘catch + en’ and subsequently one
of the morphophonemic rules will convert ‘catch + en’ to ‘caught’.
Contd..
• As an example, consider the following set of rules: sentences that can be generated
using these rules are termed grammatical.
Contd..
Processing Indian Languages
• There are a number of differences between Indian Languages and English.

• This introduces differences in their processing. Some of the differences are listed here:

o Unlike English, Indian Languages have SOV(Subject-Object-Verb) as default sentence


structure.

o Indian Languages have a free word order i.e., words can be moved freely within a
sentence without changing the meaning of the sentence.

o Spelling standardization is more subtle in Hindi than in English.

o Indian languages use verb complexes consisting of sequences of verbs, e.g., गा रहा
है (ga raha hai — is singing) and खेल रही है (khel rahi hai — is playing).
Information Retrieval
• Information retrieval deals with the organization, storage,
retrieval, and evaluation of information from document repositories, particularly textual
information.

• Information Retrieval is the activity of obtaining material (usually documents) of an
unstructured nature (usually text) that satisfies an information need from within
large collections stored on computers. For example, information retrieval takes place
when a user enters a query into the system.
Contd..
• Information retrieval also retrieves information about a subject.

• Small errors in retrieval are likely to go unnoticed.

• A set of keywords is required to search. Keywords are what people are searching
for in search engines; these keywords summarize the description of the
information.

• Queries are not always well structured and are semantically ambiguous.

• IR does not provide an exact solution the way a database system does.

• The results obtained are approximate matches.


Issues in Information Retrieval
• The main issues of the Information Retrieval (IR) are Document and Query Indexing,
Query Evaluation, and System Evaluation.
Contd..
• Document and Query Indexing: The main goal of document and query indexing is to
find important meanings and to create an internal representation.

• The retrieval model determines how a document is represented with the selected keywords, and
how document and query representations are compared to calculate a score.

• Information Retrieval (IR) deals with issues like uncertainty and vagueness in information
systems:

o Uncertainty: The available representation does not typically reflect the true semantics of
objects such as images, videos, etc.

o Vagueness: The information that the user requires lacks clarity and is only vaguely expressed
in a query, feedback, or user action.
Contd..
• System Evaluation:

System evaluation addresses the impact of the information given on user

achievement. Here, we assess the efficiency of the particular
system in terms of time and space.
Chapter 2: Language Modelling
• A model is a description of some complex entity or process.

• A Language model is a description of language.

• Natural Language is a complex entity and in order to process it through a computer-based program,
we need to build a representation(model) of it. This is known as Language modelling.

• Language modeling is the way of determining the probability of any sequence of words.

• Language modeling is used in a wide variety of applications such as Speech Recognition, Spam
filtering, etc.

• In fact, language modeling is the key aim behind the implementation of many state-of-the-art
Natural Language Processing models.
Contd..
• Methods of Language Modeling:
 There are two types of language modeling:

Grammar-Based Language Modeling: A grammar-based approach uses the grammar of

the language to create its model.

• It is used to represent the syntactic structure of language.

• For example, a sentence usually consists of a noun phrase and a verb phrase.

• The grammar-based approach attempts to utilize this structure and also the relationships between
these structures.

Statistical Language Modeling: Statistical language modeling is

the development of probabilistic models that are able to predict the next word in a sequence
given the words that precede it. Ex: n-gram models are the simplest and most common kind of
statistical language model.
Various Grammar-Based Language Models
• Generative Grammars: In NLP, we can generate sentences in a language if we know
the collection of words and rules in that language. Only those sentences that can be
generated as per the rules are grammatical. Ex: in the zoo, on the table, near the window.

• Hierarchical Grammar: Chomsky described classes of grammars in a hierarchical

manner, where the top layer contains the grammars represented by its subclasses. Hence,

Type 0 (or unrestricted) grammar contains

Type 1 (or context-sensitive) grammar, which in turn contains

Type 2 (context-free) grammar, and that again contains

Type 3 (regular) grammar.

Contd..
• Government and Binding (GB):
o Transformational grammars assume two levels of existence of sentences:
 one at the surface level,
 the other at the deep root level.

o Government and Binding (GB) theories have renamed these the s-level and the d-level, and
identified two more levels of representation (parallel to each other) called phonetic form
and logical form.

o According to GB theories, language can be considered for analysis at the levels shown in
the figure.
Contd..
• Let us take an example to explain the TG representation of a sentence:
• Ex: Mukesh was killed.

(i) In transformational grammar, this can be represented as S → NP AUX VP, as given in
Fig. 2.2.
Contd..
• Example to explain d-structure and s-structure: consider the sentence
“Mukesh was killed”; its s-structure and d-structure are shown in the figure
(D-structure and S-structure).
Contd..
• Ex: Drano, he drank
Contd..
• Components of GB: GB comprises a set of theories that map structures from d-
structure to s-structure and on to logical form (LF).

• A general transformational rule called ‘Move α’ is applied at the d-structure and s-

structure levels. It can move constituents anywhere, constrained by several theories and principles.
Contd..
• Organisation of GB:
Contd..
• The X̄ theory (pronounced ‘X-bar theory’) is one of the central concepts in GB.

• Instead of defining several phrase structures and the sentence structure with separate sets
of rules, X̄ theory defines them all as maximal projections of some head. In this
manner, the entities defined become language independent.

• Thus, the noun phrase (NP), verb phrase (VP), adjective phrase (AP), and prepositional
phrase (PP) are maximal projections of the noun (N), verb (V), adjective (A), and preposition
(P) respectively, and can be represented as projections of a head X, where
X ∈ {N, V, A, P}. The sentence structure (S′, the projection of the sentence) can be regarded
as the maximal projection of inflection (INFL).
Contd..
• GB envisages projection at two levels:
o first, the projection of the head at the semi-phrasal level, denoted by X̄ (single bar);
o second, the maximal projection at the phrasal level, denoted by X̿ (double bar).
o For sentences, the first-level projection is denoted by S.
o The second-level maximal projection is denoted by S′.
o We now illustrate phrase and sentence representations with the help of examples.

• Example 2.3: Figure 2.7 depicts the general and particular structures with examples. We
see the general structure in Figure 2.7(a).
Contd..
• Next, we consider the representation of the NP “the food in a dhaba”. This is followed by
the representation of the VP, AP, and PP structures in Figure 2.7(c-e); finally, Figure
2.7(f) shows the representation of a sentence.
Contd..
Contd..
• As shown in Figure 2.7(f), INFL is considered to be the head of the sentence; the projection of
the sentence is denoted by S′, which has the complementizer (COMP) as its specifier.
Contd..
• Different components of GB:
1. X-bar theory
2. Sub-Categorization
3. Projection
4. Theta Theory(θ-Theory)
5. Theta-role and Theta-criterion
6. Binding Theory
7. Empty Category Principle
8. Case Theory and Case Filter
Lexical Functional Grammar (LFG Model)

• Lexical Functional Grammar (LFG) plays a vital role in the area of Natural Language Processing
(NLP).

• LFG includes two basic forms: c-structure and f-structure.

• The structure of LFG contains constituent or categorial structure (c-structure) and functional

structure (f-structure).

• The c-structure indicates the hierarchical composition of words into larger units or phrasal
constituents,

• while the f-structure is a representation of grammatical functions like subject, object, etc.


Lexical Rules in LFG
• Ex: She saw stars in the sky.
• CFG rules to handle this sentence are: S→ NP VP
VP→ V {NP} {NP} PP* {S’}
PP→ P NP
NP→ Det N {PP}
S’ →Comp S
where S: sentence
N: Noun
V: Verb
P: preposition
S’: clause
Comp: complement
{ }: optional constituent
*: the phrase can appear any number of times, including not at all.
(A parsing sketch with these rules, expanded into plain CFG form, is given below.)
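The c-structure rules above can be tried out by expanding the optional { } constituents and the PP* repetition into plain CFG alternatives, which NLTK's notation requires. A sketch under that assumption, with a small lexicon of our own:

```python
import nltk

# The CFG rules above with { } options and PP* unrolled just far enough
# for "she saw stars in the sky" (NLTK's CFG has no { } or * operators).
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP PP | V NP | V
PP -> P NP
NP -> Det N | N
Det -> 'the'
N -> 'she' | 'stars' | 'sky'
V -> 'saw'
P -> 'in'
""")

for tree in nltk.ChartParser(grammar).parse("she saw stars in the sky".split()):
    tree.pretty_print()
```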
Contd..
Paninian Framework
• Paninian grammar (PG) was written by Panini around 500 BC in Sanskrit. The PG framework can
be used for other Indian languages and even for some Asian languages.

• Unlike English, these languages are SOV (Subject-Object-Verb) ordered and

inflectionally rich. The inflections provide important syntactic and semantic information for language
analysis and understanding.

• The classical Paninian grammar facilitates the task of obtaining the semantics through a
syntactic framework. In PG, an extensive and thorough treatment of phonology,
morphology, syntax, and semantics is available.
Contd..
• Layered representation in Paninian grammar:
• The Paninian Grammar (PG) framework is said to be syntactico-semantic; that is, one can go from the
surface layer to deep semantics by passing through intermediate layers.

• PG works at various levels of language analysis to achieve the meaning of the sentence from the
hearer’s perspective.

• To achieve the desired meaning, the grammatical analysis is divided internally into various
levels, as shown in the figure below.
Contd..
• Semantic Level: Represents the speaker’s actual intention, that is, his real thought behind the sentence.

• Surface Level: It is the actual string or sentence. It captures the written or the spoken sentence as
it is.

• Vibhakti Level: Vibhakti is the word suffix, which helps to find out the participants and gender, as well as
the form of the word.

 The Vibhakti level is purely syntactic. At this level, noun groups are formed, containing
instances of nouns or pronouns, etc.

 The Vibhakti for verbs includes the verb form and the auxiliary verbs.

Karaka Level: At the Karaka level, the relation of the participant noun to the verb, i.e., to the action, is
determined.

 Karaka relations are syntactico-semantic.

Karaka Theory
• The meaning of the word Karaka is ‘one who does something’, i.e. one who performs an action.
• Note: The one who performs an action, accepts an action, or otherwise helps to perform an
action is known as a Karaka.
• There is a mutual expectancy between the action (Kriya) and the adjuncts (Karakas).

• Various Karakas: The Karta, Karma, and Karana are considered the foremost Karakas, while
the Sampradana, Apadana, and Adhikarana Karakas are known as the influenced Karakas.
1. Karta - subject
2. Karma - object
3. Karana - instrument
4. Sampradana - beneficiary
5. Apadana - separation
6. Adhara or Adhikarana - locus
7. Sambandh - relation
8. Tadarthya - purpose
8. Tadarthya - purpose
Contd..
1. Karta Karaka: The Karta Karaka is the premier one with respect to the action; it performs
the action independently, on its own.

• The action indicated in a sentence is entirely dependent upon the Karta Karaka. The activity either resides
in or arises from the Karta only.

E.g., “Tiger killed the goat.”

Tiger - Karta.

2. Karma Karaka: Karma is the Aashraya (locus) of the result.

• The nominative most desired by the Karta is the Karma Karaka. When the Karta carries
out any activity, the result of that activity rests in the Karma. As the Karma (object) is the basis of the
outcome of the primary action, it is one of the most prominent Karakas.

E.g., “Tiger killed the goat.”

goat - Karma.
Contd..
3. Karana Karaka: The Karaka which helps in carrying out the Kriya is known as the Karana Karaka. The
Karana Karaka helps in attaining the desired result ascertained by the Kriya.

• The Karana is the most direct participant in the action.

• The Karta and Karma also depend directly on the Karana for performing the action.

• The Karana is the most important tool by means of which the action is achieved.

• Only when the Karana Karaka executes its auxiliary actions is the main action executed by the
Karta Karaka. This is why the Karana is considered the efficient means of accomplishing the action.

Examples:

The man cut the wood with an axe.

Ram cuts the apple with a knife.

Karana - axe and knife.

Contd..
4. Sampradana Karaka: The word Sampradana can be interpreted as ‘he to whom something
is given properly’.

• The Sampradana Karaka receives or benefits from the action. It can also be said that the
person/object for whom the Karma is intended is known as the Sampradana.

• In this regard, the Sampradana is the final destination of the action.

Example 1: “Dipti gave chocolates to Shambhavi.”

Shambhavi is the Sampradana.

Example 2: “Ram gave me a book.”

me is the Sampradana.

Example 3: “He gave flowers for Shambhavi.”

Shambhavi is the Sampradana.
Contd..
5. Apadana Karaka: About the Apadana Karaka, Panini stated that when separation is effected by
a verbal action, the point of separation is called the Apadana.

• During the execution of the action, whenever separation from a certain entity takes place,
whatever remains unmoved or constant is known as the Apadana.

• Thus, the Apadana denotes the starting point of an action of separation.

• The entity from which something gets separated is known as the Apadana.

Example: “Shambhavi tore the page from the book with scissors.”

“From the book” is the Apadana.

Contd..
6. Adhikarana Karaka: ‘Adhikarana’ is the place or thing which is the location of the action existing in the agent
or the object.

• The Adhikarana is assigned to the locus of the action, i.e., the Kriya. The Adhikarana may indicate the place at
which the Kriya (the action) takes place or the time at which the Kriya is carried out. Any action, i.e., the Kriya,
is bounded either by space (place) or by time.

Example: ‘Yesterday Shambhavi hit the dog with the stick in front of the shop.’

The Karaka annotation of the above sentence can be given as:

hit : verb (root)

Yesterday : Kala-Adhikarana (time)

Shambhavi : agent, i.e., Karta

dog : Karma

stick : Karana

shop : Desh-Adhikarana (location, i.e., place)

Contd..
• Identify the different Karakas in the following Hindi sentence:

S: “Maan putree ko angan mein haath se roti khilaathi hai” (the mother feeds bread to her
daughter in the courtyard with her hand)

Karaka roles:

khilaathi - verb (root)

Maan - Karta

roti - Karma

haath - Karana

putree - Sampradana

angan - Adhikarana
Statistical language model
• A statistical language model is a probability distribution P(s) over all possible word sequences (or
any other linguistic unit like words, sentences, paragraphs, documents etc).

• A number of statistical language models have been proposed in the literature. The dominant
approach in modeling is the n-gram model:

• n-gram Model: The goal of a statistical language model is to estimate the probability of a
sentence. This is achieved by decomposing sentence probability into a product of conditional
probabilities using the chain rule as follows:
Contd..
P(s) = P(w1, w2, w3, ..., wn)
     = P(w1) P(w2 | w1) P(w3 | w1 w2) P(w4 | w1 w2 w3) ... P(wn | w1 w2 ... wn-1)
     = ∏i P(wi | hi)

where hi is the history of the word wi, defined as w1 w2 ... wi-1.

• To find the sentence probability, we need to calculate the probability of a word given the sequence of
words preceding it.
• The n-gram model simplifies the task by approximating the probability of a word given all the

previous words by the conditional probability given the previous n-1 words only.
• A model that limits the history to the previous one word only is termed a bi-gram (n = 2)

model. A model that conditions the probability of a word on the previous two words is called a tri-gram (n
= 3) model.
Contd..
• Using bi-gram and tri-gram estimates, the probability of a sentence can be calculated as

P(s) ≈ ∏i P(wi | wi-1)  (bi-gram)

P(s) ≈ ∏i P(wi | wi-2 wi-1)  (tri-gram)

• A special word (pseudoword) <s> is introduced to mark the beginning of the sentence in bi-gram
estimation. The probability of the first word in the sentence is conditioned on <s>. Similarly, in tri-
gram estimation, two pseudowords <s1> <s2> are introduced.
• Estimation of probabilities is done by training the n-gram model on a training corpus. We count a
particular n-gram in the training corpus and divide it by the sum of all n-grams that share the same
prefix; for bi-grams, P(wi | wi-1) = count(wi-1 wi) / count(wi-1). A minimal code sketch of this follows below.
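A minimal sketch of this counting scheme for the bi-gram case; the function names and corpus format are ours, and real systems would add smoothing on top:

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenized sentences,
    padding each sentence with the pseudoword <s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams):
    """MLE estimate: P(w | w_prev) = count(w_prev w) / count(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0

def sentence_prob(tokens, unigrams, bigrams):
    """P(s) as a product of bi-gram probabilities, starting from <s>."""
    p = 1.0
    for w_prev, w in zip(["<s>"] + tokens, tokens):
        p *= bigram_prob(w_prev, w, unigrams, bigrams)
    return p
```

The worked examples on the following slides apply exactly these counts by hand.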
Contd..
Q: Find the probability of the test sentence P(“They play in a big garden”) for the
following training set, using the bi-gram model.

Training Set:

<S> There is a big garden. </S>

<S> Children play in the garden. </S>

<S> They play inside a beautiful garden. </S>
Contd..
Test Sentence:
“They play in a big garden”
Bi-gram model:
P (They play in a big garden)
= P (They | <S>) *
P (play | They) *
P (in | play) *
P (a | in) *
P (big | a) *
P (garden | big)
= 1/3 * 1/1 * 1/2 * 0/1 * 1/2 * 1/1
= 0

The bi-gram “in a” never occurs in the training corpus (“in” is followed only by “the”), so the
maximum-likelihood estimate assigns the whole sentence a probability of zero. Such zero
estimates are the reason smoothing techniques are used in practice.
Contd..
Q: Find the probability of the test sentence P(“I I am not”) for the following training set,
using the bi-gram model.

Training Set:

<S> I am a human </S>

<S> I am not a stone </S>

<S> I I Live in a Lahore </S>
Contd..
• Using the bi-gram model:

P(I | <s>) = count(<s> I) / count(<s>) = 3/3 = 1

P(I | I) = count(I I) / count(I) = 1/4 = 0.25

P(am | I) = count(I am) / count(I) = 2/4 = 1/2 = 0.5

P(not | am) = count(am not) / count(am) = 1/2 = 0.5

(P(Live | I) = count(I Live) / count(I) = 1/4 = 0.25, though it is not needed for this test sentence.)
Contd..
Test Sentence:

“I I am not”

Bi-gram model:

P(“I I am not”) = P(I | <s>) P(I | I) P(am | I) P(not | am)

= 3/3 * 1/4 * 2/4 * 1/2

= 1 * 0.25 * 0.5 * 0.5 = 0.0625
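Using the sketch given earlier (the train_bigram_model and sentence_prob functions defined above), the hand computation can be reproduced:

```python
corpus = [sentence.split() for sentence in [
    "I am a human",
    "I am not a stone",
    "I I Live in a Lahore",
]]
unigrams, bigrams = train_bigram_model(corpus)

print(sentence_prob("I I am not".split(), unigrams, bigrams))
# 1 * 1/4 * 2/4 * 1/2 = 0.0625
```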
