
R20-Regulations NLP

UNIT - II: Grammars and Parsing (Lecture: 8 Hrs)
Grammars and Parsing - Top-Down and Bottom-Up Parsers, Transition Network Grammars, Feature Systems and Augmented Grammars, Morphological Analysis and the Lexicon, Parsing with Features, Augmented Transition Networks, Bayes Rule, Shannon Game, Entropy and Cross Entropy.
Grammars and Parsing:

Natural language has an underlying structure, usually referred to under the heading of syntax. The fundamental idea of syntax is that words group together to form constituents, i.e. groups of words or phrases that behave as a single unit. These constituents can combine to form bigger constituents and, eventually, sentences.

For instance, John, the man, the man with a hat and almost every man are constituents (called Noun Phrases, or NPs for short) because they can all appear in the same syntactic contexts (they can all function as the subject or the object of a verb, for instance). Moreover, the NP constituent the man with a hat can combine with the VP (Verb Phrase) constituent run to form an S (sentence) constituent.

Grammar: A grammar is a set of rules that determines whether a string belongs to a particular language or not.

A program consists of various strings of characters, but not every string is a proper or meaningful string. So, to identify the valid strings of a language, some rules must be specified to check whether a string is valid or not. These rules constitute a grammar.

Example − In the English language, grammar checks whether a string of characters is acceptable or not, i.e., whether nouns, verbs, adverbs, etc. are in the proper sequence.

Context-Free Grammar (CFG): A context-free grammar is a type of formal grammar used in natural language processing (NLP) and computational linguistics. It provides a way to describe the structure of a language in terms of a set of rules for generating its sentences.


Formally, a context-free grammar G is defined as a 4-tuple (V, T, P, S), where:

 V is a set of non-terminals (variables), e.g. S, N, NP, V, VP, PP, Det
 T is a set of terminals, e.g. ∑ = {names, things, places, verbs}
 P is a set of productions (rules)
 S is the start symbol
The following Grammar productions are used in NLP parsers.
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP

Derivations:
A derivation is a sequence of rule applications that derives a terminal string w = w1 … wn from the start symbol S.
For example:

S
⇒ NP VP
⇒ Pro VP
⇒ I VP
⇒ I Verb NP
⇒ I prefer NP
⇒ I prefer Det Nom
⇒ I prefer a Nom
⇒ I prefer a Nom Noun
⇒ I prefer a Noun Noun
⇒ I prefer a morning Noun
⇒ I prefer a morning flight
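A hedged sketch of this derivation in code: the small grammar implied by the steps above is written in NLTK's CFG notation and parsed with a chart parser. The rule set (Pro, Det, Nom, Verb, Noun and their lexical entries) is reconstructed from the derivation, and NLTK is assumed to be installed.

```python
import nltk

# Grammar reconstructed from the derivation steps above (an assumption for illustration).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Pro | Det Nom
Nom -> Nom Noun | Noun
VP -> Verb NP
Pro -> 'I'
Verb -> 'prefer'
Det -> 'a'
Noun -> 'morning' | 'flight'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I prefer a morning flight".split()):
    print(tree)   # the parse tree corresponding to the derivation shown above
```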


Parsing: In the syntax analysis phase, a compiler verifies whether or not the tokens generated
by the lexical analyzer are grouped according to the syntactic rules of the language. This is
done by a parser.

The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar of the source language. It detects and reports any syntax errors and produces a parse tree from which intermediate code can be generated.

The following diagram describes the working procedure of the parser.

Ambiguity: A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
E.g., consider the grammar
S -> aS | Sa | a
For the string aaa we obtain 4 parse trees, so the grammar is ambiguous.
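This claim can be checked with a quick sketch using NLTK's chart parser (assumed installed), which enumerates all parse trees for aaa:

```python
import nltk

# The ambiguous grammar from the example above.
grammar = nltk.CFG.fromstring("""
S -> 'a' S | S 'a' | 'a'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse(["a", "a", "a"]))
print(len(trees))        # 4 distinct parse trees, so the grammar is ambiguous
for tree in trees:
    print(tree)
```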


Basic concepts of parsing:


Two problems for grammar G and string w:
Recognition: determine if G accepts w
Parsing: retrieve (all or some) parse trees assigned to w by G
Two basic search strategies:
Top-down parser: start at the root of the tree
Bottom-up parser: start at the leaves

Top-Down Parser

A parsing algorithm can be described as a procedure that searches through various ways of
combining grammatical rules to find a combination that generates a tree that could be the
structure of the input sentence. In other words, the algorithm will say whether a certain sentence is accepted by the grammar or not. The top-down parsing method is closely related to search techniques used in many artificial intelligence (AI) applications.


A top-down parser starts with the S symbol and attempts to rewrite it into a sequence of
terminal symbols that matches the classes of the words in the input sentence. The state of the
parse at any given time can be represented as a list of symbols that are the result of operations
applied so far, called the symbol list.

For example, the parser starts in the state (S), and after applying the rule S -> NP VP the symbol list will be (NP VP). If it then applies the rule NP -> ART N, the symbol list will be (ART N VP), and so on.

1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP
The parser could continue in this fashion until the state consisted entirely of terminal symbols, and then it could check the input sentence to see if it matched. The lexical analyser will produce the list of words from the given sentence. A very small lexicon for use in the examples is

cried: V
dogs: N, V
the: ART
Positions fall between the words, with 1 being the position before the first word. For
example, here is a sentence with its positions indicated:

1 The 2 dogs 3 cried 4

A typical parse state would be

((N VP) 2)

indicating that the parser needs to find an N followed by a VP, starting at position two. New
states are generated from old states depending on whether the first symbol is a lexical symbol
or not. If it is a lexical symbol, like N in the preceding example, and if the next word can
belong to that lexical category, then you can update the state by removing the first symbol
and updating the position counter. In this case, since the word dogs is listed as an N in the
lexicon, the next parser state would be ((VP) 3)

which means it needs to find a VP starting at position 3. If the first symbol is a nonterminal,
like VP, then it is rewritten using a rule from the grammar. For example, using rule 4 in the
above Grammar, the new state would be

((V) 3)
which means it needs to find a V starting at position 3. On the other hand, using rule 5, the
new state would be

((V NP) 3)
A parsing algorithm that is guaranteed to find a parse if there is one must systematically
explore every possible new state. One simple technique for this is called backtracking. Using
this approach, rather than generating a single new state from the state ((VP) 3), you generate
all possible new states. One of these is picked to be the next state and the rest are saved as
backup states. If you ever reach a situation where the current state cannot lead to a solution,
you simply pick a new current state from the list of backup states. Here is the algorithm in a
little more detail.
A Simple Top-Down Parsing Algorithm

The algorithm manipulates a list of possible states, called the possibilities list. The first element of this list is the current state, which consists of a symbol list and a word position in the sentence; the remaining elements of the search state are the backup states, each indicating an alternate symbol-list / word-position pair. For example, the possibilities list
(((N) 2) ((NAME) 1) ((ADJ N) 1))
indicates that the current state consists of the symbol list (N) at position 2, and that there are
two possible backup states: one consisting of the symbol list (NAME) at position 1 and the
other consisting of the symbol list (ADJ N) at position 1.

Top-down depth-first parse of 1 The 2 dogs 3 cried 4


The algorithm starts with the initial state ((S) 1) and no backup states

1. Select the current state: Take the first state off the possibilities list and call it C. If the
possibilities list is empty, then the algorithm fails (that is, no successful parse is
possible).

2. If C consists of an empty symbol list and the word position is at the end of the
sentence, then the algorithm succeeds

3. Otherwise, generate the next possible states

If the first symbol on the symbol list of C is a lexical symbol, and the next word in the
sentence can be in that class, then create a new state by removing the first symbol
from the symbol list and updating the word position, and add it to the possibilities list.

Otherwise, if the first symbol on the symbol list of C is a non-terminal, generate a new state for each rule in the grammar that can rewrite that non-terminal symbol, and add them all to the possibilities list.
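The following is a minimal Python sketch of this backtracking procedure, using the grammar and lexicon from the running example. The data structures and names are illustrative, and positions are counted from 0 rather than 1.

```python
# Grammar rules 1-5 and the small lexicon from the text, in plain Python form.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
LEXICON = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def top_down_parse(words, grammar=GRAMMAR, lexicon=LEXICON):
    """Return True if the word list is accepted, using backtracking search."""
    # Each state is (symbol_list, position); position i means the parser is
    # just before words[i] (the text counts positions from 1 instead).
    possibilities = [(("S",), 0)]
    while possibilities:
        symbols, pos = possibilities.pop(0)       # 1. select the current state
        if not symbols and pos == len(words):     # 2. success test
            return True
        if not symbols:
            continue                              # dead end: symbols used up too early
        first, rest = symbols[0], symbols[1:]
        if first in grammar:                      # 3b. rewrite a non-terminal
            new_states = [(tuple(rhs) + rest, pos) for rhs in grammar[first]]
            possibilities = new_states + possibilities    # depth-first, as in the trace
        elif pos < len(words) and first in lexicon.get(words[pos], set()):
            possibilities.insert(0, (rest, pos + 1))      # 3a. match a lexical symbol
        # otherwise this state is abandoned and a backup state is tried
    return False

print(top_down_parse(["the", "dogs", "cried"]))   # expected: True
```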

Consider an example. Using the grammar above, the trace in the figure shows the algorithm run on the sentence The dogs cried. First, the initial S symbol is rewritten using rule 1 to produce a new
current state of ((NP VP) 1) in step 2. The NP is then rewritten in turn, but since there are two
possible rules for NP in the grammar, two possible states are generated: The new current state
involves (ART N VP) at position 1, whereas the backup state involves (ART ADJ N VP) at
position 1. In step 4 a word in category ART is found at position 1 of the sentence, and the
new current state becomes (N VP). The backup state generated in step 3 remains untouched.
The parse continues in this fashion to step 5, where two different rules can rewrite VP. The
first rule generates the new current state, while the other rule is pushed onto the stack of
backup states. The parse completes successfully in step 7, since the current state is empty and
all the words in the input sentence have been accounted for.

Consider the same algorithm and grammar operating on the sentence

1 The 2 old 3 man 4 cried 5

In this case assume that the word old is ambiguous between an ADJ and an N and that the
word man is ambiguous between an N and a V (as in the sentence The sailors man the
boats). Specifically, the lexicon is


the: ART

old: ADJ, N

man: N, V

cried: V

The parse proceeds as follows. The initial S symbol is rewritten by rule 1 to produce the new
current state of ((NP VP) 1). The NP is rewritten in turn, giving the new state of ((ART N
VP) 1) with a backup state of ((ART ADJ N VP) 1). The parse continues, finding the as an
ART to produce the state ((N VP) 2) and then old as an N to obtain the state ((VP) 3). There
are now two ways to rewrite the VP, giving us a current state of ((V) 3) and the backup states
of ((V NP) 3) and ((ART ADJ N VP) 1) from before. The word man can be parsed as a V, giving the state (() 4). Unfortunately, while the symbol list is empty, the word position is not at the
end of the sentence, so no new state can be generated and a backup state must be used. In the
next cycle, step 8, ((V NP) 3) is attempted. Again man is taken as a V and the new state ((NP)
4) generated. None of the rewrites of NP yield a successful parse. Finally, in step 12, the last
backup state, ((ART ADJ N VP) 1), is tried and leads to a successful parse.

Parsing as a Search Procedure

You can think of parsing as a special case of a search problem as defined in AI. In particular,
the top-down parser in this section was described in terms of the following generalized search
procedure. The possibilities list is initially set to the start state of the parse. Then you repeat
the following steps until you have success or failure:

1. Select the first state from the possibilities list (and remove it from the list).

2. Generate the new states by trying every possible option from the selected state (there may
be none if we are on a bad path).


3. Add the states generated in step 2 to the possibilities list.

A top-down parse of 1 The 2 old 3 man 4 cried 5

Bottom-Up Parser

A bottom-up parser builds a derivation by working from the input sentence back toward the start symbol S. The bottom-up parser is also known as a shift-reduce parser.

The basic operation in bottom-up parsing is to take a sequence of symbols and match it to the
right-hand side of the rules. You could build a bottom-up parser simply by formulating this
matching process as a search process. The state would simply consist of a symbol list,
starting with the words in the sentence. Successor states could be generated by exploring all
possible ways to:

 rewrite a word by its possible lexical categories

 replace a sequence of symbols that matches the right-hand side of a grammar rule by
its left-hand side
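As a concrete illustration, here is a hedged sketch of bottom-up parsing as a search over symbol lists, using exactly the two successor operations listed above with the grammar and lexicon from the top-down example. The breadth-first formulation and names are illustrative.

```python
from collections import deque

# (right-hand side, left-hand side) pairs for rules 1-5 from the earlier grammar.
RULES = [
    (("NP", "VP"), "S"),
    (("ART", "N"), "NP"),
    (("ART", "ADJ", "N"), "NP"),
    (("V",), "VP"),
    (("V", "NP"), "VP"),
]
LEXICON = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def bottom_up_accepts(words):
    """Search from the word list back toward the single symbol S."""
    start = tuple(words)
    seen, queue = {start}, deque([start])
    while queue:
        symbols = queue.popleft()
        if symbols == ("S",):                      # reduced all the way to S
            return True
        successors = []
        for i, sym in enumerate(symbols):          # rewrite a word by a lexical category
            for cat in LEXICON.get(sym, ()):
                successors.append(symbols[:i] + (cat,) + symbols[i + 1:])
        for rhs, lhs in RULES:                     # replace an RHS match by its LHS
            for i in range(len(symbols) - len(rhs) + 1):
                if symbols[i:i + len(rhs)] == rhs:
                    successors.append(symbols[:i] + (lhs,) + symbols[i + len(rhs):])
        for s in successors:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return False

print(bottom_up_accepts(["the", "dogs", "cried"]))  # expected: True
```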


Bottom-up Chart Parsing Algorithm


Initialization: For every rule in the grammar of form S -> X1 ... Xk, add an arc labeled S -> • X1 ... Xk using the arc introduction algorithm.
Parsing: Do until there is no input left:
1. If the agenda is empty, look up the interpretations for the next word in the input and
add them to the agenda.
2. Select a constituent from the agenda (let’s call it constituent C from position p1 to
p2).
3. For each rule in the grammar of form X -> C X1 ... Xn, add an active arc of form X -> C • X1 ... Xn from position p1 to p2.
4. Add C to the chart using the arc extension algorithm above.
Example 1.
Grammar and Lexicon
Grammar:
1. S->NP VP
2. NP - > ART N
3. NP -> ART ADJ N
4. VP -> V NP
Lexicon:
the: ART
man: N, V
old: ADJ, N
boat: N


Sentence: 1 The 2 old 3 man 4 the 5 boat 6
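A compact, simplified sketch of the bottom-up chart procedure applied to Example 1 is shown below. It records only constituent categories and spans (not full trees), processes the words left to right, and folds arc introduction and arc extension into one routine; the data structures are illustrative assumptions.

```python
# Grammar and lexicon of Example 1 above.
GRAMMAR = [("S", ("NP", "VP")), ("NP", ("ART", "N")),
           ("NP", ("ART", "ADJ", "N")), ("VP", ("V", "NP"))]
LEXICON = {"the": ["ART"], "old": ["ADJ", "N"], "man": ["N", "V"], "boat": ["N"]}

def chart_parse(words):
    n = len(words)
    chart = set()        # completed constituents: (category, start, end)
    arcs = set()         # active arcs: (lhs, rhs, dot, start, end)
    agenda = []

    def add_arc(lhs, rhs, dot, start, end):
        if dot == len(rhs):
            agenda.append((lhs, start, end))   # arc completed: a new constituent
        else:
            arcs.add((lhs, rhs, dot, start, end))

    def extend(cat, p1, p2):
        """Add constituent cat(p1,p2) to the chart: arc introduction + extension."""
        if (cat, p1, p2) in chart:
            return
        chart.add((cat, p1, p2))
        for lhs, rhs in GRAMMAR:               # arc introduction: rules starting with cat
            if rhs[0] == cat:
                add_arc(lhs, rhs, 1, p1, p2)
        for (lhs, rhs, dot, s, e) in list(arcs):   # extend arcs ending at p1 that need cat
            if e == p1 and rhs[dot] == cat:
                add_arc(lhs, rhs, dot + 1, s, p2)

    for i, word in enumerate(words):
        for cat in LEXICON[word]:              # look up the next word's interpretations
            agenda.append((cat, i, i + 1))
        while agenda:                          # process the agenda before the next word
            extend(*agenda.pop())
    return ("S", 0, n) in chart

print(chart_parse(["the", "old", "man", "the", "boat"]))  # expected: True
```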

Transition Network Grammars


Transition network grammars are based on networks of nodes connected by labeled arcs. One of the nodes is specified as the initial state, or start state. Consider the network named NP in the following grammar, with the initial state labeled NP and each arc labeled with a word category.

Starting at the initial state, you can traverse an arc if the current word in the sentence is in the
category on the arc. If the arc is followed, the current word is updated to the next word. A
phrase is a legal NP if there is a path from the node NP to a pop arc (an arc labeled pop) that
accounts for every word in the phrase. This network recognizes the same set of sentences as
the following context-free grammar:


NP -> ART NP1
NP1 -> ADJ NP1
NP1 -> N

Consider parsing the NP a purple cow with this network. Starting at the node NP, you can
follow the arc labelled art, since the current word is an article— namely, a. From node NP1
you can follow the arc labeled adj using the adjective purple, and finally, again from NP1,
you can follow the arc labeled noun using the noun cow. Since you have reached a pop arc, a
purple cow is a legal NP.

Consider finding a path through the S network for the sentence The purple cow ate the grass.
Starting at node S, to follow the arc labeled NP, you need to traverse the NP network. Starting at node NP, traverse the network as before for the input the purple cow. Following the pop arc in the NP network, return to the S network and traverse the arc to node S1. From node S1 you follow the arc labeled verb using the word ate. Finally, the arc labeled NP can be
followed if you can traverse the NP network again. This time the remaining input consists of
the words the grass. You follow the arc labeled art and then the arc labeled noun in the NP
network; then take the pop arc from node NP2 and then another pop from node S3. Since you
have traversed the network and used all the words in the sentence, The purple cow ate the
grass is accepted as a legal sentence.
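The traversal just described can be sketched as a small recursive transition network interpreter. The node names, arc lists, and single-category lexicon below are illustrative assumptions; a push arc is represented by an arc whose label names another network.

```python
# Each word is mapped to a single category here for simplicity (real lexicons are ambiguous).
LEXICON = {"a": "ART", "the": "ART", "purple": "ADJ", "cow": "N",
           "grass": "N", "ate": "V"}

# Arcs are (label, destination); a label may be a lexical category, another
# network's name (a push arc), or "pop".
NETWORKS = {
    "NP": {"NP":  [("ART", "NP1")],
           "NP1": [("ADJ", "NP1"), ("N", "NP2")],
           "NP2": [("pop", None)]},
    "S":  {"S":   [("NP", "S1")],          # push arc into the NP network
           "S1":  [("V", "S2")],
           "S2":  [("NP", "S3")],
           "S3":  [("pop", None)]},
}

def traverse(network, node, words, pos):
    """Return the set of word positions reachable when `network` pops."""
    results = set()
    for label, dest in NETWORKS[network][node]:
        if label == "pop":
            results.add(pos)
        elif label in NETWORKS:                          # push arc: recurse into sub-network
            for new_pos in traverse(label, label, words, pos):
                results |= traverse(network, dest, words, new_pos)
        elif pos < len(words) and LEXICON.get(words[pos]) == label:
            results |= traverse(network, dest, words, pos + 1)   # consume one word
    return results

sentence = "the purple cow ate the grass".split()
print(len(sentence) in traverse("S", "S", sentence, 0))  # expected: True
```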


Feature Systems and Augmented Grammars

In natural languages there are often agreement restrictions between words and phrases.
For example, the NP "a men" is not correct English because the article a indicates a single
object while the noun "men" indicates a plural object; the noun phrase does not satisfy the
number agreement restriction of English. There are many other forms of agreement, including
subject- verb agreement, gender agreement for pronouns, restrictions between the head of a
phrase and the form of its complement, and so on. To handle such phenomena conveniently,
the grammatical formalism is extended to allow constituents to have features. For example,
we might define a feature NUMBER that may take a value of either s (for singular) or p (for
plural), and we then might write an augmented CFG rule such as

NP -> ART N only when NUMBER1 agrees with NUMBER2

This rule says that a legal noun phrase consists of an article followed by a noun, but only
when the number feature of the first word agrees with the number feature of the second. This
one rule is equivalent to two CFG rules that would use different terminal symbols for
encoding singular and plural forms of all noun phrases, such as

NP-SING -> ART-SING N-SING

NP-PLURAL -> ART-PLURAL N-PLURAL


While the two approaches seem similar in ease-of-use in this one example, consider that all
rules in the grammar that use an NP on the right-hand side would now need to be duplicated
to include a rule for NP-SING and a rule for NP-PLURAL, effectively doubling the size of
the grammar.

To accomplish this, a constituent is defined as a feature structure: a mapping from features to values that defines the relevant properties of the constituent.

By using feature-based conditions, augmented grammars can capture a wider range of linguistic phenomena, including agreement.

Augmented grammars provide more precise and detailed linguistic analysis and can handle complex linguistic phenomena more effectively.
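A minimal sketch of such a feature-augmented rule is given below: NP -> ART N succeeds only when the NUMBER features of the article and noun have a non-empty intersection, so that "a men" is rejected. The dictionary representation and the lexical entries are illustrative assumptions.

```python
# Feature structures as simple dicts; NUMBER values are sets of 's' / 'p'.
ART_ENTRIES = {"a": {"CAT": "ART", "NUMBER": {"s"}},
               "the": {"CAT": "ART", "NUMBER": {"s", "p"}}}   # 'the' is unmarked
N_ENTRIES = {"man": {"CAT": "N", "NUMBER": {"s"}},
             "men": {"CAT": "N", "NUMBER": {"p"}}}

def build_np(art, noun):
    """Apply NP -> ART N, enforcing number agreement; return None on failure."""
    agreement = art["NUMBER"] & noun["NUMBER"]
    if not agreement:
        return None                                  # the feature condition fails
    return {"CAT": "NP", "NUMBER": agreement}

print(build_np(ART_ENTRIES["the"], N_ENTRIES["men"]))  # NP with NUMBER {'p'}
print(build_np(ART_ENTRIES["a"], N_ENTRIES["men"]))    # None: "a men" is rejected
```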


Morphological Analysis and the Lexicon


Lexical Morphology: Lexical morphology is a specific type of morphology that refers to the
lexemes in a language. A lexeme is a basic unit of lexical meaning; it can be a single word or
a group of words.

Morphological Analysis: Morphological Analysis is the study of lexemes and how they are
created. The discipline is particularly interested in neologisms (newly created words from
existing words(root word)), derivation, and compounding. In morphological analysis each
token will be analysed as follows:

token -> lemma(root word) + part of speech + grammatical features

Examples: cats -> cat+N+plur

played -> play+V+past

katternas ->katt+N+plur+def+gen

Often non-deterministic (more than one solution):

plays -> play+N+plur

plays -> play+V+3sg
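The following toy sketch illustrates this kind of analysis by simple suffix stripping, reproducing the examples above; the rule list is an illustrative assumption, not a complete analyzer.

```python
# Each rule: (suffix to strip, analysis string to append to the lemma).
SUFFIX_RULES = [
    ("s",  "+N+plur"),     # cats   -> cat+N+plur
    ("ed", "+V+past"),     # played -> play+V+past
    ("s",  "+V+3sg"),      # plays  -> play+V+3sg
]

def analyze(token):
    """Return all analyses of the form lemma + POS + features (often more than one)."""
    analyses = []
    for suffix, feats in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix):
            analyses.append(token[: -len(suffix)] + feats)
    return analyses

print(analyze("plays"))    # ['play+N+plur', 'play+V+3sg'] -- non-deterministic
print(analyze("played"))   # ['play+V+past']
```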

Derivation in Lexical Morphology:

Derivation refers to a way of creating new words by adding affixes to the root of a word; this is also known as affixation. There are two types of affixes: prefixes and suffixes.

Prefixes: rewrite ( root is write), unfair (root is fair)

Suffixes: wanted, wants, wanting ( root is want )

Compounding: Compounding refers to the creation of new words by combining two or more
existing words together. Now here are some examples of compounding:

Green + house = Greenhouse

Mother + in + law = Mother-in-law

Motor + bike = Motorbike

Cook + book = Cookbook


Foot + ball = Football

The lexicon must contain information about all the different words that can be used,
including all the relevant feature value restrictions. When a word is ambiguous, it may be
described by multiple entries in the lexicon, one for each different use.

Most English verbs, for example, use the same set of suffixes to indicate different forms: -s is
added for third person singular present tense, -ed for past tense, -ing for the present
participle, and so on.

The idea is to store the base form of the verb in the lexicon and use context-free rules to
combine verbs with suffixes to derive the other entries. Consider the following rule for
present tense verbs:

(V ROOT ?r SUBCAT ?s VFORM pres AGR 3s) -> (V ROOT ?r SUBCAT ?s VFORM base) (+S)

where +S is a new lexical category that contains only the suffix morpheme -s. This rule,
coupled with the lexicon entry

want:

(V ROOT want

SUBCAT {_np_vp:inf _np_vp:inf}

VFORM base)

would produce the following constituent given the input string want –s

want:

(V ROOT want

SUBCAT {_np_vp:inf _np_vp:inf}

VFORM pres AGR 3s)

Another rule would generate the constituents for the present tense form not in third person
singular, which for most verbs is identical to the root form:

(V ROOT ?r SUBCAT ?s VFORM pres AGR {1s 2s 1p 2p 3p}) -> (V ROOT ?r SUBCAT ?s VFORM base)

But this rule needs to be modified in order to avoid generating erroneous interpretations.
Currently, it can transform any base form verb into a present tense form, which is clearly
wrong for some irregular verbs. For instance, the base form be cannot be used as a present
form (for example, *We be at the store). To cover these cases, a feature is introduced to
identify irregular forms. Specifically, verbs with the binary feature +IRREG-PRES have irregular present tense forms. Now the rule above can be stated correctly:

(V ROOT ?r SUBCAT ?s VFORM pres AGR {1s 2s 1p 2p 3p}) -> (V ROOT ?r SUBCAT ?s VFORM base IRREG-PRES -)
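The intent of these lexical rules can be sketched as follows: the derived present tense entry copies ROOT and SUBCAT from the base form stored in the lexicon and adds VFORM pres and AGR 3s when the +S suffix is attached. The dict representation is an assumption for illustration.

```python
# Base lexicon entry for "want", following the feature names used in the text.
BASE_LEXICON = {
    "want": {"CAT": "V", "ROOT": "want",
             "SUBCAT": "_np_vp:inf", "VFORM": "base"},
}

def apply_pres_3s_rule(base_entry):
    """V[VFORM base] +S  ->  V[VFORM pres, AGR 3s], copying ROOT and SUBCAT."""
    derived = dict(base_entry)                 # ROOT, SUBCAT, CAT are copied unchanged
    derived.update({"VFORM": "pres", "AGR": "3s"})
    return derived

print(apply_pres_3s_rule(BASE_LEXICON["want"]))
# {'CAT': 'V', 'ROOT': 'want', 'SUBCAT': '_np_vp:inf', 'VFORM': 'pres', 'AGR': '3s'}
```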

Parsing with Features

 Parsing is the process of analyzing the syntactic structure of a sentence according to a


specific grammar or set of rules. It involves breaking down a sentence into its parts,
such as phrases and clauses, and finding the relationships between these components.
The goal of parsing is to understand the underlying structure of a sentence and how its
individual elements combine to convey meaning.

 Parsing with features in NLP involves incorporating additional linguistic information,


called features, into the parsing process. These features provide additional context and
constraints to improve the accuracy and precision of parsing models.

 Features in parsing represent specific linguistic properties of words and phrases. These properties can include part-of-speech tags, tense, gender, number, and more.

 The goal of incorporating features into parsing is to enhance the accuracy and quality
of the parsed output.

 One common approach is to use probabilistic parsing models, such as statistical


parsers, and augment them with feature-based models. These models assign
probabilities to different parsing choices based on the observed features in the input
sentence.

 Features can be defined at different levels. For example, at the word level, features may include part-of-speech tags, lemma forms, or morphological properties (e.g. -ed, -s, -ing). At the syntactic level, features may involve phrase types, head words, or dependencies between words.


 Features can be handcrafted by linguists or automatically learned from annotated


training data using machine learning techniques.

Example features that can be used in parsing include:

1. Part-of-speech tags: Providing information about the grammatical category of a


word (noun, pronoun, adverb, adjective…) which can guide parsing decisions.

2. Syntactic parsing: Identifying the correct structure of the sentence then finding the
relationships between words and phrases.

3. Semantic parsing: Identifying the meaning of the words used in the sentence and
understanding the relationship between them.

4. Named entities: Identifying and classifying named entities in the sentence, such as
person names, locations, or organizations

Augmented Transition Networks

The ATN (augmented transition network) is produced by adding new features to a recursive
transition network. Features in an ATN are traditionally called registers. Constituent
structures are created by allowing each network to have a set of registers. Each time a new
network is pushed, a new set of registers is created. As the network is traversed, these
registers are set to values by actions associated with each arc. When the network is popped,
the registers are assembled to form a constituent structure, with the CAT slot being the
network name.

Consider the simple NP network shown in the figure below; the actions are listed in the table below the network. ATNs use a special mechanism to extract the result of following an arc.
When a lexical arc, such as arc 1, is followed, the constituent built from the word in the input
is put into a special variable named "*".

The action DET := * then assigns this constituent to the DET register

The second action on this arc, AGR := AGR* assigns the AGR register of the network to the
value of the AGR register of the new word (the constituent in "*"). Agreement checks are
specified in the tests. A test is an expression that succeeds if it returns a nonempty value and
fails if it returns the empty set or nil.


A simple NP network

If a test fails, its arc is not traversed. The test on arc 2 indicates that the arc can be followed
only if the AGR feature of the network has a non-null intersection with the AGR register of
the new word (the noun constituent in "*").

Features on push arcs are treated similarly. The constituent built by traversing the NP
network is returned as the value "*". Thus in the S network below, the action on the arc from S to S1, SUBJ := *, would assign the constituent returned by the NP network to the register SUBJ.
The test on arc 2 will succeed only if the AGR register of the constituent in the SUBJ register
has a non-null intersection with the AGR register of the new constituent (the verb). This test
enforces subject- verb agreement.

A simple S Network
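A hedged sketch of the register mechanism for the NP network is shown below: registers are set by arc actions, the agreement test on arc 2 intersects AGR values, and the registers are assembled into the constituent on the pop. The word entries, register names, and the omission of the ADJ loop are simplifications for illustration.

```python
# Lexical entries with AGR values as sets, as in the agreement test described above.
LEXICON = {
    "a":    {"CAT": "ART", "AGR": {"3s"}},
    "the":  {"CAT": "ART", "AGR": {"3s", "3p"}},
    "dog":  {"CAT": "N",   "AGR": {"3s"}},
    "dogs": {"CAT": "N",   "AGR": {"3p"}},
}

def np_network(words):
    """Traverse a two-word ART N phrase; return the constituent or None if rejected."""
    registers = {"CAT": "NP"}               # a fresh register set for this network
    star = LEXICON[words[0]]                # "*" holds the constituent built on the arc
    if star["CAT"] != "ART":
        return None
    registers["DET"] = words[0]             # action on arc 1: DET := *
    registers["AGR"] = star["AGR"]          #                  AGR := AGR*
    star = LEXICON[words[1]]
    if star["CAT"] != "N":
        return None
    if not registers["AGR"] & star["AGR"]:  # test on arc 2: agreement check
        return None                         # the test fails, so the arc is not traversed
    registers["HEAD"] = words[1]            # action on arc 2: HEAD := *
    registers["AGR"] &= star["AGR"]
    return registers                        # pop: the registers form the constituent

print(np_network(["the", "dogs"]))   # NP constituent with AGR {'3p'}
print(np_network(["a", "dogs"]))     # None: number agreement fails
```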


Bayes' rule

 Bayes' rule is a formula that describes how to update your beliefs about something
based on new evidence

 In NLP, Bayes' Rule is used in various applications, such as text classification, spam filtering, and sentiment analysis.

 Bayes' Rule calculates the conditional probability of an event A given an event B,

 It can be expressed as:

o P(A|B) = (P(B|A) * P(A)) / P(B)

 Where:

o P(A|B) the probability of A occurring given that B has occurred

o P(B|A) the probability of B occurring given that A has occurred

o P(A) the initial probability of A occurring

o P(B) the initial probability of B occurring

 Let's consider a simple example of spam email classification using Bayes' Rule.
Assume we have a dataset of emails labeled as spam or not spam, and we want to
classify a new email as spam or not spam based on its content.

o P(spam|word) is the probability that an email is spam given a specific word


occurrence.

o P(word|spam) is the probability of seeing that specific word in a spam email.

o P(spam) is the overall probability of an email being spam.

o P(word) is the overall probability of encountering that specific word in any


email.

 The formula can be applied as follows:

o P(spam|word) = (P(word|spam) * P(spam)) / P(word)
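A toy numeric illustration of applying this formula (the counts and the word "free" are made up for illustration):

```python
# Suppose 200 of 1000 training emails are spam, the word "free" occurs in
# 120 of the 200 spam emails and in 40 of the 800 non-spam emails.
p_spam = 200 / 1000                       # P(spam)
p_word_given_spam = 120 / 200             # P("free" | spam)
p_word = (120 + 40) / 1000                # P("free") over all emails

# Bayes' rule: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))        # 0.75
```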


Shannon's Game

The Shannon game is a thought experiment in linguistics and natural language processing
(NLP) that asks participants to guess the next letter in a sequence based on its preceding
context.

If the next letter in a sequence is highly predictable, then the game will be easy to win.
However, if the next letter is not predictable, then the game will be more difficult to win.

The Shannon game can be used to measure the entropy of a language. Entropy is a measure
of the uncertainty or randomness in a sequence. The higher the entropy, the more uncertain
the sequence is. The lower the entropy, the more predictable the sequence is.

Here is an example of how the Shannon game can be used in NLP:

 A language model is trained on a dataset of text.

 The language model is then used to predict the next letter in a sequence.

 The Shannon game is used to measure the predictability of the language model's
predictions.

 The predictability of the language model's predictions is used to improve the language
model's accuracy.

 So, for example:

o q-------
o qu------
o que-----
o ques----
o quest---
o questi--
o questio-
o question
The Shannon game is a powerful tool for understanding and improving the predictability of
language
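A minimal sketch of playing the Shannon game with a character bigram model: the model counts which letter follows each letter in a tiny, made-up corpus and ranks its guesses for the next letter. The corpus and model are illustrative assumptions.

```python
from collections import Counter, defaultdict

corpus = "the question is the quest for the questions we can answer"

# Count bigram transitions: which character follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def guess_next(prev_char):
    """Return the model's guesses for the next letter, best first."""
    return [ch for ch, _ in counts[prev_char].most_common()]

print(guess_next("q"))   # highly predictable: 'u' is the only guess (easy to win)
print(guess_next("e"))   # less predictable: several candidate letters (harder to win)
```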


Entropy and Cross Entropy


Entropy and cross-entropy are important concepts in Natural Language Processing (NLP)
related to information theory and probabilistic models.
Entropy
 Entropy measures the average amount of information or uncertainty in a random
variable or a probability distribution.
 In NLP, entropy is used to quantify the uncertainty or unpredictability of linguistic
events or sequences of words.
 Higher entropy values indicate higher uncertainty or unpredictability, whereas lower
entropy values indicate more predictability.
For example, consider a language model trained on a corpus of text. If the model has
high entropy, it means it is more uncertain about the next word given the previous
context, while low entropy implies higher predictability.
Cross Entropy
 Cross-entropy measures the average number of bits needed to encode events from one
probability distribution using another distribution as a reference.
 In NLP, cross-entropy is commonly used to evaluate the performance of language
models by comparing their predicted distributions with the actual distributions.
 Lower cross-entropy values indicate better performance and higher similarity between
the predicted and true distributions.
For example, when evaluating a language model, a lower cross-entropy indicates that the
model's predicted probabilities align closely with the actual probabilities of words in the
training data.
Here is an example of how entropy and cross entropy can be used in NLP:
 A language model is trained on a dataset of text.
 The language model is then used to predict the next word in a sequence.
 The entropy of the language model's predictions is calculated.
 The cross entropy between the language model's predictions and the actual next word
is calculated.
 The cross entropy is used to update the language model's parameters.
This process is repeated until the cross entropy is minimized. The lower the cross entropy, the
better the language model's predictions.
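A short sketch of computing entropy and cross-entropy (in bits) for small discrete distributions over a tiny vocabulary; the distributions are made up for illustration.

```python
import math

def entropy(p):
    """H(p) = -sum over x of p(x) * log2 p(x)."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum over x of p(x) * log2 q(x); p is the true, q the model distribution."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

true_dist  = {"flight": 0.5, "meal": 0.25, "seat": 0.25}
good_model = {"flight": 0.4, "meal": 0.3,  "seat": 0.3}
poor_model = {"flight": 0.1, "meal": 0.1,  "seat": 0.8}

print(round(entropy(true_dist), 3))                    # 1.5 bits of uncertainty
print(round(cross_entropy(true_dist, good_model), 3))  # close to the entropy: good model
print(round(cross_entropy(true_dist, poor_model), 3))  # much larger: worse model
```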

