
Introduction to

Artificial Intelligence

MODULE 03
Natural Language Processing

Department of Computer Science, YIASCM


Index:
• Introduction to NLP
• Basics of Syntactic Processing
• Meaning, concept of parser
• Types of Parsing, Concept of derivation, Types of derivation
• Concept of grammar, Phrase structure or Constituency Grammar
• Dependency Grammar, Context-Free Grammar
• Basic of Semantic analysis
• Basics of grammar free analyzers

23-10-2024 Department of Computer Science, YIASCM 2


What is NLP?

• NLP stands for Natural Language Processing, a field at the intersection of computer
science, linguistics, and artificial intelligence.
• It is the technology that is used by machines to understand, analyse,
manipulate, and interpret human languages.
• It helps developers to organize knowledge for performing tasks such as translation,
automatic summarization, Named Entity Recognition (NER), speech recognition,
relationship extraction, and topic segmentation.



History of NLP
● Natural Language Processing started in 1950, when Alan Mathison Turing
published the article "Computing Machinery and Intelligence".
● It is based on artificial intelligence and concerns the automatic interpretation and
generation of natural language. As the technology evolved, different approaches
emerged to deal with NLP tasks.
● Heuristics-Based NLP: The earliest approach to NLP, based on rules defined by
hand from domain knowledge and expertise. Example: regular expressions (regex).
● Statistical Machine Learning-Based NLP: Based on statistical methods and
machine learning algorithms. In this approach, algorithms learn patterns from the
data and apply them to various tasks. Examples: Naive Bayes,
support vector machines (SVM), hidden Markov models (HMM), etc.



History of NLP

• Neural Network-based NLP: This is the latest approach, which came with the
evolution of neural-network-based learning, known as deep learning. It provides
good accuracy, but it is a very data-hungry and time-consuming approach.
• It requires high computational power to train the model. Furthermore, it is based on
neural network architecture. Examples: Recurrent neural networks (RNNs), Long
short-term memory networks (LSTMs), Convolutional neural networks (CNNs),
Transformers, etc.



Components of NLP
There are two main components of NLP:
● Natural Language Understanding (NLU): Understanding involves mapping
the given input in natural language into useful representations and
analyzing different aspects of the language.
● Natural Language Generation (NLG): The process of producing
meaningful phrases and sentences in the form of natural language from
some internal representation.



Components of NLP

NLG involves:
● Text planning: Retrieving the relevant content from a
knowledge base.
● Sentence planning: Choosing the required words, forming
meaningful phrases, and setting the tone of the sentence.
● Text realization: Mapping the sentence plan into sentence
structure.



Difficulties in NLU

● NL has an extremely rich form and structure.


● It is very ambiguous. There can be different levels of ambiguity:
❑ Lexical ambiguity: Ambiguity at a very primitive level, such as the word level.
● For example, should the word “board” be treated as a noun or a verb?
❑ Syntax-level ambiguity: A sentence can be parsed in different ways.
● For example, “The chicken is ready to eat.” This sentence has two possible
meanings: the chicken (as food) is prepared and ready for someone to eat,
or the chicken (as a live animal) is hungry and ready to eat.



Difficulties in NLU

❑ Referential ambiguity: Referring to something using pronouns.


For example, Rima went to Gauri. She said, “I am tired.” -
Exactly who is tired?
• One input can have several different meanings.
• Many inputs can mean the same thing.



NLP Terminology

● Phonology: It is the study of organizing sounds systematically.
● Morphology: It is the study of the construction of words from primitive
meaningful units.
● Morpheme: It is a primitive unit of meaning in a language.
● Syntax: It refers to arranging words to make a sentence. It also involves
determining the structural role of words in the sentence and in phrases.
● Semantics: It is concerned with the meaning of words and how to
combine words into meaningful phrases and sentences.



NLP Terminology

● Pragmatics: It deals with using and understanding sentences in


different situations and how the interpretation of the sentence is
affected.
● Discourse: It deals with how the immediately preceding sentence
can affect the interpretation of the next sentence.
● World Knowledge: It includes the general knowledge about the
world.



ACTIVITY - 10
1. What is Natural Language Processing (NLP), and how does it
differ from traditional programming languages? Provide a brief
overview of NLP techniques used for language understanding.
2. Why is NLP important in today's digital age? Explain examples of
NLP applications in everyday life.
3. How has NLP evolved over the years? Briefly explain key
milestones in the history of NLP, from early rule-based systems to
modern machine learning approaches.



Steps in NLP
There are five general steps:



1. Lexical Analysis
Lexical analysis is the process of identifying and analyzing the structure
of words in a given text. In programming and linguistics, it involves
breaking down the input text into smaller units called lexemes (words,
phrases, or tokens) that form the basis of meaning in a language.
Lexicon: A collection of words and phrases in a language.
Lexical Analysis: Dividing the text into paragraphs, sentences, and words,
identifying tokens that will be further processed by a compiler or
interpreter.
Example: "AI is interesting."
Lexical analysis yields the tokens 'AI', 'is', 'interesting', '.'.
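The tokenization step above can be sketched in a few lines; a minimal illustration using Python's standard `re` module (real lexical analyzers also handle contractions, numbers, and multi-word tokens):

```python
import re

def tokenize(text):
    """Split raw text into word and punctuation tokens (a minimal lexer)."""
    # \w+ matches runs of letters/digits; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("AI is interesting."))  # ['AI', 'is', 'interesting', '.']
```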



2. Syntactic Analysis (Parsing)

Syntactic analysis involves examining the grammatical structure of a sentence by


analyzing its components (words) and arranging them in a way that reflects their
relationships.
It checks if a sentence follows the rules of grammar to create meaningful sentences.
For example: "The dog barked loudly at the stranger."
Part-of-Speech (POS) Tagging:
"The" → Determiner (DET)
"dog" → Noun (N)
"barked" → Verb (V)
"loudly" → Adverb (ADV)
"at" → Preposition (P)
"the" → Determiner (DET)
"stranger" → Noun (N)

Parse tree (bracketed form):
(S (NP (DET "The") (N "dog"))
   (VP (V "barked") (ADV "loudly")
       (PP (P "at") (NP (DET "the") (N "stranger")))))
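The POS tagging above can be sketched as a simple dictionary lookup; a minimal illustration (the LEXICON table is hand-written for this one sentence — real taggers are statistical or neural):

```python
# Hand-written POS lexicon covering only this example sentence.
LEXICON = {"the": "DET", "dog": "N", "barked": "V",
           "loudly": "ADV", "at": "P", "stranger": "N"}

def pos_tag(sentence):
    """Tag each word with its part of speech by lexicon lookup."""
    return [(word, LEXICON[word.lower()]) for word in sentence.split()]

print(pos_tag("The dog barked loudly at the stranger"))
```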
3. Semantic Analysis

Semantic analysis is the process of extracting the exact or dictionary


meaning from the text. It checks if the sentence conveys a meaningful
idea.
It involves mapping syntactic structures (the grammatical arrangement
of words) to real-world objects or concepts in the task domain.
For example, the phrase "hot ice-cream" would be disregarded by a
semantic analyzer, even though it is syntactically correct. This is because
it violates real-world knowledge—ice cream is cold, so the phrase "hot
ice-cream" doesn't make sense.
Semantic analysis ensures that a sentence is not only grammatically
correct but also meaningful in context.
4. Discourse Integration
It is the process of understanding how sentences relate to each other within a
larger context. The meaning of a sentence can be influenced by the sentences that
come before and after it.

✓ Contextual Understanding:
The meaning of a sentence often relies on its context within a larger text or
conversation. This includes the preceding and following sentences.
For example, in a dialogue:
Sentence 1: "I went to the bank."
Sentence 2: "It was closed for renovations."
Here, Sentence 2 clarifies the context of Sentence 1, indicating why the
speaker couldn't conduct their business at the bank.



✓Coherence and Cohesion:
Coherence refers to the overall logical flow of ideas in a text.
Cohesion involves the grammatical and lexical connections between
sentences, such as the use of pronouns or conjunctions.
For example:
Sentence 1: "Sarah loves hiking."
Sentence 2: "She goes every weekend."
The pronoun "She" refers back to "Sarah," creating cohesion and
maintaining coherence in the text.



5. Pragmatic Analysis
In this step, what was said is re-interpreted in terms of what was actually
meant. It involves deriving those aspects of language which require real-world
knowledge. It is mainly used in chatbots and sentiment analysis.

EX:
Person A: “Are you coming to the party tonight?”
Person B: “I have to study for my exam.”

Pragmatic Interpretation: Person B is likely implying that they will not


attend the party because studying takes precedence.



Syntactic analysis
• Syntactic analysis, also called syntax analysis or parsing, is the second phase
of NLP. The purpose of this phase is to analyze the grammatical structure of
the text.
• Syntax analysis checks the text for well-formedness against the rules of
formal grammar. For example, a phrase like "hot ice-cream" passes syntactic
analysis but would be rejected by a semantic analyzer.
• In this sense, syntactic analysis or parsing may be defined as the
process of analyzing strings of symbols in natural language
according to the rules of a formal grammar.



Parser
• It may be defined as the software component designed for taking input data
(text) and giving structural representation of the input after checking for
correct syntax as per formal grammar.
• It also builds a data structure generally in the form of parse tree or abstract
syntax tree or other hierarchical structure.
• Parsing is a crucial technique in NLP that involves analyzing a string of
symbols, either in natural language or computer languages, according to the
rules of a formal grammar.
• Parsing helps in understanding the syntactic structure of a sentence and is
essential for various NLP tasks such as machine translation, question answering,
and text-to-speech systems.



Concept of Parser



The main roles of the parser include:
● To report any syntax errors.
● To recover from commonly occurring errors so that processing
of the remainder of the program can continue.
● To create a parse tree.
● To create a symbol table.
● To produce intermediate representations (IR).



Types of Parsing in NLP
Syntactic Parsing (Syntax Analysis)
• Goal: to analyze the grammatical structure of a sentence.
• Output: a parse tree (syntax tree) that represents the syntactic structure according
to a given grammar.
Semantic Parsing
• Goal: to convert a natural language sentence into a formal representation of its
meaning.
• Output: a logical form or some other representation of the sentence's meaning.



There are two main approaches in syntactic parsing
1. Constituency Parsing
Constituency Parsing breaks down a sentence into its parts, or "constituents," like
noun phrases and verb phrases. It creates a parse tree that shows how these parts fit
together to form the sentence's hierarchical structure.
"The professor teaches computer science."
S (Sentence): The whole sentence.
NP (Noun Phrase): "The professor"
Det (Determiner): "The"
N (Noun): "professor"
VP (Verb Phrase): "teaches computer science"
V (Verb): "teaches"
NP (Noun Phrase): "computer science"
N (Noun): "computer science" (treated as a compound noun here)



2. Dependency Parsing
Dependency Parsing focuses on the grammatical relationships between words. It
constructs a tree where each word is connected to another word, indicating a
dependency relationship, such as subject-verb-object.
"The student submitted the assignment."
"Submitted" is the main verb.
"Student" is the subject dependent on the verb "submitted."
"Assignment" is the object dependent on the verb "submitted."
"The" is dependent on "student" and "assignment" as their determiner.
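The dependency analysis above can be written down as head–dependent arcs; a minimal sketch (the relation labels borrow common Universal Dependencies names purely as illustration):

```python
# Head–dependent arcs for "The student submitted the assignment."
# Each entry: dependent -> (head, relation label).
ARCS = {
    "student":    ("submitted",  "nsubj"),  # subject of the verb
    "assignment": ("submitted",  "obj"),    # object of the verb
    "The":        ("student",    "det"),    # determiner of "student"
    "the":        ("assignment", "det"),    # determiner of "assignment"
}

# The root (the main verb) is the one word that depends on nothing.
words = {"The", "student", "submitted", "the", "assignment"}
root = words - set(ARCS)
print(root)  # {'submitted'}
```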



Semantic Parsing
1. Extracting Meaning: Unlike syntactic parsing, which looks at the
structure, semantic parsing tries to understand what the sentence means.
2. Context Understanding: It analyzes the roles of words in a specific
context and how they relate to each other.
3. Applications:
• Question Answering: Helps in understanding and answering questions correctly.
• Knowledge Base Population: Extracts information from text to fill in databases.
• Text Understanding: Helps in comprehending the overall meaning of documents
and texts.



Parsing Techniques in NLP
The fundamental link between a sentence and its grammar is derived from a parse tree. A
parse tree is a tree that defines how the grammar was utilized to construct the sentence.
There are mainly two parsing techniques, commonly known as top-down and bottom-up.
Top-down:
• Starts from the root of the parse tree (the start symbol) and works down to the leaves
(the words in the sentence).
• Top-down parsing is a method of parsing where the process starts from the root of the
parse tree (usually the start symbol of a grammar) and tries to break down the
sentence into smaller components (like words and phrases), following the grammar
rules. The parser predicts what structure the sentence might have and tries to match it
with the actual input.
• This type of parsing uses a recursive approach, meaning it keeps breaking down the
sentence until it matches the given input or finds that no valid parse exists.
Working:
1. Start with the Root: Begin with the start symbol (S) of the
grammar.
2. Expand Using Grammar Rules: Look at rules where S is on the
left side and use them to expand the tree.
3. Create Tree from Top to Bottom: Continue expanding the tree
from top to bottom until reaching the words of the sentence.
4. Backtracking: If a path doesn't match the sentence, backtrack to the
previous step and try a different rule.



Example 01: For the sentence "the cat sat":
● Start with S.
● Use the rule S → NP VP.
● Expand NP to Det N (using the rule NP → Det N).
● Expand VP to V (using the rule VP → V).

The parse tree looks like this:
(S (NP (Det "the") (N "cat")) (VP (V "sat")))


Example 02
S → NP VP (A sentence is made of a noun phrase and a verb phrase)
NP → Det N (A noun phrase is a determiner followed by a noun)
VP → V NP (A verb phrase is a verb followed by a noun phrase)
Det → 'the'
N → 'cat' | 'mouse'
V → 'chases'
And the sentence to parse: "the cat chases the mouse".



Step-by-Step Top-Down Parsing

1. Start with the start symbol (S):


The sentence structure is predicted as S → NP VP.
2. Expand NP (Noun Phrase):
NP → Det N.
Now, predict that the first word in the sentence will be a Det (determiner) followed by
a N (noun).
3. Match with input:
The input starts with "the", which matches the rule Det → 'the'.
Now, look for a noun. The next word in the input is "cat", which matches N → 'cat'.
4. Expand VP (Verb Phrase):
VP → V NP.
The verb phrase is predicted as a verb followed by a noun phrase.
5. Match with input:
The next word in the input is "chases", which
matches V → 'chases'.
Now, look for another noun phrase (NP).

6. Expand NP:
NP → Det N again.
The next word is "the", matching Det → 'the',
and the last word is "mouse", matching N →
'mouse'.

7. Complete the parse:


The whole input "the cat chases the mouse" is
successfully parsed according to the grammar.
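The top-down procedure above can be sketched as a small recursive-descent parser with backtracking; a minimal illustration of the technique (the GRAMMAR encoding and the `parse` helper are assumptions of this sketch, not a production parser):

```python
# Grammar from the example, encoded as: non-terminal -> list of productions.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["cat"], ["mouse"]],
    "V":   [["chases"]],
}

def parse(symbol, tokens, pos):
    """Try to derive tokens[pos:] from `symbol`.

    Returns the input position reached on success, or None on failure.
    """
    if symbol not in GRAMMAR:  # terminal: must match the current input word
        return pos + 1 if pos < len(tokens) and tokens[pos] == symbol else None
    for production in GRAMMAR[symbol]:  # try each rule, backtracking on failure
        p = pos
        for sym in production:
            p = parse(sym, tokens, p)
            if p is None:
                break
        else:
            return p  # every symbol of this production matched
    return None

tokens = "the cat chases the mouse".split()
print(parse("S", tokens, 0) == len(tokens))  # True: the whole sentence parses
```

The recursion mirrors the slide's steps: each call predicts a structure from the grammar and backtracks to an alternative production when the prediction does not match the input.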



2. Bottom-Up Parsing
• Starts from the words in the sentence and works up to the root of the parse tree
(the start symbol).
• Bottom-Up Parsing is a parsing technique in which the parser starts from the
input (the individual words or tokens of a sentence) and tries to build the parse
tree by working its way upwards, combining smaller parts (like words or phrases)
into larger structures until it reaches the start symbol of the grammar.
• The parser begins with the input words and repeatedly tries to find grammar rules
that can combine them into bigger phrases.
• Reduces small parts (like words) into larger parts (like noun phrases or verb
phrases) based on grammar rules.
• Continues this process until the whole sentence is reduced to the start symbol of
the grammar (e.g., a "sentence").



Working:
1. Start with the Words: Begin with the words of the sentence as the
leaves of the tree.
2. Apply Grammar Rules: Use rules to combine words into larger
constituents.
3. Build Tree from Bottom to Top: Continue combining constituents
until reaching the start symbol (S) at the root.
4. Reduction: Replace parts of the sentence with the left side of
matching rules, reducing the sentence step by step until only the start
symbol remains.



Example: For the sentence "the cat sat":
● Start with the words: "the", "cat", "sat".
● Combine "the" and "cat" into NP (using the rule NP → Det N).
● Combine "sat" into VP (using the rule VP → V), then combine NP and VP
into S (using the rule S → NP VP).

The parse tree looks like this:
(S (NP (Det "the") (N "cat")) (VP (V "sat")))


Example:
Consider the sentence:
The dog barks.
Grammar Rules:
S → NP VP (A sentence is made of a noun phrase followed by a verb
phrase)
NP → Det N (A noun phrase is made of a determiner and a noun)
VP → V (A verb phrase is just a verb)
Det → 'The'
N → 'dog'
V → 'barks'



Step-by-Step Process for bottom-up parsing:
1. Input: "The dog barks."
2. Look at the words: identify parts of speech (Det = "The", N = "dog", V =
"barks").
Det: "The"
N: "dog"
V: "barks"
3. Apply the rules bottom-up:
First, combine Det + N → NP:
"The" + "dog" → NP ("The dog")
Now combine V → VP:
"barks" → VP



4. Final reduction:
Combine NP + VP → S (Sentence):
"The dog" + "barks" → S (The dog barks)
The parser successfully constructs a valid
parse tree for the sentence, starting from
the individual words and working its way
up to form the full sentence.
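The bottom-up procedure above can be sketched as a simple shift-reduce loop; a minimal illustration (the RULES encoding and the `shift_reduce` helper are assumptions of this sketch):

```python
# Productions as (right-hand side, left-hand side). Word rules come first so
# shifted words become their parts of speech before phrase rules apply.
RULES = [
    (("The",), "Det"),
    (("dog",), "N"),
    (("barks",), "V"),
    (("Det", "N"), "NP"),
    (("V",), "VP"),
    (("NP", "VP"), "S"),
]

def shift_reduce(tokens):
    """Shift words onto a stack; reduce whenever a rule's RHS sits on top."""
    stack, tokens = [], list(tokens)
    while tokens or stack != ["S"]:
        for rhs, lhs in RULES:
            if len(stack) >= len(rhs) and tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]      # reduce: replace RHS with LHS
                break
        else:
            if not tokens:
                return None                    # stuck: cannot reduce or shift
            stack.append(tokens.pop(0))        # shift the next input word
    return "S"

print(shift_reduce("The dog barks".split()))  # S
```

The stack replays the slide's reductions: "The" "dog" become Det N, then NP; "barks" becomes V, then VP; finally NP VP reduces to S.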



Top-Down Parsing vs. Bottom-Up Parsing

● Top-down parsing starts evaluating the parse tree from the top and moves
downwards; bottom-up parsing starts from the lowest level of the tree and
moves upwards.
● Top-down parsing attempts to find the left-most derivation for a given string;
bottom-up parsing attempts to reduce the input string to the start symbol of
the grammar.
● Top-down parsing uses leftmost derivation; bottom-up parsing uses rightmost
derivation (in reverse).
● Top-down parsing searches for a production rule to be used to construct a
string; bottom-up parsing searches for a production rule to be used to reduce
a string to the start symbol of the grammar.


How Does the Parser Work?
● The first step is to identify the sentence's subject. The parser divides the
text sequence into groups of words that are associated with a phrase; the
collection of words that are related to one another in this way is referred to
as the subject.
● Syntactic parsing and parts of speech are based on context-free grammar:
structures determined by the arrangement of words, not by their context.
●The most important thing to remember is that grammar is always
syntactically valid, even if it may not make contextual sense.



Applications of Parsing in NLP
● Syntactic Analysis: Parsing helps in determining the syntactic structure of
sentences by detecting parts of speech, phrases, and grammatical relationships
between words. This information is critical for understanding sentence
grammar.
● Named Entity Recognition (NER): NER parsers can detect and classify
entities in text, such as the names of people, organizations, and locations.
This is essential for information extraction and text
comprehension.
● Semantic Role Labeling (SRL): SRL parsers determine the semantic roles
of words in a sentence, such as who is the “agent,” “patient,” or “instrument” in
a given activity. It is essential for understanding the meaning of sentences.



ACTIVITY - 11
Identify the syntactic components of the given sentences by breaking it down into its
basic parts of speech, such as noun phrases (NP), verb phrases (VP), adjectives (Adj),
prepositional phrases (PP), etc. Explain the semantic interpretation by describing the
meaning conveyed by the sentence, focusing on the roles of the subject, predicate, and
any additional descriptive elements.
1. The intelligent student solved the complex problem quickly.
2. The small child happily played with the colorful toys.
3. The dedicated team worked on the challenging project throughout the night.

[Form groups of 3-5 students. Solve for the above sentences and should be able to
explain it when asked.]



Concept of Derivation
● In order to get the input string, we need a sequence of production rules. A derivation
is the sequence of production rules applied to obtain the input string.

● In the context of parsing, a non-terminal is a symbol used in grammar rules that can
be further expanded or replaced by other symbols (both terminals and non-terminals).
Non-terminals represent abstract grammatical categories or structures, such as a
sentence (S), noun phrase (NP), or verb phrase (VP).

● During parsing, the non-terminal symbols are progressively replaced by other


symbols based on grammar rules (called production rules) until only terminal
symbols, which are the actual words or tokens in the sentence, remain. Non-terminals
help define the structure of a sentence but aren't the final words themselves.



Example Sentence: "The cat sleeps."
Step-by-Step Parsing Process (Non-terminals and Terminals):
1. Grammar Rules (Production Rules):
We'll define some basic production rules for parsing this sentence:
S → NP VP (A sentence can be a noun phrase followed by a verb phrase)
NP → Det N (A noun phrase can be a determiner followed by a noun)
VP → V (A verb phrase can just be a verb)
Det → "The" (The determiner "The")
N → "cat" (The noun "cat")
V → "sleeps" (The verb "sleeps")



2. Start with the Non-terminal S (Sentence):
S is the starting non-terminal that represents the entire sentence.
According to the rule:
S → NP VP
So, we break it down into a noun phrase (NP) and a verb phrase (VP).
3. Replace NP (Noun Phrase):
Now, we expand NP using the rule:
NP → Det N
This breaks the noun phrase into a determiner (Det) and a noun (N).
4. Replace Det and N (Terminal symbols):
Det → "The"
We replace the non-terminal Det with the terminal word "The".
N → "cat"
We replace the non-terminal N with the terminal word "cat".
So now, the NP is fully replaced by "The cat"



5. Replace VP (Verb Phrase)
Now, we move to the verb phrase VP and use the rule:
VP → V
This means the verb phrase is replaced by a verb V.
6. Replace V (Terminal symbol):
V → "sleeps"
We replace the non-terminal V with the terminal word "sleeps".

Final Breakdown of the Sentence:


S → NP VP
NP → Det N → "The cat"
VP → V → "sleeps"
Thus, "The cat sleeps." is generated through the replacement of non-
terminal symbols by terminals following the production rules.



Concept of Derivation: Example

Non-terminals: These are the abstract symbols that represent parts of the
sentence and need to be replaced:
○ S (Sentence)
○ NP (Noun Phrase)
○ VP (Verb Phrase)
○ Det (Determiner)
○ N (Noun)
○ V (Verb)
Terminals: These are the actual words in the sentence, which are the final
outputs after replacing non-terminals:
○ "The" (Determiner)
○ "cat" (Noun)
○ "sleeps" (Verb)
Types of Derivation
● Left-most Derivation
In the left-most derivation, the sentential form of an input is scanned and replaced
from the left to the right. The sentential form in this case is called the left-sentential
form.

● Right-most Derivation
In the right-most derivation, the sentential form of an input is scanned and replaced
from right to left. The sentential form in this case is called the right-sentential form.



Types of Derivation

● Left-most Derivation: Example


Sentence: “The cat sleeps”
Grammar Rules:
■ S → NP VP (A sentence is a noun phrase followed by a verb phrase)
■ NP → Det N (A noun phrase is a determiner followed by a noun)
■ VP → V (A verb phrase is a verb)
■ Det → "The"
■ N → "cat"
■ V → "sleeps"
Types of Derivation
Left-most Derivation: Example
In left-most derivation, we always expand the left-most non-terminal first.
Step-by-step process:
■ S → NP VP
We start with the left-most non-terminal S and expand it to NP and VP.
■ NP → Det N
Now, we expand the left-most non-terminal NP into Det and N.
■ Det → "The"
The left-most non-terminal is now Det, so we replace it with the terminal word
"The".
■ N → "cat"
Now the left-most non-terminal is N, so we replace it with "cat".
■ VP → V
Finally, we expand the remaining VP to V.
■ V → "sleeps"
We replace the V with the terminal word "sleeps".
Types of Derivation
Left-most Derivation: Example
Left-most derivation sequence:
○ S → NP VP
○ NP VP → Det N VP
○ Det N VP → "The" N VP
○ "The" N VP → "The" "cat" VP
○ "The cat" VP → "The cat" V
○ "The cat" V → "The cat sleeps"
Thus, in the left-most derivation, we expand non-terminals from left to right.
Types of Derivation

Right-most Derivation: Example


In right-most derivation, we always expand the right-most non-terminal
first.
Step-by-step process:
○ S → NP VP [Start with S and expand it to NP and VP.]
○ VP → V [Instead of expanding NP (left-most), we expand the right-most non-
terminal, which is VP, into V.]
○ V → "sleeps" [We replace V with the terminal word "sleeps".]
○ NP → Det N [Now, we go back to the left and expand NP into Det and N.]
○ N → "cat" [We replace the right-most non-terminal N with the terminal word
"cat".]
○ Det → "The" [Finally, we replace Det with "The".]
Types of Derivation
Right-most Derivation: Example
Right-most derivation sequence:
○ S → NP VP
○ NP VP → NP V
○ NP V → NP "sleeps"
○ NP "sleeps" → Det N "sleeps"
○ Det N "sleeps" → Det "cat" "sleeps"
○ Det "cat" "sleeps" → "The" "cat" "sleeps"
○ In right-most derivation, non-terminals are expanded from right to left.
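Both derivation orders can be reproduced with a few lines of string rewriting; a minimal sketch using the grammar from the slides (the `derive` helper is illustrative):

```python
# Grammar from the slides: each non-terminal has exactly one production here.
RULES = {"S": ["NP", "VP"], "NP": ["Det", "N"], "VP": ["V"],
         "Det": ["The"], "N": ["cat"], "V": ["sleeps"]}

def derive(start, leftmost=True):
    """Expand one non-terminal per step; return the sentential forms produced."""
    form, steps = [start], []
    while any(s in RULES for s in form):
        idxs = [i for i, s in enumerate(form) if s in RULES]
        i = idxs[0] if leftmost else idxs[-1]   # left-most or right-most pick
        form[i:i + 1] = RULES[form[i]]          # apply the production in place
        steps.append(" ".join(form))
    return steps

print(derive("S", leftmost=True))   # both orders end at "The cat sleeps"
print(derive("S", leftmost=False))
```

The only difference between the two runs is which non-terminal is expanded at each step, exactly as in the sequences above.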
Grammar

Grammar is essential for defining the syntactic structure of well-formed


sentences in both natural languages (like English, Hindi) and programming
languages (like C). It ensures that communication, whether between humans
or between humans and computers, is clear and meaningful.
❑ Natural Language Grammar
Defines rules for combining words to form meaningful sentences.
Example:
○ Sentence: "The cat sat on the mat."
○ Rule: [Determiner] + [Noun] + [Verb] + [Preposition] + [Determiner] +
[Noun]



❑ Formal Language Grammar (Programming)
Purpose: Defines rules for combining symbols to form valid programs.
Example:
Function in C:

int add(int a, int b) {


return a + b;
}
Rule: return_type function_name(parameter_list) { body }



Chomsky's Model of Grammar (1956)
Noam Chomsky introduced a mathematical model for grammars, which is useful for describing both natural
and programming languages.

Formal Grammar: A 4-Tuple Model


A grammar G can be formally represented as a 4-tuple (N, T, S, P):
1. N (Non-terminal symbols): Variables representing groups of symbols.
2. T or ∑ (Terminal symbols): Basic symbols from which strings are formed.
3. S (Start symbol): The initial non-terminal symbol, where S∈N.
4. P (Production rules): Rules for transforming non-terminal symbols into strings of
non-terminal and/or terminal symbols. Each rule has the form α → β, where α and β
are strings of symbols from (N ∪ T) and α contains at least one non-terminal symbol.



Example of a Grammar:
Consider the grammar G=(N,T,S,P) where:
N={S,A}
T={a, b}
S=S
P={S→ aA, A→ b}
This grammar describes a simple language where: The start symbol S can be replaced by
aA.
The non-terminal symbol A can be replaced by the terminal symbol b. Thus, the
derivation for the string "ab" could look like this:
S→ aA
A→ b
So, the string "ab" belongs to the language generated by this grammar.
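The derivation of "ab" can be checked mechanically; a minimal sketch (upper-case letters are treated as non-terminals, and the `derive` helper is illustrative):

```python
# Grammar G = (N, T, S, P): N = {S, A}, T = {a, b}, P = {S -> aA, A -> b}.
P = {"S": "aA", "A": "b"}

def derive(string="S"):
    """Apply productions until no non-terminal (upper-case letter) remains."""
    while any(ch.isupper() for ch in string):
        for nt, rhs in P.items():
            if nt in string:
                string = string.replace(nt, rhs, 1)  # rewrite one occurrence
                break
    return string

print(derive())  # ab
```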



Example of a Formal Grammar
For a simple arithmetic expression grammar:
1. N: {Expression, Term, Factor}
2. T: {+, *, (, ), id}
3. S: Expression
4. P:
○ Expression → Expression + Term | Term
○ Term → Term * Factor | Factor
○ Factor → ( Expression ) | id



Example:
Let’s say we have this grammar:
N = {S, NP, VP} (non-terminals)
T = {dog, runs, the, a} (terminals)
S = S (start symbol)
P = {S → NP VP, NP → the dog, VP → runs} (production rules)

If we start with S (start symbol), we use the rules:


S → NP VP (a sentence becomes a noun phrase + verb phrase)
NP → the dog (the noun phrase becomes "the dog")
VP → runs (the verb phrase becomes "runs")

Now we have: "the dog runs," which is a valid sentence made up of terminal
symbols.
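The derivation above can be sketched as a tiny Python program. The rule table and the derive() helper below are illustrative names, not a standard API:

```python
# A minimal sketch of derivation using the toy grammar above.
RULES = {
    "S": ["NP", "VP"],     # S -> NP VP
    "NP": ["the", "dog"],  # NP -> the dog
    "VP": ["runs"],        # VP -> runs
}

def derive(symbols):
    """Expand non-terminals left to right until only terminals remain."""
    out = []
    for sym in symbols:
        if sym in RULES:
            out.extend(derive(RULES[sym]))  # rewrite the non-terminal
        else:
            out.append(sym)                 # terminals are kept as-is
    return out

print(" ".join(derive(["S"])))  # the dog runs
```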
Phrase Structure or Constituency Grammar
Phrase Structure Grammar, also known as Constituency Grammar, was
introduced by Noam Chomsky. This type of grammar focuses on how
sentences are composed of smaller units called constituents, which can be
nested within each other.
1. Constituency Relation:
○ Sentences are structured based on the relationship between
constituents.
○ Derived from the subject-predicate structure in Latin and Greek
grammar.
2. Basic Clause Structure:
○ Consists of a Noun Phrase (NP) and a Verb Phrase (VP).
Example of Constituency Grammar
Let's break down the sentence: "This tree is illustrating the constituency
relation."
Constituency Structure
1. Sentence (S): The entire sentence.
2. Noun Phrase (NP): The subject of the sentence.
○ Example: "This tree"
3. Verb Phrase (VP): The predicate of the sentence, which includes the verb
and additional components.
○ Example: "is illustrating the constituency relation"
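As a sketch, this constituency structure can be encoded as nested tuples; the tree layout and the leaves() helper are our own illustration, not a standard representation:

```python
# Nested-tuple encoding of the constituency tree for the example sentence.
# Each node is (label, child, child, ...); leaves are plain word strings.
tree = ("S",
        ("NP", "This", "tree"),
        ("VP", "is", "illustrating",
               ("NP", "the", "constituency", "relation")))

def leaves(node):
    """Collect the terminal words of a constituent, in order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:          # node[0] is the constituent label
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree)))  # This tree is illustrating the constituency relation
```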
(Figure: constituency parse tree for "This tree is illustrating the constituency relation".)
Dependency Grammar
Dependency Grammar (DG) is a type of syntactic structure that differs from
constituency grammar. It focuses on the dependency relation between words in a
sentence rather than on hierarchical phrases. DG was introduced by Lucien Tesnière
and is characterized by the following:
• Linguistic Units and Directed Links: In DG, words (linguistic units) are
connected to each other by directed links that denote dependencies.
• Central Role of the Verb: The verb is the central element of the clause structure.
• Dependencies: Other syntactic units are connected to the verb and each other
through directed links, forming dependencies.
Example Sentence: "This tree is illustrating the dependency relation"
To understand how dependency grammar works, let's break down the sentence "This
tree is illustrating the dependency relation" into its dependencies.
1.Root Verb: The main verb "illustrating" is the center of the clause.
2.Subject: "tree" is the subject of the verb "illustrating".
3.Determiner: "This" modifies "tree".
4.Auxiliary Verb: "is" is an auxiliary verb that helps the main verb "illustrating".
5.Object: "relation" is the object of "illustrating".
6.Modifier: "dependency" modifies "relation".
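A minimal sketch of this analysis in Python, encoding each word as a (head, relation) pair; the relation labels are illustrative, not a fixed tagset:

```python
# Dependency analysis of the example sentence as word -> (head, relation).
deps = {
    "This":       ("tree", "det"),          # determiner of the subject
    "tree":       ("illustrating", "subj"),
    "is":         ("illustrating", "aux"),
    "the":        ("relation", "det"),
    "dependency": ("relation", "mod"),
    "relation":   ("illustrating", "obj"),
}

# The root verb is the only word that is not a dependent of anything.
words = set(deps) | {head for head, _ in deps.values()}
root = (words - set(deps)).pop()
print(root)  # illustrating
```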
Dependency Grammar
We can write the sentence “This tree is illustrating the dependency relation”
as follows;
(Figure: dependency tree for "This tree is illustrating the dependency relation".)
Context-Free Grammar (CFG)
Context-Free Grammar (CFG) is a formalism used to describe languages. It is more
powerful than regular grammar and is capable of generating more complex languages.
A CFG is composed of four main components:
1. Set of Non-terminals (V)
2. Set of Terminals (Σ)
3. Set of Productions (P)
4. Start Symbol (S)
1.Set of Non-terminals (V)
○ Non-terminals are syntactic variables that represent sets of
strings.
○ They help define the structure of the language generated by
the grammar.
Example: ‘V = {S, A, B}’
2.Set of Terminals (Σ)
○ Terminals are the basic symbols from which strings are
formed.
○ They are also called tokens.
Example: ‘Σ = {a, b}’
Set of Productions (P)
●Productions define how terminals and non-terminals can be combined.
●Each production consists of a non-terminal, an arrow (→), and a sequence of
terminals and/or non-terminals.
●Non-terminal symbols on the left side are rewritten as the sequence on the right side.
Example: ‘P = {S → AB, A → a, B → b}’
Start Symbol (S)
●The start symbol is a special non-terminal from which the production begins.
Example: ‘S = S’
Example CFG
Let's consider a simple example of a CFG that generates the
language consisting of the string "ab":
Non-terminals (V): ‘V = {S, A, B}’
Terminals (Σ): ‘Σ = {a, b}’
Productions (P):
○ ‘S → AB’
○ ‘A → a’
○ ‘B → b’
Start Symbol (S): ‘S = S’
Derivation Process
The derivation process shows how the start symbol is
transformed into a string using the productions.
1.Start with S: ‘S’
2.Apply S → AB: ‘AB’
3.Apply A → a: ‘aB’
4.Apply B → b: ‘ab’
So, the string "ab" is generated by this CFG.
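This leftmost derivation can be replayed programmatically. A hedged sketch (the leftmost_derivation() helper is our own, and it assumes single-character non-terminals):

```python
# Replay the derivation above: rewrite the leftmost non-terminal at each
# step and record every sentential form along the way.
RULES = {"S": ["AB"], "A": ["a"], "B": ["b"]}  # one rule per non-terminal here

def leftmost_derivation(form):
    steps = [form]
    while any(c in RULES for c in form):
        for i, c in enumerate(form):
            if c in RULES:                              # leftmost non-terminal
                form = form[:i] + RULES[c][0] + form[i + 1:]
                steps.append(form)
                break
    return steps

print(" -> ".join(leftmost_derivation("S")))  # S -> AB -> aB -> ab
```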
Diagram
Here's a simple diagram representing the example CFG:

      S
     / \
    A   B
    |   |
    a   b

'S' is the start symbol. 'S' produces 'A' and 'B'. 'A' produces the terminal 'a'.
'B' produces the terminal 'b'.
Semantic Analysis
Semantic analysis is the process of extracting meaning from text. It helps computers
understand and interpret sentences, paragraphs, or entire documents by analyzing
their structure and identifying relationships between words.
Key Points of Semantic Analysis
1. Understanding Relationships: It examines how words in a text relate to each
other.
2. Applications: Used in tools like chatbots, search engines, and text analysis.
3. Benefits: Helps companies extract valuable information from unstructured data
like emails and customer feedback.
How Semantic Analysis Works
Lexical Semantics
• Hyponyms and Hypernyms: A hyponym is a specific instance of a
general concept (hypernym). E.g., "orange" (hyponym) is a type of "fruit"
(hypernym).
• Meronomy: Describes parts of a whole. E.g., a "segment" of an orange.
• Polysemy: Words that have multiple meanings related by a common core.
E.g., "paper" can mean a material to write on or an academic article.
• Synonyms: Words with similar meanings. E.g., happy, content, ecstatic.
• Antonyms: Words with opposite meanings. E.g., happy and sad.
• Homonyms: Words that sound and are spelled the same but have unrelated
meanings. E.g., "bat" (the animal) and "bat" (the sports equipment).
Machine Learning in Semantic Analysis
Training Algorithms: Feeding algorithms with text samples to learn from past
observations.
Sub-tasks:
Word Sense Disambiguation: Identifying the correct meaning of a word based on
context. E.g., "date" can mean a day, a fruit, or a meeting.
Relationship Extraction: Detecting relationships between entities in a text. E.g.,
“Steve Jobs is one of the founders of Apple.”
Semantic Analysis Techniques
Semantic Classification Models
1. Topic Classification: Categorizing text based on content.
○ E.g., Classifying support tickets as "Payment issue" or "Shipping problem".
2. Sentiment Analysis: Detecting emotions (positive, negative, or neutral) in text.
○ E.g., Tagging Twitter mentions by sentiment to gauge customer feelings.
3. Intent Classification: Determining what customers want to do next.
E.g., Tagging sales emails as “Interested” or “Not Interested”.
Semantic Extraction Models
1.Keyword Extraction: Identifying important words and phrases in
text.
○ E.g., Analyzing keywords in negative tweets to find common
complaints.
2.Entity Extraction: Identifying names of people, companies, places,
etc.
○ E.g., Extracting product names, shipping numbers, and emails
from support tickets.
Basics of Grammar-Free Analyzers
Grammar-free analyzers in Natural Language Processing (NLP) are systems that process text
without relying on predefined grammatical rules or formal grammars. Instead, they use statistical
methods, machine learning algorithms, or other heuristics to analyze and understand language. Its
components are:
1. Statistical Methods:
○ Rely on large corpora of text to understand language patterns.
○ Use frequency counts, probabilities, and co-occurrence statistics to analyze text.
2. Machine Learning:
○ Employ algorithms that learn from annotated data to make predictions or classifications.
○ Common approaches include supervised, unsupervised, and reinforcement learning.
3. Heuristic Methods:
○ Use rule-of-thumb strategies to process text.
○ These methods might include pattern matching, keyword extraction, or context analysis.
Common Techniques

1. n-grams
An n-gram is a contiguous sequence of n items (usually words or characters) from a
given text. n-grams are used to model the likelihood of sequences and predict the next
word in a sequence based on the previous words.
• Unigrams (1-grams): Single words. Example: "The cat sat on the mat" → "The",
"cat", "sat", "on", "the", "mat".
• Bigrams (2-grams): Pairs of words. Example: "The cat", "cat sat", "sat on", "on
the", "the mat".
• Trigrams (3-grams): Triplets of words. Example: "The cat sat", "cat sat on", "sat
on the", "on the mat".
2. Term Frequency-Inverse Document Frequency (TF-IDF)
Measures the importance of a word in a document relative to a collection of
documents (corpus).
• Term Frequency (TF): The frequency of a term in a document.
• Inverse Document Frequency (IDF): Measures how much information the
word provides, i.e., whether the term is common or rare across all documents.
Formula:
IDF(t) = log(N / n),   TF-IDF(t, d) = TF(t, d) × IDF(t)
where N is the total number of documents and n is the number of documents
containing the term.
TF-IDF is used for text mining and information retrieval to identify important words
in documents.
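A from-scratch sketch of the TF-IDF computation using the log(N/n) form of IDF above; real libraries often add smoothing, so their scores can differ slightly:

```python
import math

# Three toy "documents", each a list of tokens (invented for illustration).
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)               # term frequency in this doc
    n = sum(1 for d in docs if term in d)         # docs containing the term
    idf = math.log(len(docs) / n)                 # rare terms get higher IDF
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # rare-ish term: positive score
print(tf_idf("the", docs[0], docs))  # appears everywhere: IDF = log(1) = 0
```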
3. Word Embeddings
Represents words as dense vectors of real numbers in a continuous vector space,
capturing semantic meanings.
Word2Vec: Creates word embeddings using two models, CBOW (Continuous Bag of
Words) and Skip-gram. It captures the context of words.
GloVe (Global Vectors for Word Representation): Constructs word embeddings by
aggregating global word-word co-occurrence statistics from a corpus.
FastText: Extends Word2Vec by considering subword information, making it effective
for morphologically rich languages.
Word embeddings are used in various NLP tasks such as text classification, sentiment
analysis, and machine translation.
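To illustrate the idea without a trained model, here is a toy sketch comparing made-up 3-dimensional "embeddings" with cosine similarity; real embeddings have hundreds of learned dimensions:

```python
import math

# Invented vectors: similar words get similar directions.
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

print(cosine(vec["king"], vec["queen"]))  # close to 1: similar words
print(cosine(vec["king"], vec["apple"]))  # much smaller
```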
4. Latent Dirichlet Allocation (LDA)
A generative probabilistic model used for topic modeling. It assumes
documents are mixtures of topics and topics are mixtures of words.
Process: Step 1: Determine the number of topics.
Step 2: Assign words to topics based on their distribution.
Step 3: Iterate to improve the assignment.
LDA is used to identify underlying topics in large text corpora, making it
useful for organizing, summarizing, and understanding large sets of
documents.
5. Hidden Markov Models (HMMs)
A statistical model that represents systems with hidden states through observable
events. In NLP, it's used for tasks like part-of-speech tagging.
HMMs are applied to sequence labeling tasks such as part-of-speech tagging, named
entity recognition, and bioinformatics.
Components:
States: Represent parts of speech or other categories.
Observations: Actual words or tokens in the text.
Transitions: Probabilities of moving from one state to another.
Emissions: Probabilities of an observation being produced from a state.
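These four components fit together in the Viterbi algorithm, which finds the most likely state sequence for an observation sequence. A toy sketch with two states (N/V) and invented probabilities:

```python
# Two-state HMM for POS tagging; all probabilities below are made up.
states = ["N", "V"]
start  = {"N": 0.6, "V": 0.4}                              # initial state probs
trans  = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit   = {"N": {"dogs": 0.5, "run": 0.1},
          "V": {"dogs": 0.1, "run": 0.6}}

def viterbi(words):
    # best[s] = (probability, tag sequence) of the best path ending in state s
    best = {s: (start[s] * emit[s][words[0]], [s]) for s in states}
    for w in words[1:]:
        best = {s: max((best[p][0] * trans[p][s] * emit[s][w], best[p][1] + [s])
                       for p in states)
                for s in states}
    return max(best.values())[1]

print(viterbi(["dogs", "run"]))  # ['N', 'V']
```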
6. Naive Bayes Classifier
Concept: A probabilistic classifier based on Bayes' theorem, assuming independence
between features.
Formula:
P(c | x) = P(x | c) · P(c) / P(x)
Naive Bayes is used for text classification tasks such as spam detection, sentiment analysis,
and document categorization.
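A minimal multinomial Naive Bayes sketch with Laplace smoothing; the training data and labels below are invented for illustration:

```python
from collections import Counter
import math

# Tiny labeled corpus (invented).
train = [("spam", "win money now"), ("spam", "win prize money"),
         ("ham",  "meeting at noon"), ("ham",  "lunch at noon")]

counts = {"spam": Counter(), "ham": Counter()}  # word counts per class
priors = Counter()                              # document counts per class
for label, text in train:
    priors[label] += 1
    counts[label].update(text.split())

vocab = {w for c in counts.values() for w in c}

def score(label, words):
    # log P(label) + sum of log P(word | label), Laplace-smoothed
    total = sum(counts[label].values())
    s = math.log(priors[label] / len(train))
    for w in words:
        s += math.log((counts[label][w] + 1) / (total + len(vocab)))
    return s

def classify(text):
    return max(("spam", "ham"), key=lambda l: score(l, text.split()))

print(classify("win money"))  # spam
```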
7. Pointwise Mutual Information (PMI)
Measures the association between two words by comparing the probability of their
co-occurrence with their individual probabilities.
Formula:
PMI(x, y) = log( P(x, y) / (P(x) · P(y)) )
PMI is used to find collocations and word associations, helping in tasks like word sense
disambiguation and information retrieval.
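A short sketch computing PMI from toy co-occurrence counts (all numbers below are invented):

```python
import math

# Invented counts over 1000 observed word pairs.
total_pairs = 1000
count = {"new": 80, "york": 20, ("new", "york"): 15}

def pmi(x, y):
    p_xy = count[(x, y)] / total_pairs   # joint probability of the pair
    p_x = count[x] / total_pairs
    p_y = count[y] / total_pairs
    return math.log2(p_xy / (p_x * p_y))

print(round(pmi("new", "york"), 2))  # strongly associated: PMI > 0
```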
ACTIVITY - 12
Analyse the following sentences using top down and bottom up parsing techniques.
1. The cat is on the mat.
2. The dog barked loudly.

[Top down parsing: Provide a step-by-step breakdown of how the sentence can be
parsed starting from the start symbol (usually the sentence, S) and applying
production rules until you reach the terminal symbols (words of the sentence).
Bottom Up parsing: Provide a step-by-step breakdown of how the sentence can be
parsed starting from the terminal symbols (words of the sentence) and applying
production rules in reverse until you reach the start symbol (S)]
Basics of Translation

What is Machine Translation?
Machine translation is a sub-field of computational linguistics that focuses on
developing systems capable of automatically translating text or speech from one
language to another. In NLP, the goal of machine translation is to produce
translations that are not only grammatically correct but also convey the meaning of
the original content accurately.
(Figure: "Annyeonghaseyo", a Korean greeting, shown as a translation example.)
What are the key approaches in Machine Translation?
In machine translation, the original text is decoded and then encoded into the target
language through a two-step process. Language translation technology employs several
approaches to facilitate this mechanism.
1. Rule-Based Machine Translation
Rule-based machine translation relies on linguistic resources to ensure precise translation of
specific content. The software parses the input text, generates a transitional representation,
and then converts it into the target language with reference to grammar rules and dictionaries.
2. Statistical Machine Translation
● Rather than depending on linguistic rules, statistical machine
translation utilizes machine learning for text translation. Machine
learning algorithms examine extensive human translations,
identifying statistical patterns.
● When tasked with translating a new source text, the software
intelligently guesses based on the statistical likelihood of specific
words or phrases being associated with others in the target language.
3. Neural Machine Translation (NMT)
● A neural network, inspired by the human brain, is a network of interconnected nodes
functioning as an information system. Input data passes through these nodes to
produce an output.
● Neural machine translation software utilizes neural networks to process vast datasets,
with each node contributing a specific change from source text to target text until the
final result is obtained at the output node.
4. Hybrid Machine Translation
● Hybrid machine translation tools integrate multiple machine translation models
within a single software application, leveraging a combination of approaches to
enhance the overall effectiveness of a singular translation model.
● This process typically involves the incorporation of rule-based and statistical
machine translation subsystems, with the ultimate translation output being a
synthesis of the results generated by each subsystem.
Why do we need Machine Translation in NLP?
Machine translation in Natural Language Processing (NLP) has several benefits,
including:
1. Improved communication: Machine translation makes it easier for people who
speak different languages to communicate with each other, breaking down
language barriers and facilitating international cooperation.
2. Cost savings: Machine translation is typically faster and less expensive than
human translation, making it a cost-effective solution for businesses and
organizations that need to translate large amounts of text.
3. Increased accessibility: Machine translation can make digital content more
accessible to users who speak different languages, improving the user
experience and expanding the reach of digital products and services.
4. Improved efficiency: Machine translation can streamline the translation
process, allowing businesses and organizations to quickly translate large
amounts of text and improving overall efficiency.
5. Language learning: Machine translation can be a valuable tool for language
learners, helping them to understand the meaning of unfamiliar words and
phrases and improving their language skills.