04 - Parsing in NLP
NLP - Final sem study material

Unit # 2

Syntactic Analysis
Concept of Grammar
Phases of Natural Language Processing
[Figure: NLP pipeline — morphological/POS analysis (stems, morphemes), syntactic analysis (driven by grammar rules), semantic analysis (driven by semantic rules), and pragmatic analysis (driven by contextual information).]
What is Syntactic Analysis
Syntactic analysis, or parsing, is the process of analyzing natural language with the
rules of a formal grammar. Grammatical rules are applied to categories and groups
of words, NOT individual words. Syntactic analysis essentially assigns a syntactic
structure to text.
• Use of Noun-Verb pair: A sentence includes a subject and a predicate. We combine
every noun phrase with a verb phrase in the sentence.
Example: The dog (noun phrase) went away (verb phrase)
• Adjective before Noun: Adjectives are usually placed before the noun they describe.
Example: The beautiful garden was blooming with flowers.
• Use of Articles: 'A' or 'an' is used before singular, countable nouns that are not
specific; 'the' is used before specific nouns.
Example: A cat sat on the mat. (any cat)
Example: The cat sat on the mat. (a specific cat)
• Proper Placement of Modifiers: Modifiers should be placed next to the word they
modify.
Example: She drove almost six hours to get home. (not: "She almost drove six hours.")
• Pronoun Antecedent Agreement: Pronouns must agree with their antecedents in
number and gender.
Example: Every student must bring his or her own pencil.
• Subject-Verb Agreement: A singular subject takes a singular verb, while a plural
subject takes a plural verb.
Example: The dog barks. (singular)
Example: The dogs bark. (plural)
Chomsky Hierarchy of Grammar
• The field of formal language theory (FLT), initiated by Noam Chomsky, sets a
minimal limit on descriptive adequacy.
• Chomsky's approach ignores meaning, usage of expressions, frequency, context
dependence, and processing complexity of natural language entirely.
• The theory assumes only that patterns which are productive for short strings
apply to strings of arbitrary length in an unrestricted way.
• An expression in the sense of FLT is simply a finite string of symbols, and a
(formal) language is a set of such strings. The theory explores the
mathematical and computational properties of such sets.
• The immense success of this framework influenced not only linguistics but also
theoretical computer science and molecular biology.
• In particular, FLT deals with formal languages (= sets of strings) that are defined by
a finite set of rules, a grammar (𝒢).
• A grammar in FLT is composed of four elements:
(1) a finite vocabulary of symbols (Σ), referred to as terminals, that appear in the
strings of the language;
(2) a finite vocabulary of extra symbols called non-terminals (NT);
(3) a special designated non-terminal called the start symbol (S);
(4) a finite set of rules (R).

Thus a grammar 𝒢 can be written as a quadruple ⟨Σ, NT, S, R⟩
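As a concrete illustration, here is a minimal sketch of the quadruple as plain Python data (the toy rules are illustrative, not taken from a specific linguistic source):

    # The grammar quadruple <Sigma, NT, S, R> as plain Python data.
    Sigma = {"the", "cat", "chases", "mouse"}      # terminals
    NT = {"S", "NP", "VP", "Det", "N", "V"}        # non-terminals
    S = "S"                                        # start symbol
    R = {                                          # rewrite rules
        "S":   [("NP", "VP")],
        "NP":  [("Det", "N")],
        "VP":  [("V", "NP")],
        "Det": [("the",)],
        "N":   [("cat",), ("mouse",)],
        "V":   [("chases",)],
    }
    grammar = (Sigma, NT, S, R)                    # G = <Sigma, NT, S, R>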


Chomsky Hierarchy of Grammar (contd)
• 𝒢 is said to generate a string consisting of symbols from Σ if and only if it is
possible to start with S and derive that string through some finite sequence of rule
applications.
• The set of all strings that 𝒢 can generate is called the language of 𝒢, and is
notated L(𝒢).
Chomsky classified grammar hierarchy into four levels or categories based on their
generative power:
1. Type 0 - Unrestricted Grammars
2. Type 1 - Context-Sensitive Grammars
3. Type 2 - Context-Free Grammars
4. Type 3 - Regular Grammars
Type 0 - Unrestricted Grammar
• The productions can be in the form of α → β where α is a string of terminals and non
terminals with at least one non-terminal and α cannot be null. β is a string of terminals
and non-terminals. Examples S → ACaB
Bc → acB
CB → DB
• Type 0 - Unrestricted Grammars are the most powerful in the Chomsky hierarchy and
capable of generating any language. This level of generative power allows for the
description of languages and behaviors that are highly complex and encompass all
other grammar types within the hierarchy.
• The sky's truly the limit here.
Chomsky Hierarchy of Grammar (contd)
• Type-0 grammars are not typically used in natural language processing (NLP)
due to their computational complexity and lack of constraints. With no constraints,
a grammar could generate not only "The cat chases the mouse." but equally
"Chases mouse the cat." or even "The the mouse cat chases."

Type 1 - Context-Sensitive Grammar


• The productions are of the form α → β with the condition that len(α) ≤ len(β).
• Type-1 grammars in the Chomsky hierarchy are more restrictive than Type-0
grammars but less restrictive than Type-2 (context-free grammars) and Type-3
(regular grammars).
• Productions are often written as αAβ → αγβ: the context around the non-terminal A
(represented by α and β) dictates how A may be rewritten (as γ), making the
grammar context-sensitive.
• This translates in English to rules like agreement in number between subjects
and verbs in sentences. The English grammar rule ensures that a singular subject
matches a singular verb form and a plural subject a plural verb form, which can
only be checked within the context of the surrounding words.
Chomsky Hierarchy of Grammar (contd)
• So in short, the previous example now has a rule to follow.
Previous example: The cat chases the mouse.
Singular subject with singular verb:
"The cat chases the mouse."
"Cat" is a singular noun, so the verb "chases" is also in the singular
form.
Plural subject with plural verb:
"The cats chase the mouse."
"Cats" is a plural noun, so the verb "chase" is in the plural form, without
the 's' at the end.
This rule must be followed to construct grammatically correct sentences in a
Chomsky Type-1 context-sensitive grammar.

To describe the grammar associated with this example, we need a set of
production rules. These rules explain how sentences in the language are
constructed from words and phrases, as sketched below.
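One way to make this concrete: agreement constraints like these are commonly simulated by splitting non-terminals by number. Strictly speaking, the sketch below stays context-free (it merely encodes the contextual restriction in the rule set); it assumes the NLTK library is installed:

    import nltk

    # Number agreement encoded by splitting non-terminals (illustrative rules).
    grammar = nltk.CFG.fromstring("""
    S -> NP_sg VP_sg | NP_pl VP_pl
    NP_sg -> Det N_sg
    NP_pl -> Det N_pl
    VP_sg -> V_sg NP_sg | V_sg NP_pl
    VP_pl -> V_pl NP_sg | V_pl NP_pl
    Det -> 'the'
    N_sg -> 'cat' | 'mouse'
    N_pl -> 'cats' | 'mice'
    V_sg -> 'chases'
    V_pl -> 'chase'
    """)
    parser = nltk.ChartParser(grammar)

    print(len(list(parser.parse("the cat chases the mouse".split()))))  # 1 parse
    print(len(list(parser.parse("the cat chase the mouse".split()))))   # 0 parses: agreement violated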
Chomsky Hierarchy of Grammar (contd)
Some more rules as illustration from English grammar (analogous rules exist in other languages):
Pronoun Antecedent Agreement:
Rule: Pronouns must agree in number and gender with their antecedents.
Every student must bring his or her pencil.

Use of articles:
Rule: The definite article 'the' is used before a noun that is specific or known
to the listener, while 'a' or 'an' is used for non-specific nouns in the singular
form.
She wants an apple from the basket.

Subjunctive Mood:
Rule: The subjunctive mood is used for wishes, hypotheticals, or actions that
are contrary to fact.
If I were you, I would not do that.
• These rules illustrate how the context surrounding words or phrases can
dictate the appropriate grammatical forms to use, which is a hallmark of
context-sensitive (Type-1) grammars.
• Because productions never shrink the string, recognition is decidable: starting
from a string in question β, there are only finitely many ways in which rules can
be applied backward to it.
Chomsky Hierarchy of Grammar (contd)
Type 2 - Context-free Grammar
Chomsky Type-2 grammar, also known as context-free grammar (CFG), is a formal
grammar in which every production rule is of the form α → β, where α is a single non-
terminal symbol and β is a string of terminals and/or non-terminals (β can be
empty). The productions need NOT satisfy the condition len(α) ≤ len(β).
- For example, the language of all strings with an equal number of 'a's and 'b's,
in any order, is generated by the context-free grammar:
S → aSb | bSa | SS | ε
- Further, a CFG has a hierarchical structure, i.e. it consists of a set of production
rules that can be applied recursively and can generate a tree structure.
The hierarchical structure refers to the way sentences can be broken down into
smaller parts, and those parts can be broken down further, following the CFG rules.
This leads to the creation of a parse tree, which visually represents the breakdown of
a sentence into its grammatical parts.
In a parse tree for a context-free grammar:
The root node is typically the start symbol (often S for sentence).
The leaf nodes are terminal symbols, which correspond to the words of the sentence.
The interior nodes are non-terminal symbols, representing the syntactic categories (like
noun phrases, verb phrases, etc.).
Chomsky Hierarchy of Grammar (contd)
For the sentence "The cat chases the mouse.", we define context-free rules as
follows:

S → NP_singular VP_singular
NP_singular → Det N_singular
VP_singular → V_singular NP

1. Start with the sentence (S). The initial rule identifies the sentence structure:
S → NP VP
2. Expand the noun phrase (NP) for the subject to include a
determiner (Det) and a singular noun (N_singular):
NP → Det N_singular
"The cat": NP → [The] [cat]

The resulting parse tree:

S
├── NP
│   ├── Det
│   │   └── The
│   └── N
│       └── cat
└── VP
    ├── V
    │   └── chases
    └── NP
        ├── Det
        │   └── the
        └── N
            └── mouse
The tree shows the hierarchical structure of the sentence. The sentence is divided into a
noun phrase and a verb phrase. The noun phrase NP consists of a determiner Det
("The") and a noun N ("cat"), which together refer to the subject of the sentence. The
verb phrase VP consists of a verb V ("chases") and a noun phrase NP, which is the
object of the sentence. This object NP is again made up of a determiner ("the") and a
noun ("mouse").
Chomsky Hierarchy of Grammar (contd)
Type 3 - Regular Grammar
• Chomsky's Type-3 Grammar, also known as Regular Grammar, is the simplest
type of grammar in the Chomsky hierarchy.

• The production rules in a Type-3 grammar are restricted to a single non-terminal
on the left side and, on the right side, either a single terminal or a terminal followed
by a non-terminal:

α → β or α → βY
where α, Y ∈ N (non-terminals) and β ∈ T (terminals)

Such patterns cover, for example, simple word sequences like "He talks" / "She runs",
adverb forms like "Quickly" / "Happily", and affixation like "Unhappy" / "Happiness".
• Type-3 grammars are suitable for describing the simplest syntactic structures,
those that involve direct adjacency and do not require nesting or recursion.
• They do not allow hierarchical structure or much nesting or recursion, unlike
context-free grammars.
• For each regular grammar 𝒢, it is possible to construct an algorithm (a finite-state
automaton, FSA) that reads a string from left to right, and then outputs 'yes' if the
string belongs to L(𝒢), and 'no' otherwise.
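A minimal sketch of such a recognizer in Python, for a toy regular grammar (the grammar is illustrative, not from the slides):

    # Toy right-linear grammar: S -> 'he' A | 'she' A ;  A -> 'talks' | 'runs'
    # encoded directly as a finite-state automaton over words.
    TRANSITIONS = {
        ("S", "he"): "A", ("S", "she"): "A",
        ("A", "talks"): "ACCEPT", ("A", "runs"): "ACCEPT",
    }

    def accepts(words):
        """Read the string left to right; return True iff it is in L(G)."""
        state = "S"
        for w in words:
            state = TRANSITIONS.get((state, w))
            if state is None:          # no transition: reject
                return False
        return state == "ACCEPT"

    print(accepts("he talks".split()))   # 'yes' -> True
    print(accepts("talks he".split()))   # 'no'  -> False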
Conclude Chomsky Grammar
• A simple sentence is built up in a hierarchical
fashion from smaller parts to the complete
sentence.
• This hierarchical structure is critical for
understanding the syntactic function of each
word and phrase within a sentence.
• It allows the analysis and generation of
syntactically correct sentences in natural
language processing.

S
├── NP
│   ├── Det
│   │   └── The
│   ├── Adj
│   │   └── quick
│   ├── Adj
│   │   └── brown
│   └── N
│       └── fox
└── VP
    ├── V
    │   └── jumps
    ├── P
    │   └── over
    └── NP
        ├── Det
        │   └── the
        ├── Adj
        │   └── lazy
        └── N
            └── dog

Where are natural languages located in the hierarchy?
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3367686/
[Figure: the analogous stages of a compiler: lexical analysis → syntax analysis → semantic analysis → code optimisation → code generation.]

Concept of Parsing
Parsing in NLP
• Parsing in basic terms can be described as breaking down a sentence into its
constituent words in order to find the grammatical type of each word, or
alternatively to decompose an input into more easily processed components.
• Every natural language has its own grammar rules according to which
sentences are formed. Parsing is used to find the sequence of rules applied for
sentence generation in that particular language.
• The basic connection between a sentence and the grammar is derived from the
parse tree. Natural language processing provides us with two basic parsing
techniques, viz. top-down and bottom-up. Their names describe the direction in
which the parsing process advances.

Top-Down parsing
• The process involves predicting the structure of a sentence from the start symbol
of the grammar down to the terminals, which correspond to the words in the
sentence.
• The start symbol S represents the most general concept, typically a sentence in
natural language grammars.
• The algorithm starts from the top of the tree, i.e. S, by looking at the grammar
rules with S on the left-hand side, so that all the possible trees can be generated.
Top-Down parsing
• The algorithm proceeds by substituting the start symbol with one of its possible
expansions (productions). This prediction is guided by the grammar rules, which
define how symbols can be replaced or expanded.
• The process is recursive; for each non-terminal symbol encountered, the parser
selects a production rule to expand it further, moving towards the terminal
symbols.
• This expansion continues until the parser reaches the terminal symbols, which are
the actual words or tokens of the input sentence.
• If the parser selects a production that doesn't lead to a successful match with the
input sentence, it may need to backtrack. Backtracking involves going back up the
parse tree to a previous decision point and trying a different production rule.
• This can be computationally expensive in cases where many backtracks are
necessary.
• The goal of top-down parsing is to construct a parse tree that represents the
syntactic structure of the input sentence according to the grammar. If the entire
input sentence is successfully matched against the productions of the grammar,
the sentence is considered syntactically valid.
• Top-down parsing can be implemented in various forms:
o the simplest being the recursive descent parser
o the predictive parser
Top-Down parsing (contd)
Recursive Descent Parser
• Recursive descent parsing is one of the most straightforward forms of parsing.
• This parser checks the syntax of the input stream of text by reading it from left to
right, producing a leftmost derivation.
• The parser first reads a token from the input stream and then verifies it, matching
it against the grammar's terminals. If the token is verified, it is accepted;
otherwise it is rejected.
• Recursive descent parsers are straightforward to implement and can handle a
wide range of grammars; because the parser is manually coded, it can even
enforce checks beyond what a context-free grammar expresses.
• Since the grammar in the parser is manually coded, it can also include sophisticated
error reporting and recovery mechanisms. Consider the expression grammar below
(recursive descent is likewise how many HTML parsers consume tags such as
<h1>, <b>, <head>, <html>, <img>):

expression ::= term (('+' | '-') term)*
term ::= factor (('*' | '/') factor)*
factor ::= NUMBER | '(' expression ')'

An input such as 3 + (4 * 5) is parsed top-down, with one recursive function per rule.
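A minimal sketch of a hand-written recursive descent parser for this grammar; it evaluates while parsing, and the tokenizer and error handling are kept deliberately simple:

    import re

    def tokenize(text):
        return re.findall(r"\d+|[+\-*/()]", text)

    class Parser:
        def __init__(self, tokens):
            self.tokens, self.pos = tokens, 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, tok=None):
            t = self.peek()
            if t is None or (tok is not None and t != tok):
                raise SyntaxError(f"expected {tok!r}, got {t!r}")  # simple error reporting
            self.pos += 1
            return t

        def expression(self):          # expression ::= term (('+' | '-') term)*
            value = self.term()
            while self.peek() in ("+", "-"):
                op = self.eat()
                value = value + self.term() if op == "+" else value - self.term()
            return value

        def term(self):                # term ::= factor (('*' | '/') factor)*
            value = self.factor()
            while self.peek() in ("*", "/"):
                op = self.eat()
                value = value * self.factor() if op == "*" else value / self.factor()
            return value

        def factor(self):              # factor ::= NUMBER | '(' expression ')'
            if self.peek() == "(":
                self.eat("(")
                value = self.expression()
                self.eat(")")
                return value
            return int(self.eat())

    print(Parser(tokenize("3 + (4 * 5)")).expression())   # 23

Note how each grammar rule maps to exactly one method, and method calls mirror the nesting of the parse tree.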
Predictive Parser
• The Predictive Parser is a type of top-down parser that is specifically designed to
work with a class of grammars known as LL grammars, where the first "L" stands
for scanning the input from left to right, and the second "L" for producing a leftmost
derivation.
Top-Down parsing (contd)
Grammar rules:
A basic sentence is understood in terms of a noun phrase NP and a verb phrase VP.
The other rules, say, are stated as below (NP -> N is included so that bare nouns can
stand alone as noun phrases):

S -> NP VP                                      # S is the entire sentence
VP -> V NP                                      # VP is a verb phrase
V -> "eats" | "drinks"                          # V is a verb
NP -> Det N | N                                 # NP is a noun phrase (a chunk that has a noun in it)
Det -> "a" | "an" | "the"                       # Det is a determiner used in sentences
N -> "president" | "Obama" | "apple" | "coke"   # some example nouns

Example sentences this grammar accepts (see the sketch below):
• "president eats apple"
• "Obama drinks coke"
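A sketch of top-down parsing with this grammar using NLTK's RecursiveDescentParser (assuming NLTK is installed):

    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    V -> 'eats' | 'drinks'
    NP -> Det N | N
    Det -> 'a' | 'an' | 'the'
    N -> 'president' | 'Obama' | 'apple' | 'coke'
    """)

    parser = nltk.RecursiveDescentParser(grammar)
    for tree in parser.parse("Obama drinks coke".split()):
        print(tree)   # (S (NP (N Obama)) (VP (V drinks) (NP (N coke))))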
Bottom-Up parsing
• The bottom-up parsing approach follows the leaves-to-root technique. In
bottom-up parsing, construction of the parse tree starts from the leaf nodes:
first the leaf nodes are formed, then generation proceeds upward by creating
parent nodes, and finally the root node is generated.
• The bottom up parser begins with the input sentence, treating each word as a
basic unit or leaf node in the parse tree.
• It then looks for sequences of nodes that match the right-hand side of a grammar
rule. When it finds such a match obeying the rule, it effectively constructs a higher-
level node in the parse tree.
• This process of matching and replacing continues iteratively, building up the tree
from the leaves (input symbols) towards the root (the start symbol).
• The parsing is successful if the entire input can be reduced to the start symbol of
the grammar, indicating that the sentence conforms to the specified grammar.
• The most common types of bottom-up parser are
o Shift-reduce parser
o LR parser
Bottom-Up parsing (contd)
Shift-reduce Parser
• A shift-reduce parser is a sort of bottom-up parser that starts with the input and
builds a parse tree by performing a series of shift (transfer data to the stack) and
reduction (apply grammar rules) operations.
S -> NP VP
VP -> V NP
V -> "eats" | "drinks"
NP -> Det N | N
Det -> "a" | "an" | "the"
N -> "president" | "Obama" | "apple" | "coke"

sentence = Obama eats an apple

• Initially, the parser shifts each word of the sentence onto a stack, one word at a
time, starting from "Obama".
• When the items on the stack match the right side of a grammar rule, the parser
reduces those items into a single item based on the rule. For example, after
shifting "Obama", it matches the rule N -> 'Obama', so "Obama" is reduced to N.
Bottom-Up parsing (contd)
Shift "Obama" onto the stack. (Stack: [Obama])
Reduce "Obama" to N using the rule N -> 'Obama'. (Stack: [N])
Reduce N to NP using the rule NP -> N. (Stack: [NP])
Shift "eats" onto the stack. (Stack: [V, eats])
Reduce "eats" to VP using the rule VP -> V -> 'eats'.
Shift "an" onto the stack.
Reduce "an" to Det and further using the rule Det -> NP ->VP
Reduce "apple" to N using the rule N -> NP ->VP
Reduce N to NP using the rule NP -> Det N.
Reduce NP VP to S using the rule S -> VP NP and NP -> Det N
Bottom-Up parsing (contd)
• This process continues with shifting and reducing according to the rules defined
in the grammar until the entire sentence is reduced to the start symbol (S),
indicating successful parsing.
• The ShiftReduceParser might not always find a parse for a sentence, especially if
the grammar is ambiguous or doesn't cover the sentence structure. In such cases,
we need to adjust the grammar.
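A sketch using NLTK's ShiftReduceParser (assuming NLTK is installed; trace=2 prints each shift and reduce step, mirroring the walkthrough above):

    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    V -> 'eats' | 'drinks'
    NP -> Det N | N
    Det -> 'a' | 'an' | 'the'
    N -> 'president' | 'Obama' | 'apple' | 'coke'
    """)

    parser = nltk.ShiftReduceParser(grammar, trace=2)
    for tree in parser.parse("Obama eats an apple".split()):
        print(tree)

Note that NLTK's shift-reduce parser is greedy and does not backtrack, so for some grammars and sentences it will find no parse even when one exists.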

• "Apple eats coke"
• "President drinks Obama"

When it comes to a syntactic parser, there is a chance that a syntactically well-formed
sentence, like the two above, could be meaningless. To get to the meaning, we need a
deeper understanding of the semantic structure of the sentence.
Parsed Tree – English Grammar
Example sentence:
It feels great to be the wind beneath the wings of one of the top ranking
Emerging Institutes of Technology in India. The devouring stride of IGDTUW,
since its inception in 1998 as IGIT, has been exemplary, culminating in
transforming itself into an icon for women empowerment.
Some more parser flavors …
Chart Parser
• A chart parser is a type of parsing algorithm designed to parse sentences
efficiently, avoiding redundant computation and handling ambiguity in
natural language grammars.
• The chart parser uses a data structure known as a "chart," which stores
intermediate parsing results during the parsing process. These intermediate
results allow the parser to explore different interpretations without re-deriving
shared sub-structures.
• This approach thus enables the parser to efficiently handle complex and
ambiguous grammatical structures that can occur in natural language.

RegexpParser
• The RegexpParser in NLTK, or regular expression parser, is a tool for chunking
text into groups based on patterns defined using regular expressions. We first
define patterns, using regular expressions, that describe the syntactic
structures we want to identify; these patterns are then matched against the
parts of speech of the words in a sentence.
• It's particularly useful for identifying specific structures within sentences, such as
noun phrases (NPs), verb phrases (VPs), and other syntactic groups, without
requiring a full parsing of the sentence's structure.
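A sketch of NP chunking with NLTK's RegexpParser (assuming NLTK is installed; the chunk pattern and the pre-tagged sentence are illustrative):

    import nltk

    # Chunk rule: an NP is an optional determiner, any adjectives, then noun(s).
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

    tagged = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
              ("jumps", "VBZ"), ("over", "IN"),
              ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
    print(chunker.parse(tagged))   # brackets the two noun phrases as NP chunks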
Parsed … Why ? Purpose ?
Parsing long sentences and generating parse trees offer several practical advantages
in understanding and processing natural language.
1. Syntactic Structure Understanding
Parse trees provide a visual and structural representation of the syntactic relationships
within a sentence. This helps students understand how sentences are constructed in
natural language, including how words relate to each other through dependencies or
hierarchies.
2. Ambiguity Resolution
Long sentences often contain ambiguities in meaning. These can be resolved through
parsing by choosing specific interpretations based on syntactic rules or learned
patterns (context).

I saw the man with the telescope.

Ultimately, the "correct" interpretation depends on the intended meaning as
inferred from context, background knowledge, and the specific application of
the parsed information.
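The two competing readings can be made concrete with a toy grammar under which NLTK's chart parser returns both parse trees (a sketch, assuming NLTK is installed):

    import nltk

    # PP can attach to the verb (instrument) or to the noun (the man has it).
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | V NP PP
    PP -> P NP
    V -> 'saw'
    P -> 'with'
    Det -> 'the'
    N -> 'man' | 'telescope'
    """)
    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I saw the man with the telescope".split()):
        print(tree)   # two trees, one per interpretation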
Parsed … Why ? Purpose ?
3. Supports Text Analysis
In text analysis and sentiment analysis, understanding the structure of long sentences
can help in accurately determining the sentiment conveyed by a sentence or a
paragraph. Parse trees make it easier to identify the scope of negations or the
relationships between subjects and objects, which are critical for these analyses.
"The Administration Cell announced the date of Summer Break, exciting all AIML
students collegewide."

In this sentence, parsing helps in several aspects of text analysis:
Parsed … Why ? Purpose ?
Identifying Entities and Actions > Parsing helps identify "Administration" as the
subject performing the action "announced". You can segregate actions ...
Understanding Relationships > The parsing process reveals the relationship between
"Administration" and "the announcement of summer break" showing that
Administration is the entity responsible for the action. This can be used in relation
extraction, a subset of information extraction, where relationships between entities are
crucial.
Sentiment Analysis > The phrase "exciting all AIML students collegewide " can be
linked to the announcement, suggesting a positive sentiment towards the action of
the new Administration. Parsing allows sentiment analysis systems to more
accurately attribute sentiments to specific entities or events within the text.
4. Enhances Language Learning
For students learning a new language, parsing can reveal differences in syntactic
structures between languages, aiding in the understanding of grammar and sentence
construction.

In short summary:
The structured representation parsers produce enables deeper analysis, such as
identifying subjects, objects, and the actions connecting them, which is crucial
for understanding the meaning of texts, sentiment analysis, information
extraction, and more.
Dependency Parsing
• As the name suggests, dependency grammar is a fundamental concept in natural
language processing (NLP) that allows us to understand how words connect within
sentences. It provides a framework for representing sentence structure based on
word-to-word relationships.
• Every sentence has a central idea represented by a main verb, which connects all
other words in the sentence to it. This central element is known as the root in
dependency grammar.
• In every word-to-word relationship there are two key roles: the governor (or head)
and the dependent. The governor is the word being modified, while the dependent
is the word that relies on, and modifies the meaning of, its governor.
• In a dependency parse tree, each dependency relation line is labeled to illustrate
the relationship between the words on each end. Labels like subject (subj) and
object (obj) provide the grammatical role of every word in the sentence structure.
• Dependency parsing, using dependency grammar principles, analyzes
sentences and produces a tree that illustrates the grammatical relationships
between words. This is essential for understanding the structure of sentences.
• A dependency graph satisfies the following constraints:
o It has a single designated root node that has no incoming arcs.
o Each node except the root has exactly one incoming edge.
o There is a unique path from the root node to each node.
Dependency Parsing(contd)
An essential part of dependency parsing is the dependency tag. A dependency tag
labels the relationship between two words: how the dependent modifies the
meaning of its governor. For example, take the following sentence:

Intelligent students score good marks with their hard work.

In this sentence, the word "Intelligent" is an adjective modifying the word
"students"; hence an arc from "students" to "Intelligent" signifies this dependency.
Similar dependencies can be seen for the other words in the dependency graph,
as sketched below.
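A sketch of producing these labeled arcs with spaCy (assumes spaCy and its small English model are installed: pip install spacy, then python -m spacy download en_core_web_sm):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Intelligent students score good marks with their hard work.")
    for token in doc:
        # each token points to its governor (head) via a labeled dependency arc
        print(f"{token.text:12} --{token.dep_}--> {token.head.text}")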
Dependency Parsing(contd)
Some of the common tags used for the syntactic relationships between words are
described below. They need to be understood to decode a dependency parse tree.

Dependency Tag    Description
acl               clausal modifier of a noun (adnominal clause)
acl:relcl         relative clause modifier
advcl             adverbial clause modifier
advmod            adverbial modifier
advmod:emph       emphasizing word, intensifier
advmod:lmod       locative adverbial modifier
amod              adjectival modifier
appos             appositional modifier
aux               auxiliary
aux:pass          passive auxiliary
case              case-marking
cc                coordinating conjunction
cc:preconj        preconjunct
ccomp             clausal complement
clf               classifier
compound          compound
compound:lvc      light verb construction
compound:prt      phrasal verb particle
compound:redup    reduplicated compounds
compound:svc      serial verb compounds
conj              conjunct
cop               copula
csubj             clausal subject
csubj:pass        clausal passive subject
dep               unspecified dependency
det               determiner
det:numgov        pronominal quantifier governing the case of the noun
det:nummod        pronominal quantifier agreeing with the case of the noun
det:poss          possessive determiner
discourse         discourse element

https://www.analyticsvidhya.com/blog/2021/12/dependency-parsing-in-natural-language-processing-with-examples/
Dependency Parsing(contd)
Example sentence:
YOU EDUCATE A MAN, YOU EDUCATE A MAN. YOU EDUCATE A
WOMAN, YOU EDUCATE A GENERATION.
Some open-source NLP libraries …
1. Flair: Flair is an NLP library developed by Zalando Research, focused on tasks
like named entity recognition, part-of-speech tagging, syntactic and sentiment analysis.
https://engineering.zalando.com/posts/2018/11/zalando-research-releases-flair.html

2. Stanza: Stanza, previously known as StanfordNLP, is a Python library developed by
the Stanford NLP Group. It offers a suite of NLP tools, including tokenization,
part-of-speech tagging, dependency parsing, syntactic and sentiment analysis, and
named entity recognition. Stanza's strength lies in its accuracy and efficiency, making
it a valuable tool for research and practical NLP applications.
https://stanfordnlp.github.io/stanza/

3. AllenNLP: AllenNLP is developed by the Allen Institute for AI. It provides a wide
range of pre-built models and components for tasks like text classification, semantic
role labeling, and more. AllenNLP also offers flexibility for custom model development
with its modular design. https://allenai.org/allennlp/software/allennlp-library

4. Gensim: Gensim is a popular Python library for topic modeling, document similarity
analysis, and word vector representations. It provides implementations of algorithms
such as Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and
Word2Vec. Gensim is known for its efficiency, scalability, and ease of use.
Some Natural language processing (NLP) libraries
5. UIMA (Unstructured Information Management Architecture): UIMA is an open-
standard framework for building NLP pipelines, originally developed at IBM and now
maintained by the Apache Software Foundation. It provides a scalable and extensible
infrastructure for processing unstructured information, including text, audio, and
video, with C++ and Java frameworks. https://uima.apache.org/

6. NLTK-Contrib: NLTK-Contrib is a collection of contributions and extensions to the


Natural Language Toolkit (NLTK). It offers additional functionalities and models that
extend the capabilities of NLTK, including language models, tokenizers, classifiers, and
more. NLTK-Contrib provides a wide range of experimental and less-known NLP tools
for research and experimentation.




This is not a full list; there are many more. Please explore …
Case study of parsers of
NLP systems like ELIZA, LUNAR

Next Class
Case study - ELIZA, LUNAR NLP systems
1. ELIZA
• ELIZA is a computer program that simulates the behavior of a therapist. It was one
of the first programs of its kind, developed back in 1966 at MIT. The program
interacts with the user in simple English and simulates a conversation with an
imaginary therapist ("Doctor"). Although many concepts of Artificial Intelligence had
not yet been developed, ELIZA surprised a number of individuals, as users attributed
human-like feelings to it.
• ELIZA listened to what the user said, parsed the sentence in a very basic way, and
then presented a question that was somehow related to the user's statement. In the
mid-1960s people were fooled by ELIZA: some were told, and believed, that a real
live therapist was talking from the second computer.
• A program like ELIZA requires knowledge of three domains:
1. Artificial Intelligence
2. Expert Systems
3. Natural Language Processing
• Weizenbaum, the developer of this program, was shocked to learn that MIT lab
staff thought the machine was a real therapist, and spent hours revealing their
problems to the program. When Weizenbaum informed them that he had access to
logs of all the conversations, the community was outraged at this invasion of their
privacy. He himself was shocked that such simple programs could so easily
deceive a naive user into revealing personal information.
Case study - ELIZA, LUNAR NLP systems
• Although ELIZA doesn't understand context or meaning and is limited to shallow
syntactic analysis compared with current-generation chatbots, it set the stage for the
development of more sophisticated AI and chatbots that use complex NLP and
machine learning techniques to interact with users.
• The technical blocks of ELIZA can be broken down into the following components:

Input Processing: ELIZA starts with the input processing where the user's
input is scanned for keywords or phrases that the system can recognize. This
is typically a simple string matching process without any understanding of the
language.
Pattern Matching: ELIZA uses a pattern matching technique to identify the
user's statements' key elements. This is done using a script, which is
essentially a collection of pattern-response pairs.
Decomposition Rules: Once a pattern is identified in the user's input, ELIZA
applies decomposition rules to break down the input into smaller parts. These
rules are used to transform the input into a form that can be more easily
manipulated to generate a response.
Case study - ELIZA, LUNAR NLP systems
Reassembly Rules: After decomposition, reassembly rules are used to construct the
response. These rules take the decomposed input and reassemble it into a statement
that reflects what the user has said. The reassembly process often involves
rephrasing the user's input and asking for further information.
Script Database: ELIZA operates using a script, a database of pre-defined patterns
and responses. The most famous script, known as DOCTOR, simulates a Rogerian
psychotherapist, which means it primarily uses the user's own statements to form
questions.
Response Generation: The response is generated based on the matching patterns
and associated rules. It then selects an appropriate response from the script that
corresponds to the identified pattern.
Output: Finally, the generated response is output to the user, continuing the
conversation.
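The pattern-match / decompose / reassemble loop described above can be illustrated with a minimal, self-contained sketch (the patterns and responses here are illustrative inventions, not from Weizenbaum's original DOCTOR script):

    import random
    import re

    # Script: pattern-response pairs. Groups captured by a pattern are the
    # "decomposition"; templates filled with them perform the "reassembly".
    SCRIPT = [
        (r"i feel (.*)", ["Why do you feel {0}?", "How long have you felt {0}?"]),
        (r"i am (.*)",   ["Why do you say you are {0}?"]),
        (r"my (.*)",     ["Tell me more about your {0}."]),
        (r"(.*)",        ["Please go on.", "I see."]),   # fallback response
    ]

    def respond(user_input):
        text = user_input.lower().strip(".!?")
        for pattern, templates in SCRIPT:
            match = re.fullmatch(pattern, text)
            if match:
                return random.choice(templates).format(*match.groups())

    print(respond("I feel sad about my exams"))
    # e.g. "Why do you feel sad about my exams?"  (the real ELIZA also swaps
    # pronouns, e.g. "my" -> "your", before reassembling the response)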
Case study - ELIZA, LUNAR NLP systems
Thanks
