Natural Language Processing Internal 1

Morphological Models in NLP

Morphology is the study of word structure: how words are formed from morphemes (the smallest
units of meaning).
Two main types:
➔ Inflectional (changes form, keeps meaning): e.g., "talk" → "talked"
➔ Derivational (changes meaning or part of speech): e.g., "happy" → "happiness"
Used in NLP tasks like:
➔ Part-of-speech tagging
➔ Machine translation
➔ Named Entity Recognition (NER)
➔ Text-to-speech synthesis
➔ Spell checking & grammar tools

1. Rule-Based Models
➔ Use handwritten rules made by language experts.
➔ Based on linguistic knowledge.
➔ Follow if-then patterns for analyzing word forms.
Pros:
➔ Transparent and explainable.
➔ Work well for regular, low-morphology languages (e.g., English).
Cons:
➔ Hard to scale.
➔ Not flexible for languages with complex/irregular forms.

2. Statistical Models
➔ Learn morphology from annotated corpora using probabilistic methods.
Common techniques:
➔ Hidden Markov Models (HMMs)
➔ Conditional Random Fields (CRFs)
Pros:
➔ Better performance than rule-based.
➔ Can generalize to unseen words.
Cons:
➔ Need large annotated datasets.
➔ Limited understanding of deep structure.
3. Neural Models
➔ Use deep learning (e.g., RNNs, LSTMs, Transformers).
➔ Can model complex and irregular morphology.
How they work:
➔ Input words as sequences of characters or subwords.
➔ Learn patterns using multiple layers of neurons.
Pros:
➔ State-of-the-art results.
➔ Work well for morphologically rich languages (Arabic, Turkish, Finnish).
Cons:
➔ Need lots of data and compute power.
➔ Hard to interpret.

4. Dictionary Lookup
➔ Uses a lexicon (dictionary) of known words and their forms.
➔ When a word is found in text, its properties (POS, base form, etc.) are retrieved from the
dictionary.
Pros:
➔ Fast and simple.
➔ Effective for regular and low-morphology languages.
Cons:
➔ Fails with out-of-vocabulary or irregular words.
➔ Not suitable for complex languages.
Enhancements:
1. Lemmatization: Reduce word to its lemma (base form).
Example: "better" → "good", "running" → "run"
2. Stemming: Chop off affixes to get the stem.
Example: "running", "runner" → "run"
3. Morphological Analysis: Segment words into prefix + root + suffix and extract
grammatical features.
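
A minimal sketch of the lemmatization and stemming enhancements using NLTK (assuming nltk and its WordNet data are installed; outputs are shown in comments):

# Dictionary-based normalization with NLTK (sketch).
from nltk.stem import WordNetLemmatizer, PorterStemmer

lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Lemmatization maps a word to its dictionary base form (lemma);
# a part-of-speech hint ("v" = verb, "a" = adjective) improves accuracy.
print(lemmatizer.lemmatize("running", pos="v"))   # run
print(lemmatizer.lemmatize("better", pos="a"))    # good

# Stemming just strips affixes, so the result need not be a dictionary word.
print(stemmer.stem("running"))    # run
print(stemmer.stem("runs"))       # run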

5. Finite-State Morphology
➔ Uses finite-state automata and finite-state transducers (FSTs).
➔ Works by defining rules as transitions in a state machine.
Functions:
➔ Analysis: Break a word into morphemes.
➔ Generation: Build word forms from morphemes + features.
Pros:
➔ Very efficient and fast.
➔ Works well for highly regular languages like Finnish, Turkish.
➔ Easy to visualize and debug.
Cons:
➔ Less effective for irregular or unpredictable word forms.
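
The analysis/generation idea can be illustrated with a toy suffix table; this is only an approximation of what a compiled transducer computes (real systems build the transducer from rule files with toolkits such as foma, HFST, or XFST):

# Toy "analysis" of a few regular inflected forms (illustrative only).
# Each rule maps a surface suffix to a morpheme gloss that replaces it.
RULES = [
    ("ied", "y+PAST"),   # tried  -> try+PAST
    ("ed",  "+PAST"),    # talked -> talk+PAST
    ("s",   "+PL"),      # cats   -> cat+PL
]

def analyse(word):
    for suffix, gloss in RULES:
        if word.endswith(suffix):
            return word[:-len(suffix)] + gloss
    return word   # no rule applies: treat the word as a bare stem

print(analyse("talked"))   # talk+PAST
print(analyse("tried"))    # try+PAST
print(analyse("cats"))     # cat+PL

Generation is the inverse direction: applying the same rules from the morpheme side turns talk+PAST back into "talked".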

6. Unification-Based Morphology
➔ Based on feature structures and constraints (e.g., number, gender, tense).
➔ Uses unification (matching of features) to process word forms.
Pros:
➔ Handles complex and irregular forms.
➔ Good for rich feature languages (Arabic, German).
➔ Modular and extensible.
Cons:
➔ Computationally heavy (slow).
➔ Complex implementation.

7. Functional Morphology
➔ Focuses on meaning and function of word forms in context.
➔ Based on usage-based and cognitive linguistics.
➔ Ties morphology to discourse, semantics, and communication.
Pros:
➔ Great for analyzing real-world language use.
➔ Works well with corpus-based NLP, sentiment analysis, etc.
➔ Can adapt to context and speaker intent.
Cons:
➔ Needs large corpora.
➔ Less formal structure.
➔ Can be hard to generalize.

8. Morphology Induction
➔ Unsupervised learning approach – no labeled data needed.
➔ Learns word structure from raw text using patterns/statistics.
How it works:
➔ Segments words into subword units.
➔ Finds morphemes based on frequency, patterns, etc.
➔ Uses clustering, probabilistic models, or neural methods.
Pros:
➔ Useful for low-resource or under-studied languages.
➔ Language-independent.
Cons:
➔ May not be very accurate.
➔ Results are often hard to interpret.
Note: Any five models are enough

System Paradigms in NLP


Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI). Two fundamental
problem-solving paradigms in NLP are:

1. The Search Paradigm


It is used when:
➔ We don’t have predefined answers.
➔ We're interested in how a solution is derived.
What is Search?
➔ It’s about exploring possible states or outcomes.
➔ Common in rule-based parsing where multiple rules may apply.
➔ We search for the best sequence of operations to reach a goal.
State Space Search
A state-space search is defined by:
➔ Initial state
➔ Goal (termination) condition
➔ Transition rules (how to move from one state to another)
Two main types:
1. Tree/Graph Traversal:
➔ Breadth-First Search (BFS) – explores neighbors first.
➔ Depth-First Search (DFS) – explores one path deeply.
2. Best-First Search (uses scoring functions):
➔ Greedy Search – chooses node closest to goal.
➔ A* – balances cost-so-far + estimated cost-to-goal.
➔ Beam Search – keeps only a fixed number of best options (limited memory).
➤ Other Search Strategies
➔ Hill Climbing – always move to a better neighbor (no memory or backtracking).
➔ Gradient Descent – adjust parameters to minimize error, used in machine learning.
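
To make the search paradigm concrete, here is a minimal generic beam search sketch; the expand and is_goal functions are hypothetical placeholders that a specific task (for example, decoding a tag sequence) would supply:

# Generic beam search over a state space (illustrative sketch).
# `expand(state)` yields (next_state, step_score) pairs; `is_goal(state)` tests termination.
import heapq

def beam_search(start, expand, is_goal, beam_width=3, max_steps=50):
    beam = [(0.0, start)]                          # (cumulative score, state)
    for _ in range(max_steps):
        candidates = []
        for score, state in beam:
            if is_goal(state):
                return score, state                # a finished hypothesis was reached
            for nxt, step_score in expand(state):
                candidates.append((score + step_score, nxt))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring partial solutions.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0]) if beam else None

A wider beam explores more of the state space (closer to breadth-first search) at a higher memory cost; a beam of width 1 degenerates into greedy search.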

2. The Classification Paradigm


It is used when:
1. We know the possible categories.
2. The task is to label input data (e.g., POS tagging, NER).
What is Classification?
1. A classifier assigns a label based on features of the input.
2. Can use:
◦ Rule-based methods (e.g., regular expressions)
◦ Statistical models (learned from data)

Types of Classifiers
1. Rule-based – manually written rules
2. Statistical – uses language models and probabilities
3. Features may be simple (short word sequences) or complex (word embeddings)
Perceptron Classifier (Example)
A linear model using:
1. Vectors for features and weights
2. Dot product to score and choose the best class
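
A minimal sketch of this idea: one weight vector per class, a dot product as the score, and mistake-driven updates (the feature names and training examples are invented for illustration):

# Tiny multi-class perceptron (illustrative sketch; data is made up).
def dot(weights, features):
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def predict(weight_vectors, features):
    # Score every class with a dot product and return the highest-scoring one.
    return max(weight_vectors, key=lambda c: dot(weight_vectors[c], features))

def train(examples, classes, epochs=5):
    weights = {c: {} for c in classes}
    for _ in range(epochs):
        for features, gold in examples:
            guess = predict(weights, features)
            if guess != gold:                       # update only on mistakes
                for f, v in features.items():
                    weights[gold][f] = weights[gold].get(f, 0.0) + v
                    weights[guess][f] = weights[guess].get(f, 0.0) - v
    return weights

data = [({"word=dog": 1.0, "suffix=og": 1.0}, "NOUN"),
        ({"word=runs": 1.0, "suffix=ns": 1.0}, "VERB")]
w = train(data, classes=["NOUN", "VERB"])
print(predict(w, {"word=runs": 1.0, "suffix=ns": 1.0}))   # VERB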

Word Segmentation and its importance


➔ Word segmentation is the task of identifying boundaries between words in a sentence.
➔ This process is easy in space-delimited languages like English but becomes challenging in
languages without clear word separators.
➔ Word segmentation is a critical step in Natural Language Processing (NLP) because it
enables tokenization, which is essential for many downstream tasks such as part-of-speech
tagging, named entity recognition, machine translation, and more.
➔ In languages that do not naturally mark word boundaries, such as Chinese, Japanese, and
Thai, word segmentation becomes crucial.
➔ Without correctly segmenting the words, it is impossible for an NLP system to understand
where one word ends and another begins, leading to serious misinterpretation of meaning.
➔ By contrast, languages like English use whitespace to separate words, so segmentation is
usually straightforward.
Examples:
➔ Chinese: "我喜欢中文" ("I like Chinese") → must be segmented as "我 / 喜欢 / 中文".

➔ Japanese: " 私は日本語が好きです " ("I like Japanese") → contains a mix of scripts (Kanji,
Hiragana, Katakana) and allows multiple possible segmentations.
➔ Thai: No spaces between words, no capitalization clues, and complex character
combinations make segmentation very difficult.
➔ Vietnamese: Although it uses a Latin-based script with spaces, the spaces separate syllables
rather than words, so detecting the boundaries of multi-syllable words is tricky.

Methods to Solve Word Segmentation


To address these challenges, researchers use various techniques:

1. Rule-Based Methods
➔ Rule-based models use manually created grammar rules and dictionaries specific to each
language.
➔ These models often rely on:
(a) Lists of common words
(b) Morphological analysis (prefixes, suffixes)
(c) Handwritten heuristics about syllable or character patterns
➔ Example: In Chinese, if certain character sequences frequently occur together, they are
considered a word.
➔ Advantages: Fast and interpretable.
➔ Disadvantages: Hard to maintain, language-specific, and brittle to exceptions.

2. Statistical Methods
➔ These models use large annotated corpora to learn the probability of character or syllable
sequences forming valid words.
➔ Techniques include:
(a) N-gram models: Predict the likelihood of a sequence based on frequency.
(b) Hidden Markov Models (HMMs): Model sequences with hidden states representing words.
(c) Maximum Entropy Models: Combine multiple contextual features for prediction.
➔ Statistical models can infer the most likely segmentation by maximizing the probability of the
resulting sequence of words (a small sketch follows this list).
➔ Advantages: More flexible than rule-based; adapts to variations in language.
➔ Disadvantages: Needs large labeled datasets; struggles with rare words or phrases.
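
To make this concrete, here is a hedged sketch of maximum-probability segmentation under a unigram word model; the probability table is invented for illustration, whereas a real system estimates it from a large corpus:

# Viterbi-style word segmentation with a unigram model (illustrative sketch).
import math

WORD_PROB = {"我": 0.05, "喜欢": 0.04, "喜": 0.01, "欢": 0.01,
             "中文": 0.03, "中": 0.02, "文": 0.02}     # invented probabilities

def segment(text, max_len=4):
    n = len(text)
    best = [0.0] + [-math.inf] * n     # best log-probability of segmenting text[:i]
    back = [0] * (n + 1)               # start index of the last word in that segmentation
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word in WORD_PROB:
                score = best[j] + math.log(WORD_PROB[word])
                if score > best[i]:
                    best[i], back[i] = score, j
    if best[n] == -math.inf:           # nothing matched: fall back to single characters
        return list(text)
    words, i = [], n                   # follow backpointers to recover the best split
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words))

print(segment("我喜欢中文"))    # ['我', '喜欢', '中文']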
3. Neural Network Methods (Deep Learning)
➔ Neural approaches have recently become dominant for word segmentation tasks.
➔ Techniques include:
(a) Recurrent Neural Networks (RNNs) and Long Short-Term Memory Networks
(LSTMs): Model sequences while remembering long-range dependencies.
(b) Convolutional Neural Networks (CNNs): Extract local patterns in character
sequences.
(c) Transformer-based Models (like BERT): Pre-trained deep models that understand
context bidirectionally and predict segment boundaries.
(d) Sequence Labeling: Frame segmentation as a classification problem where each
character is labeled with its position in a word, e.g., "Begin" or "Inside" (BIO- or
BMES-style tagging schemes); a small sketch follows this list.
➔ Advantages: High accuracy, adaptable to complex scripts and contexts.
➔ Disadvantages: Requires large amounts of annotated data and computational resources.
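
A small sketch of the sequence-labeling framing: a gold segmentation is converted into per-character labels that a neural tagger is trained to predict, and predicted labels are converted back into words (a simple B/I scheme is used here; BMES is equally common):

# Convert between word segmentations and per-character boundary labels (sketch).
def to_labels(words):
    labels = []
    for w in words:
        labels.extend(["B"] + ["I"] * (len(w) - 1))   # B = word start, I = inside
    return labels

def to_words(chars, labels):
    words = []
    for ch, lab in zip(chars, labels):
        if lab == "B" or not words:                   # start a new word at every "B"
            words.append(ch)
        else:
            words[-1] += ch
    return words

gold = ["我", "喜欢", "中文"]
labels = to_labels(gold)
print(labels)                           # ['B', 'B', 'I', 'B', 'I']
print(to_words("我喜欢中文", labels))    # ['我', '喜欢', '中文']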

Performances of the Approaches


The performance of different word segmentation approaches varies based on the task complexity,
language characteristics, and availability of data. Each method has its own strengths and
limitations:

1. Rule-Based Approaches
➔ Effective when:
• The document or language structure is simple.
• The rules governing word boundaries are clear and consistent.
➔ Struggles when:
• The language or document structure is complex or highly ambiguous.
• Significant manual effort is needed to craft, maintain, and update the rules.
• It faces difficulty adapting to new, unseen text patterns.

2. Statistical Approaches
➔ Effective when:
• There is a large amount of labeled data available for training.
• The document structure is relatively consistent across examples.
• Probabilistic models can accurately estimate the likelihood of different word
boundary patterns.
➔ Struggles when:
• Novel or rare document structures appear that were not well-represented in the
training data.
• Performance can degrade on out-of-domain texts or low-frequency word
combinations.

3. Deep Learning Approaches


➔ Effective when:
• Handling complex, ambiguous, and context-dependent document structures.
• Discovering new segmentation patterns without explicit programming.
• Models like LSTMs, CNNs, and Transformers can learn deep contextual
relationships in the text.
➔ Struggles when:
• Large annotated datasets are unavailable, limiting the model’s learning ability.
• Requires substantial computational resources (GPU/TPU) for training and
inference.
• Interpretability is low: it can be difficult to understand how exactly the model arrives
at a decision.

Types of Parsing Algorithms


Parsing is the process of analyzing the syntactic structure of a sentence using a grammar. Several
algorithms are used for parsing in NLP, and each has its own strategy.

1. Recursive Descent Parsing


Type: Top-down Parsing
1. Begins with the start symbol (e.g., S or E).
2. Recursively applies production rules to expand non-terminals.
3. Matches the input tokens to the grammar from left to right.
4. Backtracking may occur if a rule fails.
How it works:
1. Each non-terminal in the grammar has a function.
2. It attempts to match the input string by recursively calling functions for the symbols on the
right-hand side.
Example Grammar (Arithmetic Expressions):
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | num
Parsing the expression: 3 + 4 * (5 - 2)
The parse tree (simplified) would look like:
E
├── E
│   └── T
│       └── F
│           └── num (3)
├── +
└── T
    ├── T
    │   └── F
    │       └── num (4)
    ├── *
    └── F
        ├── (
        ├── E
        │   ├── E
        │   │   └── T
        │   │       └── F
        │   │           └── num (5)
        │   ├── -
        │   └── T
        │       └── F
        │           └── num (2)
        └── )
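
Note that a recursive descent parser cannot use the left-recursive rules E → E + T and T → T * F directly (they would recurse forever), so in practice they are rewritten as loops. The following sketch recognizes the same arithmetic language in that rewritten form and reports whether the input is well-formed:

# Recursive descent recognizer for the arithmetic grammar (sketch).
# Left-recursive rules are rewritten as E -> T (("+"|"-") T)* and T -> F (("*"|"/") F)*.
import re

def tokenize(text):
    return re.findall(r"\d+|[+\-*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_E(self):                           # E -> T (("+" | "-") T)*
        if not self.parse_T():
            return False
        while self.peek() in ("+", "-"):
            self.pos += 1
            if not self.parse_T():
                return False
        return True

    def parse_T(self):                           # T -> F (("*" | "/") F)*
        if not self.parse_F():
            return False
        while self.peek() in ("*", "/"):
            self.pos += 1
            if not self.parse_F():
                return False
        return True

    def parse_F(self):                           # F -> "(" E ")" | num
        if self.peek() == "(":
            self.pos += 1
            if not self.parse_E() or self.peek() != ")":
                return False
            self.pos += 1
            return True
        if self.peek() is not None and self.peek().isdigit():
            self.pos += 1
            return True
        return False

p = Parser(tokenize("3 + 4 * (5 - 2)"))
print(p.parse_E() and p.pos == len(p.tokens))    # True: the expression is well-formed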

2. Shift-Reduce Parsing in NLP


Shift-reduce parsing is a bottom-up parsing strategy used to construct a parse tree from input
tokens. It operates using a stack, an input buffer, and a set of grammar rules.
How It Works:
1. Shift: Move the next word from the input buffer onto the stack.
2. Reduce: Replace items on the stack with a non-terminal if they match the right-hand side
of a grammar rule.
3. Goal: Reduce the entire sentence to the start symbol (e.g., S).
Grammar
S → NP VP
NP → Det N
VP → V NP
Det → the
N → cat | mouse
V → chased
Example: Sentence → "the cat chased the mouse"
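A hedged sketch of running this grammar with NLTK's built-in shift-reduce parser (assuming nltk is installed; the grammar string mirrors the rules above):

# Shift-reduce parsing of the example sentence with NLTK (sketch).
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N  -> 'cat' | 'mouse'
V  -> 'chased'
""")

parser = nltk.ShiftReduceParser(grammar)
for tree in parser.parse("the cat chased the mouse".split()):
    tree.pretty_print()

# Conceptual trace: shift "the" (reduce to Det), shift "cat" (reduce to N),
# reduce Det N -> NP; shift "chased" (reduce to V); shift "the" and "mouse"
# (reduce to Det, N, then NP); reduce V NP -> VP; reduce NP VP -> S.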
3. Chart Parsing Using Hypergraphs
Chart parsing is a technique in syntactic analysis (parsing) that stores intermediate results in a
structure called a chart, usually represented as a 2D table. This helps avoid recalculating the same
partial parses multiple times. It’s efficient, especially for ambiguous grammars.
Chart parsing can be top-down, bottom-up, or both. We’ll use a bottom-up style here for a simple
sentence:
"the cat chased the mouse"
Grammar Rules:
S → NP VP
NP → Det N
VP → V NP
Det → the
N → cat | mouse
V → chased
How Chart Parsing Works:
1. Initialization:
Create a chart (like a triangular matrix) with spans representing all possible substrings of the
sentence. Each cell (i, j) in the chart stores all possible parses for the span from word i to j.
2. Scanning:
Add terminal rules (like Det → the) for individual words in their respective chart cells.
3. Prediction:
Use grammar rules to predict non-terminal possibilities for longer spans. For example, if
"the" and "cat" are found next to each other, try applying NP → Det N.
4. Combination (Completion):
Keep combining smaller constituents (like Det + N → NP) into larger ones (like NP + VP →
S) until the entire sentence is parsed.
5. Final Goal:
The chart cell covering the full sentence span (0, 5) should contain a complete parse with S
as the label.
In chart parsing, we treat the sentence as a sequence of tokens (words), indexed from
position 0 to N, where N is the number of words in the sentence.
The span (i, j) represents the part of the sentence that starts at index i and ends just
before j; in other words, it includes the words from position i to j-1.

We build up the chart from the smallest spans (individual words) to larger spans (phrases and
whole sentence):
1. Lexical rules fill spans of one word each — like (0,1): Det → the.
2. Using grammar, we combine adjacent spans to form larger phrases:
1. (0,2) gets NP by combining (0,1) + (1,2)
2. (2,5) gets VP by combining V and NP
3. (0,5) becomes S by combining NP and VP
The topmost chart cell (0,5) has S → NP VP, meaning the whole sentence has been successfully
parsed.
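Because the grammar above is already in binary (Chomsky normal) form, the chart can be filled with a very small CKY-style sketch, where chart[(i, j)] holds the non-terminals that can cover words i..j-1:

# Tiny CKY-style chart parser for the example grammar (illustrative sketch).
from collections import defaultdict

LEXICON = {"the": {"Det"}, "cat": {"N"}, "mouse": {"N"}, "chased": {"V"}}
BINARY_RULES = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}

def cky(words):
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                 # scanning: one-word spans
        chart[(i, i + 1)] |= LEXICON.get(w, set())
    for length in range(2, n + 1):                # combine smaller spans into larger ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):             # split point between the two parts
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        parent = BINARY_RULES.get((left, right))
                        if parent:
                            chart[(i, j)].add(parent)
    return chart

chart = cky("the cat chased the mouse".split())
print(chart[(0, 2)])    # {'NP'}
print(chart[(2, 5)])    # {'VP'}
print(chart[(0, 5)])    # {'S'} -> the whole sentence is parsed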
4. Dependency Parsing and MST (Maximum Spanning Tree)
1. It represents the grammatical structure of a sentence as a directed graph over the words
themselves (rather than a phrase-structure tree over constituents).
2. Each word is a node in the graph.
3. Edges connect words to show grammatical dependencies, like subject-of, object-of, etc.
4. The resulting graph is a Directed Acyclic Graph (DAG) and typically forms a tree called
the dependency tree.
Why Use MST (Maximum Spanning Tree)?
We can imagine all possible syntactic dependencies between words as edges in a complete
graph. Only one subset of those edges forms the most likely parse of the sentence: the correct
tree.
To find this tree:
1. Each edge is assigned a score (how likely that dependency is)
2. The goal is to find the tree with the highest total score
This is exactly what MST algorithms do.
Chu-Liu/Edmonds Algorithm Steps (for directed MST)
1. Create a directed graph with:
1. Nodes = words
2. Edges = possible dependencies
3. Edge weights = scores based on statistical or neural models
2. Choose a root node, usually the main verb (e.g., “chased”)
3. Assign scores to all edges using features (POS tags, distance, embeddings)
4. Apply MST algorithm:
1. For each node (except the root), choose the incoming edge with the maximum score
2. If the chosen edges form a cycle, contract the cycle into a single node, adjust the
scores of the edges entering it, and repeat the selection
3. When the contracted cycles are expanded, one edge inside each cycle is dropped,
leaving a tree with no cycles
5. Label the edges in the final tree with syntactic roles:
1. e.g., nsubj(chased, cat) or obj(chased, mouse)
Example: MST Dependency Parse for
"The cat chased the mouse"
Suppose each possible head–dependent relation is assigned a score (the score table itself is not
reproduced here). Selecting the highest-scoring head for every word gives us the tree:
ROOT
|
chased
/ \
cat mouse
| |
the the
And corresponding dependency edges:
1. root(ROOT, chased)
2. nsubj(chased, cat)
3. det(cat, the)
4. obj(chased, mouse)
5. det(mouse, the)
This is the MST (maximum-scoring tree) that forms a valid dependency parse.
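
A hedged sketch of the head-selection step with invented scores (a full Chu-Liu/Edmonds implementation would additionally contract any cycles, rescore, and repeat; the numbers below are purely hypothetical):

# Greedy head selection for "The cat chased the mouse" (illustrative only).
# Scores are invented; a real parser derives them from a statistical or neural model.
SCORES = {
    ("ROOT",   "chased"): 10.0,
    ("chased", "cat"):     8.0,
    ("chased", "mouse"):   7.5,
    ("cat",    "the@1"):   6.0,    # first "the"
    ("mouse",  "the@4"):   6.0,    # second "the"
    ("cat",    "chased"):  2.0,    # some competing (lower-scoring) edges
    ("mouse",  "cat"):     1.5,
}
WORDS = ["the@1", "cat", "chased", "the@4", "mouse"]

def best_heads(words, scores):
    """For every word, keep the incoming edge with the highest score."""
    heads = {}
    for dep in words:
        candidates = [(s, h) for (h, d), s in scores.items() if d == dep]
        if candidates:
            heads[dep] = max(candidates)[1]
    return heads

for dep, head in best_heads(WORDS, SCORES).items():
    print(head, "->", dep)
# Chosen edges: cat->the@1, chased->cat, ROOT->chased, mouse->the@4, chased->mouse.
# If the chosen edges formed a cycle, Chu-Liu/Edmonds would contract it and re-select.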

Syntax Analysis Using Phrase Structure


Phrase structure trees, also called parse trees or syntax trees, represent the hierarchical structure
of a sentence.

Structure
➔ Each node: A phrase or word category (e.g., NP = Noun Phrase, VP = Verb Phrase).

➔ Tree root: Represents the entire sentence (S).

➔ Leaves: The individual words in the sentence.

Example
Sentence: “The cat sat on the mat”
Phrase structure representation:
(S
(NP (DT The) (NN cat))
(VP (VBD sat)
(PP (IN on)
(NP (DT the) (NN mat)))))
➔ Sentence (S) = Noun Phrase (NP) + Verb Phrase (VP)

➔ VP = Verb + Prepositional Phrase (PP)


➔ PP = Preposition + Noun Phrase (NP)
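
For reference, the bracketed form above can be loaded and displayed with NLTK's Tree class (assuming nltk is installed):

# Load and display the bracketed phrase-structure string (sketch).
from nltk import Tree

t = Tree.fromstring(
    "(S (NP (DT The) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)
t.pretty_print()     # draws the tree in the terminal
print(t.label())     # S
print(t.leaves())    # ['The', 'cat', 'sat', 'on', 'the', 'mat']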

Why Phrase Structure Trees?


➔ They are great for linguistic analysis.
➔ Help in understanding the hierarchical structure of sentences.

Applications
1. Text-to-Speech systems
2. Natural Language Understanding (NLU)
3. Machine Translation
4. Syntax-based Language Generation

Issues and Challenges in Finding the Structure of Words in NLP


Understanding and processing the structure of words is essential for tasks like parsing, translation,
and text understanding. However, several challenges make this a complex problem in NLP.

1. Ambiguity
➔ Homonyms:
Words spelled and pronounced the same but have different meanings.
Example: "bank" → financial institution / river bank
➔ Polysemy:
Words with multiple related meanings.
Example: "book" → physical object or act of reserving
➔ Syntactic Ambiguity:
Sentences with multiple valid parses.
Example: "I saw her duck" (bird or action?)
➔ Cultural/Linguistic Ambiguity:
Idioms or slang confusing NLP systems.
Example: "kick the bucket" (means "to die")
➔ Solutions:
Contextual embeddings, part-of-speech tagging, syntactic parsing, large datasets.

2. Morphology
➔ Languages have complex rules for word formation.
➔ Words change to show tense, number, gender, etc.
➔ Example: "run," "ran," "running," "runner"
➔ Solutions:
Morphological analyzers, lemmatization, morphological tagging.

3. Word Order
➔ The position of words affects meaning.
➔ In free-word-order languages, order varies without changing meaning.
Example: Russian, Hindi.
➔ Solutions:
Syntax parsing, dependency parsing.

4. Informal Language
➔ Slang, colloquialisms, emojis, abbreviations common in casual texts.
Example: "LOL", "gonna", "brb", "u r gr8"
➔ Solutions:
Text normalization, preprocessing techniques, social-media-trained models.

5. Out-of-Vocabulary (OOV) Words


➔ Systems encounter unseen words (slang, typos, new terms).
➔ Especially problematic in morphologically rich languages.
➔ Solutions:
Subword tokenization (Byte-Pair Encoding, WordPiece), contextual embeddings.

6. Named Entity Recognition (NER) Issues


➔ Recognizing names (people, places, organizations) is tricky.
➔ Entities can be ambiguous.
Example: "Apple" → fruit or company
➔ Solutions:
Specialized NER models trained on labeled data.

7. Language-Specific Challenges
➔ Every language has unique structures, rules, and exceptions.
Example: Methods designed for English may fail for Korean, Arabic, or Finnish.
➔ Solutions:
Language-specific tools, multilingual models like mBERT, XLM-R.

8. Domain-Specific Challenges
➔ General NLP models struggle with specific domains (medicine, law, tech).
➔ Example: "virus" in medicine vs. cybersecurity.
➔ Solutions:
Fine-tuning models on domain-specific corpora.
9. Irregularity
➔ Languages have irregular verbs, plurals, inflections.
➔ Examples:
"go → went" (not "goed")
"child → children" (not "childs")
➔ Solutions:
Rule-based systems + ML models for spotting irregular patterns.

10. Productivity
➔ Languages constantly create new words using prefixes, suffixes, compounding.
Examples:
"happy → unhappy"
"smart + phone → smartphone"
➔ Solutions:
Morphological analysis tools, subword-aware models.

Finding the Structure of Documents in NLP


In NLP (Natural Language Processing), finding the structure of a document means figuring out the
different parts of a document, like headings, paragraphs, and sections, and organizing them in a way
that makes sense.
Here are the main methods used to do this:
1. Rule-based Methods:
These methods follow specific rules. For example, a rule might say that a heading is larger
or bolded. The system looks for these signs to find the parts of the document.
2. Machine Learning Methods:
These methods use computers to learn from examples. The system looks at a bunch of
documents that already have their structure labeled, then it learns to recognize similar
patterns in new documents.
3. Hybrid Methods:
These methods combine the two approaches. For example, the system might first use rules
to find headings, then use machine learning to figure out what the section is about (like if it's
an introduction or conclusion).
Finding the structure helps computers understand the document better. This makes tasks like
summarizing, searching for information, and classifying text more accurate.
Overall, it's a tough task, but it's very important for analyzing and understanding documents in NLP.

Sentence Boundary Detection in NLP


Sentence boundary detection is the process of figuring out where one sentence ends and the next
begins in a document. This is important for many tasks in NLP, like machine translation,
summarizing text, and searching for information.
However, it’s not always easy because language can be tricky.
For example, abbreviations, acronyms, or names that end with a period can confuse the system into
thinking a sentence has ended when it hasn’t.
To solve this problem, there are several methods used for sentence boundary detection:
1. Rule-based Methods:
◦ These use pre-set rules to identify the end of a sentence.

◦ For example, a rule might say that if there is a period followed by a space, it marks the
end of a sentence unless it’s part of an abbreviation.
2. Machine Learning Methods:
◦ These methods use algorithms that learn from data. The system is trained on documents
with labeled sentence boundaries and learns to recognize patterns.
◦ For example, it might look at the length of a sentence, punctuation marks, or the part of
speech of the last word to predict where a sentence ends.
3. Hybrid Methods:
◦ These combine both rule-based and machine learning methods.
◦ For example, the system might first use rules to find most of the sentence boundaries
and then apply machine learning to correct any mistakes or handle special cases.
Some tools and techniques used in sentence boundary detection include:
1. Regular Expressions:
◦ These are patterns that match specific character sequences, such as a period followed by
a space, to mark the end of a sentence (a minimal sketch appears after this list).
2. Hidden Markov Models (HMMs):
◦ These models look at the probabilities of different sentence-ending markers to predict
where sentences are likely to end.
3. Deep Learning Models:
◦ These are advanced neural networks that can learn complex patterns from large amounts
of data and are very effective at detecting sentence boundaries.
Accurately detecting sentence boundaries is key to many NLP tasks. It helps systems understand
and process text more accurately, which leads to better summarization, information extraction, and
other language-related tasks.
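
A minimal rule-based splitter in this spirit, using a regular expression plus a small abbreviation list (both the pattern and the list are illustrative; real systems use far more robust rules or trained models):

# Naive rule-based sentence boundary detection (illustrative sketch).
import re

ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}   # illustrative list

def split_sentences(text):
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):   # candidate boundary: . ! or ? plus space
        words = text[start:match.start() + 1].split()
        if words and words[-1].lower() in ABBREVIATIONS:
            continue                               # the period belongs to an abbreviation
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if start < len(text):
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith arrived late. He missed the talk! Was it important?"))
# ['Dr. Smith arrived late.', 'He missed the talk!', 'Was it important?']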

Advantages and Disadvantages of NLP


Advantages of NLP
Natural Language Processing (NLP) offers several advantages that make human-computer
interaction more efficient:
1. It enables users to ask questions in natural language and receive direct, accurate responses
within seconds.
2. NLP focuses on providing exact answers rather than overwhelming the user with
unnecessary or irrelevant information.
3. It bridges the communication gap by allowing computers to understand and interact using
human languages.
4. The technology is highly time-efficient, streamlining tasks that would otherwise require
manual search and interpretation.

Disadvantages of NLP
Despite its benefits, NLP also faces some limitations:
1. It sometimes struggles to fully capture or represent context, leading to misinterpretations.
2. NLP systems can be unpredictable, especially in handling complex or ambiguous language
inputs.
3. Using NLP interfaces may require additional keystrokes compared to simpler input methods.
4. Most NLP systems are domain-specific and cannot easily adapt to new topics or tasks
without retraining or modification.

Sentiment Analysis
➔ Sentiment Analysis, also called opinion mining, is a technique used to evaluate the
emotional tone behind a body of text. By assigning values such as positive, negative, or
neutral, it helps identify the mood or emotional state of the sender (happy, sad, angry, etc.).
➔ Sentiment analysis combines techniques from NLP and statistical analysis and is widely
used to understand public opinion on websites, social media, and customer feedback
platforms.

LUNAR
LUNAR is a classical example of a Natural Language database interface system developed using
Woods' Procedural Semantics. It was designed to translate complex natural language expressions
into database queries and successfully handled approximately 78% of user requests without errors.

Difference Between NLU and NLG


Natural Language Understanding (NLU):
➔ Involves reading, interpreting, and understanding human language.
➔ Converts natural language inputs into machine-readable representations.
Natural Language Generation (NLG):
➔ Involves generating natural language text from non-linguistic data.
➔ Converts structured or unstructured data into human-readable text.

Semantic Parsing
➔ Semantic parsing is the process of automatically translating natural language utterances into
formal meaning representations that computers can understand and execute. For instance, a
geographical information system might use a semantic parser to interpret a user query like,
"What is the highest mountain in Europe?" into a structured database query.
➔ The working of semantic parsing involves mapping natural language inputs to machine-
understandable logical forms. These logical forms are executable against a real-world
environment or knowledge base to yield a response, known as the denotation.
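
A toy illustration of this pipeline: the "parser" below is hard-coded for a single query and the knowledge base is a tiny invented table, but it shows how an executable logical form is evaluated against an environment to produce the denotation:

# Toy semantic parsing pipeline (all data and names are invented for illustration).
KB = {   # mountain -> (continent, height in metres)
    "Mont Blanc":    ("Europe", 4808),
    "Mount Elbrus":  ("Europe", 5642),
    "Mount Everest": ("Asia",   8849),
}

def parse(utterance):
    """Map one known utterance to an executable logical form (hard-coded sketch)."""
    if utterance == "What is the highest mountain in Europe?":
        return ("argmax", "height", ("mountains_in", "Europe"))
    raise ValueError("utterance not covered by this toy parser")

def execute(form):
    """Evaluate the logical form against the knowledge base: the result is the denotation."""
    _, _, (_, continent) = form
    candidates = [m for m, (c, _) in KB.items() if c == continent]
    return max(candidates, key=lambda m: KB[m][1])

form = parse("What is the highest mountain in Europe?")
print(form)            # ('argmax', 'height', ('mountains_in', 'Europe'))
print(execute(form))   # Mount Elbrus (the denotation under this toy knowledge base)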
Note: This information is completely based on the textbook reference. The actual context may vary depending on the specific question asked in the exam. Please ensure you understand
the concepts thoroughly and apply them appropriately based on the question requirements.
