
NLP JNTUH Unit 1

The document discusses key linguistic concepts such as irregularity, ambiguity, productivity, and various morphological models. It explains the differences between generative and discriminative sequence classification methods, highlighting their training complexities and performance in tasks like sentence segmentation. Additionally, it presents performance results of different classifiers on speech corpora, demonstrating the effectiveness of various approaches.


Irregularity:

● Irregularity means having forms or structures in language that don't fit the usual
patterns or rules. These are exceptions to the standard ways of forming words or
sentences.
● Irregular verbs are verbs that don't follow the standard rule of adding -d, -ed, or -ied to
form their past simple or past participle forms (e.g., "go" becomes "went" instead of
"goed").
● Examples of irregular verbs include "run" (past: "ran"), "buy" (past: "bought"), and
"take" (past: "took").
● Irregularities can also appear in other parts of language, like noun plurals (e.g., "child"
becomes "children" instead of "childs") or comparative adjectives (e.g., "good" becomes
"better" instead of "gooder").
● Understanding irregularities is important because they often appear in everyday
language, and using them correctly makes speech and writing more natural.
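
In practice, a morphological generator usually handles irregular forms through an exception list that is consulted before the regular rule is applied. A minimal, illustrative Python sketch (the word list and spelling rules below are simplified examples, not a full treatment of English):

```python
# Irregular past-tense forms stored as explicit exceptions.
IRREGULAR_PAST = {
    "go": "went",
    "run": "ran",
    "buy": "bought",
    "take": "took",
}

def past_tense(verb: str) -> str:
    """Return the past tense, checking the irregular exceptions first."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Regular rule: add -d, -ed, or -ied depending on the ending.
    if verb.endswith("e"):
        return verb + "d"
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in "aeiou":
        return verb[:-1] + "ied"
    return verb + "ed"

print(past_tense("go"))     # went   (irregular)
print(past_tense("walk"))   # walked (regular)
print(past_tense("carry"))  # carried (regular, -ied)
```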

Ambiguity:

● Ambiguity means that something, like a sentence or word, can have two or more
possible meanings. In both speaking and writing, there are two main types of ambiguity:
1. Lexical Ambiguity:
● Lexical ambiguity occurs when a word has more than one meaning, and it is
unclear which meaning is intended in a given context.
● This happens because some words have multiple definitions or uses.
○ Example: "Bank"
■ Could mean a financial institution (where you store money).
■ Or it could refer to the side of a river (a riverbank).
2. Syntactic Ambiguity: the presence of two or more possible meanings within a single
sentence or sequence of words.
● This type of ambiguity happens when the way words are arranged allows for
multiple possible meanings.
○ Example: "The chicken is ready to eat."
■ This could mean that the chicken is prepared and ready for someone to
eat it (the chicken is the food).
■ Or it could mean that the chicken itself is hungry and ready to eat
something.

Issues and Challenges: Ambiguity is a challenge in communication because it can lead to
misunderstandings or confusion.

Linguistic Ambiguity: This happens when language is unclear and can be understood in
different ways. It can make it hard or even impossible for someone (or an AI program) to
figure out the exact meaning without more context or information.
○ Example: "I saw someone on the hill with a telescope." This sentence could
mean you used a telescope to see someone on the hill, or that the person you
saw had a telescope.

Homonyms: These are words that sound alike or are spelled alike but have different meanings or functions.

○ Examples:
■ "Bore" (to drill a hole) and "boar" (a wild pig).
■ "Two" (the number) and "too" (meaning also).

Productivity:

● Productivity in language refers to our unlimited ability to create new sentences and
expressions. This means we can use any language to say things that have never
been said before. It’s also called open-endedness or creativity.
● The term can also refer to specific parts of language, like prefixes or suffixes, that
help us create new words of the same type (e.g., adding "-ness" to "happy" to make
"happiness").
● Productivity is most often talked about in relation to word-formation, which is how we
create new words.
● Humans constantly come up with new ways to express ideas and describe new
things by using their language creatively. This ability, called productivity, allows us to
create an infinite number of sentences.
● Other animals don't have this kind of flexibility in communication. For example, cicadas
have only four signals, and vervet monkeys have 36 vocal calls. They can’t create new
signals to talk about new experiences or events.
● The limitless ability to create and understand completely new sentences is known as
open-endedness.
● Another important part of human creativity is the freedom to respond in any way we
choose. People can say whatever they want in any situation, or they can choose to
say nothing at all.

Morphological Models

• Dictionary Lookup

• Finite-State Morphology

• Unification-Based Morphology

• Functional Morphology
Dictionary Lookup:

● Morphological parsing is a process where word forms in a language are matched
with their corresponding linguistic meanings or structures.
● To analyze a word, systems often look it up directly in word lists, dictionaries, or
databases.
● A dictionary in this context is a data structure designed to quickly provide
precomputed results, like word analyses.
● This data structure can be optimized for fast lookups, making the process efficient.
● The results from these lookups can also be shared across different applications.
● Lookup operations in dictionaries are typically simple and fast.
● Dictionaries can be created using various data structures like lists, binary search
trees, tries, hash tables, etc.
● Efficient Retrieval: Using optimized data structures in dictionaries helps in retrieving
word information quickly, which is crucial for real-time language processing.
● Broad Application: Dictionary-based methods are adaptable and have been
implemented in various linguistic tools and systems for multiple languages.
● Scalability: With the vast availability of data online, dictionaries can be continuously
updated and expanded, ensuring they cover a wide range of word forms and usages.
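
As a rough illustration of dictionary lookup, the sketch below stores precomputed analyses in a Python dict (a hash table), so each lookup is a single fast operation; the entries and the analysis format are invented for this example:

```python
# Precomputed morphological analyses keyed by surface word form (hash-table dictionary).
ANALYSES = {
    "children": [("child", "Noun", "Plural")],
    "took":     [("take", "Verb", "Past")],
    "banks":    [("bank", "Noun", "Plural"), ("bank", "Verb", "Present+3sg")],
}

def lookup(word_form: str):
    """Return all stored analyses for a word form, or an empty list if it is unknown."""
    return ANALYSES.get(word_form.lower(), [])

print(lookup("banks"))  # two analyses: the word form is ambiguous
print(lookup("goed"))   # []  -- not in the dictionary
```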

Finite-State Morphology:

● Finite-state morphological models use specifications written by programmers that are
directly compiled into finite-state transducers (FSTs).
● Two popular tools for this approach are XFST (Xerox Finite-State Tool) and LexTools.

Finite-State Transducers:

● A finite-state transducer is a computational device that extends the functionality of a
finite-state automaton.
● Essentially, an FST is like a finite-state automaton but operates on two (or more)
tapes: it reads input from one tape and writes output to another.
● Think of transducers as "translating machines" that convert input symbols (like
letters or words) into output symbols.
● FSTs are made up of a finite set of nodes connected by directed edges. These
nodes are called states, and the edges are called arcs.
● As you move through the network from the initial states to the final states along these
arcs, the FST reads input symbols and writes corresponding output symbols.
● The sequences that the transducer accepts define the input language, while the
sequences it outputs define the output language.
● Efficiency: FSTs are efficient for processing language because they handle regular
patterns and relations quickly.
● Flexibility: They can be used in various linguistic tasks, like word formation, spelling
correction, and more.
● Widely Used: Due to their power and flexibility, FSTs are commonly used in
computational linguistics and natural language processing.
● Complex Relations: FSTs can model complex relationships between input and output,
making them ideal for tasks like morphological analysis, where one form of a word needs
to be transformed into another.
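
The toy transducer below illustrates the idea of states and arcs: each arc reads one input symbol and writes one output symbol, and an input is accepted only if the path ends in a final state. The transitions (mapping the surface form "cats" to a lemma plus tags) are invented purely for illustration:

```python
# A toy finite-state transducer: states connected by arcs, each arc labelled
# with an (input symbol, output symbol) pair.
class FST:
    def __init__(self, start, finals, arcs):
        self.start = start    # initial state
        self.finals = finals  # set of final (accepting) states
        self.arcs = arcs      # {(state, input_symbol): (next_state, output_symbol)}

    def transduce(self, symbols):
        """Read input symbols and write output symbols; return None if rejected."""
        state, output = self.start, []
        for sym in symbols:
            if (state, sym) not in self.arcs:
                return None
            state, out = self.arcs[(state, sym)]
            output.append(out)
        return "".join(output) if state in self.finals else None

# Maps the surface form "cats" to the analysis "cat+N+Pl".
arcs = {
    (0, "c"): (1, "c"),
    (1, "a"): (2, "a"),
    (2, "t"): (3, "t"),
    (3, "s"): (4, "+N+Pl"),
}
fst = FST(start=0, finals={3, 4}, arcs=arcs)
print(fst.transduce("cats"))  # cat+N+Pl
print(fst.transduce("cat"))   # cat (state 3 is also final, so the bare stem is accepted)
```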

Unification-Based Morphology:

● Unification-based morphology focuses on providing complete grammatical
descriptions of languages, especially within frameworks like head-driven phrase
structure grammar (HPSG).
● Unification is a key process where feature structures are combined to create a more
detailed structure.
● Feature structures can be visualized as directed acyclic graphs (DAGs), where
nodes represent variable values and paths represent variable names.
● These structures are often displayed as attribute-value matrices. For example, an
attribute named "number" might have the value "singular." Attributes can be atomic
(simple values like "singular") or complex (like another feature structure, a list, or a set).
● Unification can fail if the feature structures contain conflicting information.
● Unification can be monotonic, meaning that all information from the original feature
structures is preserved in the result.
● Morphological models based on unification are often formulated as logic programs
and use unification to solve the constraints defined by the model.
● Advantages include better abstraction for developing a morphological grammar and the
elimination of redundant information.
● Unification-based models have been successfully implemented for languages such as
Russian, Czech, Slovene, Persian, Hebrew, and Arabic.
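
A minimal sketch of unification over attribute-value structures represented as Python dicts: the merge keeps all information from both structures and fails when atomic values conflict. Real feature structures are DAGs that can share values, which this toy version does not model:

```python
def unify(fs1, fs2):
    """Unify two feature structures (dicts); return None if they conflict."""
    result = dict(fs1)
    for attr, value in fs2.items():
        if attr not in result:
            result[attr] = value                 # new information is simply added
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            sub = unify(result[attr], value)     # recurse into complex values
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != value:
            return None                          # conflicting atomic values: failure
    return result

noun = {"pos": "noun", "agr": {"number": "singular"}}
verb_requires = {"agr": {"number": "singular", "person": "3rd"}}
print(unify(noun, verb_requires))  # merged structure with information from both
print(unify({"number": "singular"}, {"number": "plural"}))  # None (conflict)
```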

Functional Morphology:

● Functional morphology uses principles from functional programming and type
theory.
● It views morphological operations (like word formation) as pure mathematical
functions and organizes these operations into different types and categories.
● This approach isn’t limited to one type of language structure; it’s especially useful for
fusional languages, where one word part (morpheme) expresses multiple grammatical
features.
● Key language concepts like paradigms (patterns), rules, exceptions, grammatical
categories, lexemes (words or word stems), morphemes (smallest units of meaning),
and morphs (specific forms) can all be modeled using functional morphology.
● Functional morphology implementations are designed to be reusable as
programming libraries. These libraries can handle the complete morphological
structure of a language and can be used in various applications.
● A functional morphology model can be turned into finite-state transducers for specific
tasks, or it can be used in a more flexible, interactive way.
● Many functional morphology models are built into general-purpose programming
languages, giving developers the ability to use advanced programming techniques to
create real-world applications.
● Functional morphology models achieve high levels of abstraction.
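
As a loose illustration of the functional view, the sketch below treats inflection as a pure function from a lexeme and grammatical features to a word form, with irregular entries stored in a small exception table; the paradigm and feature names are invented for the example:

```python
# Inflection as a pure function: (lexeme, features) -> word form.
IRREGULAR = {("be", ("Pres", "3sg")): "is", ("be", ("Past", "3sg")): "was"}

def inflect(lexeme: str, tense: str, person: str) -> str:
    """Pure paradigm function for a toy fragment of English verb inflection."""
    key = (lexeme, (tense, person))
    if key in IRREGULAR:
        return IRREGULAR[key]
    if tense == "Pres" and person == "3sg":
        return lexeme + "s"
    if tense == "Past":
        return lexeme + ("d" if lexeme.endswith("e") else "ed")
    return lexeme

print(inflect("walk", "Pres", "3sg"))  # walks
print(inflect("be", "Past", "3sg"))    # was
```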

Generative Sequence Classification Methods:

● Generative Approach: This method focuses on learning about each class (or
category) by understanding how data is generated for that class.
○ Learning Process: It learns the joint probability distribution p(x,y), which
means it tries to model how both the features (x) and the classes (y) are
related.
○ Data Modeling: It models the distribution of data within each class
separately. For instance, it learns what a lion and an elephant look like based on
images from the zoo.
○ Reconstruction: It can generate new samples that are similar to those from the
classes it has learned about. For example, it can generate images of lions and
elephants that resemble the ones seen before.
○ Understanding: It has a deeper understanding of the overall structure of the
data and the relationships between different features.
○ Applications: Useful in scenarios where you need to generate new data,
simulate scenarios, or understand the underlying distribution of the data.
Examples include generative adversarial networks (GANs), hidden
Markov models (HMMs), and Naive Bayes classifiers.
○ Flexibility: Can be used for tasks like data imputation, anomaly detection, and
more because it understands the data generation process.
○ Advantages:
■ Can handle missing data by generating it.
■ Useful for scenarios where understanding the data generation process is
crucial.
○ Disadvantages:
■ Can be more complex to train due to the need to model the entire
distribution.
■ May require more data and computation.
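
A minimal Naive Bayes sketch of the generative idea: it estimates p(y) and p(x|y) from counts, so it effectively models the joint distribution p(x, y), and it classifies by picking the class with the highest joint probability. The toy sentence-boundary data and features are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy training data: feature tuples (pause after word, next word capitalised) and labels.
data = [
    (("long_pause", "cap"),  "boundary"),
    (("long_pause", "cap"),  "boundary"),
    (("short_pause", "cap"), "boundary"),
    (("short_pause", "low"), "no_boundary"),
    (("no_pause", "low"),    "no_boundary"),
    (("no_pause", "cap"),    "no_boundary"),
]

# Estimate p(y) and p(x_i | y) by counting.
label_counts = Counter(y for _, y in data)
feature_counts = defaultdict(Counter)
for x, y in data:
    for i, value in enumerate(x):
        feature_counts[(y, i)][value] += 1

def joint_prob(x, y):
    """Approximate p(x, y) = p(y) * prod_i p(x_i | y), with simple add-one smoothing."""
    p = label_counts[y] / len(data)
    for i, value in enumerate(x):
        counts = feature_counts[(y, i)]
        p *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
    return p

def classify(x):
    return max(label_counts, key=lambda y: joint_prob(x, y))

print(classify(("long_pause", "cap")))  # boundary
print(classify(("no_pause", "low")))    # no_boundary
```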

Discriminative Sequence Classification Methods:

● Discriminative Approach:
○ This method focuses on distinguishing between different classes based on
the features provided.
○ Learning Process: It learns the conditional probability distribution p(y∣x),
which means it tries to model the probability of a class given the features.
○ Feature Differences: It focuses on learning the differences between classes
by directly analyzing features and their relationships. For instance, it
identifies specific features that differentiate a lion from an elephant.
○ Classification: It is primarily used for making classifications or predictions.
For example, it can classify an unknown animal as a lion or elephant based on its
features.
○ Efficiency: Often requires less data to achieve high accuracy in
classification because it focuses on distinguishing features rather than
understanding the entire data distribution.
○ Applications: Commonly used in tasks like image recognition, spam
detection, and speech recognition.
○ Examples include logistic regression and support vector machines (SVMs).
○ Performance: Typically performs better in classification tasks where the primary
goal is to distinguish between categories, rather than understanding how each
category is generated.
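
For contrast, a minimal logistic regression sketch of the discriminative idea: it models p(y|x) directly and learns a decision boundary by gradient descent over numeric features. The toy data mirrors the generative example above and is purely illustrative:

```python
import numpy as np

# Toy features: [pause length in seconds, next word capitalised (0/1)].
X = np.array([[0.9, 1], [0.7, 1], [0.4, 1], [0.3, 0], [0.05, 0], [0.1, 1]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = sentence boundary, 0 = no boundary

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the negative log-likelihood of p(y | x).
for _ in range(2000):
    p = sigmoid(X @ w + b)        # predicted p(y = 1 | x)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

print(np.round(sigmoid(X @ w + b), 2))             # probabilities for the training examples
print(sigmoid(np.array([0.8, 1]) @ w + b) > 0.5)   # True: predicted boundary
```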

Summary:

● Generative Models aim to understand and model how data is generated, providing a
deeper insight into the data distribution and enabling data generation.
● Discriminative Models aim to focus on distinguishing between different categories
based on features, often leading to more accurate and efficient classification in practice.

Complexity of Approaches

1. Complexity in Training and Prediction:
○ Generative Models:
■ Training Complexity: Generally less complex because they focus on
modeling the overall data distribution. Training involves learning the joint
probability distribution p(x,y).
■ Prediction Complexity: Typically faster since the model has learned the
entire data distribution and can use this to generate or classify new data
directly.
■ Performance: Often requires more data and computational resources
but can handle a variety of tasks, including generating new data.
○ Discriminative Models:
■ Training Complexity: More complex because they focus on learning the
boundary between classes. Training involves adjusting feature weights
through multiple passes over the data to optimize classification
performance.
■ Prediction Complexity: Generally slower despite simpler models
because prediction requires evaluating feature weights for each instance.
However, some discriminative models can make predictions quickly once
trained.
■ Performance: Typically performs better on smaller training sets
compared to generative models. They are often more accurate for
classification tasks.
2. Preprocessing:
○ Some algorithms, particularly generative ones, may require preprocessing of
data. This includes converting continuous features into discrete features or
normalizing data to improve performance.
3. Sequence Models:
○ Decoding Complexity: For sequence classification, additional complexity
arises from decoding, which involves finding the best sequence of decisions.
Naively this means evaluating every possible sequence, which is computationally
expensive, so decoders typically use dynamic programming (e.g., the Viterbi
algorithm) to search the space efficiently (see the sketch after this list).
4. Real-World Performance:
○ Generative Models: Perform well when there is ample training data and when
understanding the data generation process is crucial.
○ Discriminative Models: Often excel in real-world classification tasks,
especially with smaller datasets. They are typically more accurate but may
require more sophisticated training processes.
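
Below is a compact sketch of dynamic-programming decoding (the Viterbi algorithm) for a sequence model, referenced in item 3 above: instead of scoring every possible label sequence, it keeps only the best-scoring path into each state at each position. The transition and emission scores are invented toy numbers:

```python
import math

states = ["boundary", "no_boundary"]
# Toy log-probabilities (invented numbers for illustration).
start = {"boundary": math.log(0.3), "no_boundary": math.log(0.7)}
trans = {
    ("boundary", "boundary"): math.log(0.1), ("boundary", "no_boundary"): math.log(0.9),
    ("no_boundary", "boundary"): math.log(0.3), ("no_boundary", "no_boundary"): math.log(0.7),
}
emit = {
    ("boundary", "long_pause"): math.log(0.8), ("boundary", "no_pause"): math.log(0.2),
    ("no_boundary", "long_pause"): math.log(0.1), ("no_boundary", "no_pause"): math.log(0.9),
}

def viterbi(observations):
    """Return the best label sequence without enumerating all |states|^n sequences."""
    scores = [{s: start[s] + emit[(s, observations[0])] for s in states}]
    back = []
    for obs in observations[1:]:
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: scores[-1][p] + trans[(p, s)])
            col[s] = scores[-1][best_prev] + trans[(best_prev, s)] + emit[(s, obs)]
            ptr[s] = best_prev
        scores.append(col)
        back.append(ptr)
    # Trace the best path backwards from the final column.
    best = max(states, key=lambda s: scores[-1][s])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["no_pause", "no_pause", "long_pause"]))
```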

Performance of Approaches for Sentence Segmentation in Speech

Evaluation Metrics:
1. Error Rate: Measures the ratio of errors to the total number of examples. Lower
error rates indicate better performance.
2. F1 Measure: The harmonic mean of recall and precision, which balances both
metrics to provide a single performance measure. Higher F1 scores reflect better
performance.
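
A small sketch of how these two metrics are computed from prediction counts (the numbers in the example are invented):

```python
def error_rate(num_errors: int, num_examples: int) -> float:
    """Fraction of examples classified incorrectly (lower is better)."""
    return num_errors / num_examples

def f1(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall (higher is better)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

print(error_rate(num_errors=25, num_examples=1000))  # 0.025
print(round(f1(true_positives=80, false_positives=20, false_negatives=30), 3))  # 0.762
```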

Performance Results:

● Mandarin TDT4 Multilingual Broadcast News Speech Corpus:
○ MaxEnt Classifier: F1 measure of 69.1%
○ Adaboost: F1 measure of 72.6%
○ Support Vector Machines (SVMs): F1 measure of 72.7%
○ Combination of 3 Classifiers Using Logistic Regression: The report suggests
this approach may improve results, though exact F1 measures are not specified.
● Turkish Broadcast News Corpus:
○ HELM: F1 measure of 78.2%
○ fHELM with Morphology Features: F1 measure of 86.2%
○ Adaboost: F1 measure of 86.9%
○ Conditional Random Fields (CRFs): F1 measure of 89.1%
○ Note: HELMs (hidden event language models) were trained on the same
corpus as the other classifiers, highlighting that model performance can vary
based on the type of classifier and the additional features used.
