NLP UNIT 2 Notes


Morphological Analysis –
Morphological analysis is the branch of linguistics that studies the internal
structure of words. It identifies how a word is formed from morphemes. A
morpheme is the smallest element of a word that carries grammatical function
or meaning. Morphemes are of two types: free morphemes and bound morphemes.
A single free morpheme can stand alone as a complete word.
What is a Morpheme?
A morpheme is the smallest meaningful unit in a language. A word is made up
of constituent morphemes. In English, examples of morphemes include whole
words, plural morphemes ('-s' and '-es'), and grammatical morphemes ('-ing'
and '-ed').
Morphemes can be broadly categorized into two classes –
 Free Morpheme
 Bound Morpheme
Free morpheme – A morpheme that can appear in isolation and still be
meaningful is called a free morpheme. The stem (root) of a word is a free
morpheme, because the root form carries meaning on its own.
Example: dog, carry, good etc.
Bound morpheme – A morpheme that must be attached to a free morpheme to add
meaning of various kinds, including plural marking and other grammatical
variations, is called a bound morpheme. Bound morphemes are sometimes referred
to as affixes. There are four types of affixes; a toy segmentation sketch
follows the list below:
Prefixes – morphemes attached at the front of a stem
Example: undo, disagree, uncommon etc.
Suffixes – morphemes attached at the end of a stem
Example: hiding, attached, dogs etc.
Infixes – morphemes attached inside the stem
Example: infixes are rare and not found in English (they occur in languages such as Tagalog)
Circumfixes – morphemes attached around a stem (before and after it)
Example: circumfixes are also absent in English; German ge-…-t, as in gespielt ('played'), is a standard example
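A minimal pure-Python sketch of affix segmentation, assuming small,
hand-picked prefix and suffix lists (the lists are illustrative, not a real
inventory of English affixes, and no spelling repair is applied):

# Toy sketch only: split a word into (prefix, stem, suffix) using tiny,
# hand-picked affix lists; real analyzers also restore spelling ("hid" -> "hide").
PREFIXES = ["un", "dis", "re"]
SUFFIXES = ["ing", "ed", "es", "s"]

def segment(word):
    prefix = next((p for p in PREFIXES if word.startswith(p) and len(word) > len(p) + 1), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 1), "")
    stem = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix

for w in ["undo", "disagree", "hiding", "dogs"]:
    print(w, "->", segment(w))
# undo -> ('un', 'do', ''), disagree -> ('dis', 'agree', ''),
# hiding -> ('', 'hid', 'ing'), dogs -> ('', 'dog', 's')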
Types of Morphology:
Inflectional Morphology -
Inflectional morphology is the modification of a word to express different
grammatical categories. It studies processes, including affixation and vowel
change, that distinguish word forms within grammatical categories; inflection
covers at least five such categories, including number, case, tense, aspect,
and person. Inflection does not create a new lexeme; it produces variant forms
of the same lexeme. Derivational morphology, discussed next, cannot be
categorized as neatly, because derivation is not as predictable as inflection.
Examples: cats (cat + plural -s), men (irregular plural of man).
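A short sketch of stripping inflectional morphemes with NLTK's WordNet
lemmatizer (assumes NLTK is installed and the WordNet data has been
downloaded; the word/POS pairs are illustrative):

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # one-time data download
nltk.download("omw-1.4", quiet=True)   # required by some NLTK versions

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("cats"))              # cat   (plural -s removed)
print(lemmatizer.lemmatize("men"))               # man   (irregular plural)
print(lemmatizer.lemmatize("running", pos="v"))  # run   (-ing removed)
print(lemmatizer.lemmatize("jumped", pos="v"))   # jump  (-ed removed)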
Derivational Morphology:
Derivational morphology creates new lexemes, either by changing the syntactic
category (part of speech) of a base, by adding substantial non-grammatical
meaning, or both. On the one hand, derivation may be distinguished from
inflectional morphology, which typically does not change category but rather
modifies lexemes to fit into various syntactic contexts; inflection typically
expresses distinctions such as number, case, tense, aspect, and person. On the
other hand, derivation may be distinguished from compounding, which also
creates new lexemes, but by combining two or more bases rather than by
affixation, reduplication, subtraction, or internal modification of various
sorts. Although these distinctions are generally useful, applying them in
practice is not always easy.
Examples: happiness (happy + -ness, adjective → noun), teacher (teach + -er, verb → noun).

Syntactic Representation in NLP –


Syntactic analysis, also known as parsing, is a fundamental task in Natural
Language Processing (NLP) that involves analyzing the grammatical structure
of sentences to understand how words relate to each other. The goal of syntactic
analysis is to identify the syntactic constituents (e.g., phrases, clauses) in a
sentence and determine their hierarchical relationships according to the rules of
a formal grammar.
Parsing Algorithms -
Parsing algorithms are methods used to analyze the structure of a string
according to a formal grammar. These algorithms determine whether a given
string conforms to the grammar and, if so, how it can be decomposed into its
constituent parts. Commonly used parsing strategies include top-down
(predictive) parsing, bottom-up parsing, chart parsing, and shift-reduce
parsing.

1. Top-Down Parsing:
Top-down parsing starts from the root of the parse tree and tries to construct the
tree from the top down. It begins with the start symbol of the grammar and
repeatedly applies production rules to rewrite non-terminal symbols until the
entire input string is generated. Common top-down parsing techniques include
recursive descent parsing and LL parsing.
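A minimal top-down (recursive-descent) parse using NLTK, assuming a toy
grammar and sentence chosen purely for illustration:

import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the'
    N  -> 'dog' | 'cat'
    V  -> 'chased'
""")

# Recursive descent expands non-terminals top-down, starting from S
# (note: the grammar has no left recursion, which would make it loop).
parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))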
2. Bottom-Up Parsing:
Bottom-up parsing, as the name suggests, starts from the input string and works
its way up to the root of the parse tree. It begins by identifying the smallest
constituents (tokens) of the input string and combines them according to the
grammar rules until the start symbol is reached. Common bottom-up parsing
techniques include shift-reduce parsing and LR parsing.
3. Chart Parsing:
Chart parsing is a dynamic programming-based approach that constructs a chart
data structure to store and combine partial parse trees efficiently. It uses the
Earley parser algorithm or CYK (Cocke-Younger-Kasami) algorithm for
context-free grammars. Chart parsers can handle ambiguity and provide
multiple parses for a sentence, making them valuable for natural languages with
complex syntactic structures.
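A sketch of chart parsing with NLTK's ChartParser, using a deliberately
ambiguous toy grammar (prepositional-phrase attachment) so that more than one
parse is returned; the grammar and sentence are illustrative:

import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N | NP PP
    VP -> V NP | VP PP
    PP -> P NP
    Det -> 'the'
    N  -> 'man' | 'dog' | 'telescope'
    V  -> 'saw'
    P  -> 'with'
""")

parser = nltk.ChartParser(grammar)   # dynamic programming over a chart of edges
tokens = "the man saw the dog with the telescope".split()
for tree in parser.parse(tokens):
    print(tree)   # two parses: the PP attaches to the VP or to the object NP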

4. Shift-Reduce Parsing:
Shift-reduce parsing is often used in dependency parsing, where the goal is to
build a dependency tree. The parser maintains a stack of words and a set of
actions: it either shifts the next word onto the stack or reduces the words on
top of the stack according to grammar rules. Shift-reduce parsing is
efficient, and extended transition systems can also handle non-projective
syntactic structures, which other algorithms may struggle with.
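A sketch of the shift/reduce mechanism using NLTK's (greedy, non-backtracking)
ShiftReduceParser on the same toy constituency grammar; dependency parsers use
the same shift and reduce actions but build head–dependent arcs instead of
phrase nodes:

import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the'
    N  -> 'dog' | 'cat'
    V  -> 'chased'
""")

parser = nltk.ShiftReduceParser(grammar, trace=2)  # trace=2 prints each shift/reduce step
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)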
Probabilistic Context-Free Grammars (PCFGs)-
Probabilistic Context-Free Grammars (PCFGs) are an extension of traditional
context-free grammars (CFGs) used in Natural Language Processing (NLP) to
model the syntax of natural language sentences. PCFGs introduce probabilities
to grammar rules, allowing for a probabilistic interpretation of the likelihood of
generating or parsing a sentence.

Here is an explanation of PCFGs, including the key terms, an example, and
their benefits:
Key Terms:

Production Rules: Like in CFGs, PCFGs consist of a set of production rules,
where each rule defines a way to generate or derive a syntactic structure.
Each production rule is associated with a probability representing the
likelihood of using that rule.

Non-terminal Symbols: These are symbols that can be expanded into other
symbols or terminals (words). Each non-terminal symbol has associated
production rules.

Terminal Symbols: These are the actual words in the language. Terminal
symbols do not have any production rules associated with them.

Probability Distributions: PCFGs assign probabilities to production rules. For
each non-terminal, the probabilities of its production rules form a discrete
distribution that sums to 1.
Example:
Let's consider a simple grammar for generating basic sentences:
Terminal Symbols: the, dog, cat, runs, jumps
Non-Terminal Symbols: S (Sentence), NP (Noun Phrase), VP (Verb Phrase)
Production Rules:
S -> NP VP (Probability: 1.0) - Every sentence starts with a noun phrase
followed by a verb phrase.
NP -> the dog (Probability: 0.6) - Noun phrase can be "the dog".
NP -> the cat (Probability: 0.4) - Noun phrase can be "the cat".
VP -> runs (Probability: 0.7) - Verb phrase can be "runs".
VP -> jumps (Probability: 0.3) - Verb phrase can be "jumps".
Generating a Sentence:
Start with the S (sentence) symbol.
Apply a rule with S on the left-hand side. The only option is S -> NP VP
(probability 1.0).
Now we have NP on the left-hand side. Choose a rule:
NP -> the dog (probability 0.6) OR
NP -> the cat (probability 0.4) Let's say we randomly pick "the dog"
(probability 0.6).
Similarly, for VP, pick a rule:
VP -> runs (probability 0.7) OR
VP -> jumps (probability 0.3) Let's say we pick "runs" (probability 0.7).
Resulting Sentence:

Following these steps, we generated the sentence "the dog runs" with a
combined probability of 1.0 (S) * 0.6 (NP) * 0.7 (VP) = 0.42.
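The same toy grammar written in NLTK's PCFG notation; a sketch assuming NLTK
is installed. ViterbiParser recovers the most probable parse of "the dog runs"
together with its probability (0.42, matching the calculation above):

import nltk

pcfg = nltk.PCFG.fromstring("""
    S  -> NP VP        [1.0]
    NP -> 'the' 'dog'  [0.6]
    NP -> 'the' 'cat'  [0.4]
    VP -> 'runs'       [0.7]
    VP -> 'jumps'      [0.3]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the dog runs".split()):
    print(tree)          # the most probable parse tree
    print(tree.prob())   # 0.42 = 1.0 * 0.6 * 0.7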
Benefits:
Robustness: Probabilistic parsers are often more robust to variations and
ambiguities in language. By considering the likelihood of different parses, they
can handle syntactic ambiguity more effectively. This robustness is particularly
useful in real-world applications where language can be highly ambiguous.

Statistical Insights: Probabilistic parsing provides statistical insights into
the structure of language. By analyzing large corpora of annotated data,
probabilistic parsers can learn the likelihood of different syntactic
structures and use this knowledge to guide parsing decisions.

Scalability: Probabilistic parsing techniques can be more scalable compared to
deterministic parsers. Rather than exhaustively searching through all possible
parse trees, probabilistic parsers can prioritize the most likely parses based
on learned probabilities, making parsing more efficient.
Integration with Machine Learning: Probabilistic parsing techniques can be
easily integrated with machine learning approaches. By training parsers on
annotated data using techniques such as maximum likelihood estimation or
neural networks, parsers can learn from data and improve their
accuracy over time.

Adaptability: PCFGs can be easily adapted to different domains or languages by
adjusting the probabilities assigned to production rules. This flexibility
makes PCFGs suitable for a wide range of NLP applications, including machine
translation, speech recognition, and information retrieval.

Semantic Analysis –
Semantic analysis in Natural Language Processing (NLP) involves the
understanding of the meaning conveyed by words, phrases, sentences, or
documents. It aims to capture the underlying semantics of language beyond
surface-level syntax.

Relations among lexemes & their senses –


Hyponymy -
Hyponymy happens when the meaning of one form is included in the meaning
of another in some type of hierarchical relationship. It can be found in verbs,
adjectives, and nouns. There are three major terms used in hyponymy:
'hypernym', which refers to a general term; 'hyponym', which refers to a more
specific term; and 'co-hyponyms', which refers to hyponyms at the same level.
Take a look at the examples below:
Green, white and blue are the hyponyms of 'color'.
Here, 'color' is the hypernym of green, white and blue. Green, white and blue
are co-hyponyms of each other.
Homonyms –
When words with the same spelling and pronunciation have different meanings,
they are described as homonyms. Here are some examples:
Bat (flying animal)/bat (used for hitting the ball)
Polysemy –
Polysemy comes from a Greek word meaning "many signs". Polysemy happens when a
word has more than one related meaning and all its meanings are listed under
one entry in a dictionary. Here are some examples of polysemous words:
Mouth (noun) → mouth of a river, mouth of an animal, mouth of a cave.
Light (adjective): color, not heavy, not serious.

Synonymy-
It is a relation between two lexical items having different forms but expressing
the same or a close meaning.
Examples are 'author / writer', 'fate/destiny'

Antonymy
Antonymy is the relation between words with opposite meanings. Antonyms are
usually in pairs and can be found across all parts of speech. Here are some
examples:
‘like/dislike’, ‘rich/poor’, ‘never/always’, ‘on/off’

Wordnet –
WordNet is a lexical database for the English language that has been widely
used in Natural Language Processing (NLP) and computational linguistics. It is
a large repository of words organized into sets of synonyms, called synsets, and
linked by semantic relationships such as hypernyms (generalization), hyponyms
(specialization), meronyms (part-whole relationships), and holonyms (whole-
part relationships).

Here's a detailed explanation of WordNet:

Structure:
WordNet organizes words into synsets, which are groups of synonymous words
or phrases that express the same concept. Each synset represents a distinct
concept or meaning of a word.
Synsets are linked by various semantic relations, allowing for navigation
between related concepts. These relations include:
Hypernyms: Words that are more general or abstract and encompass the
meaning of another word. For example, "animal" is a hypernym of "cat".
Hyponyms: Words that are more specific and fall under the meaning of another
word. For example, "cat" is a hyponym of "animal".
Meronyms: Words that denote a part of something else. For example, "wheel"
is a meronym of "car".
Holonyms: Words that denote the whole to which another word belongs. For
example, "car" is a holonym of "wheel".
These semantic relations form a rich network of interconnected concepts,
providing valuable semantic information for NLP tasks.
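A short sketch of navigating these relations with NLTK's WordNet interface
(assumes NLTK is installed and the WordNet data downloaded; the exact synsets
returned depend on the WordNet version):

import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

cat = wn.synset("cat.n.01")
print(cat.hypernyms())          # more general synsets, e.g. the feline synset
print(cat.hyponyms()[:3])       # more specific kinds of cat

car = wn.synset("car.n.01")
print(car.part_meronyms()[:3])  # parts of a car

wheel = wn.synset("wheel.n.01")
print(wheel.part_holonyms())    # wholes that a wheel is a part of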
Applications:
Word Sense Disambiguation (WSD): WordNet is commonly used in WSD
tasks to disambiguate the sense of a word based on its context. By leveraging
the semantic relationships in WordNet, algorithms can infer the correct sense of
a word in a given context.
Information Retrieval: WordNet can improve the performance of information
retrieval systems by expanding queries with synonyms and related words. By
including synonyms and hyponyms of query terms, retrieval systems can
retrieve more relevant documents.
Text Mining and Summarization: WordNet is used in text mining and
summarization tasks to identify key concepts, extract relevant information, and
generate summaries. By analyzing semantic relationships, algorithms can
identify important concepts and relationships in text.
Machine Translation: WordNet can aid in machine translation by providing
synonyms and related words in the target language. Translating words based on
their meanings rather than direct translations can improve translation quality.
Semantic Analysis: WordNet is valuable for semantic analysis tasks such as
sentiment analysis, named entity recognition, and semantic similarity. By
leveraging the semantic relations in WordNet, algorithms can better understand
the meaning of words and their relationships in text.

Word Sense Disambiguation (WSD) –


Word Sense Disambiguation (WSD) is a crucial task in Natural Language
Processing (NLP) that aims to determine the correct meaning of a word in a
given context, particularly when the word has multiple possible meanings or
senses. The goal of WSD is to select the most appropriate sense of a word based
on its surrounding context in a sentence or a document.

Here's a detailed technical explanation of Word Sense Disambiguation:

Problem Definition:
WSD addresses the ambiguity that arises when a word has multiple senses or
meanings. For example, the word "bank" can refer to a financial institution or
the side of a river. Determining which sense is intended in a particular context is
essential for accurate language understanding and processing.

Approaches for WSD –

Dictionary-based or Knowledge-based Methods:
These methods rely on dictionaries, thesauri, and lexical knowledge bases to
disambiguate word senses. They do not require labeled corpora for training.
The Lesk method, introduced by Michael Lesk, is a seminal dictionary-based
approach. It measures the overlap between the sense definitions of a word and
the surrounding context; the sense with the highest overlap is chosen as the
correct sense for the word in that context.
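NLTK ships a simplified Lesk implementation (nltk.wsd.lesk); a rough sketch of
applying it to the "bank" example. The sentences are illustrative, and the
synset chosen depends on the overlap heuristic, so it may not always be the
intuitive one:

import nltk
from nltk.wsd import lesk

nltk.download("wordnet", quiet=True)

sent1 = "I deposited my money at the bank yesterday".split()
sent2 = "We sat on the bank of the river and fished".split()

sense1 = lesk(sent1, "bank", pos="n")   # overlap with financial context
sense2 = lesk(sent2, "bank", pos="n")   # overlap with river context
print(sense1, "-", sense1.definition())
print(sense2, "-", sense2.definition())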
Supervised Methods:
Supervised methods for WSD use machine learning techniques and rely on
sense-annotated corpora for training. These methods assume that the context
provides enough evidence to disambiguate word senses. Support vector machines
and memory-based learning are among the most successful supervised learning
approaches to WSD. These methods rely on a substantial amount of manually
sense-tagged corpora, which are very expensive to create.
Semi-supervised Methods:
Because large sense-tagged training corpora are scarce, many word sense
disambiguation algorithms use semi-supervised learning, which exploits both
labelled and unlabelled data. These methods require only a small amount of
annotated text and a large amount of plain unannotated text. A common
semi-supervised technique is bootstrapping from seed data.
Unsupervised Methods:
Unsupervised methods for WSD assume that similar word senses occur in
similar contexts. They induce senses from text by clustering word occurrences
based on the similarity of their context. This process, known as word sense
induction or discrimination, does not rely on manual annotation efforts.
Applications of WSD:
Information Retrieval:
Improving the accuracy of search engines by ensuring that queries are
interpreted correctly and relevant documents are retrieved based on the intended
sense of ambiguous words.
Machine Translation:
Enhancing the quality of machine translation systems by disambiguating word
senses in source text and selecting appropriate translations in the target
language.
Question Answering:
Facilitating question answering systems by accurately identifying the meaning
of words in questions and providing relevant answers based on the intended
sense.
Text Summarization:
Improving the quality of automatic text summarization systems by
disambiguating word senses in input documents and generating concise and
accurate summaries.
Sentiment Analysis:
Enhancing sentiment analysis tasks by disambiguating sentiment-bearing words
and accurately determining the polarity (positive, negative, neutral) of text
based on context.
Semantic Search:
Supporting semantic search systems by disambiguating word senses in queries
and retrieving documents that match the intended meaning of ambiguous terms.
Word Sense Induction:
Discovering and clustering word senses in an unsupervised manner, which can
aid in various downstream NLP tasks and contribute to the creation of lexical
resources.

Latent Semantic Analysis –

Latent Semantic Analysis (LSA) is a technique in Natural Language Processing
(NLP) and Information Retrieval (IR) used to analyze the relationships between
a set of documents and the terms they contain. It is based on the principle of
latent semantic indexing, where the semantics (meanings) of words and
documents are represented in a lower-dimensional space. Here's an explanation
of LSA:
1. Vector Space Model:
LSA operates within the vector space model, where each document and term is
represented as a vector in a high-dimensional space.
2. Term-Document Matrix:
Create a term-document matrix where each row corresponds to a term, each
column corresponds to a document, and the matrix elements represent the
frequency of a term in a document.
3. Singular Value Decomposition (SVD):
Apply Singular Value Decomposition to decompose the term-document matrix into
three matrices: U (term space), Σ (a diagonal matrix of singular values), and
V^T (document space):
A ≈ U Σ V^T
4. Dimensionality Reduction:
Retain only the top k singular values, together with the corresponding columns
of U and rows of V^T. This reduces the dimensionality of the term and document
spaces.
5. Semantic Space:
The reduced U matrix represents the terms in a lower-dimensional semantic
space.
The reduced V^T matrix represents the documents in the same semantic space.
6. Cosine Similarity:
Calculate the cosine similarity between the vectors in the semantic space to
measure the similarity between terms and documents.
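A compact LSA sketch with scikit-learn, assuming it is installed: TF-IDF
weighting, truncated SVD for the low-rank decomposition, then cosine
similarity in the latent space. The tiny corpus and k = 2 are illustrative
choices (note that scikit-learn builds a document-term matrix, i.e. documents
as rows):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat chased a mouse",
    "stock prices fell on the market",
    "investors traded shares on the stock market",
]

X = TfidfVectorizer().fit_transform(docs)         # document-term matrix (TF-IDF weights)
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = svd.fit_transform(X)                # documents projected into the 2-D latent space

print(cosine_similarity(doc_vectors))             # pairwise document similarities; the two "cat"
                                                  # documents and the two "stock market" documents
                                                  # should end up most similar to each other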
Applications:
LSA has several applications in NLP and IR, including:
Information Retrieval: LSA can be used to find documents relevant to a query
by measuring the similarity between the query vector and document vectors in
the latent semantic space.
Document Clustering: LSA can cluster documents based on their semantic
similarity, allowing for organization and summarization of large document
collections.
Topic Modeling: LSA can extract latent topics from a corpus by analyzing the
patterns of term-document associations in the latent semantic space.
