NLP UNIT 2 Notes
Morphological Analysis –
Morphological analysis is a field of linguistics that studies the structure of
words. It identifies how a word is formed from morphemes. A morpheme is the
smallest element of a word that has grammatical function and meaning. Free
morphemes and bound morphemes are the two types of morphemes. A single free
morpheme can stand alone as a complete word.
What is a Morpheme?
A morpheme is the smallest meaningful unit in any language. A word in a
language is made up of constituent morphemes. In English, example morphemes
include free-standing words, plural morphemes ('-s' and '-es'), and
grammatical morphemes ('-ing' and '-ed').
Morphemes can be broadly categorized into two classes –
Free Morpheme
Bound Morpheme
Free morpheme – A morpheme that can appear in isolation and still be
meaningful is called a free morpheme. The stem (root) of a word is a free
morpheme, because the root form of a word carries meaning on its own.
Example: dog, carry, good, etc.
Bound morpheme – A morpheme that must attach to a free morpheme to add
meaning of various kinds, including plurality and grammatical variation, is
called a bound morpheme. Bound morphemes are also referred to as affixes.
There are four types of affixes:
Prefixes – morphemes attached at the front of a stem
Example: undo, disagree, uncommon, etc.
Suffixes – morphemes attached at the end of a stem
Example: hiding, attached, dogs, etc.
Infixes – morphemes inserted within a stem
Example: infixes are rare cross-linguistically and essentially absent in English
Circumfixes – morphemes attached around a stem, partly before and partly after it
Example: circumfixes do not occur in English; German ge-...-t (as in gespielt, 'played') is a standard example
Types of Morphology:
Inflectional Morphology –
Inflectional morphology is the modification of a word to express different
grammatical categories. It is the study of processes, including affixation and
vowel change, that distinguish word forms in certain grammatical categories.
Inflectional morphology consists of at least five categories, such as number,
case, tense, aspect, and person (see Language Typology and Syntactic
Description: Grammatical Categories and the Lexicon). Derivational morphology
cannot be so easily categorized, because derivation is not as predictable as
inflection.
Examples: cats (cat + plural '-s'), men (irregular plural of man), etc.
Derivational Morphology:
Derivational morphology is defined as morphology that creates new lexemes,
either by changing the syntactic category (part of speech) of a base, by
adding substantial non-grammatical meaning, or both. On the one hand, derivation may be
distinguished from inflectional morphology, which typically does not change
category but rather modifies lexemes to fit into various syntactic contexts;
inflection typically expresses distinctions like number, case, tense, aspect,
person, among others. On the other hand, derivation may be distinguished from
compounding, which also creates new lexemes, but by combining two or more
bases rather than by affixation, reduplication, subtraction, or internal
modification of various sorts. Although the distinctions are generally useful, in
practice applying them is not always easy.
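For example, derivation yields a new lexeme such as happiness (happy + '-ness'), while inflection yields word forms such as cats (cat + '-s'). As a rough illustration in code, here is a minimal sketch using NLTK's WordNetLemmatizer, which undoes inflectional but not derivational morphology (assuming nltk and its WordNet data are installed):

```python
from nltk.stem import WordNetLemmatizer  # requires: nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

# Inflected forms reduce to their stem (the free morpheme)
print(lemmatizer.lemmatize('cats'))              # -> cat   (plural -s removed)
print(lemmatizer.lemmatize('carried', pos='v'))  # -> carry (past tense -ed removed)

# Derivation creates a new lexeme, so the lemmatizer does not undo it
print(lemmatizer.lemmatize('happiness'))         # -> happiness
```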
Parsing Approaches –
1. Top-Down Parsing:
Top-down parsing starts from the root of the parse tree and tries to construct the
tree from the top down. It begins with the start symbol of the grammar and
repeatedly applies production rules to rewrite non-terminal symbols until the
entire input string is generated. Common top-down parsing techniques include
recursive descent parsing and LL parsing.
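As a sketch of the idea, here is a minimal recursive descent recognizer for a toy grammar (the grammar and function are invented for illustration):

```python
# Toy grammar (invented for illustration):
#   S -> NP VP, NP -> Det N, VP -> V
GRAMMAR = {
    'S':   [['NP', 'VP']],
    'NP':  [['Det', 'N']],
    'VP':  [['V']],
    'Det': [['the']],
    'N':   [['dog']],
    'V':   [['runs']],
}
NONTERMINALS = set(GRAMMAR)

def parse(symbol, tokens, pos):
    """Try to derive tokens[pos:] from symbol; return new position or None."""
    if symbol not in NONTERMINALS:                 # terminal: must match input
        return pos + 1 if pos < len(tokens) and tokens[pos] == symbol else None
    for rule in GRAMMAR[symbol]:                   # try each production in turn
        p = pos
        for sym in rule:
            p = parse(sym, tokens, p)
            if p is None:
                break                              # this rule failed; try next
        else:
            return p                               # whole rule matched
    return None

tokens = ['the', 'dog', 'runs']
print(parse('S', tokens, 0) == len(tokens))        # True -> sentence accepted
```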
2. Bottom-Up Parsing:
Bottom-up parsing, as the name suggests, starts from the input string and works
its way up to the root of the parse tree. It begins by identifying the smallest
constituents (tokens) of the input string and combines them according to the
grammar rules until the start symbol is reached. Common bottom-up parsing
techniques include shift-reduce parsing and LR parsing.
3. Chart Parsing:
Chart parsing is a dynamic programming-based approach that constructs a chart
data structure to store and combine partial parse trees efficiently. It uses the
Earley parser algorithm or CYK (Cocke-Younger-Kasami) algorithm for
context-free grammars. Chart parsers can handle ambiguity and provide
multiple parses for a sentence, making them valuable for natural languages with
complex syntactic structures.
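As a sketch of the chart idea, here is a compact CYK recognizer for a toy grammar in Chomsky normal form (the grammar is invented for illustration; a full chart parser would also store backpointers to recover parse trees):

```python
from itertools import product

# Toy grammar in Chomsky normal form (invented for illustration)
BINARY = {('NP', 'VP'): 'S', ('Det', 'N'): 'NP'}    # A -> B C rules
LEXICAL = {'the': 'Det', 'dog': 'N', 'runs': 'VP'}  # A -> word rules

def cyk(tokens):
    n = len(tokens)
    # chart[i][j] = set of nonterminals that span tokens[i:j+1]
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(tokens):
        chart[i][i].add(LEXICAL[w])
    for span in range(2, n + 1):                  # span length
        for i in range(n - span + 1):             # span start
            j = i + span - 1
            for k in range(i, j):                 # split point
                for b, c in product(chart[i][k], chart[k + 1][j]):
                    if (b, c) in BINARY:
                        chart[i][j].add(BINARY[(b, c)])
    return 'S' in chart[0][n - 1]

print(cyk(['the', 'dog', 'runs']))  # True
```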
4. Shift-Reduce Parsing:
Shift-reduce parsing is often used in dependency parsing, where the goal is to
build a dependency tree. In shift-reduce parsing, the parser maintains a stack of
words and a set of actions. It shifts a word onto the stack or reduces words
based on grammar rules. The method is efficient, and extensions of it (for
example, adding a swap action) can handle non-projective syntactic structures
that other algorithms may struggle with.
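Here is a minimal sketch in the spirit of arc-standard shift-reduce dependency parsing (the word list and dependency table are invented; a real parser would choose shift/reduce actions with a trained classifier rather than a hard-coded table):

```python
# Toy table of (head, dependent) pairs standing in for a learned model
DEPENDENCIES = {('runs', 'dog'), ('dog', 'the')}

def shift_reduce(words):
    stack, buffer, arcs = [], list(words), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2 and (stack[-1], stack[-2]) in DEPENDENCIES:
            arcs.append((stack[-1], stack[-2]))    # LEFT-ARC: s2 depends on s1
            del stack[-2]
        elif len(stack) >= 2 and (stack[-2], stack[-1]) in DEPENDENCIES:
            arcs.append((stack[-2], stack[-1]))    # RIGHT-ARC: s1 depends on s2
            stack.pop()
        elif buffer:
            stack.append(buffer.pop(0))            # SHIFT next word onto stack
        else:
            break                                  # no action applies
    return arcs

print(shift_reduce(['the', 'dog', 'runs']))
# [('dog', 'the'), ('runs', 'dog')]
```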
Probabilistic Context-Free Grammars (PCFGs) –
Probabilistic Context-Free Grammars (PCFGs) are an extension of traditional
context-free grammars (CFGs) used in Natural Language Processing (NLP) to
model the syntax of natural language sentences. PCFGs introduce probabilities
to grammar rules, allowing for a probabilistic interpretation of the likelihood of
generating or parsing a sentence.
Non-terminal Symbols: These are symbols that can be expanded into other
symbols or terminals (words). Each non-terminal symbol has associated
production rules.
Terminal Symbols: These are the actual words in the language. Terminal
symbols do not have any production rules associated with them.
Example: suppose the grammar includes the rules S -> NP VP [1.0],
NP -> Det N [0.6], and VP -> V [0.7]. Generating the sentence "the dog runs"
by applying these rules gives a combined rule probability of
1.0 (S) * 0.6 (NP) * 0.7 (VP) = 0.42 (the probabilities of the lexical rules
would be multiplied in as well).
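As a sketch, this kind of grammar can be written and parsed with NLTK's PCFG tools (assuming the nltk package is installed; the toy grammar below is invented to match the rule probabilities above, with made-up lexical rules):

```python
import nltk

# Toy PCFG; the probabilities for each left-hand side must sum to 1.0
grammar = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.6]
    NP -> N [0.4]
    VP -> V [0.7]
    VP -> V NP [0.3]
    Det -> 'the' [1.0]
    N -> 'dog' [0.5]
    N -> 'cat' [0.5]
    V -> 'runs' [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse(['the', 'dog', 'runs']):
    print(tree)        # most probable parse tree
    # Rule probabilities multiply down the derivation, lexical rules included:
    # 1.0 * 0.6 * 1.0 * 0.5 * 0.7 * 1.0 = 0.21
    print(tree.prob())
```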
Benefits:
Robustness: Probabilistic parsers are often more robust to variations and
ambiguities in language. By considering the likelihood of different parses, they
can handle syntactic ambiguity more effectively. This robustness is particularly
useful in real-world applications where language can be highly ambiguous.
Semantic Analysis –
Semantic analysis in Natural Language Processing (NLP) involves the
understanding of the meaning conveyed by words, phrases, sentences, or
documents. It aims to capture the underlying semantics of language beyond
surface-level syntax.
Synonymy –
Synonymy is a relation between two lexical items that have different forms
but express the same or closely related meanings.
Examples: 'author/writer', 'fate/destiny'
Antonymy
Antonymy is the relation between words with opposite meanings. Antonyms are
usually in pairs and can be found across all parts of speech. Here are some
examples:
‘like/dislike’, ‘rich/poor’, ‘never/always’, ‘on/off’
WordNet –
WordNet is a lexical database for the English language that has been widely
used in Natural Language Processing (NLP) and computational linguistics. It is
a large repository of words organized into sets of synonyms, called synsets, and
linked by semantic relationships such as hypernyms (generalization), hyponyms
(specialization), meronyms (part-whole relationships), and holonyms (whole-
part relationships).
Structure:
WordNet organizes words into synsets, which are groups of synonymous words
or phrases that express the same concept. Each synset represents a distinct
concept or meaning of a word.
Synsets are linked by various semantic relations, allowing for navigation
between related concepts. These relations include:
Hypernyms: Words that are more general or abstract and encompass the
meaning of another word. For example, "animal" is a hypernym of "cat".
Hyponyms: Words that are more specific and fall under the meaning of another
word. For example, "cat" is a hyponym of "animal".
Meronyms: Words that denote a part of something else. For example, "wheel"
is a meronym of "car".
Holonyms: Words that denote the whole to which another word belongs. For
example, "car" is a holonym of "wheel".
These semantic relations form a rich network of interconnected concepts,
providing valuable semantic information for NLP tasks.
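To make these relations concrete, here is a minimal sketch using NLTK's WordNet interface (assuming nltk and its WordNet data are installed; the values in the comments are abbreviated):

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

cat = wn.synset('cat.n.01')             # one sense of "cat"
print(cat.definition())                 # gloss of this sense
print(cat.hypernyms())                  # more general: [Synset('feline.n.01')]
print(cat.hyponyms()[:2])               # more specific kinds of cat

car = wn.synset('car.n.01')
print(car.part_meronyms()[:2])          # parts of a car
wheel = wn.synset('wheel.n.01')
print(wheel.part_holonyms())            # wholes that a wheel belongs to

# Antonymy is stored on lemmas rather than synsets:
good = wn.synset('good.a.01').lemmas()[0]
print(good.antonyms())                  # [Lemma('bad.a.01.bad')]
```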
Applications:
Word Sense Disambiguation (WSD): WordNet is commonly used in WSD
tasks to disambiguate the sense of a word based on its context. By leveraging
the semantic relationships in WordNet, algorithms can infer the correct sense of
a word in a given context.
Information Retrieval: WordNet can improve the performance of information
retrieval systems by expanding queries with synonyms and related words. By
including synonyms and hyponyms of query terms, retrieval systems can
retrieve more relevant documents.
Text Mining and Summarization: WordNet is used in text mining and
summarization tasks to identify key concepts, extract relevant information, and
generate summaries. By analyzing semantic relationships, algorithms can
identify important concepts and relationships in text.
Machine Translation: WordNet can aid in machine translation by providing
synonyms and related words in the target language. Translating words based on
their meanings rather than direct translations can improve translation quality.
Semantic Analysis: WordNet is valuable for semantic analysis tasks such as
sentiment analysis, named entity recognition, and semantic similarity. By
leveraging the semantic relations in WordNet, algorithms can better understand
the meaning of words and their relationships in text.
Word Sense Disambiguation (WSD) –
Problem Definition:
WSD addresses the ambiguity that arises when a word has multiple senses or
meanings. For example, the word "bank" can refer to a financial institution or
the side of a river. Determining which sense is intended in a particular context is
essential for accurate language understanding and processing.
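A classic baseline for WSD is the Lesk algorithm, which selects the sense whose dictionary gloss overlaps most with the surrounding context. Here is a minimal sketch using NLTK's built-in implementation (assuming nltk and its wordnet and punkt data are installed; the example sentence is invented):

```python
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize  # requires: nltk.download('punkt')

context = word_tokenize('I deposited the cheque at the bank yesterday')
sense = lesk(context, 'bank')            # pick the sense matching the context
print(sense)
if sense:
    print(sense.definition())            # gloss of the chosen sense
```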