02 - Morphological Analysis

NLP-Final sem study material

Uploaded by

Khushi khokhar

Unit # 1

Morphological Analysis
Typical use case …
"Absolutely loving the new update to the app. Great job!" → Positive review
"Very disappointed with the customer service, not helpful at all." → Negative review
"I noticed the store has extended its hours. Interesting move." → Neutral comment
"Does anyone know if this product is available in blue?" → Enquiry
"Just tried the new cafe downtown, and it's amazing!" → Praise, positive feedback
"I'm having trouble logging into my account, can you assist me?" → Support request
"My order has been delayed for two weeks now, what's going on?" → Complaint
…
"What are your store hours on weekends?"
"Can I get more information about the warranty on the laptop models you sell?"
Suggestions Service Enquiry Complaint Top Mgmt
0.45 0.72 0.35 0.85 0.15
What is Morphology ?
In linguistics, Morphology is the study of the internal structure of words. It is the study
of words, how they are formed, and their relationship to other words in the same
language. It analyzes the structure of words and parts of words such as stems, root
words, prefixes, and suffixes. Morphology also looks at parts of speech, intonation
and stress, and the ways context can change a word's pronunciation and meaning.

It focuses on how the components within a word (stems, root words, prefixes,
suffixes, etc.) are arranged or modified to create different meanings.

Morphology varies greatly between languages. In languages such as Russian, word
endings indicate the role of a word in a sentence. As a result, morphological analysis
depends heavily on the source language, and an understanding of what is supported
within that language plays a vital role in developing an NLP application.
The Google Natural Language API, for example, uses morphological analysis to infer
grammatical information about words.
Types of Morphemes
Some important terms used in morphology are:
Stem :
Is the part of a word responsible for its lexical meaning. It refers to the main part of a
word to which affixes (prefixes, suffixes, infixes, circumfixes) are added. It is the base
form that remains after removing all the affixes that modify its meaning or create new
words. Examples:

In the word "unbelievable" the stem is "believe."
Prefix: "un-" (meaning not)
Stem: "believe" (basic meaning: accept as true)
Suffix: "-able" (meaning able to be)
The word "unbelievable" thus means 'not able to be believed.'

For the word "runner," the stem is "run."
Stem: "run" (basic action)
Suffix: "-er" (one who does the action; the final "n" of the stem is doubled in spelling)
"Runner" refers to 'one who runs.'

Root :
Is the most basic, irreducible part that carries the core meaning of the word. Unlike
stems, roots cannot be broken down into smaller parts and typically do not have
prefixes, suffixes, or infixes attached to them in their most basic form. Roots form the
base upon which stems and ultimately full words are built. In many cases, the root is
the same as the stem.
For the word "reaction," the root is "act."
Prefix: "re-" (meaning again or back)
Root: "act" (basic action or doing)
Suffix: "-ion" (denoting the action or condition of)
"Reaction" refers to 'the action of doing something again or in response.'

In "writer" the root is "write."
Root: "write" (basic action: to form letters or words)
Suffix: "-er" (one who does the action)
"Writer" refers to 'one who writes.'

Part of Speech :
Is a category of words in a language that have similar grammatical properties.
Common parts of speech include nouns, verbs, adjectives, adverbs, pronouns,
prepositions, conjunctions, and interjections. Each part of speech plays a specific role
in a sentence, contributing to the sentence's overall meaning and structure.
Understanding parts of speech is crucial for analyzing and constructing sentences
effectively.
Nouns: Words that name people, places, things, or ideas.
Example: "Computer," "Paris," "happiness."
Verbs: Words that express actions, occurrences, or states of being.
Example: "run," "is," "think."
Adjectives: Words that describe or modify nouns.
Example: "red," "quick," "intelligent."
Adverbs: Words that modify verbs, adjectives, or other adverbs, often indicating
manner, place, time, or degree.
Example: "quickly," "there," "very."
Pronouns: Words that take the place of nouns.
Example: "he," "they," "it."
Prepositions: Words that show the relationship between a noun (or pronoun)
and other words in a sentence, often indicating time, place, or direction.
Example: "in," "at," "by."
Conjunctions: Words that join words, phrases, or clauses.
Example: "and," "but," "because."
Interjections: Words used to express emotions or sudden bursts of feeling.
Example: "Wow!," "Ouch!," "Hey!"

Inflectional morphology
Adds information to a word consistent with its context within a sentence.
Examples
• Number (singular versus plural): automaton → automata
• Case (nominative versus accusative versus…): he, him, his, …
• walk → walks
Morphology Analysis Approaches
Morphological analysis may be defined as the process of obtaining grammatical
information from tokens, given their suffix information. Morphological analysis can be
performed in three ways:
1. Morpheme-based morphology (or an item and arrangement approach),
2. Word-based morphology (or a word and paradigm approach), and
3. Lexeme-based morphology (or an item and process approach).

1. Morpheme-based morphology
Morpheme-based morphology analyzes and describes the structure of words by
breaking them down into their smallest meaningful units, called morphemes. There
are two main types of morphemes in morpheme-based morphology.
Free Morphemes: These can stand alone as words (e.g., "book", "go").
Bound Morphemes: These cannot stand alone and must be attached to a free
morpheme (e.g., prefixes like "un-", suffixes like "-ing"). Words are formed by
combining these morphemes in a linear arrangement.
Word: "Unhappiness"
Structure: [Prefix "Un-"] + [Root "happy"] + [Suffix "-ness"]

This structure shows that the word "unhappiness" is composed of three morphemes:
"un-" (a prefix), "happy" (a root), and "-ness" (a suffix). Each morpheme contributes to
the overall meaning of the word.
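
The linear arrangement described above can be sketched as a small segmenter. This is a toy illustration only: the prefix list, suffix list, root lexicon, and the two spelling repairs are assumptions made for the example, not a complete description of English.

```python
# Toy inventories of bound morphemes; both tuples are illustrative assumptions.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ness", "able", "ing", "er")

# Hypothetical lexicon of free morphemes (roots) known to the segmenter.
ROOTS = {"happy", "believe", "write", "act", "book"}


def restore_root(fragment):
    """Undo two common spelling changes so the fragment matches a known root."""
    candidates = [fragment, fragment + "e"]          # "believ" -> "believe"
    if fragment.endswith("i"):
        candidates.append(fragment[:-1] + "y")       # "happi" -> "happy"
    for candidate in candidates:
        if candidate in ROOTS:
            return candidate
    return None


def segment(word):
    """Return (prefix, root, suffix) for a word, or None if no analysis fits."""
    word = word.lower()
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        for suffix in ("",) + SUFFIXES:
            if not word.endswith(suffix):
                continue
            end = len(word) - len(suffix)
            root = restore_root(word[len(prefix):end])
            if root is not None:
                return (prefix, root, suffix)
    return None
```

Calling `segment("unhappiness")` yields the same three-part structure shown above; a word whose root is missing from the toy lexicon returns `None`.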
2. Word-based morphology
Word-based morphology focuses on words as the central units of morphological
analysis rather than morphemes. This approach emphasizes the full forms of words
rather than attempting to segment words into constituent morphemes, in contrast
to morpheme-based morphology, which breaks down words into the smallest units of
meaning. It treats words as indivisible wholes or as bases to which processes are
applied. It looks at how words change as whole units through processes like
inflection, derivation, and compounding.
There is less focus on dividing the word into prefixes, stems, and suffixes. Instead,
the processes that affect the word as a whole are examined.

Base Word: "Run" → Past Tense Process → Result: "Ran"


3. Lexeme-based morphology
Lexeme-based morphology is a theoretical framework in linguistics, which
separates morphological processes into two layers: the lexical layer and the
inflectional layer.
-The lexical layer consists of lexemes, which are the abstract, minimal units of
meaning without any inflectional endings or derivational affixes. They correspond
to the forms that are usually listed as "dictionary entries."
-The inflectional layer involves the addition of affixes to lexemes to express
grammatical relationships and features, such as tense, number, gender, etc.,
without changing the core meaning or word class (e.g., "walk" to "walked").
[ Lexeme "walk" ] → [ Derivation (N/A in this case) ]
        ↓
[ Inflection ] → [ "walk" (base) | "walks" (3rd person singular) | "walked" (past) | "walking" (progressive) ]
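
The two layers can be sketched in code as an abstract lexeme plus an inflection step that produces its word forms. The `Lexeme` class and `inflect` function below are hypothetical names introduced for illustration, and the rules cover only regular verbs formed by plain concatenation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """Abstract dictionary entry; `lemma` is its citation form."""
    lemma: str
    word_class: str


def inflect(lexeme):
    """Return the regular English verb paradigm for a lexeme.

    Only plain concatenation is handled; irregular verbs ("run" -> "ran")
    and spelling changes ("stop" -> "stopped") are out of scope.
    """
    if lexeme.word_class != "verb":
        raise ValueError("this sketch only inflects verbs")
    base = lexeme.lemma
    return {
        "base": base,
        "3rd person singular": base + "s",
        "past": base + "ed",
        "progressive": base + "ing",
    }


walk = Lexeme(lemma="walk", word_class="verb")
paradigm = inflect(walk)
```

Here `paradigm` holds the four forms from the diagram, keyed by the grammatical feature each one expresses. The word class stays "verb" throughout, which is the point of the inflectional layer.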
Morphology Analysis (contd)
A morphological analyzer is a program that analyzes the morphology of a given input
token and generates morphological information, such as the root, stem, prefix, and
so on, as output.
During morphological analysis, each word is analyzed individually and assigned a
syntactic category to resolve its ambiguity. Non-word tokens such as punctuation are
removed.

Stemming
Stemming algorithms aim to remove the affixes required for, e.g., grammatical role,
tense, or derivational morphology, leaving only the stem of the word. This is a difficult
problem due to irregular words (e.g., common verbs in English), complicated
morphological rules, and part-of-speech and sense ambiguities.
NLTK stemming algorithms:
- PorterStemmer
- SnowballStemmer
- LancasterStemmer
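
To make the idea of affix stripping concrete, here is a deliberately crude suffix stripper. It is not the Porter, Snowball, or Lancaster algorithm; the suffix list and the minimum stem length are assumptions chosen for the example.

```python
# Suffixes are tried in this order, so longer endings win over shorter ones.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")

# Refuse to strip if fewer than this many characters would remain.
MIN_STEM_LENGTH = 3


def crude_stem(word):
    """Strip the first matching suffix, without consulting any dictionary."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word
```

The sketch handles regular forms such as "walking" and "walked", but it turns "running" into "runn" and leaves the irregular "took" untouched, which illustrates the difficulties listed above. NLTK's stemmers apply far more elaborate rule sets to the same task.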
Lemmatization
Lemmatization is another technique used to reduce inflected words to their root
word. It describes the algorithmic process of identifying an inflected word’s “lemma”
(dictionary form) based on its intended meaning.
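
A minimal sketch of dictionary-backed lemmatization follows. The exception table, detachment rules, and vocabulary are tiny hypothetical stand-ins for a real lexical resource, and `lemmatize` is a name introduced here rather than a library function.

```python
# Hypothetical exception dictionary keyed by (word form, coarse part of speech).
IRREGULAR = {
    ("took", "verb"): "take",
    ("taken", "verb"): "take",
    ("ran", "verb"): "run",
    ("automata", "noun"): "automaton",
}

# Regular detachment rules per part of speech: (suffix to strip, replacement).
RULES = {
    "verb": (("ing", ""), ("ed", ""), ("s", "")),
    "noun": (("s", ""),),
}

# Lemmas the sketch knows; a rule result is accepted only if it is listed here.
VOCABULARY = {"walk", "take", "run", "desk", "meet"}


def lemmatize(word, pos):
    """Map an inflected form to its lemma, given its part of speech."""
    word = word.lower()
    if (word, pos) in IRREGULAR:
        return IRREGULAR[(word, pos)]
    for suffix, replacement in RULES.get(pos, ()):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in VOCABULARY:
                return candidate
    return word
```

Because the part of speech is an input, "meeting" is reduced to "meet" when tagged as a verb but kept as "meeting" when tagged as a noun. This is what "based on its intended meaning" amounts to in practice.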

POS
Part of natural language processing is determining the role of each word or token in
a body of text. In the world of NLP, we call this process part-of-speech (POS)
tagging. The NLTK package comes with a function pos_tag() that makes this job
relatively seamless, and gives us a good starting point.
VB verb, base form – take
VBD verb, past tense – took
VBG verb, gerund/present participle – taking
VBN verb, past participle – taken
VBP verb, non-3rd person singular present – take
VBZ verb, 3rd person singular present – takes

NN noun, singular – desk
NNS noun, plural – desks
NNP proper noun, singular – America
NNPS proper noun, plural – Americans

RB adverb – very, silently
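
Downstream steps often need only the coarse category behind each fine-grained tag. The sketch below groups tokens by that category, taking as input a list of (token, tag) pairs, which is the shape `pos_tag()` returns. The sample list is written by hand for illustration and is not actual tagger output; the helper names are hypothetical.

```python
def coarse_category(tag):
    """Collapse a Penn Treebank tag into a coarse part-of-speech category."""
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("RB"):
        return "adverb"
    return "other"


def group_by_category(tagged_tokens):
    """Group tokens by coarse category, preserving their original order."""
    groups = {}
    for token, tag in tagged_tokens:
        groups.setdefault(coarse_category(tag), []).append(token)
    return groups


# Hand-written (token, tag) pairs using tags from the table above.
sample = [("America", "NNP"), ("took", "VBD"), ("desks", "NNS"), ("silently", "RB")]
groups = group_by_category(sample)
```

Matching on the tag prefix means every VB* tag lands in "verb" and every NN* tag in "noun", so the six verb tags above need no separate cases.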


Stemming Vs Lemmatisation
Stemming and lemmatization are both text-processing techniques that aim to
reduce inflected words to a common base form. Despite sharing this objective,
the two techniques are not the same.
Stemming algorithms attempt to find the common base of various inflections by
cutting off the endings or beginnings of the word. The crude heuristic approach taken
by stemming algorithms typically means they are fast and efficient but not always
accurate.
Lemmatization algorithms, on the other hand, attempt to find the common base form of
inflected words by conducting a fuller morphological analysis. To reduce inflections
accurately, a detailed dictionary must be kept so the algorithm can look up an
inflected word and link it back to its lemma. Lemmatization algorithms therefore
sacrifice speed and efficiency for accuracy, but they are more likely than stemming
algorithms to return a meaningful base form.
Popular NLP Tools
NLTK
NLTK is a leading platform for building Python programs to work with human
language data. It provides easy-to-use interfaces to over 50 corpora and lexical
resources such as WordNet, along with a suite of text processing libraries for
classification, tokenization, stemming, tagging, parsing, and semantic reasoning,
and wrappers for industrial-strength NLP libraries.

Google Natural Language API


The Google Natural Language API is an easy-to-use interface to a set of powerful NLP
models which have been pre-trained by Google to perform various tasks. As these
models have been trained on very large document corpora, their performance
is usually quite good as long as they are used on datasets that do not use
very idiosyncratic language.
The Natural Language API comprises five different services:

Syntax Analysis
Sentiment Analysis
Entity Analysis
Entity Sentiment Analysis
Text Classification
The analyzeSyntax method returns details about the linguistic structure of the given
text. For each token in the text, the Natural Language API provides information about
its internal structure (morphology) and its role in the sentence (syntax).

Google AutoML Natural Language


• If the Natural Language API is not flexible enough for business purposes, then
AutoML Natural Language is the next choice. AutoML is a new Google Cloud
Service (still in beta) that enables the user to create customized machine learning
models. In contrast to the Natural Language API, the AutoML models will be
trained on the user’s data and therefore fit a specific task. The AutoML service
requires a bit more effort for the user, mainly because you have to provide a
dataset to train the model.
• The AutoML service covers three use cases. All of these use cases support solely
the English language for now.
1. AutoML Text Classification
2. AutoML Entity Extraction
Thanks