NLP Notes
● Communication is crucial for exchanging information between agents and their environments.
● It involves producing and interpreting signs from a shared system of symbols.
● Effective communication allows agents to acquire and utilize information observed or inferred
by others, enhancing their decision-making and success.
Purpose of NLP:
● The field of NLP focuses on enabling computers to process and perform tasks using natural
human languages.
● NLP systems work with inputs like spoken language and written text.
● A key subfield, Natural Language Understanding (NLU), is concerned with machine reading
comprehension, interpreting the meaning from language input.
Goals of NLP:
● The main objective is to create software capable of analyzing, understanding, and generating
human-like language.
● The ultimate aim is for users to communicate with computers as naturally as they would with
another human being.
● Natural Language Processing (NLP) enables computer programs to understand and process
human speech in its natural form.
● It is a subset of artificial intelligence focused on interpreting complex and ambiguous human
language, including slang, dialects, and contextual factors.
● Traditional programming languages are structured and precise, whereas human language can
be ambiguous and context-dependent, posing a challenge for NLP development.
Approaches to NLP:
● Modern NLP relies heavily on machine learning, a subset of AI that identifies patterns in data
to improve understanding and performance.
● Machine learning helps in handling the unpredictability of human language by adapting to
diverse linguistic patterns and improving accuracy over time.
● NLP can accurately interpret complex sentences, understanding abbreviations, context, and
industry-specific terms. For instance:
○ Recognizing that "cloud" refers to "cloud computing."
○ Identifying "SLA" as an acronym for "Service Level Agreement."
● The ultimate aim is to eliminate the need for traditional programming languages.
● In the future, all computer interactions may rely solely on natural human language, making
communication with computers as intuitive as speaking with another person.
History of NLP:
● 1950s: NLP research began with Machine Translation (MT), focusing on converting text from
one language to another.
● Turing Test: Introduced by Alan Turing in the 1950s to evaluate a machine's ability to mimic
human conversation indistinguishably from a human.
● Linguistics and Cryptography: Early research included syntactic structures and language
translation.
● 1960s: Introduction of ELIZA, a popular NLP system simulating a psychotherapist's responses.
● Over time, NLP evolved from basic syntax analysis to include knowledge augmentation and
semantics, paving the way for machine learning-based approaches.
● Recent advancements involve multiple NLP systems driven by machine learning, with
competitions centered around the Turing Test.
Pragmatic Analysis in NLP:
● Pragmatics involves analyzing context and purpose, especially when resolving ambiguities that
arise at syntactic or semantic levels.
● Pragmatic analysis supports the interpretation of ambiguous phrases by considering the
context of the utterance.
Components of NLP
1. Natural Language Understanding:
○ Involves converting input in natural language to a meaningful internal representation.
○ Requires multiple levels of analysis:
■ Morphological Analysis: Study of word forms.
■ Syntactic Analysis: Structure of sentences.
■ Semantic Analysis: Meaning of sentences.
■ Discourse Analysis: Context of sentences in a conversation.
2. Natural Language Generation:
○ Producing natural language output from an internal representation.
○ Involves:
■ Deep Planning: Deciding what to communicate.
■ Syntactic Generation: Structuring sentences.
○ Natural Language Understanding is generally more complex than generation.
3. Planning in NLP
○ Involves breaking down complex problems into manageable subparts.
○ Refers to computing steps for problem-solving before execution.
1. Pattern Matching:
○ Utilizes predefined patterns to interpret input as a whole rather than breaking it down.
○ Hierarchical pattern matching can reduce complexity by matching sub-phrases
gradually.
○ Semantic primitives (core concepts) can be used instead of specific words to simplify
the matching process.
2. Syntactically Driven Parsing:
○ Focuses on combining words into larger syntactic units like phrases or sentences.
○ Uses grammar rules to interpret sentence structure, contrasting with pattern matching
by starting with smaller components and building up.
3. Semantic Grammars:
○ Combines both syntactic and semantic elements for analysis.
○ Categories in a semantic grammar are defined by their meaning, making it more
flexible.
4. Case Frame Instantiation:
○ An advanced technique that uses a recursive structure for interpretation.
○ Combines bottom-up (starting from small units) and top-down (starting from larger
context) approaches for analysis.
Levels and Tasks of NLP
NLP analysis is carried out at the following levels:
Morphological Analysis:
Syntactic Analysis:
Semantic Analysis:
Discourse Integration:
Pragmatic Analysis:
Prosody:
Stages in NLP
● Lexical Analysis is the first phase of NLP. It involves scanning the input text as a stream of
characters and converting it into meaningful lexemes (basic units of meaning).
● This phase divides the text into paragraphs, sentences, and words.
● Morphological Analysis examines the structure and formation of words, combining sounds
into minimal units of meaning (morphemes).
Semantic Analysis:
● Concerned with understanding the literal meaning of words, phrases, and sentences,
regardless of context.
● It focuses on what the words actually mean, leading to the creation of a meaningful
representation.
● Ambiguities may arise during this phase, as words can have multiple meanings.
Pragmatic Analysis:
● This is the final phase of NLP, dealing with intended effects and the inner meaning behind a
sentence.
● Pragmatic analysis is concerned with how sentences are used in different contexts.
● For example, the command "Open the door" can be interpreted as a request rather than an
order.
Discourse Integration:
World Knowledge:
Factual Knowledge:
Conceptual Knowledge:
Procedural Knowledge:
● Refers to the skills or processes necessary to carry out tasks or activities within a
domain.
● Often called "know-how," it’s about knowing the “how” to do something, including
techniques, methods, and steps.
● Examples: Solving equations, using software tools, or performing a scientific
experiment.
Meta-cognitive Knowledge:
Phonetic Knowledge
● This refers to the understanding of sound-symbol relationships and how sounds are
represented in a language.
● As children learn to talk, they develop phonemic awareness, which is recognizing the distinct
sounds (phonemes) in language.
● Phonemes are the smallest units of sound that can differentiate words (e.g., the difference
between the sounds /b/ and /p/ in "bat" and "pat").
● Example: When a child learns that the sounds /k/, /a/, and /t/ together form the word "cat."
Phonological Knowledge
● This involves the broader ability to recognize and manipulate the sound structure of language,
including words, syllables, and rhymes.
● Phonological awareness includes skills like counting syllables, segmenting words, and
recognizing patterns.
● It encompasses phonemic awareness, but also includes understanding how larger sound units
like syllables and rhymes work together in language.
● Example: Counting the number of syllables in "elephant" or segmenting the sentence "The cat
sleeps" into individual words.
● Phonological Awareness: Ability to recognize that words are made of different sounds, which
includes tasks like syllable counting, rhyming, and breaking down sentences into words.
● Phonemic Awareness: Focuses specifically on understanding and manipulating phonemes,
like identifying the number of sounds in a word.
Examples:
Lexical Ambiguity
● Definition: Ambiguity that arises from a single word having multiple meanings.
● Example: The word "silver" can be interpreted as:
○ A noun (a metal or color)
○ An adjective (describing color)
○ A verb (to coat with silver)
Syntactic Ambiguity
● Definition: Ambiguity that occurs when a sentence can be parsed in different ways due to
word arrangement.
● Example: "The man saw the girl with the telescope."
○ Did the man use a telescope to see the girl?
○ Or, was the girl holding the telescope?
Semantic Ambiguity
● Definition: Ambiguity that arises when the meaning of words or phrases is unclear, leading to
multiple interpretations.
● Example: "The car hit the pole while it was moving."
○ Interpretation 1: The car, while moving, hit the pole.
○ Interpretation 2: The pole was moving when the car hit it.
Anaphoric Ambiguity
● Definition: Ambiguity caused by the use of pronouns or other referring expressions that are
unclear.
● Example: "The horse ran up the hill. It was very steep. It soon got tired."
○ Does "it" refer to the hill (steep) or the horse (tired)?
Pragmatic Ambiguity
● Definition: Ambiguity that arises from the context of a phrase, leading to multiple
interpretations based on social or conversational context.
● Example: "I like you too."
○ Interpretation 1: "I like you just as much as you like me."
○ Interpretation 2: "I like you, just like I like someone else."
Importance of NLP for Indian Languages
● A significant portion of India's population, especially in rural areas, is literate in local languages
rather than English.
● Enhancing NLP for Indian languages can help bridge the digital divide and ensure wider
accessibility.
Digital Inclusion:
● The goal of a truly inclusive Digital India hinges on providing language support beyond English.
● The language barrier remains a challenge for smartphone usage, which is critical for accessing
information and digital services.
Applications in Agriculture:
● Farmers, who form a substantial part of India's economy, often lack English proficiency,
making it challenging to access modern agricultural knowledge.
● A voice-based application similar to Google Assistant but designed for Indian farmers could
significantly enhance their ability to access relevant information in their native language.
● Effective NLP for Indian languages is crucial for initiatives like precision agriculture, farmer
helplines, and knowledge sharing.
● Understanding farmer issues, including sensitive topics like farmer suicides, also requires
nuanced language processing capabilities.
● NLP can play a crucial role in enabling interpretation of sign languages and facilitating
communication through text-to-speech and speech-to-text technologies.
● This makes information more accessible to individuals with hearing or speech impairments.
Translation of Signboards:
● Translating signboards from local languages to English and other widely spoken languages can
make travel and navigation easier for non-native speakers and tourists.
● This helps create a more inclusive environment for both domestic and international travelers.
● Developing high-quality fonts for Indian scripts can significantly enhance the readability and
visual impact of advertisements, signboards, presentations, and reports.
● This ensures that written communication in local languages is clear and effective.
● For optimal results, there is a need for high-quality corpora and tools for Indian languages that
match the resources available for English.
● This includes comprehensive datasets, linguistic tools, and robust language models to support
diverse NLP applications.
Challenges to NLP
Language Differences:
Quality of Training Data:
● The performance of an NLP system depends heavily on the quality of training data.
● Poor-quality or biased data can lead to inaccurate or skewed results, impacting the system's
overall understanding of language.
Development Time:
Phrasing Ambiguities:
● Natural language often contains ambiguous phrasing that even humans struggle to interpret.
● NLP systems must be adept at understanding context and should be capable of seeking
clarification if needed.
Handling Misspellings:
Innate Biases:
● NLP systems can inherit biases from the programmers and the datasets used.
● Eliminating biases to ensure fairness and reliability across diverse user groups is a significant
challenge.
Words with Multiple Meanings (Polysemy):
● Many words have multiple meanings depending on the context, making interpretation
complex.
● Contextual understanding is crucial for accurately deciphering the intended meaning.
Multiple Intentions:
● Some user inputs have more than one intention, requiring the NLP system to handle each aspect
without oversimplification.
● For example, distinguishing between canceling an order and updating payment details in a
single query is essential.
Applications of NLP
Translation:
Speech Recognition:
● Speech recognition enables machines to understand spoken language and convert it into text.
It allows for hands-free interaction, such as voice commands.
● Example: Google Now, Siri, and Alexa recognize speech commands like "call Ravi" and
respond accordingly.
Sentiment Analysis:
● NLP is used to analyze emotions in text data (like social media posts or reviews). It can classify
opinions as positive, negative, or neutral, helping companies understand public sentiment
about their products or services.
● Sentiment analysis is particularly important in fields like the stock market, where public
sentiment can impact stock prices.
Chatbots:
● Chatbots are AI-powered tools designed to interact with users and answer queries
automatically. They can range from basic customer support systems to more advanced ones
capable of handling complex requests.
● In healthcare, chatbots can assess symptoms, schedule appointments, and recommend
treatments.
Question-Answer Systems:
● These systems use NLP to answer user queries by understanding context and providing
accurate responses.
● IBM’s Watson famously competed on the quiz show Jeopardy!, showcasing advanced NLP
and AI capabilities by answering complex questions accurately.
Text Summarization:
● This application condenses large amounts of text into shorter summaries while retaining the
key information.
● It is useful for generating news headlines, search results snippets, and summarizing long
reports.
Market Intelligence:
● NLP helps businesses analyze unstructured data to gain insights into market trends, consumer
behavior, and competitor activities.
● Market intelligence tools can track sentiment, keywords, and intent in data, aiding in strategic
decision-making.
Text Classification:
● This involves categorizing text based on its content, helping with tasks like organizing
information or filtering spam emails.
● NLP applications are used to classify spam vs non-spam emails or to tag content for
searchability.
Spelling and Grammar Correction:
● NLP tools can automatically detect and correct spelling and grammar errors in text, improving
writing quality.
● Example: Tools like Grammarly use NLP to highlight errors and suggest improvements.
Spam Detection:
● NLP and machine learning models are used to detect unwanted emails and classify them as
spam or not.
● This is crucial for managing email inboxes efficiently and preventing malicious content from
reaching users.
Information Extraction:
● This involves extracting structured data from unstructured documents. NLP helps convert
large amounts of unstructured text into a usable format for analysis.
● Example: Extracting data from financial reports or legal documents to facilitate quick
decision-making.
Natural Language Understanding (NLU):
● NLU converts human language into formal representations (e.g., logical structures) that are
easier for computers to process and manipulate.
● This allows machines to better understand complex language constructs and make decisions
based on them.
Advantages of NLP
Disadvantages of NLP
Tokenization
Tokenization is a foundational task in Natural Language Processing (NLP). It involves splitting a piece
of text into smaller units called tokens, which can be words, characters, or subwords. Tokenization
types include:
● Word Tokenization: Splits text by words (e.g., "Never give up" → "Never", "give", "up").
● Character Tokenization: Breaks text into individual characters (e.g., "smarter" → "s", "m", "a",
"r", "t", "e", "r").
● Subword Tokenization: Splits words into meaningful parts (e.g., "smarter" → "smart", "er").
● Tokens are essential for processing text in NLP models like Transformers, RNNs, GRUs, and
LSTMs.
● Tokenization is used to process sensitive data, allowing for security in credit card processing,
e-commerce, and more by replacing sensitive info with tokens.
Tokens in Security
Token Vault: Stores sensitive information securely. Some tokenization systems, however, use a
vault-less method, deriving tokens algorithmically rather than storing a mapping.
Token
Tokenization substitutes sensitive information with equivalent non-sensitive information. The
non-sensitive replacement information is called a token.
Word Tokenization
Word Tokenization uses delimiters to split text into words and underpins Word2Vec and GloVe
embeddings. Issues include:
● Out of Vocabulary (OOV): Words not in the training data vocabulary are unrecognized.
○ Solution: Replace rare words with an unknown token (UNK) to manage OOV.
● Vocabulary Size: Large corpora create extensive vocabularies, making memory management
challenging.
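A minimal sketch of the word tokenization and UNK handling described above; the tokenizer, toy vocabulary, and the <UNK> symbol are illustrative assumptions, not part of any particular library.

```python
# Minimal sketch of word tokenization with an UNK token for out-of-vocabulary words.
# The vocabulary below is a toy assumption, not built from a real corpus.

def word_tokenize(text):
    # Split on whitespace after stripping basic punctuation.
    return [tok.strip(".,!?").lower() for tok in text.split() if tok.strip(".,!?")]

vocabulary = {"never", "give", "up", "the", "cat", "sleeps"}

def map_to_vocab(tokens, vocab, unk="<UNK>"):
    # Replace any token not present in the training vocabulary with UNK.
    return [tok if tok in vocab else unk for tok in tokens]

tokens = word_tokenize("Never give up, keep going!")
print(tokens)                            # ['never', 'give', 'up', 'keep', 'going']
print(map_to_vocab(tokens, vocabulary))  # ['never', 'give', 'up', '<UNK>', '<UNK>']
```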
Character Tokenization
Character Tokenization represents text as characters, reducing OOV issues and limiting vocabulary
size (e.g., English has 26 letters). Drawbacks:
● Lengthy Sequences: Increases input and output sentence lengths, complicating learning.
Subword Tokenization
Subword Tokenization breaks down words using linguistic rules, capturing affixes that alter meanings
(e.g., "machinating" → "machinat", "ing"). Benefits:
● Manages OOV words by segmenting unknown words and retaining meaning through affixes.
Importance of Tokenization
Tokenization converts unstructured data into numerical vectors for machine learning. Tokenization is
the first step in any NLP pipeline. It has an important effect on the rest of your pipeline. A tokenizer
breaks unstructured data and natural language text into chunks of information that can be
considered as discrete elements. The token occurrences in a document can be used directly as a
vector representing that document.
This immediately turns an unstructured string (text document) into a numerical data structure
suitable for machine learning. They can also be used directly by a computer to trigger useful actions
and responses. Or they might be used in a machine learning pipeline as features that trigger more
complex decisions or behavior.
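A small illustration of the point above, turning token occurrences into a count vector using only Python's standard library; the example sentence is assumed for illustration.

```python
# Sketch: turning token occurrences into a simple count vector for a document.
from collections import Counter

document = "the cat sat on the mat"
tokens = document.split()              # word tokenization by whitespace
counts = Counter(tokens)               # token occurrence counts

vocab = sorted(counts)                 # toy vocabulary built from this document alone
vector = [counts[w] for w in vocab]
print(vocab)    # ['cat', 'mat', 'on', 'sat', 'the']
print(vector)   # [1, 1, 1, 1, 2]
```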
Tokenization can separate sentences, words, characters, or subwords. When we split the text into
sentences, we call it sentence tokenization.
It enables:
Benefits of Tokenization
● Tokenization makes it more difficult for hackers to gain access to cardholder data. In older
systems, credit card numbers were stored in databases and exchanged freely over networks.
● It is more compatible with legacy systems than encryption.
● It is a less resource-intensive process than encryption.
● The risk of the fallout in a data breach is reduced.
● The payment industry is made more convenient by allowing new technologies like
Subword Tokenization
Sub-word tokenization is a more granular approach to breaking down text than standard word
tokenization. It involves breaking individual words into smaller units, often using linguistic rules like
affixes (prefixes, suffixes, and infixes). This allows the model to understand how parts of words
function, which is especially useful for handling out-of-vocabulary (OOV) words.
Key Concepts:
1. Affixes: Affixes are parts of words that modify their meaning. They include:
○ Prefixes (e.g., "un-" in "undo"),
○ Suffixes (e.g., "-ing" in "running"),
○ Infixes (less common, inserted within words).
2. Breaking Words into Sub-words: In sub-word tokenization, words are split into smaller
meaningful units. For example, the sentence "What is the tallest building?" might be tokenized
into:
○ 'what', 'is', 'the', 'tall', 'est', 'build', 'ing'.
3. Handling Out-of-Vocabulary (OOV) Words:
○ If a word is not in the model's vocabulary (OOV), it is still tokenized into smaller
subunits.
○ For example, the word "machinating" might be broken down into the unknown token
'machin' and the suffix 'ing'. While 'machin' might not be recognized, 'ing' can
provide valuable information.
4. Inferences from Suffixes:
○ Suffixes like -ing can indicate:
■ Present participle (e.g., "running" from "run"),
■ Noun form (e.g., "building" from "build").
○ The NLP model can infer that "machinating" might function as a verb in its present
participle form, which aids in understanding the word's role in a sentence.
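A rough sketch of suffix-based sub-word splitting in the spirit of the list above; the suffix list is hand-picked for illustration, whereas real systems (e.g., BPE or WordPiece) learn sub-word vocabularies from data.

```python
# Naive subword split using a hand-picked suffix list (an assumption for illustration);
# production tokenizers learn subword vocabularies (e.g., BPE/WordPiece) from data instead.

SUFFIXES = ("ing", "est", "ed", "er", "s")

def subword_split(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return [word[: -len(suf)], suf]
    return [word]

for w in ["tallest", "building", "machinating", "what"]:
    print(w, "->", subword_split(w))
# tallest -> ['tall', 'est'], building -> ['build', 'ing'],
# machinating -> ['machinat', 'ing'], what -> ['what']
```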
Stemming
Stemming is a technique in Natural Language Processing (NLP) that reduces inflected words to their
root forms. It simplifies the words by removing their inflections (e.g., tense, gender, or mood) to make
them uniform and easier to process.
Key Points:
1. Inflection:
○ Inflection involves modifying a word to express different grammatical categories, such
as tense or gender.
○ For example, the word “connect” can have various forms like “connections”,
“connected”, and “connects”.
2. Stemming Process:
○ Stemming involves reducing words to their base or root form.
○ For instance, “connections”, “connected”, and “connects” all stem to "connect".
○ In some cases, the result might not be a valid word in itself, such as “troubl” from
"trouble", "troubled", and "troubles", which is not a recognized word but serves as the
stem.
3. Purpose of Stemming:
○ Stemming helps in normalizing text, reducing redundancy, and preventing models
from overfitting due to variations of the same word.
○ It simplifies the words into their basic form, reducing the complexity for NLP models,
especially when analyzing large datasets.
4. Importance:
○ Data Reduction: Stemming reduces the number of unique terms in a dataset by
consolidating different forms of a word into one.
○ Improved Performance: By reducing words to their root form, stemming helps to
avoid redundancy and improves the efficiency of text processing, making NLP models
more effective.
○ Normalization: It ensures that different forms of the same word are treated as the
same, which improves model generalization and understanding of the data.
Challenges in Stemming
Stemming, while useful, has two primary challenges:
1. Overstemming:
○ Occurs when a word is truncated too much, leading to a nonsensical stem.
○ Example: "universal", "university", and "universe" are all reduced to "univers", which can
create confusion, as these words have distinct meanings in modern contexts. This can
negatively affect search results or understanding in NLP applications.
2. Understemming:
○ Occurs when related words are not reduced to the same stem due to linguistic
variations or complexity.
○ Example: "alumnus", "alumni", "alumna", and "alumnae" are all forms of the same word
in Latin, but they are not treated as equivalents in the stemming process, leading to
inconsistent results in NLP tasks.
Text Stemming
Stemming is a process in Natural Language Processing (NLP) where inflected or derived words are
reduced to their base or root form. This helps in treating different forms of a word as the same, thus
simplifying analysis and improving the effectiveness of NLP models. The process of stemming involves
removing prefixes and suffixes added to words, leading to their root form.
1. Root Form: The basic version of a word, from which other forms or variations are derived.
○ Example: The root of "walking," "walks," and "walked" is "walk."
2. Suffixes and Prefixes: These are added to words to change their meaning or grammatical
form.
○ Example: "Consult" can become "consultant," "consulting," "consultative," and
"consultants," but the stem remains "consult."
3. Stemming Algorithm: NLP algorithms called stemmers are used to remove suffixes and
prefixes from words, reducing them to their root form.
○ For example, a stemming algorithm would take words like "walking," "walked," and
"walks" and convert them to "walk."
Example:
In this example, the stemming algorithm identifies and reduces all the different forms of "Consult" to
their base form, "consult," despite the addition of different suffixes and prefixes.
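One widely available implementation is NLTK's Porter stemmer; a short usage sketch follows (assuming the nltk package is installed). Exact outputs depend on the stemmer's rules, but the cases below match the examples given in these notes.

```python
# Sketch using NLTK's Porter stemmer (assumes `pip install nltk`).
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["connections", "connected", "connects", "trouble", "troubled", "troubles"]:
    print(word, "->", stemmer.stem(word))
# connections / connected / connects -> connect
# trouble / troubled / troubles      -> troubl  (not a dictionary word, but a valid stem)
```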
Stemming can introduce errors due to the complexity and variability of language. Two main types of
errors are associated with stemming:
1. Overstemming:
○ Definition: This error occurs when the stemming algorithm removes too much of a
word, resulting in words with different meanings being reduced to the same stem.
○ Problem: The algorithm mistakenly groups words that have different meanings under
the same root, even though they should not be considered equivalent in context.
○ Example: Consider the words "university," "universities," "universal," and "universe." If
a stemmer reduces all these words to the stem "univers," it’s an example of
overstemming. While "university" and "universities" belong together, "universe" and
"universal" carry different meanings and should not share the same stem.
○ Overstemming can lead to nonsensical results and affect the quality of information
retrieval or text analysis.
2. Understemming:
○ Definition: This error happens when the stemming algorithm fails to reduce a set of
related words to the same stem, treating them as separate words instead.
○ Problem: It occurs when the algorithm does not perform aggressive stemming, leaving
related words as different stems, thus failing to group them effectively.
○ Example: The words "alumnus," "alumni," "alumna," and "alumnae" are all related but
may not be reduced to a common stem, causing them to be treated as distinct entities.
Lemmatization
Lemmatization is the process of reducing words to their base or root form, known as a lemma, by
grouping together inflected forms of a word that share the same meaning. Unlike stemming, which
simply removes prefixes and suffixes, lemmatization involves a more comprehensive approach by
taking the context into account and converting words to their dictionary form.
Example:
● "leaf" → "leaves"
● "studying" → "study"
● "ran" → "run"
The term "leafs" would be lemmatized to "leaf" and "studying" to "study," helping in understanding
the intended meaning rather than just reducing the word form.
Example:
● NLTK provides a WordNet Lemmatizer, which uses the morphy() function from the WordNet
corpus to find the lemma of words.
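A short usage sketch of NLTK's WordNet lemmatizer (assuming NLTK and the WordNet data are available); note that supplying a part-of-speech hint changes the result for verbs.

```python
# Sketch with NLTK's WordNet lemmatizer (requires nltk and its WordNet data:
# run nltk.download('wordnet') once). Without a pos hint, verbs may be left unchanged.
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("leaves"))             # leaf
print(lemmatizer.lemmatize("studying", pos="v"))  # study
print(lemmatizer.lemmatize("ran", pos="v"))       # run
```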
Importance of Lemmatization
1. Vital for NLU and NLP: Lemmatization plays a key role in Natural Language Understanding
(NLU) and Natural Language Processing (NLP), where accurately processing and
interpreting words is crucial.
2. Artificial Intelligence & Big Data: It's significant in both AI and big data analysis as it helps to
normalize words, improving data processing efficiency.
3. Accuracy: Lemmatization is more accurate than stemming as it ensures that words are
reduced to meaningful forms, making it more suitable for understanding user input in
applications like chatbots.
4. Slower than Stemming: While lemmatization provides higher accuracy, it is computationally
more expensive and slower than stemming due to its reliance on vocabulary and
morphological analysis.
Advantages:
1. More Accurate: Lemmatization is more accurate than stemming because it reduces words to
their root form based on context, ensuring that words with the same meaning are grouped
together, even if their inflections differ.
2. Uses Dictionary Forms: Unlike stemming, which just cuts off prefixes or suffixes,
lemmatization retrieves the root word from a dictionary, ensuring the result has meaning. For
example, "running" becomes "run," which is a valid dictionary word.
3. Better Context Recognition: Lemmatization is particularly beneficial for chatbots, as it
considers the exact and contextual meaning of words, improving the understanding of user
input and generating more accurate responses.
Disadvantages:
1. Time-Consuming and Slow: Lemmatization can be slower than stemming due to the need for
morphological analysis and vocabulary lookup, making it less efficient in real-time
applications.
2. Slower Algorithms: Since lemmatization requires a deeper analysis (e.g., checking the word in
a dictionary or corpus), the algorithms tend to be slower compared to stemming algorithms,
which simply trim the words.
ENGLISH MORPHOLOGY
Morphology is the study of the internal structure of words, focusing on how the components within a
word (such as stems, prefixes, and suffixes) are arranged or modified to convey different meanings. In
English, morphology plays a crucial role in modifying words to express various grammatical aspects
like tense, number, or class.
Key Points about English Morphology:
1. Morphemes: The smallest units of meaning in a language. For instance, in the word cats, "cat"
is the root morpheme, and "s" is a morpheme indicating plurality.
2. Affixes: English morphology frequently involves adding affixes (prefixes, suffixes) to root
words to form new words or alter their meaning. Examples include:
○ Plurality: Adding "s" or "es" to a noun to indicate plurality (e.g., cat → cats).
○ Past Tense: Adding "ed" to a verb to indicate past tense (e.g., walk → walked).
○ Adjective to Adverb: Adding "ly" to an adjective to form an adverb (e.g., happy →
happily).
3. Morphological Analysis in NLP: In Natural Language Processing (NLP), morphological analysis
helps computers understand the internal structure of words and their roles in sentences. This
understanding is essential for tasks like part-of-speech tagging and syntactic parsing.
4. Morphology in English vs. Other Languages: English is considered a "moderate" morphology
language compared to languages like Latin or Russian, which have complex inflection systems.
English relies more on word order than inflections to convey grammatical relationships (e.g.,
subject-verb-object order).
Kinds of Morphology
Morphology in linguistics is divided into two main categories: Inflectional Morphology and
Derivational Morphology. These categories help in understanding how words change in form and
meaning.
1. Inflectional Morphology
● Definition: Inflectional morphology involves changes to a word to express grammatical
features, such as tense, number, case, gender, or person, but it does not change the core
meaning or the part of speech of the word.
● Characteristics:
○ Regular: Inflectional morphemes apply to most or all words within a category. For
example, all countable nouns have a singular and plural form, and all verbs can be
conjugated to indicate different tenses.
○ Productivity: Inflectional rules are productive, meaning they can be applied to new
words that fit the category. For example:
■ Count nouns: dog → dogs (plural).
■ Verbs: talk → talked (past tense), run → running (present participle).
● Conveys Grammatical Information: Inflectional morphology provides crucial grammatical
details like number, tense, person, gender, and case. For example:
○ Number: "cat" (singular) → "cats" (plural).
● Meaning and Category Do Not Change: Unlike derivational morphology, inflection does not
change the basic meaning of the word or its part of speech.
○ For instance, the noun "cat" remains a noun even when it is inflected to "cats" (plural).
● Inflection of Root Word: The root word (or stem) can be inflected to form different
grammatical variations, but it stays within the same word class. For example:
○ Nouns: "dog" → "dogs" (plural), "fox" → "foxes" (irregular plural).
● Creation of Different Forms: Inflection produces different forms of the same word, keeping
the word's meaning intact but altering its grammatical properties. For example:
○ "work" (present) → "works" (third-person singular present).
● Examples:
○ Nouns: "cat" → "cats" (plural), "child" → "children" (irregular plural).
○ Verbs: "walk" → "walks" (third-person singular present), "talked" (past tense).
2. Derivational Morphology
● Definition: Derivational morphology changes a word’s form and often alters its part of speech
(form class). It can create new words or change the meaning of existing ones by adding
prefixes or suffixes.
● Characteristics:
○ Changes Part of Speech: Derivational morphemes often change the grammatical
category of a word, such as turning a noun into a verb, an adjective into a noun, etc.
○ Not Always Regular: Derivational morphology is not always as productive as
inflectional morphology. It can be irregular or less commonly applied, especially in
specific contexts or more specialized vocabulary.
○ Useful in Specialized Domains: Derivational morphemes are especially useful for
creating abstract nouns, forming technical terms, or developing scientific registers.
● Creating New Words: Derivation involves combining affixes with root words to form new
words. These new words can then act as roots for further derivations.
○ Example: Adding the suffix "-ness" to the adjective "happy" forms the noun "happiness."
● Derived from Root Words: In derivational morphology, new words are directly derived from
existing root words. The meaning of the derived word can differ significantly from the original
word.
○ For example, "perform" (verb) can be derived into "performance" (noun).
● Complexity in English Derivation: English derivation is complex due to several reasons:
○ Less Productive: Some affixes can only be applied to specific types of words. Not all
verbs or nouns can accept any given derivational affix.
○ Example: The verb "summarize" can combine with the suffix "-ation" to form
"summarization," but not all verbs can take the "-ation" suffix.
● Complex Meaning Differences : Some derivational suffixes can create words with
significantly different meanings, even when derived from the same root.
○ "Conformation" and "conformity" are both derived from the root word "conform," but
they have different meanings:
■ Conformation refers to the shape or structure of something.
■ Conformity refers to the act of adhering to rules, standards, or laws.
● Examples:
○ Noun to Adjective:
■ photograph (noun) → photographic (adjective).
○ Adjective to Noun:
■ clear (adjective) + -ance → clearance (noun),
■ clear (adjective) + -ity → clarity (noun).
○ Noun to Verb:
■ nation (noun) + -al (adjective) → national (adjective),
■ national (adjective) + -ize → nationalize (verb),
■ nationalize (verb) + -ation → nationalization (noun).
○ Complex Derivations:
■ denationalization (noun) (process of reversing the nationalization of something).
● Productivity: Some derivational morphemes are highly productive, like -ize, which can be
added to many base words to form verbs (e.g., maximize, minimize, modernize).
Dictionary Lookup in NLP
In Natural Language Processing (NLP), dictionary lookup refers to the process of referencing a
pre-compiled list of unique words (or terms) that appear in a given corpus. A dictionary in NLP
contains not just individual words, but can also include multi-word terms that represent a single
concept. These terms are mapped to their corresponding linguistic representations and annotations,
which can help in further text analysis tasks.
Dictionary Definition:
● A dictionary in NLP is a collection of unique words or terms that occur in the text corpus.
Words are listed only once, even if they appear multiple times across different documents.
● Each term in the dictionary is associated with a term ID, which is a unique identifier.
Types of Terms:
● The dictionary may contain single words or multi-word terms that represent a single concept
(e.g., a list of country names to extract the concept of "country").
● For example, terms like "United States" or "New York" may be included as multi-word terms in
the dictionary for better concept extraction.
Variants of Terms:
● A dictionary can include different forms of a base term, like the plural form of a noun, or
different tenses of a verb. This helps capture variations in how terms are used in different
contexts.
Morphological Parsing:
● Morphological parsing involves associating word forms with their linguistic descriptions. A
dictionary-based approach to this parsing process directly links words to their precomputed
analyses.
● The dictionary or word list is typically structured to enable fast lookups of word forms, allowing
for efficient analysis and retrieval of linguistic features (e.g., tense, number, etc.).
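A toy dictionary-based morphological lookup sketched under the assumption of a small hand-built lexicon; the entries and annotation fields are illustrative, not from a real resource.

```python
# Toy dictionary-based morphological lookup; entries and annotations are illustrative.
ANALYSES = {
    "cats":   {"lemma": "cat",  "pos": "NOUN", "number": "plural"},
    "cat":    {"lemma": "cat",  "pos": "NOUN", "number": "singular"},
    "walked": {"lemma": "walk", "pos": "VERB", "tense": "past"},
}

def analyze(word_form):
    # Constant-time lookup of a precomputed analysis; None signals an unknown form.
    return ANALYSES.get(word_form.lower())

print(analyze("Cats"))     # {'lemma': 'cat', 'pos': 'NOUN', 'number': 'plural'}
print(analyze("walking"))  # None -> would need rule-based or FST analysis instead
```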
Detailed Explanation
Morphological parsing is an important task in language processing, where word forms are associated
with their corresponding linguistic properties. A dictionary-based approach to this process works by
having an extensive list of word forms and their corresponding linguistic descriptions.
● Finite-State Transducers (FSTs): FSTs extend the power of finite-state automata. They consist of
a finite set of states connected by arcs (edges), with each arc labeled with pairs of input and
output symbols.
● The transducer processes an input sequence (e.g., a word form), navigating through states and
producing an output sequence (e.g., the word's lemma or another morphological form).
● The transducer defines a regular relation between input and output languages. For example,
it can translate words like vnuk to grandson, pravnuk to great-grandson, etc.
● In morphological analysis, surface strings represent the observed forms of words, while
lexical strings (lemmas) represent their underlying or base forms.
● For instance, the surface form "bigger" has the lexical form "big + Adj + comp", indicating that
"bigger" is the comparative form of the adjective "big".
● FSTs are used to define relations between surface forms and their corresponding lemmas (e.g.,
the relationship between running and run).
● In these transducers, a path from the initial state to a final state corresponds to a mapping
between a surface form and its lemma. The transducer is constructed by defining regular
expressions to describe these relations, which are then compiled into the transducer.
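Building a real FST requires a finite-state toolkit; as a stand-in, the surface-to-lexical relation described above can be sketched as a simple lookup table (the mappings shown are assumed examples).

```python
# Stand-in for an FST's surface-to-lexical relation, shown as a lookup table.
# A real transducer would compute these mappings from compiled regular expressions.
SURFACE_TO_LEXICAL = {
    "bigger":  "big +Adj +Comp",
    "running": "run +Verb +PresPart",
    "foxes":   "fox +Noun +Pl",
}

def to_lexical(surface):
    return SURFACE_TO_LEXICAL.get(surface, surface + " +Unknown")

print(to_lexical("bigger"))  # big +Adj +Comp
print(to_lexical("foxes"))   # fox +Noun +Pl
```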
Two Key Challenges in Morphology:
1. Morphotactics:
○ Morphotactics refers to the rules that govern how morphemes (the smallest units of
meaning) are ordered and combined to form words. For example, in English the suffix -less
attaches to the noun pity to form pitiless, and -ness can then attach to pitiless to form
"pitilessness"; but -ness cannot attach directly to pity (so "pityness" is not a word).
○ Some languages exhibit non-concatenative processes such as interdigitation
(intercalating morphemes) or reduplication, in addition to simple concatenation.
2. Morphological Alternations :
○ Morphological alternations refer to changes in the shape of morphemes depending on
their environment. For instance, the stem of the verb "die" changes to "dy-" when the
suffix -ing is added ("dying"), a morphophonemic alternation that needs to be captured in
the model.
Orthographic Rules:
● These are general rules used for word decomposition. They govern how words are
transformed in written form, such as how fox becomes foxes in the plural form.
● An example is the rule that singular English words ending in -y change to -ies when pluralized
(e.g., city becomes cities).
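A minimal sketch of the two orthographic rules just mentioned; irregular forms (e.g., child → children) would still need a separate exception list.

```python
# Sketch of two orthographic pluralization rules; irregular nouns need an exception list.
def pluralize(noun):
    if noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"       # city -> cities
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"             # fox -> foxes
    return noun + "s"                  # cat -> cats

for n in ["city", "fox", "cat"]:
    print(n, "->", pluralize(n))
```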
Morphological Rules:
● Morphological rules refer to exceptions to orthographic rules and are necessary when parsing
more complex word forms.
● These rules account for non-standard word transformations, such as irregular plural forms like
child to children or mouse to mice.
● With the rise of neural networks in natural language processing (NLP), the use of FSTs has
become less common, especially for languages with abundant training data. Neural networks
can perform morphological analysis with high accuracy and handle the complexity of
morphological rules more flexibly.
Applications of Morphological Parsing:
1. Machine Translation:
○ Morphological parsing aids in translating words accurately by identifying the correct
base forms and inflections across languages.
2. Spell Checkers:
○ Morphological analysis helps spell checkers by identifying not only correct spellings but
also valid morphemes, enabling more sophisticated error detection and correction.
3. Information Retrieval:
○ In information retrieval, understanding the morphology of a word helps improve search
queries by recognizing variations of words and retrieving relevant results.
Module 3: Syntax Analysis
Rule-Based POS Tagging
1. Dictionary/Lexicon Lookup:
○ The tagger first consults a dictionary or lexicon to assign possible POS tags to each
word in the sentence.
○ A word can have multiple possible tags if it has different meanings or usages in the
language (e.g., run can be a noun or a verb).
2. Disambiguation Using Rules:
○ If a word has multiple potential tags, the tagger uses a set of hand-written rules to
choose the most likely correct tag based on the context.
○ These rules analyze linguistic features such as the preceding and following words to
handle ambiguity.
● If a word is preceded by an article (e.g., the) or an adjective (e.g., beautiful), then the word is
likely to be a noun.
● Such rules are encoded in a tagger to resolve tagging ambiguities.
1. First Stage:
○ Uses a dictionary to assign a list of potential POS tags to each word.
2. Second Stage:
○ Applies a series of manually created disambiguation rules to narrow down the list to a
single POS tag for each word.
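A toy two-stage rule-based tagger in the spirit of the description above; the lexicon, tag names, and the single disambiguation rule are illustrative assumptions, not from a real tagger.

```python
# Toy two-stage rule-based tagger: a lexicon proposes candidate tags, then a
# hand-written context rule picks one. Lexicon and rule are illustrative.
LEXICON = {
    "the": ["DET"],
    "beautiful": ["ADJ"],
    "run": ["NOUN", "VERB"],
    "fast": ["ADV", "ADJ"],
}

def tag(sentence):
    words = sentence.lower().split()
    tags = []
    for i, w in enumerate(words):
        candidates = LEXICON.get(w, ["NOUN"])        # default guess for unknown words
        if len(candidates) > 1 and i > 0 and tags[i - 1] in ("DET", "ADJ"):
            # Rule: after an article or adjective, prefer the noun reading.
            chosen = "NOUN" if "NOUN" in candidates else candidates[0]
        else:
            chosen = candidates[0]
        tags.append(chosen)
    return list(zip(words, tags))

print(tag("The beautiful run"))  # [('the', 'DET'), ('beautiful', 'ADJ'), ('run', 'NOUN')]
```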
Properties of Rule-Based POS Tagging:
1. Knowledge-driven Taggers:
○ Rule-based POS taggers rely on expert knowledge to manually define the rules, making
them knowledge-driven.
2. Manual Rule Creation:
○ The rules are created by linguists or experts who understand the language's structure
and grammar.
3. Large Set of Rules:
○ Rule-based taggers typically require a substantial number of rules (around 1000 rules)
to cover various linguistic cases and handle exceptions.
4. Explicit Smoothing and Language Modelling:
○ Rule-based taggers explicitly define smoothing techniques to handle words not found in
the lexicon (out-of-vocabulary words) and ensure proper language modeling.
Advantages of Rule-Based POS Tagging:
● Accuracy for Well-Defined Languages: Highly accurate for languages with well-defined
grammar and syntax.
● Interpretability: The rules are interpretable, allowing linguists to understand why a word was
tagged a certain way.
● Consistency: Provides consistent tagging if the rules are comprehensive.
Stochastic POS Tagging
1. Word-Frequency Approach:
○ This approach disambiguates words by looking at how frequently a word appears with
each possible tag in the training data.
○ The tag that appears most frequently with the word is chosen when tagging.
○ Example: If the word "bank" is tagged as a noun (N) 70% of the time and as a verb (V)
30% of the time in the training data, the tagger will choose Noun if it encounters the
word "bank" again.
○ Limitation: This method can produce inappropriate sequences of tags, as it does not
consider the context of the entire sentence, leading to errors in complex scenarios.
2. Tag Sequence Probability Approach:
● Instead of just looking at individual word frequencies, this method calculates the probability
of sequences of tags occurring together.
● It assigns the best tag for a word based on the probability of that word appearing with the
preceding tags in the sentence.
● N-gram Approach:
○ Unigram: Consider each word individually.
○ Bigram: Consider the probability of a tag given the previous tag.
○ Trigram: Consider the probability of a tag given the two preceding tags.
● This approach is more context-aware and often more accurate than the Word-Frequency
approach.
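A sketch of the word-frequency (unigram) baseline described above, using an assumed toy tagged corpus; a bigram or trigram tagger would additionally condition on the preceding tag(s).

```python
# Sketch of the word-frequency (unigram) baseline: pick each word's most frequent
# tag in a (toy, assumed) tagged training corpus.
from collections import Counter, defaultdict

training = [("bank", "NOUN"), ("bank", "NOUN"), ("bank", "VERB"),
            ("run", "VERB"), ("run", "NOUN"), ("run", "VERB")]

tag_counts = defaultdict(Counter)
for word, tag in training:
    tag_counts[word][tag] += 1

def most_frequent_tag(word, default="NOUN"):
    counts = tag_counts.get(word)
    return counts.most_common(1)[0][0] if counts else default

print(most_frequent_tag("bank"))  # NOUN (2 of 3 occurrences)
print(most_frequent_tag("run"))   # VERB (2 of 3 occurrences)
```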
Transformation-Based Learning (TBL) Tagging
● Rule-Based: Like traditional rule-based taggers, TBL utilizes rules to determine which tags to
assign.
● Machine Learning: Similar to stochastic taggers, it incorporates machine learning by
automatically learning rules from a training dataset.
● Readable Rules: TBL maintains the linguistic knowledge in a human-readable form, making it
easy to understand why certain decisions are made.
● Initialization: It starts with an initial tagging of the text. This can be a simple method, such as
assigning the most frequent tag from the training data for each word.
● Refinement: The initial tags are refined using transformation rules, which specify how to
change the current tag based on the context. The tagger iteratively applies the most beneficial
transformation.
● Iteration: The process continues in cycles until no further transformations improve the
tagging accuracy.
1. Begin with an Initial Solution : TBL starts with a basic tagging solution. This initial state might
involve assigning the most common tag for each word based on a training corpus.
2. Selecting the Most Beneficial Transformation: In each cycle, the system evaluates multiple
potential transformations. It selects the transformation rule that results in the most
significant improvement in tagging accuracy. A transformation rule could be: Change a tag from
X to Y if the preceding word is Z.
3. Applying the Transformation: The selected transformation is applied to the text, modifying
the tags accordingly.
4. Stopping Condition: The process repeats until no more beneficial transformations can be
found, indicating that the tagging is as accurate as possible.
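A minimal sketch of applying one TBL-style transformation rule to an initial tagging; the initial tags and the rule itself are assumed for illustration, not learned from data here.

```python
# Sketch of one TBL-style transformation: start from most-frequent tags, then apply
# a rule "change VERB to NOUN if the previous word is a determiner".
initial = [("the", "DET"), ("race", "VERB"), ("ended", "VERB")]

def apply_rule(tagged):
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == "VERB" and out[i - 1][1] == "DET":
            out[i] = (word, "NOUN")     # transformation: VERB -> NOUN after a determiner
    return out

print(apply_rule(initial))
# [('the', 'DET'), ('race', 'NOUN'), ('ended', 'VERB')]
```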
Advantages of Transformation-Based Learning (TBL):
1. Small and Simple Rule Set : Only a small number of transformation rules are needed to
achieve effective tagging. These rules are typically straightforward and easy to manage.
2. Ease of Development and Debugging : The rules are human-readable, making it easier to
understand and modify them. Debugging the model becomes simpler since the impact of each
rule is transparent.
3. Reduced Complexity : By combining machine-learned rules with manually written ones, TBL
simplifies the tagging process without sacrificing accuracy.
4. Efficiency : TBL is generally faster than probabilistic models like Markov-Model taggers due to
its simpler rule application.
Disadvantages of Transformation-Based Learning (TBL):
1. No Probability Estimation : TBL does not assign probabilities to the tags. This means it lacks
the statistical foundation found in stochastic models, which makes probabilistic reasoning
impossible.
2. Slow Training Time with Large Corpora : When dealing with large datasets, the training
phase in TBL can be slow, as it involves evaluating numerous transformations over many
cycles.
Challenges in POS Tagging
● Definition: The primary challenge in POS tagging is handling ambiguity. Many words in English
can serve multiple functions, leading to uncertainty in tagging.
● Example: The word "shot" can be tagged as a noun (He took a shot) or a verb (He shot the ball).
Disambiguating the correct POS requires understanding the context in which the word
appears.
● In English, common words often have several meanings, each associated with a different POS.
This can complicate the tagging process since the correct tag is context-dependent.
● Impact: Inaccurate tagging leads to downstream error propagation, affecting subsequent
NLP tasks like parsing, named entity recognition, or machine translation.
● To enhance tagging accuracy, POS tagging can be integrated with other processes, such as
dependency parsing. Joint approaches can provide better results than treating POS tagging as
an isolated task.
Context Dependency:
● The POS tag for a word is not solely determined by the word itself but is often influenced by
the neighboring words. The surrounding context, such as the preceding and following words,
plays a significant role in disambiguating POS.
Word Probabilities:
● The likelihood of a word being a certain part of speech can help resolve ambiguity. For
instance, "man" is more frequently used as a noun than a verb, making the noun tag more
probable in the absence of strong contextual clues.
Generative Models
1. Capability to Generate Data : A generative model has the ability to create new data instances
that resemble real examples. For instance, it could generate images of animals that look
convincingly real based on learned patterns.
2. Joint Probability:
○ Given a set of data instances X and labels Y, generative models are concerned with
capturing the joint probability P(X,Y). This means they can represent the probability of
both the data and the associated labels occurring together.
○ If there are no labels, generative models focus on the probability P(X), which represents
the likelihood of the data itself.
3. Understanding Data Distribution: Generative models aim to learn the underlying
distribution of data, allowing them to assign probabilities to new instances. For example,
models predicting the next word in a sequence are generative because they estimate the
likelihood of a particular word sequence appearing.
HMM for POS Tagging
States : In POS tagging, each possible part-of-speech tag (like noun, verb, adjective) is a state. These
states are "hidden" because the true tag sequence is not directly observed.
Observations : The words in the input sentence are considered as observable events. Based on these
words, the HMM infers the sequence of hidden states (tags).
Transition Probability : This represents the probability of moving from one tag to another. For
example, the probability that a noun is followed by a verb.
Emission Probability : This measures the likelihood of observing a specific word given a particular
tag. For instance, how likely the word "run" is tagged as a verb.
Viterbi Algorithm : This is a dynamic programming technique used with HMMs to find the most
probable sequence of states (tags) for a given sequence of observations (words). It computes the
optimal path through a sequence by maximizing probabilities.
Markov Models
Markov models are probabilistic models used to describe a sequence of possible events, where the
probability of each event depends only on the state attained in the previous event. There are two
types:
1. Observable Markov Model (Markov Chain) : Each state is directly visible to the observer, and
there are no hidden variables. An example is predicting weather conditions (sunny, rainy)
where transitions depend only on the current state.
2. Hidden Markov Model (HMM) : The states are not directly visible, and instead, observations
provide indirect evidence about the states. HMM is particularly useful in cases where the
sequence of events is partially hidden.
Markov Chains
A Markov Chain is a way to predict a sequence of events where each event depends only on the
event right before it. In simple terms, it’s a system that moves from one state to another, and the
future state depends only on the present state, not on the entire past history. Imagine you're playing
a simple board game where you roll a dice and move to different spaces. The number you roll decides
where you go next, but it doesn't matter where you started or what you rolled before—only the
current roll matters. That's how a Markov Chain works!
States: These are the different situations you can be in. In the board game example, each space on
the board is a state.
Transition: Moving from one state to another. In the game, each dice roll is a transition from one
space to another.
Probability: Each transition has a probability. For example, if you’re in a certain space, you might
have a 50% chance to go to one space and a 50% chance to go to another, based on your dice roll.
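A minimal Markov chain sketch in Python with assumed weather-transition probabilities, illustrating that the next state depends only on the current one.

```python
# Minimal Markov chain sketch: weather states with assumed transition probabilities.
import random

transitions = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.2},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}

def next_state(state):
    # The next state depends only on the current state (Markov property).
    states, probs = zip(*transitions[state].items())
    return random.choices(states, weights=probs)[0]

state = "Sunny"
for _ in range(5):
    state = next_state(state)
    print(state, end=" ")
```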
Hidden Markov Models (HMM)
How It Works:
1. Hidden States : These are the variables you cannot directly observe. For example, in weather
prediction, the hidden states might be "Rainy" or "Sunny." Although you can't see these states
directly, you can make educated guesses about them based on observable data.
2. Observations : These are the events or data you can see. For instance, someone carrying an
umbrella might be an observable event, which can give a hint about the hidden weather state.
3. Markov Assumption: HMMs rely on the assumption that each hidden state only depends on
the previous hidden state (memoryless property).
4. Components of an HMM:
○ Initial Probability Distribution: This tells you the starting likelihood of each hidden
state.
○ Transition Probability Distribution: The probability of moving from one hidden state
to another. For example, the chance of going from "Rainy" to "Sunny."
○ Emission Probabilities: These define the likelihood of an observable event given a
hidden state. For example, the probability of seeing someone shopping if the weather is
"Sunny."
○ Sequence of Observations: The series of visible events that you use to make guesses
about the hidden states.
Imagine you want to predict what someone is doing based on the weather, but you can’t see the
weather directly. Instead, you can observe activities like "shopping," "walking," or "cleaning."
How It Works:
● HMMs use the current hidden state to predict future observations and hidden states.
● The hidden states help make predictions, but you only get clues about them through
observable events.
● For example, if you notice someone frequently walking outside, it might hint that it's "Sunny"
rather than "Rainy."
Viterbi Algorithm
The Viterbi Algorithm is a dynamic programming technique used to find the most probable
sequence of hidden states in a Hidden Markov Model (HMM), given a sequence of observed events.
It’s often used in applications like speech recognition, part-of-speech tagging, and bioinformatics.
1. Initialization: Start by setting up the initial probabilities for each hidden state based on the
first observation.
2. Recursion: For each subsequent observation, calculate the probability of each hidden state
using the previous states. This involves choosing the path that maximizes the likelihood.
3. Backtracking: Once all observations are processed, trace back the sequence of hidden states
that led to the highest probability.
The Viterbi Algorithm ensures that you get the optimal hidden state sequence efficiently, even for
complex data sequences, by narrowing down to the most likely paths as it processes the
observations.
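A compact Viterbi sketch for the weather/activity example above; the start, transition, and emission probabilities are assumed toy values, not estimated from data.

```python
# Minimal Viterbi sketch for the weather example; all probabilities are toy values.
def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    state = max(V[-1], key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = V[t][state][1]
        path.insert(0, state)
    return path

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3}, "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
# ['Sunny', 'Rainy', 'Rainy']
```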
Issues in HMM
● The main problem with HMM POS tagging is ambiguity.
● The tagging is based on the probability of a tag occurring, so there is no probability available for
words that do not exist in the training corpus.
● The testing corpus is typically different from the training corpus, so such unseen words do occur.
● In its simplest form, the tagger chooses the most frequent tag associated with a word in the
training corpus.
● An HMM is a doubly embedded stochastic model, in which the underlying stochastic process is
hidden.
● The hidden stochastic process can only be observed through another set of stochastic
processes that produces the sequence of observations.
Module 4: Semantic Analysis
Introduction to Semantic Analysis
Semantic Analysis is the process of interpreting and finding meaning in text. It helps computers
understand sentences, paragraphs, or documents by analyzing their grammatical structure and
identifying how individual words relate in a particular context. The primary goal is to derive the exact
or dictionary meaning from the text, checking if it makes logical sense.
For instance, consider the sentence, "Govind is great." The context is crucial to determine if "Govind"
refers to Lord Govind or a person named Govind. Semantic analysis aims to resolve such ambiguities.
Semantic analysis is used in applications such as:
● Machine translation
● Chatbots
● Search engines
● Text analysis
These applications extract significant information, ensuring the accurate meaning of a sentence.
While syntactic analysis considers word types, semantic analysis goes deeper into the meanings and
relationships between words.
In Natural Language Processing (NLP), semantic analysis plays a crucial role. It clarifies the context
and emotions behind a sentence, enabling computers to extract relevant information and perform
tasks with human-like accuracy.
Elements of Semantic Analysis
1. Entities: These are individual, specific items or names, like a person, place, or object.
Examples include "Haryana," "Kejriwal," and "Pune."
2. Concepts: These represent general categories or types to which entities belong, such as
"person," "city," or "country."
3. Relations: This defines the relationships between entities and concepts. For example, in the
sentence "Lata Mangeshkar was a singer," a relation exists between "Lata Mangeshkar" (entity)
and "singer" (concept).
4. Predicates: These are verb structures that define actions or states. Predicates specify roles
within a sentence, such as the subject and object. Examples include case grammar and
semantic roles.
Approaches to Meaning Representation
1. First Order Predicate Logic (FOPL): A formal system used to describe the meaning of
sentences through predicates and quantifiers.
2. Frames: Structured representations of knowledge with slots and fillers, often used to describe
typical situations or objects.
3. Rule-based Architecture: Systems based on predefined rules to interpret the meaning of
text.
4. Conceptual Graphs: Graph structures that visually represent the relationships between
concepts.
5. Semantic Nets: Networks that use nodes to represent concepts and edges to show
relationships between them.
6. Conceptual Dependency (CD): A model that represents the meaning of sentences through
actions and states to describe events.
7. Case Grammar: An approach that focuses on the semantic roles of words, such as agent,
object, and instrument.
Need for Meaning Representations
Lexical Semantics
Lexical Semantics is a branch of semantic analysis that focuses on the meanings of individual words
and smaller components, such as prefixes, suffixes, and compound phrases. These components are
collectively referred to as lexical items. Lexical semantics helps in understanding the relationship
between these items, the meaning of sentences, and how they fit into the syntactic structure of a
sentence.
1. Lexical Items: These are the building blocks of language, including words, parts of words (like
prefixes and suffixes), and phrases.
2. Relationship Between Lexical Items: Lexical semantics studies how these items interact with
each other and contribute to the overall meaning of a sentence.
Lexical semantics typically involves the following steps:
1. Classification of Lexical Items: This involves organizing words, sub-words, and affixes based
on their characteristics, such as part of speech (noun, verb, adjective, etc.) or word structure.
2. Decomposition of Lexical Items: Breaking down words into smaller parts to understand their
root meanings, prefixes, suffixes, and how they contribute to the overall word meaning.
3. Analyzing Differences and Similarities: Comparing various words and phrases to explore
differences in their meanings or identify similarities in their structure or usage.
Lexical Characteristics
Lexical Characteristics focus on understanding language through the analysis of lexical
units—words, phrases, and their patterns—rather than emphasizing grammatical structures. This
method, known as the Lexical Approach, centers on the idea that meaning in language is primarily
carried by vocabulary rather than syntax.
These units are crucial because they reflect how words are naturally used together in a language.
While the lexical approach helps learners quickly grasp useful phrases, it has some drawbacks:
● It may limit creativity because learners rely on fixed expressions rather than constructing
sentences from scratch.
● There is less emphasis on understanding the deeper, intricate structures of the language,
which can affect fluency in novel situations.
In the lexical approach, then, vocabulary is at the heart of conveying meaning, while grammar acts as a supporting structure that manages and organizes these words. In essence, learning vocabulary is seen as more fundamental than mastering grammar for effective communication.
Corpus Study
Corpus Study, also known as corpus linguistics, is a research methodology that involves the
statistical analysis of large collections of written or spoken texts to investigate linguistic phenomena.
A corpus refers to a structured set of "real-world" texts, reflecting how language is used in natural
contexts. This method is crucial for uncovering the rules and patterns of a language by analyzing
authentic data instead of relying solely on theoretical constructs.
Corpus studies have broad applications, including linguistic research, creating dictionaries, and
crafting grammar guides. For example, the American Heritage Dictionary of the English Language
(1969) and A Comprehensive Grammar of the English Language (1985) were developed using
corpus data.
Corpus study typically proceeds in three stages:
1. Annotation: Involves tagging texts with relevant information, like part-of-speech (POS) tagging, parsing, and other structural details. This helps in organizing the data for further study (a small tagging sketch follows this list).
2. Abstraction: Involves translating annotated data into a theoretically driven model or dataset.
This can include linguist-directed searches or automated rule-learning.
3. Analysis: Focuses on statistically analyzing the data to identify trends, optimize rules, or
discover new insights. This stage may involve statistical evaluations, data manipulation, and
generalization.
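As a small illustration of the annotation stage (item 1 above), the following sketch tags raw text with NLTK; it assumes NLTK is installed and the tokenizer/tagger resources have been downloaded (resource names can vary slightly across NLTK versions).

```python
# Minimal sketch of corpus annotation: POS tagging raw text with NLTK.
# Assumes: pip install nltk, plus one-time downloads such as
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')
# (exact resource names may differ between NLTK versions).
import nltk

raw_text = "The American Heritage Dictionary was developed using corpus data."

tokens = nltk.word_tokenize(raw_text)   # split the raw text into word tokens
tagged = nltk.pos_tag(tokens)           # annotate each token with a POS tag

print(tagged)
# e.g. [('The', 'DT'), ('American', 'JJ'), ('Heritage', 'NNP'), ...]
```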
Annotated corpora offer the advantage of allowing other researchers to perform further experiments, facilitating shared linguistic debates and studies.
Corpus Approach
The Corpus Approach is a method that relies on a comprehensive collection of naturally occurring
texts for analysis. These collections can vary by type, such as written, spoken, or specialized academic
texts. The emphasis is on using naturally occurring language to understand its patterns and
variations.
These techniques help linguists uncover language use patterns and discourse practices.
Dictionaries vary in scope and structure, and some do not fit neatly into general or specialized categories. The main types are described below.
Specialized Dictionaries
Specialized dictionaries, also known as technical dictionaries, focus on terminology within a specific
field. Lexicographers divide them into three main categories:
1. Multi-field Dictionary:
○ Covers several subject areas. Example: A business dictionary covering finance,
marketing, and management.
○ Example: Inter-Active Terminology for Europe (covers 23 languages).
2. Single-field Dictionary:
○ Focuses on one domain. Example: A legal dictionary.
○ Example: American National Biography (focused on biographical entries).
3. Sub-field Dictionary:
○ Even more specialized, covering niche areas within a domain. Example: Constitutional
law.
○ Example: African American National Biography (focusing on African American figures).
An alternative to these is a glossary, an alphabetical list of specialized terms, often seen in fields like
medicine.
Defining Dictionaries
A defining dictionary provides the simplest and most fundamental meanings of basic concepts:
● It includes a core glossary—the simplest definitions for the most commonly used words.
● In English, defining dictionaries usually limit their entries to around 2000 basic words, allowing
them to define about 4000 common idioms and metaphors.
Key elements in the analysis of semantic relationships among lexemes (words) include the following (a short WordNet-based sketch follows this list):
1. Hyponymy
○ Definition: A relationship between a general category (hypernym) and its specific
instances (hyponyms).
○ Example: "Colour" is a hypernym, while "red" and "green" are its hyponyms.
2. Homonymy
○ Definition: Words that have the same spelling or pronunciation but different and
unrelated meanings.
○ Example: The word "bat" can refer to both a piece of sports equipment and a flying
mammal.
3. Polysemy
○ Definition: A single word that has multiple meanings that are related by extension.
○ Example: The word "bank" can refer to:
■ (i) A financial institution.
■ (ii) The building that houses such an institution.
■ (iii) A verb meaning "to rely on."
4. Difference Between Polysemy and Homonymy
○ Polysemy involves meanings that are related to each other, even if distinct. For
example, different senses of "bank" are connected by the concept of "reliability" or
"holding."
○ Homonymy deals with meanings that are completely unrelated, such as the "bat" that
flies and the "bat" used in sports, which share no semantic connection apart from the
word form itself.
5. Synonymy
○ Definition: The relationship between two lexical items that have different forms but
express the same or very similar meanings.
○ Examples: "Author" and "writer," "fate" and "destiny."
6. Antonymy
○ Definition: The relationship between two lexical items that possess opposing meanings
relative to a certain axis.
○ Scope of Antonymy:
■ (i) Binary Opposition (Property or Not): Reflects a direct opposition, such as
"life/death" or "certitude/incertitude."
■ (ii) Gradable Opposition (Scalable Property): Involves a spectrum of opposites
where degrees exist, such as "rich/poor" or "hot/cold."
■ (iii) Relational Opposition (Usage-Based): A type of antonymy where the items
are defined by their relationship, such as "father/son" or "moon/sun."
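These relations can also be explored programmatically. The sketch below is a minimal example using NLTK's WordNet interface, assuming NLTK and its WordNet corpus are installed; the exact senses printed depend on the WordNet version.

```python
# Minimal sketch: inspecting polysemy, hyponymy, and synonymy with NLTK's WordNet.
# Assumes: pip install nltk and a one-time nltk.download('wordnet').
from nltk.corpus import wordnet as wn

# Polysemy/homonymy: "bank" has several synsets (senses).
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "->", synset.definition())

# Hyponymy: specific instances of a more general category (hypernym).
color = wn.synsets("color")[0]
print([h.name() for h in color.hyponyms()][:5])   # senses of particular colours

# Synonymy: lemmas sharing a synset express the same or very similar meanings.
print(wn.synsets("author")[0].lemma_names())      # lemmas such as 'writer'/'author'
```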
Lexical Ambiguity
● Definition: The ambiguity arising from a single word that can have multiple meanings.
● Example: The word "walk" can be interpreted as a noun ("I went for a walk") or as a verb ("I
walk every morning").
Syntactic Ambiguity
● Definition: Occurs when a sentence can be parsed in multiple ways due to its structure.
● Example: The sentence "The man saw the girl with the camera" can be interpreted in two
ways:
○ The man saw a girl who had a camera.
○ The man saw the girl through a camera.
Semantic Ambiguity
● Definition: Ambiguity that arises when the meaning of a word or phrase in a sentence can be
misinterpreted.
● Example: The sentence "The bike hit the pole when it was moving" can mean:
○ The bike, while moving, hit the pole.
○ The bike hit the pole while the pole was moving.
Anaphoric Ambiguity
● Definition: Ambiguity that occurs when the use of anaphoric entities (e.g., pronouns) leads to
unclear references.
● Example: "The horse ran up the hill. It was very steep. It soon got tired." The pronoun "it"
could ambiguously refer to the hill or the horse in both instances.
Pragmatic Ambiguity
● Definition: Ambiguity that arises when the context allows for multiple interpretations of a
situation.
● Example: The phrase "I like you too" can have different meanings depending on context:
○ "I like you (just as you like me)."
○ "I like you (just like someone else does)."
Word Sense Disambiguation (WSD), the task of identifying which sense of an ambiguous word is intended in a given context, is applicable across several NLP fields, aiding in the accurate interpretation and processing of language data.
Relevance of WSD
WSD is closely related to Part-of-Speech (POS) tagging, a fundamental component of NLP. However,
unlike POS tagging, WSD involves understanding the semantic content of a word, not just its
grammatical category.
● The challenge lies in the contextual and non-binary nature of word meanings. Unlike numerical
quantities, word senses are fluid and depend heavily on context.
● Lexicography, which generalizes language data, may not always provide definitions applicable
to algorithmic processes or data sets, emphasizing the need for adaptable and context-aware
WSD methods.
WSD is vital for achieving higher accuracy in NLP applications, allowing systems to parse and
understand language closer to how humans interpret it.
Knowledge-Based Approach
A knowledge-based system (KBS) refers to a computer system that uses knowledge stored in a
database to reason and solve problems. The behavior of such systems can be designed using the
following approaches:
Declarative Approach
In the declarative approach, an agent begins with an empty knowledge base and progressively adds
information. The agent "Tells" or inserts sentences (facts or rules) one after another until it has
enough knowledge to perform tasks and interact with its environment effectively. This approach
focuses on what the system knows rather than how it processes that knowledge. The agent doesn't
specify the steps or procedures for solving problems explicitly; instead, it describes the necessary
facts and rules in a declarative manner.
For example, a rule like "if it rains, then the ground gets wet" would be added to the knowledge base,
and the system would use that to infer consequences when needed.
Procedural Approach
The procedural approach is quite different. Instead of merely storing facts and rules, this method
focuses on encoding the required behavior directly into the program code. In this approach, the
system specifies how the task is to be performed by translating knowledge into explicit instructions
(procedures or algorithms).
While the declarative approach emphasizes the knowledge itself, the procedural approach focuses on
the process or procedure for handling knowledge and solving problems. This can involve writing
step-by-step instructions or algorithms that define how the system operates in different situations.
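To make the contrast concrete, the sketch below encodes the rain example in both styles; the knowledge-base structures and function names are illustrative only, not a standard API.

```python
# Declarative approach: the agent is "Told" facts and rules, and a generic
# inference routine (here, simple forward chaining) derives consequences.
facts = {"it_rains"}
rules = [({"it_rains"}, "ground_is_wet")]   # if it rains, then the ground gets wet

def infer(facts, rules):
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer(set(facts), rules))             # {'it_rains', 'ground_is_wet'}

# Procedural approach: the same knowledge is hard-coded as explicit steps.
def ground_state(it_rains: bool) -> str:
    if it_rains:
        return "wet"
    return "dry"

print(ground_state(True))                   # 'wet'
```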
Lesk Algorithm
The Lesk Algorithm is a method used in Word Sense Disambiguation (WSD) to determine the
meaning of an ambiguous word based on its context. The core idea of the algorithm is that words
within a given context or "neighborhood" tend to share a common topic or theme, and the dictionary
definition of the word in question can be compared with these neighboring words to help identify the
correct sense.
1. Dictionary Sense Comparison: For each possible sense (meaning) of the ambiguous word,
the algorithm compares its dictionary definition with the surrounding words in the context (i.e.,
its "neighborhood").
2. Counting Overlaps: It counts how many words from the neighborhood appear in the
dictionary definition of the sense being considered.
3. Selecting the Best Sense: The sense with the highest overlap count is chosen as the correct
meaning for the word in that particular context.
Classic example: disambiguating the phrase "pine cone" given these dictionary senses:
● Pine:
1. A kind of evergreen tree with needle-shaped leaves.
2. Waste away through sorrow or illness.
● Cone:
1. Solid body which narrows to a point.
2. Something of this shape whether solid or hollow.
3. Fruit of certain evergreen trees.
In this case:
● The best intersection of senses would be pine #1 (evergreen tree) and cone #3 (fruit of certain
evergreen trees), which gives an overlap count of 2. Therefore, this combination of senses
would be selected as the correct interpretation.
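A minimal sketch of this overlap counting, using the simplified glosses listed above; the stop-word list and crude singularisation are deliberately naive and only meant to reproduce the toy example.

```python
# Minimal Lesk-style sketch for the "pine cone" example above (toy glosses).
pine_senses = {
    "pine#1": "kind of evergreen tree with needle-shaped leaves",
    "pine#2": "waste away through sorrow or illness",
}
cone_senses = {
    "cone#1": "solid body which narrows to a point",
    "cone#2": "something of this shape whether solid or hollow",
    "cone#3": "fruit of certain evergreen trees",
}

STOP = {"of", "the", "a", "an", "with", "which", "to", "or", "this", "through"}

def gloss_words(gloss: str) -> set:
    # Lowercase, drop a few stop words, and crudely singularise so "tree"/"trees" match.
    words = [w for w in gloss.lower().split() if w not in STOP]
    return {w[:-1] if w.endswith("s") else w for w in words}

def overlap(gloss_a: str, gloss_b: str) -> int:
    return len(gloss_words(gloss_a) & gloss_words(gloss_b))

best = max(((p, c, overlap(pg, cg))
            for p, pg in pine_senses.items()
            for c, cg in cone_senses.items()),
           key=lambda triple: triple[2])
print(best)   # ('pine#1', 'cone#3', 2): the glosses share "evergreen" and "tree(s)"
```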
The Simplified Lesk Algorithm is a modified version of the original Lesk algorithm, with an emphasis
on efficiency and precision.
How it Works:
● In the simplified version, the sense of each word is determined individually, based on how
much overlap there is between its dictionary definition and the surrounding context.
● Unlike the original Lesk algorithm, which attempts to disambiguate all the words in a given
context together, the simplified approach treats each word independently.
Performance:
● A comparative evaluation of the algorithm on the Senseval-2 English all-words dataset showed
that the simplified Lesk algorithm outperforms the original version in terms of both precision
and efficiency.
● The simplified version achieved 58% precision, while the original version only achieved 42%
precision.
While Lesk-based methods are useful for WSD, they come with certain limitations:
1. Sensitivity to Exact Wording: Lesk’s approach is highly sensitive to the exact wording of
dictionary definitions. Small changes in the phrasing can significantly alter the disambiguation
results.
2. Absence of Certain Words: If a word is missing from a definition, the overlap count may be
greatly reduced, affecting the accuracy of the algorithm.
3. Limited Glosses: Lesk’s algorithm determines overlaps only among the glosses (brief
definitions) of the senses being considered. These glosses are often short and may not provide
enough vocabulary to distinguish between different senses effectively.
4. Insufficient Vocabulary in Glosses: Since dictionary glosses tend to be very concise, they may
lack enough context to clearly differentiate between multiple senses of a word, especially
when senses are subtle or nuanced.
Modifications and Improvements:
To overcome these limitations, various modifications to the Lesk algorithm have been proposed:
● Synonym Dictionaries: Using synonyms or additional words found in the glosses of senses to
improve the disambiguation process.
● Morphological and Syntactic Models: Incorporating morphological or syntactic analysis to
better understand the context and enhance sense disambiguation.
● Derivatives and Related Words: Using derivatives of words or related terms from the
definitions to find better overlaps.
Module 5 - Pragmatic and Discourse Analysis
REFERENCE RESOLUTION
Reference resolution is the process through which we determine the relationships between referring
expressions and their referents in discourse. For a computer or an automated system, understanding
how pronouns and other referring expressions like "he" or "it" relate to entities previously mentioned
in the text is a challenging task. This section discusses how reference resolution works and introduces
several key terms related to the process.
Corefer: When two referring expressions refer to the same entity, they are said to corefer.
● Example: In "John went to Bill's car dealership to check out an Acura Integra. He looked at it for about an hour," "John" and "he" corefer because both refer to the same person, John.
Antecedent: The antecedent of a referring expression is the referring expression that enables the
use of another. In other words, the antecedent is the first mention that allows a subsequent pronoun
or referring expression to be used.
● Example: In the sentence, "John went to Bill's car dealership," "John" is the antecedent of the
pronoun "he" that follows.
● Anaphora refers to the use of a referring expression to refer to an entity that has already
been introduced into the discourse.
● A referring expression that does this is called anaphoric.
Example: In the sentence "He looked at it for about an hour," both "he" and "it" are anaphoric as they
refer back to previously introduced entities ("John" and "Acura Integra," respectively).
Reference Phenomena
In natural languages, reference is a key aspect of communication. Different types of referring
expressions and complex referent categories help navigate the relationships between terms and
entities in discourse. The following sections discuss the various types of referring expressions and
challenges in reference resolution.
Definite Noun Phrases
● Examples:
○ "I saw an Acura Integra today. The Integra was white and needed to be washed."
○ "The Indianapolis 500 is the most popular car race in the US."
● Here, the Integra refers to a previously mentioned car, and the Indianapolis 500 is unique
enough to be identified by the listener.
Pronouns
Pronouns simplify reference by replacing noun phrases. They usually refer to entities recently
introduced or activated in the discourse model. Pronouns can be restricted by the salience or
immediacy of the referent.
● Example: "I saw an Acura Integra today. It was white and needed to be washed."
Pronouns often have to be close to their antecedents in the text (e.g., he, she, it referring to entities
mentioned recently). They can also appear before their referent (cataphora).
● Cataphora Example: "Before he bought it, John checked over the Integra very carefully."
In some cases, pronouns appear in quantified contexts and are bound to variables (e.g., Every woman
bought her Acura).
Demonstratives
Demonstrative pronouns and determiners like this and that show proximity and distance. They signal
spatial or temporal distance depending on context.
● Spatial Example: "I like this better than that."
● Temporal Example: "I bought an Integra yesterday. It’s similar to the one I bought five years
ago. That one was really nice, but I like this one even better."
Names
Names refer to specific entities, such as people, places, or organizations. They can refer to both
known and new entities in discourse.
Three kinds of referents complicate reference resolution:
1. Inferrables: Entities that the listener can infer from the discourse context even though they aren't explicitly mentioned. For example, the listener might infer that a person referred to by a pronoun (e.g., he or she) is a specific person without their name being repeated.
2. Discontinuous Sets: A set of related entities that aren't mentioned in one continuous sequence, for example, several cars that were not discussed together but still belong to the same discourse.
3. Generics: References to types or categories of things rather than specific instances, for example, using "cars" or "Acura Integras" in a general sense without pointing to any particular one.
Number Agreement
Pronouns must match their antecedents in number (singular or plural). For instance, "it" can refer back to "an Acura," while a plural antecedent such as "three Acuras" requires "they."
Pronouns must also match their antecedents in person (first, second, third) and case (nominative, accusative, genitive).
Gender Agreement
Gender in English third-person pronouns (he, she, it) should match the gender of the noun they refer
to. For example:
● "John has an Acura. 'He' is attractive." Here "he" must refer to John, since it agrees with John in gender.
● "John has an Acura. 'It' is attractive." Here "it" must refer to the Acura rather than to John, because "it" does not agree with John in gender and animacy.
Syntactic Constraints
Syntactic constraints govern how pronouns and their potential antecedents interact within sentence structure. Reflexive pronouns, for instance, must refer to the subject of the most immediate clause. For example, in "John bought himself a new Acura," "himself" can only refer to John.
Syntactic rules can also prevent certain pronouns from referring to certain subjects. For example, in "John wanted a new car. Bill bought him a new Acura," the non-reflexive "him" cannot refer to Bill (the subject of its own clause) but can refer to John.
Selectional Restrictions
Some verbs impose constraints on the types of objects they can take, and these restrictions help narrow down possible referents:
● Example: "John parked his Acura in the garage. He had driven it around for hours," The
pronoun "it" clearly refers to the Acura, since "drive" is a verb associated with vehicles, not a
garage.
● Example: "John bought a new Acura. It drinks gasoline like you would not believe." (Here,
"drink" is used metaphorically for the car.)
In addition to syntactic and selectional constraints, semantic knowledge about the world helps
determine which referent is most likely. For instance:
● Example: "John parked his Acura in the garage. It is incredibly messy, with old bike and car
parts lying around everywhere."
● The garage is likely the intended referent for "it" because garages typically contain bike and car
parts, unlike a car.
Anaphora Resolution
The Hobbs Algorithm is a syntactic method for resolving pronouns. It operates by constructing a
parse tree of the sentence and then searching for potential antecedents (referents) to the pronoun.
Consider the example: "Jack and Jill went up the hill, to fetch a pail of water. Jack fell down and broke 'his' crown, and Jill came tumbling after." The task is to resolve the pronoun 'his'.
Resolution Process:
● The algorithm's primary strategy involves searching left of the target word, restricting the
search to elements that have appeared before the pronoun. In this case, it eliminates 'crown'
as a possible referent because it appears after the pronoun 'his'.
● Next, it applies gender agreement. Since 'his' is a masculine pronoun, Jill (a feminine noun) is
ruled out. Additionally, inanimate objects like hill and water are unsuitable since 'his' typically
refers to animate entities.
● With the recency property, entities closest to the pronoun take precedence. This leaves Jack
as the most likely antecedent, matching both gender and recency constraints.
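The sketch below is not the full Hobbs parse-tree walk; it only illustrates the filtering constraints just described (search left of the pronoun, gender and animacy agreement, recency) on hand-labelled candidates.

```python
# Heuristic pronoun-resolution sketch for "... Jack fell down and broke 'his' crown ..."
# Candidates are hand-labelled with gender/animacy and position in the text; this only
# mimics the constraints above, not the actual Hobbs tree search.
candidates = [
    {"text": "Jack",  "gender": "masc", "animate": True,  "position": 0},
    {"text": "Jill",  "gender": "fem",  "animate": True,  "position": 1},
    {"text": "hill",  "gender": None,   "animate": False, "position": 2},
    {"text": "water", "gender": None,   "animate": False, "position": 3},
    {"text": "crown", "gender": None,   "animate": False, "position": 5},
]
pronoun = {"text": "his", "gender": "masc", "animate": True, "position": 4}

def resolve(pronoun, candidates):
    viable = [c for c in candidates
              if c["position"] < pronoun["position"]      # search left of the pronoun
              and c["animate"] == pronoun["animate"]      # 'his' needs an animate referent
              and c["gender"] == pronoun["gender"]]       # gender agreement
    # Recency: prefer the viable candidate closest to the pronoun.
    return max(viable, key=lambda c: c["position"]) if viable else None

print(resolve(pronoun, candidates)["text"])   # 'Jack'
```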
Algorithm Steps:
Generative AI
Generative models learn the underlying patterns and structures in the input data and use this understanding to generate similar new data. Their applications are vast, spanning areas such as text, image, and audio generation.
Variational AutoEncoders (VAEs) are a type of neural network architecture used for generating new data samples. They are an
extension of the traditional AutoEncoders, with a probabilistic twist. VAEs assume that the input data
can be modeled by a latent probability distribution, and they learn to map the input data to this
distribution.
Architecture
● Encoder: Maps the input data to a latent space, producing a mean and variance for the latent
variables.
● Decoder: Generates new data by sampling from the latent space and reconstructing the
original input.
Instead of learning a single latent representation, the VAE learns a probability distribution over the
latent space. This allows for better generalization and the ability to generate diverse samples.
How it Works
1. The input data is passed through the Encoder, which outputs a mean and variance.
2. The latent vector is sampled from this distribution using the reparameterization trick: z = μ + σ ⊙ ε, where ε is sampled from a standard normal distribution (see the sketch after this list).
3. The sampled latent vector z is fed into the Decoder, which reconstructs the original data.
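A minimal NumPy sketch of these three steps; the encoder and decoder here are random linear maps standing in for trained networks.

```python
# Minimal sketch of the VAE reparameterization trick: z = mu + sigma * eps.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))                  # one toy input vector

# Stand-in "encoder": random linear maps producing a mean and a log-variance.
W_mu, W_logvar = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))
mu, logvar = x @ W_mu, x @ W_logvar

# Reparameterization: sample eps from a standard normal, then shift and scale.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * logvar) * eps          # z = mu + sigma * eps

# Stand-in "decoder": map the latent sample back to input space.
W_dec = rng.normal(size=(2, 8))
x_reconstructed = z @ W_dec

print(z.shape, x_reconstructed.shape)        # (1, 2) (1, 8)
```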
Applications
Advantages
● Provides a continuous latent space, allowing for smooth interpolation between generated
samples.
● Offers better regularization due to the probabilistic approach.
GANs are a type of neural network architecture designed to generate realistic data by using two
competing neural networks: a Generator and a Discriminator. The two networks are trained in a
zero-sum game, where the Generator tries to produce realistic data, and the Discriminator tries to
distinguish between real and generated data.
Architecture
How it Works
Advantages
Challenges: GANs can be difficult to train due to instability and mode collapse, where the Generator produces limited varieties of outputs.
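The adversarial loop described above can be sketched in a few lines of PyTorch; the network sizes and the one-dimensional "real" data distribution are toy placeholders, not a practical GAN setup.

```python
# Minimal GAN training-loop sketch (toy 1-D data), illustrating the two-player game.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))                 # Generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # Discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(200):
    real = torch.randn(32, 1) * 0.5 + 2.0     # "real" samples from a toy distribution
    noise = torch.randn(32, 4)
    fake = G(noise)

    # Discriminator step: label real samples as 1, generated samples as 0.
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the Discriminator label fakes as real.
    g_loss = bce(D(G(noise)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(float(d_loss), float(g_loss))
```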
Limitations
1. Training Complexity
○ Generative models, especially GANs, require substantial computational resources and
expertise to train effectively. Issues like mode collapse and vanishing gradients can
make training unstable.
2. Data Quality Dependence
○ The performance of generative AI is heavily dependent on the quality and diversity of
the training data. If the data is biased, the generated content may also reflect these
biases.
3. Ethical and Privacy Concerns
○ Generative models can create highly realistic fake content, such as deepfakes, which
can be used for malicious purposes like misinformation or identity theft.
○ Using private or sensitive data for training generative models can lead to privacy
violations.
4. Lack of Control and Interpretability
○ It can be challenging to control the specific output of a generative model. For instance,
in text generation, the model might produce incorrect, inappropriate, or biased
responses.
5. Overfitting and Poor Generalization
○ Generative models may overfit the training data, making them less capable of
producing novel and diverse samples. This is a common issue when the training data is
limited.
What is ChatGPT?
ChatGPT is an advanced conversational AI model developed by OpenAI, based on the GPT
(Generative Pre-trained Transformer) architecture. It belongs to a class of large language models
(LLMs) that utilize deep learning techniques to understand and generate human-like text.
Advantages of ChatGPT
Limitations of ChatGPT
● Lacks True Understanding: Despite its impressive capabilities, ChatGPT does not have true
comprehension or reasoning. It generates text based on patterns in the data it was trained on.
● May Generate Incorrect Information: ChatGPT can confidently provide responses that are
factually incorrect or misleading.
● Sensitivity to Input Phrasing: The quality of responses can vary depending on how the input
query is phrased.
● Risk of Bias: The model may reflect biases present in the training data, leading to biased or
inappropriate responses.
Well-designed prompts can help the model produce accurate, relevant, and coherent responses,
while poorly constructed prompts may lead to incorrect, vague, or biased outputs.
Types of Prompts
Instruction-Based Prompts
● These prompts give direct instructions to the model, specifying the task clearly.
● Example: "Summarize the following text in one sentence: [text]."
Context-Based Prompts
● These prompts provide context before asking the main question, helping the model
understand the background better.
● Example: "You are an expert in climate science. Explain the impact of greenhouse gases on
global warming."
Completion Prompts
● The model is given a starting text and asked to continue or complete it.
● Example: "In a world where artificial intelligence has taken over human tasks, the greatest
challenge is..."
Question-Based Prompts
● These prompts pose a direct question for the model to answer.
● Example: "What are the main causes of climate change?"
Role-Based Prompts
● The prompt assigns a role or persona to the model to tailor the response style.
● Example: "Act as a software development mentor and explain how to use version control in
Git."
Prompt Templates
Prompt Templates are pre-designed structures that can be reused to interact with LLMs for different
tasks. They help maintain consistency and ensure the prompt is clear and effective.
1. Summarization Template:
○ "Summarize the following content in a concise paragraph: [insert content here]."
○ Use Case: Quickly getting a summary of articles, research papers, or long text inputs.
2. Q&A Template:
○ "Based on the given context, answer the following question: [context]. Question: [insert
question here]."
○ Use Case: Helps when extracting information from specific contexts or datasets.
3. Code Assistance Template:
○ "You are a Python expert. Given the code snippet below, provide a detailed explanation
and suggest improvements: [insert code here]."
○ Use Case: Code review and debugging support.
4. Creative Writing Template:
○ "Write a short story about [topic or theme], focusing on the characters [insert character
names]."
○ Use Case: Generating creative content for storytelling or brainstorming ideas.
5. Structured Output Template:
○ "Generate a response in JSON format with fields for 'Summary', 'Key Points', and
'Recommendations': [insert input text here]."
○ Use Case: Obtaining structured data outputs for further processing or integration.
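A minimal Python sketch of reusing such templates; the template strings follow the examples above, and llm_call is a hypothetical stand-in for whatever model API is actually used.

```python
# Minimal prompt-template sketch: reusable templates filled in per task.
TEMPLATES = {
    "summarize": "Summarize the following content in a concise paragraph: {content}",
    "qa": "Based on the given context, answer the following question: {context} Question: {question}",
    "structured": ("Generate a response in JSON format with fields for 'Summary', "
                   "'Key Points', and 'Recommendations': {content}"),
}

def build_prompt(name: str, **fields) -> str:
    return TEMPLATES[name].format(**fields)

def llm_call(prompt: str) -> str:
    # Hypothetical stand-in for a real model/API call.
    return f"[model response to: {prompt[:60]}...]"

prompt = build_prompt("qa",
                      context="The article discusses climate change and polar bears.",
                      question="What threatens polar bear populations?")
print(llm_call(prompt))
```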
b) Assign a Role or Persona
● Assigning a role or persona to the model can help it adopt a specific tone or expertise level, making the responses more targeted.
● Example:
○ "You are a financial analyst. Analyze the impact of inflation on the stock market."
c) Provide Context
● Including context before the main query can help the model understand the background,
leading to better responses.
● Example:
○ Context: "The article discusses climate change and its effects on polar bears."
○ Query: "Summarize the impact of climate change on polar bear populations."
d) Provide Examples
● Providing examples can help the model understand the expected format or style of the response.
● Example:
○ "Translate the following sentences into French. Example: 'Good morning' → 'Bonjour'.
Sentence: 'How are you?'"
e) Specify the Desired Length
● Specify the desired length of the response if you need a short answer or a detailed explanation.
● Example:
○ "In two sentences, explain why neural networks are used in deep learning."
f) Request a Specific Format
● Asking the model to format the response in a specific way can help in extracting structured data.
● Example:
○ "List the advantages of using GANs in bullet points."
g) Iterate and Refine
● If the response is not satisfactory, refine the prompt iteratively by clarifying the task or rephrasing the question.
● Example:
○ Initial Prompt: "Explain convolutional neural networks."
○ Refined Prompt: "Explain convolutional neural networks, focusing on their architecture
and applications in image processing."
Zero-Shot Learning
What is Zero-Shot Learning?
Zero-shot learning refers to a scenario where the model is expected to perform a task without having
seen any specific examples of it during training. Instead, the model relies on its general
understanding and the information provided in the prompt to infer what is required.
How it Works
In zero-shot learning, the prompt is designed to be clear and self-explanatory, containing all the
necessary instructions for the model to understand the task. The model uses its vast knowledge base,
acquired during pre-training, to interpret the task and generate an appropriate response.
1. Text Summarization:
○ Prompt: "Summarize the following article in one sentence: [Insert article text]."
○ The model is not given any specific examples of summaries but is expected to produce
one based on its understanding.
2. Sentiment Analysis:
○ Prompt: "Analyze the sentiment of this review: 'I absolutely loved this product. It
exceeded my expectations.'"
○ The model infers that it needs to classify the sentiment without being given labeled
examples.
3. Translation:
○ Prompt: "Translate this sentence into Spanish: 'Where is the nearest hospital?'"
○ The model performs the translation task without explicit training on this specific
sentence.
Advantages:
● Versatility: It allows the model to handle a wide variety of tasks without additional training
data.
● Ease of Use: No need to provide examples or fine-tune the model for specific tasks.
Challenges:
● Limited Accuracy: The model may not always produce accurate results, especially for
complex tasks or tasks requiring domain-specific knowledge.
● Ambiguity: The model might misinterpret the task if the prompt is not clear enough.
Few-Shot Learning
What is Few-Shot Learning?
Few-shot learning is a technique where the model is provided with a few examples of the task within
the prompt. These examples serve as demonstrations, helping the model understand the expected
format, style, and requirements of the task.
How it Works
The prompt includes a few input-output pairs as examples before asking the model to complete a
similar task. This approach helps the model generalize better because it can learn from the provided
examples and apply the learned pattern to new inputs.
1. Text Classification:
○ The prompt first shows a few labelled reviews (for example, one marked Positive and one marked Negative), and then asks: "Analyze the sentiment of this review: 'The plot was dull and predictable.'"
○ The model uses the provided examples to determine the sentiment of the new review.
2. Named Entity Recognition:
○ Example 1: "Barack Obama was born in Hawaii." → Person: Barack Obama, Location: Hawaii
○ Then: "Extract entities from this sentence: 'Elon Musk founded SpaceX in California.'"
3. Code Completion:
Advantages:
● Improved Accuracy: Providing examples helps the model understand the task better and
increases the likelihood of generating correct responses.
● Flexibility: It can adapt to new tasks without requiring full retraining.
Challenges:
● Prompt Length: Including many examples can make the prompt longer, which may be
inefficient for very large tasks.
● Overfitting to Examples: The model might rely too heavily on the provided examples, limiting
its ability to generalize.
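As a concrete illustration, the sketch below assembles a few-shot sentiment prompt along the lines of the text-classification example above; the labelled example reviews are hypothetical, and the resulting string would be sent to the model.

```python
# Few-shot prompt construction: a few labelled examples, then the new input.
examples = [
    ("I absolutely loved this film, the acting was superb.", "Positive"),  # hypothetical
    ("A complete waste of two hours.", "Negative"),                        # hypothetical
]
new_review = "The plot was dull and predictable."

lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
for text, label in examples:
    lines.append(f'Review: "{text}"\nSentiment: {label}\n')
lines.append(f'Review: "{new_review}"\nSentiment:')
prompt = "\n".join(lines)

print(prompt)   # send this string to the model; it should continue with a label
```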
Transformer Architecture
The transformer architecture, central to many modern LLMs like GPT, uses self-attention mechanisms
and multi-layer processing to understand and generate human-like text effectively. It was designed to
address issues in sequence modeling, like those found in recurrent neural networks (RNNs) and long
short-term memory networks (LSTMs), which struggled with long-range dependencies and
parallelization.
1. Self-Attention Mechanism: This is the core feature allowing the model to weigh the importance of different words in a sentence regardless of their position. By computing attention scores, the transformer can identify relevant relationships within the input sequence (a minimal sketch follows this list).
2. Encoder-Decoder Structure: The original transformer model consists of an encoder and a
decoder. However, in LLMs like GPT, only the decoder part is used for autoregressive text
generation, making it efficient for tasks like text completion and dialogue.
3. Positional Encoding: Since transformers do not have a natural sense of word order like RNNs,
positional encoding is added to provide a sense of sequence, helping the model to distinguish
between different positions of words.
4. Multi-Head Attention: Instead of focusing on a single attention score, multi-head attention
allows the model to look at different parts of the sequence simultaneously, capturing various
aspects of the word relationships and context.
5. Feed-Forward Neural Network: Each encoder and decoder layer also includes a feed-forward
neural network for additional nonlinearity and complexity, followed by normalization to
stabilize training.
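The self-attention sketch referenced in point 1 is shown below: a minimal single-head scaled dot-product attention in NumPy, with random matrices standing in for learned projection weights.

```python
# Minimal single-head scaled dot-product self-attention (random toy weights).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))              # embeddings for a 5-token sequence

W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v                  # queries, keys, values

scores = Q @ K.T / np.sqrt(d_k)                      # attention scores for every token pair
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over keys
output = weights @ V                                 # each token: weighted mix of all values

print(weights.shape, output.shape)                   # (5, 5) (5, 8)
```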
1. Pre-training
Pre-training is the first phase of LLM development, where the model is exposed to vast amounts of text data to learn general language patterns, grammar, facts, and context.
Process:
Advantages:
● Knowledge Base: The model develops a foundational understanding of language, facts, and
general knowledge.
● Transferability: The general language skills learned during pre-training can be adapted to a
variety of specific tasks with fine-tuning.
Challenges:
2. Fine-tuning
Fine-tuning is the second phase, where the pre-trained model is adapted for specific tasks using a
smaller, task-specific dataset.
Process:
● Supervised Learning: Fine-tuning typically involves supervised learning, where the model is
trained with labeled examples for a specific task (e.g., sentiment analysis, text classification,
question answering).
● Task-specific Objectives: The objective changes from general language modeling to the
specific task at hand. For instance, during fine-tuning for sentiment analysis, the model learns
to classify text as positive or negative.
● Smaller Dataset: The dataset for fine-tuning is much smaller compared to pre-training, but it
is highly relevant to the specific task.
Advantages:
● Task Adaptation: Fine-tuning allows the model to specialize and perform well on targeted
tasks.
● Efficiency: Since the model has already learned a lot about the language during pre-training,
fine-tuning can be done faster with less data.
Challenges:
● Overfitting: The model can overfit the small fine-tuning dataset, especially if it is not diverse
or large enough.
● Catastrophic Forgetting: The model may lose some of its general knowledge gained during
pre-training if fine-tuning heavily focuses on a specific task.
N-gram Numericals and Theory