
Chapter 1: Introduction to NLP

Vertical - Data Engineering

[Road-map slide: supporting courses across the First, Second and Third Year]
• Calculus: functions and graphs, limits and derivatives, applications of differentiation, integrals and their applications
• Linear Algebra: vectors and vector spaces, matrices and matrix transformations, determinants, eigenvalues and eigenvectors, SVD and PCA
• Probability and Statistics: description of data, probability, random variables, probability distributions, statistical inference
• Databases: introduction and ER model, relational data model, relational algebra, SQL, database design, transaction processing, concurrency
• Mathematics (continued): partial derivatives, chain rule, Lagrange multipliers, differential equations of higher orders
• Machine Learning: descriptive data analysis, visualization, data pre-processing, supervised learning, regression, classification, ensemble learning, clustering, time series analysis, neural networks, deep neural networks, seq2seq models
• NLP and Generative AI: introduction to NLP, autoencoders and decoders, GANs and GenAI, transformer networks, diffusion models, LLMs
Outcomes

1. Apply foundational NLP techniques, including various word embeddings and parsing methods, to analyze natural language data.
2. Design and implement machine translation systems using Seq2Seq models, attention mechanisms, autoencoders and decoders for complex language processing tasks.
3. Differentiate between generative and discriminative models, design and evaluate Generative Adversarial Networks (GANs), and explore various types of GANs for application in language modeling and other AI-driven tasks.
4. Design and implement transformer networks for text generation, and analyze the properties of diffusion models for advanced NLP tasks.
5. Apply Large Language Models (LLMs), including BERT and GPT, prompting techniques, and Low-Rank Adaptation (LoRA) to enhance, customize and optimize performance for domain-specific applications.
6. Create a comprehensive report of the course project and publish a paper at a technical conference.
Vertical - Data Engineering

Unit - 1
1. Introduction to NLP (04 hrs): Introduction to Natural Language Processing, applications of Natural Language Processing, word embeddings. Parsing techniques: dependency grammar, neural dependency parsing.
2. Machine Translation, Autoencoders and Decoders (06 hrs): Machine translation, Seq2Seq and attention, autoencoders and decoders.
3. Generative Adversarial Networks (05 hrs): Generative vs. discriminative models, Generative Adversarial Networks and language models, types of GANs.
Unit - 2
4. Transformer Networks & Diffusion Models (07 hrs): Transformer networks, transformers for text generation, diffusion models: continuous vs. discrete, deterministic vs. stochastic models.
5. Large Language Models (08 hrs): Introduction to LLMs, BERT and GPT models, prompting techniques, adapters and Low-Rank Adaptation (LoRA).
Lab Experiments

Exp 1 (2 slots): Text classification using word2vec; language modeling; machine translation; text summarization.
Exp 2 (2 slots): Machine translation with seq2seq models; text summarization.
Exp 3 (2 slots): Part-of-Speech (POS) tagging; question answering systems; topic modeling.
Exp 4 (2 slots): Implementing a basic diffusion model; training a diffusion model; image denoising using diffusion models.
Exp 5 (1 slot): Data pre-processing and tokenization; building a simple language model.
Exp 6 (1 slot): Implementing attention mechanisms; exploring transformer architectures.
Exp 7 (4 slots): Fine-tuning for specific tasks; ethical considerations and bias detection; real-world application development; performance optimization and scaling.
Chapter 1 – What is NLP?

1. NLP stands for Natural Language Processing.
2. It is the branch of Artificial Intelligence that gives machines the ability to understand and process human language.
3. Human language can be in the form of text or audio.
4. NLP enables computers and digital devices to recognize, understand and generate text and speech by combining computational linguistics (the rule-based modeling of human language) with statistical modeling, machine learning and deep learning.
History of NLP

1950: Alan Turing publishes "Computing Machinery and Intelligence", introducing the idea of machines
understanding and generating human language.
Heuristics-Based NLP (Early Years):
•Rule-based methods using predefined, hand-crafted rules from domain experts (e.g., regular
expressions).
•Limitations: Limited scalability for complex language processing.
Statistical NLP (1990s):
•Shift to machine learning algorithms using statistical patterns learned from data.
•Examples: Naive Bayes, Support Vector Machines (SVM), Hidden Markov Models (HMMs).
Neural Network-Based NLP (Present):
•Deep learning models provide better accuracy but require large datasets and high computational power.
•Examples: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Transformers (e.g.,
BERT, GPT).
Components of NLP

Natural Language Understanding (NLU)

●Focuses on enabling machines to comprehend and interpret human language.


●Key Tasks:
○ Syntax Analysis (Parsing): Understanding the grammatical structure of sentences.
○ Semantics: Extracting the meaning of the text.
○ Named Entity Recognition (NER): Identifying proper nouns (e.g., names, locations).
○ Sentiment Analysis: Determining emotions or opinions expressed in text.
Natural Language Generation (NLG)

●Involves generating coherent human language from machine-understood data.


●Key Tasks:
○ Text Generation: Automatically producing human-like sentences (e.g., chatbots, AI writing).
○ Summarization: Condensing large texts into shorter versions.
○ Machine Translation: Converting text from one language to another.
Applications of NLP

● Text and speech processing, e.g. voice assistants such as Alexa and Siri
● Text classification, e.g. Grammarly, Microsoft Word, and Google Docs
● Information extraction, e.g. search engines such as DuckDuckGo and Google
● Chatbots and question answering, e.g. website bots
● Language translation, e.g. Google Translate
● Text summarization
Phases of NLP

• Lexical Analysis: Breaks down the text into words (tokens) and identifies their part of speech (nouns, verbs, etc.).
• Syntactic Analysis (Parsing): Examines the grammatical structure of a sentence, checking if the arrangement of
words follows the rules of a language.
• Semantic Analysis: Focuses on understanding the meaning of individual words and how they combine in a sentence.
• Pragmatic Analysis: Interprets the meaning of the sentence in context, considering background knowledge or the
speaker's intent.
• Discourse Integration: Ensures that individual sentences are connected to form a coherent text or conversation.
Regular Expressions

A regular expression (sometimes called a rational expression) is a sequence of characters that define a search
pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace” like
operations. Regular expressions are a generalized way to match patterns with sequences of characters.
Rules for Regular Expressions
•Every symbol of the alphabet Σ is a regular expression, and the empty string ε is itself a regular expression.
•If r1 and r2 are regular expressions, then (r1), r1.r2, r1+r2, r1*, r1+ are also regular expressions.
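To make the "find and replace" idea concrete, here is a small illustrative Python sketch using the standard re module (the example strings and the pattern are hypothetical, not from the slides):

import re

text = "Contact us at support@example.com or sales@example.com."

# Pattern for simple email addresses (illustrative only, not a full RFC-compliant regex)
pattern = r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"

emails = re.findall(pattern, text)          # find all matches
masked = re.sub(pattern, "<EMAIL>", text)   # "find and replace"

print(emails)   # ['support@example.com', 'sales@example.com']
print(masked)   # Contact us at <EMAIL> or <EMAIL>.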
Terminology

Corpus
• Definition: A corpus is a large collection of text used for training and evaluating NLP
models.
• Example: Wikipedia articles, news datasets, customer reviews.
• Types of Corpus:
• Monolingual Corpus: Text in one language (e.g., English News Corpus).
• Parallel Corpus: Text in multiple languages for translation tasks (e.g., Europarl dataset).
• Domain-Specific Corpus: Text from specialized fields like medical or legal texts.
Example:
If we collect 10,000 articles from news websites, we call it a news corpus.

Documents
• Definition: A document is a single unit of text in a corpus.
• Example: An email, a tweet, a book chapter, a product review.
• Relationship: A corpus is made up of multiple documents.

Example:
• A news corpus contains thousands of news articles (each article is a document).
• A book corpus contains individual book chapters as documents.
Terminology

Vocabulary
• Definition: A vocabulary is the set of unique words found in a corpus.
• Example: [“AI”, “Machine”, “Learning”, “Python”, “Deep”, “Neural”]
• Size: Vocabulary size depends on the dataset; large corpora have extensive vocabularies.
• Preprocessing Impact:
• Removing stopwords (common words like “is”, “the”) reduces vocabulary size.
• Applying stemming/lemmatization normalizes words and minimizes redundant terms.
Example:
If we process a news corpus with 1 million words, and find 50,000 unique words, then the
vocabulary size is 50,000.

Words (Tokens)
• Definition: Words (or tokens) are the smallest units in a text after tokenization.
• Example: [“I”, “love”, “NLP”, “!”]
• Types of Tokens:
• Unigram Tokens: Single words (“deep”, “learning”).
• Bigram Tokens: Pairs of words (“deep learning”, “neural networks”).
• Subword Tokens: Smaller parts of words used in NLP models (e.g., “learn” and “##ing” in BERT).
Example:
Text: “Natural Language Processing is fun!”
• Tokens: [“Natural”, “Language”, “Processing”, “is”, “fun”, “!”]
Text Processing

[Flow chart] Dataset → Text Processing 1 (Tokenization, Stemming, Lemmatization, Stopword removal) → Text Processing 2 (Word to Vector) → model input

Word-to-vector techniques:
• Count/frequency based: One-Hot Encoding, Bag of Words, TF-IDF
• Deep learning based: Word2Vec
Text Processing

Tokenization

Definition:
Tokenization is the process of splitting text into smaller units called tokens (words,
phrases, or sentences). It helps in preprocessing text for NLP tasks.

Types of Tokenization:
• Word Tokenization: Splitting text into words.
• Sentence Tokenization: Splitting text into sentences.
• Subword Tokenization: Used in deep learning models like BERT (e.g., WordPiece,
Byte Pair Encoding).

Text: "Natural Language Processing is amazing!"


Word Tokenization: ['Natural', 'Language', 'Processing', 'is', 'amazing', '!']
Sentence Tokenization: ["Natural Language Processing is amazing!"]
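An illustrative sketch (not part of the slides) of word and sentence tokenization with NLTK, assuming the nltk package is installed:

import nltk
nltk.download("punkt", quiet=True)   # tokenizer models; newer NLTK releases may also need "punkt_tab"

from nltk.tokenize import word_tokenize, sent_tokenize

text = "Natural Language Processing is amazing! It powers chatbots and translators."

print(word_tokenize(text))   # word tokens; punctuation comes out as separate tokens
print(sent_tokenize(text))   # sentence tokens: two sentences here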
Text Processing

Stemming

Definition:
Stemming reduces a word to its root form by chopping off suffixes, but it does not always
produce meaningful words.

Common Stemming Algorithms:


• Porter Stemmer
• Snowball Stemmer
• Lancaster Stemmer

Words: ["running", "flies", "easily"]


Porter Stemmer: ["run", "fli", "easili"]
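A minimal sketch of the stemmers listed above, using NLTK (assuming the nltk package is installed); the Porter output matches the slide, while the other stemmers may differ:

from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

words = ["running", "flies", "easily"]

porter = PorterStemmer()
snowball = SnowballStemmer("english")
lancaster = LancasterStemmer()

print([porter.stem(w) for w in words])      # ['run', 'fli', 'easili'], as on the slide
print([snowball.stem(w) for w in words])    # Snowball (Porter2) output, usually very close to Porter
print([lancaster.stem(w) for w in words])   # Lancaster is the most aggressive of the three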
Text Processing

Lemmatization

Definition:
Lemmatization reduces words to their dictionary base form (lemma), considering the
word’s meaning and grammar.

Words: ["running", "flies", "easily"]


Lemmatized: ["run", "fly", "easy"]

Lemmatization vs. Stemming:


• Stemming: Works by cutting off word endings (faster but less accurate).
• Lemmatization: Uses a vocabulary lookup, preserving meaning (slower but more accurate).
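An illustrative sketch of lemmatization using spaCy (assuming the en_core_web_sm model is installed; NLTK's WordNetLemmatizer is an equally common choice). Exact lemmas depend on the model and on the part of speech assigned in context:

# assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The children were running while flies were flying around easily.")

for token in doc:
    print(f"{token.text:10} -> {token.lemma_}")
# e.g. 'children' -> 'child', 'running' -> 'run', 'flies' -> 'fly', 'were' -> 'be'
# (exact lemmas depend on the model version and the POS assigned in context)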
Comparison table: Stemming vs. Lemmatization
Word Embeddings

• Basic Terminologies in NLP


1. Corpus – Paragraph
2. Documents – Sentences
3. Vocabulary – Unique words present in the dictionary
4. Words – Basic words present in the paragraph
Word Embeddings

• In NLP, word embeddings is a term used for the representation of words for text analysis, typically in the form of a real-valued vector that encodes the meaning of the word, such that the words that are closer in the vector space are expected to be similar in meaning.
Road-Map of NLP
Word Embeddings

The flow chart shows the most popular word-to-vector conversion techniques.
Word Embeddings - One Hot Encoding

One Hot Encoding


•One-hot encoding is a method used to convert categorical data into a numerical
format that machine learning algorithms can process. It transforms each unique
category into a binary vector, where only one position is marked as 1 (hot), and all
others are 0 (cold).
Word Embeddings - One Hot Encoding

Example on One Hot Encoding


For the given text documents and output labels below, convert each text document into vectors using one-hot encoding.
Document   Text                Output
D1         The food is good    1
D2         The food is bad     0
D3         Pizza is Amazing    1

Solution:
Vocabulary {unique words present in the dataset} = 7
“The Food Is Good Bad Pizza Amazing”
Based on these words One-Hot Encoding is done.
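A minimal hand-rolled sketch of this example (pure Python, no libraries), showing how each document becomes a matrix of one-hot row vectors over the 7-word vocabulary:

docs = ["The food is good", "The food is bad", "Pizza is Amazing"]

# Build the vocabulary of unique (lower-cased) words: 7 words in this example
vocab = []
for doc in docs:
    for word in doc.lower().split():
        if word not in vocab:
            vocab.append(word)
print(vocab)   # ['the', 'food', 'is', 'good', 'bad', 'pizza', 'amazing']

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Each document becomes a (num_words x vocab_size) matrix, e.g. 4x7 for D1 and 3x7 for D3
for doc in docs:
    print(doc, "->", [one_hot(w) for w in doc.lower().split()])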
Word Embeddings - One Hot Encoding

Advantage:
1. Easy to implement and understand.

Disadvantage:
1. The resulting matrix is sparse (mostly zeros), which is inefficient for storage and computation, and can also lead to overfitting.
2. As seen in the example above, D3 yields a 3×7 matrix while D1 and D2 yield 4×7 matrices, so the representation size varies with document length; most ML algorithms, however, require a fixed-size input.
3. No semantic meaning is captured between the words.
4. One-hot encoding cannot represent out-of-vocabulary (OOV) words that appear only in the test data.
Word Embeddings - Bag of Words

Bag of Words
•The Bag of Words model represents text data by counting the occurrences of each word in a document, ignoring grammar and word order but keeping track of frequency.
Word Embeddings - Bag of Words
Example on Bag of words:
Given dataset, solve using Bag of words

Document   Text                     Output
S1         He is a good boy         1
S2         She is a good girl       1
S3         Boy and girl are good    1

Solution :
First step – lower all the words.
Second step – Exclude the stop words (words not involved in sentiment
analysis).
Hence the reframed Text is as follows
S1 - good boy
S2 – good girl
S3 – boy girl good
Word Embeddings - Bag of Words

Vocabulary   Frequency
good         3
boy          2
girl         2

Document   Feature vector [good boy girl]
S1         [1 1 0]
S2         [1 0 1]
S3         [1 1 1]

These vectors are used to train the machine learning model.

There are two types of Bag of Words (both are illustrated in the sketch below):
1. Binary bag of words
2. Normal (count-based) bag of words
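These vectors can also be produced with scikit-learn's CountVectorizer, as in the following sketch (scikit-learn is an assumption, not named in the slides). The default gives the normal count-based bag of words; binary=True gives the binary version:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["He is a good boy", "She is a good girl", "Boy and girl are good"]

# Keep only the sentiment-bearing words, as in the worked example above
vectorizer = CountVectorizer(vocabulary=["good", "boy", "girl"])
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())   # ['good' 'boy' 'girl']
print(X.toarray())
# [[1 1 0]
#  [1 0 1]
#  [1 1 1]]

# Binary bag of words: counts are clipped to 0/1
binary_vectorizer = CountVectorizer(vocabulary=["good", "boy", "girl"], binary=True)
print(binary_vectorizer.fit_transform(docs).toarray())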
Word Embeddings - Bag of Words

Advantages
—Simple to implement and understand.
—Results in fixed size input, which benefits ML algorithms.

Disadvantages
—The resulting matrix is sparse, which can lead to overfitting.
—Word order is discarded.
—Cannot represent out-of-vocabulary (OOV) words in test data.
—It often fails to capture the semantic meaning of, and similarity between, sentences.
Word Embeddings: TF-IDF
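The weighting used in the worked example below is the common textbook form of TF-IDF (library implementations often add smoothing and normalization):

\mathrm{tf}(t,d) = \frac{\text{count of } t \text{ in } d}{\text{total number of terms in } d},
\qquad
\mathrm{idf}(t) = \log\frac{N}{\mathrm{df}(t)},
\qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d) \times \mathrm{idf}(t)

where N is the total number of documents in the corpus and df(t) is the number of documents containing the term t.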
Word Embeddings : TF-IDF

Example on TF-IDF
Given dataset, solve using TF-IDF
Document Text Output
D1 He is a good boy 1
D2 She is a good girl 1
D3 Boy and girl are 1
good
Solution:
First step – lower all the words.
Second step – Exclude the stop words (words not involved in sentiment
analysis).
Hence the reframed Text is as follows
S1 - good boy
S2 – good girl
S3 – boy girl good
Word Embeddings : TF-IDF

Term Frequency (TF)
Word     S1     S2     S3
good     1/2    1/2    1/3
boy      1/2    0      1/3
girl     0      1/2    1/3

Inverse Document Frequency (IDF)
Word     IDF
good     log(3/3) = 0
boy      log(3/2)
girl     log(3/2)

Final TF-IDF (TF × IDF)
Sentence     good    boy               girl
S1           0       (1/2)·log(3/2)    0
S2           0       0                 (1/2)·log(3/2)
S3           0       (1/3)·log(3/2)    (1/3)·log(3/2)
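The table above can be reproduced with a few lines of plain Python (an illustrative sketch; scikit-learn's TfidfVectorizer uses a slightly different, smoothed formula, so its values will not match exactly):

import math

# Documents after lowercasing and stopword removal, as in the worked example
docs = [["good", "boy"], ["good", "girl"], ["boy", "girl", "good"]]
vocab = ["good", "boy", "girl"]
N = len(docs)

def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term):
    df = sum(1 for doc in docs if term in doc)   # number of documents containing the term
    return math.log(N / df)                      # log(3/3)=0 for 'good', log(3/2) for 'boy'/'girl'

for i, doc in enumerate(docs, start=1):
    print(f"S{i}:", [round(tf(t, doc) * idf(t), 3) for t in vocab])
# S1: [0.0, 0.203, 0.0]   S2: [0.0, 0.0, 0.203]   S3: [0.0, 0.135, 0.135]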
Word Embeddings – TF-IDF

Advantages
—Simple to implement and understand.
—Results in a fixed-size input, which benefits ML algorithms.
—Word importance is captured.

Disadvantages
—The resulting matrix is sparse, which can lead to overfitting.
—Cannot represent out-of-vocabulary (OOV) words in test data.
—TF-IDF treats words as independent entities and doesn't consider semantic relationships between them.
Word Embeddings – Word2Vec

• Word2Vec is a technique for NLP published in 2013.
• The Word2Vec algorithm uses a neural network model to learn word associations from a large corpus of text.
• Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence.
• As the name implies, Word2Vec represents each distinct word with a particular list of numbers called a vector.
Word Embeddings – Word2Vec

• Example -

Feature    Boy     Girl    King     Queen    Apple    Mango
Gender     -1      1       -0.92    0.93     0.01     0.05
Royal      0.01    0.02    0.95     0.96     -0.02    0.02
Age        0.03    0.02    0.75     0.68     0.95     0.96
Food       0.01    0.01    0.01     0.01     0.91     0.92
...

• Each unique word is represented as a vector over a set of features such as these.
• The numerical values reflect how strongly each word in the vocabulary relates to each feature.
Word Embeddings – Word2Vec
Word Embeddings – Word2Vec: CBOW and Skip Gram

Word2Vec is a technique for word embedding, and it has two architectures:


1.CBOW (Continuous Bag of Words) – Predicts the target word based on surrounding
context words.
2.Skip-gram – Predicts surrounding words given a target word.

CBOW Working Mechanism


•Takes context words (surrounding words) as input.
•Predicts the target word (center word).
•Uses a neural network to learn embeddings.

Skip-gram Working Mechanism


•Input: A single target word
•Output: Predicts surrounding context words

Question- Write the difference between CBOW and Skip-Gram
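An illustrative sketch of both architectures using the gensim library (gensim, the toy corpus and all hyperparameter values are assumptions, not from the slides); sg=0 selects CBOW and sg=1 selects Skip-gram. A real model needs a much larger corpus:

from gensim.models import Word2Vec

sentences = [
    ["he", "is", "a", "good", "boy"],
    ["she", "is", "a", "good", "girl"],
    ["boy", "and", "girl", "are", "good"],
]

# sg=0 -> CBOW (predict the centre word from its context); sg=1 -> Skip-gram (predict context words from the centre word)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(cbow.wv["good"][:5])                # first 5 dimensions of the vector for 'good'
print(skipgram.wv.most_similar("boy"))    # nearest neighbours (not meaningful on a toy corpus)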


Word Embeddings – Word2Vec: CBOW
Skip Gram working
SkipGram VS CBOW
Word Embeddings – GloVe

GloVe (Global Vectors for Word Representation)


—GloVe is a count-based model that generates word vectors by analyzing word co-occurrence in a large corpus. It captures both local and global context.
—Builds a word–word co-occurrence matrix (how often words appear together within a context window).
—Learns fixed-size word embeddings by factorizing the (log) co-occurrence matrix with a weighted least-squares objective (a form of matrix factorization related to, but distinct from, SVD).
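An illustrative sketch of using pretrained GloVe vectors via gensim's downloader (the gensim package and the dataset name "glove-wiki-gigaword-50" are assumptions, not from the slides; the first call downloads roughly 66 MB and needs internet access):

import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")    # 50-dimensional GloVe vectors trained on Wikipedia + Gigaword

print(glove["king"][:5])                       # first 5 dimensions of the 'king' vector
print(glove.most_similar("king", topn=3))      # nearest neighbours by cosine similarity
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))   # classic analogy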
Word Embeddings – FastText

— FastText is an extension of Word2Vec by Facebook that improves word


embeddings by using subword information (character n-grams).
— It is useful for handling rare words, misspellings, and different word
forms.
— Instead of learning embeddings for whole words, it learns embeddings
for subword units (n-grams).
— Example: The word “apple” can have subwords like “app” “ppl” “ple”.
— Even if a word does not exist in the training data, its meaning can be
inferred from its subwords.
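A minimal gensim sketch of FastText's subword behaviour (gensim, the toy corpus and the hyperparameters are assumptions, not from the slides):

from gensim.models import FastText

sentences = [
    ["i", "love", "eating", "an", "apple"],
    ["she", "loves", "mango", "juice"],
    ["apples", "and", "mangoes", "are", "fruits"],
]

# min_n/max_n control the character n-gram lengths used to build subword vectors
model = FastText(sentences, vector_size=50, window=3, min_count=1, min_n=3, max_n=5, epochs=50)

# Even a misspelled / out-of-vocabulary word gets a vector built from its character n-grams
print(model.wv["aple"][:5])
print(model.wv.similarity("apple", "apples"))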
Dependency grammar

• Dependency grammar is a
fundamental concept in natural
language processing (NLP) that allows
us to understand how words connect
within sentences.
• It provides a syntactic framework for representing sentence structure in which words are connected by directed links (dependencies).
• It focuses on head-dependent relations rather than phrase structure.
Grammar in the sentence
Dependency Parsing

• Dependency parsing is the process of analyzing the grammatical structure of a sentence to identify related words as well as the type of relationship between them.
Each relationship:
1.Has one head and a dependent that
modifies the head.
2.Is labeled according to the nature of the
dependency between the head and
the dependent.
Example
• In the above diagram, there exists a
relationship between car
and black because black modifies the
meaning of car. Here, car acts as
the head and black is a dependent of
the head.

• The nature of the relationship here


is amod which stands for "Adjectival
Modifier". It is an adjective or an adjective
phrase that modifies a noun.
Example:

The dependency grammar framework can be used to represent the following sentence: “I prefer the morning flight through Denver”.
‘prefer’ is the key concept in this sentence, where the subject ‘I’ prefers a flight that has some properties. All other words merely specify what kind of flight it is.

Source: https://web.stanford.edu/~jurafsky/slp3/14.pdf

In this example, the word ‘prefer’ is therefore the root of the dependency parse.
Example:

• "I" is the subject (Pronoun).


• "prefer" is the main verb (Predicate).
• "the morning flight through Denver" is the
direct object (a noun phrase).
• "the" is the determiner (article).
• "morning flight" is a noun phrase, where
"morning" is an adjective modifying "flight".
• "through Denver" is a prepositional phrase
modifying "flight", where:"through" is the
preposition.
• "Denver" is a noun (proper noun) as the
object of the preposition.
Example: Tree Structure
Example: Tree Structure
Some rules in dependency parsing

• Only heads point to dependents.

• The labels describe what a dependent means to its head.

• This is called a typed dependency structure, because the labels are drawn from a fixed inventory of grammatical relations.
Some of the Universal Dependency Relations
Examples and Visualization

• Example: 'The cat chased the mouse'
  • chased (head) → cat (subject)
  • chased (head) → mouse (object)
  • cat (head) → the (determiner)
  • mouse (head) → the (determiner)
• Example: 'She enjoys playing the piano'
  • enjoys (head) → She (subject)
  • enjoys (head) → playing (object)
  • playing (head) → piano (object)
  • piano (head) → the (determiner)
Examples and Visualization

• Example: 'John gave Mary a book'
  • gave (head) → John (subject)
  • gave (head) → Mary (indirect object)
  • gave (head) → book (direct object)
  • book (head) → a (determiner)
• Example: 'They quickly finished the assignment'
  • finished (head) → They (subject)
  • finished (head) → assignment (object)
  • finished (head) → quickly (adverbial modifier)
  • assignment (head) → the (determiner)
Problems with traditional parsers

• 1. From a statistical perspective, traditional


parsers suffer from the use of millions of
mainly poorly estimated feature weights.

• 2. Almost all existing parsers rely on a


manually designed set of feature templates,
which require a lot of expertise and are
usually incomplete.

• 3. The use of many feature templates causes a


less studied problem: in modern dependency
parsers, most of the runtime is consumed not
by the core parsing algorithm but in the
feature extraction step.
Conclusion

• Dependency Grammar offers a clear representation


of sentence structure.
• It is widely used in applications such as parsing,
machine translation, and information extraction.
• Understanding dependency relations helps improve
the accuracy of NLP models.
Neural Dependency Parsing

Neural dependency parsers typically use one of the following architectures:


🔹 1. BiLSTM-based Dependency Parsing
Bidirectional LSTM (BiLSTM) is a type of Recurrent Neural Network (RNN) that processes text
in both forward and backward directions. It captures both past and future context, making it
effective for dependency parsing.
Each word is represented as a word embedding (e.g., GloVe, FastText).
A BiLSTM layer processes the sequence to encode contextual dependencies.
A feedforward neural network (FFNN) or graph-based model predicts the dependency
relationships.

🔹 2. Transformer-Based Parsing (BERT, GPT, etc.)


Transformers, especially BERT (Bidirectional Encoder Representations from Transformers),
have revolutionized NLP, including dependency parsing.
Instead of LSTMs, it uses self-attention mechanisms to capture dependencies across the
entire sentence.
Pretrained models like BERT, XLNet, and RoBERTa are fine-tuned on dependency parsing
datasets.
Neural Parsing Vs Traditional Parsing
Implementation(Lab)

• Using spaCy
• spaCy is an open-source Python library for
Natural Language Processing.
• To get started, first install spaCy and load
the required language model.

• en_core_web_sm is the smallest English model available in spaCy, with a size of about 12 MB. Refer to the spaCy English models page to view other available models.
spaCy also provides a built-in dependency visualizer called displaCy that can be used to generate dependency graphs for sentences.

The displacy.render() function generates the visualization for a sentence, as sketched below.
Dependency Parsing using SpaCy
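A minimal sketch of the lab setup described above, assuming spaCy and the en_core_web_sm model are installed (pip install spacy && python -m spacy download en_core_web_sm):

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I prefer the morning flight through Denver")

# Print each token with its dependency label and its head
for token in doc:
    print(f"{token.text:10} {token.dep_:10} head: {token.head.text}")

# displacy.render() draws the dependency graph (use displacy.serve(doc) outside notebooks)
displacy.render(doc, style="dep", jupyter=False)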
Part-of-Speech (POS) Tagging

Parts of Speech tagging is a linguistic activity in Natural Language Processing (NLP)


wherein each word in a document is given a particular part of speech (adverb, adjective,
verb, etc.) or grammatical category.
Through the addition of a layer of syntactic and semantic information to the words, this
procedure makes it easier to comprehend the sentence’s structure and meaning.

In NLP applications, POS tagging is useful for machine translation, named entity
recognition, and information extraction, among other things.
It also works well for clearing out ambiguity in terms with numerous meanings and
revealing a sentence’s grammatical structure.
Part-of-Speech (POS) Tagging

Consider the sentence: “The quick brown fox jumps over the lazy dog.”

After performing POS Tagging:


“The” is tagged as determiner (DT)
“quick” is tagged as adjective (JJ)
“brown” is tagged as adjective (JJ)
“fox” is tagged as noun (NN)
“jumps” is tagged as verb (VBZ)
“over” is tagged as preposition (IN)
“the” is tagged as determiner (DT)
“lazy” is tagged as adjective (JJ)
“dog” is tagged as noun (NN)
By offering insights into the grammatical structure, this tagging aids machines in
comprehending not just individual words but also the connections between them inside a
phrase.
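An illustrative sketch that reproduces this tagging with spaCy (spaCy is an assumption here; NLTK's pos_tag is an equally common choice):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

for token in doc:
    # token.pos_ is the coarse universal tag; token.tag_ is the fine-grained Penn Treebank tag (DT, JJ, NN, VBZ, IN, ...)
    print(f"{token.text:10} {token.pos_:6} {token.tag_}")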
Part-of-Speech (POS) Tagging

Advantages of POS Tagging


•Text Simplification: Breaking complex sentences down into their constituent parts makes
the material easier to understand and easier to simplify.
•Information Retrieval: Information retrieval systems are enhanced by part-of-speech (POS) tagging, which allows for more precise indexing and search based on grammatical categories.
•Syntactic Parsing: It facilitates syntactic parsing, which helps with phrase structure
analysis and word link identification.

Disadvantages of POS Tagging


•Ambiguity: The inherent ambiguity of language makes POS tagging difficult since words can
signify different things depending on the context, which can result in misunderstandings.
•Idiomatic Expressions: Slang, colloquialisms, and idiomatic phrases can be problematic for
POS tagging systems since they don’t always follow formal grammar standards.
•Out-of-Vocabulary Words: Out-of-vocabulary words (words not included in the training
corpus) can be difficult to handle since the model might have trouble assigning the correct
POS tags.
