NLP Exam Notes

The document provides an overview of Natural Language Processing (NLP), detailing its advantages, disadvantages, and key components such as Natural Language Understanding (NLU) and Natural Language Generation (NLG). It outlines various applications of NLP, including sentiment analysis, speech recognition, and chatbots, as well as the phases of NLP and text pre-processing techniques. Additionally, it discusses important concepts like lexical databases, morphology analysis, parts of speech tagging, and the significance of stop words in NLP.

Uploaded by

rijalsishir7

Important Topics

 Introduction to NLP
-Natural Language Processing (NLP) is a field of Artificial Intelligence (AI)
that deals with the interaction between computers and human languages. NLP is
used to analyze, understand, and generate natural language text and speech. The
goal of NLP is to enable computers to understand and interpret human language
in a way that is like how humans process language.
Advantages of NLP:
1. Improving Communication: NLP can improve communication by
enabling computers to understand natural language and respond in a way
that is more intuitive for humans.
2. Text Summarization: NLP can be used to summarize large amounts of
text quickly and accurately, allowing users to quickly identify key points.
3. Sentiment Analysis: NLP can be used to analyse the sentiment of text,
allowing businesses to monitor customer feedback and adjust their
strategies accordingly.
4. Personalization: NLP can be used to personalize content for individual
users based on their preferences and behaviour.
5. Automates repetitive tasks: NLP techniques can be used to automate
repetitive tasks, such as text summarization, sentiment analysis, and
language translation, which can save time and increase efficiency.
6. NLP lets users ask questions about any subject and get a direct
response within seconds.
7. NLP aims to give exact answers to a question, without
unnecessary or unwanted information.
8. NLP helps computers communicate with humans in their own languages.
9. It is very time-efficient.
10. Most companies use NLP to improve the efficiency and accuracy of
documentation processes and to identify information in
large databases.
Disadvantages of NLP:
1. Requires large amounts of data: NLP systems require large amounts of
data to train and improve their performance, which can be expensive and
time-consuming to collect.
2. Limited ability to understand idioms and sarcasm: NLP systems have a
limited ability to understand idioms, sarcasm, and other forms of
figurative language, which can lead to misinterpretations or errors in the
output.
3. Limited understanding of context: NLP systems have a limited
understanding of context, which can lead to misinterpretations or errors in
the output.
4. Limited ability to understand emotions: NLP systems have a limited
ability to understand emotions and tone of voice, which can lead to
misinterpretations or errors in the output.
5. Bias: NLP systems may reflect the biases of their developers or training
data, leading to inaccurate or unfair results.
6. NLP output may not show the context behind a result.
7. NLP systems can behave unpredictably.
8. NLP may require more keystrokes.
9. Many NLP systems cannot adapt to a new domain and have limited
functionality, because they are built for a single, specific task.

 Challenges in NLP

 Understanding of NLU & NLG

# NLP has the following two components -

1. Natural Language Understanding (NLU)

Natural Language Understanding (NLU) helps the machine to understand and
analyse human language by extracting metadata from content, such as
concepts, entities, keywords, emotions, relations, and semantic roles. NLU is
mainly used in business applications to understand the customer's problem in
both spoken and written language. More broadly, NLU
refers to the ability of machines to understand and interpret natural language
input from humans. It involves techniques such as syntactic and semantic
analysis, named entity recognition, and sentiment analysis to extract meaning
from text or speech. NLU is used in applications such as chatbots, virtual
assistants, and automated customer service systems to understand and respond
to user queries in a natural language format.

NLU involves the following tasks -

o It is used to map the given input into a useful representation.

o It is used to analyze different aspects of the language.

2. Natural Language Generation (NLG)

Natural Language Generation (NLG) acts as a translator that converts
computerized data into a natural language representation. It mainly involves text
planning, sentence planning, and text realization. More broadly, NLG
refers to the ability of machines to generate natural language text
from structured data or other inputs, producing coherent and
grammatically correct sentences or paragraphs. NLG is used in applications
such as automated report generation, weather forecasts, and personalized
marketing messages.

Both NLU and NLG are essential components of NLP and are used in a variety
of applications to improve human-machine interaction and automate tasks that
involve natural language processing. NLU is focused on understanding natural
language input, while NLG is focused on generating natural language output.
Together, these two subfields enable machines to communicate with humans
more naturally and intuitively.

# Note: NLU is generally considered more difficult than NLG.

 Applications of NLP
- NLP techniques are used in a wide range of applications, including:

1. Question Answering- Question Answering focuses on building
systems that automatically answer the questions asked by humans in a
natural language.
2. Spam Detection- Spam detection is used to detect unwanted e-mails
getting to a user's inbox.
3. Sentiment Analysis- Sentiment Analysis is also known as opinion
mining. It is used on the web to analyse the attitude, behavior, and
emotional state of the sender. This application is implemented through
a combination of NLP and statistics by
assigning a polarity value to the text (positive, negative, or neutral) and
identifying the mood of the context (happy, sad, angry, etc.). It is
useful for tasks such as customer feedback analysis
and social media monitoring.
4. Speech Recognition- Speech recognition is used for converting
spoken words into text. It is used in applications, such as mobile,
home automation, video recovery, dictating to Microsoft Word, voice
biometrics, voice user interface, and so on. NLP techniques are used
to convert speech to text, which is useful for tasks such as dictation
and voice-controlled assistants.
5. Chatbot- Implementing chatbots is one of the important
applications of NLP. Many companies use them to provide
customer chat services.

 NLP Phases with example

- There are generally five steps –
1. Lexical Analysis and Morphological Analysis − This is the first phase
of NLP. It involves identifying and analyzing the structure of words.
The lexicon of a language is the collection of words and phrases in
that language. This phase scans the input text as a stream of
characters, converts it into meaningful lexemes, and divides the whole
text into paragraphs, sentences, and words.
2. Syntactic Analysis (Parsing) − It involves analyzing the words in
the sentence for grammar and arranging them in a manner that shows
the relationships among the words. A sentence such as “The school
goes to the boy” is rejected by the English syntactic analyzer.
Example: Agra goes to the Poonam
In the real world, “Agra goes to the Poonam” does not make any
sense, so this sentence is rejected by the syntactic analyzer.
3. Semantic Analysis − It draws the exact meaning or the dictionary
meaning from the text. The text is checked for meaningfulness. This is
done by mapping syntactic structures to objects in the task domain.
The semantic analyzer rejects phrases such as “hot ice cream”.
Semantic analysis is concerned with meaning representation. It
mainly focuses on the literal meaning of words, phrases, and
sentences.

4. Discourse Integration − The meaning of any sentence depends upon
the meaning of the sentences that precede it. In turn, it also
influences the meaning of the sentences that follow it.

5. Pragmatic Analysis − During this phase, what was said is
reinterpreted in terms of what was actually meant. It involves deriving
those aspects of language that require real-world knowledge. Pragmatic
analysis is the fifth and last phase of NLP. It helps you to discover
the intended effect by applying a set of rules that characterize
cooperative dialogues.

 Introduction to Text Pre Processing

- The various text pre-processing steps are:
 Tokenization
 Lower casing
 Stop words removal
 Stemming
 Lemmatization
Text pre-processing is a critical step in Natural Language Processing (NLP) that
involves cleaning and transforming raw text data into a format that is more
suitable for analysis.

The goal of text pre-processing is to remove noise and irrelevant information
and to extract important features from the text data that can be used for further
analysis. Text pre-processing typically involves the following steps:

1. Tokenization: Breaking the text into individual words or tokens. This
step is essential for many NLP tasks, as it allows the computer to
understand the structure of the text and analyze it at a more granular
level.
2. Lowercasing: Converting all text to lowercase to avoid case sensitivity
issues.
3. Stopword removal: Removing common words that do not carry much
meaning, such as “the”, “and”, and “a”.
4. Stemming or Lemmatization: Reducing words to their root form to
avoid redundancy in the data. For example, “running”, “runs”, and “run”
would all be reduced to “run”.
5. Removing special characters and punctuation: Removing characters
such as emojis or punctuation that may not contribute to the meaning of
the text.
6. Spell correction: Correcting common spelling mistakes to ensure
consistency in the data.

Text pre-processing helps to improve the accuracy and efficiency of NLP
algorithms and is a necessary step in many NLP applications such as sentiment
analysis, and text classification. Proper text pre-processing can also reduce the
size of the dataset and improve the speed and accuracy of machine learning
models.
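
The first steps listed above (tokenization, lowercasing, stop-word removal, and punctuation removal) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production pipeline: the stop-word list and the function name preprocess are assumptions made for the example, and stemming and spell correction are left out.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "a", "an", "in", "of", "to", "is"}


def preprocess(text):
    """Lowercase, tokenize, drop punctuation, and remove stop words."""
    # Lowercasing first avoids case-sensitivity issues ("The" vs "the").
    lowered = text.lower()
    # Tokenization: keep runs of letters, digits, and apostrophes.
    # Punctuation and emojis are dropped because they never match.
    tokens = re.findall(r"[a-z0-9']+", lowered)
    # Stop-word removal.
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("The movie is GREAT, and the acting is superb!"))
# ['movie', 'great', 'acting', 'superb']
```

The order matters here: lowercasing before the stop-word check lets "The" and "the" be treated as the same word.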

Knowledge about the following terms:

 Lexical Databases- A lexicon is a collection of words and/or phrases
along with associated information, like part-of-speech and sense
definitions. Lexical resources are secondary to texts: they are created to
enrich the text.
 # Phrases, # Compound words, and # Idioms- Phrases, compounds, and
idioms are multi-word units that NLP systems must recognize and
interpret as a whole. Here are some
examples:
1. Phrases:
2. Compounds: Compounds are terms made of two or more words that together
name a single concept. Some examples of compound terms from NLP itself
include:

a) Sentiment analysis: It involves using NLP techniques to identify and
extract subjective information from a text, such as opinions, attitudes, and
emotions.

b) Text classification: It involves categorizing a text into predefined classes
or categories, such as spam/not spam, positive/negative sentiment, and so on.

c) Machine translation: It involves using NLP techniques to automatically
translate text from one language to another.

3. Idioms: Idioms are commonly used expressions that have a figurative or
metaphorical meaning. Here are some examples of idioms that an NLP system
may encounter:

a) A penny for your thoughts: This idiom means asking someone to share
their thoughts or opinions about something. An NLP system should read it
as a request for an opinion, not as a statement about money.

b) In hot water: This idiom means being in trouble or facing difficulties. In
sentiment analysis, it signals that the text talks
about a difficult situation.

c) A piece of cake: This idiom means something very easy to do. An NLP
system that reads it literally would mistake it for a text about food.

 Morphology analysis
Morphology analysis in NLP refers to the process of analyzing the
structure of words to identify their morphemes, which are the smallest
units of meaning in a language. Morphology is an important aspect of
NLP because it allows us to understand how words are formed and how
their meaning can be modified by adding prefixes, suffixes, or other
affixes.

The goal of morphology analysis is to identify and segment words into
their constituent morphemes and to assign each morpheme a meaning or
grammatical function. Morphology analysis can be performed using
various techniques, such as rule-based approaches, statistical models, or
machine learning algorithms.

For example, in English, the word "unhappiness" can be analyzed into
three morphemes: "un-" (a prefix meaning "not"), "happy" (the root
word), and "-ness" (a suffix meaning "the state or quality of"). By
breaking down the word into its constituent morphemes, we can
understand that "unhappiness" means "not happy" or "lacking happiness".
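
The "unhappiness" breakdown above can be reproduced with a small rule-based sketch. It is only an illustration of affix stripping: the affix lists and the function name segment are assumptions, and a toy like this will mis-segment words such as "under", where "un" is not really a prefix.

```python
# Toy affix lists (assumptions for illustration, not a full English morphology).
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ful", "ly"]


def segment(word):
    """Split a word into (prefix, root, suffix) using simple affix stripping."""
    # Take the first listed prefix the word starts with, if any.
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    # Take the first listed suffix the remainder ends with, if any.
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    root = rest[: len(rest) - len(suffix)]
    # Undo the y -> i spelling change that happens before a suffix
    # ("happy" + "-ness" is written "happiness").
    if suffix and root.endswith("i"):
        root = root[:-1] + "y"
    return prefix, root, suffix


print(segment("unhappiness"))  # ('un', 'happy', 'ness')
print(segment("kindly"))       # ('', 'kind', 'ly')
```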
 Parts of speech tagging
Part-of-speech (POS) tagging in NLP is the process of assigning each
word in a sentence its appropriate grammatical category, or part of
speech. The categories include nouns, verbs, adjectives, adverbs,
pronouns, prepositions, conjunctions, and interjections.

The POS tagging process involves using contextual information to
determine the most likely part of speech for each word, based on its
position in the sentence and its relationship to other words. The accuracy
of POS tagging can vary depending on the language being analyzed and
the complexity of the sentence structure.

POS tagging can be performed using various techniques, such as rule-based
approaches, statistical models, or machine learning algorithms.
Some common techniques include Hidden Markov Models (HMMs),
Maximum Entropy Markov Models (MEMMs), and Conditional Random
Fields (CRFs).

Here are some examples of POS tagging:

Sentence: "The cat sat on the mat."

Tagged Sentence: "The (DT) cat (NN) sat (VBD) on (IN) the (DT) mat (NN) . (.)"

In this example, "DT" represents a determiner (such as "the"), "NN"
represents a noun, "VBD" represents a past tense verb, and "IN" represents
a preposition.

Sentence: "I am eating a delicious pizza."

Tagged Sentence: "I (PRP) am (VBP) eating (VBG) a (DT) delicious (JJ)
pizza (NN) . (.)"

In this example, "PRP" represents a pronoun (such as "I"), "VBP"
represents a present tense verb, "VBG" represents a verb in the present
participle form, and "JJ" represents an adjective (such as "delicious").

POS tagging is an important step in many NLP applications, such as text
classification, sentiment analysis, and machine translation, as it provides
valuable information about the structure and meaning of natural language
text.
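
As a rough illustration of the tagged sentences above, here is a minimal lookup-based tagger. It is a baseline sketch only: the lexicon is hand-written for the two example sentences (an assumption), unknown words default to NN, and it ignores context entirely, which is exactly what HMM, MEMM, and CRF taggers add.

```python
import re

# Hand-written lexicon mapping words to Penn Treebank tags (an assumption
# that only covers the two example sentences).
LEXICON = {
    "the": "DT", "a": "DT", "cat": "NN", "mat": "NN", "pizza": "NN",
    "sat": "VBD", "on": "IN", "i": "PRP", "am": "VBP",
    "eating": "VBG", "delicious": "JJ",
}


def tag(sentence):
    """Return (token, tag) pairs using lexicon lookup with simple fallbacks."""
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    tagged = []
    for token in tokens:
        if not token[0].isalnum():
            # Simplification: every punctuation mark gets the "." tag.
            tagged.append((token, "."))
        else:
            # Unknown words default to NN, a common baseline guess.
            tagged.append((token, LEXICON.get(token.lower(), "NN")))
    return tagged


print(tag("The cat sat on the mat."))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'),
#  ('the', 'DT'), ('mat', 'NN'), ('.', '.')]
```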

 # Stop words (English and regional languages)

Stop words are words that are commonly used in a language but are
considered to have little or no meaning in the context of natural language
processing (NLP). These words are often removed from text data during
text pre-processing to improve the accuracy of NLP models and reduce
the computational complexity of processing large volumes of text data.

Some examples of stop words in English include "the", "and", "a", "an",
"in", "of", "to", "is", "that", and "it". However, the list of stop words may
vary depending on the specific NLP task and the characteristics of the
text data being analyzed.

Here's an example of how stop words are removed from text data
during pre-processing:

Original text: The quick brown fox jumps over the lazy dog.
Stop words removed: quick brown fox jumps lazy dog.

In this example, the stop words "the" (twice) and "over" have been removed from
the original text to create a cleaner version of the text that is easier to analyze
with NLP techniques.

It's worth noting that not all NLP tasks require the removal of stop
words. In some cases, stop words may be important for understanding
the meaning and context of text data. Therefore, it's important to
carefully consider the specific NLP task and the characteristics of the
text data when deciding whether to remove stop words.
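
The trade-off described above can be made concrete with a short sketch. The stop-word list here is an assumption chosen for the example; it deliberately includes "not" to show how removing a negation can change what a sentence appears to say.

```python
# Illustrative stop-word list (an assumption); note that it contains "not".
STOP_WORDS = {"the", "over", "is", "a", "not"}


def remove_stop_words(text, stop_words):
    """Drop every whitespace-separated token whose lowercase form is a stop word."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


print(remove_stop_words("The quick brown fox jumps over the lazy dog", STOP_WORDS))
# quick brown fox jumps lazy dog

review = "The movie is not good"
print(remove_stop_words(review, STOP_WORDS))            # movie good
print(remove_stop_words(review, STOP_WORDS - {"not"}))  # movie not good
```

For a task such as sentiment analysis, the second variant (keeping "not") is usually the safer choice, because "movie good" reads as the opposite of the original review.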

 Types of stemmers and lemmatizers

In natural language processing (NLP), stemming and lemmatization are two
common techniques used to reduce inflectional forms of words to their base
forms. Stemming involves removing the suffixes from words to extract their
root form, while lemmatization involves mapping words to their base or
dictionary form. Here are some types of stemmers and lemmatizers commonly
used in NLP:
Stemmer: Stemming is a text normalization technique that reduces
inflected words to their root forms, which aids in the pre-processing of
text, words, and documents.

1. Porter stemmer: The Porter stemmer is one of the most widely used
stemmers in NLP. It is a rule-based algorithm that applies a series of rules
to remove common suffixes from words. The Porter stemmer is often
used in information retrieval and text mining applications. Here's an
example:
Original Word: walking
Stemmed Word: walk

2. Snowball stemmer: The Snowball stemmer, also known as the Porter2
stemmer, is an improved version of the Porter stemmer. It is a more
aggressive stemmer that applies a wider range of rules to extract the root
form of words. The Snowball stemmer is commonly used in search
engines and text classification systems. Here's an example:
Original Word: cats
Stemmed Word: cat

3. Lancaster stemmer: The Lancaster stemmer is a highly aggressive
stemmer that applies its rules iteratively to extract the root form of
words. It is fast, but it tends to over-stem and can produce
very short or unusual stems. The Lancaster stemmer is often used in
information retrieval and text mining applications. Here's an example:
Original Word: computers
Stemmed Word: comput

4. Regexp stemmer: The Regexp stemmer identifies morphological affixes
using regular expressions. Substrings matching the regular expressions
are discarded. RegexpStemmer is a class in NLTK that
implements this technique. Here’s an example:
mass ---> mas
was ---> was
bee ---> bee
computer ---> computer
advisable ---> advis
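
The regular-expression stemming idea can be sketched with Python's standard re module, without NLTK. The notes do not state which pattern produced the output above, so the pattern and minimum length below are assumptions chosen because they are consistent with it; the function name regexp_stem is also hypothetical.

```python
import re


def regexp_stem(word, pattern=r"ing$|s$|e$|able$", min_len=4):
    """Delete the part of `word` that matches `pattern`.

    Words shorter than `min_len` are returned unchanged.
    """
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)


for w in ["mass", "was", "bee", "computer", "advisable"]:
    print(w, "--->", regexp_stem(w))
# mass ---> mas
# was ---> was
# bee ---> bee
# computer ---> computer
# advisable ---> advis
```

The minimum length explains why "was" and "bee" are left alone even though they end in a listed suffix.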

Lemmatizer: The purpose of lemmatization is the same as that of stemming, but it
overcomes the drawbacks of stemming. Stemming can produce stems that are not
meaningful words; for example, “history” may be reduced to “histori”.
Lemmatization instead returns a meaningful dictionary word.
1. WordNet lemmatizer: The WordNet lemmatizer is a dictionary-based
lemmatizer that maps words to their base form using the WordNet lexical
database. It relies on the word's part-of-speech tag to identify the correct
lemma. The WordNet lemmatizer is commonly used in natural
language generation and text classification systems. Here's an example:
Original Word: running (tagged as a verb)
Lemmatized Word: run

2. Stanford lemmatizer: The Stanford lemmatizer is a machine learning-based
lemmatizer that uses a statistical model to map words to their base
form. It uses part-of-speech tagging and context-based features to identify
the correct lemma for a given word. The Stanford lemmatizer is
commonly used in named entity recognition and machine translation
systems. Here's an example:
Original Word: went
Lemmatized Word: go

Overall, the choice of stemmer or lemmatizer depends on the specific NLP
application and the characteristics of the text data being analyzed. Each
stemmer or lemmatizer has its strengths and weaknesses, and different
techniques may be more appropriate for different types of text data or language
domains.

It's worth noting that stemmers and lemmatizers can produce different results
depending on the context and the specific algorithm used. Therefore, it's
important to choose the appropriate stemmer or lemmatizer based on the
specific NLP task and the characteristics of the text data being analyzed.
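
To make the difference from stemming concrete, here is a toy dictionary-based lemmatizer. It is a sketch under stated assumptions: the vocabulary, the irregular-form table, and the suffix rules are all made up for the example and are far smaller than a real resource such as WordNet. The point it illustrates is that a lemmatizer only returns real dictionary words and that the answer depends on the part of speech.

```python
# Tiny exception dictionary keyed by (word, part of speech). The entries are
# illustrative assumptions, not data taken from WordNet.
IRREGULAR = {
    ("went", "v"): "go",
    ("better", "a"): "good",
    ("mice", "n"): "mouse",
}
# Ordered suffix rules per part of speech: (suffix to remove, replacement).
RULES = {
    "v": [("ing", ""), ("ed", ""), ("s", "")],
    "n": [("ies", "y"), ("s", "")],
}
# Words the toy lemmatizer accepts as valid lemmas.
VOCABULARY = {"run", "walk", "cat", "history", "go", "good", "mouse"}


def lemmatize(word, pos="n"):
    """Return the dictionary form of `word` for the given part of speech."""
    word = word.lower()
    if (word, pos) in IRREGULAR:
        return IRREGULAR[(word, pos)]
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Undo consonant doubling ("running" -> "runn" -> "run").
            if (candidate not in VOCABULARY and len(candidate) > 2
                    and candidate[-1] == candidate[-2]):
                candidate = candidate[:-1]
            # Only accept a candidate that is a real dictionary word.
            if candidate in VOCABULARY:
                return candidate
    return word  # fall back to the word itself


print(lemmatize("running", "v"))    # run
print(lemmatize("running", "n"))    # running
print(lemmatize("went", "v"))       # go
print(lemmatize("histories", "n"))  # history
```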

 # Multi-word expression
Multi-word expressions (MWEs) in NLP refer to phrases or groups of words
that function as a single unit and carry a specific meaning that cannot be easily
inferred from the individual words alone. MWEs can include idioms,
collocations, phrasal verbs, and other fixed expressions that are commonly used
in language.

MWEs present a challenge for NLP tasks such as parsing, machine translation,
and sentiment analysis because their meaning is often not predictable from the
individual words in the expression. For example, the expression "kick the
bucket" means "to die", but this meaning cannot be inferred from the individual
words "kick" and "bucket".
To deal with MWEs in NLP, various approaches have been developed,
including:

1. Identification and extraction: This involves identifying MWEs in text and
extracting them as a single unit for further analysis. This can be done
using pattern-based approaches or statistical models.
2. Composition: This involves assigning a compositional meaning to the
MWE based on the individual words in the expression. For example, the
literal meaning of the MWE "hot potato" can be inferred from the meanings of
the words "hot" and "potato".
3. Disambiguation: This involves disambiguating the meaning of the MWE
based on the context in which it is used. For example, the expression "let
the cat out of the bag" can mean "to reveal a secret" or, literally, "to let
something escape", depending on the context.

MWEs are an important aspect of NLP because they are commonly used in
language and can significantly affect the meaning of the text. Proper handling of
MWEs can improve the accuracy and efficiency of NLP applications and
contribute to a more accurate understanding of natural language text.
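
The identification and extraction step can be sketched as a pattern-based lookup that merges known expressions into single tokens. This is a minimal sketch: the MWE lexicon and the function name merge_mwes are assumptions for illustration, and the greedy longest-match strategy shown here does no disambiguation between literal and idiomatic uses.

```python
# Hypothetical MWE lexicon; a real system would load or learn a larger list.
MWE_LEXICON = {
    ("kick", "the", "bucket"),
    ("hot", "potato"),
    ("let", "the", "cat", "out", "of", "the", "bag"),
}
MAX_LEN = max(len(mwe) for mwe in MWE_LEXICON)


def merge_mwes(tokens):
    """Greedy left-to-right, longest-match merging of known MWEs."""
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in MWE_LEXICON:
                merged.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No MWE starts here, so keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_mwes("He will kick the bucket soon".split()))
# ['He', 'will', 'kick_the_bucket', 'soon']
```

Once merged, "kick_the_bucket" can be looked up as one unit, so later stages do not try to interpret "kick" and "bucket" separately.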

 NLTK
Natural Language Toolkit (NLTK) is a popular open-source platform for
building Python programs that work with human language data. It is a
comprehensive library of tools and algorithms for tasks such as tokenization,
stemming, tagging, parsing, and machine learning, as well as a variety of corpus
resources.

NLTK provides a wide range of functionalities for NLP, such as:

1. Tokenization: Splitting text into individual words or sentences.
2. Stemming: Reducing words to their base or root form.
3. Part-of-speech tagging: Assigning each word in a sentence its appropriate
grammatical category.
4. Chunking: Grouping words into "chunks" based on their part-of-speech
tags.
5. Parsing: Analyzing the grammatical structure of a sentence.
6. Sentiment analysis: Determining the sentiment or opinion expressed in a
piece of text.
7. Machine learning: Building and training NLP models using machine
learning algorithms.
NLTK also provides access to a variety of corpora, including text collections in
multiple languages, as well as pre-trained models and algorithms for various
NLP tasks.

NLTK is widely used in research and academia, as well as in industry for
various applications such as chatbots, sentiment analysis, and text classification.
It is also a popular tool for teaching NLP and computational linguistics.

 Introduction to Machine learning

Machine learning (ML) is a subfield of artificial intelligence (AI) that focuses on
building systems that can automatically learn from data and improve their
performance on a specific task without being explicitly programmed. ML
algorithms can learn patterns and relationships in data through training and
use this knowledge to make predictions or decisions on new, unseen data.

ML can be divided into three main categories: supervised learning,
unsupervised learning, and reinforcement learning.

1. Supervised learning: In supervised learning, the algorithm is trained on
labeled data, meaning that each example in the training dataset is labeled
with the correct output. The algorithm learns to map inputs to outputs
based on this labeled data and can make predictions on new, unseen data.
Examples of supervised learning tasks include classification (predicting a
discrete output) and regression (predicting a continuous output).
2. Unsupervised learning: In unsupervised learning, the algorithm is
trained on unlabeled data, meaning that there is no predefined output for
each example. The algorithm learns to identify patterns and relationships
in the data without guidance, such as clustering similar data points
together or reducing the dimensionality of the data. Examples of
unsupervised learning tasks include clustering and dimensionality
reduction.
3. Reinforcement learning: In reinforcement learning, the algorithm learns
by interacting with an environment and receiving rewards or penalties for
certain actions. The algorithm learns to take actions that maximize the
expected reward over time, such as playing a game or controlling a robot.

ML algorithms can be implemented using various techniques, such as decision
trees, linear regression, logistic regression, support vector machines, neural
networks, and deep learning.
ML has numerous applications in various fields, including computer vision,
natural language processing, speech recognition, recommender systems, fraud
detection, and autonomous vehicles. The increasing availability of data and
computing power has led to a surge in ML research and applications in recent
years.

 Data mining, Text mining, and Opinion mining

# Data mining: Data mining involves the process of discovering patterns
and relationships in large datasets. It involves the use of machine learning
algorithms and statistical techniques to analyze data and extract useful
information. In NLP, data mining can be used to extract insights from
large collections of text data, such as social media posts or customer
reviews. Examples of data mining techniques in NLP include sentiment
analysis, topic modeling, and named entity recognition.

# Text mining: Text mining, also known as text analytics, is the process
of analyzing unstructured text data to extract useful information. It
involves techniques such as natural language processing, machine
learning, and information retrieval to identify patterns and relationships in
text data. Text mining can be used for a variety of applications, such as
information extraction, text classification, and text summarization.

# Opinion mining: Opinion mining, also known as sentiment analysis, is
the process of identifying and extracting subjective information from text
data, such as opinions, attitudes, and emotions. Opinion mining
techniques can be used to analyze customer reviews, social media posts,
and other forms of user-generated content to gain insights into customer
sentiment and preferences. Examples of opinion-mining techniques in
NLP include lexicon-based methods, machine learning-based methods,
and deep learning-based methods.
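
Of the opinion-mining techniques just listed, the lexicon-based method is the simplest to sketch. The polarity scores and negation list below are assumptions made up for the example; real sentiment lexicons are much larger and more carefully weighted.

```python
import re

# Toy polarity lexicon (assumed scores for illustration only).
POLARITY = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Label text positive, negative, or neutral by summing word polarities."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True  # flip the polarity of the next opinion word
            continue
        value = POLARITY.get(token, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great"))  # positive
print(sentiment("The battery is not good"))                 # negative
print(sentiment("It arrived on Monday"))                    # neutral
```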

All three areas, when applied to text, use machine learning algorithms and
statistical techniques to extract insights from large amounts of text data. By
applying these techniques, businesses and organizations can gain valuable
insights into customer behavior, preferences, and opinions, and use this
knowledge to inform business decisions and improve customer satisfaction.

 Syntax Analysis or Parsing

Syntactic analysis, also called parsing or syntax analysis, is the second phase
of NLP. The purpose of this phase is to check whether the text is well formed
according to the rules of formal grammar and to expose its grammatical
structure. Drawing the exact or dictionary meaning from the text is the job of
the next phase, semantic analysis: for example, a phrase like “hot ice
cream” would be rejected by the semantic analyzer.
In this sense, syntactic analysis or parsing may be defined as the process of
analyzing the strings of symbols in natural language conforming to the rules of
formal grammar. The origin of the word ‘parsing’ is from the Latin
word ‘pars’ which means ‘part’.
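
Checking a sentence against the rules of a formal grammar can be illustrated with a very small recognizer. The grammar and lexicon below are toy assumptions written for this example, and the code only answers whether a sentence fits the grammar; it does not build a parse tree.

```python
# Toy context-free grammar (an assumption for illustration):
#   S  -> NP VP
#   NP -> Det N
#   VP -> V PP | V NP | V
#   PP -> P NP
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "PP"], ["V", "NP"], ["V"]],
    "PP": [["P", "NP"]],
}
# Word categories; each word has exactly one category in this sketch.
LEXICON = {
    "the": "Det", "a": "Det",
    "cat": "N", "mat": "N", "dog": "N",
    "sat": "V", "saw": "V",
    "on": "P",
}


def parse(symbol, tokens, start):
    """Return every position where `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:  # word category such as Det, N, V, P
        if start < len(tokens) and LEXICON.get(tokens[start]) == symbol:
            return {start + 1}
        return set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        # Match each part of the production in sequence.
        for part in production:
            positions = {e for p in positions for e in parse(part, tokens, p)}
        ends |= positions
    return ends


def is_grammatical(sentence):
    """True if the whole sentence can be derived from S."""
    tokens = sentence.lower().split()
    return len(tokens) in parse("S", tokens, 0)


print(is_grammatical("the cat sat on the mat"))  # True
print(is_grammatical("cat the sat mat on the"))  # False
```

The second sentence uses the same words as the first but in an order that no rule allows, so the recognizer rejects it.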

 # Text classifications
Text classification is a subfield of natural language processing (NLP) that
involves assigning predefined categories or labels to text based on its content. It
is also known as text categorization or text tagging.
Text classification is used in a variety of applications such as spam filtering,
sentiment analysis, language identification, topic modeling, and more. It
involves using machine learning algorithms to train models that can
automatically classify text into different categories or labels.
There are several approaches to text classification, including rule-based
methods, machine learning methods, and deep learning methods. Rule-based
methods involve defining a set of rules or patterns that can be used to classify
text. Machine learning methods involve training a model on a labeled dataset,
and then using the trained model to classify new text. Deep learning methods
use neural networks to automatically learn features from text and classify it into
different categories.
Some popular machine learning algorithms used for text classification include
Naive Bayes, Support Vector Machines (SVM), and decision trees. Deep
learning models like Convolutional Neural Networks (CNN) and Recurrent
Neural Networks (RNN) have also shown promising results in text classification
tasks.
Overall, text classification is a crucial task in NLP, as it allows us to
automatically categorize and analyze large amounts of text data, making it
easier to extract insights and make informed decisions.
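
As a rough sketch of the machine learning approach described above, here is a small Naive Bayes text classifier written with only the standard library. The training sentences are made up for the example, and the function names train and predict are assumptions; a real system would use far more data and a proper tokenizer.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (text, label). Returns the counts Naive Bayes needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability for `text`."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        # log P(label) + sum of log P(word | label) with add-one smoothing
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # skip words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for illustration.
model = train([
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
])
print(predict("free prize money", model))        # spam
print(predict("team meeting on monday", model))  # ham
```

Add-one smoothing keeps a word that never appeared with a label from forcing that label's probability to zero.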

