Chapter 6
Natural Language Processing
Melaku M.
Outline
In this chapter, you’ll learn about:
►Basics of NLP
►Components of NLP
►Build NLP pipeline
►Phases of NLP
►NLP Ambiguity or Challenges
►Discover the most popular NLP applications in business.
Introduction to Natural Language Processing (NLP)
Have you ever wondered how robots such as Sophia or
home assistants sound so humanlike?
How do they understand you?
• All of this is because of the magic of NLP.
What is Natural Language Processing (NLP)?
NLP lies at the intersection of Computer Science and Linguistics, and it is
concerned with communication between computers and humans in
natural language.
It is the technology that allows computers to understand and generate
human language, whether written, spoken, or even scribbled.
NLP focuses on making human communication, such as speech and text,
comprehensible to machines.
Components of NLP
• There are two basic NLP components.
Natural Language Understanding (NLU)
• NLU is the process of reading and interpreting language.
Natural Language Generation (NLG)
• NLG is the process of writing or generating language.
Build NLP pipelines
• The following steps are used to build an NLP pipeline:
1. Sentence Segmentation
Breaks a large piece of text into its constituent sentences.
• Example:
• Independence Day is one of the important festivals for every Indian citizen. It
is celebrated on the 15th of August each year ever since India got
independence from the British rule. The day celebrates independence in the
true sense.
• Sentence Segmentation output:
“Independence Day is one of the important festivals for every Indian
citizen.”
“It is celebrated on the 15th of August each year ever since India got
independence from the British rule.”
“The day celebrates independence in the true sense.”
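Sentence segmentation can be sketched with a single regular expression. This is a naive, rule-based illustration (the function name `split_sentences` is ours, not a library API): it assumes a sentence ends at “.”, “!” or “?” followed by whitespace, so abbreviations such as “Dr.” would be split incorrectly. Trained segmenters such as NLTK's Punkt tokenizer handle those cases.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when followed by whitespace.

    Naive sketch: assumes no abbreviations like "Dr." or "e.g." in the text.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

text = (
    "Independence Day is one of the important festivals for every Indian citizen. "
    "It is celebrated on the 15th of August each year ever since India got "
    "independence from the British rule. The day celebrates independence in the true sense."
)

for sentence in split_sentences(text):
    print(sentence)
```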
2. Word Tokenization
Breaks a sentence into linguistic units called tokens, such as words,
punctuation marks, numbers, and alphanumeric strings.
• Example:
• Microsoft offers Corporate Training, Online Training, and Winter Training.
• Word Tokenizer output:
• “Microsoft”, “offers”, “Corporate”, “Training”, “,”, “Online”, “Training”, “,”,
“and”, “Winter”, “Training”, “.”
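Word tokenization can be sketched with one regular expression. This is a minimal illustration, not a production tokenizer (the function name `tokenize` is ours): `\w+` captures runs of letters and digits, and `[^\w\s]` captures each punctuation mark as its own token. It would, for instance, split “don't” into three tokens, a case real tokenizers handle with extra rules.

```python
import re

def tokenize(sentence: str) -> list[str]:
    # \w+ matches runs of letters, digits and underscores (words, numbers,
    # alphanumerics); [^\w\s] matches any single punctuation character.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("Microsoft offers Corporate Training, Online Training, and Winter Training.")
print(tokens)
```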
3. Stemming
Normalizes words into their base or root form. A stemming algorithm works by
cutting off the beginning or the end of a word, based on a list of common
prefixes and suffixes found in inflected words.
Example:
Words such as “affects”, “affected”, and “affecting” are all reduced to the root word “affect”.
The issue with stemming is that it sometimes produces a root that has no
meaning and does not appear in the dictionary (e.g., “studies” → “studi”).
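The suffix-stripping idea behind stemming can be sketched in a few lines. The suffix list and the minimum stem length of three letters are assumptions made for illustration; real stemmers such as the Porter stemmer apply ordered rule sets with extra conditions. The sketch also reproduces the drawback noted above: “studies” becomes the non-word “studi”.

```python
# Hypothetical suffix list, ordered longest first so "ions" is tried before "s".
SUFFIXES = ("ation", "ions", "ing", "ion", "ed", "es", "s")
MIN_STEM = 3  # assumed minimum stem length, so short words like "is" survive

def simple_stem(word: str) -> str:
    """Strip the first (longest) listed suffix that leaves a long-enough stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w

for word in ["affects", "affected", "affecting", "affection", "affectation", "studies"]:
    print(word, "->", simple_stem(word))
```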
4. Lemmatization
Lemmatization also normalizes words into their base or root form, called the
lemma, by grouping the different inflected forms of a word.
Lemmatization is quite similar to stemming. The difference is that
lemmatization always produces a root word that has a meaning.
Example: The terms “is”, “are”, “am”, “were”, and “been” are grouped under the
lemma “be”. The terms “went”, “gone”, “going”, etc. are grouped under the
lemma “go”.
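Lemmatization can be sketched as a dictionary lookup. The table below is a hypothetical fragment covering only the example words; real lemmatizers (for example, WordNet-based ones) use a full lexicon and usually need the word's part of speech. No suffix-stripping rule could turn “went” into “go”, which is why lemmatization relies on a dictionary.

```python
# Hypothetical lookup table for illustration only (word form -> lemma).
LEMMA_TABLE = {
    "is": "be", "are": "be", "am": "be", "was": "be", "were": "be", "been": "be",
    "went": "go", "gone": "go", "going": "go", "goes": "go",
}

def lemmatize(word: str) -> str:
    """Return the lemma; words missing from the table are returned lowercased."""
    w = word.lower()
    return LEMMA_TABLE.get(w, w)

print([lemmatize(w) for w in ["is", "were", "went", "gone", "training"]])
```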
5. Removing stop words
In English, many words appear very frequently, such as “is”, “are”, “and”,
“the”, and “a”. NLP pipelines flag these words as stop words, and stop
words are often filtered out before any statistical analysis.
Example:
The stop words “the”, “are”, and “at” are filtered out of a sentence.
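Stop-word removal is a simple filter over tokens. The sketch below uses a small, hypothetical stop-word list and an invented example sentence; libraries such as NLTK and spaCy ship much longer lists.

```python
# Hypothetical stop-word list for illustration.
STOP_WORDS = {"is", "are", "and", "the", "a", "at"}

def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only tokens whose lowercase form is not a stop word."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]

tokens = "The children are playing at the park".split()
print(remove_stop_words(tokens))
```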
6. Part-of-Speech (POS) tagging
POS stands for parts of speech, such as noun, verb, adjective, and adverb.
A POS tag indicates how a word functions, in meaning as well as
grammatically, within a sentence. The same word can take more than one part
of speech depending on its context (e.g., “book” is a noun in “read a book” but a verb in “book a ticket”).
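The context dependence of POS tags can be shown with a tiny rule-based tagger. The lexicon, the tag names (loosely following Universal Dependencies names such as NOUN, VERB, DET), and the single disambiguation rule (after a determiner, an ambiguous word is a noun) are hypothetical; practical taggers learn these decisions from annotated corpora.

```python
# Hypothetical mini-lexicon: "NOUN|VERB" marks a word that can take either tag.
LEXICON = {
    "i": "PRON", "read": "VERB", "a": "DET", "the": "DET",
    "ticket": "NOUN", "book": "NOUN|VERB",
}
DETERMINERS = {"a", "an", "the"}

def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    tagged = []
    for i, token in enumerate(tokens):
        tag = LEXICON.get(token.lower(), "NOUN")  # assume unknown words are nouns
        if "|" in tag:
            # Toy context rule: after a determiner -> noun, otherwise -> verb.
            prev = tokens[i - 1].lower() if i > 0 else ""
            tag = "NOUN" if prev in DETERMINERS else "VERB"
        tagged.append((token, tag))
    return tagged

print(pos_tag("I read a book".split()))
print(pos_tag("Book the ticket".split()))
```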
7. Named Entity Recognition
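• Named entity recognition (NER) aims to locate entities in a piece of text and classify them into predefined
categories, such as person, organization, location, and date. The input to such a model is generally text, and the output is the
named entities found in it, each with its category.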
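Production NER systems are trained statistical or neural models, but the input/output contract can be illustrated with a rule-based sketch: a small gazetteer (a list of known names) plus a regular expression for dates. The gazetteer entries, category names, and date pattern are assumptions made for this example.

```python
import re

# Hypothetical gazetteer (phrase -> category) for illustration only.
GAZETTEER = {
    "Independence Day": "EVENT",
    "India": "LOCATION",
    "Hyderabad": "LOCATION",
    "Microsoft": "ORGANIZATION",
}
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
# Matches dates written like "15th of August".
DATE_PATTERN = re.compile(r"\b\d{1,2}(?:st|nd|rd|th) of (?:" + MONTHS + r")\b")

def find_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for phrase, label in GAZETTEER.items():
        # \b on both sides stops "India" from matching inside "Indian".
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((m.start(), m.group(), label))
    for m in DATE_PATTERN.finditer(text):
        found.append((m.start(), m.group(), "DATE"))
    return [(span, label) for _, span, label in sorted(found)]

print(find_entities("Independence Day is celebrated in India on the 15th of August."))
```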
Phases of NLP
1. Lexical (Morphological) analysis
Separates words into individual morphemes (the smallest meaningful linguistic units, e.g., “dog”)
and identifies the class of each morpheme. It recognizes each word and its category
using a dictionary (word + category).
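Dictionary-based morphological analysis can be sketched as follows: look the word up directly, otherwise try to split off a known suffix whose stem is in the dictionary with the right category. The dictionary entries, suffix rules, and feature labels (such as `PLURAL`) are hypothetical; real analyzers typically use finite-state transducers built from much larger lexicons.

```python
# Hypothetical dictionary (word + category) and suffix rules, for illustration.
LEXICON = {"dog": "NOUN", "cat": "NOUN", "walk": "VERB", "bark": "VERB"}
SUFFIX_RULES = [
    # (suffix, category the stem must have, feature the suffix marks)
    ("s", "NOUN", "PLURAL"),
    ("s", "VERB", "3SG-PRESENT"),
    ("ed", "VERB", "PAST"),
    ("ing", "VERB", "PROGRESSIVE"),
]

def analyze(word: str) -> list[tuple[str, str]]:
    """Split a word into morphemes and label each with its class."""
    w = word.lower()
    if w in LEXICON:
        return [(w, LEXICON[w])]
    for suffix, stem_category, feature in SUFFIX_RULES:
        stem = w[: -len(suffix)]
        if w.endswith(suffix) and LEXICON.get(stem) == stem_category:
            return [(stem, stem_category), ("-" + suffix, feature)]
    return [(w, "UNKNOWN")]

for word in ["dogs", "walks", "barked", "walking", "dog"]:
    print(word, analyze(word))
```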
2. Syntactic analysis (parsing)
Syntactic analysis checks grammar and word arrangement, and shows the
relationships among the words.
It assesses how well a sentence conforms to the rules of a grammar.
It may reject ill-formed sentences that do not satisfy the grammar rules.
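To see how a grammar can accept or reject a word arrangement, here is a toy context-free grammar checked by a recursive-descent recognizer. The grammar, the lexicon, and the function names are hypothetical. Each function returns the set of positions where its phrase can end, so alternatives such as VP → V | V NP are explored together. A full parser would also return the parse tree that shows how the words relate.

```python
# Hypothetical toy grammar:
#   S  -> NP VP
#   NP -> Det N
#   VP -> V NP | V
LEXICON = {
    "Det": {"the", "a"},
    "N": {"dog", "cat", "ball"},
    "V": {"chased", "saw", "sleeps"},
}

def match_word(category, tokens, i):
    """Positions reachable after reading one word of `category` at position i."""
    return {i + 1} if i < len(tokens) and tokens[i] in LEXICON[category] else set()

def noun_phrase(tokens, i):
    return {k for j in match_word("Det", tokens, i) for k in match_word("N", tokens, j)}

def verb_phrase(tokens, i):
    ends = set()
    for j in match_word("V", tokens, i):
        ends.add(j)                     # VP -> V
        ends |= noun_phrase(tokens, j)  # VP -> V NP
    return ends

def is_grammatical(sentence: str) -> bool:
    tokens = sentence.lower().split()
    ends = {k for j in noun_phrase(tokens, 0) for k in verb_phrase(tokens, j)}
    return len(tokens) in ends          # S must cover every token

print(is_grammatical("the dog chased a cat"))
print(is_grammatical("dog the chased a cat"))
```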
3. Semantic analysis
Semantic analysis is concerned with meaning representation, for example
transforming a sentence into a logical form or a semantic network. It
mainly focuses on the literal meaning of words, phrases, and sentences.
4. Discourse analysis
• Semantics beyond individual sentences
• A discourse is a sequence of sentences. Understanding discourse structure is
extremely important for dialog systems.
• Understanding a text
• Who/when/where/what ... are involved in an event?
• How to connect the semantic representations of different sentences?
• Example 1:
• He hits the car with a stone. It bounces back. (Does “It” refer to the car or the stone?)
• Example 2: Consider the following dialog:
• When does the bus to Hyderabad leave?
• There is one at 10 a.m. and one at 1 p.m.
• Give me two tickets for the earlier one, please. (“The earlier one” refers to the 10 a.m. bus mentioned in the previous turn.)
5. Pragmatic analysis
Practical usage of language: what a sentence means in practice.
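• Do you have time? (usually asks whether the listener is free to talk)
• How do you do? (a greeting, not a question about how you do things)
• It is too cold to go outside! (may be a suggestion to stay indoors rather than a weather report)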
Pragmatic analysis is concerned with the purposeful use of language in
situations and utilizes context over and above the contents of the text for
understanding.
NLP: Ambiguity (Challenges)
NLP is difficult because ambiguity and uncertainty exist in natural language.
Three kinds of ambiguity exist in natural language: lexical, syntactic, and referential.
Lexical ambiguity exists when a single word has two or more possible meanings in a sentence
(e.g., “bank” can mean a financial institution or the land beside a river).
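Lexical ambiguity is tackled by word sense disambiguation. The sketch below follows the idea of the simplified Lesk algorithm: choose the sense whose dictionary definition (gloss) shares the most words with the sentence. The two senses of “bank”, their glosses, and the stop-word list are hypothetical, chosen only for illustration.

```python
# Hypothetical sense inventory for "bank": sense label -> gloss.
SENSES = {
    "bank/finance": "an institution that accepts deposits of money and lends money",
    "bank/river": "the sloping land beside a river or lake where water flows",
}
STOP_WORDS = {"a", "an", "the", "of", "and", "that", "to", "he", "she",
              "on", "at", "or", "where", "beside", "in"}

def content_words(text: str) -> set[str]:
    """Lowercased words with trailing punctuation and stop words removed."""
    return {w.strip(".,").lower() for w in text.split()} - STOP_WORDS

def disambiguate(sentence: str) -> str:
    """Pick the sense whose gloss overlaps most with the sentence's content words."""
    context = content_words(sentence)
    return max(SENSES, key=lambda sense: len(context & content_words(SENSES[sense])))

print(disambiguate("He deposited money at the bank."))
print(disambiguate("She sat on the bank of the river watching the water."))
```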
Syntactic ambiguity exists when a sentence has two or more possible meanings
because it can be parsed in more than one way (e.g., “I saw the man with the telescope”).
Referential ambiguity exists when you refer to something using a
pronoun that could point to more than one entity. E.g., Rita went to Sunita. She said, “I am hungry.” (Who is hungry, Rita or Sunita?)
Some of the most popular Applications of NLP
• Question Answering: Question Answering focuses on building
systems that automatically answer the questions asked by humans
in a natural language.
• Spam Detection: Spam detection is used to stop unwanted e-mails from
reaching a user's inbox.
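Spam detection is often framed as text classification. A classic baseline is multinomial Naive Bayes. The sketch below trains it on four hypothetical e-mails (the data, labels, and function names are illustrative assumptions) and uses add-one smoothing so that a word never seen in one class does not zero out that class.

```python
import math
from collections import Counter

# Hypothetical toy training set: (e-mail text, label).
TRAINING = [
    ("win a free prize now", "spam"),
    ("claim your free money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

def train(examples):
    """Count words per class and documents per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab

def classify(text, word_counts, doc_counts, vocab):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, scored in log space."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("free prize money", *model))
print(classify("project meeting on monday", *model))
```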
• Information retrieval finds the documents that are most relevant to a query.
This is a problem every search and recommendation system faces. The goal is
not to answer a particular query but to retrieve, from a collection of
documents that may number in the millions, the set that is most relevant
to the query.
• Summarization is the task of shortening text to highlight the most relevant
information. It condenses a longer text to make it more manageable for
time-pressed readers. Commonly summarized texts include reports and
articles.
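For information retrieval, one common way to score relevance is TF-IDF: a term counts heavily for a document if it is frequent there (term frequency) but rare across the collection (inverse document frequency). The sketch below ranks a hypothetical three-document collection by the summed TF-IDF weights of the query terms; real search engines use refined variants such as BM25 together with an inverted index instead of scanning every document.

```python
import math
from collections import Counter

# Hypothetical document collection.
DOCS = [
    "the bus to hyderabad leaves at ten",
    "independence day is celebrated in august",
    "book two bus tickets to hyderabad",
]

def tf_idf_rank(query, docs):
    """Return document indices ordered from most to least relevant to the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for idx, doc in enumerate(tokenized):
        tf = Counter(doc)
        score = sum(
            (tf[t] / len(doc)) * math.log(n / df[t])  # term frequency x inverse document frequency
            for t in query.lower().split()
            if df[t]  # skip query terms that occur in no document
        )
        scores.append((score, idx))
    return [idx for score, idx in sorted(scores, reverse=True)]

print(tf_idf_rank("bus tickets", DOCS))
```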
• Machine translation automates translation between different
languages.
• Grammatical error correction models encode grammatical rules to correct
the grammar within text. Online grammar checkers like Grammarly and
word-processing systems like Microsoft Word use such systems to provide a
better writing experience to their customers.
• Text generation, more formally known as natural language
generation (NLG), produces text that’s similar to human-written text.
• It’s particularly useful for autocomplete and chatbots.
Autocomplete predicts which word comes next.
Chatbots automate one side of a conversation while a human supplies the other.
Many companies use them to provide customer chat services.
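Next-word prediction for autocomplete can be illustrated with a bigram model that counts which word most often follows the current one. This is a deliberately simple sketch over a hypothetical four-sentence corpus; modern autocomplete and chatbots use neural language models, but the idea of predicting the next word from context is the same.

```python
from collections import Counter, defaultdict

# Hypothetical tiny corpus for illustration.
CORPUS = [
    "i want to book a ticket",
    "i want to book a hotel",
    "i want to check my order",
    "please book a ticket for me",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def suggest_next(model, word, k=1):
    """Return up to k most frequent next words; [] for unseen words."""
    counts = model.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

model = train_bigrams(CORPUS)
print(suggest_next(model, "want"))
print(suggest_next(model, "to", k=2))
```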
• Sentiment Analysis
• It is the process of classifying the emotional intent of text. Generally, the input to
a sentiment classification model is a piece of text, and the output is the
probability that the sentiment expressed is positive, negative, or neutral. It can also
focus on specific feelings and emotions (angry, happy, sad, etc.) and even on intentions
(e.g., interested vs. not interested).
Sentiment analysis is commonly used by businesses to better classify and understand
customer reviews and feedback on various online platforms.
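A simple way to see sentiment classification at work is a lexicon-based scorer. Unlike the probabilistic classifier described above, this sketch returns only a label: it sums hypothetical word polarity scores, flips the score of a word directly preceded by a negation, and maps the total to positive, negative, or neutral. The lexicon and the negation rule are illustrative assumptions.

```python
import re

# Hypothetical mini sentiment lexicon (word -> polarity score).
LEXICON = {"good": 1, "great": 2, "love": 2, "happy": 1,
           "bad": -1, "terrible": -2, "hate": -2, "slow": -1}
NEGATIONS = {"not", "never", "no"}

def sentiment(text: str) -> str:
    """Label text positive/negative/neutral from summed lexicon scores."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, w in enumerate(words):
        score = LEXICON.get(w, 0)
        if i > 0 and words[i - 1] in NEGATIONS:
            score = -score  # "not good" counts as negative
        total += score
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is great!"))
print(sentiment("The product is not good."))
```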
Questions?