Unit 4
Chatbot Architecture:
Further, to differentiate between NLP and NLU, the Venn diagram in Figure
shows a few applications of NLP and NLU, depicting NLU as a subset of NLP. The
overall objective is to process and understand natural language text so that machines
can reason about it much as humans do.
1. NLTK: The Natural Language Toolkit (NLTK) is a Python library for processing
human language data, primarily English text. It is released under the Apache 2.0 open
source license. The following are some of the tasks NLTK can perform:
Classification of text: Classifying text into different categories for better
organization and content filtering
Tokenization of sentences: Breaking sentences into words for symbolic and
statistical natural language processing
Stemming words: Reducing words to their base or root form
Part-of-speech (POS) tagging: Tagging words with their parts of speech, which
groups words with similar grammatical properties
Parsing text: Determining the syntactic structure of text based on the underlying
grammar
Semantic reasoning: Understanding the meaning of words in order to build semantic
representations
NLTK is often the first choice of tool for teaching NLP. It is also widely used as a
platform for prototyping and research.
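To make these tasks concrete, here is a minimal sketch of tokenization, stemming, and POS tagging; the sentence is illustrative, and it assumes the punkt and averaged_perceptron_tagger resources have been downloaded:
import nltk
from nltk.stem import PorterStemmer

nltk.download('punkt')                       # one-time: tokenizer model
nltk.download('averaged_perceptron_tagger')  # one-time: POS tagger model

sentence = "NLTK makes teaching natural language processing much easier."

tokens = nltk.word_tokenize(sentence)      # break the sentence into words
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]  # reduce words to their root form
tags = nltk.pos_tag(tokens)                # tag each token with its part of speech

print(tokens)
print(stems)
print(tags)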
2. spaCy: Most organizations that build a product involving natural language data are
adopting spaCy. It stands out by offering a production-grade NLP engine that is both
accurate and fast, and its extensive documentation further increases the adoption rate.
It is developed in Python and Cython. The language models in spaCy are trained using
deep learning, which provides high accuracy across NLP tasks.
Currently, the following are some high-level capabilities of spaCy:
Covers NLTK features: Provides all the NLTK-style features, such as tokenization,
POS tagging, dependency trees, named entity recognition, and many more.
Deep learning workflow: spaCy supports deep learning workflows that can
connect to models trained with popular frameworks like TensorFlow, Keras, scikit-
learn, and PyTorch. This makes spaCy a very potent library for building and
deploying sophisticated language models for real-world applications.
Multi-language support: Provides support for more than 50 languages including
French, Spanish, and Greek.
Processing pipeline: Offers an easy-to-use and very intuitive processing pipeline for
performing a series of NLP tasks in an organized manner. For example, a pipeline for
performing POS tagging, parsing the sentence, and extracting named entities could
be defined in a list like this: pipeline = ["tagger", "parser", "ner"]. This makes the code
easy to read and quick to debug.
Visualizers: Using displaCy, it becomes easy to visualize dependency trees and
named entities. We can add our own colors to make the visualization aesthetically
pleasing, and it renders directly in a Jupyter notebook as well (see the sketch after this list).
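As a quick illustration of the pipeline and visualizer described above, here is a minimal sketch; it assumes the small English model has been installed with python -m spacy download en_core_web_sm, and the example sentence is illustrative:
import spacy

nlp = spacy.load("en_core_web_sm")  # pretrained pipeline: tagger, parser, ner
doc = nlp("Apple is opening a new office in Paris next year.")

# POS tags and dependency labels from the pipeline components
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities found by the "ner" component
for ent in doc.ents:
    print(ent.text, ent.label_)

# displaCy renders the dependency tree, e.g., inside a Jupyter notebook
from spacy import displacy
displacy.render(doc, style="dep")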
3. CoreNLP: Stanford CoreNLP is one of the oldest and most robust tools for all natural
language tasks. Its suite of functions offers many linguistic analysis capabilities,
including the already discussed POS tagging, dependency tree, named entity recognition,
sentiment analysis, and others. Unlike spaCy and NLTK, CoreNLP is written in Java. It
also provides Java APIs to use from the command line and third-party APIs for working
with modern programming languages. The following are the core features of CoreNLP:
Fast and robust: Since it is written in Java, which is a time-tested and robust
programming language, CoreNLP is a favorite for many developers.
A broad range of grammatical analysis: Like NLTK and spaCy, CoreNLP also
provides a good number of analytical capabilities to process and understand natural
language.
API integration: CoreNLP has excellent API support for running it from the
command line and programming languages like Python via a third-party API or web
service.
Support for multiple operating systems (OSs): CoreNLP works on Windows, Linux,
and macOS.
Language support: Like spaCy, CoreNLP provides useful language support, which
includes Arabic, Chinese, and many more.
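One common route from Python is the CoreNLP client that ships with Stanford's stanza package. The sketch below is one possible setup, assuming the CoreNLP distribution has been downloaded and the CORENLP_HOME environment variable points to it:
from stanza.server import CoreNLPClient

# Start a local CoreNLP server, annotate a sentence, and read the results
with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "ner"],
                   timeout=30000, memory="4G") as client:
    ann = client.annotate("Stanford University is located in California.")
    for sentence in ann.sentence:
        for token in sentence.token:
            print(token.word, token.pos, token.ner)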
4. Gensim: gensim is a popular library written in Python and Cython. It is robust and
production-ready, which makes it another popular choice for NLP and NLU. It can help
analyze the semantic structure of plain-text documents and surface the important
topics. The following are some core features of gensim:
Topic modeling: It automatically extracts semantic topics from documents. It
provides various statistical models, including latent Dirichlet allocation (LDA), for topic
modeling.
Pretrained models: It has many pretrained models that provide out-of-the-box
capabilities to develop general-purpose functionalities quickly.
Similarity retrieval: gensim’s capability to extract semantic structures from any
document makes it an ideal library for similarity queries on numerous topics.
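A minimal topic-modeling sketch with gensim's LDA implementation (the tiny corpus below is illustrative):
from gensim import corpora, models

# A toy corpus of already-tokenized documents
texts = [
    ["coffee", "beans", "roast", "brew"],
    ["tea", "leaves", "brew", "cup"],
    ["espresso", "coffee", "machine", "brew"],
]

# Map each word to an integer id, then build bag-of-words vectors
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Fit a two-topic LDA model and inspect the discovered topics
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=42)
for topic in lda.print_topics():
    print(topic)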
5. TextBlob: TextBlob is a relatively less popular but easy-to-use Python library that
provides various NLP capabilities like the libraries discussed above. It extends the
features provided by NLTK but in a much-simplified form. The following are some of
the features of TextBlob:
Sentiment analysis: It provides an easy-to-use method for computing polarity and
subjectivity scores that measure the sentiment of a given text.
Language translation: Its language translation is powered by Google Translate,
which supports more than 100 languages.
Spelling correction: It uses the simple spelling correction method demonstrated by
Peter Norvig, Google's Director of Research, on his blog at
http://norvig.com/spell-correct.html. The approach is about 70% accurate.
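A minimal sketch of the sentiment and spelling features (the review text is illustrative; translation is omitted because it requires network access to Google Translate):
from textblob import TextBlob

blob = TextBlob("The food was absolutly wonderful!")

# Sentiment: polarity in [-1, 1], subjectivity in [0, 1]
print(blob.sentiment)

# Spelling correction based on Norvig's approach
print(blob.correct())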
6. fastText: fastText is a specialized library for learning word embeddings and text
classification. It was developed by researchers at Facebook AI Research (FAIR). It
is written in C++ and Python, making it very efficient and fast at processing even large
chunks of data. The following are some of the features of fastText:
Word embedding learning: Provides word embedding models learned by
unsupervised training using skip-gram and continuous bag of words (CBOW).
Word vectors for out-of-vocabulary words: It can produce word vectors even for
words not present in the training vocabulary, thanks to its use of subword information.
Text classification: fastText provides a fast text classifier which, per the paper titled
"Bag of Tricks for Efficient Text Classification," is often on par with deep learning
classifiers in accuracy while being much faster to train.
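A minimal sketch of both use cases with the fasttext Python bindings; the file names are illustrative (data.txt is a plain-text corpus, and train.txt holds one example per line prefixed with a label like __label__positive):
import fasttext

# Unsupervised word embeddings with the skip-gram model
model = fasttext.train_unsupervised("data.txt", model="skipgram")

# Word vectors are available even for out-of-vocabulary words,
# thanks to subword (character n-gram) information
vector = model.get_word_vector("chatbot")
print(vector.shape)

# Supervised text classification
clf = fasttext.train_supervised("train.txt")
print(clf.predict("this was great"))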
Natural Language Processing: Natural language processing deals with understanding
and manipulating natural language text or speech to perform specific, useful tasks. NLP
combines ideas and concepts from computer science, linguistics, mathematics, artificial
intelligence, machine learning, and psychology.
Sampling: Using the sample function of the pandas data frame, let's randomly
pick the text of 1,000 reviews and print the top rows (see Figure):
import pandas as pd  # food_review is assumed to be the reviews data frame loaded earlier

# Keep only the review text column
food_review_text = pd.DataFrame(food_review["Text"])
# Sample 1,000 reviews; random_state makes the sample reproducible
food_review_text_1k = food_review_text.sample(n=1000, random_state=123)
food_review_text_1k.head()
Tokenization Using NLTK: The first step in processing text data is to separate a
sentence into individual words. This process is called tokenization. We will use
NLTK's word_tokenize function to create a column in the food_review_text_1k data
frame we created above and print the top rows to see the output of tokenization
(Figure):
import nltk
nltk.download('punkt')  # one-time download of the tokenizer model

# Tokenize each review into a list of words
food_review_text_1k['tokenized_reviews'] = food_review_text_1k['Text'].apply(nltk.word_tokenize)
food_review_text_1k.head()
Word Search Using Regex: Let's take the first row in the data frame and search for
the presence of a word using a regular expression (regex). The regex matches
any five-letter word that has c as its first character and i as its third. We can write
various regexes to search for any pattern of interest. We use the re.search() function to
perform this search:
import re

#Search: All 5-letter words with c as its first letter and i as its third letter
search_word = set([w for w in food_review_text_1k['tokenized_reviews'].iloc[0]
                   if re.search('^c.i..$', w)])
print(search_word)
{'chips'}
Word Search Using the Exact Word: Another way of searching for a word is to use
the exact word. This can be achieved using the str.contains() function in pandas. In
the following example, we search for the word "great" in all of the reviews; the rows
of reviews containing the word are retrieved and can be considered positive reviews.
See Figure.
#Search for the word "great" in reviews
food_review_text_1k[food_review_text_1k['Text'].str.contains('great')]
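Note that str.contains also accepts parameters for case-insensitive and literal (non-regex) matching; a minimal variant on the same data frame:
#Case-insensitive search, treating the pattern as a plain string
food_review_text_1k[food_review_text_1k['Text'].str.contains('great', case=False, regex=False)]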