
1009 NLP PPT

The document discusses Natural Language Processing (NLP) and its applications, including automatic summarization, sentiment analysis, text classification, and virtual assistants. It outlines the AI project cycle, chatbot types (script-bots and smart-bots), and the differences between human and computer languages, emphasizing text normalization processes. Additionally, it explains the Bag of Words algorithm for converting text into numerical data for machine understanding.

Uploaded by aripandey2

Subject specific skills
Chapter 6: Natural Language Processing (8 marks)
NLP
• Enables machines to understand and process human language
Applications
• Automatic summarization: condensing documents while retaining their meaning.
• Sentiment analysis: identifying the sentiments expressed across several posts.
• Text classification: assigning predefined categories to a document, e.g. spam filtering.
• Virtual assistants: Google Assistant, ChatGPT.
AI project cycle
• Problem scoping
• Data acquisition -> surveys, observations, databases from the internet,
interviews
• Data exploration -> the text is normalised through various steps and is
reduced to a minimal vocabulary
• Modelling
• Evaluation
Chatbots
Script-bots
• These bots are pre-programmed with specific responses to certain
phrases or keywords.
• They are good for simple tasks like answering frequently asked
questions, providing basic information, and processing simple
transactions.
• However, they are limited to a set of predetermined responses and
cannot learn from previous interactions with users.
Smart-bots
• These bots use artificial intelligence (AI) to perform their functions.
• These bots are not pre-programmed with responses.
• They can learn from previous interactions with users.
Chatbots: Script-bot vs Smart-bot

Script-bot | Smart-bot
Easy to make | Flexible and powerful
Works around the script which is programmed in it | Works on bigger databases and other resources directly
Mostly free and easy to integrate into a messaging platform | Learns with more data
No or little language-processing skills | Uses NLP to perform its functions
Limited functionality | Wide functionality
Human language vs computer language
• The computer understands the language of numbers.
Human language
• Nouns, verbs, adverbs, adjectives
• There are rules to provide structure to a language.
• Syntax: the grammatical structure of a sentence.
• Human communication is complex.
• His face turned red after he found out that he took the wrong bag.
• The red car zoomed past his nose.
• His face turns red after consuming the medicine.
• In natural language, a word ("red" above) can have multiple meanings
depending on its context.
• Semantics is the meaning of words, phrases, and sentences in human
language.
• Syntax: the grammatical structure of a sentence.
• Semantics is the meaning of words, phrases, and sentences in human
language.
• Different syntax, same semantics: 2+3 = 3+2
Data processing/Text normalization
• Text normalization helps to lower the complexity of
textual data.
• The entire body of textual data from all the documents taken
together is known as the corpus.
• When developing models like ChatGPT, large
amounts of text data (referred to as a corpus) are
used as input during the training process.
• The corpus can include books, articles, websites,
conversations, and other text sources.
• This helps the AI learn patterns of language,
grammar, and knowledge.
Steps of Text Normalisation
i. Sentence Segmentation
ii. Tokenisation
iii. Removing stopwords, special characters and
numbers
iv. Converting text to a common case
v. Stemming or Lemmatization
Sentence Segmentation
• The whole corpus is divided into sentences.
• Example: Corpus: Dr. Smith went to the hospital. He arrived on time.
The operation started soon after.
• After sentence segmentation:
Sentence 1: Dr. Smith went to the hospital.
Sentence 2: He arrived on time.
Sentence 3: The operation started soon after.
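The segmentation above can be sketched in Python. The small abbreviation set below is an assumption made for this example; real segmenters (such as NLTK's sentence tokenizer) use much larger lists or trained models.

```python
# Sentence segmentation sketch: split on sentence-ending punctuation,
# skipping a small set of known abbreviations (an assumption; real
# segmenters handle many more cases).
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms."}

def segment_sentences(corpus):
    sentences, current = [], []
    for token in corpus.split():
        current.append(token)
        # A token ending in . ? or ! closes a sentence, unless it is
        # a known abbreviation such as "Dr."
        if token[-1] in ".?!" and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

corpus = ("Dr. Smith went to the hospital. He arrived on time. "
          "The operation started soon after.")
for sentence in segment_sentences(corpus):
    print(sentence)
```

Without the abbreviation check, "Dr." would wrongly end the first sentence, which is exactly why segmentation is harder than splitting on full stops.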
Tokenisation
• Tokens: any word, number, or special character occurring in a
sentence.
• Example: Sentence: Dr. Smith went to the hospital.
• When divided into tokens:

Dr | . | Smith | went | to | the | hospital | .
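Tokenisation as shown above can be sketched with a regular expression; this is one simple approach, not the only one.

```python
import re

# Tokenisation sketch: a regular expression that captures runs of word
# characters as one token and each punctuation mark as its own token.
def tokenise(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenise("Dr. Smith went to the hospital."))
# ['Dr', '.', 'Smith', 'went', 'to', 'the', 'hospital', '.']
```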
Removing stopwords, special characters and
numbers
• Tokens which are not necessary are removed from the token list.
• Stopwords are the words which occur very frequently in the corpus
but do not add any value to it.

Note: if you are working on a document containing email IDs, then do


not remove the special characters and numbers from that document.
Removing stopwords, special characters and numbers

Dr | . | Smith | went | to | the | hospital | .

After removing stopwords, special characters and numbers:

Dr | Smith | went | hospital
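This step can be sketched as a list filter. The stopword set below is an assumption made for this example; libraries such as NLTK ship much larger stopword lists.

```python
# Stopword removal sketch: drop frequent function words, punctuation
# and digits from the token list. STOPWORDS here is a toy assumption.
STOPWORDS = {"a", "an", "and", "are", "he", "on", "the", "to"}

def remove_stopwords(tokens):
    return [t for t in tokens
            if t.lower() not in STOPWORDS  # drop frequent function words
            and t.isalpha()]               # drop punctuation and numbers

tokens = ["Dr", ".", "Smith", "went", "to", "the", "hospital", "."]
print(remove_stopwords(tokens))  # ['Dr', 'Smith', 'went', 'hospital']
```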


Converting text to a common case
• This ensures that the machine's case-sensitivity does not treat the
same word as two different words just because of a difference in case.
Converting text to a common case

Dr | Smith | went | hospital

• After converting the text to lower case:

dr | smith | went | hospital
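In Python this step is a single pass of `str.lower()` over the token list:

```python
# Converting text to a common case: lower-case every token so that
# "Dr" and "dr" are counted as the same word.
tokens = ["Dr", "Smith", "went", "hospital"]
lowered = [t.lower() for t in tokens]
print(lowered)  # ['dr', 'smith', 'went', 'hospital']
```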


Stemming
• Stemming is the process in which the affixes of words are removed
and the words are converted to their base form.

• Stemming does not take into account whether the stemmed word is meaningful or not.
• It just removes the affixes, and is therefore faster.
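A deliberately naive stemmer (a sketch, not the Porter algorithm used by real stemmers) shows how affix removal can produce non-words:

```python
# Naive stemmer sketch: strip common suffixes without checking whether
# the remainder is a real word. The suffix list is an assumption.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Only strip if enough of the word remains to be a stem
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

for word in ["studies", "studying", "caring", "stressed"]:
    print(word, "->", stem(word))
# "studies" -> "studi": the stem need not be a meaningful word.
```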
Lemmatization
• Stemming and lemmatization are alternative processes to each other.
• Both remove affixes from words.
• In lemmatization, the word we get after affix removal (also known as
the lemma) is a meaningful one.
• It takes longer to execute than stemming.
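The contrast with stemming can be sketched with a tiny lookup table. The table below is a toy assumption; real lemmatisers use a full vocabulary and morphological analysis.

```python
# Lemmatisation sketch: the output (the lemma) is always a meaningful
# dictionary word. LEMMAS here is a toy lookup table, an assumption
# standing in for a real lemmatiser's vocabulary.
LEMMAS = {"studies": "study", "caring": "care",
          "went": "go", "stressed": "stress"}

def lemmatise(word):
    return LEMMAS.get(word, word)

print(lemmatise("studies"), lemmatise("caring"))  # study care
```

Compare "caring": a naive stemmer may give "car", while the lemma is the meaningful word "care".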
Q. Document1: Aman and Anil are stressed.
• Document2: Aman went to a therapist.
• Document3: Anil went to download a health chatbot.
Apply text normalization on the above corpus. Write the output of each
step of text normalization.
Q. Define
Corpus
Token
Lemma
Syntax
Semantics
Stopwords
Q. Differentiate between stemming and lemmatization with examples.
Bag of Words
• We need to convert the tokens into numbers, since computers can
understand only numbers.
• For this we would use the Bag of Words algorithm.
Bag of Words algorithm
1. Text normalization
2. Create dictionary
3. Create document vectors
4. Create document vectors for all the documents.
Bag of Words algorithm: Step 1: Text normalization
• Document 1: Aman and Anil are stressed
• Document 2: Aman went to a therapist
• Document 3: Anil went to download a health chatbot

After normalization:
aman and anil are stressed
aman went to a therapist
anil went to download a health chatbot

Note: no tokens have been removed in the stopwords removal step. It is
because we have very little data.
Step 2: Create dictionary
• Make a list of all unique words occurring in the corpus.

aman and anil are stressed
aman went to a therapist
anil went to download a health chatbot

Dictionary: aman | and | anil | are | stressed | went | to | a | therapist | download | health | chatbot
Step3: Create document vectors
• For each document in the corpus, find out how many times the word
from the dictionary has occurred.
• Document 1: Aman and Anil are stressed

aman | and | anil | are | stressed | went | to | a | therapist | download | health | chatbot
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Step 4: Create document vectors for all the documents.

• Document 1: Aman and Anil are stressed


• Document 2: Aman went to a therapist
• Document 3: Anil went to download a health chatbot
aman | and | anil | are | stressed | went | to | a | therapist | download | health | chatbot
Document 1: 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Document 2: 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0
Document 3: 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1
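The four steps above can be sketched in a few lines of Python over the normalised corpus from Step 1:

```python
# Bag of Words sketch: build the dictionary of unique words, then one
# count vector per document.
docs = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]

# Step 2: dictionary of unique words, in order of first appearance
dictionary = []
for doc in docs:
    for word in doc.split():
        if word not in dictionary:
            dictionary.append(word)

# Steps 3 and 4: for each document, count each dictionary word
vectors = [[doc.split().count(word) for word in dictionary] for doc in docs]

print(dictionary)
for vector in vectors:
    print(vector)
```

The printed vectors match the table above; libraries such as scikit-learn provide the same idea as a ready-made `CountVectorizer`.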
Q. Apply the steps of the Bag of Words algorithm to the corpus below.
Document 1: Dr. Smith went to the hospital.
Document 2: He arrived on time.
Document 3: The operation started soon after.
