Natural Language Processing - Compressed

Uploaded by

hrigved.ugale.1111

Natural Language Processing

• Branch of AI that enables computers to process human language in the
form of text or voice data and mimic human conversation.
• Applications: Automatic Summarization, Sentiment Analysis, Text
Classification, Virtual Assistants

[Figure: Underfitting, Perfect fit, Overfitting]

Chatbots
• Script Bots
• Smart Bots
Data Processing
• Text Normalisation cleans up the textual data so that its
complexity is lower than that of the original data.
• The complete textual data from all the documents taken together is
known as the corpus.
• Sentence Segmentation: the whole corpus is divided into
sentences. Each sentence is treated as a separate piece of data,
so the corpus is reduced to a list of sentences.
• Tokenisation: each sentence is further divided into tokens.
A token is any word, number or special character occurring in
a sentence.
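Sentence segmentation and tokenisation can be sketched in a few lines of Python. This is a minimal sketch that uses only regular expressions. The function names `segment_sentences` and `tokenise` and the sample corpus are illustrative assumptions, and the splitting rule is deliberately naive (it does not handle abbreviations such as "Dr.").

```python
import re

def segment_sentences(corpus):
    """Split a corpus into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", corpus.strip())
    return [p for p in parts if p]

def tokenise(sentence):
    """Return tokens: runs of word characters (words, numbers) or single special characters."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical two-sentence corpus used only for illustration.
corpus = "Aman and Anil are stressed. Aman went to a therapist."
sentences = segment_sentences(corpus)
tokens = [tokenise(s) for s in sentences]

print(sentences)
print(tokens)
```

Note that the full stop is kept as its own token, which matches the definition of a token as any word, number or special character.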
Data Processing
• Removing Stopwords, Special Characters and Numbers: tokens
that are not necessary are removed from the token list.
• Stopwords are words that occur very frequently in the corpus
but add little value to it. Ex: a, an, and, are, as, for, it, is, into...
• Converting text to a common case: after stopword removal,
the whole text is converted to a single case, preferably
lower case.
• Stemming: the remaining words are reduced to their root
words. In other words, stemming is the process in which the
affixes of words are removed and the words are converted to
their base form.
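The three steps on this slide can be sketched as follows. This is a toy illustration, not a production pipeline. The stopword set contains only the examples listed above, and `toy_stem` is a hypothetical suffix stripper written for this example, much cruder than an established stemmer such as Porter's.

```python
# Only the stopwords named in the notes; real lists are much longer.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "it", "is", "into"}

def remove_noise(tokens):
    """Keep alphabetic tokens that are not stopwords (drops numbers and special characters)."""
    return [t for t in tokens if t.isalpha() and t.lower() not in STOPWORDS]

def to_lower(tokens):
    """Convert every token to lower case."""
    return [t.lower() for t in tokens]

def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

tokens = ["Aman", "and", "Anil", "are", "stressed", ".", "2", "studies"]
cleaned = to_lower(remove_noise(tokens))
stems = [toy_stem(w) for w in cleaned]

print(cleaned)  # ['aman', 'anil', 'stressed', 'studies']
print(stems)    # ['aman', 'anil', 'stress', 'studi']
```

The stem `studi` shows that affix removal alone can produce a string that is not a dictionary word.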
Data Processing
• Lemmatization: like stemming, lemmatization removes affixes, but
the word obtained after affix removal (known as the lemma) is
always a meaningful one, whereas a stem need not be.
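The contrast between a stem and a lemma can be sketched with a toy example. Both helpers are assumptions made for illustration: `toy_stem` strips a suffix mechanically, while `toy_lemmatise` looks the word up in a tiny hand-written table `LEMMAS`. Real lemmatizers rely on a full vocabulary and morphological analysis rather than a four-entry dictionary.

```python
# Hypothetical lookup table; a real lemmatizer uses a full lexicon.
LEMMAS = {"studies": "study", "stressed": "stress", "went": "go", "caring": "care"}

def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatise(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)

words = ["studies", "caring", "went"]
comparison = [(w, toy_stem(w), toy_lemmatise(w)) for w in words]

for word, stem, lemma in comparison:
    print(f"{word}: stem={stem}, lemma={lemma}")
```

Here `studies` stems to `studi` but lemmatises to `study`, and the irregular form `went` is only resolved by the lookup.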
Bag of Words (BoW)
• Bag of Words is a Natural Language Processing model that
extracts features from text for use in machine learning
algorithms. It records the occurrences of each word and builds
the vocabulary of the corpus.
• Steps to implement BoW:
• Text Normalisation: Collect data and pre-process it.
• Create Dictionary: Make a list of all the unique words occurring in
the corpus (the vocabulary).
• Create document vectors: For each document in the corpus, count
how many times each word in the vocabulary occurs.
• Create document vectors for all the documents.

• Ex:
• Document 1: Aman and Anil are stressed.
• Document 2: Aman went to a therapist.
• Document 3: Anil went to download a health chatbot.
Bag of Words (BoW)
• Step 1: Text Normalisation - Collect data and pre-process it.
• After normalisation-
• Document 1: [aman, and, anil, are, stressed]
• Document 2: [aman, went, to, a, therapist]
• Document 3: [anil, went, to, download, a, health, chatbot]

• Step 2: Create Dictionary - List every unique word that occurs
across the three documents:
aman, and, anil, are, stressed, went, to, a, therapist, download,
health, chatbot
Bag of Words (BoW)
• Step 3: Create document vector.
Document  aman  and  anil  are  stressed  went  to  a  therapist  download  health  chatbot
1         1     1    1     1    1         0     0   0  0          0         0       0
2         1     0    0     0    0         1     1   1  1          0         0       0
3         0     0    1     0    0         1     1   1  0          1         1       1
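Steps 2 and 3 can be reproduced with a short Python sketch. The helper names `build_vocabulary` and `document_vector` are assumptions introduced here. The input is the normalised token lists from Step 1, and the vocabulary is kept in order of first appearance so that the columns line up with the table above.

```python
documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "download", "a", "health", "chatbot"],
]

def build_vocabulary(docs):
    """Collect the unique words of the corpus in order of first appearance."""
    vocab, seen = [], set()
    for doc in docs:
        for word in doc:
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab

def document_vector(doc, vocab):
    """Count how many times each vocabulary word occurs in one document."""
    return [doc.count(word) for word in vocab]

vocabulary = build_vocabulary(documents)
vectors = [document_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)
```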

• Step 4:
• Term Frequency(TF): Frequency of a word in one document.
Bag of Words (BoW)
• Inverse Document Frequency (IDF): Document Frequency (DF) is the
number of documents in which the word occurs, irrespective of how
many times it occurs in those documents. IDF is the total number
of documents divided by DF.
Term frequency of each word in each document:
Document  aman  and  anil  are  stressed  went  to  a  therapist  download  health  chatbot
1         1     1    1     1    1         0     0   0  0          0         0       0
2         1     0    0     0    0         1     1   1  1          0         0       0
3         0     0    1     0    0         1     1   1  0          1         1       1
IDF of each word (3 documents / document frequency):
aman  and  anil  are  stressed  went  to   a    therapist  download  health  chatbot
3/2   3/1  3/2   3/1  3/1       3/2   3/2  3/2  3/1        3/1       3/1     3/1
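Document frequency and IDF for the example corpus can be computed as in the sketch below. It follows the definition used in these notes, where IDF is the plain ratio of total documents to document frequency and the logarithm is applied later; other references fold the logarithm into IDF itself. The variable names are assumptions, and `fractions.Fraction` is used only to keep the ratios exact.

```python
from fractions import Fraction

documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "download", "a", "health", "chatbot"],
]
vocabulary = ["aman", "and", "anil", "are", "stressed", "went", "to", "a",
              "therapist", "download", "health", "chatbot"]
n_docs = len(documents)

# Document frequency: number of documents that contain the word at least once.
document_frequency = {w: sum(1 for d in documents if w in d) for w in vocabulary}

# IDF as defined in these notes: total documents / document frequency.
idf = {w: Fraction(n_docs, document_frequency[w]) for w in vocabulary}

for w in vocabulary:
    print(w, document_frequency[w], idf[w])
```

Words that appear in two documents (aman, anil, went, to, a) get an IDF of 3/2, and words that appear in only one document get 3.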
Bag of Words (BoW)
TFIDF(W) = TF(W) x log(IDF(W))
Word       Document 1    Document 2    Document 3
aman       1*log(3/2)    1*log(3/2)    0*log(3/2)
and        1*log(3)      0*log(3)      0*log(3)
anil       1*log(3/2)    0*log(3/2)    1*log(3/2)
are        1*log(3)      0*log(3)      0*log(3)
stressed   1*log(3)      0*log(3)      0*log(3)
went       0*log(3/2)    1*log(3/2)    1*log(3/2)
to         0*log(3/2)    1*log(3/2)    1*log(3/2)
a          0*log(3/2)    1*log(3/2)    1*log(3/2)
therapist  0*log(3)      1*log(3)      0*log(3)
download   0*log(3)      0*log(3)      1*log(3)
health     0*log(3)      0*log(3)      1*log(3)
chatbot    0*log(3)      0*log(3)      1*log(3)
Bag of Words (BoW)
TFIDF(W) = TF(W) x log(IDF(W)), evaluated with base-10 logarithms:
Document  aman   and    anil   are    stressed  went   to     a      therapist  download  health  chatbot
1         0.176  0.477  0.176  0.477  0.477     0      0      0      0          0         0       0
2         0.176  0      0      0      0         0.176  0.176  0.176  0.477      0         0       0
3         0      0      0.176  0      0         0.176  0.176  0.176  0          0.477     0.477   0.477
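The TF-IDF values for the example can be checked with a short script. This is a sketch under the definitions used in these notes (raw count as TF, total documents divided by document frequency as IDF). The base-10 logarithm is an inference from the values 0.176 and 0.477, and the names `tfidf_vector` and `tfidf` are assumptions.

```python
import math

documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "download", "a", "health", "chatbot"],
]
vocabulary = ["aman", "and", "anil", "are", "stressed", "went", "to", "a",
              "therapist", "download", "health", "chatbot"]
n_docs = len(documents)

# Number of documents that contain each word.
document_frequency = {w: sum(1 for d in documents if w in d) for w in vocabulary}

def tfidf_vector(doc):
    """TF(W) * log10(IDF(W)) for every vocabulary word, rounded to 3 decimals."""
    return [
        round(doc.count(w) * math.log10(n_docs / document_frequency[w]), 3)
        for w in vocabulary
    ]

tfidf = [tfidf_vector(doc) for doc in documents]

for row in tfidf:
    print(row)
```

Words that occur in fewer documents (for example `therapist`, 0.477) score higher than words shared by two documents (for example `aman`, 0.176), and a word absent from a document scores 0 there.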
Applications of TFIDF

• Document Classification: helps in classifying the type and genre
of a document.
• Topic Modelling: helps in predicting the topic for a corpus.
• Information Retrieval System: extracts the important information
out of a corpus.
• Stop word filtering: helps in removing the unnecessary words out
of a text body.
