
Bag of Words Algorithm: Paragraph

The document describes the bag of words algorithm for creating document vectors from text. It outlines four steps: 1) text normalization, including tokenization and stop-word removal; 2) creating a dictionary of terms; 3) making document vectors that count term frequencies; and 4) calculating TF-IDF scores to determine the most important terms in each sentence. The conclusion identifies which terms receive the highest TF-IDF scores, and therefore priority, in each of the three example sentences.

Uploaded by

Varshitha Reddy
Copyright
© All Rights Reserved

BAG OF WORDS ALGORITHM

Paragraph –

We can use health chatbots for treating stress.

We can use NLP to create chatbots and we will be making health chatbots now!

Health chatbots cannot replace human counselors now.

Step 1 - Text Normalization

Sentence segmentation

Sent 1: We can use health chatbots for treating stress.

Sent 2: We can use NLP to create chatbots and we will be making health chatbots now!

Sent 3: Health chatbots cannot replace human counselors now.
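Sentence segmentation can be sketched in a few lines of Python. This is a minimal sketch, assuming the simple rule that a sentence ends at ".", "!" or "?" followed by whitespace; the function name segment_sentences is illustrative, and real text (abbreviations, decimal numbers) needs a proper sentence tokenizer.

```python
import re

# The example paragraph from this document, each sentence ending in punctuation.
PARAGRAPH = (
    "We can use health chatbots for treating stress. "
    "We can use NLP to create chatbots and we will be making health chatbots now! "
    "Health chatbots cannot replace human counselors now."
)


def segment_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when whitespace follows (illustrative rule)."""
    # The lookbehind keeps the punctuation mark attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


for number, sentence in enumerate(segment_sentences(PARAGRAPH), start=1):
    print(f"Sent {number}: {sentence}")
```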

Tokenization

Sent 1: We can use health chatbots for treating stress .

Sent 2: We can use NLP to create chatbots and we will be making health chatbots now !

Sent 3: Health chatbots cannot replace human counselors now .
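Tokenization splits each sentence into words and treats every punctuation mark as a separate token. A minimal regex-based sketch follows; the function name tokenize is illustrative, and the pattern is an assumption that works for this simple English text.

```python
import re


def tokenize(sentence: str) -> list[str]:
    """Return word tokens and single punctuation marks as separate tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)


print(tokenize("Health chatbots cannot replace human counselors now."))
# ['Health', 'chatbots', 'cannot', 'replace', 'human', 'counselors', 'now', '.']
```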

Removing stop words, special characters and numbers

Sent 1: We health chatbots treating stress

Sent 2: We NLP create chatbots making health chatbots

Sent 3: Health chatbots replace human counselors
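A sketch of this step follows. The stop-word list is hypothetical, chosen so that the output matches the three sentences above. In this sketch the match is case-sensitive and runs before lowercasing, so the capitalized "We" that opens Sent 1 and Sent 2 is kept while the lowercase "we" later in Sent 2 is removed. The isalpha() check drops punctuation and numbers.

```python
import re

# Hypothetical stop-word list, chosen to reproduce the sentences above.
STOP_WORDS = {"can", "use", "for", "to", "and", "we", "will", "be", "cannot", "now"}


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep purely alphabetic tokens (no punctuation or numbers) that are not stop words.

    The membership test is case-sensitive, so a capitalized "We" is kept.
    """
    return [token for token in tokens if token.isalpha() and token not in STOP_WORDS]


SENTENCES = [
    "We can use health chatbots for treating stress.",
    "We can use NLP to create chatbots and we will be making health chatbots now!",
    "Health chatbots cannot replace human counselors now.",
]

for number, sentence in enumerate(SENTENCES, start=1):
    print(f"Sent {number}:", " ".join(remove_stop_words(tokenize(sentence))))
```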

Converting sentences to lower case

Sent 1: we health chatbots treating stress

Sent 2: we nlp create chatbots making health chatbots

Sent 3: health chatbots replace human counselors

Stemming

Sent 1: we health chatbots treat stress

Sent 2: we nlp create chatbots make health chatbots

Sent 3: health chatbots replace human counselor

Lemmatization

Sent 1: we health chatbot treat stress

Sent 2: we nlp create chatbot make health chatbot

Sent 3: health chatbot replace human counselor
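Lemmatization maps each word to its dictionary form. A full lemmatizer consults a large lexicon; the sketch below instead assumes a tiny hypothetical lookup table that covers only the inflected words in this example, applied to the lowercased sentences.

```python
# Hypothetical lookup table covering only the inflected forms in this example.
LEMMAS = {
    "chatbots": "chatbot",
    "treating": "treat",
    "making": "make",
    "counselors": "counselor",
}


def lemmatize(tokens: list[str]) -> list[str]:
    """Replace each token by its lemma; tokens not in the table are left unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


# The lowercased sentences from earlier in Step 1.
LOWERCASED = [
    "we health chatbots treating stress",
    "we nlp create chatbots making health chatbots",
    "health chatbots replace human counselors",
]

for number, sentence in enumerate(LOWERCASED, start=1):
    print(f"Sent {number}:", " ".join(lemmatize(sentence.split())))
```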

Step 2 - Create dictionary

we, health, chatbot, treat, stress, nlp, create, make, replace, counselor, human
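The dictionary is the list of distinct terms across all lemmatized sentences. A minimal sketch, assuming the lemmatized sentences are already available as token lists (build_dictionary is an illustrative name). It orders terms by first appearance, so human comes before counselor; column order changes only where a count sits in a vector, not its value.

```python
# Lemmatized sentences from Step 1.
DOCS = [
    ["we", "health", "chatbot", "treat", "stress"],
    ["we", "nlp", "create", "chatbot", "make", "health", "chatbot"],
    ["health", "chatbot", "replace", "human", "counselor"],
]


def build_dictionary(docs: list[list[str]]) -> list[str]:
    """Return each distinct term once, in order of first appearance."""
    # dict.fromkeys keeps insertion order and ignores repeated keys.
    return list(dict.fromkeys(term for doc in docs for term in doc))


print(build_dictionary(DOCS))
# ['we', 'health', 'chatbot', 'treat', 'stress', 'nlp', 'create', 'make', 'replace', 'human', 'counselor']
```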

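Step 3 - Make document vectors for all the sentences

Each entry is the number of times the dictionary term occurs in the sentence. In Sent 2, chatbot occurs twice, so its entry is 2.

        we     health  chatbot  treat  stress  nlp    create  make   replace  counselor  human
Sent 1  1      1       1        1      1       0      0       0      0        0          0
Sent 2  1      1       2        0      0       1      1       1      0        0          0
Sent 3  0      1       1        0      0       0      0       0      1        1          1

Step 4 - TF-IDF

Term frequency (TF): the number of times a term occurs in a sentence, i.e. the document vectors from Step 3.

Document frequency (DF): the number of sentences in which the term occurs, however many times it occurs in each.

Inverse document frequency (IDF): the total number of sentences, N = 3, divided by the document frequency.

        we     health  chatbot  treat  stress  nlp    create  make   replace  counselor  human
DF      2      3       3        1      1       1      1       1      1        1          1
IDF     3/2    3/3     3/3      3/1    3/1     3/1    3/1     3/1    3/1      3/1        3/1

TFIDF(W) = TF(W) × log[IDF(W)] = TF(W) × log[N / DF(W)], with log taken to base 10.

Sent 1: we 1×log(3/2), health 1×log(3/3), chatbot 1×log(3/3), treat 1×log(3/1), stress 1×log(3/1), nlp 0×log(3/1), create 0×log(3/1), make 0×log(3/1), replace 0×log(3/1), counselor 0×log(3/1), human 0×log(3/1)

Sent 2: we 1×log(3/2), health 1×log(3/3), chatbot 2×log(3/3), treat 0×log(3/1), stress 0×log(3/1), nlp 1×log(3/1), create 1×log(3/1), make 1×log(3/1), replace 0×log(3/1), counselor 0×log(3/1), human 0×log(3/1)

Sent 3: we 0×log(3/2), health 1×log(3/3), chatbot 1×log(3/3), treat 0×log(3/1), stress 0×log(3/1), nlp 0×log(3/1), create 0×log(3/1), make 0×log(3/1), replace 1×log(3/1), counselor 1×log(3/1), human 1×log(3/1)

Here log(3/2) ≈ 0.176, log(3/3) = 0 and log(3/1) ≈ 0.477.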
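The document vectors and the DF and IDF rows can be checked with a short sketch. It hardcodes the lemmatized sentences and the dictionary in the column order used by the tables; the function names are illustrative. Because collections.Counter counts every occurrence, a term that appears twice in a sentence, such as chatbot in Sent 2, gets the value 2.

```python
from collections import Counter

# Dictionary from Step 2, in the column order of the tables.
VOCAB = ["we", "health", "chatbot", "treat", "stress", "nlp",
         "create", "make", "replace", "counselor", "human"]

# Lemmatized sentences from Step 1.
DOCS = [
    ["we", "health", "chatbot", "treat", "stress"],
    ["we", "nlp", "create", "chatbot", "make", "health", "chatbot"],
    ["health", "chatbot", "replace", "human", "counselor"],
]


def document_vector(doc: list[str], vocab: list[str]) -> list[int]:
    """Count how many times each dictionary term occurs in one sentence (TF)."""
    counts = Counter(doc)  # a term that is absent counts as 0
    return [counts[term] for term in vocab]


def document_frequency(docs: list[list[str]], vocab: list[str]) -> list[int]:
    """Count the sentences that contain each term at least once (DF)."""
    return [sum(1 for doc in docs if term in doc) for term in vocab]


vectors = [document_vector(doc, VOCAB) for doc in DOCS]
df = document_frequency(DOCS, VOCAB)
n = len(DOCS)

for number, vector in enumerate(vectors, start=1):
    print(f"Sent {number}:", vector)
print("DF:  ", df)
print("IDF: ", [f"{n}/{d}" for d in df])  # IDF written as the ratio N/DF
```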

TF-IDF values (log base 10, rounded to three decimal places)

        we     health  chatbot  treat  stress  nlp    create  make   replace  counselor  human
Sent 1  0.176  0       0        0.477  0.477   0      0       0      0        0          0
Sent 2  0.176  0       0        0      0       0.477  0.477   0.477  0        0          0
Sent 3  0      0       0        0      0       0      0       0      0.477    0.477      0.477
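The TF-IDF table can be reproduced with the formula TF(W) × log10(N / DF(W)). This is a minimal sketch using only the standard library; the name tf_idf is illustrative. Library vectorizers often add smoothing or normalization by default, so their numbers can differ from this plain formula.

```python
import math
from collections import Counter

VOCAB = ["we", "health", "chatbot", "treat", "stress", "nlp",
         "create", "make", "replace", "counselor", "human"]

DOCS = [
    ["we", "health", "chatbot", "treat", "stress"],
    ["we", "nlp", "create", "chatbot", "make", "health", "chatbot"],
    ["health", "chatbot", "replace", "human", "counselor"],
]


def tf_idf(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """Return TF(w) * log10(N / DF(w)) for every sentence and dictionary term."""
    n = len(docs)
    # DF: number of sentences containing the term (each vocab term occurs at least once).
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)  # absent terms have TF 0, so their score is 0
        rows.append([tf[term] * math.log10(n / df[term]) for term in vocab])
    return rows


for number, row in enumerate(tf_idf(DOCS, VOCAB), start=1):
    print(f"Sent {number}:", [round(score, 3) for score in row])
```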

CONCLUSION

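In sentence 1, treat and stress have the highest TF-IDF score (0.477), so they are given priority over the other words.

In sentence 2, NLP, create and make have the highest TF-IDF score (0.477), so they are given priority over the other words.

In sentence 3, replace, counselor and human have the highest TF-IDF score (0.477), so they are given priority over the other words.

Health and chatbot occur in every sentence, so log(3/3) = 0 and their TF-IDF score is 0 in all three sentences.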
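Picking the priority terms amounts to selecting, for each sentence, the terms that share the maximum TF-IDF score. A minimal sketch, assuming the rounded TF-IDF values computed in Step 4 are hardcoded; top_terms is an illustrative name.

```python
VOCAB = ["we", "health", "chatbot", "treat", "stress", "nlp",
         "create", "make", "replace", "counselor", "human"]

# Rounded TF-IDF values from Step 4, one row per sentence.
SCORES = {
    "Sent 1": [0.176, 0, 0, 0.477, 0.477, 0, 0, 0, 0, 0, 0],
    "Sent 2": [0.176, 0, 0, 0, 0, 0.477, 0.477, 0.477, 0, 0, 0],
    "Sent 3": [0, 0, 0, 0, 0, 0, 0, 0, 0.477, 0.477, 0.477],
}


def top_terms(scores: list[float], vocab: list[str]) -> list[str]:
    """Return every term whose score equals the highest score in the sentence."""
    best = max(scores)
    return [term for term, score in zip(vocab, scores) if score == best]


for sentence, scores in SCORES.items():
    print(sentence, top_terms(scores, VOCAB))
```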

-O.Varshitha
10C
