NLP Worksheet: Text Processing, Bag of Words, Tf-Idf Activity
Corpus
Document 1: We can use health chatbots for treating stress.
Document 2: We can use NLP to create chatbots and we will be making health chatbots now!
Document 3: Health Chatbots cannot replace human counsellors now. Yay >< !! @1nteLA!4Y
No. Sentence
Step 2: Tokenization
Separate your sentences into tokens. How many tokens do you have?
Tokens
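As a way to check a hand count, the tokenization step can be sketched in Python. This is only one possible tokenizer, under the assumption that every run of letters or digits is one token and every other non-space character (punctuation, symbols) is a token of its own. The names `corpus`, `tokenize` and `tokens_per_document` are illustrative and not part of the worksheet.

```python
import re

# The three documents from the Corpus section, copied as given.
corpus = [
    "We can use health chatbots for treating stress.",
    "We can use NLP to create chatbots and we will be making health chatbots now!",
    "Health Chatbots cannot replace human counsellors now. Yay >< !! @1nteLA!4Y",
]

def tokenize(sentence):
    # Assumed rule: a run of letters/digits is one token, and every other
    # non-space character becomes a single-character token.
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", sentence)

tokens_per_document = [tokenize(doc) for doc in corpus]

for number, tokens in enumerate(tokens_per_document, start=1):
    print(f"Document {number}: {len(tokens)} tokens")
    print(tokens)
```

A different rule (for example, keeping "@1nteLA!4Y" as a single token) gives a different count, so state the rule you used next to your answer.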
Modified form
Step 5: Stemming
List out the stem words.
Stem words
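To illustrate what stemming does, here is a deliberately crude sketch that only strips two suffixes. It is a toy rule set written for this worksheet, not the Porter stemmer or any library's algorithm; real stemmers use far richer rules. The function name `crude_stem` and the word list are assumptions chosen for illustration.

```python
def crude_stem(word):
    # Toy rule 1 (assumption): drop a trailing "ing" from longer words.
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    # Toy rule 2 (assumption): drop a plural "s", but leave "ss" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

# A few lowercased words taken from the corpus.
words = ["chatbots", "treating", "making", "counsellors", "stress", "health"]
stems = {word: crude_stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Note that a stem does not have to be a real word: this sketch turns "making" into "mak", which is the kind of result to look out for when comparing stems with lemmas.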
Step 6: Lemmatization
List out the root words (lemmas).
Lemma
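Lemmatization maps each word to a dictionary form, so the result is always a real word. The sketch below uses a small hand-written lookup table as a stand-in for that dictionary. The table `LEMMA_TABLE` and the function `lemmatize` are hypothetical and cover only a few corpus words; an actual lemmatizer consults a full vocabulary.

```python
# Hypothetical lookup table standing in for a real dictionary.
LEMMA_TABLE = {
    "chatbots": "chatbot",
    "treating": "treat",
    "making": "make",
    "counsellors": "counsellor",
}

def lemmatize(word):
    # Words missing from the table are assumed to be lemmas already.
    return LEMMA_TABLE.get(word, word)

# The same lowercased corpus words as a quick comparison set.
words = ["chatbots", "treating", "making", "counsellors", "stress", "health"]
lemmas = {word: lemmatize(word) for word in words}

for word, lemma in lemmas.items():
    print(f"{word} -> {lemma}")
```

Here "making" becomes "make", a valid word, whereas a suffix-stripping stemmer would typically leave a truncated form.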
Final data
List out the final, processed data.
Processed data
Bag of words
Step 1: Collect data and process it
For this exercise, we can use the sentences without processing them so that they are easier to read.
No. Sentence
2 We can use NLP to create chatbots and we will be making health chatbots now
Dictionary
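Building the dictionary and the count vectors can be sketched in Python as follows. The sketch assumes lowercasing and keeping only alphabetic words, and it uses Documents 1 and 2 only, because Document 3 contains noise characters that would need cleaning first. The names `documents`, `vocabulary` and `vectors` are illustrative.

```python
import re
from collections import Counter

# Documents 1 and 2 from the corpus, with end punctuation dropped.
documents = [
    "We can use health chatbots for treating stress",
    "We can use NLP to create chatbots and we will be making health chatbots now",
]

def to_words(sentence):
    # Assumption: lowercase everything and keep alphabetic words only.
    return re.findall(r"[a-z]+", sentence.lower())

tokenized = [to_words(doc) for doc in documents]

# The dictionary is the sorted set of unique words across all documents.
vocabulary = sorted({word for doc in tokenized for word in doc})

# One row per document: how often each dictionary word occurs in it.
vectors = []
for doc in tokenized:
    counts = Counter(doc)
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in vectors:
    print(row)
```

Each row has one entry per dictionary word, so every document is represented by a vector of the same length regardless of how long the sentence is.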
Steps 1 to 3: Count the number of documents in which each word appears at least once, and write that number next to the word in your vocabulary to get its document frequency. Draw your own table for this!
Word                     aman  and  anil  are  stressed  went  to   a    therapist  download  health  chatbot
Document frequency (DF)  2     1    2     1    1         2     2    2    1          1         1       1
3 / DF                   3/2   3/1  3/2   3/1  3/1       3/2   3/2  3/2  3/1        3/1       3/1     3/1
Your tf-idf:
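The arithmetic behind the table can be sketched in Python. The document frequencies are copied from the table, and 3 is the total number of documents used there. The sketch assumes the common definition tf-idf = term frequency × log10(total documents / document frequency); the base-10 logarithm and the function names `idf` and `tf_idf` are assumptions, so adjust them if your class uses a different convention.

```python
import math

TOTAL_DOCUMENTS = 3  # the numerator in the table's 3/DF row

# Document frequencies copied from the table above.
document_frequency = {
    "aman": 2, "and": 1, "anil": 2, "are": 1, "stressed": 1, "went": 2,
    "to": 2, "a": 2, "therapist": 1, "download": 1, "health": 1, "chatbot": 1,
}

def idf(df, total=TOTAL_DOCUMENTS):
    # Assumption: inverse document frequency is log10(total / df).
    return math.log10(total / df)

def tf_idf(term_frequency, df):
    # term_frequency: how many times the word occurs in one document.
    return term_frequency * idf(df)

idf_table = {word: idf(df) for word, df in document_frequency.items()}

for word, value in idf_table.items():
    print(f"{word:10s} 3/{document_frequency[word]}  idf = {value:.3f}")
```

To fill in a tf-idf cell, call `tf_idf` with the word's count in that document and its document frequency. A word that does not occur in a document has a term frequency of 0, so its tf-idf there is 0.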