Natural Language Processing
Types of Chatbots
1) Script Bot
   Script bots work around a script which is programmed in them.
   Mostly they are free and are easy to integrate to a messaging platform.
   No or little language processing skills.
2) Smart Bot
   Smart bots work on bigger databases and other resources directly.
   Smart bots learn with more data.
   Coding is required to take this up on board.
Lemmatization
Stemming and lemmatization are alternative techniques to
one another: both work by removing affixes from a word.
The difference between them is that in lemmatization, the
word we get after affix removal (known as the lemma) is
always a meaningful one, whereas the output of stemming
need not be.
Word        Affix  Lemma
Tries       -es    Try
Trying      -ing   Try
Sweetened   -ed    Sweeten
Sweetening  -ing   Sweeten
Sweetener   -er    Sweeten
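The contrast between the two techniques can be sketched in a few lines of Python. This is only a toy illustration, not a real stemmer or lemmatizer: `naive_stem` and `lookup_lemma` are hypothetical helper names, the suffix list is taken from the table above, and the lemma dictionary is hand-written and covers only the five words in the table.

```python
# Suffixes taken from the table above; a real stemmer uses many more rules.
SUFFIXES = ["ing", "es", "ed", "er"]

# Hand-written lookup limited to the words in the table (illustrative assumption).
LEMMAS = {
    "tries": "try",
    "trying": "try",
    "sweetened": "sweeten",
    "sweetening": "sweeten",
    "sweetener": "sweeten",
}


def naive_stem(word):
    """Chop off the first matching suffix without checking the result is a word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word):
    """Return the dictionary form of the word, or the word itself if unknown."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["Tries", "Trying", "Sweetened", "Sweetening", "Sweetener"]:
    print(w, naive_stem(w), lookup_lemma(w))
```

With this sketch, plain suffix removal turns "Tries" into "tri", which is not a meaningful word, while the lookup returns the lemma "try". For the other four words the two functions happen to agree.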
BAG OF WORDS
Bag of Words is a Natural Language Processing algorithm.
In bag of words, we construct the vocabulary for the corpus
and count the occurrences of each word in each document.
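Here are three documents having one sentence each. After text normalisation,
the text becomes (tokens as recorded in the document vectors below, listed in
vocabulary order):
Document 1: [aman, and, anil, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [anil, went, to, a, download, health, chatbot]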
NOTE: No tokens have been removed in the stop words removal step. This is
because we have very little data, and since the frequency of all the words is
almost the same, no word can be said to have lesser value than another.
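Step 2: Create Dictionary: Go through all the documents and create a dictionary,
i.e., list down every distinct word that occurs across the three documents,
writing each word only once:
aman, and, anil, are, stressed, went, to, a, therapist, download, health, chatbot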
Step 3: Create document vector : In this step, the vocabulary is written in the
top row. Now, for each word in the document, if it matches with the vocabulary,
put a 1 under it.
If the same word appears again, increment the previous value by 1. And if the
word does not occur in that document, put a 0 under it.
Document 1:
Aman  And  Anil  Are  Stressed  Went  To  A  Therapist  Download  Health  Chatbot
1     1    1     1    1         0     0   0  0          0         0       0
The first document contains the words aman, and, anil, are and stressed, so
each of these words gets a value of 1 and the rest of the words get a value of 0.
Document 2:
Aman  And  Anil  Are  Stressed  Went  To  A  Therapist  Download  Health  Chatbot
1     0    0     0    0         1     1   1  1          0         0       0
Document 3:
Aman  And  Anil  Are  Stressed  Went  To  A  Therapist  Download  Health  Chatbot
0     0    1     0    0         1     1   1  0          1         1       1
Combined Table
Document    Aman  And  Anil  Are  Stressed  Went  To  A  Therapist  Download  Health  Chatbot
Document 1  1     1    1     1    1         0     0   0  0          0         0       0
Document 2  1     0    0     0    0         1     1   1  1          0         0       0
Document 3  0     0    1     0    0         1     1   1  0          1         1       1
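Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not a library implementation: the function names `build_vocabulary` and `document_vector` are hypothetical, and the token lists are read off the document vectors above, so the word order inside each document is an assumption (bag of words ignores it anyway).

```python
from collections import Counter

# Tokens per document after text normalisation, read off the document vectors
# above. The order of tokens inside a document is an assumption.
documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "a", "download", "health", "chatbot"],
]


def build_vocabulary(docs):
    """Step 2: list every distinct word once, in order of first appearance."""
    vocabulary = []
    seen = set()
    for doc in docs:
        for word in doc:
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


def document_vector(doc, vocabulary):
    """Step 3: count how often each vocabulary word occurs in one document."""
    counts = Counter(doc)  # a missing word counts as 0
    return [counts[word] for word in vocabulary]


vocabulary = build_vocabulary(documents)
vectors = [document_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)
```

The three printed vectors correspond to the three rows of the combined table. A word that appears twice in a document would get a 2 in its column, matching the "increment the previous value by 1" rule.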
TFIDF: Term Frequency & Inverse Document Frequency
TFIDF helps in identifying the value, i.e. the importance, of each word in a
document relative to the rest of the corpus.
Finally, the words have been converted to numbers. These numbers are the
values of each word for each document. Here, you can see that since we have
very little data, words like ‘are’ and ‘and’ also have a high value. But as the
number of documents containing a word increases, its IDF falls, and so the
value of that word decreases.
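The calculation can be sketched in Python. The formula is not given in this section, so the sketch assumes one common definition: TFIDF(W) = TF(W) × log10(N / DF(W)), where TF(W) is the count of W in the document, N is the number of documents and DF(W) is the number of documents containing W. The function name `tfidf` is hypothetical, and the token lists are the ones recorded in the document vectors of the bag of words example.

```python
import math
from collections import Counter

# Tokens per document, as recorded in the bag of words document vectors.
documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "a", "download", "health", "chatbot"],
]


def tfidf(docs):
    """Return one {word: score} dict per document.

    Assumed definition: score = TF * log10(N / DF).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for doc in docs for word in set(doc))
    # Inverse document frequency under the assumed log10(N / DF) definition.
    idf = {word: math.log10(n_docs / df) for word, df in doc_freq.items()}
    scores = []
    for doc in docs:
        term_freq = Counter(doc)  # words absent from the document count as 0
        scores.append({word: term_freq[word] * idf[word] for word in idf})
    return scores


scores = tfidf(documents)
print(round(scores[0]["and"], 3))   # 0.477: 'and' occurs in 1 of 3 documents
print(round(scores[0]["aman"], 3))  # 0.176: 'aman' occurs in 2 of 3 documents
```

Under this assumed formula, 'and' scores higher than 'aman' in Document 1 only because it occurs in fewer documents. With a larger corpus, 'and' would appear in almost every document and its score would drop towards 0.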