PORTFOLIO-AI-NLP - Document Vector
Solution:
Step 1: Text Normalisation (collect data and preprocess)
Here are 4 documents of 1 sentence each. After text normalisation, the text becomes
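Document vector table (word counts per document):
                       Johny     yes    papa  eating   sugar      no telling    lies    open    your   mouth      ha
Doc 1                      2       1       1       0       0       0       0       0       0       0       0       0
Doc 2                      0       0       1       1       1       1       0       0       0       0       0       0
Doc 3                      0       0       1       0       0       1       1       1       0       0       0       0
Doc 4                      0       0       0       0       0       0       0       0       1       1       1       3
Document frequency         1       1       3       1       1       2       1       1       1       1       1       1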
Document frequency: the number of documents in which a word occurs, irrespective of
how many times it occurs in each document. For example, "ha" occurs 3 times in Doc 4,
but its document frequency is 1.
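
The definition above can be checked with a short Python sketch. The count vectors are copied from the Doc 1 to Doc 4 rows, and the word order is assumed to match the word list given in Step 6. The names vocab, doc_vectors and document_frequency are illustrative only and do not come from the original material.

```python
# Word order assumed to match the word list in Step 6.
vocab = ["Johny", "yes", "papa", "eating", "sugar", "no",
         "telling", "lies", "open", "your", "mouth", "ha"]

# Word counts per document, copied from the Doc 1 to Doc 4 rows.
doc_vectors = [
    [2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Doc 1
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # Doc 2
    [0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0],  # Doc 3
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 3],  # Doc 4
]


def document_frequency(vectors):
    """For each word (column), count the documents with a non-zero count."""
    return [sum(1 for count in column if count > 0) for column in zip(*vectors)]


df = document_frequency(doc_vectors)
for word, freq in zip(vocab, df):
    print(f"{word}: {freq}")
```

Note that a word counted 2 or 3 times in one document still contributes only 1 to its document frequency, because only presence in the document is counted.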
Step 6: Inverse Document Frequency
In the inverse document frequency table, the total number of documents is the numerator
and the document frequency of each word is the denominator.
Here, the total number of documents is 4, hence the inverse document frequency becomes:
   Johny     yes    papa  eating   sugar      no telling    lies    open    your   mouth      ha
     4/1     4/1     4/3     4/1     4/1     4/2     4/1     4/1     4/1     4/1     4/1     4/1
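
As a minimal sketch of this step, the ratio (total number of documents) / (document frequency) can be computed in Python with exact fractions. The document frequencies are copied from the document frequency row earlier in this solution. The function name inverse_document_frequency is a hypothetical helper, not part of the original material.

```python
from fractions import Fraction

vocab = ["Johny", "yes", "papa", "eating", "sugar", "no",
         "telling", "lies", "open", "your", "mouth", "ha"]

# Document frequency of each word, in the same order as vocab.
doc_freqs = [1, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 1]
total_docs = 4


def inverse_document_frequency(freqs, n_docs):
    """Return n_docs / document frequency for each word as an exact fraction."""
    return [Fraction(n_docs, f) for f in freqs]


idf = inverse_document_frequency(doc_freqs, total_docs)
for word, value in zip(vocab, idf):
    print(f"{word}: {value}")
```

Fraction reduces each ratio to lowest terms, so 4/2 is shown as 2 and 4/1 as 4, while 4/3 stays as written in the table.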