TF-IDF
TF-IDF (term frequency-inverse document frequency) measures how relevant a word is to a document.
Terminology:
df(t) = N(t)
where N(t) is the number of documents that contain the term t.
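The definition of df(t) can be sketched in Python as follows. This is a minimal illustration, not the original author's code. The `tokenize` helper (lowercase, keep only letters) and the three documents `A`, `B`, and `C` are hypothetical stand-ins, because the text of D1, D2, and D3 is not reproduced here.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and keep alphabetic runs, so "The" and "the." both become "the"
    return re.findall(r"[a-z]+", text.lower())


def df(term: str, docs: dict[str, str]) -> int:
    # N(t): number of documents containing the term at least once
    return sum(1 for text in docs.values() if term in tokenize(text))


# Hypothetical stand-in documents (not the original D1, D2, D3)
docs = {
    "A": "The cat sat on the mat.",
    "B": "The dog chased the cat.",
    "C": "The bird sang.",
}

print(df("the", docs), df("cat", docs))  # prints: 3 2
```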
Suppose we are searching for documents using the query Q, and our
database consists of the documents D1, D2, and D3.
Q: The cat.
Let’s compute the TF scores of the query words “the” and “cat” with
respect to each of the documents D1, D2, and D3.
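A sketch of the TF step, assuming the common variant in which a term's count is divided by the total number of words in the document; the exact TF formula behind the original scores is not shown. The `tf` and `tokenize` functions and the documents `A`, `B`, and `C` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and keep alphabetic runs only
    return re.findall(r"[a-z]+", text.lower())


def tf(term: str, text: str) -> float:
    # Assumed TF variant: count of the term / number of words in the document
    tokens = tokenize(text)
    return tokens.count(term) / len(tokens) if tokens else 0.0


# Hypothetical stand-in documents (not the original D1, D2, D3)
docs = {
    "A": "The cat sat on the mat.",
    "B": "The dog chased the cat.",
    "C": "The bird sang.",
}

query = tokenize("The cat.")
for name, text in docs.items():
    # TF of each query word in this document
    print(name, {t: round(tf(t, text), 3) for t in query})
```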
Let’s compute the IDF scores of the words “the” and “cat”.
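A sketch of the IDF step, assuming idf(t) = log(N / df(t)) with the natural logarithm, where N is the total number of documents. Changing the log base rescales every score by the same factor, so it does not change a ranking. Returning 0.0 for a term that occurs in no document is an assumption made here to avoid dividing by zero; the helpers and documents are hypothetical.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and keep alphabetic runs only
    return re.findall(r"[a-z]+", text.lower())


def idf(term: str, docs: dict[str, str]) -> float:
    # Assumed IDF variant: log(N / df(t)); 0.0 if the term appears in no document
    df = sum(1 for text in docs.values() if term in tokenize(text))
    return math.log(len(docs) / df) if df else 0.0


# Hypothetical stand-in documents (not the original D1, D2, D3)
docs = {
    "A": "The cat sat on the mat.",
    "B": "The dog chased the cat.",
    "C": "The bird sang.",
}

for term in tokenize("The cat."):
    print(term, round(idf(term, docs), 4))
```

Because “the” occurs in all three stand-in documents, its IDF is log(3/3) = 0.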
Multiplying TF and IDF gives the TF-IDF score of a word in a document. The
higher the score, the more relevant that word is to that particular
document.
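Combining the two assumed formulas, a TF-IDF score is simply their product. The sketch repeats the hypothetical helpers so that it runs on its own; with these stand-in documents, “the” scores 0.0 everywhere because it occurs in every document.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and keep alphabetic runs only
    return re.findall(r"[a-z]+", text.lower())


def tf(term: str, text: str) -> float:
    # Assumed TF variant: term count / number of words in the document
    tokens = tokenize(text)
    return tokens.count(term) / len(tokens) if tokens else 0.0


def idf(term: str, docs: dict[str, str]) -> float:
    # Assumed IDF variant: log(N / df(t)); 0.0 if the term appears nowhere
    df = sum(1 for text in docs.values() if term in tokenize(text))
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term: str, name: str, docs: dict[str, str]) -> float:
    # TF-IDF of a term in one document: its TF there times its IDF across the collection
    return tf(term, docs[name]) * idf(term, docs)


# Hypothetical stand-in documents (not the original D1, D2, D3)
docs = {
    "A": "The cat sat on the mat.",
    "B": "The dog chased the cat.",
    "C": "The bird sang.",
}

for name in docs:
    print(name, {t: round(tf_idf(t, name, docs), 4) for t in tokenize("The cat.")})
```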
Let’s compute the TF-IDF scores of the words “the” and “cat”.
Each document's score for the query is the average TF-IDF of the query words. For example:
Average TF-IDF of D3 = (0 + 0) / 2 = 0
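Scoring each document by the average TF-IDF of the query words, as in the D3 line above, and sorting by that score in descending order yields a ranking. The sketch below reuses the same hypothetical helpers, assumed TF and IDF variants, and stand-in documents, so its ranking says nothing about the original D1, D2, and D3.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and keep alphabetic runs only
    return re.findall(r"[a-z]+", text.lower())


def tf(term: str, text: str) -> float:
    # Assumed TF variant: term count / number of words in the document
    tokens = tokenize(text)
    return tokens.count(term) / len(tokens) if tokens else 0.0


def idf(term: str, docs: dict[str, str]) -> float:
    # Assumed IDF variant: log(N / df(t)); 0.0 if the term appears nowhere
    df = sum(1 for text in docs.values() if term in tokenize(text))
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term: str, name: str, docs: dict[str, str]) -> float:
    return tf(term, docs[name]) * idf(term, docs)


def score(query: str, name: str, docs: dict[str, str]) -> float:
    # Average TF-IDF of the query words in one document
    terms = tokenize(query)
    return sum(tf_idf(t, name, docs) for t in terms) / len(terms)


def rank(query: str, docs: dict[str, str]) -> list[str]:
    # Document names ordered from highest to lowest average TF-IDF
    return sorted(docs, key=lambda name: score(query, name, docs), reverse=True)


# Hypothetical stand-in documents (not the original D1, D2, D3)
docs = {
    "A": "The cat sat on the mat.",
    "B": "The dog chased the cat.",
    "C": "The bird sang.",
}

print(rank("The cat.", docs))  # prints: ['B', 'A', 'C']
```

With these stand-ins, `B` ranks above `A` because “cat” makes up a larger share of its words, and `C` scores 0 because it does not contain “cat”.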
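The word “the” contributes nothing to the TF-IDF score of any document.
Because “the” appears in all of the documents, its IDF is 0, so its TF-IDF
is 0 regardless of its TF: a word that occurs everywhere cannot help
distinguish one document from another, so it is treated as not relevant.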
In conclusion, when we run the query “The cat” over the collection of
documents D1, D2, and D3, the results ranked by average TF-IDF would be: