Introduction To Information Retrieval - by William Scott - Medium
William Scott
For those who are highly interested, I suggest the book “Introduction to Information Retrieval” by Manning.
3. TF-IDF
More to come…
So what an IR system does is take the query from the user, understand it, search for it in its corpus, and return the relevant documents.
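The loop described above (take a query, search the corpus, return matching documents) can be sketched in a few lines. This is a minimal sketch, assuming a toy in-memory corpus, whitespace tokenization, and Boolean AND matching over an inverted index; `CORPUS`, `build_index`, and `search` are hypothetical names for illustration, not something the article defines.

```python
from collections import defaultdict

# A toy corpus (hypothetical documents, for illustration only).
CORPUS = {
    1: "the barber gave me a short haircut",
    2: "apple released a new phone",
    3: "i ate an apple and a banana",
}

def build_index(corpus):
    """Map each term to the set of document ids that contain it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(query, index):
    """Return ids of documents that contain every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

index = build_index(CORPUS)
print(sorted(search("apple", index)))   # [2, 3]
print(sorted(search("barber", index)))  # [1]
print(sorted(search("salon", index)))   # []
```

The last query returns nothing even though document 1 is about a barber, which is exactly the kind of mismatch that exact word matching cannot handle on its own.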
Synonyms: Many words have alternative words with the same meaning. For example, when a user is trying to get a haircut, their search query could be “salon” or “barber”. We cannot show them only the documents that contain “barber” and leave out the ones that contain “salon”, because here both mean the same thing. More general examples are Mom/Mother and Hat/Cap.
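One common way to cope with synonyms (not necessarily the approach this series will take) is query expansion: before matching, add every known synonym of the query term. This is a minimal sketch, assuming a small hand-written synonym table; `SYNONYM_GROUPS`, `expand`, and `search_with_synonyms` are hypothetical, and a real system would need a far larger source of synonyms.

```python
# Hypothetical, hand-written synonym groups, for illustration only.
SYNONYM_GROUPS = [
    {"salon", "barber"},
    {"mom", "mother"},
    {"hat", "cap"},
]

def expand(term):
    """Return the term together with every synonym listed in its group."""
    term = term.lower()
    expanded = {term}
    for group in SYNONYM_GROUPS:
        if term in group:
            expanded |= group
    return expanded

DOCS = {
    1: "best barber in town",
    2: "hair salon open on sunday",
    3: "fresh bread and coffee",
}

def search_with_synonyms(term, docs):
    """Return ids of documents containing the term or any of its synonyms."""
    wanted = expand(term)
    return {doc_id for doc_id, text in docs.items() if wanted & set(text.lower().split())}

print(sorted(search_with_synonyms("salon", DOCS)))  # [1, 2]
```

A query for “salon” now also finds the barber document, at the cost of maintaining the synonym table.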
Homographs: These are words that have the same spelling but different meanings in different sentences. We are not concerned with the pronunciation of words here. Lie can mean lying on a bed or lying to another person. Tear could mean tearing a paper or shedding tears (as in crying). Apple could be a company or a fruit.
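To see why homographs are a problem, here is a small sketch of plain term matching over three hypothetical documents. The documents and the `matching_docs` helper are made up for illustration; the point is only that the word by itself cannot separate the company sense from the fruit sense.

```python
# Hypothetical documents: one about the company, two about the fruit.
DOCS = {
    1: "apple shares rose after the iphone launch",
    2: "an apple a day keeps the doctor away",
    3: "she picked a ripe apple from the tree",
}

def matching_docs(term, docs):
    """Plain term matching: every document that contains the exact word."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())

# The same spelling matches both senses; nothing in the term says which one was meant.
print(matching_docs("apple", DOCS))  # [1, 2, 3]
```

A user asking about the company gets the fruit documents too, and vice versa, unless the system looks beyond the single word.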
Because of these problems, we need to build an intelligent IR model that can understand the user's query and return the relevant documents. Don't worry about these problems for now; we will deal with them later. Just as a gist, we handle them in an important stage called preprocessing, where the information is turned into a more general form that helps us relate words much better.
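As a rough idea of what “turning the information into a more general form” can look like, the sketch below lowercases text, keeps only alphanumeric tokens, drops a few stop words, and strips some common suffixes. It is a minimal sketch under stated assumptions: `STOP_WORDS` is a tiny hypothetical list, and `crude_stem` is a toy suffix rule, not the Porter stemmer or any library's stemmer.

```python
import re

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "for"}

def crude_stem(word):
    """Strip a few common English suffixes. A toy rule, not a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Keep at least 3 characters so short words like "is" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, keep alphanumeric tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Searching for Barbers and Salons!"))  # ['search', 'barber', 'salon']
```

After this step, “Barbers” and “barber” map to the same token, so a document and a query that differ only in case, punctuation, or plural form can still be matched.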
Intelligent IR
When we try to retrieve relevant documents, we first need to define relevance. Are we going to retrieve the latest documents, or the documents that match the subject?
An intelligent IR model does not depend on just one factor to determine relevance. It considers metadata, authoritativeness, the type of information need, the meaning of the query, the meaning of the sentences in the document, and many other factors.
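One simple way to combine several relevance factors is a weighted sum of per-factor scores. The sketch below is only an illustration under stated assumptions: the three factors (term match, recency, authority), the `WEIGHTS` values, the 30-day half-life, and the `Doc` fields are all hypothetical choices, not the article's model.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: int
    text: str
    age_days: int     # hypothetical metadata: how old the document is
    authority: float  # hypothetical score in [0, 1], e.g. source reputation

# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"match": 0.6, "recency": 0.2, "authority": 0.2}

def term_match(query, text):
    """Fraction of distinct query terms that appear in the document."""
    q_terms = set(query.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & set(text.lower().split())) / len(q_terms)

def recency(age_days, half_life=30.0):
    """Score that halves every `half_life` days (an assumed decay model)."""
    return 0.5 ** (age_days / half_life)

def relevance(query, doc):
    """Weighted sum of the match, recency, and authority scores."""
    return (WEIGHTS["match"] * term_match(query, doc.text)
            + WEIGHTS["recency"] * recency(doc.age_days)
            + WEIGHTS["authority"] * doc.authority)

docs = [
    Doc(1, "cheap barber near me", age_days=400, authority=0.3),
    Doc(2, "barber reviews near the station", age_days=2, authority=0.9),
    Doc(3, "bakery opening hours", age_days=1, authority=0.9),
]
ranked = sorted(docs, key=lambda d: relevance("barber near me", d), reverse=True)
print([d.doc_id for d in ranked])  # [2, 1, 3]
```

Document 1 matches every query word, yet document 2 ranks first because it is recent and authoritative; document 3 is fresh and authoritative but does not match the subject, so it stays last.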
Basic Terminology:
Collection / Corpus: the collection of documents that the IR system searches.
Resources:
Introduction to Information Retrieval — Manning