
Information Retrieval 8 Term Weighting A

Term weighting assigns a weight to each term in a document, based on its frequency and other properties, to indicate how important and useful that term is for describing the document's contents; the weights are then used to rank documents in response to a query. Term frequencies within documents and a term's document frequency are used to compute weights such as TF-IDF, while a term-term correlation matrix reflects the correlation between terms that tend to co-occur.

Uploaded by Vaibhav Khanna
Copyright © All Rights Reserved

Information Retrieval: 8

Term Weighting

Prof Neeraj Bhargava


Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Term Weighting
• The terms of a document are not equally useful for describing the document's contents
• In fact, some index terms are simply vaguer than others
• Some properties of an index term are useful for evaluating the importance of the term in a document
– For instance, a word that appears in every document of a collection is useless for retrieval tasks, because it cannot distinguish one document from another
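To make the last point concrete, here is a small Python sketch, assuming documents are already tokenized into lists of lowercase words; the function name `non_discriminating_terms` and the toy collection are hypothetical. It lists the terms whose document frequency equals the collection size, i.e. the words that occur in every document and therefore cannot separate one document from another.

```python
from collections import Counter


def non_discriminating_terms(docs: list[list[str]]) -> set[str]:
    """Return the terms that occur in every document of the collection."""
    # Document frequency: count each term once per document it appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    return {term for term, n in doc_freq.items() if n == n_docs}


# Hypothetical toy collection, already tokenized.
docs = [
    ["the", "computer", "network"],
    ["the", "network", "protocol"],
    ["the", "database", "index"],
]
print(non_discriminating_terms(docs))  # {'the'}
```

Matching on "the" would return the whole collection, so it adds nothing to a query.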
Term Weighting
• To characterize term importance, we associate a weight $w_{i,j} > 0$ with each term $k_i$ that occurs in the document $d_j$
– If $k_i$ does not appear in the document $d_j$, then $w_{i,j} = 0$
• The weight $w_{i,j}$ quantifies the importance of the index term $k_i$ for describing the contents of document $d_j$
• These weights are used to compute a rank for each document in the collection with respect to a given query
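A minimal sketch of how such weights can produce a ranking, assuming each document's weights are stored sparsely as a `{term: weight}` dictionary (so any absent term implicitly has $w_{i,j} = 0$) and assuming the simplest possible score, the sum of the weights of the query terms. The slides do not fix a ranking function, and models such as the vector model use other formulas (for example cosine similarity); `rank` and the weight values are hypothetical.

```python
def rank(query: list[str], doc_weights: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank documents by the summed weights of the query terms they contain."""
    scores = {}
    for doc_id, weights in doc_weights.items():
        # A term that does not occur in the document has weight w_ij = 0,
        # so .get(term, 0.0) models the sparse weight vector directly.
        scores[doc_id] = sum(weights.get(term, 0.0) for term in query)
    # Highest score first; sorted() is stable, so ties keep insertion order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights w_ij for three documents.
doc_weights = {
    "d1": {"computer": 0.8, "network": 0.5},
    "d2": {"network": 0.9},
    "d3": {"database": 0.7},
}
print(rank(["computer", "network"], doc_weights))
```

Here d1 ranks first because it carries weight for both query terms, and d3 scores zero because neither term occurs in it.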
Term Weighting
• The weights $w_{i,j}$ can be computed using the frequencies of occurrence of the terms within documents
• Let $f_{i,j}$ be the frequency of occurrence of index term $k_i$ in the document $d_j$
• The total frequency of occurrence $F_i$ of term $k_i$ in the collection is defined as

$$F_i = \sum_{j=1}^{N} f_{i,j}$$

where $N$ is the number of documents in the collection
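The definitions of $f_{i,j}$ and $F_i$ translate directly into code. A sketch, assuming a collection that is already tokenized; the three documents and the helper names `term_frequencies` and `total_frequency` are made up for illustration.

```python
from collections import Counter


def term_frequencies(docs: list[list[str]]) -> list[Counter]:
    """freqs[j][term] = f_{i,j}, the number of occurrences of the term in d_j."""
    return [Counter(doc) for doc in docs]


def total_frequency(term: str, freqs: list[Counter]) -> int:
    """F_i = sum over j = 1..N of f_{i,j}."""
    # Counter returns 0 for a missing key, i.e. f_{i,j} = 0 when k_i is absent from d_j.
    return sum(f[term] for f in freqs)


docs = [
    ["computer", "network", "computer", "science"],
    ["network", "security", "network"],
    ["database", "computer"],
]
freqs = term_frequencies(docs)
print([f["computer"] for f in freqs])      # [2, 0, 1]
print(total_frequency("computer", freqs))  # 3
```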


Term Weighting
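• The document frequency $n_i$ of a term $k_i$ is the number of documents in which it occurs
• Notice that $n_i \le F_i$, since every document that contains $k_i$ contributes at least one occurrence to $F_i$
• For instance, the slide shows the values $f_{i,j}$, $F_i$ and $n_i$ associated with the term "do" in an example document collection:
[Figure: example document collection with the values of $f_{i,j}$, $F_i$ and $n_i$ for the term "do"]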
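A short sketch computing $n_i$ and $F_i$ side by side for a made-up tokenized collection (not the example collection from the slide); the helper name `doc_and_total_frequency` is hypothetical. The term "network" occurs twice in one document, so $n_i < F_i$, while "protocol" occurs at most once per document, so $n_i = F_i$.

```python
from collections import Counter


def doc_and_total_frequency(term: str, docs: list[list[str]]) -> tuple[int, int]:
    """Return (n_i, F_i) for one term over a tokenized collection."""
    f = [Counter(doc)[term] for doc in docs]  # f_{i,j} for j = 1..N
    n_i = sum(1 for f_ij in f if f_ij > 0)    # number of documents containing the term
    F_i = sum(f)                              # total occurrences in the collection
    return n_i, F_i


docs = [
    ["network", "network", "protocol"],
    ["routing", "protocol"],
    ["network"],
]
print(doc_and_total_frequency("network", docs))   # (2, 3)
print(doc_and_total_frequency("protocol", docs))  # (2, 2)
```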
Term-term correlation matrix
• In classic information retrieval models, the index term weights are assumed to be mutually independent
– This means that $w_{i,j}$ tells us nothing about $w_{i+1,j}$
• This is clearly a simplification, because occurrences of index terms in a document are not uncorrelated
• For instance, the terms "computer" and "network" tend to appear together in a document about computer networks
– In such a document, the appearance of one of these terms attracts the appearance of the other
• Thus, these terms are correlated, and their weights should reflect this correlation
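The slides do not spell out how the correlation matrix is built. One common construction, assumed for this sketch, arranges the weights into a $t \times N$ term-document matrix $M$ (one row per term, one column per document) and takes $C = M M^T$, so that $c_{u,v} = \sum_j w_{u,j} \, w_{v,j}$ grows as $k_u$ and $k_v$ co-occur with high weights in more documents. The function name and the weights are hypothetical.

```python
def correlation_matrix(M: list[list[float]]) -> list[list[float]]:
    """Return C = M M^T for a t x N term-document weight matrix M."""
    t = len(M)
    # c[u][v] = sum over documents j of w_{u,j} * w_{v,j}
    return [
        [sum(wu * wv for wu, wv in zip(M[u], M[v])) for v in range(t)]
        for u in range(t)
    ]


terms = ["computer", "network", "cooking"]
# Hypothetical weights: rows are terms k_i, columns are documents d_1..d_3.
M = [
    [1.0, 1.0, 0.0],  # computer
    [1.0, 1.0, 0.0],  # network
    [0.0, 0.0, 1.0],  # cooking
]
C = correlation_matrix(M)
print(C[0][1], C[0][2])  # 2.0 0.0
```

"computer" and "network" share two documents and get a high correlation, while "computer" and "cooking" never co-occur and get zero. The matrix is symmetric by construction.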
TF-IDF Weights
• TF-IDF term weighting scheme:
– Term frequency (TF)
– Inverse document frequency (IDF)
– Together, TF and IDF are the foundations of the most popular term weighting scheme in IR
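TF-IDF has several variants, and the slides do not say which one they use. The sketch below assumes one common form, $w_{i,j} = (1 + \log_2 f_{i,j}) \times \log_2(N / n_i)$ when $f_{i,j} > 0$ and $w_{i,j} = 0$ otherwise; the function name `tf_idf` and the toy collection are hypothetical. A term occurring in every document gets $\log_2(N/N) = 0$, consistent with the earlier remark that such words are useless for retrieval.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document using log TF x IDF."""
    N = len(docs)
    freqs = [Counter(doc) for doc in docs]          # f_{i,j}
    n = Counter(term for f in freqs for term in f)  # n_i: documents containing k_i
    weights = []
    for f in freqs:
        w = {}
        for term, f_ij in f.items():
            tf = 1 + math.log2(f_ij)      # f_ij >= 1 for every term stored in f
            idf = math.log2(N / n[term])  # 0 when the term is in every document
            w[term] = tf * idf
        weights.append(w)  # absent terms are simply not stored (weight 0)
    return weights


docs = [
    ["the", "computer", "network", "network"],
    ["the", "network"],
    ["the", "cooking"],
    ["the", "computer"],
]
W = tf_idf(docs)
print(W[0])  # {'the': 0.0, 'computer': 1.0, 'network': 2.0}
```

Taking the logarithm of $f_{i,j}$ dampens repeated occurrences, so a term that appears ten times is not weighted ten times as heavily as one that appears once.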
Assignment
• Discuss in detail the concept of Term Weighting and the Term-term Correlation matrix
