Term weighting assigns a weight to each term in a document, based on its frequency and other properties, to indicate how important and useful that term is for describing the document's contents; the weights are then used to rank documents in response to a query. Weights such as TF-IDF are computed from term frequencies within documents and from each term's document frequency, while a term-term correlation matrix captures the correlation between terms that tend to co-occur.
Information Retrieval 8 Term Weighting A
Information Retrieval : 8
Term Weighting
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer

Term Weighting
• The terms of a document are not equally useful for describing the document contents
• In fact, there are index terms which are simply vaguer than others
• There are properties of an index term which are useful for evaluating the importance of the term in a document
– For instance, a word which appears in all documents of a collection is completely useless for retrieval tasks

Term Weighting
• To characterize term importance, we associate a weight $w_{i,j} > 0$ with each term $k_i$ that occurs in the document $d_j$
– If $k_i$ does not appear in the document $d_j$, then $w_{i,j} = 0$
• The weight $w_{i,j}$ quantifies the importance of the index term $k_i$ for describing the contents of document $d_j$
• These weights are useful to compute a rank for each document in the collection with regard to a given query

Term Weighting
• The weights $w_{i,j}$ can be computed using the frequencies of occurrence of the terms within documents
• Let $f_{i,j}$ be the frequency of occurrence of index term $k_i$ in the document $d_j$
• The total frequency of occurrence $F_i$ of term $k_i$ in the collection is defined as
$$F_i = \sum_{j=1}^{N} f_{i,j}$$
• where $N$ is the number of documents in the collection
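
As a minimal sketch of the definitions above, the following Python computes the within-document frequencies f_{i,j} and the total frequency F_i for one term. The three-document collection and the whitespace tokenizer are illustrative assumptions; they are not the example collection used in the slides.

```python
from collections import Counter

# Hypothetical toy collection (not from the slides); one string per document d_j.
docs = [
    "to do is to be",
    "to be or not to be",
    "do be do",
]

def term_frequencies(documents):
    """Return one Counter per document; entry [j][term] plays the role of f_{i,j}."""
    # Assumption: lowercase whitespace tokenization stands in for real indexing.
    return [Counter(doc.lower().split()) for doc in documents]

def total_frequency(term, freqs):
    """F_i: the sum of f_{i,j} over all N documents in the collection."""
    return sum(f[term] for f in freqs)

f = term_frequencies(docs)
print([fj["do"] for fj in f])    # f_{i,j} for the term "do" in each document
print(total_frequency("do", f))  # F_i for the term "do"
```

A `Counter` returns 0 for a term that does not occur in a document, which mirrors the convention that absent terms contribute nothing to the sum.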
Term Weighting
• The document frequency $n_i$ of a term $k_i$ is the number of documents in which it occurs
• Notice that $n_i \le F_i$
• For instance, in the document collection below, the values $f_{i,j}$, $F_i$ and $n_i$ associated with the term do are
[Figure: example document collection with the values for the term do]

Term-term correlation matrix
• For classic information retrieval models, the index term weights are assumed to be mutually independent
– This means that $w_{i,j}$ tells us nothing about $w_{i+1,j}$
• This is clearly a simplification because occurrences of index terms in a document are not uncorrelated
• For instance, the terms computer and network tend to appear together in a document about computer networks
– In this document, the appearance of one of these terms attracts the appearance of the other
• Thus, they are correlated and their weights should reflect this correlation

TF-IDF Weights
• TF-IDF term weighting scheme:
– Term frequency (TF)
– Inverse document frequency (IDF)
– Foundations of the most popular term weighting scheme in IR

Assignment
• Discuss in detail the concept of Term Weighting and Term Correlation matrix
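
The document frequency, TF-IDF weights and term-term correlation described in the slides can be sketched together in a few lines of Python. Several parts are assumptions, since the slides as preserved give no formulas for them: the toy collection, the log-based TF-IDF variant w_{i,j} = (1 + log2 f_{i,j}) * log2(N / n_i), and the correlation defined as c_{u,v} = sum over j of w_{u,j} * w_{v,j}. Other variants of both formulas are in common use.

```python
import math
from collections import Counter

# Hypothetical toy collection (not from the slides).
docs = [
    "computer network design",
    "computer network security",
    "garden design",
]
freqs = [Counter(d.split()) for d in docs]  # f_{i,j} per document
vocab = sorted(set().union(*freqs))         # index terms k_i
N = len(docs)                               # number of documents

def document_frequency(term):
    """n_i: the number of documents in which the term occurs."""
    return sum(1 for f in freqs if term in f)

def tf_idf(term, j):
    """Assumed variant: (1 + log2 f_{i,j}) * log2(N / n_i), or 0 if the term is absent."""
    f_ij = freqs[j][term]
    if f_ij == 0:
        return 0.0
    return (1 + math.log2(f_ij)) * math.log2(N / document_frequency(term))

# Term-document weight matrix: one row per term, one column per document.
W = [[tf_idf(t, j) for j in range(N)] for t in vocab]

def correlation(u, v):
    """Assumed definition: c_{u,v} = sum over documents of w_{u,j} * w_{v,j}."""
    return sum(a * b for a, b in zip(W[vocab.index(u)], W[vocab.index(v)]))

print(document_frequency("computer"))
print(correlation("computer", "network"))  # terms that co-occur
print(correlation("computer", "garden"))   # terms that never co-occur
```

Under this variant, a term that occurs in every document gets log2(N / N) = 0, which agrees with the remark that such a term is useless for retrieval. Terms that share documents, such as computer and network here, get a positive correlation, while terms that never co-occur get zero.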