
Information Retrieval

Programming Assignment 2
Overview

In this assignment, you will use the index you created in Assignment 1 to rank documents and
create a search engine. You will implement different scoring functions and compare their results
against a baseline ranking produced by expert analysts.

Running Queries

For this assignment, you will need the following two files:

• queries: It contains the queries you will be testing.

• qrels: It contains the relevance grades from expert assessors. While these grades are not necessarily entirely correct (and defining correctness unambiguously is quite difficult), they are fairly reliable and we will treat them as being correct here.

The format here is one line per assessment, with the following fields:

o <topic> is the ID of the query for which the document was assessed.
o <doc> is the name of one of the documents which you have indexed.
o <grade> is a value in the set {1, 2, 3, 4}, where a higher value means that the document is more relevant to the query. The value 1 indicates a document which is non-relevant.

This QREL does not have assessments for every (query, document) pair. If an assessment
is missing, we assume the correct grade for the pair is 1 (non-relevant).
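For example, a minimal sketch of loading the qrels into a lookup table that applies this default, assuming whitespace-separated lines with the fields in the order listed above (adjust the parsing if your file differs):

import collections

def load_qrels(path):
    """Map (topic, doc) -> grade; unassessed pairs default to grade 1 (non-relevant)."""
    grades = collections.defaultdict(lambda: 1)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            # Assumed field order: <topic> <doc> <grade>.
            # Some qrel files carry an extra column -- adjust the indexing if yours does.
            topic, doc, grade = parts[0], parts[1], int(parts[-1])
            grades[(topic, doc)] = grade
    return grades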

You will write a program which takes the name of a scoring function as a command line
argument and which prints a ranked list of documents for all queries found in topics.xml using
that scoring function. For example:

$ ./query.py --score TF-IDF


202 clueweb12-0000tw-13-04988 1 0.73 run1
202 clueweb12-0000tw-13-04901 2 0.33 run1
202 clueweb12-0000tw-13-04932 3 0.32 run1
...
214 clueweb12-0000tw-13-05088 1 0.73 run1
214 clueweb12-0000tw-13-05001 2 0.33 run1
214 clueweb12-0000tw-13-05032 3 0.32 run1
...
250 clueweb12-0000tw-13-05032 500 0.002 run1
The output should have one row for each document which your program ranks for each query it
runs. These lines should have the format:

<topic> <docid> <rank> <score> <run>

• <topic> is the ID of the query for which the document was ranked.
• <docid> is the document identifier.
• <rank> is the order in which to present the document to the user. The document with the highest score will be assigned a rank of 1, the second highest a rank of 2, and so on.
• <score> is the actual score the document obtained for that query.
• <run> is the name of the run. You can use any value here. It is meant to allow research teams to submit multiple runs for evaluation in competitions such as TREC.
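Once you have a score for every document retrieved for a query, producing this output might look like the following sketch (the function and variable names are placeholders, not a required interface):

def print_ranking(topic, doc_scores, run_name="run1"):
    """doc_scores: dict mapping docid -> score for one query (topic)."""
    # Sort by score, highest first; the best-scoring document gets rank 1.
    ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
    for rank, (docid, score) in enumerate(ranked, start=1):
        print(f"{topic} {docid} {rank} {score:.4f} {run_name}")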

Query Processing

Before running any scoring function, you should process the text of the query in exactly the same
way that you processed the text of a document for your inverted index. That is:

1. Split the query into tokens
2. Apply stop-wording to the query using the same list you used in Assignment 1
3. Apply the same stemming algorithm to the query which you used in your indexer
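A minimal sketch of this pipeline, reusing hypothetical stop-word and stemming components from your Assignment 1 indexer (the names here are placeholders for your own code):

def process_query(query_text, stopwords, stem):
    """Apply the same tokenization, stop-wording, and stemming as the indexer.

    stopwords: the stop-word set used in Assignment 1.
    stem: the stemming function used by your indexer (e.g. a Porter stemmer).
    """
    tokens = query_text.lower().split()                  # 1. tokenize (use your indexer's tokenizer)
    tokens = [t for t in tokens if t not in stopwords]    # 2. remove stop words
    return [stem(t) for t in tokens]                      # 3. stem with the same algorithm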

Scoring Function 1: TF-IDF

The parameter --score TF-IDF directs your program to use a vector space model with TF-IDF
scores. This should be very similar to the TF score, but use the following scoring function:
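A standard TF-IDF weighting consistent with the definitions below (the exact variant shown in lecture may differ), with tf_{i,d} the frequency of term i in document d, is:

score(d, q) = \sum_{i \in q} tf_{i,d} \cdot \log \frac{D}{df(i)}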

where D is the total number of documents, and df(i) is the number of documents which contain term
i.

Scoring Function 2: Okapi BM25

Implement BM25 scores. This should use the following scoring function for document d and
query q:
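One common form of the BM25 scoring function (this follows the formulation in Croft et al.'s textbook; the version from lecture may differ slightly) is:

score(d, q) = \sum_{i \in q} \log\frac{D - df(i) + 0.5}{df(i) + 0.5} \cdot \frac{(k_1 + 1)\, f_{i,d}}{K + f_{i,d}} \cdot \frac{(k_2 + 1)\, qf_i}{k_2 + qf_i},
\qquad K = k_1\left((1 - b) + b\,\frac{|d|}{avgdl}\right)

Here D and df(i) are as in TF-IDF, f_{i,d} is the frequency of term i in document d, qf_i is its frequency in the query, |d| is the document length, and avgdl is the average document length.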
Where k1, k2, and b are constants. To start, you can use the values suggested in the lecture on BM25 (k1 = 1.2, k2 between 0 and 1000, b = 0.75). Feel free to experiment with different values for these constants to learn their effect and try to improve performance.

Scoring Function 3: Language model with Dirichlet Smoothing

Implement a language model with Dirichlet smoothing. The parameter mu should be set equal to the average document length in the collection.
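A common form of the query-likelihood score with Dirichlet smoothing (a sketch; use whatever variant was covered in lecture) is:

score(d, q) = \sum_{i \in q} \log\frac{f_{i,d} + \mu \cdot \frac{c_i}{|C|}}{|d| + \mu}

where f_{i,d} is the frequency of term i in document d, c_i is its frequency in the entire collection, |C| is the total number of term occurrences in the collection, |d| is the document length, and \mu is the average document length as specified above.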

Evaluation

To evaluate your results, we will write a program that computes NDCG@5, NDCG@10, NDCG@15, and NDCG@20. The inputs to the program will be the qrel file (relevance judgments) and the scoring file containing the ranked list of documents.

These measures should be computed for each query. The average over all queries should also be computed.
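A minimal sketch of NDCG@k for a single query, assuming a gain of 2^(grade - 1) - 1 so that grade 1 (non-relevant) contributes zero (the exact gain and discount conventions used for grading may differ):

import math

def ndcg_at_k(ranked_docs, grades, k):
    """ranked_docs: docids in ranked order for one query.
    grades: dict docid -> grade in {1, 2, 3, 4}; missing docids count as grade 1.
    """
    def gain(docid):
        return 2 ** (grades.get(docid, 1) - 1) - 1     # assumed gain: grade 1 -> 0

    dcg = sum(gain(d) / math.log2(i + 1)
              for i, d in enumerate(ranked_docs[:k], start=1))
    # Ideal DCG: the best possible ordering of the assessed documents for this query.
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum((2 ** (g - 1) - 1) / math.log2(i + 1)
               for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0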

Report

Make a table of the results for each query.

Submission Checklist

Submit your files in a zipped folder named with your roll number on Google Classroom.

• Your source code
• The output ranking for each scoring function (zipped or gzipped)
• A report with a table of results, including mean average precision for each scoring function
• DO NOT SUBMIT THE INDEX
