Assignment 1

This document outlines an assignment to build and compare three information retrieval systems: a grep-based system, an index-based system developed by the student, and a Lucene-based system. Students are asked to implement boolean query processing in each system, measure precision and recall on a provided dataset, and compare the performance of the three approaches. The dataset includes documents, queries, and relevance judgments, and is available at a provided URL.

Uploaded by

Gourab Patro


CS60092: Information Retrieval

Jan 2018, Assignment 1

Motivation: This assignment gives you a hands-on feel for a simple Information Retrieval
system.

Task:
You are given a dataset consisting of the following:
● Documents
● Queries
● Documents relevant to the queries.

You have to implement and compare the following systems for boolean query processing:
● Grep based.
● Index based.
● Lucene based (well known indexing tool).

Report the performance metrics (precision and recall) and the total time taken to search all queries for
each of the three techniques.
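
As a minimal sketch of the two metrics, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function names below are illustrative and not part of the assignment; returning 0.0 for an empty denominator and averaging per-query scores over all queries are assumed conventions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of doc ids returned by the system
    relevant:  iterable of doc ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def macro_average(per_query_scores):
    """Average a list of (precision, recall) pairs over all queries."""
    if not per_query_scores:
        return 0.0, 0.0
    n = len(per_query_scores)
    avg_precision = sum(p for p, _ in per_query_scores) / n
    avg_recall = sum(r for _, r in per_query_scores) / n
    return avg_precision, avg_recall
```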

You can issue AND queries over all the search terms.

Using the dataset provided:

a) Use grep to find the results for the given sample queries. Time each execution of the grep
search and record it.
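
One possible way to script step (a) is sketched below in Python. It assumes a grep binary is on the PATH and that the documents have been extracted into a directory; the function names and the choice of flags (case-insensitive, whole-word, fixed-string matching) are illustrative assumptions, not requirements of the assignment. An AND query is answered by running grep once per term and intersecting the sets of matching files.

```python
import subprocess
import time


def grep_files(term, doc_dir):
    """Return the set of file paths under doc_dir that contain term."""
    # -r recurse, -l print matching file names only, -i ignore case,
    # -w whole-word match, -F treat the term as a fixed string.
    proc = subprocess.run(
        ["grep", "-rliwF", "-e", term, doc_dir],
        capture_output=True,
        text=True,
    )
    # grep exits with status 1 when nothing matches; that is not an error here.
    return set(proc.stdout.splitlines())


def grep_and_query(terms, doc_dir):
    """AND query: files containing every term. Returns (files, seconds)."""
    start = time.perf_counter()
    result = None
    for term in terms:
        files = grep_files(term, doc_dir)
        result = files if result is None else result & files
        if not result:
            break  # no file can satisfy the remaining terms
    elapsed = time.perf_counter() - start
    return (result or set()), elapsed
```

The elapsed time returned for each query can be written to a log and summed to get the total search time for the query set.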

b) Develop an inverted index (dictionary and postings lists) using standard data structures in
Java (HashMap, ArrayList, …) or Python (dictionaries, JSON, lists, …). You can
choose to tokenize and stem/lemmatize the data. In Python, use the NLTK 3 libraries
(http://www.nltk.org/install.html; NLTK Book: http://www.nltk.org/book); in Java, use the CoreNLP
3.6 libraries (http://stanfordnlp.github.io/CoreNLP/download.html). Develop a
solution for simple conjunctive/disjunctive queries and run it on the given query set. Tabulate
the search speedup relative to the grep approach above. Also calculate precision
and recall for the given query set.
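
A minimal sketch of step (b) in Python follows. It uses a plain dictionary mapping each term to a sorted postings list of doc ids, and answers AND queries with the standard merge intersection of sorted lists. The regex tokenizer (lowercasing, no stemming) is a simplifying assumption that stands in for an NLTK pipeline, and all function names are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs: dict of doc_id -> text. Returns dict of term -> sorted postings."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge-intersect two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Conjunctive query: docs containing every query term."""
    # Process the shortest postings lists first to keep intermediates small.
    lists = sorted((index.get(t, []) for t in tokenize(query)), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
        if not result:
            break
    return result


def or_query(index, query):
    """Disjunctive query: docs containing at least one query term."""
    result = set()
    for term in tokenize(query):
        result.update(index.get(term, []))
    return sorted(result)
```

The index is built once, so the timing that is compared against grep should cover only the query calls, with the build time reported separately.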

c) Build an inverted index using Lucene (Java, https://lucene.apache.org/),
PyLucene (Python, https://lucene.apache.org/pylucene/install.html), or
Elasticsearch (Python, https://pypi.python.org/pypi/elasticsearch).
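
If the Elasticsearch option is chosen, a boolean query can be expressed in the Elasticsearch Query DSL as a `bool` query with `must` clauses (AND) or `should` clauses (OR). The sketch below only builds the request body as a Python dict; it assumes each document was indexed with its content in a field named `text` (a hypothetical name), and the `size` default of 50 is likewise an assumption chosen to match the 50 judged documents per query.

```python
def bool_query_body(terms, mode="and", field="text", size=50):
    """Build an Elasticsearch search body for a boolean query.

    terms: list of query terms
    mode:  "and" (all terms required) or "or" (any term suffices)
    field: name of the indexed text field (hypothetical; depends on your mapping)
    size:  maximum number of hits to return (the server default is 10)
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    clauses = [{"match": {field: term}} for term in terms]
    if mode == "and":
        bool_part = {"must": clauses}
    else:
        # Require at least one of the optional clauses to match.
        bool_part = {"should": clauses, "minimum_should_match": 1}
    return {"query": {"bool": bool_part}, "size": size}
```

The resulting dict would be sent with a search request against a running Elasticsearch instance; the client call itself is not shown here because its exact form depends on the client version installed.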

Now again tabulate the speed as well as precision/recall, and compare with the previous two
approaches.

Output expected for submission: code, plus a document with tabulations of speed and precision/recall
and a comparison of the approaches.
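
The speed tabulation can be produced with a small helper like the one sketched below. It takes the total search time measured for each system and reports the speedup relative to a baseline, taken here to be grep. The function names and table layout are illustrative assumptions; the timings must come from your own measurements.

```python
def speedup_rows(timings, baseline="grep"):
    """timings: dict of system name -> total seconds over all queries.

    Returns a list of (system, seconds, speedup relative to baseline).
    """
    base = timings[baseline]
    rows = []
    for name, secs in timings.items():
        # Guard against a zero measurement to avoid division by zero.
        speedup = base / secs if secs > 0 else float("inf")
        rows.append((name, secs, speedup))
    return rows


def format_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'system':<10}{'seconds':>10}{'speedup':>10}"]
    for name, secs, speedup in rows:
        lines.append(f"{name:<10}{secs:>10.3f}{speedup:>9.1f}x")
    return "\n".join(lines)
```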

Dataset description:

All necessary data is available at:

https://drive.google.com/open?id=1Pvc9MBMc2fF02vTB4BtgaYs4YhW_Pb0-
The folder Assignment1 contains query.txt, output.txt, and alldocs.rar.

1. query.txt contains 82 queries in two columns: query id and query.
2. alldocs.rar contains the document files, each named with its doc id. Each document is a set of sentences.
3. output.txt contains the top 50 relevant documents (doc ids) for each query.
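
A rough sketch of reading the query file is given below. The handout does not state the column separator, so the code assumes that each non-empty line starts with the query id, followed by whitespace and then the query text; adjust the split if the actual file differs. The function name is illustrative.

```python
def parse_queries(lines):
    """Parse lines of the assumed form '<query id> <query text>'.

    Returns a dict of query id (as a string) -> query text.
    """
    queries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the first whitespace-delimited token is the query id.
        parts = line.split(None, 1)
        query_id = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        queries[query_id] = text
    return queries
```

Because the function accepts any iterable of lines, it can be called directly on an open file object for query.txt.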
