I R Assignment 1

This document outlines an assignment to implement and compare different information retrieval systems. Students are asked to build indexes and search a provided dataset using grep, a basic index, and Lucene. They must time each search method, calculate precision and recall, and compare the performance of the three approaches. The dataset includes documents, queries with relevant documents, and instructions for accessing the data. The goal is to give students hands-on experience with simple information retrieval systems.

Uploaded by

Arghya Adhya

CS60092: Information Retrieval

Autumn 2016, Assignment 1

Motivation: This assignment gives you a hands-on feel for a simple Information Retrieval
system.
Task:
You are given a dataset consisting of the following:
Documents
Queries
Documents relevant to the queries.
You have to implement and compare the following systems for Boolean query processing:
Grep based.
Index based.
Lucene based (a well-known indexing tool).
Report performance metrics (precision and recall) and the total time for searching all queries for
each technique.
You can issue AND queries with all the search terms.
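The two metrics named above can be computed per query as in the following minimal sketch. It assumes that both the retrieved and the relevant documents are available as collections of doc ids; the function name `precision_recall` is illustrative and not part of the assignment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: doc ids returned by the system under test.
    relevant:  doc ids judged relevant for the query.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Guard against empty sets so a query with no results does not divide by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

Averaging these per-query values over all queries is one common way to report a single figure per system; the assignment does not state which aggregation is expected.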
Using the dataset provided:
a) Use grep to find the results for the sample queries given. Time each execution of the grep
search and record it.
b) Develop an inverted index (dictionary and postings lists) using standard data structures in
Java (HashMap, ArrayList) or Python (dict, list, JSON format). You can
choose to tokenize and stem/lemmatize the data. In Python, use the NLTK 3 libraries
(http://www.nltk.org/install.html) (NLTK Book: http://www.nltk.org/book) or, in Java, the CoreNLP
3.6 libraries (http://stanfordnlp.github.io/CoreNLP/download.html). Develop a
solution for simple conjunctive/disjunctive queries. Run it on the query set given. Tabulate
the speedup of search against the aforementioned grep usage. Also calculate precision
and recall for the given query set.
c) Build an inverted index using Lucene (Java) - https://lucene.apache.org/
or PyLucene (Python) - https://lucene.apache.org/pylucene/install.html
or Elasticsearch (Python) - https://pypi.python.org/pypi/elasticsearch.
Then tabulate the speed and precision/recall again and compare with the previous two
approaches.
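
As an illustration of part (b), the sketch below builds a dictionary-and-postings inverted index and answers conjunctive (AND) and disjunctive (OR) queries. It is a sketch under stated assumptions, not a reference solution: the regex tokenizer stands in for NLTK tokenization and stemming, documents are assumed to be held in memory as a `{doc_id: text}` dict, and the names `tokenize`, `build_index`, `intersect`, and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: lowercase alphanumeric runs are the terms (no stemming here).
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Map each term to a sorted postings list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge-style intersection of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query, mode="and"):
    """Return sorted doc ids matching all terms ("and") or any term ("or")."""
    postings = [index.get(term, []) for term in tokenize(query)]
    if not postings:
        return []
    if mode == "and":
        # Start from the shortest list so intermediate results stay small.
        postings.sort(key=len)
        result = list(postings[0])
        for p in postings[1:]:
            result = intersect(result, p)
        return result
    merged = set()
    for p in postings:
        merged.update(p)
    return sorted(merged)


if __name__ == "__main__":
    docs = {
        1: "information retrieval with lucene",
        2: "grep searches text",
        3: "lucene builds an inverted index for retrieval",
    }
    index = build_index(docs)
    print(search(index, "lucene retrieval"))    # AND query
    print(search(index, "grep lucene", "or"))   # OR query
```

Sorting postings lists once at build time is what makes the linear merge in `intersect` possible; a set-based intersection would also work but hides the postings-list mechanics the assignment asks for.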

Dataset description:
All necessary data is available at:
https://www.dropbox.com/sh/1zfuw3xuhmul3d2/AAAuYd2GevTnvSgUK2YK8CvRa?dl=0
The folder Assignment1 contains query.txt, output.txt, and alldocs.rar.
1. query.txt contains a total of 82 queries in 2 columns: query id and query.
2. alldocs.rar contains the document files, each named with its doc id. Each document has a set of sentences.
3. output.txt contains the top 50 relevant documents (doc id) for each query.
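
The sketch below shows one way to read the queries and measure the total search time over all of them. The column delimiter in query.txt is not specified above, so the parser assumes the query id is the first whitespace-separated field and the rest of the line is the query text; `parse_queries` and `time_all` are hypothetical helper names, and `search_fn` stands for whichever search routine (grep wrapper, own index, or Lucene) is being measured.

```python
import time


def parse_queries(lines):
    """Parse lines of the assumed form '<query id><whitespace><query text>'."""
    queries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        query_id = parts[0]
        queries[query_id] = parts[1] if len(parts) > 1 else ""
    return queries


def time_all(search_fn, queries):
    """Run search_fn on every query; return (results, elapsed seconds)."""
    results = {}
    start = time.perf_counter()
    for query_id, text in queries.items():
        results[query_id] = search_fn(text)
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # Made-up sample lines; the real file would be read with open("query.txt").
    sample = ["701 oil spill", "702\tsolar energy"]
    queries = parse_queries(sample)
    results, elapsed = time_all(lambda q: q.split(), queries)
    print(results, f"{elapsed:.6f}s")
```

Dividing the grep total by the index total gives the speedup figure to tabulate; `time.perf_counter` is used because it is a monotonic clock intended for measuring short intervals.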
