Assignment 2: Scoring and Evaluation (Deadline: 05.11.2023, 11:59 PM)

This assignment is on building a TF-IDF based ranked retrieval system to answer free-text queries. You have to use Python for this assignment, since Python provides many features that will ease up your workload, as compared to other programming languages like C++.

Dataset:

The data for this assignment can be found at this link (CRAN folder):
https://drive.google.com/drive/folders/19C-WbeYCiSValdl_KGAQRKGDAgeHon

You will require two files for this assignment:

• cran.all.1400: This is the main document file, containing the information for 1400 documents. To parse each document, you must read, sequentially, the following records from the file (a minimal parsing sketch is given at the end of Task 2A):
  • Each document starts with the .I field, indicating the ID
  • Followed by the .T field, indicating the title
  • Followed by the .A field, indicating the author
  • Followed by the .B field, indicating the source/location
  • Followed by the .W field, which actually contains the text of the document
  (e.g., the document titled "the boundary layer in simple shear flow past a flat plate" by m.b.glauert)
• cran.qry: the query file, containing the free-text queries to be answered.

Task 2A (Ranking)

Use the inverted index built in Assignment 1, i.e., the model_queries_<ROLL_NO>.bin (pickle) file saved in the main code directory.

To build a ranked retrieval model, you have to vectorize each query and each document available in the corpus.

➢ Consider all the terms (keys) in the inverted index to be your vocabulary V. Obtain the Document Frequency of each term, DF(t), as the size of the corresponding postings list in the inverted index.
➢ The Term Frequency TF(t, d) of term t in document d is defined as the number of times t occurs in d. The TF-IDF weight W(t, d) of each term t is thus obtained as W(t, d) = TF(t, d) x IDF(t), where IDF(t) = log(N / DF(t)) and N is the number of documents in the corpus.

Obtain the query and document texts as previously done in Assignment 1. Our goal now is to obtain |V|-dimensional TF-IDF vectors for each query and each document in the corpus. Represent each query q as [q] = [W(t, q) ∀ t in V]. Similarly, represent each document d in the corpus as [d] = [W(t, d) ∀ t in V], where V is the vocabulary defined above.

Refer to slide #41 of Lecture 5 and write code implementing the following three ddd.qqq schemes for weighting and normalizing the |V|-dimensional TF-IDF vectors (a sketch of scheme A appears at the end of this task):

➢ lnc.ltc
➢ Lnc.Lpc
➢ anc.apc

Rank all the documents in the corpus corresponding to each query using the cosine similarity metric, as described in slide #36 of Lecture 5.

➢ For each of the schemes, store the query ids and their corresponding top 50 document names/ids in ranked order in a two-column csv file with a format similar to "rankedRelevantDocList.csv".

Save the following three files in your main code directory:
  Assignment2_<ROLL_NO>_ranked_list_A.csv for "lnc.ltc"
  Assignment2_<ROLL_NO>_ranked_list_B.csv for "Lnc.Lpc"
  Assignment2_<ROLL_NO>_ranked_list_C.csv for "anc.apc"

➢ Name your code file as: Assignment2_<ROLL_NO>_ranker.py
➢ Running the file: Your code should take the path to the dataset and the inverted index file, i.e., model_queries_<ROLL_NO>.bin (obtained in Assignment 1), as input, and it should run in the following manner:

$>> python Assignment2_<ROLL_NO>_ranker.py <path to the dataset folder> <path to model_queries_<ROLL_NO>.bin>
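The following is a minimal sketch of the sequential field parsing described in the Dataset section. The function name is illustrative, and the choice to build each document's text from its .T and .W fields is an assumption (follow whatever pre-processing you used in Assignment 1):

    def parse_cran_documents(path):
        """Return {doc_id: text}; here text is assumed to combine .T and .W."""
        docs = {}
        doc_id, field, fields = None, None, {}

        def flush():
            if doc_id is not None:
                docs[doc_id] = " ".join(fields.get(".T", []) + fields.get(".W", []))

        with open(path, "r") as f:
            for line in f:
                line = line.rstrip("\n")
                if line.startswith(".I"):          # new document record begins
                    flush()
                    doc_id, field, fields = line.split()[1], None, {}
                elif line in (".T", ".A", ".B", ".W"):
                    field = line                   # a new field of the current doc
                elif field is not None:
                    fields.setdefault(field, []).append(line)
        flush()                                    # don't forget the last document
        return docs

    # e.g. docs = parse_cran_documents("CRAN/cran.all.1400"); len(docs) should be 1400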
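Below is a minimal sketch of scheme A (lnc.ltc) together with cosine-similarity ranking. It assumes DF(t) has been obtained as the postings-list length from the unpickled inverted index and that tokens are pre-processed as in Assignment 1; the other two schemes differ only in the per-slot tf/idf/normalization choices on slide #41 of Lecture 5. Function names are illustrative:

    import math
    from collections import Counter

    def lnc_vector(tokens):
        """Document side: logarithmic tf, no idf, cosine normalization."""
        vec = {t: 1 + math.log10(c) for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else vec

    def ltc_vector(tokens, df, n_docs):
        """Query side: logarithmic tf, idf, cosine normalization."""
        tf = Counter(t for t in tokens if t in df)   # restrict to vocabulary V
        vec = {t: (1 + math.log10(c)) * math.log10(n_docs / df[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else vec

    def cosine(qvec, dvec):
        """Dot product of two already cosine-normalized sparse vectors."""
        return sum(w * dvec.get(t, 0.0) for t, w in qvec.items())

    # Ranking: score every document against the query and keep the top 50, e.g.
    # ranked = sorted(doc_vecs, key=lambda d: cosine(qvec, doc_vecs[d]),
    #                 reverse=True)[:50]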
Task 2B (Evaluation)

1. For each query, consider the top 20 ranked documents from the list obtained in the previous step.
2. For each query, calculate and report the following metrics with respect to the gold-standard ranked list of documents provided in "qrels.csv" (minimal sketches of the qrels handling and of these metrics are given at the end of this document):
   a. Average Precision (AP) @10
   b. Average Precision (AP) @20
   c. Normalized Discounted Cumulative Gain (NDCG) @10
   d. Normalized Discounted Cumulative Gain (NDCG) @20

   "qrels.csv" has 4 fields: topic_id (represents the query no.), iteration, cord_id, judgement (values 0-2).
   ➢ Use the iteration field to resolve conflicts between multiple entries (if any) of the same topic_id and cord_id (take the record having the higher iteration value).
   ➢ For binary relevance, consider non-zero judgement values to be relevant.
   ➢ Assume the relevance of any pair of (topic_id, cord_id) not present in "qrels.csv" to be 0.

3. Finally, calculate and report the Mean Average Precision (mAP@10 and mAP@20) and the average NDCG (averNDCG@10 and averNDCG@20) by averaging over all the queries.
4. For each of the three ranked lists Assignment2_<ROLL_NO>_ranked_list_<K>.csv (K in A, B, C) obtained in the previous step, create a separate file in the main code directory with the name Assignment2_<ROLL_NO>_metrics_<K>.csv and systematically save the values of the above-mentioned evaluation metrics (both query-wise and average).
5. Name your code file as: Assignment2_<ROLL_NO>_evaluator.py
6. Running the file: For each value of K (A/B/C), your code should take the path to the obtained ranked list and the gold-standard ranked list as input, and it should run in the following manner:

$>> python Assignment2_<ROLL_NO>_evaluator.py <path to the gold-standard ranked list> <path to Assignment2_<ROLL_NO>_ranked_list_<K>.csv>

Submit the files:
  Assignment2_<ROLL_NO>_ranker.py
  Assignment2_<ROLL_NO>_evaluator.py
  Assignment2_<ROLL_NO>_metrics_A.csv
  Assignment2_<ROLL_NO>_metrics_B.csv
  Assignment2_<ROLL_NO>_metrics_C.csv
  README.txt
in a zipped file named: Assignment2_<ROLL_NO>.zip

Your README should contain any specific library requirements to run your code and the specific Python version you are using. Any other special information about your code or logic that you wish to convey should be in the README file. Further, provide details of your design in the README, such as the vocabulary length, pre-processing pipeline, etc. Also, mention your roll number in the first line of your README.

IMPORTANT: PLEASE FOLLOW THE EXACT NAMING CONVENTION OF THE FILES AND THE SPECIFIC INSTRUCTIONS IN THE TASKS CAREFULLY. ANY DEVIATION FROM THEM WILL RESULT IN DEDUCTION OF MARKS.

Python library restrictions: You can use simple Python libraries like nltk, numpy, os, sys, collections, timeit, etc. However, you cannot use libraries like lucene, elasticsearch, or any other search API. If your code is found to use any such library, you will be awarded zero marks for this assignment without any evaluation. You also cannot use parsing libraries for parsing the corpus and query files; do it by writing your own code.

Plagiarism rules: We will be employing strict plagiarism checking. If your code matches another student's code, all the students whose codes match will be awarded zero marks without any evaluation. Therefore, it is your responsibility to ensure that you neither copy anyone's code nor let anyone copy yours.

Code errors: If your code doesn't run or gives errors while running, marks will be awarded based on the correctness of the logic. If required, you might be called to meet the TAs and explain your code.
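For reference, a minimal sketch of loading "qrels.csv" with the iteration-based conflict resolution described in Task 2B. The column order follows the field list above; the presence of a header row and the function name are assumptions:

    import csv

    def load_qrels(path):
        """Return {(topic_id, cord_id): judgement}, resolving duplicates by iteration."""
        best = {}   # (topic_id, cord_id) -> (iteration, judgement)
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        if rows and not rows[0][1].isdigit():      # skip a header row, if present
            rows = rows[1:]
        for topic_id, iteration, cord_id, judgement in rows:
            key = (topic_id, cord_id)
            if key not in best or int(iteration) > best[key][0]:
                best[key] = (int(iteration), int(judgement))
        # Any (topic_id, cord_id) pair absent from this dict has relevance 0.
        return {key: judgement for key, (_, judgement) in best.items()}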
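And a minimal sketch of AP@k and NDCG@k against the loaded qrels. Conventions differ on the AP@k denominator; normalizing by the total number of relevant documents for the query, as below, is an assumption, as is using the raw graded judgement (0-2) as the NDCG gain:

    import math

    def average_precision_at_k(ranked_docs, qrels, topic_id, k):
        """AP@k with binary relevance (non-zero judgement => relevant)."""
        n_rel = sum(1 for (t, _), judg in qrels.items() if t == topic_id and judg > 0)
        hits, precision_sum = 0, 0.0
        for i, doc in enumerate(ranked_docs[:k], start=1):
            if qrels.get((topic_id, doc), 0) > 0:
                hits += 1
                precision_sum += hits / i          # precision at rank i
        return precision_sum / n_rel if n_rel else 0.0

    def ndcg_at_k(ranked_docs, qrels, topic_id, k):
        """NDCG@k using the graded judgements as gains."""
        gains = [qrels.get((topic_id, doc), 0) for doc in ranked_docs[:k]]
        dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))
        ideal = sorted((judg for (t, _), judg in qrels.items() if t == topic_id),
                       reverse=True)[:k]
        idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
        return dcg / idcg if idcg else 0.0

    # mAP@k and averNDCG@k are then plain means of these per-query values.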
