Assignment 1
1. Given the following two postings lists:
A [4,6,10,12,14,16,18,20,22,32,47,81,120,122,157,180]
B [4,18,121,180]
Work out how many comparisons are needed to intersect the two postings
lists with and without skip pointers.
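One way to check a hand count is the sketch below. It rests on assumptions the question does not state: skip pointers are placed at every floor(sqrt(P)) positions of a list of length P, each docID comparison in the merge loop counts once, and each look at a skip target counts once more. The names intersect_plain, advance and intersect_with_skips are illustrative only.

```python
import math

A = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
B = [4, 18, 121, 180]


def intersect_plain(a, b):
    """Merge two sorted postings lists; return (matches, comparisons)."""
    i = j = comparisons = 0
    matches = []
    while i < len(a) and j < len(b):
        comparisons += 1  # one docID comparison per loop step
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches, comparisons


def advance(postings, pos, target, step):
    """Move forward from pos, taking skips while they do not overshoot target.

    Returns (new position, number of skip-target comparisons made).
    """
    checks = 0
    jumped = False
    # Assumed layout: a skip pointer sits at every multiple of `step`.
    while pos % step == 0 and pos + step < len(postings):
        checks += 1
        if postings[pos + step] <= target:
            pos += step
            jumped = True
        else:
            break
    if not jumped:
        pos += 1  # no usable skip, fall back to the next posting
    return pos, checks


def intersect_with_skips(a, b):
    """Intersect with skip pointers; return (matches, comparisons)."""
    step_a = max(1, math.isqrt(len(a)))
    step_b = max(1, math.isqrt(len(b)))
    i = j = comparisons = 0
    matches = []
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i, checks = advance(a, i, b[j], step_a)
            comparisons += checks
        else:
            j, checks = advance(b, j, a[i], step_b)
            comparisons += checks
    return matches, comparisons


print(intersect_plain(A, B))
print(intersect_with_skips(A, B))
```

Both functions return the same matches; only the comparison totals differ. A different skip placement or counting convention will give different totals, so the convention used should be stated alongside any answer.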
2. Construct the inverted index using the following documents. 2
Document 1: The quick brown fox jumped over the lazy dog.
Document 2: The lazy dog slept in the sun.
Document 3: The sun was bright.
Document 4: The bright sun was in the blue sky.
Document 5: The sky was blue.
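A minimal sketch of the construction, assuming terms are lowercased and split on non-letter characters, and that each postings list holds sorted document IDs without positions. The function name build_inverted_index is illustrative.

```python
import re
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog.",
    2: "The lazy dog slept in the sun.",
    3: "The sun was bright.",
    4: "The bright sun was in the blue sky.",
    5: "The sky was blue.",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenization: lowercase, keep runs of letters only.
        for term in re.findall(r"[a-z]+", text.lower()):
            postings[term].add(doc_id)
    # Sort the dictionary by term and each postings list by docID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(DOCS)
for term, ids in index.items():
    print(f"{term:8s} -> {ids}")
```

Using a set per term avoids duplicate IDs when a word, such as "the", occurs more than once in a document.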
3. For a given document stored in the data warehouse, compress the words by
applying the following preprocessing techniques separately. 2
i. Normalization
ii. Normalization and stemming
iii. Stop words removal
Information retrieval is the activity of obtaining
information resources relevant to an information need from
a collection of information resources. Searches can be
based on full text or other content based indexing.
Automated information retrieval systems are used to reduce
what has been called "information overload". Many
universities and public libraries use IR systems to provide
access to books, journals and other documents. Web search
engines are the most visible IR applications.
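The three techniques can be sketched as below on the last sentence of the passage. Everything here is an assumption for illustration: normalization is taken to mean lowercasing and dropping punctuation, toy_stem is a crude suffix stripper standing in for a real stemmer such as Porter's (it is not the Porter algorithm), STOP_WORDS is a small hand-picked list, and stop word removal is applied to the normalized tokens.

```python
import re

# Hypothetical stop list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "be", "been", "can", "from",
              "has", "in", "is", "of", "on", "or", "the", "to"}


def normalize(text):
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_stem(word):
    """Strip one common suffix if at least three characters remain."""
    if word.endswith("ss"):
        return word  # keep words like "access" intact
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


sentence = "Web search engines are the most visible IR applications."
tokens = normalize(sentence)
stems = [toy_stem(t) for t in tokens]
content = remove_stop_words(tokens)

print(tokens)
print(stems)
print(content)
```

Comparing the number of distinct terms before and after each step, for example len(set(tokens)) against len(set(stems)), shows how much each technique shrinks the vocabulary.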
4. State and discuss the algorithmic steps to compute edit distance between two
strings. Use the Levenshtein matrix to represent the distance score between
"kitten" and "sitting". 2
5. Write the pseudocode for Jaccard Coefficient and calculate the Jaccard 2