
COMPUTER ENGINEERING DEPARTMENT

BILKENT UNIVERSITY

Assignment No. 1

Abdurrahman Yasar
June 10, 2014

1 QUESTION 1
Consider the following search results for two queries Q1 and Q2 (the documents are ranked
in the given order; the relevant documents are indicated in parentheses).
Q1: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10 (relevant: D2, D4, D6)
Q2: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10 (relevant: D1, D3, D5, D9)
For Q1 and Q2 the total numbers of relevant documents are, respectively, 15 and 4 (in Q1, 12 of
the relevant documents are not retrieved).

1.1 A. TREC INTERPOLATION RULE


Using the TREC interpolation rule, in a table give the precision value for the 11 standard recall
levels 0.0, 0.1, 0.2, ... 1.0. Please also draw the corresponding recall-precision graph as shown
in the first figure of TREC-6 Appendix A (its link is available on the course web site).

TREC Interpolation Rule: 'Interpolated' means that, for example, precision at recall
0.10 (i.e., after 10% of the relevant docs for a query have been retrieved) is taken to be the MAXIMUM of
precision at all recall points >= 0.10. Values are averaged over all queries (for each of the 11
recall levels). These values are used for Recall-Precision graphs.

$$\mathrm{Recall} = \frac{\text{number of relevant items retrieved}}{\text{number of relevant items in collection}}$$

$$\mathrm{Precision} = \frac{\text{number of relevant items retrieved}}{\text{total number of items retrieved}}$$

Rank 1 2 3 4 5 6 7 8 9 10
Recall 0 0.067 0.067 0.133 0.133 0.200 0.200 0.200 0.200 0.200
Precision 0 1/2 1/3 2/4 2/5 3/6 3/7 3/8 3/9 3/10

Table 1.1: Recall-Precision Table for Q1

Recall 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Precision 1/2 2/4 3/6 0 0 0 0 0 0 0 0

Table 1.2: Interpolated Recall-Precision Table for Q1

Figure 1.1: Recall-Precision Graph of Q1

Rank 1 2 3 4 5 6 7 8 9 10
Recall 0.25 0.25 0.5 0.5 0.75 0.75 0.75 0.75 1.0 1.0
Precision 1 1/2 2/3 2/4 3/5 3/6 3/7 3/8 4/9 4/10

Table 1.3: Recall-Precision Table for Q2

Recall 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Precision 1 1 1 2/3 2/3 2/3 3/5 3/5 4/9 4/9 4/9

Table 1.4: Interpolated Recall-Precision Table for Q2

Figure 1.2: Recall-Precision Graph of Q2
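The tables above can be checked programmatically. Below is a minimal Python sketch (my own illustration, not part of the original assignment) that recomputes the rank-level values and the 11-point interpolated precision; the relevance judgments for Q1 (relevant at ranks 2, 4, 6) and Q2 (relevant at ranks 1, 3, 5, 9) are read off the tables above.

```python
# Minimal sketch: rank-level recall/precision and 11-point interpolated precision.

def rank_recall_precision(relevant_flags, total_relevant):
    """Recall and precision after each rank of a ranked result list."""
    points, hits = [], 0
    for rank, is_rel in enumerate(relevant_flags, start=1):
        hits += is_rel
        points.append((hits / total_relevant, hits / rank))
    return points

def interpolated_11pt(points):
    """TREC rule: precision at recall level r = max precision at recall >= r."""
    return [max((p for r, p in points if r >= level / 10), default=0.0)
            for level in range(11)]

# Q1: relevant documents at ranks 2, 4, 6; 15 relevant documents in the collection.
q1 = rank_recall_precision([0, 1, 0, 1, 0, 1, 0, 0, 0, 0], 15)
# Q2: relevant documents at ranks 1, 3, 5, 9; 4 relevant documents in the collection.
q2 = rank_recall_precision([1, 0, 1, 0, 1, 0, 0, 0, 1, 0], 4)

print(interpolated_11pt(q1))  # 0.5, 0.5, 0.5, then zeros (Table 1.2)
print(interpolated_11pt(q2))  # 1, 1, 1, 2/3, 2/3, 2/3, 3/5, 3/5, 4/9, 4/9, 4/9 (Table 1.4)
```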

1.2 B. R-PRECISION


Find R-Precision (see TREC-6 Appendix A for the definition) for Query 1 and Query 2.

R-Precision: R-Precision is the precision after R documents have been retrieved where R
is the number of relevant documents for the topic. So in our case:

• Q1: We cannot calculate R-Precision for Q1 because the number of relevant documents
for Q1 is 15, so R is 15, but we have only retrieved 10 documents and therefore cannot say
anything about the last 5.

• Q2: R is 4 and there are 2 relevant documents among the first 4 retrieved documents, so
R-Precision = 1/2.

• For Q1 we take 0 because we could not calculate its R-Precision, and for Q2 we have 0.5.
The average is then (0 + 1/2) / 2 = 1/4.
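As a quick check, a one-line computation of the Q2 value under the same assumed judgments (precision after the first R = 4 retrieved documents):

```python
# R-Precision for Q2: precision after the first R = 4 retrieved documents.
relevant_at_rank = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0]  # Q2 judgments assumed above
R = 4
print(sum(relevant_at_rank[:R]) / R)  # 0.5
```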

1.3 C. FIND MAP


Find MAP for these queries. If it is not possible to calculate explain why.

• Average precision of Q1: (1/2 + 2/4 + 3/6 + 12 × 0.000) / 15 = 0.100 (the precision values
at the ranks of the three retrieved relevant documents; the 12 unretrieved relevant documents
each contribute 0).

• Average precision of Q2: (1 + 2/3 + 3/5 + 4/9) / 4 = 0.678

• MAP: (0.100 + 0.678) / 2 = 0.389
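These numbers can be verified with a small average-precision sketch (same assumed judgments as above; unretrieved relevant documents contribute zero to the sum):

```python
# Minimal sketch of average precision (AP) and MAP for the two queries.

def average_precision(relevant_flags, total_relevant):
    """Sum of precision values at the ranks of retrieved relevant documents,
    divided by the total number of relevant documents in the collection."""
    hits, score = 0, 0.0
    for rank, is_rel in enumerate(relevant_flags, start=1):
        if is_rel:
            hits += 1
            score += hits / rank
    return score / total_relevant

ap_q1 = average_precision([0, 1, 0, 1, 0, 1, 0, 0, 0, 0], 15)  # 0.100
ap_q2 = average_precision([1, 0, 1, 0, 1, 0, 0, 0, 1, 0], 4)   # ~0.678
print((ap_q1 + ap_q2) / 2)                                      # MAP ~ 0.389
```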

2 QUESTION 2
Consider the document-based partitioning and term-based partitioning approaches as defined
in the Zobel-Moffat paper "Inverted files for text search engines" (ACM Computing Surveys,
2006). Please also consider the following document-by-term binary matrix D for m = 6 documents
(rows) and n = 6 terms (columns). Describe two ways of indexing across a cluster of machines.

$$D = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}$$

Document Based Partitioning: The simplest distribution regime is to partition the collec-
tion and allocate one subcollection to each of the processors.
Term Based Partitioning: In a term-partitioned index, the index is split into components
by partitioning the vocabulary.

Let's give an example partitioning assuming the above D matrix. We have 6 documents
d1, d2, ..., d6, 6 terms t1, t2, ..., t6, and assume that we have 3 computers c1, c2, c3. If we
use document-based partitioning, 2 documents are sent to each computer; in other words, we
send rows to computers:

c1 → {d1, d2}
c2 → {d3, d4}
c3 → {d5, d6}

If we use term-based partitioning, we send the same terms of all documents, in other words
columns, to the computers:

c1 → {t1, t2}
c2 → {t3, t4}
c3 → {t5, t6}

These partitioning methods are important for balancing the workload. In the above example,
document-based partitioning is better because each document has at least 2 and at most 3 terms,
so the workloads of the computers are relatively equal. If we use term-based partitioning, each
term appears in at least 2 and at most 4 documents, so this type of partitioning gives worse
load balancing.

In conclusion, by looking at the distribution of terms across documents and the number of terms
per document, we can choose the partitioning method that distributes the work more uniformly
and thus obtain good workload balancing.
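To make the load-balancing argument concrete, here is a small Python sketch (my own illustration, not from the paper) that counts the postings assigned to each machine under the two schemes for the D matrix above:

```python
# Workload (number of 1-entries, i.e. postings) per machine under the two
# partitioning schemes used in the example above.

D = [
    [1, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

# Document-based partitioning: c1 -> {d1, d2}, c2 -> {d3, d4}, c3 -> {d5, d6}.
doc_blocks = [[0, 1], [2, 3], [4, 5]]
# Term-based partitioning: c1 -> {t1, t2}, c2 -> {t3, t4}, c3 -> {t5, t6}.
term_blocks = [[0, 1], [2, 3], [4, 5]]

doc_load = [sum(D[d][t] for d in block for t in range(6)) for block in doc_blocks]
term_load = [sum(D[d][t] for d in range(6) for t in block) for block in term_blocks]

print(doc_load)   # [6, 4, 5] -- fairly balanced
print(term_load)  # [4, 4, 7] -- one machine gets noticeably more work
```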

3 QUESTION 3
In this part again consider the Zobel-Moffat paper.

3.1 A. SKIPPING CONCEPT


Understand the skipping concept as applied to the inverted index construction.

Assume that we have the following posting list for term-a: < 1, 5 > < 4, 1 > < 9, 3 > < 10, 4 > <
12, 4 > < 17, 4 > < 18, 3 > < 22, 2 > < 24, 4 > < 33, 4 > < 38, 5 > < 43, 5 > < 55, 3 > < 64, 2 > <
68, 4 > < 72, 5 > < 75, 1 > < 88, 2 >. The posting list indicates that term-a appears in d1 five
times and in d4 once, etc.

Assume that we have the following posting list for term - b: < 12, 3 > < 40, 2 > < 66, 1 >

Consider the following conjunctive Boolean query: term-a and term-b. If no skipping is used
how many comparisons do you have to find the intersection of these two lists?

Introduce a skip structure, draw the corresponding figure then give the number of compar-
isons involved to process the same query.

• term-a → < 1, 5 > < 4, 1 > < 9, 3 > < 10, 4 > < 12, 4 > < 17, 4 > < 18, 3 > < 22, 2 > < 24, 4 >
< 33, 4 > < 38, 5 > < 43, 5 > < 55, 3 > < 64, 2 > < 68, 4 > < 72, 5 > < 75, 1 > < 88, 2 >

• term-b → < 12, 3 > < 40, 2 > < 66, 1 >

Boolean query: term-a and term-b. Given these two inverted lists for term-a and term-b, we can
find the intersection of the lists without using skipping in 15 comparisons. Since the lists
are in order of increasing document identifiers, we can intersect them in one scan.

• Compare < 12, 3 > of term-b's list with < 1, 5 > < 4, 1 > < 9, 3 > < 10, 4 > < 12, 4 > from
term-a's list. After 5 comparisons we know where to place < 12, 3 >.

• Compare < 40, 2 > of term-b's list with < 17, 4 > < 18, 3 > < 22, 2 > < 24, 4 > < 33, 4 > <
38, 5 > < 43, 5 > from term-a's list. After 7 comparisons we know where to place < 40, 2 >.

• Compare < 66, 1 > of term-b's list with < 55, 3 > < 64, 2 > < 68, 4 > from term-a's list.
After 3 comparisons we know where to place < 66, 1 >.

I will use skipping with chunk size = 4. To do this we split the long list into chunks; in our
case this is the list of term-a, so there will be 5 chunks for term-a's inverted list.

• chunk-1 → < 1, 5 > < 4, 1 > < 9, 3 > < 10, 4 >; chunk descriptor: < 10, 4 >

• chunk-2 → < 12, 4 > < 17, 4 > < 18, 3 > < 22, 2 >; chunk descriptor: < 22, 2 >

• chunk-3 → < 24, 4 > < 33, 4 > < 38, 5 > < 43, 5 >; chunk descriptor: < 43, 5 >

• chunk-4 → < 55, 3 > < 64, 2 > < 68, 4 > < 72, 5 >; chunk descriptor: < 72, 5 >

• chunk-5 → < 75, 1 > < 88, 2 >; chunk descriptor: < 88, 2 >

Here you see the comparisons for skipping concept:

• Merge < 12, 3 >

1. Compare with first chunk’s descriptor. 10 < 12 so go to next chunk


2. Compare with second chunk’s descriptor 12 < 22 so we will insert < 12, 3 > into
this chunk
3. Compare with <12,4>. We have found the position so this merge is completed.

• Merge < 40, 2 >

1. Compare with first chunk’s descriptor. 10 < 40 so go to next chunk


2. Compare with second chunk’s descriptor. 22 < 40 so go to next chunk
3. Compare with third chunk’s descriptor 40 < 43 so we will insert < 40, 2 > into this
chunk
4. Compare with < 24, 4 >. 24 < 40 so go to next tuple.
5. Compare with < 33, 4 >. 33 < 40 so go to next tuple.
6. Compare with < 38, 5 >. 38 < 40 so go to next tuple.
7. Compare with < 43, 5 >. We have found the position so this merge is completed.

• Merge < 66, 1 >

1. Compare with first chunk’s descriptor. 10 < 66 so go to next chunk

2. Compare with second chunk’s descriptor. 22 < 66 so go to next chunk
3. Compare with third chunk’s descriptor. 43 < 66 so go to next chunk
4. Compare with fourth chunk’s descriptor 66 < 72 so we will insert < 66, 1 > into this
chunk
5. Compare with < 55, 3 >. 55 < 66 so go to next tuple.
6. Compare with < 64, 2 >. 64 < 66 so go to next tuple.
7. Compare with < 68, 4 >. We have found the position so this merge is completed.

Using the skipping method we have made 17 comparisons (two more than the plain merge for this particular short query).
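Here is a small Python sketch (my own, following the counting conventions of the hand trace above) that reproduces both comparison counts:

```python
# Postings are (doc_id, freq) pairs; only doc ids matter for the Boolean AND.

term_a = [(1, 5), (4, 1), (9, 3), (10, 4), (12, 4), (17, 4), (18, 3), (22, 2),
          (24, 4), (33, 4), (38, 5), (43, 5), (55, 3), (64, 2), (68, 4),
          (72, 5), (75, 1), (88, 2)]
term_b = [(12, 3), (40, 2), (66, 1)]

def intersect_linear(a, b):
    """Single forward scan over a, counting every doc-id comparison (15 here)."""
    comparisons, i, result = 0, 0, []
    for doc_b, _ in b:
        while i < len(a):
            comparisons += 1
            if a[i][0] >= doc_b:
                if a[i][0] == doc_b:
                    result.append(doc_b)
                i += 1                      # continue after the stopping point
                break
            i += 1
    return result, comparisons

def intersect_skipping(a, b, chunk=4):
    """Chunked list with skip: the descriptor of a chunk is its last doc id.
    Counts descriptor comparisons plus in-chunk comparisons (17 here)."""
    chunks = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    comparisons, result = 0, []
    for doc_b, _ in b:
        for ch in chunks:                   # scan the chunk descriptors
            comparisons += 1
            if ch[-1][0] >= doc_b:          # candidate chunk found
                for doc_a, _ in ch:         # scan inside that chunk
                    comparisons += 1
                    if doc_a >= doc_b:
                        if doc_a == doc_b:
                            result.append(doc_b)
                        break
                break
    return result, comparisons

print(intersect_linear(term_a, term_b))    # ([12], 15)
print(intersect_skipping(term_a, term_b))  # ([12], 17)
```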

3.1.1 B. POSTING LIST

Give the posting list of term-a (above it is given in the standard order, sorted by document number)
in the following forms: a) ordered by f_{d,t}, b) ordered by frequency information in prefix form.

• Ordered by f_{d,t} (the frequency of term t in document d): < 1, 5 >, < 38, 5 >, < 43, 5 >,
< 72, 5 >, < 10, 4 >, < 12, 4 >, < 17, 4 >, < 24, 4 >, < 33, 4 >, < 68, 4 >, < 9, 3 >, < 18, 3 >,
< 55, 3 >, < 22, 2 >, < 64, 2 >, < 88, 2 >, < 4, 1 >, < 75, 1 >

• Ordered by frequency information in prefix form: < 5 : 4 : 1, 38, 43, 72 >, < 4 : 6 :
10, 12, 17, 24, 33, 68 >, < 3 : 3 : 9, 18, 55 >, < 2 : 3 : 22, 64, 88 >, < 1 : 2 : 4, 75 >

A frequency-ordered list can improve query processing time. We can look at the term frequencies
of the documents: documents that share terms with high frequencies can be considered nearly the
same, while documents whose term frequencies differ greatly can be considered irrelevant. To make
this more effective I prefer a posting list ordered by frequency information in prefix form,
because with such a list comparisons are more efficient: there is no need to process the entire list.
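A short sketch of how the two orderings above can be produced from the document-ordered list (my own illustration):

```python
# Reorder term-a's posting list by within-document frequency and build the
# prefix form <f : count : doc ids with that f>.

postings = [(1, 5), (4, 1), (9, 3), (10, 4), (12, 4), (17, 4), (18, 3), (22, 2),
            (24, 4), (33, 4), (38, 5), (43, 5), (55, 3), (64, 2), (68, 4),
            (72, 5), (75, 1), (88, 2)]

# a) Ordered by f_{d,t}: highest frequency first, ties broken by document id.
by_freq = sorted(postings, key=lambda p: (-p[1], p[0]))

# b) Frequency information in prefix form.
prefix_form = []
for f in sorted({f for _, f in postings}, reverse=True):
    docs = [d for d, ft in postings if ft == f]
    prefix_form.append((f, len(docs), docs))

print(by_freq)
print(prefix_form)  # [(5, 4, [1, 38, 43, 72]), (4, 6, [10, 12, 17, 24, 33, 68]), ...]
```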

4 QUESTION 4
What is meant by Cranfield approach to testing in information retrieval?

The Cranfield approach uses test collections to evaluate information retrieval systems:
standardised resources with which system effectiveness can be measured.
The main components of an information retrieval test collection are the document collec-
tion, topics, and relevance assessments. These, together with evaluation measures, simulate
the users of a search system in an operational setting and enable the effectiveness of an in-
formation retrieval system to be quantified. Evaluating information retrieval systems in this
manner enables the comparison of different search algorithms and the effects of altering
algorithm parameters to be systematically observed and quantified.

The most common way of using the Cranfield approach is to compare various retrieval strate-
gies or systems, which is referred to as comparative evaluation. In this case the focus is on the
relative performance between systems, rather than absolute scores of system effectiveness.
To evaluate using the Cranfield approach typically requires these stages: (1) select different
retrieval strategies or systems to compare; (2) use these to produce ranked lists of documents
(often called runs) for each query (often called topics); (3) compute the effectiveness of each
strategy for every query in the test collection as a function of relevant documents retrieved;
(4) average the scores over all queries to compute overall effectiveness of the strategy or sys-
tem; (5) use the scores to rank the strategies/systems relative to each other.

5 QUESTION 5
Is pooling a reliable approach for the construction of test collections ? Find the paper by
Zobel regarding this issue, it may help. What is bpref? Is it anyway related to the pooling con-
cept?

The pooling method examines the top-ranked k documents from each of n independent retrieval
efforts. If k and n are large, the set of documents judged relevant may be assumed to be
representative of the ideal set and therefore suitable for evaluating retrieval results.
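For illustration (my own sketch, not from the Zobel paper), pool construction is simply the union of each system's top-k documents, which is then judged by assessors:

```python
# Minimal sketch of pool construction from n ranked runs.

def build_pool(runs, k):
    """runs: list of ranked document-id lists, one per system; returns the pool."""
    pool = set()
    for run in runs:
        pool.update(run[:k])    # take the top-k documents of each run
    return pool

runs = [["d1", "d3", "d7", "d2"],
        ["d3", "d5", "d1", "d9"],
        ["d8", "d3", "d2", "d4"]]
print(sorted(build_pool(runs, k=3)))  # documents sent to the relevance assessors
```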

One of the disadvantages of having measurement depth m exceed pool depth p is that similar
systems can reinforce each other. Consider a pool of three systems, A, B, and C. Suppose A
and B use similar retrieval mechanisms such that some of the documents retrieved by A at a
depth between p and m are retrieved by B at depth less than p, and vice versa; and suppose C
is based on principles different from those of A and B. Then the performance of C can be
underestimated. That is, this methodology may misjudge the performance of novel retrieval techniques.

Another potential disadvantage of pooling is that a technique's effectiveness could be
underestimated because it had no opportunity to contribute to the pool, since the pool
identifies only a fraction of the relevant documents.

6 QUESTION 6
Please study the paper "Data stream clustering," by Silva et al. ACM Computing Survey, 2013.

6.1 A. DATA STREAM VS. TRADITIONAL I.R. ENVIRONMENTS


Describe the similarities and differences of the data stream and traditional information re-
trieval environments.

• Traditional information retrieval systems run queries on a collection of information which
is already available.

• In a data stream setting the data flows, so the system filters the information; in other
words, the data is not static.

• In traditional information retrieval we first retrieve documents and then process them;
in a data stream we retrieve and process documents at the same time.

• In the traditional setting, data clustering is more accurate because we already know the
data; in a data stream we do not.

6.2 B. WINDOW MODEL


What is the concept of a time window, and what are the advantages of using a time window?

In a data streaming system the stream can change the data distribution. Systems that give
equal importance to new and old data cannot capture these changes. To avoid such problems,
window models have been proposed. In this type of model you have a window where you store
incoming stream items and then process them. How items are stored and processed depends on
the window model in use. There are three common types of window models: (i) sliding windows,
(ii) damped windows, and (iii) landmark windows. All of these models try to use the incoming
data efficiently and avoid the problems described at the beginning of this paragraph.
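As a minimal illustration (my own sketch, not from the survey), a count-based sliding window keeps only the W most recent items, so old data automatically loses influence:

```python
from collections import deque

window = deque(maxlen=5)        # W = 5 most recent stream items are kept

def on_new_item(item):
    window.append(item)         # the oldest item is dropped automatically
    return summarize(window)    # cluster / summarize only the current window

def summarize(items):
    return sum(items) / len(items)   # trivial stand-in for the real processing

for x in range(10):             # a toy stream 0, 1, ..., 9
    print(on_new_item(x))       # the summary always reflects at most W items
```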

6.3 C. ABSTRACTION
What is meant by abstraction?

Here, abstraction means summarizing the data to deal with the space and memory constraints
of stream applications. The summaries preserve the meaning of the stream without the need to
actually store it.

6.4 D. DATA STRUCTURES


Choose a data structure defined in the data abstraction part and explain its components.

Prototype Array: Briefly, a prototype array is a simplified summarization data structure. It
summarizes a data partition and stores prototypes such as medoids and centroids.

For example, in Stream [Guha et al. 2000] there is an array of prototypes. To summarize the
stream, it is divided into chunks. Each chunk is summarized by 2k representative objects
obtained with a variant of the k-medoids algorithm. Compressing the descriptions is repeated
until an array of m prototypes is obtained. Next these m prototypes are further compressed
into 2k prototypes, and the process continues along the stream (see Fig. 6.1).

Figure 6.1: Overview of Stream [Guha et al. 2000], which makes use of a prototype array
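A simplified sketch of this idea follows (my own illustration; the real Stream algorithm uses a k-medoids variant, for which choose_representatives below is only a placeholder that keeps the first 2k points):

```python
def choose_representatives(points, k):
    """Placeholder for the k-medoids step: simply keep the first 2k points."""
    return points[:2 * k]

def stream_prototype_array(stream, chunk_size, k, m):
    prototypes = []
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        prototypes.extend(choose_representatives(chunk, k))      # summarize chunk
        if len(prototypes) >= m:                                 # array is full:
            prototypes = choose_representatives(prototypes, k)   # compress to 2k
    return prototypes

# Toy usage: a stream of 100 one-dimensional points, k = 2, array size m = 8.
print(stream_prototype_array(list(range(100)), chunk_size=10, k=2, m=8))
```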

6.5 E. CLUSTERING
In data stream clustering how do we use the standard clustering algorithms such as k-means?

One of the most effective ideas for clustering streams is the use of CF (cluster feature)
vectors. There are three ways to adapt CF vectors to k-means clustering:

• Calculate the centroid of each CF vector; each centroid is then clustered by k-means.

• Same as the previous item, but this time we use weighting.

• Apply the clustering algorithm directly to the CF vectors, since their components keep
the sufficient statistics for calculating most of the required distances and quality metrics.
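A minimal sketch of the CF-vector idea (assuming the standard (N, LS, SS) representation of count, linear sum, and squared sum; the example implements the first adaptation above by handing centroids, optionally weighted by N, to an ordinary k-means implementation):

```python
class CFVector:
    """Cluster feature: count, linear sum, and squared sum of the points summarized."""

    def __init__(self, dim):
        self.n = 0                   # number of points summarized
        self.ls = [0.0] * dim        # linear sum per dimension
        self.ss = [0.0] * dim        # squared sum per dimension

    def add(self, x):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss = [a + b * b for a, b in zip(self.ss, x)]

    def centroid(self):
        return [a / self.n for a in self.ls]

# Summarize two 2-d points, then feed the centroid (with weight n, if the
# weighted variant is used) to any standard k-means implementation.
cf = CFVector(dim=2)
cf.add([1.0, 2.0])
cf.add([1.2, 1.8])
print(cf.centroid(), cf.n)   # [1.1, 1.9] 2
```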

7 QUESTION 7
Study the paper "A survey of Web clustering search engines" by Carpineto et al., ACM
Computing Surveys, 2009. Find two of the web search engines described in the paper and compare
their user interfaces and general performance by conducting a simple set of experiments. Please
also define your experiments. If the engines mentioned in the paper do not exist, find other
similar search engines on the web. Note that you are not writing a research paper; you are
trying to convince people with IR knowledge. Try to be more convincing by providing some
quantitative observations.

For this question I chose to compare Google's and Yahoo's search engines. Let's begin with
the user interface; as you can see in the following figures, their user interfaces are almost
the same: one text box and a search button.

Figure 7.1: User Interfaces. (a) Google Search Engine; (b) Yahoo Search Engine

To compare the results I made 30 tests: 10 with single words such as jaguar, ankara, paris,
bosphor, etc.; 10 with searches of 2 to 5 words; and 10 with searches of 6 to 10 words. In
these tests, for single-word searches the top ten results of Google and Yahoo were almost the
same, but as the number of words increases the overlap between them decreases. I obtained 95%
overlap for single-word searches, 78% for searches of 2 to 5 words, and 63% for searches of
6 to 10 words. If we look at the top 15 results instead of the top ten, these percentages
increase a little.

Figure 7.2: Search Result Example. (a) Google Search for Jaguar; (b) Yahoo Search for Jaguar

After my tests I can say that the top 3-4 results of Google are better than Yahoo's. For
example, when I search for something related to a research topic, Google directly returns some
papers about it but Yahoo does not. So if you want to find more relevant pages for your search
phrase in the top 3-4 of the list, Google is better.

8 QUESTION 8
On data forms used in clustering:

8.1 A. NOMINAL DATA
What is nominal data? Give two examples

Nominal data are items which are differentiated by a simple naming system. For example:

• Asian countries

• The number pinned on a building

8.2 B. D MATRIX
What is the type of data (D matrix) we use in document clustering (nominal etc.)?

The D matrix contains ratio-type data, because the D_{i,j} element of the matrix gives the
number of occurrences of term j in document i.

8.3 C. NOMINAL DATA IN CLUSTERING


Can we use nominal data in the clustering algorithms we have in IR? If your answer is no, how
can we convert nominal data into a meaningful form so that they can be used for clustering?
Do some research on this on the Web. State your references.

Actually, we cannot use nominal data types directly in the clustering algorithms we have in IR,
because these algorithms need a distance measure. To use nominal data for clustering we first
need to transform them into a form on which distances can be calculated. We cannot do this by
assigning arbitrary integers to the values, because similarities must be preserved. After such
a transformation we can use our clustering algorithms. On the web there are several papers that
propose transformation methods (a minimal encoding sketch is given after the reference list below):

1. A New Clustering Algorithm On Nominal Data Sets, Bin Wang

2. Clustering Categorical Data, Steven X. Wang

3. Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm
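As a simple illustration (my own sketch, not taken from the papers above), one-hot encoding is one such transformation: equal nominal values get distance 0 and different values a fixed distance, so similarities are preserved and distance-based algorithms such as k-means can be applied.

```python
# Minimal sketch: one-hot encoding of a nominal attribute for distance-based clustering.

def one_hot(values):
    """Map each nominal value to a binary indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values], categories

vectors, categories = one_hot(["red", "blue", "red", "green"])
print(categories)  # ['blue', 'green', 'red']
print(vectors)     # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```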
