SodaPDF Converted Text

This document contains questions about search engines and information retrieval. It begins by asking multiple choice questions about crawling, indexing, and search engine optimization techniques. It then provides a set of true/false questions about the functions and uses of search engines. Finally, it asks additional multiple choice questions about information retrieval topics like indexing, query processing, and structured vs unstructured data. The document tests knowledge of key concepts and processes in search and information retrieval.

Uploaded by

Suraj

Question 1

______ is the process of fetching all the web pages linked to a web site.
A Indexing
B Calculating Relevancy
C Crawling
D Processing

Question 2
The task of crawling is performed by complex software called a __________.
A Spider
B All of these
C Boat
D Crawler

Question 3

Websites fetched by the crawler are indexed and kept in a huge database; this process is called __________.
A None of these
B Indexing
C Optimizing
D Crawling

Question 4

Which of the following are types of SEO?
A Front Page SEO
B On Page SEO
C On Page and Front Page SEO
D Off Page SEO
E On Page and Off Page SEO

Question 5

Serving different versions of a page to search engines and to human visitors is called ________ SEO.
A Cloaking
B Fooling
C None of these
D Tapping

SEARCH ENGINE QUESTIONS

Question 1.
What is NOT a primary reason for using a search engine?

They quickly point you to where you need to go.
They provide topic options you may not have considered.
They are free.
They provide very select and specific search results.

Question 2.
A keyword search engine:

returns a list of sites that have been reviewed by humans.


searches a variety of other search engines.
allows all users to change its content.
returns a list of sites based on the search terms you enter.

Question 3.
A subject-oriented search engine:

returns a list of sites based on the list of search terms you enter.
searches a variety of other search engines.
allows all users to change its content.
returns a list of sites that have been reviewed by humans.

Question 4.
A meta search engine:

returns a list of sites based on the list of search terms you enter.
returns a list of sites that have been reviewed by humans.
searches a variety of other search engines.
allows all users to change its content.

Question 5.
To search for information in Google, you would:

browse the topic categories.


send a request to the search librarian.
enter the search terms in the search box.
pay for a search subscription.

Question 6.
To search more effectively, you should:

use multiple words in your search to help narrow the results.


use only one word to find sufficient results.
enclose your search terms in parentheses.
use a meta search engine.

Question 7.
Compared to subject directories, search engines:

return more hits.


return fewer hits.
return hits that are reviewed by humans.
return better-sorted hits.

Question 8.
Boolean logic allows you to:

list your hits according to significance.


better understand the content of Web sites.
require that all words in your search terms are found in a Web site.
find only information related to a certain year.

Question 9.
If you have a rough idea of your search in terms of what type of category it might fall into, it is best to use a:

meta search engine.


search engine such as Google.
formula for the keywords you enter.
subject directory.

Question 10.
If you enter the phrase "Cookie Recipes" with quotation marks in a search engine, you will get results of all pages with:

the words Cookie Recipes on the page together in order.


the word Cookie and the word Recipes in them.
the word Cookie without Recipes.
the word Cookie or the word Recipes in them.

ANSWERS

1. They provide very select and specific search results.
2. returns a list of sites based on the search terms you enter.
3. returns a list of sites that have been reviewed by humans.
4. searches a variety of other search engines.
5. enter the search terms in the search box.
6. use multiple words in your search to help narrow the results.
7. return more hits.
8. require that all words in your search terms are found in a Web site.
9. subject directory.
10. the words Cookie Recipes on the page together in order.

If a website's search engine saturation with respect to a particular search engine is 20%, what does it mean?

A. 20% of the webpages of the website have been indexed by the search engine
B. Only 20% of the pages of the website will be indexed by the search engine
C. 20% of the website's pages will never be indexed
D. None of the above

Ans : A
Which of the following items search engines don't want?

A. keyword stuffing
B. buying links
C. poor user experience
D. All of the above

Ans : D

1. Distributed indexing is used in:

Select one:
a. All of the above
b. Web-scale indexing
c. Google data centres
d. Parallel tasking

Ans: a. All of the above

2. Which is a good idea for using skip pointers?

Select one:
a. Fewer skips, larger skip spans
b. None
c. Depends upon the no. of comparisons needed
d. More skips, shorter skip spans

Ans: c. Depends upon the no. of comparisons needed

3. Edit distance (Levenshtein distance) is a way of:

Select one:
a. Context-sensitive spelling correction
b. Document correction
c. Isolated word correction
d. Phonetic correction

Ans: c. Isolated word correction

4. The Boolean retrieval model does not provide for:

Select one:
a. Ranked search
b. Proximity search
c. Phrase search
d. Both proximity and ranked search

Ans: d. Both proximity and ranked search

5. Permuterm indices are used for solving:

Select one:
a. None
b. Boolean queries
c. Phrase queries
d. Wildcard queries

Ans: d. Wildcard queries

6. A large repository of documents in IR is called a:

Select one:

a. Corpus
b. Database
c. Dictionary
d. Collection

Ans: a. Corpus

7. A benefit of using a hash table is:

Select one:

a. Do not need to rehash everything periodically if vocabulary keeps growing.

b. Lookup in a hash table is faster than lookup in a tree.

c. All of the above

d. No prefix search is required

Ans: b. Lookup in a hash table is faster than lookup in a tree.

8. Variable-size postings lists are used when:

Select one:
a. More seek time is desired and the corpus is static
b. Less seek time is desired and the corpus is dynamic
c. Less seek time is desired and the corpus is static
d. More seek time is desired and the corpus is dynamic

Ans: d. More seek time is desired and the corpus is dynamic


9. An alternative to equivalence classing is to do:

Select one:
a. Asymmetric expansion
b. Symmetric expansion
c. Case folding
d. Normalization

Ans: a. Asymmetric expansion

10. We need external sorting algorithms to:

Select one:

a. Maximize the disk seek time.


b. Maintain constant disk seek time
c. Minimize the disk seek time.
d. None

Ans: c. Minimize the disk seek time.

11. Benefits of using B-trees:

Select one:
a. Re-balancing is cheap
b. Balanced trees allow efficient retrieval
c. Faster O(log M)
d. Solves the prefix problem.

Ans: d. Solves the prefix problem.

12. Postings list should be sorted by:

Select one:
a. Document Frequency
b. DocID
c. TermID
d. Term frequency

Ans: b. DocID

13. Key idea behind Single-pass in-memory indexing is:

Select one:
a. Don’t sort, Accumulate postings in postings lists as they occur.
b. Generate separate dictionaries for each block.
c. All of the above
d. No need to maintain term-termID mapping across blocks.
Ans: c. All of the above

14. For a postings list of length L, the number of skip pointers to use is:

Select one:
a. Use L evenly-spaced skip pointers
b. Use L^2 evenly-spaced skip pointers.
c. Use L^(1/2) evenly-spaced skip pointers
d. Use 2L evenly-spaced skip pointers.

Ans: c. Use L^(1/2) evenly-spaced skip pointers

15. For query optimization while intersecting two postings list, we should:

Select one:
a. Process in the order of increasing document frequency
b. Process in any order
c. None of the above
d. Process in the order of decreasing document frequency

Ans: a. Process in the order of increasing document frequency
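
The ordering heuristic can be sketched as follows. This is a minimal illustration, not a production index: the function name intersect_all and the three sample postings lists are made up for the example, and Python sets stand in for the postings merge. Starting with the rarest term keeps every intermediate result no larger than the shortest list.

```python
def intersect_all(postings_by_term):
    """Intersect several postings lists, processing the rarest term first."""
    if not postings_by_term:
        return []
    # Document frequency of a term = length of its postings list.
    ordered = sorted(postings_by_term.values(), key=len)
    result = set(ordered[0])
    for postings in ordered[1:]:
        result &= set(postings)
        if not result:
            break  # nothing left to match, stop early
    return sorted(result)


# Hypothetical sample index (terms and docIDs are illustrative only).
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "calpurnia": [2, 31, 54, 101],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
}
print(intersect_all(index))  # [2]
```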

16. The goal of IR is to:

Select one:
a. find documents relevant to an information need
b. find documents relevant to an information need from a given document set
c. find documents relevant to an information need from a large document set
d. find documents relevant to an information need from a small document set

Ans: c. find documents relevant to an information need from a large document set

17. Best implementation approach for dynamic indexing is:

Select one:
a. Periodic re-indexing
b. Using Invalidation bit-vector for deleted docs
c. None
d. Using logarithmic merge

Ans: d. Using logarithmic merge

18. Issues in biword indexes are:

Select one:
a. Any one
b. Index blowup due to bigger dictionary
c. Both
d. False positives

Ans: c. Both

19. Any string of terms of the following form is called an extended biword:

Select one:
a. NNX*
b. NXNN
c. *NNX
d. NX*N

Ans: d. NX*N

20. Structured data allows for:

Select one:

a. Does not depend on data complexity

b. Less complex queries

c. No relationship

d. More complex queries

Ans: d. More complex queries

21. Blocked sort-based Indexing is a method of:

Select one:
a. Sorting with more disk seeks.
b. Merging with fewer disk seeks.
c. Comparing with fewer disk seeks.
d. Sorting with fewer disk seeks.

Ans: d. Sorting with fewer disk seeks.

22. Term-document incidence matrix is:

Select one:
a. Sparse
b. Depends upon the data
c. Dense
d. Cannot predict

Ans: a. Sparse

23. Lemmatization is a technique for:


Select one:
a. Ranking documents
b. Case folding
c. Normalization
d. Tokenization

Ans: c. Normalization

24. If list lengths are x and y, merge takes:

Select one:
a. O(Yn) operations
b. O(xy) operations
c. O(xn) operations
d. O(x+y) operations

Ans: d. O(x+y) operations
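
A minimal sketch of the linear merge, assuming both postings lists are sorted by docID. The function name intersect and the sample lists are illustrative. Each loop iteration advances at least one pointer, so the loop runs at most x + y times.

```python
def intersect(p1, p2):
    """Intersect two docID-sorted postings lists in O(x + y) steps."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # p1 is behind, advance it
        else:
            j += 1  # p2 is behind, advance it
    return answer


# Made-up postings lists for illustration.
print(intersect([1, 3, 5, 8, 13], [2, 3, 8, 21]))  # [3, 8]
```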

25. Unstructured data tends to refer to information on the web and is processed using:

Select one:
a. Both
b. Database systems
c. IR systems
d. None

Ans: c. IR systems

Question 1
Consider the following documents:
D1: Cat in the hat
D2: The cat chased the rat
D3: The rat died
D4: The cat died
What is the space requirement for an uncompressed Boolean term-document incidence matrix of the above documents?

Select one:
7 bytes
28 bits
28 bytes
7 bits
The correct answer is: 28 bits
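
The count can be checked with a short sketch. It assumes terms are case-folded (so "The" and "the" are one term) and split on whitespace; the variable names are illustrative. The matrix needs one bit per (term, document) pair.

```python
# Documents from the question, lowercased so "The" and "the" are one term.
docs = {
    "D1": "cat in the hat",
    "D2": "the cat chased the rat",
    "D3": "the rat died",
    "D4": "the cat died",
}

# Vocabulary = distinct terms across all documents.
vocabulary = sorted({term for text in docs.values() for term in text.split()})

# One bit per cell of the terms x documents matrix.
matrix_bits = len(vocabulary) * len(docs)

print(vocabulary)   # ['cat', 'chased', 'died', 'hat', 'in', 'rat', 'the']
print(matrix_bits)  # 28
```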

Question 2
Which of the following terms have the same soundex code?

Select one or more:


Brightsite
Briteside
Brightside
Feedback
The correct answer is: Brightside, Brightsite
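
The codes can be reproduced with a simplified Soundex sketch. It assumes the common textbook variant: keep the first letter, map vowels and H, W, Y to 0, map the remaining consonants to digits 1 to 6, collapse runs of identical digits, drop zeros, and pad to four characters. Other Soundex variants differ in edge cases, so treat this as an illustration only.

```python
def soundex(term):
    """Four-character Soundex code (simplified textbook variant)."""
    groups = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3",
              "L": "4", "MN": "5", "R": "6"}
    digit = {ch: d for letters, d in groups.items() for ch in letters}
    word = [ch for ch in term.upper() if ch.isalpha()]
    if not word:
        return ""
    # Letters outside every group (vowels, H, W, Y) become 0.
    codes = [digit.get(ch, "0") for ch in word[1:]]
    # Collapse runs of identical digits, then drop the zeros.
    collapsed = [c for i, c in enumerate(codes) if i == 0 or c != codes[i - 1]]
    digits = [c for c in collapsed if c != "0"]
    return (word[0] + "".join(digits) + "000")[:4]


for name in ("Brightsite", "Briteside", "Brightside"):
    print(name, soundex(name))
# Brightsite B623
# Briteside B632
# Brightside B623
```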

Question 3
Consider an index for 100000 documents, each having a length of 750 words. Assume there are 200K distinct terms in total. What is the minimum number of bits required for representing the Doc-ID?

Select one:
8 bits
18 bits
17 bits
20 bits
The correct answer is: 17 bits
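
As a quick check, the number of bits is the smallest b with 2^b >= 100000. The sketch below assumes docIDs are numbered 0 to N-1 and uses integer arithmetic to avoid floating-point rounding; the variable names are illustrative.

```python
num_docs = 100_000

# Largest docID is num_docs - 1; bit_length() gives the bits needed to store it.
bits_needed = (num_docs - 1).bit_length()

print(bits_needed)       # 17
print(2 ** 16, 2 ** 17)  # 65536 131072
```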

Question 4
Which of the following is(are) NOT true with Google Search Engine?

Select one:
It offers specialized search services

It does stemming
It does stop-word removal
None of the choices
The correct answer is: None of the choices

Question 5
A fragment from an inverted index (augmented with positional information) is given below.
Information: d1: 12; d2: 23, 32, 43; d3: 13; d5: 32, 45, 80
systems: d1: 15; d2: 34, 42; d3: 35; d5: 38
Which of the following phrase(s) has(have) possible occurrences in the above document sequence?

Select one or more:


“Information retrieval systems”
“Information systems”
“Information theory retrieval systems”
None of the choices

The correct answer is: “Information retrieval systems”, “Information theory retrieval systems”
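
Since only the postings for "Information" and "systems" are given, a phrase is possible when "systems" occurs at the right offset after "Information": offset 1 for the two-word phrase, 2 for the three-word phrase, 3 for the four-word phrase. A minimal sketch of that check follows; the helper name docs_with_offset is an assumption made for the example.

```python
information = {"d1": [12], "d2": [23, 32, 43], "d3": [13], "d5": [32, 45, 80]}
systems = {"d1": [15], "d2": [34, 42], "d3": [35], "d5": [38]}


def docs_with_offset(first, second, offset):
    """Docs where `second` occurs exactly `offset` positions after `first`."""
    hits = []
    for doc in sorted(first.keys() & second.keys()):
        later = set(second[doc])
        if any(pos + offset in later for pos in first[doc]):
            hits.append(doc)
    return hits


print(docs_with_offset(information, systems, 1))  # []      "Information systems"
print(docs_with_offset(information, systems, 2))  # ['d2']  one word in between
print(docs_with_offset(information, systems, 3))  # ['d1']  two words in between
```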

Question 7
Consider the following fragment of a positional index with the format:
word: document: (position, position, ...); document: (position, ...); ...
Gates: 1: (3); 2: (6); 3: (2,17); 4: (1);
IBM: 4: (3); 7: (14);
Microsoft: 1: (1); 2: (1,21); 3: (3); 5: (16,22,51);
The /k operator, word1 /k word2, finds occurrences of word1 within k words of word2 (either on the left or the right side), where k is a positive integer argument. Thus k = 1 demands that word1 be adjacent to word2.
What is the set of documents that satisfy the query Gates /2 Microsoft?

Select one:
1,3
3
1
No document satisfies the query
The correct answer is: 1,3
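
The proximity check can be sketched directly from the index fragment. The sketch reads "within k words" as a position difference of at most k, which matches the statement that k = 1 means adjacency; the function name within_k is illustrative. Under that reading, document 1 (positions 3 and 1) and document 3 (positions 2 and 3) both satisfy k = 2.

```python
gates = {1: [3], 2: [6], 3: [2, 17], 4: [1]}
microsoft = {1: [1], 2: [1, 21], 3: [3], 5: [16, 22, 51]}


def within_k(first, second, k):
    """Docs where some occurrence of each word lies at most k positions apart."""
    hits = []
    for doc in sorted(first.keys() & second.keys()):
        if any(abs(p - q) <= k for p in first[doc] for q in second[doc]):
            hits.append(doc)
    return hits


print(within_k(gates, microsoft, 1))  # [3]
print(within_k(gates, microsoft, 2))  # [1, 3]
```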
Question 8
Given the query uni*e, which of the following keys should be looked up in a permuterm wildcard index?

Select one:
e$uni*
e$uin*
$unie*
Ie$un*
The correct answer is: e$uni*
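
The lookup key comes from appending the end marker $ to the query and rotating it until the * is last: uni*e becomes uni*e$, which rotates to e$uni*. A minimal sketch for single-wildcard queries follows; both function names are assumptions made for the example.

```python
def permuterm_key(query):
    """Rotate a query containing exactly one '*' so the '*' comes last."""
    if query.count("*") != 1:
        raise ValueError("expected exactly one '*'")
    prefix, suffix = query.split("*")
    return suffix + "$" + prefix + "*"


def permuterm_rotations(term):
    """All rotations of term + '$' that a permuterm index would store."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


key = permuterm_key("uni*e")
print(key)  # e$uni*

# A term matches if one of its stored rotations starts with the key prefix.
print(any(r.startswith(key[:-1]) for r in permuterm_rotations("unite")))  # True
```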

Question 9
If X denotes the length of string s1 and Y denotes the length of string s2, then the edit distance between s1 and s2 is never more than --------------------

Select one:
Min(X,Y)
None of the Choices
Max(X,Y)
X+Y
The correct answer is: Max(X,Y)

Question 10
What is the soundex code for the term “amazing”?
Select one:
A552
A252
A525
A255

The correct answer is: A525

Question 11
Given a collection of 1000 documents, of which 110 are relevant to a given query, if the IR system retrieves 30 relevant and 15 irrelevant documents, what is the recall value of the system?

Select one:
0.03
0.27
0.33
0.66

The correct answer is: 0.27
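
Recall is the fraction of all relevant documents that were retrieved, while precision is the fraction of retrieved documents that are relevant. A small sketch with the numbers from this question (function and variable names are illustrative):

```python
def precision(relevant_retrieved, total_retrieved):
    """Fraction of retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Fraction of relevant documents that were retrieved."""
    return relevant_retrieved / total_relevant


relevant_retrieved = 30
irrelevant_retrieved = 15
total_relevant = 110

r = recall(relevant_retrieved, total_relevant)
p = precision(relevant_retrieved, relevant_retrieved + irrelevant_retrieved)
print(round(r, 2))  # 0.27
print(round(p, 2))  # 0.67
```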

Question 12
When lemmatization is applied to the term “Destruction”, to which of the following forms is it reduced?

Select one:
Destruct
Destroy
Destruc

The correct answer is: Destroy


Question 13
Variable-size postings lists are used when

Select one:
Less seek time is desired and the corpus is dynamic
Less seek time is desired and the corpus is static
More seek time is desired and the corpus is dynamic
More seek time is desired and the corpus is static

The correct answer is: More seek time is desired and the corpus is dynamic

Question 14
Inverted Index Dictionary is sorted by

Select one:
Term frequency
Document Frequency
Term/TermID
DocID

The correct answer is: Term/TermID

Question 15
Which of the following is called an extended biword?
Select one:
NXNN
NNX*
NX*N
*NNX
The correct answer is: NX*N

Question 16
If two postings lists are of lengths X and Y, then the maximum number of operations needed for the merge is

Select one:
max(X,Y)
X+Y
X*Y
min(X,Y)
The correct answer is: X+Y

Question 17
Given the Boolean query (cat OR bat) AND NOT (dog OR mat),
which of the following is the equivalent Disjunctive Normal Form of
the above query?

Select one:
(cat AND (NOT dog) AND (NOT mat)) OR (cat AND bat AND(NOT dog))
(cat AND (NOT dog) AND (NOT mat)) OR (bat AND (NOT mat) AND(NOT dog))
None of the Choices
(cat AND bat AND (NOT dog)) OR (cat AND bat AND (NOT mat))

The correct answer is: (cat AND (NOT dog) AND (NOT mat)) OR (bat AND (NOT mat) AND(NOT dog))
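
The equivalence follows from De Morgan's law, NOT (dog OR mat) = (NOT dog) AND (NOT mat), followed by distributing AND over (cat OR bat). It can also be verified by brute force over all 16 truth assignments, as in this sketch (function names are illustrative):

```python
from itertools import product


def original(cat, bat, dog, mat):
    return (cat or bat) and not (dog or mat)


def dnf(cat, bat, dog, mat):
    return (cat and not dog and not mat) or (bat and not mat and not dog)


# Compare both forms on every combination of truth values.
equivalent = all(
    original(*values) == dnf(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)  # True
```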
Question 18
If string s1 = filosophi and s2 = philosophy, what is the minimum edit distance
between s1 and s2?

Select one:
3
5
4
2
The correct answer is: 3
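
One way to reach 3 edits: substitute f with p, insert h, and substitute the final i with y. The sketch below is a standard dynamic-programming Levenshtein distance with unit costs for insert, delete, and substitute; the function name is illustrative.

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    # previous[j] = distance between the processed prefix of s1 and s2[:j]
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


print(edit_distance("filosophi", "philosophy"))  # 3
```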

Question 19
Given a document containing the sentence “I left my left bag at my home” the number of tokens in the sentence is
Select one:
8
6
4
The correct answer is: 8
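
Tokens count every occurrence, while types count distinct tokens. A short sketch, assuming simple whitespace tokenization:

```python
sentence = "I left my left bag at my home"

tokens = sentence.split()  # every occurrence counts
types = set(tokens)        # distinct tokens only

print(len(tokens))  # 8
print(len(types))   # 6
```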

Question 20
Given a document collection which has 35 relevant documents, if an IR system retrieves 10 relevant and 13 irrelevant documents, what is the precision value of the system?

Select one:
0.43
0.28
0.33
0.66
The correct answer is: 0.43

Question 21
Consider the following documents:
Doc1: new home sales top forecasts
Doc2: home sales rise in july
Doc3: increase in home sales in july
Doc4: july new home sales rise
When the term-document incidence matrix is constructed and the query home AND (new OR july) is executed on it, the documents retrieved will be

Select one:
Doc1
Doc1,Doc3, Doc4
Doc1, Doc4,
Doc1, Doc2,Doc3,Doc4
The correct answer is: Doc1, Doc2,Doc3,Doc4
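
The query can be evaluated with set operations that mirror the Boolean operators: AND is intersection, OR is union. This sketch uses Python sets in place of the bit vectors of an incidence matrix; the helper name docs_containing is an assumption made for the example.

```python
docs = {
    "Doc1": "new home sales top forecasts",
    "Doc2": "home sales rise in july",
    "Doc3": "increase in home sales in july",
    "Doc4": "july new home sales rise",
}


def docs_containing(term):
    """Set of documents whose text contains the term as a whole word."""
    return {name for name, text in docs.items() if term in text.split()}


# home AND (new OR july)
result = docs_containing("home") & (docs_containing("new") | docs_containing("july"))
print(sorted(result))  # ['Doc1', 'Doc2', 'Doc3', 'Doc4']
```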

Question 22
Yahoo search engine uses stemming for its Index generation

Select one:
True
False
The correct answer is 'False'.

Question 23
When stemming is used, it should be used for both indexing and query processing.
Select one:
True
False
The correct answer is 'True'.

Question 24
Boolean Retrieval model maintains the term frequency. Is the statement True or False.

Select one:
True
False
The correct answer is 'False'.

Question 25
Phrase queries can be solved using N-grams.
Select one:
True
False
The correct answer is 'False'.
