______ is the process of fetching all the web pages linked to a web site.
A
Indexing
B
Calculating Relevancy
C
Crawling
D
Processing
Question 2
The task of crawling is performed by complex software called a __________.
A Spider
B All of these
C Bot
D Crawler
Question 3
Websites fetched by the crawler are indexed and kept in a huge database; this process is called __________.
A None of these
B Indexing
C Optimizing
D Crawling
Question 4
Question 5
Serving different versions of a page to search engines and to human visitors is called ________ SEO.
A Cloaking
B Fooling
C None of these
D Tapping
Question 3.
A subject-oriented search engine:
returns a list of sites based on the list of search terms you enter.
searches a variety of other search engines.
allows all users to change its content.
returns a list of sites that have been reviewed by humans.
Question 4.
A meta search engine:
returns a list of sites based on the list of search terms you enter.
returns a list of sites that have been reviewed by humans.
searches a variety of other search engines.
allows all users to change its content.
Question 5.
To search for information in Google, you would:
Question 6.
To search more effectively, you should:
Question 7.
Compared to subject directories, search engines:
Question 8.
Boolean logic allows you to:
Question 9.
If you have a rough idea of your search in terms of what type of category it might fall into, it is best to use a:
Question 10.
If you enter the phrase "Cookie Recipes" with quotation marks in a search engine, you will get results of all pages with:
ANSWERS
If a website's search engine saturation with respect to a particular search engine is 20%, what does it mean?
A. 20% of the webpages of the website have been indexed by the search engine
B. Only 20% of the pages of the website will be indexed by the search engine
C. 20% of the websites pages will never be indexed
D. None of the above
Ans : A
Which of the following items search engines don't want?
A. keyword stuffing
B. buying links
C. poor user experience
D. All of the above
Ans : D
Select one:
a. All of the above
b. Web-scale indexing
c. Google data centres
d. Parallel tasking
Select one:
a. Fewer skips, larger skip spans
b. None
c. Depends upon the no. of comparisons needed
d. More skips, shorter skip spans
Select one:
a. Context-sensitive spelling correction
b. Document correction
c. Isolated word correction
d. Phonetic correction
Select one:
a. Ranked search
b. Proximity search
c. Phrase search
d. Both proximity and ranked search
Select one:
a. None
b. Boolean queries
c. Phrase queries
d. Wildcard queries
Select one:
a. Corpus
b. Database
c. Dictionary
d. Collection
Ans: a. Corpus
Select one:
Select one:
a. More seek time is desired and the corpus is dynamic
b. Less seek time is desired and the corpus is dynamic
c. Less seek time is desired and the corpus is static
d. More seek time is desired and the corpus is static
Select one:
a. Asymmetric expansion
b. Symmetric expansion
c. Case folding
d. Normalization
Ans: d. Normalization
Select one:
Select one:
a. Re-balancing is cheap
b. Balanced trees allow efficient retrieval
c. Faster O(log M)
d. Solves the prefix problem.
Select one:
a. Document Frequency
b. DocID
c. TermID
d. Term frequency
Ans: b. DocID
Select one:
a. Don’t sort, Accumulate postings in postings lists as they occur.
b. Generate separate dictionaries for each block.
c. All of the above
d. No need to maintain term-termID mapping across blocks.
Ans: c. All of the above
Select one:
a. Use L evenly-spaced skip pointers
15. For query optimization while intersecting two postings lists, we should:
Select one:
a. Process in the order of increasing document frequency
b. Process in any order
c. None of the above
d. Process in the order of decreasing document frequency
Select one:
a. find documents relevant to an information need
b. find documents relevant to an information need from a given document set
c. find documents relevant to an information need from a large document set
d. find documents relevant to an information need from a small document set
Ans: c. find documents relevant to an information need from a large document set
Select one:
a. Periodic re-indexing
b. Using Invalidation bit-vector for deleted docs
c. None
d. Using logarithmic merge
Select one:
a. Any one
b. Index blowup due to bigger dictionary
c. Both
d. False positives
Ans: c. Both
19. Any string of terms of the following form is called an extended biword:
Select one:
a. NNX*
b. NXNN
c. *NNX
d. NX*N
Ans: d. NX*N
Select one:
c. No relationship
Select one:
a. Sorting with more disk seeks.
b. Merging with fewer disk seeks.
c. Comparing with fewer disk seeks.
d. Sorting with fewer disk seeks.
Select one:
a. Sparse
b. Depends upon the data
c. Dense
d. Cannot predict
Ans: a. Sparse
Ans: c. Normalization
Select one:
a. O(Yn) operations
b. O(xy) operations
c. O(xn) operations
d. O(x+y) operations
25. Unstructured data tends to refer to information on the web and is processed using:
Select one:
a. Both
b. Database systems
c. IR systems
d. None
Ans: c. IR systems
Question 1
Consider the following documents:
D1. Cat in the hat
D2. The cat chased the rat
D3. The rat died
D4: The cat died
What is the space requirement for an uncompressed Boolean term-document incidence matrix of the above documents?
Select one:
7 bytes
28 bits
28 bytes
7 bits
The correct answer is: 28 bits
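The 28-bit figure follows from one bit per (term, document) cell. A minimal sketch, assuming terms are lowercased and split on whitespace (the variable names are illustrative):

```python
# Documents from the question; lowercasing is an assumption so that
# "The" and "the" (and "Cat" and "cat") share one row of the matrix.
docs = {
    "D1": "Cat in the hat",
    "D2": "The cat chased the rat",
    "D3": "The rat died",
    "D4": "The cat died",
}

vocabulary = sorted({term for text in docs.values() for term in text.lower().split()})

# One bit per (term, document) cell of the Boolean incidence matrix.
matrix_bits = len(vocabulary) * len(docs)

print(vocabulary)
print(matrix_bits)
```

With 7 distinct terms and 4 documents, the matrix has 7 x 4 = 28 cells.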
Question 2
Which of the following terms have the same soundex code?
Question 3
Consider an index for 100000 documents each having a length of 750 words. Assume there are 200K distinct terms in total. What is the minimum number of bits required for representing the Doc-ID?
Select one:
8 bits
18 bits
17 bits
20 bits
The correct answer is: 17 bits
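The bit count is the smallest b with 2^b >= 100000; the document length and vocabulary size do not affect it. A small sketch, assuming IDs run from 0 to N-1:

```python
num_docs = 100_000

# The largest ID is num_docs - 1; int.bit_length() gives the number of
# bits needed to represent it, i.e. the smallest b with 2**b >= num_docs.
doc_id_bits = (num_docs - 1).bit_length()

print(doc_id_bits)
```

Since 2^16 = 65536 is too small and 2^17 = 131072 is enough, the result is 17.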
Question 4
Which of the following is(are) NOT true with Google Search Engine?
Select one:
It offers specialized search services
It does stemming
It does stop-word removal
None of the choices
The correct answer is: None of the choices
Question 5
A fragment from an inverted index (augmented with positional information) is given below.
Information: d1:12; d2:23,32,43; d3:13; d5:32,45,80
systems: d1:15; d2:34,42; d3:35; d5:38
Which of the following phrase(s) has(have) possible occurrences in the above document sequence?
The correct answer is: “Information retrieval systems”, “Information theory retrieval systems”
Question 7
Consider the following fragment of a positional index with the format:
word: document: (position, position, ...); document: (position, ...); ...
Gates: 1: (3); 2: (6); 3: (2,17); 4: (1);
IBM: 4: (3); 7: (14);
Microsoft: 1: (1); 2: (1,21); 3: (3); 5: (16,22,51);
The /k operator, word1 /k word2, finds occurrences of word1 within k words of word2 (either on the left or the right side), where k is a positive integer argument. Thus k = 1 demands that word1 be adjacent to word2.
What is the set of documents that satisfy the query Gates /2 Microsoft?
Select one:
1,3
3
1
No document satisfies the query
The correct answer is: 1,3
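The /k semantics can be checked mechanically. The sketch below is a brute-force comparison of positions rather than an optimized positional merge, and `within_k` is an illustrative name, not part of the question. For the postings given, documents 1 and 3 both contain Gates within two positions of Microsoft (distances 2 and 1).

```python
# Positional postings from the question: term -> {doc_id: [positions]}.
index = {
    "Gates": {1: [3], 2: [6], 3: [2, 17], 4: [1]},
    "IBM": {4: [3], 7: [14]},
    "Microsoft": {1: [1], 2: [1, 21], 3: [3], 5: [16, 22, 51]},
}


def within_k(index, word1, word2, k):
    """Return sorted doc IDs where word1 occurs within k positions of word2."""
    postings1, postings2 = index[word1], index[word2]
    hits = []
    # Only documents containing both words can match.
    for doc_id in sorted(postings1.keys() & postings2.keys()):
        if any(abs(p1 - p2) <= k
               for p1 in postings1[doc_id]
               for p2 in postings2[doc_id]):
            hits.append(doc_id)
    return hits


print(within_k(index, "Gates", "Microsoft", 2))
print(within_k(index, "Gates", "Microsoft", 1))
```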
Question 8
Given the query uni*e, if you want to search a permuterm wildcard index, which of the following keys should be looked up?
Select one:
e$uni*
e$uin*
$unie*
Ie$un*
The correct answer is: e$uni*
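A permuterm lookup rotates a single-wildcard query X*Y so that the wildcard ends up at the end, giving the key Y$X*. A minimal sketch, assuming the query contains exactly one `*` (the function name `permuterm_key` is illustrative):

```python
def permuterm_key(query):
    """Rotate a single-wildcard query X*Y into the permuterm lookup key Y$X*."""
    prefix, suffix = query.split("*")  # assumes exactly one '*' in the query
    return f"{suffix}${prefix}*"


print(permuterm_key("uni*e"))
```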
Question 9
If X denotes the length of string s1 and Y denotes the length of the string s2, then the edit distance between s1 and s2 is never more than --------------------
Select one:
Min(X,Y)
None of the Choices
Max(X,Y)
X+Y
The correct answer is: Max(X,Y)
Question 10
What is the soundex code for the term “amazing”?
Select one:
A552
A252
A525
A255
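The Soundex code can be worked out with a short sketch. This is a simplified textbook variant, assumed here for illustration: keep the first letter, map the remaining letters to digits (vowels and h, w, y map to 0), collapse consecutive identical digits, drop the zeros, and pad to four characters. For "amazing" it yields A525.

```python
# Letter-to-digit classes used by Soundex; all other letters map to "0".
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(term):
    """Return a four-character Soundex code (simplified variant)."""
    term = term.lower()
    digits = [CODES.get(ch, "0") for ch in term[1:]]
    collapsed = []
    for d in digits:
        # Keep only one of each run of consecutive identical digits.
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    tail = "".join(d for d in collapsed if d != "0")
    return (term[0].upper() + tail + "000")[:4]


print(soundex("amazing"))
```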
Question 11
Given a document collection of 1000 documents which has 110 relevant documents for a given query and if the IR system retrieves 30 relevant and 15 irrelevant documents, what is the recall value of the system?
Select one:
0.03
0.27
0.33
0.66
Question 12
When lemmatization is applied to the term “Destruction”, to which of the following forms is it reduced?
Select one:
Destruct
Destroy
Destruc
Select one:
Less seek time is desired and the corpus is dynamic
Less seek time is desired and the corpus is static
More seek time is desired and the corpus is dynamic
More seek time is desired and the corpus is static
The correct answer is: More seek time is desired and the corpus is dynamic
Question 14
Inverted Index Dictionary is sorted by
Select one:
Term frequency
Document Frequency
Term/TermID
DocID
Question 15
Which of the following is called an extended biword?
Select one:
NXNN
NNX*
NX*N
*NNX
The correct answer is: NX*N
Question 16
If the two postings lists are of length X and Y, then the maximum number of operations needed for the merge is
Select one:
max(X,Y)
X+Y
X*Y
min(X,Y)
The correct answer is: X+Y
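The X+Y bound comes from the linear merge: each comparison advances at least one of the two pointers, so the loop runs at most X+Y times. A minimal sketch of an intersection merge over sorted postings lists (the example lists are made up for illustration):

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists; return (result, comparisons)."""
    i = j = comparisons = 0
    result = []
    while i < len(p1) and j < len(p2):
        comparisons += 1
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result, comparisons


# Hypothetical postings lists, not taken from the question.
result, comparisons = intersect([1, 3, 5, 7], [2, 3, 6, 8])
print(result, comparisons)
```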
Question 17
Given the Boolean query with terms (cat OR bat) AND NOT (dog OR mat)
Which of the following will be the equivalent Disjunctive Normal Form of
the above query?
Select one:
(cat AND (NOT dog) AND (NOT mat)) OR (cat AND bat AND (NOT dog))
(cat AND (NOT dog) AND (NOT mat)) OR (bat AND (NOT mat) AND (NOT dog))
None of the Choices
(cat AND bat AND (NOT dog)) OR (cat AND bat AND (NOT mat))
The correct answer is: (cat AND (NOT dog) AND (NOT mat)) OR (bat AND (NOT mat) AND (NOT dog))
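The equivalence can be confirmed by brute force over all 16 truth assignments. This is only a sanity-check sketch; the function names are illustrative.

```python
from itertools import product


def original(cat, bat, dog, mat):
    # (cat OR bat) AND NOT (dog OR mat)
    return (cat or bat) and not (dog or mat)


def dnf(cat, bat, dog, mat):
    # (cat AND NOT dog AND NOT mat) OR (bat AND NOT mat AND NOT dog)
    return (cat and not dog and not mat) or (bat and not mat and not dog)


equivalent = all(
    original(*values) == dnf(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)
```

The DNF follows from De Morgan's law, NOT (dog OR mat) = (NOT dog) AND (NOT mat), followed by distributing the AND over (cat OR bat).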
Question 18
If string s1 = filosophi and s2 = philosophy, what is the minimum edit distance
between s1 and s2?
Select one:
3
5
4
2
The correct answer is: 3
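The value 3 corresponds to substituting f with p, inserting h, and substituting the final i with y. A minimal sketch of the standard dynamic-programming (Levenshtein) computation, assuming unit cost for insert, delete, and substitute:

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


print(edit_distance("filosophi", "philosophy"))
```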
Question 19
Given a document containing the sentence “I left my left bag at my home” the number of tokens in the sentence is
Select one:
8
6
4
The correct answer is: 8
Question 20
Given a document collection which has 35 relevant documents, if an IR system retrieves 10 relevant and 13 irrelevant documents, what is the precision value of the system?
Select one:
0.43
0.28
0.33
0.66
The correct answer is: 0.43
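Precision divides the relevant retrieved documents by everything retrieved, while recall divides them by all relevant documents in the collection. A small sketch applying both definitions to the numbers in Question 20 and Question 11 (the function names are illustrative):

```python
def precision(relevant_retrieved, irrelevant_retrieved):
    """Fraction of retrieved documents that are relevant."""
    return relevant_retrieved / (relevant_retrieved + irrelevant_retrieved)


def recall(relevant_retrieved, total_relevant):
    """Fraction of all relevant documents that were retrieved."""
    return relevant_retrieved / total_relevant


print(round(precision(10, 13), 2))  # Question 20: 10 / 23
print(round(recall(30, 110), 2))    # Question 11: 30 / 110
```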
Question 21
Consider the following documents:
Doc1: new home sales top forecasts
Doc2: home sales rise in july
Doc3: increase in home sales in july
Doc4: july new home sales rise
When the term-document incidence matrix is constructed and the query home AND (new OR july) is executed on it, the resultant documents retrieved will be
Select one:
Doc1
Doc1,Doc3, Doc4
Doc1, Doc4
Doc1, Doc2,Doc3,Doc4
The correct answer is: Doc1, Doc2,Doc3,Doc4
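Every document contains home, and each also contains new or july, so all four match. A minimal sketch that evaluates the query with set operations instead of bit vectors (an equivalent formulation assumed here for brevity):

```python
docs = {
    "Doc1": "new home sales top forecasts",
    "Doc2": "home sales rise in july",
    "Doc3": "increase in home sales in july",
    "Doc4": "july new home sales rise",
}


def docs_with(term):
    """Return the set of documents whose text contains the term as a word."""
    return {name for name, text in docs.items() if term in text.split()}


# home AND (new OR july)
result = docs_with("home") & (docs_with("new") | docs_with("july"))
print(sorted(result))
```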
Question 22
Yahoo search engine uses stemming for its Index generation
Select one:
True
False
The correct answer is 'False'.
Question 23
When stemming is used, it should be used for both indexing and query processing.
Select one:
True
False
The correct answer is 'True'.
Question 24
The Boolean retrieval model maintains the term frequency. Is the statement true or false?
Select one:
True
False
The correct answer is 'False'.
Question 25
Phrase queries can be solved using N-grams.
Select one:
True
False
The correct answer is 'False'.