Search Engine
Shashwat Shriparv
[email protected]
InfinitySoft
Why a Search Engine?
Finding key information on the gigantic World Wide Web
is similar to finding a needle lost in a haystack.
For this purpose we would use a special magnet that would
automatically, quickly and effortlessly attract that
needle for us.
In this scenario the magnet is the “Search Engine”.
“Even a blind squirrel finds a nut, occasionally.”
But few of us are determined enough to search through
millions, or billions, of pages of information to find
our “nut.” So, to reduce the problem to a more or less
manageable one, web “search engines” were introduced
a few years ago.
Search Engine
A software program that searches a
database and gathers and reports
information that contains or is related to
specified terms.
OR
A website whose primary function is
providing a search for gathering and
reporting information available on the
Internet or a portion of the Internet.
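The first definition can be illustrated with a minimal Python sketch. It assumes a tiny in-memory "database" of documents and reports the ones that contain every specified term. The `search` function and the three sample documents are hypothetical and exist only for illustration; a real search engine is far more elaborate.

```python
def search(database, query):
    """Return the names of documents that contain every term in the query."""
    terms = query.lower().split()
    results = []
    for name, text in database.items():
        words = set(text.lower().split())
        # Report the document only if all specified terms occur in it.
        if all(term in words for term in terms):
            results.append(name)
    return results


# Hypothetical three-document "database" used only for illustration.
database = {
    "doc1": "web search engines index pages",
    "doc2": "a needle lost in a haystack",
    "doc3": "search the web for a needle",
}

print(search(database, "web search"))
```

Here the query "web search" is reported for doc1 and doc3, since both contain the two terms.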
Eight reasonably well-known Web search engines are:
Top 10 Search Providers by Searches, August 2007

Provider      Searches (000)   Share of Total Searches (%)
              4,199,495        53.6
              1,561,903        19.9
              1,011,398        12.9
              435,088          5.6
              136,853          1.7
              71,724           0.9
              37,762           0.5
              34,699           0.4
              32,483           0.4
              31,912           0.4
Other         275,812          3.5
All Search    7,829,129        100.0
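Inverse Document Frequency
Term Frequency,
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the term $t_i$ in document $d_j$.
Inverse Document Frequency,
$$\mathrm{idf}_i = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|}$$
Where,
• $|D|$ : total number of documents in the corpus
• $|\{d_j : t_i \in d_j\}|$ : number of documents where the
term $t_i$ appears (that is, $n_{i,j} \neq 0$).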
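A minimal Python sketch of term frequency and inverse document frequency. It assumes each document is already split into a list of words, and it uses the natural logarithm of |D| divided by the number of documents containing the term. The function names and the four-document toy corpus are illustrative assumptions, not part of the original slides.

```python
import math


def term_frequency(term, document):
    """Occurrences of the term divided by the number of words in the document."""
    return document.count(term) / len(document)


def inverse_document_frequency(term, corpus):
    """ln(|D| / number of documents in which the term appears)."""
    containing = sum(1 for document in corpus if term in document)
    if containing == 0:
        raise ValueError("term does not appear in any document")
    return math.log(len(corpus) / containing)


# Toy corpus (assumed for illustration): four tokenized documents.
corpus = [
    ["search", "engine", "search"],
    ["web", "page"],
    ["web", "search"],
    ["meta", "engine"],
]

tf = term_frequency("search", corpus[0])            # 2 of 3 words
idf = inverse_document_frequency("search", corpus)  # ln(4 / 2)
tf_idf = tf * idf
print(tf, idf, tf_idf)
```

A term that is frequent in one document but rare across the corpus gets a high tf–idf weight, while a term that appears in every document gets an idf of zero.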
There are many different formulas used to calculate tf–idf.
One way of calculating “document frequency” (DF)
is to determine how many documents contain the word
and divide that count by the total number of documents in the
collection.
For example, if the word “computer” appears in 1,000
documents out of a total of 10,000,000, then the
document frequency is 0.0001 (1,000 / 10,000,000).
An alternative to this formula is to take the
logarithm of the document frequency. The natural
logarithm is commonly used. In this example we
would have ln(1,000 / 10,000,000) ≈ -9.21.
Inverting the ratio gives the inverse document frequency:
idf = ln(10,000,000 / 1,000) ≈ 9.21
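The arithmetic in this example can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and the natural logarithm is used as in the text.

```python
import math

total_documents = 10_000_000
documents_with_term = 1_000

# Document frequency: fraction of documents that contain the word.
document_frequency = documents_with_term / total_documents

# The log of the document frequency is negative ...
log_df = math.log(document_frequency)

# ... while the log of the inverted ratio (the idf) is positive.
idf = math.log(total_documents / documents_with_term)

print(document_frequency)  # 0.0001
print(round(log_df, 2))    # -9.21
print(round(idf, 2))       # 9.21
```

The two logarithms differ only in sign, so a rarer word (smaller document frequency) yields a larger idf.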
Pros and Cons of Meta Search Engines
Pros:
• Easy to use
• Able to search more web pages in less time
• High probability of finding the desired page(s)
• Will return at least some results when none
were obtained with traditional search engines
Cons:
• Metasearch engine results can be less relevant, since the
metasearch engine doesn’t know the internal “alchemy” of the
search engines used.
• Since only the top 10-50 hits are retrieved from each
search engine, the total number of hits retrieved may
be considerably less than that found by doing a direct
search.
• Advanced search features (such as searches with
Boolean operators and field limiting; use of " ", +/-;
default AND between words, etc.) are not usually
available.
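The first two drawbacks can be seen in a rough Python sketch of how a metasearch engine might combine results. It takes only the top k hits from each underlying engine and, because it has no access to their internal scores, ranks the merged list by position alone. The engine names, result lists, and the simple rank-based scoring are all hypothetical assumptions, not a description of any particular metasearch engine.

```python
def merge_top_hits(results_by_engine, k=10):
    """Merge the top-k hits of each engine using rank position only."""
    scores = {}
    for hits in results_by_engine.values():
        for rank, url in enumerate(hits[:k]):
            # Higher-ranked hits earn more points; the engines' own
            # relevance scores are not visible to the metasearch engine.
            scores[url] = scores.get(url, 0) + (k - rank)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical ranked result lists from two underlying engines.
results_by_engine = {
    "engine_a": ["a.example", "b.example", "c.example"],
    "engine_b": ["b.example", "d.example"],
}

merged = merge_top_hits(results_by_engine, k=2)
print(merged)
```

With k=2, the third hit of engine_a never reaches the merged list, which is why a metasearch can return fewer hits than a direct search on the same engine.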
Meta Search Engines (Cont.)

Meta-Search Engine   Primary Web Databases                                     Ad Databases   Special Features
Vivisimo             Ask, MSN, Gigablast, Looksmart, Open Directory, Wisenut   Google         Clusters results