Search Engine Comparison
There is broad agreement in the literature on the importance of search engines for web mining, owing to the huge growth of data across the World Wide Web. A variety of search engines offer diversified services to their users [10]. Many comparative studies have been carried out to clarify and explain the preference for one search engine over others, taking into account the many features that search engine companies compete to create and enhance in order to keep a head start. Popularity, usability, and website quality are the goals search engines pursue in order to excel.
Other studies focus on the mechanism by which a search engine works. All search engines have in common the main parts of Web crawling (spidering), Web indexing, and searching.

[Figure 1 diagram labels: crawl the Web; Web documents; document IDs; create inverted index; inverted index; search engine servers; user query; rank results.]
Figure 1: Simple Mechanism Representation of How a Search Engine Works.

Web mining is made up of three branches: web content mining (WCM), web structure mining (WSM), and web usage mining (WUM) [1]. WCM explores the proper and relevant information in the contents of the web. WSM finds the relations between different web pages by processing the structure of the web. WUM records the user profile and user behavior from the log files of the web. As information on the web scales up, search engines must accommodate this scaling, which is why large-scale search engine architectures include many distributed servers. Many papers discuss how to improve the performance of web search engines with respect to the user interface and query input, the filtering of output results, and the algorithms used for web page crawling and collecting, indexing, and output [15].

5. Experiments for Search Engines: Google, Yahoo, Bing, Ask, DuckDuckGo.
Five experiments on different kinds of information were executed. We picked five target pages that contain different kinds of information and data files: text, audio, photo, article, and a person information page. For each target page we collected five keywords, then applied two queries, one simple and one complex, in each of the five search engines under comparison. One further examination checked the effect of adding Boolean logic [conjunction AND, disjunction OR, inurl, intitle, …] to the query text. The following table shows the target pages and keywords selected for each query:

Experiment 1
Target page 1: http://en.wikipedia.org/wiki/Troy
Here we are looking for text information about Troy.
Keywords selected: [1. Greek 2. Homeric 3. war 4. ancient 5. troy]
First query: [troy]
Second query: [troy Greek Homeric war ancient]

Experiment 2
Target page 2: http://www.englishbaby.com/findfriends/view_photo/683675?_sp_album=19680
Here we are looking for a photo file of Yassir Arafat and Ahmad Yaseen.
Keywords: [1. yassir arafat 2. ahmed yaseen 3. photo 4. Palestine 5. shymaa]
First query: [yassir arafat ahmed yaseen]
Second query: [yassir Arafat ahmed yaseen photo Palestine shymaa]

Experiment 3
Target page 3: http://www.mp3quran.net/maher.html
Here we are looking for an mp3 file of Quran chapter 96, searching in Arabic.
Keywords: [1. المعيقلي 2. العلق 3. قران 4. mp3 5. تحميل] (in English: Al-Muaiqly, Al-Alaq, Quran, mp3, download)
First query: [المعيقلي العلق]
Second query: [قران العلق المعيقلي mp3 تحميل]

Experiment 4
Target page 4: http://www.alquds.edu/en/faculties/faculty-of-science-technology/department-of-computer-science-it/135-staff/9790-rashid-jayousi.html
Here we are looking for person information.
Keywords: [1. Rashid Jayousi 2. Dr 3. Alquds University 4. Staff 5. Palestine]
First query: [Rashid Jayousi Staff]
Second query: [Rashid Jayousi Dr Alquds University Staff]

Experiment 5
Target page 5: ijcer.org/index.php/ojs/article/download/110/37
Here we are looking for a PDF file that contains scientific research.
Keywords: [1. ijcer 2. Vijaya Kumar 3. search engine 4. pdf 5. 2013]
First query: [ijcer search engine Vijaya Kumar pdf 2013]
Second query: using Boolean operators

Table 1: Five Experiments Done to Compare Search Engine Features.

The results obtained are analyzed and compared with respect to several attributes and features, as shown in Table 2.
Search Engine | Google | Yahoo | Bing | Ask | DuckDuckGo
URL | www.google.com | www.yahoo.com | www.bing.com | www.ask.com | www.duckduckgo.com
Boolean Operators | AND, OR, -, intitle:, inurl:, intext: | AND, OR, (), “ ”, NOT, Domain: | AND, OR, -, intitle:, site:, “ ” | OR, -, +, site: | AND, OR, -, “ ”, intitle:, inbody:, site:
Experiment 1, Q1: [troy] | About 45,400,000 results (0.40 s); target webpage is 2nd choice | 87,500,000 results; target webpage is 2nd choice | 47,700,000 results; target webpage is 6th choice | Target webpage is 3rd choice | Target webpage is 1st choice
Experiment 1, Q2: [troy Greek Homeric war ancient] | About 1,650,000 results (0.27 s); target webpage is 1st choice | 1,930,000 results; target webpage is 1st choice | 3,010,000 results; target webpage is 1st choice | Target webpage is 1st choice | Target webpage is 1st choice

Table 2: Table of Results for Five Experiments of Searching Different Queries over the World Wide Web.
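The simple and complex queries of Experiment 1, and a query that adds Boolean operators, can be sketched as plain string construction. This is only an illustration: the function names `simple_query` and `boolean_query` are hypothetical, the operator spellings (`AND`, `intitle:`, `site:`) are taken from the operators listed in Table 2, and each engine may interpret them differently.

```python
def simple_query(keywords):
    """Join keywords with spaces, as in Q1 and Q2 of Experiment 1."""
    return " ".join(keywords)


def boolean_query(keywords, operator="AND", title_term=None, site=None):
    """Join keywords with a Boolean operator and optional field operators.

    Hypothetical helper: multi-word keywords are quoted so they are
    treated as phrases; intitle: and site: are appended only if given.
    """
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    parts = [f" {operator} ".join(terms)]
    if title_term:
        parts.append(f"intitle:{title_term}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


q1 = simple_query(["troy"])
q2 = simple_query(["troy", "Greek", "Homeric", "war", "ancient"])
q_bool = boolean_query(["troy", "Homeric"], "AND",
                       title_term="Troy", site="en.wikipedia.org")
```

Here `q1` and `q2` reproduce the two Experiment 1 queries, while `q_bool` shows one possible Boolean form; the actual Boolean query used in Experiment 5 is not given in the text.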
The results are analyzed and evaluated by taking into account the total number of results and the speed of each query, and the position of the target page in the results. A mark of 10 is given for an excellent result, 7 for an average result, and 4 for a poor result. A mark of 60 is given for the availability of categorization and search tools, and a mark of 40 for the effectiveness of Boolean operators (Experiment 5, Q2). The total mark of a search engine is then expressed as a ratio out of 300 (the optimal search engine).
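The marking scheme can be sketched as a small calculation. The text does not spell out how the per-query marks add up to the remaining 200 points, so the sketch assumes one reading that is consistent with the total of 300: five experiments, two queries each, and two criteria per query (result count and speed, and target position), each worth up to 10 marks. The names `QUALITY_MARKS` and `total_ratio` are hypothetical.

```python
# Marks stated in the text.
QUALITY_MARKS = {"excellent": 10, "average": 7, "poor": 4}
MAX_TOOLS_MARK = 60      # categorization and search tools
MAX_BOOLEAN_MARK = 40    # Boolean operators (Experiment 5, Q2)
OPTIMAL_TOTAL = 300      # the "optimal search engine"

# Assumption (not stated in the text): 5 experiments x 2 queries x 2 criteria.
MAX_GRADED_ITEMS = 20


def total_ratio(query_grades, tools_mark, boolean_mark):
    """Return a search engine's total mark as a fraction of 300."""
    if len(query_grades) > MAX_GRADED_ITEMS:
        raise ValueError("too many graded items for the assumed scheme")
    if not 0 <= tools_mark <= MAX_TOOLS_MARK:
        raise ValueError("tools mark must be between 0 and 60")
    if not 0 <= boolean_mark <= MAX_BOOLEAN_MARK:
        raise ValueError("Boolean mark must be between 0 and 40")
    query_marks = sum(QUALITY_MARKS[grade] for grade in query_grades)
    return (query_marks + tools_mark + boolean_mark) / OPTIMAL_TOTAL
```

Under this assumption, an engine graded "excellent" on all twenty items with full tool and Boolean marks reaches a ratio of 1.0.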
The components:
• Crawler: There are several distributed crawlers. They parse the pages and extract links and keywords.
• URL Server: Provides the crawlers with a list of URLs to scan.
• Store Server: The crawlers send the collected data to a store server. It compresses the pages and places them in the repository. Each page is stored with an identifier, a docID.
• Repository: Contains a copy of the pages and images, allowing comparisons and caching.
• Indexer: It indexes pages. It decompresses documents and converts them into sets of words called "hits". It distributes the hits among a set of "barrels", which provides a partially sorted index. It also creates a list of the URLs on each page. A hit contains the following information: the word, its position in the document, font size, and capitalization.
• Barrels: These "barrels" are databases that classify documents by docID. They are created by the indexer and used by the sorter.
• Anchors: The bank of anchors created by the indexer contains internal links and the text associated with each link.
• URL Resolver: It takes the contents of the anchors, converts relative URLs into absolute addresses, and finds or creates a docID. It builds an index of documents and a database of links.
• Doc Index: Contains the text relative to each URL.
• Links: The database of links associates each link with a docID (and so with a real document on the Web).
• Page Rank: The software uses the database of links to define the PageRank of each page.
• Sorter: It interacts with the barrels. It takes the documents classified by docID and creates an inverted list sorted by wordID.
• Lexicon: A program called DumpLexicon takes the list provided by the sorter (classified by wordID), together with the lexicon created by the indexer (the sets of keywords in each page), and produces a new lexicon for the searcher.
• Searcher: It runs on a web server in a datacenter, uses the lexicon built by DumpLexicon in combination with the index classified by wordID, takes the PageRank into account, and produces a results page.

7.2 Yahoo: An Overview
Yahoo, which began in mid-1994, is the oldest and also the largest directory on the Internet. It is one of the most frequently accessed tools and, although most people consider it a search engine, it is classified as a directory.

Features
• Topic- and region-specific "Yahoos!"
• Automatic truncation
• No case sensitivity and stop words
The syntax that Yahoo follows for searching is fairly standard among all search engines.

Figure 4: Yahoo Architecture.

The components:
Data Acquisition -- Web Crawling
• follow hyperlinks to download pages
• spam detection
• (near) duplicate detection
• link analysis -- e.g., PageRank
• prepares input for crawling and query processing
Index Construction and Updates
• build the inverted index structure in bulk, similar to mining, but updates are trickier
Query Processing
Boolean queries:
• compute unions/intersections of lists
Ranked queries:
• give scores to all docs in the union

8. STATISTICAL OVERVIEW
Figure 5 shows that Google is still the king of search traffic, accounting for 70.38% of all search traffic in October 2013. Bing and Yahoo! follow further behind with 10.63% and 7.31% respectively, while Ask is at 3.44% and AOL Search is at 1.24% [[3]].
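The indexing and query processing steps described for the architectures above (building an inverted index, computing unions and intersections of lists for Boolean queries, and scoring the documents in the union for ranked queries) can be sketched in a few lines. This is a minimal illustration and not either engine's implementation: the function names and the sample documents are hypothetical, and the ranking simply counts matching query terms rather than using PageRank or hit attributes.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of docIDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, terms):
    """Boolean AND: intersection of the posting lists of all terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Boolean OR: union of the posting lists of all terms."""
    result = set()
    for term in terms:
        result |= index.get(term.lower(), set())
    return result


def ranked(index, terms):
    """Score every doc in the union by its number of matching terms."""
    scores = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term.lower(), set()):
            scores[doc_id] += 1
    # Highest score first; ties broken by docID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents keyed by docID.
SAMPLE_DOCS = {
    1: "troy ancient greek war",
    2: "troy film",
    3: "ancient rome",
}
sample_index = build_inverted_index(SAMPLE_DOCS)
```

With these sample documents, `boolean_and(sample_index, ["troy", "ancient"])` keeps only document 1, while the OR and ranked variants consider all three documents.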
Figure 5: Search Engine Usage For October 2013.

9. SUMMARY & CONCLUSIONS
On average, a user needs to find the target page while searching with few keywords and a simple query.
A simple user interface that enables users to make sufficient use of the search tools or the categorization feature contributes to the effectiveness of a search engine.
As seen from the analysis and results, some search engines (Google, Yahoo, Bing) have advantages over others because of their accuracy in returning the result the user is looking for on the first page. This is due to exhaustive analysis of the algorithms for the several server processes of crawling, ranking, and searching.
Further advantages come from a user-friendly interface and from features and tools that are well introduced in the interface.
Advanced users look for more advanced tools and adequate features, which are available in some search engines but in different forms or with different techniques.
Boolean operators make a difference when we are searching for a unique target link. As we see in Experiment 5, by using Boolean operators we can get the unique target among the first results.
For more accurate results we need to examine each search engine with more query experiments, and we need to check the category feature and the search tools in more detail.
The basic principle of operation is the same for all search engines: crawling, indexing, and ranking, with some differences that may lead to significant changes in the accuracy of the results.

10. REFERENCES
[1] Tauqeer Ahmad Usmani et al., “A Comparative Study of Google and Bing Search Engines in Context of Precision and Relative Recall Parameter”, International Journal on Computer Science and Engineering (IJCSE), Vol. 4 No. 01, January 2012.
[2] Neelam Tyagi, Simple Sharma, “Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page”, International Journal of Soft Computing and Engineering (IJSCE), Volume-2, Issue-3, July 2012.
[3] Krishan Kant Lavania, Sapna Jain, Madhur Kumar Gupta, and Nicy Sharma, “Google: A Case Study (Web Searching and Crawling)”, International Journal of Computer Theory and Engineering, Vol. 5, No. 2, April 2013.
[4] Dilip Kumar Sharma et al., “A Comparative Analysis of Web Page Ranking Algorithms”, International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 08, 2010.
[5] P.N. Vijaya Kumar et al., “A PRACTICAL APPROACH TO WORKING OF WEB SEARCH ENGINE”, International Journal of Computer and Electronics Research, Volume 2, Issue 1, February 2013.
[6] B. Barla Cambazoglu, Flavio P. Junqueira, Vassilis Plachouras, “A Refreshing Perspective of Search Engine Caching”, International World Wide Web Conference Committee (IW3C2), April 26–30, 2010.
[7] Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen, “Combating Web Spam with TrustRank”, 30th VLDB Conference, Toronto, Canada, 2004.
[8] Ziv Bar-Yossef, “Efficient Search Engine Measurements”, International World Wide Web Conference Committee (IW3C2), May 2007.
[9] Bernard J. Jansen, Amanda Spink, “How are we searching the World Wide Web? A comparison of nine search engine transaction logs”, Information Processing and Management 42 (2006), ScienceDirect.
[10] Inderjeet Singh Oberoi, Mridul Chopra, “Web Search Engines - A Comparative Study”, Mälardalen University, 2010.
[11] P.N. Vijaya Kumar et al., “A PRACTICAL APPROACH TO WORKING OF WEB SEARCH ENGINE”, International Journal of Computer and Electronics Research, Volume 2, Issue 1, February 2013.
[12] Rasmita Mohanty, K S Chudamani, “A Comparative Study of Google and Yahoo Web Resources on the Search term ‘Physics India’”, International CALIBER-2008.
[13] Dr. Khanna Samrat Vivekanand Omprakash, “CONCEPT OF SEARCH ENGINE OPTIMIZATION IN WEB SEARCH ENGINE”, International Journal of Advanced Engineering Research and Studies, E-ISSN 2249–8974.
[14] Inderjeet Singh Oberoi, Mridul Chopra, “Web Search Engines - A Comparative Study”.
[15] Monica Peshave, “HOW SEARCH ENGINES WORK AND A WEB CRAWLER APPLICATION”, University of Illinois at Springfield, IL 62703.
[16] Krishan Kant Lavania et al., “Google: A Case Study (Web Searching and Crawling)”, International Journal of Computer Theory and Engineering, Vol. 5, No. 2, April 2013.
[17] Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Computer Science Department, Stanford University, Stanford, CA 94305, US.

[[1]] http://www.webopedia.com/TERM/S/search_engine.html
[[2]] http://www.rba.co.uk/search/compare.shtml
[[3]] http://www.thesearchguru.com/search-stats.asp
[[4]] http://searchenginewatch.com/article/2067276/Searches-Per-Day
[[5]] http://www.google.com/about/company/