ICCED WEB 30-Rev

This document proposes a smart search engine on web browsers that can generate summaries of search results. It combines current search engine algorithms with text summarization of the top 5 web pages. The smart search engine aims to present concise yet meaningful information to the user, in line with the goals of Web 3.0. An evaluation of the approach showed an average accuracy of 82% in summarizing results from keyword searches.

Uploaded by

Risma sari hidayat


WEB 3.0 Concept: Smart Search Engine on Web Browser
1st Adhitia Erfina                      2nd Arny Lattu
Department of Information System        Department of Information System
Nusa Putra University                   Nusa Putra University
Sukabumi, Indonesia                     Sukabumi, Indonesia
[email protected]                       [email protected]

3rd Metodius Gulo                       4th Rifqi Muhammad Alfatah
Department of Information System        Department of Information System
Nusa Putra University                   Nusa Putra University
Sukabumi, Indonesia                     Sukabumi, Indonesia
[email protected]                       [email protected]

Abstract: The Society 5.0 era that we are experiencing today makes information a commodity for everyone. This situation requires receiving, disseminating, and processing data in real time and very quickly. As technology grows more sophisticated, people increasingly use technology (such as the internet, cell phones, and computers) rather than conventional sources (such as newspapers or magazines) to search for information. This has made the web one of the most widely used portals for finding information. Since it was first launched in 1989 by Tim Berners-Lee, the web has undergone many evolutions and is now approaching WEB 3.0. WEB 3.0 is a concept derived from the Intelligent Web. It is also called the Semantic Web, which helps users obtain more meaningful information. The result of this study is a Smart Search Engine on Web Browser that generates an automatically processed summary of the top 5 web pages returned for the keywords entered by the user. Testing conducted with three different methods resulted in an average accuracy rate of 82%.

Keywords: Smart Search Engine, WEB 3.0, Search Engine Pagerank, Automatic Text Summarization.

I. INTRODUCTION
The Society 5.0 era that we are experiencing today makes information a commodity for everyone. This situation requires receiving, disseminating, and processing data in real time and very quickly. People get information from various sources such as social media, forums, news, newspapers, etc. As technology grows more sophisticated, people increasingly use technology (such as the internet, cell phones, and computers) rather than conventional sources (such as newspapers or magazines) to search for information. This has made the web one of the most widely used portals for finding information.
Since it was first launched in 1989 by Tim Berners-Lee, the web has undergone many evolutions [1]. In the WEB 1.0 era, web pages were created to display information statically; users could only view and read the information presented, without being able to interact with the web page. WEB 2.0 is the era in which users can interact with web pages: they can not only read and view web pages but also write to them, which makes communication two-way [2]. WEB 3.0 is a concept derived from the Intelligent Web. It is also called the Semantic Web, which helps users obtain more meaningful information. The idea of WEB 3.0 is that artificial intelligence helps gather information from various relevant sources on the internet and present it in a shorter but still meaningful form.
Search engines available in current web browsers (such as Google, Bing, MSN, etc.) can accommodate user searches and provide the best results based on the source (web page), relevance, and the amount of traffic from keywords to the web page [3]. Unfortunately, current search engines can only display, in sequence, the web pages that are relevant to the keywords. They cannot yet present the essence of the search results as concise and complete information that the user can readily absorb.
This research focuses on improving the performance of existing search engines in browsers to achieve the intended WEB 3.0 concept, namely by combining current search engine algorithms with a text summarization method applied to the top 5 web pages of the search results. Text summarization here refers to the branch of text mining that automatically produces a summary containing meaningful sentences and including all relevant, necessary information from the original document [4]. The result of this study is a Smart Search Engine on Web Browser that generates an automatically processed summary of the top 5 web pages returned for the keywords entered by the user.

II. RELATED WORK
Several related studies serve as references for this research, covering search engine algorithms, search engine page rank, text summarization, and the semantic web. The search engine algorithm used as the reference in this study is that of the Google search engine. Google is a powerful search engine that almost everyone uses as the favourite search engine in their web browser [5]. The way search engines work on web browsers is classified into three stages, namely:
Crawling: The first stage involves Google's bots (the infamous "spiders") crawling the web and looking for new or updated web pages. In general, the more links a page has to it, the easier it is for Google to locate it. Pages need to be crawled and indexed to rank.
Indexing: Google's next step is to analyze these URLs and figure out what each page is about. It does this by looking closely at the content, images, and other media files on the page and then stores this information in a massive database known as the Google index.
Serving: The final step is to determine which pages are the most relevant and helpful for a particular search query. This is known as the ranking stage, and this is where the Google search algorithm comes in [6].
Search engines present search results to users according to predetermined criteria. The result is called PageRank. Ao-Jan Su et al. describe 17 ranking features that affect the ranking of a web page. These features are described in Table 1.

Table 1. Ranking Features [7]

Group     Feature   Detail
Page      PG        PageRank score
Page      AGE       age of the web page in a search engine's index
URL       HOST      keyword appears in the hostname
URL       PATH      keyword in the path segment of the URL
Domain    D_SIZE    size of the web site's domain
Domain    D_AGE     age of the web site's domain
Header    TITLE     keyword in the title tag of the HTML header
Header    M_KEY     keyword in the meta-keyword tag
Header    M_DES     keyword in the meta-description tag
Body      DENS      keyword density
Heading   H1        keyword in h1 tag
Heading   H2        keyword in h2 tag
Heading   H3        keyword in h3 tag
Heading   H4        keyword in h4 tag
Heading   H5        keyword in h5 tag
Link      ANCH      keyword in anchor text
Link      IMG       keyword in image tag

The most crucial step in this research is to make an accurate summary of the top 5 web pages generated by search engines. Adhika et al. mentioned that, in general, there are six techniques or approaches to text summarization, namely fuzzy-based, machine learning, statistics, graph-based, topic modelling, and rule-based. The most widely used approach in text summarization is machine learning because it is a modern technique. Machine learning operates automatically and learns to improve from experience without being explicitly programmed [8].
The latest research with the machine learning approach is [9], which hybridizes the Maximal Marginal Importance (MMI) method, particle swarm optimization (PSO), and another technique, namely the fuzzy logic method. The input of that research is a single document, and the summary results are extractive summaries. MMI is used to produce summaries that excel in terms of diversity by determining the most important sentences. The most important sentences are determined by selecting similar sentences and choosing diverse sentences extracted from the original text [10]. PSO is used to select the most important and least important features, and fuzzy logic is used to help PSO handle risk and uncertainty values, so that the tolerance value can be changed flexibly.
Meanwhile, [11], in research entitled Smart Search Engine: A Design and Test of Intelligent Search of News with Classification, focuses on improving the ability of search engines to understand the intent of the keywords entered by the user. That research is similar to research [12] on search engines for the semantic web. However, this study uses a new approach that combines the search engine page rank method with automatic text summarization to produce a summary of information from the top 5 search results web pages.

III. RESEARCH METHODOLOGY
This study uses a new approach that combines the search engine page rank method with automatic text summarization to produce a summary of information from the top 5 search results web pages. The research flow can be seen in Figure 1.

Figure 1. Proposed Method (components: Search Engine Page Rank, Automatic Text Summarization)
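
As a rough illustration of the flow in Figure 1, the sketch below takes pages that already carry a ranking score, keeps the top 5, and builds an extractive summary from them. It is a minimal sketch under stated assumptions: the page dictionaries, the `score` field, and the word-frequency sentence scoring are hypothetical stand-ins chosen for brevity, and they do not reproduce the MMI, swarm, or fuzzy methods used in this study.

```python
import re
from collections import Counter


def top_pages(ranked_pages, k=5):
    """Return the text of the k pages with the highest (assumed) ranking score."""
    ordered = sorted(ranked_pages, key=lambda page: page["score"], reverse=True)
    return [page["text"] for page in ordered[:k]]


def split_sentences(text):
    """Naive sentence segmentation on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercased alphanumeric tokens of a sentence."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(texts, max_sentences=3):
    """Pick the sentences whose words are most frequent across all texts.

    Assumption: average word frequency is used as a simple stand-in
    for the sentence scoring methods described in the paper.
    """
    sentences = []
    for text in texts:
        for sentence in split_sentences(text):
            if sentence not in sentences:  # drop verbatim duplicates
                sentences.append(sentence)

    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        tokens = words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    best = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in best)
```

A call such as `summarize(top_pages(pages))` would then return one short text built from the five best-ranked pages. Keeping ranking and summarization as separate functions mirrors the two components of the proposed method.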
Search Engine Pagerank
The search engine page rank stages begin with identifying the keywords entered by the user. This research also tries to make search engines understand the meaning of the keywords entered by the user, including the meaning implied in them. Figure 2 shows the framework of the search.

Figure 2. Search Framework

The entered keywords must first go through the preprocessing stage so that the mapping results improve. After the keywords are entered, the next stage is crawling. Crawling is the stage in which search engines search for data on the web. The search is done by browsing every data storage medium on the internet according to the keywords entered. Crawlers usually identify keywords very easily from the link, but they also identify them in the Title, Body, and Tags.
The next stage is indexing, in which search engines match the keywords entered with the search results obtained from crawling. This also includes matching images, media, content, or files that may be related to the keywords.
Serving is the final stage of the page rank. This stage presents the results as a sequence of web pages ordered by keyword relevance. Several factors that affect web page ranking are [13]:
1. Website's backlinks count
2. Website's age
3. Website's content and data relevance
4. Website's traffic
5. Keyword occurrence on website
6. Website's domain authority

Automatic Text Summarization
The search results from the search engine page rank become the essential documents for automatic text summarization. Each of the top 5 web pages goes through a preprocessing stage (segmentation, stop word removal, and stemming) so that the text can be processed properly by machine learning. The initial stage in preprocessing is segmentation. Segmentation separates a text document into chapters. Each chapter is then separated into paragraphs (paragraph segmentation), followed by sentence segmentation and word segmentation.
Stop word removal ignores certain words during processing. These ignored words are stored in the stop word list. Stop words are typically words with a high frequency of occurrence, for example "and", "or", "but", "will", and others. There are no definite rules for determining the stop words to be used; the list can be adjusted to the case at hand. For example, with respect to language, the Hindi stop word list [14] is very different from those of English or other languages. The last commonly used preprocessing step is stemming. Stemming changes affixed words into their base forms by removing the affixes attached to the base word. For example, "to be given" becomes "to give". The stemming process differs by language. For example, affixes in Indonesian are more complex than affixes in English, because affixes in Indonesian consist of prefixes, infixes, suffixes, repeated forms, and confixes (a combination of prefixes and suffixes).
The next stage is feature extraction, which in this study uses the MMI Diversity based Method, the MMI Diversity based Swarm Method, and the Fuzzy Swarm based Method. Fuzzy logic is well suited to automatic text summarization because it can determine values from unclear data sources [15]. These three methods are used because each has its own distinct characteristics. The value of the central sentence under Maximal Marginal Importance (MMI) is obtained with the calculation below.

[Equation not preserved in the source text]

Meanwhile, the value of the central sentence under the Fuzzy Swarm based Method is obtained with the calculation below:

[Equation not preserved in the source text]
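
The preprocessing stages described for automatic text summarization (segmentation, stop word removal, and stemming) can be sketched as follows. This is a minimal illustration under stated assumptions: the `STOP_WORDS` set and the `SUFFIXES` tuple are small hypothetical examples for English, and the suffix stripping is a deliberately naive stand-in for a real stemmer. Indonesian text would need a language-specific stop word list and affix rules.

```python
import re

# Hypothetical, deliberately tiny stop word list (assumption for illustration).
STOP_WORDS = {"and", "or", "but", "will", "the", "a", "an", "of", "to", "in", "is"}

# Hypothetical suffix list for a naive English stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def segment_sentences(text):
    """Sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def segment_words(sentence):
    """Word segmentation: lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    return [
        [stem(t) for t in remove_stop_words(segment_words(sentence))]
        for sentence in segment_sentences(text)
    ]
```

For example, `preprocess("Chelsea played and lost. City will win the matches.")` yields one token list per sentence, with "and", "will", and "the" removed and "played" and "matches" reduced to "play" and "match".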
The results obtained must first go through the Sentence Selector so that the sentences presented to the user are optimal.

IV. RESULTS AND DISCUSSION
The search results from the search engine page rank become the essential documents for automatic text summarization. Generally, the current Google search engine displays search results as shown in Figure 3.

Figure 3. Browsing Result

Figure 3 shows the search results for the keyword "Man. City Chelsea". The results provided by Google are web pages that are relevant to the keywords entered by the user but come from various sources. Example:

Table 2. Browsing Result

Webpage                      Webpage Content
A                            Manchester City is going head to head with Chelsea. Manchester City played against Chelsea in 2 matches this season. Currently, Manchester City ranks 1st, while Chelsea holds third position.
B                            In the 22nd week of the Premier League, the highest caste of the English Premier League 2021-2022, the Manchester City vs Chelsea match ended with a score of 1-0 for Pep Guardiola's team to win.
C                            Gabriel Jesus ended Chelsea's unbeaten start to the season as Manchester City won 1-0 at Stamford Bridge.
D                            Week 22 of the Premier League presents a fierce battle between Manchester City and Chelsea at the Etihad Stadium. The Citizens, who played as hosts, came out as winners with a score of 1-0.
E                            Chelsea's goal shook in the 70th minute after Kevin De Bruyne's shot from outside the penalty box was denied by Kepa Arrizabalaga. 1-0 advantage lasted until the referee blew the long whistle signalling the end of the match.
Smart Search Engine Result   Manchester City played against Chelsea in 2 matches this season. The Manchester City vs Chelsea match in the 22nd week of the Premier League, the highest caste of the English Premier League 2021-2022. Manchester City won 1-0 at Stamford Bridge. Chelsea's goal shook in the 70th minute after Kevin De Bruyne's shot from outside the penalty box was denied by Kepa Arrizabalaga.

The results obtained from the Smart Search Engine are a combination of information from the top 5 web pages of search results on the internet, summarized in the form of text. The three feature selection methods used can also be measured based on Precision, Recall, and F-Measure. The three methods produce an average accuracy of 82%.

Figure 4. Average Result

V. CONCLUSION
Through the model created, this research succeeded in combining Search Engine Pagerank with Automatic Text Summarization to produce a Smart Search Engine, as a form of the WEB 3.0 concept, that produces clear and concise information for users from the top 5 web pages of search results. The average accuracy of the three combined methods shows good results, at 82%. Because this research focuses only on text summarization, where the data that can be analyzed is limited to text, future research will focus on other aspects that support WEB 3.0.
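
The Precision, Recall, and F-Measure named in the results discussion can be computed for an extractive summary as in the sketch below. This is a minimal illustration under an assumption the paper does not state: the system summary and a hypothetical reference summary are compared as sets of sentences, and a sentence counts as correct only if it matches exactly.

```python
def precision_recall_f(system_sentences, reference_sentences):
    """Sentence-level precision, recall, and F-measure for an extractive summary.

    Assumption: both summaries are given as collections of sentences and
    a sentence counts as a hit only on an exact match.
    """
    system = set(system_sentences)
    reference = set(reference_sentences)
    overlap = len(system & reference)

    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With four system sentences of which two appear in a three-sentence reference, the function returns a precision of 0.5 and a recall of 2/3. Averaging such scores over several queries and methods is one plausible way to arrive at a single summary figure, although the paper does not describe how its 82% average was computed.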
VI. REFERENCES
[1] Sana Aslam and Sharad Kumar Sonkar, “Journey of Web 2.0 to Web 3.0,” Empowering Libraries with Emerging Technologies for Common Sustainable Future, Babasaheb Bhimrao Ambedkar University, Lucknow
[2] Zia, M.W., & Mr., F.A. (2019). “Possible uses of web 3.0 in websites of Libraries of Academic Institutions of Pakistan.”
[3] Judit Bar-Ilan, “Search Engine Results over Time - A Case Study on Search Engine Stability,” vol. 2, no. 3, 1998.
[4] Wafaa S. El-Kassas, Cherif R. Salama, Ahmed A. Rafea, Hoda K. Mohamed, “Automatic text summarization: A comprehensive survey,” Expert Systems with Applications, vol. 165, 2021, ISSN 0957-4174.
[5] Weber, Ingmar. “An Analysis of Factors Used in Search Engine Ranking.” Adversarial Information Retrieval … (2005): n. pag.
[6] Gandour, A. and Regolini, A. (2011), “Web site search engine optimization: a case study of Fragfornet,” Library Hi Tech News, Vol. 28 No. 6, pp. 6-13. https://doi.org/10.1108/07419051111173874
[7] Ao-Jan Su, Y. Charlie Hu, Aleksandar Kuzmanovic, Cheng-Kok Koh, “How to Improve Your Search Engine Ranking: Myths and Reality,” ACM Transactions on the Web, Vol. 8, No. 2, Article 8, March 2014.
[8] Adhika Pramita Widyassari, Supriadi Rustad, Guruh Fajar Shidik, Edi Noersasongko, Abdul Syukur, Affandy Affandy, De Rosal Ignatius Moses Setiadi, “Review of automatic text summarization techniques & methods,” Journal of King Saud University – Computer and Information Sciences, vol. 34, no. 4, 2022.
[9] Binwahlan, M. S., Salim, N., & Suanmali, L. (2009b). Swarm based features selection for text summarization. IJCSNS International Journal of Computer Science and Network Security, 9(1), 175–179.
[10] Yuangui Lei, Victoria Uren, and Enrico Motta, SemSearch: A Search Engine for the Semantic Web, Knowledge Media Institute, November 2021.
[11] Chaoyang Li, Ke Liu, “Smart Search Engine: A Design and Test of Intelligent Search of News with Classification,” Dalarna University, 2022.
[12] Binwahlan, M. S., Salim, N., & Suanmali, L. (2009e). “Integrating of the diversity and swarm based methods for text summarization.” In The 5th postgraduate annual research seminar (PARS), 17–19 June, Johor, Malaysia (pp. 523–527).
[13] Ercan, M. F. (2008). A performance comparison of PSO and GA in scheduling hybrid flow-shops with multiprocessor tasks. In Proceedings of the 2008 ACM symposium on applied computing, SAC’08, 16–20 March, Fortaleza, Ceará, Brazil (pp. 1767–1771).
[14] Liu, C., Tseng, C., Chen, M., 2015a. IncreSTS: Towards real-time incremental short text summarization on comment streams from social network services. IEEE Trans. Knowl. Data Eng. 60, 1–14. https://doi.org/10.1109/TKDE.2015.2405553.
[15] Erfina, A., and Y. H. Putra. “Irony Sentence Detection Techniques Using Fuzzy Historical Classifier.” In IOP Conference Series: Materials Science and Engineering, vol. 662, no. 6, p. 062004. IOP Publishing, 2019.