Web Page Classification Based On Schema - Org Collection

The document discusses classifying web pages based on their use of schemas from the Schema.org collection. It focuses on using the Recipe schema to identify web pages containing recipes and extract additional semantic information to improve search results. The approach analyzes web page source code to identify relevant schemas and classify pages into genres and microgenres to provide more detailed search options.

Web Page Classification based on Schema.org Collection

Jonáš Krutil, Miloš Kudělka and Václav Snášel

Department of Computer Science, FEECS
VŠB Technical University of Ostrava
17. listopadu 15, 708 33 Ostrava Poruba
[email protected], [email protected], [email protected]

978-1-4673-4794-5/12/$31.00 © 2012 IEEE

Abstract—The internet is a library holding a huge amount of information, and its content needs to be categorized by means of web page classification. Classifying web page content can improve the quality and accuracy of web search. Unfortunately, the high dimensionality of web page datasets makes classification difficult. An automatic method for web page classification can simplify the whole process and help the search engine return more relevant results. Today, information on the web is generally structured and formatted informally. This absence of semantics has motivated formal methods for adding semantic information to web pages. The search engines Bing, Google, Yahoo! and Yandex formed the Schema.org collection of schemas to support web page semantics and improve their search results. This paper explores the use of formal source code structure for classifying a large collection of web content. It focuses on using the Schema.org collection of schemas to classify web pages and categorize them unambiguously.

Keywords-Collection of schemas Schema.org; Web Page Classification; Genres; Microgenres; Microformats

I. INTRODUCTION

The Internet can be considered the world’s largest information library. We can think of the individual web pages as the books in this library. Nobody classifies them by topic and sorts them onto shelves. In this situation it is not easy to get and stay oriented in the amount of unstructured information.

Automatic classification of web pages is an effective way to deal with the difficulty of retrieving relevant information from the Internet and to help users orient themselves better.

Our goal is to offer users a more effective way of orienting themselves in the amount of web information and to give them more relevant search results. We want to assign new auxiliary labels to each web page that uses the Schema.org collection: genres such as “Movie”, “Person”, “Recipe” or “Blog”, and microgenres such as “Price information” or “Something to read”. These labels clearly identify web page genres and microgenres. We can then offer our users more detailed search options, which can lead to much more relevant search results.

For example, a user may be searching for a lasagna recipe with at least one user review and a five-star rating. Current search engines do not allow this advanced search query. The situation is similar for any product, for instance when we are searching only for web pages that contain a product specification and an offer to buy the product. There is no option to set these criteria. With more semantics on the web, however, search engines can support this in the future.

Existing web pages generally lack the semantics needed to provide users with such advanced search options. To provide more detailed and relevant information about web content, the schema.org collection of microdata schemas was created in 2011. It extends the family of existing formats, namely RDF [1], Microformats and microdata [2], with a view to gradually replacing them in the future [3]. This collection gives us a uniform and formal set of rules and recommendations that allow us to add significantly better semantic information to the web page source code.

Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. These global web search engines, which oversee the schema.org project, have committed to supporting the future of microdata. Their search algorithms are gradually being extended to support microdata schemas. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to clearly understand the information on web pages and to provide richer search results, making it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of this structure.

Currently the most widely used and supported schema is the Recipe schema. Our work focuses on the unique identification of specific genres and microgenres [4] using the microdata schema collection, on the extraction of semantic information, and on classifying web pages. From the content of the analyzed web pages we can determine the amount of additional information that can be used to expand web search options, make search results more accurate, and more.

A. Related work

Apart from the schema.org collection of microdata schemas, there are several other formally defined sets of rules for the semantic web. These semantic web languages are presented in [8]. Systems for web information extraction, which transform web pages into program-friendly structures such as a relational database, are also important to us. This approach analyzes the structure and the templates of the web page. A survey of the major web information extraction approaches is presented in [9].
The concept of Genres and MicroGenres is introduced in [4]; they are described as ambiguous categories without fixed boundaries that are formed mainly by sets of conventions. The authors of [6] analyze web pages using web patterns and introduce a method for semantic analysis of web pages.

Automatic web page classification in a dynamic and hierarchical way is presented in [12]. It relates to text learning and document classification. Text learning is a machine learning method on textual data that combines information retrieval techniques and is used as a tool to extract the content of textual data [10]. The survey “Web page classification: Features and Algorithms” [13] examines the space of Web classification approaches to find new areas for research and collects the latest practices to inform future classifier implementations. It carefully reviews the Web-specific features and algorithms that have been explored and found useful for Web page classification.

The importance of HTML structural elements and metadata in automated subject classification is shown in [11]. The aim of that paper was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification.

B. Organization

This paper is organized into the following sections: Section II describes our research and approach. Section III introduces the Schema.org collection of schemas, our algorithm, and genres. Finally, in Section IV we draw conclusions and outline future research.

2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN)

II. OUR RESEARCH

In contrast to the above approaches, the information located in the source code in tags and attributes is essential to us, and our approach cannot be applied to plain text only. Our algorithm uses specific semantic attributes and the information they mark up. It has been demonstrated that using information derived from tags can boost classifier performance [11].

People who search the web usually have a clear conception of what they are searching for and know what the ideal search result looks like. In our research we search for recipes, and for our experiments we chose the Recipe schema, the most widely used schema and one supported by the web search engines mentioned in Section I.

Our aim was to analyze the source code of a sufficient number of web pages that publish articles about cooking. We want to clearly identify the Recipe schema and, on the basis of our analysis, obtain additional semantic information about a particular web page. We then assign these web pages to one or more predefined category labels (a task also known as web page categorization or classification) in order to increase the precision of web search. The labels can be the genres and microgenres given as examples in Section I, or any others. We manually went through hundreds of international and domestic recipe web sites. We were looking for pages that contain Recipe schema information, allow the recipe to be rated with stars, and offer the option to write a user review. The results of this manual crawling were compared with the results of our algorithm, which uses the Schema.org microdata collection.

A demonstrative recipe web page from our research is shown in Fig. 1. The Schema.org schemas are marked as blocks. There are Recipe and Review schemas, which include the CreativeWork and Thing schemas with their own properties from the collection. The main genre of the web page is Recipe, and we can also see the microgenres “Something to read” and “Rating”.

Fig. 1: Recipe by chow.com described by Schema.org

III. MICRODATA COLLECTION SCHEMA.ORG

The goal of Schema.org is to recover the unambiguous meaning of information that is lost during the transfer from the database (where information is clearly divided and described in tables, columns and rows) to the application presentation layer. The collection of schemas is used to restore that lost information in the source code of web pages, and it offers the possibility of extending the semantic meaning even further. Major web search engines create and support a common vocabulary for structured data markup on web pages.

With the schema.org collection, site owners and developers can learn about structured data and improve how their sites appear in major search engines. Web page owners can improve how their sites appear in search results not only on Google, but also on Bing, Yahoo! and potentially other search engines in the future.

Schema.org also introduces schemas for more than a hundred new categories, including movies, music, organizations, TV shows, products, places and more. As webmasters add this semantic markup to their sites, search engines can develop richer search experiences. Search engines have been working independently to support structured markup for a few years now. Much of the vocabulary on schema.org was inspired by earlier formats such as Microformats, FOAF, GoodRelations, hCard and OpenCyc.

When we talk about semantics, we can imagine semantically unstructured data as plain text with some HTML tags:

<h1>Mom’s World Famous Banana Bread</h1>
By John Smith
May 8, 2009
<img src="bananabread.jpg" />
This classic banana bread
<strong>Nutrition facts:</strong>
240 calories, 9 grams fat
<strong>Ingredients:</strong>
- 3 or 4 ripe bananas, smashed
- 1 egg
...

People can read this information and understand the meaning of its individual parts, but a search engine crawler will not understand the meaning so well. The information will probably be stored in the search engine database as plain text in some general table. At most, some words may be given more weight because of the importance of their HTML tag. If we assign schemas to the data above, the result will be much more readable for the search engine crawler. The following is an example of how to embed information about a recipe, and the structure of that information, into a web page. To mark up the data, the itemtype attribute is used together with the URL of the schema. The itemscope attribute defines the scope of the itemtype. The itemprop attribute names the property that the current element represents. Nested within the Recipe schema is a schema for nutrition information.

<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Mom’s World Famous Banana Bread</h1>
  By <span itemprop="author">John Smith</span>,
  <meta itemprop="datePublished" content="2009-05-08">May 8, 2009
  <img itemprop="image" src="bananabread.jpg" />
  <span itemprop="description">This classic banana bread</span>
  <div itemprop="nutrition" itemscope itemtype="http://schema.org/NutritionInformation">
    <strong>Nutrition facts:</strong>
    <span itemprop="calories">240 calories</span>,
    <span itemprop="fatContent">9 grams fat</span>
  </div>
  <strong>Ingredients:</strong>
  - <span itemprop="ingredients">3 or 4 ripe bananas, smashed</span>
  - <span itemprop="ingredients">1 egg</span>
  ...
</div>

The information described using microdata is much better structured semantically [5]. We can see that the text belongs to the Recipe genre and has its own name, author, publication date and recipe description. The individual ingredients and the nutritional information are also clearly listed. Now we can store these semantic results in recipe tables with the appropriate properties.

A. Algorithm

The algorithm parses the source code of web pages, which is divided into several blocks by the itemscope and itemtype attributes. This procedure tells us clearly whether or not the Recipe schema is present on the page (Fig. 2). Because we are searching for the Recipe schema, the algorithm first looks for an itemscope with the Recipe value and then, inside the returned data blocks, looks for additional information (aggregateRating, Review) using the values of the itemprop attribute.

<div itemscope itemtype="http://schema.org/Recipe">
  <div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
    <span itemprop="ratingValue">4</span> stars - based
    on <span itemprop="reviewCount">250</span> reviews
  </div>
  <div itemprop="review" itemscope itemtype="http://schema.org/Review">
    <span itemprop="name">Great recipe</span> - by
    <span itemprop="author">Ellie</span>,
    <div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
      <meta itemprop="worstRating" content="1">
      <span itemprop="ratingValue">4</span>/
      <span itemprop="bestRating">5</span>stars
    </div>
    <span itemprop="description">Delicious!</span>
  </div>
</div>

Fig. 2: Algorithm process

This procedure clearly identifies web pages that contain the microdata schema with the recipe, ratings and user reviews.

It should be noted that the algorithm is language independent, while the Schema.org collection itself is written in English.

If the schema is written according to the specification, detection by our algorithm achieves a 100% success rate with an average duration of 12 ms per page. The algorithm can easily be extended to other schemas of the collection, and we are able to identify and extract all the information that is described within the collection. This makes it easy to extract the specific information within a web page that is important to us. We can also obtain information about the web page Genre and its set of MicroGenres. This information can be used for advanced search features, personalization of results [7], or other applications working with structured data.

B. Genres & MicroGenres

Web pages that use the collection of schemas are logically divided into blocks that clearly describe the genres and some microgenres [4] of every website. As a microgenre example, we can obtain “information about the price” for a particular product. This microgenre is useful if you are considering buying a product and your search results contain only product reviews and forums. We do not want to include such web pages, because they offer no way to buy the product. We can offer the user the ability to view only web pages that contain “information about the price”; other web pages are filtered out in favour of the relevant results.

The actual genre of a web page [14] can then be determined by the name of the schema from the collection. In this paper we use the term Genre for the information about what kind of document a page is. The content of a web page can be about cakes, meat or soups, but we know it is a “Recipe web page”.

We can obtain the individual labels “Movie”, “Person”, “Recipe” and others, as mentioned above. Searching can become much more comfortable for users.

We also mentioned that the Recipe genre is currently the most widely used schema. Its implementation by Google and Yahoo! is shown in Fig. 3 and Fig. 4. It is clear that such a search result is much more informative than a “standard” search result. Thanks to well-structured semantic source code, Google.com shows the recipe photo, star rating and a count of user reviews. Yahoo! provides the photo and rating, but also the total time and ingredients of the recipe.

Fig. 3: Recipe schema by Google.com

Fig. 4: Recipe schema by Yahoo.com
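The detection step of Section III-A (find an itemscope block of type Recipe, then look inside it for rating and review properties) can be illustrated with a short Python sketch. This is not the authors' implementation. The names MicrodataScanner, schema_name and classify_recipe_page are hypothetical, only the standard library html.parser module is used, and the sketch assumes well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def schema_name(itemtype):
    """Return the last path segment of an itemtype URL, e.g. 'Recipe'."""
    return itemtype.strip().rstrip("/").rsplit("/", 1)[-1]


class MicrodataScanner(HTMLParser):
    """Records every itemscope block with its schema name and direct itemprop names."""

    def __init__(self):
        super().__init__()
        self.items = []    # all items found, in document order
        self._stack = []   # one entry per open element: the item it opened, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        open_items = [item for item in self._stack if item is not None]
        # An itemprop belongs to the innermost item that is still open.
        if open_items:
            open_items[-1]["props"].update((attrs.get("itemprop") or "").split())
        item = None
        if "itemscope" in attrs:
            item = {"schema": schema_name(attrs.get("itemtype") or ""),
                    "props": set()}
            self.items.append(item)
        if tag not in VOID_TAGS:
            self._stack.append(item)

    def handle_endtag(self, tag):
        # Assumption: well-formed markup, so each end tag closes the last open element.
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()


def classify_recipe_page(html):
    """Return None if no Recipe block is present, else the labels found inside it."""
    scanner = MicrodataScanner()
    scanner.feed(html)
    scanner.close()
    recipes = [item for item in scanner.items if item["schema"] == "Recipe"]
    if not recipes:
        return None
    props = set().union(*(item["props"] for item in recipes))
    return {
        "genre": "Recipe",
        "has_rating": "aggregateRating" in props,
        "has_review": "review" in props,
    }


SAMPLE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Banana Bread</h1>
  <div itemprop="aggregateRating" itemscope
       itemtype="http://schema.org/AggregateRating">
    <span itemprop="ratingValue">4</span> stars
  </div>
</div>
"""
print(classify_recipe_page(SAMPLE))
```

The stack keeps track of which item is currently open, so a property such as ratingValue is attributed to the nested AggregateRating block and not to the enclosing Recipe. Real pages with unclosed tags would need a more tolerant parser than this sketch assumes.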

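How genre and microgenre labels could support the filtered searches described in Sections I and III-B (a recipe with a rating and a review, or only product pages with price information) is sketched below in Python. Everything here is an assumption made for illustration: the paper names the labels but does not publish a mapping from properties to microgenres, so MICROGENRE_BY_PROPERTY, the “Review” label, the PageLabels record, the functions label_page and search, and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from itemprop names to microgenre labels.
MICROGENRE_BY_PROPERTY = {
    "aggregateRating": "Rating",
    "review": "Review",
    "offers": "Price information",
}


@dataclass
class PageLabels:
    url: str
    genres: set = field(default_factory=set)
    microgenres: set = field(default_factory=set)


def label_page(url, top_level_items):
    """Label a page from (itemtype URL, itemprop names) pairs of its top-level items."""
    labels = PageLabels(url)
    for itemtype, props in top_level_items:
        # The genre is taken from the schema name, e.g. 'Recipe' or 'Product'.
        name = itemtype.rstrip("/").rsplit("/", 1)[-1]
        if name:
            labels.genres.add(name)
        for prop in props:
            if prop in MICROGENRE_BY_PROPERTY:
                labels.microgenres.add(MICROGENRE_BY_PROPERTY[prop])
    return labels


def search(pages, genre, required_microgenres=()):
    """Return the URLs of pages with the given genre and all required microgenres."""
    required = set(required_microgenres)
    return [page.url for page in pages
            if genre in page.genres and required <= page.microgenres]


pages = [
    label_page("http://example.com/lasagna",
               [("http://schema.org/Recipe", ["name", "aggregateRating", "review"])]),
    label_page("http://example.com/soup",
               [("http://schema.org/Recipe", ["name"])]),
    label_page("http://example.com/phone",
               [("http://schema.org/Product", ["name", "offers"])]),
]

# Recipes that carry both a rating and at least one review.
print(search(pages, "Recipe", ["Rating", "Review"]))
# Product pages that expose price information.
print(search(pages, "Product", ["Price information"]))
```

Pages without the required labels are simply left out of the result list, which mirrors the filtering "in favour of the relevant results" described in the text.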
IV. CONCLUSION

Because it depends on schemas and on the semantic structure of the source code, our approach is not universal. Moreover, schemas can currently be found in only a limited number of web pages worldwide. However, it is already clear that web search engines will push web developers to write their source code with microdata. Web developers should want to create more semantic source code, because their websites will be more visible in search engine results. We think that the schema.org collection of schemas is the future of web semantics.

In the near future, we want to focus on web pages that contain no schema information although they should: web pages that belong to one of the schemas of the Schema.org collection but have no microdata in their source code. We will analyze these web pages and try to find a way to recommend adding a schema to the source code and thus improve their semantic meaning.

We also want to take a deeper look at web page Genres and MicroGenres and achieve better classification through them.

ACKNOWLEDGMENT

This paper was supported by the IT4Innovations Centre of Excellence project, reg. no. CZ.1.05/1.1.00/02.0070, supported by the Operational Programme ‘Research and Development for Innovations’ funded by the Structural Funds of the European Union and the state budget of the Czech Republic; by the SoftComp: Development of human resources in research and development of innovative softcomputing methods and their practical use, reg. no. CZ.1.07/2.3.00/20.0072, funded by the Operational Programme Education for Competitiveness, co-financed by ESF and the state budget of the Czech Republic; and by SGS, VSB-Technical University of Ostrava, under grant no. SP2012/58.

REFERENCES

[1] D. Brickley and R. V. Guha, RDF Vocabulary Description Language 1.0: RDF Schema. The World Wide Web Consortium (W3C), 2004.

[2] I. Hickson and R. V. Guha, HTML Microdata. The World Wide Web Consortium (W3C), 2012.

[3] Schema.org FAQ. Google Inc., 2012.

[4] M. Kudelka, V. Snasel, Z. Horak and A. Abraham, MicroGenre: Building block of web pages. Networked Digital Technologies, 2009.

[5] S. Bradley, Why (And How) You Should Use HTML5 Microdata. Van SEO Design, 2011.

[6] M. Kudelka, V. Snasel, O. Lehecka and E. El-Qawasmeh, Semantic Analysis of Web Pages Using Web Patterns. Web Intelligence, 2006.

[7] M. Eirinaki and M. Vazirgiannis, Web mining for web personalization. ACM Transactions on Internet Technology (TOIT), Vol. 3 Issue 1, 2003.

[8] J. Bailey, F. Bry, T. Furche and S. Schaffert, Web and Semantic Web Query Languages: A Survey. Reasoning Web Summer School, Springer-Verlag, LNCS 3564, 2005.

[9] C.-H. Chang, M. Kayed, M. R. Girgis and K. F. Shaalan, A survey of Web information extraction systems. IEEE Transactions on Knowledge and Data Engineering, 18(10):1411-1428, 2006.

[10] D. Koller and M. Sahami, Hierarchically classifying documents using very few words. Proceedings of the 14th International Conference on Machine Learning ECML98, 1998.

[11] K. Golub and A. Ardo, Importance of HTML Structural Elements and Metadata in Automated Subject Classification. ECDL 2005, LNCS 3652, Springer-Verlag Berlin Heidelberg, 2005.

[12] X. Peng and B. Choi, Automatic Web Page Classification in a Dynamic and Hierarchical Way. Center for Entrepreneurship and Information Technology (CEnIT), Louisiana Tech University, 2002.

[13] X. Qi and B. D. Davison, Web Page Classification: Features and Algorithms. ACM Computing Surveys, Vol. 41, No. 2, Article 12, 2009.

[14] A. Finn and N. Kushmerick, Learning to classify documents according to genre. In IJCAI-03 WS on Computational Approaches to Style Analysis and Synthesis, 2003.
