The Extensible Markup Language

The document discusses the history and development of XML, including its origins from SGML and its creation by the W3C to enable generic SGML to be processed on the web similarly to HTML. It describes how XML allows users to define their own markup languages and has seen widespread success due to enabling data interchange over a standard syntax in an interconnected world.

Uploaded by

Jennet

The Extensible Markup Language (XML)

Markup is defined as extra textual syntax that can be used to describe
formatting, actions, structure information, text semantics, attributes, etc.
One example of markup is the set of formatting commands of the popular text
formatting software TeX. The Standard Generalized Markup Language (SGML), a
metalanguage for tagging text, was defined in the late seventies by Charles F.
Goldfarb and his group, building on previous work done at IBM. In 1996 the SGML
Editorial Review Board became the XML Working Group under the auspices of the
World Wide Web Consortium (W3C), chaired by Jon Bosak of Sun Microsystems and
with Dan Connolly acting as intermediary. This group developed the Extensible
Markup Language (XML), a subset of SGML whose goal is to enable generic SGML to
be served, received, and processed on the Web in the way that is now possible
with HTML (also based on SGML). XML has been designed for ease of
implementation and for interoperability with both SGML and HTML. XML, like
SGML, is not exactly a markup language; it is a metalanguage that can be used
to define specific markup languages (such as XHTML, MathML, SVG, etc.). That
means that XML allows users to define new tags and structures for their own
language.
For a number of reasons, some obvious and others that will remain a mystery,
XML has achieved remarkable success worldwide. In an interconnected and global
society, the interchange of data over a standard syntax has become a key issue,
and this is where XML fits perfectly.
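
As a small illustration of XML as a metalanguage, the sketch below parses a
document written in an invented vocabulary. The recipe, ingredient and step
tags and the quantity attribute are hypothetical names chosen for this example
and belong to no standard; the point is only that a generic XML parser can
process user-defined tags without knowing them in advance.

```python
import xml.etree.ElementTree as ET

# A document in a made-up vocabulary; none of these tag names is standard.
RECIPE_XML = """
<recipe name="Pancakes">
  <ingredient quantity="200g">flour</ingredient>
  <ingredient quantity="2">eggs</ingredient>
  <step>Mix the ingredients.</step>
  <step>Fry the batter.</step>
</recipe>
"""


def summarize(xml_text):
    """Return (recipe name, [(quantity, ingredient)], [step texts])."""
    root = ET.fromstring(xml_text)
    ingredients = [(item.get("quantity"), item.text)
                   for item in root.findall("ingredient")]
    steps = [step.text for step in root.findall("step")]
    return root.get("name"), ingredients, steps


name, ingredients, steps = summarize(RECIPE_XML)
print(name, ingredients, steps)
```

The parser needs no schema here; a DTD or XML Schema would only be required if
the vocabulary had to be validated.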

The Semantic Web


The Semantic Web is a promising initiative led by the W3C whose aim is to
provide a data model for the Web, allowing information to be understood and
processed by machines as well. The W3C's definition says: "The Semantic Web is
the representation of data on the World Wide Web. [...] It is based on the
Resource Description Framework (RDF), which integrates a variety of
applications using XML for syntax and URIs for naming." So it is clear that
this initiative relies strongly on RDF, but what is meant by "the
representation of data"? Throughout this document we have referred to the
difficulties of searching and retrieving information on the Web. One of the
main reasons is that most Web data, despite being processed by machines, can
only be understood by humans. This includes natural language text, still and
moving images, audio, etc. Earlier we discussed the difference between
information retrieval and data retrieval, saying that while data retrieval is
appropriate for databases it is not appropriate (or not sufficient) for the
Web. The reason is that information on the Web, unlike that in databases, does
not have an underlying data model. So "the representation of data" means two
things: the development of a data model for the Web, and the dissemination of
machine-understandable metadata (within the framework of that data model)
linked in some way to the Web information. Another classic definition is that
of Berners-Lee et al.: "The Semantic Web is an extension of the current web in
which information is given well-defined meaning, better enabling computers and
people to work in cooperation."
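
To make the idea of machine-understandable metadata under a common data model
more concrete, the following sketch represents a few RDF-style statements as
(subject, predicate, object) triples in Python. The http://example.org/ URIs
are placeholders, and the Dublin Core title and creator properties are used
only for illustration. The output uses the line-based N-Triples syntax rather
than RDF/XML, purely for brevity, and the escaping covers only backslashes,
double quotes and newlines.

```python
from typing import NamedTuple


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


DC = "http://purl.org/dc/elements/1.1/"
PAGE = URI("http://example.org/page.html")      # placeholder resource
AUTHOR = URI("http://example.org/people/ann")   # placeholder resource

# Each statement is a (subject, predicate, object) triple.
triples = [
    (PAGE, URI(DC + "title"), Literal("An example page")),
    (PAGE, URI(DC + "creator"), AUTHOR),
]


def term(node):
    """Serialize one URI or literal in N-Triples style."""
    if isinstance(node, URI):
        return "<" + node.value + ">"
    escaped = (node.value.replace("\\", "\\\\")
                         .replace('"', '\\"')
                         .replace("\n", "\\n"))
    return '"' + escaped + '"'


def to_ntriples(statements):
    """Serialize the triples, one statement per line."""
    return "\n".join(
        f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements
    )


output = to_ntriples(triples)
print(output)
```

Because every subject and predicate is a URI, statements published by
different sources about the same resource can be merged simply by
concatenating the triple lists.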

Querying the Semantic Web

The Semantic Web initiative has opened a broad spectrum of opportunities for
improving the search and retrieval of information on the Internet. This is no
coincidence; it is one of the main targets of this new scenario, as has been
pointed out. However, the consolidation of a standardised way to interchange
semantic information is just another step in the race for interoperability.
Other battles are being fought to rationalise the way this information is
processed, and search and retrieval are perhaps the most important elements of
the information feed chain. The challenge is to find efficient and rational
ways to exploit this new information that is beginning to be disseminated over
the net and that, despite being formalised in a standard way (RDF), can be
stored in different ways (embedded in HTML pages, in a database, in specific
knowledge repositories, etc.) and remains highly heterogeneous (an innumerable
and unrestricted number of ontologies, potentially overlapping, can coexist in
the Semantic Web). These two key issues, how to locate and access the
information and how to manage heterogeneity, are relevant to our analysis and
closely related to what we have said in the previous sections.

Some research works reflect specific approaches to this, like the Edutella
project, which constitutes a distributed search network for educational
resources and is based on P2P networking (its JXTA implementation) and RDF.
This interesting work uses the query exchange language family RDF-QEL-i (based
on Datalog semantics and subsets) as a standardised query exchange language
format. Because Edutella peers are highly heterogeneous and have different
kinds of local storage for RDF triples, as well as some kind of local query
language (e.g., SQL), wrappers are used to translate queries and results
between the network and each peer so that the peers can participate in the
network. Another work is Sesame, an extensible architecture implementing a
persistent RDF store and a query engine capable of processing RQL queries.

Of special interest for us is TAP, a system that implements a general query
interface called GetData, Semantic Negotiation, and Web of Trust enabled
registries. It introduces the concept of Semantic Search and describes an
implemented system that uses data from the Semantic Web to improve traditional
search results. The GetData interface is a simple query interface to
network-accessible data presented as directed labelled graphs, in contrast to
expressive query languages like SQL, RQL or DQL. This work favours
deployability over query expressiveness. Related to this project, and also to
the query language of Edutella, is RDF-QBE, a mechanism for specifying RDF
subgraphs, which its authors call 'Query by Example', that could allow a
high-performance standardised interface for retrieving semantic information
from remote servers.

From all these case studies we can observe the latent need to define a
low-barrier mechanism that allows heterogeneous semantic information to be
harvested, and how this creates a trade-off between deployability and
expressiveness. Some of them (e.g. TAP) point out the need to consider other
conventional, non-semantic search strategies as well, such as crawler-based
engines, when thinking about future applications.
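
The trade-off between deployability and expressiveness can be sketched with a
toy in-memory graph. The get_data function below imitates the spirit of a
GetData-style lookup (given a resource and a property, return the values),
while match imitates the simplest form of a query-by-example pattern, a single
triple in which None acts as a wildcard. Both function names, their
signatures and the sample data are assumptions made for this illustration;
they are not the actual TAP or RDF-QBE interfaces.

```python
# Toy directed labelled graph stored as (resource, property, value) triples.
# All identifiers below are invented sample data.
GRAPH = [
    ("book1", "type", "Book"),
    ("book1", "title", "XML Basics"),
    ("book1", "author", "alice"),
    ("alice", "type", "Person"),
    ("alice", "name", "Alice Example"),
]


def get_data(graph, resource, prop):
    """Lookup in the GetData style: values of one property of one resource."""
    return [v for s, p, v in graph if s == resource and p == prop]


def match(graph, pattern):
    """Return the triples matching a one-triple pattern; None is a wildcard."""
    return [
        triple for triple in graph
        if all(want is None or want == got
               for want, got in zip(pattern, triple))
    ]


# A simple lookup is trivial to implement and deploy ...
authors = get_data(GRAPH, "book1", "author")
# ... while even a minimal pattern query is already more expressive.
people = match(GRAPH, (None, "type", "Person"))
print(authors, people)
```

Following a path in the graph with get_data takes one call per edge, which is
the price paid for keeping the interface this small.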

Data Integration and XML

The classic data integration literature focused on the Relational Model for
both queries and mappings until the mid-90s. In the late 90s, however,
researchers turned their interest to a new and emerging data model, XML. The
new model arose as a de facto standard for exposing and interchanging data, so
it was the ideal choice for systems pursuing data interoperability. Now XML
and its query languages are the interfaces of choice for Web Services,
XML-native databases and many other applications.

5.1 Mapping the classic data integration problems to XML

Integrating data from various XML sources raises the same problems described
in the classic data integration literature, but new solutions need to be found
to tackle the particularities of the new scenario. The first of these
classical problems is schema mapping. The schema in whose terms the query is
expressed (there is no need to call it the Global Schema if we are, e.g., in a
peer-to-peer context) must somehow be mapped to the schema or schemas of the
sources where the query will actually be executed. The simplest approach to
such a mapping is an attribute correspondence, where some property or
attribute in one representation corresponds to some attribute in the other
representation. We find increased complexity when mapping concepts that are
semantically the same but whose XML representations are structured
differently.

Example 5.1.1. This example illustrates some of the problems of mapping XML
schemas.

Source1.xml DTD:

    pubs
      book*
        title
        author*
          name
        publisher*
          name

Source2.xml DTD:

    authors
      author*
        full-name
        publication*
          title
          pub-type

The example shows how a simple schema describing books and authors can take
different shapes. The difficulty of obtaining a mapping between them will
depend on the goal of that mapping. It may serve for simple migration tasks
(translation of data from one schema to another), in which case a simple
translation template will be enough. However, it may be needed for querying
purposes, in which case a more complex strategy is needed, related to the old
query rewriting problem described in previous sections.

5.2 XML query languages and data integration

XML query languages have been broadly used for the development of simple data
integration applications. Mapping between schemas or defining wrappers with
XSLT or XQuery can be a direct solution for some real-world problems. These
solutions are generally based on the manual coding of templates and updates,
so they represent the modern version of the more primitive data integration
approaches.
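
A hand-coded translation of this kind can be sketched for the two schemas of
Example 5.1.1. The Python function below, written with the standard
xml.etree.ElementTree module rather than XSLT or XQuery, regroups a
Source1-style document (books listing their authors) into a Source2-style
document (authors listing their publications). The sample data, the function
name pubs_to_authors and the decision to fill pub-type with the constant
"book" are assumptions of this sketch; publisher information is dropped
because the second schema has no place for it.

```python
import xml.etree.ElementTree as ET

# Invented sample instance of the Source1 schema.
SOURCE1 = """
<pubs>
  <book>
    <title>Book A</title>
    <author><name>Ann</name></author>
    <author><name>Bob</name></author>
    <publisher><name>Pub X</name></publisher>
  </book>
  <book>
    <title>Book B</title>
    <author><name>Ann</name></author>
  </book>
</pubs>
"""


def pubs_to_authors(xml_text):
    """Translate a pubs/book document into an authors/author document."""
    pubs = ET.fromstring(xml_text)

    # Invert the nesting: collect the titles written by each author.
    titles_by_author = {}  # dicts keep first-seen order
    for book in pubs.findall("book"):
        title = book.findtext("title")
        for name in book.findall("author/name"):
            titles_by_author.setdefault(name.text, []).append(title)

    # Emit the target structure.
    authors = ET.Element("authors")
    for full_name, titles in titles_by_author.items():
        author = ET.SubElement(authors, "author")
        ET.SubElement(author, "full-name").text = full_name
        for title in titles:
            publication = ET.SubElement(author, "publication")
            ET.SubElement(publication, "title").text = title
            # Assumption: everything coming from Source1 is a book.
            ET.SubElement(publication, "pub-type").text = "book"
    return authors


result = pubs_to_authors(SOURCE1)
print(ET.tostring(result, encoding="unicode"))
```

This is enough for a one-way migration. Answering queries posed against one
schema using data stored under the other would require reasoning about the
mapping itself, which is the harder query rewriting problem.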

XSL Transformations (XSLT)

XSL Transformations (XSLT) is a language standardized by the W3C for
transforming XML documents into other XML documents. XSLT is a component of
the W3C's Extensible Stylesheet Language (XSL), and initially its main purpose
was to be used in conjunction with a formatting language like XSL-FO,
targeting independence of the presentation layer. However, XSLT can be used
independently, and it has been used in many application areas, especially by
the data integration community. A transformation expressed in XSLT, called a
stylesheet, describes rules for transforming a source XML document into a
result XML document. An XSLT stylesheet associates patterns with templates.
When a pattern is matched against an element in the source XML tree, the
corresponding template is instantiated to generate XML code for the result
document. This generation can include data from the source tree, but can also
include new data. The current version, XSLT 2.0 (W3C Candidate Recommendation
3 November 2005), is a revised version of the XSLT 1.0 Recommendation
published on 16 November 1999. It is designed to be used in conjunction with
XPath 2.0, and it shares the same data model as XPath 2.0.

The capabilities of XSLT for transforming XML documents make it a natural
choice for data integration applications. In scenarios where heterogeneous XML
schemas need to be mapped, XSLT stylesheets can be manually coded or
semi-automatically generated to allow conversion between the different
schemas. XSLT has also been used for intra-model conversions, like RDF-to-XML.
Another use of XSLT has been in the definition of web wrappers. HTML code can
easily be converted to XHTML with tools like HTML Tidy [143], and then
filtered with XSLT stylesheets. Many commercial products make use of XSLT's
data integration capabilities, such as the Altova MapForce tool.

Example 5.2.1. This example shows how an input XML document can be
transformed using an XSLT template. Take the following XML document
describing two movies.

intput.xml: 1982 1959

The following XSLT template is applied recursively to all the nodes of the
underlying tree of the input document. The template translates the movie
elements into record elements. It also translates the id attributes into
equivalent elements.

template.xslt:
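
As a rough illustration of the transformation described in this example, the
following Python sketch applies the same two rules recursively using the
standard xml.etree.ElementTree module rather than XSLT. The sample input is an
assumption: only the movie and record element names, the id attribute and the
years 1982 and 1959 come from the text, while the movies root, the year
element and the id values m1 and m2 are invented here. The function name
transform is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document describing two movies.
INPUT = """<movies>
  <movie id="m1"><year>1982</year></movie>
  <movie id="m2"><year>1959</year></movie>
</movies>"""


def transform(node):
    """Recursively copy a tree, renaming movie elements to record and
    turning every id attribute into an id child element."""
    out = ET.Element("record" if node.tag == "movie" else node.tag)

    for key, value in node.attrib.items():
        if key == "id":
            ET.SubElement(out, "id").text = value  # attribute -> element
        else:
            out.set(key, value)                    # other attributes are kept

    # Keep real text content, drop whitespace used only for indentation.
    if node.text and node.text.strip():
        out.text = node.text

    for child in node:
        out.append(transform(child))
    return out


result = transform(ET.fromstring(INPUT))
print(ET.tostring(result, encoding="unicode"))
```

In XSLT the same effect would normally be obtained declaratively, with one
template matching movie elements, one matching id attributes, and a default
rule that copies everything else.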
