11A Programming 1 HTML-Web
Learning Map
1. The Semantic Web
Giving web page contents more meaning for people and computers
2. XML
One of the most important tools for creating the semantic web
Challenges of many cultures, many pages, in many languages, and how XML and the semantic web may help
Semantic Web
Semantic: part of the structure of language relating to meaning, especially of words
HTML does not say what anything actually is, or what kind of information it is.
You can't tell from the <h2> tag what "Zap Mama" signifies. Is it a command? A label? A name?
A naming scheme
Unique identity for those documents
[Goble 03]
A place where computers do the presentation (easy) and people do the linking and interpreting (hard). Why not get computers to do more of the hard work?
http://www.buzzbutt.com/html/shaw_party.html
Tags like <artist>Zap Mama</artist> would replace HTML's current strategy of tagging only the format of the information.
Tim Berners-Lee (TBL) and others have since been working toward this vision, which has become known as the Semantic Web.
It was described in an article in the May 2001 issue of Scientific American.
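A minimal sketch of why meaning-based tags matter to software, using Python's standard xml.etree.ElementTree. The <div> and <concert> wrapper elements are illustrative assumptions. A program can ask directly for <artist> elements, while an <h2> only tells it "this is a heading".

```python
import xml.etree.ElementTree as ET

# The same text marked up two ways; <div> and <concert> are illustrative wrappers.
FORMAT_MARKUP = "<div><h2>Zap Mama</h2></div>"                   # says how it looks
MEANING_MARKUP = "<concert><artist>Zap Mama</artist></concert>"  # says what it is


def find_artists(markup: str) -> list[str]:
    """Return the text of every <artist> element in the markup."""
    root = ET.fromstring(markup)
    return [element.text for element in root.iter("artist")]


print(find_artists(FORMAT_MARKUP))   # [] : nothing marks "Zap Mama" as an artist
print(find_artists(MEANING_MARKUP))  # ['Zap Mama']
```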
Oh Happy Day!
The Semantic Web is under development. It has three major components:
XML (Extensible Markup Language), for tagging the structure of the data
Of course, we can't draw our way through the Semantic Web, so instead we can use a table-style representation of the graph. Each row represents an arrow (an edge) in the figure. The first column holds the name of the node at the start of the edge, the second column holds the label of the edge itself (the kind of edge), and the third column holds the name of the node at the end of the arrow.
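As a sketch of the three-column table idea, the graph below is a plain Python list of rows. The node names are borrowed from the RDF example later in this section, and the helper functions edges_from and nodes_with_edge are hypothetical. Once every edge is a row, questions about the graph become simple filters over the rows.

```python
# Each row is one edge of the graph: (start node, edge label, end node).
Triple = tuple[str, str, str]

GRAPH: list[Triple] = [
    ("vincent_donofrio", "starred_in", "law_and_order_ci"),
    ("law_and_order_ci", "type", "tv_show"),
    ("the_thirteenth_floor", "similar_plot_as", "the_matrix"),
]


def edges_from(graph: list[Triple], node: str) -> list[tuple[str, str]]:
    """All (edge label, end node) pairs for edges that start at `node`."""
    return [(label, end) for start, label, end in graph if start == node]


def nodes_with_edge(graph: list[Triple], label: str, end: str) -> list[str]:
    """All start nodes that have an edge `label` pointing at `end`."""
    return [start for start, lbl, e in graph if lbl == label and e == end]


# Print the graph in the table layout: start | label | end
for row in GRAPH:
    print(" | ".join(row))

print(edges_from(GRAPH, "vincent_donofrio"))      # [('starred_in', 'law_and_order_ci')]
print(nodes_with_edge(GRAPH, "type", "tv_show"))  # ['law_and_order_ci']
```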
Example XML
The following text may look identical in a browser, but it's quite different under the hood. See how the XML differs from the HTML?
[Figure: the same content marked up as HTML and as XML]
You can use Internet Explorer to view XML in its raw form (View > Source). Note the meaningful tags, like <tasklist>.
Example RDF
RDF information is commonly expressed in XML (the RDF/XML syntax). This example describes the prior example: it gives the title, author, creation date, and subject.
These pieces of information are called metadata because they are data about data.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>
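To see that this RDF/XML is really a list of (subject, property, object) rows, here is a deliberately tiny reader built on Python's xml.etree.ElementTree. It is a sketch, not a general RDF/XML parser: it assumes only the patterns used in this example (rdf:about, rdf:resource, one nested typed node, plain-text literals) and ignores blank nodes, datatypes, and language tags. Real projects would use a full parser such as the rdflib library. The names to_triples and iri are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"
RESOURCE = f"{{{RDF_NS}}}resource"
DESCRIPTION = f"{{{RDF_NS}}}Description"
RDF_TYPE = RDF_NS + "type"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>"""


def iri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into one IRI string."""
    if not tag.startswith("{"):
        return tag  # un-namespaced tag: keep as-is
    namespace, _, local_name = tag[1:].partition("}")
    return namespace + local_name


def to_triples(rdf_xml: str) -> list[tuple[str, str, str]]:
    """Read the simple RDF/XML patterns above into (subject, property, object) rows."""
    triples = []
    for node in ET.fromstring(rdf_xml):          # top-level node elements
        subject = node.get(ABOUT)
        if node.tag != DESCRIPTION:              # typed node such as <ex:tv_show>
            triples.append((subject, RDF_TYPE, iri(node.tag)))
        for prop in node:                        # property elements
            nested = None
            if prop.get(RESOURCE) is not None:   # <ex:p rdf:resource="..."/>
                obj = prop.get(RESOURCE)
            elif len(prop):                      # <ex:p><ex:Type rdf:about="..."/></ex:p>
                nested = prop[0]
                obj = nested.get(ABOUT)
            else:                                # <ex:p>text</ex:p> (literal)
                obj = (prop.text or "").strip()
            triples.append((subject, iri(prop.tag), obj))
            if nested is not None and nested.tag != DESCRIPTION:
                triples.append((obj, RDF_TYPE, iri(nested.tag)))
    return triples


for row in to_triples(SAMPLE):
    print(row)
```

The nested <ex:tv_show> element yields two rows: the starred_in edge itself, plus an rdf:type row saying that law_and_order_ci is a tv_show.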
OWL Example
OWL is also expressed in XML (RDF/XML syntax). The example uses the daml: vocabulary of DAML+OIL, the language OWL was developed from. Things to note from the example:
A wine is a potable liquid produced by at least one maker of type winery.
A wine is made from at least one type of grape (such grapes are restricted to wine grapes elsewhere in the ontology).
Wine
<rdfs:Class rdf:ID="WINE">
  <rdfs:subClassOf rdf:resource="#POTABLE-LIQUID"/>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#MAKER"/>
      <daml:minCardinality> 1 </daml:minCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#MAKER"/>
      <daml:toClass rdf:resource="#WINERY"/>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#GRAPE-SLOT"/>
      <daml:minCardinality> 1
.........
</rdfs:Class>
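A sketch of what the WINE restrictions say, written as a Python check over hypothetical facts (the individuals chateau_x_2005, chateau_x, merlot, and mystery_wine are invented for illustration). One important caveat: this is a closed-world data check. An OWL or DAML+OIL reasoner uses open-world semantics, so it would not report mystery_wine as an error; it would instead conclude that some not-yet-named maker and grape must exist.

```python
# Hypothetical facts as (subject, property, object) triples, using the
# class and property names from the WINE definition.
FACTS = {
    ("chateau_x_2005", "type", "WINE"),
    ("chateau_x_2005", "MAKER", "chateau_x"),
    ("chateau_x", "type", "WINERY"),
    ("chateau_x_2005", "GRAPE-SLOT", "merlot"),
    ("mystery_wine", "type", "WINE"),
}


def values(facts, subject, prop):
    """All objects linked to `subject` by `prop`."""
    return {o for s, p, o in facts if s == subject and p == prop}


def check_wine(facts, wine):
    """List the WINE restrictions this individual fails (closed-world reading)."""
    problems = []
    makers = values(facts, wine, "MAKER")
    if len(makers) < 1:                                    # minCardinality 1 on MAKER
        problems.append("MAKER minCardinality 1")
    if any("WINERY" not in values(facts, m, "type") for m in makers):
        problems.append("MAKER toClass WINERY")            # every maker must be a WINERY
    if len(values(facts, wine, "GRAPE-SLOT")) < 1:         # minCardinality 1 on GRAPE-SLOT
        problems.append("GRAPE-SLOT minCardinality 1")
    return problems


print(check_wine(FACTS, "chateau_x_2005"))  # []
print(check_wine(FACTS, "mystery_wine"))    # ['MAKER minCardinality 1', 'GRAPE-SLOT minCardinality 1']
```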
XML can be used for a variety of purposes, but in the Semantic Web it is used to give structure to raw data. XML is a restricted form of SGML, the Standard Generalized Markup Language (ISO 8879).
XML Smackdown
XML is bigger and more powerful than HTML. Anything HTML can do, XML can do better. XML can be used to create specialized tools (such as XML Signature, for storing and managing digital signatures).
There is no easy or consistent way to do this in HTML.
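HTML has been reformulated as XHTML, which expresses all of the HTML tags as proper XML.
The expectation at the time was that HTML would eventually become obsolete and that all pages would have to be XHTML-compliant to display properly. In practice, the W3C later adopted HTML5, which can be written in either HTML or XML syntax.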
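One concrete consequence of "proper XML tags": a strict XML parser rejects markup that browsers happily accept as HTML. A minimal sketch with Python's xml.etree.ElementTree; the two sample strings and the helper name is_well_formed_xml are illustrative.

```python
import xml.etree.ElementTree as ET

HTML_STYLE = "<p>First line<br>Second line</p>"     # fine as HTML; <br> is never closed
XHTML_STYLE = "<p>First line<br />Second line</p>"  # XHTML: empty element closed with />


def is_well_formed_xml(markup: str) -> bool:
    """True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed_xml(HTML_STYLE))   # False
print(is_well_formed_xml(XHTML_STYLE))  # True
```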
2. XML
The universal markup language, poised to take over pretty much everything having to do with markup on the web. XML is the heart of the Semantic Web.
Where is the major growth coming from in terms of new web content?
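Region | Population | Internet users (baseline) | Internet users (latest) | Penetration (% of population) | User growth (baseline to latest) | Share of world Internet users
Latin America/Caribbean | 586,662,468 | 18,068,919 | 186,922,050 | 31.9 % | 934.5 % | 10.4 %
Oceania / Australia | 34,700,201 | 7,620,480 | 21,110,490 | 60.8 % | 177.0 % | 1.2 %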
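The derived percentages follow from the raw counts: penetration is latest users divided by population, and growth is the change in users relative to the baseline count. A short Python check using the figures above (the helper names penetration and growth are hypothetical):

```python
# (population, users at baseline, users at latest count), from the table figures.
REGIONS = {
    "Latin America/Caribbean": (586_662_468, 18_068_919, 186_922_050),
    "Oceania / Australia": (34_700_201, 7_620_480, 21_110_490),
}


def penetration(population: int, users: int) -> float:
    """Internet users as a percentage of the population, to one decimal."""
    return round(100 * users / population, 1)


def growth(baseline: int, latest: int) -> float:
    """Percentage growth in users from the baseline to the latest count, to one decimal."""
    return round(100 * (latest - baseline) / baseline, 1)


for region, (pop, before, after) in REGIONS.items():
    print(f"{region}: penetration {penetration(pop, after)} %, growth {growth(before, after)} %")
# Latin America/Caribbean: penetration 31.9 %, growth 934.5 %
# Oceania / Australia: penetration 60.8 %, growth 177.0 %
```

Both regions reproduce the table's penetration and growth columns exactly.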
Likely Conclusions
Although the total number of English-language web pages will continue to grow, their proportion of all pages will continue to drop.
The proportional growth of pages in European languages will also slow down, while the share of pages in Chinese will grow at an accelerating pace.
The need for automated and semi-automated page translation will increase markedly over the coming ten years.
Machine Translation
Machine translation refers to using a computer program to translate text from one language to another. The state of the art is still not as accurate or sophisticated as one might like. Back-translation example from Babelfish:
Original text: In all of Syracuse University, there is not a finer instructor of Information Technology than the highly-accomplished, intellectual overachiever, and all-around good guy who is known as Randy Wenner.
Chinese back to English: West the grand total forces the dove Si university, compared to is called blue Wenner the high success, the intelligence high achievement and the versatile goodness does not have an information technology better instructor.
Machine translation can be improved by the use of XML and XML standards
XML documents are much easier to translate than other electronic documents because they separate form from content and conform to a rigorous standard with a defined syntax.
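A minimal sketch of why separating form from content helps translation: a program can walk an XML document, translate only the text, and leave every tag untouched. The toy word-for-word glossary stands in for a real translation service and is purely hypothetical, as is the <task> element inside the <tasklist>. This sketch only handles text directly inside elements (element.text), not text that follows a child element (element.tail).

```python
import xml.etree.ElementTree as ET

# Hypothetical English -> Spanish glossary standing in for a real MT system.
GLOSSARY = {"Buy": "Comprar", "milk": "leche", "Pay": "Pagar", "rent": "alquiler"}


def toy_translate(text: str) -> str:
    """Word-for-word lookup; unknown words pass through unchanged."""
    return " ".join(GLOSSARY.get(word, word) for word in text.split())


def translate_content(xml_text: str) -> str:
    """Translate element text only; element names and structure are preserved."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.text and element.text.strip():
            element.text = toy_translate(element.text.strip())
    return ET.tostring(root, encoding="unicode")


doc = "<tasklist><task>Buy milk</task><task>Pay rent</task></tasklist>"
print(translate_content(doc))
# <tasklist><task>Comprar leche</task><task>Pagar alquiler</task></tasklist>
```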
Google
Google Translate (http://translate.google.com/)
Translates text or URLs
English is the first language of a relatively small part of the world's population and of a declining proportion of Internet users. Over the coming 5-10 years, the need for automated translation of web content will grow, so that people can use web pages written in other languages. XML can be used to develop tools and standards that will help produce better machine translation.