Unit-1 Notes
Introduction: Introduction to Semantic Web, the Business Case for the Semantic Web,
XML and Its Impact on the Enterprise
Tim Berners-Lee has a two-part vision for the future Web.
Computing professionals began to realize that data was important, and it must be verified
and protected. Programming languages began to acquire object-oriented facilities that
internally made data first-class citizens. However, this “data as king” approach was kept
internal to applications so that vendors could keep data proprietary to their applications for
competitive reasons.
With the Web, Extensible Markup Language (XML), and now the emerging Semantic
Web, the shift of power is moving from applications to data. This also gives us the key to
understanding the Semantic Web. The path to machine-processable data is to make the
data smarter.
Ontologies and rules. In this stage, new data can be inferred from existing data by
following logical rules. In essence, data is now smart enough to be described with
concrete relationships, and sophisticated formalisms where logical calculations can be
made on this “semantic algebra.” This allows the combination and recombination of data
at a more atomic level and very fine-grained analysis of data. Thus, in this stage, data no
longer exists as a blob but as a part of a sophisticated microcosm. An example of this data
sophistication is the automatic translation of a document in one domain to the equivalent
(or as close as possible) document in another domain.
The Semantic Web is not just for the World Wide Web. It represents a set of
technologies that will work equally well on internal corporate intranets. This is
analogous to Web services representing services not only across the Internet but also
within a corporation’s intranet. So, the Semantic Web will resolve several key
problems facing current information technology architectures.
Information Overload
Stovepipe Systems
A stovepipe system is a system where all the components are hardwired to only work
together. Therefore, information only flows in the stovepipe and cannot be shared by
other systems or organizations that need it. For example, the client can only
communicate with specific middleware that only understands a single database with a
fixed schema. Breaking down stovepipe systems needs to occur on all tiers of
enterprise information architectures; however, the Semantic Web technologies will be
most effective in breaking down stovepiped database systems.
XML is the syntactic foundation layer of the Semantic Web. All other technologies providing
features for the Semantic Web will be built on top of XML. Requiring other Semantic Web
technologies (like the Resource Description Framework) to be layered on top of XML
guarantees a base level of interoperability.
The technologies that XML is built upon are Unicode characters and Uniform Resource
Identifiers (URIs). The Unicode characters allow XML to be authored using international
characters. URIs are used as unique identifiers for concepts
in the Semantic Web.
For example, if I label something a <price> $12.00 </price> and you label that field on your
invoice <cost> $12.00 </cost>, there is no way that a machine will know those two mean the
same thing unless Semantic Web technologies like
ontologies are added.
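A minimal sketch of this mismatch, using Python's built-in XML parser (the element names come from the example above):

```python
# Two fragments that mean the same thing to a human but are unrelated
# to a machine: the character data matches, the element names do not.
import xml.etree.ElementTree as ET

mine = ET.fromstring("<price>$12.00</price>")
yours = ET.fromstring("<cost>$12.00</cost>")

print(mine.text == yours.text)  # True  -- same raw data
print(mine.tag == yours.tag)    # False -- no shared meaning
```

Nothing in XML itself tells the machine that `price` and `cost` name the same concept; that mapping is exactly what an ontology supplies.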
2. Easier discovery: Semantic Web technologies will be necessary to solve the Web service
discovery problem.
3. Better interaction: enabling Web services to interact with other Web services.
Advanced Web service applications involving comparison, composition, or orchestration of
Web services will require Semantic Web technologies for such interactions to be automated.
In short, Web services are like building blocks, and the Semantic Web is the glue that
helps us connect them and use them effectively.
Figure 1.3 demonstrates the various convergences that combine to form Semantic Web
services.
Web services complete a platform-neutral processing model for XML. The step after that is
to make both the data and the processing model smarter. In other words, continue along the
“smart-data continuum.” This will move along five axes:
1. logical assertions
2. classification
3. formal class models
4. rules, and
5. trust.
Logical Assertions
An assertion is the smallest expression of useful information. How do we make an assertion?
One way is to model the key parts of a sentence by connecting a subject to an object with a
verb. This is the purpose of the Resource Description Framework (RDF), which captures these associations between
subjects and objects. The importance of this cannot be overstated. As Tim Berners-Lee
states, “The philosophy was: What matters is in the connections. It isn’t the letters, it’s the way
they’re strung together into words. It isn’t the words, it’s the way they’re strung together into
phrases. It isn’t the phrases, it is the way they’re strung together into a document.” Agreeing
with this sentiment, Hewlett-Packard Research has developed open source software to process
RDF called Jena.
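RDF is usually serialized in XML, but the core idea can be sketched with plain (subject, predicate, object) triples. The resources below are invented for illustration:

```python
# Each assertion connects a subject to an object with a verb (predicate).
triples = [
    ("John", "authorOf", "Java Pitfalls"),
    ("John", "worksFor", "ExampleCorp"),
    ("Java Pitfalls", "publishedBy", "SomePublisher"),
]

# "What did John author?" -- the value is in the connections,
# not in the individual strings.
books = [o for (s, p, o) in triples if s == "John" and p == "authorOf"]
print(books)  # ['Java Pitfalls']
```

A real RDF toolkit such as Jena (or Python's rdflib) stores and queries triples in essentially this shape, with URIs in place of the bare strings.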
Classification
Formal Class Models
A formal representation of classes and relationships between classes to
enable inference requires rigorous formalisms even beyond conventions used in current
object-oriented programming languages like Java and C#. Ontologies are used to represent
such formal class hierarchies, constrained properties, and relations between classes. The W3C
is developing a Web Ontology Language (abbreviated as OWL)
Figure 1.5 shows several classes (Person, Leader, Image, etc.), a few properties of the class
Person (birthdate, gender), and relations between classes (knows, is-A, leads, etc.). Again,
while not nearly a complete ontology, the purpose of Figure 1.5 is to demonstrate how an
ontology captures logical information in a manner that can allow inference. For example, if
John is identified as a Leader, you can infer that John is a person and that John may lead an
organization. Additionally, you may be interested in questioning any other person that
“knows” John. Or you may want to know if John is depicted in the same image as another
person (also known as co-depiction). It is important to state that the concepts described so far
(classes, subclasses, properties) are not rigorous enough for inference. To each of these basic
concepts, additional formalisms are added. For example, a property can be further specialized
as a symmetric property or a transitive property. Here are the rules that define those
formalisms:
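The rules themselves are elided at this point in the notes; in their standard logical form, a symmetric property P satisfies P(x, y) ⇒ P(y, x), and a transitive property satisfies P(x, y) ∧ P(y, z) ⇒ P(x, z). A toy inference sketch (the facts and the naive fixed-point loop are purely illustrative, not how a production reasoner works):

```python
# symmetric:  P(x, y)             =>  P(y, x)
# transitive: P(x, y) and P(y, z) =>  P(x, z)
facts = {("knows", "John", "Mary"), ("leads", "John", "TeamA")}
symmetric = {"knows"}   # assumed declarations for this sketch
transitive = set()      # e.g. an "ancestorOf" property would go here

def infer(facts):
    """Apply the two rules until no new triples appear."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for (p, x, y) in list(inferred):
            if p in symmetric and (p, y, x) not in inferred:
                inferred.add((p, y, x))
                changed = True
            if p in transitive:
                for (q, y2, z) in list(inferred):
                    if q == p and y2 == y and (p, x, z) not in inferred:
                        inferred.add((p, x, z))
                        changed = True
    return inferred

print(("knows", "Mary", "John") in infer(facts))  # True
```

Because "knows" is declared symmetric, the fact that Mary knows John is inferred rather than stored; "leads" carries no such declaration, so no reverse fact appears.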
Bad precedent
The paragraphs discuss how skeptics often use the failure of early artificial intelligence
predictions to discredit the Semantic Web. They mention a famous prediction from 1957 by
Herbert Simon and Allen Newell about a computer beating a human at chess within 10 years,
which didn't come true. Tim Berners-Lee, the inventor of the World Wide Web, responds to
these comparisons by clarifying that the Semantic Web is not the same as artificial
intelligence. He explains that the Semantic Web focuses on making documents
understandable to machines, but it doesn't involve creating magical AI that understands
human language effortlessly. Instead, it requires people to structure data in a specific way to
make it easier for machines to process and solve well-defined problems.
Fear, uncertainty, and doubt (FUD). This is skepticism “in the small” or nitpicking
skepticism over the difficulty of implementation details. The most common FUD tactic is
deeming the Semantic Web as too costly. Semantic Web modeling is on the same scale as
modeling complex relational databases. Relational databases were costly in the 1970s, but
prices have dropped precipitously (especially with the advent of open source). The cost of
Semantic Web applications is already low due to the Herculean efforts of academic and
research institutions. The cost will drop further as the Semantic Web goes mainstream in
corporate portals and intranets within the next three years.
Status quo.
Some people don't like change (status quo!). They think the current way of doing things is
fine and the Semantic Web is unnecessary.
a. Just like people initially doubted the World Wide Web, they doubt the Semantic Web's
usefulness.
b. Back then, people used clunky systems just for phone numbers. Tim Berners-Lee
showed them a simple web-based solution, but they didn't see the bigger picture.
c. They preferred isolated "stovepipe" solutions for each task, instead of a more flexible
web architecture.
d. Why? They couldn't understand the power of sharing information on a global network.
Basically: Some people resist change and don't see the potential benefits of the Semantic
Web, just like they initially doubted the World Wide Web
2. Consumers and businesses want to apply the network effect to their information.
Average people see and understand the network effect and want it applied to their home
information processing. Average homeowners now have multiple computers and want them
networked. Employees understand that they can be more effective by capturing and
leveraging knowledge from their coworkers. Businesses also see this, and the smart ones are
using it to their advantage. Many businesses and government organizations see an
opportunity for employing these technologies (and business process reengineering) with the
deployment of enterprise portals as natural aggregation points.
Key points:
Knowledge is Power: Having the right information and being able to use it quickly is
crucial for success. It's not just about having lots of data anymore; it's about knowing how
to turn that data into useful knowledge for making decisions.
3. Role of Semantic Web: The Semantic Web can help bring order to this chaos by
organizing information in a structured way. It's not just about storing data; it's about
tagging it in a way that computers can understand and making sure it's reliable.
4. Key Concepts of Semantic Web: It's not enough to just gather information; we need to
tag it with meaningful labels and ensure its reliability. We also need tools to analyze and
use this information effectively.
5. Benefits for Organizations: By implementing Semantic Web principles, organizations
can avoid duplicating efforts, share lessons learned, and save time and money. Having a
centralized knowledge base that software can analyze can lead to efficient web-based
applications.
Decision Support
The example used is the 9/11 attacks, where FBI Director Mueller wished for a system
that could connect different data sources and identify hidden relationships.
This is exactly what the Semantic Web aims to do - connect and understand
information beyond just keywords.
It allows software agents (like intelligent assistants) to find hidden connections and
patterns in data across different systems, something traditional methods struggle with.
This is beneficial for both large organizations like governments and businesses
because:
o Information from different departments/groups can be combined and
analyzed, revealing hidden insights.
o Relationships between projects or data points can be identified, giving a broader
picture.
o Better decisions can be made based on this richer understanding of information.
Examples include:
o Government: Automatically identifying suspicious patterns in financial documents
(like SEC filings).
o Businesses: Creating decision support systems that provide relevant information and
alerts based on specific needs.
Overall: The Semantic Web helps organizations unlock the true power of their
information by connecting it and making it more meaningful, leading to better
decisions.
Business Development
1. Importance of Up-to-Date Information: It's crucial for organizations to have the
latest information to seize business opportunities. However, it's not feasible to
physically bring all experts to every sales meeting.
In simple terms, having access to the right information at the right time can
significantly enhance business opportunities and improve customer relationships.
Semantic Web technologies play a crucial role in organizing and utilizing this
information effectively.
Is the Technology for the Semantic Web “There Yet”?
The Semantic Web is still a dream, but the groundwork is being laid with new technologies.
Standards are being developed by organizations like W3C to make data work across different
systems.
Technologies like XML and RDF are being used to add meaning to information online.
Web services are being built to allow software programs to talk to each other.
Companies like Adobe and IBM are investing in Semantic Web technologies.
In short, the internet is getting smarter and more connected, paving the way for a Semantic Web.
Why is XML a success?
The primary use of XML is for data exchange between internal and external organizations
(interoperability).
Think of XML as plain English for computers. It's easy for anyone to read and
understand, unlike some software's specific file formats.
Imagine you have a document:
a. Normal document: You can easily read and understand it.
b. Software-specific document: It's like a secret code, only the software
that created it can understand.
XML is like the normal document. It's readable by any program, making it easier to
share and use data.
The second key accomplishment is that XML provides a simple, standard syntax for
encoding the meaning of data values, or meta data. An often-used definition of meta
data is “data about data.” XML standardizes a simple, text-based method for
encoding meta data. In other words, XML provides a simple yet robust mechanism for
encoding semantic information, or the meaning of data. Table 3.1 demonstrates the
difference between meta data and data. It should be evident that the data is the raw
context-specific values and the meta data denotes the meaning or purpose of those
values.
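A small sketch of the distinction, assuming a made-up employee record: the element names are the meta data ("data about data"), and the character content is the raw, context-specific data.

```python
import xml.etree.ElementTree as ET

record = ET.fromstring(
    "<employee><name>Joe</name><salary>50000</salary></employee>"
)
for field in record:
    # tag = meta data (meaning/purpose), text = data (raw value)
    print(field.tag, "=", field.text)
# name = Joe
# salary = 50000
```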
The last accomplishment of XML is that it is not a new technology. XML is a subset
of the Standardized Generalized Markup Language (SGML) that was invented in 1969
by Dr. Charles Goldfarb, Ed Mosher, and Ray Lorie. So, the concepts for XML were
devised over 30 years ago and continuously perfected, tested, and broadly
implemented.
What is XML?
XML is not a language; it is actually a set of syntax rules for creating semantically rich
markup languages in a particular domain. In other words, you apply XML to create
new languages. Any language created via the rules of XML, like the Math Markup
Language (MathML), is called an application of XML. A markup language’s primary
concern is how to add semantic information about the raw content in a document; thus,
the vocabulary of a markup language is the external “marks” to be attached or
embedded in a document.
<footnote>
<author> Michael C. Daconta </author>, <title> Java Pitfalls </title>
</footnote>
Here we have one element, called “footnote,” which contains character data
and two subelements: “author” and “title.”
A valid XML document references and satisfies a schema. A schema is a separate
document whose purpose is to define the legal elements, attributes, and structure of an
XML instance document. In general, think of a schema as defining the legal vocabulary,
number, and placement of elements and attributes in your markup language. Therefore, a
schema defines a particular type or class of documents. The markup language constrains
the information to be of a certain type to be considered “legal.”
What Are XML Namespaces?
Namespaces are a simple mechanism for creating globally unique names for the elements
and attributes of your markup language. This is important for two reasons: it prevents
name collisions when different markup vocabularies are combined in one document, and it
lets an application unambiguously identify which vocabulary an element or attribute
belongs to.
Unfortunately, namespaces were not fully compatible with DTDs, and therefore their
adoption has been slow. The current markup definition languages, like XML Schema,
fully support namespaces.
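A short namespace sketch using Python's built-in parser (the namespace URIs are made up): the same local name, title, stays unique because each occurrence is qualified by its namespace URI.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<doc xmlns:bk="http://example.org/book" '
    'xmlns:html="http://example.org/xhtml">'
    '<bk:title>Java Pitfalls</bk:title>'
    '<html:title>My Page</html:title>'
    '</doc>'
)
for el in doc:
    # ElementTree expands each name to {namespace-uri}localname
    print(el.tag)
# {http://example.org/book}title
# {http://example.org/xhtml}title
```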
This processing occurs during parsing by an application. Parsing is the dissection of a
block of text into discernible words (also known as tokens). There are three common
ways to parse an XML document: by using the Simple API for XML (SAX), by building
a Document Object Model (DOM), and by employing a new technique called pull
parsing. SAX is a style of parsing called event-based parsing where each information
class in the instance document generates a corresponding event in the parser as the
document is traversed. SAX parsers are useful for parsing very large XML documents or
in low-memory environments.
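A minimal event-based sketch with Python's standard xml.sax module: each start tag fires a callback as the document is traversed, and nothing but our own tally is held in memory (which is why SAX suits very large documents).

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Count each element name as its start-tag event fires."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

handler = ElementCounter()
xml.sax.parseString(
    b"<footnote><author>A</author><title>T</title></footnote>", handler
)
print(handler.counts)  # {'footnote': 1, 'author': 1, 'title': 1}
```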
The Document Object Model (DOM) is a language-neutral data model and application
programming interface (API) for programmatic access and manipulation of XML and
HTML. Unlike XML instances and XML schemas, which reside in files on disk, the
DOM is an in-memory representation of a document. The need for this arose from
differences between the way Internet Explorer (IE) and Netscape Navigator allowed
access and manipulation of HTML documents to support Dynamic HTML (DHTML). IE
and Navigator represent the parts of a document with different names, which made cross
browser scripting extremely difficult. Thus, out of the desire for cross-browser scripting
came the need for a standard representation for document objects in the browser’s
memory. The model for this memory representation is object oriented programming
(OOP). So, by turning around the title, we get the definition of a DOM: a data model,
using objects, to represent an XML or HTML document.
Object-oriented programming introduces two key data modeling concepts classes and
objects. A class is a definition or template describing the characteristics and behaviors of
a real-world entity or concept. From this description, an in-memory instance of the class
can be constructed, which is called an object. So, an object is a specific instance of a
class. The key benefit of this approach to modeling program data is that your
programming language more closely resembles the problem domain you are solving.
Real-world entities have characteristics and behaviors. Thus, programmers create classes
that model real-world entities like “Auto,” “Employee,” and “Product.” Along with a class
name, a class has characteristics, known as data members, and behaviors,
known as methods. Figure 3.6 graphically portrays a class and two objects.
The simplest way to think about a DOM is as a set of classes that allow you to
create a tree of objects in memory that represent a manipulable version of an
XML or HTML document. There are two ways to access this tree of objects:
The generic way (see Figure 3.7) shows all parts of the document as objects of the same
class, called Node. The generic DOM representation is often called a “flattened view”
because it does not use class inheritance. Class inheritance is where a child class inherits
characteristics and behaviors from a parent class just like in biological inheritance.
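A hierarchical-access sketch using Python's built-in DOM implementation (xml.dom.minidom), applied to the footnote document from earlier: the parsed document becomes a tree of Node objects in memory that can be read and manipulated.

```python
from xml.dom.minidom import parseString

dom = parseString(
    "<footnote><author>Michael C. Daconta</author>"
    "<title>Java Pitfalls</title></footnote>"
)
root = dom.documentElement                     # the <footnote> element node
title = root.getElementsByTagName("title")[0]  # hierarchical access
print(title.firstChild.data)                   # Java Pitfalls
print(root.nodeName, root.childNodes.length)   # footnote 2
```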
There are currently three DOM levels:
DOM Level 1. This set of classes represents XML 1.0 and HTML 4.0 documents.
DOM Level 2. This extends Level 1 to add support for namespaces; cascading style
sheets, level 2 (CSS2); alternate views; user interface events; and enhanced tree
manipulation via interfaces for traversal and ranges. Cascading style sheets can be
embedded in HTML or XML documents in the <style> element and provide a method of
attaching styles to selected elements in the document. Alternate views allow alternate
perspectives of a document like a new DOM after a style sheet has been applied. User
interface events are events triggered by a user, such as mouse events and key events, or
triggered by other software, such as mutation events and HTML events (load, unload,
submit, etc.). Traversals add new methods of visiting nodes in a tree—specifically,
NodeIterator and TreeWalker—that correspond to traversing the flattened view and
traversing the hierarchical view.
DOM Level 3. This extends Level 2 by adding support for mixed vocabularies (different
namespaces), XPath expressions load and save methods, and a representation of abstract
schemas (includes both DTD and XML Schema). XPath is a language to select a set of
nodes within a document. Load and save methods specify a standard way to load an
XML document into a DOM and a way to save a DOM into an XML document. Abstract
schemas provide classes to represent DTDs and schemas and operations on the schemas.
XML is pervading all areas of the enterprise, from the IT department to the intranet,
extranet, Web sites, and databases. The adoption of XML technology has moved well
beyond early adopters into mainstream use and has become integrated with the
majority of commercial products on the market, either as a primary or enabling
technology.
XML has become the universal syntax for exchanging data between organizations. By
agreeing on a standard schema, organizations can produce text documents that
can be validated, transmitted, and parsed by any application regardless of hardware or
operating system. The government has become a major adopter of XML and is moving
all reporting requirements to XML. Companies report financial information via XML,
and local governments report regulatory information. XML has been called the next
Electronic Data Interchange (EDI) system, which formerly was extremely costly, was
cumbersome, and used binary encoding. Easy data exchange is the enabling
technology behind the next two areas:
a. ebusiness and
b. Enterprise Application Integration.
2. Ebusiness
Business-to-business (B2B) transactions have been revolutionized through XML. B2B
revolves around the exchange of business messages to conduct business transactions.
There are dozens of commercial products supporting numerous business vocabularies
developed by RosettaNet, OASIS, and other organizations. Coca-Cola, IBM, and
others have seen major benefits from using XML to streamline their B2B transactions.
This has helped them cut costs and boost profits. The future of B2B communication is
even brighter with web services, which are like little software applications that
businesses can easily connect to and use. This will make it even easier for companies
to work together electronically.
XML has had a greater effect on relational database management systems (DBMS)
than object-oriented programming (which created a new category of database called
object-oriented database management systems, or OODBMS). XML has even
spawned a new category of databases called native XML databases exclusively for the
storage and retrieval of XML. All the major database vendors have responded to this
challenge by supporting XML translation between relational tables and XML schemas.
Additionally, all of the database vendors are further integrating XML into their
systems as a native data type. This trend toward storing and retrieving XML will
accelerate with the completion of the W3C XQuery specification.
CRM systems enable an organization’s sales and marketing staff to understand, track,
inform, and service their customers. CRM involves many of the other systems we have
discussed here, such as portals, content management systems, data integration, and
databases, where XML is playing a major role. XML is becoming the glue to tie all
these systems together to enable the sales force or customers (directly) to access
information when they want and wherever they are (including wireless).
XML meta data is a form of description. It describes the purpose or meaning of raw data
values via a text format to more easily enable exchange, interoperability, and application
independence. As description, the general rule applies that “more is better.” Meta data
increases the fidelity and granularity of our data. The way to think about the current state of
meta data is that we attach words (or labels) to our data values to describe it. How could we
attach sentences? What about paragraphs? While the approach toward meta data evolution
will not follow natural language description, it is a good analogy for the inadequacy of
words alone. The motivation for providing richer data description is to move data
processing from being tediously preplanned and mechanistic to dynamic, just-in-time, and
adaptive.
For example, you may be enabling your systems to respond in real time to a
location-aware cell phone customer who is walking by one of your store outlets. If your
system can match consumers’ needs or past buying habits to current sale merchandise, you
increase revenue. Additionally, your computers should be able to support that sale with
just-in-time inventory by automating your supply chain with your partners. Finally, after
the sale, your systems should perform rich customer relationship management by allowing
transparency of your operations in fulfilling the sale and the ability to anticipate the needs
of your customers by understanding their life and needs. The general rule is this: The more
computers understand, the more effectively they can handle complex tasks.
Semantic Levels
Figure 3.9 shows the evolution of data fidelity required for semantically aware applications.
Instead of just meta data, we will have an information stack composed of semantic levels.
We are currently at Level 1 with XML Schema, which is represented as modeling the
properties of our data classes. We are capturing and processing meta data about isolated
data classes like purchase orders, products, employees, and customers. On the left side of
the diagram we associate a simple physical metaphor to the state of each level. Level 1 is
analogous to describing singular concepts or objects. In Level 2, we will move beyond data
modeling (simple meta data properties) to knowledge modeling. Knowledge modeling
enables us to model statements both about the relationships between Level 1 objects and
about how those objects operate. This is diagrammed as connections between our objects in
Figure 3.9. Beyond the knowledge statements of Level 2 are the superstructures or “closed
world modeling” of Level 3. The technology that implements these sophisticated models of
systems is called ontologies.