SNSW Unit-4

The document discusses the development of social semantic applications, emphasizing the importance of formal semantics and distributed data in Semantic Web applications. It introduces tools like Elmo and GraphUtil for managing RDF data and highlights the architecture of Semantic Web applications, including the use of triple stores like Sesame. Additionally, it presents Flink, a framework for analyzing social networks within the Semantic Web community using semantic technologies.


Social Networks & Semantic Web
CO 4: Developing social semantic applications

Contents
• Building Semantic Web applications with social network features.
• Flink: the social networks of the Semantic Web community.
• openacademia: distributed, semantic-based publication management.
Introduction
• We know that Web 2.0 applications are typically developed using:
• The LAMP stack, where the M stands for MySQL, i.e. relational databases.
• Are such relational database approaches suitable for the Semantic Web?
• Semantic Web applications deal with a different kind of data: ontologies with formal semantics.
• Semantic Web applications deal with data that is not centralized but distributed.
Introduction
• In order to be considered a Semantic Web application with
respect to the Challenge, an application has to meet the
following criteria:
• The meaning of data has to play a central role:
• Meaning must be represented using formal descriptions.
• Data must be manipulated/processed in interesting ways to derive useful information.
• This semantic information processing has to play a central role in achieving goals that alternative technologies cannot achieve as well, or at all.
Introduction
• In order to be considered a Semantic Web application with
respect to the Challenge, an application has to meet the
following criteria:
• The information sources used
• Should be under diverse ownership or control.
• Should be heterogeneous (syntactically, structurally, and semantically).
• Should contain substantial quantities of real world data (i.e. not toy
examples).
• It is required that all applications assume an open world, i.e. that
the information is never complete.
Introduction
• Developers delving into Semantic Web development are often stopped short by the challenges of working with such data sources.
• Developers new to Semantic Web development need to get familiar with the
slate of technologies related to ontology-based knowledge representation such
as
• standard ontology languages (RDF/OWL).
• representations (RDF/XML).
• query languages (SPARQL).
• Although developers can still use their preferred operating systems and
programming languages, they need to familiarize themselves with a new set of
tools for processing.
• Lastly, there is also the chicken and egg problem of lacking applications due to a
lack of data in RDF format and lacking data due to a lack of applications.
Introduction
• Some issues have since been addressed, but many still remain:
• The scalability of ontology databases (the so-called triple stores) has reached the point where switching to RDF as the native data format of an application is easy and does not lead to a significant loss in performance.
• The number of data sources to connect to has increased, and with it the number of available ontologies.
Introduction
• Elmo
• A toolkit that enables software engineers to create semantic applications that manipulate RDF/OWL data at the level of domain concepts (persons, publications etc.) instead of working at the lower level of the ontology language (resources, literals and statements).
• Acting as a middle layer between the RDF ontology store and the application, Elmo hides much of the complexity of application development so that the developer can focus on implementing application logic.
• It contains a number of tools for dealing with RDF data, including tools for working with FOAF data and performing instance unification as discussed previously.
• It also helps the programmer deal with incorrect data by providing a validation framework.
Introduction
• GraphUtil
• It maps the FOAF representation of network data to the graph model of the JUNG (Java Universal Network Graph) API.
• It allows the developer to deal with network data in FOAF at the level of network concepts such as vertices and edges, and to compute network statistics such as various measures of centrality.
• In the following, we first describe the general design of Semantic Web applications so that we can pinpoint the place of these components.
• Subsequently, we briefly introduce Sesame, Elmo and the GraphUtil utility.
The generic architecture of Semantic Web
applications
• Semantic Web applications have been developed in the past years in a wide range of domains, from cultural heritage to medicine, from music retrieval to e-science.
• Yet almost all share a generic architecture.
The generic architecture of Semantic Web
applications
• Before external, heterogeneous data sources can be reused, they need to
be normalized syntactically as well as semantically.
• Transforming data into an RDF syntax such as RDF/XML if data is not directly available
in these formats.
• Ontologies (schemas and instances) of the data sources need to be reconciled.
• Data sources are under diverse ownership and are heterogeneous.
• As in early Semantic Web developments, if data comes from a limited number of sources, the schemas of the data sources are known in advance and their mapping can be performed manually.
• But as the Semantic Web grows, it is expected that Semantic Web applications will be able to discover and select new data sources and map them automatically.
The generic architecture of Semantic Web
applications
• Most Semantic Web applications have a web interface for querying and visualization, and are thus commonly regarded as web applications; queries are typically expressed in SPARQL.
• A SPARQL service allows other applications to query the triple store,
but it provides no data manipulation features such as adding or
removing data.
• Therefore, most triple stores also provide custom web interfaces for
data manipulation.
• A Semantic Web application may also expose data or services at
higher levels of abstraction than the level of triples, i.e. on the level
of domain objects and operations that can be executed on them.
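• For illustration, a minimal SPARQL query of the kind such a service might answer, returning all foaf:Person resources and their names (the data itself is assumed to be in the store):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name
    WHERE {
      ?person a foaf:Person ;
              foaf:name ?name .
    }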
The generic architecture of Semantic Web
applications
• As one would assume, the application logic of Semantic Web
applications is placed between the triple store and the eventual web
interface(s).
• The application normally accesses the triple store through its client
API.
• When working with the API of the triple store, the programmer
manipulates the data at the level of RDF triples.
• Unlike relational databases, which have standard access protocols such as ODBC or JDBC, triple stores lack such standards; one therefore has to rely on SPARQL or proprietary APIs to access a triple store.
The generic architecture of Semantic Web
applications
• Also, most of the time one needs to access the triple store at an ontology
level, i.e., at the level of classes, instances and their properties.
• The Elmo library, to be introduced below, facilitates this by providing access to the data through Java classes that map the ontological data in the triple store.
• Elmo is a set of interfaces that have been implemented for the specific case of working with data in Sesame triple stores.
• The Elmo interfaces could also be implemented for other Java-based triple stores such as Jena.
• Interfacing with non-Java triple stores would require an agreement on
standard protocols similar to JDBC.
The generic architecture of Semantic Web
applications: Sesame
• Sesame is one of the most popular RDF triple stores
implemented using Java technology.
• It allows creating repositories and specifying access
privileges, storing RDF data in a repository and querying the
data using any of the supported query languages.
• SPARQL: has the advantage in terms of standardization, but is also minimal by design.
• SeRQL: A more expressive query language with many useful
features.
The generic architecture of Semantic Web
applications: Sesame
• The data in the repository can be manipulated on the level of triples:
• Individual statements can be added and removed from the repository.
• Updates can be carried out by removing and then adding a statement but not
by editing.
• RDF data can be added or extracted in any of the supported RDF
representations including the RDF/XML and Turtle languages but not OWL.
• Sesame has a built-in inferencer for applying the RDF(S) inference rules; hence it is not only a data store but also integrates reasoning.
• Reasoning can be enabled or disabled for specific repositories and is
performed at the time when data is added to the repository or when
it is removed.
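• A minimal sketch of such triple-level manipulation, assuming the Sesame 2.x Java API (org.openrdf); class names differ in earlier Sesame releases:

    import org.openrdf.model.URI;
    import org.openrdf.model.ValueFactory;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.sail.memory.MemoryStore;

    public class TripleLevelExample {
        public static void main(String[] args) throws Exception {
            // An in-memory repository; a native or remote store could be used instead.
            Repository repository = new SailRepository(new MemoryStore());
            repository.initialize();

            RepositoryConnection con = repository.getConnection();
            ValueFactory vf = repository.getValueFactory();

            URI alice = vf.createURI("http://example.org/people/alice"); // example resource
            URI knows = vf.createURI("http://xmlns.com/foaf/0.1/knows");
            URI bob   = vf.createURI("http://example.org/people/bob");

            // Individual statements are added and removed; an "update" is a
            // remove followed by an add of the changed statement.
            con.add(alice, knows, bob);
            con.remove(alice, knows, bob);

            con.close();
            repository.shutDown();
        }
    }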
The generic architecture of Semantic Web
applications: Sesame
• A recently added feature of Sesame (Sesame 2.0) allows it to store
and retrieve context information.
• In case of the presence of context information, every triple
becomes a quad, with the last attribute identifying the context.
• All the mentioned functionalities of Sesame can be accessed in
three ways.
• Sesame provides an HTML interface that can be accessed through a
browser.
• A set of servlets exposes functionality for remote access through HTTP,
SOAP and RMI.
• A Java client library for developers, which exposes all the above mentioned functionality of a Sesame repository using method calls on a Java object called SesameRepository.
The generic architecture of Semantic Web
applications: Sesame
• Queries, for example, can be executed by calling the evaluateTableQuery method of this class, passing on the query itself and the identifier of the query language.
The generic architecture of Semantic Web
applications: Sesame
• The result is another object (QueryResultsTable) which contains the result set in the form of a table, much like the one shown in the web interface.
The generic architecture of Semantic Web
applications: Sesame
• Every row is a result and every column contains the value for a given
variable.
• The values in the table are objects of type URI, BNode or Literal, the
object representations of the same notions in RDF.
• For example, one may call the getValue, getDatatype and getLanguage
methods of Literal to get the String representation of the literal, its
datatype and its language.
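• A sketch of this query pattern, assuming the Sesame 1.x client API named on these slides; the package names and the exact Literal accessors are assumptions and may differ between releases:

    import org.openrdf.model.Literal;
    import org.openrdf.model.Value;
    import org.openrdf.sesame.constants.QueryLanguage;
    import org.openrdf.sesame.query.QueryResultsTable;
    import org.openrdf.sesame.repository.SesameRepository;

    public class QueryExample {
        public static void printNames(SesameRepository repository) throws Exception {
            // A SeRQL query returning the names of all resources that have a foaf:name.
            String query = "SELECT Name FROM {Person} foaf:name {Name} "
                         + "USING NAMESPACE foaf = <http://xmlns.com/foaf/0.1/>";

            // evaluateTableQuery takes the identifier of the query language and the query.
            QueryResultsTable result = repository.evaluateTableQuery(QueryLanguage.SERQL, query);

            // Every row is a result; every column contains the value bound to a variable.
            for (int row = 0; row < result.getRowCount(); row++) {
                Value value = result.getValue(row, 0);
                if (value instanceof Literal) {
                    // Literal also offers accessors for its string value, datatype
                    // and language, as described above.
                    System.out.println(value.toString());
                }
            }
        }
    }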
The generic architecture of Semantic Web
applications: Sesame
• Figure: adding data to a Sesame repository using the web interface.
The generic architecture of Semantic Web
applications: Elmo
• Elmo is a development toolkit consisting of two main components.
• Elmo API.
• Elmo Tools: A set of tools for working with RDF data, including an RDF
crawler and a framework of smushers.
The generic architecture of Semantic Web
applications: Elmo API
• Provides the interface between a set of JavaBeans representing
ontological classes and the underlying triple store containing the data
that is manipulated through the JavaBeans.
• The API also includes tools for generating JavaBeans from ontologies and vice versa.
• The core of the Elmo API is the ElmoManager, a JavaBean pool implementation that is responsible for creating, loading, renaming and removing ElmoBeans.
• ElmoBeans are a composition of concepts and behaviors.
The generic architecture of Semantic Web
applications: Elmo API
• Concepts are Java interfaces that correspond one-to-one to a
particular ontological class and provide getter and setter methods
corresponding to the properties of the ontological class.
• The inheritance hierarchy of the ontological classes is mapped directly
to the inheritance hierarchy of concepts.
• Elmo concepts are typically generated using a code-generator.
• Instances of ElmoBeans correspond to instances of the data set.
• As resources in an ontology may have multiple types, ElmoBeans
themselves need to be composed of multiple concepts.
• ElmoBeans implement particular combinations of concept interfaces.
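• For illustration, a hedged sketch of a concept interface corresponding to foaf:Person; the @rdf annotation and its package are assumptions about the Elmo release in use:

    // A concept: a Java interface mapped one-to-one to an ontological class,
    // with getters and setters corresponding to its properties.
    import java.util.Set;
    import org.openrdf.elmo.annotations.rdf;   // annotation package is an assumption

    @rdf("http://xmlns.com/foaf/0.1/Person")
    public interface Person {

        @rdf("http://xmlns.com/foaf/0.1/name")
        String getName();
        void setName(String name);

        // foaf:knows is multi-valued, so it maps naturally to a Set of Persons.
        @rdf("http://xmlns.com/foaf/0.1/knows")
        Set<Person> getKnows();
        void setKnows(Set<Person> knows);
    }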
The generic architecture of Semantic Web
applications: Elmo API
• ElmoBeans may also implement behaviors.
• Behaviors are concrete or abstract classes that can be used to give
particular implementations of the methods of a concept (in case the
behavior should differ from the default behavior), but can also be
used to add additional functionality.
The generic architecture of Semantic Web
applications: Elmo API
• As a separate package, Elmo also provides ElmoBean representations
for the most popular Web ontologies, including FOAF, RSS 1.0 and
Dublin Core.
• For example, in the FOAF model there is a Person JavaBean with the properties of foaf:Person.
• Getting and setting these properties manipulates the underlying RDF
data.
• This higher level of representation significantly simplifies
development.
The generic architecture of Semantic Web
applications: Elmo API
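• For illustration, a minimal sketch of using the ElmoManager to create a foaf:Person bean, set some of its properties and write the result out as RDF/XML; apart from ElmoManager and Person, the class, package and method names below are assumptions and may differ per Elmo/Sesame release:

    import javax.xml.namespace.QName;
    import org.openrdf.elmo.ElmoManager;
    import org.openrdf.elmo.ElmoModule;
    import org.openrdf.elmo.sesame.SesameManagerFactory;   // assumed package
    import org.openrdf.concepts.foaf.Person;               // pre-generated FOAF concept (assumed package)
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.rio.rdfxml.RDFXMLWriter;
    import org.openrdf.sail.memory.MemoryStore;

    public class ElmoExample {
        public static void main(String[] args) throws Exception {
            // 1. Create the repository.
            Repository repository = new SailRepository(new MemoryStore());
            repository.initialize();

            // 2. All further interaction with the repository content goes through the ElmoManager.
            SesameManagerFactory factory = new SesameManagerFactory(new ElmoModule(), repository);
            ElmoManager manager = factory.createElmoManager();

            // 3. Instantiate the Person JavaBean and set some of its properties
            //    (designate() and the property accessor are assumed names).
            Person person = manager.designate(new QName("http://example.org/people/", "alice"),
                                              Person.class);
            person.getFoafNames().add("Alice Example");

            // 4. Write the repository content out as an RDF/XML document.
            RepositoryConnection con = repository.getConnection();
            con.export(new RDFXMLWriter(System.out));
            con.close();

            manager.close();
            repository.shutDown();
        }
    }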
The generic architecture of Semantic Web
applications: Elmo API
• As we see in this example, after creating the repository all the
interaction with the contents of the repository is encapsulated by the
ElmoManager class, which is used to load and instantiate the
JavaBean.
• After setting some of the properties of the Person instance, we write
it out as an RDF/XML document.
The generic architecture of Semantic Web
applications: Elmo API
• An additional module of the Elmo API, the AugurRepository, can be
used to improve the performance of applications through (predictive)
caching.
• Information read from the repository is cached for further queries.
(Similarly, writes are also cached until the transaction is committed.
The default, however, is automatic commit.)
• Caching also involves predicting the kind of queries the user is likely
to ask and pre-loading the information accordingly.
• When a resource is first accessed, all the properties of that resource are preloaded.
The generic architecture of Semantic Web
applications: Elmo API
• Lastly, Elmo helps developers to design applications that are robust
against incorrect data, which is a common problem when designing
for the Web.
The generic architecture of Semantic Web
applications: Elmo Tools
• Elmo also contains a number of tools to work with RDF data.
• Scutter: a generic RDF crawler that follows rdfs:seeAlso links in RDF
documents, which typically point to other relevant RDF sources on
the web.
• Smusher: To find equivalent instances in large sets of data.
• Instance by Instance
• Property by Property
The generic architecture of Semantic Web applications: GraphUtil
• GraphUtil is a simple utility that facilitates reading FOAF data into the graph object model of the Java Universal Network Graph (JUNG) API.
The generic architecture of Semantic Web applications:
GraphUtil
• GraphUtil can be configured by providing two different queries that define
the nodes and edges in the RDF data.
• These queries thus specify how to read a graph from the data.
• For FOAF data, the first query is typically one that returns the foaf:Person instances in the repository, while the second one returns foaf:knows relations between them.
• JUNG is a Java library (API) that provides an object-oriented representation
of different types of graphs (sparse, dense, directed, undirected, k-partite
etc.)
• JUNG also contains implementations for the most well known graph
algorithms such as Dijkstra’s shortest path.
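• GraphUtil's own configuration calls are not reproduced here; the sketch below illustrates the kind of node and edge queries involved and the JUNG graph they would populate (JUNG 2.x class names assumed). Centrality measures could then be computed on this graph with JUNG's ranker implementations.

    import edu.uci.ics.jung.graph.Graph;
    import edu.uci.ics.jung.graph.UndirectedSparseGraph;

    public class FoafGraphExample {
        // Illustrative queries: the first selects the nodes (persons),
        // the second the edges (foaf:knows relations between them).
        static final String NODE_QUERY = "SELECT P FROM {P} rdf:type {foaf:Person}";
        static final String EDGE_QUERY = "SELECT P1, P2 FROM {P1} foaf:knows {P2}";

        public static Graph<String, String> buildGraph(Iterable<String> persons,
                                                       Iterable<String[]> knows) {
            Graph<String, String> graph = new UndirectedSparseGraph<String, String>();
            for (String person : persons) {
                graph.addVertex(person);
            }
            for (String[] pair : knows) {
                // Edge labels are simply "source->target" for illustration.
                graph.addEdge(pair[0] + "->" + pair[1], pair[0], pair[1]);
            }
            return graph;
        }
    }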
The generic architecture of Semantic Web applications:
GraphUtil
• Various implementations of the Ranker interface allow the computation of various social network measures such as the different variations of centrality.
• The PermanentNodeRanker makes it possible not only to compute node rankings but also to store and retrieve them in an RDF store.
• JUNG provides a customizable visualization framework for displaying graphs.
• Most importantly, the framework lets the developer choose the kind of layout algorithm to be used and allows for defining interaction with the graph visualization.
Flink: the social networks of the Semantic
Web community
• Flink is a system for the extraction, aggregation and visualization of online social networks.
• Flink has been the first system that utilizes semantic technologies for the
purposes of network analysis based on heterogeneous knowledge sources and
has been the winner of the Semantic Web Challenge of 2004.
• Flink, developed by the present author, is a general system that can be
instantiated for any community for which substantial electronic data is available.
• The current, public instantiation of Flink presents the professional work and social connectivity of Semantic Web researchers, i.e. those who have participated in the International Semantic Web Conferences (ISWC) or the Semantic Web Working Symposium of 2001.
Flink: the social networks of the Semantic
Web community
• The information sources are largely the natural byproducts of the
daily work of a community: HTML pages on the Web about people
and events, emails and publications.
• From these sources Flink extracts knowledge about the social
networks of the community and consolidates what is learned using a
common semantic representation, namely the FOAF ontology.
Flink: Features
• Flink takes a network perspective on the Semantic Web community,
which means that the navigation of the website is organized around
the social network of researchers.
• Once the user has selected a starting point for the navigation, the
system returns a summary page of the selected researcher, which
includes profile information as well as links to other researchers that
the given person might know.
• The immediate neighborhood of the social network (the ego-network
of the researcher) is also visualized in a graphical form.
Flink: Features
• The navigation from a profile can also proceed by clicking on the
names of coauthors, addressees or others listed as known by this
researcher.
• In this case, a separate page shows a summary of the relationship
between the two researchers, in particular the evidence that the
system has collected about the existence of this relationship.
• This includes the weight of the link, the physical distance, friends,
interests and depictions in common as well as emails sent between
the researchers and publications written together.
Flink: Features
• The information about the
interests of researchers is
also used to generate a
lightweight ontology of the
Semantic Web community.
• The concepts of this ontology
are research topics, while the
associations between the
topics are based on the
number of researchers who
have an interest in the given
pair of topics.
Flink: Features
• The visitor of the website can also view some basic statistics of the social
network.
• Degree, closeness and betweenness are common measures of
importance or influence in Social Network Analysis, while the degree
distribution attests to a general characteristic of the network itself.
Flink: System Design
• Similarly to the design of most data-driven applications, the architecture of Flink can be divided into three layers concerned with metadata acquisition, storage and presentation, respectively.
Flink: System Design
• Information Collection
• Sources: HTML pages from the web, FOAF profiles from the Semantic Web,
public collections of emails and bibliographic data.
• The web mining component of Flink employs a co-occurrence analysis
technique.
• FOAF is the native format of profiles that we collect from the Web. FOAF
profiles are gathered using the Elmo scutter, starting from the profile of the
author.
Flink: System Design
• Storage and Aggregation
• In our case ontology mapping is a straightforward task: the schemas used are
small, stable, lightweight web ontologies (SWRC and FOAF).
• Their mapping causes little difficulty: such mappings are static and can be manually inserted into the knowledge base.
• An example of such a mapping is the subclass relationship between the swrc:Person and foaf:Person classes, or the subproperty relationship between swrc:name and foaf:name.
• The aggregated collection of RDF data is stored in a Sesame server.
• Besides aggregation, we also use reasoning to enrich the data. For example, we infer foaf:knows relations between the senders and recipients of emails and between the coauthors of publications, as sketched below.
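• A small sketch of these mappings and inferred relations, assuming the Sesame 2.x API; the person URIs are examples:

    import org.openrdf.model.URI;
    import org.openrdf.model.ValueFactory;
    import org.openrdf.model.vocabulary.RDFS;
    import org.openrdf.repository.RepositoryConnection;

    public class AggregationExample {
        static final String SWRC = "http://swrc.ontoware.org/ontology#";
        static final String FOAF = "http://xmlns.com/foaf/0.1/";

        public static void addMappingsAndEnrichment(RepositoryConnection con) throws Exception {
            ValueFactory vf = con.getValueFactory();

            // Static schema mappings: swrc:Person rdfs:subClassOf foaf:Person,
            // swrc:name rdfs:subPropertyOf foaf:name.
            con.add(vf.createURI(SWRC, "Person"), RDFS.SUBCLASSOF, vf.createURI(FOAF, "Person"));
            con.add(vf.createURI(SWRC, "name"), RDFS.SUBPROPERTYOF, vf.createURI(FOAF, "name"));

            // Enrichment: two coauthors (example URIs) are asserted to know each other.
            URI alice = vf.createURI("http://example.org/people/alice");
            URI bob   = vf.createURI("http://example.org/people/bob");
            URI knows = vf.createURI(FOAF, "knows");
            con.add(alice, knows, bob);
            con.add(bob, knows, alice);
        }
    }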
Flink: System Design
• User Interface
• The user interface of Flink is a pure Java web application based on the Model-View-Controller (MVC) paradigm.
• The key idea behind the MVC pattern is a separation of concerns among the
components responsible for the data (the model), the application logic
(controller) and the web interface (view).
• The role of the programmer is to extend this skeletal application with domain
and task specific objects.
• The model objects of Flink are Elmo beans representing persons, publications,
emails etc.
Flink: System Design
• User Interface
• When requests reach the controller, all the beans that are necessary to
generate the page are retrieved from the store and passed on to the view
layer.
• The GraphUtil utility is used again to read the social network from the
repository, which is also handed over to the visualization.
• In the view layer, servlets, JavaServer Pages (JSP) and the Java Standard Tag
Library (JSTL) are used to generate a front-end that hides much of the code
from the designer of the front-end.
• This means that the design of the web interface may be easily changed
without affecting the application and vice versa.
Openacademia: distributed, semantic-based
publication management
• Information about scientific publications is often maintained by
individual researchers.
• Reference management software such as EndNote and BibTeX help
researchers to maintain a personal collection of bibliographic
references.
• Often researchers and research groups also have to maintain a web page about publications for interested peers from other institutes.
• Issue: this is manual work and leads to unnecessary duplication of effort.
openacademia: distributed, semantic-based
publication management
• The openacademia system removes the unnecessary duplication of
effort involved in maintaining personal references and webpages.
• It also solves the problem of creating joined publication lists for
webpages at the group or institutional level.
• At the same time it gives a new way of instantly notifying interested
peers of new works instead of waiting for them to visit the web page
of the researcher or the institute.
• A public openacademia website is available on the Web for general
use, i.e. anyone can submit his own publications to this service.
openacademia: Features
• The possibility to generate an HTML representation of one’s personal
collection of publications and publish it on the Web.
• This requires filling out a single form on the openacademia website,
which generates the code (one line of JavaScript!) that needs to be
inserted into the body of the homepage.
• The code inserts the publication list in the page dynamically and thus
there is no need to update the page separately if the underlying
collection changes.
openacademia: Features
• One can also generate an RSS feed from the collection.
• RSS: RDF Site Summary or Really Simple Syndication.
• RSS is a web feed that allows users and applications to access updates
to websites in a standardized, computer-readable format.
• Adding such an RSS feed to a homepage allows visitors to subscribe to
the publication list using any RSS news reader. Whenever a new
publication is added, the subscribers of the feed will be notified of
this change through their reader.
openacademia: Features
• A number of generic tools are available for reading and aggregating
RSS information, including browser extensions, online aggregators,
news clients and desktop readers for a variety of platforms.
• The RSS feeds of openacademia are RDF-based and can also be consumed by any RDF-aware software such as the Piggy Bank browser extension.
• Piggy Bank allows users to collect RDF statements linked to Web
pages while browsing through the Web and to save them for later
use.
openacademia: Features
• Research groups can install their own openacademia server.
• Members of the research group can submit their publications by
creating a FOAF profile pointing to the location of their publication
collection.
• What the system provides is the possibility to create a unified group
publication list and post it to a website similarly to personal lists.
• There is also an AJAX based interface for browsing and searching the
publication collection.
• Another interactive visualization shows publications along a timeline that can be scrolled using the mouse.
openacademia: System Design
• The architecture of openacademia follows the same design as Flink: in the middle of the architecture is an RDF repository that is filled from a variety of information sources and queried by a number of services.
openacademia: System Design
• The difference lies in the dynamics of the two systems.
• Flink is filled with data every two or three months in a semi-automated
fashion.
• openacademia repositories refresh their content every day automatically.
• In case a publication feed is generated from a single BibTeX or EndNote file, the entire process of filling and querying the repository is carried out on the fly.
• In this case we use an in-memory repository that is discarded after
the answer has been served.
openacademia: System Design
• Information Collection
• For obtaining metadata about publications, we rely on the BibTeX and
EndNote formats commonly in use in academia worldwide.
• We ask authors to include a BibTeX file with their own publications on a
publicly accessible part of their website.
• Further, researchers fill out a form to create a basic FOAF profile, which
contains at least their name and the location of their references.
• If FOAF profiles already exist, we use Elmo crawlers.
• The BibTeX files are translated to RDF using the BibTeX-2-RDF service.
• openacademia also defines the swrc-ext:authorList and swrc-ext:editorList properties, which have rdf:Seq as their range, representing an ordered list of authors or editors, as sketched below.
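• For illustration, a hedged Turtle sketch of such an ordered author list; the swrc-ext namespace URI and the resource URIs are placeholders:

    @prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix swrc-ext: <http://example.org/swrc-ext#> .

    <http://example.org/publications/pub1> swrc-ext:authorList [
        a rdf:Seq ;
        rdf:_1 <http://example.org/people/alice> ;
        rdf:_2 <http://example.org/people/bob>
    ] .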
openacademia: System Design
• Storage and Aggregation
• Issue: heterogeneity (information reused from a distributed, web-based environment) affects both the schema and the instance level.
• The Elmo smusher framework is used to match foaf:Person instances based on name and inverse-functional properties.
• Publications are matched on a combination of properties.
• The instance matches that are found are recorded in the RDF store using the owl:sameAs property (since Sesame does not support OWL, this is implemented with a custom Sesame rule).
openacademia: System Design
• Presentation
• After all information has been merged, the triple store can be queried to produce publication lists according to a variety of criteria, including personal, group or publication facets.
• The online interface helps users to build such queries against the publication repository.
• The queries are processed by another web-based component, the
publication web service.
openacademia: System Design: Presentation
• The query, formulated in the SeRQL language, returns all publications authored by the members of the Artificial Intelligence department in 2004. This department is uniquely identified by its homepage.
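• A sketch of what such a SeRQL query could look like; the department homepage URL and the exact SWRC property names are assumptions for illustration:

    SELECT DISTINCT Pub
    FROM {Pub} swrc:year {"2004"},
         {Pub} swrc:author {Author},
         {Author} swrc:affiliation {Dept},
         {Dept} foaf:homepage {<http://www.example.org/ai/>}
    USING NAMESPACE
         swrc = <http://swrc.ontoware.org/ontology#>,
         foaf = <http://xmlns.com/foaf/0.1/>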
openacademia: System Design: Presentation
• The publish service takes a query like the one shown above, the
location of the repository, the properties of the resulting RSS channel
and optional style instructions as parameters.
• In a single step, it queries the repository, performs the post-
processing and generates an RSS channel with the publications
matching the query.
• The resulting channel appears as an RSS 1.0 channel for compatible
tools while preserving RDF metadata.
• The presentation service can also add XSL stylesheet information to the RSS feed, which makes it possible to generate different HTML layouts.