Term Paper - Devansh Mathur
Submitted by:
Devansh Mathur
B.TECH (CSE) III Semester
CERTIFICATE
This is to certify that Devansh Mathur, a student of B. Tech. in Computer Science & Technology, has carried out the work presented in the term paper entitled "Semantic Digital Library" as a part of the second-year programme of Bachelor of Technology in Computer Science & Technology at Amity School of Engineering and Technology, Amity University Rajasthan, under my supervision.
DATE:
Mr Sanjay Jain
Faculty Guide
ASET, AUR
ACKNOWLEDGEMENT
It has been a great pleasure and a valuable experience for me to work on the project "Semantic Digital Library". I wish to express my indebtedness to Mr Sanjay Jain of the faculty of our Institute, who helped me during the preparation of the manuscript of this text. This work would not have succeeded without his help and precious suggestions. Finally, I warmly thank all my colleagues, whose encouragement helped make the project successful.
Devansh Mathur
Content
Abstract
List of Figures
1. Introduction
2. The Use of Semantic Web in Digital Libraries
2.1. Resource Description Framework
2.2. Web Ontology Language
3. Digital Libraries: Earlier Times
4. Digital Libraries Evolution: Content Sharing
5. The Digital Library Universe
5.1. A Three-tier Framework
5.2. Main Concept
5.3. The Main Roles of Actors
6. Advantages and Disadvantages of Libraries: From Past Till Now
6.1. Of Past Digital Libraries
6.2. Of DL to SDL
6.3. Of SDL to SSDL
6.4. SSDL and the Future
7. Future Prospects
8. Existing Semantic Digital Library Systems
8.1. SMILE
8.2. JeromeDL
8.3. BRICKS
9. Conclusion
10. References
List of Figures
Figure 1: Semantic Web Stack
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7: Evolution of Semantic Digital Libraries
Figure 8: Evolution of Social Semantic Information
ABSTRACT
This term paper presents a detailed study of the Semantic Digital Library. Libraries have always been a source of organized knowledge for various application areas, e.g. teaching. In recent years more and more information has been made available on the Web. High-quality information is often stored in dedicated databases of digital libraries, which are on their way to becoming expanding islands of well-organized information. Digital libraries deliver similar services in a digital context, e.g. e-learning. Libraries have been an important source of information throughout the history of mankind, and they have been present in our societies in different forms. Notably, traditional libraries have found their way onto the desktops of internet users. They have taken the shape of semantic digital libraries, which are accessible at any time and provide a more meaningful search. This paper further discusses social semantic digital libraries, which also incorporate the social and collaborative aspect. We show how semantic web and social networking techniques can help to improve the services of a digital library, present an architecture for a social semantic digital library, and describe various services based on semantic web and social networking technologies.
Introduction
Typical digital libraries usually focus on categorizing and cataloguing resources. Information retrieval in such libraries relies primarily on text search engines and free browsing. This approach has proved useful; however, it suffers from the ambiguity of natural language and neglects the importance of metadata, and it does not engage users in the process of sharing knowledge. Simple searching still returns too many results, which have to be filtered somehow. Page-ranking algorithms help with websites but cannot be easily applied to books or e-learning objects. On the other hand, a look at a friend's bookshelf can give us a much clearer view of what is worth reading in a particular domain than digging through a thousand books or websites published this month. The semantic digital library is an attempt to restore the collaborative approach to sharing knowledge.
The term Digital Library is currently used to refer to systems that are heterogeneous in scope and yield very different functionality. These systems range from digital object and metadata repositories, reference-linking systems, archives, and content administration systems (mainly developed by industry) to complex systems that integrate advanced digital library services (mainly developed in research environments). This overloading of the term Digital Library is a consequence of the fact that there is as yet no agreement on what Digital Libraries are and what functionality is associated with them. This results in a lack of interoperability and of reuse of both content and technologies. This document attempts to put some order in the field for the benefit of its future advancement.
Libraries, together with archives, have always been the primary institutions delegated to collect, manage, preserve and disseminate human knowledge and culture. When advances in computer science made it possible to work with digital rather than printed representations of the documents that capture human knowledge and culture, libraries were particularly involved in exploiting the potential of the digital revolution. Thus "digital libraries" soon became the term for the digital counterpart of traditional libraries. However, digital library systems have evolved greatly since their early appearance. Today they are complex networked systems able to support communication and collaboration among different communities distributed worldwide, dealing with digital objects that comprise not only the digital counterpart of printed documents but also images, video, programs and any other kind of multimedia object a community may define as appropriate to its working and communication needs.
The evolution of digital libraries (DLs) has not been linear, since it draws on contributions from many disciplines. This has created several conceptions of what a DL is, each one influenced by the perspective of the primary discipline of its conceiver(s) or by the concrete needs it was designed to satisfy. As a natural consequence, the history of Digital Libraries, which is now approximately twenty years long, is the history of a variety of different types of information systems that have been called digital libraries. These systems are very heterogeneous in scope and functionality, and their evolution does not follow a single path. In particular, a change has not only meant that a better-quality system was conceived to supersede the preceding ones, but also that a new conception of digital libraries was born in response to newly raised needs. As will be seen, most of the systems dealt with in this history still live on in their original conception, even though not in their original technological solutions. The rest of this chapter goes back over this history, giving an account of past and present understanding of these kinds of systems and of ongoing work in the area. The chapter concludes with a vision of the impact that new DLs are expected to have in the near future.
Many digital objects contain metadata, that is, information embedded in the document that describes it, such as the author, publisher and creation date. This metadata is often written in XML, which gives digital objects a similar framework regardless of which metadata scheme is used within the object. Although XML can define structure within a digital object, it does not convey meaning to computers.
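The limitation can be illustrated with a minimal Python sketch. The two records and their tag names are hypothetical: they describe the same book with the same structure but use different element names, so a program that looks for an author element finds it in only one of them.

```python
import xml.etree.ElementTree as ET

# Two hypothetical metadata records for the same book.
# Both are well-formed XML, but they follow different naming schemes.
RECORD_A = "<book><author>Jane Doe</author><date>2009</date></book>"
RECORD_B = "<item><creator>Jane Doe</creator><issued>2009</issued></item>"


def find_author(xml_text):
    """Return the text of a direct <author> child, or None if there is none."""
    root = ET.fromstring(xml_text)
    node = root.find("author")
    return node.text if node is not None else None


author_a = find_author(RECORD_A)  # the tag name matches
author_b = find_author(RECORD_B)  # <creator> means the same thing, but the parser cannot know that
print(author_a, author_b)
```

Nothing in the XML itself tells the program that creator and author denote the same notion; that shared meaning has to be supplied separately.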
[Figure: example RDF graph with the nodes Person, Researcher and Jeen, connected by the relations subClassOf, domain, range, type and hasSupervisor]
Each part of the triple can be assigned a unique uniform resource identifier (URI), which directs the computer to that specific digital object. As pieces of information are added to this data model, it can be written in XML and transmitted across applications. The RDF schema is the set of rules that defines the classes and properties an RDF document may use. The benefits of using RDF are substantial. As McCathieNevile and Méndez state, "If you give an RDF processor two sets of statements about the same thing (a thing identified by the same URI), it just treats them as one collection of statements." This means that a web of meaning can be created about a specific object, and that this web of meaning can have multiple contributors.
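The merging behaviour described in the quotation can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not an RDF processor: triples are plain tuples, the book and person URIs are hypothetical, and only the Dublin Core namespace is a real vocabulary.

```python
# Each statement is a (subject, predicate, object) triple.
BOOK = "http://example.org/book/42"       # hypothetical URI for one digital object
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace

# Statements contributed by a library catalogue.
library_statements = {
    (BOOK, DC + "title", "Semantic Digital Libraries"),
    (BOOK, DC + "creator", "http://example.org/person/jeen"),
}

# Statements contributed independently by a reader.
reader_statements = {
    (BOOK, DC + "subject", "Semantic Web"),
    (BOOK, DC + "title", "Semantic Digital Libraries"),  # repeats a library statement
}


def merge(*graphs):
    """Treat several sets of statements as one collection (set union)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about one URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge(library_statements, reader_statements)
for predicate, obj in describe(merged, BOOK):
    print(predicate, "->", obj)
```

Because both contributors identify the book by the same URI, their statements combine without any coordination, and the repeated title statement collapses into one.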
There are many digital library projects already making use of these technologies. One of the greatest benefits of the Semantic Web is the ability to relate resources by their meaning rather than their form. This allows for the linking of concepts from different vocabularies, freeing users from the confines of any particular vocabulary.
The best example of the use of ontologies to achieve this goal is the Unified Medical Language System (UMLS) developed by the National Library of Medicine. The UMLS comprises three parts: a Metathesaurus, which contains over one million biomedical concepts from over 100 source vocabularies; a Semantic Network, which defines 133 broad categories and 54 relationships between categories for labelling the biomedical domain; and the SPECIALIST Lexicon and Lexical Tools, which provide lexical information and programs for language processing (National Library of Medicine, 2010). The UMLS is distributed with software and tools that enable a user or organization to use it for research in the biomedical field. The UMLS is essentially a comprehensive ontology of the biomedical field, and it can be used by any digital library that serves users with an interest in that field. It is a clear example of the potential for interoperability created by Semantic Web technologies. The National Library of Medicine is arguably the most respected source of biomedical resources, and its UMLS can be used by any single user or digital library to enhance the quality of their searching or to build a search and retrieval interface.
Another example of a collaborative digital library project using Semantic Web technology is Europeana, an interface for searching the digital resources of Europe's museums, libraries, archives and audio-visual collections (Europeana). Europeana allows users to explore these items and share them through social media outlets. Europeana has also developed a Semantic Searching Prototype, which currently contains items from the Rijksmuseum, the Louvre, and the Netherlands Institute for Art History. The Semantic Search interface is a simple search bar that allows users to search for concepts represented by digital objects from these institutions.
Performing a search for "love" returns 504 items, listed under the following headings: "works showing concept", "works titled", "works showing a more specific concept", "works showing a related concept", "works from type a related concept", and "other". This breakdown gives the user a unique perspective on how a Semantic Web search differs from a traditional search. When the user clicks on a resource, a window pops up with information about the resource, including date, material, relation, subject, title, and type. These fields correspond to many metadata schemes. The user can also follow links to the original institution's page for the item and to a page created by Europeana for that item. The Europeana page allows users to download RDF files containing specific data about the item. The Europeana Semantic Search Prototype shows how Semantic Web technology can be used to integrate resources from multiple sources, to give users access to these resources, and to facilitate the sharing of Semantic Web data for each resource.
There are also many Semantic Web resources that are not directly connected with a traditional digital library but could potentially be used in digital library interfaces. One such resource is the Friend of a Friend project, or FOAF. This project gives users a way to create an RDF file that identifies the user as a unique individual. It is similar to the authority files created in traditional libraries to differentiate between authors with the same name. FOAF files can record a wealth of information about a person. They can be written in RDF/XML by users or generated through social networking sites. Digital libraries could allow users to create profiles and then translate these profiles into FOAF files for use within the digital library interface. Digital libraries should provide users with services beyond simple search and retrieval of items, and allowing users to customize their interface and track their search history is one way to accomplish this goal. A 2009 study by Jiang and Tan attempted to use Semantic Web technology to create user ontologies that help provide personalized information services. Jiang and Tan (2009) explain that a user ontology is "a specialization of domain ontology by assigning each concept and relation of the domain ontology with a specific value for indicating a user's interest."
This user ontology allows retrieved search items to be ranked by user preference. The authors required participants to submit a search query to a system loaded with documents from the ACM digital library. The participants were then asked to browse the top 30 documents retrieved and rank them by relevance. This data was loaded into that user's ontology, based on the concepts included in preferred and non-preferred documents. After the user ontology was created, it was loaded into the search interface. The participants then submitted a new query, and the search system ranked the documents according to the preferences in the user ontology. These results were compared with those of the same system when it was not loaded with the user ontology. The authors found that the system loaded with the user ontologies returned more precise results than the system that was not programmed for user preference. The authors state: "User ontology can be used in many ways to support Semantic Web applications, including document re-ranking, information filtering, and query expansion."
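The re-ranking idea can be sketched as follows. This is only an illustration, not the scoring method used by Jiang and Tan: the additive boost, the interest values, the concept labels and the document identifiers are all assumptions made for the example.

```python
# Hypothetical user ontology: each domain concept carries an interest value.
# Positive values come from preferred documents, negative from non-preferred ones.
user_interest = {
    "ontology": 0.8,
    "information retrieval": 0.3,
    "databases": -0.5,
}

# Hypothetical search results: (document id, base relevance score, concepts).
results = [
    ("doc1", 0.90, ["databases"]),
    ("doc2", 0.70, ["ontology", "information retrieval"]),
    ("doc3", 0.80, []),
]


def personalised_score(result, interest):
    """Base score plus the summed interest of the document's concepts (an assumed formula)."""
    _doc_id, base_score, concepts = result
    return base_score + sum(interest.get(concept, 0.0) for concept in concepts)


def rerank(search_results, interest):
    """Return document ids ordered by personalised score, best first."""
    ordered = sorted(
        search_results,
        key=lambda result: personalised_score(result, interest),
        reverse=True,
    )
    return [doc_id for doc_id, _score, _concepts in ordered]


ranked = rerank(results, user_interest)
print(ranked)
```

Without the user ontology the order would follow the base scores (doc1, doc3, doc2); with it, the document that matches the user's preferred concepts moves to the top.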
The implications of this research are that digital libraries can employ Semantic Web technology to track user preferences in a new way, and can seamlessly integrate these preferences with current search capabilities.
Semantic Web technologies have been integrated with digital libraries to meet the needs of digital library users and the goals of digital library organizations. The greatest asset of the Semantic Web is that it allows for interoperability between organizations and information systems, for even agents that were not expressly designed to work together can transfer data among themselves when the data comes with semantics. Another asset of the Semantic Web is that users can create information about digital objects using any language, classification scheme, or metadata scheme. Once this information has been linked to that object's DOI, the computer will integrate each piece of information scattered throughout the Web to create a more complete record of that object.
Digital librarians should be leading the charge for the creation of RDF schemas and ontologies for digital objects, because they combine traditional library skills such as content representation and cataloguing with technological skills such as metadata creation. Digital libraries must use these technologies to improve all aspects of their organizations.
The ideal digital library would use Semantic Web technology as the backbone of its entire organization. All digital objects contained within its collections would have RDF documents associated with them. The collections would all be pre-loaded with ontologies related to their specific domain. Digital libraries would contain resources from organizations around the world. Users would be able to log in and create profiles that would translate into FOAF files and user ontologies to keep track of semantic preferences. The search and retrieval functions would benefit from the ability to search semantically alongside the traditional search methods the library currently employs. Semantic Web technology essentially teaches the computer the relationships between digital objects, their meaning, and how users would like to interact with them. The technology is already in place, and the ideal digital library is only a few keystrokes away.
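The translation of a library profile into a FOAF file, as proposed in this section, might look like the following sketch. The profile dictionary, its field names and the example URIs are hypothetical; the RDF and FOAF namespace URIs and the properties foaf:name, foaf:mbox and foaf:knows belong to the published vocabularies.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def profile_to_foaf(profile):
    """Serialise a (hypothetical) library user profile as FOAF in RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(
        root, f"{{{FOAF}}}Person", {f"{{{RDF}}}about": profile["uri"]}
    )
    ET.SubElement(person, f"{{{FOAF}}}name").text = profile["name"]
    ET.SubElement(
        person, f"{{{FOAF}}}mbox", {f"{{{RDF}}}resource": "mailto:" + profile["email"]}
    )
    # Each acquaintance is linked by URI, which is what lets profiles form a network.
    for friend_uri in profile.get("knows", []):
        ET.SubElement(person, f"{{{FOAF}}}knows", {f"{{{RDF}}}resource": friend_uri})
    return ET.tostring(root, encoding="unicode")


# Hypothetical profile as a digital library might store it.
profile = {
    "uri": "http://library.example.org/users/jeen#me",
    "name": "Jeen",
    "email": "jeen@example.org",
    "knows": ["http://library.example.org/users/anna#me"],
}
foaf_xml = profile_to_foaf(profile)
print(foaf_xml)
```

Because every person is identified by a URI, the resulting file can be merged with statements about the same user published elsewhere.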
memex and there amplified). Licklider realized that computers were becoming powerful enough to support the type of automated library systems that Bush had described, and in 1965 he wrote his book about how a computer could provide an automated library with simultaneous remote use by many different people through access to a common database. Because of this, Licklider is also considered a pioneer of the Internet, and in his book he established the connection between the Internet and the digital library. Thus it is not surprising that research and development activity on digital libraries started in the early 1990s, with the proliferation of the Internet, and that the Internet has created unprecedented possibilities for discovering and delivering human knowledge.
The first systems delivering knowledge artefacts in digital form can essentially be seen as archives of digital texts accessible through a search service and implemented by a centralized metadata catalogue. With very few exceptions, the early systems of this kind were built on a rather simple architecture. This worked to the advantage of their diffusion and adoption by different scientific communities. Besides arXiv, significant examples of such early systems were archives of various types, such as Electronic Theses and Dissertations repositories (ETDs), whose pilot project started in 1996, and archives of cognitive science papers and of research papers in economics, both launched in 1997. The first of these was a system offering services for submitting, browsing and searching electronic theses in PDF format. The availability of this product stimulated the creation of the Networked Digital Library of Theses and Dissertations, an international organization, still operational, which registers and keeps track of ETDs. CogPrints was initially conceived as a repository allowing the cognitive science community to self-archive their papers. It now contains more than 3,000 artefacts dating from 1950 onwards.
In 2000 it was made compliant with the protocol defined by the Open Archives Initiative, and its software was then converted into the EPrints Digital Repository Software (EPrints, n.d.), a flexible platform supporting easy and fast setup of repositories of open access research outputs. Because of its simplicity, EPrints is currently widely used: more than 250 repositories have declared that they rely on it. Similarly, RePEc was initially conceived as an open repository of electronic papers in a specific domain. In 1997 Thomas Krichel, principal investigator of the RePEc Project, illustrated the principles underlying a newly released version of this system by affirming: "Distributed archives should offer metadata about digital objects (mainly working papers); the data from all archives should form one single logical database despite the fact that it should be held on different servers; users could access the data through many interfaces; providers of archives should offer their data to all interfaces at the same time." With these statements, Krichel anticipated a view that would largely emerge a few years later. These systems, all still living in more recent and enhanced versions, represent very embryonic forms of digital libraries. In fact, their functionality is essentially confined to (self-)publishing of simple information objects and discovery of these information objects through rudimentary search and browse facilities.
The Digital Library Initiative (DLI) consisted of two major competitive funding programs. The first started in 1994 and funded six research projects (chosen from 73 proposals) over a four-year period (Schatz and Chen, 1996), while the second phase extended the research carried out during the first by including content providers, so as to guarantee the availability of real test beds for validating research outcomes. However, the DLI-funded projects were not the only ongoing efforts (CACM, 1995), even if they were very innovative because they focused on future technological problems. The six projects funded by DLI phase one included: the California Environmental Digital Library (Wilensky, 1995), focused on developing the technologies to access large, distributed collections of photographs, satellite images, videos, maps, documents, and multivalent documents and to support work-centred digital information services (Wilensky, 1996); the Alexandria Digital Library (Smith and Frew, 1995), focused on building an online, distributed digital library for geo-referenced information, including maps, aerial photographs, satellite imagery, and catalogue records, and on supporting geographically defined queries (Smith, 1996); and the Informedia Digital Video Library (Christel, Kanade, Mauldin, et al., 1995), focused on establishing a large, online digital video collection with full-content and knowledge-based search and retrieval (Wactlar, Kanade, Smith and Stevens, 1996). Although none of these systems still exists as a running service, the solutions proposed, the technology developed and the resources collected and built have been widely used by more complex DLs developed later. It is well known that one of the most important success stories resulting from these projects is Google: Page and Brin started working on their search engine while they were PhD students at Stanford working on the Stanford Digital Library Project.
In fact, the merit of the Digital Library Initiative goes far beyond the specific work that it funded: it gave shape to digital libraries as a new research discipline. Research on digital library topics was not new, but it had been fragmented across many disciplines. This program led to conferences, publications and research teams explicitly interested in doing research on digital libraries. Moreover, it gave direction to the overall movement toward a practical research field (Arms, 2001).
As anticipated, in Europe the scene was characterised by the existence of the DELOS initiatives. In addition, it was intended to develop new models for intelligent audiovisual content-based searching and film-sequence retrieval, new video abstracting tools, and user interfaces specifically tailored to the new functionality. The provision of multilingual services and cross-language retrieval tools was also addressed. Another project, An Integrated Art Analysis and Navigation Environment (ARTISTE) (Allen, Vaccari and Presutti, 2000), focused on giving providers, publishers, distributors, rights protectors and end users of art image information, as well as the multimedia information market as a whole, a more efficient system for storing, classifying, linking, matching and retrieving art images. This environment provided, for example, automatic extraction of metadata based on iconography, painting style, etc.; content-based navigation for art documents; distributed linking and searching across multiple archives while allowing ownership of data to be retained; and storage of art images using large multimedia object-relational databases.
According to these principles, the functionality of a digital library system is made available in the form of distinct functional units, each exposing its operational semantics through an open protocol; digital library systems are compositions of these functional units, and new functionality can be added through the implementation of value-added services, which interact with existing ones using established protocols; and the components (and content) of a digital library can be spread over the global Internet but should be presented to the user as a single system.
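These principles can be pictured with a small Python sketch. All class names, the request format and the sample titles are hypothetical; the sketch only shows units that share one interface, a value-added unit built on an existing one, and a facade that presents them as a single system. Distribution over the network is left out.

```python
from typing import Protocol


class FunctionalUnit(Protocol):
    """The open protocol every unit exposes (a hypothetical interface)."""

    name: str

    def handle(self, request: dict) -> dict: ...


class SearchUnit:
    """Basic unit: substring search over a list of titles."""

    name = "search"

    def __init__(self, titles):
        self.titles = titles

    def handle(self, request):
        term = request["query"].lower()
        return {"hits": [t for t in self.titles if term in t.lower()]}


class CountUnit:
    """Value-added unit: it uses another unit only through the shared protocol."""

    name = "count"

    def __init__(self, search: FunctionalUnit):
        self.search = search

    def handle(self, request):
        return {"count": len(self.search.handle(request)["hits"])}


class DigitalLibrary:
    """Composition of units, presented to the user as one system."""

    def __init__(self, *units: FunctionalUnit):
        self.units = {unit.name: unit for unit in units}

    def call(self, unit_name, request):
        return self.units[unit_name].handle(request)


search = SearchUnit(
    ["Semantic Web Primer", "Digital Libraries", "Semantic Digital Libraries"]
)
library = DigitalLibrary(search, CountUnit(search))
hits = library.call("search", {"query": "semantic"})["hits"]
count = library.call("count", {"query": "semantic"})["count"]
print(hits, count)
```

Adding a new service means registering one more unit with the library; the existing units do not change.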
highly functional level of interoperability among scholarly e-print archives and to the establishment of the Open Archives Initiative (Van de Sompel, H., Lagoze, C., 2000). The meeting started by discussing a concrete example of interoperability implemented through the UPS Prototype (Van de Sompel, H., Krichel, T., Nelson, M.L., 2000) and recognising its potential. The UPS prototype demonstrated the integrated action of a variety of services operating over data originating from a set of archives. Each of those services provided a reasonably rich level of functionality (accessible through a set of protocol methods). The participants recognised that trying to reach consensus on the full functionality of the prototype was aiming too high, and that a proper degree of modesty in the approach to integration, capable of balancing the cost of participation against the need for adequate functionality, was mandatory. The Santa Fe Convention identified two key roles in participating institutions: data providers and service providers. Data providers were in charge of handling the depositing and publishing of resources in a repository and of exposing for harvesting the metadata (what they called a record) about resources in the repository. They were the creators and keepers of the metadata and of the repositories of resources. Service providers were in charge of harvesting metadata from data providers for the purpose of providing one or more services over the collected data. The types of services that might be offered included a search interface, a peer-review system, etc. The cooperation between content and service providers was regulated by a protocol, initially defined as a subset of the Dienst protocol and nowadays known as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
In the US, the National Science Foundation funded the National Science Digital Library (NSDL) (Zia, L.L., 2001) with the aim of providing organized access to high-quality resources and tools that support innovations in teaching and learning at all levels of science, technology, engineering, and mathematics education. Large-scale initiatives of this kind, devoted to aggregating in a single place knowledge that is spread across a plethora of archives and systems, will always exist for a series of reasons, including the existence of various (institutional) repositories and the ever-growing multidisciplinary nature of our society. In particular, TEL and DARE anticipated important initiatives, namely Europeana and DRIVER respectively, which were launched a few years later.
Europeana is a Thematic Network funded by the European Commission under the eContentplus programme, as a part of the i2010 initiative. Europeana began in July 2007. Originally known as the European digital library network (EDLnet), it is the result of a partnership of 100 representatives of heritage and knowledge organisations and IT experts from throughout Europe. The objective of Europeana is to provide access to Europe's cultural and scientific heritage through a cross-domain portal. The first Europeana prototype, launched in November 2008, provided a simple search and retrieval facility over an information space of approximately two million digital objects selected from Europe's museums, libraries, archives and audio-visual collections, harvested through the OAI-PMH protocol. The first production-quality version of Europeana (called Rhine) will go live in July 2010, to be followed in April 2011 by a more sophisticated version (Danube), including more content and offering a richer set of functionality. The intention is that by 2010 the Europeana portal will give everybody direct access to well over 6 million digital sounds, pictures, books, archival records and films. Moreover, Europeana's goal is to realize a system serving very different types of users. It should meet the occasional curiosity of generic users as well as the information needs of school children and students.
It should also provide academic students and teachers with certified information and the possibility to export information for courses, as well as offer expert researchers and professionals the possibility of searching, verifying and annotating information and of using ad hoc services. In the context established by Europeana, aggregators are a special type of provider, i.e. specialised DLs that act as collectors of content from other providers. For instance, Culture.fr is the largest aggregator, providing content from about 480 organizations in France, including the Louvre and the Musée d'Orsay. The information resources that populate Europeana's information space are harvested as surrogates of the original objects, which are located at the content providers' sites. Since surrogates may also contain elements of the original object (table of contents, full-text index items, music and video abstractions, etc.), a very interesting new feature of Europeana is that it will also deliver digital objects besides metadata. Clearly, heterogeneity and interoperability are the main issues that such a DL has to deal with, together with scalability, quality of service and, more generally, the sustainability of the joint portal.
DRIVER is another notable example of a DL that relies on content provided by a large number of external data providers. It is the result of two subsequent projects funded by the European Commission in the period 2006-2009. The main aim of these two projects is to create the organisational and technological conditions for the setup of a European Repository Infrastructure (Jones, S., Manghi, P., 2009). The main instrument identified by the project to address organisational issues is the DRIVER Confederation.
The Confederation partners represent European and international repository communities, such as subject-based communities, repository system providers and service providers, as well as political, research, and funding organisations, who share the DRIVER vision of allowing all research institutions in Europe and worldwide to make all their research publications openly accessible through institutional repositories.
In the spirit of this shared goal, the DRIVER Confederation encourages a combined effort of repository development by setting up guidelines and best practices that favour the realization of a shared, trusted, long-term repository infrastructure. From the technical point of view, DRIVER is based on the D-Net technology. This enabling technology is quite innovative in the context of these kinds of aggregative systems because it is oriented to the realisation of a digital library infrastructure (cf. Sec. 5). D-Net is based on a service-oriented architecture, where distributed and shared resources are implemented as standard Web Services and applications consist of sets of interacting services. It offers services both to data providers, which can use it to share their content more easily, and to service providers, which it helps in implementing DLs that exploit the aggregated content. At the time of this writing, the DRIVER service provides access to approximately one million records from 200+ repositories across 27 countries. Moreover, it delivers three DL applications: the Belgium national repository portal, offering search over the Belgium Repository Federation subset; the Recolecta national repository portal, offering search over the Spanish Repository Federation subset; and the main DRIVER portal, providing access and advanced functionality over the whole space. The current Europeana and DRIVER services operate on an information space of metadata records, i.e. they harvest metadata records through the OAI-PMH protocol from existing repositories and then run their services by exploiting this content.
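The harvesting step performed by such service providers can be sketched in Python. The sketch makes no network call: the repository address and the sample response are hypothetical and heavily trimmed, while the ListRecords verb, the oai_dc metadata prefix and the namespace URIs come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a service provider sends to a data provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hypothetical, trimmed response from a data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical working paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(xml_text):
    """Extract (identifier, titles) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    harvested_records = []
    for record in root.iter(OAI + "record"):
        identifier = next(record.iter(OAI + "identifier")).text
        titles = [title.text for title in record.iter(DC + "title")]
        harvested_records.append((identifier, titles))
    return harvested_records


url = list_records_url("http://repository.example.org/oai")
harvested = harvest(SAMPLE_RESPONSE)
print(url)
print(harvested)
```

The service provider receives only descriptive records of this kind; the objects they describe stay at the data provider.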
Because of this, they suffer from the limitations that OAI-PMH poses when it is used to exchange information objects that are rich in structure and payload, such as those at the core of the changing nature of scholarship and scholarly communication (Van de Sompel, H., Payette, S., 2004) (Van de Sompel, H., Lagoze, C., 2006). In particular, when feasible, they give access to the content associated with the metadata by exploiting a URL or some other information contained in the record. This solution for accessing information objects, however, suffers from two main problems: (i) access is not always feasible, since there is no standard protocol for accessing objects; (ii) there is no way of accessing compound objects, since the structure and the relations holding among the different parts are unknown. A solution to this problem may come from the OAI-ORE standard, whose version 1.0 was released in October 2008 by the Open Archives Initiative. This standard, based on Web standards, proposes a solution for handling aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources of multiple media types, including text, images, data and video, to form innovative research outcomes. Both Europeana and DRIVER have already planned to move very soon to technologies such as OAI-ORE to manage compound objects.
All the systems and initiatives described in this section are essentially oriented to content sharing. Moreover, the majority of them are characterised by a strong organisational effort, since the model is based on the cooperative participation of the content providers. Content sharing across digital libraries is now being widely promoted as an important strategy for reducing digital library set-up costs, which come largely from selecting, digitising, describing and digitally curating content resources. However, the realisation of wide and generalised content sharing is still problematic today, owing to the great variety of proprietary models and ontologies adopted by existing systems and to the lack of a systematic approach to interoperability. A recently funded EC project stemming from the DELOS project is paving the way for the future interoperability of DL systems, thus making the implementation of global digital library infrastructures feasible.
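The metadata harvesting on which Europeana and DRIVER rely can be illustrated with a minimal sketch. It builds an OAI-PMH ListRecords request and extracts identifiers, titles and content links from a response. The repository address repository.example.org, the embedded sample response and the helper names list_records_url and parse_records are assumptions made purely for illustration; only the verb and metadataPrefix parameters and the XML namespaces come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical response from an imaginary repository (not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article</dc:title>
          <dc:identifier>http://repository.example.org/article/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dictionary per harvested metadata record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            # The only route to the content itself is a link in the metadata.
            "links": [e.text for e in rec.findall(".//dc:identifier", NS)],
        })
    return records
```

The harvested record carries a flat description and, at best, a link to the content. Nothing in it says how the parts of a compound object relate to each other, which is the limitation that aggregation standards such as OAI-ORE are meant to address.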
These three system notions are often confused and used interchangeably in the literature. This terminological imprecision has produced a plethora of heterogeneous entities and contributes to making the description, understanding and development of digital library systems difficult.
As the figure indicates, all three systems play a central and distinct role in the digital library development process. The explicit definitions that follow may help to clarify their differences and their individual characteristics.
Digital Library (DL): A potentially virtual organisation that comprehensively collects, manages and preserves rich digital content for the long term, and offers its targeted user communities specialised functionality on that content, of defined quality and according to comprehensive codified policies.
Digital Library System (DLS): A deployed software system that is based on a possibly distributed architecture and provides all facilities required by a particular Digital Library. Users interact with a Digital Library through the corresponding Digital Library System.
Digital Library Management System (DLMS): A generic software system that provides the appropriate software infrastructure both (i) to produce and administer a Digital Library System incorporating the suite of facilities considered fundamental for Digital Libraries and (ii) to integrate additional software offering more refined, specialised or advanced facilities.
Although the concept of Digital Library is intended to capture an abstract system that consists of both physical and virtual components, the Digital Library System and the Digital Library Management System capture concrete software systems. For every Digital Library there is a unique Digital Library System in operation (possibly consisting of many interconnected smaller Digital Library Systems), whereas all Digital Library Systems are based on a handful of Digital Library Management Systems. The Digital Library is thus the abstract entity that 'lives' thanks to the software system constituting the DLS, and the DLMS is the software system conceived to support the life cycle of one or more DLSs.
It is important to note that all these concepts underlie all types of information environments and systems, e.g. databases, hospital information systems, banking systems, the Web, Wikipedia, etc. However, it is the particular characterisations given in the definitions above that distinguish digital libraries from the others: the content should be rich, annotated and preserved for the long term; the user communities should be targeted; the functionality should be specialised; and the quality should be measured and in accordance with comprehensive policies. All of these characterisations are, of course, abstract and subject to interpretation, so they cannot lead to a precise formal definition. Nevertheless, they offer conceptual yardsticks by which systems can be measured and mutually compared, and by which psychological lower bounds can be established regarding the nature of digital libraries.
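The relationship between the three systems can be pictured with a small data model. This is only a sketch under stated assumptions: the class names mirror the definitions above, while the produce method, the default component names and the example library names are hypothetical and do not describe any existing DLMS.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalLibrary:
    """The abstract organisation perceived by its users."""
    name: str


@dataclass
class DigitalLibrarySystem:
    """Deployed software that realises exactly one Digital Library."""
    library: DigitalLibrary
    components: list = field(default_factory=list)


@dataclass
class DigitalLibraryManagementSystem:
    """Generic software used to produce and administer one or more DLSs."""
    name: str
    # Hypothetical stand-ins for the suite of fundamental facilities.
    base_components: tuple = ("repository", "search", "user management")
    systems: list = field(default_factory=list)

    def produce(self, library, extra_components=()):
        """Create a DLS for the given library from base and extra components."""
        components = list(self.base_components) + list(extra_components)
        dls = DigitalLibrarySystem(library, components)
        self.systems.append(dls)
        return dls


dlms = DigitalLibraryManagementSystem("ExampleDLMS")
dls_a = dlms.produce(DigitalLibrary("Library A"))
dls_b = dlms.produce(DigitalLibrary("Library B"), ["annotation"])
```

One DLMS instance here supports two DLSs, each bound to a single Digital Library, which reflects the statement that many Digital Library Systems are built on a handful of Digital Library Management Systems.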
Main Concepts
Despite the great variety and diversity of existing digital libraries, there is a small number of fundamental concepts that underlie all systems. These concepts are identifiable in nearly every digital library currently in use. They serve as a starting point for any researcher who wants to study and understand the field, for any system developer intending to construct a digital library, and for any content provider seeking to expose its content via digital library technologies. In this section, we identify these concepts and briefly discuss them. Seven core concepts provide a foundation for digital libraries. One of them appears in the definition of digital library to capture the commonalities between this universe and other social arrangements: Organisation.
Five of them appear in the definition of digital library to capture the features characterising this kind of organisation and the expected service: Content, User, Functionality, Quality and Policy. The seventh one emerges in the definition of the digital library system to capture the systemic features underlying the expected service: Architecture. All seven concepts influence the digital library three-tier framework, as shown in the figure.
Organisation: The Organisation concept surrounds the entire Digital Library universe. A digital library is an organisation in its own right: it is a social arrangement pursuing a well-defined goal (the digital library service). This concept subsumes the mission the digital library has been conceived for and every other aspect needed to define this mission and the operation of the resulting service. However, it should not be confused with the organisation or institution that decided to set up the digital library and drives its development, although there are overlaps and dependencies between the two. The dependency relationship between the two is easy to recognise: to some extent the institution sets the scene for the digital library organisation. The institution is the establisher of the Digital Library Organisation and has the power to define the overall service this organisation is requested to realise. However, the digital library, being an organisation in its own right, has the power to control its own behaviour and evolution within the frame defined by the institution. This concept is fundamental to characterising the digital library universe because it highlights the commonalities between this universe and other organised bodies of people having a particular purpose.
Content: The Content concept encompasses the data and information that the Digital Library handles and makes available to its users. It is composed of a set of information objects organised in collections. Content is an umbrella concept used to aggregate all forms of information objects that a digital library collects, manages and delivers. It encompasses a diverse range of information objects, including primary objects, annotations and metadata. This concept is fundamental to characterising the digital library universe because it captures one of the major resources these organisations are called on to manage, i.e. the data and information made available through them.
User: The User concept covers the various actors (whether human or machine) entitled to interact with Digital Libraries. Digital Libraries connect actors with information and support them in their ability to consume it and make creative use of it to generate new information. User is an umbrella concept including all notions related to the representation and management of actor entities within a digital library. It encompasses such elements as the rights that actors have within the system and the profiles of the actors, with characteristics that personalise the system's behaviour or represent these actors in collaborations. This concept is fundamental to characterising the Digital Library universe because it captures the actors of the overall Organisation.
Functionality: The Functionality concept encapsulates the services that a Digital Library offers to its different users, whether individual users or user groups. While the general expectation is that Digital Libraries will be rich in functionality, the bare minimum of functions includes the registration of new information objects, search and browse. Beyond that, the system seeks to manage the functions of the Digital Library to ensure that the overall service reflects the particular needs of the Digital Library's community of users and/or the specific requirements related to its Content. This concept is fundamental to characterising the digital library universe because it captures the facilities offered by the overall organisation.
Policy: The Policy concept represents the set or sets of conditions, rules and terms governing every single aspect of the digital library service, including acceptable behaviour, digital rights management, privacy and confidentiality, charges to users, and collection formation. Policies may be defined within the digital library, be imposed by the institution establishing the digital library, or come from outside it (e.g. policies governing our society). These policies can be extrinsic or intrinsic. This concept is fundamental to characterising the Digital Library universe because it captures the rules and conditions regulating the overall Organisation.
Quality: The Quality concept represents the parameters that can be used to characterise and evaluate the overall service of a Digital Library including every aspect of it, i.e. Content, User, Functionality, Policy, Quality, and Architecture. Quality can be associated not only with each class of content or functionality but also with specific information objects or services. Some of these parameters are quantitative and objective in nature and can be measured automatically, whereas others are qualitative and subjective in nature and can only be measured through user evaluations (e.g., focus groups). This concept is fundamental to characterise the Digital Library universe because it captures qualitative aspects characterising the Organisation.
Architecture: The Architecture concept refers to a Digital Library System and represents a mapping of the overall service offered by a Digital Library (and characterised by Content, User, Functionality, Policy and Quality) onto hardware and software components. There are two primary reasons for having Architecture as a core concept: (i) Digital Libraries are often assumed to be among the most complex and advanced forms of information systems (Fox & Marchionini, 1998); and (ii) interoperability across Digital Libraries is recognised as a major challenge. A clear architectural framework for Digital Library Systems offers ammunition for addressing both of these issues effectively. This concept is fundamental to characterising the Digital Library universe because it captures the systemic part of the service offered by the Organisation.
The concepts populating the areas just introduced (Organisation is a special case, since it subsumes all the rest) share many similar characteristics, and all refer to internal entities of a Digital Library that can be sensed by the external world. Therefore, a higher-level concept referring to all of these has also been introduced, i.e. Resource, which enables us to reason about their common characteristics in a consistent manner.
The figure puts the main concepts of the Digital Library universe in perspective. The Organisation concept surrounds and subsumes all the other concepts. Among the remaining six, two are independent of each other, i.e. they exist independently of a specific Digital Library. These are User, representing the external humans or hardware interacting with the Digital Library, and Content, representing the material handled by the Digital Library. Architecture, the technological design on which the Digital Library System is based, represents the underlying technology that is called on to implement all the rest. On top of these concepts comes Functionality, primarily representing the means for connecting User to Content, i.e. all procedures, transformations, actions and interactions that bring Content to User or vice versa. Finally, the operation of the Digital Library and the activation of its Functionality are based on Policy and aim to achieve a certain Quality.
In order to describe how a Digital Library Organisation is expected to work, it is fundamental to identify the main roles that actors can play while interacting with the digital library systems previously identified, and their relations with the six core concepts (Content, User, Functionality, Quality, Policy and Architecture) characterising this kind of Organisation. These roles are discussed in the next section.
As shown in the figure, each role is primarily associated with one of the three systems in the three-tier framework. The system a role is associated with represents the entity that is expected to provide the actor playing that role with the facilities needed to accomplish the mandate assigned to the role. Moreover, every actor, independently of the role he or she is playing, is expected to deal with all the foundational concepts characterising the Digital Library universe.
DL End-users: DL End-users exploit the overall Digital Library service for the purpose of providing, consuming and managing the DL. They are the target clients of the service defined by the DL Organisation in terms of the Content to be managed, the User(s) to be served, the Functionality to be supported, the Policy(ies) to be put in place and the Quality to be exposed. They perceive the DL as a stateful entity serving their needs. This state of the Digital Library is a complex condition resulting from, and influencing, the Content, User, Functionality, Policy and Quality aspects of the DL Organisation. Moreover, the state is expected to evolve during the lifetime of the Digital Library as a consequence of actions and activities performed in the context of the DL Organisation, as well as of external factors influencing the DL Organisation. DL End-users may be further divided into Content Creators, Content Consumers and Digital Librarians.
Content Creators are the producers of the Digital Library Content, i.e. they take care of producing new items contributing to the Digital Library Content. Their activity is performed (i) through the Functionality the DL is provided with, (ii) in accordance with the Policies defined in the DL, and (iii) with the guarantee of Quality the DL declares.
Content Consumers are the clients of the Digital Library Content, i.e. they access and use the items in the Digital Library Content. Their activity is performed (i) through the Functionality the DL is provided with, (ii) in accordance with the Policies defined in the DL, and (iii) with the guarantee of Quality the DL declares.
Digital Librarians are the curators of the Digital Library Content, i.e. they select, organise and look after the items in the Digital Library Content. Their activity is performed (i) through the Functionality the DL is provided with, (ii) in accordance with the Policies defined in the DL, and (iii) with the guarantee of Quality the DL declares. Moreover, they may influence the behaviour of the overall Digital Library service by acting as mediators between its final clients, i.e. Content Creators and Content Consumers, and those defining and operating the service, i.e. DL Managers, by distilling and elaborating feedback on the DL.
DL Managers: DL Managers are the actors driving the overall Digital Library service. They are expected to rely on the facilities offered by the DLMS to define and operate the Digital Library and the DLS implementing it. DL Managers may be further divided into DL Designers and DL System Administrators. The former are called on to devise the overall service, while the latter are called on to deploy and operate the DLS implementing the planned service.
DL Designers exploit their knowledge of the application environment that a DL is called to serve in order to define, customise, and maintain the Digital Library so that it is aligned with the needs of its target DL End-users.
To perform this task, the DL Designers interact with the DLMS to decide upon the characteristics the Digital Library should have in terms of (i) Content, e.g. the set of repositories, ontologies, classification schemas, information object types, metadata formats, authority files and gazetteers that form the DL Content; (ii) User, e.g. the allowed actors, the allowed roles, the information characterising the actors; (iii) Functionality, e.g. the functional facilities to be offered, the behaviour these facilities should implement; (iv) Policy, e.g. the rules and principles governing the evolution of the DL Content, the allowed actions per actor or family of actors, the exploitation of a resource; (v) Quality, e.g. the minimal availability of DL Functionality, the minimal response time of DL Functionality, the completeness and authoritativeness of the DL Content, the confidentiality of the User actions. These aspects characterise the overall Digital Library service, and in particular the way it is perceived by the DL End-users.
These parameters need not necessarily be fixed for the entire lifetime of the DL; they may be reconfigured to enable the DL to respond to the evolving expectations of target users and changes in all aspects.
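The five aspects a DL Designer decides upon, and the fact that they can be reconfigured later, can be sketched as a small configuration object. Everything in this sketch is hypothetical: the field values, the quality keys and the unsupported_actions check are illustrations of the idea, not features of an actual DLMS.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DLConfiguration:
    content: tuple        # e.g. metadata formats or repositories
    user: tuple           # allowed roles
    functionality: tuple  # facilities offered by the DL
    policy: dict          # role -> actions that role may perform
    quality: dict         # targets such as availability


def unsupported_actions(config):
    """List (role, action) pairs the policy allows but no facility offers."""
    offered = set(config.functionality)
    return sorted(
        (role, action)
        for role, actions in config.policy.items()
        for action in actions
        if action not in offered
    )


# Hypothetical initial design of a Digital Library.
initial = DLConfiguration(
    content=("oai_dc",),
    user=("content_creator", "content_consumer"),
    functionality=("search", "browse"),
    policy={
        "content_creator": ("search", "register"),
        "content_consumer": ("search", "browse"),
    },
    quality={"min_availability": 0.99},
)

# Reconfiguration produces a new configuration; the old one is untouched.
revised = replace(initial, functionality=("search", "browse", "register"))
```

Keeping the configuration immutable and deriving a revised copy is one possible design choice: it makes each stage in the evolution of the DL explicit and lets a designer compare what changed between two designs.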
DL System Administrators: DL System Administrators work in tandem with DL Designers to put in place the Digital Library System implementing the planned Digital Library service.
They select, deploy and manage a set of networked computers and software modules needed to fulfil the expectations that DL End-users and DL Designers have for the Digital Library.
DL System Administrators perform their tasks by interacting with the DLMS and relying on the facilities these systems offer for DLS constituents' identification, linking, allocation, deployment, configuration, tuning, monitoring, alerting, and any other management facility required to manage potentially distributed software systems, as DLSs are expected to be. Different DLMSs are expected to offer diverse management facilities, ranging from manual installation and configuration of the computers and of the software modules on the target computers to fully autonomic solutions aiming at reducing human intervention to a few corner cases.
DL Software Developers: DL Software Developers develop and/or customise the software components that will be used as constituents of the DLSs. They are requested to produce the software implementing every aspect of the Digital Library service, ranging from the DL Content and User to Functionality, Policy and Quality. However, DL Software Developers should not start from scratch, and their activity is expected to be performed by relying on the offering of a DLMS. In fact, a DLMS is a software system equipped with a set of off-the-shelf software modules implementing, to some extent, some Digital Library facilities, e.g. content repositories, user management systems, cooperative working environments, information retrieval engines and policy enforcement modules.
DL Software Developers include Software Engineers and Programmers who are requested to customise and complement the set of software modules provided by the DLMS in use, so as to obtain the set of software constituents needed to implement the planned Digital Library. The three roles described above encompass the entire spectrum of actors working in the digital libraries universe. Their conceptual models of this universe are linked together in a hierarchical way, as shown in Figure I.4-2.
This hierarchy is a direct consequence of the above definitions, since DL End-users act on the Digital Library, whereas DL Managers and DL Software Developers operate on the DLS (through the mediation of a DLMS) and, consequently, on the DL as well. This inclusion relationship ensures that cooperating actors share a common vocabulary and knowledge. For instance, the DL End-user expresses requirements in terms of the DL model and, subsequently, the DL Designer understands these requirements and defines the DL accordingly.
Advantages
People can access required information at any time of the day, as long as they have access to the internet.
Disadvantages
Searching is not efficient, as it may not return meaningful data in response to the user's query. In many cases, access to certain information is limited by copyright law.
Data is static; therefore, no users can contribute their views or share their knowledge with other participants.
Digital libraries should explore Semantic Web technologies to meet their organisational challenges. Five key challenges facing digital libraries have been identified on the basis of a digital libraries workshop: interoperability; description of objects and repositories; collection management and organisation; user interfaces and human-computer interaction; and economic, social and legal issues. Digital libraries can use Semantic Web technology to help meet all of these challenges.
FROM DL TO SDL
Following the advent of digital libraries, another innovative step was taken, aimed at making search more meaningful and direct. Essentially, it was concerned with avoiding the habit of searching for everything everywhere. The growth of Web 2.0 has given way to new methods of accessing information and contributing opinions. Notably, semantic digital libraries enable the user to obtain the intended information about an object even when the exact word does not appear in the search. This integrated form of information is based on different metadata, which provide more meaningful data. These libraries also tend to provide better and more convenient browsing interfaces.
Advantages
Semantic Digital Libraries make it easier to find information in the vast ocean of available data. This is facilitated by ontology-based search and facet search.
Access is not confined to only one digital library; to the contrary, it provides a mechanism of interoperability between different systems.
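The ontology-based search and facet search named above can be illustrated with a minimal sketch. It assumes a toy ontology that stores only "broader term" links and a handful of invented catalogue records; the names BROADER, RECORDS, ontology_search and facet_counts are hypothetical and are not taken from any particular semantic digital library.

```python
# Toy ontology: narrower term -> broader term (hypothetical vocabulary).
BROADER = {
    "rdf": "semantic web",
    "ontology": "semantic web",
    "semantic web": "computer science",
}

# Invented catalogue records, each annotated with one subject term.
RECORDS = [
    {"title": "An Introduction to RDF", "subject": "rdf", "type": "article"},
    {"title": "Building Ontologies for Libraries", "subject": "ontology", "type": "book"},
    {"title": "Medieval Manuscripts", "subject": "history", "type": "book"},
    {"title": "Compiler Design", "subject": "computer science", "type": "book"},
]


def narrower_terms(term):
    """Return the term together with every term transitively below it."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in BROADER.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def ontology_search(term, records=RECORDS):
    """Match records whose subject is the term or any narrower term."""
    wanted = narrower_terms(term)
    return [r for r in records if r["subject"] in wanted]


def facet_counts(records, facet):
    """Count how many records fall under each value of a facet."""
    counts = {}
    for r in records:
        counts[r[facet]] = counts.get(r[facet], 0) + 1
    return counts


hits = ontology_search("semantic web")
by_type = facet_counts(hits, "type")
```

A query for "semantic web" returns the records on RDF and on ontologies although neither title nor subject contains that phrase, and the facet counts let the user narrow the result by resource type.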
Disadvantages
Existing metadata of the digital libraries have to be lifted to a semantic level.
Semantic digital libraries tend to focus more keenly on the retrieval of meaningful information rather than giving the opportunity of sharing user knowledge. This need subsequently led to the development of social semantic digital libraries.
This is achieved by combining the Semantic Web with collaboration tools on the web. Social semantic digital libraries complement the existing features of semantic digital libraries by giving users the opportunity to contribute information. Web 1.0 evolved into a collaborative platform where people could interact and share information, i.e. Web 2.0. Web 2.0 was promoted by Tim O'Reilly around the year 2005; it gives ordinary internet users the opportunity to interact, meet and share information like never before, and involves concepts such as blogs, wikis and social networking sites.
Advantages
Social Semantic Digital Libraries (and Web 2.0) have made the web collaborative and interactive; however, one drawback which has become apparent as a result of this innovation is that of information overload. Owing to the increase in internet users and thus their participation level on forums, it has become difficult to point to the knowledge part of the content.
Disadvantages
Another drawback of such libraries is that web pages are dynamic but not very structured. Notably, Web 2.0 tools enable the shaping of content on the pages, but not the content itself. Essentially, the future will address these aspects and make the web more powerful and structured with the advent of Web 3.0. The basic concept behind Web 3.0 is that of ontology, defined by Thomas Gruber as an explicit specification of a conceptualisation. Another foreseen enhancement is digital annotation linked to physical objects, for example in a museum. One application of this technology could be real-virtual tours of a place: for example, starting with a real guided tour and then (if desired) browsing through the virtual context information or gathering information about other exhibitions on the premises. The future aim of the social semantic digital library is to improve user benefits by strengthening user interfaces and social networking. User identification and system automation are key points for future social semantic digital libraries.
Future Prospects
There are inevitable barriers to the Semantic Web that still need to be addressed. We have mentioned the slow progress on certain features, particularly ontology and reasoning support, due to the development community not coming to a consensus. This does not mean that progress cannot be made immediately using the simpler tools for RDF and RDF Schema available now. Some of the larger IT companies are hanging back, waiting to spot the opportunity and waiting for the research community to settle on standards. Thus the main impetus is coming from the communities themselves; it is an opportunity to profoundly affect the way that the world talks to each other. There is already a good deal of RDF data giving semantic descriptions on the Web, both from website owners publishing their own annotations as RDF files and from sites such as rdfdata.org, which provide portals for RDF data.
However, before the Semantic Web can become globally usable, there does need to be more, and it needs to be more easily available. There is a distinct overhead to using the Semantic Web in terms of establishing shared vocabularies and ontologies, and in providing the appropriate annotations to resources which make them visible to the Semantic Web. This is a non-trivial task and often users will either not have the time to include this, or the expertise to do it well. A missing component of the Semantic Web is a simple means to support this, similar to the editors and tools for the conventional Web. Undoubtedly the simplicity of the HTML language used within the current Web was a major influence on its success and in order for the Semantic Web to break out from narrow communities to universal use it needs to address the issues of making it easy to use and accessible to all.
Otherwise, the Semantic Web is likely to require particular effort and expertise. This is expensive, and so it may well be confined to particular domains on the Web which see a strong advantage in its use, although over time, as the expertise becomes more commonplace, it should become cheaper. Also, the 'network effect' can work as both a barrier and an incentive. One of the main advantages of supplying Semantic Web annotation is that it can be shared and can benefit others, so when there is little data to share, there is little incentive to take on the extra expense of sharing; however, once the ball starts to roll, there is an exponential advantage in combining your own data with others'. These problems may be less of a disadvantage in the higher education (HE) and further education (FE) sectors, which have well-integrated communities with stronger control over their resources. Information science professionals in libraries are available to help with the task of cataloguing and publishing annotations. Thus it is likely that this sector will be at the forefront in the use of this technology.
The impact on digital libraries, combined with the Open Access Initiative and the rise of open archiving, is likely to be quite profound. Libraries become 'value-added' information annotators and collators rather than the archivists of externally published literature and the holders of the published output of institutions. The Semantic Web, although not a prerequisite or a motivator for this change, is nevertheless likely to smooth its development. The tools are in place for sharing classification schemes and for allowing the community to develop, deepen and share such schemes. The information infrastructure tools discussed above will have a particular impact on the way students and researchers find information, so these tools may typically be provided and adapted by libraries, which will tailor them to the needs of their own users. The Semantic Web, like the current Web, has the capacity to be an overwhelming place; libraries are well placed to make sense of it for the HE and FE community.
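The RDF annotations discussed in this section, and the advantage of combining one's own data with that of others, can be pictured with a minimal sketch. It models RDF statements as plain (subject, predicate, object) tuples rather than using an RDF library. The example.org URIs, the two data sets and the helper names match and creator_names are assumptions made for illustration; dc:creator, dc:title and foaf:name stand for the corresponding Dublin Core and FOAF properties.

```python
# Annotations published by one library (hypothetical data).
LIBRARY_A = {
    ("http://example.org/book/1", "dc:title", "Digital Libraries"),
    ("http://example.org/book/1", "dc:creator", "http://example.org/person/7"),
}

# Annotations published independently by another source (hypothetical data).
LIBRARY_B = {
    ("http://example.org/person/7", "foaf:name", "A. Author"),
    ("http://example.org/book/2", "dc:creator", "http://example.org/person/7"),
}


def match(store, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def creator_names(store, book):
    """Follow dc:creator links from a book, then look up each foaf:name."""
    names = []
    for _, _, person in match(store, s=book, p="dc:creator"):
        names.extend(name for _, _, name in match(store, s=person, p="foaf:name"))
    return names


# Because both sources use the same identifiers, merging is a set union.
merged = LIBRARY_A | LIBRARY_B
```

On its own, the first data set cannot name the creator of the book; once the two sets are merged, the shared identifier for the person links the statements together. This is a small instance of the network effect: each additional source that reuses a shared vocabulary makes the existing annotations more useful.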
Jerome Digital Library: JeromeDL can be considered a social semantic digital library. It is based on the Semantic Web as well as on social networking, in order to promote collaborative activities along with the other common uses of semantic digital libraries. With JeromeDL's social and semantic services, every library user can bookmark interesting books, articles or other materials in semantically annotated directories. Users can share their knowledge with others within a social network. The standard SSCF browser has been enriched with the ability to bookmark and browse community-based data. JeromeDL also has a feature that allows it to treat a single library resource as a blog post. With SIOC-based annotations, users can comment on the content of the resource and in this way create new knowledge. JeromeDL also provides various browsing, filtering and navigation solutions, such as TagsTreeMaps, MultiBeeBrowse and Exhibit. JeromeDL has been installed in a number of locations; the two most used, the DERI Galway library and WBSS at Gdansk University of Technology, serve their communities of users in everyday activities. The DERI Galway library is used by researchers as a pre-print server to locate and share publications. WBSS maintains a set of scans of antique books and a number of books written by lecturers at GUT; the latter are used as learning material.
BRICKS: This system focuses on the basic infrastructure of a digital library network so that information can be shared amongst users in the cultural heritage domain. The BRICKS network infrastructure uses the Internet as a backbone and is made of decentralized BRICKS Nodes (BNodes), in order to avoid central points whose failure or overload could stop or slow down the whole network. BNodes communicate with each other and use available resources for content and metadata management. Every BNode directly knows only a subset of the other BNodes in the system. However, if a BNode wants to reach a member that it does not know directly, it forwards the request to some of its known neighbour BNodes, which either deliver the request to the final destination or forward it again. BRICKS users access the system only through a local BNode available at their institution. Hence every user request is first sent to the institution's BNode and is then routed via other BNodes to the final destination. Search requests are handled in the same way: the BNode pre-selects a list of BNodes where the search request can be fulfilled and then routes the request there. When the location of the content is already known, e.g. as the result of a query, the BNode holding it is contacted directly.
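The forwarding behaviour described above can be sketched in a few lines of Python. This is only a simplified model: the class BNode, the function route and the max_hops parameter are hypothetical names, and plain breadth-first forwarding to unvisited neighbours is used here in place of whatever routing scheme the real BRICKS network implements.

```python
from collections import deque


class BNode:
    """Hypothetical stand-in for a BRICKS node; not the real BRICKS API."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # the subset of BNodes this node knows directly

    def connect(self, other):
        # Make the two nodes direct neighbours of each other.
        self.neighbours.append(other)
        other.neighbours.append(self)


def route(start, target_name, max_hops=8):
    """Return the names of the nodes a request passes through, or None.

    Each node hands the request to neighbours that have not seen it yet,
    until the target is reached or the hop limit is exhausted.
    """
    queue = deque([(start, [start.name])])
    seen = {start.name}
    while queue:
        node, path = queue.popleft()
        if node.name == target_name:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop limit reached, do not forward further
        for nb in node.neighbours:
            if nb.name not in seen:
                seen.add(nb.name)
                queue.append((nb, path + [nb.name]))
    return None


# A user's institution runs node A; the content lives on node D,
# which A does not know directly.
a, b, c, d = BNode("A"), BNode("B"), BNode("C"), BNode("D")
a.connect(b)
b.connect(c)
c.connect(d)
print(route(a, "D"))
```

No node in this model needs a complete view of the network, so the failure of one node removes only the paths that pass through it, which mirrors the motivation given for avoiding central points.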
Conclusion
Traditional libraries have evolved into interactive, accessible and efficient platforms that are available to the user at any time of day. The newer form of digital library, the semantic digital library, has been shown to produce more meaningful results for the user. Further developments in semantic digital libraries have introduced the contribution of information by users and social interaction between the contributors. The future therefore promises more efficient mechanisms for handling information.
References
[1] Diane, V., 2006, When did the Web start?, Developed Traffic, http://developedtraffic.com/2006/08/04/when-did-the-web-start/
[2] Berners-Lee, T., Hendler, J., Lassila, O., 2001, The Semantic Web, Scientific American Magazine (May 17, 2001), http://www.sciam.com/article.cfm?id=thesemantic-web&print=true
[3] Kruk, S. R., Haslhofer, B., Knežević, P., 2007, Tutorial 7 - Semantic Digital Libraries, JCDL 2007
[4] Lagoze, C., Krafft, D. B., Payette, S., Jesuroga, S., 2005, What Is a Digital Library Anymore, Anyway?, D-Lib Magazine, November 2005, Volume 11, Number 11, http://dlib.org/dlib/november05/lagoze/11lagoze.html
[5] Borgman, C. L., 2000, Digital libraries and the continuum of scholarly communication, Journal of Documentation, 56 (4), pp. 412-430
[6] Borgman, C. L., 1999, What are digital libraries? Competing visions, Information Processing & Management, pp. 227-243
[7] Celino, I., Turati, A., Della Valle, E., Cerizza, D., 2006, Squiggle Med: Semantic search for medical digital library, Technical report, CEFRIEL
[8] Baruzzo, A., Casoto, P., Challapalli, P., Dattolo, A., Pudota, N., Tasso, C., 2009, Toward Semantic Digital Libraries: Exploiting Web2.0 and Semantic Services in Cultural Heritage, Journal of Digital Information, Vol. 10, No. 6
[9] Xu, X., Zhang, F., Ni, Z., 2008, An Ontology-Based Query System For Digital Libraries, IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application
[10] Kruk, S. R., Decker, S., Haslhofer, B., Knežević, P., Payette, S., Krafft, D., 2007, Tutorial Semantic Digital Libraries, BANFF 2007
[11] Thomas, S., 2006, Web 2.0 and the future for library systems, Technical Report, University of Adelaide, http://digital.library.adelaide.edu.au/dspace/bitstream/2440/14789/1/Web2.0.pdf
[12] Baruzzo, A., Casoto, P., Dattolo, A., Tasso, C., 2009, A conceptual model for digital libraries evolution, in WEBIST 09: Proceedings of the 5th International Conference on Web Information Systems and Technologies, pages 299-304, Berlin, Springer-Verlag
[13] The Spiritus Temporis Web Ring Community, 2005, Digital library, http://www.spiritus-temporis.com/digital-library/disadvantages.html
[13] McGuinness, D., van Harmelen, F. (eds), OWL Web Ontology Language Overview, http://www.w3.org/TR/2003/WD-owl-features-20030331/
[14] Dean, M., Schreiber, G. (eds), van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D., Patel-Schneider, P., Stein, L., OWL Web Ontology Language Reference, http://www.w3.org/TR/2003/WD-owl-ref-20030331/