Using XML and Databases: W3C Standards in Practice
February 2008 Bill Trippe, Senior Analyst Dale Waldt, Contributing Analyst
Sponsored by
Table of Contents
Executive Summary
Introduction
The Power of Standards-based Computing
Standards Development Organizations
Comparing Relational and XML Databases
XML Standards and XML Databases
Gilbane Group, Inc. 763 Massachusetts Ave Cambridge, MA 02139 +1 617 497 9443 [email protected] http://gilbane.com
Executive Summary
XML has emerged as a powerful format for representing data in a wide variety of fields, from technical data to finance to healthcare. Unlike traditional data formats, such as relational data, XML has a hierarchical structure that can be used to model virtually any type of data. In addition, XML is far more flexible and forgiving of change than other formats. XML presents a number of interesting challenges and opportunities for data storage. Relational databases and full-text search mechanisms that have been the backbone of many applications are not designed to manage XML content effectively. A new class of databases has emerged that is designed specifically to manage XML content. Typically called XML Native Databases or just XML databases, they incorporate functionality that greatly improves the management, searching, and manipulation of XML to produce the most effective XML data management solution. The World Wide Web Consortium (W3C), the standards organization that developed XML, has also developed many standards that can be used to access, search, process, and store XML data. XML databases take advantage of these standards to provide efficient and precise access, query, storage, and processing capabilities not found in traditional database technology. The result is that applications using XML databases are more efficient and better suited for managing XML data. These W3C standards, including XML Schemas, XSLT, DOM, XLink, and XQuery, are well established and tested in real world applications. The XML databases that take advantage of them provide the platform for industrial strength applications to manage XML content. Like any new technology, adoption is slow at first. Then as the technology matures and understanding on how to best deploy increases, applications emerge that demonstrate the advantages of the approach. 
Today, we can find many applications to manage XML content that demonstrate the power and flexibility that can only be achieved through XML-native databases. Information intensive companies such as the airline and manufacturer described in this paper have achieved significant technical and business benefits from their use of XML standards and database technology over alternative approaches.
Introduction
Organizations that manage complex or diverse structured content have discovered the power of XML for expressing the structure of their information. Related standards and tools allow their XML-encoded data to be processed and reused efficiently to satisfy a range of business objectives. Developers are becoming increasingly fluent in using XML, standards related to XML, and a range of platforms and tools to develop XML-based applications. These platforms include relational database management systems, which have continued to add capability for managing XML data and content. Even with this continued innovation and improvement, developers are finding that certain content and data processing requirements are best met with native XML database technology. With the right platform and tools, developers can take full advantage of the capabilities of XML and related standards.
The key to effective XML content management is a flexible system architecture based on interoperability standards for data, services, and system integration. These standards serve as a middle layer that allows powerful tools to manage, process, access, and transform XML structured content in a cost-effective manner.
This paper describes the challenges faced in managing information in robust systems that enable reuse of XML-encoded data objects and support their efficient processing. This is accomplished through the use of XML databases and open standards based on the eXtensible Markup Language (XML) produced by the World Wide Web Consortium. The paper is divided into three major sections: The Power of Standards-Based Computing; Status of Standards Used by XML Databases; and Two Customer Case Examples.
puzzle, it is no wonder people find it difficult to keep track of the many specifications, their roles, and current stage of development. There are a number of XML-based standards that are currently available that are used in XML data management. These include familiar languages such as XPath, XML Schema, and the DOM. While these standards are currently adopted by the W3C, there are other areas of functionality not covered by an existing interoperability standard. Even so, the standards that exist today create a powerful architecture for connecting XML data management services to commonly available application frameworks. In short, these XML standards enable the development of powerful XML content management applications to meet even the most demanding business requirements.
A solution to some of these problems was the introduction of XML databases. When these appeared shortly after XML 1.0 was released, people weren't sure if they were a replacement for relational databases or a return to hierarchical databases. In fact, they were designed for the entirely new types of applications XML made possible.
XML databases have a number of features that are useful for working with XML. The most important are the XML data model, which is flexible enough to model subjects as diverse as technical documentation, health data, and customer profiles; XML-aware full-text searches; and structured query languages like XQuery. They are also designed to manage large numbers and a diverse array of XML documents. Other advantages include node-level updates (which reduce the cost of updating large documents), links, and versioning.
Another advantage of XML databases is their ability to handle large documents, as well as large numbers of documents. Both of these are difficult to query because of the time it takes to parse the documents and find the required data. XML databases solve this problem by parsing and indexing documents when they are inserted. This allows documents to be queried without further parsing and may even allow some queries to be resolved by searching indexes alone.
A final advantage is more flexibility in handling schema evolution than is found in relational databases. While schema evolution, or change in the data model, is normal, it can move slowly in the relational world if the particular relational database lacks tools or functions that make changes to the relational schema easier and more manageable. In the XML world, change moves more quickly, both because XML is new and because XML exposes users to more sources of change.
Examples of the latter include external trading partners who control the schemas used to move data across organizational boundaries, rapidly evolving fields like finance and biology, and long-lived fields like mortgage and insurance contracts. Fortunately, some XML databases do not require fixed schemas and can easily handle data conforming to multiple schemas or multiple versions of a schema. Ironically, the strongest endorsement of XML databases to date is that the major relational databases are adding native XML storage capabilities. This shows that the need for and application of native XML data management has become well understood, and adoption increases continually.
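The parse-and-index-on-insert idea described above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular XML database works: the `TinyXmlStore` class, its index layout, and the sample invoices are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class TinyXmlStore:
    """Toy store that parses and indexes each document once, at insert time."""

    def __init__(self):
        self.docs = {}                  # doc_id -> parsed root element
        self.index = defaultdict(set)   # (tag, text) -> set of doc_ids

    def insert(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)  # the only parse this document ever gets
        self.docs[doc_id] = root
        for elem in root.iter():
            text = (elem.text or "").strip()
            if text:
                self.index[(elem.tag, text)].add(doc_id)

    def find(self, tag, text):
        # Answered from the index alone; no document is re-parsed or scanned.
        return sorted(self.index.get((tag, text), set()))


store = TinyXmlStore()
store.insert("inv-1", "<invoice><customer>Acme</customer><total>120</total></invoice>")
store.insert("inv-2", "<invoice><customer>Zenith</customer><total>75</total></invoice>")
store.insert("inv-3", "<invoice><customer>Acme</customer><total>40</total></invoice>")
matches = store.find("customer", "Acme")
```

The cost of parsing is paid once per insert, so a lookup such as `store.find("customer", "Acme")` touches only the index.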
EMC Documentum XML Store provides numerous XML-aware features to enable powerful management of XML data structures, including an XQuery engine for retrieving specific parts of a document; a versioning mechanism for tracking differences within XML data; various indexing methods to optimize access to frequently used XML data; and a transformer and formatter for publishing XML data in XHTML or PDF.
EMC Documentum XML Store uses and supports XML standards including XML 1.0 and 1.1, XQuery 1.0, XML Schema 1.0, XPath 1.0, XSLT 1.0, XPointer, XLink 1.0, and DOM Level 1, 2, and 3.
The following diagram shows standards and industry specifications that EMC Documentum XML Store works in concert with to create a robust XML data management environment. This diagram is organized into three layers:
Application APIs: Interfaces to commonly available application frameworks used to develop many of today's business and content applications.
Interoperability Standards: Open standards that serve as a consistent means for interoperability between specific applications.
Data Management Services: Specific services to manage XML data content.
But even in this diagram not all aspects of interoperability are currently addressed by existing standards. Work continues in several venues on specifications, either to enhance existing functionality or to create new standards that play a role where no standard currently exists. Standards development for XML data management will continue for the foreseeable future, but there is already a significant amount of the infrastructure illustrated in this diagram in place today with existing standards. A brief update of the existing, emerging, and missing pieces of this standards puzzle follows.
Adopted
Mature
Low risk associated with implementation of this specification due to the extensive use and testing it has gone through in implementation. Developers have ample resources to understand its application and any limitations it may have.
Admittedly the difference between a standard given the "Adopted" and "Mature" status described above may be subjective and dependent on the specific usage and environment. The distinction between the two is to help those not familiar with a particular specification to understand whether there are issues that have
yet to be resolved with a completed specification or if there are many tried and true applications of it that demonstrate its readiness. For the most part, the standards described below according to this simple rating system will be mature, well proven standards.
XML
Status: Mature The eXtensible Markup Language (XML) was adopted in 1998 by the W3C and has enjoyed an explosive growth ever since. Not only have many organizations built applications that maintain content in an XML format, dozens of newer standards have been developed based on XML, or even using XML documents as a description language. It is hard to imagine a language that is more tested and proven than XML. XML documents can be managed with or without a related set of validation rules. Validation is the addition of a set of rules for element and attribute names, occurrence, and sequence that allow organizations to manage documents with much more custom-defined control. These rules are expressed in either a Document Type Definition (DTD) or an XML Schema, both of which are described in more detail below. In environments where information must be consistent with structural, naming, and occurrence rules, one of the following schema languages available to validate XML data should be employed to define and enforce these rules.
and transformed for delivery to other users. The rules expressed in a DTD ensure that all required elements are present before the data progresses to the next step in the process. For instance, it is useful to insist that an invoice must contain a purchase order number before it is sent to the accounts payable department.
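The invoice rule mentioned above can be illustrated with a short Python sketch. Python's standard library does not validate against DTDs, so the check below is a hand-written stand-in for what a DTD or schema validator would enforce; the element names `purchaseOrderNumber` and `amount` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: child elements every invoice must carry.
REQUIRED_INVOICE_CHILDREN = ("purchaseOrderNumber", "amount")


def missing_required(xml_text, required=REQUIRED_INVOICE_CHILDREN):
    """Return the required child element names that are absent or empty."""
    root = ET.fromstring(xml_text)
    missing = []
    for name in required:
        child = root.find(name)
        if child is None or not (child.text or "").strip():
            missing.append(name)
    return missing


complete = (
    "<invoice><purchaseOrderNumber>PO-1001</purchaseOrderNumber>"
    "<amount>250.00</amount></invoice>"
)
incomplete = "<invoice><amount>250.00</amount></invoice>"
```

A workflow could call `missing_required` before routing a document to the next step and hold back any invoice for which the returned list is not empty.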
Namespaces in XML
Status: Mature It is not uncommon for XML data encoded according to several different specific DTDs or XML Schemas to be processed in the same environment. This mixing of different XML vocabularies could cause confusion to processors and human users alike, especially when similar elements from two document types have the same names but different content rules. Namespaces in XML is a specification that allows systems to differentiate between vocabularies and sort out this confusion.
For example, the XLink standard defines structures that can be used in any document to define links to other documents. These structures have their own namespace, so they cannot be confused with similarly named structures in other schemas. This kind of reuse, which is made possible by Namespaces in XML, allows you to take advantage of the effort of other parties. XML Namespaces is an essential tool in an XML processing environment that removes constraints that would otherwise require expensive and time consuming reformatting of data shared between systems. For instance, a large manufacturer that has processing environments in many locations and configurations would need to share operating and product information between these systems. Namespaces would be used to differentiate the source, and therefore which DTD or Schema is applied, for each data type encountered during processing. The alternative, to reformat all data into a complicated single vocabulary, is usually cost prohibitive and unfeasible.
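As a small illustration of how namespaces keep same-named elements apart, the following Python sketch reads two `status` elements from different vocabularies in one document. The namespace URIs and element names are hypothetical; only the `xml.etree.ElementTree` calls are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for two vocabularies used by one manufacturer.
NS = {
    "ops": "http://example.com/ns/operations",
    "prod": "http://example.com/ns/product",
}

doc = """
<report xmlns:ops="http://example.com/ns/operations"
        xmlns:prod="http://example.com/ns/product">
  <ops:status>running</ops:status>
  <prod:status>discontinued</prod:status>
</report>
"""

root = ET.fromstring(doc)
# Both elements are named "status", but each is resolved through its namespace.
ops_status = root.find("ops:status", NS).text
prod_status = root.find("prod:status", NS).text
# A lookup without a namespace matches neither element.
unqualified = root.find("status")
```

Because each element is identified by its namespace URI plus local name, neither vocabulary has to be renamed or merged into a single schema.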
XQuery
Status: Adopted XQuery is a query language for retrieving and interpreting information from XML-encoded content and data. It was designed with flexibility in mind, recognizing that XML-encoded information comes from a wide range of sources, including data and documents. The W3C recommendation is careful to note that XQuery operates on the logical structure of an XML document, or the data model, as defined in the XQuery 1.0 and XPath 2.0 Data Model (XDM).
XQuery is a significant addition to the family of XML standards and, by some measures, the most significant apart from XML itself. XQuery allows XML documents, Web pages, and database content alike to be searched using a powerful, XML-aware language. Individual documents, collections of documents, or nodes in a database can be searched for, selected, processed, and manipulated, similar to the searching capabilities found in other query technologies.
The XQuery standard consists of a number of different standards that work together to define the requirements, data model, and syntax of a data query. These include:
XDM
XQuery 1.0 and XPath 2.0 Formal Semantics
XQuery 1.0 and XPath 2.0 Functions and Operators
XSLT 2.0 and XQuery 1.0 Serialization
XML Schema
Except for XML Schema, these standards were all released simultaneously with XQuery 1.0 as W3C Recommendations. (The second edition of XML Schema was released in October 2004.) Even during its draft status, there was significant support for the XQuery 1.0 and XPath 2.0 specifications from vendors and solution providers. With the formal adoption of the standard, we expect more support from vendors and more implementation from users.
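To give a feel for the select, filter, and order style of query that XQuery expresses, here is a rough Python stand-in. Python's standard library has no XQuery engine, so the sketch uses the limited XPath subset in `xml.etree.ElementTree` plus ordinary Python; the comparable XQuery expression appears only as a comment, and the invoice data is invented for the example.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<invoices>
  <invoice id="a1" customer="Acme"><total>120</total></invoice>
  <invoice id="b7" customer="Zenith"><total>75</total></invoice>
  <invoice id="c3" customer="Acme"><total>40</total></invoice>
</invoices>
""")

# Roughly what this XQuery expression would ask for:
#   for $i in /invoices/invoice[@customer = "Acme"]
#   where xs:decimal($i/total) > 50
#   order by $i/@id
#   return string($i/@id)
selected = sorted(
    inv.get("id")
    for inv in catalog.findall("invoice[@customer='Acme']")
    if float(inv.findtext("total")) > 50
)
```

In an XML database the query would be evaluated by the engine, typically against indexes, rather than by walking a parsed tree in application code as this sketch does.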
DOM
Status: Mature The W3C Web site describes the Document Object Model (DOM) as "a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page." Commonly referred to as an "API into XML documents," the DOM enables powerful manipulation of XML document nodes for a wide range of applications. By using the XML DOM, processing and system development can be simplified. A document can be extracted from a database, loaded into a DOM-aware tool, and rearranged, searched, or otherwise processed without extensive changes to the original applications or data structures. This is particularly useful in content reuse applications and in environments that are very dynamic and require ad hoc access to and storage of XML information.
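The kind of DOM manipulation described above can be sketched with Python's built-in `xml.dom.minidom`, which implements a subset of the W3C DOM. The sample manual and its element names are hypothetical.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<manual><step id='s1'>Open panel</step><step id='s2'>Cut power</step></manual>"
)
manual = dom.documentElement
first, second = manual.getElementsByTagName("step")

# Rearrange: move the "Cut power" step ahead of "Open panel".
manual.insertBefore(second, first)

# Update: add a new node and set an attribute through the same API.
note = dom.createElement("note")
note.appendChild(dom.createTextNode("Verify lockout tag"))
manual.appendChild(note)
manual.setAttribute("revision", "2")

order = [s.getAttribute("id") for s in manual.getElementsByTagName("step")]
serialized = manual.toxml()
```

The same calls (`insertBefore`, `createElement`, `setAttribute`) exist in DOM bindings for other languages, which is what makes the interface platform- and language-neutral.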
XPointer
Status: Mature XPointer is used to extend the reach of XLink, XInclude and other XML specifications to XML resources distributed across the Internet. It uses URI
references to locate these resources and the power of XPath to identify and retrieve internal data structures within them. XPointer allows organizations to extend their reach well beyond corporate boundaries by defining XML documents that include XML data found anywhere on the Web. For example, a single Web page might link to or include information found in numerous external XML resources, such as other Web pages or applications including Web services or XML-aware databases. The XPointer standard consists of three different standards which work together to define a data reference.
XLink
Status: Mature The XML Linking Language (XLink) allows cross reference or link elements to be inserted into XML documents that refer to documents, graphics, or other applications. XLink extends the capabilities found in HTML unidirectional linking with additional functionality, including: Defining extended linking relationships among more than two resources, sometimes called "one-to-many" or "many-to-many" links, Associating metadata with links, that may be used to resolve links or be used by applications at the target location, Expressing links that reside in a location separate from linked resources, Associating behaviors with links, such as whether to open a new window, open in the same window or to bring the linked content into the position of the current document where the linking reference resides. The end result is a robust linking expression language that adds considerable power to XML data and applications. Organizations can take advantage of XLink capabilities to create powerful links into databases instead of flat files on a Web site. Also, links can be defined to result in powerful behaviors such as opening a new window and including several documents as if they were maintained as a single HTML file. These links enable very powerful and dynamic document processing that avoids a lot of static Web page development. Also, the relationship of the links expressed in XLink outlive any specific instance of the data so links do not break as easily as in other linking methods.
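As a brief illustration, the sketch below reads simple XLink attributes from a document using Python's `xml.etree.ElementTree`. The namespace URI is the one defined by the XLink specification; the `procedure` and `ref` elements and the link targets are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # namespace URI defined by the XLink spec

doc = ET.fromstring(f"""
<procedure xmlns:xlink="{XLINK}">
  <ref xlink:type="simple" xlink:href="directives.xml#ad-2007-14" xlink:show="new">Directive</ref>
  <ref xlink:type="simple" xlink:href="figures/panel.svg" xlink:show="embed">Figure</ref>
</procedure>
""")

# Collect each link target together with its requested behavior.
links = [
    (ref.get(f"{{{XLINK}}}href"), ref.get(f"{{{XLINK}}}show"))
    for ref in doc.iter("ref")
]
```

An application would decide what `new` or `embed` means in its own presentation layer; XLink only records the intended behavior alongside the link.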
XInclude
Status: Mature XML Inclusions (XInclude) is a language used to define the inclusion or merging of XML documents to support modularity. Expressed in a friendly XML syntax, XInclude expresses which documents or document components are to be merged and the processing behavior to be applied. The syntax leverages the XML constructs of elements and attributes, as well as URIs. XInclude supports merging the contents of files or portions of XML documents expressed using XPointer. XInclude extends the capabilities for merging expressed in XLink by adding a processing model to control merge processing. While XML
documents can be merged using several approaches, XInclude can be applied independently of other processing. Prior to XInclude support, users would have to rely on custom or proprietary processing done by a content management system or other mechanism. While such an approach might work well in a specific environment, it did not allow ready integration or extension of the XML processing with other systems that did not use the same proprietary approach. XInclude would be used by an organization that wants to maintain documents as modules that are assembled as needed. Perhaps highly controlled boilerplate text such as legal disclaimers would be included in many documents during product staging or delivery. This would avoid inconsistencies, or even liabilities from text errors in the resulting delivered documents.
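The boilerplate scenario above can be sketched with the XInclude support in Python's standard library (`xml.etree.ElementInclude`). The in-memory `MODULES` dictionary and the file name `disclaimer.xml` are assumptions standing in for real files; a custom loader is used so the example does not touch the file system.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# In-memory stand-ins for module files; the names are hypothetical.
MODULES = {
    "disclaimer.xml": "<disclaimer>Disconnect power before servicing.</disclaimer>",
}


def loader(href, parse, encoding=None):
    """Resolve xi:include targets from MODULES instead of the file system."""
    if parse == "xml":
        return ET.fromstring(MODULES[href])
    return MODULES[href]


manual = ET.fromstring("""
<manual xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Installation Guide</title>
  <xi:include href="disclaimer.xml"/>
</manual>
""")

# Replace each xi:include element in place with the content it points to.
ElementInclude.include(manual, loader=loader)
assembled = [child.tag for child in manual]
```

Because the disclaimer lives in one module, a correction made there appears in every manual that includes it the next time the documents are assembled.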
Updates in XQuery
XML databases have a variety of strategies for updating and deleting documents. These range from replacing or deleting the existing document, to modifications applied through a live DOM tree, to languages that specify how to modify fragments of a document. Currently most methods to accomplish this type of functionality are proprietary. There is recent activity where people have explored the creation of standardized languages for updating XML documents. Most implementers are looking to the W3C effort, XQuery Update Facility 1.0, which published a working draft in August of 2007. XQuery Update is designed to create a set of extensions to XQuery to address updating XML content and to formalize them in an official standard.
Versioning
Document versioning in an XML database involves a set of functionality that manages the relationship of document components that are shared in multiple document instances. As one version is updated, the need to decide whether related versions also need to be updated, or if the relationship needs to be continued, can be supported by the database application. Different XML databases manage these relationships and decisions to varying degrees. There are complexities, such as whether branching is allowed, that may or may not be supported by different tools. XPath, XPointer and other XML standards can play a powerful role in defining and managing these relationships. But until a standard that addresses XML database versioning emerges, implementers will need to consider the specific capabilities of XML databases and the impact their capabilities will have on their specific development system projects.
Collections
Most XML databases support the concept of a collection, or grouping multiple individual documents. This is similar to the way a table in a relational database or a directory in a file system groups multiple instances of records or documents. If an application using an XML database manages invoice data, it might be very useful to the accounting department to refer to collections of invoices and structures above the individual invoice during processing sets of invoices related to a single customer. Similar relationships can be useful in groups of documents such as product descriptions, technical manuals, and many other examples. Collections might even contain other collections, depending on the capabilities of the specific database system in use. Currently only XQuery uses the concept of collections, but it is not difficult to see how XPath, XPointer, XLink, XInclude and other standards could use this concept as well.
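A collection can be pictured as a named group of documents that may itself contain other collections. The Python sketch below is purely illustrative: the `Collection` class and its methods are hypothetical and do not correspond to the API of any specific XML database.

```python
import xml.etree.ElementTree as ET


class Collection:
    """Hypothetical collection: named documents plus optional child collections."""

    def __init__(self, name):
        self.name = name
        self.documents = {}   # document name -> parsed root element
        self.children = {}    # child name -> Collection

    def add(self, doc_name, xml_text):
        self.documents[doc_name] = ET.fromstring(xml_text)

    def child(self, name):
        # Return the named child collection, creating it on first use.
        if name not in self.children:
            self.children[name] = Collection(name)
        return self.children[name]

    def walk(self):
        """Yield every document in this collection and in all nested ones."""
        yield from self.documents.values()
        for sub in self.children.values():
            yield from sub.walk()


invoices = Collection("invoices")
acme = invoices.child("acme")
acme.add("inv-1.xml", "<invoice><total>120</total></invoice>")
acme.add("inv-3.xml", "<invoice><total>40</total></invoice>")
invoices.child("zenith").add("inv-2.xml", "<invoice><total>75</total></invoice>")

acme_total = sum(float(d.findtext("total")) for d in acme.walk())
grand_total = sum(float(d.findtext("total")) for d in invoices.walk())
```

Querying `acme` addresses one customer's invoices, while querying `invoices` spans every nested collection, which mirrors the per-customer processing described above.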
Transaction Management
Transaction support is a critical feature of database systems. Wikipedia offers an excellent definition of a transaction as "a unit of interaction with a database management system or similar system that is treated in a coherent and reliable way independent of other transactions." Software developers often view transaction support as critical functionality in data management, and thus will often lean toward relational database management systems because of what they see as more mature transaction support. XML databases support transaction management in some form or another (including updating elements and presumably rolling back to prior versions). However, locking is often at the level of entire documents, rather than at the level of individual nodes, so multi-user concurrency can be relatively low. Whether this is an issue depends on the specific application and what constitutes a "document". Many XML implementations rely on being able to manage information, and the transactions applied against that information, at hierarchical levels below the document level. Some need to manage their information at very granular, low-level structures. As desirable as node-level locking is in XML databases, it brings complexities. Locking a node usually requires managing control of parent and child nodes, including, for instance, whether a parent can be updated or deleted while one of its child nodes is locked by another process. Schemes
for defining and expressing transaction management rules are being discussed in various technical communities. But, until a standard addresses this area, implementers will need to depend on the specific capabilities of the XML Database systems they implement. These vary to a large degree from one database system to another.
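The parent and child complication in node-level locking can be made concrete with a small Python sketch. It assumes one conservative policy, chosen only for illustration: a lock on a node is refused whenever another owner holds a lock on that node, on any of its ancestors, or on any of its descendants. The `NodeLockManager` class and the path notation are hypothetical.

```python
class NodeLockManager:
    """Toy lock table keyed by node path, e.g. ("manual", "chapter[2]", "step[4]")."""

    def __init__(self):
        self.locks = {}  # path tuple -> owner

    @staticmethod
    def _related(a, b):
        # True when one path is a prefix of the other:
        # the same node, an ancestor, or a descendant.
        shorter = min(len(a), len(b))
        return a[:shorter] == b[:shorter]

    def acquire(self, path, owner):
        for held_path, held_by in self.locks.items():
            if held_by != owner and self._related(path, held_path):
                return False  # conflicts with another owner's lock
        self.locks[path] = owner
        return True

    def release(self, path, owner):
        if self.locks.get(path) == owner:
            del self.locks[path]


mgr = NodeLockManager()
step = ("manual", "chapter[2]", "step[4]")
chapter = ("manual", "chapter[2]")
results = [
    mgr.acquire(step, "alice"),                    # first lock on the step
    mgr.acquire(chapter, "bob"),                   # refused: a child is held by alice
    mgr.acquire(("manual", "chapter[3]"), "bob"),  # unrelated branch
]
```

Document-level locking is the special case where every path has length one. Finer-grained paths allow more concurrency, at the cost of the ancestor and descendant checks shown here.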
PSVI API
The XML DOM provides a means of accessing data as it is processed. It relies on the canonical form of the XML data produced by XML processors called the Infoset. When XML data is validated against a W3C XML Schema, an additional infoset is created, called the Post-Schema-Validation Infoset (PSVI). The advantages of being able to load XML data into DOM processing tools and having a formal "API" into that form are clear. Similarly, an API or formal syntax for accessing and manipulating data in the PSVI form would provide beneficial processing efficiencies and capabilities when processing data against an XML schema. There are groups, such as the Apache Project, that have created PSVI "API" tools, but PSVI API abilities are likely to remain fragmented until a syntax is formally developed and approved by the W3C XML Schema working group.
An Airline Company
The major goal of an airline is to provide air travel service to an audience in a highly competitive market where costs and efficiency can make the difference between success and failure of a business. Aircraft are expensive and complicated, and the information used to maintain and operate them is uniquely complex. Each aircraft in a fleet may have different specific components, and the information used to maintain and repair this equipment needs to be accurate and current. The airline industry is extremely concerned with safety and must modify existing manuals and work cards to keep them current with aircraft manufacturer and government regulatory information as it is made available. Finally, a large airline has users of this information distributed all over the world, and providing timely access to the correct information from anywhere in their systems is a challenge faced by few industries. Airline maintenance information is organized into technical manuals and task oriented work cards, as well as many other derivative information formats. Technical manuals are very comprehensive, while work cards are organized for a specific maintenance or repair procedure that can be accomplished by a specialized worker in a single shift. They may be made of much of the same information but are organized in very different formats. Even so, the relationships of the information components that comprise them need to be maintained so they are current, synchronized, and highly accurate. Traditionally, airlines have managed their maintenance information in systems that produce books and documents. The current maturity of XML standards and the support of these standards by XML databases make it possible to break out of document-centric publishing into an environment where the information is highly integrated and easily reused. 
Data coming from diverse sources, manufacturers, internal departments, and regulatory agencies can be quickly integrated and inserted into information objects that make up the diverse delivery formats. The relationship between a repair step in a work card or manual and the government directive that changes that procedure is quickly identified and executed upon.
The airline industry has been using XML and SGML technology for many years for sharing information. Documentation for newer equipment in airline fleets may be captured and managed in the industry's ATA 2200 specifications, but documentation for older aircraft may still reside in paper form or in older proprietary publishing system formats. Also, the data contained in these documents must be delivered in a variety of consumable forms, including print and the Web. Given these challenges, most airlines are looking to create a single repository of maintenance information components to facilitate effective reuse and repurposing. The end result is to improve accuracy and efficiency while reducing costs and shortening delivery schedules. The links between information objects fall into three categories:
1. In some cases, when an information component that exists in one document but is also used in several others is changed, the new content should automatically be replaced in all locations.
2. In other cases, there are significant differences in the actual text in two related objects, so when information is changed in one place, the system must track that the information needs to be manually updated to be kept in sync.
3. Lastly, in some information objects, sub-components can be automatically updated, while others need to be manually updated.
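The three link categories can be modeled as a simple propagation rule. The Python sketch below is an assumption-laden illustration, not a description of how EMC Documentum XML Store or any airline system works: the `Reuse` enum, the `propagate` function, and the component names are all hypothetical.

```python
from enum import Enum


class Reuse(Enum):
    AUTO = "auto"      # category 1: replace content everywhere automatically
    MANUAL = "manual"  # category 2: flag the target for manual update
    MIXED = "mixed"    # category 3: decide per sub-component


def propagate(links):
    """Split the targets of a changed component into auto updates and review tasks.

    `links` is a list of (target, Reuse, sub_links) tuples. `sub_links` is only
    consulted for MIXED targets and holds (sub_target, Reuse) pairs.
    """
    auto, review = [], []
    for target, kind, sub_links in links:
        if kind is Reuse.AUTO:
            auto.append(target)
        elif kind is Reuse.MANUAL:
            review.append(target)
        else:
            for sub_target, sub_kind in sub_links:
                bucket = auto if sub_kind is Reuse.AUTO else review
                bucket.append(f"{target}/{sub_target}")
    return auto, review


links = [
    ("work-card-12", Reuse.AUTO, []),
    ("manual-ch5", Reuse.MANUAL, []),
    ("task-7", Reuse.MIXED, [("torque-table", Reuse.AUTO), ("warning", Reuse.MANUAL)]),
]
auto, review = propagate(links)
```

Targets in `auto` could be updated by the system without intervention, while those in `review` would become workflow tasks for an author.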
This challenging reuse model is difficult to maintain using manual tracking methods. XML standards for structure and relationship linking supported in EMC Documentum XML Store enable these processes and updates to be significantly automated, or at least tracked and managed by the system. Updates to a technical manual that comes from an aircraft manufacturer supplier may trigger several automatic updates, initiate several other manual workflow steps, and generate multiple delivery formats such as HTML, PDF, and others. They may also initiate updating of management information to control the process and report on steps taken for tracking and compliance purposes.
Airlines, like many other large corporations, are large operations with diverse user and authoring roles and a heterogeneous mix of software, operating systems, and data formats. Authors tend to be highly concentrated in one location on a consistent set of workstations. Meanwhile, data and application servers might be on other platforms that host applications and databases. Finally, consumers of this information are highly distributed and need to work with a mixture of paper documents and Web clients. The XML standards supported by EMC Documentum XML Store enable data to be captured, edited, and rendered throughout a heterogeneous environment such as this.
For these reasons, airlines are interested in avoiding being locked into a proprietary format that would insist on more consistent environments for all phases of document and information production and consumption. Also, considering that most aircraft equipment, including the airplanes themselves, remains in service for 20 to 30 years, the data used to support them must remain portable and outlive the environment in which it is processed today. According to one airline industry executive who manages these types of data, "There is no other technology that provides what XML standards do to meet these requirements."
A Manufacturing Company
Information about products is a critical feature of customer satisfaction to any manufacturer. When a product is fairly complicated, such as appliances, machinery, or even vehicles, the supporting information is proportionately sophisticated. Managing that information in a cost-effective manner, according to government safety regulations, and in a form that enables all required delivery formats is an essential part of the overall manufacturing process. A manufacturer of recreational vehicles needs to produce the user manuals that customers rely on from the same information that dealers and repair shops use to inform potential buyers of features and specifications and to maintain the equipment once it is purchased.
The technical documentation for these personal vehicles is subject to content management challenges. For example, one manufacturer of recreational vehicles maintains documentation on 28 different models that may include one or more of 400 separate alerts (shared warning information). It also produces more than 3,000 components that can be purchased separately and installed on the vehicles, and must provide installation instructions for each of these as stand-alone documents or as parts of other documents. In all, it is a complex data management puzzle. Components on these vehicles can be shared between different models, so the documentation on these vehicles must be similarly shared and tracked. These manuals, alerts, and product specification sheets are made available in printed form for customers, as well as in electronic formats for dealers and internal staff. Many safety regulations and standardized language rules affect how these information objects are reused as well.
In selecting a system to manage its documentation, this manufacturer wanted to take advantage of the XML structure and processing standards as well as the best-of-breed document component management functionality found in EMC Documentum XML Store.
The focus was on reducing the cost and complexity of managing the relationships between information objects and their reuse across many specific documents and output formats. One challenge was finding technology compatible with existing systems and workflows. The integration capabilities expected from EMC Documentum XML Store, due to its support for many XML-based standards and industry APIs, made it a logical choice as the data management system for the manufacturer's documents. The XML standards allowed documents to be created and managed so that they could be reused in different manuals and repurposed for different output formats.
New and existing processing environments may rely on many third-party tools for various processes. The industry APIs supported by EMC Documentum XML Store enable integration with existing tools, such as structured text editors, composition systems, and Web content management tools, and with the efficient workflow processes already built around them.
In the resulting environment based on EMC Documentum XML Store, searches use XQuery and are very precise. XQuery provided better search results than some of the other XML tools and technologies considered. Through XQuery, EMC Documentum XML Store enables document components to be shared reliably and quickly. EMC Documentum XML Store also supports the WebDAV architecture, which provides stateless editing and powerful file locking controls and enables applications to work in the heterogeneous legacy environment.
Because document components are so heavily shared and reused, the strong versioning capabilities of EMC Documentum XML Store allowed warnings and cautions to be tightly controlled, with rules enforced about which version is in use at any point in time. Warnings and cautions come from internal engineering and technical writing personnel as well as from regulatory sources, and they are a critical part of the documentation.
Many manufacturers, such as the one described above, sell their products in countries around the world and need to produce supporting information in multiple languages. One significant benefit of publishing with systems that take advantage of XML and related standards is that translation has become more efficient because information objects are shared. A single reused object needs to be translated only once for each language. Previously, objects were duplicated and maintained in multiple files and usually became slightly inconsistent over time. XML-aware version control and transformation kept the relationships between versions from being lost and eliminated much of the translation work that minor textual differences had required.
This manufacturer did not want to sacrifice performance to get the benefits of robust information sharing and automated content management and processing. The power of XML standards-based information processing is very apparent in this publishing application.
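The precise, structure-aware search and the translate-once benefit described above can be sketched in a few lines. This is an illustration under stated assumptions, not the manufacturer's implementation: the file names, the `alertref` element, and the language list are hypothetical, and the limited XPath subset in Python's standard library stands in for the XQuery that an XML database would run across a whole collection.

```python
import xml.etree.ElementTree as ET

# Hypothetical manuals; element, attribute, and file names are invented for illustration.
MANUALS = {
    "model-a-owner.xml": '<manual><alertref ref="a-001"/><alertref ref="a-002"/></manual>',
    "model-b-owner.xml": '<manual><section><alertref ref="a-001"/></section></manual>',
    "hitch-install.xml": '<manual><alertref ref="a-002"/><alertref ref="a-001"/></manual>',
}


def where_used(alert_id):
    """List the documents that reference the given shared alert."""
    hits = []
    for name, xml_text in MANUALS.items():
        root = ET.fromstring(xml_text)
        # ElementTree supports a small XPath subset, including attribute predicates.
        if root.find(f".//alertref[@ref='{alert_id}']") is not None:
            hits.append(name)
    return sorted(hits)


def translation_jobs(languages):
    """Compare translation counts: shared objects versus one copy per use."""
    refs = [
        ref.get("ref")
        for xml_text in MANUALS.values()
        for ref in ET.fromstring(xml_text).iter("alertref")
    ]
    shared = len(set(refs)) * len(languages)  # each alert translated once per language
    duplicated = len(refs) * len(languages)   # each embedded copy translated separately
    return shared, duplicated


print(where_used("a-002"))
print(translation_jobs(["fr", "de", "es"]))
```

In this toy collection, two shared alerts used five times across three documents need 6 translations for three languages, compared with 15 if every use were a separately maintained copy. The query matches on element and attribute structure rather than on text, which is the kind of precision a full-text search cannot offer.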
Conclusions
Organizations intent on creating powerful and efficient environments for managing and accessing XML data have found that traditional relational databases are not designed to work with hierarchical XML content. When tools that take advantage of XML content and its processing standards are not deployed, the result is excessive application development and inadequate performance and efficiency. A new breed of tools called XML databases not only understands the structure and syntax of XML content, but also uses the powerful standards developed by the W3C to process and access XML data.
The W3C, the developer of XML, has produced many related standards, including specifications that provide XML processing and database support. These include XML DTDs and Schemas, XSLT, DOM, XPath, XPointer, XLink, and XQuery. These specifications are well tested and are deployed by organizations managing complex XML content. They enable a powerful platform for managing XML data that surpasses traditional approaches to data management.
Developers seeking to build a best-of-breed XML content management environment need to understand the capabilities that XML databases bring and that cannot be achieved with tools that are not XML-aware. Information-intensive businesses such as an airline and a manufacturer have built applications that demonstrate the power and efficiency of using an XML database such as EMC Documentum XML Store to manage their XML information assets.
With Documentum, EMC provides an enterprise-class content management system with a unified content architecture. The Documentum platform provides a unified environment for storing, accessing, organizing, controlling, retrieving, and delivering any type of unstructured information within an enterprise. As part of Documentum's next release, XML Store can be added on to the Documentum Content Server and will allow XML document storage and access via XQuery. Documents stored in XML Store will be part of the native Documentum repository, accessible through standard Documentum APIs and query language, and subject to the same policies and management as other documents. Customers can now get both the security, compliance, and archiving capabilities of the Documentum platform and the high performance provided by XML Store.
Resources
W3C Activities, World Wide Web Consortium (W3C), http://w3.org/Consortium/Activities
Going Native: Use Cases for Native XML Databases, R. Bourret, http://www.rpbourret.com/xml/UseCases.htm
EMC Documentum XML Store, http://www.emc.com