A Review On Query Processing and Query Languages For Content Management in XML Database


Web Site: www.ijettcs.org Email: [email protected], editorijettcs@gmail.com
Volume 2, Issue 2, March-April 2013, ISSN 2278-6856

Ms. M. A. Ramteke1, Prof. S. S. Dhande2
1 Department of Computer Science and Engineering, Sipna College of Engineering and Technology, M.S., India
2 Associate Professor, Department of Computer Science and Engineering, Sipna College of Engineering and Technology, M.S., India

Abstract: An XML database is a data persistence software system that allows data to be stored in XML format. These data can then be queried, exported and serialized into the desired format. XML databases are usually associated with document-oriented databases. This paper reviews query processing in XML and the query languages used for querying XML databases. Extensible Markup Language (XML) is emerging as a standard for data representation and data exchange, so it is important to have query languages for processing queries over XML documents. While XML has been widely accepted as the standard for data storage, exchange, and integration over the Internet, XML query processing has become an interesting and challenging topic in XML database research because of the rapidly emerging applications in XML query analysis and optimization.

Keywords: XML Database, XML Query Processing, XML query languages.

1. INTRODUCTION
XML has become a popular approach to the transfer and storage of diverse data because of its simplicity and transparency. It has emerged as the dominant standard for representing and exchanging data over the Internet. The use of XML as the common format for representing, exchanging, storing, and accessing data poses many new challenges to database systems. Its nested, self-describing structure provides a simple yet flexible means for applications to model and exchange data [2]. XML is used as a vendor-independent interchange format between applications and appears to be becoming a major standard for data exchange on the web. XQL is a query language designed specifically for XML. XQuery provides the means to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases or office documents. XQuery uses XPath expression syntax to address specific parts of an XML document. The W3C-sanctioned language for addressing subsets of an XML document is called the XML Path Language (XPath). The XSL Transformations (XSLT) specification defines an XML-based language for expressing transformation rules from one class of XML document to another. XSLT documents are useful as a general-purpose language for expressing transformations from one schema type to another.

2. XML Schema, Document Type Definition and XML Document
An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. An XML Schema describes the structure of an XML document. A Document Type Definition (DTD) is a set of statements that defines and constrains a document type, following the rules of the Standard Generalized Markup Language (SGML) or of the Extensible Markup Language (XML), a subset of SGML. A DTD is a specification that accompanies a document and identifies the markup that, in the case of a text document, separates paragraphs, identifies topic headings, and so forth, and how each piece of markup is to be processed. The following text shows an XML Schema, a Document Type Definition and an XML document [4].

XML Schema

    <complexType name="villageinfo">
      <sequence>
        <element name="name" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="country" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="accommodation" type="string" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
    </complexType>

DTD

    <!ELEMENT village (name, country, accommodation*)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT country (#PCDATA)>
    <!ELEMENT accommodation (#PCDATA)>

XML Document

    <village>
      <name>John</name>
      <country>Tyrol</country>
      <accommodation>Hotel Post</accommodation>
      <accommodation>Hotel Admiral</accommodation>
      <accommodation>Hotel Anker</accommodation>
    </village>

An XML document must be well-formed to be processed correctly. A well-formed document is one that has only one root element, has matching start and end tags for every element, has no tags nested out of order, and is syntactically correct with regard to the specification. An XML document can be, but is not required to be, valid. A valid document is one that is well-formed and conforms to its document type definition (DTD).

3. XML as a data model
An XML document can be modeled as a rooted, nested, ordered, node-labeled or edge-labeled data tree. These two models are in fact equivalent except for the placement of the labels: labels are on the nodes in the node-labeled model and on the edges in the edge-labeled model [5]. Nodes represent elements or values. Edges model direct containment.

Figure 1: Data model (node-labeled, ordered tree)

4. XML Query Languages
Several languages exist for querying XML databases. XML query languages are based on navigation of the hierarchical structure; XPath is one example.

4.1 The XQuery Language
XQuery is a functional language in which a query is represented as an expression. XQuery supports several kinds of expressions, and the structure and appearance of a query may differ significantly depending on which kinds of expressions are used. The various forms of XQuery expressions can be nested with full generality [3]. XQuery is a query and functional programming language designed to query collections of XML data. It relies on path expressions for navigating hierarchical documents.

XQuery expressions can be:
- XPath expressions
- FLWR (FLWOR) expressions
- Quantified expressions
- Aggregation, sorting, and more

Example [6]:

Query:

    <books>{
      for $b in doc("books.xml")//biblio[publisher="Wiley"]/books
      where $b/author/lastname="Smith"
      order by $b/price
      return <book>{ $b/title, $b/price }</book>
    }</books>

The XQuery language has the following features:
- Strongly-typed query language
- Large-scale database access
- Safety/correctness of operations on data

4.2 The XPath Language
XPath, the XML Path Language, is a query language for selecting nodes from an XML document; it is used to navigate through the elements and attributes of the document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).

The XPath language has the following properties [6]:
- It describes a single navigation path in an XML document.
- It selects a sequence of nodes reachable by the path.
- Its main construct is axis navigation.
- A path consists of one or more navigation steps separated by /.

Example [9]:

Query: /bibliography/book/author (like a UNIX directory path)
Result: all author elements reachable from the root via the path /bibliography/book/author

4.3 XSLT
Whenever an XSLT or XQuery instruction needs to address (refer to) parts of an XML document, it uses XPath expressions. XPath expressions can also contain functions, simple arithmetic and Boolean expressions. Within XSLT, XPath expressions are typically used in match, select and test attributes [10]:

Figure 2: XPath expressions in an XSLT template
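The path navigation and the FLWR-style query discussed for XPath and XQuery can be approximated outside an XML database. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module, which supports only a limited subset of XPath and no XQuery at all. The sample data, the simplified element structure and the helper name books_by are hypothetical and are introduced only for illustration; they are not the books.xml file of the cited example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration; element names
# follow the path /bibliography/book/author used in the XPath example.
SAMPLE = """
<bibliography>
  <book>
    <title>XML Basics</title>
    <author>Smith</author>
    <publisher>Wiley</publisher>
    <price>45.00</price>
  </book>
  <book>
    <title>Query Engines</title>
    <author>Smith</author>
    <publisher>Wiley</publisher>
    <price>30.00</price>
  </book>
  <book>
    <title>Markup Today</title>
    <author>Jones</author>
    <publisher>Elsevier</publisher>
    <price>25.00</price>
  </book>
</bibliography>
"""

root = ET.fromstring(SAMPLE)

# XPath-style navigation for /bibliography/book/author.
# ElementTree paths are relative to the element they are called on,
# so the leading /bibliography step is the root element itself.
authors = [a.text for a in root.findall("./book/author")]


def books_by(root, publisher, author):
    """FLWR-like query written by hand: for / where / order by / return."""
    # for $b in //book[publisher=...]  (predicate on the text of a child)
    candidates = root.findall(f".//book[publisher='{publisher}']")
    # where $b/author = ...
    selected = [b for b in candidates if b.findtext("author") == author]
    # order by $b/price
    selected.sort(key=lambda b: float(b.findtext("price")))
    # return <book>{ $b/title, $b/price }</book>
    result = ET.Element("books")
    for b in selected:
        out = ET.SubElement(result, "book")
        out.append(b.find("title"))
        out.append(b.find("price"))
    return result


result = books_by(root, "Wiley", "Smith")
titles = [t.text for t in result.findall("./book/title")]
print(authors)
print(titles)
```

In this sketch the filtering, ordering and result construction are written out as ordinary Python steps, whereas an XQuery processor would express and optimize them declaratively.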

XSL (Extensible Stylesheet Language) consists of:
- XSL-T (Transformation): primarily designed for transforming the structure of an XML document. W3C Specification: http://www.w3c.org/TR/xslt
- XSL-FO (Formatting Objects): designed for formatting XML documents. W3C Specification: http://www.w3c.org/TR/xsl

XSLT has the following features:
- Loosely-typed scripting language
- Formats XML as HTML for display in a browser
- Highly tolerant of variability/errors in data

Example [16]:

The XML input file

    <book>
      <title>The Dilbert Principle</title>
      <author>Scott Adams</author>
    </book>

XSLT

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <h1> <xsl:value-of select="//title"/> </h1>
        <h2> <xsl:value-of select="//author"/> </h2>
      </xsl:template>
    </xsl:stylesheet>

Output:

    <h1>The Dilbert Principle</h1>
    <h2>Scott Adams</h2>

4.3.1 Transformation of XML data using XSLT
i) A conversion of the XML data into a tree structure, e.g. using an XML parser conformant to the Document Object Model (DOM) or the Simple API for XML (SAX).
ii) A structural transformation of the data from the input structure to the desired output structure. This involves selecting, projecting, joining, aggregating, grouping and sorting data. Compared with custom applications, XSLT factors out these common subtasks and presents them as transformation rules in a high-level declarative language.
iii) Formatting of the data: data in the desired output structure is enriched with target format constructs, e.g. for PDF (paper print), VoiceXML (aural presentations), SVG (graphics) or HTML (browsing).

5. XML Parsing
For a computer program to access the structured information in a document in a meaningful way, parsing is required. The parser first reads the stream of characters and recognizes the syntactic details of the elements, attributes and text in the document. Then the parser exposes the hierarchical set of information in the document as a tree of related elements, attributes and text items. The logical tree of information items created by parsing the XML document is called the Information Set, or Infoset. It can then be manipulated in different ways, and data can be extracted from it for use in applications, databases, etc.
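The two parser styles mentioned in this paper, DOM and SAX, can be contrasted on the village document of Section 2. The following is a minimal sketch, assuming Python's standard-library xml.dom.minidom and xml.sax modules; the names VILLAGE, HotelHandler and is_well_formed are hypothetical and are chosen only for illustration. The DOM parser builds the whole tree before it is queried, the SAX parser reports events while it reads the character stream, and the last function shows that a document which is not well-formed is rejected by the parser.

```python
import xml.dom.minidom
import xml.sax
from xml.parsers.expat import ExpatError

# The village document from Section 2, written on one logical line.
VILLAGE = (
    "<village><name>John</name><country>Tyrol</country>"
    "<accommodation>Hotel Post</accommodation>"
    "<accommodation>Hotel Admiral</accommodation>"
    "<accommodation>Hotel Anker</accommodation></village>"
)

# DOM: the parser builds the complete tree in memory, then we navigate it.
dom = xml.dom.minidom.parseString(VILLAGE)
dom_hotels = [
    node.firstChild.data
    for node in dom.getElementsByTagName("accommodation")
]


class HotelHandler(xml.sax.ContentHandler):
    """SAX: collect the text of every <accommodation> element from events."""

    def __init__(self):
        super().__init__()
        self.hotels = []
        self._buffer = None  # holds text chunks while inside <accommodation>

    def startElement(self, name, attrs):
        if name == "accommodation":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "accommodation":
            self.hotels.append("".join(self._buffer))
            self._buffer = None


handler = HotelHandler()
xml.sax.parseString(VILLAGE.encode("utf-8"), handler)


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


print(dom_hotels)
print(handler.hotels)
print(is_well_formed("<village><name>John</village></name>"))
```

Neither module validates against a DTD or an XML Schema; the sketch checks well-formedness only.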

6. XML Query Processing


The approach used by XML database engines for answering a user query is shown in Figure 3. As the figure shows, a user query is first parsed and mapped to its equivalent algebraic representation, called the logical plan or query plan. This plan is then optimized by applying several optimization techniques and strategies. The output of this phase is an execution plan, also known as the physical plan. The next phase maps the execution plan to a sequence of statements, which are in turn processed as the final step towards the generation of results. The logical plan, which has either a tree or a graph structure, consists of a connected sequence of algebraic operators. The set of all operators defined by a database system forms what is called the database's logical algebra [7].
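The phases described above (parsing, logical plan, optimization, physical plan, execution) can be illustrated with a toy example. The following is a minimal sketch in Python and is not the algebra or optimizer of any actual XML database engine: it assumes that queries are simple absolute child paths, it uses a single rewrite rule (dropping "." self steps, which return their input unchanged), and all function names (parse_query, optimize, to_physical, execute) are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_query(query):
    """Parse a simple absolute path into a logical plan (list of operators)."""
    if not query.startswith("/"):
        raise ValueError("only absolute paths are supported in this sketch")
    return [("self", None) if step == "." else ("child", step)
            for step in query.strip("/").split("/")]


def optimize(logical_plan):
    """Rewrite rule: a self step returns its input, so it can be dropped."""
    return [op for op in logical_plan if op[0] != "self"]


def to_physical(logical_plan):
    """Map each logical operator to an executable function over node lists."""
    def child_step(name):
        return lambda nodes: [c for n in nodes for c in n if c.tag == name]

    def self_step():
        return lambda nodes: nodes

    return [child_step(arg) if kind == "child" else self_step()
            for kind, arg in logical_plan]


def execute(physical_plan, root):
    """Run the operators in sequence, starting from a virtual document node."""
    doc = ET.Element("#document")  # virtual parent so the first step matches the root
    doc.append(root)
    nodes = [doc]
    for operator in physical_plan:
        nodes = operator(nodes)
    return nodes


# Hypothetical sample document, invented for illustration.
ROOT = ET.fromstring(
    "<bibliography>"
    "<book><author>A</author></book>"
    "<book><author>B</author><title>T</title></book>"
    "</bibliography>"
)

logical = parse_query("/bibliography/./book/author")
optimized = optimize(logical)
names = [a.text for a in execute(to_physical(optimized), ROOT)]
print(logical)
print(optimized)
print(names)
```

The optimized plan has one operator fewer than the logical plan but returns the same nodes, which is the defining property of a valid rewrite: the transformed query must be equivalent to the query as entered by the user.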

Figure 3: Steps followed in Query Processing

Given a query, there are generally a variety of methods for computing the answer. It is the responsibility of the system to transform the query as entered by the user into an equivalent query that can be computed more efficiently. The process of finding a good strategy for processing a query is called query optimization. Among the multiple query plans that satisfy a query, query optimization selects an appropriate plan, which is then evaluated by the system. The proposed approach performs query optimization within the XML database management system. When a query is optimized, the time needed to traverse the path is minimized, which improves the throughput of the system.

7. Conclusion
An XML query language defines comprehensible, structured constructs for operating on one or more XML documents. XPath is used for navigating an XML tree and is fairly mature and stable. XQuery is an XML query language that provides the means to extract and manipulate data from XML documents. It is very powerful (as opposed to relational algebra); however, query processing and optimization are hard. XSLT simplifies the manipulation of XML Schema-typed content, supports sorting nodes based on XML Schema type, and simplifies grouping. XML can store and organize any kind of information in a form that is tailored to the user's needs, and it is a potential fit as an exchange format. XML applications are typically Web-based and have many simultaneous, interactive users. This dynamic nature requires highly efficient XML query processing. Query processing allows users to query seamlessly across XML documents.

REFERENCES
[1] Christian Mathis, Theo Härder (University of Kaiserslautern), "A Query Processing Approach for XML Database Systems," 17. Workshop Grundlagen von Datenbanken, Wörlitz, May 2005, proceedings.
[2] Jayavel Shanmugasundaram, Eugene Shekita, Jerry Kiernan (IBM Almaden Research Center, San Jose, CA 95120), Rajasekar Krishnamurthy, Efstratios Viglas, Jeffrey Naughton (University of Wisconsin, Madison, WI 53706), Igor Tatarinov (University of Washington, Seattle, WA 98195), "A General Technique for Querying XML Documents using a Relational Database System," SIGMOD Record, Vol. 30, No. 3, September 2001.
[3] http://www.wiscorp.com/XQuery1.0AnXMLQueryLanguage.pdf
[4] http://www.cs.gsu.edu/yli/teaching/Spring08/DB/slides/bhavin.ppt
[5] Su-Cheng Haw, Chien-Sing Lee, "Data storage practices and query processing in XML databases: A survey," Knowledge-Based Systems 24 (2011) 1317-1340, Elsevier.
[6] Leonidas Fegaras (University of Texas at Arlington), "Query Processing of Streamed XML Data," http://lambda.uta.edu/uta07.ppt
[7] Riham Abdel Kader, Maurice van Keulen, "Overview of Query Optimization in XML Database Systems," November 12, 2007. (Online) Available: http://doc.utwente.nl/64449/1/litterature_review.pdf
[8] http://www.cs.toronto.edu/~ryanjohn/t...cscc43s12/lectures/c43-xpath-v04.pdf
[9] http://www.cs.duke.edu/courses/fall02/cps196.3/lectures/15-xml-notes.pdf
[10] http://edutechwiki.unige.ch/en/XPath tutorial-basicsEduTech Wiki.htm
[11] http://www.cs.ox.ac.uk/dan.olteanu/tutorials/xslt1.pdf
[12] http://research.cs.wisc.edu/niagara/papers/step.pdf
[13] Mikael Fernandus Simalango, "XML Query Processing and Query Languages: A Survey," tech.amikelive.com/.../XML_Query_Processingtechnical_paper.pdf
[14] Don Box, Aaron Skonnard, John Lam, "Essential XML: Beyond Markup," Pearson Education, Inc., published by Dorling Kindersley (India) Pvt. Ltd. (First Impression 2006).
[15] http://www.ccse.kfupm.edu.sa/~nizar/download/XML_Mabroukeh-KFUPM2001.ppt
[16] http://www.aviationia.com/aeec/projects/aoc/XSLT_Examples.pdf
