Unit 2
XPath (XML Path Language) is a query language used to navigate and locate elements or
attributes in an XML (Extensible Markup Language) document. XPath provides a way to
traverse the structure of XML documents, which are typically organized as trees, to extract
data or perform operations on elements and attributes.
Key Features:
Path Expressions: XPath uses path expressions to select nodes in an XML document.
Navigation: It allows the navigation of XML documents using elements, attributes,
text, etc.
Syntax: The syntax is similar to a file path or directory structure, allowing you to
access various parts of an XML document.
Components of XPath:
1. Nodes: XPath operates on nodes like elements, attributes, text, and more.
2. Axes: Define the relationship between nodes (e.g., parent, child, sibling).
3. Predicates: Conditions used to filter the nodes.
Example:
<bookstore>
<book>
<title lang="en">The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<price>10.99</price>
</book>
<book>
<title lang="fr">Le Petit Prince</title>
<author>Antoine de Saint-Exupéry</author>
<price>12.99</price>
</book>
</bookstore>
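To select all title elements anywhere in the document:
//title
To select the title of the book where the language attribute is "en":
//book/title[@lang='en']
To select the price of the second book:
/bookstore/book[2]/price
This returns the price element whose text is "12.99".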
Conclusion:
XPath is a powerful tool for querying and manipulating XML documents, and it’s widely
used in combination with other technologies like XSLT and XQuery for processing XML
data.
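As a minimal sketch, the three queries from the bookstore example can be tried with Python's standard xml.etree.ElementTree module. This module supports only a subset of XPath, and paths are evaluated relative to the element they are called on, so //title is written .//title here. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>10.99</price>
  </book>
  <book>
    <title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author>
    <price>12.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# //title : every <title> at any depth
all_titles = [t.text for t in root.findall(".//title")]

# //book/title[@lang='en'] : titles whose lang attribute is "en"
english_titles = [t.text for t in root.findall(".//book/title[@lang='en']")]

# /bookstore/book[2]/price : price of the second book
second_price = root.find("book[2]/price").text

print(all_titles)
print(english_titles)
print(second_price)
```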
XPath is highly versatile and powerful, and it allows for complex queries and navigation
within XML documents. Below are some additional concepts and examples to deepen your
understanding of XPath:
1. Node Selection: To select nodes, you use path expressions. These can be absolute
(starting from the root) or relative (starting from the current node).
o Absolute Path: Starts from the root of the document.
o /bookstore/book/title
This selects every title element that is a child of a book element, which in turn is a
child of the bookstore root element.
This would select title nodes that are descendants of book elements,
regardless of the position of book.
2. Wildcard (*):
3. Selecting Attributes:
4. Predicates:
Predicates are used to filter results based on conditions within square brackets [].
This helps you narrow down the selection.
o Example: Selecting the title of the first book:
o /bookstore/book[1]/title
This will return the title of the first book, e.g., "The Great Gatsby".
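o Example: Selecting the titles of books whose price is greater than 10:
o /bookstore/book[price > 10]/title
This will return the titles of books where the price is greater than 10. In the example
above both books qualify (10.99 and 12.99), so both "The Great Gatsby" and
"Le Petit Prince" are returned.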
5. Logical Operators:
XPath supports logical operators like and, or, and not to create more complex
expressions.
o Example: Select books where the price is greater than 10 and the language is
"fr":
o /bookstore/book[price > 10 and title/@lang='fr']
6. Axes in XPath:
XPath defines several axes that describe the relationship between nodes. Here are some
important axes:
child: Selects children of the current node (this is the default axis).
/bookstore/book/child::title
descendant: Selects all descendants (children, grandchildren, etc.) of the current
node.
/bookstore/descendant::title
parent: Selects the parent of the current node.
/bookstore/book/title/parent::book
following-sibling: Selects the siblings that come after the current node.
/bookstore/book/title/following-sibling::author
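The axes above can be sketched with Python's standard xml.etree.ElementTree module, under the assumption that its limited XPath subset is enough for illustration: it covers the child, descendant (.//) and parent (..) steps, but not following-sibling, which is emulated here with plain list slicing. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    "<book><title>The Great Gatsby</title><author>F. Scott Fitzgerald</author></book>"
    "<book><title>Le Petit Prince</title><author>Antoine de Saint-Exupéry</author></book>"
    "</bookstore>"
)

# child axis: /bookstore/book/child::title
child_titles = [t.text for t in root.findall("book/title")]

# descendant axis: /bookstore/descendant::title
descendant_titles = [t.text for t in root.findall(".//title")]

# parent axis: /bookstore/book/title/parent::book (".." steps up to the parent)
parents = [p.tag for p in root.findall("book/title/..")]

# following-sibling::author is not supported by ElementTree, so take the
# children that come after <title> inside the first <book> and keep the authors.
first_book = root.find("book")
children = list(first_book)
title_index = children.index(first_book.find("title"))
following = [c.tag for c in children[title_index + 1:] if c.tag == "author"]

print(child_titles, descendant_titles, parents, following)
```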
7. XPath Functions:
XPath also has built-in functions to make it easier to extract or manipulate data. Some
common functions include:
<library>
<book>
<title lang="en">Moby Dick</title>
<author>Herman Melville</author>
<year>1851</year>
<price>15.99</price>
</book>
<book>
<title lang="fr">Les Misérables</title>
<author>Victor Hugo</author>
<year>1862</year>
<price>20.99</price>
</book>
</library>
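XPath Query: Find the title of books published after 1850 in English.
/library/book[year > 1850 and title/@lang='en']/title
This will return the title "Moby Dick" since it was published after 1850 and is in
English. "Les Misérables" was also published after 1850 but is in French, so it is excluded.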
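The query on the library document can be checked with a short Python sketch. Python's standard xml.etree.ElementTree module does not support numeric comparisons in predicates, so the condition year > 1850 and title/@lang='en' is applied in Python rather than inside the path; this is an equivalent filter, not an XPath evaluation.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""<library>
  <book>
    <title lang="en">Moby Dick</title>
    <author>Herman Melville</author>
    <year>1851</year>
    <price>15.99</price>
  </book>
  <book>
    <title lang="fr">Les Misérables</title>
    <author>Victor Hugo</author>
    <year>1862</year>
    <price>20.99</price>
  </book>
</library>""")

# Equivalent of /library/book[year > 1850 and title/@lang='en']/title
matches = [
    book.find("title").text
    for book in library.findall("book")
    if int(book.find("year").text) > 1850
    and book.find("title").get("lang") == "en"
]

print(matches)
```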
Conclusion:
XPath is an essential tool for anyone working with XML data. Whether you're transforming
XML data, querying an XML document, or using it as part of a larger application,
understanding XPath's flexibility and functionality will help you extract and manipulate data
efficiently. With its combination of path navigation, conditions, axes, and built-in functions,
XPath can perform complex queries on XML documents.
In XML, nodes are the building blocks of an XML document. An XML document is
essentially structured as a tree, where each part of the document (such as an element,
attribute, text, etc.) is a node in that tree. XPath and other XML-related tools navigate this
tree and manipulate nodes to extract data or perform operations.
1. Element Node:
Example:
<book>
<title>Harry Potter</title>
<author>J.K. Rowling</author>
<price>19.99</price>
</book>
In this example, the <book>, <title>, <author>, and <price> are element nodes.
The <book> node contains child nodes (<title>, <author>, <price>), and each of
those is also an element node.
2. Attribute Node:
Attribute nodes are associated with elements. They provide additional information
about the element but are not child elements.
Attributes are part of the element node and are used to store data in a key-value pair
format.
Example:
<book lang="en">
<title>Harry Potter</title>
<author>J.K. Rowling</author>
</book>
Here, lang="en" is an attribute node of the <book> element. The attribute is named lang,
and its value is "en".
3. Text Node:
A text node holds the text content inside an element. It doesn't have child nodes; it
stores the actual character data of the element. (Attribute values belong to attribute
nodes, not text nodes.)
Example:
<title>Harry Potter</title>
In this example, "Harry Potter" is a text node within the <title> element.
4. Comment Node:
Comment nodes represent comments in the XML document. These comments are not
part of the data and are used for documentation or to leave notes for other users.
Example:
<!-- This is a comment about the bookstore -->
6. Document Node:
The document node represents the entire XML document. It is the root of the XML
tree and is typically not visible in the XML text but is implied when processing the
document.
The document node is the topmost parent of all elements and nodes in the XML
document.
7. Namespace Node (optional in XML):
Namespace nodes define the XML namespace for elements and attributes to avoid
naming conflicts.
They are used in XML documents where elements and attributes may be reused
across different XML documents.
Breakdown of Nodes:
Document Node: The root node of the XML document, representing the entire
document.
Element Nodes:
o <bookstore>: Root element node.
o <book>, <title>, <author>, <price>: Element nodes that represent various
parts of the XML structure.
Attribute Nodes:
o lang="en" and lang="fr": Attribute nodes that provide information about
the language of each book.
Text Nodes:
o "Harry Potter", "J.K. Rowling", "19.99", "Le Petit Prince", etc., are text nodes
containing the actual content of the elements.
Comment Node: <!-- This is a comment about the bookstore --> is a
comment node.
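The node types in this breakdown can be inspected with Python's standard xml.dom.minidom module, which exposes a nodeType constant for each node. The small document below is an assumed, shortened version of the bookstore example, written without whitespace between tags so that no extra text nodes appear.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    "<bookstore><!-- This is a comment about the bookstore -->"
    '<book lang="en"><title>Harry Potter</title></book></bookstore>'
)

bookstore = doc.documentElement          # root element node
comment, book = bookstore.childNodes     # comment node, then <book> element
title = book.firstChild                  # <title> element node
text = title.firstChild                  # text node inside <title>
lang = book.getAttributeNode("lang")     # attribute node on <book>

print(doc.nodeType == Node.DOCUMENT_NODE)
print(book.nodeType == Node.ELEMENT_NODE)
print(lang.nodeType == Node.ATTRIBUTE_NODE, lang.value)
print(text.nodeType == Node.TEXT_NODE, text.data)
print(comment.nodeType == Node.COMMENT_NODE, comment.data)
```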
XPath is used to query XML documents and select nodes. Here's how you can use XPath to
select different types of nodes:
/bookstore/book
This will select all <book> element nodes under the <bookstore> element.
Summary:
Nodes in XML are the basic units of structure, representing elements, attributes, text,
comments, etc.
Element nodes hold the main structure, and each element may contain text nodes,
attribute nodes, and child element nodes.
Attribute nodes store additional information about an element, and text nodes
contain the actual content.
Comment nodes and processing instruction nodes provide metadata or
documentation within the XML.
XPath is commonly used to query and navigate these nodes, selecting specific parts of
an XML document.
In XPath, a location path is an expression used to navigate through elements and attributes
in an XML document. It defines the path from the root element to the target nodes in the
XML structure, helping you select specific parts of the document. Location paths are the core
of XPath queries, and they are used in conjunction with axes, predicates, and operators to
locate nodes.
1. Steps:
o A location path is made up of one or more steps.
o Each step is separated by a / (forward slash).
o A step usually consists of an axis, a node test, and optional predicates.
For example:
/bookstore/book/title
2. Axis:
o The axis defines the direction in which to navigate relative to the current node.
o There are several axes available, like child, parent, attribute, etc.
3. Node Test:
o The node test specifies the type of node to be selected, such as elements (book,
title), attributes (@lang), or text (text()).
4. Predicates:
o Predicates are conditions placed in square brackets [] to filter the nodes
selected by the location path. For example, [1] selects the first node, or
[price > 20] filters nodes based on the value of price.
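The parts of a step listed above (node test, attribute selection, text selection, positional predicate) can be sketched in Python with the standard xml.etree.ElementTree module. ElementTree does not accept @lang or text() as the final step of a path, so the attribute and the text are read through the element's get() method and text property; the XML and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    '<book lang="en"><title>Harry Potter</title></book>'
    '<book lang="fr"><title>Le Petit Prince</title></book>'
    "</bookstore>"
)

# Step "book" with the predicate [1]: the first <book> child of <bookstore>.
first_book = root.find("book[1]")

title_element = first_book.find("title")  # element node test: book[1]/title
lang_value = first_book.get("lang")       # attribute:         book[1]/@lang
title_text = title_element.text           # text node:         book[1]/title/text()

print(title_element.tag, lang_value, title_text)
```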
An absolute location path always begins with a single forward slash (/). This indicates that
the path starts from the root of the XML document.
Structure:
/axis::node-test
Example:
<bookstore>
<book lang="en">
<title>Harry Potter</title>
<author>J.K. Rowling</author>
</book>
<book lang="fr">
<title>Le Petit Prince</title>
<author>Antoine de Saint-Exupéry</author>
</book>
</bookstore>
The XPath expression
/bookstore/book/title
selects the <title> element of every <book>. Their text values are:
"Harry Potter"
"Le Petit Prince"
A relative location path doesn't start with / (i.e., it doesn't start from the root). Instead, it
starts from the current node (context node). This is useful when you want to select nodes
relative to the current position in the document.
book/title
This selects the <title> element inside every <book> element that is a child of the
current context node.
It assumes the context node is the parent of the <book> elements (here, <bookstore>).
If the context node is itself a <book> element, book/title selects nothing, because that
<book> has no <book> children; the relative path title returns the <title> of that
specific book.
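The effect of the context node on a relative path can be seen in a small Python sketch using the standard xml.etree.ElementTree module, where every path is evaluated relative to the element the method is called on. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    '<book lang="en"><title>Harry Potter</title><author>J.K. Rowling</author></book>'
    '<book lang="fr"><title>Le Petit Prince</title>'
    "<author>Antoine de Saint-Exupéry</author></book>"
    "</bookstore>"
)

# Context node = <bookstore>: book/title reaches the title of every book.
from_bookstore = [t.text for t in root.findall("book/title")]

# Context node = the second <book>: book/title finds nothing, because this
# <book> has no <book> children. The relative path "title" is what works.
second_book = root.findall("book")[1]
from_book_wrong = second_book.findall("book/title")
from_book_right = [t.text for t in second_book.findall("title")]

print(from_bookstore, from_book_wrong, from_book_right)
```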
1. Axes:
o The axis defines the direction of navigation from the current node. Axes are
essential to understand because they specify where the query should look
relative to the current node.
Example:
/bookstore/book/descendant::title
o This expression starts from the <bookstore> element, moves to each <book>
child, and then selects every <title> descendant of that <book>, at any depth.
Other descendants such as <author> are not selected, because the node test is title.
2. Node Test: The node test is used to define the specific type of node to select. In
XPath, you can specify:
o Element Nodes (like book, author, price).
o Attribute Nodes (like @lang).
o Text Nodes (like text() to select the actual text content).
o Comment Nodes (using comment()).
Example:
/bookstore/book/title
o The node test here is title, so this selects the <title> element nodes inside
the <book> element.
3. Predicates: Predicates are used to filter nodes based on conditions. They are enclosed
in square brackets []. Predicates can select specific nodes based on their position,
content, or other criteria.
o Position-based filtering: For example, [1] selects the first node, [2] selects
the second node, and so on.
o Value-based filtering: For example, [price > 20] selects nodes with a price
greater than 20.
o Text content filtering: For example, [author='J.K. Rowling'] selects
nodes where the text content matches "J.K. Rowling".
Examples:
/bookstore/book[1]/title
This selects the <title> of the first <book> element.
/bookstore/book[price > 20]/title
This selects the <title> of books where the <price> is greater than 20.
/bookstore/book[author='J.K. Rowling']/title
This selects the <title> of books where the <author> is "J.K. Rowling".
1. Absolute Location Path:
/bookstore/book/title
o This expression starts from the root of the document and selects the <title>
element inside each <book> element that is a child of <bookstore>.
o Result:
"Harry Potter"
"Le Petit Prince"
2. Relative Location Path:
book/title
o If the current node is <bookstore>, this expression will select the <title> element
inside each of its <book> children.
o Result (if the current node is <bookstore>):
"Harry Potter"
"Le Petit Prince"
3. Using @ to Select Attributes:
/bookstore/book/@lang
o This selects the lang attribute of all <book> elements under the root
<bookstore>.
o Result:
"en"
"fr"
4. Selecting Text Nodes:
/bookstore/book/title/text()
o This selects the text content of the <title> element inside each <book>.
o Result:
"Harry Potter"
"Le Petit Prince"
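Further examples using axes, attributes, and predicates:
/bookstore/child::book
This selects the <book> children of <bookstore>; it is the unabbreviated form of
/bookstore/book.
/bookstore/book/parent::bookstore
This selects the <bookstore> parent of the <book> elements.
/bookstore/descendant::title
This selects every <title> descendant of <bookstore>.
/bookstore/book/following-sibling::book
This selects every <book> that has an earlier <book> sibling (here, the second book).
/bookstore/book/preceding-sibling::book
This selects every <book> that has a later <book> sibling (here, the first book).
/bookstore/book/@lang
This selects the lang attribute of every <book>.
/bookstore/book[1]/title
This selects the <title> of the first <book>.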
/bookstore/book[price > 20]/title
This selects the <title> elements of books where the <price> is greater than 20.
/bookstore/book[2][@lang='fr']/title
This selects the <title> of the second <book>, provided that book has the attribute
lang="fr".
/bookstore/book[author='J.K. Rowling']/title
This selects the <title> elements of books where the <author> is "J.K. Rowling".
/bookstore/book/title
This will return the title elements of all <book> elements inside <bookstore>.
/bookstore/book/author
This will return all <author> elements inside <book> elements.
/bookstore/book[2]/title
This will return the <title> element of the second <book>.
Conclusion:
Location Paths are the core of XPath queries, allowing you to navigate through XML
documents to select specific nodes (elements, attributes, etc.).
Location paths can be absolute (starting from the root) or relative (starting from the
current node).
XPath supports various axes (e.g., child, descendant, parent, attribute) to define
how to navigate to the desired node.
Predicates within square brackets are used to filter nodes based on conditions, such
as selecting specific positions, values, or attributes.
In XPath, node sets represent collections of nodes (elements, attributes, text, etc.) that are
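selected using location paths. These node sets can be manipulated using operators and
functions that allow you to combine, filter, and process the data more effectively (XPath 1.0
itself has no sorting; ordering is left to the host language, such as <xsl:sort> in XSLT). Understanding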
these operators and functions is crucial for working with complex XPath expressions and
extracting meaningful information from XML documents.
Node set operators allow you to perform operations on sets of nodes. These operators enable
you to combine or filter node sets in specific ways.
The union operator (|) combines two or more node sets. It returns a node set that contains all
the nodes from both operands, removing duplicates.
Syntax:
node-set1 | node-set2
Description:
o This operator combines two node sets into one, including nodes from both
node sets.
o The result will not contain duplicates. If the same node appears in both sets, it
will appear only once.
Example:
Given the following XML:
<bookstore>
<book lang="en">
<title>Harry Potter</title>
<author>J.K. Rowling</author>
</book>
<book lang="fr">
<title>Le Petit Prince</title>
<author>Antoine de Saint-Exupéry</author>
</book>
</bookstore>
An XPath expression to select both <title> and <author> elements using the union operator
would be:
/bookstore/book/title | /bookstore/book/author
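Python's standard xml.etree.ElementTree module has no | operator, but the behaviour described above (all nodes from both sets, no duplicates, document order) can be emulated as a rough sketch with a set union followed by a sort on document position. The helper dictionary order is an illustrative name, not part of any API.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    '<book lang="en"><title>Harry Potter</title><author>J.K. Rowling</author></book>'
    '<book lang="fr"><title>Le Petit Prince</title>'
    "<author>Antoine de Saint-Exupéry</author></book>"
    "</bookstore>"
)

titles = root.findall("book/title")    # /bookstore/book/title
authors = root.findall("book/author")  # /bookstore/book/author

# Position of every element in document order, used to sort the merged set.
order = {element: index for index, element in enumerate(root.iter())}

# The set union removes duplicates; sorting restores document order.
union = sorted(set(titles) | set(authors), key=order.__getitem__)
union_texts = [element.text for element in union]

print(union_texts)
```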
XPath doesn't have a direct "intersection" operator like in set theory, but the and operator in
predicates is used to filter nodes based on multiple conditions.
Syntax:
condition1 and condition2
Description:
o The and operator combines conditions within a predicate. It checks if both
conditions are true.
o It is commonly used inside predicates to filter the nodes.
Example:
/bookstore/book[price > 20 and @lang='en']/title
This expression:
Filters the <book> elements by checking if the price is greater than 20 and the
language attribute (@lang) is "en".
It then selects the <title> element of books that satisfy both conditions.
The intersection-like behavior in this case ensures that only books meeting both conditions
(price > 20 and lang = "en") are selected.
XPath does not have a direct "difference" operator either, but it can be achieved using not()
within predicates to exclude nodes from a set.
Syntax:
not(condition)
Description:
o The not() function returns true when its argument is false, so it can be used to
exclude nodes based on a condition.
o Inside a predicate, it filters out the nodes that meet the condition and keeps those
that do not.
Example:
<bookstore>
<book lang="en">
<title>Harry Potter</title>
<price>29.99</price>
</book>
<book lang="fr">
<title>Le Petit Prince</title>
<price>19.99</price>
</book>
</bookstore>
An XPath expression to select the titles of all books where the price is not greater than 20
would be:
/bookstore/book[not(price > 20)]/title
This expression:
Selects the <title> of every <book> element where the <price> is not greater than 20.
The result would be the title "Le Petit Prince" (because its price is 19.99, which is
not greater than 20).
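The two filters discussed above, a predicate joined with and and a predicate negated with not(), can be sketched in Python. The standard xml.etree.ElementTree module cannot evaluate price > 20 inside a path, so the conditions are written as ordinary Python expressions; the helper names titles_where and price_of are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<bookstore>
  <book lang="en">
    <title>Harry Potter</title>
    <price>29.99</price>
  </book>
  <book lang="fr">
    <title>Le Petit Prince</title>
    <price>19.99</price>
  </book>
</bookstore>""")


def price_of(book):
    """Return the numeric value of the book's <price> child."""
    return float(book.find("price").text)


def titles_where(condition):
    """Return the titles of the books for which condition(book) is true."""
    return [b.find("title").text for b in root.findall("book") if condition(b)]


# /bookstore/book[price > 20 and @lang='en']/title
expensive_english = titles_where(lambda b: price_of(b) > 20 and b.get("lang") == "en")

# /bookstore/book[not(price > 20)]/title
not_expensive = titles_where(lambda b: not (price_of(b) > 20))

print(expensive_english, not_expensive)
```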
2. XPath Functions
XPath provides a rich set of functions that can be used to process node sets, strings, numbers,
and booleans. Below are some of the most commonly used functions.
a) position() Function
The position() function is used to get the position of a node in a node set. This is helpful
for selecting elements at specific positions in a list.
Syntax:
position()
Description:
o It returns the position of the current node in a node set (starts from 1).
o It is often used with predicates to select nodes at specific positions.
Example:
/bookstore/book[position() = 2]/title
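This expression:
Selects the <title> of the second <book> element in the <bookstore>. It is equivalent to
the abbreviated form /bookstore/book[2]/title.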
b) last() Function
The last() function returns the position of the last node in a node set.
Syntax:
last()
Description:
o This function is used to reference the last node in a node set, regardless of its
actual position.
Example:
/bookstore/book[last()]/title
This expression:
Selects the <title> of the last <book> element in the <bookstore>, regardless of the
number of books.
c) count() Function
The count() function returns the number of nodes in a node set. It is useful for determining
the size of a node set or checking conditions based on the number of nodes.
Syntax:
count(node-set)
Description:
o It returns the number of nodes in the provided node set.
Example:
count(/bookstore/book)
This expression:
Returns the total number of <book> elements under <bookstore>. In the example
above, it would return 2.
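The position(), last(), and count() examples can be tried with Python's standard xml.etree.ElementTree module, which accepts the abbreviated positional predicates [2] and [last()]. It has no count() function, so the number of matches is taken with len(); this is a sketch under those assumptions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    "<book><title>Harry Potter</title></book>"
    "<book><title>Le Petit Prince</title></book>"
    "</bookstore>"
)

# /bookstore/book[position() = 2]/title, written in abbreviated form as book[2]
second_title = root.find("book[2]/title").text

# /bookstore/book[last()]/title
last_title = root.find("book[last()]/title").text

# count(/bookstore/book)
book_count = len(root.findall("book"))

print(second_title, last_title, book_count)
```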
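d) text() Node Test
text() selects the text nodes that are children of an element. Although it is written like a
function call and is often listed with the functions, it is a node test used as a step in a
location path.
Syntax:
text()
Description:
o It is used to select the text content of an element. Attributes have no text node
children; their value is obtained with @name or string().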
Example:
/bookstore/book/title/text()
This expression:
Selects the text content of the <title> element inside each <book>.
e) normalize-space() Function
The normalize-space() function trims leading and trailing whitespace from a string and
replaces multiple spaces between words with a single space.
Syntax:
normalize-space(string)
Description:
o It is useful for cleaning up the text content by removing unnecessary spaces.
Example:
normalize-space(/bookstore/book/title)
This expression:
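Trims any extra whitespace around the text inside the <title> element and collapses
runs of whitespace within the text into a single space. In XPath 1.0, when the argument
selects several nodes, only the first one in document order is used.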
f) string() Function
The string() function converts a node into a string. It can be used to get the string value of
a node, such as text or attribute values.
Syntax:
string(node)
Description:
o It converts the given node to a string.
Example:
string(/bookstore/book/title)
This expression:
Converts the <title> element of the first <book> node into a string, which will be the
text inside the <title> (e.g., "Harry Potter").
g) contains() Function
Syntax:
contains(string, substring)
Description:
o It returns true if the first string contains the second string, otherwise false.
Example:
/bookstore/book/title[contains(text(), 'Harry')]
This expression:
Selects the <title> of books where the title contains the substring "Harry".
h) starts-with() Function
Syntax:
starts-with(string, substring)
Description:
o It returns true if the first string starts with the second string.
Example:
/bookstore/book/title[starts-with(text(), 'Le')]
This expression:
Selects the <title> element of books where the title starts with "Le" (e.g., "Le Petit
Prince").
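The string functions above have close counterparts in plain Python, which gives a quick way to check what a predicate would keep. This is a sketch of equivalent behaviour, not an XPath evaluation; normalize_space is a hypothetical helper, and Python's split() treats a slightly wider set of characters as whitespace than XPath does.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    "<book><title>Harry Potter</title></book>"
    "<book><title>Le Petit Prince</title></book>"
    "</bookstore>"
)

titles = [t.text for t in root.findall("book/title")]

# /bookstore/book/title[contains(text(), 'Harry')]
with_harry = [t for t in titles if "Harry" in t]

# /bookstore/book/title[starts-with(text(), 'Le')]
starts_with_le = [t for t in titles if t.startswith("Le")]


def normalize_space(value):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(value.split())


cleaned = normalize_space("  Harry   Potter \n")

print(with_harry, starts_with_le, repr(cleaned))
```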
Conclusion
XPath provides a range of operators and functions that make working with node sets more
flexible and powerful. By using the union (|) operator, positioning functions like
position() and last(), and other string and numerical functions like normalize-space(),
string(), and contains(), you can create sophisticated queries to extract and
manipulate data from XML documents.
Mastering these tools will enable you to write precise and efficient XPath queries for a wide
range of XML data processing tasks.
XSLT works by applying templates (rules) to the XML document's content, using a
combination of XPath expressions to match nodes in the source document and apply
transformations to them. XSLT is based on XML syntax and follows a strict tree-based
transformation approach, making it suitable for complex transformations.
An XSLT stylesheet consists of an XML document that uses the <xsl:stylesheet> element
as the root element. The basic structure includes:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Define templates and transformations here -->
</xsl:stylesheet>
Let’s start with a simple example to demonstrate how XSLT can be used to transform XML
data into HTML format.
In this XML document, we have a <bookstore> element that contains multiple <book>
elements, each having a <title>, <author>, and <price>.
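<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h1>Bookstore Catalog</h1>
<table border="1">
<tr>
<th>Title</th>
<th>Author</th>
<th>Price</th>
</tr>
<xsl:for-each select="bookstore/book">
<tr>
<td><xsl:value-of select="title" /></td>
<td><xsl:value-of select="author" /></td>
<td><xsl:value-of select="price" /></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>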
Explanation:
The <xsl:template match="/"> element defines a template that matches the root node of
the XML document (the document node, whose child is the <bookstore> element).
The <xsl:for-each select="bookstore/book"> element iterates over each
<book> in the <bookstore> element.
<xsl:value-of select="title" /> extracts and outputs the value of the <title>
element for each <book>, and similarly for <author> and <price>.
The output is an HTML table that displays the bookstore catalog.
Output HTML:
After applying this XSLT transformation to the provided XML input, the result would look
like:
<html>
<body>
<h1>Bookstore Catalog</h1>
<table border="1">
<tr>
<th>Title</th>
<th>Author</th>
<th>Price</th>
</tr>
<tr>
<td>Harry Potter</td>
<td>J.K. Rowling</td>
<td>29.99</td>
</tr>
<tr>
<td>Le Petit Prince</td>
<td>Antoine de Saint-Exupéry</td>
<td>19.99</td>
</tr>
</table>
</body>
</html>
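Python's standard library has no XSLT processor, so the following is not XSLT; it is only a sketch of the same logic in Python, to show what the for-each loop and the value-of instructions do to produce the table rows above. The input XML is assumed from the values shown in the output.

```python
import xml.etree.ElementTree as ET
from html import escape

root = ET.fromstring("""<bookstore>
  <book>
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <price>29.99</price>
  </book>
  <book>
    <title>Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author>
    <price>19.99</price>
  </book>
</bookstore>""")

rows = []
for book in root.findall("book"):  # like <xsl:for-each select="bookstore/book">
    cells = "".join(
        f"<td>{escape(book.findtext(name))}</td>"  # like <xsl:value-of select="..."/>
        for name in ("title", "author", "price")
    )
    rows.append(f"<tr>{cells}</tr>")

header = "<tr><th>Title</th><th>Author</th><th>Price</th></tr>"
table = '<table border="1">' + header + "".join(rows) + "</table>"

print(table)
```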
Now that we’ve seen a basic example, let’s dive deeper into the key elements and functions
used in XSLT.
The <xsl:value-of> element extracts the string value of the node selected by its select
attribute. It is often used for outputting the text content of XML nodes.
Example:
<xsl:value-of select="title" />
The <xsl:for-each> element is used to iterate over a set of nodes. It is similar to a "for"
loop in programming.
Syntax:
<xsl:for-each select="node-set">
<!-- Transformation logic here -->
</xsl:for-each>
Example:
<xsl:for-each select="bookstore/book">
<p><xsl:value-of select="title" /></p>
</xsl:for-each>
This will display the titles of all the books in the bookstore.
Templates define how nodes are transformed. They can match specific elements or attributes
and apply the desired transformation.
Example:
<xsl:template match="book">
<h2><xsl:value-of select="title" /></h2>
</xsl:template>
This will apply the transformation to all <book> elements and display their titles as
<h2>.
The <xsl:apply-templates> element applies templates to child nodes of the current node.
Example:
<xsl:apply-templates select="book" />
The <xsl:if> element allows you to include conditional logic in your transformation. It
works like an if statement in programming.
Example:
<xsl:if test="price > 20">
<p>Price is greater than 20</p>
</xsl:if>
This checks if the price is greater than 20 and, if so, displays the message.
The <xsl:choose> element is used for more complex conditional logic, allowing you to
define multiple "if-else" conditions.
Example:
<xsl:choose>
<xsl:when test="price > 20">Expensive</xsl:when>
<xsl:otherwise>Affordable</xsl:otherwise>
</xsl:choose>
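This outputs "Expensive" when the price is greater than 20 and "Affordable" otherwise.
A template can also create new elements in the output, for example a <book lang="en">
element with the book title inserted inside.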
XSLT heavily relies on XPath expressions to select nodes from the XML document. XPath
allows you to navigate and manipulate the XML document's structure.
Conclusion
XSLT is a powerful tool for transforming XML data into other formats such as HTML, text,
or another XML structure. It is based on XML syntax and uses XPath for selecting parts of
the document to transform. By using templates, conditional statements, loops, and XPath
expressions, you can apply complex transformations and generate a wide variety of outputs.
Mastering XSLT opens up many possibilities for working with and displaying XML data in a
clean, readable, and useful format.
XSL-FO is a declarative XML-based language used to describe the structure and style of
documents. The language allows designers to specify how the content of an XML document
should be visually presented in terms of fonts, colors, margins, page layouts, and more. Once
you define the layout and formatting in an XSL-FO stylesheet, the formatted content can be
rendered using an XSL-FO processor, which converts the FO document into a desired output
format like PDF or PostScript.
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<!-- Define page layouts -->
</fo:layout-master-set>
<fo:page-sequence master-reference="simple-page">
<fo:flow flow-name="xsl-region-body">
<!-- Content to be displayed on the page -->
</fo:flow>
</fo:page-sequence>
</fo:root>
Let's take a simple example of an XSL-FO document that will format a document to look like
a simple book, with a title, author, and text content.
<bookstore>
<book>
<title>Harry Potter and the Sorcerer's Stone</title>
<author>J.K. Rowling</author>
<price>29.99</price>
</book>
<book>
<title>Le Petit Prince</title>
<author>Antoine de Saint-Exupéry</author>
<price>19.99</price>
</book>
</bookstore>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
</fo:flow>
</fo:root>
1. Page Layout:
o The <fo:layout-master-set> defines a simple page layout using
<fo:simple-page-master>. The page has a height of 297mm, a width of
210mm (A4 size), and 1-inch margins all around.
o The <fo:region-body> inside the layout defines the region of the page where
content will be placed.
2. Content Flow:
o The <fo:flow> element specifies the content that should appear on the page.
The flow-name="xsl-region-body" attribute indicates that this content will
be placed inside the body region defined in the page layout.
3. Text Content:
o The <fo:block> element is used to define block-level content. Each block
will contain a piece of the book's data (title, author, and price).
o The font-size, font-weight, and text-align properties are used to style
the content. For example, the title is centered with a large font size and bold
style.
o The margin-top property creates space between different blocks of content.
4. Dynamic Data:
o The <xsl:value-of> elements pull data from the XML input document (e.g.,
<title>, <author>, and <price> of the books) and output them as formatted
text.
Rendering the Output
Once the XSL-FO document is created, you can use an XSL-FO processor like Apache
FOP (Formatting Objects Processor) to render the XSL-FO document into a desired output
format (such as PDF or PostScript).
To generate a PDF output from the XSL-FO document using Apache FOP:
fop -xml input.xml -xsl transform.xsl -pdf output.pdf
This command will read the transform.xsl stylesheet and the input.xml file, and output
the formatted document as output.pdf.
1. <fo:block>: This element is used for block-level content like paragraphs or sections.
o Example:
<fo:block font-size="12pt" line-height="14pt">This is a paragraph.</fo:block>
2. <fo:inline>: This element is used for inline content, like text or images, that appears
within a block.
o Example:
<fo:inline font-style="italic">Italicized text</fo:inline>
3. <fo:table>: Used for creating tables. Each cell in a row needs a matching column.
o Example:
<fo:table>
<fo:table-column column-width="50mm"/>
<fo:table-column column-width="50mm"/>
<fo:table-body>
<fo:table-row>
<fo:table-cell><fo:block>Cell 1</fo:block></fo:table-cell>
<fo:table-cell><fo:block>Cell 2</fo:block></fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>
4. <fo:external-graphic>: Used to include external images in the document.
o Example:
<fo:external-graphic src="image.png" content-width="50mm"/>
5. <fo:page-sequence>: Defines a sequence of pages, used for multi-page documents.
o Example:
<fo:page-sequence master-reference="simple-page">
<fo:flow flow-name="xsl-region-body">
<fo:block>Content goes here</fo:block>
</fo:flow>
</fo:page-sequence>
XSL-FO Properties
1. Typography:
o font-family, font-size, font-weight, font-style, text-align, line-height, etc.
o Example:
<fo:block font-family="Arial" font-size="14pt" text-align="center">Centered text</fo:block>
2. Margins and Padding:
o margin, padding, border, space-before, space-after, etc.
o Example:
<fo:block margin-top="10mm" margin-bottom="5mm">Text with margin</fo:block>
3. Positioning:
o position, float, clear, width, height, etc. Width and height are set on a
<fo:block-container> rather than on a plain <fo:block>.
o Example:
<fo:block-container width="100mm" height="50mm">
<fo:block>Block with width and height</fo:block>
</fo:block-container>
XSL-FO is a rich and feature-packed language that provides extensive control over how
XML data is rendered and formatted. We’ll explore some more advanced elements,
properties, and techniques.
In XSL-FO, the layout is determined by page masters and page sequences. The page
master defines how the pages are laid out (e.g., the size, margins, and regions for content),
and page sequences determine how the content flows across those pages.
A page master defines a page template. You can specify the size, margins, and regions
for the content on the page.
Example:
<fo:layout-master-set>
<fo:simple-page-master master-name="simple-page" page-height="297mm"
page-width="210mm" margin="1in">
<fo:region-body margin-top="2in" margin-bottom="1in"/>
<fo:region-before extent="2in"/>
<fo:region-after extent="1in"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:region-body>: Specifies the body region, where the main content goes.
<fo:region-before>: Defines the region for headers.
<fo:region-after>: Defines the region for footers.
Example:
<fo:page-sequence master-reference="simple-page">
<fo:flow flow-name="xsl-region-body">
<fo:block>This is the body content.</fo:block>
</fo:flow>
</fo:page-sequence>
In this case, the page sequence uses the simple-page master to lay out the page.
2. Multi-page Documents
XSL-FO excels at handling multi-page documents, such as reports or books, where content
needs to be paginated across multiple pages.
Page Breaks: You can control where page breaks occur using properties like break-before,
break-after, and keep-together.
Example:
<fo:block break-before="page">This content starts on a new page.</fo:block>
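Multi-column Layout: XSL-FO has no dedicated multi-column element. Columns are set on the
body region of the page master with the column-count and column-gap properties, and the
content of the flow fills the columns in order.
Example:
<fo:simple-page-master master-name="simple-page" page-height="297mm"
page-width="210mm" margin="1in">
<fo:region-body column-count="2" column-gap="10mm"/>
</fo:simple-page-master>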
You can easily include images in an XSL-FO document using the <fo:external-graphic>
element. The image will be rendered as part of the output document (e.g., in a PDF).
Example:
<fo:block>
<fo:external-graphic src="images/logo.png" content-width="50mm"/>
</fo:block>
This will place the image logo.png in the document, with the specified width of 50mm.
Inline Graphics
XSL-FO also supports inline graphics (e.g., lines, shapes, or paths) using the
<fo:instream-foreign-object> element. This can be used to embed SVG graphics directly
within the document.
Example:
<fo:block>
<fo:instream-foreign-object>
<svg xmlns="http://www.w3.org/2000/svg" width="100mm" height="100mm">
<circle cx="50mm" cy="50mm" r="40mm" fill="blue" />
</svg>
</fo:instream-foreign-object>
</fo:block>
XSL-FO provides powerful support for tables with precise control over their appearance and
layout. You can create tables, define column widths, row heights, and even nested tables.
Example:
<fo:table>
<fo:table-column column-width="3cm"/>
<fo:table-column column-width="5cm"/>
<fo:table-body>
<fo:table-row>
<fo:table-cell>
<fo:block>Cell 1</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block>Cell 2</fo:block>
</fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>
You can also nest tables within cells, enabling more complex structures.
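Example:
<fo:table>
<fo:table-body>
<fo:table-row>
<fo:table-cell>
<!-- Rows must sit inside fo:table-body (or a header/footer), in the nested table too. -->
<fo:table>
<fo:table-body>
<fo:table-row>
<fo:table-cell>
<fo:block>Nested Cell 1</fo:block>
</fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>
</fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>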
XSL-FO provides rich controls for text formatting, including font properties, text alignment,
and line breaks.
Example:
<fo:block font-family="Times New Roman" font-size="12pt" line-height="14pt"
text-align="justify">
This is a paragraph of text styled with a specific font and size.
</fo:block>
Text Alignment
You can align text to the left, center, or right using the text-align property.
Example:
<fo:block text-align="center">This text is centered on the page.</fo:block>
<fo:block text-align="right">This text is aligned to the right.</fo:block>
Adjusting the line height and spacing between lines is easy with XSL-FO.
Example:
<fo:block line-height="1.5" space-before="10mm" space-after="10mm">
This block has customized line height and spacing.
</fo:block>
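XSL-FO supports footnotes through the <fo:footnote> element, which is useful for academic
papers or detailed documents where references need to be cited.
Footnotes are placed at the bottom of the page. XSL-FO has no dedicated endnote
element; endnotes, which appear at the end of the document, are usually collected by
the XSLT stylesheet and emitted as ordinary blocks.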
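A footnote can be sketched as follows. This is a minimal fragment, assuming it sits inside an fo:flow; the footnote number and the footnote wording are placeholder text, not taken from any particular document.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  XSL-FO is a W3C vocabulary for page layout<fo:footnote>
    <!-- The inline is the marker left in the running text -->
    <fo:inline baseline-shift="super" font-size="8pt">1</fo:inline>
    <!-- The body is moved to the footnote area at the bottom of the page -->
    <fo:footnote-body>
      <fo:block font-size="9pt">1. Placeholder footnote text.</fo:block>
    </fo:footnote-body>
  </fo:footnote>.
</fo:block>
```

The marker and the body are kept together in one element so the formatter can place the body on the same page as the marker.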
Although XSL-FO itself is a declarative language, you can combine it with XSLT to
introduce logic, conditionals, and variables into your transformations.
For instance, an XSLT processor can be used to conditionally apply certain styles based on
the data in the XML document.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/bookstore">
<!-- fo:root is the top-level element of every XSL-FO document -->
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="simple-page" page-height="297mm"
page-width="210mm" margin="1in">
<fo:region-body/>
</fo:simple-page-master>
</fo:layout-master-set>
<!-- fo:flow must be wrapped in a page sequence that names its page master -->
<fo:page-sequence master-reference="simple-page">
<fo:flow flow-name="xsl-region-body">
<xsl:for-each select="book">
<xsl:variable name="price" select="price"/>
<fo:block>
<xsl:value-of select="title"/> -
<xsl:value-of select="$price"/>
</fo:block>
<xsl:if test="$price &gt; 20">
<fo:block font-weight="bold">Expensive Book</fo:block>
</xsl:if>
</xsl:for-each>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
</xsl:stylesheet>
This example uses XSLT to transform the XML into an XSL-FO document and conditionally
adds a bold "Expensive Book" block after any book whose price is greater than 20.
XSL-FO allows the creation of highly complex and customized page layouts, such as a first
page whose layout differs from the pages that follow.
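One way to get a customized layout is to give the first page a different master from the remaining pages. The sketch below assumes A4 pages; the master names first-page, rest-page, and document, and the margin values, are illustrative choices.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- First page: extra space at the top, e.g. for a title -->
  <fo:simple-page-master master-name="first-page"
                         page-height="297mm" page-width="210mm">
    <fo:region-body margin-top="60mm" margin-bottom="20mm"/>
  </fo:simple-page-master>
  <!-- All following pages: normal top margin -->
  <fo:simple-page-master master-name="rest-page"
                         page-height="297mm" page-width="210mm">
    <fo:region-body margin-top="20mm" margin-bottom="20mm"/>
  </fo:simple-page-master>
  <!-- The sequence master picks a page master by page position -->
  <fo:page-sequence-master master-name="document">
    <fo:repeatable-page-master-alternatives>
      <fo:conditional-page-master-reference master-reference="first-page"
                                            page-position="first"/>
      <fo:conditional-page-master-reference master-reference="rest-page"
                                            page-position="rest"/>
    </fo:repeatable-page-master-alternatives>
  </fo:page-sequence-master>
</fo:layout-master-set>
```

A page sequence would then refer to the sequence master with master-reference="document" instead of naming a single page master.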
Conclusion
2.7 XLink
XLink (XML Linking Language) is a standard defined by the W3C for creating links
between different XML documents. XLink extends the capabilities of XML by allowing the
creation of links in a way that is more complex and flexible than what is possible with
traditional HTML hyperlinks.
The main idea behind XLink is to enable linking between elements within an XML document
or across multiple XML documents. XLink is not a standalone language but a set of
namespaced attributes that can be added to any XML element. It supports simple links
(similar to standard hyperlinks) as well as extended links that connect several resources
at once and describe how each link should behave.
1. XLink Types:
o Simple Link: A simple, one-to-one link between two elements or resources.
o Extended Link: A more complex link that can connect multiple resources and
define the relationship between those resources.
2. XLink Attributes:
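o xlink:type: Specifies the type of link element. A link is declared with simple (for a
basic link) or extended (for complex links); inside an extended link, child
elements use locator, arc, resource, or title.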
o xlink:href: Specifies the URI of the resource being linked to.
o xlink:role: Describes the role of the link (optional).
o xlink:arcrole: Describes the nature of the relationship between the resources
(optional).
o xlink:title: Provides a title for the link (optional).
3. Simple Link Example:
o A simple link in XLink works similarly to an HTML anchor (<a>) tag. It
connects one XML element to another or to an external resource.
Here’s a basic example of how to use XLink to create a simple hyperlink between two XML
elements:
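A minimal sketch of such a catalog follows. The element names <catalog>, <book>, and <link> and the "Buy Now" text are assumptions drawn from the surrounding explanation and from the stylesheet later in this section; the namespace URI is the one defined by the XLink specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed structure: one <link> per book, each declared as a simple XLink -->
<catalog xmlns:xlink="http://www.w3.org/1999/xlink">
  <book>
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <link xlink:type="simple"
          xlink:href="https://www.amazon.com/Learning-XML-Erik-Ray/dp/156592496X"
          xlink:title="Buy the book">Buy Now</link>
  </book>
  <book>
    <title>XML in a Nutshell</title>
    <author>Elliotte Rusty Harold</author>
    <link xlink:type="simple"
          xlink:href="https://www.oreilly.com/library/view/xml-in-a/0596002310/"
          xlink:title="Buy the book">Buy Now</link>
  </book>
</catalog>
```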
Explanation:
In this example, each <link> element is associated with a "Buy Now" action for different
books in the catalog, which links to an external resource (e.g., Amazon or O'Reilly).
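An extended link can be sketched in the same catalog. The child element names (<links>, <this>, <site>, <go>) and the two role URIs under example.com are hypothetical; only the xlink:* attributes and their values are defined by XLink.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:xlink="http://www.w3.org/1999/xlink">
  <book>
    <title>Learning XML</title>
    <!-- The extended link groups a local resource, two remote locators,
         and the arcs that connect them -->
    <links xlink:type="extended" xlink:title="Where to find Learning XML">
      <this xlink:type="resource" xlink:label="book">Learning XML</this>
      <site xlink:type="locator" xlink:label="amazon"
            xlink:href="https://www.amazon.com/Learning-XML-Erik-Ray/dp/156592496X"
            xlink:role="http://example.com/roles/retailer"
            xlink:title="Amazon"/>
      <site xlink:type="locator" xlink:label="oreilly"
            xlink:href="https://www.oreilly.com/"
            xlink:role="http://example.com/roles/publisher"
            xlink:title="O'Reilly"/>
      <!-- Each arc says: from the book, the user may traverse to this site -->
      <go xlink:type="arc" xlink:from="book" xlink:to="amazon"
          xlink:show="new" xlink:actuate="onRequest"/>
      <go xlink:type="arc" xlink:from="book" xlink:to="oreilly"
          xlink:show="new" xlink:actuate="onRequest"/>
    </links>
  </book>
</catalog>
```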
Explanation:
In this example, the book "Learning XML" has two links: one to Amazon and one to
O'Reilly's site. The links are part of an extended link with roles defined for each.
In addition to defining XLink in XML, you may want to process it using XSLT (Extensible
Stylesheet Language Transformations). XSLT can be used to transform XML documents
with XLinks into another XML document or into other formats such as HTML or XHTML.
For example, an XSLT stylesheet that processes XLink in XML can generate HTML links:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xlink="http://www.w3.org/1999/xlink"
exclude-result-prefixes="xlink">
<xsl:template match="/catalog">
<html>
<head>
<title>Book Catalog</title>
</head>
<body>
<h1>Book Catalog</h1>
<ul>
<xsl:for-each select="book">
<li>
<xsl:value-of select="title"/> by
<xsl:value-of select="author"/><br/>
<!-- href is set with an attribute value template; an attribute cannot be added after child content -->
<a href="{link/@xlink:href}">
<xsl:value-of select="link/@xlink:title"/>
</a>
</li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Explanation:
The XSLT stylesheet matches the <catalog> element and generates an HTML page
with a list of books.
For each <book>, it extracts the book title, author, and the link (using the xlink:href
and xlink:title attributes).
It generates HTML <a> tags, with the href pointing to the link's destination and the
text showing the title of the link.
Applied to a catalog of two books, the stylesheet produces HTML like the following:
<html>
<head>
<title>Book Catalog</title>
</head>
<body>
<h1>Book Catalog</h1>
<ul>
<li>
Learning XML by Erik T. Ray<br/>
<a href="https://www.amazon.com/Learning-XML-Erik-Ray/dp/156592496X">
Buy the book
</a>
</li>
<li>
XML in a Nutshell by Elliotte Rusty Harold<br/>
<a href="https://www.oreilly.com/library/view/xml-in-a/0596002310/">
Buy the book
</a>
</li>
</ul>
</body>
</html>
Conclusion
XLink provides a powerful way to link resources in XML documents. With simple links, it
behaves similarly to HTML anchors, while extended links enable more complex
relationships between multiple resources. By using XSLT, you can easily transform XML
documents with XLinks into other formats, such as HTML, for display in web browsers.
XLink enhances XML’s capabilities, making it more suitable for applications requiring rich
and flexible linking mechanisms, such as data integration and web-based documents.
2.8 XPointer
XPointer is a language used to address parts of an XML document, allowing you to point to
specific portions or fragments. The primary use of XPointer is to identify particular elements,
attributes, or text nodes within an XML document, enabling the precise linking of specific
sections or parts of XML documents. It builds on the XPath language, which is used to
navigate XML documents, but adds the ability to handle fragments and ranges more
explicitly.
XPointer provides ways to reference sections within XML documents and is often used in
conjunction with XLink to create complex links between documents or fragments of
documents.
1. XPointer Syntax:
o XPointer syntax can reference elements, attributes, and even ranges of text in
an XML document.
o It is used as part of the URI in web links, where the pointer is used to point
directly to specific fragments in the XML document.
2. XPointer Components:
o Element Pointers: Point to specific XML elements using paths.
o Attribute Pointers: Point to specific attributes within elements.
o Text Node Ranges: Address a range of text nodes within an element or part of
an element.
o XPath-based Expressions: XPointer uses XPath expressions to pinpoint a
section in the XML document.
3. Fragment Identifier: XPointer is typically used with fragment identifiers in URIs
(Uniform Resource Identifiers). The URI points to an XML file, and the fragment
identifier (after the #) uses XPointer to point to a specific section of the document.
4. XPath Integration: XPointer's xpointer() scheme uses XPath expressions to navigate
through the XML document, so an XPath location path such as /library/book[1] can be
used inside xpointer(...).
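The expressions in this section assume a sample document along the following lines. The file name library.xml is hypothetical; the element names, id values, and titles are inferred from the expressions and explanations in this section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- library.xml (assumed sample document) -->
<library>
  <book id="1">
    <title>Learning XML</title>
  </book>
  <book id="2">
    <title>Advanced XML</title>
  </book>
</library>
```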
1. Pointing to an Element
The following XPointer expression points to the <book> element whose id attribute
is "1":
#xpointer(/library/book[@id='1'])
2. Pointing to an Attribute
#xpointer(/library/book[1]/@id)
Explanation: This XPointer expression selects the id attribute of the first <book>
element.
3. Pointing to a Text Node
#xpointer(/library/book[1]/title/text())
Explanation: This expression points to the text node inside the <title> element of
the first book (Learning XML).
XPointer is typically used in conjunction with XLink or directly in URLs to link to specific
parts of XML documents. You can link to specific elements of an XML file using the #
symbol followed by the XPointer expression.
Assume you want to create a hyperlink to the title of the first book in the XML document:
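Such a link can be sketched as a simple XLink whose href carries the XPointer as its fragment identifier. The element name <reference> and the file name library.xml are assumptions.

```xml
<!-- library.xml is an assumed file name for the target document -->
<reference xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="simple"
           xlink:href="library.xml#xpointer(/library/book[1]/title)">
  Title of the first book
</reference>
```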
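In this case, the link's URI ends in a fragment identifier such as
#xpointer(/library/book[1]/title), which targets the <title> element of the first book
rather than the document as a whole.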
In an XML document, you may want to create links that target different parts of the
document. Below is an example where XPointer is used with XLink to reference multiple
parts of the same XML document.
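A sketch of such a document follows, assuming the links live in an <index> element (a hypothetical name) inside the same <library> document. An href that starts with # refers to the current document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<library xmlns:xlink="http://www.w3.org/1999/xlink">
  <index>
    <!-- Each <title> here is a link into another part of this same document -->
    <title xlink:type="simple"
           xlink:href="#xpointer(/library/book[1]/title)">First book</title>
    <title xlink:type="simple"
           xlink:href="#xpointer(/library/book[2]/title)">Second book</title>
  </index>
  <book id="1">
    <title>Learning XML</title>
  </book>
  <book id="2">
    <title>Advanced XML</title>
  </book>
</library>
```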
In this case:
The first <title> links to the title of the first book (Learning XML).
The second <title> links to the title of the second book (Advanced XML).
XPointer's xpointer() scheme also defines more advanced features such as referencing
ranges of text within a paragraph or element, although processor support for ranges is
limited.
You can create an XPointer expression that points to a specific <para> element:
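As an illustration, the link below carries such an expression. The file name report.xml, the <document> root element, and the <see> link element are hypothetical, since the source does not name them.

```xml
<!-- report.xml and the <document> root are hypothetical names -->
<see xmlns:xlink="http://www.w3.org/1999/xlink"
     xlink:type="simple"
     xlink:href="report.xml#xpointer(/document/section[1]/para[2])">
  Second paragraph of the first section
</see>
```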
Explanation: This XPointer expression points to the second <para> element within
the first <section> of the document.
XPointer allows you to link from one XML document to another, pointing to specific parts of
the second document. Let’s assume you have two XML documents: catalog.xml and
book.xml. You can link to specific sections of book.xml from catalog.xml.
catalog.xml:
book.xml:
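The two files could look roughly as follows. This is a sketch: the <entry> element name and the <library> root of book.xml are assumptions, and the titles are reused from the earlier examples. The first listing is catalog.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Each entry links to one title inside book.xml -->
  <entry xlink:type="simple"
         xlink:href="book.xml#xpointer(/library/book[1]/title)">First title</entry>
  <entry xlink:type="simple"
         xlink:href="book.xml#xpointer(/library/book[2]/title)">Second title</entry>
</catalog>
```

The second listing is book.xml, the document being pointed into:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="1">
    <title>Learning XML</title>
  </book>
  <book id="2">
    <title>Advanced XML</title>
  </book>
</library>
```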
In this case:
catalog.xml has links to titles in book.xml, with each link pointing to a different
book's title using XPointer.
XPointer is most commonly used with fragment identifiers in URLs to link directly to parts
of XML documents. When an XML processor that supports XPointer encounters such a URL,
it resolves the XPointer within the document and identifies the part it specifies.
Mainstream web browsers generally do not evaluate xpointer() fragments, so this
resolution usually happens in XML tooling rather than in the browser.
Conclusion
XPointer is a powerful language for linking and referencing specific parts of an XML
document, whether it be elements, attributes, or ranges of text. By using XPath syntax,
XPointer allows for precise targeting of parts of an XML document. It is commonly used in
XLink to create complex, navigable links between XML documents, and it is especially
useful in scenarios where you need to link directly to sub-sections or fragments of large XML
documents.
By combining XPointer with technologies like XLink, XSLT, and XML Schema, you can
create highly dynamic and interconnected XML-based systems.
XInclude and XBase are two XML technologies that help with combining and managing parts
of an XML document. XInclude pulls other documents or fragments into a primary XML
document, while XBase controls how relative URIs in a document are resolved. Let's break
down each one in detail.
XInclude is a standard for including XML documents into other XML documents, allowing
you to manage content modularly. Instead of copying and pasting sections of XML code,
XInclude allows you to reference external XML files and have them merged in when the
document is processed.
1. Modularization: You can split large XML files into smaller ones, keeping them manageable.
2. Reuse: It promotes reuse of XML content across different parts of an application.
3. Data Merging: It can combine different XML sources into one, keeping the source
documents intact.
4. Flexible Inclusion: You can include content from local files or from remote resources identified by a URL.
Syntax of XInclude:
The XInclude processing is specified in an XML element with a specific namespace. The
element that performs the inclusion is <xi:include>.
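Attributes of <xi:include>:
href: URI of the resource to include (a fragment identifier is not allowed here).
parse: How to treat the included resource, either xml (the default) or text.
xpointer: An XPointer selecting the part of the resource to include, used with parse="xml".
encoding: The character encoding to assume when parse="text".
accept, accept-language: Hints for content negotiation when the resource is fetched over HTTP.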
Imagine we have an XML document called bookstore.xml that includes details about a
bookstore, and we want to include the details of a book stored in an external file called
book.xml.
book.xml:
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>XML for Beginners</title>
<author>John Doe</author>
<price>19.99</price>
</book>
bookstore.xml (Using XInclude to Include book.xml):
<?xml version="1.0" encoding="UTF-8"?>
<bookstore xmlns:xi="http://www.w3.org/2001/XInclude">
<name>Global Bookstore</name>
<location>New York, USA</location>
<xi:include href="book.xml"/>
<book>
<title>Advanced XML</title>
<author>Jane Doe</author>
<price>29.99</price>
</book>
</bookstore>
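Explanation:
When bookstore.xml is processed, the <xi:include> element is replaced by the content of
book.xml. Ignoring any xml:base attribute the processor may add to the included element,
the result is equivalent to:
<bookstore xmlns:xi="http://www.w3.org/2001/XInclude">
<name>Global Bookstore</name>
<location>New York, USA</location>
<book>
<title>XML for Beginners</title>
<author>John Doe</author>
<price>19.99</price>
</book>
<book>
<title>Advanced XML</title>
<author>Jane Doe</author>
<price>29.99</price>
</book>
</bookstore>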
In this example, XInclude has merged the content from book.xml into bookstore.xml. This
is helpful when you want to split data across multiple files but combine them for processing
or presentation.
XBase is a specification that deals with specifying base URIs in XML documents. It defines a
way to manage and resolve relative URIs for documents that may reference other documents
or resources. XBase provides a mechanism for indicating a base URI that should be applied
to the contents of an XML document, which is useful when working with external resources
like images, links, or other XML files.
1. Base URI Resolution: It specifies the base URI used for resolving relative URIs in an XML
document.
2. Global URI Handling: It is useful when multiple XML files or resources refer to the same base
URL, avoiding the need for repeating the base URL in each reference.
XBase Syntax:
The xml:base attribute is used to define the base URI in the XML document.
xml:base: This attribute can be added to any XML element to specify the base URI.
Imagine an XML document catalog.xml that refers to several external resources (like
images) relative to a base URI.
catalog.xml:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:xml="http://www.w3.org/XML/1998/namespace">
<product xml:base="http://www.example.com/products/">
<name>XML for Beginners</name>
<image>images/book1.jpg</image>
</product>
<product xml:base="http://www.example.com/products/">
<name>Advanced XML</name>
<image>images/book2.jpg</image>
</product>
</catalog>
Explanation:
The xml:base attribute is applied to each <product> element, and it defines a base URI for
resolving relative paths.
For the first product, the image file will be resolved as
http://www.example.com/products/images/book1.jpg, and similarly, the second
image will resolve as http://www.example.com/products/images/book2.jpg.
Modularization: Just like XInclude, XBase provides a modular approach to defining base URIs
for various resources in the XML document.
Avoid Redundancy: Instead of specifying the full URL for every resource (such as images,
external links, etc.), you can define the base URI once for an element or document.
Convenience: XBase makes it easier to change the base URL, as it can be changed in one
place, and all relative references within that scope will be resolved automatically.
Purpose:
o XInclude is used to include other XML documents or fragments into an XML
document, merging their content into a larger structure.
o XBase is used to define a base URI for resolving relative URIs for resources like
images, links, etc., in an XML document.
Use Case:
o XInclude is beneficial for modularizing XML documents, where you want to split
content across multiple files but treat them as a single document.
o XBase is useful when you need to manage resources that are referred to with
relative URIs, ensuring they are resolved correctly based on a specified base URI.
You can combine XInclude and XBase in an XML document when you want to include
external XML documents and manage resources relative to a base URI. For example, you
might include an external product catalog using XInclude and resolve images using XBase:
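A sketch of this combination follows. The <store> and <banner> elements, the file products.xml, and the example.com base URI are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xml:base sets the base URI for everything inside <store> -->
<store xmlns:xi="http://www.w3.org/2001/XInclude"
       xml:base="http://www.example.com/store/">
  <!-- href is relative, so it resolves to
       http://www.example.com/store/products.xml -->
  <xi:include href="products.xml"/>
  <!-- An application reading this value would resolve it to
       http://www.example.com/store/images/banner.jpg -->
  <banner>images/banner.jpg</banner>
</store>
```

Because the include's href is itself a relative URI, the same xml:base value governs both where the included file is fetched from and how the image path is interpreted.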
While we've seen the basic use of XInclude in integrating XML content, there are additional
advanced features that make XInclude quite powerful in real-world applications.
As we briefly mentioned, XInclude allows you to not just include entire documents but also
point to specific fragments within an external XML file. This is possible using XPointer in
combination with XInclude.
For instance, to include just one part of an XML document rather than the whole file, you
add an XPointer that selects the elements you want. Suppose bookstore.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book id="1">
<title>XML for Beginners</title>
<author>John Doe</author>
<price>19.99</price>
</book>
<book id="2">
<title>Advanced XML</title>
<author>Jane Doe</author>
<price>29.99</price>
</book>
</bookstore>
In the main document catalog.xml, you want to include only the <book> element with
id="1".
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:xi="http://www.w3.org/2001/XInclude">
<!-- XInclude forbids a fragment identifier in href; the pointer goes in the xpointer attribute -->
<xi:include href="bookstore.xml"
xpointer="xpointer(/bookstore/book[@id='1'])"/>
</catalog>
Explanation:
<xi:include href="bookstore.xml"
xpointer="xpointer(/bookstore/book[@id='1'])"/>: This line includes just the <book>
element with id="1" from bookstore.xml. The xpointer attribute targets the specific
element; support for the xpointer() scheme varies between processors.
This ability to include specific fragments is very useful for working with large documents or
when integrating specific content (e.g., one article or a product detail) from an external file.
2. Include Multiple Documents in Sequence
XInclude supports including multiple documents at different places in the same document.
You can include several parts of different XML files or even the same file multiple times.
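A minimal sketch with two includes at different positions follows. The <section> wrappers and their name attributes are hypothetical; book1.xml and book2.xml are the files this section refers to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:xi="http://www.w3.org/2001/XInclude">
  <section name="Beginner">
    <!-- Replaced by the root element of book1.xml -->
    <xi:include href="book1.xml"/>
  </section>
  <section name="Advanced">
    <!-- Replaced by the root element of book2.xml -->
    <xi:include href="book2.xml"/>
  </section>
</catalog>
```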
In this case, both book1.xml and book2.xml will be included in the catalog.xml document
at their respective positions. This can be very useful when combining content from multiple
sources into a single XML document.
The parse attribute in XInclude controls how the included resource is read. XInclude 1.0
defines two values, xml and text.
parse="xml": This is the default, and it tells the processor to treat the included content as
XML, parsing it accordingly.
parse="text": Treats the content as plain text, with no further parsing; markup characters in
the file are included as literal text.
parse="html" is not one of the values defined by XInclude 1.0, so HTML can be included only
as text or as well-formed XHTML unless a processor offers its own extension.
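The two standard values can be compared side by side. In this sketch the file names chapter1.xml and sample-code.txt and the <article>, <body>, and <listing> elements are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns:xi="http://www.w3.org/2001/XInclude">
  <body>
    <!-- Parsed as XML: the elements of chapter1.xml become part of this tree -->
    <xi:include href="chapter1.xml" parse="xml"/>
  </body>
  <listing>
    <!-- Included as character data: any < or & in the file stays literal text -->
    <xi:include href="sample-code.txt" parse="text" encoding="UTF-8"/>
  </listing>
</article>
```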
This makes XInclude flexible enough to pull in both XML and non-XML content.
As mentioned, XBase is used to manage base URIs in XML documents. It is essential when
working with relative URIs within the XML file. Below, we will look into advanced
scenarios and real-world applications where XBase becomes more useful.
1. Using xml:base for Complex URI Resolution
The xml:base attribute in XBase defines the base URI for the element it appears on and for
that element's descendants. This means that all relative URIs within that element will be
resolved against this base URI.
You can set xml:base at any XML element (or document root) level, and it will apply to all
subsequent relative URIs within that scope.
Imagine an XML document in which different elements each set their own base URI.
<?xml version="1.0" encoding="UTF-8"?>
<library xmlns:xml="http://www.w3.org/XML/1998/namespace">
<book xml:base="http://www.example.com/books/">
<title>XML for Beginners</title>
<image>cover.jpg</image> <!-- Resolves to
http://www.example.com/books/cover.jpg -->
</book>
<book xml:base="http://www.example.com/advanced-books/">
<title>Advanced XML</title>
<image>cover.jpg</image> <!-- Resolves to
http://www.example.com/advanced-books/cover.jpg -->
</book>
</library>
Explanation:
The first <book> element has its base URI set to http://www.example.com/books/. So,
cover.jpg will resolve to http://www.example.com/books/cover.jpg.
The second <book> element changes its base URI to
http://www.example.com/advanced-books/, so cover.jpg will resolve to
http://www.example.com/advanced-books/cover.jpg.
This ability to change the base URI at different levels of an XML document provides great
flexibility in managing resources.
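Base URIs can also be nested: an xml:base value may itself be relative, in which case it is resolved against the base URI of the parent element. The sketch below assumes a hypothetical <shelf> element and example.com paths.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<library xml:base="http://www.example.com/">
  <shelf xml:base="books/">
    <!-- Base URI here: http://www.example.com/books/ -->
    <image>cover.jpg</image>
    <!-- Resolves to http://www.example.com/books/cover.jpg -->
    <book xml:base="advanced/">
      <!-- Base URI here: http://www.example.com/books/advanced/ -->
      <image>cover.jpg</image>
      <!-- Resolves to http://www.example.com/books/advanced/cover.jpg -->
    </book>
  </shelf>
</library>
```

Changing the single xml:base on <library> would move every resource beneath it, which is the convenience the earlier list describes.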
In practical XML-based applications, such as digital catalogs or web services, XBase helps
manage external resources like images, files, and links that are relative to the XML
document.
In a larger system, XBase can work alongside other XML technologies like XLink and
XInclude to manage document fragments or external resources across multiple files.
For instance, you can use XBase to define a base URI for the resources, while using
XInclude to pull in parts of external XML files. This way, you can manage both the inclusion
of content and the resolution of external links or files.
Imagine a scenario where you are creating a set of XML documents representing parts of a
larger e-commerce catalog. Instead of copying all the data into one file, you can split the
catalog into separate files: one for books, one for electronics, and so on.
With XInclude, you can merge these files into a master catalog. You can also use XBase to
resolve relative URLs for product images and other linked resources.
In scientific or research-based applications, large datasets are often managed using XML
documents. If these datasets refer to numerous external resources (such as images, charts, or
supplementary files), XBase makes it easy to ensure that these resources are properly
resolved across a distributed system.
When dealing with REST APIs or web services that provide XML responses, XInclude can
be used to dynamically merge responses from different parts of a web service or external
documents. At the same time, XBase can handle any relative links or paths that are returned
as part of the XML data.
Conclusion
XInclude is ideal for combining multiple XML documents into one document dynamically.
XBase is used for managing and resolving relative URIs in XML documents, making it easier
to handle external resources.
Both XInclude and XBase play essential roles in managing modular XML content and
handling external resources in XML-based applications. They can work together to create
flexible, maintainable, and scalable XML systems.