Handout 3

This document provides instructions for exercises using XPath to extract information from XML documents. It describes using the command line tool xmllint to interactively explore sample XML files and check that they are well-formed. It then introduces using XPath expressions in Python scripts to extract specific nodes and attributes from another sample XML file on movie quotations. The final part discusses handling XML namespaces, providing another sample XML file using two namespaces and modifying the Python script to support namespaces.

Uploaded by

Mon Lucy

Practical 3: Reading XML using XPath

In Practical 2 you looked at ways of producing XML output from Fortran codes. In this afternoon’s exercises you will look at using XML in ways beyond reading it into a web browser or Google Earth. The aim of the session is to give you an understanding of XPath, a rather high-level interface to data in an XML document that is used by many XML technologies. It’s worth designing your documents so that they can be easily used with XPath expressions. However, there is no Fortran XPath interface, so this exercise will be done using other tools, including Python or Perl.

Don’t worry, though: you don’t need to know either scripting language. XPath is a language-agnostic interface, and in this exercise you will only be writing XPath expressions, not code. The choice of tools also reflects practicalities: even for data produced by Fortran codes, much analysis and post-processing is done using scripting languages such as Perl or Python.

The files needed for this practical can be found in the directory ~/Practical_3. Some of the exercises in this practical can be performed in more than one language (Perl, Python and sometimes shell examples are given). Unless you have a particular reason to choose one of these, we recommend you work on the Python examples, as Python has a cleaner interface to the XPath library you will be using.

Exercise 3.1: xmllint

Before you get started with XPath proper, you should spend a bit of time exploring a simple document using a program called ‘xmllint’. This is a command-line interface to the C library libxml2. You’ll be using bindings to this library from scripting languages in the later parts of this practical. xmllint and the underlying library are installed by default on most modern Linux and Mac OS X computers.

In the exercise_1 directory you will find a simple XML document (document.xml) which represents data with a parent/child/grandchild tree structure. The first use of xmllint is to check whether a document is well-formed. This can be done by running the command:

xmllint document.xml

If the document is well-formed, the XML file will be printed to the screen (useful for piping into a second process); if it is not, an error message will be reported, including the line number where the error occurred. This can be an incredibly useful tool when debugging! Is the provided document well-formed? If not, how can you fix it?
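If you want the same well-formedness check from a script, here is a minimal Python sketch. Note that it uses the standard library's xml.etree.ElementTree parser (Expat) rather than libxml2, so its messages will differ from xmllint's; the two snippets are invented for illustration and are not document.xml.

```python
import xml.etree.ElementTree as ET


def first_error(xml_text):
    """Return None if xml_text is well-formed, otherwise the (line, column)
    of the first problem the parser reports, much as xmllint reports a line."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position  # tuple: (line number, column offset)
    return None


# Invented snippets, not the contents of document.xml.
good = "<parent>\n  <child name='Margaret Tudor'/>\n</parent>"
bad = "<parent>\n  <child name='Margaret Tudor'/>\n</parent"  # final '>' missing

print(first_error(good))  # None
print(first_error(bad))   # a (line, column) pair on line 3
```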

For the record, there is an error (a missing ‘>’) at the end of the last line of the provided document. Once you have fixed this, you can use xmllint to explore the XML file interactively by running:

xmllint document.xml --shell

This will give you a prompt, looking like:

/ >

You can use this prompt in the same way as a normal shell prompt - type commands into
it, and then press return for them to be executed.

Started in this way, xmllint lets you walk around the XML document as if it were a filesystem tree, as described in the lecture. At the start of a session, you are placed at the top of the document tree.

So you are now at the top level of the XML document. You can navigate it as if it were a directory tree, using 'cd' to change directory and 'ls' to look at directory contents. So, as a first step, do:

/ > cd parent

Note that as you do this, the prompt changes to show you where you are in the tree.

parent > ls

You will see a list of results. These are the child elements of <parent>, the root element of the XML file; that is, the top-level 'directories' if this were a filesystem tree. There should be three 'child' elements. (You can ignore the first and second columns of output.)

Let's look at the first element. There are two ways to do this:

parent > cat child[1]

shows you the raw XML for that element and:

parent > dir child[1]

shows you what information the XML actually contains.
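The same walk can be done from Python with lxml, the libxml2 binding used later in this practical (assumed to be installed). This sketch uses a small invented, deliberately incomplete family tree in place of document.xml, and shows rough counterparts of ls, cat and dir.

```python
from lxml import etree

# Invented, trimmed family tree; an assumed stand-in for document.xml.
XML = b"""<parent>
  <child name="Margaret Tudor">
    <grandchild name="James V" born="1512"/>
  </child>
  <child name="Henry VIII">
    <grandchild name="Mary I" born="1516"/>
    <grandchild name="Elizabeth I" born="1533"/>
  </child>
  <child name="Mary Tudor"/>
</parent>"""

root = etree.fromstring(XML)  # root is the <parent> element, like "cd parent"

# Roughly "ls": the tag names of the element children of <parent>
children = [el.tag for el in root]
print(children)  # ['child', 'child', 'child']

# Roughly "cat child[1]": the raw XML of the first <child> (XPath counts from 1)
first = root.xpath("child[1]")[0]
print(etree.tostring(first).decode())

# Roughly "dir child[1]": the information the element carries
print(first.tag, dict(first.attrib))  # child {'name': 'Margaret Tudor'}
```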

The arguments you have been providing are actually XPath expressions. So you can get
all of the grandchild names by doing:

parent > ls child/grandchild/@name

or:

parent > cat child/grandchild/@name

Here the ‘@’ asks for an attribute with the name ‘name’. But what if you want to select the date of birth (the born attribute) of, say, Elizabeth I, or the names of the children of Margaret Tudor? To do this you will need to add a ‘predicate’, a condition that must be met for the XPath expression to match. Predicates are written in square brackets. The date of birth of Elizabeth I can be found by doing:

cat child/grandchild[@name='Elizabeth I']/@born

and the children of Margaret Tudor can be found by doing:

cat child[@name='Margaret Tudor']/grandchild/@name
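Because these arguments are plain XPath, they can be passed unchanged to lxml's xpath() method, evaluated relative to the <parent> element just as after ‘cd parent’. The sketch below reuses an invented family tree (an assumption, not the real document.xml).

```python
from lxml import etree

# Invented, trimmed family tree; an assumed stand-in for document.xml.
XML = b"""<parent>
  <child name="Margaret Tudor">
    <grandchild name="James V" born="1512"/>
  </child>
  <child name="Henry VIII">
    <grandchild name="Mary I" born="1516"/>
    <grandchild name="Elizabeth I" born="1533"/>
  </child>
  <child name="Mary Tudor"/>
</parent>"""
root = etree.fromstring(XML)  # context node: <parent>, as after "cd parent"

# child/grandchild/@name: attribute values come back as a list of strings
all_names = root.xpath("child/grandchild/@name")

# A predicate in [...] keeps only the nodes for which the condition is true
born = root.xpath("child/grandchild[@name='Elizabeth I']/@born")

# The predicate can sit on any step, here on <child> rather than <grandchild>
margaret_kids = root.xpath("child[@name='Margaret Tudor']/grandchild/@name")

print(all_names)      # ['James V', 'Mary I', 'Elizabeth I']
print(born)           # ['1533']
print(margaret_kids)  # ['James V']
```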

In this exercise we have only scratched the surface of xmllint’s capabilities. It can be used for a wide range of XML-related tasks, including validation, changing the text encoding, canonicalisation and processing XInclude statements, but rather than looking at these in any depth we’ll now move on to building XPath expressions in scripting environments. One thing we have not touched on is namespaces in xmllint. This is not because they are unsupported, but because the commands needed rapidly get quite involved and (as we’ll see below) namespaces are easily handled from Python.

Exercise 3.2: Exploring XPath

Files for this exercise can be found in the exercise_2 directory. The file xpather.py is a simple Python script designed to read an XML document, run an XPath query and print the result. The XML document monty.xml contains some quotations from the film “Monty Python and the Holy Grail”. Take a look at the XML file and the xpather.py script and try to deduce what will be output when the script is run. Now run the script by typing:

python xpather.py

You should see “['Monty Python and the Holy Grail']”, which is Python’s way of printing a list (array) containing a single string.
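The xpather.py script itself is not reproduced in this handout, so the following is only a sketch of the same idea under stated assumptions: it parses an invented in-memory document with lxml.etree.parse and prints the result of an XPath query. The element names deliberately differ from monty.xml.

```python
import io

import lxml.etree

# Invented stand-in for monty.xml: element names and layout are assumptions,
# chosen so as not to mirror the real file.
SAMPLE = b"""<film name="Example Film">
  <release year="2000"/>
</film>"""

# parse() takes a filename (as in xpather.py) or a file-like object,
# and returns an ElementTree on which .xpath() can be called.
docRoot = lxml.etree.parse(source=io.BytesIO(SAMPLE))

# An absolute query: the name attribute of the root <film> element
answer = docRoot.xpath("/film/@name")
print(answer)  # ['Example Film']

# One more location step, and a different attribute name
release = docRoot.xpath("/film/release/@year")
print(release)  # ['2000']
```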

For the rest of this part of the exercise you should change the XPath query to extract other data. To change the xpather.py script you just need to edit and save it; there is no need to compile a Python script. The only line you will need to change is:

answer = docRoot.xpath("/film/@name")

Modify the XPath query to extract the following information:

1. The date of the film encoded in the “<data date='1975'/>” element. Hint: you will need
to add an additional location step in the XPath query, and modify the attribute name.

XPath query: __________________________________________________________

2. The names of all five of the listed Pythons. Hint: The fact that this requires a list of
names to be returned does not add to the complexity of the XPath query needed, again
you just need to change a location step and modify the attribute name.

XPath query: __________________________________________________________

3. The quotations of all the characters. Hint: you won’t need any attributes, just three location steps, and you will need /text() to recover the text.

XPath query: __________________________________________________________

4. You can also complete part 3 with a single location step. Can you construct such an
XPath expression?

XPath query: __________________________________________________________

5. Modify your solution to part 2 so that only the name of the character played by Terry
Gilliam is reported. Hint: You will need to use a predicate (in square brackets).

XPath query: __________________________________________________________

6. Write a query to return the quotation of Eric Idle.

XPath query: __________________________________________________________

7. Write a query to return the quotation of the character Sir Bedevere.

XPath query: __________________________________________________________
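The techniques behind parts 3 to 7 (text(), the ‘//’ shortcut and predicates) can be tried on an invented cast list, sketched below with lxml; its layout is an assumption and is deliberately not that of monty.xml.

```python
from lxml import etree

# Invented cast list; not the layout of monty.xml.
XML = b"""<film name="Example Film">
  <cast>
    <actor name="Ann Able" role="The Knight">
      <line>First line.</line>
    </actor>
    <actor name="Bob Baker" role="The Wizard">
      <line>Second line.</line>
    </actor>
  </cast>
</film>"""
root = etree.fromstring(XML)

# text() selects the text node inside each matched element
every_line = root.xpath("/film/cast/actor/line/text()")

# '//' matches at any depth, so one step can replace the full path
same_lines = root.xpath("//line/text()")

# Predicates filter a step; later steps continue from the nodes that survive
wizard_player = root.xpath("/film/cast/actor[@role='The Wizard']/@name")
knight_line = root.xpath("//actor[@role='The Knight']/line/text()")

print(every_line)     # ['First line.', 'Second line.']
print(same_lines)     # ['First line.', 'Second line.']
print(wizard_player)  # ['Bob Baker']
print(knight_line)    # ['First line.']
```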

Exercise 3.3: XPath with Namespaces

In the exercise_3 directory you should also find a file called monty_ns.xml. As the name suggests, this is a version of monty.xml with XML namespaces added. We saw in Practical 2 that documents with multiple namespaces are a useful way to combine different XML vocabularies in the same file. In this exercise you will examine ways to use XPath on such a document. In monty_ns.xml I have imagined that two XML vocabularies exist, one to describe films with the namespace http://www.example.com/films, and one to describe quotations with the namespace http://www.example.com/quotes. We will use this mixed vocabulary for this exercise.

First try some of the solutions to exercise 3.2 with the file monty_ns.xml. A copy of
xpather.py is included in the directory. Note that the file name monty.xml has been
changed to monty_ns.xml on the line to load the XML:

docRoot = lxml.etree.parse(source="monty_ns.xml")

Do any of the previous expressions work?

You should find that all the expressions return an empty list (‘[]’): the XPath expressions have not matched any nodes. This is because we have not specified the namespace, so the XPath library is only looking for elements with no namespace. All elements in monty_ns.xml are in a namespace, either declared directly with an ‘xmlns=’ declaration or inherited from an ancestor. (Unprefixed attributes, by contrast, are in no namespace.)
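The following sketch reproduces the empty-list effect with lxml on a tiny invented document that uses a default namespace, and shows how binding a prefix makes the query match. The document and its element names are assumptions for illustration.

```python
from lxml import etree

# Invented document: xmlns="..." puts <film> and (by inheritance) <release>
# into the films namespace; the unprefixed attributes stay in no namespace.
XML = b"""<film xmlns="http://www.example.com/films" name="Example Film">
  <release year="2000"/>
</film>"""
root = etree.fromstring(XML)

print(root.tag)  # {http://www.example.com/films}film

# Unprefixed element names in XPath mean "no namespace", so nothing matches
plain = root.xpath("/film/@name")

# Bind a prefix to the namespace URI and use it on every element step
ns = {"f": "http://www.example.com/films"}
named = root.xpath("/f:film/@name", namespaces=ns)
year = root.xpath("/f:film/f:release/@year", namespaces=ns)

print(plain)  # []
print(named)  # ['Example Film']
print(year)   # ['2000']
```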

The file xpather_ns.py is a python script set up to do the same work as xpather.py,
but for namespaced documents. Take a look at the xpather_ns.py file. You should
note two changes. First the line:

namespaces = {'q':'http://www.example.com/quotes',
              'f':'http://www.example.com/films'}

has been added. This declares a Python dictionary that maps short local prefixes (f and q) to the namespace URIs, so that the namespaces can be used in XPath expressions with much less typing. Secondly, the call to the XPath library has been modified:

answer = docRoot.xpath("/f:film/@name", namespaces)

to tell the library to use the dictionary of namespaces. One way to think of this is that the dictionary lists all the namespaces this script is designed to know about. Elements in namespaces that are not in the dictionary cannot be selected by name, so XPath expressions will not match them, and using a prefix that is not in the dictionary is an error.
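A point worth seeing in action is that the prefixes in the dictionary belong to the script, not to the document: only the namespace URIs have to agree. The sketch below (invented document, lxml assumed) uses a document that binds the quotes namespace to a different prefix, and also shows what happens with a prefix the dictionary does not define.

```python
from lxml import etree

# Invented mixed-vocabulary document. It binds the quotes namespace to the
# prefix "qt", whereas the script below chooses "q"; only the URI must match.
XML = b"""<film xmlns="http://www.example.com/films"
      xmlns:qt="http://www.example.com/quotes" name="Example Film">
  <actor name="Ann Able">
    <qt:quote>First line.</qt:quote>
  </actor>
</film>"""
root = etree.fromstring(XML)

namespaces = {"q": "http://www.example.com/quotes",
              "f": "http://www.example.com/films"}

# The step into the quotes vocabulary simply uses the other prefix
quotes = root.xpath("/f:film/f:actor/q:quote/text()", namespaces=namespaces)
print(quotes)  # ['First line.']

# A prefix that is not in the dictionary raises an error rather than matching
try:
    root.xpath("/f:film/x:actor", namespaces=namespaces)
    unknown_prefix_rejected = False
except etree.XPathError as err:
    unknown_prefix_rejected = True
    print("XPath error:", err)
```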

Run xpather_ns.py. Does it work? You should now repeat Exercise 3.2 with this namespace-aware version.

1. The date of the film encoded in the “<data date='1975'/>” element. Do you need to give a namespace to each element in the search path, or is the namespace inherited through the query?

XPath query: __________________________________________________________

2. The names of all five of the listed Pythons.

XPath query: __________________________________________________________

3. The quotations of all the characters. Hint: remember that <quote> elements are in a
different namespace to <film>, <data> and <comic>.

XPath query: __________________________________________________________

4. You can also complete part 3 with a single location step. Can you construct such an
XPath expression?

XPath query: __________________________________________________________

5. Modify your solution to part 2 so that only the name of the character played by Terry Gilliam is reported.

XPath query: __________________________________________________________

6. Write a query to return the quotation of Eric Idle.

XPath query: __________________________________________________________

7. Write a query to return the quotation of the character Sir Bedevere.

XPath query: __________________________________________________________

Exercise 3.4: Matching Nodes
The remainder of this practical involves using XPath to extract data from the mixed-namespace KML document produced by yesterday’s final exercise. In case you did not finish the final part of that exercise, a suitable XML document named hypoDD.kml is provided in the exercise_4 directory, along with all the Python scripts needed for the remainder of this practical.

The basic structure of a typical KML document is shown below as a tree (not all elements are shown).

[Figure: tree of a typical KML document: kml > Document > Style, Style, Folder (Placemark, Placemark), Folder (Placemark, Placemark)]

The document root is represented by a <kml> element, which may have one or more <Document> child elements, each of which contains some number of <Style>s and <Folder>s. The Python script “document_info.py” is designed to print some information about the KML file, but the XPath expressions are missing. Fill them in to make the script work.

The first XPath expression is intended to return the name(s) of any <Document>s which are children of the root element. The other two expressions are supposed to return numbers, specifically the number of <Style>s and <Folder>s belonging to the <Document>s.

First XPath query: _________________________________________________________

Second XPath query: ______________________________________________________

Third XPath query: ________________________________________________________

Hint: Absolute XPath expressions (starting with a / and including all the elements needed to find the information) can be used for this exercise. The XPath function count() can be used to find the number of nodes returned by an XPath query.
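As a sketch of both hints, the code below runs an absolute node-set query and two count() queries with lxml over a small invented KML fragment. It assumes the KML 2.2 namespace URI http://www.opengis.net/kml/2.2; hypoDD.kml may declare a different one, and the element counts here are made up.

```python
from lxml import etree

# Invented, trimmed KML. The namespace URI is assumed to be the KML 2.2 one;
# check the xmlns on the <kml> element of hypoDD.kml before reusing it.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Example run</name>
    <Style id="s1"/>
    <Style id="s2"/>
    <Folder><name>Initial positions</name><Placemark/><Placemark/></Folder>
    <Folder><name>Final positions</name><Placemark/></Folder>
  </Document>
</kml>"""
doc = etree.fromstring(KML)
ns = {"k": "http://www.opengis.net/kml/2.2"}

# A node-set query returns a list; here, the text of the Document's own <name>
doc_names = doc.xpath("/k:kml/k:Document/k:name/text()", namespaces=ns)

# count() returns an XPath number, which lxml hands back as a Python float
n_folders = doc.xpath("count(/k:kml/k:Document/k:Folder)", namespaces=ns)
n_placemarks = doc.xpath("count(//k:Placemark)", namespaces=ns)

print(doc_names)                          # ['Example run']
print(int(n_folders), int(n_placemarks))  # 2 3
```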

Exercise 3.5: Nodesets and loops

The Python script folder_info.py is designed to extract some simple information about all the folders within a KML document. The script performs data extraction in two stages. The first query should extract a list of all folders and store the result in the variable ‘folders’ (which will be a list). This XPath query should match all <Folder> elements in the document, wherever they are located. The second and third queries are located within a loop which takes each node in turn from the list in the ‘folders’ variable and places it in the object ‘node’, which is passed to these subsequent XPath expressions. These two expressions need to contain relative XPath queries (not starting with a ‘/’). The first should return the name of the folder represented by ‘node’ and the second should count the number of Placemarks within the folder.
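The two-stage pattern can be sketched with lxml as follows, again on an invented KML fragment with an assumed KML 2.2 namespace. It also contrasts count(k:Placemark), which counts only a folder's direct children, with count(.//k:Placemark), which includes placemarks in nested folders; which one you want depends on what the script should report.

```python
from lxml import etree

# Invented KML (namespace assumed to be KML 2.2), with one folder nested in another.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder><name>Outer</name>
      <Placemark/>
      <Folder><name>Inner</name><Placemark/><Placemark/></Folder>
    </Folder>
  </Document>
</kml>"""
doc = etree.fromstring(KML)
ns = {"k": "http://www.opengis.net/kml/2.2"}

# Stage 1: '//' finds every <Folder>, however deeply nested
folders = doc.xpath("//k:Folder", namespaces=ns)

summary = []
for node in folders:
    # Stage 2: relative queries are evaluated with 'node' as the context node.
    # A query starting with '/' would search the whole document instead.
    name = node.xpath("k:name/text()", namespaces=ns)[0]
    direct = node.xpath("count(k:Placemark)", namespaces=ns)     # children only
    nested = node.xpath("count(.//k:Placemark)", namespaces=ns)  # any depth
    summary.append((name, int(direct), int(nested)))

print(summary)  # [('Outer', 1, 3), ('Inner', 2, 2)]
```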

First XPath query: _________________________________________________________

Second XPath query: ______________________________________________________

Third XPath query: ________________________________________________________

Exercise 3.6: Analysis

The Python script calculate_moves.py performs some more involved analysis of the output data, using the quakeML data embedded within the KML document. Specifically, the script is designed to work out how far each earthquake location has been moved during the hypoDD run represented by the document. Six XPath expressions are needed. In order, these need to:

1. Extract a list of placemarks within the folder with the name ‘Initial positions’.

XPath query: __________________________________________________________

2. Extract the unique_id from the quakeML location element embedded in each placemark in turn. This expression involves a change of namespace.

XPath query: __________________________________________________________

3. Extract the latitude and longitude of each event.

XPath query: __________________________________________________________

4. Find the latitude and longitude of the event with a matching unique_id from the ‘Final
positions’ folder.

XPath query: __________________________________________________________
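A sketch of how the steps listed above can fit together is given below. Everything about the embedded vocabulary is an assumption: the q:location, q:lat and q:lon elements, the unique_id attribute and the http://www.example.com/quake namespace are invented stand-ins for the QuakeML in hypoDD.kml. The lookup in the ‘Final positions’ folder passes the id in as an XPath variable ($uid), which lxml supports through keyword arguments to xpath().

```python
from lxml import etree

# Invented KML with a made-up embedded "quake" vocabulary standing in for
# QuakeML; element names, the unique_id attribute and both namespace URIs
# are assumptions to be replaced by whatever hypoDD.kml actually uses.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:q="http://www.example.com/quake">
  <Document>
    <Folder><name>Initial positions</name>
      <Placemark><q:location unique_id="ev1"><q:lat>10.0</q:lat><q:lon>20.0</q:lon></q:location></Placemark>
    </Folder>
    <Folder><name>Final positions</name>
      <Placemark><q:location unique_id="ev1"><q:lat>10.5</q:lat><q:lon>20.0</q:lon></q:location></Placemark>
    </Folder>
  </Document>
</kml>"""
ns = {"k": "http://www.opengis.net/kml/2.2", "q": "http://www.example.com/quake"}
doc = etree.fromstring(KML)


def lat_lon(location):
    """Read (latitude, longitude) as floats from a q:location element."""
    return (float(location.xpath("string(q:lat)", namespaces=ns)),
            float(location.xpath("string(q:lon)", namespaces=ns)))


moves = {}
initial = doc.xpath("//k:Folder[k:name='Initial positions']/k:Placemark",
                    namespaces=ns)
for placemark in initial:
    # Changing vocabulary is just a matter of switching prefix on the next step
    location = placemark.xpath("q:location", namespaces=ns)[0]
    uid = location.xpath("string(@unique_id)")
    # $uid is an XPath variable, filled in by lxml from the keyword argument
    match = doc.xpath(
        "//k:Folder[k:name='Final positions']/k:Placemark"
        "/q:location[@unique_id=$uid]",
        namespaces=ns, uid=uid)
    moves[uid] = (lat_lon(location), lat_lon(match[0]))

print(moves)  # {'ev1': ((10.0, 20.0), (10.5, 20.0))}
```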

The script should then print the distance that hypoDD has moved each earthquake during
its refinement process.
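The handout does not say how calculate_moves.py turns two positions into a distance. One common choice, sketched here as an assumption, is the haversine great-circle formula on a spherical Earth of mean radius 6371 km; it ignores any change in depth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing a just above 1 for far-apart points
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# A move of 0.5 degrees of latitude at constant longitude
print(round(great_circle_km(10.0, 20.0, 10.5, 20.0), 1))  # 55.6
```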
