Xmlschema PDF
Release 1.0.7
Davide Brunato
1 Introduction
1.1 Features
1.2 Installation
2 Usage
2.1 Create a schema instance
2.2 XSD declarations
2.3 Validation
2.4 Data decoding and encoding
2.5 Validating and decoding ElementTree’s elements
2.6 Customize the decoded data structure
2.7 Decoding to JSON
2.8 XSD validation modes
2.9 XML attacks prevention
3 API Documentation
3.1 Document level API
3.2 Schema level API
3.3 ElementTree and XPath API
3.4 XSD globals maps API
3.5 XML Schema converters
3.6 Resource access API
3.7 Errors and exceptions
4 Testing
4.1 Test scripts
4.2 Test files
5 Release notes
5.1 License
5.2 Support
5.3 Roadmap
CHAPTER 1
Introduction
The xmlschema library is an implementation of XML Schema for Python (supports Python 2.7 and Python 3.4+).
This library arose from the need for a solid Python layer for processing XML Schema based files in the MaX (Materials
design at the Exascale) European project. A significant problem is the encoding and decoding of the XML data
files produced by different simulation software. Another important requirement is XML data validation, in order
to keep the produced data under control. The lack of a suitable alternative for Python in the schema-based decoding of
XML data led to the building of this library. The library can also be useful for other cases related to XML Schema
based processing, not only for the original scope.
The full xmlschema documentation is available on “Read the Docs”.
1.1 Features
xmlschema Documentation, Release 1.0.7
1.2 Installation
You can install the library with pip in a Python 2.7 or Python 3.4+ environment:
pip install xmlschema
The library uses Python’s ElementTree XML library and requires the additional packages elementpath and defusedxml.
The base schemas of the XSD standards are included in the package for working offline and to speed up the building
of schema instances.
CHAPTER 2
Usage
import xmlschema
The module initialization builds the XSD meta-schemas and the dictionary containing the code points of the Unicode
categories.
2.1 Create a schema instance
Import the library and then create an instance of a schema using the path of the file containing the schema as argument:
The argument can also be an open file-like object or a string containing the schema. This option might not work when
the schema includes other local subschemas, because the package cannot know anything about the schema’s source
location:
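To see why the source location matters, consider how a relative location in an xs:include could be resolved. The following is a minimal sketch that uses only the standard library's urljoin; the URLs are invented for illustration and this is not the package's own resolution code.

```python
from urllib.parse import urljoin

# Hypothetical URL of a schema whose location is known.
schema_url = "https://example.com/schemas/vehicles.xsd"

# A relative schemaLocation, as it could appear in an xs:include element.
relative_location = "cars.xsd"

# With a known base, the relative location becomes an absolute URL.
resolved = urljoin(schema_url, relative_location)
print(resolved)

# Without a base (schema passed as a string or a file-like object) the
# location stays relative, so it can only be looked up from the directory
# where the process happens to run.
unresolved = urljoin("", relative_location)
print(unresolved)
```

When the schema has to be built from a string, a base URL can be supplied explicitly (see the base_url argument in the API documentation).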
Schema:
Path: /xs:schema/xs:element/xs:complexType/xs:sequence/xs:element
2.2 XSD declarations
The schema object includes XSD declarations (notations, types, elements, attributes, groups, attribute_groups,
substitution_groups). The global XSD declarations are available as attributes of the schema instance:
>>> schema.types
NamespaceView({'vehicleType': XsdComplexType(name='vehicleType')})
>>> pprint(dict(schema.elements))
{'bikes': XsdElement(name='vh:bikes', occurs=[1, 1]),
'cars': XsdElement(name='vh:cars', occurs=[1, 1]),
'vehicles': XsdElement(name='vh:vehicles', occurs=[1, 1])}
>>> schema.attributes
NamespaceView({'step': XsdAttribute(name='vh:step')})
Those declarations are local views of the XSD global maps shared between related schema instances. The global maps
can be accessed through the XMLSchema.maps attribute:
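The idea of a local view over a shared map can be sketched with a small read-only mapping. The class below is a hypothetical illustration, not the library's NamespaceView; it assumes that the shared map uses qualified names of the form {namespace}name as keys.

```python
from collections.abc import Mapping


class NamespaceViewSketch(Mapping):
    """Read-only view of the entries of a shared map that belong to one namespace."""

    def __init__(self, shared_map, namespace):
        self._shared = shared_map
        self._prefix = "{%s}" % namespace if namespace else ""

    def __getitem__(self, local_name):
        # Look up the qualified name in the shared map.
        return self._shared[self._prefix + local_name]

    def __iter__(self):
        for key in self._shared:
            if self._prefix:
                if key.startswith(self._prefix):
                    yield key[len(self._prefix):]
            elif not key.startswith("{"):
                yield key

    def __len__(self):
        return sum(1 for _ in self)


# Placeholder strings stand in for the XSD components.
shared_types = {
    "{http://example.com/vehicles}vehicleType": "vehicle type definition",
    "{http://example.com/other}otherType": "other type definition",
}
view = NamespaceViewSketch(shared_types, "http://example.com/vehicles")
print(dict(view))

# The view is live: entries added to the shared map show up immediately.
shared_types["{http://example.com/vehicles}wheelType"] = "wheel type definition"
print(sorted(view))
```

Because the view keeps a reference to the shared map and no copy of it, every schema instance that shares the map sees the same declarations.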
Schema objects include methods for finding XSD elements and attributes in the schema. Those are methods of the
ElementTree API, so you can use an XPath expression for defining the search criteria:
>>> schema.find('vh:vehicles/vh:bikes')
XsdElement(ref='vh:bikes', occurs=[1, 1])
>>> pprint(schema.findall('vh:vehicles/*'))
[XsdElement(ref='vh:cars', occurs=[1, 1]),
XsdElement(ref='vh:bikes', occurs=[1, 1])]
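For comparison, the same find() and findall() calls can be tried on a plain XML tree with the standard library. This sketch only shows the ElementTree calls that the schema methods mirror; the XML text is a made-up sample and no schema is involved.

```python
import xml.etree.ElementTree as ET

xml_text = """
<vh:vehicles xmlns:vh="http://example.com/vehicles">
  <vh:cars>
    <vh:car make="Porsche" model="911"/>
  </vh:cars>
  <vh:bikes>
    <vh:bike make="Yamaha" model="XS650"/>
  </vh:bikes>
</vh:vehicles>
"""
root = ET.fromstring(xml_text)

# Prefixes used in the path are resolved through this prefix-to-URI map.
namespaces = {"vh": "http://example.com/vehicles"}

# find() returns the first match, or None when nothing matches.
bikes = root.find("vh:bikes", namespaces)
print(bikes.tag)

# findall() returns all the matches in document order.
children = root.findall("*")
print([child.tag for child in children])
```

In the plain tree the path is relative to the root element, while a path given to a schema starts from the schema itself, with the global elements as its children.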
2.3 Validation
The library provides several methods to validate an XML document with a schema.
The first is the method XMLSchema.is_valid(). This method returns True if the XML argument is
validated by the schema loaded in the instance, and False if the document is invalid.
>>> schema.is_valid('xmlschema/tests/cases/examples/vehicles/vehicles.xml')
True
>>> schema.is_valid('xmlschema/tests/cases/examples/vehicles/vehicles-1_error.xml')
False
>>> schema.is_valid("""<?xml version="1.0" encoding="UTF-8"?><fancy_tag/>""")
False
An alternative way of validating an XML document is implemented by the method XMLSchema.validate(),
which raises an error when the XML doesn’t conform to the schema:
>>> schema.validate('xmlschema/tests/cases/examples/vehicles/vehicles.xml')
>>> schema.validate('xmlschema/tests/cases/examples/vehicles/vehicles-1_error.xml')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/brunato/Development/projects/xmlschema/xmlschema/schema.py", line 220, in validate
raise error
xmlschema.exceptions.XMLSchemaValidationError: failed validating <Element ...
Schema:
<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element maxOccurs="unbounded" minOccurs="0" name="car" type="vh:vehicleType" />
</xs:sequence>
Instance:
<ns0:cars xmlns:ns0="http://example.com/vehicles">
NOT ALLOWED CHARACTER DATA
<ns0:car make="Porsche" model="911" />
A validation method is also available at module level, useful when you need to validate a document only once or if
you extract information about the schema, typically the schema location and the namespace, directly from the XML
document:
2.4 Data decoding and encoding
>>> schema.types['vehicleType'].decode
<bound method XsdComplexType.decode of XsdComplexType(name='vehicleType')>
>>> schema.elements['cars'].encode
<bound method ValidationMixin.encode of XsdElement(name='vh:cars', occurs=[1, 1])>
Those methods can be used to decode the corresponding parts of the XML document:
{'@xmlns:vh': 'http://example.com/vehicles',
'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
{'@make': 'Yamaha', '@model': 'XS650'}]}
You can also decode the entire XML document to a nested dictionary:
The decoded values coincide with the datatypes declared in the XSD schema:
>>> pprint(xs.to_dict('xmlschema/tests/cases/examples/collection/collection.xml'))
{'@xmlns:col': 'http://example.com/ns/collection',
'@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
'@xsi:schemaLocation': 'http://example.com/ns/collection collection.xsd',
'object': [{'@available': True,
'@id': 'b0836217462',
'author': {'@id': 'PAR',
'born': '1841-02-25',
'dead': '1919-12-03',
'name': 'Pierre-Auguste Renoir',
'qualification': 'painter'},
'estimation': Decimal('10000.00'),
'position': 1,
'title': 'The Umbrellas',
'year': '1886'},
{'@available': True,
'@id': 'b0836217463',
'author': {'@id': 'JM',
'born': '1893-04-20',
'dead': '1983-12-25',
'name': 'Joan Miró',
'qualification': 'painter, sculptor and ceramicist'},
'position': 2,
'title': None,
'year': '1925'}]}
If you need to decode only a part of the XML document you can also pass an XPath expression using the path
argument.
>>> xs = xmlschema.XMLSchema('xmlschema/tests/cases/examples/vehicles/vehicles.xsd')
>>> pprint(xs.to_dict('xmlschema/tests/cases/examples/vehicles/vehicles.xml', '/vh:vehicles/vh:bikes'))
Note: Decoding with an XPath expression can be simpler than using subelements, the method illustrated previously. An
XPath expression for the schema considers the schema as the root element with global elements as its children.
All the decoding and encoding methods are based on two generator methods of the XMLSchema class, namely
iter_decode() and iter_encode(), that yield both data and validation errors. See the Schema level API section for more
information.
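The following toy class sketches how a single generator that yields both data and errors can support the other methods. It is a hypothetical illustration that only checks integer texts, not the library's implementation; only the method names are borrowed from the API described in this documentation.

```python
class ToyIntegerValidator:
    """Hypothetical validator: every item must be the text of an integer."""

    def iter_decode(self, items):
        # Yield the decoded value, or the error object for a bad item.
        for item in items:
            try:
                yield int(item)
            except ValueError as err:
                yield err

    def iter_errors(self, items):
        # Keep only the errors produced by the decoding generator.
        for result in self.iter_decode(items):
            if isinstance(result, Exception):
                yield result

    def validate(self, items):
        # Raise the first error found, if any.
        for error in self.iter_errors(items):
            raise error

    def is_valid(self, items):
        # Valid means that the error iterator is empty.
        return next(self.iter_errors(items), None) is None


validator = ToyIntegerValidator()
print(validator.is_valid(["1", "2"]))
print(validator.is_valid(["1", "x"]))
print(list(validator.iter_decode(["1", "x"])))
```

With this layout the validation and decoding methods never duplicate the traversal logic: each one is a different way of consuming the same generator.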
2.5 Validating and decoding ElementTree’s elements
The validation and decoding API also works with XML data loaded in ElementTree structures:
The standard ElementTree library lacks namespace information in trees, so you have to provide a map to convert
URIs to prefixes:
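The loss of prefixes can be observed with the standard library alone. In this sketch the to_prefixed() helper is hypothetical and only illustrates what a prefix-to-URI map makes possible; the XML text is a made-up sample.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<vh:vehicles xmlns:vh="http://example.com/vehicles">'
    '<vh:cars/>'
    '</vh:vehicles>'
)
root = ET.fromstring(xml_text)

# The parsed tag holds the namespace URI; the 'vh' prefix is not kept.
print(root.tag)

# A prefix-to-URI map restores the missing information.
namespaces = {"vh": "http://example.com/vehicles"}


def to_prefixed(tag, namespaces):
    """Convert '{uri}local' to 'prefix:local' using a prefix-to-URI map."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    for prefix, ns_uri in namespaces.items():
        if ns_uri == uri:
            return "%s:%s" % (prefix, local) if prefix else local
    return tag  # no prefix known for this URI


print(to_prefixed(root.tag, namespaces))
print(to_prefixed(root[0].tag, namespaces))
```

The map has the same shape as the namespaces arguments described in the API documentation: a mapping from namespace prefix to URI.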
You can also convert XML data using the lxml library, which works better because namespace information is associated
with each node of the trees:
{'@xmlns:vh': 'http://example.com/vehicles',
'@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
'@xsi:schemaLocation': 'http://example.com/vehicles vehicles.xsd',
'vh:bikes': {'vh:bike': [{'@make': 'Harley-Davidson', '@model': 'WL'},
{'@make': 'Yamaha', '@model': 'XS650'}]},
'vh:cars': {'vh:car': [{'@make': 'Porsche', '@model': '911'},
{'@make': 'Porsche', '@model': '911'}]}}
2.6 Customize the decoded data structure
Starting from version 0.9.9 the package includes converter objects that control the decoding process and
produce different data structures. Those objects intervene at element level to compose the decoded data (attributes and
content) into a data structure.
The default converter produces a data structure similar to the format produced by previous versions of the package. You
can customize the conversion process by providing a converter instance or subclass when you create a schema instance
or when you decode an XML document. For instance, you can use the BadgerFish converter for a schema
instance:
>>> import xmlschema
>>> from pprint import pprint
>>> xml_schema = 'xmlschema/tests/cases/examples/vehicles/vehicles.xsd'
>>> xml_document = 'xmlschema/tests/cases/examples/vehicles/vehicles.xml'
>>> xs = xmlschema.XMLSchema(xml_schema, converter=xmlschema.BadgerFishConverter)
>>> pprint(xs.to_dict(xml_document, dict_class=dict), indent=4)
{ '@xmlns': { 'vh': 'http://example.com/vehicles',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance'},
'vh:vehicles': { '@xsi:schemaLocation': 'http://example.com/vehicles '
'vehicles.xsd',
'vh:bikes': { 'vh:bike': [ { '@make': 'Harley-Davidson',
'@model': 'WL'},
{ '@make': 'Yamaha',
'@model': 'XS650'}]},
'vh:cars': { 'vh:car': [ { '@make': 'Porsche',
'@model': '911'},
{ '@make': 'Porsche',
'@model': '911'}]}}}
You can also change the data decoding process by providing the keyword argument converter to the method call:
>>> pprint(xs.to_dict(xml_document, converter=xmlschema.ParkerConverter, dict_class=dict), indent=4)
See the XML Schema converters section for more information about converters.
2.7 Decoding to JSON
The data structure created by the decoder can be easily serialized to JSON. But if your data includes Decimal values
(for the decimal XSD built-in type) you cannot convert the data to JSON directly:
>>> import xmlschema
>>> import json
>>> xml_document = 'xmlschema/tests/cases/examples/collection/collection.xml'
>>> print(json.dumps(xmlschema.to_dict(xml_document), indent=4))
Traceback (most recent call last):
File "/usr/lib64/python2.7/doctest.py", line 1315, in __run
compileflags, 1) in test.globs
File "<doctest default[3]>", line 1, in <module>
print(json.dumps(xmlschema.to_dict(xml_document), indent=4))
File "/usr/lib64/python2.7/json/__init__.py", line 251, in dumps
sort_keys=sort_keys, **kw).encode(obj)
File "/usr/lib64/python2.7/json/encoder.py", line 209, in encode
This problem is resolved by providing an alternative JSON-compatible type for Decimal values, using the keyword
argument decimal_type:
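The effect can be reproduced with the standard library alone. In this sketch the small dictionary is a hand-written stand-in for decoded data, and the conversion to str imitates what a JSON-compatible type passed as decimal_type (str is assumed here) would give for Decimal values.

```python
import json
from decimal import Decimal

# Hand-written stand-in for a fragment of decoded data.
decoded = {"title": "The Umbrellas", "estimation": Decimal("10000.00")}

# The standard JSON encoder rejects Decimal values.
try:
    json.dumps(decoded)
except TypeError as err:
    print("not serializable:", err)

# The same data with Decimal values replaced by a JSON-compatible type.
as_str = {
    key: str(value) if isinstance(value, Decimal) else value
    for key, value in decoded.items()
}
print(json.dumps(as_str, sort_keys=True))
```

Choosing str keeps every digit of the original value, while float is more convenient for arithmetic but can lose precision.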
From version 1.0 there are two module-level functions that simplify the JSON serialization and deserialization tasks. See
xmlschema.to_json() and xmlschema.from_json() in the Document level API section.
2.8 XSD validation modes
Starting from version 0.9.10 the library uses the XSD validation modes strict/lax/skip, both for schemas and for XML
instances. Each validation mode defines a specific behaviour:
strict Schemas are validated against the meta-schema. The processor stops when an error is found in a schema or
during the validation/decoding of XML data.
lax Schemas are validated against the meta-schema. The processor collects the errors and continues, replacing
missing parts with wildcards if necessary. Undecodable XML data is replaced with None.
skip Schemas are not validated against the meta-schema. The processor doesn’t collect any errors. Undecodable XML
data is replaced with the original text.
The default mode is strict, both for schemas and for XML data. The mode is set with the validation argument,
provided when creating the schema instance or when validating/decoding XML data. For example, you can
build a schema using the strict mode and then decode XML data with the validation argument set to ‘lax’.
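The difference between the three modes during decoding can be sketched with a toy decoder for integer texts. The function below is a hypothetical illustration of the behaviour described above, not the library's code.

```python
def decode_integers(texts, validation="strict"):
    """Toy decoder for a list of integer texts with three validation modes."""
    if validation not in ("strict", "lax", "skip"):
        raise ValueError("unknown validation mode: %r" % validation)
    values, errors = [], []
    for text in texts:
        try:
            values.append(int(text))
        except ValueError as err:
            if validation == "strict":
                raise  # strict: stop at the first error
            if validation == "lax":
                errors.append(err)   # lax: collect the error and continue
                values.append(None)  # undecodable data becomes None
            else:
                values.append(text)  # skip: keep the original text
    # In lax mode the errors are returned together with the data.
    return (values, errors) if validation == "lax" else values


print(decode_integers(["1", "2"]))
print(decode_integers(["1", "x"], validation="skip"))
data, problems = decode_integers(["1", "x"], validation="lax")
print(data, len(problems))
```

The lax branch returns a tuple, in line with the document level API, where the errors collected with validation='lax' are returned together with the decoded data.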
2.9 XML attacks prevention
Starting from release 0.9.27 the XML data loading is protected using the defusedxml package. The protection is
applied both to XSD schemas and to XML data. The usage of this feature is regulated by the defuse argument of
XMLSchema. By default this argument has the value ‘remote’, which means that the protection is applied only to data
loaded from remote locations. The other possible values for this argument are ‘always’ and ‘never’.
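The meaning of the three values can be sketched as a small decision function. The function is hypothetical, and its rule for what counts as remote (any URL scheme other than file) is an assumption of this sketch, not necessarily the rule applied by the library.

```python
from urllib.parse import urlsplit


def should_defuse(location, defuse="remote"):
    """Toy decision rule for the 'always', 'remote' and 'never' settings."""
    if defuse == "always":
        return True
    if defuse == "never":
        return False
    if defuse == "remote":
        # Assumption: plain paths and file: URLs are local, the rest is remote.
        return urlsplit(location).scheme not in ("", "file")
    raise ValueError("defuse must be 'always', 'remote' or 'never'")


print(should_defuse("https://example.com/vehicles.xml"))
print(should_defuse("tests/vehicles.xml"))
print(should_defuse("tests/vehicles.xml", defuse="always"))
```

With the default setting, local files are loaded without the extra checks, while anything fetched over the network is protected.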
CHAPTER 3
API Documentation
• xml_document – can be a file-like object or a string containing the XML data or a file
path or a URL of a resource or an ElementTree/Element instance.
• schema – can be a schema instance or a file-like object or a file path or a URL of a resource
or a string containing the schema.
• cls – schema class to use for building the instance (by default uses XMLSchema).
• path – is an optional XPath expression that matches the subelement of the document that
has to be decoded. The XPath expression considers the schema as the root element with
global elements as its children.
• process_namespaces – indicates whether to use namespace information in the decoding
process.
• locations – additional schema location hints, in case a schema instance has to be built.
• base_url – is an optional custom base URL for remapping relative locations; by default
uses the directory where the XSD or alternatively the XML document is located.
• kwargs – optional arguments of XMLSchema.iter_decode() as keyword arguments
to vary the decoding process.
Returns an object containing the decoded data. If the validation='lax' keyword argument is
provided, the validation errors are collected and returned in a tuple together with the decoded
data.
Raises XMLSchemaValidationError if the object is not decodable by the XSD component,
or if it’s invalid when validation='strict' is provided.
xmlschema.to_json(xml_document, fp=None, schema=None, cls=None, path=None, converter=None,
process_namespaces=True, locations=None, base_url=None, json_options=None,
**kwargs)
Serialize an XML document to JSON. By default the XML data is validated during the decoding phase. Raises
an XMLSchemaValidationError if the XML document is not valid against the schema.
Parameters
• xml_document – can be a file-like object or a string containing the XML data or a file
path or a URI of a resource or an ElementTree/Element instance.
• fp – can be a write() supporting file-like object.
• schema – can be a schema instance or a file-like object or a file path or a URL of a
resource or a string containing the schema.
• cls – schema class to use for building the instance (by default uses XMLSchema).
• path – is an optional XPath expression that matches the subelement of the document that
has to be decoded. The XPath expression considers the schema as the root element with
global elements as its children.
• converter – an XMLSchemaConverter subclass or instance to use for the decoding.
• process_namespaces – indicates whether to use namespace information in the decoding
process.
• locations – additional schema location hints, in case a schema instance has to be built.
• base_url – is an optional custom base URL for remapping relative locations; by default
uses the directory where the XSD or alternatively the XML document is located.
• json_options – a dictionary with options for the JSON serializer.
• validation (str) – defines the XSD validation mode to use for building the schema; its
value can be ‘strict’, ‘lax’ or ‘skip’.
• global_maps (XsdGlobals or None) – is an optional argument containing an
XsdGlobals instance, a mediator object for sharing declaration data between dependent
schema instances.
• converter (XMLSchemaConverter or None) – is an optional argument that can
be an XMLSchemaConverter subclass or instance, used for defining the default XML
data converter for the schema instance.
• locations (dict or list or None) – schema location hints for namespace imports.
Can be a dictionary or a sequence of pairs (namespace URI, resource URL).
• base_url (str or None) – is an optional base URL, used for the normalization of
relative paths when the URL of the schema resource can’t be obtained from the source
argument.
• defuse (str or None) – defines when to defuse XML data. Can be ‘always’, ‘remote’
or ‘never’. By default only remote XML data is defused.
• timeout (int) – the timeout in seconds for fetching resources. Default is 300.
• build (bool) – defines whether to build the schema maps. Default is True.
Variables
• XSD_VERSION (str) – stores the XSD version (1.0 or 1.1).
• meta_schema (XMLSchema) – the XSD meta-schema instance.
• target_namespace (str) – is the targetNamespace of the schema, the namespace to
which the declarations/definitions of the schema belong. If it’s empty no namespace is
associated with the schema. In this case the schema declarations can be reused from other
namespaces as chameleon definitions.
• validation (str) – validation mode, can be ‘strict’, ‘lax’ or ‘skip’.
• maps (XsdGlobals) – XSD global declarations/definitions maps. This is an instance of
XsdGlobals, which stores the global_maps argument or a new object when this argument is
not provided.
• converter (XMLSchemaConverter) – the default converter used for XML data
decoding/encoding.
• locations (NamespaceResourcesMap) – schema location hints.
• namespaces (dict) – a dictionary that maps the prefixes used by the schema to
namespace URIs.
• warnings – warning messages about failures of import and include elements.
• notations (NamespaceView) – xsd:notation declarations.
• types (NamespaceView) – xsd:simpleType and xsd:complexType global declarations.
• attributes (NamespaceView) – xsd:attribute global declarations.
• attribute_groups (NamespaceView) – xsd:attributeGroup definitions.
• groups (NamespaceView) – xsd:group global definitions.
• elements (NamespaceView) – xsd:element global declarations.
root
Root element of the schema.
get_text()
Gets the XSD text of the schema. If the source text is not available creates an encoded string representation
of the XSD tree.
url
Schema resource URL, is None if the schema is built from a string.
tag
Schema root tag. For compatibility with the ElementTree API.
id
The schema’s id attribute, defaults to None.
version
The schema’s version attribute, defaults to None.
attribute_form_default
The schema’s attributeFormDefault attribute, defaults to 'unqualified'.
element_form_default
The schema’s elementFormDefault attribute, defaults to 'unqualified'.
block_default
The schema’s blockDefault attribute, defaults to None.
final_default
The schema’s finalDefault attribute, defaults to None.
schema_location
A list of location hints extracted from the xsi:schemaLocation attribute of the schema.
no_namespace_schema_location
A location hint extracted from the xsi:noNamespaceSchemaLocation attribute of the schema.
target_prefix
The prefix associated with the targetNamespace.
default_namespace
The namespace associated with the empty prefix ‘’.
base_url
The base URL of the source of the schema.
root_elements
The list of global elements that are not used by reference in any model of the schema. This is implemented
as a lazy property because it’s computationally expensive to build when the schema model is complex.
builtin_types = <bound method XMLSchemaBase.builtin_types of <class 'xmlschema.validato
get_locations(namespace)
Get a list of location hints for a namespace.
include_schema(location, base_url=None)
Includes a schema for the same namespace, from a specific URL.
Parameters
• location – is the URL of the schema.
• base_url – is an optional base URL for fetching the schema resource.
Returns the included XMLSchema instance.
import_schema(namespace, location, base_url=None, force=False)
Imports a schema for an external namespace, from a specific URL.
Parameters
• namespace – is the URI of the external namespace.
• location – is the URL of the schema.
• base_url – is an optional base URL for fetching the schema resource.
• force – if set to True, imports the schema even if the namespace is already imported.
Returns the imported XMLSchema instance.
classmethod create_schema(*args, **kwargs)
Creates a new schema instance of the same class as the caller.
create_any_content_group(parent, name=None)
Creates a model group related to the schema instance that accepts any content.
create_any_attribute_group(parent, name=None)
Creates an attribute group related to the schema instance that accepts any attribute.
classmethod check_schema(schema, namespaces=None)
Validates the given schema against the XSD meta-schema (meta_schema).
Parameters
• schema – the schema instance that has to be validated.
• namespaces – is an optional mapping from namespace prefix to URI.
Raises XMLSchemaValidationError if the schema is invalid.
build()
Builds the schema XSD global maps.
built
Property that is True if schema validator has been fully parsed and built, False otherwise.
validation_attempted
Property that returns the XSD component validation status. It can be ‘full’, ‘partial’ or ‘none’.
https://www.w3.org/TR/xmlschema-1/#e-validation_attempted
https://www.w3.org/TR/2012/REC-xmlschema11-1-20120405/#e-validation_attempted
validity
Property that returns the XSD validator’s validity. It can be ‘valid’, ‘invalid’ or ‘notKnown’.
https://www.w3.org/TR/xmlschema-1/#e-validity
https://www.w3.org/TR/2012/REC-xmlschema11-1-20120405/#e-validity
all_errors
A list with all the building errors of the XSD validator and its components.
iter_components(xsd_classes=None)
Creates an iterator for traversing all XSD components of the validator.
Parameters xsd_classes – returns only a specific class/classes of components, otherwise
returns all components.
iter_globals(schema=None)
Creates an iterator for XSD global definitions/declarations.
Parameters schema – Optional schema instance.
get_converter(converter=None, namespaces=None, **kwargs)
Returns a new converter instance.
Parameters
• converter – can be a converter class or instance. If it’s an instance the new instance is
copied from it and configured with the provided arguments.
• namespaces – is an optional mapping from namespace prefix to URI.
• kwargs – optional arguments for initializing the converter instance.
Returns a converter instance.
validate(source, use_defaults=True, namespaces=None)
Validates XML data against the XSD schema/component instance.
Parameters
• source – the source of XML data. For a schema it can be a path to a file, a URI of a
resource, an open file-like object, an ElementTree instance or a string containing
XML data. For other XSD components it can be a string for an attribute or a simple type
validator, or an ElementTree Element otherwise.
• use_defaults – indicates whether to use default values for filling missing data.
• namespaces – is an optional mapping from namespace prefix to URI.
Raises XMLSchemaValidationError if the XML data instance is not valid.
is_valid(source, use_defaults=True)
Like validate(), except that it does not raise an exception but returns True if the XML document is valid,
False if it’s invalid.
Parameters
• source – the source of XML data. For a schema it can be a path to a file, a URI of a
resource, an open file-like object, an ElementTree instance or a string containing
XML data. For other XSD components it can be a string for an attribute or a simple type
validator, or an ElementTree Element otherwise.
• use_defaults – indicates whether to use default values for filling missing data.
iter_errors(source, path=None, use_defaults=True, namespaces=None)
Creates an iterator for the errors generated by the validation of XML data against the XSD
schema/component instance.
Parameters
• source – the source of XML data. For a schema it can be a path to a file, a URI of a
resource, an open file-like object, an ElementTree instance or a string containing
XML data. For other XSD components it can be a string for an attribute or a simple type
validator, or an ElementTree Element otherwise.
• path – is an optional XPath expression that defines the parts of the document that have to
be validated. The XPath expression considers the schema as the root element with global
elements as its children.
• use_defaults – Use schema’s default values for filling missing data.
3.3 ElementTree and XPath API
class xmlschema.ElementPathMixin
Mixin abstract class for enabling ElementTree and XPath API on XSD components.
Variables
• text – The Element text. Its value is always None. For compatibility with the ElementTree
API.
• tail – The Element tail. Its value is always None. For compatibility with the ElementTree
API.
tag
Alias of the name attribute. For compatibility with the ElementTree API.
attrib
Returns the Element attributes. For compatibility with the ElementTree API.
get(key, default=None)
Gets an Element attribute. For compatibility with the ElementTree API.
iter(tag=None)
Creates an iterator for the XSD element and its subelements. If tag is not None or ‘*’, only XSD elements
whose name matches tag are returned from the iterator. Local elements are expanded without repetitions.
Element references are not expanded because the global elements are not descendants of other elements.
iterchildren(tag=None)
Creates an iterator for the child elements of the XSD component. If tag is not None or ‘*’, only XSD
elements whose name matches tag are returned from the iterator.
find(path, namespaces=None)
Finds the first XSD subelement matching the path.
Parameters
• path – an XPath expression that considers the XSD component as the root element.
• namespaces – an optional mapping from namespace prefix to full name.
Returns The first matching XSD subelement or None if there is no match.
findall(path, namespaces=None)
Finds all XSD subelements matching the path.
Parameters
• path – an XPath expression that considers the XSD component as the root element.
• namespaces – an optional mapping from namespace prefix to full name.
Returns a list containing all matching XSD subelements in document order, an empty list is
returned if there is no match.
iterfind(path, namespaces=None)
Creates an iterator for all XSD subelements matching the path.
Parameters
• path – an XPath expression that considers the XSD component as the root element.
• namespaces – is an optional mapping from namespace prefix to full name.
Returns an iterable yielding all matching XSD subelements in document order.
3.5 XML Schema converters
The base class XMLSchemaConverter is used for defining generic converters. The subclasses implement some of the
most used conventions for converting XML to JSON data.
class xmlschema.converters.ElementData(tag, text, content, attributes)
Namedtuple for Element data interchange between decoders and converters.
class xmlschema.XMLSchemaConverter(namespaces=None, dict_class=None, list_class=None,
text_key='$', attr_prefix='@', cdata_prefix=None,
etree_element_class=None, indent=4, **kwargs)
Generic XML Schema based converter class. A converter is used to compose decoded XML data for an Element
into a data structure and to build an Element from an encoded data structure.
Parameters
• namespaces – map from namespace prefixes to URI.
• dict_class – dictionary class to use for decoded data. Default is dict.
• list_class – list class to use for decoded data. Default is list.
• text_key – is the key to apply to element’s decoded text data.
• attr_prefix – controls the mapping of XML attributes, to the same name or with a
prefix. If None the converter ignores attributes.
• cdata_prefix – is used for including and prefixing the CDATA parts of a mixed content,
that are labeled with an integer instead of a string. CDATA parts are ignored if this argument
is None.
• etree_element_class – the class that has to be used to create new XML elements, if
not provided uses the ElementTree’s Element class.
• indent – number of spaces for XML indentation (default is 4).
Variables
• dict – dictionary class to use for decoded data.
• list – list class to use for decoded data.
• text_key – key for decoded Element text
• attr_prefix – prefix for attribute names
• cdata_prefix – prefix for character data parts
• etree_element_class – Element class to use
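The role of text_key and attr_prefix can be sketched with a small untyped function built on the standard library. The function is a hypothetical simplification: it knows nothing about XSD types and always collects children into lists, which the library's converters do not necessarily do.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem, text_key="$", attr_prefix="@"):
    """Compose attributes, children and text of an element into a dictionary."""
    # Attributes are stored under their name with the prefix prepended.
    data = {attr_prefix + name: value for name, value in elem.attrib.items()}
    # Children with the same tag are collected into a list.
    for child in elem:
        child_data = element_to_dict(child, text_key, attr_prefix)
        data.setdefault(child.tag, []).append(child_data)
    # Non-empty text content is stored under the text key.
    text = (elem.text or "").strip()
    if text:
        data[text_key] = text
    return data


root = ET.fromstring(
    '<cars>'
    '<car make="Porsche" model="911">sold</car>'
    '<car make="Fiat" model="500"/>'
    '</cars>'
)
result = element_to_dict(root)
print(result)
```

Changing text_key or attr_prefix only changes the keys of the result, which is how a converter can follow different XML to JSON conventions with the same decoded data.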
is_lazy()
Returns True if the XML resource is lazy.
is_loaded()
Returns True if the XML text of the data source is loaded.
iter(tag=None)
XML resource tree elements lazy iterator.
iter_location_hints()
Yields schema location hints from the XML tree.
get_namespaces()
Extracts namespaces with related prefixes from the XML resource. If a duplicate prefix declaration is
encountered then adds the namespace using a different prefix, but only if the namespace URI is
not already mapped by another prefix.
Returns A dictionary for mapping namespace prefixes to full URI.
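The rule for duplicate prefixes can be sketched with the start-ns events of the standard library's iterparse(). The function and its renaming scheme (the first free numeric suffix) are hypothetical and may differ from what the library does.

```python
import io
import xml.etree.ElementTree as ET


def collect_namespaces(xml_text):
    """Map prefixes to URIs, renaming a prefix redeclared with a new URI."""
    namespaces = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if prefix not in namespaces:
            namespaces[prefix] = uri
        elif namespaces[prefix] != uri and uri not in namespaces.values():
            # Hypothetical renaming scheme: append the first free number.
            n = 0
            while "%s%d" % (prefix, n) in namespaces:
                n += 1
            namespaces["%s%d" % (prefix, n)] = uri
    return namespaces


xml_text = (
    '<root xmlns="http://example.com/a" xmlns:p="http://example.com/b">'
    '<child xmlns:p="http://example.com/c"/>'
    '<other xmlns:p="http://example.com/a"/>'
    '</root>'
)
found = collect_namespaces(xml_text)
print(found)
```

In the sample, the second declaration of the prefix p gets a new prefix because its URI is unknown, while the third one is dropped because its URI is already mapped by the empty prefix.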
get_locations(locations=None)
Returns a list of schema location hints. The locations are normalized using the base URL of the instance.
The locations argument can be a dictionary or a list of namespace resources, that are inserted before the
schema location hints extracted from the XML resource.
xmlschema.fetch_resource(location, base_url=None, timeout=30)
Fetches a resource by trying to access it. If the resource is accessible returns the URL, otherwise raises an error
(XMLSchemaURLError).
Parameters
• location – a URL or a file path.
• base_url – reference base URL for normalizing local and relative URLs.
• timeout – the timeout in seconds for the connection attempt in case of remote data.
Returns a normalized URL.
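The normalize-then-probe behaviour described above can be approximated with the standard library. This is a sketch, not the library's implementation: both function names are hypothetical, POSIX-style paths are assumed, and a plain OSError stands in for XMLSchemaURLError.

```python
from pathlib import Path
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


def normalize_location(location, base_url=None):
    """Hypothetical helper: turn a path or relative URL into an absolute URL."""
    if urlsplit(location).scheme:
        return location  # already absolute (POSIX-style paths assumed)
    if base_url is not None:
        return urljoin(base_url, location)
    return Path(location).resolve().as_uri()


def fetch_resource_sketch(location, base_url=None, timeout=30):
    """Return the normalized URL if it can be opened, else raise OSError."""
    url = normalize_location(location, base_url)
    try:
        with urlopen(url, timeout=timeout):
            pass  # opening the resource is the accessibility check
    except OSError as err:  # urllib's URLError is a subclass of OSError
        raise OSError('cannot access resource %r: %s' % (url, err))
    return url
```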
xmlschema.fetch_schema(source, locations=None, **resource_options)
Fetches the schema URL for the root element of an XML data source. Raises a ValueError if no accessible
schema location is found.
Parameters
• source – an Element or an ElementTree with XML data, a URL, or a file-like
object.
• locations – a dictionary or dictionary items with schema location hints.
• resource_options – keyword arguments for providing XMLResource class init options.
Returns A URL referring to a reachable schema resource.
xmlschema.fetch_schema_locations(source, locations=None, **resource_options)
Fetches the schema URL for the root element of an XML data source, together with a list of location hints.
Raises a ValueError if no accessible schema location is found.
Parameters
• source – an Element or an ElementTree with XML data, a URL, or a file-like object.
• locations – a dictionary or dictionary items with schema location hints.
exception xmlschema.XMLSchemaException
The base exception that lets you catch all the errors generated by the library.
exception xmlschema.XMLSchemaRegexError
Raised when an error is found when parsing an XML Schema regular expression.
exception xmlschema.XMLSchemaValidatorError(validator, message, elem=None,
source=None, namespaces=None)
Base class for XSD validator errors.
Parameters
• validator (XsdValidator or function) – the XSD validator.
• message (str or unicode) – the error message.
• elem (Element) – the element that contains the error.
• source (XMLResource) – the XML resource that contains the error.
• namespaces (dict) – is an optional mapping from namespace prefix to URI.
Variables path – the XPath of the element, calculated when the element is set or the XML resource
is set.
exception xmlschema.XMLSchemaNotBuiltError(validator, message)
Raised on an attempt to use an XSD validator that has not been built.
Parameters
• validator (XsdValidator) – the XSD validator.
• message (str or unicode) – the error message.
exception xmlschema.XMLSchemaParseError(validator, message, elem=None)
Raised when an error is found during the building of an XSD validator.
Parameters
• validator (XsdValidator or function) – the XSD validator.
• message (str or unicode) – the error message.
• elem (Element) – the element that contains the error.
exception xmlschema.XMLSchemaValidationError(validator, obj, reason=None, source=None,
namespaces=None)
Raised when the XML data does not validate against the XSD component or schema. It is used by decoding and
encoding methods. Encoding validation errors do not include the XML data element and source, so the error is
limited to a message containing the object representation and a reason.
Parameters
• validator (XsdValidator or function) – the XSD validator.
• obj (Element or tuple or str or list or int or float or bool)
– the XML data that failed validation.
• reason (str or unicode) – the detailed reason of failed validation.
• source (XMLResource) – the XML resource that contains the error.
• namespaces (dict) – is an optional mapping from namespace prefix to URI.
exception xmlschema.XMLSchemaDecodeError(validator, obj, decoder, reason=None,
source=None, namespaces=None)
Raised when an XML data string is not decodable to a Python object.
Parameters
• validator (XsdValidator or function) – the XSD validator.
• obj (Element or tuple or str or list or int or float or bool)
– the XML data that failed validation.
• decoder (type or function) – the XML data decoder.
• reason (str or unicode) – the detailed reason of failed validation.
• source (XMLResource) – the XML resource that contains the error.
• namespaces (dict) – is an optional mapping from namespace prefix to URI.
exception xmlschema.XMLSchemaEncodeError(validator, obj, encoder, reason=None,
source=None, namespaces=None)
Raised when an object is not encodable to an XML data string.
Parameters
• validator (XsdValidator or function) – the XSD validator.
CHAPTER 4
Testing
The tests of the xmlschema library are implemented using Python’s unittest library. The test scripts are located
under the installation base in the tests/ subdirectory. There are several test scripts, each one for a different topic:
test_etree.py Tests for ElementTree functionalities
test_helpers.py Tests for helper functions and classes
test_meta.py Tests for the XSD meta-schema and XSD builtins
test_models.py Tests concerning model groups validation
test_package.py Tests regarding packaging and forgotten development code
test_resources.py Tests about XML/XSD resources access
test_regex.py Tests about XSD regular expressions
test_schemas.py Tests about parsing of XSD Schemas
test_validators.py Tests regarding XML data validation/decoding/encoding
test_xpath.py Tests for XPath parsing and selectors
You can run all tests with the script test_all.py. From the project source base, if you have the tox automation tool
installed, you can run all tests with all supported Python versions using the command tox.
Two scripts (test_schemas.py, test_validators.py) create most of their tests dynamically, loading a set of XSD or XML files.
Only a small set of test files is published in the repository for copyright reasons. You can find the published test files
in the xmlschema/tests/examples/ subdirectory.
You can locally extend the tests with your own set of files. To do this, create the base subdirectory
xmlschema/tests/extra-schemas/ and then copy your XSD/XML files into it. After the files are copied, create a new file
called testfiles in the extra-schemas/ subdirectory:
cd tests/extra-schemas/
touch testfiles
Fill the file testfiles with the list of paths of files you want to be tested, one per line, as in the following example:
# XHTML
XHTML/xhtml11-mod.xsd
XHTML/xhtml-datatypes-1.xsd
# Quantum Espresso
qe/qes.xsd
qe/qes_neb.xsd
qe/qes_with_choice_no_nesting.xsd
qe/silicon.xml
qe/silicon-1_error.xml --errors 1
qe/silicon-3_errors.xml --errors=3
qe/SrTiO_3.xml
qe/SrTiO_3-2_errors.xml --errors 2
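For a large collection, a first draft of the file can be generated instead of typed by hand. The following is only a convenience sketch: `write_testfiles` is a hypothetical helper that is not part of the test suite, and options such as the expected number of errors still have to be added to the relevant lines manually.

```python
from pathlib import Path


def write_testfiles(base_dir):
    """Hypothetical helper: list every XSD/XML file under base_dir."""
    base = Path(base_dir)
    # Paths are written relative to base_dir, one per line, sorted by name.
    paths = sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob('*')
        if p.is_file() and p.suffix in ('.xsd', '.xml')
    )
    (base / 'testfiles').write_text('\n'.join(paths) + '\n')
    return paths
```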
The test scripts create a test for each listed file, depending on the context. For example, the script that tests the schemas
uses only .xsd files, whereas the script that tests the validation uses both types, validating each XML file against
its schema and each XSD against the meta-schema.
If a file has errors, insert an integer number after the path. This is the number of errors that the XML Schema validator
has to find for the test to pass.
From version 1.0.0 each test-case line is parsed for the following additional arguments:
-L URI URL Schema location hint overrides.
--version=VERSION XSD schema version to use for the test case (default is 1.0).
--errors=NUM Number of errors expected (default=0).
--warnings=NUM Number of warnings expected (default=0).
--inspect Inspect using an observed custom schema class.
--defuse=(always, remote, never) Define when to use the defused XML data loaders.
--timeout=SEC Timeout for fetching resources (default=300).
--skip Skip strict encoding checks (for cases where test data uses default or fixed values or some test data are skipped
by wildcards processContents).
--debug Activate the debug mode (only the cases with --debug are executed).
If you put --help on the first case line, the argument parser shows you all the available options.
Note: Test case line options changed in version 1.0.0, which uses almost exclusively double-dash prefixed
options, in order to simplify text search in long testfiles and to add or remove options without the risk of also
changing parts of file paths.
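To make the line format concrete, here is a sketch of how a test-case line with these options could be parsed using argparse. It is not the test suite's own parser: `build_line_parser` and `parse_line` are hypothetical names, and only the defaults stated in the list above are reproduced.

```python
import argparse
import shlex


def build_line_parser():
    """Hypothetical parser mirroring the documented test-case options."""
    parser = argparse.ArgumentParser(prog='testfiles-line')
    parser.add_argument('filename', help='path of the XSD or XML file')
    parser.add_argument('-L', dest='locations', nargs=2, action='append',
                        default=[], metavar=('URI', 'URL'))
    parser.add_argument('--version', default='1.0')
    parser.add_argument('--errors', type=int, default=0)
    parser.add_argument('--warnings', type=int, default=0)
    parser.add_argument('--inspect', action='store_true')
    # No default is documented for --defuse, so none is set here.
    parser.add_argument('--defuse', choices=('always', 'remote', 'never'))
    parser.add_argument('--timeout', type=int, default=300)
    parser.add_argument('--skip', action='store_true')
    parser.add_argument('--debug', action='store_true')
    return parser


def parse_line(line, parser=None):
    """Parse one testfiles line; return None for blank or comment lines."""
    tokens = shlex.split(line, comments=True)
    if not tokens:
        return None
    return (parser or build_line_parser()).parse_args(tokens)
```

Both spellings used in the example file, `--errors 1` and `--errors=3`, are accepted by argparse, and lines starting with `#` are treated as comments.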
To also run the tests with your personal set of files, add the -x/--extra option to the command, for example:
python xmlschema/tests/test_all.py -x
or:
tox -- -x
CHAPTER 5
Release notes
5.1 License
The xmlschema library is distributed under the terms of the MIT License.
5.2 Support
The project is hosted on GitHub; see the xmlschema project page for the source code and the issue tracker.
5.3 Roadmap
• XSD 1.1