Lesson1 Introduction v2
Lesson 1: Introduction
This document, DFDL Schema Tutorial, provides an easily approachable
description of the Data Format Description Language 1.0 (DFDL for short), and
should be used alongside the formal descriptions of the language contained in
the DFDL 1.0 Specification. The intended audience of this document includes
application developers whose programs read and write DFDL-described data,
and DFDL schema authors who need to know about the features of the
language. The text assumes that you have a basic understanding of XML 1.0,
Namespaces in XML and XML Schema. Each section of the tutorial introduces
new features of the language, and describes those features in the context of
concrete examples.
Lesson 1. Explains what DFDL is, why DFDL is useful, and when to use DFDL.
The first DFDL examples are introduced.
Lesson 2. Introduces the fundamentals of the DFDL language. The subset of
XML Schema components used by DFDL is discussed as well as the special
DFDL annotations that are used with each component.
Lesson 3. Describes the syntax and rules for the DFDL properties that are
carried on DFDL annotations and used to model data.
Lesson 4. Shows how to model basic fixed length data, variable length data, and
data with initiators (tags) and terminators, using DFDL properties.
Lesson 5. Shows how to model data that contains alternatives, known in DFDL
as a choice.
Lesson 6. Shows how to model data that may optionally occur in the data
stream, or that repeats multiple times in the data stream.
Lesson 7. Covers the modeling of all kinds of text data, including strings,
numbers, calendars and Booleans.
Lesson 8. Covers the modeling of all kinds of binary data, including strings,
numbers, calendars and Booleans.
Lesson 9. Shows how padding and trimming are handled in DFDL, especially
useful when dealing with fixed length data.
Lesson 10. Want to default in values for missing data? Need to model
out-of-band data values? What does an empty string represent? This lesson
walks through examples using nil and default settings.
Lesson 11. Explains how to handle delimiters that occur in data values by using
either escape characters or escape blocks. Steps through setting up an escape
scheme.
Lesson 12. DFDL expressions. Several examples will show this powerful feature
of the language, from a simple path expression to a complex conditional
expression.
Lesson 13. Creating dynamic models. Building on top of expressions, learn how
to use DFDL variables to add in complex control of your DFDL properties and
other settings.
Lesson 14. Making assertions about your data. Discover the uses of asserts
and discriminators, and the difference between them. Concepts such as rule
placement and the timing of triggering are discussed by example.
Lesson 15. Bits versus bytes. Don't worry if your data is expressed in terms of
bits rather than bytes; DFDL can handle that as well.
Lesson 16. Have components in your data where the order of the data is not
fixed? This lesson takes you through an example of a floating component within
a group. Then we will make the entire group unordered.
Lesson 17. Elements with calculated values and hidden elements.
The tutorial is a non-normative document, meaning that it does not provide a
definitive specification of the DFDL language. The examples and other
explanatory material in this document are provided to help you understand DFDL
but they may not always provide definitive answers. In such cases, you will need
to refer to the DFDL 1.0 Specification, and to help you do this, many links
pointing to the relevant parts of the specification are provided.
Conventions used:
New terms will be introduced in italic font.
Examples will be presented in boxed courier font.
How can a DFDL schema (which itself is an XML schema) describe non-XML
data? The answer is annotations.
Annotations - A way to provide additional information in an XML schema.
They are ignored by XML parsers. They can be used to add
documentation or to add additional information that will be used by
application programs.
Well-formed XML - An XML instance document that conforms to all the
well-formedness constraints in the W3C XML Recommendation. In other
words, it follows the basic rules of XML.
Valid XML - Not only does this XML instance document follow the rules of
well-formedness, it also conforms to the rules imposed by its XML
schema.
Delimiters (also called Markup or Syntax) - Clues in the data that tell us
where pieces of the data begin and end. For example, in written English a
period (full stop) tells us that we are at the end of a sentence.
(there will be much more detail in future lessons). In this version we are using
annotations to add in the lengths of our fixed length elements:
1  <xs:element name="address" dfdl:lengthKind="implicit">
2    <xs:complexType>
3      <xs:sequence dfdl:sequenceKind="ordered">
4        <xs:element name="houseNumber" type="xs:string"
             dfdl:lengthKind="explicit" dfdl:length="6"/>
5        <xs:element name="street" type="xs:string"
             dfdl:lengthKind="explicit" dfdl:length="20"/>
6        <xs:element name="city" type="xs:string"
             dfdl:lengthKind="explicit" dfdl:length="20"/>
7        <xs:element name="state" type="xs:string"
             dfdl:lengthKind="explicit" dfdl:length="2"/>
8      </xs:sequence>
9    </xs:complexType>
10 </xs:element>
There are a few new concepts shown in the above example that we will
highlight.
If you noticed that the DFDL model in this example looks a great deal like XML
Schema, you are correct. Even though our data is not XML, DFDL
uses XML Schema to model it.
o Line 1 establishes an element named address with one DFDL property on it,
setting the length kind to implicit. This means that the length of element
address is determined by the length of its children.
o Line 3 establishes an unnamed sequence within element address. It also
carries a DFDL property saying that the sequence is ordered, so
houseNumber is always first, followed by street, then city, and finally
state.
o Line 4 lists the first element of the sequence, houseNumber. On this
element there are two DFDL properties, which together specify that it has
an explicit length of 6.
o Line 5 defines the second element in the sequence as street. The DFDL
properties on this element describe it as having an explicit length of 20.
o Line 6 defines the third element in the sequence as city. The DFDL
properties on this element describe it as having an explicit length of 20.
o Line 7 defines the fourth element in the sequence as state. The
DFDL properties on this element describe it as having an explicit length of 2.
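The listing above writes each DFDL property as an attribute directly on the XML Schema component. As a minimal sketch of how this relates to annotations, the same two properties can also be carried inside an xs:annotation element. The element name houseNumberLongForm is hypothetical and used only for comparison, and a complete DFDL schema would also need a dfdl:format annotation supplying the other required properties, which is omitted here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: not a complete DFDL schema. A dfdl:format annotation
     supplying the remaining required properties is assumed but omitted. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- Attribute (short) form, as used in the address example -->
  <xs:element name="houseNumber" type="xs:string"
              dfdl:lengthKind="explicit" dfdl:length="6"/>

  <!-- Element (long) form: the same two properties inside an annotation.
       The name houseNumberLongForm is hypothetical. -->
  <xs:element name="houseNumberLongForm" type="xs:string">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:element lengthKind="explicit" length="6"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

</xs:schema>
```

Both declarations describe a string with an explicit length of 6. The short form is more compact, while the long form makes it easier to see that the DFDL information is an annotation layered on top of ordinary XML Schema.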
The data content from our example may be represented in an infoset format
such as:
Infoset:
address
    houseNumber    118
    street         Ridgewood Circle
    city           Rochester
    state          NY
Notice that leading zeros and trailing spaces have been trimmed from the values
in the infoset. It is possible to do this using other DFDL properties (not shown),
as we will see in later lessons.
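As a hedged preview of what such properties might look like, the sketch below adds trimming properties to the fixed length address model. It assumes that houseNumber is right-justified and padded with the character 0, and that the other fields are left-justified and padded with spaces. These padding assumptions are illustrative and are not stated in this lesson. Other required properties (for example, those normally supplied by a dfdl:format annotation) are omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the pad characters and justification are assumptions. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:element name="address" dfdl:lengthKind="implicit">
    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered">
        <!-- Right-justified: leading "0" pad characters are trimmed on parse -->
        <xs:element name="houseNumber" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="6"
                    dfdl:textTrimKind="padChar"
                    dfdl:textStringJustification="right"
                    dfdl:textStringPadCharacter="0"/>
        <!-- Left-justified: trailing spaces (%SP;) are trimmed on parse -->
        <xs:element name="street" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="20"
                    dfdl:textTrimKind="padChar"
                    dfdl:textStringJustification="left"
                    dfdl:textStringPadCharacter="%SP;"/>
        <xs:element name="city" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="20"
                    dfdl:textTrimKind="padChar"
                    dfdl:textStringJustification="left"
                    dfdl:textStringPadCharacter="%SP;"/>
        <xs:element name="state" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="2"
                    dfdl:textTrimKind="padChar"
                    dfdl:textStringJustification="left"
                    dfdl:textStringPadCharacter="%SP;"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under these assumptions, a six-character houseNumber field containing 000118 would appear in the infoset as 118. Later lessons cover padding and trimming in detail.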
Infoset - Also known as an Information Set, this is an abstract data
model that describes the information content of a document.
Infoset:
address
    houseNumber    118
    street         Ridgewood Circle
    city           Rochester
    state          NY
Notice that the infosets in Example 1 and Example 2 look the same. This is
because in both example instance documents the data is logically the same; it
is just presented differently physically (with fixed sizes in Example 1 versus
delimiters in Example 2). Once we strip away the delimiters and the padding, we
are left with the same data. This distinction between physical data and logical
data will be very important (and useful) as we progress through our understanding
of DFDL.
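To illustrate the physical versus logical distinction, here is a minimal sketch of how a delimited model of the same address might be written. It is not the tutorial's Example 2: the comma separator is an assumption chosen for illustration, and other required properties are omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the comma separator is an assumption, and this is not a
     complete DFDL schema. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:element name="address" dfdl:lengthKind="implicit">
    <xs:complexType>
      <!-- The separator appears between children, not before or after them -->
      <xs:sequence dfdl:sequenceKind="ordered"
                   dfdl:separator=","
                   dfdl:separatorPosition="infix">
        <!-- Each child ends where the next delimiter (or end of data) is found -->
        <xs:element name="houseNumber" type="xs:string"
                    dfdl:lengthKind="delimited"/>
        <xs:element name="street" type="xs:string"
                    dfdl:lengthKind="delimited"/>
        <xs:element name="city" type="xs:string"
                    dfdl:lengthKind="delimited"/>
        <xs:element name="state" type="xs:string"
                    dfdl:lengthKind="delimited"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With this model, hypothetical data such as 118,Ridgewood Circle,Rochester,NY would yield the same logical element names and values as the fixed length model. Only the physical description (lengths versus separators) differs.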
Now that we've seen a very small part of what DFDL can do, we need to start
exploring how to exploit it fully. Our next lesson will introduce the fundamentals
of the DFDL language and the concept of DFDL annotations.