Lesson1 Introduction v2

This document provides an introduction to the Data Format Description Language (DFDL) through a series of lessons. DFDL is a language for describing the format of non-XML data in a way that is independent of the format. The introduction explains what DFDL is, when it should be used, and provides some first examples of using DFDL to describe fixed length and delimited address data. The 17 lessons that follow cover various DFDL modeling concepts like basic and complex data types, padding, default values, expressions, and more.

Uploaded by

Emanuel Jimenez

DFDL Introduction For Beginners

Lesson 1: Introduction
This document, DFDL Schema Tutorial, provides an easily approachable
description of the Data Format Description Language 1.0 (DFDL for short), and
should be used alongside the formal descriptions of the language contained in
the DFDL 1.0 Specification. The intended audience of this document includes
application developers whose programs read and write DFDL-described data,
and DFDL schema authors who need to know about the features of the
language. The text assumes that you have a basic understanding of XML 1.0,
Namespaces in XML and XML Schema. Each section of the tutorial introduces
new features of the language, and describes those features in the context of
concrete examples.
Lesson 1. Explains what DFDL is, why DFDL is useful, and when to use DFDL.
The first DFDL examples are introduced.
Lesson 2. Introduces the fundamentals of the DFDL language. The subset of
XML Schema components used by DFDL is discussed as well as the special
DFDL annotations that are used with each component.
Lesson 3. Describes the syntax and rules for the DFDL properties that are
carried on DFDL annotations and used to model data.
Lesson 4. Shows how to model basic fixed length data, variable length data, and
data with initiators (tags) and terminators, using DFDL properties.
Lesson 5. Shows how to model data that contains alternatives, known in DFDL
as a choice.
Lesson 6. Shows how to model data that may optionally occur in the data
stream, or that repeats multiple times in the data stream.
Lesson 7. Covers the modeling of all kinds of text data, including strings,
numbers, calendars and Booleans.
Lesson 8. Covers the modeling of all kinds of binary data, including strings,
numbers, calendars and Booleans.
Lesson 9. Shows how padding and trimming are handled in DFDL, especially
useful when dealing with fixed length data.

Lesson 10. Want to default in values for missing data? Need to model
out-of-band data values? What does empty string represent? This lesson walks
through examples using nil and default settings.
Lesson 11. Explains how to handle delimiters that occur in data values by using
either escape characters or escape blocks. Steps through setting up an escape
scheme.
Lesson 12. DFDL expressions. Several examples will show this powerful feature
of the language, from a simple path expression to a complex conditional
expression.
Lesson 13. Creating dynamic models. Building on top of expressions, learn how
to use DFDL variables to add in complex control of your DFDL properties and
other settings.
Lesson 14. Making assertions about your data. Discover the uses of asserts
and discriminators, and the difference between them. Concepts such as rule
placement, and timing of triggering will be discussed via example.
Lesson 15. Bits versus bytes. Don't worry if your data is expressed in terms of
bits rather than bytes; DFDL can handle that as well.
Lesson 16. Have components in your data where the order of the data is not
fixed? This lesson takes you through an example of a floating component within
a group. Then we will make the entire group unordered.
Lesson 17. Elements with calculated values and hidden elements.
The tutorial is a non-normative document, meaning that it does not provide a
definitive specification of the DFDL language. The examples and other
explanatory material in this document are provided to help you understand DFDL
but they may not always provide definitive answers. In such cases, you will need
to refer to the DFDL 1.0 Specification, and to help you do this, many links
pointing to the relevant parts of the specification are provided.
Conventions used:
New terms will be introduced in italic font.
Examples will be presented in boxed courier font.

What is Data Format Description Language?

Data Format Description Language, or DFDL, is pronounced like the flower
daffodil. It is a language designed to describe the format of data.
Specifically, it is designed to describe the format of data in a way that is
independent of the format itself. The idea is that you choose an appropriate data
representation for an application based on its needs and then describe the format
using DFDL so that multiple programs can directly interchange the described
data. That is, DFDL is not a format for data; it is a way of describing any data
format.
DFDL is intended for data commonly found in scientific and numeric
computations, as well as record-oriented representations found in commercial
data processing. DFDL can be used to describe legacy data files, to simplify
transfer of data across domains without requiring global standard formats, or to
allow third-party tools to easily access multiple formats.
DFDL is designed to provide flexibility and also permit implementations that
achieve very high levels of performance. DFDL descriptions are separable and
native applications do not need to use DFDL libraries to parse their data formats.
DFDL parsers can also be highly efficient. The DFDL language is designed to
permit implementations that use lazy evaluation of formats and to support
seekable, random access to data. The following goals can be achieved by DFDL
Note that DFDL is specifically not intended to be used to describe XML, which
already has well-defined ways to describe it. However, the DFDL language is
built upon many of the principles of XML and is designed to make XML tooling
available for use with non-XML data.
Schema - A model, a framework, a plan. We will be using the generic
term schema to mean a model of some data.
XML Schema or XSD - A model used specifically to describe the structure
and content of XML data/instance documents.
Instance Document - A term that describes the entire stream of data that
we are processing in this run or instance.
DFDL Schema - A model used specifically to describe the structure and
content of non-XML data/instance documents.
A DFDL schema uses a subset of XML Schema constructs, so a DFDL schema is
actually a well-formed and valid XML document. However, a DFDL schema does not
describe data that is XML; it describes data that is not XML.

How can a DFDL schema (which itself is an XML schema) describe non-XML
data? The answer is annotations.
Annotations - A way to provide additional information in an XML schema.
They are ignored by XML parsers. They can be used to add
documentation or to add additional information that will be used by
application programs.
Well-formed XML - An XML instance document which conforms to all the
well-formedness constraints in the W3C XML Recommendation. In other
words, it follows the basic rules of XML.
Valid XML - Not only does this XML instance document follow the rules of
well-formedness, it also conforms to the rules imposed by its XML
schema.
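To make the idea of annotations a little more concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It assumes a hypothetical one-element schema fragment written only for this illustration, in the compact attribute style used by the examples later in this lesson. The fragment parses as ordinary well-formed XML, and the DFDL information sits alongside the plain XML Schema information in its own namespace.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"      # XML Schema namespace
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"   # DFDL 1.0 namespace

# Hypothetical fragment, written for this sketch only.
schema_text = f"""<xs:schema xmlns:xs="{XS}" xmlns:dfdl="{DFDL}">
  <xs:element name="state" type="xs:string"
              dfdl:lengthKind="explicit" dfdl:length="2"/>
</xs:schema>"""

# fromstring raises ParseError if the text is not well-formed XML.
root = ET.fromstring(schema_text)
element = root.find(f"{{{XS}}}element")

# Attributes without a namespace are plain XML Schema information.
plain = {k: v for k, v in element.attrib.items() if not k.startswith("{")}

# Attributes in the DFDL namespace carry the extra format information.
prefix = f"{{{DFDL}}}"
dfdl_props = {k[len(prefix):]: v
              for k, v in element.attrib.items() if k.startswith(prefix)}

print(plain)
print(dfdl_props)
```

A tool that knows nothing about DFDL can simply skip the second dictionary, while a DFDL-aware tool uses it to work out how the data is laid out.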

When to use DFDL

DFDL should be used to describe general text or binary data. XML data is
different from non-XML data in a variety of ways. One obvious way is that with
XML you always know that your data will start with an XML start tag and end with
an XML end tag. So an XML schema doesn't need to tell you how to find the
beginning and end of each piece of data.
Example XML showing start and end tags:
<song>One More Than Two Visually Challenged Rodents</song>
Description:
<song> is the start tag
</song> is the end tag
The data that is song is:
One More Than Two Visually Challenged Rodents
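As a small illustration of why XML needs no extra framing information, the following Python sketch reads the song example with the standard xml.etree.ElementTree module. The variable names are ours; the point is only that the tags alone tell the parser where the value starts and ends.

```python
import xml.etree.ElementTree as ET

# The XML instance from the example above.
document = "<song>One More Than Two Visually Challenged Rodents</song>"

# The parser finds the value purely from the start and end tags.
element = ET.fromstring(document)
name = element.tag     # taken from the start tag
value = element.text   # everything between <song> and </song>

print(name, "=", value)
```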
A DFDL schema describes non-XML data, which might not have start tags and
end tags, so the schema needs other ways to define where data starts and
ends. Data can be structured in many ways: for example, sometimes data
is made up of fixed length components, and sometimes data has delimiters to tell
us where pieces of the data begin and end.
Fixed Length - The length of a portion of data is known at the time of data
modeling (design time) and is always the same. For example, in the US a
State Code is always 2 characters in length.

Delimiters (also called Markup or Syntax) - Clues in the data that tell us
where pieces of the data begin and end. For example in written English a
period or full stop tells us that we are at the end of a sentence.

First set of DFDL examples

The best way to understand how DFDL works is to look at a couple of small
examples. We will model the same logical data in different ways.
1. A sequence of fixed-length data elements
2. A sequence of delimited data elements
Description of data for all address examples:
element 1: houseNumber
element 2: street
element 3: city
element 4: state
Example Address 1 - address with fixed length data components:
000118Ridgewood Circle    Rochester           NY
In the above example we have an instance document with a fixed length of 48
characters, composed of:
6 digit houseNumber
20 character street
20 character city
2 character state
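The layout above can be sketched in a few lines of Python. This is only an illustration of what "fixed length" means, not how a DFDL processor is implemented: the FIELDS table and the split_fixed helper are names invented for this sketch, and we assume the shorter values are padded on the right with spaces.

```python
# Field widths taken from the description above: 6 + 20 + 20 + 2 = 48.
FIELDS = [("houseNumber", 6), ("street", 20), ("city", 20), ("state", 2)]


def split_fixed(record):
    """Cut a fixed-length record into raw (untrimmed) field values."""
    expected = sum(width for _, width in FIELDS)
    if len(record) != expected:
        raise ValueError(f"expected {expected} characters, got {len(record)}")
    values, offset = {}, 0
    for name, width in FIELDS:
        values[name] = record[offset:offset + width]
        offset += width
    return values


# Sample record rebuilt with space padding (the pad character is an assumption).
record = "000118" + "Ridgewood Circle".ljust(20) + "Rochester".ljust(20) + "NY"
raw = split_fixed(record)
print(raw)
```

Note that the raw values still carry their leading zeros and trailing spaces; nothing in the data itself says where one field stops and the next begins, so the widths have to come from the model.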
Example Address 2 - address with delimited data components:
118*Ridgewood Circle*Rochester*NY
In the above example we have an instance document which has variable sized
components, infix delimited by an asterisk (*):
piece 1: houseNumber
piece 2: street
piece 3: city
piece 4: state
Variable length - The length of a portion of data is not known at the time
the data is being modeled (design time) but is determined when data is
being processed (run-time).

Infix Delimited - Data in which a delimiter or separator appears in
between each component. This delimiter is not found before the first
component, nor is it found after the last component.
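For comparison with the fixed length case, here is a minimal Python sketch of reading infix delimited data. The NAMES list and the split_infix helper are invented for this illustration, and the sketch assumes the separator never appears inside a value.

```python
NAMES = ["houseNumber", "street", "city", "state"]


def split_infix(data, separator="*"):
    """Split infix delimited data into named pieces."""
    pieces = data.split(separator)
    if len(pieces) != len(NAMES):
        raise ValueError(f"expected {len(NAMES)} pieces, got {len(pieces)}")
    return dict(zip(NAMES, pieces))


data = "118*Ridgewood Circle*Rochester*NY"
address = split_infix(data)

# Infix means one fewer separator than there are components.
separator_count = data.count("*")
print(address, separator_count)
```

Here the lengths of the pieces are discovered at run-time from the position of each asterisk, which is what makes the components variable length.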
Looking at the address examples we can see exactly the same data content
expressed in different ways. With DFDL we are operating under the principle
that we already know what our data looks like and that we want to model it in the
easiest way possible; we want to describe the format that we already have. If
our data is non-XML and we want to use an open-standard modeling language,
then we will want to use DFDL.
Now that we've seen different ways to express our address data we can compare
different ways to model it.
Modeling - The process of specifying the structure and content of your
data file. In the above example we say that the XML schema is the model
for the XML instance document. The process of creating a model or a
schema is sometimes called modeling.

Modeling fixed length data

To show one way to model fixed length data we will start by modeling Example
Address 1 - address with fixed length data components.
We need to describe where each piece of data begins and ends. We will do this
by using DFDL.
Here again is our fixed length data example.
Example Address 1 - address with fixed length data components:
000118Ridgewood Circle    Rochester           NY
Note: When looking at the data we don't see anything in the data that names the
root of the data. In our case we will call the entire piece of data an address, so
our root is address.
We also do not see anything in the data which shows us where each piece of
data begins and ends; we need to know that the first 6 characters of data are the
houseNumber in order to understand this data.
There are a few ways to lay out this structure in DFDL. In the following example
we are using what are known as short form annotations, which is the most
compact layout. Portions of the DFDL schema are not shown in the example
below in order to make the concepts clearer and simpler to start
(there will be much more detail in future lessons). In this version we are using
annotations to add in the lengths of our fixed length elements:
 1 <xs:element name="address" dfdl:lengthKind="implicit">
 2   <xs:complexType>
 3     <xs:sequence dfdl:sequenceKind="ordered">
 4       <xs:element name="houseNumber" type="xs:string"
                     dfdl:lengthKind="explicit" dfdl:length="6"/>
 5       <xs:element name="street" type="xs:string"
                     dfdl:lengthKind="explicit" dfdl:length="20"/>
 6       <xs:element name="city" type="xs:string"
                     dfdl:lengthKind="explicit" dfdl:length="20"/>
 7       <xs:element name="state" type="xs:string"
                     dfdl:lengthKind="explicit" dfdl:length="2"/>
 8     </xs:sequence>
 9   </xs:complexType>
10 </xs:element>

There are a few new concepts shown in the above example that we will
highlight.
If you noticed that the DFDL models in this example look a great deal like XML
Schema then you would be correct. Even though our data is not XML, DFDL
uses XML Schema to model it.
o Line 1 establishes an element named address with one DFDL property on it
setting the length to be implicit. This means that the length of element
address is determined by the length of its children.
o Line 3 establishes an unnamed sequence within element address. It also
holds a DFDL property saying that the sequence is ordered. So
houseNumber is always first, followed by street, then city, and ending with
state.
o Line 4 lists the first element of the sequence, houseNumber. On this
element there are two DFDL properties specifying that it has an explicit
length of 6.
o Line 5 defines the second element in the sequence as street. The DFDL
properties on this element describe it as having an explicit length of 20.
o Line 6 defines the third element in the sequence as city. The DFDL
properties on this element describe it as having an explicit length of 20.
o Line 7 defines the fourth element in the sequence as state. The
DFDL properties on this element describe it as having an explicit length of 2.
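To show how a program might act on such a model, here is a toy Python sketch that reads the lengths out of the schema and uses them to cut up the data. It is a sketch under stated assumptions, not a DFDL processor: the parse_fixed helper is invented for this illustration, the fragment is wrapped in an xs:schema element with namespace declarations so that it parses on its own, and only explicit lengths on the children of a single sequence are handled.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"

# The schema from the example, wrapped in xs:schema so it is a complete document.
SCHEMA = f"""<xs:schema xmlns:xs="{XS}" xmlns:dfdl="{DFDL}">
  <xs:element name="address" dfdl:lengthKind="implicit">
    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered">
        <xs:element name="houseNumber" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="6"/>
        <xs:element name="street" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="20"/>
        <xs:element name="city" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="20"/>
        <xs:element name="state" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def parse_fixed(schema_text, data):
    """Toy reader: slice data using the explicit lengths found in the schema."""
    root = ET.fromstring(schema_text)
    children = root.findall(f".//{{{XS}}}sequence/{{{XS}}}element")
    values, offset = {}, 0
    for child in children:  # document order, matching the ordered sequence
        if child.get(f"{{{DFDL}}}lengthKind") != "explicit":
            raise ValueError("this sketch only handles explicit lengths")
        length = int(child.get(f"{{{DFDL}}}length"))
        values[child.get("name")] = data[offset:offset + length]
        offset += length
    return values, offset


data = "000118" + "Ridgewood Circle".ljust(20) + "Rochester".ljust(20) + "NY"
values, consumed = parse_fixed(SCHEMA, data)
print(values, consumed)
```

The number of characters consumed is simply the sum of the children's lengths, which is what the implicit length on address amounts to in this example.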
The data content from our example may be represented in an infoset format
such as:
Infoset:
address
    houseNumber    118
    street         Ridgewood Circle
    city           Rochester
    state          NY

Notice that leading zeros and trailing spaces have been trimmed from the values
in the infoset. It is possible to do this using other DFDL properties (not shown),
as we will see in later lessons.
Infoset - Also known as Information Set, an infoset is an abstract data
model that describes the information content of a document.
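The effect of that trimming can be mimicked in Python as follows. This only imitates the result; in DFDL the trimming is driven by properties covered in later lessons. The helper names, and the choice of zero and space as pad characters, are assumptions based on the sample data.

```python
def trim_number(raw, pad="0"):
    """Drop leading pad characters, keeping one if nothing else is left."""
    return raw.lstrip(pad) or pad


def trim_text(raw, pad=" "):
    """Drop trailing pad characters."""
    return raw.rstrip(pad)


# Raw field values as they appear in the fixed length data.
raw = {
    "houseNumber": "000118",
    "street": "Ridgewood Circle".ljust(20),
    "city": "Rochester".ljust(20),
    "state": "NY",
}

infoset = {
    "houseNumber": trim_number(raw["houseNumber"]),
    "street": trim_text(raw["street"]),
    "city": trim_text(raw["city"]),
    "state": trim_text(raw["state"]),
}
print(infoset)
```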

Modeling variable length delimited data

We started by modeling Example Address 1 - address with fixed length data
components. Now we want to model the same data but our components are not
fixed length. In this case we can find the beginning and end of each part of the
data by looking at markup in the data, known as delimiters in DFDL. Here is our
example data again.
Example Address 2 - address with delimited data components:
118*Ridgewood Circle*Rochester*NY
There is an asterisk (*) between each piece of data in this instance document.
As there is no asterisk before the first piece of data (houseNumber) or after the
last piece of data (state) we say that the data is infix delimited by *.
When we model this structure in DFDL we will not be specifying element lengths
in our model but instead will be specifying the delimiter and its location. Such a
DFDL schema, again in short form, may look like:
 1 <xs:element name="address" dfdl:lengthKind="implicit">
 2   <xs:complexType>
 3     <xs:sequence dfdl:sequenceKind="ordered"
                   dfdl:separator="*"
                   dfdl:separatorPosition="infix">
 4       <xs:element name="houseNumber" type="xs:string"
                     dfdl:lengthKind="delimited"/>
 5       <xs:element name="street" type="xs:string"
                     dfdl:lengthKind="delimited"/>
 6       <xs:element name="city" type="xs:string"
                     dfdl:lengthKind="delimited"/>
 7       <xs:element name="state" type="xs:string"
                     dfdl:lengthKind="delimited"/>
 8     </xs:sequence>
 9   </xs:complexType>
10 </xs:element>

o Line 1 establishes an element named address with one DFDL property on it
setting the length to be implicit, meaning that the length of address is
determined by the length of its children.
o Line 3 establishes an unnamed sequence within element address. It also
holds a property saying that the sequence is ordered. So houseNumber is
always first, followed by street, then city, and ending with state. This line also
holds two other DFDL properties necessary to define the delimiter used in
the data. A delimiter between the components of a sequence is called an
infix separator in DFDL. Accordingly, the separator property is set to the
asterisk (*) and its position is set to infix.
o Line 4 lists the first element of the sequence, houseNumber. On this
element there is one DFDL property saying its length is delimited, meaning
that the length of this element is determined by looking at delimiters in the
data.
o Line 5 defines the second element in the sequence as element street. The
DFDL property on this element describes it as delimited.
o Line 6 defines the third element in the sequence as element city. The DFDL
property on this element describes it as delimited.
o Line 7 defines the fourth element in the sequence as element state. The
DFDL property on this element describes it as delimited.
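As with the fixed length model, a toy Python sketch can show how a program might use these properties. The parse_delimited helper is invented for this illustration, the schema is wrapped in an xs:schema element with namespace declarations so that it parses on its own, and the sketch handles only an infix separator that never occurs inside a value.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"

# The schema from the example, wrapped in xs:schema so it is a complete document.
SCHEMA = f"""<xs:schema xmlns:xs="{XS}" xmlns:dfdl="{DFDL}">
  <xs:element name="address" dfdl:lengthKind="implicit">
    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered"
                   dfdl:separator="*"
                   dfdl:separatorPosition="infix">
        <xs:element name="houseNumber" type="xs:string" dfdl:lengthKind="delimited"/>
        <xs:element name="street" type="xs:string" dfdl:lengthKind="delimited"/>
        <xs:element name="city" type="xs:string" dfdl:lengthKind="delimited"/>
        <xs:element name="state" type="xs:string" dfdl:lengthKind="delimited"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def parse_delimited(schema_text, data):
    """Toy reader: split data on the separator declared on the sequence."""
    root = ET.fromstring(schema_text)
    sequence = root.find(f".//{{{XS}}}sequence")
    separator = sequence.get(f"{{{DFDL}}}separator")
    if sequence.get(f"{{{DFDL}}}separatorPosition") != "infix":
        raise ValueError("this sketch only handles infix separators")
    names = [e.get("name") for e in sequence.findall(f"{{{XS}}}element")]
    pieces = data.split(separator)
    if len(pieces) != len(names):
        raise ValueError(f"expected {len(names)} pieces, got {len(pieces)}")
    return dict(zip(names, pieces))


values = parse_delimited(SCHEMA, "118*Ridgewood Circle*Rochester*NY")
print(values)
```

Notice that the program contains no asterisk and no element names of its own; both come from the schema, so changing the separator in the model changes how the data is read.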
The data content from our example may be represented in an infoset format
such as:
Infoset:
address
    houseNumber    118
    street         Ridgewood Circle
    city           Rochester
    state          NY

Notice that the infosets in Example 1 and Example 2 look the same. This is
because in both example instance documents the data is logically the same; it
is just presented differently physically (with fixed sizes in Example 1 versus with
delimiters in Example 2). Once we strip away the delimiters and the padding we
are left with the same data. This distinction between physical data and logical
data will be very important (and useful) as we progress through our understanding
of DFDL.
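The physical versus logical distinction can be checked with a short Python sketch. Both helper functions are invented for this illustration and hard-code the two layouts from this lesson (with zero and space padding assumed for the fixed length form); the comparison at the end is the point.

```python
WIDTHS = [("houseNumber", 6), ("street", 20), ("city", 20), ("state", 2)]


def from_fixed(record):
    """Physical form 1: fixed widths, padding stripped after slicing."""
    result, offset = {}, 0
    for name, width in WIDTHS:
        raw = record[offset:offset + width]
        offset += width
        if name == "houseNumber":
            result[name] = raw.lstrip("0") or "0"  # assumed zero padding
        else:
            result[name] = raw.rstrip(" ")         # assumed space padding
    return result


def from_delimited(record, separator="*"):
    """Physical form 2: pieces separated by an infix delimiter."""
    names = [name for name, _ in WIDTHS]
    return dict(zip(names, record.split(separator)))


fixed = "000118" + "Ridgewood Circle".ljust(20) + "Rochester".ljust(20) + "NY"
delimited = "118*Ridgewood Circle*Rochester*NY"

# Two different physical layouts, one logical result.
same = from_fixed(fixed) == from_delimited(delimited)
print(same)
```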
Now that we've seen a very small set of what DFDL can do, we need to start
exploring how to exploit it fully. Our next lesson will introduce the fundamentals of
the DFDL language and the concept of DFDL annotations.
