EXPRESS Data Modeling
Abstract
International Organization for Standardization Technical Committee 184, Sub-committee 4,
Working Group 3 (ISO TC184/SC4/WG3) is responsible for the development of ISO
10303 (informally known as the STandard for the Exchange of Product model data, or
STEP). The working group has a formalized methodology for capturing domain knowledge
in the form of an Application Protocol that uses the EXPRESS language to define data
requirements for the exchange of product model data between dissimilar automation
systems. In effect, an Application Protocol is a Software Requirements Specification that
can be realized by compiling EXPRESS language specifications into object repositories
that can be accessed by additional software.
This paper provides an overview of the key EXPRESS constructs and shows how these
key constructs are used within an ISO 10303 Application Protocol. This term paper was
prepared for Computer Software Engineering course 648 (CSWE 648), “Software Design
and Implementation,” during the Spring Semester of 2001 at the University of Maryland
University College (UMUC).
Introduction
As software systems continue to increase in complexity and scope, the need for more
complex representations of real world objects also increases. Complex systems rely on
complex data models to support them. Within the engineering community, computers
have become central to the design of complex systems and products such as aircraft, ships,
automobiles, and process plants. In 1979 it became apparent that there was a need to
exchange computer data describing complex systems between Computer Aided Design
(CAD) tools running on disparate computer systems with different levels of
sophistication and capability. The Initial Graphics Exchange Specification (IGES) was
created to address this need and sanctioned by the American National Standards Institute
(ANSI). (Kemmerer, 11)
In 1984, a second effort called the Product Data Exchange Specification (PDES) was
launched to leverage the early lessons from IGES and to produce a more robust product
data modeling framework. This effort soon became an international standards effort
which in 1994 yielded an initial collection of documents known formally as “ISO 10303
Industrial automation systems and integration – Product data representation and exchange”
and informally as the STandard for the Exchange of Product model data (STEP). STEP
follows a three-tier data modeling approach that captures knowledge in a model using
Contents
This section of the paper provides a “road map” to the rest of the paper. This paper
consists of the following sections:
• Abstract—Provides a brief overview of the paper’s subject.
• Introduction—Provides some background and context useful for interpreting the
paper.
• Contents—This section.
• EXPRESS—Defines the EXPRESS and EXPRESS-G languages in enough detail
to understand the rest of the paper.
• STEP—Provides an overview of the layout and document numbers of the STEP
standard. Also describes STEP’s three-tier modeling approach.
• Application Protocols—Shows the general contents of an Application Protocol
(AP) and explains the process used by ISO to define them. Also highlights the
parallels between AP development and canonical software development lifecycle
models (SDLC).
• AP 227 Pipe Definition—Traces the definition/representation of a pipe in ISO
10303-227:2001(E) “Industrial automation systems and integration—Product data
representation and exchange—Part 227: Application protocol: Plant spatial
configuration” through all three tiers of STEP. This example shows the use of
EXPRESS in its proper context within ISO 10303.
• Conclusions—Provides a summary of the paper and highlights some of the key
ideas that can be concluded from the material presented here.
• References—Points to the reference material used in preparing this paper.
EXPRESS
Overview
ISO 10303-11:1994(E) “Industrial automation systems and integration—Product data
representation and exchange—Part 11: Description methods: The EXPRESS language
reference manual” “defines a language by which aspects of product data can be specified.”
The lexical form of EXPRESS is defined using a derivative of Wirth Syntax Notation
(WSN) (ISO 10303-11, 7). The outermost syntactical element in an EXPRESS schema is
declared using the SCHEMA keyword. The WSN for SCHEMA is defined on page 56 of the
specification and is repeated here for reference:
The complete annotated listing of WSN for the EXPRESS language is contained in
annex A of ISO 10303-11. For the purposes of this paper, it is sufficient to understand
that the first line of the listing above requires that an EXPRESS schema start with a
syntactical statement similar to:
SCHEMA plant_spatial_configuration;
END_SCHEMA; -- plant_spatial_configuration
In addition to the SCHEMA keyword, we will also consider the following keywords in the
next few sections of this paper: USE FROM, TYPE, ENTITY, WHERE, RULE, and FUNCTION.
Except for USE FROM and WHERE, all of these keywords define a block structure within
an EXPRESS file. The block starts with the keyword and ends with the keyword repeated
with the prefix “END_”. For example, TYPE and END_TYPE delimit a block of an
EXPRESS file that defines a data type.
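The keyword/END_ pairing described above can be checked mechanically. The following Python sketch is only an illustration: the function name blocks_balanced is invented here, and the scan is deliberately naive (it strips tail remarks but does not understand string literals or embedded remarks), so it should not be mistaken for an EXPRESS parser.

```python
import re

# Keywords that open a block closed by the same keyword prefixed with END_.
BLOCK_KEYWORDS = {"SCHEMA", "TYPE", "ENTITY", "RULE", "FUNCTION"}


def blocks_balanced(source):
    """Check that each block keyword is closed by its matching END_ keyword."""
    stack = []
    # Drop tail remarks (-- to end of line) so that comments are not scanned.
    source = re.sub(r"--.*", "", source)
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        word = word.upper()
        if word in BLOCK_KEYWORDS:
            stack.append(word)
        elif word.startswith("END_") and word[4:] in BLOCK_KEYWORDS:
            # The closing keyword must match the most recently opened block.
            if not stack or stack.pop() != word[4:]:
                return False
    # Any block still open at the end of the file is unbalanced.
    return not stack
```

Because USE FROM and WHERE have no END_ counterpart, they are simply ignored by the scan.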
The general skeleton of an EXPRESS Schema is listed below. The following sections of
the paper present each syntactical structure in more detail.
SCHEMA schema_name;
USE FROM other_schema_name (entity_name, entity_name2, . . .);
SCHEMA
The SCHEMA keyword surrounds all other syntax in an EXPRESS file. Each schema has a
name that is used to uniquely identify the schema being defined. It is common to refer to
the EXPRESS file by the name of the schema regardless of the actual filename in a
computer’s operating system. Within STEP there are two general kinds of schemas:
short-form and long-form. The distinction is based on the USE FROM keyword and is
explained later.
EXAMPLE OF SCHEMA
SCHEMA plant_spatial_configuration;
. . .
END_SCHEMA;
USE FROM
One schema can copy all or part of another schema’s contents in a manner similar to the
“#include” preprocessor directive in the C language. USE FROM does not have a matching
“END_USE” keyword. It is simply terminated at the next semicolon.
In STEP, schemas that contain USE FROM statements are called short-form schemas.
Typically, an Application Protocol contains a short-form schema that uses syntax from
the Generic Integrated Resource parts of the STEP standard (Generic Integrated
Resources are explained later in this paper). Each Application Protocol also has an
official long-form schema in which all of the USE FROMs have been resolved. In other
words, a long-form schema has copied the necessary syntax directly into itself and is a
complete and self-contained definition.
EXAMPLE OF USE FROM
USE FROM MYSCHEMA (ENTITY_3 AS myEntity_3);
This listing says that the current schema has an ENTITY named “myEntity_3” and that the
definition for this entity is to be copied from (used from) the definition of the ENTITY
named “ENTITY_3” in the SCHEMA named “MYSCHEMA”.
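The step from a short-form schema to a long-form schema can be sketched in Python. This is an illustration only, under the assumption that each schema is held in memory as a dictionary from entity names to definition text; the function resolve_use_from and the sample data are hypothetical and do not come from any STEP toolkit.

```python
# Hypothetical in-memory schemas: schema name -> {entity name -> definition text}.
SCHEMAS = {
    "MYSCHEMA": {
        "ENTITY_3": "ENTITY ENTITY_3; END_ENTITY;",
        "ENTITY_4": "ENTITY ENTITY_4; END_ENTITY;",
    },
}


def resolve_use_from(local_entities, use_from, schemas=SCHEMAS):
    """Return a long-form entity table with every USE FROM copied in.

    use_from is a list of (schema_name, entity_name, alias) tuples; alias
    may be None when the entity keeps its original name.
    """
    long_form = dict(local_entities)
    for schema_name, entity_name, alias in use_from:
        definition = schemas[schema_name][entity_name]
        # The copied definition is stored under the local (possibly renamed) name.
        long_form[alias or entity_name] = definition
    return long_form


short_form_uses = [("MYSCHEMA", "ENTITY_3", "myEntity_3")]
long_form = resolve_use_from({}, short_form_uses)
```

Only the entities named in the USE FROM statement are copied, so ENTITY_4 does not appear in the resulting long form.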
TYPE
One of the more powerful aspects of the EXPRESS language is the flexibility of
user-defined types. In addition to the built-in types (String, Number, Integer, Real,
Boolean, Logical, and Binary), a data modeler can define data types that add semantic
meaning to attributes within a schema.
For example, a data modeler can define a type “name” to hold a string value. This allows
the data model to impose constraints on the strings that are names. Simple constraints can
be imposed using the keyword WHERE as explained later in the paper.
In addition to types that resolve to individual literals, EXPRESS also allows data
modelers to define enumerations and a special data type known as a “SELECT TYPE”. A
SELECT TYPE defines a compound data type that may assume any legal value of any of
its constituents.
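EXAMPLE OF TYPE
TYPE DEFINED_1 = STRING; END_TYPE;
TYPE DEFINED_2 = INTEGER; END_TYPE;
TYPE ENUMERATION_1 = ENUMERATION OF (ONE, TWO, THREE); END_TYPE;
TYPE SELECT_1 = SELECT (DEFINED_1, DEFINED_2, ENUMERATION_1); END_TYPE;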
This listing shows four user-defined data types named “DEFINED_1”, “DEFINED_2”,
“SELECT_1”, and “ENUMERATION_1” respectively. DEFINED_1 and DEFINED_2
are simple types in that they simply add semantics to a string and an integer. In this case
the added semantics are rather abstract. ENUMERATION_1 is an enumeration data type.
As shown in the listing, the legal values for ENUMERATION_1 are: “ONE”, “TWO”, or
“THREE”. SELECT_1 is a select type and defines a data type whose legal values include
the legal values of any of the other three user-defined data types described so far.
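For readers more familiar with programming languages than with data modeling languages, the four types described above have rough Python analogues. The sketch below is an approximation, not a standard mapping: the Python names are invented, and Python's NewType is erased at runtime, so the added semantics of DEFINED_1 and DEFINED_2 are visible only to a static type checker.

```python
from enum import Enum
from typing import NewType, Union

# Simple defined types: a new name layered over a built-in type.
Defined1 = NewType("Defined1", str)   # TYPE DEFINED_1 = STRING
Defined2 = NewType("Defined2", int)   # TYPE DEFINED_2 = INTEGER


class Enumeration1(Enum):
    """Enumeration type whose only legal values are ONE, TWO, and THREE."""
    ONE = "ONE"
    TWO = "TWO"
    THREE = "THREE"


# Select type: a value of any constituent type is legal.
Select1 = Union[Defined1, Defined2, Enumeration1]


def is_select_1(value):
    """Runtime check that value is legal for SELECT_1 (hypothetical helper)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, (str, int, Enumeration1))
```

The helper is_select_1 accepts any string, any integer, or any member of Enumeration1, which mirrors the statement that a select type may assume any legal value of any of its constituents.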
In EXPRESS-G, the graphical form of EXPRESS, lines are used to show relationships
between the boxes. The circle on the end of the line can be thought of as an arrowhead.
For example, DEFINED_1 is a STRING because there is a line pointing from DEFINED_1
to STRING.
ENTITY
ENTITYs are the heart of an EXPRESS schema. They collect attributes and constraints
together in a manner similar to an ENTITY in IDEF1X or a CLASS in C++. EXPRESS
supports single and multiple inheritance such that a child entity inherits all of its parents’
attributes and constraints.
EXAMPLE OF ENTITY
ENTITY ENTITY_1;
ATTRIBUTE_1 : INTEGER;
ATTRIBUTE_2 : STRING;
ATTRIBUTE_3 : ENTITY_2;
END_ENTITY;
ENTITY ENTITY_2;
END_ENTITY;
ENTITY ENTITY_3
SUBTYPE OF(ENTITY_1);
END_ENTITY;
This listing shows the definition of three ENTITYs named “ENTITY_1”, “ENTITY_2”,
and “ENTITY_3” respectively. ENTITY_1 has three attributes, one of which is another
ENTITY. ENTITY_3 is a subtype of ENTITY_1. This means that an instance of
ENTITY_3 will also have three attributes defined because ENTITY_3 inherits all of
ENTITY_1’s attributes.
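The comparison with classes can be made concrete with a small Python sketch. It assumes a straightforward mapping of each ENTITY to a dataclass; the class and field names are chosen for this illustration and are not produced by any EXPRESS compiler.

```python
from dataclasses import dataclass, fields


@dataclass
class Entity2:
    """ENTITY_2 declares no attributes."""


@dataclass
class Entity1:
    attribute_1: int       # ATTRIBUTE_1 : INTEGER
    attribute_2: str       # ATTRIBUTE_2 : STRING
    attribute_3: Entity2   # ATTRIBUTE_3 : ENTITY_2


@dataclass
class Entity3(Entity1):
    """SUBTYPE OF (ENTITY_1): inherits all three attributes, adds none."""


instance = Entity3(attribute_1=1, attribute_2="two", attribute_3=Entity2())
# Collect the attribute names that Entity3 obtained through inheritance.
inherited = [f.name for f in fields(instance)]
```

Although Entity3 declares nothing of its own, an instance must still supply all three inherited attributes, just as an instance of ENTITY_3 must in EXPRESS.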
WHERE
Where rules may be applied to ENTITYs or TYPEs. In both cases, the WHERE clause
appears after the primary part of the declaration and before the “END_” keyword. RULEs
may also be applied to a whole collection of entity instances as described in the next
section of the paper.
EXAMPLES OF WHERE
TYPE day_in_month_number = INTEGER;
WHERE
wr1: ((1 <= SELF) AND (SELF <= 31));
END_TYPE; -- day_in_month_number
The listing above shows how a where rule is used to require that a valid
day_in_month_number be an integer between 1 and 31. This constraint applies to all
ENTITYs that use this as a data type for an attribute.
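In a programming language, the same constraint would typically be written as a validation function. The Python sketch below shows one way to do this; raising ValueError on a violation is a design choice made for this illustration, since EXPRESS itself only declares the constraint and leaves enforcement to the implementation.

```python
def day_in_month_number(value):
    """Return value if it satisfies wr1, otherwise raise ValueError."""
    # bool is a subclass of int, so reject it to keep the INTEGER meaning.
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError("day_in_month_number must be an INTEGER")
    # wr1: ((1 <= SELF) AND (SELF <= 31))
    if not (1 <= value <= 31):
        raise ValueError("wr1 violated: expected 1 <= value <= 31")
    return value
```

Here the parameter value plays the role of SELF in the where rule.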
ENTITY offset_curve_2d
SUBTYPE OF (curve);
basis_curve : curve;
distance : length_measure;
self_intersect : LOGICAL;
WHERE
wr1: (basis_curve.dim = 2);
END_ENTITY; -- offset_curve_2d
The where rule in the above listing is a little more interesting. The attribute
named “basis_curve” is of type “curve”, which is an ENTITY that has an attribute
named “dim”. The rule says that the basis_curve of a valid instance of offset_curve_2d
must have a value of 2 for dim.
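An entity-level where rule corresponds roughly to a check that runs whenever an instance is created. The Python sketch below makes several simplifying assumptions that are not in the schema: dim is treated as a plain stored attribute of curve, a float stands in for length_measure, and None stands in for the UNKNOWN value of LOGICAL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Curve:
    dim: int  # dimensionality of the curve's coordinate space


@dataclass
class OffsetCurve2d(Curve):
    basis_curve: Curve
    distance: float                  # stands in for length_measure
    self_intersect: Optional[bool]   # None stands in for LOGICAL UNKNOWN

    def __post_init__(self):
        # wr1: (basis_curve.dim = 2)
        if self.basis_curve.dim != 2:
            raise ValueError("wr1 violated: basis_curve.dim must be 2")


valid = OffsetCurve2d(dim=2, basis_curve=Curve(dim=2),
                      distance=1.5, self_intersect=False)
```

Constructing an OffsetCurve2d whose basis_curve has a dim of 3 raises ValueError, which is the runtime counterpart of an invalid instance.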
RULE
EXPRESS allows constraints to be applied to whole collections of ENTITYs as well as to
individual TYPEs and ENTITYs. These constraints are captured as global rules using the
RULE keyword.
EXAMPLE OF RULE
RULE application_context_requires_ap_definition FOR
(application_context, application_protocol_definition);
WHERE
wr1: (SIZEOF(QUERY ( ac <* application_context | (NOT (SIZEOF(
QUERY ( apd <* application_protocol_definition |
((ac :=: apd.application) AND
(apd.application_interpreted_model_schema_name =
'plant_spatial_configuration')) )) = 1)) )) = 0);
END_RULE; -- application_context_requires_ap_definition
The listing above collects a single where rule into a global RULE. This is an example of
how ugly constraint specification can be in the EXPRESS language. The RULE says that
“For each instance of application_context, there shall be exactly one instance of
application_protocol_definition that references the instance of application_context as its
application with a value of ‘plant_spatial_configuration’ as its
application_interpreted_model_schema_name.” (ISO 10303-227, 880).
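The nested QUERY expressions may be easier to follow when restated as a check over two collections of instances. The Python sketch below is a simplified reading of the rule: the two classes carry only the attributes that the rule mentions, and the object identity test (is) stands in for the EXPRESS instance equality operator (:=:).

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity comparison, like EXPRESS :=:
class ApplicationContext:
    application: str


@dataclass(eq=False)
class ApplicationProtocolDefinition:
    application: ApplicationContext
    application_interpreted_model_schema_name: str


def application_context_requires_ap_definition(contexts, definitions):
    """Return True when the population satisfies wr1 of the global rule."""
    # Inner QUERY: count the definitions that reference a given context with
    # the required schema name. Outer QUERY: collect contexts whose count is not 1.
    violations = [
        ac for ac in contexts
        if sum(
            1 for apd in definitions
            if apd.application is ac
            and apd.application_interpreted_model_schema_name
            == "plant_spatial_configuration"
        ) != 1
    ]
    # The rule holds when no application_context violates the condition.
    return len(violations) == 0
```

The function receives the complete populations of both entity types, which is what distinguishes a global RULE from a where rule attached to a single ENTITY.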
FUNCTION
FUNCTIONs allow complex constraints within where rules to be split out from the where
rules in a manner similar to splitting out functions as SUBROUTINES within the
FORTRAN language. Each function has a signature that defines a return type as well as a
list of input types.
EXAMPLE OF FUNCTION
FUNCTION acyclic_curve_replica(
rep: curve_replica;
parent: curve
): BOOLEAN;
IF NOT ('PLANT_SPATIAL_CONFIGURATION.CURVE_REPLICA' IN
TYPEOF(parent))
THEN
RETURN(TRUE);
END_IF;
IF parent :=: rep THEN
RETURN(FALSE);
ELSE
RETURN(acyclic_curve_replica(rep,parent\curve_replica.parent_curve));
END_IF;
END_FUNCTION; -- acyclic_curve_replica
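The function above walks the chain of parent curves and returns FALSE if the chain ever leads back to the replica itself. The same recursion can be sketched in Python; the two classes below are minimal stand-ins invented for this illustration, and isinstance plays the part of the TYPEOF test.

```python
class Curve:
    """Stand-in for the curve ENTITY."""


class CurveReplica(Curve):
    """Stand-in for curve_replica, a curve defined by copying a parent curve."""

    def __init__(self, parent_curve):
        self.parent_curve = parent_curve


def acyclic_curve_replica(rep, parent):
    """Return False if rep is reachable by following parent_curve from parent."""
    # A parent that is not itself a replica ends the chain, so no cycle exists.
    if not isinstance(parent, CurveReplica):
        return True
    # Instance identity (:=: in EXPRESS) means the chain loops back to rep.
    if parent is rep:
        return False
    return acyclic_curve_replica(rep, parent.parent_curve)
```

A where rule on curve_replica could then call the function with the replica and its own parent_curve to reject any replica that is, directly or indirectly, a copy of itself.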
Document Numbering
ISO 10303 is organized as a series of parts, each published separately. The structure of
this international standard is described in ISO 10303-1. The numbering of the parts of
this International Standard reflects its structure:
— Parts 11 to 14 specify the description methods;
— Parts 21 to 29 specify the implementation methods;
— Parts 31 to 35 specify the conformance testing methodology and framework;
— Parts 41 to 50 specify the integrated generic resources;
— Parts 101 to 107 specify the integrated application resources;
— Parts 201 to 237 specify the application protocols;
— Parts 301 to 337 specify the abstract test suites;
— Parts 501 to 520 specify the application interpreted constructs.
A complete list of parts of ISO 10303 is available from the Internet:
http://www.nist.gov/sc4/editing/step/titles/
Application Protocols are the 200-series parts of the international standard. The generic
integrated resources are numbered from 41 to 50 and contain EXPRESS schemas. Each
Application Protocol uses (USE FROM) combinations of these generic integrated resource
schemas to form a short-form EXPRESS schema that meets the data exchange needs of a
domain.
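The numbering scheme lends itself to a simple lookup. The Python sketch below encodes the part ranges exactly as listed in this section; the function name classify_part is invented for the illustration, and part numbers outside the listed ranges (such as Part 1) simply return None.

```python
# Part-number ranges as listed in this section of the paper.
PART_SERIES = [
    (11, 14, "description methods"),
    (21, 29, "implementation methods"),
    (31, 35, "conformance testing methodology and framework"),
    (41, 50, "integrated generic resources"),
    (101, 107, "integrated application resources"),
    (201, 237, "application protocols"),
    (301, 337, "abstract test suites"),
    (501, 520, "application interpreted constructs"),
]


def classify_part(part_number):
    """Return the series name for an ISO 10303 part number, or None."""
    for low, high, series in PART_SERIES:
        if low <= part_number <= high:
            return series
    return None
```

For example, classify_part(227) identifies ISO 10303-227 as an application protocol, and classify_part(11) identifies the EXPRESS language reference manual as a description method.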
The second tier maps all of the data requirements and constraints from the first tier onto a
set of generic integrated resources. These generic integrated resources are defined using
the EXPRESS language and every Application Protocol is mapped onto the same set of
generic integrated resources. In theory, similar data requirements from different domains
should map to the same EXPRESS constructs in the generic integrated resources. In
practice this ideal has not yet been proven.
AP Table of Contents
Just as every EXPRESS schema follows the generic skeleton described above, every
STEP Application Protocol (AP) follows the same basic format as outlined in the
following skeletal Table of Contents:
Introduction
1 Scope
2 Normative references
3 Terms, definitions, and abbreviations
4 Information requirements
4.1 Units of functionality
4.2 Application objects
4.3 Application assertions
5 Application interpreted model
5.1 Mapping table
5.2 AIM EXPRESS short listing
6 Conformance requirements
Annex A (normative) AIM EXPRESS expanded listing
. . .
Annex F (informative) Application activity model
The EXPRESS language allows for a “compilation” of a short-form EXPRESS model into a
long-form EXPRESS model. Furthermore, there are several compilers that translate the
long-form EXPRESS into a collection of C++ or Java classes, which can then be compiled
as part of a larger software product. The tracking of domain requirements from the
Application Reference Model (ARM) to the long-form EXPRESS entities is very similar to
the tracking of user requirements to software functionality in a typical software
development project.
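The idea of generating classes from a long-form schema can be illustrated in miniature. The Python sketch below assumes that an entity declaration has already been parsed into a small dictionary; that dictionary layout and the function compile_entity are hypothetical and are far simpler than what an actual EXPRESS compiler produces.

```python
from dataclasses import make_dataclass, fields

# Hypothetical parsed form of a long-form entity declaration.
ENTITY_1 = {
    "name": "Entity1",
    "attributes": [("attribute_1", int), ("attribute_2", str)],
}


def compile_entity(declaration, bases=()):
    """Build a Python class from a parsed entity declaration."""
    return make_dataclass(
        declaration["name"], declaration["attributes"], bases=bases
    )


Entity1 = compile_entity(ENTITY_1)
item = Entity1(attribute_1=7, attribute_2="seven")
```

Because the class is derived from the declaration rather than written by hand, each generated attribute can be traced back to the schema that required it.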
By using STEP standards as part of the requirements for new software development
efforts, organizations (both developers and customers) can leverage very large data
modeling efforts for the cost of purchasing the ISO Standards and the cost of learning
how to read the ISO standards.
Finally, clause 4.2.154 defines the Pipe object. “A Pipe is a type of Piping_component
(see 4.2.157) that is a hollow cylindrical conveyance, with a constant radius for the cross-
sectional circle, for directing fluid, vapour, or particulate flow. Each Pipe may be one of
the following: a Mitre_bend_pipe (see 4.2.142), a Nipple (see 4.2.143), a Straight_pipe
(see 4.2.232), or a Swept_bend_pipe (see 4.2.248).” This clause also notes that, “In most
cases, the Pipe will conform to the dimensional requirements for nominal pipe size as
tabulated in national standards such as American National Standards Institute (ANSI)
B36.10 and ANSI B36.19.”, and that, “This definition does not exclude tubing and flex
hoses from consideration as Pipe.” (ISO 10303-227, 95)
NOTE: The two literal values ‘pipe’ and ‘plant item’ from the “Reference path” in
Figure 4 are annotated on Figure 5.
Conclusions
This paper has provided a high-level introduction to the EXPRESS data modeling
language and has provided some insight into the application of this language to
developing Application Protocols within the STEP (ISO 10303) framework.
By using the EXPRESS language, ISO 10303 Application Protocols meet several of the
primary tenets of good software specification. Specifically, they are traceable and traced,
and they are precise. Whether they are also clear and unambiguous is a matter for debate
and is outside the scope of this paper. EXPRESS offers an advantage over other
languages such as IDEF and UML in that it can be machine processed and the
relationship between its graphical and lexical forms is standardized.
The USE FROM mechanism is another advantage of EXPRESS. This mechanism allows
patterns of data models to be shared in a standardized manner. STEP’s use/abuse of USE
FROM in its Generic Integrated Resources has resulted in the complexity reflected in
Figure 5. Specifically, the STEP methodology led to the need to instantiate eight ENTITYs
just to say “there is a pipe.” Furthermore, understanding the meaning of these entities
requires recognizing a pattern that consists of the fourteen ENTITYs shown in Figure 5.
The process of mapping domain information into data structures is an essential part of
any software development effort whether it follows a formal development lifecycle
model or not. Software development efforts targeted towards the design of complex
systems (like process plants, automobiles, or ships) can gain a great deal of leverage by
incorporating STEP Application Protocols. This leverage is amplified by the fact that
long-form EXPRESS schemas can be directly compiled into data repositories with
application programming interfaces (APIs). As more applications attempt to share data
References
ISO 10303-1:1994(E) “Industrial automation systems and integration—Product data
representation and exchange—Part 1: Overview and fundamental principles”
Kemmerer, Sharon J., editor (1999), “STEP The Grand Experience”, NIST Special
Publication 939, National Institute of Standards and Technology, CODEN: NSPUE2.