Advanced Standard SQL Dynamic Structured Data Modeling and Hierarchical Processing
Michael M. David
Lee Fesperman
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the U.S. Library of Congress.
ISBN: 978-1-60807-533-1
All rights reserved. Printed and bound in the United States of America. No part of this
book may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording, or by any information storage and retrieval system,
without permission in writing from the publisher.
All terms mentioned in this book that are known to be trademarks or service marks have
been appropriately capitalized. Artech House cannot attest to the accuracy of this information.
Use of a term in this book should not be regarded as affecting the validity of any
trademark or service mark.
Contents
Preface
Introduction
2.8 Conclusion
4 Natural Joins
4.8 Conclusion
6.9 Conclusion
26 Summary
Glossary
Bibliography
Index
Preface
This revised and updated edition of Advanced ANSI SQL Data Modeling and
Structure Processing delves deeper into the inherent hierarchical processing of
SQL and covers the hierarchical processing discoveries and new findings that
have evolved since the first edition came out. To be clear, this is not a book on
external databases built on top of SQL and driven procedurally by the user.
These types of databases are two-dimensional, consisting of height and width,
and are basically flat. This book is about the powerful natural hierarchical
database inherent in SQL-92. This is an automatic, three-dimensional database
containing the height, width, and depth necessary to process heavy-duty
professional databases such as IBM’s IMS and XML databases, as well as new
logical hierarchical relational databases.
There are many new hierarchical data modeling and processing capabilities
that have been made possible by the standard SQL join syntax and outer join
operation added in the SQL-92 standard. This is still one of SQL’s best-kept
secrets. Most of these capabilities are not generally known, if they are known
at all. These hierarchical capabilities have been lying dormant, waiting to be
utilized. They unlock the power of hierarchical processing that comes free with
the SQL-92 standard. The standard SQL join syntax actually contains a very
flexible and powerful programming language with dynamic data modeling and
hierarchical structure processing capabilities. Their full utilization can be
extremely beneficial to all SQL programmers, DBAs, database designers, product
developers, data scientists, and product users. While these capabilities are
available for use, they have not been documented in other SQL reference books
or SQL vendors’ user manuals. This book remedies this problem by thoroughly
documenting these powerful inherent hierarchical data
modeling and processing capabilities. This book will also demonstrate these
advanced capabilities so that database professionals can see examples of these
hierarchical queries run on an experimental SQL hierarchical XML processor.
Using this book, SQL beginners and experts will be able to immediately
use the standard SQL outer join operation to access its advanced but
underutilized hierarchical processing capabilities. The outer join technology
presented can be safely applied because it is open and compatible with standard
SQL, avoiding interface problems now and in the future. Because the inherent
and direct processing of complex hierarchical data structures is new to SQL,
data structures, their semantics, and their direct use with the standard SQL
outer join are also covered thoroughly in this book. This fully rounds out the
outer join coverage and its many uses. Some of its advanced new capabilities
are: hierarchical integration of relational and hierarchical data; dynamic,
transparent, and navigationless hierarchical multipath processing; automatic
processing of dynamically structured data; powerful any-to-any structure
transformations; structure-aware processing for hierarchical optimization;
dynamic formatted XML output; and dynamic joining of hierarchical structures
to create new structures.
The standard SQL join has many different join types and a very flexible
syntax for specifying them that can significantly control its operation and affect
its join result. This makes outer joins difficult to use and prone to semantic
errors. Many combinations of join types produce illogical structures that can
produce ambiguous results. Because it is a complicated topic, no book or vendor
manual on SQL has demonstrated or discussed anything more than very simple
two-table outer joins. The outer join operation is too complex a topic to treat
in a limited way, so it is covered fully in this book.
The real power of the outer join is achieved when these advanced capabilities
are used in outer joins involving three or more tables. This book instructs
the SQL user on how to perform powerful multiple-table outer joins by following
the hierarchical rules and principles set forth, which make constructing and
understanding the effects and semantics of multiple-table outer joins very
intuitive. This structured data logic can be embedded in SQL views. This data
modeling and structure processing ability can establish a default standard
for database modeling because it is supported completely by standard SQL
syntax and semantics. The following new features are supported:
The SQL examples in this book have been designed so that the intended
meaning of the query results is self-explanatory. This means there is usually no
need to compare query output data in the examples against actual data in the
database. A consistent set of familiar data structures is used throughout
the book (see the appendix). In addition, if the structure is important to the
example, it is shown again in the example. The query result columns are usually
arranged following their structure so that the semantics are more easily
interpreted based on the data structure. When comparing the results of
queries, keep in mind that the column order of sibling segments has no
semantic significance.
There are two types of SQL examples used in this book: real-world examples
and pseudo-examples. The real-world examples are valid SQL and are used to show
specific examples, while the pseudo-examples are not necessarily complete or
totally valid SQL. They are used when it is important to easily convey a general
idea or principle. Often, the pseudo-examples use table names such as T1, T2 or
A, B, C, and may also use these conventions instead of column names to highlight
that what matters is not the column name but which table the column belongs
to. A pseudo-SQL example may have the form FROM A LEFT JOIN B ON A=B, with no
SELECT clause or fully qualified column names in the ON condition when they
are not necessary to the concept being discussed.
This book is divided into five parts that are best read sequentially, though
the important points are repeated or referenced in the text when their
understanding is necessary for the topic being covered. Part I covers the basics
of the relational join operation. Part II investigates the basic data modeling
and structure processing features that are inherent in the standard SQL outer
join and are available for immediate use. Part III explains the new capabilities
that were not previously possible in SQL but are now made possible by the outer
join’s data modeling capability. Part IV examines advanced data structure
processing operations that have been made possible by SQL’s new hierarchical data
modeling and processing capabilities. Part V, using the hierarchical data modeling
and structure processing background presented previously, describes the creation
of a new and powerful SQL transparent hierarchical XML query processor and how it
operates. What makes this SQL processor different is that it transparently
supports full hierarchical multipath processing with inherent native XML input
and output support. This capability was developed using new discoveries made
during the research of this technology.
These discoveries, which are covered in this book and which make these
hierarchical capabilities possible in SQL, are:
hierarchical data modeling and processing language, what its capabilities are,
and how it can achieve those capabilities. This is the purpose of this book.
There are data modeling books on the market that cover hierarchical data
modeling. What distinguishes this book is that it explains standard SQL’s
inherent hierarchical data modeling capability and why it is not just another
data modeling methodology. It is a complete data modeling language that
actually controls SQL’s full hierarchical operation. This book is therefore not
proposing just another data modeling language; it is defining how the one that
inherently exists in standard SQL operates and performs full hierarchical
processing. This allows it to be used immediately after standard SQL is
installed: when a hierarchical data structure is modeled using a standard SQL
join and is subsequently executed in SQL, the result reflects exactly the
hierarchical semantics of the data structure being modeled. Using this natural
technology, an experimental SQL hierarchical XML processor was built to test
the hierarchical processing and hierarchical/relational integration, producing
XML structured output to demonstrate and verify its hierarchical accuracy.
Part I
The Basics of the Relational Join Operation
Part I covers the basics of the relational join operation with a concentrated look
at the more complex and lesser-known outer join operation. The inner join is the
more common and simpler standard join. Chapter 1 introduces the inner and
outer join operations and explains their basic functions and operations, and
their strengths and weaknesses. Chapter 2 defines the standard SQL outer join
operation and discusses its main operation. Chapter 3 goes into the many
different types and features of the standard SQL outer join operation and their
specific operations. Chapter 4 concentrates on one specific optional feature of
the join operation, the NATURAL option of the join. This feature makes each
outer join type operate in a different way, which is why it has its own chapter.
1
Relational Join Introduction
In relational databases, data is stored in two-dimensional tables. These tables
are arranged in rows and columns of data where each row can be thought of as a
record and the columns are the data fields. For example, a given row would
contain related data such as employee number, salary, and department number.
Other rows in the table would contain these same types of information
(attributes) for other employees.
An application database view usually requires multiple tables, because
standard relational tables do not yet allow for variable repeating fields in a row.
This is because standard relational databases require first normal form data.
Thus, repeating data is supported by using additional tables to hold repeating
values in multiple rows. Second and third normal form data modeling decisions
can also account for related data being split across multiple tables, but these
decisions relate to good database design and are not a requirement.
In relational terms, rows are also known as tuples. Each table column
contains the same type of data (attributes), such as salary or department
number. Every row needs to be uniquely identified by a primary-key field such as
employee number or social security number. Rows can also contain nonunique
key fields such as alternate and foreign keys, like a department number in the
Employee table. These can be used to access a group of related rows, such as all
employees for a given department.
A primary-key field in one table can be a foreign-key field in another
table. This is the case in the familiar Department and Employee tables, where
the department number in the Department table is its primary key, and in the
Employee table the department number is the foreign key. A join operation
is used to combine tables like the Department and Employee tables using a
common key in both tables, such as the department number keys to match the
rows that will be combined.
With the inner join, the order that the tables are specified for joining does
not affect the result. If the order that the table names were specified in the inner
join statement in Figure 1.1 were reversed, the result would remain the same.
Because the order in which the table joins are processed has no effect on the
result, internal optimizations are free to pick the most efficient join order for
execution.
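The order independence of the inner join can be checked directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in SQL engine and hypothetical Department and Employee rows (department A has no employees, and employee Y points at a nonexistent department Z). It runs the same inner join with the table names listed in both orders:

```python
import sqlite3

# Hypothetical sample data, not taken from Figure 1.1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A', 'Admin'), ('B', 'Billing');
    INSERT INTO Employee VALUES ('X', 'B'), ('W', 'B'), ('Y', 'Z');
""")

# Same inner join with the table names in opposite orders. An explicit
# column list keeps the output columns aligned for comparison.
dept_first = conn.execute(
    "SELECT DeptNo, EmpNo FROM Department, Employee WHERE DeptNo = EmpDeptNo"
).fetchall()
emp_first = conn.execute(
    "SELECT DeptNo, EmpNo FROM Employee, Department WHERE DeptNo = EmpDeptNo"
).fetchall()

# Row order is not guaranteed without ORDER BY, so compare sorted lists.
print(sorted(dept_first))                       # [('B', 'W'), ('B', 'X')]
print(sorted(dept_first) == sorted(emp_first))  # True
```

Department A and employee Y vanish from both results, which is the data loss the outer join is designed to avoid.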
It is also worth mentioning that the WHERE clause can specify filtering
criteria as well as join criteria, as in SELECT * FROM Department, Employee
WHERE DeptNo=EmpDeptNo AND Salary >= 50000. In this case, the join
operation also filters out result rows where the salary is less than
50,000.
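The combined join-and-filter WHERE clause can be run as written. This is a minimal sketch, assuming Python's sqlite3 module and hypothetical rows; the salary literal must be written without a thousands separator to be valid SQL:

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT,
                           Salary INTEGER);
    INSERT INTO Department VALUES ('A', 'Admin'), ('B', 'Billing');
    INSERT INTO Employee VALUES ('X', 'B', 60000), ('W', 'B', 40000),
                                ('Y', 'Z', 55000);
""")

# Join criteria and a data filter share one WHERE clause. The numeric
# literal is written 50000; "50,000" would not parse as one number.
rows = conn.execute("""
    SELECT DeptNo, EmpNo, Salary
    FROM Department, Employee
    WHERE DeptNo = EmpDeptNo AND Salary >= 50000
""").fetchall()
print(rows)  # [('B', 'X', 60000)]
```

Employee W fails the salary filter, and employee Y fails the join condition, so only one row remains.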
replicated data values unless it is necessary to reflect the proper data structure
(as will be demonstrated in Chapter 12). But as we saw earlier, there is no way
in the inner join syntax to specify the data structure or to represent the data
structure. When joining the Department table with the Employee table, there
are two data structures possible, Department over Employee or Employee
over Department. Each has its own distinct semantics, but neither can
be represented in the inner join result of these two tables, as demonstrated in
Figure 1.1.
some interesting and powerful new optimizations that can be applied to outer
joins. These are discussed in detail in Chapter 11.
the other table is preserved (which may seem confusing). The example in
Figure 1.4 demonstrates a case where the Department table is preserved and the
Employee table is not.
The example in Figure 1.4 demonstrates a one-sided join. The Department
table, represented in the WHERE clause by the DeptNo column, preserves data
because the matching EmpDeptNo join column is flagged with an asterisk. This,
as described below, causes the Employee table to be augmented with an all-null
value row that will match any nonmatching DeptNo value.
FULL outer joins can also be specified by giving each join comparison column
its own asterisk, as in EmpDeptNo*=*DeptNo, which is demonstrated in
Figure 1.5.
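For comparison, the one-sided join of Figure 1.4 can be expressed with the LEFT JOIN syntax that SQL-92 standardized. This is a minimal sketch, assuming Python's sqlite3 module as a stand-in engine and hypothetical rows; department A has no employees, so the join pads its employee column with a null (None in Python):

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('Y', 'Z');
""")

# Department is the preserved table: its unmatched row A survives,
# padded with NULL for the missing Employee data. Employee Y, whose
# department does not exist, is not preserved.
rows = conn.execute("""
    SELECT DeptNo, EmpNo
    FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo
    ORDER BY DeptNo
""").fetchall()
print(rows)  # [('A', None), ('B', 'X')]
```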
Notice that the result tables in Figures 1.4 and 1.5 have department
A’s data preserved even though there were no matching employees for it, and in
Figure 1.5 employee Y was also preserved even though there was no matching
department for it. This is the reason for the two null values representing the
missing employee and department data in the join result. While this SQL
example operates fine, there is a problem when more than two tables are being
joined. As mentioned earlier, the join table order can affect the result when
outer joins are involved, and these early outer join operations have no method
of specifying or controlling the join order. This makes the result unpredictable
when more than two tables are being joined. For example, the join statement in
Figure 1.6 is ambiguous.
How is the SELECT statement in Figure 1.6 processed? Is the Department
table outer joined with the Employee table first, or is the Employee table
inner joined with the Dependent table first? The inner join is very destructive:
if performed after the outer join, it can negate the data-preserving effect of
the outer join. So the join order can be very significant to the result, and
there is no provision in this early nonstandard SQL syntax to control it.
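The standard syntax covered in the next chapter lets the statement itself fix this order. The sketch below is a minimal illustration, assuming Python's sqlite3 module and hypothetical Department, Employee, and Dependent rows. It runs the Figure 1.6 pattern both ways; the second query performs the inner join first and is written with a derived table so that the nesting is explicit:

```python
import sqlite3

# Hypothetical sample data: department A has no employees,
# and employee W has no dependents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT PRIMARY KEY, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('W', 'B');
    INSERT INTO Dependent VALUES ('D1', 'X');
""")

# Outer join first, inner join second: the inner join discards every row
# whose EmpNo is NULL, undoing the preservation of department A.
outer_then_inner = conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo
         JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY DeptNo
""").fetchall()

# Inner join first (nested on the right), outer join second:
# department A is preserved.
inner_then_outer = conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department LEFT JOIN
         (SELECT EmpNo, EmpDeptNo, DpndNo
          FROM Employee JOIN Dependent ON EmpNo = DpndEmpNo) AS ED
         ON DeptNo = EmpDeptNo
    ORDER BY DeptNo
""").fetchall()

print(outer_then_inner)  # [('B', 'X', 'D1')]
print(inner_then_outer)  # [('A', None, None), ('B', 'X', 'D1')]
```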
1.5 Conclusion
Inner joins lose data when there is no matching data. Outer joins preserve
unmatched data by padding the missing data columns with null values in the
result. Their operation may be more costly than the inner join because of their
more complex requirements. The first outer joins were not standardized, and they
operated ambiguously when three or more tables were joined. The standard SQL
outer join, by contrast, has a nonambiguous syntax, as will be shown in the next
chapter.
2
The Standard SQL Join Operation
The SQL-92 version of the SQL standard officially introduced an
outer join operation. Much study went into the design of this outer join
operation to correct the problems that had been identified in previous
nonstandardized versions, which were covered in Chapter 1. The inner join is
still the standard and default join operation. The syntax of the outer join has
been seamlessly grafted onto the FROM clause, leaving the inner join operation
downwardly compatible with existing SQL code.
SELECT ---
FROM Table-Reference[,Table-Reference]…
WHERE ---
such as the LEFT, RIGHT, FULL, and INNER joins. Left-sided nesting
occurs on the left side of outer join operations, and right-sided nesting occurs
on the right side of outer join operations where tables are brought in by the
recursive syntax. This is reflected in the outer join definition in Figure 2.1. For
completeness’ sake, the syntactical notations used in this outer join definition
are specified in Figure 2.2.
To simplify the standard SQL outer join definition in Figure 2.1, three
versions of the joined table construct were specified. The first is the most
standard and common syntax. In the second version, a NATURAL option adds a
NATURAL keyword that eliminates the join specification. The third version is
a CROSS join, which also does not use a join specification. The join
specification, with its ON or USING clause, also controls nesting, which controls
table join order. Since the CROSS join and natural joins using the NATURAL join
option do not use an ON or a USING clause to control nesting, parentheses
can be used to control nesting and therefore table join order. Normally the join
table order cannot be changed by the use of parentheses because the join order
is determined by the ON and USING clauses. This is discussed further in
Section 2.2.
The FROM clause of the outer join definition, FROM Table-Reference[,Table-Reference]…,
shown in Figure 2.1, allows multiple table references
to be specified. At this top level, multiple table references are relationally joined
using standard inner join logic, making this definition compatible with the
standard inner join.
The standard SQL outer join operation comes into play when a table
reference contains a joined table specification. Coding more than one table
reference at this top level when outer join operations are performed at the lower
level is not desirable. This is because the data-losing properties of the inner join
operation occurring at the top level would negate the data-preserving effects of
the outer join at the lower level. For this reason, this particular syntax use will
not be explored further in this book.
The order in which the tables are joined using the new outer join syntax is
usually controlled by the nesting (recursive) syntax, which is not always
straightforward. This is because it follows an order of join processing that is
not always apparent with right-sided nesting (nesting occurring with the right
table argument). Left-sided nesting is naturally processed left to right, but
right-sided nesting in combination with left-to-right processing is not a
straightforward process. It requires a stacking procedure to internally assist
execution. The reason for this will become clear in the next section.
The join specification in Figure 2.1 can consist of an ON clause with a
join condition, or a USING clause specifying one or more column names to
be used for joining. Each column name specified in a USING clause
must exist in both table inputs and is used internally to form an equal join
(equijoin). The ON and USING clauses specify the join criteria for their
associated join operations. The USING clause turns the join operation into a
natural join, just as if the NATURAL option were specified. The NATURAL option
and USING clause will be described further in Chapter 4.
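The effect of a USING clause can be observed in any engine that supports it. This is a minimal sketch, assuming Python's sqlite3 module; the Dept and Emp tables are hypothetical and, unlike the book's Employee table, name their join column DeptNo in both tables, because USING requires the same column name on both sides. In SQLite, SELECT * returns the USING column only once, as in a natural join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables sharing the column name DeptNo.
    CREATE TABLE Dept (DeptNo TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Emp (EmpNo TEXT PRIMARY KEY, DeptNo TEXT);
    INSERT INTO Dept VALUES ('A', 'Admin'), ('B', 'Billing');
    INSERT INTO Emp VALUES ('X', 'B'), ('W', 'B');
""")

# USING (DeptNo) forms the equijoin Dept.DeptNo = Emp.DeptNo.
cur = conn.execute("""
    SELECT * FROM Dept LEFT JOIN Emp USING (DeptNo)
    ORDER BY 1, 3
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['DeptNo', 'DeptName', 'EmpNo']
print(rows)     # [('A', 'Admin', None), ('B', 'Billing', 'W'), ('B', 'Billing', 'X')]
```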
Because tables and working sets are joined two at a time in a specific
order, a single WHERE clause specifying the join criteria that is logically
applied after all tables are joined (see Chapter 1) does not work well with outer
joins whose tables need to be joined in a specific order. What is needed, and
supplied by the standard SQL outer join, is a clause like the ON or USING
clause that specifies the join criteria at each join point. This also has the effect of
separating join criteria specified in these clauses from data-filtering selection
criteria specified in the WHERE clause. The column names that are referenced
in an ON or USING clause must be found in the tables or working sets
processed by their associated join operation. This is known as the columns
being in the “scope of control.”
Data-filtering criteria can also be specified in the ON clause. This
achieves a finer level of filtering control than is possible with the WHERE
clause, because the filtering affects only part of each resulting row rather than
removing whole rows. This is covered further in Chapter 7.
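The difference between filtering in the ON clause and in the WHERE clause shows up clearly with a LEFT join. This is a minimal sketch, assuming Python's sqlite3 module and hypothetical rows; the same salary test is placed in each clause in turn:

```python
import sqlite3

# Hypothetical sample data: department A has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT,
                           Salary INTEGER);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B', 60000), ('W', 'B', 40000);
""")

# Filter in the ON clause: it only decides which Employee rows match,
# so every Department row is still preserved.
on_filter = conn.execute("""
    SELECT DeptNo, EmpNo FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo AND Salary >= 50000
    ORDER BY DeptNo
""").fetchall()

# Same filter in the WHERE clause: applied after the join, it removes
# whole rows, including the null-padded row for department A.
where_filter = conn.execute("""
    SELECT DeptNo, EmpNo FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    WHERE Salary >= 50000
    ORDER BY DeptNo
""").fetchall()

print(on_filter)     # [('A', None), ('B', 'X')]
print(where_filter)  # [('B', 'X')]
```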
If no join type is specified with a join operation, an inner join is assumed.
The OUTER keyword is an optional informational keyword. The examples in
this book exclude the OUTER keyword in order to save space in the
SQL examples. The JOIN keyword, while defined as required in the standard
SQL specification (and therefore in the join syntax definition in Figure 2.1), is
not necessary for the join syntax to be processed correctly. For this reason,
many SQL implementations treat its use as optional. Taking advantage of
this fact, some of the examples in this book may also exclude the JOIN
keyword when example space is scarce.
Department table because it had been accessed in the generation of the working
set used as the left input of its associated LEFT join operation, and is therefore
in its scope of control.
The outer join specification shown in Figure 2.4 is an example of
right-sided nesting. Parentheses are used in this example to emphasize join
execution order, but they have no effect because join order is controlled by the
placement of ON clauses when they are present. Notice that the ON clause for the
first LEFT join is actually delayed until after the second LEFT join is completely
specified. This causes the latter join to be performed first, returning its
result to the first LEFT join as its right-sided input. This nesting can be carried
to any depth. Note also that the first specified ON clause, which is associated
with the second LEFT join operation, cannot reference columns in the Department
table, since that table has not been previously joined with either table input
associated with the second join operation and is therefore not in its scope of
control. This is because right-sided nesting outer joins like this one generate
multiple working sets concurrently, each with a different scope of control
associated with it. This is described further in Chapter 7.
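The Figure 2.4 query is not reproduced here, so the following is a hypothetical stand-in. This minimal sketch, assuming Python's sqlite3 module and invented Department, Employee, and Dependent rows, compares a left-sided chain of LEFT joins with a right-nested version in which the Employee-Dependent join runs first and feeds its result to the Department join as the right-sided input. The nested join is written as a derived table; its ON clause can only see Employee and Dependent, mirroring the scope of control just described:

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT PRIMARY KEY, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('W', 'B'), ('Y', 'Z');
    INSERT INTO Dependent VALUES ('D1', 'X');
""")

# Left-sided nesting: joins are processed left to right.
left_nested = conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
         LEFT JOIN Employee ON DeptNo = EmpDeptNo
         LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY DeptNo, EmpNo
""").fetchall()

# Right-sided nesting: the inner Employee/Dependent join is performed
# first; its ON clause cannot reference Department (out of scope).
right_nested = conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department LEFT JOIN
         (SELECT EmpNo, EmpDeptNo, DpndNo
          FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo) AS ED
         ON DeptNo = EmpDeptNo
    ORDER BY DeptNo, EmpNo
""").fetchall()

print(left_nested)  # [('A', None, None), ('B', 'W', None), ('B', 'X', 'D1')]
print(left_nested == right_nested)  # True
```

The two forms agree here because every join is a LEFT join following the same Department over Employee over Dependent structure.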
Employee view:
right argument. This will usually produce a different result than without
parentheses because of the mixture of different join types.
Figure 2.8 Outer join result does not produce a strict Cartesian product subset.
The query above is not logically the same as the following pseudo query:
Figure 2.9 Use of ON clause that is not possible in Cartesian product model.
Both of the outer join queries below produce the same result.
SELECT A,B,C
FROM A LEFT JOIN B ON A=B LEFT JOIN C ON A=C
SELECT A,B,C
FROM A LEFT JOIN (B LEFT JOIN C ON A=C) ON A=B
SELECT A,B,C
FROM A LEFT JOIN B ON A= C LEFT JOIN C ON A=B
Figure 2.11 It is not always possible to rewrite a query to change the join order.
joins by moving the ON clause. Normally, the ability to reorder the joins
requires both associative and commutative properties, and one-sided outer
joins are not commutative as stated earlier. This example builds the same
multileg hierarchical data structure in both SQL statements by reversing the
construction of its legs. This does not change the semantics for hierarchical
structures. This is one of many hierarchical properties that will be covered in
Chapter 5. This example demonstrates that the hierarchictivity property can be
useful in addition to associativity and commutativity when using outer joins.
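The leg-reversal example can be made concrete. In this minimal sketch, assuming Python's sqlite3 module, the pseudo-tables A, B, and C become hypothetical Department, Employee, and Project tables, giving a two-leg structure with Department over both Employee and Project. Both statements list the output columns in the same order so their results can be compared directly:

```python
import sqlite3

# Hypothetical sample data for a two-leg structure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    CREATE TABLE Project (ProjNo TEXT PRIMARY KEY, ProjDeptNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('W', 'B');
    INSERT INTO Project VALUES ('P1', 'A'), ('P2', 'B'), ('P3', 'B');
""")

# Employee leg built first, then the Project leg.
emp_leg_first = conn.execute("""
    SELECT DeptNo, EmpNo, ProjNo
    FROM Department
         LEFT JOIN Employee ON DeptNo = EmpDeptNo
         LEFT JOIN Project ON DeptNo = ProjDeptNo
    ORDER BY 1, 2, 3
""").fetchall()

# Same structure with the legs built in the opposite order.
proj_leg_first = conn.execute("""
    SELECT DeptNo, EmpNo, ProjNo
    FROM Department
         LEFT JOIN Project ON DeptNo = ProjDeptNo
         LEFT JOIN Employee ON DeptNo = EmpDeptNo
    ORDER BY 1, 2, 3
""").fetchall()

print(emp_leg_first == proj_leg_first)  # True
print(len(emp_leg_first))               # 5
```

Department B's two employees and two projects appear in every pairing because the legs are independent siblings; that replication is the same in either build order.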
2.8 Conclusion
The standard SQL outer join preserves data and corrects problems with earlier
nonstandard outer joins. The standard SQL join syntax also has a separate ON
or USING clause for each join that requires one. These ON and USING
clauses specify the join condition, and each has its own scope of control.
The standard SQL join syntax supports both the inner join and many other
means the placement of its two table operands does not affect the result, as
shown in Figure 3.1.
The standard SQL FULL outer join also operates associatively, as defined
in Chapter 2. Since the FULL outer join is associative and commutative, the
table join order, when more than two tables are being joined, can be changed
without affecting the result. There are two reasons for this. First, the FULL join
loses no data regardless of the table join order. Second, the standard SQL FULL
outer join has a separate join clause for each join, which controls and limits the
valid FULL joins that can be specified. This was not true of the older,
nonstandardized outer joins, which were less associative in nature. The examples in
Figure 3.2 demonstrate FULL outer joins where the table join order is changed
without changing the result. Each table contains a row that will not be
matched. The first join example joins the Department table to the Employee
table first, while the second join example uses right-sided nesting (discussed in
Chapter 2) to join the Employee table to the Dependent table before joining
the Department table.
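FULL JOIN support varies between engines (SQLite, for example, added it only in version 3.39.0), so this sketch emulates it with a LEFT JOIN plus the unmatched rows of the other table; the emulation is an assumption of the example, not the book's method. It uses Python's sqlite3 module and hypothetical rows, and checks the commutative behavior of Figure 3.1 by swapping the two table operands:

```python
import sqlite3

# Hypothetical sample data: department A and employee Y are unmatched.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT NOT NULL PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('Y', 'Z');
""")

def full_join(left, right, on, left_key, cols):
    """Emulate `left FULL JOIN right ON on` for a two-column select list.

    left_key must be a NOT NULL column of `left`: in `right LEFT JOIN left`
    it is NULL exactly for the right rows that found no match.
    """
    sql = (f"SELECT {cols} FROM {left} LEFT JOIN {right} ON {on} "
           f"UNION ALL "
           f"SELECT {cols} FROM {right} LEFT JOIN {left} ON {on} "
           f"WHERE {left_key} IS NULL "
           f"ORDER BY 1, 2")
    return conn.execute(sql).fetchall()

# Swap the operands; the select list keeps the columns in the same order.
dept_emp = full_join("Department", "Employee", "DeptNo = EmpDeptNo",
                     "DeptNo", "DeptNo, EmpNo")
emp_dept = full_join("Employee", "Department", "DeptNo = EmpDeptNo",
                     "EmpNo", "DeptNo, EmpNo")

print(dept_emp)              # [(None, 'Y'), ('A', None), ('B', 'X')]
print(dept_emp == emp_dept)  # True
```

Both unmatched rows survive in either operand order, which is why the FULL join is commutative.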
There is one situation where FULL outer joins may appear to be
nonassociative, but this situation is accounted for by the definitions of
associativity and nonassociativity given in Chapter 2. Many SQL books use this
situation to prove that the outer join is nonassociative. This situation occurs
when three or more tables are joined across a common domain (key value), which
allows more valid join combinations. In the SELECT statements in Figure 3.2,
there are only two possible join combinations. If these tables were joined over
one common domain, there would be three possible combinations: Department and
Dependent could also be joined directly. This is demonstrated in Figure 3.3,
which joins all three tables over DeptNo. The third join statement in Figure 3.3
may produce different results than the two SQL statements above it, since it has
a different join condition than they do, namely DeptNo=DpndDeptNo. Even though
DeptNo=EmpDeptNo and EmpDeptNo=DpndDeptNo, which intuitively means
DeptNo=DpndDeptNo, this transitive logic does not hold up for the standard SQL
join with its multiple ON clauses that are each processed separately.
Figure 3.1 The FULL outer join demonstrating its commutative behavior.
Same FULL outer join as above with table join order changed:
SELECT * FROM Department FULL JOIN
(Employee FULL JOIN Dependent ON EmpNo=DpndEmpNo) -- first join processed
ON DeptNo=EmpDeptNo
Figure 3.2 The FULL outer join demonstrating its associative behavior.
The FULL outer join examples in Figure 3.3 do not lose any data. This
means all the results will contain the same data, but the way their rows are
combined may be different because the third example in Figure 3.3 is referencing
different combinations of field locations, which can change the result in this
situation. This is not a case of simply rewriting the outer join statement. In this
case, a different join condition referring to a different table was used, which
changes the semantics and the results. This is demonstrated in their results, also
shown in Figure 3.3.
With FULL joins involving more than two tables joined across a common
domain, you may notice, as in Figure 3.3, that the results can contain
rows that could have been combined more efficiently to reduce the number
of rows generated. For example, in the first set of results in Figure 3.3, the
rows that had null values added by the join process could be compressed into
two rows without losing any data, as in the second set of results in Figure 3.3.
Whether the result is compressed this way was determined by the data, not by
the SQL statements alone. In this same situation, it is always possible to
generate the most compressed result by using the NATURAL option of the FULL
outer join, which is described in Chapter 4.
Figure 3.4 LEFT and RIGHT joins are different forms of the same basic operation.
may be useful for complex outer joins, but can usually be avoided by using the
LEFT outer join.
Since one-sided outer joins only preserve data on one side, they are
noncommutative in operation. This means that the location of the two table input
arguments makes a difference in the results, as shown in Figure 3.5. You can see
that the results of the two LEFT joins have distinctly different semantics.
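The noncommutative behavior can be reproduced by swapping the two operands of a LEFT join. This is a minimal sketch, assuming Python's sqlite3 module and hypothetical rows (department A has no employees; employee Y has no valid department):

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    INSERT INTO Employee VALUES ('X', 'B'), ('Y', 'Z');
""")

# Same ON condition; only the positions of the two tables differ.
dept_dominant = conn.execute("""
    SELECT DeptNo, EmpNo FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo ORDER BY 1, 2
""").fetchall()
emp_dominant = conn.execute("""
    SELECT DeptNo, EmpNo FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo ORDER BY 1, 2
""").fetchall()

print(dept_dominant)  # [('A', None), ('B', 'X')]
print(emp_dominant)   # [(None, 'Y'), ('B', 'X')]
```

The first result keeps the unmatched department; the second keeps the unmatched employee instead.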
Since one-sided outer joins only preserve data on one of the two sides (the
dominant side), their result is hierarchical in nature. For example, Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo produces a result where
Department table values can exist without a matching Employee table value,
but Employee table values can’t exist without a matching Department table
value. This means that Department is hierarchically over Employee. When
joining more than two tables, the effect can be extended, as shown in Figure 3.6.
In this SQL example, Department table values can exist without a matching
Employee or Dependent table value. Employee table values can exist without a
matching Dependent, but require a matching Department, and so on. This
means that the Department value is hierarchically over Employee and
Employee is hierarchically over Dependent. One-sided joins can also model
nonhierarchical data structures, which will be covered in Chapter 6. Join table
order and its effect on one-sided outer join operations involving three or more
tables is a complex issue related to data modeling with the outer join, and it
will also be covered in further detail in Chapter 6.
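The hierarchical dependency described above can be checked mechanically. This minimal sketch, assuming Python's sqlite3 module and hypothetical rows that include an orphan employee (Y) and an orphan dependent (D2), runs the Figure 3.6 pattern and then verifies that no child value appears without its parent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT PRIMARY KEY, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('A'), ('B');
    -- Y belongs to a nonexistent department; D2 to a nonexistent employee.
    INSERT INTO Employee VALUES ('X', 'B'), ('W', 'B'), ('Y', 'Z');
    INSERT INTO Dependent VALUES ('D1', 'X'), ('D2', 'Q');
""")

rows = conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
         LEFT JOIN Employee ON DeptNo = EmpDeptNo
         LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY 1, 2
""").fetchall()
print(rows)  # [('A', None, None), ('B', 'W', None), ('B', 'X', 'D1')]

# Hierarchical check: a child value never appears without its parent.
emp_has_dept = all(dept is not None for dept, emp, _ in rows if emp is not None)
dpnd_has_emp = all(emp is not None for _, emp, dpnd in rows if dpnd is not None)
print(emp_has_dept, dpnd_has_emp)  # True True
```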
Being hierarchical in nature, one-sided outer joins can build hierarchical
structures top-down, as shown in Figure 3.6, or, by changing the join order,
bottom-up, as shown in Figure 3.7. Because the one-sided outer join is
hierarchical in nature, reordering the join from top-down to bottom-up execution
does not change the result. If this is true, it would prove that the one-sided join
is associative in operation, at least
SELECT * FROM Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo
LEFT JOIN Dependent ON EmpNo=DpndEmpNo
(Structure built top-down: Department over Employee over Dependent.)
Figure 3.7 One-sided outer join can also build structures bottom-up.
SELECT * FROM Dependent
RIGHT JOIN Employee ON EmpNo=DpndEmpNo
RIGHT JOIN Department ON DeptNo=EmpDeptNo
(Same structure, built bottom-up.)
SELECT * FROM Employee
LEFT JOIN Dependent ON EmpNo=DpndEmpNo
LEFT JOIN Department ON DeptNo=EmpDeptNo
(Multileg structure: Employee over Department and Dependent.)
around in the join operations did not change the results. The principle of
hierarchictivity, as coined and defined in Chapter 2, can be applied to multileg
hierarchical structures like this one as well as to the single-leg hierarchical
structures shown in Figures 3.6 to 3.8.
The principles of hierarchictivity intuitively make sense, since one-sided
joins are hierarchical in nature and hierarchical structures can be built top-
down, bottom-up, left to right, right to left, or in any combination of these
methods. These one-sided outer join operations can build very complex and
powerful hierarchical data structures. Chapter 5 supplies a review of
hierarchical data structures, and Chapter 6 describes in detail how to model
these data structures using one-sided outer joins.
One-sided joins can also model complex structures that are not hierarchi-
cal structures. When these structures are used in applications, it may be difficult
to predict their operation because they can lack unambiguous semantics. It is
useful to see how this nonhierarchical modeling can occur through one-sided
joins. This awareness can prevent the accidental use of nonhierarchical data
structures. Figure 3.10 demonstrates a nonhierarchical structure being mod-
eled. As is shown, this structure can be modeled in more than one way. While
this structure resembles a network structure, it doesn’t actually operate like
one, because its legs relate to each other hierarchically. In this structure, the
Department table is hierarchically above the Dependent table. If an Employee
row doesn’t have a link to a Department row, then the unmatched Employee rows
and their parent Dependent rows are excluded from the result. Other nonhier-
archical structures can be created from complex ON clauses consisting of refer-
ences to more than two tables. More information on these nonhierarchical
structures can be found in Chapter 6.
One modeling of the structure in Figure 3.10:
SELECT * FROM Department LEFT JOIN
Dependent LEFT JOIN Employee ON EmpNo=DpndEmpNo
ON DeptNo=EmpDeptNo
Following the rules for assessing associativity specified in Chapter 2, the
one-sided outer join does not operate nonassociatively, making its operation
under our definition associative. This does not include intermixing LEFT and
RIGHT joins, which may perform nonassociatively. The modeled nonhierar-
chical structure in Figure 3.10 will also produce a different result if the order its
legs are joined in is reversed. In this structure, the order of the legs has signifi-
cance, but the table reordering required to accomplish this is outside the scope
of associativity, which only includes regrouping.
SELECT * FROM T1
[INNER] JOIN T2 ON T1=T2
[INNER] JOIN T3 ON T2=T3
on the left side with nulls. Then the two tables can be UNIONed one on top of
the other as shown in Figure 3.13. This outer UNION effect can also be per-
formed by a FULL join by specifying the join criteria to never match, as shown
in Figure 3.13.
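The null-padding form of the outer UNION can be sketched in a few lines. The sketch assumes SQLite and uses hypothetical tables T1(A) and T2(B). Each input is widened with NULLs for the other table's column, and the two inputs are then stacked with UNION ALL. The FULL join form with a never-true condition (for example ON 1=0) is standard SQL, but it is not used here because SQLite builds older than 3.39 do not support FULL joins.

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (A TEXT);
    CREATE TABLE T2 (B TEXT);
    INSERT INTO T1 VALUES ('a1'), ('a2');
    INSERT INTO T2 VALUES ('b1');
""")

# Pad each side with NULLs for the columns it lacks, then stack the rows.
outer_union = con.execute("""
    SELECT A, NULL AS B FROM T1
    UNION ALL
    SELECT NULL AS A, B FROM T2""").fetchall()

print(outer_union)
```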
since the Department data was preserved for some purpose. Chapter 7 docu-
ments a powerful coding technique to prevent this destructive behavior when
nondata-preserving (destructive) joins or intermixing join types must be used.
3.7 Conclusion
This chapter has looked at all of the different standard SQL join types: the
FULL, RIGHT, LEFT, CROSS, UNION, and INNER joins. Except for the
INNER join, all of these joins also preserve rows when there are no matching
rows.
The two types of outer joins, FULL and one-sided, while logically similar,
behave very differently when three or more tables are being joined together.
One-sided joins operate hierarchically, while FULL joins do not since they are
symmetrical in operation.
Because the ON clause plays a major role with the outer join and greatly
limits its ability to be freely regrouped, the FULL and one-sided joins be-
have associatively. This can change when the NATURAL option is used. The
NATURAL option is documented in Chapter 4. Intermixing join types can
also make FULL and one-sided joins operate nonassociatively.
Commutativity and associativity do not account for all the valid cases
where the outer join specification can be rearranged and still produce the same
result. To help account for these additional cases, the term hierarchictivity was
introduced to account for the principles of hierarchical structures, which can
also be applied to the reordering of one-sided outer join statements.
4
Natural Joins
Natural joins are INNER, FULL, and one-sided joins where the common
named columns used in the join criteria are coalesced (turned into single-
column values) in the result. For example, when inner joining the Department
and Employee tables over the common key value of the department number,
DeptNo, it is usually convenient to have only one occurrence of the join key
value in the result instead of two (or more) copies of the same key value. This
assumes equal join (equijoin) conditions were used, and natural joins always
use equal join conditions. Natural joins take on added significance with outer
joins because of their data-preserving behavior. This introduces a situation
where one side or the other side of the join condition’s key values may be miss-
ing (null) from the result, making the key location unpredictable. In this case,
the coalesced key values allow a single key location to be used for each row in
the resulting table so it can be referenced easily and consistently. Depending on
the situation, coalescing of the join columns and natural join processing can
increase or decrease the associativity of outer joins across three or more tables
that are under a common domain. This can significantly change the operation
of the outer join operation, which is why it is being examined separately in this
chapter.
coalesce the common named join column keys into single key values. As indi-
cated in the outer join syntax in Figure 2.1, when the NATURAL keyword
option is specified, the ON and USING clauses are not specified. This is
because the join condition is automatically taken as the equal join between col-
umns having the same name in the tables that are in the scope of control of the
outer join operation being performed.
An implicit natural join does not specify the NATURAL keyword; the
NATURAL option is indicated by coding the USING clause instead of an ON
clause to indicate which columns are to be equijoined and coalesced. This is
why this is also called a column name join. It assumes that the specified column
names occur in both table inputs or their scope of control. This gives more
control than the explicit natural join option, because the USING clause
explicitly lists which common named columns take part in the join condition.
Just as in the explicit natural join, the column names that take part in the join
condition are coalesced in the result. The example in Figure 4.1 demonstrates the
explicit and implicit natural joins and how the column results are affected by
natural joins. In this example, the explicit and implicit natural joins produce
identical results, as you would expect.
The first SQL example in Figure 4.1 is a standard inner join statement
that shows in its result two copies of the join condition key value 123. The next
two SQL join examples demonstrate an explicit and implicit natural inner join.
(Figure 4.1 panels: No Natural Option, Explicit Natural, Implicit Natural; surviving result row: 123 HR John.)
They are equivalent statements. In these examples, DeptNo is the key in the
Department table (Dept) and a foreign key in the Employee table (EMP). This
key is used to perform the join operation. Because this is an equijoin, the join
condition column named DeptNo in each resulting row will always have the
same DeptNo values and can be coalesced for convenience.
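The three forms in Figure 4.1 can be compared by counting result columns. The sketch below assumes SQLite and hypothetical Dept/Emp tables holding the single matching row 123, HR, John. Documented SQLite behavior for SELECT * with NATURAL or USING is to omit the right-hand copy of each join column. The ON form keeps both copies.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp  (EmpName TEXT, DeptNo INTEGER);
    INSERT INTO Dept VALUES (123, 'HR');
    INSERT INTO Emp  VALUES ('John', 123);
""")

queries = {
    "no natural":       "SELECT * FROM Dept JOIN Emp ON Dept.DeptNo = Emp.DeptNo",
    "explicit natural": "SELECT * FROM Dept NATURAL JOIN Emp",
    "implicit natural": "SELECT * FROM Dept JOIN Emp USING (DeptNo)",
}

results = {}
for label, sql in queries.items():
    cur = con.execute(sql)
    columns = [d[0] for d in cur.description]  # result column names
    results[label] = (columns, cur.fetchall())
    print(label, results[label])
```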
The NATURAL option when applied to columns across two tables does
not affect its internal operation. This is not the case for natural joins across
three or more tables over a common column (domain). This is described
directly below.
Figure 4.2 Simulating the coalescing effect of the natural outer join.
Table Names:    T1       T2       T3
Column Names:   X  Y     X  Z     Y  Z
Values:         1  2     0  3     2  3

Result 1:   X     Y     Z
            1     2     Null
            0     Null  3
            Null  2     3

Result 2:   X     Y     Z
            0     2     3
            1     2     Null
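The Figure 4.2 data fits two groupings of an explicit natural FULL join. Grouping (T1 NATURAL FULL JOIN T2) NATURAL FULL JOIN T3 yields Result 1, and T1 NATURAL FULL JOIN (T2 NATURAL FULL JOIN T3) yields Result 2. The pure-Python sketch below is an illustration of that reading, not the book's method. The helper `natural_full_join` is hypothetical. It equijoins on every shared column name, treats None as SQL NULL (which never matches), and coalesces each shared column into one.

```python
def natural_full_join(left, right):
    """Explicit NATURAL FULL JOIN of two (columns, rows) pairs of dict rows."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    common = [c for c in left_cols if c in right_cols]
    out_cols = left_cols + [c for c in right_cols if c not in left_cols]

    def coalesce(l, r):
        # Take the left value when present, otherwise the right value.
        return {c: l.get(c) if l.get(c) is not None else r.get(c) for c in out_cols}

    out, matched = [], set()
    for l in left_rows:
        hit = False
        for i, r in enumerate(right_rows):
            # With no shared columns every pair matches (a CROSS JOIN).
            if all(l[c] is not None and l[c] == r[c] for c in common):
                out.append(coalesce(l, r))
                matched.add(i)
                hit = True
        if not hit:
            out.append(coalesce(l, {}))  # preserve the unmatched left row
    out += [coalesce({}, r) for i, r in enumerate(right_rows) if i not in matched]
    return out_cols, out


T1 = (["X", "Y"], [{"X": 1, "Y": 2}])
T2 = (["X", "Z"], [{"X": 0, "Z": 3}])
T3 = (["Y", "Z"], [{"Y": 2, "Z": 3}])


def xyz(result):
    return {(r["X"], r["Y"], r["Z"]) for r in result[1]}


result1 = xyz(natural_full_join(natural_full_join(T1, T2), T3))
result2 = xyz(natural_full_join(T1, natural_full_join(T2, T3)))
print(result1 == result2)  # False: regrouping changes the explicit natural join
```

T1 and T2 share only X, while T2 and T3 share only Z. The set of columns being equijoined therefore depends on the grouping, and this is the situation in which explicit natural joins stop being associative.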
null because the chain has been broken. The natural LEFT join does not sup-
port this chaining effect. Basically, the first table (T1) is always preserved and
its key join value(s) remain in force because of the coalescing effect of the
NATURAL option. This will increase the amount of data preserving that is
possible based on table T1’s key values, as can be seen in the inclusion of value
T3C in the natural join result.
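The coalescing effect of a natural LEFT join chain can be simulated with an explicit COALESCE in the ON clause. The following sketch assumes SQLite. The tables T1, T2 and T3 share a hypothetical key column K, and T2 has no row for T1's key. In the ordinary chain the last ON clause compares T2's key, which is null, so T3C is lost. The coalesced version compares the first non-null key, so T3C is kept. Because T1 is always preserved, COALESCE(T1.K, T2.K) reduces to T1's key whenever T1.K is not null.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (K INTEGER, A TEXT);
    CREATE TABLE T2 (K INTEGER, B TEXT);
    CREATE TABLE T3 (K INTEGER, C TEXT);
    INSERT INTO T1 VALUES (1, 'T1A');
    INSERT INTO T2 VALUES (2, 'T2B');
    INSERT INTO T3 VALUES (1, 'T3C');
""")

# Ordinary chain: T3 is linked through T2's key, which is NULL here.
chained = con.execute("""
    SELECT T1.A, T2.B, T3.C FROM T1
    LEFT JOIN T2 ON T1.K = T2.K
    LEFT JOIN T3 ON T2.K = T3.K""").fetchall()

# Simulated natural chain: link through the coalesced key instead.
coalesced = con.execute("""
    SELECT COALESCE(T1.K, T2.K) AS K, T1.A, T2.B, T3.C FROM T1
    LEFT JOIN T2 ON T1.K = T2.K
    LEFT JOIN T3 ON COALESCE(T1.K, T2.K) = T3.K""").fetchall()

print(chained)
print(coalesced)
```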
After the lead table is processed in one-sided natural joins as in Figure 4.4,
the join order of the other tables can be changed without affecting the result.
This means that the first statement establishes the result, making the natural
one-sided join nonassociative. This is proven in Figure 4.5, which demonstrates
that changing the join order of a natural join can produce a different result.
rows. This is because with coalesced data, there is always a non-null key avail-
able to match on, reducing the generation of null data and creating a predict-
able result. The examples in Figure 4.6 demonstrate this effect.
The standard FULL join shown at the top of Figure 4.6 is not a natural
join. Because of this, it is difficult to predict the order that the rows will be
combined in, as shown in the first example. Using the explicit or implicit natu-
ral FULL join in the second example in Figure 4.6, the rows are condensed,
more predictable, and easier to process, because with the NATURAL option
there is always a fixed key position available to match on. Notice also that the
result rows of the natural FULL join, excluding nulls, contain the same data as
the standard FULL join. This, as explained above, is because no data is lost
with a FULL join. Because of this condensing effect, the natural FULL join is
associative in operation (except for the special situation concerning explicit
natural joins documented in Section 4.2).
Figure 4.6 Natural FULL join producing the most condensed result.
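The condensing effect shows up on three one-row tables. The sketch below is in pure Python, and the data and the helper `full_join` are hypothetical. The helper performs a FULL OUTER JOIN of dict rows on a pair of key functions, and None keys never match. The standard chain links T3 through T2's key only. The coalesced chain links T3 through the first non-null key, which simulates what a natural join's coalesced key column provides.

```python
def full_join(left, right, left_key, right_key):
    """FULL OUTER JOIN of lists of dict rows; a None key never matches."""
    out, matched = [], set()
    for l in left:
        hit = False
        for i, r in enumerate(right):
            k = left_key(l)
            if k is not None and k == right_key(r):
                out.append({**l, **r})
                matched.add(i)
                hit = True
        if not hit:
            out.append(dict(l))
    out += [dict(r) for i, r in enumerate(right) if i not in matched]
    return out


# Hypothetical tables, each with its own key column.
T1, T2, T3 = [{"K1": 1}], [{"K2": 2}], [{"K3": 1}]
t12 = full_join(T1, T2, lambda r: r["K1"], lambda r: r["K2"])

# Standard chain: the second ON clause compares T2's key only.
standard = full_join(t12, T3, lambda r: r.get("K2"), lambda r: r["K3"])


def coalesced_key(r):
    # First non-null key, as a natural join's coalesced column would hold.
    return r.get("K1") if r.get("K1") is not None else r.get("K2")


natural = full_join(t12, T3, coalesced_key, lambda r: r["K3"])


def rows(result):
    return {(r.get("K1"), r.get("K2"), r.get("K3")) for r in result}


print(len(standard), len(natural))
```

The standard chain yields three sparse rows. The coalesced chain yields two rows that carry the same non-null key values.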
Since the natural join produces the most condensed result, it follows
that the natural FULL join can be reordered in any manner without
changing the result. This is demonstrated in Figure 4.7. There is another
reason for this behavior, which applies here and in the inner join example in
Figure 4.8. The natural FULL join and natural inner join are both commuta-
tive and associative in operation. By applying both these properties together,
the SQL statement can be completely reordered in any fashion without chang-
ing the result.
be introduced into the result from missing rows because this condition causes
the entire row to be eliminated.
The natural inner join examples in Figure 4.8 demonstrate that the natu-
ral inner join can be completely reordered and it will not change the result.
This behavior includes associativity. Because rows are so easily eliminated with
inner joins, the example data was increased in this example from the previous
examples to derive a result; otherwise, the inner joins in these examples would
have produced empty results.
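Complete reordering of a natural inner join can be checked by running every table order. The sketch assumes SQLite and three hypothetical tables that share a column K. Each result is normalized to the column order (K, A, B, C), so the results of different orderings can be compared directly. All six orderings are expected to return the same rows, namely the keys present in every table.

```python
import sqlite3
from itertools import permutations

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (K INTEGER, A TEXT);
    CREATE TABLE T2 (K INTEGER, B TEXT);
    CREATE TABLE T3 (K INTEGER, C TEXT);
    INSERT INTO T1 VALUES (1, 'a1'), (2, 'a2'), (4, 'a4');
    INSERT INTO T2 VALUES (1, 'b1'), (2, 'b2'), (3, 'b3');
    INSERT INTO T3 VALUES (1, 'c1'), (2, 'c2'), (3, 'c3');
""")


def run(order):
    sql = "SELECT * FROM {} NATURAL JOIN {} NATURAL JOIN {}".format(*order)
    cur = con.execute(sql)
    names = [d[0] for d in cur.description]
    # Map each row to (K, A, B, C) regardless of the table order used.
    return frozenset(tuple(dict(zip(names, row))[c] for c in "KABC") for row in cur)


results = {order: run(order) for order in permutations(["T1", "T2", "T3"])}
print(len(set(results.values())))  # number of distinct results across orderings
```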
All the above SQL statements produce: Key2 T2A T2B T2C
statement, or vice versa. Explicit and implicit natural joins can also be inter-
mixed. Intermixing of natural join types is nonassociative. An example of this is
shown in Figure 4.9.
The fact that this natural one-sided outer join transformation is possible
also points out that the natural feature for one-sided outer joins does not offer
any additional capabilities beyond the one-sided outer join operation. This
means it can be avoided by using the more intuitive non-natural one-sided
outer join.
4.8 Conclusion
The NATURAL join option takes on new meaning with outer joins because it
can significantly affect the results of outer joins. This occurs when more than
two tables are natural outer joined across a commonly named column. The
natural outer join operation guarantees that there is always a coalesced key col-
umn value available to join with any of the following tables to be joined. This
changes the operation of one-sided outer joins and FULL outer joins. With
one-sided outer joins, it can cause more data to be preserved and change their
operation to be nonassociative. With FULL outer joins, the NATURAL option
can produce more condensed and predictable results having fewer rows while
containing the same data, and it remains associative in operation except for one
case: explicit natural joins can behave nonassociatively when the tables do not
all have the same commonly named columns consistently across the natural join.
Part II
Outer Join Data Modeling and
Structured Processing
Part II documents in detail the inherent data modeling and structure-
processing capabilities of the standard SQL outer join operation. These are
capabilities that outer join users can utilize immediately. Chapter 5 supplies a
background in data modeling and data structure processing. Chapter 6 shows
in detail how the standard SQL outer join operation can perform complex data
modeling. Chapter 7 introduces new data modeling–related features. Chapter 8
supplies further information on the outer join’s data modeling capabilities.
5
Data Structure Review
Working with SQL and its lack of data modeling, relational database profes-
sionals may have a tendency to forget about data structures and their inherent
capabilities. This chapter serves as a short review on data structures, data mod-
eling, and data structure processing necessary to understand the outer join’s
data modeling and structure-processing capabilities identified and demon-
strated in this book.
employee and his or her dependents cannot exist if they are not associated with
a department (i.e., Bill is missing). This is not the case in the Employee view,
which has the opposite semantics that prevent department DeptC from existing
since it has no employees associated with it. This situation is possible if an
entire department is outsourced. In the Department view, DeptC can still exist
and can have a budget and other information associated with it.
Ignoring which fields are present and their column order in Figure 5.1,
notice that the Department and Employee views’ data appear to handle repli-
cated data differently. Hierarchical higher level values control (or own) lower
level values, as shown in both data view displays. Most obvious is that repli-
cated data is totally eliminated in the Department view. To represent this in the
data display, a blank field means that the last value printed in that column is
still valid (unless a dash appears, which means the value is missing). Replication
of the department name is not necessary since any given department can have
many employees in this view and shouldn’t need repeating for each employee
occurrence. The structured output represents the actual data in the view. This
is WYSIWYG (“what you see is what you get”) display processing based on the
semantics of the data structure.
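The display convention just described (a blank means the last printed value still applies, and a dash marks a missing value) can be sketched as follows. The sketch assumes the rows are already in hierarchical (sorted) order. The helper `structured_display` is hypothetical. The sample rows reuse names from the text (DeptA, DeptC, Mike, Mary, Jason, Jane, Sam) but are otherwise illustrative. A value is blanked only when it and every value to its left repeat the previous row.

```python
def structured_display(rows):
    """Blank a value when it and all values to its left repeat the previous
    row; show missing (None) values as '-'. Rows must be hierarchically sorted."""
    display, prev = [], None
    for row in rows:
        same = prev is not None
        cells = []
        for i, value in enumerate(row):
            same = same and value == prev[i]
            cells.append("" if same else ("-" if value is None else value))
        display.append(cells)
        prev = row
    return display


# Department view (Dept, Emp, Dpnd) and Employee view (Emp, Dept, Dpnd).
dept_view = [("DeptA", "Mike", "Jason"), ("DeptA", "Mike", "Jane"),
             ("DeptA", "Mary", "Sam"), ("DeptC", None, None)]
emp_view = [("Mike", "DeptA", "Jason"), ("Mike", "DeptA", "Jane"),
            ("Mary", "DeptA", "Sam")]

for view in (dept_view, emp_view):
    for cells in structured_display(view):
        print("  ".join(f"{c:<6}" for c in cells))
    print()
```

In the Employee view, DeptA is printed again when Mary starts a new employee occurrence. Mike's second dependent does not repeat it.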
Over in the Employee view in Figure 5.1, you will notice that DeptA is
replicated when the next employee, Mary, is introduced in the display. This fol-
lows the semantics of the Employee view, where the Employee segment is
hierarchically over the Department segment, so that each employee has its own
department occurrence. This view’s WYSIWYG display is also valid, showing the
correct replication (notice that employee Mike, with two dependents, did not
cause a replication). Knowledge of the data structure will further improve the
usefulness and application of this intuitively formatted data.
Figure 5.1 Two application views with the same relationships and their data.
The data displays of the Department and Employee views in Figure 5.1
represent the semantics of their data structures. For example, if you were to
divide both views’ data into separate structured records, using the root value as
the record key, each view would still reflect the same data value occurrence
counts (cardinality) shown. This verifies that the controlled replicated values are
correct.
Most query languages that operate on hierarchical structures are self-navi-
gating, following the data structure, and are controlled by the semantics of the
data structure. This makes them intuitive and powerful. They follow rules
based on parentage and sibling segment (multileg) operation derived from the
hierarchical semantics. Parentage rules can affect processing by controlling
internal looping ranges. Sibling segments are different data paths directly under
the same (common) parent, such as the Department and Dependent paths in
the Employee view in Figure 5.1. The segment occurrences in each of the paths
do not correspond in a one-to-one fashion; they are related only by their com-
mon parent—in this case, Employee—and are otherwise independent of each
other. The left-to-right positioning of segments under a common parent is not
significant. In the Employee view in Figure 5.1, the Dependent and Depart-
ment segments could be reversed without changing the semantics or results.
Combining the above fourth-generation semantics with the Employee
view in Figure 5.1, for example, data selection based on a given department
value from the Department leg and displaying dependents from the Dependent
leg will select all dependents under the active common parent Employee. Using
the Employee view in Figure 5.1, SELECT Dpnd FROM EmployeeView
WHERE Dept=“DeptA”, will in this case display all dependents—Jason, Jane,
and Sam—from department DeptA. This query works by satisfying the selec-
tion criteria to determine the active common parent(s): Mike and Mary from
the Employee table, which controls the range of selected data; Jason and Jane
under Mike; and Sam under Mary. This cycle is repeated until all selection cri-
teria in the database have been tested.
Figure 5.2. These three levels allow for a much greater level of database flexibil-
ity than if they were not used. Unfortunately, relational databases do not inher-
ently support this, but by following good database design, it can be supported
externally.
Figure 5.3 Conceptual view that encompasses the Department and Employee views.
mapped to one another. The conceptual view logically separates the external
and internal structures, allowing the internal view to change without changing
the external views, and allows the external views to change without changing
the internal view. This adds greatly to the data and structure independence,
database flexibility, and reduced maintenance requirements.
(Figure panels: One-to-Many Relationship; Many-to-One Relationship; Parts-Suppliers M to M Relationship with an Association Table.)
related to each other on a row-by-row basis and all combinations of the rows
are necessary to simulate independent processing of the legs so they can be
accessed in any order or combination.
In Figure 5.8, we can see how the Cartesian product effect can explode
the join result when one-to-many relationships cause multiple keys to match in
both tables, such as Key1 in this example. This exploded result becomes neces-
sary because standard relational data is forced into using flat two-dimensional
tables, so the result table as shown above has to be exploded to hold the results.
This becomes particularly important in selecting or filtering data based on data
from two or more tables, as in the WHERE clause of WHERE Alpha=“B” AND
Numeric=1 applied to the data result in Figure 5.8. Locating the table row
result with an Alpha value of B and a Numeric value of 1 requires exploding
the result rather than joining the tables in a simple parallel join method,
which would not produce a row with these values since they are on different
occurrences.
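The explosion and its use for filtering can be reproduced with two small tables. The sketch assumes SQLite. The table names AlphaTab and NumericTab and the key column K are hypothetical, because the Figure 5.8 data is not reproduced here. Two rows match Key1 on each side, so the join yields 2 x 2 = 4 rows, and exactly one of them satisfies Alpha='B' AND Numeric=1. A parallel first-with-first pairing of the same values has no such row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE AlphaTab   (K TEXT, Alpha TEXT);
    CREATE TABLE NumericTab (K TEXT, Numeric INTEGER);
    INSERT INTO AlphaTab   VALUES ('Key1', 'A'), ('Key1', 'B');
    INSERT INTO NumericTab VALUES ('Key1', 1), ('Key1', 2);
""")

# Every Alpha row for Key1 pairs with every Numeric row for Key1.
exploded = con.execute("""
    SELECT a.K, Alpha, Numeric
    FROM AlphaTab AS a JOIN NumericTab AS n ON a.K = n.K""").fetchall()

hits = con.execute("""
    SELECT a.K, Alpha, Numeric
    FROM AlphaTab AS a JOIN NumericTab AS n ON a.K = n.K
    WHERE Alpha = 'B' AND Numeric = 1""").fetchall()

# Parallel pairing (first with first, second with second) for contrast.
parallel = list(zip(["A", "B"], [1, 2]))
parallel_hits = [p for p in parallel if p == ("B", 1)]

print(len(exploded), hits, parallel_hits)
```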
Applying this Cartesian product effect to the joining of the Department,
Employee, and Dependent tables produces a flat, tabular SQL table structure,
as shown in Figure 5.9.
Notice that with the flattened first normal form structure in Figure 5.9,
the same hierarchical processing as was used in Section 5.1 is achieved by pro-
cessing each row one at a time. No looping or navigation is necessary since all
combinations have been generated and exist in the rows. This means that the
same query used for hierarchical access in Section 5.1 can be used in this case to
achieve the same data results with the flattened structure shown in Figure 5.9.
This query was SELECT Dpnd FROM DeptEmp WHERE Dept=“DeptA”,
which will display all dependents—Jason, Jane, and Sam—from department
DeptA. While this example produces the same results as the identical query in
Section 5.1, flat structures like the one in Figure 5.9 will often produce repli-
cated data in the result. This is the result of the replicated data introduced into
the creation of the flat structure as described in Chapter 1 and shown in
Figure 5.9. This can be seen in the query SELECT Dept FROM DeptEmp
WHERE Dept=“DeptA”, which when applied to the data in Figure 5.9 will rep-
licate the value DeptA three times—once for each row that is present.
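Both queries can be run against a flattened view built from one-sided joins. In the sketch below, SQLite is assumed and the rows are hypothetical, reusing the names from the text. The view name DeptEmp comes from the text, but its column names Dept, Emp and Dpnd are assumptions. The first query returns Jason, Jane and Sam. The second returns DeptA once for each flattened row, which is the replication described above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent  (Dpnd TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptC');
    INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA');
    INSERT INTO Dependent  VALUES ('Jason', 'Mike'), ('Jane', 'Mike'), ('Sam', 'Mary');
    -- Flattened (first normal form) view of Department > Employee > Dependent.
    CREATE VIEW DeptEmp AS
        SELECT DeptNo AS Dept, EmpNo AS Emp, Dpnd
        FROM Department
        LEFT JOIN Employee  ON DeptNo = EmpDeptNo
        LEFT JOIN Dependent ON EmpNo  = DpndEmpNo;
""")

dependents = [r[0] for r in con.execute(
    "SELECT Dpnd FROM DeptEmp WHERE Dept = 'DeptA' ORDER BY Dpnd")]
depts = [r[0] for r in con.execute(
    "SELECT Dept FROM DeptEmp WHERE Dept = 'DeptA'")]

print(dependents)
print(depts)  # DeptA replicated once per flattened row
```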
This does not happen in physical views, which always represent their actual
structure correctly. This is shown in Figure 5.10, where there are two employ-
ees with the same name in the same department, but this fact is lost in the logi-
cal database view because the structure is determined by data. While this error
could be corrected by taking the count using a unique key, the fact is that the
physical data view is not subject to this error situation.
Figure 5.10 Physical and logical views can produce different results.
SELECT Div, Prod, Dept
FROM DivisionView
WHERE Dept='DeptY'
AND Prod='ProdA'
Result (shown in both displays): Div1 ProdA DeptY
one entry in the Product leg is ProdA and at least one occurrence in the Depart-
ment leg is DeptY. This example also selects the data that is included in the
qualification criteria, so this data is also filtered. This means that only values
ProdA and DeptY are selected from their respective common parent Div1.
Notice how the Cartesian product model can support this processing one row
at a time as performed by relational processing. If the AND operator in the
WHERE clause were changed to an OR operator, the Cartesian product processing
would select rows with a Product value of ProdA or rows with a Department
value of DeptY. This produces the correct semantics even though
replicated values are also produced because of the Cartesian product effect. This
is shown in Figure 5.13.
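The AND and OR results can be reproduced from the values in the figures. Division Div1 has Product values ProdA and ProdB and Department values DeptX and DeptY. The sketch below assumes SQLite, and its table and link-column names (Division, Product, Department, ProdDiv, DeptDiv) are hypothetical. Both legs are linked to Division, so the view is the Cartesian product of the two sibling legs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Division   (Div TEXT);
    CREATE TABLE Product    (Prod TEXT, ProdDiv TEXT);
    CREATE TABLE Department (Dept TEXT, DeptDiv TEXT);
    INSERT INTO Division   VALUES ('Div1');
    INSERT INTO Product    VALUES ('ProdA', 'Div1'), ('ProdB', 'Div1');
    INSERT INTO Department VALUES ('DeptX', 'Div1'), ('DeptY', 'Div1');
    -- Both legs hang off Division, so their rows combine as a Cartesian product.
    CREATE VIEW DivisionView AS
        SELECT Div, Prod, Dept
        FROM Division
        LEFT JOIN Product    ON Div = ProdDiv
        LEFT JOIN Department ON Div = DeptDiv;
""")

and_rows = con.execute("SELECT Div, Prod, Dept FROM DivisionView "
                       "WHERE Dept = 'DeptY' AND Prod = 'ProdA'").fetchall()
or_rows = set(con.execute("SELECT Div, Prod, Dept FROM DivisionView "
                          "WHERE Dept = 'DeptY' OR Prod = 'ProdA'"))

print(and_rows)
print(sorted(or_rows))
```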
As an important point on semantics, both conditions of an OR operation,
as in the SQL from Figure 5.13, have to be tested even if the first condition
tests true. In this query, the first selection condition, Dept=“DeptY”, is true, but
the outcome is still affected by the second selection condition, Prod=“ProdA”,
SELECT Div, Prod, Dept
FROM DivisionView
WHERE Dept='DeptY'
OR Prod='ProdA'
Result (first display):
Div1 ProdA DeptX
     ProdB DeptY
Result (second display):
Div1 ProdA DeptX
Div1 ProdA DeptY
Div1 ProdB DeptY
(Diagram: Structured Record N, containing Segment X occurrence N with its Segment Y occurrences and Segment Z occurrences.)
segment occurrences represent the rows of the tables as shown in Figure 5.16.
These structured files can be supported directly by COBOL and by other struc-
tured languages by using an interface (with some variable segment occurrence
limitations). More detailed information on structured records can be found in
Chapter 14.
(Diagram: Structured Record N, containing Table X row N with its Table Y rows and Table Z rows.)
Second normal form does not permit any partial key dependencies. A
nonkey field (column or attribute) must not be functionally dependent on a
field that is only part of the primary key. In other words, every nonkey field is
fully dependent on the primary key. Third normal form requires every nonkey
field to be nontransitively dependent on the primary key. This means all fields
are directly dependent on the primary key. To correct these potential design
problems, the offending fields should be moved into another table or segment
where they obey these database design rules.
These basic normalization rules may not be enough to satisfy a good data-
base design. Improper database design could still produce a condition known as
lossy decomposition, introduced from the basic normalization process that
breaks tables apart. Imagine breaking a table into two tables based on ZIP code
instead of account number. When these tables are reconstructed by a join oper-
ation, this condition introduces additional extraneous rows that were not in the
original table. This has the effect of obfuscating the semantics of the valid rows,
resulting in a loss of information. To solve this problem, a lossless join property
is needed that can be supplied by advanced normal forms, known as
Boyce-Codd normal form, fourth normal form, and fifth normal form. The
first three basic normal forms explained above removed dependencies. In these
advanced normal forms, advanced dependencies that rely on superkeys are used
to support lossless joins. A superkey is any set of columns whose values
uniquely identify a row. Splitting one table into two is lossless when the
columns the two tables share form a superkey of at least one of them. This
eliminates the introduction of extraneous data when tables are joined.
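The ZIP code example can be made concrete with Python sets standing in for tables. The two accounts below are hypothetical, and they share a ZIP code. When the table is split so that ZIP is the only shared column, the rejoin pairs every account with every name in that ZIP and produces extraneous rows. When it is split on the account number, which is a key, the rejoin reproduces the original table exactly.

```python
# Hypothetical rows: (AcctNo, Name, Zip); both accounts share one ZIP code.
accounts = {(101, "Ann", "10001"), (102, "Bob", "10001")}

# Lossy split: the only column the two pieces share is Zip (not a key).
acct_zip = {(a, z) for a, _, z in accounts}
name_zip = {(n, z) for _, n, z in accounts}
rejoined_on_zip = {(a, n, z) for a, z in acct_zip for n, z2 in name_zip if z == z2}

# Lossless split: the shared column is the account number key.
acct_name = {(a, n) for a, n, _ in accounts}
rejoined_on_acct = {(a, n, z) for a, n in acct_name for a2, z in acct_zip if a == a2}

print(sorted(rejoined_on_zip - accounts))  # extraneous rows from the lossy join
print(rejoined_on_acct == accounts)
```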
5.14 Conclusion
This chapter has identified and discussed the elements involved with data mod-
eling. These were three-tier database architecture with its application views and
conceptual model; data relationships such as one-to-many, many-to-one, and
many-to-many; data structures such as hierarchical and network; data structure
processing as it relates to relational processing; the semantics of multileg data
structures; and good database design principles.
Network structures are necessary for the definition of the conceptual data
model, which needs the ability to define many different data views for the same
database (tables). However, if network data structures are used as application
views, there can be problems because data values in the structure can be
reached from multiple paths, making the view ambiguous. This allows invalid
assumptions to be made by nonprocedural languages. This is not true of hierar-
chical data structures, which are singular in meaning. This makes their seman-
tics very powerful in the nonprocedural processing of data structures.
(Department view: Department over Employee over Dependent. Employee view: Employee over Department and Dependent.)
Figure 6.1 Different outer join data structures comprised of the same relationships.
Figure 6.1. With the basic modeling capabilities shown in these data structures,
any hierarchical data structure can be modeled.
The relationships depicted in the Department view in Figure 6.1 are
one-to-many. One department has many employees, and one employee can
have many dependents. In the Employee view in Figure 6.1, the department to
employee one-to-many relationship shown in the Department view has been
flipped around to define an employee to department many-to-one relationship.
Both of the structures in Figure 6.1 use the same tables and the same rela-
tionships to derive different structures with different semantics. This is shown
in the differing query results in Figure 6.1 where department DeptC with no
employees can’t exist in the Employee view, and employee Bill can’t exist in the
Department view because he has no department designation. What triggers this
difference? Since the join relationships are identical, it wasn’t directly any of the
ON clauses. It was the initial LEFT join that reversed the Department and
Employee table arguments from the Department view, putting Employee over
Department. This in effect transformed the structure into the multileg struc-
ture shown in the Employee view in Figure 6.1. This is because the Employee
table is now hierarchically above the Department and Dependent tables and is
directly related to both of them through their ON clauses. This demonstrates
that ON clauses also matter, because they control the link (join) points
between the data structures.
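The two views in Figure 6.1 can be compared by running both with identical ON clauses. This sketch assumes SQLite, and the rows are hypothetical: DeptC has no employees, and Bill has no department but has a dependent, Ann. Only the argument order of the initial LEFT join differs between the two queries. DeptC appears only in the Department view. In the Employee view, Bill and Ann appear with a missing department, because Employee is now the parent of two independent legs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent  (Dpnd TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptC');
    INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Bill', NULL);
    INSERT INTO Dependent  VALUES ('Jason', 'Mike'), ('Ann', 'Bill');
""")

# Department view: Department over Employee over Dependent.
department_view = set(con.execute("""
    SELECT DeptNo, EmpNo, Dpnd FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo  = DpndEmpNo"""))

# Employee view: same ON clauses, first two arguments swapped.
employee_view = set(con.execute("""
    SELECT DeptNo, EmpNo, Dpnd FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent  ON EmpNo  = DpndEmpNo"""))

print(department_view)
print(employee_view)
```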
(Figure panels: DeptEmp View; Resulting Structure.)
Figure 6.3 Example of breaking link rule three to build a hierarchical structure.
below). This is necessary to specify a complete path from the upper structure’s
link point to the lower structure’s link point. The link point is a specific table in
the upper and lower structures determined by the specification of the ON
clause join condition that joins (or links) the upper and lower structures. The
determination of the link points is specified in the second and third ON clause
join condition rules described directly below.
The second rule applies to the top structure. In the top structure, only a
single path can be referenced from the link point up the path to the root.
Referencing multiple paths using AND or OR operators creates an ambiguous
network structure, as demonstrated in the network structure in Figure 6.2. When
using AND and OR conditions in the ON clause, OR clauses create subclauses
that can consist of AND operations. When referencing multiple locations along
a path in the upper level structure, the lowest table referenced in each OR
subcondition becomes the link point, and the link point in each OR subclause
must specify the same link point table; otherwise, a network or illogical struc-
ture is created. When the link point in the upper level structure is not the low-
est level point on its path, a new leg of the structure is created. This can be seen
in Figure 6.1 when in the Employee view the Dependent table is joined to the
Employee table, forming a multileg structure.
The third and final rule applies to the bottom structure. In the bottom
structure, only the root (top) table can be referenced. This is necessary to
preserve the top-down processing of hierarchical structures that is normally
expected. While breaking this rule may limit some of the advantages of a strict
hierarchy, it is possible to link to a lower level structure based on table columns
below the root of the lower structure. Regardless of which table or tables
are referenced below the root, the root table should still be treated as the
bottom structure link point, as demonstrated in Figure 6.3. The exact se-
mantics of this unconventional hierarchical structure will be covered in
Chapter 15, but up until then this text assumes that the third linking rule is
always obeyed.
Figure 6.4 The difference between OR and AND operators when linking structures.
behavior can be considered illogical. Again, this does not mean that the
semantics of this structure have no possible use.
The second ON clause for the hierarchical structure in Figure 6.5 demon-
strates how the OR operator can be used to specify a choice of two OR
subconditions because each OR subcondition isolates the same two link points:
tables B and C. The reference to table A in the upper structure is disregarded in
determining the link point since table B is at a lower level. This example also
demonstrates that the join condition does not always have to compare two
columns directly to each other (e.g., C='X' AND B='Y'). The link can be satisfied
as long as each subclause references a table from each structure and satisfies the
join condition rules described in Section 6.2.
substructures, the same rules apply as those defined earlier in this chapter for
building structures. In particular, the ON clause rules in Section 6.2 must be
followed.
As mentioned in Chapter 2, right-sided nesting is required to support
stored structured views, or more precisely the ability of the outer join syntax to
support the simultaneous building and handling of multiple data structures.
Take for example: (A LEFT JOIN B ON A=B) LEFT JOIN (C LEFT JOIN D
ON C=D) ON B=C. The parentheses have been added to make the outer join
statement clearer, but are unnecessary since the join order is controlled by the
placement of the ON clauses (see Chapter 2). The join operations in
parentheses are performed first, forming separate structures, each stored in a
different working set, before they are combined into one structure by the last,
rightmost ON clause. The LEFT join operations enclosed in the parentheses
can be thought of as two stored structured views that have been expanded into
their representative SQL when inline expansion is used by the SQL system.
When the inline expansion of the stored structured views occurred in the
above SQL, notice what happened to the rightmost ON clause. It got pushed
to the right, causing right-sided nesting. Fortunately, the standard SQL syntax
handles this situation properly to support inline expansion. With stored struc-
tured views, this right-sided nesting occurs transparently, so the SQL program-
mer need not normally be concerned with right-sided nesting. The
transparency of this operation is demonstrated in Figure 6.8.
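The transparency of this expansion can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module with hypothetical sample data shaped like Figure 6.8; the column names EmpName and DpndName and the extra department DeptB are assumptions. It compares the Department structure built over a stored EmpView with the same query written with the view expanded inline, keeping explicit parentheses around the nested join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Employee   (EmpNo INTEGER, EmpName TEXT, EmpDeptNo TEXT);
CREATE TABLE Dependent  (DpndEmpNo INTEGER, DpndName TEXT);
INSERT INTO Department VALUES ('DeptA'), ('DeptB');
INSERT INTO Employee   VALUES (1, 'Mike', 'DeptA'), (2, 'Mary', 'DeptA');
INSERT INTO Dependent  VALUES (1, 'Jason'), (1, 'Jane'), (2, 'Sam');

-- Stored structured view: Employee over Dependent.
CREATE VIEW EmpView AS
  SELECT * FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
""")

# Department placed over the stored view.
with_view = con.execute("""
  SELECT EmpName, DeptNo, DpndName
  FROM Department LEFT JOIN EmpView ON DeptNo = EmpDeptNo
  ORDER BY DeptNo, EmpName, DpndName""").fetchall()

# The same query with the view expanded inline: the nested join comes first
# and the outer ON clause is pushed to the right (right-sided nesting).
expanded = con.execute("""
  SELECT EmpName, DeptNo, DpndName
  FROM Department
  LEFT JOIN (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo)
  ON DeptNo = EmpDeptNo
  ORDER BY DeptNo, EmpName, DpndName""").fetchall()

for row in expanded:
    print(row)
```

Both queries return the rows of Figure 6.8 (Mike with Jason and Jane, Mary with Sam) plus a preserved DeptB row with no employee data.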
The Department view’s SQL in Figure 6.8 demonstrates how the embedded
subview EmpView is expanded to define the Department data structure.
While the semantics of the expanded Department SQL are the same as the
depicted Department structure, the joins are now performed from the bottom
up instead of from the top down. The semantics remain the same because, as
was described in Chapter 3, hierarchical structures can be built up, down, or in
any order without changing their semantics. There is one caveat when building
a structure upwards: when the ON clause references a field further up the
structure than the upper link point, the upper level structure must already
contain every table that the ON clause references at the time of the join. This
should not present a problem for stored views since they should only be
referencing columns in their own view domain.
EmpView View
DEFINE EmpView AS
SELECT * FROM Employee
LEFT JOIN Dependent
ON EmpNo=DpndEmpNo

Department View
Expanded View:
SELECT * FROM Department
LEFT JOIN
(Employee LEFT JOIN Dependent
ON EmpNo=DpndEmpNo)
ON DeptNo=EmpDeptNo

Produces:
Emp   Dept   Dpnd
Mike  DeptA  Jason
Mike  DeptA  Jane
Mary  DeptA  Sam
EmpView View
DEFINE EmpView AS
SELECT * FROM Employee
LEFT JOIN Dependent
ON EmpNo=DpndEmpNo
WHERE EmpAge>55

Department View
Expanded View:
SELECT * FROM Department
LEFT JOIN
Employee LEFT JOIN Dependent
ON EmpNo=DpndEmpNo
ON DeptNo=EmpDeptNo
AND EmpAge>55
moved to the ON clause that controls the linking of this substructure when it is
processed. This transformation allows the substructure to be integrated
seamlessly into the overall structure and allows the substructure to be
processed in top-to-bottom order. This is also shown in Figure 6.10.
In Figure 6.10, moving the WHERE clause data filter of the subview
higher up to the ON clause of the join that controls linking the subview works
because the filtering applies to the total subview, just as the WHERE clause
would have.
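The equivalence between the subview's WHERE clause and the ON clause that links the subview can be tested the same way. In this minimal sketch (Python sqlite3; the rows and the EmpName and DpndName columns are hypothetical), the filter EmpAge>55 is applied once inside the view and once in the linking ON clause. In both cases, DeptB is preserved even though none of its employees qualifies.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Employee   (EmpNo INTEGER, EmpName TEXT, EmpAge INTEGER, EmpDeptNo TEXT);
CREATE TABLE Dependent  (DpndEmpNo INTEGER, DpndName TEXT);
INSERT INTO Department VALUES ('DeptA'), ('DeptB');
INSERT INTO Employee   VALUES (1, 'Mike', 60, 'DeptA'), (2, 'Mary', 40, 'DeptA'),
                              (3, 'Bill', 35, 'DeptB');
INSERT INTO Dependent  VALUES (1, 'Jason'), (1, 'Jane'), (2, 'Sam');

-- Subview carrying its own WHERE filter, as in Figure 6.9.
CREATE VIEW EmpView AS
  SELECT * FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo
  WHERE EmpAge > 55;
""")

ORDER = " ORDER BY DeptNo, EmpName, DpndName"

via_view = con.execute(
    "SELECT DeptNo, EmpName, DpndName FROM Department "
    "LEFT JOIN EmpView ON DeptNo = EmpDeptNo" + ORDER).fetchall()

# Expanded form: the subview's WHERE filter becomes part of the ON clause
# that links the substructure, so it filters only that substructure.
via_on = con.execute(
    "SELECT DeptNo, EmpName, DpndName FROM Department "
    "LEFT JOIN (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo) "
    "ON DeptNo = EmpDeptNo AND EmpAge > 55" + ORDER).fetchall()

print(via_view == via_on)
```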
6.9 Conclusion
Building hierarchical data structures and processing them structurally is
possible with the one-sided outer join operation. Building hierarchical data
structures, or combining them, involves two operations. The first specifies
which structure is placed hierarchically over the other, and the second
specifies the path between the link points of the upper and lower structures.
The first operation is accomplished using a LEFT or RIGHT outer join that
places one structure hierarchically above the other, and the second operation,
specifying pathways, is performed by ON clauses. Both of these operations are
required to model hierarchical data structures. Data structures modeled in such
a fashion can still be filtered by the inclusion of a WHERE clause in the data
structure definition and/or view invocation.
Amazingly, the syntax of the standard SQL join operation naturally sup-
ports the use of substructure views as standard SQL views. These structured
subviews can be used anywhere a table can be specified to combine with other
structures to form larger data structures. These substructure views can also be
embedded in other structure views.
Also shown in this chapter was the capability of the outer join operation
to create ambiguous network data structures and illogical structures. While
these structures do not have the same powerful semantics as hierarchical data
structures, they may still be useful in certain specialized situations.
Unfortunately, when these structures are used, it is usually by accident.
Knowing how to construct hierarchical structures can also prevent ambiguous
and illogical structures from being built unintentionally.
7
Outer Join Data Modeling–Related Capabilities
This chapter covers powerful capabilities and features that inherently accom-
pany or enhance the standard SQL outer join data modeling capability. For this
reason, they are automatically available for database professionals to use if they
know that they exist and how to use them.
criteria. Basically, this means that any tables referenced by the ON clause filter-
ing criteria must be limited to the root of the lower level structure or any tables
from the link point up the path to the root. In this way, the data filtering crite-
ria cannot inadvertently affect the link points that would change the structure
being modeled and its semantics.
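A sketch of ON clause filtering that stays within these limits, modeled on the first query of Figure 8.6 (Python sqlite3; the sample rows and the DeptName column are hypothetical). The EmpStat filter references Employee, the upper link point, so it only decides whether Department data is attached to an employee. Moving the same condition to the WHERE clause filters the whole result instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo TEXT, DeptName TEXT);
CREATE TABLE Employee   (EmpNo INTEGER, EmpName TEXT, EmpStat TEXT, EmpDeptNo TEXT);
INSERT INTO Department VALUES ('D1', 'Sales');
INSERT INTO Employee   VALUES (1, 'Mike', 'Full', 'D1'), (2, 'Mary', 'Part', 'D1');
""")

# Employee over Department. The ON clause filter only controls linking:
# every Employee row is preserved, Mary simply gets no Department data.
on_filter = con.execute("""
  SELECT EmpName, DeptName
  FROM Employee LEFT JOIN Department
       ON DeptNo = EmpDeptNo AND EmpStat = 'Full'
  ORDER BY EmpName""").fetchall()

# The same condition in WHERE filters the finished result and drops Mary.
where_filter = con.execute("""
  SELECT EmpName, DeptName
  FROM Employee LEFT JOIN Department ON DeptNo = EmpDeptNo
  WHERE EmpStat = 'Full'
  ORDER BY EmpName""").fetchall()

print(on_filter)
print(where_filter)
```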
separate Employee tables that would be useful if FULL outer or inner joined
and placed into a hierarchical structure as a single logical table.
Logical tables can be created as temporary tables in a previous step and
introduced into the structure. Unfortunately, these temporary tables cannot
take advantage of the semantic capabilities of hierarchical structures. For exam-
ple, the optimizations covered in Chapter 11 would not be able to optimize
the joins performed in a previous step. But performing inline nonhierarchical
joins while building a hierarchical structure can invalidate the structure, turn-
ing it into a nonhierarchical structure with unstable application semantics,
as described in Chapter 5. Such a nonhierarchical structure is defined in
Figure 7.3 from a combination of LEFT and FULL joins.
In Figure 7.3, EmpY becomes a second entry point in the data structure,
invalidating the hierarchical data structure. If an inner join were used instead of
the FULL outer join, it could also cause the removal of the Dept segment,
which is logically above it.
There turns out to be a solution to the problem of incorporating non-
hierarchical, symmetric join types into the hierarchical model being built. The
solution again rests with right-sided nesting, which was discussed in Chapter 6,
to support stored and embedded structured views. When left-sided nesting is
intermixed with right nesting, we also determined in Chapter 6 that multiple
separate structures were temporarily formed. When a new structure was cre-
ated, the current one being built was put on hold and sheltered from the effects
of joins to the active structure. This technique can be used to perform
nonhierarchical joins without invalidating the hierarchical structure(s) being
built. This is demonstrated in Figure 7.4.
The FULL outer join operation performed in Figure 7.4 is sheltered from
invalidating currently existing hierarchical structures because of the strategic
use of right-sided nesting. The FULL join operation that is highlighted in
Figure 7.4 is performed in isolation. In this example, the FULL outer join
could also have been an INNER or UNION join. These operations are
symmetrical, making their data modeling ability neutral in nature: both sides
carry equal data-preserving ability. This means these operations form a single,
flat logical object, like EmpX|EmpY in the diagram in Figure 7.4. This is why
this object can be viewed as a single logical table. These logical tables can be
composed of more than two tables by using left-sided nesting when building the
logical table. And finally, more than one logical table can be incorporated into
a hierarchical structure. These concepts are demonstrated in Figure 7.5.

Dept
  EmpX|EmpY (ISOLATED)
    Dpnd

SELECT * FROM Dept LEFT JOIN
EmpX FULL JOIN EmpY USING (EmpNo)
ON DeptNo=EmpDeptNo
LEFT JOIN Dpnd ON EmpNo=DpndEmpNo
When creating logical tables with the INNER or FULL join operation, it
is usually desirable to have one fixed key location per logical table. This can be
easily performed using the NATURAL or USING option, which was described
in Chapter 4. This is demonstrated in Figure 7.6. The parentheses are used for
readability in this example—they do not affect the join order.
As described in Chapter 4, the NATURAL option used with any type of
join operation will not allow the modeling of hierarchical data structures. But
when it is used with right-sided nesting, as shown in Figure 7.6, its
nonhierarchical operation with symmetric joins is also sheltered from the
hierarchical structure being built.
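The sheltering of a symmetric join can be sketched in SQLite through Python's sqlite3 module, under two stated assumptions. First, SQLite only added FULL OUTER JOIN in version 3.39.0, so the sketch emulates EmpX FULL JOIN EmpY with a LEFT JOIN plus UNION ALL. Second, both hypothetical Employee tables carry EmpDeptNo, which the logical table exposes as one coalesced column next to its single EmpNo key. The logical table is built in its own parenthesized subquery and only then placed under Dept.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dept (DeptNo TEXT);
CREATE TABLE EmpX (EmpNo INTEGER, EmpDeptNo TEXT, XName TEXT);
CREATE TABLE EmpY (EmpNo INTEGER, EmpDeptNo TEXT, YSkill TEXT);
CREATE TABLE Dpnd (DpndEmpNo INTEGER, DpndName TEXT);
INSERT INTO Dept VALUES ('D1'), ('D2');
INSERT INTO EmpX VALUES (1, 'D1', 'Mike'), (2, 'D1', 'Mary');
INSERT INTO EmpY VALUES (2, 'D1', 'SQL'), (3, 'D2', 'IMS');
INSERT INTO Dpnd VALUES (1, 'Jason'), (3, 'Sam');
""")

# Logical table EmpX|EmpY with one EmpNo key: emulated FULL outer join
# (matched and EmpX-only rows, then EmpY-only rows).
LOGICAL_EMP = """
  SELECT EmpX.EmpNo AS EmpNo,
         COALESCE(EmpX.EmpDeptNo, EmpY.EmpDeptNo) AS EmpDeptNo,
         XName, YSkill
  FROM EmpX LEFT JOIN EmpY ON EmpX.EmpNo = EmpY.EmpNo
  UNION ALL
  SELECT EmpNo, EmpDeptNo, NULL, YSkill
  FROM EmpY WHERE EmpNo NOT IN (SELECT EmpNo FROM EmpX)
"""

# The symmetric join is evaluated in isolation, then placed under Dept;
# Dpnd links through the single logical key.
rows = con.execute(f"""
  SELECT DeptNo, EmpXY.EmpNo, XName, YSkill, DpndName
  FROM Dept
  LEFT JOIN ({LOGICAL_EMP}) AS EmpXY ON DeptNo = EmpXY.EmpDeptNo
  LEFT JOIN Dpnd ON EmpXY.EmpNo = DpndEmpNo
  ORDER BY DeptNo, EmpXY.EmpNo""").fetchall()

for row in rows:
    print(row)
```

Each employee appears once, whether found in EmpX, EmpY, or both, and no row pairs data from two different employees.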
It is also possible to use a logical table as the root of a structure. This is
shown in Figure 7.7. In this example, the root logical table is not protected
by right-sided nesting because it is specified on the left side. Right-sided
nesting is not necessary in this case because the root logical table is defined first
in the SQL statement, so no other structure exists yet that could be affected.
The SQL example in Figure 7.7 also demonstrates by its complex use of AND
and OR operators that logical tables follow the same linking rules and
capabilities as standard tables.

Figure 7.5 Complex hybrid hierarchical structure with multiple logical tables.

SELECT * FROM
A FULL JOIN B ON A=B
FULL JOIN C ON B=C
LEFT JOIN
(D FULL JOIN E USING (Key))
ON C=E AND A=D OR B=E
The example in Figure 7.7 may raise some concerns that logical tables or
substructures in general, when specified on the left, may be subject to
interference from or cause interference to other structures, since they may come
into contact with them on their left side. If true, this would make their use
unpredictable or unstable, reducing their usefulness. This, however, is definitely not

DEFINE EmpAll AS
SELECT * FROM EmpWest
FULL JOIN EmpEast
USING (EmpNo)
which is accomplished by only referencing columns from the root tables for the
join criteria. This is demonstrated in Figure 7.10.
Figure 7.10 demonstrates two structures being FULL outer joined. As can
be seen in these examples, structures naturally form the proper protected
environment needed for nonhierarchical joins as described in Section 7.3. These
can be expanded views of data structures or structures built in place, which is
equivalent to the expanded structure views as shown in Figure 7.10. Also
shown in Figure 7.10 is the expanded SQL rewritten to be executed more
efficiently by avoiding throwaway tuples. This is accomplished by performing the
FULL outer join first, as shown.
While the nonhierarchical example in Figure 7.10 uses a FULL outer join
to link the data structures, it could have also been an inner join. While these
symmetric operations both produce the same valid hierarchical structure, the
resulting data content differs, as you would expect.
The inner join removes both structures being linked unless each has a
matching structure, while the FULL outer join preserves each data structure
even if it has no matching data structure.
Linking symmetrically at the root level does not invalidate the
hierarchical data structure. Applying nonhierarchical linking at structure levels
lower than their root produces nonhierarchical data structures. Inner joins
can cause data loss further up the data structure, which invalidates the data
structure, and a FULL outer join can cause only the lower structure to be
preserved, which also forms an invalid structure. These situations are both
avoided by joining the data substructures only at their roots. This is also
the most natural and common way to join two data structures symmetrically
(nonhierarchically).

SELECT * FROM
ViewA FULL JOIN ViewX ON A=X

Expanded:
SELECT * FROM
A LEFT JOIN B ON A=B
LEFT JOIN C ON A=C
FULL JOIN
X LEFT JOIN Y ON X=Y
LEFT JOIN Z ON Y=Z
ON A=X

Rewritten (equal):
SELECT * FROM
A FULL JOIN X ON A=X
LEFT JOIN B ON A=B
LEFT JOIN C ON A=C
LEFT JOIN Y ON X=Y
LEFT JOIN Z ON Y=Z
Single tables can also be nonhierarchically joined to data structures. Since
a single table is actually a data structure consisting of one table that is also
its root, it can be joined nonhierarchically to a multitable structure following
the same requirements stated above for joining data structures
nonhierarchically.
The capability to perform symmetric joins when modeling hierarchical
data structures is quite useful and an important feature for hierarchical data
modeling. Figure 7.11 demonstrates the usefulness of symmetric joins in mod-
eling hierarchical data structures. The first data structure in Figure 7.11 does
not use a symmetric join in modeling a structure with two Employee tables. It
uses the Department table to join the two Employee tables. This introduces a
number of problems, such as two separate Employee tables to access, with
possibly different employees in each. Another side effect is that joining the
Employee tables through their common department causes an unnecessary
data explosion, producing rows that pair data from different employees.
The second data structure and its defining SQL in Figure 7.11 solve the
problems introduced by the first data structure that were noted above. The
Employee tables are naturally FULL outer joined, preserving all data from both
tables and creating one unique key for each row result produced. This logical
table result is placed in the data structure hierarchically in the correct position
without invalidating the data structure. This correctly matches up the
Employee tables without exploding the data or generating extraneous,
incorrectly matched employee rows while still correctly organizing the
employees under their department. This also allows the joining of the
Dependent and Project tables to the structure by a match from either of the
Employee tables, producing a more consistent and accurate structure.
association table (PSX) used in the SQL specification will appear transparent,
as it should. This is also the case if intersecting data from the association table,
such as prices of parts from each supplier, is selected, which will logically appear
as data from the lower level table. An example of intersecting data use can be
found in Chapter 12.
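A minimal sketch of such a many-to-many structure, assuming hypothetical Suppliers, Parts, and PSX tables and column names (Python sqlite3). Suppliers is placed over Parts through PSX. Because no PSX key columns are selected, the association table stays transparent, and the intersecting price, stored in cents here, reads as data of the lower level Parts rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Suppliers (SuppNo TEXT, SuppName TEXT);
CREATE TABLE Parts     (PartNo TEXT, PartName TEXT);
CREATE TABLE PSX       (PsxSuppNo TEXT, PsxPartNo TEXT, PriceCents INTEGER);
INSERT INTO Suppliers VALUES ('S1', 'Acme'), ('S2', 'Zenith');
INSERT INTO Parts     VALUES ('P1', 'Bolt'), ('P2', 'Nut');
INSERT INTO PSX       VALUES ('S1', 'P1', 10), ('S1', 'P2', 5), ('S2', 'P1', 12);
""")

# Suppliers over Parts; PSX only supplies the path between them, plus the
# intersecting PriceCents value that belongs to each supplier/part pair.
rows = con.execute("""
  SELECT SuppName, PartName, PriceCents
  FROM Suppliers
  LEFT JOIN PSX   ON SuppNo = PsxSuppNo
  LEFT JOIN Parts ON PsxPartNo = PartNo
  ORDER BY SuppName, PartName""").fetchall()

for row in rows:
    print(row)
```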
7.6 Conclusion
From the information supplied in this chapter and the preceding chapter, it
should be clear that the standard SQL join operation with its flexible syntax
and powerful outer join operation can be used or programmed to accomplish
tasks requiring complex semantics. The outer join can be used to model both
hierarchical and nonhierarchical data structures. Hierarchical data structures
are advantageous because they have singular meaning, which makes their
semantics unambiguous and for this reason better suited for application use.
Nonhierarchical structures, such as network structures, are not generally rec-
ommended for application view use, but may still be useful in applications with
very specific requirements as long as the SQL programmer is aware of their
unstable or ambiguous semantics.
There has been sufficient information supplied in these last two chapters
to enable the design and construction of a hierarchical, network, or hybrid data
structure using the standard SQL join operation. The LEFT and RIGHT outer
joins are hierarchical operations and are used to model a hierarchical data
structure. The INNER and FULL joins are symmetric joins that do not model
hierarchical data structures, and can in fact invalidate hierarchical structures. It
was shown how right-sided nesting allows these symmetric operations to form
logical tables that can be safely and seamlessly introduced into a hierarchical
structure being modeled without invalidating it. Similarly, it was shown
how to symmetrically link data structures so they maintain a valid hierarchical
data structure.
Besides modeling data structures, the standard SQL join syntax also
seamlessly supports fine-grained data filtering that follows the defined
hierarchical data structure. To help with the coding of standard SQL data
modeling joins and features like this fine data filtering capability,
Chapter 8 describes a procedure that can help automate this process.
It was also shown how many-to-many relationships can be seamlessly
modeled. Using all the capabilities documented in this and the previous chap-
ter, any hierarchical data structure can be modeled.
8
More About Outer Join Data Modeling
This chapter examines the significance of the standard SQL outer join’s data
modeling and structure-processing ability to SQL, which did not previously
support this capability. It also examines how these outer join data modeling
statements can be generated, and their efficiency. This chapter also presents
empirical proof that the outer join does enable and support data modeling and
structure processing as presented in this book.
by having to follow the old inner join’s Cartesian product model of operation
as described in Chapter 2.
Figure 8.1 Coding data modeling outer joins from structure diagrams.
Figure 8.2 Coding outer join statements that use logical tables.
The following progression of outer join examples follows the outer join’s opera-
tion as described above. The first two examples demonstrate a simple hierarchi-
cal modeling operation and show that it works for one-to-many as well as
many-to-one relationships.
The outer join specification Department LEFT JOIN Employee ON
DeptNo=EmpDeptNo creates the one-to-many hierarchical relationship of
Department over Employee because:
Proof:
• Dependent can exist only if a matching Department
and Employee exist.
• Employee and Dependent exist only if a matching
Department exists.
Proof:
• Department and Dependent can only exist with a matching Employee.
• Department and Dependent are not dependent on one another:
• Department can exist without a Dependent.
• Dependent can exist without a Department.
Notice in the outer join proof directly above that the Dependent table
was joined after the Department table was joined, but that in this case these two
tables are on different paths and cannot influence each other. This is because
the Dependent table was joined to the Employee table and not the Department
table; therefore, it doesn’t rely on the Department table’s existence even though
it was joined in a later join operation.
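The existence rules in this proof can be observed directly. This sketch (Python sqlite3, hypothetical rows and column names) builds Employee over Department and Dependent. Mary's department does not exist and Bob has no dependents, so the result contains a Department without a Dependent and a Dependent without a Department. Department DeptC and the dependent Orphan have no matching Employee and never appear.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Employee   (EmpNo INTEGER, EmpName TEXT, EmpDeptNo TEXT);
CREATE TABLE Dependent  (DpndEmpNo INTEGER, DpndName TEXT);
INSERT INTO Department VALUES ('DeptA'), ('DeptC');
INSERT INTO Employee   VALUES (1, 'Mike', 'DeptA'), (2, 'Mary', 'DeptZ'), (3, 'Bob', 'DeptA');
INSERT INTO Dependent  VALUES (1, 'Jason'), (2, 'Sam'), (9, 'Orphan');
""")

# Employee is the root; Department and Dependent are on separate paths
# below it, so neither depends on the other's existence.
rows = con.execute("""
  SELECT EmpName, DeptNo, DpndName
  FROM Employee
  LEFT JOIN Department ON DeptNo = EmpDeptNo
  LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
  ORDER BY EmpNo""").fetchall()

for row in rows:
    print(row)
```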
While the example data structures used in this section do not show many-
to-many relationships directly, many-to-many relationships (see Chapter 5) are
composed of many-to-one and one-to-many relationships, which were
described in this section. It is therefore not necessary to show examples of
many-to-many relationships.
• Table T1 and its LEFT join are put on hold, waiting until
the matching ON clause is ready for processing. During this
time, T1’s working set cannot be modified.
• While waiting for table T1 and its LEFT join’s matching
ON clause, tables TX and TY are UNIONed in isolation.
Since the UNION operation is symmetric, the resulting
structure is neutral and not hierarchical, making it a valid
logical table.
Nested View
View ABCView defined as: ABView LEFT JOIN C ON B=C
right processing handles nested left-sided views, processing them in LIFO fash-
ion (the last nested view source replacement is the first to be processed). This
preserves the data modeling semantics of each view—allowing logical table
views to be specified on the left side where they can’t affect the data structure.
Let’s now examine some examples of right-sided view source replacement
and see how and why it works. Right-sided nesting occurs when views are
expanded on the right side of the join operation. The first example in
Figure 8.5 demonstrates the basic right-sided view replacement, which pro-
duces right-sided nesting. As this example demonstrates, right-sided nesting is
not processed left to right, but requires postfix processing and argument
stacking, changing the processing order to right to left. This stacking will
be discussed in further detail in Chapter 9, Section 9.4. The second example
demonstrates how this right-sided processing is handled in nested right-sided
views. The stacking creates a protected environment that preserves the data
modeling semantics of each view, allowing logical table views to also be speci-
fied on the right side.
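The postfix processing with argument stacking can be illustrated with a small parser. This Python sketch is hypothetical and deliberately simplified: it accepts a FROM clause without parentheses, treats each ON condition as a single token, and only builds the join tree. Tables go onto an operand stack and join types onto a pending stack. Each ON clause completes the most recent pending join, which is what makes right-sided nesting resolve from right to left.

```python
from typing import List, Tuple, Union

# Join tree: a table name, or (join_type, left_tree, right_tree, on_condition).
Tree = Union[str, Tuple[str, "Tree", "Tree", str]]

JOIN_TYPES = {"LEFT", "RIGHT", "FULL", "INNER"}


def parse_join(from_clause: str) -> Tree:
    """Build the join tree for a FROM clause written without parentheses,
    letting the placement of the ON clauses decide the nesting."""
    words = from_clause.split()
    operands: List[Tree] = []   # completed tables and subtrees
    pending: List[str] = []     # join types still waiting for their ON clause
    i = 0
    while i < len(words):
        word = words[i].upper()
        if word in JOIN_TYPES or word == "JOIN":
            join_type = "INNER" if word == "JOIN" else word
            i += 1
            while i < len(words) and words[i].upper() in ("OUTER", "JOIN"):
                i += 1          # skip the OUTER / JOIN keywords
            pending.append(join_type)
        elif word == "ON":
            # The ON clause belongs to the most recent pending join and
            # consumes the two most recent operands.
            right = operands.pop()
            left = operands.pop()
            operands.append((pending.pop(), left, right, words[i + 1]))
            i += 2
        else:                   # a table name
            operands.append(words[i])
            i += 1
    if len(operands) != 1 or pending:
        raise ValueError("unbalanced JOIN/ON clauses")
    return operands[0]


if __name__ == "__main__":
    # Right-sided nesting: B LEFT JOIN C is completed before A is joined.
    print(parse_join("A LEFT JOIN B LEFT JOIN C ON B=C ON A=B"))
```

Moving an ON clause changes the tree: with "A LEFT JOIN B ON A=B LEFT JOIN C ON B=C" the same tables nest to the left instead.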
Notice in the second (nested view) examples in Figures 8.4 and 8.5 that
the innermost nested views of both are processed first. In Figure 8.4, left-sided
views expand their view source to the left as the nested views are expanded
Nested View
View BCDView defined as: B LEFT JOIN CDView ON B=C

SELECT *
FROM Employee
LEFT JOIN Department ON DeptNo=EmpDeptNo AND EmpStat='Full'
LEFT JOIN Dependent ON EmpNo=DpndEmpNo AND EmpPos='Mgr'

The WHERE clause query below does not represent the ON clause query above:
SELECT *
FROM Employee LEFT JOIN Department LEFT JOIN Dependent
WHERE DeptNo=EmpDeptNo AND EmpStat='Full'
AND EmpNo=DpndEmpNo AND EmpPos='Mgr'

SELECT *
FROM Department, Employee, Dependent
WHERE DeptNo(+)=EmpDeptNo
AND EmpNo(+)=DpndEmpNo

SELECT *
FROM Department, Employee, Dependent
WHERE DeptNo=EmpDeptNo(+)
AND EmpNo(+)=DpndEmpNo

Figure 8.7 Old-style outer joins can perform limited data modeling.
as obvious as the equivalent standard SQL join statement. These old-style outer
joins can be easily translated into standard SQL joins.
8.13 Conclusion
This chapter has presented empirical proof that outer join statements can per-
form data modeling and structure processing, and demonstrated that views
containing structures and logical tables can be used seamlessly in building and
modeling complex data structures. It pointed out that because this data
modeling capability is possible with standard SQL statements, it can be used
safely, can maintain its usefulness with SQL:1999, and can also become a default
standard for database data modeling. It was shown how data modeling outer
joins can be generated by constructing them while following the hierarchical
data structure, and that it was possible to use older nonstandard-style outer
joins to model simple data structures. Finally, this chapter discussed the
importance of SQL’s inherent data structure processing ability, and how the
inner join’s role and proper use have changed with the addition of the outer join.
Part III
New Capabilities Based on Outer Join Data Modeling
Part III describes advanced SQL capabilities made possible by the standard
SQL outer join data modeling capability that SQL vendors can offer to users.
Chapter 9 introduces the data structure extraction (DSE) technology used to
extract the data structure information naturally embedded in standard SQL
outer join statements. Chapter 10 identifies a number of advanced capabilities
made possible by the data modeling capability of the standard SQL outer join.
Chapter 11 describes the many powerful semantic SQL optimizations that are
possible based on the data modeling information available from outer joins.
Chapter 12 demonstrates a hierarchical relational processor prototype that
operates by utilizing the data structure information from outer join statements.
Chapter 13 presents an object relational interface that is based on the data
structure information from outer join specifications. Chapter 14 looks at
nonrelational SQL-based universal data access frameworks and how outer join
processing naturally fits in by using a structured data record interface as an
example.
9
Data Structure Extraction (DSE) Technology
Advanced Data Access Technologies, a company affiliated with the author, has
been researching the standard SQL join operation for a number of years. It
realized that the outer join operation, which is part of the SQL standard,
together with standard SQL’s powerful syntax, produces powerful data
modeling and data structure processing capabilities. Since SQL previously had
no inherent data modeling and data structure processing capabilities, Advanced
Data Access Technologies also realized this would be of significant benefit to
users and vendors if recognized, understood, and properly utilized.
SELECT * FROM A
LEFT JOIN B ON A1=B1
LEFT JOIN
C LEFT JOIN D ON C1=D1
ON A2=C2
complex data structures being modeled even though the standard SQL join
programmer may not realize that he or she is performing data modeling.
The DSE technology dynamically determines the data structure by ana-
lyzing and interpreting how the outer join statement has been specified, taking
into account the table relationships used and general hierarchical data structure
concepts and principles that were discussed in Chapters 5 and 6. This data
structure extraction is accomplished with no additional or supplemental infor-
mation supplied by the programmer or SQL system other than what is nor-
mally available. This makes capabilities supported by the DSE technology
seamless and transparent. The DSE technology also detects invalid structures
(see Chapter 6), and can operate dynamically for use with ad hoc (i.e., interac-
tive) and object-oriented uses (i.e., late binding).
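A much simplified sketch of the kind of extraction described here, written in Python for the query in Figure 9.1. It is not the DSE technology itself, and several assumptions apply: the join tree is written out by hand as nested tuples, only LEFT joins are handled, each ON clause is a single equality, and a column such as A2 is assumed to belong to table A. For every join, it records the upper link point as the parent of the lower structure's root and rejects links made below that root.

```python
from typing import Dict, Set, Tuple, Union

# Join tree: a table name, or (join_type, left_tree, right_tree, on_condition).
Tree = Union[str, Tuple[str, "Tree", "Tree", str]]

# Figure 9.1: A LEFT JOIN B ON A1=B1 LEFT JOIN C LEFT JOIN D ON C1=D1 ON A2=C2
FIGURE_9_1: Tree = ("LEFT", ("LEFT", "A", "B", "A1=B1"),
                    ("LEFT", "C", "D", "C1=D1"), "A2=C2")


def owner(column: str) -> str:
    """Assumed naming convention: column A2 belongs to table A."""
    return column.rstrip("0123456789")


def tables(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return tables(tree[1]) | tables(tree[2])


def extract(tree: Tree, parent: Dict[str, str]) -> str:
    """Record child -> parent links in `parent`; return the root of `tree`."""
    if isinstance(tree, str):
        return tree
    join_type, left, right, cond = tree
    if join_type != "LEFT":
        raise NotImplementedError("this sketch handles LEFT joins only")
    upper_root = extract(left, parent)
    lower_root = extract(right, parent)
    refs = {owner(col) for col in cond.split("=")}
    if refs & tables(right) != {lower_root}:
        raise ValueError(f"{cond}: lower structure not linked at its root")
    (link_point,) = refs & tables(left)   # one equality -> one upper table
    parent[lower_root] = link_point
    return upper_root


if __name__ == "__main__":
    links: Dict[str, str] = {}
    root = extract(FIGURE_9_1, links)
    print(root, sorted(links.items()))
```

The extracted structure has A as its root with B and C below it and D below C, which matches the diagram for Figure 9.1.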
the example in Figure 9.3 covers the only situation possible for this type of link-
ing. Notice how the generated hierarchical structure meta information remains
top-down, indicating that linking of the root substructure tables X and M can
be performed before their associated substructures are built. So, this symmetric
data structure join is represented in the structure meta information the same
way that the logical table was in Section 9.3.
SQL. The DSE technology is a building block technology that allows the easy
addition of powerful new standard SQL–compatible features and capabilities
that eliminate or greatly reduce this non-SQL standardization problem. It can
also significantly help with the problem of poor efficiency with the standard
SQL outer join operation, and in many cases can bring its efficiency up to that
of the older standard inner join. Outer join specifications with questionable
(i.e., ambiguous) data structure semantics are also detected. Lastly, with this
data structure meta information freely available, it makes good business sense
to put it to use.
supplied data structure meta information. This mismatch will often produce
erroneous results. The best solution all the way around is to use the natural data
modeling capability of outer joins and the DSE technology to supply the data
structure meta information wherever and whenever it is needed. Since the DSE
technology is deriving the data structure meta information directly from the
SQL, its data structure meta information is always accurate, with little or no
chance for error.
9.8 Conclusion
The DSE technology proved that it is possible to dynamically extract the data
structure meta information embedded in standard SQL join specifications.
These hierarchical data structures can also utilize nonhierarchical, symmetric
join operations in their definition to support logical tables and symmetric sub-
structure joins. What makes this technology unique is that it is fully standard
SQL compatible (both syntactically and semantically), which enables SQL fea-
tures not previously possible with standard relational databases. It was also
shown why this technology offers the best solution to supplying data structure
meta information to SQL-based data access drivers and processors.
The following chapters will demonstrate how this dynamically supplied
meta information provided by the DSE technology can be utilized to create
new products and features. These features include powerful semantic
optimizations, seamless legacy access, object capabilities, postrelational process-
ing, and plug-and-play capabilities.
10
Outer Join Advanced Capabilities
This chapter presents advanced capabilities that SQL vendors can implement
for their users by utilizing the data modeling and data structure processing
capabilities of the standard SQL outer join operation. The advanced
capabilities are made possible by dynamically extracting the data structure meta
information from standard SQL outer join specifications. This data structure meta
information is free information, placed in the outer join specification either
knowingly or unknowingly by the programmer of the outer join specification.
It can be extracted for the SQL product’s use by a DSE procedure like the one
documented in Chapter 9. With this information, the advanced database capa-
bilities covered in this chapter are possible.
optimized since the entire portion of the data structure being accessed can be
determined before being accessed. These navigational instructions can be used
to access any database that supports hierarchical access. The extracted data
structure can be a logical structure composed of more than one physical type of
database so that support for disparate heterogeneous databases and enter-
prise-wide access is also possible. When navigating physical databases, the order
of sibling legs, such as B before C in Figure 10.1, may be important. It is useful
to realize that the database navigation process described here can be performed
dynamically.
Figure 10.1 The outer join can enable universal database navigation and access.
[Figure 10.2 diagrams: ViewABC, with root A over B and C, invoked as SELECT C FROM ViewABC and as SELECT A FROM ViewABC.]
Figure 10.2 Outer join view dynamic optimization based on selection criteria.
Figure 10.3 Disparate database access is possible with the outer join.
access statements do not instruct how to access the database, but rather what is
desired from the database. This means that all the information needed to know
how to access the database is determined by a query optimizer, allowing an effi-
cient global access strategy to be developed. Because of this, very efficient access
can be achieved, as in the example in Figure 10.2, which can also be applied to
nonrelational databases. Nonrelational optimized SQL access is described in
more detail in Chapter 11, and nonrelational heterogeneous SQL access is
described further in Chapter 14.
functions that can occur anywhere in the data structure and do not include rep-
licated data values in the results, more flexible aggregate operations where the
range of input columns is controlled naturally by the data structure, and easing
of syntax limitations. An example of more flexible and accurate syntax is shown
in Figure 10.5. Summary results are taken at multiple locations in the data
structure, and the WHERE and HAVING clauses allow a two-level filtering
where rows can be filtered before being summed and then filtered on their
summed value. Additionally, the use of this advanced summary processing in
the HAVING clause avoids the need for a nested SELECT statement.
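Replicated parent values are what make naive aggregates over a flat join misleading. The sketch below uses Python's standard sqlite3 module and hypothetical Dept and Emp tables (the Budget and Salary columns are assumptions, not taken from Figure 10.5). It sums a department-level column over the joined rows and then at its own level of the structure; only the latter gives the intended total.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, Budget INTEGER);
CREATE TABLE Emp (EmpName TEXT, EmpDeptNo INTEGER, Salary INTEGER);
INSERT INTO Dept VALUES (10, 1000), (20, 500);
INSERT INTO Emp VALUES ('Mary', 10, 100), ('John', 10, 200), ('Mike', 20, 300);
""")

# Flat join: each Dept budget is replicated once per employee before summing.
flat = con.execute("""SELECT SUM(Budget), SUM(Salary)
    FROM Dept LEFT JOIN Emp ON DeptNo = EmpDeptNo""").fetchone()

# Each summary taken at its own level: Budget from Dept, Salary from Emp rows under Dept.
budget = con.execute("SELECT SUM(Budget) FROM Dept").fetchone()[0]
salary = con.execute("""SELECT SUM(Salary)
    FROM Dept JOIN Emp ON DeptNo = EmpDeptNo""").fetchone()[0]

print(flat, (budget, salary))  # (2500, 600) (1500, 600)
```

The salary total is unaffected because Salary lives at the lowest level of this path; the budget total is inflated because Dept sits above the table that multiplies the rows.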
Figure 10.5 Multiple summaries taken at different locations in the data structure.
[Figure 10.6 diagram: views labeled Employee, Product, Department, Dependent, Division, Manager, and Employees drawn from one repository.]
Figure 10.6 The outer join can access unlimited views from data warehouse repository.
Structure: A with sibling children B and C

SELECT *
FROM A
LEFT JOIN B ON A=B
LEFT JOIN C ON A=C

01 A Char 20
   10 B Char 20 Occurs …
   10 C Char 20 Occurs …

Structure: A over B over C

SELECT *
FROM A
LEFT JOIN B ON A=B
LEFT JOIN C ON B=C

01 A Char 20
   10 B Char 20 Occurs …
      20 C Char 20 Occurs …
Figure 10.8 Object relational interface can read and write structured data.
[Figure 10.9 diagrams: a Delete applied to two differently structured views over the Dept, Emp, and Dpnd tables.]
Figure 10.9 Deleting a department from different views produces different results.
[Figure 10.10 diagrams: a Delete applied to two differently structured views over the Dept, Emp, and Dpnd tables.]
Figure 10.10 Deleting an employee from different views produces different results.
applications, but can still be data modeled in their own unique way using the
data modeling capability shown in Figure 10.11.
[Figure 10.12 diagram: an external data definition drives data modeling SQL generation; data structure extraction supplies meta data to a data provider/driver (OLE DB, ODBC, JDBC) used by a UDA product.]
Figure 10.12 Integrating external data definitions with data modeling SQL.
data access. The data provider component uses the data structure extraction
technology described in Chapter 9 to retrieve the data structure meta informa-
tion from the SQL specification. Chapter 14 goes into this topic in more detail.
It is important to realize that the standard SQL join data modeling capa-
bility is based totally on the outer join’s standard syntax and semantics. This
data modeling capability exists inherently in the ANSI/ISO SQL standard and
operates automatically all the time. This means that any other approach used
to supply the data structure of a SQL query could conflict with the data
modeling that occurs naturally in externally supplied outer join specifications,
and this could produce incorrect results.
This data structure conflict can be eliminated by generating data model-
ing SQL from the externally supplied data definition, thereby introducing SQL
that accurately models the data structure, and from which the data structure
can be extracted at any time and location. The diagram in Figure 10.12 demon-
strates this system design.
Figure 10.13 Structured data can be moved accurately between SQL and XML.
legs and multiple levels can be specified. The elements of the XML definition
are nested by following the hierarchical structure.
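As a rough illustration of nesting XML elements by following the hierarchical structure, the Python sketch below turns flat rows from a hypothetical Dept LEFT JOIN Emp query into nested elements using the standard xml.etree.ElementTree module. It assumes DeptName identifies a department; it is only an illustration, not the transfer mechanism described in this book.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rows from: SELECT DeptName, EmpName FROM Dept LEFT JOIN Emp ON ...
rows = [("Sales", "Mary"), ("Sales", "John"), ("Research", "Mike"), ("Shipping", None)]

root = ET.Element("Depts")
dept_elems = {}
for dept_name, emp_name in rows:
    # Create one Dept element per department; replicated DeptName values collapse into it.
    dept = dept_elems.get(dept_name)
    if dept is None:
        dept = ET.SubElement(root, "Dept", name=dept_name)
        dept_elems[dept_name] = dept
    # A NULL from the outer join means "no Emp occurrence", so no child element is created.
    if emp_name is not None:
        ET.SubElement(dept, "Emp").text = emp_name

print(ET.tostring(root, encoding="unicode"))
```

The department with no employees still appears, as an empty Dept element, which mirrors the data preservation of the outer join.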
The XML and SQL capability to define and process hierarchical struc-
tured data has great utility. One important use is to dynamically transfer
data from databases to Web servers, business-to-business (B2B) applications,
and integration servers. This technique is greatly improved by SQL’s ability to
dynamically transfer structured data from any combination of database sources
into an XML container, where it can be served as XML or rendered for display
as HTML. As shown in Figure 10.14, SQL is invoked by the browser to trans-
fer data into the Web site in XML format.
Other important use cases include archiving and data replication. Because
XML data is tagged when it is exported from SQL databases, it is self-defining,
a very useful property for data archives. Because SQL database products can
import and export XML, it’s a viable solution for replication across disparate
SQL database platforms.
Another use of SQL for Web content is a new capability made possible by
XML: the capability to treat XML Web content as a database, with SQL
capable of accessing structured XML data along with other databases for
retrieval or even update, as shown in Figure 10.15. This means that Web sites
with static XML content do not have to be closed systems accessible only by a
browser. The content can be made accessible to a wide variety of SQL client
software performing disparate, heterogeneous data access.
[Figure 10.14 diagram: a user browser sends a request; a SQL structured data processor moves structured data from any database into an XML Web page.]
Figure 10.14 SQL can move structured data dynamically into an XML Web site.
[Figure 10.15 diagram: a SQL structured data processor receives a data request, directly accesses any database and an XML Web page, and returns the XML data result.]
Figure 10.15 SQL can treat XML Web sites like any other database.
can be directly addressed by SQL or joined to from other record types in the
heterogeneous virtual structure using their foreign-key field value.
10.13 Conclusion
The data structure meta information that is extracted by the DSE technology is
extremely valuable. It has the potential of supporting many powerful new SQL
features and capabilities not previously possible. Many of these were identified
in this chapter, such as optimization, object relational interface support, view
update capability, hierarchical relational processing, seamless legacy database
access, and direct access to XML Web sites. The main enabler of these capabili-
ties is the database navigation and processing of data structures. While these are
global solutions, there is also the potential for specific solutions or features that
can extend or complement individualized products.
11
Outer Join Optimization
The standard SQL join operation is more difficult to optimize with its ON
clauses and outer join operations than the simpler common inner join. With
the common inner join, its tables can be freely reordered to best optimize
access. With the standard SQL join, this ability is constrained by its ON
clauses. Working within the constraints of the ON clauses, INNER and FULL
joins can each be reordered in any order because they are both commutative
and associative in operation. The one-sided outer join is not commutative; its
tables cannot be freely reordered. But hierarchictivity can play a role in optimi-
zation. This chapter explores the hierarchical semantics of the one-sided outer
join for use in optimization.
[Diagram: the access direction runs from A down to B; a missing table B occurrence terminates the access path for this row occurrence.]
materialized view (the data that represents the view) on which the view invoca-
tion is based is always affected by all tables in the inner join view. This is
because missing data anywhere in the inner join will cause unmatched rows to
be removed. This was discussed back in Chapter 1 where Figure 1.1 showed
that an inner join composed of the Department and Employee tables would
not contain departments that had no employees. This means that if this view,
call it DeptEmpView, was invoked as in SELECT DeptName FROM Dept-
EmpView, only DeptNames for departments that had employees would be
selected. This result required that the Employee table be accessed, even though
no data was selected from it. If this was not the desired result, then this view
should not have been used and the Department table should have been accessed
directly.
The necessity of accessing all tables in a view is a requirement for the way
inner joins use the Cartesian product model for processing joins, as described in
Chapter 1. This is not necessary for outer joins that generate hierarchical struc-
tures. Standard SQL outer joins operate differently from inner joins, as described
in Chapter 2.
Outer join views that model hierarchical structures do not always need to
access all tables in the view when invoked. Take for example the outer join view
DeptEmpView, defined as SELECT * FROM Department LEFT JOIN
Employee ON DeptNo=EmpDeptNo. When this view is invoked as SELECT
DeptName FROM DeptEmpView, the Employee table is not referenced and
does not need to be accessed. This is because, in the semantics of this hierarchi-
cal data structure, the Employee table is at a lower level than the selected table
Department. This means that the Employee table cannot affect the Depart-
ment table, and therefore does not need to be accessed.
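The difference can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in SQL engine with hypothetical Department and Employee rows (department 30, Shipping, has no employees). It does not model any optimizer; it only compares the results of the two views with the result of reading Department directly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, EmpDeptNo INTEGER);
INSERT INTO Department VALUES (10, 'Sales'), (20, 'Research'), (30, 'Shipping');
INSERT INTO Employee VALUES (1, 'Mary', 10), (2, 'John', 10), (3, 'Mike', 20);

-- Inner join view: departments without employees disappear.
CREATE VIEW DeptEmpInnerView AS
  SELECT * FROM Department JOIN Employee ON DeptNo = EmpDeptNo;

-- Outer join view: Department is preserved, Employee is subordinate to it.
CREATE VIEW DeptEmpView AS
  SELECT * FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo;
""")

def names(sql):
    return sorted(row[0] for row in con.execute(sql))

inner = names("SELECT DeptName FROM DeptEmpInnerView")
outer = names("SELECT DeptName FROM DeptEmpView")
direct = names("SELECT DeptName FROM Department")

print(inner)   # ['Research', 'Sales', 'Sales']: Shipping is lost
print(outer)   # ['Research', 'Sales', 'Sales', 'Shipping']
print(direct)  # ['Research', 'Sales', 'Shipping']
```

The outer join view yields exactly the departments that Department alone yields, so skipping Employee loses nothing; the extra Sales row is the replication issue discussed later in this section.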
Any hierarchical structure access, no matter how complex, defined by
outer joins can apply this powerful view optimization. This is performed by
eliminating tables from access consideration that are not referenced in the
query and are not on a path to a referenced table in the query. References in
ON clauses do not count here: a table referenced only in ON clauses cannot
affect the query, because an ON clause is used only when access to its table is
necessary. This optimization is based on the modeled hierarchical
data structure and the columns specified at the time of the view invocation.
This is not new. Hierarchical access logic dictates this behavior. The true test
of this is that this logic derives the same data results as if all the tables were
accessed. This is demonstrated in Figure 11.3.
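The table elimination rule can be written down directly. The Python function below is a minimal sketch that assumes the view's hierarchical structure is already available as a child-to-parent map, for example from a DSE-style extraction; the map format and the function name are hypothetical. It keeps every referenced table plus every table on the path from it up to the root, and everything else can be skipped.

```python
from typing import Dict, Optional, Set

def tables_to_access(parent: Dict[str, Optional[str]], referenced: Set[str]) -> Set[str]:
    """Return the referenced tables plus every table on the path from each one to the root.

    `parent` maps each table of a hierarchical outer join view to its parent
    table (None for the root). Tables referenced only in ON clauses should not
    be included in `referenced`.
    """
    needed: Set[str] = set()
    for table in referenced:
        node: Optional[str] = table
        # Walk upward until the root is passed or an already-collected table is reached.
        while node is not None and node not in needed:
            needed.add(node)
            node = parent[node]
    return needed

# Hypothetical view: Dept is the root, Emp is under Dept, Dpnd is under Emp.
view = {"Dept": None, "Emp": "Dept", "Dpnd": "Emp"}
print(sorted(tables_to_access(view, {"Dept"})))  # ['Dept']
print(sorted(tables_to_access(view, {"Emp"})))   # ['Dept', 'Emp']
```

Stopping the upward walk at an already-collected table is safe because that table's ancestors were collected with it.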
There is an additional beneficial side effect of this optimization: it helps
eliminate unnecessary replicated rows. These replicated rows are introduced
by accessing unnecessary tables. This means that the optimized result is more
semantically correct than the unoptimized result. For example, in the outer join
DeptEmpView example described earlier in this section, the unoptimized view
invocation would replicate the department’s name (DeptName) for each
employee in the department even though no Employee columns were selected.
The optimized invocation would not replicate department names since no
access to the Employee table was needed. This is also shown in Figure 11.3.
The two examples in Figure 11.3 demonstrate view optimization applied
to two different SQL views of the same data and relationships. The data struc-
ture diagrams shown reflect the structure of the SQL outer join view definitions
and data that were originally defined in Figure 6.1. For the Department and
Employee views, the dotted lines in the data structure diagrams in Figure 11.3
represent areas of the structures that can be eliminated from access based on the
view selection criteria shown directly above the diagrams. Data enclosed in a
dotted box represents unnecessary replicated data that is removed when
optimization is applied. This duplicate removal is more semantically controlled
than SQL’s duplicate row value removal option.
[Figure 11.3 diagrams: EmpName results (Mary, John, Mike, Bill) from the Department and Employee views. Key: dotted boxes are removed if optimization is in effect.]
Figure 11.3 Outer join view optimizations can produce more accurate results.
In the examples shown, replicated data is produced because employee
Mike has two dependents, causing Mike to be in the virtual view twice when
using the old inner join Cartesian product access model (see Chapter 2). With-
out optimization, this replication is confusing since dependents are of no
importance or significance in either query, and therefore should not affect the
result. Note also that these example data views are small; larger views offer a much
greater opportunity for optimization.
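The replication of Mike can be reproduced in a few lines. This sketch uses Python's sqlite3 module with hypothetical Emp and Dpnd rows (the dependent names are invented for illustration). It compares joining the unreferenced Dpnd table with reading Emp alone, which is what the optimized access amounts to.

```python
import sqlite3

# Hypothetical rows echoing the example: Mike has two dependents.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Emp (EmpNo INTEGER PRIMARY KEY, EmpName TEXT);
CREATE TABLE Dpnd (DpndName TEXT, DpndEmpNo INTEGER);
INSERT INTO Emp VALUES (1, 'Mary'), (2, 'John'), (3, 'Mike');
INSERT INTO Dpnd VALUES ('Ann', 3), ('Tom', 3);
""")

# Unoptimized: Dpnd is joined even though none of its columns is selected.
unoptimized = [r[0] for r in con.execute(
    "SELECT EmpName FROM Emp LEFT JOIN Dpnd ON EmpNo = DpndEmpNo ORDER BY EmpNo")]

# Optimized: Dpnd is below Emp and unreferenced, so it is not accessed at all.
optimized = [r[0] for r in con.execute("SELECT EmpName FROM Emp ORDER BY EmpNo")]

print(unoptimized)  # ['Mary', 'John', 'Mike', 'Mike']
print(optimized)    # ['Mary', 'John', 'Mike']
```

SELECT DISTINCT would not be an equivalent fix: it compares values only, so it would also merge two different employees who happen to share a name, whereas eliminating the Dpnd access removes only the replication that the join introduced.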
Other benefits of the outer join view optimizations are that they do not
penalize the user for picking a view that is too large, and that large views
eliminate the need for many small views, making life easier for end users
and DBAs.
[Diagram: two sibling legs, B over C and D over E; subprocess 1 accesses leg 1 and subprocess 2 accesses leg 2.]
these new external functions will require modifying existing SQL code, usually
by hand. In SQL:1999, these functions, which can be user-defined functions,
can be navigation functions that can access tables through other tables to avoid
the need to join them. For example, the first outer join example in Figure 11.5,
which models the structure diagram in Figure 11.4, selects only a column
from the lower level table C. This SQL statement can be rewritten to avoid
unnecessary join operations, as in the bottom SQL example in Figure 11.5,
by using a navigation function that uses the data structure meta information
extracted from the original query so that it only returns keys that exist in the
structure.
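As a hedged illustration of this kind of rewrite, the sketch below uses EXISTS subqueries as a stand-in for the hypothetical navigation function. Starting from C, it checks that the parent B occurrence and that occurrence's parent A exist, instead of joining the full path. Table and column names (AKey, BAFkey, CBFkey, CVal) are assumptions. Only the C values actually present are compared, since an A row with no B or C contributes a NULL in the joined form.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (AKey INTEGER PRIMARY KEY);
CREATE TABLE B (BKey INTEGER PRIMARY KEY, BAFkey INTEGER);
CREATE TABLE C (CKey INTEGER PRIMARY KEY, CBFkey INTEGER, CVal TEXT);
INSERT INTO A VALUES (1), (2);
INSERT INTO B VALUES (10, 1), (20, 9);  -- B 20 has no parent A
INSERT INTO C VALUES (100, 10, 'c1'), (101, 10, 'c2'), (200, 20, 'orphan');
""")

# Original request: join the whole path just to reach a column of C.
joined = con.execute("""
    SELECT CVal FROM A
    LEFT JOIN B ON AKey = BAFkey
    LEFT JOIN C ON BKey = CBFkey
    WHERE CVal IS NOT NULL ORDER BY CVal""").fetchall()

# Rewrite: navigate upward from C with existence tests, returning only the
# C occurrences that exist within the modeled structure.
navigated = con.execute("""
    SELECT CVal FROM C
    WHERE EXISTS (SELECT 1 FROM B WHERE BKey = CBFkey
                  AND EXISTS (SELECT 1 FROM A WHERE AKey = BAFkey))
    ORDER BY CVal""").fetchall()

print(joined == navigated, joined)  # True [('c1',), ('c2',)]
```

The orphan C row is excluded by both forms because its B parent is not connected to any A occurrence.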
This optimization still conforms to the semantics of the structure shown
in Figure 11.4 and operates seamlessly because it continues to follow and obey
the hierarchical semantics of the outer join. Using outer join data modeling
today can allow for the capability of automatically utilizing future features (like
this one) as they are introduced into SQL systems. This is achieved by database
system software that dynamically rewrites the SQL specification to use the new
functionality. This capability, with its dynamic operation, also allows it to be
applied to ad hoc queries where it could not be accomplished otherwise, since
the selected columns are not known beforehand.
Figure 11.5 Automatic SQL rewrite to take advantage of future SQL capabilities.
Figure 11.6 Outer join query can be translated to very efficient IMS access code.
junction points, which can turn a network structure into a valid hierarchical
structure dynamically. For example, if tables D and E are not referenced in the
network structure in Figure 11.7 (as documented in Section 11.3), then tables
D and E are eliminated from the materialized view, creating a valid hierarchical
structure and enabling all the benefits that go with it, as described in Chapter 5.
This is demonstrated in Figure 11.8.
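The same table elimination rule, extended to a structure in which a table may have more than one parent, shows why pruning can remove a network junction. In this Python sketch, NetView is encoded as a hypothetical table-to-parents map, and the remaining structure is treated as hierarchical when every kept table has at most one kept parent.

```python
from typing import Dict, List, Set

def needed_tables(parents: Dict[str, List[str]], referenced: Set[str]) -> Set[str]:
    """Referenced tables plus every table on any path from them up to the root."""
    needed: Set[str] = set()
    stack = list(referenced)
    while stack:
        table = stack.pop()
        if table not in needed:
            needed.add(table)
            stack.extend(parents[table])
    return needed

def is_hierarchical(parents: Dict[str, List[str]], tables: Set[str]) -> bool:
    """True if every remaining table has at most one parent among the remaining tables."""
    return all(sum(p in tables for p in parents[t]) <= 1 for t in tables)

# Hypothetical NetView: D is a network junction reached from both B and C.
net = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}

print(is_hierarchical(net, set(net)))            # False: D has two parents
kept = needed_tables(net, {"B", "C"})
print(sorted(kept), is_hierarchical(net, kept))  # ['A', 'B', 'C'] True
```

If D or E were referenced, the junction would be kept and the structure would remain a network.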
The optimizations shown in Figure 11.8 also apply to network
structures where the network junction points are linked to multiple paths using
AND logic instead of OR logic. This structure, while similar, is not actually a
network structure, and is described in Chapter 6.
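[Figure 11.8 diagrams: NetView, with A over B and C, both leading to D, and E below D, invoked as SELECT B, C FROM NetView and as SELECT C FROM NetView; panels labeled Unoptimized and Optimized.]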
optimize inner join query. The converted outer join query now performs the
inner join of the three tables involved and then filters the result using the
WHERE clause criteria. In this case, the WHERE clause in Figure 11.9 is
based on the lowest level table, Dpnd, which means any missing data for table
Dpnd would be filtered out. This further implies that missing data for table
Emp would be filtered out and so on up the path. This logically turns the query
into an inner join since no data is actually being preserved—this means only
complete rows that match the selection criteria are selected.
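A quick way to confirm the equivalence on sample data is to run both forms. The sketch below uses Python's sqlite3 module with hypothetical Dept, Emp, and Dpnd rows along a single path. The WHERE predicate rejects the NULL-extended rows that the outer joins would have preserved, so the outer join and inner join forms return the same rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Emp (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, EmpDeptNo INTEGER);
CREATE TABLE Dpnd (DpndName TEXT, DpndEmpNo INTEGER);
INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Research'), (30, 'Shipping');
INSERT INTO Emp VALUES (1, 'Mary', 10), (2, 'Mike', 20);
INSERT INTO Dpnd VALUES ('Ann', 2), ('Tom', 2), ('Sue', 1);
""")

# Filter on the lowest level table of a single path: Dept -> Emp -> Dpnd.
outer_sql = """SELECT DeptName, EmpName, DpndName FROM Dept
    LEFT JOIN Emp ON DeptNo = EmpDeptNo
    LEFT JOIN Dpnd ON EmpNo = DpndEmpNo
    WHERE DpndName = 'Ann'"""
inner_sql = """SELECT DeptName, EmpName, DpndName FROM Dept
    JOIN Emp ON DeptNo = EmpDeptNo
    JOIN Dpnd ON EmpNo = DpndEmpNo
    WHERE DpndName = 'Ann'"""

outer = sorted(con.execute(outer_sql).fetchall())
inner = sorted(con.execute(inner_sql).fetchall())
print(outer == inner, outer)  # True [('Research', 'Mike', 'Ann')]
```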
If the WHERE clause in Figure 11.9 specified a filter on table Emp
instead of table Dpnd, the optimization shown could not have been performed
since it would remove data preserving below the table Emp level when table
Emp passed the filtering test. This leads one to believe this inner join optimiza-
tion can only work when the WHERE clause is filtering at the lowest level. But
this is only partially correct. To see why, examine the SQL optimization in
Figure 11.10.
In Figure 11.10, the WHERE clause is at the lowest level in the data
structure and the filtering data is contained in the last table joined, table Dpnd.
But, the problem here is that while table Dpnd is at the lowest level, there are
other legs in the structure. Table Dept is on another leg, and if the query were
changed to an inner join, no data would be preserved when table Dept did
not match a table Emp row occurrence. In this case, as we learned earlier in
Chapter 5 on data structures, sibling legs are independent of one another. This
means what occurs in one leg should not influence the other. Converting the
outer join in Figure 11.10 to an inner join changes the semantics so that
what happens in one leg can affect all the other legs, which changes the result of
the query. This means that performing these types of optimizations requires
analyzing the semantics of the outer join queries very carefully.
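The failure case can be reproduced with a structure in which Emp has two legs, Dept and Dpnd, roughly as described for Figure 11.10; the data and column names here are hypothetical. Employee Mike has no department but does have a dependent who passes the filter. Converting every LEFT JOIN to an inner join loses him, while converting only the join that the WHERE clause already makes mandatory does not.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Emp (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, EmpDeptNo INTEGER);
CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Dpnd (DpndName TEXT, DpndEmpNo INTEGER);
INSERT INTO Emp VALUES (1, 'Mary', 10), (2, 'Mike', NULL);  -- Mike has no department
INSERT INTO Dept VALUES (10, 'Sales');
INSERT INTO Dpnd VALUES ('Ann', 2), ('Sue', 1);
""")

# Emp is the root; Dept and Dpnd are sibling legs below it.
outer_sql = """SELECT EmpName, DeptName, DpndName FROM Emp
    LEFT JOIN Dept ON EmpDeptNo = DeptNo
    LEFT JOIN Dpnd ON EmpNo = DpndEmpNo
    WHERE DpndName = 'Ann'"""

# Naive conversion of every LEFT JOIN to an inner join.
inner_sql = outer_sql.replace("LEFT JOIN", "JOIN")

# Converting only the Dpnd join, which the WHERE clause makes mandatory.
partial_sql = outer_sql.replace("LEFT JOIN Dpnd", "JOIN Dpnd")

print(con.execute(outer_sql).fetchall())    # [('Mike', None, 'Ann')]
print(con.execute(inner_sql).fetchall())    # []: the empty Dept leg removed Mike
print(con.execute(partial_sql).fetchall())  # [('Mike', None, 'Ann')]
```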
11.9 Conclusion
This chapter has presented powerful semantic optimizations that are enabled by
the outer join data modeling ability. Without utilizing the outer join optimiza-
tions presented in this chapter, the outer join will operate less efficiently than
the inner join. This will prevent many users and vendors from utilizing this
powerful operation. But if the outer join optimizations presented here are util-
ized, the efficiency of the outer join could equal or even surpass the inner join
in many cases. This means that the outer join, with all of its powerful capabili-
ties, can be comparable to the efficiency of the inner join!
It was also demonstrated that outer join view optimization could convert
a network structure into a hierarchical structure, thereby enabling all the fea-
tures and capabilities available to hierarchical structures.
The optimizations presented in this chapter demonstrate the value of data
modeling and the importance of the capability to determine the data model
defined by outer joins. The data model represents the semantics of the data and
makes it easier to determine the consequences of changing the SQL to optimize
SQL queries.
12
Hierarchical Relational Processor Prototype
With standard SQL having the capability to inherently process hierarchical
structures, it is no longer necessary to force all data into a flat structure that
obscures the data structure and unnecessarily replicates data. If the data is being
modeled hierarchically, it can be processed directly in this more powerful form
by using outer join specifications that directly model the data structure and exe-
cution paths.
The examples in this chapter show the operation of a standard
SQL-based hierarchical relational database processor prototype that is driven by
the inherent data modeling capability of the standard SQL outer join. It utilizes
the DSE technology, described in Chapter 9, to dynamically extract the data
structure meta information naturally present in outer join specifications. This
freely available information is used to control the hierarchical heterogeneous
processing of relational and nonrelational data. It produces a hierarchical
WYSIWYG display that conforms to the underlying data structure of the SQL
query request. This produces results that are more semantically accurate
than standard SQL processing.
This new hierarchical processing prototype does not require that the data
be in a fixed format or that the data structure be predefined. The data can be
stored in standard first normal form relational tables, flat files, or hierarchical
prerelational or postrelational databases such as a legacy database or a nested
relational database. The data structure can be specified dynamically, giving it
The Suppliers over Parts example in Figure 12.4 does reference the associ-
ation table to include the QNT (quantity) column. This value is known as
intersecting data because its data is meaningful at the point of intersection (i.e.,
the quantity of a given part for a given supplier), as also explained in Chapter 7.
This intersecting data appears to be a value associated with the Parts table since
values in the association table will always appear to be a value from the lower
level table, as shown in Figure 12.4.
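Where intersecting data appears can be seen in a small Suppliers over Parts sketch using Python's sqlite3 module. The SP association table and all rows are hypothetical. The point is that QNT, although stored in the association table, comes out alongside each part under its supplier.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Suppliers (SNo TEXT PRIMARY KEY, SName TEXT);
CREATE TABLE Parts (PNo TEXT PRIMARY KEY, PName TEXT);
CREATE TABLE SP (SPSNo TEXT, SPPNo TEXT, QNT INTEGER);  -- association table
INSERT INTO Suppliers VALUES ('S1', 'Smith'), ('S2', 'Jones');
INSERT INTO Parts VALUES ('P1', 'Nut'), ('P2', 'Bolt');
INSERT INTO SP VALUES ('S1', 'P1', 300), ('S1', 'P2', 200), ('S2', 'P1', 400);
""")

# Suppliers over Parts; SP only links them, but its QNT column is selected.
rows = con.execute("""
    SELECT SName, PName, QNT FROM Suppliers
    LEFT JOIN SP ON SNo = SPSNo
    LEFT JOIN Parts ON SPPNo = PNo
    ORDER BY SName, PName""").fetchall()

for sname, pname, qnt in rows:
    print(sname, pname, qnt)
# Jones Nut 400
# Smith Bolt 200
# Smith Nut 300
```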
processing capability. It often happens that a stored view is used where it is not
necessary to access all the tables defined for the desired result. With standard
inner join views, it is always necessary that all tables in the view be accessed.
This not only adds overhead but often produces incorrect results caused by
accessing unneeded tables, which in turn can cause replicated data values and
lost data. With outer join views, this unnecessary data access can be avoided.
The example in Figure 12.7 is identical to the previous example in
Figure 12.6, except in this example no data is selected from the Dependent
table. In this case, the hierarchical relational prototype determines from the
semantics of the data structure that the Dependent table does not need to be
accessed (see the Access column in the data structure table above). Notice that
the result of the SQL query statement in the example above, without the
Dependent data and access to the Dependent table, remains consistent with the
previous example. This proves that this optimization works in this situation.
12.6 Conclusion
This chapter has demonstrated an innovative SQL processor prototype that
operates on disparate heterogeneous data in a high-level hierarchical manner.
Previously, SQL processing of disparate heterogeneous data always used the
lowest common denominator structure—the flat structure. With standard
SQL’s capability to directly model and process hierarchical structures, there is
no longer a need to map structured data into a flat structure when hierarchical
structures are being modeled. Besides the ease and efficiency of one-to-one
mapping, the powerful hierarchical semantics of the modeled data structure are
maintained and utilized.
The live hierarchical SQL examples presented in this chapter prove a
number of things about the DSE technology. First, the DSE software operates
as expected—it does extract the data structure meta information embedded in
the outer join. Second, it can be utilized to develop products like the hierarchi-
cal relational processor that would not be possible otherwise with standard
SQL. Third, and most importantly, it proves the data modeling technology
behind the DSE software is valid and does work. This means the outer join
does indeed inherently support the data modeling of complex data structures
consisting of multiple legs, and one-to-many, many-to-one, and many-to-many
relationships. Fourth, it demonstrates this technology is useful and viable.
13
Object/Relational Interface
The outer join’s object/relational interface capability is the best showcase for
the features and capabilities of the outer join. It uses all the inherent features
and attributes of the outer join and the advanced capabilities made possible by
the DSE technology described in Chapter 9. But the most powerful operation
at work is the interaction and synergism of these capabilities. These capabilities
and their interrelationships are represented in Figure 13.1. This chapter will
cover each capability and attribute in the diagram and explain its function,
importance, and interaction with those capabilities it enhances. Other object/
relational capabilities introduced in SQL:1999 are described in Chapter 8.
This chapter covers each object feature shown in the diagram in
Figure 13.1, one or more times. At the top of the diagram, the standard SQL
outer join operation acquires its object-enabling capabilities and attributes.
These capabilities and attributes are ANSI standardization, dynamic opera-
tion, and a powerful data modeling capability that enables complex data structure
processing.
Figure 13.2 Object/relational interface transfers data to and from structured memory.
[Figure 13.4 diagrams: the EmpView and DpndView structured views built over the Dept, Emp, and Dpnd tables.]
and reusability, but can be applied to all forms of database access and inheri-
tance described in Section 13.4. Because of outer join optimizations described
in Chapter 11, they do not necessarily add inefficiencies.
The EmpView and DpndView structured views shown in Figure 13.4 are
hierarchical as represented in the diagram, indicating they would be combined
with a LEFT outer join. Another possibility that may give more desirable
results depending on the situation is to join the tables using a FULL natural
outer join to create a logical table, as described in Chapter 7. With this approach, the
Coalesce function can be very useful for data inheritance when the same data
types exist in both tables and one or the other needs to be used or overridden;
for example, COALESCE (Person.Birthdate, Employee.Birthdate). In this way,
Birthdate would be supplied if it existed in either table, and if it existed in both
tables, the Birthdate value from the Person table would be used since it is the
first one specified in the Coalesce function.
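The Coalesce precedence rule can be checked with a short sketch using Python's sqlite3 module and hypothetical Person and Employee rows. It uses a LEFT JOIN rather than a FULL join because FULL OUTER JOIN is available only in SQLite 3.39.0 and later; with a FULL join, employees missing from Person would also appear.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT, Birthdate TEXT);
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Birthdate TEXT);
INSERT INTO Person VALUES (1, 'Mary', '1970-01-01'), (2, 'John', NULL), (3, 'Mike', NULL);
INSERT INTO Employee VALUES (1, '1971-02-02'), (2, '1980-03-03');
""")

# Person.Birthdate wins when both exist; Employee.Birthdate fills in when Person lacks one.
rows = con.execute("""
    SELECT Name, COALESCE(Person.Birthdate, Employee.Birthdate)
    FROM Person LEFT JOIN Employee ON Person.Id = Employee.Id
    ORDER BY Person.Id""").fetchall()
print(rows)
# [('Mary', '1970-01-01'), ('John', '1980-03-03'), ('Mike', None)]
```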
have to be concerned about using the most limited view available for the query.
One large view can serve for many smaller subviews. This increases data
abstraction for the user and helps reusability by allowing one view to be used
efficiently in many applications. Efficiency is derived from the possible seman-
tic optimizations and database navigation that supplies the means to implement
the optimizations.
The optimizations utilize the hierarchical structure modeled by the outer
join so they will also work seamlessly on nonrelational databases. Another opti-
mization that offers powerful capabilities for object databases is the dynamic
rewriting of outer join requests that can automatically utilize advanced capabili-
ties in the underlying database system as they become available. This was
described in Chapter 11 and is shown in Figure 13.5. These include SQL:1999
object capabilities and functions that can be used to perform direct navigation
to bypass costly joins. This means that SQL outer join views do not have to be
associated with slow, join-bound processing. This can improve the
performance of inheritance, described in Section 13.4, so that it becomes prac-
tical to use. Since data modeling and structure processing can be improved by
outer join optimizations, all capabilities that depend on them are likewise
improved.
Figure 13.5 SQL:1999 navigation can avoid joins while maintaining view semantics.
13.8 Conclusion
The data modeling and data structure processing ability of the outer join
coupled with the data structure meta information extraction technology
(Chapter 9) can produce the capabilities and attributes shown in Figure 13.1.
These capabilities interact with each other to produce features that are
more powerful than when taken alone. Used together, they help make a very
powerful object/relational interface that has the capabilities required of an
object database and at the same time has the features and characteristics of a
relational interface.
The capabilities presented in this chapter were not accomplished by graft-
ing on new features that do not meld with relational operation, or by arbitrarily
defining new semantics for SQL. The standard SQL outer join operation inher-
ently and seamlessly supplies the framework for the capabilities discussed and
shown in this chapter.
14
Nonrelational SQL-Based Universal Data Access
The growth of the database market resulted in a variety of vendors releasing
SQL products having diverse features, including disparate data types, data access
interfaces, and dialects of SQL. There was demand in the database community
for commonality and the ability to use a single SQL dialect and single program-
ming interface in standards-compliant SQL products.
The SQL database companies cooperated to develop standards for the
language and then standards for the data access programming interface. The
international standard SQL Call-Level Interface (SQL/CLI) was published in
1995 and Microsoft aligned its ODBC specification with that standard. When
Java was developed, the JDBC™ specification adopted many of the conven-
tions used with ODBC and SQL/CLI, such as supporting the same SQL
language.
ODBC, SQL/CLI, and JDBC support the use of the SQL OUTER JOIN, and their
APIs provide execution-time capabilities for determining whether a specific
database supports OUTER JOIN. ODBC and JDBC™ share a common escape
sequence for expressing an OUTER JOIN in interoperable SQL statements.
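The escape sequence has the form {oj outer-join}, and a driver translates it into the native outer join syntax of its database. The Python sketch below is a hypothetical, deliberately simplified translation for a backend that already accepts LEFT OUTER JOIN (here SQLite via the sqlite3 module). It ignores nested escapes and braces inside string literals, which a real driver must handle.

```python
import re
import sqlite3

# Hypothetical driver-side handling of the "{oj <outer-join>}" escape: the
# backend already accepts standard outer join syntax, so the braces are dropped.
OJ_ESCAPE = re.compile(r"\{oj\s+(.*?)\}", re.IGNORECASE | re.DOTALL)

def translate_oj_escapes(sql: str) -> str:
    return OJ_ESCAPE.sub(lambda m: m.group(1), sql)

portable = ("SELECT DeptName, EmpName FROM "
            "{oj Dept LEFT OUTER JOIN Emp ON DeptNo = EmpDeptNo} "
            "ORDER BY DeptName, EmpName")
native = translate_oj_escapes(portable)

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Emp (EmpName TEXT, EmpDeptNo INTEGER);
INSERT INTO Dept VALUES (10, 'Sales'), (30, 'Shipping');
INSERT INTO Emp VALUES ('Mary', 10);
""")
print(native)
print(con.execute(native).fetchall())  # [('Sales', 'Mary'), ('Shipping', None)]
```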
Besides ODBC and JDBC™, a variety of other application program-
ming interfaces (APIs) were developed to provide universal data access. Like
ODBC and JDBC™, they exploit SQL as the language for accessing data.
Using these frameworks with SQL to access a nonrelational data source is feasi-
ble using specialized software, such as database drivers and data providers, for
that data source. With the appropriate driver, you can use SQL to access spreadsheets,
CODASYL databases and plain text files (if the text files are structured text).
Many nonrelational data sources are hierarchical in structure, but it’s pos-
sible to interface seamlessly to them using SQL. The outer join’s data modeling
ability provides one more powerful tool for SQL-based universal data access.
To demonstrate the outer join’s power, this chapter presents a method that enables
standards-based data access frameworks to seamlessly process structured data
records. This process can be applied to hierarchically structured data, such as
XML, IMS, SAS, and Adabas data.
Structured record processing is usually the last legacy type access that is
implemented by SQL-based universal data access products. Because of the way
the structured data is contiguously stored in structured records, SQL has had a
difficult task interpreting its makeup and mapping it to a relational data struc-
ture. This chapter will show how the ANSI outer join operation can naturally
map these hierarchical structures and how their contiguous structure makeup
can be accessed seamlessly by standard SQL-based universal access frameworks.
Some SQL products are starting to support nested relations, where a given col-
umn of a table can itself contain multiple rows and columns of data. These
nested relations can form hierarchical structures very similar to structured
records, and for this reason can be processed in a similar fashion to that shown
in this chapter.
01 Div.
   10 DivName      Pic X(20).
   10 ProdCnt      Pic 99.
   10 DeptCnt      Pic 99.
   10 Dept Occurs 0 To 50 Times
         Depending On DeptCnt.
      20 DeptName  Pic X(20).
      20 EmpCnt    Pic 99.
      20 Emp Occurs 0 To 50 Times
            Depending On EmpCnt.
         30 EmpName Pic X(20).
   10 Prod Occurs 0 To 50 Times
         Depending On ProdCnt.
      20 ProdName  Pic X(20).

Structure: Div is the root, with Dept and Prod below it; Emp is below Dept.

Division data:
Div    Dept    Emp    Prod
DivX   DeptA   Ron    ProdX
               Mary   ProdY
       DeptC   Mark

Structured record: DivX 2 2 DeptA 2 Ron Mary DeptC 1 Mark ProdX ProdY
01 Div.
   10 DivName Pic X(20).
   10 ProdCnt Pic 99.
   10 DeptCnt Pic 99.
   10 Dept Occurs 0 To 50 Times
         Depending On DeptCnt.
      20 DeptName Pic X(20).
      20 Emp Occurs 2 Times.
         30 EmpName Pic X(20).
   10 Prod Occurs 0 To 50 Times
         Depending On ProdCnt.
      20 ProdName Pic X(20).
Structure: Div is the root, with children Dept and Prod; Emp is a child of Dept.
Division data: DivX; DeptA (Ron, Mary); DeptC (Mark); ProdX, ProdY.
Structured record: DivX 2 2 DeptA Ron Mary DeptC Mark Null ProdX ProdY
Figure 14.2 View of a structured data record with “fixed occurs” Emp segment.
DEFINE DivView AS
SELECT * FROM Div
LEFT JOIN Dept
ON DivKey=DeptDivFkey
LEFT JOIN Emp
ON DeptKey=EmpDeptFkey
LEFT JOIN Prod
ON DivKey=ProdDivFkey
Structure: Div is the root, with children Dept and Prod; Emp is a child of Dept.
Data: DivX DeptA Ron Mary DeptC Mark ProdX ProdY
the only view necessary for accessing the structured data record. Because of the
SQL optimization documented in Chapter 11, Section 11.3, this view always
eliminates unnecessary table accesses for each specific use of the view. This
means there is never a penalty for using this global view.
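The DivView definition can be tried on a relational engine. The following minimal sketch uses Python's built-in sqlite3 module. SQLite has no DEFINE statement, so CREATE VIEW stands in for it, and the integer key values and the explicit column list are assumptions made for illustration. It loads the DivX record's segments as four tables and queries the view two ways.

```python
import sqlite3

# Segment tables for the DivX example record. Key values are hypothetical;
# the join columns follow the DivView definition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Div  (DivKey INTEGER, DivName TEXT);
CREATE TABLE Dept (DeptKey INTEGER, DeptDivFkey INTEGER, DeptName TEXT);
CREATE TABLE Emp  (EmpKey INTEGER, EmpDeptFkey INTEGER, EmpName TEXT);
CREATE TABLE Prod (ProdKey INTEGER, ProdDivFkey INTEGER, ProdName TEXT);

INSERT INTO Div  VALUES (1, 'DivX');
INSERT INTO Dept VALUES (1, 1, 'DeptA'), (2, 1, 'DeptC');
INSERT INTO Emp  VALUES (1, 1, 'Ron'), (2, 1, 'Mary'), (3, 2, 'Mark');
INSERT INTO Prod VALUES (1, 1, 'ProdX'), (2, 1, 'ProdY');

-- CREATE VIEW plays the role of the DEFINE statement.
CREATE VIEW DivView AS
SELECT DivName, DeptName, EmpName, ProdName
FROM Div
LEFT JOIN Dept ON DivKey = DeptDivFkey
LEFT JOIN Emp  ON DeptKey = EmpDeptFkey
LEFT JOIN Prod ON DivKey = ProdDivFkey;
""")

# The sibling Dept/Emp and Prod paths are replicated against each other.
rows = conn.execute(
    "SELECT * FROM DivView ORDER BY DeptName, EmpName, ProdName").fetchall()

# A query that only needs the Prod path still uses the same global view.
prods = conn.execute(
    "SELECT DISTINCT ProdName FROM DivView ORDER BY ProdName").fetchall()

for row in rows:
    print(row)
print(prods)
```

The six rows are the flat relational form of the structure: each employee path is paired with each product. The sketch makes no claim that SQLite performs the table-elimination optimization described in Chapter 11; it only shows one view serving differently scoped queries.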
Set Structured Record Buffer address to start of structured record input data.
Set current position in View Definition to root segment definition.
Init Internal Segment Address Stack to empty.
the data is for read-only purposes and will not be updated. Another possible
optimization is to defer invoking this segment decomposition routine until after
the root segment of the active record has been processed. This is possible
because the root segment is processed first, before the lower level segments of
the record are required. The root segment is the leading segment and is
accessible without performing the segment decomposition routine. This is an
optimization because the root segment very often contains record selection or
join qualification criteria that may cause further processing of the record to
be bypassed, in which case the record never needs to be decomposed.
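The decomposition step and the root-first optimization can be sketched as follows. This is a minimal Python sketch, not the routine outlined above: it assumes the record is a string of whitespace-separated tokens rather than fixed-width Pic fields, it uses nested loops in place of the internal segment address stack, and the integer keys it generates for each segment occurrence are hypothetical.

```python
from __future__ import annotations

from typing import Iterator, Optional

# Assumption: whitespace-separated tokens stand in for the fixed-width
# Pic X(20) / Pic 99 fields of the COBOL record definition.
RECORD = "DivX 2 2 DeptA 2 Ron Mary DeptC 1 Mark ProdX ProdY"


def read_root(record: str) -> dict:
    """Read only the root (Div) segment: DivName, ProdCnt, DeptCnt."""
    div_name, prod_cnt, dept_cnt = record.split()[:3]
    return {"DivName": div_name, "ProdCnt": int(prod_cnt), "DeptCnt": int(dept_cnt)}


def decompose(record: str) -> dict[str, list[tuple]]:
    """Split one structured record into flat segment rows with generated keys."""
    tokens: Iterator[str] = iter(record.split())
    tables: dict[str, list[tuple]] = {"Div": [], "Dept": [], "Emp": [], "Prod": []}
    div_key = 1  # hypothetical surrogate key for this record's root
    div_name = next(tokens)
    prod_cnt, dept_cnt = int(next(tokens)), int(next(tokens))
    tables["Div"].append((div_key, div_name))
    # Dept occurrences come first; each carries its own Emp count.
    for _ in range(dept_cnt):
        dept_key = len(tables["Dept"]) + 1
        dept_name, emp_cnt = next(tokens), int(next(tokens))
        tables["Dept"].append((dept_key, div_key, dept_name))
        for _ in range(emp_cnt):
            tables["Emp"].append((len(tables["Emp"]) + 1, dept_key, next(tokens)))
    # Prod occurrences follow the last Dept occurrence.
    for _ in range(prod_cnt):
        tables["Prod"].append((len(tables["Prod"]) + 1, div_key, next(tokens)))
    return tables


def process(record: str, wanted_div: str) -> Optional[dict[str, list[tuple]]]:
    """Root-first optimization: test the root before decomposing anything."""
    if read_root(record)["DivName"] != wanted_div:
        return None  # record bypassed; its lower segments are never decomposed
    return decompose(record)


print(process(RECORD, "DivX"))
print(process(RECORD, "DivY"))
```

The counts read from the root and Dept segments are what make the contiguous record decodable; without them the boundaries between occurrences would be unknown.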
If the structured record is to be updated, including inserting of segment
occurrences, the structured record must also be moved into a hierarchically
linked structure, or at least expanded while it is being mapped. This will allow
for the insertion of segment occurrences. An updated structured record is written
back out by first compressing it back into a contiguous structured record. This
process is much easier than expanding the data structure, since the structure has
already been mapped.
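Compressing the mapped segments back into a contiguous record is the reverse walk. In this minimal sketch, the segment rows are held in Python lists shaped like a decomposition's output and include a hypothetical newly inserted employee, Sue. The record layout is again simplified to whitespace-separated tokens, and the occurrence counts are recomputed so that the insertion is reflected in the written record.

```python
# Hypothetical segment rows shaped like a decomposition's output:
# Div(DivKey, DivName), Dept(DeptKey, DivKey, DeptName),
# Emp(EmpKey, DeptKey, EmpName), Prod(ProdKey, DivKey, ProdName).
# Emp 4 (Sue) is a newly inserted occurrence under DeptA.
TABLES = {
    "Div": [(1, "DivX")],
    "Dept": [(1, 1, "DeptA"), (2, 1, "DeptC")],
    "Emp": [(1, 1, "Ron"), (2, 1, "Mary"), (3, 2, "Mark"), (4, 1, "Sue")],
    "Prod": [(1, 1, "ProdX"), (2, 1, "ProdY")],
}


def compose(tables: dict, div_key: int = 1) -> str:
    """Write the segments for one Div back out as a contiguous record."""
    div_name = next(name for key, name in tables["Div"] if key == div_key)
    depts = [d for d in tables["Dept"] if d[1] == div_key]
    prods = [p for p in tables["Prod"] if p[1] == div_key]
    # Root fields in definition order: DivName, ProdCnt, DeptCnt.
    out = [div_name, str(len(prods)), str(len(depts))]
    for dept_key, _, dept_name in depts:
        emps = [e[2] for e in tables["Emp"] if e[1] == dept_key]
        out += [dept_name, str(len(emps)), *emps]  # counts are recomputed
    out += [p[2] for p in prods]
    return " ".join(out)


print(compose(TABLES))
```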
It is worth noting that languages that can define hierarchical structures,
including COBOL, C, C++, XML, Java, Haskell, have the procedural flexibil-
ity to define structures that do not conform to good structure definition princi-
ples. These can cause problems for mapping procedures like the one in Figure
14.4. The most important rule to observe when defining a hierarchical struc-
ture is to keep each segment’s data definition contiguous. This means that once
a lower level child segment type is defined, it should indicate the end of the par-
ent segment. Any remaining segment data is ambiguous to the structure defini-
tion process.
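The contiguity rule can be checked mechanically. This hypothetical checker takes a record layout as (level, name, is_segment) tuples, modeled on the COBOL Div definition in this chapter, and reports any data field that appears after a child segment of its parent, which is the ambiguous placement the rule forbids. The DivRegion field is invented for the failing case.

```python
def contiguity_violations(layout):
    """Return (field, segment) pairs where a segment's field follows one of
    its child segments, i.e., the segment's data is not contiguous."""
    stack = []  # open segments as [level, name, has_child_segment]
    violations = []
    for level, name, is_segment in layout:
        # Close every open segment at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent = stack[-1] if stack else None
        if is_segment:
            if parent is not None:
                parent[2] = True  # parent's own fields must end here
            stack.append([level, name, False])
        elif parent is not None and parent[2]:
            violations.append((name, parent[1]))
    return violations


# (level, name, is_segment), following the COBOL Div record definition.
GOOD = [
    (1, "Div", True),
    (10, "DivName", False), (10, "ProdCnt", False), (10, "DeptCnt", False),
    (10, "Dept", True), (20, "DeptName", False), (20, "EmpCnt", False),
    (20, "Emp", True), (30, "EmpName", False),
    (10, "Prod", True), (20, "ProdName", False),
]
# Hypothetical DivRegion field placed after Div's child segments.
BAD = GOOD + [(10, "DivRegion", False)]

print(contiguity_violations(GOOD))  # []
print(contiguity_violations(BAD))   # [('DivRegion', 'Div')]
```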
[Figure: SQL, Middleware Product, Data Provider/Driver]
these intermediate virtual tables, any order of SQL requests from the universal
data access interfaces can be handled in a direct fashion, including updates.
With the outer join modeling the structured data record, this method produces
a truly seamless interface process with the SQL-based universal data access
interfaces.
Because structured records on file are more easily addressed through their
root segment, this can affect processing of SQL WHERE and ON clauses that
reference data in lower level segments in structured records. For root references,
the structured record processor in Figure 14.5 can directly address the required
structured records on file, while for lower level references it will have to
sequentially search through the contents of the selected structured records
unless a secondary index is available.
indication in the root segment and then using the proper structure overlay to
process it. A similar technique can be used for SQL queries to ensure that only
records of a specific format are processed by selecting on the format indication.
This is usually appropriate for queries, since a query usually requires only one
format at a time. This format selection can be specified as follows:
SELECT EmpNo FROM StructuredView WHERE DeptNo=123 AND StructuredFormat=2
In this example, the DeptNo and StructuredFormat fields are located in the root
segment. This technique works because the structured record can be retrieved and
its root segment tested without the need to decompose the structured record, as
discussed in Section 3 of this chapter.
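Format selection through the root segment can be sketched as follows. The two record formats, the token layout, and the helper names are hypothetical: format 1 carries an EmpCnt followed by that many EmpNo values, and format 2 has exactly two EmpNo slots, with Null marking an unused one. The root fields (StructuredFormat, DeptNo) are tested first, and only a qualifying record is read with the overlay for its format.

```python
from __future__ import annotations

# Hypothetical records; the root holds StructuredFormat and DeptNo.
RECORDS = [
    "1 123 3 E1 E2 E3",  # format 1: root, EmpCnt, then EmpCnt EmpNo values
    "2 123 E4 Null",     # format 2: root, then exactly two EmpNo slots
    "2 456 E5 E6",
]


def read_root(record: str) -> tuple[int, int]:
    """Return (StructuredFormat, DeptNo) from the leading root fields."""
    fmt, dept_no = record.split()[:2]
    return int(fmt), int(dept_no)


def emps_format1(record: str) -> list[str]:
    tokens = record.split()
    count = int(tokens[2])
    return tokens[3:3 + count]


def emps_format2(record: str) -> list[str]:
    # Fixed-occurs overlay: two EmpNo slots; "Null" marks an unused slot.
    return [emp for emp in record.split()[2:4] if emp != "Null"]


OVERLAYS = {1: emps_format1, 2: emps_format2}


def select_emp_no(records: list[str], dept_no: int, fmt: int) -> list[str]:
    """Rough analogue of SELECT EmpNo ... WHERE DeptNo=? AND StructuredFormat=?"""
    result = []
    for record in records:
        rec_fmt, rec_dept = read_root(record)  # no decomposition needed here
        if rec_fmt == fmt and rec_dept == dept_no:
            result.extend(OVERLAYS[fmt](record))
    return result


print(select_emp_no(RECORDS, 123, 2))  # ['E4']
```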
14.8 Conclusion
A variety of applications make use of structured, hierarchical and tagged data.
When structured data blocks are written out to a file, they are accessible as
structured records. This chapter has shown how these structured records can be
seamlessly processed by SQL. In order to demonstrate this, it was shown how
structured records are composed and decomposed for access. It was then shown
how SQL processing can seamlessly map to and from a decomposed structured
record. Finally, it was shown how SQL structured record access can be imple-
mented seamlessly using SQL data access with APIs such as ODBC, SQL/CLI
and JDBC. This structured data example was used because it can be easily
adapted to operate with all other physical forms of hierarchical data.
Part IV
Advanced Data Structure Processing
Capabilities
Part IV describes the new capabilities for supporting SQL hierarchical process-
ing with advanced and extended operations. Chapter 15 introduces advanced
lower level structured data linking, opening new data modeling capabilities and
unlimited structure join capabilities. Chapter 16 covers three new ways to com-
bine data structures using joining, mashups, and table association for advanced
ways to heterogeneously integrate and filter data. Chapter 17 describes how to
dynamically increase data value and flexibility of queries, making them more
powerful, supporting hierarchical optimization, dynamic structure joining, and
the needed structure-aware processing. Chapter 18 covers how the lowest com-
mon ancestor (LCA) processing automatically supports multipath hierarchical
structure processing naturally in SQL. Chapter 19 introduces many forms of
data structure generation, looking forward and backward to support the
different types of variable structure generation that are discussed. Chapter 20
demonstrates semantically controlled data structure transformations involving
restructuring, reshaping, and data virtualization. Finally, Chapter 21 introduces
the new automatic processing of remote dynamic structured data, supporting
capabilities such as new software development techniques using social
collaboration.
15
Advanced Lower Structure Linking
Advanced lower structure linking applies to hierarchically linking to the lower
structure in a way that is not covered in the linking rules specified in Chapter 6.
Normally when linking to the lower structure, the root of the lower structure is
the only link point that can be referenced. This creates a valid hierarchy, and
one that can be built top to bottom as would normally be expected for a hierar-
chy. But there may be times when it is desirable to link to an existing lower
level structure not based on its root. This is actually possible, and it will form a
valid logical hierarchical structure with hierarchical semantics that are seam-
lessly compatible with standard SQL view processing.
[Figure: Manager linked to the Division view (DivView), whose Division node has Department and Product children, with the resulting structure]
defined in linking rule two in Chapter 6. An example of this operation with its
data structure diagram and SQL is shown in Figure 15.3. Even with multiple
paths to the lower structure, the root of the lower data structure is semantically
the link point and the standard SQL outer join semantically and operationally
supports this derived data structure. The lower level structure, which is usually
built before it is joined, is filtered when joined according to the link criteria.
This is the same process that occurs when structures are built bottom-up and
throwaways (retrieved row discards) occur, as was described in Chapter 11. In
the example below, the Division view is filtered according to the Manager link
value as it is linked. This means as each manager is linked to the Division
view, only the Department and/or Product for which that particular employee
SELECT *
FROM Manager
LEFT JOIN DivView
ON DeptMgr=Mgr
Figure 15.5 Single path nonroot reference to lower structure data example.
with no other data since Jim is a product manager and not a department man-
ager, and the linking was based on department managers. Notice that all the
other data on the nonfiltered paths are not filtered out. This structured result
also reflects the same result (minus the replicated data) applied relationally, as
can be seen by applying the link criteria to each row in the Cartesian product
in Figure 15.4.
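The single path case can be checked relationally with a small sqlite3 sketch. The table layouts and every name except Jim are hypothetical: Ann and Bob manage departments, and Jim manages a product. Linking Manager to DivView on the nonroot Dept node keeps the Product data on the unfiltered path, and Jim, who manages no department, is preserved with no other data.

```python
import sqlite3

# Hypothetical data: Ann and Bob manage departments, Jim manages a product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manager (Mgr TEXT);
CREATE TABLE Div  (DivNo INTEGER, DivName TEXT);
CREATE TABLE Dept (DeptDivNo INTEGER, DeptName TEXT, DeptMgr TEXT);
CREATE TABLE Prod (ProdDivNo INTEGER, ProdName TEXT, ProdMgr TEXT);
INSERT INTO Manager VALUES ('Ann'), ('Bob'), ('Jim');
INSERT INTO Div  VALUES (1, 'D1');
INSERT INTO Dept VALUES (1, 'Sales', 'Ann'), (1, 'Ops', 'Bob');
INSERT INTO Prod VALUES (1, 'Widget', 'Jim'), (1, 'Gadget', 'Kay');

CREATE VIEW DivView AS
SELECT * FROM Div
LEFT JOIN Dept ON DivNo = DeptDivNo
LEFT JOIN Prod ON DivNo = ProdDivNo;
""")

# Link to the nonroot Dept node of the lower structure.
rows = conn.execute("""
SELECT Mgr, DivName, DeptName, ProdName
FROM Manager
LEFT JOIN DivView ON DeptMgr = Mgr
ORDER BY Mgr, DeptName, ProdName""").fetchall()
for row in rows:
    print(row)
```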
SELECT *
FROM Manager
LEFT JOIN DivView
ON DeptMgr=Mgr
OR ProdMgr=Mgr
Figure 15.6 Multiple path nonroot reference to lower structure data example.
but not Departments. As stated previously, the multiple path semantics dem-
onstrated here were covered in Chapter 5 under sibling leg semantics.
This structured result also reflects the same result applied relationally,
as can be seen by applying the link criteria to each row in the Cartesian product
in Figure 15.4. This result may seem ambiguous, since in some cases Products
are filtered and in other cases Departments are filtered. However, it does link
the Manager structure to the DivView structure hierarchically, and it can be
useful provided that the filtered values are used in summaries only when they
match the resulting semantics.
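The multiple path version differs only in its ON clause. In this sqlite3 sketch, which reuses hypothetical data of the same shape (only Jim comes from the text), a department manager filters the Product path, while Jim, a product manager, now matches through ProdMgr, so his Products are filtered to Widget and both Departments remain.

```python
import sqlite3

# Hypothetical data: Ann and Bob manage departments, Jim manages a product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manager (Mgr TEXT);
CREATE TABLE Div  (DivNo INTEGER, DivName TEXT);
CREATE TABLE Dept (DeptDivNo INTEGER, DeptName TEXT, DeptMgr TEXT);
CREATE TABLE Prod (ProdDivNo INTEGER, ProdName TEXT, ProdMgr TEXT);
INSERT INTO Manager VALUES ('Ann'), ('Bob'), ('Jim');
INSERT INTO Div  VALUES (1, 'D1');
INSERT INTO Dept VALUES (1, 'Sales', 'Ann'), (1, 'Ops', 'Bob');
INSERT INTO Prod VALUES (1, 'Widget', 'Jim'), (1, 'Gadget', 'Kay');

CREATE VIEW DivView AS
SELECT * FROM Div
LEFT JOIN Dept ON DivNo = DeptDivNo
LEFT JOIN Prod ON DivNo = ProdDivNo;
""")

# Either sibling path may satisfy the link; the other path stays unfiltered.
rows = conn.execute("""
SELECT Mgr, DivName, DeptName, ProdName
FROM Manager
LEFT JOIN DivView ON DeptMgr = Mgr OR ProdMgr = Mgr
ORDER BY Mgr, DeptName, ProdName""").fetchall()
for row in rows:
    print(row)
```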
A final word about multiple paths and sibling path semantics. The Divi-
sion view (DivView) in Figure 15.3 was used to demonstrate multiple path
semantics using the Department and Product tables. These semantics were first
described in Chapter 5, which documented how sibling leg semantics relied on
the “common parent” domain to determine and control the semantics. The
common parent of the Department and Product segments is the Division seg-
ment, which also happens to be the root segment of the Division structure.
Note that this is a coincidence—the root of a structure does not automatically
operate as a common parent. This means that semantics of multiple path lower
level references could become complex, with many different common parents
[Figure: Manager linked (LINK 1, LINK 2) to the DivView view, with the resulting optimized structure]
EmpView View:
DEFINE EmpView AS
SELECT * FROM Employee
LEFT JOIN Dependent
ON EmpNo=DpndEmpNo
WHERE DpndAge<19
Department View, Expanded View:
SELECT * FROM Department
LEFT JOIN
   Employee LEFT JOIN Dependent
   ON EmpNo=DpndEmpNo
ON DeptNo=DpndDeptNo
AND DpndAge<19
15.8 Conclusion
Nonroot lower level structure linking is a powerful capability that extends the
outer join’s data modeling capability and hierarchical processing. While it does
not follow hierarchical processing rules precisely, it does generate hierarchical
structures with hierarchically correct semantics, and extends this hierarchical
data modeling ability automatically and naturally to relational and
nonrelational database processing.
16
Dynamic Structure Combining by
Joining, Mashups, and Association
So far in this book, the data modeling has consisted of building the hierarchical
structure one table or node at a time. In this chapter, we will look at the joining
of fully formed hierarchical structures and their hierarchical joining into larger
hierarchical structures that contain the combined semantics of both structures.
The query process that joins the hierarchical structures can also query the
combined structure, allowing for improved processing results that cannot be
gained by accessing each structure separately. In this chapter, we will look at
three ways to join hierarchical structures: using the standard structure join
method; using a newly discovered method that allows for advanced data structure
mashups; and using a powerful association table when the structures are not
directly related. The standard structure join method allows joining to any
location in the upper hierarchical structure, but requires joining to the root
of the lower structure. This is intuitively valid semantically. The LEFT JOIN is
used to hierarchically preserve unmatched data on the left side of the join
operation. The LEFT JOIN ON clause join criteria are specified at each join
point, giving absolute hierarchical control.
static joins not easily modifiable. The join operation will materialize each struc-
ture and link them together by following the ON clause specification in the
same way this book has shown how hierarchical structures have been built. The
ON clause operation is represented in Figure 16.1 by the arrow connecting the
Division and Department node boxes. This join is invoked by the dynamic
SELECT query operation, directly following the previously defined static view
definition in this example. The materialized structure is freed up after the query
is processed. The SQL source provides the metadata that defines the structure.
This automatically combines structures by joining their metadata and defines
the structure so that the hierarchically combined structure can be automatically
navigated internally when processed. Like XML, the SQL source can directly
act as the hierarchical metadata defining the hierarchical data structure.
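A dynamic structure join of two static views can be sketched in sqlite3. The table layouts and data values are hypothetical; DivisionView and DeptView are each complete structures, and the dynamic SELECT links Division to the root (Department) of the lower structure, preserving a Division that has no matching data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Division   (DivNo INTEGER, DivName TEXT);
CREATE TABLE Product    (ProdDivNo INTEGER, ProdName TEXT);
CREATE TABLE Department (DeptNo INTEGER, DeptDivNo INTEGER, DeptName TEXT);
CREATE TABLE Employee   (EmpNo INTEGER, EmpDeptNo INTEGER, EmpName TEXT);
CREATE TABLE Dependent  (DpndEmpNo INTEGER, DpndName TEXT);
INSERT INTO Division   VALUES (1, 'East'), (2, 'West');
INSERT INTO Product    VALUES (1, 'P1');
INSERT INTO Department VALUES (10, 1, 'Sales');
INSERT INTO Employee   VALUES (100, 10, 'Ann');
INSERT INTO Dependent  VALUES (100, 'Tim');

-- Two static views, each a complete hierarchical structure.
CREATE VIEW DivisionView AS
SELECT * FROM Division LEFT JOIN Product ON DivNo = ProdDivNo;
CREATE VIEW DeptView AS
SELECT * FROM Department
LEFT JOIN Employee  ON DeptNo = EmpDeptNo
LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
""")

# Dynamic structure join: the ON clause links Division to the root
# (Department) of the lower structure.
rows = conn.execute("""
SELECT DivName, ProdName, DeptName, EmpName, DpndName
FROM DivisionView
LEFT JOIN DeptView ON DivNo = DeptDivNo
ORDER BY DivName""").fetchall()
for row in rows:
    print(row)
```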
Department View:
Department
LEFT JOIN Employee
ON DeptNo=EmpDeptNo
LEFT JOIN Dependent
ON EmpNo=DpndEmpNo
join criteria in Figure 16.5, it does not seem intuitively correct for a hierarchical
structure. It is not modeling a valid hierarchical structure on the face of this
operation. The join points, when linking below the lower-level root in Figure
16.5, do not appear to define the resulting structure or any valid hierarchical
structure. In reality, the resulting structure and SQL semantics displayed are
valid as shown. This is a mashup because it eliminates joining restrictions,
allowing unlimited joining possibilities.
It was shown in Figure 16.3 that the lower-level structure was not sequentially
joined one node at a time to the upper structure that had already been built,
but was first totally materialized before being joined to the upper structure.
This allows two powerful capabilities. It allows lower logical structures to be
treated as physical structures that are fully materialized before being joined to
the upper structure. This internally makes lower-level linking feasible. This was
the first hurdle. The second hurdle was finding valid consistent semantics for
linking below the root. It turns out that a materialized hierarchical structure has
its own fully formed hierarchical semantics that are controlled from its root
downward. Linking below the root (as shown in Figure 16.5) does have a hier-
archical filtering effect on the result, which makes sense semantically and is
desirable. This will cause unmatched data node occurrences below the
employee node to be filtered out, and it can also filter out higher-level nodes
above Employee if all employees are filtered out. This will result in no matches
for the lower-level structure.
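Linking below the root of a fully materialized lower structure can be sketched in sqlite3. Everything here is hypothetical, including the EmpDivNo column on Employee that the ON clause uses to link Division to the Employee node rather than to the Department root. Only Ann matches East, so Bob and his dependent Liz are filtered out, and Ops disappears because all of its employees are filtered out; North matches nothing and is preserved alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Division   (DivNo INTEGER, DivName TEXT);
CREATE TABLE Department (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Employee   (EmpNo INTEGER, EmpDeptNo INTEGER,
                         EmpDivNo INTEGER, EmpName TEXT);
CREATE TABLE Dependent  (DpndEmpNo INTEGER, DpndName TEXT);
INSERT INTO Division   VALUES (1, 'East'), (3, 'North');
INSERT INTO Department VALUES (10, 'Sales'), (20, 'Ops');
INSERT INTO Employee   VALUES (100, 10, 1, 'Ann'), (101, 10, 2, 'Bob'),
                              (102, 20, 2, 'Cy');
INSERT INTO Dependent  VALUES (100, 'Tim'), (101, 'Liz');

-- The lower structure, fully formed before it is joined.
CREATE VIEW DeptView AS
SELECT * FROM Department
LEFT JOIN Employee  ON DeptNo = EmpDeptNo
LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
""")

# Link below the root: the ON clause references Employee, not Department.
rows = conn.execute("""
SELECT DivName, DeptName, EmpName, DpndName
FROM Division
LEFT JOIN DeptView ON DivNo = EmpDivNo
ORDER BY DivName""").fetchall()
for row in rows:
    print(row)
```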
Since the lower structure is fully formed with its own hierarchical seman-
tics, the lower-level structure can be linked anywhere below the root. The
resulting semantics always enforce the linking from the upper level structure to
the root of the lower level structure, as shown in Figure 16.5. This preserves the
semantics of the lower level structure.
criteria is matched. When only the Project node is matched, the Project node is
filtered instead of the Feature node, which remains unfiltered. The reverse
would be true if only the Feature node was matched. If both nodes qualify
together, both perform filtering from both sides. If this join criterion
used an AND condition instead of an OR condition, then both sides would
need to qualify at the same time in order to find a match. This demonstrates
very complex and flexible control for mashups. This is the same logic as
accessing the lower structure directly with a multipath query. Both cases should
be, and are, processed the same way, which confirms that this logic is correct.
the Department view by reversing the input and output. This new structure
will make the intersecting data available, as shown in the EmpProj node. The
association table will have to be updated as needed. This can be done by an
automated procedure. The EmpProj and ProjEmp nodes represent the two
one-to-many relationships derived from the many-to-many association table in
Figure 16.8.
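An association table can be read in both one-to-many directions. In this sqlite3 sketch the table layouts and data are hypothetical; the same EmpProj association table is joined once from Employee toward Project, and once, under the alias ProjEmp, from Project toward Employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmpKey INTEGER, EmpName TEXT);
CREATE TABLE Project  (ProjKey INTEGER, ProjName TEXT);
CREATE TABLE EmpProj  (EmpKey INTEGER, ProjKey INTEGER);  -- association table
INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Project  VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO EmpProj  VALUES (1, 1), (1, 2), (2, 2);
""")

# One-to-many direction 1: Employee -> EmpProj -> Project.
emp_to_proj = conn.execute("""
SELECT EmpName, ProjName
FROM Employee
LEFT JOIN EmpProj ON Employee.EmpKey = EmpProj.EmpKey
LEFT JOIN Project ON EmpProj.ProjKey = Project.ProjKey
ORDER BY EmpName, ProjName""").fetchall()

# Direction 2: the same association table, aliased as ProjEmp, read in reverse.
proj_to_emp = conn.execute("""
SELECT ProjName, EmpName
FROM Project
LEFT JOIN EmpProj AS ProjEmp ON Project.ProjKey = ProjEmp.ProjKey
LEFT JOIN Employee ON ProjEmp.EmpKey = Employee.EmpKey
ORDER BY ProjName, EmpName""").fetchall()

print(emp_to_proj)
print(proj_to_emp)
```

Cy (no projects) and Gamma (no employees) are preserved by the LEFT joins, as the hierarchical semantics require.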
16.10 Conclusion
This chapter has shown how full multipath hierarchical structures can be easily
and powerfully combined hierarchically into larger structures that both
enhance and strengthen the embedded semantics, and can be dynamically pro-
cessed in queries. This can be a heterogeneous combination of powerful logical
and physical hierarchical structures. This combining of structures can be per-
formed using standard structure joining, new powerful mashups, and the asso-
ciation tables that are used when matching relationships are not available.
Hierarchical query filtering, which was necessary to understand the lower level
structure data filtering described in this chapter, was also touched upon.
17
Dynamically Increasing Data Value and
Flexibility
Beyond their ability to organize data, hierarchical data structures have power
that is not fully realized today. They have the inherent ability to significantly
increase data value beyond the value of the data collected. Hierarchical
structures naturally and automatically capture more meaning than is stored in
the data, increasing the value of the data they store. This chapter explains why
this is so and shows ways in which hierarchical structures can be used to
significantly increase data value. Once this is understood, hierarchical
structures should be used more than they are today in order to take advantage
of their large amount of unused data value and its flexible utilization.
track of by the hierarchical structure. In this case, Employees 1 and 2 each have
two different sets of Dependents. This multiple data occurrence shared across
the parent utilizes multiple data sharing. This ability to keep track of multiple
hierarchical sets of data objects is not only useful, but it further increases the
data value by separating and containing the data, and this automatic operation
continues to make the data more valuable.
Combining the Division and Department views in Figure 17.5 was very easy, and
it significantly increased the resulting structure’s data value beyond the total
data value of using each structure separately. Many new queries can be specified
that were not possible before, and these queries are made significantly more
powerful by combining the extensive semantics of both structures. It is easy to
see why this seamless combined structure offers many times the capability of
each of its structures taken separately.
SELECT * FROM DivisionView LEFT JOIN DeptView
ON DivNo=DeptDivNo
The resulting structure produced by the above dynamic SELECT join statement is
the same as the one shown in Figure 17.5. This shows how easily these dynamic
views can be specified, modified, and created at any time without having to
define them beforehand in a view that makes them static.
paths to data. Relational structures are logical structures that are created or
modified when needed. XML and IBM’s IMS are examples of the use of physi-
cal access methods that operate with fixed, physical data structures. Those data
structures are inflexible, whereas logical structures (like relational) are flexible,
efficient, and dynamic when delivering the benefit of data independence.
However, fixed structures actually become logical structures when retrieved by
SQL and stored as relational data values. This can be supported by introducing
physical views. The difference between physical views and logical views is that
the ON clause join condition for fixed structures equates node types instead of
fields in nodes. This is because fixed hierarchical structures are either
contiguous, like XML, or already linked internally, like IBM’s IMS. In effect,
they are already joined, so only the linking node types are necessary to define
the physical view. This is shown in the XML view below:
This view, when processed, converts the physical structure into a number
of hierarchically related tables that are defined as a relational structure
using a series of LEFT outer joins. This physical data support automatically
enables the seamless heterogeneous processing of a mixture of fixed and logical
structures, because when they are processed, both are logical relational
structures defined by SQL’s LEFT outer join operator. The hierarchical LEFT
outer join will seamlessly process them as a single virtual hierarchical
structure.
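The conversion of a contiguous physical structure into hierarchically related tables can be sketched with Python's standard xml.etree.ElementTree and sqlite3 modules. This is not the XML view referred to above: the sample document is hypothetical, and because relational tables need explicit join columns, the sketch generates a node key and a parent key for every element and equates those in the ON clauses, standing in for the node-type linking that a physical view uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical contiguous XML structure (not the book's XML view).
XML = """<Division name="East">
  <Department name="Sales">
    <Employee name="Ann"/>
    <Employee name="Bob"/>
  </Department>
  <Department name="Ops"/>
</Division>"""


def shred(xml_text: str) -> dict:
    """Give each element a generated key and record its parent's key."""
    tables: dict = {}
    next_key = 0

    def walk(elem, parent_key):
        nonlocal next_key
        next_key += 1
        key = next_key
        tables.setdefault(elem.tag, []).append((key, parent_key, elem.get("name")))
        for child in elem:
            walk(child, key)

    walk(ET.fromstring(xml_text), None)
    return tables


conn = sqlite3.connect(":memory:")
for tag, rows in shred(XML).items():
    # One table per node type; tag names come from the trusted sample above.
    conn.execute(f"CREATE TABLE {tag} (node_key INTEGER, parent_key INTEGER, name TEXT)")
    conn.executemany(f"INSERT INTO {tag} VALUES (?, ?, ?)", rows)

result = conn.execute("""
SELECT d.name, p.name, e.name
FROM Division d
LEFT JOIN Department p ON p.parent_key = d.node_key
LEFT JOIN Employee e   ON e.parent_key = p.node_key
ORDER BY p.name, e.name""").fetchall()
print(result)
```

Once shredded, the XML data is processed by the same LEFT joins as any logical relational structure, so it could be joined with relational tables in one query.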
[Figure: Div over Prod, with Prod marked as the LCA of Feat and Proj]
automatic and easily available. This makes the data accessed more readily avail-
able to nontechnical users, and therefore, more valuable.
17.9 Conclusion
This chapter has described how hierarchical data structures can be very power-
ful and useful for storing data and in the processing of this data by continually
increasing the value of the stored data. The stored predefined static data value
increases in use and information value as the view structure is expanded
through data modeling. At this point, the data view structures can be dynami-
cally combined, greatly increasing the data value. And finally, the query con-
trols how the data value is further increased in order to answer the query. This
automatic multilevel data reuse and dynamic combining of the hierarchical
structures makes the hierarchical structures a very useful data storage structure
that automatically leverages its increasing data value. This automatic increasing
of data value in the hierarchical structure and hierarchical queries is an
incredibly powerful, overlooked, and underutilized capability.
18
Automatic Multipath Hierarchical
Structure Operations
SQL hierarchical processing became possible with the introduction of the
SQL-92 standard by using its new one-sided LEFT (Outer) JOIN operation
and its accompanying ON clause addition. This supplied the necessary hierar-
chical principles, data preservation operations, and specific node linking con-
trol. What was unexpected was that this new operation also enables automatic
multipath hierarchical processing and its required special lowest common
ancestor (LCA) processing that makes this possible, which is discussed in this
chapter.
This chapter introduces and demonstrates a number of multipath hierar-
chical structure operations that are new for SQL. Many are made possible by
SQL’s inherent hierarchical processing. Some are new for hierarchical processing
in general. Some advanced new hierarchical processing operations that are made
possible by automatic hierarchical optimization, and that can be introduced
seamlessly into an SQL hierarchical processor, will be shown; these operations
are seamlessly supported by this hierarchical optimization. While some are not
inherent SQL operations, they can be easily added to SQL and automatically
performed in SQL. These described capabilities include: focused aggregated data
retrieval; multipath LCA hierarchical processing; schema-free processing; and
hierarchical data filtering.
different set of data combinations is used for the different LCAs. This is per-
formed automatically and naturally in the SQL, as shown in Figure 18.7.
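Which node acts as the LCA for a pair of node types can be read off the structure definition. This small sketch assumes a parent map for the Division structure used in these chapters (Div over Prod and Dept, Prod over Feat and Proj, Dept over Emp). It only identifies the LCA node type; it does not reproduce the data-level LCA processing that SQL performs.

```python
# Assumed structure: child node type -> parent node type.
PARENT = {"Prod": "Div", "Dept": "Div", "Feat": "Prod", "Proj": "Prod", "Emp": "Dept"}


def ancestors(node: str) -> list[str]:
    """Path from a node type up to the root, including the node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lca(a: str, b: str):
    """Lowest common ancestor of two node types, or None if unrelated."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


print(lca("Feat", "Proj"))  # Prod
print(lca("Feat", "Emp"))   # Div
```

Feat and Proj meet at Prod, while Feat and Emp meet only at the root, so queries relating those pairs use different LCAs and therefore different data combinations.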
hierarchically above node Div; this should be treated as an error because this
operation has inadvertently and incorrectly changed the structure being pro-
cessed, and the SQL hierarchical processor would be operating incorrectly if it
proceeds. It will not let the query proceed.
down directions. All nodes can still be affected as shown. In this example, other
pathways are related and can be filtered. The A node can be affected and the fil-
tering will be reflected down the path through the B node to the C and D
nodes. If the A node data occurrences are filtered out, so are all lower-level
nodes data occurrences under it. This allows the entire hierarchical structure to
be easily filtered. The WHERE clause can be as complex as needed; there are no
limitations or restrictions.
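The downward effect of hierarchical filtering can be sketched on a tree of data occurrences. The A/B/C/D occurrences and the helper names are hypothetical. Removing an occurrence removes everything beneath it; the sketch shows only this downward direction, not the effect on other pathways.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Occ:
    """One data occurrence of a node in a hierarchical result."""
    node: str
    value: int
    children: list[Occ] = field(default_factory=list)


def filter_node(occ: Occ, node_type: str, keep) -> Occ | None:
    """Drop occurrences of node_type failing keep(); their subtrees go too."""
    if occ.node == node_type and not keep(occ):
        return None
    kept = [c for c in (filter_node(ch, node_type, keep) for ch in occ.children)
            if c is not None]
    return Occ(occ.node, occ.value, kept)


def labels(occ: Occ) -> list[str]:
    """List occurrence labels in top-down, left-to-right order."""
    return [f"{occ.node}{occ.value}"] + [n for ch in occ.children for n in labels(ch)]


# Hypothetical occurrences: A1 and A2 each own a B subtree.
forest = [
    Occ("A", 1, [Occ("B", 1, [Occ("C", 1), Occ("D", 1)])]),
    Occ("A", 2, [Occ("B", 2, [Occ("C", 2)])]),
]

# Filtering A (keep only value 1) removes A2 and everything under it.
result = [t for t in (filter_node(a, "A", lambda o: o.value == 1) for a in forest)
          if t is not None]
print([labels(t) for t in result])  # [['A1', 'B1', 'C1', 'D1']]
```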
18.9 Conclusion
This chapter has described a number of very powerful processing capabilities
that inherently support, or extend support for, advanced new multipath
processing capabilities. These described capabilities include: structure-aware
processing, focused aggregated data retrieval, multipath LCA hierarchical
processing, schema-free processing, nonlinear hierarchical ordering, and
hierarchical data filtering.
This chapter has shown how natural LCA in SQL processing keeps hier-
archical multipath processing operating accurately in hierarchical and relational
processing. LCA is an area that is not covered outside of academic research.
Today’s academic research involves supporting LCA using external functions
because it has not yet been recognized that LCA processing exists naturally in
SQL, as explained in this chapter. The LCA internal complexity issues shown
demonstrate why external LCA functions will not work: they cannot be applied
seamlessly, because they require structure knowledge from the user. This
chapter has shown how the inherent LCA processes naturally available in SQL
solve these LCA processing problems. This chapter also demonstrated how the
automatic accuracy is maintained and how it is achieved.
This chapter has shown how hierarchical structures are extremely useful
and powerful. This is why they should be used more in business and everyday
processing. If they come into wider use, their ability to be processed in
parallel automatically could be a huge benefit.
19
Variable Data Structure Generation
This book contains five different ways, each with different uses, to dynamically
generate hierarchical data structures in SQL. These include basic data modeling
building-block structures, the joining of structures to create new more powerful
structures, and the use of the SELECT operation to dynamically specify the
desired data to be aggregated, which condenses the structure and thereby changes
it. These previously described data structure operations are under the control
of the user. This chapter will demonstrate a fourth, more powerful concept of
having the data drive the variable data structure generation. A fifth way to
generate data structures, described later in Chapter 20, uses the current
semantics to transform the data into any new data structure required.
joined acts as a look-ahead operation. This works for the same reason that link-
ing below the root in Figure 19.1 works; the lower-level structure has already
been materialized, making the data available for testing even before the join is
performed.
in Section 19.2, when linking below the root was explained. This added test
controls the structure by the value in the data being tested and controls whether
or not the join is performed, depending on the value in the data. The possible
structure results are shown in Figure 19.3. The effect of this data value test
below the root of the lower-level structure is the same as separately querying the
lower structure. The location of the immediately affected node tested (node L
in this case) makes a difference in how this data qualification performs as it
spreads out to affect the entire structure. This can be seen in Figure 18.10.
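A data value test placed in the ON clause can decide, occurrence by occurrence, whether the lower structure is joined at all. In this sqlite3 sketch the tables U (upper structure), K (lower root), and L (child of K) and all values are hypothetical; only the test on node L's field l (l = 2) comes from the discussion here. The lower structure is materialized as the view KL before the join, so the test can look below its root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE U (ukey INTEGER);
CREATE TABLE K (kkey INTEGER, kukey INTEGER);
CREATE TABLE L (lkey INTEGER, lkkey INTEGER, l INTEGER);
INSERT INTO U VALUES (1), (2);
INSERT INTO K VALUES (10, 1), (20, 2);
INSERT INTO L VALUES (100, 10, 2), (101, 10, 3), (200, 20, 5);

-- Lower-level structure, materialized before the join.
CREATE VIEW KL AS
SELECT kkey, kukey, lkey, L.l AS l FROM K LEFT JOIN L ON kkey = lkkey;
""")

# The data value test on node L (l = 2) sits in the ON clause, so it
# decides whether the lower structure is joined for each U occurrence.
rows = conn.execute("""
SELECT ukey, kkey, lkey
FROM U
LEFT JOIN KL ON ukey = kukey AND l = 2
ORDER BY ukey""").fetchall()
print(rows)
```

U occurrence 1 is joined to the K and L data that passes the test, while U occurrence 2 is preserved without any lower-level data because its only L value is 5.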
OR operation must always be tested and applied together. L.l=2 produces one
set of results, whereas M.m=4 produces a different set of results. These are auto-
matically combined in SQL to give the correct result. The replacing of the OR
with an AND operation will usually produce a different result.
19.10 Conclusion
This chapter has shown us how to use and control variable data structure filter-
ing. All the other methods of data structure generation shown in this book are
directly driven by the user. The data driven method relies on the ON clause
capability to test data values in order to further qualify the data from the ON
clause join criteria. This is a very useful capability that is not generally available
elsewhere.
20
Semantically Controlled Data Structure
Transformations
Data structure transformation is a vague term. There are actually two basic
types of data structure transformations. The transformation terms restructuring
and reshaping have been used interchangeably with data structure
transformations, but the two terms bring to mind different types of
transformation. In this chapter, they are recategorized and defined as two
distinct types of data structure transformations. Restructuring uses data
relationships and reshaping uses data semantics
to perform their operations. SQL will be utilized to demonstrate these types of
data structure transformations using SQL’s natural hierarchical data structure
processing capability. Data structure virtualization can also be considered
another form of data structure transformation and is also covered in this
chapter.
these data relationships can involve the comparisons and formulations needed
to make the new relationship linkages.
Restructuring is used when new semantics are needed and data relationships are available to support them; the shape of the resulting structure is not a concern. Reshaping is used when a specific result structure is necessary, for example, when establishing parent-child relationships with a hierarchical data model. The resulting semantics are not a primary concern, but they are expected to be hierarchically derived from the source structure. Restructuring requires that the necessary data relationships be available, while reshaping has no external or internal requirements. Therefore, reshaping is always available.
To perform a transformation, SQL uses a multiple-structure-copy technique that transforms the data by using the semantics in the data. The transform examples show each copy of the structure and the operations applied to it. Unneeded nodes are indicated by dashed boxes, and unused paths are indicated by dashed lines. Nodes that take part in the operation have bolded names, indicating that these nodes are moved to the new result structure. Solid arrows indicate how the different structure levels are modeled together. If linking below the root is used, an additional dotted arrow indicates this linking.
20.1.1 Restructuring
Restructuring is performed by taking the structure apart and using the multiple-structure-copy technique to rebuild it in a different order, with different relationships, or both. Using this technique, the SQL restructure operation in Figure 20.1 slices out the Proj node and makes it the new root of the resulting structure by using the hierarchical LEFT outer join. The data relationship used in making this new structure may be the same one as before or a previously unused one. If the data structure was retrieved from a contiguous data structure such as XML, there may not have been any physical data relationships to use. On the other hand, once any type of data structure (physical or logical, contiguous or linked) is retrieved into a relational rowset, it can be freely taken apart. Reassembling it depends on the data relationships that can be made.
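The two-copy restructuring technique can be sketched under a few assumptions. EmpProjView is a hypothetical view in which Emp is over Proj. Copy Y supplies the new Proj root, and copy X supplies the Emp data, linked through the shared project id. SQLite stands in for the hierarchical processor, so DISTINCT and a small Python fold stand in for its postprocessing. This is an illustration, not the query in Figure 20.1.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp  (empid TEXT PRIMARY KEY);
CREATE TABLE Proj (projid TEXT, projempid TEXT);
INSERT INTO Emp  VALUES ('Emp01'), ('Emp02');
INSERT INTO Proj VALUES ('ProjA', 'Emp01'), ('ProjA', 'Emp02'), ('ProjB', 'Emp02');
-- Hypothetical Emp-over-Proj view, flattened into a relational rowset.
CREATE VIEW EmpProjView AS
  SELECT Emp.empid, Proj.projid
  FROM Emp LEFT JOIN Proj ON Emp.empid = Proj.projempid;
""")

# Copy Y supplies the new root (Proj); copy X supplies Emp, linked by projid.
# Employees without a project cannot become a Proj root, so they are skipped.
rows = conn.execute("""
SELECT DISTINCT Y.projid, X.empid
FROM EmpProjView Y LEFT JOIN EmpProjView X ON Y.projid = X.projid
WHERE Y.projid IS NOT NULL
ORDER BY Y.projid, X.empid
""").fetchall()

# Fold the rowset back into the new Proj-over-Emp hierarchy.
restructured = defaultdict(list)
for projid, empid in rows:
    restructured[projid].append(empid)
print(dict(restructured))  # {'ProjA': ['Emp01', 'Emp02'], 'ProjB': ['Emp02']}
```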
The alias feature of SQL (aside from its renaming use) allows multiple, separately named copies of a structure to be made in the rowset; these copies can be used to take apart the structure in Figure 20.1 and hierarchically reassemble it. The separately identified data from the two named copies of the
node in the same hierarchical structure position under the Emp node, preserving their hierarchy. This demonstrates that structures can be moved with a single join operation when they can be used “as is,” as in the Emp/Dpnd fragment at level Z. This is controlled by the SELECT list, which determines which data types, each associated with its structure copy level, are used for output.
Notice that, compared with the SQL in Figure 20.1, the SQL in Figure 20.2 adds a LEFT join that creates another copy of the structure, named Z. The Emp and Dpnd fields in the SELECT list now use the Z prefix so that they are referenced through the newly added LEFT join. This LEFT join is used to move the Emp node hierarchically under the Proj node. Also note that the X.Prod/Y.Proj node relationship uses linking below the root, shown by the dotted arrow. Linking below the root is used heavily in data structure transformation because it allows for unlimited ways in which structures can be joined and transformed.
20.2 Reshaping
Reshaping differs from restructuring in that it brings to mind a molding process: pieces of the structure are shifted around, as in molding a piece of clay. This means that there are no limitations on what the resulting structure can be, allowing any structure to be transformed into any other structure and enabling any-to-any structure transformations. The logic performing this type of transform lies in the semantics of the structure, which drive how the transform is performed. This preserves the basic semantics that are applied to the creation of the new structure: the relationships naturally implied by the physical or logical juxtaposition of the data in the structure flow with the structure as it is logically molded and controlled in the rowset. This produces the desired new structure while taking into consideration the current semantics represented by the data. Because no data relationships can be used, this joining process is synchronized by joining on the same matching data item in both structure copies being compared.
The following examples demonstrate data structure reshaping that covers linear-to-linear (single path), linear-to-nonlinear (multipath), nonlinear-to-linear, and nonlinear-to-nonlinear structure transforms. The previous restructuring used physical data relationships in the data to perform the transform. Reshaping does not use them; it uses a technique similar to restructuring, also based on multiple copies of the structure, but the copies are coordinated differently because data relationships are not available. This requires another method to coordinate multiple levels, so the relationship between the structures is now made between the same named data type in the two structure copies that are being moved into their required hierarchical order. This reshaping technology has been used before to invert linear structures and is expanded here to include reshaping of nonlinear structures. All examples use a dotted arrow to show the selected output data and how and where it is moved to the output structure.
three nodes represented by their data are selected in reverse order once each for
output, so that the three levels X, Y, and Z are squeezed together through natu-
ral node promotion to keep their structure and juxtaposition relationship,
which is naturally inverted and now has M-to-1 relationships. These copies are
kept synchronized by comparing data values in the separate copies.
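The inversion of a linear structure can be sketched the same way. The flattened DeptView rowset and its values are hypothetical. Copy Z supplies Dpnd, copy Y supplies Emp, and copy X supplies Dept, and each join matches the same named data item in two copies rather than using a data relationship. DISTINCT removes the replication that the extra copies introduce, leaving rows that read Dpnd over Emp over Dept with M-to-1 relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened rowset of a linear Dept/Emp/Dpnd structure.
CREATE TABLE DeptView (dept TEXT, emp TEXT, dpnd TEXT);
INSERT INTO DeptView VALUES
  ('Dept01', 'Emp01', 'Dpnd01'),
  ('Dept01', 'Emp01', 'Dpnd02'),
  ('Dept01', 'Emp02', 'Dpnd03');
""")

# Each copy contributes one level; the copies stay synchronized by comparing
# the same data item (emp, then dept) in two copies.
inverted = conn.execute("""
SELECT DISTINCT Z.dpnd, Y.emp, X.dept
FROM DeptView Z
  LEFT JOIN DeptView Y ON Y.emp  = Z.emp
  LEFT JOIN DeptView X ON X.dept = Y.dept
ORDER BY Z.dpnd
""").fetchall()

for row in inverted:
    print(row)
# ('Dpnd01', 'Emp01', 'Dept01')
# ('Dpnd02', 'Emp01', 'Dept01')
# ('Dpnd03', 'Emp02', 'Dept01')
```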
basically kept attached to the same nodes, as shown below in the derived result
structure. Emp over Dpnd remains the same while Emp over Dept has been
inverted.
[Figure: DeptView structure copies X and Y and the linear result structure.]
This example is similar to the previous example, but it places Emp under Prod instead of under Dept, making Dept and Emp siblings in this structure. This requires a third copy of the input structure, Z, that is also matched to Prod, because that is where Emp is being attached. The Emp node is accessed indirectly from Prod up to Dept and then back down to Emp, a powerful related semantic reshape operation. Basically, the third structure copy Z and the additional join in this example are necessary in order to move Emp from under Dept to under the Prod node. Because Dept becomes the data modeling root, this indirect link from Prod to Emp places Emp directly under Prod. This also utilizes the full semantics between the Prod and Emp nodes. It produces the desired meaningful result between Prod and Emp, but the closer the linkage is, the better the result. The semantics can become fuzzy, and there could be data loss when data relationships are reversed.
20.6 Conclusion
The SQL reshaping, restructuring, and virtualization shown here do take somewhat more complex and procedural programming, but they still use SQL’s automatic semantic processing to help control the processing and make it more automatic and correct. A high-level nonprocedural transform language could be designed for this transformation based only on the desired structure, and the SQL could be generated automatically or performed internally and automatically by SQL.
It may have occurred to you that the need to create multiple copies of the
structure in the examples to perform transformations could be costly in mem-
ory use. However, most SQL processors should optimize the use of multiple
copies of the structure by keeping and using only a single copy.
the place of static structured data for additional flexibility because it continues to obey the rules for hierarchical structures after it has been dynamically changed. Software that uses dynamic structures needs to accommodate the variable hierarchical structures by using automatic metadata maintenance. This will significantly speed up and automate changes to the operation of dynamic structured data.
multipath hierarchical structure before sending it off to user 3 for further pro-
cessing. This is done using the SQL hierarchical XML processor.
User 2 and user 3 are now working independently and concurrently. Both users 2 and 3 retrieve their structure input from user 1, input additional relational table data from their different user home locations, and join this data to their working data structures. After completing these tasks, user 2 and user 3 will both send their modified data structures to a common user 4 for further processing.
User 4 accepts the modified data structures from both user 2 and user 3, which operated concurrently, and hierarchically joins them together using a matching data item value between nodes B and X (B.b=X.x). User 4 then eliminates unneeded data items from the joined result by using SQL’s dynamic SELECT operation to select data items for output from nodes A, B, E, Y, and W. This SQL query looks like: SELECT A.a, B.b, E.e, Y.y, W.w FROM U2 LEFT JOIN U3 ON B.b=X.x. This slices out all nodes (C, D, Z, X, V) that were not referenced by the SELECT list and automatically condenses the result to the necessary data, as shown in Figure 21.1. This process is known as projection in relational processing and node promotion in hierarchical processing. The LEFT join operation hierarchically places user 2’s structure over user 3’s structure, connected by the ON clause specification B.b=X.x. This newly combined hierarchical structure in user 4 is sent back to user 1 for immediate review, processing, and output. The hierarchical data can be selectively output in different formats, each with different data selections, as shown in Figure 21.1.
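Node promotion, as triggered by user 4's SELECT list, can be sketched as a small function over a parent map. The shapes of U2 and U3 below are hypothetical (only the node names come from the text, and the real shapes are in Figure 21.1). Each selected node is attached to its nearest selected ancestor, so the unreferenced nodes C, D, Z, X, and V drop out of the result.

```python
def promote(parent, selected):
    """Return the structure left after node promotion.

    parent maps each node to its parent (None for the root); selected is the
    set of nodes referenced in the SELECT list. Each selected node is attached
    to its nearest selected ancestor, so unselected nodes are sliced out.
    """
    result = {}
    for node in parent:
        if node not in selected:
            continue
        ancestor = parent[node]
        while ancestor is not None and ancestor not in selected:
            ancestor = parent[ancestor]
        result[node] = ancestor
    return result

# Hypothetical shapes: U2 rooted at A, U3 rooted at X, with the LEFT join
# (ON B.b=X.x) placing X under B.
joined = {
    "A": None, "B": "A", "C": "A", "D": "B", "E": "B",   # U2
    "X": "B", "Y": "X", "Z": "X", "V": "Y", "W": "V",    # U3, joined under B
}
print(promote(joined, {"A", "B", "E", "Y", "W"}))
# {'A': None, 'B': 'A', 'E': 'B', 'Y': 'B', 'W': 'Y'}
```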
During this entire user-to-user collaboration process, the changing dynamic data structures and data types are automatically maintained and utilized transparently for the user as needed. This is indicated by the dynamic metadata dashed box shown in Figure 21.1. The user at each receiving location can also view the current active structure and its data types. However, the user does not need to know the structure in order to specify the query, because the maintained structure is automatically known and used inherently by the query processor. Different working data structure versions can also be saved and restored by the user at each user location.
[Figure: user locations U2 and U3, each running SQL against relational and hierarchical (Rel/Hier) data.]
when and while they are being used. These logical structures add flexibility to
hierarchical structures and efficiency to their new use.
21.8 Conclusion
All of the powerful and flexible capabilities mentioned in this chapter make multipath hierarchical structures and their hierarchical processing the perfect opportunity for this new dynamic structured data processing. SQL is a universally known query language, making it a perfect API, and it is enhanced by dynamic relational hierarchical processing technology. Single one-way data transmissions will also always be available to send to anyone at any time, because a receive-only, user-to-user version of the SQL hierarchical processor will be freely available to download and use to automatically view and utilize the one-way transmitted data structure. This enables dynamically created structured data to be sent anywhere and to be immediately available for the receiver to utilize automatically.
The real importance of dynamic structured data is that it remains accurate and precise even as it changes dynamically. Its operation is fast and immediate; no manual metadata updating is needed, and changes to metadata can occur automatically and seamlessly. This flexibility allows real-time parallel and network development collaboration using powerful hierarchical processing. It also enables unlimited new dynamic possibilities. These dynamic capabilities allow structured data processing to adapt automatically and accurately to the desired needs using powerful standard SQL hierarchical processing.
22
New SQL Hierarchical Processing
Technology and Discoveries
In this chapter, the new discoveries and methods that were derived from our research and used in our ANSI SQL hierarchical processor are reviewed. These newly discovered methods are necessary for this advanced product to operate. To start, this chapter first discusses what type of hierarchical processing is being performed, because there is some confusion as to what is necessary for a fully powered, professional-use relational database.
entire nodes from the view from the start of processing, and this optimization may not be picked up by standard relational optimization. Standard SQL query optimization is still required, but the preceding hierarchical optimization may have assisted it by reducing the complexity of the required relational optimization. This makes it possible to recognize optimizations that might otherwise be missed by a query optimizer, and so to operate more efficiently.
other node types in this example have multiple node data occurrences. For example, there are B1 and B2 data occurrences under the root node A1 data occurrence, and C1 and C2 data occurrences under the B1 data occurrence. Notice that the B1 data occurrence also has D1 and D2 data occurrences under it. The B2 data occurrence has a similar set of data occurrences under it.
The LCA can change dynamically. This happens when the WHERE clause uses an OR operation instead of an AND operation. Because data is qualified by selecting ranges from the side that is not directly qualified, the OR operation always needs to be processed on both of its sides. This means that, even if the left-side test of the OR is true, the right-side test must be performed too. As a result, the LCA can change for each side as it is tested and processed. This is valid and is also how the relational rowset Cartesian product works.
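To illustrate how each side of an OR can have its own LCA (lowest common ancestor), the following sketch computes the LCA of the nodes referenced by each side separately. The structure (C and D under B, B and E under root A) and the grouping of node references are hypothetical, and the function is a plain ancestor-list intersection rather than the processor's actual algorithm.

```python
def ancestors(parent, node):
    """Return node and its ancestors, nearest first, up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lca(parent, nodes):
    """Lowest common ancestor of all nodes referenced by one condition."""
    common = ancestors(parent, nodes[0])
    for node in nodes[1:]:
        above = set(ancestors(parent, node))
        # Keep the nearest-first order, so common[0] stays the lowest match.
        common = [a for a in common if a in above]
    return common[0]

# Hypothetical structure: B and E under root A; C and D under B.
parent = {"A": None, "B": "A", "C": "B", "D": "B", "E": "A"}

# WHERE (C.c = ... AND D.d = ...) OR (C.c = ... AND E.e = ...):
# the nodes referenced on each side of the OR.
or_sides = [["C", "D"], ["C", "E"]]
print([lca(parent, side) for side in or_sides])  # ['B', 'A']
```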
For example, using the data structure in Figure 22.1, the SQL query SELECT X.B, X.C, X.D, Y.E, Y.F, Y.G FROM Aboveview X LEFT JOIN Aboveview Y ON X.D=Y.E joins the substructure with nodes B, C, D over the substructure with nodes E, F, G. These substructures are connected between the D and E nodes. The X prefix identifies the upper-level substructure, while the Y prefix identifies the lower-level substructure.
This is very useful for XML, which does not generally use or need data keys
because XML structures are typically stored contiguously without keys associ-
ated with document components. These different transformations are further
explained directly below.
Restructuring is performed by breaking the structure apart and putting it
back together differently by using other data relationships that naturally exist in
the data. This introduces new meaning to the structure. On the other hand,
reshaping implies molding the structure into a new shape utilizing the natural
semantics in the data structure. This naturally changes the data structure, but
preserves the meaning and semantics in the structure. Restructuring and
reshaping usually have different uses. Reshaping can be used to match the
structure to an application, while restructuring is used to produce a structure
that is new in meaning and use. The processing of both of these techniques
can be combined.
22.13 Conclusion
This chapter highlights the research, new features, and new discoveries that were needed to support the SQL transparent hierarchical processor and its new capabilities presented in this and previous chapters. The new features and capabilities are:
23
SQL/XML: Operation, Politics,
Ramifications, and Solution
Most of today’s SQL product designers entered the database field with the advent of E.F. Codd’s relational database and model. The author entered the field earlier, designing commercial hierarchical query products when hierarchical databases were at their pinnacle. When RDBs arrived, he adapted these relational products to support hierarchical processing seamlessly and transparently. This gives him a unique perspective on relational and hierarchical databases: not only how they are different, but also how they are alike.
The SQL hierarchical processor mentioned in this and previous chapters
is described in greater detail in Chapter 24. In order to demonstrate and prove
how this processor supports hierarchical processing, transparent XML support
was implemented in it. Valid XML data structures are hierarchical and access to
the data is performed by navigating through the hierarchy of the data structure.
XML contains some operations that are not present in traditional SQL query
processing; these are described in this chapter.
The implementations of the ISO SQL/XML and W3C XQuery query standards were a disappointment to this author. The technical and political reasons contributing to this will be discussed later in this chapter. The SQL hierarchical XML processor mentioned above was designed to properly support hierarchical and XML processing. It will be used to demonstrate what is needed for an SQL solution to fully support XML.
[Figure: CustView hierarchical structure, with Cust over Invoice and Addr.]
Table 23.1
XML Element Mode Output
SELECT * FROM CustView FOR XML Element
<cust>
  <custid>Cust03</custid>
  <custstoreid>Store01</custstoreid>
  <custtext>Comment Five, Comment Six</custtext>
  <addr>
    <addrid>Addr03</addrid>
    <addrcustid>Cust03</addrcustid>
    <addrstate>NV</addrstate>
    <addrtext>This is addr text</addrtext>
  </addr>
</cust>
<cust>
  <custid>Cust02</custid>
  <custstoreid>Store01</custstoreid>
  <custtext/>
  <invoice>
    <invid>Inv03</invid>
    <invcustid>Cust02</invcustid>
    <invstatus>O</invstatus>
    <invtext/>
  </invoice>
  <addr>
    <addrid>Addr04</addrid>
    <addrcustid>Cust02</addrcustid>
    <addrstate>CA</addrstate>
    <addrtext/>
  </addr>
  <addr>
    <addrid>Addr02</addrid>
    <addrcustid>Cust02</addrcustid>
    <addrstate>CA</addrstate>
    <addrtext/>
  </addr>
</cust>
<cust>
  <custid>Cust01</custid>
  <custstoreid>Store01</custstoreid>
  <custtext>Comment One, Comment Two, Comment Three, Comment Four</custtext>
  <invoice>
    <invid>Inv02</invid>
    <invcustid>Cust01</invcustid>
    <invstatus>O</invstatus>
    <invtext/>
  </invoice>
  <invoice>
    <invid>Inv01</invid>
    <invcustid>Cust01</invcustid>
    <invstatus>P</invstatus>
    <invtext/>
  </invoice>
  <addr>
    <addrid>Addr01</addrid>
    <addrcustid>Cust01</addrcustid>
    <addrstate>CA</addrstate>
    <addrtext/>
  </addr>
</cust>
Table 23.2
XML Attribute Mode Output
SELECT * FROM CustView FOR XML Attribute
<cust custid="Cust03" custstoreid="Store01" custtext="Comment Five,
Comment Six">
<addr addrid="Addr03" addrcustid="Cust03" addrstate="NV"
addrtext="This is addr text"/>
</cust>
<cust custid="Cust02" custstoreid="Store01" custtext="">
<invoice invid="Inv03" invcustid="Cust02" invstatus="O" invtext=""/>
<addr addrid="Addr04" addrcustid="Cust02" addrstate="CA" addrtext=""/>
<addr addrid="Addr02" addrcustid="Cust02" addrstate="CA" addrtext=""/>
</cust>
<cust custid="Cust01" custstoreid="Store01" custtext="Comment One,
Comment Two, Comment Three, Comment Four">
<invoice invid="Inv02" invcustid="Cust01" invstatus="O" invtext=""/>
<invoice invid="Inv01" invcustid="Cust01" invstatus="P" invtext=""/>
<addr addrid="Addr01" addrcustid="Cust01" addrstate="CA" addrtext=""/>
</cust>
Table 23.3
XML Mixed Mode Output
SELECT * FROM CustView FOR XML Mixed
<cust custid="Cust03" custstoreid="Store01">
Comment Five, Comment Six
<addr addrid="Addr03" addrcustid="Cust03" addrstate="NV">This is
addr text</addr>
</cust>
<cust custid="Cust02" custstoreid="Store01">
<invoice invid="Inv03" invcustid="Cust02" invstatus="O"></invoice>
<addr addrid="Addr04" addrcustid="Cust02" addrstate="CA"></addr>
<addr addrid="Addr02" addrcustid="Cust02" addrstate="CA"></addr>
</cust>
<cust custid="Cust01" custstoreid="Store01">
Comment One, Comment Two, Comment Three, Comment Four
<invoice invid="Inv02" invcustid="Cust01" invstatus="O"></invoice>
<invoice invid="Inv01" invcustid="Cust01" invstatus="P"></invoice>
<addr addrid="Addr01" addrcustid="Cust01" addrstate="CA"></addr>
</cust>
All of these XML output examples have the same hierarchical structure shown
in Figure 23.1.
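The difference between element mode and attribute mode can be sketched with Python's xml.etree.ElementTree. The nested (tag, fields, children) tuple is an assumed in-memory form of a trimmed Cust02 occurrence from the tables above, not the processor's internal format. In attribute mode each field becomes an attribute of its node's element; in element mode each field becomes a child element placed ahead of the node's child nodes.

```python
import xml.etree.ElementTree as ET

# A trimmed Cust02 occurrence as a (tag, fields, children) tuple.
# This in-memory shape is an assumption of the sketch.
cust02 = ("cust", {"custid": "Cust02", "custstoreid": "Store01", "custtext": ""}, [
    ("invoice", {"invid": "Inv03", "invcustid": "Cust02", "invstatus": "O", "invtext": ""}, []),
    ("addr", {"addrid": "Addr04", "addrcustid": "Cust02", "addrstate": "CA", "addrtext": ""}, []),
])

def attribute_mode(node):
    """Each field becomes an XML attribute of its node's element."""
    tag, fields, children = node
    elem = ET.Element(tag, fields)
    elem.extend([attribute_mode(child) for child in children])
    return elem

def element_mode(node):
    """Each field becomes a child element, placed before the child nodes."""
    tag, fields, children = node
    elem = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(elem, name).text = value
    elem.extend([element_mode(child) for child in children])
    return elem

attr_xml = ET.tostring(attribute_mode(cust02), encoding="unicode")
elem_xml = ET.tostring(element_mode(cust02), encoding="unicode")
print(attr_xml)
print(elem_xml)
```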
data is stored along with the XML data. This metadata includes the structure metadata, which allows the data structure formats to change dynamically, because the metadata that defines them changes dynamically to conform to the actual data format.
Similar to XML’s variable format, the SQL ON clause can be used to test data values at specific hierarchical node points to control whether or not the join is performed. This allows data structure generation to vary depending on the data values in the data structure being processed, making variable data generation more dynamic.
23.1.7 Namespaces
Namespaces are used in XML to solve naming conflicts. When SQL views are combined, the same naming conflicts occur as when XML documents are combined. Similar to the way XML handles its naming conflicts, SQL’s high-level name prefix can be added when referring to data types, making these names unique and preventing naming conflicts.
though it might hold a multipath structure, only one pathway can be ordered in the result set. This is because, in a single flat structure, once one pathway is ordered, all of the other paths become unordered. The SQL hierarchical XML processor solves this problem by seamlessly supporting separate ordering of each path, assisted by its postprocessing. Another problem with ordering hierarchical data is that ordering that goes against the structure can change the hierarchical structure, inadvertently producing invalid results. For example, in the structure employee over dependent, ordering dependent before employee changes the structure to dependent over employee; this is problematic because the structure has been changed without the processor knowing it, so invalid results are produced. This is why the SQL hierarchical processor does not allow this invalid ordering for hierarchical structures. This additional level of hierarchical processing requires structure-aware processing.
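A simplified version of such an ordering check can be sketched as follows. The rule used here, rejecting any ORDER BY in which a node is sorted before one of its own ancestors, is an assumption about how a structure-aware check might work, not the processor's documented behavior. The Emp over Dpnd structure matches the employee-over-dependent example in the text.

```python
def is_ancestor(parent, a, b):
    """True if node a is a proper ancestor of node b."""
    node = parent[b]
    while node is not None:
        if node == a:
            return True
        node = parent[node]
    return False

def order_is_valid(parent, order_by_nodes):
    """Reject an ORDER BY that sorts a descendant before its ancestor."""
    for i, earlier in enumerate(order_by_nodes):
        for later in order_by_nodes[i + 1:]:
            if is_ancestor(parent, later, earlier):
                return False
    return True

# Employee over dependent.
parent = {"Emp": None, "Dpnd": "Emp"}
print(order_is_valid(parent, ["Emp", "Dpnd"]))  # True
print(order_is_valid(parent, ["Dpnd", "Emp"]))  # False: would invert the structure
```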
made that would not be good for SQL and XQuery XML support. Unfortunately, most of these sacrifices were made, and we live with their limitations today in the ISO SQL/XML and W3C XQuery standards. Many of these are described next.
hierarchical looping logic; therefore, the placement of each data item that will
be output has to be performed carefully and possibly separately for each data
item. Adding or removing an output item in XQuery takes manual program-
ming and testing. In some cases, adding an output data item will require pro-
gramming an additional hierarchical nested loop level.
23.3.4.2 Dynamically Removing a Data Item from an SQL Query Is Easy
With SQL, adding or removing an output item is simply a matter of adding or removing it in the SELECT list, which can be done dynamically at view invocation and requires no additional procedural program logic or knowledge of the structure. With its automatic and truly nonprocedural processing, SQL performs only the processing necessary to produce the result for the data items specified in the SELECT list. XQuery has no automatic equivalent or similar general operation that can dynamically accommodate the simple ad hoc adding and removing of output data items. The SELECT list also automatically drives the different processing required from query to query, can dynamically control the output data structure, and automatically aggregates (condenses) the result nicely. These are important aspects of a query language, and XQuery is missing them.
23.4.1 The SQL Hierarchical XML Solution Stays Naturally within SQL
By utilizing SQL’s natural hierarchical processing, the SQL/XML solution
would stay within the SQL structured data box. In this way, the internal hierar-
chical processing is efficiently and naturally performed across the entire
multipath structure. This is possible because this hierarchical processing is a
natural subset of relational processing. This means that the hierarchical results
are not only hierarchically correct, but that they are also ANSI SQL relationally
and mathematically correct.
23.5 Conclusion
If there were no politics involved, SQL/XML would have favored structured XML data processing; instead, both the SQL/XML and XQuery standards favor semi-structured XML processing. It does not make sense for SQL, a natural structured data processor, not to favor structured XML processing. This has left a hole in SQL’s XML processing and is the reason that the SQL hierarchical XML processor mentioned in this chapter was developed: to handle structured XML data processing and to prove that its new and advanced hierarchical processing is possible. This is a solid SQL processor; it is designed for SQL users who want to include XML processing, not for XQuery users.
24
SQL Hierarchical XML Processor
Operation
The hierarchical processing capabilities of the SQL hierarchical XML processor are naturally and transparently performed in standard SQL. These include data modeling and processing of full multipath hierarchical structures; hierarchical joins of multipath data structures; full support and utilization of multipath hierarchical query semantics; basic hierarchical processing, such as node promotion and fragment processing; structure transformations; and more. The user does not have to know XML or be aware of its processing in order to use the SQL hierarchical XML processor. This allows SQL/XML development projects to begin immediately with no additional risk, design, development, training, or debugging effort, while consistently delivering efficient and hierarchically accurate results. The internal and external design of the SQL hierarchical XML processor will be covered in this chapter.
The fundamental belief that relational and hierarchical data and processing cannot be integrated seamlessly and fully is wrong. The SQL hierarchical XML processor described in this book is implemented using a breakthrough ANSI SQL-driven technology, described in Figure 24.1, that finally solves the SQL/XML integration problem. It is the first full nonlinear multipath hierarchical product for XML processing today. The SQL hierarchical XML processor automatically enables a standard SQL processor to transparently access, integrate, and process relational and native XML data at a full hierarchical processing level. This seamless operation ensures hierarchically accurate and correctly represented XML results. These capabilities are otherwise missing from XML processing, and they significantly reduce risk and increase ROI for the user.
Figure 24.1 Mapping logical hierarchical structure to and from relational rowset.
Figure 24.2 Mapping physical XML structure to and from relational rowset.
24.6.1 Preprocessor
Starting at the top of Figure 24.5, the preprocessor accepts an SQL query definition, an RDB definition, an XML schema, or a hierarchical definition. RDB definitions are checked for valid specification and for any modifications that may be necessary, and are then submitted to the standard SQL processor so that it can accept their RDB data when required. An XML schema is checked for a valid hierarchical structure definition. Hierarchical definitions define generic hierarchical structure input for dynamically and statically retrieved hierarchical input. For static relational data input, the preprocessor retrieves the data immediately using the dynamic hierarchical data input routine, which also converts it from hierarchical to relational data. The relational data is then stored by the preprocessor in separate relational tables in the static relational data box, for future input when requested by the standard SQL processor. For dynamic data, the data definition is updated so that it defines a dynamic external file or UDF call in the standard SQL processor that invokes the asynchronous access processor when external hierarchical data input is required. In this way, the dynamic hierarchical data input routine is invoked when external hierarchical data retrieval is necessary.
In the preprocessor, SQL queries are analyzed in order to determine and extract the data structure that is being accessed. This optimization eliminates pathways that do not need to be accessed. The optimized structure is passed through to the standard SQL processor for immediate query processing. The optimized metadata structure is also made available to the other subprocessors in the hierarchical processor so that they can perform structure-aware processing. Optimization is also important because it is needed to support global views with no overhead, and schema-free, navigationless processing.
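The pathway elimination performed by the preprocessor can be sketched as keeping only the nodes that lie on a path from the root to a node referenced by the query. The structure below is hypothetical, loosely modeled on the StoreView used in Chapter 25. Unlike node promotion, which removes unselected nodes from the output, this keeps intermediate nodes such as Cust because they are needed to reach Invoice.

```python
def prune(parent, referenced):
    """Keep only nodes on a path from the root to a referenced node."""
    keep = set()
    for node in referenced:
        # Walk up to the root, stopping early at a node whose path is kept.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent[node]
    return {n: p for n, p in parent.items() if n in keep}

# Hypothetical structure, loosely modeled on StoreView.
store_view = {
    "Store": None,
    "Cust": "Store", "Invoice": "Cust", "Addr": "Cust",
    "Emp": "Store", "Dpnd": "Emp", "EAddr": "Emp",
}
print(prune(store_view, {"Invoice"}))
# {'Store': None, 'Cust': 'Store', 'Invoice': 'Cust'}
```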
the dynamic hierarchical data input routine. This process is performed when
external hierarchical data is required. The dynamic hierarchical data input rou-
tine is used to retrieve both static and dynamic data requests. The standard
SQL processor has access to the dynamic hierarchical data (EII), static rela-
tional data (ETL), and standard RDB data.
24.6.4 Postprocessor
The postprocessor receives the result set from the standard SQL processor. The rowset is converted to a hierarchical structure that renormalizes the data by removing replicated data. Replicated data is data that has been replicated by the relational Cartesian joining process; these replications need to be removed from the resulting hierarchical structure. Duplicate data, for our purposes, is different from replicated data: duplicate data is valid data that represents identical data having the same key value. These duplicates should not be removed, but distinguishing duplicate data from replicated data requires special processing. This process is also used to keep keyless XML structures in their initial order, because order preservation is assumed in XML. XML does not need to use keys because the data in the structure is contiguous. The resulting hierarchical structure is output as structured, formatted XML. This requires structure-aware processing because the resulting structure can be different from the initial structure: joining hierarchical data structures dynamically extends the structure, changing it. It is also standard not to include empty nodes in the hierarchical result, so node promotion is used to remove them. This also affects the data structure result, as shown in Figure 24.3.
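The renormalization step can be sketched with SQLite and a small Python fold. The tables and data are hypothetical, and using SQLite's rowid as an occurrence identifier is an assumption of this sketch: it lets replicated copies of the same occurrence collapse while two genuinely duplicate Addr01 rows both survive. Customers with no invoice or address get an empty list instead of an empty node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cust    (custid TEXT);
CREATE TABLE Invoice (invid TEXT, invcustid TEXT);
CREATE TABLE Addr    (addrid TEXT, addrcustid TEXT);
INSERT INTO Cust    VALUES ('Cust01'), ('Cust02'), ('Cust03');
INSERT INTO Invoice VALUES ('Inv01', 'Cust01'), ('Inv02', 'Cust01'), ('Inv03', 'Cust02');
-- Two identical Addr01 rows: a genuine duplicate that must be kept.
INSERT INTO Addr    VALUES ('Addr01', 'Cust01'), ('Addr01', 'Cust01'), ('Addr03', 'Cust03');
""")

# Multipath LEFT joins replicate each Invoice once per Addr of the same Cust.
# rowid is carried along as an occurrence id (an assumption of this sketch).
rows = conn.execute("""
SELECT Cust.rowid, Cust.custid, Invoice.rowid, Invoice.invid, Addr.rowid, Addr.addrid
FROM Cust
  LEFT JOIN Invoice ON Invoice.invcustid = Cust.custid
  LEFT JOIN Addr    ON Addr.addrcustid   = Cust.custid
ORDER BY Cust.rowid, Invoice.rowid, Addr.rowid
""").fetchall()

def renormalize(rows):
    """Fold the rowset into Cust -> {invoice, addr} occurrences.

    Occurrences are identified by rowid, so replicated copies of the same
    occurrence collapse while duplicate values in distinct rows survive.
    Null-extended (empty) nodes are dropped rather than output.
    """
    custs = {}
    for c_id, custid, i_id, invid, a_id, addrid in rows:
        cust = custs.setdefault(c_id, {"custid": custid, "invoice": {}, "addr": {}})
        if i_id is not None:
            cust["invoice"][i_id] = invid
        if a_id is not None:
            cust["addr"][a_id] = addrid
    return [
        {"custid": c["custid"],
         "invoice": list(c["invoice"].values()),
         "addr": list(c["addr"].values())}
        for c in custs.values()
    ]

for cust in renormalize(rows):
    print(cust)
```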
24.7 Conclusion
This chapter has described how the SQL hierarchical XML processor operates internally and externally. What makes this XML processor novel is that it uses standard ANSI SQL syntax and semantics to make SQL naturally operate hierarchically. The chapter showed how SQL relational data processing can be mapped naturally to hierarchical data processing and back again. This is possible because hierarchical processing is a subset of relational processing, which has been proven by the SQL hierarchical XML processor. This chapter has shown where the relational-to-hierarchical and hierarchical-to-relational conversions take place and how heterogeneous processing is seamlessly performed. Also covered were the processes in the pre-, post-, and asynchronous processing that were necessary to make SQL operate hierarchically and support XML input and formatted XML output. It also showed how external static and dynamic hierarchical data access is achieved and how such data can be processed together.
25
SQL Hierarchical XML Processor Examples
This final chapter shows the SQL hierarchical XML processor in actual
operation, demonstrating its common and critical operations through a series of
executed query examples. The live query results are shown in XML using attribute
mode, which is the default for the SQL hierarchical XML processor. The
examples are annotated to help explain their internal and external operation.
Supporting XML requires the hierarchical processor to support full multipath
nonlinear processing. Figure 25.1 shows how the relational StoreView and its
subviews, CustView and EmpView, are represented hierarchically for use in
this chapter.
Figure 25.2 shows the hierarchical data and its hierarchical structure. This
can be used to further verify the XML hierarchical results in this chapter. Figure
25.3 contains the hierarchical structure definitions that define what the differ-
ent line and box figures mean. These are used to annotate the examples in this
chapter. All the operations shown in the examples of this chapter have been
explained in this book.
the SELECT operation. Node promotion, node collection, and fragment
operation rely on node selection and are discussed in this section.
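(Figure: StoreView hierarchical data occurrences. Store01 has Cust01, Cust02, Cust03, Emp01 (status F), and Emp02 (status null). Cust01 has Inv01 (P), Inv02 (O), and Addr01; Cust02 has Inv03 (O), Addr02, and Addr04; Cust03 has Addr03. Emp01 has Dpnd01 (D) and EAddr Addr01; Emp02 has EAddr Addr03.)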
though it has no lower-level invoice child. Examining the XML and hierarchi-
cal data tree in Figure 25.4, it can be seen that this is the correct SQL result.
(Figure diagram: the StoreView input structure, with Store over Cust and Emp, Cust over Invoice and Addr, and Emp over Dpnd and Eaddr, beside the Store/Cust/Invoice result structure.)
<root>
<store storeid="Store01">
<cust custid="Cust01">
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
</cust>
<cust custid="Cust02">
<invoice invid="Inv03"/>
</cust>
<cust custid="Cust03">
</cust>
</store>
</root>
Cust node in the SELECT list. Normal hierarchical operation is simply to skip
over the Cust node and keep accessing the other fields down the path. This is the
same operation as relational projection, which also slices out unselected
columns; here those columns can be equated to hierarchical nodes. This node
promotion operation can be turned off in order to keep the accessed structure
intact as it was, by specifying the KEEP NODE option in the FOR XML
clause at the end of the query. This is useful for XML navigation purposes,
where the same navigation logic defined for the original structure can still
be used.
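The node promotion just described can be sketched as a recursive projection over a data tree. In this hypothetical Python version, a node whose type is not in the selected set is sliced out and its surviving descendants are handed to the nearest selected ancestor; the `keep_nodes` flag loosely models the KEEP NODE option by leaving every accessed node in place. The `Node` class and `project` function are illustrative assumptions, not the processor's code, and the data is the Store01 sample from this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                 # node type, e.g. "cust"
    key: str                 # data occurrence key, e.g. "Cust01"
    children: list = field(default_factory=list)


def project(node, selected, keep_nodes=False):
    """Return the node list that replaces `node` after projection.

    Unselected node types are sliced out and their surviving
    descendants are promoted to the nearest selected ancestor.
    keep_nodes=True roughly models KEEP NODE: nothing is promoted.
    """
    kids = [kept for child in node.children
            for kept in project(child, selected, keep_nodes)]
    if keep_nodes or node.tag in selected:
        return [Node(node.tag, node.key, kids)]
    return kids  # node promotion: hand the children to the parent


store = Node("store", "Store01", [
    Node("cust", "Cust01", [Node("invoice", "Inv01"), Node("invoice", "Inv02")]),
    Node("cust", "Cust02", [Node("invoice", "Inv03")]),
    Node("cust", "Cust03"),
])

promoted = project(store, {"store", "invoice"})[0]
print([c.key for c in promoted.children])   # ['Inv01', 'Inv02', 'Inv03']
```

With Store and Invoice selected, the three invoices are promoted directly under Store01, and Cust03, which has nothing selected beneath it, drops out. If the root itself is unselected, `project` returns several promoted nodes at once, which is where a collection node such as root is needed.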
paths. This does not present a problem for the user since there are no special
requirements for specifying paths on the SELECT request. This is shown in the
following request, which selects fields from Dpnd and Invoice nodes that are
located on separate paths. As an additional test, two fields are selected from the
Dpnd node. The DpndCode is referenced first, which is out of node order and
separate from its other Dpnd node field DpndID. This demonstrates that there
are no data order requirements for users to worry about.
To make this query more interesting, SQL SELECT references for the
intervening Cust and Emp nodes have been excluded, causing node promotion
on both paths to occur in Figure 25.6. This causes the Store node to collect the
node promotion from both paths. This node collection processing is demon-
strated by the following generated structure and XML. It also shows the addi-
tional capability of changing the default collection node of “root” to
“storeview” using the FOR XML option of UNDER to specify a different col-
lection name.
(Figure: Cust qualified by WHERE CustID='Cust01', with its Invoice children.)
<root>
<cust custid="Cust01">
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
</cust>
</root>
that Inv03 did not qualify because its parent Cust occurrence Cust02 did not
qualify.
(Figure: Cust selected WHERE InvID='Inv01'.)
<root>
<cust custid="Cust01"/>
</root>
(Figure: Store and Invoice selected WHERE CustID='Cust01'; the input structure is Store/Cust/Invoice and the result structure is Store/Invoice.)
<root>
<store storeid="Store01">
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
</store>
</root>
of the previous two single path queries, which qualifies both down to invoice
and up to Store at the same time. Actually, with hierarchical processing, these
queries can be processed in either direction first. Top-down processing is usu-
ally more efficient because, if the root is disqualified, the children do not need
to be accessed.
(Figure: Emp selected WHERE InvID='Inv02'; data occurrences Cust01/Inv02 and Emp01, Emp02.)
<root>
<emp empid="Emp01"/>
<emp empid="Emp02"/>
</root>
(Figure: LCA Store01 relating Cust01/Inv02 to Emp01 and Emp02 under WHERE InvID='Inv02'.)
<root>
<store storeid="Store01">
<cust custid="Cust01">
<invoice invid="Inv02"/>
</cust>
<emp empid="Emp01"/>
<emp empid="Emp02"/>
</store>
</root>
(Figure: Cust01 is the LCA for the WHERE decision logic.)
<root>
<cust custid="Cust01"/>
</root>
they are not related by the same LCA data occurrence. In Figure 25.16, the
LCA data occurrence was Cust01, which was the common LCA occurrence of
both. In Figure 25.17, Inv02 and Addr02 under the Cust node are in different
parent data occurrences and are not considered meaningfully related, because
they have different LCAs. These are standard nonlinear hierarchical processing
rules, and SQL processing naturally follows them through its Cartesian
product-controlled data replication. Adding StoreID to the SELECT list does not
select StoreID at the higher level either, because the LCA and the Cartesian
product replicated data generation remain the same for the Invoice and Addr
nodes.
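The LCA test can be sketched with a parent map of data occurrences. In this illustrative Python version, the `PARENT` dictionary encodes the sample StoreView occurrences used in this chapter (an assumption about how the data would be held), and the hypothetical `lca` function walks up from each occurrence to find their lowest common ancestor occurrence.

```python
# Parent of each data occurrence (hypothetical encoding of the sample data).
PARENT = {
    "Cust01": "Store01", "Cust02": "Store01", "Cust03": "Store01",
    "Emp01": "Store01", "Emp02": "Store01",
    "Inv01": "Cust01", "Inv02": "Cust01", "Addr01": "Cust01",
    "Inv03": "Cust02", "Addr02": "Cust02", "Addr04": "Cust02",
    "Addr03": "Cust03",
}


def ancestors(occ):
    """Return occ followed by its ancestors up to the root occurrence."""
    chain = [occ]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lca(a, b):
    """Lowest common ancestor occurrence of two data occurrences."""
    on_a = set(ancestors(a))
    for occ in ancestors(b):  # nearest ancestors of b are checked first
        if occ in on_a:
            return occ
    return None


print(lca("Inv02", "Addr01"))  # Cust01
print(lca("Inv02", "Addr02"))  # Store01
```

Inv02 and Addr01 meet at Cust01, so they are related under the same Cust occurrence, whereas Inv02 and Addr02 meet only at Store01.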
would not be selected. One might think that it should not be selected because
the left side tests true only for InvID='Inv02', but it is selected because the
other sibling (right) side, AddrID='Addr01', was also tested and tested true.
That test qualified both invoices under the LCA Cust01.
The above description means that both sides of the OR condition always
need to be tested at the hierarchical query level. This result and hierarchical
logic can be proven by breaking the query into two queries, each with one side
of the WHERE clause and unioning the results together. This logic also results
in LCA qualification logic being dynamically switched between the left and
right OR condition, depending on which side is true, which is tested below in
the examples in Figure 25.18. This double-sided testing of OR conditions on
the relational WHERE clause is naturally performed by the Cartesian product,
building all combinations so that both sides of the WHERE clause are eventu-
ally tested over multiple rows containing replicated data, so that the following
query operates correctly in relational hierarchical processing.
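The double-sided OR testing and its union proof can be checked directly against a Cartesian product. This Python sketch uses `sqlite3` with hypothetical Cust, Invoice, and Addr tables populated from this chapter's data. Because both sibling paths hang off Cust, each Cust01 row pairs an invoice with Addr01, so `AddrID = 'Addr01'` is true on the row carrying Inv01 as well as on the row carrying Inv02. The sketch then shows that unioning the two single-sided queries gives the same invoices. The f-string assembly of SQL is only for this fixed demonstration text, not for user input.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Cust    (CustID TEXT PRIMARY KEY);
    CREATE TABLE Invoice (InvID TEXT PRIMARY KEY, InvCustID TEXT);
    CREATE TABLE Addr    (AddrID TEXT PRIMARY KEY, AddrCustID TEXT);
    INSERT INTO Cust    VALUES ('Cust01'), ('Cust02'), ('Cust03');
    INSERT INTO Invoice VALUES ('Inv01', 'Cust01'), ('Inv02', 'Cust01'),
                               ('Inv03', 'Cust02');
    INSERT INTO Addr    VALUES ('Addr01', 'Cust01'), ('Addr02', 'Cust02'),
                               ('Addr03', 'Cust03'), ('Addr04', 'Cust02');
""")

# Cust over two sibling paths. The Cartesian product pairs every Invoice of a
# Cust occurrence with every Addr of the same occurrence (their LCA).
PATHS = """FROM Cust
           LEFT JOIN Invoice ON CustID = InvCustID
           LEFT JOIN Addr    ON CustID = AddrCustID"""


def invoices(where):
    """Distinct invoices selected by a WHERE condition (fixed demo text only)."""
    sql = f"SELECT DISTINCT InvID {PATHS} WHERE {where} ORDER BY InvID"
    return [row[0] for row in con.execute(sql)]


both_sides = invoices("InvID = 'Inv02' OR AddrID = 'Addr01'")
left_side = invoices("InvID = 'Inv02'")
right_side = invoices("AddrID = 'Addr01'")

print(both_sides)                                  # ['Inv01', 'Inv02']
print(sorted(set(left_side) | set(right_side)))    # ['Inv01', 'Inv02']
```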
for Addr node. In this example, you will notice that the Addr node is present
only for employees who are full-time (have an "F" status code). This filtering
does not affect the Emp and Cust nodes because the LEFT join processing only
affects the Addr node and its forward path.
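The difference between filtering in a LEFT join's ON clause and filtering in WHERE can be seen in a small `sqlite3` sketch. For brevity it links Emp directly to Addr through EmpCustID, which is a simplifying assumption; the book's example goes through the intervening Cust node. The ON-clause test removes Addr only for Emp02, which has no "F" status, while the same test in WHERE removes the Emp02 row altogether.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Emp  (EmpID TEXT PRIMARY KEY, EmpCustID TEXT, EmpStatus TEXT);
    CREATE TABLE Addr (AddrID TEXT PRIMARY KEY, AddrCustID TEXT);
    INSERT INTO Emp  VALUES ('Emp01', 'Cust01', 'F'), ('Emp02', 'Cust03', NULL);
    INSERT INTO Addr VALUES ('Addr01', 'Cust01'), ('Addr03', 'Cust03');
""")

# Filter in the ON clause: only the Addr side (and anything below it) is
# affected; every Emp occurrence is preserved by the LEFT join.
on_filter = con.execute("""
    SELECT EmpID, AddrID FROM Emp
    LEFT JOIN Addr ON EmpCustID = AddrCustID AND EmpStatus = 'F'
    ORDER BY EmpID""").fetchall()

# Same test in WHERE: it filters whole rows, so Emp02 disappears too.
where_filter = con.execute("""
    SELECT EmpID, AddrID FROM Emp
    LEFT JOIN Addr ON EmpCustID = AddrCustID
    WHERE EmpStatus = 'F'
    ORDER BY EmpID""").fetchall()

print(on_filter)     # [('Emp01', 'Addr01'), ('Emp02', None)]
print(where_filter)  # [('Emp01', 'Addr01')]
```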
<root>
<emp empid="Emp01" empstoreid="Store01" empcustid="Cust01"
empstatus="F">
<dpnd dpndid="Dpnd01" dpndempid="Emp01" dpndcode="D"/>
<eaddr eaddrid="Addr01" eaddrcustid="Cust01" eaddrstate="CA"/>
<cust custid="Cust01" custstoreid="Store01">
<invoice invid="Inv01" invcustid="Cust01" invstatus="P"/>
<invoice invid="Inv02" invcustid="Cust01" invstatus="O"/>
<addr addrid="Addr01" addrcustid="Cust01" addrstate="CA"/>
</cust>
</emp>
<emp empid="Emp02" empstoreid="Store01" empcustid="Cust03"
empstatus="">
<eaddr eaddrid="Addr03" eaddrcustid="Cust03" eaddrstate="NV"/>
<cust custid="Cust03" custstoreid="Store01">
<addr addrid="Addr03" addrcustid="Cust03" addrstate="NV"/>
</cust>
</emp>
</root>
clause to the right to expand the lower-level structure view with its own ON
clauses. The right-sided nesting leaves the current left-sided structure in sus-
pension and starts building the new right-sided structure until it is complete. It
is then joined to the left structure. This materialized lower-level view can be
qualified on its ON clause before joining. This makes it a look-ahead
operation.
The lower-level structure’s original root node remains the root node even
if the link point is below the root. This is because the original root node still
affects what tables (or nodes) are in the structure. In this example, if “Cust01”
data occurrence did not exist, then this lower-level structure occurrence would
not exist. This is demonstrated in Figure 25.23.
The filtering caused by linking below the root needs to be pointed out in the
example directly above. The lower-level ON clause reference was to the Addr
node in the Cust structure. The matching Eaddr higher-level node values from
the Emp structure are “Addr01” and “Addr03.” These match the
“Addr01” and “Addr03” of the lower-level structure and qualify them up,
down, and across qualified paths (Inv01, Inv02). In this example, there are no
lower-level nodes under Addr01 and Addr03 (downward); if there were, they
would be qualified if they were in the SELECT list. Upward, Cust01 and Cust03
qualify. However, notice that Addr02 and Addr04 under Cust02, which did not
match, do not qualify, so Cust02 does not qualify either, because it is not
qualified by any other qualified node occurrence. For this reason, they are
filtered out by the join operation. This is sophisticated lower-level structure
processing, yet it is carried out easily. It applies to logical or physical
structures because they are both in the rowset at this point.
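The lower-level structure join above can be approximated in SQL by materializing the Cust structure as a derived table and linking it below its root, at Addr. In this `sqlite3` sketch the tables are modeled on this chapter's data; the `EAddrEmpID` column that attaches EAddr to Emp and the `CustView` alias are hypothetical, since the book's view definitions are not reproduced here. Emp01's EAddr value Addr01 pulls in Inv01, Inv02, and Addr01 from Cust01, Emp02's Addr03 pulls in only Addr03 from Cust03, and nothing from Cust02 survives.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Emp     (EmpID TEXT PRIMARY KEY);
    CREATE TABLE Dpnd    (DpndID TEXT PRIMARY KEY, DpndEmpID TEXT);
    CREATE TABLE EAddr   (EAddrID TEXT, EAddrEmpID TEXT);  -- EAddrEmpID is hypothetical
    CREATE TABLE Cust    (CustID TEXT PRIMARY KEY);
    CREATE TABLE Invoice (InvID TEXT PRIMARY KEY, InvCustID TEXT);
    CREATE TABLE Addr    (AddrID TEXT PRIMARY KEY, AddrCustID TEXT);
    INSERT INTO Emp     VALUES ('Emp01'), ('Emp02');
    INSERT INTO Dpnd    VALUES ('Dpnd01', 'Emp01');
    INSERT INTO EAddr   VALUES ('Addr01', 'Emp01'), ('Addr03', 'Emp02');
    INSERT INTO Cust    VALUES ('Cust01'), ('Cust02'), ('Cust03');
    INSERT INTO Invoice VALUES ('Inv01', 'Cust01'), ('Inv02', 'Cust01'),
                               ('Inv03', 'Cust02');
    INSERT INTO Addr    VALUES ('Addr01', 'Cust01'), ('Addr02', 'Cust02'),
                               ('Addr03', 'Cust03'), ('Addr04', 'Cust02');
""")

rows = con.execute("""
    SELECT Emp.EmpID, Dpnd.DpndID, CustView.InvID, CustView.AddrID
    FROM Emp
    LEFT JOIN Dpnd  ON Emp.EmpID = Dpnd.DpndEmpID
    LEFT JOIN EAddr ON Emp.EmpID = EAddr.EAddrEmpID
    LEFT JOIN (
        -- Lower-level Cust structure, built before it is linked in.
        SELECT Cust.CustID AS CustID, Invoice.InvID AS InvID,
               Addr.AddrID AS AddrID
        FROM Cust
        LEFT JOIN Invoice ON Cust.CustID = Invoice.InvCustID
        LEFT JOIN Addr    ON Cust.CustID = Addr.AddrCustID
    ) AS CustView
        -- Link point is Addr, below the lower structure's Cust root.
        ON EAddr.EAddrID = CustView.AddrID
    ORDER BY Emp.EmpID, CustView.InvID
""").fetchall()

for row in rows:
    print(row)
```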
(Figure: input structures Emp, over Dpnd and Eaddr, and Cust, over Invoice and Addr; result structure Emp over Dpnd, Invoice, and Addr.)
<root>
<emp empid="Emp01">
<dpnd dpndid="Dpnd01"/>
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
<addr addrid="Addr01"/>
</emp>
<emp empid="Emp02">
<addr addrid="Addr03"/>
</emp>
</root>
the node or view level and can use join criteria values above and below the
lower link point, as was shown previously with the upper and lower join points.
Notice in the XML result of Figure 25.26 that employee Emp01 with a
status of “F” contains a Dpnd node and not an Eaddr node, while employee
Emp02 with no status has an Eaddr node but no Dpnd node.
25.8 Conclusion
This final chapter has shown the SQL hierarchical XML processor in actual
operation, demonstrating its common and critical operations. The actual query
results are shown in XML using attribute mode, which is the default for the
SQL hierarchical XML processor. These examples were annotated to help
explain their internal and external operation.
26
Summary
The standard SQL join is an operation with powerful syntax and semantics,
whose hierarchical capabilities have not been fully understood or realized. This
book’s purpose has been to remedy this situation. Many of these capabilities are
based on the outer join’s inherent ability to dynamically model and process
complex hierarchical data structures. This book has proved that this powerful
data modeling capability does exist and has demonstrated that this ability can
be harnessed using only the standard SQL facility. Using the outer join opera-
tion to perform data modeling and hierarchical processing was explained, care-
fully showing that any hierarchical data structure could be dynamically
modeled and processed.
The flexible syntax of standard SQL is what enables a singular, unambiguous
hierarchical application view to be defined and utilized. These data
modeling capabilities can be used immediately by users or can be utilized by
SQL vendors to support advanced new capabilities. The associated data structure
meta information is embedded implicitly in the outer join syntax that
defines the view. This information is automatically available in standard outer
joins. By dynamically extracting and utilizing this meta information from outer
join syntax, SQL vendors can provide standard features that were not previously
possible with standard SQL. These advanced features have been discussed in the
book and include: full multipath hierarchical processing; dynamic
structure joining; transparent XML integration; navigationless hierarchical
processing; data fragment level processing; structured data transformation; and
structure-aware processing to support hierarchical optimization and automatic
structured data output.
The real power of the ANSI outer join appears when three or more tables
are joined, because changing the order in which the tables are joined can
influence the result. This is a dramatic and significant departure for relational
databases. Before this book, these effects and capabilities had not been fully
studied, understood, or documented. Understanding these
effects opens a whole new realm of possibilities for relational processing.
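The claim that join order matters once three tables are outer joined can be demonstrated with a short `sqlite3` sketch. Both queries below use the same tables and the same join conditions; only the point at which the inner join is applied differs. The sample data is assumed from the Cust, Invoice, and Addr occurrences used in Chapter 25.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Cust    (CustID TEXT PRIMARY KEY);
    CREATE TABLE Invoice (InvID TEXT PRIMARY KEY, InvCustID TEXT);
    CREATE TABLE Addr    (AddrID TEXT PRIMARY KEY, AddrCustID TEXT);
    INSERT INTO Cust    VALUES ('Cust01'), ('Cust02'), ('Cust03');
    INSERT INTO Invoice VALUES ('Inv01', 'Cust01'), ('Inv02', 'Cust01'),
                               ('Inv03', 'Cust02');
    INSERT INTO Addr    VALUES ('Addr01', 'Cust01'), ('Addr02', 'Cust02'),
                               ('Addr03', 'Cust03'), ('Addr04', 'Cust02');
""")

# Order 1: (Cust LEFT JOIN Invoice) INNER JOIN Addr.
# Cust03 has no invoice, so its NULL InvCustID cannot satisfy the inner join.
first = con.execute("""
    SELECT DISTINCT CustID FROM Cust
    LEFT JOIN Invoice ON CustID = InvCustID
    JOIN Addr ON AddrCustID = InvCustID
    ORDER BY CustID""").fetchall()

# Order 2: Cust LEFT JOIN (Invoice INNER JOIN Addr).
# The inner join runs first, and the outer join then preserves every Cust.
second = con.execute("""
    SELECT DISTINCT CustID FROM Cust
    LEFT JOIN (SELECT InvCustID FROM Invoice
               JOIN Addr ON AddrCustID = InvCustID) AS IA
           ON CustID = IA.InvCustID
    ORDER BY CustID""").fetchall()

print(first)   # [('Cust01',), ('Cust02',)]
print(second)  # [('Cust01',), ('Cust02',), ('Cust03',)]
```

In the first form, Cust03 is dropped by the later inner join; in the second, the outer join is applied last and preserves it.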
Using the outer join, the power of hierarchical data structure processing
was shown and described, along with the inherent semantics of these multipath
structures. These structures are unambiguous, making them perfect for
application views. This book also demonstrates how the flat Cartesian product model
produces the same LCA multipath semantics and operation as its comparable
hierarchical model when processing relational database queries. This is
significant for seamless SQL access to heterogeneous and legacy
databases, which is also shown in this book.
These hierarchical outer join capabilities are still unfamiliar to most SQL
users, but they are available to be used if users know how. It is hoped that this
book will help bring this capability to light. A new automatic metadata
maintenance capability that would allow dynamically structured data to be
specified across peer locations for immediate processing, eliminating the need
for the user to update metadata, was also described. This will open many
advanced data processing capabilities, such as peer-to-peer real-time
hierarchical SQL coding collaborations using dynamic structured data. The
natural hierarchical processing can also offer a gateway or interface to the
semantic Web, and can also offer automatic parallel processing. This is possible
for hierarchical queries because hierarchical pathways are parallel and can
automatically take advantage of parallel processing.
Appendix A
Database Relationships and Views Used
in This Book
(Figure: Appendix A views relating Division, Department, Product, ProdMgr, DeptMgr, Employee, and Dependent, including a conceptual view and a Product Department view.)
Glossary
Ad hoc query: An ad hoc query is a database query that can be specified
interactively or created for unanticipated needs. This means that the query does
not need to be predefined to the database system that is processing the
query. In relational systems, this requires dynamic SQL query processing.
Aggregate data: Data that is the result of applying a process to combine data
elements collectively or in summary form. The SQL SELECT list does this
very easily and offers quite a bit of dynamic control.
foreign key can be considered an alternate key. The alternate key is usually the
“many” side of a one-to-many relationship.
Ancestor nodes: Ancestor nodes are nodes that are further up the path from
their related descendent node. As with a parent node, ancestor nodes control the
existence or range of processing of the nodes under them.
Application view: The application view is how the application visualizes the
structure of the database. This structure should be hierarchical because hierar-
chical structures are singular (unambiguous) in meaning. This enhances the
usefulness of the data structure semantics. With application views, applications
can share views and databases can support many different views.
tables, so an association table is used between the Parts and Suppliers tables in
order to maintain the one-to-many relationships in both directions when per-
forming the necessary joins.
Atomic value: An atomic value is a basic value that is not composed of other
classifiable parts.
Blob: A blob is a relational column type used to hold binary large objects that
can be composed of any type of data. A blob is used mainly for storage. For
example, it can store a native XML document. It is not meant to be processed
directly by SQL inherent operations. It can be processed by user-defined
functions.
Bushy query: A bushy query is a query that accesses and/or processes multiple
paths of the hierarchical data structure that is being processed.
CDATA: The XML CDATA type construct specifies an escape block for an
element that specifies that the indicated text data should not be parsed because
it has special characters or requires special processing.
Child: A child is the next lower level table or node in the data structure that
follows the path downward. There can be multiple child definitions for a
parent node definition, each one on a separate path from the parent. In a
hierarchical structure, child data occurrences cannot exist without an active
parent data occurrence.
Closure: All expressions in a language return values that are in the data model
being processed by the language.
Coalescing: Coalescing is the inspection of key values under the same domain
to return a single, non-null, valid key value found among them. This has special
significance for outer joins, where null key values can be produced because
of their data-preserving ability. Coalescing can identify a key field among
multiple keys when at least one non-null key value is present, so that it can be
used as the only key field. This avoids having to check multiple key fields.
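As a small illustration, assuming two hypothetical tables that both carry a customer key, SQL's COALESCE function returns the first non-null of its arguments and so yields one usable key column after an outer join. The sketch runs it through Python's `sqlite3` module; the table and column names are assumptions modeled on Chapter 25.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Invoice (InvID TEXT, InvCustID TEXT);
    CREATE TABLE Addr    (AddrID TEXT, AddrCustID TEXT);
    INSERT INTO Invoice VALUES ('Inv03', 'Cust02');
    INSERT INTO Addr    VALUES ('Addr02', 'Cust02'), ('Addr03', 'Cust03');
""")

# InvCustID is NULL for the unmatched Addr03 row; COALESCE falls back to
# AddrCustID, so the single CustKey column is never NULL here.
rows = con.execute("""
    SELECT COALESCE(InvCustID, AddrCustID) AS CustKey, InvID, AddrID
    FROM Addr LEFT JOIN Invoice ON InvCustID = AddrCustID
    ORDER BY CustKey""").fetchall()

print(rows)  # [('Cust02', 'Inv03', 'Addr02'), ('Cust03', None, 'Addr03')]
```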
Composite key: A key that contains more than one column. This is also
known as a concatenated key, or multivalue key.
Commutable joins: Joins that can change order or be replaced with other
joins to produce the same results.
Complex data modeling: Complex data modeling used in the context of this
book applies to the ability to construct hierarchical data structures that contain
multiple paths by using the outer join operation. Multiple paths add another
level of capabilities and complexity to the principles involved in defining data
structures with the outer join and to the semantics that are associated with the
data structure.
Conceptual view: A conceptual view is a view or schema that defines all pos-
sible data and the valid relationships they comprise in a database so that all
required application views can be defined from it. As such, a conceptual view
requires a network structure to define it because of the high probability of con-
verging paths. A conceptual view sits between the internal and external views
and acts as an automatic level of abstraction between the two.
Content model: There are three data content models when XML elements
are declared. These are data content, element content, and mixed content.
With data content, elements can specify text data, but cannot contain
subelements. With element content, an element can only specify subelements
and optional rules for their use. With mixed content, elements can specify both
text data and a subelement within the text.
Cousins: As used in this book, cousins are nodes that are not directly related
to other nodes on the same active path, but are related indirectly by a common
ancestor node data occurrence. This means that every node in the hierarchical
structure is related, directly or indirectly, to every other node.
Cross join: The cross join is one of the ANSI SQL-92 join types. It creates a
basic, unrestricted inner join Cartesian product result and, as such, it does not
use or require a join condition, so no ON or USING clause is used with it.
Dangling tuple: Dangling tuples are the rows that are not matched in join
operations. With inner joins they are discarded, and with outer joins they can
be preserved in the result by padding their unmatched row side with null
values.
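For example, in this `sqlite3` sketch with hypothetical Cust and Invoice tables, Cust03 has no matching invoice: the inner join discards it, while the LEFT join preserves it with NULL-padded Invoice columns, which a `WHERE InvID IS NULL` test can then pick out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Cust    (CustID TEXT PRIMARY KEY);
    CREATE TABLE Invoice (InvID TEXT PRIMARY KEY, InvCustID TEXT);
    INSERT INTO Cust    VALUES ('Cust01'), ('Cust02'), ('Cust03');
    INSERT INTO Invoice VALUES ('Inv01', 'Cust01'), ('Inv02', 'Cust01'),
                               ('Inv03', 'Cust02');
""")

# Inner join: the dangling Cust03 row is discarded.
inner = con.execute("""
    SELECT DISTINCT CustID FROM Cust
    JOIN Invoice ON CustID = InvCustID
    ORDER BY CustID""").fetchall()

# LEFT join: Cust03 is preserved with NULL padding, so it can be isolated.
dangling = con.execute("""
    SELECT CustID FROM Cust
    LEFT JOIN Invoice ON CustID = InvCustID
    WHERE InvID IS NULL""").fetchall()

print(inner)     # [('Cust01',), ('Cust02',)]
print(dangling)  # [('Cust03',)]
```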
Data abstraction: Data abstraction is the ability to hide the complexity of the
data. In this book, a good example would be a stored structured data view
whose use helps hide the complexity of the hierarchical data structure.
Data modeling: Data modeling is the ability and process of specifying and
constructing complex data structures that represent specific semantics. In SQL,
this can be performed with the ANSI-92 LEFT outer join operation that can
inherently define and process complex data structures.
Data partition: Breaking tables into multiple tables for different purposes.
This can be done vertically or horizontally. Vertically, the columns of each row
are split across multiple tables. Horizontally, rows are split into tables based on
some data value or range, such as names starting from A to F in one table and
G to M in another table, or perhaps by office location.
Data persistence: The capability to retain and reaccess data after the applica-
tion creating it has terminated normally.
that define data structures. This metadata contains a detailed description of the
data structure from which powerful and useful semantics can be derived to per-
form automatic operations, such as hierarchical optimizations and automatic
hierarchical data formatting, such as for XML.
Data type: A data type is how the data is defined in the database; each data
type can have unlimited data occurrences.
Data virtualization: The ability to easily select and combine data fragments
from many different locations dynamically and in any way into a single data
structure while also maintaining its semantic accuracy.
Derived data: Derived data, as its name implies, is data derived from some
process or calculation. When data is retrieved, derived data is computed from it
after retrieval and placed in the input buffer as if it had been retrieved directly.
For example, a birthday could be converted to an age. This is a good example
because age is constantly changing.
Descendent node: A descendent node is a node that is further down the path
from the related node.
Dirty data: Dirty data is data that is or has become missing, inconsistent, or
erroneous.
DOM: DOM is the document object model API. A DOM processor is used
to access, parse, store, and retrieve tokens from an XML document. There are
other APIs, such as SAX, that can be used to access native XML documents.
DOM tree: A DOM tree is the entire document internal hierarchical struc-
ture produced when DOM accesses the next document occurrence. This can be
many times the size of the actual document native occurrence.
Duplicate data: Duplicate data, as used in this book, is real data that natu-
rally occurs multiple times, containing the identical data; each occurrence of
duplicate data is meaningful. The term replicated data, on the other hand,
describes data that is replicated because of operations applied to the data, such
as joining tables or flattening hierarchical structures. In this case, the identical
copies of the data are not meaningful and were only necessary in order to help
perform the desired operation and can have side effects.
node in more than one location in the hierarchical structure. This is similar to
object subclasses, such as an address class, being used as a subclass in both cus-
tomer and employee super-classes. These duplicate named element types will
show up in multiple locations of an XML hierarchical structure, causing ambi-
guity problems for navigationless query languages such as SQL.
Duplicate key: Duplicate key means that more than one record, row, or ele-
ment can have the same key. The key is not unique. A duplicate keyed record
usually means that the primary key can be duplicated.
Dynamic: In this book, the term dynamic is used as a modifier for a database
operation, indicating that the operation it modifies can be performed dynami-
cally or in an ad hoc fashion, such as a dynamic query or a dynamic joining of
structures.
Dynamic structured data: Structured data that has been dynamically modi-
fied. See Dynamic metadata maintenance.
specified and joined in an ad hoc interactive fashion that does not require
predefinition. This capability is automatically extended to data modeling, hier-
archical structure processing, and hierarchical structure joining capabilities
made possible by the ANSI-92 outer join operation.
Edge table: An edge table is a table structure that defines a specific tree struc-
ture. Each row defines a node type of the structure, as well as its parent and
child node types.
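A hypothetical edge table for the StoreView structure used in Chapter 25 might look like the list below. This Python sketch stores only each node type's parent and derives the child types and root-to-leaf paths from it; the row layout and the `children_of` and `paths` helpers are assumptions, not a format defined by the book.

```python
# Hypothetical edge table for the StoreView structure: (node type, parent type).
EDGES = [
    ("Store", None),
    ("Cust", "Store"), ("Emp", "Store"),
    ("Invoice", "Cust"), ("Addr", "Cust"),
    ("Dpnd", "Emp"), ("EAddr", "Emp"),
]


def children_of(edges):
    """Derive each node type's child list from the parent column."""
    kids = {node: [] for node, _ in edges}
    for node, parent in edges:
        if parent is not None:
            kids[parent].append(node)
    return kids


def paths(edges):
    """List every root-to-leaf path defined by the edge table."""
    kids = children_of(edges)
    root = next(node for node, parent in edges if parent is None)
    out = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if not kids[node]:          # leaf node type ends a path
            out.append("/".join(prefix))
        for child in kids[node]:
            walk(child, prefix)

    walk(root, [])
    return out


print(paths(EDGES))
```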
Element: XML elements define data in two ways, using a start and stop/end
tag name that contains a text string that can also contain subelements, and also
through attributes that are name and value pairs. Either or both can be used
unless restricted by a schema. The tag names can be used to name the data val-
ues or act as markup in the text.
End tag: An end tag is a matching tag for an XML start tag represented as
</tagname>. It closes the definition of the current element occurrence.
Enterprise data: Enterprise data is data that is used or can be used across the
entire corporation.
Equal join: An equal join is a relational join that uses an equality operation to
relate the tables. An equal join is also known in relational terms as an equijoin.
ETL: Extract, transform, and load utilities are used for accessing, converting, and
loading massive amounts of data. The newest ETL products are designed to
convert and move relational data sources to XML sources, and XML sources to
relational sources. This involves shredding (flattening) the XML data.
Expanded views: Expanded views are embedded stored views whose name
reference is replaced with its representative source code so that the query can be
processed against its expanded source code. When structured views are
expanded, they automatically form a unified hierarchical view that uniformly
models the hierarchical structure being processed.
External view: An external view is one of the three types of views that comprise
the three-tier model for database architecture. These are the internal,
external, and conceptual views. The external view is the view that the applica-
tion and user of an application has of the database. For this reason, it is also
known as the application view. With application views, applications can share
views and databases can support many views.
Federated database: A federated database accesses the data from other data-
bases when the data is needed. This is the opposite of a centralized database sys-
tem. Also see disparate heterogeneous database access.
First normal form: First normal form does not permit relational tables to contain
repeating data types or groups in a single row. Repeating data should be
placed in another table where each occurrence of the repeating data is placed in
a different row. This allows a table to be a flat, two-dimensional structure. First
normal form is not a prerequisite for good database design; it is only required
for relational databases and their flat tables. Also see Nonfirst normal form.
Fixed-occurring fields: Fixed-occurring fields are data fields that can occur
multiple times in a record. They are fixed because the amount of space required
to contain them is reserved in the record whether it is used or not. This means
that a fixed-occurring field can contain a variable number of data fields, but is
still considered fixed because it always uses the same fixed amount of storage
space and cannot exceed the maximum space allocated for it.
Flat file: A flat file is a file that has the same fixed, unvarying format for each
record. It has no variable-occurring fields, but can have fixed-occurring fields.
In this way, each record is of the same length. A flat file can be thought of as a
relational table, with each of its fixed records as a row of the table.
Foreign key: A foreign key is an alternate key in one or more tables that
relates to a primary key in another table, creating either a one-to-many or
many-to-one relationship.
Four value logic: With XML, there can be four values of logic: true, false, no
value, and empty value. This is because an element can have no value specified
or it can be specified as an empty element. The no value and empty value can
both be interpreted differently by the application.
FULL join: A FULL join is an outer join type that preserves data on both
sides of the join operation when rows are not matched up. Unmatched rows are
padded with null values. This does not model a hierarchical structure; it models
a flat structure because a FULL join is a symmetric operation. These can be
incorporated into a hierarchical structure as a single logical node comprised of
two or more FULL joined tables or nodes.
Global view: A view that encompasses the entire physical structure and could
also include smaller views with no overhead for using the oversized view. This is
used to support the single view concept where larger views can be used without
efficiency concerns. This avoids the need to have many specialized views tai-
lored to a subset of the global view, making global views more user-friendly and
always efficient.
Graph: A tree is a directed graph where the direction is from the root down.
Hierarchical join: As used in this book, it means that the hierarchical structures
are being joined hierarchically, one above the other. This properly combines
them into a larger, correctly combined hierarchical structure. One-sided LEFT
or RIGHT joins can be used to perform hierarchical joins. LEFT outer joins are
hierarchically easier to work with because they are combined in a left-to-right
order that follows a top-to-bottom hierarchical direction.
HTML: Hypertext Markup Language is used for formatting a web page for
output. Its tags are fixed. You could say that it is an XML vocabulary for web
output.
Implicit natural join: An implicit natural join is a term used in this book for
ANSI-92 natural joins that are specified by replacing the ON clause with the
USING clause, which implies that a natural join is to be performed; hence the
use of the term implicit.
Inner join: The inner join is the standard default join. It does not preserve
unmatched data rows under any circumstances. It is a symmetric join operation
and therefore models only a flat structure.
Join operation: Relational tables are joined across their rows creating a larger
wider table.
Join table order: The table join order can be specified in the outer join state-
ment. This table join control is important in some outer join operations where
it can influence the result.
Join table reordering: Join table reordering is the process of altering the table
join order to optimize the execution of outer joins. This cannot be done indis-
criminately because changing the table join order can affect the results of the
outer join operation. Analyzing the data structures defined by the outer join
operation and understanding its semantics is one way of determining when and
how table join order can be optimized without changing the result.
Late binding: Late binding with outer join data modeling is the ability of the
database application to accept different data structures that can be specified at
run time.
LCA query: An LCA query is a multipath query using and requiring lowest
common ancestor logic in order to process hierarchical structures.
Left join: The LEFT join operation is an outer join that preserves unmatched
rows from the table specified on the dominant left side of the join operation. It
is a natural hierarchical operation that allows the hierarchical structure to be
built from the top to the bottom. The RIGHT outer join preserving data on
the right builds hierarchical structures from the bottom upward. The LEFT
outer join is easier and more natural to use because it progresses naturally from the
left to the right in the same direction as its execution.
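A minimal sketch using hypothetical dept (parent) and emp (child) tables in SQLite. The dominant left table keeps its unmatched Research row, and the missing emp columns are padded with NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER, dept_id INTEGER, emp_name TEXT);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO emp  VALUES (10, 1, 'Ann');
""")

# dept is on the dominant left side, so the structure is built top-down:
# every dept row survives whether or not it has emp children.
rows = conn.execute("""
    SELECT dept.dept_name, emp.emp_name
    FROM dept LEFT JOIN emp ON dept.dept_id = emp.dept_id
    ORDER BY dept.dept_id
""").fetchall()
print(rows)  # [('Sales', 'Ann'), ('Research', None)]
```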
Leg: A leg is a path in the data structure, including the data that is stored
along its path. This is an older name for what is simply called a “path” today.
Link points: Link points mentioned in this book are the connection node
points for joining hierarchical structures, one in the upper and one or more in
the lower data structure. They are connected by a pathway when the data
structure is being built. This occurs using the outer join operation and its ON
clause join specification, which specifies the link points.
Linked data: Linked data refers to connecting related data that was not previ-
ously linked. It also can involve using the Internet to help increase the capabili-
ties of linking data together.
Linking: Linking is the process defined in this book for specifying a pathway
between two hierarchical structures that controls how the two structures are
joined into a single hierarchical structure, properly preserving and combining
their semantics.
Lists: Lists are assumed ordered, while sets are assumed unordered. XML data
is assumed ordered, while relational data is assumed unordered.
Logical table: A logical table, as used in this book, is a series of flat structures
joined together that represent a single flat structure that is a single node in the
overall hierarchical structure being modeled. This logical flat structure is mod-
eled using INNER or FULL outer joins that are symmetric join operations that
model flat structures. This also enables INNER and FULL outer join opera-
tions to be used in the modeling of hierarchical structures.
Lossless integration: A data integration process that does not lose any data
information.
Lossless process: Any process that loses no information, including
semantic information.
Markup data: Markup data contains markup elements that are used to mark
up portions of the text. The rules for markup elements allow them to be freely
nested in any fashion, as markup data requires. However, this free nesting has
no meaning for representing hierarchical data structures. If possible, the entire
markup text should be defined as CDATA and processed separately.
Middleware: Middleware is software that sits between the user and the user
interface, or between the user interface and the database, and adds value to the
data.
Missing data: Missing data, also known as lost data, is the data that is lost in
an inner join when rows of the tables being joined do not match with any other
rows. Missing data can also occur with one-sided joins on the side that is not
being preserved. This definition ignores all the other reasons for missing data.
Mixed content: XML mixed content is element content that contains both
text and child elements; the element itself may also carry attributes.
Multipath: Multipath is a newer term used to indicate a data structure that has
multiple legs. Hierarchical legs and paths represent the same thing.
Native XML: Native XML refers to the actual XML, with its embedded
metadata, and not data that has been extracted and isolated from XML.
NCA: Nearest common ancestor, see Lowest common ancestor (LCA). LCA is
used in this book.
Node definition, node declaration, or node type: This refers to the defini-
tion of a node in the structure and not a data occurrence of the node.
Node promotion: When no data is selected from a defined node in the
structure (no data projection from it, that is, node exclusion), that node is not
placed in the output structure, and its selected descendant nodes are moved up
the path around it to their next selected ancestor node.
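A minimal sketch of the promotion rule, assuming the structure is given as a hypothetical child-to-parent map of node types and the selected nodes as a set. Each selected node is re-attached to its nearest selected ancestor. This is an illustration of the rule, not the book's implementation.

```python
def promote(parent, selected):
    """Return the child->parent map of the output structure.

    Unselected nodes are dropped; each selected node is attached to its
    nearest selected ancestor, so descendants move up around excluded nodes.
    A selected node with no selected ancestor becomes a root (no entry).
    """
    output = {}
    for node in parent:
        if node not in selected:
            continue
        ancestor = parent[node]
        # Walk up the path past unselected nodes.
        while ancestor is not None and ancestor not in selected:
            ancestor = parent.get(ancestor)
        if ancestor is not None:
            output[node] = ancestor
    return output

# Hypothetical structure: no data is selected from Emp, so Dpnd is promoted to Dept.
parent = {"Emp": "Dept", "Dpnd": "Emp", "Proj": "Dept"}
print(promote(parent, {"Dept", "Dpnd", "Proj"}))  # {'Dpnd': 'Dept', 'Proj': 'Dept'}
```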
Nonfirst normal form: In relational terms, nonfirst normal form means that
tables can support structured or nested data with repeating data (multiple
occurrences of data in a single column). This form of relational data can be
processed by a nested relational processor. The first normal form requirement is
not a requirement for good database design or even a relational requirement; it
is a requirement imposed by SQL and its requirement for two-dimensional
tables.
Null: Nulls are padding values that are used to represent missing data in outer
join results. Nulls are also used to represent unknown values when data is
entered into a relational table.
ODBC: ODBC is the open database connectivity API standard put forth by
the Microsoft Corporation. It uses SQL as the database interface language.
ON clause: The ON clause is used with the ANSI-92 outer join operation to
specify the join criteria for each table being joined in the join specification. The
ON clause supplies greater control over outer-joined tables than is possible
through a single WHERE clause. This gives it an advantage over the WHERE
clause and makes it crucial to performing outer join data modeling.
ON clause filtering: The ON clause is used with the ANSI-92 outer join
operation to specify the join criteria for each table being joined. However, it
can also specify hierarchical data filtering, which allows for more control and a
more precise level of data filtering than if specified on the WHERE clause.
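A minimal sketch contrasting the two placements of the same filter, using hypothetical dept and emp tables in SQLite. In the ON clause the filter removes only the emp data, which is replaced with NULL. In the WHERE clause it removes the whole Research row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER, dept_id INTEGER, emp_name TEXT, title TEXT);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO emp  VALUES (10, 1, 'Ann', 'Manager'), (11, 2, 'Bob', 'Clerk');
""")

# ON clause filtering: only the emp node is filtered; dept rows are preserved.
on_filter = conn.execute("""
    SELECT dept_name, emp_name
    FROM dept LEFT JOIN emp
         ON dept.dept_id = emp.dept_id AND emp.title = 'Manager'
    ORDER BY dept.dept_id
""").fetchall()

# WHERE clause filtering: the condition is applied to entire result rows.
where_filter = conn.execute("""
    SELECT dept_name, emp_name
    FROM dept LEFT JOIN emp ON dept.dept_id = emp.dept_id
    WHERE emp.title = 'Manager'
    ORDER BY dept.dept_id
""").fetchall()

print(on_filter)     # [('Sales', 'Ann'), ('Research', None)]
print(where_filter)  # [('Sales', 'Ann')]
```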
One-sided join: The one-sided join is the LEFT or RIGHT join. These are
known as one-sided joins because they preserve data only on one side, the dom-
inant side.
Ordered data: Most data systems are either ordered or unordered systems by
default. XML by default is ordered, and assumes that the data is ordered. This
is probably because XML was first a markup language where order is crucial.
SQL is unordered by default. The SQL row order has no significance and rows
can be returned in any order unless explicitly ordered. Ordered data are lists,
and unordered data are sets.
Outer join: The outer join operation is used to preserve data that doesn’t find
a match in a join operation; these unmatched rows are known as dangling tuples (partial rows).
There are basically FULL outer joins that preserve data on both sides of the
join, and one-sided outer joins that preserve data only on one given side known
as LEFT or RIGHT joins.
P2P: Peer-to-peer networks eliminate the need for servers and allow all com-
puters to communicate and share resources as peers.
Parent: A parent is the next higher level table, or node, in the data structure
that follows the path upward. In a hierarchical structure, parents are important
because their children cannot be created without them.
Persistent data: Persistent data is data that is created and remains after the
operation that created it and is available for reuse.
Physical data structure: Nodes that comprise physical databases are con-
nected by physical address links (such as IBM’s hierarchical IMS database) or
juxtaposition, proximity, or nesting (such as XML).
Primary key: A primary key is a database key that uniquely identifies a record
or a row in a file or table and is usually required.
Record: A database record is comprised of all node occurrences from the root
node occurrence down.
Recursive structures: XML supports recursive structures, in which the same
element node type, or sequence of node type specifications, in a path can be
specified again in the same path in the structure, causing a circular definition.
This is used in structures to explode compound objects, such as parts, that can
consist of other parts that are repeated until their atomic parts are reached.
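On the relational side, a parts explosion can be sketched with a recursive common table expression. Note that WITH RECURSIVE comes from SQL:1999, later than the SQL-92 features this book centers on. The bom table and its contents are hypothetical, and SQLite runs the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical bill of materials: each row says a part contains a subpart.
    CREATE TABLE bom (part TEXT, subpart TEXT);
    INSERT INTO bom VALUES
        ('bike', 'wheel'), ('bike', 'frame'),
        ('wheel', 'spoke'), ('wheel', 'hub');
""")

# Start from the top part and repeatedly join back to bom; recursion stops
# when the remaining parts have no subparts (the atomic parts).
rows = conn.execute("""
    WITH RECURSIVE explode(part, depth) AS (
        SELECT 'bike', 0
        UNION ALL
        SELECT bom.subpart, explode.depth + 1
        FROM bom JOIN explode ON bom.part = explode.part
    )
    SELECT part, depth FROM explode ORDER BY depth, part
""").fetchall()
print(rows)  # [('bike', 0), ('frame', 1), ('wheel', 1), ('hub', 2), ('spoke', 2)]
```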
Regular data structure: A regular data structure is a data structure that fol-
lows standard conventional formatting rules. Also see Conventional data
structure.
Replicated data: Replicated data, as used in this book, is data that is repli-
cated when structured data is flattened into a two-dimensional table structure.
This replicated data can throw summaries off and has the potential to obscure
the data structure. Replicated data is not the same as duplicate data, whose
identical data occurrences are semantically correct.
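A minimal sketch of how replication throws summaries off, using hypothetical dept and emp tables in SQLite. Flattening one department with three employees repeats its budget three times, so a naive SUM over the flat result triples it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, budget INTEGER);
    CREATE TABLE emp  (emp_id INTEGER, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 1000);
    INSERT INTO emp  VALUES (10, 1), (11, 1), (12, 1);
""")

# The budget is replicated once per emp row in the flattened join result.
flat_sum = conn.execute("""
    SELECT SUM(dept.budget)
    FROM dept JOIN emp ON dept.dept_id = emp.dept_id
""").fetchone()[0]

# Summing over the dept table itself counts each budget once.
true_sum = conn.execute("SELECT SUM(budget) FROM dept").fetchone()[0]
print(flat_sum, true_sum)  # 3000 1000
```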
Reusable: The trait that some software component can be reused in many dif-
ferent applications, such as an SQL view that can be queried in many ways, and
used in building larger views. This trait saves on development effort for the
Result set: The result set is the flat relational result returned by SQL.
RIGHT join: The RIGHT join operation is an outer join that preserves
unmatched data from the dominant table specified on the right side of the join
operation. This is not as natural or as easy to work with as the LEFT outer join.
Root: The root of a hierarchical structure is the topmost table or node in the
structure. Because a hierarchical structure is an upside-down tree, it makes
sense that the starting table, or node, is called the root. All access to a hierarchi-
cal structure originates from the root.
Row: Relational tables are made up of horizontal rows and vertical columns.
The relational name for a row is a tuple. A row is analogous to a record in a flat
file.
Runtime: Runtime is the period of time between the start and end of
execution. Dynamic operations happen during this time.
SAX: A simpler and smaller XML API than DOM. DOM reads the entire
document into memory, while SAX streams through the document and reports it
as a series of parsing events. SAX is more efficient but more limited.
Schema: An XML schema defines and maps a specific class of XML documents.
It is newer and much more advanced than the older DTD, which serves this
same basic purpose.
Schema-free: As used in this book, schema-free means that the user does not
need to know the data structure being queried because navigation is automatic.
Also see Navigationless.
Scope of control: Each specific join operation joins two working sets or
tables. This means that each table referenced by an ON clause during a join
operation must belong to one of the two working sets that are currently being
joined. Because of right-sided nesting, there can be many working sets that are
stacked; these should not be referenced until they are unstacked and become
active. This ON clause range of acceptable table references is also known as the
scope of control.
Secondary key: A secondary key is a key that is not necessarily unique, so that
searching on it will return multiple records such as when searching on the color
red where color is the secondary key. It is also known as an alternate key. A pri-
mary key is unique.
Selection: Relational selection is filtering row data based on a data value. The
WHERE clause data selection removes entire rows. ON clause data filtering
removes pieces of data from selected rows that are replaced with NULL values.
Semantic loss: Semantic loss, as used in this book, occurs when semantic
structural information is obscured or lost from a structure when it is trans-
formed. In particular, when a hierarchical structure is flattened, the data struc-
ture and the structural semantics are significantly obscured.
Semantic transparency: The ability of the data processor to establish the rela-
tionships between data types to automatically derive the correct results.
Sets: Sets of data are assumed unordered, whereas lists of data are assumed
ordered. XML data is assumed ordered while relational data is assumed
unordered.
Shared element data: Shared element data, as referred to in this book, is
created by an XML IDREF usage that produces multiple paths into a node type
so that the same physical data occurrences it defines are shared by two or more
paths. Also see IDREF.
Sibling nodes: Sibling nodes are the child node types that share the same parent node type.
Their left-to-right defined order is application dependent.
Sibling paths: Sibling paths are parallel paths that are indirectly related
through a common ancestor node. These paths are separate and do not influ-
ence each other. They have no node-by-node occurrence correlation. This has
specific consequences for the semantics of the data structure. For example,
comparing data fields from two sibling paths requires comparing all combina-
tions under the LCA occurrence.
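A minimal sketch of that combinatorial effect, using hypothetical tables in SQLite where dept is the LCA of the sibling paths emp and proj. Relating the two paths through their common dept occurrence pairs every emp with every proj, 2 × 3 = 6 combinations, because the siblings have no occurrence-by-occurrence correlation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical structure: dept is the LCA; emp and proj are sibling paths.
    CREATE TABLE dept (dept_id INTEGER);
    CREATE TABLE emp  (emp_id INTEGER, dept_id INTEGER);
    CREATE TABLE proj (proj_id INTEGER, dept_id INTEGER);
    INSERT INTO dept VALUES (1);
    INSERT INTO emp  VALUES (10, 1), (11, 1);
    INSERT INTO proj VALUES (20, 1), (21, 1), (22, 1);
""")

# Each emp is combined with each proj under the same dept occurrence.
pairs = conn.execute("""
    SELECT emp.emp_id, proj.proj_id
    FROM dept
    JOIN emp  ON emp.dept_id  = dept.dept_id
    JOIN proj ON proj.dept_id = dept.dept_id
    ORDER BY emp.emp_id, proj.proj_id
""").fetchall()
print(len(pairs))  # 6
```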
Significant white space: Significant white spaces are spaces, tabs, and line
break codes that are part of the document text and should be preserved and dis-
played when output.
Social search: A social search allows you to discover relevant content from
your social connections.
Sorted outer union solution: This is a solution to the multipath data
explosion problem that uses the sorted outer union operation to suppress the
data explosion. It is also known as the SOU technique. It also sacrifices the
semantics of the multipath data structure needed for accurately querying the
entire structure.
SQL: Structured Query Language, the ANSI and ISO standard interactive and
programming language for getting information from and updating a database.
SQL/XML Standard: This is the ANSI standard for defining syntax and
functions in ANSI SQL to handle input and output of native XML from SQL.
A number of functions have been defined for XML output. These output func-
tions require nested use and XML-centric operations in order to form hierar-
chical XML documents.
Start and stop tags: Start and stop tags enclose the content of an XML ele-
ment. The first tag of a container element also names the element.
Static Query: A predefined query. It implies that the query cannot be speci-
fied dynamically.
Structured data: As used in this book, structured data means the same as
hierarchical data. See Hierarchical data.
Structured SQL views: These are SQL views that define logical or physical
hierarchical structures and can be dynamically joined to form larger logical
hierarchical structures. Structured SQL views are also self-optimizing, so they
can be used more often, greatly increasing their data abstraction.
Substructure views: Substructure views are SQL views that contain hierarchi-
cal data structures that can be seamlessly embedded in SQL statements and
structured views to create larger views.
Surrogate key: Most relational keys serve two purposes: their use as a key, and
their use as data. A surrogate is only used as a key; it is usually automatically
generated because there was probably no data available that could also be used
as the key.
Symmetric join: INNER and FULL joins are referred to as symmetric joins
because they are commutative in operation. They produce the same results
when left and right table inputs are reversed. They model flat structures.
Three value logic: True, false, and unknown conditions used with relational
processing.
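A minimal sketch in SQLite, where unknown is returned as NULL (None in Python) and true and false as 1 and 0. The one-column table t is hypothetical. A WHERE clause keeps only rows whose condition is true, so a row with a NULL value is excluded from both x = 1 and its apparent complement x <> 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unknown propagates through comparisons and NOT, but AND/OR can still
# settle to false/true when the other operand decides the result.
unknown_eq, false_and, true_or, not_unknown = conn.execute("""
    SELECT NULL = NULL, NULL AND 0, NULL OR 1, NOT (NULL = 1)
""").fetchone()
print(unknown_eq, false_and, true_or, not_unknown)  # None 0 1 None

conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
eq = conn.execute("SELECT COUNT(*) FROM t WHERE x = 1").fetchone()[0]
ne = conn.execute("SELECT COUNT(*) FROM t WHERE x <> 1").fetchone()[0]
print(eq, ne)  # 1 0
```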
Throwaways: The term throwaways, as used in this book, refers to rows retrieved
while performing a join operation that are later discarded in the same join
operation because they turn out to be unmatched.
Tree graph: A tree graph is a directed graph with one start (root) node, in which every other node has exactly one parent and there are no cycles.
Twins: The different children node types of a parent node type represent dif-
ferent children with different data types and formats, known as siblings. Twins
are the multiple data occurrences for a specific node type that has the same par-
ent data node occurrence. The node type is the same across twins and the node
parent data occurrence is the same, hence the name twin or twins.
Unified view: A unified view sits over heterogeneous data sources and offers a
consistent view definition by defining the entire logical structure view. The
ANSI SQL LEFT outer join can do this, and it offers some of the following
advantages. The unified view can be specified as subviews in separate manage-
able and reusable SQL views, and these SQL subviews can be specified and
arranged dynamically at execution. In addition, the subview’s definition can be
specified dynamically, and when all the views are expanded they form a solid
single, unified view defined entirely by standard SQL syntax and semantics.
Universal data access: Universal data access (UDA) is a term that indicates
that a given product can support access to all forms, types, and combinations of
data and databases.
Universal relation model: The user is given a view or imaginary relation con-
sisting of one single relation (table) that is derived from the natural join of all
relations in the database. As a user interface, the advantage is that the user does
not need to know which columns/fields are in which relation (table). The
universal relation model also preserves dangling tuples, whose entire rows a
natural join would normally remove when there is no join match.
Unordered data: Unordered data is data that has not been ordered, so its order has no significance, as with SQL rows. Also see Sets.
Unparsed form: Native XML, with its complex hierarchical structure and
embedded metadata, requires parsing in order to be accessed. Sometimes areas
of the XML data need to be bypassed by the parsing operation. This unparsed
data is identified in the XML metadata as CDATA.
Unstructured data: Unstructured data has no real structure, such as the data
in an email or a memo. Interestingly, some estimates hold that 85% of all business
information is unstructured data. There are now many products coming on the
market that can put some structure into unstructured data so that it can be cat-
egorized or organized hierarchically.
URL: A URL is a uniform resource locator that is used to access
data on the Internet or an intranet. It is a Web address.
USING clause: The USING clause is used instead of the ON clause to spec-
ify that an implicit natural join option is to be applied to the join operation.
Variable data structure: With XML, the data structure can vary from
document occurrence to occurrence, or even within a given document. Within
limits, SQL can do this using the ON clause based on a data
value field at a higher level of the structure.
Variable length fields: Variable length fields are fields that are of variable
length. They hold any type of value or field. The length of a variable length
field is usually contained somewhere in the record (known to the application)
preceding the variable length field. This means that a data record with variable
length fields is a variable length record. This does not make it a varying (format)
structure because this does not change the format.
Variable length records: Variable length data records in a data set are records
that contain variable length fields and/or variable occurring fields, making
them a variable length that changes in the data set. This does not make it a
varying (format) structure because this length change does not change the
format.
Variable occurring fields: Variable occurring fields are data fields that can
repeat sequentially for multiple occurrences in a record. They are variable
because the amount of space required to contain them is variable, only using
the space required. The active number of field occurrences usually directly pre-
cedes the variable occurring fields. This means that a record with variable
occurring fields is a variable length record. This does not make it a varying
(format) structure because this does not change the format.
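A minimal sketch of a count-prefixed variable occurring field, assuming a hypothetical record layout: a 4-byte big-endian record id, a 1-byte occurrence count, then that many 2-byte values. Records with different counts have different lengths but the same format.

```python
import struct

def decode_record(buf: bytes):
    """Decode the hypothetical layout: id (4 bytes), count (1 byte),
    then `count` repeated 2-byte values, all big-endian, no padding."""
    rec_id, count = struct.unpack_from(">IB", buf, 0)
    values = list(struct.unpack_from(f">{count}H", buf, 5))
    return rec_id, values

# Same format, different occurrence counts, therefore different lengths.
short_rec = struct.pack(">IB2H", 7, 2, 100, 200)
long_rec = struct.pack(">IB4H", 8, 4, 1, 2, 3, 4)
print(len(short_rec), len(long_rec))  # 9 13
print(decode_record(short_rec))       # (7, [100, 200])
```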
penalty for using an outer join hierarchical view that contains more nodes than
are needed. This also means that the number of required views can be reduced
since one large view can do the job of many small ones.
View update: The term view update is really the capability to update a
multitable join view. This has always presented a problem because of the lack of
semantics when multiple tables are joined. Modeling hierarchical structures
allows much more flexibility with multitable updates.
Virtual field: A virtual field does not physically exist until it is requested or its
associated record is retrieved. At that time, it is computed. It is also called a
computed field.
Virtual key: A virtual key is a logical key that does not physically exist in a
row or record, but is used to retrieve the data and is inserted when the row or
record is retrieved into storage to act as its key. This can be the case when the
key exists in an index and does not exist in the row or record that is indexed.
Virtual view: A virtual view makes multiple data sources from possibly dis-
tributed sites appear as one seamless view in heterogeneous queries.
Web service: Any software service that is available over the Internet using a
standard XML messaging system that is not tied to a specific operating system.
WHERE clause filtering: WHERE clauses can also specify data filtering cri-
teria besides join criteria. When data filtering is specified on the WHERE
clause, it can affect the entire row so that, if the data filtering criteria cause the
last node occurrence to be removed, the entire row is filtered out. This is not
the case with ON clause filtering, which allows for a finer level of hierarchical
filtering with its data preserving operation.
White space: White space in XML documents consists of the space, tab,
carriage return, and linefeed characters. Unfortunately, white space can become
important and can affect outcomes of certain types of processing. For example,
when reconstructing a document from its deconstructed pieces, it is difficult to
recreate the white space exactly. This can throw document comparisons off.
XML aware: When an application can accept and output XML documents.
Also see XML enabled.
XML data type: A relational data type used in SQL to indicate native XML.
XML enabled: This term means that the indicated application, or utility, can
input and output XML. This means it can operate in an XML environment.
XPath: XPath is a simple XML query language that is now used in most
XML query languages as their navigational sublanguage. Unfortunately, XPath
is single path oriented and cannot handle multipath (bushy) queries in a single
use very well.
XQuery: XQuery is the newest XML query language endorsed by the W3C.
It is a procedural-like language that is particularly good at textual transforma-
tion. As a separate procedural XML processor, it is very good and powerful.
Performing full multipath hierarchical processing will require complex proce-
dural static processing.
Bibliography
Czumaj, A., Kowaluk, M., and Lingas, A., “Faster Algorithms for Finding
Lowest Common Ancestors in Directed Acyclic Graphs,” Electronic Collo-
quium on Computational Complexity, Revision 2 of Report No. 111, 2006.
David, M. M., “Advanced Capabilities of the Outer Join,” ACM SIGMOD
Record, Vol. 21, No. 1, March, 1992.
David, M. M., “ANSI SQL Hierarchical Processing Can Fully Integrate Na-
tive XML,” ACM SIGMOD Record, Vol. 32, Issue 1, March, 2003.
David, M. M., “Automatic Full Parallel Processing of Hierarchical SQL Que-
ries,” DevX, Feb 22, 2009.
David, M. M., “The Power Behind SQL’s Inherent Multipath LCA Hierar-
chical Processing,” Database Journal, May 20, 2010.
Dyreson, C., Bhowmick, S., and Jannu, A. R., “Morph: A (Shape) Polymorphic
XML Query Language,” Plan-X Workshop 08, Savannah, GA, 2009.
Guo, L., Shao, F., and Botev, C., “XRANK: Ranked Keyword Search over
XML Documents,” Proceedings of the 2003 ACM SIGMOD International Con-
ference on Management of data, San Diego, CA, 2003.
Hristidis, V., Koudas, N., and Papakonstantinou, Y., “Keyword Proximity
Search in XML Trees,” IEEE Transactions on Knowledge and Data Engi-
neering, Vol. 18, No. 4, 2004.
Krishnamurthy, R., Kaushik, R., and Naughton, J. F., “Unraveling the Dupli-
cate-Elimination Problem in XML-to-SQL Translation,” Seventh International
Workshop on the Web and Databases (WebDB), Paris, France, 2004.
Levene, M., and Loizou, G., “Semantics for Null Extended Nested Relations,”
ACM Transactions on Database Systems (TODS), Vol. 18, Issue 3, 1993.
Li, G., Feng, J., and Wang, J., “Effective Keyword Search for Valuable LCAs
over XML Documents,” Proceedings of the Sixteenth ACM Conference on Infor-
mation and Knowledge Management, Lisboa, Portugal, 2007.
Li, Q., and Moon, B., “Indexing and Querying XML Data for Regular Path
Expressions,” Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
Li, Y., Yu, C., and Jagadish, H. V., “Enabling Schema-Free XQuery with
Meaningful Query Focus,” The International Journal on Very Large Data Bases,
Vol. 17, Issue 3, May 2008.
Liu, Z., and Chen, Y., “Identifying Meaningful Return Information for XML
Keyword Search,” Proceedings of the 2007 ACM SIGMOD International Con-
ference on Management of data, Beijing, China, 2007.
Mani, M., Wang, S., and Dougherty, D. J., “Join Minimization in
XML-to-SQL Translation: An Algebraic Approach,” ACM SIGMOD Record,
Vol. 35, No. 1, March, 2006.
Pal, S., Cseri, I., and Seeliger, O., “XQuery Implementation in a Relational
Database System,” Proceedings of the 31st VLDB Conference, Trondheim, Nor-
way, 2005.
Shanmugasundaram, J., Kiernan, J., and Shekita, E., “Querying XML Views
of Relational Data,” Proceedings of the 27th VLDB Conference, Roma, Italy,
2001.
Shanmugasundaram, J., Krishnamurthy, R., and Tatarinov, I., “A General
Technique for Querying XML Documents Using a Relational Database Sys-
tem,” SIGMOD Record, Vol. 30, No. 3, September, 2001.
Shanmugasundaram, J., Tufte, K., and He, G., “Relational Databases for
Querying XML Documents Limitations and Opportunities,” Proceedings of
the 25th VLDB Conference, Edinburgh, Scotland, 1999.
Sun, C., Chan, C.-Y., and Goenka, A. K., “Multiway SLCA-based Keyword
Search in XML Data,” Proceedings of the 16th International Conference on
World Wide Web, Banff, Alberta, Canada, 2007.
Trotman, A., Geva, S., Kamps, J., Lalmas, M., and Murdock, V., “Current
Research in Focused Retrieval and Results Aggregation,” Computer Science In-
formation Retrieval, Vol. 13, No. 5, 2010, pp. 407–411.
Aho, A. V., Hopcroft, J. E., and Ullman, J. D., “On Finding Lowest Com-
mon Ancestors in Trees,” Annual ACM Symposium on Theory of Computing,
New York, NY, 1973.
Ullman, J. D., “Principles of Database and Knowledge-Base Systems,” The
Universal Relation, Volume II, Rockville, MD: Computer Science Press, 1989,
p. 1050.
Vagena, Z., Moro, M. M., and Tsotras, V. J., “Twig Query Processing over
Graph-Structured XML Data,” Seventh International Workshop on the Web
and Databases, Paris, France, 2004.
Xu, Y., and Papakonstantinou, Y., “Efficient LCA based Keyword Search in
XML Data,” Proceedings of the 11th ACM International Conference Series on
Extending Database Technology, Nantes, France, 2008.
Zhang, S., and Dyreson, C., “Polymorphic XML Restructuring,” WWW
Workshop 06, Edinburgh, Scotland, May 23–26, 2006.
About the Authors
Michael M. David is the founder and CTO of Advanced Data Access Technol-
ogies. Previously, he was the lead XML architect for NCR/Teradata and their
representative to the SQLX Group. Before that, he was a staff scientist at
Teradata designing database utilities, a senior software designer at Sterling
Software’s Answer Division, and a software designer at Informatics General. He has
researched, designed, and developed commercial query languages for heteroge-
neous hierarchical and relational databases for over 25 years. He has authored
many papers and articles on database topics and his research findings. These
have appeared in SOA World Magazine, Database Journal, DevX, TDAN, DM
Review, XML Journal, Semantic Universe, Web Techniques, Database Program-
ming & Design, DBMS Magazine, ACM SIGMOD Record, Ken North’s
SQLSummit Site, and Colin White’s Info DB Journal.
His research and findings have shown that hierarchical data processing is
a subset of relational data processing, and have shown how to utilize this
advanced inherent capability automatically in standard SQL. At a deeper level,
he discovered and located where standard SQL is inherently performing LCA
processing that is needed to support multipath hierarchical processing, and has
determined how this advanced processing has occurred naturally, proving its
existence and validity for use. He has also found valid semantic extensions to
hierarchical data modeling and processing, allowing for powerful data structure
mashups and flexible data structure transformations, using the hierarchical
semantics in relational rowsets, which assures that the resulting semantics are
valid. This book covers these new advancements.
Lee Fesperman is a software veteran who implemented operating systems,
compilers, interpreters, and assemblers at IBM in the early 1970s. With the