Advanced Standard SQL Dynamic
Structured Data Modeling and
Hierarchical Processing

Michael M. David
Lee Fesperman
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the U.S. Library of Congress.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Cover design by Vicki Kane

ISBN: 978-1-60807-533-1

©2013 ARTECH HOUSE
685 Canton Street
Norwood, MA 02062

All rights reserved. Printed and bound in the United States of America. No part of this
book may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording, or by any information storage and retrieval system,
without permission in writing from the publisher.
All terms mentioned in this book that are known to be trademarks or service marks have
been appropriately capitalized. Artech House cannot attest to the accuracy of this
information. Use of a term in this book should not be regarded as affecting the validity
of any trademark or service mark.

Contents

Preface

Introduction

Part I: The Basics of the Relational Join Operation

1 Relational Join Introduction
1.1 Standard Inner Join Review
1.2 Problems with Relational Join Processing
1.3 Outer Join Review
1.4 Problems with Previous Outer Join Syntax
1.5 Conclusion

2 The Standard SQL Join Operation
2.1 Standard SQL Join Syntax
2.2 Standard SQL Join Operation
2.3 Standard SQL Join Does Not Follow the Cartesian Product Model
2.4 Determining Standard SQL Join Associativity and Commutativity
2.5 What Outer Join Commutativity Is
2.6 What Outer Join Associativity Is
2.7 Hierarchictivity in Addition to Associativity and Commutativity
2.8 Conclusion

3 Standard SQL Join Types and Their Operation
3.1 FULL Outer Join
3.2 One-Sided Outer Join
3.3 INNER Join
3.4 CROSS Join
3.5 UNION Join
3.6 Intermixing Join Types
3.7 Conclusion

4 Natural Joins
4.1 Explicit and Implicit Natural Joins
4.2 Multitable Natural Outer Joins
4.3 Natural One-Sided Outer Join
4.4 Natural FULL Outer Join
4.5 Natural Inner Joins
4.6 Intermixing Natural Join Types
4.7 Natural One-Sided Join Transformation
4.8 Conclusion

Part II: Outer Join Data Modeling and Structured Processing

5 Data Structure Review
5.1 The Power of Hierarchical Data Structures
5.2 Three-Tier Database Architecture
5.3 External and Internal Views
5.4 Conceptual View
5.5 Many-to-One and One-to-Many Relationships
5.6 Many-to-Many Relationships
5.7 Converting Network Structures to Hierarchical Structures
5.8 Relating Hierarchical Processing to Relational Processing
5.9 Physical Versus Logical Data Structures
5.10 Sibling Legs Query Semantics
5.11 Ordering of Data Structures Can Cause Their Restructuring
5.12 Data Structure Composition
5.13 Good Data Modeling Design Principles
5.14 Conclusion

6 Outer Join Does Data Modeling
6.1 SQL Data Modeling Using the Outer Join
6.2 ON Clause Data Modeling Join Condition Rules
6.3 Valid and Invalid ON Clause Data Modeling Examples
6.4 Valid and Invalid Data Modeling Results
6.5 Substructure Views
6.6 WHERE Clause Filtering with Data Structures
6.7 WHERE Clause Filtering with Substructures
6.8 Complex Data Modeling Example
6.9 Conclusion

7 Outer Join Data Modeling–Related Capabilities
7.1 Data Structure Filtering
7.2 Indirect Structure Linking
7.3 Nonhierarchical Join Type Support
7.4 Nonhierarchical Joining of Data Structures
7.5 Many-to-Many Data Modeling and Intersecting Data
7.6 Conclusion

8 More About Outer Join Data Modeling
8.1 Importance of SQL’s Inherent Data Structure Processing Ability
8.2 Efficient Client/Server Data Structure Processing
8.3 Coding Data Modeling Outer Join Statements
8.4 Generation of Data Modeling Outer Join Statements
8.5 Hierarchical Data Structure Processing Empirical Proof
8.5.1 Hierarchical Control
8.5.2 Structure Control
8.6 Nonhierarchical Data Structure Processing Empirical Proof
8.7 Embedded Structured View Support Empirical Proof
8.8 Indirect Link Empirical Proof
8.9 SQL:1999 and Data Modeling
8.10 What Makes the ANSI Outer Join Unique for Data Modeling
8.11 Data Modeling with Old-Style Outer Joins
8.12 The New Role of the Inner Join Operation
8.13 Conclusion


Part III: New Capabilities Based on Outer Join Data Modeling

9 Data Structure Extraction (DSE) Technology
9.1 Extracting Data Structure Information From the Outer Join
9.2 DSE Example
9.3 Logical Table Example
9.4 Symmetric Linking of Data Structures Example
9.5 DSE Internal Logic
9.6 Why Vendors Need the DSE Technology
9.7 DSE Avoids Imposing Data Structures on SQL
9.8 Conclusion

10 Outer Join Advanced Capabilities
10.1 Database Navigation
10.2 Access Optimizations
10.3 Enterprise and Legacy Database Access
10.4 Open Database Access Interface
10.5 Seamless Value-Added Features
10.6 Data Warehouse Interface
10.7 Hierarchical Relational Processing
10.8 Object Relational Interface
10.9 View Update Capability
10.10 Multimedia Application Directory Support
10.11 Universal Data Access of Structured Data
10.12 The SQL XML Data Structure Connection
10.13 Conclusion

11 Outer Join Optimization
11.1 Join Table Reordering
11.2 Dynamic Shortening of the Access Path
11.3 Removal of Unnecessary Tables From Outer Join View
11.4 Increased Efficiency of Parallel Database Processing
11.5 Dynamic Rebuild to Pick Up New SQL Features
11.6 Optimization of Nonrelational SQL Interfaces
11.7 Applying Hierarchical Optimizations to Network Structures
11.8 Shifting ON Clauses to the WHERE Clause
11.9 Conclusion

12 Hierarchical Relational Processor Prototype
12.1 Hierarchical Relational Prototype Operation
12.2 Basic Data Modeling
12.3 Many-to-Many Relationships
12.4 Embedded Views
12.5 View Optimization
12.6 Conclusion

13 Object/Relational Interface
13.1 Standardized SQL Interface
13.2 Data Modeling and Structure Processing
13.3 Data Abstraction and Reusability
13.4 Data Inheritance
13.5 Database Navigation, Efficiency, and Nonrelational Access
13.6 Late Binding and Polymorphism
13.7 Plug and Play
13.8 Conclusion

14 Nonrelational SQL-Based Universal Data Access
14.1 Structured Record Overview
14.2 SQL Structured Data Access Basics
14.3 Internal Navigation and Mapping of Structured Data
14.4 SQL-Based Universal Data Access of Structured Data
14.5 Handling Multiple Structure Formats Within a File
14.6 Interfacing to Prerelational and Postrelational Data
14.7 The Importance of the View for Contiguous Data
14.8 Conclusion

Part IV: Advanced Data Structure Processing Capabilities

15 Advanced Lower Structure Linking
15.1 Overview of Nonroot Lower Level Linking
15.2 Previous Nonroot Lower Level Linking Method
15.3 Semantics of Nonroot Lower Level Linking
15.4 Single Path Reference to Lower Structure
15.5 Multiple Path References to Lower Structure
15.6 Optimization Concerns for Nonroot Lower Level Linking
15.7 Using Lower Structure Linking With a View WHERE Clause
15.8 Conclusion

16 Dynamic Structure Combining by Joining, Mashups, and Association
16.1 Static Structure Join
16.2 Dynamic Structure Join
16.3 Heterogeneous Join
16.4 Access Path Data Filtering
16.5 Natural View Nesting
16.6 Simple Mashup
16.7 Complex Mashup
16.8 Combining Structures with Association Tables
16.9 More Complex Association Table Usage
16.10 Conclusion

17 Dynamically Increasing Data Value and Flexibility
17.1 Data Structure Modeling of Single-Path Structures
17.1.1 Structure Modeling Vertical Growth
17.1.2 Structure Modeling Depth Growth
17.2 Data Structure Modeling of Multiple-Path Processing
17.3 Static Data Joining of Structures
17.4 Dynamic Data Joining of Structures
17.5 Logical Data Structure Advantage
17.6 Multipath Data Qualification
17.7 Dynamic Path Data Filtering
17.8 Miscellaneous Operations that Increase the Data Value
17.8.1 Structure-Aware Processing
17.8.2 Hierarchical Optimization
17.8.3 Increase of Data Accuracy and Correctness
17.8.4 Interactive Data Access
17.8.5 Automatic Data Aggregation
17.9 Conclusion

18 Automatic Multipath Hierarchical Structure Operations
18.1 Structure-Aware Processing
18.2 Hierarchical Optimization
18.3 Focused Aggregated Data Retrieval
18.4 Multipath Hierarchical Processing
18.4.1 LCA Processing
18.4.2 LCA Type 1 Internal Processing
18.4.3 LCA Type 2 Internal Processing
18.4.4 LCA Type 2 Variable OR Processing
18.4.5 Multiple LCA Type 1 Processing
18.4.6 Combining Processing of LCA Types 1 and 2
18.5 Nonlinear Ordering
18.6 Global Views and Schema-Free Processing
18.7 Global Queries and Hierarchical Data Filtering
18.8 Automatic Hierarchical Parallel Processing
18.9 Conclusion

19 Variable Data Structure Generation
19.1 Variable Data Structure Generation Is a Powerful Concept
19.2 Linking Below the Root Increases Structure Joining
19.3 Looking Backward and Forward
19.3.1 Looking Backward
19.3.2 Looking Forward
19.4 Advanced Variable Structure Control
19.5 Flexible Multiple Generation Choices
19.5.1 One or the Other Variable Test
19.5.2 Multiple Independent Tests
19.6 Nested and Embedded Variable Structure Creation
19.6.1 Nested Variable Structure Test
19.6.2 Embedded Variable Structure Test
19.7 Variable Structure Generation Along Multiple Paths
19.8 Variable Structure Range Filtering
19.9 Why Variable Structures Work with Hierarchical Data
19.10 Conclusion

20 Semantically Controlled Data Structure Transformations
20.1 Restructuring and Reshaping
20.1.1 Restructuring
20.1.2 Restructuring Using Multiple Levels
20.2 Reshaping
20.2.1 Inverting a Linear Structure by Reshaping
20.2.2 Linear-to-Nonlinear Reshaping
20.2.3 Nonlinear-to-Linear Reshaping
20.2.4 Nonlinear-to-Nonlinear Reshaping
20.3 Data Structure Virtualization
20.3.1 Data Fragment Control
20.3.2 Data Virtualization Example
20.4 Polymorphic Transformation
20.4.1 Polymorphic Linear Example
20.4.2 Polymorphic Nonlinear Example
20.5 Multipath Queries Alternative to Transformations
20.6 Conclusion

21 Automatic Processing of Remote Dynamic Structured Data
21.1 Static Versus Dynamic Structured Data
21.2 Automatic Processing of Remote Dynamic Structured Data
21.3 Dynamic Structured Data Processing Example
21.4 Integrating SQL with Dynamic Structured Data Maintenance
21.5 Different Levels of Metadata Processing
21.6 Structured Data Processing Collaboration
21.7 SQL Hierarchical Processing for Structured Data Collaboration
21.8 Conclusion

Part V: SQL Transparent XML Hierarchical Multipath Query Processor

22 New SQL Hierarchical Processing Technology and Discoveries
22.1 External Versus Internal SQL Hierarchical Processing
22.2 Hierarchical Processing Background History
22.3 Hierarchical Principles and Operation
22.4 Schema-Free Navigationless Hierarchical Database Access
22.5 Focused Aggregated Data Retrieval
22.6 Combining Relational and Hierarchical Advantages
22.7 Global Hierarchical Optimization
22.8 SQL Multipath Multioccurrence Data Filtering
22.9 Multipath LCA Types of Processing
22.9.1 WHERE Clause LCA Processing
22.9.2 SELECT Operation LCA Processing
22.10 Isolating and Manipulating Data Segments
22.11 Linking Below Root
22.12 SQL Data Transformations
22.13 Conclusion

23 SQL/XML: Operation, Politics, Ramifications, and Solution
23.1 XML Data Description and Operation
23.1.1 Semistructured Data
23.1.2 Multiple Content Types
23.1.3 Variable Structure Formats
23.1.4 Duplicate Element Use
23.1.5 Shared Element Data
23.1.6 XML Navigation
23.1.7 Namespaces
23.1.8 Recursive Structures
23.1.9 Ordered Data
23.1.10 XML Data Processing
23.2 Politics of SQL, XML, and the Secret Agenda
23.2.1 SQL/XML Standard and XQuery Decisions Limit Capabilities
23.2.2 XQuery’s Decision to Also Support Relational Processing
23.2.3 Limiting Hierarchical Support to Single-Path Processing
23.2.4 Ignoring Navigationless Schema-Free Access Support
23.2.5 Not Utilizing Standard SQL’s Natural Hierarchical Processing
23.3 Further Effects of the Secret SQL/XML Agenda
23.3.1 SQL/XML Vendor Solutions are Proprietary and Incompatible
23.3.2 XQuery and SQL/XML Standard Favors Semi-structured Processing
23.3.3 XML Processing Today Is Limited by User’s Linear Mindset
23.3.4 XQuery Does Not Support SQL’s Powerful SELECT Operator
23.4 A Better SQL/XML Solution Using Standard SQL is Possible
23.4.1 The SQL Hierarchical XML Solution Stays Naturally within SQL
23.4.2 XML-Centric Syntax Additions Are Unnecessary
23.5 Conclusion

24 SQL Hierarchical XML Processor Operation
24.1 Mapping Relational Hierarchical Structure to Hierarchical Relational Rowset
24.2 Mapping Physical XML Hierarchical Structure to Hierarchical Relational Rowset
24.3 SQL Hierarchical Query Specification with Data Filtering
24.4 SQL Hierarchical Processor Internal Layout
24.5 SQL Hierarchical XML Processor External Operations
24.6 SQL Hierarchical XML Processor Operations
24.6.1 Preprocessor
24.6.2 Standard SQL Processor
24.6.3 Asynchronous Access Processor
24.6.4 Postprocessor
24.7 Conclusion

25 SQL Hierarchical XML Processor Examples
25.1 Node Selection with SQL SELECT Operation
25.1.1 Selecting a Single Linear Path
25.1.2 Node Promotion with Single Path
25.1.3 Node Collection with Multiple Paths
25.1.4 Selecting Structure Fragments
25.2 Multipath Hierarchical Data Filtering using WHERE Clause
25.2.1 Downward Path Data Qualification
25.2.2 Upward Path Data Qualification
25.2.3 Bidirectional Data Qualification
25.3 Simple Multipath Nonlinear Data Qualification
25.3.1 LCA Many-to-One Result Data Qualification
25.3.2 LCA One-to-Many Result Data Qualification
25.3.3 LCA Can be Located Higher than Parent
25.3.4 LCA Data from Up and Down the Structure
25.3.5 Multiple LCAs
25.4 Complex Multipath Nonlinear Data Qualification
25.4.1 LCA Determines Range of Combinations for Decision Logic
25.4.2 LCA Data Combinations are Controlled by Data Occurrence
25.4.3 Variable LCAs with OR Decision Logic
25.4.4 Complex Multipath LCA Decision Logic
25.4.5 LCA Logic too Complex to Hand Code
25.5 Backward Path Data Filtering
25.5.1 Static Backward Path Data Filtering
25.5.2 Dynamic Backward Path Qualification
25.6 Advanced Structure Linking with Data Mashups
25.6.1 Hierarchical Structure Linking
25.6.2 Linking Below Root of Lower Structure with Root Selected
25.6.3 Linking Below Root of Lower Structure without Root Selected
25.6.4 Filtering Below Root of Lower View with Qualification
25.7 Dynamic Variable Structure Generation Control
25.7.1 Variable Structure Generation Controlled at the Node Level
25.7.2 Variable Structure Generation Controlled at the View Level
25.8 Conclusion

26 Summary

Appendix: Database Relationships and Views Used in This Book
Notes on the Database Views

Glossary

Bibliography

About the Authors

Index

Preface
This revised and updated edition of Advanced ANSI SQL Data Modeling and
Structure Processing delves deeper into the inherent hierarchical processing of
SQL and covers the hierarchical processing discoveries and findings that have
emerged since the first edition was published. To be clear, this is not a book on
external databases built on top of SQL and driven procedurally by the user.
Databases of that type are two-dimensional, consisting of height and width,
and are basically flat. This book is about the natural hierarchical database
inherent in SQL-92: a powerful, automatic, three-dimensional database with the
height, width, and depth necessary to process heavy-duty professional databases
such as IBM’s IMS and XML databases, as well as new logical hierarchical
relational databases.
Many new hierarchical data modeling and processing capabilities have been made
possible by the standard SQL join syntax and outer join operation added in the
SQL-92 standard. This is still one of SQL’s best-kept secrets. Most of these
capabilities are not generally known, if they are known at all. They have been
lying dormant, waiting to be utilized, and they unlock the power of hierarchical
processing that comes free with the SQL-92 standard. The standard SQL join syntax
actually contains a very flexible and powerful programming language with dynamic
data modeling and hierarchical structure processing capabilities. Their full
utilization can be extremely beneficial to all SQL programmers, DBAs, database
designers, product developers, data scientists, and product users. While these
capabilities are available for use, they have not been documented in other SQL
reference books or SQL vendors’ user manuals. This book remedies this problem by
thoroughly documenting these inherent hierarchical data modeling and processing
capabilities. It also demonstrates them, so that database professionals can see
examples of these hierarchical queries run on an experimental SQL hierarchical
XML processor.
Using this book, SQL beginners and experts alike will be able to use the standard
SQL outer join operation immediately and take advantage of its advanced,
underutilized hierarchical processing capabilities. The outer join technology
presented can be safely applied because it is open and compatible with standard
SQL, avoiding interface problems now and in the future. Because the inherent and
direct processing of complex hierarchical data structures is new to SQL, data
structures, their semantics, and their direct use with the standard SQL outer
join are also well covered in this book. This rounds out the coverage of the outer
join and its many uses. Some of its advanced new capabilities are: hierarchical
integration of relational and hierarchical data; dynamic, transparent, and
navigationless hierarchical multipath processing; automatic processing of
dynamically structured data; powerful any-to-any structure transformations;
structure-aware processing for hierarchical optimization; dynamic formatted XML
output; and dynamic joining of hierarchical structures to create new structures.
The standard SQL join has many different join types and a very flexible syntax
for specifying them, which can significantly control its operation and affect its
join result. This makes outer joins difficult to use and prone to semantic errors.
Many combinations of join types produce illogical structures that can yield
ambiguous results. It is a complicated topic, and for these reasons there has not
been a book or vendor manual on SQL that demonstrates or discusses anything more
than very simple two-table outer joins. The outer join operation is too complex a
topic to deal with in a limited way, so it is fully covered in this book.
The real power of the outer join is achieved when these advanced capabilities
are used in outer joins involving three or more tables. This book instructs the
SQL user on how to perform powerful multiple-table outer joins by following a set
of hierarchical rules and principles that make constructing these joins, and
understanding their effects and semantics, very intuitive. This structured data
logic can be embedded in SQL views. This data modeling and structure processing
ability can establish a default standard for database modeling because it is
supported completely by standard SQL syntax and semantics. The following new
features are supported:

• Automatic processing of dynamic and variable structured data;
• Data structure mashups, transforms, and visualizations;
• XML transparent native input and navigationless output;
• Structure-aware processing for hierarchical optimization;
• Automatic structured data formatted output.

The SQL examples in this book have been designed so that the intended meaning of
the query results is self-explanatory. This means there is usually no need to
compare query output data in the examples against actual data in the database. A
consistent set of familiar data structures is used throughout the book (see the
appendix). In addition, if the structure is important to the example, it is shown
again in the example. The query result columns are usually arranged following
their structure so that the semantics are more easily interpreted based on the
data structure. When comparing the results of queries, it is important to keep in
mind that the column order of sibling segments has no semantic significance.
There are two types of SQL examples used in this book: real-world examples and
pseudo-examples. The real-world examples are valid SQL and are used to show
specific examples, while the pseudo-examples are not necessarily complete or
totally valid SQL. They are used when it is important to convey a general idea or
principle easily. Often, the pseudo-examples use table names such as T1, T2 or A,
B, C, and may also use these names in place of column names to emphasize that what
matters is not the column name but the table to which the column belongs. A
pseudo-SQL example may have the form of From A Left Join on A=B where there may be
no SELECT clause or fully qualified column names in the ON condition when they
are not necessary to the concept being discussed.
This book is divided into five parts that are best read sequentially, though the
important points are repeated or referenced in the text when understanding them
is necessary for the topic being covered. Part I covers the basics of the
relational join operation. Part II investigates the basic data modeling and
structure processing features that are inherent in the standard SQL outer join
and are available for immediate use. Part III explains the new capabilities that
were not previously possible in SQL but are now made possible by the outer join’s
data modeling capability. Part IV examines advanced data structure processing
operations that have been made possible by SQL’s new hierarchical data modeling
and processing capabilities. Part V, building on the hierarchical data modeling
and structure processing background presented in the earlier parts, describes the
creation of a new and powerful SQL transparent hierarchical XML query processor
and how it operates. What makes this SQL processor different is that it
transparently supports full hierarchical multipath processing with inherent
native XML input and output support. This capability was developed utilizing new
discoveries made during the research of this technology.

These discoveries, which are covered in this book and which make these
hierarchical capabilities possible in SQL, are:

• Full hierarchical data processing;
• LCA processing in SQL to support multipath queries;
• Linking below the root to support data mashups;
• Global structures and views with no overhead;
• Navigationless schema-free processing;
• Any-to-any data structure transformation;
• Automatic metadata maintenance for peer-to-peer use.

Introduction
The outer join operation was introduced in the SQL-92 standard. It can be used to
hierarchically preserve data in a join operation so that no data is lost when
joining tables. The older standard join, known as the inner join, loses data
whenever a row from one table does not find a match in the other table being
joined. For example, joining a Department table with an Employee table using a
standard inner join will lose all departments that do not have any employees, and
all employees that do not belong to a department. The one-sided outer join
prevents this data loss on the desired side. This allows a hierarchical structure
to be generated and processed one join node at a time by controlling which side’s
data is preserved.
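
The difference between the two join types can be sketched with a small, self-contained script. It uses Python's built-in sqlite3 module only as a convenient way to run the SQL. The two-table layout, the EmpNo column, and the sample rows are assumptions made for illustration; the DeptNo and EmpDeptNo names are borrowed from the join statement used in Chapter 1.

```python
import sqlite3

# Hypothetical sample data: department D1 has no employees, and employee E2
# belongs to no department.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1', 'D2'), ('E2', NULL);
""")

# Inner join: unmatched rows on either side are dropped.
inner_rows = conn.execute("""
    SELECT DeptNo, EmpNo
    FROM Department INNER JOIN Employee ON DeptNo = EmpDeptNo
    ORDER BY DeptNo
""").fetchall()

# Left outer join: every Department row is preserved, matched or not.
left_rows = conn.execute("""
    SELECT DeptNo, EmpNo
    FROM Department LEFT OUTER JOIN Employee ON DeptNo = EmpDeptNo
    ORDER BY DeptNo
""").fetchall()

print(inner_rows)
print(left_rows)
```

With this data, the inner join returns only the D2/E1 pair, while the left outer join also returns D1 with a null in place of the employee. Employee E2 is absent from both results, because only the Department side is being preserved.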
To carry out this data preservation, the outer join has an important
characteristic that the older inner join did not have: control over the order in
which the joins are performed, which can affect the result. This means that the
capability to control the join order was introduced into the syntax of the
standard one-sided SQL join operation. Each of these joins has its own join
criteria, specified in an ON clause at its join point. This offers further
control over the join, producing new capabilities such as full multipath
hierarchical processing. These added capabilities are significant to SQL. A
cornerstone of SQL has always been that the join order does not matter. The
SQL-92 standard join syntax and its additional join capabilities change all of
that. They make the standard SQL join syntax a very powerful, self-contained
hierarchical data modeling language with capabilities that users can apply
directly out of the box. Database product developers can also use it to freely add
new features and capabilities to standard SQL. The purpose of this book is to
explain and demonstrate how the standard SQL left outer join can be used as a
self-contained hierarchical data modeling and processing language, what its
capabilities are, and how it achieves them.
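
To see why join order matters once outer joins are involved, consider the following sketch, again run through Python's sqlite3 module. The Dependent table's columns (DepName, DepEmpNo), the EmpNo column, and all sample rows are hypothetical. The second query groups the inner join first by writing it as a derived table, which is an assumption chosen for portability rather than the join syntax this book develops in later chapters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DepName TEXT, DepEmpNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D2');
    INSERT INTO Dependent VALUES ('K1', 'E2');
""")

# Order 1: Department LEFT JOIN Employee first, then INNER JOIN Dependent.
outer_then_inner = conn.execute("""
    SELECT DeptNo, EmpNo, DepName
    FROM Department
    LEFT OUTER JOIN Employee ON DeptNo = EmpDeptNo
    INNER JOIN Dependent ON EmpNo = DepEmpNo
    ORDER BY DeptNo
""").fetchall()

# Order 2: Employee INNER JOIN Dependent first, then LEFT JOIN to Department.
inner_then_outer = conn.execute("""
    SELECT DeptNo, EmpNo, DepName
    FROM Department
    LEFT OUTER JOIN (
        SELECT EmpNo, EmpDeptNo, DepName
        FROM Employee INNER JOIN Dependent ON EmpNo = DepEmpNo
    ) AS EmpDep ON DeptNo = EmpDeptNo
    ORDER BY DeptNo
""").fetchall()

print(outer_then_inner)
print(inner_then_outer)
```

The same three tables and the same two join conditions give different results. In the first ordering the trailing inner join discards department D1, which the outer join had preserved; in the second, D1 survives with nulls beside it.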
There are data modeling books on the market that cover hierarchical data
modeling. The difference with this book is that it explains standard SQL’s
inherent hierarchical data modeling capability and why it is not just another data
modeling methodology. It is a complete data modeling language that actually
controls SQL’s full hierarchical operation. This book is therefore not proposing
just another data modeling language; it is defining how the one that inherently
exists in standard SQL operates and performs full hierarchical processing. This
allows it to be utilized immediately after standard SQL is installed. When a
hierarchical data structure is modeled using a standard SQL join and is
subsequently executed in SQL, the result reflects exactly the hierarchical
semantics of the data structure being modeled. Using this natural technology, an
experimental SQL hierarchical XML processor was built to test the hierarchical
processing and the hierarchical/relational integration, and to produce XML
structured output that demonstrates and verifies the hierarchical accuracy.

Part I: The Basics of the Relational Join Operation

Part I covers the basics of the relational join operation with a concentrated
look at the more complex and lesser-known outer join operation. The inner join is
the more common and simpler standard join. Chapter 1 introduces the inner and
outer join operations and explains their basic functions and operations, and their
strong and weak points. Chapter 2 defines the standard SQL outer join operation
and discusses its main operation. Chapter 3 goes into the many different types and
features of the standard SQL outer join operation and their specific operations.
Chapter 4 concentrates on one specific optional feature of the join operation, the
NATURAL option. This feature makes each outer join type operate in a different
way, which is why it has its own chapter.

1 Relational Join Introduction

In relational databases, data is stored in two-dimensional tables. These tables
are arranged in rows and columns of data, where each row can be thought of as a
record and the columns are the data fields. For example, a given row would contain
related data such as employee number, salary, and department number. Other rows in
the table would contain these same types of information (attributes) for other
employees.
An application database view usually requires multiple tables, because standard
relational tables do not yet allow for variable repeating fields in a row. This is
because standard relational databases require data in first normal form. Thus,
repeating data is supported by using additional tables that hold the repeating
values in multiple rows. Second and third normal form data modeling decisions can
also account for related data being split across multiple tables, but these
decisions relate to good database design and are not a requirement.
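
As a small sketch of how repeating values end up in a separate table, the script below stores each employee once and each dependent as its own row. It runs the SQL through Python's built-in sqlite3 module; the column names (EmpNo, Salary, DepEmpNo, DepName) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each employee row holds single-valued fields only.
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Salary INTEGER);
    -- The repeating values get one row each in a separate table.
    CREATE TABLE Dependent (DepEmpNo TEXT, DepName TEXT);
    INSERT INTO Employee VALUES ('E1', 50000), ('E2', 60000);
    INSERT INTO Dependent VALUES ('E1', 'Ann'), ('E1', 'Bob'), ('E2', 'Cy');
""")

# The variable-length list of dependents is recovered by selecting the rows
# that carry the employee's number.
dependents_of_e1 = [name for (name,) in conn.execute(
    "SELECT DepName FROM Dependent WHERE DepEmpNo = 'E1' ORDER BY DepName")]

# Different employees can have different numbers of repeating values.
dependent_counts = conn.execute("""
    SELECT DepEmpNo, COUNT(*) FROM Dependent
    GROUP BY DepEmpNo ORDER BY DepEmpNo
""").fetchall()

print(dependents_of_e1)
print(dependent_counts)
```

No row ever needs a variable number of fields: an employee with more dependents simply has more rows in the second table.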
In relational terms, rows are also known as tuples. Each table column contains
the same type of data (attributes), such as salary or department number. Every row
needs to be uniquely identified by a primary-key field such as employee number or
social security number. Rows can also contain nonunique key fields such as
alternate and foreign keys, like a department number in the Employee table. These
can be used to access a group of related rows, such as all employees for a given
department.
A primary-key field in one table can be a foreign-key field in another table.
This is the case in the familiar Department and Employee tables, where the
department number is the primary key of the Department table and a foreign key in
the Employee table. A join operation is used to combine tables like the Department
and Employee tables using a common key in both tables, such as the department
number, to match the rows that will be combined.
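
The key relationships described above can be declared directly in the table definitions. The sketch below is one possible rendering, run through Python's sqlite3 module; the EmpNo column, the sample rows, and the decision to switch on SQLite's foreign-key enforcement are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (
        EmpNo     TEXT PRIMARY KEY,
        EmpDeptNo TEXT REFERENCES Department (DeptNo)
    );
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D1'), ('E3', 'D2');
""")

# The nonunique foreign key gives access to a group of related rows.
d1_employees = [emp for (emp,) in conn.execute(
    "SELECT EmpNo FROM Employee WHERE EmpDeptNo = ? ORDER BY EmpNo", ("D1",))]

# The primary key must identify each row uniquely.
try:
    conn.execute("INSERT INTO Employee VALUES ('E1', 'D2')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key must match a primary key in the Department table.
try:
    conn.execute("INSERT INTO Employee VALUES ('E4', 'D9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(d1_employees, duplicate_rejected, orphan_rejected)
```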

1.1 Standard Inner Join Review


The standard join operation is known as the inner join. It horizontally combines
two or more tables into a single working table or view. The matching of the rows
over the same domain is controlled by the WHERE clause join condition, as
specified in this join statement: SELECT * FROM Department, Employee WHERE
DeptNo=EmpDeptNo.
An inner join is performed in principle by logically performing the Carte-
sian product (generating all combinations of rows) of the tables and then apply-
ing the WHERE join condition, which specifies the join criteria such as
DeptNo=EmpDeptNo. The WHERE join condition will remove all combina-
tions of rows that do not satisfy the join criteria, leaving only those combined
rows that link up properly (i.e., their keys match up); otherwise, in the
SELECT statement in the paragraph above, each employee would remain
joined to each department instead of only the department to which the
employee belongs.
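The two logical steps described above (form the Cartesian product, then apply the WHERE join condition) can be sketched as follows. This is a minimal illustration that uses Python's standard sqlite3 module only as a harness for running the SQL; the column types are assumptions, and the data is taken from Figure 1.1.

```python
import sqlite3

# In-memory database holding the sample data from Figure 1.1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
""")

# Step 1: the Cartesian product pairs every department with every employee.
product = conn.execute("SELECT * FROM Department, Employee").fetchall()

# Step 2: the WHERE join condition keeps only combinations whose keys match.
joined = conn.execute(
    "SELECT * FROM Department, Employee "
    "WHERE DeptNo = EmpDeptNo ORDER BY EmpNo"
).fetchall()

# Applying the same condition by hand to the product gives the same rows
# (column 0 is DeptNo, column 4 is EmpDeptNo).
filtered_by_hand = [row for row in product if row[0] == row[4]]

print(len(product))  # 2 departments x 3 employees
for row in joined:
    print(row)
```

A real SQL system is free to avoid materializing the full product; the sketch only shows that the logical definition and the join statement agree on the result.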
One problematic characteristic or side effect of the inner join operation is
that it will eliminate entire rows from the generated result table that fail any
part of the join criteria conditions. Therefore, inner joining the Department
table with the Employee table will always exclude both departments that have
no employees and employees that do not belong to a department. This side
effect of losing data is magnified when more than two tables are inner joined.
For example, when inner joining the Department, Employee, and Dependent
tables, employees that have no dependents are excluded, and a department whose
employees are all excluded is in turn excluded from the result. This side
effect, if not known, can easily go unnoticed, producing undesirable results.
The inner join example in Figure 1.1 demonstrates the data loss concepts pre-
sented here.
The example in Figure 1.1 demonstrates the inner joining of the Depart-
ment table with the Employee table, producing the join result table shown. The
data in the Department and Employee tables are also shown, demonstrating
how department A’s data and employee Y’s data are excluded from the result
because they have no matching row in the other table. The outer join operation
described in Section 1.3 solves this problem of missing data. Also notice in Fig-
ure 1.1 that the replicated data, “DeptB 456,” from the Department table was
introduced into the join result table because relational tables have a flat
two-dimensional structure.

SELECT DeptNo, DeptBudget, EmpNo, EmpSalary, EmpDeptNo
FROM Department, Employee WHERE DeptNo=EmpDeptNo

Department   +   Employee        =   Join
Table:           Table:              Result:

DeptA 123        EmpX 10 DeptB       DeptB 456 EmpX 10 DeptB
DeptB 456        EmpY 20 DeptC       DeptB 456 EmpZ 30 DeptB
                 EmpZ 30 DeptB

Figure 1.1 Sample inner join of Department and Employee tables.

With the inner join, the order that the tables are specified for joining does
not affect the result. If the order that the table names were specified in the inner
join statement in Figure 1.1 were reversed, the result would remain the same.
Because the order that the table joins are processed has no effect on the result,
this allows internal optimizations to pick the most efficient join order for
execution.
It is also worth mentioning that the WHERE clause can specify filtering
criteria as well as join criteria, as in SELECT * FROM Department, Employee
WHERE DeptNo=EmpDeptNo AND EmpSalary >= 50000. In this case, the join
operation also filters out result rows where the salary is less than
50,000.

1.2 Problems with Relational Join Processing


The inner join result in Figure 1.1 demonstrates three problems: lost data, rep-
licated data, and lack of data modeling. Lost data caused by unmatched rows
(dangling tuples) is normal for relational database operation. It keeps the
underlying operational principles mathematically sound. Unmatched rows
present a problem in how to preserve them so that they are mathematically
sound, operate consistently, and are unambiguous (which is discussed in the
next section).
Replicated data also becomes necessary with relational data stored in
two-dimensional tables. In the join result in Figure 1.1, department B’s data is
replicated so that any row taken in isolation has all the data required. Unfortu-
nately, this can easily and unknowingly throw summaries off by introducing
replicated values into the result.
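The effect of replicated values on summaries can be sketched with the Figure 1.1 data. This is an illustration only: Python's sqlite3 module is used as a harness, the scalar helper is a hypothetical convenience function, and the EXISTS query is just one possible way to avoid counting a department more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Budget total taken directly from the Department table.
true_total = scalar("SELECT SUM(DeptBudget) FROM Department")

# Budget total taken from the join result: DeptB is counted once per
# employee, and DeptA is missing because it has no employees.
joined_total = scalar(
    "SELECT SUM(DeptBudget) FROM Department, Employee WHERE DeptNo = EmpDeptNo"
)

# Budget total for departments that have employees, counting each once.
staffed_total = scalar(
    "SELECT SUM(DeptBudget) FROM Department "
    "WHERE EXISTS (SELECT 1 FROM Employee WHERE EmpDeptNo = DeptNo)"
)

print(true_total, joined_total, staffed_total)
```

The joined total matches neither of the other two figures, which is how a summary over a join result can be silently wrong.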
Closely related to the replicated data problem is the lack of data modeling
and data structure processing, which the replicated data problem itself
demonstrates. Data structure processing would not introduce
replicated data values unless they are necessary to reflect the proper data structure
(as will be demonstrated in Chapter 12). But as we saw earlier, there is no way
in the inner join syntax to specify or represent the data
structure. When joining the Department table with the Employee table, there
are two data structures possible: Department over Employee or Employee
over Department. Each has its own distinct semantics, but neither can
be represented in the inner join result of these two tables, as demonstrated in
Figure 1.1.

1.3 Outer Join Review


Lost data? Outer join to the rescue! The outer join operation preserves data
from unmatched rows. This is done by replacing missing data with null values
in the result table. When joining tables, they are joined two at a time. This
means there are three choices for how to preserve data as the tables are joined:
preserving data for the left table, preserving data for the right table, or preserv-
ing data for both tables. Correspondingly, these are known as LEFT joins,
RIGHT joins, and FULL joins. LEFT and RIGHT joins are also known collec-
tively as one-sided joins because they preserve data on only one side.
As the tables are joined two at a time, the data-preserving effect of the
outer join in the working set continues to influence the result as it progresses.
This is because once a data value is preserved or not preserved (replaced as a
null) and placed into the working set, this value is then accessed there when it is
referenced. The major significance of this operational characteristic is that the
order that the tables are joined can affect the result of the join operation.
The outer join operation can be simulated using additional SELECT
statements with UNION operations to regenerate the missing data and in-
troduce it back into the result table. This is very inefficient, as is evident in
Figure 1.2. While this example looks complex, it is simulating only a single
one-sided outer join. A FULL join would involve twice the work, as in
Figure 1.3. And when more than two tables are involved, the effort needed to
recalculate the data to be added back into the result grows geometrically with
each additional table, since all the previous operations need to be repeated
for each outer joined table.
Outer joins can also be more difficult to optimize by the SQL system than
inner joins. This is because with inner joins, the SQL system can freely change
the table join order to reduce the number of table accesses by using the less pop-
ulated tables to drive the first join operations. With outer joins, this is not as
easy since changing the join order can affect the results. Fortunately, there are
some interesting and powerful new optimizations that can be applied to outer
joins. These are discussed in detail in Chapter 11.

SELECT DeptNo, DeptBudget, EmpNo, EmpSalary
FROM Department, Employee WHERE DeptNo=EmpDeptNo
UNION /* add back data for unmatched departments */
SELECT DeptNo, DeptBudget, NULL, NULL
FROM Department WHERE NOT EXISTS
(SELECT * FROM Employee WHERE DeptNo=EmpDeptNo)

Figure 1.2 Simulated one-sided outer join operation.

SELECT DeptNo, DeptBudget, EmpNo, EmpSalary
FROM Department, Employee WHERE DeptNo=EmpDeptNo
UNION /* add back data for unmatched departments */
SELECT DeptNo, DeptBudget, NULL, NULL
FROM Department WHERE NOT EXISTS
(SELECT * FROM Employee WHERE DeptNo=EmpDeptNo)
UNION /* add back data for unmatched employees */
SELECT NULL, NULL, EmpNo, EmpSalary
FROM Employee WHERE NOT EXISTS
(SELECT * FROM Department WHERE DeptNo=EmpDeptNo)

Figure 1.3 Simulated FULL outer join operation.
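As a rough check of the simulation in Figure 1.2, the sketch below runs it next to a standard LEFT JOIN on the Figure 1.1 data and compares the two results. Python's sqlite3 module serves only as a harness. The results are compared as sets because UNION removes duplicate rows, which a true outer join would keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
""")

# The simulated one-sided outer join of Figure 1.2.
simulated = set(conn.execute("""
    SELECT DeptNo, DeptBudget, EmpNo, EmpSalary
    FROM Department, Employee WHERE DeptNo = EmpDeptNo
    UNION /* add back data for unmatched departments */
    SELECT DeptNo, DeptBudget, NULL, NULL
    FROM Department WHERE NOT EXISTS
        (SELECT * FROM Employee WHERE DeptNo = EmpDeptNo)
"""))

# The same request written as a standard LEFT JOIN.
native = set(conn.execute("""
    SELECT DeptNo, DeptBudget, EmpNo, EmpSalary
    FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo
"""))

print(simulated == native)
for row in sorted(native, key=repr):
    print(row)  # SQL NULL is shown as Python None
```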

1.4 Problems with Previous Outer Join Syntax


Earlier implementations of the outer join operation before the SQL-92 stan-
dard were not standardized. Unfortunately, many of these implementations
have remained in use even today. A common implementation used by these
early outer join operations was to place a special symbol like an asterisk or plus
sign by the table name reference in the FROM clause or column name in the
WHERE clause. This special symbol would indicate that the associated table
(or the other table in some implementations) was to be augmented with an all-
null value row that would match a join criterion if all other rows in the table
didn’t match the row in the other table. This means that the unmatched row in
the other table is preserved (which may seem confusing). The example in
Figure 1.4 demonstrates a case where the Department table is preserved and the
Employee table is not.
The example in Figure 1.4 demonstrates a one-sided join. The Department
table, represented in the WHERE clause by the DeptNo column, preserves its
data because the matching EmpDeptNo join column is flagged with an asterisk.
This, as described above, causes the Employee table to be augmented with an
all-null value row that will match any otherwise unmatched row in DeptNo.
FULL outer joins can also be specified by giving each join comparison column
its own asterisk, as in EmpDeptNo*=*DeptNo, which is demonstrated in
Figure 1.5.
Notice that the result tables in Figures 1.4 and 1.5 have department
A’s data preserved even though there were no matching employees for it, and in
Figure 1.5 employee Y was also preserved even though there was no matching
department for it. This is the reason for the null values representing the
missing employee and department data in the join results. While these SQL
examples operate fine, there is a problem when more than two tables are being
joined. The problem, as mentioned earlier, is that the join table order can affect
the result when outer joins are involved, and these early outer join operations
do not have a method of specifying or controlling the join order. This makes
the result unpredictable when more than two tables are being joined. For
example, the join statement in Figure 1.6 is ambiguous.

SELECT DeptNo, DeptBudget, EmpNo, EmpSalary, EmpDeptNo
FROM Department, Employee WHERE EmpDeptNo*=DeptNo

Department   +   Employee        =   Join
Table:           Table:              Result:

DeptA 123        EmpX 10 DeptB       DeptB 456 EmpX 10 DeptB
DeptB 456        EmpY 20 DeptC       DeptB 456 EmpZ 30 DeptB
                 EmpZ 30 DeptB       DeptA 123 Null Null Null

Figure 1.4 Early nonstandard one-sided outer join implementation example.

SELECT DeptNo, DeptBudget, EmpNo, EmpSalary, EmpDeptNo
FROM Department, Employee WHERE EmpDeptNo*=*DeptNo

Department   +   Employee        =   Join
Table:           Table:              Result:

DeptA 123        EmpX 10 DeptB       DeptB 456 EmpX 10 DeptB
DeptB 456        EmpY 20 DeptC       DeptB 456 EmpZ 30 DeptB
                 EmpZ 30 DeptB       DeptA 123 Null Null Null
                                     Null  Null EmpY 20 DeptC

Figure 1.5 Early nonstandard FULL outer join implementation example.

SELECT * FROM Department, Employee, Dependent
WHERE EmpDeptNo*=DeptNo AND EmpDeptNo=DpndDeptNo

Figure 1.6 Ambiguous early nonstandard outer join implementation example.

How is the SELECT statement in Figure 1.6 processed? Is the Depart-
ment table outer joined with the Employee table first, or is the Employee table
inner joined with the Dependent table first? The inner join is very destructive
—if performed after the outer join, it can negate the data-preserving effect of
the outer join. So, the join order can be very significant to the result, and there
is no provision in this early nonstandard SQL syntax to control the join order.
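The significance of the join order can be sketched by writing both interpretations explicitly in standard join syntax. This is an illustration under stated assumptions: Python's sqlite3 module is used as a harness, the Dependent table with its DpndNo and DpndEmpNo columns and its single row is hypothetical sample data, and the dependent is matched to the employee on EmpNo=DpndEmpNo rather than on the condition shown in Figure 1.6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT PRIMARY KEY, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
    INSERT INTO Dependent VALUES ('DpndP', 'EmpX');
""")

# Interpretation 1: outer join first, inner join second. The inner join
# discards the rows that the outer join had preserved.
outer_then_inner = set(conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    INNER JOIN Dependent ON EmpNo = DpndEmpNo
"""))

# Interpretation 2: inner join first, outer join second. Department rows
# are still preserved in the final result.
inner_then_outer = set(conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN (Employee INNER JOIN Dependent ON EmpNo = DpndEmpNo)
        ON DeptNo = EmpDeptNo
"""))

print(sorted(outer_then_inner, key=repr))
print(sorted(inner_then_outer, key=repr))
```

With this sample data only the second interpretation keeps department A, so a syntax that cannot say which order is meant cannot have a single well-defined result.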

1.5 Conclusion
Inner joins lose data when there is no matching data. Outer joins preserve
unmatched data by padding the missing data columns with null values in the
result. Their operation may be more costly than that of the inner join because of
their more complex requirements. The first outer joins were not standardized, and
operated ambiguously when three or more tables were joined. The standard SQL
outer join has an unambiguous syntax, as will be shown in
the next chapter.
2
The Standard SQL Join Operation
The SQL-92 version of the SQL standard officially introduced an
outer join operation. Much study went into the design of this outer join
operation to correct the problems that had been identified in previous
nonstandardized versions, which were covered in Chapter 1. The inner join is still the
standard and default join operation. The syntax of the outer join has been
seamlessly grafted onto the FROM clause, leaving the inner join operation
backward compatible with existing SQL code.

2.1 Standard SQL Join Syntax


The standard SQL outer join syntactical definition is shown in Figure 2.1. This
definition is a simplified form of the FROM clause syntax that conveys the
main features, format, and capabilities involving the outer join operation. The
standard SQL join syntax fully supplies and exceeds the capabilities necessary to
support the outer join capability. Most importantly, it supplies table join order
control and join criteria for each table joined.
The outer join syntax in Figure 2.1 is fairly complex for standard SQL
code. Needless to say, it can be very difficult to use. The syntax definition is
recursive, revolving around the Joined-Table specification. This syntax allows
for the specification of multiple tables or their working sets to be outer joined
two at a time in a controlled order. The syntax design also influences the opera-
tion of the outer join by introducing what this book refers to as “nesting” to
introduce additional tables and add control for table join order. This nesting
can take place as left- and right-sided nesting of standard SQL join operations

11
12 Advanced SQL Dynamic Data Modeling and Hierarchical Processing

SELECT ---
FROM Table-Reference[,Table-Reference]…
WHERE ---

Table-Reference is: Table-Name | View-Name | Join1 | Join2 | Join3

Join1 is: [(]Table-Reference Join-Type1 JOIN Table-Reference


Join-Specification[)]

Join2 is: [(]Table-Reference NATURAL Join-Type1 JOIN


Table-Reference [)]

Join3: [(]Table-Reference Join-Type2 Table-Reference[)]

Join-Type1 is: LEFT [OUTER] | RIGHT [OUTER] |


FULL [OUTER] | INNER

Join-Type2 is: CROSS JOIN | UNION JOIN

Join-Specification is: ON Join-Condition | USING (Column-Name List )

Figure 2.1 Simplified standard SQL outer join syntax definition.

such as the LEFT, RIGHT, FULL, and INNER joins. Left-sided nesting
occurs on the left side of outer join operations, and right-sided nesting occurs
on the right side of outer join operations where tables are brought in by the
recursive syntax. This is reflected in the outer join definition in Figure 2.1. For
completeness sake, the syntactical notations used in this outer join definition
are specified in Figure 2.2.
To simplify the standard SQL outer join definition in Figure 2.1, three
versions of the joined table construct were specified. The first is the most
standard and common syntax. The second version adds the NATURAL keyword,
which eliminates the join specification. The third version covers join types
such as the CROSS join, which also do not use a join specification. The join
specification with its ON or USING clause also controls nesting, which controls
table join order. Since the CROSS join and natural joins using the NATURAL
option do not use an ON or a USING clause to control nesting, parentheses
can be used to control nesting and therefore table join order. Normally the join
table order cannot be changed by the use of parentheses because the join order
is determined by the ON and USING clauses. This is discussed further in
Section 2.2.
The FROM clause of the outer join definition, FROM Table-Refer-
ence[,Table-Reference]…, shown in Figure 2.1 allows multiple table references
to be specified. At this top level, multiple table references are relationally joined
using standard inner join logic, making this definition compatible with the
standard inner join.

“---” represents optional or omitted SQL that is not relevant to the outer join
“…” indicates preceding items may be repeated one or more times
“[ ]” indicates enclosed elements are optional
“|” indicates choice of one item of many
Upper-case indicates words that should be entered as is
Mixed-case represents words to be replaced with an appropriate value

Figure 2.2 Outer join syntactical notations used in Figure 2.1.

The standard SQL outer join operation comes into play when a table ref-
erence contains a joined table specification. Coding more than one table refer-
ence at this top level when outer join operations are performed at the lower
level is not desirable. This is because the data-losing properties of the inner join
operation occurring at the top level would negate the data-preserving effects of
the outer join at the lower level. For this reason, this particular syntax use will
not be explored further in this book.
The order the tables are joined using the new outer join syntax is usually
controlled by the nesting (recursive) syntax, which is not always straightfor-
ward. This is because it follows the order of join processing that is not always
apparent with right-sided nesting (nesting occurring with the right table argu-
ment). Left-sided nesting is naturally processed left to right, but right-sided
nesting in combination with left-to-right processing is not a straightforward
process. It requires a stacking procedure to internally assist execution. The rea-
son for this will become clear in the next section.
The join specification in Figure 2.1 can consist of an ON clause with a
join condition, or a USING clause specifying one or more column names to
be used for joining. Each column name that is specified with a USING clause
must exist in both table inputs, and is used internally to form an equal join
(equijoin). The ON and USING clauses specify the join criteria for their
associated join operations. The USING clause turns the join operation into a
natural join over the listed columns, much as if the NATURAL option were
specified. The NATURAL option
and USING clause will be described further in Chapter 4.
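A small sketch of the ON, USING, and NATURAL forms side by side is given below. Because USING requires the join column to have the same name in both inputs, the sketch assumes two hypothetical tables, Dept and Emp, that share a DeptNo column; they are not the book's Department and Employee tables. Python's sqlite3 module is used only as a harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables that share the column name DeptNo.
    CREATE TABLE Dept (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Emp (EmpNo TEXT PRIMARY KEY, DeptNo TEXT);
    INSERT INTO Dept VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Emp VALUES ('EmpX', 'DeptB'), ('EmpY', 'DeptC'), ('EmpZ', 'DeptB');
""")

# ON form: the equijoin condition is spelled out with qualified names.
on_rows = set(conn.execute("""
    SELECT Dept.DeptNo, DeptBudget, EmpNo
    FROM Dept LEFT JOIN Emp ON Dept.DeptNo = Emp.DeptNo
"""))

# USING form: the equijoin is formed from the named common column,
# which then appears only once in the SELECT * result.
using_cursor = conn.execute("SELECT * FROM Dept LEFT JOIN Emp USING (DeptNo)")
using_columns = [column[0] for column in using_cursor.description]
using_rows = set(using_cursor.fetchall())

# NATURAL form: the common column is found automatically by name.
natural_rows = set(conn.execute("SELECT * FROM Dept NATURAL LEFT JOIN Emp"))

print(using_columns)
print(on_rows == using_rows == natural_rows)
```

In this sketch DeptNo is the only column name the two tables share, which is why the NATURAL form and the USING form pick the same join column.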
Because tables and working sets are joined two at a time in a specific
order, a single WHERE clause specifying the join criteria that is logically
applied after all tables are joined (see Chapter 1) does not work well with outer
joins whose tables need to be joined in a specific order. What is needed, and
supplied by the standard SQL outer join, is a clause like the ON or USING
clause that specifies the join criteria at each join point. This also has the effect of
separating join criteria specified on these clauses from data-filtering
criteria specified on the WHERE clause. The column names that are referenced
on an ON or USING clause must be found in the tables or working sets
processed by their associated join operation. This is known as the columns
being in the “scope of control.”
Data-filtering criteria can also be specified on the ON clause. This
achieves a finer level of filtering control than is possible with the WHERE clause,
because the filtering affects only part of each resulting row rather than
removing the whole row. This is covered
further in Chapter 7.
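The difference between filtering on the ON clause and filtering on the WHERE clause can be sketched with the Figure 1.1 data. The salary threshold of 15 is an arbitrary value chosen for illustration, and Python's sqlite3 module is used only as a harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
""")

# Filter on the ON clause: only the employee side of a row is affected,
# so every department is still preserved by the LEFT join.
on_filtered = set(conn.execute("""
    SELECT DeptNo, EmpNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo AND EmpSalary > 15
"""))

# Filter on the WHERE clause: whole result rows are removed, including
# the preserved row for the department without employees.
where_filtered = set(conn.execute("""
    SELECT DeptNo, EmpNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    WHERE EmpSalary > 15
"""))

print(sorted(on_filtered, key=repr))
print(sorted(where_filtered, key=repr))
```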
If no join type is specified with a join operation, an inner join is assumed.
The OUTER keyword is an optional informational keyword. The examples in
this document will exclude the OUTER keyword in order to save space in the
SQL examples. The JOIN keyword, while defined as required in the standard
SQL specification, and therefore the join syntax definition in Figure 2.1, is not
necessary in the join syntax to enable it to be processed correctly. For this rea-
son, many SQL implementations treat its use as optional. Taking advantage of
this fact, some of the examples in this book may also exclude the JOIN key-
word when example space is scarce.

2.2 Standard SQL Join Operation


The outer join specification in Figure 2.3 joins the Department table
with the Employee table while preserving data in the Department table.
The working set produced from this operation is then LEFT joined with the
Dependent table, preserving data in the working set. As you can see, this
produces very powerful and controlled semantics. This LEFT outer join
specification is an example of left-sided nesting that introduces tables left to right
very naturally. Note that the first ON clause is not capable of accessing columns
from the Dependent table, since that table has not been accessed yet and is
therefore not in its scope of control. The second ON clause could access columns
from the Department table because it has been accessed in the generation of the
working set used as the left input of its associated LEFT join operation, and is
therefore in its scope of control.

SELECT * FROM Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo
LEFT JOIN Dependent ON EmpNo=DpndEmpNo

Figure 2.3 Example of LEFT outer join with left-sided nesting.
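The left-sided nesting of Figure 2.3 can be run against the Figure 1.1 data as a rough illustration. The Dependent table's DpndNo column and its single row are hypothetical sample data, and Python's sqlite3 module is used only as a harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT PRIMARY KEY, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
    INSERT INTO Dependent VALUES ('DpndP', 'EmpX');
""")

# First working set: Department LEFT joined with Employee.
first_working_set = set(conn.execute("""
    SELECT DeptNo, EmpNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
"""))

# Full statement of Figure 2.3: the first working set is then LEFT joined
# with Dependent, so rows preserved by the first join stay preserved.
left_nested = set(conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
"""))

print(sorted(first_working_set, key=repr))
print(sorted(left_nested, key=repr))
```

Every row of the first working set reappears in the final result, padded with a null where no dependent exists.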
The outer join specification shown in Figure 2.4 is an example of
right-sided nesting. Parentheses are used in this example to emphasize join exe-
cution order, but have no effect because join order is controlled by the place-
ment of ON clauses when they are present. Notice that the ON clause for the
first LEFT join is actually delayed until after the second LEFT join is com-
pletely specified. This causes the latter join to be performed first, returning the
result to the previous LEFT join as its right-sided input. This nesting can be car-
ried to any depth. Note also that the first specified ON clause associated with
the second LEFT join operation cannot reference columns in the Department
table, since it has not been previously joined with either table input associated
with the second join operation and is therefore not in its scope of control. This
is because right-sided nesting outer joins like this one generate multiple work-
ing sets concurrently, each with a different scope of control associated with it.
This is described further in Chapter 7.

SELECT * FROM Department LEFT JOIN
(Employee LEFT JOIN Dependent
ON EmpNo=DpndEmpNo)
ON DeptNo=EmpDeptNo

Figure 2.4 Example of LEFT outer join with right-sided nesting.

Employee view:

CREATE VIEW EmpView AS
SELECT * FROM Employee LEFT JOIN Dependent
ON EmpNo=DpndEmpNo

Embedded Employee view:

SELECT * FROM Department LEFT JOIN EmpView
ON DeptNo=EmpDeptNo

Expanded view: RIGHT-SIDED NESTING

SELECT * FROM Department LEFT JOIN
Employee LEFT JOIN Dependent
ON EmpNo=DpndEmpNo
ON DeptNo=EmpDeptNo

Figure 2.5 Embedded views cause right-sided nesting when expanded.


One question you might be asking yourself is why anyone would
construct such a nonintuitive and complex SQL statement as that specified in
Figure 2.4 when it is fairly easy to avoid right-sided nesting by using left-sided
nesting as in Figure 2.3. The answer is that sometimes this added flexibility is
necessary to achieve the desired result. Right-sided nesting is also necessary to
support embedded SQL views when they are expanded. For example, if the
parenthesized join in Figure 2.4 were replaced with a view reference
representing it, then the expanded statement would cause right-sided nesting.
Expanding the view introduces right-sided nesting, and the outer join’s syntax
supports this for a seamless operation. This is demonstrated in Figure 2.5.
This capability and the additional features enabled by it are described further in
Chapter 7.
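The relationship between the embedded view of Figure 2.5 and the right-sided nesting of Figure 2.4 can be checked with a short sketch. The Dependent table's DpndNo column and its single row are hypothetical sample data, Python's sqlite3 module is used only as a harness, and the parenthesized form of Figure 2.4 is used for the expanded statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT PRIMARY KEY, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
    INSERT INTO Dependent VALUES ('DpndP', 'EmpX');

    -- The Employee view of Figure 2.5.
    CREATE VIEW EmpView AS
        SELECT * FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
""")

# Embedded view reference, as in the second statement of Figure 2.5.
via_view = set(conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department LEFT JOIN EmpView ON DeptNo = EmpDeptNo
"""))

# The same request written with explicit right-sided nesting (Figure 2.4).
right_nested = set(conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department LEFT JOIN
        (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo)
        ON DeptNo = EmpDeptNo
"""))

print(via_view == right_nested)
for row in sorted(right_nested, key=repr):
    print(row)
```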
As mentioned earlier in this chapter, joins with ON and USING clauses
can’t have their join order changed by the use of parentheses. Their join order is
solely determined by the placement of ON or USING join criteria clauses. As
proof of this, Figure 2.6 attempts to change the join order using parentheses to
override the join order so that the Department and Employee tables are joined
first. But this causes a syntax error since the ON clause for this join operation
can’t be isolated inside the range of these parentheses.
This does not mean that parentheses can never be used with outer joins to
control the join order. Parentheses can control the join order with join types
like the CROSS join and outer joins that specify the NATURAL option,
because they have no join criteria clause to get in the way. This means that
parentheses are necessary to change the join order associated with the CROSS
and natural joins to cause a change in the result. Take, for example, the SQL
statement in Figure 2.7. Without the parentheses, this join statement first
CROSS joins Table1 and Table2 and then joins the working set with Table3
using a LEFT join. This join order is changed by using parentheses, as is also
shown in Figure 2.7. Using the parentheses shown, the LEFT join is performed
first, left joining Table2 to Table3 before the CROSS join is performed. The
CROSS join then uses the working set generated from the LEFT join as its
right argument. This can produce a different result than without
parentheses because of the mixture of different join types.

SELECT * FROM (Department LEFT JOIN Employee)
LEFT JOIN Dependent ON EmpNo=DpndEmpNo
ON DeptNo=EmpDeptNo

Figure 2.6 Invalid attempt to use parentheses to control join order.

FROM Table1 CROSS JOIN (Table2 LEFT JOIN Table3 ON Cond)

Figure 2.7 Valid use of parentheses to change default join order.

2.3 Standard SQL Join Does Not Follow the Cartesian Product Model

It is interesting to note that the standard SQL outer join syntax does not follow
the Cartesian product model for performing joins as documented in Chapter 1.
This is particularly important for SQL vendors to realize because it frees up
many SQL syntax restrictions, allowing more optimizations (see Chapter 11)
and the elimination of much unnecessary replicated data (also discussed in
Chapter 11).
The Cartesian product model is used as the processing model for per-
forming joins. Basically, it produces the Cartesian product of all the tables
being joined and then applies the WHERE restriction clause. The outer join
operation has introduced the notion of an “extended” Cartesian product to
account for the rows that are only partially filled because of the outer join data
preserving. These partially filled rows do not appear in a strict Cartesian prod-
uct. The extended Cartesian product operates by augmenting the tables taking
part in the outer join operation with a null row that will match the missing
table row when it has no match. This extended result is shown in Figure 2.8.
While the extended Cartesian product with its null augmented tables
does allow for the partially filled rows produced by the outer join operation, it
still cannot consistently produce the outer join result by applying the selection
criteria after the extended Cartesian product of all the involved tables is formed.
This is demonstrated in Figure 2.9, which relies on multiple ON clauses that
operate at different times during the join operation to produce a result not
derivable from the extended Cartesian product of all the involved tables. The
first SQL statement in Figure 2.9 uses two filtering qualifications, Salary>50
and Salary>100, at different times during the join process. This effect cannot
be duplicated with a single selection clause that is applied logically after all the
extended join operations have been performed, as in the standard Cartesian
product model. This means that the ON join clause must be logically applied at
each join point. This additional flexibility in join processing is an extreme
departure from standard relational processing, and opens the door to many
far-reaching new possibilities.

Table  ×  Table  =  Cartesian        SELECT * FROM X
X         Y         Product          LEFT JOIN Y ON X=Y

ABC 5     XYZ 6     ABC 5 XYZ 6      ABC 5 Null Null
DEF 6     UVW 7     ABC 5 UVW 7      DEF 6 XYZ 6
                    DEF 6 XYZ 6
                    DEF 6 UVW 7

Figure 2.8 Outer join result does not produce a strict Cartesian product subset.

SELECT * FROM Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo AND Salary>50
LEFT JOIN Dependent ON EmpNo=DpndEmpNo AND Salary>100

Above query is not logically the same as the following pseudo query:

SELECT * FROM Department
LEFT JOIN Employee LEFT JOIN Dependent
WHERE DeptNo=EmpDeptNo AND Salary>50
AND EmpNo=DpndEmpNo AND Salary>100

Figure 2.9 Use of ON clause that is not possible in Cartesian product model.
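The point made with Figure 2.9 can be sketched with a runnable approximation. Several assumptions apply: the salary column is named EmpSalary here, the sample rows are hypothetical and chosen so that each qualification matters, and the second query keeps its join conditions on ON clauses while moving only the two salary qualifications into a single WHERE clause applied after all joins. Python's sqlite3 module is used only as a harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT PRIMARY KEY, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptB');
    INSERT INTO Employee VALUES
        ('EmpW', 40, 'DeptA'), ('EmpX', 75, 'DeptB'), ('EmpZ', 150, 'DeptB');
    INSERT INTO Dependent VALUES ('DpndP', 'EmpX'), ('DpndQ', 'EmpZ');
""")

# Each qualification is applied at its own join point: the first decides
# which employees are attached, the second which dependents are attached.
on_at_each_join = set(conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo AND EmpSalary > 50
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo AND EmpSalary > 100
"""))

# Both qualifications applied once, after all joins have been performed.
single_where = set(conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    WHERE EmpSalary > 50 AND EmpSalary > 100
"""))

print(sorted(on_at_each_join, key=repr))
print(sorted(single_where, key=repr))
```

With this data the first query keeps both departments and shows employee X without a dependent, while the second returns only the single row that satisfies both qualifications.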

2.4 Determining Standard SQL Join Associativity and Commutativity

The associativity and commutativity properties are difficult to apply to stan-
dard SQL outer join operations because the outer join statement is not always a
binary (dyadic) operation. These terms were meant to apply to binary opera-
tions such as addition, subtraction, multiplication, and division. The outer join
operation is not always a binary operation since in addition to accepting a left
and right table input, it can require a third argument: the join criteria via the
ON or USING clause. This presents a problem for defining associativity and
commutativity for the outer join and reduces the ability to freely combine and
utilize these properties. Normally, a statement that has both associative and
commutative properties can be freely reordered in any fashion. The ON and
USING clauses of the outer join will usually prevent this flexibility, as will be
shown below. To establish associativity and commutativity, or the lack of
them, examples will be used in the following two chapters to disprove these
properties, since disproving these properties is easier than proving them.

2.5 What Outer Join Commutativity Is


The commutative property refers to the ability to
reverse the left and right table join arguments of a join operation without
affecting the result. This is the only change allowed in this definition; the
matching outer join ON clause must remain unmodified. In this case, the
INNER, CROSS, UNION, and FULL joins are commutative in operation.
Reversing their table input arguments will not change the data result. As can be
expected, the one-sided (LEFT and RIGHT) joins are not commutative since
reversing their table arguments logically changes a LEFT join into a RIGHT
join and vice versa, making their semantics and results very different.
The lack of commutativity shown by the one-sided join can appear to
change to commutative when two or more one-sided joins are involved. This
can be seen in Figure 2.10, which reverses the table arguments in the second
join operation in the SQL examples without changing the result. This example
does not disprove the one-sided commutativity principle just defined. This is
because the outer join’s ON clauses in Figure 2.10 were also flipped around,
thereby changing the semantics of the outer join operation, which in this case
compensated for the tables being reversed.
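The commutativity behavior described in this section can be sketched as follows. The Department and Employee data comes from Figure 1.1; the tables TA, TB, and TC with their ka, kb, and kc columns are hypothetical stand-ins for tables A, B, and C of Figure 2.10. Python's sqlite3 module is used only as a harness, and each query selects the same columns in the same order so that only the rows are compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT PRIMARY KEY, DeptBudget INTEGER);
    CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, EmpSalary INTEGER, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpZ', 30, 'DeptB');
    -- Hypothetical stand-ins for tables A, B, and C of Figure 2.10.
    CREATE TABLE TA (ka INTEGER);
    CREATE TABLE TB (kb INTEGER);
    CREATE TABLE TC (kc INTEGER);
    INSERT INTO TA VALUES (1), (2), (3);
    INSERT INTO TB VALUES (1), (2);
    INSERT INTO TC VALUES (2), (3);
""")

def rows(sql):
    """Return the result of a query as a set of row tuples."""
    return set(conn.execute(sql))

# INNER join: reversing the table arguments leaves the result unchanged.
inner_de = rows("SELECT DeptNo, EmpNo FROM Department INNER JOIN Employee ON DeptNo = EmpDeptNo")
inner_ed = rows("SELECT DeptNo, EmpNo FROM Employee INNER JOIN Department ON DeptNo = EmpDeptNo")

# LEFT join: reversing the table arguments changes which side is preserved.
left_de = rows("SELECT DeptNo, EmpNo FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo")
left_ed = rows("SELECT DeptNo, EmpNo FROM Employee LEFT JOIN Department ON DeptNo = EmpDeptNo")

# Figure 2.10: B and C are swapped together with their ON clauses.
abc = rows("SELECT ka, kb, kc FROM TA LEFT JOIN TB ON ka = kb LEFT JOIN TC ON ka = kc")
acb = rows("SELECT ka, kb, kc FROM TA LEFT JOIN TC ON ka = kc LEFT JOIN TB ON ka = kb")

print(inner_de == inner_ed)  # INNER join arguments reversed
print(left_de == left_ed)    # LEFT join arguments reversed
print(abc == acb)            # Figure 2.10 reordering
```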

Both of the outer join queries below produce the same result.

FROM A LEFT JOIN B ON A=B LEFT JOIN C ON A=C

Reverse order of table arguments for second LEFT join:

FROM A LEFT JOIN C ON A=C LEFT JOIN B ON A=B

Figure 2.10 Multiple one-sided joins may appear commutative.

2.6 What Outer Join Associativity Is

The associative property is also hard to apply to the standard SQL outer join
since it deals with the ability to change the default table join processing
precedence without affecting the result. In a binary outer join operation, this can be
tested by using parentheses to change the join execution order. Outer joins that
require a join criteria clause have the characteristic that their join order
cannot be changed by using parentheses. To change the join order of these
joins, the outer join statement must be rewritten, because the position of the
ON or USING join clause can affect the join order via right-sided nesting, as
covered in Section 2.2 of this chapter. This means the definition of associativity
for the standard SQL outer join includes respecifying the outer join to effect a
change in the table join order precedence. This includes moving the ON clause
but not modifying it, which would change the semantics. Unfortunately,
these additional operations can reduce the significance of associativity
used with the standard SQL outer join.
Nonassociativity is proven if any outer join statement containing only one
join type can be regrouped into a valid SQL statement that changes the join
precedence and thereby changes the result. But rewriting a statement to change
its join precedence is not always possible because of join criteria conditions
and their scope of control, as shown in Figure 2.11. Not being able to change
the join order should not be a reason to consider joins with ON clauses
nonassociative. Also note that it would not be possible to test commutativity
on the valid SQL statement in Figure 2.11 by reversing the B and C table
arguments of the second LEFT join operation, because doing so would also cause
a scope of control error. This example and the others presented in this section
show that associativity and commutativity of the outer join are a complex
issue, which is why they are covered in detail in Chapters 3 and 4.

Valid SQL statement:

SELECT A,B,C
FROM A LEFT JOIN B ON A=B LEFT JOIN C ON A=C

Invalid scope of control error:

SELECT A,B,C
FROM A LEFT JOIN (B LEFT JOIN C ON A=C) ON A=B

Invalid scope of control error:

SELECT A,B,C
FROM A LEFT JOIN B ON A=C LEFT JOIN C ON A=B

Figure 2.11 It is not always possible to rewrite a query to change the join order.

2.7 Hierarchictivity in Addition to Associativity and Commutativity

As shown above, it is difficult to always apply the associative and commutative
properties to the standard SQL outer join operation’s syntax and semantics. In
future chapters, you will see that the outer join can be used to build
hierarchical data structures. When building these data structures, the outer
join follows hierarchical principles and properties. These hierarchical
properties can be used in addition to the associative and commutative
properties. This means that while hierarchical data structures do not
necessarily obey associative and commutative properties, they will obey
hierarchical properties. In this book, this property has been termed
“hierarchictivity” for lack of a better word.
The hierarchictivity property applies to a clearly defined class of outer
joins, those that model hierarchical data structures (discussed in Chapter 3),
which can be reordered without changing the result. The SQL example in
Figure 2.12 demonstrates this hierarchictivity property. This example falls
outside the range of associativity and commutativity, since it actually
reorders the join rather than just changing its join precedence, and it
reverses the table arguments of one-sided joins by moving the ON clause.
Normally, the ability to reorder the joins requires both associative and
commutative properties, and one-sided outer joins are not commutative, as
stated earlier. This example builds the same multileg hierarchical data
structure in both SQL statements by reversing the construction of its legs.
This does not change the semantics for hierarchical structures. This is one of
many hierarchical properties that will be covered in Chapter 5. This example
demonstrates that the hierarchictivity property can be useful in addition to
associativity and commutativity when using outer joins.

Sample class of join specification that can be reordered:

FROM A LEFT JOIN B ON A=B LEFT JOIN C ON A=C

Reorder of join operation retaining semantics:

FROM A LEFT JOIN C ON A=C LEFT JOIN B ON A=B

Structure built by both specifications:

    A
   / \
  B   C

Figure 2.12 Example of a hierarchical property.

2.8 Conclusion
The standard SQL outer join preserves data and corrects problems with earlier
nonstandard outer joins. The standard SQL join syntax also has a separate ON
or USING clause for each join type that requires one. These ON and USING
clauses specify the join condition, and each has its own scope of control.
The standard SQL join syntax supports both the inner join and many other
types of join operations (LEFT, RIGHT, FULL, CROSS, UNION), which
can be combined in any order. Unfortunately, parentheses are sometimes
necessary to control table join order, while at other times parentheses can’t
be used. When parentheses can’t be used, ON or USING clauses indirectly
control table join order.
A new operational property, hierarchictivity, was introduced to apply to a
class of outer joins that covers hierarchical structures. Other important topics
covered in this chapter were the standard SQL join’s right-sided nesting, its fine
level of data-filtering capability, and the fact that the standard SQL outer join
does not follow the Cartesian product model for generating its results. These
topics will be covered and expanded on later.
3
Standard SQL Join Types and Their Operation
There are two basic types of outer join operations: one-sided joins and FULL
joins. One-sided standard SQL joins are either RIGHT or LEFT joins, which
preserve data from unmatched rows on the side that their name signifies,
while a FULL join preserves data on both sides. The discussion of these joins
in this chapter does not include the influence of the NATURAL option,
which is discussed in Chapter 4. This option has a significant effect on the
outer join’s operation. In addition to one-sided and FULL outer joins,
standard SQL supports other join types, including the CROSS join,
UNION join, and INNER join. All of the join types mentioned here can be
intermixed in a single join statement.

3.1 FULL Outer Join


FULL outer joins preserve data on both sides of the join operation, and for this
reason are also known as symmetric outer joins. With both sides of the join
being preserved, no data is lost because of unmatched rows. This implies that
both tables carry equal weight. Because of this, FULL joins are usually used to
join two or more tables based on a common primary key in all tables. An
example is combining two customer information lists where many of the same
customers appear in each list and each list contains different information.
Since both tables are preserved in a FULL join, it is commutative in operation.
This means the placement of its two table operands does not affect the result,
as shown in Figure 3.1.
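The symmetric behavior described above can be checked on the sample rows of
Figure 3.1. What follows is a minimal sketch, assuming Python's built-in sqlite3
module. Because FULL JOIN is not available in every SQLite version, the sketch
swaps in a common emulation (a LEFT join plus the unmatched rows of the other
table) rather than the book's FULL JOIN syntax. The column names DeptData and
EmpData are hypothetical, since the figure does not name its middle columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT, DeptData INTEGER);
    CREATE TABLE Employee (EmpNo TEXT, EmpData INTEGER, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC');
""")

# Fixed column order so both operand orders can be compared directly.
COLUMNS = "DeptNo, DeptData, EmpNo, EmpData, EmpDeptNo"

def emulated_full_join(left, right, left_key):
    """Emulate `left FULL JOIN right ON DeptNo=EmpDeptNo` (an assumption,
    not the book's syntax): all rows of `left`, plus the rows of `right`
    that found no match in `left`."""
    sql = f"""
        SELECT {COLUMNS} FROM {left} LEFT JOIN {right} ON DeptNo = EmpDeptNo
        UNION ALL
        SELECT {COLUMNS} FROM {right} LEFT JOIN {left} ON DeptNo = EmpDeptNo
        WHERE {left_key} IS NULL
    """
    # A set is enough here because the sample data has no duplicate rows.
    return set(conn.execute(sql).fetchall())

dept_first = emulated_full_join("Department", "Employee", "DeptNo")
emp_first = emulated_full_join("Employee", "Department", "EmpNo")
print(dept_first == emp_first)
```

Both operand orders yield the same three rows, which matches the result table
of Figure 3.1 once the columns are listed in a fixed order.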
The standard SQL FULL outer join also operates associatively, as defined
in Chapter 2. Since the FULL outer join is associative and commutative, the
table join order, when more than two tables are being joined, can be changed
without affecting the result. There are two reasons for this. First, the FULL
join loses no data regardless of the table join order. Second, the standard SQL
FULL outer join has a separate join clause for each join, which controls and
limits the valid FULL join combinations. This was not true of the older,
nonstandardized outer joins, which were less associative in nature. The
examples in Figure 3.2 demonstrate FULL outer joins where the table join order
is changed without changing the result. Each table contains a row that will not
be matched. The first join example joins the Department table to the Employee
table first, while the second join example uses right-sided nesting (discussed
in Chapter 2) to join the Employee table to the Dependent table before joining
the Department table.
SELECT * FROM Department FULL JOIN Employee
ON DeptNo=EmpDeptNo

The above SQL statement produces the identical result as:

SELECT * FROM Employee FULL JOIN Department
ON DeptNo=EmpDeptNo

Both of the above queries produce:

Department  +  Employee       =  FULL Join
Table:         Table:            Results:
DeptA 123      EmpX 10 DeptB     DeptA 123   Null Null Null
DeptB 456      EmpY 20 DeptC     DeptB 456   EmpX 10   DeptB
                                 Null  Null  EmpY 20   DeptC

Figure 3.1 The FULL outer join demonstrating its commutative behavior.

Department   Employee         Dependent
Table:       Table:           Table:
DeptA 123    EmpX 10 DeptB    Dpnd1 16 EmpY
DeptB 456    EmpY 20 DeptC    Dpnd2 18 EmpZ
DeptD 789    EmpV 40 DeptD    Dpnd3 21 EmpV

SELECT * FROM Department
FULL JOIN Employee ON DeptNo=EmpDeptNo
FULL JOIN Dependent ON EmpNo=DpndEmpNo

Same FULL outer join as above with table join order changed (the
parenthesized join is processed first):

SELECT * FROM Department FULL JOIN
(Employee FULL JOIN Dependent ON EmpNo=DpndEmpNo)
ON DeptNo=EmpDeptNo

Same result produced from both queries above:

Department   Employee         Dependent
DeptA 123    Null Null Null   Null  Null Null
DeptB 456    EmpX 10   DeptB  Null  Null Null
DeptD 789    EmpV 40   DeptD  Dpnd3 21   EmpV
Null  Null   EmpY 20   DeptC  Dpnd1 16   EmpY
Null  Null   Null Null Null   Dpnd2 18   EmpZ

Figure 3.2 The FULL outer join demonstrating its associative behavior.

There is one situation where FULL outer joins may appear to be nonassociative,
but this situation does not fit the definition of associativity and
nonassociativity given in Chapter 2. Many SQL books use this situation to
prove that the outer join is nonassociative. It occurs when three or more
tables are joined across a common domain (key value), which allows more valid
join combinations. In the SELECT statements in Figure 3.2, there are only two
possible join combinations. If these tables were joined over one common domain,
there would be three possible combinations, because Department and Dependent
could also be joined directly. This is demonstrated in Figure 3.3, which joins
all three tables over DeptNo. The third join statement in Figure 3.3 may
produce different results than the two SQL statements above it, since it has a
different join condition than they do, namely DeptNo=DpndDeptNo. Even though
DeptNo=EmpDeptNo and EmpDeptNo=DpndDeptNo, which intuitively means
DeptNo=DpndDeptNo, this transitive logic does not hold for the standard SQL
join, whose multiple ON clauses are each processed separately.
The FULL outer join examples in Figure 3.3 do not lose any data. This means
all the results contain the same data, but the way their rows are combined may
differ, because the third example in Figure 3.3 references a different
combination of field locations, which can change the result in this situation.
This is not a case of simply rewriting the outer join statement. A different
join condition referring to a different table was used, which changes the
semantics and the results, as the results in Figure 3.3 show.
Department Table   Employee Table   Dependent Table
01 BOB             02 TOM           01 SAM

SELECT * FROM Department
FULL JOIN Employee ON DeptNo=EmpDeptNo
FULL JOIN Dependent ON EmpDeptNo=DpndDeptNo

Same FULL outer join as above with table order changed (the parenthesized
join is processed first):

SELECT * FROM Department FULL JOIN
(Employee FULL JOIN Dependent ON EmpDeptNo=DpndDeptNo)
ON DeptNo=EmpDeptNo

Both SQL queries above produce the following result:

01   BOB    Null Null   Null Null
Null Null   02   TOM    Null Null
Null Null   Null Null   01   SAM

Similar join, but not the same join condition as either of those above:

SELECT * FROM Department
FULL JOIN Dependent ON DeptNo=DpndDeptNo
FULL JOIN Employee ON EmpDeptNo=DeptNo

Above query with different join criteria produces a different result:

01   BOB    Null Null   01   SAM
Null Null   02   TOM    Null Null

Figure 3.3 Misleading attempt to prove FULL join is nonassociative.

With FULL joins involving more than two tables joined across a common domain,
you may notice, as in Figure 3.3, that the results may contain rows that could
have been combined more efficiently to reduce the number of rows generated.
For example, in the first set of results in Figure 3.3, the rows with null
values added by the join process could be compressed into two rows without
losing any data, as in the second set of results in Figure 3.3. The fact that
the second set of results was more compressed was determined by the data and
not by the SQL statements alone. In this same situation, it is always possible
to generate the most compressed result by using the NATURAL option of the FULL
outer join, which is described in Chapter 4.

3.2 One-Sided Outer Join


One-sided joins are either LEFT joins or RIGHT joins. They are called one-sided
because they preserve data on only one side, either the left side or the right
side as their name indicates. The LEFT and RIGHT joins are actually different
forms of the same operation, as shown in Figure 3.4. The LEFT join is the more
natural one to use because it preserves data on the left side and processing
occurs from left to right, using the more natural left-sided nesting. This
allows a top-down specification to define a top-down execution, giving an
intuitive definition and operation. The less intuitive RIGHT join may be useful
for complex outer joins, but can usually be avoided by using the LEFT outer
join.

SELECT * FROM Department LEFT JOIN Employee
ON DeptNo=EmpDeptNo

Produces the identical result as:

SELECT * FROM Employee RIGHT JOIN Department
ON DeptNo=EmpDeptNo

Both queries above produce:

Department  +  Employee       =  Join
Table:         Table:            Results:
DeptA 123      EmpX 10 DeptB     DeptA 123   Null Null Null
DeptB 456      EmpY 20 DeptC     DeptB 456   EmpX 10   DeptB

Figure 3.4 LEFT and RIGHT joins are different forms of the same basic operation.
Since one-sided outer joins only preserve data on one side, they are
noncommutative in operation. This means that the location of the two table
input arguments makes a difference in the results, as shown in Figure 3.5. You
can see that the results of the two LEFT joins have distinctly different
semantics.

SELECT * FROM Department LEFT JOIN Employee
ON DeptNo=EmpDeptNo

Result one:

Department  +  Employee       =  Join
Table:         Table:            Result:
DeptA 123      EmpX 10 DeptB     DeptA 123   Null Null Null
DeptB 456      EmpY 20 DeptC     DeptB 456   EmpX 10   DeptB

Above SQL statement produces a different result than:

SELECT * FROM Employee LEFT JOIN Department
ON DeptNo=EmpDeptNo

Result two:

Department  +  Employee       =  Join
Table:         Table:            Result:
DeptA 123      EmpX 10 DeptB     DeptB 456   EmpX 10   DeptB
DeptB 456      EmpY 20 DeptC     Null  Null  EmpY 20   DeptC

Figure 3.5 One-sided outer join is noncommutative.
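The noncommutative behavior of Figure 3.5 can be reproduced with a short
script. This is a minimal sketch, assuming Python's built-in sqlite3 module;
the column names DeptData and EmpData are hypothetical stand-ins for the
unnamed middle columns of the figure. An inner join over the same tables is
included for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT, DeptData INTEGER);
    CREATE TABLE Employee (EmpNo TEXT, EmpData INTEGER, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456);
    INSERT INTO Employee VALUES ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC');
""")

# Fixed column order so that only the table argument order varies.
COLUMNS = "DeptNo, DeptData, EmpNo, EmpData, EmpDeptNo"

def join_rows(left, join_type, right):
    """Run `left <join_type> JOIN right` with the ON clause left unmodified."""
    sql = (f"SELECT {COLUMNS} FROM {left} {join_type} JOIN {right} "
           "ON DeptNo = EmpDeptNo")
    return set(conn.execute(sql).fetchall())

dept_left_emp = join_rows("Department", "LEFT", "Employee")   # result one
emp_left_dept = join_rows("Employee", "LEFT", "Department")   # result two
dept_inner_emp = join_rows("Department", "INNER", "Employee")
emp_inner_dept = join_rows("Employee", "INNER", "Department")

print(dept_left_emp != emp_left_dept)    # LEFT join: operand order matters
print(dept_inner_emp == emp_inner_dept)  # INNER join: operand order does not
```

Reversing the operands of the LEFT join swaps which table's unmatched row
survives (DeptA in result one, EmpY in result two), while the inner join
returns the single matched row either way.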

Since one-sided outer joins only preserve data on one of the two sides, the
dominant side, their result is hierarchical in nature. For example, Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo produces a result where
Department table values can exist without a matching Employee table value,
but Employee table values can’t exist without a matching Department table
value. This means that Department is hierarchically over Employee. When
joining more than two tables, the effect can be extended as shown in Figure 3.6.
In this SQL example, Department table values can exist without a matching
Employee or Dependent table value. Employee table values can exist without a
matching Dependent, but require a matching Department, and so on. This
means that Department is hierarchically over Employee and
Employee is hierarchically over Dependent. One-sided joins can also model
nonhierarchical data structures, which will be covered in Chapter 6. Join table
order and its effect on one-sided outer join operations involving three or more
tables is a complex issue that will also be covered in further detail in
Chapter 6, which deals with data modeling with the outer join.
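The Department-over-Employee-over-Dependent behavior described above can be
checked against the sample rows of Figure 3.6. The following is a minimal
sketch, assuming Python's built-in sqlite3 module; the column names DeptData,
EmpData, DpndNo, and DpndData are hypothetical, since the figure leaves those
columns unnamed, and the ORDER BY is added only to make the output order
predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT, DeptData INTEGER);
    CREATE TABLE Employee (EmpNo TEXT, EmpData INTEGER, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndData INTEGER, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456), ('DeptD', 789);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpV', 40, 'DeptD');
    INSERT INTO Dependent VALUES
        ('Dpnd1', 16, 'EmpY'), ('Dpnd2', 18, 'EmpZ'), ('Dpnd3', 21, 'EmpV');
""")

# Top-down specification: Department over Employee over Dependent.
hierarchy_rows = conn.execute("""
    SELECT * FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY DeptNo
""").fetchall()

for row in hierarchy_rows:
    print(row)

# A lower level never appears without its parent level being present.
no_orphans = all(
    (row[2] is None or row[0] is not None)      # Employee needs Department
    and (row[5] is None or row[2] is not None)  # Dependent needs Employee
    for row in hierarchy_rows
)
```

Every Department row is preserved, EmpY and its dependent Dpnd1 are absent
because DeptC does not exist, and Dpnd2 is absent because EmpZ does not exist.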
Department   Employee         Dependent
Table:       Table:           Table:
DeptA 123    EmpX 10 DeptB    Dpnd1 16 EmpY
DeptB 456    EmpY 20 DeptC    Dpnd2 18 EmpZ
DeptD 789    EmpV 40 DeptD    Dpnd3 21 EmpV

SELECT * FROM Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo
LEFT JOIN Dependent ON EmpNo=DpndEmpNo

Structure modeled by the query:

Department
    |
Employee
    |
Dependent

Result produced from above query:

Department   Employee         Dependent
DeptA 123    Null Null Null   Null  Null Null
DeptB 456    EmpX 10   DeptB  Null  Null Null
DeptD 789    EmpV 40   DeptD  Dpnd3 21   EmpV

Figure 3.6 One-sided outer joins are hierarchical in nature.

Being hierarchical in nature, one-sided outer joins can build hierarchical
structures top-down, as shown in Figure 3.6, or, by changing the join order,
bottom-up, as shown in Figure 3.7. Because the one-sided outer join is
hierarchical in nature, reordering the join from top-down to bottom-up
execution does not change the result. If this is true, it shows that the
one-sided join is associative in operation, at least when defining hierarchical
structures. The following examples will demonstrate that this is so.

This query produces the same result as shown in Figure 3.6:

SELECT * FROM Department LEFT JOIN
Employee LEFT JOIN Dependent
ON EmpNo=DpndEmpNo
ON DeptNo=EmpDeptNo

Structure modeled by the query:

Department
    |
Employee
    |
Dependent

Figure 3.7 One-sided outer join can also build structures bottom-up.
The RIGHT outer join also builds hierarchical data structures, as shown in
Figure 3.8. The RIGHT outer join naturally builds the hierarchical data
structure bottom-up using left-sided nesting. As tables are added from the
right, they take the top position since they are being preserved.
The one-sided outer join examples above demonstrate building a one-leg
hierarchical data structure. The one-sided outer join can also build multileg
data structures, as the SQL examples in Figure 3.9 demonstrate. These examples
use the same data and data relationships as the previous one-sided outer join
examples, but produce different results. In these examples, the Employee table
is directly over the Department and Dependent tables. Note that the legs of the
structure can be added in any order. This characteristic of hierarchical
structures will be discussed further in Chapter 5.
This query produces the same result as shown in Figure 3.6:

SELECT * FROM Dependent
RIGHT JOIN Employee ON EmpNo=DpndEmpNo
RIGHT JOIN Department ON DeptNo=EmpDeptNo

Structure modeled by the query:

Department
    |
Employee
    |
Dependent

Figure 3.8 RIGHT outer join also builds hierarchical structures.

Department   Employee         Dependent
Table:       Table:           Table:
DeptA 123    EmpX 10 DeptB    Dpnd1 16 EmpY
DeptB 456    EmpY 20 DeptC    Dpnd2 18 EmpZ
DeptD 789    EmpV 40 DeptD    Dpnd3 21 EmpV

SELECT * FROM Employee
LEFT JOIN Department ON DeptNo=EmpDeptNo
LEFT JOIN Dependent ON EmpNo=DpndEmpNo

Reversing joining of the legs:

SELECT * FROM Employee
LEFT JOIN Dependent ON EmpNo=DpndEmpNo
LEFT JOIN Department ON DeptNo=EmpDeptNo

Structure modeled by both queries:

       Employee
       /      \
Department  Dependent

Both queries above produce the following result:

Employee         Department   Dependent
EmpX 10 DeptB    DeptB 456    Null  Null Null
EmpY 20 DeptC    Null  Null   Dpnd1 16   EmpY
EmpV 40 DeptD    DeptD 789    Dpnd3 21   EmpV

Figure 3.9 Multileg hierarchical data structure example.

Up until the multileg hierarchical example in Figure 3.9, the single-leg
hierarchical structures shown in Figures 3.6 to 3.8 behaved associatively as
defined in Chapter 2. The multileg structure in Figure 3.9 demonstrates that
multiple legs of a structure can be joined in any order without changing the
result, but the rules for associativity and/or commutativity, as specified in
Chapter 2, cannot be applied here to explain this behavior. This is because
one-sided joins are not commutative, yet in this example changing the tables
around in the join operations did not change the results. The principle of
hierarchictivity as coined and defined in Chapter 2 can be applied to multileg
hierarchical structures like this one as well as to the single-leg hierarchical
structures shown in Figures 3.6 to 3.8.
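The leg-reordering behavior of Figure 3.9 can be verified directly. This is a
minimal sketch, assuming Python's built-in sqlite3 module; the column names
DeptData, EmpData, DpndNo, and DpndData are hypothetical. The select list is
spelled out so that both queries return their columns in the same order, and
ORDER BY is added only to make the two row lists comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT, DeptData INTEGER);
    CREATE TABLE Employee (EmpNo TEXT, EmpData INTEGER, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndData INTEGER, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456), ('DeptD', 789);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpV', 40, 'DeptD');
    INSERT INTO Dependent VALUES
        ('Dpnd1', 16, 'EmpY'), ('Dpnd2', 18, 'EmpZ'), ('Dpnd3', 21, 'EmpV');
""")

COLUMNS = ("EmpNo, EmpData, EmpDeptNo, DeptNo, DeptData, "
           "DpndNo, DpndData, DpndEmpNo")

# Employee is the root; the Department leg is joined first.
dept_leg_first = conn.execute(f"""
    SELECT {COLUMNS} FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY EmpNo
""").fetchall()

# Same structure with the Dependent leg joined first.
dpnd_leg_first = conn.execute(f"""
    SELECT {COLUMNS} FROM Employee
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    ORDER BY EmpNo
""").fetchall()

print(dept_leg_first == dpnd_leg_first)
```

Each ON clause travels with its table when the legs are swapped, so the root
table Employee stays preserved in both specifications.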
The principles of hierarchictivity intuitively make sense, since one-sided
joins are hierarchical in nature and hierarchical structures can be built top-
down, bottom-up, left to right, right to left, or in any combination of these
methods. These one-sided outer join operations can build very complex and
powerful hierarchical data structures. Chapter 5 supplies a review of
hierarchical data structures, and Chapter 6 describes in detail how to model
these data structures using one-sided outer joins.
One-sided joins can also model complex structures that are not hierarchical
structures. When these structures are used in applications, it may be difficult
to predict their operation because they can lack unambiguous semantics. It is
useful to see how this nonhierarchical modeling can occur through one-sided
joins, since this awareness can prevent the accidental use of nonhierarchical
data structures. Figure 3.10 demonstrates a nonhierarchical structure being
modeled. As is shown, this structure can be modeled in more than one way.
While this structure resembles a network structure, it doesn’t actually
operate like one because the legs relate to each other hierarchically. In this
structure, the Department table is hierarchically above the Dependent table.
If an Employee row doesn’t have a link to a Department row, then the unmatched
Employee rows and their parent Dependent rows are excluded from the result.
Other nonhierarchical structures can be created from complex ON clauses
consisting of references to more than two tables. More information on these
nonhierarchical structures can be found in Chapter 6.

SELECT * FROM Employee
RIGHT JOIN Dependent ON EmpNo=DpndEmpNo
RIGHT JOIN Department ON DeptNo=EmpDeptNo

Or specified another way:

SELECT * FROM Department LEFT JOIN
Dependent LEFT JOIN Employee ON EmpNo=DpndEmpNo
ON DeptNo=EmpDeptNo

Structure modeled (diagram, top to bottom): Department, Dependent, Employee

Figure 3.10 Nonhierarchical one-sided join example.
Following the rules for assessing associativity specified in Chapter 2, the
one-sided outer join does not operate nonassociatively, which makes its
operation associative under our definition. This does not include intermixing
LEFT and RIGHT joins, which may perform nonassociatively. The nonhierarchical
structure modeled in Figure 3.10 will also produce a different result if the
order in which its legs are joined is reversed. In this structure, the order of
the legs has significance, but the table reordering required to reverse it is
outside the scope of associativity, which only includes regrouping.

3.3 INNER Join


The INNER join’s older SQL-89 format is still valid in the newer SQL-92
standard. The newer INNER join format can be specified explicitly, or by
default if no join type is specified, as shown in Figure 3.11. The INNER join
does not preserve data on either side of the join operation. This allows a
series of INNER joins to be ordered in any fashion without affecting the
result, which means the INNER join operation is both commutative and
associative.

Older, still valid INNER join format:

SELECT * FROM T1, T2, T3
WHERE T1=T2 AND T2=T3

The SQL-92 INNER join format is:

SELECT * FROM T1
[INNER] JOIN T2 ON T1=T2
[INNER] JOIN T3 ON T2=T3

Figure 3.11 Example of INNER join formats.
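The equivalence of the formats in Figure 3.11 can be tried out as follows. This
is a minimal sketch, assuming Python's built-in sqlite3 module. The figure's
shorthand T1=T2 is expanded into a comparison of a hypothetical key column k
that each table is assumed to have, and the sample values are invented for
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (k INTEGER);
    CREATE TABLE T2 (k INTEGER);
    CREATE TABLE T3 (k INTEGER);
    INSERT INTO T1 VALUES (1), (2), (3);
    INSERT INTO T2 VALUES (2), (3), (4);
    INSERT INTO T3 VALUES (3), (4), (5);
""")

# Older SQL-89 format: join conditions live in the WHERE clause.
sql89_rows = conn.execute(
    "SELECT * FROM T1, T2, T3 WHERE T1.k = T2.k AND T2.k = T3.k"
).fetchall()

# SQL-92 format with the INNER keyword written out.
explicit_rows = conn.execute(
    "SELECT * FROM T1 INNER JOIN T2 ON T1.k = T2.k "
    "INNER JOIN T3 ON T2.k = T3.k"
).fetchall()

# SQL-92 format relying on INNER being the default join type.
default_rows = conn.execute(
    "SELECT * FROM T1 JOIN T2 ON T1.k = T2.k JOIN T3 ON T2.k = T3.k"
).fetchall()

print(sql89_rows, explicit_rows, default_rows)
```

Only the key value present in all three tables survives, and no unmatched row
from any table is preserved.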

3.4 CROSS Join


The CROSS join is a basic operation. It is the same as an inner join with no
join criteria, so that all combinations of the rows of the input tables are
generated. This is the Cartesian product, which is not usually a very useful
end product. The CROSS join is commutative and associative in operation, so the
join order does not affect the result. The inner join can be used to simulate
the CROSS join, as shown in Figure 3.12, by specifying a join condition that is
always satisfied.
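The simulation of the CROSS join by an always-true inner join can be confirmed
with a small script. This is a minimal sketch, assuming Python's built-in
sqlite3 module; the column names a and b and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (a TEXT);
    CREATE TABLE Table2 (b INTEGER);
    INSERT INTO Table1 VALUES ('x'), ('y');
    INSERT INTO Table2 VALUES (1), (2), (3);
""")

# Cartesian product requested explicitly.
cross_rows = sorted(
    conn.execute("SELECT * FROM Table1 CROSS JOIN Table2").fetchall()
)

# Inner join whose join condition is always satisfied.
simulated_rows = sorted(
    conn.execute("SELECT * FROM Table1 INNER JOIN Table2 ON 1=1").fetchall()
)

print(len(cross_rows), cross_rows == simulated_rows)
```

With two rows in Table1 and three in Table2, both statements return all six
combinations.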

SELECT * FROM Table1 CROSS JOIN Table2

Simulated by:

SELECT * FROM Table1 INNER JOIN Table2 ON 1=1

Figure 3.12 Example of the CROSS join operation.

3.5 UNION Join

The UNION join, also known as the outer union, is a new UNION operation
that can be specified with the standard SQL join syntax. Like the CROSS join,
it does not have an accompanying ON or USING clause. This operation differs
from standard UNION operations in that the two tables being UNIONed can have
different column formats, so they cannot be placed directly under each other.
The UNION join is performed by padding the rows of one table on the right with
nulls that match the other table’s format, and padding the rows of the other
table on the left with nulls in the same way. Then the two tables can be
UNIONed one on top of the other, as shown in Figure 3.13. This outer UNION
effect can also be produced by a FULL join whose join criteria never match, as
also shown in Figure 3.13.
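The null-padding procedure described above can be written out by hand. This is
a minimal sketch, assuming Python's built-in sqlite3 module. Because UNION JOIN
is not widely implemented, and FULL JOIN is missing from older SQLite versions,
the sketch swaps in an ordinary UNION ALL with explicit NULL padding rather
than either statement of Figure 3.13. The column names x1, x2, y1, and y2 are
hypothetical; the rows are those of Figure 3.13.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableX (x1 TEXT, x2 INTEGER);
    CREATE TABLE TableY (y1 INTEGER, y2 TEXT);
    INSERT INTO TableX VALUES ('ABC', 5555), ('DEF', 6666);
    INSERT INTO TableY VALUES (1234, 'WXYZ'), (5678, 'STUV');
""")

# Pad TableX rows on the right and TableY rows on the left with nulls,
# then stack the two padded tables on top of each other.
outer_union = conn.execute("""
    SELECT x1, x2, NULL AS y1, NULL AS y2 FROM TableX
    UNION ALL
    SELECT NULL, NULL, y1, y2 FROM TableY
""").fetchall()

for row in outer_union:
    print(row)
```

No row of one table is ever combined with a row of the other, so the result
has as many rows as the two tables together.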

SELECT * FROM TableX UNION JOIN TableY

Simulated by:

SELECT * FROM TableX FULL JOIN TableY ON 1>2

Both the above statements produce the following result:

Table X:  +  Table Y:   =  UNION Result:
ABC 5555     1234 WXYZ     ABC  5555   Null Null
DEF 6666     5678 STUV     DEF  6666   Null Null
                           Null Null   1234 WXYZ
                           Null Null   5678 STUV

Figure 3.13 Example of a UNION join.

3.6 Intermixing Join Types

Intermixing different join types in a standard SQL join specification is
possible, and makes the specification nonassociative, as you would suspect.
There are two concerns when intermixing join types. First, care must be used
when mixing join types that include join conditions with those that do not.
This complicates determining the join order for the user, as discussed in
Chapter 2. Second, care must be used when intermixing different join types
because they have different levels of data preservation abilities and
attributes that can conflict with each other, making their operation
destructive and the result illogical. This is because some joins will remove
data that was preserved by previous data-preserving joins, as shown in
Figure 3.14. In these examples, the rows that are created by the first join
and then removed by the second join are marked as removed.
In both SQL examples in Figure 3.14, data preserved from the Department table
when there is no matching row in the Employee table can still be lost if there
is no matching row in the Dependent table. This is because in the first SQL
example the inner join loses data from all sides, and in the second SQL example
the RIGHT join loses data introduced from the left, which had been preserved by
the preceding LEFT join. This is probably not desirable, since the Department
data was preserved for some purpose. Chapter 7 documents a powerful coding
technique to prevent this destructive behavior when nondata-preserving
(destructive) joins or intermixed join types must be used.

Department   Employee         Dependent
Table:       Table:           Table:
DeptA 123    EmpX 10 DeptB    Dpnd1 16 EmpY
DeptB 456    EmpY 20 DeptC    Dpnd2 18 EmpZ
DeptD 789    EmpV 40 DeptD    Dpnd3 21 EmpV

Destructive Example One:

SELECT * FROM Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo
INNER JOIN Dependent ON EmpNo=DpndEmpNo

Department   Employee         Dependent
DeptA 123    Null Null Null                    (removed by second join)
DeptB 456    EmpX 10   DeptB                   (removed by second join)
DeptD 789    EmpV 40   DeptD  Dpnd3 21 EmpV

Destructive Example Two:

SELECT * FROM Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo
RIGHT JOIN Dependent ON EmpNo=DpndEmpNo

Department   Employee         Dependent
DeptA 123    Null Null Null                    (removed by second join)
DeptB 456    EmpX 10   DeptB                   (removed by second join)
DeptD 789    EmpV 40   DeptD  Dpnd3 21 EmpV
Null  Null   Null Null Null   Dpnd2 18 EmpZ
Null  Null   Null Null Null   Dpnd1 16 EmpY

Figure 3.14 The intermixing of different join types can be destructive.
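The loss of preserved Department rows in Destructive Example One can be
observed by comparing it with an all-LEFT version of the same statement. This
is a minimal sketch, assuming Python's built-in sqlite3 module; the column
names DeptData, EmpData, DpndNo, and DpndData are hypothetical, and ORDER BY is
added only for a predictable row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT, DeptData INTEGER);
    CREATE TABLE Employee (EmpNo TEXT, EmpData INTEGER, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndData INTEGER, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA', 123), ('DeptB', 456), ('DeptD', 789);
    INSERT INTO Employee VALUES
        ('EmpX', 10, 'DeptB'), ('EmpY', 20, 'DeptC'), ('EmpV', 40, 'DeptD');
    INSERT INTO Dependent VALUES
        ('Dpnd1', 16, 'EmpY'), ('Dpnd2', 18, 'EmpZ'), ('Dpnd3', 21, 'EmpV');
""")

def run(second_join_type):
    """LEFT join Department to Employee, then apply the given second join."""
    return conn.execute(f"""
        SELECT * FROM Department
        LEFT JOIN Employee ON DeptNo = EmpDeptNo
        {second_join_type} JOIN Dependent ON EmpNo = DpndEmpNo
        ORDER BY DeptNo
    """).fetchall()

preserved = run("LEFT")    # every Department row survives
destroyed = run("INNER")   # Destructive Example One

# Departments that the first join preserved and the second join removed.
lost_departments = sorted(
    {row[0] for row in preserved} - {row[0] for row in destroyed}
)
print(lost_departments)
```

The inner join keeps only the row that has a matching Dependent, so the DeptA
and DeptB rows preserved by the first LEFT join disappear.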

3.7 Conclusion
This chapter has looked at all of the different standard SQL join types: the
FULL, RIGHT, LEFT, CROSS, UNION, and INNER joins. Except for the
INNER join, all of these joins also preserve rows when there are no matching
rows.
The two types of outer joins, FULL and one-sided, while logically similar,
behave very differently when three or more tables are being joined together.
One-sided joins operate hierarchically, while FULL joins do not since they are
symmetrical in operation.
Because the ON clause plays a major role with the outer join and greatly
limits its ability to be freely regrouped, the FULL and one-sided joins be-
have associatively. This can change when the NATURAL option is used. The
NATURAL option is documented in Chapter 4. Intermixing join types can
also make FULL and one-sided joins operate nonassociatively.
Commutativity and associativity do not account for all the valid cases
where the outer join specification can be rearranged and still produce the same
result. To help account for these additional cases, the term hierarchictivity was
introduced to account for the principles of hierarchical structures, which can
also be applied to the reordering of one-sided outer join statements.
4
Natural Joins
Natural joins are INNER, FULL, and one-sided joins where the common
named columns used in the join criteria are coalesced (turned into
single-column values) in the result. For example, when inner joining the
Department and Employee tables over the common key value of the department
number, DeptNo, it is usually convenient to have only one occurrence of the
join key value in the result instead of two (or more) copies of the same key
value. This assumes that equal join (equijoin) conditions were used, and
natural joins always use equal join conditions. Natural joins take on added
significance with outer joins because of their data-preserving behavior. This
introduces a situation where the key value on one side or the other of the join
condition may be missing (null) in the result, making the key location
unpredictable. In this case, the coalesced key values allow a single key
location to be used for each row in the resulting table, so it can be
referenced easily and consistently. Depending on the situation, coalescing of
the join columns and natural join processing can increase or decrease the
associativity of outer joins across three or more tables that are under a
common domain. This can significantly change the operation of the outer join,
which is why it is examined separately in this chapter.

4.1 Explicit and Implicit Natural Joins


In standard SQL, natural joins can be specified explicitly or implicitly. The
explicit and implicit NATURAL options of the standard SQL syntax work in
conjunction with the LEFT, RIGHT, FULL, and INNER join operations to
coalesce the common named join column keys into single key values. As
indicated in the outer join syntax in Figure 2.1, when the NATURAL keyword
option is specified, the ON and USING clauses are not specified. This is
because the join condition is automatically taken as the equal join between
columns having the same name in the tables that are in the scope of control of
the outer join operation being performed.
An implicit natural join does not specify the NATURAL keyword; instead, the
natural behavior is indicated by coding a USING clause in place of an ON
clause to name the columns that are to be equijoined and coalesced. For this
reason it is also called a column name join. It assumes that the specified column
names occur in both table inputs or their scope of control. This gives more
control than the explicit natural join because the statement itself specifies
which commonly named columns take part in the join condition. Just as
in the explicit natural join, the column names that take part in the join
condition are coalesced in the result. The example in Figure 4.1 demonstrates the
explicit and implicit natural joins and how the column results are affected by
natural joins. In this example, the explicit and implicit natural joins produce
identical results, as you would expect.
The first SQL example in Figure 4.1 is a standard inner join statement
that shows in its result two copies of the join condition key value 123. The next
two SQL join examples demonstrate an explicit and implicit natural inner join.

No Natural Option:

SELECT * FROM Dept
INNER JOIN Emp ON Dept.DeptNo=Emp.DeptNo

Result:   Dept.DeptNo   DeptName   EmpName   Emp.DeptNo
          123           HR         John      123

Explicit Natural:

SELECT * FROM Dept NATURAL INNER JOIN Emp

Implicit Natural:

SELECT * FROM Dept INNER JOIN Emp USING (DeptNo)

Both queries produce:   DeptNo   DeptName   EmpName
                        123      HR         John

Figure 4.1 Explicit and implicit natural inner join example.


They are equivalent statements. In these examples, DeptNo is the key in the
Department table (Dept) and a foreign key in the Employee table (Emp). This
key is used to perform the join operation. Because this is an equijoin, the two
DeptNo join columns always hold the same value in each resulting row, so they
can be coalesced into a single column for convenience.
The NATURAL option, when applied to columns across only two tables, does
not affect the join's internal operation. This is not the case for natural joins
across three or more tables over a common column (domain), as described
directly below.
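The column behavior in Figure 4.1 can be checked with a short script. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in SQL engine; the column layout of Dept and Emp is an assumption modeled on the figure, and other database systems may label the duplicated column differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp  (EmpName TEXT, DeptNo INTEGER);
    INSERT INTO Dept VALUES (123, 'HR');
    INSERT INTO Emp  VALUES ('John', 123);
""")

def run(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# Standard inner join: both copies of the join key are returned.
plain_cols, plain_rows = run(
    "SELECT * FROM Dept INNER JOIN Emp ON Dept.DeptNo = Emp.DeptNo")

# Explicit natural join: the join key is coalesced into one column.
natural_cols, natural_rows = run(
    "SELECT * FROM Dept NATURAL INNER JOIN Emp")

# Implicit natural join (column name join) via USING.
using_cols, using_rows = run(
    "SELECT * FROM Dept INNER JOIN Emp USING (DeptNo)")

print(plain_rows)
print(natural_rows)
print(using_rows)
```

The standard join returns four columns, while both natural forms return three, with DeptNo appearing once.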

4.2 Multitable Natural Outer Joins


With the outer join, the NATURAL operation can have a significant effect on
the results when the join involves more than two tables joined over a common
named key. This is because the coalesced result in the working set continues to
be referenced after the initial join operation. For example, in the explicit natu-
ral FULL join SELECT * FROM T1 NATURAL FULL JOIN T2 NATURAL
FULL JOIN T3, the join condition for Table T3 will reference its key columns
from itself and the coalesced key column value produced from the previously
coalesced key values of table T1 and table T2, which are stored in the working
set. This is demonstrated visually in Figure 4.2, which uses the Coalesce function
to simulate the operation of a natural join. The NATURAL option therefore
changes both the operation and the result of the outer join. One-sided and FULL
outer join operations are affected differently by this coalescing operation, as
described in Sections 4.3 and 4.4.
The simulation of a multitable natural join, shown in Figure 4.2, applies
to both the explicit and implicit natural joins. The implicit natural join's
join columns are specified externally through the USING clause, so it operates
exactly as written. The explicit natural join's operation is driven internally
by the column names that match across the tables being joined. The matching
column names may seem obvious if you are familiar with the tables, but there is
one situation you should be aware of in which the explicit natural join may act
nonassociatively. This can happen when the common named columns are not in all
of the tables being joined at each join point. This can cause the explicit
natural join to operate differently depending on the table join order. This is
demonstrated in Figure 4.3.
The two explicit natural FULL joins in Figure 4.3 demonstrate that the
table join order can make a difference in the result when all the tables do not
have the same matching column names. In fact, the resulting data is not only

SELECT * FROM T1 NATURAL FULL JOIN T2
                 NATURAL FULL JOIN T3
                 NATURAL FULL JOIN T4

NATURAL option (such as above) can be simulated by:

SELECT Coalesce(T1.X, T2.X, T3.X, T4.X) FROM T1
   FULL JOIN T2 ON T1.X=T2.X
   FULL JOIN T3 ON Coalesce(T1.X, T2.X)=T3.X
   FULL JOIN T4 ON Coalesce(T1.X, T2.X, T3.X)=T4.X

Figure 4.2 Simulating the coalescing effect of the natural outer join.

Table T1:     Table T2:     Table T3:
X  Y          X  Z          Y  Z
1  2          0  3          2  3

Explicit natural join:

SELECT * FROM T1 NATURAL FULL JOIN T2
                 NATURAL FULL JOIN T3

Equivalent implicit natural join:

SELECT * FROM T1 FULL JOIN T2 USING (X)
                 FULL JOIN T3 USING (Z,Y)

Result 1:   X     Y     Z
            1     2     Null
            0     Null  3
            Null  2     3

Explicit natural join with join order changed:

SELECT * FROM T1 NATURAL FULL JOIN
   (T2 NATURAL FULL JOIN T3)

Equivalent implicit natural join:

SELECT * FROM T2 FULL JOIN T3 USING (Z)
                 FULL JOIN T1 USING (Y,X)

Result 2:   X     Y     Z
            0     2     3
            1     2     Null

Figure 4.3 Explicit natural join may act nonassociatively.


arranged differently between columns; it is different. This is because the join
columns are determined as the join statement is processed, driven by the table
join order. The equivalent implicit natural join specifications in the example
indicate how the explicit natural join will operate. Notice that the USING
clause specifications in the equivalent implicit natural joins are different
between the first and second examples, proving that the two explicit natural
joins are not equivalent, making the explicit natural join nonassociative in this
example.
Let’s take a closer look at the explicit natural join process in Figure 4.3. In
the first explicit natural join example, tables T1 and T2 are joined first and the
common named join column selected is X. When table T3 is joined to the
working set, the common named columns selected are Z and Y, which were
also in the working set. This produced the first result shown. In the second
explicit natural join example, tables T2 and T3 are joined first and the com-
mon named join column selected is Z. When table T1 is joined to the working
set, the columns selected are X and Y, which were in the working set. This pro-
duced the second result shown. The results are different because the selected
column names in these two examples are combined differently. In the first
example, table T1 is joined using column X, and in the second example it is
joined using columns Y and X.

4.3 Natural One-Sided Outer Join


Because of the data-preserving effect of one-sided joins joined across more than
two tables with common join columns, one-sided join results can be affected by
the natural join operation. With these one-sided joins, the results can no longer
model hierarchical structures. This is because the coalesced value of the one-
sided operation does not retain the chaining effect necessary to model hierar-
chical structures. With a standard one-sided join, for example, table T1 can
reach table T2, and table T2 can then reach table T3. If table T1 cannot reach
table T2, or table T2 cannot reach table T3, then table T3 cannot be reached.
But when join key coalescing is performed, table T3 can be reached even if
table T2 cannot be reached, because the coalescing operation supplies table
T1's key value in place of the missing T2 key. This behavior is not hierarchical
in nature since table T3 can be reached from multiple paths: table T1 or table
T2. The examples in Figure 4.4 demonstrate this behavior.
Notice in Figure 4.4 how the hierarchical LEFT join (the first join state-
ment) goes down the structure in a chain fashion, joining on columns from
tables T1,T2, and then from tables T2,T3. This means that as soon as a miss-
ing table row occurrence (or link) is encountered, the rest of the row will be

Table T1:     Table T2:     Table T3:
Key1  T1A     Key2  T2B     Key1  T3C

Hierarchical LEFT join:

SELECT * FROM T1 LEFT JOIN T2 ON T1.X=T2.X
                 LEFT JOIN T3 ON T2.X=T3.X

Hierarchical join result:

Key1  T1A  Null  Null  Null  Null

Natural LEFT join:

SELECT * FROM T1 NATURAL LEFT JOIN T2
                 NATURAL LEFT JOIN T3

Simulated Natural join result:

Key1  T1A  Null  T3C

Figure 4.4 Natural LEFT joins are nonhierarchical.

null because the chain has been broken. The natural LEFT join does not sup-
port this chaining effect. Basically, the first table (T1) is always preserved and
its key join value(s) remains in force because of the coalescing effect of the
NATURAL option. This will increase the amount of data preserving that is
possible based on table T1’s key values, as can be seen in the inclusion of value
T3C in the natural join result.
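The two results in Figure 4.4 can be reproduced with a short script. This is a sketch only: it assumes Python's sqlite3 module as the SQL engine and hypothetical column names X, A, B, and C for the figure's tables, and it simulates the natural join with COALESCE in the manner of Figure 4.2 rather than relying on any particular engine's NATURAL implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (X TEXT, A TEXT);
    CREATE TABLE T2 (X TEXT, B TEXT);
    CREATE TABLE T3 (X TEXT, C TEXT);
    INSERT INTO T1 VALUES ('Key1', 'T1A');
    INSERT INTO T2 VALUES ('Key2', 'T2B');
    INSERT INTO T3 VALUES ('Key1', 'T3C');
""")

# Hierarchical chain: T3 can only be reached through T2.
hierarchical = conn.execute("""
    SELECT * FROM T1
    LEFT JOIN T2 ON T1.X = T2.X
    LEFT JOIN T3 ON T2.X = T3.X
""").fetchall()

# Simulated natural LEFT join: T3 is matched against the coalesced key,
# so T1's key value stays in force when T2 is missing.
simulated_natural = conn.execute("""
    SELECT COALESCE(T1.X, T2.X, T3.X) AS X, T1.A, T2.B, T3.C
    FROM T1
    LEFT JOIN T2 ON T1.X = T2.X
    LEFT JOIN T3 ON COALESCE(T1.X, T2.X) = T3.X
""").fetchall()

print(hierarchical)
print(simulated_natural)
```

In the chained version the missing T2 row breaks the path to T3; in the simulated natural version T3C is preserved.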
After the lead table is processed in one-sided natural joins as in Figure 4.4,
the join order of the other tables can be changed without affecting the result.
This means that the lead table establishes the result, so regrouping the joins
in a way that gives a nested join a different lead table can change it, making
the natural one-sided join nonassociative. This is proven in Figure 4.5, which
demonstrates that changing the join order of a natural join can produce a
different result.
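The regrouping effect can be sketched in the same way. The example below again assumes sqlite3, hypothetical column names X, A, B, and C, and a COALESCE-based simulation of the natural join; the nested group is written as a derived table with the hypothetical alias S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (X TEXT, A TEXT);
    CREATE TABLE T2 (X TEXT, B TEXT);
    CREATE TABLE T3 (X TEXT, C TEXT);
    INSERT INTO T1 VALUES ('Key1', 'T1A');
    INSERT INTO T2 VALUES ('Key2', 'T2B');
    INSERT INTO T3 VALUES ('Key1', 'T3C');
""")

# Left-to-right grouping: (T1 join T2) join T3, with T1 as the lead table.
left_to_right = conn.execute("""
    SELECT COALESCE(T1.X, T2.X, T3.X) AS X, T1.A, T2.B, T3.C
    FROM T1
    LEFT JOIN T2 ON T1.X = T2.X
    LEFT JOIN T3 ON COALESCE(T1.X, T2.X) = T3.X
""").fetchall()

# Regrouped: T1 join (T2 join T3). Inside the nested group T2 leads,
# so T3 is matched against T2's key before T1 is involved.
regrouped = conn.execute("""
    SELECT T1.X, T1.A, S.B, S.C
    FROM T1
    LEFT JOIN (
        SELECT T2.X AS X, T2.B AS B, T3.C AS C
        FROM T2 LEFT JOIN T3 ON T2.X = T3.X
    ) AS S ON T1.X = S.X
""").fetchall()

print(left_to_right)
print(regrouped)
```

With this data the first grouping preserves T3C and the second does not, matching the two results in Figure 4.5.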

4.4 Natural FULL Outer Join


FULL joins consisting of more than two tables across common named join col-
umns open the possibility of generating results that can be affected by the
NATURAL option. All FULL joins will preserve the total amount of data pos-
sible regardless of the order that the tables are joined in. This is because no data
is lost. The effect that the NATURAL option has on the FULL outer join is to
join the tables producing the fewest number of rows possible. It condenses the

Table T1:     Table T2:     Table T3:
Key1  T1A     Key2  T2B     Key1  T3C

SELECT * FROM T1 NATURAL LEFT JOIN T2
                 NATURAL LEFT JOIN T3

Produces join result:   Key1  T1A  Null  T3C

SELECT * FROM T1 NATURAL LEFT JOIN
   (T2 NATURAL LEFT JOIN T3)

Produces different join result:   Key1  T1A  Null  Null

Figure 4.5 Natural LEFT joins are nonassociative.

rows. This is because with coalesced data, there is always a non-null key avail-
able to match on, reducing the generation of null data and creating a predict-
able result. The examples in Figure 4.6 demonstrate this effect.
The standard FULL join shown at the top of Figure 4.6 is not a natural
join. Because of this, it is difficult to predict the order that the rows will be
combined in, as shown in the first example. Using the explicit or implicit natu-
ral FULL join in the second example in Figure 4.6, the rows are condensed,
more predictable, and easier to process, because with the NATURAL option
there is always a fixed key position available to match on. Notice also that the
result rows of the natural FULL join, excluding nulls, contain the same data as
the standard FULL join. This, as explained above, is because no data is lost

Table T1:     Table T2:     Table T3:
Key1  T1A     Key2  T2B     Key1  T3C

Standard FULL join:

SELECT * FROM T1 FULL JOIN T2 ON T1.X=T2.X
                 FULL JOIN T3 ON T2.X=T3.X

Result:   Key1  T1A   Null  Null  Null  Null
          Null  Null  Key2  T2B   Null  Null
          Null  Null  Null  Null  Key1  T3C

Natural FULL join:

SELECT * FROM T1 NATURAL FULL JOIN T2
                 NATURAL FULL JOIN T3

Condensed result:   Key1  T1A   Null  T3C
                    Key2  Null  T2B   Null

Figure 4.6 Natural FULL join producing the most condensed result.

with a FULL join. Because of this condensing effect, the natural FULL join is
associative in operation (except for the special situation concerning explicit
natural joins documented in Section 4.2).
Since the natural FULL join produces the most condensed result, it follows
that it can be reordered in any manner without changing the result. This is
demonstrated in Figure 4.7. There is another reason for this behavior, which
applies here and in the inner join example in Figure 4.8: the natural FULL join
and natural inner join are both commutative and associative in operation.
Applying both of these properties together allows the SQL statement to be
completely reordered without changing the result.

4.5 Natural Inner Joins


The NATURAL option of the inner join does not produce any side effects, so
a natural inner join and a standard inner join produce the same result
except for the coalesced join columns, as shown in Figure 4.1. This is
because there is no data preserving occurring with inner joins, so the coalesced
value of its join condition values is always the same as the values that make it
up. There is never a case where one side is missing and the other side is not.
Either both sides exist or both sides are missing. With inner joins, nulls cannot

Table T1:     Table T2:     Table T3:
Key1  T1A     Key2  T2B     Key1  T3C

SELECT * FROM T1 NATURAL FULL JOIN T2
                 NATURAL FULL JOIN T3

Change join order:

SELECT * FROM T1 NATURAL FULL JOIN
   (T2 NATURAL FULL JOIN T3)

Reorder tables:

SELECT * FROM T3 NATURAL FULL JOIN T1
                 NATURAL FULL JOIN T2

All produce condensed result:   Key1  T1A   Null  T3C
                                Key2  Null  T2B   Null

Figure 4.7 Natural FULL join is associative and supports reordering.



be introduced into the result from missing rows because this condition causes
the entire row to be eliminated.
The natural inner join examples in Figure 4.8 demonstrate that the natu-
ral inner join can be completely reordered and it will not change the result.
This behavior includes associativity. Because rows are so easily eliminated with
inner joins, the example data was increased in this example from the previous
examples to derive a result; otherwise, the inner joins in these examples would
have produced empty results.

4.6 Intermixing Natural Join Types


Applying natural joins to different join types in a join statement is perfectly
acceptable, with the same warnings already covered in Chapter 3, which dis-
cussed intermixing join types. Each natural join is executed in turn, leaving its
coalesced result in a working set as input into the next natural join. So each
natural join is executed in isolation when its execution turn comes up. This
means the operation of intermixing natural join types is predictable and in
some cases may even be useful.
This intermixing of natural join types can also include join types that do
not include the NATURAL operation for the same reasons as explained above.
This means having join types that do not include NATURAL operations does
not interfere with the NATURAL operation of other natural joins in the join

Table T1:     Table T2:     Table T3:
Key1  T1A     Key2  T2B     Key1  T3C
Key2  T2A     Key3  T3B     Key2  T2C

SELECT * FROM T1 NATURAL INNER JOIN T2
                 NATURAL INNER JOIN T3

Change join order:

SELECT * FROM T1 NATURAL INNER JOIN
   (T2 NATURAL INNER JOIN T3)

Reorder tables:

SELECT * FROM T3 NATURAL INNER JOIN T1
                 NATURAL INNER JOIN T2

All the above SQL statements produce:   Key2  T2A  T2B  T2C

Figure 4.8 A natural inner join is associative and supports reordering.



statement, or vice versa. Explicit and implicit natural joins can also be inter-
mixed. Intermixing of natural join types is nonassociative. An example of this is
shown in Figure 4.9.

4.7 Natural One-Sided Join Transformation


The NATURAL one-sided join operation applied across multiple joins, as
described in Section 4.3, has an interesting characteristic where the lead key
value is propagated through the join operations. This characteristic prevents the
normal hierarchical chaining operation that was shown in Chapter 3. But this
characteristic does have a hierarchical mapping. This is demonstrated in Figure
4.10. Since the root key is propagated through the structure, all other elements
are related directly and solely to the root producing the structure shown. This
also means that the natural one-sided join specification can be transformed into
a more intuitive non-natural one-sided SQL specification that more directly
models the structure. This is also shown in Figure 4.10.
The SQL transformation in Figure 4.10 is from a series of natural
one-sided joins to a series of non-natural one-sided joins. The only difference
between these two join specifications is that the join keys are coalesced into a
single column value in the natural join specification and are not coalesced in
the non-natural join. But in the non-natural join, the join key from the
first preserved join table contains the same value as in the natural join result, so
it should be treated as the coalesced key.
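The equivalence behind this transformation can be sketched as follows. The example assumes sqlite3 as the engine, a hypothetical shared key column K with value columns AVal through DVal, and invented sample rows; the natural form is simulated with COALESCE as in Figure 4.2, and the figure's shorthand ON A=B is written out as a comparison of key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (K INTEGER, AVal TEXT);
    CREATE TABLE B (K INTEGER, BVal TEXT);
    CREATE TABLE C (K INTEGER, CVal TEXT);
    CREATE TABLE D (K INTEGER, DVal TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (1, 'b1');
    INSERT INTO C VALUES (2, 'c2');
    INSERT INTO D VALUES (1, 'd1'), (2, 'd2');
""")

# Simulated natural one-sided joins: each join uses the coalesced key.
simulated_natural = conn.execute("""
    SELECT COALESCE(A.K, B.K, C.K, D.K) AS K, AVal, BVal, CVal, DVal
    FROM A
    LEFT JOIN B ON A.K = B.K
    LEFT JOIN C ON COALESCE(A.K, B.K) = C.K
    LEFT JOIN D ON COALESCE(A.K, B.K, C.K) = D.K
    ORDER BY 1
""").fetchall()

# Transformed form: every table is joined directly to the root key.
root_keyed = conn.execute("""
    SELECT A.K, AVal, BVal, CVal, DVal
    FROM A
    LEFT JOIN B ON A.K = B.K
    LEFT JOIN C ON A.K = C.K
    LEFT JOIN D ON A.K = D.K
    ORDER BY 1
""").fetchall()

print(simulated_natural)
print(root_keyed)
```

Because the root table A is always preserved, the coalesced key always equals A.K (assuming A has no null keys), which is why the two forms agree.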

Table T1:     Table T2:     Table T3:
Key1  T1A     Key2  T2B     Key1  T3C

SELECT * FROM T1 NATURAL INNER JOIN T3
                 NATURAL FULL JOIN T2

Produces:   Key1  T1A   T3C   Null
            Key2  Null  Null  T2B

SELECT * FROM T1 NATURAL INNER JOIN
   (T3 NATURAL FULL JOIN T2)

Produces:   Key1  T1A  T3C  Null

Figure 4.9 Intermixing natural join types is nonassociative.



Natural one-sided joins:          Non-natural one-sided joins:

SELECT * FROM A                   SELECT * FROM A
NATURAL LEFT JOIN B               LEFT JOIN B ON A=B
NATURAL LEFT JOIN C               LEFT JOIN C ON A=C
NATURAL LEFT JOIN D               LEFT JOIN D ON A=D

Join sequence:  A, B, C, D        Resulting structure:  A
                                                      B C D
Figure 4.10 Natural one-sided outer join transformation.

The fact that this natural one-sided outer join transformation is possible
also points out that the natural feature for one-sided outer joins does not offer
any additional capabilities beyond the one-sided outer join operation. This
means it can be avoided by using the more intuitive non-natural one-sided
outer join.

4.8 Conclusion
The NATURAL join option takes on new meaning with outer joins because it
can significantly affect the results of outer joins. This occurs when more than
two tables are natural outer joined across a commonly named column. The
natural outer join operation guarantees that there is always a coalesced key col-
umn value available to join with any of the following tables to be joined. This
changes the operation of one-sided outer joins and FULL outer joins. With
one-sided outer joins, it can cause more data to be preserved and change their
operation to be nonassociative. With FULL outer joins, the NATURAL option
can produce more condensed and predictable results having fewer rows while
containing the same data, and it remains associative in operation except for one
case: explicit natural joins can behave nonassociatively when the tables do not
all share the same commonly named columns across the natural join.
Part II
Outer Join Data Modeling and
Structured Processing
Part II documents in detail the inherent data modeling and structure-
processing capabilities of the standard SQL outer join operation. These are
capabilities that outer join users can utilize immediately. Chapter 5 supplies a
background in data modeling and data structure processing. Chapter 6 shows
in detail how the standard SQL outer join operation can perform complex data
modeling. Chapter 7 introduces new data modeling–related features. Chapter 8
supplies further information on the outer join’s data modeling capabilities.

5
Data Structure Review
Working with SQL and its lack of data modeling, relational database profes-
sionals may have a tendency to forget about data structures and their inherent
capabilities. This chapter serves as a short review on data structures, data mod-
eling, and data structure processing necessary to understand the outer join’s
data modeling and structure-processing capabilities identified and demon-
strated in this book.

5.1 The Power of Hierarchical Data Structures


Hierarchical structures, unlike network structures, contain only one path to
each data item in the structure, which can be seen in Figure 5.1. This makes
them unambiguous and singular in meaning. Unambiguous structures have
powerful semantics that can implicitly control the data processing of the data
structure. This is primarily what controls the nonprocedural operation of
fourth generation (declarative) languages (4GLs) and gives them their self-navi-
gating and nonprocedural processing ability.
Since data structures are not unique to relational databases, the term seg-
ment is often used to refer to a group of singularly related data analogous to a
relational data table. This term will be used instead of table when a more
generic term is called for.
Both of the data structures in the Department and Employee views in
Figure 5.1 are composed of the same tables and the same relationships, yet they
have very different structures. Different structures mean different
semantics, which produce different results. In the Department view, an
employee and his or her dependents cannot exist if they are not associated with
a department (i.e., Bill is missing). This is not the case in the Employee view,
which has the opposite semantics that prevent department DeptC from existing
since it has no employees associated with it. This situation is possible if an
entire department is outsourced. In the Department view, DeptC can still exist
and can have a budget and other information associated with it.
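The difference between the two views can be illustrated with a flat SQL approximation. This is a sketch under stated assumptions: it uses sqlite3, hypothetical table and column names (Department, Employee, Dependent, DeptName, EmpName, DpndName), sample rows taken from Figure 5.1, and LEFT joins to stand in for the hierarchical views. The flat rows repeat parent values that the structured display would suppress.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptName TEXT);
    CREATE TABLE Employee   (EmpName TEXT, DeptName TEXT);
    CREATE TABLE Dependent  (DpndName TEXT, EmpName TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptB'), ('DeptC');
    INSERT INTO Employee VALUES
        ('Mike', 'DeptA'), ('Mary', 'DeptA'), ('John', 'DeptB'), ('Bill', NULL);
    INSERT INTO Dependent VALUES
        ('Jason', 'Mike'), ('Jane', 'Mike'), ('Sam', 'Mary'), ('Sara', 'Bill');
""")

# Department view: Department over Employee over Dependent.
department_view = conn.execute("""
    SELECT Department.DeptName, Employee.EmpName, Dependent.DpndName
    FROM Department
    LEFT JOIN Employee  ON Employee.DeptName = Department.DeptName
    LEFT JOIN Dependent ON Dependent.EmpName = Employee.EmpName
""").fetchall()

# Employee view: Employee over the sibling legs Department and Dependent.
employee_view = conn.execute("""
    SELECT Employee.EmpName, Department.DeptName, Dependent.DpndName
    FROM Employee
    LEFT JOIN Department ON Department.DeptName = Employee.DeptName
    LEFT JOIN Dependent  ON Dependent.EmpName = Employee.EmpName
""").fetchall()

for row in department_view:
    print("Department view:", row)
for row in employee_view:
    print("Employee view:  ", row)
```

Bill and his dependent appear only in the Employee view, and DeptC appears only in the Department view.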
Ignoring which fields are present and their column order in Figure 5.1,
notice that the Department and Employee views’ data appear to handle repli-
cated data differently. Hierarchical higher level values control (or own) lower
level values, as shown in both data view displays. Most obvious is that repli-
cated data is totally eliminated in the Department view. To represent this in the
data display, a blank field means that the last value printed in that column is
still valid (unless a dash appears, which means the value is missing). Replication
of the department name is not necessary since any given department can have
many employees in this view and shouldn’t need repeating for each employee
occurrence. The structured output represents the actual data in the view. This
is WYSIWYG (“what you see is what you get”) display processing based on the
semantics of the data structure.
Over in the Employee view in Figure 5.1 you will notice that DeptA is
replicated when the next employee, Mary, is introduced in the display. This follows
the semantics of the Employee view, where the Employee segment is hierarchically
over the Department segment so that each employee has its own department
occurrence. This view’s WYSIWYG display is also valid, showing the correct
replication (notice that employee Mike, with two dependents, did not cause a

Department View                 Employee View

Department                      Employee
   Employee                        Department
      Dependent                    Dependent

Dept    Emp    Dpnd             Emp    Dept    Dpnd
DeptA   Mike   Jason            Mike   DeptA   Jason
               Jane                            Jane
        Mary   Sam              Mary   DeptA   Sam
DeptB   John   -                John   DeptB   -
DeptC   -      -                Bill   -       Sara

Figure 5.1 Two application views with the same relationships and their data.

replication). Knowledge of the data structure will further improve the useful-
ness and application of this intuitively formatted data.
The data displays of the Department and Employee views in Figure 5.1
represent the semantics of their data structures. For example, if you were to
divide both views’ data into separate structured records, using the root value
as the record key, each view would still reflect the same data value occurrence
counts (cardinality) shown. This verifies that the controlled replicated values
are correct.
Most query languages that operate on hierarchical structures are self-navi-
gating, following the data structure, and are controlled by the semantics of the
data structure. This makes them intuitive and powerful. They follow rules
based on parentage and sibling segment (multileg) operation derived from the
hierarchical semantics. Parentage rules can affect processing by controlling
internal looping ranges. Sibling segments are different data paths directly under
the same (common) parent, such as the Department and Dependent paths in
the Employee view in Figure 5.1. The segment occurrences in each of the paths
do not correspond in a one-to-one fashion; they are related only by their com-
mon parent—in this case, Employee—and are otherwise independent of each
other. The left-to-right positioning of segments under a common parent is not
significant. In the Employee view in Figure 5.1, the Dependent and Depart-
ment segments could be reversed without changing the semantics or results.
Combining the above fourth-generation semantics with the Employee
view in Figure 5.1, data selection based on a given department value from the
Department leg, together with display of dependents from the Dependent leg,
will select all dependents under the active common parent, Employee. For
example, SELECT Dpnd FROM EmployeeView WHERE Dept='DeptA' will display all
dependents (Jason, Jane, and Sam) of employees in department DeptA. This query
works by satisfying the selection criteria to determine the active common
parents, Mike and Mary from the Employee table, which control the range of
selected data: Jason and Jane under Mike, and Sam under Mary. This cycle is
repeated until all of the data in the database has been tested against the
selection criteria.
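The sibling-leg query above can be approximated in plain SQL. The following sketch assumes sqlite3, hypothetical base tables (Employee, Department, Dependent), and an EmployeeView defined with LEFT joins whose column names Emp, Dept, and Dpnd follow Figure 5.1; a true hierarchical processor would navigate the structure rather than filter flat rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptName TEXT);
    CREATE TABLE Employee   (EmpName TEXT, DeptName TEXT);
    CREATE TABLE Dependent  (DpndName TEXT, EmpName TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptB'), ('DeptC');
    INSERT INTO Employee VALUES
        ('Mike', 'DeptA'), ('Mary', 'DeptA'), ('John', 'DeptB'), ('Bill', NULL);
    INSERT INTO Dependent VALUES
        ('Jason', 'Mike'), ('Jane', 'Mike'), ('Sam', 'Mary'), ('Sara', 'Bill');

    -- Employee is the common parent of the Department and Dependent legs.
    CREATE VIEW EmployeeView AS
        SELECT Employee.EmpName    AS Emp,
               Department.DeptName AS Dept,
               Dependent.DpndName  AS Dpnd
        FROM Employee
        LEFT JOIN Department ON Department.DeptName = Employee.DeptName
        LEFT JOIN Dependent  ON Dependent.EmpName = Employee.EmpName;
""")

# Select on one leg (Dept) and display from the sibling leg (Dpnd).
dependents = [row[0] for row in conn.execute(
    "SELECT Dpnd FROM EmployeeView WHERE Dept = 'DeptA' ORDER BY Dpnd")]

print(dependents)
```

The selection on the Department leg qualifies the parents Mike and Mary, and all dependents under those parents are returned.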

5.2 Three-Tier Database Architecture


The three-tier schema approach to database modeling and design consists of
three levels of views that define all aspects of how the database is stored and
how it can be accessed. These three view levels are the external view, the con-
ceptual view, and the internal or physical view, which are used respectively
by the user, the DBA, and the database system. This is shown visually in
Figure 5.2. These three levels allow for a much greater level of database flexibil-
ity than if they were not used. Unfortunately, relational databases do not inher-
ently support this, but by following good database design, it can be supported
externally.

5.3 External and Internal Views


The external view is how an application perceives the database, and for this rea-
son it is also known as the application view. Different applications can view the
same database in different ways. For example, the Employee and Department
views shown in Figure 5.1 are comprised of the same tables and relationships,
but have very different views, semantics, and associated data. Application views
have to be unambiguous, and for this reason they use the hierarchical data
model.
Internal views represent and control how the tables and data are physi-
cally stored and related in storage. External views and conceptual views (cov-
ered in the next section) are logical views. They bear no relationship to how the
data in the database is actually stored and related.

5.4 Conceptual View


The conceptual (or global) view is usually a network structure representing all
the possible valid or necessary relationships that are required in the database.
Being a network structure, this structure is ambiguous by itself since a given
data element may be accessed from more than one path, with each having dif-
ferent semantics. The conceptual view in Figure 5.3 encompasses the Depart-
ment and Employee application views.
The conceptual view logically lies between the external and internal
views, and is used to control how the external and internal views are related or

View Types: Uses:

External User/Application

Conceptual DBA (The Big Picture)

Internal Database System

Figure 5.2 Three-tier database architecture.



Department View     Conceptual View     Employee View

Department          Department          Employee
   Employee         Employee               Department
      Dependent     Dependent              Dependent

Figure 5.3 Conceptual view that encompasses the Department and Employee views.

mapped to one another. The conceptual view logically separates the external
and internal structures, allowing the internal view to change without changing
the external views, and allows the external views to change without changing
the internal view. This greatly increases data and structure independence and
database flexibility, and reduces maintenance requirements.

5.5 Many-to-One and One-to-Many Relationships


Many-to-one (M to 1) and one-to-many (1 to M) relationships are the main
types of data relationships that deal with occurrence count (cardinality) of
data items in application data structures. Their names describe their relation-
ship. The employee-to-department relationship is a many-to-one relationship
because many employees can have the same department. In a department-to-
employee relationship, the relationship is one-to-many because one department
can have many employees. This can be seen in Figure 5.4.
One-to-many and many-to-one relationships are hierarchical. As such,
they follow the same behavior as was documented in Section 5.1, which
described hierarchical data structures and their structured data display. This is
reflected in Figure 5.4.

5.6 Many-to-Many Relationships


Notice that one-to-many and many-to-one data structures are the same basic
relationships turned around. One implies the other. This is also true of a
many-to-many (M to M) relationship like parts and suppliers. One part can
have many suppliers and one supplier can have many parts. In a hierarchical
environment, many-to-many relationships look like a one-to-many relationship

One-to-Many Relationship        Many-to-One Relationship

Department                      Employee
   Employee                        Department

Dept    Emp                     Emp    Dept
DeptA   Mike                    Mike   DeptA
        Mary                    Mary   DeptA
DeptB   John                    John   DeptB
DeptC   -                       Bill   -

Figure 5.4 WYSIWYG display of many-to-one and one-to-many relationships.

in either direction, but in reality, they exhibit characteristics of both. Examine
the many-to-many relationships and their data in Figure 5.5.
In Figure 5.5, the structured output of the many-to-many Parts and Suppliers
views appears to show one-to-many relationships. But if you look closely,
you will notice that the data results in the second data column of both views
(the many occurrence side) also have repeating data somewhere in the column.
This is a characteristic of many-to-one relationships proving that a many-
to-many relationship has characteristics of both one-to-many and
many-to-one relationships. But this many-to-one characteristic can usually be
overlooked without consequences, so that many-to-many relationships can be
viewed primarily as one-to-many relationships—since this is the emphasis
of the semantics, as the visual structured display in Figure 5.5 demonstrates.
Many-to-many relationships in relational databases require an “associa-
tion” table to contain the relationships that can simultaneously relate tables as
many-to-one relationships in both directions. This is shown in Figure 5.5.

Parts-Suppliers M to M Relationship

Suppliers --- Association Table --- Parts

Suppliers View         Association Table     Parts View

Suppliers                                    Parts
   Parts                                        Suppliers

Supplier1   Part1      S1   P1               Part1   Supplier1
            Part2      S2   P1                       Supplier2
Supplier2   Part1      S1   P2               Part2   Supplier1
            Part2      S2   P2                       Supplier2

Figure 5.5 Example data views of a many-to-many relationship.


Normally, the association table operation can be transparent to the result, as
also shown in Figure 5.5.
In Figure 5.6, you will notice the inclusion of prices in the Parts and Suppliers
data views. The interesting thing here is that each supplier can have a different
price for a specific part. Where should the price be stored? It is stored
in the association table at its intersection point of Supplier and Part, and is
therefore referred to as intersecting data. In a structured database or structured
display, as in Figure 5.6, this intersecting data can be logically viewed as being
associated with the lower level relation, Part in the Suppliers view and Supplier
in the Parts view. The lower level is the only level that can logically accommo-
date intersecting data without causing replicated data.
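The role of the association table and its intersecting data can be sketched in SQL. The example assumes sqlite3 and hypothetical table and column names (Supplier, Part, SupplierPart, SNo, PNo, Price); the rows are those of Figure 5.6. The same three tables produce both views, differing only in which side is treated as the upper level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SNo TEXT, SName TEXT);
    CREATE TABLE Part     (PNo TEXT, PName TEXT);
    -- Association table: one row per supplier/part pair, carrying the
    -- intersecting data (the price for that pair).
    CREATE TABLE SupplierPart (SNo TEXT, PNo TEXT, Price INTEGER);
    INSERT INTO Supplier VALUES ('S1', 'Supplier1'), ('S2', 'Supplier2');
    INSERT INTO Part     VALUES ('P1', 'Part1'), ('P2', 'Part2');
    INSERT INTO SupplierPart VALUES
        ('S1', 'P1', 10), ('S2', 'P1', 12), ('S1', 'P2', 20), ('S2', 'P2', 23);
""")

JOINED = """
    FROM Supplier
    INNER JOIN SupplierPart ON SupplierPart.SNo = Supplier.SNo
    INNER JOIN Part         ON Part.PNo = SupplierPart.PNo
"""

# Suppliers view: price is read alongside the lower level, Part.
suppliers_view = conn.execute(
    "SELECT SName, PName, Price" + JOINED + "ORDER BY SName, PName").fetchall()

# Parts view: price is read alongside the lower level, Supplier.
parts_view = conn.execute(
    "SELECT PName, SName, Price" + JOINED + "ORDER BY PName, SName").fetchall()

print(suppliers_view)
print(parts_view)
```

Each price belongs to a supplier and part pair, so it appears once per pair in either view and is never replicated.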

5.7 Converting Network Structures to Hierarchical Structures


Often it is desired to have the same table in multiple locations of a
hierarchical data structure. For example, the same Employee table may be
referenced for department manager and product manager, causing a network type
structure. For an application view, this causes problems because network
structures are ambiguous, as was explained in Section 5.1. The simple solution
is to rename the multiply referenced table so that it logically becomes
different tables in the hierarchical data structure, allowing the semantics of
the data structure to become unambiguous, as shown in Figure 5.7.
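
As a minimal sketch of this renaming, the example below references one
Employee table twice under the names ProdMgr and DeptMgr from Figure 5.7. It
uses Python's sqlite3 module; the column names and the sample rows are
assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Division   (DivNo TEXT);
    CREATE TABLE Product    (ProdNo TEXT, ProdDivNo TEXT, ProdMgrNo INTEGER);
    CREATE TABLE Department (DeptNo TEXT, DeptDivNo TEXT, DeptMgrNo INTEGER);
    CREATE TABLE Employee   (EmpNo INTEGER, EmpName TEXT);
    INSERT INTO Division   VALUES ('DivX');
    INSERT INTO Product    VALUES ('ProdZ', 'DivX', 1);
    INSERT INTO Department VALUES ('DeptY', 'DivX', 2);
    INSERT INTO Employee   VALUES (1, 'Mike'), (2, 'John');
""")

# Employee appears twice, each time under its own name, so every logical
# table has exactly one entry point in the structure.
rows = conn.execute("""
    SELECT DivNo, ProdNo, ProdMgr.EmpName, DeptNo, DeptMgr.EmpName
    FROM Division
    LEFT JOIN Product    ON DivNo = ProdDivNo
    LEFT JOIN Department ON DivNo = DeptDivNo
    LEFT JOIN Employee AS ProdMgr ON ProdMgr.EmpNo = ProdMgrNo
    LEFT JOIN Employee AS DeptMgr ON DeptMgr.EmpNo = DeptMgrNo
""").fetchall()
```

Because each alias has its own column in the result, a manager's role is
never in doubt: Mike can only be read as the product manager and John only as
the department manager.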

5.8 Relating Hierarchical Processing to Relational Processing


With relational databases, the first normal form storage requirement forces the
use of flat tables. Because of this, the Cartesian product is necessary for joins to
satisfy join processing by producing all combinations of the join rows, as shown
in Figure 5.8. All combinations are also necessary for sibling segments (separate
legs of the hierarchy). This is because sibling segments or tables are not directly

Suppliers View             Association Table     Parts View

Supplier1  Part1  $10      S1  P1  $10           Part1  Supplier1  $10
           Part2  $20      S2  P1  $12                  Supplier2  $12
Supplier2  Part1  $12      S1  P2  $20           Part2  Supplier1  $20
           Part2  $23      S2  P2  $23                  Supplier2  $23

Figure 5.6 Example data of many-to-many relationship and intersecting data.



Network Structure Hierarchical Structure

Division Division

Product Department Product Department

Manager ProdMgr DeptMgr

Figure 5.7 Converting a network structure to a hierarchical structure.

SELECT * FROM Table1 INNER JOIN Table2 USING (Key)

Table1:         Table2:            = Cartesian Product Result:

Key   Alpha     Key   Numeric      Key   Alpha  Numeric
Key1  A         Key1  1            Key1  A      1
Key1  B         Key1  2            Key1  A      2
                                   Key1  B      1
                                   Key1  B      2

Figure 5.8 Cartesian product effect.

related to each other on a row-by-row basis and all combinations of the rows
are necessary to simulate independent processing of the legs so they can be
accessed in any order or combination.
In Figure 5.8, we can see how the Cartesian product effect can explode the
join result when one-to-many relationships cause multiple keys to match in
both tables, such as Key1 in this example. This exploded result becomes
necessary because standard relational data is forced into using flat
two-dimensional tables, so the result table as shown above has to be exploded
to hold the results. This becomes particularly important in selecting or
filtering data based on data from two or more tables, as in the clause
WHERE Alpha="B" AND Numeric=1 applied to the data result in Figure 5.8.
Locating the result row with an Alpha value of B and a Numeric value of 1
requires exploding the result rather than joining the tables in a simple
parallel join method, which would not produce a row with these values since
they are on different occurrences.
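
The following sketch reproduces the data of Figure 5.8 with Python's sqlite3
module and compares the exploded join with a simple parallel pairing. The
positional pairing done with zip is only an illustrative stand-in for the
"parallel join method" mentioned above, not a method defined in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Key TEXT, Alpha TEXT);
    CREATE TABLE Table2 (Key TEXT, Numeric INTEGER);
    INSERT INTO Table1 VALUES ('Key1', 'A'), ('Key1', 'B');
    INSERT INTO Table2 VALUES ('Key1', 1), ('Key1', 2);
""")

join_sql = "SELECT Key, Alpha, Numeric FROM Table1 INNER JOIN Table2 USING (Key)"

# Every Table1 row with Key1 is combined with every Table2 row with Key1.
joined = conn.execute(join_sql + " ORDER BY Alpha, Numeric").fetchall()

# The filter spans both tables and is tested one exploded row at a time.
filtered = conn.execute(join_sql + " WHERE Alpha = 'B' AND Numeric = 1").fetchall()

# Parallel pairing (first with first, second with second) for comparison.
alphas = [r[0] for r in conn.execute("SELECT Alpha FROM Table1 ORDER BY Alpha")]
numerics = [r[0] for r in conn.execute("SELECT Numeric FROM Table2 ORDER BY Numeric")]
parallel = list(zip(alphas, numerics))
```

The joined result holds all four combinations, so the row with B and 1 can be
found. The parallel pairing holds only A with 1 and B with 2, so the same
filter would find nothing.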
Applying this Cartesian product effect to the joining of the Department,
Employee, and Dependent tables produces a flat, tabular SQL table structure,
as shown in Figure 5.9.

              Structured View         SQL Structure

Dept          Dept   Emp   Dpnd       Dept   Emp   Dpnd
  Emp         DeptA  Mike  Jason      DeptA  Mike  Jason
    Dpnd                   Jane       DeptA  Mike  Jane
                     Mary  Sam        DeptA  Mary  Sam

Figure 5.9 Data structure relationship to Cartesian product.

Notice that with the flattened first normal form structure in Figure 5.9, the
same hierarchical processing as was used in Section 5.1 is achieved by
processing each row one at a time. No looping or navigation is necessary since
all combinations have been generated and exist in the rows. This means that
the same query used for hierarchical access in Section 5.1 can be used in this
case to achieve the same data results with the flattened structure shown in
Figure 5.9. This query was SELECT Dpnd FROM DeptEmp WHERE Dept="DeptA", which
will display all dependents (Jason, Jane, and Sam) from department DeptA.
While this example produces the same results as the identical query in
Section 5.1, flat structures like the one in Figure 5.9 will often produce
replicated data in the result. This is the result of the replicated data
introduced when the flat structure is created, as described in Chapter 1 and
shown in Figure 5.9. This can be seen in the query SELECT Dept FROM DeptEmp
WHERE Dept="DeptA", which when applied to the data in Figure 5.9 will
replicate the value DeptA three times, once for each row that is present.
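
Both queries can be run against the data of Figure 5.9 in a short sketch using
Python's sqlite3 module. The DeptEmp view is defined here with LEFT joins as
an assumption, and the base table and key column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Dept TEXT);
    CREATE TABLE Employee   (Emp TEXT, EmpDept TEXT);
    CREATE TABLE Dependent  (Dpnd TEXT, DpndEmp TEXT);
    INSERT INTO Department VALUES ('DeptA');
    INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA');
    INSERT INTO Dependent  VALUES
        ('Jason', 'Mike'), ('Jane', 'Mike'), ('Sam', 'Mary');
    -- Flattened first normal form structure: one row per Dept/Emp/Dpnd path.
    CREATE VIEW DeptEmp AS
        SELECT Dept, Emp, Dpnd
        FROM Department
        LEFT JOIN Employee  ON Dept = EmpDept
        LEFT JOIN Dependent ON Emp = DpndEmp;
""")

# Selecting the lowest level returns each dependent exactly once.
dependents = [r[0] for r in conn.execute(
    "SELECT Dpnd FROM DeptEmp WHERE Dept = 'DeptA'")]

# Selecting the top level returns DeptA once per flattened row.
departments = [r[0] for r in conn.execute(
    "SELECT Dept FROM DeptEmp WHERE Dept = 'DeptA'")]
```

The first query returns the three dependents, while the second returns DeptA
three times even though only one department row exists in the base table.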

5.9 Physical Versus Logical Data Structures


Physical data structures are fixed structures that can’t be changed easily, if at all.
Their relationships are based on physical pointers or physical juxtapositioning,
as is the case with structured file records. On the other hand, logical data struc-
tures, like relational structures, use data values that can create linkages dynami-
cally. This allows them to be very flexible in specifying their data structures.
Outside of these differences, there need be no basic differences in how these
structure types are navigated and processed. At the lower level, logical
structures may require additional structure comprehension logic.
SQL is a suitable language for the processing of physical and logical data
structures. A limitation imposed on SQL is its Cartesian product processing
model. This can introduce problems in determining the logical data structure,
which relies on data values for this purpose. This means that if you are not care-
ful with formulating your queries, invalid results can occur, often unnoticed.

This does not happen in physical views, which always represent their actual
structure correctly. This is shown in Figure 5.10, where there are two employ-
ees with the same name in the same department, but this fact is lost in the logi-
cal database view because the structure is determined by data. While this error
could be corrected by taking the count using a unique key, the fact is that the
physical data view is not subject to this error situation.
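
The counting problem of Figure 5.10, and the correction by unique key
mentioned above, can be sketched with Python's sqlite3 module. The EmpNo key
column and the table layouts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee  (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, EmpDept TEXT);
    CREATE TABLE Dependent (DpndName TEXT, DpndEmpNo INTEGER);
    -- Two different employees in DeptX who share the name Mary.
    INSERT INTO Employee  VALUES (1, 'Mary', 'DeptX'), (2, 'Mary', 'DeptX');
    INSERT INTO Dependent VALUES ('Jim', 1), ('Sara', 1), ('Andy', 2);
""")

# Grouping on the data value merges the two Marys into one group.
by_name = conn.execute("""
    SELECT EmpName, COUNT(DpndName)
    FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    GROUP BY EmpName
""").fetchall()

# Grouping on the unique key keeps the two employees apart.
by_key = conn.execute("""
    SELECT EmpNo, EmpName, COUNT(DpndName)
    FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    GROUP BY EmpNo, EmpName
    ORDER BY EmpNo
""").fetchall()
```

Grouping by name gives the logical view's single count of 3, while grouping
by the key gives the counts of 2 and 1 seen in the physical view.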

5.10 Sibling Legs Query Semantics


Since sibling legs do not correspond directly to one another, but are related
through their common parent, their semantics are more complex than what has
been discussed previously. In the data structure in Figure 5.11, the parent Div
for division has two sibling legs, Prod for product and Dept for department.
Each has multiple occurrences of data. What happens if a query qualifies a
search from one of these sibling legs and selects data from the other sibling leg,
as shown in the query in Figure 5.11? The semantics dictate that if one data
occurrence is qualified from one leg, then all data occurrences from the other
sibling leg are selected. This is also depicted in the query’s structured data
output in Figure 5.11. While these exact semantics may seem a bit arbitrary,
they are actually backed up by the same query applying the relational Cartesian
product processing model, also shown in Figure 5.11.
Another example of multileg semantics is when multiple legs are used in
the selection criteria as in Figure 5.12. In this example, the WHERE clause
Dept="DeptY" AND Prod="ProdA" is used to qualify a selection where at least

              Physical Data View            Logical Data View

              Dept   Emp   Dpnd             Dept   Emp   Dpnd
Dept          Name   Name  Name             Name   Name  Name
  Emp         DeptX  Mary  Jim              DeptX  Mary  Jim
    Dpnd                   Sara             DeptX  Mary  Sara
                     Mary  Andy             DeptX  Mary  Andy

COUNT DpndName BY EmpName produces different results:

              DeptX  Mary  Dpnd Count=2     DeptX  Mary  Dpnd Count=3
                     Mary  Dpnd Count=1

Figure 5.10 Physical and logical views can produce different results.

DivisionView        Structured View        Cartesian Product

    Div             Div1                   Div1  ProdA  DeptX
  Prod  Dept          ProdA  DeptX         Div1  ProdB  DeptX
                      ProdB  DeptY         Div1  ProdA  DeptY
                                           Div1  ProdB  DeptY

SELECT Div, Prod, Dept     Div1  ProdA  DeptY     Div1  ProdA  DeptY
FROM DivisionView                ProdB            Div1  ProdB  DeptY
WHERE Dept="DeptY"

Figure 5.11 Multileg data selection semantics example.

DivisionView        Structured View        Cartesian Product

    Div             Div1                   Div1  ProdA  DeptX
  Prod  Dept          ProdA  DeptX         Div1  ProdB  DeptX
                      ProdB  DeptY         Div1  ProdA  DeptY
                                           Div1  ProdB  DeptY

SELECT Div, Prod, Dept     Div1  ProdA  DeptY     Div1  ProdA  DeptY
FROM DivisionView
WHERE Dept="DeptY"
AND Prod="ProdA"

Figure 5.12 Multileg AND selection qualification semantics example.

one entry in the Product leg is ProdA and at least one occurrence in the
Department leg is DeptY. This example also selects the data that is included
in the qualification criteria, so this data is also filtered. This means that
only values ProdA and DeptY are selected under their common parent Div1.
Notice how the Cartesian product model can support this processing one row
at a time as performed by relational processing. If the AND operator in the
WHERE clause were changed to an OR operator, the Cartesian product processing
would select rows with a Product value of ProdA or rows with a Department
value of DeptY. This produces the correct semantics even though replicated
values are also produced because of the Cartesian product effect. This is
shown in Figure 5.13.
As an important point on semantics, both conditions of an OR operation,
as in the SQL from Figure 5.13, have to be tested even if the first condition
tests true. In this query, the first selection condition, Dept="DeptY", is
true, but the outcome is still affected by the second selection condition,
Prod="ProdA",

DivisionView        Structured View        Cartesian Product

    Div             Div1                   Div1  ProdA  DeptX
  Prod  Dept          ProdA  DeptX         Div1  ProdB  DeptX
                      ProdB  DeptY         Div1  ProdA  DeptY
                                           Div1  ProdB  DeptY

SELECT Div, Prod, Dept     Div1  ProdA  DeptX     Div1  ProdA  DeptX
FROM DivisionView                ProdB  DeptY     Div1  ProdA  DeptY
WHERE Dept="DeptY"                                Div1  ProdB  DeptY
OR Prod="ProdA"

Figure 5.13 Multileg OR selection qualification semantics example.

which enables DeptX values to be displayed. This can be verified by comparing
this result to the result of the query in Figure 5.11, which only tests for
the selection condition Dept="DeptY" and therefore filters out DeptX values.
The OR Prod="ProdA" portion of the query selection condition in Figure 5.13
matches ProdA values, which will qualify their sibling segment values and
introduce them into the result, such as the value DeptX. If this still seems
illogical, consider that the results from each condition alone, when combined
through an OR operation, form the union of the two result sets, as in
Figure 5.13.
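
The three selections of Figures 5.11 through 5.13 can be checked against the
Cartesian product in a short sketch using Python's sqlite3 module. The base
tables behind DivisionView and their key columns are assumptions made for
illustration, and select is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Division   (Div TEXT);
    CREATE TABLE Product    (Prod TEXT, ProdDiv TEXT);
    CREATE TABLE Department (Dept TEXT, DeptDiv TEXT);
    INSERT INTO Division   VALUES ('Div1');
    INSERT INTO Product    VALUES ('ProdA', 'Div1'), ('ProdB', 'Div1');
    INSERT INTO Department VALUES ('DeptX', 'Div1'), ('DeptY', 'Div1');
    -- Sibling legs Prod and Dept both hang off Div, so every Prod row is
    -- combined with every Dept row of the same division.
    CREATE VIEW DivisionView AS
        SELECT Div, Prod, Dept
        FROM Division
        LEFT JOIN Product    ON Div = ProdDiv
        LEFT JOIN Department ON Div = DeptDiv;
""")

def select(where):
    """Run the figure's query with the given WHERE condition."""
    sql = f"SELECT Div, Prod, Dept FROM DivisionView WHERE {where} ORDER BY Prod, Dept"
    return conn.execute(sql).fetchall()

one_leg = select("Dept = 'DeptY'")                       # Figure 5.11
both_and = select("Dept = 'DeptY' AND Prod = 'ProdA'")   # Figure 5.12
either_or = select("Dept = 'DeptY' OR Prod = 'ProdA'")   # Figure 5.13

# The OR result is the union of the results of each condition alone.
union_of_parts = set(one_leg) | set(select("Prod = 'ProdA'"))
```

Qualifying only on Dept keeps both products, the AND form keeps a single row,
and the OR form brings DeptX back in through its sibling ProdA.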

5.11 Ordering of Data Structures Can Cause Their Restructuring

When a physical data structure is ordered (sorted) against its natural
structure by not following its path, the structure is changed to that of the
list of fields to be ordered. Ordering a physical structure like the one in
Figure 5.14 requires that the structure first be flattened so that it can be
sorted. This converts the physical structure to a logical structure. After a
data structure has been flattened, ordering it will affect its structure, as
shown in Figure 5.14.
Since relational databases use logical data structures, the ordering effect
shown in Figure 5.14 does not normally have to be a concern. But with the
one-sided outer join and its inherent hierarchical ordering shown in
Chapter 3, there may be some concern about going against the inherent data
structure produced by the outer join since there may be a semantics conflict.
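
The steps of Figure 5.14 (flatten, order, regroup) can be traced in a few
lines of plain Python. Representing the physical structure as a dictionary is
an assumption made only for this sketch.

```python
from itertools import groupby

# Physical structure from Figure 5.14: Dept over Emp.
old_structure = {"DeptA": ["EmpX", "EmpZ"], "DeptB": ["EmpY", "EmpX"]}

# Step 1: flatten the structure into one row per Dept/Emp path.
flattened = [(dept, emp) for dept, emps in old_structure.items() for emp in emps]

# Step 2: order by Emp, which goes against the original Dept-over-Emp path.
ordered = sorted((emp, dept) for dept, emp in flattened)

# Step 3: regroup the ordered rows; Emp is now over Dept.
new_structure = {
    emp: [dept for _, dept in group]
    for emp, group in groupby(ordered, key=lambda row: row[0])
}
```

After the sort, EmpX is a single parent with DeptA and DeptB beneath it, which
is the new Emp-over-Dept structure shown in the figure.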

Physical Structure                        Logical Structure

Dept               Order By Emp           Emp
  Emp              Produces:                Dept

Old Structure    Flattened      Ordered        New Structure

DeptA  EmpX      DeptA  EmpX    EmpX  DeptA    EmpX  DeptA
       EmpZ      DeptA  EmpZ    EmpX  DeptB          DeptB
DeptB  EmpY      DeptB  EmpY    EmpY  DeptB    EmpY  DeptB
       EmpX      DeptB  EmpX    EmpZ  DeptA    EmpZ  DeptA

Figure 5.14 Ordering can cause restructuring.

5.12 Data Structure Composition


Data structures are composed of records that include segments that consist of
data fields. To explain from the bottom up, fields are grouped into contiguous
segments. The fields in a segment are related closely by data content such as
name, street number, city, state, and ZIP code, and represent a given segment
type. Fields in a segment do not repeat, but segments can. These are called seg-
ment occurrences. Segment types are related in a fixed hierarchical data struc-
ture as in Figure 5.15. The top segment type is known as the root segment.
One occurrence of a root segment, its related segment types, and their segment
occurrences are known as a structured record.
This data structure definition fits into the common notion of a file con-
taining variable-length structured records where each record is composed of
segments that are arranged into a hierarchical data structure. Relational data-
bases as used in this book to model data structures can also fit naturally into this
definition. A relational database can be thought of as being composed of struc-
tured records, where the segment types represent the different tables and their

Structured Record N

Segment X

Occurrence N
Segment Y Segment Z
Occurrences Occurrences

Figure 5.15 Data structure composition.



segment occurrences represent the rows of the tables as shown in Figure 5.16.
These structured files can be supported directly by COBOL and by other struc-
tured languages by using an interface (with some variable segment occurrence
limitations). More detailed information on structured records can be found in
Chapter 14.

5.13 Good Data Modeling Design Principles


Ideally, data modeling is the defining of data structures whose semantics
reflect the defined data model. In this regard, good data modeling design is
important to data structure definition. The problems with nonhierarchical
structures were covered earlier in this chapter; here, we will concern
ourselves with basic normalization rules. These rules help avoid insertion,
deletion, and update anomalies, and they increase data independence through
increased use of joins. This means that they also affect data structures with
similar problems and advantages. The basic normalization rules are numbered
from first to third normal form. Usually, these rules are specified in a
building block fashion where third normal form includes second normal form
and second normal form includes first normal form; we will forgo this
requirement as explained below.
Except for first normal form, these basic normalization rules are about
good database design principles, which are normally associated with relational
databases but are also very applicable to segments of structured databases,
where segments are analogous to rows of tables. First normal form is a
restriction on SQL tables that forbids the use of repeating fields because of
the tables' fixed two-dimensional format. This is not necessarily a good
database design principle, only a relational design constraint. This SQL
restriction has been eroding, with established SQL vendors starting to support
nested relational tables (tables within tables).

Structured Record N

Table X
Row N
Table Y Table Z
Rows Rows

Figure 5.16 Relational data structure composition.



Second normal form does not permit any partial key dependencies. A
nonkey field (column or attribute) must not be functionally dependent on a
field that is only part of the primary key. In other words, every nonkey field
is fully dependent on the primary key. Third normal form requires every nonkey
field to be nontransitively dependent on the primary key. This means all
nonkey fields are directly dependent on the primary key. To correct these
potential design problems, the offending fields should be moved into another
table or segment where they obey these database design rules.
These basic normalization rules may not be enough to ensure a good database
design. Improper database design could still produce a condition known as
lossy decomposition, introduced by the basic normalization process that
breaks tables apart. Imagine breaking a table into two tables based on ZIP
code instead of account number. When these tables are reconstructed by a join
operation, the join introduces extraneous rows that were not in the original
table. This has the effect of obfuscating the semantics of the valid rows,
resulting in a loss of information. To solve this problem, a lossless join
property is needed that can be supplied by advanced normalization forms, known
as Boyce-Codd normal form, fourth normal form, and fifth normal form. The
first three basic normal forms explained above removed dependencies. In these
advanced normal forms, advanced dependencies that rely on superkeys are used
to support lossless joins. A superkey is any set of columns that uniquely
identifies a row; it may contain more columns than are strictly needed, so a
superkey that is broken down to a smaller key can still uniquely identify a
row. This eliminates the introduction of extraneous data when tables are
joined.
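
The ZIP code example can be made concrete with a small sketch using Python's
sqlite3 module. The Account table, its columns, and its two sample rows are
assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AcctNo INTEGER, Name TEXT, Zip TEXT);
    INSERT INTO Account VALUES (1, 'Mike', '12345'), (2, 'Mary', '12345');

    -- Lossy split: the two halves share only Zip, which is not a key.
    CREATE TABLE AcctZip AS SELECT AcctNo, Zip FROM Account;
    CREATE TABLE ZipName AS SELECT Zip, Name FROM Account;

    -- Lossless split: the two halves share AcctNo, which is a key.
    CREATE TABLE AcctName AS SELECT AcctNo, Name FROM Account;
""")

original = conn.execute(
    "SELECT AcctNo, Name, Zip FROM Account ORDER BY AcctNo, Name").fetchall()

# Rejoining on ZIP code pairs every account with every name in that ZIP.
lossy = conn.execute("""
    SELECT AcctNo, Name, Zip FROM AcctZip JOIN ZipName USING (Zip)
    ORDER BY AcctNo, Name
""").fetchall()

# Rejoining on the account number reproduces the original rows exactly.
lossless = conn.execute("""
    SELECT AcctNo, Name, Zip FROM AcctZip JOIN AcctName USING (AcctNo)
    ORDER BY AcctNo, Name
""").fetchall()
```

The lossy rejoin returns four rows where the original table had two. The valid
rows are still present, but they can no longer be told apart from the
extraneous ones, which is the loss of information described above.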

5.14 Conclusion

This chapter has identified and discussed the elements involved with data
modeling. These were the three-tier database architecture with its application
views and conceptual model; data relationships such as one-to-many,
many-to-one, and many-to-many; data structures such as hierarchical and
network; data structure processing as it relates to relational processing; the
semantics of multileg data structures; and good database design principles.
Network structures are necessary for the definition of the conceptual data
model, which needs the ability to define many different data views for the same
database (tables). However, if network data structures are used as application
views, there can be problems because data values in the structure can be
reached from multiple paths, making the view ambiguous. This allows invalid
assumptions to be made by nonprocedural languages. This is not true of hierar-
chical data structures, which are singular in meaning. This makes their seman-
tics very powerful in the nonprocedural processing of data structures.

Many-to-many relationships are not directly supported in relational databases,
and require the use of an association table. While this will involve
additional SQL to process the association table, it also makes it possible to
support intersecting data.
The Cartesian product is used in relational processing to enable flat
two-dimensional structures to be processed in a structured manner. There are
side effects caused by this process in the form of replicated values introduced to
fill the flat structure. This can hide the data structure and throw summaries off.
This was also shown when the difference between physical and logical data
views was covered earlier in this chapter. Also related to these last two items is
ordering the database view against its inherent data structure, which was also
discussed.
6
Outer Join Does Data Modeling
Previous standard versions of SQL have not supported the capability to
perform data modeling and complex data structure processing. The current SQL
standard does not officially claim to support data modeling and structure
processing either. But standard SQL does inherently support data modeling and
data structure processing through its new outer join operation. With knowledge
about this capability and instruction on how to use it, SQL users and vendors
can take advantage of this powerful capability.

6.1 SQL Data Modeling Using the Outer Join


Back in Chapter 2, it was shown how one-sided (LEFT and RIGHT) joins are
hierarchical in nature because they preserve unmatched rows in one table and
not the other. In a LEFT join, the left table is preserved so that the left
table is hierarchically over the right table. For example, in the LEFT join
Department LEFT JOIN Employee ON DeptNo=EmpDeptNo, departments can occur
without any matching employees, but employees cannot occur without a matching
department. These semantics precisely define the basic building blocks for
constructing a hierarchical data structure.
In one-sided joins involving more than two tables, the hierarchical effect
described above is extended such that Department LEFT JOIN Employee ON
DeptNo=EmpDeptNo LEFT JOIN Dependent ON EmpNo=DpndEmpNo pro-
duces the hierarchical structure shown in the Department view in Figure 6.1.
This is a simple one-leg hierarchy. But the outer join can also model and pro-
cess multileg (complex) data structures as in the Employee view, also shown in


Department View                 Employee View

Department                           Employee
  Employee                      Department  Dependent
    Dependent

SELECT * FROM Department        SELECT * FROM Employee
LEFT JOIN Employee              LEFT JOIN Department
  ON DeptNo=EmpDeptNo             ON DeptNo=EmpDeptNo
LEFT JOIN Dependent             LEFT JOIN Dependent
  ON EmpNo=DpndEmpNo              ON EmpNo=DpndEmpNo

Dept   Emp   Dpnd               Emp   Dept   Dpnd

DeptA  Mike  Jason              Mike  DeptA  Jason
DeptA  Mike  Jane               Mike  DeptA  Jane
DeptA  Mary  Sam                Mary  DeptA  Sam
DeptB  John  Null               John  DeptB  Null
DeptC  Null  Null               Bill  Null   Sara

Figure 6.1 Different outer join data structures comprised of the same relationships.

Figure 6.1. With the basic modeling capabilities shown in these data structures,
any hierarchical data structure can be modeled.
The relationships depicted in the Department view in Figure 6.1 are
one-to-many. One department has many employees, and one employee can
have many dependents. In the Employee view in Figure 6.1, the department to
employee one-to-many relationship shown in the Department view has been
flipped around to define an employee to department many-to-one relationship.
Both of the structures in Figure 6.1 use the same tables and the same rela-
tionships to derive different structures with different semantics. This is shown
in the differing query results in Figure 6.1 where department DeptC with no
employees can’t exist in the Employee view, and employee Bill can’t exist in the
Department view because he has no department designation. What triggers this
difference? Since the join relationships are identical, it wasn’t directly any of the
ON clauses. It was the initial LEFT join that reversed the Department and
Employee table arguments from the Department view, putting Employee over
Department. This in effect transformed the structure into the multileg struc-
ture shown in the Employee view in Figure 6.1. This is because the Employee
table is now hierarchically above the Department and Dependent tables and is
directly related to both of them through their ON clauses. This demonstrates
that ON clauses are also of importance by controlling the link (join) points
between the data structures.
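
The two views of Figure 6.1 can be run side by side with Python's sqlite3
module. To keep the sketch short, names double as key values (for example,
EmpNo holds 'Mike'); that simplification and the DpndNo column name are
assumptions, not part of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent  (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('DeptA'), ('DeptB'), ('DeptC');
    INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA'),
                                  ('John', 'DeptB'), ('Bill', NULL);
    INSERT INTO Dependent  VALUES ('Jason', 'Mike'), ('Jane', 'Mike'),
                                  ('Sam', 'Mary'), ('Sara', 'Bill');
""")

# Department view: Department over Employee over Dependent (one leg).
department_view = conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
""").fetchall()

# Employee view: same ON clauses, but Employee is now the preserved root,
# with Department and Dependent as two legs beneath it.
employee_view = conn.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
""").fetchall()
```

Only the table named first differs between the two statements, yet DeptC
appears only in the Department view and Bill (with his dependent Sara) appears
only in the Employee view.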

This flexible data modeling and data structure processing is possible
through a combination of the one-sided outer joins and the individualized join
criteria specified for each join relationship via the ON clause. The one-sided
outer join controls the hierarchical layering of tables, while the ON clause
controls the relationships or pathways between them.
Natural one-sided outer joins should not be used to model hierarchical
structures because they do not directly model hierarchical structures as
described in Chapter 4. But if they are used, they can be transformed to
non-natural one-sided joins, as described in Chapter 4, and then processed.
This is an optional feature, and is not necessary to perform complete data
modeling.
Using the ON clause, concatenated keys and path qualification can also
be supported. With a concatenated key, a key can be comprised of subfields
(multiple columns). For example, ON DeptNo1=EmpDeptNo1 AND
DeptNo2=EmpDeptNo2. This has the effect of concatenating a two-part key
and comparing the parts as one unified key. With path qualification, the join
criteria can also reference fields further up the path from the point being linked.
For example, when linking Dependent with Employee in the Department view
in Figure 6.1, the following link criteria are valid: ON EmpNo=DpndEmpNo
AND DpndVal=DeptVal. Notice that the referenced DeptVal column is at a
higher hierarchical level than the actual link point. Determining the link point
is described in the next section.
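
Both uses of the ON clause can be seen together in a minimal sketch using
Python's sqlite3 module. The ON conditions are the ones quoted above; the
table layouts and all data values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo1 TEXT, DeptNo2 TEXT, DeptVal TEXT);
    CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo1 TEXT, EmpDeptNo2 TEXT);
    CREATE TABLE Dependent  (DpndName TEXT, DpndEmpNo TEXT, DpndVal TEXT);
    INSERT INTO Department VALUES ('East', '01', 'V1'), ('East', '02', 'V2');
    INSERT INTO Employee   VALUES ('Mike', 'East', '01'), ('Mary', 'East', '02');
    INSERT INTO Dependent  VALUES ('Jason', 'Mike', 'V1'), ('Jane', 'Mike', 'V9'),
                                  ('Sam', 'Mary', 'V2');
""")

rows = conn.execute("""
    SELECT DeptNo1, DeptNo2, EmpNo, DpndName
    FROM Department
    LEFT JOIN Employee                      -- concatenated two-part key
        ON DeptNo1 = EmpDeptNo1 AND DeptNo2 = EmpDeptNo2
    LEFT JOIN Dependent                     -- link point is Employee; DeptVal
        ON EmpNo = DpndEmpNo                -- further up the path only
        AND DpndVal = DeptVal               -- qualifies the link
    ORDER BY DeptNo2, DpndName
""").fetchall()
```

Each employee matches only the department whose two key parts both agree, and
Jane is left out because her DpndVal does not match the DeptVal of Mike's
department.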
The minimum outer join requirements for data modeling and data struc-
ture processing are the support for the standard SQL LEFT join and the ON
clause. To fully support subviews comprised of outer join structures,
right-sided nesting (see Chapter 2) must also be supported. This means that
SQL view names can also be specified on the right side. Subviews specified on
the left side of the outer join operation require no special processing
requirements.
Using the standard SQL outer join, network structures can usually be
converted to hierarchical structures. This is accomplished by renaming tables
that have multiple entry points in the structure and including them in the
structure so that no single logical table has more than one entry point in the
structure. Figure 6.2 demonstrates how a network structure can be rewritten as
a hierarchical data structure using SQL renaming.
The SQL that defines the network structure in Figure 6.2 is ambiguous
since table X can be accessed from more than one path (via table B or table C),
making its meaning and semantics unstable. Each path has its own distinct
meaning, and the result can reflect either one. There may be situations where
these semantics are exactly what you may desire, but the unambiguous

Network Structure               Hierarchical Structure

      A                               A
    B   C                           B   C
      X                            X1   X2

SELECT * FROM A                 SELECT * FROM A
LEFT JOIN B ON A=B              LEFT JOIN B ON A=B
LEFT JOIN C ON A=C              LEFT JOIN C ON A=C
LEFT JOIN X ON B=X              LEFT JOIN X AS X1 ON B=X1
  OR C=X                        LEFT JOIN X AS X2 ON C=X2

Figure 6.2 Converting network SQL structures to hierarchical SQL structures.

power of the hierarchical structure (see Chapter 5) cannot be utilized in these
rare cases.

6.2 ON Clause Data Modeling Join Condition Rules


As demonstrated in Figure 6.2, there is a right way and a wrong way to join (or
link) tables to specify a valid hierarchical structure. Invalid structures are usu-
ally caused by the incorrect use of AND and OR operators in the ON condi-
tion. If the join condition rules are not followed, invalid or illogical structures
can be created that may produce inconsistent results. These rules pertain to
linking (joining) an upper structure to a lower structure when using a one-sided
outer join operation. In the case of a LEFT join, the higher structure is the
structure on the left side; in the case of a RIGHT join, the higher structure is
on the right side.
Normally, building a hierarchical data structure is performed top-down,
where the lower level table argument is usually a structure consisting of one
table since tables are being introduced and linked to the top structure one table
at a time. The lower level structure can also be comprised of multiple tables, as
in Figure 6.3. These multitable subviews will be described in more detail in
Chapter 7.
The following three basic ON clause join condition rules apply to each
ON clause join condition in outer join statements that are modeling hierarchi-
cal structures.
The first rule specifies that the top and bottom structures must both
be referenced in each ON clause join condition or subcondition (described

                   DeptEmp View          Resulting Structure

Dependent  LINK    Department            Dependent
                     Employee              Department
                                             Employee

SELECT * FROM Dependent LEFT JOIN DeptEmp
  ON DpndEmpNo=EmpNo

Figure 6.3 Example of breaking link rule three to build a hierarchical structure.

below). This is necessary to specify a complete path from the upper structure’s
link point to the lower structure’s link point. The link point is a specific table in
the upper and lower structures determined by the specification of the ON
clause join condition that joins (or links) the upper and lower structures. The
determination of the link points is specified in the second and third ON clause
join condition rules described directly below.
The second rule applies to the top structure. In the top structure, only one
single path can be referenced from the link point up the path to the root. Refer-
encing multiple paths using AND or OR operators creates an ambiguous net-
work structure, as demonstrated in the network structure in Figure 6.2. When
using AND and OR conditions in the ON clause, OR clauses create subclauses
that can consist of AND operations. When referencing multiple locations along
a path in the upper level structure, the lowest table referenced in each OR
subcondition becomes the link point, and the link point in each OR subclause
must specify the same link point table; otherwise, a network or illogical struc-
ture is created. When the link point in the upper level structure is not the low-
est level point on its path, a new leg of the structure is created. This can be seen
in Figure 6.1 when in the Employee view the Dependent table is joined to the
Employee table, forming a multileg structure.
The third and final rule applies to the bottom structure. In the bottom
structure, only the root (top) table can be referenced. This is necessary to
preserve the top-down processing of hierarchical structures that is normally
expected. While breaking this rule may limit some of the advantages of a strict
hierarchy, it is possible to link to a lower level structure based on table columns
below the root of the lower structure. Regardless of which table or tables
are referenced below the root, the root table should still be treated as the
bottom structure link point, as demonstrated in Figure 6.3. The exact se-
mantics of this unconventional hierarchical structure will be covered in

Chapter 15, but up until then this text assumes that the third linking rule is
always obeyed.

6.3 Valid and Invalid ON Clause Data Modeling Examples


In the network structure in Figure 6.4, there is an example of how an OR clause
can cause a network structure to be created. In determining the upper structure
link point, one side of the OR isolates table B and the other side isolates
table A. Since the two sides isolate different link points, table C can
sometimes be reached through table B and sometimes directly from table A,
making it a network structure, which is ambiguous for an application view.
The second ON clause for the hierarchical structure in Figure 6.4 demon-
strates how the AND clause can be used to qualify the path further up. The sec-
ond ON clause goes with the second LEFT join, which is linking table C to
table B. The lowest referenced table in the upper level structure’s selected
leg—table B—is determined as the link point. But as shown here, a higher level
table on the path—table A—can also be referenced to further qualify the link
condition without altering the link point.
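
The difference between the two ON clauses of Figure 6.4 can be observed in a
small sketch using Python's sqlite3 module. The figure's abstract conditions
A=B, B=C, and A=C are spelled out here with hypothetical key columns, and all
data values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AKey TEXT);
    CREATE TABLE B (BKey TEXT, BA TEXT);              -- BA links B to A
    CREATE TABLE C (CName TEXT, CB TEXT, CA TEXT);    -- CB links to B, CA to A
    INSERT INTO A VALUES ('a1'), ('a2');
    INSERT INTO B VALUES ('b1', 'a1');                -- a2 has no B row
    INSERT INTO C VALUES ('c1', 'b1', 'a1'), ('c2', 'b9', 'a1'),
                         ('c3', 'b1', 'a9'), ('c4', 'b7', 'a2');
""")

def link_c(operator):
    """Join C to the A-over-B structure using OR or AND in the ON clause."""
    return conn.execute(f"""
        SELECT AKey, BKey, CName
        FROM A
        LEFT JOIN B ON AKey = BA
        LEFT JOIN C ON BKey = CB {operator} AKey = CA
        ORDER BY AKey, CName
    """).fetchall()

or_rows = link_c("OR")     # two link points: B or A (network structure)
and_rows = link_c("AND")   # one link point: B, further qualified by A
```

With OR, row c4 is attached to a2 even though a2 has no B row at all, so C has
been reached around B. With AND, C is only ever reached through B, and a2
simply has no C data.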
In the first structure in Figure 6.5, there is an example of how an AND
clause can cause an invalid structure to be created. In this example, X is reach-
able only from both paths at the same time because of the AND operator.
While the form of this structure resembles a network structure as shown in
Figures 6.2 and 6.4, it does not behave as a typical network structure. Its

Network Structure               Hierarchical Structure

A                               A
  B                               B
    C                               C

SELECT * FROM A                 SELECT * FROM A
LEFT JOIN B ON A=B              LEFT JOIN B ON A=B
LEFT JOIN C ON B=C              LEFT JOIN C ON B=C
  OR A=C                          AND A=C

Figure 6.4 The difference between OR and AND operators when linking structures.

Invalid Structure               Hierarchical Structure

      A                         A
    B   C                         B
      X                             C

SELECT * FROM A                 SELECT * FROM A
LEFT JOIN B ON A=B              LEFT JOIN B ON A=B
LEFT JOIN C ON A=C              LEFT JOIN C
LEFT JOIN X                       ON C="X" AND B="Y"
  ON B=X AND C=X                  OR B=C AND B=A

Figure 6.5 Valid and invalid AND operator use.

behavior can be considered illogical. Again, this does not mean that there is not
some possible use for the semantics of this structure.
The second ON clause for the hierarchical structure in Figure 6.5 demon-
strates how the OR operator can be used to specify a choice of two OR
subconditions because each OR subcondition isolates the same two link points:
tables B and C. The reference to table A in the upper structure is disregarded in
determining the link point since table B is at a lower level. This example also
demonstrates that the join condition does not always have to compare two col-
umns directly to each other (i.e., C=“X” AND B=“Y” ). The link can be satisfied
as long as each subclause references a table from each structure and satisfies the
join condition rules described in Section 6.2.

6.4 Valid and Invalid Data Modeling Results


In Section 6.3, we saw how to create valid and invalid application data struc-
tures; examining the results produced by them can be very useful and insight-
ful. The example in Figure 6.6 demonstrates the effect of a network structure
with multiple paths to data. Each path has its own semantics (meaning)
which can produce a combination result that can be ambiguous, as shown in
Figure 6.6. Path 1 represents the managers for a selected product. Path 2
represents the managers for a department. As discussed in Chapter 5, network data
structures taken on their own are ambiguous. This means a self-navigating 4GL
database language like SQL would also produce an ambiguous result since it is
free to take either path to the data (as shown in Figure 6.6), which combines
managers for both products and departments.

Division Network View

     Path 1     Division     Path 2
          Product     Department
                Manager

DEFINE VIEW DivisionView AS
SELECT * FROM Division
LEFT JOIN Product ON DivNo=ProdDivNo
LEFT JOIN Department ON DivNo=DeptDivNo
LEFT JOIN Employee AS Manager ON
     EmpNo=ProdMgrNo OR EmpNo=DeptMgrNo

Path 1 values:                     Path 2 values:

Division  Product  Manager         Division  Department  Manager
DivX      ProdZ    Mike            DivX      DeptY       John
                   John                                  Mary

SELECT Division, Manager FROM DivisionView

Ambiguous result:

Division  Manager
DivX      Mike
DivX      John
DivX      Mary

Figure 6.6 Network structure produces ambiguous results.

We saw in Section 6.1 that ambiguous network structures can also be
respecified in standard SQL as nonambiguous hierarchical data structures.
Using this conversion technique, the example in Figure 6.7 was changed from
the ambiguous network structure shown in the example in Figure 6.6 to a
nonambiguous hierarchical data structure. This hierarchical data structure pre-
vents the ambiguous single result of both managers of products and managers
of departments produced in Figure 6.6, allowing the two separate
nonambiguous results of managers of products and managers of departments
shown in Figure 6.7. These results are not possible by default in the ambiguous
network view above. They are possible in the hierarchical structure below
because each path is kept separate, allowing the paths to be queried separately.
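
The contrast between the network view and the hierarchical view can be tried out with a small sketch. What follows is only an illustration under stated assumptions: it uses Python's sqlite3 module as a stand-in SQL engine, CREATE VIEW in place of the DEFINE VIEW notation used in the figures, explicit column lists instead of SELECT *, and invented sample rows (two products and two departments for DivX). The view names NetworkView and HierarchicalView and the helper managers() are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Division   (DivNo TEXT);
CREATE TABLE Product    (ProdNo TEXT, ProdDivNo TEXT, ProdMgrNo TEXT);
CREATE TABLE Department (DeptNo TEXT, DeptDivNo TEXT, DeptMgrNo TEXT);
CREATE TABLE Employee   (EmpNo TEXT);

-- Illustrative data only (not taken from the book's tables).
INSERT INTO Division   VALUES ('DivX');
INSERT INTO Product    VALUES ('ProdZ', 'DivX', 'Mike'), ('ProdW', 'DivX', 'John');
INSERT INTO Department VALUES ('DeptY', 'DivX', 'John'), ('DeptV', 'DivX', 'Mary');
INSERT INTO Employee   VALUES ('Mike'), ('John'), ('Mary');

-- Network structure: Manager is reachable through either leg (OR).
CREATE VIEW NetworkView AS
SELECT DivNo AS Division, Manager.EmpNo AS Manager
FROM Division
LEFT JOIN Product    ON DivNo = ProdDivNo
LEFT JOIN Department ON DivNo = DeptDivNo
LEFT JOIN Employee AS Manager
     ON Manager.EmpNo = ProdMgrNo OR Manager.EmpNo = DeptMgrNo;

-- Hierarchical structure: one Employee alias per leg keeps the paths apart.
CREATE VIEW HierarchicalView AS
SELECT DivNo AS Division, ProdMgr.EmpNo AS ProdMgr, DeptMgr.EmpNo AS DeptMgr
FROM Division
LEFT JOIN Product    ON DivNo = ProdDivNo
LEFT JOIN Department ON DivNo = DeptDivNo
LEFT JOIN Employee AS ProdMgr ON ProdMgr.EmpNo = ProdMgrNo
LEFT JOIN Employee AS DeptMgr ON DeptMgr.EmpNo = DeptMgrNo;
""")


def managers(view, column):
    """Return the distinct values of one manager column, sorted by name."""
    sql = f"SELECT DISTINCT {column} FROM {view} ORDER BY {column}"
    return [row[0] for row in conn.execute(sql)]


network_managers = managers("NetworkView", "Manager")      # both paths mixed
product_managers = managers("HierarchicalView", "ProdMgr")  # path 1 only
dept_managers = managers("HierarchicalView", "DeptMgr")     # path 2 only

print(network_managers, product_managers, dept_managers)
```

With this sample data, the network view can only return the combined list of managers, while the hierarchical view lets each leg be queried on its own.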

6.5 Substructure Views


The syntax and semantics of the standard SQL outer join inherently and seam-
lessly support stored substructure views. Substructure views can be specified
anywhere a table can. These stored views can be used to form larger data struc-
tures. The result of these combined substructures follows the hierarchical
semantics as dictated by the newly formed structure. When linking these
substructures, the same rules apply as those defined earlier in this chapter for
building structures. In particular, the ON clause rules in Section 6.2 must be
followed.

Division Hierarchical View

     Path 1     Division     Path 2
          Product     Department
          ProdMgr     DeptMgr

DEFINE VIEW DivisionView AS
SELECT * FROM Division
LEFT JOIN Product ON DivNo=ProdDivNo
LEFT JOIN Department ON DivNo=DeptDivNo
LEFT JOIN Employee AS ProdMgr
     ON ProdMgr.EmpNo=ProdMgrNo
LEFT JOIN Employee AS DeptMgr
     ON DeptMgr.EmpNo=DeptMgrNo

Path 1 values:                     Path 2 values:

Division  Product  Manager         Division  Department  Manager
DivX      ProdZ    Mike            DivX      DeptY       John
                   John                                  Mary

After converting the data structure from a network to a hierarchical
structure, the following unambiguous set of queries can now be
issued with the following results:

SELECT Division,ProdMgr            SELECT Division,DeptMgr
FROM DivisionView                  FROM DivisionView

Division  ProdMgr                  Division  DeptMgr
DivX      Mike                     DivX      John
DivX      John                     DivX      Mary

Figure 6.7 Network structure converted to hierarchy produces unambiguous results.

As mentioned in Chapter 2, right-sided nesting is required to support
stored structured views, or more precisely the ability of the outer join syntax to
support the simultaneous building and handling of multiple data structures.
Take for example: (A LEFT JOIN B ON A=B) LEFT JOIN (C LEFT JOIN D
ON C=D) ON B=C. The parentheses have been added to make the outer join
statement clearer, but are unnecessary since the join order is controlled by the
placement of the ON clauses (see Chapter 2). The join operations in parenthe-
ses are performed first, forming separate structures, each stored in a different
working set before they are combined into one structure following the last,
rightmost ON clause. The LEFT join operations enclosed in the parentheses
can be thought of as two stored structured views that have been expanded into
their representative SQL when inline expansion is used by the SQL system.
When the inline expansion of the stored structured views occurred in the
above SQL, notice what happened to the rightmost ON clause. It got pushed
to the right, causing right-sided nesting. Fortunately, the standard SQL syntax
handles this situation properly to support inline expansion. With stored struc-
tured views, this right-sided nesting occurs transparently, so the SQL program-
mer need not normally be concerned with right-sided nesting. The
transparency of this operation is demonstrated in Figure 6.8.
The Department view’s SQL in Figure 6.8 demonstrates how the embed-
ded subview EmpView is expanded to define the Department data structure.
While the semantics of the expanded Department SQL are the same as the
depicted Department structure, the order that the joins are performed is
now from the bottom up instead of from the top down. The reason the sem-
antics remain the same is that with hierarchical structures you can build
them up, down, or in any order and the semantics remain the same as was
described in Chapter 3. There is one caveat when building a structure upwards:
when the ON clause references a field further up the structure than the
upper link point, the upper level structure must contain all references at
the time of the join. This should not present a problem for stored views
since they should only be referencing columns in their own view domain.

EmpView View

  Employee             DEFINE EmpView AS
  Dependent            SELECT * FROM Employee
                       LEFT JOIN Dependent
                       ON EmpNo=DpndEmpNo

Department View

  Department           SELECT * FROM Department
  Employee             LEFT JOIN EmpView
  Dependent            ON DeptNo=EmpDeptNo

                       Expanded View:
                       SELECT * FROM Department
                       LEFT JOIN
                       (Employee LEFT JOIN Dependent
                        ON EmpNo=DpndEmpNo)
                       ON DeptNo=EmpDeptNo

Figure 6.8 Embedded structure view expansion.



Since the stored subview is expanded or materialized when invoked, any
recent changes to the subview are automatically in effect. So, the support
of subviews is very useful and important. Structured views embedded
within structured views are also naturally supported; this is covered in
Chapter 8.
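
The equivalence between invoking a stored substructure view and writing out its right-nested expansion can be checked with a short sketch. This is an illustration only, with several assumptions: Python's sqlite3 module stands in for the SQL system, CREATE VIEW replaces the DEFINE notation of Figure 6.8, the nested join is written with explicit parentheses, and the sample rows and the helper fetch() are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);
CREATE TABLE Dependent  (DpndNo TEXT, DpndEmpNo TEXT);

-- Illustrative data: DeptC has no employees, John has no dependents.
INSERT INTO Department VALUES ('DeptA'), ('DeptB'), ('DeptC');
INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA'), ('John', 'DeptB');
INSERT INTO Dependent  VALUES ('Jason', 'Mike'), ('Jane', 'Mike'), ('Sam', 'Mary');

-- Stored substructure view: Employee over Dependent.
CREATE VIEW EmpView AS
SELECT * FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
""")

# The view used where a table could be used.
VIA_VIEW = """
SELECT DeptNo, EmpNo, DpndNo
FROM Department LEFT JOIN EmpView ON DeptNo = EmpDeptNo
ORDER BY DeptNo, EmpNo, DpndNo
"""

# The same structure with the view expanded inline (right-sided nesting).
EXPANDED = """
SELECT DeptNo, EmpNo, DpndNo
FROM Department
LEFT JOIN (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo)
     ON DeptNo = EmpDeptNo
ORDER BY DeptNo, EmpNo, DpndNo
"""


def fetch(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


view_rows = fetch(VIA_VIEW)
expanded_rows = fetch(EXPANDED)
print(view_rows == expanded_rows)
for row in expanded_rows:
    print(row)
```

In this sketch both statements preserve DeptB (an employee without dependents) and DeptC (no employees at all), which is the behavior expected of a Department over Employee over Dependent hierarchy.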

6.6 WHERE Clause Filtering with Data Structures


Before the existence of the standard SQL join operation, the WHERE clause
had two functions: to specify the join criteria and to specify selection criteria.
With the standard SQL join, the ON clauses are used to specify the join crite-
ria, while the WHERE clause is used primarily to specify the selection criteria.
This does not change when the outer join is used to perform data modeling.
The WHERE clause filters the data structure—it can be specified with a stored
view and/or at the time of the view invocation.
As you would expect, ON clauses cannot be specified on join view invo-
cations, so the WHERE clause is the only way to influence query operation at
the time of view invocation. This does not take away from the outer join’s data
modeling capability; in fact, it strengthens it because the data structure of a
stored view cannot be changed when invoked, thereby protecting its integrity.
In this way, the stored structure view can only be filtered with the specification
of a WHERE clause, which cannot change the structure of the data being
filtered.
The WHERE clause operates on the records or rows of the view. It identi-
fies data that is selected along with all of its associated data in the record or row.
For example, the WHERE clause in Figure 6.9, applied to the employee data
from the Employee view in Figure 6.1, selects in their entirety only the rows
containing employees of department DeptA; all of the other rows are
discarded. For more information on data structure filtering semantics, refer to
Chapter 5 and Chapter 7.

SELECT * FROM EmployeeView WHERE DeptNo="DeptA"

Produces:
Emp Dept Dpnd
Mike DeptA Jason
Mike DeptA Jane
Mary DeptA Sam

Figure 6.9 WHERE clause filtering works with data structures.
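
A WHERE clause supplied at view invocation can be sketched as follows. The example is hypothetical in its details: it assumes Python's sqlite3 module, an EmployeeView defined with CREATE VIEW over Employee, Department, and Dependent tables, and sample rows chosen to resemble the values shown in the figures of this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Dependent  (DpndNo TEXT, DpndEmpNo TEXT);

INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA'),
                              ('John', 'DeptB'), ('Bill', NULL);
INSERT INTO Department VALUES ('DeptA'), ('DeptB');
INSERT INTO Dependent  VALUES ('Jason', 'Mike'), ('Jane', 'Mike'), ('Sam', 'Mary');

-- Assumed shape of the stored view: Employee over Department and Dependent.
CREATE VIEW EmployeeView AS
SELECT EmpNo AS Emp, DeptNo AS Dept, DpndNo AS Dpnd
FROM Employee
LEFT JOIN Department ON DeptNo = EmpDeptNo
LEFT JOIN Dependent  ON EmpNo = DpndEmpNo;
""")

# Unfiltered view: every employee is preserved by the outer joins.
all_rows = conn.execute(
    "SELECT Emp, Dept, Dpnd FROM EmployeeView ORDER BY Emp, Dpnd"
).fetchall()

# Filter applied at invocation: whole rows are kept or discarded,
# but the structure defined by the view cannot be altered.
dept_a_rows = conn.execute(
    "SELECT Emp, Dept, Dpnd FROM EmployeeView "
    "WHERE Dept = 'DeptA' ORDER BY Emp, Dpnd"
).fetchall()

print(len(all_rows), dept_a_rows)
```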



6.7 WHERE Clause Filtering with Substructures


Normally, WHERE clauses with stored substructures are not needed and are
not recommended except for the one case explained below. ON clauses can
be used to specify most filtering requirements for substructures. WHERE
clauses that filter data based on filtering criteria from below the root of the sub-
structure present a problem because they are not following strict hierarchical
rules. This is because higher level data is being deleted based on values from
lower structure levels, because the entire path length is filtered by the WHERE
clause. While not generally recommended, this situation can be hierarchically
handled by following special operational precautions, which are discussed in
Chapter 15.
ON clauses for hierarchical substructure views cannot be used to filter
the root of the structure because ON clause filtering of hierarchical structures
only affects the lower structure. In this situation, a WHERE clause can be
specified in the stored substructure view to filter the root level based on the
root values. This is shown in the EmpView in Figure 6.10. This filtering
operation can be automatically moved to the ON clause that controls the linking
of this substructure when it is processed. This transformation allows the
substructure to be integrated seamlessly into the overall structure, and allows
a top-to-bottom processing order to process the substructure. This is also
shown in Figure 6.10.

EmpView View

  Employee             DEFINE EmpView AS
  Dependent            SELECT * FROM Employee
                       LEFT JOIN Dependent
                       ON EmpNo=DpndEmpNo
                       WHERE EmpAge>55

Department View

  Department           SELECT * FROM Department
  Employee             LEFT JOIN EmpView
  Dependent            ON DeptNo=EmpDeptNo

                       Expanded View:
                       SELECT * FROM Department
                       LEFT JOIN
                       Employee LEFT JOIN Dependent
                       ON EmpNo=DpndEmpNo
                       ON DeptNo=EmpDeptNo
                       AND EmpAge>55

Figure 6.10 WHERE clause transformation for filtering substructure root.

In Figure 6.10, moving the WHERE clause data filter of the subview
higher up to the ON clause of the join that controls linking the subview works
because the filtering applies to the total subview, just as the WHERE clause
would have.
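
The effect of moving the subview's root filter into the linking ON clause can be sketched as below. This is a hedged illustration rather than the book's implementation: it assumes Python's sqlite3 module, CREATE VIEW instead of the DEFINE notation, explicit parentheses around the nested join, and invented ages and rows. The third query, with the filter left as an outer WHERE clause, is included only for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT, EmpAge INTEGER);
CREATE TABLE Dependent  (DpndNo TEXT, DpndEmpNo TEXT);

-- Illustrative data: only Mike is older than 55.
INSERT INTO Department VALUES ('DeptA'), ('DeptB');
INSERT INTO Employee   VALUES ('Mike', 'DeptA', 60), ('Mary', 'DeptA', 40),
                              ('John', 'DeptB', 30);
INSERT INTO Dependent  VALUES ('Jason', 'Mike'), ('Sam', 'Mary');

-- Substructure view whose WHERE clause filters its own root (Employee).
CREATE VIEW EmpView AS
SELECT * FROM Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo
WHERE EmpAge > 55;
""")

ORDER = " ORDER BY DeptNo, EmpNo, DpndNo"

# 1. The subview linked under Department.
VIA_VIEW = ("SELECT DeptNo, EmpNo, DpndNo FROM Department "
            "LEFT JOIN EmpView ON DeptNo = EmpDeptNo" + ORDER)

# 2. Expanded form: the root filter moved to the linking ON clause.
MOVED_TO_ON = ("SELECT DeptNo, EmpNo, DpndNo FROM Department "
               "LEFT JOIN (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo) "
               "ON DeptNo = EmpDeptNo AND EmpAge > 55" + ORDER)

# 3. For contrast: the filter left as a WHERE clause on the whole structure.
OUTER_WHERE = ("SELECT DeptNo, EmpNo, DpndNo FROM Department "
               "LEFT JOIN (Employee LEFT JOIN Dependent ON EmpNo = DpndEmpNo) "
               "ON DeptNo = EmpDeptNo WHERE EmpAge > 55" + ORDER)

view_rows = conn.execute(VIA_VIEW).fetchall()
on_rows = conn.execute(MOVED_TO_ON).fetchall()
where_rows = conn.execute(OUTER_WHERE).fetchall()
print(view_rows, on_rows, where_rows, sep="\n")
```

With this data, the first two queries agree and keep DeptB with an empty employee leg, whereas the outer WHERE form also discards DeptB, which is why the filter belongs in the ON clause after expansion.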

6.8 Complex Data Modeling Example


So far, we have been using the fairly simple Department/Employee database to
demonstrate how the SQL-92 join operation can perform data modeling. The
multimedia book example in Figure 6.11 is a more complex data modeling
example, consisting of a different subject matter that should demonstrate that
a hierarchical data model of any complexity can be easily modeled with
the SQL-92 join operation, and it will continue to obey hierarchical semantic
principles.

SELECT * FROM Book                               MMBook
LEFT JOIN Contents
     ON BookX=ContentsX             Contents     Chapter     Index
LEFT JOIN Chapter
     ON BookX=ChapterX                           Section
LEFT JOIN Index
     ON BookX=IndexX                Text         Audio       Scene
LEFT JOIN Section
     ON SectionX=ChapterX                                    Clip
LEFT JOIN Text
     ON SectionX=TextX
LEFT JOIN Audio
     ON SectionX=AudioX
LEFT JOIN Scene
     ON SectionX=SceneX
LEFT JOIN Clip
     ON SceneX=ClipX

Figure 6.11 Multimedia book data modeling example.



6.9 Conclusion
Building hierarchical data structures and the structured processing of them is
possible with the one-sided outer join operation. This building of hierarchical
data structures or combining of hierarchical data structures involves two opera-
tions. First, the placement or specification of which structure is hierarchically
over the other, and second, the specification of the pathway from the link
points from the upper structure to the lower structure. The first operation is
accomplished using a LEFT or RIGHT outer join that places one structure
hierarchically above the other, and the second operation, specifying pathways,
is specified by ON clauses. Both of these operations are required to model hier-
archical data structures. Data structures modeled in such a fashion can still be
filtered by the inclusion of a WHERE clause in the data structure definition
and/or view invocation.
Amazingly, the syntax of the standard SQL join operation naturally sup-
ports the use of substructure views as standard SQL views. These structured
subviews can be used anywhere a table can be specified to combine with other
structures to form larger data structures. These substructure views can also be
embedded in other structure views.
Also shown in this chapter was the capability for the outer join operation
to create ambiguous network data structures and illogical structures. While
these structures do not have the same powerful semantics as hierarchical data
structures, they still may be useful in certain specialized situations that the user
may have. Unfortunately, when these structures are used, it is usually by
accident. The knowledge of how to construct hierarchical structures can also
prevent ambiguous and illogical structures from being built unintentionally.
7
Outer Join Data Modeling–Related
Capabilities
This chapter covers powerful capabilities and features that inherently accom-
pany or enhance the standard SQL outer join data modeling capability. For this
reason, they are automatically available for database professionals to use if they
know that they exist and how to use them.

7.1 Data Structure Filtering


The inherent data modeling capability of the outer join also supports data fil-
tering that operates by naturally following the semantics of the outer join speci-
fied data structure. This gives the data structure filtering capability a very fine
filtering control. Normally, filtering criteria such as DpndStat="Active" are
specified in the WHERE clause. But when data modeling is being performed
by the outer join, data filtering criteria can be specified on the ON clause along
with the join criteria. When this is done, the ON clause not only specifies how
its upper and lower structures are linked, but also the data filtering criteria.
This filtering affects only the lower level structure being joined; the upper
(main) structure is not affected. In this way, its operation is following the
semantics of the data structure. The big difference in ON clause filtering from
WHERE clause filtering is that WHERE clause data filtering removes entire
rows while ON clause filtering operates only on specific portions of rows. This
can be seen in Figure 7.1.


Employee View                      Employee View Data

        Employee                   EmpNo  DeptNo  DpndNo  DpndStat
  Dependent   Department           Mike   DeptA   Jason   Active
                                   Mike   DeptA   Jane    Active
                                   Mary   DeptA   Sam     Inactive

WHERE clause filtering:            ON clause filtering:

SELECT * FROM Employee             SELECT * FROM Employee
LEFT JOIN Dependent                LEFT JOIN Dependent
     ON EmpNo=DpndEmpNo                 ON EmpNo=DpndEmpNo
LEFT JOIN Department                    AND DpndStat="Active"
     ON DeptNo=EmpDeptNo           LEFT JOIN Department
WHERE DpndStat="Active"                 ON DeptNo=EmpDeptNo

EmpNo  DeptNo  DpndNo  DpndStat    EmpNo  DeptNo  DpndNo  DpndStat
Mike   DeptA   Jason   Active      Mike   DeptA   Jason   Active
Mike   DeptA   Jane    Active      Mike   DeptA   Jane    Active
                                   Mary   DeptA   Null    Null

Figure 7.1 ON clause versus WHERE clause data filtering.

The purpose of Figure 7.1 is to demonstrate the difference between ON
clause and WHERE clause data filtering. It does this by first showing the
Employee structure and its data, which is listed to its right. Underneath this are
two outer join SELECT statements that both model the Employee structure
above it. The first outer join statement uses WHERE clause filtering and the
second outer join statement to its right uses ON clause filtering. Both filtering
examples remove inactive dependents who are not currently covered under the
company’s medical benefits. The WHERE clause filtering removes entire rows
since it is performed logically after the complete row is assembled. The ON
clause neatly filters specific paths in the data structure, preserving all other
unrelated data. In this example, Mary’s dependent son Sam is currently inactive
for medical coverage and he is filtered out, while the rest of the unrelated data
for this row is preserved. This is not true for WHERE clause data filtering,
which also causes Mary’s entire row to be removed.
The unrelated data not affected by the ON clause filtering in Figure 7.1 is
employee data, which is above the dependent data, and the department data,
which is in an unrelated leg of the data structure. If the Dependent table had
other tables under it, then these tables could be affected by the ON clause filter-
ing, as you would expect. This follows the semantics of the Employee data
structure, making it useful for specifying business rules.
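
The two filtering styles compared in Figure 7.1 can be reproduced with a small sketch. It is offered as an illustration under assumptions: Python's sqlite3 module is used as the SQL engine, explicit column lists replace SELECT *, and the table layouts are guessed from the column names shown in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Dependent  (DpndNo TEXT, DpndEmpNo TEXT, DpndStat TEXT);

INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA');
INSERT INTO Department VALUES ('DeptA');
INSERT INTO Dependent  VALUES ('Jason', 'Mike', 'Active'),
                              ('Jane',  'Mike', 'Active'),
                              ('Sam',   'Mary', 'Inactive');
""")

# WHERE clause filtering: applied after the full row is assembled.
WHERE_FILTER = """
SELECT EmpNo, DeptNo, DpndNo, DpndStat
FROM Employee
LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
LEFT JOIN Department ON DeptNo = EmpDeptNo
WHERE DpndStat = 'Active'
ORDER BY EmpNo, DpndNo
"""

# ON clause filtering: applied only to the Dependent leg being joined.
ON_FILTER = """
SELECT EmpNo, DeptNo, DpndNo, DpndStat
FROM Employee
LEFT JOIN Dependent  ON EmpNo = DpndEmpNo AND DpndStat = 'Active'
LEFT JOIN Department ON DeptNo = EmpDeptNo
ORDER BY EmpNo, DpndNo
"""

where_rows = conn.execute(WHERE_FILTER).fetchall()
on_rows = conn.execute(ON_FILTER).fetchall()

print("WHERE:", where_rows)  # Mary's row is removed entirely
print("ON:   ", on_rows)     # Mary and her department are preserved
```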
The ON clause rules for building hierarchical structures that were defined
in Chapter 6 must still be observed when supplying ON clause data-filtering
criteria. Basically, this means that any tables referenced by the ON clause filter-
ing criteria must be limited to the root of the lower level structure or any tables
from the link point up the path to the root. In this way, the data filtering crite-
ria cannot inadvertently affect the link points that would change the structure
being modeled and its semantics.

7.2 Indirect Structure Linking


In some cases, it may be desirable to link a table or substructure under a table in
the upper structure that can’t be directly linked to. This can be accomplished
using an indirect link—for example, linking Dependent to Department, which
is linked under Employee. In this case, Dependent is linked to Employee, but
indirectly through Department, which means the department for an employee
must exist for the dependents of that employee to exist. As shown in Figure 7.2,
this is done using an existence test for Department since Dependent is not
directly related to Department.
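
An indirect link of this kind can be sketched as follows. The example is illustrative and makes several assumptions: Python's sqlite3 module as the SQL engine, the standard IS NOT NULL predicate for the existence test, and invented rows in which Bill refers to a department that does not exist and has a hypothetical dependent named Tim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Dependent  (DpndNo TEXT, DpndEmpNo TEXT);

-- Bill's department (DeptZ) is missing from Department.
INSERT INTO Employee   VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA'),
                              ('John', 'DeptB'), ('Bill', 'DeptZ');
INSERT INTO Department VALUES ('DeptA'), ('DeptB');
INSERT INTO Dependent  VALUES ('Jason', 'Mike'), ('Jane', 'Mike'),
                              ('Sam', 'Mary'), ('Tim', 'Bill');
""")

# Direct link: Dependent hangs under Employee, regardless of Department.
DIRECT = """
SELECT EmpNo, DeptNo, DpndNo
FROM Employee
LEFT JOIN Department ON DeptNo = EmpDeptNo
LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
ORDER BY EmpNo, DpndNo
"""

# Indirect link: Dependent is linked by employee number, but only when
# the employee's Department row exists (existence test on DeptNo).
INDIRECT = """
SELECT EmpNo, DeptNo, DpndNo
FROM Employee
LEFT JOIN Department ON DeptNo = EmpDeptNo
LEFT JOIN Dependent  ON EmpNo = DpndEmpNo AND DeptNo IS NOT NULL
ORDER BY EmpNo, DpndNo
"""

direct_rows = conn.execute(DIRECT).fetchall()
indirect_rows = conn.execute(INDIRECT).fetchall()
print(direct_rows[0], indirect_rows[0])
```

In this sketch, Bill's dependent appears under the direct link but is suppressed under the indirect link because Bill has no Department row; the other employees are unaffected.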

SELECT * FROM Employee
LEFT JOIN Department ON DeptNo=EmpDeptNo
LEFT JOIN Dependent ON EmpNo=DpndEmpNo
     AND DeptNo IS NOT NULL

  Employee             Emp    Dept    Dpnd
  Department           Mike   DeptA   Jason
  Dependent            Mike   DeptA   Jane
                       Mary   DeptA   Sam
                       John   DeptB   Null
                       Bill   Null    Null
                       (Compare with Figure 6.1)

Figure 7.2 Indirect linking of Dependent under Department.

7.3 Nonhierarchical Join Type Support

Hierarchical structures are very useful. Their single-minded semantics allow
powerful assumptions to be made, like those utilized in fourth generation
languages. But there are times when nonhierarchical join operations like the
inner and FULL joins are necessary, and would be useful if they could be
incorporated into the modeled hierarchical data structure. Consider, for
example, two separate Employee tables that could usefully be FULL outer joined
or inner joined and placed into a hierarchical structure as a single logical
table.
Logical tables can be created as temporary tables in a previous step and
introduced into the structure. Unfortunately, these temporary tables cannot
take advantage of the semantic capabilities of hierarchical structures. For exam-
ple, the optimizations covered in Chapter 11 would not be able to optimize
the joins performed in a previous step. But performing inline nonhierarchical
joins while building a hierarchical structure can invalidate the structure, turn-
ing it into a nonhierarchical structure with unstable application semantics,
as described in Chapter 5. Such a nonhierarchical structure is defined in
Figure 7.3 from a combination of LEFT and FULL joins.
In Figure 7.3, EmpY becomes a second entry point in the data structure,
invalidating the hierarchical data structure. If an inner join was used instead of
the FULL outer join, it could also cause the removal of the Dept segment,
which would be logically above it.
There turns out to be a solution to the problem of incorporating non-
hierarchical, symmetric join types into the hierarchical model being built. The
solution again rests with right-sided nesting, which was discussed in Chapter 6,
to support stored and embedded structured views. When left-sided nesting is
intermixed with right nesting, we also determined in Chapter 6 that multiple
separate structures were temporarily formed. When a new structure was cre-
ated, the current one being built was put on hold and sheltered from the effects
of joins to the active structure. This technique can be used to perform
nonhierarchical joins without invalidating the hierarchical structure(s) being
built. This is demonstrated in Figure 7.4.
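
The sheltering effect of right-sided nesting can be sketched with a short example. Several assumptions apply: Python's sqlite3 module is the SQL engine, an INNER join is used as the symmetric join because older SQLite builds do not support FULL JOIN, the nesting is written with explicit parentheses, and the tables and rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dept (DeptNo TEXT);
CREATE TABLE EmpX (EmpXNo TEXT, EmpXDeptNo TEXT);
CREATE TABLE EmpY (EmpYNo TEXT);

-- Only Mike appears in both Employee tables.
INSERT INTO Dept VALUES ('DeptA'), ('DeptB');
INSERT INTO EmpX VALUES ('Mike', 'DeptA'), ('Mary', 'DeptA'), ('John', 'DeptB');
INSERT INTO EmpY VALUES ('Mike');
""")

# Unsheltered: the inner join is applied to the structure built so far,
# so it can remove Dept rows that are logically above it.
UNSHELTERED = """
SELECT DeptNo, EmpXNo, EmpYNo
FROM Dept
LEFT JOIN EmpX ON DeptNo = EmpXDeptNo
INNER JOIN EmpY ON EmpXNo = EmpYNo
ORDER BY DeptNo, EmpXNo
"""

# Sheltered: the inner join is performed in isolation (right-sided nesting)
# and the resulting logical table is then linked under Dept.
SHELTERED = """
SELECT DeptNo, EmpXNo, EmpYNo
FROM Dept
LEFT JOIN (EmpX INNER JOIN EmpY ON EmpXNo = EmpYNo)
     ON DeptNo = EmpXDeptNo
ORDER BY DeptNo, EmpXNo
"""

unsheltered_rows = conn.execute(UNSHELTERED).fetchall()
sheltered_rows = conn.execute(SHELTERED).fetchall()
print(unsheltered_rows)  # DeptB has been lost
print(sheltered_rows)    # DeptB is preserved with an empty employee leg
```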
Invalid Hierarchical Structure

SELECT * FROM Dept                       Dept
LEFT JOIN EmpX
     ON DeptNo=EmpXDeptNo            EmpX    EmpY
FULL JOIN EmpY
     ON EmpXNo=EmpYNo                Dpnd
LEFT JOIN Dpnd
     ON EmpXNo=DpndEmpNo

Figure 7.3 Invalid hierarchical data structure example.

Hierarchical Hybrid Structure

         Dept
     EmpX | EmpY
         Dpnd

SELECT * FROM Dept LEFT JOIN
     EmpX FULL JOIN EmpY USING (EmpNo)       (isolated join)
     ON DeptNo=EmpDeptNo
LEFT JOIN Dpnd ON EmpNo=DpndEmpNo

Figure 7.4 Hierarchical hybrid structure with logical nonhierarchical table.

The FULL outer join operation performed in Figure 7.4 is sheltered from
invalidating currently existing hierarchical structures because of the strategic
use of right-sided nesting. The FULL join operation that is highlighted in
Figure 7.4 is performed in isolation. In this example, the FULL outer join
could also have been an INNER or UNION join. These operations are
symmetrical in operation, making their data modeling ability neutral in nature:
both sides carry equal data-preserving ability. This means these operations form
a single, flat logical object, like EmpX|EmpY in the diagram in Figure 7.4. This
is why this object can be viewed as a single logical table. These logical tables
can be composed of more than two tables by using left-sided nesting when
building the logical table. And finally, more than one logical table can be
incorporated into a hierarchical structure. These concepts are demonstrated in
Figure 7.5.
When creating logical tables with the INNER or FULL join operation, it
is usually desirable to have one fixed key location per logical table. This can be
easily performed using the NATURAL or USING option, which was described
in Chapter 4. This is demonstrated in Figure 7.6. The parentheses are used for
readability in this example—they do not affect the join order.
As described in Chapter 4, the NATURAL option used with any type of
join operation will not allow the modeling of hierarchical data structures. But
used with right-sided nesting, as shown in Figure 7.6, its nonhierarchical opera-
tion used with symmetric joins is also sheltered from the hierarchical structure
being built.
It is also possible to use a logical table as the root of a structure. This is
shown in Figure 7.7. In this example, the root logical table is not being pro-
tected by right-sided nesting because it is specified on the left side. Right-sided
nesting is not necessary in this case because the root logical table is defined first
in the SQL statement, so no sheltering is necessary since there is no other
structure in existence or active to be affected. The SQL example in Figure 7.7
also demonstrates by its complex use of AND and OR operators that logical tables
follow the same linking rules and capabilities as standard tables.

Complex Hybrid Hierarchical Structure

            A
        B   C   D
     E  F       G  H

SELECT * FROM A LEFT JOIN
     (B FULL JOIN C ON B=C FULL JOIN D ON C=D)
     ON A=C
LEFT JOIN (E FULL JOIN F ON E=F) ON E=C
LEFT JOIN (G INNER JOIN H ON G=H) ON G=D

Figure 7.5 Complex hybrid hierarchical structure with multiple logical tables.

        A                  SELECT * FROM A LEFT JOIN
    B   C   D                   (B NATURAL FULL JOIN C
      E   F                      NATURAL FULL JOIN D)
                                ON A=C LEFT JOIN
                                (E FULL JOIN F USING (Key))
                                ON C=E

Figure 7.6 NATURAL logical table example.

    A   B   C              SELECT * FROM
      D   E                     A FULL JOIN B ON A=B
                                FULL JOIN C ON B=C
                                LEFT JOIN
                                (D FULL JOIN E USING (Key))
                                ON C=E AND A=D OR B=E

Figure 7.7 Logical table as root of data structure.

The example in Figure 7.7 may raise some concerns that logical tables or
substructures in general, when specified on the left, may be subject to inter-
ference from or cause interference to other structures—they may come into
contact with them on their left side. If true, this would make their use
unpredictable or unstable, reducing their usefulness. This, however, is
definitely not the case. While left-sided nonhierarchical structures may appear
as a possible future danger, they will not affect other structures or tables
even when these
other structures are introduced from the left. This is because the structures
added to the left naturally use right-sided nesting. For example, table X LEFT
joined to A INNER JOIN B ON A=B LEFT JOIN C ON B=C produces X
LEFT JOIN A INNER JOIN B ON A=B LEFT JOIN C ON B=C ON X=A,
causing table X to remain preserved and uninfluenced by the destructive
inner join operation on its right side. This natural syntax enables the free, safe,
and seamless use of substructures (which includes logical tables) under all cur-
rent and future syntactical situations that they may be used in.
While intermixing nonhierarchical symmetric joins (FULL, INNER, and
UNION) is not associative in operation, logical tables can intermix these differ-
ent join types. The result is still a flat structure, but it does carry with it more
meaningful semantics than a flat structure derived using a uniform symmetric
join type. An example is shown in Figure 7.8.
It’s very useful to realize that these logical tables can be easily produced by
isolating the logical table in a stored SQL view because the expansion process-
ing of it automatically creates right-sided nesting. We have previously seen this
in Chapter 6, with a view expansion of a structured view that is combined or
embedded within another SQL structure definition. Figure 7.9 demonstrates
an example of a view comprising a logical table being expanded. As in any other
stored view, there are many additional advantages to placing logical tables in
stored views, such as reuse and data abstraction.

        A                  SELECT * FROM A LEFT JOIN
    B   C   D                   B FULL JOIN C ON B=C
                                INNER JOIN D ON C=D
                                ON A=C

Figure 7.8 Intermixing symmetric join types in logical tables.

EmpAll Logical Table View

  EmpWest  EmpEast         DEFINE EmpAll AS
                           SELECT * FROM EmpWest
                           FULL JOIN EmpEast
                           USING (EmpNo)

Department View

  Dept                     SELECT * FROM Dept
  (EmpAll)                 LEFT JOIN EmpAll ON DeptNo=EmpDeptNo
  Dpnd                     LEFT JOIN Dpnd ON EmpNo=DpndEmpNo

EmpAll Expanded View:

SELECT * FROM Dept LEFT JOIN
     EmpWest FULL JOIN EmpEast USING (EmpNo)
     ON DeptNo=EmpDeptNo
LEFT JOIN Dpnd ON EmpNo=DpndEmpNo

Figure 7.9 Embedded logical table in view expansion.

7.4 Nonhierarchical Joining of Data Structures

Multitable data structures, just like the single tables described in Section 7.3,
can also be joined nonhierarchically using symmetric joins, such as the FULL
outer join and the inner join, to form a valid hierarchical data structure. All of
the documentation for joining single tables described in Section 7.3 also applies
to joining data structures, including one additional requirement. This
requirement is that only the root tables of the data structures can be joined
together, which is accomplished by only referencing columns from the root tables
for the join criteria. This is demonstrated in Figure 7.10.
Figure 7.10 demonstrates two structures being FULL outer joined. As can
be seen in these examples, structures naturally form the proper protected envi-
ronment needed for nonhierarchical joins as described in Section 7.3. These
can be expanded views of data structures or structures built in place, which is
equivalent to the expanded structure views as shown in Figure 7.10. Also
shown in Figure 7.10 is the expanded SQL rewritten to be more efficiently exe-
cuted by avoiding throwaway tuples. This is accomplished by performing the
FULL outer join first, as shown.
While the nonhierarchical example in Figure 7.10 uses a FULL outer join
to link the data structures, it could have also been an inner join. While these
symmetric operations both produce the same valid hierarchical structure, the
semantics as far as the resulting data content are different, as you would expect.
The inner join removes both structures being linked unless both exist, while
the FULL outer join will preserve a data structure even if it has no matching
data structure.
SELECT * FROM
ViewA FULL JOIN ViewX ON A=X

  ViewA         ViewX         Combined FULL Join View

    A             X                    AX
  B   C           Y                  B  C  Y
                  Z                        Z

Expanded:                          Rewritten (equal to the expanded form):

SELECT * FROM                      SELECT * FROM
A LEFT JOIN B ON A=B               A FULL JOIN X ON A=X
LEFT JOIN C ON A=C                 LEFT JOIN B ON A=B
FULL JOIN                          LEFT JOIN C ON A=C
X LEFT JOIN Y ON X=Y               LEFT JOIN Y ON X=Y
LEFT JOIN Z ON Y=Z                 LEFT JOIN Z ON Y=Z
ON A=X

Figure 7.10 Symmetric joining of data structures.

Linking symmetrically at the root level causes no invalidating of the
hierarchical data structure. Applying nonhierarchical linking at structure levels
lower than their root produces nonhierarchical data structures. Inner joins
can cause data loss further up the data structure, which invalidates the data
structure, and a FULL outer join can cause only the lower structure to be
preserved, which also forms an invalid structure. These situations are both
avoided by joining the data substructures only at their roots. This is also
the most natural and common way to join two data structures symmetrically
(nonhierarchically).
Single tables can also be nonhierarchically joined to data structures. Since
a single table is actually a data structure consisting of one table with its only
table as the root table, it can be joined nonhierarchically to a multitable struc-
ture following the same requirements stated above for joining data structures
nonhierarchically.
The capability to perform symmetric joins when modeling hierarchical
data structures is quite useful and an important feature for hierarchical data
modeling. Figure 7.11 demonstrates the usefulness of symmetric joins in mod-
eling hierarchical data structures. The first data structure in Figure 7.11 does
not use a symmetric join in modeling a structure with two Employee tables. It
uses the Department table to join the two Employee tables. This introduces a
number of problems, such as two separate Employee tables to access with (pos-
sibly) different employees in each. There is also another side effect of having the
Employee tables joined by their common department, causing an unnecessary
data explosion with rows that contain employee data from different employees.
The second data structure and its defining SQL in Figure 7.11 solve the
problems introduced from the first data structure that were noted above. The
Employee tables are naturally FULL outer joined, preserving all data from both
tables and creating one unique key for each row result produced. This logical
table result is placed in the data structure hierarchically in the correct
position without invalidating the data structure. This correctly matches up the
Employee tables without exploding the data or generating extraneous,
incorrectly matched employee rows while still correctly organizing the employees
under their department. This also allows the joining of the Dependent and
Project tables to the structure by a match from either of the Employee tables,
producing a more consistent and accurate structure.

Symmetric Joins Can Be Useful in Hierarchical Structures

Bad model (Dept over EmpX and EmpY; EmpX over Dpnd; EmpY over Proj):
    SELECT * FROM Dept
    LEFT JOIN EmpX
    ON DeptNo=EmpXDeptNo
    LEFT JOIN EmpY
    ON DeptNo=EmpYDeptNo
    LEFT JOIN Dpnd
    ON EmpX.EmpNo=DpndEmpNo
    LEFT JOIN Proj
    ON EmpY.EmpNo=ProjEmpNo

Good model (Dept over the joined EmpX and EmpY tables, which are over Dpnd and Proj):
    SELECT * FROM
    Dept LEFT JOIN
    EmpX FULL JOIN EmpY
    USING (EmpNo)
    ON DeptNo=EmpDeptNo
    LEFT JOIN Dpnd
    ON EmpNo=DpndEmpNo
    LEFT JOIN Proj
    ON EmpNo=ProjEmpNo

Figure 7.11 Symmetric join synchronizes legs of hierarchical structure.

7.5 Many-to-Many Data Modeling and Intersecting Data


Many-to-many data relationships such as the well-known Parts-Suppliers
database can be hierarchically modeled as either a Parts-over-Suppliers or
Suppliers-over-Parts relationship. These many-to-many relationships require an
association table to create hierarchical one-to-many relationships in both
directions. These many-to-many relationships were first described in Chapter 5.
The outer join hierarchical modeling of many-to-many relationships is
shown in Figure 7.12. As shown in the structure diagrams in this figure, the
association table (PSX) used in the SQL specification will appear transparent,
as it should. This is also the case if intersecting data from the association table,
such as prices of parts from each supplier, is selected, which will logically appear
as data from the lower level table. An example of intersecting data use can be
found in Chapter 12.

Parts-Suppliers conceptual view: Parts, PSX, Suppliers (PSX associates Parts with Suppliers)

Parts view (Parts over PSX over Suppliers):
    SELECT * FROM Parts
    LEFT JOIN PSX
    ON Parts=PartX
    LEFT JOIN Suppliers
    ON SupplierX=Suppliers

Suppliers view (Suppliers over PSX over Parts):
    SELECT * FROM Suppliers
    LEFT JOIN PSX
    ON Suppliers=SupplierX
    LEFT JOIN Parts
    ON PartX=Parts

Figure 7.12 Outer join modeling of a many-to-many relationship.
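As a minimal sketch of the two many-to-many views, the following uses Python's built-in sqlite3 module. The key column names (PartNo, SupplierNo), the Price column, and the sample rows are assumptions made for illustration; PSX, PartX, and SupplierX follow Figure 7.12. Price plays the role of intersecting data held in the association table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts     (PartNo TEXT);
    CREATE TABLE Suppliers (SupplierNo TEXT);
    -- Association table; Price is the intersecting data.
    CREATE TABLE PSX (PartX TEXT, SupplierX TEXT, Price REAL);
    INSERT INTO Parts VALUES ('P1'), ('P2'), ('P3');
    INSERT INTO Suppliers VALUES ('S1'), ('S2');
    INSERT INTO PSX VALUES ('P1', 'S1', 10.0), ('P1', 'S2', 12.0), ('P2', 'S1', 5.0);
""")

# Parts view: Parts over PSX over Suppliers.
parts_view = conn.execute("""
    SELECT PartNo, SupplierNo, Price
    FROM Parts
    LEFT JOIN PSX ON PartNo = PartX
    LEFT JOIN Suppliers ON SupplierX = SupplierNo
    ORDER BY PartNo, SupplierNo
""").fetchall()

# Suppliers view: Suppliers over PSX over Parts.
suppliers_view = conn.execute("""
    SELECT SupplierNo, PartNo, Price
    FROM Suppliers
    LEFT JOIN PSX ON SupplierNo = SupplierX
    LEFT JOIN Parts ON PartX = PartNo
    ORDER BY SupplierNo, PartNo
""").fetchall()

for row in parts_view:
    print("parts view:", row)
for row in suppliers_view:
    print("suppliers view:", row)
```

In the Parts view, part P3 is preserved even though no supplier carries it, and in both views the price appears alongside the lower level table's data while the PSX keys themselves are never selected.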

7.6 Conclusion
From the information supplied in this chapter and the preceding chapter, it
should be clear that the standard SQL join operation with its flexible syntax
and powerful outer join operation can be used or programmed to accomplish
tasks requiring complex semantics. The outer join can be used to model both
hierarchical and nonhierarchical data structures. Hierarchical data structures
are advantageous because they have singular meaning, which makes their
semantics unambiguous and for this reason better suited for application use.
Nonhierarchical structures, such as network structures, are not generally rec-
ommended for application view use, but may still be useful in applications with
very specific requirements as long as the SQL programmer is aware of their
unstable or ambiguous semantics.
There has been sufficient information supplied in these last two chapters
to enable the design and construction of a hierarchical, network, or hybrid data
structure using the standard SQL join operation. The LEFT and RIGHT outer
joins are hierarchical operations and are used to model a hierarchical data struc-
ture. The INNER and FULL joins are symmetric joins that do not model hier-
archical data structures, and can in fact invalidate hierarchical structures. It was
shown how these symmetric operations can be used to form logical tables that
can be safely and seamlessly introduced into a hierarchical structure being mod-
eled without invalidating it by using right-sided nesting. Similarly it was shown
how to symmetrically link data structures so they maintain a valid hierarchical
data structure.
Besides modeling data structures, the standard SQL join syntax also
seamlessly supports a fine level of data filtering that precisely filters data, follow-
ing the defined hierarchical data structure. To help with the coding of standard
SQL data modeling joins and features like the fine data filtering capability,
Chapter 8 describes a procedure that can help automate this process.
It was also shown how many-to-many relationships can be seamlessly
modeled. Using all the capabilities documented in this and the previous chap-
ter, any hierarchical data structure can be modeled.
8
More About Outer Join Data Modeling
This chapter examines the significance of the standard SQL outer join’s data
modeling and structure-processing ability to SQL, which did not previously
support this capability. It also examines how these outer join data modeling
statements can be generated, and their efficiency. This chapter also presents
empirical proof that the outer join does enable and support data modeling and
structure processing as presented in this book.

8.1 Importance of SQL’s Inherent Data Structure Processing Ability
The standard SQL outer join’s natural data modeling and structure processing
capability establishes SQL’s ability to inherently perform complex data struc-
ture processing. This processing is not arbitrarily defined, but is a direct result
of the ANSI standard outer join’s inherent data modeling syntax and semantics.
This data modeling and structure processing capability, and the fact that it is an
ANSI standard, establishes the standard SQL outer join as a standardized SQL
method for performing data modeling and structure processing. It is important
for SQL vendors and designers to realize that any data modeling features added
to their SQL or the updates to the SQL standard will not work if they conflict
with SQL’s inherent support of data modeling through the outer join. This
natural and open data modeling capability also establishes a seamless and com-
patible integration path from SQL databases to non-SQL databases, and vice
versa. This is also aided by the fact that the outer join operation is not hindered
by having to follow the old inner join’s Cartesian product model of operation
as described in Chapter 2.

8.2 Efficient Client/Server Data Structure Processing


SQL queries are a prime candidate for distributed processing using a
client-server architecture. The client sends requests to a server and receives
query result sets. This is a distinct advantage over requiring a client platform
to process massive tables or data sets sent over the network. Because the outer
join operation inherently performs the data structure processing, that
processing is also performed entirely on the server where the database resides,
increasing efficiency and decreasing network traffic.

8.3 Coding Data Modeling Outer Join Statements


Data structure processing outer join statements can be coded by walking down
the data structure from top to bottom and left to right starting with SELECT *
FROM Root-Table-Name. As each table or logical table (see Chapter 7) is
reached, add LEFT JOIN Table-Name ON Join-Cond. This is visually demon-
strated in Figure 8.1. The ON join condition links the lower level table to
the join point in the upper structure. The exact join rules were specified in
Chapter 6. Logical tables, if any are specified in the data structure, are
expanded after the data structure has been walked through. This is demon-
strated in Figure 8.2.

Department structure (Dept over Emp over Dpnd):
    Walking the structure:
        SELECT * FROM Dept
        LEFT JOIN Emp ON &Cond
        LEFT JOIN Dpnd ON &Cond
    Coded statement:
        SELECT * FROM Dept
        LEFT JOIN Emp ON DeptX=EmpX
        LEFT JOIN Dpnd ON EmpY=DpndY

Employee structure (Emp over Dept and Dpnd):
    Walking the structure:
        SELECT * FROM Emp
        LEFT JOIN Dept ON &Cond
        LEFT JOIN Dpnd ON &Cond
    Coded statement:
        SELECT * FROM Emp
        LEFT JOIN Dept ON EmpX=DeptX
        LEFT JOIN Dpnd ON EmpY=DpndY

Figure 8.1 Coding data modeling outer joins from structure diagrams.

A logical table (A over the logical table X | Y | Z, which is over B):
    Logical table: X UNION JOIN Y UNION JOIN Z

Build hierarchical structure:
    SELECT * FROM A LEFT JOIN
    Logical-Table
    ON A=X
    LEFT JOIN B ON Z=B

Then insert logical table definition:
    SELECT * FROM A LEFT JOIN
    X UNION JOIN Y UNION JOIN Z
    ON A=X
    LEFT JOIN B ON Z=B

Figure 8.2 Coding outer join statements that use logical tables.

8.4 Generation of Data Modeling Outer Join Statements


Outer join statements can easily be generated automatically from data structure
meta information sources such as ER (entity relationship) diagrams, or directly
from users (see Chapter 14). Just as in Section 8.3, the outer join statement should
be generated following the structure top to bottom, left to right. If the data
structure meta information does not already have the metadata in this order
(which is highly unlikely), it should be set to this order first. This will assure
that the outer join statements are generated in the most efficient manner, which
is discussed in Chapter 11. Right-sided nesting can be used to define logical
tables that do not conform to strict hierarchical definition. This allows these
nonhierarchical definitions to be defined without invalidating the hierarchical
structure being built, as shown in Figure 8.2.
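The generation step can be sketched in a few lines of Python. This is an illustration only, not the procedure used by any particular product: the Node class and generate_outer_join function are hypothetical names, the metadata is assumed to be available as a simple tree, and "top to bottom, left to right" is read here as level-by-level order. Any order that introduces a parent before its children produces a valid statement.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One table in the structure (hypothetical metadata record)."""
    table: str
    on: str = ""                      # join condition linking this table to its parent
    children: list = field(default_factory=list)


def generate_outer_join(root):
    """Walk the structure level by level, left to right, emitting LEFT JOINs."""
    lines = ["SELECT * FROM " + root.table]
    queue = deque(root.children)
    while queue:
        node = queue.popleft()
        lines.append(f"LEFT JOIN {node.table} ON {node.on}")
        queue.extend(node.children)   # children are visited after the current level
    return "\n".join(lines)


# The two structures of Figure 8.1.
dept_view = Node("Dept", children=[
    Node("Emp", "DeptX=EmpX", [Node("Dpnd", "EmpY=DpndY")]),
])
emp_view = Node("Emp", children=[
    Node("Dept", "EmpX=DeptX"),
    Node("Dpnd", "EmpY=DpndY"),
])

print(generate_outer_join(dept_view))
print(generate_outer_join(emp_view))
```

Logical tables are not handled in this sketch; following Section 8.3, they would be treated as single nodes during the walk and expanded afterward.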

8.5 Hierarchical Data Structure Processing Empirical Proof


By using the interrelationships in the Department-Employee database, it can be
shown that the semantics of the standard SQL outer join operation can exactly
parallel the semantics of hierarchical data models. This enables it to perform
complex data modeling and data structure processing. The Department and
Employee data views in Figure 8.3, and their data tables, are taken from the
Department-Employee database comprised of the Department, Employee, and
Dependent tables. This database will be used to prove that the outer join can
inherently perform data modeling and structure processing.
Department view (Department over Employee over Dependent):
    SELECT * FROM Department
    LEFT JOIN Employee
    ON DeptNo=EmpDeptNo
    LEFT JOIN Dependent
    ON EmpNo=DpndEmpNo

Employee view (Employee over Department and Dependent):
    SELECT * FROM Employee
    LEFT JOIN Department
    ON EmpDeptNo=DeptNo
    LEFT JOIN Dependent
    ON EmpNo=DpndEmpNo

Figure 8.3 Department and Employee outer join SQL views.

8.5.1 Hierarchical Control

The following progression of outer join examples follows the outer join’s opera-
tion as described above. The first two examples demonstrate a simple hierarchi-
cal modeling operation and show that it works for one-to-many as well as
many-to-one relationships.
The outer join specification Department LEFT JOIN Employee ON
DeptNo=EmpDeptNo creates the one-to-many hierarchical relationship of
Department over Employee (Dept over Emp) because:

• Department can exist if no matching Employee(s) are present.
• Employee(s) cannot exist if no matching Department is found.
• One-to-many relationship supported:
    • One Department can match many Employee(s).
    • One missing Department can cause many missing Employees.

The outer join specification Employee LEFT JOIN Department ON
DeptNo=EmpDeptNo creates the many-to-one hierarchical relationship of
Employee over Department (Emp over Dept) because:

• Employee(s) can exist if they have no matching Department.
• Department cannot exist if no matching Employee(s) exists.
• Many-to-one relationship supported:
    • Many Employee(s) can match the same Department.
    • Each missing Employee causes one Department occurrence to be missing.
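The two relationships above can be checked against a few sample rows. The sketch below uses Python's built-in sqlite3 module; the rows are invented for illustration (department D2 has no employees, and employee E3 refers to a department D9 that does not exist), while the table and column names follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D1'), ('E3', 'D9');
""")

# Department over Employee (one-to-many).
dept_over_emp = conn.execute("""
    SELECT DeptNo, EmpNo
    FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo
    ORDER BY DeptNo, EmpNo
""").fetchall()

# Employee over Department (many-to-one).
emp_over_dept = conn.execute("""
    SELECT EmpNo, DeptNo
    FROM Employee LEFT JOIN Department ON DeptNo = EmpDeptNo
    ORDER BY EmpNo
""").fetchall()

print(dept_over_emp)
print(emp_over_dept)
```

In the first result D2 survives without employees while E3 is absent; in the second, E3 survives without a department while D2 is absent, which matches the existence rules listed above.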

8.5.2 Structure Control


The next two examples demonstrate structure control for modeling the Depart-
ment and Employee views defined earlier, and when processed they will follow
the same semantics. Notice the multiple ON clauses in each outer join specifi-
cation; they specify how the structure is linked.
The outer join specification Department LEFT JOIN Employee ON
DeptNo=EmpDeptNo LEFT JOIN Dependent ON EmpNo=DpndEmpNo creates
the Department view (Dept over Emp over Dpnd).

• Department is linked directly over Employee (via its ON clause).
• Employee is (then) linked directly over Dependent (via its ON clause).

Proof:
• Dependent can exist only if a matching Department and Employee exist.
• Employee and Dependent exist only if a matching Department exists.

The outer join specification Employee LEFT JOIN Department ON
DeptNo=EmpDeptNo LEFT JOIN Dependent ON EmpNo=DpndEmpNo creates
the Employee view (Emp over Dept and Dpnd).

• Employee is linked directly over Department (via ON clause).
• Employee is (also) linked directly over Dependent (via ON clause).

Proof:
• Department and Dependent can only exist with a matching Employee.
• Department and Dependent are not dependent on one another:
    • Department can exist without a Dependent.
    • Dependent can exist without a Department.

Notice in the outer join proof directly above that the Dependent table
was joined after the Department table was joined, but that in this case these two
tables are on different paths and cannot influence each other. This is because
the Dependent table was joined to the Employee table and not the Department
table; therefore, it doesn’t rely on the Department table’s existence even though
it was joined in a later join operation.
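The independence of the two paths in the Employee view can be observed directly. The sketch below uses Python's built-in sqlite3 module with invented sample rows; DpndNo is an assumed key column for the Dependent table, and the other names follow the text. Employee E3 has a dependent but no matching department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('D1'), ('D2');
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D1'), ('E3', 'D9');
    INSERT INTO Dependent VALUES ('N1', 'E1'), ('N2', 'E3');
""")

# Department view: Department over Employee over Dependent.
department_view = conn.execute("""
    SELECT DeptNo, EmpNo, DpndNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY DeptNo, EmpNo
""").fetchall()

# Employee view: Employee over Department and over Dependent.
employee_view = conn.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY EmpNo
""").fetchall()

print(department_view)
print(employee_view)
```

Dependent N2 is missing from the Department view because its employee has no department, yet it appears in the Employee view next to a null Department, since there it depends only on Employee.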
While the example data structures used in this section do not show many-
to-many relationships directly, many-to-many relationships (see Chapter 5) are
composed of many-to-one and one-to-many relationships, which were
described in this section. It is therefore not necessary to show examples of
many-to-many relationships.

8.6 Nonhierarchical Data Structure Processing Empirical Proof
Nonhierarchical join operations such as FULL, INNER, and UNION joins do
not model hierarchical data structures, which means they can invalidate hierar-
chical structures they are used in. A solution is to isolate and shelter their use
using right-sided nesting as described in Chapter 7, which treats their use as
logical tables. These logical tables are comprised of symmetric joins that make
their structure flat, which is also necessary to preserve the validity of the hierar-
chical structure. An example is T1 LEFT JOIN TX UNION JOIN TY ON
T1=TX LEFT JOIN T2 ON TY=T2.

• Table T1 and its LEFT join are put on hold, waiting until the matching ON
  clause is ready for processing. During this time, T1’s working set cannot be
  modified.
• While waiting for table T1 and its LEFT join’s matching ON clause, tables TX
  and TY are UNIONed in isolation. Since the UNION operation is symmetric, the
  resulting structure is neutral and not hierarchical, making it a valid logical
  table.
• When table T1’s matching LEFT join ON clause is reached, T1 is LEFT joined to
  the logical table, which is a result of the UNION that was processed in the
  interim. This places T1 hierarchically over the UNIONed result.
• Finally, the above structure is LEFT joined over table T2, linking table T2 to
  the TX | TY logical table.

The resulting structure is T1 over the TX | TY logical table, which is over T2.

Proof:

• Table T1 can exist if logical table TX | TY or table T2 does not exist.
• The logical table cannot exist if no T1 occurrence matches it.
• T2 cannot exist if no logical table occurrence matches it.

It is worth repeating here that logical tables do not have to be specified
inline as shown above; they can be specified as views, which are easier to specify
and more flexible for reuse. For example, the logical table view used above can
be defined as the view TX UNION JOIN TY, which can be easily embedded
when needed, as in T1 LEFT JOIN LogicalTableView ON T1=TX LEFT JOIN
T2 ON TY=T2, which expands to be identical to the logical table in the proof
above. This means that this and other embedded logical views are also proven
by the above proof, as are symmetric substructure joins, which also utilize
logical tables to perform their nonhierarchical join operation.

8.7 Embedded Structured View Support Empirical Proof


As explained in Chapter 7, structured views can be seamlessly embedded to
form larger structures. It was also shown that logical tables could also be seam-
lessly embedded. It was stated that structured and logical table views within
views are also inherently supported. Let’s look at some examples and see why
they work. The first example in Figure 8.4 examines embedded left-sided nest-
ing, which occurs with views specified on the left side of the join operation—
later examples examine right-sided views.
The first example in Figure 8.4 demonstrates the basic left-sided view
source replacement (view expansion) that produces left-sided nesting. As this
demonstrates, left-sided nesting is naturally processed left to right without
requiring any special internal operations such as table argument stacking for
LIFO processing. The second example demonstrates how this natural left-to-right
processing handles nested left-sided views, processing them in LIFO fashion
(the last nested view source replacement is the first to be processed). This
preserves the data modeling semantics of each view, allowing logical table
views to be specified on the left side where they can’t affect the data structure.

Single-level view (A over B and C):
    View ABView defined as:  A LEFT JOIN B ON A=B
    Single view:             ABView LEFT JOIN C ON A=C
    Expanded view:           A LEFT JOIN B ON A=B
                             LEFT JOIN C ON A=C

Nested view (A over B over C over D):
    View ABCView defined as: ABView LEFT JOIN C ON B=C
    Nested view:             ABCView LEFT JOIN D ON C=D
    First expansion:         ABView LEFT JOIN C ON B=C
                             LEFT JOIN D ON C=D
    Second expansion:        A LEFT JOIN B ON A=B
                             LEFT JOIN C ON B=C
                             LEFT JOIN D ON C=D

Figure 8.4 Example of nested left-sided view expansion.
Let’s now examine some examples of right-sided view source replacement
and see how and why it works. Right-sided nesting occurs when views are
expanded on the right side of the join operation. The first example in
Figure 8.5 demonstrates the basic right-sided view replacement, which pro-
duces right-sided nesting. As this example demonstrates, right-sided nesting is
not processed left to right, but requires postfix processing and argument stack-
ing, changing the processing order to right to left. This stacking processing will
be discussed in further detail in Chapter 9, Section 9.4. The second example
demonstrates how this right-sided processing is handled in nested right-sided
views. The stacking creates a protected environment that preserves the data
modeling semantics of each view, allowing logical table views to also be speci-
fied on the right side.
Single-level view (B over C over D):
    View CDView defined as:  C LEFT JOIN D ON C=D
    Single view:             B LEFT JOIN CDView ON B=C
    Expanded statement:      B LEFT JOIN
                                 C LEFT JOIN D ON C=D
                             ON B=C

Nested view (A over B over C over D):
    View BCDView defined as: B LEFT JOIN CDView ON B=C
    Nested view:             A LEFT JOIN BCDView ON A=B
    First expansion:         A LEFT JOIN
                                 B LEFT JOIN CDView ON B=C
                             ON A=B
    Second expansion:        A LEFT JOIN
                                 B LEFT JOIN
                                     C LEFT JOIN D ON C=D
                                 ON B=C
                             ON A=B

Figure 8.5 Example of nested right-sided view expansion.

Notice in the second (nested view) examples in Figures 8.4 and 8.5 that
the innermost nested views of both are processed first. In Figure 8.4, left-sided
views expand their view source to the left as the nested views are expanded
when encountered in the nesting processing. This causes them to be executed
naturally in LIFO order, as can be plainly seen in the second example in
Figure 8.4. In the second example in Figure 8.5, the right-sided expanded views
were also executed in reverse (LIFO) order, not because of their placement as in
Figure 8.4, but because of right-sided nesting. Right-sided nesting controls
execution order by placement of the ON clause, as was first described in Chapter 2
and later in Chapter 7.
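The equivalence between a view-based statement and its expanded form can be checked with a small sketch. It uses Python's built-in sqlite3 module; the column names (ANo, BANo, and so on) and sample rows are invented for illustration, and the view names follow Figures 8.4 and 8.5. One assumption to note: SQLite is given the right-sided nesting with explicit parentheses, whereas the figures express the same nesting purely through ON clause placement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ANo TEXT);
    CREATE TABLE B (BNo TEXT, BANo TEXT);
    CREATE TABLE C (CNo TEXT, CBNo TEXT);
    CREATE TABLE D (DNo TEXT, DCNo TEXT);
    INSERT INTO A VALUES ('A1'), ('A2');
    INSERT INTO B VALUES ('B1', 'A1');
    INSERT INTO C VALUES ('C1', 'B1'), ('C2', 'B9');
    INSERT INTO D VALUES ('D1', 'C1'), ('D2', 'C2');

    -- Left-sided views (Figure 8.4 pattern).
    CREATE VIEW ABView  AS SELECT * FROM A LEFT JOIN B ON ANo = BANo;
    CREATE VIEW ABCView AS SELECT * FROM ABView LEFT JOIN C ON BNo = CBNo;

    -- Right-sided views (Figure 8.5 pattern).
    CREATE VIEW CDView  AS SELECT * FROM C LEFT JOIN D ON CNo = DCNo;
    CREATE VIEW BCDView AS SELECT * FROM B LEFT JOIN CDView ON BNo = CBNo;
""")


def run(from_clause):
    """Run the same select list over a given FROM clause."""
    sql = f"SELECT ANo, BNo, CNo, DNo FROM {from_clause} ORDER BY ANo"
    return conn.execute(sql).fetchall()


left_view = run("ABCView LEFT JOIN D ON CNo = DCNo")
left_expanded = run("""A LEFT JOIN B ON ANo = BANo
                         LEFT JOIN C ON BNo = CBNo
                         LEFT JOIN D ON CNo = DCNo""")
right_view = run("A LEFT JOIN BCDView ON ANo = BANo")
right_expanded = run("""A LEFT JOIN (B LEFT JOIN (C LEFT JOIN D ON CNo = DCNo)
                                       ON BNo = CBNo)
                          ON ANo = BANo""")

print(left_view == left_expanded, right_view == right_expanded)
print(right_expanded)
```

Each view-based statement returns the same rows as its expansion. For this particular chain of LEFT joins the left-sided and right-sided forms also agree with each other, and the orphaned C2 and D2 rows are excluded either way.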

8.8 Indirect Link Empirical Proof


The next example demonstrates structure control for modeling an indirect link
(described in Chapter 6). When processed, it will follow the semantics shown
in the data model display below. Notice the existence test used to accomplish
the indirect link.
The outer join specification Employee LEFT JOIN Department ON
DeptNo=EmpDeptNo LEFT JOIN Dependent ON EmpNo=DpndEmpNo AND
DeptNo IS NOT NULL creates this special Employee view (Emp over Dept over Dpnd).

• Employee is linked directly over Department (via its ON clause).
• Dependent is (then) linked indirectly under Department (via its ON clause
  existence test).

Proof:

• Department can exist only if a matching Employee exists.
• Dependent can exist only if a matching Employee exists and a Department
  exists for the matching Employee.
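The effect of the existence test can be seen by running the statement with and without it. The sketch below uses Python's built-in sqlite3 module with invented sample rows; DpndNo is an assumed key column, and the existence test is written as DeptNo IS NOT NULL. Employee E3 has a dependent but no matching department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('D1');
    INSERT INTO Employee VALUES ('E1', 'D1'), ('E2', 'D1'), ('E3', 'D9');
    INSERT INTO Dependent VALUES ('N1', 'E1'), ('N2', 'E3');
""")

# Dependent linked directly to Employee only.
direct = conn.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    ORDER BY EmpNo
""").fetchall()

# Dependent linked indirectly under Department through the existence test.
indirect = conn.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo AND DeptNo IS NOT NULL
    ORDER BY EmpNo
""").fetchall()

print(direct)
print(indirect)
```

With the existence test in place, dependent N2 no longer appears, because its employee has no department; the employee row itself is still preserved.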

8.9 SQL:1999 and Data Modeling


SQL:1999 is known as the object/relational version of SQL. Adding an object-
oriented flavor, it has introduced the support of abstract data types (ADTs),
which are supported by the addition of user-defined types (UDTs) and user-
defined functions (UDFs). These constructs allow abstract data types to be
defined, stored, and processed in SQL. UDFs are externally defined functions
that can be invoked by SQL to process UDTs. Private and commercial object
libraries can be created to handle and process objects such as multimedia video
or medical ADTs that define MRI and X-ray objects, allowing SQL to store
and process these powerful and useful new data types.
UDTs can also represent complex data types such as hierarchical struc-
tures, and UDFs can process these complex data types. The creation of these
complex and abstract data types is performed external to SQL. This method of
complex data structure processing does offer an alternative to data modeling
and structure processing using the standard join operation. UDTs and UDFs
are useful for representing and processing less formal, more abstract data types.
These tend to be by their nature more specialized static objects. On the other
hand, processing data structures using SQL join operations is useful for defin-
ing and processing general-purpose hierarchical data structures that can be
specified and built in real time if necessary and from many data sources. And
since the standard SQL join is ANSI standard, the data modeling enabled by it
will be available across SQL systems, which is not necessarily true of the UDF
structure processing procedures that are not standardized.
SQL:1999 introduced the capability to store nested, hierarchically
structured data in a row using the new composite types ROW and ARRAY. Because
these structures are stored in a single row, the semantics of the data structure
cannot be fully utilized by SQL in a nonprocedural way. This form of
hierarchical data storage has other limitations as well: the structure is fixed,
losing its data independence, and substructures cannot be joined to form larger
structures.
SQL:1999 is not the only object query language being designed and put
forth as a standard. OQL is a database object query language that supports the
ODMG model. ODMG is an object model put forth by the Object Data
Management Group to supply a standard for object databases. It is a separate
standard from SQL:1999, though it does utilize many aspects of SQL. In fact,
OQL is based heavily on standard SQL. OQL does not support the standard SQL
outer join facility that supports data modeling, but relies on ODMG’s Object
Definition Language (ODL), which includes a schema definition capability.
SQL:1999 and ODMG appear as competitive object standardization
efforts. SQL:1999 starts from SQL and moves towards object, while ODMG
starts from an object point of view and moves towards SQL and other database
platforms. In this regard, ODMG can be thought of as a standard for support-
ing the heterogeneous processing of multiple platforms. This should enable
ODL’s language-independent data modeling capability and SQL:1999’s data
modeling capabilities to be freely mapped to one another.

8.10 What Makes the ANSI Standard Outer Join Unique for Data Modeling
Besides being standardized, the newer outer join operation has two operational
characteristics that make it very different from the older nonstandardized outer
join. The first characteristic is found in the outer join’s flexible syntax that
allows it to specify the table join order, and the second is its ability to specify
the join criteria at each join point. These capabilities were added because it was
found that the table join order can influence the result of outer join operations.
This makes the newer standardized outer join more powerful, with the capabil-
ity to specify data structures with the most complex semantics.
With the flexibility to specify the table join order, nonhierarchical,
symmetric join operations such as FULL, INNER, or UNION can be utilized in the
construction of hierarchical data structures to form flat virtual tables. The
use of nonhierarchical join types is described in Chapter 7. The ability to
specify the join criteria at each join point can become necessary when
qualifying joins based on values further up the path from the join point. This
could lead to a conflicting join clause if placed on a single WHERE clause, as
demonstrated in Figure 8.6. These two fairly new capabilities therefore provide
basic capabilities in SQL that significantly affect SQL’s standard operation,
allowing the definition of data structures with extremely complex
semantics not possible otherwise.
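The difference between qualifying each join point in its ON clause and placing the same predicates in a single WHERE clause can be illustrated with sample rows. The sketch below uses Python's built-in sqlite3 module; the rows and the DpndNo column are invented, and the ON clause query follows the pattern of Figure 8.6. The comparison query is an assumption of this sketch: it keeps the link conditions in the ON clauses and moves only the two qualifying predicates to WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo TEXT);
    CREATE TABLE Employee (EmpNo TEXT, EmpDeptNo TEXT, EmpStat TEXT, EmpPos TEXT);
    CREATE TABLE Dependent (DpndNo TEXT, DpndEmpNo TEXT);
    INSERT INTO Department VALUES ('D1');
    INSERT INTO Employee VALUES
        ('E1', 'D1', 'Full', 'Mgr'),
        ('E2', 'D1', 'Part', 'Mgr'),
        ('E3', 'D1', 'Full', 'Staff');
    INSERT INTO Dependent VALUES ('N1', 'E1'), ('N2', 'E2'), ('N3', 'E3');
""")

# Each join point carries its own qualification.
on_rows = conn.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo AND EmpStat = 'Full'
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo AND EmpPos = 'Mgr'
    ORDER BY EmpNo
""").fetchall()

# The same predicates applied once, to whole result rows.
where_rows = conn.execute("""
    SELECT EmpNo, DeptNo, DpndNo
    FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
    WHERE EmpStat = 'Full' AND EmpPos = 'Mgr'
    ORDER BY EmpNo
""").fetchall()

print(on_rows)
print(where_rows)
```

The ON clause version keeps every employee and independently decides whether each employee's Department and Dependent data are attached. The WHERE version filters entire rows, so employees E2 and E3 disappear altogether.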

8.11 Data Modeling with Old-Style Outer Joins


It is worth noting that standard outer joins that model hierarchical data
structures and do not require the features unique to the standard SQL outer
join, as described above in Section 8.10, can be converted to old-style outer
joins. This is shown in Figure 8.7, where the Department and Employee
hierarchical views have been converted to the old-style joins. The plus sign is
used in the WHERE clause to specify the table to be preserved.
The data modeling using old-style outer joins in Figure 8.7 is possible
because hierarchical structures can be built in any order, top to bottom, bottom
to top, or any combination of these two, as demonstrated in Chapter 3.
Because of this, the old-style outer joins, which are not capable of specifying the
join table order, are capable of modeling simple hierarchical structures. These
are one-sided outer joins that do not include symmetric join operations, and
whose WHERE clause join conditions must unambiguously define the
hierarchical links between link point tables (see Figure 8.6 for an ambiguous
WHERE clause example). This data modeling SQL join statement is also not
as obvious as the equivalent standard SQL join statement. These old-style outer
joins can be easily translated into standard SQL joins.

Structure: Employee over Department and Dependent

    SELECT *
    FROM Employee
    LEFT JOIN Department ON DeptNo=EmpDeptNo AND EmpStat='Full'
    LEFT JOIN Dependent ON EmpNo=DpndEmpNo AND EmpPos='Mgr'

WHERE clause query below does not represent above ON clause query:

    SELECT *
    FROM Employee LEFT JOIN Department LEFT JOIN Dependent
    WHERE DeptNo=EmpDeptNo AND EmpStat='Full'
    AND EmpNo=DpndEmpNo AND EmpPos='Mgr'

Figure 8.6 WHERE clause cannot replace all ON clause uses.

Department view (Department over Employee over Dependent):
    SELECT *
    FROM Department, Employee, Dependent
    WHERE DeptNo(+)=EmpDeptNo
    AND EmpNo(+)=DpndEmpNo

Employee view (Employee over Department and Dependent):
    SELECT *
    FROM Department, Employee, Dependent
    WHERE DeptNo=EmpDeptNo(+)
    AND EmpNo(+)=DpndEmpNo

Figure 8.7 Old-style outer joins can perform limited data modeling.

8.12 The New Role of the Inner Join Operation


Originally, the inner join operation was used in every join condition—there
was no other choice available. A semantically neutral structure was always pro-
duced, whether this was desired or not. With the addition of one-sided outer
join operations (LEFT and RIGHT), which specify hierarchical relationships,
inner joins take on a new meaning and use. They no longer should be used
without regard to data relationships or data structures. With one-sided joins
specifying hierarchical relationships, inner joins should only be used to specify
relationships that are truly meant to represent equal or balanced relationships.
This will produce semantically structured results that accurately reflect the
semantics of the data being accessed, which produces more accurate results. So,
inner joins have been elevated from not being able to definitively specify a
relationship to being able to unambiguously specify an equal or balanced
relationship.

8.13 Conclusion
This chapter has presented empirical proof that outer join statements can per-
form data modeling and structure processing, and demonstrated that views
containing structures and logical tables can be used seamlessly in building and
modeling complex data structures. It pointed out that because this data model-
ing capability is possible with standard SQL statements, it can be used safely,
can maintain its usefulness with SQL:1999, and can also become a default
standard for database data modeling. It was shown how data modeling outer
joins can be generated by constructing them while following the hierarchical
data structure, and that it was possible to use older nonstandard-style outer
joins to model simple data structures. Finally, this chapter discussed the impor-
tance of SQL’s inherent data structure processing ability, and how the inner
join’s role and proper use has changed with the addition of the outer join.
Part III
New Capabilities Based on Outer Join
Data Modeling
Part III describes advanced SQL capabilities made possible by the standard
SQL outer join data modeling capability that SQL vendors can offer to users.
Chapter 9 introduces the data structure extraction (DSE) technology used to
extract the data structure information naturally embedded in standard SQL
outer join statements. Chapter 10 identifies a number of advanced capabilities
made possible by the data modeling capability of the standard SQL outer join.
Chapter 11 describes the many powerful semantic SQL optimizations that are
possible based on the data modeling information available from outer joins.
Chapter 12 demonstrates a hierarchical relational processor prototype that
operates by utilizing the data structure information from outer join statements.
Chapter 13 presents an object relational interface that is based on the data
structure information from outer join specifications. Chapter 14 looks at
nonrelational SQL-based universal data access frameworks and how outer join
processing naturally fits in by using a structured data record interface as an
example.

9
Data Structure Extraction (DSE)
Technology
Advanced Data Access Technologies, a company affiliated with the author, has
been researching the standard SQL join operation for a number of years. It
realized that the outer join operation, which is part of the SQL standard,
combines with standard SQL’s powerful syntax to produce powerful data
modeling and data structure processing capabilities. Since SQL previously had
no inherent data modeling and data structure processing capabilities, Advanced
Data Access Technologies also realized this would be of significant benefit to
users and vendors if recognized, understood, and properly utilized.

9.1 Extracting Data Structure Information From the Outer Join


After researching and documenting the standard SQL join and its data model-
ing and data structure processing capabilities, Advanced Data Access Technolo-
gies developed and patented a data structure extraction (DSE) technology and
software. This technology dynamically recovers the data modeling metadata
embedded in outer join specifications. This technology makes it possible for
SQL vendors to utilize the powerful standard SQL join syntax and semantics to
support advanced new capabilities not previously possible. The following chap-
ters demonstrate examples of the technology described in this chapter. The
hierarchical relational processor example in Chapter 12 is taken from its actual
implementation.
A very valuable characteristic of this DSE technology is that it recovers
useful semantic information that is naturally present in standard
SQL join specifications. Using this freely available information, advanced
capabilities are possible. These include advanced semantic optimizations,
intuitive multitable updates, truly transparent and seamless access to legacy and
nonrelational databases, increased flexibility and accuracy in reporting capabili-
ties, and important object-oriented database capabilities such as database
navigation and data inheritance. These capabilities are discussed further in
Chapter 10, and can result in competitive advantages that are standard SQL
compatible, consistent with relational technology, and require little or no addi-
tional effort on the part of the user.

9.2 DSE Example


The example in Figure 9.1 demonstrates the DSE software processing a com-
plex standard SQL outer join statement. It accepts the SQL statement, produc-
ing the extracted data structure meta information in table form. The data
structure diagram in this example is not produced by the DSE algorithm, but is
supplied to help you visualize the data structure. The processed SQL statement
in this example is a complex standard SQL join specification that contains a
combination of left- and right-sided nesting to demonstrate that this complex
syntax can be handled properly by the DSE technology.
As shown in Figure 9.1, the DSE technology extracts and presents in table
form the data structure meta information that is naturally embedded in standard
SQL join specifications. The standard SQL join is rich in syntax
and processing options, allowing the user the flexibility to combine tables of
data in any way necessary to produce the desired semantic result. This results in
complex data structures being modeled even though the standard SQL join
programmer may not realize that he or she is performing data modeling.

SELECT * FROM
  A LEFT JOIN B ON A1=B1
  LEFT JOIN
    C LEFT JOIN D ON C1=D1
  ON A2=C2

Data structure:

      A
     / \
    B   C
        |
        D

Produces the data structure information table:

Table No.   Table Name   Structure Level   Parent No.
1           A            1                 0
2           B            2                 1
3           C            2                 1
4           D            3                 3

Figure 9.1 SQL DSE example.

The DSE technology dynamically determines the data structure by ana-
lyzing and interpreting how the outer join statement has been specified, taking
into account the table relationships used and general hierarchical data structure
concepts and principles that were discussed in Chapters 5 and 6. This data
structure extraction is accomplished with no additional or supplemental infor-
mation supplied by the programmer or SQL system other than what is nor-
mally available. This makes capabilities supported by the DSE technology
seamless and transparent. The DSE technology also detects invalid structures
(see Chapter 6), and can operate dynamically for use with ad hoc (i.e., interac-
tive) and object-oriented uses (i.e., late binding).
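
The output table in Figure 9.1 can be treated as ordinary data. The following Python sketch is not the patented DSE software; it only shows one way a consumer of the table might hold the rows and confirm that they describe a single top-down hierarchy. The names StructureRow, check_hierarchy, and children_of are hypothetical, and the check assumes a table without logical tables (no Structure Level of zero).

```python
from typing import NamedTuple


class StructureRow(NamedTuple):
    table_no: int
    name: str
    level: int
    parent_no: int  # 0 means "no parent" (the root table)


# Rows copied from the data structure information table in Figure 9.1.
FIGURE_9_1 = [
    StructureRow(1, "A", 1, 0),
    StructureRow(2, "B", 2, 1),
    StructureRow(3, "C", 2, 1),
    StructureRow(4, "D", 3, 3),
]


def check_hierarchy(rows):
    """Return a list of problems; an empty list means the rows form one tree."""
    by_no = {r.table_no: r for r in rows}
    problems = []
    roots = [r for r in rows if r.parent_no == 0]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")
    for r in rows:
        if r.parent_no == 0:
            if r.level != 1:
                problems.append(f"root {r.name} should be at level 1")
            continue
        parent = by_no.get(r.parent_no)
        if parent is None:
            problems.append(f"{r.name} points at missing parent {r.parent_no}")
        elif parent.table_no >= r.table_no:
            problems.append(f"{r.name} is listed before its parent")
        elif r.level != parent.level + 1:
            problems.append(f"{r.name} is not one level below {parent.name}")
    return problems


def children_of(rows, name):
    """Names of the tables directly below the named table."""
    no = next(r.table_no for r in rows if r.name == name)
    return [r.name for r in rows if r.parent_no == no]
```

For the Figure 9.1 rows, check_hierarchy returns an empty list and children_of(FIGURE_9_1, "A") returns B and C, which matches the diagram.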

9.3 Logical Table Example


To support logical tables, the DSE prototype is extended to represent a logical
table in the data structure by modifying its data structure meta information
output table while keeping it compatible with the standard format. To define a
logical table in the DSE prototype’s output, the structure level indication of the
first table in the logical table is set as usual to its hierarchical Structure Level in
the data structure. The other tables in the logical table have their Structure Lev-
els set to zero. This indicates and delimits a logical table entry.
The Parent No. of the first table to be joined in a logical table points to
the logical table’s parent in the hierarchical structure being defined. The Parent
No. for the other tables in the logical table specifies the table in the logical table
that directly precedes their joining. This indicates the logical table’s join table
order, which may be important for nonhierarchical logical tables. As shown in
Figure 9.2, the tables in a logical table are stored contiguously and in the order
they are joined. With this method of specifying logical tables, more than one
logical table can be represented in a data structure.
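
The encoding described above can be read back mechanically. The sketch below is a hypothetical reader, not part of the DSE prototype: it assumes only what this section states, namely that the first table of a logical table carries the real Structure Level, that the remaining members carry level zero, and that members are stored contiguously in join order. The rows are those of Figure 9.2; the function name group_logical_tables is an assumption.

```python
# Rows of Figure 9.2 as (table_no, name, structure_level, parent_no).
FIGURE_9_2 = [
    (1, "A", 1, 0),
    (2, "X", 2, 1),
    (3, "Y", 0, 2),
    (4, "Z", 0, 3),
    (5, "B", 3, 2),
]


def group_logical_tables(rows):
    """Group contiguous level-0 rows with the table that starts their logical table."""
    groups = []
    for table_no, name, level, parent_no in rows:
        if level == 0:
            if not groups:
                raise ValueError("a level-0 row cannot start the structure")
            # Level 0 marks a further member of the most recent logical table.
            groups[-1]["members"].append(name)
        else:
            groups.append(
                {"level": level, "parent_no": parent_no, "members": [name]}
            )
    return groups
```

Applied to the Figure 9.2 rows, this yields three entries: A at level 1, the logical table X, Y, Z at level 2, and B at level 3.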

SELECT * FROM A LEFT JOIN
    X INNER JOIN Y ON X=Y
      INNER JOIN Z ON Y=X
  ON A=Y
  LEFT JOIN B ON Z=B

Data structure:

        A
        |
    [X  Y  Z]
        |
        B

Produces the data structure information table:

Table No.   Table Name   Structure Level   Parent No.
1           A            1                 0
2           X            2                 1
3           Y            0                 2
4           Z            0                 3
5           B            3                 2

Figure 9.2 Logical table DSE example.

9.4 Symmetric Linking of Data Structures Example


Similar to the way logical tables can be formed by symmetric join operations as
shown in Section 9.3, data substructures can also be joined symmetrically, as
documented in Chapter 7. This is demonstrated in Figure 9.3. In this example,
the substructures are built inline, but they could have been expanded in
the same fashion as if they were referenced stored structure views. Because
substructures that are symmetrically joined can only be linked at their root table,
the example in Figure 9.3 covers the only situation possible for this type of
linking. Notice how the generated hierarchical structure meta information remains
top-down, indicating that linking of the root substructure tables X and M can
be performed before their associated substructures are built. So, this symmetric
data structure join is represented in the structure meta information the same
way that the logical table was in Section 9.3.

SELECT * FROM
  A LEFT JOIN
    X LEFT JOIN Y ON X=Y
    FULL JOIN
    M LEFT JOIN N ON M=N
    ON X=M
  ON A=XM

Data structure:

          A
          |
      X ----- M     (FULL JOIN)
      |       |
      Y       N

Produces the data structure information table:

Table No.   Table Name   Structure Level   Parent No.
1           A            1                 0
2           X            2                 1
3           M            0                 2
4           Y            3                 2
5           N            3                 2

Figure 9.3 Symmetric data structure linking DSE example.

9.5 DSE Internal Logic


As should be apparent by now, the standard SQL outer join has the syntax and
semantics necessary to define and process complex data structures. This
includes ON clauses, which specify the join condition at each join point. To
extract the data structure meta information from the complex syntax and
semantics used to define data structures requires parsing the join statement and
mapping the data structure as the statement is processed. The LEFT and
RIGHT joins specify the hierarchy between the two table arguments, and the
ON clauses specify the link point between the two table arguments. With
LEFT joins, the left table argument has the upper position, and with RIGHT
joins, the right table argument has the upper position.
As mentioned many times already, right-sided nesting triggered by delay-
ing ON clauses requires stacking the join table arguments and join type. When
an ON clause is encountered while parsing the join statement, its matching
right and left table arguments on top of the stack are linked using the ON
clause criteria as defined in Chapter 6. At times during the parsing process,
multiple separate structures can be defined because of right-sided nesting,
which starts a new substructure and working set to contain it, as described in
Chapter 7. But at the completion of parsing the join statement, all the separate
structures will have been combined so that only one structure will have been
mapped. This mapped structure is then represented in table form, as shown in
Figures 9.1 to 9.3.
When a symmetric join operation such as a FULL, INNER, or CROSS
join is detected, the existence of logical tables and symmetrically joined data
structures is checked. If found, they are processed as described in Sections 9.3
and 9.4 to produce a valid hierarchical data structure. All tables joined in a logi-
cal table are given the same hierarchical level number, which identifies a flat
logical table. Symmetrically joined substructures are reordered so their
root-level symmetric join is performed first, making it a logical table and
defined as just stated above. With this logical table in place, symmetrically
joined structures do not require any other special definition in the produced
data structure meta information.
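
The stacking behavior described in this section can be illustrated with a deliberately small sketch. It is not the patented DSE implementation: it handles LEFT JOIN only, treats each table as a single token, expects ON conditions written without spaces (as in Figure 9.1), and assumes that a column name is its table name followed by digits, so that A1 belongs to table A. The function name extract_structure and the rule that an ON clause must link to the root of the right-hand structure are assumptions made for the illustration.

```python
import re


def extract_structure(sql):
    """Return (table_no, name, level, parent_no) rows for a LEFT JOIN statement."""
    from_clause = re.split(r"\bFROM\b", sql, maxsplit=1, flags=re.IGNORECASE)[1]
    tokens = from_clause.split()
    order = []   # table names in order of appearance
    parent = {}  # table name -> parent table name (None for a root)
    stack = []   # pending structures: {"root": name, "tables": set of names}
    i = 0
    while i < len(tokens):
        word = tokens[i].upper()
        if word == "LEFT":
            if i + 1 >= len(tokens) or tokens[i + 1].upper() != "JOIN":
                raise ValueError("expected JOIN after LEFT")
            i += 2
        elif word == "ON":
            if i + 1 >= len(tokens) or len(stack) < 2:
                raise ValueError("ON clause without two join arguments")
            # The two structures on top of the stack are the join arguments.
            right = stack.pop()
            left = stack.pop()
            linked = {col.rstrip("0123456789") for col in tokens[i + 1].split("=")}
            upper = linked & left["tables"]
            lower = linked & right["tables"]
            if len(upper) != 1 or lower != {right["root"]}:
                raise ValueError(f"cannot link structures with {tokens[i + 1]}")
            # LEFT JOIN: the left argument takes the upper position.
            parent[right["root"]] = upper.pop()
            stack.append(
                {"root": left["root"], "tables": left["tables"] | right["tables"]}
            )
            i += 2
        else:
            name = tokens[i]
            order.append(name)
            parent[name] = None
            stack.append({"root": name, "tables": {name}})
            i += 1
    if len(stack) != 1:
        raise ValueError("statement ended with unlinked structures")
    number = {name: n for n, name in enumerate(order, start=1)}
    level = {}
    rows = []
    for name in order:  # a parent always appears before its children here
        p = parent[name]
        level[name] = 1 if p is None else level[p] + 1
        rows.append((number[name], name, level[name], 0 if p is None else number[p]))
    return rows
```

Run on the Figure 9.1 statement, the sketch reproduces the four rows of that figure's table. The delayed ON A2=C2 clause finds the A and B structure and the C and D structure on the stack and links C under A, which is the right-sided nesting case described above.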

9.6 Why Vendors Need the DSE Technology


Adding new features and capabilities to SQL products to differentiate them
from other similar products on the market is a necessity for SQL product
vendors, but presents the problem of introducing nonstandardized, proprietary
SQL. The DSE technology is a building block technology that allows the easy
addition of powerful new standard SQL–compatible features and capabilities
that eliminate or greatly reduce this nonstandardization problem. It can
also significantly help with the problem of poor efficiency with the standard
SQL outer join operation, and in many cases can bring its efficiency up to that
of the older standard inner join. Outer join specifications with questionable
(i.e., ambiguous) data structure semantics are also detected. Lastly, with this
data structure meta information freely available, it makes good business sense
to put it to use.

9.7 DSE Avoids Imposing Data Structures on SQL


The concept and technique of using SQL for universal data access is quite well
accepted and utilized. This includes using SQL to access pre- and postrelational
data. Flat nonrelational structures do not present a problem, but structured
nonrelational structures do introduce the problems of data mapping and data-
base navigation, which require access to data structure meta information. Up
until the availability of the DSE technology, specifying or communicating the
data structure meta information to a SQL-based nonrelational processor had to
be performed externally to the SQL access request.
This method of externally supplying the data structure meta information
has two obvious problems. First, its specification and transport are proprietary.
Second, it does not necessarily reflect the true semantics of the SQL it is sup-
posed to be modeling. This is because the SQL specification is often limited to
inner joins, which can only model flat data structures. This results in a mis-
match between the flat SQL-defined structure and the very structured exter-
nally supplied data structure meta information, preventing a totally seamless
interface. If the SQL specification is composed of outer joins that are modeling
the true physical data structure, the externally supplied data structure meta
information is not necessary. This is because the DSE technology can
automatically supply this meta information when needed and do it using a
standard SQL solution. This naturally extends the plug-and-play capabilities of
standardized SQL.
There is a third, less obvious problem lurking when imposing a data
structure on a SQL specification. This occurs when the SQL specification con-
tains one-sided outer join operations that do not model the externally supplied
data structure meta information. In this case, there can be a conflict between
the externally supplied data structure meta information and the data structure
being naturally modeled by the SQL specification. This will produce semantics
that do not match either the SQL specification or the imposed externally
supplied data structure meta information. This mismatch will often produce
erroneous results. The best solution all the way around is to use the natural data
modeling capability of outer joins and the DSE technology to supply the data
structure meta information wherever and whenever it is needed. Since the DSE
technology is deriving the data structure meta information directly from the
SQL, its data structure meta information is always accurate, with little or no
chance for error.

9.8 Conclusion
The DSE technology proved that it is possible to dynamically extract the data
structure meta information embedded in standard SQL join specifications.
These hierarchical data structures can also utilize nonhierarchical, symmetric
join operations in their definition to support logical tables and symmetric sub-
structure joins. What makes this technology unique is that it is fully standard
SQL compatible (both syntactically and semantically), which enables SQL fea-
tures not previously possible with standard relational databases. It was also
shown why this technology offers the best solution to supplying data structure
meta information to SQL-based data access drivers and processors.
The following chapters will demonstrate how this dynamically supplied
meta information provided by the DSE technology can be utilized to create
new products and features. These features include powerful semantic
optimizations, seamless legacy access, object capabilities, postrelational process-
ing, and plug-and-play capabilities.
10
Outer Join Advanced Capabilities
This chapter presents advanced capabilities that SQL vendors can implement
for their users by utilizing the data modeling and data structure processing
capabilities of the standard SQL outer join operation. The advanced capabili-
ties are made possible by dynamically extracting the data structure meta infor-
mation from standard SQL outer join specifications. This data structure meta
information is free information, placed in the outer join specification either
knowingly or unknowingly by the programmer of the outer join specification.
It can be extracted for the SQL product’s use by a DSE procedure like the one
documented in Chapter 9. With this information, the advanced database capa-
bilities covered in this chapter are possible.

10.1 Database Navigation


Database navigation is not useful by itself, but is required to accomplish many
of the advanced capabilities presented in this chapter. Database navigation is
the ability to move through the database utilizing its data structure. With rela-
tional databases, this is not necessary since they are navigationless, not requiring
manual navigation. In other words, the database system automatically navigates
for the user, which is standard for fourth-generation languages (4GLs) like
SQL. There is a trade-off with navigationless access—you lose control, but the
access can still be highly optimized.
Obtaining the meta information extracted from the outer join specifi-
cation enables navigational instructions to be generated for nonrelational
access, as demonstrated in Figure 10.1. These navigation instructions can be
optimized since the entire portion of the data structure being accessed can be
determined before being accessed. These navigational instructions can be used
to access any database that supports hierarchical access. The extracted data
structure can be a logical structure composed of more than one physical type of
database so that support for disparate heterogeneous databases and enter-
prise-wide access is also possible. When navigating physical databases, the order
of sibling legs, such as B before C in Figure 10.1, may be important. It is useful
to realize that the database navigation process described here can be performed
dynamically.
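
As a rough illustration of navigation driven by the extracted structure, the sketch below walks an in-memory stand-in for a hierarchical database in top-down, left-to-right order, which is the order suggested by the GetFirst and GetNext pseudo code in Figure 10.1. The navigate function, the dictionary layouts, and the sample rows are all hypothetical; a real nonrelational database would be reached through its own call interface.

```python
# Structure for SELECT * FROM A LEFT JOIN B ON A=B LEFT JOIN C ON A=C:
# each table maps to its child tables in sibling order (B before C).
STRUCTURE = {"A": ["B", "C"], "B": [], "C": []}

# Hypothetical sample data: table -> list of (key, parent_key, payload).
DATA = {
    "A": [(1, None, "a1"), (2, None, "a2")],
    "B": [(10, 1, "b1")],
    "C": [(20, 1, "c1"), (21, 2, "c2")],
}


def navigate(structure, data, table, parent_key=None):
    """Yield (table, payload) pairs in hierarchical order."""
    for key, row_parent, payload in data.get(table, []):
        if row_parent != parent_key:
            continue  # row belongs to a different parent occurrence
        yield table, payload
        # Visit each sibling leg in the order given by the structure.
        for child in structure[table]:
            yield from navigate(structure, data, child, key)
```

Calling list(navigate(STRUCTURE, DATA, "A")) visits a1, then its B and C rows, and only then moves on to a2, so the sibling order B before C is preserved.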

10.2 Access Optimizations


The data structure semantics that are derived by the extracted data structure
meta information from the outer join specification can be used by the database
engine to perform many powerful semantic optimizations that are not possible
otherwise. The most significant is the dynamic removal of unnecessary tables
from outer join views based on which table columns are selected at view invoca-
tion. This is demonstrated in Figure 10.2, where the dashed blocks represent
tables that do not require access. This optimization is not possible for inner
joins views, which must always access each table in the view, but it is possible
for outer join views taking into consideration where each table in the view
is located in the data structure. This optimized view capability dynamically
“downsizes” outer join views, so there is never a penalty for including too many
tables in a view. In fact, this feature should reduce the number of views neces-
sary, making life easier for database professionals and end users querying the
database. This and many other powerful outer join optimizations are covered
further in Chapter 11.
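
The structural part of this optimization can be sketched in a few lines. The sketch assumes that a table in an outer join view needs to be accessed only when it is referenced itself or lies on the path from the root to a referenced table; this is an interpretation of Figure 10.2, not the optimizer described in Chapter 11. The names tables_to_access and VIEW_ABC are hypothetical, and tables referenced in a WHERE clause would have to be counted as referenced as well.

```python
# Parent of each table in ViewABC (A LEFT JOIN B ... LEFT JOIN C ON A=C).
VIEW_ABC = {"A": None, "B": "A", "C": "A"}


def tables_to_access(parent_of, referenced):
    """Return each referenced table plus all of its ancestors."""
    needed = set()
    for table in referenced:
        # Climb toward the root, stopping early on an already-needed path.
        while table is not None and table not in needed:
            needed.add(table)
            table = parent_of[table]
    return needed
```

For SELECT C FROM ViewABC this gives A and C, so B can be skipped; for SELECT A FROM ViewABC only A remains. In plain SQL, dropping a left-joined table can also change how often parent rows repeat in the result, so this shows only the structural decision under the hierarchical semantics the chapter assumes.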

Generic Database Access

SQL Outer Join:               Pseudo Code:
SELECT *                      GetFirst A
FROM A                        GetFirst B,C
LEFT JOIN B ON A=B            GetNext B,C
LEFT JOIN C ON A=C            GetNext A

Data structure:               Database types:
      A                       Legacy
     / \                      Enterprise
    B   C                     Object

Figure 10.1 The outer join can enable universal database navigation and access.

CREATE VIEW ViewABC AS
SELECT * FROM A LEFT JOIN B ON A=B LEFT JOIN C ON A=C

SELECT C                      SELECT A
FROM ViewABC                  FROM ViewABC

      A                             A
     / \                           / \
    B   C                         B   C

[The dashed blocks of the original figure are not reproduced.]

Figure 10.2 Outer join view dynamic optimization based on selection criteria.

10.3 Enterprise and Legacy Database Access


The outer join syntax is not limited or tied to relational databases. By using the
database navigation ability described earlier in Section 10.1, enterprise, legacy,
and postrelational databases can be accessed in any combination by utilizing
the data modeling capabilities of the standard SQL outer join syntax. This is
demonstrated in Figure 10.3, and can be performed dynamically via user inter-
action to support ad hoc queries. Since the outer join can precisely define hier-
archical structures, only one-to-one mapping is necessary to access hierarchical
nonrelational databases, allowing efficient and truly seamless access. And since
the data structure definition can be specified dynamically using the outer join
syntax, and supplied dynamically by the DSE procedure, no external
predefined data structure definition is necessary. With the data structure meta
information in hand, nonrelational database calls or language statements
can be dynamically constructed and performed. This was demonstrated in
Figure 10.1. For more detailed information on nonrelational access, see
Chapter 14.

SELECT *                      SELECT *
FROM A                        FROM C
LEFT JOIN B ON A=B            LEFT JOIN A ON A=C
LEFT JOIN C ON A=C            LEFT JOIN B ON A=B

IMS:  A    SQL:               SQL:  C
     / \                            |
    B   C                     IMS:  A
                                    |
                                    B

Figure 10.3 Disparate database access is possible with the outer join.

Nonrelational data access can actually be made more efficient using SQL.
Since SQL is a 4GL, also known descriptively as a declarative language, its
access statements do not instruct how to access the database, but rather what is
desired from the database. This means that all the information needed to know
how to access the database is determined by a query optimizer, allowing an effi-
cient global access strategy to be developed. Because of this, very efficient access
can be achieved, as in the example in Figure 10.2, which can also be applied to
nonrelational databases. Nonrelational optimized SQL access is described in
more detail in Chapter 11, and nonrelational heterogeneous SQL access is
described further in Chapter 14.

10.4 Open Database Access Interface


The standard SQL outer join operation makes a powerful “open” database
access interface because it is supported by most SQL vendors, it is standardized,
and its syntax is free to use. It can also perform complex ad hoc data structure
processing and define access for most database types, and it automatically car-
ries the data structure meta information within it, making it very useful for
database access over the Internet. These features make the data structure meta
information available to all procedures that process the outer join, as illustrated
in Figure 10.4. By carrying the data structure meta information within it, the
outer join interface avoids passing this information around using an arbitrary
method and format. This also enables the standardization of powerful plug-
compatible database components, allowing data structure meta information to
be mixed and matched.

SELECT *
FROM A
LEFT JOIN B ON A=B
LEFT JOIN C ON A=C

Data structure: A over B and C
Front Ends: ProdA, ProdB
Back Ends: ProdX, ProdY (RDBMS, Legacy)
Outer join syntax carries data structure

Figure 10.4 Outer join open database access interface.

10.5 Seamless Value-Added Features


The data structure modeling capability of the standard SQL outer join can sup-
port many value-added features in SQL that are based on the data structure
specified by the outer join operation. These include more accurate aggregate
functions that can occur anywhere in the data structure and do not include rep-
licated data values in the results, more flexible aggregate operations where the
range of input columns is controlled naturally by the data structure, and easing
of syntax limitations. An example of more flexible and accurate syntax is shown
in Figure 10.5. Summary results are taken at multiple locations in the data
structure, and the WHERE and HAVING clauses allow a two-level filtering
where rows can be filtered before being summed and then filtered on their
summed value. Additionally, the use of this advanced summary processing in
the HAVING clause has avoided the need for a nested SELECT statement.
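
The replicated data problem mentioned above is easy to reproduce in standard SQL. The sketch below uses Python's sqlite3 module with hypothetical tables and budget figures: one division with two products and three departments. Joining both legs flat multiplies the rows, so plain SUM counts each product budget three times and each department budget twice. The second query shows the nested SELECT workaround that standard SQL needs today; the SUM(... BY ...) syntax of Figure 10.5 is not standard SQL and is not used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Division   (DivId TEXT PRIMARY KEY);
    CREATE TABLE Product    (ProdId TEXT, DivId TEXT, ProdBudget INTEGER);
    CREATE TABLE Department (DeptId TEXT, DivId TEXT, DeptBudget INTEGER);
    INSERT INTO Division VALUES ('D1');
    INSERT INTO Product VALUES ('P1', 'D1', 100), ('P2', 'D1', 200);
    INSERT INTO Department VALUES ('X', 'D1', 10), ('Y', 'D1', 20), ('Z', 'D1', 30);
""")

# Flat join: 2 products x 3 departments = 6 rows, so values are replicated.
flat = conn.execute("""
    SELECT SUM(ProdBudget), SUM(DeptBudget)
    FROM Division
    LEFT JOIN Product    ON Division.DivId = Product.DivId
    LEFT JOIN Department ON Division.DivId = Department.DivId
""").fetchone()

# Summing each leg separately avoids the replication.
per_leg = conn.execute("""
    SELECT (SELECT SUM(ProdBudget) FROM Product
            WHERE Product.DivId = Division.DivId),
           (SELECT SUM(DeptBudget) FROM Department
            WHERE Department.DivId = Division.DivId)
    FROM Division
""").fetchone()

print(flat)     # inflated totals from the flat join
print(per_leg)  # totals taken per leg of the structure
```

With this data the flat query reports 900 and 120, while the per-leg query reports the true totals of 300 and 60.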

10.6 Data Warehouse Interface


Because data warehouses typically consist of massive databases, there are good
reasons for data modeling, data structure processing, and schema refinement.
Star schemas and snowflake schemas have emerged for data warehousing with
SQL technology.
The data warehouses built with SQL platforms can use standard SQL
data access interfaces, such as ODBC, JDBC™ and SQL/CLI. The outer join
syntax is accessible via those standard APIs, as is dynamic SQL, shown in Fig-
ure 10.6. With the outer join’s enterprise access capability discussed in Section
10.3, the data warehouse can be comprised of non-relational databases, too. In
addition, there are ODBC and JDBC™ drivers for nonrelational data stores
used for data warehousing, such as Apache Hive.

SELECT SUM(ProdBudget BY Division),
       SUM(DeptBudget BY Division)
FROM DivisionView
WHERE EmpStatus='Fulltime'
HAVING SUM(EmpSalary BY Department) > 500000

Data structure:

        Division
        /      \
   Product   Department
                 |
             Employee

Figure 10.5 Multiple summaries taken at different locations in the data structure.

[Figure: a data warehouse repository and several hierarchical views drawn from
it. Labels shown: Division, Product, Department, Employee, Dependent, Manager.]

Figure 10.6 The outer join can access unlimited views from data warehouse repository.

10.7 Hierarchical Relational Processing


Hierarchical relational processing is the processing by SQL of relational and
nonrelational data in a structured hierarchical fashion such as DOM parsing of
XML and COBOL structure processing. Normally, this required the data to be
stored in a nonfirst normal form (structured or nested format), doing away
with relational’s flat two-dimensional table limitation. Unfortunately, this
meant that the data structure was fixed and had to be defined beforehand. But
with the outer join’s data modeling and structure processing ability, this
hierarchical relational processing can also be performed on standard SQL systems by
processing standard first normal form tables as hierarchical data structures, and
without requiring that the data structure be predefined. The outer join specifi-
cation can specify and hierarchically process any possible hierarchical data
structure that relational data tables and fixed nonrelational databases can logi-
cally define. This feature can be considered data structure independence.
Outer join hierarchical relational processing operates seamlessly, and pre-
cisely matches its defined hierarchical semantics. This hierarchical relational
processing can perform powerful semantic operations, avoid unnecessary data
replications, support advanced summary functions, produce more accurate and
flexible summary operations, and display the data in a visual structured display
format that accurately reflects its data structure, as shown in Figure 10.7.
If this sounds too good to be true, a prototype using the DSE technology
described in Chapter 9 was built, and live examples from it are shown in
Chapter 12.
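
The difference between the two displays in Figure 10.7 can be shown by folding flat rows back into their hierarchy. This sketch is not the prototype from Chapter 12; it assumes a single Dept, Emp, Dpnd path and uses the three rows of the figure. The function names nest_rows and render are hypothetical.

```python
# The standard SQL display of Figure 10.7: one flat row per dependent.
FLAT_ROWS = [
    ("DeptA", "Mike", "Jason"),
    ("DeptA", "Mike", "Jane"),
    ("DeptA", "Mary", "Sam"),
]


def nest_rows(rows):
    """Fold flat (dept, emp, dpnd) rows into {dept: {emp: [dpnd, ...]}}."""
    nested = {}
    for dept, emp, dpnd in rows:
        dependents = nested.setdefault(dept, {}).setdefault(emp, [])
        if dpnd is not None:  # an outer join yields None when no dependent exists
            dependents.append(dpnd)
    return nested


def render(nested):
    """Return display rows in which replicated parent values are blanked out."""
    lines = []
    for dept, emps in nested.items():
        first_dept = True
        for emp, dpnds in emps.items():
            first_emp = True
            for dpnd in dpnds or [""]:
                lines.append(
                    (dept if first_dept else "", emp if first_emp else "", dpnd)
                )
                first_dept = first_emp = False
    return lines
```

Rendering the nested form prints DeptA and Mike once, with Jane on a line of her own, which is the nested relational display on the left of the figure.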

Data structure: Dept over Emp over Dpnd

Nested Relational Display          Standard SQL Display

Dept    Emp    Dpnd                Dept    Emp    Dpnd
DeptA   Mike   Jason               DeptA   Mike   Jason
               Jane                DeptA   Mike   Jane
        Mary   Sam                 DeptA   Mary   Sam

Figure 10.7 Hierarchical relational display compared to standard SQL display.

10.8 Object Relational Interface


One of the main problems that slowed adoption of object databases and
NoSQL databases was the lag time in developing standard query and program-
ming interfaces. A standard and familiar relational database interface would
make an excellent interface except for its total lack of data modeling and data
structure processing ability, which is an important requirement for object data-
bases. With the outer join and its data modeling and structure processing capa-
bility, it would make an excellent standardized and familiar hierarchy
processing interface, such as the one shown in Figure 10.8.
Besides being able to read and write complex relational and nonrelational
data structures directly, avoiding relational-to-object mapping, an object rela-
tional outer join interface can also support dynamic specification of the data
structure through dynamic execution. This enables late binding and polymor-
phism, support of data abstraction, reuse through its substructure view support
(described in Chapter 7), and the support of legacy database access as described
earlier in Section 10.3. The outer join object relational interface is covered in
more detail in Chapter 13.

SELECT *                          A           01 A Char 20
FROM A                           / \          10 B Char 20 Occurs …
LEFT JOIN B ON A=B              B   C         10 C Char 20 Occurs …
LEFT JOIN C ON A=C

SELECT *                          A           01 A Char 20
FROM A                            |           10 B Char 20 Occurs …
LEFT JOIN B ON A=B                B           20 C Char 20 Occurs …
LEFT JOIN C ON B=C                |
                                  C

Figure 10.8 Object relational interface can read and write structured data.

10.9 View Update Capability


Updating of join views is not usually supported in SQL. This is because multi-
ple tables are involved, making the join operation ambiguous for updating
since its join result is usually exploded because of the Cartesian product effect.
This makes it very difficult to know how to apply the result back to the under-
lying base tables. But when the outer join is used to define valid hierarchical
data structures, it can be possible to update multitable views unambiguously
and intuitively by following the unambiguous semantics of hierarchical data
structures. This also means that these same update semantics can be applied
seamlessly across a heterogeneous logical database composed of relational and
nonrelational databases.
An example of why the inner join view has difficulty being updated can
be seen in an inner join view consisting of the Department and Employee
tables. Updating this view is very difficult because of its ambiguous semantics.
If a department is deleted, are the employees also deleted? What happens if an
employee is deleted? Don’t be influenced by any meaning attached to the table
names—try renaming the tables X and Y. The reason for this ambiguity is that
there is no data structure semantics associated with the inner join. This was
described in Chapter 1. In contrast are hierarchical views, which can be created
by outer joins, such as those in Figures 10.9 and 10.10.

Department View        Employee View

    Dept                   Emp
     |                    /   \
    Emp                Dept   Dpnd
     |
    Dpnd

Delete: Dept

Figure 10.9 Deleting a department from different views produces different results.

Department View        Employee View

    Dept                   Emp
     |                    /   \
    Emp                Dept   Dpnd
     |
    Dpnd

Delete: Emp

Figure 10.10 Deleting an employee from different views produces different results.

Updating outer join views where the Department table is hierarchically
over the Employee table or the Employee table is hierarchically over the
Department table is not ambiguous. In Figure 10.9, the effects of deleting a
department in these two outer join views are intuitive. In the Department view,
the associated employees and dependents would also be deleted along with the
department. In the Employee view, only the affected department would be
deleted. In Figure 10.10, deleting an employee in the same two views as
Figure 10.9 has a different effect, which is also intuitive. In the Department
view, the employee and the associated dependents would be deleted, not the
associated department. In the Employee view, the employee and the associated
department and dependents would be deleted. All of these update operations
use the outer join’s defined hierarchical semantics, which are intuitive and fairly
universal.
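
The delete semantics described above follow one rule: deleting a row removes that row and everything hierarchically below it in the view. The sketch below applies that rule to the two views of Figures 10.9 and 10.10. It is an illustration of the rule at the table level only; the function name delete_scope and the dictionary encoding of the views are hypothetical.

```python
# Parent of each table in the two views of Figures 10.9 and 10.10.
DEPARTMENT_VIEW = {"Dept": None, "Emp": "Dept", "Dpnd": "Emp"}
EMPLOYEE_VIEW = {"Emp": None, "Dept": "Emp", "Dpnd": "Emp"}


def delete_scope(parent_of, target):
    """Tables affected by a delete on `target`: the target and all descendants."""
    affected = []
    queue = [target]
    while queue:
        table = queue.pop(0)
        affected.append(table)
        # Children are the tables whose parent is the table just visited.
        queue.extend(t for t, p in parent_of.items() if p == table)
    return affected
```

Deleting Dept through the Department view reaches Dept, Emp, and Dpnd, while the same delete through the Employee view reaches only Dept. Deleting Emp reaches Emp and Dpnd in the Department view, and Emp, Dept, and Dpnd in the Employee view, matching the outcomes described in the text.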

10.10 Multimedia Application Directory Support


Multimedia databases are more than standard databases with multimedia fea-
tures and capabilities. Multimedia databases are specialized. Their purpose is to
aid in the support of multimedia centric applications such as interactive kiosks.
This support extends not only to multimedia storage and playback, but also to
the production of the multimedia application—which can be extensive, con-
sisting of media acquisition, classification, and organization. To support these
functions, a hierarchical directory or modeling system is necessary to catalog
and organize the multitude of multimedia audio and video clips. Since multi-
media applications are usually interactive and user-driven, the flexibility of a
hierarchical structure organization is necessary.
As an example of such a multimedia application, Figure 10.11 shows
the database model and SQL definition of a video book. This book can be
viewed sequentially at several different academic levels, or as a reference
using hyperlinks from the contents or index to access the stored multimedia
data.
The application view in Figure 10.11 is an example of a simplified multi-
media application view. Its design allows for both the organized production of
the multimedia application and for the flexible interactive operation (i.e., play-
back) of the application. A clip shown in the data model is usually made up of a
sequential series of video frames and a scene can be made up of a series of clips.
A section can be made up of a number of scenes, and a chapter is composed of a
number of sections.
This data model allows the flexibility of rearranging portions of the video
very easily, and the access can be very efficient regardless of the number of
tables because of the outer join optimizations (covered in Section 10.2 and later
in Chapter 11). This model is general enough to handle many different multi-
media books, and they can be easily modified without having to change the
application that processes the data. For example, chapters and scenes can be
added, moved, or deleted without changing the multimedia application. Multi-
media databases supply this data independence. When multimedia applications
lack a database, the data structure is buried in the application, where its value is
lost. Multimedia databases organize multimedia around a data model, making it
available to many applications, thereby avoiding the time-consuming production
phase and increasing reuse of resources.

CREATE VIEW MMBook AS
SELECT * FROM Book
LEFT JOIN Contents ON BookX=ContentsX
LEFT JOIN Chapter ON BookX=ChapterX
LEFT JOIN Index ON BookX=IndexX
LEFT JOIN Section ON SectionX=ChapterX
LEFT JOIN Text ON SectionX=TextX
LEFT JOIN Audio ON SectionX=AudioX
LEFT JOIN Scene ON SectionX=SceneX
LEFT JOIN Clip ON SceneX=ClipX

Data structure:

               Book
          /      |      \
   Contents   Chapter   Index
                 |
              Section
           /     |     \
        Text   Audio   Scene
                         |
                        Clip

Figure 10.11 Multimedia book hierarchical directory example.

Multimedia authoring systems that assist the user in building interactive
multimedia applications are missing this type of multimedia database capability.
One reason for this is that they use only a single unchangeable operational
metaphor. In one such metaphor, the author of the multimedia
application is the director of a play, manipulating the multimedia components
as the cast and props around the stage, which is the screen. This works fine if
the metaphor matches the application, but can be awkward when it does not. A
solution is to integrate a multimedia database as described above into the multi-
media authoring system and use the data model defined by the author as
the operational metaphor. In this way, the operational metaphor and the
defined data model are tightly integrated, as are the playback and production
components.
This dynamic data modeling metaphor ability becomes more important
when it is realized that multimedia data is just a small subset of a larger classifi-
cation of data, known as abstract data or abstract data types (ADTs). Multi-
media databases and authoring systems can easily store and utilize all forms of
abstract data types, such as fingerprints, X-rays, EKGs, and MRIs. Applications
based on these abstract data types can be very different than multimedia
applications, but can still be data modeled in their own unique way using the
data modeling capability shown in Figure 10.11.

10.11 Universal Data Access of Structured Data


The SQL vendor community began work on a standard application program-
ming interface (API) in the 1980s with the development of embedded SQL.
That was followed by an initiative to develop a standard SQL call-level inter-
face (SQL/CLI). Microsoft leveraged some of that work to create Open Data-
base Connectivity (ODBC), which was aligned with the international standard
SQL/CLI in 1995. Sun leveraged the same SQL language and data types used
by ODBC and SQL/CLI when creating JDBC™ for Java database access.
This has resulted in widespread adoption of ODBC and JDBC™ for SQL data
access. Both the ODBC and JDBC™ APIs support the retrieval of metadata
about database capabilities and query result sets. Although other APIs for SQL
data access have emerged, including OLE DB, SQLJ, DAO, RDO, and
ADO.NET, ODBC and JDBC™ have seen widespread adoption that gives
them a long shelf life. Microsoft supports ODBC for enterprise data access and
for access to cloud databases.
The metadata capabilities of the ODBC and JDBC™ APIs are
augmentable by using the SQL query itself to supply data structure meta infor-
mation for processing hierarchies. This provides an efficient one-to-one map-
ping, with the data structure automatically mapping accurately.
This method utilizes the enterprise and legacy access, and open database access
interfacing capabilities described earlier in Sections 10.3 and 10.4.
The diagram in Figure 10.12 demonstrates graphically how the data
structure meta information is automatically passed from the universal data
access platform to the data provider component that performs the structured
data access. The data provider component uses the data structure extraction
technology described in Chapter 9 to retrieve the data structure meta information
from the SQL specification. Chapter 14 goes into this topic in more detail.

[Figure 10.12 diagram labels: External Data Definition; Data Modeling SQL
Generation; Data Modeling SQL; UDA Product; OLE DB, ODBC, JDBC; Data
provider/driver; Data structure extraction. Legend: SQL, Meta Data.]

Figure 10.12 Integrating external data definitions with data modeling SQL.

It is important to realize that the standard SQL join data modeling capa-
bility is based totally on the outer join’s standard syntax and semantics. This
data modeling capability exists inherently in the ANSI/ISO SQL standard, and
is operating automatically all the time. This means that any other approach
used to supply the data structure of a SQL query could be in conflict with the
data modeling occurring naturally with externally supplied outer join specifica-
tions, and this could produce incorrect results.
This data structure conflict can be eliminated by generating data model-
ing SQL from the externally supplied data definition, thereby introducing SQL
that accurately models the data structure, and from which the data structure
can be extracted at any time and location. The diagram in Figure 10.12 demon-
strates this system design.

10.12 The SQL XML Data Structure Connection


The Internet experienced explosive growth after the World Wide Web Consortium
(W3C) published a specification for the Hypertext Markup Language
(HTML). The W3C continues to evolve XML, just as the International Organization
for Standardization (ISO) has continued to evolve the SQL specification. The
W3C has produced several important specifications related to the Extensible
Markup Language (XML) and other specialized vocabularies for operating with
structured data. They include the Document Object Model (DOM), XML
Schema, and Resource Description Framework (RDF) specifications.
XML specifies how to mark up or tag document content so it is more eas-
ily understood than free form text. It provides an industry standard format for
self-defining, structured data. XML has become a de facto standard that’s
widely used for the storage, processing and interchange of data.
Handling XML content can involve processing XML-compliant tagged
data contained in files, databases or messages. The format for XML documents
is a hierarchy and the W3C DOM specification defines how to build an
in-memory representation of the hierarchical document structure after parsing
the XML content.
The data in databases and structured data in XML containers can be
moved back and forth using SQL with its join data modeling capability. This is
shown in Figure 10.13. Notice that the data is stored with its meta structure
definition. Any hierarchical structure can be specified with an XML definition.
The Employee view was chosen in this example to demonstrate how multiple
legs and multiple levels can be specified. The elements of the XML definition
are nested by following the hierarchical structure.

SQL Employee View:

SELECT *
FROM Employee
LEFT JOIN Department ON DeptKey=EmpDeptKey
LEFT JOIN Dependent ON EmpKey=DpndEmpKey

Employee Data:

Emp   Dept   DpndF  DpndL
Mike  DeptA  Jay    Roe
             Jane   Doe
Mary  DeptA  Sam    Foe

XML Employee Definition:

<?xml version='1.0'?>
<EmpView>
  <Employee>
    <Emp>Mike</Emp>
    <Department>
      <Dept>DeptA</Dept>
    </Department>
    <Dependent>
      <DpndF>Jay</DpndF>
      <DpndL>Roe</DpndL>
    </Dependent>
    :
</EmpView>

Figure 10.13 Structured data can be moved accurately between SQL and XML.

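The nesting shown in Figure 10.13 can be sketched in a few lines of Python using only the standard sqlite3 and xml.etree.ElementTree modules. This is a minimal illustration, not the mechanism of any particular product: the table layouts, the key values, the DpndKey column, the assignment of dependent Jane Doe to Mike, and the helper name employee_view_to_xml are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

def employee_view_to_xml(conn):
    """Nest the flat rows of the Employee outer join view into XML (hypothetical helper)."""
    rows = conn.execute(
        "SELECT EmpKey, Emp, Dept, DpndF, DpndL FROM Employee "
        "LEFT JOIN Department ON DeptKey = EmpDeptKey "
        "LEFT JOIN Dependent ON EmpKey = DpndEmpKey "
        "ORDER BY EmpKey, DpndKey"
    ).fetchall()
    root = ET.Element("EmpView")
    employees = {}  # EmpKey -> <Employee> element, so replicated rows collapse
    for emp_key, emp, dept, dpnd_f, dpnd_l in rows:
        node = employees.get(emp_key)
        if node is None:
            node = ET.SubElement(root, "Employee")
            ET.SubElement(node, "Emp").text = emp
            if dept is not None:  # the Department leg may be missing
                dept_node = ET.SubElement(node, "Department")
                ET.SubElement(dept_node, "Dept").text = dept
            employees[emp_key] = node
        if dpnd_f is not None:  # the Dependent leg may be missing
            dpnd_node = ET.SubElement(node, "Dependent")
            ET.SubElement(dpnd_node, "DpndF").text = dpnd_f
            ET.SubElement(dpnd_node, "DpndL").text = dpnd_l
    return root

# Assumed sample data loosely based on the Employee Data table in Figure 10.13.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptKey INTEGER, Dept TEXT);
CREATE TABLE Employee (EmpKey INTEGER, Emp TEXT, EmpDeptKey INTEGER);
CREATE TABLE Dependent (DpndKey INTEGER, DpndF TEXT, DpndL TEXT, DpndEmpKey INTEGER);
INSERT INTO Department VALUES (1, 'DeptA');
INSERT INTO Employee VALUES (1, 'Mike', 1), (2, 'Mary', 1);
INSERT INTO Dependent VALUES (1, 'Jay', 'Roe', 1), (2, 'Jane', 'Doe', 1), (3, 'Sam', 'Foe', 2);
""")
xml_root = employee_view_to_xml(conn)
xml_text = ET.tostring(xml_root, encoding="unicode")
print(xml_text)
```

Each Employee element is created once and its Department and Dependent legs are attached beneath it, so the row replication of the flat join result does not appear in the XML.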
The XML and SQL capability to define and process hierarchical struc-
tured data has great utility value. One important use is to dynamically transfer
data from databases to Web servers, business-to-business (B2B) applications
and integration servers. This technique is greatly improved by SQL’s ability to
dynamically transfer structured data from any combination of database sources
into an XML container, where it can be served as XML or rendered for display
as HTML. As shown in Figure 10.14, SQL is invoked by the browser to trans-
fer data into the Web site in XML format.
Other important use cases include archiving and data replication. Because
XML data is tagged when it is exported from SQL databases, it is self-defining,
a very useful property for data archives. Because SQL database products can
import and export XML, it’s a viable solution for replication across disparate
SQL database platforms.
Another use of SQL for web content is a new capability made possible by
XML. It is the capability to treat XML web content as a database, with SQL
capable of accessing structured XML data along with other databases for
retrieval or even update, as shown in Figure 10.15. This means that web sites
with static XML content do not have to be a closed system accessible only by a
browser. The content can be accessible to disparate and heterogeneous data
access by a wide variety of SQL client software.

[Figure 10.14 diagram labels: User Browser; Any Database; DB; Data; SQL
Structured Data Processor; SQL Request; XML WEB Page.]

Figure 10.14 SQL can move structured data dynamically into an XML Web site.

[Figure 10.15 diagram labels: SQL Structured Data Request; Any Database; DB
Request; SQL Structured Data Processor; Direct Access; XML WEB Page; Data;
XML Data; Result.]

Figure 10.15 SQL can treat XML Web sites like any other database.


XML structured data is hierarchically structured, usually contiguous,
data. XML documents conform to a W3C specification that defines details
such as tagging of elements and attributes. They may also conform to the W3C
XML Schema recommendation. For this reason, the XML document is analo-
gous to structured data stored in files as records and can be accessed in the same
fashion. SQL-based structured data access is shown in Chapter 14 and can be
easily adapted to handle XML data.
XML data defining a hierarchically structured document or data located
in a Web page can be considered a contiguous structured record that we will
call “a structured Web record.” This structured Web record has data structure
control information embedded in the data just as a structured file record does.
Like a structured record in a file, a structured Web record can be combined
with other types of database data to form a larger heterogeneous hierarchical
structure.
Structured records are located or addressed by a root-key field value. This
can be accomplished with structured Web records by assigning their root-key
field value as the Web page URL address. In this way, a structured Web page
can be directly addressed by SQL or joined to from other record types in the
heterogeneous virtual structure using their foreign-key field value.

10.13 Conclusion
The data structure meta information that is extracted by the DSE technology is
extremely valuable. It has the potential of supporting many powerful new SQL
features and capabilities not previously possible. Many of these were identified
in this chapter, such as optimization, object relational interface support, view
update capability, hierarchical relational processing, seamless legacy database
access, and direct access to XML Web sites. The main enabler of these capabili-
ties is the database navigation and processing of data structures. While these are
global solutions, there is also the potential for specific solutions or features that
can extend or complement individualized products.
11
Outer Join Optimization
The standard SQL join operation is more difficult to optimize with its ON
clauses and outer join operations than the simpler common inner join. With
the common inner join, its tables can be freely reordered to best optimize
access. With the standard SQL join, this ability is constrained by its ON
clauses. Working within the constraints of the ON clauses, INNER and FULL
joins can each be reordered in any order because they are both commutative
and associative in operation. The one-sided outer join is not commutative; its
tables cannot be freely reordered. But hierarchictivity can play a role in optimi-
zation. This chapter explores the hierarchical semantics of the one-sided outer
join for use in optimization.

11.1 Join Table Reordering


With the outer join, some table reordering is possible and recommended for
efficiency. Take for example the Department view, which can be built top-
down or bottom-up. Normally, hierarchical structures are built top-down, but
when subviews are used, as were shown in Chapter 7, right-sided nesting can
cause the structure to be built bottom-up. Top-down execution is more
efficient than bottom-up execution because bottom-up execution can cause
throwaways. Throwaways are rows that are retrieved into the working set
and then later discarded. For example, using the data structure shown in
Figure 11.1, throwaways occur when the Dependent table is joined with the
Employee table and the result is then joined with the Department table, where
unmatched employees are discarded with their dependents. These dependents
are throwaways.

Department View structure: Department -> Employee -> Dependent

SQL Expanded (Bottom-Up):

SELECT * FROM Department LEFT JOIN
  Employee LEFT JOIN Dependent
  ON EmpNo=DpndEmpNo
  ON DeptNo=EmpDeptNo

SQL Rewritten (Top-Down):

SELECT * FROM Department
LEFT JOIN Employee ON DeptNo=EmpDeptNo
LEFT JOIN Dependent ON EmpNo=DpndEmpNo

Figure 11.1 Join table reordering optimization example.

Throwaways are avoided when the structure is processed top-down since
unmatched employees are discarded before their dependents are retrieved
and stored. While subviews may cause throwaways, the SQL engine is free to
rewrite the expanded query before its execution to change the join table order
from bottom-up to top-down, as shown in Figure 11.1.
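The cost of throwaways can be made concrete with a small simulation. The sketch below is not how a SQL engine is implemented; it simply walks hypothetical in-memory tables in the two join orders of Figure 11.1 and counts how many Dependent rows each order pulls into its working set. The sample rows, the DpndName column, and the function names are assumptions made for illustration.

```python
# Hypothetical sample data; column names follow Figure 11.1.
departments = [{"DeptNo": 10}]
employees = [
    {"EmpNo": 1, "EmpDeptNo": 10},
    {"EmpNo": 2, "EmpDeptNo": 99},  # matches no department
]
dependents = [
    {"DpndName": "Ann", "DpndEmpNo": 1},
    {"DpndName": "Bob", "DpndEmpNo": 2},
    {"DpndName": "Cal", "DpndEmpNo": 2},
]

def dependents_of(emp):
    """Return the names of the dependents joined to one employee row."""
    return [d["DpndName"] for d in dependents if d["DpndEmpNo"] == emp["EmpNo"]]

def bottom_up():
    """Join Employee with Dependent first, then join that result to Department."""
    fetched = 0
    emp_dpnd = []
    for emp in employees:
        names = dependents_of(emp)
        fetched += len(names)  # every dependent enters the working set
        emp_dpnd.append((emp, names))
    result = []
    for dept in departments:
        emps = [(e["EmpNo"], names) for e, names in emp_dpnd
                if e["EmpDeptNo"] == dept["DeptNo"]]  # unmatched employees dropped here
        result.append((dept["DeptNo"], emps))
    return result, fetched

def top_down():
    """Join Department to Employee first, then read dependents of matched employees only."""
    fetched = 0
    result = []
    for dept in departments:
        emps = []
        for emp in employees:
            if emp["EmpDeptNo"] != dept["DeptNo"]:
                continue  # dropped before any of its dependents are read
            names = dependents_of(emp)
            fetched += len(names)
            emps.append((emp["EmpNo"], names))
        result.append((dept["DeptNo"], emps))
    return result, fetched

bu_result, bu_fetched = bottom_up()
td_result, td_fetched = top_down()
print("same result:", bu_result == td_result)
print("throwaways avoided:", bu_fetched - td_fetched)
```

Both orders build the same structure; the difference between the two counts is the number of dependents that the bottom-up order retrieved only to discard.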

11.2 Dynamic Shortening of the Access Path


Dynamic shortening of the access path is an optimization that should auto-
matically be performed along with the join table reordering optimization speci-
fied in Section 11.1. This optimization works when the data structure is being
processed top to bottom, which it will be if the table reordering has been per-
formed as described above. Dynamic path shortening occurs when a hierarch-
ical active path runs out of data before reaching its end. In this case, access
further down the path can be skipped for the current parent occurrence. For
example, in the Department view shown in Figure 11.1, this can occur when a
department has no employees since it makes no sense to go any further down
the active path after dependents. Furthermore, this path can have multiple sub-
paths that can also be eliminated. Figure 11.2 demonstrates this dynamic path
shortening.
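Dynamic path shortening can be sketched as a recursive top-down walk that simply never descends below a parent occurrence that has no child rows. The structure below follows Figure 11.2 (A -> B; B -> C and E; C -> D; E -> F); the key values, the dictionary layout, and the walk function are hypothetical and serve only to count table accesses.

```python
# Child tables for each table in the structure of Figure 11.2.
CHILDREN = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "E": ["F"], "D": [], "F": []}

# Hypothetical data: ROWS[table][parent_key] -> keys of the matching child rows.
ROWS = {
    "B": {"a1": ["b1"], "a2": []},  # root row a2 has no B occurrence
    "C": {"b1": ["c1"]},
    "D": {"c1": ["d1"]},
    "E": {"b1": ["e1"]},
    "F": {"e1": []},
}

def walk(table, key, accessed):
    """Visit the tables below one row, recording each (table, parent key) access."""
    for child in CHILDREN[table]:
        accessed.append((child, key))  # one access to the child table for this parent
        for child_key in ROWS[child].get(key, []):
            # Recursion happens only for rows that exist, so a missing
            # occurrence ends the path and all of its subpaths.
            walk(child, child_key, accessed)
    return accessed

full_path = walk("A", "a1", [])
short_path = walk("A", "a2", [])
print(full_path)
print(short_path)
```

For root row a2 only table B is accessed; tables C, D, E, and F are never touched because the recursion stops at the missing B occurrence.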

[Figure 11.2 diagram: structure A -> B; B -> (C, E); C -> D; E -> F, accessed
top to bottom. A missing table B occurrence terminates the access path for
this row occurrence. Tables C, D, E, and F do not require access for the
current occurrence of table A.]

Figure 11.2 Dynamic path shortening.

11.3 Removal of Unnecessary Tables From Outer Join View

When a SQL inner join view is invoked, all tables in the view must be accessed
to generate the result table. This happens regardless of which columns are
specified for retrieval when the view is invoked. This is necessary because the
materialized view (the data that represents the view) on which the view invocation
is based is always affected by all tables in the inner join view. This is
because missing data anywhere in the inner join will cause unmatched rows to
be removed. This was discussed back in Chapter 1 where Figure 1.1 showed
that an inner join composed of the Department and Employee tables would
not contain departments that had no employees. This means that if this view,
call it DeptEmpView, was invoked as in SELECT DeptName FROM Dept-
EmpView, only DeptNames for departments that had employees would be
selected. This result required that the Employee table be accessed, even though
no data was selected from it. If this was not the desired result, then this view
should not have been used and the Department table should have been accessed
directly.
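The inner join behavior described above is easy to reproduce. The following sketch uses Python's standard sqlite3 module with assumed sample data (the department names and employee rows are invented for the example) to compare invoking the inner join DeptEmpView against accessing the Department table directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Employee (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Audit');  -- Audit has no employees
INSERT INTO Employee VALUES (10, 'Mary', 1), (11, 'John', 1);
CREATE VIEW DeptEmpView AS
    SELECT * FROM Department INNER JOIN Employee ON DeptNo = EmpDeptNo;
""")

# Only DeptName is selected, yet the Employee table still shapes the result.
via_view = [row[0] for row in conn.execute(
    "SELECT DeptName FROM DeptEmpView ORDER BY DeptName")]
direct = [row[0] for row in conn.execute(
    "SELECT DeptName FROM Department ORDER BY DeptName")]
print(via_view)
print(direct)
```

The view result omits the department without employees and repeats the other department once per employee, which is why the Employee table cannot be skipped for an inner join view.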
The necessity of accessing all tables in a view is a consequence of the way
inner joins use the Cartesian product model for processing joins, as described in
Chapter 1. This is not necessary for outer joins that generate hierarchical structures.
Standard SQL outer joins operate differently than inner joins as described
in Chapter 2.
Outer join views that model hierarchical structures do not always need to
access all tables in the view when invoked. Take for example the outer join view
DeptEmpView, defined as SELECT * FROM Department LEFT JOIN
Employee ON DeptNo=EmpDeptNo. When this view is invoked as SELECT
DeptName FROM DeptEmpView, the Employee table is not referenced and
does not need to be accessed. This is because, in the semantics of this hierarchi-
cal data structure, the Employee table is at a lower level than the selected table
Department. This means that the Employee table cannot affect the Depart-
ment table, and therefore does not need to be accessed.
Any hierarchical structure access, no matter how complex, defined by
outer joins can apply this powerful view optimization. This is performed by
eliminating tables from access consideration that are not referenced in the
query and are not on a path to a referenced table in the query. This excludes
tables referenced on ON clauses since they will not affect the query if they are
not referenced anywhere else in the query, because they are only used if access
of the table is necessary. This optimization is based on the modeled hierarchical
data structure and the columns specified at the time of the view invocation.
This is not new. Hierarchical access logic dictates this behavior. The true test
of this is that this logic derives the same data results as if all the tables were
accessed. This is demonstrated in Figure 11.3.
There is an additional beneficial side effect of this optimization: it helps
eliminate unnecessary replicated rows. These replicated rows are introduced
by accessing unnecessary tables. This means that the optimized result is more
semantically correct than the unoptimized result. For example, in the outer join
DeptEmpView example described earlier in this section, the unoptimized view
invocation would replicate the department’s name (DeptName) for each
employee in the department even though no Employee columns were selected.
The optimized invocation would not replicate department names since no
access to the Employee table was needed. This is also shown in Figure 11.3.
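The effect of this optimization on the outer join DeptEmpView can be illustrated with Python's standard sqlite3 module. In this sketch the "optimized" form is written by hand as a direct query on Department, standing in for what an optimizer applying the technique would do; the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptNo INTEGER, DeptName TEXT);
CREATE TABLE Employee (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Audit');  -- Audit has no employees
INSERT INTO Employee VALUES (10, 'Mary', 1), (11, 'John', 1);
CREATE VIEW DeptEmpView AS
    SELECT * FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo;
""")

# Unoptimized: the view is fully joined even though only DeptName is selected.
unoptimized = [row[0] for row in conn.execute(
    "SELECT DeptName FROM DeptEmpView ORDER BY DeptName")]

# Hand-written stand-in for the optimized access: Employee is below Department
# and is not referenced, so it is left out.
optimized = [row[0] for row in conn.execute(
    "SELECT DeptName FROM Department ORDER BY DeptName")]

print(unoptimized)
print(optimized)
```

Both forms return the same set of department names because the outer join preserves every department; the only difference is the replicated name that the unreferenced Employee table introduces in the unoptimized form.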
The two examples in Figure 11.3 demonstrate view optimization applied
to two different SQL views of the same data and relationships. The data struc-
ture diagrams shown reflect the structure of the SQL outer join view definitions
and data that were originally defined in Figure 6.1. For the Department and
Employee views, the dotted lines in the data structure diagrams in Figure 11.3
represent areas of the structures that can be eliminated from access based on the
view selection criteria shown directly above the diagrams. Data enclosed in a
dotted box represents unnecessary replicated data that is removed when optimization
is applied. This duplicate removal is more semantically controlled than
SQL’s duplicate row value removal option.

Department View                      Employee View

SELECT EmpName FROM DeptView         SELECT EmpName FROM EmpView

Department -> Employee -> Dependent  Employee -> (Department, Dependent)

EmpName                              EmpName
Mary                                 Mary
John                                 John
Mike                                 Bill
Mike                                 Mike
                                     Mike

Key: Dotted boxes removed if optimization in effect

Figure 11.3 Outer join view optimizations can produce more accurate results.

In the examples shown, replicated data is produced because employee
Mike has two dependents, causing Mike to be in the virtual view twice when
using the old inner join Cartesian product access model (see Chapter 2). With-
out optimization, this replication is confusing since dependents are of no
importance or significance in either query, and therefore should not affect the
result. And note, these example data views are small; larger views offer a much
greater opportunity for optimization.
Other benefits of the outer join view optimizations are that they do not
penalize the user for picking a view that is too large, and that large views
will eliminate the need for many small views, making life easier for end users
and DBAs.

11.4 Increased Efficiency of Parallel Database Processing


This book demonstrated in Chapter 6 that the legs of a hierarchical structure
have separate semantics because they are independent of each other.
This not only implies that the tables can be processed in any order, but for
parallel processing this means these legs can be processed in parallel with no
coordination between them being necessary. This can
significantly increase asynchronous processing (pipelining in this example).
This can be gleaned from Figure 11.4.
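A rough sketch of this independence, using Python's standard concurrent.futures module, is shown below. The two legs (B -> C and D -> E) follow Figure 11.4, with a root table A assumed above them; the in-memory tables, key values, and the read_leg function are hypothetical. Each worker reads only its own leg, so no locking or coordination is needed until the results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory tables for the two sibling legs under an assumed root A.
LEG_1 = {"B": {"a1": ["b1", "b2"]}, "C": {"b1": ["c1"], "b2": []}}
LEG_2 = {"D": {"a1": ["d1"]}, "E": {"d1": ["e1", "e2"]}}

def read_leg(leg, top, bottom, root_key):
    """Read one leg top-down for a single root row; the sibling leg is never consulted."""
    return [(mid, leg[bottom].get(mid, [])) for mid in leg[top].get(root_key, [])]

with ThreadPoolExecutor(max_workers=2) as pool:
    future_1 = pool.submit(read_leg, LEG_1, "B", "C", "a1")  # subprocess 1, leg 1
    future_2 = pool.submit(read_leg, LEG_2, "D", "E", "a1")  # subprocess 2, leg 2
    leg_1_rows = future_1.result()
    leg_2_rows = future_2.result()

print(leg_1_rows)
print(leg_2_rows)
```

Threads are used here only for brevity; the same separation would apply to processes or to separate database servers, since neither leg's result depends on the other.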

[Figure 11.4 diagram: Subprocess 1 accesses leg 1 (B -> C); Subprocess 2
accesses leg 2 (D -> E).]

Figure 11.4 Parallel processing of hierarchical sibling legs is always possible.

11.5 Dynamic Rebuild to Pick Up New SQL Features

Besides internal optimizations, there may be SQL language functions added
to new SQL releases that can also be used to improve performance. Utilizing
these new external functions will require modifying existing SQL code, usually
by hand. In SQL:1999, these functions, which can be user-defined functions,
can be navigation functions that can access tables through other tables to avoid
the need to join them. For example, the first outer join example in Figure 11.5,
which models the structure diagram in Figure 11.4, is only selecting a column
from the lower level table C. This SQL statement can be rewritten to avoid
unnecessary join operations, as in the bottom SQL example in Figure 11.5,
by using a navigation function that uses the data structure meta information
extracted from the original query so that it only returns keys that exist in the
structure.
This optimization still conforms to the semantics of the structure shown
in Figure 11.4 and operates seamlessly because it continues to follow and obey
the hierarchical semantics of the outer join. Using outer join data modeling
today can allow for the capability of automatically utilizing future features (like
this one) as they are introduced into SQL systems. This is achieved by database
system software that dynamically rewrites the SQL specification to use the new
functionality. This capability, with its dynamic operation, also allows it to be
applied to ad hoc queries where it could not be accomplished otherwise, since
the selected columns are not known beforehand.

Current SQL:

SELECT CVal FROM A
LEFT JOIN B ON A=B LEFT JOIN C ON B=C
LEFT JOIN D ON A=D LEFT JOIN E ON D=E

Structure: A -> (B, D); B -> C; D -> E

Future SQL Rewrite:

SELECT CVal FROM C WHERE CKey IN NavigateTo(C)

Figure 11.5 Automatic SQL rewrite to take advantage of future SQL capabilities.

11.6 Optimization of Nonrelational SQL Interfaces

Procedural code is known for its efficiency, but when nonrelational databases
are involved, nonprocedural declarative languages can actually achieve similar
levels of optimization. This is because with declarative languages such as SQL,
the data structure (via the outer join) and desired processing requirements are
known up front, allowing a very high level of optimization. Instead of optimizing
small pieces of database logic procedurally without much knowledge of
what is going to be needed, nonprocedural optimization can optimize globally
and react quickly to change its global access logic. With databases, each database
access saved eliminates millions of instruction cycles and hardware wait
time.
SQL access of procedural databases like IBM’s IMS, which requires man-
ual navigation from point to point, is a good example of how nonprocedural
access can actually improve database access efficiency. As stated above, because
of the nonprocedural SQL, the total requirements are known up front, so the
access can be globally planned. With IMS, this means path calls can be used to
reduce the number of calls necessary by reading and writing entire paths down
the hierarchical structure being accessed. Global strategy can also dynamically
plan the best strategy for database positioning, navigation, and access. These
optimizations are demonstrated in Figure 11.6, where IMS segment types A
and B bypass direct access until a qualifying record is located. The semantics of
this query are defined in Chapter 5.
A further optimization approach that can reap even greater efficiency
with IMS and possibly other navigational databases is to go under the covers
and bypass their standard procedural user interface, which limits the full global
optimization possible. This optimization strategy again relies on the fact that
the processing requirements are known up front because of the nonprocedural
outer join data modeling semantics. This under-the-covers processing is already
performed for IMS by a variety of software, including ODBC drivers. Such software
performs this process by accessing the VSAM and ISAM access methods underlying IMS
directly. Using this access technique, SQL can actually process IMS databases
more efficiently than is possible using the standard IMS interface directly.
A final note about nonrelational SQL access: all the optimizations for
SQL database access described in this chapter can also be applied to nonrelational
access. This is because they are based on data structure semantics,
making them generic access optimizations.

SQL Query:

SELECT B
FROM A
LEFT JOIN B ON A=B
LEFT JOIN C ON A=C
WHERE C="X"

IMS Structure: A -> (B, C)

IMS Pseudo Access Code:

GetNext C Where C="X"
Hold Position On A
GetNext B Loop
GetNext A

Figure 11.6 Outer join query can be translated to very efficient IMS access code.

11.7 Applying Hierarchical Optimizations to Network Structures
As we have seen throughout this book, network application structures can have
multiple paths to data, and for this reason they can be ambiguous. Most of the
hierarchical optimizations covered in this chapter are still possible. The data
structure diagram in Figure 11.7 is a network application structure as defined
in Chapter 6. Table D in this structure is at a network junction point where
two or more paths come together, forcing the processing of the paths to syn-
chronize. This may limit some optimizations.
After mapping a network structure from an outer join statement, a net-
work structure such as the one in Figure 11.7 can be reordered top to bottom
for efficiency, as shown in Section 11.1. Parallel processing is still possible, as
described in Section 11.4, but the network junction points are sync points that
may retard parallel processing. Dynamic rebuild, as discussed in Section 11.5,
is also possible with additional code to support these sync points.
Dynamic path shortening can still operate on network structures that
contain paths that have network junction points, as described in Section 11.2.
The optimization does not mean that paths that have been terminated early will
not be accessed from another active path that forks into it at the junction point.
For example, in Figure 11.7, path D to E may be accessed via path B even when
path C has been shortened. This makes sense, since path D to E requires sepa-
rate access from all paths entering it (unless dynamically shortened) since each
path entering it matches different key link values used in the join operation,
which can produce different results in path D to E—depending on the path
values entering it.
Network View:

DEFINE VIEW NetView AS
SELECT * FROM A
LEFT JOIN B ON B=A
LEFT JOIN C ON C=A
LEFT JOIN D ON D=B OR D=C
LEFT JOIN E ON E=D

Structure: A -> (B, C); B -> D; C -> D; D -> E (D is the network junction point)

Figure 11.7 Outer join network structures have junction points.

The removal of unnecessary tables from invoked views is also possible
with network views. This can have the effect of actually removing network
junction points, which can turn a network structure into a valid hierarchical
structure dynamically. For example, if tables D and E are not referenced in the
network structure in Figure 11.7 (as documented in Section 11.3), then tables
D and E are eliminated from the materialized view, creating a valid hierarchical
structure and enabling all the benefits that go with it, as described in Chapter 5.
This is demonstrated in Figure 11.8.
The optimizations shown in Figure 11.8 will also apply to
structures where the junction points are linked to multiple paths using
AND logic instead of OR logic. Such a structure, while similar, is not actually a
network structure, and is described in Chapter 6.

11.8 Shifting ON Clauses to the WHERE Clause


Since the WHERE clause has been around a lot longer than the ON clause,
there is a tendency for SQL optimization to move ON clauses, or portions of
them, to the WHERE clause when possible. This is probably a good strategy
since the WHERE clause probably has much more optimization logic than the
newer ON clause. When there are both a WHERE clause and ON clauses,
there is the opportunity to come up with these types of optimizations because
of the similarity of these different types of selection clauses. But whatever the
case for optimization, it must be done with care because ON clauses can specify
complex semantics while the WHERE clause is limited in this area, so the result
may not always be the same. As an example, Figure 11.9 shows an optimization
where the ON clauses are transferred to the WHERE clause.
[Figure 11.8 diagram: two invocations of the network view, SELECT B, C FROM
NetView and SELECT C FROM NetView, each shown beside its materialized
structure of tables A, B, C, D, and E.]

Figure 11.8 Network structure optimized and converted to hierarchical structure.

Structure: Dept -> Emp -> Dpnd

Unoptimized:

SELECT * FROM Dept
LEFT JOIN Emp ON DeptNo=EmpDeptNo
LEFT JOIN Dpnd ON EmpNo=DpndEmpNo
WHERE DpndAge>18

Optimized:

SELECT *
FROM Dept, Emp, Dpnd
WHERE DeptNo=EmpDeptNo
AND EmpNo=DpndEmpNo
AND DpndAge>18

Figure 11.9 Shifting ON clauses to the WHERE clause for optimization.

This example moves all of the ON clauses’ join criteria to the WHERE
clause, thereby effectively changing the outer join query to an easier-to-optimize
inner join query. The converted outer join query now performs the
inner join of the three tables involved and then filters the result using the
WHERE clause criteria. In this case, the WHERE clause in Figure 11.9 is
based on the lowest level table, Dpnd, which means any missing data for table
Dpnd would be filtered out. This further implies that missing data for table
Emp would be filtered out and so on up the path. This logically turns the query
into an inner join since no data is actually being preserved—this means only
complete rows that match the selection criteria are selected.
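The equivalence claimed for Figure 11.9 can be checked on a small data set with Python's standard sqlite3 module. The sample rows below are assumptions chosen to include a department with no employees, an employee with no dependents, and a dependent that fails the age filter; the DpndName column is added only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dept (DeptNo INTEGER);
CREATE TABLE Emp (EmpNo INTEGER, EmpDeptNo INTEGER);
CREATE TABLE Dpnd (DpndName TEXT, DpndAge INTEGER, DpndEmpNo INTEGER);
INSERT INTO Dept VALUES (1), (2);          -- department 2 has no employees
INSERT INTO Emp VALUES (10, 1), (11, 1);   -- employee 11 has no dependents
INSERT INTO Dpnd VALUES ('Ann', 20, 10), ('Bob', 5, 10);
""")

# Unoptimized form from Figure 11.9: single-path outer join filtered at the lowest level.
outer_rows = sorted(conn.execute("""
    SELECT * FROM Dept
    LEFT JOIN Emp ON DeptNo = EmpDeptNo
    LEFT JOIN Dpnd ON EmpNo = DpndEmpNo
    WHERE DpndAge > 18
"""))

# Optimized form from Figure 11.9: inner join with all criteria in the WHERE clause.
inner_rows = sorted(conn.execute("""
    SELECT * FROM Dept, Emp, Dpnd
    WHERE DeptNo = EmpDeptNo
      AND EmpNo = DpndEmpNo
      AND DpndAge > 18
"""))

print(outer_rows == inner_rows, outer_rows)
```

Every row that the outer join preserves with missing Dpnd data carries a null DpndAge, which the WHERE filter rejects, so nothing preserved by the outer join survives and the two forms agree.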
If the WHERE clause in Figure 11.9 specified a filter on table Emp
instead of table Dpnd, the optimization shown could not have been performed
since it would remove data preserving below the table Emp level when table
Emp passed the filtering test. This leads one to believe this inner join optimiza-
tion can only work when the WHERE clause is filtering at the lowest level. But
this is only partially correct. To see why, examine the SQL optimization in
Figure 11.10.
Structure: Emp -> (Dept, Dpnd)

Unoptimized:

SELECT * FROM Emp
LEFT JOIN Dept ON DeptNo=EmpDeptNo
LEFT JOIN Dpnd ON EmpNo=DpndEmpNo
WHERE DpndAge>18

Invalid Optimization:

SELECT *
FROM Dept, Emp, Dpnd
WHERE DeptNo=EmpDeptNo
AND EmpNo=DpndEmpNo
AND DpndAge>18

Figure 11.10 Invalid example of shifting ON clauses to the WHERE clause.

In Figure 11.10, the WHERE clause is at the lowest level in the data
structure and the filtering data is contained in the last table joined, table Dpnd.
But, the problem here is that while table Dpnd is at the lowest level, there are
other legs in the structure. Table Dept is on another leg, and if the query were
changed to an inner join, no data would be preserved when table Dept did
not match a table Emp row occurrence. In this case, as we learned earlier in
Chapter 5 on data structures, sibling legs are independent of one another. This
means what occurs in one leg should not influence the other. By converting the
outer join in Figure 11.10 to an inner join, it changed the semantics such that
what happens in one leg can affect all the other legs. This changes the result of
the query. This means that performing these types of optimizations requires
analyzing the semantics of the outer join queries very carefully.
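The difference between the two forms in Figure 11.10 shows up as soon as an employee has a qualifying dependent but no department. The sketch below, again using Python's sqlite3 module with assumed sample rows, lists the rows that the invalid inner join form loses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: Irv has a dependent over 18 but no department.
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Emp  (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
    CREATE TABLE Dpnd (DpndName TEXT, DpndAge INTEGER, DpndEmpNo INTEGER);
    INSERT INTO Dept VALUES (2, 'HR');
    INSERT INTO Emp  VALUES (20, 'Mary', 2), (40, 'Irv', NULL);
    INSERT INTO Dpnd VALUES ('Jay', 21, 20), ('Ben', 40, 40);
""")

outer_rows = conn.execute("""
    SELECT EmpName, DeptName, DpndName FROM Emp
    LEFT JOIN Dept ON DeptNo = EmpDeptNo
    LEFT JOIN Dpnd ON EmpNo = DpndEmpNo
    WHERE DpndAge > 18
""").fetchall()

inner_rows = conn.execute("""
    SELECT EmpName, DeptName, DpndName
    FROM Dept, Emp, Dpnd
    WHERE DeptNo = EmpDeptNo AND EmpNo = DpndEmpNo AND DpndAge > 18
""").fetchall()

# Rows kept by the outer join but dropped by the inner join form.
lost_rows = [row for row in outer_rows if row not in inner_rows]
print(lost_rows)
```

The missing Dept row sits on a sibling leg, so it should not affect the Dpnd leg; the inner join form lets it remove the whole employee.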

11.9 Conclusion
This chapter has presented powerful semantic optimizations that are enabled by
the outer join data modeling ability. Without utilizing the outer join optimiza-
tions presented in this chapter, the outer join will operate less efficiently than
the inner join. This will prevent many users and vendors from utilizing this
powerful operation. But if the outer join optimizations presented here are util-
ized, the efficiency of the outer join could equal or even surpass the inner join
in many cases. This means that the outer join, with all of its powerful capabili-
ties, can be comparable to the efficiency of the inner join!
It was also demonstrated that outer join view optimization could convert
a network structure into a hierarchical structure, thereby enabling all the fea-
tures and capabilities available to hierarchical structures.
The optimizations presented in this chapter demonstrate the value of data
modeling and the importance of the capability to determine the data model
defined by outer joins. The data model represents the semantics of the data and
makes it easier to determine the consequences of changing the SQL to optimize
SQL queries.
12
Hierarchical Relational Processor Prototype
With standard SQL having the capability to inherently process hierarchical
structures, it is no longer necessary to force all data into a flat structure that
obscures the data structure and unnecessarily replicates data. If the data is being
modeled hierarchically, it can be processed directly in this more powerful form
by using outer join specifications that directly model the data structure and exe-
cution paths.
The examples in this chapter show the operation of a standard
SQL-based hierarchical relational database processor prototype that is driven by
the inherent data modeling capability of the standard SQL outer join. It utilizes
the DSE technology, described in Chapter 9, to dynamically extract the data
structure meta information naturally present in outer join specifications. This
freely available information is used to control the hierarchical heterogeneous
processing of relational and nonrelational data. It produces a hierarchical
WYSIWYG display that conforms to the underlying data structure of the SQL
query request. The results are semantically more accurate than those of
standard SQL processing.
This new hierarchical processing prototype does not require that the data
be in a fixed format or that the data structure be predefined. The data can be
stored in standard first normal form relational tables, flat files, or hierarchical
prerelational or postrelational databases such as a legacy database or a nested
relational database. The data structure can be specified dynamically, giving it
data structure independence that is lacking in standard universal data access
systems.

12.1 Hierarchical Relational Prototype Operation


Hierarchical relational databases access and process data in non-first normal
form (structured format). This eliminates having to flatten the data into first
normal form (table format) as standard relational systems do. This flattening of
the data can introduce unnecessary replicated data. By not having to flatten the
data, hierarchical relational processing can preserve the data structure so that all
aggregate and summary operations will be accurate and can be controlled with
more flexibility. This is reflected in the structured format used by the hierarchi-
cal relational processor to display its output. In this structured output format, a
blank data field indicates that the previous column value is still in effect. A dash
inserted in a field indicates the data is missing—this prevents a missing data
value from inadvertently being taken as the previous column value.
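The display convention described above (a blank when the previous value is still in effect, a dash for missing data) can be illustrated with a short Python sketch. The function name render_structured is hypothetical, and the sketch only handles a single-path hierarchy such as Department over Employee over Dependent; structures with sibling legs would need additional logic. The sample rows are the flat outer join result that corresponds to Figure 12.1.

```python
def render_structured(rows):
    """Render flat rows from a single-path hierarchy in the structured format."""
    rendered = []
    previous = None
    for row in rows:
        line = []
        # True while every column so far repeats the previous row's value.
        same_prefix = previous is not None
        for i, value in enumerate(row):
            same_prefix = same_prefix and value == previous[i]
            if value is None:
                line.append("-")         # dash: the data is missing
            elif same_prefix:
                line.append("")          # blank: previous value still in effect
            else:
                line.append(str(value))
        rendered.append(line)
        previous = row
    return rendered


# Flat (DeptName, EmpName, DpndName) rows; None marks a null-extended column.
department_rows = [
    ("Acct", "Mike", None),
    ("Acct", "John", None),
    ("HR", "Mary", "Jay"),
    ("HR", "Mary", "Ken"),
    ("HR", "Mark", "Kay"),
    ("MIS", None, None),
]

structured = render_structured(department_rows)
for line in structured:
    print("".join(cell.ljust(10) for cell in line).rstrip())
```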
The first entry of each example is the outer join specification that is
processed directly by the SQL hierarchical relational prototype. The prototype
then extracts the data structure meta information embedded in the outer join
specification using the DSE technology described in Chapter 9, and displays its
metadata structure information in table form. This metadata structure
information includes an outer join semantic optimization indication, which is
flagged under the Access column when a table in the data structure does not
require access.
Lastly, using the data structure meta information supplied from the outer
join specification, the prototype accesses its internal first normal form relational
database in a manner that will produce the structured data results shown in a
visual structured display. This hierarchical relational processing can be imple-
mented in any standard SQL system, relying only on the data structure meta
information supplied from outer join operations.
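To make the Table Name, Level, and Parent columns used in this chapter's figures more concrete, here is a deliberately simplified Python sketch of deriving them from a chain of LEFT JOINs. It is not the DSE algorithm of Chapter 9. The function name extract_structure and the column_owner mapping are assumptions for illustration, and the sketch only handles single-equality ON conditions in which exactly one side belongs to an already joined table.

```python
import re


def extract_structure(from_clause, column_owner):
    """Derive (table, level, parent row number) from a chain of LEFT JOINs.

    column_owner maps each column name to its table; a real processor would
    consult the catalog. The parent of a joined table is taken to be the
    already-seen table on the other side of its ON condition.
    """
    parts = re.split(r"\s+LEFT\s+JOIN\s+", from_clause.strip(), flags=re.IGNORECASE)
    root = parts[0].strip()
    position = {root: 1}           # table -> row number in the output
    structure = [(root, 1, 0)]     # the root has level 1 and parent 0
    for part in parts[1:]:
        table, condition = re.split(r"\s+ON\s+", part, flags=re.IGNORECASE)
        table = table.strip()
        left, right = (column.strip() for column in condition.split("="))
        owners = {column_owner[left], column_owner[right]}
        owners.discard(table)
        parent = owners.pop()
        parent_level = structure[position[parent] - 1][1]
        position[table] = len(structure) + 1
        structure.append((table, parent_level + 1, position[parent]))
    return structure


column_owner = {
    "DeptNo": "Department",
    "EmpDeptNo": "Employee",
    "EmpNo": "Employee",
    "DpndEmpNo": "Dependent",
}

department_view = extract_structure(
    "Department LEFT JOIN Employee ON DeptNo=EmpDeptNo "
    "LEFT JOIN Dependent ON EmpNo=DpndEmpNo",
    column_owner,
)
employee_view = extract_structure(
    "Employee LEFT JOIN Department ON DeptNo=EmpDeptNo "
    "LEFT JOIN Dependent ON EmpNo=DpndEmpNo",
    column_owner,
)
print(department_view)
print(employee_view)
```

The same three tables and the same two ON conditions yield a single path in the first case and two sibling legs in the second, purely from the order in which the tables are joined.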

12.2 Basic Data Modeling


The examples in Figures 12.1 and 12.2 demonstrate the basic data modeling
capabilities of the standard SQL outer join. They show how the hierarchical
relational prototype using the DSE technology can process standard relational
data in a hierarchical fashion. In these examples, three tables (Department,
Employee, and Dependent) are joined in different ways using the same
relationships to form two different data structures involving one-to-many and
many-to-one relationships. Notice in the query outputs that there is no
unnecessary data replication. All the data replications are accurate regardless
of what data structure level the data is at or if there are multiple legs in
the data structure as in Figure 12.2. This allows aggregate operations applied
anywhere in the data structure to be accurate. While the example in Figure 12.2
does show replicated data (HR and Acct), this correctly reflects the
many-to-one data structure relationship of Employee over Department and its
semantics (i.e., many employees have the same department). Notice further that
these replication occurrences are correct: in a standard relational first
normal form result, HR would have been replicated three times instead of the
correct two.

    SELECT DeptName, EmpName, DpndName FROM Department
    LEFT JOIN Employee ON DeptNo=EmpDeptNo
    LEFT JOIN Dependent ON EmpNo=DpndEmpNo

       Table Name    Level  Parent  Access
    1  Department    1      0       Yes
    2  Employee      2      1       Yes
    3  Dependent     3      2       Yes

    Data structure: Department over Employee over Dependent.

    DeptName  EmpName  DpndName
    Acct      Mike     -
              John     -
    HR        Mary     Jay
                       Ken
              Mark     Kay
    MIS       -        -

Figure 12.1 Department view processed by hierarchical relational processor.

    SELECT EmpName, DeptName, DpndName FROM Employee
    LEFT JOIN Department ON DeptNo=EmpDeptNo
    LEFT JOIN Dependent ON EmpNo=DpndEmpNo

       Table Name    Level  Parent  Access
    1  Employee      1      0       Yes
    2  Department    2      1       Yes
    3  Dependent     2      1       Yes

    Data structure: Employee over Department and Dependent (sibling legs).

    EmpName  DeptName  DpndName
    Mike     Acct      -
    John     Acct      -
    Mary     HR        Jay
             -         Ken
    Mark     HR        Kay
    Irv      -         Ben

Figure 12.2 Employee view processed by hierarchical relational processor.
Besides the two different data structures in Figures 12.1 and 12.2, there is
also a difference with the data values displayed or not displayed in the two
examples. The first example’s query output in Figure 12.1 includes a depart-
ment named MIS while the second example does not. The second example’s
query output in Figure 12.2 includes an employee named Irv with a dependent
named Ben, while the first example in Figure 12.1 does not. These differences
are properly reflected in the semantics of the data structures involved. The MIS
department isn’t included in the example’s query output in Figure 12.2 because
this query models an Employee view (Employee over Department and
Dependent), and there are no employees in the MIS department. The
employee Irv and his dependent Ben aren’t included in the first example’s
query output in Figure 12.1 because this query models a Department view
(Department over Employee over Dependent) and Irv and his dependent Ben
do not belong to any known department. This was covered in Chapter 5.
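The points made about Figures 12.1 and 12.2 can be reproduced with ordinary flat SQL. The following sketch uses Python's sqlite3 module; the key values are assumptions, chosen so that the names match the figures. It shows the flat first normal form result, not the prototype's structured display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical key values; names follow Figures 12.1 and 12.2.
conn.executescript("""
    CREATE TABLE Department (DeptNo INTEGER, DeptName TEXT);
    CREATE TABLE Employee   (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
    CREATE TABLE Dependent  (DpndName TEXT, DpndEmpNo INTEGER);
    INSERT INTO Department VALUES (1, 'Acct'), (2, 'HR'), (3, 'MIS');
    INSERT INTO Employee VALUES (10, 'Mike', 1), (11, 'John', 1),
                                (20, 'Mary', 2), (21, 'Mark', 2),
                                (30, 'Irv', NULL);
    INSERT INTO Dependent VALUES ('Jay', 20), ('Ken', 20), ('Kay', 21), ('Ben', 30);
""")

department_view = conn.execute("""
    SELECT DeptName, EmpName, DpndName FROM Department
    LEFT JOIN Employee  ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent ON EmpNo = DpndEmpNo
""").fetchall()

employee_view = conn.execute("""
    SELECT EmpName, DeptName, DpndName FROM Employee
    LEFT JOIN Department ON DeptNo = EmpDeptNo
    LEFT JOIN Dependent  ON EmpNo = DpndEmpNo
""").fetchall()

# In the flat result HR is repeated once per dependent row, not once per employee.
hr_in_flat_rows = sum(1 for row in employee_view if row[1] == "HR")
hr_employees = len({row[0] for row in employee_view if row[1] == "HR"})
print(hr_in_flat_rows, hr_employees)
```

MIS appears only in the Department view and Irv only in the Employee view, because each query preserves the rows of its own root table.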

12.3 Many-to-Many Relationships


The examples in Figures 12.3 and 12.4 operate on a Parts and Suppliers many-
to-many relationship, described in Chapter 7. In this relationship, one supplier
can have many parts and one part can have many suppliers. This does not pre-
sent a problem for hierarchical relational processing and both data structures in
the examples in Figures 12.3 and 12.4 produce a hierarchically structured
(many-to-many) result. Most texts on data modeling state that many-to-many
relationships form one-to-many hierarchical relationships. A many-to-many
relationship is actually a combination of many-to-one and one-to-many. In the
one-to-many portion replications are suppressed, while in the many-to-one
portion they are not. In the example in Figure 12.3—Parts over Suppli-
ers—parts are not replicated but suppliers are (P1 occurs once while S1 occurs
three times, each related to a different parent value). In a true one-to-many
relationship, the lower level values will not repeat across their parent values as
in this many-to-many relationship example.
It is worth noting that many-to-one relationships are found naturally in
the database and do not require special considerations for processing or print-
ing. But with one-to-many relationships, special handling considerations are
needed because the data is nested and requires special consideration when pro-
cessing and displaying.
    SELECT PartNo, Desc, SuppNo, Addr FROM Parts
    LEFT JOIN PartSupplier ON PartNo=Part
    LEFT JOIN Suppliers ON Supplier=SuppNo

       Table Name     Level  Parent  Access
    1  Parts          1      0       Yes
    2  PartSupplier   2      1       Yes
    3  Suppliers      3      2       Yes

    Data structure: Parts over PartSupplier over Suppliers.

    Partno  Desc   Suppno  Addr
    P1      Part1  S1      Wash
                   S2      Denv
    P2      Part2  S1      Wash
                   S2      Denv
    P3      Part3  S1      Wash

Figure 12.3 Part/Supplier view processed by hierarchical relational prototype.

    SELECT SuppNo, Addr, PartSupplier.Qnt, PartNo, Desc FROM Suppliers
    LEFT JOIN PartSupplier ON SuppNo=Supplier
    LEFT JOIN Parts ON PartNo=Part

       Table Name     Level  Parent  Access
    1  Suppliers      1      0       Yes
    2  PartSupplier   2      1       Yes
    3  Parts          3      2       Yes

    Data structure: Suppliers over PartSupplier over Parts.

    SuppNo  Addr  Qnt  PartNo  Desc
    S1      Wash  100  P1      Part1
                  150  P2      Part2
                  350  P3      Part3
    S2      Denv  200  P1      Part1
                  300  P2      Part2

Figure 12.4 Supplier/Part view processed by hierarchical relational prototype.

Many-to-many relationships require the use of an association table as
described in Chapter 7. The association table used in the SQL examples in
Figures 12.3 and 12.4 is PartSupplier, and is shown in Figure 12.5. It contains
keys (Part, Supplier) from both sides of the relationship to maintain the
many-to-many relationship in both directions. In the example in Figure 12.3
(Parts over Suppliers), the association table is transparent in the result
because no column from this table is requested for display.
Part Supplier Association Table

    Part  Supplier  Quantity
    P1    S1        100
    P1    S2        200
    P2    S1        150
    P2    S2        300
    P3    S1        350

    The Quantity column holds the intersecting data.

Figure 12.5 Association table used in many-to-many relationship.

The Suppliers over Parts example in Figure 12.4 does reference the
association table to include the Qnt (quantity) column. This value is known as
intersecting data because it is meaningful at the point of intersection (i.e.,
the quantity of a given part for a given supplier), as also explained in
Chapter 7. This intersecting data appears to be a value associated with the
Parts table since values in the association table will always appear to be
values from the lower level table, as shown in Figure 12.4.
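The two many-to-many views can be tried out with the sketch below, which uses Python's sqlite3 module. The sample rows follow the output shown in Figures 12.3 and 12.4. The Desc column is quoted because DESC is a reserved word in SQLite; that quoting is an adjustment made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts        (PartNo TEXT, "Desc" TEXT);
    CREATE TABLE Suppliers    (SuppNo TEXT, Addr TEXT);
    CREATE TABLE PartSupplier (Part TEXT, Supplier TEXT, Qnt INTEGER);
    INSERT INTO Parts VALUES ('P1', 'Part1'), ('P2', 'Part2'), ('P3', 'Part3');
    INSERT INTO Suppliers VALUES ('S1', 'Wash'), ('S2', 'Denv');
    INSERT INTO PartSupplier VALUES ('P1', 'S1', 100), ('P1', 'S2', 200),
        ('P2', 'S1', 150), ('P2', 'S2', 300), ('P3', 'S1', 350);
""")

# Parts over Suppliers: the association table is joined but not displayed.
parts_over_suppliers = conn.execute("""
    SELECT PartNo, "Desc", SuppNo, Addr FROM Parts
    LEFT JOIN PartSupplier ON PartNo = Part
    LEFT JOIN Suppliers ON Supplier = SuppNo
    ORDER BY PartNo, SuppNo
""").fetchall()

# Suppliers over Parts: Qnt is the intersecting data from the association table.
suppliers_over_parts = conn.execute("""
    SELECT SuppNo, Addr, PartSupplier.Qnt, PartNo, "Desc" FROM Suppliers
    LEFT JOIN PartSupplier ON SuppNo = Supplier
    LEFT JOIN Parts ON PartNo = Part
    ORDER BY SuppNo, PartNo
""").fetchall()

s1_occurrences = sum(1 for row in parts_over_suppliers if row[2] == "S1")
print(s1_occurrences)
for row in suppliers_over_parts:
    print(row)
```

In the Parts over Suppliers result, supplier S1 appears once under each of the three parts it supplies, which is the many-to-one half of the relationship described above.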

12.4 Embedded Views


The example in Figure 12.6 demonstrates that stored views containing outer
join defined data structures can be seamlessly combined to form larger data
structures using the same standard SQL outer join syntax already
demonstrated. The hierarchical relational prototype identifies stored queries
by their view name. They are printed out when expanded, as shown in
Figure 12.6. The example in Figure 12.6 uses two views shown earlier in this
chapter, the Supplier view (Suppliers over Parts) and the Department view
(Department over Employee over Dependent). In this case, the Supplier view
is joined over the Department view using the DeptSuppNo column in the
Department table. Notice that this combined data structure properly reflects
its new structure, the replication counts are accurate, and the data displayed is
consistent with the previously shown data structures in this chapter.

12.5 View Optimization


The final example in Figure 12.7 demonstrates a powerful and very useful
optimization for stored views described in detail in Chapter 11. It
significantly enhances the operation and usefulness of SQL’s new outer join
data structure processing capability. It often happens that a stored view is
used where it is not necessary to access all the tables defined for the desired
result. With standard inner join views, it is always necessary that all tables
in the view be accessed. This not only results in more overhead, but often
incorrect results caused by accessing unneeded tables, which in turn can cause
replicated data values and lost data. With outer join views, this unnecessary
data access can be avoided.

    SELECT SuppNo, PartNo, DeptName, EmpName, DpndName
    FROM SupplierView
    LEFT JOIN DepartmentView ON SuppNo=DeptSuppNo

    Inserted SupplierView: Suppliers LEFT JOIN PartSupplier ON SuppNo=Supplier
        LEFT JOIN Parts ON PartNo=Part
    Inserted DepartmentView: Department LEFT JOIN Employee ON DeptNo=EmpDeptNo
        LEFT JOIN Dependent ON EmpNo=DpndEmpNo

       Table Name     Level  Parent  Access
    1  Suppliers      1      0       Yes
    2  PartSupplier   2      1       Yes
    3  Department     2      1       Yes
    4  Parts          3      2       Yes
    5  Employee       3      3       Yes
    6  Dependent      4      5       Yes

    Data structure: Suppliers over PartSupplier and Department; PartSupplier
    over Parts; Department over Employee over Dependent.

    Suppno  Partno  DeptName  EmpName  DpndName
    S1      P1      ACCT      Mike     -
            -                 John     -
            P2      HR        Mary     Jay
                                       Ken
            -                 Mark     Kay
            P3      -         -        -
    S2      P1      MIS       -        -
            P2      -         -        -

Figure 12.6 Expanded view example.
The example in Figure 12.7 is identical to the previous example in
Figure 12.6, except that no data is selected from the Dependent table. In this
case, the hierarchical relational prototype determines from the semantics of
the data structure that the Dependent table does not need to be accessed (see
the Access column in the data structure table in Figure 12.7). Notice that the
result of the SQL query statement in Figure 12.7, without the Dependent data
and access to the Dependent table, remains consistent with the previous
example. This shows that the optimization works in this situation.
    SELECT SuppNo, PartNo, DeptName, EmpName
    FROM SupplierView
    LEFT JOIN DepartmentView ON SuppNo=DeptSuppNo

    Inserted SupplierView: Suppliers LEFT JOIN PartSupplier ON SuppNo=Supplier
        LEFT JOIN Parts ON PartNo=Part
    Inserted DepartmentView: Department LEFT JOIN Employee ON DeptNo=EmpDeptNo
        LEFT JOIN Dependent ON EmpNo=DpndEmpNo

       Table Name     Level  Parent  Access
    1  Suppliers      1      0       Yes
    2  PartSupplier   2      1       Yes
    3  Department     2      1       Yes
    4  Parts          3      2       Yes
    5  Employee       3      3       Yes
    6  Dependent      4      5       No

    Data structure: Suppliers over PartSupplier and Department; PartSupplier
    over Parts; Department over Employee over Dependent.

    Suppno  Partno  DeptName  EmpName
    S1      P1      ACCT      Mike
            -                 John
            P2      HR        Mary
            -                 Mark
            P3      -         -
    S2      P1      MIS       -
            P2      -         -

Figure 12.7 View optimization example.
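The effect of skipping the Dependent table can be approximated in flat SQL. The sketch below uses Python's sqlite3 module with assumed sample rows; DepartmentViewPruned is a hand-written stand-in for what the prototype does automatically, not a feature of SQLite. Because a flat result still repeats rows once per dependent, the comparison is made on distinct rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows chosen to resemble Figures 12.6 and 12.7.
conn.executescript("""
    CREATE TABLE Suppliers    (SuppNo TEXT);
    CREATE TABLE PartSupplier (Part TEXT, Supplier TEXT);
    CREATE TABLE Parts        (PartNo TEXT);
    CREATE TABLE Department   (DeptNo INTEGER, DeptName TEXT, DeptSuppNo TEXT);
    CREATE TABLE Employee     (EmpNo INTEGER, EmpName TEXT, EmpDeptNo INTEGER);
    CREATE TABLE Dependent    (DpndName TEXT, DpndEmpNo INTEGER);
    INSERT INTO Suppliers VALUES ('S1'), ('S2');
    INSERT INTO Parts VALUES ('P1'), ('P2'), ('P3');
    INSERT INTO PartSupplier VALUES ('P1', 'S1'), ('P2', 'S1'), ('P3', 'S1'),
                                    ('P1', 'S2'), ('P2', 'S2');
    INSERT INTO Department VALUES (1, 'ACCT', 'S1'), (2, 'HR', 'S1'), (3, 'MIS', 'S2');
    INSERT INTO Employee VALUES (10, 'Mike', 1), (11, 'John', 1),
                                (20, 'Mary', 2), (21, 'Mark', 2);
    INSERT INTO Dependent VALUES ('Jay', 20), ('Ken', 20), ('Kay', 21);

    CREATE VIEW SupplierView AS
        SELECT * FROM Suppliers
        LEFT JOIN PartSupplier ON SuppNo = Supplier
        LEFT JOIN Parts ON PartNo = Part;
    CREATE VIEW DepartmentView AS
        SELECT * FROM Department
        LEFT JOIN Employee ON DeptNo = EmpDeptNo
        LEFT JOIN Dependent ON EmpNo = DpndEmpNo;
    -- Stand-in for the optimized view: the Dependent table is not accessed.
    CREATE VIEW DepartmentViewPruned AS
        SELECT * FROM Department
        LEFT JOIN Employee ON DeptNo = EmpDeptNo;
""")

QUERY = """
    SELECT SuppNo, PartNo, DeptName, EmpName
    FROM SupplierView LEFT JOIN {view} ON SuppNo = DeptSuppNo
"""
full_rows = conn.execute(QUERY.format(view="DepartmentView")).fetchall()
pruned_rows = conn.execute(QUERY.format(view="DepartmentViewPruned")).fetchall()

# Same distinct rows either way; only the flat repetition count differs.
print(set(full_rows) == set(pruned_rows), len(full_rows), len(pruned_rows))
```

An outer join to a table at the bottom of a leg can only add repetitions or null-extended columns; it never removes rows above it, which is why the table can be skipped when none of its columns are requested.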

12.6 Conclusion
This chapter has demonstrated an innovative SQL processor prototype that
operates on disparate heterogeneous data in a high-level hierarchical manner.
Previously, SQL processing of disparate heterogeneous data always used the
lowest common denominator structure—the flat structure. With standard
SQL’s capability to directly model and process hierarchical structures, there is
no longer a need to map structured data into a flat structure when hierarchical
structures are being modeled. Besides the ease and efficiency of one-to-one
mapping, the powerful hierarchical semantics of the modeled data structure are
maintained and utilized.
The live hierarchical SQL examples presented in this chapter prove a
number of things about the DSE technology. First, the DSE software operates
as expected: it does extract the data structure meta information embedded in
the outer join. Second, it can be utilized to develop products like the
hierarchical relational processor that would not be possible otherwise with
standard SQL. Third, and most importantly, it proves the data modeling
technology behind the DSE software is valid and does work. This means the outer
join does indeed inherently support the data modeling of complex data
structures consisting of multiple legs, and one-to-many, many-to-one, and
many-to-many relationships. Fourth, it demonstrates this technology is useful
and viable.
13
Object/Relational Interface
The outer join’s object/relational interface capability is the best showcase for
the features and capabilities of the outer join. It uses all the inherent features
and attributes of the outer join and the advanced capabilities made possible by
the DSE technology described in Chapter 9. But the most powerful operation
at work is the interaction and synergism of these capabilities. These capabilities
and their interrelationships are represented in Figure 13.1. This chapter will
cover each capability and attribute in the diagram and explain its function,
importance, and interaction with those capabilities it enhances. Other object/
relational capabilities introduced in SQL:1999 are described in Chapter 8.
Each object feature shown in the diagram in Figure 13.1 is covered one or
more times. At the top of the diagram, the standard SQL outer join operation
acquires its object-enabling capabilities and attributes: standardization
through the ANSI standard, dynamic operation, and a powerful data modeling
capability that enables complex data structure processing.

13.1 Standardized SQL Interface


One of the biggest stumbling blocks for nonrelational databases is the lack of a
standard programming and query interface that supports the features shown in
Figure 13.1. After all, investing time and money in a nonstandardized database
is very risky. The outer join operation, by contrast, is part of the SQL
standard. If there were such an object interface, most agree a familiar
relational syntax would be widely accepted. Again, the outer join fits the bill.

[Figure: the outer join at the top supplies three attributes (Standardized,
Data Modeling, Dynamic), from which the remaining capabilities derive:
Relational, Familiar, Late Binding, Polymorphism, Inheritance, Structured
Views, Optimization, DB Navigation, View Update, Reusability, Abstraction,
Efficiency, Legacy Access, Enterprise Access, Data Warehouse, Post Relational.]

Figure 13.1 Object/relational capabilities and their outer join derivation.

13.2 Data Modeling and Structure Processing


One of the biggest, if not the biggest, missing capabilities hampering object/
relational interfaces is the lack of complex data modeling and structure process-
ing capability in the relational model. The relational model has previously had
no inherent data modeling capabilities. This capability is extremely important
to object databases that deal with complex objects. Many other capabilities such
as blobs (binary large objects), user-defined data types, and functions have been
added to the major SQL platforms. But until OUTER JOIN became part of
the SQL standard, data modeling and processing hierarchies could not be done
seamlessly using SQL.
With the standard SQL outer join operation, seamless complex data
modeling and structure processing now become possible. As demonstrated in
Chapter 6, this powerful capability is performed inherently in SQL, resulting in
direct and seamless processing of complex data structures. This capability can
be further enhanced by the outer join DSE procedure discussed in Chapter 9.
This procedure dynamically extracts and makes available to the SQL engine the
inherent data structure meta information embedded in outer join statements.
This enables the direct support of many other capabilities and attributes of an
object/relational database. These are data inheritance, efficiency, database
navigation, nonrelational database access, reusability, and data abstraction.
Figure 13.2 depicts one way that SQL, via the standard SQL outer join, can be
seamlessly integrated with an object database to help supply these object
capabilities.
The example in Figure 13.2 demonstrates how SQL, utilizing the power-
ful standard SQL outer join syntax and semantics, can be used to model in par-
allel hierarchical data structures defined in memory by programming
languages. Then, by utilizing the data structure meta information recovered
from the outer join specification, the data can be seamlessly transferred between
the database and structured storage. The data can be retrieved from any data-
base source (see Figure 13.1)—it does not have to be relational. In memory, the
data can be navigated and manipulated procedurally by any programming lan-
guage and then written back out automatically to its native database. This data-
base access is very efficient since the entire data view is known beforehand and
can be retrieved more efficiently than with multiple procedural calls.
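As a rough illustration of moving a flat outer join result into structured memory, the Python sketch below folds rows from a Dept over Emp over Dpnd query into nested dictionaries. The function name and the choice of dictionaries keyed by name are assumptions made for this sketch (it presumes names are unique); a real interface would be driven by the extracted data structure meta information rather than being hard-coded for three levels.

```python
def nest_department_rows(rows):
    """Fold flat (DeptName, EmpName, DpndName) rows into nested dictionaries."""
    departments = {}
    for dept_name, emp_name, dpnd_name in rows:
        employees = departments.setdefault(dept_name, {})
        if emp_name is None:
            continue                  # department preserved with no employees
        dependents = employees.setdefault(emp_name, [])
        if dpnd_name is not None:     # employee preserved with no dependents
            dependents.append(dpnd_name)
    return departments


# Flat rows as a LEFT JOIN query would return them; None marks missing data.
flat_rows = [
    ("Acct", "Mike", None),
    ("Acct", "John", None),
    ("HR", "Mary", "Jay"),
    ("HR", "Mary", "Ken"),
    ("HR", "Mark", "Kay"),
    ("MIS", None, None),
]

nested = nest_department_rows(flat_rows)
print(nested)
```

Once in this form, the data can be navigated procedurally (for example, nested["HR"]["Mary"]) without repeating the department or employee values for every dependent.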

13.3 Data Abstraction and Reusability


Embedded SQL view structures (that is, views containing data
substructures) can be combined to form bigger structures by simply joining them
using standard SQL join syntax. This was shown in Chapter 7, and is depicted
in Figure 13.3 where the Emp view is being used to create two larger views,
the EmpDept and DeptEmp views. This capability is important because it
increases reusability and data abstraction. By breaking out common
substructure portions as SQL views like the Emp view shown in Figure 13.3,
reusability is enhanced since replication is reduced and can be controlled
more easily.
Data abstraction is also increased since this substructure view capability
hides the complexities of data structures, because the data modeling SQL is
hidden in the view. Structured subviews are not only useful for data abstraction
and reusability, but can be applied to all forms of database access and
inheritance described in Section 13.4. Because of outer join optimizations
described in Chapter 11, they do not necessarily add inefficiencies.

    SELECT * FROM Emp                        01 Emp  Char 20
    LEFT JOIN Dept ON EmpDeptNo=DeptNo       10 Dept Char 20 Occurs …
    LEFT JOIN Dpnd ON DpndEmpNo=EmpNo        10 Dpnd Char 20 Occurs …

    SELECT * FROM Dept                       01 Dept Char 20
    LEFT JOIN Emp ON DeptNo=EmpDeptNo        10 Emp  Char 20 Occurs …
    LEFT JOIN Dpnd ON EmpNo=DpndEmpNo        20 Dpnd Char 20 Occurs …

Figure 13.2 Object/relational interface transfers data to and from structured memory.

[Figure: the Emp view (Emp over Dpnd) reused in two larger views. In the
EmpDept view, Dept is joined below Emp; in the DeptEmp view, the Emp view is
joined below Dept.]

Figure 13.3 Data abstraction and reusability with substructures.

13.4 Data Inheritance


Data inheritance is made possible by the hierarchical nature of data modeling
and the outer join’s data structure view’s ability to join data structures. Data
inheritance is shown in Figure 13.4, which demonstrates how tables can be
seamlessly designed so that common portions of their data can be grouped
together into objects to be more easily shared in an object environment. For
example, Employee and Dependents (tables or classes) share the same type of
personal information, such as birthdate, sex, and address. Using data modeling,
this personal information can be moved out of the Employee and Dependent
tables and stored separately in a Person table, to be transparently combined
with the Employee and Dependent tables in views. These views represent
the complete Employee and Dependent data. This data inheritance capability
also adds to the reusability of the data because it can reduce multiple copies
of data.

[Figure: data inheritance, with Person over Employee and Dependent; EmpView,
with Person over Employee; DpndView, with Person over Dependent.]

Figure 13.4 Data inheritance supported in SQL by structured views.

The EmpView and DpndView structured views shown in Figure 13.4 are
hierarchical as represented in the diagram, indicating they would be combined
with a LEFT outer join. Another possibility that may give more desirable
results depending on the situation is to join the tables using a FULL natural
outer join to create a logical table, as described in Chapter 7. In that case,
the Coalesce function can be very useful for data inheritance when the same
data types exist in both tables and one or the other needs to be used or
overridden, for example, COALESCE (Person.Birthdate, Employee.Birthdate). Here,
Birthdate would be supplied if it existed in either table, and if it existed in
both tables, the Birthdate value from the Person table would be used since it
is the first one specified in the Coalesce function.
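The precedence rule of the Coalesce function can be seen in a small sketch using Python's sqlite3 module. The tables, keys, and dates are assumptions for illustration. A LEFT JOIN from Employee is used here instead of the FULL natural outer join mentioned above, because older SQLite versions do not support FULL OUTER JOIN; the Coalesce behavior is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables in which both Person and Employee carry a Birthdate.
conn.executescript("""
    CREATE TABLE Person   (PersonNo INTEGER, Birthdate TEXT);
    CREATE TABLE Employee (EmpNo INTEGER, EmpName TEXT, Birthdate TEXT);
    INSERT INTO Person VALUES (1, '1970-01-01'), (2, NULL);
    INSERT INTO Employee VALUES (1, 'Mary', '1999-09-09'),
                                (2, 'Mark', '1980-05-05'),
                                (3, 'Irv', NULL);
""")

rows = conn.execute("""
    SELECT EmpName, COALESCE(Person.Birthdate, Employee.Birthdate)
    FROM Employee LEFT JOIN Person ON PersonNo = EmpNo
    ORDER BY EmpNo
""").fetchall()

for row in rows:
    print(row)
```

Mary's value comes from Person because it is listed first, Mark's falls back to Employee because the Person value is null, and Irv's stays null because neither table supplies one.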

13.5 Database Navigation, Efficiency, and Nonrelational Access

Object databases need the flexibility and control to navigate the database struc-
ture. Knowledge of the hierarchical data structure being accessed by the outer
join supplies this database navigation information. This was covered in Chapter
10. Normally in applications, database navigation is supplied procedurally, one
instruction at a time. With a nonprocedural language like SQL, it is all supplied
up front, allowing for greater optimization and efficiency when specifying data-
base access operations. This allows combining several access operations into one
for more efficiency. With database access, nonprocedural access is usually more
efficient than procedural and can be optimized for each specific use.
As indicated above, database navigation information allows for the gener-
ation of database access operations. These access operations can also be for
postrelational databases such as nested relational and object databases, legacy
databases such as IBM’s IMS, enterprise access across many types of databases,
and data warehouse databases requiring flexible structured access. These differ-
ent types of access procedures are all seamless because there is a direct mapping
possible with the outer join’s inherent data modeling ability. This in turn
allows for truly seamless and direct disparate and heterogeneous accessing. This
also adds database abstraction since the user does not have to be aware of the
type of database being accessed. These nonrelational database access capabilities
were covered in Chapter 10.
The semantics of the data structures modeled by outer joins offer an
excellent opportunity for optimization. These were disclosed in Chapter 11.
They all offer efficiency, but they also increase reusability and data
abstraction. This is because view optimization (described in Chapter 11) removes
unnecessary tables from the view when invoked. This means the user doesn’t
have to be concerned about using the most limited view available for the query.
One large view can serve for many smaller subviews. This increases data
abstraction for the user and helps reusability by allowing one view to be used
efficiently in many applications. Efficiency is derived from the possible
semantic optimizations and database navigation that supplies the means to
implement the optimizations.
The optimizations utilize the hierarchical structure modeled by the outer
join so they will also work seamlessly on nonrelational databases. Another
optimization that offers powerful capabilities for object databases is the
dynamic rewriting of outer join requests that can automatically utilize
advanced capabilities in the underlying database system as they become
available. This was described in Chapter 11 and is shown in Figure 13.5. These
include SQL:1999 object capabilities and functions that can be used to perform
direct navigation to bypass costly joins. This means that SQL outer join views
do not have to be associated with slow, join-bound processing. This can improve
the performance of inheritance, described in Section 13.4, so that it becomes
practical to use. Since data modeling and structure processing can be improved
by outer join optimizations, all capabilities that depend on them are likewise
improved.

13.6 Late Binding and Polymorphism


Outer join:

    SELECT DpndVal FROM Department
    LEFT JOIN Employee ON DeptNo=EmpDeptNo
    LEFT JOIN Dependent ON EmpNo=DpndEmpNo

Outer join rewritten to avoid join operation:

    SELECT DpndVal FROM Dependent
    WHERE DpndNo=NavigateTo(Dependent)

Data structure: Department over Employee over Dependent.

Figure 13.5 SQL:1999 navigation can avoid joins while maintaining view semantics.

The outer join and the DSE technology can operate dynamically. This has
added value for the capabilities already discussed in this chapter, especially to
the object database operation. It allows all the capabilities shown in Figure 13.1
to operate when initiated interactively, and it enhances many of their
operations. Optimizations can be determined and performed at run time when
dynamic access request requirements are known. Reusability is reinforced when
views are invoked dynamically and transparently optimized because it is no
longer necessary to have as many views. Warehouse database access can support
decision support systems (DSS) by handling ad hoc requests specified at run
time.
But most importantly for object use, it enables late binding and polymor-
phism. An example of late binding and polymorphism for the outer join is that
it allows different access methods and data structures to be dynamically linked
and accessed, as shown in Figure 13.6. Late binding allows the data structure to
be specified at run time. Polymorphism allows the same outer join statement to
process different types of databases to satisfy the request and this happens at run
time thanks to late binding. This combination can be used to support
plug-and-play capabilities, as shown in Figure 13.7.

13.7 Plug and Play


Utilizing the outer join’s late binding and polymorphic capabilities
described in Section 13.6, it is possible to easily create plug-and-play
database components. These plug-and-play components enable applications to
specify complex database requirements using a neutral database modeling and
access language such as SQL with its standard SQL join operation. Because of
the late binding ability, the database components can be plugged in without
reconfiguration. The polymorphic capability enables disparate database types
to also be plugged in without any reconfiguration.

[Figure: late binding, where an application is bound at run time to either of
two views (View 1 and View 2) that arrange A, B, and C in different
structures; polymorphism, where an application accesses the same structure,
X over Y and Z, in either a legacy database or a relational database.]

Figure 13.6 Examples of late binding and polymorphism.

    Application:

        SELECT *
        FROM A
        LEFT JOIN B ON A=B
        LEFT JOIN C ON A=C

    DB plug-in components:

        CompR  Relational database, A over B and C
        CompX  Nonrelational database, A over B and C

Figure 13.7 Plug and play.

13.8 Conclusion
The data modeling and data structure processing ability of the outer join
coupled with the data structure meta information extraction technology
(Chapter 9) can produce the capabilities and attributes shown in Figure 13.1.
These capabilities interact with each other to produce features that are
more powerful than when taken alone. Used together, they help make a very
powerful object/relational interface that has the capabilities required of an
object database and at the same time has the features and characteristics of a
relational interface.
The capabilities presented in this chapter were not accomplished by graft-
ing on new features that do not meld with relational operation, or by arbitrarily
defining new semantics for SQL. The standard SQL outer join operation inher-
ently and seamlessly supplies the framework for the capabilities discussed and
shown in this chapter.
14
Nonrelational SQL-Based Universal Data Access
The growth of the database market resulted in a variety of vendors releasing
SQL products having diverse features, including disparate types, data access
interfaces, and dialects of SQL. There was demand in the database community
for commonality and the ability to use a single SQL dialect and single program-
ming interface in standards-compliant SQL products.
The SQL database companies cooperated to develop standards for the
language and then standards for the data access programming interface. The
international standard SQL Call-Level Interface (SQL/CLI) was published in
1995 and Microsoft aligned its ODBC specification with that standard. When
Java was developed, the JDBC™ specification adopted many of the conven-
tions used with ODBC and SQL/CLI, such as supporting the same SQL
language.
ODBC, SQL/CLI, and JDBC™ all support the SQL OUTER JOIN, and their APIs
provide execution-time capabilities for determining whether a specific
database supports it. ODBC and JDBC™ share a common escape sequence for
expressing an OUTER JOIN in interoperable SQL statements.
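
As a small illustration of the shared escape sequence, the sketch below builds the `{oj ...}` clause that ODBC and JDBC drivers translate into the native outer join syntax of the target database. The helper name `oj_escape` is hypothetical and exists only for this example. The capability check mentioned above is made through `SQLGetInfo` with `SQL_OJ_CAPABILITIES` in ODBC and through `DatabaseMetaData.supportsOuterJoins()` in JDBC; neither call is exercised here.

```python
def oj_escape(left: str, right: str, on: str, kind: str = "LEFT") -> str:
    """Wrap one outer join in the ODBC/JDBC {oj ...} escape clause.

    Hypothetical helper: it only formats text and does not talk to a driver.
    """
    return f"{{oj {left} {kind} OUTER JOIN {right} ON {on}}}"


sql = "SELECT * FROM " + oj_escape("Department", "Employee", "DeptNo=EmpDeptNo")
print(sql)
```

The driver, not the application, is responsible for rewriting the escape clause, which is what lets one statement text run against different SQL dialects.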
Besides ODBC and JDBC™, a variety of other application program-
ming interfaces (APIs) were developed to provide universal data access. Like
ODBC and JDBC™, they exploit SQL as the language for accessing data.
Using these frameworks with SQL to access a nonrelational data source is
feasible using specialized software (database drivers and data providers) for
that data source. With the appropriate driver, you can use SQL to access
spreadsheets, CODASYL databases, and plain text files (if the text files are
structured text).
Many nonrelational data sources are hierarchical in structure, but it’s pos-
sible to interface seamlessly to them using SQL. The outer join’s data modeling
ability provides one more powerful tool for SQL-based universal data access.
To demonstrate outer join’s power, this chapter presents a method that enables
standards-based data access frameworks to seamlessly process structured data
records. This process can be applied to hierarchically structured data, such as
XML, IMS, SAS, and Adabas data.
Structured record processing is usually the last legacy type access that is
implemented by SQL-based universal data access products. Because of the way
the structured data is contiguously stored in structured records, SQL has had a
difficult task interpreting its makeup and mapping it to a relational data struc-
ture. This chapter will show how the ANSI outer join operation can naturally
map these hierarchical structures and how their contiguous structure makeup
can be accessed seamlessly by standard SQL-based universal access frameworks.
Some SQL products are starting to support nested relations, where a given col-
umn of a table can itself contain multiple rows and columns of data. These
nested relations can form hierarchical structures very similar to structured
records, and for this reason can be processed in a similar fashion to that shown
in this chapter.

14.1 Structured Record Overview


Structured records are hierarchical data structures that are stored contiguously
in program memory and also when written to storage. Structured records are
used inherently by programming languages like COBOL and C that can seam-
lessly map these structures with their standard data definition syntax. COBOL
can support variable occurring segments while C is limited to fixed occurring
segments, but both can model multileg hierarchical data structures. These
structured data records are also used heavily by 4GLs to store and transfer hier-
archical data structures from place to place.
The composition of structured records is fairly standard except for slack
bytes that different programming languages can add for boundary alignment.
The example in Figure 14.1 demonstrates how COBOL defines structured data and
how it is represented in memory or on file, where it can be read into memory,
modified, and written out again.

01 Div.
   10 DivName Pic X(20).
   10 ProdCnt Pic 99.
   10 DeptCnt Pic 99.
   10 Dept Occurs 0 To 50 Times
         Depending On DeptCnt.
      20 DeptName Pic X(20).
      20 EmpCnt Pic 99.
      20 Emp Occurs 0 To 50 Times
            Depending On EmpCnt.
         30 EmpName Pic X(20).
   10 Prod Occurs 0 To 50 Times
         Depending On ProdCnt.
      20 ProdName Pic X(20).

Structure:
    Div
      Dept
        Emp
      Prod

Division data:
    Div    Dept    Emp    Prod
    DivX   DeptA   Ron    ProdX
                   Mary   ProdY
           DeptC   Mark

Contiguous record:
    DivX 2 2 DeptA 2 Ron Mary DeptC 1 Mark ProdX ProdY

Figure 14.1 View of a variable-length contiguous structured data record.

Variable-occurring segments use count fields defined in their parent segment
to indicate their number of occurrences. Fixed-occurring segments do not
need to store their occurrence count in the record, since it is fixed and can be
kept in the data definition.
The structured record in Figure 14.1 is composed entirely of
variable-occurrence repeating segments. These variable-occurrence segment types
require a count field stored in the data record for each separate sequence of
these occurrences under their parent segment. This is necessary because the
occurrence count can be different for each parent occurrence. Fixed-occurrence
counts can also be specified for segments. They do not require a count field in
the data because the same number of occurrences is always reserved in
the record. The fixed-occurrence count is contained in the meta data that
defines the record format. An example is shown in Figure 14.2, where the Emp
segment type has been defined as fixed (i.e., 20 Emp Occurs 2 Times). Notice
that a fixed-occurrence count does not represent the actual number of data
occurrences, only that there is a fixed number of segment blocks. Some may
be unused, as the Null slot under DeptC in Figure 14.2 shows.

01 Div.
   10 DivName Pic X(20).
   10 ProdCnt Pic 99.
   10 DeptCnt Pic 99.
   10 Dept Occurs 0 To 50 Times
         Depending On DeptCnt.
      20 DeptName Pic X(20).
      20 Emp Occurs 2 Times.
         30 EmpName Pic X(20).
   10 Prod Occurs 0 To 50 Times
         Depending On ProdCnt.
      20 ProdName Pic X(20).

Structure:
    Div
      Dept
        Emp
      Prod

Division data:
    Div    Dept    Emp    Prod
    DivX   DeptA   Ron    ProdX
                   Mary   ProdY
           DeptC   Mark

Contiguous record:
    DivX 2 2 DeptA Ron Mary DeptC Mark Null ProdX ProdY

Figure 14.2 View of a structured data record with “fixed occurs” Emp segment.
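
The fixed-occurrence layout of Figure 14.2 can be read with a few lines of code. The following is a minimal sketch, not the book's implementation: it assumes character data with 20-byte name fields and 2-digit count fields as in the COBOL definition, and it assumes that an unused Emp block is blank-filled, which the figure shows only as Null. The names `pad`, `read_fixed`, and `EMP_SLOTS` are illustrative.

```python
NAME = 20       # width of every Pic X(20) field
CNT = 2         # width of every Pic 99 count field
EMP_SLOTS = 2   # "Emp Occurs 2 Times": kept in the definition, not in the record


def pad(text: str) -> str:
    return text.ljust(NAME)


# Figure 14.2 record; the unused Emp slot is assumed to be blank-filled.
record = (
    pad("DivX") + "02" + "02"                    # DivName, ProdCnt, DeptCnt
    + pad("DeptA") + pad("Ron") + pad("Mary")
    + pad("DeptC") + pad("Mark") + pad("")
    + pad("ProdX") + pad("ProdY")
)


def read_fixed(rec: str):
    pos = 0

    def take(width: int) -> str:
        nonlocal pos
        chunk = rec[pos:pos + width]
        pos += width
        return chunk

    div = take(NAME).rstrip()
    prod_cnt = int(take(CNT))
    dept_cnt = int(take(CNT))
    depts = []
    for _ in range(dept_cnt):
        dept = take(NAME).rstrip()
        # Always read EMP_SLOTS blocks; an empty block becomes None.
        emps = [take(NAME).rstrip() or None for _ in range(EMP_SLOTS)]
        depts.append((dept, emps))
    prods = [take(NAME).rstrip() for _ in range(prod_cnt)]
    return div, depts, prods


print(read_fixed(record))
```

Note that no EmpCnt is read: the reader always consumes two Emp blocks per Dept, so the second block under DeptC occupies space in the record even though it carries no data.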

14.2 SQL Structured Data Access Basics


The outer join syntax can be used to define a view of the hierarchical structure
for a structured data record so it can be seamlessly accessed. This can be
performed by defining each segment type of the structured record as a relational
table. Then, whenever the structured record is queried by SQL, either by itself or
as part of a larger structure, the outer join structured record view is used to
define the structured record portion of the logical view. Figure 14.3
demonstrates this.
Since structured data segments are contiguous, they do not need or usually
contain unique and foreign keys for linking. These missing keys are added
to the SQL view definition in Figure 14.3 as virtual surrogate keys that are
processed by the structured record processor, which is described later in
Section 14.4. To define the structured record accurately, the structured record
SQL view must specify its legs in the same order in which they occur in the
physical data structure. This is not necessary in a logical hierarchical
structure, but may be required in a physical structure for navigation.

DEFINE DivView AS
    SELECT * FROM Div
    LEFT JOIN Dept
        ON DivKey=DeptDivFkey
    LEFT JOIN Emp
        ON DeptKey=EmpDeptFkey
    LEFT JOIN Prod
        ON DivKey=ProdDivFkey

Structure:
    Div
      Dept
        Emp
      Prod

SELECT Div, Dept, Prod
FROM DivView

Result:
    Div    Dept    Prod
    DivX   DeptA   ProdX
    DivX   DeptC   ProdX
    DivX   DeptA   ProdY
    DivX   DeptC   ProdY

Contiguous record:
    DivX DeptA Ron Mary DeptC Mark ProdX ProdY

Figure 14.3 Using hierarchical SQL view to access structured data.

All SQL access to the structured record is performed through the outer
join view that defines it in its entirety. This has the advantage that this view is
the only view necessary for accessing the structured data record. Because of the
SQL optimization documented in Chapter 11, Section 11.3, this view always
eliminates unnecessary table accesses for each specific use of the view. This
means there is never a penalty for using this global view.
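
To make the view of Figure 14.3 concrete, the sketch below loads the Figure 14.1 data into four tables, one per segment type, and defines the outer join view over them. It is an illustration under stated assumptions: SQLite stands in for the structured record processor, the integer surrogate keys are invented for the example, and the view is created with CREATE VIEW and named columns rather than the figure's DEFINE and SELECT *. A general-purpose engine does not necessarily prune the unreferenced Emp leg, so DISTINCT is used here as a stand-in for the hierarchical optimization the text refers to.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Div  (DivKey INTEGER PRIMARY KEY, DivName TEXT);
CREATE TABLE Dept (DeptKey INTEGER PRIMARY KEY, DeptDivFkey INTEGER, DeptName TEXT);
CREATE TABLE Emp  (EmpKey INTEGER PRIMARY KEY, EmpDeptFkey INTEGER, EmpName TEXT);
CREATE TABLE Prod (ProdKey INTEGER PRIMARY KEY, ProdDivFkey INTEGER, ProdName TEXT);
INSERT INTO Div  VALUES (1, 'DivX');
INSERT INTO Dept VALUES (1, 1, 'DeptA'), (2, 1, 'DeptC');
INSERT INTO Emp  VALUES (1, 1, 'Ron'), (2, 1, 'Mary'), (3, 2, 'Mark');
INSERT INTO Prod VALUES (1, 1, 'ProdX'), (2, 1, 'ProdY');
-- Legs are listed in physical order: Dept, then Emp under Dept, then Prod.
CREATE VIEW DivView AS
  SELECT DivName, DeptName, EmpName, ProdName
  FROM Div
  LEFT JOIN Dept ON DivKey = DeptDivFkey
  LEFT JOIN Emp  ON DeptKey = EmpDeptFkey
  LEFT JOIN Prod ON DivKey = ProdDivFkey;
""")

# The query of Figure 14.3: only Div, Dept, and Prod are requested.
rows = con.execute(
    "SELECT DISTINCT DivName, DeptName, ProdName FROM DivView "
    "ORDER BY ProdName, DeptName"
).fetchall()
full_count = con.execute("SELECT COUNT(*) FROM DivView").fetchone()[0]

for row in rows:
    print(row)
print("rows in the unpruned view:", full_count)
```

The four printed rows match the result table in Figure 14.3, while the unpruned view holds six rows because each Emp occurrence multiplies its Dept row.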

14.3 Internal Navigation and Mapping of Structured Data


To access a structured data record, it must first be mapped so that all segment
types are easily accessible and their occurrences can be navigated. In order to
map the structured data record, its data definition is necessary. This data defini-
tion describes the hierarchical data structure, its segments, and their hierarchi-
cal level and relationships to other segment types in the structure. As stated
previously, fixed segment occurrence counts are stored in the data definition,
while variable segment occurrence counts are stored in the data record. The
pseudo code in Figure 14.4 uses the hierarchical order (top to bottom, left
to right) and physical database hierarchical level of the segment definitions in
the data structure definition to drive the mapping and segment decomposition
process.

Set Structured Record Buffer address to start of structured record input data.
Set current position in View Definition to root segment definition.
Init Segment Address Stack to empty.

/* Outer Do Loop is invoked when entering a segment definition to process
   the first instance of its data segment occurrences. */

Do Forever /* Start of outer loop */
  If mode=Read-Only and rest of structured data record not required.
    Then Exit, processing complete.
  End If.
  If the Current Segment Definition has a Fixed occurrence count.
    Set Active Occurrence Count to the fixed amount in the view definition.
  Else the Current Segment Definition is a variable occurrence stored in record.
    Set Active Occurrence Count to count found in the parent data segment.
  End If.
  Push Current Segment Definition address onto Segment Address Stack.

  /* Inner Do Loop stores segment occurrences in an accessible memory
     structure and determines the next Segment Definition to process. */

  Do Forever /* This is the start of the inner loop */
    If Active Occurrence Count for Current Segment Definition > 0.
      If Mode=Update or this segment type is required for processing.
        Store this segment data occurrence in an accessible memory structure.
      End If.
      Subtract 1 from Active Occurrence Count of Current Segment Definition.
      Set Buffer Address to point past current segment occurrence.
      If next Segment Definition in view is at a lower hierarchical level.
        Set Current Segment Definition to next one in view.
        Exit inner loop.
      End If.
    Else end of segment occurrences reached for Current Segment Definition.
      If Segment Address Stack is empty.
        Then processing is complete.
      End If.
      Pop and discard top address in Segment Address Stack.
      Locate next Segment Definition at the same hierarchical level.
      If locate successful.
        Set Current Segment Definition to the one just located.
        Exit inner Do loop.
      Else locate was not successful.
        Set Current Segment Definition to the one on top of stack.
      End If.
    End If.
  End Do of inner loop.
End Do of outer loop.

Figure 14.4 Pseudo code to decompose and map a structured record.

The pseudo code in Figure 14.4 has a couple of optimizations for bypassing
the storing of unnecessary segment occurrences. These are possible when
the data is for read-only purposes and will not be updated. Another possible
optimization is to hold off invoking this segment decomposition routine
until after the root segment for the active record is processed. This is possible
because the root segment will be processed first, before the lower level segments
of the record are required. The root segment is the leading segment and is
accessible without performing the segment decomposition routine. This is an
optimization because the root segment very often contains record
selection or join qualification criteria that may cause further processing of
the record to be bypassed, which avoids decomposing the record at all.
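
The decomposition idea can also be sketched in runnable form. The code below is a simplified recursive variant, not a transcription of the stack-driven loops in Figure 14.4: it handles only variable-occurrence segments, omits the early exit, and uses each occurrence's name as its parent link where a real processor would generate surrogate keys. `SegDef`, `decompose`, and the `needed` parameter are names assumed for this example. The `needed` set models the read-only optimization: unneeded segment types are still navigated across but are not stored.

```python
from dataclasses import dataclass, field

NAME, CNT = 20, 2   # Pic X(20) name fields, Pic 99 count fields


@dataclass
class SegDef:
    name: str
    count_field: str = ""                          # parent field holding this segment's count
    counts: list = field(default_factory=list)     # count fields stored in this segment
    children: list = field(default_factory=list)   # child definitions in physical order


DIV = SegDef("Div", counts=["ProdCnt", "DeptCnt"], children=[
    SegDef("Dept", "DeptCnt", counts=["EmpCnt"], children=[SegDef("Emp", "EmpCnt")]),
    SegDef("Prod", "ProdCnt"),
])


def decompose(record, root, needed=None):
    """Map a record into {segment type: [(value, parent value), ...]}."""
    tables, pos = {}, 0

    def walk(seg, parent):
        nonlocal pos
        value = record[pos:pos + NAME].rstrip()
        pos += NAME
        counts = {}
        for cnt in seg.counts:                 # variable counts live in the record
            counts[cnt] = int(record[pos:pos + CNT])
            pos += CNT
        if needed is None or seg.name in needed:
            tables.setdefault(seg.name, []).append((value, parent))
        for child in seg.children:             # unneeded types are walked, not stored
            for _ in range(counts[child.count_field]):
                walk(child, value)

    walk(root, None)
    return tables


def pad(text):
    return text.ljust(NAME)


# The record of Figure 14.1.
record = (pad("DivX") + "02" + "02"
          + pad("DeptA") + "02" + pad("Ron") + pad("Mary")
          + pad("DeptC") + "01" + pad("Mark")
          + pad("ProdX") + pad("ProdY"))

print(decompose(record, DIV))
print(decompose(record, DIV, needed={"Div", "Prod"}))
```

The second call stores only Div and Prod occurrences, yet it still has to read DeptCnt and each EmpCnt to find where the Prod occurrences begin.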
If the structured record is to be updated, including inserting segment
occurrences, the structured record must also be moved into a hierarchically
linked structure, or at least expanded while it is being mapped. This will allow
for the insertion of segment occurrences. Writing an updated structured record
back out is accomplished by first compressing it back into a contiguous
structured record. This process is much easier than expanding the data structure,
since it has already been mapped.
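
The update path can be illustrated in the same style. In this sketch the linked structure is a plain nested dictionary, an assumption made for brevity, and `compose` is a hypothetical name for the compression step. It rebuilds the contiguous record of Figure 14.1 and recomputes every count field from the linked structure, so an inserted occurrence needs no separate bookkeeping.

```python
NAME = 20   # width of every Pic X(20) field


def pad(text):
    return text.ljust(NAME)


def compose(div):
    """Compress a linked Div structure back into one contiguous record."""
    out = [pad(div["name"]),
           f"{len(div['prods']):02d}",      # ProdCnt
           f"{len(div['depts']):02d}"]      # DeptCnt
    for dept in div["depts"]:
        out.append(pad(dept["name"]))
        out.append(f"{len(dept['emps']):02d}")   # EmpCnt
        out.extend(pad(emp) for emp in dept["emps"])
    out.extend(pad(prod) for prod in div["prods"])
    return "".join(out)


div = {"name": "DivX",
       "depts": [{"name": "DeptA", "emps": ["Ron", "Mary"]},
                 {"name": "DeptC", "emps": ["Mark"]}],
       "prods": ["ProdX", "ProdY"]}

before = compose(div)
div["depts"][1]["emps"].append("John")   # insert a new Emp occurrence under DeptC
after = compose(div)

print(len(before), len(after))
print(before[106:108], after[106:108])   # DeptC's EmpCnt before and after the insert
```

Inserting one Emp lengthens the record by one 20-byte block and shifts every Prod occurrence, which is why the insert is done on the expanded structure rather than in place.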
It is worth noting that languages that can define hierarchical structures,
including COBOL, C, C++, XML, Java, and Haskell, have the procedural
flexibility to define structures that do not conform to good structure
definition principles. These can cause problems for mapping procedures like the one in Figure
14.4. The most important rule to observe when defining a hierarchical struc-
ture is to keep each segment’s data definition contiguous. This means that once
a lower level child segment type is defined, it should indicate the end of the par-
ent segment. Any remaining segment data is ambiguous to the structure defini-
tion process.

14.4 SQL-Based Universal Data Access of Structured Data


SQL is the most widely adopted technology for accessing data across a broad
spectrum of data sources. The SQL community has developed standard
application programming interfaces, including ODBC and JDBC, that can provide
access to heterogeneous databases and structured data, such as text and
spreadsheets. Although these APIs support SQL, access to disparate data sources
is not always a straightforward procedure. There can be important differences in
types, functions, and other features. Structured data records present an
additional access problem because of their contiguous format. The data access
middleware design in Figure 14.5 uses a two-step process to interface
structured records seamlessly to SQL-based universal data access interfaces. The
structured record processor box in the diagram moves the data between the
structured data record and the intermediate tables using the data structure
metadata extracted from outer join specifications to navigate the structured
record. The data provider component moves the data between the intermediate
tables and the universal data access interfaces (for example, ODBC). By using
these intermediate virtual tables, any order of SQL requests from the universal
data access interfaces can be handled in a direct fashion, including updates.
With the outer join modeling the structured data record, this method produces
a truly seamless interface process with the SQL-based universal data access
interfaces.

[Figure: layered diagram, top to bottom. Structured record (DivX; DeptA with
Ron and Mary; DeptC with Mark; ProdX and ProdY); Structured Record Processor;
intermediate tables Div, Dept, Emp, Prod; Data Provider/Driver; SQL;
ODBC, JDBC, etc.; Middleware Product.]

Figure 14.5 Interfacing to SQL-based universal data access middleware.

Because structured records on file are more easily addressed through their
root segment, this can affect processing of SQL WHERE and ON clauses that
reference data in lower level segments in structured records. For root references,
the structured record processor in Figure 14.5 can directly address the required
structured records on file, while for lower level references it will have to sequen-
tially search through the selected structured records’ contents unless a second-
ary index was used.

14.5 Handling Multiple Structure Formats Within a File


Files that contain structured records may also contain multiple record formats
that are interspersed in the file. These structured records will have a field in
their root segment that distinguishes the different record types in the file.
Applications can handle these different record formats by testing this format
indication in the root segment and then using the proper structure overlay to
process it. A similar technique can be used for SQL queries to ensure that only
records of a specific format are processed by selecting on the format indication.
This is usually appropriate for queries, since a query usually requires only one
format at a time. This format selection can be specified as in:

    SELECT EmpNo FROM StructuredView WHERE DeptNo=123 AND StructuredFormat=2

In this example, the DeptNo and StructuredFormat fields are located in the
root segment. This technique works because the structured record can be
retrieved and its root segment tested without the need to decompose the
structured record, as discussed in Section 14.3.
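
A minimal sketch of this root-only test follows. The record layout is invented for the example (a 1-digit format indicator followed by a 3-digit DeptNo at the start of the root segment), and `root_fields` and `select_records` are hypothetical names. The point is only that the selection reads the leading bytes of each record and never decomposes the rest.

```python
def root_fields(record):
    """Read only the leading root segment: (format indicator, DeptNo)."""
    return int(record[0:1]), int(record[1:4])


def select_records(records, fmt, dept_no):
    """Yield the records whose root matches; only these need decomposing."""
    for record in records:
        if root_fields(record) == (fmt, dept_no):
            yield record


# Four interspersed records; the trailing letters stand for lower level data.
file_records = ["1123AAAA", "2123BBBB", "2456CCCC", "2123DDDD"]
selected = list(select_records(file_records, fmt=2, dept_no=123))
print(selected)
```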

14.6 Interfacing to Prerelational and Postrelational Data


Interestingly, prerelational and postrelational systems are very similar. They
both process complex hierarchical data models, while conventional relational
databases use simple two-dimensional data tables and result structures. In this
regard, prerelational and postrelational systems have similar tasks to perform
in order to process them using SQL requests. This means that they too can
be processed in a similar fashion to structured records, as is demonstrated in
Figure 14.6, which replaces the structured data record processor in Figure 14.5
with a nested relational processor. This could also have been an IMS database
or any other hierarchical database (see Chapter 11 for an IMS example).

14.7 The Importance of the View for Contiguous Data



Nested relational tables:
    Div
      Dept
        Emp
      Prod

Nested Relational Processor

Figure 14.6 Interfacing universal data access to nested relational structures.

With contiguous data, as described earlier, the entire contiguous data structure
must be known to handle all possible data access requests. This is because it
may be necessary to navigate across unnecessary and unrelated data to get to
required data. For example, in our structure example shown in Figure 14.3, in
order to access Product data it is necessary to navigate over Division,
Department, and Employee data. Navigating over Division data is expected,
because it is on the structure path to Product data. Department and Employee
data are not on a path to Product data, yet they must still be navigated
across. This is because they physically precede the Product data in the
contiguous structured record, which places the starting point of the
Product data at a variable location in the record that can only be found by
understanding the data structure of the preceding data. This explains why the
entire structure of the contiguous data structure is necessary to access it.
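
The variable starting point of the Product data can be shown directly. This sketch assumes the Figure 14.1 layout with 20-byte names and 2-digit counts; `prod_offset` is a hypothetical helper. It reads nothing but count fields, yet it cannot skip them: the offset of the first Prod occurrence depends on every DeptCnt and EmpCnt that precedes it.

```python
NAME, CNT = 20, 2   # Pic X(20) name fields, Pic 99 count fields


def pad(text):
    return text.ljust(NAME)


def prod_offset(record):
    """Return the offset of the first Prod occurrence in a Div record."""
    pos = NAME + CNT                          # skip DivName and ProdCnt
    dept_cnt = int(record[pos:pos + CNT])
    pos += CNT
    for _ in range(dept_cnt):
        pos += NAME                           # skip DeptName
        emp_cnt = int(record[pos:pos + CNT])
        pos += CNT + emp_cnt * NAME           # skip EmpCnt and every Emp occurrence
    return pos


head = pad("DivX") + "02" + "02" + pad("DeptA") + "02" + pad("Ron") + pad("Mary")
prods = pad("ProdX") + pad("ProdY")
record_1 = head + pad("DeptC") + "01" + pad("Mark") + prods
record_2 = head + pad("DeptC") + "02" + pad("Mark") + pad("John") + prods

print(prod_offset(record_1), prod_offset(record_2))
```

The two records differ by a single Emp occurrence, and the Product data starts 20 bytes later in the second one.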
One of the advantages of data modeling SQL is that the data structure
meta information necessary to access data structures is contained in the same
SQL used to access it. Contiguous data structures may present a problem in
this case since the entire data structure is necessary to access them and usually
only the portion of the data structure necessary to access the required data is
defined in the access SQL. The solution to this problem is to supply one global
view of the contiguous structure and require that it be used for all access of the
contiguous data structure. This may seem to cause a problem where the
overdefinition will cause unnecessary processing and storing of data. This is not
the case, because of the structured view optimization described in Chapter 11.
This optimization eliminates unnecessary processing of pathways that are
specified in the SQL view but not needed by the request. This also means that
having one SQL view definition for any type of structure will always work
without imposing any additional processing.
Utilizing the SQL view as a global application definition for structured
data, as described above, offers the opportunity for the SQL view definition to
contain the required meta information necessary for the access of the defined
physical hierarchical nonrelational structure. This access will be performed by
the access method for this database type. In this way, the SQL that makes up
the global view is the logical (application) data structure while the physical data
structure information is stored in the view definition. The amount of the physi-
cal structure that will require accessing is determined by the data that is selected
for accessing or processing. Physical network data structures can be handled by
having a global view definition for each global hierarchical view derived from
the physical network view. How the physical nonrelational meta information is
obtained, stored, and utilized is outside of the standard SQL specification,
keeping this SQL-based nonrelational structured access ANSI standard.

14.8 Conclusion
A variety of applications make use of structured, hierarchical and tagged data.
When structured data blocks are written out to a file, they are accessible as
structured records. This chapter has shown how these structured records can be
seamlessly processed by SQL. In order to demonstrate this, it was shown how
structured records are composed and decomposed for access. It was then shown
how SQL processing can seamlessly map to and from a decomposed structured
record. Finally, it was shown how SQL structured record access can be imple-
mented seamlessly using SQL data access with APIs such as ODBC, SQL/CLI
and JDBC. This structured data example was used because it can be easily
adapted to operate with all other physical forms of hierarchical data.
Part IV
Advanced Data Structure Processing Capabilities
Part IV describes the new capabilities for supporting SQL hierarchical process-
ing with advanced and extended operations. Chapter 15 introduces advanced
lower level structured data linking, opening new data modeling capabilities and
unlimited structure join capabilities. Chapter 16 covers three new ways to com-
bine data structures using joining, mashups, and table association for advanced
ways to heterogeneously integrate and filter data. Chapter 17 describes how to
dynamically increase data value and flexibility of queries, making them more
powerful, supporting hierarchical optimization, dynamic structure joining, and
the needed structure-aware processing. Chapter 18 covers how the lowest com-
mon ancestor (LCA) processing automatically supports multipath hierarchical
structure processing naturally in SQL. Chapter 19 introduces many forms of
data structure generation, looking forward and backward to support the
different types of variable structure generation that are discussed. Chapter 20
demonstrates semantically controlled data structure transformations involving
restructuring, reshaping, and data virtualization. Finally, Chapter 21
introduces the new automatic processing of remote dynamic structured data,
which enables capabilities such as new software development techniques using
social collaboration.

15
Advanced Lower Structure Linking
Advanced lower structure linking applies to hierarchically linking to the lower
structure in a way that is not covered in the linking rules specified in Chapter 6.
Normally when linking to the lower structure, the root of the lower structure is
the only link point that can be referenced. This creates a valid hierarchy, and
one that can be built top to bottom as would normally be expected for a hierar-
chy. But there may be times when it is desirable to link to an existing lower
level structure not based on its root. This is actually possible, and it will form a
valid logical hierarchical structure with hierarchical semantics that are seam-
lessly compatible with standard SQL view processing.

15.1 Overview of Nonroot Lower Level Linking


As stated above, it is often convenient and necessary to link to an existing lower
level data structure by referencing nonroot segments in the lower structure.
This is possible and will form a valid hierarchical structure with hierarchical
semantics, but hierarchical structures built in this manner cannot always be
processed in a strict top-to-bottom fashion. This advanced linking process is
shown in Figure 15.1, and the special processing precautions it may require
are covered in this chapter.
Figure 15.1 demonstrates, as first pointed out in Chapter 6, that when
linking below the root segment of a lower level structure, the root-level segment
remains the lower level structure link point. This rule is supported by the fact
that the Department segment used in the lower level link criteria is itself
dependent on the Division segment’s existence, as shown in the example in
Figure 15.1. This means that the Division segment has to be linked to the
Department segment before the Manager segment is linked to the lower
structure, which semantically follows the expanded SQL syntax used in these
situations. This logically makes the lower level structure root the link point since all
segments under it are dependent on it. This also means that hierarchical
top-down processing is not always possible with this linking method.

SELECT * FROM Manager LEFT JOIN DivView ON Mgr=DeptMgr

Input structures (Manager links to Department):
    Manager
    DivView view:
      Division
        Department

Resulting structure:
    Manager
      Division
        Department

Figure 15.1 Example of nonroot-level linking of bottom structure.

15.2 Previous Nonroot Lower Level Linking Method


Some prerelational systems supported linking to lower level substructures using
a nonroot-level reference point. The easiest way to handle this for prerelational
systems was to make the reference point of the lower level structure the link
point, which caused the substructure to be inverted around the link point. This
also causes all other paths originating from the root segment of the lower
structure to be discarded. An example of this is shown in Figure 15.2.
This approach changes the shape of the lower level structure, and thus its
semantics change as well. For example, in the resulting structure, Division no
longer affects Department and Product is removed. So this is probably not the
best approach to take if another, more seamless approach is available. Linking
to a nonroot-level link point in this way does not emulate SQL’s natural join
syntax and semantics.

15.3 Semantics of Nonroot Lower Level Linking



Input structures (Manager links to Department):
    Manager
    Division view:
      Division
        Department
          Employee
        Product

Resulting structure:
    Manager
      Department
        Division
        Employee

Figure 15.2 Example of old method of performing nonroot-level linking.

Nonroot lower level structure linking can also be performed using multiple link
points as long as they originate from a single upper level structure link point as
defined in linking rule two in Chapter 6. An example of this operation with its
data structure diagram and SQL is shown in Figure 15.3. Even with multiple
paths to the lower structure, the root of the lower data structure is semantically
the link point and the standard SQL outer join semantically and operationally
supports this derived data structure. The lower level structure, which is usually
built before it is joined, is filtered when joined according to the link criteria.
This is the same process that occurs when structures are built bottom-up and
throwaways (retrieved row discards) occur, as was described in Chapter 11. In
the example in Figure 15.3, the Division view is filtered according to the
Manager link value as it is linked. This means that as each manager is linked to
the Division view, only the Departments and/or Products that the particular
manager manages are preserved. This is a simplified description, expanded
further below.

SELECT * FROM Manager
LEFT JOIN DivView ON DeptMgr=Mgr OR ProdMgr=Mgr

Input structures (link 1: Manager to Department; link 2: Manager to Product):
    Manager
    DivView view:
      Division
        Department
          Employee
        Product

Resulting structure:
    Manager
      Division
        Department
          Employee
        Product

Figure 15.3 Multiple path nonroot reference to lower structure.

To understand multiple path nonroot references, it is easier if single path
references are understood first. If the SQL ON clause in Figure 15.3 did not
specify an AND or OR operator, so that there is only one link criterion (say, a
Department comparison), the Manager link would only be made on a Department
match, with all other nonmatching Departments filtered out. But no Products
would be filtered, since no filtering criteria would be specified for them. These
semantics are intuitive, unambiguous, and useful.
The SQL statement in Figure 15.3 does use an OR to link managers to
the lower structure based on whether they are a department manager or a
product manager, creating a multiple path reference. If a manager is neither, then he
or she will not be linked to the lower structure. If a manager is a department
manager, he or she will be linked to the lower structure with all other
nonmatching Departments filtered out, but with no Products filtered out. If a
manager is a product manager, the reverse is true; he or she will be linked to the
lower structure with all other nonmatching Products filtered out, but with no
Departments filtered out. If a manager is a manager of a department and a
product, then he or she will be linked to the matching lower structure and no
filtering of the Department and Product will occur. This is consistent with the
one-sided matches just described and follows the natural hierarchical sibling leg
query filtering semantics described in Chapter 5.
If the SQL ON clause in Figure 15.3 specified an AND operator instead
of an OR operator, then a multiple path link would only match a situation
where the employee was both a manager for a product and a manager for a
department in the same division, and all other departments and products would be
filtered out. The manager would have to be a department and product manager
from the same Division because of the common parent rule, also described in
Chapter 5.
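
These OR and AND semantics can be checked against the flat relational result. The sketch below is an illustration, not the book's processor: it loads the data listed in Figure 15.4 into SQLite, builds DivView with outer joins, and for each manager collects the set of Departments and Products that survive the link. The table layout, the natural-value keys, and the helper name `linked` are assumptions made for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Manager    (Mgr TEXT);
CREATE TABLE Division   (DivName TEXT);
CREATE TABLE Department (Dept TEXT, DeptDiv TEXT, DeptMgr TEXT);
CREATE TABLE Employee   (Emp TEXT, EmpDept TEXT);
CREATE TABLE Product    (Prod TEXT, ProdDiv TEXT, ProdMgr TEXT);
INSERT INTO Manager VALUES ('Mike'), ('Ralph'), ('Jim');
INSERT INTO Division VALUES ('DivX');
INSERT INTO Department VALUES
  ('DeptA', 'DivX', 'Mike'), ('DeptB', 'DivX', 'Don'), ('DeptC', 'DivX', 'Ralph');
INSERT INTO Employee VALUES
  ('Ron', 'DeptA'), ('Mary', 'DeptA'), ('Jane', 'DeptB'),
  ('Steve', 'DeptB'), ('Mark', 'DeptC'), ('John', 'DeptC');
INSERT INTO Product VALUES ('ProdX', 'DivX', 'Jim'), ('ProdY', 'DivX', 'Mike');
CREATE VIEW DivView AS
  SELECT DivName, Dept, DeptMgr, Emp, Prod, ProdMgr
  FROM Division
  LEFT JOIN Department ON DivName = DeptDiv
  LEFT JOIN Employee   ON Dept = EmpDept
  LEFT JOIN Product    ON DivName = ProdDiv;
""")


def linked(op):
    """Return {manager: (departments kept, products kept)} for op = 'OR' or 'AND'."""
    sql = f"""SELECT Mgr, Dept, Prod FROM Manager
              LEFT JOIN DivView ON DeptMgr = Mgr {op} ProdMgr = Mgr"""
    result = {}
    for mgr, dept, prod in con.execute(sql):
        depts, prods = result.setdefault(mgr, (set(), set()))
        if dept is not None:
            depts.add(dept)
        if prod is not None:
            prods.add(prod)
    return result


for op in ("OR", "AND"):
    print(op, linked(op))
```

With OR, Ralph (a department manager only) keeps DeptC and both products, Jim (a product manager only) keeps ProdX and all three departments, and Mike (both) keeps everything. With AND, only Mike is linked, and only to DeptA and ProdY.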
To see why these semantics make sense, and why SQL and structured data
produce results that support them, some query examples will be examined. The
queries in the next sections use the data in Figure 15.4. The data results are
presented both in a structured format and in a flat, two-dimensional relational
format, which uses the Cartesian product to represent the data. The data
results contain sibling segment paths in order to demonstrate their semantic
operation.

Manager Table:

Mgr
Mike
Ralph
Jim

Structured DivView View:

Division  Dept   DeptMgr  Emp    Prod   ProdMgr
DivX      DeptA  Mike     Ron    ProdX  Jim
                          Mary   ProdY  Mike
          DeptB  Don      Jane
                          Steve
          DeptC  Ralph    Mark
                          John

Relational Cartesian Product View of DivView:

Division  Dept   DeptMgr  Emp    Prod   ProdMgr
DivX      DeptA  Mike     Ron    ProdX  Jim
DivX      DeptA  Mike     Ron    ProdY  Mike
DivX      DeptA  Mike     Mary   ProdX  Jim
DivX      DeptA  Mike     Mary   ProdY  Mike
DivX      DeptB  Don      Jane   ProdX  Jim
DivX      DeptB  Don      Jane   ProdY  Mike
DivX      DeptB  Don      Steve  ProdX  Jim
DivX      DeptB  Don      Steve  ProdY  Mike
DivX      DeptC  Ralph    Mark   ProdX  Jim
DivX      DeptC  Ralph    Mark   ProdY  Mike
DivX      DeptC  Ralph    John   ProdX  Jim
DivX      DeptC  Ralph    John   ProdY  Mike

Figure 15.4 Data used in following nonroot linking examples.

15.4 Single Path Reference to Lower Structure


A single path reference below the root to a lower level structure can consist of a
single reference or multiple ANDed references along a single path in the lower
structure. In the latter case, this can include the root of the lower structure.
Figure 15.5 shows an example of linking to a lower level structure using a single
reference below the root. Single or multiple references ANDed along a path
operate on the same semantic filtering principles, so this example should suffice
in all single path cases. This example’s results and the others in this chapter use
a structured format to emphasize the data structure being displayed.
The SQL query statement in Figure 15.5 hierarchically links the upper
level structure consisting of only the Manager table to the lower level DivView
structure. This link is based on the lower level structure’s DeptMgr data field
located below the root of the lower level structure, which creates the hierarchi-
cal structure and data shown in Figure 15.5—the associated semantics were
described in Section 15.3. DeptB is filtered out since its department manager
Don is not in the Manager table. Along with DeptB, its Employees are also fil-
tered out, as you would expect. The last result in Figure 15.5 lists manager Jim

    SELECT *
    FROM Manager
    LEFT JOIN DivView
      ON DeptMgr=Mgr

[Diagram: Mgr over Div; Div over Dept and Prod; Dept over Emp.]

Structured Result:

Mgr    Division  Dept   DeptMgr  Emp   Prod   ProdMgr
Mike   DivX      DeptA  Mike     Ron   ProdX  Jim
                                 Mary  ProdY  Mike
Ralph  DivX      DeptC  Ralph    Mark  ProdX  Jim
                                 John  ProdY  Mike
Jim    ---       ---    ---      ---   ---    ---

Figure 15.5 Single path nonroot reference to lower structure data example.

with no other data since Jim is a product manager and not a department man-
ager, and the linking was based on department managers. Notice that all the
other data on the nonfiltered paths are not filtered out. This structured result
also reflects the same result (minus the replicated data) applied relationally, as
can be seen by applying the link criteria to each row in the Cartesian product
in Figure 15.4.
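
The relational check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is copied from Figure 15.4, while the helper name left_join and the tuple layout are hypothetical and are not part of any SQL processor described in this book.

```python
# Data from Figure 15.4 (Manager table and the DivView structure).
managers = ["Mike", "Ralph", "Jim"]
departments = {  # Dept -> (DeptMgr, [Emp, ...])
    "DeptA": ("Mike", ["Ron", "Mary"]),
    "DeptB": ("Don", ["Jane", "Steve"]),
    "DeptC": ("Ralph", ["Mark", "John"]),
}
products = {"ProdX": "Jim", "ProdY": "Mike"}  # Prod -> ProdMgr

# Relational Cartesian product view of DivView (the 12 rows of Figure 15.4).
# Tuple layout: (Division, Dept, DeptMgr, Emp, Prod, ProdMgr)
div_view = [
    ("DivX", dept, dept_mgr, emp, prod, prod_mgr)
    for dept, (dept_mgr, emps) in departments.items()
    for emp in emps
    for prod, prod_mgr in products.items()
]

def left_join(link):
    """Hypothetical helper: LEFT JOIN Manager to DivView.

    Every manager is preserved; a manager with no matching row gets [None].
    """
    result = {}
    for mgr in managers:
        matches = [row for row in div_view if link(mgr, row)]
        result[mgr] = matches or [None]
    return result

# ON DeptMgr=Mgr  (row[2] is the DeptMgr column)
single_path = left_join(lambda mgr, row: row[2] == mgr)

for mgr, rows in single_path.items():
    depts = sorted({r[1] for r in rows if r})
    prods = sorted({r[4] for r in rows if r})
    print(mgr, depts, prods)
```

Removing the replicated values from the surviving rows gives the structured result of Figure 15.5: Mike keeps only DeptA, Ralph keeps only DeptC, both keep every Product, and Jim is preserved with no lower level data.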

15.5 Multiple Path References to Lower Structure


A more complex lower level linking occurs when multiple paths to the lower
level structure are used. While multiple path lower level linking does create a
valid hierarchical structure, the results may appear ambiguous, depending on
how the data is used. If the result does not fit the intended use of the data,
this can usually be corrected by using a single path reference, but sometimes a
multiple path reference is exactly what is needed.
The SQL query statement in Figure 15.6 hierarchically links the upper
level structure consisting of only the Manager table to the lower level DivView
structure. This link is based on the lower level structure’s DeptMgr or ProdMgr
data fields located below the root and on different paths of the lower level struc-
ture, creating the hierarchical structure and result shown. Since manager Mike
is both a department and product manager, no Departments or Products are
filtered out since a match in product manager includes all Departments and a
match in department manager includes all Products. Manager Ralph matches
with DeptC only, thereby filtering out other Departments, but not Products.
Manager Jim only matches with product X, thus filtering out other Products

    SELECT *
    FROM Manager
    LEFT JOIN DivView
      ON DeptMgr=Mgr
      OR ProdMgr=Mgr

[Diagram: Mgr over Div; Div over Dept and Prod; Dept over Emp.]

Structured result:

Mgr    Division  Dept   DeptMgr  Emp    Prod   ProdMgr
Mike   DivX      DeptA  Mike     Ron    ProdX  Jim
                                 Mary   ProdY  Mike
                 DeptC  Ralph    Mark   ProdX
                                 John   ProdY
                 DeptB  Don      Jane   ProdX
                                 Steve  ProdY
Ralph  DivX      DeptC  Ralph    Mark   ProdX  Jim
                                 John   ProdY  Mike
Jim    DivX      DeptA  Mike     Ron    ProdX  Jim
                                 Mary
                 DeptB  Don      Jane
                                 Steve
                 DeptC  Ralph    Mark
                                 John

Figure 15.6 Multiple path nonroot reference to lower structure data example.

but not Departments. As stated previously, the multiple path semantics dem-
onstrated here were covered in Chapter 5 under sibling leg semantics.
This structured result also reflects the same result applied relationally,
as can be seen by applying the link criteria to each row in the Cartesian product
in Figure 15.4. This result may seem ambiguous since in some cases Products
are filtered and in other cases Departments are filtered. But it does link
the structure to the DivView structure hierarchically and may be useful if
the filtered values are not used in summaries unless they match the resulting
semantics.
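
The one-sided filtering can be checked by applying the OR criteria, and for comparison the AND criteria of Section 15.3, to each row of the Cartesian product. The following Python sketch assumes the data of Figure 15.4; the helper name linked and the tuple layout are hypothetical.

```python
# Data from Figure 15.4.
managers = ["Mike", "Ralph", "Jim"]
departments = {  # Dept -> (DeptMgr, [Emp, ...])
    "DeptA": ("Mike", ["Ron", "Mary"]),
    "DeptB": ("Don", ["Jane", "Steve"]),
    "DeptC": ("Ralph", ["Mark", "John"]),
}
products = {"ProdX": "Jim", "ProdY": "Mike"}  # Prod -> ProdMgr

# Tuple layout: (Division, Dept, DeptMgr, Emp, Prod, ProdMgr)
div_view = [
    ("DivX", dept, dept_mgr, emp, prod, prod_mgr)
    for dept, (dept_mgr, emps) in departments.items()
    for emp in emps
    for prod, prod_mgr in products.items()
]

def linked(link):
    """Hypothetical helper: Depts and Prods that survive the link, per manager."""
    out = {}
    for mgr in managers:
        rows = [r for r in div_view if link(mgr, r)]
        out[mgr] = (sorted({r[1] for r in rows}), sorted({r[4] for r in rows}))
    return out

# ON DeptMgr=Mgr OR ProdMgr=Mgr
or_link = linked(lambda m, r: r[2] == m or r[5] == m)
# ON DeptMgr=Mgr AND ProdMgr=Mgr
and_link = linked(lambda m, r: r[2] == m and r[5] == m)

for mgr in managers:
    print(mgr, "OR:", or_link[mgr], "AND:", and_link[mgr])
```

Under OR, Mike keeps every Department and Product, Ralph keeps only DeptC with both Products, and Jim keeps every Department with only ProdX. Under AND, only Mike is linked, and only to DeptA and ProdY.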
A final word about multiple paths and sibling path semantics. The Divi-
sion view (DivView) in Figure 15.3 was used to demonstrate multiple path
semantics using the Department and Product tables. These semantics were first
described in Chapter 5, which documented how sibling leg semantics relied on
the “common parent” domain to determine and control the semantics. The
common parent of the Department and Product segments is the Division seg-
ment, which also happens to be the root segment of the Division structure.
Note that this is a coincidence—the root of a structure does not automatically
operate as a common parent. This means that semantics of multiple path lower
level references could become complex, with many different common parents

occurring at different locations in the structure. While the internal semantics of
multiple path lower level structure references may be complex and the results
may seem ambiguous, the result is logically and relationally sound, and can be
intuitive once the user is familiar with OR logic semantics.

15.6 Optimization Concerns for Nonroot Lower Level Linking


The optimizations specified in Chapter 11 can still be performed, but when
nonroot lower level linking is used, additional requirements need to be
imposed on a case-by-case basis based on hierarchical semantics. Top-down
optimization as described in Chapter 11 is limited. In the SQL query in
Figure 15.3, for example, the Division segment must be joined to the Depart-
ment and Product segments before it can be joined to the Manager segment.
This can also affect view optimization, described in Chapter 11. This optimiza-
tion can still be performed, but will have to be adapted to sometimes access link
criteria points even if they are not on a path requiring access. In the example in
Figure 15.7, the Department table is the only table containing selected data.
Normally, the Employee and Product tables would not require access since they
are not on a path to selected data. However, indirectly the Product table is on a
path to the required Department data, since the Division table relies on it to be
linked with the Manager table. Thus, removing it from access could change the
result.

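    SELECT Department FROM Manager
    LEFT JOIN DivView ON DeptMgr=Mgr OR ProdMgr=Mgr

[Diagram: in the original structure, Manager is linked (Link 1 and Link 2) to
the DivView view, in which Division sits over Department and Product, and
Department sits over Employee; Department is the only table selected. In the
resulting optimized structure, Manager sits over Division, which sits over
Department and Product; the Employee table is not needed.]

Figure 15.7 View optimization needs to adapt for nonroot-level linking.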
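
The adapted pruning rule can be sketched as follows. This is a hypothetical illustration, not the optimizer of any particular product: the parent map mirrors the structure in Figure 15.7, and the names path_to_root and tables_needed are assumptions made for the example.

```python
# Parent of each node in the combined structure of Figure 15.7.
parent = {
    "Manager": None,
    "Division": "Manager",
    "Department": "Division",
    "Product": "Division",
    "Employee": "Department",
}

def path_to_root(node):
    """Return the node followed by all of its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def tables_needed(selected, link_points=()):
    """Tables that must be accessed: every table on a path to selected data,
    plus every table on a path to a link criteria point."""
    needed = set()
    for node in list(selected) + list(link_points):
        needed.update(path_to_root(node))
    return needed

# Considering only the selected data would drop Product from access.
naive = tables_needed(["Department"])
# DeptMgr and ProdMgr are link criteria points, so their tables stay in.
adapted = tables_needed(["Department"], link_points=["Department", "Product"])

print(sorted(naive))
print(sorted(adapted))
```

With the link criteria points included, Product is retained and only Employee is removed from access, which matches the optimized structure in Figure 15.7.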



15.7 Using Lower Structure Linking with a View WHERE Clause

In Chapter 6 it was shown how structured subviews could contain WHERE
clauses to filter the data in their view. Because of the way WHERE clauses
operate on the entire structure, as explained in Chapter 7, using them with sub-
views presents problems, in particular, the filtering of higher level data based on
lower level data. This results in a nonhierarchical form of processing, logically
requiring bottom-up processing. For this reason, Chapter 6 suggested limit-
ing view WHERE clause processing to the root segment of the view. This
allowed the view to be filtered based on its root, while keeping the processing
standard.
View WHERE clause processing using lower level filtering criteria is
another form of advanced lower level structure linking as described in this
chapter. This chapter shows that it does form a valid hierarchical structure and
can be processed taking into consideration its special processing requirements.
Figure 15.8 demonstrates how this view WHERE clause processing with lower
level references results in the same processing requirements and filtering results

EmpView View (Employee over Dependent):

    DEFINE EmpView AS
    SELECT * FROM Employee
    LEFT JOIN Dependent
      ON EmpNo=DpndEmpNo
    WHERE DpndAge<19

Department View (Department over Employee; Employee over Dependent):

    SELECT * FROM Department
    LEFT JOIN EmpView
      ON DeptNo=DpndDeptNo

Expanded View:

    SELECT * FROM Department
    LEFT JOIN
      Employee LEFT JOIN Dependent
        ON EmpNo=DpndEmpNo
      ON DeptNo=DpndDeptNo
      AND DpndAge<19

Figure 15.8 Advanced lower structure view WHERE clause processing.



as described earlier in this chapter. The transformation of the WHERE clause
into an ON clause shown in this example was first shown in Chapter 6. In that
example, the WHERE clause transformation was limited to join criteria for the
root table or segment. With lower level linking, this limitation is relaxed to
include tables and segments from the root down to the lower level link segment.

15.8 Conclusion
Nonroot lower level structure linking is a powerful capability that extends the
outer join’s data modeling capability and hierarchical processing. While it does
not follow hierarchical processing rules precisely, it does generate hierarchical
structures with hierarchically correct semantics, and extends this hierarchical
data modeling ability automatically and naturally to relational and
nonrelational database processing.
16
Dynamic Structure Combining by
Joining, Mashups, and Association
So far in this book, the data modeling has consisted of building the hierarchical
structure one table or node at a time. In this chapter, we will look at joining
fully formed hierarchical structures into larger hierarchical structures that
contain the combined semantics of both structures. The query that joins the
hierarchical structures can also query the combined structure, producing results
that cannot be obtained by accessing each structure separately. We will look at
three ways to join hierarchical structures: the standard structure join method;
a newly discovered method that allows for advanced data structure mashups; and
an association table, used when the structures are not directly related. The
standard structure join method allows joining to any location in the upper
hierarchical structure, but requires joining to the root of the lower structure,
which is intuitively and semantically valid. The LEFT JOIN is used to
hierarchically preserve unmatched data on the left side of the join operation.
The LEFT JOIN ON clause join criteria are specified at each join point, giving
absolute hierarchical control.

16.1 Static Structure Join


Figure 16.1 is an example of the static joining of hierarchical data structures
DivisionView and DeptView and the resulting structure. This join is defined
in a view, making it a static operation that is previously defined. This makes


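[Diagram: the DivisionView structure (Division over Product; Product over
Feature and Project) and the DeptView structure (Department over Employee;
Employee over Dependent), with an arrow connecting the Division and Department
nodes. In the joined result, Division sits over both Product and Department.]

Figure 16.1 Static structure joining.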

static joins not easily modifiable. The join operation will materialize each struc-
ture and link them together by following the ON clause specification in the
same way this book has shown how hierarchical structures have been built. The
ON clause operation is represented in Figure 16.1 by the arrow connecting the
Division and Department node boxes. This join is invoked by the dynamic
SELECT query operation, directly following the previously defined static view
definition in this example. The materialized structure is freed up after the query
is processed. The SQL source provides the metadata that defines the structure.
This automatically combines structures by joining their metadata and defines
the structure so that the hierarchically combined structure can be automatically
navigated internally when processed. Like XML, the SQL source can directly
act as the hierarchical metadata defining the hierarchical data structure.
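
A static view join of this kind can be tried with Python's built-in sqlite3 module. This is a minimal sketch under several assumptions: SQLite spells the definition CREATE VIEW, it returns a flat rowset rather than a structured result, the views are cut down to two tables each, and the column names (DivNo, ProdDivNo, DeptDivNo, EmpDeptNo) and sample rows are invented for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Division   (DivNo TEXT);
CREATE TABLE Product    (ProdNo TEXT, ProdDivNo TEXT);
CREATE TABLE Department (DeptNo TEXT, DeptDivNo TEXT);
CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT);

INSERT INTO Division   VALUES ('Div1'), ('Div2');
INSERT INTO Product    VALUES ('Prod1', 'Div1');
INSERT INTO Department VALUES ('Dept1', 'Div1');
INSERT INTO Employee   VALUES ('Emp1', 'Dept1'), ('Emp2', 'Dept1');

-- Each structure is defined once, as a view built with LEFT JOINs.
CREATE VIEW DivisionView AS
  SELECT * FROM Division LEFT JOIN Product ON DivNo = ProdDivNo;
CREATE VIEW DeptView AS
  SELECT * FROM Department LEFT JOIN Employee ON DeptNo = EmpDeptNo;

-- The static join of the two structures is itself a stored view.
CREATE VIEW DivDeptView AS
  SELECT * FROM DivisionView LEFT JOIN DeptView ON DivNo = DeptDivNo;
""")

# The dynamic SELECT invokes the previously defined static view.
rows = con.execute(
    "SELECT DivNo, ProdNo, DeptNo, EmpNo FROM DivDeptView "
    "ORDER BY DivNo, EmpNo"
).fetchall()
for row in rows:
    print(row)
```

Div2 has no Product or Department, yet it is preserved in the result with NULLs, which is the LEFT JOIN behavior the structure join relies on.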

16.2 Dynamic Structure Join


Figure 16.2 is an example of the dynamic joining of two hierarchical data struc-
tures DivisionView and DeptView, and its resulting structure. Besides the static
joining, SQL also supports a dynamic language solution. This permits dynamic
processing without having to define the SQL operation beforehand, as in the
statically defined step shown in Figure 16.1. This allows the SELECT state-
ment in Figure 16.2 to be entered dynamically and interactively in an ad hoc

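[Diagram: the DivisionView structure (Division over Product; Product over
Feature and Project) and the DeptView structure (Department over Employee;
Employee over Dependent). In the joined result, Department sits under Product,
alongside Feature and Project.]

Figure 16.2 Dynamic structured joining.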

fashion, and can support Enterprise Information Integration (EII). In this
dynamic example, the SELECT statement has been specified differently than in
Figure 16.1. The ON clause was changed to link the Product node dynamically
to the Department node, producing a different structure and semantics. These
last two examples demonstrate how easy it is to create and use complex
multipath hierarchical structures. This is a powerful use of dynamic conceptual
data modeling. Dynamic EII and static ETL (Extract, Transform, and Load)
can be used in the same dynamic SQL SELECT statement.

16.3 Heterogeneous Join


With SQL now being able to support hierarchical processing, relational and
hierarchical integration and heterogeneous operations are now possible at a full
hierarchical processing level. The Division view in Figure 16.2 could be a hier-
archical XML structure where the boxes represent nodes and the Department
view is comprised of boxes representing relational tables. This allows seamless
hierarchical processing. The relational view is materialized by retrieving the
tables and joining them into a rowset, which is normal for relational processing.
On the other side of the processing, the physical XML structure is serialized
directly into a rowset without the need for joining, making it very efficient. The
two rowsets are then seamlessly joined into the final relational working set.
Once the physical structure is converted to a relational rowset, it can take
advantage of the more flexible relational processing.

16.4 Access Path Data Filtering


As described previously in this book, the local ON clause was introduced in the
SQL-92 standard to offer more control than the global WHERE clause for
specifying join and data filtering operations. The examples used in Figures 6.1
and 6.2 used the ON clause for specifying the path join criteria. The ON clause
can also perform path filtering by testing data values in addition to the join
criteria. For example, the following ON clause specifies the join criteria and
filters on employee salary: ON DeptNo=EmpDeptNo AND EmpSalary>50000.
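
The effect of placing the salary test in the ON clause can be seen with a small sqlite3 sketch. The table layouts and sample rows are assumptions made for the illustration; only the ON clause itself comes from the text above. The same test in a WHERE clause is shown for contrast.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Department (DeptNo TEXT);
CREATE TABLE Employee   (EmpNo TEXT, EmpDeptNo TEXT, EmpSalary INTEGER);

INSERT INTO Department VALUES ('D1'), ('D2');
INSERT INTO Employee   VALUES ('E1', 'D1', 60000),
                              ('E2', 'D1', 40000),
                              ('E3', 'D2', 30000);
""")

# Path filtering in the ON clause: only the Employee path is filtered.
rows_on = con.execute("""
    SELECT DeptNo, EmpNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo AND EmpSalary > 50000
    ORDER BY DeptNo
""").fetchall()

# The same test in the global WHERE clause filters the whole row.
rows_where = con.execute("""
    SELECT DeptNo, EmpNo
    FROM Department
    LEFT JOIN Employee ON DeptNo = EmpDeptNo
    WHERE EmpSalary > 50000
    ORDER BY DeptNo
""").fetchall()

print(rows_on)
print(rows_where)
```

With the test in the ON clause, department D2 is preserved even though none of its employees qualify. With the test in the WHERE clause, D2 disappears from the result.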

16.5 Natural View Nesting


A very natural operation when joining structured views is their natural
nesting, which keeps the views isolated even after they are expanded into a
single SQL metadata result. Figure 16.3 shows how the Division view and
Department view are each built one node at a time and then joined together
internally. The external SELECT statement dynamically expands the Division view
and Department view references at their relative locations in the metadata
string. This is represented by the two vertical boxes, which are sequentially
combined into a single data string, shown in Figure 16.4. This represents the
result metadata to be executed.
Notice in Figure 16.4 how the externally specified LEFT JOIN (in bold)
between these two views (Division and Department) is followed by another
LEFT JOIN without an intervening ON clause. This causes the immediately
following Department view to be fully materialized before being joined to the
Division node by the final ON clause, which combines the fully formed data
structures at the end of the metadata string. This natural internal nesting
gives each view its own separate working set, which keeps the views separate
during processing. This is useful for naturally supporting advanced data
structure modeling and processing.
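
The view expansion can be imitated with plain string substitution. This Python sketch is only an illustration: the view names DivisionView and DeptView, the outer SELECT statement, and the helper expand are assumptions, and the view text follows Figure 16.3 with the column names written consistently.

```python
import re

# FROM-clause text of each view, following Figure 16.3.
views = {
    "DivisionView": (
        "Division LEFT JOIN Product ON DivNo=ProdDivNo "
        "LEFT JOIN Feature ON ProdNo=FeatProdNo "
        "LEFT JOIN Project ON ProdNo=ProjProdNo"
    ),
    "DeptView": (
        "Department LEFT JOIN Employee ON DeptNo=EmpDeptNo "
        "LEFT JOIN Dependent ON EmpNo=DeptEmpNo"
    ),
}

def expand(query):
    """Hypothetical helper: replace each view reference with its definition,
    at the position where the reference appears."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, views)) + r")\b")
    return pattern.sub(lambda m: views[m.group(1)], query)

expanded = expand(
    "SELECT * FROM DivisionView LEFT JOIN DeptView ON DivNo=DeptDivNo"
)
print(expanded)
```

In the expanded string, the external LEFT JOIN is immediately followed by Department LEFT JOIN Employee, and the external ON clause ends up at the very end of the string. That is the nesting that lets the Department view be materialized as a unit before it is joined to the Division node.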

16.6 Simple Mashup


Hierarchical joins have previously had to be limited to always linking to the
root of the lower-level structure, which is very restrictive. Other hierarchical
database systems have allowed linking to a nonroot node, but have had to use
semantics that were not intuitive; each vendor supporting this capability sup-
ported it differently based on their own internal design. Looking just at the ON

Division View (Division over Products; Products over Feature and Project):

    Division
    LEFT JOIN Product
      ON DivNO=ProdDivNo
    LEFT JOIN Feature
      ON ProdNo=FeatProdNo
    LEFT JOIN Project
      ON ProdNo=ProjProdNo

Department View (Department over Employee; Employee over Dependent):

    Department
    LEFT JOIN Employee
      ON DeptNO=EmpDeptNo
    LEFT JOIN Dependent
      ON EmpNo=DeptEmpNo

Figure 16.3 Metadata combining.

    LEFT JOIN Product ON DivNO=ProdDivNo
    LEFT JOIN Feature ON ProdNo=FeatProdNo
    LEFT JOIN Project ON ProdNo=ProjProdNo

    LEFT JOIN Employee ON DeptNO=EmpDeptNo
    LEFT JOIN Dependent ON EmpNo=DeptEmpNo

Figure 16.4 Naturally nested metadata.



join criteria in Figure 16.5, the operation does not seem intuitively correct for
a hierarchical structure. On its face, it does not model a valid hierarchical
structure. The join points, when linking below the lower-level root in Figure
16.5, do not appear to define the resulting structure or any valid hierarchical
structure. In reality, the resulting structure and SQL semantics displayed are
valid as shown. This is a mashup because it eliminates joining restrictions,
allowing unlimited joining possibilities.
It was shown in Figure 16.3 that the lower-level structure was not sequentially
joined one node at a time to the upper structure that had already been built,
but was first totally materialized before being joined to the upper structure.
This allows two powerful capabilities. It allows lower logical structures to be
treated as physical structures that are fully materialized before being joined to
the upper structure. This internally makes lower-level linking feasible. This was
the first hurdle. The second hurdle was finding valid consistent semantics for
linking below the root. It turns out that a materialized hierarchical structure has
its own fully formed hierarchical semantics that are controlled from its root
downward. Linking below the root (as shown in Figure 16.5) does have a hier-
archical filtering effect on the result, which makes sense semantically and is
desirable. This will cause unmatched data node occurrences below the
employee node to be filtered out, and it can also filter out higher-level nodes
above Employee if all employees are filtered out. This will result in no matches
for the lower-level structure.
Since the lower structure is fully formed with its own hierarchical seman-
tics, the lower-level structure can be linked anywhere below the root. The

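[Diagram: the Division structure (Division over Product; Product over Feature
and Project) and the Department structure (Department over Employee; Employee
over Dependent), and the combined result, in which the upper structure is
linked to the root (Department) of the lower structure.]

Figure 16.5 Simple data mashup.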



resulting semantics always enforce the linking from the upper level structure to
the root of the lower level structure, as shown in Figure 16.5. This preserves the
semantics of the lower level structure.

16.7 Complex Mashup


It was explained in Section 16.6 that linking below the root of the lower
structure is possible. A more powerful mashup capability is also possible,
using the same capabilities described in Section 16.6. This complex mashup
capability allows the lower level structure to be joined at multiple locations
using compound (complex) AND/OR join criteria. This allows for extremely
powerful and flexible join criteria that filter the lower structure with precise
control. In Figure 16.6, the compound join criteria will qualify nodes if there
are matching project managers or matching feature managers. The complexity of
the processing is increased because the filtering is applied differently,
depending on which join

[Diagram: the Department structure (Department over Employee; Employee over
Dependent) joined over the Division structure (Division over Product; Product
over Feature and Project).]

Figure 16.6 Complex data mashup.



criteria is matched. When only the Project node is matched, the Project node is
filtered while the Feature node remains unfiltered. The reverse is true if only
the Feature node is matched. If both nodes qualify, filtering is performed on
both sides together. If the join criteria used an AND condition instead of an OR
condition, then both sides would need to qualify at the same time in order to
find a match. This demonstrates very complex and flexible control for mashups.
This is the same logic as accessing the lower structure directly with a
multipath query. Both cases should be and are processed the same way, which
makes this logic correct.

16.8 Combining Structures with Association Tables


When join data relationships are not available, a relational association table
can be maintained to join tables. This requires a table with the matching keys.
A simple association table can hold the hierarchical tables together seamlessly,
as shown in Figure 16.7. This simple association table can represent one-to-one
and one-to-many relationships. The association table named EmpProj in Figure
16.7 is represented by a dashed box in the result to show that it is no longer
needed at this point; it has served its purpose and can be removed from the
external structure.
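
A simple association table join can be sketched with sqlite3 as follows. The EmpProj table and its EmpKey and ProjKey columns follow Figure 16.7; the cut-down Employee and Project tables and the sample rows are assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee (EmpNo TEXT);
CREATE TABLE Project  (ProjNo TEXT);
-- The association table holds only the matching keys.
CREATE TABLE EmpProj  (EmpKey TEXT, ProjKey TEXT);

INSERT INTO Employee VALUES ('E1'), ('E2'), ('E3');
INSERT INTO Project  VALUES ('P1'), ('P2');
INSERT INTO EmpProj  VALUES ('E1', 'P1'), ('E1', 'P2'), ('E2', 'P1');
""")

# EmpProj sits between the two structures. It is joined through but not
# selected, so it does not appear in the external result.
rows = con.execute("""
    SELECT EmpNo, ProjNo
    FROM Employee
    LEFT JOIN EmpProj ON EmpNo = EmpKey
    LEFT JOIN Project ON ProjKey = ProjNo
    ORDER BY EmpNo, ProjNo
""").fetchall()
for row in rows:
    print(row)
```

Employee E3 has no entry in the association table and is still preserved, with no Project data below it.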

16.9 More Complex Association Table Usage


More complex association tables can be many-to-many tables that represent
one-to-many relationships in each direction. These association tables can also
contain intersecting data: data that is specifically meaningful at the
intersection point. For example, Part and Supplier represent a well-known
many-to-many relationship. Each Part can have many Suppliers, and each Supplier
can have many Parts. This association represents a one-to-many relationship in
each direction. A good example of intersecting data for Parts and Suppliers is
price. The price for each Part/Supplier combination can be unique, and the
intersection data can hold this value. Such a many-to-many association table and
its intersection data are shown in Figure 16.8, along with the one-way tables
derived from it. When intersecting data is used in a structure, the data is
represented in the association table EmpProj between the two adjoining
structures, as shown in Figure 16.9.
Figure 16.9 demonstrates how easy it is to reverse the association table
usage in order to derive a different structure where the division view sits over

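[Diagram: the Department structure (Department over Employee; Employee over
Dependent) and the Division structure (Division over Product; Product over
Feature and Project), joined through the EmpProj association table, which holds
the columns EmpKey and ProjKey and appears in the result between the two
structures.]

Figure 16.7 Association table usage.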

Association table        EmpProj               ProjEmp

EmpKey  ProjKey  Data    EmpKey  ProjKeys      ProjKey  EmpKeys
E1      P1       Val1    E1      P1, P2        P1       E1, E3
E1      P2       Val2    E2      P3, P4        P2       E1, E4
E2      P3       Val3    E3      P1            P3       E2, E5
E2      P4       Val4    E4      P2            P4       E2, E6
E3      P1       Val5    E5      P3
E4      P2       Val6    E6      P4
E5      P3       Val7
E6      P4       Val8

Figure 16.8 A many-to-many association table and its one-to-many usages.

the Department view by reversing the input and output. This new structure
will make the intersecting data available, as shown in the EmpProj node. The
association table will have to be updated as needed. This can be done by an
automated procedure. The EmpProj and ProjEmp represent the two
one-to-many relationships derived from the many-to-many association table in
Figure 16.8.
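
The automated derivation of the two one-to-many usages can be sketched in Python. The rows are the association table of Figure 16.8; the variable names emp_proj, proj_emp, and intersect are chosen for this illustration.

```python
from collections import defaultdict

# Many-to-many association table from Figure 16.8:
# (EmpKey, ProjKey, intersection data)
assoc = [
    ("E1", "P1", "Val1"), ("E1", "P2", "Val2"),
    ("E2", "P3", "Val3"), ("E2", "P4", "Val4"),
    ("E3", "P1", "Val5"), ("E4", "P2", "Val6"),
    ("E5", "P3", "Val7"), ("E6", "P4", "Val8"),
]

emp_proj = defaultdict(list)  # one-to-many: Employee -> Projects
proj_emp = defaultdict(list)  # one-to-many: Project -> Employees
intersect = {}                # data meaningful only for the pair

for emp, proj, value in assoc:
    emp_proj[emp].append(proj)
    proj_emp[proj].append(emp)
    intersect[(emp, proj)] = value

print(dict(emp_proj))
print(dict(proj_emp))
print(intersect[("E3", "P1")])
```

Either derived table can serve as the link between the two structures, depending on which structure sits on top, and the intersection data remains addressable by the key pair in both directions.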

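[Diagram: the Division structure (Division over Product; Product over Feature
and Project) placed over the Department structure (Department over Employee;
Employee over Dependent), joined through the EmpProj association table and its
data, with columns EmpKey, ProjKey, and Intersect Data.]

Figure 16.9 Reversing the association.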

16.10 Conclusion
This chapter has shown how full multipath hierarchical structures can be easily
and powerfully combined hierarchically into larger structures that both
enhance and strengthen the embedded semantics, and can be dynamically pro-
cessed in queries. This can be a heterogeneous combination of powerful logical
and physical hierarchical structures. This combining of structures can be per-
formed using standard structure joining, new powerful mashups, and the asso-
ciation tables that are used when matching relationships are not available.
Hierarchical query filtering, which was necessary to understand the lower level
structure data filtering described in this chapter, was also touched upon.
17
Dynamically Increasing Data Value and
Flexibility
Beyond their ability to organize data, the full power of hierarchical data struc-
tures is not realized today. They have the inherent ability to significantly
increase data value beyond the value of the data collected. Hierarchical struc-
tures naturally and automatically capture more meaning than is stored in the
data, increasing the value of the data they store. This chapter will present and
explain why this is so, and will show ways in which hierarchical structures can
be used to significantly increase data value. When this is understood, hierarchi-
cal structures should be used more than they are today in order to take advan-
tage of their incredible amount of unused data value and its flexible utilization.

17.1 Data Structure Modeling of Single-Path Structures


Hierarchical structures are easily built and they expand naturally over time.
Most hierarchical structures start out small in design with a single pathway. As
the path grows downwards in design and data, the associated data above it
grows more valuable. This is because this added information gives the
upper-level path data more meaning due to its tight association and hierarchical
relationship with it. The example in Figure 17.1 uses the Department
view shown in Chapter 16 to demonstrate this.


Add-> Emp

Add-> Dpnd

Figure 17.1 Structured modeling vertical growth.

17.1.1 Structure Modeling Vertical Growth


In the example in Figure 17.1, the Department view shown is expanded one
node at a time. As each node type is added, the Department node can take on
more information. When the Employee node is added, the Department node
can now know its employees. When the Dependent node is added, the Depart-
ment node can now know its Employees and their Dependents, and the
Employee node can know its Dependents. This information is of greater value
than the separate data because the hierarchical structure relates the data
items to each other, increasing their meaning and data value. The data is
sequentially processed by following its hierarchical structure.

17.1.2 Structure Modeling Depth Growth


While Figure 17.1 has shown the vertical growth of the Department hierarchi-
cal structure, there is also the depth of the structure that is needed to track the
multiple occurrences for the data in each separate node type. Notice in Figure
17.2 how the multiple children Dependents for Employees are separately kept

Dept1 -------> Dept2
  Emp1
    Dpnd1
    Dpnd2
  Emp2
    Dpnd3
    Dpnd4

Figure 17.2 Structured modeling depth growth.



track of by the hierarchical structure. In this case, Employees 1 and 2 have
two different sets of Dependents. This multiple data occurrence shared across
the parent utilizes multiple data sharing. This ability to keep track of
multiple hierarchical sets of data objects is not only useful, but further
increases the data value by separating and containing the data. Because the
operation is automatic, the data continues to gain value without additional
effort.

17.2 Data Structure Modeling of Multiple-Path Processing


If single paths are useful, the capability to control and manipulate
multiple paths under a common parent is many times more useful and powerful.
Multiple paths require more complex internal hierarchical processing. In Figure
17.3, the selected Feature (Feat) and Project (Proj) nodes from the DivView
view are on different hierarchical paths of the structure, related under the
same ancestor node, Prod. With multipath hierarchical structures, every node is
related to every other node through their lowest common ancestor (LCA) node
data occurrence. In the example provided in Figure 17.3, the LCA is the Product
(Prod) node, which controls the range.
This LCA in Figure 17.3 controls the range of processing between the
nodes to be processed. The lower the LCA node is, the tighter the range of
control between the related nodes. In Figure 17.4, all combinations of the
Feat and Proj data nodes under their common Prod data occurrence (Prod1 and
Prod2) are tested. Feat1 and Proj2 are related and can be processed together.
Feat1 and Proj3 are not related and should not be processed together. Feat1 and
Feat2 have the same parent occurrence and can be processed together, while
Feat1 and Feat3 have different parent occurrences and must be processed
separately. This can be seen in Figure 17.4. This complex

Div

Prod <-- LCA

Feat Proj

Figure 17.3 LCA control range.



Div1

Prod 1 Prod 2

Feat1 Proj1 Feat3 Proj3


Feat2 Proj2 Feat4 Proj4

Figure 17.4 Processing based on common node.

multipath hierarchical structure processing greatly increases the value of data
and is naturally performed by SQL processing.
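
The LCA test on the data occurrences of Figure 17.4 can be sketched in Python. The parent map is read from the figure; the function names ancestors, lca, and related_under_prod are hypothetical and are used only for this illustration.

```python
# Parent of each data occurrence in Figure 17.4.
parent = {
    "Prod1": "Div1", "Prod2": "Div1",
    "Feat1": "Prod1", "Feat2": "Prod1", "Proj1": "Prod1", "Proj2": "Prod1",
    "Feat3": "Prod2", "Feat4": "Prod2", "Proj3": "Prod2", "Proj4": "Prod2",
}

def ancestors(node):
    """Return the node followed by its ancestors, up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def lca(a, b):
    """Lowest common ancestor occurrence of two data occurrences."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)

def related_under_prod(a, b):
    """True when the two occurrences share the same Prod occurrence."""
    return lca(a, b).startswith("Prod")

for pair in [("Feat1", "Proj2"), ("Feat1", "Proj3"),
             ("Feat1", "Feat2"), ("Feat1", "Feat3")]:
    print(pair, lca(*pair), related_under_prod(*pair))
```

Feat1 and Proj2 meet at Prod1 and can be processed together, while Feat1 and Proj3 only meet at Div1, above the Prod level that controls the range.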

17.3 Static Data Joining of Structures


The static joining of full multipath structures preserves the semantics of both
contributing structures when they are combined into a single larger structure,
as shown in Figure 17.5. In this example, the Division and Department view
structures are combined by linking at their Division and Department nodes. This
greatly magnifies their total data value. The combined view is defined
statically and can then be invoked and materialized when needed. The view
definition is shown directly below:

DEFINE VIEW DivDeptView AS
SELECT * FROM DivisionView LEFT JOIN DeptView ON DivNo=DeptDivNo

Now the static view DivDeptView is invoked:

SELECT * FROM DivDeptView

This produces the structure shown in Figure 17.5.

Combining the Division and Department views in Figure 17.5 was very easy,
and it increased the resulting structure’s data value well beyond the total
data value of each view used separately. Many new queries can be specified
that were not possible before, and these queries are made significantly more
powerful by combining the semantics of both structures. It is easy to see
why this seamless combined structure offers many times the capability of its
component structures taken separately.

Division
Product Department
Feature Project Employee
Dependent

Figure 17.5 Combined structure model.

17.4 Dynamic Data Joining of Structures


Static view combining, as shown in Figure 17.5, while powerful, can be very
restrictive. Dynamic structure combining is just as easy to use and is more
flexible than static combining. Most importantly, dynamic structure joining
offers additional increases in data value because it can be used simply and
without limit. This can be seen in the following simple ad hoc SQL statement
that dynamically combines two hierarchical structures:

SELECT * FROM DivisionView LEFT JOIN DeptView ON DivNo=DeptDivNo

The structure produced by the above dynamic SELECT join statement is the
same as the one shown in Figure 17.5. It can be seen how easily these
dynamic views can be specified, modified, and created at any time without
first having to define them in a static view.
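The difference between static and dynamic combining can be tried out on an ordinary relational engine. The sketch below uses Python's sqlite3 module and is an illustration under stated assumptions: SQLite spells the view statement CREATE VIEW rather than DEFINE VIEW, and two small tables named Division and Dept (with made-up rows) stand in for the DivisionView and DeptView structures.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical stand-ins for DivisionView and DeptView.
    CREATE TABLE Division (DivNo INTEGER PRIMARY KEY);
    CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DeptDivNo INTEGER);
    INSERT INTO Division VALUES (1), (2);
    INSERT INTO Dept VALUES (10, 1), (20, 1);

    -- Static combining: the join is stored once as a named view.
    CREATE VIEW DivDeptView AS
        SELECT DivNo, DeptNo
        FROM Division LEFT JOIN Dept ON DivNo = DeptDivNo;
""")

static_rows = con.execute(
    "SELECT * FROM DivDeptView ORDER BY DivNo, DeptNo").fetchall()

# Dynamic combining: the same join written ad hoc, with no stored view.
dynamic_rows = con.execute("""
    SELECT DivNo, DeptNo
    FROM Division LEFT JOIN Dept ON DivNo = DeptDivNo
    ORDER BY DivNo, DeptNo
""").fetchall()

print(static_rows)
print(dynamic_rows)
```

Both queries return the same rows, including the division that has no department, which the LEFT JOIN preserves with a null DeptNo.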

17.5 Logical Data Structure Advantage


Logical methods for accessing data, unlike physical access methods, provide
data independence. The goal of data independence is for data users to be
able to access data without having to know its physical structure or how to
navigate the paths to the data. Relational structures are logical structures
that are created or modified when needed. XML and IBM’s IMS are examples of
physical access methods that operate on fixed, physical data structures.
Those data structures are inflexible, whereas logical structures (like
relational ones) are flexible, efficient, and dynamic, and they deliver the
benefit of data independence. However, fixed structures effectively become
logical structures when they are retrieved by SQL and stored as relational
data values. This can be supported by the introduction of physical views.
The difference between physical views and logical views is that the ON
clause join condition for fixed structures equates node types instead of
fields in nodes. This is because fixed hierarchical structures are either
contiguous, like XML, or already linked internally, like IBM’s IMS. In
effect, they are already joined, so only the linking node types are
necessary to define the physical view. This is shown in the XML view below:

DEFINE XML VIEW DivisionView AS
SELECT * FROM Division
LEFT JOIN Product ON Division=Product
LEFT JOIN Feature ON Product=Feature
LEFT JOIN Project ON Product=Project

This view, when processed, converts the physical structure into a number of
hierarchically related tables that are defined as a relational structure
using a series of LEFT outer joins. This physical data support automatically
enables the seamless heterogeneous processing of a mixture of fixed and
logical structures because, when they are processed, both are logical
relational structures defined by SQL’s LEFT outer join operator. The
hierarchical LEFT outer join will seamlessly process them as a single
virtual hierarchical structure.

17.6 Multipath Data Qualification


The query in Figure 17.6 selects data from one path of the data structure
based on data from another path of the structure. Internally, the semantic
connection between the Feature and Project tables in the query is made
automatically. It utilizes the LCA, as in Figure 17.3, because both the Feat
and Proj nodes are referenced in the query. This is a very powerful SQL
query that does not require the user to know the data structure. The ability
to process any query without knowing the structure dynamically increases the
value of the data being processed because the data can be used more often
and more correctly.

Div

Prod <------LCA

Feat Proj

Figure 17.6 LCA node selection.

17.7 Dynamic Path Data Filtering


Having data control the dynamic generation of data structures is another
operation that extends the usefulness of the data and increases its value.
The SQL-92 outer join operation introduced the ON clause to replace the
WHERE clause for specifying the join criteria. The WHERE clause operates
globally across the entire structure, while the ON clause operates locally
on the specific join path. This allows path data filtering to be easily
specified along with the joining criteria.
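The local effect of the ON clause versus the global effect of the WHERE clause can be seen with the same filter placed in each position. The following sqlite3 sketch is illustrative only; the Prod and Proj tables, their columns, and their rows are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical two-level structure: Prod is the parent of Proj.
    CREATE TABLE Prod (ProdNo INTEGER PRIMARY KEY);
    CREATE TABLE Proj (ProjNo INTEGER PRIMARY KEY, ProjProdNo INTEGER);
    INSERT INTO Prod VALUES (1), (2);
    INSERT INTO Proj VALUES (10, 1), (20, 1), (30, 2), (40, 2);
""")

# Filter in the ON clause: it applies only to the Proj path, so every Prod
# occurrence is preserved even when none of its Proj rows qualify.
on_rows = con.execute("""
    SELECT ProdNo, ProjNo
    FROM Prod LEFT JOIN Proj ON ProdNo = ProjProdNo AND ProjNo = 30
    ORDER BY ProdNo
""").fetchall()

# Filter in the WHERE clause: it applies to the whole joined result, so Prod
# occurrences without a qualifying Proj row are removed as well.
where_rows = con.execute("""
    SELECT ProdNo, ProjNo
    FROM Prod LEFT JOIN Proj ON ProdNo = ProjProdNo
    WHERE ProjNo = 30
    ORDER BY ProdNo
""").fetchall()

print(on_rows)
print(where_rows)
```

With the filter in the ON clause, Prod 1 survives with a null ProjNo; with the filter in the WHERE clause, Prod 1 disappears from the result entirely.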

17.8 Miscellaneous Operations that Increase the Data Value


There are also many other SQL operations that can indirectly increase the
value of data by making it easier to access, control precisely, and
transform, or by making it more accurate. These SQL operations are discussed
below. They are usually overlooked, but taken together, their combined
increase in data value can be quite impressive.

17.8.1 Structure-Aware Processing


Structure-aware processing is supported by automatically analyzing the
hierarchical outer join statement that models the hierarchical structure of
the data being processed. This analysis dynamically supplies the structure
metadata, which can be used to support navigationless hierarchical
processing, also known as schema-free processing. This means that the user
does not need to navigate the structure being processed, which makes
processing automatic and easily available to the user. This makes the data
more readily available to nontechnical users, and therefore more valuable.

17.8.2 Hierarchical Optimization


Knowledge of the hierarchical structure can also be used to perform powerful
hierarchical optimizations. These optimizations are also utilized in SQL
hierarchical views. This optimization automatically turns these hierarchical
views into global views because any smaller query can be used without
incurring overhead from unneeded paths in the global view. This allows the
global view to be used instead of a more specific view without incurring any
overhead. The user therefore does not have to be aware of all the specific
views, which makes views easier and more productive to use and derives more
data value.

17.8.3 Increase of Data Accuracy and Correctness


SQL databases are a proven technology based on sound principles. New releases
of SQL platforms typically undergo thorough regression testing in order to
ensure that queries produce correct answers. SQL hierarchical processing natu-
rally follows SQL principles, producing the correct result. Correct results
increase the value of the data because SQL hierarchical processing can be
trusted for critical and important applications. After all, if the data cannot be
trusted, it has no value.

17.8.4 Interactive Data Access


Being able to immediately and interactively search and experiment to refine a
query in SQL is extremely useful. Coupled with hierarchical processing, this
enables incredibly powerful results that continually increase the value of the
data. There are numerous software packages that make use of SQL in order to
provide powerful data exploration and data visualization capabilities.

17.8.5 Automatic Data Aggregation


SQL’s SELECT operation allows selected data to be tightly aggregated by slic-
ing out unwanted data between desired data. This makes the data result more
condensed and closely related. This removing of possibly confusing data
enhances and refines the meaning of the desired data. All of these operations
give the data result more importance and value.

17.9 Conclusion
This chapter has described how hierarchical data structures can be very
powerful and useful for storing data and for processing it in ways that
continually increase the value of the stored data. The stored, predefined
static data increases in use and information value as the view structure is
expanded through data modeling. The data view structures can then be
dynamically combined, greatly increasing the data value. Finally, the query
controls how the data value is further increased in order to answer the
query. This automatic multilevel data reuse and dynamic combining of
hierarchical structures makes them a very useful data storage structure that
automatically leverages its increasing data value. This automatic increase
of data value in hierarchical structures and hierarchical queries is an
incredibly powerful, overlooked, and underutilized capability.
18
Automatic Multipath Hierarchical
Structure Operations
SQL hierarchical processing became possible with the introduction of the
SQL-92 standard through its new one-sided LEFT (Outer) JOIN operation and
the accompanying ON clause. Together these supplied the necessary
hierarchical principles, data preservation operations, and specific node
linking control. What was unexpected was that this new operation also
enables automatic multipath hierarchical processing, along with the special
lowest common ancestor (LCA) processing that it requires. Both are discussed
in this chapter.

This chapter introduces and demonstrates a number of multipath hierarchical
structure operations that are new for SQL. Many are made possible by SQL’s
inherent hierarchical processing. Some are new for hierarchical processing
in general. Also shown are some advanced new hierarchical processing
operations that are made possible by automatic hierarchical optimization and
that can be introduced seamlessly into an SQL hierarchical processor. While
some are not inherent SQL operations, they can be easily added to SQL and
automatically performed in SQL. The capabilities described include focused
aggregated data retrieval, multipath LCA hierarchical processing,
schema-free processing, and hierarchical data filtering.


18.1 Structure-Aware Processing


This description of multipath hierarchical structure operations begins with
the basic hierarchical operation necessary for supporting automatic natural
hierarchical processing in SQL. This is structure-aware processing, which
operates automatically with no previously supplied information other than
the standard input SQL. This is possible because the supplied standard SQL,
like XML, actually contains the definition of the hierarchical structure to
be processed and the semantics needed to automatically perform the powerful
hierarchical processing. An example is shown in Figure 18.1, where the LEFT
JOIN operation syntax, when parsed, defines the structure. A LEFT JOIN B
preserves A over B; this means that A can exist without B, but B cannot
exist without A. The LEFT JOIN defines the structure hierarchy, while the ON
clause defines the pathways’ link points, which can be seen in the derived
processing structure in Figure 18.1.

In this example, the hierarchical structure has been defined depth first,
but it could just as easily have been defined breadth first or in any
combination, because a LEFT JOIN parsing algorithm is used to derive the
structure. In addition, the LEFT JOIN operation inherently carries within it
the hierarchical semantics that automatically control the SQL processing of
hierarchically defined structures. This technique can also work for physical
heterogeneous hierarchical structures by utilizing the structure information
to take apart fixed hierarchical structures and turn them into extremely
flexible, logical, interconnected hierarchical structures that support full
data independence and its flexibility.

DEFINE VIEW ABC AS
SELECT * FROM A
LEFT JOIN B ON A.Key=B.FKey
LEFT JOIN C ON B.Key=C.FKey
LEFT JOIN D ON B.Key=D.FKey
LEFT JOIN E ON A.Key=E.FKey
LEFT JOIN F ON E.Key=F.FKey
LEFT JOIN G ON E.Key=G.FKey

Figure 18.1 Automatically determining the structure.
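Deriving the structure from the join syntax can be sketched with a small parser. This is a minimal illustration, assuming every ON condition has the simple Node.Column=Node.Column form used in Figure 18.1; the regular expression and the derive_structure function are hypothetical and are not taken from any actual SQL hierarchical processor.

```python
import re

ABC_VIEW = """
SELECT * FROM A
LEFT JOIN B ON A.Key=B.FKey
LEFT JOIN C ON B.Key=C.FKey
LEFT JOIN D ON B.Key=D.FKey
LEFT JOIN E ON A.Key=E.FKey
LEFT JOIN F ON E.Key=F.FKey
LEFT JOIN G ON E.Key=G.FKey
"""

# Captures the joined node and the two node names in its ON condition.
JOIN = re.compile(
    r"LEFT\s+JOIN\s+(\w+)\s+ON\s+(\w+)\.\w+\s*=\s*(\w+)\.\w+", re.IGNORECASE)


def derive_structure(sql):
    """Map each joined node to its parent, using the ON clause link points."""
    root = re.search(r"FROM\s+(\w+)", sql, re.IGNORECASE).group(1)
    parent = {root: None}
    for child, left, right in JOIN.findall(sql):
        # The side of the ON condition that is not the new node is its parent.
        other = right if left == child else left
        if other not in parent:
            raise ValueError(f"{child} links to undefined node {other}")
        parent[child] = other
    return parent


print(derive_structure(ABC_VIEW))
```

Because each node is attached through its ON clause rather than its position in the statement, the joins could be listed depth first, breadth first, or mixed, as long as a parent is defined before its children.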



18.2 Hierarchical Optimization


Hierarchical structure processing of both fixed and logical structures
supports a powerful advanced hierarchical optimization that operates
consistently across logical and physical heterogeneous data. Hierarchical
structures are based on top-to-bottom data preservation, where parent nodes
can exist without children, but children cannot exist without a parent. This
hierarchical one-way data preservation can be supported precisely using the
LEFT JOIN operation. This one-way data preservation is what gives
hierarchical structures significant semantic processing power. It also
enables advanced hierarchical optimization because these hierarchical
semantics mean that only required pathways are accessed. A pathway in a
structure that is not needed is not accessed unless it is necessary for
navigation. It also means that partial pathways can be accessed. The most
important aspect of this hierarchical optimization is that it produces the
same result semantics as those produced without the optimization applied. A
side benefit of this hierarchical optimization is that it reduces the number
of replicated data values generated by relational data explosions, because
unneeded data never takes part in them. When dealing with hierarchical
structures, this is definitely a benefit and can increase the accuracy of
the hierarchical result.
SQL’s powerful SELECT list operator naturally controls hierarchical
optimization by precisely specifying the data nodes to be returned.
Unselected nodes are represented by a dotted box, as shown in Figure 18.2.
If one or more data items are selected from a data node, the node is
accessed. Unselected data nodes that are on the path to selected data nodes
also need to be accessed in order to achieve proper data navigation.
However, as soon as no more nodes on a path are needed, no further access
for the path is necessary. These unaccessed nodes are shown connected by a
dotted line in Figure 18.2. This is determined dynamically before the first
data access because the structure-aware processing has determined the fully
defined structure to be accessed. Query optimization applies the knowledge
of the selected data to scale this down to only the necessary data access
for each dynamic query. This allows the dynamic SELECT list to vary greatly.
Figure 18.2 demonstrates this optimization applied to the ABC view defined
in Figure 18.1 to retrieve nodes A, D, and E. Node B is an unselected node
that is needed for navigation and access to the selected node beneath it.
This dynamically supports significant optimization applied automatically in
SQL, which allows it to be a seamless enabler for other powerful new
capabilities.

Figure 18.2 Optimized structure.
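The access decision described above can be sketched as a walk from each selected node up to the root. This is a simplified illustration, not the optimizer itself: the PARENT map is a hand-written copy of the structure defined by the ABC view in Figure 18.1, and the function name is an assumption.

```python
# Hypothetical parent map for the ABC view of Figure 18.1.
PARENT = {"A": None, "B": "A", "C": "B", "D": "B",
          "E": "A", "F": "E", "G": "E"}


def nodes_to_access(parent, selected):
    """Selected nodes plus the unselected ancestors needed for navigation."""
    needed = set()
    for node in selected:
        # Climb toward the root, stopping early at an already-needed node.
        while node is not None and node not in needed:
            needed.add(node)
            node = parent[node]
    return needed


selected = {"A", "D", "E"}
accessed = nodes_to_access(PARENT, selected)
navigation_only = accessed - selected   # accessed but not returned
skipped = set(PARENT) - accessed        # never accessed at all

print(sorted(accessed), sorted(navigation_only), sorted(skipped))
```

For the selection A, D, and E, node B is accessed only for navigation, while C, F, and G are never touched.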

18.3 Focused Aggregated Data Retrieval


The powerful SELECT list optimization shown in Figure 18.2 only scratches
the surface of the SELECT list’s capabilities. The SELECT list is free to
select any grouping of nodes, even if no full path is specified to all of
the nodes referenced. This is a very powerful and flexible capability. It is
known as focused retrieval, and it is extremely useful and easy to use
because the structure does not have to be known. The selected unattached
data items are retrieved and aggregated together. This is a known and highly
desirable operation, shown in Figure 18.3. The dashed boxes represent
unselected nodes that must be accessed in order to navigate to all selected
nodes. These unselected nodes are squeezed out after retrieval. This is
called Node Promotion and Node Collection, which implies that multiple nodes
are advanced upward. In relational processing, this is known as Projection.
Once nodes are retrieved into storage, squeezing them into a contiguous
structure makes the entire structure easily accessible because it has been
internally mapped to its aggregated structure, as shown in Figure 18.3.

Figure 18.3 Node promotion and collection.
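Node promotion can be sketched as re-parenting each selected node to its nearest selected ancestor. The example below is an illustration under stated assumptions: it reuses the structure of the ABC view from Figure 18.1 and assumes, purely for the example, that nodes A, C, F, and G are selected while B and E are the unselected nodes squeezed out. The promote function is hypothetical.

```python
# Hypothetical parent map for the ABC view of Figure 18.1.
PARENT = {"A": None, "B": "A", "C": "B", "D": "B",
          "E": "A", "F": "E", "G": "E"}


def promote(parent, selected):
    """Re-parent each selected node to its nearest selected ancestor."""
    promoted = {}
    for node in selected:
        ancestor = parent[node]
        # Skip over unselected ancestors; they are squeezed out.
        while ancestor is not None and ancestor not in selected:
            ancestor = parent[ancestor]
        promoted[node] = ancestor
    return promoted


result = promote(PARENT, {"A", "C", "F", "G"})
print(sorted(result.items()))
```

In the resulting structure C, F, and G all sit directly under A, which is the collected, contiguous form of the selected nodes.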



18.4 Multipath Hierarchical Processing


It is very important to support multipath query processing automatically and
transparently because it enables complex semantic processing, which
dynamically increases the value of the data. Query users are not required to
know the data structure in order to specify and execute a query. This makes
it possible for a nontechnical person to specify a query and greatly
increases the value of the database. This capability requires more
complicated internal processing when the query references and processes
multiple paths of the structure. There are two difficulty levels of
processing multipath queries. The first level is a simple multipath query
that just selects data items with no data filtering. The second level
consists of queries that include a WHERE clause. These powerful operations
allow data nodes to be filtered across different pathways. These query
operations are very complex and extremely powerful in their processing.

There are two ways to look at the WHERE clause operation. It can be viewed
as a data filtering operation, but it can also be considered a data
qualification operation, depending on your point of view. Filtering removes
data, whereas qualifying retains data. The results are the same. This is an
extremely powerful operation that dynamically filters data from query to
query, which increases the query’s data value in an unlimited number of
ways.

18.4.1 LCA Processing


Hierarchical data processing was commonplace before the advent of the
relational model for data. Hierarchical query languages could also
dynamically query hierarchical structures. Queries used WHERE clauses to
selectively choose desired data, which could require processing jointly
across pathways. This advanced processing required special processing known
as lowest (or least) common ancestor processing, which is performed to
ensure meaningful hierarchical results. There are two cases in which LCA
processing is necessary; their processing is very similar but does differ.
For this reason, they are referred to here as type 1 and type 2 LCAs. In
hierarchical processors, this occurs internally, with the LCA location
determination and processing logic built in. With relational hierarchical
processing as described in this book, LCA processing was not built into the
relational processor because its use was not possible in relational
databases until the SQL-92 standard.

Hierarchical processing was made possible by the use of the hierarchically
oriented LEFT Outer JOIN operation, which was introduced in the SQL-92
standard. This processing was not a planned relational feature, for the
reasons just mentioned. More amazingly, the required LCA processing was
discovered to occur naturally in the relational Cartesian product. LCA
processing requires the processing of all combinations of data on different
paths; the relational Cartesian product does this naturally. The Cartesian
product handles LCA processing because the LEFT JOIN hierarchical data
modeling is processed directly by the Cartesian product, which processes all
combinations automatically. The different automatic LCA operation usages are
described in the following sections.

18.4.2 LCA Type 1 Internal Processing


LCA type 1 processing occurs when a selected data item is qualified for
selection by a combination of a WHERE clause and SELECT list. This
processing allows the selection of a data item from one path of the
structure based on a data value in a different path of the structure, as
shown in the example in Figure 18.4. This LCA processing is determined by
the selected data item FeatNo and the qualified data value being tested in
ProjNo. The LCA node is the lowest common ancestor node of the Feat and Proj
nodes, as shown. In other structures, this could be any higher-level
ancestor node. This produces a separate range of data paths for each active
LCA data occurrence, as shown in Figure 18.4. LCA type 1 processing searches
the Proj nodes in the desired LCA range for a data match (ProjNo=30). Any
single match will qualify all Feat nodes under the current LCA Prod node
data occurrence. This is correct LCA processing for hierarchical
relationships across pathways. The result, FeatNo values 31 and 41, is shown
in Figure 18.4.

The same LCA operation that hierarchical databases perform is performed
automatically and naturally by the relational SQL Cartesian product effect,
which creates all data combinations. This can be seen by searching for
ProjNo=30 in the Cartesian product and pulling out the matching FeatNo
values. This operation can be performed on the Cartesian product produced
from the example data, which is shown in Figure 18.4. This is how the LCA
operation is performed automatically and naturally in SQL.
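The Cartesian product effect can be reproduced on any relational engine. The sqlite3 sketch below loads the data occurrences of Figure 18.4 into four tables and runs the type 1 query. The table names, the foreign key column names (ProdDivNo, FeatProdNo, ProjProdNo), and the JOINED helper string are assumptions made for this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical tables holding the Figure 18.4 data occurrences.
    CREATE TABLE Division (DivNo INTEGER);
    CREATE TABLE Prod (ProdNo INTEGER, ProdDivNo INTEGER);
    CREATE TABLE Feat (FeatNo INTEGER, FeatProdNo INTEGER);
    CREATE TABLE Proj (ProjNo INTEGER, ProjProdNo INTEGER);
    INSERT INTO Division VALUES (1);
    INSERT INTO Prod VALUES (1, 1), (2, 1);
    INSERT INTO Feat VALUES (11, 1), (21, 1), (31, 2), (41, 2);
    INSERT INTO Proj VALUES (10, 1), (20, 1), (30, 2), (40, 2);
""")

# The hierarchical structure, modeled with LEFT JOINs as in the text.
JOINED = """
    FROM Division
    LEFT JOIN Prod ON DivNo = ProdDivNo
    LEFT JOIN Feat ON ProdNo = FeatProdNo
    LEFT JOIN Proj ON ProdNo = ProjProdNo
"""

# Feat and Proj are sibling paths, so each Prod yields every combination.
cartesian = con.execute(
    "SELECT DivNo, ProdNo, FeatNo, ProjNo" + JOINED +
    "ORDER BY ProdNo, ProjNo, FeatNo").fetchall()

# LCA type 1: select from the Feat path, qualify on the Proj path.
feats = con.execute(
    "SELECT DISTINCT FeatNo" + JOINED +
    "WHERE ProjNo = 30 ORDER BY FeatNo").fetchall()

for row in cartesian:
    print(row)
print(feats)
```

The eight rows printed first correspond to the Cartesian product listed in Figure 18.4, and the final query returns the two Feat occurrences under the Prod occurrence that owns ProjNo=30.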

18.4.3 LCA Type 2 Internal Processing


LCA type 2 processing occurs with a WHERE clause referencing multiple
pathways. This usually happens with a compound WHERE clause, such as the one
shown in Figure 18.5. The LCA node is the lowest common ancestor node of the
Feat and Proj nodes specified in the WHERE clause. This produces a range of
data occurrences for each active LCA data node. LCA type 2 processing
creates all the combinations of Feat and Proj nodes under the current LCA
Prod node data occurrence because of the Cartesian product effect previously
described, which creates all combinations. Any single match of both
FeatNo=31 AND ProjNo=40 qualifies the selected data item ProdNo, here for
data occurrence 2. This is performed automatically and naturally in SQL. The
Cartesian product in Figure 18.4 contains a single row that matches
FeatNo=31 AND ProjNo=40 for LCA Prod occurrence 2, which is also selected
for output as ProdNo.

Result: 31, 41

Div 1
----> Prod 1 Prod 2 <----
Feat 11 Proj 10 Feat 31 Proj 30
Feat 21 Proj 20 Feat 41 Proj 40

DivNo=1 ProdNo=1 FeatNo=11 ProjNo=10
DivNo=1 ProdNo=1 FeatNo=21 ProjNo=10
DivNo=1 ProdNo=1 FeatNo=11 ProjNo=20
DivNo=1 ProdNo=1 FeatNo=21 ProjNo=20
DivNo=1 ProdNo=2 FeatNo=31 ProjNo=30
DivNo=1 ProdNo=2 FeatNo=41 ProjNo=30
DivNo=1 ProdNo=2 FeatNo=31 ProjNo=40
DivNo=1 ProdNo=2 FeatNo=41 ProjNo=40

Figure 18.4 LCA type 1 internal processing.

Div 1
----> Prod 1 Prod 2 <----
Feat 11 Proj 10 Feat 31 Proj 30
Feat 21 Proj 20 Feat 41 Proj 40

Figure 18.5 LCA type 2 internal processing.

18.4.4 LCA Type 2 Variable OR Processing


Variable OR processing for LCA processing is a very tricky operation that
has often been overlooked in the past, producing incorrect results for
hierarchical processing of multipath queries. This LCA type 2 processing
occurs with a WHERE clause that references multiple pathways using an OR
operation instead of the more common AND operation, as shown in the example
in Figure 18.6. The LCA node is still the lowest common ancestor node of the
Feat and Proj nodes specified in the WHERE clause, and it still ranges over
combinations of Feat and Proj nodes. The difference between OR condition
testing and AND condition testing is that, with an OR condition, either side
of the test can independently qualify the test. This can cause a variable
LCA operation that is still semantically valid and is still produced
automatically and naturally in SQL.
The Cartesian product result set shown in Figure 18.4 can be used to check
the results of the Figure 18.6 example. In this example, the WHERE clause
uses FeatNo=11 OR ProjNo=40 to search the Cartesian product for matches. The
first match, FeatNo=11, qualified ProjNo=10 and ProjNo=20 as well as
FeatNo=11 itself. In the next LCA data occurrence, the second match,
ProjNo=40, qualified FeatNo=31 and FeatNo=41 as well as ProjNo=40 itself.
This is shown in the result in Figure 18.6. The variable operation can be
seen in the way OR processing can select data from the left argument or the
right argument, which select data in opposite directions. If both sides of
the test are true, then both sides are fully selected. This LCA processing
is performed automatically in SQL. The variable processing can be seen in
the result.

Result (ProdNo, FeatNo, ProjNo):
1  11  10
       20
2  31  40
   41

Div 1
----> Prod 1 Prod 2 <----
Feat 11 Proj 10 Feat 31 Proj 30
Feat 21 Proj 20 Feat 41 Proj 40

Figure 18.6 LCA type 2 internal OR processing.
In the example in Figure 18.6, a WHERE clause specifying FeatNo=11 OR
ProjNo=40 does not match both sides within the same hierarchical data
occurrence because the two values are in different LCA data occurrences,
each with its own potential combinations. A WHERE clause specifying
FeatNo=11 OR ProjNo=20 matches both sides of the OR operation within one LCA
data occurrence. This has the effect of selecting all data from both sides.
This means that FeatNo=21 is also qualified, because the ProjNo=20 being
tested is matched in the WHERE clause. It also means that both sides of the
OR operation must be tested, because both sides can contribute to the
qualified data. This example can be worked out using the Cartesian product
result set in Figure 18.4. These are the same semantics used in hierarchical
processors that contain LCA processing logic; they are confirmed here by SQL
automatically producing the same results.
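The AND and OR cases can both be worked out by filtering the Cartesian product and then regrouping the surviving values by LCA occurrence. The sketch below does this in plain Python. It is an illustration only: the PRODS data is a hand-written copy of the Figure 18.4 occurrences, and the qualify function is a hypothetical helper, not the processor's actual logic.

```python
from itertools import product

# Hypothetical copy of the Figure 18.4 data: ProdNo -> (FeatNos, ProjNos).
PRODS = {1: ([11, 21], [10, 20]), 2: ([31, 41], [30, 40])}

# Cartesian product of the Feat and Proj paths within each Prod occurrence.
ROWS = [(prod, feat, proj)
        for prod, (feats, projs) in PRODS.items()
        for feat, proj in product(feats, projs)]


def qualify(predicate):
    """Group the values that pass the WHERE test by LCA (Prod) occurrence."""
    result = {}
    for prod, feat, proj in ROWS:
        if predicate(feat, proj):
            feats, projs = result.setdefault(prod, (set(), set()))
            feats.add(feat)
            projs.add(proj)
    return result


# WHERE FeatNo=31 AND ProjNo=40: one row matches, in Prod occurrence 2.
and_result = qualify(lambda feat, proj: feat == 31 and proj == 40)

# WHERE FeatNo=11 OR ProjNo=40: each side matches in a different occurrence.
or_result = qualify(lambda feat, proj: feat == 11 or proj == 40)

# WHERE FeatNo=11 OR ProjNo=20: both sides match in the same occurrence.
same_lca = qualify(lambda feat, proj: feat == 11 or proj == 20)

print(and_result, or_result, same_lca, sep="\n")
```

In the last case FeatNo=21 is returned even though it is not named in the WHERE clause, because the row pairing it with ProjNo=20 satisfies the right side of the OR.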

18.4.5 Multiple LCA Type 1 Processing


Figure 18.7 is similar to the example in Figure 18.4, except that it also
selects EmpNo for output based on the same WHERE clause. This has the
interesting and important side effect of using two different LCAs, which are
noted in the figure. The first LCA, the Prod node, is created by the
referenced ProjNo and FeatNo, as in the example in Figure 18.4. However,
when EmpNo is added for output in the SELECT list, the referenced nodes are
FeatNo and EmpNo, thus creating another LCA. The LCA node used to access
EmpNo is Div, as shown. A different set of data combinations is used for
each LCA. This is performed automatically and naturally in SQL, as shown in
Figure 18.7.

Div <-----
-----> Prod Emp
Feat Proj

Figure 18.7 Multiple LCA type 1.

18.4.6 Combining Processing of LCA Types 1 and 2


Figure 18.8 shows an example in which LCA type 1 and LCA type 2 are both
used in a single query. This happens because the selected data item EmpNo is
on a different path than the Prod node, which is the LCA associated with the
WHERE clause, and this makes the Div node the type 1 LCA. The type 2 LCA
from the compound WHERE clause is used as input to the type 1 LCA. This
happens naturally in SQL.

Div <----
----> Prod Emp
Feat Proj

Figure 18.8 Combine LCA type 1 and 2.

18.5 Nonlinear Ordering


In a natural multipath hierarchical processing query language, nonlinear
data ordering should be automatic. The query operation naturally applies the
request to the hierarchical structure so that ordering is applied
independently to the different hierarchical paths, as shown in Figure 18.9.
The same SQL ORDER BY syntax is used, but the semantics have been adapted in
order to be meaningful for multipath hierarchical structures. The ordering
occurs from top to bottom. Figure 18.9 shows the results of reordering the
data from Figure 18.4. The order in which the data items are listed in the
ORDER BY clause has no effect. This is because the hierarchical structure is
rigid and the ordering is applied within the hierarchical structure. Using
the ordering to change the structure cannot be allowed because this would
unknowingly change the defined hierarchical data structure and affect the
result, causing errors. For example, ordering node Prod before node Div
would place it hierarchically above node Div. This should be treated as an
error because the operation would inadvertently and incorrectly change the
structure being processed, and the SQL hierarchical processor would be
operating incorrectly if it proceeded. The processor will not let the query
proceed.

ORDER BY ProjNo DESC, ProdNo DESC, FeatNo DESC

Div
Prod 2 Prod 1
Feat 41 Proj 40 Feat 21 Proj 20
Feat 31 Proj 30 Feat 11 Proj 10

Figure 18.9 Nonlinear data ordering.
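Nonlinear ordering can be sketched by sorting each level and each sibling path on its own. The example below is a minimal illustration, assuming the Figure 18.4 data is held as nested Python structures; the DIV layout and the order_desc function are hypothetical.

```python
# Hypothetical copy of the Figure 18.4 data as nested occurrences.
DIV = {
    "DivNo": 1,
    "Prod": [
        {"ProdNo": 1, "Feat": [11, 21], "Proj": [10, 20]},
        {"ProdNo": 2, "Feat": [31, 41], "Proj": [30, 40]},
    ],
}


def order_desc(div):
    """Sort each level and each sibling path independently, descending."""
    prods = sorted(div["Prod"], key=lambda p: p["ProdNo"], reverse=True)
    return {
        "DivNo": div["DivNo"],
        "Prod": [
            {
                "ProdNo": p["ProdNo"],
                # Feat and Proj are separate paths, so each is sorted alone.
                "Feat": sorted(p["Feat"], reverse=True),
                "Proj": sorted(p["Proj"], reverse=True),
            }
            for p in prods
        ],
    }


ordered = order_desc(DIV)
for prod in ordered["Prod"]:
    print(prod)
```

The sort keys never cross levels or paths, so the listing order of ProjNo, ProdNo, and FeatNo in the ORDER BY clause cannot change the shape of the structure.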

18.6 Global Views and Schema-Free Processing


Hierarchical processing with structure-aware processing allows all queries
and views to be optimized. All views become global views in which each query
is optimized so that there is no overhead for using a view larger than
necessary. This avoids the need for many specific queries and makes global
queries very easy to use with no overhead. It also means that the user does
not need to know the data structure, which allows schema-free processing.
Dynamic views are also optimized, and these can be heterogeneous views
composed of logical and physical virtual structures.

18.7 Global Queries and Hierarchical Data Filtering


Global hierarchical queries that select all or large portions of the data
structure have been difficult to support. However, it is often useful to
select the entire structure with a data search argument. Another problem
with performing a global query is that it involves sophisticated
hierarchical data filtering based on the data search argument. This is
because, with hierarchical searches, every node in the structure is related
to every other node in the structure. This global node relationship is
demonstrated in Figure 18.10, where the WHERE clause global filtering starts
at node E and branches off in both the up and down directions. All nodes can
still be affected, as shown. In this example, other pathways are related and
can be filtered. The A node can be affected, and the filtering will be
reflected down the path through the B node to the C and D nodes. If A node
data occurrences are filtered out, so are all lower-level node data
occurrences under them. This allows the entire hierarchical structure to be
easily filtered. The WHERE clause can be as complex as needed; there are no
limitations or restrictions.

Figure 18.10 Global hierarchical filtering.

18.8 Automatic Hierarchical Parallel Processing


Hierarchical processing lends itself naturally to parallel processing because
hierarchical pathways are independent of each other, which allows them to operate
in parallel. Because SQL can operate in a structure-aware manner, parallel
pathways can be detected at run time and processed in parallel automatically,
without user intervention. The parallel processing is performed on the
dynamically optimized structure.
Figure 18.11 demonstrates this parallel processing. Pathways B, C, D and
E, F can be run in parallel, and pathways G, H and I, J can be run in parallel
along with pathway B, C, and D. Parallel processing usually requires tasks like
synchronization to be carried out automatically by the parallel processing (PP)
controller. This synchronization is performed at the parent data occurrences for
node A and node F. The PP controller can perform further lower-level parallel
processing, such as performing parallel processing at the data occurrence level
using threads. In this way, while the paths are being parallel processed, the indi-
vidual path data occurrences in each of the paths can be processed in parallel.
The preprocessor also lends itself to a parallel implementation. The
postprocessor can run in parallel with the preprocessor and the main query
processor. The parallel processed query can also be automatically

Figure 18.11 Determine parallel processing.

and seamlessly distributed. Distributed SQL processing is also supported; it will
automatically be performed hierarchically at the distributed site because the
LEFT JOIN hierarchical processing control can be transparently transferred to
the remote site where it will automatically operate hierarchically.

18.9 Conclusion
This chapter has described a number of very powerful processing capabilities
that inherently support, or extend support for, advanced multipath processing.
These capabilities include structure-aware processing, focused aggregated data
retrieval, multipath LCA hierarchical processing, schema-free processing,
nonlinear hierarchical ordering, and hierarchical data filtering.
This chapter has shown how natural LCA processing in SQL keeps hierarchical
multipath processing operating accurately in both hierarchical and relational
processing. LCA is an area that is not covered outside of academic research.
Today’s academic research involves supporting LCA using external functions
because it has not yet been recognized that LCA processing exists naturally in
SQL, as explained in this chapter. The internal LCA complexity issues shown
demonstrate why external LCA functions will not work: they cannot be applied
seamlessly and require the user to know the structure. This chapter has
shown how the inherent LCA processes naturally available in SQL solve these LCA

processing problems. This chapter also demonstrated how accuracy is
automatically achieved and maintained.
This chapter has shown how hierarchical structures are extremely useful
and powerful, which is why they should be used more in business and everyday
processing. If they become widely used, their ability to be processed in
parallel automatically could be a huge benefit.
19
Variable Data Structure Generation
This book contains five different ways, each with different uses, to dynamically
generate hierarchical data structures in SQL. The first three are the basic data
modeling of building-block structures, the joining of structures to create new,
more powerful structures, and the use of the SELECT operation to dynamically
specify the desired data to be aggregated, which condenses and thereby changes
the structure. These previously described data structure operations are under the
control of the user. This chapter will demonstrate a fourth, more powerful
concept: having the data drive the variable data structure generation. A fifth
way to generate data structures (described later in Chapter 20) uses the current
semantics to transform a structure into any new data structure required.

19.1 Variable Data Structure Generation Is a Powerful Concept
Variable data structure generation offers unlimited possibilities for creating
data structures automatically, because the structure that is created varies with
the data contents. The basic data modeling of the data structures is usually
performed statically ahead of time, but the joining of these structures cannot
always be predicted, so the value of the data can be used as a trigger. This is a
very powerful concept that opens the door to many uses, as the examples in this
chapter will demonstrate. The controlling data values may exist naturally in the
data, or they can be added for the express purpose of controlling how the data
structure is generated. This can be done in a number of ways, as will be
shown, and can be considered a way to program dynamic structure generation.


19.2 Linking Below the Root Increases Structure Joining


To increase the opportunity to perform structure joining, linking below the
root is utilized. Figure 19.1 is used to review this capability (originally
described in Chapter 15), which allows linking directly below the root of the
lower-level structure by linking past node X to node L. This is possible because
the lower-level structure has already been materialized due to the automatic
nesting of the lower-level structure’s SQL, as shown on the left side of Figure
19.1. The nesting is caused by the LEFT JOIN on top of the lower view box, which
delays its matching ON clause to the bottom of the box. As a result, the
lower-level structure is already fully formed when it is linked, and it
represents an independent fixed structure with its complete semantics already
established. The joined data modeling representation shows the correct combined
hierarchical semantics, with the derived structure linking to the root of the
lower structure. The actual lower-level ON clause link point still affects the
filtering of the lower structure. This automatically controlled ability to link
to a lower-level structure below its root is an extremely powerful capability.
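
A minimal sketch of linking below the root in standard SQL, assuming the Rdb view consists of R over B and the Xml view consists of X over L. The table layouts, the column names (bkey, fkey, and so on), and the sample rows are hypothetical and are not taken from Figure 19.1. The parenthesized join stands in for the nested lower view: it is formed first, and the outer ON clause then links to node L rather than to the root X. The syntax assumes an engine that accepts parenthesized joins, such as PostgreSQL.

```sql
-- Upper structure (assumed): R over B.  Lower structure (assumed): X over L.
CREATE TABLE R (rid INTEGER PRIMARY KEY, r INTEGER);
CREATE TABLE B (bid INTEGER PRIMARY KEY, rid INTEGER, bkey INTEGER);
CREATE TABLE X (xid INTEGER PRIMARY KEY, x TEXT);
CREATE TABLE L (lid INTEGER PRIMARY KEY, xid INTEGER, fkey INTEGER, l INTEGER);

INSERT INTO R VALUES (1, 2);
INSERT INTO B VALUES (1, 1, 100);
INSERT INTO X VALUES (1, 'x1'), (2, 'x2');
INSERT INTO L VALUES (1, 1, 100, 2), (2, 2, 999, 7);

-- The lower structure is built inside the parentheses before the outer
-- ON clause runs, so the link can name L (below the root X).
SELECT R.r, B.bkey, X.x, L.l
FROM R
LEFT JOIN B ON B.rid = R.rid
LEFT JOIN (X LEFT JOIN L ON L.xid = X.xid)
       ON L.fkey = B.bkey;
```

With this sample data, only the X occurrence whose L child matches B is attached. The x2 occurrence does not appear, which illustrates how the lower-level link point still filters the lower structure.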

19.3 Looking Backward and Forward


The ON clause of a join operation can include a data value test that affects the
join. The data value is usually located in one of the two nodes referenced by
the join condition, but it can also be located further up the path toward
the root node, or even further down the path in the lower structure being
Figure 19.1 Linking below the root works.



joined, where it acts as a look-ahead operation. This works for the same reason
that linking below the root in Figure 19.1 works: the lower-level structure has
already been materialized, making its data available for testing even before the
join is performed.

19.3.1 Looking Backward


The SQL in Figure 19.2 joins the Rdb view over the Xml view using the ON
clause join condition B.key=X.fkey, which joins the two structures between the
B and X nodes. The ON clause also includes an ANDed test condition, R.r=2, that
must be true in order to complete the joining of the views. Otherwise, only the
top Rdb view is the final result, because it is always preserved by the LEFT
join operation. It is important to note that the R.r test value is not located
in either of the nodes being joined, but in a node that is further up the path.
The entire path up to the root is accessible because it has already been
accessed. This allows control over how the structure is dynamically generated
based on the data in the database.
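
The look-backward test can be sketched in plain SQL as follows. The table layouts and sample rows are hypothetical, and the text's B.key is written as bkey here because KEY is a reserved word in some SQL dialects. Only the ON clause condition (the join on the key plus the ANDed R.r = 2 test) mirrors the condition described above.

```sql
CREATE TABLE R (rid INTEGER PRIMARY KEY, r INTEGER);
CREATE TABLE B (bid INTEGER PRIMARY KEY, rid INTEGER, bkey INTEGER);
CREATE TABLE X (xid INTEGER PRIMARY KEY, fkey INTEGER, x TEXT);

INSERT INTO R VALUES (1, 2), (2, 5);
INSERT INTO B VALUES (1, 1, 100), (2, 2, 200);
INSERT INTO X VALUES (1, 100, 'x-for-100'), (2, 200, 'x-for-200');

-- R.r sits above both joined nodes (B and X), yet it can be tested
-- in the ON clause because the upper path is already available.
SELECT R.r, B.bkey, X.x
FROM R
LEFT JOIN B ON B.rid = R.rid
LEFT JOIN X ON X.fkey = B.bkey AND R.r = 2
ORDER BY R.rid;
```

The root occurrence with r = 2 receives its X data. For the occurrence with r = 5 the X column is NULL: the upper structure is preserved by the LEFT JOIN, but the lower structure is not attached.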

19.3.2 Looking Forward


The SQL in Figure 19.3 is very similar to the previous example, except that the
additional test value L.l=2 on the ON clause is located further down the path
into the lower structure that is being joined. This can be considered a
read-ahead operation because this test can be performed before the structure is
returned for joining or even before it is qualified. This capability was explained

Figure 19.2 Variable structure control by looking backward.



Figure 19.3 Variable structure control by looking forward.

in Section 19.2, when linking below the root was explained. The added test uses
the value in the data being tested to control whether or not the join is
performed, and therefore which structure is generated. The possible
structure results are shown in Figure 19.3. The effect of this data value test
below the root of the lower-level structure is the same as separately querying the
lower structure. The location of the immediately affected node tested (node L
in this case) makes a difference in how this data qualification performs as it
spreads out to affect the entire structure. This can be seen in Figure 18.10.
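
A hedged sketch of the look-forward test in standard SQL, assuming a lower structure of X over L joined under node B. All table layouts, column names, and rows are hypothetical. The parenthesized join builds the lower structure first, which is what makes L.l available to the outer ON clause.

```sql
CREATE TABLE B (bkey INTEGER PRIMARY KEY);
CREATE TABLE X (xid INTEGER PRIMARY KEY, fkey INTEGER, x TEXT);
CREATE TABLE L (lid INTEGER PRIMARY KEY, xid INTEGER, l INTEGER);

INSERT INTO B VALUES (100), (200);
INSERT INTO X VALUES (1, 100, 'x1'), (2, 200, 'x2');
INSERT INTO L VALUES (1, 1, 2), (2, 2, 7);

-- L.l lies below the link point X, so testing it here looks ahead
-- into the lower structure before that structure is attached.
SELECT B.bkey, X.x, L.l
FROM B
LEFT JOIN (X LEFT JOIN L ON L.xid = X.xid)
       ON X.fkey = B.bkey AND L.l = 2
ORDER BY B.bkey;
```

With this data the lower structure is attached under bkey 100 (where L.l is 2) and is absent (NULL columns) under bkey 200 (where L.l is 7).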

19.4 Advanced Variable Structure Control


The SQL in Figure 19.4 is similar to the previous examples in Figures 19.2
and 19.3, except for its additional testing on the ON clause, which makes it
more complex internally, but not for the user. In this example, either of two
data tests on different lower-level paths, at nodes M and L, can qualify the test
and allow the lower-level structure generation. This works because an OR test is
performed. The OR test must be evaluated first, as a unit; because AND normally
has higher precedence than OR in SQL, the OR test needs to be enclosed in
parentheses. This isolates the result for the data variables so that the desired
result is correct. An AND operation could also be specified in this situation,
which would require both sides of the test to qualify in order to get the
lower-level structure generation.
The OR operation is a tricky and intelligent operation because, depend-
ing on which side of the OR operation is true, it can change the result for hier-
archical operations to the appropriate result. For this reason, both sides of the

Figure 19.4 Advanced variable structure control.

OR operation must always be tested and applied together. L.l=2 produces one
set of results, whereas M.m=4 produces a different set of results. These are
automatically combined in SQL to give the correct result. Replacing the OR
with an AND operation will usually produce a different result.
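
The OR test can be sketched in standard SQL as follows, assuming a lower structure in which X has two child nodes, L and M. The tables, columns, and rows are hypothetical. Note the parentheses around the OR test: in SQL, AND binds more tightly than OR, so without them the condition would be read as (X.fkey = B.bkey AND L.l = 2) OR M.m = 4 and would attach qualifying M rows to every B row.

```sql
CREATE TABLE B (bkey INTEGER PRIMARY KEY);
CREATE TABLE X (xid INTEGER PRIMARY KEY, fkey INTEGER);
CREATE TABLE L (xid INTEGER, l INTEGER);
CREATE TABLE M (xid INTEGER, m INTEGER);

INSERT INTO B VALUES (100), (200), (300);
INSERT INTO X VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO L VALUES (1, 2), (2, 9), (3, 9);
INSERT INTO M VALUES (1, 0), (2, 4), (3, 0);

-- Either lower-level test (on path L or on path M) qualifies the join.
SELECT B.bkey, X.xid, L.l, M.m
FROM B
LEFT JOIN (X LEFT JOIN L ON L.xid = X.xid
             LEFT JOIN M ON M.xid = X.xid)
       ON X.fkey = B.bkey AND (L.l = 2 OR M.m = 4)
ORDER BY B.bkey;
```

Here bkey 100 qualifies through L.l = 2, bkey 200 qualifies through M.m = 4, and bkey 300 qualifies through neither, so its lower-structure columns are NULL.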

19.5 Flexible Multiple Generation Choices


In the next set of data structure generation examples, a third view named Swp is
used so that more complex examples consisting of three views can be shown. These
examples allow dynamic choices to be made among different structure additions.
The new Swp view has the same basic structure as the other views used. However,
this is not a requirement; the view being joined to expand the structure can
have any structure.

19.5.1 One or the Other Variable Test


In the SQL statement in Figure 19.5, two separate tests are performed on
the same upper-level data value, so that either exactly one of the tests
qualifies or neither does. In this example, one of three possible results
dynamically occurs, depending on the value of the data being tested. The view
Xml can be qualified, or the view Swp can be qualified, using the same data type
and value R.r<>2, or neither is qualified. This is shown in Figure 19.5.
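
A simplified sketch of a one-or-the-other test in plain SQL. Each lower view is reduced to a single table (X for Xml, S for Swp), and the test values R.r = 2 and R.r = 3 are assumptions chosen for illustration; they are not the tests used in Figure 19.5. All table and column names are hypothetical.

```sql
CREATE TABLE R (rid INTEGER PRIMARY KEY, r INTEGER);
CREATE TABLE X (fkey INTEGER, x TEXT);   -- stands in for the Xml view
CREATE TABLE S (fkey INTEGER, s TEXT);   -- stands in for the Swp view

INSERT INTO R VALUES (1, 2), (2, 3), (3, 7);
INSERT INTO X VALUES (1, 'x1'), (2, 'x2'), (3, 'x3');
INSERT INTO S VALUES (1, 's1'), (2, 's2'), (3, 's3');

-- Both joins test the same upper-level value, so at most one qualifies.
SELECT R.rid, R.r, X.x, S.s
FROM R
LEFT JOIN X ON X.fkey = R.rid AND R.r = 2
LEFT JOIN S ON S.fkey = R.rid AND R.r = 3
ORDER BY R.rid;
```

The three root occurrences show the three possible outcomes: X attached (r = 2), S attached (r = 3), and neither attached (r = 7).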

Figure 19.5 One or the other variable structure.

19.5.2 Multiple Independent Tests


The example in Figure 19.6 is similar to the previous multiple test example in
Figure 19.5. In this example, two different data locations, P.p and L.l, are
tested separately, each in its own lower-level structure. This allows not only
one or the other view to be qualified, but also both views at the same time, as
shown in Figure 19.6. Test data values carried in the lower data structures
(as in Figure 19.6) are also more flexible than those carried in an upper
structure (as in Figure 19.5). This is

Figure 19.6 Left, right, or both variable structures.



because it does not require as much planning ahead. It can be implemented as
close to the structure as needed.
The order in which the views are joined in the SQL controls the order in which
they appear in the result structure, as shown in Figure 19.6. In this example,
the Xml structure is located before the Swp structure in a normal left-to-right
order. Because they are at the same level under a common parent, they have the
same execution priority. Their positioning is only important for hierarchical
structure navigation.

19.6 Nested and Embedded Variable Structure Creation


The examples in this section demonstrate how additional structures can be
continually generated downward. This allows unlimited, dynamic vertical growth
that is performed automatically and controlled by the data in the database.

19.6.1 Nested Variable Structure Test


The SQL in Figure 19.7 dynamically links view Xml to view Rdb based on the
L.l=2 data test. If this test fails, then the following joins for this path (Swp)
will also fail. If this test is true, then the combined view structure Rdb/Xml is
formed, allowing view Swp to be linked to Xml if its qualification test W.w=3
passes. This process grows the structure downward. This capability, combined
with multiple structure view choices (as shown in Figures 19.5 and 19.6), allows
unlimited dynamic vertical growth with structure changes controlled by the data
in the structure.
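
A simplified sketch of nested variable structure growth in plain SQL. To keep it short, each view is flattened into one table, so the tested values l and w are carried directly on X and S rather than in separate child nodes L and W as in the text. The table layouts, key columns, and rows are hypothetical.

```sql
CREATE TABLE B (bkey INTEGER PRIMARY KEY);
CREATE TABLE X (xkey INTEGER PRIMARY KEY, fkey INTEGER, l INTEGER);
CREATE TABLE S (skey INTEGER PRIMARY KEY, fkey INTEGER, w INTEGER);

INSERT INTO B VALUES (100), (200), (300);
INSERT INTO X VALUES (1, 100, 2), (2, 200, 2), (3, 300, 9);
INSERT INTO S VALUES (1, 1, 3), (2, 2, 8), (3, 3, 3);

-- The second join hangs off the first: if X is not attached, X.xkey is
-- NULL and the S join cannot match, so the whole path below fails.
SELECT B.bkey, X.xkey, S.skey
FROM B
LEFT JOIN X ON X.fkey = B.bkey AND X.l = 2
LEFT JOIN S ON S.fkey = X.xkey AND S.w = 3
ORDER BY B.bkey;
```

With this data, bkey 100 grows two levels down (X and S attached), bkey 200 grows one level (X attached, S fails its w = 3 test), and bkey 300 does not grow at all, even though its S row would have passed the w = 3 test on its own.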

19.6.2 Embedded Variable Structure Test


The nested structure in Figure 19.7 can also be created by embedding the nest-
ing logic in views, as shown in Figure 19.8. This will externally hide the nesting
process. In order to be flexible and still hide the nesting process, each view can
perform one view level of nesting, which can then be followed by another view
with one level of view nesting (as shown in Figure 19.8). This allows complex
variable view construction to be more easily performed and maintained.

Figure 19.7 Nested variable structure control.

19.7 Variable Structure Generation Along Multiple Paths


So far, we have seen how a single pathway has been dynamically expanded ver-
tically downward. However, this does not mean that only a single pathway can
be expanded downward. All pathways can be expanded downward separately
and independently, as shown in the SQL in Figure 19.9. The node D pathway
and the node B pathway are independent of each other and can grow
independently. In addition to growing downward, these pathways can branch out
at the same time. This branching growth pattern is also shown in Figure 19.9.

19.8 Variable Structure Range Filtering


All of the previous examples have matched on an exact value. Multiple matches
with separate values can be connected by the OR operation, a range can be
specified using greater-than and less-than tests, and the AND operation can be
used to combine tests for a more refined result. This is shown in Figure 19.10.
The data value range L.l>18 AND L.l<55 specified in the SQL is no longer a
binary test as previously shown. It is now more of a data filter; however,
it operates the same way. It enhances and opens up the hierarchical searching and
filtering possible.
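
A minimal sketch of range filtering in the ON clause, using the range test from the text (L.l > 18 AND L.l < 55) against a hypothetical pair of tables. Only the range condition comes from the text; the table layouts and rows are invented.

```sql
CREATE TABLE B (bkey INTEGER PRIMARY KEY);
CREATE TABLE L (fkey INTEGER, l INTEGER);

INSERT INTO B VALUES (100), (200);
INSERT INTO L VALUES (100, 10), (100, 30), (100, 54), (200, 60);

-- The range test acts as a filter on which L occurrences are attached;
-- B occurrences are preserved either way by the LEFT JOIN.
SELECT B.bkey, L.l
FROM B
LEFT JOIN L ON L.fkey = B.bkey AND L.l > 18 AND L.l < 55
ORDER BY B.bkey, L.l;
```

For bkey 100 only the L values 30 and 54 are attached (10 falls outside the range). For bkey 200 no L value qualifies, so the row is returned with a NULL in the L column.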

Figure 19.8 Embedded variable structure control.

Figure 19.9 Multiple path variable structure control.



Figure 19.10 Variable structure range filtering.

19.9 Why Variable Structures Work with Hierarchical Data


Hierarchical structures have traditionally been fixed; a structure had to be
fixed in order to be processed. If this is so, how can variable structures be
processed if they can change dynamically? The answer is that hierarchical
structures can be variable and fixed at the same time. They are fixed because
they are fully defined in the structure, and they are variable because their
automatic growth is controlled and stays within the defined structure.
Additionally, logical hierarchical structures are now possible because fixed
structures can be retrieved into relational rowsets where they become logical,
and SQL’s hierarchical processing allows them to be processed hierarchically.

19.10 Conclusion
This chapter has shown how to use and control variable data structure
generation. All the other methods of data structure generation shown in this
book are directly driven by the user. The data-driven method relies on the
capability of the ON clause to test data values in order to further qualify the
data beyond the ON clause join criteria. This is a very useful capability that is
not generally available elsewhere.
20
Semantically Controlled Data Structure Transformations
Data structure transformation is a vague term. There are actually two basic
types of data structure transformation. The terms restructuring and reshaping
have been used interchangeably for data structure transformation, but the two
terms bring to mind different types of transformation. In this chapter, they are
recategorized and defined as two different types of data structure
transformation: restructuring uses data relationships, and reshaping uses data
semantics, to perform its operations. SQL will be used to demonstrate these
types of data structure transformation using its natural hierarchical data
structure processing capability. Data structure virtualization can be considered
another form of data structure transformation and is also covered in this
chapter.

20.1 Restructuring and Reshaping


Data structures are built to a great extent by utilizing the related data values in
the data in order to make the linkages that can bring the data together in differ-
ent ways. Restructuring is performed by using these data relationships in the
data as linkages in order to change the structure to model these new relation-
ships. This is done by taking the structure apart as needed and applying new
data relationships that are not currently used or reusing them differently. This
introduces new meaning and semantics to the resulting structure. In addition,


these data relationships can involve the comparisons and formulations needed
to make the new relationship linkages.
Restructuring is used when new semantics are needed and data relationships
are available to support them. The resulting structure is not a concern. Reshaping
is used when a specific result structure is necessary; for example, when
establishing parent-child relationships with a hierarchical data model. The
resulting semantics are not a primary concern, but they are expected to be
hierarchically derived from the source structure. Restructuring requires that the
necessary data relationships are available, while reshaping has no external or
internal requirements. Therefore, reshaping is always available to use.
To perform transformation, SQL uses a multiple structure copy tech-
nique in order to transform the data by using the semantics in the data. The
transform examples show the transform by displaying each copy of the struc-
ture and the operations applied to them. Unneeded nodes are indicated by
dashed boxes, and unused paths are indicated by dashed lines. Nodes that take
part in the operation have bolded names, indicating that these nodes are moved
to the new result structure. Solid arrows indicate how the different structure
levels are modeled together. If linking below the root is used, then an additional
dotted arrow indicates this linking.

20.1.1 Restructuring
Restructuring is performed by taking the structure apart and rebuilding it,
using multiple copies of the structure, in a different order and/or with
different relationships. Using this technique, the SQL restructure operation in
Figure 20.1 slices out the Prod node and makes it the new root of the resulting
structure using the hierarchical LEFT outer join. The data relationship used
in making this new structure may be the same or a previously unused data
relationship. If this data structure was retrieved from a contiguous data
structure, like XML, there may not have been any physical data relationships to
use. On the other hand, once any type of data structure (physical or logical,
contiguous or linked) is retrieved into a relational rowset, it can be freely
taken apart. Reassembling it will depend on the data relationships that can be
made.
The alias feature of SQL (aside from its renaming use) allows for the mak-
ing of multiple separately named copies of a structure in the rowset that can be
used in taking apart the structure in Figure 20.1 and hierarchically reassem-
bling it. The separately identified data from the two named copies of the

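[Figure content: two copies of the EmpProj structure, labeled Y and X, each with nodes Emp, Dpnd, Proj, and Prod; the result structure places Y.Prod over X.Emp, with Dpnd and Proj under Emp.]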

Figure 20.1 Simple restructuring.

structure is distinguished by the prefixes X and Y that are associated with the
data copies created in the FROM clause. The LEFT outer join SQL in Figure 20.1
models the restructured data as Y over X, because Y is preserved even when X is
not. This is why ProdID is identified in the SELECT list with a Y prefix, while
EmpID, DpndID, and ProjID are identified with an X prefix: ProdID is desired at a
higher hierarchical level than the others. The higher-level Y.ProdID is used in
the SELECT list and in the ON clause of the LEFT join in order to reflect its
proper position in the new hierarchy. This introduces a new relationship and new
semantics.
What makes this multiple copy structure work and look like the result
structure is that all the unneeded nodes are removed using hierarchical node
promotion and node collection operations to produce the desired result. In
relational terms, this is called projection, and it removes unselected data
types from the active rowset. If no data is selected from a node, the node is
not output, and node promotion occurs. This is a powerful form of data
aggregation and a natural mapping from relational to hierarchical.
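
The two-copy technique can be approximated in plain SQL as sketched below. The base tables, the key columns, the sample rows, and the assumption that Prod sits under Proj in the EmpProj view are all hypothetical; only the aliases X and Y, the selected column names, and the use of ProdID in the ON clause follow the text. Because an ordinary SQL engine returns a flat rowset rather than a hierarchical result, DISTINCT is used here as a stand-in for the duplicate removal that node promotion provides in the hierarchical processing described in this book.

```sql
CREATE TABLE Emp  (EmpID  INTEGER PRIMARY KEY);
CREATE TABLE Dpnd (DpndID INTEGER PRIMARY KEY, EmpID INTEGER);
CREATE TABLE Proj (ProjID INTEGER PRIMARY KEY, EmpID INTEGER);
CREATE TABLE Prod (ProjID INTEGER, ProdID INTEGER);

INSERT INTO Emp  VALUES (1), (2);
INSERT INTO Dpnd VALUES (10, 1);
INSERT INTO Proj VALUES (20, 1), (21, 2);
INSERT INTO Prod VALUES (20, 500), (21, 500);

-- Assumed input structure: Emp over Dpnd and Proj, with Prod under Proj.
CREATE VIEW EmpProj AS
SELECT Emp.EmpID, Dpnd.DpndID, Proj.ProjID, Prod.ProdID
FROM Emp
LEFT JOIN Dpnd ON Dpnd.EmpID = Emp.EmpID
LEFT JOIN Proj ON Proj.EmpID = Emp.EmpID
LEFT JOIN Prod ON Prod.ProjID = Proj.ProjID;

-- Copy Y supplies the new root (Prod); copy X supplies Emp, Dpnd, Proj.
SELECT DISTINCT Y.ProdID, X.EmpID, X.DpndID, X.ProjID
FROM EmpProj Y
LEFT JOIN EmpProj X ON X.ProdID = Y.ProdID
WHERE Y.ProdID IS NOT NULL
ORDER BY Y.ProdID, X.EmpID;
```

With this data, product 500 becomes the root over both employees: employee 1 with dependent 10 and project 20, and employee 2 with no dependent and project 21.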

20.1.2 Restructuring Using Multiple Levels


The example in Figure 20.2 uses the same starting structure as the previous
restructuring example. It differs in that, in addition to moving the Prod
node to the top of the new structure, the Emp and Dpnd hierarchical segment
nodes at level Z are repositioned under the Proj node. This is done by adding
another level (Z) to the structure, using another LEFT join to attach the
Z.Emp node under the Y.Proj node. The Dpnd node goes along with the Emp

Figure 20.2 Restructuring adding extra level.



node in the same hierarchical structure position under the Emp node, preserving
their hierarchy. This demonstrates that structures can be moved with a single
join operation when they can be used “as is,” as with the Emp/Dpnd fragment at
level Z. This is controlled in the SELECT list, which determines which data
types, associated with their data level, are used for output.
Notice that, compared to the previous SQL in Figure 20.1, the SQL in Figure
20.2 has an additional LEFT join that creates another copy of the structure,
named Z. The Emp and Dpnd fields in the SELECT list now use the Z prefix so
that they reference the copy introduced by the newly added LEFT join. This LEFT
join is used to move the Emp node hierarchically under the Proj node. Also
note that the X.Prod/Y.Proj node relationship utilizes linking below the root, as
shown by the dotted arrow. Linking below the root is used heavily in data
structure transformation, allowing for unlimited ways in which structures can be
joined and transformed.

20.2 Reshaping
Reshaping is different from restructuring in that reshaping brings to mind a
molding process that shifts pieces of the structure around, as in molding a
piece of clay. This means that there are no limitations as to what the
resulting structure can be, allowing any structure to be transformed into any
other structure (any-to-any structure transformation). The logic performing this
type of transform is in the semantics of the structure, which drive how
the transform is performed. This preserves the basic semantics, which are applied
to the creation of the new structure. This means that the naturally implied
relationships from the physical or logical juxtaposition of the data in the
structure flow with the structure as it is logically molded and controlled in the
rowset. This produces the desired new structure while taking into consideration
the current semantics represented by the data. The joining process is
synchronized by joining to the same matching data item in both copies being
compared, because no data relationships can be utilized.
The following examples demonstrate data structure reshaping that covers
linear-to-linear (single path), linear-to-nonlinear (multipath),
nonlinear-to-linear, and nonlinear-to-nonlinear structure transforms. The
previous restructuring used physical data relationships in the data to
perform the transform. Reshaping does not use them, although its technique is
similar to restructuring: it also uses multiple copies of the structure, but
the coordination between the copies is different because data relationships are
not available. This requires
another method to coordinate multiple levels, so the relationship of the

structures is now made between the same named data type of the two structure
copies that are being moved in their required hierarchical order. This reshaping
technology has been used before to invert linear structures and will be expanded
here to include reshaping of nonlinear structures. All examples use a dotted
arrow to show the selected output data and how and where it is moved to the
output structure.

20.2.1 Inverting a Linear Structure by Reshaping


Figure 20.3 shows a linear structure inversion using nodes Dept over Emp over
Dpnd (a series of 1-to-M relationships). It demonstrates that the
alignment joining starts at the bottom of the structures and drives the
inversion upward as the inverted node (in bold) is selected from each level.
First select: X.Dpnd, second: Y.Emp, third: Z.Dept. Only these

Figure 20.3 Linear structure inversions.



three nodes represented by their data are selected in reverse order once each for
output, so that the three levels X, Y, and Z are squeezed together through natu-
ral node promotion to keep their structure and juxtaposition relationship,
which is naturally inverted and now has M-to-1 relationships. These copies are
kept synchronized by comparing data values in the separate copies.
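
The three-copy inversion can be approximated in plain SQL as sketched below. The view name DeptEmpDpnd, the base tables, the key columns, and the sample rows are hypothetical; only the aliases X, Y, and Z and the idea of joining each copy to the next on the same data item follow the text. DISTINCT stands in for the duplicate removal that node promotion provides in the hierarchical processing described in this book, since an ordinary SQL engine returns a flat rowset.

```sql
CREATE TABLE Dept (DeptID INTEGER PRIMARY KEY);
CREATE TABLE Emp  (EmpID  INTEGER PRIMARY KEY, DeptID INTEGER);
CREATE TABLE Dpnd (DpndID INTEGER PRIMARY KEY, EmpID INTEGER);

INSERT INTO Dept VALUES (1);
INSERT INTO Emp  VALUES (1, 1), (2, 1);
INSERT INTO Dpnd VALUES (10, 1), (11, 1), (12, 2);

-- Linear input structure: Dept over Emp over Dpnd.
CREATE VIEW DeptEmpDpnd AS
SELECT Dept.DeptID, Emp.EmpID, Dpnd.DpndID
FROM Dept
LEFT JOIN Emp  ON Emp.DeptID = Dept.DeptID
LEFT JOIN Dpnd ON Dpnd.EmpID = Emp.EmpID;

-- Copies are aligned on the same data item (DpndID, then EmpID), and one
-- node is selected from each copy in inverted order.
SELECT DISTINCT X.DpndID, Y.EmpID, Z.DeptID
FROM DeptEmpDpnd X
LEFT JOIN DeptEmpDpnd Y ON Y.DpndID = X.DpndID
LEFT JOIN DeptEmpDpnd Z ON Z.EmpID  = Y.EmpID
WHERE X.DpndID IS NOT NULL
ORDER BY X.DpndID;
```

Each dependent (10, 11, 12) now leads its own row, followed by its employee and that employee's department, so the department value is repeated once per dependent, as expected for M-to-1 relationships.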

20.2.2 Linear-to-Nonlinear Reshaping


Linear structures can be reshaped into nonlinear structures. In Figure 20.4, the
linear structure Dept over Emp over Dpnd is used to generate the structure
Emp directly over the siblings Dpnd and Dept. Two copies of the input
structure are necessary, and they are joined by their common Emp node because it
is the starting node for building the new structure. Only two copies of the
structure are necessary because the first copy can be used to define two nodes,
Emp over Dpnd at level X, which are already related hierarchically as required.
This places Emp over the Dpnd and Dept siblings because X.Emp is linked to
Y.Dept. The same nodes are named in the SELECT list to create the desired
nonlinear structure. By placing Emp over Dept, Dept is naturally and correctly
replicated, converting their relationship from 1-M to M-1.
The previous linear examples and this nonlinear example have not lost the
semantics of the input structure in the new structure because the semantics
have been kept the same or have been inverted. This means the nodes have

Figure 20.4 Linear-to-nonlinear reshaping example.



basically remained attached to the same nodes, as shown in the derived result
structure. Emp over Dpnd remains the same, while Dept over Emp has been
inverted to Emp over Dept.

20.2.3 Nonlinear-to-Linear Reshaping


Nonlinear structures as input can also be used to build linear and nonlinear
structures. In fact, nonlinear structures offer more flexibility in how they are
utilized because their multiple paths offer more opportunities to find the
reshaping that is being sought. This means that fewer input copies need to be
used.
The nonlinear input structure in Figure 20.5 can be duplicated to create a
linear reshaping of itself using the SQL shown in the figure. Since we are
starting with the Prod node as the root, this will be the first matching link.
Dept becomes available in the second-level (Y) structure, which is valid because
it is related to the link point (Prod). Emp can also be selected from the same
lower structure copy because it is already located under the Dept node.

20.2.4 Nonlinear-to-Nonlinear Reshaping


This nonlinear-to-nonlinear example in Figure 20.6 is very similar to the previ-
ous nonlinear-to-linear SQL example in Figure 20.5, where the linear structure
Prod over Dept over Emp was produced easily. Figure 20.6 demonstrates that a
nonlinear structure can be transformed into a different nonlinear structure.

[Figure content: two copies of the DeptView structure, labeled X and Y, each with Dept over Prod and Emp; the linear result structure is Prod over Dept over Emp.]

Figure 20.5 Nonlinear-to-linear reshaping example.



Figure 20.6 Nonlinear-to-nonlinear reshaping example.

This example is similar to the previous example, but places Emp under Prod
instead of under Dept, making Dept and Emp siblings in this structure.
This requires that a third copy of the input structure, Z, is also matched to
Prod, because that is where Emp is being attached. The Emp node is accessed
indirectly from Prod up to Dept and then back down to Emp, a powerful related
semantic reshape operation. Basically, the third structure Z and the additional
join in this example are necessary in order to move Emp from under Dept to under
the Prod node. This indirect link from Prod to Emp, which works because Dept
becomes the data modeling root, places Emp directly under Prod. This also
utilizes the full semantics between the Prod and Emp nodes. This produces the
desired meaningful result between Prod and Emp, but the closer the linkage is,
the better the result. The semantics can become fuzzy, and there could be data
loss when reversing data relationships.

20.3 Data Structure Virtualization


Data virtualization is the building of a data structure using multiple data frag-
ments from different, possibly unrelated data structures from different sources.
The multiple copies technique used in the previous reshaping can be further
expanded to manipulate separate data fragments. These fragments can be

utilized to perform the data manipulation of multiple related and unrelated
data objects to perform data virtualization.

20.3.1 Data Fragment Control


For the following examples, an ABC view has been introduced, as shown in Figure
20.7. In this example, two structure fragments from ABC, named BDE and CFG,
are recombined differently, utilizing the LEFT outer join shown to
hierarchically link the CFG fragment under the BDE fragment. This allows
any-to-any node linking between the structures. Only the desired data in the
SELECT clause is output, which triggers node promotion. This aggregates the data
nicely into a single condensed structure where nodes F and G in the lower left
structure are brought up directly next to node D under node B. This process is
shown in Figure 20.7, where the top diagram shows how the structures were
initially related. The lower left diagram shows the modeled structure; the
inactive pathways have dashed boxes for unselected nodes. Also note that node B
is on an active path even though it was not selected for output. The path is
active because it is needed to navigate to the referenced node D. The diagram on
the lower right is the resulting structure.

Figure 20.7 Multiple fragment usage.



20.3.2 Data Virtualization Example


The query and example in Figure 20.8 introduce a new upper structure, XYZ, and
use the same basic capabilities as the previous example. However, in Figure
20.8, the separate fragments BDE and CFG of the lower structure ABC from
Figure 20.7 are moved to different locations under the XYZ structure. Node D
from BDE is placed under node Y, while node G from CFG is placed under node Z.
The nodes B, E, C, and F are missing because they are not referenced in the
SELECT list. This is achieved by using the SQL alias feature to enable separate
references to the ABC view structure: the BDE and CFG prefixes qualify the
different view references and their associated data items in the SELECT clause,
which is how the two data fragments are identified. Notice that each fragment
is positioned separately by its own hierarchical LEFT outer join. This is an
example of data virtualization because it involves multiple separate data
structures, XYZ and ABC, and allows control at the data fragment level. It
demonstrates both bringing different structures together and separating
segments. The XYZ structure has been expanded with data from different segments
of the ABC view. The result is condensed (aggregated) and can be queried as a
single isolated structure.
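
The alias technique can be sketched as follows. This is an illustrative approximation in Python's sqlite3 module rather than the query in Figure 20.8: XYZ and ABC are modeled as flat stand-in tables, and the link columns y_ref and z_ref are hypothetical, because the actual ON conditions appear only in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat stand-ins for the two structures; y_ref and z_ref are hypothetical
    -- link columns that say which Y and Z values a row of ABC attaches to.
    CREATE TABLE XYZ (x, y, z);
    CREATE TABLE ABC (a, b, d, e, c, f, g, y_ref, z_ref);
    INSERT INTO XYZ VALUES ('x1', 'y1', 'z1'), ('x2', 'y2', 'z2');
    INSERT INTO ABC VALUES ('a1', 'b1', 'd1', 'e1', 'c1', 'f1', 'g1', 'y1', 'z1');
""")

# ABC is referenced twice under the aliases BDE and CFG. Each alias is
# positioned by its own LEFT join, and only D and G are named in the SELECT
# list, so B, E, C, and F do not appear in the output.
rows = conn.execute("""
    SELECT XYZ.x, XYZ.y, BDE.d, XYZ.z, CFG.g
    FROM XYZ
    LEFT JOIN ABC AS BDE ON XYZ.y = BDE.y_ref   -- fragment BDE placed under Y
    LEFT JOIN ABC AS CFG ON XYZ.z = CFG.z_ref   -- fragment CFG placed under Z
    ORDER BY XYZ.x
""").fetchall()

print(rows)
```

The second XYZ row has no matching fragment data, and the LEFT joins preserve it with null values for D and G.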

Figure 20.8 Structured data virtualization.



20.4 Polymorphic Transformation


A polymorphic reshaping transformation does not rely on the shape of the
input structure. This means that a structure of any shape with identically
named nodes can be transformed into the desired resulting structure.
Polymorphic transformation operations in SQL operate by moving only one node at
a time. Structure-specific transformations, by contrast, can move more than one
node at a time when those nodes are already in the desired hierarchical
position. The tradeoff is therefore between fewer steps and structure
independence, and the choice is up to the user. Polymorphic transformation can
still perform any-to-any transformations. Two examples, one with a linear
output and one with a nonlinear output, are shown in the following sections.

20.4.1 Polymorphic Linear Example


In the polymorphic transform example shown in Figure 20.9, the desired output
is a linear hierarchical structure with node Y over node X over node Z. The
input can be a structure of any shape that has X, Y, and Z nodes. The example
shows two possible input structures, one linear and one nonlinear, that are
processed by the same SQL. The SQL shown is not structure-sensitive; it
processes whatever input structure it is given internally and automatically.
The selected X, Y, and Z data values are accessed from either the linear or the
nonlinear structure, and the output structures are identical for both inputs
even though the inputs have different shapes. The solid arrows in Figure 20.9
show how the XYZ structure copies are linked to each other through their ON
clauses. Prefixes A, B, and C identify the different copies of structure XYZ
used in the SQL.
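
As a rough illustration of structure-insensitive SQL, the sketch below runs one reshape query against two differently shaped XYZ views. It uses Python's sqlite3 module; the tables, key columns, and ON conditions are assumptions for the example and are not the SQL shown in Figure 20.9. A stock engine returns flat rows, so the comparison is of the row sets rather than of true hierarchical structures.

```python
import sqlite3

# Linear input: X over Y over Z.
LINEAR = """
    CREATE TABLE X (x INTEGER PRIMARY KEY);
    CREATE TABLE Y (y INTEGER PRIMARY KEY, x INTEGER);
    CREATE TABLE Z (z INTEGER PRIMARY KEY, y INTEGER);
    INSERT INTO X VALUES (1);
    INSERT INTO Y VALUES (10, 1);
    INSERT INTO Z VALUES (100, 10);
    CREATE VIEW XYZ AS
        SELECT X.x AS x, Y.y AS y, Z.z AS z FROM X
        LEFT JOIN Y ON X.x = Y.x
        LEFT JOIN Z ON Y.y = Z.y;
"""

# Nonlinear input: X over Y and Z (two pathways under X).
NONLINEAR = """
    CREATE TABLE X (x INTEGER PRIMARY KEY);
    CREATE TABLE Y (y INTEGER PRIMARY KEY, x INTEGER);
    CREATE TABLE Z (z INTEGER PRIMARY KEY, x INTEGER);
    INSERT INTO X VALUES (1);
    INSERT INTO Y VALUES (10, 1);
    INSERT INTO Z VALUES (100, 1);
    CREATE VIEW XYZ AS
        SELECT X.x AS x, Y.y AS y, Z.z AS z FROM X
        LEFT JOIN Y ON X.x = Y.x
        LEFT JOIN Z ON X.x = Z.x;
"""

# One node is taken from each copy (A, B, C) to build Y over X over Z.
# The query refers only to node names, never to the shape of the input.
RESHAPE = """
    SELECT DISTINCT A.y, B.x, C.z
    FROM XYZ A
    LEFT JOIN XYZ B ON A.y = B.y
    LEFT JOIN XYZ C ON B.x = C.x
    ORDER BY A.y, B.x, C.z
"""


def reshape(schema):
    """Build one input structure and run the shared reshape query on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    return conn.execute(RESHAPE).fetchall()


linear_result = reshape(LINEAR)
nonlinear_result = reshape(NONLINEAR)
print(linear_result, nonlinear_result)
```

The same RESHAPE text is used unchanged for both inputs, which is the property the section calls polymorphic.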

20.4.2 Polymorphic Nonlinear Example


This polymorphic nonlinear example is basically the same as the previous
example in Figure 20.9, except that the output is nonlinear instead of linear.
The only difference is in the last line of the SQL, which links node Z under
node Y instead of under node X. This can be seen in Figure 20.10.

Figure 20.9 Polymorphic linear example.

Figure 20.10 Polymorphic nonlinear example.



20.5 Multipath Queries Alternative to Transformations


A typical transform like the one shown in Figure 20.11 converts the nonlinear
multipath XYZ structure to a linear single path structure in order to query it.
This is done because a query, such as SELECT Z FROM Linear XYZ WHERE
X=2, is easy to understand, whereas the same query using the nonlinear XYZ
structure is not as easily understood. Interestingly, a multipath query can pro-
cess the nonlinear XYZ structure directly because the hierarchical semantics of
the multipath structure relate the X and Z nodes automatically. This also
means that the user does not need to know the structure. This avoids the neces-
sity of transforming the structure. In addition, transformations can introduce
problems because the relationship can flip; for example, Y over X may be
flipped to X over Y, as in the transform in Figure 20.11. This may not always
be desired and can lose data. The multipath nonlinear structure probably has a
1 to M relationship for Y over X, which means that the transformed linear
structure has an M to 1 relationship for X over Y. This can cause data loss
because parents with no children will be lost when inverted.
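
The data loss on inversion can be reproduced with two small tables. This is a minimal sketch using Python's sqlite3 module; the table and column names are assumptions chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Y (y_id INTEGER PRIMARY KEY);
    CREATE TABLE X (x_id INTEGER PRIMARY KEY, y_id INTEGER);
    INSERT INTO Y VALUES (1), (2);          -- Y=2 has no X children
    INSERT INTO X VALUES (10, 1), (11, 1);
""")

# Y over X: the LEFT join preserves every parent, including childless Y=2.
y_over_x = conn.execute("""
    SELECT Y.y_id, X.x_id
    FROM Y LEFT JOIN X ON Y.y_id = X.y_id
    ORDER BY Y.y_id, X.x_id
""").fetchall()

# Inverted to X over Y: X is now the preserved side, so Y=2 drops out.
x_over_y = conn.execute("""
    SELECT X.x_id, Y.y_id
    FROM X LEFT JOIN Y ON X.y_id = Y.y_id
    ORDER BY X.x_id
""").fetchall()

print(y_over_x)
print(x_over_y)
```

In the first result the childless parent appears with a null child; in the inverted result it is absent altogether.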

20.6 Conclusion
The SQL reshaping, restructuring, and virtualization shown do take somewhat
more complex and procedural programming, but they still use SQL’s automatic
semantic processing to keep the processing automatic and correct. A high-level
nonprocedural transform language could be designed for these transformations
based only on the desired structure, and the SQL could be generated
automatically or the transformation could be performed internally by SQL.
It may have occurred to you that creating multiple copies of the structure in
the examples could be costly in memory use. However, most SQL processors should
optimize the use of multiple copies of the structure by keeping and using only
a single copy.

Figure 20.11 A typical transform.


21
Automatic Processing of Remote
Dynamic Structured Data
The world is awash in data with Internet traffic being measured in zettabytes.
Much of the world’s data is unstructured data, and deriving value from it is
often a process of organization and analysis. Unstructured data can therefore be
an important asset. However, structured data still keeps businesses running day
in and day out, which requires consistent, predictable, highly principled pro-
cessing for correct results. This means that structured data cannot be replaced
by unstructured or semistructured data for applications requiring precise
results. However, its processing can be automatically enhanced to support the
processing of dynamic structured data, opening up unlimited flexibility and
new capabilities. This allows for the structured data to change dynamically and
transparently as needed.

21.1 Static Versus Dynamic Structured Data


The structured data used today has been around since the beginning of data
processing, longer than unstructured and semistructured digital data, which are
relatively new additions. This original and still-current structured data
remains limited to fixed, static operation, and changing it usually requires
manual intervention. With the advent of semistructured data, the time for
dynamic structured data may be at hand. Semistructured data bends the rules for
valid hierarchical structured data, so it cannot take the place of static
structured data as a way to add flexibility. New dynamic structured data can
take the place of static structured data for additional flexibility because it
continues to obey the rules for hierarchical structures after it has been
dynamically changed. The software using dynamic structures needs to accommodate
the variable hierarchical structures by using automatic metadata maintenance.
This will significantly speed up and automate changes to the operation of
dynamic structured data.

21.2 Automatic Processing of Remote Dynamic Structured Data

With dynamic structured data in place and operational with hierarchical pro-
cessing, it would be very useful to support an automatic processing of remotely
created dynamic structured data. This could, for example, support a powerful
new type of application, such as Interactive Social Software Real-time IT
Development Collaboration. This can utilize highly principled hierarchical
data processing and its flexible and advanced structured processing to support
dynamic structured data and its processing. This flexible dynamic structured
processing can change the structure of the data as necessary for the required
processing without the implementation time and design problems. It uses the
relational and hierarchical data principles and semantics in the data to derive
correct structured data results.

21.3 Dynamic Structured Data Processing Example


This processing will perform freely across remote, unrelated user locations at
any time and will transparently support dynamic structured data and data type
changes automatically for immediate processing. Such an automatic
user-to-user dynamic structured data collaboration operation is depicted in Fig-
ure 21.1. Its dynamic working hierarchical data structure is freely modified at
each user site visited, which are labeled U1 to U4 in the figure.
In Figure 21.1, user locations U1, U2, U3, and U4, which can be located
anywhere, need to collaborate and share their structured data in order to
produce a needed result. This process requires changing the data structure and
data types as necessary to achieve the desired result. User 1 starts the
collaboration by inputting three relational tables (A, B, and C), modeling them
into a hierarchical structure, and sending it to user 2 for further processing.
Overlapping with user 2’s processing, user 1 also inputs a linear hierarchical
XML structure, XYZ, and transforms it into a nonlinear multipath hierarchical
structure before sending it to user 3 for further processing. This is done
using the SQL hierarchical XML processor.

Figure 21.1 User-to-user dynamic structured data processing collaboration.

User 2 and user 3 are now performing independently and concurrently.
Both users 2 and 3 are retrieving their structure input from user 1, inputting
additional relational table data from their different user home locations, and
joining this data to their working data structures. After completing these tasks,
user 2 and user 3 will both send their modified data structures off to common
user 4 for further processing.
User 4 accepts the modified data structures from both user 2 and user 3,
which operated concurrently, and hierarchically joins them using a matching
data item value between nodes B and X (B.b=X.x). User 4 then eliminates
unneeded data items from the joined result by using SQL’s dynamic SELECT
operation to select data items for output from nodes A, B, E, Y, and W. The
SQL query looks like this: SELECT A.a, B.b, E.e, Y.y, W.w FROM U2 LEFT JOIN U3
ON B.b=X.x. This slices out all nodes (C, D, Z, X, V) that were not referenced
in the SELECT list and automatically aggregates the remaining data, as shown in
Figure 21.1. This process is known as projection in relational processing and
node promotion in hierarchical processing. The LEFT join operation
hierarchically places user 2’s structure over user 3’s structure, connected by
the ON clause specification B.b=X.x. The newly combined hierarchical structure
in user 4 is sent back to user 1 for immediate review, processing, and output.
The hierarchical data can be selectively output in different formats, each with
different data selections, as shown in Figure 21.1.
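
User 4's join and projection can be approximated on a stock SQL engine. In this sketch, which uses Python's sqlite3 module, U2 and U3 are flat stand-in tables with one column per node, and the sample values are invented. Because the stand-ins are flat tables, the columns are qualified as U2.b and U3.x rather than by node name as in the query quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat stand-ins for the working structures received from users 2 and 3.
    CREATE TABLE U2 (a, b, c, d, e);
    CREATE TABLE U3 (x, y, z, v, w);
    INSERT INTO U2 VALUES ('a1', 'b1', 'c1', 'd1', 'e1'),
                          ('a2', 'b2', 'c2', 'd2', 'e2');
    INSERT INTO U3 VALUES ('b1', 'y1', 'z1', 'v1', 'w1');
""")

# U2 is placed over U3 by the LEFT join. Only A, B, E, Y, and W are selected,
# so C, D, Z, X, and V are projected out of the result.
rows = conn.execute("""
    SELECT U2.a, U2.b, U2.e, U3.y, U3.w
    FROM U2 LEFT JOIN U3 ON U2.b = U3.x
    ORDER BY U2.a
""").fetchall()

print(rows)
```

The second U2 row has no matching U3 data and is still preserved, which is what keeps user 2's structure above user 3's.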
During this entire user-to-user collaboration process, the changing dynamic
data structures and data types are automatically maintained and utilized
transparently for the user as needed. This is indicated by the dashed dynamic
metadata box shown in Figure 21.1. The user at each receiving location can also
view the current active structure and its data types. However, the user does
not need to know the structure in order to specify a query, because the
maintained structure is automatically known and used by the query processor.
Different working data structure versions can also be saved and restored by the
user at each location.

21.4 Integrating SQL with Dynamic Structured Data Maintenance

The problem with performing the type of dynamic processing described above
is that the processing of data structures has typically been limited to fixed,
static structures. This is because a dynamically generated structure could not
be communicated automatically between users. Structured data is shared today
by means of shared metadata; the metadata remains the same, so the structure
must remain static. With dynamic structured processing, however, the data
structure can be modified dynamically as needed to support the required
structured operation, as shown and described in Figure 21.1. Previously,
variable structured processing had to be programmed ahead of time, which
limited its dynamic capability. Unplanned dynamic processing requires automatic
metadata maintenance, which has not previously been supported by the industry
for structured data processing. This is represented by the dynamic metadata box
in Figure 21.1.
An advanced standard SQL transparent hierarchical processor prototype
(referred to here as the SQL hierarchical processor) has been developed follow-
ing the SQL hierarchical data processing technology described in this book.
The SQL hierarchical processor can support the required dynamic and flexible
structured data processing that is necessary for collaboration. It is described in
much greater detail in Part V of this book. The SQL hierarchical processor uses
SQL’s inherent hierarchical data processing capabilities that naturally support
full multipath dynamic hierarchical data structures. This includes correlating
the data referenced on different pathways, which increases data value. This
allows the most complex multipath hierarchical operations to be performed in
order to meet the needs at the required time and user locations with the SQL
hierarchical processor. Because of its dynamic processing capabilities, it has also
been programmed to operate structure-aware. This structure-aware processing
is also necessary for the required automatic metadata maintenance to occur at
each user location shown in Figure 21.1. The structure-aware processing deter-
mines the structure by analyzing the input hierarchical SQL.

21.5 Different Levels of Metadata Processing


The pyramid structure in Figure 21.2 depicts the different levels and types of
metadata handling. It shows the different types of hierarchical processing
throughout the SQL hierarchical processor, from relational/hierarchical
integration to hierarchical processing to structure-aware processing to
automatic metadata maintenance and, finally, to dynamic metadata transfer.
Structure-aware processing enables the automatic metadata maintenance to update
itself. This in turn supplies the dynamic metadata transfer block, which
transfers the metadata to the dynamic metadata transfer block at the receiving
location. Notice that at the receiving location the automatic metadata
maintenance is updated, which enables a jump to the SQL input, where the
transferred SQL starts the processing at this user location. Automatic metadata
transfer can also support password and data encryption for data security. The
XML box is included to show approximately where XML input and output is
performed.

21.6 Structured Data Processing Collaboration


The SQL hierarchical processing technology can be enhanced to integrate with
user-to-user structured data processing collaboration. This eliminates the user
control otherwise necessary for dynamically changing metadata. The automatic
metadata maintenance supplies the updated current metadata, which accompanies
the data when it is seamlessly transmitted between users. This allows amazingly
fast, on-the-fly advanced hierarchical structured data processing
collaboration, and it enables results with previously unknown structures,
delivered to any user, to be processed immediately and automatically using the
SQL hierarchical processor at the user location.

Figure 21.2 Dynamic metadata maintenance and transfer.

The SQL hierarchical processor controls the user-to-user processing, sending
and receiving data structures by means of new SQL InFile and OutFile keywords
that have been added to the SQL hierarchical processor for this purpose.
Further dynamic processing and data structure modification can be performed at
each user location visited in any order, including in a network or in
parallel, as shown in Figure 21.1. This opens up the new capability of dynamic
structured data processing with automatic, transparent metadata handling.

21.7 SQL Hierarchical Processing for Structured Data Collaboration

The SQL hierarchical processor is a powerful new SQL transparent multipath
hierarchical processor that dynamically processes heterogeneous data, such as
physical relational data, logical hierarchically modeled relational data, and
physical hierarchical data, such as XML. Its dynamic data processing enables
logical and physical structures to be hierarchically joined and modeled
dynamically. This hierarchical processing significantly increases the power of
the data structure and of the queries applied to it. It is extremely flexible
and is performed automatically, without user hierarchical navigation. The
operation naturally utilizes the hierarchical semantic information between the
pathways to process multipath queries that can, for example, select data from
one path based on the data in another path. This unlimited multipath processing
requires special automatic processing known as LCA processing, which enables
any valid multipath query to be processed automatically. This capability is
supported naturally in the SQL hierarchical processor via SQL and is missing in
other software, such as XQuery processors.
An additional valuable benefit of using hierarchical structures is that they
naturally organize data well. Their ability to freely create and grow logical
hierarchical multipath structures dynamically has another overlooked benefit:
it continually increases the value of the data nonlinearly through automatic
data reuse, because data at higher levels is shared with the multiple lower
levels in a pyramid fashion. In addition, the dynamic joining of these
hierarchical structures can increase their data value and querying power many
times. Another advantage is that new logical hierarchical data structures are
assembled dynamically; they are joined and exist only when and while they are
being used. These logical structures add flexibility to hierarchical structures
and efficiency to their new use.

21.8 Conclusion
All of the powerful and flexible capabilities mentioned in this chapter make
multipath hierarchical structures and their hierarchical processing the perfect
opportunity for this new dynamic structured data processing. SQL is a
universally known query solution, making it a perfect API that is enhanced by
dynamic relational hierarchical processing technology. Single one-way data
transmissions will also always be available to send to anyone at any time,
because a receive-only version of the user-to-user SQL hierarchical processor
will be freely available to download and use to automatically view and utilize
the one-way transmitted data structure. This enables sending dynamically
created structured data anywhere and having it immediately available for the
receiver to utilize automatically.
The real importance of dynamic structured data is that it remains accurate and
precise even though it can change dynamically. Its operation is fast and
immediate; no manual metadata updating is needed, and changes to metadata occur
automatically and seamlessly. This flexibility allows real-time parallel and
network development collaboration using powerful hierarchical processing. It
also enables unlimited new dynamic possibilities. These dynamic capabilities
allow the structured data processing to adapt automatically and accurately to
the desired needs using powerful standard SQL hierarchical processing.
22
New SQL Hierarchical Processing
Technology and Discoveries
In this chapter, new discoveries and methods that were derived from our
research and used in our ANSI SQL hierarchical processor are reviewed. These
newly discovered methods are necessary for this advanced product to operate. To
start the chapter off, the type of hierarchical processing being performed is
discussed first, because there is some confusion as to what is necessary for a
fully powered, professional-use relational database.

22.1 External Versus Internal SQL Hierarchical Processing


This book has examined and utilized SQL’s inherent hierarchical data modeling
and structure processing. This means that the hierarchical processing is
controlled automatically by standard SQL and is contained and executed within
SQL. This book does not cover databases built around SQL that add the necessary
hierarchical processing externally, by procedurally utilizing external
functions and externally constructed stored data maps, because of the
limitations of that approach. The most popular of these external database
technologies is known as the adjacency list model. These external types of
databases are also primarily limited to two-dimensional processing. This is
height over width, as in node A directly over node B and node C, for multipath
use and support. In this two-dimensional usage, nodes A, B, and C have only a
single occurrence of data. Handling the third dimension becomes too difficult
to perform externally and procedurally.

SQL that conforms to the latest international standards can naturally handle
three-dimensional relational databases transparently and automatically. The
third dimension, depth, is data occurrence: each node can have multiple data
occurrences. This is also a standard requirement for professional hierarchical
databases, such as XML or IBM’s IMS database. The third dimension adds a
significant amount of new multipath LCA semantics to the data structure, which
must be interpreted by the database in order to be processed correctly.
Standard SQL inherently understands three-dimensional hierarchical structures
when they are modeled hierarchically by the user or process. This shows that
standard SQL’s inherent hierarchical processing offers the easiest and most
powerful solution for SQL hierarchical processing. This natural hierarchical
capability has already been thoroughly discussed in this book and will continue
to be described further.

22.2 Hierarchical Processing Background History


It is important to know that the full nonlinear hierarchical processing used in
our SQL hierarchical processing product is not new or unproven. Prior to the
emergence of SQL, full nonlinear hierarchical processing and querying was
heavily in use, very successful, and accurately performed. This means that its
natural full hierarchical principles have already been discovered, tested, and
proven.
The advent of relational processing, with its data independence, pushed the
processing of inflexible fixed data architectures, such as hierarchies and the
CODASYL network model, into the background. Today, with the advent of XML and
its popularity, hierarchical structures have been resurrected. However, because
full hierarchical processing had been forgotten and SQL databases remained
popular, hierarchical processing has been limited to single linear pathways,
keeping it close to relational processing’s operation. Our research has opened
the door again to full multipath hierarchical processing by utilizing SQL’s
little-known inherent hierarchical processing capability to seamlessly
integrate relational and XML data at a full hierarchical level that is still
performed naturally by SQL operations.
With SQL operating fully hierarchically, it was determined that the same
hierarchical principles and processing now naturally possible in SQL queries are
long-standing principles of hierarchical processing. Therefore, the hierarchical
processing with SQL described here has been previously used, tested, and
proven.

22.3 Hierarchical Principles and Operation


The SQL LEFT join operation preserves the left-side data when there is no
matching right-side data. This can be used to perform hierarchical data
modeling: with node A over node B, node A is preserved when it has no matching
node B, but node B is not preserved when its parent node A is missing. The
parent node can control multiple pathways under it in the same way. These
pathways are independent of each other, except for the shared parent node.
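
The preservation rule can be seen in a small runnable sketch using Python's sqlite3 module. The tables A, B, and C and their sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER PRIMARY KEY);
    CREATE TABLE B (b INTEGER PRIMARY KEY, a INTEGER);
    CREATE TABLE C (c INTEGER PRIMARY KEY, a INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (10, 1), (99, 7);   -- B=99 has no parent (A=7 is absent)
    INSERT INTO C VALUES (20, 1), (21, 2);
""")

rows = conn.execute("""
    SELECT A.a, B.b, C.c
    FROM A
    LEFT JOIN B ON A.a = B.a     -- pathway A over B
    LEFT JOIN C ON A.a = C.a     -- pathway A over C, independent of B
    ORDER BY A.a
""").fetchall()

print(rows)
```

Parent A=2 is kept although it has no B child, its C pathway is unaffected by the missing B, and the orphan B=99 never appears.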
The LEFT join data modeling data structure is self-defining, which
allows the hierarchical structure metadata to be automatically extracted. This
also solves the relational/hierarchical data integration problem at a full seamless
hierarchical level and preserves the hierarchical semantics. This has demon-
strated that hierarchical processing can be achieved as a subset of relational pro-
cessing. The LEFT join semantics defines the processing for the hierarchical
structure, which allows the logical hierarchical data to operate as a hierarchical
structure would. Because the hierarchical structure can be automatically deter-
mined from the LEFT join, physical hierarchical structures can also be
seamlessly processed, which allows for heterogeneous hierarchical processing.
The SQL data modeling also allows for the dynamic joining of hierarchical
structures. The structures can be heterogeneous because they both rely on the
same SQL outer join data modeling defining the hierarchical join operation.
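
To make the self-defining property concrete, the following sketch derives a child-to-parent map from the text of a LEFT join chain. It is a deliberately simplified, hypothetical illustration in Python, not the SQL hierarchical processor's actual analysis: it assumes unaliased table names and a single equality condition per ON clause.

```python
import re


def extract_structure(sql):
    """Return {node: parent} for a simple chain of LEFT joins.

    Assumes every join has the form
    'LEFT [OUTER] JOIN child ON t1.col = t2.col'.
    """
    root = re.search(r"\bFROM\s+(\w+)", sql, re.IGNORECASE).group(1)
    parents = {root: None}
    join_pattern = re.compile(
        r"LEFT\s+(?:OUTER\s+)?JOIN\s+(\w+)\s+ON\s+(\w+)\.\w+\s*=\s*(\w+)\.\w+",
        re.IGNORECASE,
    )
    for child, left, right in join_pattern.findall(sql):
        # The ON operand that is not the newly joined table names the parent.
        parents[child] = right if left == child else left
    return parents


sql = """SELECT * FROM A
         LEFT JOIN B ON A.a = B.a
         LEFT JOIN C ON A.a = C.a
         LEFT JOIN D ON B.b = D.b"""
structure = extract_structure(sql)
print(structure)
```

The resulting map describes A over B and C, with D under B, using nothing but the join specification itself.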

22.4 Schema-Free Navigationless Hierarchical Database Access

Schema-free querying basically implies that the user specifying the query does
not need to know the physical structure of the data or where the data is located.
This means that the query product must support this by having knowledge of
the structure or by walking through the structure. Either of these two methods
means that the navigation is automatic. An additional capability that may not
be immediately obvious with this automatic processing is that this processing
capability is polymorphic, which means that many different data structures can
be processed by the same query as long as the data names remain the same. The
SQL hierarchical processor supports this naturally because hierarchical struc-
tured data is unambiguous, therefore allowing the data to be located automati-
cally without user navigation.
SQL hierarchical processing also naturally supports multipath hierarchical
processing, which allows more powerful queries to be specified easily. These
queries are more complex to process because multiple-path processing requires
correlating the relationships between the currently active paths in order to
assure meaningful results. This requires complex processing known as LCA
processing, which is not generally supported automatically, or even
procedurally, because it is too difficult to code. Even XQuery, the main XML
query language, does not support automatic LCA processing. Amazingly, standard
SQL inherently supports LCA processing automatically when operating on
hierarchical structures.
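
Assuming that LCA stands for lowest common ancestor, as the term is usually used for tree structures, the node at which two pathways meet can be found with a short routine. The Python sketch below is only an illustration of the concept on a parent map; the node names are invented, and it does not represent how SQL performs the correlation internally.

```python
def lowest_common_ancestor(parents, node_a, node_b):
    """Return the deepest node that is an ancestor of (or equal to) both nodes."""
    seen = set()
    node = node_a
    while node is not None:          # collect node_a and all of its ancestors
        seen.add(node)
        node = parents[node]
    node = node_b
    while node is not None and node not in seen:   # climb from node_b
        node = parents[node]
    return node


# Hypothetical structure: A over B and C; B over D and E; C over F and G.
parents = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "C", "G": "C"}

print(lowest_common_ancestor(parents, "D", "G"))
print(lowest_common_ancestor(parents, "D", "E"))
```

A query that relates D to G must correlate their data occurrences through A, while a query that relates D to E only needs to correlate them through B.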

22.5 Focused Aggregated Data Retrieval


Focused retrieval with result aggregation is an information retrieval (IR)
term meaning that documents can be searched dynamically, that only the
correctly matching documents are identified, and that only the correctly
associated data is returned and condensed into a meaningful result. It also
implies that this is done in an ad hoc or interactive manner without requiring
pre-established query logic.
Executing an SQL query containing a dynamic SELECT list exploits a query
optimizer that automatically determines which data pathways and data are
necessary for the query, so that only those are used to process the query. This
allows any possible query to be specified dynamically, which tailors the
hierarchical processing to the query, a requirement for focused retrieval with
result aggregation.
The SQL hierarchical processor described above is structure-aware,
always aware of the dynamic hierarchical working structure. With this knowl-
edge, the SQL query selected data result is output automatically in its current
hierarchical XML format, which properly reflects the hierarchical result. This
takes care of the result aggregation with the complete control over the specific
output data and its correct hierarchical formatting.
The final requirement is the focused retrieval. This is performed by the
SQL WHERE clause specifying the data selection criteria that controls locating
the desired documents and filtering out the undesired data. This may seem sim-
ple, but what has been missing in the XML industry is the correct hierarchical
data filtering across the multiple hierarchical paths. This requires special LCA
logic that is not being performed today with database hierarchical processing.
This lack of LCA processing causes incorrect results to be returned, such
as locating keywords across different documents that wrongly qualify both doc-
uments instead of qualifying keywords only within the same document. How-
ever, standard SQL performing hierarchically also inherently performs LCA
processing. Therefore, the focused retrieval with result aggregation is correct
and meaningful. This has satisfied the requirements for focused retrieval with
result aggregation.

22.6 Combining Relational and Hierarchical Advantages


Several decades ago, set-theoretic data structures and relational processing
gained traction as a better alternative to hierarchical processing because hierar-
chical structures were fixed and inflexible, while sets and relational processing
offered data independence. This belief failed to take into consideration that
logical hierarchical structures operate exactly the same as physical hierarchical
structures, offering data independence and even additional high-level process-
ing capabilities. Logical hierarchical structures are free to handle any combina-
tion of heterogeneous data structures using physical and logical methods to
construct the virtual data structure.
Relational data can be modeled hierarchically using SQL joins that form a
logical hierarchical structure that has all of the rules, principles, characteristics,
and capabilities of a physical hierarchical structure. The syntax of the SQL joins
model the hierarchical data structure while the associated semantics define the
hierarchical processing when processed by the SQL database engine. This
allows hierarchically modeled logical relational structures to be naturally inte-
grated with physical hierarchical structures at a full hierarchical processing
level. This is performed with no data loss and introduces many additional
advanced hierarchical capabilities, such as flexible hierarchical data structure
joining, which dynamically increases the data value in unlimited ways. The
dynamic, logically constructed hierarchical structures also exist only for the
duration of the query, which considerably reduces the storage space used.
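As a minimal sketch of how left outer joins can model a logical hierarchy over flat tables, the following Python example uses the standard sqlite3 module. The tables dept, emp, and proj, their columns, and the sample rows are hypothetical and chosen only for illustration; they are not taken from the processor described in this book.

```python
import sqlite3

# Hypothetical tables: dept is the root, emp and proj are sibling children.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptid TEXT PRIMARY KEY);
    CREATE TABLE emp  (empid  TEXT PRIMARY KEY, empdeptid  TEXT);
    CREATE TABLE proj (projid TEXT PRIMARY KEY, projdeptid TEXT);
    INSERT INTO dept VALUES ('D1'), ('D2');
    INSERT INTO emp  VALUES ('E1', 'D1'), ('E2', 'D1');
    INSERT INTO proj VALUES ('P1', 'D1');
    -- Each LEFT JOIN adds one child node type under dept and preserves
    -- the parent even when it has no children (as for D2).
    CREATE VIEW deptview AS
        SELECT deptid, empid, projid
        FROM dept
        LEFT JOIN emp  ON empdeptid  = deptid
        LEFT JOIN proj ON projdeptid = deptid;
""")

rows = conn.execute(
    "SELECT deptid, empid, projid FROM deptview ORDER BY deptid, empid"
).fetchall()
for row in rows:
    print(row)
```

The view exists only as a definition; the joined rowset is built when the query runs, and the childless D2 occurrence is preserved with NULLs in its child columns.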
The synergy produced by this SQL/hierarchical data integration comes
from the maximum flexibility and data independence of relational processing
working seamlessly with full hierarchical processing. These capabilities
normally do not operate together, but by using an SQL logical hierarchical
structure they can, each keeping its best capabilities and features while
enhancing the other. These features are relational flexibility combined with
SQL’s advanced inherent hierarchical processing capabilities.

22.7 Global Hierarchical Optimization


Inner and full outer joins have different preservation rules that do not follow
hierarchical processing rules, so they cannot be optimized by this technique. For
example, inner join views always require that every node in the input view
be accessed to test for existence, because a missing data occurrence anywhere in
the view can cause the entire row occurrence to be removed. The hierarchical
optimization described previously where unreferenced pathways can be
removed from access consideration is quite powerful because it can remove
entire nodes from the view from the start of processing, and this optimization
may not be picked up by the standard relational optimization. The standard
SQL query optimization is still required, but the preceding hierarchical
optimization can assist it by reducing the complexity of the relational
optimization that remains. This makes it possible to recognize optimizations
that a query optimizer might otherwise miss.
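The difference between the two kinds of view can be sketched with a small Python example using the standard sqlite3 module. The tables a and b and their rows are hypothetical. DISTINCT is used because a flat rowset repeats the parent once per child; this is a simplification for the sketch and not a description of the processor's internal method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (aid TEXT PRIMARY KEY);
    CREATE TABLE b (bid TEXT PRIMARY KEY, b_aid TEXT);
    INSERT INTO a VALUES ('A1'), ('A2');
    INSERT INTO b VALUES ('B1', 'A1'), ('B2', 'A1');   -- A2 has no B child
""")

def column(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))

# Outer-join (hierarchical) view: the unreferenced B path can be dropped
# without changing which A occurrences are returned.
with_b = column("SELECT DISTINCT aid FROM a LEFT JOIN b ON b_aid = aid")
without_b = column("SELECT DISTINCT aid FROM a")

# Inner-join view: B must still be accessed, because a missing B removes A.
inner = column("SELECT DISTINCT aid FROM a INNER JOIN b ON b_aid = aid")

print(with_b, without_b, inner)
```

With the left join, skipping the B table gives the same A occurrences. With the inner join, A2 disappears, so the B table cannot be removed from access consideration.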

22.8 SQL Multipath Multioccurrence Data Filtering


Hierarchical data filtering using the WHERE clause affects the entire range of
the hierarchical structure. This means that a filtering condition applied to a
given node type affects all other node types in the hierarchical structure. The
node data filtering is applied at the node data occurrence level. Figure 22.1 of
the hierarchical structure demonstrates how and why the WHERE clause
affects all node types (‘A’ through ‘G’) that define a hierarchical structure. The
WHERE clause shown in Figure 22.1 is applied directly to the E node type.
This qualifies the E node type directly. From the E node type, qualification
spreads in an up-and-down fashion. Additionally, every node type qualified in
the upward and downward process also qualifies the node types below it. This
is shown in Figure 22.1, where reaching the A node type in the upward direction
also qualifies all node types under it, and so on. This is the same process
that is occurring in the SQL relational rowset because the WHERE clause
affects entire rows at a time. This hierarchical qualification becomes a little
more involved when we cover the WHERE data qualification at the data occur-
rence level in the paragraph below.
The root node in Figure 22.1 has one data occurrence but could also have
other data occurrences. However, let’s assume these will not be qualified. All

Figure 22.1 The hierarchical filtering flow.


other node types in this example have multiple node data occurrences. For
example, there are B1 and B2 data occurrences under the root node A1 data
occurrence, and C1 and C2 data occurrences under the B1 data occurrence. Notice
that the B1 data occurrence also has D1 and D2 data occurrences under it. The B2
data occurrence has a similar set of data occurrences under it.

22.9 Multipath LCA Types of Processing


LCA processing has been previously mentioned several times in this chapter.
This section will delve deeper into its powerful operation and explain when and
how it is used in two separate SQL situations: the WHERE clause and the
SELECT list.

22.9.1 WHERE Clause LCA Processing


Hierarchical processing involving only single-path processing is very simple and
intuitive. For example, using the structure from Figure 22.1, SELECT A, B, C
FROM aboveview WHERE B='B1' returns A1, B1, C1, C2 as described above.
What about a query that has a WHERE condition that references sibling legs:
SELECT B, A FROM aboveview WHERE C='C2' AND D='D3'? No result is
returned for this query. When dealing with multiple-path processing, LCA
logic is used to ensure that only meaningful answers are returned. In this
example, the WHERE clause determines its LCA node type and then operates
within the range of each LCA node data occurrence. In this WHERE clause
(WHERE C='C2' AND D='D3'), node types C and D are referenced, making the B
node type the LCA node type that controls the range of the filtering tests.
Separate condition tests are performed for the B1 and B2 data occurrences. The
ordering of twins has no specific meaning, so all combinations of comparisons
between the siblings are performed: (C1,D1), (C1,D2), (C2,D1), (C2,D2) under
LCA B1, and (C3,D3), (C3,D4), (C4,D3), (C4,D4) under LCA B2. Since C2 and D3
have different parent LCA occurrences, there is no match. By contrast, SELECT B,
A FROM aboveview WHERE C='C4' AND D='D3' returns B2 and A1.
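The sibling-leg behavior just described can be reproduced with ordinary left joins. The following Python sketch uses the standard sqlite3 module and models only the A, B, C, and D node types of Figure 22.1; the table names (node_a and so on), the parent column, and the use of DISTINCT to collapse the flat rowset are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node_a (val TEXT);
    CREATE TABLE node_b (val TEXT, parent TEXT);
    CREATE TABLE node_c (val TEXT, parent TEXT);
    CREATE TABLE node_d (val TEXT, parent TEXT);
    INSERT INTO node_a VALUES ('A1');
    INSERT INTO node_b VALUES ('B1', 'A1'), ('B2', 'A1');
    INSERT INTO node_c VALUES ('C1', 'B1'), ('C2', 'B1'),
                              ('C3', 'B2'), ('C4', 'B2');
    INSERT INTO node_d VALUES ('D1', 'B1'), ('D2', 'B1'),
                              ('D3', 'B2'), ('D4', 'B2');
    -- C and D are sibling legs under B, so each B occurrence contributes
    -- every (C, D) combination found beneath it and no others.
    CREATE VIEW aboveview AS
        SELECT na.val AS A, nb.val AS B, nc.val AS C, nd.val AS D
        FROM node_a AS na
        LEFT JOIN node_b AS nb ON nb.parent = na.val
        LEFT JOIN node_c AS nc ON nc.parent = nb.val
        LEFT JOIN node_d AS nd ON nd.parent = nb.val;
""")

def query(where):
    """Return the distinct (B, A) pairs that satisfy the WHERE condition."""
    sql = f"SELECT DISTINCT B, A FROM aboveview WHERE {where}"
    return conn.execute(sql).fetchall()

no_match = query("C = 'C2' AND D = 'D3'")   # C2 and D3 have different B parents
match = query("C = 'C4' AND D = 'D3'")      # both sit under B2

print(no_match)
print(match)
```

Because the rowset only pairs C and D occurrences that share a B parent, the first condition can never be true in a single row, while the second qualifies B2 and A1.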
WHERE clause LCA processing can be very complex. Let’s look at:
SELECT C, B, A FROM aboveview WHERE C='C2' AND G='G4'. In this case,
the LCA is node type A and not a direct parent node type. The processing
performed under node data occurrence A1 tests all combinations of C='C2'
AND G='G4' under the A1 data occurrence. A match is found for C2
under B1 and G4 under E2. The result would be C1, C2, B1,
and A1. The B2 data occurrence did not qualify because it did not have a C2 data
value under it, which was needed in order to qualify it. Sometimes the active
LCA can change dynamically. This happens when the WHERE clause uses an
OR operation instead of an AND operation. Because of the way data is qualified
by selecting ranges from the side that is not directly qualified, the OR
operation always needs to be processed on both of its sides. This means that, even if
the first OR operation test is true, the second, right-side test must be performed
too. This can make the LCA change with each side tested and processed. This
is valid and is how the relational rowset Cartesian product works too.

22.9.2 SELECT Operation LCA Processing


The SELECT and WHERE operations used together to reference multiple
paths can cause a different use of LCA processing. This is demonstrated in
SELECT C FROM aboveview WHERE G='G4'. In this case, we are selecting
data from one path of the structure based on data in another path of the
structure. The semantics of this are resolved using the LCA again; in this case, the
LCA of SELECT C and WHERE G='G4', where node types C and G are
referenced, is node type A. This is how this SELECT
node qualification works: WHERE G='G4' qualifies up the path to the LCA node
data occurrence A1 and is then reflected back down, qualifying all active
path occurrences under it (the flow of data qualification). This causes C1, C2,
C3, and C4 to be selected for output. When multiple node types are
selected, they can produce multiple LCAs because each one can have a
different LCA. This can be seen in the query SELECT C, E FROM aboveview
WHERE G='G4'. In this query, the added selected item E produces an LCA of
E, limiting its range to under node E and limiting the query selection for node E
to only E2. If node A had still been the LCA for node E, then both E1 and E2
would have been selected. So, in this example, selected items C and E use
different LCAs.
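How an LCA node type is found for a pair of referenced node types can be sketched in a few lines of Python. The parent map below assumes the layout of Figure 22.1 is A over B and E, B over C and D, and E over F and G; the placement of F in particular is an assumption, and the function names are hypothetical.

```python
# Parent of each node type; the layout of Figure 22.1 is assumed here
# (A over B and E, B over C and D, E over F and G).
PARENT = {"A": None, "B": "A", "E": "A", "C": "B", "D": "B", "F": "E", "G": "E"}

def path_to_root(node):
    """Return the node types from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def lca(first, second):
    """Return the lowest node type found on both root paths."""
    ancestors = set(path_to_root(first))
    for node in path_to_root(second):
        if node in ancestors:
            return node
    raise ValueError("node types are not in the same structure")

print(lca("C", "D"))   # sibling legs under the same parent
print(lca("C", "G"))   # legs that only meet at the root
print(lca("E", "G"))   # a node type and one of its descendants
```

Under these assumptions, SELECT C, E with a WHERE condition on G pairs C with G and E with G separately, which is why the two selected items can end up with different LCAs.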

22.10 Isolating and Manipulating Data Segments


International standard SQL inherently supports full hierarchical processing
transparently and seamlessly in order to perform standard hierarchical process-
ing capabilities naturally and accurately. It turns out that standard SQL has the
ability to also go under the covers in order to isolate and manipulate hierarchi-
cal structures at a data structure fragment level. This is done by utilizing SQL’s
alias capability to reference multiple copies of the same hierarchical data
structure using assigned prefixes, known as correlation names, specified in the
FROM clause. This allows separate node fragment groupings to be referenced
under their common prefix in the SELECT list.
For example, using the data structure in Figure 22.1, the SQL query
SELECT X.B, X.C, X.D, Y.E, Y.F, Y.G FROM aboveview X LEFT JOIN
aboveview Y ON X.D=Y.E will join the substructure with nodes B, C, D over
the substructure with nodes E, F, G. These substructures are connected
between the D and E nodes. The X prefix identifies the upper-level
substructure, while the Y prefix identifies the lower-level substructure.

22.11 Linking Below Root


It was possible to join two structures and have the lower level linked to any-
where below the root node. By letting the SQL hierarchical processor perform
its natural multipath hierarchical joining, which allows linking below the
lower-level structure, it was determined that the result always made for a
semantically sound query. This makes sense because SQL hierarchical process-
ing is automatic, correct, and has no limitations. To apply this capability to the
SQL hierarchical processor, which operates utilizing its structure-aware capa-
bility, the resulting combined hierarchical structure was analyzed.
It was also determined that the resulting structure, when linking below
the lower-level structure’s root, was always the same as if the lower-level
structure’s root had been linked to, which also makes the result structure a valid
hierarchical structure. This is because the lower-level structure is fully
materialized before joining. This happens because of the natural view nesting: the
ending ON clause signals that the fully materialized right-side view, XYZ, can
now be joined to the left view, as in SELECT * FROM ABC LEFT OUTER
JOIN XYZ ON A.a=X.x. This is very natural to specify and easy to process, and
it naturally supports a look-ahead capability because the XYZ view is fully
materialized before being joined to the previous ABC view. This linking below
the lower-level root allows for a very powerful data mashup capability with no
limitations for hierarchical joining. This is because the lower-level structure can
be linked to anywhere below its root with no limitations, greatly increasing the
join possibilities and allowing a schema-free operation where the user does not
need to know the structure.

22.12 SQL Data Transformations


It is bothersome that the transformation terms restructuring and reshaping are
used interchangeably, because they can imply two different types of
transformations. Restructuring is performed by using different data relationships, while
reshaping is applied by modifying the structure utilizing its current semantics.
Reshaping is very useful for XML, which does not generally use or need data keys
because XML structures are typically stored contiguously without keys associ-
ated with document components. These different transformations are further
explained directly below.
Restructuring is performed by breaking the structure apart and putting it
back together differently by using other data relationships that naturally exist in
the data. This introduces new meaning to the structure. On the other hand,
reshaping implies molding the structure into a new shape utilizing the natural
semantics in the data structure. This naturally changes the data structure, but
preserves the meaning and semantics in the structure. Restructuring and
reshaping usually have different uses. Reshaping can be used to match the
structure to an application, while restructuring is used to produce a structure
that is new in meaning and use. The processing of both of these techniques
can be combined.

22.13 Conclusion
This chapter highlights all of the research, new features, and new discoveries
that were needed to support the SQL transparent hierarchical processor and its
new capabilities presented in this and previous chapters.
The new features and capabilities are:

• Basic hierarchical principles and processing in SQL;
• Automatic structure-aware processing;
• Schema-free data access;
• Advanced data structure mashups;
• Global hierarchical data filtering;
• Automatic focused data aggregation;
• Any-to-any data transformations.

The new discoveries are:

• Automatic multipath LCA hierarchical processing in SQL;
• Data structure joining below the root;
• Global hierarchical optimization;
• Relational/hierarchical lossless hierarchical data integration.
Part V
SQL Transparent XML Hierarchical
Multipath Query Processor
Part V describes the SQL hierarchical XML processor and its new hierarchical
processing discoveries, which make its advanced automatic operation possible.
Chapter 22 describes the new SQL hierarchical processing technology and dis-
coveries used in the SQL hierarchical XML processor. Chapter 23 covers the
SQL/XML Standard: operations, integration politics, resulting ramifications,
and new solutions that solve the problems raised and discussed in this chapter.
Chapter 24 describes the SQL hierarchical XML processor’s internal and exter-
nal operation. Finally, Chapter 25 demonstrates the real-time and dynamic
SQL hierarchical XML processor in operation with annotated examples and
XML output.

23
SQL/XML: Operation, Politics,
Ramifications, and Solution
Most of today’s SQL product designers entered the database field with the advent
of E. F. Codd’s relational database and model. The author entered the field
earlier, designing commercial hierarchical query products when hierarchical
databases were at their pinnacle. When the relational database (RDB) arrived, he
adapted these relational products to support hierarchical processing seamlessly
and transparently. This gives him a unique perspective on relational and
hierarchical databases, from the point of view of not only how they are different
but also how they are alike.
The SQL hierarchical processor mentioned in this and previous chapters
is described in greater detail in Chapter 24. In order to demonstrate and prove
how this processor supports hierarchical processing, transparent XML support
was implemented in it. Valid XML data structures are hierarchical and access to
the data is performed by navigating through the hierarchy of the data structure.
XML contains some operations that are not present in traditional SQL query
processing; these are described in this chapter.
The implementation of the ISO SQL/XML and W3C XQuery
standards was a disappointment for this author. The technical and political
reasons contributing to this will be discussed later in this chapter. The SQL
hierarchical XML processor mentioned above was designed to properly support
hierarchical and XML processing. It will be used to demonstrate what is needed
for an SQL solution to fully support XML.


23.1 XML Data Description and Operation


XML is a markup language with a semistructured data model. Because it is a
markup language, XML documents contain embedded metadata that is useful
for creating and processing advanced structures. Semistructured languages have
been researched in the academic community for many years, but XML has
emerged as a de facto standard. XML has been adopted for diverse purposes,
such as archiving, messaging, publishing, application, service, and data integra-
tion. Coming from the World Wide Web Consortium (W3C), XML has seen
widespread adoption, contributing to a paradigm shift in how data can be pro-
cessed and messages exchanged.

23.1.1 Semistructured Data


Up until now, hierarchically structured data’s metadata needed for automatic
processing was defined externally. This kept hierarchical structures fixed in
structure. The one exception, in which metadata was stored in the data, handled
variably occurring or variable-length data, using metadata in the data in the
form of field length or occurrence counts. However, with semistructured data,
metadata is embedded throughout the data, specifying field types and
field names at will. This embedded metadata includes the nesting of data that
defines the hierarchical structure. This means that it is possible to define any
previously unknown record that varies dynamically in data fields, data types,
and structure. Of course, this usually does not happen because most XML data-
base applications, utilities, or query languages could not process this unlimited
dynamic capability. However, it does demonstrate how and why there are prac-
tically no limits as to what can be defined in XML; no one utility or query pro-
cessor can handle all of XML’s capabilities. This is a very different form of
processing than the structured processing performed by SQL, which is required
to support fixed critical business applications that keep businesses running day
in and out. This structured data can be defined in semistructured data as a sub-
set that supports hierarchical data structures and their operating principles.

23.1.2 Multiple Content Types


There are three basic styles for specifying data in XML. These are referred to as
element mode, attribute mode, and mixed mode. In element mode, only ele-
ment data strings are used to specify data. In attribute mode, only attribute
variable assignments are used. As one would assume, in mixed mode content
type processing, both elements and attributes are used. Mapping procedures
from XML to SQL must take this mixed content type into consideration very
carefully when determining the most efficient representative structure. The
same structure could be defined in more than one way. To demonstrate this,
the SQL hierarchical processor will display the three possible XML content
types from the same Cust View shown in Figure 23.1. The element mode XML
format is shown first in Table 23.1. It is displayed in two columns because it is
quite lengthy and has shorter natural output lines. The attribute mode XML
format is shown in Table 23.2, and the mixed mode is shown in Table 23.3.

Cust

Invoice Addr

Figure 23.1 Cust View structure.

Table 23.1
XML Element Mode Output
SELECT * FROM CustView FOR XML Element
<cust>                                      <addrcustid>Cust02</addrcustid>
  <custid>Cust03</custid>                   <addrstate>CA</addrstate>
  <custstoreid>Store01</custstoreid>        <addrtext/>
  <custtext>Comment Five,                 </addr>
  Comment Six</custtext>                </cust>
  <addr>                                <cust>
    <addrid>Addr03</addrid>               <custid>Cust01</custid>
    <addrcustid>Cust03</addrcustid>       <custstoreid>Store01</custstoreid>
    <addrstate>NV</addrstate>             <custtext>Comment One,
    <addrtext>This is addr                Comment Two,
    text</addrtext>                       Comment Three,
  </addr>                                 Comment Four</custtext>
</cust>                                   <invoice>
<cust>                                      <invid>Inv02</invid>
  <custid>Cust02</custid>                   <invcustid>Cust01</invcustid>
  <custstoreid>Store01</custstoreid>        <invstatus>O</invstatus>
  <custtext/>                               <invtext/>
  <invoice>                               </invoice>
    <invid>Inv03</invid>                  <invoice>
    <invcustid>Cust02</invcustid>           <invid>Inv01</invid>
    <invstatus>O</invstatus>                <invcustid>Cust01</invcustid>
    <invtext/>                              <invstatus>P</invstatus>
  </invoice>                                <invtext/>
  <addr>                                  </invoice>
    <addrid>Addr04</addrid>               <addr>
    <addrcustid>Cust02</addrcustid>         <addrid>Addr01</addrid>
    <addrstate>CA</addrstate>               <addrcustid>Cust01</addrcustid>
    <addrtext/>                             <addrstate>CA</addrstate>
  </addr>                                   <addrtext/>
  <addr>                                  </addr>
    <addrid>Addr02</addrid>             </cust>

Table 23.2
XML Attribute Mode Output
SELECT * FROM CustView FOR XML Attribute
<cust custid="Cust03" custstoreid="Store01" custtext="Comment Five, Comment Six">
  <addr addrid="Addr03" addrcustid="Cust03" addrstate="NV" addrtext="This is addr text"/>
</cust>
<cust custid="Cust02" custstoreid="Store01" custtext="">
  <invoice invid="Inv03" invcustid="Cust02" invstatus="O" invtext=""/>
  <addr addrid="Addr04" addrcustid="Cust02" addrstate="CA" addrtext=""/>
  <addr addrid="Addr02" addrcustid="Cust02" addrstate="CA" addrtext=""/>
</cust>
<cust custid="Cust01" custstoreid="Store01" custtext="Comment One, Comment Two, Comment Three, Comment Four">
  <invoice invid="Inv02" invcustid="Cust01" invstatus="O" invtext=""/>
  <invoice invid="Inv01" invcustid="Cust01" invstatus="P" invtext=""/>
  <addr addrid="Addr01" addrcustid="Cust01" addrstate="CA" addrtext=""/>
</cust>

Table 23.3
XML Mixed Mode Output
SELECT * FROM CustView FOR XML Mixed
<cust custid="Cust03" custstoreid="Store01">
  Comment Five, Comment Six
  <addr addrid="Addr03" addrcustid="Cust03" addrstate="NV">This is addr text</addr>
</cust>
<cust custid="Cust02" custstoreid="Store01">
  <invoice invid="Inv03" invcustid="Cust02" invstatus="O"></invoice>
  <addr addrid="Addr04" addrcustid="Cust02" addrstate="CA"></addr>
  <addr addrid="Addr02" addrcustid="Cust02" addrstate="CA"></addr>
</cust>
<cust custid="Cust01" custstoreid="Store01">
  Comment One, Comment Two, Comment Three, Comment Four
  <invoice invid="Inv02" invcustid="Cust01" invstatus="O"></invoice>
  <invoice invid="Inv01" invcustid="Cust01" invstatus="P"></invoice>
  <addr addrid="Addr01" addrcustid="Cust01" addrstate="CA"></addr>
</cust>

All of these XML output examples have the same hierarchical structure shown
in Figure 23.1.
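To make the difference between the content types concrete, the following Python sketch builds the same Cust over Addr fragment in element mode and in attribute mode, and one addr element in mixed mode, using the standard xml.etree.ElementTree module. The field values come from the sample output above; the helper functions element_mode and attribute_mode are hypothetical and are not part of the SQL hierarchical processor.

```python
import xml.etree.ElementTree as ET

# One cust row and one addr row, using values from the sample output.
cust = {"custid": "Cust03", "custstoreid": "Store01"}
addr = {"addrid": "Addr03", "addrcustid": "Cust03", "addrstate": "NV"}

def element_mode(tag, fields):
    """Build an element in which each field becomes a child element."""
    node = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(node, name).text = value
    return node

def attribute_mode(tag, fields):
    """Build an element in which each field becomes an attribute."""
    return ET.Element(tag, dict(fields))

# Element mode: data is carried only in element strings.
elem_root = element_mode("cust", cust)
elem_root.append(element_mode("addr", addr))

# Attribute mode: data is carried only in attribute assignments.
attr_root = attribute_mode("cust", cust)
attr_root.append(attribute_mode("addr", addr))

# Mixed mode: attributes for the fields, element content for the text field.
mixed_addr = attribute_mode("addr", addr)
mixed_addr.text = "This is addr text"

elem_xml = ET.tostring(elem_root, encoding="unicode")
attr_xml = ET.tostring(attr_root, encoding="unicode")
mixed_xml = ET.tostring(mixed_addr, encoding="unicode")
print(elem_xml)
print(attr_xml)
print(mixed_xml)
```

All three forms nest addr under cust in the same way; only the placement of the field values differs.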

23.1.3 Variable Structure Formats


With variable structure formats, there is the situation where the data structure
for a given XML document type can change from document occurrence to
occurrence or even within a document occurrence. The reason that XML pro-
cessors can support variable structure formats is that XML structures are
self-defining because the associated metadata that defines this semistructured
data is stored along with the XML data. This metadata includes the structure
metadata, which allows the data structure formats to change dynamically
because the metadata that defines it is changing dynamically in order to con-
form to the actual data format.
Similar to XML’s variable format, the SQL ON clause can be used to test
data values at specific hierarchical node points to control whether or not the
join is performed. This allows data structure generation to be variable depend-
ing on the data values in the data structure being processed. This allows the
variable data generation to be more dynamic.
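A data value test in the ON clause can be sketched as follows, using Python's standard sqlite3 module. The cust and invoice tables reuse column names and values from the sample XML output; treating the status 'O' as the value that controls the join is an assumption made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust    (custid TEXT);
    CREATE TABLE invoice (invid TEXT, invcustid TEXT, invstatus TEXT);
    INSERT INTO cust VALUES ('Cust01'), ('Cust02'), ('Cust03');
    INSERT INTO invoice VALUES
        ('Inv01', 'Cust01', 'P'),
        ('Inv02', 'Cust01', 'O'),
        ('Inv03', 'Cust02', 'O');
""")

# The extra test in the ON clause decides, occurrence by occurrence,
# whether an invoice is joined under its customer. Customers are kept
# either way because the join is a LEFT JOIN.
rows = conn.execute("""
    SELECT custid, invid
    FROM cust
    LEFT JOIN invoice
           ON invcustid = custid AND invstatus = 'O'
    ORDER BY custid, invid
""").fetchall()
print(rows)
```

Placing the test in the ON clause rather than in the WHERE clause varies the generated structure without filtering out the parent: Cust03 is still returned, with no invoice beneath it.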

23.1.4 Duplicate Element Use


Semistructured data also supports duplicate named element use that happens
when the same node type (name) occurs in multiple locations in the data struc-
ture. This is similar to multiple inheritance for object programming. This
means that sometimes the parent element needs to be identified by examining
its embedded metadata in order to know which duplicate element use has been
located. For example, distinguishing an address node occurrence that belongs to
an employee from one that belongs to a customer requires that the parent type be
identified. This special processing can be avoided in the SQL hierarchical
structured view, where duplicate element use is handled by SQL’s alias (renaming)
capability. This avoids context and ambiguous semantics problems by
making the structure unambiguous.
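The alias approach can be sketched with Python's standard sqlite3 module. The company, emp, cust, and addr tables, their columns, and the alias names empaddr and custaddr are hypothetical; the point is only that one addr table appears twice in the structure under two unambiguous names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (coid TEXT);
    CREATE TABLE emp  (empid TEXT, empcoid TEXT);
    CREATE TABLE cust (custid TEXT, custcoid TEXT);
    CREATE TABLE addr (addrid TEXT, ownerid TEXT);
    INSERT INTO company VALUES ('Co01');
    INSERT INTO emp  VALUES ('Emp01', 'Co01');
    INSERT INTO cust VALUES ('Cust01', 'Co01');
    INSERT INTO addr VALUES ('Addr01', 'Emp01'), ('Addr02', 'Cust01');
""")

# The same addr table is used in two places; each use gets its own alias,
# so every column reference names exactly one position in the structure.
rows = conn.execute("""
    SELECT empid, empaddr.addrid AS empaddrid,
           custid, custaddr.addrid AS custaddrid
    FROM company
    LEFT JOIN emp  ON empcoid  = coid
    LEFT JOIN addr AS empaddr  ON empaddr.ownerid  = empid
    LEFT JOIN cust ON custcoid = coid
    LEFT JOIN addr AS custaddr ON custaddr.ownerid = custid
""").fetchall()
print(rows)
```

No parent lookup is needed at query time, because empaddr can only mean the address under emp and custaddr can only mean the address under cust.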

23.1.5 Shared Element Data


XML physically defines hierarchical structures by nesting elements within ele-
ments. XML IDREF specifications identify additional logical pathways that
enable logical network structures to be modeled. These are structures where
data nodes (elements) can be accessed from more than one path, allowing their
data to be shared. This does not present a problem for XML products because
they can specify manual database navigation. SQL’s database navigation is
automatic, so this presents a problem when accessing a data node that can be
accessed from multiple paths. This is because each path has a separate semantic
meaning. Different paths can select a different set of data node occurrences.
This ability allows network structures to be defined, which are not supported
in hierarchical structures: multiple paths leading to the
same node produce an ambiguous structure because each path has different
semantics. This means that the structure cannot be navigated automatically in a
navigationless schema-free fashion. For these reasons, network structures are
not supported by the SQL hierarchical XML processor. This allows it to
automatically support more powerful operations, like navigationless processing
and powerful hierarchical optimization.

23.1.6 XML Navigation


XPath supports a high-level navigation of XML. It supports new search, collec-
tion, and containment capabilities that permit duplicate element use, which
allows the same element type (definition) to occur in multiple places in the
hierarchical structure. Two application programming interfaces (APIs), DOM
and SAX, support a procedural approach to XML access that allows more
real-time control. This additional control is necessary for semistructured data
that could be unpredictable. Because the SQL hierarchical XML processor only
handles structured data without duplicates, its navigation is fully predictable
and can operate at a full navigationless schema-free level.

23.1.7 Namespaces
Namespaces are used in XML to solve naming conflicts. When combining SQL
views, the same naming conflicts occur as when XML documents are combined.
Much as XML handles its naming conflicts, SQL’s
high-level name prefix can be added when referring to data types to prevent
naming conflicts by making these names unique.

23.1.8 Recursive Structures


Recursive structures are a special case of hierarchical structures where a portion
of the structure can double back on itself. This can happen indefinitely as
needed, as in a parts explosion where parts can contain other parts. XML sup-
ports complex recursive substructures easily in its hierarchical type structure.
SQL supports recursive structures in its flat, data-controlled hierarchies
through a flat recursive operation: a loop accesses each recursively related row
and unions it back into the result, where it in turn goes through the
same process.
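A parts explosion of this kind can be written as a recursive common table expression. The sketch below runs one through Python's standard sqlite3 module; the part table, its columns, and the sample parts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (partid TEXT, parentid TEXT);
    INSERT INTO part VALUES
        ('bike',  NULL),
        ('wheel', 'bike'),
        ('frame', 'bike'),
        ('spoke', 'wheel'),
        ('rim',   'wheel');
""")

# The first SELECT seeds the result with the top-level part. The second
# SELECT is applied repeatedly: each row already in the result is joined
# to its child parts, and those rows are unioned back into the result.
rows = conn.execute("""
    WITH RECURSIVE explosion(partid, depth) AS (
        SELECT partid, 0 FROM part WHERE parentid IS NULL
        UNION ALL
        SELECT part.partid, explosion.depth + 1
        FROM part
        JOIN explosion ON part.parentid = explosion.partid
    )
    SELECT partid, depth FROM explosion ORDER BY depth, partid
""").fetchall()
print(rows)
```

The result is a flat rowset; the depth column is the only record of how far down the recursion each part was found.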

23.1.9 Ordered Data


XML data is considered ordered by default and the hierarchical processor will
maintain this ordering unless it is overridden. SQL data is not ordered by
default. To make SQL data ordered, it has to be ordered after it is retrieved into
the result set. Another problem is that, because the relational data is flat, even
though it might hold a multipath structure, only one pathway can be ordered
in the result set. This is because, being a single flat structure, when one pathway
is ordered, all of the other paths become unordered. The SQL hierarchical XML
processor solves this problem by seamlessly supporting separate ordering
of paths, assisted by its postprocessing. Another problem with ordering
hierarchical data is that ordering that goes against the structure can change the
hierarchical structure, inadvertently producing invalid results. For example, in
the structure employee over dependent, ordering dependent before employee
changes the structure to dependent over employee; this is problematic because
the structure has been changed, but the processor does not know this and
will produce invalid results. This is why the SQL hierarchical processor will
not allow this invalid ordering for hierarchical structures. This additional
beneficial level of hierarchical processing requires hierarchical
structure-aware processing.
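Separate ordering of sibling paths can be sketched on an XML result with Python's standard xml.etree.ElementTree module. The document below is illustrative, and the order_path helper is hypothetical; it is not the postprocessing used by the SQL hierarchical XML processor, only one way to show that each path can be ordered without disturbing the others or the parent-child structure.

```python
import xml.etree.ElementTree as ET

# Illustrative document: one cust with two child paths, both out of order.
doc = ET.fromstring(
    '<cust custid="Cust01">'
    '<invoice invid="Inv02"/><invoice invid="Inv01"/>'
    '<addr addrid="Addr04"/><addr addrid="Addr02"/>'
    '</cust>'
)

def order_path(parent, tag, key):
    """Sort the children named `tag` by attribute `key`, in place.

    Children with other tags keep their positions, so each sibling
    path is ordered independently of the others.
    """
    children = list(parent)
    slots = [i for i, child in enumerate(children) if child.tag == tag]
    ordered = sorted((children[i] for i in slots), key=lambda c: c.get(key))
    for slot, child in zip(slots, ordered):
        children[slot] = child
    parent[:] = children

order_path(doc, "invoice", "invid")
order_path(doc, "addr", "addrid")

ids = [child.get("invid") or child.get("addrid") for child in doc]
print(ids)
```

The helper only reorders occurrences beneath their own parent, so a child can never be moved above its parent and the hierarchical structure is left unchanged.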

23.1.10 XML Data Processing


It is interesting and important to note that the above unconventional XML
capabilities can be specified in XML, but the applications that process this
XML still must take on and handle these advanced capabilities and structures
themselves. This means that not all documents can be processed in the same
way by the standard XML processors, such as XSLT and XQuery. Even the
standard XML parsers DOM and SAX operate on the structure differently and
can return different results. Fortunately, the capability to deliver the result in
structured XML record format by the SQL hierarchical XML processor will
allow current non-XML applications or new applications to seamlessly access
structured XML data with all of its capabilities performed using simple SQL
requests. This is possible for the SQL hierarchical XML processor because it is
limited to structured data and does not concern itself with semistructured data
processing. This will increase its automatic structured data processing capabili-
ties. It is important to realize that structured processing is required for critical
day-to-day business processing.

23.2 Politics of SQL, XML, and the Secret Agenda


When XML came along and it quickly became apparent that SQL processors
needed to support XML, many SQL processor designers and implementers saw
this as an opportunity to finally move beyond SQL, which they considered in
its twilight years, with an eye towards XQuery as its successor. This became a
secret SQL/XML agenda, which if followed, would cause many sacrifices to be
made that would not be good for SQL and XQuery XML support. Unfortu-
nately, most of these sacrifices were made, and we live with their limitations
today in the ISO SQL/XML and W3C XQuery standards. Many of these are
described next.

23.2.1 SQL/XML Standard and XQuery Decisions Limit Capabilities


At their initial stage, the ISO SQL/XML and W3C XQuery standards were
being designed by working groups whose members were experienced SQL
processor designers. The new direction described above affected design
decisions that limit SQL’s and XQuery’s XML processing capabilities today.
These limits stem in part from the decision to support relational
processing in XQuery, even though XQuery was supposed to be designed from
the ground up as an XML processor. Hierarchical processing and relational
processing follow two different sets of processing principles, and
supporting both simultaneously weakens each. This design decision severely
limits XQuery’s hierarchical processing capability both now and in the
future.

23.2.2 XQuery’s Decision to Also Support Relational Processing


An obvious problem with supporting relational processing in XQuery was the
decision to support the standard relational inner join by default. The
inner join does not model hierarchical structures and, in fact, destroys
hierarchical structures when joining to them. The inner join was adopted as
a natural bridge for transferring relational processing to XQuery. This
allows XQuery to accommodate both relational and hierarchical processing,
but the side effects and trade-offs are quite limiting. XQuery cannot
enforce hierarchical processing, nor can it perform hierarchical
structure-aware processing, which uses knowledge of the structure being
operated on to support new hierarchical processing capabilities. As a
result, query results can be hierarchically inaccurate and the error can go
undetected.
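
The effect of the inner join on a hierarchical structure can be seen in a minimal sketch. The following Python script uses the standard sqlite3 module to run plain SQL; the Dept and Emp tables and their data are hypothetical and are used only for illustration. The inner join drops the parent that has no children, while the LEFT join preserves it.

```python
import sqlite3

# Hypothetical two-level structure: Dept rows are parents, Emp rows are children.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptID TEXT PRIMARY KEY);
    CREATE TABLE Emp  (EmpID TEXT PRIMARY KEY, EmpDeptID TEXT);
    INSERT INTO Dept VALUES ('D1'), ('D2');
    INSERT INTO Emp  VALUES ('E1', 'D1');   -- D2 has no children
""")

# Inner join: only parents that have at least one child survive.
inner_rows = conn.execute(
    "SELECT DeptID, EmpID FROM Dept "
    "INNER JOIN Emp ON DeptID = EmpDeptID ORDER BY DeptID"
).fetchall()

# LEFT join: every parent survives; a missing child appears as NULL (None).
left_rows = conn.execute(
    "SELECT DeptID, EmpID FROM Dept "
    "LEFT JOIN Emp ON DeptID = EmpDeptID ORDER BY DeptID"
).fetchall()

print(inner_rows)  # the childless parent D2 is lost
print(left_rows)   # D2 is preserved with an empty child
```

The parent D2 exists in the data, so a hierarchical result that omits it no longer reflects the structure being queried.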

23.2.3 Limiting Hierarchical Support to Single-Path Processing


Another problem with XQuery is its user navigation, which is too cumbersome
for the complexities of a multipath operation. Such an operation requires
special processing, known as lowest common ancestor (LCA) processing, to
coordinate the semantics between paths. Because relational processing
replaced hierarchical processing over three decades ago, today’s relational
designers were not familiar with full hierarchical processing. They treated
relational processing the same as linear hierarchical processing, except
that it required navigation, and they were satisfied with this level of
support for hierarchical data. Full nonlinear processing and its
capabilities were not realized. This short-sightedness locked XQuery into
linear processing, severely limiting XML’s powerful data processing
capabilities. Another problem is the SQL user’s dislike of XQuery’s
procedural navigation, which goes against SQL’s navigationless processing.
This dislike was not recognized by query designers, who are very
comfortable with navigation.
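
As a rough illustration of what LCA processing has to determine, the sketch below finds the lowest common ancestor of two nodes in a small structure. The five-node structure, the PARENT map, and the function names are assumptions made for illustration; this is not the algorithm used by any particular processor.

```python
# Hypothetical structure: A is the root; B and D are children of A;
# C is a child of B; E is a child of D.
PARENT = {"A": None, "B": "A", "C": "B", "D": "A", "E": "D"}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(first, second):
    """Return the deepest node that lies on both root paths."""
    ancestors_of_first = set(path_to_root(first))
    # The walk goes upward, so the first shared node found is the lowest one.
    for node in path_to_root(second):
        if node in ancestors_of_first:
            return node
    raise ValueError("nodes are not in the same structure")


print(lowest_common_ancestor("C", "E"))  # A
print(lowest_common_ancestor("C", "B"))  # B
```

Nodes C and E sit on different paths, so any query that relates them has to be coordinated at A, the point where the two paths meet.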

23.2.4 Ignoring Navigationless Schema-Free Access Support


Ideally, SQL users want navigationless processing, known today in academic
database research as schema-free processing. The fact that academic
research is examining this schema-free capability for XQuery shows that it
is important to XQuery, because it opens up full nonlinear multipath
support. Schema-free processing means that the user does not have to be
familiar with the data structure being processed. This automatically
implies that multipath queries must be supported, because a user without
knowledge of the data structure may unknowingly specify data items that lie
on multiple paths. Current research on schema-free processing in XQuery
uses functions layered on top of XQuery to perform this multipath
operation, because it was left out of XQuery’s design. Unfortunately, this
approach adds to the complexity of coding the XQuery query, and because it
is an external add-on it is limited to simple multipath queries. This may
also slow XQuery’s acceptance by SQL users.

23.2.5 Not Utilizing Standard SQL’s Natural Hierarchical Processing


It now turns out that standard ANSI SQL can inherently perform full
nonlinear hierarchical processing. This means that SQL can be used directly
to support fully transparent XML processing at a full nonlinear,
schema-free level, with no limitations on its hierarchical processing. This
may yet save SQL from the effects of XQuery. The weaknesses of XQuery
pointed out in this chapter have prevented it from supporting the LCA
processing described earlier, which is required to handle multipath
processing. LCA processing is inherently supported in ANSI SQL when it is
performing full hierarchical processing; it properly correlates the
relationships between the accessed pathways.

23.3 Further Effects of the Secret SQL/XML Agenda


The W3C XQuery standard and the SQL/XML standard released with SQL:2003
guide XML processing. The design and development of these two XML
processors also influence each other’s capabilities. The SQL/XML standard
was designed to match the capabilities of XQuery as closely as possible,
and XQuery was designed to support not only XML but also relational
processing. These two standards therefore affect each other, which can
negatively influence their capabilities by retarding the natural and
separate growth of each.

23.3.1 SQL/XML Vendor Solutions are Proprietary and Incompatible


Even with the W3C XQuery and SQL/XML standards, SQL/XML vendors have
developed and branded their own SQL/XML integration solutions. These
proprietary additions have made the vendors’ SQL/XML support incompatible
with each other. This is good for the SQL vendor, but bad for customers
because it locks them into a specific SQL vendor. These problems have
prevented the SQL/XML industry from reaching its full potential.

23.3.2 XQuery and SQL/XML Standard Favors Semi-structured Processing


Structured XML processing and development are being avoided today because
of the XML navigation and procedural coding they require. This is part of a
larger problem: XML processor product designers have not realized that
there are two different problems that need to be solved using different
methods. XML was invented to support semistructured data, not plain
everyday structured data. After XML was designed, it was also applied to
fixed structured database data, such as that processed by SQL. These fixed
XML hierarchical structures are unambiguous and require exact results, and
they can and should be processed nonprocedurally and navigationlessly. The
problem today is that XML vendor solutions do not differentiate between
these two types of XML processing and default to fuzzy semistructured
processing, which is not appropriate for the exact processing of structured
XML data. It would have made better sense to have SQL’s XML processing take
on this natural structured XML hierarchical processing while having XQuery
specialize in semistructured processing.

23.3.3 XML Processing Today Is Limited by User’s Linear Mindset


The XML data processing industry and its vendors have still not taken full
advantage of the additional information potential in XML’s full
hierarchical structure. Today, reuse of hierarchical structures is at a
linear level: a portion of an existing path is shared to create another
path. This linear use is limited, and is caused by a relational join
mindset that is still stuck in a linear, two-dimensional mode. A
significantly more powerful nonlinear use comes from multipath queries
derived from the unlimited number of different combinations of paths in a
structure. This allows any possible query involving multiple paths to be
specified within the structure, significantly increasing the number of
queries possible.
An even greater advantage of nonlinear hierarchical structures is multipath
semantic processing of the structure. This automatically utilizes the
currently unused hierarchical semantics that naturally exist between all of
the pathways used in a query; for example, selecting data from one pathway
based on data values from a different pathway. Each multipath query has its
own unique semantics based on the naturally occurring semantics used to
solve the query. This significantly and dynamically increases the value of
all the data in the structure and, with navigationless access, the data can
be queried by nontechnical users who do not know the data structure. This
schema-free nonlinear processing can be extended significantly by
dynamically joining nonlinear hierarchical data structures, which offers
explosive growth in data value during querying and leads to more powerful
and flexible queries.
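
The example of selecting data from one pathway based on data values from a different pathway can be sketched with standard SQL, run here through Python and its sqlite3 module. The Cust, Invoice, and Addr tables, their columns, and their data are hypothetical. Invoice and Addr are sibling pathways under Cust, and DISTINCT is added in this sketch only to condense rows replicated by the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cust    (CustID TEXT PRIMARY KEY);
    CREATE TABLE Invoice (InvID  TEXT PRIMARY KEY, InvCustID  TEXT);
    CREATE TABLE Addr    (AddrID TEXT PRIMARY KEY, AddrCustID TEXT, City TEXT);
    INSERT INTO Cust    VALUES ('Cust01'), ('Cust02');
    INSERT INTO Invoice VALUES ('Inv01', 'Cust01'), ('Inv02', 'Cust01'),
                               ('Inv03', 'Cust02');
    INSERT INTO Addr    VALUES ('Addr01', 'Cust01', 'Boston'),
                               ('Addr02', 'Cust02', 'Dallas');
""")

# Output comes from the Invoice pathway; the filter is on the Addr pathway.
# The two pathways are related only through their common parent, Cust.
rows = conn.execute("""
    SELECT DISTINCT InvID
    FROM Cust
    LEFT JOIN Invoice ON CustID = InvCustID
    LEFT JOIN Addr    ON CustID = AddrCustID
    WHERE City = 'Dallas'
    ORDER BY InvID
""").fetchall()

print(rows)  # only the invoice of the customer with a Dallas address
```

The query never states how Invoice relates to Addr; the relationship follows from the structure defined by the joins.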

23.3.4 XQuery Does Not Support SQL’s Powerful SELECT Operator


Most XQuery books mention how XQuery very closely parallels SQL process-
ing and contains SQL’s SELECT, FROM, and WHERE operations so that
SQL users should feel comfortable with XQuery’s operation. They claim the
WHERE clause in SQL is basically identical to its use in XQuery, and that the
FROM clause in SQL is similar to the input capability used in the XQuery
LET and FOR operations. This is true, but the extremely powerful SQL
SELECT operation is missing in XQuery. This is described below.
23.3.4.1 Adding or Removing a Data Item from XQuery Is Not Simple
The SELECT list in SQL specifies which data items in the query are to be
output. XQuery has no single, isolated specification of output data items
as SQL does. In XQuery, each output item has to be specifically controlled
in the processing logic. This processing logic usually consists of multiple
nestings of hierarchical looping logic; therefore, each output data item
has to be placed carefully, and possibly separately, within that logic.
Adding or removing an output item in XQuery takes manual programming and
testing. In some cases, adding an output data item will require programming
an additional hierarchical nested loop level.
23.3.4.2 Dynamically Removing a Data Item from an SQL Query Is Easy
With SQL, adding or removing an output item is just the simple process of
adding or removing an item in the SELECT list, which can be done
dynamically at view invocation and requires no additional procedural
program logic or knowledge of the structure. With its automatic and truly
nonprocedural processing, SQL performs only the processing necessary to
produce the result for the data items that are specified in the SELECT
list. XQuery has no automatic equivalent or similar general operation that
can dynamically accommodate the simple ad hoc adding and removing of output
data items. The SELECT list also automatically drives the different
processing required from query to query. It can also dynamically control
the output data structure, and it automatically aggregates (condenses) the
result. These are important aspects of a query language, and XQuery is
missing them.
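
A small sketch shows how little changes when an output item is added or removed in SQL. It uses Python with the standard sqlite3 module; the CustView view here, its tables, and the run helper are hypothetical, and DISTINCT stands in for the condensing of the result, which plain SQL does not do on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cust    (CustID TEXT PRIMARY KEY, CustName TEXT);
    CREATE TABLE Invoice (InvID TEXT PRIMARY KEY, InvCustID TEXT, Amount INTEGER);
    INSERT INTO Cust    VALUES ('Cust01', 'Ames'), ('Cust02', 'Bond');
    INSERT INTO Invoice VALUES ('Inv01', 'Cust01', 10), ('Inv02', 'Cust01', 20);
    CREATE VIEW CustView AS
        SELECT * FROM Cust LEFT JOIN Invoice ON CustID = InvCustID;
""")


def run(select_list):
    """Invoke the view with the given SELECT list; nothing else changes."""
    # The list is interpolated directly, which is acceptable only in a sketch.
    sql = f"SELECT DISTINCT {select_list} FROM CustView ORDER BY {select_list}"
    return conn.execute(sql).fetchall()


print(run("CustID, InvID"))  # both nodes are output
print(run("CustID"))         # the Invoice node is simply left out
```

Removing InvID from the list is the entire change; no looping logic has to be rewritten or retested.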

23.4 A Better SQL/XML Solution Using Standard SQL Is Possible

Why extend SQL’s syntax for a new foreign data format, like XML, when new
data formats can come and go? Is there an existing solid and proven
technology built around XML that warrants supporting it directly in SQL?
Actually there is, but it is not XML itself; it is the hierarchical
structures that XML uses and their processing. A hierarchical structure is
a basic, well-known semantic structure with solid principles for processing
and querying it. This means that an XML-centric solution is unnecessary; a
better choice for an XML integration solution is a hierarchical-centric
solution.

23.4.1 The SQL Hierarchical XML Solution Stays Naturally within SQL
By utilizing SQL’s natural hierarchical processing, the SQL/XML solution
stays within the SQL structured data box. In this way, the internal
hierarchical processing is performed efficiently and naturally across the
entire multipath structure. This is possible because hierarchical
processing is a natural subset of relational processing. This means that
the hierarchical results are not only hierarchically correct, but also
relationally and mathematically correct under ANSI SQL.

23.4.2 XML-Centric Syntax Additions Are Unnecessary


Because of the natural hierarchical processing capability in SQL,
XML-centric syntax additions are unnecessary and, because hierarchical
processing is naturally supported, hierarchical syntax additions are also
unnecessary. Standard SQL alone is sufficient to support XML transparently.
This natural hierarchical processing solution supports conceptual
hierarchical data processing at the nonprocedural SQL level, and it
automatically utilizes the inherent hierarchical semantics to increase the
value of the data. This can be coupled with ad hoc processing for very
powerful decision support capability. This advanced, high-level integration
of XML into SQL adds a protective layer against XML’s unpredictable growing
pains and significantly increases the ROI for the customer, because its
transparent operation eliminates the need for additional XML-centric
coding.

23.5 Conclusion
If there had been no politics involved, SQL would have favored structured
XML data processing. Instead, SQL and XQuery both favor semistructured XML
processing. It does not make sense for SQL, a natural structured data
processor, not to favor structured XML processing. This has left a hole in
SQL’s XML processing, and it is the reason that the SQL hierarchical XML
processor mentioned in this chapter was developed: to handle structured XML
data processing and to prove that its new and advanced hierarchical
processing is possible. It is a solid SQL processor, designed for SQL users
who need XML processing rather than for XQuery users.
24
SQL Hierarchical XML Processor
Operation
The hierarchical processing capabilities of the SQL hierarchical XML processor
are naturally and transparently performed in standard SQL. These include:
data modeling and processing of full multipath hierarchical structures; per-
forming hierarchical joins of multipath data structures; full support and utiliza-
tion of multipath hierarchical query semantics; basic hierarchical processing,
such as node promotion and fragment processing; structure transformations;
and more. The user does not have to know XML or be aware of its processing
in order to use the SQL hierarchical XML processor. This allows SQL/XML
development projects to begin immediately with no additional risk, design,
development, training, or debugging effort, while consistently delivering
efficient, hierarchically accurate results. The internal and external
design of the SQL hierarchical XML processor is covered in this chapter.
The fundamental belief that relational and hierarchical data and processing
cannot be integrated seamlessly and fully is wrong. The SQL hierarchical
XML processor described in this book is implemented using a breakthrough
ANSI SQL-driven technology, described in Figure 24.1, that finally solves
the SQL/XML integration problem. It is the first full nonlinear multipath
hierarchical product for XML processing today. The SQL hierarchical XML
processor automatically enables a standard SQL processor to transparently
access, integrate, and process relational and native XML data at a full
hierarchical processing level. This seamless operation ensures
hierarchically accurate and correctly represented XML results. These
capabilities are missing from XML processing today; providing them
significantly reduces risk and increases ROI for the user.

Without correct hierarchical processing, the hierarchical XML result would
become invalid. This is corrected by processing the data hierarchically,
which the SQL hierarchical XML processor does naturally. Formatting or
transforming XML data based on incorrect knowledge of the structure being
processed also produces incorrect results. This is avoided because the
processor has correct knowledge of the SQL query’s dynamically changing
hierarchical structure, which it uses to format the XML output
automatically. Intermixing standard nonhierarchical relational processing
with hierarchical processing also invalidates the hierarchical results.
Such cases are detected and rejected by the SQL hierarchical XML processor.
With SQL hierarchical processing in control, XML schema metadata is not
required, but it can be utilized for automatic generation of standard SQL
hierarchical views.

24.1 Mapping Relational Hierarchical Structure to Hierarchical Relational Rowset

In Figure 24.1, the relational hierarchical data modeling specification in
the SQL structure definition uses the LEFT join to define a relational
hierarchical structure composed of hierarchically linked relational tables.
These relational tables are mapped to hierarchical structures as nodes,
with the columns mapped as node fields. Each node type can have multiple
child types (siblings), producing multiple pathways. The associated SQL
semantics are hierarchical, causing the data in the hierarchical relational
rowset to be hierarchically processed. The hierarchical result in the
hierarchical relational rowset can then be converted back to its final
relational hierarchical structure for output, using the structure metadata
that was captured from the SQL structure definition when it was originally
processed.
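
The structure definition in Figure 24.1 can be run as ordinary SQL to see the kind of rowset it produces. The sketch below uses Python with the standard sqlite3 module and the join conditions from the figure; the x data columns and all data values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Key columns follow Figure 24.1; the x columns hold illustrative data.
    CREATE TABLE A (a TEXT, x TEXT);
    CREATE TABLE B (b TEXT, x TEXT);
    CREATE TABLE C (c TEXT, x TEXT);
    CREATE TABLE D (d TEXT, x TEXT);
    CREATE TABLE E (e TEXT, x TEXT);
    INSERT INTO A VALUES ('k1', 'a1'), ('k2', 'a2');
    INSERT INTO B VALUES ('k1', 'b1');
    INSERT INTO C VALUES ('k1', 'c1');
    INSERT INTO D VALUES ('k1', 'd1'), ('k1', 'd2'), ('k2', 'd3');
    INSERT INTO E VALUES ('k2', 'e1');
""")

# The LEFT joins define the structure: B and D under A, C under B, E under D.
rowset = conn.execute("""
    SELECT A.x, B.x, C.x, D.x, E.x
    FROM A
    LEFT JOIN B ON A.a = B.b
    LEFT JOIN D ON A.a = D.d
    LEFT JOIN E ON E.e = D.d
    LEFT JOIN C ON B.b = C.c
    ORDER BY A.x, D.x
""").fetchall()

for row in rowset:
    print(row)
```

The b1 and c1 values appear once in the data but twice in the rowset, once for each D occurrence under the same A, and missing nodes show up as NULL. This is the replicated form that has to be mapped back to a hierarchical structure for output.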

24.2 Mapping Physical XML Hierarchical Structure to Hierarchical Relational Rowset

Relational hierarchical data modeling and XML, which is inherently
hierarchical, can both be viewed as hierarchical structures composed of
hierarchically linked nodes, as was shown in Figure 24.1. In the same way
that relational tables are mapped to hierarchical structures as nodes with
the columns mapped as node fields, XML elements are mapped to hierarchical
nodes with XML data strings and attributes mapped as node fields. Each
relational table and/or XML element represents its own specific node type.
This enables the physical or logical hierarchical structure in Figure 24.2
to represent a logical hierarchically modeled set of relational tables.

Relational hierarchical structure:

        A
       / \
      B   D
      |   |
      C   E

SQL structure definition:

    SELECT * FROM A
    LEFT JOIN B ON A.a=B.b
    LEFT JOIN D ON A.a=D.d
    LEFT JOIN E ON E.e=D.d
    LEFT JOIN C ON B.b=C.c

Hierarchical relational rowset:

    A  B  C  D  E
    A  B     D
    A  B  C
    A        D  E
    A  B  C  D
    A  B     D  E

Captured structure metadata: used to restore the hierarchical structure
from the rowset.

Figure 24.1 Mapping logical hierarchical structure to and from relational rowset.

In Figure 24.2, both hierarchical and relational structures are modeled
using the SQL LEFT join in the CustViewX structure view definition. The XML
physical hierarchical structure is defined physically using a nonstandard
format and then goes through an XML-to-relational definition converter.
This allows seamless integration of heterogeneous relational and XML data.
This heterogeneous hierarchical modeling and processing also solves the
relational/XML integration problem with no loss of data or information.
Also shown in Figure 24.2 is a currently operational XML schema converter
that automatically builds the SQL structured view. Many more of these
automatic converters can be added easily. Once the XML physical
hierarchical structure is converted to an SQL logical structure, it remains
that way, which avoids future hierarchical/relational conversions.

24.3 SQL Hierarchical Query Specification with Data Filtering


Figure 24.3 demonstrates the SQL hierarchical XML processor using SQL’s
well-known SELECT, FROM, and WHERE syntax and their intuitive hierarchical
structured operations. These are: the input data and its hierarchical
SQL-modeled structure, specified by the FROM clause; the output data,
specified by the SELECT clause, which indicates the hierarchically related
nodes to return and the data order in each data node; and the data
filtering, specified in the WHERE clause, which hierarchically filters the
data following its hierarchical structure. Used together, these operations
hierarchically process the input data and automatically produce correctly
formatted, structured XML output in an interactive ad hoc fashion.

XML physical hierarchical structure and SQL logical hierarchical structure
(the same shape for both):

        A
       / \
      B   D
      |   |
      C   E

XML structure definition:

    Create XML CustViewX
    SELECT * FROM
    A(A.a Char(8), A.x Char(8))
    B(B.b Int(4), B.x Char(8)), Parent A,
    C(C.c Char(8), C.x Char(8)), Parent B,
    D(D.d Char(8), D.x Char(8)), Parent A,
    E(E.e Char(8), E.x Char(100)), Parent D

XML schema converter: XML Schema to SQL logical hierarchical structure.

SQL logical hierarchical structure definition:

    Create View CustViewX
    SELECT * FROM A
    LEFT JOIN B ON A.a=B.b
    LEFT JOIN D ON A.a=D.d
    LEFT JOIN E ON E.e=D.d
    LEFT JOIN C ON B.b=C.c

Hierarchical relational rowset:

    A  B  C  D  E
    A  B     D
    A  B  C
    A        D  E
    A  B  C  D
    A  B     D  E

Captured structure metadata: used to restore the hierarchical structure
from the rowset.

Figure 24.2 Mapping physical XML structure to and from relational rowset.

In Figure 24.3, the query result structure is controlled by the dynamically
variable SELECT list, which selects nodes A, B, and D based on the data
items selected. The query result is restructured, hierarchically removing
the C node type. Notice that, at the same time, the query still
hierarchically filters C data occurrences where C=1. Even though the C node
type is removed by node promotion to close the gap between node A and node
D, the data filtering for C=1 still occurs, removing data node C2 and its
dependent D2. The resulting data structure is then used to automatically
format the output XML query result.
The captured structure metadata from Figure 24.2 makes this dynamic XML
formatting possible and enables structure-aware processing capabilities,
such as hierarchically optimizing the query based on its hierarchical
structure and processing. This automatically supports global views with no
overhead in order to support schema-free access, as shown in the figure.
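
The combination of leaving a node out of the output while still filtering on it can be sketched in plain SQL. The following Python script uses the standard sqlite3 module and assumes a hypothetical linear structure in which D is a dependent of C; the table layouts, the reference columns, and the data are illustrative and are not taken from Figure 24.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical linear structure: A > B > C > D.
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (b INTEGER, a_ref INTEGER);
    CREATE TABLE C (c INTEGER, b_ref INTEGER);
    CREATE TABLE D (d INTEGER, c_ref INTEGER);
    INSERT INTO A VALUES (1);
    INSERT INTO B VALUES (1, 1);
    INSERT INTO C VALUES (1, 1), (2, 1);
    INSERT INTO D VALUES (1, 1), (2, 2);   -- D1 under C1, D2 under C2
""")

# C is absent from the SELECT list, so it is not output,
# but the WHERE clause still filters on it.
rows = conn.execute("""
    SELECT a, b, d
    FROM A
    LEFT JOIN B ON a = a_ref
    LEFT JOIN C ON b = b_ref
    LEFT JOIN D ON c = c_ref
    WHERE c = 1
""").fetchall()

print(rows)  # D2 is gone because its parent C2 was filtered out
```

In the result, D appears directly after B even though C sits between them in the structure, which corresponds to the gap that node promotion closes in the formatted output.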
Figure 24.3 SQL hierarchical query specification with data filtering.

24.4 SQL Hierarchical Processor Internal Layout


The SQL hierarchical XML processor can be implemented in one of three ways.
It can be implemented from scratch. It can be implemented by adding the new
logic to an existing SQL processor. However, the SQL hierarchical XML
processor described here was implemented using a clean, noninvasive
middleware approach that operates on an in-place standard ANSI SQL
processor. This is shown in Figure 24.4, where the standard SQL
hierarchical middleware processor surrounds the in-place standard SQL
processor and makes it operate hierarchically. The middleware includes a
preprocessor and a postprocessor that extend the standard SQL processor
before and after its execution, turning it into a powerful hierarchical
processor. It also includes an asynchronous access processor that extends
the operation of the active standard SQL processor while it is executing.
This is done in order to support dynamic external hierarchical structure
data input such as XML.

Figure 24.4 SQL hierarchical processor internal layout.


24.5 SQL Hierarchical XML Processor External Operations


Figure 24.4 represents the SQL hierarchical XML processor as a single block of
operational software. The standard SQL processor is surrounded by a new
extender middleware in order to extend its capabilities. Figure 24.5 shows the
SQL hierarchical XML processor with a number of external I/O interfaces to
demonstrate the full range of its operations.
The SQL query is an SQL hierarchical query that is processed immediately to
produce a result. The query can also specify dynamic hierarchical joining
operations. RDB is defined as physical relational tables and logical
relational hierarchical view structures for XML, as was shown in Figure
24.2. Continuing with Figure 24.5, XML and hierarchical data are shown as
both input and output. The static relational data box stores static
hierarchical data that was externally retrieved by the dynamic hierarchical
data input routine via the preprocessor. This data is stored as separate
relational tables so that the tables can be combined differently on
retrieval by the standard SQL processor when needed (an ETL operation). The
dynamic hierarchical data input routine also retrieves external
hierarchical XML structure data in real time, as needed by the standard SQL
processor, via the asynchronous access processor. This data is converted to
relational data and used immediately by the standard SQL processor (an EII
operation); it is not saved. RDB Data is the user’s already-in-place
relational data, which is retrieved by the standard SQL processor as
needed. The hierarchical XML structured output is produced by the
postprocessor.

24.6 SQL Hierarchical XML Processor Operations


The main internal operations shown in the SQL hierarchical processor in Fig-
ure 24.5 are discussed in the following sections. They follow the natural opera-
tional flow indicated.

24.6.1 Preprocessor
Starting at the top of Figure 24.5, the preprocessor accepts an SQL query
definition, an RDB definition, an XML schema, or a hierarchical definition.
RDB definitions are checked for valid specification and for any
modifications that may be necessary, and are then submitted to the standard
SQL processor so that it can accept their RDB Data when required. An XML
schema is checked for a valid hierarchical structure definition.
Hierarchical definitions define generic hierarchical structure input for
both dynamically and statically retrieved hierarchical input. For static
relational data input, the preprocessor retrieves the data immediately
using the dynamic hierarchical data input routine, which also converts it
from hierarchical to relational data. The preprocessor then stores the
relational data in separate relational tables in the static relational data
box for future input when requested by the standard SQL processor. For
dynamic data, the data definition is updated so that it defines a dynamic
external file or UDF call in the standard SQL processor that invokes the
asynchronous access processor when external hierarchical data input is
required. In this way, the dynamic hierarchical data input routine is
invoked whenever external hierarchical data retrieval is necessary.

Figure 24.5 SQL hierarchical XML data flow processor operation.

In the preprocessor, SQL queries are analyzed in order to determine and
extract the data structure that is being accessed. This optimization eliminates
pathways that do not require access. The optimized structure is passed through
to the standard SQL processor for immediate query processing. The optimized
metadata structure is also made available to the other subprocessors in the hier-
archical processor so that they can perform structure-aware processing. Optimi-
zation is also important because it is needed to support global views with no
overhead and schema-free navigationless processing.
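
The idea of eliminating pathways that do not require access can be sketched as follows. The PARENT map and the function names are hypothetical, and the sketch assumes that a node must be accessed if the query references it or if it lies between a referenced node and the root; it is not the optimizer described in this book.

```python
# Hypothetical structure metadata: node -> parent (None marks the root).
PARENT = {"A": None, "B": "A", "C": "B", "D": "A", "E": "D"}


def required_nodes(referenced):
    """Return the referenced nodes plus every ancestor needed to reach them."""
    needed = set()
    for node in referenced:
        # Climb toward the root, stopping early on an already-covered path.
        while node is not None and node not in needed:
            needed.add(node)
            node = PARENT[node]
    return needed


def prune(referenced):
    """Return the structure restricted to the nodes the query must access."""
    needed = required_nodes(referenced)
    return {node: parent for node, parent in PARENT.items() if node in needed}


print(prune({"C"}))       # only the A > B > C pathway remains
print(prune({"B", "E"}))  # C is pruned; D is kept because E depends on it
```

A query that references a single node of a large global view then touches only the one pathway that leads to it.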

24.6.2 Standard SQL Processor


The in-place standard SQL processor receives control from the preprocessor
and then processes the rewritten input SQL to drive the query. This
includes using an external file definition or a UDF request to invoke the
asynchronous access processor, which in turn accesses the correct
hierarchical input using the dynamic hierarchical data input routine. This
process is performed when external hierarchical data is required. The
dynamic hierarchical data input routine is used to retrieve data for both
static and dynamic requests. The standard SQL processor has access to the
dynamic hierarchical data (EII), the static relational data (ETL), and the
standard RDB data.

24.6.3 Asynchronous Access Processor


The asynchronous access processor receives control when external
hierarchical data is required. It invokes the proper dynamic hierarchical
data input routine, such as the one for XML, with a pointer to the metadata
needed to locate and navigate the hierarchical data structure. The input
data is converted from hierarchical form to a relational rowset before it
is returned. This externally derived relational rowset is naturally merged
into the internal rowset on return to the standard SQL processor. This is
seamless heterogeneous processing.
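
One plausible way to convert hierarchical XML input into relational rows is sketched below using Python and its standard xml.etree.ElementTree module. The XML document, the generated row_id and parent_row_id columns, and the flatten function are all assumptions made for illustration; the actual input routine is not shown in this chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML input in attribute mode.
XML_INPUT = """
<A a="a1">
  <B b="b1"><C c="c1"/></B>
  <D d="d1"/>
  <D d="d2"><E e="e1"/></D>
</A>
"""


def flatten(element, parent_row_id, tables, counter):
    """Append one row per element to the table named after its tag."""
    counter[0] += 1
    row_id = counter[0]  # generated key; document order is preserved
    row = {"row_id": row_id, "parent_row_id": parent_row_id, **element.attrib}
    tables.setdefault(element.tag, []).append(row)
    for child in element:
        flatten(child, row_id, tables, counter)


tables = {}
flatten(ET.fromstring(XML_INPUT.strip()), None, tables, [0])

for name, rows in tables.items():
    print(name, rows)
```

Each element type becomes its own set of rows, and the generated parent reference is what a LEFT join would use to link a child node to its parent.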

24.6.4 Postprocessor
The postprocessor receives the result set from the standard SQL processor.
The rowset is converted to a hierarchical structure, which renormalizes the
data by removing replicated data. Replicated data is data that has been
replicated by the relational Cartesian joining process; these replications
need to be removed from the resulting hierarchical structure. Duplicate
data, for our purposes, is different from replicated data: duplicate data
is valid data that represents identical data having the same key value.
These duplicates should not be removed, which requires special processing
to distinguish duplicate data from replicated data. This process is also
used to keep keyless XML structures in their initial order, because order
preservation is assumed in XML. XML does not need to use keys because the
data in the structure is contiguous. The resulting hierarchical structure
is output as structured, formatted XML. This requires structure-aware
processing because the resulting structure can be different from the
initial structure. Joining hierarchical data structures dynamically extends
the structure, changing it. It is also standard not to include empty nodes
in the hierarchical result; node promotion is used to remove them. This
also affects the resulting data structure, as shown in Figure 24.3.
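
The conversion of a rowset back into a hierarchical structure can be sketched in Python with the standard xml.etree.ElementTree module. The PARENT metadata, the rowset, the root wrapper element, and the value attribute are assumptions made for illustration. The sketch treats equal values under the same parent as replicated data and merges them; it makes no attempt to recognize genuine duplicates, which would need additional information such as a row identifier.

```python
import xml.etree.ElementTree as ET

# Assumed captured structure metadata: node -> parent.
PARENT = {"A": None, "B": "A", "C": "B", "D": "A", "E": "D"}
COLUMNS = ["A", "B", "C", "D", "E"]  # rowset column order, parents first

# Hypothetical rowset: b1 and c1 are replicated across the two D rows.
ROWSET = [
    ("a1", "b1", "c1", "d1", None),
    ("a1", "b1", "c1", "d2", None),
    ("a2", None, None, "d3", "e1"),
]


def rowset_to_xml(rows):
    """Rebuild the hierarchy, emitting each replicated occurrence once."""
    root = ET.Element("root")
    seen = {}  # (parent element identity, node name, value) -> element
    for row in rows:
        current = {None: root}  # node name -> element used by this row
        for node, value in zip(COLUMNS, row):
            parent_elem = current.get(PARENT[node])
            if value is None or parent_elem is None:
                continue  # empty node, or its parent is absent in this row
            key = (id(parent_elem), node, value)
            if key not in seen:
                seen[key] = ET.SubElement(parent_elem, node, {"value": value})
            current[node] = seen[key]
    return ET.tostring(root, encoding="unicode")


xml_text = rowset_to_xml(ROWSET)
print(xml_text)
```

The three rows collapse into two A occurrences: the first holds one B (with its C) and two D nodes, and the second holds one D with its E. Empty nodes are simply never emitted.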

24.7 Conclusion
This chapter has described how the SQL hierarchical XML processor operates
internally and externally. What makes this XML processor novel is that it
uses standard ANSI SQL syntax and semantics to make SQL naturally operate
hierarchically. The chapter showed how SQL relational data processing can
be mapped naturally to hierarchical data processing and back again. This is
possible because hierarchical processing is a subset of relational
processing, which has been proven by the SQL hierarchical XML processor.
The chapter has shown where the relational-to-hierarchical and
hierarchical-to-relational conversions take place and how heterogeneous
processing is performed seamlessly. Also covered was the pre-, post-, and
asynchronous processing necessary to make SQL operate hierarchically and to
support XML input and formatted XML output. It also showed how external
static and dynamic hierarchical data access is achieved and how both kinds
of data can be processed together.
25
SQL Hierarchical XML Processor
Examples
This final chapter shows the SQL hierarchical XML processor in actual
operation, demonstrating its common and critical operations through a series
of executed query examples. The live query results are shown in XML using
attribute mode, which is the default for the SQL hierarchical XML processor.
The examples are annotated to help explain their internal and external
operation. Supporting XML requires the hierarchical processor to support full
multipath nonlinear processing. Figure 25.1 shows how the relational StoreView
and its subviews, CustView and EmpView, are represented hierarchically for use
in this chapter.
Figure 25.2 shows the hierarchical data and its hierarchical structure. This
can be used to further verify the XML hierarchical results in this chapter.
Figure 25.3 contains the hierarchical structure definitions that explain what
the different line and box figures mean. These are used to annotate the
examples in this chapter. All the operations shown in the examples of this
chapter have been explained in this book.

25.1 Node Selection with SQL SELECT Operation


Using the SQL hierarchical XML processor, output node selection is controlled
by which nodes are identified on the invoking SQL statement. A node is
selected for output if at least one data field is identified for output from it
using the SELECT operation. Node promotion, node collection, and fragment
operation rely on node selection, and are discussed in this section.

Store View Structure

    Store
        CustView: Cust
            Invoice
            Addr
        EmpView: Emp
            Dpnd
            Eaddr

Store View

    CREATE VIEW StoreView AS
    SELECT * FROM Store
    LEFT JOIN CustView ON StoreID=CustStoreID
    LEFT JOIN EmpView ON StoreID=EmpStoreID

    SELECT StoreID, CustID, InvID, AddrID, EmpID, DpndID, EaddrID
    FROM StoreView

Intermediate Relational Result Set

    StoreID  CustID  InvID  AddrID  EmpID  DpndID  EaddrID
    Store01  Cust01  Inv01  Addr01  Emp01  Dpnd01  Addr01
    Store01  Cust01  Inv01  Addr01  Emp02          Addr03
    Store01  Cust01  Inv02  Addr01  Emp01  Dpnd01  Addr01
    Store01  Cust01  Inv02  Addr01  Emp02          Addr03
    Store01  Cust02  Inv03  Addr02  Emp01  Dpnd01  Addr01
    Store01  Cust02  Inv03  Addr02  Emp02          Addr03
    Store01  Cust02  Inv03  Addr04  Emp01  Dpnd01  Addr01
    Store01  Cust02  Inv03  Addr04  Emp02          Addr03
    Store01  Cust03         Addr03  Emp01  Dpnd01  Addr01
    Store01  Cust03         Addr03  Emp02          Addr03

Figure 25.1 StoreView and data.
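
The intermediate relational result set of Figure 25.1 can be reproduced with any SQL engine. The following is a minimal sketch that uses SQLite through Python's standard `sqlite3` module as a stand-in; it is not the SQL hierarchical XML processor and returns flat rows rather than XML. The table layouts and the CustView and EmpView definitions are assumptions inferred from the column names and data in the figures; only the StoreView definition is taken from Figure 25.1.

```python
import sqlite3

# Sample data transcribed from Figures 25.1 and 25.2. The table layouts and the
# CustView/EmpView definitions are assumptions; only StoreView is from the text.
SETUP = """
CREATE TABLE Store  (StoreID TEXT);
CREATE TABLE Cust   (CustID TEXT, CustStoreID TEXT);
CREATE TABLE Invoice(InvID TEXT, InvCustID TEXT);
CREATE TABLE Addr   (AddrID TEXT, AddrCustID TEXT);
CREATE TABLE Emp    (EmpID TEXT, EmpStoreID TEXT, EmpCustID TEXT);
CREATE TABLE Dpnd   (DpndID TEXT, DpndEmpID TEXT);
CREATE TABLE Eaddr  (EaddrID TEXT, EaddrCustID TEXT);
INSERT INTO Store VALUES ('Store01');
INSERT INTO Cust VALUES
  ('Cust01','Store01'),('Cust02','Store01'),('Cust03','Store01');
INSERT INTO Invoice VALUES
  ('Inv01','Cust01'),('Inv02','Cust01'),('Inv03','Cust02');
INSERT INTO Addr VALUES
  ('Addr01','Cust01'),('Addr02','Cust02'),('Addr04','Cust02'),('Addr03','Cust03');
INSERT INTO Emp VALUES
  ('Emp01','Store01','Cust01'),('Emp02','Store01','Cust03');
INSERT INTO Dpnd VALUES ('Dpnd01','Emp01');
INSERT INTO Eaddr VALUES ('Addr01','Cust01'),('Addr03','Cust03');
CREATE VIEW CustView AS SELECT * FROM Cust
  LEFT JOIN Invoice ON CustID=InvCustID
  LEFT JOIN Addr ON CustID=AddrCustID;
CREATE VIEW EmpView AS SELECT * FROM Emp
  LEFT JOIN Dpnd ON EmpID=DpndEmpID
  LEFT JOIN Eaddr ON EmpCustID=EaddrCustID;
CREATE VIEW StoreView AS SELECT * FROM Store
  LEFT JOIN CustView ON StoreID=CustStoreID
  LEFT JOIN EmpView ON StoreID=EmpStoreID;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# The query from Figure 25.1; ORDER BY is added only to make the output stable.
rows = conn.execute(
    "SELECT StoreID, CustID, InvID, AddrID, EmpID, DpndID, EaddrID "
    "FROM StoreView ORDER BY CustID, InvID, AddrID, EmpID"
).fetchall()

for row in rows:
    print(row)  # None marks the blank (null) cells of the figure
```

The two sibling paths under Store (five Cust-side combinations and two Emp-side combinations) multiply into the ten rows of the figure, which is the Cartesian product replication that the hierarchical processor later removes.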

25.1.1 Selecting a Single Linear Path


Automatic hierarchical processing has usually been limited to accessing a
single hierarchical path at a time. One of the main benefits of the SQL
hierarchical XML processor is that the user does not need to know the
hierarchical structure in order to use it, because it is a multipath processor
and supports schema-free processing. This means that limiting data access to a
single path is no longer a concern. This also takes care of processing the data
out of its natural database node order. The query shown in Figure 25.4 is
specified out of node order (CustID before the root StoreID) in the SELECT
list, but it still operates correctly, returning the nodes in their correct
node order. This again is because user navigation, or knowledge of the
structure, is not necessary with the SQL hierarchical XML processor, which
supports schema-free processing. Notice in the XML result in Figure 25.4 that
data value Cust03 was hierarchically preserved even though it has no
lower-level invoice child. Comparing the XML in Figure 25.4 with the
hierarchical data tree in Figure 25.2, it can be seen that this is the correct
SQL result.

Intermediate Relational Result Set

    (same rows as the intermediate relational result set in Figure 25.1)

Hierarchical Data Tree

    Store01
        Cust01
            Inv01, P
            Inv02, O
            Addr01
        Cust02
            Inv03, O
            Addr02
            Addr04
        Cust03
            Addr03
        Emp01, F
            Dpnd01, D
            EAddr: Addr01
        Emp02, Null
            EAddr: Addr03

Figure 25.2 Hierarchical data used in examples.

Hierarchical Structure Processing Definitions

    Solid box:     Node is selected for output
    Dashed box:    Node is not selected for output
    Solid line:    Connects nodes into active structure
    Dashed line:   Connects nodes not in active structure
    Dashed arrow:  Qualification flow or a note pointer
    Solid arrow:   Structure connector and default qualification flow

Figure 25.3 Hierarchical structure legend.

25.1.2 Node Promotion with Single Path


The SQL query in Figure 25.5 is similar to the previous query, except that the
Cust node is not selected. The user does not include any fields from the Cust
node in the SELECT list. Normal hierarchical operation is to simply skip over
the Cust node and to keep accessing other fields down the path. This is the
same operation as relational projection, which also slices out unselected
columns; these columns can be equated to hierarchical nodes. This node
promotion operation can be turned off in order to keep the accessed structure
intact. This is specified with the KEEP NODE option in the FOR XML clause at
the end of the query. It is useful for XML navigation purposes, where the same
navigation logic can be used as the one defined for the original structure.

    SELECT CustID, StoreID, InvID FROM StoreView

    <root>
      <store storeid="Store01">
        <cust custid="Cust01">
          <invoice invid="Inv01"/>
          <invoice invid="Inv02"/>
        </cust>
        <cust custid="Cust02">
          <invoice invid="Inv03"/>
        </cust>
        <cust custid="Cust03">
        </cust>
      </store>
    </root>

    StoreView structure: Store > {Cust > {Invoice, Addr}, Emp > {Dpnd, Eaddr}}
    Result structure:    Store > Cust > Invoice

Figure 25.4 Selecting a single pathway.

    SELECT StoreID, InvID FROM StoreView

    <root>
      <store storeid="Store01">
        <invoice invid="Inv01"/>
        <invoice invid="Inv02"/>
        <invoice invid="Inv03"/>
      </store>
    </root>

    StoreView path:                 Store > Cust > Invoice
    Node promotion result:          Store > Invoice
    Same as relational projection:  (Store, Cust, Invoice) to (Store, Invoice)

Figure 25.5 Node promotion.
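
The parallel between node promotion and relational projection can be illustrated outside the processor. The sketch below is an assumption-laden stand-in: it uses SQLite with hand-built Store, Cust, and Invoice tables (layouts inferred from the figures), projects away the Cust columns, and then regroups the flat rows in Python. The regrouping step is only an illustration of the idea; it is not how the SQL hierarchical XML processor is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store  (StoreID TEXT);
CREATE TABLE Cust   (CustID TEXT, CustStoreID TEXT);
CREATE TABLE Invoice(InvID TEXT, InvCustID TEXT);
INSERT INTO Store VALUES ('Store01');
INSERT INTO Cust VALUES
  ('Cust01','Store01'),('Cust02','Store01'),('Cust03','Store01');
INSERT INTO Invoice VALUES
  ('Inv01','Cust01'),('Inv02','Cust01'),('Inv03','Cust02');
""")

# Projection: CustID is joined through but not selected, so the Cust level
# disappears from the result while Store-to-Invoice ancestry is preserved.
rows = conn.execute("""
    SELECT DISTINCT StoreID, InvID
    FROM Store
    LEFT JOIN Cust ON StoreID=CustStoreID
    LEFT JOIN Invoice ON CustID=InvCustID
    ORDER BY StoreID, InvID
""").fetchall()

# Regroup the flat rows as Store -> [Invoice]. The NULL InvID row comes from
# Cust03, which has no invoice, and contributes no Invoice occurrence.
promoted = {}
for store_id, inv_id in rows:
    promoted.setdefault(store_id, [])
    if inv_id is not None:
        promoted[store_id].append(inv_id)

print(promoted)
```

The three invoices end up directly under Store01, matching the promoted structure in Figure 25.5.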

25.1.3 Node Collection with Multiple Paths


As mentioned previously, the user does not have to know the structure or
perform navigation. Therefore, the processing and result can involve multiple
paths. This does not present a problem for the user since there are no special
requirements for specifying paths on the SELECT request. This is shown in the
request in Figure 25.6, which selects fields from the Invoice, Dpnd, and Eaddr
nodes, which are located on separate paths. As an additional test, two fields
are selected from the Dpnd node. DpndCode is referenced first, which is out of
node order and separate from the other Dpnd node field, DpndID. This
demonstrates that there are no data order requirements for users to worry
about.
To make this query more interesting, SQL SELECT references for the intervening
Cust and Emp nodes have been excluded, causing node promotion to occur on both
paths in Figure 25.6. This causes the Store node to collect the promoted nodes
from both paths. This node collection processing is demonstrated by the
generated structure and XML in the figure. The figure also shows the
additional capability of changing the default collection node name from “root”
to “storeview” by using the UNDER option of FOR XML to specify a different
collection name.

25.1.4 Selecting Structure Fragments


Structure fragments are similar to node promotion in that they are partial
node structures isolated by the SQL SELECT operation. Here, however, the
SELECT list also excludes the original root node, Store, which makes separate
fragments possible. FOR XML with NO COLLECTION is used in order to avoid
generating a collection node. As the generated structure and XML in Figure
25.7 show, multiple occurrences of the Cust and Emp fragments are returned.

    SELECT DpndCode, StoreID, InvID, DpndID, EaddrID FROM StoreView
    FOR XML UNDER StoreView

    <storeview>
      <store storeid="Store01">
        <invoice invid="Inv01"/>
        <invoice invid="Inv02"/>
        <invoice invid="Inv03"/>
        <dpnd dpndcode="D" dpndid="Dpnd01"/>
        <eaddr eaddrid="Addr01"/>
        <eaddr eaddrid="Addr03"/>
      </store>
    </storeview>

    StoreView structure: Store > {Cust > {Invoice, Addr}, Emp > {Dpnd, Eaddr}}
    Result structure:    Store > {Invoice, Dpnd, Eaddr}

Figure 25.6 Node collection.



    SELECT CustID, InvID, EmpID, DpndID, EmpStatus FROM StoreView
    FOR XML NO COLLECTION

    <cust custid="Cust01">
      <invoice invid="Inv01"/>
      <invoice invid="Inv02"/>
    </cust>
    <cust custid="Cust02">
      <invoice invid="Inv03"/>
    </cust>
    <cust custid="Cust03">
    </cust>
    <emp empid="Emp01" empstatus="F">
      <dpnd dpndid="Dpnd01"/>
    </emp>
    <emp empid="Emp02" empstatus="">
    </emp>

    StoreView structure: Store > {Cust > {Invoice, Addr}, Emp > {Dpnd, Eaddr}}
    Result fragments:    Cust > Invoice; Emp > Dpnd

Figure 25.7 Selecting structure segments.

25.2 Multipath Hierarchical Data Filtering using WHERE Clause


The SQL WHERE clause filters the data of the entire structure. Whether it is
seen as excluding or including data depends on your view of the operation. If
the WHERE clause is not specified, then all of the data is included, which
means that, when specified, it can only be excluding data. On the other hand,
a WHERE clause is written as if it specifies the data to be included. It is
easier to comprehend the positive, so the WHERE clause is described here as
specifying what data to include. This is not as simple as it sounds: specifying
a given value to include also qualifies all related data occurrences for
preservation and eliminates all others. Nonlinear multipath qualification can
be complex to understand, but it is extremely powerful, and because SQL
performs it automatically and transparently, the user does not need to be
concerned with how it is done. This process is described further in the
following examples.

25.2.1 Downward Path Data Qualification


The SQL query in Figure 25.8 returns all invoice IDs (Inv01 and Inv02) under
the customer occurrence “Cust01” because the test on CustID is positive for
that occurrence. All invoice IDs under the qualified ancestor data occurrence
“Cust01” are returned. If “Cust01” occurred in other stores, their data would
also be included. Also note that Inv03 did not qualify because its parent Cust
occurrence, Cust02, did not qualify.

    SELECT CustID, InvID FROM StoreView WHERE CustID='Cust01'

    <root>
      <cust custid="Cust01">
        <invoice invid="Inv01"/>
        <invoice invid="Inv02"/>
      </cust>
    </root>

    Structure: Cust (WHERE CustID='Cust01') > Invoice

Figure 25.8 Downward qualification.

25.2.2 Upward Path Data Qualification


The SQL query in Figure 25.9 returns only the Customer ID (“Cust01”) that is
associated with the qualifying invoice “Inv01” located below it. This is
because only the single path occurrence up from the qualified invoice data is
qualified. Also note that invoice Inv02 would also have qualified Cust01
because it is a twin occurrence of Inv01, while invoice Inv03 would have
qualified Cust02. Twins have the same node type and the same parent
occurrence. Child nodes of different types are located on different sibling
paths under their parent node type, whereas twins are on the same path because
they have the same parent node data occurrence. Occurrences of one child node
type can therefore have different parent data occurrences but still have the
same parent node type.

25.2.3 Bidirectional Data Qualification


The SQL query in Figure 25.10 returns all invoice IDs (Inv01 and Inv02) under
the customer occurrence “Cust01” that tested positive, and only the Store ID
located on the single qualified path above. This result is a combination of
the previous two single path queries: it qualifies both down to Invoice and up
to Store at the same time. With hierarchical processing, these queries can
actually be processed in either direction first. Top-down processing is
usually more efficient because, if the root is disqualified, the children do
not need to be accessed.

    SELECT CustID FROM StoreView WHERE InvID='Inv01'

    <root>
      <cust custid="Cust01"/>
    </root>

    Structure: Cust > Invoice (WHERE InvID='Inv01')

Figure 25.9 Upward qualification.

    SELECT StoreID, InvID FROM StoreView WHERE CustID='Cust01'

    <root>
      <store storeid="Store01">
        <invoice invid="Inv01"/>
        <invoice invid="Inv02"/>
      </store>
    </root>

    Input structure:  Store > Cust (WHERE CustID='Cust01') > Invoice
    Result structure: Store > Invoice

Figure 25.10 Bidirectional qualification.
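
The three single-path qualification cases of this section can be checked against a plain relational engine. The following sketch is only an approximation under stated assumptions: SQLite stands in for the processor, the Store, Cust, and Invoice tables are hand-built from the figures, and the `run` helper is a hypothetical name introduced for illustration. It shows the flat rows that the hierarchical results in Figures 25.8 through 25.10 are built from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store  (StoreID TEXT);
CREATE TABLE Cust   (CustID TEXT, CustStoreID TEXT);
CREATE TABLE Invoice(InvID TEXT, InvCustID TEXT);
INSERT INTO Store VALUES ('Store01');
INSERT INTO Cust VALUES
  ('Cust01','Store01'),('Cust02','Store01'),('Cust03','Store01');
INSERT INTO Invoice VALUES
  ('Inv01','Cust01'),('Inv02','Cust01'),('Inv03','Cust02');
""")

# Flat stand-in for the Store > Cust > Invoice path of StoreView.
PATH = """
    FROM Store
    LEFT JOIN Cust ON StoreID=CustStoreID
    LEFT JOIN Invoice ON CustID=InvCustID
"""

def run(select_list, where):
    """Hypothetical helper: distinct qualifying rows, sorted for stable output."""
    sql = f"SELECT DISTINCT {select_list} {PATH} WHERE {where}"
    return sorted(conn.execute(sql).fetchall())

# Downward (Figure 25.8): every invoice under the qualified Cust01.
downward = run("CustID, InvID", "CustID='Cust01'")
# Upward (Figure 25.9): only the customer on the path above Inv01.
upward = run("CustID", "InvID='Inv01'")
# Bidirectional (Figure 25.10): up to Store and down to Invoice at once.
both = run("StoreID, InvID", "CustID='Cust01'")

print(downward, upward, both, sep="\n")
```

Inv03 never appears because its parent, Cust02, fails the test, which is the behavior described for Figure 25.8.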

25.3 Simple Multipath Nonlinear Data Qualification


A simple multipath data qualification example involves selecting data from one
path of a hierarchical structure based on data in another path of the same
structure. Such a query is simple to specify, but it involves the complex
hierarchical processing of the two paths under the lowest common ancestor
(LCA) node data occurrence in order to get a meaningful result. In academic
terms, this fuzzy type of database processing is known as LCA queries. XQuery
does not handle LCA queries automatically. The XQuery coder must specify the
procedural LCA logic for multipath processing, if it is done at all. If it is
not performed for multipath processing, the result will probably not be
meaningful.
LCAs used for hierarchical structure processing are an important concept, but
are not well understood. In physical hierarchical databases, LCA processing is
performed by tree walking back and forth across the paths. In relational
databases, this process is actually performed a single row at a time because
of the relational engine's Cartesian product generation of the hierarchical
data. This Cartesian product is guided by the LCA node naturally. What this
means is that multipath queries are automatically and correctly processed for
the user. There are two types of LCA processing performed in SQL:
SELECT-driven LCA processing and WHERE-driven LCA processing. Both are
described below.

25.3.1 LCA Many-to-One Result Data Qualification


The SQL query in Figure 25.11 returns “Addr01,” which is related to the value
“Inv02” on another path through their LCA data occurrence, “Cust01.” Sibling
paths across a query are related by their LCA data occurrence. This is
SELECT-driven LCA processing, which uses both the SELECT list and WHERE clause
references to isolate the LCA. Note that Inv01 also qualifies Addr01
(many-to-one) because Inv01 and Inv02 are twin occurrences.

25.3.2 LCA One-to-Many Result Data Qualification


The SQL query in Figure 25.12 returns the invoices (“Inv01” and “Inv02”) under
the LCA data occurrence “Cust01” for the value “Addr01” (one-to-many). All
occurrences are selected under the LCA node occurrence, as in the downward
path qualification covered previously in Section 25.2.1.

25.3.3 LCA Can be Located Higher than Parent


The SQL query in Figure 25.13 returns all employees (“Emp01” and “Emp02”)
under the LCA data occurrence “Store01.” This shows that the LCA can be
anywhere up the structure, with possibly many levels between the LCA and its
initial lower-level references, such as Inv02 in Figure 25.13. In this case,
Cust01, the parent of Inv02, is itself under the LCA.

    SELECT AddrID FROM StoreView WHERE InvID='Inv02'

    <root>
      <addr addrid="Addr01"/>
    </root>

    Structure: Cust01 (LCA) > {Inv02 (WHERE InvID='Inv02'),
                               Addr01 (SELECT AddrID)}

Figure 25.11 Many-to-one qualification.

    SELECT InvID FROM StoreView WHERE AddrID='Addr01'

    <root>
      <invoice invid="Inv01"/>
      <invoice invid="Inv02"/>
    </root>

    Structure: Cust01 (LCA) > {Inv01, Inv02,
                               Addr01 (WHERE AddrID='Addr01')}

Figure 25.12 One-to-many qualification.

    SELECT EmpID FROM StoreView
    WHERE InvID='Inv02'

    <root>
      <emp empid="Emp01"/>
      <emp empid="Emp02"/>
    </root>

    Structure: Store01 (LCA) > {Cust01 > Inv02 (WHERE InvID='Inv02'),
                                Emp01, Emp02}

Figure 25.13 LCA located higher than parent.
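
The claim that the Cartesian product naturally carries the LCA relationship can be observed in the flat rows themselves. The sketch below makes several assumptions: SQLite stands in for the processor, the tables are hand-built from Figures 25.1 and 25.2, StoreView is flattened into a single chain of LEFT JOINs, and `ids` is a hypothetical helper. It runs the three SELECT-driven LCA queries of Figures 25.11 through 25.13.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store  (StoreID TEXT);
CREATE TABLE Cust   (CustID TEXT, CustStoreID TEXT);
CREATE TABLE Invoice(InvID TEXT, InvCustID TEXT);
CREATE TABLE Addr   (AddrID TEXT, AddrCustID TEXT);
CREATE TABLE Emp    (EmpID TEXT, EmpStoreID TEXT);
INSERT INTO Store VALUES ('Store01');
INSERT INTO Cust VALUES
  ('Cust01','Store01'),('Cust02','Store01'),('Cust03','Store01');
INSERT INTO Invoice VALUES
  ('Inv01','Cust01'),('Inv02','Cust01'),('Inv03','Cust02');
INSERT INTO Addr VALUES
  ('Addr01','Cust01'),('Addr02','Cust02'),('Addr04','Cust02'),('Addr03','Cust03');
INSERT INTO Emp VALUES ('Emp01','Store01'),('Emp02','Store01');
""")

# Assumed flattening of StoreView: each row pairs one occurrence from every
# sibling path under the same parent occurrence.
FLAT = """
    FROM Store
    LEFT JOIN Cust    ON StoreID=CustStoreID
    LEFT JOIN Invoice ON CustID=InvCustID
    LEFT JOIN Addr    ON CustID=AddrCustID
    LEFT JOIN Emp     ON StoreID=EmpStoreID
"""

def ids(column, where):
    """Hypothetical helper: distinct non-NULL values of one column."""
    sql = f"SELECT DISTINCT {column} {FLAT} WHERE {where}"
    return sorted(v for (v,) in conn.execute(sql) if v is not None)

addr_for_inv02 = ids("AddrID", "InvID='Inv02'")   # LCA is Cust01
inv_for_addr01 = ids("InvID", "AddrID='Addr01'")  # LCA is Cust01
emp_for_inv02 = ids("EmpID", "InvID='Inv02'")     # LCA is Store01

print(addr_for_inv02, inv_for_addr01, emp_for_inv02)
```

No row ever pairs Inv02 with Addr02, because they sit under different Cust occurrences; that absence is what the text calls being related by the same LCA data occurrence.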

25.3.4 LCA Data from Up and Down the Structure


In the SQL query in Figure 25.14, “Inv02” is selected along with “Cust01” and
“Store01” because they are on the qualified path up the structure. On the
downward side of the LCA occurrence “Store01,” all employees (“Emp01” and
“Emp02”) are selected. The downward path can qualify multiple occurrences.

25.3.5 Multiple LCAs


The SQL query in Figure 25.15 is similar to the previous query, but removes
InvID and StoreID from the SQL SELECT list and adds AddrID. With this query,
“Addr01,” “Cust01,” and the employees (“Emp01” and “Emp02”) are selected. The
big difference with this query is that there are two LCAs, as indicated in the
figure. “Store01” is an LCA, the same as in the last example, but selecting
AddrID makes “Cust01” a second (nested) LCA, which selects all AddrIDs under
the qualified Cust01. In this case, there is only one, “Addr01.” These
multiple LCAs are internally more complex, but not for the coder.

    SELECT InvID, CustID, StoreID, EmpID FROM StoreView WHERE InvID='Inv02'

    <root>
      <store storeid="Store01">
        <cust custid="Cust01">
          <invoice invid="Inv02"/>
        </cust>
        <emp empid="Emp01"/>
        <emp empid="Emp02"/>
      </store>
    </root>

    Structure: Store01 (LCA) > {Cust01 > Inv02 (WHERE InvID='Inv02'),
                                Emp01, Emp02}

Figure 25.14 LCA qualifies up and down the path.



Figure 25.15 Multiple LCAs.

25.4 Complex Multipath Nonlinear Data Qualification


So far, simple multipath data qualification has been shown, involving simple
single-sided qualification tests that produce static LCAs. However, more
complex qualification tests are possible, producing more complex hierarchical
decision logic in which qualification is based on values in multiple paths.
This involves compound WHERE clauses that reference multiple paths and produce
WHERE-driven LCA logic. This uses the natural Cartesian product operation of
producing all data combinations to test all combinations of qualification
tests across paths. When hierarchical structures are defined in SQL, the
Cartesian product data replications formed around join points are also the LCA
points. Therefore, the control of data replication is naturally centered on
the LCAs and their hierarchical processing logic. In this way, the
combinations of tests performed are geared to the LCA logic.

25.4.1 LCA Determines Range of Combinations for Decision Logic


The SQL query in Figure 25.16 tests the two sibling paths under the LCA node
Cust. One of the data combinations tested matches both the test for “Inv02” on
one side and the test for “Addr01” on the other side, selecting “Cust01.” This
shows how the relational Cartesian product can perform these complex multipath
hierarchical LCA tests on Invoice and Addr (avoiding hierarchical tree
traversal logic). Note that the Cust02 and Cust03 node occurrences were not
qualified because they did not have matching Invoice and Addr node data
occurrences.

25.4.2 LCA Data Combinations are Controlled by Data Occurrence


The SQL query in Figure 25.17 returns a null result because no data is
qualified. This is because, while there is an “Inv02” and an “Addr02” data
occurrence, they are not related by the same LCA data occurrence. In Figure
25.16, the LCA data occurrence was Cust01, which was the LCA occurrence of
both. In Figure 25.17, Inv02 and Addr02 under the Cust node are in different
parent data occurrences and are not considered meaningfully related because
they have different LCAs. These are standard, nonlinear, hierarchical
processing rules, and SQL processing naturally follows them with its Cartesian
product-controlled data replication. Adding StoreID to the SELECT list does
not select StoreID at the higher level either, because the LCA and the
Cartesian product replicated data generation remain the same for the Invoice
and Addr nodes.

    SELECT CustID FROM StoreView WHERE InvID='Inv02' AND AddrID='Addr01'

    <root>
      <cust custid="Cust01"/>
    </root>

    Structure: Cust01 (LCA for WHERE decision logic) >
                   {Inv01, Inv02 (WHERE InvID='Inv02'),
                    Addr01 (AND AddrID='Addr01')}

Figure 25.16 WHERE clause LCA.

25.4.3 Variable LCAs with OR Decision Logic


So far, as shown with the LCA AND decision logic, both sides work together.
However, LCA OR decision logic at the hierarchical level can get very tricky.
The SQL query in Figure 25.18 returns both invoices (“Inv01” and “Inv02”) and
“Addr01.” Normally with OR operations, if the first condition is true, then
the second condition on the right side does not require testing. This is not
the case for hierarchical processing semantics; if it were, then “Inv01” would
not be selected. One might think that it should not be selected because the
left side already tests true with InvID='Inv02', but it is selected because
the sibling (right) side test, AddrID='Addr01', was also evaluated and tested
true. It qualified both invoices under the LCA “Cust01.”

Figure 25.17 LCA data combinations.

Figure 25.18 Variable LCA.

This means that both sides of the OR condition always need to be tested at the
hierarchical query level. This result and its hierarchical logic can be
verified by breaking the query into two queries, each with one side of the
WHERE clause, and unioning the results together. This logic also results in
LCA qualification logic being dynamically switched between the left and right
OR conditions, depending on which side is true, as the example in Figure 25.18
tests. This double-sided testing of OR conditions in the relational WHERE
clause is naturally performed by the Cartesian product, which builds all
combinations so that both sides of the WHERE clause are eventually tested over
multiple rows containing replicated data. As a result, the query operates
correctly in relational hierarchical processing.
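
The AND and OR behavior described in Sections 25.4.1 through 25.4.3, including the suggested check of splitting the OR query in two and unioning the results, can be tried on flat relational rows. This is a sketch under assumptions: SQLite stands in for the processor, the Cust, Invoice, and Addr tables are hand-built from the figures, the OR query's SELECT list (InvID, AddrID) is inferred from the described result, and `query` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cust   (CustID TEXT);
CREATE TABLE Invoice(InvID TEXT, InvCustID TEXT);
CREATE TABLE Addr   (AddrID TEXT, AddrCustID TEXT);
INSERT INTO Cust VALUES ('Cust01'),('Cust02'),('Cust03');
INSERT INTO Invoice VALUES
  ('Inv01','Cust01'),('Inv02','Cust01'),('Inv03','Cust02');
INSERT INTO Addr VALUES
  ('Addr01','Cust01'),('Addr02','Cust02'),('Addr04','Cust02'),('Addr03','Cust03');
""")

# Invoice and Addr are sibling paths under Cust, so each row pairs one invoice
# with one address of the same customer (the LCA occurrence).
JOINED = """
    FROM Cust
    LEFT JOIN Invoice ON CustID=InvCustID
    LEFT JOIN Addr    ON CustID=AddrCustID
"""

def query(select_list, where):
    """Hypothetical helper: distinct qualifying rows, sorted for stable output."""
    sql = f"SELECT DISTINCT {select_list} {JOINED} WHERE {where}"
    return sorted(conn.execute(sql).fetchall())

# AND under one LCA occurrence (Figure 25.16) versus across two (Figure 25.17).
and_hit = query("CustID", "InvID='Inv02' AND AddrID='Addr01'")
and_miss = query("CustID", "InvID='Inv02' AND AddrID='Addr02'")

# OR: the right side also qualifies Inv01 through the shared Cust01.
or_rows = query("InvID, AddrID", "InvID='Inv02' OR AddrID='Addr01'")

# The same result obtained by unioning the two single-sided queries.
union_rows = sorted(
    set(query("InvID, AddrID", "InvID='Inv02'"))
    | set(query("InvID, AddrID", "AddrID='Addr01'"))
)

print(and_hit, and_miss, or_rows, union_rows, sep="\n")
```

The empty AND result for Inv02 and Addr02 follows directly from the join: no row contains both values, because no single Cust occurrence owns both.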

25.4.4 Complex Multipath LCA Decision Logic


The SQL query in Figure 25.19 demonstrates that complex decision logic can
involve more than one instance of LCA decision logic. This query first
evaluates the AND condition, which is false (no LCA combination is found under
a Customer node for “Inv02” and “Addr02”), and then evaluates this false
condition with the right side of the OR condition under the LCA node Store,
which is true (DpndID='Dpnd01'). Based on this, “Store01” is selected, all
customers (“Cust01,” “Cust02,” and “Cust03”) are selected because they are
opposite-side qualified, and only employee “Emp01” is selected. This is
because the opposite side of employee “Emp01” did not qualify; only its own
side with “Dpnd01” did, which qualified only “Emp01” above it. This is all
performed under the covers of SQL automatically and accurately.

Figure 25.19 Multipath LCA.

25.4.5 LCA Logic too Complex to Hand Code


As mentioned previously, LCA processing is necessary for multipath processing,
and automatic LCA processing is necessary for schema-free processing.
Schema-free processing is necessary for complex queries, and most queries do
reference multiple pathways. This section has shown that multipath queries
require extremely complex LCA processing that, in turn, requires automatic
processing. SQL performs all of this LCA processing automatically when it is
operating hierarchically. This is quite remarkable since LCA processing was
not designed into SQL; it is performed by the Cartesian product processing.

25.5 Backward Path Data Filtering


The SQL ON clause supplies far more control than the WHERE clause because it
operates from a specific join point. What is not generally realized is that
the ON clause can also reference node values that are further back up the path
and are still active, which further increases its dynamic, data-driven
control. This offers new solutions for dynamic processing problems.

25.5.1 Static Backward Path Data Filtering


When joining tables or views, the ON clause can also specify additional join
criteria with the AND operator. These act as an additional data filter that
takes effect at the active join location. What gives this capability
exceptional power is that the filter criteria can be supplied from anywhere
back up the path from the active position. In the SQL example in Figure 25.20,
the Addr node ON clause uses the EmpStatus value from the Emp node above to
determine the filtering for the Addr node. In this example, you will notice
that the Addr node is present only for employees who are full-time (have an
“F” status code). This filtering does not affect the Emp and Cust nodes
because the LEFT join processing only affects the Addr node and its forward
path.

    SELECT CustID, EmpID, AddrID, EmpStatus
    FROM Emp
    LEFT JOIN Cust ON CustID=EmpCustID
    LEFT JOIN Addr ON CustID=AddrCustID
                   AND EmpStatus='F'

    <root>
      <emp empid="Emp01" empstatus="F">
        <cust custid="Cust01">
          <addr addrid="Addr01"/>
        </cust>
      </emp>
      <emp empid="Emp02" empstatus="">
        <cust custid="Cust03">
        </cust>
      </emp>
    </root>

    Structure: Emp > Cust > Addr
    Note: “Addr03” filtered out under Cust03.

Figure 25.20 Static backward path filtering.
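
Because the Figure 25.20 query is ordinary SQL, its row-level effect can be checked directly. In the sketch below, SQLite is a stand-in for the processor and the table layouts are assumptions inferred from the column names in the query; the query text itself is taken from the figure, with an ORDER BY added only for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp  (EmpID TEXT, EmpCustID TEXT, EmpStatus TEXT);
CREATE TABLE Cust (CustID TEXT);
CREATE TABLE Addr (AddrID TEXT, AddrCustID TEXT);
INSERT INTO Emp VALUES ('Emp01','Cust01','F'),('Emp02','Cust03',NULL);
INSERT INTO Cust VALUES ('Cust01'),('Cust02'),('Cust03');
INSERT INTO Addr VALUES
  ('Addr01','Cust01'),('Addr02','Cust02'),('Addr04','Cust02'),('Addr03','Cust03');
""")

# The extra ON condition looks back up the path at EmpStatus. Because it lives
# in the ON clause of a LEFT JOIN, a failed test only nulls out Addr; the Emp
# and Cust columns of the row are preserved.
rows = conn.execute("""
    SELECT CustID, EmpID, AddrID, EmpStatus
    FROM Emp
    LEFT JOIN Cust ON CustID=EmpCustID
    LEFT JOIN Addr ON CustID=AddrCustID
                   AND EmpStatus='F'
    ORDER BY EmpID
""").fetchall()

for row in rows:
    print(row)
```

Moving the `EmpStatus='F'` test into a WHERE clause would instead remove the whole Emp02 row, which is the difference in control between the two clauses that Section 25.5 describes.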

25.5.2 Dynamic Backward Path Qualification


When joining tables or views, the ON clause can also specify additional join
criteria with the AND operator that act as an additional dynamic path
qualification based on values back up the active path. This capability
additionally qualifies the path at the point where its ON clause refers to a
variable data value. In the example in Figure 25.21, the Emp node ON clause
uses the StoreID value from the Store node above it in order to restrict Emp
to its assigned Store. In this example, it is possible that the Emp under Cust
belongs to a different Store than Cust. The extra condition adds a level of
filtering, causing Emp not to be qualified if it does not have the same
StoreID as Store. This filtering does not affect the Store and Cust nodes. The
Emp node is attached directly to the lowest upper node referenced, which is
the Cust node and not the Store node.
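
The effect of the extra StoreID condition described above can be made visible by adding one employee who belongs to another store. In this sketch SQLite stands in for the processor, the table layouts are assumptions, and the Emp99 row (with its Store02 assignment) is hypothetical data added purely for illustration; it does not appear in the book's sample data. The join conditions follow the query that the text attributes to Figure 25.21.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store (StoreID TEXT);
CREATE TABLE Cust  (CustID TEXT, CustStoreID TEXT);
CREATE TABLE Emp   (EmpID TEXT, EmpStoreID TEXT, EmpCustID TEXT);
INSERT INTO Store VALUES ('Store01');
INSERT INTO Cust VALUES
  ('Cust01','Store01'),('Cust02','Store01'),('Cust03','Store01');
-- Emp99 is hypothetical: assigned to Cust02 but employed by another store.
INSERT INTO Emp VALUES
  ('Emp01','Store01','Cust01'),('Emp02','Store01','Cust03'),
  ('Emp99','Store02','Cust02');
""")

BASE = """
    SELECT StoreID, CustID, EmpID
    FROM Store
    LEFT JOIN Cust ON StoreID=CustStoreID
    LEFT JOIN Emp  ON CustID=EmpCustID
"""

# With the backward reference to StoreID, Emp99 is not qualified under Cust02.
with_filter = conn.execute(
    BASE + " AND StoreID=EmpStoreID ORDER BY CustID"
).fetchall()

# Without it, the employee from the other store is attached to Cust02.
without_filter = conn.execute(BASE + " ORDER BY CustID").fetchall()

print(with_filter)
print(without_filter)
```

In both runs the Store and Cust columns are unchanged; only the Emp attachment under Cust02 differs, which matches the statement that the filtering does not affect the Store and Cust nodes.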

25.6 Advanced Structure Linking with Data Mashups


Conceptual hierarchical joins are key to the SQL hierarchical XML processor's
easy and powerful operation. The combined hierarchical structures are unified
into a hierarchical superstructure, with all of their hierarchical semantics
combined into an even more powerful structure with additional semantics that
control its nonprocedural processing.

    SELECT StoreID, CustID, EmpID
    FROM Store
    LEFT JOIN Cust ON StoreID=CustStoreID
    LEFT JOIN Emp ON CustID=EmpCustID
                  AND StoreID=EmpStoreID

    <root>
      <store storeid="Store01">
        <cust custid="Cust01">
          <emp empid="Emp01"/>
        </cust>
        <cust custid="Cust02">
        </cust>
        <cust custid="Cust03">
          <emp empid="Emp02"/>
        </cust>
      </store>
    </root>

    Input tables: Store > Cust > Emp

Figure 25.21 Dynamic backward path filtering.

The SQL hierarchical XML processor has introduced a new hierarchical
capability: when linking structures, it is now possible to link below the root
of the lower-level structure. This is extremely flexible, removes a
restriction, and allows unrestricted data mashups. Linking below the root
causes the lower-level structure to be filtered based on the link point's
value, as covered in Sections 25.6.2 and 25.6.4. Lower-level linking is
possible with views because SQL views cause their structure to be materialized
before being joined to the upper-level structure. This is also necessary for
logical views. It also means that the original root remains the root for data
modeling purposes, because the original root in logical and physical
structures still has control over what data is in the structure. This
capability is also required by other capabilities shown later in this chapter.
Examples of linking below the root in a lower-level structure are discussed
next.

25.6.1 Hierarchical Structure Linking


Full hierarchical structure linking operates on entire hierarchical
structures, combining them automatically into a hierarchical result structure.
It uses the same LEFT outer joins that modeled the structures being linked,
which link them unambiguously and precisely as required. The hierarchical
structures being linked can be dynamic or can be stored in SQL views, and they
are easily linked conceptually into a hierarchically unified structure, as
shown in Figure 25.22, which demonstrates how structures are combined. The
hierarchical semantics of each structure are linked together by the LEFT join
operation, creating a unified hierarchical structure that increases the total
semantic value and query capability.

25.6.2 Linking Below Root of Lower Structure with Root Selected


When combining views, the lower-level view is materialized before being linked
to the structure above. This occurs because right-sided nesting is performed
with SQL views as a result of the way views expand: the current ON clause is
pushed to the right while the lower-level structure view is expanded with its
own ON clauses. The right-sided nesting leaves the current left-sided
structure in suspension and builds the new right-sided structure until it is
complete. It is then joined to the left structure. This materialized
lower-level view can be qualified on its ON clause before joining, which makes
it a look-ahead operation.

    SELECT * FROM EmpView LEFT JOIN CustView ON EmpCustID=CustID

    Input structures: EmpView:  Emp > {Dpnd, Eaddr}
                      CustView: Cust > {Invoice, Addr}
    Result structure: Emp > {Dpnd, Eaddr, Cust > {Invoice, Addr}}

    <root>
      <emp empid="Emp01" empstoreid="Store01" empcustid="Cust01"
           empstatus="F">
        <dpnd dpndid="Dpnd01" dpndempid="Emp01" dpndcode="D"/>
        <eaddr eaddrid="Addr01" eaddrcustid="Cust01" eaddrstate="CA"/>
        <cust custid="Cust01" custstoreid="Store01">
          <invoice invid="Inv01" invcustid="Cust01" invstatus="P"/>
          <invoice invid="Inv02" invcustid="Cust01" invstatus="O"/>
          <addr addrid="Addr01" addrcustid="Cust01" addrstate="CA"/>
        </cust>
      </emp>
      <emp empid="Emp02" empstoreid="Store01" empcustid="Cust03"
           empstatus="">
        <eaddr eaddrid="Addr03" eaddrcustid="Cust03" eaddrstate="NV"/>
        <cust custid="Cust03" custstoreid="Store01">
          <addr addrid="Addr03" addrcustid="Cust03" addrstate="NV"/>
        </cust>
      </emp>
    </root>

Figure 25.22 Hierarchical structure joining.
The lower-level structure's original root node remains the root node even if
the link point is below the root. This is because the original root node still
affects what tables (or nodes) are in the structure. In this example, if the
“Cust01” data occurrence did not exist, then this lower-level structure
occurrence would not exist. This is demonstrated in Figure 25.23.
The filtering caused by linking below the root in Figure 25.23 needs to be
pointed out. The lower-level ON clause reference is to the Addr node in the
Cust structure. The matching Eaddr higher-level node values from the Emp
structure are “Addr01” and “Addr03.” These match “Addr01” and “Addr03” of the
lower-level structure and qualify them up, down, and across qualified paths
(Inv01, Inv02). In this example, there are no lower-level nodes under Addr01
and Addr03 (downward). However, if there were, they would be qualified if they
were in the SELECT list. Upward, Cust01 and Cust03 qualify. However, notice
that Addr02 and Addr04 under Cust02 did not match and do not qualify, which
means that Cust02 does not qualify either because it is not qualified from any
other qualified node occurrence. For this reason, they are filtered out by the
join operation. This is very sophisticated lower-level structure processing
that is carried out easily. It applies to logical or physical structures
because they are both in the rowset at this point.

    SELECT EmpID, DpndID, CustID, InvID, AddrID
    FROM EmpView LEFT JOIN CustView ON EAddrID=AddrID

    <root>
      <emp empid="Emp01">
        <dpnd dpndid="Dpnd01"/>
        <cust custid="Cust01">
          <invoice invid="Inv01"/>
          <invoice invid="Inv02"/>
          <addr addrid="Addr01"/>
        </cust>
      </emp>
      <emp empid="Emp02">
        <cust custid="Cust03">
          <addr addrid="Addr03"/>
        </cust>
      </emp>
    </root>

    Input structures: Emp > {Dpnd, Eaddr}; Cust > {Invoice, Addr}
    Result structure: Emp > {Dpnd, Cust > {Invoice, Addr}}

Figure 25.23 Linking below root with root selected.
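
The filtering effect of linking below the root can be observed in the joined rowset. The sketch below assumes SQLite as a stand-in for the processor, hand-built tables from the figures, and assumed CustView and EmpView definitions (the book's definitions are not shown in this chapter). The linking query is the one from Figure 25.23, with DISTINCT and ORDER BY added for readable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cust   (CustID TEXT);
CREATE TABLE Invoice(InvID TEXT, InvCustID TEXT);
CREATE TABLE Addr   (AddrID TEXT, AddrCustID TEXT);
CREATE TABLE Emp    (EmpID TEXT, EmpCustID TEXT);
CREATE TABLE Dpnd   (DpndID TEXT, DpndEmpID TEXT);
CREATE TABLE Eaddr  (EaddrID TEXT, EaddrCustID TEXT);
INSERT INTO Cust VALUES ('Cust01'),('Cust02'),('Cust03');
INSERT INTO Invoice VALUES
  ('Inv01','Cust01'),('Inv02','Cust01'),('Inv03','Cust02');
INSERT INTO Addr VALUES
  ('Addr01','Cust01'),('Addr02','Cust02'),('Addr04','Cust02'),('Addr03','Cust03');
INSERT INTO Emp VALUES ('Emp01','Cust01'),('Emp02','Cust03');
INSERT INTO Dpnd VALUES ('Dpnd01','Emp01');
INSERT INTO Eaddr VALUES ('Addr01','Cust01'),('Addr03','Cust03');
-- Assumed view definitions, modeled with LEFT JOINs like StoreView.
CREATE VIEW CustView AS SELECT * FROM Cust
  LEFT JOIN Invoice ON CustID=InvCustID
  LEFT JOIN Addr ON CustID=AddrCustID;
CREATE VIEW EmpView AS SELECT * FROM Emp
  LEFT JOIN Dpnd ON EmpID=DpndEmpID
  LEFT JOIN Eaddr ON EmpCustID=EaddrCustID;
""")

# The link point is Addr, below the CustView root. Only CustView rows whose
# AddrID matches an EaddrID survive, and they carry their Cust and Invoice
# values along with them.
rows = conn.execute("""
    SELECT DISTINCT EmpID, DpndID, CustID, InvID, AddrID
    FROM EmpView LEFT JOIN CustView ON EaddrID=AddrID
    ORDER BY EmpID, InvID
""").fetchall()

for row in rows:
    print(row)
```

Cust02 never appears because neither Addr02 nor Addr04 matches an Eaddr value, while Inv01 and Inv02 ride along with the matching Addr01 row of Cust01.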

25.6.3 Linking Below Root of Lower Structure without Root Selected


The original root node also logically and semantically holds the lower structure
together so that, if the root Cust is not selected for output, all of the selected
lower-level items—including those that are not directly connected (on the same
path) to the link point (Addr)—are still processed correctly. This is shown in
Figure 25.24 where the Cust root is not selected but the Addr and Invoice
nodes are. The Addr and Invoice nodes are not connected directly to the link
point but are related indirectly through the Cust node, which is a common
ancestor. This does not present a problem because the Cust node is still
logically and semantically the lower-level root even if it is not selected, as
the example below demonstrates. This holds the record together in the working
structure.

SELECT EmpID, DpndID, InvID, AddrID
FROM EmpView LEFT JOIN CustView ON EAddrID=AddrID

Input structures: Emp (Dpnd, Eaddr); Cust (Invoice, Addr)
Result structure: Emp (Dpnd, Invoice, Addr)

<root>
  <emp empid="Emp01">
    <dpnd dpndid="Dpnd01"/>
    <invoice invid="Inv01"/>
    <invoice invid="Inv02"/>
    <addr addrid="Addr01"/>
  </emp>
  <emp empid="Emp02">
    <addr addrid="Addr03"/>
  </emp>
</root>

Figure 25.24 Linking below root without root selected.

25.6.4 Filtering Below Root of Lower View with Qualification


The same view capability that enables linking below the root of the lower-level
view allows lower-level data to be used as ON clause filtering criteria. This
allows either filtering out (removing) or accepting the full view. The following
SQL statement and result in Figure 25.25 establishes the full data before filter-
ing is applied.
It qualifies the customer view using AddrID. This removes the portion of
the view with Addr03. The EmpView is always preserved. This ability to
reference further down the path in order to qualify data is quite powerful and useful.
This is possible because the lower-level view is nested, causing it to be fully
expanded before access. Notice how the full view for Addr01 is included and
Addr03 is not. This level of filtering is possible with this capability.

Figure 25.25 Linking below root with filtering.

25.7 Dynamic Variable Structure Generation Control

SQL’s outer join ON clause, which is used to specify the hierarchical join
criteria and structure link points, can also be used to specify conditional
join criteria. These criteria can dynamically control how data structures are
generated, based on data values in the structure being tested. This can be done
at the node or view level and can use join criteria values above and below the
lower link point, as was shown previously with the upper and lower join points.

25.7.1 Variable Structure Generation Controlled at the Node Level


Let’s look at an example of data path filtering based on a data value that is
further up the current data structure path, which can dynamically control the
building of variable structures. The SQL query in Figure 25.26 builds the
structure with either the Dpnd node or the Eaddr node, based on a value from the
Emp node higher up in the structure. When the current EmpStatus value is “F,”
the Dpnd node is included; when the EmpStatus value is NULL, the Eaddr node is
included. The two nodes are never both included for the same employee because
both conditions test the same data field. This is similar in capability to
COBOL’s DEPENDING ON clause, for those familiar with COBOL. This variable
structure control allows pieces of the data structure to be excluded or included
by a control data value that is further up its active path. Depending on where
this control value is located up the path, its value can change many times even
in a single document (record) occurrence. There is no limit as to how often this
method can be used throughout the structure view.

Figure 25.26 Variable structures built at node level.



Notice in the XML result of Figure 25.26 that employee Emp01 with a
status of “F” contains a Dpnd node and not an Eaddr node, while employee
Emp02 with no status has an Eaddr node but no Dpnd node.
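
Because the query of Figure 25.26 is not reproduced in the text, the following is only a sketch of how such a conditional ON clause might be written. The table layout and the link columns DpndEmpID and EaddrEmpID are assumptions made for illustration; the EmpStatus tests follow the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; both employees have a dependent AND an address,
-- so only the EmpStatus test decides which node is attached.
CREATE TABLE Emp   (EmpID TEXT, EmpStatus TEXT);
CREATE TABLE Dpnd  (DpndID TEXT, DpndEmpID TEXT);
CREATE TABLE Eaddr (EAddrID TEXT, EaddrEmpID TEXT);

INSERT INTO Emp   VALUES ('Emp01', 'F'), ('Emp02', NULL);
INSERT INTO Dpnd  VALUES ('Dpnd01', 'Emp01'), ('Dpnd02', 'Emp02');
INSERT INTO Eaddr VALUES ('Addr01', 'Emp01'), ('Addr03', 'Emp02');
""")

# The extra condition in each ON clause tests a value higher up the path (Emp).
rows = conn.execute("""
    SELECT EmpID, DpndID, EAddrID
    FROM Emp
    LEFT JOIN Dpnd  ON DpndEmpID  = EmpID AND EmpStatus = 'F'
    LEFT JOIN Eaddr ON EaddrEmpID = EmpID AND EmpStatus IS NULL
    ORDER BY EmpID
""").fetchall()

for row in rows:
    print(row)
```

Each employee row is preserved by the LEFT JOIN, and the condition on EmpStatus determines which child node is populated and which is padded with NULL.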

25.7.2 Variable Structure Generation Controlled at the View Level


Views can also be used to build variable structures. In this case, the entire
view structure is either included or excluded. In the SQL in Figure 25.27, the
EmpStatus field in the root above is used again. However, this time it controls
whether or not the full CustView structure is included in the result. The XML
result shows that it is included in one case and not in the other.

Figure 25.27 Variable structures built using views.

25.8 Conclusion
This final chapter has shown the SQL hierarchical XML processor in actual
operation, demonstrating its common and critical operations. The actual query
results are shown in XML using attribute mode, which is the default for the
SQL hierarchical XML processor. These examples were annotated to help
explain their internal and external operation.
26
Summary
The standard SQL join is an operation with powerful syntax and semantics,
whose hierarchical capabilities have not been fully understood or realized. This
book’s purpose has been to remedy this situation. Many of these capabilities are
based on the outer join’s inherent ability to dynamically model and process
complex hierarchical data structures. This book has proved that this powerful
data modeling capability does exist and has demonstrated that this ability can
be harnessed using only the standard SQL facility. Using the outer join opera-
tion to perform data modeling and hierarchical processing was explained, care-
fully showing that any hierarchical data structure could be dynamically
modeled and processed.
The flexible syntax of the standard SQL is what enables a singular, unam-
biguous hierarchical application view to be defined and utilized. These data
modeling capabilities can be immediately used by the user or can be utilized by
SQL vendors to support advanced new capabilities. The associated data struc-
ture meta information is embedded implicitly in the outer join syntax that
defines the view. This information is automatically available in standard outer
joins. By dynamically extracting and utilizing this meta information from outer
join syntax, SQL vendors can provide standard features that have not been pre-
viously possible with standard SQL. These advanced features have been dis-
cussed in the book and include: full multipath hierarchical processing; dynamic
structure joining; transparent XML integration; navigationless hierarchical pro-
cessing; data fragment level processing; structured data transformation; and
structure-aware processing to support hierarchical optimization and automatic
structured data output.


The real power of the ANSI outer join appears when three or more tables
are joined. This is because changing the order in which tables are joined
influences the result. This is a dramatic and significant departure for relational
databases. Before this book, the effects and capabilities of this had not
been fully studied, understood, or documented. Understanding these
effects opens a whole new realm of possibilities for relational processing.
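
One simple way to see the effect of join order is to join the same three tables with the same join conditions in two different orders. The sketch below assumes hypothetical Dept, Emp, and Dpnd tables and link columns (EmpDeptID, DpndEmpID); it is an illustration of the general point, not a query taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: Dept02 has no employees, Emp02 has no dependents.
CREATE TABLE Dept (DeptID TEXT);
CREATE TABLE Emp  (EmpID TEXT, EmpDeptID TEXT);
CREATE TABLE Dpnd (DpndID TEXT, DpndEmpID TEXT);

INSERT INTO Dept VALUES ('Dept01'), ('Dept02');
INSERT INTO Emp  VALUES ('Emp01', 'Dept01'), ('Emp02', 'Dept01');
INSERT INTO Dpnd VALUES ('Dpnd01', 'Emp01');
""")

# Dept is the root: every department is preserved.
top_down = conn.execute("""
    SELECT DeptID, EmpID, DpndID
    FROM Dept LEFT JOIN Emp  ON EmpDeptID = DeptID
              LEFT JOIN Dpnd ON DpndEmpID = EmpID
    ORDER BY DeptID, EmpID
""").fetchall()

# Same tables and conditions, reversed order: Dpnd is now the root.
bottom_up = conn.execute("""
    SELECT DeptID, EmpID, DpndID
    FROM Dpnd LEFT JOIN Emp  ON EmpID  = DpndEmpID
              LEFT JOIN Dept ON DeptID = EmpDeptID
""").fetchall()

# With inner joins the order would not matter for these tables.
inner = conn.execute("""
    SELECT DeptID, EmpID, DpndID
    FROM Dept JOIN Emp ON EmpDeptID = DeptID JOIN Dpnd ON DpndEmpID = EmpID
""").fetchall()

print(len(top_down), len(bottom_up), len(inner))
```

The first table in a chain of LEFT joins acts as the root whose occurrences are always preserved, so reversing the order changes which data survives.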
Using the outer join, the power of hierarchical data structure processing
was shown and described, along with the inherent semantics of these multipath
structures. These structures are unambiguous, making them perfect for applica-
tion views. This book also demonstrates how the flat Cartesian product model
produces the same LCA multipath semantics and operation as its comparable
hierarchical model when processing relational database queries. This is
significant for seamless SQL access to heterogeneous and legacy
databases, which is also shown in this book.
These hierarchical outer join capabilities are still unfamiliar to most SQL
users, but are available to those who know how to use them. It is hoped that
this book will help identify this capability. A new automatic metadata
maintenance capability was also described; it would allow dynamically
structured data to be specified across peer locations for immediate processing,
eliminating the need for metadata updating by the user. This will open many
advanced data processing
capabilities, such as peer-to-peer real-time hierarchical SQL coding collabora-
tions using dynamic structured data. The natural hierarchical processing can
also offer a gateway or interface to the semantic Web, and can also offer auto-
matic parallel processing. This is possible for hierarchical queries because hier-
archical pathways are parallel and can automatically take advantage of parallel
processing.
Appendix A
Database Relationships and Views Used
in This Book


Notes on the Database Views


The Manager, ProdMgr, and DeptMgr table names used in some of the views
are alias names for the Employee table. The Employee, Department, Dependent,
Feature, and Project tables also have the shorter alias names Emp, Dept,
Dpnd, Feat, and Proj (respectively). These are also used from time to time. The
root names of the views are also their view names. The Employee and Department
views contain the same tables and the same relationships. For this reason,
they are used often to demonstrate how outer join data modeling can model
different views. The Division view has two pathways that both contain multiple
occurrences; for this reason, it is used to demonstrate the semantics of multiple
paths and multiple path occurrences. The DivProj view demonstrates how
powerful node promotion can be used in views to aggregate (condense) views.
These are shown in Figure A.1.

[Diagram: the conceptual view and the Department, Employee, Division, Product,
and DivProj views, built from the Division, Department, Employee, Dependent,
Product, Feature, Project, ProdMgr, and DeptMgr nodes.]

Figure A.1 Database relationships and views.


Glossary
The terminology defined here is used in this book and pertains to its use in this
book. Because many of the technology areas covered in this book are new,
many of the terms listed below have been coined to help describe this new tech-
nology and will be identified as such.

Access path: The access path refers to a navigation path in a hierarchical
structure from the root node of the structure to the node that requires access.
This path must be followed when accessing a node by accessing each node
along the path to the required node in order to maintain the semantics of the
data structure.

Ad hoc query: An ad hoc query is a database query that can be specified inter-
actively or for unanticipated queries. This means that the database query does
not require being predefined to the database system that is processing the
query. In relational systems, this will require dynamic SQL query processing.

Aggregate data: Data that is the result of applying a process to combine data
elements collectively or in summary form. The SQL SELECT List does this
very easily and offers quite a bit of dynamic control.

Alternate key: An alternate key is a column or field in a relational table or
record that can also be used as a nonunique key to select many records (e.g.,
many red-colored keys where the alternate secondary color key is red). As such,
this key is probably not unique among other rows or records in the table or
file. A foreign key can be considered an alternate key. The alternate key is usually the
“many” side of a one-to-many relationship.

Ambiguous semantics: Semantics are about meaning. Ambiguous semantics
are semantics that have more than one possible meaning. These meanings can
be conflicting. Semantics should be singular in meaning in order to be most
useful.

Ambiguous structures: Data structures, such as network structures, have
ambiguous semantics when used to represent a singular view of the data. These
structures do not have a singular meaning because data values in the structure
can usually be reached from multiple paths, with each path representing differ-
ent semantics or meanings. Procedural structure navigation is necessary in
order to get reliable results in these cases. Hierarchical structures are unambigu-
ous because they have only one path to each data value.

Ancestor nodes: Ancestor nodes are nodes that are further up the path from
their related descendent node. As in a parent node, ancestor nodes control the
existence or range of processing of those nodes under it.

Any-to-any structure transformation: Means that any linear-to-linear,
linear-to-nonlinear, or nonlinear-to-nonlinear hierarchical data structure
transform can be performed easily and at a high level by using SQL. The semantics
in the hierarchical structure being transformed are used to help perform the
transform operation easily and accurately.

API: API stands for application programming interface. SQL is an application
programming interface for relational databases.

Application view: The application view is how the application visualizes the
structure of the database. This structure should be hierarchical because hierar-
chical structures are singular (unambiguous) in meaning. This enhances the
usefulness of the data structure semantics. With application views, applications
can share views and databases can support many different views.

Association table: Association tables are used in relational databases to
maintain many-to-many data relationships, such as Parts/Suppliers, and to combine
structures that are related but are missing the needed data relationships that will
be supplied by the association table. This relationship can operate in either
direction as a one-to-many relationship: Part over Suppliers or Supplier over
Parts. Both directions cannot be maintained with just the Parts and Suppliers
tables, so an association table is used between the Parts and Suppliers tables in
order to maintain the one-to-many relationships in both directions when per-
forming the necessary joins.

Associative operation: An associative operation is one whose execution order
can be changed, without altering the physical ordering of the operations, and
still produce the same result. This is usually tested with the aid of
parentheses. Addition and multiplication are associative in operation, whereas
subtraction and division are not. For example, with addition: 5 + 2 + 4 equals
5 + (2 + 4), while with subtraction: 5 – 3 – 1 does not equal 5 – (3 – 1).
Building a hierarchical structure is associative because hierarchical
structures can be built top-down, bottom-up, or in any order.

Atomic value: An atomic value is a basic value that is not composed of other
classifiable parts.

Attribute: XML attributes are used to provide additional information about
XML elements. XML attributes are specified by user-specified keyword names.
Also see Element for an explanation. When used in a relational sense, the term
“attribute” is referring to the columns in a relational table.

Attribute-based or Attribute content: This is when an XML element contains
only attribute data, no element text.

Augmented Cartesian Product: See Extended Cartesian Product.

Automatic metadata maintenance: As used in this book, is the automatic
process of maintaining metadata associated with dynamically structured data,
making its maintenance transparent. This avoids having to procedurally update
the metadata of dynamically structured data and can be automatically per-
formed immediately. This is a new process described in this book.

Axis: Is a navigational aid in XML. It sets XPath to the current position on
which to base further positioning requests.

Blob: A blob is a relational column type used to hold binary large objects that
can be composed of any type of data. A blob is used mainly for storage. For
example, it can store a native XML document. It is not meant to be processed
directly by SQL inherent operations. It can be processed by user-defined
functions.

Bottom-up processing/execution: Bottom-up processing of outer join
hierarchical structures involves their construction by building them from the
bottom of the structure upwards. This can change the normal table join order, but
does not affect the result because hierarchical structures can be built in any
order and do not change the result structure. Top-down may be more efficient
because it avoids throwaway data.

Built-in capability: Integrated capability.

Bushy query: A bushy query is a query that accesses and/or processes multiple
paths of the hierarchical data structure that is being processed.

Candidate key: A candidate key is a combination of table attributes that
uniquely identifies each record within a table. It is typically used when there is
no unique key for the table.

Cardinality: Cardinality is a relational term for the number of rows in a table
or result.

Cartesian product: A Cartesian product is the result when two relational
tables are joined without applying join criteria. Each row of one table is joined
with every row of the other table, creating all possible combinations. For this
reason, the result is referred to as being “exploded.”
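
As a small illustration of this "explosion", the sketch below cross-joins two hypothetical tables (Color and Size are assumed names used only for this example) and counts the resulting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables with 3 and 2 rows.
CREATE TABLE Color (ColorName TEXT);
CREATE TABLE Size  (SizeName TEXT);
INSERT INTO Color VALUES ('red'), ('green'), ('blue');
INSERT INTO Size  VALUES ('small'), ('large');
""")

# No join criteria: every Color row is paired with every Size row.
rows = conn.execute(
    "SELECT ColorName, SizeName FROM Color CROSS JOIN Size"
).fetchall()

print(len(rows))  # 3 rows x 2 rows = 6 combinations
```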

Cascading delete: When a hierarchical node occurrence is deleted (filtered
out), all of its dependent node occurrences are also removed. Also see Hierarchi-
cal data preservation for a complete description.

CDATA: The XML CDATA type construct specifies an escape block for an
element that specifies that the indicated text data should not be parsed because
it has special characters or requires special processing.

Child: A child is the next lower level table or node in the data structure that
follows the path downward. There can be multiple children definitions for a
parent node definition, each one on a separate path from the parent. In a
hierarchical structure, child data occurrences cannot exist without an active
parent data occurrence.

Closest Common Ancestor: See Lowest common ancestor (LCA).

Closure: All expressions in a language return values that are in the data model
being processed by the language.

Coalescing: Coalescing is the inspection of key values under the same domain
to return a single, non-null, valid key value found amongst them. This has spe-
cial significance for outer joins where null key values can be produced because
of their data preserving ability. This can identify a key field among multiple
keys when at least one non-null key value is present, so it can be used as
the only key field. This avoids having to check multiple key fields.

Column: A column is a relational term for a data field that is defined in a
table. It usually holds an atomic (single) value, but in post-relational databases,
it can hold nested tables and even native XML. A relational column is also
known as an attribute (not to be confused with an XML attribute).

Composite key: A key that contains more than one column. This is also
known as a concatenated key, or multivalue key.

Computed field: See Virtual field.

Common ancestor: See Lowest common ancestor.

Commutable joins: Joins that can change order or be replaced with other
joins to produce the same results.

Commutative operation: A commutative binary operation is one where its
two input arguments can be switched around without affecting the results.
Addition and multiplication are commutative, while subtraction and division
are not. For example, with addition: 5 + 6 equals 6 + 5, whereas with
subtraction: 4 – 2 does not equal 2 – 4. Symmetric outer join operations, such
as the FULL outer join, are commutative.

Complex data modeling: Complex data modeling used in the context of this
book applies to the ability to construct hierarchical data structures that contain
multiple paths by using the outer join operation. Multiple paths add another
level of capabilities and complexity to the principles involved in defining data
structures with the outer join and to the semantics that are associated with the
data structure.

Complex type: Complex type contains multiple subtypes, such as a node
containing subelements and attributes.

Composite key: A composite key is a key that is comprised of multiple columns
from the same table. It is usually used when it is necessary to construct a
unique key value when no single column in the table represents a unique key.

Concatenated key: A concatenated key is a composite key that represents the
keys going down a path of the data structure so that the path is identified and
can quickly be renavigated.

Conceptual data modeling: A high-level data modeling that specifies an
abstract map of concepts and their relationships. SQL’s inherent hierarchical
data modeling does this easily and naturally.

Conceptual view: A conceptual view is a view or schema that defines all pos-
sible data and the valid relationships they comprise in a database so that all
required application views can be defined from it. As such, a conceptual view
requires a network structure to define it because of the high probability of con-
verging paths. A conceptual view sits between the internal and external views
and acts as an automatic level of abstraction between the two.

Conjunction: Conjunction is an AND logical operation that is an additional
condition to be met where both sides must test true.

Contiguous structure: A contiguous structure is a data structure that is stored
in a contiguous format. In this way, it does not have to be assembled when
retrieved.

Content model: There are three data content models when XML elements
are declared. These are data content, element content, and mixed content.
With data content, elements can specify text data, but cannot contain
subelements. With element content, an element can only specify subelements
and optional rules for their use. With mixed content, elements can specify both
text data and a subelement within the text.

Conventional data structures: Conventional data structures are structures
that are common in business use. These include relational, flat, and fixed
format hierarchical structures. Semistructured data, which is new to business,
includes the capability to define structures that are still considered
unconventional; these include structures that can have dynamically varying
structure formats.

Correlation name: An alternate table name occurring in a FROM clause
within a SELECT statement whose scope is the scope of the SELECT
statement.

Cousins: As used in this book, cousins are nodes that are not directly related
to other nodes on the same active path, but are related indirectly by a common
ancestor node data occurrence. This means that every node in the hierarchical
structure is related directly or indirectly to each other.

Cross join: The cross join is one of the ANSI-92 SQL join types. It creates a
basic nonrestricted inner join Cartesian product result and as such, it does not
use or require a join condition, so no ON or USING clause is used with it.

Dangling tuple: Dangling tuples are the rows that are not matched in join
operations. With inner joins they are discarded, and with outer joins they can
be preserved in the result by padding their unmatched row side with null
values.
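
The difference can be seen by running the same join as an inner join and as a LEFT outer join. This is a minimal sketch with hypothetical Emp and Dpnd tables; the link column DpndEmpID is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: Emp02 has no dependent, so its row is a dangling tuple.
CREATE TABLE Emp  (EmpID TEXT);
CREATE TABLE Dpnd (DpndID TEXT, DpndEmpID TEXT);
INSERT INTO Emp  VALUES ('Emp01'), ('Emp02');
INSERT INTO Dpnd VALUES ('Dpnd01', 'Emp01');
""")

# Inner join: the unmatched Emp02 row is discarded.
inner_rows = conn.execute("""
    SELECT EmpID, DpndID FROM Emp JOIN Dpnd ON DpndEmpID = EmpID
    ORDER BY EmpID
""").fetchall()

# LEFT outer join: Emp02 is preserved and its Dpnd side is padded with NULL.
outer_rows = conn.execute("""
    SELECT EmpID, DpndID FROM Emp LEFT JOIN Dpnd ON DpndEmpID = EmpID
    ORDER BY EmpID
""").fetchall()

print(inner_rows)
print(outer_rows)
```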

Data abstraction: Data abstraction is the ability to hide the complexity of the
data. In this book, a good example would be a stored structured data view
whose use helps hide the complexity of the hierarchical data structure.

Data definition: A data definition is a definition of the characteristics of data
in the database. This includes, but is not limited to, the data type, size, and
number of occurrences and structure relationships of the data to other data in
the database. XML DTDs and schemas can be classified as data definitions.

Data explosions: A data explosion is the effect of replicating data (rows) caused
by the Cartesian product producing all combinations of the joined rows.

Data filtering: Data filtering is the dynamic process of selectively removing
undesired data from the query result based on the values in the data. It is speci-
fied on the WHERE or ON clause, but is not considered to be part of the join
criteria. The data filtering process operates differently when specified on the
WHERE clause than when specified on the ON clause. The ON clause data fil-
tering offers a much finer level of data filtering that follows the hierarchical
structure defined downward from the point it is defined. The WHERE clause
filters from the root node down. In relational terms, this means the filtering of
entire rows instead of portions of rows.
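
The contrast between the two clauses can be sketched with the same filter condition placed first in the ON clause and then in the WHERE clause. The tables and the DpndAge column below are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; DpndAge is an assumed column used as the filter value.
CREATE TABLE Emp  (EmpID TEXT);
CREATE TABLE Dpnd (DpndID TEXT, DpndEmpID TEXT, DpndAge INTEGER);
INSERT INTO Emp  VALUES ('Emp01'), ('Emp02');
INSERT INTO Dpnd VALUES ('Dpnd01', 'Emp01', 5), ('Dpnd02', 'Emp02', 20);
""")

# Filter in the ON clause: only the Dpnd portion of the row is removed,
# so Emp02 is still returned with a NULL dependent.
on_rows = conn.execute("""
    SELECT EmpID, DpndID
    FROM Emp LEFT JOIN Dpnd ON DpndEmpID = EmpID AND DpndAge < 18
    ORDER BY EmpID
""").fetchall()

# Filter in the WHERE clause: the entire Emp02 row is removed.
where_rows = conn.execute("""
    SELECT EmpID, DpndID
    FROM Emp LEFT JOIN Dpnd ON DpndEmpID = EmpID
    WHERE DpndAge < 18
    ORDER BY EmpID
""").fetchall()

print(on_rows)
print(where_rows)
```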

Data fragment: See Structure fragment.

Data independence: Data independence is the characteristic that enables data
to be easily combined into a usually unlimited number of different structures.
Without this property, data cannot easily be combined to form different com-
binations of data. This property requires the normalizing of relational data by
breaking it up into multiple tables following the rules of normalization.

Data inheritance: Data inheritance is the process of acquiring characteristics
from an object that is included in another object. In the case of an outer join
structured view, this involves inheriting the data structure when it is included
in a structured view of another structure that is being constructed. In this way,
it can be used in multiple views.

Data integration: Data integration is the process of combining dispersed data
from multiple heterogeneous systems.

Data modeling: Data modeling is the ability and process of specifying and
constructing complex data structures that represent specific semantics. In SQL,
this can be performed with the ANSI-92 LEFT outer join operation that can
inherently define and process complex data structures.

Data occurrence: A data occurrence is an actual occurrence of the data in the
database. Data occurrence may also be known as a data instance that is actually
more associated with data objects that contain more information than just a
data occurrence.

Data partition: Breaking tables into multiple tables for different purposes.
This can be done vertically or horizontally. Vertically, the columns of each row
are split across multiple tables. Horizontally, the rows are split based on some
data value or range, such as names starting from A to F in one table and G to M
in another table, or maybe by office location.

Data persistence: The capability to retain and reaccess data after the applica-
tion creating it has terminated normally.

Data record: See Record.

Data segment: A data segment is a single occurrence of a contiguous string of
data types that are closely related.

Data structure extraction (DSE) technology: The DSE patented technology
defines the process for extracting the data structure metadata from outer joins
that define data structures. This metadata contains a detailed description of the
data structure from which powerful and useful semantics can be derived to per-
form automatic operations, such as hierarchical optimizations and automatic
hierarchical data formatting, such as for XML.

Data structure fragment: See Structure fragment.

Data structure mashup: Data structure mashup is the ability to combine
hierarchical data structures in practically any way and it is still semantically
hierarchically correct so it can be used to derive correct hierarchical results from
hierarchical processing.

Data structure metadata: Metadata, also known as meta information, is
information about information. Data structure metadata is information about
the data structure, such as a detailed description of its structure and data
relationships.

Data structure hierarchical processing: The ability of a database query to
process a complex hierarchical data structure by automatically following the
semantics of its data structure.

Data structure transformation: See Structure transformation.

Data type: A data type is how the data is defined in the database; each data
type can have unlimited data occurrences.

Data virtualization: The ability to easily select and combine data fragments
from many different locations dynamically and in any way into a single data
structure while also maintaining its semantic accuracy.

Database record: See Record.

Database navigation: See Navigation.

DDL: Data Definition Language.

Declarative language: See Nonprocedural language.

Degree: Number of relational attributes (columns) in a table or result set.

Denormalization: Denormalization is the process of prejoining normalized
data and saving the result as a denormalized table. This is a deliberate data
design decision. This avoids the overhead of performing the join operation each
time the query that uses the denormalized data is used. Denormalization is per-
formed for efficiency purposes and puts the data in an unnormalized form,
which is the form the data would naturally have from the join processing,
regardless of how it was stored. The disadvantage of denormalized data is that
its data independence is lost and the data can become stale.

Derived data: Derived data, as its name implies, is data derived from some
process or calculation. When data is retrieved, the derived value is computed
from it and placed in the input buffer as if it had been retrieved directly. For
example, a birthday could be converted to an age. This is a good example
because age is constantly changing.

Derived table: See Temporary table.

Descendent node: A descendent node is a node that is further down the path
from the related node.

Deterministic: Produces the same result every time.

Directed graph: This is a one-way graph, like a hierarchical structure, that is
only navigated top-down from source to target.

Dirty data: Dirty data is data that is or has become missing, inconsistent, or
erroneous.

Disjunction: Disjunction is the OR logical operation used as an alternative
operation or switch.

Disparate heterogeneous database access: Disparate heterogeneous database
access is the accessing of very different types of physical databases, possibly
from different vendors, as if they were one logical database. Disparate heteroge-
neous database access includes intermixing different database types in the logi-
cal view, such as both relational and nonrelational databases. This is how
federated databases operate.

Distributed hierarchical processing: Distributed hierarchical processing is
the ability of SQL hierarchically defined views and their distributed portion to
automatically perform hierarchically at their remote sites. This enables the
entire base query to be performed hierarchically, even though it was distributed
to other processors for processing.

Document: See XML document.

Document centric/oriented: XML documents that are usually processed by
humans directly. They are not highly hierarchically structured, but are loosely
structured natural language.

Document round tripping: This is when a native XML document is stored
relationally by shredding and reconstructed later for output as native XML.
The native XML document should remain the same after round tripping.
When a document is deconstructed and then reconstructed, it often will not be
exactly identical even though it may be semantically identical. This is often
because of how white space is treated.

DOM: DOM is the document object model API. A DOM processor is used
to access, parse, store, and retrieve tokens from an XML document. There are
other APIs, such as SAX, that can be used to access native XML documents.

DOM tree: A DOM tree is the entire document internal hierarchical struc-
ture produced when DOM accesses the next document occurrence. This can be
many times the size of the actual document native occurrence.

Domain: Domain, in relational terms, usually applies to columns in one or
more tables that have the same use and meaning. For example, when relating
two tables, it implies their join columns are in the same domain. The domain
also specifies the valid values or ranges the data can have.

DSE technology: See Data structure extraction technology.

Duplicate data: Duplicate data, as used in this book, is real data that natu-
rally occurs multiple times, containing the identical data; each occurrence of
duplicate data is meaningful. The term replicated data, on the other hand,
describes data that is replicated because of operations applied to the data, such
as joining tables or flattening hierarchical structures. In this case, the identical
copies of the data are not meaningful and were only necessary in order to help
perform the desired operation and can have side effects.

Duplicate element usage: Duplicate element usage is one of XML’s advanced
hierarchical capabilities, which allows a given element type to be specified as a
node in more than one location in the hierarchical structure. This is similar to
object subclasses, such as an address class, being used as a subclass in both cus-
tomer and employee super-classes. These duplicate named element types will
show up in multiple locations of an XML hierarchical structure, causing ambi-
guity problems for navigationless query languages such as SQL.

Duplicate key: Duplicate key means that more than one record, row, or ele-
ment can have the same key. The key is not unique. A duplicate keyed record
usually means that the primary key can be duplicated.

Duplicate node types: See Duplicate element usage.

Dynamic: In this book, the term dynamic is used as a modifier for a database
operation, indicating that the operation it modifies can be performed dynami-
cally or in an ad hoc fashion, such as a dynamic query or a dynamic joining of
structures.

Dynamic path shortening: Dynamic path shortening is a database access
optimization. It is used in outer join processing where the active access path can
be dynamically shortened at the first path node position where missing data is
encountered. This is significant to the outer join operation because missing
data is not usually a reason to stop processing with the outer join.

Dynamic rebuild/rewrite: Dynamic rebuild or rewrite is an SQL optimization
where the SQL query can be dynamically rewritten at the time of execution
in order to take advantage of the latest features in the SQL system. With
the outer join containing meta information about the data structure that is
being processed, there are significant possibilities for semantic optimizations to
be applied dynamically.

Dynamic metadata maintenance: As defined in this book, dynamic metadata
maintenance is the process of automatically maintaining the metadata, which
becomes necessary when dynamic structured data is being used. This allows
dynamic structured data to be freely used and makes modifying structured data
effortless.

Dynamic structured data: Structured data that has been dynamically modi-
fied. See Dynamic metadata maintenance.

Dynamic SQL specification: Dynamic SQL specification is the ability to
build SQL query statements at run time. This enables SQL queries to be
specified and joined in an ad hoc interactive fashion that does not require
predefinition. This capability is automatically extended to data modeling, hier-
archical structure processing, and hierarchical structure joining capabilities
made possible by the ANSI-92 outer join operation.
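
A minimal sketch of building an outer join statement at run time, using
Python's standard sqlite3 module. The helper build_query and the dept/emp
tables are hypothetical names chosen for this illustration, not part of any
product described in the book.

```python
import sqlite3

def build_query(parent, children):
    """Assemble a LEFT JOIN statement at run time.

    `children` is a list of (child_table, join_key) pairs. The identifiers are
    interpolated directly, so in this sketch they must come from a trusted source.
    """
    sql = f"SELECT * FROM {parent}"
    for child, key in children:
        sql += f" LEFT JOIN {child} ON {parent}.{key} = {child}.{key}"
    return sql

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER, dname TEXT);
CREATE TABLE emp  (empno INTEGER, deptno INTEGER);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Audit');
INSERT INTO emp  VALUES (10, 1);
""")

# The structure to be joined is decided only now, at run time.
query = build_query("dept", [("emp", "deptno")])
rows = conn.execute(query).fetchall()
print(query)
print(rows)
```

The same helper could be called with a different parent or child list on each
request, which is the ad hoc quality the definition describes.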

Edge table: An edge table is a table structure that defines a specific tree struc-
ture. Each row defines a node type of the structure, as well as its parent and
child node types.
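
One possible layout for an edge table is sketched below with Python's sqlite3
module. It assumes a simplified form in which each row stores a node type and
its parent node type, and child node types are derived by lookup; the table
name edge and the node names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (node TEXT PRIMARY KEY, parent TEXT);
INSERT INTO edge VALUES
  ('Dept', NULL), ('Emp', 'Dept'), ('Proj', 'Dept'), ('Dependent', 'Emp');
""")

def children_of(node):
    """Return the child node types of `node`, in name order."""
    cur = conn.execute(
        "SELECT node FROM edge WHERE parent = ? ORDER BY node", (node,))
    return [row[0] for row in cur]

def path_to_root(node):
    """Follow parent links upward; a hierarchy has exactly one such path."""
    path = [node]
    while True:
        row = conn.execute(
            "SELECT parent FROM edge WHERE node = ?", (path[-1],)).fetchone()
        if row is None or row[0] is None:
            return path
        path.append(row[0])

print(children_of("Dept"))
print(path_to_root("Dependent"))
```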

EII: Enterprise information integration seeks to avoid moving large amounts
of data by dynamically modeling and accessing virtual or federated databases in
real time. Other advantages include fresher data, access to real time data, and
the ability to perform unanticipated queries. For an alternative approach, see
ETL.

Element: XML elements define data in two ways, using a start and stop/end
tag name that contains a text string that can also contain subelements, and also
through attributes that are name and value pairs. Either or both can be used
unless restricted by a schema. The tag names can be used to name the data val-
ues or act as markup in the text.

Element-based or element content: With element-based content, only XML
element text is used to specify data; no attributes are used.

Element content model: See Content model.

Embedded structure: An embedded structure is a logical or physical structure
or fragment that is contained within a logical structure.

Embedded views: Embedding SQL views is the capability to nest views by
placing views within views. This nesting capability seamlessly supports
hierarchical structured views containing data structures that have been defined
by the outer join operation. When expanded, the SQL will automatically define
the combined unified outer join structure.
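
The nesting can be sketched with Python's sqlite3 module as follows. The
tables and the view names emp_view and dept_view are assumptions for this
example; the sketch shows one view defined with LEFT JOIN being embedded in
another so that the combined Dept/Emp/Dependent structure is queried as a unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept      (deptno INTEGER, dname TEXT);
CREATE TABLE emp       (empno INTEGER, deptno INTEGER, ename TEXT);
CREATE TABLE dependent (depname TEXT, empno INTEGER);
INSERT INTO dept      VALUES (1, 'Sales'), (2, 'Audit');
INSERT INTO emp       VALUES (10, 1, 'Ann'), (11, 1, 'Bob');
INSERT INTO dependent VALUES ('Cy', 10);

-- Lower structure: Emp over Dependent.
CREATE VIEW emp_view AS
  SELECT emp.empno, emp.deptno, emp.ename, dependent.depname
  FROM emp LEFT JOIN dependent ON emp.empno = dependent.empno;

-- Upper structure embeds the lower view: Dept over (Emp over Dependent).
CREATE VIEW dept_view AS
  SELECT dept.dname, emp_view.ename, emp_view.depname
  FROM dept LEFT JOIN emp_view ON dept.deptno = emp_view.deptno;
""")

rows = conn.execute(
    "SELECT dname, ename, depname FROM dept_view ORDER BY dname, ename"
).fetchall()
print(rows)
```

The childless department and the employee without dependents both survive in
the result, padded with nulls, because every level is joined with LEFT JOIN.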

Empty element tag: An empty element tag represents an element declaration
that cannot contain data because it has no start and stop tag. This is different
from a regular element instance that does not contain any data. These empty
element tags are often used as flags and may be treated differently than an
element that contains no data. An empty element tag is represented as
<tagname/>.

End tag: An end tag is a matching tag for an XML start tag represented as
</tagname>. It closes the definition of the current element occurrence.

Enterprise access: Enterprise access is the ability of an application or database
system to access all of the databases in the corporate enterprise regardless of the
database types or database locations involved.

Enterprise data: Enterprise data is data that is used or can be used across the
entire corporation.

Enterprise modeling: Enterprise modeling is the development of a consistent
view of the data and its relationships across the enterprise.

Entity: An XML entity specification operates like an include operation that
brings different forms of data, such as text or pictures, into a document. For
text, this is useful for boilerplate material that exists in multiple locations in a
single document occurrence, or exists in many document occurrences or docu-
ment types. For use in a single document occurrence, the text can be defined
once in the document and referred to multiple times.

Entity relationship diagram: An entity relationship diagram is a network
structure diagram that depicts all of the data entities, their relationships, and
their relationship types (i.e., one-to-many, many-to-one, and many-to-many)
in a database.

Equal join: An equal join is a relational join that uses an equality operation to
relate the tables. An equal join is also known in relational terms as an equijoin.

Equijoin: An equijoin is a fancy term for an equal join, which is a relational
join that uses an equality operation to relate the tables.

ETL: Extract, transform, and load (ETL) utilities access, convert, and load
massive amounts of data. The newest ETL products are designed to
convert and move relational data sources to XML sources, and XML sources to
relational sources. This involves shredding (flattening) the XML data.

Existential qualifier: An existential qualifier is an existence test, such as IF
ANY NodeX ... This operation can become more controlled and useful with
hierarchical processing by specifying the upper hierarchical structure bound,
such as IF NodeY HAS ANY NodeX.

Expanded views: Expanded views are embedded stored views whose name
reference is replaced with its representative source code so that the query can be
processed against its expanded source code. When structured views are
expanded, they automatically form a unified hierarchical view that uniformly
models the hierarchical structure being processed.

Extended Cartesian product: A relational Cartesian product produces all
combinations of rows from two relational tables. An extended Cartesian product
operates by augmenting each table with an all-null row that is joined when
no other row is matched while performing the Cartesian product. The result of
this extended operation reflects the operation and semantics of outer join
operations.

External view: An external view is one of the three types of views that com-
prise the three tier model for database architecture. These are the internal,
external, and conceptual views. The external view is the view that the application
and its users have of the database. For this reason, it is also
known as the application view. With application views, applications can share
views and databases can support many views.

Federated database: A federated database accesses the data from other data-
bases when the data is needed. This is the opposite of a centralized database
system. Also see Disparate heterogeneous database access.

Field: A field in relational terms is a column in a table. It holds an atomic
value.

First normal form: First normal form doesn’t permit relational tables to con-
tain repeating data types or groups in a single row. Repeating data should be
placed in another table where each occurrence of the repeating data is placed in
a different row. This allows a table to be a flat, two-dimensional structure. First
normal form is not a prerequisite for good database design; it is only required
for relational databases and their flat tables. Also see Nonfirst normal form.

Fixed-occurring fields: Fixed-occurring fields are data fields that can occur
multiple times in a record. They are fixed because the amount of space required
to contain them is reserved in the record whether it is used or not. This means
that a fixed-occurring field can contain a variable number of data fields, but is
still considered fixed because it always uses the same fixed amount of storage
space and cannot exceed the maximum space allocated for it.

Flat file: A flat file is a file that has the same fixed, unvarying format for each
record. It has no variable-occurring fields, but can have fixed-occurring fields.
In this way, each record is of the same length. A flat file can be thought of as a
relational table, with each of its fixed records as a row of the table.

Flat structure: A flat structure is a two-dimensional data structure. It has no
variable-occurring fields, but can have fixed-occurring fields. In this way, each
record is of the same length.

Flattening: Flattening a data structure means taking a multilevel structure,
such as a hierarchical structure, and converting it into a flat, two-dimensional,
first normal form table or rowset. A side effect of this flattening is losing data
structure information and introducing replicated data values to fill out the flat
structure.
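
The side effect described above can be seen in a small sketch. The nested
dictionary below is a hypothetical department with two independent child
paths (employees and projects); the flatten function is an illustrative
helper, not an algorithm taken from the book.

```python
# Hypothetical hierarchical occurrence: one Dept with two sibling paths.
dept = {"dname": "Sales", "emps": ["Ann", "Bob"], "projs": ["P1", "P2"]}

def flatten(d):
    """Convert one nested department into flat (dname, emp, proj) rows.

    A missing child list is padded with a single None so the parent survives.
    """
    emps = d["emps"] or [None]
    projs = d["projs"] or [None]
    return [(d["dname"], e, p) for e in emps for p in projs]

flat = flatten(dept)
for row in flat:
    print(row)
```

The two-level structure becomes four rows: the department name is replicated
in every row, each employee appears once per project, and nothing in the flat
rows records that employees and projects were unrelated sibling paths.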

Flwor: Pronounced “flower,” flwor is an XQuery operational construct for
performing iterative operations and procedure-like programming. The acronym
stands for: “for, let, where, order by, return.” Join operations are specified
this way.

Focused retrieval and result aggregation: Automatically isolates and returns
only the most desired data in a query or search and condenses the results.

Foreign key: A foreign key is an alternate key in one or more tables that
relates to a primary key in another table, creating either a one-to-many or
many-to-one relationship.

Forest: A collection of separate trees in XML.

Four value logic: With XML, there can be four values of logic: true, false, no
value, and empty value. This is because an element can have no value specified
or it can be specified as an empty element. The no value and empty value can
both be interpreted differently by the application.

Fourth-generation language: A fixed form language for nonprocedural
specification of queries. See Nonprocedural language.

FTP: File Transfer Protocol is a standard Internet protocol for exchanging
files over TCP/IP.

FULL join: A FULL join is an outer join type that preserves data on both
sides of the join operation when rows are not matched up. Unmatched rows are
padded with null values. This does not model a hierarchical structure; it models
a flat structure because a FULL join is a symmetric operation. These can be
incorporated into a hierarchical structure as a single logical node comprised of
two or more FULL joined tables or nodes.

Functional language: A software language that is expression-based, such as
XQuery.

Global view: A view that encompasses the entire physical structure and could
also include smaller views with no overhead for using the oversized view. This is
used to support the single view concept where larger views can be used without
efficiency concerns. This avoids the need to have many specialized views tai-
lored to a subset of the global view, making global views more user-friendly and
always efficient.

Graph: A tree is a directed graph where the direction is from the root down.

Guided navigation: Guided navigation breaks up search results into multiple
categories, allowing the user to drill down their search results based on those
categories. This query operation is also known as faceted query or parametric
search.

Heterogeneous database access: Heterogeneous database access is the accessing
of databases from different vendors, and can consist of different types of
databases, as if they were one logical unified database.

Hierarchical data filtering: Hierarchical data filtering requires hierarchical
proximity processing. This is also known as the XML keyword problem. This
problem and complexity arises when multiple filtering conditions are placed in
different pathways of the hierarchical structure. This is solved with LCA pro-
cessing that SQL hierarchical processing can perform automatically.

Hierarchical data preservation: Hierarchical structures preserve their structure
hierarchically. This is because parent nodes can exist without any children
nodes. This means that when a node data type is deleted, all children data types
below the deleted node type are also deleted in what is known as a cascading
delete.

Hierarchical data semantics: Structured data can be organized by using data
nodes that are hierarchically structured or connected. The data nodes are all
related hierarchically to each other through their semantic relationships (mean-
ing). These are fixed or logical relationships and have meanings that also follow
hierarchical principles. These can be utilized to automatically process queries
against the data correctly.

Hierarchical data structure: Hierarchical data structures are multilevel data
structures where the tables or nodes at each level only have one parent. This
means that the tables have only one pathway leading to them from the next
higher level table above them, which results in hierarchical structures only hav-
ing a single path from the root of the structure to any data item, making their
semantics unambiguous and powerful. Childless parent nodes are preserved,
which is also of great importance. In general and in this book, unless told other-
wise, each node type (name, Id) in the structure must be unique. This gives the
structure special and specific characteristics enabling special processing to be
performed on the structure.

Hierarchical distributed processing: This is the process or capability of easily
performing distributed hierarchical processing across multiprocessors and/or
multiple systems.

Hierarchical join: As used in this book, a hierarchical join means that
hierarchical structures are joined hierarchically, one above the other. The
structures properly combine into a larger hierarchical structure with the correct
combined hierarchical structure. One-sided LEFT or RIGHT joins can be used to
perform hierarchical joins. LEFT outer joins are hierarchically easier to work
with because they are combined in a left-to-right order that follows a
top-to-bottom hierarchical direction.

Hierarchical optimization: Unreferenced portions of hierarchical structures
do not need to be accessed and will not change the semantics of the query. This
powerful semantic hierarchical optimization can be applied to hierarchical SQL
views. This optimization is generally overlooked by relational optimizations
because hierarchical structures are not recognized by the optimizer or relational
engine.

Hierarchical structure mashup: The term “structure mashup” is used in this
book to describe joining multiple hierarchical structures without the restriction
that the lower structure must be joined at the root. There are far fewer
restrictions on how they are joined.

Hierarchical query: A query applied against a hierarchical structure and
processed hierarchically. Hierarchical queries with multiple pathways require
special hierarchical processing. See LCA processing.

Hierarchical relationships: Parent, child, siblings and cousins are hierarchical
relationships that exist between nodes of a hierarchical structure. Cousins exist
on different paths of the structure and are related by a common ancestor.

Hierarchical processing: The term hierarchical processing, as used in this
book, is the processing of hierarchical modeled structures so that the useful
semantics of these structures are utilized. This means that the SQL processing
of hierarchical modeled relational and nonrelational data can be performed in
nonfirst normal form to avoid flattening the data structures, which would cause
semantic information loss.

Hierarchictivity: Hierarchictivity is a term coined in this book to describe
transformational principles of hierarchical structures that are not covered fully
by commutative and associative principles. These apply to hierarchical seman-
tic principles, such as the capability to reorder join operations without chang-
ing the semantics that are not fully attributable or accountable to commutative
and associative principles.

HTML: Hypertext Markup Language is used for formatting a web page for
output. Its tags are fixed. You could say that it is an XML vocabulary for web
output.

Hyperlink: On the Internet or other hypertext systems, hyperlink is a synonym
for both link and hypertext link.

Hypertext: Hypertext is the organization of information units into connected
associations that a user can choose to make. An example of such an association
is called a link or hypertext link.

IDREF: An XML keyword that references another element, making a logical
pathway to it. This will most likely create a network structure logical connection
because it supplies a secondary access path to a data value that already has
an access path to it, referred to here as shared element data.

Implicit natural join: An implicit natural join is a term used in this book for
ANSI-92 natural joins that are specified by replacing the ON clause with the
USING clause, which implies that a natural join is to be performed, hence the
use of the term implicit.

IMS: IMS is IBM’s hierarchical database management system, which is still a
popular legacy system in wide use. It can also be used as a general term for
information management system. Hierarchical processing can also access IMS
navigationlessly.

Inner join: The inner join is the standard default join. It does not preserve
unmatched data rows under any circumstances. It is a symmetric join operation
and therefore models only a flat structure.

Interactive: Using dynamic processing to support real-time interaction or ad
hoc processing.

Intersecting data: Intersecting data is additional data that is stored in an
association table along with the association data. An association table holds the
relationships between two tables which have a many-to-many relationship, such as
Parts/Suppliers. The intersecting data is uniquely related to the associated data
in each row at the intersection point that has a specific and unique meaning. An
example of intersecting data is the price of a part from a specific supplier that
could be different from a different supplier.

Inverted index: An inverted index is an index wherein every data item is
indexed. This enables many capabilities where complex queries can be
answered by just processing the index.

Irregular data structure: An irregular data structure is a data structure that
does not follow standard conventional formatting rules, such as a semistructure
dynamically varying format capability.

Join operation: Relational tables are joined across their rows creating a larger
wider table.

Join table order: The table join order can be specified in the outer join state-
ment. This table join control is important in some outer join operations where
it can influence the result.

Join table reordering: Join table reordering is the process of altering the table
join order to optimize the execution of outer joins. This cannot be done indis-
criminately because changing the table join order can affect the results of the
outer join operation. Analyzing the data structures defined by the outer join
operation and understanding its semantics is one way of determining when and
how table join order can be optimized without changing the result.

Late binding: Late binding with outer join data modeling is the ability of the
database application to accept different data structures that can be specified at
run time.

LCA: See Lowest common ancestor.

LCA query: An LCA query is a multipath query using and requiring lowest
common ancestor logic in order to process hierarchical structures.

Left join: The LEFT join operation is an outer join that preserves unmatched
rows from the table specified on the dominant left side of the join operation. It
is a natural hierarchical operation that allows the hierarchical structure to be
built from the top to the bottom. The RIGHT outer join preserving data on
the right builds hierarchical structures from the bottom upward. The LEFT
outer join is easier and natural to use because it progresses naturally from the
left to the right in the same direction as its execution.
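
A short sketch with Python's sqlite3 module contrasts the LEFT join with the
inner join on the same data. The dept and emp tables are assumptions made up
for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER, dname TEXT);
CREATE TABLE emp  (ename TEXT, deptno INTEGER);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Audit');  -- Audit has no employees
INSERT INTO emp  VALUES ('Ann', 1);
""")

# Inner join: the unmatched Audit row is lost.
inner = conn.execute("""
    SELECT dname, ename FROM dept
    JOIN emp ON dept.deptno = emp.deptno
    ORDER BY dname
""").fetchall()

# LEFT join: the parent (left) side is preserved and padded with NULL.
left = conn.execute("""
    SELECT dname, ename FROM dept
    LEFT JOIN emp ON dept.deptno = emp.deptno
    ORDER BY dname
""").fetchall()

print(inner)
print(left)
```

With dept on the left as the parent, the childless Audit department is kept,
which is the behavior a hierarchical structure needs from top to bottom.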

Leaf node: A leaf node is a terminal node, that is, a data node with no child nodes.

Left-sided nesting: Left-sided nesting is the natural, intuitive way of specifying
more than two tables in the join specification. Additional tables are introduced
from the left side. Its name is derived from the way outer join views are
expanded when introduced from the left side.

Leg: A leg is a path in the data structure, including the data that is stored
along its path. This is an older name for what is simply called a “path” today.

Legacy data/database: Legacy database applies to all prerelational data and
databases that are still in existence or prerelational database systems that are still
in operation.

Link points: Link points mentioned in this book are the connection node
points for joining hierarchical structures, one in the upper and one or more in
the lower data structure. They are connected by a pathway when the data struc-
ture is being built. This occurs using the outer join operation and its ON clause
join specification which specifies the link points.

Linked data: Linked data refers to connecting related data that was not previ-
ously linked. It also can involve using the Internet to help increase the capabili-
ties of linking data together.

Linking: Linking is the process defined in this book for specifying a pathway
between two hierarchical structures that controls how the two structures are
joined into a single hierarchical structure, properly preserving and combining
the semantics.

Lists: Lists are assumed ordered, while sets are assumed unordered. XML data
is assumed ordered, while relational data is assumed unordered.

Logical data structure: A logical data structure is a data structure or model
with logical linkages that rely on matching data values. These linkages can be
made dynamically. An example is relational databases. Physical structures can
also be logically linked into a larger structure that would be a hybrid structure;
this could also be classified as a logical data structure.

Logical hierarchical structure: Logical and hierarchical structures do not
seem to go together. Hierarchical structures are still thought of as fixed
structures. This is not true today with SQL hierarchical processing. Hierarchical
structures today can be comprised logically of fixed structures and fixed hierar-
chical structures, as well as logical hierarchical structures. Logical hierarchical
structures can be dynamically built and hierarchically processed by the query
for the duration of the query. This is an extremely powerful new operation that
dynamically increases data value on a query by query basis. It is also efficient in
data storage that is naturally freed after the query completes.

Logical table: A logical table, as used in this book, is a series of flat structures
joined together that represent a single flat structure that is a single node in the
overall hierarchical structure being modeled. This logical flat structure is mod-
eled using INNER or FULL outer joins that are symmetric join operations that
model flat structures. This also enables INNER and FULL outer join opera-
tions to be used in the modeling of hierarchical structures.

Lossless integration: A data integration process that does not lose any data
information.

Lossless join: A join of a nonloss, or lossless, decomposition that results in
the same relation that was decomposed.

Lost data: See Missing data.

Lowest common ancestor (LCA): A lowest common ancestor (LCA) refers
to the next higher level node in the data structure that is a common link point
of two sibling paths of a data structure. The lowest common ancestor plays an
important role in determining the semantics across sibling paths of the hierar-
chical structure. It keeps multipath query requests meaningful. It is also known
as nearest or closest common ancestor.
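
A minimal sketch of finding the lowest common ancestor in a small hierarchy.
The parent map and node names are hypothetical, and this sketch treats a node
as its own ancestor, so a node and one of its descendants resolve to the
higher node.

```python
# Hypothetical structure: Dept is the root with two sibling paths.
parent = {
    "Dept": None,
    "Emp": "Dept",
    "Proj": "Dept",
    "Dependent": "Emp",
    "Task": "Proj",
}

def ancestors(node):
    """Return the node followed by its ancestors, nearest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain

def lowest_common_ancestor(a, b):
    """Return the first node on b's upward path that also lies on a's path."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None

print(lowest_common_ancestor("Dependent", "Task"))
print(lowest_common_ancestor("Dependent", "Emp"))
```

A query that relates Dependent and Task data is therefore meaningful only
within the same Dept occurrence, which is the role the definition gives the LCA.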

Lossless process: Any process that has not lost any information, including
semantic information.

Many-to-many relationships: Many-to-many relationships are relationships
that are used in data modeling where both sides of the relationship can have
multiple occurrences. The classic example is a Parts-Suppliers relationship
where one part can be carried by multiple suppliers and one supplier can carry
multiple parts. M-to-M relationships (also known as M-to-N relationships) in
relational databases require an association table to maintain the M-to-M rela-
tionships. These association tables can also contain intersecting data, such as
price of a specific part from a specific supplier.
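
The Parts-Suppliers example can be sketched with Python's sqlite3 module.
The table and column names (part, supplier, part_supplier, price_cents) are
assumptions for this illustration; the price is the intersecting data held in
the association table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part     (pno INTEGER PRIMARY KEY, pname TEXT);
CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT);

-- Association table: one row per (part, supplier) pairing,
-- carrying the price as intersecting data.
CREATE TABLE part_supplier (
    pno INTEGER REFERENCES part(pno),
    sno INTEGER REFERENCES supplier(sno),
    price_cents INTEGER,
    PRIMARY KEY (pno, sno)
);

INSERT INTO part     VALUES (1, 'Bolt'), (2, 'Nut');
INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Best');
INSERT INTO part_supplier VALUES (1, 1, 10), (1, 2, 12), (2, 1, 5);
""")

# The same part carries a different price from each supplier.
bolt_prices = conn.execute("""
    SELECT s.sname, ps.price_cents
    FROM part p
    JOIN part_supplier ps ON ps.pno = p.pno
    JOIN supplier s       ON s.sno = ps.sno
    WHERE p.pname = 'Bolt'
    ORDER BY s.sname
""").fetchall()
print(bolt_prices)
```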

Many-to-one relationships: Many-to-one relationships are relationships used
in data modeling where the upper level of the relationship has many occurrences
and the lower level has only a single occurrence. The classic example is
the Employee-to-Department relationship where many employees can have the
same department.

Markup data: Markup data contains markup elements that are used to indi-
cate markup indicators in the text. The rules for markup elements allow them
to be freely nested in any fashion as you would need for markup data. However,
this has no meaning for representing hierarchical data structures. If possible,
the entire markup text should be defined as CDATA and processed separately.

Markup element: A markup element occurs when an XML element tag is
used as an inline text markup indicator and not an element tag that defines a
piece of text as a data field. This overdefining of uses does present a problem of
being able to determine when an element tag is used for data definition or
markup use. Markup requires mixed content.

Marshalling: Marshalling is a term that loosely means “moving data around,”
which may require some conversion. It is quite frequently used with ETL
products.

Materialized view: A materialized view is the process of generating the view’s
logical value as a temporary table to replace the view in the processing of a
query.
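
The idea can be approximated by hand with Python's sqlite3 module, as in the
sketch below. This is only an analogy: the definition describes the engine
materializing a view internally during query processing, whereas this example
explicitly copies a view's result into a temporary table. The names
dept_count and dept_count_mat are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (ename TEXT, deptno INTEGER);
INSERT INTO emp VALUES ('Ann', 1), ('Bob', 1), ('Cy', 2);

CREATE VIEW dept_count AS
  SELECT deptno, COUNT(*) AS n FROM emp GROUP BY deptno;

-- Materialize the view's current value as a temporary table.
CREATE TEMP TABLE dept_count_mat AS SELECT * FROM dept_count;

-- A later change is seen by the view but not by the stored copy.
INSERT INTO emp VALUES ('Di', 2);
""")

live = conn.execute(
    "SELECT deptno, n FROM dept_count ORDER BY deptno").fetchall()
snapshot = conn.execute(
    "SELECT deptno, n FROM dept_count_mat ORDER BY deptno").fetchall()
print(live)
print(snapshot)
```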

Mediator: A mediator is a product term used more frequently with earlier
semistructure research to define a query processor that converts or integrates
data between two different models, such as relational and XML.

Meta information: See Metadata.

Metadata: Metadata, also known as meta information, is information about
information. Semistructured data, such as XML, embeds metadata within its
data. When used with data structures as in this book, it pertains to information
about the data structure, such as its description: data type, length, and number
of data occurrences.

Middleware: Middleware is software that sits between the user and user inter-
face, or between the user interface and the database, which adds value to the
data.

Missing data: Missing data, also known as lost data, is the data that is lost in
an inner join when rows of the tables being joined do not match with any other
rows. Missing data can also occur with one-sided joins on the side that is not
being preserved. This definition ignores all the other reasons for missing data.

Mixed content: XML mixed content can contain attributes, elements, and
text.

Multileg: See Multipath.

Multipath: Multipath is a newer term used to indicate a data structure that has
multiple legs. Hierarchical legs and paths represent the same thing.

Multipath processing: Multipath processing involves hierarchical processing
across multiple paths of the hierarchical structure that is being processed.

Multipath query: Multipath querying involves hierarchical logic across
multiple paths of the hierarchical structure that is being queried.

Multipath semantics: There are semantics (semantic meaning) between every
node type in a hierarchical structure. The semantics between nodes on the same
path (parent/child or ancestor/descendent) are well-known and understood.
The semantics between nodes on different paths of the structure are referred to
here as multipath semantics, which are more complex to process and require
LCA logic. These multipath-related nodes are known as cousins.

Multipath structure: A multipath structure is a complex hierarchical structure
with multiple paths. If any node or table in a hierarchical structure has
more than one pathway exiting it, it defines more than one path. Multiple
paths in hierarchical data structures significantly increase the semantics and
complicate their operational principles. This is why multipath structures are
considered complex structures in this book.

Multiple positioning: Multiple positioning is the powerful capability to keep
multiple positions set in a hierarchical database while navigating the database in
order to avoid having to reposition back and forth.

Native XML: Native XML refers to the actual XML, with its embedded
metadata, and not data that has been extracted and isolated from XML.

Natural standard join: A natural operation is applied to a join operation,
which causes the natural standard join’s common named join key values to be
coalesced into a single value in the result. This is very useful for FULL joins
when one key of the two join keys may be null so that a key value is always
available in the same location in the row.

Navigation: Database navigation is the process of positioning to any record
in the database structure. Not all databases support procedural navigation; for
example, relational databases are navigationless (self-navigating) and operate
transparently. The ANSI-92 SQL outer join arguably does allow some level of
procedural navigation because its join order can affect the result.

Navigationless: Fourth-generation languages such as SQL, by their very
nature of being declarative languages, are navigationless. This means the user
does not need to specify the database navigation or direct access. Even when
SQL’s processing is naturally raised to a hierarchical processing level, it remains
navigationless. This keeps its hierarchical processing seamless and transparent.
Hierarchical structures can be automatically navigated because there is only one
path between each node. This means that they are unambiguous for query
processing.

NCA: Nearest common ancestor, see Lowest common ancestor (LCA). LCA is
used in this book.

Nested display: A nested display is one in which the data is displayed in a
“what you see is what you get” (WYSIWYG) structured format. This format
preserves the data structure and its semantics so that the data and its structure
can be displayed intuitively.

Nested relational processing: Nested relational processing is a post-relational
type of processing in which columns in relational databases can contain
multiple values,
tables, and even hierarchical structures. Nested relational processing can pro-
cess these embedded structures hierarchically.

Nested structures: Nested structures are hierarchical structures whose
hierarchical structure is represented physically and contiguously by nesting the
hierarchical data. XML is an example of this.

Nested tables: See Nested relational processing.

Node: A node is a third normal form collection of closely-related data connected
in a graph or tree structure. For SQL this consists of data rows of a table.
For XML data, it is element data. In many legacy hierarchical data systems,
these are known as data segments.

Node collection: When node promotion happens on multiple paths under a
common ancestor, the descendent nodes from the different paths are collected
under the common ancestor.

Node definition, node declaration, or node type: This refers to the defini-
tion of a node in the structure and not a data occurrence of the node.

Node occurrence or node instance: This refers to a single data occurrence of a
node and not the definition of the node in the structure.

Node promotion: When a defined node in the structure has not been picked
for data selection (no data projection, node exclusion) from it, it is not placed
in the output structure and its selected descendent nodes are moved up the path
around it to their next selected ancestor node.
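
Node promotion can be pictured with a small Python sketch. The node names and the `promote` helper are hypothetical, chosen only for illustration: each selected node is attached to its nearest selected ancestor, and unselected nodes drop out of the output structure.

```python
# Hypothetical structure: each node type mapped to its parent node type.
PARENT = {"Dept": None, "Emp": "Dept", "Dpnd": "Emp", "Proj": "Dept"}

def promote(parent, selected):
    """Map each selected node to its nearest selected ancestor (or None)."""
    output = {}
    for node in parent:
        if node not in selected:
            continue  # unselected nodes are not placed in the output structure
        ancestor = parent[node]
        while ancestor is not None and ancestor not in selected:
            ancestor = parent[ancestor]  # move up the path around unselected nodes
        output[node] = ancestor
    return output

print(promote(PARENT, {"Dept", "Dpnd", "Proj"}))
# {'Dept': None, 'Dpnd': 'Dept', 'Proj': 'Dept'}
```

Because Emp is not selected, Dpnd is promoted to Dept, where it is collected alongside Proj under the common ancestor.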

Node table: See Edge table.

Node type: See Node definition.

Nonfirst normal form: In relational terms, nonfirst normal form means that
tables can support structured or nested data with repeating data (multiple
occurrences of data in a single column). This form of relational data can be
processed by a nested relational processor. The first normal form requirement is
not a requirement for good database design or even a relational requirement; it
is a requirement imposed by SQL and its requirement for two-dimensional
tables.

Nonlinear hierarchical processing: Nonlinear hierarchical processing is the
multipath processing of hierarchical structures involving multiple paths in their
processing.

Nonlinear hierarchical semantics: Nonlinear semantics are the semantics
between the different paths of the hierarchical structures.

Nonprocedural language: Nonprocedural languages are also known as
fourth-generation languages or declarative languages. The term declarative
language comes from the fact that, with nonprocedural languages, it is not
necessary to specify how to perform a task; it is only necessary to specify what
you want the task to accomplish. The advantages are that such a language is
easier to specify, it is automatically logically correct, and it can be better
optimized for access because it can be globally optimized.

Nonrelational database: A nonrelational database is any database that is not
a relational database. These include legacy and post-relational databases.

Normalization: Normalization is the process of designing a database following
at least the first three relational normalization rules for good relational
database design. All of the normalization rules require or rely on breaking the data
apart and storing the data in multiple tables to increase its data independence.
The join operation is used to combine the data back together when and as it is
needed.

Null: Nulls are padding values that are used to represent missing data in outer
join results. Nulls are also used to represent unknown values when data is
entered into a relational table.

Object relational mapping: A form of modeling one-to-one, one-to-many,
and many-to-many relationships with a relational database to model a
hierarchical object. Unique keys are required to be used as object IDs.

ODBC: ODBC is the open database connectivity API standard put forth by
the Microsoft Corporation. It uses SQL as the database interface language.

OID: An OID is an object identifier used in object programming and languages,
but the term could be used elsewhere. XML uses this term and concept with
its IDREF and ID= keywords. An OID is the unique name of a data object
that is useful for referencing objects. In object languages and programming,
every object should have an OID assigned.

ON clause: The ON clause is used with the ANSI-92 outer join operation to
specify the join criteria for each table being joined in the join specification. The
ON clause does supply greater control over outer joining tables than is possible
through a single WHERE clause. This proves that it has usefulness over the
WHERE clause and is also crucial to performing outer join data modeling.

ON clause filtering: The ON clause is used with the ANSI-92 outer join
operation to specify the join criteria for each table being joined. However, it
can also specify hierarchical data filtering, which allows for more control and a
more precise level of data filtering than if specified on the WHERE clause.
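
The difference between ON clause filtering and WHERE clause filtering can be seen in a small sketch. It uses Python's built-in sqlite3 module only as a convenient way to run standard LEFT JOIN syntax; the dept and emp tables and their values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, deptno INTEGER, sal INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp  VALUES (1, 10, 900), (2, 20, 1500);
""")

# Filter on the ON clause: the failing employee is null-padded, the department stays.
on_filtered = con.execute("""
    SELECT d.dname, e.empno
    FROM dept d LEFT JOIN emp e
      ON d.deptno = e.deptno AND e.sal > 1000
    ORDER BY d.deptno
""").fetchall()

# Same filter on the WHERE clause: the entire row is removed.
where_filtered = con.execute("""
    SELECT d.dname, e.empno
    FROM dept d LEFT JOIN emp e ON d.deptno = e.deptno
    WHERE e.sal > 1000
    ORDER BY d.deptno
""").fetchall()

print(on_filtered)     # [('Sales', None), ('Research', 2)]
print(where_filtered)  # [('Research', 2)]
```

With the ON clause, the Sales department is preserved and only its employee data is filtered out. With the WHERE clause, the Sales row disappears altogether.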

One-sided join: The one-sided join is the LEFT or RIGHT join. These are
known as one-sided joins because they preserve data only on one side, the dom-
inant side.

One-to-many relationships: One-to-many relationships are relationships in
data modeling where the upper level of the relationship has only one
occurrence, and the lower level has many related occurrences. The classic example of
this is the Department-to-Employee relationship where each department can
have many employees.

Open database interface: An open database interface is a database interface
that is freely available to all potential users and supplies access to most common
database types.

Ordered data: Most data systems are either ordered or unordered systems by
default. XML by default is ordered, and assumes that the data is ordered. This
is probably because XML was first a markup language where order is crucial.
SQL is unordered by default. The SQL row order has no significance and rows
can be returned in any order unless explicitly ordered. Ordered data are lists,
and unordered data are sets.

Orthogonal: “Orthogonal” is a term used to indicate that a feature or capability
does not impose restrictions or limitations on normal processing.

Outdegree: The outdegree is the number of paths exiting a node.
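
As a minimal sketch, the outdegree can be computed from a map of each node type to its child node types. The structure below is hypothetical; a structure is multipath as soon as any node has an outdegree greater than one.

```python
# Hypothetical structure: each node type mapped to its child node types.
CHILDREN = {
    "Dept": ["Emp", "Proj"],
    "Emp": ["Dpnd"],
    "Dpnd": [],
    "Proj": [],
}

def outdegree(node):
    """Number of paths exiting `node`."""
    return len(CHILDREN[node])

def is_multipath(children):
    """True when at least one node has more than one path exiting it."""
    return any(len(kids) > 1 for kids in children.values())

print(outdegree("Dept"), outdegree("Emp"), outdegree("Dpnd"))  # 2 1 0
print(is_multipath(CHILDREN))  # True
```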

Outer join: The outer join operation is used to preserve data that doesn’t find
a match in a join operation in order to preserve dangling tuples (partial rows).
There are basically FULL outer joins that preserve data on both sides of the
join, and one-sided outer joins that preserve data only on one given side known
as LEFT or RIGHT joins.

P2P: Peer-to-peer networks eliminate the need for servers and allow all com-
puters to communicate and share resources as peers.

Parallel processing: Parallel processing allows concurrent processing.
Hierarchical structures naturally support parallel processing.

Parent: A parent is the next higher level table, or node, in the data structure
that follows the path upward. In a hierarchical structure, parents are important
because their children cannot be created without them.

Path or Pathway: A path is a series of connected nodes in a data structure. In
a relational database, these nodes are tables, whereas in a nonrelational
database, they can be flat files or segments. Paths have also been known as legs.

Path qualification: Path qualification is when the join conditions of ON
clauses also reference higher level tables or nodes that are further up the path
from the link point of the upper level structure being joined. This adds addi-
tional qualifications to the active join operation based on the path that has
already been established above the table or structure being joined.

Path shortening: See Dynamic path shortening.


Pathway: A pathway is a path leading from one node to another node in a
hierarchical structure. These can be defined by an outer join operation. No two
pathways can lead to the same lower level table in a hierarchical structure. This
has also been known as a leg.

PCDATA: XML PCDATA is standard element text that will be parsed.

Persistent data: Persistent data is data that is created and remains after the
operation that created it and is available for reuse.

Physical data structure: Nodes that comprise physical databases are con-
nected by physical address links (such as IBM’s hierarchical IMS database) or
juxtaposition, proximity, or nesting (such as XML).

Polymorphic transform: A polymorphic transform is a transform where the
structure of the input does not need to be known. Any structure using the
same input names can be used on input.

Post-relational databases: Post-relational databases are the next generation of
relational databases, those with extended relational features, such as nested
relational processing.

Predicate: Predicate is an expression used as a data filter in a query.

Primary key: A primary key is a database key that uniquely identifies a record
or a row in a file or table and is usually required.

Procedural language: A procedural language is another name for a
third-generation language. With procedural languages, you have to procedurally specify
or code how to perform the programming task you want performed.

Projection: Relationally selecting database data for output is referred to as
projection. Projection controls which data items and node types of the
processed structure are output. This operation does not affect other data that has
not been selected for output.

Pseudocode: Pseudocode is high-level code that is used in some of the examples
in this book that may not be totally complete or accurate, but is complete
enough to easily convey the principles that are being demonstrated. Being
generic, it uses letters of the alphabet to represent data field names, such as:
SELECT A.a, B.b, C.c FROM XYZ WHERE D.d=X.x.

Query rewrite/rebuild: See Dynamic rewrite/rebuild.

RDF: Resource description framework is an XML application, providing a
mechanism to exchange metadata.

Reachable: When a data item or node is accessible from a given path.

Read-ahead: Read-ahead is a database access optimization technique that
reads data before it is required to take advantage of current access optimization
opportunities that may not be available when the data is required.

Real-time data: Up-to-the-second fresh data.

Record: A database record is comprised of all node occurrences from the root
node occurrence down.

Recursive structures: XML supports recursive structures, where the same
element node type, or sequence of node type specifications, in a path can be
specified again in the same path in the structure, causing a circular definition. This is
used in structures to explode compound objects, such as parts, that can consist
of other parts that are repeated until their atomic parts are reached.
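
The parts explosion described above can be sketched with a short recursive function. The bill of materials below is hypothetical and is held in a plain Python dictionary rather than XML, purely to keep the illustration self-contained.

```python
# Hypothetical bill of materials: each part mapped to the parts it consists of.
BOM = {
    "bike": ["wheel", "frame"],
    "wheel": ["rim", "spoke"],
    "frame": [],
    "rim": [],
    "spoke": [],
}

def atomic_parts(part):
    """Explode `part` recursively until only atomic parts remain."""
    subparts = BOM[part]
    if not subparts:
        return [part]  # an atomic part consists of no other parts
    result = []
    for subpart in subparts:
        result.extend(atomic_parts(subpart))
    return result

print(atomic_parts("bike"))  # ['rim', 'spoke', 'frame']
```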

Regular data structure: A regular data structure is a data structure that fol-
lows standard conventional formatting rules. Also see Conventional data
structure.

Renormalization: As used in this book, renormalization is the reversing of the
effects of applying the Cartesian product that produced replicated data required
for relational processing. This renormalization takes an unnormalized data
result and converts it to a normalized structured data structure with the proper
replicated data removed.

Replicated data: Replicated data, as used in this book, is data that is repli-
cated when structured data is flattened into a two-dimensional table structure.
This replicated data can throw summaries off and has the potential to obscure
the data structure. Replicated data is not the same as duplicate data, whose
identical data occurrences are semantically correct.
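
A small sketch shows how replicated data can throw a summary off. It runs a standard LEFT JOIN through Python's built-in sqlite3 module; the tables and the budget figure are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, budget INTEGER);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, deptno INTEGER);
    INSERT INTO dept VALUES (10, 1000);
    INSERT INTO emp  VALUES (1, 10), (2, 10), (3, 10);
""")

# The single department budget, summed from the structured source.
true_total = con.execute("SELECT SUM(budget) FROM dept").fetchone()[0]

# The same budget after flattening: it is replicated once per employee row.
flattened_total = con.execute("""
    SELECT SUM(d.budget)
    FROM dept d LEFT JOIN emp e ON d.deptno = e.deptno
""").fetchone()[0]

print(true_total, flattened_total)  # 1000 3000
```

The department has one budget of 1000, but flattening it against three employees replicates the value three times, so the summary is inflated.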

Reusable: The trait that some software component can be reused in many dif-
ferent applications, such as an SQL view that can be queried in many ways, and
used in building larger views. This trait saves on development effort for the
obvious reasons, and on maintenance because there is only one component to
change, which also assures consistency of operation.

Restricted Cartesian product: Relational Cartesian products are comprised
of all row combinations from all of the tables in the join with no restrictions
applied. A restricted Cartesian product has only the related combinations
preserved so that they properly reflect the data structure represented in the
relationships specified for the relational query. In a hierarchical structure specified
in a relational query, the Cartesian product is hierarchically restricted and
produces a hierarchical result.

Reshaping: See Structure transformation.

Restructuring: See Structure transformation.

Result set: The result set is the flat relational result returned by SQL.

Rewriting queries: This is the process of dynamically restructuring database
queries to produce the same result and achieve more efficiency.

RIGHT join: The RIGHT join operation is an outer join that preserves
unmatched data from the dominant table specified on the right side of the join
operation. This is not as natural or as easy to work with as the LEFT outer join.

Right-sided nesting: Right-sided nesting of SQL outer joins naturally occurs
when LEFT outer join views are expanded for processing. This normal SQL
view expansion process causes the current matching ON clause to be pushed to
the right, away from its join operation as a nested view expands, causing the
current join operation to be temporarily put on hold. This causes its associated
related working set to be stacked while a new one is created for processing the
new active join view operation that just expanded. When complete, the result is
available as the right argument to the stacked join operation, which is then
unstacked and processed. This stacking process has the beneficial side effect of
preserving the structure in all the stacked working sets so that they cannot be
influenced by the current join operation.

Root: The root of a hierarchical structure is the topmost table or node in the
structure. Because a hierarchical structure is an upside-down tree, it makes
sense that the starting table, or node, is called the root. All access to a hierarchi-
cal structure originates from the root.

Round tripping: See Document round tripping.

Row: Relational tables are made up of horizontal rows and vertical columns.
The relational name for a row is a tuple. A row is analogous to a record in a flat
file.

Rowset: A rowset is a flat, relational data container. It can be an instance of a
working set or result set. See Result set and Working set.

Runtime: Runtime is the period of time between the start and end of execution.
Dynamic operations happen during this time.

SAX: A simpler and smaller XML API than DOM. DOM reads the entire
document into memory while SAX only returns the data requested. SAX is
more efficient but more limited.

Schema: An XML schema defines and maps a specific class of XML documents.
It is newer and much more advanced than the older DTD, which serves this
same basic purpose.

Schema-free: As used in this book, schema-free means that the user does not
need to know the data structure being queried because navigation is automatic.
Also see Navigationless.

Scope of control: Each specific join operation joins two working sets or
tables. This means that the tables referenced by ON clauses during each join
operation must each belong to one of the two working sets that are currently
being joined. Because of right-sided nesting, there can be many working sets
that are stacked; these should not be referenced until they are unstacked and
become active. This ON clause range of acceptable table references is also
known as the scope of control.

Secondary key: A secondary key is a key that is not necessarily unique, so that
searching on it will return multiple records such as when searching on the color
red where color is the secondary key. It is also known as an alternate key. A pri-
mary key is unique.

Secondary index: A secondary index is a key index with alternate or secondary
keys. When used with hierarchical structures, secondary indexes can exist
below the root.

Segment: As used in this book, a segment is an older term for a contiguous
block of closely related data. A structured record is made up of different
segment types and their occurrences that are linked into a hierarchical structure.

Selection: Relational selection is filtering row data based on a data value. The
WHERE clause data selection removes entire rows. ON clause data filtering
removes pieces of data from selected rows that are replaced with NULL values.

Semantic data model: A semantic data model is a conceptual data model
with semantic information included.

Semantic loss: Semantic loss, as used in this book, occurs when semantic
structural information is obscured or lost from a structure when it is trans-
formed. In particular, when a hierarchical structure is flattened, the data struc-
ture and the structural semantics are significantly obscured.

Semantic mapping: Semantic mapping is the mapping of the meaning of the
data derived from its data definition or its data structure. This can be
determined in many ways, such as the meaning in the data label names or the data
itself. As used in this book, the meaning of the data is based on its hierarchical
structure and the relationships between each node. For more info on hierarchi-
cal data semantics see Lowest common ancestor.

Semantic optimizations: Semantic optimizations are powerful optimizations
based on the semantics of the data structure that is being accessed. They can be
very high-level optimizations, where a single optimization can logically remove
node types from the structure being accessed instead of optimizing accesses on
an access-by-access basis.

Semantic transparency: The ability of the data processor to establish the rela-
tionships between data types to automatically derive the correct results.

Semantic Web: The ability of software applications to understand and use
the Web as easily as humans can.

Semantically complex: Queries that operate across multiple paths of a
hierarchical structure are semantically complex because of the semantically complex
common ancestor node logic that is required to process them. All nodes in a
multipath hierarchical structure are related to each other, which makes
multipath processing very complex. Also see Common ancestor node.

Semi-join: A semi-join operation is useful for decreasing I/O and transmission
times in a multiprocessor system, usually by transmitting only one side of
the join.

Semistructured data: XML is a semistructured language where the data contains
embedded metadata. This is also known as a self-describing language. This
allows for many advanced hierarchical structures and capabilities. These
include variable structures, network-like structures, and dynamically-defined
structures. Some of these capabilities require that the embedded metadata be
examined in the same way as the actual data by the programmer using the
semistructured query language.

Sequence: A sequence is an ordered list of data items.

Serializing: Serializing is the conversion of usually structured data, such as
XML, into a byte stream that can be easily transmitted and reconstructed at the
receiving end. This is normally done through a depth-first tree traversal.
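
A depth-first traversal of this kind can be sketched as follows. The (tag, text, children) tuple layout and the sample data are hypothetical, and the sketch assumes the text values contain no markup characters, so no escaping is performed. The standard xml.etree.ElementTree module is used only to show that the byte stream can be reconstructed.

```python
import xml.etree.ElementTree as ET

def serialize(node):
    """Depth-first (pre-order) serialization of a (tag, text, children) tuple."""
    tag, text, children = node
    inner = text + "".join(serialize(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"

# Hypothetical structured data: a department with one employee path and one project path.
doc = ("Dept", "", [
    ("Emp", "Ann", [("Dpnd", "Bo", [])]),
    ("Proj", "X1", []),
])

data = serialize(doc).encode("utf-8")  # the byte stream to transmit
print(data)
# b'<Dept><Emp>Ann<Dpnd>Bo</Dpnd></Emp><Proj>X1</Proj></Dept>'

root = ET.fromstring(data)  # reconstruction at the receiving end
print(root.tag, [child.tag for child in root])  # Dept ['Emp', 'Proj']
```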

Sets: Sets of data are assumed unordered, whereas lists of data are assumed
ordered. XML data is assumed ordered while relational data is assumed
unordered.

Shared element data: Shared element data, as referred to in this book, is cre-
ated by an XML IDREF usage that produces multiple paths into a node type so
that the same physical data occurrences it defines are shared by two or more
paths. Also see IDREF.

Shared node type: See Shared element data.

Shredding: Shredding occurs when structured XML data is flattened and
placed into multiple columns in relational tables, usually by an ETL process. It
is a form of flattening data. Also see Flattening.

Sibling nodes: Sibling nodes are the sibling node types of a parent node type.
Their left-to-right defined order is application dependent.

Sibling paths: Sibling paths are parallel paths that are indirectly related
through a common ancestor node. These paths are separate and do not influ-
ence each other. They have no node-by-node occurrence correlation. This has
specific consequences for the semantics of the data structure. For example,
comparing data fields from two sibling paths requires comparing all combina-
tions under the LCA occurrence.
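
Comparing fields from two sibling paths can be sketched with a plain cross product under one LCA occurrence. The department record, its field names, and the city comparison below are hypothetical.

```python
from itertools import product

# One hypothetical LCA occurrence (a department) with two sibling paths.
dept = {
    "emps": [{"name": "Ann", "city": "Oslo"}, {"name": "Bo", "city": "Rome"}],
    "projs": [{"id": "P1", "city": "Rome"}, {"id": "P2", "city": "Lima"}],
}

# There is no occurrence-by-occurrence correlation between the two paths,
# so every employee is compared with every project under this department.
matches = [
    (emp["name"], proj["id"])
    for emp, proj in product(dept["emps"], dept["projs"])
    if emp["city"] == proj["city"]
]

print(matches)  # [('Bo', 'P1')]
```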

Significant white space: Significant white spaces are spaces, tabs, and line
break codes that are part of the document text and should be preserved and dis-
played when output.

Single view concept: As defined in this book, a single-view concept is a single
large or global view that can be utilized instead of having many specialized
subviews without incurring additional overhead. This makes things much easier
for the user, who does not have to know the structure.

Skolem function: A Skolem function creates a unique object ID using every
value of its argument, usually based on values in its data segment. This is used
when an object ID has not been assigned but one is needed, usually to make
a dynamic linkage at a later date. This type of object ID is also known as a
logical identifier.
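
One possible way to realize such a function is sketched below. Hashing the argument values is an assumption made for this illustration, not a method described in this book, and the `skolem_id` name and the truncation to 12 hex digits are likewise hypothetical. Truncation makes collisions possible in principle.

```python
import hashlib

def skolem_id(node_type, *values):
    """Derive a repeatable logical identifier from every argument value."""
    # Join the parts with a separator character so ("ab", "c") differs from ("a", "bc").
    key = "\x1f".join([node_type, *map(str, values)])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{node_type}_{digest[:12]}"

first = skolem_id("Emp", "Ann", 10)
again = skolem_id("Emp", "Ann", 10)
other = skolem_id("Emp", "Ann", 20)

print(first == again)  # True: the same values always give the same ID
print(first == other)  # False: a different value gives a different ID
```

Because the ID depends only on the data values, it can be regenerated later when the linkage is actually made.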

Social search: A social search allows you to discover relevant content from
your social connections.

Sorted outer union solution: This is a solution to the multipath data explosion
problem that uses the sorted outer union operation to suppress the
data explosion. It is also known as the SOU technique. It also sacrifices the
semantics of the multipath data structure for accurately querying the entire structure.

Structural join: A structural join is a join that contributes to the derived
structure.

SQL: Structured query language is the ANSI and ISO standard interactive and
programming language for getting information from and updating a database.

SQL-compliant: Conformity to the ANSI SQL standards.

SQL/XML Standard: This is the ANSI standard for defining syntax and
functions in ANSI SQL to handle input and output of native XML from SQL.
A number of functions have been defined for XML output. These output func-
tions require nested use and XML-centric operations in order to form hierar-
chical XML documents.

Start and stop tags: Start and stop tags enclose the content of an XML ele-
ment. The first tag of a container element also names the element.

Static: Not changing, defined or calculated before execution.

Static Query: A predefined query. It implies that the query cannot be speci-
fied dynamically.

String Value: A string value is a variable length character field or value.

Structure aware software: This means that the processor is automatically
aware of the structure of input structures being processed and can perform
advanced hierarchical capabilities. These include automatically structured
XML output, hierarchical optimization, and always producing correct
hierarchical results.

Structure fragment: A structure fragment is an isolated data substructure
located within a hierarchical document that can be processed or returned.
What is significant is that this substructure can be isolated below the root of the
structure.

Structure transformation: As defined in this book, structure transformation
involves changing the physical structure of a data structure in one of two ways
defined here as restructuring or reshaping. Restructuring is performed by using
the natural related relationships in the data structure to change its semantics
which may also change the structure. Reshaping uses the current semantics in
the structure to change the structure into a new desired structure using and
maintaining as much of the current semantics as possible. In either case, this
can delete or duplicate the data as needed to fit the new structure.

Structured data: Used in this book, structured data means the same as hierar-
chical data. See Hierarchical data.

Structured data record: Structured data records are hierarchical structures
that are stored contiguously top-down, left-to-right. They can be used directly
by third- and fourth-generation languages.

Structured database processing: See Nested relational processing.

Structured query output: See Nested display.


Structured SQL views: These are SQL views that define logical or physical
hierarchical structures and can be dynamically joined to form larger logical
hierarchical structures. Structured SQL views are also self-optimizing, so they
can be used more often, greatly increasing their data abstraction.

Structured VSAM: Variable length VSAM records can support hierarchical
structures by storing their metadata in the record in the form of data length and
data occurrence information.

Substructure views: Substructure views are SQL views that contain hierarchi-
cal data structures that can be seamlessly embedded in SQL statements and
structured views to create larger views.

Surrogate key: Most relational keys serve two purposes: their use as a key, and
their use as data. A surrogate is only used as a key; it is usually automatically
generated because there was probably no data available that could also be used
as the key.

Symmetric join: INNER and FULL joins are referred to as symmetric joins
because they are commutative in operation. They produce the same results
when left and right table inputs are reversed. They model flat structures.

Tabular structure: A tabular structure is a flat, two-dimensional table structure
with rows and columns.

Tagging: Tagging is the adding of identification name tags to pieces of data
usually in XML data using XML elements or start and stop tags.

Tags: See Elements or Start and stop tags.

Text element: A text element is an XML data element, not to be confused
with a markup element.

Theta join: A theta join is a generic join operation.

Three value logic: True, false, and unknown conditions used with relational
processing.
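
Three value logic can be observed directly by evaluating a few SQL expressions. The sketch below uses Python's built-in sqlite3 module, which returns the unknown condition as NULL (None in Python); the `value` helper is a hypothetical convenience.

```python
import sqlite3

con = sqlite3.connect(":memory:")

def value(expr):
    """Evaluate one SQL expression and return its single result value."""
    return con.execute(f"SELECT {expr}").fetchone()[0]

print(value("NULL = NULL"))  # None: comparing unknowns is unknown, not true
print(value("NULL AND 0"))   # 0: false AND anything is false
print(value("NULL OR 1"))    # 1: true OR anything is true
print(value("NOT NULL"))     # None: the negation of unknown is unknown

# A WHERE clause keeps a row only when its condition is true, so unknown rejects it.
rows = con.execute("SELECT 1 WHERE NULL = NULL").fetchall()
print(rows)  # []
```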

Throwaways: The term throwaways, as used in this book, refers to rows retrieved
while performing a join operation that are later discarded in the same join
operation because they turn out to be unmatched.

Top-down processing/execution: Top-down processing is the building and
processing of hierarchical structures top-down. This is the best way to perform
the join operations needed to create a hierarchical data structure because it
avoids throwaway data, which is data that has already been retrieved but is then
determined to be unnecessary.

Transformation: The transformation of data structures, such as XML and
hierarchical structures, involves changing their structure or data format in any
way. This can also change the basic underlying semantics of the structure by
modifying, adding, and deleting semantics derived from the new relationships
represented. This can be performed by introducing new relationships through
rejoining the data differently. Also see Structure transformation.

Transitive closure: If A is the parent of B, and B is the parent of C, then the
transitive closure states that A is an ancestor of C.
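
The definition can be sketched by repeatedly combining parent pairs until no new ancestor pairs appear. The node names and the simple fixed-point loop below are hypothetical and are meant only as an illustration, not as an efficient algorithm.

```python
# Hypothetical (parent, child) pairs: A is the parent of B, and so on.
PARENT_OF = {("A", "B"), ("B", "C"), ("C", "D")}

def transitive_closure(pairs):
    """Return every (ancestor, descendent) pair implied by the parent pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for upper, middle in list(closure):
            for other, lower in list(closure):
                if middle == other and (upper, lower) not in closure:
                    closure.add((upper, lower))  # upper is an ancestor of lower
                    changed = True
    return closure

closure = transitive_closure(PARENT_OF)
print(("A", "C") in closure)  # True: A is an ancestor of C
print(sorted(closure))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```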

Tree graph: A tree graph is a directed graph with one start node.

Tree walking: Tree walking is the process of navigating a hierarchical structure
usually procedurally.

Tuple: A tuple is the relational term for a row of a table.

Twins: The different children node types of a parent node type represent dif-
ferent children with different data types and formats, known as siblings. Twins
are the multiple data occurrences for a specific node type that has the same par-
ent data node occurrence. The node type is the same across twins and the node
parent data occurrence is the same, hence the name twin or twins.

UDF: A UDF is a user-defined function that executes in SQL and is written
in a third-generation language. It is an SQL3 capability. These UDFs can
handle SQL ADTs.

Unambiguous semantics: Unambiguous semantics are semantics with only
one meaning or interpretation. Hierarchical data structures have unambiguous
semantics because they are singular in nature, having only one path to any
value. This makes their semantics unambiguous, which makes them very useful
and powerful.

Unified view: A unified view sits over heterogeneous data sources and offers a
consistent view definition by defining the entire logical structure view. The
ANSI SQL LEFT outer join can do this, and it offers some of the following
advantages. The unified view can be specified as subviews in separate manage-
able and reusable SQL views, and these SQL subviews can be specified and
arranged dynamically at execution. In addition, the subview’s definition can be
specified dynamically, and when all the views are expanded they form a solid
single, unified view defined entirely by standard SQL syntax and semantics.

Universal data access: Universal data access (UDA) is a term that indicates
that a given product can support access to all forms, types, and combinations of
data and databases.

Universal qualifier: A form of existential semantics used for testing existence.
An example is “IF ANY/ALL”.

Universal relation model: The user is given a view or imaginary relation con-
sisting of one single relation (table) that is derived from the natural join of all
relations in the database. As a user interface, the advantage is that the user does
not need to know which columns/fields are in which relation (table). The
universal relation model also preserves dangling tuples, whose rows would normally
be removed entirely when there is no join match.

Universal View: See Universal relation model.

UNNEST: UNNEST is the process of flattening data that has been structured.

Unnormalized: Unnormalized data is data that has not been normalized.
Denormalized data is data that is unnormalized on purpose, such as prejoining
tables for efficiency reasons.

Unordered data: Unordered data is data that has not been ordered.

Unparsed form: Native XML, with its complex hierarchical structure and
embedded metadata, requires parsing in order to be accessed. At times it is
desirable for areas of the XML data to be bypassed by the parsing
operation. This unparsed data is identified in the XML metadata as CDATA.
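
As a small illustration, the following sketch uses Python's standard
xml.etree.ElementTree parser; the note and body element names are made up for
the example. The markup-like characters inside the CDATA section are passed
through as plain text rather than being parsed as elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the CDATA section contains text that looks like markup.
doc = "<note><body><![CDATA[<b>5 < 6 & 7 > 2</b>]]></body></note>"

root = ET.fromstring(doc)
body = root.find("body")

cdata_text = body.text   # the CDATA content, delivered as ordinary text
child_count = len(body)  # no <b> child element was created by the parser
```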

Unstructured data: Unstructured data has no real structure, such as the data
in an email or a memo. Interestingly, some estimates put 85% of all business
information in the unstructured category. Many products are now coming on the
market that can put some structure into unstructured data so that it can be
categorized or organized hierarchically.

URL: A URL is a uniform resource locator that is used to access
data on the Internet or an intranet. It is a Web address.

User Defined Function: See UDF.

USING clause: The USING clause is used instead of the ON clause to specify
that an implicit natural join option is to be applied to the join operation:
rows are matched on equal values of the columns named in the USING list.
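
A minimal sketch of the difference, run through Python's sqlite3 module with
hypothetical dept and emp tables (not taken from the book). Both joins match on
deptno, but the USING form returns the shared column only once. The behavior
shown is SQLite's; other SQL products may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample tables sharing the column name deptno.
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp  VALUES (1, 'Ann', 10);
""")

# ON clause: the join condition is written out; both deptno columns are returned.
on_rows = conn.execute(
    "SELECT * FROM dept LEFT JOIN emp ON dept.deptno = emp.deptno "
    "ORDER BY dept.deptno"
).fetchall()

# USING clause: only the shared column is named; it appears once in the result.
using_rows = conn.execute(
    "SELECT * FROM dept LEFT JOIN emp USING (deptno) ORDER BY dept.deptno"
).fetchall()
```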

Variable data structure: With XML, the data structure can vary from document
occurrence to occurrence, or even within a given document. Within limits, SQL
can do this using the ON clause based on a data value field at a higher level
of the structure.

Variable length fields: Variable length fields are fields whose length can vary
from record to record. They can hold any type of value or field. The length of
a variable length field is usually contained somewhere in the record (known to
the application) preceding the variable length field. This means that a data
record with variable length fields is a variable length record. This does not
make it a varying (format) structure because the format itself does not change.
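
One common layout can be sketched with Python's struct module. The record
format here (a 4-byte id, a 2-byte length, then the name bytes) and the
function names are assumptions chosen for illustration, not a format described
in the book.

```python
import struct

HEADER = ">IH"  # assumed layout: 4-byte id, then 2-byte length of the name field


def pack_record(emp_id: int, name: str) -> bytes:
    """Write the fixed part, then the length-prefixed variable length field."""
    data = name.encode("utf-8")
    return struct.pack(HEADER, emp_id, len(data)) + data


def unpack_record(buf: bytes):
    """Read the length first, then exactly that many bytes of the field."""
    emp_id, name_len = struct.unpack_from(HEADER, buf, 0)
    start = struct.calcsize(HEADER)
    return emp_id, buf[start:start + name_len].decode("utf-8")


short_rec = pack_record(1, "Al")
long_rec = pack_record(2, "Alexandria")
```

The two records have different lengths, yet both follow the same format, which
is the distinction the entry draws between variable length and varying format.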

Variable length records: Variable length data records in a data set are records
that contain variable length fields and/or variable occurring fields, so their
length varies from record to record within the data set. This does not make
them a varying (format) structure because this length change does not change
the format.

Variable occurring fields: Variable occurring fields are data fields that can
repeat sequentially for multiple occurrences in a record. They are variable
because the amount of space required to contain them is variable, only using
the space required. The active number of field occurrences usually directly
precedes the variable occurring fields. This means that a record with variable
occurring fields is a variable length record. This does not make it a varying
(format) structure because this does not change the format.
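
A count-prefixed layout of this kind might look like the following sketch,
again using Python's struct module. The order record layout and the function
names are hypothetical.

```python
import struct

HEADER = ">IH"  # assumed layout: 4-byte order id, then 2-byte occurrence count


def pack_order(order_id: int, item_ids: list) -> bytes:
    """Write the occurrence count directly before the repeating field."""
    body = struct.pack(f">{len(item_ids)}I", *item_ids)
    return struct.pack(HEADER, order_id, len(item_ids)) + body


def unpack_order(buf: bytes):
    """Read the count, then that many 4-byte occurrences of the field."""
    order_id, count = struct.unpack_from(HEADER, buf, 0)
    items = struct.unpack_from(f">{count}I", buf, struct.calcsize(HEADER))
    return order_id, list(items)


record = pack_order(7, [11, 12, 13])
```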

View materialization: View materialization is the process of creating a
temporary table or working set that exactly reflects the data and semantics of
the view.
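
A minimal sketch of the idea in SQLite, driven from Python; the table, view,
and temporary table names are hypothetical. The temporary table is a snapshot
of the view at the time it was created, so a later insert shows up in the view
but not in the materialized copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base tables and an outer join view over them.
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp  VALUES (1, 'Ann', 10);
    CREATE VIEW dept_emp_v AS
        SELECT dname, ename
        FROM dept LEFT JOIN emp ON dept.deptno = emp.deptno;
    -- Materialize the view into a temporary working table.
    CREATE TEMP TABLE dept_emp_work AS SELECT * FROM dept_emp_v;
""")

query = "SELECT dname, ename FROM {} ORDER BY dname, ename"
view_rows = conn.execute(query.format("dept_emp_v")).fetchall()
work_rows = conn.execute(query.format("dept_emp_work")).fetchall()

# A later change is visible through the view but not in the materialized copy.
conn.execute("INSERT INTO emp VALUES (2, 'Bob', 20)")
view_after = conn.execute(query.format("dept_emp_v")).fetchall()
work_after = conn.execute(query.format("dept_emp_work")).fetchall()
```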

View optimization: View optimization is a powerful outer join semantic
optimization that can dynamically exclude nodes in a view from access based on
which columns are specified at view invocation. This means there is never a
penalty for using an outer join hierarchical view that contains more nodes than
are needed. This also means that the number of required views can be reduced
since one large view can do the job of many small ones.

View update: The term view update is really the capability to update a
multitable join view. This has always presented a problem because of the lack of
semantics when multiple tables are joined. Modeling hierarchical structures
allows much more flexibility with multitable updates.

Views-within-views: See Embedded views.

Virtual field: A virtual field does not physically exist until it is requested
or its associated record is retrieved. At that time, it is computed. It is also
called a computed field.
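
One way to picture a computed field is a column defined in a view. The sketch
below uses SQLite from Python, with a hypothetical order_line table; the
line_total value is never stored and is recomputed each time the view is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; only item, qty, and unit_price are stored.
    CREATE TABLE order_line (item TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO order_line VALUES ('bolt', 4, 0.5);
    -- line_total is a virtual (computed) field defined by the view.
    CREATE VIEW order_line_v AS
        SELECT item, qty, unit_price, qty * unit_price AS line_total
        FROM order_line;
""")

total_before = conn.execute("SELECT line_total FROM order_line_v").fetchone()[0]

# Changing a stored field changes the computed field on the next retrieval.
conn.execute("UPDATE order_line SET qty = 10")
total_after = conn.execute("SELECT line_total FROM order_line_v").fetchone()[0]
```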

Virtual key: A virtual key is a logical key that does not physically exist in a
row or record, but is used to retrieve the data and is inserted when the row or
record is retrieved into storage to act as its key. This can be the case when the
key exists in an index and does not exist in the row or record that is indexed.

Virtual view: A virtual view makes multiple data sources from possibly dis-
tributed sites appear as one seamless view in heterogeneous queries.

Virtualization: Virtualization is the process of making multiple sources
appear as one single source.

Web service: Any software service that is available over the Internet using a
standard XML messaging system that is not tied to a specific operating system.

Well-formed documents: An XML document that conforms to the XML
standard, but not necessarily to a DTD or XML schema. This reinforces the
fact that XML documents do not require a predefined definition, allowing
them to be created dynamically.
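
A quick check for well-formedness can be sketched with Python's standard
xml.etree.ElementTree parser, which needs no DTD or schema. The helper name
is_well_formed and the sample documents are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags parse; overlapping tags do not.
good = is_well_formed("<dept><emp>Ann</emp></dept>")
bad = is_well_formed("<dept><emp>Ann</dept></emp>")
```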

WHERE clause filtering: WHERE clauses can also specify data filtering cri-
teria besides join criteria. When data filtering is specified on the WHERE
clause, it can affect the entire row so that, if the data filtering criteria causes the
last node occurrence to be removed, the entire row is filtered out. This is not
the case with ON clause filtering, which allows for a finer level of hierarchical
filtering with its data preserving operation.
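
The difference can be sketched in SQLite from Python, using hypothetical dept
and emp tables and an assumed salary threshold. With the filter in the WHERE
clause, the department whose only employee fails the test disappears entirely.
With the same filter in the ON clause, the department row is preserved and only
the employee node is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: each department has one employee.
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (ename TEXT, salary INTEGER, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO emp  VALUES ('Ann', 50, 10), ('Bob', 30, 20);
""")

# WHERE clause filtering: applied to the whole joined row.
where_rows = conn.execute(
    "SELECT dname, ename FROM dept LEFT JOIN emp ON dept.deptno = emp.deptno "
    "WHERE emp.salary > 40 ORDER BY dname"
).fetchall()

# ON clause filtering: applied only to the emp node; dept rows are preserved.
on_rows = conn.execute(
    "SELECT dname, ename FROM dept LEFT JOIN emp "
    "ON dept.deptno = emp.deptno AND emp.salary > 40 ORDER BY dname"
).fetchall()
```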

White space: White space in XML documents is controlled by the space, tab,
carriage return, and linefeed characters. Unfortunately, white space can become
important and can affect outcomes of certain types of processing. For example,
when reconstructing a document from its deconstructed pieces, it is difficult to
recreate the white space exactly. This can throw document comparisons off.

Working set: A working set is similar to a temporary table: it is a temporary
work area rowset used in performing the query.

Working storage: Active storage.

Wrapper element: A nonmapped DB element used to make some input
stream a valid element, usually for transmission and processing. For example,
specifying an SQL query or an argument list in a wrapper element.

XLink: XLink allows elements to be inserted into XML documents in order to
create and describe links between resources. It uses XML syntax to create
structures that can describe links that are similar to the simple unidirectional
hyperlinks of today’s HTML and more sophisticated links.

XML: XML stands for Extensible Markup Language. It is a semistructured
language, which means it is self-describing because its metadata is embedded
along with its data. This also gives it many powerful new capabilities that are
not found in current conventional data formats. Its data is stored in a nested
form that controls its hierarchical data structure. An XML semistructure is not
necessarily a strict unambiguous hierarchical structure because it allows
duplicate node types. This allows more opportunities but eliminates the exact
unambiguous processing that can be performed nonprocedurally. However,
structured data can be represented in semistructured data and processed as
strict hierarchical data, but not vice versa.

XML aware: When an application can accept and output XML documents.
Also see XML enabled.

XML data type: A relational data type used in SQL to indicate native XML.

XML document: An XML document is an XML formatted hierarchical data
record, including embedded metadata and XML prolog. There are two general
document classification types: data or document. Data-centric/oriented is highly
structured and can be processed automatically. Document-centric/oriented is
loosely structured natural language and is usually processed by humans.

XML enabled: This term means that the indicated application, or utility, can
input and output XML. This means it can operate in an XML environment.

XML fragment: See Fragment or data fragment.

XML keyword search: XML keyword search requires multipath hierarchical
processing, which means that LCA processing is required to support
hierarchical proximity processing.

XML node type: See Element/Attribute/Text node.

XPath: XPath was a simple XML query language that is now used in most
XML query languages as their navigational sublanguage. Unfortunately, XPath
is single path oriented and cannot handle multipath (bushy) queries in a single
use very well.
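
The single-path orientation can be illustrated with Python's standard
xml.etree.ElementTree module, which supports only a limited subset of XPath.
The sample document is hypothetical. Each expression follows one path, so the
project and dependent legs under emp are retrieved by separate expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two legs (proj and dep) under each emp node.
root = ET.fromstring("""
<dept name="Sales">
  <emp name="Ann"><proj>P1</proj><dep>Tom</dep></emp>
  <emp name="Bob"><proj>P2</proj></emp>
</dept>""")

# Each path expression navigates a single path through the structure.
projects = [e.text for e in root.findall("./emp/proj")]
dependents = [e.text for e in root.findall("./emp/dep")]

# A predicate can qualify a node along the same path.
ann_projects = [e.text for e in root.findall("./emp[@name='Ann']/proj")]
```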

XQuery: XQuery is the newest XML query language endorsed by the W3C.
It is a procedural-like language that is particularly good at textual transforma-
tion. As a separate procedural XML processor, it is very good and powerful.
Performing full multipath hierarchical processing will require complex proce-
dural static processing.

XSLT: XSL Transformations, a language for transforming an XML document
into another form, which may be XML, HTML, or some other data format.
Transformation can involve changing the data structure.
Bibliography
Aghili, A. S., Li, H.-G., and Agrawal, D., “TWIX: Twig Structure and Content
Matching of Selective Queries using Binary Labeling,” ACM International
Conference Proceeding Series, Vol. 152, Proceedings of the 1st International
Conference on Scalable Information Systems, Hong Kong, 2006.
Abiteboul, S., and Bidoit, N., “Non-First Normal Form Relations to Represent
Hierarchically Organized Data,” Proceedings of the 3rd ACM
SIGACT-SIGMOD Symposium on Principles of Database Systems, Waterloo,
Ontario, Canada, 1984.
Bao, Z., Wu, H., and Chen, B., “Using Semantics in XML Query
Processing,” Proceedings of the 2nd International Conference on Ubiquitous
Information Management and Communication, Suwon, Korea, 2008.
Chen, Z., Ling, T. W., and Liu, M., “XTree for Declarative XML Querying,”
In Y. Lee, et al. (Eds), Database Systems for Advanced Applications (DASFAA),
LNCS 2973, Berlin/Heidelberg: Springer-Verlag, 2004, pp. 100–112.
Cohen, S., Kanza, Y., and Kimelfeld, B., “Interconnection Semantics for
Keyword Search in XML,” Proceedings of the 14th ACM International Confer-
ence on Information and Knowledge Management, Bremen, Germany, 2005.
Cohen, S., Mamou, J., and Kanza, Y., “XSEarch: A Semantic Search Engine
for XML,” Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003.
Czumaj, A., Kowaluk, M., and Lingas, A., “Faster Algorithms for Finding
Lowest Common Ancestors in Directed Acyclic Graphs,” Electronic Collo-
quium on Computational Complexity, Revision 2 of Report No. 111, 2006.
David, M. M., “Advanced Capabilities of the Outer Join,” ACM SIGMOD
Record, Vol. 21, No. 1, March, 1992.
David, M. M., “ANSI SQL Hierarchical Processing Can Fully Integrate Na-
tive XML,” ACM SIGMOD Record, Vol. 32, Issue 1, March, 2003.
David, M. M., “Automatic Full Parallel Processing of Hierarchical SQL Que-
ries,” DevX, Feb 22, 2009.
David, M. M., “The Power Behind SQL’s Inherent Multipath LCA Hierar-
chical Processing,” Database Journal, May 20, 2010.
Dyreson, C., Bhowmick, S., and Jannu, A. R., “Morph: A (Shape) Polymorphic
XML Query Language,” Plan-X Workshop 08, Savannah, GA, 2009.
Guo, L., Shao, F., and Botev, C., “XRANK: Ranked Keyword Search over
XML Documents,” Proceedings of the 2003 ACM SIGMOD International Con-
ference on Management of data, San Diego, CA, 2003.
Hristidis, V., Koudas, N., and Papakonstantinou, Y., “Keyword Proximity
Search in XML Trees,” IEEE Transactions on Knowledge and Data
Engineering, Vol. 18, No. 4, 2004.
Krishnamurthy, R., Kaushik, R., and Naughton, J. F., “Unraveling the Dupli-
cate-Elimination Problem in XML-to-SQL Translation,” Seventh International
Workshop on the Web and Databases (WebDB), Paris, France, 2004.
Levene, M., and Loizou, G., “Semantics for Null Extended Nested Relations,”
ACM Transactions on Database Systems (TODS), Vol. 18, Issue 3, 1993.
Li, G., Feng, J., and Wang, J., “Effective Keyword Search for Valuable LCAs
over XML Documents,” Proceedings of the Sixteenth ACM Conference on
Information Knowledge Management, Lisboa, Portugal, 2007.
Li, Q., and Moon, B., “Indexing and Querying XML Data for Regular Path
Expressions,” Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
Li, Y., Yu, C., and Jagadish, H. V., “Enabling Schema-Free XQuery with
Meaningful Query Focus,” The International Journal on Very Large Data Bases,
Vol. 17, Issue 3, May 2008.
Liu, Z., and Chen, Y., “Identifying Meaningful Return Information for XML
Keyword Search,” Proceedings of the 2007 ACM SIGMOD International Con-
ference on Management of data, Beijing, China, 2007.
Mani, M., Wang, S., and Dougherty, D. J., “Join Minimization in
XML-to-SQL Translation: An Algebraic Approach,” ACM SIGMOD Record,
Vol. 35, No. 1, March, 2006.
Pal, S., Cseri, I., and Seeliger, O., “XQuery Implementation in a Relational
Database System,” Proceedings of the 31st VLDB Conference, Trondheim, Nor-
way, 2005.
Shanmugasundaram, J., Kiernan, J., and Shekita, E., “Querying XML Views
of Relational Data,” Proceedings of the 27th VLDB Conference, Roma, Italy,
2001.
Shanmugasundaram, J., Krishnamurthy, R., and Tatarinov, I., “A General
Technique for Querying XML Documents Using a Relational Database Sys-
tem,” SIGMOD Record, Vol. 30, No. 3, September, 2001.
Shanmugasundaram, J., Tufte, K., and He, G., “Relational Databases for
Querying XML Documents: Limitations and Opportunities,” Proceedings of
the 25th VLDB Conference, Edinburgh, Scotland, 1999.
Sun, C., Chan, C.-Y., and Goenka, A. K., “Multiway SLCA-based Keyword
Search in XML Data,” Proceedings of the 16th International Conference on
World Wide Web, Banff, Alberta, Canada, 2007.
Trotman, A., Geva, S., Kamps, J., Lalmas, M., and Murdock, V., “Current
Research in Focused Retrieval and Results Aggregation,” Computer Science
Information Retrieval, Vol. 13, No. 5, 2010, pp. 407–411.
Ullman, J. D., Aho, A. V., and Hopcroft, J. E., “On Finding Lowest Com-
mon Ancestors in Trees,” Annual ACM Symposium on Theory of Computing,
New York, NY, 1973.
Ullman, J. D., “Principles of Database and Knowledge-Base Systems,” The
Universal Relation, Volume II, Rockville, MD: Computer Science Press, 1989,
p. 1050.
Vagena, Z., Mora, M. M., and Tsotras, V. J., “Twig Query Processing over
Graph-Structured XML Data,” Seventh International Workshop on the Web
and Databases, Paris, France, 2004.
Xu, Y., and Papakonstantinou, Y., “Efficient LCA based Keyword Search in
XML Data,” Proceedings of the 11th ACM International Conference Series on
Extending database Technology, Nantes, France, 2008.
Zhang, S., and Dyreson, C., “Polymorphic XML Restructuring,” WWW
Workshop 06, Edinburgh, Scotland, May 23–26, 2006.
About the Authors
Michael M. David is the founder and CTO of Advanced Data Access Technol-
ogies. Previously, he was the lead XML architect for NCR/Teradata and their
representative to the SQLX Group. Before that, he was a staff scientist at
Teradata designing database utilities, a senior software designer at Sterling
Software’s Answer Division, and a software designer at Informatics General. He has
researched, designed, and developed commercial query languages for heteroge-
neous hierarchical and relational databases for over 25 years. He has authored
many papers and articles on database topics and his research findings. These
have appeared in SOA World Magazine, Database Journal, DevX, TDAN, DM
Review, XML Journal, Semantic Universe, Web Techniques, Database Program-
ming & Design, DBMS Magazine, ACM SIGMOD Record, Ken North’s
SQLSummit Site, and Colin White’s Info DB Journal.
His research and findings have shown that hierarchical data processing is
a subset of relational data processing, and have shown how to utilize this
advanced inherent capability automatically in standard SQL. At a deeper level,
he discovered and located where standard SQL is inherently performing LCA
processing that is needed to support multipath hierarchical processing, and has
determined how this advanced processing has occurred naturally, proving its
existence and validity for use. He has also found valid semantic extensions to
hierarchical data modeling and processing, allowing for powerful data structure
mashups and flexible data structure transformations, using the hierarchical
semantics in relational rowsets, which assures that the resulting semantics are
valid. This book covers these new advancements.
Lee Fesperman is a software veteran who implemented operating systems,
compilers, interpreters, and assemblers at IBM in the early 1970s. With the
advent of relational technology, he implemented a number of SQL RDBMSs. He
is a prominent figure in the database industry, having participated in the great
“Null Debate” and cofounded the Database Debunkings site along with C. J.
Date and Fabian Pascal. He also authored the popular “SQL Tutorial” that is
used worldwide by thousands of database developers. Lee is currently working
on a hierarchical query facility combining data from SQL DBMSs with hierar-
chical data sources such as XML, and a cloud data-as-a-service for a variety of
applications.
Lee is also a pioneer in ODBC drivers, implementing one of the first
ODBC drivers and participating in defining the ODBC 3.0 specification. For
this contribution, he is listed in Ken North’s ODBC Hall of Fame. Lee is a
known expert in the Java programming language and has developed JDBC
drivers for several leading database products. Lee is also very experienced with
hardware, having worked for semiconductor manufacturers (ADM and
Signetics) and on a number of embedded systems (close to the hardware, some
as a board resident). He is also a frequent contributor to Java newsgroups and
magazines.
Index
Abstract data types (ADTs), 102, 126 standard join, 18
Access optimizations, 118–19 Asynchronous access processor, XML, 286
Access paths Automatic metadata maintenance, 319
data filtering, 190
defined, 317 Backward path data filtering, 302–3
dynamic shortening of, 134 dynamic, 303, 304
Aggregated data static, 302–3
defined, 317 Bidirectional path data qualification, 295–96
retrieval, 210, 258 Blobs, 319
Ancestor nodes, 318 Bottom-up processing/execution, 320
AND operator Cartesian product
in linking structures, 72 defined, 320
valid/invalid use, 73 in enabling flat 2-D structures, 66
Application programming interface (API), extended, 331
127 restricted, 348
Application views, 318 Cartesian product effect
Association tables applying, 58–59
association reversal, 196 data structure relationship, 59
combining structures with, 194 illustrated, 58
complex usage, 194–96 Cartesian product model
defined, 318–19 ON clause and, 17, 18
many-to-many, 149, 150, 194–95 standard join and, 17–18
usage illustration, 195 Client/server data structure processing, 94
Associativity Coding, data modeling outer join
defined, 19–20, 319 statements, 94–95
FULL outer joins, 24, 25 Collaboration
hierarchictivity in addition to, 20–21 hierarchical processing, 250–51
lack of, 20 structured data processing, 249–50
natural inner joins, 44

367
368 Advanced Standard SQL Dynamic Structured Data Modeling

Commutativity with old-style outer joins, 104–5


defined, 19, 321 with outer join, 67–70
FULL outer joins, 23–24 outer join statements, coding, 94–95
hierarchictivity in addition to, 20–21 outer join statements, generation of, 95
lack of, 19 related capabilities, 81–92
standard join, 18 second rule, 71
Complex data modeling, 321 SQL:1999 and, 102–3
Complex mashup, 193–94 third and final rule, 71–72
Conceptual data modeling, 322 valid results, 73–74
Conceptual views, 54–55 value-added features, 120–21
Contiguous data, 171–72 Data ordering
CROSS joins, 16 nonlinear, 216–17
defined, 32 as restructuring cause, 62–63
operation example, 32 Data persistence, 324
Data qualification
Data abstraction, 157–58 bidirectional path, 295–96
Data accuracy/correctness, 204 downward path, 294–95
Data aggregation, automatic, 204 LCA many-to-one result, 297
Databases LCA one-to-many result, 297
access, 119–20 multipath nonlinear, 296–99
enterprise, 119–20 upward path, 295
legacy, 119–20 Data reusability, 157–58
multimedia, 125–27 Data segments
navigation, 117–18, 159–60 defined, 324
open access interface, 120 isolating, 262–63
procedural, 139 manipulating, 262–63
Data fragment control, 240 selecting, 293–94
Data inheritance, 158–59, 324 Data structure extraction (DSE)
Data modeling as building block technology, 114
ANSI outer join for, 103–4 conclusion, 115
ON clause, join condition rules, 70–72 data structure determination, 111
ON clause examples, 72–73 defined, 109, 324–25
complex, 321 example, 110–11
complex example, 79 imposing data structures and, 114–15
conceptual, 322 internal logic, 113
defined, 324 invalid structure detection, 111
first rule, 70–71 logical table example, 112
flexible, 69 need for, 113–14
hierarchical relational processing semantic information recovery, 109–10
prototype, 146–48 symmetric data structure linking
integrating external data definitions example, 112
with, 127 technology, 109–15
invalid results, 73–74 Data structure modeling
many-to-many, 90–91 combined model illustration, 201
minimum outer join requirements for, 69 depth growth, 198–99
multimedia book example, 79 multiple-path processing, 199–200
object relational interface, 156–57 single-path structures, 197–99
Index 369

vertical growth, 198 symmetric linking example, 111


Data structure processing three-tier architecture, 53–54
lack of, 5–6 variable, 357
object relational interface, 156–57 variable generation, 221–30
Data structures virtualization, 239–41
client/server processing, 94 WHERE clause filtering with, 77
composition, 63–64 WYSIWYG display processing, 52
conceptual view, 54–55 Data value increase
conventional, 322 automatic data aggregation, 204
DSE and, 114–15 conclusion, 205
external view, 54 data accuracy and correctness, 204
filtering, 81–83 dynamic data joining, 201
flexible processing, 69 dynamic path data filtering, 203
fragments, selecting, 293–94 hierarchical optimization, 204
hierarchical, 51–53 interactive data access, 204
hierarchical hybrid, 85 miscellaneous operations, 203–4
hierarchical processing/relational multipath data qualification, 202–3
processing and, 57–59 static data joining, 200–201
indirect linking, 83 structure-aware processing, 203–4
internal view, 54 Data virtualization, 325
invalid example, 84 Data warehouses, 121
logical, 59–60, 201–2, 338 Denormalization, 326
logical tables as root of, 86 Depth growth, structured modeling, 198–99
many-to-one relationship, 55 Distributed hierarchical processing, 326–27
meta information, 172 Downward path data qualification, 294–95
modeling design principles, 64–65 Duplicate data, 327
multipath multioccurrence, 260 Duplicate element use, 327–28
nonhierarchical joining of, 87 Dynamic backward path data filtering, 303,
nonhierarchical join type support, 83–87 304
one-to-many relationship, 55 Dynamic data joining, of structures, 201
ordering as restructuring cause, 62–63 Dynamic metadata maintenance, 328
outer join, 68 Dynamic path filtering, 203
physical, 59–60 Dynamic path shortening
processing ability, 93–94 defined, 134, 328
processing empirical proof, 95–98 illustrated, 135
query languages, 53 network structures, 140
regular, 347 Dynamic rebuild, 137–38
relational, 202 Dynamic structure combining, 187–96
relational composition, 64 Dynamic structured data
relationship to Cartesian product, 59 processing example, 246–48
review, 51–66 remote, automatic processing of, 246
semantically controlled transformations, static versus, 245–46
231–44 user-to-user processing collaboration,
sibling legs query semantics, 60–62 247–48
SQL XML connection, 128–31 Dynamic structure joins, 188–89
substructure views, 74–77
symmetric joining of, 89 Edge tables, 329
370 Advanced Standard SQL Dynamic Structured Data Modeling

Embedded structure view Foreign-key fields, 3


empirical proof, 99–101 Four value logic, 332
expansion, 76 FROM clause, 12–13
Embedded variable structure test, 227, 229 FULL outer joins
Embedded views associativity, 24, 25
defined, 329 characteristics of, 23–24
hierarchical relational processing commutativity, 23–24
prototype, 150 defined, 6, 333
right-sided nesting and, 15 early nonstandard, 8
Empirical proofs logical tables with, 85
embedded structured view support, with more than two tables, 25–26
99–101 natural, 42–44
hierarchical data structure processing, NATURAL option, 26
95–98 nonassociativity attempt, 26
indirect link, 101–2 simulated operation, 7
nonhierarchical data structure processing, as symmetric, 92
98–99
Enterprise database access, 119–20 Global hierarchical optimization, 259–60
Expanded views, 331 Global queries, 217–18
Explicit natural joins, 37–39 Global views, 217, 333
Extensible Markup Language. See XML Glossary, this book, 317–60
External hierarchical processing, 255–56 Heterogeneous joins, 189
External views, 54, 331 Hierarchical control, 96–97
Files, multiple structure formats within, Hierarchical optimization, 208–9
170–71 data value increase, 204
Filtering defined, 334
access path data, 190 global, 259–60
backward path, 302–3 structure illustration, 208
below root, 308 Hierarchical processing
ON clause specification, 14 background history, 256
ON clause versus WHERE clause, 82 combined relational/hierarchical
data structure, 81–83 advantages, 259
defined, 323 conclusion, 264
dynamic paths, 203 defined, 325, 334
global hierarchical, 218 determination, 219
hierarchical data, 217–18, 333 distributed, 326–27
hierarchical flow, 260–61 empirical proof, 95–98
hierarchical query specification with, external, 255–56
281–83 focused aggregated data retrieval, 258
multipath, with WHERE clause, 294–96 global hierarchical optimization, 259–60
SQL multipath multioccurrence, 260–61 hierarchical control, 96–97
variable structure range, 228–30 internal, 255–56
WHERE clause, with substructures, isolating and manipulating data
78–79 segments, 262–63
WHERE clause specification, 5 linking below the root, 263
First normal form, 64, 331 multipath, 257–58
Index 371

multipath LCA types of processing, relational, mapping to hierarchical


261–62 relational rowset, 280
multipath multioccurrence data filtering, RIGHT joins, 29
260–61 symmetric joins, 90
nonlinear, 343 XML, mapping to hierarchical relational
operation, 257 rowset, 280–81
parallel, automatic, 218–19 XML data, 130
principles, 257 See also Data structures
relational processing relationship, 57–59 Hierarchictivity, 20–21
schema-free navigationless access, 257–58 defined, 20, 35, 335
structural control, 97–98 example of, 21
for structured data collaboration, 250–51 one-sided outer joins and, 30
technology and discoveries, 255–64
transformations, 263–64 Implicit natural joins, 37–39
Hierarchical processor Indirect structure linking
defined, 250 defined, 83
structure-aware, 258 empirical proof, 101–2
Hierarchical relational processing, 121–22 illustrated, 83
conclusion, 152–53 Inner joins, xxv
data modeling, 146–48 defined, 31, 336
department view, 147 format, 31–32
dynamic data structure specification, logical tables with, 85
145–46 natural, 38, 44–45
embedded views, 150 new role for, 105
employee view, 147 performance of, 4
many-to-many relationships, 148–50 review, 4–5
operation, 146 sample tables, 5
part/supplier view, 149 side effects, 4
prototype, 145–53 as symmetric, 92
supplier/part view, 149 table order and, 5
view optimization, 150–52 updating, 124
Hierarchical structures Interactive data access, 204
ON clause, 72–73 Intermixing join types, 33–34
for data organization, 250 in logical tables, 87
defined, 334 natural joins, 45–46
hybrid, 85 Internal hierarchical processing, 255–56
legend, 291 Internal views, 54
linking, 304–5 Intersecting data, 336
logical, 338 JDBC, 121, 127, 163
logical, mapping to/from relational Joins
rowset, 281 commutable, 321
modeling of many-to-many relationships, dynamic structure, 188–89
90–91 equal, 330
multileg example, 30 heterogeneous, 189
network structure conversion to, 57 hierarchical, 334
one-sided outer joins, 28 inner, xxv, 4–5, 336
power of, 51–53
372 Advanced Standard SQL Dynamic Structured Data Modeling

Joins (continued) Linking below root


introduction, 3–9 hierarchical processing, 263
lossless, 338 lower structure (no root selected), 307
natural, 37 lower structure (root selected), 305–7
one-sided, 344 structure joining and, 222
order, xxv, 16 Logical data structures, 59–60
outer, xxv, 6–9, 11–20, 23–31 advantage, 201–2
processing problems, 5–6 defined, 59–60
semi-join, 351 Logical tables
static structure, 187–88 coding outer join statements that use, 95
structural, 352 defined, 59–60
symmetric, 354 embedded, 88
types, intermixing, 33–34 example, 111
use of, 3–4 hybrid hierarchical structures with, 86
See also Standard join with INNER and FULL joins, 85
intermixing join types in, 87
Late binding, 160–61 NATURAL, 86
LCA processing, 211–12 as root of data structure, 86
combined type 1 and type 2, 215–16 Looking backward, 223
multipath, 261–62 Looking forward, 223–24
multiple type 1, 215–16 Lower structure linking, 177–86
SELECT operation, 262 conclusion, 186
type 1 internal, 212 multiple path reference, 182–84
type 2 internal, 212–14 nonroot, 177–81
type 2 variable OR, 214–15 optimization concerns, 184
WHERE clause, 261–62 single path reference, 181–82
XQuery and, 273 with view WHERE clause, 185–86
LEFT joins Lowest common ancestors (LCAs)
defined, 6, 26, 337 combinations controlled by data
in DSE logic, 113 occurrence, 299–300
illustrated, 27 combinations for decision logic, 299
with left-side nesting, 14 complex multipath decision logic, 301–2
natural, 42 data from up/down the structure, 298
with right-side nesting, 15 defined, 296, 339
Left-side nesting, 337 location higher than parent, 297–98
Legacy database access, 119–20 logic too complex to hand code, 302
Linear structure inversions, 236–37 many-to-one result data qualification,
Linear-to-nonlinear reshaping, 237–38 297
Linking multiple, 298–99
below root of lower structure (no root one-to-many result data qualification,
selected), 307 297
below root of lower structure (root variable, 300–301
selected), 305–7 WHERE clause, 300
defined, 338 See also LCA processing
filtering below root of lower view, 308
hierarchical, 304–5 Many-to-many relationships
with mashups, 303–8 association table, 149, 150, 194–95
Index 373

  data modeling, 90–91
  defined, 339
  hierarchical relational processor prototype, 148–50
  natural occurrence of, 148
  outer join modeling of, 91
Many-to-one relationship, 55–57
  association table, 56–57
  defined, 55–56, 339
  illustrated, 56
  structured output, 56
Markup data, 339
Mashups, 190–94
  advanced structure linking with, 303–8
  complex, 193–94
  defined, 325
  hierarchical structure, 334
  simple, 190–93
Metadata
  automatic maintenance, 319
  combining, 191
  defined, 340
  dynamic maintenance, 249, 328
  levels of processing, 249
  naturally nested, 191
Multileg AND selection, 61
Multileg data selection, 61
Multileg hierarchical structure, 29–30
Multileg OR selection, 62
Multimedia databases
  application view, 125
  authoring system, 126
  directory support, 125–27
  as specialized, 125
Multipath
  data qualification, 202
  hierarchical processing, 257–58
Multipath data filtering
  bidirectional data qualification, 295–96
  downward path data qualification, 294–95
  upward path data qualification, 295
  with WHERE clause, 294–96
Multipath hierarchical processing
  combining LCA type 1 and 2, 216
  LCA, 211–12
  LCA type 1 internal, 212
  LCA type 2 internal, 212–14
  LCA type 2 variable OR, 214
  multiple LCA type 1, 215–16
Multipath hierarchical structure operations, 207–20
  aggregated data retrieval, 210
  automatic hierarchical parallel processing, 218–19
  conclusion, 219–20
  global queries, 217–18
  global views and schema-free processing, 217
  hierarchical optimization, 209–10
  hierarchical processing, 211–16
  nonlinear ordering, 216–17
  structure-aware processing, 208
Multipath multioccurrence data filtering, 260–61
Multipath nonlinear data qualification, 296–99
  complex multipath LCA decision logic, 301–2
  LCA combinations controlled by data occurrence, 299–300
  LCA data up/down the structure, 298
  LCA determines decision logic, 299
  LCA located higher than parent, 297–98
  LCA logic too complex to hand code, 302
  LCA many-to-one result, 297
  LCA one-to-many result, 297
  multiple LCAs, 298–99
  variable LCAs with OR decision logic, 300–301
Multipath processing, 340
Multipath queries
  alternative to transformation, 244
  defined, 341
Multiple independent tests, 226–27
Multiple-path data structure modeling, 199–200
Multiple paths
  nonroot reference, 183
  references to lower structure, 182–84
  semantics, 183–84
Multitable natural join simulation, 39, 40

Namespaces, XML, 270
Natural FULL outer joins, 42–44
  condensed result, 43
  defined, 42–43
  reordering, 44
Natural inner joins, 44–45
  associativity, 44
  explicit and implicit, 38
Natural joins
  defined, 37, 341
  explicit, 37–39
  implicit, 37–39
  intermixing types, 45–46
  LEFT, 42
  multitable, 39–41
  one-sided, 41–42, 69
NATURAL keyword, 38, 39
NATURAL logical table, 86
Navigation
  database, 117–18, 159–60
  defined, 341
  structured data, 167–69
  XML, 270
  XQuery, 272–73
Nested left-side view expansion, 100
Nested right-side view expansion, 101
Nested variable structure test, 227, 228
Nesting
  elements, XML, 269–70
  left-side, 15, 337
  natural view, 190
  right-sided, 15, 348
  substructures, 75
Network structures
  ambiguous results, 74
  conversion to hierarchical structures, 57
  dynamic path shortening, 140
  hierarchical optimizations, 140–41
  junction points, 140
  unambiguous results, 75
Node collection, 292–93, 342
Node promotion, 291–92, 342–43
Node selection
  collection with multiple paths, 292–93
  promotion with single path, 291–92
  with SELECT operation, 289–94
  single linear path, 290–91
  structure fragments, 293–94
Nonhierarchical data structure processing, 98–99
Nonhierarchical joining
  of data structures, 87–90
  symmetric, 88–89
Nonlinear ordering, 216–17
Nonlinear-to-linear reshaping, 238
Nonlinear-to-nonlinear reshaping, 238–39
Nonprocedural languages, 343
Nonrelational SQL interfaces, optimization of, 138–39
Nonrelational universal data access, 163–73
Nonroot lower level linking
  of bottom structure, 178
  data used in, 181
  example of, 178
  multiple path, 179, 180
  multiple path reference, 183
  optimization concerns, 184
  overview of, 177–78
  performing with multiple link points, 178–79
  previous method, 178, 179
  semantics of, 178–81
  single path reference, 182
  view optimization, 184
  See also Lower structure linking
Normalization rules, 64–65, 343

Object Definition Language (ODL), 103
Object relational interface, 123, 155–62
  capabilities, 156
  conclusion, 162
  data abstraction and reusability, 157–58
  database navigation, 159–60
  data inheritance, 158–59
  data modeling, 156–57
  illustrated, 156
  late binding and polymorphism, 160–61
  nonrelational access, 159–60
  plug and play, 161–62
  structure processing, 156–57
ODBC, 121, 127, 163, 344
ODMG model, 103
ON clause
  Cartesian product model and, 17, 18
  data-filtering criteria specification on, 14
  data modeling examples, 72–73
  data modeling join condition rules, 70–72
  defined, 344
  for hierarchical structure, 72–73
  join order and, 16
  shifting to WHERE clause, 141–43
  WHERE clause filtering versus, 82
One or the other variable test, 225–26
One-sided natural outer joins, 41–42
  illustrated, 42
  LEFT, 41–42
  transformation, 46–47
One-sided outer joins, 26–31, 344
  building structures bottom-up, 29
  defined, 26
  as hierarchical, 28
  hierarchictivity and, 30
  natural, 69
  as noncommutative, 27
  nonhierarchical example, 31
  types of, 26
One-to-many relationship, 55, 344
Open database access interface, 120
Optimizations
  access, 118–19
  hierarchical, 208–10, 259–60
  outer join, 137–43
  semantic, 350
  view, 150–52
Ordered data, XML, 270–71, 345
OR operator
  hierarchical operations and, 224–25
  in linking structures, 72
  testing, 224–25
Outer join data modeling, 67–70
  coding, 94–95
  generation of, 95
  of many-to-many relationships, 91
  related capabilities, 81–92
Outer join optimization, 133–43
  applying to network structures, 140–41
  conclusion, 143
  dynamic, 119
  dynamic rebuild and, 137–38
  dynamic shortening of access path and, 134
  join table reordering and, 133–34
  large views and, 137
  of nonrelational SQL interfaces, 138–39
  parallel database processing efficiency and, 137
  replicated rows elimination, 136
  results, 136
  shifting ON clauses to WHERE clause and, 141–43
Outer joins
  advanced capabilities, 117–31
  ANSI, 103–4
  associativity, 19–20
  commutativity, 19
  data structures, 68
  early nonstandard example, 9
  FULL, 6, 8, 14–15, 23–26, 333
  natural, 39–41
  natural one-sided, 41–42
  old-style, 104–5
  one-sided, 7, 26
  open database access interface, 120
  optimizing, 6
  query translation, 139
  review, 6–7
  specifications, 14, 15
  syntax, 11–13
  syntax problems, 7–9
  table join order, 13
  table removal from, 134–37
  types of, 6
  universal database navigation and access, 118
  updating, 124–25
  uses for, xxv

Parallel database processing
  defined, 345
  hierarchical, automatic, 218–19
  outer join optimization and, 137
Physical data structures, 59–60
Plug and play
  defined, 161–62
  illustrated, 161
Polymorphic transformation
  defined, 242, 346
  linear example, 242, 243
  nonlinear example, 242, 243
Polymorphism, 160–61
Postprocessor, XML, 286
Postrelational data, interfacing to, 171
Preprocessor, XML, 284–85
Prerelational data, interfacing to, 171
Primary-key fields, 3
Procedural code, 138–39

Queries
  ad hoc, 317
  bushy, 320
  global, 217–18
  hierarchical, 281–83, 335
  multipath, 244, 341
  static, 353

Recursive structures, XML, 270, 347
Relational processing
  hierarchical processing relationship, 57–59
  nested, 342
Relational structures, 202
Remote structured data
  automatic processing of, 245–51
  hierarchical processing for collaboration, 250–51
  integrating SQL with maintenance, 248
  metadata processing, 249
  processing collaboration, 249–50
  processing example, 246–48
  static versus dynamic, 245–46
Reordering, table, 133–34
Replicated data, 5
Reshaping
  defined, 231
  inverting linear structure by, 236–37
  linear-to-nonlinear, 237–38
  nonlinear-to-linear, 238
  nonlinear-to-nonlinear, 238–39
  polymorphic transformation, 242–43
  restructuring versus, 235
Restructuring, 231–35
  adding extra level, 234
  data relationships, 231
  defined, 231
  illustrated, 232
  with multiple levels, 234–35
  performing, 232
  use of, 231
RIGHT joins
  defined, 6, 26, 348
  in DSE logic, 113
  hierarchical structure, 29
  illustrated, 27
Right-sided nesting, 348
Root, 348
Rows
  contents, 3
  defined, 349
  replicated, elimination of, 136

Schema-free processing
  defined, 349
  global views and, 217
  ignoring, 273
  queries, 257–58
Scope of control, 349
Second normal form, 65
SELECT operation, 6, 9
  in hierarchical optimization, 209–10
  LCA processing, 262
  node selection with, 289–94
Semantic mapping, 349
Semantic optimizations, 350
Semi-join, 351
Semistructured data, 266, 351
Sibling legs query semantics, 60–62
Simple mashup, 190–93
  defined, 190–92
  illustrated, 192
  See also Mashups
Single-path data structure modeling, 197–99
Single-path reference, to lower structure, 181–82
SQL
  hierarchical data integration, 259
  integrating with dynamic structured data maintenance, 248
  interfaces, nonrelational, 138–39
  interfaces, standardized, 155
  structured data access, 166–67
SQL:1999
  data modeling and, 102–3
  dynamic rebuild and, 138
  navigation, 160
  nested data storage, 103
  for universal data access, 169–70
SQL/CLI, 121, 127, 163
SQL/XML
  capability limitation, 272
  defined, 352
  development projects, 279
  semi-structured processing, 274
  solution with standard SQL, 276–77
  vendor solutions, 274
Standardized SQL interface, 155
Standard join
  associativity, 18
  Cartesian product model and, 17–18
  commutativity, 18
  operation, 14–17
  syntax, 11–14
  See also Joins
Static backward path data filtering, 302–3
Static data joining, of structures, 200–201
Static structured data, 245–46
Static structure joins, 187–88
Structural control, 97–98
Structure-aware processing, 203–4, 208
Structure combining
  access path data filtering, 190
  with association tables, 194–96
  complex mashup, 193
  conclusion, 196
  dynamic structure join, 188–89
  heterogeneous join, 189
  metadata, 191
  natural view nesting, 190
  simple mashup, 190–93
  static structure join, 187–88
Structured data
  access basics, 166–67
  access problem, 169
  composition of, 164
  defined, 353
  dynamic, 245–48
  “fixed occurs,” 166
  internal navigation of, 167–69
  mapping of, 167–69
  overview, 164–66
  processing, 164
  processing collaboration, 249–50
  pseudo code for decomposition and mapping of, 168
  static, 245–46
  universal data access of, 169–70
  variable-length contiguous, 165
  virtualization, 241
  XML, 130
Structure fragments, 353
Structures
  dynamic data joining of, 201
  static data joining of, 200–201
Substructures, 74–77
  embedded, 76
  right-sided nesting, 75
  support, 77
  views, 354
  WHERE clause filtering with, 78–79
  See also Data structures
Symmetric joining
  of data structures, 89
  defined, 354
  in leg synchronization, 90
  performing, 89
  at root level, 88
Symmetric linking
  data structures example, 111–12
  DSE example, 112

Tables
  association, 194–96
  edge, 329
  logical, 338
  many-to-many, 194–95
  order, 336
  reordering, 133–34, 336–37
  unnecessary, removing, 134–37
Tabular structure, 354
Three-tier architecture, 53–54
  defined, 53
  illustrated, 54
Top-down processing/execution, 355
Transformations
  any-to-any structure, 318
  defined, 353, 355
  hierarchical processing, 263–64
  multipath queries alternative, 244
  polymorphic, 242–43, 346

Unambiguous semantics, 355
Unified views, 355–56
UNION joins
  defined, 32
  example of, 33
  operation, 6
  performing, 32
Universal data access, 163–73
  conclusion, 173
  contiguous data view and, 171–72
  interfacing to middleware, 170
  interfacing to prerelational/postrelational data, 171
  multiple structure formats and, 170–71
  nonrelational SQL-based, 163–73
  of structured data, 169–70
Unstructured data, 356
Updating views, 123–25
Upward path data qualification, 295
User-defined functions (UDFs), 102
User-defined types (UDTs), 102
USING clause
  column name specification, 13
  defined, 357
  join order and, 16

Value-added features, 120–21
Variable data structure generation, 221–30
  advanced variable control structure, 224–25
  along multiple paths, 228, 229
  concept, 221
  conclusion, 230
  embedded variable structure test, 227, 229
  with hierarchical data, 230
  linking below the root, 222
  looking backward, 223
  looking forward, 223–24
  multiple choices, 225–27
  multiple independent tests, 226–27
  nested variable structure test, 227, 228
  one or the other variable test, 225–26
  range filtering, 228–30
Variable data structures, 357
Variable-occurrence repeating segments, 165
Variable structure control, 224–25
Variable structure generation control, 308–10
  at node level, 309–10
  at view level, 310
  See also Variable data structure generation
Variable structure range filtering, 228–30
Vertical growth, structured modeling, 198
Views
  application, 125
  conceptual, 54–55
  for contiguous data, 171–72
  defined, 354
  embedded, 15, 99–101, 150, 329
  expanded, 100, 101, 151, 331
  external, 54, 331
  global, 172, 217, 333
  internal, 54
  materialization, 357
  notes on, 316
  optimization, 150–52, 357–58
  outer join access, 122
  substructure, 74–77, 354
  unified, 355–56
  update capability, 123–25
  variable structures built using, 310
  WHERE clause, lower structure linking with, 185–86
Virtualization
  data fragment control, 240
  data structure, 239–41
  defined, 358
  example, 241

W3C. See XQuery
WHERE clause, 7, 8
  ON clause uses and, 104
  LCA, 300
  LCA processing, 261–62
  lower structure linking with, 185–86
  outer joins and, 13–14
WHERE clause filtering
  ON clause filtering versus, 82
  with data structures, 77
  defined, 358
  hierarchical data, 260
  multipath, 294–96
  specification, 5
  with substructures, 78–79
  transformation for, 78
White space, XML, 359
WYSIWYG display processing, 52

XML (Extensible Markup Language), 265–77
  attribute mode output, 268
  attributes, 319
  centric syntax, 277
  content, handling, 128
  data processing, 271
  data structure connection, 128–31
  data types, 359
  defined, 128, 359
  documents, 359
  duplicate element use, 269
  element mode, 266
  element nesting, 269–70
  fragments, 360
  hierarchical data processing, 129
  hierarchical structure, 130
  mixed mode output, 268
  multiple content types, 266–68
  namespaces, 270
  navigation, 270
  ordered data, 270–71
  politics of, 271–73
  recursive structures, 270
  secret agenda, 271–73
  semistructured data, 266
  shared element data, 269–70
  SQL hierarchical solution, 276–77
  standard implementation, 265
  support, 265
  use cases, 129
  user linear mindset limitation, 275
  variable structure formats, 268–69
  Web sites, 129, 130
  white space, 359
XML processor, 279–87
  asynchronous access processor, 286
  backward path data filtering, 302–3
  conclusion, 286–87, 310–11
  defined, 279–80
  examples, 289–311
  external operations, 284
  hierarchical query specification with filtering, 281–83
  internal layout, 283
  multipath data filtering, 294–96
  multipath nonlinear data qualification (complex), 299–302
  multipath nonlinear data qualification (simple), 296–99
  node selection with SELECT operation, 289–94
  operations, 284–85
  postprocessor, 286
  preprocessor, 284–85
  standard SQL processor, 285–86
  structure linking with mashups, 303–8
  variable structure generation control, 308–10
XPath, 360
XQuery, 265, 271
  adding/removing data from, 275–76
  decisions limit capabilities, 272
  defined, 360
  LCA processing and, 273
  relational processing support, 272
  SELECT operation and, 275–76
  semi-structured processing, 274
  user navigation, 272–73
XSLT, 271, 360
