SQL & NoSQL Databases
Models, Languages, Consistency Options and Architectures for Big Data Management

Andreas Meier
Department für Informatik, Universität Fribourg, Fribourg, Switzerland

Michael Kaufmann
Departement für Informatik, Hochschule Luzern, Rotkreuz, Switzerland
Translated from German by Anja Kreutel.
ISBN 978-3-658-24548-1
ISBN 978-3-658-24549-8 (eBook)
https://doi.org/10.1007/978-3-658-24549-8
Library of Congress Control Number: 2019935851
Springer Vieweg
© Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer Vieweg imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH,
part of Springer Nature.
The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany
Foreword
The term “database” has long since become part of people’s everyday vocabulary, for
managers and clerks as well as students of most subjects. They use it to describe a logi-
cally organized collection of electronically stored data that can be directly searched and
viewed. However, they are generally more than happy to leave the whys and hows of its
inner workings to the experts.
Users of databases are rarely aware of the immaterial and concrete business values
contained in any individual database. This applies as much to a car importer’s spare parts
inventory as to the IT solution containing all customer securities accounts at a bank or the patient
information system of a hospital. Yet failure of these systems, or even cumulative errors,
can threaten the very existence of the respective company or institution. For that rea-
son, it is important for a much larger audience than just the “database specialists” to be
well-informed about what is going on. Anyone involved with databases should under-
stand what these tools are effectively able to do and which conditions must be created
and maintained for them to do so.
Probably the most important aspect concerning databases involves (a) the distinction
between their administration and the data stored in them (user data) and (b) the economic
magnitude of these two areas. Database administration consists of various technical and
administrative factors, from computers, database systems, and additional storage to the
experts setting up and maintaining all these components—the aforementioned database
specialists. It is crucial to keep in mind that the administration is by far the smaller part
of standard database operation, constituting only about a quarter of the entire effort.
Most of the work and expenses concerning databases lie in gathering, maintaining,
and utilizing the user data. This includes the labor costs for all employees who enter data
into the database, revise it, retrieve information from the database, or create files using
this information. In the above examples, this means warehouse employees, bank tellers,
or hospital personnel in a wide variety of fields—usually for several years.
In order to be able to properly evaluate the importance of the tasks connected with
data maintenance and utilization on the one hand and database administration on the
other hand, it is vital to understand and internalize this difference in the effort required
for each of them. Database administration starts with the design of the database, which
already touches on many specialized topics such as determining the consistency checks
for data manipulation or regulating data redundancies, which are as undesirable on the
logical level as they are essential on the storage level. The development of database solu-
tions is always targeted at their later use, so ill-considered decisions in the development
process may have a permanent impact on everyday operations. Finding ideal solutions,
such as the golden mean between too strict and too flexible when determining consist-
ency conditions, may require some experience. Unduly strict conditions will interfere
with regular operations, while excessively lax rules will entail a need for repeated expen-
sive data repairs.
To avoid such issues, it is invaluable that anyone concerned with database develop-
ment and operation, whether in management or as a database specialist, gain systematic
insight into this field of computer science. The table of contents gives an overview of
the wide variety of topics covered in this book. The title already shows that, in addition
to an in-depth explanation of the field of conventional databases (relational model, SQL),
the book also provides highly educational information about current advancements and
related fields, the keywords being “NoSQL” or “post-relational” and “Big Data.” I am
confident that the newest edition of this book will, once again, be well received by both
students and professionals—its authors are quite familiar with both groups.
Carl August Zehnder
Preface
It is remarkable how stable some concepts are in the field of databases. Information
technology is generally known to be subject to rapid development, bringing forth new
technologies at an unbelievable pace. However, this is only superficially the case. Many
aspects of computer science do not essentially change at all. This includes not only the
basics, such as the functional principles of universal computing machines, processors,
compilers, operating systems, databases and information systems, and distributed
systems, but also languages and protocols such as C, TCP/IP, or HTML, which are
decades old but in many ways provide a stable foundation for the global
information system known as the World Wide Web. Likewise, the SQL language has
been in use for over four decades and will remain so for the foreseeable future. The
theory of relational database systems was initiated in the 1970s by Codd (relational model
and normal forms), Chen (entity-relationship model), and Chamberlin and Boyce
(SEQUEL). These technologies still have a major impact on the practice of data
management today. In particular, with the Big Data revolution and the widespread use of
data science methods for decision support, relational databases and the use of SQL for
data analysis are actually becoming more important. Even though sophisticated statistics
and machine learning are enhancing the possibilities for knowledge extraction from data,
many if not most data analyses for decision support rely on descriptive statistics using
SQL for grouped aggregation. In that sense, although SQL database technology is quite
mature, it is more relevant today than ever.
Nevertheless, a lot has changed in the area of database systems over the years.
In particular, the developments in the Big Data ecosystem have brought new technologies
into the world of databases, to which we pay due attention. The nonrelational database
technologies, which are finding more and more fields of application under the generic
term NoSQL, differ from the classical relational databases not only superficially but
also in their underlying principles. Relational databases were developed in the twentieth
century with the purpose of enabling tightly organized, operational forms of data
management, which provided stability but limited flexibility. In contrast, the NoSQL
database movement emerged at the beginning of the current century, focusing on horizontal
partitioning and schema flexibility, and with the goal of solving the Big Data problems
of volume, variety, and velocity, especially in Web-scale data systems. This has far-
reaching consequences and has led to a new approach in data management, which devi-
ates significantly from the previous theories on the basic concept of databases: the way
data is modeled, how data is queried and manipulated, how data consistency is handled,
and the system architecture. This is why we compare these two worlds, SQL and NoSQL
databases, from different perspectives in all chapters.
We have also launched a website called sql-nosql.org, where we share teaching and
tutoring materials such as slides, tutorials for SQL and Cypher, case studies, and a
workbench for MySQL and Neo4j, so that language training can be done either with SQL or
with Cypher, the graph-oriented query language of the NoSQL database Neo4j.
At this point, we would like to thank Anja Kreutel for her great effort and success
in translating the eighth edition of the German textbook into English. We also thank
Alexander Denzler and Marcel Wehrle for developing the workbench for relational
and graph-oriented databases. For the redesign of the graphics, we were able to
enlist Thomas Riediker, and we thank him for his tireless efforts. He has succeeded in
giving the pictures a modern style and an individual touch. For the further development
of the tutorials and case studies, which are available on the website sql-nosql.org, we
thank the computer science students Andreas Waldis, Bettina Willi, Markus Ineichen,
and Simon Studer for their contributions to the tutorial in Cypher and to the case study
Travelblitz with OpenOffice Base and with Neo4j. For their feedback on the manuscript
and their willingness to contribute to the quality of our work with their suggestions,
we thank Alexander Denzler, Daniel Fasel, Konrad Marfurt, and Thomas Olnhoff. A big thank you
goes to Sybille Thelen, Dorothea Glaunsinger, and Hermann Engesser of Springer, who
have supported us with patience and expertise.
February 2019 Andreas Meier
Michael Kaufmann
Contents
1 Data Management
1.1 Information Systems and Databases
1.2 SQL Databases
1.2.1 Relational Model
1.2.2 Structured Query Language (SQL)
1.2.3 Relational Database Management System
1.3 Big Data
1.4 NoSQL Databases
1.4.1 Graph-based Model
1.4.2 Graph Query Language Cypher
1.4.3 NoSQL Database Management System
1.5 Organization of Data Management
1.6 Further Reading
References
2 Data Modeling
2.1 From Data Analysis to Database
2.2 The Entity-Relationship Model
2.2.1 Entities and Relationships
2.2.2 Association Types
2.2.3 Generalization and Aggregation
2.3 Implementation in the Relational Model
2.3.1 Dependencies and Normal Forms
2.3.2 Mapping Rules for Relational Databases
2.3.3 Structural Integrity Constraints
2.4 Implementation in the Graph Model
2.4.1 Graph Properties
2.4.2 Mapping Rules for Graph Databases
2.4.3 Structural Integrity Constraints
2.5 Enterprise-Wide Data Architecture
2.6 Formula for Database Design
2.7 Further Reading
References
3 Database Languages
3.1 Interacting with Databases
3.2 Relational Algebra
3.2.1 Overview of Operators
3.2.2 Set Operators
3.2.3 Relational Operators
3.3 Relationally Complete Languages
3.3.1 SQL
3.3.2 QBE
3.4 Graph-based Languages
3.4.1 Cypher
3.5 Embedded Languages
3.5.1 Cursor Concept
3.5.2 Stored Procedures and Stored Functions
3.5.3 JDBC
3.5.4 Embedding Graph-based Languages
3.6 Handling NULL Values
3.7 Integrity Constraints
3.8 Data Protection Issues
3.9 Further Reading
References
4 Ensuring Data Consistency
4.1 Multi-User Operation
4.2 Transaction Concept
4.2.1 ACID
4.2.2 Serializability
4.2.3 Pessimistic Methods
4.2.4 Optimistic Methods
4.2.5 Troubleshooting
4.3 Consistency in Massive Distributed Data
4.3.1 BASE and the CAP Theorem
4.3.2 Nuanced Consistency Settings
4.3.3 Vector Clocks for the Serialization of Distributed Events
4.4 Comparing ACID and BASE
4.5 Further Reading
References
5 System Architecture
5.1 Processing of Homogeneous and Heterogeneous Data
5.2 Storage and Access Structures
5.2.1 Indexes and Tree Structures
5.2.2 Hashing Methods
5.2.3 Consistent Hashing
5.2.4 Multidimensional Data Structures
5.3 Translation and Optimization of Relational Queries
5.3.1 Creation of Query Trees
5.3.2 Optimization by Algebraic Transformation
5.3.3 Calculation of Join Operators
5.4 Parallel Processing with MapReduce
5.5 Layered Architecture
5.6 Use of Different Storage Structures
5.7 Further Reading
References
6 Postrelational Databases
6.1 The Limits of SQL—and Beyond
6.2 Federated Databases
6.3 Temporal Databases
6.4 Multidimensional Databases
6.5 Data Warehouse
6.6 Object-Relational Databases
6.7 Knowledge Databases
6.8 Fuzzy Databases
6.9 Further Reading
References
7 NoSQL Databases
7.1 Development of Nonrelational Technologies
7.2 Key-Value Stores
7.3 Column-Family Stores
7.4 Document Stores
7.5 XML Databases
7.6 Graph Databases
7.7 Further Reading
References
Glossary
References
Index
List of Figures
Fig. 1.1 Architecture and components of information systems
Fig. 1.2 Table structure for an EMPLOYEE table
Fig. 1.3 EMPLOYEE table with manifestations
Fig. 1.4 Formulating a query in SQL
Fig. 1.5 The difference between descriptive and procedural languages
Fig. 1.6 Basic structure of a relational database management system
Fig. 1.7 Variety of sources for Big Data
Fig. 1.8 Section of a property graph on movies
Fig. 1.9 Section of a graph database on movies
Fig. 1.10 Basic structure of a NoSQL database management system
Fig. 1.11 Three different NoSQL databases
Fig. 1.12 The four cornerstones of data management
Fig. 2.1 The three steps necessary for data modeling
Fig. 2.2 EMPLOYEE entity set
Fig. 2.3 INVOLVED relationship between employees and projects
Fig. 2.4 Entity-relationship model with association types
Fig. 2.5 Overview of the possible cardinalities of relationships
Fig. 2.6 Generalization, illustrated by EMPLOYEE
Fig. 2.7 Network-like aggregation, illustrated by CORPORATION_STRUCTURE
Fig. 2.8 Hierarchical aggregation, illustrated by ITEM_LIST
Fig. 2.9 Redundant and anomaly-prone table
Fig. 2.10 Overview of normal forms and their definitions
Fig. 2.11 Tables in first and second normal forms
Fig. 2.12 Transitive dependency and the third normal form
Fig. 2.13 Table with multivalued dependencies
Fig. 2.14 Improper splitting of a PURCHASE table
Fig. 2.15 Tables in fifth normal form
Fig. 2.16 Mapping entity and relationship sets onto tables
Fig. 2.17 Mapping rule for complex-complex relationship sets
Fig. 2.18 Mapping rule for unique-complex relationship sets
Fig. 2.19 Mapping rule for unique-unique relationship sets
Fig. 2.20 Generalization represented by tables
Fig. 2.21 Network-like corporation structure represented by tables
Fig. 2.22 Hierarchical item list represented by tables
Fig. 2.23 Ensuring referential integrity
Fig. 2.24 A Eulerian cycle for crossing 13 bridges
Fig. 2.25 Iterative procedure for creating the set Sk(v)
Fig. 2.26 Shortest subway route from stop v0 to stop v7
Fig. 2.27 Construction of a Voronoi cell using half-spaces
Fig. 2.28 Dividing line T between two Voronoi diagrams VD(M1) and VD(M2)
Fig. 2.29 Sociogram of a middle school class as a graph and as an adjacency matrix
Fig. 2.30 Balanced (B1–B4) and unbalanced (U1–U4) triads
Fig. 2.31 Mapping entity and relationship sets onto graphs
Fig. 2.32 Mapping rule for network-like relationship sets
Fig. 2.33 Mapping rule for hierarchical relationship sets
Fig. 2.34 Mapping rule for unique-unique relationship sets
Fig. 2.35 Generalization as a tree-shaped partial graph
Fig. 2.36 Network-like corporation structure represented as a graph
Fig. 2.37 Hierarchical item list as a tree-shaped partial graph
Fig. 2.38 Abstraction steps of enterprise-wide data architecture
Fig. 2.39 Data-oriented view of business units
Fig. 2.40 From rough to detailed in ten design steps
Fig. 3.1 SQL as an example for database language use
Fig. 3.2 Set union, set intersection, set difference, and Cartesian product of relations
Fig. 3.3 Projection, selection, join, and division of relations
Fig. 3.4 Union-compatible tables SPORTS_CLUB and PHOTO_CLUB
Fig. 3.5 Set union of the two tables SPORTS_CLUB and PHOTO_CLUB
Fig. 3.6 COMPETITION relation as an example of Cartesian products
Fig. 3.7 Sample projection on EMPLOYEE
Fig. 3.8 Examples of selection operations
Fig. 3.9 Join of two tables with and without a join predicate
Fig. 3.10 Example of a divide operation
Fig. 3.11 Recursive relationship as entity-relationship model and as graph with node and edge types
Fig. 3.12 Unexpected results from working with NULL values
Fig. 3.13 Truth tables for three-valued logic
Fig. 3.14 Definition of declarative integrity constraints
Fig. 3.15 Definition of views as part of data protection
Fig. 4.1 Conflicting posting transactions
Fig. 4.2 Analyzing a log using a precedence graph
Fig. 4.3 Sample two-phase locking protocol for the transaction TRX_1
Fig. 4.4 Conflict-free posting transactions
Fig. 4.5 Serializability condition for TRX_1 not met
Fig. 4.6 Restart of a database system after an error
Fig. 4.7 The three possible combinations under the CAP theorem
Fig. 4.8 Ensuring consistency in replicated systems
Fig. 4.9 Vector clocks showing causalities
Fig. 4.10 Comparing ACID and BASE
Fig. 5.1 Processing a data stream
Fig. 5.2 B-tree with dynamic changes
Fig. 5.3 Hash function using the division method
Fig. 5.4 Ring with objects assigned to nodes
Fig. 5.5 Dynamic changes in the computer network
Fig. 5.6 Dynamic partitioning of a grid index
Fig. 5.7 Query tree of a qualified query on two tables
Fig. 5.8 Algebraically optimized query tree
Fig. 5.9 Computing a join with nesting
Fig. 5.10 Going through tables in sorting order
Fig. 5.11 Determining the frequencies of search terms with MapReduce
Fig. 5.12 Five-layer model for relational database systems
Fig. 5.13 Use of SQL and NoSQL databases in an online store
Fig. 6.1 Horizontal fragmentation of the EMPLOYEE and DEPARTMENT tables
Fig. 6.2 Optimized query tree for a distributed join strategy
Fig. 6.3 EMPLOYEE table with data type DATE
Fig. 6.4 Excerpt from a temporal table TEMP_EMPLOYEE
Fig. 6.5 Data cube with different analysis dimensions
Fig. 6.6 Star schema for a multidimensional database
Fig. 6.7 Implementation of a star schema using the relational model
Fig. 6.8 Data warehouse in the context of business intelligence processes
Fig. 6.9 Query of a structured object with and without implicit join operator
Fig. 6.10 BOOK_OBJECT table with attributes of the relation type
Fig. 6.11 Object-relational mapping
Fig. 6.12 Comparison of tables and facts
Fig. 6.13 Analyzing tables and facts
Fig. 6.14 Derivation of new information
Fig. 6.15 Classification matrix with the attributes Revenue and Loyalty
Fig. 6.16 Fuzzy partitioning of domains with membership functions
Fig. 7.1 Massively distributed key-value store with sharding and hash-based key distribution
Fig. 7.2 Storing data in the Bigtable model
Fig. 7.3 Example of a document store
Fig. 7.4 Illustration of an XML document represented by tables
Fig. 7.5 Schema of a native XML database
Fig. 7.6 Example of a graph database with user data of a website