M1 - Database Management Systems
Models and Functional Architecture
PID_00179807
The texts and images contained in this publication are subject -except where indicated to the contrary- to an Attribution-
NonCommercial-NoDerivs license (BY-NC-ND) v.3.0 Spain by Creative Commons. You may copy, publicly distribute and
transfer them as long as the author and source are credited (FUOC. Fundación para la Universitat Oberta de Catalunya (Open
University of Catalonia Foundation)), neither the work itself nor derived works may be used for commercial gain. The full terms of
the license can be viewed at http://creativecommons.org/licenses/by-nc-nd/3.0/legalcode
CC-BY-NC-ND • PID_00179807 Database Management Systems
Index
Introduction............................................................................................... 5
Objectives..................................................................................................... 6
1. Relational Extensions....................................................................... 7
1.1. Pre-relational Models .................................................................. 7
1.2. Relational Model ......................................................................... 7
1.3. Object-oriented Extension .......................................................... 8
1.4. XML Extension ............................................................................ 9
Summary...................................................................................................... 15
Self-evaluation............................................................................................ 17
Answer key.................................................................................................. 18
Glossary........................................................................................................ 19
Bibliography............................................................................................... 20
Introduction
Firstly, we will see some problems associated with the relational model: there is
a well-known lack of semantics in the classical relational model and, moreover, in
some applications there is a mismatch between their data and the rigid structure
imposed by a relational schema. To address these problems, two trends
have appeared in recent years (Object-Orientation and XML, respectively).
Objectives
1. Relational Extensions
The relational model was defined by Edgar F. Codd in 1970, while he was
working at IBM labs.

Bibliography

Edgar F. Codd (1970). "A Relational Model of Data for Large Shared Data
Banks". Communications of the ACM (13(6), pp. 377-387).

To understand the relational model, it is important to know how databases
were managed at that time. Thus, we will briefly see how data was stored in
the 60s. Then, we will overview the relational model to appreciate its
contributions to the state of the art. Finally, we will analyse the flaws of
the relational model from the semantics and rigidity of schema viewpoints.

See also

This will give rise to the relational extensions explained in the module
"Relational Extensions".

1.1. Pre-relational Models
In the 60s, data was stored in file systems organised into records with
fields. Each application had to access many independent files. Without any
kind of control, in applications with large quantities of data being
constantly updated, inconsistencies between different files (or even records
in the very same file) were really likely.

Thus, in the 70s, the market dealt with this using hierarchical and network
data systems. These kinds of systems went a step further than independent
files by managing pointers between instances. The hierarchical model allowed
the definition of a tree of elements, while the network model (also known as
CODASYL) allowed one child to have many parents.

CODASYL

The Committee on Data Systems and Languages was a consortium of companies
that, among other things, defined the COBOL programming language.
During the 70s, the theoretical background of the relational model was
developed (based on solid mathematical foundations, i.e. set theory) and in
the early 80s the first prototypes appeared. The idea behind this was to
address two different problems: maintenance of pointers and low-level
procedural query languages.

Relational prototypes

The first relational prototype was System R (later DB2), followed by Oracle
and Ingres.

Before the relational model, pointers were physical addresses. This means
that moving data from one disk position to another resulted in either an
inconsistency or a cascade modification. Primary keys and foreign keys solved
this problem, since they are logical pointers (i.e. independent of where data
is actually stored).
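The idea of keys as logical pointers can be sketched with a small example (table and column names are hypothetical, chosen only for illustration): the foreign key stores the parent's key value, never a disk address, so references remain valid wherever the rows are physically placed.

```python
import sqlite3

# Minimal sketch: a foreign key stores the parent's key VALUE, not a disk
# address, so references stay valid wherever rows are physically stored.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employee (
                    emp_id  INTEGER PRIMARY KEY,
                    name    TEXT,
                    dept_id INTEGER REFERENCES department(dept_id))""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ada', 10)")

# The join resolves the logical pointer by value, not by location.
row = conn.execute("""SELECT d.name FROM employee e
                      JOIN department d ON e.dept_id = d.dept_id
                      WHERE e.emp_id = 1""").fetchone()
print(row[0])  # Sales
```

Contrast this with a physical pointer: if the `department` row moved on disk, a stored address would dangle, whereas the value 10 still resolves correctly.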
From another point of view, the development of applications was
time-consuming and error-prone because programmers had to deal with low-level
interfaces. Thus, the efficiency of the data access was absolutely dependent
on the ability of the programmer and his/her knowledge of the physical
characteristics of the storage. Therefore, the main contribution of the
relational model was a declarative language. SQL was first standardised in
1986 and revised in 1989. From here on, the required data had only to be
declared and it was the responsibility of the DBMS to find the best way to
retrieve it. This resulted in a simplification of the code and real savings
in development and maintenance time.

Declarative vs Procedural

In a declarative language you state what you want, while in a procedural
language (like C, C++, Java, etc.) you have to state how to obtain it.
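The contrast in the margin note can be sketched over a toy table (names and values are hypothetical): the declarative version states what is wanted and leaves the access path to the DBMS, while the procedural version spells out the record-by-record scan, much as pre-relational code had to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (student TEXT, mark INTEGER)")
conn.executemany("INSERT INTO grade VALUES (?, ?)",
                 [("Ann", 7), ("Bob", 4), ("Eve", 9)])

# Declarative: state WHAT is wanted; the DBMS decides how to retrieve it.
declarative = [r[0] for r in
               conn.execute("SELECT student FROM grade WHERE mark >= 5")]

# Procedural: spell out HOW, record by record.
procedural = []
for student, mark in conn.execute("SELECT student, mark FROM grade"):
    if mark >= 5:
        procedural.append(student)

print(sorted(declarative) == sorted(procedural))  # True
```

Both compute the same answer; the difference is who is responsible for choosing the access strategy.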
From a theoretical point of view, data in the relational model is stored in
relations (usually associated with a given concept). Each relation contains
tuples (corresponding to the instances of the concept) and these are composed
of attributes (showing the characteristics of each instance). In the
implementation of the model, relations are represented by tables, which
contain rows that have different values (one per column defined in the schema
of the table). Below this, the implementations store tables in files that
contain records of fields. The correspondences of these terms are summarised
in Table 1.

Table 1. Correspondence of terms

Theoretical model    Implementation    Physical storage
Relation             Table             File
Tuple                Row               Record
Attribute            Column            Field
In the early 90s, the object-oriented paradigm was the main trend in
programming languages. This widened the gap between them and databases (the
former being procedural and the latter, declarative). The theoretical
background of the foundations of the relational model included several normal
forms that a good design had to follow. The first of these normal forms
stated that the value of an attribute had to be atomic. This explicitly
forbids the possibility of storing objects, since these can contain nested
structures (i.e. they are not atomic).

Object-oriented programming languages

Smalltalk appeared in 1972, C++ in 1983, Delphi and Java in 1995, and C# in
2001.
Moreover, object-oriented programming advocated the encapsulation of
behaviour and data under the same structure. The maturity of relational DBMSs
(RDBMSs) at the time allowed for proposals to move the behaviour of the
objects to the DBMS, where it would be closer to the data. On the one hand,
this would be better for maintenance and extensibility and, on the other, it
would be more efficient, since we would not have to extract data from the
server to modify it and move it back to the server side again.

1st Normal Form

A relation is in 1NF if and only if each and every attribute in the relation
is atomic, i.e. no attribute is itself a relation or can be decomposed into
smaller pieces of information.
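The atomicity requirement can be illustrated with a sketch (entity and column names are invented for the example): a record with a multi-valued attribute violates 1NF, and the classical remedy is to move that attribute into a relation of its own.

```python
import sqlite3

# Non-1NF record: the 'phones' attribute is a list, i.e. not atomic.
person = {"person_id": 1, "name": "Ann", "phones": ["555-1234", "555-9876"]}

conn = sqlite3.connect(":memory:")
# 1NF design: the multi-valued attribute becomes its own relation,
# linked back to the owner by its key.
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE phone (person_id INTEGER, number TEXT,
                                    PRIMARY KEY (person_id, number))""")
conn.execute("INSERT INTO person VALUES (?, ?)",
             (person["person_id"], person["name"]))
conn.executemany("INSERT INTO phone VALUES (?, ?)",
                 [(person["person_id"], p) for p in person["phones"]])

count = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
print(count)  # 2
```

An object with nested structure poses exactly this problem: it cannot be stored as a single atomic attribute value without decomposition of this kind.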
Finally, another criticism of the relational model was its lack of semantics.
While contemporary conceptual models (e.g. Extended Entity-Relationship, or
later UML) allowed for many kinds of relationships (e.g. Associations,
Generalisation/Specialisation, Aggregation, etc.), the relational model only
provided meaningless foreign keys pointing to primary keys.
The emergence of the Internet generated another problem in the late 90s:
data from external sources also became available to the company. Until then,
all data was generated inside and under the control of the IT department of
the company. Therefore, the applications and systems knew exactly what to
expect. It was always possible to define the schema of data.
Nevertheless, when the source of data is not under our control, we need to
be prepared for the unexpected. It was clear that some tuples could have
attributes that others would not. Moreover, it was quite common to find
special attributes only in some tuples, attributes that were no longer
provided without warning or, the other way round, new attributes that were
suddenly added to some tuples.
In this context, the rigid schema of a relation does not seem appropriate to
store data. Something more flexible is needed.
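Semi-structured formats such as XML accommodate exactly this kind of irregularity. In the sketch below (the document and its element names are invented for illustration), the second element carries a field the first one lacks, something a fixed relational schema could not anticipate.

```python
import xml.etree.ElementTree as ET

# Semi-structured data: the second "product" carries an element the
# first one lacks; no rigid schema had to anticipate it.
doc = """
<catalogue>
  <product><name>Chair</name><price>40</price></product>
  <product><name>Lamp</name><price>15</price><voltage>220</voltage></product>
</catalogue>
"""
root = ET.fromstring(doc)
for product in root:
    # Each "tuple" simply exposes whatever attributes it happens to have.
    fields = {child.tag: child.text for child in product}
    print(fields)
```

Each element is self-describing, so an application can be written to cope with attributes appearing or disappearing between instances.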
We must take into account that the persistent as well as the volatile storage
may be distributed (if so, different fragments of data can be located at
different sites). Also, many processors could be available (in which case,
the query optimiser has to take into account the possibility of parallelising
the execution). Obviously, this would affect certain parts of this
architecture, mainly depending on the heterogeneities we find in the
characteristics of the storage systems we use and the semantics of the data
stored.

Parallel and distributed computing dramatically influences the different
components of this architecture.

See also

We will not go into detail in this introductory module, but we will see this
in different modules devoted to each component.
2.1. Query Manager

The query manager is the component of the DBMS in charge of transforming a
declarative query into an ordered set of steps (i.e. a procedural
description). This transformation is even more difficult taking into account
that, following the ANSI/SPARC architecture, DBMSs must provide views to deal
with semantic relativism(1). It is also relevant to note that security and
constraints have to be taken into account.

(1) This comes from different groups of people giving different meanings to
the same words.

ANSI/SPARC

The Standards Planning and Requirements Committee of the American National
Standards Institute defined a three-level architecture in 1975 to abstract
users from physical storage, which is still used in today's DBMSs.

Figure 1. Functional architecture of a DBMS
From the point of view of the end user, there is no difference in querying data
in a view or in a table. However, dealing with views is not easy and we will not
go into detail about it in this course. Nevertheless, we will take a brief look at
the difficulties it poses in this section.
First of all, we must take into account that data in the view may be
physically stored (i.e. materialised) or not, which poses new difficulties.
If data in the view is not physically stored, in order to transform the query
over the views into a query over the source tables, we must substitute the
view name in the user query by its definition (this is known as view
expansion). In some cases, it is more efficient to instruct the DBMS to
calculate the view result and store it while waiting for the queries.
However, if we do this, we have to be able to transform an arbitrary query
over the tables into a query over the available materialised views (this is
known as query rewriting), which is somewhat contrary to view expansion (in
the sense that we have to identify the view definition in the user query and
substitute it with the view name). If we are able
Finally, updating data in the presence of views is also more difficult.
Firstly, we would like to allow users to express not only queries but also
updates in terms of views (this is known as update through views), which is
only possible in a few cases. Secondly, if views are materialised, changes in
the sources are potentially propagated to the views (this is known as view
updating).
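View expansion can be sketched concretely (table, view and column names are hypothetical): querying a non-materialised view is equivalent to replacing the view name in the user query by the view's definition and running the result over the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Ann", "Sales", 30000), ("Bob", "IT", 45000)])

# A virtual (non-materialised) view over the source table.
conn.execute("""CREATE VIEW it_staff AS
                SELECT name, salary FROM employee WHERE dept = 'IT'""")

# Querying the view: the DBMS expands 'it_staff' into its definition.
via_view = conn.execute("SELECT name FROM it_staff").fetchall()

# The expanded form the DBMS effectively evaluates.
expanded = conn.execute(
    "SELECT name FROM (SELECT name, salary FROM employee "
    "WHERE dept = 'IT')").fetchall()
print(via_view == expanded)  # True
```

Query rewriting is the reverse direction: recognising the view's definition inside an arbitrary query over the base tables and replacing it with a reference to an available materialised view.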
Ethics as well as legal issues raise the need to control access to data. We
cannot allow any user to query or modify all the data in our database. This
component defines user privileges and, once it has done this, validates user
statements by checking whether or not they are allowed to perform the action
in question.

See also

We will go into the implementation details of this component in the
"Security" module.
Another important aspect is guaranteeing the integrity of the data. The
database designer defines constraints over the schema that must be
subsequently enforced so that user modifications of data do not violate them.
Although it is not evident, its theoretical background is closely related to
that of view management.

Note

We will not elaborate on the problems related to constraint checking
implementation during this course.
Clearly, the most important and complex part of the query manager is the
optimiser, because it seeks to find the best way to execute user statements.
Thus, its behaviour will have a direct impact on the performance of the
system. Remember that it has three components, namely the semantic, syntactic
and physical optimisers.

See also

In the "Distributed queries optimisation" module, we will see how parallelism
and distribution of data affect this component.
The query optimiser breaks down the query into a set of atomic operations
(mostly those of relational algebra). It is the task of this component to
coordinate the step-by-step execution of these elements. In distributed
environments, this component is also responsible for assigning the execution
of each operation to a given site.
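As a rough illustration (not how a real optimiser is implemented), the decomposition into atomic operations can be sketched with two relational-algebra primitives, selection and projection, applied as an ordered set of steps over an in-memory relation:

```python
# Sketch of a query broken into atomic relational-algebra operations:
# each step consumes and produces a set of tuples (here, lists of dicts).

def select(relation, predicate):          # sigma: filter tuples
    return [t for t in relation if predicate(t)]

def project(relation, attributes):        # pi: keep only chosen attributes
    return [{a: t[a] for a in attributes} for t in relation]

employee = [
    {"name": "Ann", "dept": "Sales", "salary": 30000},
    {"name": "Bob", "dept": "IT",    "salary": 45000},
]

# Equivalent of: SELECT name FROM employee WHERE salary > 40000,
# executed as an ordered pipeline of steps.
step1 = select(employee, lambda t: t["salary"] > 40000)
result = project(step1, ["name"])
print(result)  # [{'name': 'Bob'}]
```

In a distributed setting, each such step could be shipped to the site holding the relevant fragment of the relation.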
2.3. Scheduler

As we know, many users (up to tens or hundreds of thousands) can work
concurrently on a database. In this case, it is quite likely that they will
want to access not only the same table, but exactly the same column and row.
If so, they could interfere with each other's task. The DBMS must provide
certain mechanisms to deal with this problem. Generally, the way this is done
is by restricting the execution order of the reads, writes, commits and
aborts of the different users. On receiving a command, the scheduler can pass
it directly to the data manager, queue it and wait for the appropriate time
to execute it, or cancel it permanently (which aborts its transaction). The
most basic and commonly used mechanism to avoid interferences is
Shared-eXclusive locking.

See also

In the "Transaction models and Concurrency control" module we will explain
more advanced centralised and distributed mechanisms (namely Multi-granule
locking, Multiversion, and Timestamping).
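The compatibility rule behind Shared-eXclusive locking can be sketched as follows (a toy model only: it ignores fairness, deadlock handling and lock granularities, all of which a real scheduler must address):

```python
import threading

class SharedExclusiveLock:
    """Toy sketch of Shared-eXclusive locking: any number of readers may
    hold the lock in Shared mode, but a writer needs eXclusive access."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:                 # readers only wait for writers
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()               # wait until completely alone
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.acquire_shared()
lock.acquire_shared()      # two concurrent readers are compatible
lock.release_shared()
lock.release_shared()
lock.acquire_exclusive()   # a writer, once alone, gets exclusive access
lock.release_exclusive()
print("ok")
```

Reads take the lock in Shared mode and writes in eXclusive mode, so conflicting operations on the same row are forced into a serial order.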
As we know, memory storage is much faster than disk storage (up to hundreds
of thousands of times faster). However, it is volatile and much more
expensive. Being expensive means that its size is limited and it can only
contain a small part of the database. Moreover, being volatile means that
switching off the server or simply rebooting it would cause us to lose our
data. It is the task of the data manager to take advantage of both kinds of
storage while smoothing out their weaknesses.

See also

We will explain this in detail in the "Data Management" module.

The data manager has a component in charge of moving data from disk to
memory (i.e. fetch) and from memory to disk (i.e. flush) to meet the
requests it receives from other components.
The simplest way to manage this is known as write through. Unfortunately,
this is quite inefficient, because the disk (which is a slow component)
becomes a bottleneck for the whole system. It would be much more efficient to
leave a chunk of data in memory waiting for several modifications and then to
make all of them persistent at the same time with a single disk write. Thus,
we will have a buffer pool to keep the data temporarily in the memory when
expecting lots of modifications in a short period of time.

Write through

This means that any modification of data is immediately sent to disk, which
guarantees that nothing is lost in the event of a power failure.
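The saving can be sketched with a deliberately simplified buffer-pool model (the class and its structure are invented for illustration): many modifications to the same block accumulate in memory and reach disk as a single write.

```python
# Toy sketch of a write-back buffer pool: modifications accumulate in
# in-memory copies of blocks and reach disk as one write per dirty block,
# instead of one disk write per modification (write through).

class BufferPool:
    def __init__(self):
        self.pages = {}        # block_id -> in-memory copy of the block
        self.dirty = set()     # blocks modified since their last flush
        self.disk_writes = 0   # counter standing in for actual disk I/O

    def write(self, block_id, offset, value):
        page = self.pages.setdefault(block_id, {})
        page[offset] = value           # modify in memory only
        self.dirty.add(block_id)

    def flush(self):
        for block_id in self.dirty:    # one disk write per dirty block,
            self.disk_writes += 1      # however many changes it accumulated
        self.dirty.clear()

pool = BufferPool()
for i in range(10):
    pool.write(block_id=7, offset=i, value=i * i)  # 10 updates, same block
pool.flush()
print(pool.disk_writes)  # 1
```

Under write through, the same workload would have cost ten disk writes; here the locality of the modifications reduces it to one.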
Waiting for lots of modifications before writing data to disk is worth it,
because the transfer unit between memory and disk is a block (not a single
byte at a time), and it is likely that several modifications of the data will
be contained in the same block over a short period of time.

Block size

The default block size in Oracle 10g is 8 KB.
Waiting for many modifications before writing data to disk may result in
losing certain user requests in the event of power failure. Imagine the
following situation: a user executes a statement modifying tuple t1, the
system confirms the execution, but does not write the data to disk as it is
waiting for other modifications of tuples in the same block; unfortunately,
there is a power failure before the next modifications arrive. In the event
of system failure, all the system would have is what was stored on the disk.
If the DBMS did not implement some sort of safety mechanism, the modification
of t1 would be lost. It is the task of this component to avoid such a loss.
Moreover, this component also has the task of undoing all changes made during
a rolled-back transaction. As with other components, doing this in a
distributed environment is much more difficult than in a centralised one.
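The undo half of this task can be sketched with before-images (a toy model, not a real recovery algorithm: the class, the log format and the in-memory "database" are all invented for illustration): before each change, the old value is logged, so a rolled-back transaction can be undone by replaying the log backwards.

```python
# Toy sketch of rollback via before-images: before changing a tuple, the
# old value is logged so a rolled-back transaction can be undone.

class MiniRecoveryManager:
    def __init__(self):
        self.data = {}      # tuple_id -> value (stands in for the database)
        self.undo_log = []  # (tuple_id, before_image) entries

    def update(self, tuple_id, new_value):
        # Log the before-image first, then apply the change.
        self.undo_log.append((tuple_id, self.data.get(tuple_id)))
        self.data[tuple_id] = new_value

    def rollback(self):
        # Undo in reverse order, restoring each before-image.
        for tuple_id, before in reversed(self.undo_log):
            if before is None:
                self.data.pop(tuple_id, None)   # tuple did not exist before
            else:
                self.data[tuple_id] = before
        self.undo_log.clear()

rm = MiniRecoveryManager()
rm.data["t1"] = "old"
rm.update("t1", "new")
rm.rollback()
print(rm.data["t1"])  # old
```

Surviving power failures additionally requires the log itself to be forced to stable storage before the confirmation is sent to the user, which this sketch does not model.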
Note that the interface to the buffer pool manager is always through the
recovery manager (i.e. no component other than the recovery manager can gain
direct access to it).
We must not only foresee system failures, but also media failures. For example,
in the event of a disk failure, it will also be the responsibility of this component
to provide the means to recover all data in it. Remember that the "Durability"
property of centralised ACID transactions states that once a modification is
committed it cannot be lost under any circumstances.
Summary
Conceptual�map
The following conceptual map illustrates the contents of this subject. Note
that it reflects the structure of the contents as opposed to that of the modules.
Self-evaluation
1. What is the theoretical limitation to storing objects in a relational database?
Answer key
Self-evaluation
1.�1NF.
3. The fact that the transfer unit is blocks as opposed to bytes and the
locality of the modifications (i.e. several modifications being located in
the same block within a short period of time).
Glossary
DBMS Database Management System
O-O Object-Oriented
Bibliography
Garcia-Molina, H.; Ullman, J. D.; Widom, J. (2009). Database systems (second
edition). Pearson/Prentice Hall.
Bernstein, P. A.; Hadzilacos, V.; Goodman, N. (1987). Concurrency control and recovery
in database systems. Addison-Wesley.