Chapter 1 Managing Data Storage
Drill Down
Gartner¹ estimated the total DBMS market value at $35.9 billion for
2015, an 8.7% increase over 2014. According to IDC, the overall market
for database management solutions is estimated to reach over
$50 billion by 2018.
Retention Questions
database
DBMS
database system
1.3 File versus Database Approach to Data Management
Before we further explore database technology, let’s step back and see how data
management has evolved. This will give us a proper understanding of the legacy
problems many companies are still facing.
1.3.1 The File-Based Approach
In the early days of computing, every application stored its data in its own
dedicated files. This is known as a file-based approach and is illustrated in
Figure 1.1.
The applications now directly interface with the DBMS instead of with
their own files. The DBMS delivers the desired data at the request of each
application. The DBMS stores and manages two types of data: raw data and
metadata. Metadata refers to the data definitions that are now stored in the
catalog of the DBMS. This is a key difference from the file-based approach. The
metadata are no longer included in the applications, but are now properly
managed by the DBMS itself. From an efficiency, consistency, and maintenance
perspective, this approach is superior.
Another key advantage of the database approach is the facilities provided
for data querying and retrieval. In the file-based approach, every application had
to explicitly write its own query and access procedures. Consider the following
example in pseudo-code:
Procedure FindCustomer;
Begin
  open file Customer.txt;
  Read(Customer);
  While not EOF(Customer)
    If Customer.name='Bart' Then
      display(Customer);
    EndIf
    Read(Customer);
  EndWhile;
End;
Here, we first open a Customer.txt file and read the first record. We then
implement a while loop that iterates through each record in the file until the end
of the file is reached (indicated by EOF(Customer)). If the desired information is
found (Customer.name='Bart'), it will be displayed. This requires a lot of coding.
Because of the tight coupling between data and applications, many procedures
would be repeated in various applications, which is again not very appealing
from a maintenance perspective. As noted, DBMSs provide database languages
that facilitate both data querying and access. A well-known language, which we
discuss extensively in Chapter 7, is Structured Query Language (SQL). SQL can
be used to formulate database queries in a structured and user-friendly way, and
is one of the most popular data querying standards used in the industry. An
example SQL query that gives the same output as our pseudo-code above could
be:
SELECT *
FROM Customer
WHERE name = 'Bart'
Here, you only need to specify what information you want. In our case, we
want all customer information for customer 'Bart'. This SQL query will then be
executed by the DBMS in a transparent way. In the database approach, we only
need to specify which data we are interested in, and no longer how we should
access and retrieve them. This facilitates the development of database
applications because we no longer need to write complex data retrieval
procedures.
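As a rough illustration of this contrast, the following Python sketch performs the same lookup twice: once as a hand-written scan over file records and once as a declarative SQL query. The comma-separated record layout, the sample rows, the city column, and the use of Python's built-in sqlite3 module are assumptions made for this example only; the text above does not prescribe any of them.

```python
import csv
import io
import sqlite3

# Hypothetical sample records; the text does not specify the file layout.
CUSTOMER_FILE = "1,Bart,Leuven\n2,Ann,Ghent\n3,Carl,Leuven\n"


def find_customer_in_file(file_obj, name):
    """File-based approach: the application scans every record itself."""
    matches = []
    for record in csv.reader(file_obj):
        if record[1] == name:
            matches.append(tuple(record))
    return matches


def build_demo_db():
    """Load the same records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (number TEXT, name TEXT, city TEXT)")
    rows = list(csv.reader(io.StringIO(CUSTOMER_FILE)))
    conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", rows)
    return conn


def find_customer_in_db(conn, name):
    """Database approach: state what is wanted; the DBMS decides how."""
    cursor = conn.execute("SELECT * FROM Customer WHERE name = ?", (name,))
    return cursor.fetchall()


file_result = find_customer_in_file(io.StringIO(CUSTOMER_FILE), "Bart")
db_result = find_customer_in_db(build_demo_db(), "Bart")
print(file_result)
print(db_result)
```

Both calls return the same single record for 'Bart', but only the first one contains an explicit loop over the data. Every application that needs this lookup in the file-based approach would have to repeat that loop.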
To summarize, the file-based approach results in a strong application–data
dependence, whereas the database approach allows for applications to be
independent from the data and data definitions.
Retention Questions
Contrast the file versus database approach to data management.
1.4 Elements of a Database System
In this section we discuss database model versus instances, data models, the
three-layer architecture, the role of the catalog, the various types of database
users, and DBMS languages.
1.4.1 Database Model versus Instances
In any database implementation, it is important to distinguish between the
description of the data, or data definitions, and the actual data. The database
model or database schema provides the description of the database data at
different levels of detail and specifies the various data items, their
characteristics and relationships, constraints, storage details, etc.² The database
model is specified during database design and is not expected to change
frequently. It is stored in the catalog, which is the heart of the DBMS. The
database state then represents the data in the database at a particular moment. It
is sometimes also called the current set of instances. Depending upon data
manipulations, such as adding, updating, or removing data, it typically changes
on an ongoing basis.
The following are examples of data definitions that are an essential part of
the database model stored in the catalog.
Database model
Student (number, name, address, email)
Course (number, name)
Building (number, address)
We have three data items: Student, Course, and Building. Each of these data
items can be described in terms of its characteristics. A student is characterized
by a number, name, address, and email; a course by a number and name; and a
building by a number and address.
Figure 1.3 shows an example of a corresponding database state. You can see
the database includes data about three students, three courses, and three
buildings.
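The distinction between database model and database state can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite's sqlite_master table stands in for the catalog, the column types are guesses, and the two student rows are hypothetical rather than the contents of Figure 1.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Database model: the data definitions, which the DBMS records in its catalog.
conn.executescript("""
    CREATE TABLE Student  (number INTEGER PRIMARY KEY, name TEXT,
                           address TEXT, email TEXT);
    CREATE TABLE Course   (number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Building (number INTEGER PRIMARY KEY, address TEXT);
""")


def catalog_tables(connection):
    """Read the table names from SQLite's catalog (sqlite_master)."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def student_count(connection):
    """Count the Student instances in the current database state."""
    return connection.execute("SELECT COUNT(*) FROM Student").fetchone()[0]


model_before = catalog_tables(conn)
state_before = student_count(conn)

# Database state: hypothetical rows, which change with every data manipulation.
conn.execute(
    "INSERT INTO Student VALUES (1, 'Ann', '1 Example Street', 'ann@example.com')"
)
conn.execute(
    "INSERT INTO Student VALUES (2, 'Bob', '2 Example Street', 'bob@example.com')"
)

model_after = catalog_tables(conn)
state_after = student_count(conn)

print(model_before, state_before)
print(model_after, state_after)
```

Adding students changes the state (the count goes from 0 to 2), while the list of tables read from the catalog stays the same.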
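Time   T1                      T2                       Balance
t2                             read(balance)            $100
t3     read(balance)           balance = balance + 120  $100
t4     balance = balance - 50  write(balance)           $220
t5     write(balance)                                   $50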
Table 1.1 shows two database transactions, T1 and T2, that operate on the same
account. T1 withdraws $50 and T2 deposits $120. The starting balance is $100.
If both transactions ran sequentially instead of in parallel, the ending
balance would be $100 − $50 + $120 = $170. If the DBMS interleaves the
actions of both transactions, we get the following. T2 reads the balance at t2 and
finds it is $100. T1 reads the balance at t3 and also finds it is $100. Also at t3,
T2 calculates the new balance as $100 + $120 = $220, but it still needs to write
(or save) this value. At t4, T1 calculates the balance as $100 − $50 = $50,
whereas T2 saves its balance, which now becomes $220. T1 then saves the balance
as $50 at t5. It overwrites the value of $220 with $50, after which both
transactions end. Since T1 computes its balance from the value it read before
the update by T2, and writes that balance after T2 has finished, the update
of T2 is lost. It is as if transaction T2 did not take place. This is commonly
called a lost-update problem. The DBMS should avoid the inconsistencies that
emanate from the interference between simultaneous transactions.
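The interleaving described above can be replayed with a small Python simulation. This is a minimal sketch, not how a DBMS schedules transactions: the run_schedule function and the step names "read", "compute", and "write" are hypothetical, and each transaction's private working copy is modeled as a dictionary entry.

```python
START_BALANCE = 100


def run_schedule(schedule, balance=START_BALANCE):
    """Replay (transaction, action, amount) steps against one shared balance."""
    working = {}  # private working copy of the balance, per transaction
    for tx, action, amount in schedule:
        if action == "read":
            working[tx] = balance        # copy the stored balance
        elif action == "compute":
            working[tx] += amount        # change only the private copy
        elif action == "write":
            balance = working[tx]        # overwrite the stored balance
    return balance


# Interleaved execution, in the order described for t2 through t5.
interleaved = [
    ("T2", "read", None),
    ("T1", "read", None),
    ("T2", "compute", 120),
    ("T1", "compute", -50),
    ("T2", "write", None),
    ("T1", "write", None),
]

# Serial execution: T1 runs to completion before T2 starts.
serial = [
    ("T1", "read", None),
    ("T1", "compute", -50),
    ("T1", "write", None),
    ("T2", "read", None),
    ("T2", "compute", 120),
    ("T2", "write", None),
]

interleaved_balance = run_schedule(interleaved)
serial_balance = run_schedule(serial)
print(interleaved_balance)  # 50: the deposit by T2 is lost
print(serial_balance)       # 170
```

The interleaved schedule ends at $50 because T1 writes a value computed from a balance it read before T2's write. The serial schedule ends at the expected $170.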
To ensure database transactions are processed in a reliable way, the DBMS
must support the ACID (Atomicity, Consistency, Isolation, Durability)
properties. Atomicity, or the all-or-nothing property, requires that a transaction
should either be executed in its entirety or not at all. Consistency assures that a
transaction brings the database from one consistent state to another. Isolation
ensures that the effect of concurrent transactions is the same as if they
had been executed in isolation. Finally, durability ensures that the database
changes made by a transaction declared successful are made permanent under
all circumstances.
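Atomicity and consistency can be illustrated with Python's built-in sqlite3 module. The sketch below is an assumption-laden example rather than a description of any particular DBMS: the Account table, the CHECK constraint that forbids negative balances, and the transfer function are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Account VALUES (1, 100), (2, 0)")
conn.commit()


def balances(connection):
    """Return the balances of all accounts, ordered by account id."""
    rows = connection.execute("SELECT balance FROM Account ORDER BY id")
    return [balance for (balance,) in rows]


def transfer(connection, source, target, amount):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE Account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE Account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The credit succeeds but the debit violates the CHECK constraint,
# so the whole transaction is rolled back.
failed = transfer(conn, 1, 2, 500)
after_failed = balances(conn)

succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)

print(failed, after_failed)
print(succeeded, after_success)
```

The failed transfer leaves both balances untouched even though its first update had already been executed, which is the all-or-nothing behavior that atomicity requires. The constraint keeps the database in a consistent state.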
1.5.7 Backup and Recovery Facilities
A key advantage of using databases is the availability of backup and recovery
facilities. These facilities can be used to deal with the effect of loss of data due to
hardware or network errors, or bugs in system or application software. Typically,
backup facilities can perform either a full or an incremental backup. In the
latter case, only the updates since the previous backup are saved. Recovery
facilities allow restoration of data to a previous state after loss or damage.
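A full backup followed by a recovery can be sketched with the Connection.backup method of Python's built-in sqlite3 module. The Customer table, its single row, and the simulated data loss are assumptions made for this example. Incremental backups are not shown here.

```python
import sqlite3


def customer_names(connection):
    """Return all customer names, ordered by customer number."""
    rows = connection.execute("SELECT name FROM Customer ORDER BY number")
    return [name for (name,) in rows]


# A hypothetical live database with one customer.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Customer (number INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO Customer VALUES (1, 'Bart')")
live.commit()

# Full backup: copy the entire live database into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulated data loss, for example caused by a bug in application software.
live.execute("DELETE FROM Customer")
live.commit()
names_after_loss = customer_names(live)

# Recovery: restore the live database to the state captured by the backup.
backup.backup(live)
names_after_recovery = customer_names(live)

print(names_after_loss)
print(names_after_recovery)
```

In practice the backup would be written to separate storage rather than kept in memory, so that it survives a hardware failure of the machine that hosts the live database.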