Advanced Database Concepts
N. KALLON
CS 301 ADVANCED DATABASE CONCEPTS LECTURE NOTES
The relational model represents the database as a collection of relations. Informally, each relation
resembles a table of values or, to some extent, a “flat” file of records. For example, the university
database of files that was shown earlier is considered to be in the relational model. However,
there are important differences between relations and files, as we shall soon see.
When a relation is thought of as a table of values, each row in the table represents a collection of
related data values. In the relational model, each row in the table represents a fact that typically
corresponds to a real-world entity or relationship. The table name and column names are used to
help in interpreting the meaning of the values in each row. For example, in a STUDENT table, each row represents facts about a particular student entity. The column names - Name, Student Number, Class, Major - specify how to interpret the
data values in each row, based on the column each value is in. All values in a column are of the
same data type.
In the formal relational model terminology, a row is called a tuple, a column header is called an
attribute, and the table is called a relation. The data type describing the types of values that can
appear in each column is called a domain. We now define these terms – domain, tuple, attribute,
and relation – more precisely.
Domains, Attributes, Tuples, & Relations
A domain D is a set of atomic values. By atomic we mean that each value in the domain is
indivisible as far as the relational model is concerned. A common method of specifying a domain
is to specify a data type from which the data values forming the domain are drawn. It is also
useful to specify a name for the domain, to help in interpreting its values. Some examples of
domains follow:
GSM phone-numbers: The set of 9-digit numbers valid in Sierra Leone. E.g. 078-504-707.
Names: The set of names of persons.
Grade-point-averages: Possible values of computed grade point averages; each must be a real
(floating point) number between 0 and 5
Employee-ages: Possible ages of employees of a company, each must be a value between 18 and
65 years old.
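To make these definitions concrete, the following minimal Python sketch (the attribute values and helper names are illustrative assumptions, not part of these notes) treats a domain as a check over atomic values, a tuple as an assignment of one value per attribute, and a relation as a collection of such tuples.

# Domains as checks over atomic values (illustrative only).
def in_gpa_domain(value):
    """Grade-point-averages: real numbers between 0 and 5."""
    return isinstance(value, (int, float)) and 0 <= value <= 5

def in_employee_age_domain(value):
    """Employee-ages: whole numbers between 18 and 65."""
    return isinstance(value, int) and 18 <= value <= 65

# A tuple assigns one value, drawn from the appropriate domain, to each attribute;
# a relation is a collection of such tuples sharing the same attributes.
student_relation = [
    {"Name": "Benjamin Bayer", "StudentNumber": 17, "Class": 1, "GPA": 3.21},
    {"Name": "Katherine Ashly", "StudentNumber": 8, "Class": 2, "GPA": 2.89},
]
assert all(in_gpa_domain(row["GPA"]) for row in student_relation)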
RELATIONAL DATABASE DESIGN
Here, we consider design issues regarding relational databases. In general, the goal of a relational
database design is to generate a set of relation schemes that allow us to store information without
unnecessary redundancy, yet allow us to retrieve information easily. One approach is to
design schemes that are in an appropriate normal form. In order to determine whether a relation
scheme is in one of the normal forms, we shall need additional information about the “real–
world” enterprise that we are modeling with the database. This additional information is given by
a collection of constraints called data dependencies.
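The most common form of data dependency is the functional dependency, in terms of which the normal forms are defined. As a minimal sketch (the function and the sample relation below are hypothetical, not taken from these notes), a dependency X -> Y holds in a relation if any two tuples that agree on X also agree on Y:

# Check whether the functional dependency lhs -> rhs holds in a list of rows.
def satisfies_fd(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)   # values of the left-hand-side attributes
        val = tuple(row[a] for a in rhs)   # values of the right-hand-side attributes
        if key in seen and seen[key] != val:
            return False                   # two rows agree on lhs but differ on rhs
        seen[key] = val
    return True

students = [
    {"StudentNumber": 17, "Name": "Smith", "Major": "CS"},
    {"StudentNumber": 8,  "Name": "Brown", "Major": "CS"},
    {"StudentNumber": 17, "Name": "Smith", "Major": "CS"},
]
print(satisfies_fd(students, ["StudentNumber"], ["Name"]))  # True: StudentNumber -> Name holds here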
The Entity – Relationship Model
Designing a successful database application requires a good conceptual model. Generally the
term database application refers to a particular database, for example an XYZ bank database that
keeps track of customer accounts, and the associated programs that implement database updates
corresponding to customers making deposits and withdrawals. These programs often provide
user–friendly graphical user interfaces (GUI) utilizing forms and menus. Hence, part of the
database application will require the design, implementation and testing of these application
programs.
possible. In parallel with specifying the data requirements, it is useful to specify the known
functional requirements of the application. These consist of the user–defined operations (or
transactions) that will be applied to the database and they include both retrievals and updates. In
software design, it is common to use data flow diagrams, sequence diagrams, scenarios, and
other techniques for specifying functional requirements.
Conceptual Schema
Once all the requirements have been collected and analyzed, the next step is to create a
conceptual schema for the database, using a high-level conceptual data model. This step is
called conceptual design. The conceptual schema is a concise description of the data requirements
of the users and includes detailed descriptions of the entity types, relationships, and constraints;
these are expressed using the concepts provided by the high-level data model. Because these
concepts do not include implementation details, they are usually easier to understand and can be
used to communicate with non-technical users. The high-level conceptual schema can also be
used as a reference to ensure that all users’ data requirements are met and that the requirements
do not include conflicts. This approach enables the database designers to concentrate on
specifying the properties of the data, without being concerned with storage details.
Consequently, it is easier for them to come up with a good conceptual database design.
Logical Design
The next step in database design is the actual implementation of the database, using a
commercial DBMS. Most current commercial DBMSs use an implementation data model, such
as the relational or the object database model, so the conceptual schema is transformed from the high-level data model into the implementation data model. This step is called logical design or
data model mapping, and its result is a database schema in the implementation data model of
the DBMS.
Physical Design
Finally, the last step is the physical design phase, during which the internal storage structures, access paths, and file organizations for the database files are specified. In parallel with these activities, application programs are designed and implemented as database transactions, corresponding to the high-level transaction specifications.
An Illustration of a Company Database Application
In this unit, we describe an example database application, called COMPANY, which serves to
illustrate the ER model concepts and their use in schema design. The COMPANY database
keeps track of a company’s employees, departments, and projects. Suppose that, after the
requirements collection and analysis phase, the database designers stated the following
description of the “mini world” – the part of the company to be represented in the database:
The company is organized into departments. Each department has a unique name, a unique
number, and a particular employee who manages the department. We keep track of the start date
when that employee began managing the department. A department may have several locations.
A department controls a number of projects, each of which has a unique name, a unique number,
and a single location.
We store each employee’s name, social security number, address, salary, sex, and birth date. An
employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. We keep track of the number of hours per week that an employee works on each project. We also keep track of the direct supervisor of each employee. For insurance purposes, we keep track of each employee’s dependents; for each dependent we keep the first name, sex, birth date, and relationship to the employee.
Entity Types, Entity Sets, Attributes and Keys
The ER model describes data as entities, relationships, and attributes.
each individual entity. For example, the color attribute of car may have between one and three
values, if we assume that a car can have at most three colors.
Figure 1.4: A hierarchy of composite attributes; the Street Address component of an Address is further composed of Number, Street, and Apartment Number.
Stored versus Derived Attributes: In some cases, two (or more) attribute values are related, for
example, the Age and Birth Date attributes of a person. For a particular person entity, the value
of Age can be determined from the current (today’s) date and the value of that person’s Birth
Date. The Age attribute is hence called a derived attribute and is said to be derivable from the
Birth Date attribute, which is called a stored attribute. Some attribute values can be derived
from related entities, for example, an attribute Number of Employees of a department entity can
be derived by counting the number of employees related to (working for) that department.
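As a minimal illustration (the function name is hypothetical, not part of these notes), deriving an Age attribute from a stored Birth Date attribute might look like this:

from datetime import date

def derive_age(birth_date, today=None):
    """Derive the Age attribute from the stored Birth Date attribute."""
    today = today or date.today()
    # Subtract one year if this year's birthday has not yet occurred.
    before_birthday = (today.month, today.day) < (birth_date.month, birth_date.day)
    return today.year - birth_date.year - int(before_birthday)

print(derive_age(date(1990, 6, 15), today=date(2024, 3, 1)))  # 33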
Null Values: In some cases a particular entity may not have an applicable value for an attribute. For example, the Apartment Number attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family houses. For such situations, a special value called null is created. An address of a single-family home would have null for its Apartment Number attribute, and a person with no college degree would have null for a College Degrees attribute.
Entity Types, Entity Sets, Keys & Value Sets
Entity Types and Entity Sets: A database usually contains groups of entities that are similar.
For example, a company employing hundreds of employees may want to store similar
information concerning each of the employees. These employee entities share the same
attributes, but each entity has its own values for each attribute. An entity type defines a collection
(or set) of entities that have the same attributes. The collection of all entities of a particular entity type in the database at any point in time is called an entity set; the entity set is usually
referred to using the same name as the entity type. For example, EMPLOYEE refers to both a
type of entity as well as the current set of all employee entities in the database.
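A minimal sketch of the distinction (the class and the values below are illustrative assumptions): the entity type corresponds to the definition shared by all employee entities, while the entity set is the current collection of entities of that type.

from dataclasses import dataclass

@dataclass
class Employee:          # the entity type: the attributes every employee entity has
    name: str
    ssn: str
    salary: float

# The entity set: all entities of the EMPLOYEE type currently in the database
# (illustrative values only).
employee_set = [
    Employee("John Smith", "123-45-6789", 30000.0),
    Employee("Jane Doe", "987-65-4321", 40000.0),
]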
security constraints, and so on. Descriptor information is essential if the system is to do its job properly. For example, the optimizer uses catalogue information about indexes and other physical storage structures, as well as much other information, to help it decide how to implement user requests.
Transaction processing systems are systems with large databases and hundreds of concurrent users. Such a system provides an “all-or-nothing” proposition: each unit of work performed on the database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from the others, results must conform to the existing constraints in the database, and transactions that complete successfully must be written to durable storage.
Figure 2.1: A consistent database state is one in which all data integrity constraints are satisfied
Transaction Properties
The set of properties that guarantee that database transactions are processed reliably is given the
acronym ACID (Atomicity, Consistency, Isolation, and Durability).
• Atomicity - this refers to the ability of the DBMS to guarantee that either all of the operations of a transaction are performed or none of them are. For example, the transfer of funds from one account to another can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account will not be debited if the other is not credited. Atomicity states that database modifications must follow an “all-or-nothing” rule: each transaction is said to be atomic because if one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system, or hardware failure.
• Consistency – this ensures that the database remains consistent before the start of the
transaction and after the transaction is over (whether successful or not). Consistency states that
only valid data will be written to the database. If, for some reason, a transaction violates the database’s consistency rules, the entire transaction will be rolled back and the database will be
restored to a state consistent with those rules. On the other hand, if the transaction successfully
executes, it will take the database from one state that is consistent with the rules to another state
that is also consistent with the rules.
• Isolation – this refers to the requirement that other operations cannot access or see data in an intermediate state during a transaction. This constraint is required to maintain the performance as well as the consistency between transactions in a DBMS. Thus, each transaction is unaware of
another transaction executing concurrently in the system.
• Durability – this refers to the guarantee that once the user has been notified of success, the
transaction will persist and not be undone. This means it will survive system failure, and that the
database system has checked the integrity of the constraints and would not need to abort the
transaction. Many databases implement durability by writing all transactions into a transaction
log. Durability does not imply a permanent state of the database. Another transaction may
overwrite any changes made by the current transaction without hindering durability.
Transaction Management with SQL
Users of database systems usually consider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL, wrapped in a transaction, using a pattern similar to the one below:
1. Begin the transaction.
2. Execute several data manipulations and queries.
3. If no error occurs commit the transaction and end it.
4. If errors occur, then rollback the transaction and end it.
If no errors occur during the execution of the transaction then the system commits the
transaction. A transaction commit operation applies all data manipulation within the scope of the
transaction and persists the results to the database. If an error occurs during the transaction, or if
the user specifies a rollback operation, the data manipulations within the transactions are not
persisted into the database. In no case can a partial transaction be committed to the database
since that would leave the database in an inconsistent state.
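As a minimal sketch of this pattern (using Python’s sqlite3 module as a stand-in for a relational DBMS; the table and the transfer amount are illustrative assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 100.0)])
conn.commit()

try:
    # 1. Begin the transaction (sqlite3 opens one implicitly on the first data change).
    # 2. Execute several data manipulations: transfer 50.0 from account 1 to account 2.
    conn.execute("UPDATE account SET balance = balance - 50.0 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 50.0 WHERE id = 2")
    # 3. No error occurred: commit the transaction and end it.
    conn.commit()
except sqlite3.Error:
    # 4. An error occurred: roll back the transaction and end it.
    conn.rollback()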
In a relational database management system, a transaction starts with a START TRANSACTION statement in SQL, or with any other statement that will modify data. The result of any work done after this point will remain invisible to other database users until the system processes a COMMIT statement. A ROLLBACK statement can also occur, which will undo any work performed since the START TRANSACTION command. Both COMMIT and ROLLBACK end the transaction; another START TRANSACTION will need to be issued to start another one.
An example of a transaction is given below:
• Transaction support is provided through COMMIT and ROLLBACK.
• A user-initiated transaction sequence must continue until a COMMIT or a ROLLBACK statement is reached.
• A transaction begins when the first SQL statement is encountered, and ends at COMMIT or at the end of the program.
Figure 2.2: A query with the COMMIT statement – SEE PAGE 233
database and stored in stable storage. If, after a start, the database is found in an inconsistent state or was not shut down properly, the database management system reviews the database logs for
uncommitted transactions and rolls back the changes made by these transactions. Additionally,
all transactions already committed but whose changes were not yet materialized in the database
are re-applied. Both are done to ensure atomicity and durability of transactions. A transaction log
is made up of:
Log Sequence Number: A unique id for a log record. With LSNs, logs can be recovered in
constant time. Most logs’ LSNs are assigned in monotonically increasing order, which is useful
in recovery algorithms.
PrevLSN: A link to the previous log record. This implies that database logs are constructed in linked-list form.
Transaction ID number: A reference number to the database transaction generating the log
record.
Type: Describes the type of log record.
Information about the actual changes that triggered the log record to be written.
Example
undoNextLSN: This field contains the LSN of the next log record that is to be undone for the transaction that wrote the last update log record.
Commit Record: notes a decision to commit a transaction.
Abort Record: notes a decision to abort and hence roll back a transaction.
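A minimal sketch of what a log record might look like (the class and field names below are illustrative; they follow the description above rather than any particular DBMS):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                        # Log Sequence Number, assigned in increasing order
    prev_lsn: Optional[int]         # link to the previous log record (linked-list form)
    transaction_id: int             # transaction that generated the record
    record_type: str                # e.g. "update", "commit", "abort"
    changes: dict = field(default_factory=dict)    # information about the actual changes
    undo_next_lsn: Optional[int] = None             # next record to undo during rollback

# Illustrative log: an update followed by a commit for transaction 42.
log = [
    LogRecord(lsn=1, prev_lsn=None, transaction_id=42, record_type="update",
              changes={"table": "account", "before": 500.0, "after": 450.0}),
    LogRecord(lsn=2, prev_lsn=1, transaction_id=42, record_type="commit"),
]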
CONCURRENCY CONTROL
Concurrency control is the process of managing/controlling simultaneous operations on the
database. It is required because actions from different users or applications taking place on a database must not interfere with one another. It establishes the order of concurrent transactions. Interleaving
operations can lead to the database being in an inconsistent state. Three potential problems which
should be addressed by successful concurrency control are as follows:
• Lost Updates
• Uncommitted data
• Inconsistent Retrievals.
The Scheduler is a module that is responsible for implementing a particular strategy for
concurrency control. It:
• Establishes order for concurrent transaction execution.
• Interleaves execution of database operations to ensure serializability
• Bases actions on concurrency control algorithms
– Locking
– Time stamping
• Ensures efficient use of the computer’s CPU
Concurrency Control with Locking Methods
A transaction will use a lock to deny data access to other transactions and so prevent incorrect
updates. Locks can be read (shared) or write (exclusive) locks. A write lock on a data item prevents other transactions from reading or writing that data item, whereas a read lock simply stops other transactions from editing (writing to) the data item. Locks are used in the following ways:
1. Any transaction that needs to access a data item must first lock the item, requesting a shared
lock for read only access or exclusive lock for read and write access.
2. If the item is not locked by another transaction, the lock will be granted.
3. If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock. If a shared lock is requested on an item that already has a shared lock on it, the request will be granted. Otherwise, the transaction must wait until the existing lock is released.
4. A transaction continues to hold a lock until it is explicitly released, either during execution or when it terminates (aborts or commits). It is only when an exclusive lock has been released that the effects of the write operation will be made visible to other transactions.
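A minimal sketch of the granting rule in step 3 (a hypothetical lock manager, not a real DBMS implementation): shared locks are compatible with other shared locks, while any request that conflicts with an exclusive lock must wait.

class LockManager:
    def __init__(self):
        self.locks = {}  # data item -> list of (transaction_id, mode) currently held

    def request(self, txn_id, item, mode):
        """mode is 'S' for a shared (read) lock or 'X' for an exclusive (write) lock."""
        holders = self.locks.setdefault(item, [])
        # Grant if the item is unlocked, or if only shared locks are held
        # and another shared lock is requested.
        if not holders or (mode == "S" and all(m == "S" for _, m in holders)):
            holders.append((txn_id, mode))
            return True       # lock granted
        return False          # incompatible: the transaction must wait

    def release(self, txn_id, item):
        self.locks[item] = [(t, m) for t, m in self.locks.get(item, []) if t != txn_id]

lm = LockManager()
print(lm.request("T1", "row17", "S"))  # True: shared lock granted
print(lm.request("T2", "row17", "S"))  # True: shared locks are compatible
print(lm.request("T3", "row17", "X"))  # False: must wait until the shared locks are released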
Types of Locks
Deadlocks occur when two or more transactions are waiting for locks held by each other to be
released. The only way to break a deadlock is to abort one of the transactions so that a lock is
released so the other transaction can proceed. The DBMS can manage this process of aborting a
transaction when necessary. The aborted transaction is typically restarted so that it is able to
execute and commit without the user being aware of any problems occurring. A timeout could
also be used: a transaction that requests a lock waits for at most a specified period. Deadlocks can also be prevented when the DBMS looks ahead to determine whether a transaction would cause a deadlock, and never allows the deadlock to occur.
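One common way for the DBMS to detect this situation is to maintain a wait-for graph recording which transactions are waiting on which others; a cycle in the graph means a deadlock, and one transaction in the cycle is chosen as the victim to abort. The sketch below is illustrative (a hypothetical helper, not a specific DBMS algorithm).

def has_deadlock(waits_for):
    """waits_for maps each transaction to the transactions whose locks it is waiting on.
    A cycle in this graph indicates a deadlock."""
    visited, on_stack = set(), set()

    def dfs(txn):
        visited.add(txn)
        on_stack.add(txn)
        for other in waits_for.get(txn, []):
            if other in on_stack or (other not in visited and dfs(other)):
                return True
        on_stack.discard(txn)
        return False

    return any(txn not in visited and dfs(txn) for txn in waits_for)

# T1 waits for T2 and T2 waits for T1: a deadlock that requires aborting one of them.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True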
CONCURRENCY CONTROL WITH TIME STAMPING METHODS
Whenever a transaction starts, it is given a timestamp. This is so we can tell the order in which transactions are supposed to be applied. Given two transactions that affect the same object, the transaction with the earlier timestamp is meant to be applied before the other one.
However, if the wrong transaction is actually presented first, it is aborted and must be restarted.
Every object in the database has a read timestamp, which is updated whenever the object's data
is read, and a write timestamp, which is updated whenever the object's data is changed.
If a transaction wants to read an object, but the transaction started before the object's write timestamp, it means that something changed the object's data after the transaction started. In this
case, the transaction is cancelled and must be restarted. If a transaction wants to write to an object, but the transaction started before the object's read timestamp, it means that something has had a look at the object, and we assume it took a copy of the object's data. So we can't write
to the object as that would make any copied data invalid, so the transaction is aborted and must
be restarted.
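A minimal sketch of these two rules (hypothetical names; real schemes track more state): each object carries a read timestamp and a write timestamp, and a transaction that arrives too late is aborted and restarted.

class TransactionAborted(Exception):
    pass

class TimestampedObject:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0    # timestamp of the latest transaction that read the object
        self.write_ts = 0   # timestamp of the latest transaction that wrote the object

def read(obj, txn_ts):
    if txn_ts < obj.write_ts:
        # The object was changed after this transaction started: abort and restart.
        raise TransactionAborted("read too late")
    obj.read_ts = max(obj.read_ts, txn_ts)
    return obj.value

def write(obj, txn_ts, value):
    if txn_ts < obj.read_ts:
        # Something later already read (and may have copied) the object: abort and restart.
        raise TransactionAborted("write too late")
    # Full timestamp-ordering schemes usually also compare txn_ts with obj.write_ts;
    # that extra check is an assumption beyond the rules stated above.
    obj.value, obj.write_ts = value, txn_ts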
Timestamp Resolution
This is the minimum time elapsed between two adjacent timestamps. If the resolution of the timestamp is too large (coarse), the possibility of two or more timestamps being equal increases, thus enabling some transactions to commit out of the correct order. For example,
assuming that we have a system that can create one hundred unique timestamps per second, and
given two events that occur 2 milliseconds apart, they will probably be given the same timestamp
even though they actually occurred at different times.
Timestamp Locking
Even though this technique is a non-locking one, inasmuch as the object is not locked from concurrent access for the duration of a transaction, the act of recording each timestamp against the object requires an extremely short duration lock on the object or its proxy.
CONCURRENCY CONTROL WITH OPTIMISTIC METHODS
Optimistic Concurrency Control (OCC) is a concurrency control method that assumes that
multiple transactions can complete without affecting each other, and that therefore transactions
can proceed without locking the data resources that they affect. Before committing, each
transaction verifies that no other transaction has modified its data. If the check reveals
conflicting modifications, the committing transaction rolls back. However, if conflicts happen
often, the cost of repeatedly restarting transactions hurts performance significantly; other
concurrency control methods have better performance under these conditions.
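A minimal sketch of the validate-then-commit idea (a hypothetical class; real OCC implementations differ): each data item carries a version number, and a transaction commits only if everything it read is still at the version it observed.

class OptimisticStore:
    def __init__(self):
        self.data = {}       # key -> value
        self.versions = {}   # key -> version number, incremented on every write

    def read(self, key):
        """Return the current value together with the version observed."""
        return self.data.get(key), self.versions.get(key, 0)

    def commit(self, read_set, writes):
        """read_set: {key: version observed}; writes: {key: new value}.
        Validate that nothing read has since been modified, then apply the writes."""
        if any(self.versions.get(k, 0) != v for k, v in read_set.items()):
            return False   # conflict detected: the transaction must roll back and restart
        for key, value in writes.items():
            self.data[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        return True

store = OptimisticStore()
value, version = store.read("balance")
print(store.commit({"balance": version}, {"balance": 100}))  # True: nothing changed in between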
Optimistic Concurrency Control Phases
More specifically, OCC transactions involve these phases:
• Begin: Record a timestamp marking the transaction's beginning.