Abido Advanced Database Systems
1.1. Transaction
A transaction can be defined as a group of tasks. A single task is the minimum processing
unit which cannot be divided further. It is an action, or a series of actions, carried out by a
single user or an application program, which reads or updates the contents of a database.
A transaction is a collection of operations that performs a single logical function in a
database application. The transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g., power failures and
operating system crashes) and transaction failures.
Example of transaction
Transfer 50 birr from account A to account B. The operations of this single transaction are:
Read(A)
A = A - 50
Write(A)
Read(B)
B = B + 50
Write(B)
This transaction can be described according to the four properties as follows:
Atomicity - shouldn’t take money from A without giving it to B
Consistency - money isn’t lost or gained
Isolation - other queries shouldn’t see A or B change until completion
Durability - the money does not go back to A
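The transfer can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions: the table name account, its columns, the starting balances, and the helper names transfer and balances are hypothetical and do not come from the text. The sketch debits A first and then checks the balance, so that a failed check shows the rollback undoing the debit.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Read(A); A = A - amount; Write(A)
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # forces the rollback
            # Read(B); B = B + amount; Write(B)
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))

# Hypothetical starting state: A holds 100 birr, B holds 20 birr.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

ok = transfer(conn, "A", "B", 50)       # succeeds and commits
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)  # debit is undone by the rollback
after_failed = balances(conn)
```

After the failed transfer the balances are the same as before it, which is the atomicity property: A is never debited without B being credited.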
1.3.1. Concurrency Problems
Concurrency can result in the following problems:
Lost update problem
Uncommitted update
Incorrect analysis
Lost Update Problem:
It can be illustrated by the example below.
There are two transactions, T1 and T2. T1 is expected to subtract 5 from X and T2 is
expected to add 5 to X, so the expected final result is no change to X. Because of the
lost update problem, however, the result is different.
Both transactions, T1 and T2, read X. T1 subtracts 5 from X, updates the database, and
commits. T2 then adds 5 to the value of X it read before T1's update, updates the database,
and commits. At the end of both transactions, the value of X has increased by 5.
As you can observe, the update T1 made to X is overwritten by T2 and therefore lost. This is
how a lost update problem may occur.
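The interleaving can be replayed step by step in Python. This is a toy sketch: the dictionary db stands in for the database, and the starting value 100 is an assumption chosen for illustration.

```python
db = {"X": 100}        # assumed starting value of X

t1_x = db["X"]         # T1 reads X (100)
t2_x = db["X"]         # T2 reads X (100), before T1 writes
db["X"] = t1_x - 5     # T1 writes 95 and commits
db["X"] = t2_x + 5     # T2 writes 105 from its stale copy and commits

lost_update_result = db["X"]   # what the interleaved schedule leaves behind
expected_result = 100 - 5 + 5  # what a serial execution would leave behind
```

The interleaved run leaves X at 105, while running T1 and T2 one after the other would leave it at 100.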
Uncommitted Update Problem:
Let’s take the above two transactions for demonstration. What if T2 reads X after T1 has
updated it, and T1 is then rolled back? That is, T2 sees the change T1 made to X, but T1
rolls back.
The change made by T1 is undone because it rolls back. The expected result is a final
addition of 5 to the old value of X, because what T1 did has been undone. But, according to
the above scenario, T2 adds 5 to the value it read after T1's subtraction, so the final value
of X equals the old value even though T1 was rolled back. This kind of problem is called the
uncommitted update problem.
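The same scenario as a toy Python trace, assuming X starts at 100 (the variable names are illustrative):

```python
committed_x = 100      # last committed value of X (assumed)
x = committed_x

x = x - 5              # T1 writes 95 but has not committed yet
t2_x = x               # T2 reads the uncommitted value 95
x = committed_x        # T1 rolls back, restoring X to 100
x = t2_x + 5           # T2 writes 95 + 5 = 100 and commits

dirty_read_result = x
expected_after_rollback = committed_x + 5  # T2 alone should give 105
```

Because T2 based its update on a value that was never committed, its addition of 5 is effectively cancelled.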
Incorrect Analysis
Observe the following transaction processing, T1 and T2.
T1 takes 5 from X and adds it to Y. But, before T1 updates Y, T2 reads X and Y.
What would you expect of the sum X+Y? According to T1, the sum does not change; but,
according to T2, the sum X+Y is 5 less than the sum T1 expects. This kind of problem is
called the incorrect analysis problem.
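A toy Python trace of this schedule, assuming X and Y both start at 50 (illustrative values only):

```python
db = {"X": 50, "Y": 50}     # assumed starting values

db["X"] -= 5                # T1 takes 5 from X
t2_sum = db["X"] + db["Y"]  # T2 reads X and Y in the middle of the transfer
db["Y"] += 5                # T1 adds 5 to Y and commits

true_sum = db["X"] + db["Y"]
```

T2 computes 95 even though the sum is 100 both before and after T1.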
Shared/exclusive − This type of locking mechanism differentiates the locks based
on their uses. If a lock is acquired on a data item to perform a write operation, it is
an exclusive lock. Allowing more than one transaction to write on the same data
item would lead the database into an inconsistent state. Read locks are shared
because no data value is being changed.
There are four types of lock protocols available: simplistic lock, pre-claiming lock,
two-phase locking (2PL), and strict two-phase locking (strict 2PL).
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
'write' operation is performed. Transactions may unlock the data item after completing the
‘write’ operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which
they need locks. Before initiating an execution, the transaction requests the system for all
the locks it needs beforehand. If all the locks are granted, the transaction executes and
releases all the locks when all its operations are over. If not all the locks are granted, the
transaction rolls back and waits until all the locks are granted.
Two-phase locking has two phases: a growing phase, in which all the locks are acquired
by the transaction, and a shrinking phase, in which the locks held by the transaction are
released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and
then upgrade it to an exclusive lock.
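The two phases can be sketched as a small Python class. This is a simplified illustration, not a lock manager: it tracks the locks of a single transaction only and does not model conflicts between transactions. The class and method names are hypothetical.

```python
class TwoPhaseTransaction:
    """Tracks the locks of one transaction and enforces the two phases."""

    def __init__(self, name):
        self.name = name
        self.locks = {}         # data item -> "S" (shared) or "X" (exclusive)
        self.shrinking = False  # becomes True at the first unlock

    def _require_growing(self):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock after unlocking")

    def lock_shared(self, item):
        self._require_growing()
        self.locks.setdefault(item, "S")

    def upgrade(self, item):
        # An exclusive lock is obtained by upgrading an existing shared lock.
        self._require_growing()
        if item not in self.locks:
            raise RuntimeError(f"{self.name}: no shared lock on {item}")
        self.locks[item] = "X"

    def unlock(self, item):
        self.shrinking = True   # the shrinking phase starts here
        del self.locks[item]


t1 = TwoPhaseTransaction("T1")
t1.lock_shared("A")
t1.upgrade("A")           # shared lock on A becomes exclusive
t1.lock_shared("B")
locks_at_peak = dict(t1.locks)
t1.unlock("A")            # first release: no more locks may be acquired
try:
    t1.lock_shared("C")   # violates two-phase locking
    violation_detected = False
except RuntimeError:
    violation_detected = True
```

Once the first lock is released, any further lock request is refused, which is exactly the growing/shrinking discipline described above.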
the conflicting pair of tasks should be executed according to the timestamp values of the
transactions.
The timestamp of transaction Ti is denoted as TS(Ti).
Read timestamp of data-item X is denoted by R-timestamp(X).
Write timestamp of data-item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows −
If a transaction Ti issues a read(X) operation −
o If TS(Ti) < W-timestamp(X), the operation is rejected.
o If TS(Ti) >= W-timestamp(X), the operation is executed.
o All data-item timestamps are updated.
If a transaction Ti issues a write(X) operation −
o If TS(Ti) < R-timestamp(X), the operation is rejected.
o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
o Otherwise, the operation is executed.
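The rules above can be sketched in Python as follows. The names Item, read, and write are hypothetical, integer timestamps are assumed, and the timestamp update on a successful read is taken to be the maximum of the old R-timestamp and TS(Ti), which is a common convention that the text does not spell out. Rolling back the rejected transaction is left out of the sketch.

```python
from dataclasses import dataclass

@dataclass
class Item:
    value: int = 0
    r_ts: int = 0   # R-timestamp(X)
    w_ts: int = 0   # W-timestamp(X)

def read(ts, item):
    """Return (accepted, value) for a read(X) issued by a transaction with timestamp ts."""
    if ts < item.w_ts:
        return False, None               # X was written by a younger transaction
    item.r_ts = max(item.r_ts, ts)       # assumed update rule
    return True, item.value

def write(ts, item, value):
    """Return True if a write(X) issued with timestamp ts is executed."""
    if ts < item.r_ts or ts < item.w_ts:
        return False                     # a younger transaction already read or wrote X
    item.value = value
    item.w_ts = ts
    return True

x = Item(value=10)
w1 = write(2, x, 20)   # accepted: W-timestamp(X) becomes 2
r1 = read(1, x)        # rejected: TS = 1 < W-timestamp(X) = 2
r2 = read(3, x)        # accepted: R-timestamp(X) becomes 3
w2 = write(2, x, 30)   # rejected: TS = 2 < R-timestamp(X) = 3
w3 = write(3, x, 40)   # accepted
```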
1.4. Concept of Serializability
When multiple transactions are being executed by the operating system in a
multiprogramming environment, there are possibilities that instructions of one transaction
are interleaved with some other transaction.
A chronological execution sequence of transactions is called a schedule. A schedule can
have many transactions in it, each comprising a number of instructions/tasks.
Serial Schedule − It is a schedule in which transactions are aligned in such a way that one
transaction is executed first. When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after the other. This type of schedule
is called a serial schedule, as transactions are executed in a serial manner.
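As a small illustration, a schedule can be represented as a list of (transaction, operation) pairs in execution order, and a hypothetical helper is_serial can check whether every transaction runs to completion before the next one starts. The representation and the function name are assumptions made for this sketch.

```python
def is_serial(schedule):
    """Return True if no transaction resumes after another one has started."""
    finished = set()   # transactions that have already given way to another
    current = None
    for txn, _operation in schedule:
        if txn != current:
            if txn in finished:
                return False          # txn is interleaved with a later one
            if current is not None:
                finished.add(current)
            current = txn
    return True

serial = [("T1", "Read(A)"), ("T1", "Write(A)"), ("T2", "Read(A)"), ("T2", "Write(A)")]
interleaved = [("T1", "Read(A)"), ("T2", "Read(A)"), ("T1", "Write(A)"), ("T2", "Write(A)")]

serial_ok = is_serial(serial)
interleaved_ok = is_serial(interleaved)
```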
In a multi-transaction environment, serial schedules are considered as a benchmark. The
execution sequence of an instruction in a transaction cannot be changed, but two
transactions can have their instructions executed in a random fashion. This execution does
no harm if two transactions are mutually independent and working on different segments
of data; but in case these two transactions are working on the same data, then the results
may vary. This ever-varying result may bring the database to an inconsistent state.
To resolve this problem, we allow parallel execution of a transaction schedule, if its
transactions are either serializable or have some equivalence relation among them.
1.5. Concurrency Control Mechanism
One way to avoid any problems regarding concurrent database access is to allow only one
user in the database at a time. The only problem with that solution is that the other users
are going to get lousy response time. Can you seriously imagine doing that with a bank
teller machine system or an airline reservation system where tens of thousands of users
are waiting to get into the system at the same time?
It is the task of the concurrency-control manager to control the interaction among the
concurrent transactions, to ensure the consistency of the database.
Concurrency control mechanisms can be pessimistic, optimistic, or logical control types.
Optimistic Concurrency Control
Optimistic concurrency control is based on the idea that transactions are not very likely to
conflict with each other, so we need to design a system to handle the problems as
exceptions after they actually occur.
Most optimistic concurrency control uses a timestamp to track copies of the data.
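A minimal sketch of the idea in Python, assuming a version counter plays the role of the timestamp: a write is accepted only if the record has not changed since the writer read it. The class VersionedRecord and its methods are hypothetical names used for illustration.

```python
class VersionedRecord:
    def __init__(self, value):
        self.value = value
        self.version = 0          # stands in for the timestamp of the copy

    def read(self):
        return self.value, self.version

    def write(self, new_value, read_version):
        """Apply the write only if nobody changed the record since it was read."""
        if read_version != self.version:
            return False          # conflict detected after the fact
        self.value = new_value
        self.version += 1
        return True

rec = VersionedRecord(100)
v1, ver1 = rec.read()                  # T1 reads (100, version 0)
v2, ver2 = rec.read()                  # T2 reads (100, version 0)
first = rec.write(v1 - 5, ver1)        # T1 succeeds, version becomes 1
second = rec.write(v2 + 5, ver2)       # T2 is refused: its copy is stale

v2, ver2 = rec.read()                  # T2 handles the exception by retrying
retry = rec.write(v2 + 5, ver2)
```

The conflict is not prevented in advance; it is detected when T2 tries to write, and T2 recovers by re-reading and retrying.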
The transaction log is used to redo any changes made since the last backup.
But if the transaction log file is also damaged, there is no means to recover the database.
To reduce the risk of losing both the log file and your data, it is preferable to back up the log
file and the data on a separate backup device.
• User mistakes
• Sabotage
• Natural disasters
And there are different techniques that prevent failures. Following are some common
techniques:
• Installing reliable operating system
• Implementing strong security systems
• Using UPS and surge protectors
• RAID arrays
After the system failure, the DBMS should be able to recover the transactions as follows:
• Any transaction that was running at the time of failure needs to be undone and
restarted.
• Any transaction that committed since the last checkpoint needs to be redone.
Using the above logic, the following can be done to the above five transaction conditions:
• Transactions of type T1 need no recovery, because they committed before the last
checkpoint.
• Transactions of type T3 or T5 need to be undone and restarted, because they were
running at the time of failure.
• Transactions of type T2 and T4 need to be redone, because they committed after
the last checkpoint and before the system failed.
To demonstrate the recovery algorithm, let’s take the previous five transactions above.
Observe that, at the last checkpoint only transaction T2 and T3 were in a running state. So,
the UNDO list contains only the two transactions and the REDO list is empty.
The next entry in the log says ‘T4’ begins, and therefore it should be added to the UNDO list:
And again, the next entry in the log says ‘T5’ begins, and therefore it should be added to the
UNDO list:
Accordingly, the next entry says ‘T2 committed’, and therefore it should be transferred
from the UNDO list to the REDO list.
Accordingly, the next entry says ‘T4 committed’, and therefore it should be transferred
from the UNDO list to the REDO list.
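The walk through the log can be sketched in Python. The function name build_recovery_lists and the (transaction, event) log format are assumptions made for illustration; the log entries themselves are the ones described above.

```python
def build_recovery_lists(active_at_checkpoint, log_after_checkpoint):
    """Scan the log forward from the last checkpoint and build the two lists."""
    undo = list(active_at_checkpoint)  # running at the checkpoint
    redo = []
    for txn, event in log_after_checkpoint:
        if event == "begin":
            undo.append(txn)           # started after the checkpoint
        elif event == "commit":
            undo.remove(txn)           # committed: move from UNDO to REDO
            redo.append(txn)
    return undo, redo

log = [("T4", "begin"), ("T5", "begin"), ("T2", "commit"), ("T4", "commit")]
undo, redo = build_recovery_lists(["T2", "T3"], log)
```

At the end of the scan the UNDO list holds T3 and T5 and the REDO list holds T2 and T4, which agrees with the recovery actions listed earlier for the five transactions.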
Chapter-2- Query Processing and Optimization
2.1. Overview
Relational database systems are expected to be equipped with a query language that can
assist their users in querying the database instances. One type of such query language is
relational algebra.
Relational algebra is a procedural query language, which takes instances of relations as
input and yields instances of relations as output. It is used internally in a DBMS to
represent a query evaluation plan. It uses operators to perform queries. An operator can
be either unary or binary. They accept relations as their input and yield relations as their
output.
Relational algebra is performed recursively on a relation and intermediate results are also
considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
σp(r)
Where σ stands for the selection operator, p is the selection predicate, and r stands for the
relation. p is a propositional logic formula which may use connectors like and, or, and not.
Its terms may use relational operators like =, ≠, ≥, <, >, ≤.
Look at the examples in the table below.
Relational algebra expression: σ fname='Abebe' AND dep='IT' (student)
An equivalent SQL statement: SELECT * FROM student WHERE fname='Abebe' AND dep='IT';
What it does: Selects all students whose first name is ‘Abebe’ from the IT department.
ΠA1, A2, …, An (r)
[Table: relational algebra expression, an equivalent SQL statement, and what it does]
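Selection and projection can be sketched in Python over relations stored as lists of dictionaries. The helper names select and project and the sample student rows are hypothetical; only the attribute names fname and dep come from the example above.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Π: keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Hypothetical instance of the student relation.
student = [
    {"fname": "Abebe", "dep": "IT"},
    {"fname": "Abebe", "dep": "CS"},
    {"fname": "Sara", "dep": "IT"},
]

it_abebes = select(student, lambda r: r["fname"] == "Abebe" and r["dep"] == "IT")
first_names = project(student, ["fname"])
```

Both helpers take a relation and return a relation, so they can be nested in the same way as the algebra expressions.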
The Union Operation, ∪
It performs binary union between two given relations. The union operation is defined as:
r ∪ s = { t | t ∈ r or t ∈ s}
r - s
It finds all the tuples that are present in r but not in s.
Example: Suppose we have two relations Books and Articles, and they have ‘a_Name’ as a
common attribute that represents the name of the author.
Relational algebra expression: Πa_Name (Books) ∪ Πa_Name (Articles)
What it does: Projects the names of the authors who have either written a book or an article
or both.
Relational algebra expression: Πa_Name (Books) - Πa_Name (Articles)
What it does: Provides the names of authors who have written books but not articles.
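The two expressions map directly onto Python set operations. The rows below are made-up sample data; only the relation names Books and Articles and the attribute a_Name come from the example.

```python
# Hypothetical instances of the Books and Articles relations.
books = [{"title": "B1", "a_Name": "Abebe"}, {"title": "B2", "a_Name": "Sara"}]
articles = [{"title": "A1", "a_Name": "Sara"}, {"title": "A2", "a_Name": "Kebede"}]

book_authors = {row["a_Name"] for row in books}        # Πa_Name (Books)
article_authors = {row["a_Name"] for row in articles}  # Πa_Name (Articles)

either = book_authors | article_authors      # union
books_only = book_authors - article_authors  # set difference
```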
Cartesian Product (Cross Product), ×
The Cartesian product combines information of two different relations into one. It follows
the notation:
r × s
Where r and s are relations and their output will be defined as −
r × s = { q t | q ∈ r and t ∈ s}
[Figure: example relations r and s, and the relation given by r × s]
Join operator (⋈)
The relational algebra operator join is a binary operator, i.e. it operates between two
relations and returns a single relation. It uses the notation
R1 ⋈a1 = a2 R2
Where, R1 and R2 are relations and a1 and a2 are attributes of R1 and R2, respectively.
The above join operation is exactly equivalent to the following combination of select and
Cartesian product operations:
σ a1 = a2 (R1 × R2)
[Figure: example relations Teacher and Course]
Teacher ⋈T_Id = CT_Id Course
outputs the same result as
σ T_Id = CT_Id (Teacher × Course)
Result columns: T_Id, T_name, C_Num, CT_Id, C_title
Any query with a join can always be rewritten into a cross product followed by a selection.
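The equivalence can be checked with a short Python sketch. The column names follow the result columns above; the rows and the helper names cross and join are hypothetical. The join helper uses a lookup table, which is one possible implementation and not necessarily what a DBMS does.

```python
from itertools import product

# Hypothetical instances of the Teacher and Course relations.
teacher = [{"T_Id": 1, "T_name": "Abebe"}, {"T_Id": 2, "T_name": "Sara"}]
course = [
    {"C_Num": "C101", "CT_Id": 1, "C_title": "Databases"},
    {"C_Num": "C102", "CT_Id": 2, "C_title": "Networks"},
    {"C_Num": "C103", "CT_Id": 1, "C_title": "Algorithms"},
]

def cross(r, s):
    """r × s: every tuple of r concatenated with every tuple of s."""
    return [{**q, **t} for q, t in product(r, s)]

def join(r, s, a1, a2):
    """r ⋈ a1 = a2 s, computed with a lookup table on s instead of a full product."""
    index = {}
    for t in s:
        index.setdefault(t[a2], []).append(t)
    return [{**q, **t} for q in r for t in index.get(q[a1], [])]

product_size = len(cross(teacher, course))
via_cross = [row for row in cross(teacher, course) if row["T_Id"] == row["CT_Id"]]
via_join = join(teacher, course, "T_Id", "CT_Id")
```

Both routes give the same three tuples, but the cross product first builds all six combinations, which is why the choice between equivalent expressions matters for efficiency.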
Two key components of the query evaluation component of a SQL database system are the
query optimizer and the query execution engine.
The execution engine is responsible for the execution of a query plan that results in
generating answers to the query.
The query optimizer is responsible for generating the input for the execution engine.
2.4. Basic Steps in Query Optimization
As seen previously, the query optimization process performs two basic tasks:
translating the input query statement into a proper relational algebra expression and
then building a number of operator trees. It is from those possible evaluation plans that the
most efficient one will be provided to the query execution engine.
SQL statement: SELECT F_Name, L_Name FROM student;
Relational algebra expression: Π F_Name, L_Name (student)
SQL statement: SELECT F_Name, L_Name FROM student WHERE Batch_Year > 1;
Relational algebra expression: Π F_Name, L_Name (σ Batch_Year > 1 (student))
The query optimizer has a number of relational algebra expression options which are
equivalent. Following are equivalent expressions which produce the same information, but
with different efficiency:
2.5. Pipelining
Pipelining is one way of evaluating relational operators from the relational expression tree.
In this technique, several operations are evaluated simultaneously. The result of one
operation is sent to the parent operation while the remaining tuples are still being processed.
As an example, consider the following two relations: department and student.
student (St_Id, F_Name, L_Name, Dep_code, CGPA)
department (Dep_code, Dep_title, Dep_faculty)
The following relational algebra expression tree extracts the title/name of departments that
contain students that have a CGPA of 4.0.
The pipelining technique selects the first student that scored 4.0, then it passes the tuple of
that student to the join operation, i.e. prior to selecting the remaining students that scored
4.0, and then the join operation is performed on that student and the result is sent to the
projection operator. The selection of the next students that have a CGPA of 4.0 also
continues in parallel to the join and projection operations.
Pipelining may not always be applicable. For example, if the final output of the expression
tree is to be sorted in some order, the sort cannot be pipelined, because it must receive all
of its input tuples before it can output the first one.
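Python generators give a compact way to sketch pipelining. This sketch is demand-driven and single-threaded, so the operators take turns on one tuple at a time; it does not run them in parallel threads. The sample rows, the function names, and the trace list (used only to record the order of work) are assumptions made for illustration.

```python
trace = []  # records which operator handled which student, in order

def select_cgpa(students, cgpa):
    for s in students:
        if s["CGPA"] == cgpa:
            trace.append(("select", s["St_Id"]))
            yield s                      # pass the tuple on immediately

def join_department(students, departments):
    by_code = {d["Dep_code"]: d for d in departments}
    for s in students:
        trace.append(("join", s["St_Id"]))
        yield {**s, **by_code[s["Dep_code"]]}

def project_title(rows):
    for row in rows:
        trace.append(("project", row["St_Id"]))
        yield row["Dep_title"]

# Hypothetical instances of the student and department relations.
student = [
    {"St_Id": 1, "Dep_code": "IT", "CGPA": 4.0},
    {"St_Id": 2, "Dep_code": "CS", "CGPA": 3.5},
    {"St_Id": 3, "Dep_code": "CS", "CGPA": 4.0},
]
department = [
    {"Dep_code": "IT", "Dep_title": "Information Technology"},
    {"Dep_code": "CS", "Dep_title": "Computer Science"},
]

titles = list(project_title(join_department(select_cgpa(student, 4.0), department)))
```

The trace shows student 1 passing through selection, join, and projection before student 3 is even selected, so no intermediate relation is ever materialized.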
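[Figure: expression tree with a projection on Dep_title at the root, above the join of the
selection CGPA=4.0 on student with department]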
Chapter-3- Database Integrity, Security and Recovery
3.1. Integrity
Tables − In the relational data model, relations are saved in the format of tables. This format
stores the relation among entities. A table has rows and columns, where rows represent
records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a
tuple.
Relation instance − A finite set of tuples in the relational database system represents
relation instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), its attributes,
and their names.
Relation key − Each row has one or more attributes, known as the relation key, which can
identify the row in the relation table uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as the attribute
domain.
Keys and Candidate Keys - There must be at least one minimal subset of attributes in the
relation, which can identify a tuple uniquely. This minimal subset of attributes is called key
for that relation. If there is more than one such minimal subset, these are called candidate
keys.
Foreign key - An attribute (or set of attributes) in one relation that refers to the key of
another (or the same) relation.
Key Constraints:
Key constraints enforce the following two conditions:
• In a relation with a key attribute, no two tuples/rows can have identical values for
key attributes.
• A key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
Domain constraints:
Attributes have specific values in real-world scenario. For example, age can only be a
positive integer. Every attribute is bound to have a specific range of values. For example,
age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
Referential integrity constraints:
Referential integrity constraints work on the concept of Foreign Keys.
Referential integrity constraint states that if a relation refers to a key attribute of a different
or same relation, then that key element must exist.
It ensures the integrity of referential relationships between tables as defined by primary
and foreign keys. In a relation between two tables, one table has a primary key and the
other a foreign key. The primary key uniquely identifies each record in the first table. In
other words, there can be only one record in the first table with the same primary key
value.
The foreign key is placed into the second table in the relationship such that the foreign key
contains a copy of the primary key value from the record in the related table.
So, referential Integrity ensures the integrity of relationships between primary and foreign
key values in related tables. Most relational database engines use what are often called
constraints. Primary and foreign keys are both constraints.
There are some specific circumstances to consider in terms of how Referential Integrity is
generally enforced:
A primary key table is assumed to be a parent table and a foreign key table a child table.
• When adding a new record to a child table, if a foreign key value is entered, it must
exist in the related primary key field of the parent table.
• Foreign key fields can contain NULL values. Primary key field values can never contain
NULL values as they are required to be unique.
• When changing a record in a parent table, if the primary key is changed, the change
must be cascaded to all foreign key valued records in any related child tables.
Otherwise, the change to the parent table must be prohibited.
• When changing a record in a child table, a change to a foreign key requires that a
related primary key be checked for existence, or changed first. If a foreign key
is changed to NULL, no primary key is required. If the foreign key is changed to a
non-NULL value, the foreign key value must exist as a primary key value in the
related parent table.
• When deleting a parent table record, related foreign key records in child tables
must either be cascade deleted or deleted from child tables first.
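These rules can be tried out with Python's standard sqlite3 module. This is a sketch under assumptions: the department and student tables and their columns are illustrative, and the child table is declared with ON UPDATE CASCADE and ON DELETE CASCADE, which is only one of the enforcement options described above. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dep_code TEXT PRIMARY KEY, dep_title TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " st_id INTEGER PRIMARY KEY,"
    " dep_code TEXT REFERENCES department(dep_code)"
    " ON UPDATE CASCADE ON DELETE CASCADE)"
)
conn.execute("INSERT INTO department VALUES ('IT', 'Information Technology')")
conn.execute("INSERT INTO student VALUES (1, 'IT')")   # parent row exists
conn.execute("INSERT INTO student VALUES (2, NULL)")   # NULL foreign key is allowed

try:
    conn.execute("INSERT INTO student VALUES (3, 'XX')")  # no such parent row
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Changing the parent key cascades to the child rows.
conn.execute("UPDATE department SET dep_code = 'ICT' WHERE dep_code = 'IT'")
after_update = conn.execute(
    "SELECT st_id, dep_code FROM student ORDER BY st_id"
).fetchall()

# Deleting the parent row cascade-deletes the child rows that refer to it.
conn.execute("DELETE FROM department WHERE dep_code = 'ICT'")
after_delete = conn.execute(
    "SELECT st_id, dep_code FROM student ORDER BY st_id"
).fetchall()
```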
3.2. Security
When you think of securing your database, the following issues should be considered:
Who gets the DBA role?
How many users will need access to the database?
Which users will need which privileges and which roles?
How will you remove users who no longer need access to the database?
System privileges
System privileges are general permissions to perform functions in managing the server and
the database(s). Hundreds of permissions are supported by each database vendor, with
most of those being system privileges.
They permit the grantee to perform a general database function, such as creating new user
accounts or connecting to the database.
Here are some commonly used Microsoft SQL Server system privileges:
• CREATE DATABASE: Provides the ability to create new databases on the SQL server
• BACKUP DATABASE: Provides the ability to run backups of the databases on the SQL
server
Object Privileges
Object privileges are granted to users with the SQL GRANT statement and revoked with the
REVOKE statement. The database user (login) who receives the privileges is called the
grantee.
They permit the grantee to perform specific actions on specific objects, such as selecting from
the EMPLOYEES table or updating the DEPARTMENTS table.
To reduce the burden of managing privileges, most RDBMSs support storing a group of
privilege definitions as a single named object called a role. Roles may then be granted to
individual users, who then inherit all the privileges contained in the role. RDBMSs that
support roles also typically come with a number of predefined roles. Oracle, for example,
has a role called DBA that contains all the high-powered system and object privileges a
database user needs in administering a database.
The restrictions enforced by the security subsystem of a DBMS can be program related or
data related. A program restriction can be, for example, who can create new bank accounts,
and a data restriction can be which bank accounts an individual user can see.
The DBMS stores information regarding the users of the DBMS and their access privileges
(name and password) in the data dictionary.
The access privileges can have several levels:
• to create a database
• to authorize (grant) additional users to
o access the database
o access some relations
o create new relations
o update the database
• to revoke privileges
If all sites store identical relations, then there exists full replication. In addition, if all sites
contain a copy of the whole database, then the system is referred to as a fully redundant
system.
Replication:
Reduced data transfer: because a replica of relation r is available locally at each site,
there is no need to transfer relations from site to site.
Fragmentation:
Fragments of relation r are divisions r1, r2, …, rn which contain sufficient
information to reconstruct the original relation r.
Relation fragmentation can be horizontal or vertical fragmentation.
student schema
Two possible horizontal fragments of the student relation can be: (let’s name them
student1 and student2)
Two possible vertical fragments of the student relation can be: (let’s name them list1
and list2)
list1:
2 Zinash Demere
3 Mohammed Ahmed
4 Adem Ali
5 Hilina Sitota
6 Zahra Muktar
list2:
2 COTM Weekend
3 CS Regular
4 Accounting Extension
5 IT Extension
6 CS Weekend
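Reconstruction from fragments can be sketched in Python using the rows that appear above. The meaning of the columns (an id followed by a name in the first block, and an id followed by a department and a program in the second) and the predicate used for the horizontal split are assumptions made for illustration.

```python
# Vertical fragments: both keep the id so that the relation can be rebuilt.
list1 = [(2, "Zinash", "Demere"), (3, "Mohammed", "Ahmed"), (4, "Adem", "Ali"),
         (5, "Hilina", "Sitota"), (6, "Zahra", "Muktar")]
list2 = [(2, "COTM", "Weekend"), (3, "CS", "Regular"), (4, "Accounting", "Extension"),
         (5, "IT", "Extension"), (6, "CS", "Weekend")]

# Vertical fragments are recombined with a join on the shared id.
by_id = {row[0]: row[1:] for row in list2}
student = [row + by_id[row[0]] for row in list1]

# Horizontal fragments: each one holds whole tuples chosen by a predicate
# (here, hypothetically, whether the program is "Extension").
student1 = [row for row in student if row[4] == "Extension"]
student2 = [row for row in student if row[4] != "Extension"]

# Horizontal fragments are recombined with a union.
reconstructed = sorted(student1 + student2)
```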
allows a relation to be split so that tuples are located where they are most
frequently accessed
[Figure: three sites S1, S2, and S3 holding the relations course, department, and student]
For a query issued at site S1, the system needs to produce the result at site S1.
Possible processing strategies:
Strategy 1:
Transfer copies of all three relations to site S1 and choose a strategy for processing
the entire query locally at site S1.
Strategy 2:
Transfer a copy of the course relation to site S2 and compute
at S2.
Strategy 3:
Devise similar strategies, exchanging the roles S1, S2, S3
Maintaining a log for recovery purposes
Participating in coordinating the concurrent execution of the transactions executing
at that site.
In addition to the local transaction manager, there is also a transaction coordinator at each
site performing the following activities:
Starting the execution of transactions that originate at the site.
Distributing subtransactions to appropriate sites for execution.
Coordinating the termination of each transaction that originates at the site, which
may result in the transaction being committed at all sites or aborted at all sites.
Atomicity of distributed transactions is assured by a commit protocol. A distributed
transaction can either be committed at all sites or aborted at all the sites; it is not
acceptable for it to be committed at one site and aborted at another.
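The text does not say which commit protocol is used. As one possible illustration, the sketch below follows the general shape of a two-phase commit: the coordinator first collects a vote from every site and then sends the same decision to all of them. The class Site, its methods, and run_commit_protocol are hypothetical names, and failures and timeouts are not modelled.

```python
class Site:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether the local subtransaction can commit."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def run_commit_protocol(sites):
    votes = [site.prepare() for site in sites]  # phase 1: collect every vote
    decision = all(votes)                       # commit only if all voted yes
    for site in sites:                          # phase 2: same decision everywhere
        if decision:
            site.commit()
        else:
            site.abort()
    return decision

all_yes = [Site("S1"), Site("S2"), Site("S3")]
committed = run_commit_protocol(all_yes)

one_no = [Site("S1"), Site("S2", can_commit=False), Site("S3")]
aborted = not run_commit_protocol(one_no)
```

In both runs every site ends in the same state, which is the all-or-nothing outcome required above.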
For convenience, many object-oriented data models permit direct access to variables of
other objects.
class employee {
/*Variables */
string name;
string address;
date start_date;
int salary;
/* Messages */
int annual_salary();
string get_name();
string get_address();
int set_address(string new_address);
int employment_length();
};
Methods to read and set the other variables (start-date, salary…) can also be added with
strict encapsulation.
Methods are defined separately, i.e. outside the class definition. Following are, for example,
two of the method definitions of the above class.
int employment_length() {
return today() - start_date;
}
Inheritance:
Inheritance is the concept whereby a subclass inherits the definition of a more general class.
This means that an object of a subclass need not carry its own definition of data and
methods that are generic to the class of which it is a part; it can use/inherit the general
class’s data and methods. Doing so has advantages. It speeds up program development and
also reduces program size. It also reduces the burden on the programmer by promoting
code reusability.
For example: You can define a car class and then you can define different types of cars
such as minibus, midi-bus, bus, lorry, etc., by inheriting common variables and methods
from the car class.
[Figure: diagram of the Company, Department, Person, Manager, and Employee classes]
Data warehouses need throughput of huge amounts of data by relatively very few users.
Data warehouses process large quantities of data at once, mainly for reporting and
analytical processing. Also, data warehouses are regularly updated, but usually in large
batch operations. OLTP databases need lightning-quick response to many individual users.
Data warehouses perform enormous amounts of I/O activity over large quantities of data;
therefore, the needs of OLTP and data warehouse databases are completely contrary to
each other, down to the lowest layer of hardware resource usage.
6.2. Data Mining
6.2.1. Introduction
Data mining is a system that manipulates a large amount of data and provides important
information and knowledge. This information and knowledge gained can be used for
applications ranging from market analysis, fraud detection, and customer retention, to
production control and science exploration.
6.2.2. Components of Data Mining System
A data mining system can have the following components, each having its own role.
Database, data warehouse, World Wide Web, or other information repository: This is
one or a set of databases, data warehouses, spreadsheets, or other kinds of
information repositories. Data cleaning and data integration techniques may be
performed on the data.
Database or data warehouse server: The database or data warehouse server is
responsible for fetching the relevant data, based on the user’s data mining request.
Knowledge base: This is the domain knowledge that is used to guide the search or
evaluate the interestingness of resulting patterns. Such knowledge can include
concept hierarchies, used to organize attributes or attribute values into different
levels of abstraction.
Data mining engine: This is essential to the data mining system and ideally consists
of a set of functional modules for tasks such as characterization, association and
correlation analysis, classification, prediction, cluster analysis, outlier analysis, and
evolution analysis.
Pattern evaluation module: This component typically employs interestingness
measures and interacts with the data mining modules so as to focus the search
toward interesting patterns.
User interface: This module communicates between users and the data mining system,
allowing the user to interact with the system by specifying a data mining query or task,
providing information to help focus the search, and performing exploratory data mining
based on the intermediate data mining results.