Advanced Database Systems Handout (1)
Chapter One
Concepts for Object-Oriented Databases
Introduction
Database technology has traditionally concentrated on the static aspects of information storage. This began to change with the arrival of the third generation of Database Management Systems, namely Object-Oriented Database Management Systems (OODBMSs).
In database systems, we have seen the widespread acceptance of RDBMSs for traditional
business applications, such as order processing, inventory control, banking, and airline
reservations. However, existing RDBMSs have proven inadequate for applications whose needs
are quite different from those of traditional business database applications. These applications
include: computer-aided design (CAD), computer-aided manufacturing (CAM), computer-aided
software engineering (CASE), Network management systems, digital publishing, geographic
information systems (GIS), office information systems (OIS) and multimedia systems, interactive
and dynamic Web sites.
Strengths of the relational model are its simplicity, its suitability for Online Transaction
Processing (OLTP), its symmetric structure, and its support for data independence. However, the
relational data model, and relational DBMSs, have significant weaknesses: poor representation
of real-world entities, semantic overloading, and poor support for integrity and general
constraints. Furthermore, in RDBMSs, transactions in business processing are generally
short-lived, and concurrency control primitives and protocols such as two-phase locking are
not particularly suited to long-duration transactions; schema changes are difficult; and
RDBMSs were designed for content-based associative access (that is, declarative statements with
selection based on one or more predicates) and are poor at navigational access (that is,
access based on movement between individual records).
These limitations of early data models, the need to support more complex applications, the need for
additional data-modeling features, and the increased use of object-oriented programming languages
resulted in the creation of object-oriented databases.
• OODBMS: The database management system for an object-oriented database (OODB).
• A data model: A particular way of describing data, relationships between data, and
constraints on the data.
• Data persistence: The ability for data to outlive the execution of a program and possibly
the lifetime of the program itself.
The Object-Oriented (OO) Data Model: In this data model, both data and their
relationships are contained in a single structure known as an object.
OO databases try to maintain a direct correspondence between real-world and database objects
so that objects do not lose their integrity and identity and can easily be identified and operated
upon. An object is a uniquely identifiable entity that contains both the attributes that describe
the state of a ‘real-world’ object and the actions that are associated with it (a notion dating
back to Simula in the 1960s). An object, similar to a program variable in a programming
language, is composed of two components: state (value) and behaviour (operations), except
that it will typically have a complex data structure as well as specific operations defined by
the programmer.
In OO databases, objects may have an object structure of arbitrary complexity in order to contain
all of the necessary information that describes the object. In contrast, in traditional database
systems, information about a complex object is often scattered over many relations or records,
leading to loss of direct correspondence between a real-world object and its database
representation.
The internal structure of an object in OOPLs includes the specification of instance variables,
which hold the values that define the internal state of the object. An instance variable is similar
to the concept of an attribute, except that instance variables may be encapsulated within the
object and thus are not necessarily visible to external users. Some OO models insist that all
operations a user can apply to an object must be predefined. This forces a complete
encapsulation of objects. To encourage encapsulation, an operation is defined in two parts:
defined in two parts:
• Signature or interface of the operation, specifies the operation name and arguments (or
parameters).
• Method or body, specifies the implementation of the operation.
The object type definition includes an operation signature for each operation that specifies the
name of the operation, the names and types of each argument, the names of any exceptions that
can be raised, and the types of the values returned, if any. Operations can be invoked by passing
a message to an object, which includes the operation name and the parameters. The object then
executes the method for that operation. This encapsulation permits modification of the internal
structure of an object, as well as the implementation of its operations, without the need to
disturb the external programs that invoke these operations.
Some OO systems provide capabilities for dealing with multiple versions of the same object (a
feature that is essential in design and engineering applications). For example, an old version of
an object that represents a tested and verified design should be retained until the new version
is tested and verified, which is crucial for designs in manufacturing process control,
architecture, and software systems.
Object Identity
Every object has a unique identity, implemented via an object identifier, or OID. In an
object-oriented system, each object is assigned an OID when it is created that is:
• system-generated;
• unique to that object;
• invariant/immutable, in the sense that it cannot be altered during its lifetime. Once the
object is created, this OID will not be reused for any other object, even after the object
has been deleted;
• independent of the values of its attributes (that is, its state). Two objects could have
the same state but would have different identities;
• invisible to the user (ideally).
Thus, object identity ensures that an object can always be uniquely identified, thereby
automatically providing entity integrity. There are several advantages to using OIDs as
the mechanism for object identity:
• They are efficient: OIDs require minimal storage within a complex object. Typically,
they are smaller than textual names, foreign keys, or other semantic-based references.
• They are fast: OIDs point to an actual address or to a location within a table that gives
the address of the referenced object. This means that objects can be located quickly
whether they are currently stored in local memory or on disk.
• They are independent of content: OIDs do not depend upon the data contained in the
object in any way. This allows the value of every attribute of an object to change while
the object remains the same object with the same OID.
• They cannot be modified by the user: if OIDs are system-generated and kept
invisible, or at least read-only, the system can ensure entity and referential integrity
more easily, and the user is spared from having to maintain integrity.
Object Structure
An object structure describes how the state of an object is built up from other objects and
values. In an OODBS, the state (current value) of a complex object may be constructed from
other objects (or other values) using type constructors.
Type Constructors
A type constructor is a feature of a typed formal language that builds new types from old ones.
Basic types are considered to be built using nullary type constructors. Complex objects are
built from simpler ones by applying constructors to them. Attributes can be classified as simple
or complex. A simple attribute can be a primitive type such as integer, string, real, and so on,
which takes on literal values. The simplest objects are objects such as integers, characters,
byte strings of any length, Booleans, and floats. The basic type constructors are atom, tuple,
and set; other commonly used constructors include list, bag, and array.
The atom constructor is used to represent all basic atomic values, such as integers, real
numbers, character strings, booleans, and any other basic data types that the system
supports directly.
Sets are critical because they are a natural way of representing collections from the real
world.
Tuples are critical because they are a natural way of representing properties of an entity.
Lists or arrays are important because they capture order.
Figure: Specifying the object types EMPLOYEE, DATE, and DEPARTMENT using type constructors.
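As a rough illustration of how type constructors compose, objects such as EMPLOYEE, DATE, and DEPARTMENT can be sketched with Python built-ins: a dict for the tuple constructor, a frozenset for the set constructor, and a list for the list constructor. The attribute names and values below are illustrative assumptions, not the handout's exact schema.

```python
# Sketch: modelling type constructors with Python built-ins.
# tuple constructor -> dict, set constructor -> frozenset, list -> list.
# Field names and values are illustrative assumptions.

date = {"Year": 1985, "Month": 6, "Day": 21}            # tuple(atom, atom, atom)

employee = {                                             # tuple constructor
    "Name": "Smith",                                     # atom (string)
    "Salary": 42000,                                     # atom (integer)
    "BirthDate": date,                                   # nested tuple object
}

department = {                                           # tuple constructor
    "DName": "Research",
    "Locations": frozenset({"Bellaire", "Houston"}),     # set constructor
    "Employees": [employee],                             # list constructor (ordered)
}

# Navigating the complex object's structure:
print(department["Employees"][0]["BirthDate"]["Year"])   # 1985
```

The point of the sketch is that a complex object keeps all of its information in one nested structure, rather than scattering it over several flat relations.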
The concept of encapsulation means that an object contains both a data structure and the set of
operations that can be used to manipulate it. The concept of information hiding means that the
external aspects of an object are separated from its internal details, which are hidden from the
outside world. In this way the internal details of an object can be changed without affecting the
applications that use it, provided the external details remain the same. This prevents
applications from becoming so interdependent that a small change has enormous ripple effects. In
other words, information hiding provides a form of data independence. These concepts simplify
the construction and maintenance of applications through modularization. An object is a ‘black
box’ that can be constructed and modified independently of the rest of the system, provided the
external interface is not changed.
There are two views of encapsulation: the object-oriented programming language (OOPL) view
and the database adaptation of that view. In some OOPLs encapsulation is achieved through
Abstract Data Types (ADTs). In this view an object has an interface part and an implementation
part. The interface provides a specification of the operations that can be performed on the
object; the implementation part consists of the data structure for the ADT and the functions
that realize the interface. Only the interface part is visible to other objects or users. In the
database view, proper encapsulation is achieved by ensuring that programmers have access
only to the interface part. In this way encapsulation provides a form of logical data
independence: we can change the internal implementation of an ADT without changing any of
the applications using that ADT.
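A minimal sketch of the ADT view of encapsulation, using a hypothetical Account type: only the interface methods are intended for callers, while the underlying data structure is an implementation detail that can be replaced without changing any application using the ADT.

```python
# Sketch: encapsulation via an abstract data type. The interface is the
# pair of methods (deposit, balance); the implementation is the hidden
# data structure _entries. Swapping _entries for, say, a running total
# would not affect any caller. The Account type is an illustrative assumption.

class Account:
    def __init__(self):
        self._entries = []            # hidden implementation: list of postings

    def deposit(self, amount):        # interface operation
        self._entries.append(amount)

    def balance(self):                # interface operation, derived from hidden state
        return sum(self._entries)

acct = Account()
acct.deposit(100)
acct.deposit(50)
print(acct.balance())  # 150
```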
Specifying Object Behaviour via Class Operations (methods)
The main idea is to define the behaviour of a type of object based on the operations that can be
externally applied to objects of that type. Methods/operations define the behaviour of the
object. They can be used to change the object’s state by modifying its attribute values or to query
the value of selected attributes. In general, the implementation of an operation can be specified in
a general-purpose programming language that provides flexibility and power in defining the
operations.
A method consists of a name and a body that performs the behaviour associated with the
method name. In an object-oriented language, the body consists of a block of code that carries out
the required functionality. For example, the next code represents the method to update a member
of staff’s salary. The name of the method is updateSalary, with an input parameter increment,
which is added to the instance variable salary to produce a new salary.
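The updateSalary code itself can be sketched in Python as follows, assuming the object stores its salary in an instance variable as described; the Staff class name and initial values are illustrative assumptions.

```python
# Sketch of the updateSalary method described above: the increment
# parameter is added to the instance variable salary to produce a new salary.

class Staff:
    def __init__(self, salary):
        self.salary = salary             # instance variable holding the object's state

    def updateSalary(self, increment):   # method named in the handout
        self.salary = self.salary + increment

s = Staff(20000)
s.updateSalary(2500)   # a "message" sent to the object s, invoking its method
print(s.salary)        # 22500
```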
Messages are the means by which objects communicate. A message is simply a request from
one object (the sender) to another object (the receiver) asking the second object to execute one
of its methods. The sender and receiver may be the same object.
Classes are blueprints for defining a set of similar objects. Thus, objects that have the same
attributes and respond to the same messages can be grouped together to form a class.
The attributes and associated methods are defined once for the class rather than separately for
each object. A class is also an object and has its own attributes and methods, referred to as class
attributes and class methods, respectively. Class attributes describe the general characteristics of
the class, such as totals or averages. For database applications, the requirement that all
objects be completely encapsulated is too stringent. One way of relaxing this requirement is to
divide the structure of an object into visible and hidden attributes (instance variables).
Reachability Mechanism: Make the object reachable from some persistent object. An
object B is said to be reachable from an object A if a sequence of references in the
object graph leads from object A to object B.
Inheritance allows one class to be defined as a special case of a more general class. These
special cases are known as subclasses and the more general cases are known as superclasses.
Subclasses may be identified top-down by specializing a superclass, or bottom-up through a
generalization approach, which results in the identification of a generalized superclass from
the original entity types.
An inheritance hierarchy comprises a set of superclasses and their related subclasses. The set
of subclasses is defined on the basis of some distinguishing characteristics of the entities in
the superclass.
A subclass inherits all the properties of its superclass and additionally defines its own unique
properties (attributes and methods). All instances of the subclass are also instances of the
superclass. The principle of substitutability states that an instance of the subclass can be used
whenever a method or a construct expects an instance of the superclass.
Example: subtypes of a GEOMETRY_OBJECT supertype, each inheriting the properties of the
supertype and adding its own attributes:
– RECTANGLE subtype-of GEOMETRY_OBJECT: Width, Height
– TRIANGLE subtype-of GEOMETRY_OBJECT: Side1, Side2, Angle
– CIRCLE subtype-of GEOMETRY_OBJECT: Radius
Superclass/Subclass Relationships
Each member of a subclass is also a member of the superclass. In other words, the entity in the
subclass is the same entity in the superclass, but has a distinct role. The relationship between a
superclass and a subclass is one-to-one (1:1) and is called a superclass/subclass relationship.
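The GEOMETRY_OBJECT hierarchy above can be sketched in Python to show inheritance and substitutability; the area formulas and the describe method are illustrative assumptions, not part of the handout.

```python
# Sketch: inheritance and substitutability for the GEOMETRY_OBJECT hierarchy.
import math

class GeometryObject:
    def describe(self):
        # Inherited by every subclass; relies on the subclass's own area().
        return f"{type(self).__name__} with area {self.area():.2f}"

class Rectangle(GeometryObject):          # RECTANGLE subtype-of GEOMETRY_OBJECT
    def __init__(self, width, height):
        self.width, self.height = width, height
    def area(self):
        return self.width * self.height

class Circle(GeometryObject):             # CIRCLE subtype-of GEOMETRY_OBJECT
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2

# Substitutability: code expecting a GeometryObject accepts any subclass instance.
shapes = [Rectangle(3, 4), Circle(1)]
for s in shapes:
    print(s.describe())
```

Every Rectangle and Circle instance is also a GeometryObject instance, so the loop treats them uniformly, which is exactly the principle of substitutability stated above.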
Chapter Two
Query Processing and Optimization
A query is a request for information. A query is based on a set of pre-defined code so that
the database understands the instruction; we refer to this code as the query language. The
standard query language for database management is Structured Query Language (SQL).
Query Processing is the process of choosing a suitable execution strategy for processing a query.
The activities implemented by query processing are the activities involved in parsing, validating,
optimizing, and executing a query. The aims of query processing are to transform a query written
in a high-level language, typically SQL, into a correct and efficient execution strategy expressed
in a low-level language (procedural). A low-level programming language is a programming
language that provides little or no abstraction from a computer's instruction set architecture—
commands or functions in the language map closely to processor instructions.
When the relational model was first launched commercially, one of the major criticisms often cited
was inadequate performance of queries. Since then, a significant amount of research has been
devoted to developing highly efficient algorithms for processing queries. There are many ways in
which a complex query can be performed, and one of the aims of query processing is to
determine which of them is the most cost-effective.
Relational algebra underlies the query processing and optimization modules that are integral
parts of relational database management systems (RDBMSs), and some of its concepts are
incorporated into the SQL standard query language for RDBMSs.
The fundamental operations of relational algebra are: Select, Project, Union, Set difference,
Cartesian product, and Rename.
Select Operation (σ): It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r), where σ stands for the selection predicate and r stands for the relation.
p is a propositional logic formula which may use connectors like and, or, and not, and
relational operators like =, ≠, ≥, <, >, ≤.
Example: σsubject = "database"(Books)
Project Operation (∏): It projects column(s) that satisfy a given predicate.
Notation − ∏A1, A2, ..., An (r), where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
Example: ∏subject, author (Books)
Union Operation (∪): It performs binary union between two given relations and is defined as
r ∪ s = { t | t ∈ r or t ∈ s}.
Notation − r U s
Where r and s are either database relations or relation result set (temporary relation).
Example: ∏ author (Books) ∪ ∏ author (Articles)
For a union operation to be valid, the following conditions must hold:
• r and s must have the same number of attributes.
• Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
Set Difference (−): The result of set difference operation is tuples, which are present in one
relation but are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
Example: ∏ author (Books) − ∏ author (Articles)
Cartesian Product (Χ): Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
Example: σauthor = ‘Jonathan’(Books Χ Articles)
Rename Operation (ρ)(rho): The results of relational algebra are also relations but without
any name. The rename operation allows us to rename the output relation.
Notation − ρ x (E)
Where the result of expression E is saved with name of x.
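The fundamental operations above can be sketched in Python over relations modelled as lists of dicts (one dict per tuple); the Books and Articles data are illustrative assumptions.

```python
# Sketch: the fundamental relational algebra operations over relations
# modelled as lists of dicts. Set semantics are enforced by skipping
# duplicates. Sample data are illustrative assumptions.

def select(predicate, relation):                       # sigma_p(r)
    return [t for t in relation if predicate(t)]

def project(attrs, relation):                          # pi_{A1,...,An}(r)
    out = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in out:                             # duplicates eliminated
            out.append(row)
    return out

def union(r, s):                                       # r U s
    return r + [t for t in s if t not in r]

def difference(r, s):                                  # r - s
    return [t for t in r if t not in s]

def product(r, s):                                     # r X s (attribute names assumed disjoint)
    return [{**q, **t} for q in r for t in s]

Books = [{"subject": "database", "author": "Date"},
         {"subject": "networks", "author": "Tanenbaum"}]
Articles = [{"author": "Codd"}]

print(select(lambda t: t["subject"] == "database", Books))
print(union(project(["author"], Books), Articles))
```

Rename is not needed here because Python variables already give result relations a name.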
Additional operations are: Set intersection, Assignment and Join Operations.
Join Operations: A Join operation combines related tuples from different relations,
if and only if a given join condition is satisfied. It is denoted by ⋈.
A NATURAL JOIN is a JOIN operation that creates an implicit join clause based on the
common columns in the two tables being joined, that is, columns that have the same name in
both tables. A NATURAL JOIN can be an INNER join, a LEFT OUTER join, or a RIGHT
OUTER join.
The intersection operator gives the tuples common to the two relations being intersected.
The two relations must be compatible (the same structure) for the intersection operator to
work. Intersection also removes all duplicates before displaying the result. It is denoted
by ∩.
• Assignment Operator (←): allows the result of a relational algebra expression to be
assigned to a temporary relation variable for use in subsequent expressions.
Aggregate functions are used in queries that summarize information from the database tuples.
o The COUNT function is used for counting tuples or values.
o Use of the functional operator ℱ:
ℱMAX Salary (Employee) retrieves the maximum salary value from the Employee relation.
o Similarly: ℱMIN Salary (Employee), ℱSUM Salary (Employee), ℱCOUNT SSN (Employee),
and ℱAVERAGE Salary (Employee).
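A small Python sketch of these aggregate computations over an illustrative Employee relation (the SSNs and salaries are assumptions):

```python
# Sketch: the functional operator F over an Employee relation, computing
# MAX, MIN, SUM, COUNT, and AVERAGE on one attribute. Data are illustrative.
Employee = [{"SSN": "111", "Salary": 30000},
            {"SSN": "222", "Salary": 40000},
            {"SSN": "333", "Salary": 25000}]

salaries = [e["Salary"] for e in Employee]
print(max(salaries))                   # F_MAX Salary (Employee)   -> 40000
print(min(salaries))                   # F_MIN Salary (Employee)   -> 25000
print(sum(salaries))                   # F_SUM Salary (Employee)   -> 95000
print(len(Employee))                   # F_COUNT SSN (Employee)    -> 3
print(sum(salaries) / len(salaries))   # F_AVERAGE Salary (Employee)
```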
Algorithms for Executing Query Operations
Translating SQL Queries into Relation Algebra
Algorithms for External Sorting
Algorithms for SELECT and JOIN Operations
Algorithms for PROJECT and SET Operations
Implementing Aggregate Operations and OUTER JOINS
Combining Operations Using Pipelining
Using Heuristics in Query Optimization
Example: the following is an example of translating a given nested SQL query into an
equivalent relational algebra expression.
Query Optimization
In first generation network and hierarchical database systems, the low-level procedural query
language is generally embedded in a high-level programming language such as COBOL, and
it is the programmer’s responsibility to select the most appropriate execution strategy. In
contrast, with declarative languages such as SQL, the user specifies what data is required rather
than how it is to be retrieved. This relieves the user of the responsibility of determining, or
even knowing, what constitutes a good execution strategy and makes the language more
universally usable. Additionally, giving the DBMS the responsibility for selecting the best
strategy prevents users from choosing strategies that are known to be inefficient and gives the
DBMS more control over system performance.
An important aspect of query processing is query optimization. As there are many equivalent
transformations of the same high-level query, the aim of query optimization is to choose the
one that minimizes resource usage. Generally, we try to reduce the total execution time of the
query, which is the sum of the execution times of all individual operations that make up the
query. However, resource usage may also be viewed as the response time of the query, in which
case we concentrate on maximizing the number of parallel operations. Since the problem is
computationally intractable with a large number of relations, the strategy adopted is generally
reduced to finding a near optimum solution.
Query optimization is the process of choosing a suitable execution strategy for processing a
query. An internal representation (query tree or query graph) of the query is created after
scanning, parsing, and validating. The aim of query optimization is to choose the one that
minimizes resource usage. Generally, we try to reduce the total execution time of the query,
which is the sum of the execution times of all individual operations that make up the query.
Static Optimization
The advantages of static optimization are that the runtime overhead is removed,
and there may be more time available to evaluate a larger number of execution
strategies, thereby increasing the chances of finding a more optimum strategy.
The disadvantages arise from the fact that the execution strategy that is chosen as being optimal
when the query is compiled may no longer be optimal when the query is run.
However, a hybrid approach could be used to overcome this disadvantage, where the query is re-
optimized if the system detects that the database statistics have changed significantly since the
query was last compiled.
Two main techniques are used for query optimization: one uses heuristic rules for ordering
the operations in a query; the other systematically estimates the cost of different execution
strategies and chooses the strategy with the lowest cost estimate.
The main heuristic is to apply first the operations that reduce the size of intermediate results.
• E.g., Apply SELECT and PROJECT operations before applying the JOIN or other
binary operations.
Query tree
A query tree is a tree data structure that corresponds to a relational algebra expression. It
represents the input relations of the query as leaf nodes of the tree, and represents the relational
algebra operations as internal nodes. An execution of the query tree consists of executing an
internal node operation whenever its operands are available and then replacing that internal node
by the relation that results from executing the operation.
Query graph
A graph data structure that corresponds to a relational calculus expression. It does not indicate an
order on which operations to perform first. There is only a single graph corresponding to each
query.
Example:
For every project located in ‘Stafford’, retrieve the project number, the controlling department
number and the department manager’s last name, address, and birthdate.
SQL query (equivalent to the relational algebra expression below):
SELECT PNUMBER, DNUM, LNAME, ADDRESS, BDATE
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION=‘Stafford’;
Relational algebra:
πPNUMBER, DNUM, LNAME, ADDRESS, BDATE
(((σPLOCATION=‘Stafford’(PROJECT))
⋈DNUM=DNUMBER (DEPARTMENT)) ⋈MGRSSN=SSN (EMPLOYEE))
Figure 2.3: (a) Query tree for the relational algebra, (b) Query tree for SQL query
2. Commutativity of σ: The σ operation is commutative:
σc1(σc2(R)) = σc2(σc1(R))
3. Cascade of π: In a cascade (sequence) of π operations, all but the last one can be
ignored:
πList1(πList2(...(πListn(R))...)) = πList1(R)
4. Commuting σ with π: If the selection condition c involves only the attributes A1, ...,
An in the projection list, the two operations can be commuted:
πA1, ..., An(σc(R)) = σc(πA1, ..., An(R))
6. Commuting σ with ⋈ (or ×): If all the attributes in the selection condition c involve only the
attributes of one of the relations being joined, say R, the two operations can be commuted as
follows:
σc(R ⋈ S) = (σc(R)) ⋈ S
7. Commuting π with ⋈ (or ×): Suppose that the projection list is L = {A1, ..., An, B1, ..., Bm},
where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S. If the join condition c
involves only attributes in L, the two operations can be commuted as follows:
πL(R ⋈c S) = (πA1, ..., An(R)) ⋈c (πB1, ..., Bm(S))
If the join condition c contains additional attributes not in L, these must be added to the
projection list, and a final π operation is needed.
8. Commutativity of set operations: The set operations ∪ and ∩ are commutative, but − is not.
9. Associativity of ⋈, ×, ∪, and ∩: These four operations are individually associative; that is,
if θ stands for any one of these four operations (throughout the expression), we have:
(R θ S) θ T = R θ (S θ T)
10. Commuting σ with set operations: The σ operation commutes with ∪, ∩, and −. If θ stands
for any one of these three operations, we have:
σc(R θ S) = (σc(R)) θ (σc(S))
1. Using Rule 1, break up any select operations with conjunctive conditions into a cascade of
select operations.
2. Using rules 2, 4, 6, and 10 concerning the commutativity of select with other operations,
move each select operation as far down the query tree as is permitted by the attributes involved in
the select condition.
3. Using rule 9 concerning associativity of binary operations, rearrange the leaf nodes of the tree
so that the leaf node relations with the most restrictive select operations are executed first in the
query tree representation.
4. Using Rule 12, combine a Cartesian product operation with a subsequent select operation
in the tree into a join operation.
5. Using rules 3, 4, 7, and 11 concerning the cascading of project and the commuting of project
with other operations, break down and move lists of projection attributes down the tree as far as
possible by creating new project operations as needed.
6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.
2. Perform select operations as early as possible to reduce the number of tuples and perform
project operations as early as possible to reduce the number of attributes. (This is done by moving
select and project operations as far down the tree as possible.)
3. The select and join operations that are most restrictive should be executed before other similar
operations. (This is done by reordering the leaf nodes of the tree among themselves and
adjusting the rest of the tree appropriately.)
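The payoff of performing selects early can be sketched in Python: filtering before a join touches far fewer intermediate tuples than joining first, while producing the same answer. The Employee/Department data below are illustrative assumptions.

```python
# Sketch: why pushing a SELECT below a JOIN shrinks intermediate results.
# Illustrative data: 1000 employees spread over 10 departments; the query
# keeps only department 5.
Employee = [{"SSN": i, "DNO": i % 10} for i in range(1000)]
Department = [{"DNUMBER": d, "DNAME": f"D{d}"} for d in range(10)]

# Strategy 1: join first, then select (large intermediate result).
joined = [{**e, **d} for e in Employee for d in Department
          if e["DNO"] == d["DNUMBER"]]
r1 = [t for t in joined if t["DNO"] == 5]

# Strategy 2: select first, then join (small intermediate result).
filtered = [e for e in Employee if e["DNO"] == 5]
r2 = [{**e, **d} for e in filtered for d in Department
      if e["DNO"] == d["DNUMBER"]]

print(len(joined), len(filtered))   # 1000 vs 100 intermediate tuples
assert r1 == r2                     # same answer, much cheaper plan
```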
Cost-Based Query Optimization: estimate and compare the costs of executing a query using
different execution strategies and choose the strategy with the lowest cost estimate. The cost
components of query execution include:
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
NB: Different database systems may focus on different cost components.
Catalog Information Used in Cost Functions
• Information about the size of a file:
– number of records (tuples) (r)
– record size (R)
– number of blocks (b)
– blocking factor (bfr)
Semantic Query Optimization
Semantic query optimization uses constraints specified on the database schema to modify one
query into another query that is more efficient to execute. Consider the following SQL query:
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN = M.SSN AND E.SALARY > M.SALARY;
Explanation:
Suppose that we had a constraint on the database schema that stated that no employee
can earn more than his or her direct supervisor. If the semantic query optimizer checks for
the existence of this constraint, it need not execute the query at all because it knows
that the result of the query will be empty.
Chapter Three
Transaction Processing Concepts
Introduction
A database management system (DBMS) is a software package/system that facilitates the creation
and maintenance of databases. DBMSs support different types of databases. Databases can be
classified according to the number of users, the database location, and the expected type and
extent of use. Depending on the number of users, databases are classified into single-user
databases and multiuser databases.
Single user databases: At most one user at a time can use the system
Multiuser databases: Many users can access the system concurrently.
Transaction Support
Transaction: An action, or series of actions, carried out by a single user or application
program, which reads or updates the contents of the database. A transaction is a logical unit of
work on the database. It may be an entire program, a part of a program, or a single command (for
example, the SQL command INSERT or UPDATE), and it may involve any number of operations
on the database. In the database context, the execution of an application program can be thought
of as one or more transactions with non-database processing taking place in between. A
transaction should always transform the database from one consistent state to another.
Example:
Two sample transactions to illustrate the concepts of a transaction:
a. Transaction T1
b. Transaction T2
A transaction should always transform the database from one consistent state to another,
although we accept that consistency may be violated while the transaction is in progress. A
transaction can have one of two outcomes. If it completes successfully, the transaction is said to
have committed and the database reaches a new consistent state. On the other hand, if the
transaction does not execute successfully, the transaction is aborted. If a transaction is aborted, the
database must be restored to the consistent state it was in before the transaction started.
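A minimal sketch of commit versus abort using Python's sqlite3 (the account table and amounts are illustrative): a failure inside the transaction causes a rollback, restoring the prior consistent state.

```python
# Sketch: a transaction either commits (reaching a new consistent state)
# or is rolled back (restored to the prior consistent state).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
con.commit()

try:
    with con:  # one transaction: both updates succeed, or neither does
        con.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
        con.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass  # the 'with' block rolled the transaction back on the exception

balances = dict(con.execute("SELECT id, balance FROM account"))
print(balances)  # {1: 100, 2: 50} -- unchanged: the aborted work was undone
```

The `with con:` context manager commits on normal exit and rolls back on an exception, which mirrors the commit/abort outcomes described above.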
Concurrency Control: When two or more transactions are accessing the database
simultaneously and at least one is updating data, there may be interference that can result in
inconsistencies. Therefore, we examine three potential problems caused by concurrency: the
lost update problem, the uncommitted dependency problem, and the inconsistent analysis
problem.
The Lost Update Problem: An apparently successfully completed update operation by one user
can be overridden by another user. This occurs when two transactions that access the same database
items have their operations interleaved in a way that makes the value of some database item
incorrect.
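The interleaving behind a lost update can be sketched in a few lines of Python; the balances are illustrative, and no real concurrency is needed to show the effect.

```python
# Sketch: the lost update problem. T1 withdraws 10 and T2 deposits 100,
# but both read the shared balance before either writes it back.
balance = 100

t1_local = balance          # T1: read(bal_x) -> 100
t2_local = balance          # T2: read(bal_x) -> 100
balance = t2_local + 100    # T2: write(bal_x) -> 200
balance = t1_local - 10     # T1: write(bal_x) -> 90, overwriting T2's update

print(balance)  # 90, though the correct serial result would be 190
```

T2's apparently successful deposit has been lost, which is exactly the anomaly a concurrency control protocol such as two-phase locking prevents.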
The Uncommitted Dependency (Dirty Read) Problem: occurs when one transaction is allowed to
see the intermediate results of another transaction before it has committed. That is, one
transaction updates a database item and then fails for some reason, but the updated item is
accessed by another transaction before it is changed back to its original value.
Figure 3.2: The Temporary Update (or Dirty Read) Problem
The inconsistent analysis problem: also known as The Incorrect Summary Problem occurs when a
transaction reads several values from the database but a second transaction updates some of them
during the execution of the first. For example, a transaction that is summarizing data in a database
(for example, totalling balances) will obtain inaccurate results if, while it is executing, other
transactions are updating the database. Another example is if one transaction is calculating an
aggregate summary function on a number of records while other transactions are updating some
of these records, the aggregate function may calculate some values before they are updated and
others after they are updated.
iii. Recovery Services
Database recovery is the process of restoring the database to a correct state following a failure.
The failure may be the result of a system crash due to hardware or software errors, a media failure,
such as a head crash, or a software error in the application, such as a logical error in the program
that is accessing the database. It may also be the result of unintentional or intentional corruption
or destruction of data or facilities by system administrators or users. Whatever the underlying cause
of the failure, the DBMS must be able to recover from the failure and restore the database to a
consistent state.
Types of failures include the following:
1. A computer failure (system crash): A hardware, software, or network error occurs in the
computer system during transaction execution.
2. A transaction or system error: Some operation in the transaction may cause it to fail, such as
integer overflow or division by zero. Transaction failure may also occur because of erroneous
parameter values or because of a logical programming error.
In addition, the user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction: Certain conditions necessitate
cancellation of the transaction. For example, data for the transaction may not be found. A
condition, such as insufficient account balance in a banking database, may cause a transaction,
such as a fund withdrawal from that account, to be cancelled. A programmed abort in the
transaction causes it to fail.
4. Concurrency control enforcement: The concurrency control method may decide to abort the
transaction, to be restarted later, because it violates serializability or is involved in a
deadlock.
5. Disk failure: Some disk blocks may lose their data because of a read or write malfunction
or because of a disk read/write head crash. This may happen during a read or a write operation of
the transaction.
6. Physical problems and catastrophes: This refers to an endless list of problems that includes
power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and
mounting of a wrong tape by the operator.
A transaction is an atomic unit of work that is either completed in its entirety or not done at all.
For recovery purposes, the system needs to keep track of when the transaction starts,
terminates, and commits or aborts.
Figure 3.4: State transition diagram illustrating the states for transaction execution
Recovery Manager
If a failure occurs during a transaction, the database could be left in an inconsistent state. It is the task of the recovery manager to ensure that the database is restored to the state it was in before the start of the transaction, and therefore to a consistent state. The recovery manager keeps track of the following operations for recovery purposes:
undo: Similar to rollback, except that it applies to a single operation rather than to a whole transaction.
redo: Specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database.
The log is kept on disk, so it is not affected by any type of failure except for disk or catastrophic
failure. In addition, the log is periodically backed up to archival storage (tape) to guard against
such catastrophic failures.
1. Because the log contains a record of every write operation that changes the value of
some database item, it is possible to undo the effect of these write operations of a
transaction T by tracing backward through the log and resetting all items changed by a write
operation of T to their old_values.
2. We can also redo the effect of the write operations of a transaction T by tracing forward through
the log and setting all items changed by a write operation of T (that did not get done permanently)
to their new_values.
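The two tracing directions described above can be sketched in a few lines. The following is a minimal, illustrative model (not a real DBMS): the log is assumed to be a list of write records (transaction_id, item, old_value, new_value), and the database a simple dictionary.

```python
# Minimal sketch of log-based undo and redo (illustrative only).
# Each log record describes one write: (transaction_id, item, old_value, new_value).

def undo(log, txn, db):
    """Trace the log backward, resetting txn's written items to their old_values."""
    for t, item, old, new in reversed(log):
        if t == txn:
            db[item] = old

def redo(log, txn, db):
    """Trace the log forward, setting txn's written items to their new_values."""
    for t, item, old, new in log:
        if t == txn:
            db[item] = new

# Example: T1 wrote X (5 -> 8) and then Y (10 -> 12), but must be rolled back.
db = {"X": 8, "Y": 12}
log = [("T1", "X", 5, 8), ("T1", "Y", 10, 12)]
undo(log, "T1", db)
print(db)  # {'X': 5, 'Y': 10}
```

Note that undo scans backward while redo scans forward, exactly as in points 1 and 2 above; scanning in the wrong direction would apply intermediate values out of order when a transaction writes the same item more than once.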
• Atomicity: A transaction is an atomic unit of processing; it should either be performed in its entirety or not performed at all.
• Isolation: A transaction should appear as though it is being executed in isolation from other transactions, even though many transactions are executing concurrently. That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently.
However, the aim of a multi-user DBMS is also to maximize the degree of concurrency or
parallelism in the system, so that transactions that can execute without interfering with one another
can run in parallel. For example, transactions that access different parts of the database can be
scheduled together without interference.
A schedule is a sequence of the operations performed by a set of concurrent transactions that preserves the order of the operations within each individual transaction. Schedules can be classified as follows:
1. Recoverable Schedule: A schedule S is recoverable if no transaction T in S commits until all transactions T′ that have written an item that T reads have committed. Equivalently, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj. In a recoverable schedule, no committed transaction ever needs to be rolled back.
2. Cascadeless Schedule: A schedule is cascadeless (it avoids cascading rollback) if every transaction reads only items that were written by already committed transactions. This ensures that aborting one transaction never forces the rollback of another.
3. Strict Schedule: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed. Every strict schedule is cascadeless, and every cascadeless schedule is recoverable.
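The recoverable and cascadeless conditions above can be checked mechanically. The sketch below is illustrative and uses an encoding assumed here for demonstration: a schedule is a list of tuples, ("r", txn, item) and ("w", txn, item) for reads and writes, and ("c", txn) for a commit.

```python
# Illustrative checks for schedule classes.
# Encoding (assumed): ("r", txn, item), ("w", txn, item), ("c", txn).

def commit_pos(schedule, txn):
    """Index of txn's commit in the schedule, or None if it never commits."""
    for i, op in enumerate(schedule):
        if op[0] == "c" and op[1] == txn:
            return i
    return None

def read_from_pairs(schedule):
    """Yield (read_index, writer_txn, reader_txn) for each read of another
    transaction's write."""
    last_writer = {}  # item -> transaction that wrote it most recently
    for i, op in enumerate(schedule):
        if op[0] == "w":
            last_writer[op[2]] = op[1]
        elif op[0] == "r":
            w = last_writer.get(op[2])
            if w is not None and w != op[1]:
                yield i, w, op[1]

def is_recoverable(schedule):
    # A reader may commit only after every transaction it read from commits.
    for _, writer, reader in read_from_pairs(schedule):
        cw, cr = commit_pos(schedule, writer), commit_pos(schedule, reader)
        if cr is not None and (cw is None or cw > cr):
            return False
    return True

def is_cascadeless(schedule):
    # Every value read must come from an already-committed transaction.
    for i, writer, _ in read_from_pairs(schedule):
        cw = commit_pos(schedule, writer)
        if cw is None or cw > i:
            return False
    return True

# T2 reads T1's uncommitted write, but T1 commits first:
# recoverable, yet not cascadeless.
s1 = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T1"), ("c", "T2")]
print(is_recoverable(s1), is_cascadeless(s1))  # True False
```

The example also shows why the classes are nested: delaying T2's read until after T1's commit would make the schedule cascadeless (and still recoverable).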
Serializability
A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.
The objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another, and thereby produce a database state that could be produced by a serial execution. In serializability, the ordering of read and write operations is important:
• If two transactions only read a data item, they do not conflict and order is not important.
• If two transactions either read or write completely separate data items, they do not conflict and order is not important.
• If one transaction writes a data item and another either reads or writes the same data item, the order of execution is important.
1. Result equivalent: Two schedules are result equivalent if they produce the same final state of the database.
2. Conflict equivalent: Two schedules are conflict equivalent if the order of any two conflicting operations is the same in both schedules. Two operations in a schedule conflict if they belong to different transactions, access the same database item, and either both are write_item operations or one is a write_item and the other a read_item. If two conflicting operations are applied in different orders in two schedules, the effect can be different on the database or on the transactions in the schedule, and hence the schedules are not conflict equivalent.
Being serializable is not the same as being serial. A serializable schedule interleaves the operations of its transactions, yet is guaranteed to be correct because it is equivalent to some serial schedule. In practice, testing every schedule for serializability is impractical; rather, protocols, or rules, are developed that guarantee that any schedule following them will be serializable. An algorithm can also be used to test a given schedule for conflict serializability.
View Equivalence and View Serializability
Another less restrictive definition of equivalence of schedules is called view equivalence. This
leads to another definition of serializability called view serializability. Two schedules S and S′ are said to be view equivalent if the following three conditions hold:
1. The same set of transactions participates in S and S’, and S and S’ include the same operations
of those transactions.
2. For any operation Ri(X) of Ti in S, if the value of X read by the operation has been written by
an operation Wj(X) of Tj (or if it is the original value of X before the schedule started), the same
condition must hold for the value of X read by operation Ri(X) of Ti in S’.
3. If the operation Wk(Y) of Tk is the last operation to write item Y in S, then Wk(Y) of Tk must
also be the last operation to write item Y in S’.
The idea behind view equivalence is that, as long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The read operations are hence said to see the same view in both schedules. A schedule S is said to be view serializable if it is view equivalent to a serial schedule.
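The three conditions can also be checked mechanically. The sketch below is illustrative (the tuple encoding ("r"|"w", txn, item) is assumed): it summarizes a schedule by its operation multiset (condition 1), its read-from relationships (condition 2), and its final writer per item (condition 3), then compares the summaries.

```python
# Illustrative view-equivalence check.
# Encoding (assumed): ("r", txn, item) for reads, ("w", txn, item) for writes.
from collections import Counter, defaultdict

def view_summary(schedule):
    last_writer = {}           # item -> transaction of the most recent write
    reads = defaultdict(list)  # txn -> [(item, source_txn)] in program order
    for kind, txn, item in schedule:
        if kind == "r":
            # source is None when the read sees the initial database value
            reads[txn].append((item, last_writer.get(item)))
        else:  # "w"
            last_writer[item] = txn
    # (condition 1, condition 2, condition 3)
    return Counter(schedule), dict(reads), last_writer

def view_equivalent(s1, s2):
    return view_summary(s1) == view_summary(s2)

# Classic blind-write example: view equivalent to the serial order
# T1, T2, T3 even though it is not conflict serializable.
s = [("r", "T1", "X"), ("w", "T2", "X"), ("w", "T1", "X"), ("w", "T3", "X")]
serial = [("r", "T1", "X"), ("w", "T1", "X"), ("w", "T2", "X"), ("w", "T3", "X")]
print(view_equivalent(s, serial))  # True
```

In the example, T2 and T3 write X without ever reading it (blind writes); only the last write survives, which is why the read-from and final-writer summaries match the serial schedule.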
Relationship between view and conflict equivalence
The definitions of conflict serializability and view serializability are similar if a condition known
as the constrained write assumption (or no blind writes) holds on all transactions in the schedule.
This condition states that any write operation wi(X) in Ti is preceded by a ri(X) in Ti and that the
value written by wi(X) in Ti depends only on the value of X read by ri(X).
This assumes that computation of the new value of X is a function f(X) based on the old value of
X read from the database. A blind write is a write operation in a transaction T on an item X that is
not dependent on the value of X, so it is not preceded by a read of X in the transaction
T.
Conflict serializability is stricter than view serializability: any conflict serializable schedule is also view serializable, but not vice versa. With unconstrained writes (blind writes), a schedule that is view serializable is not necessarily conflict serializable.
The semantics of debit-credit operations is that they update the value of a data item X by either
subtracting from or adding to the value of the data item. Because addition and subtraction
operations are commutative—that is, they can be applied in any order—it is possible to produce
correct schedules that are not serializable.
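Because addition and subtraction commute, every ordering of a set of debits and credits yields the same final balance; a tiny illustration (the amounts and starting balance are assumed):

```python
# Debits and credits on an item X commute: every ordering gives one result.
import itertools

ops = [+100, -40, +25, -10]   # assumed credits/debits against X = 500
finals = {500 + sum(p) for p in itertools.permutations(ops)}
print(finals)  # one final balance across all 24 orderings
```

This is why a debit-credit system can accept interleavings that a serializability test would reject: correctness here follows from the semantics of the operations, not from equivalence to a serial schedule.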
The access mode can be specified as READ ONLY or READ WRITE. The default is READ
WRITE, unless the isolation level of READ UNCOMMITTED is specified, in which case READ
ONLY is assumed. A mode of READ WRITE allows select, update, insert, delete, and create
commands to be executed. A mode of READ ONLY, as the name implies, is simply for data
retrieval.
This transaction consists of first inserting a new row in the EMPLOYEE table and then updating the salary of all employees who work in department 2. If an error occurs on any of the SQL statements, the entire transaction is rolled back. This implies that any salary updated by this transaction would be restored to its previous value and that the newly inserted row would be removed.
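The rollback behavior described above can be demonstrated with Python's built-in sqlite3 module. The EMPLOYEE schema below is assumed for illustration, and the error is simulated deliberately:

```python
# Sketch of the transaction above: insert a row, update department-2 salaries,
# then roll everything back when an error occurs. Schema is assumed.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (ssn TEXT PRIMARY KEY, dno INTEGER, salary REAL)")
con.execute("INSERT INTO EMPLOYEE VALUES ('111', 2, 30000)")
con.commit()

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("INSERT INTO EMPLOYEE VALUES ('999', 2, 25000)")
        con.execute("UPDATE EMPLOYEE SET salary = salary * 1.1 WHERE dno = 2")
        raise sqlite3.OperationalError("simulated failure")  # forces rollback
except sqlite3.Error:
    pass

# Both statements were undone: no new row, and the original salary is back.
print(con.execute("SELECT COUNT(*), MAX(salary) FROM EMPLOYEE").fetchone())
# (1, 30000.0)
```

Using the connection as a context manager makes the all-or-nothing (atomicity) property explicit: either both SQL statements take effect, or neither does.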