ADB - Chapter 1-3

This document provides an overview of object-oriented database concepts. It discusses the limitations of relational databases that led to the development of object-oriented databases. Key concepts covered include object identity, structure, and type constructors. Object-oriented databases aim to directly model real-world objects and their relationships to address the shortcomings of relational databases for complex applications.


Advanced Database Systems Handout

Chapter One
Concepts for Object Oriented Databases

Introduction
Database technology has concentrated on the static aspects of information storage. With the
arrival of the third generation of Database Management Systems, namely Object-Oriented
Database Management Systems (OODBMSs) and Object-Relational Database Management
Systems (ORDBMSs), the two disciplines have been combined to allow the concurrent
modelling of both data and the processes acting upon the data. In database systems, we have
seen the widespread acceptance of RDBMSs for traditional business applications, such as order
processing, inventory control, banking, and airline reservations. However, existing RDBMSs
have proven inadequate for applications whose needs are quite different from those of
traditional business database applications. These applications include: computer-aided design
(CAD), computer-aided manufacturing (CAM), computer-aided software engineering (CASE),
network management systems, digital publishing, geographic information systems (GIS),
office information systems (OIS) and multimedia systems, interactive and dynamic Web sites.

Strengths of the relational model are its simplicity, its suitability for Online Transaction
Processing (OLTP), its symmetric structure, and its support for data independence. However,
the relational data model and relational DBMSs have significant weaknesses:
 poor representation of 'real world' entities;
 semantic overloading;
 poor support for integrity and general constraints;
 a homogeneous data structure and limited operations;
 difficulty handling recursive queries;
 impedance mismatch with programming languages.
Furthermore, transactions in business processing are generally short-lived, and concurrency
control primitives and protocols such as two-phase locking are not particularly suited to
long-duration transactions; schema changes are difficult; and RDBMSs were designed for
content-based associative access (that is, declarative statements with selection based on one
or more predicates) and are poor at navigational access (that is, access based on movement
between individual records).
These limitations of early data models, the need for more complex applications, the need for
additional data modeling features, and the increased use of object-oriented programming
languages resulted in the creation of object-oriented databases.

DBMS Database Models


A database model defines the logical design and structure of a database and defines how data
will be stored, accessed and updated in a database management system. The main database
models are: the Hierarchical Model, the Network Model, the Entity-relationship Model, the
Relational Model and the Object Oriented Model.
 Hierarchical Model: This database model organises data into a tree-like-structure, with
a single root, to which all the other data is linked. The hierarchy starts from the Root
data, and expands like a tree, adding child nodes to the parent nodes.
 Network Model: This is an extension of the Hierarchical model. In this model, data is
organised more like a graph; records are allowed to have more than one parent node, and
the model was commonly used to map many-to-many data relationships.
 Entity-relationship Model: In this database model, relationships are created by dividing
object of interest into entity and its characteristics into attributes. Different entities are
related using relationships.
 Relational Model: In this data model, data is organised in two-dimensional tables and
the relationship is maintained by storing a common field.
 The Object Oriented (OO) Data Model: In this data model, both data and their
relationships are contained in a single structure known as an object.

1.1 Overview of Object-Oriented Concepts


Definitions:
 An Object is a uniquely identifiable entity that contains both the attributes that describe
the state of a ‘real world’ object and the actions that are associated with it.
 OODM: A (logical) data model that captures the semantics of objects supported in
object-oriented programming.
 OODB: A persistent and sharable collection of objects defined by an OODM.
 OODBMS: The manager of an OODB.
 A data model: A particular way of describing data, relationships between data, and
constraints on the data.
 Data persistence: The ability of data to outlive the execution of the program that
created it.
OO databases try to maintain a direct correspondence between real-world and database objects
so that objects do not lose their integrity and identity and can easily be identified and operated
upon. The object concept originated with the Simula programming language in the 1960s. Like a
program variable, an object is composed of two components: state (value) and behaviour
(operations); unlike a simple variable, however, it will typically have a complex data
structure as well as specific operations defined by the programmer. In OO databases, objects
may have an object structure of arbitrary complexity in order to contain all of the necessary
information that describes the object. In contrast, in traditional database systems, information
about a complex object is often scattered over many relations or records, leading to loss of
direct correspondence between a real-world object and its database representation.

The internal structure of an object in OOPLs includes the specification of instance variables,
which hold the values that define the internal state of the object. An instance variable is similar
to the concept of an attribute, except that instance variables may be encapsulated within the
object and thus are not necessarily visible to external users. Some OO models insist that all
operations a user can apply to an object must be predefined. This forces a complete
encapsulation of objects. To encourage encapsulation, an operation is defined in two parts:
 Signature or interface of the operation, specifies the operation name and arguments
(or parameters).
 Method or body, specifies the implementation of the operation.
The object type definition includes an operation signature for each operation that specifies the
name of the operation, the names and types of each argument, the names of any exceptions that
can be raised, and the types of the values returned, if any. Operations can be invoked by passing
a message to an object, which includes the operation name and the parameters. The object then
executes the method for that operation. This encapsulation permits modification of the internal
structure of an object, as well as the implementation of its operations, without the need to
disturb the external programs that invoke these operations.
Some OO systems provide capabilities for dealing with multiple versions of the same object (a
feature that is essential in design and engineering applications). For example, an old version
of an object that represents a tested and verified design should be retained until the new
version is tested and verified; this is crucial for designs in manufacturing, process control,
architecture, and software systems.
Operator polymorphism (operator overloading): This refers to an operation’s ability to be
applied to different types of objects; in such a situation, an operation name may refer to several
distinct implementations, depending on the type of objects it is applied to.
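As a sketch in Python (the shape classes are hypothetical illustrations, not taken from the handout), the same operation name can have a distinct implementation for each type of object:

```python
# Operator polymorphism: the single operation name area() refers to a
# different implementation depending on the type of the object.
import math

class Rectangle:
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):          # implementation for rectangles
        return self.width * self.height

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):          # a different implementation, same name
        return math.pi * self.radius ** 2

shapes = [Rectangle(3, 4), Circle(1)]
for s in shapes:
    print(s.area())          # dispatches on the object's type
```

The caller sends the same message, `area()`, to every object; which method body runs depends on the receiving object's type.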

1.2 Object Identity, Object Structure, and Type Constructors


An object is described by four characteristics: structure, identifier, name, and lifetime.
Object Identity
An OO database system provides a unique identity to each independent object stored in the
database. This unique identity is typically implemented via a unique, system-generated object
identifier, or OID. In an object-oriented system, each object is assigned an Object Identifier
(OID) when it is created that is:
 system-generated;
 unique to that object;
 Invariant/Immutable, in the sense that it cannot be altered during its lifetime. Once the
object is created, this OID will not be reused for any other object, even after the object
has been deleted;
 Independent of the values of its attributes (that is, its state). Two objects could have
the same state but would have different identities;
 Invisible to the user (ideally).
Thus, object identity ensures that an object can always be uniquely identified, thereby
automatically providing entity integrity. There are several advantages to using OIDs as the
mechanism for object identity:
 They are efficient: OIDs require minimal storage within a complex object. Typically,
they are smaller than textual names, foreign keys, or other semantic-based references.
 They are fast: OIDs point to an actual address or to a location within a table that gives
the address of the referenced object. This means that objects can be located quickly
whether they are currently stored in local memory or on disk.
 They are independent: of content OIDs do not depend upon the data contained in the
object in any way. This allows the value of every attribute of an object to change, but
for the object to remain the same object with the same OID.
 They cannot be modified by the user: if the OIDs are system-generated and kept
invisible, or at least read-only, the system can ensure entity and referential integrity
more easily. Further, this avoids the user having to maintain integrity.
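The OID properties above can be mimicked in Python with the standard uuid module; this is only an analogy (a real OODBMS generates OIDs internally), and the DBObject class is an assumption for illustration:

```python
# A minimal sketch of OID semantics: system-generated at creation,
# read-only, and independent of the object's attribute values.
import uuid

class DBObject:
    def __init__(self, **state):
        self._oid = uuid.uuid4()   # system-generated at creation time
        self.state = dict(state)

    @property
    def oid(self):                 # exposed read-only to the user
        return self._oid

a = DBObject(name="Ann", salary=30000)
b = DBObject(name="Ann", salary=30000)

# Identical state, different identities.
assert a.state == b.state and a.oid != b.oid

oid_before = a.oid
a.state["salary"] = 35000          # the state changes ...
assert a.oid == oid_before         # ... but the identity does not
```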
Object Structure
An object's structure describes how the object is built from simpler values. In OODBs, the
state (current value) of a complex object may be constructed from other objects (or other
values) using type constructors.

Type Constructors
A type constructor is a feature of a typed formal language that builds new types from old ones.
Basic types are considered to be built using nullary type constructors. Complex objects are
built from simpler ones by applying constructors to them. Attributes can be classified as simple
or complex. A simple attribute can be a primitive type such as integer, string, real, and so on,
which takes on literal values. The simplest objects are values such as integers, characters,
byte strings of any length, booleans, and floats. The basic type constructors are atom, tuple,
and set; other constructors include list, bag, and array.
 The atom constructor is used to represent all basic atomic values, such as integers, real
numbers, character strings, booleans, and any other basic data types that the system
supports directly.
 Sets are critical because they are a natural way of representing collections from the real
world.
 Tuples are critical because they are a natural way of representing properties of an entity.
 Lists or arrays are important because they capture order

Figure: Specifying the object types EMPLOYEE, DATE, and DEPARTMENT using type constructors.
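The figure itself is not reproduced here, but the idea of building complex state with type constructors can be sketched with Python's built-in types (the field names and values are illustrative assumptions):

```python
# atom  -> basic values (int, str, bool, ...)
# tuple -> a record of named fields (modelled here as a dict)
# set   -> an unordered collection; a list additionally keeps order

date = {"Year": 2024, "Month": 5, "Day": 12}        # tuple of atoms

employee = {                                         # tuple constructor
    "Name": "Smith",                                 # atom
    "Salary": 40000,                                 # atom
    "BirthDate": date,                               # nested tuple
}

department = {
    "Name": "Research",
    "Locations": {"Addis Ababa", "Adama"},           # set constructor
    "Employees": [employee],                         # list constructor
}

print(department["Employees"][0]["BirthDate"]["Year"])   # 2024
```

Because constructors nest, a department's state can contain employees, whose state contains a date, down to the atomic values.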

1.3 Encapsulation of Operations, Methods, and Persistence


Encapsulation is one of the main characteristics of OO languages and systems; it is related to
the concepts of abstract data types and information hiding in programming languages. An
Abstract Data Type (ADT) is an abstract concept defined by axioms which represent some data
and operations on that data. An abstract data type defines not only a data representation for
objects of the type but also the set of operations that can be performed on objects of the type.
Furthermore, the abstract data type can protect the data representation from direct access by
other parts of the program. Abstract data types focus on what, not how: they are framed
declaratively and do not specify algorithms or data structures. Common examples include
lists, stacks, sets, etc.
The concept of encapsulation means that an object contains both a data structure and the set of
operations that can be used to manipulate it. The concept of information hiding means that the
external aspects of an object are separated from its internal details, which are hidden from the
outside world. In this way the internal details of an object can be changed without affecting the
applications that use it, provided the external details remain the same. This prevents the
parts of an application from becoming so interdependent that a small change has enormous
ripple effects. In other words, information hiding provides a form of data independence.
These concepts simplify
the construction and maintenance of applications through modularization. An object is a ‘black
box’ that can be constructed and modified independently of the rest of the system, provided the
external interface is not changed.
There are two views of encapsulation: the object-oriented programming language (OOPL) view
and the database adaptation of that view. In some OOPLs encapsulation is achieved through
Abstract Data Types (ADTs). In this view an object has an interface part and an implementation
part. The interface provides a specification of the operations that can be performed on the
object; the implementation part consists of the data structure for the ADT and the functions
that realize the interface. Only the interface part is visible to other objects or users. In the
database view, proper encapsulation is achieved by ensuring that programmers have access
only to the interface part. In this way encapsulation provides a form of logical data
independence: we can change the internal implementation of an ADT without changing any of
the applications using that ADT.
Specifying Object Behaviour via Class Operations (methods)
The main idea is to define the behaviour of a type of object based on the operations that can be
externally applied to objects of that type. Methods/operations define the behaviour of the
object. They can be used to change the object’s state by modifying its attribute values or to
query the value of selected attributes. In general, the implementation of an operation can be
specified in a general-purpose programming language that provides flexibility and power in
defining the operations.
A method consists of a name and a body that performs the behaviour associated with the
method name. In an object-oriented language, the body consists of a block of code that carries
out the required functionality. For example, the next code represents the method to update a
member of staff’s salary. The name of the method is updateSalary, with an input parameter
increment, which is added to the instance variable salary to produce a new salary.
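The code referred to above is not reproduced in this copy; a minimal Python sketch of what such a method might look like (the surrounding Staff class is an assumption, only the method name and parameter come from the text):

```python
# A hypothetical Staff class; updateSalary adds the increment to the
# salary instance variable, as described in the text.
class Staff:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary           # instance variable (state)

    def updateSalary(self, increment):
        self.salary += increment       # behaviour that changes the state

s = Staff("Smith", 30000)
s.updateSalary(5000)                   # send the updateSalary message
print(s.salary)                        # 35000
```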

Messages are the means by which objects communicate. A message is simply a request from
one object (the sender) to another object (the receiver) asking the second object to execute one
of its methods. The sender and receiver may be the same object.
Classes are blueprints for defining a set of similar objects. Thus, objects that have the same
attributes and respond to the same messages can be grouped together to form a class. The
attributes and associated methods are defined once for the class rather than separately for each
object. A class is also an object and has its own attributes and methods, referred to as class
attributes and class methods, respectively. Class attributes describe the general characteristics
of the class, such as totals or averages. For database applications, the requirement that all
objects be completely encapsulated is too stringent. One way of relaxing this requirement is to
divide the structure of an object into visible and hidden attributes (instance variables).

Figure: Adding operations to the definitions of EMPLOYEE and DEPARTMENT

Specifying Object Persistence via Naming and Reachability

 Naming Mechanism: Assign an object a unique persistent name through which it can
be retrieved by this and other programs.
 Reachability Mechanism: Make the object reachable from some persistent object. An
object B is said to be reachable from an object A if a sequence of references in the
object graph lead from object A to object B.
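As a rough analogy in Python, the standard shelve module can bind an object to a persistent name; the file name and keys below are assumptions for illustration:

```python
# Persistence by naming: an object bound to a persistent name survives
# the program. Reachability: objects referenced from a persistent
# object are saved along with it.
import shelve

dept = {"Name": "Research", "Employees": [{"Name": "Smith"}]}

with shelve.open("company_db") as db:
    db["research_dept"] = dept     # naming: dept is now persistent

with shelve.open("company_db") as db:
    restored = db["research_dept"]
    # The employee object was saved too, because it is reachable
    # from the named department object.
    print(restored["Employees"][0]["Name"])   # Smith
```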
1.4 Type Hierarchies and Inheritance
An entity type represents a set of entities of the same type such as Staff, Branch, and
PropertyForRent. We can also form entity types into a hierarchy containing superclasses and
subclasses.
 Superclass: An entity type that includes one or more distinct subgroupings of its
occurrences, which need to be represented in a data model.
 Subclass: A distinct subgrouping of occurrences of an entity type, which needs to be
represented in a data model.
Entity types that have distinct subclasses are called superclasses. For example, the entities that
are members of the Staff entity type may be classified as Manager, SalesPersonnel, and
Secretary. In other words, the Staff entity is referred to as the superclass of the Manager,
SalesPersonnel, and Secretary subclasses. The relationship between a superclass and any one
of its subclasses is called a superclass/subclass relationship. For example, Staff/Manager has
a superclass/subclass relationship.
Inheritance allows one class to be defined as a special case of a more general class. These
special cases are known as subclasses and the more general cases are known as superclasses.
The process of forming a superclass is referred to as generalization; forming a subclass is
specialization. Generalization is the process of minimizing the differences between entities by
identifying their common characteristics. The process of generalization is a bottom-up
approach, which results in the identification of a generalized superclass from the original entity
types. Specialization is the process of maximizing the differences between members of an
entity by identifying their distinguishing characteristics. Specialization is a top-down approach
to defining a set of superclasses and their related subclasses. The set of subclasses is defined
on the basis of some distinguishing characteristics of the entities in the superclass.

A subclass inherits all the properties of its superclass and additionally defines its own unique
properties (attributes and methods). All instances of the subclass are also instances of the
superclass. The principle of substitutability states that an instance of the subclass can be used
whenever a method or a construct expects an instance of the superclass.

Type (class) Hierarchy


A type in its simplest form can be defined by giving it a type name and then listing the names
of its visible (public) functions. When specifying a type in this section, we use the following
format:
TYPE_NAME: function, function, . . . , function
Example: PERSON: Name, Address, Birthdate, Age, SSN
Subtype/Subclass: A new type created when the designer or user needs a type that is similar
but not identical to an already defined type.
Supertype/Superclass: The already defined, more general type; a subtype inherits all the
functions of its supertype.

Example (1): creating subtypes from PERSON supertype


PERSON: Name, Address, Birthdate, Age, SSN
EMPLOYEE: Name, Address, Birthdate, Age, SSN, Salary, HireDate, Seniority
STUDENT: Name, Address, Birthdate, Age, SSN, Major, GPA

– EMPLOYEE subtype-of PERSON: Salary, HireDate, Seniority


– STUDENT subtype-of PERSON: Major, GPA
Example (2):
– Consider a type that describes objects in plane geometry, which may be defined as
follows:
– GEOMETRY_OBJECT: Shape, Area, ReferencePoint
– Now suppose that we want to define a number of subtypes for the
GEOMETRY_OBJECT type, as follows:
– RECTANGLE subtype-of GEOMETRY_OBJECT: Width, Height
– TRIANGLE subtype-of GEOMETRY_OBJECT: Side1, Side2, Angle
– CIRCLE subtype-of GEOMETRY_OBJECT: Radius
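Example (1) can be sketched with Python inheritance (the attribute values are invented); it also illustrates substitutability:

```python
# PERSON supertype with EMPLOYEE and STUDENT subtypes; each subtype
# inherits the supertype's functions and adds its own.
class Person:
    def __init__(self, name, address, birthdate, ssn):
        self.name, self.address = name, address
        self.birthdate, self.ssn = birthdate, ssn

class Employee(Person):                 # EMPLOYEE subtype-of PERSON
    def __init__(self, name, address, birthdate, ssn, salary, hire_date):
        super().__init__(name, address, birthdate, ssn)
        self.salary, self.hire_date = salary, hire_date

class Student(Person):                  # STUDENT subtype-of PERSON
    def __init__(self, name, address, birthdate, ssn, major, gpa):
        super().__init__(name, address, birthdate, ssn)
        self.major, self.gpa = major, gpa

e = Employee("Abebe", "Adama", "1990-01-01", "123", 40000, "2020-06-01")
# Substitutability: an Employee can be used wherever a Person is expected.
assert isinstance(e, Person)
print(e.name, e.salary)    # Abebe 40000
```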

Superclass/Subclass Relationships
Each member of a subclass is also a member of the superclass. In other words, the entity in the
subclass is the same entity in the superclass, but has a distinct role. The relationship between a
superclass and a subclass is one-to-one (1:1) and is called a superclass/subclass relationship.

Chapter Two
Query Processing and Optimization

A query means a request for information. A query is based on a set of pre-defined code, so
that the database understands the instruction; we refer to this code as the query language.
The standard query language for database management is Structured Query Language (SQL).

Query Processing is the process of choosing a suitable execution strategy for processing a
query. The activities implemented by query processing are the activities involved in parsing,
validating, optimizing, and executing a query. The aims of query processing are to transform a
query written in a high-level language, typically SQL, into a correct and efficient execution
strategy expressed in a low-level language (procedural). A low-level programming language is
a programming language that provides little or no abstraction from a computer's instruction set
architecture—commands or functions in the language map closely to processor instructions.

When the relational model was first launched commercially, one of the major criticisms often
cited was inadequate performance of queries. Since then, a significant amount of research has
been devoted to developing highly efficient algorithms for processing queries. There are many
ways in which a complex query can be performed, and one of the aims of query processing is
to determine which one is the most cost effective.

Query processing can be divided into four main phases:


 Query decomposition (consisting of parsing and validation),
 Query optimization,
 Code generation, and
 Runtime query execution.

Figure 2.1: Query Processing

Query decomposition is the first phase of query processing which involves scanning, parsing,
and validating a given query. The aims of query decomposition are to transform a high-level
query into a relational algebra query and checks that it is syntactically and semantically correct.
A relational algebra query: is a procedural query language, which takes instances of relations
as input and yields instances of relations as output. It uses operators (unary or binary) to
perform queries. They accept relations as their input and yield relations as their output.
The relational algebra is very important for several reasons:
 It provides a formal foundation for relational model operations.
 It is used as a basis for implementing and optimizing queries in the query processing
and optimization modules that are integral parts of relational database management
systems (RDBMSs)
 Some of its concepts are incorporated into the SQL standard query language for
RDBMSs.
The fundamental operations of relational algebra are: Select, Project, Union, Set difference,
Cartesian product, and Rename.
Select Operation (σ): It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ stands for the selection predicate and r stands for the relation. p is a
propositional logic formula which may use connectors such as and, or, and not, and
relational operators such as =, ≠, ≥, <, >, ≤.
Example: σsubject = "database"(Books)
Project Operation (∏): It projects column(s) that satisfy a given predicate.
Notation − ∏A1, A2, …, An (r)
Where A1, A2, …, An are attribute names of relation r.
Duplicate rows are automatically eliminated, as relation is a set.
Example: ∏subject, author (Books)
Union Operation (∪): It performs binary union between two given relations and is defined as
r ∪ s = { t | t ∈ r or t ∈ s}.
Notation − r U s
Where r and s are either database relations or relation result set (temporary relation).
Example: ∏ author (Books) ∪ ∏ author (Articles)
For a union operation to be valid, the following conditions must hold −
 r and s must have the same number of attributes.
 Attribute domains must be compatible.
 Duplicate tuples are automatically eliminated.
Set Difference (−): The result of set difference operation is tuples, which are present in one
relation but are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
Example: ∏ author (Books) − ∏ author (Articles)
Cartesian Product (Χ): Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
Example: σauthor = ‘Jonathan’(Books Χ Articles)
Rename Operation (ρ)(rho): The results of relational algebra are also relations but without
any name. The rename operation allows us to rename the output relation.
Notation − ρ x (E)
Where the result of expression E is saved with name of x.
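The fundamental operations above can be sketched in Python by modelling a relation as a set of frozensets of (attribute, value) pairs; the Books relation and its contents are invented for illustration:

```python
# Relations modelled as sets of frozensets of (attribute, value) pairs,
# so duplicate elimination happens automatically, as in the formal model.
books = {
    frozenset({("title", "DB Concepts"), ("subject", "database"),
               ("author", "Silberschatz")}),
    frozenset({("title", "Computer Networks"), ("subject", "networking"),
               ("author", "Tanenbaum")}),
}

def select(predicate, relation):
    """sigma_p(r): keep the tuples satisfying the predicate."""
    return {t for t in relation if predicate(dict(t))}

def project(attrs, relation):
    """pi_{A1,...,An}(r): keep only the named attributes."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def union(r, s):
    """r U s (assumes union-compatible relations)."""
    return r | s

def difference(r, s):
    """r - s: tuples in r but not in s."""
    return r - s

def cartesian_product(r, s):
    """r X s (assumes r and s have disjoint attribute names)."""
    return {t | u for t in r for u in s}

db_books = select(lambda t: t["subject"] == "database", books)
authors = project({"author"}, books)
print(len(db_books), len(authors))   # 1 2
```

Using sets makes the automatic elimination of duplicate tuples, noted for Project and Union above, fall out of the data structure itself.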

Additional operations are: Set intersection, Assignment and Join Operations.


 Join Operations: A Join operation combines related tuples from different relations,
if and only if a given join condition is satisfied. It is denoted by ⋈.
 A NATURAL JOIN is a JOIN operation that creates an implicit join clause for you
based on the common columns in the two tables being joined. Common columns are
columns that have the same name in both tables. A NATURAL JOIN can be an
INNER join, a LEFT OUTER join, or a RIGHT OUTER join.
 The intersection operator gives the common data values between the two data sets
that are intersected. The two data sets that are intersected should be similar for the
intersection operator to work. Intersection also removes all duplicates before displaying
the result. It is denoted by ∩.
 Assignment Operator (←): assigns the result of a relational algebra expression to a
temporary relation variable, which can then be used in subsequent expressions.
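A natural join and an intersection can be sketched in Python over relations modelled as lists of dicts (the Staff and Branch data are invented):

```python
# Natural join: combine tuples from two relations whenever they agree
# on all columns that share a name. Intersection keeps tuples present
# in both relations, removing duplicates.
staff = [{"branch": "B1", "name": "Ann"}, {"branch": "B2", "name": "Ben"}]
branch = [{"branch": "B1", "city": "Adama"}, {"branch": "B3", "city": "Hawassa"}]

def natural_join(r, s):
    common = set(r[0]) & set(s[0])          # shared column names
    return [{**t, **u}
            for t in r
            for u in s
            if all(t[c] == u[c] for c in common)]

def intersection(r, s):
    shared = ({frozenset(x.items()) for x in r}
              & {frozenset(x.items()) for x in s})
    return [dict(t) for t in shared]

print(natural_join(staff, branch))
# [{'branch': 'B1', 'name': 'Ann', 'city': 'Adama'}]
```

Only Ann's tuple joins, because B2 has no matching department and B3 has no matching staff; the common column `branch` appears once in the result, as a natural join requires.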
Aggregate Functions and Grouping
A type of request that cannot be expressed in the basic relational algebra is to specify
mathematical aggregate functions on collections of values from the database. Examples of such
functions include retrieving the average or total salary of all employees or the total number of
employee tuples. Common functions applied to collections of numeric values include SUM,
AVERAGE, MAXIMUM, and MINIMUM. These functions are used in simple statistical
queries that summarize information from the database tuples.
 The COUNT function is used for counting tuples or values.
 Use of the Functional operator ℱ
o ℱMAX Salary (Employee) retrieves the maximum salary value from the
Employee relation.
o ℱMIN Salary (Employee), ℱSUM Salary (Employee), ℱCOUNT SSN (Employee), and
ℱAVERAGE Salary (Employee) work analogously.
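The ℱ examples correspond directly to aggregate computations that can be sketched with Python built-ins (the Employee data is invented):

```python
# Aggregate functions over the Salary values of an Employee relation.
employees = [
    {"ssn": "1", "salary": 30000},
    {"ssn": "2", "salary": 40000},
    {"ssn": "3", "salary": 50000},
]

salaries = [e["salary"] for e in employees]

print(max(salaries))                  # F MAX Salary     -> 50000
print(min(salaries))                  # F MIN Salary     -> 30000
print(sum(salaries))                  # F SUM Salary     -> 120000
print(len(employees))                 # F COUNT SSN      -> 3
print(sum(salaries) / len(salaries))  # F AVERAGE Salary -> 40000.0
```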

Algorithms for Executing Query Operations


 Translating SQL Queries into Relational Algebra
 Algorithms for External Sorting
 Algorithms for SELECT and JOIN Operations
 Algorithms for PROJECT and SET Operations
 Implementing Aggregate Operations and OUTER JOINS
 Combining Operations Using Pipelining
 Using Heuristics in Query Optimization
 Using Selectivity and Cost Estimates in Query Optimization

Translating SQL Queries into Relational Algebra


An SQL query is first translated into an equivalent extended relational algebra expression
(represented as a query tree) that is then optimized.
 Query block: The basic unit that can be translated into the algebraic operators and
optimized.
 A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY
and HAVING clauses if these are part of the block.
 Nested queries within a query are identified as separate query blocks.
 Aggregate operators in SQL must be included in the extended algebra.

Example: a nested SQL query is translated into an equivalent relational algebra expression
by treating the outer query and the nested inner query as separate query blocks, each of
which is translated and optimized independently.

Query Optimization
In first generation network and hierarchical database systems, the low-level procedural query
language is generally embedded in a high-level programming language such as COBOL, and
it is the programmer’s responsibility to select the most appropriate execution strategy. In
contrast, with declarative languages such as SQL, the user specifies what data is required rather
than how it is to be retrieved. This relieves the user of the responsibility of determining, or
even knowing, what constitutes a good execution strategy and makes the language more
universally usable. Additionally, giving the DBMS the responsibility for selecting the best
strategy prevents users from choosing strategies that are known to be inefficient and gives the
DBMS more control over system performance.
An important aspect of query processing is query optimization. As there are many equivalent
transformations of the same high-level query, the aim of query optimization is to choose the
one that minimizes resource usage. Generally, we try to reduce the total execution time of the
query, which is the sum of the execution times of all individual operations that make up the
query. However, resource usage may also be viewed as the response time of the query, in which
case we concentrate on maximizing the number of parallel operations. Since the problem is
computationally intractable with a large number of relations, the strategy adopted is generally
reduced to finding a near optimum solution.
Query optimization is thus the process of choosing a suitable execution strategy for
processing a query. An internal representation of the query (a query tree or query graph) is
created after scanning, parsing, and validating, and the execution strategy that minimizes
resource usage is then chosen from it.

Dynamic versus static optimization


There are two choices for when the first three phases of query processing can be carried out.
 One option is to dynamically carry out decomposition and optimization every time the
query is run.
 The alternative option is static query optimization, where the query is parsed,
validated, and optimized once.

Advantages and disadvantages of dynamic and static query optimization


Dynamic
 The advantage of dynamic query optimization arises from the fact that all information
required to select an optimum strategy is up to date.
 The disadvantage is that the performance of the query is affected because the query
has to be parsed, validated, and optimized every time before it can be executed.
Static
 The advantages of static optimization are that the runtime overhead is removed, and
there may be more time available to evaluate a larger number of execution strategies,
thereby increasing the chances of finding a more optimum strategy.

 The disadvantages arise from the fact that the execution strategy that is chosen as being
optimal when the query is compiled may no longer be optimal when the query is run.
However, a hybrid approach could be used to overcome this disadvantage, where the query is
re-optimized if the system detects that the database statistics have changed significantly since
the query was last compiled.

Figure 2.2: Typical steps in processing a high level query

Techniques of query optimization


There are two main techniques for query optimization, although the two strategies are usually
combined in practice. The first technique uses heuristic rules that order the operations in a
query. The other technique systematically estimates the cost of different execution strategies
and chooses the one with the lowest cost estimate.

Using Heuristics in Query Optimization


The heuristic approach to query optimization uses transformation rules to convert one
relational algebra expression into an equivalent form that is known to be more efficient.
Process for heuristics optimization
1. The parser of a high-level query generates an initial internal representation.
2. Apply heuristic rules to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations based on the access
paths available on the files involved in the query.

The main heuristic is to apply first the operations that reduce the size of intermediate results.

 E.g., Apply SELECT and PROJECT operations before applying the JOIN or other
binary operations.

Query tree
A query tree is a tree data structure that corresponds to a relational algebra expression. It
represents the input relations of the query as leaf nodes of the tree, and represents the relational
algebra operations as internal nodes. An execution of the query tree consists of executing an
internal node operation whenever its operands are available and then replacing that internal
node by the relation that results from executing the operation.
Query graph
A query graph is a graph data structure that corresponds to a relational calculus expression. It
does not indicate an order in which to perform operations. There is only a single graph corresponding to
each query.
Example:
For every project located in ‘Stafford’, retrieve the project number, the controlling department
number and the department manager’s last name, address and birthdate.
SQL query:

SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN
AND P.PLOCATION = 'Stafford';

Relational algebra:

π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='Stafford'}(PROJECT))
⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))

Figure 2.3: (a) Query tree for the relational algebra, (b) Query tree for SQL query

Heuristic Optimization of Query Trees:


 The same query could correspond to many different relational algebra expressions
and hence many different query trees.
 The task of heuristic optimization of query trees is to find a final query tree that is
efficient to execute.

Heuristic Transformation Rules for the Relational Algebra Operations


1. Cascade of σ: A conjunctive selection condition can be broken up into a cascade
(sequence) of individual σ operations:
σ_{c1 AND c2 AND ... AND cn}(R) ≡ σ_{c1}(σ_{c2}(... (σ_{cn}(R)) ...))

2. Commutativity of σ: The σ operation is commutative:
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))

3. Cascade of π: In a cascade (sequence) of π operations, all but the last one can be
ignored:
π_{L1}(π_{L2}(... (π_{Ln}(R)) ...)) ≡ π_{L1}(R)

4. Commuting σ with π: If the selection condition c involves only the attributes A1, ...,
An in the projection list, the two operations can be commuted:
π_{A1, ..., An}(σ_c(R)) ≡ σ_c(π_{A1, ..., An}(R))

5. Commutativity of ⋈ (and ×): The ⋈ operation is commutative, as is the × operation:
R ⋈_c S ≡ S ⋈_c R;  R × S ≡ S × R

6. Commuting σ with ⋈ (or ×): If all the attributes in the selection condition c involve
only the attributes of one of the relations being joined—say, R—the two operations can
be commuted as follows:
σ_c(R ⋈ S) ≡ (σ_c(R)) ⋈ S

 Alternatively, if the selection condition c can be written as (c1 AND c2), where
condition c1 involves only the attributes of R and condition c2 involves only the
attributes of S, the operations commute as follows:
σ_c(R ⋈ S) ≡ (σ_{c1}(R)) ⋈ (σ_{c2}(S))

7. Commuting π with ⋈ (or ×): Suppose that the projection list is L = {A1, ..., An, B1, ...,
Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S. If the join
condition c involves only attributes in L, the two operations can be commuted as
follows:
π_L(R ⋈_c S) ≡ (π_{A1, ..., An}(R)) ⋈_c (π_{B1, ..., Bm}(S))

 If the join condition c contains additional attributes not in L, these must be added
to the projection list, and a final π operation is needed.

8. Commutativity of set operations: The set operations ∪ and ∩ are commutative, but −
is not.

9. Associativity of ⋈, ×, ∪, and ∩: These four operations are individually associative;
that is, if θ stands for any one of these four operations (throughout the expression), we
have:
(R θ S) θ T ≡ R θ (S θ T)

10. Commuting σ with set operations: The σ operation commutes with ∪, ∩, and −. If θ
stands for any one of these three operations, we have:
σ_c(R θ S) ≡ (σ_c(R)) θ (σ_c(S))

11. The π operation commutes with ∪:
π_L(R ∪ S) ≡ (π_L(R)) ∪ (π_L(S))

12. Converting a (σ, ×) sequence into ⋈: If the condition c of a σ that follows a ×
corresponds to a join condition, convert the (σ, ×) sequence into a ⋈ as follows:
σ_c(R × S) ≡ R ⋈_c S

Outline of a Heuristic Algebraic Optimization Algorithm:


1. Using rule 1, break up any select operations with conjunctive conditions into a cascade
of select operations.
2. Using rules 2, 4, 6, and 10 concerning the commutativity of select with other
operations, move each select operation as far down the query tree as is permitted by
the attributes involved in the select condition.
3. Using rule 9 concerning associativity of binary operations, rearrange the leaf nodes of
the tree so that the leaf node relations with the most restrictive select operations are
executed first in the query tree representation.
4. Using Rule 12, combine a Cartesian product operation with a subsequent select
operation in the tree into a join operation.
5. Using rules 3, 4, 7, and 11 concerning the cascading of project and the commuting of
project with other operations, break down and move lists of projection attributes down
the tree as far as possible by creating new project operations as needed.
6. Identify subtrees that represent groups of operations that can be executed by a single
algorithm.
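Step 2 of this outline can be sketched on a toy query-tree representation. The Relation/Select/Join classes and the attribute bookkeeping below are assumptions made for this illustration, not the representation any particular DBMS uses:

```python
# A toy sketch of step 2: pushing a selection down past a join
# (transformation rule 6) in a query tree.

class Relation:
    def __init__(self, name, attrs):
        self.name, self.attrs = name, set(attrs)

class Select:
    def __init__(self, cond, attrs, child):
        # cond: human-readable condition; attrs: attributes the condition uses
        self.cond, self.attrs, self.child = cond, set(attrs), child

class Join:
    def __init__(self, left, right):
        self.left, self.right = left, right

def attrs_of(node):
    """All attributes available at this node of the query tree."""
    if isinstance(node, Relation):
        return node.attrs
    if isinstance(node, Select):
        return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)

def push_select(node):
    """Move each selection as far down the tree as its attributes allow."""
    if isinstance(node, Relation):
        return node
    if isinstance(node, Join):
        return Join(push_select(node.left), push_select(node.right))
    child = push_select(node.child)
    if isinstance(child, Join):
        if node.attrs <= attrs_of(child.left):       # condition needs only left
            return Join(push_select(Select(node.cond, node.attrs, child.left)),
                        child.right)
        if node.attrs <= attrs_of(child.right):      # condition needs only right
            return Join(child.left,
                        push_select(Select(node.cond, node.attrs, child.right)))
    return Select(node.cond, node.attrs, child)

# sigma_{PLOCATION='Stafford'}(PROJECT join DEPARTMENT) becomes
# (sigma_{PLOCATION='Stafford'}(PROJECT)) join DEPARTMENT
tree = Select("PLOCATION='Stafford'", {"PLOCATION"},
              Join(Relation("PROJECT", ["PNUMBER", "DNUM", "PLOCATION"]),
                   Relation("DEPARTMENT", ["DNUMBER", "MGRSSN"])))
pushed = push_select(tree)
```

After the transformation, the selection is applied directly to PROJECT, so the join sees fewer tuples, which is exactly the "reduce intermediate results" heuristic.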

Summary of Heuristics for Algebraic Optimization:


1. The main heuristic is to apply first the operations that reduce the size of intermediate
results.
2. Perform select operations as early as possible to reduce the number of tuples and
perform project operations as early as possible to reduce the number of attributes. (This
is done by moving select and project operations as far down the tree as possible.)
3. The select and join operations that are most restrictive should be executed before other
similar operations. (This is done by reordering the leaf nodes of the tree among
themselves and adjusting the rest of the tree appropriately.)

Query Execution Plans


Execution plans can tell you how a query will be executed, or how a query was executed. An
execution plan for a relational algebra query consists of a combination of the relational algebra
query tree and information about the access methods to be used for each relation as well as the
methods to be used in computing the relational operators stored in the tree.

Using Selectivity and Cost Estimates in Query Optimization


Cost-based query optimization
Cost-based query optimization compares different strategies based on relative costs (amount
of time that the query needs to run) and selects and executes one that minimizes the cost. The
cost of a strategy is an estimate of the CPU and I/O resources that the query will use. The
optimizer estimates and compares the costs of executing a query using different
execution strategies and chooses the strategy with the lowest cost estimate.
Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
NB: Different database systems may focus on different cost components.
Catalog Information Used in Cost Functions
 Information about the size of a file
o number of records (tuples) (r),
o record size (R),
o number of blocks (b)
o blocking factor (bfr)
 Information about indexes and indexing attributes of a file
o Number of levels (x) of each multilevel index
o Number of first-level index blocks (bI1)
o Number of distinct values (d) of an attribute
o Selectivity (sl) of an attribute
o Selection cardinality (s) of an attribute. (s = sl * r)
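As a rough illustration, the catalog statistics above can be combined into simple cost estimates for a selection. All of the numbers below (file size, blocking factor, number of index levels) are invented for the example; real optimizers use far more detailed formulas:

```python
# A toy illustration of how catalog statistics feed cost estimates,
# measured here purely in block accesses.

import math

r   = 10000                       # number of records in the file
bfr = 40                          # blocking factor (records per block)
b   = math.ceil(r / bfr)          # number of blocks: 250
d   = 50                          # distinct values of the selection attribute

sl = 1 / d                        # selectivity of an equality condition
s  = sl * r                       # selection cardinality: 200 matching records

# Strategy 1: linear (full) scan reads every block
cost_linear_scan = b

# Strategy 2: a secondary index with x levels, then one access per match
x = 2
cost_index = x + math.ceil(s)
```

Here the index strategy (202 block accesses) is only slightly cheaper than the scan (250); with a less selective condition the scan would win, which is why these estimates are computed per query rather than fixed in advance.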
Semantic Query Optimization
Semantic query optimization is the process of transforming a query issued by a user into a
different query which, because of the semantics of the application, is guaranteed to yield the
correct answer for all states of the database. Semantic Query Optimization uses constraints
specified on the database schema in order to modify one query into another query that is more
efficient to execute.
Consider the following SQL query, which retrieves employees who earn more than their
direct supervisors:

SELECT E.LNAME, E.FNAME
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.SUPERSSN = S.SSN AND E.SALARY > S.SALARY;

Explanation:
 Suppose that we had a constraint on the database schema that stated that no
employee can earn more than his or her direct supervisor. If the semantic query
optimizer checks for the existence of this constraint, it need not execute the
query at all because it knows that the result of the query will be empty.
Techniques known as theorem proving can be used for this purpose.

Chapter Three
Transaction Processing Concepts
Introduction
A database management system (DBMS) is a software package/system that facilitates the creation
and maintenance of databases. A DBMS supports different types of databases. Databases can be
classified according to the number of users, database location, and the expected type and
extent of use. Depending on the number of users, databases are classified into single-user
databases and multiuser databases.
 Single user databases: At most one user at a time can use the system
 Multiuser databases: Many users can access the system concurrently.

Transaction and System Concepts

Concurrency means multiple computations are happening at the same time.


Concurrent processing describes two tasks occurring asynchronously, meaning the order in
which the tasks are executed is not predetermined. This can be done in two ways:
 Interleaved processing: concurrent execution of processes is interleaved in a single CPU.
 Parallel processing: processes are executed concurrently on multiple CPUs.
Many DBMSs
allow users to undertake simultaneous operations on the database. If these operations are not
controlled, the accesses may interfere with one another and the database can become
inconsistent. Among the various functions the three closely related functions that are intended
to ensure that the database is reliable and remains in a consistent state are:
 transaction support,
 concurrency control services, and
 recovery services
i. Transaction Support
Transaction: An action, or series of actions, carried out by a single user or application
program, which reads or updates the contents of the database. A transaction is a logical unit of
work on the database. It may be an entire program, a part of a program, or a single command
(for example, the SQL command INSERT or UPDATE), and it may involve any number of
operations on the database. In the database context, the execution of an application program
can be thought of as one or more transactions with non-database processing taking place in
between. A transaction should always transform the database from one consistent state to
another.

Basic operations are read and write:


 read_item(X): Reads a database item named X into a program variable. To simplify
our notation, we assume that the program variable is also named X.
 write_item(X): Writes the value of program variable X into the database item named
X.

Read and Write Operations


The basic unit of data transfer from the disk to computer main memory is one block. In general,
a data item (that is read or written) will be the field of some record in the database, although it
may be a larger unit such as a record or even a whole block.

read_item(X) command includes the following steps:


 Find the address of the disk block that contains item X.
 Copy that disk block into a buffer in main memory (if that disk block is not already
in some main memory buffer).
 Copy item X from the buffer to the program variable named X.

write_item(X) command includes the following steps:


 Find the address of the disk block that contains item X.
 Copy that disk block into a buffer in main memory (if that disk block is not already
in some main memory buffer).
 Copy item X from the program variable named X into its correct location in the
buffer.
 Store the updated block from the buffer back to disk (either immediately or at some
later point in time).
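The steps above can be modelled with a small buffer layer. The dictionaries standing in for disk blocks, and the assumption that every item lives in block 0, are simplifications made for this sketch:

```python
# A simplified model of read_item(X)/write_item(X) following the steps above.
# Disk, buffer, and block addressing are faked with dictionaries.

DISK = {0: {"X": 5, "Y": 7}}       # block address -> block contents on disk
BUFFER = {}                        # main-memory copies of disk blocks
program_vars = {}                  # the program variables of the transaction

def block_of(item):
    return 0                       # step 1: find the block address of the item

def read_item(item):
    addr = block_of(item)
    if addr not in BUFFER:                     # step 2: copy block into a buffer
        BUFFER[addr] = dict(DISK[addr])
    program_vars[item] = BUFFER[addr][item]    # step 3: copy item to variable

def write_item(item, flush=False):
    addr = block_of(item)
    if addr not in BUFFER:
        BUFFER[addr] = dict(DISK[addr])
    BUFFER[addr][item] = program_vars[item]    # copy variable into the buffer
    if flush:                                  # write back now, or defer
        DISK[addr] = dict(BUFFER[addr])

read_item("X")                     # program_vars["X"] is now 5
program_vars["X"] += 10
write_item("X")                    # buffer holds 15, disk still holds 5
write_item("X", flush=True)        # the disk copy is updated to 15
```

The `flush` flag mirrors the last step of write_item: the updated block may be written back immediately or at some later point in time.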
Example:
Two sample transactions to illustrate the concepts of a transaction,
a. Transaction T1
b. Transaction T2

A transaction should always transform the database from one consistent state to another,
although we accept that consistency may be violated while the transaction is in progress. A
transaction can have one of two outcomes. If it completes successfully, the transaction is said
to have committed and the database reaches a new consistent state. On the other hand, if the
transaction does not execute successfully, the transaction is aborted. If a transaction is aborted,
the database must be restored to the consistent state it was in before the transaction started.
Such a transaction is rolled back or undone. A committed transaction cannot be aborted. If we
decide that the committed transaction was a mistake, we must perform another compensating
transaction to reverse its effects. However, an aborted transaction that is rolled back can be
restarted later and, depending on the cause of the failure, may successfully execute and commit
at that time.
ii. Concurrency Control
Many DBMSs allow users to undertake simultaneous operations on the database. If these
operations are not controlled, the accesses may interfere with one another and the database can
become inconsistent. To overcome this, the DBMS implements a concurrency control protocol
that prevents database accesses from interfering with one another. Concurrency control is the
process of managing simultaneous operations on the database without having them interfere
with one another.
The Need for Concurrency Control
A major objective in developing a database is to enable many users to access shared data
concurrently. Concurrent access is relatively easy if all users are only reading data, as there is
no way that they can interfere with one another. However, when two or more users are
accessing the database simultaneously and at least one is updating data, there may be
interference that can result in inconsistencies. Therefore, we examine three potential problems
caused by concurrency: the lost update problem, the uncommitted dependency problem, and
the inconsistent analysis problem.
The Lost Update Problem: An apparently successfully completed update operation by one user
can be overridden by another user. This occurs when two transactions that access the same
database items have their operations interleaved in a way that makes the value of some database
item incorrect.

Figure 3.1: The lost update problem
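A lost-update interleaving like the one in Figure 3.1 can be replayed deterministically. The item X, its initial value, and the two updates below are invented for the example (think of X as seats available on a flight):

```python
# A deterministic replay of the lost-update interleaving: T1 and T2 both
# read X before either writes it back, so T1's update is overwritten.

X = 100                            # shared database item

t1_x = X                           # T1: read_item(X)
t2_x = X                           # T2: read_item(X), before T1 writes back

t1_x = t1_x - 10                   # T1 reserves 10 seats locally
t2_x = t2_x + 5                    # T2 cancels 5 seats locally

X = t1_x                           # T1: write_item(X) -> X is 90
X = t2_x                           # T2: write_item(X) -> X is 105

# T1's update is lost: any serial execution would give 100 - 10 + 5 = 95
```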

The uncommitted dependency problem: also known as the Temporary Update (or Dirty Read)
Problem occurs when one transaction is allowed to see the intermediate results of another
transaction before it has committed. This happens when one transaction updates a database
item and then fails for some reason, while the updated item is accessed by another
transaction before it is changed back to its original value.

Figure 3.2: The Temporary Update (or Dirty Read) Problem

The inconsistent analysis problem: also known as The Incorrect Summary Problem occurs
when a transaction reads several values from the database but a second transaction updates
some of them during the execution of the first. For example, a transaction that is summarizing
data in a database (for example, totalling balances) will obtain inaccurate results if, while it is
executing, other transactions are updating the database. Another example is if one transaction
is calculating an aggregate summary function on a number of records while other transactions
are updating some of these records, the aggregate function may calculate some values before
they are updated and others after they are updated.

Figure 3.3: The Incorrect Summary Problem

iii. Recovery Services

Database recovery is the process of restoring the database to a correct state following a failure.
The failure may be the result of a system crash due to hardware or software errors, a media
failure, such as a head crash, or a software error in the application, such as a logical error in the
program that is accessing the database. It may also be the result of unintentional or intentional
corruption or destruction of data or facilities by system administrators or users. Whatever the
underlying cause of the failure, the DBMS must be able to recover from the failure and restore
the database to a consistent state.

What causes a Transaction to fail (Why recovery is needed?)


1. A computer failure (system crash): A hardware or software error occurs in the
computer system during transaction execution. If the hardware crashes, the contents
of the computer’s internal memory may be lost.
2. A transaction or system error: Some operation in the transaction may cause it to fail,
such as integer overflow or division by zero. Transaction failure may also occur
because of erroneous parameter values or because of a logical programming error.
In addition, the user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction: Certain conditions
necessitate cancellation of the transaction. For example, data for the transaction may
not be found. A condition, such as insufficient account balance in a banking
database, may cause a transaction, such as a fund withdrawal from that account, to
be cancelled. A programmed abort in the transaction causes it to fail.

4. Concurrency control enforcement: The concurrency control method may decide to


abort the transaction, to be restarted later, because it violates serializability or
because several transactions are in a state of deadlock.
5. Disk failure: Some disk blocks may lose their data because of a read or write
malfunction or because of a disk read/write head crash. This may happen during a
read or a write operation of the transaction.
6. Physical problems and catastrophes: This refers to an endless list of problems that
includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or
tapes by mistake, and mounting of a wrong tape by the operator.
A transaction is an atomic unit of work that is either completed in its entirety or not done at
all. For recovery purposes, the system needs to keep track of when the transaction starts,
terminates, and commits or aborts.
Transaction States in DBMS
 Active State: When the instructions of the transaction are running, the transaction
is in the active state. If all the read and write operations are performed without any
error, it goes to the “partially committed state”; if any instruction fails, it goes to
the “failed state”.
 Partially Committed State: After completion of all the read and write operations, the
changes are made in main memory or the local buffer. If the changes are made
permanent on the database, the state changes to the “committed state”; in case of
failure it goes to the “failed state”.
 Committed State: The state in which the changes are made permanent on the database;
the transaction is complete and is therefore terminated in the “terminated state”.
 Failed State: When any instruction of the transaction fails, or a failure occurs while
making a permanent change of data on the database, the transaction goes to the
“failed state”.
 Aborted State: After any type of failure the transaction goes from the “failed state”
to the “aborted state”. Since in the previous states the changes were made only to the
local buffer or main memory, these changes are deleted or rolled back.
 Terminated State: If there is no rollback, or the transaction comes from the
“committed state”, the system is consistent and ready for a new transaction, and the
old transaction is terminated.

Figure 3.4: State transition diagram illustrating the states for transaction execution
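The legal transitions of this diagram can be encoded as a small state machine. The event names used as dictionary keys ("end_transaction", "failure", and so on) are illustrative labels for this sketch, not standard DBMS commands:

```python
# A sketch of the transaction state transition diagram: the legal moves are
# listed in a dictionary, and any other (state, event) pair is rejected.

LEGAL = {
    ("active", "end_transaction"): "partially_committed",
    ("active", "failure"): "failed",
    ("partially_committed", "commit"): "committed",
    ("partially_committed", "failure"): "failed",
    ("failed", "abort"): "aborted",            # local changes rolled back
    ("committed", "terminate"): "terminated",
    ("aborted", "terminate"): "terminated",
}

class Transaction:
    def __init__(self):
        self.state = "active"

    def on(self, event):
        key = (self.state, event)
        if key not in LEGAL:
            raise ValueError(f"illegal event {event!r} in state {self.state!r}")
        self.state = LEGAL[key]
        return self.state

# successful run: active -> partially committed -> committed -> terminated
t = Transaction()
for event in ("end_transaction", "commit", "terminate"):
    t.on(event)

# failing run: active -> failed -> aborted
t2 = Transaction()
t2.on("failure")
t2.on("abort")
```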

Recovery Manager
If a failure occurs during the transaction, then the database could be inconsistent. It is the task
of the recovery manager to ensure that the database is restored to the state it was in before the
start of the transaction, and therefore a consistent state. Recovery manager keeps track of the
following operations for recovery purposes:
 begin_transaction: This marks the beginning of transaction execution.
 read or write: These specify read or write operations on the database items that are
executed as part of a transaction.
 end_transaction: This specifies that read and write transaction operations have ended
and marks the end limit of transaction execution.
 commit_transaction: This signals a successful end of the transaction so that any
changes (updates) executed by the transaction can be safely committed to the database
and will not be undone.
 rollback (or abort): This signals that the transaction has ended unsuccessfully, so that
any changes or effects that the transaction may have applied to the database must be
undone.

Operators used by recovery manager:


 undo: Similar to rollback except that it applies to a single operation rather than to a
whole transaction.
 redo: This specifies that certain transaction operations must be redone to ensure that
all the operations of a committed transaction have been applied successfully to the
database.

The System Log


Log or Journal: The log keeps track of all transaction operations that affect the values of
database items. This information may be needed to permit recovery from transaction failures.
The log is kept on disk, so it is not affected by any type of failure except for disk or catastrophic
failure. In addition, the log is periodically backed up to archival storage (tape) to guard against
such catastrophic failures.
Example: Sample system log record
 T in the following discussion refers to a unique transaction-id that is generated
automatically by the system and is used to identify each transaction:
 Types of log record:
o [start_transaction,T]: Records that transaction T has started execution.
o [write_item,T,X,old_value,new_value]: Records that transaction T has
changed the value of database item X from old_value to new_value.
o [read_item,T,X]: Records that transaction T has read the value of database
item X.
o [commit,T]: Records that transaction T has completed successfully, and affirms
that its effect can be committed (recorded permanently) to the database.
o [abort,T]: Records that transaction T has been aborted.

Recovery using log records


If the system crashes, we can recover to a consistent database state by examining the log:
1. Because the log contains a record of every write operation that changes the value of
some database item, it is possible to undo the effect of these write operations of a
transaction T by tracing backward through the log and resetting all items changed by a
write operation of T to their old_values.
2. We can also redo the effect of the write operations of a transaction T by tracing forward
through the log and setting all items changed by a write operation of T (that did not get
done permanently) to their new_values.
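Both rules can be sketched over a toy log. The record layout follows the [write_item, T, X, old_value, new_value] entries described above, while the sample log contents and the crash-time database state are invented for the example:

```python
# Log-based recovery: undo the writes of uncommitted transactions in a
# backward pass, then redo the writes of committed transactions forward.

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 100, 90),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 50, 60),
    ("commit", "T1"),
    # system crashes here, before T2 commits
]

db = {"X": 90, "Y": 60}            # database state found after the crash

committed = {rec[1] for rec in log if rec[0] == "commit"}

# UNDO: trace backward, resetting items written by uncommitted transactions
for rec in reversed(log):
    if rec[0] == "write_item" and rec[1] not in committed:
        _, _t, item, old_value, _new = rec
        db[item] = old_value

# REDO: trace forward, reapplying the writes of committed transactions
for rec in log:
    if rec[0] == "write_item" and rec[1] in committed:
        _, _t, item, _old, new_value = rec
        db[item] = new_value
```

After recovery, T2's uncommitted write to Y has been undone (Y is back to 50) while T1's committed write to X survives.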

Commit Point of a Transaction


 A transaction T reaches its commit point when all its operations that access the database
have been executed successfully and the effect of all the transaction operations on the
database has been recorded in the log. Beyond the commit point, the transaction is said
to be committed, and its effect is assumed to be permanently recorded in the database.
The transaction then writes an entry [commit,T] into the log.

Roll Back of transactions


 Needed for transactions that have a [start_transaction,T] entry into the log but no
commit entry [commit,T] into the log.
Redoing transactions
 Transactions that have written their commit entry in the log must also have recorded all
their write operations in the log; otherwise they would not be committed, so their effect
on the database can be redone from the log entries. (Notice that the log file must be kept
on disk. At the time of a system crash, only the log entries that have been written back
to disk are considered in the recovery process because the contents of main memory
may be lost.)
Force writing a log
 Before a transaction reaches its commit point, any portion of the log that has not been
written to the disk yet must now be written to the disk. This process is called force-
writing the log file before committing a transaction.

Desirable Properties of Transactions


Transactions should possess several properties, often called the ACID properties; they should
be enforced by the concurrency control and recovery methods of the DBMS. The following are
the ACID properties:
 Atomicity: - A transaction is an atomic unit of processing; it should either be performed
in its entirety or not performed at all.
 Consistency preservation: - A transaction should be consistency preserving, meaning
that if it is completely executed from beginning to end without interference from other
transactions, it should take the database from one consistent state to another.
 Isolation: - A transaction should appear as though it is being executed in isolation from
other transactions, even though many transactions are executing concurrently. That is,
the execution of a transaction should not be interfered with by any other transactions
executing concurrently.
 Durability or permanency: - The changes applied to the database by a committed
transaction must persist in the database. These changes must not be lost because of any
failure

Schedules and Recoverability


The objective of a concurrency control protocol is to schedule transactions in such a way as to
avoid any interference between them, and hence prevent the types of problem described in the
previous section. One obvious solution is to allow only one transaction to execute at a time:
one transaction is committed before the next transaction is allowed to begin. However, the aim
of a multi-user DBMS is also to maximize the degree of concurrency or parallelism in the
system, so that transactions that can execute without interfering with one another can run in
parallel. For example, transactions that access different parts of the database can be scheduled
together without interference.
Schedule is a sequence of the operations by a set of concurrent transactions that preserves the
order of the operations in each of the individual transactions. A transaction comprises a
sequence of operations consisting of read and/or write actions to the database, followed by a
commit or abort action. A schedule S consists of a sequence of the operations from a set of n
transactions T1, T2, . . . , Tn, subject to the constraint that the order of operations for each
transaction is preserved in the schedule. Thus, for each transaction Ti in schedule S, the order
of the operations in Ti must be the same in schedule S.
Types of Schedules based on Recoverability in DBMS
1. Recoverable Schedule: A schedule in which no committed transaction ever needs to be
rolled back. A schedule S is recoverable if no transaction T in S commits until all
transactions T′ that have written an item that T reads have committed; that is, for each
pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the
commit operation of Ti precedes the commit operation of Tj.
2. Cascadeless Schedule: A schedule in which every transaction reads only items that were
written by committed transactions, so the failure of one transaction can never force the
rollback of another.
3. Schedules Requiring Cascaded Rollback (Cascading Abort): A schedule in which
uncommitted transactions that read an item from a failed transaction must be rolled back.
For example, if T2 reads data written by T1 and T1 aborts before committing, T2 must
also be rolled back; this is a cascading rollback.
4. Strict Schedule: A schedule in which a transaction can neither read nor write an item X
until the last transaction that wrote X has committed.
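The recoverability condition can be checked mechanically over a schedule. Encoding a schedule as (operation, transaction, item) tuples, with item None for commits, is an assumption made for this sketch:

```python
# Checks the recoverability condition: a transaction may commit only after
# every transaction whose writes it has read has committed.

def is_recoverable(schedule):
    last_writer = {}               # item -> transaction that last wrote it
    reads_from = {}                # T -> transactions whose writes T has read
    committed = set()
    for op, t, item in schedule:
        if op == "w":
            last_writer[item] = t
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != t:
                reads_from.setdefault(t, set()).add(writer)
        elif op == "c":
            if not reads_from.get(t, set()) <= committed:
                return False       # t commits before a transaction it read from
            committed.add(t)
    return True

# T2 reads X from T1 but commits first: not recoverable
bad = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T2", None), ("c", "T1", None)]
# T1 commits before T2: recoverable
good = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T1", None), ("c", "T2", None)]
```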

Characterizing Schedules based on Serializability


Serial schedule: a schedule where the operations of each transaction are executed
consecutively without any interleaved operations from other transactions. The transactions are
performed in serial order, two transactions T1 and T2, serial order would be T1 followed by
T2, or T2 followed by T1. Nonserial schedule is a schedule where the operations from a set
of concurrent transactions are interleaved.
Serializability
A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.
The objective of serializability is to find non serial schedules that allow transactions to execute
concurrently without interfering with one another, and thereby produce a database state that
could be produced by a serial execution. In serializability, the ordering of read and write
operations is important:
 If two transactions only read a data item, they do not conflict and order is not
important.
 If two transactions either read or write completely separate data items, they do not
conflict and order is not important.
 If one transaction writes a data item and another either reads or writes the same data
item, the order of execution is important.

Characterizing Schedules based on Serializability


1. Result equivalent: Two schedules are called result equivalent if they produce the same
final state of the database.
2. Conflict equivalent: Two schedules are said to be conflict equivalent if the order of any two
conflicting operations is the same in both schedules. Two operations in a schedule are said
to conflict if they belong to different transactions, access the same database item, and either
both are write_item operations or one is a write_item and the other a read_item. If two
conflicting operations are applied in different orders in two schedules, the effect can be
different on the database or on the transactions in the schedule, and hence the schedules are
not conflict equivalent. If two conflicting operations are applied in different orders in two
schedules, the effect can be different on the database or on the transactions in the schedule,
and hence the schedules are not conflict equivalent.
3. Conflict serializable: Using the notion of conflict equivalence, we define a schedule S to
be conflict serializable if it is conflict equivalent to some serial schedule S’. A conflict
serializable schedule orders any conflicting operations in the same way as some serial
execution.
Being serializable is not the same as being serial. Being serializable implies that the schedule
is a correct schedule.
 It will leave the database in a consistent state.
 The interleaving is appropriate and will result in a state as if the transactions were
serially executed, yet will achieve efficiency due to concurrent execution.

Testing for Conflict Serializability of a Schedule


There is a simple algorithm for determining whether a particular schedule is conflict
serializable. Most concurrency control methods do not actually test for serializability;
rather, protocols (rules) are developed that guarantee that any schedule following these
rules will be serializable. The algorithm below can be used to test a schedule for conflict
serializability.

The algorithm looks at only the read_item and write_item operations in a schedule to
construct a precedence graph (or serialization graph), which is a directed graph G = (N, E)
that consists of a set of nodes N = {T1, T2, ..., Tn } and a set of directed edges E = {e1, e2, ...,
em }. There is one node in the graph for each transaction Ti in the schedule. Each edge ei in
the graph is of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n, j ≠ k, where Tj is the starting node of ei and
Tk is the ending node of ei. Such an edge from node Tj to node Tk is created by the algorithm
if one of the operations in Tj appears in the schedule before some conflicting operation in Tk.
Algorithm: Testing Conflict Serializability of a Schedule S
 For each transaction Ti participating in schedule S, create a node labeled Ti in the
precedence graph.
 For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X),
create an edge (Ti → Tj) in the precedence graph.
 For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X),
create an edge (Ti → Tj) in the precedence graph.
 For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X),
create an edge (Ti → Tj) in the precedence graph.
The schedule S is conflict serializable if and only if the precedence graph contains no cycle.
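The algorithm above can be sketched in a few lines of Python. The (transaction, action, item) tuple encoding of a schedule is an assumption made here for illustration: read_item(X) by T1 becomes ("T1", "r", "X").

```python
def precedence_graph(schedule):
    """Add edge (Ti, Tj) whenever an operation of Ti precedes a
    conflicting operation of Tj in the schedule."""
    edges = set()
    for i, (ti, a, x) in enumerate(schedule):
        for tj, b, y in schedule[i + 1:]:
            if ti != tj and x == y and "w" in (a, b):
                edges.add((ti, tj))
    return edges

def has_cycle(nodes, edges):
    """Depth-first search for a cycle in the directed graph."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {n: WHITE for n in nodes}

    def dfs(n):
        color[n] = GRAY
        for m in adj[n]:
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True               # back edge found: cycle
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

def is_conflict_serializable(schedule):
    nodes = {t for t, _, _ in schedule}
    return not has_cycle(nodes, precedence_graph(schedule))
```

For example, the lost-update interleaving r1(X), r2(X), w1(X), w2(X) yields edges in both directions between T1 and T2, so it is not conflict serializable, whereas the serial order r1(X), w1(X), r2(X), w2(X) is.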
Figure 3.5: (a) Precedence graph for serial schedule A; (b) precedence graph for serial schedule B;
(c) precedence graph for schedule C (not serializable); (d) precedence graph for schedule D
(serializable, equivalent to schedule A).

View Equivalence and View Serializability


Another, less restrictive definition of equivalence of schedules is called view equivalence. This
leads to another definition of serializability called view serializability. Two schedules S and S'
are said to be view equivalent if the following three conditions hold:
1. The same set of transactions participates in S and S’, and S and S’ include the same
operations of those transactions.
2. For any operation Ri(X) of Ti in S, if the value of X read by the operation has been
written by an operation Wj(X) of Tj (or if it is the original value of X before the schedule
started), the same condition must hold for the value of X read by operation Ri(X) of Ti
in S’.
3. If the operation Wk(Y) of Tk is the last operation to write item Y in S, then Wk(Y) of Tk
must also be the last operation to write item Y in S’.

The idea behind view equivalence is that, as long as each read operation of a transaction reads
the result of the same write operation in both schedules, the write operations of each transaction
must produce the same results. The read operations are hence said to see the same view in both
schedules. A schedule S is said to be view serializable if it is view equivalent to a serial
schedule.
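The three conditions can be checked mechanically. The sketch below reuses a (transaction, action, item) tuple encoding of schedules, an assumption made here for illustration; None stands for the initial value of an item.

```python
from collections import defaultdict

def reads_from(schedule):
    """Map each read operation (txn, item, occurrence) to the transaction
    whose write it reads; None means the initial value of the item."""
    last_writer = {}
    count = defaultdict(int)
    mapping = {}
    for t, op, x in schedule:
        if op == "r":
            count[(t, x)] += 1
            mapping[(t, x, count[(t, x)])] = last_writer.get(x)
        else:
            last_writer[x] = t
    return mapping

def view_equivalent(s1, s2):
    if sorted(s1) != sorted(s2):           # condition 1: same transactions/operations
        return False
    if reads_from(s1) != reads_from(s2):   # condition 2: same reads-from relation
        return False
    final = lambda s: {x: t for t, op, x in s if op == "w"}
    return final(s1) == final(s2)          # condition 3: same final writers
```

For instance, the blind-write schedule r1(X), w2(X), w1(X), w3(X) is view equivalent to the serial schedule T1, T2, T3 (and hence view serializable) even though it is not conflict serializable.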

Relationship between view and conflict equivalence

The definitions of conflict serializability and view serializability are similar if a condition
known as the constrained write assumption (or no blind writes) holds on all transactions in
the schedule. This condition states that any write operation wi(X) in Ti is preceded by a ri(X)
in Ti and that the value written by wi(X) in Ti depends only on the value of X read by ri(X).
This assumes that computation of the new value of X is a function f(X) based on the old value
of X read from the database. A blind write is a write operation in a transaction T on an item X
that is not dependent on the value of X, so it is not preceded by a read of X in the transaction
T.
Conflict serializability is stricter than view serializability: any conflict serializable schedule is
also view serializable, but not vice versa. With unconstrained writes (blind writes), a schedule
can be view serializable without being conflict serializable.

Other Types of Equivalence of Schedules

Serializability of schedules is sometimes considered to be too restrictive as a condition for
ensuring the correctness of concurrent executions. Some applications can produce schedules
that are correct by satisfying conditions less stringent than either conflict serializability or view
serializability. An example is the type of transactions known as debit-credit transactions—for
example, those that apply deposits and withdrawals to a data item whose value is the current
balance of a bank account.

The semantics of debit-credit operations is that they update the value of a data item X by
either subtracting from or adding to the value of the data item. Because addition and subtraction
operations are commutative—that is, they can be applied in any order—it is possible to produce
correct schedules that are not serializable.
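A small demonstration of why commutativity makes execution order irrelevant (the amounts and starting balance are hypothetical, chosen only for illustration):

```python
import itertools
from functools import reduce

# Two deposits and a withdrawal against a starting balance of 500.
# Because addition and subtraction commute, every ordering of these
# debit-credit operations yields the same final balance, so even a
# non-serializable schedule of them leaves the account correct.
ops = [+100, -40, +25]
finals = {reduce(lambda bal, amt: bal + amt, perm, 500)
          for perm in itertools.permutations(ops)}
assert finals == {585}     # a single final balance, regardless of order
```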

Transaction Support in SQL


The basic definition of an SQL transaction is similar to our already defined concept of a
transaction. That is, it is a logical unit of work and is guaranteed to be atomic. A single SQL
statement is always considered to be atomic—either it completes execution without an error or
it fails and leaves the database unchanged.
With SQL, there is no explicit Begin_Transaction statement. Transaction initiation is done
implicitly when particular SQL statements are encountered. However, every transaction must
have an explicit end statement, which is either a COMMIT or a ROLLBACK. Every
transaction has certain characteristics attributed to it. These characteristics are specified by a
SET TRANSACTION statement in SQL. The characteristics are the access mode, the
diagnostic area size, and the isolation level.
 The access mode can be specified as READ ONLY or READ WRITE. The default is
READ WRITE, unless the isolation level of READ UNCOMMITTED is specified, in
which case READ ONLY is assumed. A mode of READ WRITE allows select, update,
insert, delete, and create commands to be executed. A mode of READ ONLY, as the
name implies, is simply for data retrieval.
 The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n,
which indicates the number of conditions that can be held simultaneously in the
diagnostic area. These conditions supply feedback information (errors or exceptions) to
the user or program on the n most recently executed SQL statements.
 The isolation level option is specified using the clause ISOLATION LEVEL <isolation>, where the value
for <isolation> can be READ UNCOMMITTED, READ COMMITTED,
REPEATABLE READ, or SERIALIZABLE. The default isolation level is
SERIALIZABLE, although some systems use READ COMMITTED as their default.
The use of the term SERIALIZABLE here is based on not allowing violations that cause
dirty reads, nonrepeatable reads, and phantoms.
Dirty read: A transaction T1 may read the update of a transaction T2, which has not yet
committed. If T2 fails and is aborted, then T1 would have read a value that does not exist and
is incorrect.
Nonrepeatable read: A transaction T1 may read a given value from a table. If another
transaction T2 later updates that value and T1 reads that value again, T1 will see a different
value.
Phantoms: A transaction T1 may read a set of rows from a table, perhaps based on some
condition specified in the SQL WHERE-clause. Now suppose that a transaction T2 inserts a
new row that also satisfies the WHERE-clause condition used in T1, into the table used by T1.
If T1 is repeated, then T1 will see a phantom, a row that previously did not exist.
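The relationship between the four isolation levels and the three violations can be summarized in a small lookup table (a sketch of the SQL standard's definitions; True means the anomaly may occur at that level):

```python
# Which violations each SQL isolation level permits (per the SQL standard).
ANOMALIES = {
    "READ UNCOMMITTED": {"dirty_read": True,  "nonrepeatable_read": True,  "phantom": True},
    "READ COMMITTED":   {"dirty_read": False, "nonrepeatable_read": True,  "phantom": True},
    "REPEATABLE READ":  {"dirty_read": False, "nonrepeatable_read": False, "phantom": True},
    "SERIALIZABLE":     {"dirty_read": False, "nonrepeatable_read": False, "phantom": False},
}

# SERIALIZABLE is the only level that rules out all three violations.
assert not any(ANOMALIES["SERIALIZABLE"].values())
```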
A sample SQL transaction might look like the following:


EXEC SQL WHENEVER SQLERROR GOTO UNDO;
EXEC SQL SET TRANSACTION
READ WRITE
DIAGNOSTIC SIZE 5
ISOLATION LEVEL SERIALIZABLE;
EXEC SQL INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno, Salary)
VALUES ('Robert', 'Smith', '991004321', 2, 35000);
EXEC SQL UPDATE EMPLOYEE
SET Salary = Salary * 1.1 WHERE Dno = 2;
EXEC SQL COMMIT;
GOTO THE_END;
UNDO: EXEC SQL ROLLBACK; THE_END: ...;
This transaction consists of first inserting a new row in the EMPLOYEE table and then
updating the salary of all employees who work in department 2. If an error occurs on any of
the SQL statements, the entire transaction is rolled back. This implies that any updated salary
(by this transaction) would be restored to its previous value and that the newly inserted row
would be removed.
